Time-domain computing in hardware accelerators for edge applications

Lou, Jie; Gemmeke, Tobias; TaheriNejad, Nima

doi:44940

TY - THES
AU - Lou, Jie
TI - Time-domain computing in hardware accelerators for edge applications
PB - Rheinisch-Westfälische Technische Hochschule Aachen
VL - Dissertation
CY - Aachen
M1 - RWTH-2025-10712
SP - 1 online resource : illustrations
PY - 2025
N1 - Published on the publication server of RWTH Aachen University 2026
N1 - Dissertation, Rheinisch-Westfälische Technische Hochschule Aachen, 2025
AB - Efficient computing is becoming increasingly critical for energy-constrained edge devices. Digital computing remains the dominant approach, valued for its robustness against noise and compatibility with technology scaling. Traditionally, power reduction has relied on scaling of process technology featuring smaller device dimensions and lower supply voltages. However, as process technology scaling reaches its physical limits, this approach is becoming increasingly challenging. Beyond the classical digital approaches, analog computing has undergone a renaissance, demonstrating significant improvements in energy and area efficiency. However, it comes with a trade-off between the cost of analog computation and the achievable signal-to-noise ratio (SNR). Fundamentally, analog computing faces challenges such as variability, limited voltage scaling and re-design efforts in technology migration. Time-domain (TD) computing has attracted attention for its inherent analog signal processing properties and compatibility with digital circuits. It encodes numerical values as intervals between discrete signal arrival times, leveraging the inherent delay characteristics of circuits to enable energy-efficient operation. By using digital components to encode information as time signals, TD computing benefits from technology and voltage scaling while being compatible with commercial digital design flow. However, due to its specific energy-delay trade-off, novel circuit designs and computing architectures are still being developed to optimally exploit the potential of TD computing. This thesis focuses on the design of hardware accelerators for time-domain computing, analyzing suitable application domains, and identifying the principles and constraints that should be considered during algorithm-hardware co-optimization. It addresses the general research question: How to improve energy efficiency in view of limited technological progress for deterministic signal processing? 
We begin our exploration of time-domain circuits with binary neural networks (BNNs) for image processing tasks (Chapter 2.3). A standard-cell-based basic cell is first proposed, along with its corresponding array architecture and peripheral circuits. The basic cell supports both rising- and falling-edge computing. To enhance throughput in TD computing, we introduce a wave-pipelining technique that improves throughput while preserving computational accuracy. Regular placement (RP) and custom routing (CR) methodologies are employed to preserve structural regularity and minimize variation during place and route (P&R). Furthermore, we propose two custom delay cells for BNNs, offering trade-offs in energy efficiency, area, and accuracy. Following this, we extend our exploration of time-domain circuits to convolutional neural networks (CNNs) for more complex tasks (Chapter 2.4). We adopt two design types: a 1x1 custom-cell-based design, which utilizes two custom cells for 1x1 bitwise multiplication while balancing accuracy and energy cost; and a 1x2 standard-cell-based design, which performs 1x2 bitwise multiplication using 1-bit weights and 2-bit activations. A successive approximation register time-to-digital converter (SAR-TDC) is designed to convert time-domain signals into digital values. By leveraging the inherent characteristics of the network, we explore multiple operational modes to enhance the computational speed of the macro. Furthermore, we extend TD computing beyond neuromorphic applications to forward error correction (FEC) in communication systems, with a particular focus on low-density parity-check (LDPC) codes (Chapter 3.3). We incorporate double-edge operation and utilize the bit-split technique to enable efficient decoding. The minimum values computed at the check node are preserved in the time domain and converted into the digital domain using TDCs. To optimize energy efficiency, both individual and shared DTC architectures are explored. We demonstrated the results of time-domain computing across various applications using multiple GlobalFoundries 22 nm FDSOI silicon prototype chips. Power and performance were measured under various supply voltages. For neuromorphic applications, we analyzed cell delay and computation arrival times for different MAC results across supply voltages, demonstrating that the TD macro can perform high-resolution computations under process, voltage and temperature (PVT) variations. For the LDPC application, we evaluated the accuracy of the check node’s minimum value finder under different supply voltages.
Based on multiple successful tapeouts, our results indicate that time-domain computing offers a promising alternative to conventional digital and analog approaches in certain application domains. It achieves an approximately 1.8x to 3.7x improvement in energy efficiency for BNNs and a 1.3x improvement for CNNs compared to state-of-the-art (SotA) works.
LB - PUB:(DE-HGF)11
DO - DOI:10.18154/RWTH-2025-10712
UR - https://publications.rwth-aachen.de/record/1023746
ER -
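
The abstract describes values encoded as intervals between signal arrival times, a check-node minimum kept in the time domain, and conversion back to digital values by a TDC. The following is a purely behavioral sketch of that idea, not the circuits from the thesis. The unit delay, the 4-bit width, and the names `encode`, `td_min` and `sar_tdc` are hypothetical and chosen only for illustration. The successive-approximation search stands in for the SAR-TDC mentioned in the CNN part of the abstract.

```python
# Behavioral sketch only. UNIT_DELAY_PS and the bit width are assumptions,
# not figures from the thesis.
UNIT_DELAY_PS = 10.0  # hypothetical delay that represents one unit of value


def encode(value: int) -> float:
    """Encode a non-negative integer as an edge arrival time in picoseconds."""
    return value * UNIT_DELAY_PS


def td_min(values):
    """Time-domain minimum: the earliest-arriving edge wins the race."""
    return min(encode(v) for v in values)


def sar_tdc(arrival_ps: float, bits: int = 4) -> int:
    """Successive-approximation time-to-digital conversion (binary search)."""
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)
        # Keep the bit if the edge arrives at or after the trial reference edge.
        if arrival_ps >= encode(trial):
            code = trial
    return code


# Example: four check-node input magnitudes race; the first edge is the minimum.
magnitudes = [5, 3, 9, 7]
print(sar_tdc(td_min(magnitudes)))
```

In this model the minimum needs no comparator tree, because the earliest edge is the answer by construction, and the converter resolves one bit per step instead of counting unit delays.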
