# Design and Characterization of a 3-bit 24-GS/s Flash ADC in 28-nm Low-Power Digital CMOS

Gregor Tretter, Mohammad Mahdi Khafaji, David Fritsche, Corrado Carta, Member, IEEE, and Frank Ellinger, Senior Member, IEEE

*Abstract*—This paper presents the design and characterization of a 24-GS/s 3-bit single-core flash analog-to-digital converter (ADC) in 28-nm low-power digital CMOS. It includes a design study of the track-and-hold circuit and the subsequent buffer stage and provides equations for calculating the required bandwidths without extensive circuit simulations. These results are used to target leading-edge speed for a single ADC core. The ADC achieves its full sampling rate without time interleaving, which, to the best of our knowledge, makes it the fastest single-core CMOS ADC reported to date. With a power consumption of 0.4 W and an effective number of bits of 2.2 at 24 GS/s, the ADC achieves a figure of merit of 3.6 pJ per conversion step while occupying an active area of 0.12 mm<sup>2</sup>. Due to its high sampling frequency, this ADC can enable ultra-high-speed ADC systems when combined with moderate time interleaving.

*Index Terms*—Analog-to-digital converter (ADC), flash ADC, non-time-interleaved, track-and-hold amplifier (THA) bandwidth, THA buffer.

## I. INTRODUCTION

MODERN communication systems require data rates of up to several tens of Gb/s. One particularly challenging case is wireless board-to-board communication in supercomputers, where data throughput above 100 Gb/s is needed. Technically, this can be achieved with carrier frequencies above 100 GHz, where large bandwidths of up to tens of GHz are available [1]. Systems with such large bandwidths are very challenging for the incorporated analog-to-digital converters (ADCs), which can easily become the bottleneck of the wireless link. Additionally, to enable systems-on-chip (SoCs) that integrate digital signal processing and ADCs on the same chip, the ADC must be realized in a modern CMOS technology. Recently published CMOS ADCs show good power efficiency at sampling rates in the lower GHz range [2]-[5], with successive approximation register (SAR) ADCs being the most popular. Higher sampling rates can be reached with the same basic circuit structures by applying time interleaving [6]-[8]. As long as the overhead of multi-phase clock generation is negligible, it is theoretically possible to increase the sampling rate with no penalty in terms of energy per conversion. For this reason, time-interleaved topologies are widely used for high-speed ADCs and have been "extensively exploited (...) to achieve low figures of merit" [9], which in this context is equivalent to low energy per conversion step. Recently, sampling rates as high as 90 GS/s have been reported with ADC cores running at 1.4 GS/s [10]. Unfortunately, time interleaving cannot be applied at an arbitrary scale, as several problems limit the performance of heavily interleaved systems, such as jitter in multi-phase clock generation and distribution, clock transition times, input capacitance, requirements on the track-and-hold amplifiers (THAs), and latency [9], [11]-[13]. Further increases in sampling rate without exacerbating these problems can be achieved by implementing faster ADC cores. This relaxes the requirements on the multi-phase clock generation and reduces the latency, while enabling the highest input bandwidth.

The authors are with the Department of Electrical and Computer Engineering, Technische Universität Dresden, 01069 Dresden, Germany (e-mail: gregor.tretter@tu-dresden.de).

Digital Object Identifier 10.1109/TMTT.2016.2529599

The design goal for the presented ADC has been to achieve the highest possible sampling speed with a single ADC core. Therefore, the flash ADC topology has been chosen. The presented ADC core operates at sampling rates of up to 24 GS/s while being designed in a low-cost, low-power digital CMOS technology. In addition to the topics presented in [14], this paper presents comprehensive design considerations for the analog input stages and gives insights into the circuit implementation of all ADC sub-blocks. Furthermore, it shows additional and more detailed measurement results, both static and at the highest input frequencies. Section II describes the ADC architecture. As circuit implementations at such high frequencies require comprehensive design considerations, it is important to specify the bandwidth requirements for the critical circuit blocks, especially for the analog input stages. Section III investigates the track-and-hold (T/H) circuit and the subsequent buffer stage and provides a method to calculate the required bandwidth directly, without extensive circuit simulations. Insights into the circuit implementation are given in Section IV, while Section V presents the chip characterization.

## II. ADC ARCHITECTURE

Fig. 1 shows the system-level schematic of the presented ADC. For the highest conversion speed, it relies on the flash topology. The schematic shows a T/H stage and a subsequent buffer ($\mathrm{Buf}_1$) at the input, followed by a comparator (Cmp), further amplifiers, and latches (L) in each of the parallel data-processing paths. The binary output signals are generated by thermometer to binary conversion logic (T2B). A modern CMOS process makes it possible to achieve sampling rates of tens of GS/s with circuit structures whose sizes are in the range of hundreds of $\mu$m. Because these dimensions are no longer negligible at such frequencies, RF design techniques are required, including electromagnetic (EM) field simulations of lines and structures. Particular attention must be paid to the bandwidth of the analog front end, consisting of the T/H buffer and the comparators. The time synchronization after the comparators is usually performed by a master–slave flip-flop, which is very challenging to design at frequencies of tens of GHz. To reduce the effective regeneration time, three latches and an amplifier have been combined to form a master–slave–master (MSM) flip-flop [15].

0018-9480 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Manuscript received July 21, 2015; revised December 02, 2015; accepted February 05, 2016. Date of publication February 23, 2016; date of current version April 01, 2016. This work was supported by the German Research Foundation under the framework of Collaborative Research Center 912 "Highly Adaptive Energy-Efficient Computing" and by the Excellence Cluster Cool Silicon under the framework of the BMBF project Cool-RF-28. This paper is an expanded version of a paper presented at the IEEE RFIC Symposium, Phoenix, AZ, USA, May 17–19, 2015.

Fig. 1. ADC block diagram.

Fig. 2. (a) Basic SC T/H circuit. (b) T/H equivalent circuit during the track phase.

## III. BANDWIDTH CONSIDERATIONS

Circuit operation at the highest speeds requires careful consideration of bandwidth. The input stages, consisting of the T/H circuit and the subsequent buffer, pose the most stringent requirements because they operate in the analog domain, where the signals carry both timing and amplitude information. While the required bandwidth is commonly determined by complex transistor-level simulations, this section presents practical equations based on simple models to calculate the target bandwidth directly.

### A. T/H Stage

The most basic topology of a switched-capacitor (SC) T/H circuit is shown in Fig. 2(a). The transistor $M_1$ controls the electrical connection between the input and output of the circuit. While input and output are isolated during the hold phase and the charge on the hold capacitor $C_h$ is preserved, the electrical connection during the track phase should ideally be a short circuit.

Fig. 3. Response of a T/H circuit in track mode to a step input, using the low-pass filter model of Fig. 2(b), as described by (1).

In the track phase, the impedance between input and output is determined by the drain-source resistance of $M_1$, which can be modeled as a resistor $R_t$. This creates a first-order low-pass filter as a simple model of an SC T/H circuit in track mode, as illustrated in Fig. 2(b). For input frequencies close to the Nyquist frequency, two consecutive hold voltages can lie at the minimum and at the maximum of the input signal envelope. In this case, the output signal of the T/H stage has to change from the minimum value to the maximum within one track period. This scenario can be modeled by a step at the input of the T/H stage with the amplitude $V_{\rm in,T/H,pp}$, the peak-to-peak value of the T/H input voltage. The corresponding step response converges exponentially towards the input step value $V_{\rm in,T/H,pp}$ with a time constant of $\tau_t = R_t C_h$, as shown in Fig. 3,

$$V_{\rm out,T/H}(t) = V_{\rm in,T/H,pp} \cdot \left(1 - e^{-t/\tau_t}\right). \tag{1}$$

At time $\Delta t_1$, the output of the T/H reaches $V_{\text{out},\text{T/H}}(\Delta t_1) = V_{\text{in},\text{T/H},\text{pp}} - \Delta V_1$. As we are interested in the change of the output voltage within one track period, we define $\Delta t_1$ as the duration of one track period, which is half of the sampling period $1/f_s$, i.e., $\Delta t_1 = 1/(2f_s)$. The T/H deviation $\Delta V_1$ should be less than or equal to half of a least significant bit (LSB) of the ADC,

$$\Delta V_1 \le 0.5 \cdot \frac{V_{\text{in},\text{T/H},\text{pp}}}{2^B} = \frac{V_{\text{in},\text{T/H},\text{pp}}}{2^{B+1}}. \tag{2}$$

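*B* represents the number of bits of the ADC. Evaluating (1) at $t = \Delta t_1$ gives $\Delta V_1 = V_{\text{in},\text{T/H},\text{pp}} \cdot e^{-\Delta t_1/\tau_t}$, so that (2) requires $e^{-\Delta t_1/\tau_t} \le 2^{-(B+1)}$. Inserting $\Delta t_1 = 1/(2f_s)$ and $\tau_t = 1/(2\pi f_{3 \text{ dB}})$ yields the required corner frequency $f_{3 \text{ dB}}$ for the T/H stage in track mode,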

$$f_{3 \text{ dB}} \ge \frac{f_s \cdot \ln\left(2^{B+1}\right)}{\pi}. \tag{3}$$

This means that the tracking bandwidth of the T/H stage needs to exceed the sampling frequency for resolutions of 4 bit and higher, since $\ln\left(2^{B+1}\right) > \pi$ for $B \ge 4$. For the presented ADC, the sampling rate is $f_s = 24$ GS/s and the number of bits is $B = 3$, which results in a required tracking bandwidth of $f_{3 \text{ dB}} = 21.2$ GHz.
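As a quick numerical check of (3), the following Python sketch evaluates the minimum track-mode bandwidth and confirms, via (1), that a T/H stage with exactly this bandwidth leaves half an LSB of residual error after one track period. The function names are illustrative only and are not part of the original design flow.

```python
import math


def min_track_bandwidth(fs_hz: float, bits: int) -> float:
    """Minimum T/H track-mode corner frequency from (3): fs * ln(2^(B+1)) / pi."""
    return fs_hz * math.log(2 ** (bits + 1)) / math.pi


def residual_step_error_lsb(f3db_hz: float, fs_hz: float, bits: int) -> float:
    """Remaining error of a full-scale step after one track period 1/(2 fs),
    from (1), expressed in LSB (1 LSB = V_pp / 2^B)."""
    tau_t = 1.0 / (2.0 * math.pi * f3db_hz)  # tau_t = R_t * C_h
    dt1 = 1.0 / (2.0 * fs_hz)                # duration of one track period
    return math.exp(-dt1 / tau_t) * 2 ** bits


fs = 24e9  # sampling rate of the presented ADC
f_req = min_track_bandwidth(fs, 3)
print(f"required f_3dB = {f_req / 1e9:.1f} GHz")  # 21.2 GHz
print(f"residual error = {residual_step_error_lsb(f_req, fs, 3):.2f} LSB")  # 0.50 LSB
for b in range(3, 7):
    # ratio f_3dB / fs; it exceeds 1 from B = 4 on
    print(b, round(min_track_bandwidth(fs, b) / fs, 2))
```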



Fig. 4. Simulated output waveforms of buffers with different bandwidth, which are fed by an ideal T/H circuit. The T/H input signal frequency is close to the Nyquist frequency  $f_{nyq}$ . Low buffer bandwidths  $f_{buf}$  compromise the hold plateaus of the ideal T/H signal and thus defeat the purpose of the T/H stage.

### B. T/H Buffer

Apart from the T/H stage itself, it is also important to consider the bandwidth of the subsequent buffer. High bandwidth is difficult to achieve for this buffer because it has to drive all comparators, which present a large capacitive load. At the same time, this buffer is critical, because a bandwidth that is too low substantially decreases the effective ADC resolution, as described in [16]. While other designs address this critical point empirically and only sometimes report the resulting bandwidth specifications [17], this section describes a method to calculate the bandwidth requirements for the T/H buffer, which can be used for system-level specification without extensive circuit simulation.

The buffer is important because of the hold plateaus introduced by the T/H stage. These plateaus are formed by higher-order harmonics. If the low-pass behavior of the buffer filters out those harmonics, the plateaus are compromised accordingly. This effect is shown in Fig. 4, which depicts $V_{\rm out,buf}$, the output signal of an ideal T/H stage filtered by buffer stages that are modeled as first-order low-pass filters with different corner frequencies $f_{\text{buf}}$. The corner frequencies vary between the Nyquist frequency $f_{nyq}$ and four times the Nyquist frequency. Additionally, the response of an ideal buffer with infinite bandwidth is shown. The input signal of the ideal T/H stage is a sinusoidal signal of frequency $f$ and amplitude $\hat{V}_{\text{in},\text{T/H}} = V_{\text{in},\text{T/H},\text{pp}}/2$. During the track phase, the ideal buffer perfectly follows the output signal of the T/H stage and preserves the sinusoidal signal shape and the amplitude $\hat{V}_{\text{in,T/H}}$. At time $t = 0$, the output signal of the ideal buffer changes from the sinusoidal waveform of the track mode to the hold plateau. Depending on the buffer bandwidth $f_{buf}$, the other signals need more time to follow the ideal signal.

As can be seen in the given example, a buffer bandwidth of $f_{\text{buf}} = f_{nyq}$ is clearly not sufficient because the resulting signal is no longer constant during the hold phase, which defeats the purpose of the T/H stage. The buffer output signal for $f_{\text{buf}} = 2f_{nyq}$ is borderline, as it reaches the value of the ideal hold plateau only at the end of the hold phase. For all acceptable buffer bandwidths $(f_{buf} > 2f_{nyq})$, the corresponding signals are approximately parallel to the ideal signal at the beginning of the hold phase ($t = 0$ in Fig. 4), which means they have similar slopes. As long as this approximation holds, the voltage deviation $\Delta V_2$ can be understood as the result of a phase difference $\Delta \varphi$ between the ideal signal and the output signal of the buffer,

$$\Delta V_2(t) \approx \left|\hat{V}_{\rm in,T/H} \cdot \sin(2\pi f t + \Delta \varphi) - \hat{V}_{\rm in,T/H} \cdot \sin(2\pi f t)\right|. \tag{4}$$

The phase shift $\Delta \varphi$, which is introduced by the low-pass behavior of the buffer stage, is much smaller than 45° because the buffer corner frequency $f_{\text{buf}}$ is much higher than the maximum signal frequency $f$. For a first-order low-pass filter, $\Delta \varphi$ can be expressed as

$$\Delta \varphi = \arctan\left(\frac{f}{f_{\text{buf}}}\right) \ll 45^{\circ}. \tag{5}$$

For small values of  $\Delta \varphi$  there is a maximum for  $\Delta V_2$  at

$$t_{\max} = -\frac{\Delta\varphi}{4\pi f} \tag{6}$$

leading to

$$\Delta V_{2,\max} = \Delta V_2(t = t_{\max}) = \left| 2\hat{V}_{\text{in},\text{T/H}} \cdot \sin\left(\frac{\Delta\varphi}{2}\right) \right|. \tag{7}$$
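The closed-form results (6) and (7) can be cross-checked numerically by evaluating (4) on a fine time grid. The sketch below assumes $\hat{V}_{\rm in,T/H} = 0.4$ V (half of the 800-mV peak-to-peak differential input target quoted in Section V), $f = 12$ GHz, and $f_{\text{buf}} = 45$ GHz; these values are chosen for illustration only.

```python
import math


def phase_lag(f_hz: float, f_buf_hz: float) -> float:
    """Phase shift of a first-order low-pass buffer at frequency f, (5), in rad."""
    return math.atan(f_hz / f_buf_hz)


def t_max(f_hz: float, f_buf_hz: float) -> float:
    """Instant of the largest deviation according to (6)."""
    return -phase_lag(f_hz, f_buf_hz) / (4.0 * math.pi * f_hz)


def delta_v2_max(v_amp: float, f_hz: float, f_buf_hz: float) -> float:
    """Closed-form largest deviation according to (7)."""
    return abs(2.0 * v_amp * math.sin(phase_lag(f_hz, f_buf_hz) / 2.0))


def delta_v2_bruteforce(v_amp: float, f_hz: float, f_buf_hz: float, n: int = 100_000):
    """Evaluate (4) on a grid spanning half a signal period centered on t = 0
    and return (largest deviation, instant at which it occurs)."""
    dphi = phase_lag(f_hz, f_buf_hz)
    w = 2.0 * math.pi * f_hz
    quarter = 0.25 / f_hz
    best_dv, best_t = -1.0, 0.0
    for k in range(n + 1):
        t = -quarter + 2.0 * quarter * k / n
        dv = abs(v_amp * math.sin(w * t + dphi) - v_amp * math.sin(w * t))
        if dv > best_dv:
            best_dv, best_t = dv, t
    return best_dv, best_t


V_AMP = 0.4  # V, assumed amplitude (800 mV peak-to-peak differential)
dv_num, t_num = delta_v2_bruteforce(V_AMP, 12e9, 45e9)
print(f"(7): {delta_v2_max(V_AMP, 12e9, 45e9) * 1e3:.2f} mV, grid: {dv_num * 1e3:.2f} mV")
print(f"(6): {t_max(12e9, 45e9) * 1e12:.3f} ps, grid: {t_num * 1e12:.3f} ps")
```

With these assumed values, the deviation at the start of the hold phase is roughly 104 mV, i.e., about one LSB of 100 mV, which is why the exponential settling analyzed next matters.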

During the hold phase, the input signal of the buffer is constant at a value of $V_{\rm in,buf}$, and the output signal $V_{\rm out,buf}(t)$ approaches this constant value exponentially, starting from $V_{\rm in,buf} + \Delta V_2$ at the beginning of the hold phase. The largest deviation occurs if the hold phase starts at $t_{\rm max}$, so that $\Delta V_2 = \Delta V_{2,\rm max}$. In this case, with the buffer time constant $\tau_{\text{buf}} = 1/(2\pi f_{\text{buf}})$,

$$\begin{aligned} V_{\text{out,buf}}(t - t_{\max}) - V_{\text{in,buf}}(t_{\max}) &= \Delta V_{2,\max} \cdot e^{-(t - t_{\max})/\tau_{\text{buf}}} \\ &= \left|2\hat{V}_{\text{in,T/H}} \cdot \sin\left(\frac{\Delta\varphi}{2}\right)\right| \cdot e^{-2\pi (t - t_{\max}) f_{\text{buf}}}. \end{aligned} \tag{8}$$

Since the purpose of the T/H circuit is to keep the output voltage constant during the hold phase, it is necessary to define a time $t_c$ after which the signal change for the rest of the hold phase can be neglected. Treating a residual deviation of one tenth of an LSB as negligible results in

$$V_{\text{out,buf}}(t - t_{\text{max}} = t_c) - V_{\text{in,buf}}(t_{\text{max}}) = \frac{1}{10} \cdot \frac{2\hat{V}_{\text{in,T/H}}}{2^B}. \tag{9}$$

Solving for  $t_c$  yields

$$t_c = \frac{1}{2\pi f_{\text{buf}}} \ln\left(10 \cdot 2^B \cdot \sin\left(\frac{\Delta\varphi}{2}\right)\right). \tag{10}$$

Applying (5) to (10) and using the small-angle approximation for both sin and arctan, i.e., $\sin(\Delta\varphi/2) \approx \Delta\varphi/2 \approx f/(2f_{\text{buf}})$, results in the simplified expression

$$t_c \approx \frac{1}{2\pi f_{\text{buf}}} \ln\left(5 \cdot 2^B \cdot \frac{f}{f_{\text{buf}}}\right). \tag{11}$$

This relation is shown in Fig. 5 for an input signal frequency of $f = 12$ GHz, which is the Nyquist frequency at 24-GS/s operation. The resolution *B* is swept from 3 to 6 bits.

Fig. 5. Settling time at the output of the buffer versus buffer bandwidth $f_{buf}$, calculated using (11). The input signal frequency is $f = 12$ GHz; the number of bits $B$ varies between 3 and 6.

To achieve a given settling time, higher resolutions require higher bandwidth because of the increased settling precision. Increasing the buffer bandwidth to reduce the settling time is very effective as long as $f_{buf}$ is small, but the returns diminish at higher bandwidths. The required bandwidth for the presented 3-bit 24-GS/s ADC can also be determined from Fig. 5. Buffer bandwidths below 20 GHz result in settling times larger than 20 ps, which is not sufficient for the presented ADC with a hold phase duration of $t_H = 1/(2f_s) \approx 21$ ps. The required settling time depends on the system implementation, but values on the order of half a hold phase are feasible. This corresponds to $t_c = 10.5$ ps in this case and results in a minimum buffer bandwidth of 38 GHz for the presented ADC.
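Instead of reading values off Fig. 5, (10) and (11) can be evaluated directly. The Python sketch below compares the exact and small-angle forms and uses bisection on (11) to find the smallest buffer bandwidth that meets a target settling time; the bisection search is an illustrative choice, not necessarily the procedure used by the authors.

```python
import math


def settling_time_exact(f_hz: float, f_buf_hz: float, bits: int) -> float:
    """Settling time t_c from (10), with the phase shift taken from (5)."""
    dphi = math.atan(f_hz / f_buf_hz)
    return math.log(10 * 2**bits * math.sin(dphi / 2)) / (2 * math.pi * f_buf_hz)


def settling_time_approx(f_hz: float, f_buf_hz: float, bits: int) -> float:
    """Small-angle settling time t_c from (11)."""
    return math.log(5 * 2**bits * f_hz / f_buf_hz) / (2 * math.pi * f_buf_hz)


def min_buffer_bandwidth(t_c_target: float, f_hz: float, bits: int,
                         lo: float = 1e9, hi: float = 500e9, iters: int = 100) -> float:
    """Bisection on (11) for the smallest f_buf with t_c <= t_c_target.
    Assumes t_c decreases monotonically with f_buf on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if settling_time_approx(f_hz, mid, bits) > t_c_target:
            lo = mid  # settling too slow: more bandwidth needed
        else:
            hi = mid
    return hi


f_nyq = 12e9       # Nyquist frequency at 24 GS/s
t_c_goal = 10.5e-12  # about half of the 21-ps hold phase, as in the text
f_min = min_buffer_bandwidth(t_c_goal, f_nyq, 3)
print(f"minimum f_buf: {f_min / 1e9:.1f} GHz")  # about 38.3 GHz
print(f"t_c at 45 GHz: {settling_time_approx(f_nyq, 45e9, 3) * 1e12:.1f} ps (11), "
      f"{settling_time_exact(f_nyq, 45e9, 3) * 1e12:.1f} ps (10)")  # about 8.4 ps and 8.3 ps
```

The exact form (10) gives slightly shorter settling times than (11) because $\sin(\arctan(x)/2) < x/2$, so the small-angle result is marginally conservative.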

## IV. CIRCUIT IMPLEMENTATION

### A. Basic Considerations

The presented ADC has been designed in a 28-nm low-power digital CMOS process, which offers transistors with a breakdown voltage of 1.1 V. The transit frequency $f_T$ and maximum frequency of oscillation $f_{\text{max}}$ of the process are both around 250 GHz for drain-source voltages of 1.1 V and below 200 GHz for operating points with drain-source voltages of 0.6 V. To achieve the highest possible sampling rates, all circuits employ source-coupled logic (SCL). SCL circuits are differential and are suited for the highest operating frequencies [18]. Moreover, SCL is robust against power-supply switching noise, which CMOS logic generates to a large extent [19]. To increase the transistor drain-source bias voltages and improve the device speed, the chip uses two custom supply voltage domains at 1.4 and 1.75 V. Careful SCL design ensures that no transistor exceeds its specified breakdown voltage. The 1.75-V domain is used only for the clock buffer stages that drive the THA, so that high gate voltages can be supplied to control the THA.

### B. T/H Stage

The T/H stage of the presented ADC is a differential SC circuit with clock feedthrough cancellation [20] (Fig. 6).

Fig. 6. T/H stage and buffer.

The required hold capacitance $C_h$ depends on the sampling rate and resolution of the ADC. Its size affects the bandwidth [21], [22], droop, thermal noise [23], and signal coupling. The nodes that $C_h$ is connected to are also loaded by the parasitic capacitances of the buffer input and the wiring. A conservative approach is to implement the full value of $C_h$ with metal-insulator-metal (MIM) capacitors [21], [24], [25]. Alternatively, the parasitic capacitances can be taken into account so that $C_h$ consists of MIM capacitors and parasitics, which reduces the size of the MIM capacitors [26]. Due to the high sampling rate and low resolution of the presented ADC, the parasitic capacitances suffice, and the approach of [27] is used, which has no physical capacitor for $C_h$ but relies solely on parasitics. The drain-source resistance of $M_{1a,b}$, $r_{DS,M1a,b}$, defines the electrical connection between the circuit input and the subsequent buffer and thus represents the resistor $R_t$ introduced in the T/H model in Fig. 2(b). The gate-source voltage of $M_{1a,b}$, $V_{GS,M1a,b}$, controls $r_{DS,M1a,b}$, which means that in the given circuit implementation $r_{DS,M1a,b}$ changes with the input voltage,

$$r_{\rm DS,M1a,b} \propto \frac{1}{V_{\rm GS,M1a,b} - V_{\rm th}}. \tag{12}$$

For large input voltages, $V_{\rm GS}$ decreases and $r_{\rm DS}$ increases, which results in the smallest bandwidth of the T/H circuit. The T/H stage has been designed for a track-mode bandwidth of 30 GHz under this worst-case condition, which is still higher than the minimum required bandwidth of $f_{3 \text{ dB}} = 21.2 \text{ GHz}$ according to (3).

### C. T/H Buffer

As derived in Section III-B, a buffer bandwidth of 38 GHz is required to achieve a settling time of $t_c = 10.5$ ps at the output of the T/H buffer. For additional design margin, the implemented buffer has been designed for a bandwidth of $f_{\rm buf} = 45$ GHz, which results in a settling time of $t_c = 8.4$ ps according to (11). Fig. 7 shows a typical waveform from transistor-level simulations of the T/H circuit and buffer at the transition from the track phase to the hold phase. Simulated at an input frequency of $f_{\rm in} = 11.8125$ GHz and a sampling rate of $f_s = 24$ GS/s, it demonstrates the behavior close to the Nyquist frequency. The simulated settling time is 7.7 ps, which agrees well with the calculated value of 8.4 ps.

Fig. 7. Simulated transistor-level waveforms at the output of the T/H stage and the subsequent buffer.

To achieve the bandwidth of $f_{\text{buf}} = 45$ GHz, two peaking inductors $L_{1a,b} = 200$ pH have been employed. The 80-$\Omega$ feedback resistor $R_F$ is required to improve the static linearity of the buffer stage. A 110-fF capacitor $C_F$ creates an additional pole-zero pair, which helps to increase the bandwidth of the buffer. Since the corner frequency of $C_F R_F$ is above the Nyquist frequency, the buffer benefits from the improved linearity during the hold phases, even though $C_F$ shorts the feedback resistor at higher frequencies:

$$f_{RC1} = \frac{1}{2\pi C_F R_F} = 18 \text{ GHz} > \frac{f_s}{2}. \tag{13}$$
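Equation (13) is easy to verify; this short sketch uses only the component values stated above.

```python
import math

C_F = 110e-15  # F, feedback capacitor from the text
R_F = 80.0     # ohm, feedback resistor from the text
FS = 24e9      # S/s, sampling rate

f_rc1 = 1.0 / (2.0 * math.pi * C_F * R_F)  # corner frequency of (13)
print(f"f_RC1 = {f_rc1 / 1e9:.1f} GHz, f_s/2 = {FS / 2 / 1e9:.0f} GHz")  # 18.1 GHz, 12 GHz
assert f_rc1 > FS / 2  # the corner lies above the Nyquist frequency
```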

### D. Comparators and Offset Compensation

The flash ADC topology uses a set of comparators to simultaneously compare the input signal with different reference voltages, as shown in the ADC block diagram in Fig. 1. Deviations due to device mismatch in the reference voltage generation, as well as in the comparator circuits, directly create static nonlinearities in the ADC transfer characteristic, which manifest as degraded integral nonlinearity (INL) and differential nonlinearity (DNL). Classic flash ADCs use resistive voltage divider ladders to create the required reference voltages [13]. Since this leaves no possibility to account for random process mismatch, different approaches have been shown to reduce static nonlinearity by means of adjustable offset compensation. While [28] adds a calibration circuit to a resistive reference ladder to generate the required voltages, [12], [17], [29] use on-chip DACs for reference voltage generation and offset compensation. Going one step further, [30] integrates DACs as well as redundant comparators on chip in order to increase the calibration range. While these approaches can effectively eliminate static nonlinearities, they also require additional circuitry and increase the system complexity. The presented ADC requires $2^3 - 1 = 7$ differential comparators for 3-bit resolution and hence 14 different reference voltages. By incorporating single-ended to differential conversion circuitry into each comparator, the number of different voltages is halved to 7. This makes it possible to create the required reference voltages off-chip, which provides a simple, yet efficient and versatile method to account for the static nonlinearities that can arise.

Fig. 8. Employed comparator circuit.

The circuit implementation of the differential comparators is depicted in Fig. 8. The single-ended dc reference voltage  $V_{\rm ref}$ is created off-chip and supplied to the chip via a bond-wire interface. Each SCL comparator uses a differential reference signal, which it generates from the single-ended version. The single-ended to differential conversion is shown on the left side of Fig. 8. The differential pair consisting of  $M_{1a,b}$  is degenerated by a 1-k $\Omega$  feedback resistor  $R_F$ , which leads to a differential output voltage adjustment range of  $\pm 500$  mV for input voltages  $V_{ref}$  between 820 mV and 1.18 V for a bias voltage  $V_{B0} = 1.0$  V. The stabilization capacitors  $C_S$  are used solely to isolate the differential reference voltage  $V_{ref,d}$  from the input voltages. For this purpose, capacitance values of 500 fF are sufficient, while enabling area-efficient integration of the capacitors into the layout of the comparator cell. In order to protect  $V_{\rm ref,d}$  from ripples on the supply voltage or on the single-ended reference voltage  $V_{ref}$ , a separate dc voltage distribution network that employs zero-ohm lines is used. This design aspect is described in Section IV-H.
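For illustration, the nominal off-chip reference voltages can be estimated as follows. This sketch assumes comparator thresholds spaced one LSB apart and symmetric around zero for the 800-mV peak-to-peak differential input range given in Section V, and it assumes a linear transfer of the single-ended to differential converter between the stated endpoints (820 mV mapping to -500 mV and 1.18 V mapping to +500 mV). The actual transfer curve and polarity of the circuit are not specified here, and in practice the values are refined by the offset compensation described in Section V.

```python
def comparator_thresholds(full_scale_pp: float, bits: int) -> list:
    """Differential thresholds T_k = -FS/2 + k * LSB for k = 1 ... 2^B - 1."""
    lsb = full_scale_pp / 2**bits
    return [-full_scale_pp / 2 + k * lsb for k in range(1, 2**bits)]


# Stated ranges of the single-ended to differential converter (Fig. 8)
VREF_IN = (0.82, 1.18)    # V, single-ended V_ref input range
VREF_D_OUT = (-0.5, 0.5)  # V, differential output range


def single_ended_reference(v_ref_d: float) -> float:
    """Off-chip V_ref for a desired differential reference V_ref,d.
    ASSUMPTION: linear transfer, 0.82 V -> -0.5 V and 1.18 V -> +0.5 V."""
    (in_lo, in_hi), (out_lo, out_hi) = VREF_IN, VREF_D_OUT
    if not out_lo <= v_ref_d <= out_hi:
        raise ValueError("requested reference outside the adjustment range")
    return in_lo + (v_ref_d - out_lo) * (in_hi - in_lo) / (out_hi - out_lo)


B = 3
thresholds = comparator_thresholds(0.8, B)  # 800 mV peak-to-peak differential
print("comparators:", 2**B - 1,
      "| references without/with S2D:", 2 * (2**B - 1), 2**B - 1)
for vd in thresholds:
    print(f"V_ref,d = {vd:+.3f} V -> V_ref = {single_ended_reference(vd):.3f} V")
```

Under the linear assumption, the zero threshold maps to 1.0 V, which coincides with the bias voltage $V_{B0}$ stated above.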

### E. Latches

The circuit implementation of the latches is shown in Fig. 9. While the transistors  $M_{2a,b}$  control the behavior during the latch phase,  $M_{3a,b}$  are responsible for the regenerative recovery phase, which is achieved by cross-coupled feedback between both transistors.  $M_{1a,b}$  serve as buffers and level shifters for the clock signal  $V_{clk}$ . The bias voltage  $V_B$  can be used to adjust the gain of the buffers.

### F. Clock Generation

Supplying analog circuit blocks with clock signals at the highest speeds is a very challenging task [16]. The ADC relies on two clock signals at the full sampling rate, one for the latches and one for the T/H circuit. Both clock generation circuits have the same structure: they consist of an active balun and two gain stages, all designed in SCL with inductive peaking for bandwidth enhancement. A dc voltage distribution network based on zero-ohm lines is used to avoid crosstalk via the supply voltage between the clock and data-processing circuitry in the ADC core.

### G. Thermometer to Binary Conversion Logic

The last circuit block in the signal path before the output buffers is the thermometer to binary conversion logic, which generates the full-speed binary output signals. It is based on the structure proposed in [31], which requires only NAND and OR logic gates and can suppress simple bubble errors. The logic gates have been designed in the SCL topology. Buffer stages with delay times close to those of the NAND and OR gates have been added so that the delay of the conversion logic is the same for all output bits. The resulting block diagram is shown in Fig. 10.

Fig. 9. Employed latch.

Fig. 10. Thermometer to binary conversion logic.
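The thermometer to binary principle can be illustrated with a behavioral model. The sketch below uses a generic formulation: a one-hot code is formed by detecting the 1-to-0 transition of the thermometer code (optionally with a three-input detector $t_{k-1} \cdot t_k \cdot \overline{t_{k+1}}$ to suppress single bubbles), and OR planes encode the position of the active line. It is not necessarily identical to the NAND/OR netlist of [31].

```python
def thermometer_to_binary(therm, bubble_suppression=True):
    """Behavioral thermometer-to-binary encoder (generic sketch).

    therm[i] is the output of comparator i + 1 (lowest threshold first).
    Returns the binary code as a list of bits, MSB first."""
    n = len(therm)
    bits = (n + 1).bit_length() - 1
    if 2**bits - 1 != n:
        raise ValueError("thermometer code length must be 2**B - 1")
    t = [1] + [int(b) for b in therm] + [0]  # boundary values t_0 = 1, t_(n+1) = 0
    code = 0
    for k in range(1, n + 1):
        if bubble_suppression:
            hot = t[k - 1] and t[k] and not t[k + 1]  # three-input transition detector
        else:
            hot = t[k] and not t[k + 1]               # plain 1-to-0 transition detector
        if hot:
            code |= k  # OR planes: each active one-hot line sets the bits of its index
    return [(code >> i) & 1 for i in reversed(range(bits))]


print(thermometer_to_binary([1, 1, 1, 1, 0, 0, 0]))               # [1, 0, 0]
bubble = [1, 1, 0, 1, 0, 0, 0]                                    # single bubble at comparator 3
print(thermometer_to_binary(bubble, bubble_suppression=False))    # [1, 1, 0]
print(thermometer_to_binary(bubble))                              # [0, 1, 0]
```

Without the three-input detector, the bubble activates two one-hot lines and the OR planes output code 6; with it, only one line is active and the output stays at code 2.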

### H. DC Voltage Distribution

DC voltage distribution is an important aspect of integrated circuit design at frequencies in the GHz range. It is closely connected to the circuit layout because the decoupling capacitors often occupy significant portions of the chip area. Decoupling capacitors are required to stabilize on-chip dc voltages that are supplied from external sources via bond-wire interfaces. The goal is to obtain clean dc voltages without high-frequency components in the form of spikes or ripples. Furthermore, crosstalk between different circuits via the supply voltage is to be prevented. Both goals can be met with large decoupling capacitors, which short high-frequency signals to ground. The problem in practical realizations is that any capacitor forms a resonant circuit together with a series inductance, and at the resonance frequency no decoupling or stabilization takes place. The inductance can, for example, originate from the bond-wire interface. Large capacitors can easily move the resonance frequency into frequency ranges that are important for circuit operation. The possible consequences range from increased crosstalk to unintended circuit behavior such as oscillation. An alternative dc voltage distribution method is the use of zero-ohm lines [32], in which metal-oxide-metal (MOM) capacitors are shaped as a transmission line with very low characteristic impedance.



Fig. 11. ADC core area. Massive metal walls are used to guarantee the required metal density.



Fig. 12. Measurement setup.

These structures strongly attenuate high-frequency signals and thus fulfill the same purpose as decoupling capacitors. They can be modeled with EM field solvers and simulated with transmission line models to predict their behavior. They are not susceptible to resonance with single series inductances. Furthermore, implementing the dc connections of different circuit blocks with zero-ohm lines provides well-defined isolation between these blocks and prevents crosstalk. The zero-ohm line approach has been used for all externally supplied dc voltages, including supply and reference voltages. Special care has been taken to prevent crosstalk to the different reference voltages of the comparators and to isolate the dc voltages of the clock circuitry from the data-processing hardware.
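The resonance problem of lumped decoupling discussed in this subsection can be quantified with $f_{\rm res} = 1/(2\pi\sqrt{LC})$. The values below (a bond-wire inductance of 1 nH and several decoupling capacitances) are hypothetical and serve only to show how the resonance frequency shifts as the decoupling capacitance grows.

```python
import math


def lc_resonance(l_henry: float, c_farad: float) -> float:
    """Resonance frequency of an inductance and a capacitance, 1 / (2 pi sqrt(LC))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


L_BOND = 1e-9  # H, hypothetical bond-wire inductance
for c_dec in (10e-12, 100e-12, 1e-9):  # F, hypothetical decoupling capacitances
    f_res = lc_resonance(L_BOND, c_dec)
    print(f"C = {c_dec * 1e12:6.0f} pF -> f_res = {f_res / 1e6:7.1f} MHz")
```

A tenfold increase of the capacitance lowers the resonance frequency by a factor of $\sqrt{10} \approx 3.16$, which is why large lumped capacitors can pull the resonance into frequency ranges relevant to circuit operation.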

### I. Layout Considerations

Design rules in heavily scaled CMOS processes impose further restrictions on RF designs. One critical factor is the required metal density, which is 20% for tiles of 50 $\mu$m × 50 $\mu$m in the given technology. One way to prevent metal fill structures inside critical RF structures, such as transmission lines, inductors, or RF gain stages, is to keep these components smaller than 50 $\mu$m and surround them with metal fill structures. As a result, the maximum size of the inductors used is 35 $\mu$m. The circuit blocks in the ADC core area are surrounded by massive metal "walls" to fulfill the density requirements without increasing the parasitic capacitances inside the circuit blocks due to metal fill structures (Fig. 11).



Fig. 13. Measured static transfer characteristics with and without dc offset compensation.



Fig. 14. Simulated and measured SNDR at different sampling rates. (a) 20 GS/s. (b) 24 GS/s.

## V. MEASUREMENTS

### A. Measurement Setup

The presented ADC has been characterized with a hybrid measurement setup that uses both wire bonding and on-chip probing, as shown in Fig. 12. While all RF input and output signals have been connected using probes, the dc supply and control voltages have been generated on a dc printed circuit board (PCB) and supplied to the chip via a bond-wire interface to a daughter PCB. An off-chip balun has been used to generate the differential input signal from a single-ended source. Its bandwidth of 10 GHz limits the maximum input signal frequency for this test setup. The two clock signals are derived from one signal source using a power divider and two phase shifters, which ensures high measurement accuracy and flexibility. The output signals are captured with a real-time oscilloscope (RTO) (Agilent DSA-X 96204Q), which offers four channels with 33-GHz input bandwidth, so that the single-ended outputs of all bits can be evaluated simultaneously.

Fig. 15. Chip photograph.

Fig. 16. Measured FFT of the ADC output for a 9.1-GHz input signal sampled at 24 GS/s.

### B. Experimental Results

The static behavior of the presented ADC is shown in Fig. 13. The target differential peak-to-peak input voltage is 800 mV. Without offset compensation of the dc reference voltages, large deviations from the ideal static transfer function are visible. The corresponding DNL of 0.8 LSB has a sizeable impact on the overall circuit performance. To account for this, automatic script-driven offset compensation at circuit startup has been used. A control script running on a dedicated PC accesses the ADC input signals, the RTO at the output, and the dc board to automatically determine and apply the compensation coefficients for the dc reference voltages. This reduces the DNL to below 0.05 LSB, which represents almost perfect static behavior.
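DNL and INL values like those quoted above are commonly computed from the measured code-transition voltages. The sketch below uses the usual end-point definition; the paper does not state which definition was used, and the transition values in the usage example are hypothetical rather than the measured data of Fig. 13.

```python
def dnl_inl(transitions, lsb=None):
    """Return (DNL, INL) in LSB from code-transition voltages T_1 ... T_(2^B - 1).

    If lsb is None, the average step between the first and last transition
    (end-point fit) is used as the LSB size."""
    if len(transitions) < 2:
        raise ValueError("need at least two transitions")
    if lsb is None:
        lsb = (transitions[-1] - transitions[0]) / (len(transitions) - 1)
    # DNL of each inner code: actual code width relative to one LSB, minus 1
    dnl = [(hi - lo) / lsb - 1.0 for lo, hi in zip(transitions, transitions[1:])]
    # INL: deviation of each transition from the straight end-point line
    inl = [(t - (transitions[0] + k * lsb)) / lsb for k, t in enumerate(transitions)]
    return dnl, inl


# Hypothetical transitions (V) of a 3-bit ADC with 800-mV full scale; code 2 is too wide
measured = [-0.30, -0.20, -0.05, 0.00, 0.10, 0.20, 0.30]
dnl, inl = dnl_inl(measured)
print("max |DNL| =", round(max(abs(d) for d in dnl), 3), "LSB")
print("max |INL| =", round(max(abs(i) for i in inl), 3), "LSB")
```

A calibration loop of the kind described above would adjust the reference voltages until the resulting maximum |DNL| falls below the chosen target.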

TABLE I

Performance Comparison to State-of-the-Art ADCs Above 10 GS/s

| Ref.      | Year | $f_{\rm S}$ (GS/s) | ENOB @ $f_{\rm in}$ | $f_{\rm in}$ (GHz) | $P$ (W) | Cores | $f_{\rm s,1core}$ (GS/s) | $A$ (mm$^2$) | FOM (pJ/conv.-step) | Latency\* (ps) | Technology     |
|-----------|------|--------------------|---------------------|--------------------|---------|-------|--------------------------|--------------|---------------------|----------------|----------------|
| [33]      | 2014 | 20                 | 3.7                 | 10                 | 1.0     | 1     | 20                       | 0.6          | 3.9                 | 100            | SiGe 130 nm    |
| [16]      | 2009 | 35                 | 3.0                 | 11                 | 4.5     | 1     | 35                       |              | 16.1                | 57             | SiGe 180 nm    |
| [34]      | 2009 | 20                 | 4.0                 | 10                 | 4.8     | 1     | 20                       |              | 15.0                | 100            | SiGe 130 nm    |
| [10]      | 2014 | 90                 | 5.2                 | 20                 | 0.7     | 64    | 1.4                      | 0.5          | 0.2                 | 1429           | CMOS 32 nm SOI |
| [17]      | 2013 | 10                 | 5.0                 | 5                  | 0.2     | 4     | 2.6                      | 0.3          | 0.7                 | 777            | CMOS 40 nm GP  |
| [35]      | 2014 | 14                 | 3.5                 | 3                  | 0.2     | 1     | 14                       | 0.1          | 1.3                 | 143            | CMOS 90 nm     |
| [36]      | 2011 | 36                 | 2.2                 | 15                 | 2.6     | 4     | 9                        | 0.2          | 15.7                | 222            | CMOS 65 nm     |
| This Work |      | 24                 | 2.2                 | 10                 | 0.4     | 1     | 24                       | 0.1          | 3.6                 | 83             | CMOS 28 nm LP  |

\* To estimate the latency, two-cycle conversions have been assumed for all listed ADCs.

To describe the dynamic ADC performance, Fig. 14 plots the measured signal-to-noise-and-distortion ratio (SNDR) for sampling rates of 20 and 24 GS/s together with the simulated values. The measurements have been taken both for normal circuit operation and for a transparent THA stage, in which the THA clock signal is held constantly high. For each data point, the signal source power has been adjusted to compensate for the frequency response of the input balun, which is not sufficiently matched to 50 $\Omega$ at the highest frequencies. At 20 GS/s, simulation predicts SNDR values close to the theoretical maximum of about 20 dB for a 3-bit ADC ($6.02 \cdot 3 + 1.76 \approx 19.8$ dB). The measurements show slightly higher distortion, which degrades the SNDR to a minimum of 17 dB. Within measurement accuracy, both THA test scenarios show similar results. At 24 GS/s, both the simulated and the measured SNDR decrease at high input signal frequencies, and the difference between simulation and measurement is slightly larger than in the 20-GS/s case. The minimum measured SNDR is 15 dB for both THA test cases, which corresponds to an effective number of bits (ENOB) of 2.2, as also reported in [14] for the same hardware. Even though the enabled THA circuit shows no visible advantage in this single-core measurement, a functional THA paves the way for moderate time interleaving with the same chip, which can substantially increase the overall sampling speed. Fig. 16 shows the fast Fourier transform (FFT) of the ADC output for a 9.1-GHz input signal sampled at 24 GS/s.
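The relation between SNDR and ENOB used above, ENOB = (SNDR - 1.76 dB)/6.02 dB, and the roughly 20-dB ceiling of a 3-bit converter can be checked with a short simulation. The sketch below is an illustration under stated assumptions, not a model of the measured chip: it quantizes a full-scale sine with an ideal 3-bit mid-rise quantizer at 24 GS/s, moves the tone from 9.1 GHz to the nearest coherent frequency (bin 389 of a hypothetical 1024-point record, about 9.117 GHz) to avoid spectral leakage, and obtains the SNDR from the single signal bin and the total AC power rather than from a full FFT.

```python
# Sketch: SNDR and ENOB of an ideal 3-bit quantizer for a near-9.1-GHz tone.
# Assumptions (not from the paper): ideal mid-rise quantizer, full-scale sine,
# coherent sampling with N = 1024 points. Pure Python, no FFT library needed:
# the signal bin is evaluated directly and Parseval's theorem gives total power.
import math

FS = 24e9          # sampling rate in S/s
N = 1024           # record length (assumed)
K = 389            # signal bin, odd and coprime with N -> coherent sampling
F_IN = K * FS / N  # about 9.117 GHz, the coherent frequency closest to 9.1 GHz
BITS = 3


def quantize(x, bits=BITS):
    """Ideal mid-rise quantizer for x in [-1, 1]; output in LSB units."""
    levels = 2**bits
    code = min(max(math.floor((x + 1.0) / 2.0 * levels), 0), levels - 1)
    return code - (levels - 1) / 2.0


def sndr_db(y, k):
    """SNDR from one DFT bin (signal) and the total AC power (signal + N&D)."""
    n = len(y)
    re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(y))
    im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(y))
    p_sig = 2.0 * (re * re + im * im) / n**2          # bins k and n - k
    mean = sum(y) / n
    p_ac = sum(v * v for v in y) / n - mean * mean    # Parseval, DC removed
    return 10 * math.log10(p_sig / (p_ac - p_sig))


def enob(sndr):
    """ENOB = (SNDR - 1.76 dB) / 6.02 dB."""
    return (sndr - 1.76) / 6.02


# Small phase offset keeps samples away from exact comparator thresholds
y = [quantize(math.sin(2 * math.pi * F_IN * i / FS + 0.1)) for i in range(N)]
s = sndr_db(y, K)
print(f"f_in = {F_IN / 1e9:.3f} GHz, SNDR = {s:.1f} dB, ENOB = {enob(s):.2f}")
print(f"ENOB for the measured minimum of 15 dB: {enob(15.0):.2f}")
```

Even the ideal quantizer lands slightly below the 19.8-dB formula value, because at 3-bit resolution the quantization error of a sine is not exactly uniform; the measured 15 dB then corresponds to the quoted ENOB of 2.2.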

The chip photograph in Fig. 15 shows the hybrid setup. The size of the inductorless flash core is  $0.06 \text{ mm}^2$ . Due to the clock buffers, which make use of inductors, the overall active area increases to  $0.12 \text{ mm}^2$ . The complete die area is 2.4 mm<sup>2</sup>.

Table I compares the performance of the presented ADC to the state of the art in ADCs above 10 GS/s. The table is split into two groups: CMOS ADC implementations and ADCs in SiGe technologies featuring bipolar devices. While the SiGe circuits achieve the highest single-core sampling rates $f_{s,1\text{core}}$, they suffer from high power consumption and have the distinct disadvantage that they cannot be integrated on a single chip together with large-scale digital circuits. Among the CMOS ADCs, especially the implementations that rely heavily on time interleaving and use high-performance or SOI CMOS processes achieve good tradeoffs between performance and power, which manifests in low values of the Walden figure of merit (FOM) [12]. Since the presented ADC does not employ time interleaving, its single-core sampling rate is higher than that of interleaved implementations with comparable overall sampling rates. To the best of our knowledge, it is the highest single-core sampling rate in CMOS, and it has been achieved in a low-power, low-cost CMOS technology. In order to compare the state of the art of ADC designs based on their single-core performance, an FOM is required that captures the core performance without the impact of time interleaving. The FOM of a single core is better than that of the complete interleaved ADC system because of the additional power consumption of the interleaving circuitry. Based on the numbers given in [10], an interleaving overhead of 25% of the total power consumption is assumed for all time-interleaved ADC systems,

$$FOM_{single} = \begin{cases} 0.75 \cdot FOM, & \text{for interleaved ADCs} \\ FOM, & \text{for non-interleaved ADCs.} \end{cases}$$
(14)

Fig. 17. Comparison of state-of-the-art ADC designs based on their single-core performance as defined by $FOM_{single}$ in (14).
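The Walden FOM values in Table I and the single-core correction of (14) follow from simple arithmetic, FOM = P/(2^ENOB · f_S). The sketch below recomputes a few table entries as a plausibility check; the helper names are hypothetical, and treating any ADC with more than one core as interleaved is an assumption that matches the "Cores" column of Table I. Small deviations from the table come from the rounded inputs.

```python
# Sketch: Walden FOM and the single-core FOM of (14) for entries of Table I.
# FOM = P / (2**ENOB * f_S); FOM_single = 0.75 * FOM for interleaved ADCs,
# reflecting the assumed 25 % interleaving power overhead.


def walden_fom(power_w, enob, fs_hz):
    """Energy per conversion step in joules."""
    return power_w / (2**enob * fs_hz)


def fom_single(fom, cores):
    """Eq. (14): remove the assumed 25 % overhead for interleaved ADCs."""
    return 0.75 * fom if cores > 1 else fom


# (P in W, ENOB, f_S in S/s, number of cores), values copied from Table I
ADCS = {
    "[33]": (1.0, 3.7, 20e9, 1),
    "[10]": (0.7, 5.2, 90e9, 64),
    "[36]": (2.6, 2.2, 36e9, 4),
    "This work": (0.4, 2.2, 24e9, 1),
}

for name, (p, enob, fs, cores) in ADCS.items():
    fom = walden_fom(p, enob, fs)
    print(f"{name:10s} FOM = {fom * 1e12:6.2f} pJ, "
          f"FOM_single = {fom_single(fom, cores) * 1e12:6.2f} pJ")
```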

Fig. 17 shows the single-core performance comparison. It highlights the high sampling rate of the presented ADC core in combination with a competitive FOM.

The presented ADC can enable ultra-high sampling frequencies when combined with moderate time interleaving, while preserving very low latency. To compare the input-to-output delay, a two-cycle conversion has been assumed for all listed ADCs, i.e., a latency of two periods of the single-core sampling clock, $2/f_{s,1\text{core}}$. The resulting latency is listed in Table I; it is the lowest among the listed CMOS ADCs and is surpassed only by the SiGe ADC in [16].
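The latency column of Table I can be reproduced from the single-core sampling rates under the stated two-cycle assumption. The sketch below uses hypothetical helper names; it shows, for example, that the SiGe ADC in [16] reaches about 57 ps, while the presented ADC reaches about 83 ps, the lowest value among the CMOS entries.

```python
# Sketch: latency estimate of Table I, assuming a two-cycle conversion.

CYCLES = 2  # assumption applied to all ADCs in Table I


def latency_ps(fs_single_core_hz, cycles=CYCLES):
    """Latency in ps = number of cycles / single-core sampling rate."""
    return cycles / fs_single_core_hz * 1e12


# Single-core sampling rates f_s,1core from Table I, in S/s
ENTRIES = [("[16]", 35e9), ("[10]", 1.4e9), ("[35]", 14e9),
           ("[36]", 9e9), ("This work", 24e9)]

for name, fs1 in ENTRIES:
    print(f"{name:10s} {latency_ps(fs1):7.0f} ps")
```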

## VI. CONCLUSION

A 3-bit single-core flash ADC in 28-nm LP digital CMOS has been presented, achieving sampling rates up to 24 GS/s without time interleaving. This is the result of a design that targets the highest possible sampling rate for a single ADC core. To achieve this goal, the bandwidth requirements of the ADC input stages have been investigated and expressed as simple closed-form equations for efficient circuit implementation. Featuring the highest single-core sampling rate reported in CMOS to date, the presented ADC enables communication links with high data throughput and low latency, and it paves the way for ultra-high sampling rates through moderate time interleaving.

## ACKNOWLEDGMENT

The authors thank Agilent Technologies for the assistance with high-speed real-time measurements.

#### REFERENCES

- [1] D. Fritsche, C. Carta, and F. Ellinger, "A broadband 200 GHz amplifier with 17 dB gain and 18 mW DC power consumption in 0.13 μm SiGe BiCMOS," *IEEE Microw. Wireless Compon. Lett.*, vol. 24, no. 11, pp. 790–792, Nov. 2014.
- [2] L. Kull et al., "A 3.1 mW 8 b 1.2 GS/s single-channel asynchronous SAR ADC with alternate comparators for enhanced speed in 32 nm digital SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3049–3058, Dec. 2013.
- [3] C.-H. Chan, Y. Zhu, S.-W. Sin, S.-P. U, R. Martins, and F. Maloberti, "A 5 bit 1.25 GS/s 4× capacitive-folding flash ADC in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 9, pp. 2154–2169, Sep. 2013.
- [4] P. Harpe, B. Busze, K. Philips, and H. de Groot, "A 0.47–1.6 mW 5 bit 0.5–1 GS/s time-interleaved SAR ADC for low-power UWB radios," *IEEE J. Solid-State Circuits*, vol. 47, no. 7, pp. 1594–1602, Jul. 2012.
- [5] B. Verbruggen, J. Craninckx, M. Kuijk, P. Wambacq, and G. Van der Plas, "A 2.6 mW 6 b 2.2 GS/s 4-times interleaved fully dynamic pipelined ADC in 40 nm digital CMOS," in *Int. Solid-State Circuits Conf.*, 2010, pp. 296–297.
- [6] L. Kull et al., "A 35 mW 8 b 8.8 GS/s SAR ADC with low-power capacitive reference buffers in 32 nm Digital SOI CMOS," in *VLSI Circuits Symp.*, 2013, pp. 260–261.
- [7] E. Tabasy, A. Shafik, K. Lee, S. Hoyos, and S. Palermo, "A 6 b 10 GS/s TI-SAR ADC with embedded 2-tap FFE/1-tap DFE in 65 nm CMOS," in *VLSI Circuits Symp.*, 2013, pp. 274–275.
- [8] J. Wu et al., "A 5.4 GS/s 12 b 500 mW pipeline ADC in 28 nm CMOS," in *VLSI Circuits Symp.*, 2013, pp. 92–93.
- [9] B. Razavi, "Design considerations for interleaved ADCs," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1806–1817, Aug. 2013.
- [10] L. Kull et al., "A 90 GS/s 8 b 667 mW 64× interleaved SAR ADC in 32 nm digital SOI CMOS," in *Int. Solid-State Circuits Conf.*, 2014, pp. 378–379.
- [11] M. Chu, P. Jacob, J.-W. Kim, M. LeRoy, R. Kraft, and J. McDonald, "A 40 Gs/s time interleaved ADC using SiGe BiCMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 2, pp. 380–390, Feb. 2010.
- [12] M. El-Chammas and B. Murmann, "A 12 GS/s 81 mW 5 bit time-interleaved flash ADC with background timing skew calibration," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 838–847, Apr. 2011.
- [13] B. Razavi, *Principles of Data Conversion System Design*. New York, NY, USA: Wiley, 1994.
- [14] G. Tretter, M. Khafaji, D. Fritsche, C. Carta, and F. Ellinger, "A 24 GS/s single-core flash ADC with 3 Bit resolution in 28 nm low-power digital CMOS," in *IEEE Radio Freq. Integr. Circuits Symp.*, May 2015, pp. 347–350.
- [15] W. Cheng et al., "A 3 b 40 GS/s ADC–DAC in 0.12 μm SiGe," in *Int. Solid-State Circuits Conf.*, 2004, pp. 262–263.
- [16] S. Shahramian, S. Voinigescu, and A. Carusone, "A 35 GS/s, 4 Bit flash ADC with active data and clock distribution trees," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1709–1720, Jun. 2009.
- [17] A. Varzaghani et al., "A 10.3 GS/s, 6 Bit flash ADC for 10G ethernet applications," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3038–3048, Dec. 2013.
- [18] P. Heydari and R. Mohanavelu, "Design of ultrahigh-speed low-voltage CMOS CML buffers and latches," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 10, pp. 1081–1093, Oct. 2004.
- [19] M. Alioto and G. Palumbo, "Design strategies for source coupled logic gates," *IEEE Trans. Circuits Syst. I, Fundam. Theory Appl.*, vol. 50, no. 5, pp. 640–654, May 2003.
- [20] G. Tretter, D. Fritsche, C. Carta, and F. Ellinger, "10 GS/s track and hold circuit in 28 nm CMOS," in *Int. Dresden–Grenoble Semicond. Conf.*, 2013, pp. 1–4.
- [21] D. Mattos et al., "An 8 Gsps, 65 nm CMOS wideband track-and-hold," in *Int. New Circuits Syst. Conf.*, 2011, pp. 321–324.
- [22] G. Tretter, D. Fritsche, C. Carta, and F. Ellinger, "Enhancing the input bandwidth of CMOS track and hold amplifiers," in *Int. Microw., Radar, Wireless Commun. Conf.*, 2014, pp. 1–4.
- [23] B. Sedighi, A. Huynh, and E. Skafidas, "A CMOS track-and-hold circuit with beyond 30 GHz input bandwidth," in *IEEE Int. Electron., Circuits, Syst. Conf.*, 2012, pp. 113–116.
- [24] D. Cascella, F. Cannone, G. Avitabile, and G. Coviello, "A 2.5 GS/s 62 dB THD SiGe track-and-hold amplifier with feedthrough cancellation technique," in *IEEE Int. Electron., Circuits, Syst. Conf.*, 2012, pp. 109–112.
- [25] X. Li, W.-M. L. Kuo, Y. Lu, R. Krithivasan, J. Cressler, and A. Joseph, "A 5 bit, 18 GS/sec SiGe HBT track-and-hold amplifier," in *Compound Semicond. Integr. Circuit Symp.*, 2005.
- [26] S. Shahramian, S. Voinigescu, and A. Carusone, "A 30 GS/sec track and hold amplifier in 0.13 μm CMOS technology," in *IEEE Custom Integr. Circuits Conf.*, 2006, pp. 493–496.
- [27] H.-L. Chen, S.-C. Cheng, and B.-W. Chen, "A 5 GS/s 46 dBc SFDR track and hold amplifier," in *Int. Intell. Signal Process. Commun. Syst. Symp.*, 2012, pp. 636–639.
- [28] D. Ferenci, M. Groezing, F. Lang, and M. Berroth, "A 3 bit 20 GS/s flash ADC in 65 nm low power CMOS technology," in *IEEE Eur. Microw. Integr. Circuits Conf.*, 2010, pp. 214–217.
- [29] J. Lee and Y.-K. Chen, "A 50 GS/s 5 b ADC in 0.18 μm SiGe BiCMOS," in *IEEE MTT-S Int. Microw. Symp. Dig.*, 2010, pp. 900–903.
- [30] S. Park, Y. Palaskas, and M. Flynn, "A 4 GS/s 4 bit flash ADC in 0.18 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 1865–1872, Sep. 2007.
- [31] Y.-J. Chuang, H.-H. Ou, and B.-D. Liu, "A novel bubble tolerant thermometer-to-binary encoder for flash A/D converter," in *Int. VLSI Design, Automat., Test Symp.*, 2005, pp. 315–318.
- [32] G. Tretter, D. Fritsche, C. Carta, and F. Ellinger, "Zero-ohm transmission lines for millimeter-wave circuits in 28 nm digital CMOS," *Electron. Lett.*, vol. 51, no. 11, pp. 845–847, 2015.
- [33] P. Ritter, S. Le Tual, B. Allard, and M. Möller, "Design considerations for a 6 bit 20 GS/s SiGe BiCMOS flash ADC without track-and-hold," *IEEE J. Solid-State Circuits*, vol. 49, no. 9, pp. 1886–1894, Sep. 2014.
- [34] R. Kertis et al., "A 20 GS/s 5-bit SiGe BiCMOS dual-nyquist flash ADC with sampling capability up to 35 GS/s featuring offset corrected exclusive-or comparators," *IEEE J. Solid-State Circuits*, vol. 44, no. 9, pp. 2295–2311, Sep. 2009.
- [35] H.-C. Hong, Y.-S. Chen, and W.-C. Fang, "14 GSps four-bit noninterleaved data converter pair in 90 nm CMOS with built-in eye diagram testability," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, pp. 1238–1247, Oct. 2013.
- [36] D. Ferenci, S. Mauch, M. Grözing, F. Lang, and M. Berroth, "A 3 bit 36 GS/s flash ADC in 65 nm low power CMOS technology," in *Int. Integr. Circuits Symp.*, 2011, pp. 344–347.



**Gregor Tretter** was born in Schweinfurt, Germany, in 1986. He received the Diploma degree in electrical engineering from the Technische Universität Dresden (TUD), Dresden, Germany, in 2011, and is currently working toward the Ph.D. degree at TUD.

His research interests lie in the area of integrated analog circuit design where he is focused on broadband circuit design in general and data converter structures in particular.



**Mahdi Khafaji** was born in Tehran, Iran, in 1982. He received the Ph.D. degree (with highest honors) from the Dresden University of Technology, Dresden, Germany, in 2015.

From 2008 to 2012, he was with IHP Microelectronics, Frankfurt (Oder), Germany, where he was involved with high-speed digital-to-analog converters. He is currently with the Chair for Circuit Design and Network Theory, Technische Universität Dresden (TUD), Dresden, Germany. His research interests include high-speed data converters and broadband circuits for optical communication.



**David Fritsche** was born in Bautzen, Germany, in 1986. He received the Diploma degree in electrical engineering from the Technische Universität Dresden (TUD), Dresden, Germany, in 2011, and is currently working toward the Ph.D. degree at TUD.

His main research interests are in the field of analog circuit design with a current focus on circuits in advanced semiconductor technologies and for operation at sub-terahertz frequencies.



**Corrado Carta** (S'02–M'05) was born in Cagliari, Italy. He received the Master degree in electrical engineering from the University of Cagliari, Cagliari, Italy, in 2000, and the Ph.D. degree from the Swiss Federal Institute of Technology (ETH) Zürich, Zürich, Switzerland, in 2006.

From July 2000 to February 2006, he was with the Microwave Electronics Group, ETH Zürich, where his main research interests were in the field of silicon-based RF integrated circuit (RFIC) design for microwave wireless communications. From April 2006 to May 2008, he was with the High-Speed Electronics Group, Electrical and Computer Engineering Department, University of California at Santa Barbara, Santa Barbara, CA, USA, where his research was focused on the design of silicon-based integrated circuits for very large millimeter-wave phased arrays. In June 2008, he joined Sonos, Inc., where he led the RF engineering and compliance team and was involved in the development and characterization of the wireless interface of new and existing products. In March 2010, he joined the Chair for Circuit Design and Network Theory, Technische Universität Dresden (TUD), Dresden, Germany, where he currently leads the mm-wave Integrated Circuit (IC) Design Group and the Beyond-Moore Electronics Group.



**Frank Ellinger** (S'97–M'01–SM'06) was born in Friedrichshafen, Germany, in 1972. He received the Diploma degree in electrical engineering from the University of Ulm, Ulm, Germany, in 1996, and the MBA and Ph.D. degree in electrical engineering and Habilitation degree in high-frequency circuit design from ETH Zürich (ETHZ), Zürich, Switzerland, in 2001 and 2004, respectively.

Since 2006, he has been a Full Professor and Head of the Chair for Circuit Design and Network Theory, Technische Universität Dresden (TUD), Dresden, Germany. From 2001 to 2006, he was Head of the RFIC Design Group, Electronics Laboratory, ETHZ, and a Project Leader of the IBM/ETHZ Competence Center for Advanced Silicon Electronics, hosted at IBM Research, Rüschlikon, Switzerland. He has been the Coordinator of the RESOLUTION, MIMAX, ADDAPT, and FLEXIBILITY projects funded by the European Union. He coordinates the cluster project FAST with more than 70 partners (most of them from industry) and the Priority Program FFlexCom of the German Research Foundation (DFG). He has authored or coauthored over 350 refereed scientific papers. He authored the lecture book *Radio Frequency Integrated Circuits and Technologies* (Springer, 2008).

Prof. Ellinger has been a Member of the Management Board of the German Excellence Cluster Cool Silicon. He was an elected IEEE Microwave Theory and Techniques Society (MTT-S) Distinguished Microwave Lecturer (2009–2011). He was the recipient of several awards including the IEEE Outstanding Young Engineer Award, the ETH Medal, the Denzler Award, the Rohde & Schwarz/Agilent/Gerotron EEEfCOM Innovation Award (twice), and the ETHZ Young Ph.D. Award.