

## A Wideband Four-Way Doherty Bits-In RF-Out CMOS Transmitter

Beikmirza, Mohammadreza; Shen, Yiyu; de Vreede, Leo C.N.; Alavi, Morteza S.

DOI 10.1109/JSSC.2021.3105542

Publication date 2021

**Document Version** Accepted author manuscript

Published in IEEE Journal of Solid-State Circuits

**Citation (APA)** Beikmirza, M., Shen, Y., de Vreede, L. C. N., & Alavi, M. S. (2021). A Wideband Four-Way Doherty Bits-In RF-Out CMOS Transmitter. *IEEE Journal of Solid-State Circuits*, *56*(12), 3768-3783. Article 9526619. https://doi.org/10.1109/JSSC.2021.3105542


# A Wideband Four-Way Doherty Bits-In RF-Out CMOS Transmitter

Mohammadreza Beikmirza, *Graduate Student Member, IEEE*, Yiyu Shen, *Member, IEEE*, Leo C. N. de Vreede, *Senior Member, IEEE*, and Morteza S. Alavi, *Member, IEEE*

Abstract-We present a wideband, 12-bit four-way Doherty Cartesian digital transmitter (DTX) featuring an innovative 50%-LO signed I/Q interleaved up-conversion technique that enables close to perfect orthogonal I/Q summation. The DTX incorporates a compact four-way lumped-element Doherty power combining network to enhance its average efficiency at deep power back-off (DPBO). It comprises a signed second-order hold (SOH) interpolation filter to suppress the sampling spectral replicas significantly. The proposed DTX is realized in a 40-nm bulk CMOS and delivers a peak output power of 27.54 dBm with drain and system efficiencies of 46.35% and 30.77%, respectively, at 5.3 GHz. At 12 dB DPBO, the realized DTX demonstrates a drain efficiency (DE) of 41.74%-39.27% in a 5.2-5.5 GHz band, respectively. Its intrinsic I/Q image, LO leakage, and C-IMD3/H<sub>3BB</sub> for a 200 MHz tone spacing over a 4.8-6.2 GHz band are -64, -65, and -69 dBc, respectively, without calibration. Applying a simple memoryless  $2 \times 1$ -D digital pre-distortion, its error vector magnitude and adjacent channel leakage ratio are lower than -31 dB and -39 dBc, respectively, for a six-carrier "40 MHz 256-QAM OFDM" signal with 18 dBm average output power and a 41% average DE. The signed SOH functionality is verified employing a four-carrier "80 MHz 512-QAM OFDM" signal with spectral purity of better than -35 dBc, while its baseband sampling frequency is 675 MHz.

Index Terms—50%-LO clock distribution, 8-shape inductor/balun, current-mode class-D (CMCD), efficiency enhancement, in-phase/quadrature (I/Q) interleaving, radio frequency digital-to-analog converter (RF-DAC), sign-bit.

### I. INTRODUCTION

Over the last few years, digital transmitters (DTXs) have increasingly gained popularity as they replace the circuit building blocks of conventional analog-intensive TXs with radio frequency digital-to-analog converters (RF-DACs) [1]–[20]. These bits-in RF-out TXs feature digital interpolation filters and arrays of digital up-converters with their subsequent digital power amplifiers (DPAs). They offer several advantages: highly efficient operation, direct-digital synthesis (DDS) capability, frequency-agile operation, multi-mode/multi-band functionality, and nanoscale CMOS compatibility, enabling higher integration and reduced die

Manuscript received April 16, 2021; revised July 15, 2021; accepted August 10, 2021. This article was approved by Associate Editor Nagendra Krishnapura. This work was supported in part by the Catrene Project, EAST. (*Corresponding author: Mohammadreza Beikmirza.*)

The authors are with the Department of Microelectronics, Delft University of Technology, 2628 CD Delft, The Netherlands (e-mail: m.r.beikmirza@tudelft.nl).



Fig. 1. Conventional 25%-LO I/Q DTX with its signed I/Q phase selector.

area. Despite these benefits, next-generation DTXs must be augmented by circuit, system, and architectural innovations to satisfy the stringent requirements of modern communication standards. As a result, they must support spectrally efficient wideband modulation schemes, achieve low error vector magnitude (EVM), and meet TX spurious emission requirements. Generally, in-phase/quadrature (I/Q) DTXs are considered superior for wideband application over their polar counterparts due to their linear I/Q operation that avoids bandwidth expansion [21]–[23].

Nevertheless, I/Q DTXs can suffer from the interaction between their I and Q paths, especially at higher power levels, giving rise to an I/Q image, in-band nonlinearity, and spectral regrowth. As depicted in Fig. 1, the DTX in [24] uses 25% quadrature clocks and a signed I/Q phase selector to alleviate the I/Q interaction. However, the I/Q interaction predominantly remains because the analog I/Q summation is prone to mismatch and excessive output parasitics. The design reported in [25] applies an I/Q power-cell sharing method based on time-division multiplexing. In this approach, coined the IQ-sharing architecture, the up-converted I/Q signals are digitally added together while sharing a single power cell. Still, it requires 25%-LO clocks, which are challenging to realize in the 5 GHz band due to the excessive power consumption of the LO clocks. Alternatively, Deng et al. [26] deploy 50% clocks while adopting the phase selector of [24]. Nonetheless, the inherent overlap of 50% quadrature clocks causes non-orthogonal operation and distortion, entailing sophisticated digital pre-distortion (DPD).

On the other hand, modern communication standards employ modulation schemes such as 256-quadrature-amplitude modulation (QAM) orthogonal frequency-division multiplexing (OFDM). These modulation schemes feature a




Fig. 2. 25%-LO with conventional phase selector operation. (a) Resulting up-converting quadrature clock waveforms. (b) Idealized interleaved unit-cell. (c) Simulated single-sideband constellation diagram.

high peak-to-average-power ratio (PAPR) (e.g., >10 dB), which requires the DTX to operate in deep power back-off (DPBO), degrading its average efficiency. Many efficiency-enhancement techniques, such as out-phasing [27]–[29], envelope tracking (ET) [30], and Doherty [31]–[37], are currently adopted in various DPA topologies to enhance their efficiency at DPBO. Two-way Doherty DPAs (DDPAs) are popular due to their less complicated baseband processing and their ability to handle large modulation bandwidths, but they typically enhance efficiency at 6 dB PBO. The N-way DDPA improves the efficiency at DPBO while maintaining the simplicity advantage of the DDPA configuration. However, it is challenging to implement an N-way DDPA since incorporating more DPA banks leads to excessive power combiner losses. In [31], a three-way DDPA architecture with decent overall linearity/efficiency performance has been reported.

Recently in [38], we have introduced a four-way DTX that achieves decent EVM, spectral purity, and efficiency at DPBO. It comprises a 50%-LO signed I/Q interleaved up-converter that facilitates close to perfect orthogonal I/Q summation. Moreover, it incorporates a push-pull low-loss four-way DDPA to enhance its average efficiency. This article elaborates on the system-/circuit-level design considerations and extensive measurement results. Section II introduces the proposed 50%-LO signed I/Q interleaving technique. Section III unveils the compact four-way DDPA. Section IV describes the TX architecture, while Section V demonstrates the measurement results. Section VI concludes this article.

### II. 50%-LO SIGNED I/Q INTERLEAVED UP-CONVERTER

#### A. Conventional Phase Selector Operation

1) 25%-LO With Conventional Phase Selector Operation: Fig. 2 demonstrates the conventional phase selector operation concept with a 25%-LO clock [39]. Conventionally, to minimize the distortion due to the I/Q overlap, nonoverlapping complementary quadrature clocks with a 25% duty cycle are required. In this context, the clock tree is typically implemented using a phase selector that operates directly on these narrow 25%-LO clocks ($f_{\text{LO},0,25\%}$, $f_{\text{LO},90,25\%}$, $f_{\text{LO},180,25\%}$, and $f_{\text{LO},270,25\%}$). Depending on the four states of the I/Q sign-bits, the related complementary clock pairs can be swapped [see Fig. 2(a)]. Note that, in this conventional approach, the Q sign-bit (Sign<sub>Q</sub>) only acts on the QP/QN related clocks CLK<sub>QP,25%</sub>/CLK<sub>QN,25%</sub>, while the I sign-bit (Sign<sub>I</sub>) only operates on the IP/IN related clocks CLK<sub>IP,25%</sub>/CLK<sub>IN,25%</sub>. For example, in the case of a transition from the first quadrant (Sign<sub>Q</sub> = 0 and Sign<sub>I</sub> = 0) to the fourth one (Sign<sub>Q</sub> = 1 and Sign<sub>I</sub> = 0), since the sign of Q is changed, the corresponding complementary clocks, CLK<sub>QP,25%</sub>/CLK<sub>QN,25%</sub>, are swapped [see table in Fig. 2]. These 25% phase-modulated LO clocks are then directly mixed with the up-sampled baseband data [see Fig. 2(b)], driving the subsequent power cells to cover the targeted constellation quadrants. Fig. 2(c) shows the simulated single-sideband constellation diagram of the conventional phase selector, indicating correct sign-bit operation without compression due to non-overlapping 25%-LO quadrature clocks.

Nonetheless, since this phase selector exploits 25% quadrature LO clocks, it entails practical limitations and causes various issues in the DTX clock trees. First, compared to using 50%-LO clocks, their rise/fall times in the clock tree must be very short, requiring faster buffers in the clock tree and consuming more dc power. Therefore, this approach is less attractive at higher frequencies. Second, compared to 50%-LO clocks, their 25% counterparts are not inherently symmetric and balanced, making them more prone to signal interference, dc offsets, and abrupt transient conditions (e.g., due to the changing sign-bits). These issues yield timing inaccuracies in practical implementations, which, in turn, give rise to performance nonidealities, such as limited I/Q image rejection and unwanted spectral leakage spurs, especially at higher frequency bands [16], [40]–[42]. These TX spectral spurs increase the far-out spectral noise floor at receiver (RX) bands, which is detrimental. The single-sideband spectrum with 100 MHz tone-spacing and the power consumption breakdown at 2.4 GHz employing global 25%-LO clocks with the conventional phase selector are presented in Fig. 3. Accordingly, its I/Q image is -59 dBc, while its digital circuit blocks consume 380.8 mW. The performance of this conventional arrangement will be compared to the proposed method in Section II-B.

2) 50%-LO With Conventional Phase Selector Operation: To address the issues mentioned above, Fig. 4(a) illustrates the resulting waveforms of applying the conventional phase



Fig. 3. (a) Single-sideband spectrum with 100 MHz tone-spacing at 2.4 GHz employing 25%-LO clocks with the conventional phase selector. (b) Power consumption breakdown.



Fig. 4. (a) Resulting up-converting quadrature clock waveforms of 50%-LO with conventional phase selector operation. (b) Simulated single-sideband constellation diagram.

selector to 50% quadrature LO clocks. Accordingly, this setup leads to correct quadrant selection depicted in Fig. 4(b). However, it causes distortions due to an I/Q overlap, which significantly deteriorates the image-rejection ratio, in-band linearity, and close-in spectral purity [26], [43], [44]. To mitigate this issue, 25% quadrature LO clocks are required when both I and Q are active. Therefore, 25% complementary quadrature LO clocks are generated by multiplying the adjacent 50%-LO clocks together [see Fig. 5(a)]. However, after applying these 25%-LO clocks to the up-sampled baseband data, as shown in Fig. 5(b), it turns out that the sign-bit operation is still incorrect even when using these 25% non-overlapping clocks. Fig. 5(c) compares this latter case with the conventional phase selector using 25%-LO clocks. As illustrated, the resulting clock waveforms are correct in the first/third quadrant (alternate quadrant traverse). Nonetheless, the modified phase selector does not replicate the suitable clock waveforms in the second/fourth quadrant (adjacent quadrant traverse). The reason lies in the fact that both I and Q sign-bits change in an alternate quadrant traverse case, and thus, the clocks preserve their lead/lag phase relation. On the other hand, in an adjacent quadrant traverse condition, one of the I or Q sign-bits changes. Therefore, the clocks do not preserve their lead/lag phase relation. To tackle this issue, we propose a new 50%-LO signed I/Q interleaved up-converter.
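The failure of the conventional swap on 50% clocks can be reproduced with a few lines of discrete-time logic. The sketch below is an illustration only, not the authors' simulation: it assumes ideal 50% square clocks sampled at eight points per LO period, a first-quadrant slot order of IP, QP, IN, QN, and hypothetical helper names (`phi`, `and_adjacent`, `conventional_swap`, `target_25`). It swaps the complementary 50% clocks according to the sign-bits, forms the 25% clocks by ANDing each clock with its clockwise neighbour, and compares the result with the 25% clocks a native 25%-LO phase selector would deliver.

```python
import numpy as np

N = 8  # samples per LO period (assumption: ideal, jitter-free clocks)
t = np.arange(N)

def phi(deg):
    """Ideal 50% duty-cycle clock delayed by `deg` degrees."""
    shift = deg * N // 360
    return ((t - shift) % N) < N // 2

# Clockwise neighbour used in the AND operation (assumed order QP->IP->QN->IN).
NEXT = {"QP": "IP", "IP": "QN", "QN": "IN", "IN": "QP"}

def and_adjacent(clk50):
    """25% clocks formed by ANDing each 50% clock with its clockwise neighbour."""
    return {k: clk50[k] & clk50[NEXT[k]] for k in clk50}

def conventional_swap(sign_i, sign_q):
    """Conventional selector: each sign-bit swaps only its own complementary pair."""
    c = {"IP": phi(0), "QP": phi(90), "IN": phi(180), "QN": phi(270)}
    if sign_i:
        c["IP"], c["IN"] = c["IN"], c["IP"]
    if sign_q:
        c["QP"], c["QN"] = c["QN"], c["QP"]
    return c

def target_25(sign_i, sign_q):
    """Reference: first-quadrant 25% clocks with the complementary swap applied."""
    c = and_adjacent(conventional_swap(0, 0))
    if sign_i:
        c["IP"], c["IN"] = c["IN"], c["IP"]
    if sign_q:
        c["QP"], c["QN"] = c["QN"], c["QP"]
    return c

def matches(sign_i, sign_q):
    got = and_adjacent(conventional_swap(sign_i, sign_q))
    want = target_25(sign_i, sign_q)
    return all(np.array_equal(got[k], want[k]) for k in got)

results = {(si, sq): matches(si, sq) for si in (0, 1) for sq in (0, 1)}
for (si, sq), ok in results.items():
    print(f"Sign_I={si} Sign_Q={sq}: {'correct' if ok else 'wrong'} 25% clocks")
```

Under these assumptions, the clocks come out correct when neither or both sign-bits are set (first and third quadrants) and wrong when exactly one is set (second and fourth quadrants), which is the lead/lag reversal described above.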

#### B. Proposed 50%-LO Signed I/Q Interleaved Up-Converter

The proposed signed I/Q interleaved up-converter addresses this issue by exploiting global 50%-LO clocks, phase modulated by the sign-bits, along with a new single-sideband I/Q digital up-converter. In this method, the DTX quadrant selection (sign-mapping operation) is realized in two steps: 1) global 50% duty-cycle quadrature clock mapping



Fig. 5. (a) Possible implementation diagram of 25% clock generation by multiplying the adjacent 50%-LO clocks. (b) Simulated single-sideband constellation diagram. (c) Resulting 25% quadrature clock waveforms.

and 2) local I/Q up-conversion and I/Q interleaving. The remainder of this section elaborates further on the details of these two steps and their design considerations.

1) Step 1: Global 50% Duty-Cycle Quadrature Clock Mapping: Fig. 6(a) illustrates a graphical representation of this step. The DTX clock tree uses 50% square-wave LO clocks ($f_{\text{LO},0,50\%}$, $f_{\text{LO},90,50\%}$, $f_{\text{LO},180,50\%}$, and $f_{\text{LO},270,50\%}$), and the conventional clock phase selector described above in Fig. 2 is replaced by the proposed sign-bit phase mapper. As illustrated, in the first step, based on the I/Q sign-bit states, the 50% quadrature clocks are swapped in a particular fashion that contrasts with the typical approach, in which only complementary clocks are swapped. The table in Fig. 6 summarizes the phase mapping relations for the proposed 50%-LO clocks. In this approach, the I sign-bit impacts both the I-related clock signals (CLK<sub>IP,50%</sub>/CLK<sub>IN,50%</sub>) and, more importantly, the Q-related clocks (CLK<sub>QP,50%</sub>/CLK<sub>QN,50%</sub>). Similarly, the Q sign-bit affects both Q-/I-related clocks. For example, in the transition from the first quadrant to the fourth one, in the first step, the 50%-LO clocks CLK<sub>QP</sub>/CLK<sub>QN</sub> are swapped with CLK<sub>IN</sub>/CLK<sub>IP</sub>, respectively [see Fig. 6(b)]. The proposed 50%-LO quadrature sign-bit mapper scheme enables simple multiplication actions on the resulting clocks (CLK<sub>QP,50%</sub>, CLK<sub>IP,50%</sub>, CLK<sub>QN,50%</sub>, and CLK<sub>IN,50%</sub>) to adequately generate 25% up-converting LO clocks for the unit cells, which is described in the next step.

2) Step 2: Local I/Q Up-Conversion and I/Q Interleaving: To complete the up-conversion operation, the required 25%-LO clocks are generated by multiplying the appropriate pair of the phase-modulated 50%-LO quadrature clocks together. A possible implementation is illustrated in Fig. 6(c) showing the generation of the local 25%-LO clocks in each RF-DAC sub-cell (CLK<sub>QP,25%</sub>, CLK<sub>IP,25%</sub>, CLK<sub>QN,25%</sub>, and CLK<sub>IN,25%</sub>), by bit-wise multiplying of their corresponding



Fig. 6. Proposed global 50% duty-cycle quadrature clock mapping operation. (a) Resulting up-converting quadrature clock waveforms. (b) Example for the first quadrant to the fourth-quadrant transition. (c) Resulting 25% clocks generated by multiplying the adjacent 50%-LO clocks. (d) Simulated single-sideband constellation diagram.



Fig. 7. (a) Schematic of the local 25%-LO clock generation, I/Q up-conversion, and interleaving. (b) NAND gate-based logical implementation circuitry. (c) Two-/three-input symmetrical NAND logic gates.

50% clocks (CLK<sub>QP,50%</sub>, CLK<sub>IP,50%</sub>, CLK<sub>QN,50%</sub>, and CLK<sub>IN,50%</sub>) with their clockwise adjacent clock (CLK<sub>IP,50%</sub>, CLK<sub>QN,50%</sub>, CLK<sub>IN,50%</sub>, and CLK<sub>QP,50%</sub>), respectively. Accordingly, as illustrated in Fig. 6(c), the resulting 25% clocks replicate the required clock waveforms in all quadrants. Fig. 7(a) conceptually shows how the 25%-LO clock signals are generated and multiplied with their baseband signals, i.e., $I_{BB}$ and $Q_{BB}$, performing the up-conversion. The up-converted I/Q bitstreams are combined to fulfill I/Q interleaving. Fig. 6(d) presents the simulation results of the proposed 50%-LO signed I/Q interleaved up-converter concept showing accurate sign-bit operation without compression. It is worth mentioning that the local 25%-LO clock generation and up-conversion are performed using three-input NAND gates [see Fig. 7(b)]. As stated above, 50%-LO clocks are inherently symmetrical. Therefore, they are less prone to electromagnetic (EM) couplings and interferences, especially when the clock lines are implemented symmetrically. In addition to these symmetrical conditions, the logic gates used for the sign-bit mapper, local

I/Q up-conversion, and interleaving circuits are also fully symmetrical with respect to their inputs. Most traditional logic gates do not have this feature. For example, a conventional NAND logic gate is inherently asymmetrical. Therefore, it does not offer precise symmetrical loading to the incoming phase-modulated clock lines. This arrangement jeopardizes the DTX operation due to interfering signals and leads to timing errors. A NAND gate circuitry comprising symmetrical logic gates is employed to address this issue, which provides symmetric input loading and transfer function [see Fig. 7(c)]. By making the clock tree and its loading fully symmetrical, the timing error is remarkably diminished. The single-sideband spectrum with 100 MHz tone-spacing and the power consumption breakdown at 2.4 GHz of the proposed global 50%-LO signed I/Q interleaved up-converter are presented in Fig. 8. Accordingly, the DPA's I/Q image is -67 dBc, while its digital circuit blocks consume 333.8 mW. Compared to Fig. 3, the I/Q image is improved by 8 dB and the dc power consumption is reduced by 47 mW at 2.4 GHz. Nevertheless, at higher operational frequencies, employing global 25%-LO clocks becomes impractical, leading to performance degradation. Consequently, the proposed technique facilitates high-frequency I/Q up-conversion by utilizing symmetrical, balanced, and matched 50% quadrature LO clocks in a two-step I/Q phase modulator. Using 50%-LO clocks has benefits that significantly enhance orthogonal summation and system efficiency (SE). Its advantages are given as follows.

- 1) It lowers the impact of the clock line EM couplings and parasitics.
- 2) It has more immunity to the duty-cycle distortion.
- 3) The LO signals in the clock trees are more robust to interfering signals due to their symmetrical operation and balanced loading.
- 4) It consumes less power.
- 5) It improves its in-band linearity and close-in/far-out spectral purity.

Fig. 8. (a) Single-sideband spectrum with 100 MHz tone-spacing at 2.4 GHz employing the proposed 50%-LO signed I/Q interleaving up-converter. (b) Power consumption breakdown.

Fig. 9. (a) General four-way DPA with lossless power combining network. (b) and (c) Output current and voltage profiles versus input codeword.
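The two-step scheme of Section II-B can be explored numerically. The table in Fig. 6 is not reproduced in this text, so the sketch below does not restate it; instead, under the assumptions of ideal 50% clocks, a first-quadrant assignment of IP = 0°, QP = 90°, IN = 180°, QN = 270°, and hypothetical helper names, it searches all assignments of the four 50% phases to the four clock lines and keeps those whose AND with the clockwise neighbour reproduces the 25% clocks of a conventional 25%-LO selector.

```python
from itertools import permutations
import numpy as np

N = 8  # samples per LO period (assumption: ideal clocks)
t = np.arange(N)
PHASES = (0, 90, 180, 270)
NAMES = ("IP", "QP", "IN", "QN")
# Clockwise neighbour used for the local AND (assumed order QP->IP->QN->IN).
NEXT = {"QP": "IP", "IP": "QN", "QN": "IN", "IN": "QP"}

def phi(deg):
    """Ideal 50% duty-cycle clock delayed by `deg` degrees."""
    return ((t - deg * N // 360) % N) < N // 2

def and_adjacent(assign):
    """Step 2: local 25% clocks from a name -> 50% phase (degrees) assignment."""
    return {k: phi(assign[k]) & phi(assign[NEXT[k]]) for k in NAMES}

def target_25(sign_i, sign_q):
    """25% clocks of a conventional 25%-LO selector (complementary swaps)."""
    c = and_adjacent(dict(zip(NAMES, PHASES)))
    if sign_i:
        c["IP"], c["IN"] = c["IN"], c["IP"]
    if sign_q:
        c["QP"], c["QN"] = c["QN"], c["QP"]
    return c

def find_mapping(sign_i, sign_q):
    """Step 1 candidates: 50% phase assignments that yield the target clocks."""
    want = target_25(sign_i, sign_q)
    hits = []
    for perm in permutations(PHASES):
        assign = dict(zip(NAMES, perm))
        got = and_adjacent(assign)
        if all(np.array_equal(got[k], want[k]) for k in NAMES):
            hits.append(assign)
    return hits

mapping = {(si, sq): find_mapping(si, sq) for si in (0, 1) for sq in (0, 1)}
for (si, sq), hits in mapping.items():
    print(f"Sign_I={si} Sign_Q={sq}: {hits}")
```

In this idealized model, each sign-bit state admits exactly one assignment. For the first-to-fourth quadrant transition (Sign<sub>Q</sub> = 1, Sign<sub>I</sub> = 0), CLK<sub>QP</sub> takes the first-quadrant phase of CLK<sub>IN</sub> and CLK<sub>QN</sub> takes that of CLK<sub>IP</sub>, which agrees with the example given in Step 1.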

### III. DOHERTY POWER COMBINING NETWORK

As mentioned in Section I, to obtain highly efficient operation at DPBO and generate relatively high average RF power in CMOS technology, a compact four-way DDPA is introduced. Its power combining network comprises four input ports and one RF output for the 50-$\Omega$ antenna connection. To facilitate the design of the four-way combiner, an approach similar to [45] has been adopted to design various types of four-way DDPA power combining networks. In this context, as shown in Fig. 9(a), first, the designated power combiner is considered as a five-port black box. Next, its ports' current profiles are determined based on a given set of three free-to-choose peak efficiency power back-off points [i.e., $k_{B1}$, $k_{B2}$, and $k_{B3}$ in Fig. 9(b)]. The power combining network is assumed to be lossless and reciprocal. Nevertheless, an intermediate network that includes the power combining network and the DDPA load, $R_L$, is hypothetically considered to reduce the five-port to a four-port configuration. The intermediate network is still reciprocal but no longer lossless because of the 50-$\Omega$ lossy load component. The z-parameters can be expressed in terms of voltages and currents by solving (1), where the subscripts m and pi (i = 1-3) represent the main and peak-i (i = 1-3) DPAs, respectively. The second subscripts F and Bi (i = 1-3) represent the full-power and back-off points; the output power at the back-off points is related to the full power by factors of $k_{B1}^2$, $k_{B2}^2$, and $k_{B3}^2$, respectively. The next step is to determine all of the voltage and current variables at the output of each DPA to determine the z-parameters of the intermediate network uniquely. It is worth noting that different sets of boundary conditions result in various power combining networks. In this work, however, the output current of all DPAs at their maximum, i.e., $I_{\text{Max}}$, is considered to be equal, which occurs at full power (F). Due to this boundary condition, the proposed DTX comprises four identical DPA banks to implement the main and peaking power devices. Furthermore, the peaking DPAs are off before their corresponding back-off points and are active beyond them. Thus, the following boundary conditions are considered.

$$\begin{aligned} i_{m,F} &= i_{p1,F} = i_{p2,F} = i_{p3,F} = I_{\text{Max}} \\ i_{p1,B3} &= i_{p2,B3} = i_{p2,B2} = i_{p3,B1} = i_{p3,B2} = i_{p3,B3} = 0. \end{aligned} \tag{2}$$

On the other hand, to maximize the efficiency at full power and the three other back-off points Bi (i = 1-3), the RF voltage amplitude at the output of the relevant devices (main, peak1, peak2, and peak3) has to be maximized at these points, as depicted in Fig. 9(c)

$$|v_{m,F}| = |v_{m,B1}| = |v_{m,B2}| = |v_{m,B3}| = |v_{p1,F}| = |v_{p1,B1}|$$
$$= |v_{p1,B2}| = |v_{p2,F}| = |v_{p2,B1}| = |v_{p3,F}| = V_{Max}.$$
 (3)
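Once the port voltages and currents at the four operating points are known, relation (1) reduces to a linear solve. The sketch below only illustrates that algebra: `Z_true` is a made-up reciprocal matrix, and the nonzero current entries and their phases are placeholders chosen to respect the zero pattern of (2), not values from the design. In the actual procedure some voltages are unknowns that are solved jointly with the remaining constraints.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical reciprocal (symmetric) four-port impedance matrix, in ohms.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Z_true = A + A.T

# Rows: operating points F, B1, B2, B3. Columns: main, peak1, peak2, peak3.
# Zeros follow boundary conditions (2); nonzero entries are placeholders.
I_mat = np.array([
    [1.00, 1.0j, -1.0, -1.0j],  # F: all DPAs at I_Max (phases assumed)
    [0.75, 0.6j, -0.5, 0.0],    # B1: peak3 off
    [0.50, 0.3j, 0.0, 0.0],     # B2: peak2 and peak3 off
    [0.25, 0.0, 0.0, 0.0],      # B3: only the main DPA is on
], dtype=complex)

# Port voltages at each operating point: v = Z i, stacked row-wise as V = I Z^T.
V_mat = I_mat @ Z_true.T

# Relation (1): Z = I^{-1} V, valid because Z^T = Z for a reciprocal network.
Z_rec = np.linalg.solve(I_mat, V_mat)

print(np.allclose(Z_rec, Z_true))
```

The current matrix is anti-triangular under (2), so it is invertible whenever each DPA carries nonzero current at the point where it first turns on.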

The output power of the four-way DPA, assuming a lossless power-combining network, can be expressed as a function of its voltage and current variables

$$P_{\text{out}} = \frac{1}{2} \text{Re} \left( v_m i_m^* + v_{p1} i_{p1}^* + v_{p2} i_{p2}^* + v_{p3} i_{p3}^* \right).$$
(4)

For the requirement of peak efficiency at the back-off points, the output power of the four-way DPA at the back-off point has a fixed ratio to its full power

$$k_{B1}^{2} = \frac{P_{\text{out},B1}}{P_{\text{out},F}} = \frac{\text{Re}(v_{m,B1}i_{m,B1}^{*}) + \text{Re}\left(\sum_{i=1}^{2} v_{pi,B1}i_{pi,B1}^{*}\right)}{\text{Re}(v_{m,F}i_{m,F}^{*}) + \text{Re}\left(\sum_{i=1}^{3} v_{pi,F}i_{pi,F}^{*}\right)}$$
$$= \frac{\text{Re}\left(v_{m,B1}i_{m,B1}^{*} + v_{p1,B1}i_{p1,B1}^{*} + v_{p2,B1}i_{p2,B1}^{*}\right)}{\text{Re}\left(v_{m,F}i_{m,F}^{*} + v_{p1,F}i_{p1,F}^{*} + v_{p2,F}i_{p2,F}^{*} + v_{p3,F}i_{p3,F}^{*}\right)}$$
(5)

$$k_{B2}^{2} = \frac{P_{\text{out},B2}}{P_{\text{out},F}} = \frac{\text{Re}(v_{m,B2}i_{m,B2}^{*} + v_{p1,B2}i_{p1,B2}^{*})}{P_{\text{out},F}}$$
(6)

$$k_{B3}^{2} = \frac{\text{Re}(v_{m,B3}i_{m,B3}^{*})}{P_{\text{out},F}}.$$
(7)
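Because (5)-(7) define each back-off point through the power ratio $k_{Bi}^2$, the corresponding back-off in decibels is $-20\log_{10}k_{Bi}$. The short sketch below shows the conversion in both directions; the function names and the sample values are illustrative assumptions, since the back-off levels chosen for the design are not stated in this section.

```python
import math

def backoff_db(k):
    """Power back-off in dB for a normalized amplitude k, from P_B / P_F = k**2."""
    return -20.0 * math.log10(k)

def k_from_backoff(db):
    """Normalized amplitude k that corresponds to a power back-off of `db` dB."""
    return 10.0 ** (-db / 20.0)

# Illustrative values only (not taken from the paper).
for k in (0.75, 0.50, 0.25):
    print(f"k = {k:.2f} -> {backoff_db(k):.2f} dB back-off")
print(f"12 dB back-off -> k = {k_from_backoff(12.0):.4f}")
```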

In addition, each DPA current is linearly related to its input code when the DPA is active. Consequently, the main DPA currents at back-off points Bi (i = 1-3) are

$$i_m = \frac{I_{\text{Max}}}{D_{\text{Max}}}(D_{\text{in}}) \Longrightarrow$$

$$i_{m,B3} = \frac{I_{\text{Max}}}{D_{\text{Max}}}(k_{B3}D_{\text{Max}}) = k_{B3}I_{\text{Max}}$$

$$i_{m,B2} = k_{B2}I_{\text{Max}}, \text{ and } i_{m,B1} = k_{B1}I_{\text{Max}}$$
(8)

where  $D_{\text{Max}}$  is the maximum input code of each DPA. Moreover, the peak1 DPA currents at back-off points B2 and B1 are

$$i_{p1} = \frac{I_{\text{Max}}}{D_{\text{Max}}(1 - k_{B3})} (D_{\text{in}} - k_{B3}D_{\text{Max}}) \Longrightarrow$$

$$i_{p1,B2} = \frac{I_{\text{Max}}}{1 - k_{B3}} (k_{B2} - k_{B3})$$

$$i_{p1,B1} = \frac{I_{\text{Max}}}{1 - k_{B3}} (k_{B1} - k_{B3}). \tag{9}$$

In addition, the peak2 DPA current at back-off point B1 is

$$i_{p2} = \frac{I_{\text{Max}}}{D_{\text{Max}}(1 - k_{B2})} (D_{\text{in}} - k_{B2}D_{\text{Max}}) \Longrightarrow$$
$$i_{p2,B1} = \frac{I_{\text{Max}}}{1 - k_{B2}} (k_{B1} - k_{B2}). \tag{10}$$

Moreover, the current profile of peak3 DPA is represented as

$$i_{p3} = \frac{I_{\text{Max}}}{D_{\text{Max}}(1 - k_{B1})} (D_{\text{in}} - k_{B1}D_{\text{Max}}).$$
(11)
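The current profiles in (8)-(11) are piecewise linear in the input code, and a short numerical sketch makes the boundary conditions in (2) easy to check. The function name, the back-off points (0.75, 0.5, 0.25), and the code range below are hypothetical choices for illustration, not the values used in the design.

```python
import numpy as np

def doherty_currents(d_in, d_max, k_b1, k_b2, k_b3, i_max=1.0):
    """Current amplitudes (i_m, i_p1, i_p2, i_p3) versus input code, per (8)-(11)."""
    x = np.asarray(d_in, dtype=float) / d_max  # normalized input code

    def peak(k_on):
        # Off below its back-off point, then linear up to i_max at full code.
        return i_max * np.clip((x - k_on) / (1.0 - k_on), 0.0, None)

    return i_max * x, peak(k_b3), peak(k_b2), peak(k_b1)

# Hypothetical back-off points and code range (not taken from the paper).
K_B1, K_B2, K_B3 = 0.75, 0.50, 0.25
D_MAX = 1024

codes = np.array([K_B3, K_B2, K_B1, 1.0]) * D_MAX  # points B3, B2, B1, F
i_m, i_p1, i_p2, i_p3 = doherty_currents(codes, D_MAX, K_B1, K_B2, K_B3)

for name, row in zip(("B3", "B2", "B1", "F"), zip(i_m, i_p1, i_p2, i_p3)):
    print(name, np.round(row, 4))
```

With these sample values, each peaking DPA carries zero current at and below its own turn-on point, and all four DPAs reach `i_max` at full code, as (2) requires.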

To complete our set of equations, the reciprocity of the intermediate network imposes the following relationships: $Z_{12} = Z_{21}$, $Z_{13} = Z_{31}$, $Z_{14} = Z_{41}$, $Z_{23} = Z_{32}$, $Z_{24} = Z_{42}$, and $Z_{34} = Z_{43}$. Afterward, the remaining unknown variables can be uniquely solved using the above independent equations for specific back-off levels $k_{B1}$, $k_{B2}$, and $k_{B3}$, and phase relations between the main and peak DPAs. The z-parameters of the intermediate network are then defined. Next, the s-parameters of the intermediate network can be obtained from the already known four-port z-parameter matrix, and the five-port s-parameters of the power combining network can be derived from these four-port s-parameters. Note that only the fifth port (which is connected to the load) needs to be reintroduced, and the remaining s-parameters are identical to those of the intermediate network. Since the power combining network is considered to be lossless, the s-parameters of the (N + 1)-port have the following properties:

$$\sum_{n=1}^{N+1} |s_{np}|^2 = 1 \quad \forall p \text{ and } \sum_{n=1}^{N+1} s_{np} s_{nq}^* = 0 \quad \forall p \neq q \quad (12)$$

where N represents the number of ports of the power combining network, excluding the load port. The unknown variables $s_{5i}$ (i = 1-5) follow from (12). The magnitude of $s_{51}$ can be obtained as

$$|s_{51}| = \sqrt{1 - |s_{11}|^2 - |s_{21}|^2 - |s_{31}|^2 - |s_{41}|^2}.$$
 (13)

Thus, we can define  $\angle s_{51} = \alpha$ , and then,  $s_{5i}$  for i = 2-5 are defined as functions of  $\alpha$ 

$$s_{11}s_{12}^{*} + s_{21}s_{22}^{*} + s_{31}s_{32}^{*} + s_{41}s_{42}^{*} + s_{51}s_{52}^{*} = 0 \Longrightarrow$$

$$s_{52} = \frac{-1}{s_{51}^{*}} \left( s_{11}^{*}s_{12} + s_{21}^{*}s_{22} + s_{31}^{*}s_{32} + s_{41}^{*}s_{42} \right)$$

$$s_{53} = \frac{-1}{s_{51}^{*}} \left( s_{11}^{*}s_{13} + s_{21}^{*}s_{23} + s_{31}^{*}s_{33} + s_{41}^{*}s_{43} \right)$$

$$s_{54} = \frac{-1}{s_{51}^{*}} \left( s_{11}^{*}s_{14} + s_{21}^{*}s_{24} + s_{31}^{*}s_{34} + s_{41}^{*}s_{44} \right)$$

$$s_{55} = \frac{-1}{s_{51}^{*}} \left( s_{11}^{*}s_{15} + s_{21}^{*}s_{25} + s_{31}^{*}s_{35} + s_{41}^{*}s_{45} \right). \tag{14}$$
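Relations (13) and (14) can be checked numerically on any lossless, reciprocal five-port. The sketch below builds a hypothetical one as $S = UU^{T}$ with a random unitary $U$ (an assumption made purely for testing, unrelated to the actual combiner), keeps only the block for ports 1-4, and rebuilds the fifth row. In the design flow $\alpha$ is a free parameter; here it is read back from the reference matrix so that the reconstruction can be compared entry by entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lossless, reciprocal five-port: S = U U^T is unitary and symmetric.
M = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
U, _ = np.linalg.qr(M)
S_true = U @ U.T

S4 = S_true[:4, :4]             # known block for ports 1-4
alpha = np.angle(S_true[4, 0])  # free in the text; taken from S_true for the check

# (13): |s51| from the unit norm of column 1, with the phase set to alpha.
s51 = np.sqrt(1.0 - np.sum(np.abs(S4[:, 0]) ** 2)) * np.exp(1j * alpha)

row5 = np.empty(5, dtype=complex)
row5[0] = s51

# (14): s52..s54 from the orthogonality of column 1 with columns 2..4.
# np.vdot conjugates its first argument: vdot(a, b) = sum(conj(a) * b).
for q in range(1, 4):
    row5[q] = -np.vdot(S4[:, 0], S4[:, q]) / np.conj(s51)

# (14), last line: s55, using reciprocity s_n5 = s_5n for n = 1..4.
row5[4] = -np.vdot(S4[:, 0], row5[:4]) / np.conj(s51)

print(np.allclose(row5, S_true[4, :]))
```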

Consequently, the *s*-parameters of the power combiner are known and only depend on the variable $\alpha$. A suitable choice of $\alpha$ may yield the optimum power combiner topology. Subsequently, the passive network topology of the power combiner can be extracted from the z-/y-parameters of the five-port network ($Z_C/Y_C$). The general four-way power combiner topology is shown in Fig. 10(a). It is worth mentioning that different passive topologies, which differ in terms of simplicity, operational bandwidth, and passive efficiency, can be achieved by solving the equations for various phase relations. In some cases, the values of $Z_C/Y_C$ may become complex rather than purely imaginary, which indicates that the network is not realizable by the $\lambda/4$ or $3\lambda/4$ transmission lines shown in Fig. 10(a). Since the four-way Doherty network is considered as a black box, any network whose z-/y-parameters are equal to the calculated z-/y-parameters can be considered a feasible solution for the desired Doherty network. Fig. 10(c) and (d) depict two possible arrangements for the four-way Doherty combiner based on $\lambda/4$-line impedance inverters. In this work, the structure in Fig. 10(d) is chosen [46] since, in contrast to the conventional Doherty parallel combiner, the selected configuration has the advantage of a relatively short RF path from the main DPA branch to the load. This arrangement directly improves the DPBO average efficiency [see Fig. 10(e)]. The z-parameter matrix of this network is given as follows:

$$Z_{T} = \begin{bmatrix} 0 & j \frac{Z_{01} Z_{02}}{Z_{03}} & 0 & 0 & -j Z_{01} \\ j \frac{Z_{01} Z_{02}}{Z_{03}} & 0 & j \frac{Z_{02} Z_{04}}{Z_{05}} & 0 & 0 \\ 0 & j \frac{Z_{02} Z_{04}}{Z_{05}} & 0 & -j Z_{04} & 0 \\ 0 & 0 & -j Z_{04} & 0 & 0 \\ -j Z_{01} & 0 & 0 & 0 & 0 \end{bmatrix}$$
(15)

where $Z_{0i}$ (i = 1-5) are the characteristic impedances of the $\lambda/4$-line impedance inverters. Assuming $\alpha = 90^{\circ}$, $Z_C$ is

$$\begin{bmatrix} z_{11} & z_{12} & z_{13} & z_{14} \\ z_{21} & z_{22} & z_{23} & z_{24} \\ z_{31} & z_{32} & z_{33} & z_{34} \\ z_{41} & z_{42} & z_{43} & z_{44} \end{bmatrix} = \begin{bmatrix} i_{m,F} & i_{p1,F} & i_{p2,F} & i_{p3,F} \\ i_{m,B1} & i_{p1,B1} & i_{p2,B1} & i_{p3,B1} \\ i_{m,B2} & i_{p1,B2} & i_{p2,B2} & i_{p3,B2} \\ i_{m,B3} & i_{p1,B3} & i_{p2,B3} & i_{p3,B3} \end{bmatrix}^{-1} \begin{bmatrix} v_{m,F} & v_{p1,F} & v_{p2,F} & v_{p3,F} \\ v_{m,B1} & v_{p1,B1} & v_{p2,B1} & v_{p3,B1} \\ v_{m,B2} & v_{p1,B2} & v_{p2,B2} & v_{p3,B2} \\ v_{m,B3} & v_{p1,B3} & v_{p2,B3} & v_{p3,B3} \end{bmatrix} \tag{1}$$



Fig. 10. (a) General topology of the Doherty power combiner. (b) Selected topology. (c) Conventional four-way Doherty parallel power combiner. (d) Proposed (on-chip) Doherty parallel combiner. (e) Simulated passive efficiency. (f) Lumped component model of the T-line.

calculated as

$$Z_C = \begin{bmatrix} 0 & j17.319 & 0 & 0 & -j25\\ j17.319 & 0 & j6.946 & 0 & 0\\ 0 & j6.946 & 0 & -j22.37 & 0\\ 0 & 0 & -j22.37 & 0 & 0\\ -j25 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
(16)

Equating (15) and (16), the characteristic impedance values of the associated transmission lines can be obtained. Note that $Z_{02}$, $Z_{03}$, $Z_{04}$, and $Z_{05}$ are interdependent. In our design, $Z_{01} = 25\ \Omega$, $Z_{02} = 22.86\ \Omega$, $Z_{03} = 33\ \Omega$, $Z_{04} = 22.37\ \Omega$, and $Z_{05} = 73.63\ \Omega$. It should be emphasized that our proposed Doherty architecture contrasts with the approach in [31] since, in our work, the phase delay is digitally implemented using the quadrature clocks. Implementing such transmission lines for the DDPA occupies silicon area and exhibits excessive losses, resulting in a significant deviation of its passive efficiency from the theoretical performance. To make the power combiner more compact, the transmission lines can be replaced with either lumped low-pass or high-pass $\pi$-networks, as shown in Fig. 10(f). In this work, the $\lambda/4$ transmission lines are approximated by lumped high-pass equivalent LC $\pi$-networks with a $-90^{\circ}$ phase delay. These high-pass $\pi$-networks are equivalent to $3\lambda/4$ transmission lines that require different LO phase relations



Fig. 11. (a) and (b) Implementation of the four-way Doherty output impedance matching network with consolidated on-chip lumped elements for transmission lines. (c)–(f) Theoretical and simulation (colored dotted lines) results of the proposed four-way Doherty output matching network at 5.4 GHz [EM simulation results of DE in other frequencies are plotted in gray dotted lines in (f)].

in the four identical DPA branches connected to the power combiner. Using the proper LO clocks' phases, the I/Q DPA banks' currents drive the four-way DPA combiner, yielding the desired active-load modulation and in-phase power summation.
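As a rough numerical illustration of the lumped substitution described above, the following sketch computes the elements of a lossless high-pass $\pi$-section (shunt L, series C, shunt L) for each characteristic impedance and evaluates its ABCD matrix. The element formulas $L = Z_0/\omega_0$ and $C = 1/(\omega_0 Z_0)$ are the standard textbook equivalents, the 5.4 GHz design frequency is taken from Fig. 11, and the function names are illustrative. The printed values are not claimed to be the consolidated component values used on the chip.

```python
import math

def highpass_pi_elements(z0_ohm, f0_hz):
    """Shunt-L and series-C values of a high-pass pi-section designed for Z0 at f0."""
    w0 = 2 * math.pi * f0_hz
    return z0_ohm / w0, 1.0 / (w0 * z0_ohm)  # (L in henry, C in farad)

def pi_abcd(l_h, c_f, f_hz):
    """ABCD matrix of the lossless shunt-L / series-C / shunt-L section at f."""
    w = 2 * math.pi * f_hz
    y_shunt = 1 / (1j * w * l_h)   # admittance of each shunt inductor
    z_series = 1 / (1j * w * c_f)  # impedance of the series capacitor
    a = 1 + z_series * y_shunt
    b = z_series
    c = 2 * y_shunt + y_shunt * y_shunt * z_series
    d = 1 + z_series * y_shunt
    return a, b, c, d

F0_HZ = 5.4e9  # assumed design frequency (Fig. 11)
IMPEDANCES = [("Z01", 25.0), ("Z02", 22.86), ("Z03", 33.0),
              ("Z04", 22.37), ("Z05", 73.63)]

for name, z0 in IMPEDANCES:
    l_h, c_f = highpass_pi_elements(z0, F0_HZ)
    a, b, c, d = pi_abcd(l_h, c_f, F0_HZ)
    print(f"{name}: L = {l_h * 1e12:.0f} pH, C = {c_f * 1e12:.2f} pF, "
          f"|A| = {abs(a):.1e}, B = {b.imag:+.2f}j ohm")
```

At the design frequency, A and D vanish and $B = -jZ_0$, i.e., the section behaves as an impedance inverter whose transfer phase is opposite to that of a $\lambda/4$ line ($B = +jZ_0$), which is consistent with the $3\lambda/4$ equivalence stated in the text.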

Fig. 11(a) also shows that the DPAs require RF chokes for the dc feed and second-harmonic termination. Employing the high-pass LC networks allows the shunt inductors directly at the outputs of the DPAs to be consolidated into four inductors, $L_{eq1}$–$L_{eq4}$, which resonate out the associated DPA output capacitance and also provide dc biasing, plus only one extra inductor, $L_{eqM}$. Moreover, the high-pass LC networks allow the output balun to be incorporated into the DPA power combining network, yielding an even more compact power combiner. Furthermore, the capacitor in the high-pass network acts as an ac-coupling capacitor that provides dc voltage isolation from the other DPAs, enabling different supply voltages for the individual DPAs [indicated as $V_{\text{DDMain}}$, $V_{\text{DDPeak1}}$, $V_{\text{DDPeak2}}$, and $V_{\text{DDPeak3}}$ in Fig. 11(b)]. This technique can potentially improve the efficiency even at deeper power back-off regions. Fig. 11(d)–(f) demonstrates the theoretical and simulated results of the proposed four-way Doherty output matching network. As shown in Fig. 11(f), with lossless passive components and real active power devices, the DTX achieves an average drain efficiency (DE) of 76% throughout the 12 dB PBO, while this value is 59.73% considering lossy inductors with a quality factor of Q = 20. The DTX delivers



Fig. 12. (a) Detailed block diagram of the proposed four-way Doherty I/Q DTX. (b) Chip microphotograph.

an average DE of 49.73% incorporating EM simulation of the complete structure. The physical layout floorplan of the proposed four-way Doherty power combiner will be revealed and further discussed in Section IV.

#### **IV. IMPLEMENTATION DETAILS**

An overview of the architecture is illustrated in Fig. 12(a). It comprises digital and RF parts. The digital part includes the digital baseband signal processing block, the LO and sampling clock generator block, the phase (sign-bit) selector block, and the digital I/Q interleave banks. In addition, the RF part consists of digital power cells and the four-way power combiner. In the remainder of this section, its building blocks will be sequentially disclosed, and their circuit design techniques will be described.

#### A. Clock Generation and Distribution

An off-chip single-ended clock operating at $2 \times f_C$ is applied to a matched on-chip transformer, which converts the unbalanced clock to its balanced counterpart. Fig. 13(a) exhibits the layout of the matched input transformer. An iterative design procedure is followed to achieve a matched wideband transformer with negligible amplitude and phase mismatch. The transformer outer dimensions are 185 $\mu$m $\times$ 185 $\mu$m. The EM simulation results using Momentum are plotted versus frequency in Fig. 13(b)–(e), including the magnetic coupling factor $K_m$, primary and secondary inductances $L_P$ and $L_S$, winding resistances $R_P$ and $R_S$, and quality factors $Q_P$ and $Q_S$. At 12 GHz, these parameters are $K_m = 0.389$, $L_P = 412$ pH, $L_S = 832$ pH, $R_P = 1.9\ \Omega$, $R_S = 6.9\ \Omega$, $Q_P = 16.8$, and $Q_S = 9.1$, respectively. The next stage has a large impedance. Thus, a parallel resistor of 250 $\Omega$ and a capacitor of 220 fF are added differentially to transform these differential loads into an optimum single-ended primary load, facilitating the matching condition. The EM simulation



Fig. 13. (a) Layout of the input unbalanced-to-balanced transformer. (b)–(e) EM simulation results of the input balun. (f) Simulated versus measured  $S_{11}$ .

and measured S-parameters in Fig. 13(f) indicate a good matching condition at the transformer input. Utilizing this matched transformer at the transmitter's input, the required power for the off-chip single-ended LO at 10.8 GHz is 5 dBm. Therefore, considering 27.54 dBm output power at this frequency, the LO-in RF-out power conversion gain is roughly 22 dB. Although the transformer's differential layout traces are completely symmetrical, a phase aligner comprising a back-to-back inverter pair is employed at the transformer output to prevent any misalignment. The phase-aligned differential $2 \times f_C$ clocks, i.e., $2 \times f_{C,0}$ and $2 \times f_{C,180}$, are applied to a divide-by-2 circuit to generate the desired 50%-LO clocks at $f_C$ with relative phase differences in multiples of 90°. These complementary quadrature LO clocks are $f_{\rm LO,0\_50\%}$, $f_{\rm LO,90\_50\%}$, $f_{\rm LO,180\_50\%}$, and $f_{\rm LO,270\_50\%}$ in Fig. 12(a). The divide-by-2 circuit is implemented as a flip-flop-based frequency divider that consists of four C<sup>2</sup>MOS latches arranged in a loop [24]. All other divide-by-2 circuits also utilize the same structure. The transistor sizing, however, is adjusted based on the operational frequency.
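The way a divide-by-2 loop turns a $2 \times f_C$ clock into four 50% duty-cycle phases at $f_C$ can be illustrated with a purely behavioral model. The sketch below is a simplification and not the C<sup>2</sup>MOS circuit of [24]: it uses two ideal level-sensitive latches in a ring and derives the 180° and 270° phases as logical complements. All names are illustrative.

```python
def divide_by_two(n_half_periods):
    """Behavioral latch-ring divider.

    Each list entry corresponds to one half period of the 2*f_C input clock,
    i.e., one quarter period of the f_C output clocks.
    """
    a = b = 0
    lo_0, lo_90 = [], []
    for k in range(n_half_periods):
        if k % 2 == 0:
            a = 1 - b   # input clock high: latch A is transparent, samples NOT(B)
        else:
            b = a       # input clock low: latch B is transparent, samples A
        lo_0.append(a)
        lo_90.append(b)
    lo_180 = [1 - v for v in lo_0]    # complement of the 0-degree phase
    lo_270 = [1 - v for v in lo_90]   # complement of the 90-degree phase
    return lo_0, lo_90, lo_180, lo_270

lo_0, lo_90, lo_180, lo_270 = divide_by_two(8)
for name, wave in [("0", lo_0), ("90", lo_90), ("180", lo_180), ("270", lo_270)]:
    print(f"LO_{name:>3}: {wave}")
```

Each output repeats every four entries (two input-clock periods), is high for half of them, and is shifted by one entry (a quarter of the output period) relative to the previous phase.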

On the other hand, the master/baseband sampling clocks ($F_S/F_{BB}$) can be generated using two different approaches. In the first method, these clocks are created from the existing carrier clock by applying one of its complementary clock pairs (e.g., $f_{\rm LO,0\_50\%}/f_{\rm LO,180\_50\%}$ or $f_{\rm LO,90\_50\%}/f_{\rm LO,270\_50\%}$) to another divide-by-2 circuit. With this arrangement, the $F_S$ clock operates at $f_C/2$, making the baseband modulation bandwidth directly dependent on the carrier operating frequency. In the second method, independent master/baseband clocks are generated using another off-chip single-ended clock running at $2 \times F_S$. Using an active unbalanced-to-balanced converter and a subsequent divide-by-2 circuit, the $F_S$ clock is generated. This master clock is then applied to a divide-by-4 circuit to generate the $F_{BB}$ clock. The following block is a multiplexer that selects the appropriate baseband clock. It is worth mentioning that, to mitigate the crosstalk mostly caused by capacitive coupling, ground lines are placed in between the quadrature LO lines. In addition, to suppress the LO leakage, shielding is utilized to diminish the coupling from other routing lines, e.g., data routing, where multiple crossover lines occur.
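The clock relations of the first (carrier-derived) method reduce to two divisions, which the short sketch below evaluates. The function name is illustrative; the division ratios are those stated in the text ($F_S = f_C/2$ and a divide-by-4 from $F_S$ to $F_{BB}$).

```python
def clock_plan(f_c_hz):
    """Carrier-derived clocking: return (F_S, F_BB) in hertz for a carrier at f_c_hz."""
    f_s = f_c_hz / 2   # one complementary carrier pair through a divide-by-2
    f_bb = f_s / 4     # master clock through the divide-by-4
    return f_s, f_bb

for f_c in (5.2e9, 5.3e9, 5.4e9, 5.5e9):
    f_s, f_bb = clock_plan(f_c)
    print(f"f_C = {f_c / 1e9:.1f} GHz -> F_S = {f_s / 1e9:.3f} GHz, "
          f"F_BB = {f_bb / 1e6:.1f} MHz")
```

For $f_C = 5.4$ GHz this gives $F_S = 2.7$ GHz and $F_{BB} = 675$ MHz, which coincides with the 675 MHz baseband sampling frequency quoted for the modulated-signal measurements.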

#### B. Delay Alignment and the Phase (Sign-Bit) Selector

1) Delay Alignment: The required Doherty phase relations of the DPA branches are digitally implemented by appropriately swapping carrier quadrature clocks. To compensate for the effect of design variations, such as process/voltage/temperature (PVT), frequency, and load variations, on the Doherty phase relations, fine-tuning phase aligners are adopted [17] and implemented, as shown in Fig. 14. Their control signals are static and come from a serial-to-parallel interface (SPI). A binary-to-thermometer encoder converts the 4-bit input binary code (CNTL(1:4)) to a 15-bit thermometer code, where each bit is used as the delay control bit of one delay cell. The delay line can be bypassed or employed using the Enable bit. The absolute delay of each delay cell is controlled with a single bit by enabling or disabling NMOS and PMOS transistors in series with the supply/ground paths. The RF clock passes through 15 cascaded delay cells to arrive at the output, resulting in a total relative delay of 85 ps with a resolution of roughly 5.5 ps, which is more than enough to compensate for the variations mentioned above.
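The control path of the delay line can be sketched as a binary-to-thermometer conversion followed by a sum of per-cell delay steps. The sketch below assumes, purely for illustration, that all 15 cells contribute an identical step equal to the 85 ps range divided by 15; the real cells will not be perfectly uniform, and the function names are hypothetical.

```python
TOTAL_RANGE_PS = 85.0  # total relative delay quoted in the text

def binary_to_thermometer(code, n_bits=4):
    """Convert an n-bit binary code to a (2**n - 1)-bit thermometer code."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    n_cells = 2 ** n_bits - 1
    return [1 if i < code else 0 for i in range(n_cells)]

def relative_delay_ps(code, n_bits=4):
    """Relative delay for a control code, assuming identical delay cells."""
    thermo = binary_to_thermometer(code, n_bits)
    step_ps = TOTAL_RANGE_PS / len(thermo)  # uniform-step assumption
    return sum(thermo) * step_ps

for code in (0, 1, 8, 15):
    bits = "".join(str(b) for b in binary_to_thermometer(code))
    print(f"CNTL = {code:2d} -> {bits} -> {relative_delay_ps(code):5.1f} ps")
```

Under the uniform-step assumption, each code increment adds about 5.7 ps, in line with the "roughly 5.5 ps" resolution stated above.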

2) Phase Selector: The following stage is the carrier clock phase selector. As demonstrated in Fig. 15, it is implemented as four NAND-gate-based multiplexers whose selection control signals are Sign<sub>I</sub> =  $I_{BB,up}[11]$ and Sign<sub>Q</sub> =  $Q_{BB,up}[11]$ [see Fig. 12(a)]. As depicted in Fig. 12(b), four control signals, $C_1$, $C_2$, $C_3$, and $C_4$, are first generated by ANDing the corresponding I/Q sign-bits and subsequently applied to the phase mapper. Based on the four different states of the I/Q sign-bits, the 50% phase-modulated quadrature clocks fed to the DPA can be swapped appropriately, and thus, the entire four-quadrant I/Q plane can be covered. To equalize the delays of the clocks, the NAND gates are implemented in a fully symmetric configuration [see Fig. 15(c)]. Moreover, due to employing 50%-LO clocks, a back-to-back inverter pair is



Fig. 14. (a) Schematic of the 4-bit fine-resolution delay line and (b) its delay-cells.



Fig. 15. (a) Schematic of the global 50%-LO quadrature clock mapper. (b) NAND gate-based multiplexer implementation circuitry. (c) Four-input symmetrical NAND logic gate.

employed to further align the phases of the complementary clock pairs.
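The sign-bit-driven clock swapping can be illustrated with a small truth-table model. The mapping below is an assumption made for illustration only: a sign bit of 1 is taken to mean a negative value, a negative I (Q) value is taken to select the clock shifted by 180° relative to the positive case, and the ordering of $C_1$–$C_4$ is arbitrary. The actual assignment in Fig. 15 may differ.

```python
def quadrant_controls(sign_i, sign_q):
    """One-hot (C1, C2, C3, C4) obtained by ANDing the sign bits and their complements."""
    not_i, not_q = 1 - sign_i, 1 - sign_q
    return (not_i & not_q, sign_i & not_q, sign_i & sign_q, not_i & sign_q)

# Hypothetical (I-path phase, Q-path phase) in degrees selected by C1..C4.
PHASE_TABLE = ((0, 90), (180, 90), (180, 270), (0, 270))

def select_phases(sign_i, sign_q):
    """AND-OR multiplexer: exactly one control is high and picks one table entry."""
    controls = quadrant_controls(sign_i, sign_q)
    i_phase = sum(c * phases[0] for c, phases in zip(controls, PHASE_TABLE))
    q_phase = sum(c * phases[1] for c, phases in zip(controls, PHASE_TABLE))
    return i_phase, q_phase

for sign_i in (0, 1):
    for sign_q in (0, 1):
        print(f"Sign_I = {sign_i}, Sign_Q = {sign_q} -> "
              f"C = {quadrant_controls(sign_i, sign_q)}, "
              f"phases = {select_phases(sign_i, sign_q)}")
```

Because exactly one control signal is high for every sign-bit combination, each of the four quadrants of the I/Q plane maps to a distinct pair of 50% clock phases.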

#### C. 11-Bit I/Q DPA Floor Plan

The transmitter comprises four identical 12-bit resolution I/Q banks (including the sign-bit) that act as four I/Q RF-DACs. Fig. 17(a) depicts the implementation details and the floorplan of one of the I/Q banks. For each bank, the digital I/Q baseband data ($I_{BB}/Q_{BB}[11:0]$) are stored in a 4-K SRAM, which is programmed through the low-speed SPI interface, and are clocked at $F_{BB}$. The 12-bit digital I/Q baseband signals are passed through a "signed" zero-/first-/second-order hold (ZOH/FOH/SOH) FIR filter that up-samples them by a factor of 4 ($F_S = 4 \times F_{BB}$) and low-pass filters the up-sampled I/Q baseband data, suppressing the sampling spectral replicas (SSRs) [22]. Fig. 16 unveils the detailed multiplexer implementation and the corresponding waveforms. The DFF at the input of the MUX is clocked at $F_{BB}$ and re-times the output data of the digital FIR filter ($I/Q_{BB1}[i]$, $I/Q_{BB2}[i]$, $I/Q_{BB3}[i]$, and $I/Q_{BB4}[i]$, where $i = 0$–$11$). As illustrated in Fig. 16(b), the pulsewidth of the selection signals driving the transmission gates ($S_0$–$S_3$) is $1/F_S$; these signals are realized by bitwise ANDing of the proper $F_{BB}$ clock pairs [see Fig. 16(c)]. Accordingly, the multiplexer performs the $4\times$ up-sampling and summing operation by generating three zeros and merely one sample of the input signal during one period of the baseband signal. The DFF at the output is clocked at $F_S$ and re-times the up-sampled and interpolated I/Q data, resulting in a ZOH function at $F_S$ and FOH/SOH suppression at multiples of $F_{BB}$. This FIR filter architecture contrasts with the approach in [22] since, in our work, it is implemented for signed baseband data. $I_{BB,UP}[10:0]$ and $Q_{BB,UP}[10:0]$ represent the interpolated unsigned binary digital codes that must be converted to corresponding thermometer codes to avoid nonmonotonic behavior and mid-code transition glitches. However, a pure thermometer-coded approach increases the complexity of the encoders, the chip area, the interconnect parasitics, and the power consumption. Thus, in this design, a segmented approach



Fig. 16. Interpolation filter and its high-speed MUX. (a) Detailed implementation. (b) Corresponding waveforms. (c) Generating MUX selection signals from different phases of the  $F_{BB}$  clock [22].

with 4-bit  $(I/Q_{BB,UP}[3:0])$ binary-weighted LSB and 7-bit  $(I/Q_{BB,UP}[10:4])$ thermometer-coded MSB cells is adopted. Therefore, the I/Q bank implementation requires 128 MSB units, each with an aspect ratio of $W/L = 19.2 \ \mu \text{m}/40 \text{ nm}$ and realized as eight parallel cascoded transistors with an aspect ratio of $W/L = 2.4 \ \mu \text{m}/40 \text{ nm}$, and 16 LSB units with an aspect ratio of $W/L = 1.2 \ \mu \text{m}/40 \text{ nm}$. Moreover, the 7-bit MSB $(I/Q_{BB,UP}[10:4])$ is split into 3 bits $(I/Q_{BB,UP}[10:8])$ for a column encoder and 4 bits $(I/Q_{BB,UP}[7:4])$ for a row encoder. Hence, the 128 MSB units of each part are arranged in 16 rows $(I/Q_{BB,UPR}[15:0])$ and eight columns $(I/Q_{BB,UPC}[7:0])$. Furthermore, the LSB units comprise 16 small unit cells $(I/Q_{BB,UPB}[15:0])$ that occupy one column. In the I/Q RF-DAC floorplan, a "snake" traversal is followed among the sub-cells to preserve continuity and improve the differential nonlinearity (DNL). Fig. 17(d) presents the simulated transient voltage and current waveforms of the final stage. As can be seen, the waveforms are no longer ideal square waves or half-sinusoids due to the device parasitic capacitance. However, the overlap between high voltage and high current is only a small fraction of the whole period.
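The digital data path of one bank described in this subsection (signed $4\times$ up-sampling with a selectable hold order, followed by splitting the 11-bit magnitude into thermometer and binary segments) can be sketched behaviorally as follows. The hold kernels are modeled here as cascaded length-4 boxcars, which is a common textbook formulation and an assumption about the filter rather than the multiplexer-based circuit of Fig. 16. The "active width" is a simple sum of the unit-cell widths quoted above, and all function names are illustrative.

```python
import cmath

def convolve(a, b):
    """Full linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def hold_kernel(order, factor=4):
    """order 0/1/2 -> ZOH/FOH/SOH kernel from cascaded length-`factor` boxcars."""
    box = [1.0] * factor
    kernel = box
    for _ in range(order):
        kernel = [v / factor for v in convolve(kernel, box)]
    return kernel

def upsample_and_hold(samples, order, factor=4):
    """Insert factor-1 zeros after each signed sample, then apply the hold kernel."""
    stuffed = []
    for s in samples:
        stuffed.extend([float(s)] + [0.0] * (factor - 1))
    return convolve(stuffed, hold_kernel(order, factor))

def magnitude_response(kernel, f_over_fs):
    """|H(f)| of a kernel, with f normalized to the up-sampled rate F_S."""
    return abs(sum(c * cmath.exp(-2j * cmath.pi * f_over_fs * n)
                   for n, c in enumerate(kernel)))

def segment_code(magnitude):
    """Split an 11-bit unsigned code into (MSB count, LSB value, row, column)."""
    if not 0 <= magnitude < 2 ** 11:
        raise ValueError("magnitude out of range")
    lsb = magnitude & 0xF   # bits [3:0], binary-weighted cells
    msb = magnitude >> 4    # bits [10:4], number of active thermometer cells
    row = msb & 0xF         # bits [7:4], row encoder input
    col = msb >> 4          # bits [10:8], column encoder input
    return msb, lsb, row, col

def active_width_um(magnitude):
    """Total switched-on gate width, using 19.2 um MSB and 1.2 um LSB units."""
    msb, lsb, _, _ = segment_code(magnitude)
    return msb * 19.2 + lsb * 1.2

print(upsample_and_hold([1, -2], order=0)[:8])
for order, name in enumerate(("ZOH", "FOH", "SOH")):
    k = hold_kernel(order)
    print(name, f"|H| near the first replica (0.23 F_S): {magnitude_response(k, 0.23):.4f}")
print(segment_code(2047), f"{active_width_um(2047):.1f} um")
```

In this model every kernel has a null at $F_{BB} = F_S/4$, and the attenuation near that null deepens with the hold order, which mirrors the replica suppression attributed to the FOH/SOH modes. The segment split also shows that one MSB unit (19.2 µm) equals 16 LSB steps (1.2 µm), so the active width grows uniformly with the code.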

Instead of having two separate push-pull banks, an interdigitated push-pull layout is implemented. In other words, every other column of the I/Q RF-DAC matrix is dedicated to the in-phase arrays and their 180° out-of-phase counterparts. This technique reduces the overall I/Q RF-DAC core size for the same achievable output power, resulting in a highly compact area, minimal mismatch, fewer parasitics, and lower power consumption, and thus an improved overall DTX efficiency. To equalize the primary output traces, swapped/cross-coupled routings for the in-phase and out-of-phase drain lines are utilized. A data-aware clock gating technique [47], [48] is also employed to reduce the LO distribution power in back-off and enhance the system efficiency (SE) at these levels.

The I/Q RF-DAC sub-cells comprise two parts: a pure digital logic section and a power-cell part [see Fig. 17(b)].



Fig. 17. (a) 11-bit I/Q DPA floor plan. (b) I/Q RF-DAC sub-cells. (c) MSB and LSB power cells aspect ratios. (d) Simulated transient voltage and current waveforms.

The logic part consists of a decoding logic and a time synchronizer flip-flop followed by an I/Q implicit mixing circuit. The AND-OR decoder determines whether the designated cell should be activated. The master/slave DFF is employed to synchronize all I/Q RF-DAC unit cells to the master clock ($F_S$), diminishing the undesirable spectral impurity related to the early or late arrival of each unit cell's input data. Before the mixing operation, the designated 25%-LO generation is performed using a three-input NAND gate. Next, the synchronized digital data are up-converted by the 25%-LO clocks using the bitwise AND operation. Subsequently, the up-converted I/Q bitstreams are combined by the subsequent NAND gate to fulfill I/Q interleaving and fed to the power cell inverter buffers. As stated previously, the orthogonal summing of the I and Q paths is achieved by employing the complementary quadrature 25%-LO clocks at this stage. As a result, the local 25% duty-cycle generation and up-conversion circuit are among the most crucial building blocks of the DPA chain. All critical digital logic gates are implemented as symmetrical gates to equalize the delay from the input to the output and the fan-out



Fig. 18. (a) Physical layout implementation of the four-way Doherty DTX configuration. (b) 8-shaped inductor's EM fields. (c) Simulated insertion loss of the Doherty network. (d) Implemented distributed series ac-coupling capacitors. (e) EM simulation of the  $C_{21}$  capacitance. (f) Quality factor of the  $C_{21}$ .

for proceeding circuitry. The power cells are current-mode class-D (CMCD) PAs [49], [50] driven by three-stage digital buffers. Since the drain voltage in a CMCD PA can exceed two to three times the supply voltage, a cascode topology is adopted in this design to prevent reliability violations.
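The local 25%-LO generation and I/Q interleaving described above can be mimicked at the logic level. In the sketch below, each 25% pulse is formed by ANDing a 50% clock with the 50% clock that leads it by 90°; this particular pairing, the four-samples-per-period time grid, and the function names are assumptions for illustration. The AND-OR combination used here is logically equivalent to a NAND-NAND implementation.

```python
# 50% duty-cycle quadrature clocks, four samples per LO period.
LO_50 = {
    0:   [1, 1, 0, 0],
    90:  [0, 1, 1, 0],
    180: [0, 0, 1, 1],
    270: [1, 0, 0, 1],
}

def pulse_25(phase_deg):
    """25% pulse: AND of the selected 50% clock with the clock leading it by 90 degrees."""
    leading = LO_50[(phase_deg - 90) % 360]
    return [a & b for a, b in zip(LO_50[phase_deg], leading)]

def interleave(data_i, data_q, phase_i, phase_q):
    """Up-convert one I bit and one Q bit with 25% pulses and merge them (AND-OR)."""
    p_i, p_q = pulse_25(phase_i), pulse_25(phase_q)
    return [(data_i & a) | (data_q & b) for a, b in zip(p_i, p_q)]

for phase in (0, 90, 180, 270):
    print(f"25% pulse at {phase:3d} deg: {pulse_25(phase)}")
print("I=1, Q=1, phases (0, 90):", interleave(1, 1, 0, 90))
print("I=1, Q=0, phases (0, 90):", interleave(1, 0, 0, 90))
```

Since the I and Q pulses occupy different quarter-period slots, they never overlap in time, which is the property behind the orthogonal I/Q summation discussed in the text.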

#### D. Doherty Power Combiner Floor Plan

Fig. 18(a) illustrates the Doherty power combiner floor plan. Its layout can be so compact that the four-way combiner's inductors suffer from undesired magnetic couplings between the different Doherty branches. Consequently, 8-shaped inductors that offer self-cancellation of their EM field are implemented. As illustrated in Fig. 18(b), the inductors provide an opposite orientation of the magnetic flux for their coil loops to avoid unwanted magnetic coupling between the closely spaced DPAs. As depicted in Fig. 18(d), to bridge the physical distance between the DPA banks, the series ac-coupling capacitors ($C_{01}, C_{12}, \ldots, C_{22}$) are implemented as distributed capacitors. As illustrated, these distributed capacitors can be modeled by a series LC-network that provides the same susceptance at the fundamental frequency as the original floating capacitors in the high-pass sections. The EM simulation results of $C_{21}$ are presented in Fig. 18(e) and (f). As depicted, the capacitance value is 1.18 pF, equal to the originally designed $C_{21}$ value in the high-pass section, while the self-resonance frequency of the LC-network occurs at 19.2 GHz, far enough from the designated operational bandwidth of the DTX. The quality factor of this capacitor is Q = 76, representing its broadband operation. As demonstrated in Fig. 18(c), the EM simulated insertion loss is approximately 1.7 dB at 5.2 GHz.
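The series LC model of a distributed capacitor can be worked out from the two numbers given above. For a lossless series LC, the effective capacitance at frequency $f$ is $C_{\text{eff}} = C / (1 - (f/f_{\text{SRF}})^2)$. The sketch below inverts this relation for $C_{\text{eff}} = 1.18$ pF and $f_{\text{SRF}} = 19.2$ GHz, assuming a fundamental frequency of 5.4 GHz (an assumption, since the text does not state the frequency at which $C_{21}$ is evaluated). The resulting L and C are illustrative and not extracted layout values.

```python
import math

def series_lc_from_effective(c_eff_f, f0_hz, f_srf_hz):
    """Series L and C that give c_eff at f0 and self-resonate at f_srf (lossless)."""
    ratio = (f0_hz / f_srf_hz) ** 2
    c_f = c_eff_f * (1 - ratio)
    l_h = 1 / ((2 * math.pi * f_srf_hz) ** 2 * c_f)
    return l_h, c_f

def effective_capacitance(l_h, c_f, f_hz):
    """Capacitance whose reactance equals that of the series LC at f (below resonance)."""
    w = 2 * math.pi * f_hz
    return c_f / (1 - w * w * l_h * c_f)

l_h, c_f = series_lc_from_effective(1.18e-12, 5.4e9, 19.2e9)
print(f"C = {c_f * 1e12:.3f} pF, L = {l_h * 1e12:.1f} pH")
for f_ghz in (4.6, 5.4, 5.9):
    c_eff = effective_capacitance(l_h, c_f, f_ghz * 1e9)
    print(f"{f_ghz:.1f} GHz: C_eff = {c_eff * 1e12:.3f} pF")
```

Because the self-resonance lies well above the operating band, the effective capacitance varies only slightly across it, which supports the statement that 19.2 GHz is far enough from the operational bandwidth.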

#### V. EXPERIMENTAL RESULTS

The proposed DTX is designed and fabricated in the 40-nm bulk CMOS. Fig. 12(b) exhibits the chip micrograph,



Fig. 19. Measurement setup.

while the block names are specified in a table. The chip occupies an area of 2.25 mm × 1.58 mm with a core area of 1.3 mm × 1.15 mm, as shown in Fig. 12(b). Moreover, the SPI, the designated SRAMs, and the low-speed part of the interpolation filter are digitally synthesized and occupy an area of 2 × 0.67 mm × 0.38 mm, while decoupling capacitors and I/O pads occupy the remainder. The measurement setup is shown in Fig. 19. The I/Q data are generated in MATLAB and then applied to the DTX using four on-chip 4-K SRAMs (one SRAM for each I/Q DPA) running at  $F_S = 675$  MHz. The power consumption of all blocks (except the SRAMs) is included in the reported SE.

#### A. Static Measurements

The DTX is first characterized by static measurements. The output power is measured using a power meter. Fig. 20(a) presents the measured output power, DE,<sup>1</sup> and SE<sup>2</sup> over a 4–6.2 GHz band under the static input condition of $I_{\rm BB} = Q_{\rm BB} = 2047$ for all DPAs. The proposed bits-in RF-out transmitter generates 27.54 dBm peak output power and 46.35% DE at 5.3 GHz with a supply voltage of 1 V dedicated to each DPA. It achieves a 3 dB bandwidth of 1.3 GHz in a 4.6–5.9 GHz band, while the 1 dB bandwidth is roughly 5–5.6 GHz, maintaining decent performance. To measure the DTX PBO performance, the corresponding $I_{\rm BB}/Q_{\rm BB}$ data are swept based on the current profile of the main and peak DPAs, as discussed in Section III. The measured drain and system efficiencies versus output power at different frequencies are presented in Fig. 20(b), yielding a DE of 37.35/35.47/33.49%, 40.33/38.11/36.95%, 43.49/41.06/40.6%, and 41.74/40.16/39.27% for 6/9/12 dB PBOs at 5.2/5.3/5.4/5.5 GHz, respectively. These results indicate that the realized compact four-way Doherty DTX maintains its decent drain and system efficiencies enhancement

<sup>1</sup>DE(%) =  $100 \times (P_{\text{RF}_{\text{Out}}})/(P_{\text{dc-Power Cells}})$ .

<sup>2</sup>SE(%) =  $100 \times (P_{\text{RF}_{\text{Out}}})/(P_{\text{dc-Power Cells}} + P_{\text{dc-All Blocks (Except SRAMs)}})$ .



Fig. 20. Measured (a) peak output power, drain, and system efficiencies versus operational frequency, (b) drain and system efficiencies versus output power at different frequencies, (c) drain/system efficiencies, and the related dc power consumption versus power back-off at 5.4 GHz, and (d) power consumption breakdown at the full power and 12 dB back-off at 5.4 GHz frequency.

over DPBO. Fig. 20(c) demonstrates the drain/system efficiencies and the related dc power consumption versus power back-off at 5.4 GHz. Fig. 20(d) represents the power consumption breakdown at full power and at 12 dB back-off at 5.4 GHz. As expected, the SE degrades more at lower codewords, as it includes the power consumption of circuit blocks that do not scale with the output power. However, utilizing more effective clock gating mitigates this issue.
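The DE and SE definitions in the footnotes can be inverted to estimate the dc power behind the reported peak-power numbers. The sketch below uses the 27.54 dBm output power and 46.35% DE quoted above, together with the 30.77% peak SE quoted in the abstract; the derived dc powers are back-of-the-envelope estimates from these three figures, not measured values, and the helper names are illustrative.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)

def dc_power_mw(p_out_dbm, efficiency_pct):
    """DC power implied by an output power and an efficiency (DE or SE definition)."""
    return dbm_to_mw(p_out_dbm) / (efficiency_pct / 100)

P_OUT_DBM = 27.54   # peak output power at 5.3 GHz
DE_PCT = 46.35      # peak drain efficiency
SE_PCT = 30.77      # peak system efficiency (from the abstract)

p_out = dbm_to_mw(P_OUT_DBM)
p_cells = dc_power_mw(P_OUT_DBM, DE_PCT)   # power cells only
p_total = dc_power_mw(P_OUT_DBM, SE_PCT)   # power cells plus all other blocks
overhead = p_total - p_cells               # all blocks except the SRAMs

print(f"P_out    = {p_out:7.1f} mW")
print(f"P_dc,PA  = {p_cells:7.1f} mW")
print(f"P_dc,tot = {p_total:7.1f} mW")
print(f"overhead = {overhead:7.1f} mW")
```

Since this overhead does not shrink in proportion to the output power, the same calculation at back-off explains why the SE falls faster than the DE.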

#### B. Single-Sideband Signal Measurements

The performance of the proposed DTX versus power back-off is measured with a single-sideband signal at 5.4 GHz with 200 MHz tone spacing. The output spectrum and the corresponding I/Q image, LO leakage, and C-IMD3/$H_{3BB}$ suppression are demonstrated in Fig. 21(a) and (b), respectively. The uncalibrated I/Q image, LO leakage, and C-IMD3/$H_{3BB}$ stay at approximately -64/-65/-69 dBc over the power back-off. In addition, the I/Q trajectory depicted in Fig. 21(c) shows that the proposed 50%-LO signed I/Q interleave up-converter retains its orthogonal summation enhancement over the output power back-off, and it demonstrates correct sign-bit operation without compression. Moreover, the single-sideband performance over the frequency band of 4.5–6 GHz for tone spacings of 10–140 MHz is presented in Fig. 21(d)–(f). The intrinsic LO leakage, I/Q image, and C-IMD3/$H_{3BB}$ remain better than -64/-60/-67 dBc, respectively, without calibration while measuring five different samples of the proposed DTX.
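To put the reported image level in perspective, the standard quadrature-modulator relation between I/Q gain ratio $g$, phase error $\varphi$, and image level, $(1 - 2g\cos\varphi + g^2)/(1 + 2g\cos\varphi + g^2)$, can be evaluated numerically. The sketch below is a generic textbook calculation, not an analysis of this particular DTX, and the function names are illustrative.

```python
import math

def image_dbc(gain_ratio, phase_err_deg):
    """Image level (dBc) of a quadrature modulator with gain and phase imbalance."""
    phi = math.radians(phase_err_deg)
    num = 1 - 2 * gain_ratio * math.cos(phi) + gain_ratio ** 2
    den = 1 + 2 * gain_ratio * math.cos(phi) + gain_ratio ** 2
    return 10 * math.log10(num / den)

def phase_error_for_image(target_dbc):
    """Phase error (degrees) giving the target image level with perfect gain balance."""
    return math.degrees(2 * math.atan(10 ** (target_dbc / 20)))

phi_deg = phase_error_for_image(-64.0)
print(f"Phase error for a -64 dBc image (g = 1): {phi_deg:.3f} deg")
print(f"Check: {image_dbc(1.0, phi_deg):.2f} dBc")
print(f"0.1 dB gain imbalance alone: {image_dbc(10 ** (0.1 / 20), 0.0):.1f} dBc")
```

With perfect gain balance, a -64 dBc image corresponds to a quadrature phase error of well under 0.1°, which indicates how closely the I and Q paths must be matched to reach the reported uncalibrated level.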

#### C. Complex Modulated Signal Measurements

The DTX dynamic performance is also verified by employing OFDM signals with different modulation bandwidths.



Fig. 21. Measured (a) output spectrum of a single-tone test with 200 MHz tone-spacing at 5.4 GHz versus power back-off, (b) its corresponding LO leakage, I/Q image, and C-IMD3/ $H_{3BB}$ , and (c) I/Q trajectory. (d)–(f) Measured single-sideband performance over frequency band versus tone-spacing.

A simple fixed memoryless  $2 \times 1$ -D DPD is employed for all complex modulated signals in this work. The effect of the I/Q image on the performance of the multi-carrier scenarios is demonstrated in Fig. 22. In the first scenario, a two-carrier "20 MHz 64-QAM OFDM" signal, located at -90 (CH1) and -50 MHz (CH2), respectively, away from  $f_C = 5.4$  GHz, is applied to the DTX. Fig. 22(a) and (b) demonstrates the spectrum (blue) and its constellation diagrams with EVMs of -35.64 and -33.65 dB, respectively. In the second scenario, the TX signal is mirrored with respect to the carrier frequency, locating the channels at +50 (CH3) and +90 MHz (CH4), respectively, away from  $f_C$ . Fig. 22(a) and (c) shows the spectrum (red) and its constellation diagrams with EVMs of -33.44 and -34.94 dB, respectively. As illustrated, the channels' image component in the first scenario is located at the same position as the channels in the second scenario and vice versa. Therefore, when operating four channels simultaneously, large image components dramatically deteriorate the EVM of each channel. The spectrum (black) and its constellation diagrams of the four-channel scenario are presented in Fig. 22(a) and (d), exhibiting that the channels' EVM has not been degraded significantly due to the decent I/Q image performance of the DTX.
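The channel arrangement in this experiment can be checked with a few lines: the image of a channel at offset $\Delta f$ from $f_C$ falls at $-\Delta f$, so the images of CH1/CH2 land on CH4/CH3. The sketch below also converts the quoted EVM values from decibels to percent. The helper names are illustrative, and the conversion is the usual $20\log_{10}$ relation.

```python
def image_offsets_mhz(channel_offsets_mhz):
    """Offsets (relative to f_C) at which the I/Q images of the given channels fall."""
    return sorted(-f for f in channel_offsets_mhz)

def evm_percent(evm_db):
    """Convert an EVM in dB to a percentage."""
    return 100 * 10 ** (evm_db / 20)

first_scenario = [-90, -50]    # CH1, CH2
second_scenario = [50, 90]     # CH3, CH4

print("Images of CH1/CH2 fall at:", image_offsets_mhz(first_scenario), "MHz")
print("Images of CH3/CH4 fall at:", image_offsets_mhz(second_scenario), "MHz")
for evm_db in (-35.64, -33.65, -33.44, -34.94):
    print(f"EVM {evm_db:.2f} dB = {evm_percent(evm_db):.2f} %")
```

The image offsets of each two-channel scenario coincide exactly with the channel offsets of the other, so in the four-channel case every channel sits on top of another channel's image.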

The spectral purity of a single-carrier "40 MHz 256-QAM OFDM" signal is measured at  $f_C = 5.4$  GHz. The measured spectrum of the signal and its constellation diagram are depicted in Fig. 23(a) and (b). The DTX achieves an average

 TABLE I

 Performance Summary and Comparison With State-of-the-Art Works

| Specifications | | This Work | | ISSCC 2021<br>A. Zhang | | ISSCC 2021<br>B. Yang | | JSSC 2020<br>S.W. Yoo | | JSSC 2020<br>A. Bassat | | RFIC 2020<br>J. Sheth | TMTT 2019<br>D. Jung | JSSC 2020<br>D. Jung | JSSC 2020<br>Y. Yin | ISSCC 2016<br>P. Filho | TMTT 2017<br>W. Gaber | JSSC 2018<br>M. Mehrpoo | CICC 2020<br>Y. Shen | |
|---------------------------------|----------|--------------------------------------------------------|-----------------------------|------------------------|-------------------------|-------------------------------------|-----------------------|------------------------|-------------------------|-------------------------|-----------------|---------------------------|----------------------|----------------------|-------------------------|----------------------------------|-------------------------|------------------------|---------------------|-------------------|
| Technology                      |          | CMOS 40nm                                              |                             | CMOS 65nm              |                         | CMOS 40nm                           |                       | CMOS 60nm              |                         | CMOS 28nm               |                 | CMOS 65nm                 | CMOS 55nm            | SOI 45nm             | CMOS 40nm               | CMOS 28nm                        | CMOS 28nm               | CMOS 40nm              | n CMOS 40nm         |                   |
| Architecture                    |          | Quadrature<br>4-way Doherty<br>/ CMCD                  |                             | Current-Mode SHS       |                         | Quadrature SFCPA<br>/Hybrid Doherty |                       | TI-Doherty<br>/Class G |                         | Polar                   |                 | 4-way<br>Doherty          | Analog<br>Doherty    | Hybrid<br>Doherty    | Switched<br>Transformer | RQDAC                            | DDRM                    | DDRM                   | IQ-Mapping<br>DDRM  |                   |
| Die Area                        |          | 3.55mm <sup>2</sup> (1.5mm <sup>2</sup> <sup>‡</sup> ) |                             | 7.1mm <sup>2</sup>     |                         | 2.2mm <sup>2</sup>                  |                       | 3.36mm <sup>2</sup>    |                         | 4mm <sup>2 ¥1</sup>     |                 | 3mm <sup>2</sup>          | 6mm <sup>2</sup>     | 6mm <sup>2</sup>     | 0.8mm <sup>2</sup>      | 0.22mm <sup>2</sup> <sup>‡</sup> | 1.53mm <sup>2</sup>     | 0.21mm <sup>2‡</sup>   | 1.1mm <sup>2‡</sup> |                   |
| Supply                          |          | 1V                                                     |                             | N/A                    |                         | 1.2/2.4V                            |                       | 2.5V                   |                         | 1.4V                    |                 | 0.55V                     | 5.5V                 | 1.2V                 | 1.1V                    | 0.9V                             | 3.6V                    | 2V                     | 2.5V                |                   |
| Frequency                       |          | 5.4GHz                                                 |                             | 5.4GHz                 |                         | 2.4GHz                              |                       | 2.4GHz                 |                         | 5GHz                    |                 | 5.25GHz                   | 5.8GHz               | 2.3GHz               | 1.5GHz                  | 2.4GHz                           | 1GHz                    | 3GHz                   | 2.4GHz              |                   |
| 1-dB Power BW                   |          | 5-5.6GHz                                               |                             | 5.3-6.05GHz            |                         | 2.3-2.8GHz*                         |                       | N/A                    |                         | N/A                     |                 | 4.5-5.25GHz*              | 5.4-6.1GHz*          | 2.1-2.5GHz*          | 1.3-3.5GHz              | N/A                              | N/A                     | N/A                    | 0.5-2GHz*           |                   |
| Peak P <sub>Out</sub>           |          | 27.4dBm                                                |                             | 27dBm (5.7GHz)         |                         | 30.3dBm                             |                       | 30dBm                  |                         | 27dBm                   |                 | 6.5dBm                    | 27.2dBm              | 22.4dBm              | 21.4dBm                 | 3.5dBm**                         | 21dBm**                 | 9.2dBm**               | 14.1dBm** @2GHz     |                   |
| Drain / System Efficiency | Peak | 47.4% / 30.66% | | 40.1% / N/A (5.4GHz) | | 41.3% / 36.5% | | 40.2% / N/A | | 37% / N/A | | 42% / 26% | N/A / 24.5% | 38.5% / N/A | N/A / 31.3% | N/A | N/A / 33%** | N/A / 5.7%† | N/A / 7.56% | |
| | 3dB PBO | 47.68% / 31.63% | | 32%* / N/A | | 37.5% / 32.9% | | 37.9% / N/A | | 31%* / N/A | | 36% / 20%* | N/A / 22%* | 33%* / N/A | N/A / 28%* | N/A | N/A | N/A | N/A | |
| | 6dB PBO | 43.49% / 26.34% | | 26.3% / N/A | | 36.1% / 29.1% | | 38.8% / N/A | | 24%* / N/A | | 28% / 14% | N/A / 13%* | 25%* / N/A | N/A / 27.7% | N/A | N/A | N/A | N/A | |
| | 9dB PBO | 41.06% / 23.1% | | 29.2% / N/A | | 30.9% / 23.7% | | 36.3% / N/A | | 18% / N/A | | 23% / 15%* | N/A / 8%* | 18.7% / N/A | N/A / 18%* | N/A | N/A | N/A | N/A | |
| | 12dB PBO | 40.6% / 18.1% | | 19.1% / N/A | | 26.2% / 18.6% | | 29.4% / N/A | | 14%* / N/A | | 20%* / 7%* | N/A | 13%* / N/A | N/A / 16.6% | N/A | N/A | N/A | N/A | |
| Modulation                      |          | 240MHz<br>6×256-QAM<br>OFDM                            | 320MHz<br>4×512-QAM<br>OFDM | 20MHz<br>256-QAM       | 80MHz<br>64-QAM<br>OFDM | 60MHz<br>256-QAM                    | 40MHz<br>1024-<br>QAM | 10MHz<br>1024-<br>QAM  | 10MHz<br>64-QAM<br>OFDM | 20MHz<br>MCS7           | 160MHz<br>MCS11 | 1.6MHz<br>16 <b>-</b> QAM | 80MHz<br>256-QAM     | 40MHz<br>64-QAM      | 20MHz<br>64-QAM<br>LTE  | 20MHz<br>64-QAM                  | 40MHz<br>64-QAM<br>WLAN | 113MHz<br>64-QAM       | 160MHz<br>256-QAM   | 320MHz<br>256-QAM |
| Average P <sub>Out</sub> (dBm)  |          | 17.82                                                  | 18.16                       | 22                     | 18                      | 23.3                                | 20.4                  | 23.2                   | 19.1                    | 21.1                    | 19.2            | 2.9                       | 17                   | 15.3                 | 15.2                    | -3.87                            | 12                      | 0.1                    | NA                  | 5.15              |
| Average Efficiency              |          | 41.23%(DE)<br>22.17%(SE)                               | 41.12%(DE)<br>20.52%(SE)    | 27.4%<br>(DE)          | 28.1%<br>(DE)           | 30.7%<br>(DE)                       | 22.6%<br>(DE)         | 36.2%<br>(DE)          | 30.3%<br>(DE)           | 26%<br>(PE)             | 21%<br>(PE)     | 34%<br>(DE)               | 5.3%<br>(PAE)        | 24.7%<br>(DE)        | 25.3%<br>(PAE)          | 1.7%* (SE)                       | 11% (SE)                | NA                     | A NA                |                   |
| EVM                             |          | -32.28dB                                               | -29.65dB                    | -33.5dB                | -30.4dB                 | -31.9dB                             | -35.9dB               | -44.5dB                | -41.7dB                 | -28dB                   | -35dB           | -20dB                     | -34.8dB              | -32dB                | -32.5dB                 | -36dB                            | -30.3dB                 | -27dB                  | -36dB               | -32dB             |
| PAPR                            |          | 9.68dB                                                 | 9.41dB                      | 4.4dB                  | 8.4dB                   | 6.98dB                              | 9.86dB                | 6.8dB                  | 10.9dB                  | 5.9dB                   | 7.8dB           | 3.6dB                     | 6.3dB <sup>\$</sup>  | 7.1dB                | 6.2dB                   | 7dB                              | 8.73dB                  | 6.2dB                  | 8dB                 | 8dB               |
| LO Leakage /<br>IQ image / CIM3 |          | <-58 / -60 / -67dBc                                    |                             | N/A                    |                         | N/A                                 |                       | N/A                    |                         | N/A                     |                 | N/A                       | N/A                  | N/A                  | N/A                     | -59 / -44<br>/ -50dBc            | -44 / -39<br>/ -26dBc   | N/A / -45<br>/ -56dBc  | 5 -52 / -54         |                   |
| Linearization                   |          | DPD                                                    |                             | N/A                    |                         | DPD                                 |                       | No                     |                         | DPD                     |                 | DPD                       | MGTR                 | AM-AM LUT DPD        |                         | DPD                              | DPD                     | No N                   |                     | 0                 |

\* Estimated from reported figures and plots. \*\* Off-chip matching network. \*Core area. \*1 Area including Digital front end, DPLL, and LB/HB DTX. †Excluding LO generation. <sup>\$</sup> Measured at 3.9dB additional PBO



Fig. 22. Measured (a) output spectrum of two-/four-carrier 20 MHz 64-QAM OFDM scenarios and (b)–(d) corresponding constellation diagram and EVM.

output power of 18.9 dBm while maintaining average drain and system efficiencies of 43.11% and 24.51%, respectively. Using the above-mentioned fixed memoryless $2 \times 1$-D DPD, the ACLR is better than -47 dBc, and the EVM is -40.03 dB. The ACLR and average EVM versus average output power are shown in Fig. 23(c), reaching -45.14 dB EVM at 12 dBm average output power, while the ACLR is better than -51 dBc. These results indicate that



Fig. 23. Measured (a) spectrum of single-carrier 40 MHz 256-QAM OFDM signal and (b) constellation diagram and EVM. (c) ACLR and average EVM performances versus average output power.

the spectral purity and EVM of the proposed digital TX can meet the TX spectral emission requirements of the prevailing wireless communication standards.
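For readers who prefer EVM as a percentage, the dB figures quoted above can be converted with the standard relation EVM(%) = 100 · 10^(EVM_dB/20). The following is a minimal sketch of that conversion applied to the two single-carrier values reported in the text; the function and variable names are illustrative and do not come from the paper.

```python
import math


def evm_db_to_percent(evm_db: float) -> float:
    """Convert an RMS EVM given in dB to percent (20*log10 convention)."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


def evm_percent_to_db(evm_percent: float) -> float:
    """Inverse conversion: percent back to dB."""
    return 20.0 * math.log10(evm_percent / 100.0)


# Single-carrier 40 MHz 256-QAM OFDM values quoted in the text,
# keyed by the average output power at which they were measured.
single_carrier_evm_db = {"18.9 dBm": -40.03, "12 dBm": -45.14}

evm_percent = {
    power: evm_db_to_percent(evm_db)
    for power, evm_db in single_carrier_evm_db.items()
}

for power, value in evm_percent.items():
    print(f"EVM at {power}: {value:.3f} %")
```

Both values come out at or below roughly 1%, which is another way of reading the dB figures reported for Fig. 23.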

A six-carrier "40 MHz 256-QAM OFDM" signal with an aggregated bandwidth of 240 MHz is applied to the DTX, and the performance is verified at $f_C = 5.4$ GHz using the simple fixed memoryless $2 \times 1$-D DPD. Fig. 24(a) and (b) show the measured spectrum of the signal and its CH6 constellation diagram, respectively. Accordingly, the DTX delivers 17.82 dBm average power while achieving an ACLR of better than -39 dBc and



\* A 17.1 dB external loss at 5.4 GHz is de-embedded.

Fig. 24. Measured (a) spectrum of six-carrier 256-QAM OFDM signal and (b) worst channel constellation diagram and EVM. (c) ACLR and average EVM performances versus average output power.



Fig. 25. (a) Measured suppression of SSRs using ZOH/FOH/SOH filters. (b) Worst channel constellation diagram and EVM using SOH filter.

an average EVM of -32.28 dB. The measured average drain and system efficiencies are 41.23% and 22.17%, respectively. The ACLR and average EVM performances versus average output power are also exhibited in Fig. 24(c), achieving -37.55 dB average EVM at 12 dBm average power, while the ACLR is better than -44 dBc.
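The efficiency figures above translate directly into DC power consumption. The sketch below works through that arithmetic for the six-carrier measurement, assuming the usual definitions DE = P_out / P_DC,drain and SE = P_out / P_DC,total (the exact set of blocks counted in the total is defined elsewhere in the paper and is not restated here). The helper names are illustrative.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def dc_power_mw(p_out_dbm: float, efficiency: float) -> float:
    """DC power (mW) implied by an output power and an efficiency in 0..1."""
    return dbm_to_mw(p_out_dbm) / efficiency


# Six-carrier 40 MHz 256-QAM OFDM values quoted in the text.
P_AVG_DBM = 17.82
DRAIN_EFF = 0.4123   # average drain efficiency
SYSTEM_EFF = 0.2217  # average system efficiency

p_out_mw = dbm_to_mw(P_AVG_DBM)
p_dc_drain_mw = dc_power_mw(P_AVG_DBM, DRAIN_EFF)
p_dc_total_mw = dc_power_mw(P_AVG_DBM, SYSTEM_EFF)

# Assumption: the gap between the two DC figures is the power drawn by
# everything counted in SE but not in DE.
p_overhead_mw = p_dc_total_mw - p_dc_drain_mw

print(f"Average RF output power: {p_out_mw:.1f} mW")
print(f"Drain DC power:          {p_dc_drain_mw:.1f} mW")
print(f"Total DC power:          {p_dc_total_mw:.1f} mW")
print(f"Non-drain overhead:      {p_overhead_mw:.1f} mW")
```

Under these assumptions the DTX delivers about 61 mW of average RF power from roughly 147 mW of drain DC power and roughly 273 mW of total DC power.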

The effectiveness of the signed ZOH, FOH, and SOH filters in suppressing SSRs is demonstrated in Fig. 25, where a four-carrier "80 MHz 512-QAM OFDM" signal with an aggregated bandwidth of 320 MHz and $F_S = 675$ MHz is applied to the DTX while the interpolation filter order is varied. For this wideband signal, the SOH filter suppresses the SSRs by more than 30 dB relative to the ZOH filter.
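To build intuition for why a higher hold order suppresses the sampling spectral replicas more strongly, the sketch below uses the textbook idealization in which an order-n hold has a magnitude response of |sinc(f/F_S)|^(n+1). This is an assumption made for illustration only: it ignores the actual upsampling ratio, the signed implementation, and any filtering by the output network, so it is not expected to reproduce the measured curves in Fig. 25. The band-edge frequency of 160 MHz simply corresponds to half of the 320 MHz aggregated bandwidth.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def hold_gain_db(f_hz: float, fs_hz: float, order: int) -> float:
    """Gain (dB) of an idealized order-n hold modeled as sinc^(n+1)."""
    return 20.0 * (order + 1) * math.log10(abs(sinc(f_hz / fs_hz)))


def replica_suppression_db(f0_hz: float, fs_hz: float, order: int) -> float:
    """First replica (at fs - f0) relative to the wanted component at f0."""
    return hold_gain_db(fs_hz - f0_hz, fs_hz, order) - hold_gain_db(
        f0_hz, fs_hz, order
    )


FS_HZ = 675e6         # baseband sampling frequency from the text
F_EDGE_HZ = 160e6     # band edge of the 320 MHz aggregated signal

# Hold order 0, 1, 2 corresponds to ZOH, FOH, SOH.
suppression_db = {
    name: replica_suppression_db(F_EDGE_HZ, FS_HZ, order)
    for order, name in enumerate(("ZOH", "FOH", "SOH"))
}

for name, value in suppression_db.items():
    print(f"{name}: first replica at band edge is {value:.2f} dB")
```

In this model the band-edge replica sits about 10 dB below the wanted component per hold order, because sin(π(1 − x)) = sin(πx) reduces the ratio to x/(1 − x) with x = f0/F_S. The SOH therefore gains about 20 dB over the ZOH at the band edge, and more for signal components closer to the carrier.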

The performance of our DTX is summarized and compared with the prior art in Table I. The realized compact four-way Doherty DTX achieves a DE of 40.6% at 12 dB DPBO at 5.4 GHz while generating more than 27 dBm peak power. The I/Q DTX also supports wide modulation bandwidths with high average output power and competitive average drain and system efficiencies. Moreover, our 50%-LO signed I/Q interleaved up-converter yields more accurate orthogonal I/Q summation than the other works listed.

#### VI. CONCLUSION

This article demonstrates a compact wideband digital I/Q transmitter realized in 40-nm bulk CMOS. By introducing a 50%-LO signed I/Q interleaved up-converter and a compact four-way Doherty combiner, the proposed DTX achieves spectrally pure operation while enhancing its DPBO efficiency. The DTX generates more than 27.54 dBm with 46.35% DE in a 4–6.2 GHz band. Its EVM and ACLR are better than -31 dB and -39 dBc, respectively, for a six-carrier "40 MHz 256-QAM OFDM" signal. The realized DTX can be reconfigured as a stand-alone DDS to support signals with a 320 MHz modulation bandwidth. Moreover, the proposed bits-in RF-out TX replicates the spectral purity of its state-of-the-art analog-intensive counterparts. Finally, it can serve as a high-power, energy-efficient CMOS transmitter for next-generation multi-band applications requiring large modulation/aggregated bandwidths.

#### ACKNOWLEDGMENT

The authors would like to thank Atef Akhnoukh and Zu-Yao Chang for their strong support during the design, fabrication, and measurement, and the members of Electronics Circuits and Architectures (ELCA) for helpful technical discussions. imec-Leuven is acknowledged for handling the tape-out.

#### REFERENCES

- [1] B. Yang, H. J. Qian, and X. Luo, "26.5 A watt-level quadrature switched/floated-capacitor power amplifier with back-off efficiency enhancement in complex domain using reconfigurable self-coupling canceling transformer," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64, Feb. 2021, pp. 362–364.
- [2] A. Zhang, C. Yang, M. Ayesh, and M. S.-W. Chen, "26.6 A 5-to-6 GHz current-mode subharmonic switching digital power amplifier for enhancing power back-off efficiency," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, vol. 64, Feb. 2021, pp. 364–366.
- [3] S.-M. Babamir and B. Razavi, "A digital RF transmitter with background nonlinearity correction," *IEEE J. Solid-State Circuits*, vol. 55, no. 6, pp. 1502–1515, Jun. 2020.
- [4] A. Ben-Bassat *et al.*, "A fully integrated 27-dBm dual-band all-digital polar transmitter supporting 160 MHz for Wi-Fi 6 applications," *IEEE J. Solid-State Circuits*, vol. 55, no. 12, pp. 3414–3425, Dec. 2020.
- [5] H. J. Qian *et al.*, "A quadrature digital power amplifier with hybrid Doherty and impedance boosting for efficiency enhancement in complex domain," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Aug. 2020, pp. 127–130.
- [6] Y. Yin *et al.*, "A broadband switched-transformer digital power amplifier for deep back-off efficiency enhancement," *IEEE J. Solid-State Circuits*, vol. 55, no. 11, pp. 2997–3008, Nov. 2020.
- [7] S. Yoo *et al.*, "10.7 A 0.26 mm<sup>2</sup> DPD-less quadrature digital transmitter with <-40 dB EVM over >30 dB Pout range in 65nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 184–186.
- [8] S.-W. Yoo, S.-C. Hung, and S.-M. Yoo, "A multimode multi-efficiency-peak digital power amplifier," *IEEE J. Solid-State Circuits*, vol. 55, no. 12, pp. 3322–3334, Dec. 2020.
- [9] D. Zheng *et al.*, "A 15b quadrature digital power amplifier with transformer-based complex-domain power-efficiency enhancement," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 370–372.
- [10] Z. Bai, A. Azam, and J. S. Walling, "A frequency tuneable switched-capacitor PA in 65 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2019, pp. 295–298.
- [11] Z. Bai, W. Yuan, A. Azam, and J. S. Walling, "4.3 A multiphase interpolating digital power amplifier for TX beamforming in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 78–80.
- [12] E. Bechthum *et al.*, "A CMOS polar class-G switched-capacitor PA with a single high-current supply, for LTE NB-IoT and eMTC," *IEEE J. Solid-State Circuits*, vol. 54, no. 7, pp. 1941–1951, Jul. 2019.
- [13] Y. Yin, L. Xiong, Y. Zhu, B. Chen, H. Min, and H. Xu, "A compact dual-band digital polar Doherty power amplifier using parallel-combining transformer," *IEEE J. Solid-State Circuits*, vol. 54, no. 6, pp. 1575–1585, Jun. 2019.
- [14] S.-W. Yoo, S.-C. Hung, and S.-M. Yoo, "A watt-level quadrature class-G switched-capacitor power amplifier with linearization techniques," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1274–1287, May 2019.
- [15] A. Zhang and M. S.-W. Chen, "A watt-level phase-interleaved multi-subharmonic switching digital power amplifier," *IEEE J. Solid-State Circuits*, vol. 54, no. 12, pp. 3452–3465, Dec. 2019.
- [16] W. M. Gaber *et al.*, "A 21-dBm I/Q digital transmitter using stacked output stage in 28-nm bulk CMOS technology," *IEEE Trans. Microw. Theory Techn.*, vol. 65, no. 11, pp. 4744–4757, Nov. 2017.
- [17] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, "An intrinsically linear wideband polar digital power amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3312–3328, Dec. 2017.
- [18] E. Roverato *et al.*, "All-digital LTE SAW-less transmitter with DSP-based programming of RX-band noise," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3434–3445, Dec. 2017.
- [19] W. Yuan, V. Aparin, J. Dunworth, L. Seward, and J. S. Walling, "A quadrature switched capacitor power amplifier," *IEEE J. Solid-State Circuits*, vol. 51, no. 5, pp. 1200–1209, May 2016.
- [20] P. E. P. Filho, M. Ingels, P. Wambacq, and J. Craninckx, "A 0.22 mm<sup>2</sup> CMOS resistive charge-based direct-launch digital transmitter with -159 dBc/Hz out-of-band noise," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan. 2016, pp. 250–252.
- [21] Y. Shen, R. Bootsman, M. S. Alavi, and L. C. N. de Vreede, "A 1–3 GHz I/Q interleaved direct-digital RF modulator as a driver for a common-gate PA in 40 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Aug. 2020, pp. 287–290.
- [22] M. Mehrpoo *et al.*, "A wideband linear-interleaving DDRM," *IEEE J. Solid-State Circuits*, vol. 53, no. 5, pp. 1361–1373, May 2018.
- [23] Y. Shen, R. Bootsman, M. S. Alavi, and L. de Vreede, "A 0.5–3 GHz I/Q interleaved direct-digital RF modulator with up to 320 MHz modulation bandwidth in 40 nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Mar. 2020, pp. 1–4.
- [24] M. S. Alavi *et al.*, "A wideband 2×13-bit all-digital I/Q RF-DAC," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 4, pp. 732–752, Apr. 2014.
- [25] H. Jin, D. Kim, and B. Kim, "Efficient digital quadrature transmitter based on IQ cell sharing," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1345–1357, May 2017.
- [26] Z. Deng *et al.*, "A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan. 2016, pp. 172–173.
- [27] Z. Hu, L. C. N. de Vreede, M. S. Alavi, D. A. Calvillo-Cortes, R. B. Staszewski, and S. He, "A 5.9 GHz RFDAC-based outphasing power amplifier in 40-nm CMOS with 49.2% efficiency and 22.2 dBm power," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, May 2016, pp. 206–209.
- [28] W. Tai *et al.*, "A transformer-combined 31.5 dBm outphasing power amplifier in 45 nm LP CMOS with dynamic power control for back-off power efficiency enhancement," *IEEE J. Solid-State Circuits*, vol. 47, no. 7, pp. 1646–1658, Jul. 2012.
- [29] I. Hakala, D. K. Choi, L. Gharavi, N. Kajakine, J. Koskela, and R. Kaunisto, "A 2.14-GHz Chireix outphasing transmitter," *IEEE Trans. Microw. Theory Techn.*, vol. 53, no. 6, pp. 2129–2138, Jun. 2005.
- [30] E. McCune, Dynamic Power Supply Transmitters: Envelope Tracking, Direct Polar, and Hybrid Combinations (The Cambridge RF and Microwave Engineering Series). New York, NY, USA: Cambridge Univ. Press, 2015.
- [31] D. Jung, S. Li, J.-S. Park, T.-Y. Huang, H. Zhao, and H. Wang, "A CMOS 1.2-V hybrid current- and voltage-mode three-way digital Doherty PA with built-in phase nonlinearity compensation," *IEEE J. Solid-State Circuits*, vol. 55, no. 3, pp. 525–535, Mar. 2020.
- [32] J. Sheth and S. M. Bowers, "A differential digital 4-way Doherty power amplifier with 48% peak drain efficiency for low power applications," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Aug. 2020, pp. 119–122.
- [33] H. T. Nguyen, S. Li, and H. Wang, "4.6 A mm-wave 3-way linear Doherty radiator with multi antenna coupling and on-antenna current-scaling series combiner for deep power back-off efficiency enhancement," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 84–86.
- [34] Y. Shen *et al.*, "A fully-integrated digital-intensive polar Doherty transmitter," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2017, pp. 196–199.
- [35] V. Vorapipat, C. S. Levy, and P. M. Asbeck, "A class-G voltage-mode Doherty power amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3348–3360, Dec. 2017.
- [36] S. Hu, S. Kousai, and H. Wang, "A broadband mixed-signal CMOS power amplifier with a hybrid class-G Doherty efficiency enhancement technique," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 598–613, Mar. 2016.
- [37] D. Jung, H. Zhao, and H. Wang, "A CMOS highly linear Doherty power amplifier with multigated transistors," *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 5, pp. 1883–1891, May 2019.
- [38] M. Beikmirza *et al.*, "6.2 A 4-way Doherty digital transmitter featuring 50%-LO signed IQ interleave upconversion with more than 27 dBm peak power and 40% drain efficiency at 10 dB power back-off operating in the 5 GHz band," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2021, pp. 92–94.
- [39] M. S. Alavi, A. Visweswaran, R. B. Staszewski, L. C. N. de Vreede, J. R. Long, and A. Akhnoukh, "A 2-GHz digital I/Q modulator in 65-nm CMOS," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2011, pp. 277–280.
- [40] M. Ingels *et al.*, "A linear 28 nm CMOS digital transmitter with 2×12bit up to LO baseband sampling and -58 dBc C-IM3," in *Proc. 40th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2014, pp. 379–382.
- [41] B. Yang *et al.*, "A 65-nm CMOS I/Q RF power DAC with 24- to 42-dB third-harmonic cancellation and up to 18-dB mixed-signal filtering," *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1127–1138, Apr. 2018.
- [42] M. Ingels, D. Dermit, Y. Liu, H. Cappelle, and J. Craninckx, "A 2×14bit digital transmitter with memoryless current unit cells and integrated AM/PM calibration," in *Proc. 43rd IEEE Eur. Solid State Circuits Conf.*, Sep. 2017, pp. 324–327.
- [43] C. Lu *et al.*, "A 24.7 dBm all-digital RF transmitter for multimode broadband applications in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 332–333.
- [44] H. Wang *et al.*, "A highly-efficient multi-band multi-mode all-digital quadrature transmitter," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 5, pp. 1321–1330, May 2014.
- [45] W. C. E. Neo, J. Qureshi, M. J. Pelk, J. R. Gajadharsing, and L. C. N. de Vreede, "A mixed-signal approach towards linear and efficient *N*-way Doherty amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 55, no. 5, pp. 866–879, May 2007.
- [46] R. Gajadharsing, "N-way Doherty amplifier," U.S. Patent 8928402, Jan. 6, 2015.
- [47] N. H. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective. London, U.K.: Pearson, 2015.
- [48] M. Alavi, J. Mehta, and R. Staszewski, *Radio-Frequency Digital-to-Analog Converters: Implementation in Nanoscale CMOS*. Amsterdam, The Netherlands: Elsevier, 2016.
- [49] D. Chowdhury, S. V. Thyagarajan, L. Ye, E. Alon, and A. M. Niknejad, "A fully-integrated efficient CMOS inverse class-D power amplifier for digital polar transmitters," *IEEE J. Solid-State Circuits*, vol. 47, no. 5, pp. 1113–1122, May 2012.
- [50] T.-P. Hung, A. G. Metzger, P. J. Zampardi, M. Iwamoto, and P. M. Asbeck, "Design of high-efficiency current-mode class-D amplifiers for wireless handsets," *IEEE Trans. Microw. Theory Techn.*, vol. 53, no. 1, pp. 144–151, Jan. 2005.



Mohammadreza Beikmirza (Graduate Student Member, IEEE) received the B.Sc. and M.Sc. degrees in electrical engineering from the Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, in 2014 and 2016, respectively. He is currently pursuing the Ph.D. degree with the Delft University of Technology, Delft, The Netherlands.

His current research interests include digital-intensive transmitters and RF/monolithic microwave integrated circuits design.



Yiyu Shen (Member, IEEE) received the M.S. degrees in microelectronics from Tsinghua University, Beijing, China, and Katholieke Universiteit Leuven, Leuven, Belgium, in 2014. He is currently pursuing the Ph.D. degree in electrical engineering with the Delft University of Technology (TU Delft), Delft, The Netherlands.

His current research interests include power amplifiers and digital-assisted RF integrated circuits and systems.



Leo C. N. de Vreede (Senior Member, IEEE) received the Ph.D. degree (*cum laude*) from the Delft University of Technology, Delft, The Netherlands, in 1996.

In 1996, he was appointed as an Assistant Professor at the Delft University of Technology, working on the nonlinear distortion behavior of active devices. In 1999 and 2015, he was appointed, respectively, as an Associate Professor and a Full Professor at the Delft University of Technology, where he became responsible for the Electronic Research Laboratory (ERL/ELCA). During that period, he worked on solutions for improved linearity and RF performance at the device, circuit, and system levels. He is a Co-Founder/Advisor of Anteverta-mw, Eindhoven, The Netherlands, a company specialized in RF device characterization. He has (co)authored more than 120 IEEE refereed conference papers and journal articles. He holds several patents. His current interests include RF measurement systems, RF technology optimization, and (digital-intensive) energy-efficient/wideband circuit/system concepts for wireless applications.

Dr. de Vreede was a (co)recipient of the IEEE Microwave Prize in 2008 and a Mentor of the Else Kooi Prize Awarded Ph.D. Work in 2010 and the Dow Energy Dissertation Prize Awarded Ph.D. Work in 2011. He was a recipient of the TUD Entrepreneurial Scientist Award in 2015. He (co)guided several students who won (best) paper awards at the Bipolar/BiCMOS Circuits and Technology Meeting (BCTM), Program for Research on Integrated Systems and Circuits (PRORISC), the European Solid-State Device Research Conference (ESSDERC), the International Microwave Symposium (IMS), the Radio-Frequency Integration Technology (RFIT), and the Radio Frequency Integrated Circuits Symposium (RFIC).



**Morteza S. Alavi** (Member, IEEE) received the B.Sc. degree in electrical engineering from Iran University of Science and Technology, Tehran, Iran, in 2003, the M.Sc. degree in electrical engineering from the University of Tehran, Tehran, in 2006, and the Ph.D. degree in electrical engineering from the Delft University of Technology (TU-Delft), Delft, The Netherlands, in 2014.

He was a Co-Founder and the CEO of DitIQ B.V., The Netherlands, a local company developing energy-efficient, wideband wireless transmitters for the next generation of the cellular network. Since September 2016, he has been an Assistant Professor with the ELCA Group, Delft University of Technology. He has coauthored *Radio-Frequency Digital-to-Analog Converters* (Elsevier, 2016). His main research interests are the design of high-frequency and high-speed wireless/cellular communication and sensor systems and the field of wireline transceivers.

Dr. Alavi was the Best Paper Award Recipient of the 2011 IEEE International Symposium on Radio-Frequency Integrated Technology (RFIT). He was a recipient of the Best Student Paper Award (Second Place) of the 2013 Radio-Frequency Integrated Circuits (RFIC) Symposium. His Ph.D. student also won the Best Student Paper Award (First Place) of the 2017 RFIC Symposium held in Honolulu, HI, USA. He also serves as a Reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES (TMTT), the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS (TCAS-I), the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and the *IEICE Transactions on Communications*.