## Digital-Intensive Up-Converters for Wireless Communication

Yiyu Shen

**Document version:** Final published version

**Citation (APA):** Shen, Y. (2021). Digital-Intensive Up-Converters for Wireless Communication. [Dissertation (TU Delft), Delft University of Technology]. https://doi.org/10.4233/uuid:e42721d6-6bf6-47b3-a683-3497f3d917ae

To cite this publication, please use the final published version (if applicable).

**Copyright:** Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

## Dissertation

for the purpose of obtaining the degree of doctor at Delft University of Technology, by the authority of the Rector Magnificus, Prof. dr. ir. T. H. J. J. van der Hagen, chair of the Board for Doctorates, to be defended publicly on Tuesday 23 November 2021 at 12:30 o'clock

by

Yiyu SHEN

Master of Science in Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium, and Master of Engineering in Integrated Circuit Engineering, Tsinghua University, Beijing, China,

born in Beijing, China.

This dissertation has been approved by the promotor: Prof. dr. ing. L. C. N. de Vreede

Composition of the doctoral committee:

- Rector Magnificus, chairperson
- Prof. dr. ing. L. C. N. de Vreede, Delft University of Technology, promotor
- Dr. S. M. Alavi, Delft University of Technology, co-promotor

### Independent members:

- Dr. ir. F. van Rijs, Ampleon
- Prof. dr. P. Wambacq, Vrije Universiteit Brussel, Belgium
- Prof. dr. ir. P. G. M. Baltus, Eindhoven University of Technology
- Prof. dr. ir. B. Nauta, University of Twente
- Prof. dr. ir. W. A. Serdijn, Delft University of Technology
- Prof. dr. C. S. Vaucher, Delft University of Technology, reserve member

Yiyu Shen, Digital-Intensive Up-Converters for Wireless Communication, Ph.D. Thesis, Delft University of Technology.

Keywords: up-converters, digital-intensive transmitters (DTXs), digital power amplifiers (DPAs), polar transmitters, efficiency enhancement, phase modulators, direct-digital RF modulators (DDRMs), IQ-image, transmitter line-ups.

ISBN 978-94-6419-378-7

Copyright © 2021 by Yiyu Shen

Front & Back: cover by Pinpaishejibang, using a photo of the EWI building at TU Delft taken by Yiyu Shen, a photo of the TU Delft Library available at https://www.tudelft.nl/huisstijl/downloads, and a painting by Vincent Willem van Gogh (Dutch, Zundert 1853–1890 Auvers-sur-Oise) titled Wheat Field with Cypresses (1889), currently exhibited at the Metropolitan Museum of Art in New York.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the prior written permission of the copyright owner.

Printed in the Netherlands.
## Contents

- 1 Introduction
  - 1.1 Introduction
  - 1.2 CMOS TXs: Challenges and Opportunities
    - 1.2.1 Challenges
    - 1.2.2 Opportunities
  - 1.3 Thesis Objective
  - 1.4 Thesis Outline
  - References
- 2 Wireless TX line-ups Review
  - 2.1 Efficiency of Class-A, Class-B, and Class-AB PAs in Power Back-Off Region
  - 2.2 Analog Cartesian Line-Ups
  - 2.3 Polar TXs
    - 2.3.1 Supply Modulators
    - 2.3.2 Envelope Tracking System
  - 2.4 Digital-Intensive TX Line-Ups
    - 2.4.1 DPAs and Digital-Intensive Polar TXs
    - 2.4.2 DPA-Based Cartesian DTXs
    - 2.4.3 Direct-Digital RF Modulator
  - 2.5 Efficiency Enhancement Techniques
    - 2.5.1 Supply Switching: Class-G Operation
    - 2.5.2 Doherty Power Amplifiers
    - 2.5.3 Outphasing Power Amplifiers
  - 2.6 Conclusion
  - References
- 3 Fully Integrated Digital-Intensive Polar Doherty Transmitter
  - 3.1 Introduction
  - 3.2 Digital-Controlled Output Stage
    - 3.2.1 Doherty DPA
    - 3.2.2 ACW-AM and ACW-PM Curves
  - 3.3 System Architecture
  - 3.4 Implementation of the Output Stage
    - 3.4.1 Unit Cell Implementation
    - 3.4.2 DPA Bank Floorplan
    - 3.4.3 Thermometer Encoder and DFF
    - 3.4.4 Output Matching Network
  - 3.5 Implementation of Digital Baseband
  - 3.6 Implementation of LO Generation Circuits
  - 3.7 Digital Pre-Distortion
  - 3.8 Measurement Results
    - 3.8.1 Measurement Setup
    - 3.8.2 CW Measurement Results
    - 3.8.3 Single-Tone and Two-Tone Measurement Results
    - 3.8.4 Broadband Measurement Results
    - 3.8.5 Performance Summary and Comparison with the State-of-the-Art
  - 3.9 Conclusion
  - References
- 4 DAC-Based Wideband Phase Modulator
  - 4.1 Overview of Phase Modulators
    - 4.1.1 PLL-Based Phase Modulators
    - 4.1.2 Delay Line-Based Phase Modulators
    - 4.1.3 Cartesian-Based Phase Modulators
  - 4.2 Towards Wideband Cartesian-Based Phase Modulator
    - 4.2.1 Phase Error Source Analysis
    - 4.2.2 Proposed System Architecture
  - 4.3 Design of RFDAC
    - 4.3.1 Design of Unit Cell
    - 4.3.2 Floorplan
  - 4.4 Implementation of Limiter
  - 4.5 Measurement Results
    - 4.5.1 Single-Tone Test
    - 4.5.2 Modulated Signal Measurement
  - 4.6 Conclusion
  - References
- 5 A Wideband IQ DDRM with an IQ-Mapping Technique
  - 5.1 Direct-Digital RF Modulators
    - 5.1.1 Concept of Direct-Digital RF Modulators
    - 5.1.2 Comparison between DDRM and DPA-Based Cartesian DTXs
    - 5.1.3 Design Challenges of DDRMs
  - 5.2 IQ-Mapping Technique
    - 5.2.1 Improved Output Power and Efficiency
    - 5.2.2 Intrinsic Image Rejection
  - 5.3 System Architecture
  - 5.4 Implementation of DDRM
    - 5.4.1 Schematic of DDRM
    - 5.4.2 Binary Cells
    - 5.4.3 Floorplan of RFDAC
    - 5.4.4 LO Distribution Network
    - 5.4.5 Output Tree Layout
    - 5.4.6 Local Data Decoder
  - 5.5 Implementation of LO Clock Generation
  - 5.6 Implementation of Data Path
    - 5.6.1 Interpolation Filters
    - 5.6.2 DEM and Thermometer Encoder
  - 5.7 Measurement Results
    - 5.7.1 CW Test
    - 5.7.2 Single-Tone and Two-Tone Tests
    - 5.7.3 Broadband Signal Test
    - 5.7.4 Comparison with the State-of-the-Art
  - 5.8 Conclusion
  - References
- 6 […] Interleaved DDRM as a Driver for a Common-Gate/Common-Base PA
  - 6.1 Using the DDRM as the Driver for a CG/CB PA
  - 6.2 Auxiliary Current Division Path
  - 6.3 […] IQ-Mapping Unit Cell
    - 6.3.1 IQ-Mapping in Signed RFDAC
    - 6.3.2 Comparison with Unsigned IQ-Mapping Technique in Chapter 5
    - 6.3.3 Design of the Unit Cell
  - 6.4 […]
    - 6.4.1 Concept of Dynamic Biasing Technique
    - 6.4.2 Implementation
  - 6.5 Harmonic Rejection Technique
    - 6.5.1 Class-B Type Harmonic Rejection
    - 6.5.2 Influence of Amplitude and Phase Mismatch
    - 6.5.3 Implementation
  - 6.6 System Architecture
  - 6.7 Design Consideration of the CG/CB PA
  - 6.8 Experimental Results with Standalone CMOS Driver
    - 6.8.1 CW Measurement
    - 6.8.2 Single-Tone and Two-Tone Measurement
    - 6.8.3 Broadband Signal Measurement
    - 6.8.4 Comparison to State-of-the-Art TXs
  - 6.9 Experimental Results with a CB BJT PA
  - 6.10 Conclusion
- 7 Conclusion
  - 7.1 Thesis Outcome
  - 7.2 […]al Experiences and Contributions to Other DTXs
  - 7.3 Suggestions for Future Developments
  - References
- List of Figures
- List of Tables
- Summary
- Samenvatting
- List of Publications
- Acknowledgement

## Chapter 1: Introduction

## 1.1 Introduction

Nowadays, telecommunication is of profound importance to modern society, driven by the increasing demand for fast and reliable data streams. Cellular communication has evolved over several generations of wireless standards: from second-generation cellular, based on the global system for mobile communications (GSM) and enhanced data rates for GSM evolution (EDGE), to third-generation (3G) cellular systems, based on wideband code division multiple access (WCDMA), to fourth-generation (4G) cellular, using 3GPP long-term evolution (LTE), to today's fifth-generation (5G), based on the 3GPP 5G new radio (5G-NR) (Fig. 1.1). Besides cellular communication, various other applications such as localization or near-field communication (NFC) also rely on wireless channels. This has resulted in additional standards such as wireless local area networks (WLANs, IEEE 802.11), Bluetooth (BT)/Bluetooth low energy (BLE), and ultra-wideband (UWB), which have all been developed to fulfill the very different demands of these wireless links. All these wireless links need to co-exist and be co-integrated into portable devices such as tablets and smartphones.

Along with the ongoing iterations in wireless standards, another evolution never seems to stop, namely "Moore's Law".
Over the past decade, this law, which has steered the industries built on complementary metal-oxide-semiconductor (CMOS) processes, has seen the gate length scale down from 28 nm to 5 nm. For every reduction in size by a factor of two, the transistor density in a CMOS process doubles, while the transistors can operate at a lower supply voltage with reduced parasitics. Consequently, transistors have become faster and better suited to digital signal processing (DSP) applications. Moreover, their cut-off frequency has improved, enabling their use in RF analog/mixed-signal applications as well.

Figure 1.1: Evolution of cellular mobile communication standards. [1]

The improvements in CMOS technology offer a growing potential for expansion into (new) applications in (wireless) market segments which have been traditional strongholds of III-V technology, e.g. GaAs, GaN, or bipolar-based technologies such as SiGe. Until now, these more traditional RF technologies still offer the best performance in terms of linearity, output power, and efficiency when considering purely analog-oriented RF circuits. On the other hand, these analog implementations suffer from low integration, poor design logistics, and high costs, while their analog nature yields drift and circuit inaccuracies that can degrade spectral purity.

Since CMOS is already the technology of choice for the baseband and signal processing parts of wireless systems, the following question arises:

"What if we change the nature of the RF front-end, such that we can start truly benefiting from the power of CMOS in "digital" (switching) operations?"

It is this research question that provides the foundation of this thesis. To answer it, we will focus on the transmitter (TX) (and its modulator/pre-drivers) of a wireless system.
It is this key building block that proves to be the most power-hungry and poses the biggest challenges in electrical performance in terms of bandwidth, output power, linearity/spectral purity, and efficiency. Consequently, this dissertation will explore novel TX architectures that enable high-performance sub-6 GHz TX systems with simultaneously high spectral purity and efficiency. The concepts proposed in this thesis open up a new realm for revolutionary digital-intensive TX (DTX) line-ups that can offer higher functionality, higher performance, and increased integration at reduced costs. These properties are essential to the successful implementation of massive multiple-input-multiple-output (mMIMO) wireless systems, which require many more TX line-ups than conventional wireless systems (4, 16, 32, 64, 128, or even 256).

Figure 1.2: Roadmap of Moore's law. [2]

## 1.2 CMOS TXs: Challenges and Opportunities

### 1.2.1 Challenges

As mentioned above, the high cut-off frequencies of today's CMOS technologies have paved the way for high-performance CMOS-based analog circuits. However, their low output power is still one of the biggest obstacles in developing fully integrated TXs for wireless applications. As can be seen in Fig. 1.3, the achievable $P_{\rm sat}$ of a CMOS power device is much lower than that of devices fabricated in other technologies [3]. One of the reasons for this is the thin gate oxide layer in these modern CMOS processes. The need for such a thin oxide can be understood by considering the side view of the MOS transistor in Fig. 1.4 [4]. To suppress degradation of field-effect-transistor (FET) operation, the doping of the drain must be very high. Consequently, to boost transistor gain in analog operation and to keep the gate voltage swings low for digital operation, a very thin gate oxide is required. In the most advanced CMOS processes, the gate oxide is now thinner than 2 nm.
Such a thin oxide layer is not immune to breakdown when a high voltage is applied between the gate and source. Therefore, in most cases, the output power of a CMOS device is restricted by this breakdown voltage and is significantly lower than that of III-V technologies.

Figure 1.3: Saturated output power of different semiconductor device technologies vs. frequency. [3]

Another reason for inferior TX performance is the lossy CMOS substrate and its metal stack. These tend to degrade the performance of on-chip passive devices, such as inductors, transformers, and capacitors, which are commonly used in RF circuits. Due to the short distance between the metal layers used for routing and the low-ohmic substrate, as well as the absence of thick copper and aluminum layers in a CMOS process, the insertion losses of these passives are typically higher than in III-V processes. Therefore, the output power and energy efficiency of CMOS TX chipsets tend to be worse than those in other semiconductor technologies.

Consequently, to meet the high demands of today's smartphones, monolithic microwave integrated circuit (MMIC)-based power amplifiers (PAs) are used as the RF front-end while CMOS chipsets are only used to drive them. Figure 1.5 presents the front and rear of an iPhone 12 Pro motherboard. The central processing unit (CPU), intermediate frequency (IF) modules, UWB modules, and RF modules are all based on CMOS technology, while the MMIC front-ends are based on III-V technology. In summary, the CMOS chipsets are mainly focused on signal generation and signal decomposition, while III-V front-ends handle the RF signals going from/to the antennas. Therefore, all the chipsets implemented in III-V technologies are placed close to the antennas. One exception in Fig. 1.5 is the UWB chipset: because its RF output power is low, the UWB CMOS chipset can directly drive the antenna.
These disadvantages of CMOS processes worsen with process scaling: the supply voltage decreases with each process generation, along with the related transistor breakdown voltage. Unfortunately, the targeted RF output power does not scale down as CMOS technology evolves. Therefore, it becomes increasingly difficult to generate sufficient RF power with the most advanced CMOS technologies.

### 1.2.2 Opportunities

Despite the aforementioned challenges in TX design, employing advanced CMOS technologies also brings some key advantages. First of all, the speed of CMOS circuitry improves significantly as the technology evolves, providing an excellent platform for DSP. For example, in a 40 nm LP CMOS process, the minimum delay of a single inverter can drop below 20 ps, which is less than half of the delay in a 0.18 $\mu m$ process. As such, it can provide faster signal processing at an even lower level of power consumption. This enables the use of digital calibration and correction for analog circuits. Furthermore, the FETs in a modern CMOS process exhibit excellent switching characteristics. Switching PAs, such as class-D, class-E, and class-F PAs, can be implemented relatively easily in CMOS to achieve better efficiency than analog linear PAs. These rapid improvements have opened the door for so-called DTXs, which have narrowed the performance gap between CMOS-based TX implementations and their more conventional counterparts. As a result, DTXs have recently become a popular topic in the RFIC research domain.

Figure 1.4: Side view of typical CMOS devices. [4]

## 1.3 Thesis Objective

From the discussion above, one can conclude that the power efficiency and linearity of (traditional) CMOS-based TXs suffer from many compromises made in process and circuit design. To realize high-efficiency, high-linearity sub-6 GHz DTXs in a modern CMOS process, novelties at both the system and circuit level must be introduced.
The objective of this thesis is to develop these techniques and illustrate their effectiveness in proof-of-concept demonstrators.

In the first half of this dissertation, the polar TX architecture is chosen as the starting point due to its high efficiency. Next, modifications to the polar TX architecture are proposed to extend its modulation bandwidth, while still achieving an output spectrum with high purity. For this purpose, a dedicated ultra-wideband phase modulator with high phase accuracy has been developed. To facilitate the handling of complex modulated signals with a large peak-to-average power ratio (PAPR), a Doherty efficiency enhancement technique has been adopted to further enhance the polar TX line-up efficiency in the power back-off (PBO) region. To improve the overall polar TX line-up efficiency, the entire digital baseband circuitry has been co-integrated on-chip, including coordinate rotation digital computer (CORDIC) and digital pre-distortion (DPD) blocks, yielding the first fully functional system-on-chip (SoC) polar Doherty DTX demonstrator.

Although this polar demonstrator achieves state-of-the-art linearity performance, it still needs some DPD to meet the spectral mask requirements. In practical low-to-medium power applications, such a DPD unit, including an error-correction loop, could consume too much supply power and therefore might lower the achievable overall TX line-up efficiency. This is especially a concern when targeting very large modulation bandwidths (e.g., > 100 MHz).

Figure 1.5: Front and rear mainboard of an iPhone-12 (Courtesy of Apple Inc.).

Therefore, in the second part of this thesis, an alternative approach to the DTX design problem is explored. A very linear, wideband Cartesian DTX architecture is selected as the starting point, after which we aim to improve its power efficiency.
Given this strategy, two direct-digital RF modulators (DDRMs) are proposed that provide superior spectral purity and excellent wideband performance, with improved efficiency over conventional Cartesian DTX architectures.

## 1.4 Thesis Outline

This thesis is organized as follows.

- Chapter 2 gives an overview of the fundamentals used in conventional analog TX line-ups and PAs, and the evolution from analog-intensive TX line-ups towards DTX line-ups. In addition, this chapter gives a brief overview of efficiency enhancement techniques to improve the average efficiency of a wireless system when dealing with complex signals such as quadrature amplitude modulation (QAM) or orthogonal frequency division multiplexing (OFDM) signals.

After the introduction in Chapters 1 and 2, the thesis splits into two parts. The first part is focused on the design and implementation of a fully integrated digital-intensive polar Doherty TX, which, to the best of the author's knowledge, is the first-ever reported fully integrated SoC polar Doherty DTX. The second part of the thesis covers the design of very linear wideband DDRMs. Details of these two different concepts are outlined below.

- Chapter 3 provides the system-level discussion of the segmented class-E digital Doherty TX. A comprehensive analysis of the segmented switched-mode class-E output stages with their amplitude-control-word (ACW)-AM and ACW-PM distortion is given and extended here to also cover the behavior of the proposed Doherty configuration. The design and implementation details of the complete DTX chain are provided in this chapter, except for the wideband phase modulator, which is discussed in Chapter 4. The realized Doherty digital PA (DPA) achieves a measured drain efficiency of 49.4 % at peak power and 33.7 % at its 6 dB PBO point.
Using the on-chip DPD functionality, it can support 40 MHz 64-QAM signals at an operating frequency of 2.4 GHz, while achieving an average drain efficiency of 25 %.

- Chapter 4 concentrates on the design of an ultra-wideband phase modulator. The bandwidth expansion in polar TXs can exceed 3× to 5× the original modulation bandwidth, which poses severe design challenges for the phase modulator in these systems. Any bandwidth constraint in such a phase modulator limits the spectral purity of the output signal. In these phase modulators, a common source of phase distortion is the presence of third-order counter-intermodulation (C-IMD3) products, which result from the folding back of intermodulation (IM) products around the higher harmonics. In the proposed ultra-wideband phase modulator, harmonic rejection (HR) techniques are deployed to suppress these undesired mixing products, thus enhancing its phase linearity. To enlarge its video bandwidth, the proposed phase modulator uses an RFDAC-based Cartesian architecture that employs a current-steering topology. Measurement results show that this phase modulator can successfully support 80 MHz of modulation bandwidth with an error-vector-magnitude (EVM) of better than -27 dB, making it an excellent candidate for realizing (future) wideband polar TXs.

Chapters 5 and 6 cover the second part of this thesis and are mainly focused on the DDRM architecture to achieve high linearity performance combined with very large modulation bandwidths. In addition, several dedicated efficiency enhancement techniques are proposed for this architecture.

- Chapter 5 first provides a comprehensive comparison between DPA-based Cartesian DTXs and DDRMs. This shows the advantages and disadvantages of these two increasingly popular architectures.
Following that, we propose an advanced IQ-mapping technique for unsigned DDRM operation, which offers both an efficiency and a linearity advantage over conventional DDRM implementations. These improved properties are also supported by an analysis. The proposed DDRM architecture is demonstrated to support a video bandwidth up to 320 MHz without DPD.

Although unsigned DDRMs can provide superior linearity performance, the PAs they drive are the bottleneck in the linearity of the entire line-up. Conventionally, these PAs are configured in a common-source (CS)/common-emitter (CE) topology, which yields the highest gain but suffers from non-linear I-V relations. In a traditional analog design, these non-linear I-V curves are utilized to clip the waveforms in the output stage, and as such, boost the efficiency. However, at the same time, it is this non-linear behavior that yields distortion in the TX output signal.

- Chapter 6 addresses this problem by introducing a new TX line-up based on a signed DDRM that drives a common-gate (CG)/common-base (CB) output stage in pure current-mode operation. In this approach, the waveform clipping needed for improving the efficiency in the CG/CB PA is implemented by introducing a novel auxiliary current division path in the DDRM unit cells, which allows engineering of the current waveform. Furthermore, signed IQ-mapping, dynamic biasing, and class-B-like HR techniques are also applied to the proposed driver to boost the efficiency and linearity of the line-up. To adapt to the proposed CMOS driver, several special considerations are included in the design and layout of the CG/CB PA. Based on the experimental results, the proposed driver operates over a 1-3 GHz frequency range while generating 19.6 dBm peak RF power with 505 mW DC power consumption at 2.4 GHz. For a 160 MHz 256-QAM signal, the measured adjacent channel leakage ratio (ACLR) is better than -40.5 dBc.
When connected to a CB SiGe PA, the peak output power is about 27 dBm with a system efficiency of 20 %. When transmitting an 80 MHz 64-QAM signal at 2.2 GHz, the measured ACLR is -32 dBc without and -37.7 dBc with static bleeding current, with an EVM of -27 dB and -30 dB, respectively.

- Chapter 7 concludes the thesis with some recommendations for the direction of future work.

## References

- [1] X. Chen, "Status and Trends of Global 5G spectrum,"
- [2] Wikipedia, https://en.wikipedia.org/wiki/Moore%27s_law
- [3] H. Wang, T. Huang, N. Sasikanth Mannem, J. Lee, E. Garay, D. Munzer, E. Liu, Y. Liu, B. Lin, M. Eleraky, S. Li, F. Wang, A. S. Ahmed, C. Snyder, S. Lee, H. T. Nguyen, and M. Smith, "Power Amplifiers Performance Survey 2000-Present," [Online]. Available: https://gems.ece.gatech.edu/PA\_survey.html
- [4] K. Bult, "Basic analog CMOS design: an intuitive approach," TU Delft, 2012.

## Chapter 2: Wireless TX line-ups Review

This chapter gives an overview of today's most popular wireless TX line-ups and PA architectures. Section 2.1 focuses on the PA output stage itself and compares the efficiency of several conventional analog linear PAs. Sections 2.2 and 2.3 review traditional analog Cartesian and polar (up-converting) TX architectures, respectively. With this foundation in place, the digital-intensive counterparts of these popular architectures are introduced in Section 2.4. To address the needs of modern high-order modulation schemes, various efficiency enhancement techniques are discussed in Section 2.5. Finally, Section 2.6 concludes this chapter.

## 2.1 Efficiency of Class-A, Class-B, and Class-AB PAs in Power Back-Off Region

Linear PAs are often categorized according to their conduction angle, which is determined by the bias condition used in the final stage. When the resulting conduction angle is $2\pi$ or $\pi$, the PA is categorized as class-A or class-B, respectively.
Class-AB PA operation occurs when the conduction angle is between $\pi$ and $2\pi$. The use of lower conduction angles is mainly motivated by the ambition to improve PA efficiency.

Different efficiency definitions are used in the literature. In our discussion, we will use the drain efficiency $\eta_{\text{drain}}$. This term is defined as the RF output power delivered to the load, normalized by the DC power drawn from the supply source connected to the drain of a linear PA:

$$\eta_{\text{drain}} = \frac{P_{\text{RF}}}{P_{\text{DC}}} = \frac{V_{\text{RF}}}{\text{VDD}} \cdot \frac{I_{\text{RF}}}{I_{\text{DC}}} \tag{2.1}$$

where $P_{\rm RF}$ is the RF output power delivered to the load, $P_{\rm DC}$ is the total DC power consumption from the supply, $I_{\rm RF}$ is the root-mean-square (R.M.S) output current, $V_{\rm RF}$ is the R.M.S output voltage, VDD is the supply voltage connected to the drain, and $I_{\rm DC}$ is the DC current drawn from the supply.

As shown in (2.1), the drain efficiency can be split into two parts: the ratio between the RF voltage and DC voltage, and the ratio between the RF current and DC current. In an ideal case, when the PA is at peak power, the maximum instantaneous voltage amplitude $V_{\rm max}$ equals VDD. In a class-A PA, when $V_{\rm max}$ equals VDD, the R.M.S value of the sinusoidal output voltage is:

$$V_{\rm RF} = \frac{\rm VDD}{\sqrt{2}} \tag{2.2}$$

In a proper class-A design, the drain current at peak power has an amplitude equal to the class-A quiescent current, so its R.M.S value is:

$$I_{\rm RF} = \frac{I_{\rm DC}}{\sqrt{2}} \tag{2.3}$$

Therefore, after substituting (2.2) and (2.3) into (2.1), the peak drain efficiency of a class-A PA, occurring at peak power, is:

$$\eta_{\text{peak,class-A}} = \frac{V_{\text{RF}}}{\text{VDD}} \cdot \frac{I_{\text{RF}}}{I_{\text{DC}}} = 50 \% \tag{2.4}$$

Consequently, when the output amplitude is only half of the peak amplitude, both the output voltage and the output current are half of their peak values, while the DC voltage and current remain unchanged.
Therefore, the drain efficiency in the 6 dB PBO region (25 % of the peak power) for a class-A PA is:

$$\eta_{\text{6dB,class-A}} = 12.5 \% = \frac{1}{4} \eta_{\text{peak,class-A}} \tag{2.5}$$

In class-B PAs, the drain current is a half-wave rectified sine wave. With a conduction angle of $\pi$ and a peak drain current $I_{\max}$, the DC component is $I_{\max}/\pi$ and the fundamental amplitude is $I_{\max}/2$, so the R.M.S value of the fundamental output current is:

$$I_{\text{RF}} = \frac{\pi}{2\sqrt{2}} I_{\text{DC}} \tag{2.6}$$

while the maximum output voltage amplitude again equals VDD, so that:

$$V_{\rm RF} = \frac{\rm VDD}{\sqrt{2}} \tag{2.7}$$

Consequently, the maximum drain efficiency for a class-B PA is:

$$\eta_{\text{peak,class-B}} = \frac{V_{\text{RF}}}{\text{VDD}} \cdot \frac{I_{\text{RF}}}{I_{\text{DC}}} = \frac{\pi}{4} = 78.5 \% \tag{2.8}$$

In the 6 dB PBO region, the DC current is reduced in proportion to the output current while VDD remains the same, thus:

$$\eta_{\text{6dB,class-B}} = 39.2 \% = \frac{1}{2} \eta_{\text{peak,class-B}} \tag{2.9}$$

Therefore, the class-B PA can achieve higher drain efficiency both at peak power (78.5 % vs. 50 %) and in the 6 dB PBO region (39.2 % vs. 12.5 %), as shown in Fig. 2.1. However, linearity is typically degraded in a class-B PA compared to a class-A PA due to the non-linear I-V relationship of the amplifying transistor. In practice, to balance efficiency and linearity, the linear PA is usually biased between class-A and class-B mode (class-AB), which requires the conduction angle to be between $\pi$ and $2\pi$. The efficiency and linearity properties of a class-AB PA also lie between those of a class-A PA and a class-B PA.

Figure 2.1: Drain efficiency of an ideal class-A and class-B power amplifier versus PBO level, relative to the peak power condition at 0 dB.

In analog mixed-signal or digital-intensive designs, the definition of class-A and class-B is ultimately more general.
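
As a numerical cross-check of the ideal efficiencies in (2.4), (2.5), (2.8), and (2.9), the following minimal sketch evaluates the drain efficiency versus back-off. It assumes ideal devices with the output voltage swing reaching VDD at peak power, a fixed DC current for class-A, and a DC current proportional to the output amplitude for class-B. The function name `drain_efficiency` is hypothetical and chosen only for illustration.

```python
import math

def drain_efficiency(backoff_db: float, pa_class: str) -> float:
    """Ideal drain efficiency at a given power back-off (dB below peak power)."""
    # Normalized output amplitude: 1.0 at peak power, 0.5 at about 6.02 dB back-off.
    x = 10.0 ** (-backoff_db / 20.0)
    if pa_class == "A":
        # Assumption: DC current is fixed, so efficiency follows output power (x**2).
        return 0.5 * x ** 2
    if pa_class == "B":
        # Assumption: DC current scales with amplitude, so efficiency follows x.
        return (math.pi / 4.0) * x
    raise ValueError("pa_class must be 'A' or 'B'")

SIX_DB = 20.0 * math.log10(2.0)  # exact half-amplitude back-off, about 6.02 dB

for pbo in (0.0, SIX_DB, 2.0 * SIX_DB):
    eta_a = drain_efficiency(pbo, "A")
    eta_b = drain_efficiency(pbo, "B")
    print(f"{pbo:5.2f} dB back-off: class-A {eta_a:.3f}, class-B {eta_b:.3f}")
```

Under these assumptions, the class-A efficiency drops with the square of the output amplitude, whereas the class-B efficiency drops only linearly, which is why the gap between the two widens in back-off.
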
Class-A circuits are typically defined as cases where the DC current does not scale down with the amplitude of the output signal, while class-B circuits are those where the DC current scales down linearly with the output signal. As in the linear PA discussion, digital-intensive class-AB-like circuits are those that operate in between class-A and class-B.

## 2.2 Analog Cartesian Line-Ups

Typically, an RF TX line-up should perform signal modulation, up-conversion, and amplification. A typical block diagram of the conventional analog Cartesian line-up is shown in Fig. 2.2. Usually, it consists of a digital baseband, digital-to-analog converters (DACs), low-pass filters (LPFs), mixers, and a PA.

Figure 2.2: Block diagram of an analog-intensive Cartesian TX.

In modern wireless TXs, the data to be transmitted is mostly in the form of quadrature signals, i.e., IQ data. Such IQ signals are pre-processed in a digital baseband block. A typical processing step is pulse shaping, which transforms the rectangular data waveform into a much smoother pulse shape and, as such, suppresses inter-symbol interference (ISI). Although such pulse shaping can also be done in the analog domain, such a filter would be bulky, especially for a lower signal bandwidth. What is more, different wireless standards usually need different pulse shapes. For example, in Gaussian frequency-shift keying (GFSK), the pulse shape is Gaussian, whereas in QAM, a square-root raised cosine (SRRC) filter is commonly used. Therefore, to make the TX unit more compact and compatible with various wireless standards, this pulse shaping is almost exclusively done in the digital domain. After the digital baseband, the digital data with a sampling rate of $F_s$ is transferred to the analog domain by the baseband DAC, which implicitly acts as a zero-order hold (ZOH) interpolation filter.
Consequently, since there will be sampling spectral replicas at $F_s$ and its harmonics, an LPF is used to filter them out. Next, a quadrature mixer configuration up-converts the IQ baseband signals to the RF domain, where the resulting complex modulated signal is further amplified by the drivers and the final PA stage before transmission by the antenna. In these analog-intensive direct-conversion TXs, the design challenges are mostly centered around the PA design, which needs to provide the targeted output power to the antenna with adequate linearity and good efficiency. These PAs typically employ large transistors. Therefore, a pre-driver is often needed between the mixers and the PA. Note that in a TX line-up, since the signal amplitude is high, the noise of the mixers is considered to be less critical than in the receiver chain.

## 2.3 Polar TXs

Another popular TX architecture is the polar architecture, which is sometimes also known as the envelope elimination and restoration (EER) configuration. A typical polar TX line-up is shown in Fig. 2.3.

Figure 2.3: Block diagram of a polar TX line-up.

Here, the IQ signal is decomposed into a digital amplitude modulation (AM) signal and a phase modulation (PM) signal by applying a CORDIC algorithm during the digital baseband signal processing:

$$\begin{cases} AM = \sqrt{I^2 + Q^2} \\ PM = \tan^{-1} \frac{Q}{I} \end{cases} \tag{2.10}$$

The PM data is used to modulate the LO signal in a phase modulator. The resulting analog phase-modulated output signal, with a constant envelope, is amplified by an efficient PA, while the AM information is restored through the PA supply modulation. Because the AM information is conveyed by the supply modulation, a non-linear switching PA (e.g., class-D, class-E, or class-F) can be employed to amplify the PM signal. This allows a much higher efficiency to be achieved compared to the use of linear PAs (e.g., class-A and class-AB PAs).
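The polar decomposition in (2.10) can be illustrated with a few lines of code. The sketch below is not the CORDIC implementation referred to in the text; it uses floating-point library functions instead, with the four-quadrant `atan2` standing in for $\tan^{-1}(Q/I)$. The function names and the two straight-line test trajectories are assumptions made purely for illustration. The example also hints at why the PM signal needs more bandwidth than the IQ signal: when the IQ trajectory passes close to the origin, the phase changes very quickly.

```python
import math


def iq_to_polar(i, q):
    """Return (AM, PM) for one IQ sample, following (2.10)."""
    am = math.hypot(i, q)
    pm = math.atan2(q, i)  # four-quadrant version of tan^-1(Q/I)
    return am, pm


def polar_to_iq(am, pm):
    """Inverse mapping, used here only to check the decomposition."""
    return am * math.cos(pm), am * math.sin(pm)


def largest_phase_step(iq_samples):
    """Largest sample-to-sample phase change (radians) along a trajectory."""
    phases = [iq_to_polar(i, q)[1] for i, q in iq_samples]
    steps = []
    for previous, current in zip(phases, phases[1:]):
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        wrapped = (current - previous + math.pi) % (2 * math.pi) - math.pi
        steps.append(abs(wrapped))
    return max(steps)


# Hypothetical trajectories: I sweeps linearly from -1 to 1 at constant Q.
near_origin = [(-1 + 2 * k / 20, 0.05) for k in range(21)]
far_from_origin = [(-1 + 2 * k / 20, 1.0) for k in range(21)]

if __name__ == "__main__":
    print("largest phase step near origin:", largest_phase_step(near_origin))
    print("largest phase step far from origin:", largest_phase_step(far_from_origin))
```

Both trajectories move at the same rate in the IQ plane, yet the one passing close to the origin produces a much larger phase step per sample.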
By using an ideal switching PA and DC-to-DC supply converter, this architecture can achieve 100 % efficiency in theory. Although the use of a switching PA boosts the efficiency in polar TXs, it comes at the expense of bandwidth expansion. Figure 2.4(a) shows the spectrum of the original signal and of the related AM and PM signals in a polar TX. The bandwidth of the original signal in Fig. 2.4(a) is only 20 MHz, but due to the non-linear decomposition in (2.10), the bandwidths of the AM and PM signals are extended. In particular, the PM signal has a bandwidth that is 3 to 5 times larger than that of the original signal. As a consequence, the phase modulator must be able to handle a signal with a much larger bandwidth than the original signal, which significantly increases the design burden. What is worse, any mismatch between the group delays of the AM and PM paths is also a source of distortion. Such a mismatch results in phase errors which can severely corrupt the ACLR and EVM performance. This is illustrated in Fig. 2.4(b). When considering a 10 MHz 64-QAM signal and a delay mismatch of 800 ps, the EVM and ACLR are degraded to -45 dB and -54 dBc, respectively. Note that this delay alignment requirement becomes increasingly stringent with increasing video bandwidth.

Figure 2.4: (a) Spectrum of IQ signal, AM and PM signal in a polar TX; (b) degradation of ACLR and EVM vs. AM/PM delay mismatch in a polar TX with 10 MHz 64-QAM signal.

## 2.3.1 Supply Modulators

As shown in Fig. 2.4(a), supply modulators need to handle at least three times the original signal bandwidth. Therefore, the supply modulator usually appears as the efficiency and linearity bottleneck in a polar TX. The simplest implementation of a supply modulator uses a linear voltage regulator (shown in Fig. 2.5(a)). This approach provides a reasonable bandwidth, a good power supply rejection ratio (PSRR), and potentially excellent spectral purity ([1]).
However, in the PBO region, its efficiency would still follow the class-B-like roll-off. This can be understood by considering that the switching PA in Fig. 2.5(a) draws the same current from its regulated supply as the linear regulator does from the supply voltage. Another way of looking at it is that the voltage drop across the linear regulator is wasted and, as such, degrades the overall TX efficiency. In these configurations, the feedback loop will, in the end, limit the bandwidth of the supply modulator. A switching-regulator-based dynamic DC-to-DC converter (Fig. 2.5(b)) has been proposed in the literature to boost the efficiency ([2]). However, these configurations can only support small video bandwidths that comprise only a small fraction of the switching frequency. Ignoring this constraint would result in voltage ripples on their outputs, yielding extra spurious components. To support higher bandwidths while still meeting the stringent spectral requirements of modern communication systems, a sufficiently high switching frequency in the DC-DC converter is required, which degrades the efficiency.

Figure 2.5: Supply modulator: (a) linear regulator; (b) switching regulator.

## 2.3.2 Envelope Tracking System

The key bottleneck to overcome for the supply modulator in a polar system in achieving high bandwidth lies in the bandwidth limitation of the feedback loop. Other bottlenecks are the phase modulator bandwidth and the precise AM-PM alignment requirement in terms of group delay. Envelope tracking (ET) systems have been proposed to avoid the problems of the feedback loop bandwidth and the AM-PM path alignment ([3] and [4]). A conceptual block diagram of this type of system is shown in Fig. 2.6. In an ET system, the AM information of the TX signal is used to control a supply modulator, such that it tracks the desired output signal envelope with some margin (voltage headroom).
In contrast to an EER system, the signal entering the PA contains both AM and PM information, thereby avoiding the alignment problem. Consequently, a linear PA is needed for the RF signal amplification. It is now the PA itself that determines the signal quality rather than the AM path with a supply modulator, as in EER. This makes the ET architecture much more relaxed in terms of implementation, and therefore higher bandwidths can be achieved compared to the EER system. The achievable efficiency of an ET system can be very good ([4]), but it is not as high as that of an EER system due to the use of a linear amplifier and the remaining voltage headroom in the supply voltage tracking. Also, the ET supply modulator can be relaxed further in terms of bandwidth (low-bandwidth path in Fig. 2.6), since it no longer needs to precisely track/define the output signal envelope. By using appropriate data processing to reduce the bandwidth, it can even be made relatively slow. By using these ET approaches, bandwidths up to 40 MHz ([4]) can be achieved.

Figure 2.6: (a) Block diagram of envelope tracking system; (b) time domain waveform of AM signal [3].

## 2.4 Digital-Intensive TX Line-Ups

Recently, DTXs have gained attention due to their excellent hardware scalability with nanoscale CMOS and their great potential to incorporate extensive digital correction circuitry such as DPD. These properties are essential in achieving high system integration, linearity, and efficiency at a low cost. This section gives a short review of existing DTX architectures. It divides DTXs into two categories: polar DTXs and Cartesian DTXs. For the Cartesian DTXs, we will discuss the DPA-based Cartesian DTXs and DDRMs.

Figure 2.7: Equivalent model of a DPA in polar DTXs.
## 2.4.1 DPAs and Digital-Intensive Polar TXs

One of the underlying reasons that an ET system can support a higher video bandwidth than an EER system is the fact that the up-converter is an open-loop system, which can provide larger bandwidths than a closed-loop system. Similarly, direct-digital amplitude modulation is also an open-loop approach that can be applied in polar DTXs. As such, the digital-intensive approach opens up new possibilities to optimize for both TX efficiency and linearity ([5] and [6]). In a direct-digital amplitude modulator, AM modulation is achieved by turning on and off a discrete number of PA unit elements. A conceptual block diagram of this approach is shown in Fig. 2.8. Since direct amplitude modulation is realized with the RFDAC operation, there is no longer a need for a baseband DAC and frequency up-converter, which allows for a decrease in the power consumption of the overall system.

Figure 2.8: Block diagram of a polar DTX with direct AM modulation.

An equivalent model of the DPA is shown in Fig. 2.7, where $G_{\rm ACT}$ and $G_{\rm DIS}$ stand for the admittance of each PA unit cell in the on and off state, respectively. To achieve a high efficiency, $G_{\rm ACT}$ should be much larger than $G_{\rm DIS}$. The fact that this configuration functions as a (non-linear) resistive divider implies that the linearity will be corrupted. Therefore, in practical implementations, there will be a trade-off between the linearity and efficiency performance with straightforward linear activation of the PA elements. This will be analyzed in more detail in Chapter 3.

When benchmarking the direct-digital amplitude modulation technique for its efficiency versus PBO behavior, as shown in Fig. 2.9, we again find a similar class-B-like efficiency roll-off, since the output current scales with the signal amplitude.

Figure 2.9: Efficiency versus PBO in a typical polar TX.
Note that the achievable (theoretical) peak efficiency depends on the applied (harmonic) impedance matching of the PA stage and on the duty-cycle ([7] and [8]).

In terms of spectral purity, DPAs are often inferior to their linear counterparts for two reasons. First, their sample-and-hold nature, combined with the fact that no low-pass filtering takes place (unlike in the analog TX line-up), yields spectral replicas at the sampling frequency and its harmonics. Second, the effective number of bits (ENOB) is usually limited to 3-6 bits, which limits the out-of-band (OOB) noise floor. In some cases, the DPA needs RF filtering at its output to lower the OOB noise floor. This can severely limit its application in high-power wireless infrastructure systems, where stringent spectral performance is required.

As in the analog polar approach, the digital-intensive polar TX architecture also requires a wideband phase modulator. A phase modulator for this purpose using a digital-intensive approach will be discussed in Chapter 4. Consequently, by using digital-intensive AM and PM modulators, polar TXs can be made truly "digital". This also provides an excellent starting point to incorporate extensive digital signal correction circuitry such as DPD on the same transceiver IC. Note that in a polar system, this is done quite easily due to its relatively independent amplitude-control-word (ACW)-AM and ACW-PM behavior, allowing a two-times 1D correction approach ([5]).

## 2.4.2 DPA-Based Cartesian DTXs

A conceptual diagram of a Cartesian-based DTX, which was first introduced in [9], is shown in Fig. 2.10(a). The key difference with the polar approach is that in the Cartesian DTX, there are two DPA branches (instead of one). Each DPA has a fixed-phase input signal, and these input phases are 90° apart (instead of a single input with a varying phase, as found in the polar DTX).

Figure 2.10: (a) Block diagram of a DPA-based Cartesian DTX and (b) its constellation diagram [10].
At the output of a Cartesian DTX, vector summing is applied. As a result, the entire constellation diagram can be constructed (Fig. 2.10(b)). An advantage of this architecture over polar DTXs is the absence of a wideband phase modulator, which is often the bottleneck to achieving high linearity for a large modulation bandwidth. Compared to the conventional analog Cartesian TX in Section 2.2, these Cartesian DTXs can achieve a higher system efficiency due to the absence of baseband DACs and quadrature mixers, which typically all operate in class-A conditions.

Unfortunately, the linearity of these Cartesian DTXs is typically not as good as that of a conventional analog Cartesian line-up. This is mainly due to the I-branch/Q-branch interaction in the Cartesian DTX. This interaction worsens when using 50 % duty-cycle summing [11] or when aiming for a high-efficiency implementation of the I- and Q-DPA branches [10], which do not rely on the use of unit cell current sources. IQ interaction cannot be modeled as AM-AM and AM-PM distortion: in contrast to polar systems (two-times 1D), it requires a full 2-dimensional (2D) signal correction in the DPD approach. An analysis of this IQ interaction will be given in Chapter 5.

Figure 2.11: Equivalent model of DPA-based Cartesian DTXs.

## 2.4.3 Direct-Digital RF Modulator

DPA-based Cartesian DTXs are typically centered around a PA-stage design and then extended to other building blocks, including the mixer. This places emphasis on efficiency at the cost of linearity. Alternatively, the design can be centered more around the modulator, with the focus on obtaining superior linearity and with less emphasis on efficiency. There is common ground between DPA-based Cartesian DTXs and DDRMs, since both of them employ a "digitized" version of the conventional Cartesian architecture. In view of this, Fig. 2.12 shows typical designs of a baseband DAC and an active up-converter for a single channel (I or Q). The baseband DAC usually employs a current-steering topology, which is well known for its superior linearity. The up-converter topology in Fig. 2.12(b) is essentially a Gilbert cell, used to improve linearity and noise performance. Usually, in an analog up-converter line-up, there is an LPF between the DAC and the mixer to filter out the sampling spectral replicas.

Figure 2.12: Typical schematic of (a) a baseband DAC and (b) an active up-mixer.

To implement the RF modulator in a more straightforward and digital-intensive way, the baseband DAC and up-converter can be merged together. Such "direct" circuits are called DDRMs, which were first introduced in [12] (Fig. 2.13). Note that the DDRM presented in [12] includes both the I- and Q-branch. The DDRM essentially is a current-steering RFDAC with a 1-bit digital mixer inside each unit cell. By using a high sampling frequency with related data interpolation, the sampling spectral replicas can be pushed far away, and the LPFs can be avoided. Due to the use of the current-steering topology, the system efficiency of a DDRM is not optimized. For example, in [12], the DDRM generates a WCDMA signal with an average power of -10 dBm while consuming 73 mW of DC power, yielding an overall efficiency of 0.14 %. A more extensive theoretical comparison between the DPA-based Cartesian DTXs and DDRMs will be presented in Chapter 5.

Figure 2.13: DDRM proposed in [12].

## 2.5 Efficiency Enhancement Techniques

In modern communication systems, due to the use of high-order modulation schemes, the output power of a modulated TX signal equals its peak power only a small fraction of the time. Consequently, it is more meaningful to consider the average efficiency instead of the peak efficiency. This average efficiency depends both on the amplitude distribution of the modulated TX signal and on the TX efficiency in PBO.
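To make the notion of average efficiency concrete, it can be written as the average RF output power divided by the average DC power, both weighted by how often each amplitude level occurs. The sketch below evaluates this for the ideal class-A and class-B models of Section 2.1. The four-level amplitude histogram is a made-up example, not the statistics of an actual LTE or WLAN signal, and all function names are hypothetical.

```python
import math


def class_a_dc_power(amplitude):
    """Ideal class-A DC power, normalized to a peak output power of 1.

    It is independent of the amplitude and equals 2 (50 % peak efficiency).
    """
    return 2.0


def class_b_dc_power(amplitude):
    """Ideal class-B DC power, normalized to a peak output power of 1.

    It scales linearly with the amplitude and equals 4/pi at peak power
    (78.5 % peak efficiency).
    """
    return (4 / math.pi) * amplitude


def average_efficiency(amplitudes, weights, dc_power):
    """Average efficiency = weighted mean output power / weighted mean DC power.

    `amplitudes` are normalized to the peak amplitude (0..1) and `weights`
    are the probabilities of each amplitude level (they should sum to 1).
    """
    p_out = sum(w * a ** 2 for a, w in zip(amplitudes, weights))
    p_dc = sum(w * dc_power(a) for a, w in zip(amplitudes, weights))
    return p_out / p_dc


# Hypothetical amplitude histogram, for illustration only.
amplitudes = [0.25, 0.5, 0.75, 1.0]
weights = [0.25, 0.25, 0.25, 0.25]

if __name__ == "__main__":
    print("class-A average:", average_efficiency(amplitudes, weights, class_a_dc_power))
    print("class-B average:", average_efficiency(amplitudes, weights, class_b_dc_power))
```

Averaging the powers, rather than averaging the instantaneous efficiency values directly, ensures that amplitude levels with little output power contribute little to the result.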
The amplitude distribution of a modulated signal can be represented by its probability density function (PDF). To put this in perspective, the PDFs of an LTE and a WLAN signal are shown in Fig. 2.14 ([13]), together with the ideal class-AB PA efficiency curve as a function of the PBO level. As can be observed, the average output power of PAs for WLAN and LTE signals is, respectively, about 6 dB and 13 dB less than the peak output power. However, to remain linear, the PA is not allowed to clip the signal. Therefore, as can be concluded from Fig. 2.14, the average drain efficiency of the class-AB PA for these signals is close to only 37 % and 24 % of the peak drain efficiency, respectively.

Figure 2.14: Efficiency of a class-AB PA and the PDFs of an LTE signal and a WLAN signal vs. PBO level [13].

To mitigate this waste of supply energy, different efficiency enhancement techniques have been proposed over time. The most popular efficiency enhancement techniques can be categorized into two main groups: supply modulation and load modulation. In Section 2.3, analog supply modulation techniques such as EER and ET have already been briefly discussed. In this section, we introduce an open-loop supply switching technique, namely class-G PA operation (also called the dual supply modulator). Next, two popular load modulation techniques are discussed, namely the Doherty and outphasing techniques.

## 2.5.1 Supply Switching: Class-G Operation

As discussed above, the supply modulator often fails to track the amplitude of large-bandwidth signals. A more straightforward approach is therefore to use a few discrete power supplies (often two) and to select between them using switches (class-G), which was first introduced in [14]. A conceptual diagram is shown in Fig. 2.15(a). There are two supplies in the system.
When the signal amplitude is below the 6 dB PBO level, the supply voltage is set to VDD/2, and when it is above the 6 dB PBO level, the supply voltage is set to VDD. The efficiency versus the normalized output voltage is shown in Fig. 2.15(b) ([14]). As can be noted, the efficiency below the 6 dB PBO level is enhanced, and with it the average efficiency. In a practical PA design, however, there is usually an RF choke in the supply feed. Therefore, when the supply is switched, there will be a glitch, both in the current and in the supply voltage provided. As a result, most class-G implementations use supply voltage switching for average power tracking (APT). Recently, several researchers ([15] and [16]) have employed this concept in supply modulation by dynamically switching the power supply and tracking the envelope of the signal, thus achieving an efficiency enhancement for modulated signals similar to that of ET. However, the increased distortion due to the glitch prevents it from achieving a large bandwidth with good signal quality.

Figure 2.15: (a) Top-level topology of the class-G PA; (b) drain and average efficiency comparisons for a PA with dual supply versus a class-AB PA [14].

## 2.5.2 Doherty Power Amplifiers

The Doherty PA topology achieves enhanced PBO efficiency by using active load modulation ([17]). Since it can potentially support a large modulation bandwidth with only low-complexity hardware and without extra computations on the input signals, its simplicity and good performance have made it a popular choice for base station applications. The basic Doherty PA topology is shown in Fig. 2.16(a), and its equivalent circuit topology is shown in Fig. 2.16(b).

Figure 2.16: (a) Basic Doherty PA; (b) equivalent schematic of the Doherty PA.
It is composed of two branches, namely a main and a peak branch, and a $\lambda/4$ transmission-line-based impedance inverter that acts as a Doherty output power combiner. The Doherty operation itself relies on the predefined interaction between the two PA paths. To obtain this desired interaction for analog implementations, the main PA is biased in class-AB operation, while the peak PA is biased in class-C operation. When the output signal is below 6 dB PBO ($V_{\text{out}}$ is smaller than $V_{\text{out,peak}}/2$), only the main PA is active (Fig. 2.17), while the peak PA remains off. In this mode, the impedance seen by the main PA is constant and is given by the impedance inverter action on $Z_L$, yielding:

$$Z_{\text{main}} = \frac{Z_o^2}{Z_L} \tag{2.11}$$

Figure 2.17: Load impedance (a) and voltage swing (b) of the main PA and peak PA of a Doherty PA; and (c) efficiency curve vs. PBO level.

When the signal is between 6 dB PBO and 0 dB PBO ($V_{\text{out}}$ is between $V_{\text{out,peak}}/2$ and $V_{\text{out,peak}}$), the peak PA is activated, and the load impedance seen by the main PA becomes:

$$Z_{\text{main}} = \frac{Z_o^2}{\frac{I_{\text{main}} + I_{\text{peak}}}{I_{\text{main}}} Z_L} \tag{2.12}$$

where $Z_o$ is the characteristic impedance of the transmission line, $I_{\text{main}}$ and $I_{\text{peak}}$ are the currents of the main and peak branches, respectively, and $Z_L$ is the load impedance. Note that due to the peak PA activation at 6 dB PBO and the impedance inversion, the impedance seen by the main PA is lowered with increasing $I_{\text{peak}}$. This keeps the main amplifier from clipping the signal, while its output voltage swing is kept close to the supply voltage (Fig. 2.17(b)). As a result, the maximum drain efficiency is achieved for the main PA. The peak PA will only reach its maximum output voltage swing and efficiency at full output power.
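The load modulation described by (2.11) and (2.12) can be evaluated numerically. The sketch below simply computes (2.12), which reduces to (2.11) when the peak current is zero. The component values are assumptions chosen for illustration, namely the textbook symmetric case with $Z_o = 50\,\Omega$ and $Z_L = Z_o/2$; they are not taken from this work, and the function name is hypothetical.

```python
def main_pa_load(z0, z_load, i_main, i_peak):
    """Load impedance seen by the main PA according to (2.12).

    With i_peak = 0 this reduces to (2.11), i.e. z0**2 / z_load.
    """
    if i_main <= 0:
        raise ValueError("i_main must be positive")
    return z0 ** 2 / ((i_main + i_peak) / i_main * z_load)


# Assumed example values (symmetric textbook Doherty, not from the source).
Z0 = 50.0
Z_LOAD = Z0 / 2

if __name__ == "__main__":
    for i_peak in (0.0, 0.5, 1.0):
        z_main = main_pa_load(Z0, Z_LOAD, i_main=1.0, i_peak=i_peak)
        print(f"I_peak/I_main = {i_peak:3.1f} -> Z_main = {z_main:6.2f} ohm")
```

Under these assumptions, the main PA sees twice the characteristic impedance while the peak PA is off, and the load drops towards the characteristic impedance as the peak current approaches the main current.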
Figure 2.17(c) provides the resulting efficiency of the overall Doherty PA versus PBO level and, as such, shows the well-known Doherty efficiency enhancement in the PBO region. Doherty operation can also be achieved using energy-efficient RFDACs for the main and peak PA implementation, offering new opportunities to achieve higher TX system integration and performance. We will discuss such an implementation in Chapter 3.

## 2.5.3 Outphasing Power Amplifiers

Implementing an outphasing TX is another well-known technique to obtain efficiency enhancement in the PBO region. In such a TX, the output signal is created by two (saturated) branch PAs, which produce two constant-envelope signals of equal amplitude that are vector-summed in an output power combiner. The output amplitude of this combiner depends on the relative phase difference between the signals in the two branches. The use of saturated or switch-mode branch amplifiers enhances the TX peak efficiency, which can theoretically reach 100 %. The efficiency in PBO operation, however, is highly dependent on the type of output power combiner used. For example, if an isolating power combiner is used, e.g., a Wilkinson power combiner, the effective loading of the branch PAs remains constant and equal to their $R_{\rm opt}$ for maximum output power. Consequently, no PBO efficiency enhancement takes place, and all excess RF power generated by the branch PAs is absorbed by the isolation resistor in the Wilkinson combiner. When using a non-isolating power combiner, e.g., a Chireix combiner, the loading of the branch amplifiers becomes dynamically modulated with the outphasing angle, allowing PBO efficiency enhancement to be achieved. However, this Chireix combiner is inherently narrowband, as it relies on "matched" positive and negative reactance compensation for the PA branches. This limits its use to narrowband applications. Outphasing TXs can be realized in both analog and digital-intensive implementations.
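The vector summing of two constant-envelope branch signals can be sketched as follows. This is the textbook outphasing decomposition, written with floating-point library calls rather than the adder-based hardware referred to in the text. It assumes a peak output amplitude normalized to 1, so that each branch has a fixed amplitude of 0.5. The function name is hypothetical.

```python
import cmath
import math


def outphasing_decompose(amplitude, phase):
    """Split a target signal into two constant-envelope branch signals.

    `amplitude` is normalized to the peak output amplitude (0..1) and
    `phase` is in radians. Each branch has a fixed magnitude of 0.5, and
    the branches are rotated by +/- theta around the target phase, where
    theta = acos(amplitude) is the outphasing angle.
    """
    clipped = min(max(amplitude, 0.0), 1.0)  # keep acos() within its domain
    theta = math.acos(clipped)
    s1 = 0.5 * cmath.exp(1j * (phase + theta))
    s2 = 0.5 * cmath.exp(1j * (phase - theta))
    return s1, s2


if __name__ == "__main__":
    for amplitude in (1.0, 0.5, 0.0):
        s1, s2 = outphasing_decompose(amplitude, phase=0.3)
        print(f"A = {amplitude:3.1f}: |S1| = {abs(s1):.3f}, "
              f"|S2| = {abs(s2):.3f}, |S1 + S2| = {abs(s1 + s2):.3f}")
```

The sum of the two branch signals has magnitude $\cos\theta$ and the target phase, so all of the amplitude information is carried by the outphasing angle while both branches keep a constant envelope.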
Digital-intensive implementations have a better performance potential than their analog counterparts. However, the computational overhead can be considerable. Specifically, the number of adders required for outphasing signal decomposition is always much larger than that required for a conventional CORDIC in a polar line-up. Especially when targeting low or moderate TX output power, the high computational power can dramatically degrade the overall TX line-up efficiency.

## 2.6 Conclusion

This chapter discusses the efficiency and bandwidth considerations of wireless TX architectures. First, linear class-A and class-(A)B operation with the related efficiency versus PBO are discussed. Next, conventional analog TX line-up architectures, namely the polar and analog Cartesian approaches, are explained and evaluated in terms of modulation bandwidth. With this established, the next step is taken toward designing their (new) digital-intensive polar and Cartesian counterparts. These new concepts provide a promising path to higher integration while opening up new opportunities to dramatically improve the performance of wireless systems. Two classes of DTXs are identified. The first is the DPA topology, which is mostly focused on achieving high efficiency through switch-mode PA operation. The second is the DDRM approach, which favors the use of current-steering-oriented architectures to achieve superior linearity. Finally, this chapter explains the need to boost the efficiency in PBO and briefly describes the most popular efficiency enhancement techniques for this purpose. Building on this foundation of wireless systems, we are ready to enter the field of advanced DTXs.

## References

- [1] E. McCune, "Operating Modes of Dynamic Power Supply Transmitter Amplifiers," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 11, pp. 2511-2517, Nov. 2014.
- [2] V. Pinon *et al.*, "A Single-Chip WCDMA Envelope Reconstruction LDMOS PA with 130MHz Switched-Mode Power Supply," *2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers*, San Francisco, CA, 2008, pp. 564-636.
- [3] J. Jeong *et al.*, "Wideband Envelope Tracking Power Amplifiers With Reduced Bandwidth Power Supply Waveforms and Adaptive Digital Predistortion Techniques," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 57, no. 12, pp. 3307-3314, Dec. 2009.
- [4] D. Chowdhury *et al.*, "2.2 A fully integrated reconfigurable wideband envelope-tracking SoC for high-bandwidth WLAN applications in a 28nm CMOS technology," *2017 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2017, pp. 34-35.
- [5] L. Ye *et al.*, "Design Considerations for a Direct Digitally Modulated WLAN Transmitter With Integrated Phase Path and Dynamic Impedance Modulation," in *IEEE Journal of Solid-State Circuits*, vol. 48, no. 12, pp. 3160-3177, Dec. 2013.
- [6] S. Zheng *et al.*, "A WCDMA/WLAN Digital Polar Transmitter With Low-Noise ADPLL, Wideband PM/AM Modulator, and Linearized PA," in *IEEE Journal of Solid-State Circuits*, vol. 50, no. 7, pp. 1645-1656, July 2015.
- [7] S. Cripps, *RF Power Amplifiers for Wireless Communications*, Artech House Publishers.
- [8] D. Mul *et al.*, "Efficiency and Linearity of Digital "Class-C Like" Transmitters," *2020 50th European Microwave Conference (EuMC)*, Utrecht, the Netherlands, 2020, pp. 1-4.
- [9] M. S. Alavi *et al.*, "All-Digital RF I/Q Modulator," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 60, no. 11, pp. 3513-3526, Nov. 2012.
- [10] M. S. Alavi *et al.*, "A Wideband 2× 13-bit All-Digital I/Q RF-DAC," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 4, pp. 732-752, April 2014.
- [11] Z. Deng *et al.*, "9.5 A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40nm CMOS," *2016 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2016, pp. 172-173.
- [12] P. Eloranta *et al.*, "Direct-digital RF modulator IC in 0.13 µm CMOS for wide-band multiradio applications," *2005 IEEE International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers*, San Francisco, CA, 2005.
- [13] C. Ramella *et al.*, "High Efficiency Power Amplifiers for Modern Mobile Communications: The Load-Modulation Approach," *Electronics* 2017, 6, 96.
- [14] J. S. Walling *et al.*, "A Class-G Supply Modulator and Class-E PA in 130 nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 44, no. 9, pp. 2339-2347, Sept. 2009.
- [15] S. Hu *et al.*, "A Broadband Mixed-Signal CMOS Power Amplifier With a Hybrid Class-G Doherty Efficiency Enhancement Technique," in *IEEE Journal of Solid-State Circuits*, vol. 51, no. 3, pp. 598-613, March 2016.
- [16] V. Vorapipat *et al.*, "A Class-G Voltage-Mode Doherty Power Amplifier," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 12, pp. 3348-3360, Dec. 2017.
- [17] F. H. Raab *et al.*, "Power amplifiers and transmitters for RF and microwave," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 50, no. 3, pp. 814-826, March 2002.

## Chapter 3: Fully Integrated Digital-Intensive Polar Doherty Transmitter

This chapter presents the design, realization, and verification of the world's first fully-integrated bits-in-RF-out digital polar Doherty TX. It has been implemented in a 40 nm bulk CMOS technology and operates from 2.3-2.8 GHz. The proposed architecture comprises CORDIC, digital delay aligners, DPD correction circuitry, and frequency-agile wideband phase modulators. The latter drive the polar main and peak PA switch banks, which operate as quasi-load-insensitive (QLI) class-E PAs [1].
The output powers are combined by an on-chip transformer-based Doherty power combiner. At 2.5 GHz, its maximum output power reaches +21.4 dBm, while its drain efficiency is 49.4 % at peak power and 33.7 % in the 6-dB PBO region. After applying DPD for a 40 MHz 64-QAM signal, the measured EVM is better than -31 dB, with an ACLR better than -40 dBc at an average drain efficiency of 25 %.

This chapter is organized as follows. Section 3.1 introduces the concept of the polar DTX with its benefits and challenges, and it also presents the research targets of this work. Section 3.2 derives the segmented QLI class-E PA model and presents the analysis for estimating the ACW-AM and ACW-PM curves of the Doherty DPA. Following that, the overall system architecture is discussed in Section 3.3. Section 3.4 provides the Doherty DPA circuit implementation, while Section 3.5 and Section 3.6 discuss the baseband and LO generation, respectively. Section 3.7 briefly discusses the DPD algorithm used in the measurements. Experimental results are shown in Section 3.8, while Section 3.9 concludes the chapter.

## 3.1 Introduction

Recently, DTXs ([2]-[14]) have drawn attention due to their excellent hardware scalability in nanoscale CMOS processes and the great potential they offer to incorporate digital correction such as DPD. These properties are essential in achieving high system integration, linearity, and efficiency at a low cost. Figure 3.1 presents a typical block diagram of a polar DTX, including an AM path, a PM path, and a digital baseband.

Figure 3.1: Conceptual block diagram of a polar DTX.

In such a configuration, the IQ data is first decomposed into digital AM and PM signals in the baseband. Next, this AM/PM data is used to drive the digital-intensive AM/PM modulators and PA switch banks. The output signal is constructed in the switch banks, which combine the phase and amplitude information. The DPD circuitry is typically included in the digital baseband.
In this chapter, some of the DTX challenges discussed in Chapter 2, such as efficiency enhancement in the PBO region and low-power DPD, are addressed. Wideband phase modulators are discussed in Chapter 4. The aim of this work is to achieve high overall system efficiency, spectral purity, and video bandwidth. To do so, a polar Doherty DTX is proposed and realized in 40 nm bulk CMOS ([15]). To the authors' knowledge, this work represents the first "bits-in RF-out" single-chip polar DTX employing the Doherty topology.

## 3.2 Digital-Controlled Output Stage

#### 3.2.1 Doherty DPA

The Doherty DPA is derived from the analog Doherty PA shown in Fig. 3.2(a) ([16] and [17]). In such an analog Doherty PA, the activation of the branch amplifiers is set by the amplitude of the input signal and the bias conditions of the main (class-AB) and peak (class-C) amplifiers. Unlike the analog Doherty PA, in the digital polar Doherty PA of Fig. 3.2(b) [18]-[27], both the main and peak amplifier branches are replaced by RFDACs that are driven by two coherent phase-modulated signals. This improved control method provides significant benefits in terms of PBO efficiency compared to its analog counterparts.

Figure 3.2: (a) Principle schematic of a conventional analog Doherty PA; (b) principle schematic of a polar Doherty DPA.

Figure 3.3: (a) Behavioral model of a Doherty DPA when assuming QLI class-E PA operation; (b) related simulated efficiency.

As a result, the overall drain efficiency of the Doherty DTX can theoretically be boosted to 100 % at the peak and 6 dB PBO points by employing QLI class-E switch-mode operation for the branch amplifiers. Figure 3.3 shows the conceptual schematic of the Doherty DPA based on switch-mode operation of the DPA branches, and the related simulated drain efficiency versus output power.
As can be observed, the simulated drain efficiency is higher than that of an analog class-B/class-C Doherty PA, which is limited to a theoretical peak efficiency of 78.5 %. Also, in Fig. 3.2(b), the main and peak branches share the same PM signal, with a 90° phase lag in the peaking path. This feature eliminates the bulky and lossy power splitter at the input of the analog Doherty PA. Finally, the digital control provides both perfect activation and very accurate switch bank drive profiles, avoiding the slow activation of the peak branch as well as activation inaccuracies, which are both well-known problems in analog Doherty PAs. Consequently, in contrast to analog Doherty PAs, pronounced Doherty efficiency peaking occurs at the PBO high-efficiency point.

Figure 3.4: (a) Principle schematic of a single-ended QLI class-E DTX; (b) drain voltage for different activation levels of the unit cells.

Figure 3.5: (a) Schematic of push-pull class-E DPA; (b) single-ended LTI model.

#### 3.2.2 ACW-AM and ACW-PM Curves

In Fig. 3.4(a), the concept of a segmented QLI class-E PA is shown, using the output matching network of [1]. By selecting the number of activated transistor unit cells, the amount of output current can be controlled, and thus the output amplitude can be set. This configuration can be modeled as a current source with finite impedance. We first consider a transistor unit cell that is disabled. In this case, the related admittance is:

$$Y_{\text{DISACT}} = Y_{\text{OFF}} \tag{3.1}$$

When the cell is activated, the admittance becomes:

$$Y_{\text{ACT}} = Y_{\text{OFF}} + (Y_{\text{ON}} - Y_{\text{OFF}})u(t) \tag{3.2}$$

where u(t) represents a normalized LO square wave. Figure 3.4(b) shows the drain voltage for different activation levels. In practice, a push-pull configuration, as shown in Fig. 3.5(a), is usually employed to reach higher output power and suppress even-order harmonics.
The transformer in the output matching network can also act as a DC feed inductor and balun. As a result, when analyzing the output power, the even-order harmonics can be neglected. Thus, an equivalent single-ended linear time-invariant (LTI) model is used in Fig. 3.5(b), where N is the total number of unit cells and n is the number of activated cells. In this simplified LTI model, $Z_1$ stands for the impedance seen from the DPA switch bank, and each current source has the same current waveform, independent of the control word n. Consequently, the total current is:

$$I_{\text{total}} = nI_{\text{unit}}(t) \tag{3.3}$$

where $I_{\text{unit}}$ is the current pulse of a single unit cell. The source admittance is:

$$Y_{\text{source}} = nY_{\text{ACT}} + (N - n)Y_{\text{DISACT}} \tag{3.4}$$

where $Y_{\text{ON}}$ and $Y_{\text{OFF}}$ are the reciprocals of $Z_{\text{ON}}$ and $Z_{\text{OFF}}$, respectively. As a result, the output current that enters the matching network can be written as:

$$\begin{aligned} I_{\text{Drain}} &= n \frac{Y_1}{nY_{\text{ACT}} + (N - n)Y_{\text{DISACT}} + Y_1} I_{\text{unit}}(t) \\ &= n \frac{Y_1}{n(Y_{\text{ON}} - Y_{\text{OFF}})u(t) + NY_{\text{OFF}} + Y_1} I_{\text{unit}}(t) \end{aligned} \tag{3.5}$$

where $Y_1$ is the reciprocal of $Z_1$. Note that to minimize the switch's loss when activated, $Y_{\text{ON}}$ should be large, and $Y_{\text{OFF}}$ should be minimized. As can be noted from (3.5), $I_{\text{Drain}}$ is not a linear function of n, and the n-dependent term in the denominator grows with $Y_{\text{ON}}$. Therefore, there is an inherent conflict between efficiency and linearity in this QLI class-E DPA design when using uniformly sized unit cells. Furthermore, the phase distortion results from the imaginary part of $Y_{\text{ON}}$. With (3.5), the ACW-AM and ACW-PM curves can be calculated. In Fig. 3.6, the simulated and calculated output power versus control word is presented, where N is set to 512. The schematic used for simulation is given in Fig. 3.5(a) and its result is plotted in Fig.
3.6, together with the analytical result of (3.5). The $Y_{\text{ON}}$ and $Y_{\text{OFF}}$ of the transistor used in Fig. 3.5 are extracted from DC simulation. The deviation in the ACW-AM curve is mainly caused by the more gradual transition between the "on" and "off" states when using a more realistic transistor model. The deviation in the ACW-PM curve is mainly due to the non-linear drain capacitance of the switch bank.

Figure 3.6: Simulated and calculated (a) ACW-AM curve and (b) ACW-PM curve for a standalone DPA, where the parameters in the calculation are extracted from DC simulations.

It is worth mentioning that the linearity of QLI class-E (Doherty) operation has been extensively discussed in [28] and [29]. Both concluded that uniformly partitioned SQLI class-E operation is inherently non-linear in both its ACW-AM and ACW-PM profiles. The analysis made here, which is based on the static $Y_{\text{ON}}$, differs from these earlier works. In [28], a circuit design method utilizing non-linear segmentation and multi-phase clocking techniques was also introduced that fully compensates these non-linearities directly in the design, yielding a linear ACW-AM and ACW-PM transfer. Although very useful, these techniques tend to make the design significantly more complex, while the linearity in practical implementations can only be improved to a certain extent. Consequently, in this work, the non-linearity shown in Fig. 3.6 is pre-distorted using a relatively simple LUT approach, which allows greater flexibility.

When going from a single TX line-up to a Doherty topology, the ACW-AM and ACW-PM behavior becomes more complicated. It is therefore convenient to split the (symmetrical) Doherty behavior into two parts, namely, the region below the 6 dB PBO point and the region beyond it (between -6 dB and 0 dB PBO). When the DPA is operating below the 6 dB PBO point, all peak branch elements are inactive. When the DPA operates between -6 dB and 0 dB PBO, all elements in the main path are activated, together
with an (increasing) number of elements in the peaking path.

Figure 3.7: Doherty DPA model (a) below 6 dB PBO region and (b) beyond 6 dB PBO region.

In the PBO region below 6 dB, the Doherty DPA operates similarly to a standalone DPA. However, the impedance inverter transforms the external load before it is presented to the output stage. In this case, the output current is given by:

$$I_{\text{Drain}} = n \frac{Y_{\text{INV}}}{n(Y_{\text{ON}} - Y_{\text{OFF}})u(t) + NY_{\text{OFF}} + Y_{\text{INV}}} I_{\text{unit}}(t) \tag{3.6}$$

where:

$$Y_{\text{INV}} = \frac{1}{Z_0^2 Y_{\text{Load}}} \tag{3.7}$$

Note that the $Y_{\text{OFF}}$ of the peak branch is neglected here.

In the -6 dB to 0 dB PBO region, the peak and main branches modulate each other's load condition. Assume that n cells are activated in the peak branch. For the N unit cells activated in the main branch, the output current they contribute is given by:

$$I_{\text{Drain,main}} = N \frac{Y_{\text{INV,main}}}{NY_{\text{ACT}} + Y_{\text{INV,main}}} \times \frac{Y_{\text{Load}}}{Y_{\text{Load}} + nY_{\text{ACT}}} I_{\text{unit}} \tag{3.8}$$

where:

$$Y_{\text{INV,main}} = \frac{1}{Z_0^2(Y_{\text{Load}} + nY_{\text{ACT}})} \tag{3.9}$$

For the n unit cells activated in the peak branch, the output current they contribute is:

$$I_{\text{Drain,peak}} = n \frac{Y_{\text{Load}}}{nY_{\text{ACT}} + Y_{\text{INV,peak}}^2 / Y_{\text{Load}}} I_{\text{unit}} \tag{3.10}$$

where:

$$Y_{\text{INV,peak}} = \frac{1}{Z_0^2(NY_{\text{ACT}})} \tag{3.11}$$

The final output current is:

$$I_{\text{Drain}} = I_{\text{Drain,main}} + I_{\text{Drain,peak}} \tag{3.12}$$

Figure 3.8: Simulated and calculated (a) ACW-AM curve and (b) ACW-PM curve for a Doherty DPA, where the parameters in the calculation are extracted from DC simulations.

In Fig. 3.8, the simulated and analytically calculated output power versus control word is presented. Here the total number of unit cells N has been set to 1024 and is equally distributed over the main and peak branches.
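To make the compressive dependence on the control word in (3.5) and (3.6) concrete, the following is a minimal numerical sketch. It is a static simplification that freezes u(t) at 1 (switch conducting) instead of evaluating the full LO waveform, and all admittance values are hypothetical placeholders rather than values extracted from the design.

```python
import numpy as np

def drain_current_ratio(n, n_total, y_on, y_off, y_load):
    """I_Drain / I_unit following the form of (3.5)/(3.6), with u(t) fixed to 1."""
    n = np.asarray(n, dtype=float)
    return n * y_load / (n * (y_on - y_off) + n_total * y_off + y_load)

# Hypothetical per-cell and load admittances in siemens (illustration only)
N_TOTAL = 512
Y_ON = 2e-3 + 0.2e-3j   # on-state admittance of one unit cell
Y_OFF = 1e-6j           # off-state admittance of one unit cell
Y_LOAD = 0.2            # plays the role of Y_1 in (3.5) or Y_INV in (3.6)

acw = np.arange(N_TOTAL + 1)
ratio = drain_current_ratio(acw, N_TOTAL, Y_ON, Y_OFF, Y_LOAD)
am = np.abs(ratio)                         # static ACW-AM curve
pm_deg = np.degrees(np.angle(ratio[1:]))   # static ACW-PM curve for n >= 1
```

With these placeholder values the amplitude rises monotonically but compresses as n grows, and the phase drifts with n because Y_ON has an imaginary part, which matches the qualitative behavior described for the ACW-AM and ACW-PM curves.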
The transistor's $Y_{\text{ON}}$ and $Y_{\text{OFF}}$ are again extracted from DC simulation. The calculated and simulated curves almost overlap. When only the main branch is activated, the ACW-AM and ACW-PM curves are very similar to those of the standalone DPA. Note that for the results of Fig. 3.8, the $\lambda/4$ impedance inverter is assumed to be lossless. In practice, the ACW-AM and ACW-PM curves will be affected by the loss of the matching network.

Figure 3.9: Block diagram of the proposed DTX.

## 3.3 System Architecture

The overall block diagram of the proposed Doherty DTX is depicted in Fig. 3.9. This configuration consists of a digital baseband processing unit, a clock divider, wideband phase modulators, and two DPAs in a Doherty configuration. The single-ended off-chip clock at $4 \times f_{\text{LO}}$, where $f_{\text{LO}}$ is the carrier frequency, is applied to an on-chip balun that converts it into a differential signal. This differential clock is then applied to a divide-by-4 circuit to generate the desired multi-phase clock signals at $f_{\text{LO}}$. These clock signals are fed to the main and peak phase modulators of the Doherty branches, with the clock signals of the peak branch lagging those of the main branch by 90° to match the delay of the $\lambda/4$ transmission line in a conventional Doherty power combiner.

Employing an on-chip CORDIC, the IQ baseband signals are converted into AM and PM signals. The baseband AM signal is first pre-distorted and then split into AM-main and AM-peak values, which are fed to the main and peak branches, respectively. Note that, preceding the switch banks, the envelope signals are first converted from binary to thermometer code and then applied to the (digital) up-converting mixers to prevent a non-monotonic transfer curve. Meanwhile, the baseband phase signal is first applied to a normalizer unit, which decomposes the phase information into constant-envelope I/Q signals.
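The AM split and the phase normalizer described above can be sketched in software as follows. The code widths, the split rule (main bank fills first, peak bank joins above mid-scale), and the function names are hypothetical choices made for illustration; the text does not specify the exact on-chip mapping.

```python
import numpy as np

BANK_MAX = 2047  # hypothetical full-scale code of one bank (assumed 11 bits)

def doherty_split(acw):
    """Split an amplitude code word into main-bank and peak-bank codes."""
    acw = np.asarray(acw, dtype=int)
    main = np.minimum(acw, BANK_MAX)                    # main bank fills first
    peak = np.clip(acw - (BANK_MAX + 1), 0, BANK_MAX)   # peak bank joins above mid-scale
    return main, peak

def normalize_phase(pm, bits=10):
    """Map a phase (radians) onto a constant-envelope, quantized I/Q pair."""
    full_scale = 2 ** (bits - 1) - 1   # hypothetical signed code range
    i_code = np.round(full_scale * np.cos(pm)).astype(int)
    q_code = np.round(full_scale * np.sin(pm)).astype(int)
    return i_code, q_code

main, peak = doherty_split([0, 1000, 2048, 4095])
i_code, q_code = normalize_phase(np.array([0.0, np.pi / 2]))
```

In this sketch the I/Q pair carries only phase information: its magnitude stays at full scale (up to quantization), while all amplitude information travels through the main and peak codes.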
The resultant I/Q signals drive two IQ RFDAC-based wideband phase modulators that generate the desired up-converting clock signals for the main and peak PA branches. The two branch signals are finally combined using an on-chip Doherty matching network.

Figure 3.10: Top-level schematic of the output stage.

## 3.4 Implementation of the Output Stage

This section explains the implementation of the output stage, including the unit cell, floorplan, digital logic, and matching network. Figure 3.10 presents the top-level schematic of the output stage, together with the output matching network. In this design, a differential/push-pull symmetrical Doherty configuration is employed, and thus the main and peak branches are identical.

#### 3.4.1 Unit Cell Implementation

There are 511 MSB cells and 3 LSB cells in each bank, and the MSB cells form a 2D matrix. The floorplan of each bank is discussed below. As can be seen from Fig. 3.10, the thermometer signal for each bank includes the Row and Col activation, along with extra control bits for the Row to guarantee that all DPA unit cells of the previous rows are activated. The unit cell comprises digital logic and a switch. The logic part consists of a decoder (AND-OR gate) and a mixer (AND gate). The AND-OR decoder determines whether the unit cell should be activated. The decoded control signal is then mixed with the PM signal by a CMOS AND gate. All transistors of both the decoder and the mixing circuit are implemented with the minimum-sized CMOS device ratio in 40 nm technology, namely, $W/L = 0.12\ \mu\text{m}/40\ \text{nm}$, to minimize area and power consumption. The core of the MSB unit cell is a single-transistor switch working in class-E PA operation. Unlike other DPA work, e.g., [12] and [32], a cascode topology is not deployed here, in order to minimize $R_{\text{on}}$ and obtain a high drain efficiency.
Since the highest R.M.S voltage that a single transistor can handle is 1.4 V, and the peak drain voltage of a class-E PA can reach more than two times the supply voltage, the supply voltage is set to 0.7 V. At low operating frequencies, the transistor's $R_{\text{on}}$ is inversely proportional to its W/L, while at higher operating frequencies the parasitic gate resistance becomes dominant. After a few design iterations, a ratio of W/L = 5 $\mu$m/40 nm was chosen for the MSB cell, with the width split over two fingers to lower the parasitic gate resistance. As depicted in Fig. 3.10, between the mixing AND gate and the output switching stage there are two extra buffers, which keep the rise/fall time at the final stage within 25 ps. Note that for the LSB cells, the decoder retains minimum sizing, while the switching transistors are scaled to a quarter of their counterparts in the MSB cells. For every row of unit cells, the delays in the PM signal path are aligned.

#### 3.4.2 DPA Bank Floorplan

The floorplan of a push-pull DPA bank is shown in Fig. 3.11. The bank in the symmetrical Doherty features 9-bit MSB and 2-bit LSB cells, yielding high complexity for the layout of the DPA due to challenges in matching the delays between the AM and PM paths, as well as in the power combiner. Therefore, the MSB cells are placed in a 2D pattern, and the LSB unit comprises four small unit cells that occupy only one row (1 × 4) at the corner of the DPA bank. Moreover, dummy unit cells surround the DPA bank to improve the overall matching of the DPA unit cells. The additional drain capacitors for optimum class-E matching are placed at the top of the bank, very close to the output matching network. The activation signal is locally decoded from both the column and the row selection bits inside each cell by a simple AND-OR gate. By doing so, the number of traces required for each bank is drastically reduced. As can be seen from Fig.
3.11, the horizontal data contain the five LSBs, and the vertical data contain the four MSBs. The column and row data signals are first encoded as a thermometer code and then routed vertically and horizontally and connected to each unit cell. Moreover, the unit cells are activated in a serpentine ("snake" traverse) order ([12]) to minimize DNL. The push-pull DPA banks are mirrored, and the differential PM signal is fed to the bank from the middle point. The row and column buses are retimed at the output of the thermometer encoder, which relieves the skew among the unit cells; in simulation, the skew is within 27 and 40 ps after RC parasitic extraction, respectively.

Figure 3.11: Floorplan of DPA array in one branch.

#### 3.4.3 Thermometer Encoder and DFF

Based on the floorplan and data decoding strategy, 4-to-15 and 5-to-31 binary-to-thermometer encoders are employed for the MSB cells and placed as shown in Fig. 3.11. The basic 2-to-3 encoder is implemented using the regenerative approach depicted in Fig. 3.12(a), which is adopted from [12]. In this approach, the LSB (BB0) and MSB (BB2) of the thermometer code are generated by CMOS OR and AND gates, respectively, while BB1 is equal to the input A1. The 3-to-7 encoder shown in Fig. 3.12(b) is built on the 2-to-3 encoder in two incremental steps. First, the intermediate 3-bit thermometer code of Fig. 3.12(b) is generated. Then, B0 to B6 of the final thermometer code are generated by CMOS OR and AND gates with BB0, BB1, BB2, and A2 as inputs. The 4-to-15 (Fig. 3.12(c)) and 5-to-31 (Fig. 3.12(d)) encoders are generated in the same way with more steps. The CMOS gates are customized to have optimized rise and fall times to decrease the timing skew at the output. DFFs at the thermometer encoder's input and output synchronize the signals; the output DFFs form the last synchronizing stage before the output DPA cell.
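The regenerative encoder construction described for Fig. 3.12 can be modeled in software as follows. This is a behavioral sketch under the assumption that each stage ORs the lower half and ANDs the upper half of the previous code with the newly added MSB, which reproduces the stated 2-to-3 case (BB0 = A0 OR A1, BB1 = A1, BB2 = A0 AND A1); the function names are illustrative.

```python
def extend_thermometer(prev, a_msb):
    """Add one input bit (the new MSB) to an existing thermometer code.

    prev is the thermometer code of the lower input bits (list of 0/1, B0 first).
    """
    lower = [b | a_msb for b in prev]   # OR gates: forced to 1 when the MSB is set
    upper = [b & a_msb for b in prev]   # AND gates: enabled only when the MSB is set
    return lower + [a_msb] + upper      # the middle bit equals the new MSB

def binary_to_thermometer(value, bits):
    """Encode value (0 .. 2**bits - 1) into 2**bits - 1 thermometer bits."""
    code = [value & 1]                  # single-bit case: B0 = A0
    for k in range(1, bits):
        code = extend_thermometer(code, (value >> k) & 1)
    return code

code_5 = binary_to_thermometer(5, 3)    # 3-to-7 encoder, input 5
```

Each added stage doubles the code length plus one, so four and five input bits yield the 15-bit and 31-bit codes used for the row and column buses.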
Since the clock of these DFFs can be as high as $f_{\text{LO}}$, they employ a transmission-gate-based topology, shown in Fig. 3.13(a), rather than the DFFs of the standard cell library. Each CMOS DFF consists of two cascaded latches clocked with inverted clock signals. A schematic of the latch is shown in Fig. 3.13(b). A back-to-back inverter-based latch buffers the output of the transmission-gate-based latch to sharpen the rise/fall time.

Figure 3.12: (a) 2-to-3; (b) 3-to-7; (c) 4-to-15 and (d) 5-to-31 thermometer encoders. [12]

Figure 3.13: (a) Schematic of DFF; (b) schematic of latch.

Figure 3.14: 3D layout of output matching network.

#### 3.4.4 Output Matching Network

The physical implementation of the matching network is shown in Fig. 3.14. The two DPA banks are connected through a C-L-C $\pi$-network acting as the $\lambda/4$ impedance inverter, and their output parasitic capacitances are lumped together with a drain capacitor $(C_E)$ and tuned to satisfy the QLI class-E PA parameters proposed in [1]. The inductance of the transformer's primary loop also acts as the DC feed in the class-E matching network. Since the drain voltage of the switch is limited for reliability reasons, a large turn ratio is used for the transformer to offer a low impedance level and maintain sufficient output power. For this purpose, a turn ratio of three is chosen in this design. The load seen from the peak branch and the $\lambda/4$ impedance inverter is set to:

$$R_{\text{Load}} = R/3^2 = 5.56\ \Omega \tag{3.13}$$

after taking the single-ended-to-differential conversion and power combining into account. The layout of the whole output matching network, together with its metal stack, is also shown in Fig. 3.14. On the primary side of the transformer, three loops are connected "in parallel", which provides the low inductance needed for the primary loop, whereas on the secondary side, the three loops are connected "in series".
Additionally, the DC ohmic loss in the bias path is reduced, and the inter-finger shielding patterns decrease the loss caused by proximity to the substrate. Note that all traces are routed in ultra-thick metal (UTM) to minimize the ohmic loss. In [30], the efficiency of a transformer is expressed as:

$$\eta = \frac{R_{\text{Load}}/N^2}{\dfrac{\omega L_P/Q_S + R_{\text{Load}}/N^2}{\omega k L_P}\,\dfrac{\omega L_P}{Q_P} + \dfrac{\omega L_P}{Q_S} + R_{\text{Load}}/N^2} \tag{3.14}$$

where $R_{\text{Load}}$ is the load impedance of the transformer, N is the turn ratio, k is the coupling factor, $L_P$ is the inductance of the transformer's primary loop, and $Q_P$ and $Q_S$ are the quality factors of the transformer's primary and secondary loops, respectively. As (3.14) indicates, k should be large to reduce the insertion loss. However, in practical implementations, k cannot reach 1. Consequently, an extra capacitor should be added in the primary loop to resonate out the leakage inductance.

Figure 3.15: Simulated (a) Q factor and (b) effective inductance of the primary and secondary loop, respectively; simulated (c) coupling factor and (d) insertion loss, respectively.

Figure 3.16: Simulation results of (a) passive efficiency using an EM-simulator for the power combiner network; (b) drain efficiency of the complete Doherty DPA.

Figure 3.15 shows the EM simulation results of the transformer structure described above. According to Fig. 3.15(a), the Q factors of the primary and secondary loops are better than 10 and 8, respectively, within the frequency range of 1.5 to 2.9 GHz. The effective inductance at 2.5 GHz is about 0.51 nH and 4.2 nH for the primary and secondary loops, respectively, leading to an effective turn ratio better than 2.8. This number is somewhat lower than the ideal turn ratio of 3 due to the geometry mismatch between the loops and the non-ideal coupling factor.
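As a quick check of the quoted turn ratio, one can assume the usual ideal-transformer relation between the winding inductances and the turn ratio. This relation is an assumption here, since the text does not state which definition was used, and $L_S$ (the secondary inductance) and $n_{\text{eff}}$ are symbols introduced only for this illustration:

$$n_{\text{eff}} \approx \sqrt{\frac{L_S}{L_P}} = \sqrt{\frac{4.2\ \text{nH}}{0.51\ \text{nH}}} \approx 2.87$$

Under this assumption the simulated inductances give a ratio of roughly 2.87, consistent with the stated "better than 2.8" and slightly below the ideal value of 3.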
However, it still provides a sufficiently low impedance level for the DPA to reach more than 22 dBm output power in simulation. Also, the realized k of the transformer is 0.82 up to 4 GHz (Fig. 3.15(c)). Furthermore, the insertion loss is better than 1 dB in the frequency range of 1.7 to 4 GHz (Fig. 3.15(d)).

The C-L-C $\lambda/4$ impedance inverter is located on the secondary side, allowing a single-ended implementation without the need to enforce symmetry (Fig. 3.14). According to the EM simulations, L should be around 6.2 nH. On one side, the explicit capacitor is a MOM capacitor. On the other side, the parasitic capacitance between the pads and ground takes the role of the capacitor. Based on the EM simulations, the loss of the whole impedance inverter is less than 1 dB at 2-3 GHz. Note that underneath the transformers and inductors, shielding structures are placed up to M4 (UTM is M7 in this technology). Doing so not only suppresses eddy currents but also relieves mechanical stress, yielding improved reliability.

The passive efficiency of the whole power combiner network versus the output power $P_{\text{OUT}}$ is plotted in Fig. 3.16(a), showing a passive efficiency of 72 % at peak power and 58 % at 6 dB PBO. The passive efficiency depends on the output power for two reasons. First, the main branch output experiences more loss than the peak branch due to the $\lambda/4$ impedance inverter. Second, the passive efficiency varies with the load modulation. Figure 3.16(b) presents the simulated efficiency versus $P_{\text{OUT}}$ for the complete Doherty DPA. The peak power is about 22 dBm, while the drain efficiency is 56 % at peak power and 44 % at 6 dB PBO.

To test the Doherty DPA performance in the presence of antenna impedance variations, load-pull simulations were carried out at peak power and at 6 dB PBO. The results are shown in Fig. 3.17.
For the output power and drain efficiency contours, the step between adjacent contours is 0.3 dB and 3 %, respectively. Although the optimum loads ($R_{\text{opt}}$) for output power and drain efficiency do not coincide, they are located relatively close together on the Smith chart and near the 50 $\Omega$ point. This simulation result demonstrates that the digital Doherty DPA can achieve good efficiency under non-extreme load variations.

Figure 3.17: Load-pull simulation results of the Doherty DPA at (a) peak power condition (ACW=4095) and (b) 6 dB PBO operation (ACW=2048).

At first glance, with an inductor and two transformers in the matching network, the Doherty DPA seems to occupy more area than [19] and [22], which contain fewer transformers. However, since the inductor in this work can be placed in the corner between the two transformers, the overall area needed for the PA banks and matching network is less than half of the area reported in [19]. Furthermore, its IC area is comparable to that of [22], where only one transformer is employed. At the same time, the drain efficiency at peak power and at 6 dB PBO in this work is higher than in [19] and [22].

## 3.5 Implementation of Digital Baseband

A block diagram of the digital baseband is depicted in Fig. 3.18, comprising a CORDIC, a fractional delay line, DPD LUTs, and normalizers. Their sampling frequency is at most $f_{\text{LO}}/4$ due to the speed limitation imposed by the SRAMs. The realized demonstrator can also use lower sampling rates of $f_{\text{LO}}/8$, $f_{\text{LO}}/16$, and $f_{\text{LO}}/32$, yielding lower power consumption when targeting lower data rates. As stated above, a CORDIC derives the AM and PM signals from the $2 \times 10$-bit input I/Q signal. A fractional delay line compensates for the delay mismatch between the AM and PM paths. Its accuracy is better than 250 ps; its basic structure is adopted from [28].
Two on-chip correction LUT SRAMs, one for ACW-AM and one for ACW-PM, are integrated with the DPA. The AM signal is first pre-distorted by the ACW-AM LUT and then fed to the Doherty decoder, which activates the main and peak switch bank devices. Likewise, the ACW-PM LUT pre-distorts the PM signal, which is then mapped back to the Cartesian domain by a normalizer implemented as an on-chip SRAM. This approach enforces a constant-amplitude IQ phase vector (shown in Fig. 3.18) to drive the phase modulator. Furthermore, the normalizer can also perform the IQ image and LO leakage calibration for the phase modulator.

Figure 3.18: Conceptual block diagram of the digital baseband block.

## 3.6 Implementation of LO Generation Circuits

A conceptual block diagram of the LO generation circuitry is shown in Fig. 3.19. The single-ended external LO at $4 \times f_{\text{LO}}$ is first converted on-chip to its differential counterparts, $4 \times f_{\text{LO},0}$ and $4 \times f_{\text{LO},180}$, by a balun. The 1:1 transformer measures 170 $\mu$m $\times$ 170 $\mu$m, and its windings are routed in UTM. The trace width is 8 $\mu$m, and the spacing is 2 $\mu$m. According to the EM simulations, the operating frequency range of the balun is from 2 GHz to 20 GHz. The center tap of the balun is biased at VDD/2, which sets the DC voltage of the differential output. This voltage can be adjusted if the duty cycle of the differential signal is not exactly 50 %. Such a duty-cycle mismatch can result from non-identical interconnect parasitics between the differential outputs. Consequently, the two signals could arrive at the following divider misaligned and impact the phase accuracy. Therefore, scaled back-to-back CMOS inverters are employed as phase aligners. Next, $4 \times f_{\text{LO},0}$ and $4 \times f_{\text{LO},180}$ are divided by 2, generating the four phases CLK2IP, CLK2IN, CLK2QP, and CLK2QN. To generate the subsequent eight phases required, quadrature dividers are employed.
Given this, Fig. 3.20 from [31] compares commonly used divider topologies: CML dividers, C<sup>2</sup>MOS dividers, and regular CMOS dividers. Note that the exact numbers in Fig. 3.20 depend strongly on the applied CMOS technology node and implementation. C<sup>2</sup>MOS logic was chosen to implement this divider after taking power consumption and operating frequency into account. The schematic of the C<sup>2</sup>MOS divider used in the first and second stages is shown in Fig. 3.21(a). Note that the CLK and D inputs shown in Fig. 3.21(a) are swapped compared with conventional C<sup>2</sup>MOS devices to decrease the delay from D to Q ([12]). This swapping substantially expands the divider's operating frequency range. The back-to-back inverters in Fig. 3.21(a) prevent illegal states and re-align the differential quadrature phases.

Figure 3.19: Conceptual block diagram of LO generation circuitry.

Figure 3.20: Comparison of different types of dividers in terms of operating frequency and power in a 40 nm bulk CMOS technology [31].

In the second stage, CLK2IP and CLK2IN are divided by two again. The output signals, CLK1P and CLK1N, are retimed by CLK2IP, CLK2IN, CLK2QP, and CLK2QN to generate the eight required phases in the correct order. The C<sup>2</sup>MOS DFF is quite similar to the divider mentioned above, except that the feedback path from Q to D is broken. Figure 3.21(b) shows its schematic. Note that there is a dummy divider in the second stage to match the load between CLK2IP, CLK2IN, CLK2QP, and CLK2QN. Although this circuitry differs from the ring divider commonly employed in the literature ([31]), simulation results show that this divider can operate up to 12 GHz across PVT variations. At 12 GHz in the typical-typical corner, the simulated phase noise is better than -150 dBc/Hz at an offset frequency of 10 MHz. The DFF can also support up to 8 GHz across PVT variations.
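The frequency plan implied by the two divide-by-2 stages can be tabulated with a short sketch. The uniform 45° spacing of the eight output phases and the function and key names are assumptions made for illustration; the baseband sampling options follow the divide ratios stated in Section 3.5.

```python
def lo_plan(f_lo):
    """Frequencies (Hz) and nominal phases (degrees) along the LO divider chain."""
    f_in = 4 * f_lo            # external single-ended clock
    f_stage1 = f_in / 2        # CLK2IP/IN/QP/QN after the first divide-by-2
    f_stage2 = f_stage1 / 2    # CLK1P/N after the second divide-by-2, i.e. f_LO
    return {
        "f_in": f_in,
        "f_stage1": f_stage1,
        "f_stage2": f_stage2,
        "stage1_phases_deg": [0, 90, 180, 270],
        # Assumed uniform spacing of the eight retimed phases at f_LO
        "stage2_phases_deg": [45 * k for k in range(8)],
        # Selectable baseband sampling rates
        "baseband_fs": {d: f_lo / d for d in (4, 8, 16, 32)},
    }

plan = lo_plan(2.5e9)
plan_max = lo_plan(2.8e9)      # upper end of the measured band
```

For a 2.5 GHz carrier the external clock is 10 GHz, and at 2.8 GHz it is 11.2 GHz, which stays below the 12 GHz up to which the first-stage divider is simulated to operate.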
The sampling clock for the baseband circuit is also generated from CLK1P and CLK1N in Fig. 3.19. Since the sampling frequency is relatively low, CMOS DFF dividers are designed using cells from the standard library. There are four options for the sampling frequency: $f_{\text{LO}}/4$, $f_{\text{LO}}/8$, $f_{\text{LO}}/16$, and $f_{\text{LO}}/32$, selected via SPI.

## 3.7 Digital Pre-Distortion

For the inherently non-linear DPA behavior discussed in this chapter, DPD is necessary to achieve acceptable linearity. As indicated in Section 3.5, an ACW-AM and an ACW-PM LUT are used to pre-distort the polar Doherty DPA. First, memoryless LUT DPD is applied because of its simplicity. Although the analysis in Section 3.2 models the distortion reasonably accurately, the unavoidable differences between the simulated and fabricated EM structures undermine DPD approaches based on such a model. Therefore, the content of the LUTs is based on the inverses of the measured ACW-AM and ACW-PM curves obtained from a CW test. Additionally, to obtain better performance for higher-order modulated signals with a large modulation bandwidth, such as QAM and OFDM, the PA's memory effects have to be considered. In general, these memory effects can be caused by dynamic thermal behavior and semiconductor trapping effects (e.g., in a GaN device) but are mostly determined by the bias network. A memory polynomial (MP) DPD based on the Volterra series can be implemented to correct these phenomena by adopting an indirect learning architecture. The basic MP model can be expressed as in [6]:

$$P(n) = \sum_{i=0}^{L} \sum_{j=0}^{M} a_{ijm} |Q(n-i)|^{(m-j)} + N(n) \tag{3.15}$$

where Q(n) is the received RF output, P(n) is the pre-distorted signal, $a_{ijm}$ are the DPD coefficients, L is the memory length, M is the non-linearity order, and N(n) is Gaussian-distributed noise with zero mean.
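The memoryless LUT step described earlier in this section, in which the LUT contents are obtained by inverting the measured ACW-AM and ACW-PM curves, can be sketched as follows. The "measured" curves below are synthetic placeholders, the function name is hypothetical, and the sketch assumes a monotonically increasing ACW-AM curve treated as a single segment, whereas the text recommends handling the main and peak regions separately.

```python
import numpy as np

def build_dpd_luts(acw_meas, am_meas, pm_meas, n_codes):
    """Build memoryless ACW-AM and ACW-PM correction LUTs from a CW sweep.

    acw_meas: swept control words; am_meas: measured output amplitude
    (assumed monotonically increasing); pm_meas: measured phase in radians.
    """
    am_norm = am_meas / am_meas[-1]            # normalize to full scale
    target = np.linspace(0.0, 1.0, n_codes)    # desired linear amplitude per input code
    # Inverse ACW-AM curve: the code that produces the desired amplitude
    am_lut = np.round(np.interp(target, am_norm, acw_meas)).astype(int)
    # ACW-PM correction: cancel the phase seen at the pre-distorted code
    pm_lut = -np.interp(am_lut, acw_meas, pm_meas)
    return am_lut, pm_lut

# Synthetic compressive curves standing in for a CW measurement
acw = np.arange(4096)
am_meas = acw / (acw + 2000.0)
pm_meas = -0.1 * acw / 4095.0

am_lut, pm_lut = build_dpd_luts(acw, am_meas, pm_meas, n_codes=4096)
linearized = (am_meas / am_meas[-1])[am_lut]   # amplitude after pre-distortion
```

After pre-distortion, the cascaded amplitude response follows the linear target to within about one code step, and the residual phase at each pre-distorted code is zero for this static model.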
A few iterations are typically needed for the MP to converge to an optimized pre-distorted signal. However, as discussed in Section 3.2, for the digital Doherty PA, the transition point where the peak branch is activated appears as a singularity in the ACW-AM and ACW-PM curves. Therefore, the main and peak branches should be pre-distorted separately in the ACW-AM and ACW-PM LUTs to improve the convergence of the DPD algorithm. In the measurements of this demonstrator, data streaming is not possible, and the correction occurs offline. The ACW-AM and ACW-PM curves are calculated offline on a PC and then loaded into the LUTs via SPI. The speed of the SPI interface limits the LUT update speed, so that it takes about 2 s to fill the ACW-AM and ACW-PM LUTs. Future work can target the integration of a complete down-conversion path and an on-chip DPD engine to facilitate real-time data streaming with a linearized output signal.

Figure 3.21: Schematic of (a) $\mathrm{C}^2\mathrm{MOS}$ fully differential divide-by-2 divider; (b) $\mathrm{C}^2\mathrm{MOS}$ fully differential DFF.

Figure 3.22: Chip micrograph of fabricated polar Doherty DTX.

## 3.8 Measurement Results

This section presents the measurement results of the proposed Doherty TX demonstrator. The proposed DTX is fabricated in a 40 nm bulk CMOS process; its chip micrograph is shown in Fig. 3.22. The DTX occupies 8.2 mm<sup>2</sup>; its core area is about 5 mm<sup>2</sup>, including the digital baseband circuitry, which occupies $1.1 \times 0.65$ mm<sup>2</sup>. The Doherty DPA and matching network occupy less than 0.9 mm<sup>2</sup>. The remaining chip area is occupied by decoupling capacitors and I/O pads. All I/Os, including the single-ended RF input clock and the RF output, are wire-bonded directly to the measurement board. As shown in Fig. 3.22, a staggered bond pad pattern is employed in the I/O ring except for the RF output.
All pads in the outer loop of the I/O ring are down-bonded to the ground on the PCB to reduce parasitic inductance in the ground connection. Meanwhile, to decrease the parasitic inductance of the bonding wires in the supply and bias paths, chip capacitors are placed as close as possible to the die for decoupling purposes.

Figure 3.23: Diagram of the measurement setup used for the polar Doherty DTX characterization.

#### 3.8.1 Measurement Setup

The measurement setup is shown in Fig. 3.23. The DC supply and bias voltages are generated on the LDO board, which employs low-noise standalone LDOs from Analog Devices Inc. that share a common 5 V supply from the DC source. A signal generator provides the single-ended sinusoidal LO signal at a frequency of $4 \times f_{LO}$. The DTX output signal is split into two paths: one goes directly to a spectrum analyzer to measure the spectral purity, while the other goes to a power meter. When measuring the EVM, a high-speed sampling oscilloscope captures the output signal, and the EVM is computed on a PC. SPI control is provided by an SPI interface board between the DUT and the PC. Note that a level shifter is needed to translate the signal swing from the standard 3.3 V SPI interface to the 1.1 V required by the CMOS die. For the measurements, IQ data is uploaded to two on-chip SRAMs.

#### 3.8.2 CW Measurement Results

The DTX demonstrator is first characterized for CW Doherty DPA operation. For these measurements, the carrier frequency is swept from 2.3 to 2.8 GHz in steps of 100 MHz. For frequencies lower than 2.3 GHz, the performance is limited by the Doherty matching network, which makes the efficiency enhancement less effective. When $f_{\rm LO}$ is higher than 2.8 GHz, the LO generation circuitry becomes the bottleneck.
Figure 3.24(a) shows the peak RF output power ($P_{\rm out}$), peak drain efficiency ($\eta_{\rm PEAK}$), and drain efficiency at 6 dB PBO ($\eta_{\rm 6dB}$) versus LO frequency ($f_{\rm LO}$) from 2.3 GHz to 2.8 GHz. In this frequency range, the peak $P_{\rm out}$ is 21.4 dBm, with a fluctuation of less than 1 dB, and $\eta_{\rm PEAK}$ is between 46 % and 52 %. The fractional operational bandwidth proves to be higher than 20 %.

Figure 3.24: Measured (a) drain efficiency and RF output power vs. $f_{LO}$; (b) drain efficiency vs. RF output power at $f_{LO}$ = 2.5 GHz.

Also, the AM control word is swept to measure the efficiency versus output power curve. Figure 3.24(b) shows the resulting drain efficiency $\eta_{\rm drain}$ versus $P_{\rm out}$ at 2.5 GHz, indicating that $\eta_{\rm PEAK}$ is 49.4 % and $\eta_{\rm 6dB}$ is 33.7 %. For reference, a class-B efficiency that rolls off in proportion to the output amplitude from the same peak value would be about 24.7 % at 6 dB PBO. Compared to this class-B scenario, the drain efficiency is enhanced by about 9 percentage points and is about 1.4 times higher than the normalized class-B roll-off in the 6 dB PBO region.

The phase noise of a single branch in static operation is measured for various carrier frequencies. The noise floor is better than -138 dBc/Hz. The goal of measuring only one branch is to explore the noise performance of a single TX chain. Figure 3.25(a) shows the single-branch phase noise at 2.4 GHz; its integrated RMS jitter is less than 98 fs. The integrated jitter at the LO input is found to be 14 fs at 9.6 GHz, as shown in Fig. 3.25(b). Note that the extra jitter results not only from the dividers but also from the whole TX chain, including the phase modulator, phase buffer, and DPA.

#### 3.8.3 Single-Tone and Two-Tone Measurement Results

Following the CW measurements, single-tone and two-tone tests are carried out. In the single-tone test, LO leakage and IQ image suppression are first examined to test the phase modulator.
In this experiment, the LO frequency is set to 2.5 GHz, and the baseband signal frequency is approximately 150 kHz. Figure 3.26 demonstrates that, even without any I/Q calibration, the LO leakage and image levels are -56 dBc and -50 dBc, respectively. After I/Q calibration, the LO leakage and I/Q image reduce to -58 dBc and -72 dBc, respectively. More phase modulator test results are presented in Chapter 4.

Figure 3.25: Measured phase noise and integrated jitter at (a) $f_{LO}$ = 2.4 GHz at the output of the single branch; (b) input LO signal at 9.6 GHz.

Figure 3.26: Measured IQ image and LO leakage of the phase modulator (a) before and (b) after calibration.

Figure 3.27: Measured (a) ACW-AM and (b) ACW-PM characteristics.

Also, a two-tone signal is applied to the TX to examine the "dynamic" ACW-AM/ACW-PM behavior. Fig. 3.27 shows the measured ACW-AM and ACW-PM curves with and without the on-chip DPD. At 2.5 GHz, the IM3 is worse than -20 dBc before applying DPD and improves to better than -45 dBc after applying the on-chip DPD. As can be seen from Fig. 3.27, there is a singular point in the 6 dB PBO region resulting from the applied Doherty architecture.

#### 3.8.4 Broadband Measurement Results

Finally, the DTX is also characterized using a broadband modulated signal. Figure 3.28(a) shows the measured close-in spectrum of a 20 MHz bandwidth 64-QAM single-carrier signal at 2.5 GHz. The spectral emission is better than -40 dBc with an EVM better than -30 dB. Figure 3.28(b) shows the measured close-in spectrum and constellation diagram of a 40 MHz bandwidth 64-QAM signal at 2.4 GHz. The average drain efficiency is 25 %, with an average output power of 14.2 dBm. Note that this measurement uses only the on-chip LUTs, and its results could be further improved by applying a more advanced DPD algorithm.
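
For readers who want to relate the EVM figures quoted in dB to percentages, the following sketch computes an RMS EVM from reference and measured constellation points and converts between the two units. Normalizing the error power to the RMS reference power is an assumption (other normalizations exist), and the function names and sample points are illustrative; the source does not describe how the EVM was computed on the PC.

```python
import math


def evm_db(reference, measured):
    """RMS EVM in dB: total error power divided by total reference power."""
    error_power = sum(abs(m - r) ** 2 for r, m in zip(reference, measured))
    reference_power = sum(abs(r) ** 2 for r in reference)
    return 10.0 * math.log10(error_power / reference_power)


def evm_db_to_percent(db):
    """Convert an EVM in dB (amplitude ratio) to percent."""
    return 100.0 * 10.0 ** (db / 20.0)


# Illustrative data: four ideal points, each shifted by a fixed 0.1 offset.
ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
received = [p + 0.1 for p in ideal]
example_evm_db = evm_db(ideal, received)
```

With this definition, -30 dB corresponds to roughly 3.2 % and -27 dB to roughly 4.5 %.
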
Meanwhile, with only one branch, this polar TX can still achieve an emission better than -40 dBc and an EVM better than -27 dB with an 80 MHz 64-QAM signal at 2.4 GHz, as will be shown in Chapter 4.

Figure 3.28: Measured spectrum and constellation diagram of 20 MHz (a) and 40 MHz (b) 64-QAM signals.

#### 3.8.5 Performance Summary and Comparison with the State-of-the-Art

Table 3.1 compares the performance of the proposed Doherty DPA with state-of-the-art DPAs featuring various efficiency enhancement techniques. Our proposed Doherty DPA achieves the highest peak drain efficiency with comparable drain efficiency at 6 dB PBO. What is more, the overall area of the proposed Doherty DPA is small compared to other Doherty DPAs, using a lower number of transformers/inductors. In [19] and [21], class-G topologies are employed to switch the supply voltage when the DPA is in the deep PBO region. Although this technique can enhance the efficiency in the deep PBO region, switching the DPA's supply voltage results in high interference at the output and a degradation of the linearity. With the on-chip LUTs, the linearity of the proposed Doherty PA is sufficient to support a 40 MHz signal bandwidth with -31 dB EVM. Although the linearity is somewhat lower than in [21], higher efficiency is achieved in this work.

Table 3.1: Performance summary and comparison with state-of-the-art DPAs with various efficiency enhancement techniques.

| Ref | Hu' JSSC15 | Hu' JSSC16 | Hu' RFIC16 | Vorapipat' JSSC17 | Vorapipat' JSSC17 | Yin' JSSC18 | Jung' JSSC20 | Qian JSSC21 | This Work |
|---|---|---|---|---|---|---|---|---|---|
| Process (nm) | 65 | 65 | 40 | 65 | 45 SOI | 55 | 45 SOI | 40 | 40 |
| Architecture | Doherty | Doherty + Class G | Outphasing | Doherty | Doherty + Class G | Dual-Band Doherty | Doherty | Doherty | Doherty |
| Frequency (GHz) | 3.1-3.98 | 3-4.32 | 5.8-6.1 | 0.85-1.1 | 3-4.3 | 0.85/1.7 | 2.1-2.5 | 2.3-3.5 | 2.3-2.8 |
| Peak P<sub>out</sub> (dBm) | 27.3 | 26.7 | 22.2 | 24 | 25.3 | 27 | 22.4 | 23.6 | 21.4 |
| Peak η<sub>drain</sub> (%) | 32.5 | 40.2 | 49.2 | 45 | 30.4 | 25.4 | 38.5 | 38.5 | 49.6 |
| η<sub>drain</sub> (%) @ PBO (dB) | 23.5 @ 6 dB | 37 @ 6 dB | 20 @ 8 dB | 34 @ 6 dB | 25.3 @ 6 dB | 16.8 @ 6 dB | 18.7 @ 9 dB | 29.5 @ 6 dB | 33.7 @ 6 dB |
| Area (mm<sup>2</sup>) | 2.1 | 3.2 | 0.8 | 0.4<sup>3,4</sup> | 1.2<sup>4</sup> | 0.74 | 24 | 0.83 | 0.85 |
| Modulation Type | 16 QAM | 16 QAM | 64 QAM | 256 QAM | 256 QAM | 256 QAM | 64 QAM | 64 QAM | 64 QAM |
| Bandwidth (MHz) | 0.5 | 1 | 20 | 40 | 40 | 20 | 40 | 20 | 40 |
| EVM (dB) | -27 | -30 | -30 | -34.5 | -35.8 | -30.5 | -32 | -30 | -31 |
| Emission/ACLR (dBc) | <-35<sup>1</sup> | -26.5 | <-40 | <-45 | <-40<sup>1</sup> | N.A. | -30.6 | -34 | <-40 |
| PAPR (dB) | 5.4 | 5.4 | 5.8 | 9 | 10 | 6.2 | 7.1 | 6.2 | 7.1 |
| Average η<sub>drain</sub> (%) | 22.1 | 26.5 | 23.3 | 22 | 19.2 | 22.7 | 24.7 | 24.6 | 25 |
| Matching Network | On-Chip | On-Chip | On-Chip | Off-Chip | On-Chip | On-Chip | On-Chip | On-Chip | On-Chip |
| DPD | Off-Chip | Off-Chip | Off-Chip | Off-Chip | Off-Chip | Off-Chip | Off-Chip | Off-Chip | On-Chip |
| Whole TX Chain (Y/N) | No | No | No | No | No | No | No | No | Yes |

<sup>1</sup> Estimated from measured spectrum; <sup>2</sup> 2 dB bandwidth estimated from measured results; <sup>3</sup> Without matching network; <sup>4</sup> Estimated from the micrograph.

In Table 3.2, the complete TX chain is compared with state-of-the-art polar DTXs. Our proposed TX integrates the whole TX chain on a single chip, and in that sense, it is the first reported "Bits-In, RF-Out" polar Doherty DTX. Furthermore, it can support a 40 MHz 64-QAM signal using the full Doherty DPA. Using only one branch, the TX supports an 80 MHz 64-QAM signal, which is the largest reported video bandwidth in the literature for polar DTXs.

### 3.9 Conclusion

This chapter describes the world's first highly efficient, fully integrated polar Doherty DTX in CMOS technology. By employing a class-E topology, this Doherty DPA achieves 49.4 % drain efficiency at a peak power of +21.4 dBm and 33.7 % drain efficiency at 6 dB PBO. By utilizing a novel wideband on-chip phase modulator and LUT-based DPD, the proposed DTX demonstrates linearity better than -40 dBc for a 40 MHz 64-QAM signal at 2.4 GHz, while its average drain efficiency and EVM are 25 % and -31 dB, respectively. With only one branch, the bandwidth can be extended to 80 MHz.
The proposed DTX architecture is scalable with advanced CMOS processes.

Table 3.2: Performance summary and comparison with state-of-the-art polar DTXs.

| Reference | Ye' JSSC13 | Markulic' JSSC19 | Li' RFIC19 | Zhu' JSSC17 | Xu' JSSC19 | Zheng' JSSC13 | Winoto' ISSCC16 | This Work (Doherty) | This Work (one branch) |
|---|---|---|---|---|---|---|---|---|---|
| Process (nm) | 65 | 28 | 40 | 55 | 16 | 65 | 28 | 40 | 40 |
| Frequency (GHz) | 2-2.5 | 4.95-6.05 | 1.2-2.5 | 1.6-2.4 | 0.8-1 | 1.8-2.4 | 2.4/5 | 2.3-2.8 | 2.3-2.8 |
| DPA Architecture | Impedance Switching | Class B | Class B | Class B | Class B | Class B | Class B | Doherty | Class B |
| Peak P<sub>out</sub> (dBm) | 23.3 | 15 | 20.1 | 21.9 | 11.5 | 24 | 27.3 | 21.4 | 15 |
| Modulation Type | 64 QAM | 256 QAM | 64 QAM | 64 QAM | 64 QAM | 64 QAM | 64 QAM | 64 QAM | 64 QAM |
| Bandwidth (MHz) | 20 | 2.5 | 10 | 20 | 2 | 40 | 40 | 40 | 80 |
| EVM (dB) | -28 | -37 | -30 | -32 | -28 | -28 | -30 | -31 | -27 |
| Emission/ACLR (dBc) | <-40 | -35 | -31 | <-40 | <-55 | <-40<sup>1</sup> | <-45<sup>1</sup> | <-40 | <-40 |
| Matching Network | On-Chip | On-Chip | On-Chip | On-Chip | On-Chip | On-Chip | N.A. | On-Chip | On-Chip |
| LO Generation | Off-Chip | On-Chip | Off-Chip | On-Chip | On-Chip | On-Chip | On-Chip | Off-Chip | Off-Chip |
| DPD (Y/N) | Yes (Off-Chip) | Yes (On-Chip) | No | Yes (Off-Chip) | N.A. | Yes (Off-Chip) | Yes (On-Chip) | Yes (On-Chip) | Yes (On-Chip) |
| TX Chain | No Baseband & DPD | No Baseband | Whole Chain | N.A. | No Baseband | No Baseband & DPD | Whole Chain | Whole Chain | Whole Chain |

<sup>1</sup> Estimated from the spectrum.

### References

- [1] M. Acar et al., "Analytical Design Equations for Class-E Power Amplifiers," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 54, no. 12, pp. 2706-2717, Dec. 2007.
- [2] R. B. Staszewski et al., "All-digital PLL and transmitter for mobile phones," in *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 1468-1482, Dec. 2005.
- [3] L. Ye et al., "A Digitally Modulated 2.4GHz WLAN Transmitter with Integrated Phase Path and Dynamic load Modulation in 65nm CMOS," in *ISSCC Tech. Digest*, Feb. 2013, pp. 330-331.
- [4] S. Zheng et al., "A WCDMA/WLAN Digital Polar Transmitter With Low-Noise ADPLL, Wideband PM/AM Modulator and Linearized PA," in *IEEE J. Solid-State Circuits*, vol. 50, no. 7, pp. 1645-1656, Jul. 2015.
- [5] R. Winoto et al., "9.4 A 2×2 WLAN and Bluetooth combo SoC in 28nm CMOS with on-chip WLAN digital power amplifier, integrated 2G/BT SP3T switch and BT pulling cancelation," in *ISSCC Tech. Digest*, Feb. 2016, pp. 170-171.
- [6] Q. Zhu et al., "A Digital Polar Transmitter With DC-DC Converter Supporting 256-QAM WLAN and 40-MHz LTE-A Carrier Aggregation," in *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1196-1209, May 2017.
- [7] P. Madoglio et al., "13.6 A 2.4GHz WLAN digital polar transmitter with synthesized digital-to-time converter in 14nm trigate/FinFET technology for IoT and wearable applications," *2017 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2017, pp. 226-227.
- [8] T. Li et al., "A Wideband Digital Polar Transmitter with Integrated Capacitor-DAC-Based Constant-Envelope Digital-to-Phase Converter," *2019 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Boston, MA, USA, 2019, pp. 83-86.
- [9] K. Xu et al., "A 0.85mm2 51%-Efficient 11-dBm Compact DCO-DPA in 16-nm FinFET for Sub-Gigahertz IoT TX Using HD2 Self-Suppression and Pulling Mitigation," in *IEEE Journal of Solid-State Circuits*, vol. 54, no. 7, pp. 2028-2037, July 2019.
- [10] N. Markulic et al., "A 5.5-GHz Background-Calibrated Subsampling Polar Transmitter With -41.3-dB EVM at 1024 QAM in 28-nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 54, no. 4, pp. 1059-1073, April 2019.
- [11] A. Ben-Bassat et al., "10.5 A Fully Integrated 27dBm Dual-Band All-Digital Polar Transmitter Supporting 160MHz for WiFi 6 Applications," *2020 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, USA, 2020, pp. 180-182.
- [12] M. S. Alavi et al., "A Wideband 2×13-bit All-Digital I/Q RF-DAC," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 4, pp. 732-752, April 2014.
- [13] Z. Deng et al., "9.5 A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40nm CMOS," *2016 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2016, pp. 172-173.
- [14] W. Yuan et al., "A Quadrature Switched Capacitor Power Amplifier," in *IEEE Journal of Solid-State Circuits*, vol. 51, no. 5, pp. 1200-1209, May 2016.
- [15] Y. Shen et al., "A fully-integrated digital-intensive polar Doherty transmitter," *2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Honolulu, HI, 2017, pp. 196-199.
- [16] N. Wongkomet et al., "A +31.5 dBm CMOS RF Doherty Power Amplifier for Wireless Communications," in *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2852-2859, Oct. 2006.
- [17] Y. Chee et al., "17.1 A Digitally Assisted CMOS WiFi 802.11ac/11ax Front-End Module Achieving 12% PA Efficiency at 20dBm Output Power with 160MHz 256-QAM OFDM Signal," in *ISSCC Tech. Digest*, Feb. 2017, pp. 292-293.
- [18] S. Hu et al., "Design of A Transformer-Based Reconfigurable Digital Polar Doherty Power Amplifier Fully Integrated in Bulk CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 50, no. 5, pp. 1094-1106, May 2015.
- [19] S. Hu et al., "A Compact Broadband Mixed-Signal Power Amplifier in Bulk CMOS With Hybrid Class-G and Dynamic Load Trajectory Manipulation," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 6, pp. 1463-1478, June 2017.
- [20] V. Vorapipat et al., "Voltage Mode Doherty Power Amplifier," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 5, pp. 1295-1304, May 2017.
- [21] V. Vorapipat et al., "A Class-G Voltage-Mode Doherty Power Amplifier," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 12, pp. 3348-3360, Dec. 2017.
- [22] Y. Yin et al., "A Compact Dual-Band Digital Polar Doherty Power Amplifier Using Parallel-Combining Transformer," in *IEEE Journal of Solid-State Circuits*, vol. 54, no. 6, pp. 1575-1585, June 2019.
- [23] S. Hung et al., "A Quadrature Class-G Complex-Domain Doherty Digital Power Amplifier," *2019 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Boston, MA, USA, 2019.
- [24] J. Sheth et al., "A Differential Digital 4-way Doherty Power Amplifier with 48% Drain Efficiency for Low Power Application," *2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Los Angeles, CA, USA, 2020.
- [25] B. Yang, H. J. Qian and X. Luo, "26.5 A Watt-Level Quadrature Switched/Floated-Capacitor Power Amplifier with Back-Off Efficiency Enhancement in Complex Domain Using Reconfigurable Self-Coupling Canceling Transformer," *2021 IEEE International Solid-State Circuits Conference (ISSCC)*, 2021, pp. 362-364.
- [26] H. Qian et al., "A Quadrature Digital Power Amplifier with Hybrid Doherty Impedance boosting for Efficiency Enhancement for in Complex Domain," *2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Los Angeles, CA, USA, 2020.
- [27] D. Jung et al., "A CMOS 1.2-V Hybrid Current- and Voltage-Mode Three-Way Digital Doherty PA With Built-In Phase Nonlinearity Compensation," in *IEEE Journal of Solid-State Circuits*, vol. 55, no. 3, pp. 525-535, March 2020.
- [28] M. Hashemi et al., "An Intrinsically Linear Wideband Polar Digital Power Amplifier," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 12, pp. 3312-3328, Dec. 2017.
- [29] M. Hashemi et al., "A Highly Linear Wideband Polar Class-E CMOS Digital Doherty Power Amplifier," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 67, no. 10, pp. 4232-4245, Oct. 2019.
- [30] I. Aoki, S. D. Kee, D. B. Rutledge and A. Hajimiri, "Distributed active transformer-a new power-combining and impedance-transformation technique," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 50, no. 1, pp. 316-331, Jan. 2002.
- [31] W. Wu et al., "A 56.4-to-63.4 GHz Multi-Rate All-Digital Fractional-N PLL for FMCW Radar Applications in 65 nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 49, no. 5, pp. 1081-1096, May 2014.
- [32] Z. Hu, L. C. N. de Vreede, M. S. Alavi, D. A. Calvillo-Cortes, R. B. Staszewski and S. He, "A 5.9 GHz RFDAC-based outphasing power amplifier in 40-nm CMOS with 49.2 % efficiency and 22.2 dBm power," *2016 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, San Francisco, CA, 2016, pp. 206-209.
- [33] L. Xiong, T. Li, Y. Yin, H. Min, N. Yan and H. Xu, "4.2 A Broadband Switched-Transformer Digital Power Amplifier for Deep Back-Off Efficiency Enhancement," *2019 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, USA, 2019, pp. 76-78.

## Chapter 4: RFDAC-Based Wideband Phase Modulator

In this chapter, a fully integrated Cartesian RFDAC-based wideband phase modulator for a polar DTX is implemented in a 40 nm bulk CMOS process [1]. A harmonic rejection (HR) RFDAC architecture that suppresses the third and fifth harmonics is proposed to improve the in-band linearity and extend the frequency range of operation. The achieved frequency agility of the phase modulator is verified over a 0.6-2.5 GHz range, yielding an EVM of -34.5 dB and -36 dB for 18 Mb/s and 75 Mb/s GMSK signals, respectively. It can also support an 80 MHz 64-QAM signal in combination with an on-chip AM modulator.
The power consumption of the proposed phase modulator is only 33 mW at 2.4 GHz.

This chapter is organized as follows: Section 4.1 presents a brief overview of the current state-of-the-art phase modulators. Section 4.2 analyzes the sources of phase error in Cartesian-based phase modulators and proposes a novel architecture to widen the operating frequency range. Following that, Section 4.3 and Section 4.4 focus on the circuit implementation of the RFDAC and the subsequent limiter, respectively. Experimental results are shown in Section 4.5, and Section 4.6 concludes this chapter.

### 4.1 Overview of Phase Modulators

Various DTX concepts have been proposed that exploit highly efficient switch-mode PA operation, e.g., standalone polar as well as, more recently, digital Doherty and outphasing concepts. In all of these DTX concepts, the phase modulator is ultimately the key building block that significantly impacts the achievable overall linearity of the targeted TX line-up. The most straightforward way to achieve phase modulation is to apply the modulation data directly to a voltage-controlled oscillator (VCO). Although this approach has the advantage of a wide modulation bandwidth, it also suffers from various disadvantages, such as the non-linearity of the VCO phase transfer, high close-in phase noise, and relatively large frequency deviations due to PVT variations. In the literature, several alternative phase modulator concepts have been introduced to overcome those shortcomings. They can be categorized into three main types: PLL-based, delay line-based, and Cartesian-based phase modulators.

Figure 4.1: (a) Top-level conceptual diagram of a typical CP-PLL (type 2); (b) small-signal model of the CP-PLL in the phase domain.

#### 4.1.1 PLL-Based Phase Modulators

To alleviate the aforementioned VCO non-linearity, the most widely used approach to implement a phase modulator is to incorporate this function within a PLL.
A block diagram of a conventional charge pump (CP)-PLL is shown in Fig. 4.1(a), including the phase-frequency detector (PFD), CP, LPF, VCO, and multi-mode divider (MMD). A PLL operates as a negative feedback loop in the phase domain that locks the phase of the VCO to a reference. The divider first reduces the VCO frequency to the frequency of the reference signal. Then, the PFD measures the phase (and frequency) error of the divided signal with respect to this reference. This phase error is fed to the CP and converted to a voltage, which is filtered by an LPF to suppress the reference spur. The digital $\Delta\Sigma$ block illustrated in Fig. 4.1(a) is used when there is no integer relation between the output frequency of the PLL and the reference frequency.

Figure 4.2: Conceptual diagram of (a) direct modulation; (b) two-point modulation.

The small-signal model for the phase domain is shown in Fig. 4.1(b). Note that since the VCO output is expressed as a frequency instead of a phase, the VCO should be modeled as an integrator in the phase domain. From Fig. 4.1(b), it can be deduced that the output of the overall loop $\Phi_{\rm OUT}$ shows a low-pass frequency response to $\Phi_{\rm IN}$ and a high-pass frequency response to $\Phi_{\rm VCO}$.

To add phase modulation functionality to the PLL, perhaps the most straightforward way is to add the phase data to the input of the $\Delta\Sigma$ modulator, as shown in Fig. 4.2(a) ([2]). This approach requires the least additional circuitry and helps to decrease power consumption and circuit complexity. What is more, thanks to the high Q of the resonator in the VCO, the harmonic level at the output is low, eliminating the need for an additional filter. However, the bandwidth of the modulation signal cannot be wide due to the limited bandwidth of the PLL loop. Another path going directly to the VCO can be added to widen the video bandwidth.
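
The low-pass and high-pass responses of the phase-domain model can be checked numerically, and their sum hints at why a second injection point at the VCO flattens the overall response. The sketch below assumes a generic type-2 loop with a series RC loop filter; all component values, the function names, and the `vco_path_gain` parameter (which models a gain error in the VCO path) are hypothetical and are not taken from any design discussed here.

```python
import math


def loop_gain(f_hz, i_cp=100e-6, r=10e3, c=100e-12,
              k_vco=2 * math.pi * 50e6, n_div=100):
    """Open-loop gain G(s) of a generic type-2 CP-PLL at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * f_hz
    z_filter = r + 1.0 / (s * c)   # series RC loop filter impedance
    vco = k_vco / s                # VCO acts as an integrator in the phase domain
    return (i_cp / (2 * math.pi)) * z_filter * vco / n_div


def two_point_response(f_hz, vco_path_gain=1.0):
    """Return (low-pass path, high-pass path, sum) at frequency f_hz."""
    g = loop_gain(f_hz)
    low_pass = g / (1.0 + g)                # modulation injected at the loop input
    high_pass = vco_path_gain / (1.0 + g)   # modulation injected at the VCO
    return low_pass, high_pass, low_pass + high_pass


lp_low, hp_low, _ = two_point_response(1e3)     # well inside the loop bandwidth
lp_high, hp_high, _ = two_point_response(1e8)   # well outside the loop bandwidth
```

With matched path gains the sum equals one at every frequency; setting `vco_path_gain` away from 1.0 makes the sum frequency dependent, which illustrates the sensitivity to gain errors in the VCO path.
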
The resulting output presents high-pass behavior to the signal fed directly to the VCO. The low-pass behavior from the input and the high-pass behavior of the extra signal path to the VCO can be summed, resulting in an all-pass behavior that extends the modulation bandwidth. This approach is called "two-point" modulation and is widely used for PLL-based phase modulators. Its conceptual diagram is shown in Fig. 4.2(b). However, in practical situations, the modulation bandwidth is still limited since the total transfer function is not flat. What is worse, loop parameters such as $K_{\text{VCO}}$ also vary due to PVT variations, yielding phase errors. Therefore, background calibration ([3] and [4]) is typically required for two-point modulation, which increases the complexity of the whole system.

Figure 4.3: Open-loop phase modulator with a coarse tapped delay line with (a) a fine digitally-controlled delay and (b) a digital $\Delta\Sigma$ modulator.

#### 4.1.2 Delay Line-Based Phase Modulators

The aforementioned PLL-based phase modulators still have limitations that prevent them from supporting large video bandwidths. Delay line-based phase modulators have been proposed to overcome this issue. Examples of such an approach can be found in [5] and [6], which focus on application in an outphasing TX. These modulators typically use inverter-based delay elements to delay the LO signal by the desired phase angle. As shown in Fig. 4.3(a), a multiplexer (MUX) selects the tap of the inverter-based delay line that provides the desired phase, and the selection can be changed statically or dynamically. Unlike PLL-based phase modulators, these modulators need an external PLL/oscillator to generate a static LO signal. Phase modulation can also be achieved by controlling the delay of the elements through their supply voltage. To generate a delay/phase with more accuracy, a delay-locked loop (DLL) can also be employed.
The DLL controls each element's delay in order to align the edges of the input and the output of the last delay element. The drawback is that although the DLL regulates the delay of the whole line, the delays of the individual elements can still suffer from mismatch, which degrades the phase accuracy. In addition, the intrinsic delay of an inverter in a typical 40 nm CMOS technology is about 20 ps. To cover the whole period of a 2.4 GHz carrier (about 417 ps), roughly 20 inverters are needed, and matching their delays takes considerable design effort. What is worse, 20 delay elements with a MUX can only provide a resolution of about 4 bits, which is far less than what high-order modulation schemes such as 256-QAM require. Consequently, auxiliary circuits are needed to implement a digitally controlled delay line (DCDL) with finer steps (Fig. 4.3(a)). The concept shown in Fig. 4.3(a) is similar to a two-step ADC, in which the fine DCDL is the more challenging part to design. Despite these drawbacks, this type of phase modulator supports WLAN signals with both 20 MHz and 40 MHz channel bandwidths using DPD.

To avoid the design burden of finer steps in the DCDL, a $\Delta\Sigma$ modulator can also be used to achieve a more accurate phase resolution (see Fig. 4.3(b) and [7]). With a high oversampling ratio (OSR) for the $\Delta\Sigma$ modulator, the phase accuracy improves at the expense of higher out-of-band noise. In [7], measurements show that for 20 Mb/s, the shaped quantization noise of the $\Delta\Sigma$ modulator does not corrupt the noise floor. However, in this approach, the phase-modulated signal cannot reach a high bandwidth, since a high OSR must be maintained to guarantee sufficient phase accuracy. Therefore, a large bandwidth (>80 MHz) signal cannot be supported due to the bandwidth extension.
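
The coarse delay-line estimate given earlier in this subsection (about 20 ps per inverter at a 2.4 GHz carrier) can be reproduced with a few lines of arithmetic. The function name and the rounding convention (rounding the stage count up) are choices made for this illustration only.

```python
import math


def delay_line_budget(f_lo_hz, unit_delay_s):
    """Stages needed to span one LO period, equivalent bits, and phase step."""
    period = 1.0 / f_lo_hz
    stages = math.ceil(period / unit_delay_s)    # taps needed to cover 360 degrees
    bits = math.log2(stages)                     # equivalent coarse phase resolution
    step_deg = 360.0 * unit_delay_s / period     # phase step of one delay element
    return stages, bits, step_deg


stages, bits, step_deg = delay_line_budget(2.4e9, 20e-12)
```

For these inputs the result is 21 stages, a little over 4 bits, and a coarse phase step of about 17°, in line with the rough figures quoted above and with the need for a finer interpolation stage.
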
#### 4.1.3 Cartesian-Based Phase Modulators

In [8], a Cartesian-based phase modulator is used in a digital polar TX line-up; its conceptual diagram is shown in Fig. 4.4(a). This concept is based on a conventional Cartesian TX configuration. In such an architecture, if the phase modulation data is converted into a constant-envelope I/Q signal by a normalizer, the Cartesian-based TX can serve as a phase modulator in a polar TX architecture. In such a line-up, if a switching PA bank is employed for AM modulation, a limiter is required at the output of the phase modulator to convert the phase-modulated signal into a square wave, which, in turn, is used to drive the switch banks. This architecture is able to meet the requirements for 20 MHz WLAN signals ([8]).

A current-steering RFDAC can provide the desired combination of a baseband DAC and mixer operation, where the baseband data is mixed with a square-wave signal with a frequency $f_{LO}$ to drive the output stage switch banks. Compared to other phase modulators, Cartesian-based phase modulators suffer from higher power consumption, due to the relatively power-hungry operation of RFDACs. Some power is also consumed by the normalizers. Furthermore, the Cartesian-based phase modulator requires a passive BPF at the input of the limiter to suppress the harmonics, which typically includes an LC resonator. Implementing such a filter limits the operating frequency range and requires a large area. The theoretical background of how harmonics corrupt the phase accuracy is given in the following section. Last but not least, like delay line-based concepts, Cartesian-based phase modulators also require an external LO source.

Figure 4.4: (a) Replacing the BPF with an RC-LPF in a conventional Cartesian-based phase modulator; (b) principle of C-IMD explained using single-sideband modulation.

### 4.2 Towards a Wideband Cartesian-Based Phase Modulator

In the previous section, three types of phase modulators were reviewed.
Although the PLL-based phase modulator can generate the LO signal locally while consuming less power, its low loop bandwidth limits its use in broadband applications. The delay line-based phase modulator can support a very large modulation bandwidth, but since PVT variations corrupt its phase accuracy, background calibration is needed. Finally, a Cartesian-based phase modulator can support a very large video bandwidth and is free from bandwidth extensions. However, its output BPF limits its operating bandwidth while occupying a relatively large area. This section proposes a novel Cartesian-based phase modulator with an extended operating bandwidth and reduced area based on the HR technique.

#### 4.2.1 Phase Error Source Analysis

In Fig. 4.4(a), the operating frequency range is limited by the BPF, which is usually an LC resonator. Although the LC BPF can be made reconfigurable by employing switched capacitor banks, its Q factor is degraded by losses caused by the non-zero on-resistance ($R_{\rm ON}$) of the switches. To reach higher flexibility and a smaller die area, an RC-LPF would be preferred despite its slow roll-off of out-of-band attenuation (Fig. 4.4(a)). However, due to the high harmonic content of a conventional Cartesian-based RFDAC, the failure to suppress the LO harmonics adequately, in combination with an output non-linearity, leads to the folding back of frequency components to the close-in spectrum. Such inter-modulation products are called counter-intermodulation distortion (C-IMD); the principle of third-order C-IMD (C-IMD3) and fifth-order C-IMD (C-IMD5) is explained graphically in Fig. 4.4(b). C-IMD is a mixing product of the LO harmonics, the single-sideband signal, and the output non-linearity.
In an RFDAC, the LO signal has a duty cycle of 50 %, and its Fourier expansion up to the fifth order is:

$$V_{\rm LO}(t) = e^{j\omega_{\rm LO}t} - \frac{1}{3}e^{j3\omega_{\rm LO}t} + \frac{1}{5}e^{j5\omega_{\rm LO}t}$$ (4.1)

The frequency components of the up-converted signal (Fig. 4.4(b)-top) are then $\omega_{LO} + \omega_{in}$, $3\omega_{LO} - \omega_{in}$, and $5\omega_{LO} + \omega_{in}$. When we assume that the RFDACs only have third- and fifth-order memoryless non-linearity:

$$V_{\text{out}}(t) = a_1 v_{\text{in}}(t) + a_3 v_{\text{in}}^3(t) + a_5 v_{\text{in}}^5(t)$$ (4.2)

then at the output of the limiter, there will be inter-modulation products close to the desired signal, namely $\omega_{\rm LO} - 3\omega_{\rm in}$ and $\omega_{\rm LO} + 5\omega_{\rm in}$ (Fig. 4.4(b)-middle) ([9]). The component at $\omega_{\rm LO} - 3\omega_{\rm in}$ is the joint product of both C-IMD3 and C-IMD5, whereas $\omega_{\rm LO} + 5\omega_{\rm in}$ is the product of C-IMD5 only. With an ideal limiter, such single-sideband frequency components with respect to $\omega_{\rm LO}$ contribute in-band components that yield phase errors (Fig. 4.4(b)-bottom). To obtain a clean spectrum, the harmonics should be suppressed adequately.

In the time domain, these close-in distortion products appear as phase errors, which corrupt the zero-crossing points of the PM signal. Figure 4.5 plots the simulated static phase error with a first-, second-, and third-order RC-LPF, respectively. Although using a higher-order RC-LPF improves the phase error, the absolute phase error is still high. For instance, even when using a third-order RC-LPF, the static phase error can be as high as 2°. Higher-order RC-LPFs are not chosen because they increase the in-band attenuation and the thermal noise due to the resistors in the RC-LPF. In other words, another technique to suppress the LO harmonics would be most welcome.

Figure 4.5: Static phase error for various RC-LPFs.
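
The bookkeeping behind the C-IMD products of this subsection can be verified by brute force. The sketch below represents each tone as an integer pair (multiple of $\omega_{\rm LO}$, multiple of $\omega_{\rm in}$) and searches for third- and fifth-order combinations of the three up-converted tones that land on a given target. The function and variable names are illustrative; a negative weight stands for the conjugate (negative-frequency) term of a real signal, and amplitudes are ignored.

```python
from itertools import product

# Tones after up-conversion, written as (LO multiple, baseband multiple).
TONES = {
    "fund": (1, 1),   # w_LO + w_in (wanted sideband)
    "h3": (3, -1),    # 3*w_LO - w_in
    "h5": (5, 1),     # 5*w_LO + w_in
}


def mixing_products(target, orders=(3, 5)):
    """List (order, weights) whose weighted tone sum equals the target pair."""
    names = list(TONES)
    span = range(-max(orders), max(orders) + 1)
    hits = []
    for weights in product(span, repeat=len(names)):
        order = sum(abs(w) for w in weights)
        if order not in orders:
            continue
        lo = sum(w * TONES[n][0] for w, n in zip(weights, names))
        bb = sum(w * TONES[n][1] for w, n in zip(weights, names))
        if (lo, bb) == target:
            hits.append((order, weights))
    return sorted(hits)


cimd_low = mixing_products((1, -3))   # candidates for w_LO - 3*w_in
cimd_high = mixing_products((1, 5))   # candidates for w_LO + 5*w_in
```

The search finds both third- and fifth-order combinations for $\omega_{\rm LO} - 3\omega_{\rm in}$ but only fifth-order ones for $\omega_{\rm LO} + 5\omega_{\rm in}$, which matches the statement above.
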
#### 4.2.2 Proposed System Architecture

To allow frequency-agile filtering of the harmonics, especially the third and fifth, HR is an increasingly popular solution, which was first proposed in [10]. This technique significantly relieves the filtering requirements and facilitates a viable solution with reconfigurable low-order RC-LPFs. The principle of the HR technique is shown in Fig. 4.6(a). By employing three mixers with proper conversion gain scaling and shifting their LO inputs by 45° with respect to each other, the third and the fifth harmonics can be canceled out. The gain scaling factors for this purpose should be 1, $\sqrt{2}$, and 1 for the vectors shifted by 0°, 45°, and 90°, respectively. The resulting transient output waveform is shown in Fig. 4.6(b). The principle of harmonic cancellation is explained by the vector diagrams shown in Fig. 4.6(c). At the fundamental frequency, the output is $(1\angle 0^{\circ} + \sqrt{2}\angle 45^{\circ} + 1\angle 90^{\circ})$, while at the third and fifth harmonics, the output is $(1\angle 0^{\circ} + \sqrt{2}\angle 135^{\circ} + 1\angle 270^{\circ})$ and $(1\angle 0^{\circ} + \sqrt{2}\angle 225^{\circ} + 1\angle 90^{\circ})$, respectively. Therefore, at the fundamental frequency, the signals add constructively (to $2\sqrt{2}\angle 45^{\circ}$), whereas at the third and fifth harmonics, the signals cancel each other. Note that the resulting HR level depends strongly on the amplitude and phase mismatch among the three mixers, which is given by:

$$HD_3 = \frac{1}{9}(1 - \cos(3\theta))(1 + \Delta)^2 + ((1 + \Delta)\sin(3\theta)) \tag{4.3}$$

and:

$$HD_5 = \frac{1}{25}(1 - \cos(5\theta))(1 + \Delta)^2 + ((1 + \Delta)\sin(5\theta)) \tag{4.4}$$

in [10], respectively, where $\Delta$ is the amplitude mismatch and $\theta$ is the phase mismatch.

Figure 4.6: (a) Principle of the HR technique in [10]; (b) transient and (c) vector diagrams for the fundamental frequency and the third and fifth harmonics.

In this work, RFDACs replace the mixer and amplifier combination of Fig.
4.6(a); the resulting overall system diagram of the proposed phase modulator is shown in Fig. 4.7(a). As indicated, the signals of three parallel I/Q RFDACs are summed at their output node, while they are driven by different clock phases, namely, $0^{\circ}$, $45^{\circ}$, and $90^{\circ}$, with current-steering scale factors of 1, $\sqrt{2}$, and 1, respectively. A transient simulation, including the RFDACs with HR, was performed to determine the RC-LPF order needed to satisfy the static phase error requirement. The resulting static phase error is shown in Fig. 4.7(b). Compared to Fig. 4.5, the static phase error is dramatically reduced. With second- and third-order RC-LPFs, the R.M.S. error is now less than 0.4° and 0.2°, respectively. Therefore, the second-order RC-LPF is chosen for this design.

The overall functionality of the realized RFDAC-based phase modulator is as follows. As indicated in Fig. 4.7(a), the digital I/Q data is initially loaded through an SPI into two on-chip SRAMs. These SRAMs are connected to an on-chip CORDIC that decomposes the I/Q data into AM and PM signals. Note that the sampling speed $F_S$ of the digital baseband is a quarter of $f_{\rm LO}$. An internal clock generator provides the eight phases required for HR operation. The phase output from the CORDIC is applied to a normalizer, which converts the wanted (digital) phase into a new digital I/Q signal with a constant magnitude. The normalizers can also be used as a LUT to compensate for I/Q mismatch in the phase modulator.

## 4.3 Design of RFDAC

The RFDAC in this design integrates a current-steering baseband DAC and a mixer in a single block. As discussed in the previous section, the proposed design employs six such RFDACs in dual (I and Q) line-ups to suppress the third and fifth harmonics in both branches. This section focuses on the implementation of the RFDAC.

#### 4.3.1 Design of Unit Cell

A schematic of the unit cell is shown in Fig. 4.8(a).
A unit cell contains a current source to provide a DC current, and a switch to steer this current. A cascode configuration is employed for the current source to increase its output impedance at the expense of voltage headroom for the output signal. It has been shown that this is a necessary measure to boost DAC linearity ([11]). The switch is implemented as a digitally driven differential pair. The output of the mixer is differential, which is necessary to establish overall push-pull RFDAC operation. In this design, the mixing operation is implemented using a bit-by-bit XOR approach inside the RFDAC unit cell. Instead of driving the switch transistors M1 and M2 directly with data, as in a conventional current-steering DAC, the switches are driven by two XOR gates. The inputs of these XOR gates are the data bit $m_i$ under consideration and the LO signals. A schematic of the XOR gate used in this design, which consists of two transmission gates, is shown in Fig. 4.8(a).

Figure 4.7: (a) Block diagram of the proposed phase modulator; (b) related static phase error with HR technique using different RC-LPFs.

Figure 4.8: (a) Conceptual schematic of an RFDAC unit cell; (b) schematic of an RFDAC unit cell with dummy switches.

Figure 4.9: Floorplan of one single RFDAC.

However, there is a drawback to this unit cell. Due to the gate-drain capacitance $C_{\rm gd}$ of the transistors M1 and M2, charge is injected from the drive signal directly into the output (Fig. 4.8(a)). This non-linear behavior will disturb the output at each rising or falling edge with current spikes. Since these spikes are data-dependent, the RFDAC output will exhibit non-linear distortion. To address this issue, dummy switch transistors M3 and M4 are employed, as illustrated in Fig. 4.8(b). These transistors are identical to M1 and M2 so that $C_{\rm gd}$ is equal for all four devices.
The drains of the dummy transistors are cross-connected to cancel out the charge injection. The sources of M3 and M4 are connected together and left floating. Although they double the capacitive load of the XOR gates, these dummy transistors prove capable of boosting linearity.

#### 4.3.2 Floorplan

To satisfy the out-of-band requirement and shrink the occupied area, the resolution of the RFDAC is set to 9 bits, split into two segments: a 6-bit MSB thermometer-coded segment and a 3-bit LSB binary-weighted segment. One common approach is to place the cells in an array and drive them by rows and columns, which decreases the parasitic capacitance and improves device matching compared to a line-oriented topology. In this approach, the three LSBs are used to control the binary-weighted cells. For the unary-weighted part, DATA[5-3] and DATA[8-6] are converted into the row- and column-select signals, respectively, using two 3-to-7 thermometer encoders. Following the encoders, DFFs are placed to prevent any glitches that would degrade the spectral purity of the phase signal. The binary-weighted segment is configured in the same manner. The DFF and thermometer encoder are the same as those used in Chapter 3.

## 4.4 Implementation of Limiter

The limiter consists of analog gain stages and a digital buffer chain. Its top-level schematic is shown in Fig. 4.10.

Figure 4.10: Top-level conceptual schematic of the limiter.

As shown in Fig. 4.11(a), the analog gain stages consist of three cascaded CML buffers, a configuration widely used in optical communication systems. The first gain stage also acts as the second stage of the RC-LPF. Reconfigurable capacitor banks are connected in parallel with the load resistors in the first stage to adapt the cut-off frequency to the LO frequency. Both capacitor banks have 4-bit resolution, are controlled through the SPI, and have a maximum capacitance of approximately 87 fF.
The W/L of the transistors is tuned to achieve the gain within a power budget of 1 mW. Overall, the three gain stages in this design achieve a small-signal gain of approximately 30 dB at 2.4 GHz. Unlike the RFDAC design, the current sources used to bias the differential pairs are not of the cascode type due to the limited available voltage headroom. Each stage has an output common-mode voltage of approximately 0.6 V. The second and third stages enter the clipping region, and thus, their output swing ranges from 100 mV up to 1.1 V.

The output of the analog gain stages is AC-coupled to the input of a CMOS buffer chain to convert the signal into rail-to-rail CMOS logic signals. A schematic of the self-biased inverter and the buffer chain is depicted in Fig. 4.11(b). The series capacitors at the input block the DC component at the output of the gain stages. Self-biased inverters generate the input DC bias voltage of the buffer chain. Note that any deviation in the DC bias voltage will corrupt the phase accuracy. The corner frequency of the high-pass filter (HPF) formed by the RC of the AC path and DC feed is approximately 83 MHz, which is far below the target operating frequency range. The push-pull digital buffer chain includes two stages of inverters with phase aligners. The buffer chain has to drive the input buffer of the DPA, which is an inverter of four times the minimum size, with 10 fF of extra parasitic capacitance in the routing traces. In the post-layout simulation, the rise and fall times of the phase-modulated signal are approximately 70 ps.

Figure 4.11: Schematic of (a) analog gain stages and (b) buffer chain.

## 4.5 Measurement Results

The proposed wideband phase modulator is fabricated in a 40 nm bulk CMOS technology, together with the circuits from Chapter 3. Figure 4.12 shows the chip micrograph.
The three pairs of RFDACs are placed side by side, and the total core area is 0.21 mm<sup>2</sup> excluding the CORDIC and normalizers, while the core area is 0.16 mm<sup>2</sup>. The measurement setup is the same as that in Chapter 3. In the measurements, the AM control word is set to its highest value.

Figure 4.12: Micrograph of the phase modulator.

#### 4.5.1 Single-tone Test

A single-tone test is a commonly used method to determine the spectral performance and phase error of the phase modulator. Such a signal should result in a constant frequency shift from the LO center frequency. The LO leakage and IQ image of the single-tone measurement within the frequency range of 0.6-2.5 GHz are shown in Fig. 4.13(a), with a tone spacing of $f_{\rm LO}/128$. These measurements verify that the LO leakage and IQ image are below -45 dBc and -57 dBc, respectively, without any IQ calibration or DPD. Figure 4.13(b) presents the constellation diagram of the single-tone serrodyne measurement at $f_{\rm LO}$ = 2.5 GHz. Note that the 128 constellation points are clearly distinct, showing superior phase accuracy. To quantify the phase error, Fig. 4.14(a) shows the measured output phase versus digital input, while Fig. 4.14(b) highlights the related static phase error versus input phase at $f_{\rm LO}$ = 2.5 GHz after going through a digital static phase LUT, yielding an R.M.S. phase error of only 0.72°. The power consumption, excluding that of the baseband circuits, is 12 mW at 0.6 GHz and 33 mW at 2.4 GHz.

Figure 4.13: Measured (a) LO leakage and IQ image vs. $f_{\rm LO}$; (b) constellation diagram for 128 phases at $f_{\rm LO}$ = 2.5 GHz.

Figure 4.14: Measured (a) output phase and (b) static phase error vs. input phase at $f_{\rm LO}$ = 2.5 GHz, which equals $0.72^{\circ}$ (R.M.S. value).

Figure 4.15: Measured spectrum of (a) an 8 Mb/s GFSK signal at $f_{\rm LO}$ = 1 GHz; (b) an 80 Mb/s GFSK signal at $f_{\rm LO}$ = 2.5 GHz.
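The serrodyne stimulus used in this test can be sketched in a few lines. The Python example below is a hypothetical model, not the on-chip normalizer: it steps the phase through 128 equally spaced values and maps each phase to a constant-magnitude I/Q code pair, as the normalizer of Section 4.2.2 is described to do. The signed full-scale code of 255 and the function names are assumptions made for illustration; the script then reports the phase error caused by code rounding alone.

```python
import math

PHASE_STEPS = 128   # number of phases, as in the single-tone test
FULL_SCALE = 255    # assumed signed full-scale code per RFDAC


def normalizer(phase_index):
    """Map a phase index to a constant-magnitude (I, Q) code pair."""
    phi = 2.0 * math.pi * phase_index / PHASE_STEPS
    return (round(FULL_SCALE * math.cos(phi)),
            round(FULL_SCALE * math.sin(phi)))


def wrapped(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


# A linear phase ramp (serrodyne) produces a constant frequency shift.
codes = [normalizer(k) for k in range(PHASE_STEPS)]

errors_deg = []
for k, (i_code, q_code) in enumerate(codes):
    ideal = 2.0 * math.pi * k / PHASE_STEPS
    errors_deg.append(math.degrees(wrapped(math.atan2(q_code, i_code) - ideal)))

rms_error = math.sqrt(sum(e * e for e in errors_deg) / len(errors_deg))
magnitudes = [math.hypot(i_code, q_code) for i_code, q_code in codes]

print("distinct code pairs:", len(set(codes)))
print("magnitude spread (codes): %.2f" % (max(magnitudes) - min(magnitudes)))
print("rms phase error from rounding (deg): %.3f" % rms_error)
```

The rounding contribution computed this way is only a lower bound for an idealized converter; the measured static phase error additionally includes analog effects such as residual harmonics and mismatch.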
#### 4.5.2 Modulated Signal Measurement

The phase modulator is first measured using a phase-modulated signal with a constant amplitude. For this purpose, a GFSK signal is applied to the proposed phase modulator as IQ data stored in the memory. Figure 4.15 presents the measured spectra with the GFSK signal at 1 GHz and 2.5 GHz, respectively. With an 8 Mb/s GFSK signal at $f_{\rm LO}$ = 1 GHz, the EVM is -27 dB with an output emission lower than -46 dBc. With an 80 Mb/s GFSK signal at $f_{\rm LO}$ = 2.5 GHz, the EVM is -26 dB with an output emission lower than -48 dBc, showing that the proposed phase modulator can operate over a wide frequency range. Following that, the spectrum of a 75 Mb/s GMSK signal at $f_{\rm LO}$ = 2.4 GHz is shown in Fig. 4.16 together with the trellis diagram, indicating that the out-of-band emission level is better than -50 dBr, while the EVM is about 1.6 % (-36 dB). The same measurement is repeated at $f_{\rm LO}$ = 0.6 GHz, where the EVM becomes 1.92 % (-34.5 dB).

Figure 4.16: Measured spectrum of a 75 Mb/s GMSK signal at $f_{\rm LO}$ = 2.4 GHz, with a trellis diagram.

Figure 4.17: Measured spectrum and constellation diagram of an 80 MHz 64-QAM signal at $f_{\rm LO}$ = 2.4 GHz.

Finally, the proposed phase modulator is also measured together with the DPA described in Chapter 3, and the resulting measured spectrum is shown in Fig. 4.17 together with the constellation diagram. With an 80 MHz bandwidth 64-QAM signal, the system achieves an EVM of better than -27 dB after DPD correction for distortion resulting from the AM path. Note that due to the non-linear decomposition from the IQ signal to the polar signal, the effective bandwidth for an 80 MHz QAM signal is in the order of 240-400 MHz. These measurement results show that the proposed wideband phase modulator can indeed support a large video bandwidth. Table 4.1 summarizes the performance of the phase modulator and compares it with state-of-the-art phase modulators.
Compared to other standalone or integrated phase modulators, the proposed phase modulator shows a large operational bandwidth and an excellent capability of handling a high modulation bandwidth. Note that the AM path causes the bottleneck in the linearity of the QAM signal measurement.

| Ref | Ye, JSSC13 | Li, RFIC19 | Marzin, JSSC12 | Li, TMTT17 | Gheidi, JSSC17 | Nidhi, TMTT17 | This Work | This Work |
|-----------------|-----------|-----------|------|---------|--------|-------|-----------|-----------|
| Process (nm) | 65 | 40 | 40 | 65 | 45 SOI | 45 | 40 | 40 |
| Architecture | Cartesian | Cartesian | PLL | PLL | Delay | Delay | Cartesian | Cartesian |
| Area (mm²) | 0.31 | 0.045 | 0.7 | 0.7 | 0.15 | 0.1 | 0.21 | 0.21 |
| Frequency (GHz) | 2.2 | 1.2-2.5 | 2.9-4 | 1.5-2.1 | 1-3 | 1.8 | 0.6 | 2.4 |
| Power (mW) | 17 | 12.5 | 5 | 10.6 | 34 | 20 | 12 | 33 |
| Modulation | OFDM | LTE | GMSK | GFSK | GMSK | GFSK | GMSK | 64QAM |
| Signal BW | 20 | 10 | 10 | 20 | 20 | 100 | 18 | 80 |
| EVM (dB) | -28<sup>2</sup> | -28<sup>2</sup> | -36 | -28.6 | -34 | -22 | -34.5 | -27<sup>2</sup> |

Table 4.1: Performance summary and comparison with state-of-the-art phase modulators.

## 4.6 Conclusion

This chapter demonstrates a 0.6-2.5 GHz Cartesian-based wideband phase modulator in a 40 nm CMOS technology. By employing the HR technique, it achieves -45 dBc LO leakage and a -58 dBc IQ image at $f_{\rm LO}$ = 2.5 GHz. Its out-of-band emission is better than -50 dBc for a 75 Mb/s GMSK signal at $f_{\rm LO}$ = 2.4 GHz with a related EVM of about -36 dB. The proposed phase modulator can also support an 80 MHz 64-QAM signal in a polar TX, achieving an output emission better than -40 dBc and an EVM better than -27 dB. These values compare very favorably with current state-of-the-art phase modulators.

#### References

- [1] Y.
Shen *et al.*, "A wideband I/Q RFDAC-based phase modulator," 2018 IEEE 18th Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF), 2018, pp. 8-11.
- [2] N. M. Filiol, T. A. D. Riley, C. Plett and M. A. Copeland, "An agile ISM band frequency synthesizer with built-in GMSK data modulation," in *IEEE Journal of Solid-State Circuits*, vol. 33, no. 7, pp. 998-1008, July 1998.
- [3] T. Buckel *et al.*, "A Highly Reconfigurable RF-DPLL Phase Modulator for Polar Transmitters in Cellular RFICs," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 66, no. 6, pp. 2618-2627, June 2018.
- [4] X. Li, S. Lv, W. Rhee, W. Jia and Z. Wang, "20-Mb/s GFSK Modulator Based on 3.6-GHz Hybrid PLL With 3-b DCO Nonlinearity Calibration and Independent Delay Mismatch Control," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 65, no. 7, pp. 2387-2398, July 2017.

Notes to Table 4.1: <sup>1</sup> Estimated from chip micrograph; <sup>2</sup> Combined with AM modulator; the distortion is partly contributed by the AM path.

- [5] H. Xu, Y. Palaskas, A. Ravi, M. Sajadieh, M. A. El-Tanani and K. Soumyanath, "A Flip-Chip-Packaged 25.3 dBm Class-D Outphasing Power Amplifier in 32 nm CMOS for WLAN Application," in *IEEE Journal of Solid-State Circuits*, vol. 46, no. 7, pp. 1596-1605, July 2011.
- [6] P. Madoglio *et al.*, "A 20dBm 2.4GHz digital outphasing transmitter for WLAN application in 32nm CMOS," 2012 IEEE International Solid-State Circuits Conference, San Francisco, CA, 2012, pp. 168-170.
- [7] H. Gheidi, T. Nakatani, V. W. Leung and P. M. Asbeck, "A 1–3 GHz Delta–Sigma-Based Closed-Loop Fully Digital Phase Modulator in 45-nm CMOS SOI," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 5, pp. 1185-1195, May 2017.
- [8] L. Ye, J. Chen, L. Kong, P. Cathelin, E. Alon and A.
Niknejad, "A digitally modulated 2.4GHz WLAN transmitter with integrated phase path and dynamic load modulation in 65nm CMOS," 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, 2013, pp. 330-331.
- [9] M. Mikhemar *et al.*, "A Rel-12 2G/3G/LTE-Advanced 3CC Cellular Receiver," in *IEEE Journal of Solid-State Circuits*, vol. 51, no. 5, pp. 1066-1079, May 2016.
- [10] J. A. Weldon *et al.*, "A 1.75-GHz highly integrated narrow-band CMOS transmitter with harmonic-rejection mixers," in *IEEE Journal of Solid-State Circuits*, vol. 36, no. 12, pp. 2003-2015, Dec. 2001.
- [11] C. Lin *et al.*, "A 12 bit 2.9 GS/s DAC With IM3 < -60 dBc Beyond 1 GHz in 65 nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 44, no. 12, pp. 3285-3293, Dec. 2009.
- [12] T. Li *et al.*, "A Wideband Digital Polar Transmitter with Integrated Capacitor-DAC-Based Constant-Envelope Digital-to-Phase Converter," 2019 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), Boston, MA, USA, 2019, pp. 83-86.
- [13] G. Marzin, S. Levantino, C. Samori and A. L. Lacaita, "A 20 Mb/s Phase Modulator Based on a 3.6 GHz Digital PLL With -36 dB EVM at 5 mW Power," in *IEEE Journal of Solid-State Circuits*, vol. 47, no. 12, pp. 2974-2988, Dec. 2012.
- [14] N. Nidhi and S. Pamarti, "Design and Analysis of a 1.8-GHz Open-Loop Modulator for Phase Modulation and Frequency Synthesis Using TDC-Based Calibration," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 65, no. 10, pp. 3975-3988, Oct. 2017.

# Chapter 5: A Wideband IQ DDRM with an IQ-Mapping Technique

This chapter presents a wideband, $2\times12$-bit DDRM with an IQ-mapping technique realized in a 40 nm CMOS technology. The proposed digital-intensive quadrature up-converter features an advanced IQ-mapping technique to boost RF power, in-band linearity, and out-of-band spectral purity.
The modulator can provide more than 14 dBm peak RF output power into a 50 $\Omega$ load and achieves an ACLR of -52 dBc and an EVM of -40 dB when applying a 20 MHz 256-QAM signal at 2.4 GHz. When applying a 320 MHz 256-QAM signal at 2.4 GHz, the measured ACLR and EVM are better than -43 dBc and -32 dB, respectively, without applying any DPD.

To develop the proposed DDRM, Section 5.1 introduces the concept of the DDRM, as well as its challenges. Section 5.2 provides the background of the IQ-mapping concept and presents its advantages. The related system architecture is developed in Section 5.3, while Sections 5.4, 5.5, and 5.6 focus on the circuit implementation. Experimental results are shown in Section 5.7, and Section 5.8 concludes the main findings of this chapter.

## 5.1 Direct-Digital RF Modulators

#### 5.1.1 Concept of Direct-Digital RF Modulators

The ever-increasing demand for fast data access and high throughput is driving wireless communication towards 5G, which utilizes a larger modulation bandwidth and MIMO operation while demanding higher system efficiency and integration. To meet this demand, in recent years, extensive research has been directed towards highly linear and power-efficient TX line-ups. The key building block of these architectures is the RF modulator, which can be realized as either a Cartesian (I/Q) modulator ([1]-[5]) or its polar counterpart ([6]). Due to the linear summation of the I and Q signals, an I/Q DTX appears to be the best candidate to handle the large modulation bandwidths of a 5G mobile network. Because of this, among several I/Q DTX architectures, the DDRMs are rapidly gaining interest due to their natural compatibility with nanoscale CMOS technology, small chip area, and potential to offer high output power, excellent spectral purity, and frequency-agile operation.

Figure 5.1: Typical block diagram of (a) an analog modulator; (b) a DDRM.
The concept of the DDRM can be derived from the analog Cartesian RF modulators introduced in Section 2.2, the diagram of which is again presented in Fig. 5.1(a). The baseband DACs convert the digital signal into an analog representation, which is then up-converted using orthogonal LO phases (square waves) in a quadrature mixer configuration. The LPFs suppress the sampling spectral replicas generated by the DACs. Alternatively, enabled by advanced high-speed CMOS technologies, the digital signal can also be bitwise up-converted using logic gates that process the digital baseband data and LO phases in a digital fashion for sub-6 GHz applications. The latter approach eliminates the need for a standalone high-linearity mixer and LPF. Consequently, the baseband DACs and mixers in Fig. 5.1(a) can be merged into one single block, namely, a mixing-DAC or RFDAC. Combining two RFDACs with the related digital baseband signal processing constitutes the DDRM, shown in Fig. 5.1(b). Note that the DDRM has no explicit LPF, a component that can be bulky and lossy in analog modulators.

Figure 5.2(a) depicts the schematic of a conventional analog-intensive modulator ([7]), comprising a pair of RFDACs which implement the I and Q up-conversion. In a DDRM, the *unsigned* complementary I and Q baseband signals are first interpolated in the digital domain to adequately suppress close-in sampling spectral replicas and push them further away, thus acting as an LPF in the digital domain. Secondly, the two resulting high-speed bit-streams are converted into two separate I/Q RF signals by exploiting current-steering RFDACs. Eventually, the two up-converted signals are combined.

Figure 5.2: Conceptual diagram of the conventional DDRM in (a) [7] and (b) [9].
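The bitwise up-conversion described above can be illustrated with a small truth-table check. This Python sketch is only a logical model under assumed names: it treats one data bit and the LO as binary signals and confirms that their XOR is, up to output polarity, the same as multiplying the corresponding ±1 values, so toggling the data bit flips the RF phase by 180°.

```python
def to_bipolar(bit):
    """Map logic 0/1 to -1/+1."""
    return 2 * bit - 1


def xor_mix(data_bit, lo_bit):
    """Logical mixing of one baseband bit with the LO (hypothetical helper)."""
    return data_bit ^ lo_bit


# Two LO periods of a 50 % duty-cycle clock, four samples per period
lo_wave = [1, 1, 0, 0, 1, 1, 0, 0]

rf_for_zero = [xor_mix(0, lo) for lo in lo_wave]
rf_for_one = [xor_mix(1, lo) for lo in lo_wave]

print("data = 0:", rf_for_zero)
print("data = 1:", rf_for_one)

# XOR on bits corresponds to a sign-inverted product of bipolar values.
for data_bit in (0, 1):
    for lo_bit in (0, 1):
        product = to_bipolar(data_bit) * to_bipolar(lo_bit)
        assert to_bipolar(xor_mix(data_bit, lo_bit)) == -product
```

Whether a given output is labeled XOR or XNOR is therefore only a polarity convention of the differential pair; the essential point is that a logic gate performs the multiplication that an analog mixer would otherwise provide.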
For each RFDAC, the output can be expressed as the result of an XOR operation on the LO signal and the data:

$$\begin{cases} RF_{\text{out+}} = CLK_P \cdot BB^+ + CLK_N \cdot BB^- = CLK_P \oplus BB^+ \\ RF_{\text{out-}} = CLK_N \cdot BB^+ + CLK_P \cdot BB^- = \overline{CLK_P \oplus BB^+} \end{cases} \tag{5.1}$$

A CMOS XOR/XNOR gate can implement the XOR/XNOR operation in the voltage domain, and the schematic can then be simplified, as shown in Fig. 5.2(b) ([8] and [9]). By transferring the XOR operation to the voltage domain, the voltage headroom can also be improved. Doing so allows more flexibility in choosing the Nyquist frequency of the RFDAC, which can be beneficial in view of the widening operation frequency range ([8]).

#### 5.1.2 Comparison between DDRM and DPA-Based Cartesian DTXs

Although the principle of DDRMs is similar to that of the DPA-based Cartesian DTXs introduced in Section 2.2, their different implementation approaches affect both spectral purity and power efficiency. In this section, the DDRM architecture will be compared with DPA-based Cartesian DTXs in terms of these properties.

Figure 5.3: Simplified universal modulator model for a single-branch mixing DAC or DPA.

A fair comparison starts with a universal digital modulator model. This DDRM model can be adapted as discussed in Chapter 3 (Fig. 5.3). By doing so, the unit cell is represented as a baseband-controlled RF current source with the source conductances $G_{\rm ON}$ and $G_{\rm OFF}$ representing the "on" and "off" state admittance, respectively. Since the current source switches on/off every RF cycle, the impedance will vary within each RF cycle.
For an activated and a disabled (not activated) unit cell, this admittance can be expressed as:

$$\begin{cases} G_{\text{ACT}} = G_{\text{OFF}} + (G_{\text{ON}} - G_{\text{OFF}})u(t) \\ G_{\text{DIS}} = G_{\text{OFF}} \end{cases} \tag{5.2}$$

and when the cell is activated, its current is given by:

$$I_{\text{unit}} = I_0 u(t) \tag{5.3}$$

where $u(t)$ is a normalized rectangular LO waveform. Consequently, the effective admittance for a single RFDAC can be expressed as:

$$G_{\text{source}} = nG_{\text{ACT}} + (N - n)G_{\text{DIS}} = n(G_{\text{ON}} - G_{\text{OFF}})u(t) + NG_{\text{OFF}} \tag{5.4}$$

where $N$ is the total number of unit cells and $n$ is the control word. Then, based on Kirchhoff's laws, the current driven into the load $R_L$ is:

$$I_{L} = nI_{0} \frac{\frac{1}{R_{L}}}{n(G_{\text{ON}} - G_{\text{OFF}})u(t) + NG_{\text{OFF}} + \frac{1}{R_{L}}} u(t) \tag{5.5}$$

In DDRMs, due to the current-steering topology, $G_{\rm ON}$ and $G_{\rm OFF}$ are typically much smaller than $G_L$ ($= 1/R_L$). Thus, in the ideal case, to make the output $I_L$ linear with the control word $n$, the current split ratio should not change, which is the case if $G_{\rm ON}$ is equal to $G_{\rm OFF}$. However, in a practical situation, there is always some mismatch between $G_{\rm ON}$ and $G_{\rm OFF}$, yielding distortion. To improve linearity, the use of a bleeding current can bring $G_{\rm ON}$ close to $G_{\rm OFF}$ ([10]).

Figure 5.4: Simplified universal modulator model for a Cartesian-based DDRM or DPA.

In contrast to a DDRM architecture, as discussed in Chapter 3, the $G_{\rm ON}$ in a DPA design needs to be much larger than $G_{\rm OFF}$ to achieve good efficiency. Since $G_{\rm OFF}$ is much smaller than $1/R_L$ in this case, the output current in (5.5) can be simplified to:

$$I_L = nI_0 \frac{\frac{1}{R_L}}{nG_{\rm ON} + \frac{1}{R_L}} u(t) \tag{5.6}$$

resulting in a non-linear transfer function between the control word $n$ and the output current. Since this is similar to AM-AM distortion in conventional PAs, DPD is required.
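To make the difference between the two cases concrete, (5.5) can be evaluated during the LO on-phase ($u(t) = 1$) for a sweep of the control word. The Python sketch below uses assumed, purely illustrative conductance values and cell counts (they are not taken from this design) and hypothetical function names.

```python
def load_current(n, total_cells, g_on, g_off, r_load, i_unit=1.0):
    """Load current from (5.5) during the on-phase of the LO, u(t) = 1."""
    g_load = 1.0 / r_load
    g_source = n * (g_on - g_off) + total_cells * g_off
    return n * i_unit * g_load / (g_source + g_load)


N_CELLS = 512    # assumed number of unit cells
R_LOAD = 50.0    # load resistance in ohms

# DDRM-like case: small, matched on/off conductances (assumed values)
ddrm = [load_current(n, N_CELLS, 1e-6, 1e-6, R_LOAD)
        for n in range(1, N_CELLS + 1)]
# DPA-like case: G_ON much larger than G_OFF (assumed values)
dpa = [load_current(n, N_CELLS, 1e-4, 0.0, R_LOAD)
       for n in range(1, N_CELLS + 1)]


def gain_compression(currents):
    """Current per code at full scale relative to the first code."""
    return (currents[-1] / len(currents)) / currents[0]


print("DDRM full-scale gain relative to small signal: %.3f" % gain_compression(ddrm))
print("DPA  full-scale gain relative to small signal: %.3f" % gain_compression(dpa))
```

With matched conductances the source admittance in (5.4) is independent of $n$, so the transfer stays linear; with $G_{\rm ON} \gg G_{\rm OFF}$ the per-code gain drops as more cells are switched on, which is the compressive behavior expressed by (5.6).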
Note that if the source admittance in Fig. 5.3 also includes an imaginary component, ACW-PM distortion will be present in addition to the ACW-AM distortion.

However, the previous analysis only covers simple amplitude modulation. The situation becomes more complicated when taking the I/Q operation into account. Particularly in DPA-based Cartesian DTXs, where the RF current is the combined output of the I/Q DPA banks, the effect of I/Q interaction is no longer negligible. Here, I/Q interaction refers to the fact that the output current is no longer the sum of independent I and Q contributions but also depends on I and Q jointly, i.e.,

$$\frac{\partial^2}{\partial I\,\partial Q} OUT(I, Q) \neq 0 \tag{5.7}$$

I/Q interaction is different from conventional AM-AM/AM-PM distortion: the distortion from I/Q interaction is related not only to the amplitude but also to the actual location in the constellation diagram. To analyze I/Q interaction effectively, another set of current sources with a quadrature phase needs to be added to the model in Fig. 5.3. The amended model is shown in Fig. 5.4, where $I_{\text{unit-I}}$ and $I_{\text{unit-Q}}$ represent the quadrature RF current pulses, which have the same shape but are shifted by 90° in phase. This can be expressed as:

$$\begin{cases} I_{\text{unit-I}} = I_0 u(t) \\ I_{\text{unit-Q}} = I_0 u(t - T/4) \end{cases} \tag{5.8}$$

where $T$ is the period of the LO signal. With $n$ and $m$ denoting the control words of the I and Q banks, the output current is:

$$I_{L} = [nI_{0}u(t) + mI_{0}u(t - T/4)] \frac{\frac{1}{R_{L}}}{n(G_{\text{ON}} - G_{\text{OFF}})u(t) + m(G_{\text{ON}} - G_{\text{OFF}})u(t - T/4) + 2NG_{\text{OFF}} + \frac{1}{R_{L}}} \tag{5.9}$$

In a DDRM, as stated before, both $G_{\rm ON}$ and $G_{\rm OFF}$ are small compared to $1/R_L$. Consequently, in the ideal case, where $G_{\rm ON} = G_{\rm OFF} = 0$, the output current simplifies to:

$$I_L = nI_0 u(t) + mI_0 u(t - T/4) \tag{5.10}$$

showing that in an ideal DDRM, there is no I/Q interaction.
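Equation (5.9) lends itself to a quick numerical check of when the I and Q contributions remain independent. The Python sketch below is a behavioral illustration with assumed conductances, cell counts, and helper names, not the simulation of this work. It extracts the fundamental component of $I_L$ over one LO period (with $I_0 = 1$) and evaluates the discrete counterpart of the mixed derivative in (5.7).

```python
import cmath
import math

STEPS = 400      # time steps per LO period (multiple of 4)
N_CELLS = 64     # assumed unit cells per bank
R_LOAD = 50.0    # load resistance in ohms


def lo_pulse(step, duty):
    """Normalized rectangular LO waveform u(t), sampled on the step grid."""
    return 1.0 if (step % STEPS) < duty * STEPS else 0.0


def fundamental(n, m, g_on, g_off, duty):
    """Complex fundamental of the load current in (5.9), with I0 = 1."""
    g_load = 1.0 / R_LOAD
    acc = 0j
    for step in range(STEPS):
        u_i = lo_pulse(step, duty)
        u_q = lo_pulse(step - STEPS // 4, duty)   # u(t - T/4)
        g_total = ((n * u_i + m * u_q) * (g_on - g_off)
                   + 2 * N_CELLS * g_off + g_load)
        current = (n * u_i + m * u_q) * g_load / g_total
        acc += current * cmath.exp(-2j * math.pi * step / STEPS)
    return 2.0 * acc / STEPS


def iq_interaction(g_on, g_off, duty, n=32, m=32):
    """Mixed difference OUT(n,m) - OUT(n,0) - OUT(0,m) + OUT(0,0)."""
    def out(a, b):
        return fundamental(a, b, g_on, g_off, duty)
    return abs(out(n, m) - out(n, 0) - out(0, m) + out(0, 0))


cases = {
    "ideal DDRM, 50 % LO": iq_interaction(0.0, 0.0, 0.50),
    "DPA-like, 25 % LO": iq_interaction(1e-3, 0.0, 0.25),
    "DPA-like, 50 % LO": iq_interaction(1e-3, 0.0, 0.50),
}
for name, value in cases.items():
    print("%-20s mixed difference = %.3e" % (name, value))
```

A mixed difference of (numerically) zero means the output is a sum of separate I and Q terms; a non-zero value indicates that the contribution of one bank depends on the code applied to the other.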
However, in the DPA scenario, where only $G_{\text{OFF}}$ can be assumed to be close to zero, the output current is:

$$I_L = \left[nI_0 u(t) + mI_0 u(t - T/4)\right] \frac{\frac{1}{R_L}}{nG_{\text{ON}} u(t) + mG_{\text{ON}} u(t - T/4) + \frac{1}{R_L}} \tag{5.11}$$

where the term $\frac{\frac{1}{R_L}}{nG_{\rm ON}u(t)+mG_{\rm ON}u(t-T/4)+\frac{1}{R_L}}$ yields the I/Q interaction. If $u(t)$ and $u(t-T/4)$ are non-overlapping, e.g., when the LO is a 25 % duty-cycle clock, the output current can be simplified to:

$$I_{L} = nI_{0}u(t)\frac{\frac{1}{R_{L}}}{nG_{\text{ON}}u(t) + \frac{1}{R_{L}}} + mI_{0}u(t - T/4)\frac{\frac{1}{R_{L}}}{mG_{\text{ON}}u(t - T/4) + \frac{1}{R_{L}}} \tag{5.12}$$

and again, there will be no I/Q interaction. The behavioral simulation results shown in Fig. 5.5 confirm this statement. In DDRM scenarios, no I/Q interaction is visible in the constellation diagram regardless of the LO duty cycle. In contrast, in DPA-based Cartesian DTXs, I/Q interaction will exist when the LO clocks overlap.

While DDRMs do enjoy superior linearity over DPA-based DTXs, especially regarding I/Q interaction, in return, they suffer from low output power and poor power efficiency. Figure 5.6(a) shows the peak output power versus peak system efficiency of DDRMs and DPAs reported in the literature. It can be noted that although the output power of DDRMs is comparable to that of DPAs, the peak system efficiency of DDRMs is well below 20 %, in contrast to peak efficiencies of 25 % and higher in most DPA designs. What is worse, since DDRMs usually employ class-A-like current-steering topologies in favor of linearity, the overall DC current does not scale down with output power. Therefore, the drain efficiency of DDRMs at 6 dB PBO will reach only a quarter of their peak drain efficiency. In contrast, as DPAs usually feature class-B-like operation, their drain efficiency at 6 dB PBO is only halved with respect to their peak drain efficiency (Fig. 5.6(b)).
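The back-off behavior stated above follows from how the DC current scales with the output amplitude. As a hedged illustration, the Python sketch below uses the textbook idealizations (assumed here, not extracted from the data in Fig. 5.6): a class-A stage draws constant DC power, so its drain efficiency scales with output power, whereas an ideal class-B stage draws a DC current proportional to the output amplitude, so its efficiency scales with the square root of output power. The function names are hypothetical.

```python
def class_a_relative_efficiency(pbo_db):
    """Efficiency relative to peak for constant DC power (ideal class A)."""
    return 10.0 ** (-pbo_db / 10.0)


def class_b_relative_efficiency(pbo_db):
    """Efficiency relative to peak when DC current tracks amplitude (ideal class B)."""
    return 10.0 ** (-pbo_db / 20.0)


for pbo in (0.0, 3.0, 6.0, 12.0):
    print("PBO %4.1f dB: class A %.3f, class B %.3f"
          % (pbo,
             class_a_relative_efficiency(pbo),
             class_b_relative_efficiency(pbo)))
```

At 6 dB PBO this gives roughly one quarter and one half of the peak efficiency, in line with the statement above.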
Therefore, the efficiency difference between the DDRM and DPA in the PBO region will be even larger than in the peak power region. Consequently, effort should also be made to boost the efficiency of DDRMs in the PBO region. In this chapter, the proposed DDRM architecture still uses pure class-A-like conditions, but in Chapter 6, we will discuss how to design a class-B-like DDRM to boost the power efficiency in the PBO region.

In summary, compared to DPA-based Cartesian DTXs, DDRMs enjoy better linearity. However, their low efficiency and use of class-A operation limit their employment in the final stages.

Figure 5.5: Resulting constellation diagram of the behavioral simulation in a DDRM (a) with 25 % LO; (b) with 50 % LO and DPA (c) with 25 % LO; (d) with 50 % LO.

Figure 5.6: (a) Reported output power and efficiency for existing DDRMs and DPA-based TXs; (b) normalized drain efficiency versus PBO level for the DDRM and DPA.

Figure 5.7: (a) Typical spectrum of conventional DDRM architectures; (b) transient waveform and corresponding constellation diagram of one IQ pair typically used in conventional DDRMs.

#### 5.1.3 Design Challenges of DDRMs

Unfortunately, existing DDRMs still face obstacles in achieving high spectral purity and low in-band distortion. First, similar to the analog modulators in Fig. 5.1, the conventional DDRMs in Fig. 5.2 are also typically realized using two (separate) banks of RFDACs. Figure 5.7(a) shows the typical spectrum and the constellation diagram for one pair of unit cells. The mismatch between the two banks of I/Q RFDACs results in an unwanted IQ image component. This image component is critical to the in-band linearity, as it causes deviations from the ideal constellation points and degrades the EVM. In general, there are three main sources of I/Q mismatch: gain mismatch, phase mismatch, and delay mismatch. Gain mismatch mainly comes from amplitude mismatch among the current sources in the unit cells.
The transfer function of the RFDAC can be written as:

$$y(x) = a_1 x + a_3 x^3 + a_5 x^5 + \cdots \tag{5.13}$$

neglecting any quantization error. Because the RFDAC is typically differential in nature, all even-order terms are effectively eliminated and therefore do not appear in (5.13). Assuming that the mismatch between the I and Q RFDACs is uncorrelated, we can write:

$$\begin{cases} I(x_I) = a_1 x_I + a_3 x_I^3 + a_5 x_I^5 + \cdots \\ Q(x_Q) = b_1 x_Q + b_3 x_Q^3 + b_5 x_Q^5 + \cdots \end{cases} \tag{5.14}$$

Furthermore, for a single-sideband one-tone test, the input for I and Q will be:

$$\begin{cases} x_I = \cos(\omega_{LO}t)\cos(\omega_{BB}t) \\ x_Q = \sin(\omega_{LO}t)\sin(\omega_{BB}t) \end{cases} \tag{5.15}$$

where $\omega_{LO}$ is the carrier angular frequency and $\omega_{BB}$ is the baseband angular frequency of the single-tone signal. After the substitution of (5.15) in (5.14), the output of the DDRM is given by:

$$\begin{aligned} OUT(t) ={}& 0.5(a_1 + b_1)\cos((\omega_{LO} + \omega_{BB})t) + 0.5(a_1 - b_1)\cos((\omega_{LO} - \omega_{BB})t) \\ &+ 0.125a_3\left[\cos((\omega_{LO} + \omega_{BB})t) + \cos((\omega_{LO} - \omega_{BB})t)\right]^3 \\ &+ 0.125b_3\left[\cos((\omega_{LO} + \omega_{BB})t) - \cos((\omega_{LO} - \omega_{BB})t)\right]^3 + \cdots \end{aligned} \tag{5.16}$$

Since $a_n$ and $b_n$ are uncorrelated and the contribution from the higher-order terms can be neglected, the IQ image rejection due to gain mismatch is:

$$image_{\text{gain}} = -20\log\left|\frac{a_1 - b_1}{a_1 + b_1}\right| \tag{5.17}$$

The impact of the higher-order terms will be discussed in the next section. A similar analysis can be provided for the impact of the phase mismatch. Both phase errors in the LO generation and delay mismatch in the LO distribution network contribute to this phase mismatch. The phase error from LO generation will be discussed first.
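
Before turning to the phase terms, the first-order gain-mismatch result of (5.17) can be checked numerically. The following is a minimal sketch, not the behavioral model used in this work: it builds the single-sideband one-tone inputs of (5.15), applies only the linear coefficients $a_1$ and $b_1$, and compares the two sidebands in an FFT. The function names, the tone frequencies, and the record length are assumptions made for this illustration. How the two branches are combined only decides which sideband is the wanted one, so the ratio is taken between the stronger and the weaker sideband.

```python
import numpy as np


def image_rejection_gain_db(a1: float, b1: float) -> float:
    """Closed-form image rejection in dB for first-order gain mismatch, as in (5.17)."""
    return -20.0 * np.log10(abs((a1 - b1) / (a1 + b1)))


def simulated_image_rejection_db(a1: float, b1: float,
                                 f_lo: int = 64, f_bb: int = 4,
                                 n: int = 1024) -> float:
    """Measure the sideband ratio of a1*x_I + b1*x_Q with an FFT.

    f_lo and f_bb are integer numbers of cycles per record (illustrative values),
    so both sidebands fall exactly on FFT bins.
    """
    t = np.arange(n) / n
    x_i = np.cos(2 * np.pi * f_lo * t) * np.cos(2 * np.pi * f_bb * t)
    x_q = np.sin(2 * np.pi * f_lo * t) * np.sin(2 * np.pi * f_bb * t)
    spectrum = np.abs(np.fft.rfft(a1 * x_i + b1 * x_q))
    lower, upper = spectrum[f_lo - f_bb], spectrum[f_lo + f_bb]
    return 20.0 * np.log10(max(lower, upper) / min(lower, upper))


if __name__ == "__main__":
    a1, b1 = 1.0, 0.99  # 1 % gain mismatch, chosen only as an example
    print(f"formula (5.17): {image_rejection_gain_db(a1, b1):.2f} dB")
    print(f"FFT estimate  : {simulated_image_rejection_db(a1, b1):.2f} dB")
```
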
Assume that the phase deviation for $LO_I$ and $LO_Q$ is $\theta_I$ and $\theta_Q$, respectively; (5.15) will change to:

$$\begin{cases} x_I = \cos(\omega_{LO}t + \theta_I)\cos(\omega_{BB} t) \\ x_Q = \sin(\omega_{LO}t + \theta_Q)\sin(\omega_{BB} t) \end{cases} \tag{5.18}$$

Consequently, even without any gain mismatch, an I/Q image will still arise due to this phase mismatch. When $\theta_I$ and $\theta_Q$ are small, the image component equals:

$$image_{\text{phase}} = -10\log \left| \frac{1 - \cos(\theta_I - \theta_Q)}{1 + \cos(\theta_I - \theta_Q)} \right| \tag{5.19}$$

An analysis of the delay yields similar results. For each unit cell, the LO phase can differ slightly due to delay mismatch in the LO generation and distribution. As a result, the coefficients of (5.14) could be complex, representing both phase mismatch and gain mismatch. This gain and phase mismatch in real circuits is not easy to correct since the complex coefficients vary with both PVT variations and operation frequency.

Figure 5.8 shows the distribution of IQ-image rejection in a conventional DDRM versus gain mismatch and phase mismatch from 200 Monte-Carlo (MC) simulations. This particular DDRM comprises a pair of 12-bit RFDACs; each RFDAC features a 6-bit thermometer and 6-bit binary segmentation. In Fig. 5.8(a), the gain mismatch is represented by introducing random mismatch with a standard deviation of $\sigma$ to each cell amplitude. The phase mismatch in Fig. 5.8(b) is incorporated by adding random phase mismatch to the LO phases. With a standard deviation of 50 % LSB, the gain mismatch will usually lead to an IQ image rejection of -40 to -50 dBc, and with a standard deviation of $0.5^{\circ}$, the resulting IQ image is -40 to -55 dBc.

Figure 5.8: MC simulation results of (a) gain mismatch; (b) phase mismatch.

To suppress the image and improve in-band linearity, IQ calibration techniques are usually deployed. This calibration block is implemented in conventional analog modulators by a feedback
loop with an on-chip DSP processor ([22]). However, such calibration usually only calibrates gain mismatch to the first order, i.e., $a_1$ and $b_1$, which will not suppress the image completely since the mismatch in higher-order distortion will still contribute to the IQ image. What is worse, such systems are complicated and power-hungry.

Therefore, calibration-free solutions are becoming increasingly popular. An IQ-sharing topology, in which the I and Q banks share the same unit cell, is proposed in [23]. However, this technique requires 25 % quadrature clocks, which, in turn, generate less RF power. Moreover, when driven with 50 % quadrature clocks, ternary states appear, yielding up-converted current pulses with a 75 % duty cycle. This also contributes to IQ interaction that degrades the in-band linearity and out-of-band spectral purity, especially when operating with wideband signals. In [12], the IQ-interleaving DDRM concept is introduced, which improves odd-order distortion and IQ image rejection. It still uses two separate current-mode XOR/XNOR complementary IQ banks with separate current sources to generate its differential RF output signal (Fig. 5.9). Any mismatch among these current sources contributes to even-order distortion. Furthermore, the bitwise XOR operation of the I/Q vectors produces binary output current pulses with a 75 % duty cycle for the tail current sources. This operation exacerbates the impact of the finite settling time of the unit cells, limiting the achievable spectral purity for wideband signals.

Figure 5.9: Conceptual diagram of the IQ-interleaving DDRM in [12]

# 5.2 IQ-Mapping Technique

To improve the performance of DDRMs in terms of video bandwidth, RF output power, and most importantly, spectral purity, this section presents a novel IQ-mapping technique. It enables an IQ-mixing DAC whose unit cells can up-convert binary current pulses with a 50 % duty cycle at the highest current levels.
The concept of this IQ-mapping technique is shown in Fig. 5.10. To achieve these desired properties, the original square-shaped four-point constellation (diagonal points in Fig. 5.7) of prior-art DDRMs is mapped into a diamond-shaped counterpart (orthogonal points). To explain the mapping process mathematically, we start with the single-unit IQ pair in conventional DDRMs presented in Fig. 5.2(a). The Fourier transform of the four phases of the LO signal shown in Fig. 5.7 is:

$$\begin{cases} \mathscr{F}[LO_0] &= Sa(\omega) \times e^{j0} \\ \mathscr{F}[LO_{90}] &= Sa(\omega) \times e^{j\pi/2} \\ \mathscr{F}[LO_{180}] &= Sa(\omega) \times e^{j2\pi/2} \\ \mathscr{F}[LO_{270}] &= Sa(\omega) \times e^{j3\pi/2} \end{cases} \tag{5.20}$$

where $Sa(\omega)$ is the Fourier transform of the LO pulse, i.e., a 50 % duty-cycle pulse. Then, the Fourier transform of the four IQ combinations (a, b, c, d in Fig. 5.10(a)) for each single-unit IQ pair can be written as:

$$\begin{cases} \mathscr{F}[OUT_a] &= Sa(\omega) \times e^{j0} + Sa(\omega) \times e^{j\pi/2} \\ \mathscr{F}[OUT_b] &= Sa(\omega) \times e^{j\pi/2} + Sa(\omega) \times e^{j2\pi/2} \\ \mathscr{F}[OUT_c] &= Sa(\omega) \times e^{j2\pi/2} + Sa(\omega) \times e^{j3\pi/2} \\ \mathscr{F}[OUT_d] &= Sa(\omega) \times e^{j3\pi/2} + Sa(\omega) \times e^{j0} \end{cases} \tag{5.21}$$

which, for the fundamental component, equals:

$$\begin{cases} \mathscr{F}[OUT_a](\omega_{\text{LO}}) &= 4/\pi(\sqrt{2} \times e^{j\pi/4}) \times e^{j0} \\ \mathscr{F}[OUT_b](\omega_{\text{LO}}) &= 4/\pi(\sqrt{2} \times e^{j\pi/4}) \times e^{j\pi/2} \\ \mathscr{F}[OUT_c](\omega_{\text{LO}}) &= 4/\pi(\sqrt{2} \times e^{j\pi/4}) \times e^{j\pi} \\ \mathscr{F}[OUT_d](\omega_{\text{LO}}) &= 4/\pi(\sqrt{2} \times e^{j\pi/4}) \times e^{j3\pi/2} \end{cases} \tag{5.22}$$

If the waveform of (5.22) is replaced by (5.20), the output waveform is the same as that of the LO, i.e., a 50 % duty-cycle square wave.
By doing so, the original square-shaped constellation is mapped onto a diamond-shaped constellation without corrupting the orthogonality of the IQ signal. The post-mapping constellation diagram is shown in Fig. 5.10(a). This operation can be viewed as a mapping of the original I/Q vectors into two new orthogonal vectors:

$$\begin{cases} (I, Q) = (1, 1) \Rightarrow (I', Q') = (1, 0) \\ (I, Q) = (-1, 1) \Rightarrow (I', Q') = (0, 1) \\ (I, Q) = (-1, -1) \Rightarrow (I', Q') = (-1, 0) \\ (I, Q) = (1, -1) \Rightarrow (I', Q') = (0, -1) \end{cases} \tag{5.23}$$

where (I', Q') is the vector after the mapping operation. Note that the original vector (I, Q) needs two unit current sources, while its new phase-mapped vector (I', Q') needs only one unit current source for its representation.

This IQ-mapping technique can be extended from a single unit cell to the whole DDRM array. Mathematically, the operation in (5.23) is the mapping of the traditional I and Q vectors into two new orthogonal vectors:

$$\begin{aligned} \frac{1}{\sqrt{2}}\mathrm{Re}\left[(I+jQ)e^{j\omega_{LO}t}e^{-j\pi/4}\right] &= \frac{1}{2}\mathrm{Re}\left[(I+jQ)(1-j)e^{j\omega_{LO}t}\right] \\ &= 0.5 \cdot \mathrm{Re}\left[((I+Q)+j(Q-I))e^{j\omega_{LO}t}\right] \end{aligned} \tag{5.24}$$

The scaling factor of $\frac{1}{\sqrt{2}}$ stems from the fact that the vector norm after mapping is scaled by $\frac{1}{\sqrt{2}}$ relative to its original vector norm. Consequently, I' and Q' can be expressed as:

$$\begin{cases} I' = 0.5 \cdot (I+Q) \\ Q' = 0.5 \cdot (Q-I) \end{cases} \tag{5.25}$$

Equation (5.25) indicates that the proposed DDRM needs only half the number of unit current sources required in a conventional DDRM implementation.

Although [23] also introduces a mapping technique for a signed-IQ digital PA that results in a diamond-shaped IQ profile, the implementation in [23] can still suffer from distortion due to the inherent risk of IQ clipping, which requires 2D DPD. Moreover, the mapping is realized in the digital domain by pre-processing the IQ data prior to the mixing-DAC stage.
In contrast, the proposed mapping technique is a linear operation that can be directly implemented in the unit cell of the proposed DDRM, resulting in less signal-processing overhead and faster data throughput, which enables its use with very high signal bandwidths.

#### 5.2.1 Improved Output Power and Efficiency

Compared to prior-art DDRMs, for a given drain current budget, the proposed IQ-mapping technique enhances the peak output power and efficiency twofold. In other words, if 2N unit cells function as I or Q cells (N each) in a conventional DDRM, the maximum fundamental output current, reached at the outer corners of the IQ constellation diagram, follows from a vector summation:

$$I_{\text{maxIQ}} = I_{\text{unit}}|N + jN| = I_{\text{unit}} \cdot \sqrt{2} \cdot N \tag{5.26}$$

Figure 5.10: (a) Conceptual diagram of the proposed IQ-mapping DDRMs and (b) its spectrum.

In contrast, with the proposed IQ-mapping technique (Fig. 5.10), at the outer corners of the constellation diagram, all 2N cells can be directed to the same output phase. As such, the maximum current budget for these points is:

$$I_{\text{maxI'Q'}} = I_{\text{unit}} \cdot 2 \cdot N \tag{5.27}$$

which is based on a scalar summation. Therefore, for the same DC current budget, the proposed IQ-mapping DDRM topology provides a peak RF output power that is 3 dB higher than that of a conventional DDRM. Thus, the drain efficiency is effectively doubled. Mathematically, based on (5.26) and (5.27), this can also be explained by:

$$||(I_{\text{maxI'}}, I_{\text{maxQ'}})||^2 = 2||(I_{\text{maxI}}, I_{\text{maxQ}})||^2 \tag{5.28}$$

#### 5.2.2 Intrinsic Image Rejection

In the proposed IQ-mapping technique, the I' and Q' branches employ a single current source and are therefore identical in their transfer function. Consequently, the TX image is inherently canceled. Figure 5.11 graphically explains the impact of I/Q mismatch inside the unit cells.
In conventional DDRMs, a difference in the I and Q current sources (pink arrow in Fig. 5.11, left) yields asymmetry/mismatch in the constellation points over the diagonal axis. As a result, the vector will fail to align accurately with the diagonal. In contrast, the proposed mapping operation requires only one current source. Therefore, the resulting mapped (I', Q') constellation (green arrows) is symmetric. Consequently, although the amplitude of the unit-cell vectors can differ, there is no mismatch inside one single cell.

Figure 5.11: Principle of intrinsic image rejection within unit cells.

Figure 5.12: Simulated spectra in behavior MC simulation with single-sideband signals: (a) conventional DDRMs; (b) proposed IQ-mapping DDRMs. Simulated spectra with the whole multi-carrier signal and one signal channel: (c) conventional DDRMs; (d) proposed IQ-mapping DDRMs.

Therefore, in the IQ-mapping DDRM, the absence of I/Q mismatch ensures that the corresponding polynomial coefficients in (5.14) become equal:
$$a_1 = b_1,\quad a_3 = b_3,\quad a_5 = b_5,\quad \cdots \tag{5.29}$$

With the single-sideband one-tone signal of (5.15), the $(2n-1)$-th order term of the output of the proposed DDRM is:

$$\begin{aligned} &(\cos(\omega_{\text{LO}}t)\cos(\omega_{\text{BB}}t))^{2n-1} + (\sin(\omega_{\text{LO}}t)\sin(\omega_{\text{BB}}t))^{2n-1} \\ &= 0.5^{2n-1}[\cos(\omega_{\text{LO}}t - \omega_{\text{BB}}t) + \cos(\omega_{\text{LO}}t + \omega_{\text{BB}}t)]^{2n-1} \\ &\quad + 0.5^{2n-1}[\cos(\omega_{\text{LO}}t - \omega_{\text{BB}}t) - \cos(\omega_{\text{LO}}t + \omega_{\text{BB}}t)]^{2n-1} \\ &= 0.5^{2n-2} \sum_{k=1}^{n} C_{2n-1}^{2k-1}\cos^{2k-1}(\omega_{\text{LO}}t - \omega_{\text{BB}}t) \cdot \cos^{2n-2k}(\omega_{\text{LO}}t + \omega_{\text{BB}}t) \\ &= 0.5^{2n-2} \sum_{k=1}^{n} C_{2n-1}^{2k-1} \left(\frac{\cos(2\omega_{\text{LO}}t - 2\omega_{\text{BB}}t) + 1}{2}\right)^{k-1} \cdot \cos(\omega_{\text{LO}}t - \omega_{\text{BB}}t) \cdot \left(\frac{\cos(2\omega_{\text{LO}}t + 2\omega_{\text{BB}}t) + 1}{2}\right)^{n-k} \end{aligned} \tag{5.30}$$

which, in the frequency domain, yields:

$$(2k-1)\cdot(\omega_{\rm LO}-\omega_{\rm BB})\pm 2\cdot(n-k)\cdot(\omega_{\rm LO}+\omega_{\rm BB}) \tag{5.31}$$

As can be observed, there is no component at the frequency $\omega_{\rm LO} + \omega_{\rm BB}$, meaning that the IQ image is canceled. Simulations also confirm this cancellation. In the simulation, for conventional DDRMs, two vectors with random deviation are generated for the I and Q branch, respectively, whereas in IQ-mapping DDRMs, only one vector is shared by I and Q. Figure 5.12(a) shows a typical spectrum for conventional DDRMs, while Fig. 5.12(b) shows the corresponding spectrum for an IQ-mapping DDRM, which confirms that the IQ image is canceled. Figure 5.12(c) and (d) also verify this cancellation. Note that in Fig. 5.12(c) and (d), DC calibration is conducted to suppress the LO leakage. As demonstrated in (c), without the mapping technique, the image of a single-channel TX signal can infiltrate another channel and, as such, corrupt its SNR. Note that the simulations in Fig.
5.12 do not consider any phase mismatch. Since the gain mismatch is canceled in IQ-mapping DDRMs, the phase mismatch is the dominant source of the IQ image. Thus, it requires special attention during the layout phase. In IQ-mapping DDRMs, since each cell requires four LO phases, the routing of the four phases is similar, which limits mismatch in the layout.

# 5.3 System Architecture

Figure 5.13 depicts the overall architecture. The proposed DDRM features digital-intensive quadrature up-conversion, with all signal processing realized in the digital domain. The TX data is fed to SRAMs using an SPI interface. Four on-chip SRAMs are time-interleaved to support a large modulation bandwidth, allowing the bit-stream throughput to equal half of the center frequency, $f_{\rm LO}$. The external LO signal is divided on-chip to generate the required four IQ clock phases, each with a 50 % duty cycle.

Figure 5.13: Systematic block diagram of the proposed DDRM with the IQ-mapping unit cell.

The core circuitry of the proposed DDRM is an IQ RFDAC with 12-bit resolution. To guarantee monotonic operation and prevent mid-code glitches, thermometer decoding is favored over binary coding. However, a pure thermometer code increases the complexity of the encoders, the chip area, interconnect parasitics, and power consumption. Thus, a segmented approach is adopted. The 12 bits are split into two parts: 6 bits are MSB units with unary cells and 6 bits are LSB units with binary cells. Therefore, the RFDAC implementation requires 64 MSB and 6 LSB units, resulting in several design iterations between the schematic and layout. The implementation details will be shown in the following sections.

As shown in Fig. 5.13, preceding the proposed unsigned IQ-mapping DDRM, there are thermometer encoders and dynamic element matching (DEM) circuitry. DEM is employed to randomize mismatch to improve the linearity for signals with smaller bandwidths (BW<10 MHz).
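
The 6-bit/6-bit segmentation described above can be sketched as follows. This is only an illustration of the coding principle: the function names are hypothetical, DEM is not modeled, and the sketch uses the textbook $2^6 - 1 = 63$ thermometer lines, whereas the text above counts 64 MSB units, so the cell count of the actual chip may differ from this model.

```python
from typing import List, Tuple


def segment_code(code: int, msb_bits: int = 6,
                 lsb_bits: int = 6) -> Tuple[List[int], List[int]]:
    """Split an unsigned code into a thermometer (unary) part and a binary part."""
    total_bits = msb_bits + lsb_bits
    if not 0 <= code < (1 << total_bits):
        raise ValueError("code out of range")
    msb_value = code >> lsb_bits                # drives the unary cells
    lsb_value = code & ((1 << lsb_bits) - 1)    # drives the binary cells
    n_unary = (1 << msb_bits) - 1               # assumption: 63 thermometer lines
    thermometer = [1 if k < msb_value else 0 for k in range(n_unary)]
    binary = [(lsb_value >> b) & 1 for b in range(lsb_bits)]  # LSB first
    return thermometer, binary


def reconstruct(thermometer: List[int], binary: List[int],
                lsb_bits: int = 6) -> int:
    """Ideal DAC output in LSBs: each unary cell weighs 2**lsb_bits LSBs."""
    unary_part = sum(thermometer) * (1 << lsb_bits)
    binary_part = sum(bit << b for b, bit in enumerate(binary))
    return unary_part + binary_part


if __name__ == "__main__":
    thermometer, binary = segment_code(200)
    print(sum(thermometer), "unary cells on, binary bits (LSB first):", binary)
    print("reconstructed code:", reconstruct(thermometer, binary))
```

With matched cells, every code reconstructs exactly. In a real array, the unary part changes only one cell per MSB step, which is why it is favored for monotonicity.
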
The output signal of the DDRM is fed to an off-chip matching network. It is worth mentioning that a somewhat similar cell topology is reported in [25] for a neural recording system using offset quadrature phase-shift keying (O-QPSK) modulation with a constant amplitude. In the more generic, high-resolution IQ mixing DAC that we propose, however, CMOS AND gates are utilized before the mixing-DAC unit cell to improve the voltage headroom and, thus, to boost the in-band linearity and out-of-band spectral purity.

# 5.4 Implementation of DDRM

This section will mainly focus on the implementation of the DDRM. A schematic of a unit cell is shown in Fig. 5.14. A proper schematic and layout are essential to achieve superior spectral purity. The LO distribution network, data decoder, and output network are also covered.

Figure 5.14: Detailed unit-cell schematic of the proposed DDRM.

#### 5.4.1 Schematic of DDRM

The operating principle of the proposed mixing and output stage implementation inside each unit cell can be understood by considering the mapping of the original IQ data by simple AND gates. Here, through the bitwise ANDing of $(DATA\_I, DATA\_Q)$, $(\overline{DATA\_I}, DATA\_Q)$, $(\overline{DATA\_I}, \overline{DATA\_Q})$, and $(DATA\_I, \overline{DATA\_Q})$, and the subsequent bitwise multiplication using current-mode XOR/XNOR logic with 50 % quadrature LO clocks whose phases are $0^{\circ}$, $90^{\circ}$, $180^{\circ}$, and $270^{\circ}$, a four-point diamond-shaped constellation diagram is created. In this way, the proposed circuit inherently creates the desired IQ mapping (i.e., a $45^{\circ}$ rotation with respect to the original constellation points) without using any additional clock phases. Using 50 % LO clocks, the resulting duty cycle of the up-converted IQ current pulses is also precisely 50 % at the highest current level.
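
The AND-gate selection described above can be modeled behaviorally as a one-hot choice of a single LO phase per unit cell. The sketch below rests on several assumptions that are not taken from the schematic: a logic 1 is taken to represent +1 and a logic 0 to represent -1, the pairing of each AND output with an LO phase is inferred from (5.23), and all function names are hypothetical. It checks that the selected fundamental phasor coincides with the mapped vector of (5.25).

```python
import cmath
import math
from typing import Tuple

LO_PHASES_DEG = (0, 90, 180, 270)


def select_phase(data_i: bool, data_q: bool) -> Tuple[bool, bool, bool, bool]:
    """One-hot AND-gate outputs enabling LO_0, LO_90, LO_180, LO_270 (assumed pairing)."""
    return (
        data_i and data_q,                  # (+1, +1) -> LO_0
        (not data_i) and data_q,            # (-1, +1) -> LO_90
        (not data_i) and (not data_q),      # (-1, -1) -> LO_180
        data_i and (not data_q),            # (+1, -1) -> LO_270
    )


def unit_cell_phasor(data_i: bool, data_q: bool) -> complex:
    """Normalized fundamental phasor of the single current pulse of one unit cell."""
    enables = select_phase(data_i, data_q)
    return sum(cmath.exp(1j * math.radians(phase))
               for phase, enabled in zip(LO_PHASES_DEG, enables) if enabled)


def mapped_vector(i: int, q: int) -> Tuple[float, float]:
    """(I', Q') according to (5.25)."""
    return 0.5 * (i + q), 0.5 * (q - i)


if __name__ == "__main__":
    for data_i in (True, False):
        for data_q in (True, False):
            i, q = (1 if data_i else -1), (1 if data_q else -1)
            phasor = unit_cell_phasor(data_i, data_q)
            print(f"(I, Q) = ({i:+d}, {q:+d}) -> phasor "
                  f"{phasor.real:+.0f}{phasor.imag:+.0f}j, "
                  f"(I', Q') = {mapped_vector(i, q)}")
```

Because exactly one AND output is active for every data combination, a single current source is steered to one LO phase at a time in this model.
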
Even more importantly, in the proposed structure, a single current source is used for generating both the I' and Q' signals, thus minimizing any mismatch problems.

In each unit cell, thick-oxide cascode transistors are employed at the top of each branch, which boost the output impedance of the unit cell and enable high output-voltage swings, thus improving linearity and output power. To minimize the loss and distortion, the cascode transistor should be minimally sized. In this process, the minimum gate length of the thick-oxide transistor is set to 250 nm to guarantee reliability. However, this still brings in extra capacitance and, more importantly, some capacitance variation when changing between the on and off states. The latter aspect will introduce code-dependent distortion. Some studies in the literature propose the use of bleeding current sources to decrease this variation ([10] and [24]). However, in this design, since the output swing is high (>1 V), the impact of the non-linear capacitance between the gate and drain ($C_{\rm gd}$) caused by the output voltage swing is larger than the variation caused by on/off switching. Therefore, no bleeding current source is employed, which also contributes to higher system efficiency.

The mixing transistor operates at $f_{LO}$, and hence its parasitic capacitance is critical to the overall linearity performance. Ideally, the mixer should be implemented using thin-oxide devices with a minimum gate length. However, with this approach, it would be a challenge to fit four stacked transistors within a voltage headroom of 1.1 V without compromising the reliability in this process. What is worse, since a large amount of current is required to generate sufficient output power, the W/L ratio should be large enough to fit in the voltage headroom. Doing so will lead to a large parasitic capacitance and a lowering of the operation frequency.

Figure 5.15: Two possible implementations of phase multiplexer: (a) hybrid mode; (b) voltage mode.
One solution is to bias the bulk of the current source and its cascode transistors with a negative voltage at the expense of complicating the bias circuitry. In modern DAC designs, this issue can be alleviated by reducing the number of stacked transistors [8]. The solution chosen for those DACs is to use a CMOS multiplexer instead of a hybrid mode one, shown in Fig. 5.15(b). The main reason for not using the former in this design is that the delay mismatch for the four phases in the CMOS logic gates is expected to be significantly higher than it is for the hybrid mode multiplexer implementation due to PVT variations, which would lead to large distortion. Another option is to omit cascoding in the current source or the output, which would result in a decrease in the linearity performance. Using thick-oxide transistors as mixing transistors would also be a solution, but this would result in a lower switching frequency or output power. In this work, thanks to the availability of triple-well technology, transistors can be put into a deep N-well (DNW), allowing the operating voltages of the mixer to be shifted. As a result, the bulk voltage of the mixing transistor is elevated from 0 V to 0.6 V. Delicate effort must be taken to guarantee that the voltage of all terminals is always higher than 0.6 V. After shifting the voltage of the mixing transistor to 0.6-1.7 V, the relative voltage between the mixing transistor terminals is still not higher than 1.1 V. As such, reliability is not compromised. The total effective voltage headroom for the four core transistors, in this case, is increased to 1.7 V, offering sufficient voltage headroom. The implementation of the data switches is similar to that in the mixing transistors, using an elevated bulk. Although the baseband bandwidth is much smaller than $f_{LO}$ , to suppress the sampling spectral replica adequately, the baseband switch should be able to run at a frequency of $f_{LO}/2$ . 
Hence the minimum gate length is also employed here. To shift the data signals to the voltage domain of 0.6-1.7 V, a current-mode logic (CML) buffer with thick-oxide transistors is used, which will be discussed below.

The cascode current sources in the RFDAC are optimized to achieve a high voltage headroom and minimize the parasitic capacitances, especially in the CS nodes. These transistors consist of a number of units to offer good scalability for the binary current cells, surrounded by dummy cells. In each MSB cell, the current source consists of a large transistor ($800~\mu\text{m}/2.5~\mu\text{m}$) to meet the 12-bit resolution requirement, and for each cell the current capability is more than 3 mA. Moreover, to relax the matching requirement and also shrink the active area to some extent, re-configurable transistors for initial calibration have been added, which can be controlled by the SPI.

#### 5.4.2 Binary Cells

To minimize the distortion, the binary cells should be perfectly scaled versions of the unary cells. Different strategies are applied to achieve optimal scaling. For the current sources, in unary cells, 320 separate transistors are used. In this way, the binary cells can be realized by disconnecting a number of the transistor unit elements. Each binary cell has only half of the unit elements enabled compared to the previous binary cell. Therefore, in the last LSB cell, there are only five transistors connected. The disabled elements are connected as dummies. In the other parts, i.e., for the cascode transistors, mixing transistors, and data switches, a multi-finger pattern is employed. The number of fingers enabled is also reduced to scale down the parasitics. However, both unary and binary cells should respond identically to the rising/falling edges of the driving signals. Following the proposed approach, the data driver and the clock distribution tree need not be scaled down in the binary cells, since they experience the same input impedance for each cell.
Therefore, the remaining unused dummy unit transistors should also be connected to the driving data and clock signals.

#### 5.4.3 Floorplan of RFDAC

Together with the schematic design, a proper layout of the RFDAC itself is essential to achieve a high spectral purity. The floorplan of the complete RFDAC is shown in Fig. 5.16. As discussed previously, the $DCLK_{\rm HIGH}$ can be as high as half of $f_{\rm LO}$, and the $DCLK_{\rm LOW}$ will be a quarter of $DCLK_{\rm HIGH}$. The critical signals in terms of timing in Fig. 5.16 are OUT, CLK, and $DCLK_{\rm HIGH}$. Hence the layout for these signal lines needs to be designed carefully. Since the data will be retimed by $DCLK_{\rm HIGH}$ before the data switch, the timing constraints for the digital circuits related to $DCLK_{\rm LOW}$ are not severe.

Figure 5.16: Floorplan of the proposed DDRM.

#### 5.4.4 LO Distribution Network

As stated above, the timing errors in the LO network should be minimized to achieve high spectral purity. Imbalance in the distribution of the LO signal will result in delay timing errors. Hence, a balanced tree structure is implemented, as shown in Fig. 5.17. In conventional baseband DACs (e.g., [10]), using one universal clock buffer to drive the entire passive LO network is a popular approach due to the low delay mismatch. Such a buffer is always implemented as a CML buffer, which requires high power consumption and introduces slow rise/fall times. In this work, local CMOS inverters are placed to reduce the rise/fall times and the power consumption. The principle of the clock tree is similar to the tree design of a baseband DAC. However, to trade off the area and power consumption, the root driver drives only three branches, including two unary branches and one binary branch. The secondary root drives two nodes, and each root drives four leaves (Fig. 5.17).

Figure 5.17: Top-level layout of LO tree.
To minimize the delay errors in the LO distribution tree, the load of all unary and binary cells is made equal. For the binary cells, this is achieved by utilizing the dummies. The size of these inverters is chosen to be large to drive the long LO lines with sufficiently short rise and fall times while, at the same time, decreasing the delay spread. In the MC simulations, the standard deviation of the delay mismatch for the four phases in the whole LO distribution network is found to be smaller than 500 fs with an overall power consumption of less than 30 mW.

#### 5.4.5 Output Tree Layout

A similar approach is used to design the output current combining network. A top-level skeleton of this design is shown in Fig. 5.18. There are two differences between the LO network and the output tree. The first one is that there is no buffer in this network, and there are only two hierarchy levels, to decrease the loss. The second is the trace width, which is much larger due to the high output current (>200 mA). A "twisted" layout pattern is implemented in the UTM layers to handle such a large output current and avoid excessive offset between the differential outputs. The width of the traces that connect the output cells with the remaining part of the tree is tuned to equalize the RC time constants and minimize the delay mismatch. The standard deviation of the delay errors in the unary part is 25 fs, while the insertion loss is about 0.6 dB.

Figure 5.18: Layout of output current combiner.

#### 5.4.6 Local Data Decoder

A block diagram of the local data decoder is shown in Fig. 5.19. The four decoded data streams with a sampling frequency of $DCLK_{LOW}$ are first merged into one stream with a sampling frequency of $DCLK_{HIGH}$ in a 4-to-1 multiplexer. The select signal is generated together with $DCLK_{LOW}$ and $DCLK_{HIGH}$. The timing sequence is critical to achieve the correct merged data sequence and to prevent setup and hold errors.
A CML buffer shifts the merged data streams with a differential pair of thick-oxide transistors. The bias current and resistor are tuned together to obtain an output voltage swing of 0.6-1.7 V. The voltage-shifted signal is regulated by a CMOS aligner and finally retimed by $DCLK_{HIGH}$. The aligner and DFFs are placed in a DNW, the bulk of which is biased at 0.6 V, allowing an output swing of 0.6-1.7 V.

Figure 5.19: Block diagram of local data decoder.

# 5.5 Implementation of LO Clock Generation

This section is focused on the implementation of the LO clock generation circuits. As discussed previously, the phase error in the LO clock generation circuitry should be sufficiently low to achieve superior IQ-image performance. Therefore, to minimize the phase error, a quadrature divider is employed to generate the quadrature phases accurately. An off-chip single-ended sinusoidal clock at a frequency of $2 \times f_{\rm LO}$ is fed to the chip, and an on-chip transformer converts it into a differential signal representation. The transformer occupies a silicon area of 170 $\mu$m × 170 $\mu$m with a turns ratio of 1:1. The center-tap, located in the secondary loop with a bias of VDD/2, allows the DC bias point of the output differential signal to be adjusted to tune the duty cycle of the output LO signal. The transformer windings employ the UTM to minimize the ohmic loss, while the underlying shielding occupies the lower metal stack with a radiation pattern to lower the eddy currents in the lossy substrate and mitigate the density requirement and mechanical stress. EM simulation results show that the coupling factor k is about 0.6 and that the transformer can handle an input frequency range of 2-20 GHz. Note that since the input transformer is shared with other front-end circuits, its frequency operation range is larger than what is needed for this DDRM.

Figure 5.20: Block diagram of LO clock generation circuits
Due to imbalanced parasitic capacitors in the transformer, the differential signal output can have a phase error that might corrupt the accuracy of the quadrature dividers. Consequently, a phase aligner is employed.

The quadrature divide-by-2 circuit is shown in Fig. 5.21. It is based on a $C^2MOS$ logic DFF, which produces the four differential quadrature clock signals ($LO_0$, $LO_{90}$, $LO_{180}$, and $LO_{270}$, as shown in Fig. 5.21) with a frequency of $f_{LO}$. Note that the CLK and D inputs shown in Fig. 5.21 are swapped in contrast to a conventional $C^2MOS$ DFF to decrease the delay from D to Q ([21]). Such a design substantially expands the divider's operation frequency. The back-to-back inverters, which prevent illegal states, help align the differential quadrature phases ($LO_0$, $LO_{180}$, $LO_{90}$, and $LO_{270}$). Since the output of the divider features a rail-to-rail voltage swing, it exhibits superior noise performance over the conventional low-swing CML latches, which typically also suffer from higher power dissipation and a longer rise time in their output signals. The $C^2MOS$-based divider proves to be operational for all PVT conditions in the simulations.

Figure 5.21: Schematic of (a) C<sup>2</sup>MOS divider; (b) LO level shifter.

Before being fed to the RFDAC, the LO signal is shifted from the low voltage domain (0-1.1 V) to the high voltage domain (0.6-1.7 V). Since the clock signals are continuous with a high frequency, an AC-coupled level shifter is employed (Fig. 5.21(b)). Simulation results show that the operational frequency of the level shifter ranges from 10 MHz to 6 GHz for all PVT variations.

# 5.6 Implementation of Data Path

The data path, shown in Fig. 5.13, consists of an SPI, four time-interleaved SRAMs, thermometer encoders, and DEM circuitry. The SPI will first load the data into the SRAMs. During actual DDRM operation, the SRAMs run in a loop.
Together with the multiplexer, the four SRAMs act as a poly-phase filter, working as an up-sample-by-four interpolation filter to suppress the sampling spectral replicas that are close to the main signal. Note that most of the digital operations are executed at $f_{LO}/8$, which causes harmonic content in the supply voltage at $nf_{LO}/8$. Due to coupling through the substrate and fringing capacitors, spurious components at those harmonics can appear in the output of the DDRM. Therefore, isolation between the analog and (low-speed) digital parts is essential. In the following section, we will evaluate various interpolation filters. The use of DEM and different decoding schemes to drive the section bank elements will also be discussed.

# 5.6.1 Interpolation Filters

There are four SRAMs running at a sampling frequency of $f_{LO}/8$, the data streams of which can be serialized into one data stream with a sampling frequency of $f_{LO}/2$. Various interpolation techniques can be employed, such as zero-order-hold (ZOH), first-order-hold (FOH), and second-order-hold (SOH) ([12]). In the following, we will compare these different up-sample-by-4 interpolation filters.

The simplest type of interpolation filter is a ZOH. As an example, we assume its input data are $I_{\text{BB}}$ and $Q_{\text{BB}}$. The up-sampled-by-4 data $I_{\text{Inter}}$ and $Q_{\text{Inter}}$ comprise the original samples, which are now separated by three extra zeros. Mathematically, the ZOH operation can be expressed as the convolution of the up-sampled signal with a boxcar function:

$$(I/Q)_{\text{Inter}}[n] = upsample((I/Q)_{\text{Inter}}[n] * \Pi[n]) \tag{5.32}$$

However, in the frequency domain, the boxcar function $\Pi[n]$ acts as a digital LPF with a *Sinc*-shaped amplitude characteristic.
In other words, the subsequent RF power spectrum is shaped by the sinc<sup>2</sup> function with a frequency shift of $\omega_{LO}$:

$$Sinc(\omega) = sinc^2\left(\frac{\omega - \omega_{LO}}{\omega_{\text{sample}}}\right) \tag{5.33}$$

where $\omega_{\text{sample}}$ is the angular baseband sampling frequency.

In the FOH filter, the up-sampled $I_{\text{Inter}}$ and $Q_{\text{Inter}}$ are applied to a digital LPF with a triangular impulse response $\wedge[n]$:

$$\wedge[n] = \Pi[n] * \Pi[n] \tag{5.34}$$

where $\Pi[n]$ is the boxcar function used in the ZOH. Mathematically, the FOH operation can be expressed as the convolution of the up-sampled signal with $\wedge[n]$:

$$(I/Q)_{\text{Inter}}[n] = upsample((I/Q)_{\text{Inter}}[n] * \wedge[n]) \tag{5.35}$$

In the frequency domain, $\wedge[n]$ acts as a digital LPF with a $Sinc^2$ amplitude characteristic. In other words, the subsequent RF power spectrum is shaped by the $sinc^4$ function:

$$Sinc^2(\omega) = sinc^4\left(\frac{\omega - \omega_{LO}}{\omega_{\text{sample}}}\right) \tag{5.36}$$

An SOH digital filter can also be adopted to suppress the amplitude of the replicas even more than an FOH filter does. The impulse response of the corresponding digital LPF is now a quadratic function, $X^2[n]$, which is obtained by convolving $\wedge[n]$ with the boxcar function. Therefore, in the SOH filter, the up-sampled I/Q data is applied to a digital LPF with impulse response $X^2[n]$:

$$X^{2}[n] = \Pi[n] * \wedge[n] \tag{5.37}$$

Figure 5.22: Comparison of ZOH, FOH, and SOH interpolations: (a) wideband spectrum with various interpolation filters; (b) magnitude responses; (c) in-band magnitude responses; (d) magnitude response at the first sampling replica.

Mathematically, the SOH operation can be expressed as the convolution of the up-sampled signal with $X^2[n]$:

$$(I/Q)_{\text{Inter}}[n] = upsample((I/Q)_{\text{Inter}}[n] * X^{2}[n]) \tag{5.38}$$

In the frequency domain, $X^2[n]$ acts as a digital LPF with a $Sinc^3$ amplitude characteristic.
In other words, the subsequent RF power spectrum is shaped by the sinc<sup>6</sup> function:

$$Sinc^{3}(\omega) = sinc^{6}\left(\frac{\omega - \omega_{LO}}{\omega_{\text{sample}}}\right) \tag{5.39}$$

The spectra of the broadband signal with ZOH, FOH, and SOH are shown in Fig. 5.22(a), which illustrates the suppression of the sampling replicas provided by the various interpolation filters. The magnitude response over the whole frequency range is shown in Fig. 5.22(b); the magnitude responses in-band and around the first replica are shown in Fig. 5.22(c) and Fig. 5.22(d), respectively. All the linear interpolation filters have zeros at the sampling frequency and its harmonics. Of these interpolation methods, the SOH provides the highest suppression of the sampling spectral replica but also the largest in-band distortion. The FOH/SOH can suppress the sampling spectral replica adequately for narrowband applications, with almost negligible in-band distortion. However, for broadband applications (e.g., if the video bandwidth approaches a quarter of the sampling frequency), even when using an SOH interpolation filter, the suppression can degrade to only 40 dB. Additionally, the large in-band distortion can yield spectral regrowth and corrupt linearity. Such distortion will place a heavy burden on the RX chain in terms of increased demands on the ADC operating at $4 \times F_s$. To equalize the in-band distortion of an FOH or SOH interpolation, the received down-converted signal should be sampled at $4 \times F_s$ and filtered by an IIR filter, which needs to operate at the same sampling frequency.

Figure 5.23: Simulated EVM with (a) correct sampling time and (b) incorrect sampling time in behavioral simulations, respectively.

Note that the EVM of an FOH/SOH can vary depending on the sampling moment. Figure 5.23 shows the two EVMs of an FOH sampled at different moments in a behavioral simulation without any equalization. The degraded EVM in Fig. 5.23(b) can be as poor as -30 dB.
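The sinc, sinc<sup>2</sup>, and sinc<sup>3</sup> amplitude characteristics stated above can be compared numerically. The following is a minimal sketch under stated assumptions: it uses the idealized continuous sinc model only, with a normalized sampling frequency and a hypothetical band edge at 5 % of that frequency. The function names and the chosen band edge are illustrative and not taken from the design, and the printed numbers are not expected to reproduce Fig. 5.22.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def hold_gain_db(offset, f_sample, order):
    """Amplitude response in dB at a frequency offset from the carrier.

    order = 0 (ZOH), 1 (FOH), 2 (SOH); the amplitude follows sinc^(order+1),
    so the power spectrum follows sinc^2, sinc^4, and sinc^6 respectively.
    """
    amp = abs(sinc(offset / f_sample)) ** (order + 1)
    return 20.0 * math.log10(amp) if amp > 0.0 else float("-inf")


F_SAMPLE = 1.0    # normalized baseband sampling frequency (assumption)
BAND_EDGE = 0.05  # hypothetical signal band edge, 5 % of F_SAMPLE

for name, order in (("ZOH", 0), ("FOH", 1), ("SOH", 2)):
    droop = hold_gain_db(BAND_EDGE, F_SAMPLE, order)               # in-band edge
    replica = hold_gain_db(F_SAMPLE - BAND_EDGE, F_SAMPLE, order)  # first replica edge
    print(f"{name}: band-edge droop {droop:6.2f} dB, replica edge {replica:7.2f} dB")
```

In this simplified model, each increase in hold order multiplies both the band-edge droop and the replica attenuation (in dB) by the same factor, which is the trade-off between replica suppression and in-band distortion weighed in the text.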
Therefore, in this design, to simplify the signal processing while lowering the amplitude of the sampling spectral replicas without deteriorating the EVM, a ZOH with a sampling rate of $f_{\rm LO}/2$ is employed.

#### 5.6.2 DEM and Thermometer Encoder

A block diagram of the DEM and thermometer encoder is shown in detail in Fig. 5.24. In this design, the effective sampling frequency is $f_{\rm LO}/2$, ranging from 250 MHz to 1.5 GHz. Such a high speed complicates the digital flow in the 40 nm CMOS technology used for this design. To relieve the design burden, we use four sets of DEM circuits running at $f_{\rm LO}/8$. The resulting four data streams are merged at their outputs. The four DEMs share the same random data, which is generated by a 32-bit pseudo-random binary sequence (PRBS32) generator based on linear-feedback shift registers (LFSRs). These blocks are synthesized using a CMOS standard cell library. The isolation between the digital and analog circuit blocks is realized through a carefully designed layout.

Figure 5.24: Block diagram of DEM and thermometer encoder.

# 5.7 Measurement Results

This section focuses on the measurement results of the proposed IQ-mapping DDRM. This DDRM is fabricated in a bulk 40 nm CMOS process. Its chip micrograph is shown in Fig. 5.25, which illustrates the 1 mm$^2$ core area, excluding testing SRAMs and the input balun. Since the output is differential, a commercial balun (Johanson 1720BL15B0050) is employed to convert the signal into a single-ended representation. In all of the following measurements, DPD has been omitted.

# 5.7.1 CW Test

First, the proposed DDRM is characterized in CW measurements. Figure 5.26 shows its peak output power $P_{\rm OUT}$ versus carrier frequency $f_{\rm LO}$ from 0.5 GHz to 3 GHz while driving a 50 $\Omega$ differential load. At 2 GHz, the peak $P_{\rm OUT}$ is 14.1 dBm, and the related DC power consumption $P_{\rm DC}$ is 340 mW (excluding SRAMs).
When the differential load is set to 12 $\Omega$, the peak $P_{\rm OUT}$ exceeds 18 dBm, and the overall system efficiency can exceed 24 % at 1 GHz. This indicates the excellent potential of the proposed DDRM for use as a linear, energy-efficient pre-driver.

Figure 5.25: Chip micrograph of the proposed DDRM.

Figure 5.26: Measured peak $P_{\rm OUT}$ and $P_{\rm DC}$ (with 50 $\Omega$ and 12 $\Omega$) vs. $f_{\rm LO}$.

#### 5.7.2 Single-Tone and Two-Tone Tests

Next, single-tone and two-tone signals are uploaded into the proposed DDRM. Figure 5.27(a) shows the measured LO leakage, IQ image, and IM3 versus $f_{LO}$. At 2 GHz, the LO leakage and the IQ image are lower than -52 dBc and -54 dBc, respectively, in a single-tone test with a 4 MHz signal offset. The measured spectrum is shown in Fig. 5.27(b). In the two-tone test, the IM3 level achieved is lower than -58 dBc for a tone spacing of 4 MHz when the DEM is disabled. The corresponding measured spectrum is shown in Fig. 5.27(c). Using DEM, the IM3 can be improved by 5 dB at the expense of an increased out-of-band noise level, as shown in Fig. 5.27(d).

#### 5.7.3 Broadband Signal Test

Following the two-tone tests, the performance of the proposed DDRM is also verified using complex modulated signals. Figure 5.28(a) depicts the spectral purity of a 20 MHz bandwidth single-carrier 256-QAM signal at 2.4 GHz. With an average output power greater than 5 dBm and a 7.8 dB PAPR, it achieves an ACLR of -52 dBc and an EVM of -40 dB. Note that the 2.4 dB loss of the measurement setup is de-embedded. Figure 5.28(b) shows the measured full-span spectrum of a 144 MHz signal at 2.2 GHz; its spectral purity is better than -50 dBc across the span. It is worth mentioning that the sampling spectral replica level is relatively high due to the large video bandwidth and fast roll-off of the Sinc function in the ZOH, which is discussed in Section 5.6.
Large-bandwidth signals are also applied to the proposed DDRM. The measured spectrum of a 160 MHz bandwidth single-carrier 256-QAM signal is shown in Fig. 5.29(a). The corresponding out-of-band spectral purity is better than -48 dBc, while its EVM is better than -36 dB. As such, it is compliant with the 802.11ax spectral mask. In Fig. 5.29(b), the spectrum of a 320 MHz bandwidth single-carrier 256-QAM signal with an ACLR better than -43 dBc and a related EVM of -32 dB is presented, proving that the proposed DDRM can also be used as the modulator for next-generation WLAN (802.11be) applications.

Figure 5.27: (a) Measured LO leakage, IQ image, and IM3 level vs. $f_{\rm LO}$ (with DEM disabled); measured spectrum of (b) the single-tone test, (c) the two-tone test when DEM is disabled, and (d) the two-tone test when DEM is enabled.

Figure 5.28: (a) Measured spectrum and EVM of a 20 MHz 256-QAM signal; (b) measured broad-span spectrum of a 144 MHz 64-QAM signal at 2.2 GHz.

Figure 5.29: (a) Measured spectrum and EVM of a 160 MHz 256-QAM signal at 2.4 GHz; (b) measured spectrum and EVM of a 320 MHz 256-QAM signal at 2.4 GHz.

Figure 5.30(a) shows the ACLR and EVM linearity performance for different modulation bandwidths at 2.4 GHz. In Fig. 5.30(b), using a 10 MHz bandwidth 64-QAM signal, the ACLR versus $f_{\rm LO}$ is given, illustrating ACLR levels well below -45 dBc for all operating frequencies.

Meanwhile, carrier-aggregated signals are also applied to the implemented DDRM. Figure 5.31(a) presents a single channel at 2.2 GHz, showing that the IQ image is lower than -44 dBc. In Fig. 5.31(b), the measured spectrum and constellation diagram of an 11-channel 64-QAM carrier-aggregated signal are given. The resulting ACLR is better than -40 dBc, while the EVM is better than -34 dB in the weakest channel.

Figure 5.30: (a) Linearity performance vs. modulation bandwidth at 2.4 GHz; (b) ACLR performance vs.
center frequency $f_{\rm LO}$ while the modulation bandwidth is 10 MHz.

Figure 5.31: (a) Measured spectrum of a single-channel carrier aggregated signal at 2 GHz; (b) measured spectrum of an 11-channel carrier aggregated signal at 2.2 GHz with constellation diagram results from the weakest channel.

| Reference | Unit | This work | | Su<br>ISSCC20 | Mehrpoo<br>JSSC18 | Su<br>JSSC21 | Roverato<br>ISSCC17 | Yoo<br>ISSCC20 | Deng<br>ISSCC16 | Qian<br>JSSC21 | Zheng<br>ISSCC20 | Qi<br>ISSCC20 | Lee<br>ISSCC21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Architecture | | DDRM | | DDRM | DDRM | DDRM | DDRM | CDAC | DPA | DPA | DPA | Analog<sup>3</sup> | Analog |
| Matching Network | | Off-Chip | | On-Chip | Off-Chip | Off-Chip | On-Chip | On-Chip | Off-Chip | On-Chip | On-Chip | On-Chip | On-Chip |
| Technology | [nm] | 40 | | 65 | 40 | 65 | 28 | 65 | 40 | 40 | 55 | 28 | 14 |
| Frequency | [GHz] | 0.5-3 | | 1.4-3 | 0.9-3.1 | 0.9-5.2 | 0.85 | 2.2 | 2.4 | 2.3-3.5 | 0.85 | 1.4-2.7 | 0.5-6 |
| Peak Pout | [dBm] | 14.1/18.2<sup>1</sup> | | 22 | 9.2 | 15 | 3 | 13 | 19<sup>4</sup> | 23.6 | 29.3 | 33,4 | 7.24 |
| DC Power | [mW] | 340/540<sup>1</sup> | | 1350 | 146<sup>2</sup> | 900 | 150 | N.A. | 830 | 790 | 1974 | 70.5 | N.A. |
| IQ Image | [dBc] | -54 | | N.A. | -49 | N.A. | <-36 | N.A. | N.A. | N.A. | N.A. | <-40 | <-30 |
| f<sub>LO</sub> | [GHz] | 2.4 | | 2.2 | 3 | 2.4 | 0.85 | 2.2 | 2.4 | 3.3 | 0.85 | 2.5 | 3.7 |
| Bandwidth | [MHz] | 20 | 320 | 20 | 57 | 20 | 20 | 40 | 40 | 20 | 10 | 20 | 100 |
| Modulation Scheme | | 256-QAM | 256-QAM | 256-QAM | 64-QAM | 256-QAM | LTE20 | 802.11ax 1024-QAM | 802.11ac 64-QAM | 64-QAM | 64-QAM | 5G-NR n7 | 5G-NR n78 |
| ACLR1 | [dBc] | -52 | -43 | -45 | -44 | -42 | -61 | <-45 | <-40 | -30 | -32 | -44 | -41<sup>5</sup> |
| EVM | [dB] | -40 | -32 | -40 | -30 | -42 | N.A. | -42<sup>6</sup> | <-30 | -29 | -26 | -35 | -37 |
| DPD | [Y/N] | No | | Yes | No | No | No | No | Yes | Yes | No | No | No |

Table 5.1: Performance summary and comparison with state-of-the-art DTXs. <sup>1</sup> Measured at 50 Ω and 12 Ω at 2 GHz, respectively. <sup>2</sup> Not including LO generation circuits. <sup>3</sup> Not including baseband DAC. <sup>4</sup> Average power. <sup>5</sup> Estimated from the figure. <sup>6</sup> Measured at -3 dBm output.

# 5.7.4 Comparison with the State-of-the-Art

The performance of the proposed DDRM is summarized in Table 5.1 and compared to prior-art DDRMs and conventional analog modulators. This table indicates that the proposed DPD-free DDRM, which can operate at frequencies from 0.5 up to 3 GHz, can provide superior spectral purity up to a 320 MHz signal bandwidth, with a peak RF output power of more than 14 dBm. Although [15] exhibits better ACLR performance, its peak power efficiency and carrier frequency range are considerably lower. Additionally, this work achieves a high image rejection ratio without IQ calibration. In general, the efficiency of the DDRM is lower than that of Cartesian DTXs ([27]) due to the current-steering topology. However, the DDRM provides better linearity in return. The overview in Table 5.1 indicates that with the IQ-mapping technique, the proposed DPD-free DDRM achieves superior spectral purity up to a 320 MHz modulation bandwidth.

# 5.8 Conclusion

This chapter proposes a novel IQ-mapping technique.
Using this technique, the proposed DDRM can achieve:

- 3 dB more output power compared to the conventional DDRM architecture for the same DC power consumption;
- Superior image rejection.

To validate the proposed architecture, a DDRM demonstrator is implemented in a 40 nm CMOS technology. With careful design of the RFDAC, the proposed DDRM operates over a 0.5-3 GHz frequency range while generating +14 dBm peak RF output power with a DC power consumption of only 340 mW at 2 GHz. When operating with a 320 MHz 256-QAM signal, the average output power is more than 5 dBm, and the ACLR is better than -43 dBc. This DDRM can act as an energy-efficient driver for 802.11ax WLAN applications, or as the pre-driver of a PA in 5G cellular networks. To the author's best knowledge, this DDRM is the first reported DDRM that can support 320 MHz signals and achieve an ACLR better than -43 dBc without applying any DPD.

# References

- [1] B. Jann et al., "21.5 A 5G Sub-6GHz Zero-IF and mm-Wave IF Transceiver with MIMO and Carrier Aggregation," 2019 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2019, pp. 352-354.
- [2] J. Lee et al., "21.6 A Sub-6GHz 5G New Radio RF Transceiver Supporting EN-DC with 3.15Gb/s DL and 1.27Gb/s UL in 14nm FinFET CMOS," 2019 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2019, pp. 354-356.
- [3] J. Lee et al., "6.1 A Low-Power and Low-Cost 14nm FinFET RFIC Supporting Legacy Cellular and 5G FR1," 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2021, pp. 90-92.
- [4] E. Lu et al., "10.4 A 4×4 Dual-Band Dual-Concurrent WiFi 802.11ax Transceiver with Integrated LNA, PA and T/R Switch Achieving +20dBm 1024-QAM MCS11 Pout and -43dB EVM Floor in 55nm CMOS," 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2020, pp. 178-180.
- [5] G.
Qi et al., "10.1 A 1.4-to-2.7GHz FDD SAW-Less Transmitter for 5G-NR Using a BW-Extended N-Path Filter-Modulator, an Isolated-BB Input and a Wideband TIA-Based PA Driver Achieving < -157.5dBc/Hz OB Noise," 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2020, pp. 172-174.
- [6] N. Markulic, P. T. Renukaswamy, E. Martens, B. van Liempd, P. Wambacq and J. Craninckx, "A 5.5-GHz Background-Calibrated Subsampling Polar Transmitter With -41.3-dB EVM at 1024 QAM in 28-nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 54, no. 4, pp. 1059-1073, April 2019.
- [7] P. Eloranta and P. Seppinen, "Direct-digital RF modulator IC in 0.13 µm CMOS for wide-band multi-radio applications," ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005., San Francisco, CA, 2005, pp. 532-615 Vol. 1.
- [8] C. Erdmann et al., "16.3 A 330mW 14b 6.8GS/s dual-mode RF DAC in 16nm FinFET achieving -70.8dBc ACPR in a 20MHz channel at 5.2GHz," 2017 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2017, pp. 280-281.
- [9] S. Su and M. S. -W. Chen, "A Time-Approximation Filter for Direct RF Transmitter," in *IEEE Journal of Solid-State Circuits*.
- [10] C. Lin et al., "A 12 bit 2.9 GS/s DAC With IM3 $\ll$ -60 dBc Beyond 1 GHz in 65 nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 44, no. 12, pp. 3285-3293, Dec. 2009.
- [11] S. Su and M. S. Chen, "10.2 A SAW-Less Direct-Digital RF Modulator with Tri-Level Time-Approximation Filter and Reconfigurable Dual-Band Delta-Sigma Modulation," 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2020, pp. 174-176.
- [12] M. Mehrpoo, M. Hashemi, Y. Shen, L. C. N. de Vreede and M. S. Alavi, "A Wideband Linear I/Q-Interleaving DDRM," in *IEEE Journal of Solid-State Circuits*, vol. 53, no. 5, pp. 1361-1373, May 2018.
- [13] Y. Shen, R. Bootsman, M. S. Alavi and L. C. N.
de Vreede, "A 1–3 GHz I/Q Interleaved Direct-Digital RF Modulator As A Driver for A Common-Gate PA in 40 nm CMOS," 2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), Los Angeles, CA, USA, 2020, pp. 287-29
- [14] Y. Shen, R. Bootsman, M. S. Alavi and L. de Vreede, "A 0.5-3 GHz I/Q Interleaved Direct-Digital RF Modulator with up to 320 MHz Modulation Bandwidth in 40 nm CMOS," 2020 IEEE Custom Integrated Circuits Conference (CICC), Boston, MA, USA, 2020, pp. 1-4.
- [15] E. Roverato et al., "All-Digital LTE SAW-Less Transmitter With DSP-Based Programming of RX-Band Noise," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 12, pp. 3434-3445, Dec. 2017.
- [16] P. E. Paro Filho, M. Ingels, P. Wambacq and J. Craninckx, "9.3 A transmitter with 10b 128MS/S incremental-charge-based DAC achieving -155dBc/Hz out-of-band noise," 2015 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, San Francisco, CA, 2015, pp. 1-3.
- [17] S. Yoo, S. Hung, J. S. Walling, D. J. Allstot and S. Yoo, "10.7 A 0.26mm<sup>2</sup> DPD-Less Quadrature Digital Transmitter With <-40dB EVM Over >30dB Pout Range in 65nm CMOS," 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2020, pp. 184-186.
- [18] D. Zheng et al., "24.5 A 15b Quadrature Digital Power Amplifier with Transformer-Based Complex-Domain Power-Efficiency Enhancement," 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2020, pp. 370-372.
- [19] W. Yuan and J. S. Walling, "A Multiphase Switched Capacitor Power Amplifier," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 5, pp. 1320-1330, May 2017.
- [20] S. -C. Hung, S. -W. Yoo and S. -M. Yoo, "A Quadrature Class-G Complex-Domain Doherty Digital Power Amplifier," in *IEEE Journal of Solid-State Circuits*.
- [21] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede and J. R.
Long, "A Wideband 2× 13-bit All-Digital I/Q RF-DAC," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 4, pp. 732-752, April 2014.
- [22] N. Klemmer et al., "9.1 A 45nm CMOS RF-to-Bits LTE/WCDMA FDD/TDD 2×2 MIMO base-station transceiver SoC with 200MHz RF bandwidth," 2016 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 164-165.
- [23] H. Jin, D. Kim and B. Kim, "Efficient Digital Quadrature Transmitter Based on IQ Cell Sharing," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 5, pp. 1345-1357, May 2017.
- [24] E. Bechthum, G. I. Radulov, J. Briaire, G. J. G. M. Geelen and A. H. M. van Roermund, "A Wideband RF Mixing-DAC Achieving IMD < -82 dBc Up to 1.9 GHz," in *IEEE Journal of Solid-State Circuits*, vol. 51, no. 6, pp. 1374-1384, June 2016.
- [25] Y. Liu, C. Li and T. Lin, "A 200-pJ/b MUX-Based RF Transmitter for Implantable Multichannel Neural Recording," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 57, no. 10, pp. 2533-2541, Oct. 2009.
- [26] D. J. McLaurin et al., "A highly reconfigurable 65nm CMOS RF-to-bits transceiver for full-band multicarrier TDD/FDD 2G/3G/4G/5G macro basestations," 2018 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2018, pp. 162-164.
- [27] Z. Deng et al., "9.5 A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40nm CMOS," 2016 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 172-173.

# 6 An I/Q Interleaved DDRM as a Driver for a Common-Gate/Common-Base PA

This chapter presents a 1–3 GHz, 2 × 13-bit I/Q interleaved DDRM realized in a 40 nm CMOS technology as a driver for an external CG/CB PA.
The proposed digital-intensive quadrature up-converter features novel RFDACs with an additional current division path to boost the efficiency of the CG/CB PA while maintaining linearity. The realized DDRM also employs signed IQ-mapping, class-B harmonic rejection (HR), and dynamic biasing to improve spectral purity, in-band linearity, and system efficiency. The proposed standalone digital up-converter prototype provides more than 19.6 dBm peak RF output power at 2 GHz. Without using any DPD, it achieves an ACLR of -44.5 dBc and an EVM of -35 dB when applying an 80 MHz 256-QAM signal at 2.4 GHz. When connected to a CB SiGe PA, the overall TX line-up can generate 27 dBm RF power. With an 80 MHz 64-QAM signal at 2.2 GHz, the overall line-up can achieve an ACLR of -37.7 dBc and an EVM of -30 dB.

This chapter is organized as follows. Section 6.1 proposes a novel concept for driving a CG/CB output stage PA using a current-steering DDRM. Section 6.2 introduces an extra auxiliary current division path to overcome the conventional linearity-efficiency conflict in the TX line-up. Following that, three novel techniques are proposed in Section 6.3, Section 6.4, and Section 6.5, respectively. The overall system architecture is presented in Section 6.6, and the design considerations for the CG/CB PA are discussed in Section 6.7. The measurement results of the realized demonstrator and overall TX line-up are shown in Sections 6.8 and 6.9, respectively. Finally, Section 6.10 concludes this chapter.

Figure 6.1: (a) Conceptual diagram of a common source (CS) output stage; (b) corresponding voltage/current waveforms.

# 6.1 Using the DDRM as the Driver for a CG/CB PA

The fifth generation (5G) of mobile wireless networks offers low latency and high data throughput to users, utilizing m-MIMO technologies. The RF transceivers employed in these networks must provide high system integration and energy-efficient signal transmission while being able to handle a large modulation bandwidth.
Several DTXs have been proposed over time to address these requirements while fully benefiting from nanoscale CMOS technologies. In these architectures, to support wideband operation, a Cartesian (I/Q) modulator is typically used as a key building block, as it can be realized as a DDRM in advanced implementations. Among these implementations, current-steering DDRMs are well known for their superior spectral purity due to always-on current sources, making them excellent candidates for driving external MMIC PAs. As depicted in Fig. 6.1(a), such an external power device is mostly implemented as a CS/CE PA for high gain considerations. Furthermore, the output stage is biased in the class-AB/B region to boost drain efficiency while still somewhat preserving linearity. Consequently, at higher output power levels, a clipped class-B-shaped drain current waveform will appear (Fig. 6.1(b)). In such a configuration, the inter-stage matching network will, to a large extent, determine the signal transfer from the DDRM output to the (voltage-mode) input of the output stage. Due to the non-linear I-V curve of a CS/CE output stage, the input-to-output transfer of the final stage will undergo some distortion (Fig. 6.2). For example, in [1] and [2], the ACLR degrades by 20 dB when the CMOS analog modulator is connected to an external CS PA implemented in LDMOS technology. To overcome this distortion, linearization techniques such as DPD are used in practical m-MIMO systems ([1]).

Figure 6.2: Typical I-V curve in a CS-connected transistor.

Figure 6.3: Conceptual diagram of a DDRM driving a CG PA.

To avoid the need for DPD, a true current-mode CG/CB configuration can be employed instead of a CS/CE output stage. In doing so, the non-linear transformation from the DDRM output voltage to the power device output current can be avoided, which can provide significantly higher TX line-up linearity.
However, following this current-mode concept, other "new" challenges appear that need to be handled correctly. First, designing an MMIC CG/CB PA is more difficult than designing a CS/CE PA due to the more stringent stability conditions caused by either the larger impact of the gate-drain capacitance ($C_{\rm gd}$) or any inductance in the gate connection. Therefore, in practical CG/CB configurations, the voltage gain is typically compromised to maintain stability and is lower than that of a CS/CE output stage. As a consequence, any series impedance between the DDRM's output and the CG/CB PA's input must be minimized to obtain sufficient output power. Therefore, there will be neither (passive) matching gain nor filtering of the harmonics. Finally, conventional DDRMs employ unsigned IQ operation, which automatically results in class-A operation. The related non-return-to-zero (NRZ) current waveform in class A (Fig. 6.4(a)) does not support power-efficient operation due to the absence of current clipping. To obtain higher power efficiency, the waveform should return to zero (RZ) in every RF cycle, as shown in Fig. 6.4(b). Such a waveform (except for the square wave shape) is somewhat similar to an analog class-B waveform, as the conduction angle is $\pi$.

Figure 6.4: Ideal current curve: (a) class-A type; (b) class-B type.

In some studies, a DPA ([3]) is employed to drive a CG PA ([4]), the conceptual diagram of which is shown in Fig. 6.5(a). In this approach, the CMOS driver can generate a clipped current waveform, which makes it energy-efficient and capable of reaching high output power. However, it will only offer limited linearity due to the strongly varying output impedance of the CMOS driver. In contrast, a DDRM ([5]-[12]) can offer superior linearity due to its current-steering architecture, especially when these current sources are always on and in class-A operation. However, this comes with an inherent conflict between linearity and efficiency when driving a CG/CB PA (Fig.
6.5(b)). In particular, the CG/CB PA needs class-B operation to achieve high efficiency, whereas the CMOS driver needs class-A operation to maintain high linearity. The next section will propose a novel technique to achieve good linearity at the expense of a small drop in efficiency.

# 6.2 Auxiliary Current Division Path

As discussed previously, there seems to be a rather fundamental conflict in the TX configuration between linearity and efficiency. This section proposes a novel "signed" IQ DDRM TX driver ([13]) to overcome this conflict. The operation is conceptually illustrated in Fig. 6.6. As stated above, conventional IQ DDRMs typically comprise two separate banks of current-steering RFDACs, which operate in class A to allow the use of always-on current sources. To achieve the current function needed for class-B operation, as depicted in Fig. 6.6(a), an additional division path is proposed to redirect the current of these always-on sources to an auxiliary division path in the case of clipping. These division paths are connected to a low supply voltage (e.g., 2.5 V), while the main differential paths carrying the clipped current are connected to the CG/CB MMIC PA. The up-converted differential current waveforms and the division current of one unit cell are shown in Fig. 6.6(b). As can be observed, the total current of all branches ($I_P + I_N + I_{LEAK}$) becomes constant and equal to the corresponding current source of the unary cell ($I_{UNIT}$). Most importantly, thanks to the division path, each unit cell can generate three different logical states, namely +1, 0, and -1, enabling the signed operation.

Figure 6.5: Conceptual diagram of driving a CG PA by (a) a DPA and (b) a DDRM.

Figure 6.6: (a) Conventional DDRM with division paths; (b) up-converted current waveform of its unit cell.

Figure 6.7: Settling behavior of output current with a load of (a) 3 $\Omega$ and (b) 12 $\Omega$.
Therefore, the proposed signed IQ DDRM executes the clipping function, boosting the drain efficiency of the following CG/CB PA without negatively affecting the TX linearity, since the current sources of all banks are kept constant. It is worth mentioning that the degradation of the overall system efficiency caused by the added division path is relatively small, since the drain voltage of the PA is typically much higher (preferably as high as 18 V) than the CMOS supply voltage (2.5 V in this design). Considering the voltage headroom and assuming a voltage gain of 10 for the CG/CB stage, the achievable peak system efficiency for these conditions is 52 %. Furthermore, since the currents of all branches are kept constant, both bias and thermal-induced memory effects are minimized, which is very beneficial when striving for high spectral purity.

In the proposed signed DDRM operation, the class-B-like waveforms have sharper transitions than they do in conventional unsigned class-A DDRM operation. Consequently, the settling behavior of the output signal will have more impact on the linearity. One of the main factors influencing this settling is the load impedance. A lower impedance level results in a smaller RC time constant $\tau$ at the DDRM output and yields a more favorable ratio for the wanted ohmic component. Incomplete settling introduces signal-dependent distortion. Figure 6.7 compares the output signal settling behavior for two different load impedances, namely 3 $\Omega$ and 12 $\Omega$. The lowest impedance causes the lowest settling error and offers the highest linearity. To suppress this type of distortion, the load impedance and its parasitic capacitance should be minimized. When targeting a non-frequency-agile solution, resonating out the output capacitance of the DDRM allows an optimum in linearity to be reached for a given frequency band.
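The dependence of the settling error on the load impedance can be illustrated with a single-pole model. The following is a minimal sketch, assuming the DDRM output behaves as a first-order RC network; the 2 pF output capacitance and the half-LO-period settling window at 2 GHz are hypothetical values chosen for illustration, not values from this design.

```python
import math


def settling_error(r_load_ohm, c_par_farad, t_settle_s):
    """Fractional residual error of a single-pole RC step response.

    After a time t_settle_s, a first-order network with tau = R * C has
    settled to within exp(-t / tau) of its final value.
    """
    tau = r_load_ohm * c_par_farad
    return math.exp(-t_settle_s / tau)


C_PAR = 2e-12       # hypothetical output capacitance (assumption)
T_HALF = 0.5 / 2e9  # half an LO period at 2 GHz (assumed settling window)

for r_load in (3.0, 12.0):
    tau_ps = r_load * C_PAR * 1e12
    err = settling_error(r_load, C_PAR, T_HALF)
    print(f"R = {r_load:4.1f} ohm: tau = {tau_ps:4.1f} ps, residual error = {err:.2e}")
```

Under these assumptions, the residual error grows rapidly as the load resistance, and hence the time constant, increases, which is consistent with the 3 $\Omega$ load settling more completely than the 12 $\Omega$ load.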
# 6.3 Signed IQ-Mapping Unit Cell

In Chapter 5, a novel IQ-mapping technique is proposed to achieve higher output power and lower in-band distortion for an *unsigned DDRM*. A similar concept can also be applied to the *signed DDRM* in this chapter. The corresponding concept diagram is deduced from Fig. 6.6 and shown in Fig. 6.8. Comparing Fig. 6.6 with Fig. 6.8 shows that a single unit cell can now up-convert both I and Q data, improving the utilization ratio of the current sources. What is more, the division paths of two cells can be merged into one path, which reduces the area and lowers the effect of parasitics. However, due to the different IQ data representations (*signed* vs. *unsigned*), modifications are still needed to adapt this concept into a linear energy-efficient TX system approach, which we will discuss next.

Figure 6.8: Proposed top-level schematic of the DDRM.

Figure 6.9: Different combinations with their corresponding waveforms within a single unit cell.

# 6.3.1 IQ-Mapping in Signed RFDAC

Chapter 5 proposes the IQ-mapping technique, which maps the original IQ constellation points into their diamond-shaped I'Q' counterparts. By doing so, one can reuse the current cell, yielding 3 dB more output power and intrinsic image rejection of the unit cell. Note that the data representation used in Chapter 5 is *unsigned*, meaning that for I or Q in each unit cell (pair), there are only two states: -1 and 1. Therefore, there are four combinations: (1,1), (1,-1), (-1,1), and (-1,-1), which can be mapped to the four LO waveforms with 90° phase shifts. Meanwhile, the reference, or DC level, should be at half of the swing (Fig. 6.4(a)). However, this *signed* DDRM, which targets the clipped current waveform, has three levels for the I or Q in the unit cell, namely: +1, 0, and -1. Therefore, there are nine combinations in total: (+1,+1), (+1,0), (+1,-1), (0,+1), (0,0), (0,-1), (-1,+1), (-1,0) and (-1,-1).
These cannot be mapped to the four LO phases, unlike with the previously discussed IQ-mapping technique. If 25 % duty-cycle clocks can be used ([14]), there will be more options to choose from (LO with both a 25 % and 50 % duty-cycle), but at the expense of a lower RF output power level compared to the 50 % case. When a 50 % duty-cycle LO is employed, the bottleneck occurs in the scenario when both I and Q are not zero, namely, in the combinations of (+1,+1), (+1,-1), (-1,+1) and (-1,-1); the corresponding waveforms are shown in the right half of Fig. 6.9. In a single unit cell, it is difficult to generate such waveforms with only one current source. Therefore, the I and Q signals cannot share the same cell current simultaneously.

To enable an IQ signal to share one unit cell without sacrificing output power, an I/Q complementary decoding scheme is applied [15], which is shown in Fig. 6.10(a). For N unit cells, each unit is given a unique label n $(1 \le n \le N)$ . The I decoding scheme is in ascending order: if $|I_{\rm BB}| = I$ , the I enable signals of cells 1 to I are turned on. The Q decoding scheme, on the other hand, is in descending order: if $|Q_{\rm BB}| = Q$ , the Q enable signals of cells N down to N - Q + 1 are turned on. To eliminate IQ overlap, the baseband data must be pre-processed as:

$$\begin{cases} I' = I + Q \\ Q' = Q - I \end{cases} \tag{6.1}$$

using the condition of:

$$\begin{cases} I \le N/2 \\ Q \le N/2 \end{cases} \tag{6.2}$$

By doing so, the constellation diagram is transformed from a square shape to a diamond shape, as shown in Fig. 6.10(b), similar to the IQ-mapping technique introduced in Chapter 5. As discussed in Chapter 5, there will be 3 dB more output power for the same DC current budget. However, such reuse of cells is only possible in the unary MSB cells; in the binary LSB cells, two sets of cells are required because the I and Q signals can both be nonzero at the same time. Therefore, there are two independent LSB arrays for the I and Q banks.
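The complementary decoding described above can be sketched in a few lines. This is an illustrative model under stated assumptions, not the on-chip decoder: the function names, the per-cell state tuples, the way the sign is attached to each enabled cell, and the cell count `N = 8` are all hypothetical. The sketch applies the pre-processing of (6.1) and checks that, under condition (6.2), no unary cell is ever enabled by both the I and the Q decoder.

```python
def premap(i, q):
    """Pre-process signed baseband data as in (6.1): I' = I + Q, Q' = Q - I."""
    return i + q, q - i

def sign(x):
    return (x > 0) - (x < 0)

def cell_states(i_bb, q_bb, n_cells):
    """Per-cell (I state, Q state), each in {-1, 0, +1}, for cells 1..N."""
    i_p, q_p = premap(i_bb, q_bb)
    states = []
    for n in range(1, n_cells + 1):
        i_on = n <= abs(i_p)                # I decoder: ascending, cells 1..|I'|
        q_on = n >= n_cells - abs(q_p) + 1  # Q decoder: descending, cells N..N-|Q'|+1
        states.append((sign(i_p) if i_on else 0, sign(q_p) if q_on else 0))
    return states

N = 8          # hypothetical number of unary cells
half = N // 2  # condition (6.2): |I|, |Q| <= N/2

overlaps = 0
for i in range(-half, half + 1):
    for q in range(-half, half + 1):
        for s_i, s_q in cell_states(i, q, N):
            if s_i != 0 and s_q != 0:
                overlaps += 1  # a cell claimed by both decoders

print("cells claimed by both I and Q:", overlaps)
```

No overlap occurs because $|I + Q| + |Q - I| = 2\max(|I|, |Q|)$, which is at most N when (6.2) holds. Cells for which both states are zero correspond to the (0,0) combination, i.e., current steered to the division path.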
Note that (6.1) is similar to (5.25) for the IQ-mapping DDRM in Chapter 5, since both scenarios use a current-reusing strategy to boost output power. However, there are several differences between their implementations, which will be summarized next.

The image due to IQ mismatch can be suppressed with such arrangements, but not to the same extent as that described in Chapter 5, as it still yields a (smaller) image component at the output. There are two main reasons for this incomplete cancellation. The first is that the LSB banks for I and Q are separate (Fig. 6.10(a)), so the mismatch of I and Q in the LSB cells is independent and therefore cannot be canceled. In other words, the resolution of the I/Q image cancellation is 6 bits at most, since it is limited by the uncanceled LSB mismatch. One potential solution for this limitation is to shuffle the signal fed to the two banks, e.g., using DEM. However, this comes at the expense of higher out-of-band noise. The other main reason why the IQ image fails to be canceled is that the I and Q thermometer encoders run in opposite directions (Fig. 6.11(a)). As a result, the mismatch of the I and Q MSB cells is not identical either, and it also contributes to the I/Q image. In an extreme case, when the amplitude of both I and Q is small, the mismatch is completely independent. Therefore, the mismatch of the I and Q branches is correlated but not canceled completely, which is demonstrated in the MC simulation with a single-sideband signal in Fig. 6.11(b). With a 6-MSB thermometer and 6-LSB binary pattern, and a variation of $\sigma$ = 1 LSB, the image can be suppressed dramatically but not completely.

Figure 6.10: (a) I/Q complementary decoding scheme; (b) constellation diagram transformation.

Figure 6.11: (a) DNL pattern of MSB in the proposed decoder pattern; (b) simulated image rejection ratio distribution with $\sigma = 1$ LSB with and without the signed IQ-mapping technique.
# 6.3.2 Comparison with Unsigned IQ-Mapping Technique in Chapter 5

The aforementioned unit cell is in some respects similar to the cells in Chapter 5, but quite different in others. This section presents a comparison. The shared advantage of the two techniques is the current utilization enhancement. As can be seen, both techniques map the original constellation diagram into the diamond shape, which intrinsically provides 3 dB more output power, and thus doubles drain efficiency. However, there are still several main differences:

- Although both share diamond-shaped constellation diagrams, they are obtained differently. With the *unsigned* IQ-mapping technique of Chapter 5, the mapping takes place within each unit cell. For the *signed* unit cell, however, pre-processing (see (6.1)) is required to obtain the diamond-shaped constellation diagram and avoid an overflow.
- Also, the data representation differs. The IQ-mapping technique in Chapter 5 can only be applied to *unsigned* data. In this chapter, the *signed* data format is employed.
- In Chapter 5, the directions of the I and Q thermometer encoders are the same (Fig. 5.13). In contrast, the thermometer encoder directions in this chapter are opposite (Fig. 6.11(a)), so different unit cells are selected for the I' and Q' data. Therefore, matching differences between these cells yield a partial survival of the image component. In addition, in Chapter 5, there is only one LSB binary array, while there are two binary arrays in this DDRM architecture (Fig. 6.10), which is another contributor to the image.
- In Chapter 5, there is no extra path needed to keep the current sources activated, since they are effectively always on (class-A like), and configured in opposite driving phases when no output signal is needed. In contrast, in this chapter, a current division path is needed to keep the current constant in a single RF cycle.
- In Chapter 5, due to the NRZ property, the DDRM can work well even at high load impedance levels (e.g., 50 Ω), which is also verified in the measurement. However, as shown in Fig. 6.7, the DDRM proposed in this chapter works best at a low impedance level due to its RZ property.

#### 6.3.3 Design of the Unit Cell

The topology of the unit cell for use in a signed DDRM is shown in Fig. 6.12. It can be derived from Fig. 5.14 by adding the auxiliary division path. This division path is connected to an external (low voltage) supply. Extra on-chip decoupling capacitors are added to this supply line to avoid unwanted (supply) voltage modulation. To boost the output impedance of each unit cell, a thick-oxide transistor is also placed on top of this division path. The mixing unit up-converts the IQ data with the related logic that satisfies the signed IQ-mapping requirements. Also, a switch is added to the bias path that can be used to deactivate the current source in deep PBO operation, which will be discussed in the following section. Calibration circuitry is added to allow cancellation of any remaining current source mismatch among the unit cells to boost the linearity and keep the required DDRM core area relatively small.

A detailed schematic of the mixing unit is depicted in Fig. 6.13. The decoded data are generated by the bitwise AND operation of the I/Q data and the sign bits: DI·SI, DQ·SQ, DI· $\overline{\text{SI}}$ , and DQ· $\overline{\text{SQ}}$ . Subsequently, the up-conversion is performed using the bitwise multiplication of the current-mode XOR/XNOR of the quadrature LO clocks with the related decoded data. For the division path, the data switch is only activated when both DI and DQ are logical zeros.

# 6.4 Dynamic Biasing Technique

Conventional DDRMs exhibit superior linearity but low power efficiency, especially in the deep PBO region, due to their unsigned operation.
In the deep PBO region, regardless of how low the output signal amplitude is, all unit cells will still be activated to keep the DC output level constant. As a consequence, the DC power consumption does not scale down with the output signal. Therefore, conventional unsigned DDRMs operate basically in class-A mode. Their power efficiency at the 6 dB PBO point is reduced to only a quarter of their peak efficiency (Fig. 6.14(a)). If in the deep PBO region the current sources of the unit cells are dynamically (de)activated to save power, the unavoidable settling (activation) times of these current sources will introduce time-varying "DC offsets" and consequently signal-dependent distortion, which is illustrated in Fig. 6.14(b). However, in signed operation with the availability of an auxiliary division path, these drawbacks can be overcome, as explained next.

Figure 6.12: Detailed topology of a unit cell.

Figure 6.13: Detailed topology of the mixing unit.

Figure 6.14: (a) $\eta_{Drain}$ vs. PBO region in a conventional DDRM and the proposed signed DDRM; (b) signal-dependent distortion introduced by a fluctuating DC level.

Figure 6.15: Concept of dynamic biasing technique.

# 6.4.1 Concept of Dynamic Biasing Technique

In the proposed signed IQ DDRM structure, thanks to the division path, the unused current in the PBO region can be independently scaled down. This requires placing a switch inside each unary cell to turn the corresponding current source on and off dynamically, as already shown in Fig. 6.12. Figure 6.15 demonstrates the operating principle of the proposed dynamic biasing technique, assuming that the system is operated in the deep PBO region. The unused current sources in the PBO region are turned off (shift from green to gray in Fig. 6.15). The details of this implementation will be discussed in the following section. Nonetheless, the independent control in this topology allows pre-activation of the current sources (see Fig.
6.15), which helps preserve linearity. The corresponding modified efficiency versus PBO level is shown in Fig. 6.14(a). Note that the resulting performance is very similar to that of a class-B efficiency curve. Compared to conventional DDRMs, the proposed signed DDRM with dynamic biasing can double its efficiency when the PAPR of the TX signal is 6 dB, and improve it even further in the deeper PBO region. To the authors' knowledge, this is the first reported DDRM/RFDAC with signed operation that can provide a class-B efficiency curve.

# 6.4.2 Implementation

The dynamic biasing technique sets high demands on the design and implementation of the related biasing networks. In this DDRM design, local bias current generation with decoupling capacitors is used to prevent crosstalk and minimize current mismatch among the different unit cells. A global gate-source voltage is distributed to aid the local bias current generation and minimize the IR drop. This voltage is routed by UTM. The bias current is used to locally bias the gate of the current source, and decoupling capacitors at sensitive voltage nodes ensure that disturbances and noise are filtered out. Note that the use of only one shared reference source for the whole DDRM prevents disturbances, compared to the use of multiple reference sources, which would show up as spurs in the output signal.

The topology for the local bias current generation is shown in Fig. 6.16(a). To shorten the turn-on time, a current re-direction technique is employed in the switch design. As shown in Fig. 6.16(a), there is an identical dummy diode cell to guarantee that the voltage at node A remains constant when the current source is switched off. When the current source is switched off, the control signal EN will push the switch to connect to the dummy cell. In addition, node B is connected to ground to speed up the turning-off process.
Despite these measures, the current source turning-on process still takes more than 50 ns due to the large device used for the NMOS current sources. This can still somewhat limit the bandwidth for complex modulated signals when using this technique, regardless of the pre-activation approach. The turning-off process, in contrast, is not critical, since the current has already been directed to the division path. No dynamic biasing switch has been implemented in the binary cells; they only contain a dummy switch for cell matching purposes.

Note that turning on/off a large number of current sources simultaneously yields a high current step, which might induce some supply variations that will appear as spurious content in the output spectrum. Therefore, it is preferable to distribute the pre-activation of the current sources over time. To achieve high flexibility and enable a different pre-activation time for the current cells, an extra SRAM together with a pair of thermometer encoders is also employed, as shown in Fig. 6.16(b). Only when both I and Q are zero is the current source disabled. The two thermometer encoders run in opposite directions to facilitate the signed operation, similar to what is illustrated in Fig. 6.10. Even more accurate/faster control of dynamic biasing can be a research direction in the future.

Figure 6.16: Topology of current source switch in unit cells

# 6.5 Class-B Harmonic Rejection Technique

In PA design, odd harmonics, especially the third and fifth, often cause difficulties. In CS/CE PAs in particular, odd harmonics are typically folded back to the close-in spectrum region due to the non-linearities of the I-V curve of the PA. Such folding-back creates intermodulation products, especially C-IMD3 (introduced in Chapter 4, originating from [16]), which corrupt the in-band and out-of-band linearity performance.
Furthermore, a large harmonic content will fail to meet the requirements of some wireless standards, demanding an extra passive BPF at the output, which is lossy and expensive and violates the frequency-agile nature offered by the DDRM approach. In the CG/CB PA, the transfer characteristic largely relaxes the C-IMD3. However, it would still require some suppression of the third and fifth harmonics at its output, again sacrificing the desired frequency-agile behavior. Therefore, in this work, we propose a class-B type HR to cancel the third and fifth harmonics more efficiently.

# 6.5.1 Class-B Type Harmonic Rejection

The class-A type of HR has already been introduced in Chapter 4. Its principle is based on the use of three "parallel" mixers with an amplitude scaling of 1, $\sqrt{2}$ and 1, and phase shifts of 0°, 45°, and 90°, respectively. Under these conditions, the third and fifth harmonics will cancel each other while the fundamental is preserved. The corresponding waveforms in the unit cell are shown in Fig. 6.17(a), and the related vector diagram is shown in Fig. 6.17(b). Note that the effective duty-cycle of the waveform shown in Fig. 6.17 is 75 %, and there will be an NRZ waveform when summing I and Q, so the related operating class of the CG/CB PA will be class-A (Fig. 6.17(c)), yielding low efficiency. Therefore, the classical HR technique cannot be applied in this work.

Apart from the aforementioned HR technique ([7] and [16]), a few other HR techniques have been proposed in the literature ([17] and [18]). However, those concepts require a $60^{\circ}$ or even $64^{\circ}$ phase shift between the vectors, which is much more difficult to generate than $45^{\circ}$ phase shifts. In summary, there is a need for an HR technique with an RZ waveform based on the use of $45^{\circ}$ phase shifts. The class-B type HR developed in the next part satisfies both demands.

The principle of class-B type HR used in this work is shown in Fig. 6.18.
It is based on two DACs: one operating with a 50 % duty-cycle square waveform, and another with a 25 % duty-cycle square waveform, shifted by $45^{\circ}$ in phase with respect to each other. The Fourier series of the 50 % duty-cycle square waveform $LO_1(t)$ and the 25 % duty-cycle square waveform $LO_2(t)$ are:

$$LO_1(t) = \frac{2A_{50}}{\pi} \sum_{n=1}^{\infty} \frac{\sin((2n-1)\omega_0 t)}{2n-1} \tag{6.3}$$

and:

$$LO_2(t) = \frac{A_{25}}{4} \sum_{n=-\infty}^{+\infty} \frac{\sin(n\pi/8)}{n\pi/8} e^{jn(\omega_0 t - \frac{\pi}{4}) + \frac{\pi}{2}} \tag{6.4}$$

respectively, where $\omega_0 = 2\pi f_0$ , and $f_0$ is the LO frequency. From (6.3) and (6.4), to cancel the third and fifth harmonics, the amplitudes $A_{50}$ and $A_{25}$ should satisfy:

$$\frac{A_{25}}{A_{50}} = \sqrt{2} \tag{6.5}$$

Figure 6.17: Principle of the class-A HR technique: (a) block diagram; (b) timing diagram; (c) waveform with the summation of I and Q signals.

Figure 6.18: Principle of the class-B HR technique: (a) block diagram; (b) timing diagram; (c) spectrum; (d) push-pull output to cancel even harmonics.

Figure 6.19: Harmonic rejection levels with mismatch for different amplitudes and phase mismatch: (a) for the third harmonic; (b) for the fifth harmonic.

The corresponding frequency response is shown in Fig. 6.18(c). Note that in contrast to the class-A approach, even harmonics are also present in the generated signal. However, since the output of the RFDACs here is push-pull, and a class-B type output matching network is used at the output, these even harmonic components will not show up in the DDRM output signal (Fig. 6.18(d)).

Although a similar HR waveform has been proposed in a polar TX system ([8]), here, class-B HR is used in an interleaved-phased-mapped Cartesian DDRM. Such a configuration benefits from some unique features.
First of all, the polar system is far less suitable for this type of HR technique since it is difficult to generate 25 % and 50 % duty-cycle time-varying phase-modulated signals that perfectly track each other. More specifically, in [8], the 25 % duty-cycle clock is derived from the 50 % duty-cycle clock using a delay line and AND gates, which is rather inaccurate and needs a real-time LUT to control the delay. Therefore, as can be concluded from [8], such a polar HR technique is not suitable for large bandwidth applications. Another constraint of the work in [8] is that only one of the two harmonics (either the third or the fifth) can be canceled, since the output of the class-D<sup>-1</sup> DPA used in that particular work is no longer a square wave, not even in theory. In this work, however, the class-B HR technique is applied in a Cartesian current-steering RFDAC application. As a result, it is much easier to generate the 25 % duty-cycle square wave. Moreover, the absence of strong non-linearities due to the use of a CG/CB output stage leads to excellent spectral purity.

# 6.5.2 Influence of Amplitude and Phase Mismatch

As in the class-A HR technique, accurate amplitude and phase matching between the two RFDACs is essential to achieve good third and fifth harmonic rejection levels. As can be deduced from the Fourier series, the harmonic rejection ratio is given by:

$$HD_3(t) \simeq \frac{1}{36}[(1+\Delta)\cos 3\theta - 1]^2 + \frac{1}{36}[(1+\Delta)\sin 3\theta]^2 \tag{6.6}$$

and:

$$HD_5(t) \simeq \frac{1}{100}[(1+\Delta)\cos 5\theta - 1]^2 + \frac{1}{100}[(1+\Delta)\sin 5\theta]^2 \tag{6.7}$$

where $\Delta$ is the amplitude mismatch, and $\theta$ is the phase mismatch at the fundamental frequency caused by a delay of $\Delta t$ . As can be seen from these equations, the rejection ratio depends strongly on the phase error in the original signal. In Fig. 6.19, the harmonic rejection is plotted as a function of phase error for different amplitude errors.
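Equations (6.6) and (6.7) are straightforward to evaluate numerically. The sketch below is illustrative only: the function name `hd_db` is hypothetical, and converting the ratio with $10\log_{10}$ assumes that (6.6) and (6.7) express a power ratio, which the squared terms suggest but the text does not state explicitly.

```python
import math

def hd_db(order, amp_mismatch, phase_err_deg):
    """Evaluate (6.6) for order=3 or (6.7) for order=5 and return the result in dB.

    amp_mismatch is Delta (e.g. 0.01 for 1 %); phase_err_deg is theta at the
    fundamental, in degrees. Assumes the expressions are power ratios.
    With zero mismatch the ratio is exactly 0 and the logarithm is undefined.
    """
    ang = math.radians(order * phase_err_deg)
    scale = 1.0 / (2 * order) ** 2   # 1/36 for the third, 1/100 for the fifth
    g = 1.0 + amp_mismatch
    ratio = scale * ((g * math.cos(ang) - 1.0) ** 2 + (g * math.sin(ang)) ** 2)
    return 10.0 * math.log10(ratio)

for order in (3, 5):
    print(f"HD{order} at 1 % amplitude and 1 deg phase error: "
          f"{hd_db(order, 0.01, 1.0):.1f} dB")
```

For a 1 % amplitude error and a 1° phase error, both expressions evaluate to roughly -41 dB, and the phase term dominates the result.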
For a 1° phase error and a 1 % amplitude matching error, rejection levels of better than 40 dB are achieved for both the third and fifth harmonics. Compared to the simulation results in [16], the proposed class-B HR technique provides, in theory, a higher harmonic rejection (more than 5 dB) with a lower number of RFDACs.

The duty-cycle accuracy is another factor that influences the achievable (class-B) harmonic rejection ratio. This factor does not show up in the class-A HR technique since all three branches share the same duty-cycle. The mismatch is typically caused by a mismatch between the rising and falling times in the buffer or by a non-optimized bias point in the clock circuitry. Assuming that the 50 % clock is ideal and the duty-cycle mismatch is in the 25 % clock, one can conclude from the Fourier series that the resulting harmonic rejection ratio can be written as:

$$HD_{3}(t) \simeq \frac{1}{36} \left[(1+\Delta)\cos\left(\tfrac{3}{2}(\theta_{2}-\theta_{1})\right)\cos\left(\tfrac{1}{2}(3\theta_{2}+3\theta_{1})\right)-1\right]^{2} + \frac{1}{36} \left[(1+\Delta)\cos\left(\tfrac{3}{2}(\theta_{2}-\theta_{1})\right)\sin\left(\tfrac{1}{2}(3\theta_{2}+3\theta_{1})\right)\right]^{2} \tag{6.8}$$

and:

$$HD_5(t) \simeq \frac{1}{100} \left[(1+\Delta)\cos\left(\tfrac{5}{2}(\theta_2 - \theta_1)\right)\cos\left(\tfrac{1}{2}(5\theta_2 + 5\theta_1)\right) - 1\right]^2 + \frac{1}{100} \left[(1+\Delta)\cos\left(\tfrac{5}{2}(\theta_2 - \theta_1)\right)\sin\left(\tfrac{1}{2}(5\theta_2 + 5\theta_1)\right)\right]^2 \tag{6.9}$$

where $\Delta$ is the amplitude mismatch, and $\theta_1$ and $\theta_2$ are the phase errors caused by delay mismatch in the rising and falling transitions, respectively. Figure 6.20 shows the harmonic rejection level as a function of the duty-cycle with a 5 % amplitude error. Generally, the mismatch of the duty-cycle will deteriorate the harmonic rejection, but there are some exceptions in which the phase and amplitude mismatch will cancel each other.
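The cancellation condition (6.5) and its sensitivity to the duty-cycle can be checked with the standard Fourier coefficients of a rectangular pulse train. The sketch below rests on simplifying assumptions that are not taken from the text: both pulses are unipolar and centered on the same instant, and the duty-cycle error is applied symmetrically around that center, which is a simpler model than the separate rising/falling edge errors of (6.8) and (6.9). All function names are hypothetical.

```python
import math

def pulse_harmonic(k, height, duty):
    """k-th cosine Fourier coefficient of a rectangular pulse train centered at t = 0."""
    return 2.0 * height / (k * math.pi) * math.sin(k * math.pi * duty)

def composite_harmonic(k, a50=1.0, ratio=math.sqrt(2.0), duty25=0.25):
    """k-th harmonic of the sum of the 50 % waveform and the scaled 25 % waveform."""
    return pulse_harmonic(k, a50, 0.5) + pulse_harmonic(k, ratio * a50, duty25)

def rejection_db(k, **kwargs):
    """Fundamental-to-harmonic amplitude ratio in dB (infinite for exact cancellation)."""
    fund = abs(composite_harmonic(1, **kwargs))
    harm = abs(composite_harmonic(k, **kwargs))
    return math.inf if harm == 0.0 else 20.0 * math.log10(fund / harm)

for k in (1, 2, 3, 5):
    print(f"ideal k = {k}: coefficient = {composite_harmonic(k):+.4f}")

for duty in (0.25, 0.26):
    print(f"duty = {duty:.2f}: HR3 = {rejection_db(3, duty25=duty):.1f} dB, "
          f"HR5 = {rejection_db(5, duty25=duty):.1f} dB")
```

With the $\sqrt{2}$ amplitude ratio, the third and fifth coefficients vanish (up to floating-point rounding) while the fundamental adds up to $4A_{50}/\pi$. The second harmonic remains, which is consistent with the remark that even harmonics must be removed by the push-pull output.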
#### 6.5.3 Implementation

To implement the class-B HR technique in this work, two RFDACs with different duty-cycles are embedded in the DDRM. For this purpose, a dedicated floorplan has been developed, which is shown in Fig. 6.21. The two RFDACs are mirrored with the binary cells in the middle. Every RFDAC includes two sets of binary cells. When functioning as the IQ-mapping DDRM described in the previous chapter, only one set will be activated, in contrast to class-B operation, in which both sets are activated. There are dummy cells in the center of the two RFDACs, and the clock and LO lines are routed above them. To simplify the design and improve matching, the unit cells in the 25 % RFDAC are kept the same as the 50 % RFDAC unit cell in Section 6.3, except for the mixing cell topology. Two separate bias networks are used for the 25 % and 50 % duty-cycle RFDACs to achieve an accurate $\sqrt{2}$ current scaling with more flexibility. The physical size of the current sources in the 50 % and 25 % RFDACs is the same, while the maximum current capacity of the current source in the MSB cells is more than 3 mA.

Figure 6.20: Harmonic rejection contours with duty-cycle mismatch: (a) for the third harmonic; (b) for the fifth harmonic with an amplitude error.

Figure 6.21: Floorplan of the proposed dual RFDACs.

The topology of the mixing cell in the unit cell of the 25 % RFDAC is shown in Fig. 6.22. The decoded data is generated by the bitwise AND operation of the I/Q data and the sign bits: $DI \cdot SI$ , $DQ \cdot SQ$ , $DI \cdot \overline{SI}$ , $DQ \cdot \overline{SQ}$ , which is identical to the 50 % RFDAC case. For the division path, however, the local decoding is different, because in the 25 % RFDAC case, when one of the IQ data becomes a logical one, the division path still needs to carry current, as demonstrated in Fig. 6.23.

Figure 6.22: Topology of the mixing core in the unit cell of the 25 % RFDAC.
Figure 6.23: Timing diagram of the dynamic biasing in the 25 % RFDAC.

Figure 6.24: Top-level architecture of proposed DDRM.

As can be seen from Fig. 6.20, any duty-cycle mismatch can profoundly affect the harmonic rejection ratio. Therefore, it is crucial to generate the 25 % LO clock locally inside each mixing unit using a bitwise AND of two 50 % quadrature clock signals to minimize the delay mismatch between the two LO distribution networks. In Fig. 6.22, the 25 % duty-cycle signal is generated inside each unit cell with an analog AND gate, whereas in the 50 % RFDAC in Fig. 6.22, the clock switch is split between two transistors to obtain matching with the 25 % RFDAC. By doing so, the signal in the LO tree for both RFDACs has a 50 % duty-cycle, and an identical layout can be (re-)used to minimize mismatch and lower the design complexity. As mentioned above, the proposed mixing unit has been designed to feature only a single current source, thus reducing the IQ mismatch. Figure 6.23 also shows that the aforementioned dynamic biasing technique can be applied in the 25 % RFDAC without modification.

### 6.6 System Architecture

The overall DDRM architecture is shown in Fig. 6.24 and includes all aforementioned features. There are two RFDACs, one with a 50 % duty-cycle and the other one using a 25 % duty-cycle. The IQ baseband data are fed to the SRAMs through an SPI and converted to thermometer format on-chip. Four on-chip SRAMs are time-interleaved to allow the data rate of the bit-stream to equal the operating frequency ( $f_{LO}$ ), thus allowing the support of very large modulation bandwidths without any linearity degradation. Furthermore, there is an extra SRAM to fully control the dynamic biasing independently. For the typical process corner, with a 1.1 V supply, the sampling frequency of these SRAMs is expected to be as high as 800 MHz.

The topology of LO generation circuitry is shown in Fig. 6.25.
The external single-ended $4 \times f_{LO}$ clock signal is first converted to a differential signal and then divided on-chip to generate the required eight clock phases, each with a 50 % duty-cycle. The first stage divider employs CML logic to guarantee that the divider can work properly at 12 GHz. A schematic of the divide-by-2 CML frequency divider is shown in Fig. 6.26, where its output signal is buffered to obtain a rail-to-rail swing square wave. The second divider stage is implemented in C<sup>2</sup>MOS logic, and its topology can be found in Fig. 3.21. For the CML divider, in the typical process corner at a temperature of 120°C, the power consumption is expected to be less than 5 mW when using a 1.1 V supply, yielding an operating frequency range from 2 to 20 GHz. Simulations show that over different process corners (slow-slow, slow-fast, fast-fast, fast-slow, and typical-typical) and temperatures from -40°C to 120°C, the CML divider can always cover the targeted frequency band of operation. To obtain accurate output phases, a C<sup>2</sup>MOS D-flip-flop retimes the eight output phases. Also, AC-coupled level shifters are employed to shift the LO signal from the low voltage domain of 0-1.1 V to 0.6-1.7 V.

Figure 6.26: (a) Schematic of the first stage divider; (b) schematic of the CML latch.

To characterize the CMOS driver independently, the DDRM outputs are connected first to an off-chip matching network/balun. At a later stage, this balun will be replaced by a CG/CB PA stage. Previous discussions indicate that the DDRM's output loading must be low-ohmic to achieve high output power and linearity. Therefore, an off-chip Marchand balun is adopted from [19], the layout and transfer characteristic of which are shown in Fig. 6.27.
Tight differential coupling with a high even-mode impedance is required to realize a wideband Marchand balun with sufficiently low impedance. By employing the re-entrant type coupled lines with a proper dielectric constant and dielectric layer thickness between and underneath the conductors, the target tight coupling and high even-mode impedance are achieved, yielding a low-loss wideband balun. By combining the Marchand balun with a differential re-entrant type impedance inverter (total length $\lambda/4$ ) featuring second harmonic impedance control, a well-controlled wideband performance can be achieved. In other words, by having a short-circuit termination for the second harmonic after the first $\lambda/8$ section, an open circuit at the reference plane of the DDRM can be achieved. The required even-mode second harmonic short-circuit condition in the re-entrant coupled lines can also be realized by adding a simple via from the floating middle layer conductor to ground at the position where the (even-mode) electrical length for the second harmonic $2f_{LO}$ equals $\lambda/8$ . Due to the tight coupling between the three conductors, the top metals are automatically forced to ground for their even-mode signals, while the differential operation/terminations remain unaffected.

Figure 6.27: Compensated Marchand balun with second harmonic termination implemented by a via, and the measured and simulated differential-to-single-ended transmission loss in [19].

### 6.7 Design Consideration of the CG/CB PA

This section discusses design considerations in the co-design of the DDRM driver and the CG/CB PA. To boost output power and achieve more stability, the CG/CB PA employs a push-pull topology. As discussed in Section 6.1, the voltage gain of the CG/CB PA cannot be made very high due to stability concerns. As is shown in Fig.
6.28(a), the input voltage of CG/CB PA $V_{PA}$ is equal to

$$V_{\rm PA} = \frac{1/g_m}{1/g_m + Z_{\rm CON}} V_{\rm DDRM} \tag{6.10}$$

where $g_m$ is the transconductance of the CG/CB PA, and $Z_{\rm CON}$ is the impedance of the connection, usually dominated by the inductance of the bonding wire that connects the driver and the PA. For example, the distance between the CMOS die and the CG/CB PA die is typically 0.5 mm. Consequently, based on the rule of thumb of 1 nH/mm, one could expect a bonding wire inductance of about 0.5 nH, contributing an impedance of 3.14 $\Omega$ at 1 GHz. Assuming that $1/g_m$ is 3 $\Omega$ , the chip-to-chip connection will then reduce the overall output stage voltage gain by 3 dB. At such a low impedance level, the parasitic capacitance can be neglected. Therefore, minimizing the series impedance of the connecting structure ( $Z_{\rm CON}$ ) is essential for providing a high drive voltage to the CG/CB stage, and consequently, for achieving higher output power.

Figure 6.28: Connection between CMOS driver and CG/CB PA.

Besides decreasing the distance between the dies, which becomes impractical beyond a certain point, using multiple bonding wires is another method that can decrease the impact of the bonding wire inductance. An equivalent schematic of a multi-bond wire connection is shown in Fig. 6.28(b). Although the use of parallel bond wires decreases the self-inductance, the mutual inductance among these bonding wires, if they share the same current direction, will typically become dominant. If an interleaving bonding diagram is applied, such as the one shown in Fig. 6.28(c), the effective overall mutual inductance will be much lower due to the opposite current directions, which lowers the total inductance of the bond wire array between the dies. As shown in Fig. 6.29, this interleaving can be realized with the transistors of the push-pull topology.
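The 3 dB figure quoted above follows directly from (6.10) when $Z_{\rm CON}$ is modeled as a pure inductance. The sketch below is a minimal check using the numbers given in the text (0.5 nH, 1 GHz, $1/g_m$ = 3 $\Omega$); the function name is hypothetical, and treating the connection as an ideal inductor without resistance or mutual coupling is a simplifying assumption.

```python
import math

def drive_ratio_db(inv_gm_ohm, l_bond_h, f_hz):
    """|V_PA / V_DDRM| in dB from (6.10), with Z_CON = j*2*pi*f*L (ideal inductor)."""
    z_con = 1j * 2.0 * math.pi * f_hz * l_bond_h
    ratio = inv_gm_ohm / (inv_gm_ohm + z_con)
    return 20.0 * math.log10(abs(ratio))

INV_GM = 3.0   # 1/g_m in ohm, as assumed in the text
F = 1e9        # 1 GHz, as in the text

for l_nh in (0.5, 0.25, 0.1):
    loss = drive_ratio_db(INV_GM, l_nh * 1e-9, F)
    print(f"L = {l_nh:4.2f} nH: drive voltage change = {loss:.2f} dB")
```

The smaller inductance values are hypothetical and only indicate how the drive voltage recovers when the effective inductance of the bond wire array is lowered, for example through the interleaved bonding arrangement.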
There are four pairs of push-pull transistors in the layout, and the order of the eight inputs is P, N, P, N, P, N, P, N, to cancel the inductance of the bonding wires. Note that the outputs of the push-pull CG/CB PA stages are combined but still differential, so they need to be connected to a balun to obtain a single-ended output.

Another issue is that, to boost efficiency, the current waveform is artificially "clipped" with a division path. This operation degrades the linearity if the PA transistor is forced into the diode region when there is no current. To eliminate this issue with a low efficiency penalty, two sets of static bleeding current sources are placed in the CMOS driver chip, similar to the bleeding techniques in baseband DAC design ([20] and [21]). In this design, a typical value for the bleeding current is 100 mA in total (for both positive and negative branches).

Figure 6.29: Conceptual diagram of the CB PA with bleeding current sources within CMOS driver.

### 6.8 Experimental Results with Standalone CMOS Driver

The proposed DDRM driver, as illustrated in Fig. 6.30, is fabricated in an LP 40 nm CMOS process. Its core circuitry, excluding the SRAMs, occupies 2.4 mm<sup>2</sup>. The total chip area (including pads) is 10 mm<sup>2</sup>. Figure 6.31 illustrates the measurement setup. The 13-bit I/Q baseband data (12 bits plus one sign bit) for the DDRM is generated in Matlab and loaded into the on-chip SRAMs through an FTDI SPI interface. The output of the DDRM is monitored by a spectrum analyzer, while the output power is measured by a power meter. The EVM is measured with a Keysight MSOS804A mixed-signal oscilloscope. The LO signals fed to the test chip are sine waves provided by an external signal generator (Keysight E8257D). The DUT power supplies feed off-chip LDOs, which provide low-noise supply voltages to the chip.
#### 6.8.1 CW Measurement

The performance of the signed Cartesian DDRM is first characterized in a CW test. For this measurement, the I/Q data loaded in the SRAM represents a static state, hence the data switches in the data path do not switch. Although the DDRM is verified in simulation to work properly from 100 MHz to 3.5 GHz, the best measured performance in terms of output power is achieved from 2 to 3 GHz due to bandwidth limitations of the off-chip balun. As illustrated in Fig. 6.32(a), the output power reaches 19.6 dBm. For this measurement, the carrier frequency is swept from 1 to 3 GHz in steps of 100 MHz. The measured harmonic rejection is presented in the same chart. As shown, between 1 and 3 GHz, the third and fifth harmonic rejection levels are better than 35 dBc and 45 dBc, respectively.

Figure 6.30: Micrograph of the proposed DDRM.

Figure 6.31: Conceptual diagram of the measurement setup.

Figure 6.32: CW measurement results: (a) output power and harmonic rejection level vs. LO frequency; (b) drain efficiency and system efficiency with/without dynamic biasing; (c) power break-up in the peak power region.

Figure 6.33: Measurement result of (a) single-tone and (b) two-tone tests vs. $f_{LO}$.

Figure 6.32(b) shows the measurement result for the drain and system efficiency in the CW measurement. At 2.4 GHz, the peak drain and system efficiencies are about 23 % and 18 %, respectively. Without activating dynamic biasing, the drain efficiency versus PBO level follows a conventional class-A efficiency curve, meaning that at 6 dB PBO the drain efficiency is only a quarter of the peak efficiency. With dynamic biasing enabled, the drain efficiency in the 6 dB PBO region doubles, and the efficiency curve shows a class-B-like efficiency roll-off. Such an operation allows higher average efficiencies to be reached with complex modulated signals (e.g., QAM or OFDM) compared to conventional DDRMs.
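The "quarter versus half" statement above follows from the ideal efficiency scaling laws: in class-A the efficiency scales with output power, while in class-B it scales with output amplitude. The sketch below evaluates both laws. It is an idealized illustration rather than a model of the measured circuit, and applying it to the measured 23 % peak drain efficiency is an assumption made purely for the example.

```python
def relative_efficiency(pbo_db, pa_class):
    """Ideal drain efficiency relative to its peak value at a given power
    back-off (in dB). Class-A scales with power, class-B with amplitude."""
    if pa_class == "A":
        return 10 ** (-pbo_db / 10)
    if pa_class == "B":
        return 10 ** (-pbo_db / 20)
    raise ValueError("pa_class must be 'A' or 'B'")


PEAK_DRAIN_EFFICIENCY = 23.0  # percent, measured peak value quoted in the text

for pbo in (0, 3, 6, 9, 12):
    eta_a = PEAK_DRAIN_EFFICIENCY * relative_efficiency(pbo, "A")
    eta_b = PEAK_DRAIN_EFFICIENCY * relative_efficiency(pbo, "B")
    print(f"PBO = {pbo:2d} dB: class-A-like {eta_a:5.2f} %, class-B-like {eta_b:5.2f} %")
```

At 6 dB PBO the two ideal curves differ by almost exactly a factor of two, which matches the doubling reported with dynamic biasing enabled.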
A pie diagram of the DC power consumption in the CW measurement is shown in Fig. 6.32(c).

#### 6.8.2 Single-Tone and Two-Tone Measurement

Following the CW measurements, single-tone and two-tone tests are performed to investigate the IQ imbalance and linearity of the proposed DDRM. The frequency offset or tone spacing is scaled with $f_{\rm LO}$ and is about 5 MHz at 2.5 GHz. The measured LO leakage, IQ-image, and C-IMD3 in the single-tone tests are shown in Fig. 6.33(a), while Fig. 6.33(b) presents the IM3 and IM5 in the two-tone tests. At 2.5 GHz, the LO leakage, IQ-image, and C-IMD3 are -60 dBc, -49 dBc, and -54 dBc, respectively, while the IM3 and IM5 in the two-tone test are -62 dBc and -63 dBc, respectively. Note that in this measurement, no DPD or calibration is applied. These measurement results show the superior linearity potential of the proposed DDRM.

#### 6.8.3 Broadband Signal Measurement

Finally, the performance of the proposed DDRM is also verified using complex modulated signals. Figure 6.34(a) shows its spectral purity when operating with a 20 MHz bandwidth single-carrier 256-QAM signal at 2.4 GHz while using dynamic biasing, yielding an ACLR of -43 dBc and an EVM of -33 dB. As discussed above, for a higher modulation bandwidth, the dynamic biasing technique becomes less effective. The measured spectrum of an 80 MHz bandwidth single-carrier 256-QAM signal (without dynamic biasing) is shown in Fig. 6.34(b). The corresponding out-of-band spectral purity is better than -44.5 dBc, with an EVM better than -35 dB. When the bandwidth reaches 160 MHz, the resulting ACLR is -40.5 dBc, and the EVM is better than -33 dB. Again, DPD and calibration are omitted.

Figure 6.34: Measured spectrum and constellation diagram of (a) a 20 MHz 256-QAM signal with dynamic biasing enabled; (b) an 80 MHz 256-QAM signal with dynamic biasing disabled.

Table 6.1: Performance summary and comparison with state-of-the-art DDRMs/DTXs/PA drivers

| Specification | This Work | Su<br>ISSCC20 | Mehrpoo<br>JSSC18 | Su<br>JSSC21 | Shen<br>CICC20 | Yoo<br>ISSCC20 | Diddi<br>CSICS15 | Deng<br>ISSCC16 | Qian<br>JSSC21 | Zhang<br>ISSCC20 | Qi<br>ISSCC20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Architecture | DDRM | DDRM | DDRM | DDRM | DDRM | CDAC | DPA | DPA | DPA | DPA | Analog |
| Matching Network | Off-Chip | On-Chip | Off-Chip | Off-Chip | Off-Chip | On-Chip | Off-Chip | Off-Chip | On-Chip | On-Chip | On-Chip |
| Technology [nm] | 40 | 65 | 40 | 65 | 40 | 65 | 180 SOI | 40 | 40 | 55 | 28 |
| Frequency [GHz] | 1-3 | 1.4-3 | 0.9-3.1 | 0.9-5.2 | 0.5-3 | 2.2 | 0.9 | 2.4 | 2.3-3.5 | 0.85 | 1.4-2.7 |
| Peak Pout [dBm] | 19.6 @ 2.4 GHz | 22 | 9.2 | 15 | 18.2 | 13 | 31.6 | 27 | 23.6 | 29.3 | 3<sup>5</sup> |
| DC Power [mW] | 505 @ 2.4 GHz | 1350 | 146<sup>3</sup> | 900 | 540 | N.A. | 2140 | 2230 | 790 | 1980 | 70.5<sup>6</sup> |
| Peak η [%] | 18.1 | 11.7 | 9.2 | 5.7 | 12.2 | N.A. | 66 | 22.4 | 29 | 43 | N.A. |
| $f_{\rm LO}$ [GHz] | 2.4 | 2.2 | 3 | 2.4 | 2.4 | 2.2 | 0.9 | 2.4 | 3.3 | 2.4 | 2.5 |
| Bandwidth [MHz] | 20<sup>1</sup>/160<sup>2</sup> | 20 | 57 | 20 | 320 | 40 | 5 | 40 | 20 | 10 | 20 |
| Modulation Type | 256 QAM<sup>1</sup>/256 QAM<sup>2</sup> | 256 QAM | 64 QAM | 256 QAM | 256 QAM | 1024 QAM OFDM | WCDMA/LTE | 64 QAM OFDM | 64 QAM | 64 QAM | 64 QAM |
| ACLR1 [dBc] | -43<sup>1</sup>/-40.5<sup>2</sup> | -45 | -44 | -42 | -43 | <45<sup>4</sup> | -36 | <-40 | -30 | -32 | -45 |
| EVM [dB] | -33<sup>1</sup>/-33<sup>2</sup> | -40 | -30 | -42 | -32 | -42<sup>7</sup> | N.A. | <-30 | -29 | -25.6 | -37 |
| DPD [Y/N] | No | Yes | No | No | No | No | Yes | Yes | Yes | No | No |

<sup>1</sup> Dynamic biasing enabled; <sup>2</sup> dynamic biasing disabled; <sup>3</sup> LO generation power is not included; <sup>4</sup> estimated from figure; <sup>5</sup> average power; <sup>6</sup> not including baseband DAC; <sup>7</sup> measured at an output power of -3 dBm.

#### 6.8.4 Comparison to State-of-the-Art TXs

Table 6.1 summarizes the performance and compares this work to relevant DTXs and analog modulators. Among the DDRMs, the proposed signed DDRM achieves the highest system efficiency, although it is still below the DPA efficiency levels reported in the literature. Moreover, to the author's knowledge, the proposed signed DDRM is the first reported class-B DDRM with a doubled drain efficiency in the 6 dB PBO region. Besides its power efficiency, the proposed DDRM also presents superior linearity. Among the reported DDRMs, this DDRM supports a 160 MHz bandwidth with comparable linearity while providing a current-mode output. Although [10] shows better ACLR performance, its efficiency is much lower when using a limited signal bandwidth. Compared to the DPA-based common-gate PA drivers in [3] and [4], the proposed driver offers better spectral purity and a larger modulation/operational-RF bandwidth without an external modulator or any need for DPD.
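When reading Table 6.1, it can help to convert the logarithmic quantities back to linear ones. The sketch below does this for the "This Work" column. It is a minimal illustration whose helper names are hypothetical, and it assumes that system efficiency is simply RF output power divided by total DC power.

```python
def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def system_efficiency_percent(pout_dbm, dc_power_mw):
    """RF output power divided by total DC power, in percent."""
    return 100 * dbm_to_mw(pout_dbm) / dc_power_mw


def evm_db_to_percent(evm_db):
    """Convert an EVM in dB (20*log10 of the RMS error ratio) to percent."""
    return 100 * 10 ** (evm_db / 20)


# "This Work" entries from Table 6.1
pout_mw = dbm_to_mw(19.6)
eta_sys = system_efficiency_percent(19.6, 505)
evm_pct = evm_db_to_percent(-33)

print(f"Peak Pout: {pout_mw:.1f} mW")
print(f"System efficiency: {eta_sys:.1f} %")
print(f"EVM: {evm_pct:.2f} %")
```

The computed efficiency agrees with the 18.1 % peak efficiency listed in the table for 19.6 dBm output power and 505 mW DC power.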
### 6.9 Experimental Results with a CB BJT PA

The proposed DDRM is also measured with a CB SiGe PA fabricated by NXP Inc. The assembled micrograph is shown in Fig. 6.35. As discussed in Section 6.7, the die shown in Fig. 6.35 contains four pairs of push-pull CB PAs and an on-chip balun, which combines the output current and converts the differential signal into a single-ended representation. The bases of the transistors are biased at 2.9 V, and the collectors are biased at 7.3 V, since the breakdown voltage is about 12 V.

Figure 6.35: Combined micrograph of the SiGe-CMOS line-up.

First, CW measurements are performed, the results of which are shown in Fig. 6.36(a); the third and fifth harmonic rejection levels are shown in Fig. 6.36(b). The 3 dB bandwidth is 1.95-2.45 GHz, a fractional bandwidth larger than 20 %, and is mostly limited by the output balun. The maximum peak power is more than 27 dBm, with a system efficiency of 20 %. The main reason for the relatively low system efficiency is the low collector DC voltage (7.3 V). The measured third and fifth harmonic levels are better than -36 dBc and -50 dBc, respectively.

The TX line-up is also characterized using a broadband signal. The measured results for an 80 MHz single-carrier 64-QAM spectrum at 2.2 GHz are shown in Fig. 6.37(a). Its ACLR is better than -32 dBc while the EVM is better than -27 dB. The degraded linearity can be attributed partly to the non-ideal connection between the driver and the CB PA, as well as to imperfections in the output balun implementation. When bleeding currents are used (100 mA), the ACLR and EVM improve to -37 dBc and -30 dB, respectively (Fig. 6.37(b)).

Table 6.2 compares the performance of the CMOS driver and MMIC PA TX line-up with state-of-the-art line-ups. This demonstrator avoids the use of DPD completely.
The power efficiency of the proposed combination is lower than reported in other published state-of-the-art work, but this is mainly due to the relatively low collector voltage used for the CB BJT PA. If a higher voltage PA can be deployed, system efficiency will be dramatically increased. Nevertheless, the realized demonstrator still shows a decent linearity performance, while handling a large video bandwidth. As such, it demonstrates the potential to achieve high-power, high-linearity, high-efficiency operation in future implementations.

Figure 6.36: (a) Measured output power and efficiency vs. $f_{LO}$; (b) measured third and fifth harmonics vs. $f_{LO}$.

Table 6.2: Performance summary and comparison with state-of-the-art line-ups.

| Specification | This Work | Diddi<br>RFIC16 | Bootsman<br>IMS20 |
|---|---|---|---|
| Architecture | DDRM+PA | DPA+PA | DPA+PA |
| Connection to PA | Current | Current | Voltage |
| Technology: CMOS+MMIC | 40nm+BJT | 180nm SOI+GaN | 40nm+LDMOS |
| On Chip Modulator [Y/N] | Yes | No | No |
| Supply Voltage [V] | 1.1/2.5/7.3 | 1.5/4.1/15 | 1.1/2.5/20 |
| Frequency [GHz] | 2.2 | 0.9 | 2.1 |
| Peak Pout [dBm] | 27 | 34.5 | 43.7 |
| Peak Efficiency [%] | 20 | 50 | 62.6 |
| Bandwidth [MHz] | 80 | 5 | 10 |
| Modulation Type | 64 QAM | 16 QAM | 256 QAM |
| ACLR1 [dBc] | -32.3<sup>1</sup>/-37.7<sup>2</sup> | -36 | -46 |
| EVM [dB] | -27<sup>1</sup>/-30<sup>2</sup> | N.A. | -38 |
| DPD [Y/N] | No | Yes | Yes |

<sup>1</sup> Without bleeding current; <sup>2</sup> with bleeding current

### 6.10 Conclusion

A novel line-up architecture, including a signed DDRM as a driver and a CG/CB PA, is presented in this chapter. It features an advanced architecture with an auxiliary current division path, signed IQ-mapping, dynamic biasing, and an HR technique to boost efficiency and linearity.
The proposed standalone driver operates over a 1-3 GHz frequency range while generating 19.6 dBm peak RF power with 505 mW DC power consumption at 2.4 GHz. For a 160 MHz 256-QAM signal, the measured ACLR is better than -40.5 dBc. When combined with a CB SiGe PA, the peak output power is increased to 27 dBm. The proposed configuration can support an 80 MHz 64-QAM signal with an ACLR of -37.7 dBc and an EVM of -30 dB without using any DPD. The proposed DTX-CB/CG TX line-up concept can act as an enabler for energy-efficient, wideband DTX solutions that do not demand any DPD, for WIFI 6/7 or 5G cellular network applications.

Figure 6.37: Measured spectrum and constellation diagram of the 80 MHz 64-QAM signal (a) without bleeding current; (b) with 100 mA bleeding current.

### References

- [1] C. Mayer et al., "A direct-conversion transmitter for small-cell cellular base stations with integrated digital predistortion in 65nm CMOS," 2016 *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, San Francisco, CA, 2016, pp. 63-66.
- [2] D. J. McLaurin et al., "A highly reconfigurable 65nm CMOS RF-to-bits transceiver for full-band multicarrier TDD/FDD 2G/3G/4G/5G macro basestations," 2018 *IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2018, pp. 162-164.
- [3] V. Diddi, H. Gheidi, Y. Liu, J. Buckwalter and P. Asbeck, "A Watt-Class, High-Efficiency, Digitally-Modulated Polar Power Amplifier in SOI CMOS," 2015 *IEEE Compound Semiconductor Integrated Circuit Symposium (CSICS)*, New Orleans, LA, 2015, pp. 1-4.
- [4] V. Diddi, S. Sakata, S. Shinjo, V. Vorapipat, R. Eden and P. Asbeck, "Broadband digitally-controlled power amplifier based on CMOS/GaN combination," 2016 *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, San Francisco, CA, 2016, pp. 258-261.
- [5] S. Su and M. S.-W. Chen, "A Time-Approximation Filter for Direct RF Transmitter," in *IEEE Journal of Solid-State Circuits*.
- [6] S. Su and M. S. Chen, "10.2 A SAW-Less Direct-Digital RF Modulator with Tri-Level Time-Approximation Filter and Reconfigurable Dual-Band Delta-Sigma Modulation," 2020 *IEEE International Solid-State Circuits Conference (ISSCC)*, 2020, pp. 174-176.
- [7] M. Mehrpoo, M. Hashemi, Y. Shen, L. C. N. de Vreede and M. S. Alavi, "A Wideband Linear I/Q-Interleaving DDRM," in *IEEE Journal of Solid-State Circuits*, vol. 53, no. 5, pp. 1361-1373, May 2018.
- [8] N. Markulic, P. T. Renukaswamy, E. Martens, B. van Liempd, P. Wambacq and J. Craninckx, "A 5.5-GHz Background-Calibrated Subsampling Polar Transmitter With -41.3-dB EVM at 1024 QAM in 28-nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 54, no. 4, pp. 1059-1073, April 2019.
- [9] Y. Shen, R. Bootsman, M. S. Alavi and L. de Vreede, "A 0.5-3 GHz I/Q Interleaved Direct-Digital RF Modulator with up to 320 MHz Modulation Bandwidth in 40 nm CMOS," 2020 *IEEE Custom Integrated Circuits Conference (CICC)*, Boston, MA, USA, 2020, pp. 1-4.
- [10] E. Roverato et al., "All-Digital LTE SAW-Less Transmitter With DSP-Based Programming of RX-Band Noise," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 12, pp. 3434-3445, Dec. 2017.
- [11] S. Yoo, S. Hung, J. S. Walling, D. J. Allstot and S. Yoo, "10.7 A 0.26mm<sup>2</sup> DPD-Less Quadrature Digital Transmitter With <-40dB EVM Over >30dB Pout Range in 65nm CMOS," 2020 *IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, USA, 2020, pp. 184-186.
- [12] P. E. Paro Filho, M. Ingels, P. Wambacq and J. Craninckx, "9.3 A transmitter with 10b 128MS/S incremental-charge-based DAC achieving -155dBc/Hz out-of-band noise," 2015 *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, San Francisco, CA, 2015, pp. 1-3.
- [13] Y. Shen, R. Bootsman, M. S. Alavi and L. C. N. de Vreede, "A 1–3 GHz I/Q Interleaved Direct-Digital RF Modulator As A Driver for A Common-Gate PA in 40 nm CMOS," 2020 *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Los Angeles, CA, USA, 2020, pp. 287-29
- [14] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede and J. R. Long, "A Wideband 2× 13-bit All-Digital I/Q RF-DAC," in *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 4, pp. 732-752, April 2014.
- [15] Z. Deng et al., "9.5 A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40nm CMOS," 2016 *IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2016, pp. 172-173.
- [16] J. A. Weldon et al., "A 1.75-GHz highly integrated narrow-band CMOS transmitter with harmonic-rejection mixers," in *IEEE Journal of Solid-State Circuits*, vol. 36, no. 12, pp. 2003-2015, Dec. 2001.
- [17] B. Yang, E. Y. Chang, A. M. Niknejad, B. Nikolić and E. Alon, "A 65-nm CMOS I/Q RF Power DAC With 24- to 42-dB Third-Harmonic Cancellation and Up to 18-dB Mixed-Signal Filtering," in *IEEE Journal of Solid-State Circuits*, vol. 53, no. 4, pp. 1127-1138, April 2018.
- [18] C. Huang, Y. Chen, T. Zhang, V. Sathe and J. C. Rudell, "A 40nm CMOS single-ended switch-capacitor harmonic-rejection power amplifier for ZigBee applications," 2016 *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, San Francisco, CA, 2016, pp. 214-217.
- [19] M. Hashemi, L. Zhou, Y. Shen, M. Mehrpoo and L. de Vreede, "Highly efficient and linear class-E CMOS digital power amplifier using a compensated Marchand balun and circuit-level linearization achieving 67 % peak DE and -40dBc ACLR without DPD," 2017 *IEEE MTT-S International Microwave Symposium (IMS)*, Honolulu, HI, 2017, pp. 2025-2028.
- [20] C. Lin et al., "A 12 bit 2.9 GS/s DAC With IM3 $\ll$ -60 dBc Beyond 1 GHz in 65 nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 44, no. 12, pp. 3285-3293, Dec. 2009.
- [21] C. Lin et al., "A 16b 6GS/S nyquist DAC with IMD <-90dBc up to 1.9GHz in 16nm CMOS," 2018 *IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2018, pp. 360-362.
- [22] R. J. Bootsman et al., "An 18.5 W Fully-Digital Transmitter with 60.4 % Peak System Efficiency," 2020 *IEEE/MTT-S International Microwave Symposium (IMS)*, Los Angeles, CA, USA, 2020, pp. 1113-1116.

## Conclusion

This dissertation focuses on the analysis, design, and implementation of digital-intensive up-converters, which are the core elements of modern communication TXs. In the first part, a fully integrated digital-intensive polar Doherty TX is introduced together with a wideband phase modulator. In the second part, two different versions of a current-mode DDRM are proposed with the aim of achieving high linearity over a large modulation bandwidth. All the realized TX modulators/TX line-ups in this dissertation demonstrate better performance (efficiency/linearity) than prior-art implementations.

This final chapter presents the most important findings of the dissertation: a summary of the technical findings of this work in Section 7.1, followed by some personal findings/experiences in Section 7.2. Section 7.3 concludes with suggestions and recommendations for future DTX research work.

### 7.1 Thesis Outcome

Chapter 3 reports the first-ever realized "Bits-In RF-Out" Doherty polar DTX. This single-chip DTX includes the whole DTX chain, ranging from the digital baseband and wideband phase modulator to the Doherty DPA. In this design, a QLI class-E topology is employed in combination with a compact Doherty power combiner/output matching network to achieve high average efficiency. The digital baseband includes a CORDIC, normalizers, and LUTs to apply static ACW-AM and ACW-PM-based DPD. The realized polar DTX achieves a peak power of +21.4 dBm, with drain efficiency at peak power and 6 dB PBO reaching 49.4 % and 33.7 %, respectively.
Without applying any external DPD, the DTX achieves an EVM of -31 dB for a 40 MHz 64-QAM signal with an average drain efficiency of 25 %, while satisfying the 802.11ac spectral mask.

Chapter 4 details a novel wideband frequency-agile phase modulator, which is used in the polar DTX of Chapter 3. An HR-based phase modulator is employed to suppress C-IMD products caused by the non-linearities of the RFDAC, as well as odd harmonics, over a wide frequency range. The proposed phase modulator operates over the frequency range of 0.6-2.5 GHz and achieves -45 dBc LO leakage and -58 dBc IQ image at $f_{\rm LO}$ = 2.5 GHz. The phase modulator can support an 80 MHz 64-QAM signal in a polar TX at 2.4 GHz, with an output emission better than -40 dBc and an EVM better than -27 dB.

Chapters 3 and 4 focus on designing energy-efficient DTXs while maintaining linearity through on-chip DPD. The subsequent chapters aim to improve the efficiency of the highly linear driver/line-up.

Chapter 5 concentrates on designing a DDRM as a driver for an external CS/CE PA. A wideband DDRM with an IQ-mapping technique is presented. It features a novel IQ-mapping unit cell to boost the RF output power (efficiency) and in-band linearity. The proposed quadrature RF modulator operates over a 0.5-3 GHz frequency range while generating +14.1 dBm peak RF output power with a DC power consumption of only 340 mW at 2 GHz. With more than 5 dBm of average output power, the ACLR is better than -43 dBc when applied to a 320 MHz 256-QAM signal.

In Chapter 6, a current-mode TX line-up configuration is proposed, which combines an I/Q DDRM CMOS driver with a CG/CB PA. Here the I/Q DDRM features an advanced architecture with an auxiliary current division path, signed IQ-mapping, dynamic biasing, and class-B HR to boost overall efficiency and linearity. The I/Q DDRM generates +19.6 dBm peak RF power with 505 mW DC power consumption at 2.4 GHz.
For a 160 MHz 256-QAM signal, the measured ACLR of the standalone driver is better than -40.5 dBc, with an EVM of -33 dB, without using any DPD. When connected to the CB SiGe PA, the measured uncorrected ACLR for an 80 MHz 64-QAM signal is -32 dBc without and -37 dBc with bleeding current.

### 7.2 Personal Experiences and Contributions to Other DTXs

DTX design demands a wide range of expertise, is extremely labor-intensive, and is also an expensive hobby. Given this, although obtaining a Ph.D. degree is often related to individual achievements, DTX development demands good team spirit to meet industry expectations and specifications. During the course of this thesis work, it became clear that a DTX can only exist with good communication and cooperation. In this setting, the author had the honor of interacting with, collaborating on, and contributing to "parallel" DTX developments. Some of the outcomes are shown in Fig. 7.1, illustrating each DTX with its own unique feature(s). Although that work falls outside the scope of this dissertation, the author thanks his colleagues and supervisors for all these inspiring collaboration efforts. The related chip area is more than 40 mm<sup>2</sup> in total, and this number includes some designs not yet published. The chips related to the work in this dissertation (headed by the author) occupy more than 18 mm<sup>2</sup>.

Figure 7.1: Chip Gallery (only published works).

- Doherty Polar DTX & Phase Modulator; RFIC17, SiRF18; Chip Lead: Y. Shen
- Unsigned/Signed DDRMs & Line-ups; CICC20, RFIC20; Chip Lead: Y. Shen
- Non-Linear Sizing DPAs; ISSCC17, JSSC17, TMTT19; Chip Lead: M. Hashemi
- IQ-Interleaving DDRM; RFIC17, JSSC18; Chip Lead: M. Mehrpoo
- Digital-Intensive CMOS-LDMOS Line-ups; IMS20, EuMW20; Chip Lead: R. Bootsman & D. Mul
- 4-way Doherty Cartesian DTX; ISSCC21; Chip Lead: M. Beikmirza
- 30GHz 4-way Doherty DTXs; CICC21; Chip Lead: M. Mortezavi
- To be continued
### 7.3 Suggestions for Future Developments

The research presented in this thesis introduces several digital-intensive up-converter techniques that aim to improve efficiency and linearity simultaneously. The effectiveness of the proposed concepts has been demonstrated through various hardware realizations that achieve state-of-the-art performance. These encouraging results not only confirm existing insights but also provide new ideas and opportunities for future research activities. We address some of those below:

- Generally speaking, conventional CMOS technologies are not suitable for generating truly large output power (>30 dBm) due to their low breakdown voltages and lossy substrate. To realize high-power DTX operation, and make the DTX more appealing to a larger group of wireless applications, DTX designs also need to make use of RF power technologies such as GaN and LDMOS. Initial works in this direction with GaN or LDMOS power dies have recently been published ([1]-[3]). However, GaN and LDMOS processes by themselves cannot support large-scale DSP circuitry as CMOS does, since they cannot offer a comparable integration level. Consequently, to create meaningful power DTX implementations, they must be implemented as a system in a package (SiP), using a separate CMOS controller and an LDMOS or GaN power die. In future implementations, they could even be implemented as a single chip (e.g., GaN-on-silicon), in which the "silicon" should offer full CMOS functionality. At present, many technological innovations in GaN/LDMOS as well as CMOS technology are still needed to make them truly compatible on a single wafer. Despite the challenge, this might be the road to success and a complete victory for power DTX implementations.
- In this dissertation, a symmetric Doherty PA with an efficiency enhancement at 6 dB PBO is proposed. However, in modern telecommunication systems, the typical PAPR value is 9-12 dB, since OFDM signals are employed.
Therefore, the efficiency enhancement should be shifted to the deep PBO region (>12 dB). Several researchers are already exploring this possibility (e.g., [4]). Although efficiency can be enhanced at such a deep PBO, it turns out that in most practical implementations, both bandwidth and linearity tend to degrade. Furthermore, correcting their linearity with DPD has proven to be difficult, especially when operating with large video bandwidths (e.g., 120 MHz in [4]). In these cases, DPD yields poor OOB noise performance, so additional filtering is needed ([5]). Therefore, future research should focus on how to improve linearity, modulation bandwidth, and efficiency in the deep PBO region simultaneously. Also, low-cost on-chip DPD implementations with limited power consumption are urgently needed.
- As discussed in this dissertation, compared to Cartesian DTXs, polar DTXs are easier to pre-distort due to the absence of IQ interaction. However, the bandwidth of the phase modulator in such a polar system limits the overall bandwidth of the transmitter. In [6], an open-loop phase modulator is proposed that can support a 160 MHz bandwidth signal while consuming 70 mW DC power. Future research can be directed to the design of wideband phase modulators with even lower power consumption.
- This dissertation proposes unsigned/signed DDRMs with large video bandwidth for use below 3 GHz. Designing these DDRMs for the 3-6 GHz range with comparable linearity and efficiency is challenging due to the always-present device and circuit parasitics (e.g., [7]). Additionally, in millimeter-wave design, conventional analog TX line-ups are exclusively used. Developing DDRMs for these frequency bands that are more energy- and cost-effective than their analog-oriented counterparts, while offering higher functionality, could prove very attractive for emerging fields such as 5G communication systems.
- This dissertation proposes a DDRM with a CG/CB PA output stage configuration and has demonstrated its performance with a CB PA using a "low-power" SiGe BJT technology. Compared to the GaN CG PA in [1], the output power of the proposed demonstrator is limited by the relatively low breakdown voltage of the BJT device. Consequently, to take full advantage of the techniques proposed in this dissertation, more design effort should be focused on the implementation of a high-power CG/CB MMIC PA.

### References

- [1] V. Diddi, S. Sakata, S. Shinjo, V. Vorapipat, R. Eden and P. Asbeck, "Broadband digitally-controlled power amplifier based on CMOS/GaN combination," 2016 *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, 2016, pp. 258-261.
- [2] R. J. Bootsman et al., "An 18.5 W Fully-Digital Transmitter with 60.4 % Peak System Efficiency," 2020 *IEEE/MTT-S International Microwave Symposium (IMS)*, 2020, pp. 1113-1116.
- [3] D. P. N. Mul et al., "Efficiency and Linearity of Digital "Class-C Like" Transmitters," 2020 *50th European Microwave Conference (EuMC)*, 2021, pp. 1-4.
- [4] M. Beikmirza et al., "6.2 A 4-Way Doherty Digital Transmitter Featuring 50 %-LO Signed IQ Interleave Upconversion with more than 27dBm Peak Power and 40 % Drain Efficiency at 10 dB Power Back-Off Operating in the 5GHz Band," 2021 *IEEE International Solid-State Circuits Conference (ISSCC)*, 2021.
- [5] Steve Cripps, *RF Power Amplifiers for Wireless Communications*, Second Edition, Artech, 2006.
- [6] A. Ben-Bassat et al., "A Fully Integrated 27-dBm Dual-Band All-Digital Polar Transmitter Supporting 160 MHz for Wi-Fi 6 Applications," in *IEEE Journal of Solid-State Circuits*, vol. 55, no. 12, pp. 3414-3425, Dec. 2020.
- [7] B. Zheng, L. Jie and M. P. Flynn, "A 6-GHz MU-MIMO Eight-Element Direct Digital Beamforming TX Utilizing FIR H-Bridge DAC," in *IEEE Transactions on Microwave Theory and Techniques*.

| 1.1 | Evolution of cellular mobile communication standards.
[1] |
|------|------|
| 1.2 | Roadmap of Moore's law. [2] |
| 1.3 | Saturated output power of different semiconductor device technologies vs. frequency. [3] |
| 1.4 | Side view of typical CMOS devices. [4] |
| 1.5 | Front and rear mainboard of an iPhone-12 (Courtesy of Apple Inc.) |
| 2.1 | Drain efficiency of an ideal class-A and class-B power amplifier versus PBO level, relative to the peak power condition at 0 dB |
| 2.2 | Block diagram of an analog intensive Cartesian TX |
| 2.3 | Block diagram of a polar TX line-up |
| 2.4 | (a) Spectrum of IQ signal, AM and PM signal in a polar TX; (b) degradation of ACLR and EVM vs. AM/PM delay mismatch in a polar TX with 10 MHz 64-QAM signal |
| 2.5 | Supply modulator: (a) linear regulator; (b) switching regulator |
| 2.6 | (a) Block diagram of envelope tracking system; (b) time domain waveform of AM signal [3] |
| 2.7 | Equivalent model of a DPA in polar DTXs |
| 2.8 | Block diagram of a polar DTX with direct AM modulation |
| 2.9 | Efficiency versus PBO in a typical polar TX |
| 2.10 | (a) Block diagram of a DPA-based Cartesian DTX and (b) its constellation diagram. [10] |
| 2.11 | Equivalent model of DPA-based Cartesian DTXs |
| 2.12 | Typical schematic of a (a) baseband DAC and (b) active up-mixer |
| 2.13 | DDRM proposed in [12] |
| 2.14 | Efficiency of a class-AB PA and the PDFs of a LTE signal and a WLAN signal vs. PBO level. [13] |
| 2.15 | (a) Top-level topology of the class-G PA; (b) Drain and average efficiency comparisons for a PA with dual supply versus a class-AB PA. [14] |
| | (a) Basic Doherty PA; (b) equivalent schematic of the Doherty PA |
| | Load impedance (a) and voltage swing (b) of the main PA and peak PA of a Doherty PA; and (c) efficiency curve vs. PBO level |
| 3.1 | Conceptual block diagram of a polar DTX |
| 3.2 | |
| 3.3 | (a) Behavioral model of a Doherty DPA when assuming QLI class-E PA operation; (b) related simulated efficiency. |
| 3.4 | (a) Principle schematic of a single-ended QLI class-E DTX; (b) drain voltage for different activation levels of the unit cells |
| 3.5 | (a) Schematic of push-pull class-E DPA; (b) single-ended LTI model |
| 3.6 | Simulated and calculated (a) ACW-AM curve and (b) ACW-PM curve for a standalone DPA, where the parameters in the calculation are extracted from DC simulations |
| 3.7 | Doherty DPA model (a) below 6 dB PBO region and (b) beyond 6 dB PBO region. |
| 3.8 | Simulated and calculated (a) ACW-AM curve and (b) ACW-PM curve for a Doherty DPA, where the parameters in the calculation are extracted from DC simulations |
| 3.9 | Block diagram of the proposed DTX |
| 3.10 | Top-level schematic of the output stage |
| | Floorplan of DPA array in one branch |
| 3.12 | |
| | (a) Schematic of DFF; (b) schematic of latch |
| | 3D layout of output matching network |
| 3.15 | Simulated (a) Q factor and (b) effective inductance of the primary and secondary loop, respectively; simulated (c) coupling factor and (d) insertion loss, respectively. |
| 3.16 | Simulation results of (a) passive efficiency using an EM-simulator for the power combiner network; (b) drain efficiency of the complete Doherty DPA |
| 3.17 | Load-pull simulation results of the Doherty DPA at (a) peak power condition (ACW=4095) and (b) 6 dB PBO operation (ACW=2048) |
| | Conceptual block diagram of the digital baseband block |
| | Conceptual block diagram of LO generation circuitry. |
| 3.20 | Comparison of different types of dividers in terms of operating frequency and power in a 40 nm bulk CMOS technology [31] |
| 3.21 | Schematic of (a) $C^2MOS$ fully differential divide-by-2 divider; (b) $C^2MOS$ fully-differential DFF |
| 3.22 | Chip micrograph of fabricated polar Doherty DTX |
| | Diagram of the measurement setup used for the polar Doherty DTX characterization. |
| | Measured (a) drain efficiency and RF output power vs. $f_{LO}$; (b) drain efficiency vs. RF output power at $f_{\text{LO}}$=2.5 GHz |
| 3.25 | Measured phase noise and integrated jitter at (a) $f_{\rm LO}$=2.4 GHz at the output of the single branch; (b) input LO signal at 9.6 GHz |
| 3.26 | Measured IQ image and LO leakage of phase modulator (a) before and (b) after calibration |
| 3.27 | Measured (a) ACW-AM and (b) ACW-PM characteristic |
| | Measured spectrum and constellation diagram of 20 MHz (a) and 40 MHz (b) 64-QAM signals |
| 4.1 | (a) Top-level conceptual diagram of a typical CP-PLL (type 2); (b) small-signal model of the CP-PLL in the phase domain |
| 4.2 | Conceptual diagram of (a) the direct modulation; (b) the two-point modulation |
| 4.3 | Open-loop phase modulator
with a coarse tapped delay line with (a) a fine digitally-controlled delay and (b) a digital $\Delta\Sigma$ modulator | 61<br>62 | | 4.4 | <ul><li>(a) Replacing BPF with RC-LPF in a conventional Cartesian-based phase modulator;</li><li>(b) principle of C-IMD explained using single-sideband modulation</li></ul> | 64 | | 4.5<br>4.6 | Static phase error for various RC-LPFs | 66<br>67 | | 4.7 | the fundamental frequency and the third and fifth harmonics | 69 | | 4.8 | (a) Conceptual schematic of an RFDAC unit cell; (b) schematic of an RFDAC unit cell with dummy switches | 70 | | 4.9 | Floorplan of one single RFDAC | 70 | | 4.10 | | 71 | | 4.11 | Schematic of (a) analog gain stages and (b) buffer chain | 72 | | 4.12 | Micrograph of the phase modulator | 73 | | 4.13 | Measured (a) LO leakage and IQ image vs. $f_{LO}$ , (b) constellation diagram for 128 phases at $f_{LO}$ = 2.5 GHz | 73 | | 4.14 | Measured (a) output phase and (b) static phase error vs. input phase at $f_{LO}$ = 2.5GHz, which equals $0.72^{\circ}$ (R.M.S value) | 73 | | 4.15 | Measured spectrum of (a) an 8 Mb/s GFSK signal at $f_{LO}=1$ GHz; (b) an 80 Mb/s GFSK signal at $f_{LO}=2.5$ GHz | 74 | | 4.16 | Measured spectrum of a 75 Mb/s GMSK signal at $f_{\rm LO}=2.4$ GHz, with a trellis diagram | 75 | | 4.17 | Measured spectrum and constellation diagram of an 80 MHz 64-QAM signal at $f_{\rm LO}{=}2.4$ GHz | 75 | | 5.1 | Typical block diagram of (a) an analog modulator; (b) a DDRM | 80 | | 5.2 | Conceptual diagram of the conventional DDRM in (a) [7] and (b) [9] | 81 | | 5.3 | Simplified universal modulator model for a single-branch mixing DAC or DPA. $$ . $$ . 
| 82 | | 5.4<br>5.5 | Simplified universal modulator model for a Cartesian-based DDRM or DPA Resulting constellation diagram of the behavioral simulation in a DDRM (a) with | 83 | | | 25~% LO; (b) with $50~%$ LO and DPA (c) with $25~%$ LO; (d) with $50~%$ LO | 85 | | 5.6 | (a) Reported output power and efficiency for existing DDRMs and DPA-based TXs; | |------|--------------------------------------------------------------------------------------------------------------------| | | (b) normalized drain efficiency versus PBO level for the DDRM and DPA 85 | | 5.7 | (a) Typical spectrum of conventional DDRM architectures; (b) transient waveform | | | and corresponding constellation diagram of one IQ pair typically used in conventional | | | DDRMs | | 5.8 | MC simulation results of (a) gain mismatch; (b) phase mismatch | | 5.9 | Conceptual diagram of the IQ-interleaving DDRM in [12] | | 5.10 | (a) Conceptual diagram of the proposed IQ-mapping DDRMs and (b) its spectrum. 91 | | 5.1 | Principle of intrinsic image rejection within unit cells | | 5.13 | 2 Simulated spectra in behavior MC simulation with single-sideband signals: (a) | | | conventional DDRMs; (b) proposed IQ-mapping DDRMs. Simulated spectra with | | | the whole multi-carrier signal and one signal channel: (c) conventional DDRMs; (d) | | | proposed IQ-mapping DDRMs | | | 3 Systematic block diagram of the proposed DDRM with the IQ-mapping unit cell 94 | | | Detailed unit-cell schematic of the proposed DDRM | | 5.15 | 5 Two possible implementations of phase multiplexer: (a) hybrid mode; (b) voltage mode. 
96 | | 5.10 | Floorplan of the proposed DDRM | | 5.1' | 7 Top-level layout of LO tree | | 5.18 | B Layout of output current combiner | | 5.19 | Block diagram of local data decoder | | 5.20 | Block diagram of LO clock generation circuits | | 5.2 | Schematic of (a) $C^2MOS$ divider; (b) LO level shifter | | 5.25 | 2 Comparison of ZOH, FOH, and SOH interpolations: (a) wideband spectrum with | | | various interpolation filters; (b) magnitude responses; (c) in-band magnitude responses; | | | (d) magnitude response at the first sampling replica | | 5.23 | 3 Simulated EVM with (a) correct sampling time and (b) incorrect sampling time in | | | behavior simulations, respectively | | 5.24 | Block diagram of DEM and thermometer encoder | | 5.25 | 5 Chip micrograph of the proposed DDRM | | 5.20 | 6 Measured peak $P_{\mathrm{OUT}}$ and $P_{\mathrm{DC}}$ (with 50 $\Omega$ and 12 $\Omega$ ) vs. $f_{\mathrm{LO}}$ | | 5.2' | 7 Measured LO leakage, IQ image, and IM3 level vs. $f_{\rm LO}$ (with DEM disabled); and | | | measured spectrum the (b) single-tone test, the (c) two-tone test when DEM is | | | disabled, and (d) the two-tone test when DEM is enabled | | 5.28 | 8 (a) Measured spectrum and EVM of a 20 MHz 256-QAM signal; (b) measured broad | | | span spectrum of a 144 MHz 64-QAM signal at 2.2 GHz | | 5.29 | • • • • • • • • • • • • • • • • • • • • | | | measured spectrum and EVM of a 320 MHz 256-QAM signal at 2.4 GHz 109 | | 5.30 | (a) Linearity performance vs. modulation bandwidth at 2.4 GHz; (b) ACLR perfor- | | | mance vs. 
center frequency $f_{LO}$ while the modulation bandwidth is 10 MHz 110 | | 5.31 | (a) Measured spectrum of a single-channel carrier aggregated signal at 2 GHz; (b) measured spectrum of an 11-channel carrier aggregated signal at 2.2 GHz with constellation diagram results from the weakest channel | |------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 6.1 | (a) Conceptual diagram of a common source (CS) output stage; (b) corresponding | | | voltage/current waveforms | | 6.2 | Typical I-V curve in a CS-connected transistor | | 6.3 | Conceptual diagram of a DDRM driving a CG PA | | 6.4 | Ideal current curve:(a) class-A type; (b) class-B type | | 6.5 | Conceptual diagram of driving a CG PA by (a) a DPA and (b) a DDRM 119 | | 6.6 | (a) Conventional DDRM with division paths; (b) its unit cell up-converted current waveform | | 6.7 | Settling behavior of output current with a load of (a) 3 $\Omega$ and (b) 12 $\Omega$ | | 6.8 | Proposed top-level schematic of the DDRM | | 6.9 | Different combinations with their corresponding waveforms within a single unit cell. 121 | | 6.10 | (a) I/Q complementary decoding scheme; (b) constellation diagram transformation. 123 | | 6.11 | (a) DNL pattern of MSB in the proposed decoder pattern; (b) simulated image | | | rejection ratio distribution with $\sigma = 1$ LSB with and without the signed IQ-mapping | | | technique | | 6.12 | Detailed topology of a unit cell | | 6.13 | Detailed topology of the mixing unit | | 6.14 | (a) $\eta_{Drain}$ vs. 
PBO region in a conventional DDRM and the proposed signed DDRM; | | | (b) signal-dependent distortion introduced by a fluctuating DC level | | 6.15 | Concept of dynamic biasing technique | | | Topology of current source switch in unit cells | | 6.17 | Principle of the class-A HR technique: (a) block diagram; (b) timing diagram; (c) waveform with the summation of I and Q signals | | 6.18 | Principle of the class-B HR technique: (a) block diagram; timing diagram; (c) | | | spectrum; (d) push-pull output to cancel even harmonics | | 6.19 | Harmonic rejection levels with mismatch for different amplitudes and phase mismatch: | | | (a) for the third harmonic; (b) for the fifth harmonic | | 6.20 | Harmonic rejection contours with duty-cycle mismatch: (a) for the third harmonic; | | | (b) for the fifth harmonic with an amplitude error | | | Floorplan of the proposed dual RFDACs | | | Topology of the mixing core in the unit cell of the 25 $\%$ RFDAC | | | Timing diagram of the dynamic biasing in the 25 $\%$ RFDAC | | | Top-level architecture of proposed DDRM | | | Topology of the LO generation circuitry | | 6.26 | (a) Schematic of the first stage divider; (b) schematic of the CML latch 137 | | 6.27 | Compensated Marchand balun with second harmonic termination implemented by a | |------|------------------------------------------------------------------------------------------| | | via, and the measured and simulated differential-to-single-ended transmission loss in | | | [19] | | 6.28 | Connection between CMOS driver and CG/CB PA | | 6.29 | Conceptual diagram of the CB PA with bleeding current sources within CMOS driver. 140 | | 6.30 | Micrograph of the proposed DDRM | | 6.31 | Conceptual diagram of the measurement setup | | 6.32 | CW measurement results: (a) output power and harmonic rejection level vs. 
LO | | | frequency; (b) drain efficiency and system efficiency with/without dynamic biasing; | | | (c) power break-up in the peak power region | | 6.33 | Measurement result of (a) single-tone and (b) two-tone tests vs. $f_{LO}$ | | 6.34 | Measured spectrum and constellation diagram of (a) a 20 MHz 256-QAM signal | | | with dynamic biasing enabled; (b) an 80 MHz 256-QAM signal with dynamic biasing | | | disabled | | 6.35 | Combined micrograph of the SiGe-CMOS line-up | | 6.36 | (a) Measured output power and efficiency vs. $f_{\rm LO}$ ; (b) measured third and fifth | | | harmonics vs. $f_{LO}$ | | 6.37 | Measured spectrum and constellation diagram of the 80 MHz 64-QAM signal (a) | | | without bleeding current; (b) with 100 mA bleeding current | | 7.1 | Chip Gallery (only published works) | ## List of Tables | 3.1 | Performance summary and comparison with state-of-the-art DPAs with various | |-----|------------------------------------------------------------------------------------| | | efficiency enhancement techniques | | 3.2 | Performance summary and comparison with state-of-the-art polar DTXs 54 | | | | | 4.1 | Performance summary and comparison with state-of-the-art phase modulators 76 | | | | | 5.1 | Performance summary and comparison with state-of-the-art DTXs | | | | | 6.1 | Performance summary and comparison with state-of-the-art DDRMs/DTXs/PA drivers 145 | | 6.2 | Performance summary and comparison with state-of-the-art line-ups | List of Tables ## Summary This thesis focuses on digital-intensive up-converters for sub-6GHz wireless communication. Nowadays, wireless cellular communication is entering its 5<sup>th</sup> generation (5G), driven by the demand for faster mobile access and higher data throughput. 5G utilizes larger modulation bandwidths, higher-order modulations, and (many) more transmitters and receivers than its precessors, requiring higher system efficiency, flexibility, and integration of the transmitter (TX). 
An essential building block in the TX system is the RF modulator that converts the baseband data to an RF signal. New modulator architectures and circuits are required to handle the increased 5G modulation bandwidths linearly and energy-efficiently.

Along with the progress in wireless communication, nano-scale CMOS technologies are advancing toward their physical limitations. Transistors have become smaller and better suited to digital signal processing (DSP). Moreover, their high-frequency performance has improved, enabling RF analog/mixed-signal circuits. These improvements offer digital-intensive transmitters (DTXs) the opportunity to enter a territory that has been the traditional stronghold of analog-intensive TXs. Consequently, the research question of this dissertation is "What if we change the nature of the RF front-end, such that we can start truly benefiting from the power of CMOS in "digital" (switching) operations?" To answer this question, this thesis proposes new digital-intensive TX line-ups and up-converter architectures with enhanced linearity, bandwidth, and power efficiency.

**Chapter 1** provides a brief overview of the evolution of modern communication standards and nano-scale CMOS technology. It shows that with advanced CMOS technology, digital-intensive solutions are gaining popularity over their conventional analog-intensive counterparts.

**Chapter 2** gives an overview of conventional analog transmitters with their power amplifiers (PAs) and their evolution towards digital-intensive transmitter (DTX) line-ups. Several existing DTX architectures are discussed. Also, efficiency enhancement techniques are introduced to improve the wireless system's average efficiency when operating with complex modulated signals, like quadrature amplitude modulation (QAM) or orthogonal frequency division multiplexing (OFDM).
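
To illustrate why such efficiency enhancement techniques matter, the short sketch below evaluates the textbook drain-efficiency expressions of ideal class-A and class-B stages versus power back-off (PBO). It is a minimal illustration under idealized assumptions (no knee voltage, no losses); the function names are hypothetical and are not taken from this thesis.

```python
import math


def class_a_efficiency(pbo_db: float) -> float:
    """Ideal class-A drain efficiency at a given power back-off in dB.

    The DC power is constant, so the efficiency scales with output power:
    50 % at peak power, dropping by 10x per 10 dB of back-off.
    """
    if pbo_db < 0:
        raise ValueError("power back-off must be >= 0 dB")
    return 0.5 * 10 ** (-pbo_db / 10)


def class_b_efficiency(pbo_db: float) -> float:
    """Ideal class-B drain efficiency at a given power back-off in dB.

    The DC current follows the output amplitude, so the efficiency scales
    with output voltage: pi/4 at peak power, dropping by 10x per 20 dB.
    """
    if pbo_db < 0:
        raise ValueError("power back-off must be >= 0 dB")
    return (math.pi / 4) * 10 ** (-pbo_db / 20)


if __name__ == "__main__":
    # Illustrative back-off levels only; not measured data.
    for pbo in (0.0, 3.0, 6.0, 10.0):
        eta_a = 100 * class_a_efficiency(pbo)
        eta_b = 100 * class_b_efficiency(pbo)
        print(f"PBO = {pbo:4.1f} dB: class-A {eta_a:5.1f} %, class-B {eta_b:5.1f} %")
```

Even the ideal class-B stage loses roughly half of its peak efficiency at 6 dB back-off, which is why techniques that keep the efficiency high in back-off are needed for QAM and OFDM signals.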

**Chapter 3** proposes a fully-integrated digital-intensive polar class-E Doherty transmitter line-up featuring segmentation of its RF output stages. A comprehensive analysis of the segmented switched-mode class-E output stages with their ACW-AM and ACW-PM distortion is given. Next, the analysis is extended to cover the behavior of the proposed Doherty configuration. The realized Doherty digital PA (DPA) achieves a measured drain efficiency of 49.4 % at peak power and 33.7 % at 6 dB power back-off (PBO). With on-chip digital pre-distortion (DPD), it can support 40 MHz 64-QAM signals at an operating frequency of 2.4 GHz while achieving an average drain efficiency of 25 %.

**Chapter 4** concentrates on the design of a wideband phase modulator for polar DTXs. The bandwidth expansion in polar TXs can amount to three to five times the original modulation bandwidth, posing severe design challenges. A bandwidth constraint or phase error in such a phase modulator directly restricts the spectral purity of the output signal. In these phase modulators, a common source of phase distortion is the presence of counter-intermodulation (C-IMD) products. In the proposed wideband phase modulator, harmonic rejection (HR) techniques are deployed to suppress these undesired mixing products and thus enhance the phase linearity. The proposed phase modulator uses a radio-frequency digital-to-analog converter (RFDAC)-based Cartesian architecture employing current steering to enlarge its video bandwidth. Measurement results show that this phase modulator can support 80 MHz of modulation bandwidth with an error-vector magnitude (EVM) better than -27 dB, making it an excellent candidate for realizing (future) wideband polar DTXs.

**Chapter 5** proposes another up-converter architecture: a wideband direct-digital RF modulator (DDRM). First, a comprehensive comparison between DPA-based Cartesian DTXs and DDRMs is made.
This analysis shows the advantages and disadvantages of these two architectures. Following that, we propose an advanced IQ-mapping technique for unsigned DDRM operation, which offers both efficiency and linearity advantages over conventional DDRM implementations. The high bandwidth of the proposed DDRM architecture is demonstrated by supporting a 320 MHz transmit signal without DPD.

**Chapter 6** introduces a new TX line-up based on a signed DDRM that drives a common-gate (CG)/common-base (CB) output stage in pure current-mode operation. This approach breaks the fundamental trade-off between power efficiency and linearity in analog-intensive and digital-intensive TX line-ups. In this approach, the waveform clipping needed for improving the CG/CB PA efficiency is implemented by adding an auxiliary current division path in the DDRM unit cells. Furthermore, signed IQ-mapping, dynamic biasing, and class-B-like HR techniques are applied to the proposed driver to boost the efficiency and linearity of the TX line-up. Several considerations are included in the design and layout of the CG/CB PA to adapt it to the proposed CMOS driver. The experimental results show that the proposed driver can operate over a 1-3 GHz frequency range while generating 19.6 dBm peak RF power using 505 mW DC power at 2.4 GHz. For a 160 MHz 256-QAM signal, the measured adjacent-channel leakage ratio (ACLR) is better than -40.5 dBc. When connected to a CB SiGe PA, the peak output power is about 27 dBm with a system efficiency of 20 %. When transmitting an 80 MHz 64-QAM signal at 2.2 GHz without and with the extra division path, the measured ACLR is -32.3 and -37.7 dBc, respectively, with an EVM of -27 and -30 dB.

**Chapter 7** summarizes the most important conclusions of this thesis and provides recommendations for future research on up-converters and TX line-ups.

## Samenvatting

This thesis focuses on digital-intensive modulators for wireless communication below 6 GHz.
At present, mobile communication is entering its 5th generation, a development driven by the demand for even faster response times and higher data rates. 5G uses larger modulation bandwidths, higher-order modulations, and many more transmitters and receivers than its predecessors. This requires higher system efficiency, flexibility, and integration of the transmitter chain (TX). An important building block in a transmitter chain is the radio-frequency (RF) modulator, which converts the baseband data into a high-frequency RF signal. New modulator architectures and circuits are needed to process the 5G wideband signals energy-efficiently and with good quality.

Along with the progress in wireless communication, the dimensions of CMOS technology are reaching their lithographic limit. Transistors keep getting smaller and thereby better suited to digital signal processing (DSP). Moreover, their high-frequency properties have improved, enabling radio-frequency (RF) analog or mixed analog-digital circuits. These improvements offer digital-intensive transmitters (DTXs) an opportunity to enter the traditional analog stronghold of communication transmitters. The research question of this dissertation is therefore: "What if we change the nature of an RF front-end, so that we can truly start benefiting from the power of CMOS in "digital" (switching) operations?" This dissertation introduces new digital-intensive transmitter (TX) techniques and modulators with improved linearity, bandwidth, and energy efficiency to answer this question.

**Chapter 1** gives a concise overview of the evolution of modern communication and CMOS technology. It shows that, with the introduction of advanced CMOS technology, digital-intensive solutions are gaining popularity over their conventional analog-intensive counterparts.

**Chapter 2** gives an overview of conventional analog transmitters with their power amplifiers (PAs) and their evolution toward digital-intensive configurations. Several DTX architectures are discussed. Efficiency enhancement techniques are also introduced to improve the average efficiency of a wireless system for complex modulated signals.

**Chapter 3** introduces a fully integrated digital-intensive polar class-E Doherty configuration that exploits segmentation of its RF output stages. An analysis of these segmented class-E output stages with their ACW-AM and ACW-PM distortion is given. This analysis is then extended to describe the behavior in a Doherty configuration. The realized digital Doherty PA (DPA) achieves a drain efficiency of 49.4 % at peak power and 33.7 % at 6 dB power back-off (PBO), respectively. The on-chip digital pre-distortion (DPD) functionality can support 40 MHz 64-QAM signals at a transmit frequency of 2.4 GHz, with an average drain efficiency of 25 %.

**Chapter 4** concentrates on the design of a wideband phase modulator for a polar DTX. The bandwidth expansion in such a TX can amount to 3 to 5 times the original modulation bandwidth, which brings considerable design challenges. A bandwidth limitation or phase error will directly limit the spectral purity of the output signal. A common source of phase distortion in these modulators is the presence of counter-intermodulation products (C-IMDs). In the proposed wideband phase modulator, harmonic rejection (HR) is deployed to eliminate these undesired mixing products and thus improve the phase linearity. The introduced phase modulator uses a Cartesian RFDAC architecture with current steering to enlarge the video bandwidth.
Measurement results show that an 80 MHz modulation bandwidth can be achieved with an EVM better than -27 dB. This makes the concept a good candidate for realizing (future) wideband polar DTXs.

**Chapter 5** introduces another up-converter architecture: a wideband direct-digital RF modulator (DDRM). First, an extensive comparison is made between Cartesian DTXs based on a digital power amplifier (DPA) and DDRMs. This analysis gives the advantages and disadvantages of these architectures. Next, we propose an advanced "IQ-mapping" technique for "unsigned DDRM operation", which offers both efficiency and linearity advantages over conventional DDRM implementations. The high bandwidth of the proposed DDRM architecture is demonstrated using a 320 MHz transmit signal without DPD.

**Chapter 6** introduces a new TX concept based on a "signed DDRM" that drives a common-gate (CG)/common-base (CB) output stage in current mode. This approach breaks the traditional trade-off between efficiency and linearity in analog- and digital-intensive TX line-ups. In this approach, the waveform clipping needed to improve the CG/CB PA efficiency is implemented by means of an extra current division path in the DDRM unit cells. Furthermore, improved "IQ-interleaving", dynamic biasing, and a class-B HR technique are applied in the driver to increase the efficiency and linearity of the transmitter line-up. Adaptations have been made in the design and layout of the CG/CB PA to match it to the CMOS driver. Experimental results show that the driver can operate over a frequency range of 1-3 GHz while generating 19.6 dBm of RF power at 505 mW DC consumption at 2.4 GHz. For a 160 MHz 256-QAM signal, the measured ACLR is better than -40.5 dBc. With a CB SiGe PA output stage, the peak output power is 27 dBm at a system efficiency of 20 %.
Voor een 80 MHz 64-QAM-signaal op 2,2 GHz, zonder en met gebruik van het extra stroompad, is de gemeten ACLR respectievelijk -32,3 en -37,7 dBc, bij een EVM van -27 en -30 dB. Hoofdstuk 7 geeft de belangrijkste conclusies van dit proefschrift en een reeks van aanbevelingen voor toekomstig onderzoek naar zenders en hun modulatoren. Samenvatting ## List of Publications ### Journal Papers - Y. Shen, R. J. Bootsman, M. S. Alavi and L. C. N. de Vreede, "A Wideband IQ-Mapping Direct-Digital RF Modulator for 5G Transmitters," under review for publication in *IEEE Journal of Solid-State Circuits* - Y. Shen, M. Hoogelander, R. J. Bootsman, M. S. Alavi and L. C. N. de Vreede, "Towards Energy-Efficient Linear Wideband Operation: A Current-Mode TX Line-up with Direct Digital RF Modulator as a Driver for a Common-Gate/Common-Base PA," in preparation. - M. Beikmirza, Y. Shen, D. P. N. Mul, L. C. N. de Vreede and M. S. Alavi, "A Wideband Four-Way Doherty Bits-In RF-Out CMOS Transmitter," accepted for publication in *IEEE Journal of Solid-State Circuits* - M. Mortazavi, Y. Shen, D. P. N. Mul, L. C. N. de Vreede, M. Spirito and M. Babaie, "A 4-way Series Doherty Digital Polar Transmitter at mm-wave frequencies," under review for publication in *IEEE Journal of Solid-State Circuits* - M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi and L. C. N. de Vreede, "An Intrinsically Linear Wideband Polar Digital Power Amplifier," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 12, pp. 3312-3328, Dec. 2017 - M. Hashemi, L. Zhou, Y. Shen and L. C. N. de Vreede, "A Highly Linear Wideband Polar Class-E CMOS Digital Doherty Power Amplifier," in *IEEE Transactions on Microwave Theory* and Techniques, vol. 67, no. 10, pp. 4232-4245, Oct. 2019 - M. Mehrpoo, M. Hashemi, Y. Shen, L. C. N. de Vreede and M. S. Alavi, "A Wideband Linear I/Q -Interleaving DDRM," in IEEE Journal of Solid-State Circuits, vol. 53, no. 5, pp. 1361-1373, May 2018 #### Conference Papers • Y. Shen, R. J. Bootsman, M. S. 
Alavi and L. C. N. de Vreede, "A 1–3 GHz I/Q Interleaved Direct-Digital RF Modulator As A Driver for A Common-Gate PA in 40 nm CMOS," 2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2020, pp. 287-290 172 List of Publications • Y. Shen, R. J. Bootsman, M. S. Alavi and L. C. N. de Vreede, "A 0.5-3 GHz I/Q Interleaved Direct-Digital RF Modulator with up to 320 MHz Modulation Bandwidth in 40 nm CMOS," 2020 IEEE Custom Integrated Circuits Conference (CICC), 2020, pp. 1-4 - Y. Shen, M. Polushkin, M. Mehrpoo, M. Hashemi, E. McCune, M. S. Alavi and L. C. N. de Vreede, "A wideband I/Q RFDAC-based phase modulator," 2018 IEEE 18<sup>th</sup> Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF), 2018, pp. 8-11 - Y. Shen, M. Mehrpoo, M. Hashemi, M. Polushkin, L. Zhou, M. Arca, R. van Leuken, M. S. Alavi and L. C. N. de Vreede, "A fully-integrated digital-intensive polar Doherty transmitter," 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2017, pp. 196-199 - M. Mortazavi, Y. Shen, D. P. N. Mul, L. C. N. de Vreede, M. Spirito and M. Babaie, "A 30GHz 4-way Series Doherty Digital Polar Transmitter Achieving 18% Drain Efficiency and -27.6dB EVM while Transmitting 300MHz 64-QAM OFDM Signal," 2021 IEEE Custom Integrated Circuits Conference (CICC), 2021, pp. 1-2 - M. Beikmirza, Y. Shen, M. Mehrpoo, M. Hashemi, D. P. N. Mul, L. C. N. de Vreede and M. S. Alavi, "6.2 A 4-Way Doherty Digital Transmitter Featuring 50%-LO Signed IQ Interleave Upconversion with more than 27dBm Peak Power and 40% Drain Efficiency at 10dB Power Back-Off Operating in the 5GHz Band," 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, pp. 92-94 - M. Hashemi, Y. Shen, M. Mehrpoo, M. Acar, R. van Leuken, M. S. Alavi and L. C. N. 
de Vreede, "17.5 An intrinsically linear wideband digital polar PA featuring AM-AM and AM-PM corrections through nonlinear sizing, overdrive-voltage control, and multiphase RF clocking," 2017 IEEE International Solid-State Circuits Conference (ISSCC), 2017, pp. 300-301 - R. J. Bootsman, D. P. N. Mul, **Y. Shen**, R. M. Heeres, F. van Rijs, M. S. Alavi and L. C. N. de Vreede, "An 18.5 W Fully-Digital Transmitter with 60.4 % Peak System Efficiency," 2020 IEEE/MTT-S International Microwave Symposium (IMS), 2020, pp. 1113-1116 - M. Mehrpoo, M. Hashemi, Y. Shen, R. van Leuken, M. S. Alavi and L. C. N. de Vreede, "A wideband linear direct digital RF modulator using harmonic rejection and I/Q-interleaving RF DACs," 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2017, pp. 188-191 - M. Hashemi, L. Zhou, Y. Shen, M. Mehrpoo and L. de Vreede, "Highly efficient and linear class-E CMOS digital power amplifier using a compensated Marchand balun and circuit-level linearization achieving 67% peak DE and -40dBc ACLR without DPD," 2017 IEEE MTT-S International Microwave Symposium (IMS), 2017, pp. 2025-2028 - D. P. N. Mul, R. J. Bootsman, Q. Bruinsma, Y. Shen, S. Krause, R. Quay, M. Pelk, F. van Rijs, R. Heeres, S. Pires, M. S. Alavi and L. C. N. de Vreede, "Efficiency and Linearity of - Digital "Class-C Like" Transmitters," 2020 50th European Microwave Conference (EuMC), 2021, pp. 1-4 - Z. Gao, J. He, M. Fritz, J. Gong, Y. Shen, Z. Zong, P. Chen, G. Spalink, B. Eitel, K. Yamamoto, R. B. Staszewski, M. S. Alavi and M. Babaie, "A 2.6-to-4GHz Fractional-N Digital PLL Based on a Time-Mode Arithmetic Unit Achieving -249.4 FOM and -59 dBc Fractional Spurs," accepted for publication in 2022 IEEE International Solid-State Circuits Conference (ISSCC) #### **Patents** - L. C. N. de Vreede, M. S. Alavi, M. Mehrpoo, M.Polushkin, M. Hashemi and Y. Shen "RF-DAC based phase modulator," Pub. No. US 2020/10644656, Published May. 2020. - Y. Shen, M. S. Alavi and L. C. N. 
de Vreede, "Digital transmitter with high linearity for wideband signals," PCT/NL 2021/050187, Filed Mar. 2021 174 List of Publications ## Acknowledgement Time flies. It feels like just yesterday that I arrived in the Netherlands to pursue my Ph.D. at TU Delft. Now after seven incredible years, this dissertation marks the end of this adventure. Now, the time has come to present my dissertation and share with you the challenges and triumphs that brought me here. I have been inspired by and have received a great deal of support from many people throughout my education and work. Some people planted a seed in my heart; others helped to nourish, water, and care for it along the way. Many people have supported my growth, and for that I feel both fortunate and grateful. I would first like to express my deep gratitude and appreciation for my Ph.D. Yoda, promoter, and daily advisor, Professor Leo de Vreede for his unwavering guidance, support, and encouragement. I was presented with a difficult choice when I first arrived in the Netherlands; I could go to Ireland to pursue a Ph.D. or continue my Ph.D. in Delft under his supervision. I am glad I decided to stay. Throughout these past seven years, I have learned so much from him in many areas, but especially about how to handle complex situations. His vision and energy when conducting research always calms me down in my most anxious moments; it is a true joy to learn under his supervision. He inspired me to develop our CMOS-SiGe module. And for the times he trusted me and paid two complete MPWs for my tape-out, I am eternally grateful. Fortunately, both chips worked, and I did not cause a "financial crisis" to ELCA (our ELectronic Circuits and Architectures group). I would also like to express my deep gratitude to my co-promoter and co-daily supervisor, Dr. Morteza Alavi, a true gentleman. He was the first person to give me a tour of the EWI building of TU Delft. I had already read his TMTT papers and Ph.D. 
thesis before coming to Delft, which set a high standard for me to catch up with. His support and guidance have been consistent and steadfast, and my research achievements would not have been possible without his help. Furthermore, it is my honor to be his first (co-)promoted student.

I would like to thank Prof. Staszewski for taking me in as a Ph.D. candidate at TU Delft, which opened a new chapter in my career.

I would like to give special thanks to the members of the doctoral exam committee: Prof. Serdijn, Prof. Baltus, Prof. Nauta, Prof. Wambacq, Dr. van Rijs, and Prof. Vaucher. Thank you for taking the time to review my dissertation and approve the manuscript.

My gratitude goes out to the other staff members of the Microelectronics Department of TU Delft involved in my Ph.D. projects and paper publications: Prof. McCune, Prof. Makinwa, Dr. Verhoeven, Dr. Spirito, and Dr. Babaie. Special thanks to Dr. van Leuken for guiding me through the digital flow.

This work also would not have been possible without the help of our support staff. I could not be more grateful to the great Atef Akhnoukh for his countless hours of assistance, including designing the pad ring, tiling, and contacting Euro Practice. He SAVED my Ph.D. career during the Easter holidays in 2019 when I found two missing vias in the supply track! I also enjoyed the moments when he took time from his busy schedule to talk to me about the traffic and his retirement life in Spain, and I am grateful for his concern for me while I was traveling in Egypt. Special thanks to Wil Straver and Zu-yao Chang for wire bonding my chips and giving me suggestions on PCB design and assembly. Also, special thanks to Marco Pelk for fixing my broken boards and giving me instructions on de-embedding. His soldering technique is exceptional! Thanks to Antoon Frehe for his very efficient and professional IT support, especially in the final stage of tape-out.
Last but not least, I want to thank Marion de Vlieger, our group secretary, for administrative support and organizing ELCA activities.

Having excellent colleagues at TU Delft also contributed to the success of my research. First of all, I want to thank my co-authors: Mohsen Hashemi, Michael Polushkin, Lei Zhou, Rob Bootsman, Dieuwert Mul, Mohammadreza Beikmirza, Mohsen Mortezavi, and Martijn Hoogelander. Working with them has been a fantastic experience. Special thanks to Milad Mehrpoo, from whom I learned a lot and with whom I hope to work again in the future. I would also like to thank my office-mates: Amir Reza Ahmadi Mehr, Reza Lotfi, Wannaya Ngamkham, Gerasimos Vlachogiannakis, Mina Shahmohammadi, Masoud Pashaeifar, and Nawarf Almotairi. My gratitude extends to my other colleagues at the ELCA, Bio-electronic, and Qutech groups: Akshay Visweswaran, Massoud Tohidian, Iman Madadi, Augusto Ximens, Gustavo Martins, Luca Galatro, Carmine De Martino, Augusto Carimatto, Harshita Thippur Shivamurthy, Ronaldo Ponte, Jordi van der Meulen, Mohammad Ali Montazerolghaem, Gagan Singh, and Anil Kumar Kumaran. Special thanks to Alessandro Urso; I will always fondly remember our time at summer schools in Lausanne and Beijing. Last but not least, I want to thank Satoshi Malatoux, who surprised me by showing up at a Chinese hotpot party in the winter of 2014 and playing Mahjong with us. I am grateful for our many follow-up hotpot parties and days cycling. I sincerely wish you a happy life ahead.

In times when I was feeling very homesick, I was fortunate enough to have a great Chinese community in Delft where I could find joy and comfort. Thank you to Rui Hou for showing me around when I first came to the Netherlands. Thank you to the so-called "Delft Experienced Drivers," Yongjia Li and Ying Wu, for shuttling me to many exciting activities in my spare time, and especially for unintentionally paving my way to Leiden, where I found my special one.
Thank you to Zhebin Hu and Zhirui Zong for our extensive technical discussions about setting up the EM simulation and measurement environment. More importantly, they fed my love for photography, which helped ease the tape-out pressure. And to Zhebin, who once brought a fresh homemade Tiramisu to his girlfriend all the way from Delft to Guangzhou via Dusseldorf and Shanghai, thank you for the reminder that even engineers like us can sometimes be romantic. However, I also won't soon forget the time you woke me up at 2 AM to unlock the office door, or when you asked me to deliver your passport to Schiphol Airport at 6 AM. Thanks go to Zhirui for helping me adapt to Delft and Eindhoven, and for reviewing my draft papers. Our "fire alarm incident" is still fresh in my memory. I sincerely hope you enjoy your new life in China with your partner. And in the final phase of my Ph.D., I was fortunate to meet some great fellows who helped me along my way: Jiang Gong, Zhong Gao, Linghan Zhang, Jingchu He, and Bolin Chen. Thank you for spending lots of time in technical and non-technical discussions with me, especially during the "ZuoFa" ceremony after tape-out (our insider tradition for wishing good luck). Many special thanks to Yue Chen, the most hard-working guy I have ever met. We arrived in the Netherlands at almost the same time and have been roommates and officemates for four years. I admire your special type of persistence! In addition, I want to give my thanks to the other Chinese colleagues in the EWI building whom I was fortunate to meet: Duan Zhao, Yao Liu, Chao Chen, Chao Zhang, Zeyu Cai, Hui Jiang, Long Xu, Zhao Chen, Yu Xin, Qing Ding, Ting Gong, Mingliang Tan, Sining Pan, Xin Guo, Tianyi Jin, Hengqian Yi, Lingling Lao, and Rui Guan. And to my other roommates: Ruimin Yang, Jiliang Ma, Heng Wang, Jing Li, Xianwei Zeng, and many others that I have met on the basketball court, in bars, and at parties. Thank you for making my life abroad fun.

My Ph.D.
work is supported by NWO and Ampleon B.V., Nijmegen. The experts there always offer insightful advice and suggestions during discussions. And my one year in the advanced concepts and system group was nothing short of joyful. Thanks to Nick Pulsford, John Gajadharsing, Segio Pires, Rob Heeres, Mustafa Acar, Jawad Qureshi, Junlei Zhao, Yi Zhu, Andre Prata, Lazaro Marco, Adam Cooman, Alireza Shamsafar, Domenico Calzona, Jordan S., Komo Solaksono, Martino Lorenzini, Sandra Kits, Mohadig Widha Rousstia, Lei Zhou, and Abdul Raheem Qureshi for their support and fruitful technical and non-technical discussions.

I would like to thank my colleagues at imec the Netherlands: Mario Konijnenburg, Yao-Hong Liu, Yuming He, Paul Mateman, Johan Dijkhuis, Johan van den Heuvel, Erwin Allebes, Christian Bachmann, Peng Zhang, Garauv Singh, Minyoung Song, Elbert Bechthum, Stefano Traferro, Evgenii Tiurin, Haoming Xin, Anoop Bhat, Bart Thijssen, Ming Ding, Yu Huang, and Chengyao Shi. It has been my pleasure to work with you on various projects over the past year. Taking part in the wonderful analog competence meetings and analog Friday Seminars has expanded my technical view. Also, thanks to Qilong Liu from NXP for frequently joining our lunch break and discussing interesting technical questions at the High-Tech Campus, Eindhoven.

During my Ph.D., I had the pleasure of knowing Peng Chen and Feifei Zhang from University College Dublin. Their comments on my draft paper proved to be invaluable.

I started my tape-out career on the Tsinghua campus under the mentorship of Professor Woogeun Rhee. His guidance led me into practical analog/RFIC design, and more importantly, he taught me how to be a professional researcher and team player. I am truly grateful for his guidance. It is a pity that we did not meet in person at CICC and RFIC. Also, thanks to my former groupmates at Tsinghua University: Ni Xu, Xican Chen, Dang Liu, Yudong Zhang, and Yining Zhang. Also, I would like to thank Prof.
Zhihua Wang and Prof. Nan Sun for their encouragement.

During my master's and Ph.D. programs, I have had the pleasure of attending electrical engineering lectures on-site and virtually. I would like to thank Prof. Steyaert, Prof. Gielen, Prof. Reynaert, Prof. Verhelst, Prof. Dehaene, Dr. van Hoof, Prof. Nauwelaers, and Prof. Moonen from KU Leuven and imec for their excellent lectures that guided me into the field of IC design and signal processing. I am grateful for the insightful lectures of Prof. Bult and Dr. Pelgrom from TU Delft about analog IC design and data converters. I would also like to thank Prof. Razavi and Prof. Hajimiri for their generosity in sharing incredible lectures on the primary circuit design on YouTube.

Many thanks to Sarah for proofreading my thesis.

The many achievements and progress of my career would not have been possible without the understanding and support of my family and my partner. I feel loved and extremely lucky when I think about them. My deepest gratitude goes to my dear parents, Baba and Mama, for their continued love, support, and encouragement; I hope to spend more time together in the future. And thanks to my dear Laolao and Laoye (my maternal grandparents), who gave me wonderful childhood memories and whose conversations with me are my favorite moment of the day. Many thanks to my uncles, aunts, and cousins; reuniting with family always brings back sweet childhood memories and recharges my energy and spirit. Thanks to my dear Yeye, Nainai, and Dada (my paternal grandparents and uncle), whose presence I miss daily. I believe they are watching over me from heaven and have protected me on this journey. No other source of support has encouraged me to succeed more than that of my family.

Lastly, but most importantly, I would like to thank my special one, Ran Chang, for always being there for me during my ups and downs. Of everything I discovered in the Netherlands while pursuing my Ph.D., she is the most precious.
I would also like to thank her loving and supportive parents. To Ran, you are the sun that drives away the dark clouds in my skies; for your understanding, support, and deep love I am forever grateful. I look forward to starting our new chapter of life together.

Yiyu Shen
Oct 2021
Delft, The Netherlands