A 128 Gbps PAM-4 Feed Forward Equaliser with Optimized 1UI Pulse Generator in 65 nm CMOS
Jiawei Wang,Hao Xu,Ziqiang Wang,Haikun Jia,Hanjun Jiang,Chun Zhang,Zhihua Wang
DOI: https://doi.org/10.1049/cds2.12151
2023-01-01
Abstract: A quarter-rate PAM-4 FFE employing an INCC 1UIPG is implemented in 65 nm CMOS. The proposed INCC 1UIPG reduces the average transition time by ~20%, reduces clocking power consumption by ~1.5X, and lowers jitter amplification by about 2 to 5 dB compared with previous works. Along with the bandwidth- and power-efficient partially segmented tailless 1-stage front-end architecture, the proposed FFE achieves a 128 Gbps PAM-4 data rate with a 0.014 mm2 area.

This letter presents a 4-level Pulse Amplitude Modulation (PAM-4) Feed Forward Equaliser (FFE) with a novel Internal Node Charge Controlled 1-Unit Interval Pulse Generator (INCC 1UIPG). A partially segmented architecture and a tailless 1-stage front end are chosen to reduce the overall load capacitance for better bandwidth and power performance. The proposed INCC 1UIPG adopts a stacking-reduced structure and precisely controls the internal nodes, demonstrating advantages in speed, power, and jitter and showing better potential for working at an ultra-high baud rate. The wider bandwidth and faster transition edge allow the equaliser to work at 128 Gbps with an area of 0.014 mm2 in 65 nm CMOS.

The ever-increasing bandwidth demand in high-performance computing and other applications continuously pushes up the data rate of wireline communication systems, with some protocols already requiring data rates in excess of 50 Gbaud, which poses serious challenges to transceiver design. State-of-the-art transmitters (TXs) adopt a hybrid architecture that combines the advantages of their analogue [1, 2] and Digital-to-Analogue Converter (DAC)-based [3] counterparts, offering high resolution and low complexity together with flexible and efficient Finite Impulse Response (FIR) tuning; this is called the segmented FFE architecture [4-6] in this letter.

To further ease bandwidth pressure, high-speed TXs tend to reduce the number of full-rate nodes. By combining the 4:1 MUX into the pre-driver, the authors in ref. [2] reduce this number to 2, with the internal full-rate nodes peaked by inductors. However, this technique is not suitable for DAC-based and segmented TXs because of area considerations. Another technical route merges the 4:1 pre-driver into the driver [1, 4, 6], thereby eliminating all the internal full-rate nodes; this is called the 1-stage front end in this letter. In a 1-stage front end, the total capacitance of the output stage becomes even more critical, because it determines the achievable bandwidth and the overall power dissipation. At the extreme, some designs employ a tailless CML driver to obtain the smallest size for a specified output swing [5, 6]. Given that this letter targets an aggressive 128 Gbps data rate in 65 nm CMOS technology, a segmented architecture and a 1-stage tailless front end are chosen, with the FIR taps designed to be partially adjustable to ease the bandwidth pressure even further.

Contrary to the front-end trend, high-performance 1UIPGs operating at full rate, which are widely used in quarter-rate architectures, tend to adopt a multi-stage structure [5, 6] to improve speed, because it is difficult to optimise both edges of the pulse in a single stage, which usually involves 3-stacked devices [1, 3]. The authors in reference [4] proposed a pre-charged structure that generates the 1UI pulse in a single-stage circuit. However, this technique is not suitable for a tailless CML driver, in which any pre-charge level is translated to the output immediately. The authors in ref. [6] adopt a 2-stage structure to avoid 3-stacked paths. Unfortunately, the 2-stacked devices on the critical path and the undriven internal nodes ultimately limit the achievable speed. To address these drawbacks, the proposed 2-stage INCC 1UIPG optimises the two edges of the pulse separately, reduces device stacking on critical paths, and reasonably controls the internal nodes, showing the best potential for working at ultra-high speed.
Figure 1 shows the overall architecture of the proposed equaliser (half circuits). The data path is divided into MSB and LSB blocks to generate the PAM-4 output, where the MSB block is composed of two identical LSB blocks for good linearity. Each block is further divided into three groups of slices, X1, X2, and X6, forming a 3-bit DAC. X1 and X2 can be configured as a main tap or a post tap as required, while X6 is fixed as a main tap. FIR timing is generated at 1/8 rate with the C8 clock. Subsequently, the X1 and X2 slices select data with different timing under the control of FFE_DAC<1:0> so that they can be configured as different taps, while X6 experiences a matching delay. The selected 4-bit parallel data become time-interleaved 1UI pulses in the proposed INCC 1UIPG and are finally combined in the 4:1 tailless CML driver. When X1 and X2 are assigned as a post tap, the output current of their drivers can be further adjusted continuously through the bias of their cascode transistors.

Figure 1: Feed Forward Equaliser (FFE) architecture (half circuits).

The proposed equaliser adopts a partially segmented quarter-rate architecture and a 1-stage tailless front end to reduce the overall load capacitance and achieve the aggressive target of 128 Gbps. A 3-bit DAC provides coarse tuning, and fine tuning is implemented in the analogue domain, forming a segmented architecture. Since the X1 and X2 slices can be allocated as a main tap when 'strong' equalisation is not required, the equaliser is more bandwidth- and power-efficient than its analogue counterpart, in which the main tap driver itself must be sized to deliver the specified output swing and every equalisation tap driver introduces additional loading. At the same time, the DAC can remain simple, with low circuit complexity and small parasitic capacitance. (A 'pure' DAC-based TX needs many more bits, with complex calibration, for resolution and linearity reasons.)
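As a rough illustration of the coarse tuning described above, the sketch below counts how the X1, X2, and X6 slice groups of one block split between the main tap and a post tap. It assumes ideal unit-weighted slices (1, 2, and 6 units) and ignores the analogue fine tuning through the cascode bias. The function name and the normalisation are illustrative and are not taken from the letter.

```python
from itertools import product

# Relative slice-group weights named in the text (X1, X2, X6).
WEIGHTS = {"X1": 1, "X2": 2, "X6": 6}


def coarse_taps(x1_is_post: bool, x2_is_post: bool):
    """Return (main, post) tap weights normalised to the total of 9 units.

    X6 is always a main tap; X1 and X2 are assigned per configuration.
    Assumption: ideal unit weighting, no analogue fine tuning.
    """
    post_units = WEIGHTS["X1"] * x1_is_post + WEIGHTS["X2"] * x2_is_post
    total = sum(WEIGHTS.values())
    main_units = total - post_units
    return main_units / total, post_units / total


if __name__ == "__main__":
    # Enumerate the four coarse settings of the two configurable groups.
    for x1, x2 in product([False, True], repeat=2):
        main, post = coarse_taps(x1, x2)
        print(f"X1 post={x1!s:5} X2 post={x2!s:5} main={main:.3f} post={post:.3f}")
```

Under these assumptions the coarse post-tap weight can only take the values 0, 1/9, 2/9, or 3/9 of the total, which is why the text relies on analogue adjustment for anything finer.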
Moreover, the front end is designed to be partially adjustable: the largest X6 slices are fixed as the main tap, which allows their cascode transistors to be removed so that the driver size can be further reduced for the same output swing. This greatly reduces the load capacitance at the cost of tuning flexibility.

Figure 2 shows three 1UIPGs with different structures and their timing diagrams at 64 Gbaud, with the critical paths in stage 1 marked in red. Figure 2a adopts a single-stage structure, which uses the falling edge of CKQ and the rising edge of CKI to select the low level of the data, where M3 is used to control the internal charge of N2. This structure achieves a 112 Gbps PAM-4 data rate in [1] and 224 Gbps in [3], both in 10 nm CMOS. Although the charge of internal node N2 is reasonably controlled, the 3-stacked charging path (M1-M2-M4) leads to a slow rising edge at the output, making it difficult to reach full swing at a high baud rate.

Figure 2: Comparison of 3 types of 1UIPGs under 64 Gbaud.

The authors in reference [6] adopt a two-stage architecture to avoid 3-stacked paths, as shown in Figure 2b. Using the rising edge of CKQ and the falling edge of CKI to select the high level of the data, this structure achieves a PAM-4 data rate of 200 Gbps in 28 nm CMOS. In the first stage, when the data is high and the rising edge of CKQ arrives, OUT1, which is originally high, is pulled down. In the second stage, M6 pre-charges N2 when CKIB is pulled down, and the falling edge of OUT1 controls M6 and M7 to charge OUT2, thus producing its rising edge. Subsequently, the rising edge of CKIB controls M8 to discharge OUT2, thus producing its falling edge. The pre-charged 2-stacked path gives OUT2 a faster rising edge. However, the speed of this structure is still limited for the following reasons.
Firstly, the falling edge of OUT1, which is used to produce the final 1UI pulse, is generated by a 2-stacked path in which N1 must be discharged first when M2 and M3 try to pull down OUT1. More importantly, when CKIB changes from low to high, OUT1 remains low for a period, so M8 must discharge not only OUT2 but also N2 at the same time, which leads to a slow falling edge of the final 1UI pulse.

The proposed INCC 1UIPG is shown in Figure 2c. Unlike the structure in Figure 2b, this 2-stage structure uses the rising edge of OUT1 and the falling edge of CKQB to generate the final 1UI pulse. In the first stage, the falling edge of CKI controls the single transistor M1 to produce the rising edge of OUT1 when the data is high. Since M2 has already been turned off, node N1 no longer affects this charging process. Because the falling edge of OUT1 is non-critical and N1 can be pre-discharged by M3, the related transistors can be made smaller, which further extends the bandwidth of OUT1. Meanwhile, CKQB is generated from CKQ through an inverter, which matches the delay of the CKI path to ensure an accurate 1UI pulse width under PVT variations. In the second stage, M6 pre-charges N2 when OUT1 is low, and the rising edge of OUT2 is finally generated by the falling edge of CKQB. It is important to note that M8 and M9 discharge OUT2 and N2 simultaneously at the rising edge of OUT1, accelerating the falling edge of the final 1UI pulse. In this two-stage structure, the bandwidth of the intermediate node OUT1 is further optimised and all the internal nodes (N1 and N2) are reasonably controlled, resulting in a higher-performance 1UI pulse.

Figure 3 shows a use case of the three aforementioned 1UIPGs. Use cases 1, 2, and 3 are obtained by using structures (a), (b), and (c) of Figure 2, respectively, as the 1UIPG in Figure 3. Note that the three use cases have the same input clock and data buffers and employ a 4:1 multiplexer of the same size for a fair comparison (marked in red in Figure 3).
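The role of a 1UIPG in a quarter-rate transmitter can be illustrated with a purely behavioural model. The sketch below is not the transistor-level INCC circuit. It assumes ideal 50% duty-cycle quarter-rate clocks, forms each 1UI window as the overlap of one clock phase with the inverse of the phase one UI later, and ORs the gated lanes to mimic the 4:1 multiplexer. All function names are hypothetical.

```python
def clock_phase(t_ui: int, phase: int) -> int:
    """Ideal quarter-rate clock: period 4 UI, high for 2 UI, delayed by `phase` UI."""
    return 1 if (t_ui - phase) % 4 < 2 else 0


def one_ui_window(t_ui: int, lane: int) -> int:
    """1UI-wide window for `lane`.

    Opened by the clock of this lane and closed by the clock that rises
    one UI later, so it is high only when t_ui % 4 == lane.
    """
    return clock_phase(t_ui, lane) & (1 - clock_phase(t_ui, lane + 1))


def serialize(lanes):
    """Behavioural 4:1 mux: OR of the four lanes gated by their 1UI windows.

    `lanes` is a list of four equal-length bit lists at quarter rate.
    """
    n_words = len(lanes[0])
    out = []
    for t in range(4 * n_words):
        word = t // 4
        bit = 0
        for k in range(4):
            bit |= lanes[k][word] & one_ui_window(t, k)
        out.append(bit)
    return out


if __name__ == "__main__":
    quarter_rate = [[1, 0], [0, 1], [1, 1], [0, 0]]  # illustrative data only
    print(serialize(quarter_rate))
```

In this idealised picture exactly one window is active in every UI, so the pulse edges alone decide how cleanly adjacent symbols are separated. That is why the rise and fall times of the 1UI pulse are the focus of the circuit comparison.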
Figure 3: A use case of the three aforementioned 1UIPGs.

The analysis and simulation results explain the following properties of the proposed INCC 1UIPG.

Figure 4 shows simulation results of the 1UI pulses of the three use cases over PVT variations at 64 Gbaud. As shown in Figure 4a, use case #1 has the largest rise time because of the 3-stacked charging path. Figure 4b illustrates the limited fall time of use case #2, which is caused by the uncontrolled internal nodes. Figure 4c compares the average transition time of the 1UI pulses. Use case #3 shows the best performance with the help of 2-stacked dynamic logic and reasonable INCC. Compared with use cases #1 and #2, its average transition time is reduced by 22% and 17%, respectively, at the TT corner.

Figure 4: Simulation results of 1UI pulses over PVT variations of the 3 use cases. (a) Rise time, (b) fall time, and (c) average transition time.

A faster slew rate of the 1UI pulse speeds up the charging and discharging of the 4:1 multiplexer output, extends the bandwidth, and therefore reduces its deterministic jitter (DJ). In addition, the sharper slope at the transition points of the pulse generator and multiplexer outputs reduces the conversion of their intrinsic voltage noise into jitter. Figure 5 shows the 4:1 multiplexer output DJ of the three use cases at 64 Gbaud and 80 Gbaud. Use case #3 shows the smallest output DJ, demonstrating its potential to work at higher baud rates.

Figure 5: Simulation results of 4:1 multiplexer output of the 3 use cases.

Reducing device stacking on a critical path also allows better device sizing, reduces the total loading of the clock path and data path, and therefore reduces the power consumption of their buffers. Minimising the clock loading is attractive because it reduces the design effort of the clocking network, which must take speed, jitter, and power consumption fully into consideration.
Specifically, the critical edges of the proposed INCC 1UIPG (the rising edge of OUT1 and the falling edge of CKQB, see Figure 2) are both generated by a stacking-free transistor (M1 and M5). M2 cuts off the pull-down path and shields node N1 when M1 charges OUT1, so M1 can be small, just as in an inverter. The falling edge of OUT1 is non-critical, so M2 and M3 can be sized even smaller. By contrast, the critical edge in use case #2 (the falling edge of OUT1) is generated by the stacked devices M2 and M3, and N1 cannot be discharged in advance, so the related transistors cannot be small (M3 is twice the size of M2 in use case #2, increasing clock loading by about 1.5X). Similarly, M2 is twice the size of M4 and M1 is three times the size of M4 in use case #1.

Figure 6 shows the power breakdown of the three use cases. Since the fan-out factor of the buffers cannot be large for speed and jitter reasons (we use FO2 for the 16 GHz clock in 65 nm CMOS), the heavier loading of the data and clock paths leads to more buffer stages, greater total power dissipation, and more clocking jitter. Considering the large number of slices in an actual TX (about 6X the use case is needed for a 1.2 Vppd output swing), these power savings are very attractive.

Figure 6: Power breakdown of the 3 use cases.

Stacked devices also underperform in terms of jitter amplification because of their poor slope. We designed a simulation to verify this. As shown in Figure 7, a small jitter impulse (1 ps in this simulation) is injected into one of the quarter-rate clocks (C0 in this simulation). We then record the transient response of the pulse generator and multiplexer outputs while transmitting repeating clock patterns in the three use cases, which yields their jitter impulse response (JIR). The clock and data buffers are removed in this simulation, and an ideal clock source with a fixed slope is used as a substitute to eliminate the impact of the multi-stage buffers.
After normalising the JIRs to the input injection, we obtain the corresponding jitter transfer function (JTF) of the three use cases by the Discrete-time Fourier Transform.

Figure 7: Jitter amplification simulation method.

Figure 8 shows the simulated JIR and JTF at 64 Gbaud. Use case #3 shows a milder JIR and about 5 dB and 2 dB lower jitter amplification than use cases #1 and #2, respectively.

Figure 8: Simulated jitter impulse response (JIR) and jitter transfer function (JTF) of the 3 use cases.

The FFE prototype chip is fabricated in 65 nm CMOS technology with a core area of 0.014 mm2, as shown in Figure 9a. Figure 9b shows the post-layout simulation results of the proposed INCC 1UIPG working at 64 Gbaud. The 1UI pulse eye, with a 10.83 ps rise time and an 11.33 ps fall time, is shown in Figure 9c. The pulse is full-swing and fast enough to drive the subsequent tailless CML transistors. The power breakdown of the FFE is shown in Figure 9d. It covers the high-speed data path of the TX prototype only; the design of a high-performance clocking network is not discussed in this letter, and its power consumption is therefore not included. The Feed Forward Equaliser slices (FFE selectors + D4 buffers + INCC 1UIPGs, as shown in Figure 2) consume about half of the power of the data path. The driver stage consumes about 45.6 mW to provide a ~1 Vppd output swing.

Figure 9: Layout details and post-layout simulation results of the Feed Forward Equaliser (FFE) prototype chip.

The channel responses with 2.7 dB, 5.7 dB, and 10.3 dB insertion loss at the Nyquist frequency (32 GHz) are shown in Figure 9e. Figure 9f shows the 128 Gbps PRBS15 eye after the 2.7 dB channel loss. Figures 9g to 9j compare the 128 Gbps PRBS15 PAM-4 eye with and without the TX FFE under 5.7 dB and 10.3 dB channel loss, respectively. By adjusting the coefficients of the segmented equaliser appropriately, the eye can be opened to 0.49 UI with approximately 95 mVppd height per sub-eye for the 10.3 dB loss.
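The JIR-to-JTF step of the jitter simulation described earlier can be sketched numerically as follows. This is a minimal illustration of a discrete-time Fourier transform applied to a normalised impulse response. The sample values in the usage example are invented for demonstration and are not results from the letter, and the function name is hypothetical.

```python
import cmath
import math


def jitter_transfer_db(jir_ps, injected_ps, freqs_norm):
    """Return the JTF magnitude in dB at each normalised frequency.

    jir_ps      : output timing deviation of successive edges (ps)
                  following the injected jitter impulse.
    injected_ps : size of the injected jitter impulse (ps).
    freqs_norm  : frequencies in cycles per sample (0 to 0.5).

    Note: a frequency where the response is exactly zero would make
    log10 fail; this sketch does not guard against that case.
    """
    h = [x / injected_ps for x in jir_ps]  # normalise to the injection
    result = []
    for f in freqs_norm:
        # DTFT: H(f) = sum_n h[n] * exp(-j * 2 * pi * f * n)
        acc = sum(hn * cmath.exp(-2j * math.pi * f * n) for n, hn in enumerate(h))
        result.append(20 * math.log10(abs(acc)))
    return result


if __name__ == "__main__":
    example_jir = [1.0, 0.3, 0.1]  # hypothetical response to a 1 ps impulse
    freqs = [k / 20 for k in range(11)]
    for f, g in zip(freqs, jitter_transfer_db(example_jir, 1.0, freqs)):
        print(f"f = {f:.2f} cycles/sample, JTF = {g:+.2f} dB")
```

A JIR that is a single unit sample gives 0 dB at every frequency, that is, no jitter amplification. A longer or larger tail after the first sample changes the transfer function, and if the tail has the same sign as the first sample it raises the low-frequency gain.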
Table 1 summarises the performance of the proposed FFE and compares it with the high-speed data paths of reported quarter-rate PAM-4 TXs.

A quarter-rate PAM-4 FFE employing an INCC 1UIPG is implemented in 65 nm CMOS. The proposed INCC 1UIPG reduces the average transition time by ~20%, reduces clocking power consumption by ~1.5X, and lowers jitter amplification by about 2 to 5 dB compared with previous works. Along with the bandwidth- and power-efficient partially segmented tailless 1-stage front-end architecture, the proposed FFE achieves a 128 Gbps PAM-4 data rate with a 0.014 mm2 area.

Jiawei Wang: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review and editing. Hao Xu: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – review and editing. Ziqiang Wang: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review and editing. Haikun Jia: Methodology, Resources, Software, Validation, Writing – review and editing. Hanjun Jiang: Methodology, Resources, Software, Writing – review and editing. Chun Zhang: Methodology, Resources, Software, Writing – review and editing. Zhihua Wang: Funding acquisition, Methodology, Project administration, Resources, Supervision.

This work is supported by the Shenzhen Science and Technology Program (No. JCYJ20180306170609470) and the Key Research and Development Plan of Shandong Province (No. 2022CXGC010109).

The authors declare that they do not have any conflicts of interest.

Shenzhen Science and Technology Program, Grant/Award Number: JCYJ20180306170609470; Key Research and Development Plan of Shandong Province, Grant/Award Number: 2022CXGC010109.

The data that support the findings of this study are available from the corresponding author upon reasonable request.
-
A 128 Gbps PAM-4 feed forward equalizer with optimized 1UI pulse generator in 65nm CMOS
Jiawei Wang,Hao Xu,Ziqiang Wang,Haikun Jia,Hanjun Jiang,Chun Zhang,Zhihua Wang
DOI: https://doi.org/10.22541/au.166121285.57581602/v1
2022-01-01
Abstract: This letter presents a 4-level Pulse Amplitude Modulation (PAM-4) Feed Forward Equalizer (FFE) with a novel Internal-Node-Charge-Controlled 1-Unit Interval Pulse Generator (INCC 1UIPG). A partially segmented architecture and a tailless 1-stage front-end are chosen to reduce the overall load capacitance for better bandwidth and power performance. The proposed INCC 1UIPG adopts a 2-stage structure and precisely controls the internal nodes, reducing the average transition time by ~30% compared with prior works. The wider bandwidth and the faster transition edge allow the equalizer to work at 128 Gbps with a 0.39 pJ/bit power efficiency and an area of 0.014 mm2 in 65 nm CMOS, which advances the state of the art in a mature technology.
-
A 40 Gbps PAM-4 Receiver with 12-Tap Direct Decision Feedback Equalizer Employing 1.5-Stage Slicers in 65-nm CMOS
Zeliang Zhao,Xin Wu,Dengjie Wang,Ziqiang Wang,Chun Zhang,Xiangyu Li,Zhihua Wang
DOI: https://doi.org/10.1109/icta53157.2021.9661649
2021-01-01
Abstract: This article describes a four-level pulse amplitude modulation (PAM-4) receiver with an analog front end (AFE) and a 12-tap direct decision feedback equalizer (DFE). A 1.5-stage slicer is proposed and the layout arrangement is optimized to relax the stringent timing constraint of the first tap loop. The post simulation results show that, compared with the traditional strong-arm slicer, the proposed 1.5-stage slicer reduces the clock-to-Q delay by 18%, which allows the implementation of the receiver with energy efficiency of 4.2 pJ/bit at 40 Gbps and a core area of 0.27 mm2 in 65-nm CMOS process.
-
Pin-efficient 9-Bit 8-Wire 4-Level Synergetic-Equalisation Coding Scheme for 216 Gb/s PAM4 Transceiver
Linqi Shi,Weixin Gai,Yandong He
DOI: https://doi.org/10.1049/el.2019.3976
2020-01-01
Electronics Letters
Abstract: A synergetic-equalisation coding scheme for pulse-amplitude modulation 4-level (PAM4) is proposed, based on which a 216 Gb/s 9-bit 8-wire 4-level (9B8W4L) transceiver is designed. According to the scheme, eight channels are grouped into four pairs based on the states for encoding. The two channels in pair transmit the same symbol in this unit interval (UI), and transmit different symbols in different pairs in the next UI. When decoding, the symbols in different channels are subtracted from and added with the others, following the grouping in the prior UI. From this, the first post-cursor inter-symbol interference is eliminated. Based on the proposed coding scheme, the PAM4 transceiver transmits 9-bit data by eight channels, giving a channel utilisation of 112.5%. The improvement of channel utilisation reduces the Nyquist frequency for the same data rate, which is of great significance for reducing channel loss and power consumption. The 216 Gb/s 9B8W4L transceiver with 1-tap feed-forward equaliser achieves a bit error rate <10^-12 through -20 dB channel loss at the Nyquist frequency of 12 GHz.
-
A 56 Gbps 4‐tap PAM‐4 Direct Decision Feedback Equaliser with Negative Capacitance Employing Dynamic CML Comparators in 65‐nm CMOS
Dengjie Wang,Zeliang Zhao,Ziqiang Wang,Chun Zhang,Zhihua Wang,Hong Chen
DOI: https://doi.org/10.1049/ell2.12224
2021-01-01
Electronics Letters
Abstract: Here, a 4-level pulse amplitude modulation direct decision feedback equaliser (DFE) with a novel dynamic current-mode-logic comparator (DCMLC) is presented. The DCMLC breaks the trade-off between settling time and regeneration time in traditional CML comparator design by utilizing dynamic logic and separately optimizes the tracking stage and regeneration stage for a correct latch operation at ultrahigh speed. Compared with the traditional CML comparator, the DCMLC reduces delay by 36% and has better input sensitivity on high baud rates at the cost of 7% shrunk output swing. The negative capacitance is adopted to achieve a 0.5 dB bandwidth extension ratio of up to 1.89. The reduced delay and wider bandwidth of the proposed comparator allow the implementation of 4-tap direct DFE at 56 Gbps with 2.8 pJ/bit energy efficiency and an active area of 0.007 mm2 in 65-nm CMOS technology.
-
40 Gbps 4‐level Pulse Amplitude Modulation Closed‐loop Decision‐feedback Equaliser with High‐speed Comparator in 55 nm CMOS Technology
Ai He,Weixin Gai,Liangxiao Tang
DOI: https://doi.org/10.1049/el.2018.1112
2018-01-01
Electronics Letters
Abstract:A 40 Gbps 4-tap 4-level pulse amplitude modulation closed-loop decision feedback equaliser (DFE) is proposed. The DFE adopts a novel high-speed comparator to resolve the critical timing constraints of the first tap. The comparator decreases the slicing delay by shortening the gap between initial and target voltages. Compared with the existing closed-loop DFE designs, the proposed scheme relieves timing constraints without complex clock distribution circuits and extra area. Simulations based on the RF-MOS model verify that the delay of the comparator is improved by 32.8% and the output swing is increased by more than 2.8 times. The proposed DFE which can compensate −9.5 dB channel loss is designed in 55 nm CMOS technology. The power consumption is 67 mW from a 1.2 V supply and the circuit occupies an active area of 0.021 mm2, achieving 1.68 pJ/bit energy efficiency.
-
56 Gb/s PAM4 Receiver with an Overshoot Compensation Scheme in 28 nm CMOS Technology
Ai He,Weixin Gai,Bingyi Ye,Boyang Zhang,Kai Sheng,Yuanliang Li
DOI: https://doi.org/10.1016/j.mejo.2021.105236
2021-01-01
Abstract: A 56 Gbps 2-tap 4-level pulse amplitude modulation closed-loop decision feedback equalizer (DFE) is designed in 28 nm CMOS technology. The first-tap feedback signal directly tapped from the slicer causes uncontrolled overshoot, resulting in over-correction. With insignificant hardware and power consumption, an overshoot compensation scheme is proposed, which generates an opposite overshoot to compensate the original one. Simulation results show the overshoot caused by different process-voltage-temperature conditions during the sampling aperture is reduced by at least 40%. More than 59% improvement is achieved in recovered eye height over its conventional counterpart with ≥ 9 dB channel loss at 14 GHz. In addition, a two-stage slicer is proposed to resolve the critical timing constraints of the first-tap direct feedback path. The receiver occupies an area of 0.19 mm2, and consumes a power of 179 mW, achieving 3.2 pJ/bit energy efficiency.
-
A Highly-Scalable Analog Equalizer Using a Tunable and Current-Reusable Active Inductor for 10-Gb/s I/O Links
Yong Chen,Pui-In Mak,Yan Wang
DOI: https://doi.org/10.1109/tvlsi.2014.2318733
2014-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: A 0.0015-mm2 1.28-mW single-branch analog equalizer is demonstrated in 65-nm CMOS for 10-Gb/s input/output links. Instead of using passive inductors that are untunable and unscalable with technologies, gain compensation here is optimized via a tunable and current-reusable active inductor (AI). This AI incorporates a positive-feedback impedance converter with only two MOSFETs and one MOS varactor. Together with the use of: 1) negative Miller capacitors to optimize the pole-zero composition and 2) tunable resistive source degeneration to adjust the low-frequency losses, the analog equalizer recovers an eye-opening rate of minimally 30% up to 10 Gb/s over a pair of 60-cm FR4 microstrip traces. The data Pk-to-Pk jitter is <24 ps, and the RMS jitter is <4 ps, over a number of pseudorandom bit sequence patterns (2^7-1, 2^15-1, and 2^31-1).
-
A 44 Gbps PAM-4 Transmitter with Resistance Feedback 4:1 MUX in 65nm CMOS
Ziqiang Wang,Dengjie Wang,Xin Wu,Jiawei Wang,Hao Xu,Chun Zhang,Hong Chen,Zhihua Wang
DOI: https://doi.org/10.1109/icsict55466.2022.9963444
2022-01-01
Abstract:This paper presents a power-efficient four-level pulse amplitude modulation (PAM-4) transmitter with 4-tap feed-forward equalizer (FFE) SST driver. A resistance feedback 4:1 MUX is proposed to overcome the bandwidth limit in the last MUX node. The bandwidth is improved more than twice and the jitter is reduced from 7.07ps to 381fs. An absolute delay-based clock distribution network is designed to ensure the timing of the 4:1 MUX. The PAM-4 transmitter is fabricated in 65nm CMOS process and occupies 1.53mm×1.35mm. It delivers a 1.2Vppd PAM-4 signal with RLM of 99% under 1.2V supply and achieves an energy efficiency of 2.27pJ/bit at 44Gbps.
-
6.7 A 128Gb/s PAM-4 Transmitter with Programmable-Width Pulse Generator and Pattern-Dependent Pre-Emphasis in 28nm CMOS
Kai Sheng,Weixin Gai,Zeze Feng,Haowei Niu,Bingyi Ye,Hang Zhou
DOI: https://doi.org/10.1109/isscc42615.2023.10067407
2023-01-01
Abstract:The ever-growing demands for high-bandwidth communications continuously push wireline links to operate at higher speeds. Recently reported transmitters (TXs) have achieved a data rate of more than 100Gb/s [1–6]. PAM-4 modulation, which doubles the data rate at the same symbol rate, has been widely adopted to make use of the link bandwidth more efficiently. However, the complex transitions introduce greater data-dependent jitter, decreasing the horizontal eye-opening. In addition, the transitions between non-adjacent levels bring about twice or three times inter-symbol interference (ISI) compared with transitions between adjacent levels, resulting in reduced vertical eye-opening. Although a feed-forward equalizer (FFE) can be used to mitigate these issues, it is usually implemented in a de-emphasis manner in PAM-4 TXs, which reduces the output swing and lowers the signal-to-noise ratio. The proposed TX incorporates a pulse generator with programmable width for optimizing transition edges and a pattern-dependent pre-emphasis scheme that performs equalization without sacrificing output swing.
-
Novel Wavelength Multiplexer Using (N + 1) × (N + 1) Arrayed Waveguide Grating and Polarization-Combiner-Rotator on SOI Platform
Jun Zou,Xiao Ma,Xiang Xia,Changhui Wang,Ming Zhang,Jinhua Hu,Xuyang Wang,Jian-Jun He
DOI: https://doi.org/10.1109/JLT.2021.3053837
IF: 4.7
2021-01-01
Journal of Lightwave Technology
Abstract:We propose an ultra-compact novel wavelength multiplexer employing a (N + 1) × (N + 1) arrayed waveguide grating (AWG) and a polarization-combiner-rotator (PCR) on the SOI platform, to realize a multiplexing for 2N wavelengths with a spacing of Δλ. The (N + 1) × (N + 1) AWG works at a bidirectional way to provide two groups of N × 1 wavelength multiplexing with each group having a channel spacing of 2×Δλ, and the central wavelengths of all input channels in one group have a wavelength shift of Δλ with respect to those in the other group. The double channel spacing results in a significant decrease on the footprint of the (N + 1) × (N + 1) AWG-based multiplexer compared with a conventional 2N × 1 AWG multiplexer with the same wavelength spacing Δλ. Due to the fact that a single mode fiber is insensitive to the polarization of input light, if we consider short reach datacom applications such as 100/400 GbE, the two separate multiplexing outputs of the (N + 1) × (N + 1) AWG can be combined as one output with one half wavelengths working at TE polarization and the other at TM polarization by employing a low loss and broadband PCR. In the experiment, we demonstrate a 16 × 200 GHz multiplexer based on a 9 × 9 AWG. The experimental results show that the on-chip loss of the fabricated multiplexer is 2.7 dB and the loss uniformity is 0.5 dB. The 1-dB and 3-dB bandwidths are >0.56 nm (i.e., 35% of the wavelength spacing) and >1.1 nm (i.e., 69% of the wavelength spacing), respectively. They can also be further increased by decreasing the gap between adjacent input waveguides at the interfaces of star couplers of the designed AWG without inducing an excess loss. The proposed multiplexer has great potential for application to future super large capacity (> Tb/s) data transmission systems.
-
A 5-50 Gb/s Quarter Rate Transmitter with a 4-Tap Multiple-Mux Based FFE in 65 nm CMOS
Xuqiang Zheng,Chun Zhang,Fangxu Lv,Feng Zhao,Shigang Yue,Ziqiang Wang,Fule Li,Zhihua Wang
DOI: https://doi.org/10.1109/esscirc.2016.7598303
2016-01-01
Abstract:This paper presents a 5-50 Gb/s quarter-rate transmitter with a 4-tap feed-forward equalization (FFE) based on multiple-multiplexer (MUX). A bandwidth enhanced 4:1 MUX with the capability of eliminating charge-sharing effect is proposed to increase the maximum operating speed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array associated with an interleaved-retiming technique is designed. Implemented in 65 nm CMOS technology, the transmitter occupying an area of 0.6 mm2 achieves a maximum data rate of 50 Gb/s with an energy efficiency of 3.1 pJ/bit.
-
A 116-Gb/s PAM4 0.9-pJ/b Transmitter With Eight-Tap FFE in 5-nm FinFET
Yevgeny Perelman,Zeev Toroker,Daljeet Kumar,Eran Maday,Noam Familia,Tzachi Carbone,Gal Kidron,Idan Mizrahi,Yoni Landau,Rushdy Saba,Yaakov Goldberg,Alon Meisler
DOI: https://doi.org/10.1109/jssc.2024.3351372
IF: 5.4
2024-01-01
IEEE Journal of Solid-State Circuits
Abstract: This article presents a 116-Gb/s PAM4 voltage-mode (VM) transmitter (TX). The TX includes a 4:1-multiplexed 7-bit digital-to-analog converter (DAC) driver with an eight-tap feedforward equalizer (FFE). A high energy efficiency of 0.9 pJ/bit was achieved by novel data and clock path architectures that operate at up to 14.5 GHz. In the data path, the serializer is based mainly on MUXes that are biased in an unregulated low voltage supply of 0.75 V. Since high-quality clocks are not needed in the data path, the buffer loading of the clocks can be reduced. In the clock path, 1-unit interval (UI) pulse generation is formed to sample the data inside the 4:1 multiplexer (MUX) driver. It is shown that the driver is compatible with the IEEE802.3-ck and OIF CEI-112G-LR PAM4 standards. The TX was fabricated in a TSMC 5-nm FinFET node and occupies an area of 0.082 mm2 (300 µm × 272 µm).
engineering, electrical & electronic
-
FPGA-based 4 × 29.4912 Gbit/s PS-PAM4 signal transmission with a low-complexity probabilistic shaping scheme.
Kaihui Wang,Long Zhang,Yu Chen,Yikai Wang,Chen Wang,Yun Chen,Jianjun Yu
DOI: https://doi.org/10.1364/OL.484599
IF: 3.6
2023-01-01
Optics Letters
Abstract:In this experiment, we demonstrate a real-time intensity modulation and direct detection (IM/DD) system based on a field programmable gate array (FPGA). For high-speed parallel signal processing, we propose and implement the simplified parallel-constant modulus algorithm (CMA) and decision-directed least mean square (DDLMS) equalizers with low complexity and low latency. Moreover, the bit-class probabilistic shaping (PS) scheme is adopted with very few hardware resources. The digital signal processing (DSP) steps are implemented in the XCVU9P-FLGB2104-2-I Xilinx FPGA with a clock frequency of 230.4 MHz. Based on the experimental results, 4 × 29.4912 Gbit/s PS-pulse amplitude modulation (PAM4) signals can be successfully transmitted over 25 km of standard single-mode fiber (SSMF) while satisfying the hard-decision forward error correction (HD-FEC) threshold at 3.8 × 10⁻³. Compared with the uniformly distributed PAM4 signal, the low-complexity PS scheme can improve the receiver sensitivity by more than 1 dB.
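For readers unfamiliar with decision-directed LMS, the following is a minimal textbook sketch of a serial DD-LMS equalizer on PAM-4 symbols. It is not the simplified parallel scheme of the paper: the two-tap test channel, the step size `mu`, the tap count, and all names are assumptions chosen only for illustration.

```python
import random

PAM4_LEVELS = (-3.0, -1.0, 1.0, 3.0)


def slice_pam4(y: float) -> float:
    """Return the PAM-4 level nearest to y (hard decision)."""
    return min(PAM4_LEVELS, key=lambda level: abs(level - y))


def dd_lms(received, num_taps=3, mu=0.005):
    """Serial decision-directed LMS; returns (final taps, squared errors)."""
    taps = [1.0] + [0.0] * (num_taps - 1)  # start as a pass-through filter
    delay_line = [0.0] * num_taps
    sq_errors = []
    for sample in received:
        delay_line = [sample] + delay_line[:-1]
        y = sum(w * x for w, x in zip(taps, delay_line))
        error = slice_pam4(y) - y  # the hard decision acts as the reference
        taps = [w + mu * error * x for w, x in zip(taps, delay_line)]
        sq_errors.append(error * error)
    return taps, sq_errors


# Hypothetical noise-free channel: main cursor plus one post-cursor of 0.25.
rng = random.Random(0)
symbols = [rng.choice(PAM4_LEVELS) for _ in range(3000)]
received = [s + 0.25 * (symbols[i - 1] if i > 0 else 0.0)
            for i, s in enumerate(symbols)]

taps, sq_errors = dd_lms(received)
early = sum(sq_errors[:100]) / 100
late = sum(sq_errors[-100:]) / 100
print(f"taps={[round(w, 3) for w in taps]}")
print(f"early MSE={early:.4f}, late MSE={late:.4f}")
```

Because the assumed channel leaves the eye open before equalization, the initial decisions are already correct and the loop behaves like ordinary LMS; on a closed eye a blind stage such as CMA would typically run first, which is presumably why the abstract pairs the two algorithms.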
-
6.3 A 0.43pJ/b 200Gb/s 5-Tap Delay-Line-Based Receiver FFE with Low-Frequency Equalization in 28nm CMOS
Bingyi Ye,Guangdong Wu,Weixin Gai,Kai Sheng,Yandong He
DOI: https://doi.org/10.1109/isscc42615.2023.10067348
2023-01-01
Abstract:The ever-increasing demand for greater I/O bandwidth has pushed the transceiver data rate to 200Gb/s [1]. At this rate, the implementation of decision-feedback equalizers faces severe timing constraints. Discrete-time feed-forward equalizers (FFEs) in receivers (RXs) break the timing loop and compensate for electrical and optical impairments [2–3]. However, they rely on accurate, multiphase, and high-speed sampling clocks. The RX FFEs implemented in the continuous-time domain use active [4–5] or passive [5–6] delay lines, which eliminate the clock and the interleaved sample-and-hold circuits. In addition, the continuous-time FFE preserves edge information and therefore supports oversampling clock and data recovery (CDR). This paper presents a 5-tap delay-line-based receiver FFE operating at 200Gb/s and equalizing a 17.2dB-loss channel.
-
FPGA Implementation for 44.2368-Gbit/s PAM8 Signal Transmission with Pruned Pre-Equalization.
Yikai Wang,Yu Chen,Long Zhang,Sicong Xu,Kaihui Wang,Jianjun Yu
DOI: https://doi.org/10.1364/ol.498579
IF: 3.6
2023-01-01
Optics Letters
Abstract:In this experiment, we demonstrated an intensity modulation and direct-detection (IM/DD) system based on a field-programmable gate array (FPGA). The PAM8 signals are successfully delivered at 44.2368 Gbit/s over a 45-km standard single-mode fiber (SSMF), satisfying the soft-decision forward error correction (SD-FEC) criterion of 2.4 × 10⁻², and the net bit rate may reach 36.864 Gbit/s without the need of optical amplifiers. At the transmitter, we used a pruned pre-equalization algorithm to process the PAM8 signals, and the high-speed parallel PAM8 signals were processed at the receiver using 64 parallel constant modulus algorithm (CMA) and decision-directed least mean square (DD-LMS) equalizers. Additionally, we analyzed the bit error rate (BER) performance of the DD-LMS equalizer in FPGA simulation with various data lengths, both before and after equalization. © 2023 Optica Publishing Group
-
A PAM-4 adaptive analog equalizer with decoupling control loops for 25-Gb/s CMOS serial-link receiver
Shunbin Li,Peng Liu,Weidong Wang,Xing Fang,Dong Wu,Xiang-Hui Xie
DOI: https://doi.org/10.1109/SOCC.2015.7406950
2015-01-01
Abstract:PAM-4 signaling is an effective solution for high-speed CMOS serial-link transceivers, but it suffers from the difficulty of signal regeneration in the analog front-end owing to its multi-level characteristics. An adaptive analog equalizer with decoupling control loops is proposed to address the nonlinearity of amplifiers. A low-frequency gain invariant equalizer and a golden signal generator are designed to serve the boost and swing control loops, respectively. An integrating charge pump is employed to improve the convergence performance of the receiver. Transistor-level simulation results show that the proposed adaptive analog equalizer in 40-nm CMOS technology can recover 25 Gb/s random data transmitted over a 29.8-inch Megtron6 printed circuit board (PCB) copper channel.
-
An Output Bandwidth Optimized 200-Gb/s PAM-4 100-Gb/s NRZ Transmitter With 5-Tap FFE in 28-nm CMOS
Zhongkai Wang,Minsoo Choi,Kyoungtae Lee,Kwanseo Park,Zhaokai Liu,Ayan Biswas,Jaeduk Han,Sijun Du,Elad Alon
DOI: https://doi.org/10.1109/jssc.2021.3109562
2022-01-01
Abstract:This article presents a 200-Gb/s pulse amplitude-modulation four-level (PAM-4) and 100-Gb/s non-return-to-zero (NRZ) transmitter (TX) in 28-nm CMOS technology. To achieve the target data rate, the output bandwidth and swing of the proposed TX are optimized by minimizing the output capacitance of the 4:1 multiplexer (MUX) and driver stage with pull-up current sources and adopting a fully reconfigurable 5-tap feed-forward equalizer (FFE). The key circuit includes a segmented 8:4 MUX and 4:1 MUX/driver, a thermal encoder and retimer, and a flexible clock distribution network. Using the layout generated with Berkeley Analog Generator (BAG), the proposed TX achieves an eye opening with >52.9-mV eye height, 0.36 UI eye width, >98% RLM, and 4.63 pJ/b at 200-Gb/s PAM-4 signaling under >6-dB channel loss at 50 GHz, demonstrating the highest data rate achieved using a planar process.
engineering, electrical & electronic
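A transmit FFE of the kind named in this abstract is, functionally, an FIR filter applied to the symbol stream. The sketch below shows that operation only; it says nothing about the paper's circuit. The 5-tap coefficient values, the choice of one pre-cursor tap, and the function name `apply_ffe` are hypothetical.

```python
def apply_ffe(symbols, taps, pre_cursors=1):
    """Apply an FIR feed-forward equaliser to a symbol sequence.

    taps[pre_cursors] is the main cursor; earlier entries are pre-cursor
    taps and later entries are post-cursor taps. Symbols outside the
    sequence are treated as zero.
    """
    n = len(symbols)
    out = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(taps):
            # k < pre_cursors looks ahead, k > pre_cursors looks back
            j = i + pre_cursors - k
            if 0 <= j < n:
                acc += c * symbols[j]
        out.append(acc)
    return out


# Hypothetical coefficients: 1 pre-cursor, main cursor, 3 post-cursors.
taps = [-0.05, 0.75, -0.15, -0.03, -0.02]

# A single unit symbol reproduces the tap vector, shifted by the pre-cursor.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
response = apply_ffe(impulse, taps)

# Example PAM-4 sequence using levels -3, -1, +1, +3.
pam4 = [-3.0, -3.0, 3.0, 3.0, -1.0, 1.0, -3.0, 3.0]
print(apply_ffe(pam4, taps))
```

The example taps are chosen so that their absolute values sum to 1, a common normalisation that keeps the equalised output within the driver's full-scale swing.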
-
A 1.41-pJ/b 56-Gb/s PAM-4 Receiver Using Enhanced Transition Utilization CDR and Genetic Adaptation Algorithms in 7-nm CMOS
Behzad Dehlaghi,Shayan Shahramian,Joshua Liang,Ryan Bespalko,Dustin Dunwell,James Bailey,Bo Wang,Alireza Sharif-Bakhtiar,Michael O'Farrell,Kerry Tang,Anthony Chan Carusone,David Cassan,Davide Tonietto
DOI: https://doi.org/10.1109/lssc.2019.2938677
2019-01-01
IEEE Solid-State Circuits Letters
Abstract:This letter presents a 56.25-Gb/s analog-mixed signal pulse amplitude modulation (PAM)-4 receiver in 7-nm fin field effect transistor (FinFET) CMOS. The receiver uses an analog front-end (AFE) with extensive programmability and can equalize channels with up to 22.3-dB loss at 14 GHz. AFE settings are optimized using a genetic adaptation algorithm to find the global minima for the bit-error-rate (BER). A PAM4 clock recovery scheme is proposed that reduces the number of required edge samplers for a PAM-4 bang-bang phase detector without degrading the jitter tolerance of the receiver. Using an AFE as opposed to a decision-feedback equalizer (DFE) along with the proposed clock recovery scheme results in low energy consumption of 1.41 pJ/bit.
-
A 40-Gb/s PAM-3 Receiver With Modified Summer-Merged Slicers and PRTS Checker
Jhe-En Lin,Yi-Hao Lan,Shen-Iuan Liu
DOI: https://doi.org/10.1109/tvlsi.2024.3393896
2024-07-26
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract:This article presents a 40-Gb/s (25.6 GBaud) quarter-rate receiver utilizing three-level pulse amplitude modulation (PAM). The continuous-time linear equalizer (CTLE) with a passive high-pass filter provides a boosting gain of 13 dB at 12.8 GHz. A two-tap data decision feedback equalizer (DFE) and a one-tap edge DFE are included. The phase detector (PD) logic directly controls the digitally controlled oscillator (DCO) to reduce the loop latency. This receiver is fabricated by a 28-nm CMOS process and its area is 0.12 mm2. By using a pseudorandom ternary sequence (PRTS) of , this 40-Gb/s receiver compensates the channel loss up to 23-dB loss with a bit error rate (BER) . The total power consumption of this receiver is 90 mW at 40-Gb/s, which achieves an FoM of 98 fJ/bit/dB.
engineering, electrical & electronic,computer science, hardware & architecture
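The figure of merit quoted in this abstract can be reproduced from the other stated numbers, assuming it is defined as power divided by the product of data rate and compensated channel loss. The abstract does not spell out the definition, so that is an assumption here, as are the function names.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ from power in mW and data rate in Gb/s."""
    # mW / (Gb/s) = 1e-3 W / (1e9 bit/s) = 1e-12 J/bit = 1 pJ/bit
    return power_mw / rate_gbps


def fom_fj_per_bit_per_db(power_mw: float, rate_gbps: float, loss_db: float) -> float:
    """Assumed FoM: energy per bit (in fJ) divided by channel loss in dB."""
    return energy_per_bit_pj(power_mw, rate_gbps) * 1000.0 / loss_db


energy = energy_per_bit_pj(90.0, 40.0)          # 2.25 pJ/bit
fom = fom_fj_per_bit_per_db(90.0, 40.0, 23.0)   # about 97.8 fJ/bit/dB
bits_per_symbol = 40.0 / 25.6                   # 1.5625 bits per PAM-3 symbol
print(f"{energy:.2f} pJ/bit, {fom:.1f} fJ/bit/dB, {bits_per_symbol:.4f} bit/symbol")
```

Under this definition the result rounds to the stated 98 fJ/bit/dB. The 1.5625 bits per symbol implied by 40 Gb/s at 25.6 GBaud sits just below the log2(3) ≈ 1.585 limit of three-level signalling.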
-
A Controller PHY for Managed DRAM Solution With Damping-Resistor-Aided Pulse-Based Feed-Forward Equalizer
Hyeongjun Ko,Mino Kim,Hyunkyu Park,Sangyoon Lee,Jaewook Kim,Suhwan Kim,Joo-Hyung Chae
DOI: https://doi.org/10.1109/jssc.2021.3062876
2021-08-01
Abstract:A controller PHY for high-capacity DRAM is presented. To reduce precursor and postcursor intersymbol interference due to its dispersive channel characteristics and a heavy load of many DRAM chips and to attenuate reflection on a highly reflective command/address (C/A) channel, a damping-resistor-aided three-tap pulse-based feed-forward equalizer (PB-FFE) is introduced. An appropriate damping resistance can attenuate reflection, and the PB-FFE compensates for increased insertion loss due to the damping resistor. In addition, the current flows only before and after a signal transition in the PB-FFE, improving energy efficiency and maintaining the turn-ON resistance during the no-transition region. A controller PHY based on this equalizer was fabricated in a 55-nm CMOS process. The PB-FFE increases the timing margin of the C/A signal from 0.23 to 0.29 UI at 1067 Mb/s. At 2133 Mb/s, the read timing and voltage margins of the DQ signal are 0.53 UI and 211 mV after read training, and its write margin is 0.72 UI and 230 mV, respectively, after write training.
engineering, electrical & electronic