ABSTRACT

Title of dissertation: SUPERCONDUCTING LOGIC CIRCUITS OPERATING WITH RECIPROCAL MAGNETIC FLUX QUANTA

Oliver Timothy Oberg, Doctor of Philosophy, 2011

Dissertation directed by: Professor Fred Wellstood, Department of Physics

Complementary Metal-Oxide Semiconductor (CMOS) technology is expected to soon reach its fundamental limits of operation. The fundamental speed limit of about 4 GHz has already effectively been sidestepped by parallelization. This increases raw processing power but does nothing to improve power dissipation or latency. One approach for increasing computing performance involves using superconducting digital logic circuits. In this thesis I describe a new kind of superconducting logic, invented by Quentin Herr at Northrop Grumman, which uses reciprocal pairs of quantized single magnetic flux pulses to encode classical bits. In Reciprocal Quantum Logic (RQL) the data is encoded in integer units of the magnetic flux quantum. RQL gates operate without the bias resistors of previous superconducting logic families and dissipate several orders of magnitude less power. I demonstrate the basic operation of key RQL gates (AndOr, AnotB, Set/Reset) and show their self-resetting properties. Together, these gates form a universal logic set and provide memory capabilities. Experiments measuring the bit error rate (BER) of the AndOr gate extrapolated to a minimum BER of $10^{-480}$ and a BER of $10^{-44}$ with 30% margins on flux biasing.

I describe an analytic timing model for RQL gates which demonstrates the self-correcting timing features. From this model I derive equations for the timing behavior and operating limits. Using this timing model I ran simulations to determine correction factors for more accurate predictions at higher frequencies. Using these results, I also develop Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL) models to describe the combinational logic of RQL gates.
To test the predictions of the timing model, I performed three experiments on Nb/AlOx/Nb circuits at 4.2 K. The first measured the time of output. The second measured the operating margins of the circuit. The third measured the maximum frequency of operation for RQL circuits. Together, these three experiments showed quantitative agreement with the model for the timing output, qualitative agreement with the limits of operation, and a projected speed limit of 50 GHz for the Hypres 4.5 kA/cm$^2$ process.

To power RQL circuits I describe a new design for power splitters and combiners which minimize standing waves. I describe a new kind of Wilkinson power splitter which required numerical optimization but proved to be adequate. I experimentally tested two new designs of the power splitter. Both showed less than 10% variation in standing waves between power splitter and combiner, making them adequate for RQL circuits. I also compared these results with the S-parameters of the power network, which also indicated that the design was adequate for RQL circuits.

Finally, I tested an 8-bit Kogge-Stone architecture carry-look ahead adder designed using VHDL models. The adder contained 815 Josephson junctions and was fully functional at 6.21 GHz with a latency of 1.25 clock cycles. The adder produced the correct logical output, had a measured optimal operating point within 8% of the optimal simulated operating point, and had measured power margins of 1 dB. It operated best at the designed clock amplitude of $0.88\,I_c$ and dissipated 0.570 mW of power.

SUPERCONDUCTING LOGIC CIRCUITS OPERATING WITH RECIPROCAL MAGNETIC FLUX QUANTA

by Oliver Timothy Oberg

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2011

Advisory Committee:
Professor Frederick Wellstood
Professor Anna Herr
Professor Christopher Lobb
Professor James Anderson
Dr. Benjamin Palmer
Professor Chris Davis

© Copyright by Oliver Timothy Oberg 2011

Enjoy the little things in life, for one day you may look back and realize they were the big things. - Robert Brault

Acknowledgments

I have had a harder time properly giving thanks to the many people who made this thesis possible than writing the whole rest of the thesis. I will try to give acknowledgements as best I can in as short a space as possible, and trust that those whom I may not have explicitly mentioned know they have more gratitude than I can properly express.

First and foremost I'd like to thank Dr. Anna Herr, my research advisor first at UMCP and then at Northrop Grumman. Her guidance and patience are the foundation not only of my thesis but of my academic success so far. She has not only given me fantastic research opportunities beyond any I expected to see in graduate school but also taken a personal interest in my work and education. Even at the busiest times she never failed to help me through a problem and I never had to wait in idle frustration.

I also owe a great amount of appreciation to my academic advisor, Professor Fred Wellstood at UMCP. He was willing to step in for me at the university to take care of all University-related issues, and went above and beyond in helping me review and edit this thesis. His input has been both quite helpful and educational.

Dr. Quentin Herr at Northrop Grumman has been as much a mentor to me as Anna or Fred. He has worked with me daily for almost three years and has been instrumental in my understanding of superconductivity and microwave physics. This thesis would not have been possible without his support, patience, and guidance.

There are many other individuals who have not only helped me greatly as a graduate student but have contributed their ideas and work efforts to the work in this thesis. At Northrop Grumman, John Fusco has been pivotal in setting up my graduate studies here.
Stephen Van Campen has likewise been a fantastic supporter in management without whom very little could have been accomplished. Dr. James Baumgartner and Dr. Aaron Pesetski have been fantastically helpful and supportive, always happy to explain concepts, provide feedback, and share a joke. Dr. Ofer Naaman has been part of the same work efforts as I have and has always been willing to help bridge the gaps in my understanding of microwave behavior and superconductivity. Steven Shauck has been a wonderful tutor to me in all things VHDL. He's probably forgotten more about VHDL than I will ever know, but has always been happy to find time in his very busy schedule to teach me about it. Alex Ioanniadis, who has since gone off to graduate school himself, was a fantastic lab partner and experimentalist who taught me much of what I know about running experiments. Donald Miller has been a constant resource of knowledge and wisdom, always ready to help me hash out new and odd ideas and nitpick the details of old ideas.

Many people have contributed directly to the results in this thesis. Quentin Herr and Alex Ioanniadis performed the measurements shown in Figures 2.18 and 2.19. Ofer Naaman performed the numerical optimization of the design shown in Figure 4.10 and made the CAD layout of the same device as seen in Figure 4.17. Pavel Borodulin at Northrop Grumman supplied the analysis of the probe shown in Figure C.2. Steven Shauck supplied the final design of the adder of Chapter 6. Dr. John Pryzbysz at Northrop Grumman supplied the idea of using the spectrum analyzer to measure the side-band power of the CLA.

Although the bulk of my research was done at Northrop Grumman in Baltimore, most of my education was done at College Park, where there are a number of people I have to thank for their friendship, support, and camaraderie. I have to thank Dr. Rupert Lewis, now also at Northrop Grumman, and Dr.
Gus Vlahacos for their kindness and support while I was starting my stint as a research assistant. Many thanks to Professor Ellen Williams for years of guidance, listening when others were busy or away.

I cannot in good conscience fail to mention the handful of people outside of work and outside of the university who gave me emotional support and friendship. I am lucky to have friends in almost every one of the 50 states and in more countries than I can remember off the top of my head. But a special few never wavered from complete and permanent support in all parts of life, and without them my life couldn't have moved forward, let alone would I have been able to write this thesis.

Table of Contents

List of Tables
List of Figures
List of Abbreviations

1 Introduction to Superconductivity and Josephson Junctions
  1.1 Overview
  1.2 Superconductivity
  1.3 Josephson Junctions
  1.4 Superconducting Interferometers
  1.5 Introduction to Superconducting Digital Logic
2 Reciprocal Quantum Logic
  2.1 Introduction
  2.2 Josephson Transmission Line
  2.3 Logic Gates
  2.4 Composite Logic Gates
  2.5 Fabrication and Equipment
  2.6 Experimental Verification
  2.7 Summary
3 Combinational Gates
  3.1 Introduction
  3.2 Junction Switching Time under AC Bias Current
  3.3 Timing Extraction from Simulation
  3.4 VHDL Models for RQL Gates
4 Power Network Design
  4.1 Introduction
  4.2 Circuit Design
  4.3 Standalone Test
  4.4 Test with RQL Circuits
  4.5 Conclusions
5 Experimental Verification of RQL Timing Parameters
  5.1 Introduction
  5.2 Circuits and Simulation for Experiments 1, 2, 3
  5.3 Experimental Setup
  5.4 Data and Analysis
  5.5 Conclusions
6 Carry-Look Ahead Adder Experiment
  6.1 Introduction
  6.2 Circuit Design
  6.3 Experimental Setup
  6.4 Experimental Results
  6.5 Conclusions
7 Summary and Conclusions
  7.1 Summary
  7.2 Conclusions and Future Work
  7.3 Final Words
Appendices
A Numerical Solution of the Sine-Gordon Equation
B Parameters for fits
  B.1 Timing Extraction Results for the JTL
  B.2 Comparison of Threshold Values in Timing Extraction
  B.3 Simulation File for Timing Extraction
C Wilkinson Power Splitter Response Parameters
  C.1 Derivation of Impedance Values
  C.2 HPI Probe Internal Reflections
  C.3 Netlist for simulation of S-Parameters
D Fitting Functions for Race Circuit Experiments
  D.1 Two-Output Fitting Code
  D.2 Fit to Experiment 2 Data
  D.3 And-Output Fitting Code for gnuplot
  D.4 Calculation of Depressed $I_c R_N$ Product
E Hypres Fabrication Summary
F Spice Netlist of CLA
Bibliography

List of Tables

2.1 Universal Logic Test
3.1 Comparison of different $J_c$, $I_c R_N$ and switching time $t_0$
3.2 Timing Fit Results
3.3 Global VHDL Quantities
3.4 Truth table for AndOr and AnotB in VHDL
3.5 Truth table for Set/Reset in VHDL
3.6 Truth table for a JTL in VHDL
4.1 Impedance values for the Wilkinson splitter stages
4.2 Resistance values in Wilkinson power splitter
4.3 Chip resonance lengths for frequencies $f$ of interest
5.1 Operational bias conditions for N22TE
5.2 Fitting parameters of two-output circuit data
5.3 Summary of measurements of $P_{\mathrm{in}}$ and $V_{\mathrm{p-p}}$ in N22TE
5.4 Analysis of the long, deep shift register from N22TE
6.1 Expected CLA output pattern for two cyclic input sequences
6.2 Power Measurement Calculations
B.1 Extracted JTL Timing Parameters
B.2 Extraction of JTL Timing Parameters (polynomial fit)
B.3 Extraction of AndOr OR output timing parameters
D.1 Fitting parameters of and-output circuit data
D.2 Alternative switching time calculation
E.1 Hypres fabrication design specifications

List of Figures

1.1 Green's Functions $I_p$ and $I_q$ for Josephson Junction at $T = 0$
1.2 Superconductor-Insulator-Superconductor tunneling I-V Curve
1.3 Superconductor-Insulator-Superconductor Tunneling
1.4 Equivalent electrical circuit of a Josephson Junction in the RSJ model
1.5 Phase Diagram of Josephson Junction
1.6 Josephson junction potential energy
1.7 Voltage vs time dynamics of overbiased junction
1.8 I-V curve of current driven junctions
1.9 Single-junction interferometer
1.10 Single-junction interferometer
1.11 Josephson Transmission Line
1.12 Phase behavior of junction in JTL
2.1 Basic RQL Interconnect Element
2.2 Josephson transmission line and SFQ launch circuit diagram
2.3 Data propagation in an RQL 4-phase clock transmission line
2.4 Deep Pipeline JTL
2.5 RQL Logic Gates
2.6 Set/Reset Unit
2.7 RQL Exclusive-OR Gate
2.8 Non-Destructive Read-Out Gate
2.9 RQL Clock Line Transformer Layout
2.10 RQL Clock Line Transformer with DC Bias
2.11 Schematic of test probe
2.12 Layout of Monrovia 20 RQL chip
2.13 Monrovia 20 logic chip
2.14 Experimental setup for timing experiments
2.15 Oscilloscope output
2.16 Logic Test of Basic RQL Gates
2.17 Power schematic for RQL
2.18 Power Dissipation Measurements
2.19 Bit Error Rate
3.1 Junction phase delay versus starting junction phase
3.2 Self-correcting timing mechanism of RQL
3.3 Relationship between input time on consecutive phases
3.4 Switching delay versus input phase for different clock frequencies
3.5 Phases of two sequential junctions during switching
3.6 Data path through AndOr gate
3.7 Circuits used to extract RQL timing results from spice simulations
3.8 Fit of delay equation to simulated switching times
3.9 Simulated delay versus input phase
3.10 Comparison of Extracted Timing Curves
3.11 Simulated timing data for the JTL at 13 GHz
3.12 Timing model for RQL clock
3.13 Combinational logic of RQL gates
3.14 AndOr gate VHDL code
3.15 Combinational behavior of the JTL in VHDL
4.1 Wilkinson Power Splitter
4.2 Schematic of the Wilkinson power splitter (1221 configuration)
4.3 Even and Odd mode analysis of the Wilkinson Power Splitter
4.4 Wilkinson 1221 Simulated Reflection Parameters
4.5 Circuit schematic for Wilkinson 4440 configuration
4.6 Circuit schematic for WPS2220
4.7 Geometric versus max flat power splitter reflections
4.8 Geometric power splitter isolation
4.9 Isolation parameter measurement
4.10 Circuit schematic for N23PS
4.11 Wilkinson 3111 Simulated S-Parameters
4.12 Block diagram for measuring standing currents
4.13 Simulated standing wave currents
4.14 Standing Waves in Wilkinson 3111 Power Network
4.15 Experimental setup for measurement of S-parameters
4.16 M20PS even mode test
4.17 Microphotograph of Norwalk 23
4.18 Measured parameters of geometric power splitter
4.19 Simulated reflection for the N23PS circuit
4.20 Measured S-parameters on N21CLA Wilkinson power splitter
4.21 S-parameters from ADS for N23PS
4.22 Odd mode test block diagram for N23PS
4.23 Wilkinson-powered RQL circuit measurements of N23PS
5.1 Microphotograph of N22TE
5.2 Block diagram and layout of N22TE
5.3 Input and output phases versus time for the two-output race circuit
5.4 Two-output race circuit timing predictions
5.5 Operational space of N22TE
5.6 Long, deep pipeline shift register
5.7 Experimental setup for timing experiments
5.8 Two-output race circuit measured data
5.9 And-output race circuit data
5.10 Multi-Phase Shift Register Amplitude Margins
6.1 Photo of N21CLA
6.2 Carry-Look Ahead elements
6.3 Generic Kogge-Stone CLA Architecture
6.4 Final Carry-Look Ahead Adder design
6.5 Block diagram of experimental setup for N21CLA
6.6 Shift register input for CLA
6.7 Measured CLA Output
6.8 Power margins for CLA
6.9 Measured power spectrum of Carry-Look Ahead Adder
6.10 Modulation of Clock Signal by RQL Gate Operation
B.1 Comparison of Threshold Values
C.1 Simulated Probe PCB Losses
C.2 S-Parameter Test Circuits
D.1 Curve fitting to Experiment 2

List of Abbreviations

ADS         Advanced Design System 2009 software
BCS         Bardeen-Cooper-Schrieffer
BER         Bit Error Rate
CLA         Carry Look Ahead
CMOS        Complementary metal-oxide-semiconductor
GHz         Gigahertz
HPD         High Precision Devices
IREAP       Institute for Research in Electronics and Applied Physics
JJ          Josephson Junction
JTL         Josephson Transmission Line
NSA         National Security Agency
NRZ         Non-Return to Zero
PCB         printed circuit board
ps          picosecond
RQL         Reciprocal Quantum Logic
RSFQ        Resistive Single Flux Quantum
RSJ         Resistively Shunted Junction
RZ          Return to Zero
SFQ         Single Flux Quantum
SIS         Superconductor-Insulator-Superconductor
SNS         Superconductor-Normal Metal-Superconductor
SQUID       Superconducting QUantum Interference Device
std_ulogic  Synopsys extension to IEEE 1164, a VHDL class
VHDL        VHSIC Hardware Description Language
VHSIC       Very High Speed Integrated Circuit
VLSI        Very Large Scale Integration
?           Clock phase (? = ? t in most cases)
?(?)        Analytic Timing Model equation
?           Phase across Josephson junction

List of Samples

M20LT   Monrovia 20 Logic Test
M20SR   Monrovia 20 Shift Register
M20PS   Monrovia 20 Power Splitter
N23PS   Norwalk 23 Power Splitter
N22TE   Norwalk 22 Timing Experiment
N21CLA  Norwalk 21 Carry Look Ahead Adder

Chapter 1
Introduction to Superconductivity and Josephson Junctions

1.1 Overview

Nearly 200 years passed between Franklin's discovery of electricity and the development of the first electronic computer [1, 2]. Superconductivity was discovered in 1911 by Heike Kamerlingh Onnes [3] and only began to impact computing about 50 years later [4]. In 1962 Brian David Josephson postulated the Josephson effect, which would lead to the invention of the dc SQUID two years later [5]. By the mid-1980s IBM terminated a major effort to build a computer using superconductivity.
Shortly thereafter Josephson junctions began being considered for reversible computation [6] and used in Resistive Single Flux Quantum digital circuits [4]. Now, one century after the discovery of superconductivity, contemporary semiconductor-based computation seems to be approaching a fundamental limit [7], and this raises the possibility that a new generation of computers based on superconductivity and Josephson junctions may arise to push technology forward [8].

One technology potentially capable of pushing computation forward is Reciprocal Quantum Logic (RQL), the subject of my research over the past few years. The goal of this research was to demonstrate the feasibility of using RQL for very high speed and very low power computation.

This thesis has three main parts. First, I provide a basic overview of superconductivity and Josephson junctions (Chapter 1). This overview is far from exhaustive but serves to highlight aspects that are most important to the subsequent discussion of RQL. The first part concludes with an introduction to RQL (Chapter 2), which is where my own work begins. Next, in the second part, I describe my research into the behavior of junctions using high-level simulations in Very High Speed Integrated Circuit Hardware Description Language (VHDL). In Chapter 3, I derive an analytic model for the timing behavior of Josephson junctions in RQL circuits. After I verify this model in simulations, I proceed to cast RQL into the industry-standard VHDL. Chapter 4 is a departure from the previous topics and describes the development of a new power network for RQL, but together Chapters 2-4 provide the basis for design of functioning circuits. In the third part, I describe my experiments testing the timing behavior (Chapter 5) and a fully functioning 8-bit adder (Chapter 6). Finally, in Chapter 7, I conclude with a brief summary of my main results and make some suggestions for future work.
1.2 Superconductivity

Following the discovery of superconductivity in 1911, many attempts were made to understand the phenomenon. In the Drude model of conduction in normal metals, current density is proportional to the average velocity of electrons [9], which accelerate under an electric field over a distance $l$ until colliding with defects and slowing down. A stable current is reached when the average deceleration due to collisions matches the acceleration due to the electric field. In the limit $l \to \infty$, infinite conductivity would result. However, in the 1930s it was found that a superconductor is not merely a metal with infinite conductivity. Superconductors exhibit new behavior that ultimately required new physics to be understood.

1.2.1 London Equations

Around the time superconductivity was being discovered, quantum mechanics was being developed. In quantum mechanics the canonical momentum of a particle in a magnetic field is given by

$$\vec{p} = m\vec{v} + q e \vec{A}, \tag{1.1}$$

where $m$ is the particle's mass, $q$ is its charge in units of $e$, $e$ is defined as the magnitude of the charge of an electron ($+1.602 \times 10^{-19}$ C), and $\vec{v}$ and $\vec{A}$ are the velocity and magnetic vector potential. If one assumes that in the ground state of a system the (local) average $\langle \vec{p} \rangle = 0$, then the current density $\vec{J}_s$ can be expressed as

$$\vec{J}_s = n_s e \langle \vec{v}_s \rangle = -\frac{n_s e^2}{m}\vec{A} = -\frac{\vec{A}}{\Lambda}. \tag{1.2}$$

Here the $s$-subscript refers to superconducting currents and electrons, and $\Lambda = m/n_s e^2$. We can also define $\Lambda = \lambda^2$; the meaning of $\lambda$ will become clearer, but for now we note that it has dimensions of length. Taking the time derivative and the curl of (1.2), one can show that this leads to the London equations [9, 10] for the electric field $\vec{E} = -\partial \vec{A}/\partial t$ and magnetic field $\vec{H} = \nabla \times \vec{A}$:

$$\vec{E} = \frac{\partial}{\partial t}\left(\Lambda \vec{J}_s\right), \tag{1.3}$$

$$\vec{H} = -\nabla \times \left(\Lambda \vec{J}_s\right). \tag{1.4}$$

Finally, using Maxwell's equation $\nabla \times \vec{H} = \vec{J}_s$ on (1.4) gives

$$\nabla^2 \vec{H} = \frac{\vec{H}}{\lambda^2}, \tag{1.5}$$

which implies that $\lambda$ is the characteristic length scale for the penetration of magnetic field into a bulk superconductor. Equations (1.3) to (1.5) were first obtained by Fritz and Heinz London in 1935 [10]. The original London equations were phenomenological and $\lambda$ was simply a fitting parameter.

Two insightful results come from this very cursory derivation. Equation (1.3) implies that the current increases in time for a static electric field. Meanwhile, (1.4) implies that magnetic fields are expelled from the interior of superconductors within a characteristic length $\lambda$. Note also that the value of $n_s$ is limited on the upper end by the total number of charges in the metal. It can be seen from energy considerations that (1.3) and (1.5) imply an upper limit on the current density $\vec{J}_s$. In addition, in a wire carrying a current, the magnetic field generated by the current will be constrained to a depth $\lambda$ in the wire. If the current gets too large, the magnetic forces from the current would drive charge into the interior of the wire, destroying the superconductivity. These rough phenomenological considerations reveal some of the major features of superconductivity. However, the insight they provide into the superconducting state is limited. For a fuller understanding, I turn to the theory developed by Bardeen, Cooper, and Schrieffer in 1957 [11].

I have three goals for this section. First, I show that the superconducting state exists at any temperature below the transition temperature. Second, I show that quasiparticles in a superconductor have an energy that is at least as large as the superconducting energy gap. Finally, the third and most important point is to develop an understanding of the I-V curve of a Josephson tunnel junction, as this will be the basis for much of the rest of the thesis.

1.2.2 BCS Theory

In superconducting materials and at finite temperatures, ordinary unpaired electrons are present as well as superconducting Cooper pairs [9]. Unpaired electrons yield a normal current component and follow Fermi-Dirac statistics.
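As a brief numerical aside on Section 1.2.1, the sketch below evaluates the penetration depth and the one-dimensional decaying solution of (1.5), $H(x) = H(0)\,e^{-x/\lambda}$, for a superconducting half-space. It is only an illustration under stated assumptions: it uses the SI form $\lambda = \sqrt{m/(\mu_0 n_s e^2)}$ (the text works in units where $\Lambda = \lambda^2$), and the density `n_s = 1e28` is a placeholder chosen for illustration, not a value taken from this thesis. The function names are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi   # vacuum permeability in H/m (SI form is an assumption)
M_E = 9.109e-31         # electron mass in kg
E_CHARGE = 1.602e-19    # elementary charge in C


def london_depth(n_s):
    """SI London penetration depth (m) for a superfluid density n_s (m^-3)."""
    return math.sqrt(M_E / (MU_0 * n_s * E_CHARGE**2))


def field_profile(h_surface, x, lam):
    """Decaying 1D solution of (1.5): field at depth x inside a half-space."""
    return h_surface * math.exp(-x / lam)


# Illustrative density only; not a measured value for any material in the text.
lam = london_depth(1e28)
print(f"lambda = {lam * 1e9:.1f} nm")
print(f"H(3*lambda)/H(0) = {field_profile(1.0, 3 * lam, lam):.3f}")
```

With a density of this order the depth comes out at tens of nanometers, and the field falls to about 5% of its surface value within three penetration depths, which is the screening behavior that (1.4) and (1.5) describe.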
Two unpaired electrons will generally not have identical energies (with the exception of spin pairs) and consequently their quantum mechanical phases will change at different rates. In contrast, Cooper pairs follow Bose-Einstein statistics and can have identical energies and phase. In conventional BCS superconductors, Cooper pairs are formed by the interaction of electrons mediated through the exchange of phonons. Individual Cooper pairs are much larger than the mean spacing between pairs [9] and the pairs maintain phase coherence amongst each other by the large amount of overlap between their wave functions [9].

1.2.2.1 Cooper Pairs

Since electrons are charged they exert a Coulomb force on the semi-stationary nuclei in a metal. This force can scatter the electron and perturb nuclei from their equilibrium positions. The perturbations of positive nuclei by a scattered electron can attract other electrons, thus resulting in a net attractive potential $V$ between two electrons despite the presence of electron-electron Coulomb repulsion. In a superconductor this attraction leads to electrons pairing up. The general wave function for a Cooper pair is

$$\psi_0(\vec{r}_1, \vec{r}_2) = \sum_{\vec{k}} g_{\vec{k}}\, e^{i\vec{k}\cdot\vec{r}_1}\, e^{-i\vec{k}\cdot\vec{r}_2}\, \chi_1 \chi_2, \tag{1.6}$$

where $\vec{r}_1$ and $\vec{r}_2$ are the positions of the first and second electron, respectively, $g_{\vec{k}}$ is the weighting factor of the orbital wave function, $\vec{k}$ is the wave vector, and $\chi_1$ and $\chi_2$ are spin functions for the first and second electron, respectively. This wave function can be recast into a form that is explicitly anti-symmetric in $\vec{r}_1$ and $\vec{r}_2$ by considering the distance between a pair, $\vec{r}_1 - \vec{r}_2$. We can write in general

$$\psi_0(\vec{r}_1 - \vec{r}_2) = \left[ \sum_{\vec{k} > \vec{k}_F} g_{\vec{k}} \cos \vec{k}\cdot(\vec{r}_1 - \vec{r}_2) \right] \chi^{\mathrm{singlet}}_{12} \tag{1.7}$$

for the singlet state in conventional BCS theory [11]. The sum in (1.7) is only over wave vectors that have lengths greater than the Fermi wave vector, for reasons which will become apparent shortly.
$\chi^{\mathrm{singlet}}_{12}$ is the spin part of the singlet wave function and it is anti-symmetric under exchange of the electrons. Inserting (1.7) into Schrödinger's equation gives a relationship between the energy $E$ and the interaction potential $V$ [9]:

$$\frac{1}{V} = \sum_{\vec{k} > \vec{k}_F} \frac{1}{2\epsilon_{\vec{k}} - E}, \tag{1.8}$$

where $\epsilon_{\vec{k}} = \hbar^2 k^2/2m$. Equation (1.8) can be evaluated as an integral from the Fermi energy $E_F$ to a higher energy $E_F + \hbar\omega_c$. One finds

$$\frac{1}{V N(0)} = \int_{E_F}^{E_F + \hbar\omega_c} \frac{d\epsilon}{2\epsilon - E} = \frac{1}{2} \ln \frac{2E_F - E + 2\hbar\omega_c}{2E_F - E}, \tag{1.9}$$

where $N(0)$ is the density of electron states at the Fermi level. In the weak-coupling approximation $V N(0) \ll 1$ one finds

$$E \approx 2E_F - 2\hbar\omega_c\, e^{-2/V N(0)}. \tag{1.10}$$

This result shows that the energy of a pair is reduced by the interaction in a non-perturbative manner and bound states (pairs) can exist no matter how small $V$ becomes.

1.2.2.2 Ground State

To get further understanding of the behavior of a superconductor we apply second quantization [9]. Let $|F\rangle$ be the state of a metal in which all the electron states below the Fermi surface are occupied. Then the wave function for the state $\psi_0$ of a superconductor becomes

$$|\psi_0\rangle = \sum_{\vec{k} > \vec{k}_F} g_{\vec{k}}\, c^\dagger_{\vec{k}\uparrow} c^\dagger_{-\vec{k}\downarrow} |F\rangle, \tag{1.11}$$

where $c^\dagger_{\vec{k}\sigma}$ and $c_{\vec{k}\sigma}$ are the creation and annihilation operators for an electron with momentum $\vec{k}$ and spin $\sigma$, and the $g_{\vec{k}}$ are weighting factors for the pairs. The operators obey the anti-commutation relations $\{ c_{\vec{k}\sigma}, c^\dagger_{\vec{k}'\sigma'} \} = \delta_{\vec{k}\vec{k}'}\delta_{\sigma\sigma'}$ and $\{ c_{\vec{k}\sigma}, c_{\vec{k}'\sigma'} \} = 0$. The number of electrons with wave vector $\vec{k}$ and spin $\sigma$ is then given by the operator $n_{\vec{k}\sigma} = c^\dagger_{\vec{k}\sigma} c_{\vec{k}\sigma}$, which gives unity when operating on a filled state and zero when operating on an unoccupied state.

In a macroscopic superconductor at sufficiently low temperature the fluctuations about the ground state will be small, and Bardeen, Cooper, and Schrieffer were able to apply a mean-field approach [9]. They wrote the ground state as a product of superposition states with differing momenta:

$$|\psi_G\rangle = \prod_{\vec{k} = \vec{k}_1, \ldots, \vec{k}_M} \left( u_{\vec{k}} + v_{\vec{k}}\, c^\dagger_{\vec{k}\uparrow} c^\dagger_{-\vec{k}\downarrow} \right) |\psi_0\rangle, \tag{1.12}$$

with $|v_{\vec{k}}|^2 + |u_{\vec{k}}|^2 = 1$.
$|v_{\vec k}|^2$ is the probability of the pair state $(\vec k\uparrow, -\vec k\downarrow)$ being occupied and $|u_{\vec k}|^2$ is the probability of it being empty. We can learn a bit about the ground state $\psi_G$ if we assume $v_{\vec k}$ and $u_{\vec k}$ differ by a set phase. With this assumption, we can rewrite (1.12) as

$$|\psi_G\rangle = \prod_{\vec k = \vec k_1, \ldots, \vec k_M} \left( |u_{\vec k}| + |v_{\vec k}|\, e^{i\phi}\, c^\dagger_{\vec k\uparrow} c^\dagger_{-\vec k\downarrow} \right) |\phi_0\rangle. \tag{1.13}$$

The phase $\phi$ turns out to be the phase of the order parameter of the superconductor, and it obeys an uncertainty relationship with the number of pairs $N$ [9]:

$$\Delta N\, \Delta\phi \gtrsim 1. \tag{1.14}$$

1.2.2.3 Pairing Hamiltonian

To arrive at the ground state (1.12), Bardeen, Cooper, and Schrieffer wrote a simplified Hamiltonian $H$ that included a pairing-interaction term

$$H = \sum_{\vec k\sigma} \epsilon_{\vec k}\, n_{\vec k\sigma} + \sum_{\vec k \vec k'} V_{\vec k\vec k'}\, c^\dagger_{\vec k\uparrow} c^\dagger_{-\vec k\downarrow} c_{-\vec k'\downarrow} c_{\vec k'\uparrow}. \tag{1.15}$$

The first term sums the single-particle energy $\epsilon_{\vec k}$ over the occupied states with momentum $\vec k$ and spin $\sigma$. The second term describes the energy gained by the annihilation of a Cooper pair consisting of electrons with momentum $\vec k'$ and the creation of a pair with momentum $\vec k$, where $V_{\vec k\vec k'}$ is the scattering matrix element. Interactions between electrons with different momenta $\vec k$ do not play a role in BCS theory but may in other applications. The ground state (1.12) and Hamiltonian (1.15) can then be substituted into Schrödinger's equation.

The ground state energy and $g_{\vec k}$ can be found by a canonical transformation. Following Tinkham [9] we define $b_{\vec k} = \langle c_{-\vec k\downarrow} c_{\vec k\uparrow} \rangle$ and write

$$c_{-\vec k\downarrow} c_{\vec k\uparrow} = b_{\vec k} + \left( c_{-\vec k\downarrow} c_{\vec k\uparrow} - b_{\vec k} \right). \tag{1.16}$$

The idea is that the term in parentheses should be small. I also define $\Delta_{\vec k} = -\sum_{\vec k'} V_{\vec k\vec k'} \langle c_{-\vec k'\downarrow} c_{\vec k'\uparrow} \rangle$ and $\xi_{\vec k} = \epsilon_{\vec k} - E_F$. Neglecting terms that are quadratic in the term in parentheses above, the Hamiltonian becomes [9]

$$H = \sum_{\vec k\sigma} \xi_{\vec k}\, c^\dagger_{\vec k\sigma} c_{\vec k\sigma} - \sum_{\vec k} \left( \Delta_{\vec k}\, c^\dagger_{\vec k\uparrow} c^\dagger_{-\vec k\downarrow} + \Delta^*_{\vec k}\, c_{-\vec k\downarrow} c_{\vec k\uparrow} - \Delta_{\vec k} b^*_{\vec k} \right). \tag{1.17}$$

This can be diagonalized if we define new creation operators $\gamma^\dagger_{\vec k}$ and annihilation operators $\gamma_{\vec k}$ from:

$$c_{\vec k\uparrow} = u^*_{\vec k}\, \gamma_{\vec k 0} + v_{\vec k}\, \gamma^\dagger_{\vec k 1} \tag{1.18}$$

$$c^\dagger_{-\vec k\downarrow} = -v^*_{\vec k}\, \gamma_{\vec k 0} + u_{\vec k}\, \gamma^\dagger_{\vec k 1} \tag{1.19}$$

where the $v_{\vec k}$ and $u_{\vec k}$ satisfy

$$|v_{\vec k}|^2 = \frac{1}{2}\left( 1 - \frac{\xi_{\vec k}}{E_{\vec k}} \right) \tag{1.20}$$

$$|u_{\vec k}|^2 = \frac{1}{2}\left( 1 + \frac{\xi_{\vec k}}{E_{\vec k}} \right) \tag{1.21}$$

Finally (1.15) and (1.17) can be put in the form [9]:

$$H = \sum_{\vec k} \left( \xi_{\vec k} - E_{\vec k} + \Delta_{\vec k} b^*_{\vec k} \right) + \sum_{\vec k} E_{\vec k} \left( \gamma^\dagger_{\vec k 0}\gamma_{\vec k 0} + \gamma^\dagger_{\vec k 1}\gamma_{\vec k 1} \right) \tag{1.22}$$

This Hamiltonian has two terms. The first term is just the condensation energy and is a constant. The second term accounts for the energy due to quasiparticles with energy $E_{\vec k} = \left( \xi_{\vec k}^2 + |\Delta_{\vec k}|^2 \right)^{1/2}$. $\Delta_{\vec k}$ is the energy decrease when a Cooper pair forms. A superconducting state will be stable if the energy decrease for a pair forming is greater than that required to leave the Fermi surface. Once a pair is formed, $2\Delta$ is the energy necessary to break a pair into two quasiparticles.

1.2.2.4 Density of States

Quasiparticles behave much like electrons in a normal metal. The density of states $N_s(E)$ of the quasiparticles is related to the density of states of the electrons in the normal metal $N_n(\xi)$ by $N_s(E)\,dE = N_n(\xi)\,d\xi$. For energies small compared to the Fermi energy the number of normal electron states $N_n(\xi) = N(0)$ can be taken as constant [9]. One then finds:

$$\frac{N_s(E)}{N(0)} = \frac{d\xi}{dE} = \begin{cases} \dfrac{E}{\sqrt{E^2 - \Delta^2}} & \text{if } E > \Delta \\[2ex] 0 & \text{if } E < \Delta \end{cases} \tag{1.23}$$

The dependence of $N_s$ on the quasiparticle energy $E$ is directly manifest in the tunneling behavior between superconductors. The quasiparticle current $I$ that flows between two superconductors with a voltage $V$ between them can be written as [9]:

$$\begin{aligned} I &\simeq A \int_{-\infty}^{+\infty} N_1(E)\, N_2(E + eV)\, \big(f(E) - f(E + eV)\big)\, dE \\ &= A \int_{-\infty}^{+\infty} \frac{N_{s1}(E)}{N_1(0)}\, \frac{N_{s2}(E + eV)}{N_2(0)}\, \big(f(E) - f(E + eV)\big)\, dE \\ &= A \int_{-\infty}^{+\infty} \frac{|E|}{\sqrt{E^2 - \Delta_1^2}}\, \frac{|E + eV|}{\sqrt{(E + eV)^2 - \Delta_2^2}}\, \big(f(E) - f(E + eV)\big)\, dE, \end{aligned} \tag{1.24}$$

where $f(E)$ is the Fermi function (the probability that a quasiparticle state at energy $E$ is occupied) and $A$ is a constant that depends on the junction's barrier and other details such as temperature. This integral (1.24) can be evaluated numerically or treated analytically for $T = 0$. Instead, I consider an analysis of the I-V curve by Likharev.
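As a quick numerical illustration of (1.20), (1.21), and (1.23), the short Python sketch below evaluates the occupation probabilities and the normalized quasiparticle density of states. It is only a sketch under stated assumptions: the function names are hypothetical, the gap is taken as a constant $\Delta$ independent of $\vec k$, and all energies are expressed in the same arbitrary unit.

```python
import numpy as np

def coherence_factors(xi, delta):
    """Return (|u_k|^2, |v_k|^2) from (1.20)-(1.21) for energies xi measured from E_F.

    Assumes a k-independent gap `delta` (an illustrative simplification).
    """
    xi = np.asarray(xi, dtype=float)
    e_k = np.sqrt(xi**2 + delta**2)      # quasiparticle energy E_k
    v2 = 0.5 * (1.0 - xi / e_k)          # probability the pair state is occupied
    u2 = 0.5 * (1.0 + xi / e_k)          # probability the pair state is empty
    return u2, v2

def dos_ratio(energy, delta):
    """Normalized density of states N_s(E)/N(0) from (1.23); zero inside the gap."""
    energy = np.atleast_1d(np.asarray(energy, dtype=float))
    ratio = np.zeros_like(energy)
    above = np.abs(energy) > delta
    ratio[above] = np.abs(energy[above]) / np.sqrt(energy[above]**2 - delta**2)
    return ratio
```

Far below the Fermi surface the pair states are almost fully occupied ($|v_{\vec k}|^2 \to 1$), far above they are almost empty, and the density of states vanishes inside the gap and piles up just above it, which is the feature that shapes the tunneling integrand in (1.24).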
To proceed, first transform the phase $\phi(t)$ into its Fourier components $W(\omega)$ by using

$$e^{i\phi/2} = e^{i\theta/2} \int_{-\infty}^{+\infty} W(\omega)\, e^{i\omega t}\, d\omega,$$

where the time derivative of $\theta$ is related to the average voltage $\bar V$ by $\dot\theta = (2e/\hbar)\bar V = \omega_J$. Barone et al. then define the supercurrent component $I_S(t)$ and normal current component $I_N(t)$ as [4]

$$I_S(t) = \mathrm{Im}\left( \int_{-\infty}^{+\infty} d\omega_1 \int_{-\infty}^{+\infty} d\omega_2\, W(\omega_1) W(\omega_2)\, I_p\!\left( \omega_2 + \frac{\omega_J}{2} \right) e^{i(\omega_1 + \omega_2)t + i\theta} \right), \tag{1.25a}$$

$$I_N(t) = \mathrm{Im}\left( \int_{-\infty}^{+\infty} d\omega_1 \int_{-\infty}^{+\infty} d\omega_2\, W(\omega_1) W^*(\omega_2)\, I_q\!\left( \omega_2 + \frac{\omega_J}{2} \right) e^{i(\omega_1 - \omega_2)t} \right). \tag{1.25b}$$

In turn, we define $I_p$ and $I_q$ as the Green's functions for the superconducting electrodes. These functions do not depend on the phase dynamics of the junction, only on the junction itself. They characterize the junction fully and are given by [4]

$$I_p(\omega) = \frac{1}{G_N\,(2\pi^2 e)} \int_{-\infty}^{+\infty} d\omega_1 \int_{-\infty}^{+\infty} d\omega_2 \left( \tanh\frac{\hbar\omega_1}{2k_BT} + \tanh\frac{\hbar\omega_2}{2k_BT} \right) \times \frac{\mathrm{Im}\big(F_1(\omega_1)\big)\, \mathrm{Im}\big(F_2(\omega_2)\big)}{\omega_1 + \omega_2 - \omega + j0}, \tag{1.26a}$$

$$I_q(\omega) = \frac{1}{G_N\,(2\pi^2 e)} \int_{-\infty}^{+\infty} d\omega_1 \int_{-\infty}^{+\infty} d\omega_2 \left( \tanh\frac{\hbar\omega_1}{2k_BT} + \tanh\frac{\hbar\omega_2}{2k_BT} \right) \times \frac{\mathrm{Im}\big(G_1(\omega_1)\big)\, \mathrm{Im}\big(G_2(\omega_2)\big)}{\omega_1 + \omega_2 - \omega + j0}, \tag{1.26b}$$

where the subscripts 1 and 2 refer to the left and right superconducting banks of the junction. The functions $F$ and $G$ can be derived from BCS theory and are given by [4]

$$F(\omega) = \frac{\pi\Delta(T)}{\sqrt{\Delta^2(T) - \hbar^2(\omega + j0)^2}}, \tag{1.27}$$

$$G(\omega) = \frac{\pi\hbar\omega}{\sqrt{\Delta^2(T) - \hbar^2(\omega + j0)^2}}. \tag{1.28}$$

Here $\omega$ is real; the $j0$ denotes an infinitesimal imaginary part that is a remnant of the complex analysis used to derive the expression and is included here for clarity. Barone and Paterno provide a detailed analysis including this term [6]. Substituting (1.27) and (1.28) into (1.26a) and (1.26b) one finds relations for the real and imaginary components of $I_p$ and $I_q$ (see Fig. 1.1):

$$\mathrm{Re}\, I_p(\omega) = \frac{\Delta(0)}{eR_N} \times \begin{cases} K(x) & \text{if } x < 1 \\ x^{-1} K(x^{-1}) & \text{if } x > 1 \end{cases} \tag{1.29a}$$

$$\mathrm{Im}\, I_p(\omega) = \frac{\Delta(0)}{eR_N} \times \begin{cases} 0 & \text{if } x < 1 \\ x^{-1} K(x') & \text{if } x > 1 \end{cases} \tag{1.29b}$$

$$\mathrm{Re}\, I_q(\omega) = \mathrm{sign}(\omega)\, \frac{\Delta(0)}{eR_N} \times \begin{cases} K(x) - 2E(x) & \text{if } x < 1 \\ (2x - x^{-1})\, K(x^{-1}) - 2x\, E(x^{-1}) & \text{if } x > 1 \end{cases} \tag{1.29c}$$

$$\mathrm{Im}\, I_q(\omega) = \mathrm{sign}(\omega)\, \frac{\Delta(0)}{eR_N} \times \begin{cases} 0 & \text{if } x < 1 \\ 2x\, E(x') - x^{-1} K(x') & \text{if } x > 1 \end{cases} \tag{1.29d}$$

where $K$ and $E$ are the complete elliptic integrals of the first and second kind, respectively. I also define $x = |\omega|/\omega_g$, the gap frequency $\omega_g = 2\Delta(T)/\hbar$, and $x' = \sqrt{1 - x^{-2}}$. Equations (1.29a) to (1.29d) are true only for $T = 0$; at higher temperatures $I_p$ and $I_q$ must be calculated numerically.

By evaluating these expressions at arbitrary temperature $T$, we can find an expression for $V_c = I_c R_N$ for SIS junctions in terms of the gap energy $\Delta$. Here $I_c$ is the critical current and $R_N$ is the normal resistance of the junction. Substituting (1.27) and (1.28) into (1.26a) and (1.26b) one finds

$$V_c = I_c R_N = \frac{\pi}{2e}\, \Delta(T) \tanh\frac{\Delta(T)}{2k_BT} \tag{1.30}$$

for SIS junctions [4].

Figure 1.2 shows the average current $\bar I$ for a constant voltage $V$ found from (1.29a) to (1.29d). Below $V_c$ only a relatively small number of thermally excited quasiparticles tunnel and the average current is low. Only past a critical voltage $V_c \approx 2\Delta/e$ does current flow. Above $V_c$ the current is similar to that of a non-superconducting tunnel junction. This behavior is important for Josephson junction dynamics, which shall be the topic of the rest of this chapter. However, it is important to note that it does not include supercurrent flow at $V = 0$. Supercurrent flow at $V = 0$ is considered in the next section.

[Figure 1.1 plot: four panels showing Re $I_p$, Im $I_p$, Re $I_q$, and Im $I_q$ versus $\omega/\omega_g$.]

Figure 1.1: Green's Functions $I_p$ and $I_q$ for Josephson Junction at $T = 0$. Real and imaginary components of the Cooper pair and quasiparticle Green's functions from (1.29a) to (1.29d) in superconductor-to-superconductor tunneling. Each component shows a clear transition at the gap frequency $\omega_g$. Below this frequency Im $I_p = 0$ and Im $I_q = 0$; no quasiparticles tunnel through the barrier.

[Figure 1.2 plot: average current $\bar I/I_c$ versus voltage $V/V_c$.]

Figure 1.2: Superconductor-Superconductor Tunneling I-V Curve for $V > 0$.
The time-averaged current from (1.24) is plotted at zero temperature (red). The blue curve shows the effects of non-zero temperature. The green curve shows the linear resistive I-V relationship of the normal state. Below $V_c$ a nearly negligible current of unpaired electrons flows in the superconducting state. Above this critical potential, Cooper pairs break apart and the I-V curve becomes similar to the ohmic resistance curve.

1.3 Josephson Junctions

In this section, I provide a brief review of the behavior of Josephson junctions. In general the equations of motion for junctions include terms to account for thermal noise. For my purposes, these noise considerations are mostly secondary and are deferred to later chapters where I discuss experimental data. A review of the complete range of behaviors and applications for Josephson junctions is well beyond the scope of this thesis. Instead, I highlight some features of junctions that are of particular importance for RQL: the generation of quantized single flux quantum pulses, the finite number of flux states in the single-junction interferometer, and the transport of single-flux-quantum pulses through transmission lines.

Josephson tunneling was predicted in 1962 by B. D. Josephson [12] and experimentally demonstrated in 1964 [13]. Josephson junctions come in many forms. All junctions have in common two superconducting electrodes which are separated by a region that impedes current. This can be a physical narrowing of the superconductor itself, a normal metal region, or a thin insulator through which the Cooper pairs can tunnel. My focus is exclusively on the Superconductor-Insulator-Superconductor (SIS) type of junction, though Superconductor-Normal Metal-Superconductor (SNS) and Superconductor-Ferromagnet-Superconductor (SFS) junctions may play a role in RQL in the future. In particular, the junctions of interest for RQL are SIS junctions with niobium superconductors and Al2O3 tunnel barriers.
The SIS type junction can also have a shunt resistor connected across the junction. As we will see, the properties of the shunt resistor affect the dynamics of the junction.

1.3.1 Josephson Equations

Given superconductivity and a few assumptions, the Josephson equations can be obtained from the Schrödinger equation. Consider a state for which the magnitude of the order parameter $|\psi|$ does not change in time. If the phase can change in time, then we can write the order parameter as:

$$\psi(\vec r, t) = |\psi|\, e^{i\theta(\vec r, t)}. \tag{1.31}$$

The Schrödinger equation

$$i\hbar \frac{\partial}{\partial t} |\psi\rangle = H |\psi\rangle \tag{1.32}$$

then becomes, up to the sign convention chosen for $\theta$:

$$\hbar\dot\theta = E, \tag{1.33}$$

where $E$ is the energy of the state $|\psi\rangle$. I define a phase difference across the junction between superconductor 1 and 2 as $\phi = \theta_1 - \theta_2$, where $\theta_1$ is the phase in superconductor 1 and $\theta_2$ is the phase in superconductor 2. The current that flows through the junction from 1 to 2 will depend only on this phase difference, $I_S = I_S(\phi)$. I now make an assumption that the current is zero for directly connected superconductors with no phase difference. (Relaxation of this assumption leads to another theory of junctions not used in this thesis.) Since $\psi$ is periodic in $\phi$ with period $2\pi$, we expect that $I_S(0) = I_S(2\pi n) = I_S(\pi + 2\pi n) = 0$ for any integer $n$. For SIS junctions these physical requirements lead to the lowest order term in the first Josephson relation,

$$I_s(t) = I_c \sin(\phi(t)), \tag{1.34}$$

where $I_c$ is the critical current of the junction, which is determined by the physics of the materials and the geometry of the junction. It is the maximum supercurrent that can flow through the junction. (Higher order terms such as $\sin 2\phi$, $\sin 4\phi$, etc. also satisfy these requirements but in general are not experimentally significant for SIS junctions used in RQL [9].)

[Figure 1.3 schematic: superconductor 1 (point a), insulating barrier, and superconductor 2 (point b) carrying a current $I$, with $\mathrm{Re}\,\psi_1 = \mathrm{Re}\sqrt{n_1}\,e^{i\theta_1}$, $\mathrm{Re}\,\psi_2 = \mathrm{Re}\sqrt{n_2}\,e^{i\theta_2}$, the sum $\mathrm{Re}(\psi_1 + \psi_2)$, the phase difference $\phi = \theta_1(a) - \theta_2(b)$, and $2eV = E_1 - E_2 = \hbar\dot\phi$.]

Figure 1.3: Superconductor-Insulator-Superconductor Tunneling. A wave function $\psi$ tunneling from Superconductor 1 through an insulating barrier to Superconductor 2 has densities and phases $n_1$ and $\theta_1$ in Superconductor 1 and $n_2$ and $\theta_2$ in Superconductor 2. A potential difference between $E_1$ in Superconductor 1 and $E_2$ in Superconductor 2 develops if the phase difference $\phi = \theta_1 - \theta_2$ changes in time.

Finally, Josephson found a relationship between the electrical potential $V$ between two superconducting nodes with phase difference $\phi = \theta_1 - \theta_2$ between them and the energies $E_1$ and $E_2$ of the two nodes, $2eV = E_1 - E_2 = \hbar\dot\phi$. (See Fig. 1.3.) We can write the second Josephson relation as

$$V(t) = \frac{\hbar}{2e} \frac{\partial\phi(t)}{\partial t} = \frac{\Phi_0}{2\pi} \frac{\partial\phi(t)}{\partial t} = \frac{\hbar\omega_J}{2e}, \tag{1.35}$$

where $\Phi_0 = h/2e \approx 2.068\ \mathrm{mV\cdot ps}$ is the flux quantum. Writing (1.35) as $2eV = \hbar\omega_J$ shows explicitly that the voltage is associated with an energy $\hbar\omega_J$ per Cooper pair, whose charge has magnitude $2e$. Here, $\omega_J$ is the angular frequency for the phase difference across the junction.

The quantity $\Phi_0$ is not merely a unit of convenience. The order parameter $\psi$ must be single-valued at every point, meaning that the phase change $\Delta\theta$ in going one or more times around a closed superconducting loop must take on values that can differ only by $2\pi n$. It can be shown that this limits the total flux through the loop to values of $\Phi = n\Phi_0$ [4]. By integrating (1.35) with respect to time over one cycle of phase change, we get

$$\int V(t)\, dt = \Phi_0 \oint \frac{d\phi}{2\pi} = \Phi_0. \tag{1.36}$$

Ultimately, it can be shown that this leads to the "area" under a voltage pulse from a junction being quantized exactly to $\Phi_0$ for a single complete switching event. This is a single quantum of magnetic flux, and the voltage pulse associated with it is an SFQ pulse. This result can be related to the flux $\Phi$ in a closed loop using $\Phi = \int L\, dI = \int V\, dt = \Phi_0$. Any inductive loop with a junction which switches by $2n\pi$ will change the flux by exactly an integer multiple $n$ of $\Phi_0$. This result has profound implications for superconductors in the presence of magnetic fields.
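To get a feel for the scales set by the second Josephson relation (1.35) and the quantization condition (1.36), the following Python sketch computes the flux quantum from the defined SI values of $h$ and $e$, the Josephson frequency for a given voltage, and the duration of an idealized rectangular pulse whose area is one flux quantum. The function names and the rectangular pulse shape are assumptions made purely for illustration.

```python
# Exact SI values (2019 redefinition) of the Planck constant and elementary charge.
H_PLANCK = 6.62607015e-34      # J s
E_CHARGE = 1.602176634e-19     # C
PHI0 = H_PLANCK / (2.0 * E_CHARGE)   # flux quantum in Wb (= V s)

def josephson_frequency_hz(voltage):
    """Frequency f_J = V / Phi0 (in Hz) of the phase rotation at a constant voltage."""
    return voltage / PHI0

def rectangular_pulse_width_s(peak_voltage):
    """Width of a hypothetical rectangular pulse of the given height and area Phi0."""
    return PHI0 / peak_voltage

# 1 Wb = 1e15 mV ps, so this is the flux quantum in the units used in the text.
PHI0_MV_PS = PHI0 * 1e15
```

With these definitions a 1 mV pulse carrying one flux quantum lasts roughly 2 ps, and a junction held at 1 mV has its phase rotating at a little under 500 GHz.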
For example, from (1.36) we can see that magnetic fields are expelled from the interior of any superconductor, a phenomenon known as the Meissner effect, and that an inductor loop with a junction in it contains $n\Phi_0$ of magnetic flux.

1.3.2 RSJ Model

In an SIS junction the capacitance occurs because the two superconducting electrodes are separated by an insulating barrier of small thickness, thus taking on the geometry of a capacitor. Figure 1.4 shows the electrical representation of the Josephson junction in the resistively shunted junction (RSJ) model. A junction is represented electrically as an ideal Josephson junction shunted by a resistor (linear or non-linear) and a capacitance. A bias current $I$ sent through the junction's components can be thought of as being composed of a displacement current, a normal current, and a supercurrent. The total current through the junction can then be written as

$$\frac{\hbar C}{2e} \frac{d^2\phi(t)}{dt^2} + \frac{\hbar}{2eR_N} \frac{d\phi(t)}{dt} + I_c \sin(\phi(t)) = I + I_{noise}. \tag{1.37a}$$

This can be put in reduced form:

$$\omega_p^{-2}\, \ddot\phi(t) + \omega_c^{-1}\, \dot\phi(t) + \sin\phi(t) = i + i_{noise} = \frac{I + I_{noise}}{I_c}, \tag{1.37b}$$

where $\omega_c = (2\pi/\Phi_0) I_c R_N$ is the characteristic frequency of the junction, $\omega_p = 1/\sqrt{L_c C}$ is the plasma frequency (to be defined more rigorously in the next section), $L_c = \Phi_0/(2\pi I_c)$ is the effective Josephson inductance, and $R_N = R_J R_S/(R_J + R_S)$ is the effective normal current resistance across the junction. Equation (1.37b) rearranges the terms in (1.37a) to show the similarity to the damped harmonic oscillator.

[Figure 1.4 schematic: (a) junction symbol; (b) parallel elements $I_S$, $C$, $R_J$, $R_S$, and $I_{noise}$.]

Figure 1.4: Equivalent electrical circuit of a Josephson Junction in the RSJ model. (a) Ideal Josephson junction symbol. (b) Equivalent circuit of real Josephson junctions. The Josephson junction can be treated as a parallel combination of an ideal Josephson junction with the supercurrent $I_S$, a capacitor $C$, a non-linear resistor $R_J$, a shunt resistor $R_S$, and a noise current $I_{noise}$.
The time constant of the junction is $\tau_N = R_N C = \omega_c/\omega_p^2$, with $\omega_c = R_N/L_c$. The damping factor of the junction is equivalent to the quality factor $Q$ of the junction, which is related to the Stewart-McCumber parameter $\beta_c$ by

$$\beta_c = Q^2 = \left( \frac{\omega_c}{\omega_p} \right)^2 = \omega_c R_N C = \frac{2e}{\hbar} I_c R_N^2 C = \frac{2e}{\hbar} (I_c R_N)^2 \frac{c_s}{j_c}, \tag{1.38}$$

where the specific parallel plate capacitance is $c_s = C/A = \epsilon_0\epsilon_r/d$ and the critical current density is $j_c = I_c/A$. For a given $j_c$, $\beta_c$ is a function of $c_s$ and the $I_c R_N$ product.

It is worth pointing out that the equation of motion (1.37a) is identical to that of a damped pendulum, in which the torque is analogous to current, the moment of inertia is analogous to capacitance, the damping coefficient is analogous to conductance, the maximum gravitational torque is analogous to the critical current, and the angle is analogous to the junction phase. This analogy is useful in understanding some of the more complex junction behavior we will shortly discuss.

Figure 1.5 shows phase diagrams in the position ($\phi$) - momentum ($p_\phi$) plane for $I = 0$ of a junction for the underdamped ($Q > 1$), critically damped ($Q = 1$), and overdamped ($Q < 1$) cases. The traces show the trajectories of the junction phase. Two attractors show the equilibrium points. A sufficient increase in momentum will cause the junction to switch to a different equilibrium point, generating an SFQ pulse. The critically damped case shows a return to equilibrium in minimum time. Underdamped junctions show oscillation before reaching equilibrium. Overdamped junctions take on low values of $p_\phi$ and do not oscillate but do not return to equilibrium as quickly.

In unshunted SIS junctions $I_c R_N = (2\pi/3)\Delta$. Typically $\beta_c \gg 1$, so the junctions are underdamped. To decrease $\beta_c$ an external shunt resistor $R_S$ is added across the junction, and then $R_N$ can be replaced by $R_N = R_J R_S/(R_J + R_S) \approx R_S$ for $R_S \ll R_J$, decreasing the overall shunting resistance of the junction.
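The chain of equalities in (1.38) can be checked with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, and the junction values used in the example ($I_c = 100\ \mu$A, $C = 0.1$ pF) are assumed round numbers, not parameters of any process described in this thesis.

```python
import math

PHI0 = 2.067833848e-15   # flux quantum h/2e in Wb

def stewart_mccumber(i_c, r_n, cap):
    """beta_c = (2e/hbar) I_c R_N^2 C = 2 pi I_c R_N^2 C / Phi0, as in (1.38)."""
    return 2.0 * math.pi * i_c * r_n**2 * cap / PHI0

def shunt_for_beta(beta_c, i_c, cap):
    """Effective resistance R_N that yields a target beta_c (inverse of the above)."""
    return math.sqrt(beta_c * PHI0 / (2.0 * math.pi * i_c * cap))

def characteristic_frequencies(i_c, r_n, cap):
    """Return (omega_c, omega_p) using L_c = Phi0 / (2 pi I_c)."""
    l_c = PHI0 / (2.0 * math.pi * i_c)
    omega_c = r_n / l_c                   # characteristic frequency
    omega_p = 1.0 / math.sqrt(l_c * cap)  # plasma frequency
    return omega_c, omega_p

# Illustrative (assumed) junction: 100 uA critical current, 0.1 pF capacitance.
I_C, CAP = 100e-6, 0.1e-12
R_CRITICAL = shunt_for_beta(1.0, I_C, CAP)   # resistance for critical damping
```

For the assumed values the resistance that gives $\beta_c = 1$ comes out at a few ohms, and because $\beta_c$ scales as $R_N^2$, doubling the resistance quadruples $\beta_c$.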
In most cases $R_S \ll R_J$ and we can substitute the value of $R_S$ for $R_N$ without any further modifications [4]. $V_c = I_c R_N$ then sets the voltage scale for junction behavior, and is typically on the order of a few mV for Nb/Al2O3/Nb junctions. Thus the behavior of the junction is determined in design by the choice of the shunt resistor $R_S$, the junction area $A$, and the critical current density $j_c$. For $\beta_c > 1$ the junction is underdamped and plasma oscillations with frequency $\omega_p$ will occur that damp out on a time scale $\tau = R_N C$. For $\beta_c < 1$ the junction is overdamped and after a disturbance the phase slowly moves towards equilibrium with a time constant $1/\omega_c$.

The potential energy of the junction can be calculated directly from (1.34) and (1.35). The work $W_S$ done on the junction leads directly to an expression for the potential energy of a junction $U_S$ as follows:

$$W_S = \int_{t_1}^{t_2} I_S(t) V(t)\, dt = \frac{\Phi_0 I_c}{2\pi} \int_{\phi_1}^{\phi_2} \sin\phi\, d\phi = \frac{\Phi_0 I_c}{2\pi} (\cos\phi_1 - \cos\phi_2) = U_S(\phi_2) - U_S(\phi_1). \tag{1.39}$$

From this we can define:

$$U_S(\phi) = -\frac{\Phi_0 I_c}{2\pi} \cos\phi. \tag{1.40}$$

[Figure 1.5 plot: three panels, (a) Underdamped, (b) Critically Damped, (c) Overdamped, each showing trajectories in the $\phi$ - $p_\phi$ plane.]

Figure 1.5: Phase Diagram of Josephson Junction. This figure shows plots of trajectories in the phase $\phi$ and momentum $p_\phi$ plane. Phase diagrams for (a) underdamped, (b) critically damped, and (c) overdamped junctions. In all three cases the state of the junction moves to $\phi = 2\pi n$, $\dot\phi = 0$. Oscillations can be seen in the underdamped case.

This is the Josephson energy and we can think of it as energy stored in the effective junction inductance. For small oscillations of the phase, for which $\phi \to \phi + \delta\phi$, we can look at the variations in current $\delta I_s$ and voltage $\delta V$. Expanding (1.34) in a Taylor series gives $I_s = I_c(\sin\phi + \cos\phi \cdot \delta\phi + \ldots)$. Using (1.35) to express the phase as the integral of voltage, we get a relation between current and phase which is that of an effective inductance:

$$L_S\, \delta I_S = L_S I_c \cos\phi \cdot \delta\phi = \int \delta V\, dt = \frac{\Phi_0}{2\pi}\, \delta\phi, \tag{1.41a}$$

and we can define the bias-dependent Josephson inductance

$$L_S = \frac{\Phi_0}{2\pi I_c} \frac{1}{\cos\phi} = \frac{\hbar}{2eI_c} \frac{1}{\cos\phi} = \frac{L_c}{\cos\phi}, \tag{1.41b}$$

with $L_c$ as previously defined.

For bias current $I_b < I_c$ the junction can be in the zero-voltage (S) state where the phase is constant at $\phi = \arcsin(I_b/I_c) + 2\pi n$, where $n$ is an integer. The potential for a biased junction, using (1.40) in the Gibbs free energy relation $G = U - (I_b/I_c)\phi$ [4], is

$$U(\phi) = \frac{\hbar}{2e} I_c \left( 1 - \cos\phi - i\,\phi \right), \tag{1.42}$$

where $i = I_b/I_c$. Figure 1.6 shows plots of $U(\phi)$ for the cases of a high critical current compared to $I_b$, a low critical current, and an overbiased junction with $I_b > I_c$. Three general situations can occur. First, for $I_b < I_c$ a junction can exhibit plasma oscillations about the equilibrium position. Second, if $I_b$ is less than but close to $I_c$, the junction can tunnel through the barrier to the next minimum (and possibly beyond, depending on $\beta_c$) and increase its phase by $2\pi$. Third, if $I_b > I_c$ no minima exist and the junction phase continuously increases with time.

[Figure 1.6 plot: potential $U_s$ versus $\phi$ for three cases labeled ($I_b$, $I_c$), ($I_b$, $I'_c = I_c/\sqrt{2}$), and ($I'_b = 1.05\, I_c$, $I'_c$), with a phase step $\Delta\phi = 2\pi$ marked.]

Figure 1.6: Josephson junction potential energy. The potential energy $U_s$ of a junction as a function of phase is shown for high and low critical currents of an underbiased junction ($I_b < I_c$), and for an overbiased junction ($I_b > I_c$). The mechanical analog is a ball rolling down a tilted washboard. The bias current $I_b$ through the junction determines the average downward slope. (top) Plasma oscillations of a junction trapped in a local minimum. (middle) Tunneling through the potential barrier of height $\Delta U(I) = 2\Phi_0 I_c (1 - I/I_c)^{3/2}$ to the next local minimum [9]. This generates an SFQ pulse and changes the phase by exactly $2\pi$. (bottom) For $I_b > I_c$ no local minima exist and the phase increases without limit.

1.3.3 Behavior of Overdamped Junctions

The relationship $\int V\, dt = \Phi_0$ for one cycle of oscillation is fundamental to all Josephson junctions and is a key property of SFQ pulses.
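The one-flux-quantum-per-cycle relationship can be checked numerically. The Python sketch below integrates the reduced equation (1.37b) with the capacitive term dropped (the overdamped limit) and no noise, in time units of $1/\omega_c$, and measures how long the phase takes to advance by $2\pi$. Since each $2\pi$ cycle carries a flux $\Phi_0$, the average normalized voltage is $2\pi$ divided by that time. The fixed-step Runge-Kutta integrator, the step size, and the function names are choices made here for illustration only.

```python
import math

def rk4_step(phi, i_b, dt):
    """One classical Runge-Kutta step of d(phi)/d(tau) = i_b - sin(phi)."""
    def rate(p):
        return i_b - math.sin(p)
    k1 = rate(phi)
    k2 = rate(phi + 0.5 * dt * k1)
    k3 = rate(phi + 0.5 * dt * k2)
    k4 = rate(phi + dt * k3)
    return phi + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def cycle_time(i_b, dt=1e-3):
    """Normalized time (units of 1/omega_c) for the phase to advance by 2*pi."""
    if i_b <= 1.0:
        raise ValueError("the phase only runs continuously for i_b > 1")
    phi, tau = 0.0, 0.0
    while True:
        phi_next = rk4_step(phi, i_b, dt)
        if phi_next >= 2.0 * math.pi:
            # Linear interpolation inside the final step.
            frac = (2.0 * math.pi - phi) / (phi_next - phi)
            return tau + frac * dt
        phi, tau = phi_next, tau + dt

def mean_normalized_voltage(i_b):
    """Average of d(phi)/d(tau) over one cycle: one 2*pi advance per cycle time."""
    return 2.0 * math.pi / cycle_time(i_b)
```

Multiplying the result by $I_c R_N$ converts it to volts; with the $I_c R_N = 0.75$ mV quoted in the caption of Fig. 1.7, bias currents of $1.01\,I_c$ and $2.02\,I_c$ give average voltages close to the 0.11 mV and 1.32 mV listed there for the analytic case.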
The case of an overdamped junction illustrates how SFQ pulses can be generated and shows the utility of the shunt resistor. The equation of motion for a Josephson junction cannot in general be solved analytically. However, in the simple case of an overdamped junction where $\beta_c \to 0$ (which implies $\omega_c \ll \omega_p$) an analytic solution can be found for a constant current. Let $i_b = I_b/I_c$ and let time be normalized to the characteristic time $1/\omega_c$ such that $t' = \omega_c t$. Equation (1.37b) then becomes

$$i_b = \dot\phi(t') + \sin\phi(t'), \tag{1.43}$$

which has an analytic solution. We are interested in the SFQ pulse dynamics and in the I-V characteristics of the junction. For $i_b > 1$ the solution to (1.43) is [4, 6]

$$\phi(t') = 2 \arctan\left( \frac{1 + v \tan\left( \tfrac{1}{2} v t' \right)}{\sqrt{v^2 + 1}} \right), \tag{1.44}$$

where $v = \sqrt{i_b^2 - 1}$ is the time-averaged normalized voltage. The derivative of (1.44) determines the voltage as a function of time:

$$V(t') = \frac{\Phi_0}{2\pi} \frac{d\phi}{dt} = \frac{\Phi_0 \omega_c}{2\pi} \frac{d\phi}{dt'} = \frac{\Phi_0 \omega_c}{2\pi}\, \frac{v^2 \sqrt{1 + v^2}\, \sec^2\left( \tfrac{1}{2} v t' \right)}{1 + v^2 + \left( 1 + v \tan\left( \tfrac{1}{2} v t' \right) \right)^2}. \tag{1.45}$$

This is a periodic solution with period (in non-normalized time units)

$$\Delta t = \frac{2\pi}{\omega_c v}. \tag{1.46}$$

Figure 1.7 shows examples of the voltage behavior of an overdamped junction. Figure 1.7(a) shows the solution of (1.45) for two values of $I_b$. Figure 1.7(b) shows the numerical solutions to (1.37b) for the same values of $I_b$ but with $\beta_c = 0.25$. Notice in Fig. 1.7 how for low values of $v$ the junction produces individual pulses. The average voltage $\bar V$ across the junction can be calculated from (1.45),

$$\langle \dot\phi \rangle = \frac{v}{2\pi} \int_0^{2\pi/v} \dot\phi(t')\, dt' = v = \sqrt{i_b^2 - 1}, \tag{1.47}$$

$$\bar V(I_b) = \frac{\Phi_0 \omega_c}{2\pi} \sqrt{(I_b/I_c)^2 - 1}. \tag{1.48}$$

Figure 1.8 shows I-V curves of over- and underdamped junctions for various values of $\beta_c$. For overdamped junctions ($\beta_c < 1$) the behavior is non-hysteretic. For underdamped junctions ($\beta_c > 1$), such as those we will use in RQL, the behavior is hysteretic. Starting from zero bias current, a small increase in bias current results in no steady voltage until the critical current is exceeded.
The voltage then jumps to a finite value instead of gradually increasing (solid black arrow). Upon decreasing the current, the momentum of the junction can carry the phase through an infinite number of rotations despite some dissipation, so long as the bias current supplies enough tilt to the potential. The voltage quickly decreases with decreasing current (red arrows) and drops to zero once the current falls below the return current $I_r = 4 I_c/(\pi\sqrt{\beta_c})$ [9]. The zero-voltage branch and finite voltage branch correspond to different initial conditions for $\dot\phi(0)$ when solving (1.37b).

[Figure 1.7 plot: voltage $V$ [mV] versus time $t$ [ps]. (a) Analytic case ($\beta \to 0$): $I = 2.02\, I_c$ with $\bar V = 1.32$ mV and $I = 1.01\, I_c$ with $\bar V = 0.11$ mV; $\Delta t = 2\pi/\omega_c v$. (b) Numerical solution ($\beta = 0.25$): $I = 2.02\, I_c$ with $\bar V = 1.91$ mV and $I = 1.01\, I_c$ with $\bar V = 0.22$ mV; $\Delta t < 2\pi/\omega_c v$.]

Figure 1.7: Voltage vs time dynamics of overbiased junction for two values of applied bias current. (a) Analytic solution to (1.43). The red curve ($v = 0.142$) for small overbias shows well-separated SFQ pulses. The green curve ($v = 1.76$) for large overbias resembles a high-frequency sinusoidal variation instead of individual pulses. For low overbias the separation between pulses is $\Delta t = 2\pi/\omega_c v$. The average voltage is given by (1.48). (b) Numerical solution to (1.37b) for the same values (other than $\beta_c$) as in part (a). Pulses become closer together. Average voltages are higher. ($I_c R_N = 0.75$ mV)

[Figure 1.8 plot: $I/I_c$ versus $\bar V/I_c R_N$, with curves labeled a to e.]

Figure 1.8: I-V curve of current driven junctions. The green curve shows the I-V curve for $\beta_c = 0$. For $\beta_c < 1$ there is no hysteresis. For $\beta_c > 1$ the red I-V curves show hysteretic behavior. Curves a to e have $\beta_c$ = 1.1, 2, 4, 10, 30. Junctions remain in the zero voltage state (zero average voltage) until the critical current is reached. The voltage then jumps (horizontal black line) to a finite value (red curves). The voltage does not return to zero until the current has been reduced below the return current value $I_r = 4 I_c/(\pi\sqrt{\beta_c})$. (Results calculated numerically from (1.37b).)

1.4 Superconducting Interferometers

In the previous section, I discussed the behavior of single junctions. Most importantly, I have shown how flux is quantized in loops and that overdamped junctions can create individual single-flux-quantum voltage pulses. These two effects lay the foundation for digital logic in superconducting circuits. A digital "one" is stored as a flux $\Phi_0$ in a loop and transmitted as an SFQ voltage pulse. To make further progress requires examining more complicated circuits with an inductor and one or more junctions.

[Figure 1.9 schematic: loop containing a junction $I_c$ with phase $\phi$, an inductor $L$, the loop current $I$, and the applied phase $\phi_e$.]

Figure 1.9: Single-junction interferometer equivalent circuit. A junction with critical current $I_c$ is connected at both ends to an inductance $L$. The phase across the junction is $\phi$ and the current through the junction is $I$. An externally applied magnetic field couples flux $\Phi_{ext}$ into the loop and this can be thought of as inducing a phase $\phi_e$ in the loop.

1.4.1 Single Junction Interferometer

In RQL circuits, each Josephson junction is part of one or more superconducting loops. In such circuits the phase difference across the junction is modulated by the magnetic flux applied to the loop. Figure 1.9 shows a single-junction interferometer formed from a superconducting inductor $L$ and a single junction with critical current $I_c$. The total flux in the loop is related to the current $I$ in the loop [4] and the applied flux $\Phi_{ext}$ by $\Phi = LI + \Phi_{ext}$. The flux-phase relation allows us to express the phase across the junction as

$$\phi = \phi_e - \frac{2\pi}{\Phi_0} L I = \phi_e - \beta i, \tag{1.49}$$

where $i = I/I_c$, $\phi_e = 2\pi\Phi_{ext}/\Phi_0$, the normalized inductance of the loop is $\beta = L/L_c$, and $L_c = \Phi_0/(2\pi I_c)$. This gives the junction phase $\phi$ as a function of the applied flux phase $\phi_e$ and the loop current $I$.
In a stationary state, the current through the junction obeys $i = \sin\phi$, which allows us to rewrite (1.49) as

$$\phi + \beta \sin\phi = \phi_e. \tag{1.50}$$

The phase $\phi$ of the single junction interferometer is plotted as a function of $\phi_e$ in Fig. 1.10(a). For $\beta < 1$ the junction phase follows the applied phase, $\phi \approx \phi_e$. For $\beta > 1$ the value of $\phi$ becomes hysteretic, with only certain values of junction phase allowed. These values correspond to the number of single flux quanta $\Phi_0$ stored in the loop. When the junction switches, it jumps from one branch to another. Between the branches an SFQ pulse is generated by the changing phase across the junction and the changing current through the inductor.

A different way to understand these jumps is to look at the energy of the loop, including both the junction and the inductor. This gives an energy in terms of the phases $\phi$ and $\phi_e$ as:

$$U(\phi) = \frac{I_c \Phi_0}{2\pi} \left( 1 - \cos\phi + \frac{(\phi - \phi_e)^2}{2\beta} \right). \tag{1.51}$$

This is plotted in Fig. 1.10(b) for the special case of $\phi_e = \pi/2$, which makes the energy symmetric about $\phi = 0$. In switching between the two lowest minima, no energy is dissipated and the switching behavior back and forth is the same.

[Figure 1.10 plot: (a) "Junction phase branches", junction phase $\phi/\pi$ versus applied external phase $\phi_e/\pi = 2\Phi_e/\Phi_0$; (b) "Single-junction Interferometer Potential Energy", energy [$E/\Phi_0 I_c$] versus phase $\phi$ [rad/$\pi$].]

Figure 1.10: Single-junction interferometer phase behavior and potential energy. (a) The junction phase $\phi$ is plotted as a function of the externally applied phase $\phi_e$ showing both the allowed branches (red solid) and prohibited branches (green dashed). A transition from one branch to the next results in an integer change of the number of flux quanta $\Phi_0$ in the interferometer. (b) Potential energy (red solid curve) of the single-junction interferometer when biased by $\Phi_0/2$ flux in the loop. Solid and empty circles show two meta-stable states at equal energies.
The green dashed curve shows the quadratic term in (1.51).

This behavior will become very important in the next chapter, in which I describe the nature of a new logic family. We wish for a symmetry between positive flux and negative flux. With this kind of symmetry in a single loop interferometer, no power will be dissipated by switching events.

1.4.2 Josephson Transmission Line

In this section I briefly discuss the Josephson transmission line (JTL). A Josephson transmission line is a series of single junction superconducting interferometers coupled together by inductances $L$. It is of fundamental importance to RQL because it can carry SFQ pulses from one junction to another.

The basic concept of the JTL can be seen in Fig. 1.11. A constant bias current (less than $I_c$ and by convention about $0.7\, I_c$) is supplied to each junction. The current of an SFQ pulse causes the underbiased junction to become overbiased and switch through a phase of $2\pi$. This switching generates new SFQ pulses which travel both backwards, canceling out the original SFQ pulse, and forwards, allowing a pulse to propagate forward. When multiple junctions and inductors $L$ are coupled together, (1.37b) can be generalized and one finds:

$$\frac{\Phi_0}{2\pi L} \big( -\phi_{i-1}(t) + 2\phi_i(t) - \phi_{i+1}(t) \big) = -I_c \left[ \omega_p^{-2}\, \ddot\phi_i(t) + \omega_c^{-1}\, \dot\phi_i(t) + \sin\big(\phi_i(t)\big) \right] + I_b, \tag{1.52}$$

where $i$ refers to the $i$th junction in the JTL and $I_b$ is a generic externally applied bias current. The left-hand side contains the coupling terms between junctions, which are simply the currents flowing through the inductors between the $i$th single-junction interferometer loop and the previous ($i - 1$) and next ($i + 1$) junctions. On the right-hand side we see the regular terms from (1.37b), here scaled by $I_c$ so that every term is a current, and a biasing function of our choosing. The set of equations for $i = 1, \ldots, N$, plus boundary conditions, forms the equations of motion for the whole transmission line.
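The coupled equations (1.52) are straightforward to integrate numerically. The Python sketch below is a minimal illustration and not the simulation code used for the figures in this thesis: it divides (1.52) by $I_c$, measures time in units of $1/\omega_c$ so that $\omega_p^{-2}\ddot\phi$ becomes $\beta_c\,\phi''$, writes the inductive coupling with $\beta_L = L/L_c$, and assumes open ends (the end junctions have only one neighbor). The function names, the semi-implicit Euler stepping, and the default parameter values are all assumptions made for illustration.

```python
import numpy as np

def jtl_accel(phi, dphi, i_b, beta_c, beta_l):
    """Normalized phase acceleration of every junction from (1.52) divided by I_c.

    beta_c * phi'' = i_b - sin(phi) - phi' - (-phi[i-1] + 2 phi[i] - phi[i+1]) / beta_l
    Open-ended line (assumed boundary condition): end junctions couple to one neighbor.
    """
    coupling = np.zeros_like(phi)
    coupling[1:] += phi[1:] - phi[:-1]    # current leaving toward the left neighbor
    coupling[:-1] += phi[:-1] - phi[1:]   # current leaving toward the right neighbor
    return (i_b - np.sin(phi) - dphi - coupling / beta_l) / beta_c

def integrate_jtl(phi0, bias, beta_c=1.0, beta_l=2.0, dt=0.01, steps=2000):
    """Semi-implicit Euler integration; `bias(tau)` returns the normalized bias current."""
    phi = np.array(phi0, dtype=float)
    dphi = np.zeros_like(phi)
    for n in range(steps):
        dphi += dt * jtl_accel(phi, dphi, bias(n * dt), beta_c, beta_l)
        phi += dt * dphi
    return phi
```

A constant bias such as `lambda tau: 0.7` leaves a line that starts at $\phi_i = \arcsin(0.7)$ at rest, and shifting every phase by $2\pi$ gives an equally valid rest state, which is the state the line is left in after an SFQ pulse has passed. A time-dependent callable, for example a sinusoid, can be supplied as `bias` in the same way.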
(This is in fact a discretized version of the sine-Gordon equation, which implicitly has solutions of traveling SFQ pulses [6].)

The JTL configuration shown in Fig. 1.11 will allow positive pulses to travel rightward and negative pulses to travel leftward. Negative pulses will travel rightward if the direction of the bias current is reversed. We have a choice of $I_b$. If the bias current is supplied through coupled inductors, the junction forms a single junction interferometer, and the switching can occur between equipotential states. If the bias current is not constant but can vary over time and (discretely) over space, we can control the flow of pulses. In RQL, one chooses $I_b = A \sin(\omega t)$ so that both positive and negative pulses travel rightward during opposite clock phases.

The solution to (1.52) for $I_b = A \sin(\omega t)$ is shown in Fig. 1.12 for four junctions on each of two phases. (See Appendix A.) The propagation of pulses is clear. I also note another important fact, that pulses can be held at a "phase boundary" between different bias conditions. It is also clear that the junctions can propagate both positive and negative pulses. The detailed behavior of such an arrangement of Josephson junctions is the topic of the rest of this thesis.

[Figure 1.11 schematic: three panels (a), (b), (c), each showing a chain of junctions $I_c$ with bias currents $I_b$ coupled by inductors $L$.]

Figure 1.11: Josephson Transmission Line. Junctions with critical current $I_c$ are biased by a current $I_b < I_c$ and coupled to adjacent junctions through inductors $L$. The SFQ current is shown by arrows. This configuration allows the propagation of SFQ pulses from one side to the other. (a) through (c) show a time progression of pulse propagation. (a) A current (red) from the SFQ pulse passes through the first junction. The combined SFQ and bias current exceed the critical current. (b) The junction switches and generates an SFQ voltage pulse, which in turn creates exactly one SFQ pulse to the left and right (blue arrows). The red and blue arrows on the left cancel out.
(c) The newly generated SFQ pulse causes the second junction to switch, repeating the process.

[Figure 1.12 panels: circuit schematic of junctions JJ1 to JJ8 with SFQ input, biased by Ib = 0.7 sin(ωt) and Ib = 0.7 sin(ωt + π/2); plots of junction phase (rad/2π) and clock amplitude (I/Ic) versus time (ps) for the Phase 1 and Phase 2 clocks.]

Figure 1.12: Phase behavior of junctions in a JTL. The solution to (1.52) for the circuit schematic shown on top is shown with phases φ1 to φ8 driven by the bias current shown on bottom. Junctions 1 to 4 are driven by the black sinusoid, junctions 5 to 8 by the red sinusoid, a quarter period later. The junctions can be seen to switch in sequence, with the later four junctions switching only when the local bias current is high. (Junction 8 is highly damped to prevent reflections and does not actually switch itself.) The first junction is driven by a positive SFQ pulse followed by a negative SFQ pulse half a period later. (Numerical solution. IcRN = 0.75, ?c = 1.56. Further details found in Appendix A.)

1.5 Introduction to Superconducting Digital Logic

Moore's Law predicts that the speed of digital electronics will increase by a factor of two every two years. This has been achieved in practice by making CMOS circuits smaller and smaller. Recently, this progress has slowed and CMOS has been stuck at clock speeds of around 4 GHz for the last few years [14]. Faster circuits have come at the cost of higher power dissipation and heat loading. This is one factor limiting progress of CMOS technologies. Multi-processor schemes have allowed further throughput improvements; however, this is expected to reach a limit, too. Parallel processing introduces an overhead which some have predicted will impact performance at about 16 processing units [14]. To break through these limitations, a new class of digital circuits is needed.
Some type of superconducting digital electronics may ultimately fill this need. Superconducting technologies have a number of inherent advantages over CMOS. The flux in a superconducting ring is quantized in units of exactly Φ0 = h/2e, making digital one and digital zero intrinsically defined quantities in the system. In Josephson junctions, creating one such flux quantum corresponds to an energy consumption of typically about 10^-18 J, far lower than that for CMOS [14]. Also, the inherent switching speed of junctions is fast; a 1 mV applied potential corresponds to an oscillation frequency of about 500 GHz. Of course, CMOS technology also has many advantages, and developing a new technology that can compete with CMOS is not easy.

The first logic based on the processing of SFQ pulses was Rapid Single Flux Quantum (RSFQ) logic [15]. In RSFQ circuits, digital one and digital zero are encoded as the presence or absence of an SFQ pulse between two clock pulses. Within one clock period, the data is stored as magnetic flux states of superconducting interferometers. Clock pulses are used to read the state of internal memory and reset gates. RSFQ logic has demonstrated fast operating speeds [16] (up to 700 Gbit/s in a static divider), low dynamic power dissipation [17] (a few mW for a whole circuit) [6], chip-to-chip communication of more than 100 Gbit/s [18, 19], and an integration density of tens of thousands of junctions per chip [20], on demonstrated prototypes [21, 22].

However, RSFQ has some issues. For example, the pulse encoding used in RSFQ logic imposes some limitations. RSFQ uses a ripple clock distribution where active elements (the Josephson junctions) regenerate the clock pulses. The ripple-clock distribution necessitates active hardware delays between gates and leads to jitter accumulation [23]. The internal memory of the gates inherently leads to large latency, as the resetting clock signal must propagate through the whole circuit.
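The numbers quoted above for the flux quantum and the junction switching speed can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the exact SI values of h and e, takes Ic·Φ0 as an order-of-magnitude estimate of the energy of one switching event (an assumption, not a result from this thesis), and uses a hypothetical critical current of 0.1 mA.

```python
H_PLANCK = 6.62607015e-34    # Planck constant, J*s (exact SI value)
E_CHARGE = 1.602176634e-19   # elementary charge, C (exact SI value)

PHI0 = H_PLANCK / (2 * E_CHARGE)   # magnetic flux quantum h/2e, in Wb

def josephson_frequency(voltage):
    """Oscillation frequency (Hz) of a junction held at a DC voltage (V)."""
    return voltage / PHI0

def switching_energy(critical_current):
    """Rough energy (J) of one 2*pi phase slip, estimated as Ic * Phi0."""
    return critical_current * PHI0

print(f"Phi0      = {PHI0:.4e} Wb")
print(f"f(1 mV)   = {josephson_frequency(1e-3) / 1e9:.1f} GHz")
print(f"E(0.1 mA) = {switching_energy(1e-4):.2e} J")
```

A 1 mV potential gives a frequency just under 500 GHz, and the energy estimate for the assumed 0.1 mA junction comes out at a few times 10^-19 J.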
Another problem is that the DC power scheme uses bias resistors that give at least ten times higher static power dissipation than the switching power [24]. Also, RSFQ circuits are built from finite-state machines and pipelined on the gate level. This allows high throughput at the cost of high latency. Together these properties limit the application of RSFQ and make it unsuitable for VLSI applications such as high-end computing, where operations-per-Joule and latency are prime performance metrics [25].

Many challenges have prevented superconducting technologies from seeing widespread use in the past. However, recently a number of superconducting digital electronics have found commercial use. Advances in cooling technology make space- and energy-efficient options available for computing [14]. Digital signal processing, adaptive filtering, and direct digitization have all been performed in a commercial setting using superconducting digital electronics. Despite these advances, a number of issues still remain. In particular, the lack of existing superconducting digital memory prevents its use as a general-purpose computer that can compete with silicon-based processors.

Chapter 2

Reciprocal Quantum Logic

2.1 Introduction

Power consumption has increasingly become a limiting factor in high-performance digital circuits and systems. According to a U.S. Environmental Protection Agency study [26], the demand of servers and data centers in the U.S. is approaching 12 GW, equivalent to the output of 25 typical 500 MW power plants. Here I describe a new logic family, Reciprocal Quantum Logic, that yields a factor of 300 reduction in power compared to projected nano-scale CMOS, even taking into account the power consumed to maintain a cryogenic operating temperature.

In this chapter I discuss the fundamentals of reciprocal quantum logic. I first describe an RQL transmission line and RQL logic gates.
I then describe three benchmark experiments that I completed that show the scalability of RQL for very large scale integrated (VLSI) circuits.

In this introduction I describe the encoding of classical digital data using reciprocal SFQ pulses. RQL gates operate with single magnetic flux quanta (SFQ) generated by overdamped Josephson junctions. This is the same approach used in RSFQ gates. Figure 2.1 illustrates how data is encoded in RQL. A "one" bit is encoded as a pair of positive and negative (reciprocal) SFQ pulses generated in the positive and negative phases of the sinusoidal clock. A "zero" bit corresponds to the absence of positive/negative pulse pairs during a clock cycle. The positive SFQ pulse arrives during the positive part of the clock signal while the negative pulse follows later during the negative part of the clock cycle.

A major difference between RQL and RSFQ is how power is supplied to the gates. RSFQ gates [4] use static DC power applied in parallel through bias resistors, while RQL uses AC power applied in series. Figure 2.1 shows the AC power applied through the inductively coupled bias line; the AC power simultaneously serves as a global clock reference. With no bias resistors there is no static power dissipation. Power induced in LAC is conservative apart from junction switching events.

2.2 Josephson Transmission Line

Figure 2.2 shows the schematic of an RQL Josephson transmission line. The JTL is formed from a series of cells, with each cell being an inductive loop formed by junctions JJ1 and JJ2 and inductors L1 and L2. The inductances L1 and L2 are small (L1Ic1 ≪ Φ0), so an incoming SFQ pulse will induce switching of both junctions in series. Junctions are biased through inductor L0, which is coupled to the AC line inductance Lc via a mutual inductance M0c. AC current in the clock line induces positive bias current through the junctions in the positive half-period of the clock cycle and negative current during the negative half-period.
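The reciprocal encoding described in the introduction to this chapter can be summarized in a short sketch. This is a toy model under stated assumptions rather than a description of any circuit: pulses are idealized as instantaneous events, the positive pulse is placed at the positive clock peak (a quarter period into the cycle), and the names `encode_rql`, `decode_rql`, and `clock` are hypothetical.

```python
import math

def encode_rql(bits, period=1.0):
    """Map bits to (time, polarity) SFQ events.

    A 1 becomes a +1 pulse in the positive clock half-cycle followed by a
    -1 pulse half a period later; a 0 produces no pulses at all.
    """
    events = []
    for n, bit in enumerate(bits):
        if bit:
            t_pos = (n + 0.25) * period   # assumed: at the positive clock peak
            events.append((t_pos, +1))
            events.append((t_pos + 0.5 * period, -1))
    return events

def decode_rql(events, n_bits, period=1.0):
    """Recover the bits by looking for a positive pulse in each clock cycle."""
    bits = [0] * n_bits
    for t, polarity in events:
        if polarity > 0:
            bits[int(t // period)] = 1
    return bits

def clock(t, period=1.0):
    """Sinusoidal AC clock, normalized to unit amplitude."""
    return math.sin(2 * math.pi * t / period)

events = encode_rql([1, 0, 1, 1])
print(events)
print(decode_rql(events, 4))
```

Because every positive pulse is paired with a negative one, the net flux delivered over a whole number of clock cycles is zero, and each pulse coincides with a clock half-cycle of matching sign.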
The flux bias inductor Lb is large (LbI1 > Φ0) and any junction connected directly to a bias inductor (L0) forms a single-junction interferometer with two stable states, similar to the single-junction interferometer circuit shown in Fig. 2.1.

Figure 2.1: Basic RQL active interconnect element showing grounded junction J1 coupled inductively to the clock line. Junctions are coupled to other elements through LS. The interconnect draws energy from the power line, much like an RSFQ JTL. This element provides isolation and amplification of the current, with a characteristic delay. The junction and bias inductor LAC form a single-junction interferometer.

Figure 2.2: Josephson transmission line and SFQ launch circuit diagram. (a) Circuit diagram for RQL JTL. The values of inductors are: L0 = 13.4 pH, L1 = L′1 = 3.0 pH, L2 = L′2 = 2.1 pH, M0b = 0.5 pH, M0c = 1.7 pH. The critical currents for the junctions are: JJ1 = 0.100 mA, JJ2 = 0.141 mA. (b) Circuit diagram for RQL launch. (c) Block diagram symbol of JTL, with two digits indicating phase and clock line number. (d) Block diagram symbol of launch.

The circuit in Fig. 2.2 is electrically equivalent to that shown in Fig. 2.1 for LAC = L1 + L0/2 and LS = L′1 for JJ1, and LAC = L2 + L0/2 and LS = L′2 for JJ2. A single flux quantum Φ0 will be stored in the loop formed by the junction J1 and bias inductor after each increase of the junction phase by 2π. This stored flux is canceled out by the reciprocal pulse. Additional DC flux bias on the clock line induces a flux of Φ0/2 in the bias inductor. This makes the states of the single-junction interferometer symmetric with ±Φ0. The RQL JTL is known to have wide operating margins, more than 50% on individual critical currents [27].
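The equivalence between Fig. 2.2 and Fig. 2.1 and the small-inductance condition can be checked numerically from the nominal values in the caption of Fig. 2.2. The sketch below is only arithmetic on those quoted values; the variable and function names are illustrative, and the flux quantum is entered as a rounded constant.

```python
PHI0 = 2.067833848e-15   # magnetic flux quantum h/2e, in Wb

# Nominal values quoted in the caption of Fig. 2.2, in SI units.
L0, L1, L2 = 13.4e-12, 3.0e-12, 2.1e-12
IC_JJ1, IC_JJ2 = 0.100e-3, 0.141e-3

def equivalent_lac(l_series, l_bias=L0):
    """Equivalent bias inductance L_AC = L_series + L0/2 of the Fig. 2.1 element."""
    return l_series + l_bias / 2

lac_jj1 = equivalent_lac(L1)   # equivalent L_AC for JJ1
lac_jj2 = equivalent_lac(L2)   # equivalent L_AC for JJ2

# Flux L*Ic stored in each series inductor, in units of Phi0.
# Values well below one mean the inductor cannot hold a flux quantum.
l1_ratio = L1 * IC_JJ1 / PHI0
l2_ratio = L2 * IC_JJ2 / PHI0

# Step in critical current from JJ1 to JJ2 within one cell.
ic_step = IC_JJ2 / IC_JJ1

print(f"L_AC(JJ1) = {lac_jj1 * 1e12:.1f} pH, L_AC(JJ2) = {lac_jj2 * 1e12:.1f} pH")
print(f"L1*Ic1/Phi0 = {l1_ratio:.3f}, L2*Ic2/Phi0 = {l2_ratio:.3f}")
print(f"Ic2/Ic1 = {ic_step:.2f}")
```

With these values both series inductors hold roughly 0.14 Φ0 at the critical current, and the critical current steps up by a factor of 1.41 within the cell.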
The critical parameter found in our simulations turns out to be the ratio between bias inductor Lb and transmission inductor L1 + L2. The bias inductor Lb needs to be as large as possible so the current in the bias inductor does not affect junction switching. However, for practical reasons bias inductors need to be limited in size. Values in Fig. 2.2 correspond to the nominal design I used.

RQL JTLs allow amplification of SFQ pulse energy. This is achieved by stepping up the critical current from one cell to the next. With two sequential steps with amplification by √2, the energy of the SFQ pulse is doubled. The nominal design values I used correspond to an amplification of the SFQ energy by √2 per stage. This allows me to use the same JTL cell layout as part of an RQL SFQ splitter. An RQL splitter is formed by attaching the inputs of two JTL units to the output of a single JTL. This effectively gives the JTL unit a fan-in of one-half and a fan-out of one.

Many cells can be connected in series to form a long JTL. A pulse will propagate through a JTL segment so long as the bias current is sufficient. However, one-phase AC power does not provide directionality for pulses. Using only one phase, during the negative half-cycle, junctions will switch in the opposite order and a positive pulse, which moved forward during the positive half clock cycle, would travel backward during the negative half. To prevent this, RQL uses a four-phase clock; two clock lines with a phase difference of π/2 provide two phases. By coupling the clock lines to the junctions in a wound or counter-wound fashion one produces a total of four phases offset by 0, π/2, π, and 3π/2. With four phases, when one phase is nearing the end of the "timing window," the next has already started, allowing a pulse to continue onward. (I will give a precise definition of the timing window in Chapter 3.
For now, it can be thought of as the time during which the clock is close to maximum amplitude, approximately the third of the period in which the AC signal has half its maximum amplitude or higher current.) For slower clock speeds, the pulses will wait for the rise of the next clock phase.

Figure 2.3 shows a four-phase RQL JTL. Given the geometry of two clock lines and two winding directions, the natural way to index the clock phases is with a two-bit binary label in which the first digit is the winding bit and the second the clock line bit. The first clock line (0 phase) with regular winding is 00 and the third (π phase), for example, is 10.

The four-phase clock provides an implicit pipeline (data processing elements with the output of one element as the input of the next one) without additional devices needed for latches or clock distribution. However, in cases where the phase of the clock on the next element is delayed by π/2 relative to the current JTL, JTLs on such phase boundaries require slightly altered values compared to the inductors shown in Fig. 2.2. Because of the phase difference, the bias current through adjacent junctions will cause current to leak from one JTL to the next. This can be prevented by altering the inductance values so as to redistribute the currents correctly.

A pipeline in RQL can have any number of logical elements on a single phase, i.e. the elements are connected in series to a single clock line with a single phase. Short pipelines can have higher operating frequencies with fewer operations per phase. Long pipelines, such as the one shown in Fig. 2.4, can have more logical operations per phase at the cost of lower operating frequency; the SFQ pulses must reach the next phase before the current becomes too small to bias the junctions.
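The two-bit phase labels, the rough timing-window estimate, and the trade-off between pipeline depth and clock frequency can be put into a short numerical sketch. This is a minimal illustration and not the timing model of Chapter 3: the function names are hypothetical, the sign convention (later phases are delayed sinusoids) is an assumption, and the per-junction delay passed to `max_pipeline_depth` is a placeholder parameter rather than a measured value.

```python
import math

def clock_phase(label):
    """Phase delay in radians for a two-bit label: first bit winding, second bit clock line."""
    winding, line = int(label[0]), int(label[1])
    return winding * math.pi + line * math.pi / 2

def bias(label, t, period=1.0, amplitude=1.0):
    """AC bias seen at time t by a junction on the given phase (assumed delayed sinusoid)."""
    return amplitude * math.sin(2 * math.pi * t / period - clock_phase(label))

def window_fraction(threshold=0.5):
    """Fraction of a period during which a unit sinusoid is at or above threshold."""
    return (math.pi - 2 * math.asin(threshold)) / (2 * math.pi)

def max_pipeline_depth(f_clock, delay_per_junction, threshold=0.5):
    """Rough number of junctions a pulse can pass within one timing window."""
    window = window_fraction(threshold) / f_clock
    return math.floor(window / delay_per_junction)

for label in ("00", "01", "10", "11"):
    print(label, clock_phase(label) / math.pi, "pi")
print("window fraction:", window_fraction())
print("depth at 10 GHz, 3 ps per junction:", max_pipeline_depth(10e9, 3e-12))
```

Counter-winding appears here as a phase shift of π, which inverts the sign of the bias, and the half-amplitude criterion reproduces the one-third-of-a-period window. With the assumed 3 ps delay per junction, lowering the clock frequency increases the number of junctions that fit on one phase.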
In general, the speed of large circuits is effectively the product of clock frequency and pipeline depth.

Figure 2.3: Data propagation in an RQL 4-phase clock transmission line. (a) Two clock signals with a quarter-period offset between them. (b) Aligned with the waves in (a), this figure shows four JTL units in series on four different phases labeled 00, 01, 10, and 11. Two clock lines provide four phases by counter-winding, shown by inductors pointing in opposite directions. Pulse directionality is achieved by a four-phase clock. A positive flux quantum propagates forward when the bias current is positive, and a negative one propagates forward when the bias current is negative. Positive and negative SFQ pulses are represented by the current generated by their flux in two pairs of junctions each. Positive pulses drive bias current down both junctions, and the rightmost will switch first. Negative pulses drive bias current upward through the junctions; again the right junction of the pair switches first. (Only coupled inductors are shown for clarity.) (c) Block diagram of the circuit shown in (b).

Figure 2.4: Deep Pipeline JTL. Four JTL units are on the same clock phase with a single positive SFQ pulse traveling left-to-right. The reciprocal pulse comes later. Pulses travel through any number of junctions provided that the bias current is correct. Pulses propagate through junctions until a clock phase boundary is reached. This decreases latency as the pulse can travel through more junctions per clock cycle.

RQL pipelines are robust against timing errors. The data self-synchronizes to the AC clock signal. At a phase boundary, where junctions are coupled to a clock line with a different phase, early pulses wait for the rise of the clock signal in the next section.
The jitter accumulates only within one pipeline stage and is negligible. Nominally timed pulses have a window during which propagation is possible. Unlike in RSFQ, pulses do not need to wait for a specific clock SFQ pulse to propagate. The hold-and-release operation in RSFQ fixes the clock speed to a specific value. In RQL, the clock speed can be changed freely up to a maximum (which will be derived in Chapter 3). Timing errors can generally be corrected by reducing the clock speed; every pipe can be operated at a lower frequency and the pulses will wait at the pipe phase boundary.

The maximum pipeline depth in terms of Josephson junctions is determined by the operating clock frequency fclock and the delay time per cell. The delay of a pipeline should be less than approximately one-third of a clock cycle. I give a detailed description of RQL timing in Chapter 3.

2.3 Logic Gates

The routing and processing of pulse-based signals is distinct from transistor-based voltage-state logic. In RQL, logic is performed by routing pulses through an inductive network. The Josephson junctions in the JTLs act as signal repeaters. Considering only the positive pulses, the gates are similar to the state machines of RSFQ logic. The trailing negative pulse erases the internal state every clock cycle. Logic gates are unclocked. The timing and bias depend on the input and output JTLs connected to the gate. There are three fundamental RQL gates that form a complete set [28] and thus can be used to build any digital circuit: the AndOr gate, the AnotB gate, and the Set-Reset latch.

2.3.1 AND and OR Gates

Figure 2.5(a) shows a schematic of the AndOr gate. The gate has two symmetric inputs and two outputs. The first pulse the gate receives on either input is routed to the OR output; the second to the AND output. The gate contains two junctions that are connected to the inputs through inductive networks formed by two high-efficiency transformers, k12 and k34.
Figure 2.5: RQL Logic Gates. (a) AND and OR logic gate schematic. Inputs A and B are highly coupled through k12. The high common mode inductance drives currents through both junctions upon SFQ input. Flux biasing through k34 biases JJ1 to switch first, after which the flux biases JJ2. Low odd-mode inductance between L1 and L2 prevents switching junctions from generating backwards-traveling SFQ pulses. (Optional circuit elements shown in grey.) The values of inductors are: L1 = L2 = 26.9 pH, L3 = L4 = 9.8 pH, L5 = L6 = 3.0 pH, M12 = 24.8 pH, M34 = 0.57 pH. The critical currents for the junctions are: JJ1 = JJ2 = 0.141 mA. (b to d) Symbols for OR, AND, and the combined AndOr gate. (e) Schematic for the A-and-not-B (AnotB) logic gate. A pulse at B before A will reverse-bias JJ2 through the high-efficiency k34 transformer, inhibiting output. A pulse at A before B will pass through uninhibited. The values of inductors are: L1 = L2 = 3.25 pH, L3 = 28.3 pH, L4 = 32.3 pH, L5 = 4.2 pH, M23 = 0.525 pH, M34 = 15.76 pH. The critical currents for the junctions are: JJ1 = JJ2 = 0.100 mA. (f) Symbol for AnotB gate.

The high-efficiency transformers form a high-inductance differential mode between inputs and a low-inductance common mode between the inputs and the junctions. A pulse at either input will send current through both junctions. The junction JJ1 at the OR output is preferentially biased by the Φ0/2 flux induced in inductor L4. This junction will switch when the first input pulse on either input arrives. After switching, the flux state of the gate is reversed and junction JJ2 at the AND output becomes preferentially biased. This means that junction JJ2 will switch if a second positive input pulse arrives. The high differential inductance between inputs prevents propagation of the input pulses from one input to the other.
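The routing of positive pulses described above amounts to a very small state machine, sketched below. This is a behavioural toy model and not a circuit simulation: it tracks only which junction is preferentially biased, assumes the pulses arrive one after another with at most one positive pulse per input per clock cycle, and uses the hypothetical name `and_or_positive`.

```python
def and_or_positive(arrivals):
    """Route the positive pulses of one clock cycle through an AndOr gate.

    arrivals: input names ('A' or 'B') in order of arrival.
    Returns the list of outputs that fire, in order.
    """
    if len(arrivals) > 2 or len(set(arrivals)) != len(arrivals):
        raise ValueError("at most one positive pulse per input per clock cycle")
    outputs = []
    or_biased = True             # flux bias initially favours JJ1 (OR output)
    for _ in arrivals:
        if or_biased:
            outputs.append("OR")     # first pulse switches JJ1
            or_biased = False        # flux state reverses, JJ2 now favoured
        else:
            outputs.append("AND")    # second pulse switches JJ2
    return outputs

for case in ([], ["A"], ["B"], ["A", "B"], ["B", "A"]):
    print(case, "->", and_or_positive(case))
```

The model makes the symmetry of the inputs explicit: the outputs depend only on how many pulses have arrived, not on which input they came from.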
Negative pulses are processed in a similar way, except that junction JJ2 at the AND output will switch first in the case of two input pulses. The first negative pulse will follow the second positive pulse in this case, and the second negative pulse will follow the first positive pulse. This switching does not violate the RQL data encoding, which requires that every positive pulse is followed by a negative pulse approximately half a clock period later, since all positive pulses on the output are followed by reciprocal pulses. The ordering for negative pulses is reversed, though this is only a timing issue and not a logic error.

I note that the AndOr gate does not have an explicit clock bias. The bias current for the junctions is provided from the input JTLs. The input JTLs to the gate require special parameters and negligible output inductance. The combination of L1, L2, and k12 produces a total inductance of 5.1 pH, the same as the JTL would see when connected to another JTL. The AndOr gate parameters are optimized in such a way that the signal is amplified at the input of the gate and there is sufficient bias current to switch the junctions in the gate. The critical margin in the AndOr gate is on the DC bias inductor. The margins on other parameters are more than 50%. The gate has a fan-out of one-half and requires one standard JTL segment at the output to connect to other gates.

The AndOr gate does not have timing restrictions on input signals. If both inputs arrive simultaneously, both junctions switch and produce simultaneous output at the AND and OR outputs. The internal flux state of the gate does not change in this case. The gate can operate either at a phase boundary or inside a single-phase pipeline. Similar to standard JTLs, the input and output JTLs for the AndOr gate have adjusted parameters that compensate for differences in bias current at the phase boundary. The AndOr gate can be used as a stand-alone OR or AND gate.
Either (but not both) of the inductors L5 and L6 can be omitted in cases where only one of the two logic functions is desired.

2.3.2 A-and-not-B Gate

Figure 2.5(e) shows a schematic of the AnotB gate. The A-and-not-B (AnotB) gate allows pulses arriving at A to pass through as long as a pulse has not arrived previously at B. The gate consists of two junctions, JJ1 and JJ2, connected through a high-efficiency transformer k34. The high-efficiency transformer "negatively" couples the junctions to each other; a positive current through one junction induces a negative current through the other, and vice versa. Therefore, when an input pulse arrives at the B input and JJ1 switches, a negative current is induced through JJ2, which inhibits it from switching. In this case an A-input pulse is stored in input inductor L5 and will annihilate with the reciprocal pulse half a cycle later. In the absence of a B-pulse, junction JJ2 switches with each incoming A-input pulse.

The AnotB gate has a bistable internal flux state corresponding to ±Φ0/2. The gate has a DC flux bias line that sets up a positive current through both junctions to ground. When either junction is triggered by an SFQ pulse, the flux state is reversed, which reverses the current through the junctions and inhibits the triggering of the other junction.

The biasing and input/output JTL parameters are the same as for the AndOr gate. However, the AnotB gate has specific timing requirements. The pulse on the B-input has to arrive before or simultaneously with the A-input pulse. The latter case can be realized by placing the AnotB gate at a phase boundary. For this reason this gate is often placed at a phase boundary to save on explicit hardware delays.

2.3.3 Set-Reset Gate

The Set-Reset gate shown in Fig. 2.6(a) is the most complicated RQL gate I have worked with. The gate is complicated because the attraction of positive and negative reciprocal pulses makes it difficult to realize internal memory.
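The inhibit behaviour of the AnotB gate in Section 2.3.2, and the reason for its timing requirement, can be captured in a few lines. The sketch below is a behavioural toy model for the positive pulses of a single clock cycle; the name `a_not_b` and the list-of-arrivals interface are illustrative choices, not part of the gate design.

```python
def a_not_b(arrivals):
    """Return 1 if an A pulse is passed to the output, else 0.

    arrivals: input names ('A' or 'B') in order of arrival within one cycle.
    """
    inhibited = False
    for name in arrivals:
        if name == "B":
            inhibited = True       # B switches JJ1 and reverse-biases JJ2
        elif name == "A":
            # A is passed only if no B pulse arrived earlier in the cycle.
            return 0 if inhibited else 1
    return 0

print(a_not_b(["A"]))        # A alone
print(a_not_b(["B", "A"]))   # B arrives first
print(a_not_b(["A", "B"]))   # A arrives first
```

The last case shows why the ordering matters: if A arrives ahead of B, the A pulse has already passed by the time B could inhibit it, so the logical result depends on arrival order.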
The Set-Reset gate has an internal state which switches between two bistable flux states. A positive pulse is output when the internal state switches to the positive flux state. A negative pulse is output when the state switches to the negative flux state. The state changes only for the first SFQ pulse pair on either input; later pulses do not switch the state or generate output. The waveforms during gate operation are shown in Fig. 2.6(b).

Figure 2.6: Set/Reset (SRS) unit schematic and behavior. (a) Circuit schematic of the Set/Reset gate. The bias induces a flux of +Φ0/2 through JJ2, L5, and JJ3 (clockwise current). This puts the unit in the Set state. SFQ pulse pairs that arrive at the Set input when the SRS is in the Set state have no effect. A positive pulse at Reset will be inverted and output. The trailing negative SFQ pulse changes the internal state to −Φ0/2. With the SRS in the −Φ0/2 state, pulses at Reset do nothing while a positive pulse at Set will travel through and the trailing negative pulse will return the internal flux to its original state. The values of inductors are: L1 = 3.25 pH, L2 = 3.25 pH, L3 = 1 pH, L4 = 28.3 pH, L5 = 32.3 pH, L6 = 4.2 pH, M35 = 0.5 pH, M45 = 26.6 pH. The critical currents for the junctions are: JJ1 = JJ2 = JJ3 = 0.118 mA. (b) Junction phases across JJ1 (Set), JJ2 (Reset) and JJ3 (Output). Set pulses have no effect unless the most recent pulse was at Reset. Multiple Reset pulses have an effect only for the first incoming pulse. Output shows the internal state of the unit.

The Set-Reset gate consists of three junctions. Junctions JJ2, JJ3, and the inductor L5 form the Set memory loop. Junction JJ1 and inductor L4 form the Reset memory loop. Both loops are coupled to each other through the high-efficiency transformer k35.
The Set memory loop is initially biased to have +Φ0/2, so JJ3 is preferentially biased with positive current and JJ2 has a negative current. Junction JJ2 has a small critical current so that it will switch on an incoming Set pulse despite the positive flux in the loop. This leaves the internal flux in the Set loop at +3Φ0/2. JJ3 then switches, producing a positive output, and returns the internal flux to +Φ0/2. The reciprocal pulse switches JJ2 and changes the internal flux from +Φ0/2 to −Φ0/2. Any further Set pulses will simply change the internal state from −Φ0/2 to +Φ0/2 (the leading pulse) and back (the reciprocal pulse) without causing any output.

The above process continues until a Reset pulse switches junction JJ1. In this case, −Φ0 is applied to the Set loop and the flux state in the Set loop becomes −3Φ0/2. A negative output follows when JJ3 switches and returns the internal flux state to −Φ0/2. The following reciprocal Reset pulse switches the internal state to +Φ0/2. Subsequent Reset pulses will switch the internal state back and forth between −Φ0/2 and +Φ0/2, similar to the above behavior of the Set pulses.

For the gate to work properly, the Set and Reset pulses must arrive with a half-phase delay between them. To see why, notice that a positive Reset pulse generates a negative output pulse. In order for this pulse to propagate, it must wait for or be generated during the negative clock cycle on the output. For this reason, the Set-Reset gate must operate on a mixed phase boundary with the Set input on the same phase as the output and the Reset input on a clock phase offset by π/2.

Figure 2.7: RQL Exclusive-OR Gate. An XOR gate can be constructed in RQL by connecting the outputs of an AndOr gate to the inputs of an AnotB gate and placing the AnotB gate output at a phase boundary. A single output from the AndOr gate will propagate through; two outputs will not go through, realizing the logical behavior of the XOR gate.
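The flux bookkeeping of the Set loop described in this section can be followed step by step with a small state machine. The sketch below is a toy model that tracks only the loop flux in units of Φ0/2 and fires the output junction whenever the flux reaches ±3Φ0/2; it ignores timing and clock phases entirely, and the class and method names are hypothetical.

```python
class SetResetGate:
    """Flux bookkeeping for the Set loop, in units of Phi0/2 (+1 means +Phi0/2)."""

    def __init__(self):
        self.flux = +1            # initial bias of +Phi0/2

    def _apply(self, delta):
        """Add delta half-quanta; JJ3 fires if the loop reaches +/-3/2 Phi0."""
        self.flux += delta
        if self.flux == +3:
            self.flux = +1
            return +1             # positive output pulse
        if self.flux == -3:
            self.flux = -1
            return -1             # negative output pulse
        return 0                  # no output

    def set_pair(self):
        """Apply a reciprocal pulse pair at Set; return (leading, trailing) outputs."""
        return self._apply(+2), self._apply(-2)

    def reset_pair(self):
        """Apply a reciprocal pulse pair at Reset; return (leading, trailing) outputs."""
        return self._apply(-2), self._apply(+2)

gate = SetResetGate()
for name in ("set", "set", "reset", "reset", "set"):
    out = gate.set_pair() if name == "set" else gate.reset_pair()
    print(name, out, "flux =", gate.flux)
```

In this model only the first Set pair after a Reset, and the first Reset pair after a Set, produce an output pulse; repeated pairs on the same input merely move the flux between +Φ0/2 and −Φ0/2 and back.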
The detailed timing behavior of the Set/Reset gate is discussed in Chapter 3.

2.4 Composite Logic Gates

In this section I briefly describe two logic gates built from more fundamental gate elements.

2.4.1 Exclusive-Or Gate

Figure 2.7 shows the schematic of an exclusive-or (XOR) gate. It can be constructed by connecting the outputs of an AndOr gate to the inputs of an AnotB gate, with the OR output going to A and the AND output going to B. A phase boundary must exist between the input and output. This avoids a "race" condition on the AnotB gate, in which the output of the gate depends on the timing behavior of the JTLs. The gate operates as follows. A single input pulse will produce a single output on the OR output. The output pulse then travels to the AnotB gate, where it must wait until the clock rises on the output JTL (marked 01 in the figure). Provided a second pulse has not arrived, the output of the AnotB gate will not be inhibited. If a second pulse comes, regardless of timing (but on the same clock phase), both pulses will arrive at the AnotB gate and no output will occur. The XOR gate is particularly important for my work because the carry-lookahead adder (Chapter 6) is composed of both AnotB and AndOr gates as in the XOR gate.

2.4.2 Non-Destructive Read-out Gate

The Set-Reset gate is inherently destructive in the sense that any output results in a change of the internal flux state. To preserve an internal flux state after multiple read operations we can use a Non-Destructive Read-out (NDRO) gate. Figure 2.8 shows the schematic of the NDRO gate. In this gate, the Set-Reset gate allows either a positive pulse or a negative pulse to be propagated to the B-input of an AnotB gate. Because the reciprocal pulse only follows when the next Reset (or Set) pulse arrives at the Set/Reset gate, the AnotB gate will remain in the output-inhibiting flux state.
Pulses from the read input will not generate output, nor will the internal flux state change after many read inputs.

Figure 2.8: Non-Destructive Read-Out Gate. The NDRO gate serves as a memory unit. The Set-Reset gate only outputs a pulse when the internal state changes. By including an AnotB gate the internal state can be determined repeatedly without changing the state. (a) In this configuration a read RQL pulse will be output from the AnotB gate upon input at the Read input, depending on the internal state of the Set-Reset gate. (b) In this configuration a constant series of self-generated SFQ pulse pairs continuously reads the state of the Set-Reset gate. This will generate output SFQ pulses every clock cycle until the Set-Reset gate gets a reset signal.

2.5 Fabrication and Equipment

2.5.1 Fabrication

I had test gate circuits fabricated at Hypres using their 4.5 kA/cm2 process. (Further details of the Nb/AlOx/Nb fabrication process can be found in Appendix E.) These chips were given the name Monrovia20 and contained experiments for logic and power tests.

Circuits fabricated in Hypres' superconductor fabrication process with 4.5 kA/cm2 Josephson junction critical current density have a 1.5 μm minimum feature size [29]. (See Appendix E.) The process contains four Nb metallization layers. The second and third layers are used for wiring, Josephson junctions, and gate inductances, while the first and fourth metallization layers are used as superconducting ground planes. I designed AC clock lines as microstrips with signal in the first metal layer and ground in the fourth metal layer, connected to the first layer ground through frequently spaced vias. This topology of clock lines gives a high yield because it avoids step coverage problems and film defects in higher metal layers.
It also provides a superconducting shield above the signal wire that reduces cross-coupling between adjacent lines. The impedance of the line is limited to 42 Ω because the width is limited to the minimum feature size of the signal layer (2.3 μm) and the SiO2 isolation has a thickness of 850 nm. (A 50 Ω line would be realized with a 2.0 μm wide microstrip in the first layer with a ground plane in the fourth layer.)

The shift register was fabricated in four metal layers by Hypres with 4.5 kA/cm2 critical current density and 1.5 μm minimum feature size. The junction plasma frequency was ωp/2π = 250 GHz. This gives a minimum SFQ pulse width of τp = 1/ωp = 3 ps. The clock lines are 2.3 μm wide strips (the minimum width) with 850 nm SiO2 dielectric thickness to the ground plane¹. This gives a maximum impedance of 32 Ω. Tapered lines between pads and circuit provided impedance matching. (See Chapter 4 for details on the power delivery network.)

Figure 2.9: RQL Clock Line Transformer Layout. The transformer coupling junctions to clock lines is shown here in three orthogonal cut-away views (top, front, and side; figure not to scale). On the bottom of the chip, moats separate the M0 signal line (light red) from the M0 ground plane (red). The M3 layer (dark blue) is connected to the M0 ground plane with vias, creating a grounded skyplane. This creates a microstrip. The transformer is fabricated by depositing a second microstrip in M2 (green) between the M0 signal line and the skyplane. The transformer is grounded on one end by a via connecting to the skyplane. The mutual inductance between M0 and M2 induces currents through the junction (not shown in figure). The M2 transformer extends over the edges of the M0 signal line to ensure that misalignment during fabrication does not affect the mutual inductance between M0 and M2.
Figure 2.9 shows the design of the transformers that couple the junctions to the clock lines. The clock signal is carried in the bottom (first) metallization layer, M0. Moats in M0 electrically isolate the signal line from the grounded portions of M0. The top (fourth) metallization layer, M3, serves as a local ground plane for signal lines. Vias between M3 and the grounded portions of M0 ground the M3 skyplane. Bias transformers are formed from the third metallization layer (M2) on top of the clock signal line (M0 Signal). The inductive coupling scales linearly with the length of the transformer, and there is a small capacitive coupling, typically on the order of 7 fF. The transformer is grounded at one end by a via to M3. The other end (not shown in Fig. 2.9) leads to a grounded Josephson junction, creating an inductive loop with a junction.

¹ The ground plane is the top metal layer in this design.

Figure 2.10 shows a top view of a similar layout for multiple transformers. The grounds on either side of the three signal lines are connected to the M3 skyplane through vias (shown as gold boxes). In this example the RQL gate is on phase 11. The direction of current is from bottom to top in the clock lines and from top to bottom in the DC bias line. The DC bias current will inductively generate a positive current through the junction, biasing it. Clock I is not utilized here, but supplies current to the next transformer, which will use current from Clock I and not Clock Q. During the first half of the clock cycle, Clock Q will likewise induce positive current through the junction.

Figure 2.10: RQL Clock Line Transformer with DC Bias. Top-down diagram of clock line transformers that supply flux and current bias to junctions. Two ground planes on the left and right (dark red) connect to the skyplane (dark blue, visible in outline) through vias (yellow squares). Transformer shown in dark green outline.
Currents flow upward in the Clock I and Clock Q lines, and downward in the DC Bias line. The flux bias pulls current out of the junction to bias it at $-\Phi_0/2$. The clock line pulls current out of the junction during the positive clock phase and pushes current into the junction during the negative clock phase, making this a phase 11 transformer.

2.5.2 Chip Mounting and Cryogenic Environment

Figure 2.11 shows the overall layout of the cryogenic probe used for measurements on the chip. The probe is inserted into a dewar with a 3 inch wide neck which holds approximately 60 L of liquid helium. (See Fig. 2.11(a).) The liquid helium is at 4.2 K and the chip is completely submerged in the liquid.

Figure 2.11(b) shows a more detailed schematic view of the probe. Above the dewar, female coaxial connectors attach to stainless steel UT-85 coaxial cables, which travel down the neck of the probe to the cold end, where they are attached to a printed circuit board (PCB) mount. The printed circuit board is held by a plastic probe head. The middle of the probe head is open and exposes 48 or 80 gold contact bumps (depending on the probe model). Of these, 24 or 40 bumps are ground contacts. The remaining 24 or 40 contacts connect to the coaxial cables through the PCB. The chip is placed in contact with these bumps to provide electrical connectivity.

The chip is held in place by a pressure foot. (See Fig. 2.11(b).) Pressure is applied evenly to the chip by means of the pressure foot. The pressure can be adjusted by the pressure screw connecting the pressure foot to the bridge. The bridge is held in place by two screws which are threaded through the bridge and into the probe head. After the chip is securely in place, two µ-metal shields (not shown in Fig. 2.11) are attached to the probe head, one over the other, to exclude magnetic fields from the interior of the probe.
Finally, a fiberglass shell is placed around the assembly and held in place with four screws which are threaded through the fiberglass shell and into the probe head. The above generic description covers both the American Cryoprobe Petersen probe (with 24 signal pads) and the High Precision Devices, Inc., probe (with 40 signal pads). These probes were built to order and do not have model numbers.

Figure 2.11: Schematic of test probe. (a) The probe used to test chips is cooled by inserting it into a 60 L dewar of liquid helium. (b) Detailed view of the probe. 40 coaxial connectors at the top of the probe lead down the probe neck to the probe's printed circuit board. 80 gold bumps on the PCB serve as pressure contacts to the circuit chip. The chip is held in place by a pressure foot. The pressure of the foot on the chip can be adjusted by the pressure screw. The pressure screw is held in place by the bridge, which is screwed to the probe head by two bridge screws. Figures not to scale.

2.5.3 Experimental Setup

Figure 2.12: Layout of Monrovia 20 RQL chip. This chip contains three experimental RQL circuits: the M20LT experiment, which tests the logic behavior of RQL gates and measures the bit error rate of these gates; the M20SR experiment, which measures the effect of switching junctions on phase and amplitude of the clock signal; and M20PS, which is an experiment that tests the behavior of the even mode of a Wilkinson power splitter (see Chapter 4).

Figure 2.12 shows the overall CAD layout of the Monrovia 20 chip which I used to test RQL logic gates. This CAD layout was generated by the computer aided
design software Cadence.² There were three experimental circuits on this chip. The Monrovia 20 Logic Test (M20LT) is the subject of the first experiment described here; it tested the correct logical operation of RQL gates and the bit error rate (BER) of these gates. The Monrovia 20 Shift Register (M20SR) circuit was used to test the phase and amplitude modulation of RQL JTLs. The Monrovia 20 Power Splitter (M20PS) was the subject of a third experiment, covered in Chapter 4.

² All layout diagrams in this thesis are taken from the Cadence design environment.

Figure 2.13 shows more detailed views of the layout of the logic circuits in M20LT. The input is on the left. The logic circuits can be seen in the middle. The output is on the right, consisting of two large output amplifiers.

Figure 2.13: Monrovia 20 logic chip. The input consists of two pulse generators triggered off an RZ voltage. The logic section contains four logic gates and 28 JTLs. The output consists of two large amplifiers which produce a return-to-zero signal at the chip pad.

Figure 2.14 shows a block diagram of the experimental setup for M20LT and M20SR. When the bit error rate is tested, the oscilloscope is replaced with an Anritsu MP1764C BER detector. The data and clock lines return to room temperature without connecting to the ground on chip. (Each line is inductively coupled to the circuit.) Return lines are marked with (*) on the block diagram. Output data from q0 is generated on chip.

One clock generator (#1 in the figure) (Agilent Technologies E8275D) generated a synchronization signal for the pattern generator (Anritsu / Hewlett-Packard 70843A). The pattern generator operated at a peak-to-peak voltage of 0.25 V to 2.0 V. The clock signal to the pattern generator was clocked at twice the speed of the data to provide a return-to-zero (RZ) data input pattern, allowing a maximum data pattern frequency of 6 GHz RZ. The pattern generator also supplied a synchronization signal to the oscilloscope (Tektronix TDS 8000 Digital Sampling Oscilloscope), which always triggered on the beginning of the data input cycle. The output of the pattern generator was attenuated by 40 dB to reduce the power going to the on-chip SFQ pulse generator.

A second clock generator (#2) (Agilent Technologies E8275D) was used to feed the on-chip clock lines. The clocks were synchronized through each generator's respective synchronization port. The clock generator had a variable power output and phase for the clock sinusoid. The clock signal was split by a 6 dB splitter before traveling through two identical physical delay lines (the two delay elements shown in Fig. 2.14). Additional hardware delays of unknown electrical length could be added after the second delay line. All data input and output lines were connected to the circuit through bias-Ts before entering the probe, to reduce noise and isolate the circuit from the measurement equipment. Bias-Ts were also used to isolate the DC offset line, the "Amplifier DC Source" line, and the "DC Data Offset" line (see Fig. 2.14). These were connected to the chip through the bias-Ts and were grounded on chip. The output amplifier changed the phase of the final junctions into an RZ signal and required a DC bias.

Figure 2.14: Experimental setup for timing experiments. Block diagram of the setup, showing the clock generators (#1 and #2), pattern generator, 40 dB attenuator, hardware delays, bias-Ts, low-pass filters, DC sources, low-noise amplifier, and oscilloscope connected to the circuit at 4.2 K. a0: data input; a0*: data return; c0, c0*, c1, c1*: clock phases and returns; dc0: DC offset bias; dc0*: offset bias return; q0: experimental output. The low-pass filters have a cutoff frequency of $f_C$ = 1 kHz. The low-noise amplifier is a Miteq LNA with an operation range of 0.5 to 18 GHz and a 2.5 dB noise floor.
The DC Data Offset supplied an overall DC bias to the data input lines.

Each circuit under test contained an on-chip output amplifier that generated a DC voltage pulse with approximately 2 mV amplitude. The output voltage pattern is in RZ format, such that the duration of the output pulse is half a clock period. In order to be detectable on the oscilloscope, the output was amplified with a Miteq³ GaAs low-noise amplifier with a frequency range of 0.5 to 18 GHz, 20 dB gain, and 2.5 dB noise floor. The output voltage was monitored by an oscilloscope, which could easily detect the 20 mV output switching signal.

Figure 2.15 shows typical data waveforms as recorded on the oscilloscope. Both clock lines were monitored. On the bottom, the output waveform is shown. Typically, there is a certain amount of crosstalk between the clock line and data output lines due to the coupling between lines on the printed circuit board (PCB) used to make connections to the chip. This crosstalk can be subtracted from the signal later, during data analysis. Each output was fed into the oscilloscope, except the DC phase offset.

³ This is a Miteq ASS4-00501800-25-5P-4 with an operation range of 0.5 to 18 GHz and a 2.5 dB noise floor. All references in this thesis to a Miteq amplifier refer to this model with this performance.

2.6 Experimental Verification

Routing and processing of pulse-based data in RQL is different from what is used in conventional transistor voltage-based digital logic circuits. While CMOS logic families are sensitive to rise and hold times, pulse-based logic depends on the sequence of arrival of pulses. For example, for the RQL AnotB gate, the B pulse must arrive sufficiently before an A pulse for the gate to function properly, even within the same clock phase. As another example, the AndOr gate sends pulses first to the OR output, then (if applicable) to AND. In general, in RQL pulses are transient and are only held at clock phase boundaries. Unlike CMOS transistors, where previous logic
operations have lasting effects on output, correct logical operation in RQL depends on the depth (number of junctions) of each clock phase to ensure synchronized inputs.

Figure 2.15: Oscilloscope output. Typical output from the Tektronix TDS8000 Digital Sampling Oscilloscope while measuring the RQL circuit. Top two rows are input signals returned from chip. The third row is the two superimposed clock signals returning from chip. The bottom row is output from the RQL chip, where each peak represents a single reciprocal SFQ pulse pair. Measurable voltages ranged from 10 mV / div upward, giving 0.1 mV resolution. Measurable times ranged from 50 ps / div upward, allowing 1 ps time resolution. Bottom image is a color-inverse of the output, in which the divisions can be seen. Axes: voltage (trace scales of 20, 50, 50, and 100 mV / div) versus time (500 ps / div).

The use of pulse-encoding also affects power dissipation and the bit error rate. CMOS operates with constant voltage states, while the voltages in RQL are transient. In both CMOS and RQL, power is expended only during switching, but switching has a much different meaning in the two technologies. RQL junctions switch for every "one" bit, regardless of previous inputs. CMOS only switches for a change from "one" to "zero" (or vice versa). For constant inputs this gives a significantly different number of power-dissipating events. For random bits, though, the number of power-dissipating events in CMOS or RQL will be approximately the same.

To find the bit error rate in RQL, I note that the error condition is very well defined. Since SFQ pulses are exactly quantized amounts of flux, there is no threshold voltage as in CMOS. Voltage and current in RQL are defined by the junctions and are not design parameters as in CMOS.
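The comparison of power-dissipating events made earlier in this section can be illustrated with a short sketch. It assumes a deliberately simplified counting model that is not part of the measurements: one dissipative event per "one" bit in RQL, and one per change between adjacent bits in CMOS. The function names are hypothetical and chosen only for this illustration.

```python
import random


def rql_events(bits):
    """Assumed RQL model: one switching event for every digital 'one'."""
    return sum(bits)


def cmos_events(bits):
    """Assumed CMOS model: one switching event for every change between adjacent bits."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)


if __name__ == "__main__":
    n = 100_000
    rng = random.Random(0)  # fixed seed so the run is repeatable

    constant_ones = [1] * n
    random_bits = [rng.randint(0, 1) for _ in range(n)]

    # Constant input: RQL switches on every bit, CMOS never switches.
    print("constant ones: RQL =", rql_events(constant_ones),
          " CMOS =", cmos_events(constant_ones))
    # Random input: both counts are close to n / 2.
    print("random bits:   RQL =", rql_events(random_bits),
          " CMOS =", cmos_events(random_bits))
```

Under this model the two counts differ completely for constant input and agree to within statistical fluctuations for random input.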
In RQL, errors constitute the presence of SFQ pulses in a clock period where there should be none, or the absence of SFQ pulses during a clock period in which they should have been present.

Before designing circuits that used multiple gates, I checked the operation of individual gates. I also ran simulations, which indicated a broad range of operating margins. Experimental tests (see Chapter 5) have shown that RQL gates behave much as expected and in some cases actually perform better than expected. In the remainder of this chapter, I describe three experiments to verify the logical operation of RQL gates, measure the power dissipation, and determine the bit error rate of the AnotB gate.

Table 2.1: Universal Logic Test for RQL gates. Two inputs, A and B, lead to four possible output conditions from the logic gates AnotB, OR, XOR, and AND.

  Input  |        Output
  A   B  |  AnotB   OR   XOR   AND
  0   0  |    0     0     0     0
  1   0  |    1     1     1     0
  0   1  |    0     1     1     0
  1   1  |    0     1     0     1

2.6.1 Logic Operation Test

I tested the basic logic gates (AndOr and AnotB) with the simple circuit (named M20L) shown in Fig. 2.16(a). For the AndOr gate, two bits of input (A and B) correspond to four possible input conditions. I synthesized the XOR gate by feeding the AndOr gate outputs into the AnotB gate. Because the AnotB gate is on a clock phase boundary, output is inhibited (for any input combination) until the beginning of propagation on the next clock phase; the order of the A and B input pulses is unimportant, as both will arrive before output occurs. The expected results for each gate are given in Table 2.1.

A block diagram of the test circuit is shown in Fig. 2.16(b), with the SFQ generating junctions shown on the left and amplifiers shown on the right. The SFQ launches convert a return-to-zero (RZ) voltage into SFQ pulses. The output amplifiers are similar to those in [27] and provide a 2 mV RZ output signal. The logic circuit was designed to operate at 4.2 K with speeds up to 20 GHz.
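The gate logic summarized in Table 2.1 can be checked with a minimal Boolean sketch. It models only the logical function of each gate, not pulse timing or clock phases. The wiring of the synthesized XOR (OR output into the A input of the AnotB gate, AND output into its B input) is an assumption made for this illustration; the text states only that the AndOr outputs feed the AnotB gate. All function names are hypothetical.

```python
def and_or(a, b):
    """AndOr gate: returns the pair (AND, OR) of the two input bits."""
    return a & b, a | b


def a_not_b(a, b):
    """AnotB gate: passes A only when B is absent."""
    return a & (1 - b)


def xor_from_gates(a, b):
    """XOR synthesized as AnotB(OR, AND) (assumed wiring, see lead-in)."""
    and_out, or_out = and_or(a, b)
    return a_not_b(or_out, and_out)


def truth_table():
    """Rows of (A, B, AnotB, OR, XOR, AND) in the same order as Table 2.1."""
    rows = []
    for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        and_out, or_out = and_or(a, b)
        rows.append((a, b, a_not_b(a, b), or_out, xor_from_gates(a, b), and_out))
    return rows


if __name__ == "__main__":
    print("A B AnotB OR XOR AND")
    for row in truth_table():
        print(*row)
```

With this wiring the four rows reproduce the entries of Table 2.1.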
I used an Anritsu pattern generator to supply the input pulses, and this limited me to 6 GHz. Figure 2.16(c) shows a representative output of the sampling oscilloscope, with the inputs on the top two lines and the four outputs on the bottom four lines. The room-temperature amplifiers invert the signal, causing positive SFQ pulses to produce downward spikes instead of upward (as on the input). Careful examination of Fig. 2.16(c) reveals that the outputs correspond to the expected results (see Table 2.1).

I also measured the operating margins on the clock power. I found that the clock power could be varied by ±25% without producing errors, limited by the output amplifiers on the low end. At the high end, an excessive clock amplitude causes all junctions to switch and generate output, regardless of input. The total latency through the circuit is one clock cycle. The results shown in Fig. 2.16(c) were obtained at a speed of 6 GHz, the upper frequency limit of the pattern generator. Testing at lower frequencies produced the same logical output, though margins on the clock power were larger. At lower clock frequencies the switching time of the junctions becomes a smaller fraction of the clock period, and the triangular peaks seen in Fig. 2.16 become rectangular patterns. However, the logical operation is the same.

Figure 2.16: Logic Test of Basic RQL Gates. (Panel (c) axes: voltage, 20 mV / div, versus time, 1 ns / div, for 6 Gb/s return-to-zero data.) (a) Block diagram of RQL logic test circuit. Two inputs, A and B, enter from the left. Signals are split and sent through five total clock phases, emerging on the right. One AndOr gate and one AnotB gate are shown in Phase 3. One AnotB gate is shown in Phase 4. Small blocks represent active interconnect JTL units. Four logic operations are synthesized in this circuit.
(b) Cold stage wiring diagram of (a). Two clock lines couple to the junctions in the circuit. Inputs on the left are made via two coupled lines which bias junctions. Clocked output amplifiers (different from JTLs) are shown on the right. (c) Oscilloscope output of (b) after room temperature low-noise amplifiers have increased the outputs to the millivolt scale. Amplifiers provide arbitrary amplification and invert the signal. Top two traces are inputs A and B; bottom four are logical outputs corresponding to the above inputs.

2.6.2 Clock Power Measurement

To better understand the power dissipation in RQL circuits, I tested a 200-bit shift register. This device had 1600 Josephson junctions and is named the Monrovia 20 Shift Register (M20SR). (The shift register is similar to Fig. 2.3 repeated 100 times in sequence. Figure 2.12 shows a layout of the chip.) In RQL circuits the AC power is delivered on 50 Ω microstrip clock lines that return to room temperature without termination on chip. The clock lines are inductively coupled to the circuit, so the Josephson junctions are effectively biased in series. This allows direct measurement of the amplitude of the clock for both active and inactive circuits.

To make power measurements easier, I designed the shift register with many junctions and high coupling between the clock line and junctions. This was necessary because RQL circuits operate with low AC power amplitude and small dissipation in the junctions; the junctions load the clock line only when they switch to the resistive state. With high coupling between the junctions and the clock line, more power will be drawn from the clock line than would regularly be the case. The loss of power will be reflected in a decreased amplitude of the returning clock signal. This setup, although it does not have logic gates, will show the change in amplitude clearly.
The clock signal attenuation and phase delay due to the RQL gates scale as the square of the coupling coefficient, $k^2$, and can be minimized by reducing coupling to the clock line and increasing AC clock power. In real RQL circuits, these parameters are chosen to allow at most 10% attenuation and less than 2 ps phase delay in a circuit with $10^6$ Josephson junctions [27].

Figure 2.17: Power schematic for RQL delay line: (a) detail schematic; (b) equivalent block diagram, where Z is the impedance of the RQL gate; (c) equivalent parallel circuit.

AC power losses in Nb microstrips are quite small, on the level of 1% loss per wavelength up to the gap frequency of 700 GHz [30, 31]. In a practical circuit with multiple parallel lines, AC losses can be an order of magnitude less than dynamic power dissipation in the gates. Use of microstrips, as opposed to coplanar waveguides, is essential for integration with digital circuits, since such circuits require multiple crossings and couplings to the gates. However, line impedance of microstrips in general favors sub-micron processes, currently only developed by two research groups [32, 33].

Figure 2.17 shows the equivalent circuit for a junction that is switching in an RQL circuit. Switching junctions change the voltage and currents on the clock line and also affect the impedance of the coupled clock lines. In a simple linear model, the junction acts as a perfect superconductor before it switches, and it behaves as a resistor after it switches. Thus, in the case of all digital "ones" we can treat the junctions as resistors, and the clock line time constant (speed) is simply

\[ \tau = \sqrt{L_C C_C} = 7.6~\text{fs}/\mu\text{m} \tag{2.1} \]

where I have used our clock line geometry with $L_C$ = 0.3 pH/µm and $C_C$ = 0.29 fF/µm. For digital "zero" the junction can be treated as an inductance and the clock line time constant becomes

\[ \tau' = \sqrt{L'_C C_C}, \quad \text{where} \quad L'_C = (1 - k^2)\,L_C + k^2 L_C\,\frac{L_g}{L_g + L_T}. \]
(2.2)

Here the magnetic coupling constant is $k = L_{\text{mutual}}/\sqrt{L_C L_T}$ and the inductance of the RQL gate attached to the bias inductor is $L_g$. For JTL elements in a shift register configuration, $L_{JJ1}$ and $L_1$ are in series, $L_{JJ2}$ and $L_2$ are in series, and one finds for the parallel combination of these two series inductors

\[ L_g = \frac{(L_{JJ1} + L_1)(L_{JJ2} + L_2)}{(L_{JJ1} + L_1) + (L_{JJ2} + L_2)} \tag{2.3} \]

where the junction critical inductances are $L_{JJ1} = \Phi_0/2\pi I_{c1}$ and $L_{JJ2} = \Phi_0/2\pi I_{c2}$. The effect of the coupling can be seen in (2.2).

I measured the accumulated clock delay through 1600 junctions on the return clock signal on the oscilloscope when data was input and compared it to the delay when no data was input. The measured delay was 1.4 ± 0.2 ps for the whole chip and was independent of frequency from DC to 6 GHz.

Figure 2.18 shows the results of taking the difference in output clock power between all "ones" and all "zeros" in the shift register. The measured dissipation is 1.35 times higher than that expected from simulations, but three times lower than the maximum switching power of $2 I_c \Phi_0$ energy dissipation per digital "one" with average $I_c$ = 170 µA. The power dissipation in RQL circuits with random data can be approximated as

\[ P = \tfrac{1}{3}\, I_c \Phi_0 N f, \tag{2.4} \]

where $N$ is the number of junctions in the circuit, $I_c$ is the weighted average critical current of the junctions, and $f$ is the operational clock frequency. The fraction of 1/3 is due to the behavior of SFQ pulses under AC biasing. Instead of switching at the critical current $I_c$, the junctions switch earlier in the clock phase at bias current $I_b < I_c$. At higher frequencies the junctions switch later in the clock phase, at larger bias currents, leading to a slight non-linearity in the relationship between power and frequency, as shown in Fig. 2.18.

Current Intel i7 processors demand 8 Amps at 12 V, or a power of about 96 W, for approximately 731 million transistors [34], or approximately 130 nW per transistor at approximately 3 GHz.
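As a worked check of the numbers in this section, the sketch below evaluates Eq. (2.4) and the per-transistor CMOS figure. Applying Eq. (2.4) to the 1600-junction shift register at 6 GHz is an illustrative assumption, not a reported measurement, and the function names are hypothetical. The value of the flux quantum is the standard physical constant.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum h/(2e) in webers


def rql_power(n_junctions, i_c, f_clock):
    """Eq. (2.4): approximate RQL power for random data, P = (1/3) Ic Phi0 N f."""
    return i_c * PHI_0 * n_junctions * f_clock / 3.0


def cmos_power_per_transistor(current_a, voltage_v, n_transistors):
    """Average supply power per transistor (total supply power / transistor count)."""
    return current_a * voltage_v / n_transistors


if __name__ == "__main__":
    # Assumed illustration: M20SR junction count and average Ic at a 6 GHz clock.
    p_rql = rql_power(n_junctions=1600, i_c=170e-6, f_clock=6e9)
    print(f"Eq. (2.4) total:        {p_rql * 1e6:.2f} uW")
    print(f"Eq. (2.4) per junction: {p_rql / 1600 * 1e9:.2f} nW")

    # Processor figures quoted in the text: 8 A at 12 V, 731 million transistors.
    p_cmos = cmos_power_per_transistor(8.0, 12.0, 731e6)
    print(f"CMOS per transistor:    {p_cmos * 1e9:.0f} nW")
```

The total from Eq. (2.4) falls on the microwatt scale of the vertical axis in Fig. 2.18, while the per-device figures for both technologies come out in nanowatts.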
RQL operates at approximately 0.5 nW per junction at twice the speed, as shown in Fig. 2.18. A direct comparison is not possible because CMOS CPUs by design do not utilize all transistors at once. Some architectures, however, minimize the execution time of instructions by utilizing as many transistors as possible, leading to a 6% increase in speed at the cost of a 16% increase in power consumption [35]. Even so, an estimate shows the scale of RQL junction power consumption. RQL logic operations require about four junctions total. Normalized to logic operation count and clock speed, RQL is still more than 100 times more power efficient than current CMOS technology.

Figure 2.18: Power Dissipation Measurements. Measured dissipation on both clock lines (Clock 1 and Clock 2) for M20SR, plotted as power dissipation (µW) versus clock rate (GHz). Filled circles and squares show measurements performed by comparing a continuous sequence of 0s versus 1s. Open circle and square show power dissipation for a pseudorandom sequence of 1s and 0s, which is approximately half the value found for the full 1s measurements at 6 GHz. The expected power dissipation based on RSFQ estimates is given by $2 n I_c \Phi_0 f$, shown as the straight line. Measured data is fit to a power law and indicates power dissipation approximately 1.35 times higher than in simulation ($1.35 \times P_{\text{num}}$). Slight increase of power rate with frequency in measured data is due to higher average biasing currents during switching for higher frequencies. $I_c$ = 170 µA.

2.6.3 Bit Error Rate

Another key performance metric in testing RQL gates is the bit error rate. Many mechanisms can lead to failure in a circuit. In many cases, a reduction of clock frequency will reduce the rate of errors. In other cases, the failure is independent of clock speed. For example, the flux bias creates symmetry between positive and negative SFQ pulses.
If the flux bias is too low, pulses fail to propagate and digital "one" reads as "zero." High flux bias has the opposite effect, with "zero" reading as "one." In general, excessive current causes junctions to switch even in the absence of an SFQ pulse. The AnotB gate is particularly sensitive because it is not clocked, and as a result its operating margin depends strongly on the flux bias.

I tested an AnotB gate (see Fig. 2.12) by observing the XOR output of circuit M20LT. At a clock speed of 6 GHz I monitored the XOR output while changing the flux bias near failure. A 32-bit input pattern from an Anritsu MP1763C was split and applied to the inputs with a 15-bit relative shift between A and B. The XOR output was compared to the correct pattern with an Anritsu MP1764C error detector. I could operate this setup for no more than 30 hours due to drift of the synchronization signal between the generating and measuring units. This set a lower bound on the measurable bit error rate of about $10^{-15}$.

Figure 2.19 shows the results I obtained from these measurements. The solid curves in Fig. 2.19 show fits of the measured error rate to a Gaussian distribution

\[ p = \frac{1}{4}\,\mathrm{erfc}\!\left(\frac{\pm (I - I_t)/20}{\sqrt{2}\,\sigma_I}\right) \tag{2.5} \]

with the two fit parameters $I_t$, the current threshold, and $\sigma_I$, the root-mean-squared noise current. The factor of 1/4 occurs because only "ones" create readout errors, and the factor of 1/20 represents the amount of coupling between the applied flux current and the resulting flux induced through the junction. For the low bias error (left curve), I found $I_t$ = 0.66 mA and $\sigma_I$ = 1.02 µA; for high bias (right), $I_t$ = 3.04 mA and $\sigma_I$ = 1.56 µA. No errors were detected for a period of 30 hours at a bias of about 2.1 mA. This included errors in the chip and the entire measuring apparatus.

Figure 2.19: Bit Error Rate for the AnotB gate from the M20LT circuit at 6 GHz as a function of flux bias $I_{\text{flux}}$, plotted as log BER versus flux bias on the AnotB gate (mA). Very broad operating margins even for a low BER of $10^{-44}$. Error bars on the lowest points correspond to counting statistics of 4 errors and 5 errors (left and right). No errors detected for a period of 30 hours, giving an error floor below $10^{-15}$. Data fit extrapolates to a BER of $10^{-480}$ at the optimal bias of 1.82 mA. Curves scaled for decreased size and power (labeled 4x and 10x reduced power).

At the extrapolated optimal bias point of 1.82 mA the expected bit error rate, based on these measurements, is $10^{-480}$. A bit error rate of $10^{-44}$ is considered the norm in CMOS. From the extrapolation, at a bit error rate of $10^{-44}$ the flux bias margins for a nominal flux bias of 1.75 mA will be 30% on either side. Our extrapolated optimal bit error rate is phenomenally smaller, though of course it is a very large extrapolation. Nevertheless, this suggests that this RQL circuit is performing well.

2.7 Summary

RQL uses positive and negative pairs of SFQ pulses to encode digital data and performs logic by routing the pulses. This makes logic operations both energy and size efficient, as well as fast. The AndOr and AnotB gates compose a universal set. The NDRO gate provides a form of memory. JTLs provide connections between gates. I found an extrapolated BER of $10^{-480}$ at 4.2 K on the output of a synthesized XOR gate, which is far below the minimum error rate I could measure, $10^{-15}$. Noise current scales as the square root of the Josephson critical current. This gives a negligible BER of $10^{-44}$ with operating margins of ±30% on flux bias.

Chapter 3

Combinational Gates

3.1 Introduction

The complexity of modern digital circuits necessitates the use of computer aided design. Computer aided design also allows for simpler ways to describe digital circuitry behavior than what would be found from detailed physical simulations, and yet still encompasses the full range of possible behavior.
One standard tool in common use is VHSIC Hardware Description Language, or VHDL for short. This standard language (one of only two commonly used by the semiconductor industry) is used to design nearly every CMOS digital circuit. By casting RQL into the formalism of VHDL, I can enable the great range of existing design tools and aids developed for CMOS to be applied to superconducting digital logic.

In the previous chapter, I focused on the behavior of individual junctions and gates, and described the qualitative timing requirements of RQL gates. In this chapter I provide a quantitative approach to timing in RQL circuits. First, I derive an analytic expression for the timing of a single junction. This analytic expression allows me to understand the self-correcting timing behavior in RQL and the upper frequency limit of operation. I then fit the analytic equation to simulation data to produce three parameters which describe the timing behavior of a gate at a certain frequency. Finally, I build VHDL models using the timing parameters and combinational logic.

3.2 Junction Switching Time under AC Bias Current

In this section I examine the switching of junctions in RQL circuits and find an analytic equation for the switching time. Using this equation, I show how the non-linear behavior of the junctions leads to stable, jitter-free pulse propagation. I also derive a failure condition to determine the maximum operational speed of RQL circuits. The testing of those models on real RQL circuits is described in Chapter 5.

3.2.1 Analytic Equation for Switching under AC Bias

The switching of junctions in RQL circuits depends on the clock frequency, clock amplitude, and junction $I_c R_N$ product. Here I extend the analysis of characteristic junction switching time for constant biasing to include the case of time-varying bias currents.
In particular, I will assume $I_b = A I_c \sin(\omega t)$, where $A$ is the maximum bias current in units of $I_c$, and find the average switching times for the range of bias conditions.

Under the assumption that the junction is overdamped (which the junctions in my RQL circuits, with $\beta \approx 1$, approach), the switching time $\tau$ of a Josephson junction with shunt resistor $R_N$ can be expressed as¹

\[ \tau = \frac{L_J}{R_N}. \tag{3.1} \]

¹ Note that in these RQL circuits, the junctions have been shunted by a resistor $R_S$ which is much smaller than the junctions' own resistance $R$. The junction resistance $R_N$ in the RSJ model is the parallel combination of $R_S$ and $R$, and because $R_S \ll R$, $R_S \approx R_N$.

For $I_b < I_c$ the junction inductance $L_J$ during switching can be approximated as [6]

\[ L_J = \frac{\Phi_0}{2\pi I_c}\,\frac{\phi}{I_b/I_c} = \frac{\Phi_0}{2\pi I_c}\,\frac{\arcsin(I_b/I_c)}{I_b/I_c}, \tag{3.2} \]

where $\phi(t)$ is the initial phase across the junction at time $t$, assuming it has not already switched. Combining (3.1) and (3.2), the switching time becomes

\[ \tau = \frac{L_J}{R_N} = \frac{\hbar}{2e\,I_c R_N}\,\frac{\phi}{I_b/I_c} = \frac{\Phi_0}{2\pi}\,\frac{1}{I_c R_N}\,\frac{\phi}{I_b/I_c} = t_0\,\frac{\phi}{I_b/I_c}, \tag{3.3} \]

where $t_0 = \Phi_0/(2\pi I_c R_N)$ is the characteristic switching time. The quantity $t_0$ depends on the fabrication process through the $I_c R_N$ product. Table 3.1 compares $I_c R_N$ and $t_0$ for different fabrication processes. Equation (3.3) can also be written as

\[ \tau\, A \sin(\omega t) = t_0\, \phi. \tag{3.4} \]

Equation (3.4) implies that the switching time depends on the time $t$ when AC power biases the junction. In order to account for this, we find the average static bias switching time between the time at the beginning of dynamic switching, $t_{in}$, and the time at the end of dynamic switching, $t_{out}$. This dynamic switching time $\bar{\tau} = t_{out} - t_{in}$ can be calculated as follows:

\[ \frac{1}{\bar{\tau}} \int_{t_{in}}^{t_{out}} dt\; \bar{\tau}\, A \sin(\omega t) = \frac{1}{\bar{\tau}} \int_{t_{in}}^{t_{out}} dt\; t_0\, \phi(t), \tag{3.5a} \]

\[ \int_{t_{in}}^{t_{out}} A \sin(\omega t)\, dt = \frac{1}{\bar{\tau}} \int_{t_{in}}^{t_{out}} t_0\, \phi(t)\, dt = t_0\, \bar{\phi}, \tag{3.5b} \]

where $\bar{\phi}$ is the time-averaged phase across the junction. The time-averaged value of the phase across the junction is $\bar{\phi} \approx 3$ rad to first order [4].
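The characteristic switching time defined in Eq. (3.3) is straightforward to evaluate. The sketch below computes $t_0 = \Phi_0/(2\pi I_c R_N)$ for the two $I_c R_N$ products listed in Table 3.1 (0.75 mV and 1.00 mV). The function name is hypothetical, and the flux quantum is the standard physical constant.

```python
import math

PHI_0 = 2.067833848e-15  # magnetic flux quantum h/(2e) in webers


def characteristic_time(ic_rn):
    """Characteristic switching time t0 = Phi0 / (2 pi Ic RN), with Ic*RN in volts."""
    return PHI_0 / (2.0 * math.pi * ic_rn)


if __name__ == "__main__":
    for label, ic_rn in [("Ic*RN = 0.75 mV", 0.75e-3), ("Ic*RN = 1.00 mV", 1.00e-3)]:
        t0 = characteristic_time(ic_rn)
        print(f"{label}: t0 = {t0 * 1e12:.2f} ps")
```

Rounded to two decimal places, the results agree with the $t_0$ row of Table 3.1 (0.44 ps and 0.33 ps).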
It is useful to express 85 the time in radians of the AC clock to make the results independent of the clock frequency. Accordingly, I define ? = ?t and thus d? = ?dt. I then solve for the output clock phase ?out = ?tout as a function of ?in = ?tin by noting that Eq. (3.5b) gives ? ?out ?in A sin(?)d? ? = 3 t0. (3.6a) I can then write: ? ?out ?in sin ? d? = cos ?in ? cos ?out = 3? t0 A , (3.6b) and thus ?out = arccos [ cos(?in)? 3? t0A ] . (3.6c) For overdamped junctions with Ib = Ic the switching time is approximately t = 3??1c = 3 t0 [4]. I now define a new quantity ? = 3? t0/A. (3.7) From Eq. (3.6c) we can see that this is the switching time of a DC-biased junction with bias current Ib = AIc, normalized to the clock period T = 2pi/?. The quantity ? represents the integral of a normalized voltage over a normalized time period. A constant bias current A would result in a junction switching over a normalized time period of ?out??in. As I will show later in this chapter, it turns out that ? is a good metric of circuit behavior. For example, for ? < 1 SFQ pulses in RQL circuits wait at phase boundaries, whereas for ? > 1 the pulses can be free-running through the circuit. 86 Table 3.1: Comparison of different Jc, IcRN and switching time t0 fab- rication technologies. Hypres Hypothetical Future Pro- cess Jc 4.5 10 kA/cm2 IcRN 0.75 1.00 mV t0 0.44 0.33 ps The amount of change the phase (normalized time) changes from beginning to the end of switching is ? = ?out ? ?in. Using Eq. (3.6c), the phase delay can be written as a function of input phase ?in ?(?in) = arccos (cos ?in ? ?)? ?in, ?in > 0. (3.8) Figure 3.1 shows this function for several different clock frequencies. There are a few interesting things that can be understood from this plot. Notice that as the frequency increases, so does the relative phase delay ? for any given input phase, and the curves end at lower and lower input phases. Notice also how the delay increases rapidly for large ?in. 
The implication is that pulses arriving at sufficiently large $\theta_{in}$ will not propagate. Also, note that none of the curves cross, meaning the timing behavior is uniquely determined. Finally, at low clock speeds the switching time is far shorter than the clock period and the change in bias current plays only a small role. That is, at low frequencies $\sigma \ll 1$ and the $\cos(\theta_{in})$ term in (3.8) dominates. At higher clock speeds the changing bias current affects the switching time much more, and in (3.8) the $\sigma$ term becomes important. As $\sigma \to 1$ the non-linearity becomes more pronounced.

[Figure 3.1: plot of $\delta(\theta)$ [rad/$\pi$] versus $\theta$ [rad/$\pi$], with curves for 40, 25, 15, 8, and 2 GHz.]

Figure 3.1: Junction phase delay versus starting junction phase $\theta$. $\delta(\theta)$ from Eq. (3.8) plotted for clock frequencies of 2, 8, 15, 25, and 40 GHz. Here $I_c R_N = 0.75$ mV and $A = 0.83$. $\delta(\theta)$ is essentially the switching time $\Delta\tau$ normalized to clock frequency, i.e. $\Delta\tau = \delta(\theta)/2\pi f$. As clock speeds increase, the delay becomes longer relative to the clock period and the timing window becomes smaller.

With the delay $\delta$ known, the switching time for a single junction can be found to be $\Delta\tau(t) = \delta(\omega t)/\omega$ for a pulse arriving at time $t$. The switching time for a series of junctions on the same phase is $\sum_i \Delta\tau(t_i)$, where $t_{i+1} = t_i + \Delta\tau(t_i)$.

3.2.2 RQL Timing Stability

The four-phase clock/power used in RQL plays a critical role in pulse propagation. Unlike in the DC-biased JTLs found in RSFQ [4], signal pulses are not free-running in the circuit, but are instead controlled by the phases of the clock. Here I show that the multi-phase clock provides a self-correcting timing mechanism for RQL gates. What this means is that small variations in gate delay are corrected by the gate on the next phase. According to Fig. 3.1, pulses that arrive early cause junctions to switch slower and the delay between junctions will be longer. Late pulses see a higher clock bias current and will switch faster.
Thus early pulses will arrive later at the next phase, while late pulses will be accelerated. After traveling through several clock phases, pulses achieve an equilibrium speed with zero accumulated jitter.

Figure 3.2 illustrates the behavior of a pulse propagating through a JTL with two junctions on each of four phases. The pulse arrives at JJ1 during the first phase at 6.8 ps, which is late in the clock cycle and about 5 ps after the junction could have switched. Because of this, the switching of JJ2, which is also on the A phase, occurs very late in the clock window. As a result, JJ2 has a long switching time, ending at 11.5 ps. JJ3 is the first junction on the second phase. It receives the SFQ pulse early in the B clock phase. The delay of the pulse is shorter and the pulse leaves earlier. The delay of the SFQ pulse on the C phase is approximately the same as on the second phase. The SFQ pulse leaves the C phase before it can begin propagating on the D phase. When the D phase reaches sufficient bias current at approximately 21 ps, the first junction (JJ7) on the D phase starts to switch. Because the switching of both junctions JJ7 and JJ8 on the D phase completes within a quarter clock cycle of the beginning of the clock window, all junctions on later phases will start to switch at the earliest possible time. Thus, a pulse which arrived late in phase A is at equilibrium by phase D.

From this example, we can see how the timing stability is enforced by the clock

[Figure 3.2: plot of bias current $I/I_c$ versus time $t$ [ps] for clock phases A, B, C, and D, with switching intervals marked for junctions JJ1 through JJ8.]

Figure 3.2: Self-correcting timing mechanism of RQL simulated for a clock frequency of 40 GHz and $I_c R_N = 0.75$ mV. Four sinusoidal clock phases are shown as a function of time. Eight junctions, two per phase, are represented as different hatched regions beneath their respective curves. The area associated with each junction is equal and defines the beginning and end of the switching process.
Each junction must switch sequentially. A through D show the earliest possible switching time for each junction, with the arrows indicating how long after this time switching actually starts. Arrows labeled JJ1 through JJ8 show the length of switching time. The delay is about 5 ps for phase A, while later delays decrease as the pulse travels through the JTL. Note that switching events near the peak of the bias current occur faster.

phase boundaries. Equation (3.8) gives the delay of an SFQ pulse as a function of the input phase. At a clock boundary, the delay can be expressed in terms of the input phase at one clock phase and the output phase to the next clock phase. Figure 3.3 shows a plot of $t_{out}$ (the actual output time relative to the beginning of the clock cycle, not the delay time $\Delta\tau$) versus the input phase. Point a in the figure is the stable timing point. Pulses arriving earlier are delayed relative to the leading edge of the sinusoid, whereas later pulses are accelerated. Notice, however, that pulses arriving late enough in the cycle will be slowed down. This leads to the stable timing window ending at point b in the figure. At this meta-stable timing point any small decrease in speed will cause greater delays and a slight increase in speed will cause smaller delays. Pulses that are so late that they arrive after point b will slow down until they reach point c, the timing window cut-off point, after which pulses fail to propagate.

Although it is not obvious from Fig. 3.3, which is drawn for $\sigma = 1.216$, for slow clock speeds ($\sigma < 1$) the stable timing point coincides with the origin and the metastable point does not exist. In this case, all pulses arriving before the cut-off point c are sped up.

An important advantage of the RQL timing scheme is that timing errors can be corrected by decreasing the clock frequency. This is not always true for RSFQ, where fixed hardware delays can cause failure at low or high frequencies [36].
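The self-correcting behavior described above can be checked numerically with a short sketch. The Python below is illustrative only and is not the code used for this work. It applies the single-junction delay of Eq. (3.8) to each junction on one clock phase in turn, then subtracts the quarter-cycle offset of the four-phase clock to obtain the input phase at the next clock phase. The function names, the numerical value of the flux quantum, and the choice to clamp early arrivals to a local phase of zero (a pulse waiting at the phase boundary) are assumptions made for this example; `sig` stands for the quantity defined in Eq. (3.7).

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb (assumed constant value)


def t0_from_icrn(icrn_volts):
    """Characteristic switching time t0 = Phi0 / (2 pi Ic RN), in seconds."""
    return PHI0 / (2.0 * math.pi * icrn_volts)


def sigma(f_hz, amplitude, icrn_volts):
    """Quantity of Eq. (3.7) for a single junction: 3 * omega * t0 / A."""
    omega = 2.0 * math.pi * f_hz
    return 3.0 * omega * t0_from_icrn(icrn_volts) / amplitude


def phase_delay(theta_in, sig):
    """Phase delay of Eq. (3.8); returns None if the junction cannot switch."""
    arg = math.cos(theta_in) - sig
    if arg < -1.0:
        return None
    return math.acos(arg) - theta_in


def next_input_phase(theta_in, sig, n_junctions):
    """Input phase at the next clock phase, or None if the pulse fails."""
    theta = theta_in
    for _ in range(n_junctions):
        delay = phase_delay(theta, sig)
        if delay is None:
            return None
        theta += delay
    # The next clock phase lags by a quarter cycle. Assumption: a pulse that
    # arrives before the next phase can switch waits at local phase zero.
    return max(theta - math.pi / 2.0, 0.0)


def iterate_phases(theta_in, sig, n_junctions, n_phases):
    """Track the local input phase over several consecutive clock phases."""
    history = [theta_in]
    theta = theta_in
    for _ in range(n_phases):
        theta = next_input_phase(theta, sig, n_junctions)
        history.append(theta)
        if theta is None:
            break
    return history


if __name__ == "__main__":
    # Parameters quoted for Fig. 3.3: f = 13.5 GHz, A = 0.77, N = 8.
    sig = sigma(13.5e9, 0.77, 0.75e-3)
    for start in (0.05, 1.0, 1.40):
        print(start, iterate_phases(start, sig, 8, 6))
```

Under these assumptions, input phases that start inside the stable window drift toward a single equilibrium phase, while an input phase that starts beyond the metastable point moves later on each clock phase until a junction fails to switch and the function returns `None`.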
[Figure 3.3: plot of next input phase [rad] (next input time [ps]) versus local input phase $\theta$ [rad] (local input time [ps]), with points a, b, and c marked and regions labeled "speed decrease" and "speed increase".]

Figure 3.3: Relationship between input time on consecutive phases. In this example for $f = 13.5$ GHz, $A = 0.77$, and $N = 8$, the input time at the next phase is shown as a function of input time at the previous phase by the red (solid) line. A green (dotted) line bisects the graph to show regions of speed increase and speed decrease. For points below the green line, pulses arrive earlier at the next phase than they did at the previous phase. Arrows on the red line indicate direction of change. This tends to move the input time to point a, where both input times are equal. Input times before point b tend to move toward a, whereas input times after b tend to move away from a and toward c, after which pulses cannot propagate.

3.2.3 Frequency Limit

The analytic timing model discussed in the previous two sections allows me to predict the frequency limits of operation for RQL circuits. As can be seen from Fig. 3.4, as the frequency increases the stable timing window gets smaller. Eventually the stable and meta-stable timing points converge, corresponding to the maximum operational frequency. At this limiting frequency, the minimum delay is equal to one quarter of the clock period.

Figure 3.4 shows the relationship between the timing window and the stable and meta-stable timing points. In the figure, the timing window is marked by the empty box on the left and solid box on the right. For $\sigma \le 1$ the stable timing window extends from $\theta = 0$ to $\theta = \theta_c$ and no stable or meta-stable timing points exist. All pulses within the timing window will move towards $t = 0$ while pulses outside the timing window will fail to propagate. For $\sigma$
$> 1$ the stable timing window extends from $\theta_s$, the input phase corresponding to the stable timing point, to $\theta_{ms}$, the input phase corresponding to the metastable timing point (that is, the clock phase $\theta$ of points a and b in Fig. 3.3, respectively). In Fig. 3.4, all stable and metastable timing points correspond to an output phase delay of $\delta = \pi/2$ because this represents a delay of one quarter clock cycle. Pulses arriving earlier than $\theta_s$ will be slowed down and move towards $t = \theta_s/\omega$. Pulses arriving later than $\theta_{ms}$ will be slowed down as well until they are delayed to the point they can no longer propagate.

Equation (3.8) was derived from the behavior of a single junction. We can extend the equation to cover multiple junctions in a clock phase by changing $\sigma \to N\sigma$, where $N$ is the number of junctions in a given phase.

[Figure 3.4: plot of $\delta(\theta)$ [rad/$\pi$] versus $\theta$ [rad/$\pi$], with curves for 1, 4, 8, 13.5, 16.2, and 17 GHz and a line labeled "equilibrium".]

Figure 3.4: Switching delay $\delta$ versus input phase for different clock frequencies. Similar to Figure 3.1, this figure shows (3.8) plotted for $I_c R_N = 0.75$ mV, $A = 0.83$, and $N = 8$. Open boxes show stable timing points, filled boxes show metastable timing points. Metastable timing points lie along the line $\delta = \pi - \theta$ until they reach a maximum of $\delta = \pi/2$. Both stable and metastable timing points then lie at the intersection of $\delta(\theta)$ and $\delta = \pi/2$. The limiting case for 17 GHz is shown as the top curve.

The fastest propagation is at the minimum of the timing equation. We find the input phase $\theta_{min}$ that gives the shortest delay by setting

$$\left.\frac{d\delta(\theta)}{d\theta}\right|_{\theta=\theta_{min}} = \frac{\sin\theta_{min}}{\sqrt{1 - (\cos\theta_{min} - \sigma)^2}} - 1 = 0, \tag{3.9}$$

and thus the input phase at minimum delay is

$$\theta_{min} = \arccos\frac{\sigma}{2}. \tag{3.10}$$

Substituting (3.10) into (3.8), I can write for the minimum delay condition:

$$\delta(\theta_{min}) = \arccos\left(\cos\theta_{min} - \sigma\right) - \theta_{min} = \frac{\pi}{2}, \tag{3.11a}$$

$$\arccos\left(\frac{\sigma}{2} - \sigma\right) - \arccos\left(\frac{\sigma}{2}\right) = \frac{\pi}{2}. \tag{3.11b}$$

Equation (3.11b) has a solution at $\sigma = \sqrt{2}$. Using the definition of $\sigma$
then gives the following relationship for the maximum expected possible frequency:

$$f_{max} = \frac{A}{3\sqrt{2}\,\pi N t_0}. \tag{3.12}$$

$f_{max}$ can be made large by choosing short SFQ pulses (i.e. small $t_0$, which can be obtained by choosing large $I_c R_N$) and a small number of junctions per phase $N$.

I can draw a number of important conclusions about timing in RQL circuits. First, at $f = f_{max}$ the metastable timing point is at $\theta_{min} = \pi/4$. Second, the value of $\sigma$ is limited to $0 < \sigma < 2$. For $\sigma < 1$, pulses are clock-limited, propagating through each clock phase and waiting at the phase boundary. For $1 < \sigma < \sqrt{2}$ pulses travel ballistically, traveling through several clock phases before reaching equilibrium. For $\sigma > \sqrt{2}$ pulses propagate slower than the clock signal, and for $\sigma > 2$ no pulses can propagate. Using the values for the Hypres process given in Table 3.1 and choosing $A = 0.83$ and $N = 8$, I estimate the maximum clock frequency for RQL circuits at about 17 GHz (see Fig. 3.4).

3.3 Timing Extraction from Simulation

WRSpice (footnote 2) uses the RSJ model of Josephson junctions. The capacitor behaves linearly. The resistor is non-linear and uses a piecewise linear model of the resistance, using one value of resistance $R_s$ for currents through the junction $I < I_c$, and another value of resistance $R_q$ for currents through the junction $I > I_c$. The junction itself is modeled following the two Josephson equations (1.34) and (1.35), additionally recording the phase of the junction.

Footnote 2: http://www.wrcad.com/manual/wrsmanual/wrsmanual.html

The time-domain analysis of the circuit is done in small time steps (footnote 3), identical to the solution method of the original open-source SPICE3 program [37] on which WRSpice is based. At each time step the circuit is represented as a sparse matrix $G$ relating the voltages at every node to the currents between nodes, where $GV = I$. Non-linear elements, such as Josephson junctions, are represented in differential or integral form (footnote 4).
SPICE solves the ordinary differential equation using the Newton-Raphson iterative method. When convergence is achieved the simulator moves on to the next time step.

Footnote 3: http://bwrc.eecs.berkeley.edu/classes/icbook/spice/
Footnote 4: http://www.nutwooduk.co.uk/pdf/Issue82.PDF#page=27

WRSpice takes ASCII-format netlist files. To perform my simulations, I first used the Cadence Virtuoso schematic editor to lay out the circuits shown in Fig. 3.7 using dummy values for the relevant circuit parameters (clock speed, input time). I generated a separate netlist for each gate (two for the AndOr gate, one for each of the two outputs). Then, using a Perl script to replace dummy values with real values, I had the simulations done and results analyzed automatically. (See Appendix B for details on the simulations.)

The analytic timing expression (3.8) can be applied to real gates. However, in deriving it I had to make some assumptions that were not necessarily well justified for RQL circuits. More accurate results can be obtained by full physical simulation of the junction behavior. The analytic timing model can then be fit to the simulation to obtain a relationship between input phase and output phase. To do this, in this section I first define a delay time in terms of measurable (physical) quantities. I then define a consistent path through a gate to which this delay applies. Finally, I discuss the physical simulation of gates in an appropriate circuit to generate results which I then fit using the analytic model.

The phase $\phi$ across a junction is the natural coordinate describing the switching of a junction. In the mechanical pendulum analog of a junction, $\phi$ is the angular position of the pendulum. In the washboard potential, if the particle is moving from one local minimum to the next, then at any time the phase is clearly on one or the other side of the potential barrier. I will assume that $\phi = 0$ initially and that after an SFQ pulse $\phi = 2\pi$. I then define $\phi_c = (1 - e^{-1}) \times$
$2\pi$ as the transition point. For the reciprocal pulse the transition point is $\phi'_c = 2\pi - \phi_c$. This choice of $\phi_c$ is somewhat arbitrary, so I leave $\phi_c$ as a variable and need to check that my particular choice does not impact critical results. (See Appendix B.)

Figure 3.5 shows the simulated behavior of the phase difference across two junctions in a JTL that are connected in series and switch on the same clock phase. From this plot, I define the points $t_{in}$ and $t_{out}$ as the times when the JJ1 and JJ2 junction phases, respectively, cross the value $\phi_c$. I then calculate the difference $\Delta t = t_{out} - t_{in}$ as the delay in the two-junction circuit. In normalized units of clock phase, I can define $\theta_{in} = \omega t_{in}$ and $\Delta\theta = \omega \Delta t$ for clock frequency $\omega$ and clock amplitude $A = I_b/I_c$. I performed multiple simulations for different values of clock frequency and clock amplitude to generate a set of timing data for a gate or logic operation.

[Figure 3.5: plot of junction phase $\phi$ [rad/($2\pi$)] versus time $t$ [ps], with $t_{in}$, $t_{out}$, and $\phi_c$ marked.]

Figure 3.5: Spice simulation of the phases of two sequential junctions in a JTL during switching. The phase $\phi$ of junction JJ1 (solid curve) and JJ2 (dashed curve) on the same clock phase is plotted as a function of time. The phase of JJ1 crosses $\phi_c$ at $t_{in}$, which marks the beginning of the switching time for JJ2. When the phase of JJ2 crosses $\phi_c$ at $t_{out}$ the switching time ends. The arrow in the figure indicates the length of the switching time.

The gate delay is defined as the time between arrival of the input pulse at the input junction of the gate and the arrival of the output pulse at the input junction of the next gate. Figure 3.6 shows this concept for the AndOr gate. A pulse arrives at junction b, causing it to switch. Later, junction c at the input of the next gate switches. For consistency I demand that the time of output at one gate is the same as the time of input at the following gate. That is, I define the data path to extend past the conceptual boundary of the logic gate and into the next circuit element
That is, I define the data path to extend past the conceptual boundary of the logic gate and into the next circuit element 98 b d ca ANDOR GATE Figure 3.6: Data path through AndOr gate. The AndOr and four JTLs, one each at each input and output, is shown schematically with four junctions of interest labeled a ? d. The junctions a and b, enclosed in the dashed box, which are physically part of the AndOr gate. This figure illustrates the data path through the gate for ?or.? The data path is shown in black; the inactive path is shown in grey. For ?or? the data path starts with junction b and ends with junction c, even though c is outside the physical boundaries of the AndOr gate. (see Fig. 3.6). 3.3.1 Fitting Simulation Results With the delay and data path defined, I proceeded to simulate the gate be- havior in WRSpice. Because the timing parameters should not depend on the test bench (the circuit schematic which is intended to be representative of any generic circuit), I must choose a test bench which generates representative simulation re- sults for junction switching behavior. One of the test circuits I used is shown in Figure 3.7(a). In the circuit, JTL stages are inserted in different numbers for differ- ent clock speeds. The number of JTL stages was scaled to the frequency to cause failure (non-switching) of one of the junctions in one of the last units. As the SFQ 99 c d Single JTL Phase 1 Phase 1 a b A B N JTLs, 2N JJs (single phase) (b) (a) Figure 3.7: Circuits used to extract RQL timing results from spice sim- ulations. Schematics of two circuits used for timing extraction simula- tions. Input is on the left and signals are terminated in the circuit by resistors to ground on right. (a) A series of JTLs on a single phase lead to a resistor to prevent reflection of the SFQ pulse. The total number N of JTLs in series was scaled to the frequency. High frequencies contained fewer JTLs, while lower frequencies contained as many as forty JTLs on a single phase. 
(b) Gate timing extraction with the AndOr gate as a representative example. Two inputs lead to two JTLs labeled a and b, which then output to a logic gate. The two gate outputs lead to two further JTLs, c and d. Output from JTLs c and d terminates in resistors. Other gates use a different combination of JTLs a, b, c, and d.

pulses pass each JTL stage in sequence, they generate one timing simulation point per stage.

Figure 3.7(b) shows the test circuit for the gates. I chose the AndOr gate as a representative gate because it has two inputs and two outputs. The basic simulation approach is the same for all gates although one must make some modifications for certain gates. For example, for the splitter, input JTL b is omitted. As another example, for the AnotB and Set-Reset gates, output JTL d is omitted. Each simulation provides one timing data point, and over multiple simulations I varied the input phase $\theta_{in}$ to give a spread of timing results (phase is a function of time) which I then analyzed. Details of the simulation routine I used are found in Appendix B.

Using a Perl script I analyzed the simulation output and recorded the time pairs at which successive junctions in a data path crossed the critical phase value $\phi_c$. Linear interpolation was used to estimate the time of crossing based on the immediately previous and following points. The delays $\delta_{fit}$ for a given input phase $\theta$ were then fit to:

$$\delta_{fit}(\theta) = \alpha_1 \alpha_3 \left[\arccos\left(\cos(\alpha_2\theta) - \sigma/\alpha_3\right) - \alpha_2\theta\right], \tag{3.13}$$

where $\alpha_1$, $\alpha_2$, $\alpha_3$ are fitting parameters and $\delta_{fit}$ is the delay in radians. This is a modified form of our expected timing (3.8), and this choice of fitting parameters removes correlation between them. Ideally, these parameters would all be unity.

Table 3.2 shows a portion of the full timing table I constructed from simulations of the JTL. The full table can be found in Appendix B. The first thing to notice about the table is that $\alpha_1$ and $\alpha_2$ are close to 1, as expected.
These two parameters simply scale the input phase and phase delay. $\alpha_3$ is close to 1 for higher frequencies, but is noticeably different from 1 for low frequencies. In fact, the general trend is that all $\alpha$ parameters get closer to 1 as the frequency is increased. $\alpha_3$ is a scaling parameter for the curvature and we expect it to be less close to 1 than the other parameters due to the varying bias conditions in real circuits. Because the timing is much less sensitive to the delay at low frequencies, the divergence from 1 for the $\alpha$ parameters at lower frequencies is of lesser consequence.

Table 3.2: Timing results for the JTL unit found by fitting simulation results to (3.13). The first column is the frequency. The $\alpha_i$ parameters are calculated using gnuplot. The end of the timing window $\theta_c$, and the first and last timing data points $\theta_{first}$ and $\theta_{last}$, are also given. Only a subset of the simulated frequencies is shown; the full table is available in Appendix B.

f [GHz]   $\alpha_1$   $\alpha_2$   $\alpha_3$   $\theta_c$   $\theta_{first}$   $\theta_{last}$
3.5       1.15         1.026        4.611        2.882        0.6598             2.469
4.5       1.084        1.04         15.49        2.916        0.6757             2.507
5.5       1.132        1.042        2.671        2.721        0.8015             2.438
8.5       1.065        0.9487       0.514        2.408        1.064              2.427
9         1.064        0.9584       0.5778       2.411        1.111              2.444
9.5       1.062        0.9614       0.61         2.404        1.159              2.436
10        1.11         0.9596       0.5932       2.334        1.247              2.363
14.5      1.013        0.984        0.9403       2.383        1.58               2.393
15        1.048        0.9892       0.9253       2.318        1.7                2.319
15.5      1.095        0.9883       0.868        2.238        1.818              2.247
17        1.002        0.9941       1.051        2.339        1.839              2.328

The additional information in the table, $\theta_c$, $\theta_{first}$, and $\theta_{last}$, tells us something about the timing window. As the frequency increases, $\theta_c$ has a very definite downward trend. In agreement with earlier predictions, as the frequency increases, the timing window becomes smaller. The range of simulated input phases is given by $\theta_{first}$ and $\theta_{last}$. This range decreases as the clock frequency increases, indicating that at higher frequencies, SFQ pulses are more likely to arrive closer to the peak clock amplitude.
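As an illustration of how a row of Table 3.2 can be used, the following Python sketch evaluates the fitted delay of Eq. (3.13) next to the nominal delay of Eq. (3.8). It is a sketch only and not the gnuplot or Perl tooling used for the fits. The function names are hypothetical, and the example value of $\sigma$ assumes $N = 2$ junctions, $A = 0.83$, and $t_0 = 0.44$ ps, none of which is stated in Table 3.2 itself.

```python
import math


def delay_analytic(theta, sig):
    """Nominal phase delay of Eq. (3.8); None outside the timing window."""
    arg = math.cos(theta) - sig
    if abs(arg) > 1.0:
        return None
    return math.acos(arg) - theta


def delay_fit(theta, sig, a1, a2, a3):
    """Fitted phase delay of Eq. (3.13) with parameters alpha_1..alpha_3."""
    arg = math.cos(a2 * theta) - sig / a3
    if abs(arg) > 1.0:
        return None
    return a1 * a3 * (math.acos(arg) - a2 * theta)


# Assumed example conditions: 17 GHz clock, N = 2, A = 0.83, t0 = 0.44 ps.
N_JUNCTIONS = 2
OMEGA = 2.0 * math.pi * 17e9
SIG_EXAMPLE = N_JUNCTIONS * 3.0 * OMEGA * 0.44e-12 / 0.83

# Fit parameters from the 17 GHz row of Table 3.2.
ALPHA_17GHZ = (1.002, 0.9941, 1.051)

if __name__ == "__main__":
    theta = 2.0  # input phase in rad, inside [theta_first, theta_last]
    print("nominal:", delay_analytic(theta, SIG_EXAMPLE))
    print("fitted: ", delay_fit(theta, SIG_EXAMPLE, *ALPHA_17GHZ))
```

With all three parameters set to 1, `delay_fit` reduces to `delay_analytic`, which is the sense in which the $\alpha_i$ would ideally be unity.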
In fact, the simulations at low frequencies do not always fit well to (3.13). For example, in Table 3.2 the value of $\alpha_3$ is often very different from 1. As another example, Fig. 3.9(b) shows the simulation results of the AnotB gate at 1 GHz, and one sees that the points do not neatly fall along a curve of the form of (3.13). To better capture the behavior of gates at speeds where (3.13) does not accurately match the simulation results, I also fit the simulation results between the first recorded input phase $\theta_{first}$ and the last recorded input phase $\theta_{last}$ to a piecewise polynomial equation

$$p_{fit}(\theta) = \underbrace{\left((\alpha_{11}\theta + \alpha_{12})\theta + \alpha_{13}\right)}_{\text{polynomial}}\,\Theta(\theta - w) + \underbrace{\left((\alpha_{21}\theta + \alpha_{22})\theta + \alpha_{23}\right)}_{\text{polynomial}}\,\Theta(w - \theta), \tag{3.14}$$

where $\Theta(x)$ is the step function defined by $\Theta(x) = 1$ for $x > 0$ and $\Theta(x) = 0$ for $x < 0$, $w = (\theta_{first} + \theta_{last})/2$ is the phase midway between the first and last recorded input phases, and the $\alpha_{ij}$ are fitting parameters, unrelated to the $\alpha_i$ parameters in (3.13).

Figure 3.8 shows the simulated delay points for the JTL at a frequency of 16 GHz and clock amplitude $A = 0.83$. The fit to (3.13) is shown by the green dashed curve. We can see that it is similar to the prediction from (3.8) using the nominal circuit parameters (red solid curve), which corresponds to $\alpha_1 = \alpha_2 = \alpha_3 = 1$. More importantly, we see that the points are close to the fit. In Fig. 3.8, I also show the polynomial fit as a blue dashed curve. The polynomial fit parameter values are recorded in Table B.2 (see Appendix B, page 229). The polynomial fit is only applicable for $\theta_{first} < \theta < \theta_{last}$. Outside this range the fit to (3.13) gives better results.

If (3.8) is correct, then ideally all the $\alpha_i$ parameters should be close to unity. Clearly, this is not always the case. Also, in certain cases the fit to (3.13) gives large errors, especially at higher frequencies where fewer data points are available and at very low frequencies where the simulation behavior indicates a stepwise timing function.
In such cases, the piecewise polynomial fit produces better results.

[Figure 3.8: plot of output phase delay [rad] (output time delay [ps]) versus input phase [rad] (input time [ps]) for $f = 16$ GHz, $A = 0.83$, $N = 2$, with curves labeled Nominal, Fit, and Polyfit. Fit parameters shown on the figure: a1 = 1.04914, a2 = 0.981119, a3 = 0.896482.]

Figure 3.8: Fit of delay equation to simulated switching times for $f = 16$ GHz, $A = 0.83$, $N = 2$. Purple data points come from simulation and analysis of the circuit shown in Figure 3.7(a). The nominal behavior (red curve) matches the data closely. The fit to the analytic equation (green dashed curve) is close to the nominal behavior. The fit parameters $\alpha_1$, $\alpha_2$, and $\alpha_3$ are shown on the figure. The fit to the piecewise polynomial function fits well within the region where data is available but diverges strongly from the other fits outside this region.

Logic gates that are not directly biased by the AC clock signal will not necessarily have a timing response that is similar to clock-powered JTLs. In contrast to a JTL, one should expect these logic gates to exhibit behavior that diverges from the analytic timing equation (3.13). Figure 3.9 shows results for the AndOr gate outputting an OR-pulse at $f = 12$ GHz (Fig. 3.9(a)) and the AnotB gate generating output at $f = 1$ GHz (Fig. 3.9(b)). For the AndOr gate the analytic equation (green dashed curve) works surprisingly well even for an unpowered junction. The fitting parameters $\alpha_1$, $\alpha_2$, $\alpha_3$ show some divergence from nominal values of one, although for practical purposes the fit is excellent.

Figure 3.9(b) shows an example of poor fitting, for the AnotB gate at 1 GHz, where the behavior departs strongly from (3.13) and the analytic fit (green dashed curve) shows significant errors. As can be seen, the timing response is more like a step function than the best fit to the analytic timing equation.
It is worth noting, however, that the median switching time is close to the value predicted by the nominal case, and the data points lie fairly close to the red curve over the region of interest. A piecewise polynomial fit better represents the timing behavior in this case.

At low frequencies, early pulses will get stuck at a logic gate, as there is very little current leaking into the gate from either adjacent JTL unit (see Chapter 2). The pulse must wait until the local bias current is sufficiently high, at which time the switching event follows almost immediately. This behavior can be seen in the figure as a steady increase in the phase delay for shorter input delays.

As a practical matter in circuit design, errors for low frequencies are of little importance. For $\sigma \ll 1$ the latency is much less than the clock period and pulses will not fail due to timing issues. For frequencies of interest, where $\sigma \approx 1$, the fits work well.

Using Table 3.2 I can generate plots of the fitted equation $\delta_{fit}(\theta)$ for different values of frequency. For example, Fig. 3.10 shows timing curves for different clock frequencies that required different $\alpha$ parameters. As in the purely analytic case, the trend is toward longer phase delays for higher frequencies and the curves do not cross, as one would expect for real physical processes.

[Figure 3.9: two panels plotting output phase delay [rad] (output time delay [ps]) versus input phase [rad] (input time [ps]), each with curves labeled Nominal, Fit, and Polyfit. Panel (a): $f = 12$ GHz, $A = 0.83$, $N = 1$, with a1 = 1.22437, a2 = 0.913881, a3 = 0.251574. Panel (b): $f = 1$ GHz, $A = 0.83$, $N = 1$, with a1 = 1.38126, a2 = 0.254457, a3 = 7.54544.]

Figure 3.9: Simulated delay versus input phase.
(a) Fits of (3.13) and (3.14) to simulated output delays versus input phase $\theta$. AndOr gate at $f = 12$ GHz, $A = 0.83$, $N = 1$. The points are noticeably different from the nominal case (red curve) given by (3.13). The value of $\alpha_3$ is noticeably different from the nominal value of 1. Nevertheless (3.13) and (3.14) still match the simulation points well. (b) Simulated delay versus input phase for the AnotB gate at $f = 1$ GHz, $A = 0.83$, $N = 1$. The best fit (green) of (3.13) does not match the simulation points. However, the fit to the polynomial (3.14) (blue) matches the points well.

[Figure 3.10: plot of output delay [rad] versus input phase [rad], with curves for 1, 2.5, 3.5, 6, 10, 10.5, and 13 GHz.]

Figure 3.10: Comparison of Extracted Timing Curves. The fitted phase delay (3.13) is plotted using appropriate values for $\alpha_1$, $\alpha_2$, and $\alpha_3$ for seven different frequencies from 1 GHz to 13.5 GHz. These curves have the same properties as the curves shown in Figure 3.1: the curves do not cross, low frequency curves are nearly flat, as the frequency increases, so does the output phase delay, and the endpoints of the curves move inward as the frequency increases.

It would be impractical to tabulate fitting parameters for all frequencies. Instead, I used linear interpolation in the frequency range between simulated frequencies. Figure 3.11 shows that linear interpolation between 10 GHz and 15 GHz fits well to the simulated 13 GHz data. The interpolated curves for both the analytic and piecewise polynomial fits match the data very well. The only drawback of interpolation is that it is limited by the end of the timing curve for the higher frequency.

[Figure 3.11(a): "Interpolation between Analytic Timing Equation Fits"; phase delay $\delta_{fit}(\theta)$ [rad] versus input phase $\theta$ [rad] for 15 GHz, 10 GHz, and 13 GHz.]
[Figure 3.11(b): "Interpolation between Polynomial Fits"; phase delay $p_{fit}(\theta)$ [rad] versus input phase $\theta$ [rad] for 15 GHz, 10 GHz, and 13 GHz.]

Figure 3.11: Simulated timing data for the JTL at 13 GHz (red solid dots) and plots of (3.13) and (3.14) for the JTL at 10 GHz and 15 GHz (green dashed curves). Both interpolations work well except at the limits of the region of interest, where the circuit is near failure already. The curve passing through the data points (red solid curve) is not a fit of (3.13) or (3.14) to the data but a linear interpolation of the functions to 13 GHz. Interpolation works well except at the end of the window at higher frequencies.

3.4 VHDL Models for RQL Gates

In this section I explain an RQL gate model that I implemented in VHDL (footnote 5), a widely-used standard for timing design in the semiconductor industry. The models describe logic functions, timing behavior, and failure mechanisms. With these VHDL models, I can use semiconductor timing design techniques.

VHDL uses multi-valued signals (called a type in VHDL) and determines the times at which transitions between the allowed values occur. I used the existing std_ulogic type (footnote 6), a common CMOS type which I found was appropriate for use with RQL circuits.

3.4.1 Behavior of VHDL Models

The VHDL models of RQL circuits that I built start from a model of the AC clock. The AC sinusoid is partitioned into three equal parts as shown in Figure 3.12. "High" and "Low" are above half maximum or below negative half maximum, respectively, and otherwise the clock is "Off." A positive pulse arriving during High will over-bias the junction and generate a new pulse; likewise for Low and a negative pulse. Insufficient bias during Off means pulses do not propagate and wait for the next High or Low region. This combination of "High," "Low," and "Off" sectioning gives a model for the clock signal identical to the std_ulogic model for CMOS in VHDL.

To simulate the behavior of RQL circuits and gates I developed a VHDL simulation package.
The VHDL model contains the pulse-based logic of RQL, calculates the delay from the analytic equation, and tracks energy dissipation, switching events, and approximate circuit size. The VHDL model is scalable for different fabrication processes. Table 3.3 shows the basic elements of the VHDL model.

⁵ IEEE 1076-2008: VHSIC Hardware Description Language
⁶ IEEE 1164-1993: IEEE Standard Multivalue Logic System for VHDL Model Interoperability

[Figure 3.12 plot: bias current [I/I_c] versus time [ps] and clock phase [units of π/6 rad], with the regions HIGH, OFF, LOW, OFF, HIGH marked.]

Figure 3.12: Timing model for RQL clock. The AC clock is partitioned into three logical segments. While the sine of the clock phase has magnitude below 1/2 the clock is "Off," during which pulse propagation is forbidden because of low bias. Likewise, "High" and "Low" are defined for the sine of the clock phase above 1/2 and below −1/2, respectively, and only allow positive or negative pulses to propagate.

Global parameters include I_cR_N, the junction energy scale set by the fabrication process. The clock frequency f is the chosen operating frequency, identical for all JTL units in the same circuit. A is the clock amplitude, a quantity important for the timing behavior. β_c is the Stewart-McCumber parameter, which can be chosen for a particular design and effectively changes the I_cR_N product. J_total is the total junction count of the chip, which serves as a metric for the area of the circuit. N_switch is a running counter in the simulation which keeps track of the total number of junction switching events and provides a metric for the energy dissipation.

Table 3.3: Global VHDL Quantities. The interaction between gates and the overall behavior of the circuit is governed by certain global inputs and signals. Global elements are inputs which allow the code to scale to any process, as well as record operation metrics such as power. Signals specify how the output of one gate becomes the input of the next, as well as carry clock phase information.

Global Elements
  I_cR_N      Fabrication energy scale
  f           Clock frequency
  A           Clock amplitude
  β_c         Damping scale factor
  J_total     Total junction count (area metric)
  N_switch    Total number of junction switches (power metric)
Signals
  1           SFQ unit of positive flux (during switching)
  0           SFQ unit of negative flux (during switching)
  H           SFQ unit of positive flux (residual)
  L           SFQ unit of negative flux (residual)
Clock
  H           Clock above +0.5 I_b/I_c
  W           Clock between +0.5 I_b/I_c and −0.5 I_b/I_c
  L           Clock below −0.5 I_b/I_c

For the signals between gates, the values are 1 for a stored positive pulse and H for a transition to the 1 state, and 0 for a stored negative pulse and L for a transition to the 0 state. H and L allow easy visualization of the switching process. The clock signal can take on three values, H, W, and L, corresponding to the segments shown in Figure 3.12.

3.4.2 VHDL Combinational Gates

RQL circuits behave as state machines when the input and output are viewed as voltages. SFQ pulses travel from junction to junction and magnetic flux is stored inside the gates. However, from a higher-level view, RQL gates function as combinational logic gates if one considers the input and output of the gates to be phase. In the RQL data encoding scheme, the phase of a junction is normally approximately zero, but switches to approximately 2π after a positive SFQ has been generated (see Fig. 2.1). The reciprocal pulse changes the phase from 2π back to zero about half a clock cycle later. Because every positive pulse is followed by a negative one, the history of pulses in RQL is equivalent to the history of phases on a junction. The behavior of RQL logic gates is combinational. That is, the output of a gate depends on the input phases only, not the history of inputs as in state machines.
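The three-valued clock signal of Figure 3.12 and Table 3.3 can be illustrated with a short Python sketch. It is not the VHDL package itself; the function names `clock_region` and `region_counts` are hypothetical, and the clock is assumed to be a unit-amplitude sinusoid so that the thresholds sit at ±1/2.

```python
import math

def clock_region(phase: float) -> str:
    """Classify a clock phase (radians) as 'H' (High), 'L' (Low), or 'W' (Off)."""
    s = math.sin(phase)  # assumed unit-amplitude clock
    if s > 0.5:
        return "H"       # only positive pulses may propagate
    if s < -0.5:
        return "L"       # only negative pulses may propagate
    return "W"           # bias too low, pulses wait

def region_counts(samples: int = 3600) -> dict:
    """Count how many evenly spaced phases of one clock period fall in each region."""
    counts = {"H": 0, "W": 0, "L": 0}
    for k in range(samples):
        phase = 2.0 * math.pi * (k + 0.5) / samples  # midpoint sampling avoids the thresholds
        counts[clock_region(phase)] += 1
    return counts

if __name__ == "__main__":
    print(region_counts())
```

Sampling one period this way is a quick check that the High, Low, and Off regions each occupy one third of the cycle, which is the sense in which the sinusoid is partitioned into three equal parts.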
Figure 3.13 shows two inputs and the four outputs of the three fundamental RQL logic gates described in this Chapter. The two inputs are shown in blue; the outputs are shown in green. Errors are shown in red. For the OR output, the phase is high whenever the phase of either A or B is high. For the AND output, the phase is high only when the phases of both A and B are high. Both of these gates behave almost identically to the logic gates found in CMOS. The phase of the AnotB gate is high when A is high and B is low, but not when A is high and B is also high. For these three outputs, the output phase is always low if both inputs are low. I wrote a special part of the VHDL code for the AnotB gate, which checks if B transitions from B=0 to B=1 while A=1, and will generate an error in this case. The truth table for the AndOr and AnotB gate is shown in Table 3.4. This error condition is a result of the underlying pulse-based behavior of the junctions. Though it is not part of the combinational logic model, the VHDL models still check for this condition.

[Figure 3.13 plot: phase versus time traces for inputs A and B and for the AND, OR, "A and not B", and "Set (A) / Reset (B)" outputs; a "logic error!" marker appears on one output trace.]

Figure 3.13: Combinational logic of RQL gates. Phases at inputs A and B are shown as functions of time (blue curves). The output phases of the AndOr, AnotB, and Set/Reset gate (where A is used as the Set input and B as the Reset input) are shown by the green curves. This behavior is similar to the behavior of CMOS gates using voltages as inputs. The exception is that because the AnotB gate has a timing requirement on the order of the A and B inputs, a transition of B from 0 to 1 while A is 1 is flagged as a logic error in the VHDL code. This is shown by the red section of the AnotB output line. The AndOr and Set/Reset gates have no inherent timing restrictions and never generate errors.

Table 3.4: Truth table for AndOr and AnotB in VHDL. A and B are input phases to the AndOr and AnotB gates.
And, Or, and AnotB in the table refer to the output phases of these gates. 0 is low phase, 1 is high phase. This is analogous to CMOS voltages. Note that this table does not capture one element of the behavior of the RQL AnotB gate: if a transition occurs from A = 1, B = 0 to A = 1, B = 1 for the AnotB gate, the model reports an error.

  Input      Output
  A   B      And   Or   AnotB
  0   0      0     0    0
  1   0      0     1    1
  0   1      0     1    0
  1   1      1     1    0

The Set/Reset gate has fundamentally different behavior than the AndOr or AnotB gate. It behaves as a memory element. Much like in CMOS, a memory element can be constructed from combinational gates [38] with a bi-stable output. In terms of combinational logic, the Set/Reset gate is described as having three inputs: Set, Reset, and Output. By feeding the output of the gate back into itself, two bi-stable states are found. Table 3.5 shows the truth table for the Set/Reset gate. The output is stable at 0 so long as the Set input phase is 0 or the Reset input phase is 1. If the output is 0, the Set input phase is 1, and the Reset input phase is 0, the output will switch to 1. As can be seen in the table, the state Q=1, S=1, R=0 is a stable state. The output will remain at 1 until the Reset input phase is 1 while the Set input phase is 0. This changes the output to 0, and the state Q=0, S=0, R=1 is again a stable state. Because the Set/Reset gate has no timing requirements, it never generates an error in the model.

The RQL logic gates generate output following the timing behavior described in Section 3.3. Figure 3.14 shows the main part of the code for the AndOr VHDL gate. Note the keyword "after" indicating a delay in output.

Figure 3.14: AndOr gate VHDL code. A snippet of VHDL code taken from the AndOr description in VHDL. Note the four cases and the function nu calculating the output delay.

3.4.3 JTL in Combinational Logic

The Josephson transmission line in RQL can be treated as a combinational logic gate, with two caveats.
One, much like the Set/Reset gate, the output must also be considered an input. Unlike similar CMOS elements, this connection between output and input is completely conceptual. Two, unlike logic gates which have two SFQ inputs and where the output is a function of two possible phases on each gate, the JTL has one input which is a phase and one which is a clock signal. The clock signal has three possible values (L, W, and H, see Fig. 3.15), not two. Furthermore, while the phase of a junction is well-defined, especially in RQL circuits, the clock signal is a sinusoid without the discrete levels given to the phase by the flux quantum. Because the goal here is to efficiently model the behavior of real RQL gates, I will nevertheless proceed to describe the JTL as a combinational logic element.

Table 3.5: Truth table for Set/Reset in VHDL. Q is the current output. Q′ is the resulting output. Set and Reset are the input phases at the respective inputs. 0 indicates a low phase, 1 indicates a high phase. The states Q=0, S=1, R=0 and Q=1, S=0, R=1 are unstable and will produce transitions in output.

  Current Output   Input          Resulting Output
  Q                Set   Reset    Q′                 Notes
  0                0     0        0                  stable state
  0                0     1        0                  stable state
  0                1     0        1                  causes a transition
  0                1     1        0                  stable state
  1                0     0        1                  stable state
  1                0     1        0                  causes a transition
  1                1     0        1                  stable state
  1                1     1        1                  stable state

Figure 3.15 shows an example of the combinational behavior of the JTL. The input is shown on top and takes two values as a function of time, 0 and 1, where 0 indicates low phase and 1 indicates high phase. The clock can take on three values, L, W, and H. These correspond to the values described in Table 3.3. They are shown here in cyan (H), gold (W), and magenta (L) for clarity. The output, like the input, takes on the values 0 and 1. The input is blue and the output is green under normal operating conditions, while the output turns red to indicate that an error has occurred.

[Figure 3.15 plot: input phase (0/1), clock (H/W/L), and output phase (0/1) versus time, with an "Error!" marker.]

Figure 3.15: Combinational behavior of the JTL in VHDL. Input is shown in blue on top as a function of time, taking on values 0 (low phase) and 1 (high phase). The clock signal takes on three values as a function of time, L, W, and H, corresponding to the definitions given in Table 3.3. For visual aid, the clock signal is color-coded in cyan (H), gold (W), and magenta (L). The output is shown in green on the bottom, and takes the values 0 and 1 like the input. An error is shown on the far right when the clock changes from W to L while the input is high and the output is low.

Although RQL gates can be described by combinational logic, the underlying SFQ pulse-based logic imposes some restrictions on the behavior of the gates, much like in the case of the AnotB gate. In the example shown in Fig. 3.15, the clock first changes from W to H, then the input changes to 1. This causes the output to change to 1 as well. The clock changes to W and then L. Only when the input changes from 1 to 0 does the output change. So far, the output has simply mirrored the input. Next, the input changes from 0 to 1 while the clock is W. The output changes to 1 only once the clock reaches H. Similarly, the output only changes from 1 to 0 once both the input is 0 and the clock is L. When a change in input from 0 to 1 occurs while the clock is L, there is no change in output. Similarly, when the input changes from 1 to 0 while the clock is W, there is no change in output. The only error occurs when the input is 1, the output is 0, and the clock changes from W to L. This is outside the description of the behavioral logic. The VHDL model separately checks for this condition (or, similarly, for a change in the clock from W to H while the input is 0 and the output is 1).

Table 3.6: Truth table for a JTL in VHDL. Q is the present output. Q′ is the resulting output. Input (A) is the input phase and Clock (C) is the clock signal. 0 indicates a low phase, 1 indicates a high phase. L indicates low clock, W indicates off clock, and H indicates high clock. The states Q = 0, A = 1, C = H and Q = 1, A = 0, C = L are unstable and will produce transitions in output.

  Present Output   Input            Resulting Output
  Q                Input   Clock    Q′                 Notes
  0                0       L        0                  stable state
  0                0       W        0                  stable state
  0                0       H        0                  stable state
  0                1       L        0                  stable state
  0                1       W        0                  stable state
  0                1       H        1                  causes transition
  1                0       L        0                  causes transition
  1                0       W        1                  stable state
  1                0       H        1                  stable state
  1                1       L        1                  stable state
  1                1       W        1                  stable state
  1                1       H        1                  stable state

The example inputs shown in Fig. 3.15 are not exhaustive. Table 3.6 gives the full truth table for the JTL "gate". Just like for the Set/Reset gate, the present output is considered as an input as well. The JTL output is stable for all but two cases: when Q = 0, A = 1, C = H; or Q = 1, A = 0, C = L. While the output is low, only the condition Q = 0, A = 1, C = H will change the output to Q = 1. Once this has occurred, only the condition Q = 1, A = 0, C = L will change the output back. Not captured in the table is the error detection. I wrote separate code in the VHDL model to check for timing violation conditions, which occur for the transitions (Q = 0, A = 1, C = W) → (Q = 0, A = 1, C = L) and (Q = 1, A = 0, C = W) → (Q = 1, A = 0, C = H). Finally, I note that the figures here show the transitions occurring instantly. In the full VHDL model, these transitions occur only after a delay given by (3.13), using the fitting parameters discussed in Section 3.3.

3.4.4 Summary of RQL Gates in VHDL

RQL gates with input phases behave very similarly to CMOS gates with voltage inputs, as can be seen from Fig. 3.13. This allows them to be used in a similar fashion, and their behavior can be analyzed using existing software design tools intended for CMOS. However, RQL gates are still subject to certain constraints and the timing behavior is much different from CMOS. Nevertheless, this behavior can also be captured in VHDL as shown in this Chapter.
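The truth tables of Tables 3.4 to 3.6, together with the two separately checked error conditions, can be condensed into a short Python sketch. This is only an illustration of the combinational rules, not the VHDL package: all function names are hypothetical, phases are represented as the integers 0 and 1, clock values as the strings "L", "W", and "H", and the output delay of (3.13) is left out.

```python
def and_or(a: int, b: int) -> tuple:
    """Return the (And, Or) output phases for input phases a and b (Table 3.4)."""
    return (a & b, a | b)

def a_not_b(a: int, b: int) -> int:
    """Output phase of the AnotB gate: high only for A = 1, B = 0 (Table 3.4)."""
    return a & (1 - b)

def a_not_b_logic_error(a: int, b_prev: int, b_now: int) -> bool:
    """Logic error: B goes from 0 to 1 while A is 1."""
    return a == 1 and b_prev == 0 and b_now == 1

def set_reset(q: int, s: int, r: int) -> int:
    """Resulting output Q' of the Set/Reset gate (Table 3.5)."""
    if q == 0 and s == 1 and r == 0:
        return 1          # set
    if q == 1 and s == 0 and r == 1:
        return 0          # reset
    return q              # every other state is stable

def jtl(q: int, a: int, clock: str) -> int:
    """Resulting output Q' of the JTL 'gate' (Table 3.6)."""
    if q == 0 and a == 1 and clock == "H":
        return 1          # positive pulse propagates during High
    if q == 1 and a == 0 and clock == "L":
        return 0          # negative pulse propagates during Low
    return q              # all other combinations are stable

def jtl_timing_violation(q: int, a: int, clock_prev: str, clock_now: str) -> bool:
    """Timing violation: a pending pulse meets the wrong clock polarity."""
    return (q, a, clock_prev, clock_now) in {(0, 1, "W", "L"), (1, 0, "W", "H")}

if __name__ == "__main__":
    for q in (0, 1):
        for a in (0, 1):
            for c in ("L", "W", "H"):
                print(q, a, c, "->", jtl(q, a, c))
```

Writing the memory-like elements as functions of the present output Q mirrors the way the text treats the output as an additional, purely conceptual input.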
The AnotB gate in particular has certain timing requirements due to the detailed behavior of the gate. It is the only gate that generates logic errors. Finally, JTL units follow combinational logic rules as well, with the caveat that the clock signal carries three values which do not correspond to any particular physical quantity. The JTL is the only "gate" which generates timing errors.

Chapter 4

Power Network Design

4.1 Introduction

In this Chapter, I discuss the design of the network I developed for powering my RQL circuits. In RQL circuits the power splitter must accomplish several tasks. First, the splitter must step down the impedance from 50 Ω at the pads to 32 Ω, the impedance of the clock lines coupled to transformers within the circuit. Second, it must not only split the power evenly, but also recombine the power in the clock lines to be taken off chip. Third, although it must function over a broad frequency range, the amount of space it uses on the chip needs to remain small. Fourth, in order for an RQL circuit to work properly, the distribution of currents between splitters and combiners must remain within 10% of nominal values within the frequency range. Finally, the splitter must also maintain these properties when the electrical lengths of individual lines between the splitter and combiner are changed by loading or fabrication.

I first describe a general design for an eight-way Wilkinson power splitter and discuss three possible responses: geometric, equal ripple, and maximum flat. I then describe the design of two power splitters that I used for testing RQL circuits. I next compare results from testing both designs. Finally, I describe a second experiment using a maximum flat Wilkinson power splitter to power an RQL circuit containing shift registers.

[Figure 4.1 schematics: (a) single-stage splitter with Port 1, Port 2, and Port 3, port impedances Z_0, two λ/4 lines of impedance √2 Z_0, and resistor R = 2Z_0; (b) N-stage splitter with line impedances Z_N, Z_{N−1}, ..., Z_1, resistors R_N, R_{N−1}, ..., R_1, and port impedances Z_in and Z_out.]

Figure 4.1: Wilkinson Power Splitter. (a) The traditional Wilkinson power splitter: one stage with equal impedances at all ports and quarter-wavelength transmission lines. At the design frequency, it splits power evenly and without losses between the output ports (on right) while completely isolating the output ports from each other. (b) The general Wilkinson power splitter with N stages, 2N transmission lines, and N resistors. It has a higher bandwidth than the one-stage Wilkinson power splitter. Note also that input and output impedances need not be equal and we allow any electrical length θ so long as all transmission lines are equal in length.

Figure 4.1 shows the general layout of a Wilkinson power splitter using quarter-wavelength transmission lines and resistors. Figure 4.1(a) shows the most basic Wilkinson power splitter design with equal impedances on the input (left) and output (right). The quarter-wave segment is central to the operation of the power splitter. For a transmission line of length l the input impedance of the line is

\[ Z_{in}(l) = Z_0 \frac{Z_L + i Z_0 \tan \beta l}{Z_0 + i Z_L \tan \beta l}, \tag{4.1} \]

where Z_0 is the characteristic impedance of the transmission line, Z_L is the load impedance, and β = 2π/λ is the wavenumber. For quarter-wave segments, this gives Z_in = Z_0²/Z_L. In the splitter of Fig. 4.1(a) each quarter-wave line must transform the port impedance Z_0 into 2Z_0, so that the two branches in parallel present Z_0 at the input; this requires a line impedance of √2 Z_0. Using this value of the impedance for the quarter-wave segments, the impedance Z_R seen at the input can be calculated as the combination of one quarter-wave segment in parallel with the resistor and the other quarter-wave segment in series, as follows:

\[ \frac{1}{Z_R} = \frac{1}{Z_0}\left[\frac{1}{\sqrt{2}} + \frac{1}{2+\sqrt{2}}\right] = \frac{1}{Z_0}\,\frac{(2+\sqrt{2})+\sqrt{2}}{\sqrt{2}\,(2+\sqrt{2})} = \frac{1}{Z_0}\,\frac{2+2\sqrt{2}}{2\sqrt{2}+2} = \frac{1}{Z_0}. \tag{4.2} \]

This gives the effective impedance of the power splitter as Z_0, a perfect impedance match. Figure 4.1(b) shows a generic layout for a 2-way Wilkinson power splitter with impedance Z_in on the input and impedance Z_out on the output, with N stages. Note that the numbering of resistors and quarter-wave transmission lines starts at the output.
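The quarter-wave relations used for the single-stage splitter can be checked numerically. The Python sketch below evaluates (4.1) in an algebraically equivalent cosine/sine form (so that the tangent does not blow up at a quarter wavelength) and then checks the bracket in (4.2). The helper name `line_input_impedance` and the port impedance of 50 Ω are illustrative assumptions.

```python
import math

def line_input_impedance(z_line: float, z_load: float, electrical_length: float) -> complex:
    """Input impedance of a lossless line, Eq. (4.1) multiplied through by cos(beta*l)."""
    c = math.cos(electrical_length)
    s = math.sin(electrical_length)
    return z_line * (z_load * c + 1j * z_line * s) / (z_line * c + 1j * z_load * s)

Z0 = 50.0                      # assumed port impedance in ohms
QUARTER_WAVE = math.pi / 2.0   # beta * l for l = lambda / 4

# A quarter-wave line of impedance sqrt(2)*Z0 turns a Z0 load into 2*Z0 ...
z_branch = line_input_impedance(math.sqrt(2.0) * Z0, Z0, QUARTER_WAVE)
# ... so two identical branches in parallel present Z0 at the input port.
z_input = z_branch / 2.0

# Bracket of Eq. (4.2): 1/sqrt(2) + 1/(2 + sqrt(2)) should equal 1.
bracket = 1.0 / math.sqrt(2.0) + 1.0 / (2.0 + math.sqrt(2.0))

if __name__ == "__main__":
    print(abs(z_branch), abs(z_input), bracket)
```

Both routes lead to an input impedance of Z_0, which is the impedance match stated after (4.2).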
A more thorough description of the Wilkinson power splitter can be found in many textbooks on electrical engineering, such as [39].

4.2 Circuit Design

I designed a Wilkinson power network in two steps. As I discuss below, I first minimized the input port reflections in the "even" mode and then maximized the output port isolation in the "odd" mode. Both steps are important for power networks in digital circuits, since reflection at the input port of the power combiner causes standing waves in the power lines, and the corresponding nonuniform distribution of the current produces potential spikes at the anti-nodes. Odd mode analysis applies to the situation where the clock lines have different electrical lengths due to the different topology of the lines or different loading by the gates.

To proceed, I considered a generic Wilkinson power splitter design (see Fig. 4.2). The configuration of a Wilkinson power splitter is specified by giving the number of power splitter segments¹ in successive sections. I used a one, two, two, and one, or 1221, configuration. In general, a splitter configuration with M segments is designated by a_0a_1...a_M. The total number of quarter-wavelength stages is $\sum_{m=0}^{M} 2^m a_m$ and the number of resistors is $\sum_{m=1}^{M} 2^{m-1} a_m$. These are important design metrics for system integration. These configurations only describe the layout of the power splitter, not the values of the impedances Z_m or resistances R_m. In the next section, I introduce three different methodologies for determining the impedances of the stages.

¹ A note on terminology: In microwave engineering, the term "stage" is common for the quarter-wavelength transmission lines in a WPS. As such, we need a different word to describe groups of these stages into hierarchical units. As both "stage" and "segment" are similar in both sound and meaning, I wish to explicitly note the difference.

4.2.1 The Even Mode

Figure 4.3 shows the decomposition of a generic Wilkinson power splitter into the "even" and "odd" modes. The full response of the circuit to any input at any port can be found by superposition of the circuits shown in Fig. 4.3(b,c) [39]. In the even mode, two equal voltages +V are applied to the output ports. Because the potential across each resistor connecting the upper and lower segments (see Fig. 4.3(a)) is zero due to symmetry, the resistors can be removed. The resulting circuit is shown in Fig. 4.3(b) and has the layout of a generic quarter-wave impedance matching filter. Because the two halves of the circuit behave identically, they are symmetric or even. This is the primary mode of operation of the power splitter I designed; the odd mode shown in Fig. 4.3(c) will be of interest later.

[Figure 4.2 schematics: the eight-way splitter with Port 0 (input, Z_in), Ports 1 to 8 (outputs, Z_out), line impedances Z_1 to Z_6, and resistors R_1 to R_5; and the even-mode equivalent chain Z_out, Z_1, 2Z_2, 2Z_3, 4Z_4, 4Z_5, 8Z_6, 8Z_in.]

Figure 4.2: Schematic of the six-stage, eight-way Wilkinson power splitter (1221 configuration). Input on the left from a line with impedance Z_in. Output on the right to a line with impedance Z_out. (a) Stage 1 is a single-stage Wilkinson. Stages 2 to 5 are each part of two two-stage Wilkinson power splitters. Stage 6 is an impedance matching stage. All elements on a vertical line share the same design values. (b) Even-mode analysis schematic of the schematic in (a). Starting from the output, each Wilkinson stage increases the input impedance by a factor of two. The eight-way Wilkinson in even mode is equivalent to a six-stage impedance matching filter with impedances 8Z_6, 4Z_5, 4Z_4, 2Z_3, 2Z_2, and 1Z_1.

[Figure 4.3 schematics: (a) full splitter with symmetry line, resistor R, and voltages +V and ±V at the output ports; (b) even-mode half circuit with input impedance 2Z_in and +V applied; (c) odd-mode half circuit with resistor R/2 to the zero-potential line and −V applied.]

Figure 4.3: Even and Odd mode analysis of the Wilkinson Power Splitter. The Wilkinson power splitter shows a perfect bilateral symmetry, which aids the analysis. Notation mostly removed for clarity. (a) Symmetry line separating the Wilkinson power splitter into two electrically identical halves. We apply a voltage V at port 2 and ±V at port 3. (b) Even mode. +V applied to port 3. Because of symmetry, the potential across each resistor is zero and no current flows through them. The input impedance is doubled, as the two inputs are in parallel. The halves now appear virtually identical to an impedance-matching filter. (c) Odd mode. −V applied to port 3. By symmetry a zero potential must exist between the halves, effectively grounding the middle of each resistor and electrically separating the two halves. The representative resistor R now has value R/2 for each half in odd mode. No signals propagate through in odd mode.

In the even mode the Wilkinson behaves like an impedance matching filter between impedances Z_in and Z_out. The Wilkinson power splitter is normally analyzed with equal impedances on both input and output. However, the even- and odd-mode analysis can be generalized to the case of unequal impedances [39]. The response of the equivalent impedance matching filter is determined by the choice of impedances in each stage of the Wilkinson power splitter. I consider three responses. First, the geometric response is of the form Γ(θ) ∝ cos(Nθ) + cos((N − 2)θ) + ... and has a constant ratio of impedances, making it easy to design in physical layout. Second, the maximum flat response has the form Γ(θ) ∝ (1 − e^{−iθ})^N and fulfills the condition d^nΓ/dθ^n (π/2) = 0 for n = 0, 1, ..., N − 1. Third, the equal ripple response has the form Γ(θ) ∝ T_n(cos θ), where T_n is the Chebyshev polynomial of the first kind. This gives the broadest bandwidth for a given maximum reflection coefficient within the bandwidth.

The detailed derivation of filter impedances for the different cases can be found in Appendix C. I find [39] for the geometric response:

\[ Z_{n+1} = Z_n \sqrt[N+1]{2^M \frac{Z_{in}}{Z_{out}}}, \tag{4.3} \]

for the maximum flat response:

\[ Z_{n+1} = Z_n \exp\!\left(2^{-N} K^N_n \ln\!\left(2^M \frac{Z_{in}}{Z_{out}}\right)\right), \tag{4.4} \]

where the binomial coefficient is $K^N_n = \frac{N!}{(N-n)!\,n!}$, and for the equal ripple response:

\[ Z_{n+1} = Z_n \frac{1 + \Gamma_n}{1 - \Gamma_n}, \tag{4.5} \]

where $N = \sum a_m$ is the total depth of the Wilkinson power splitter and Γ_j is proportional to the jth expansion coefficient of the Chebyshev polynomial (see Appendix B). For N = 6 I find the impedance values shown in Table 4.1.

Table 4.1: Impedance values for the Wilkinson splitter stages for different configurations with N = 6 stages. All values are in Ohms. Values given in order matching Fig. 4.2. Final design values correspond to Fig. 4.10 and are given here for comparison. Fractional bandwidth given for -13 dB. Values for the final design are constrained by fabrication limitations, unlike for the three other responses.

                  Z6      Z5      Z4      Z3      Z2      Z1      Bandwidth
  Maximum Flat    30.88   50.98   31.31   32.70   20.09   33.06   0.9
  Equal Ripple    28.19   42.88   28.70   36.47   24.41   37.14   1.2
  Geometric       23.78   35.33   26.25   39.01   28.98   43.07   0
  Final Design    47.94   37.24   20.72   19.36   21.44   33.36   0.73

Figure 4.4 shows the even mode reflection parameters² for these three configurations as well as the final design which I used. The geometric response has the largest input mismatch and this gives the highest reflection at the design frequency and across the bandwidth. The equal ripple response has better matching and will have lower overall reflections and a higher bandwidth. The maximum flat response has a smaller bandwidth than the equal ripple response, but has the lowest overall reflection. At the target frequency the variation is lowest for the maximum flat response. I discuss my final design in Section 4.2.5.

The 1221 Wilkinson is not the only configuration for a power splitter. I compare two other configurations, the 4440 configuration and the 2220 configuration power splitter. Figure 4.5 shows the 4440 Wilkinson splitter. (Resistors have been omitted for clarity.) This kind of design lends itself well to the geometric response, as each segment has identical impedances in each of the quarter-wave stages.
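The geometric rule (4.3) is simple enough to evaluate directly. The Python sketch below does so for the 1221 layout under two assumptions that are not stated alongside Table 4.1: that the tabulated geometric values were computed for Z_in = Z_out = 32 Ω, and that the physical line impedances follow from the even-mode chain by dividing by the branch multipliers 1, 2, 2, 4, 4, 8 read off Fig. 4.2(b). The function name `geometric_stage_impedances` is hypothetical.

```python
def geometric_stage_impedances(z_in: float, z_out: float,
                               branch_factors: list, ways: int) -> list:
    """Line impedances Z_1..Z_N for the geometric response, Eq. (4.3).

    branch_factors[n-1] is the number of parallel branches at stage n, so the
    even-mode impedance of stage n is branch_factors[n-1] * Z_n.
    ways is the total split, 2**M (8 for an eight-way splitter).
    """
    n_stages = len(branch_factors)
    ratio = (ways * z_in / z_out) ** (1.0 / (n_stages + 1))  # constant step ratio
    impedances = []
    even_mode = z_out                  # start the chain at the output line
    for factor in branch_factors:
        even_mode *= ratio             # next even-mode impedance in the chain
        impedances.append(even_mode / factor)
    return impedances

if __name__ == "__main__":
    # Assumed: 32-ohm lines on both sides, 1221 layout of Fig. 4.2.
    z = geometric_stage_impedances(32.0, 32.0, [1, 2, 2, 4, 4, 8], ways=8)
    for n, value in enumerate(z, start=1):
        print(f"Z{n} = {value:.2f} ohm")
```

Under these assumptions the result agrees with the Geometric row of Table 4.1 to the two decimal places given there, and the step ratio evaluates to about 1.35.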
Figure 4.6 shows a similar configuration with only two stages per segment.

² Unless noted otherwise, all S-parameters are given as S_ij = 10 log10(V_i/V_j).

[Figure 4.4 plot: S11 on a linear scale and in dB versus normalized frequency f/f_c, for the maximum flat, equal ripple, and geometric designs.]

Figure 4.4: Wilkinson 1221 Simulated Reflection Parameters for the maximum flat design (red solid), equal ripple design (green dotted), and geometric design (blue dashed), plotted as a function of normalized frequency in linear and log space. The circuit is shown in Fig. 4.2 and the circuit parameters correspond to the values given in Tables 4.1 and 4.2. The line drawn at -13 dB (5%) is for reference. The geometric design has high reflections over the entire design range. The equal ripple design has less than -13 dB reflection over the broadest frequency range (by design), as shown by the bottom arrow. The maximum flat response has a smaller range (shown by the top arrow) for which the reflection is below -13 dB but has the smallest variation of reflections within the design range.

Figure 4.7 shows the even mode reflection parameters for the splitter design shown in Fig. 4.5. (Compare with Fig. 4.4.) Notice that the six-stage geometric series response has high reflection parameters. The six-stage device is similar to the 12-stage design, though each of the branches with four quarter-wave segments becomes a branch with two quarter-wave segments, turning it into a 2220 configuration splitter. Comparing the 12-stage geometric series response and the six-stage maximum flat response, the reflection parameters are similar even though the latter has half the number of stages.

[Figure 4.5 schematic: three successive segments, each with four quarter-wave lines labeled Z_4, Z_3, Z_2, Z_1.]

Figure 4.5: Circuit schematic for the Wilkinson 4440 configuration for the initial prototype power splitter using a geometric response. Resistors were not used in this design; testing was only done in even mode. Designed for input and output impedance of 32 Ω. The ratio of impedances for each transmission line stage to the next was 2^{−1/4}, such that Z_1/(½ Z_4) = Z_4/Z_3.

[Figure 4.6 schematics: (a) even-mode chain Z_out, 2Z_1, 2Z_2, 4Z_3, 4Z_4, 8Z_5, 8Z_6, 8Z_in; (b) splitter with Port 0 (input, Z_in), Ports 1 to 8 (outputs, Z_out), line impedances Z_1 to Z_6, and resistors R_1 to R_4.]

Figure 4.6: Circuit schematic for WPS2220. Similar to the 4440 design.

[Figure 4.7 plot: S12 [dB] versus f [GHz], f_c = 10 GHz; geometric series (12 stage and 6 stage) and max flat response.]

Figure 4.7: Geometric versus max flat power splitter reflections. Simulated reflection parameters of the 1:8 Wilkinson power splitter in even mode using actual impedance values. The maximum flat response for a 6-stage deep splitter (red curve, 2220 configuration) is compared to the geometric response for 6 and 12 (4440 configuration) stages (dotted and solid green curves, respectively).

4.2.2 The Odd Mode

The second half of the analysis of a Wilkinson power splitter involves applying a voltage +V to one output port and −V to the other output port, as shown in Fig. 4.3(c). In this case, the circuit has a zero potential between the top and bottom halves, making the two halves appear opposite or odd to each other. Proper odd mode analysis [39] depends on specific impedance values. We can maintain a mirror symmetry in the odd mode by treating each segment as an individual power splitter with ±V at each input.

For N = 1 the problem is trivial; for N = 2 we follow the method of Cohn³, which for N = 2 and a fractional bandwidth of 1 gives [40]

\[ R_2 = \frac{2 Z_1 Z_2}{\sqrt{(Z_1 + Z_2) Z_2}}, \tag{4.6a} \]

\[ R_1 = \frac{2 R_2 (Z_1 + Z_2)}{R_2 (Z_1 + Z_2) - 2 Z_2}, \tag{4.6b} \]

where R_2 and R_1 are the values of the resistors in Fig. 4.1(b) and Z_2 and Z_1 are the impedances of the quarter-wave transmission lines shown in Fig. 4.1(b). I can apply (4.6a) and (4.6b) to the results calculated from (4.3), (4.4), and (4.5) for the circuit shown in Fig. 4.2. The results are shown in Table 4.2.

³ Cohn's original result is $R_2 = 2 Z_1 Z_2 / \sqrt{(Z_1 + Z_2)(Z_2 - Z_1 \cot^2 \phi_3)}$. In Cohn's method [40] φ_3 is the fractional bandwidth in units of 2π. For better comparison I simplify the equations for a fractional bandwidth of 1, for which cot φ_3 = 0. Fine tuning of the circuit occurs later in the design process and this simplification does not impact final results.

Table 4.2: Resistance values in the Wilkinson power splitter for different responses. Values in Ohms. Resistors R2 and R4 tend to have very small values compared to the rest of the resistors. Final design values are given for reference and correspond to Fig. 4.10, which omits resistors R4 and R2. The final design is for a 3111 configuration; all others are for a 1221 configuration.

                  R5      R4     R3      R2     R1
  Maximum Flat    49.29   2.05   31.62   2.08   64
  Equal Ripple    44.43   2.06   37.39   2.07   64
  Geometric       39.77   2.06   43.90   2.05   64
  Final Design    23.56   n/a    20.62   n/a    58.62

The odd mode analysis normally completes the design of a power splitter. RQL circuits place additional requirements on the design. So far I have only considered a generic 1221 power splitter. I will now also consider several alternative configurations, including a 4440, 2220, and 3111 configuration Wilkinson power splitter.

Figure 4.8: Simulated Port Isolation for Geometric 8:1 Wilkinson Power Splitter. Simulated port isolation of the 8:1 Wilkinson combiner between Port 0 and Port 8 for the geometric series and maximum flat responses. Note that the maximum flat responses will be different for the two designs on account of the different configurations, 3111 vs 2220.

4.2.3 Isolation

Isolation between the input and output parts of a Wilkinson splitter port is achieved by placing resistors in between divided power branches. No current flows through these resistors in the even mode.
Resistor values are chosen to null reflections between ports in the odd mode. The number of resistors and their values are selected to minimize reflections between ports at maximum bandwidth [40]. Figure 4.8 shows S-parameters for refection (S88) and throughput (S80) for the 2220 Wilkinson divider for the worst case, i.e. when power is applied to one of the output ports with the rest of the output ports and the input port matched and 133 Zout 8 7 6 5 4 3 2 1 0 WPS ZoutZin Figure 4.9: Isolation parameter measurement. Power is applied to port 8 of the Wilkinson power splitter while all other ports are terminated in matched loads. terminated (see Fig. 4.9). One can see that the reflection S88 from port 8, where power is applied, is similar for both responses. Figure 4.8 shows the S-parameters for a geometric divider similar to that shown in Fig. 4.5 but with only two quarter- wave segments per stage. The impedance values of this circuit were chosen to give a geometric series response and a maximum flat response, using actual impedance values possible in fabrication instead of the ideal calculated values. The choice of even mode response has little effect on the isolation behavior of the splitter. As for the even mode analysis, it is worthwhile to consider a new design of the Wilkinson power splitter. Figure 4.10 shows a schematic of the 3111 configuration Wilkinson power splitter. This design has fewer resistors and quarter-wave stages than similar 2220 or 1221 configurations. Figure 4.11 shows the S-parameters of the 3111 divider shown in Fig. 4.10(a), using actual impedance values. The circuits were simulated using AC analysis in the WRspice simulator [41] for the central frequency of 10 GHz, chosen as a design 134 (a) Z1 8Zin Zout 8Z5 8Z4 4Z3 2Z28Z6 Zout Zin Port 0 (b) Port 2 Port 3 Port 1 Port 4 Port 5 Port 6 Port 7 Port 8 Z1 Z2 Z3 Z4Z5Z6 R1 R2 R3 Figure 4.10: Circuit schematic for N23PS. 
(a) Schematic of the final design for N23PS, a six-stage, eight-way Wilkinson power splitter (3111 configuration), similar to Fig. 4.2. The input line on the left has impedance Zin. The output line on the right has impedance Zout. Stages 1-3 are single-stage Wilkinsons. Stages 4-6 are impedance matching stages. All elements on a vertical line share the same design values. Note that resistor R3 in this schematic is equivalent to R4 in Fig. 4.2. Resistors R3 and R5 from Fig. 4.2 have been eliminated in the final design. (b) Even-mode analysis schematic of the circuit in (a). Starting from the output, each Wilkinson stage increases the input impedance by a factor of two. The eight-way Wilkinson in even mode is equivalent to a quarter-wave transformer with segment impedances 8Z6, 8Z5, 8Z4, 4Z3, 2Z2, and 1Z1.

Figure 4.11: Wilkinson 3111 Simulated S-Parameters. [Axes: reflection coefficient (dB) vs. frequency (GHz); curves S08, S78, S58, S18.] Simulated S-parameters of the Wilkinson 3111 maximum flat response power splitter, which has a design frequency of 7.5 GHz. All S-parameters are given as 10 log(V/V0). S80 (red) is the ratio of the voltage at the "input" divided by the voltage applied to output port 8. S88 (light blue) is the reflection off the output port where the voltage is applied. S87 (green) is the throughput to the adjacent port. S85 (dark blue) and S81 (purple) represent ports two and three branches away on the Wilkinson, respectively. Within the design frequency range the throughput to Port 0 is about -5 dB while reflection remains below -15 dB. Throughput to other ports remains under -10 dB.

compromise between available real estate and cryoprobe limitations. The design rules limit the widths of microstrips to certain values, and thus limit the impedances to certain values. This design was optimized for a 1:8 Wilkinson transformer with a minimum required depth N = 6 of λ/4 segments.
Impedances of the λ/4 segments were calculated by analogy with a quarter-wave transformer of the same depth with all branches taken in parallel, using (4.3) or (4.4) as appropriate. In the geometric series response, the ratio between two adjacent sections was held constant at 1.34. Note that in the geometric series response design, the impedances repeat at each stage, and this makes the design scalable to an arbitrary number of power divisions. In contrast, parameters in the maximum flat response have to be recalculated for each particular case, as in (4.4) above.

The geometric series and maximum flat response designs involve opposite trade-offs in reflection and bandwidth. As can be seen from the S-parameters shown in Fig. 4.7, the geometric series needs double the number of stages that the maximum flat response design requires to achieve a comparably small level of reflections. In this case, the reflection S88 at the design frequency is less than -30 dB. The geometric response design with N = 12 shows similar behavior to the maximum flat response design with N = 6.

Figure 4.12: Block diagram for measuring standing currents. Two Wilkinson power splitters are connected through eight microwave transmission lines, each with a nominal length of l = 90 ps. The bottom six transmission lines connect the two power splitters. The second transmission line has a variable length Δl. The top transmission line is simulated as eleven shorter transmission lines in series, each with a length of l = 8.18 ps. In simulation, the currents between the short transmission lines can be recorded. These currents are plotted in Figs. 4.13 and 4.14.
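The current amplitudes recorded at the nodes of Fig. 4.12 can be related to a reflection level with textbook transmission-line theory. The sketch below is an illustration only and is not the WRspice netlist of Appendix C. It assumes a single lossless line terminated in a load with reflection coefficient gamma, for which the normalized current amplitude at an electrical delay tau from the load is |1 - gamma exp(-j 2 ω tau)|. The function names and the sample value gamma = 0.1 are hypothetical choices made for this example.

```python
import cmath
import math


def current_envelope(gamma, delays_ps, freq_ghz):
    """Return |I|/|I+| at points located delays_ps (one-way delay) from the load.

    Assumes one lossless line and a single reflection coefficient gamma at
    the load; the real network has eight coupled lines, so this is only a
    simplified picture.
    """
    omega = 2.0 * math.pi * freq_ghz * 1e9
    envelope = []
    for tau_ps in delays_ps:
        round_trip_phase = 2.0 * omega * tau_ps * 1e-12
        envelope.append(abs(1.0 - gamma * cmath.exp(-1j * round_trip_phase)))
    return envelope


def reflection_db(gamma):
    """Reflection level in dB for a voltage reflection coefficient gamma."""
    return 20.0 * math.log10(abs(gamma))


# Ten interior nodes of a 90 ps line cut into eleven equal segments (8.18 ps each).
node_delays_ps = [90.0 * k / 11.0 for k in range(1, 11)]

# Hypothetical reflection coefficient of 0.1, evaluated at 10 GHz.
env = current_envelope(0.1, node_delays_ps, 10.0)
```

Under these assumptions the envelope is bounded by 1 - |gamma| and 1 + |gamma|, so a reflection coefficient of 0.1 (a reflection level of -20 dB) corresponds to at most a ±10% variation in current amplitude along the line.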
4.2.4 Current Distribution

The main consideration in choosing one power splitter design over another is the requirement for current uniformity in the clock power lines. I analyzed the current uniformity by embedding relatively long τ = 90 ps (l = 9 mm) clock power lines between two dividers (one of which was used as a combiner) and monitoring the current profile at 10 equally spaced points 9 ps apart (see Fig. 4.12).

Figure 4.12 shows a block diagram of the setup used to simulate standing waves in the transmission lines. Two Wilkinson power splitters are connected by eight transmission lines in total. Six are regular transmission lines with electrical length l = 90 ps on ports 3-8. Port 2 is connected to a transmission line with length Δl, which I vary in simulation to induce odd mode behavior in the Wilkinson power splitters. Port 1 is connected to a series of 11 shorter transmission lines with length l = 8.18 ps. Though this transmission line is of the same overall length, in the simulation I can record the currents at the nodes between the shorter transmission line segments. (See Appendix C, Section C.3 on page 268 for the netlist used to generate this data.)

Figure 4.13: Simulated standing wave currents at ten locations inside clock lines between power splitters in the 4440 configuration, as shown in Fig. 4.16. Nominal and 10% of bias current are indicated by straight lines. Graphs offset for clarity. Imbalance refers to extra electrical length of only one line.

Figure 4.13 shows the results from standing wave analysis for the case when the electrical length of one power line is either 0 ps or 40 ps longer than the others. In this simulation, I used the 2220 splitter configuration, as shown in Fig. 4.6. For comparison, Fig. 4.14 shows the results from the standing wave analysis for the case when the electrical length of one power line is either balanced or 10% longer than the others for the 3111 design shown in Fig.
4.10 (only for the maximum flat response). The former case corresponds to a 44% imbalance in electrical length, which far exceeds the expected worst case in a practical circuit. As described in Chapter 2 (pg. 75) the delay is less than 2 ps for 10^6 junctions. In that experiment, the coupling between clock lines and junctions was greater by a factor of three than in regular RQL circuits. It would take 10^7 Josephson junctions to accumulate this amount of phase delay due to dynamic switching [27]. Junction switching therefore does not contribute to an imbalance in the power splitters. In Nb microstrips with a propagation speed of about 100 µm/ps, it will take 4 mm to accumulate 40 ps of delay. This is a large distance compared to the scale of circuit elements, and therefore small variations in the clock length through a chip will also not contribute greatly to imbalances in the power splitters.

Figure 4.13 shows that the expected distribution of the bias current is significantly different between the geometric series and maximum flat response for the 2220 splitter. The maximum flat response is designed to have minimum variation within the bandwidth, and this property carries over to the bias current distributions. Figure 4.14 shows that the distribution of the bias current is significantly different between the 2220 layout (see Fig. 4.6) and the 3111 layout (see Fig. 4.10). The geometric series with 6 stages does not satisfy the requirement of ±10% variation in bias current (see Fig. 4.13). In contrast, the maximum flat response for the 3111 design gives an octave of bandwidth (5-15 GHz) with no more than ±10% variation in bias current (see Fig. 4.14). Between 8.5-11.1 GHz the 3111 design gives ±1% variation. This indicates that the maximum flat response performs much better for our figure of merit of current uniformity. The 3111 design does even better between about 7-13.5 GHz with ±1% variation. The 3111 design also saves space by matching between 50 Ω
and 32 Ω, which the 2220 design does not.

4.2.5 Final Design

To test the capabilities of Wilkinson power splitters I had two designs fabricated, which I will denote as Monrovia 20 Power Splitter (M20PS) and Norwalk 23 Power Splitter (N23PS). The first design, M20PS, is a 4440 configuration Wilkinson and has a geometric response and no isolation resistors (see Fig. 4.5). I used this design to test the even mode response of a power network. The second design, N23PS, seen in Fig. 4.10, is a 3111 Wilkinson and has an optimized maximum flat response. I chose the parameters for this design using (4.4), (4.6a) and (4.6b). In a second iteration of the design of N23PS, the impedances and resistors were simulated and fine tuned to produce better standing wave ratios. In addition to good responses in the even and odd modes, I also checked the behavior of the currents that flow between splitters and combiners.

Figure 4.14: Standing Waves in Wilkinson 3111 Power Network. [Axes: nominal current (natural units) vs. frequency (GHz), two panels.] Simulated current amplitudes at ten points along the transmission line connecting Wilkinson 3111 power splitters in Fig. 4.10 as a function of frequency. Top graph (green) shows results with eight transmission lines of equal length between Wilkinsons with f = 10 GHz designed center frequency. Bottom graph (red) shows the same currents along a regular length of transmission line when another line has a 10% longer electrical length. ±10% lines shown for reference. With the maximum flat design and no length imbalance the currents stay within ±10% between about 5 GHz and 15 GHz, with less than 1% variation between about 7 GHz and 13 GHz. With the 10% length imbalance, the operational range with less than 10% variation in bias current is between about 6 GHz and 14 GHz, with substantial variation throughout. Graphs offset for clarity.
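The imbalance figures quoted in Section 4.2.4 follow from simple arithmetic on the nominal 90 ps line delay and the approximate 100 µm/ps propagation speed given in the text. The short sketch below reproduces that arithmetic; the function and constant names are hypothetical, and the propagation speed is the rounded value from the text rather than a measured quantity.

```python
# Values quoted in the text (approximate).
PROPAGATION_SPEED_UM_PER_PS = 100.0  # Nb microstrip, "about 100 um/ps"
NOMINAL_LINE_DELAY_PS = 90.0         # nominal clock line delay


def extra_length_mm(extra_delay_ps, speed_um_per_ps=PROPAGATION_SPEED_UM_PER_PS):
    """Physical length (mm) that produces a given extra delay (ps)."""
    return extra_delay_ps * speed_um_per_ps / 1000.0


def fractional_imbalance(extra_delay_ps, nominal_delay_ps=NOMINAL_LINE_DELAY_PS):
    """Extra delay expressed as a fraction of the nominal line delay."""
    return extra_delay_ps / nominal_delay_ps


def delay_for_fraction(fraction, nominal_delay_ps=NOMINAL_LINE_DELAY_PS):
    """Extra delay (ps) corresponding to a fractional length imbalance."""
    return fraction * nominal_delay_ps


print(extra_length_mm(40.0))        # length needed for 40 ps of delay, in mm
print(fractional_imbalance(40.0))   # 40 ps as a fraction of 90 ps
print(delay_for_fraction(0.10))     # a 10% imbalance expressed in ps
```

With these inputs, 40 ps corresponds to 4 mm of line and about 44% of the nominal delay, while the 10% imbalance used for Fig. 4.14 corresponds to 9 ps, or roughly 0.9 mm.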
Figure 4.15: Experimental setup for measurement of S-parameters. S-parameter measurements were performed only on inactive circuits and no other equipment was attached. The network analyzer was connected to the clock input and output of the probe. Using either a standards kit or a standards chip, the network analyzer could be calibrated to the top of the probe or to the pads on the chip.

4.3 Standalone Test

To test the above simulation results, I designed and had fabricated two Wilkinson splitters. The first one, M20PS, is shown in Fig. 4.5(a) and was designed with a geometric series in the 4440 configuration. The second one, N23PS, is shown in Fig. 4.10 and was designed with the maximum flat response in the 3111 configuration to measure isolation.

To measure the S-parameters of the M20PS and N23PS circuits, I used the setup shown in Fig. 4.15. The coaxial cables leading to and from the network analyzer were calibrated out of the measurement using a standards kit. Calibration chips were also available, though because the chip had not been characterized, the network analyzer used a generic profile for the standards chip. Ultimately, calibrating to the top of the probe with the calibration kit with a known profile yielded better results.

4.3.1 Experimental Setup

Figure 4.16 shows a microphotograph of the geometric response 1:8 Wilkinson divider/combiner M20PS test circuit. Due to limited space on the chip only the 12-step geometric series response even mode circuit was tested. The circuit was optimized for a center frequency of 17 GHz and 10 GHz bandwidth. The impedance values were adjusted to accommodate the available discrete values of the width of signal wire due to the 0.5 µm lithography step (see Table 4.1).
The total occupied area of the circuit is 630 µm (0.08λ) × 1550 µm (0.2λ), with approximate dimensions of one segment of 400 µm (0.05λ) × 320 µm (0.04λ). The dimensions of the circuit compare favorably with previously published lumped-element and distributed designs [42, 43]. The clock signal enters from the left at port 1, is split eight ways, and then combined and taken off chip on the right at port 2 (see Fig. 4.16). Figure 4.16 shows a test circuit schematic with an 18.5 dB resistive tap added to monitor current. The tap is included in the bottom power line at an electrical distance of approximately 0.13λ from the power splitter output port. To balance the circuit, identical taps are added to the other lines and resistively terminated on chip (not shown). The circuit has three contact pads to monitor power at each port.

Figure 4.16: M20PS even mode test. (a) Microphotograph of M20PS circuit showing input, output, and 20 dB tap array (left). Fabricated by Hypres using the 4.5 kA/cm2 process. (b) Circuit schematic of M20PS circuit. All ports were impedance matched. The impedance-matched tap is shown only on one line for clarity. All other lines had a similar tap but grounded directly on chip instead of going to a pad.

Figure 4.17 shows a microphotograph of the chip containing N23PS, the maximum flat response 3111 Wilkinson divider/combiner test circuit. I used this chip to test the 6-stage maximum flat response in even and odd mode. The circuit was optimized for a center frequency of 7.5 GHz with a 6 GHz bandwidth. The impedance values were adjusted to accommodate the available discrete values of the width of signal wire due to the 0.5 µm lithography step (see Table 4.1). The physical size of the power splitter in N23PS is comparable to that of the geometric response Wilkinson in M20PS.
The size of N23PS relative to the wavelength is smaller than that of M20PS by a factor of 7.5/17 = 0.44, although this comparison does not take into account the different fractional bandwidths. This circuit was designed with RQL shift registers with separate inputs to differentially load the lines and cause an imbalance in the power network. Unfortunately, design errors on the chip prevented all but one of the shift registers from being utilized. Nevertheless, I was still able to obtain S-parameter measurements of both pairs of splitters and combiners on chip.

Figure 4.17: Microphotograph of Norwalk 23. The N23PS circuit is shown on this chip. The four Wilkinson power splitters (red boxes) are in the four corners of the chip. The bottom two and top two are connected by eight lines between them. Input and output to each of the splitter/combiners is shown. Eight shift registers are in the middle (green boxes). Data input to the shift registers is on the left (not marked); output is on the right. This chip was used to test the odd mode analysis of a 6-stage maximum flat response 1:8:1 Wilkinson power splitter/combiner. The chip measures 5 mm × 5 mm.

I tested the Wilkinson divider/combiners using an American Cryoprobe (geometric response) and a High Precision Devices (HPD) probe (maximum flat response) with the microwave test setups shown in Figures 4.16 and 4.17, respectively. The American Cryoprobe probe is designed with a 3 dB cutoff at 10 GHz and has pressure contacts. The HPD probe has a 5 dB cutoff at 26 GHz. For both probes, a calibration of the coaxial lines leading from the testing equipment, through the probe, and to the chip was performed on a separate chip for different combinations of the contact pads to accommodate the spread in electrical parameters
between contact pads. These calibrations indicate that this experimental setup and both probes can reliably be used up to 12 GHz. At higher frequencies the response becomes highly non-uniform due at least partially to limitations of the calibration procedure.

4.3.2 Measurements

Figure 4.18 shows a comparison of simulations and measurements on the M20PS circuit shown in Fig. 4.16. Experimental transmission parameters S21 and S31 match the simulated results to within 3 dB at all frequencies within the probe range. Both curves are flat to within 0.5 dB above 4 GHz. Below 4 GHz the device acts like a current divider and not a power divider. Experimental and simulation results for power at the tap (S31) agree to within 1 dB between 3-7 GHz. Both curves show an approximately 1.8 GHz periodicity which corresponds to pad-to-pad resonance (5.5 mm) due to impedance mismatch at the pressure contact. The dips at 8 GHz and 11 GHz are due to inaccuracy in the probe calibration. From the results, I can conclude that the overall response at the tap in the M20PS circuit is flat, which indicates no standing waves in the power line.

Figure 4.18: Comparison of simulated results (solid) with measurements (dashed) from M20PS with a 17 GHz center frequency and 10 GHz bandwidth.

Figure 4.19 shows Spice simulation results of the 3111 circuit shown in Fig. 4.17 with both 0 ps and 40 ps imbalance in one of the lines connecting the splitter to the combiner for a center frequency of 7.5 GHz. Outside the bandwidth (about 4.5-10.5 GHz) the imbalance plays little role in the simulation. Within the region of interest, the 0 ps imbalance has very low reflection, below -20 dB. Even with a 40 ps imbalance the reflection is between -10 and -20 dB. Thus, in this range, which is the desired response of the maximum flat design, the reflection is still very flat.
Figure 4.20 shows measurements of the N23PS maximum flat response (3111) Wilkinson power splitter/combiners with a center frequency of 7.5 GHz. This data comes from N21CLA, which had a power splitter design identical to that in N23PS⁴ but was on a different chip. Experimental transmission parameters S12 (purple) and S21 (blue) match the simulations shown in Fig. 4.19. The expected frequency range based on simulation is between 4.5 GHz and 10.5 GHz. As can be seen in Fig. 4.20, the measured bandwidth reaches from about 3.5 GHz to 11 GHz, beyond which the transmission parameters become considerably smaller. The transmission parameters of the maximum flat response Wilkinson splitter/combiner compare favorably with the transmission parameters of a simple through line between pads, shown as the dotted black curve in Fig. 4.20. The inset shows the transmission is fairly flat, usually within 1 dB, between 3 GHz and 12 GHz. As with the 4440 design, below 3 GHz the 3111 power divider acts like a current splitter.

⁴ See Chapter 6.

Figure 4.19: Simulated reflection for the N23PS circuit with a configuration as shown in Fig. 4.10. [Axes: S-parameters (linear and dB) vs. frequency f (GHz); legend: throughput (S12) and reflection (S11), each with and without a 40 ps imbalance; Δf/f = 1.] Green (dotted) curve shows the nominal case with no imbalance between lines connecting power splitters. Red (solid) curve shows the results on the overall reflection parameter S11 at either input or output port of the whole power network when a 10 ps imbalance is in one of the eight lines between power splitters. Close to the center frequency the reflection (solid green) is less than -25 dB, whereas with the imbalance the reflection (dashed green) ranges from -20 dB on the low end to -10 dB lower than 4 GHz or higher than 10 GHz.
The reflection is around -20 dB but reaches closer to -10 dB at some points, notably one peak close to 6 GHz. The flat nature of the responses indicates only minimal standing waves. Finally, I noticed that during layout, part of the 8th segment was inadvertently shortened by 1/20th of a wavelength at 7.5 GHz. The results here thus contain an imbalance of at least 2 ps on one of the lines.

4.3.3 Power Splitter Test Conclusions

Of the two designs I tested (M20PS and N23PS), the geometric response was the less desirable because it had the higher reflections within the range of interest. Figure 4.10 shows the schematic of N23PS. The values of R2 and R4 in the 1221 design of Fig. 4.2 were uniformly small for all responses. They dissipated relatively little power compared to R3 and R5 and I have discarded them in the design of N23PS. This reduces the splitter from a 1221 configuration to a 1111 configuration with only 15 transmission line stages and seven resistors in total. Adding two stages to create a 3111 Wilkinson with N = 6 costs little in space, which is why I chose the 3111 configuration for the design of N23PS. Compare this to the 21 stages and 10 resistors in the 4440 configuration. The extra stages in the 3111 design help step down the impedance from Zin = 50 Ω to Zout = 32 Ω.

Figure 4.20: Measured S-parameters on N21CLA Wilkinson power splitter. [Axes: S-parameters (dB) vs. f (GHz), panels (a) and (b).] (a) Blue and purple curves show throughput S12 and S21. Red and green show reflections S11 and S22. The throughput (S12) from calibration measurements of a standards chip is shown as a gray curve for reference. (b) A detailed view of the throughput up to 12 GHz. Between about 3 GHz and 12 GHz the curve is relatively flat. The noise on S11 was due to a damaged connector on the network analyzer.
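As an illustration of how the initial resistor values can be evaluated, the sketch below codes the simplified two-section Cohn expressions, reading (4.6a) as R2 = 2 Z1 Z2 / sqrt((Z1 + Z2) Z2) and (4.6b) as R1 = 2 R2 (Z1 + Z2) / (R2 (Z1 + Z2) - 2 Z2). That reading is an assumption based on the footnoted original result with cot φ3 = 0. Because the denominator of (4.6b) mixes R2 (Z1 + Z2) with 2 Z2, the expressions appear to be meaningful only for impedances normalized to the line impedance, which is also assumed here. The function name and the example inputs are hypothetical and are not the N23PS design values.

```python
import math


def cohn_resistors(z1, z2):
    """Return (R1, R2) for a two-section divider from normalized Z1, Z2.

    Implements the simplified (cot(phi3) = 0) form of Cohn's equations as
    read from (4.6a) and (4.6b); all quantities are normalized impedances.
    """
    r2 = 2.0 * z1 * z2 / math.sqrt((z1 + z2) * z2)
    denominator = r2 * (z1 + z2) - 2.0 * z2
    if denominator <= 0.0:
        raise ValueError("no positive R1 exists for these impedances")
    r1 = 2.0 * r2 * (z1 + z2) / denominator
    return r1, r2


# Hypothetical normalized inputs chosen only to exercise the formulas.
r1, r2 = cohn_resistors(1.0, 1.0)
print(r1, r2)
```

For the hypothetical input Z1 = Z2 = 1 the expressions reduce to R2 = sqrt(2) and R1 = 4 + 2 sqrt(2), which gives a quick hand check of the implementation.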
For the N23PS circuit I found the initial impedance values using (4.4) and used (4.6a) and (4.6b) for the resistor values. I then simulated the Wilkinson 3111 with the schematic shown in Fig. 4.17. (The netlist is given in Appendix C.) Using commercial software (Agilent Advanced Design System 2009) I optimized the impedances and resistors to get the desired frequency responses.

Figure 4.21 shows the results of this optimization on the isolation and crosstalk responses in the N23PS circuit. Using only three resistors between ports, the isolation is substantial. Most power is absorbed, with less than -50 dB reaching the input port and even less reaching any output ports. Figure 4.14 shows the effect of line imbalances on the distribution of current in transmission lines between the Wilkinson power splitters. The effect of imbalances on current distributions is small. When perfectly matched, the transmission lines have almost no variation over a 6 GHz range and are still within 10% of nominal down to about 5 GHz. Even a 40 ps imbalance still has all currents within the tolerances down to 5 GHz. On chip, this would amount to a 1 mm or more difference in length on a 10 square millimeter chip, much more than is expected.

Figure 4.19 shows the overall reflection of the whole power network for the optimized 3111 design used in N23PS. A 10 ps imbalance changes the reflection coefficient from -30 dB to about -20 dB. Though this is a jump of 10 dB, the overall reflection still remains low, no higher than other elements in a testing setup. (See Appendix C for details on the S-parameters of the probe itself.) The experimental test of M20PS suffered from pad-to-pad resonances. Figure 4.19 indicates that these

Figure 4.21: S-parameters from ADS for N23PS (fc = 7.5 GHz). Three curves characterizing the final design of N23PS calculated from ADS.
Input reflection is S00, the reflection off the input port of the Wilkinson power splitter. Output reflection is S88, the reflection of signals arriving at one of the outputs. Isolation is S78, the signal at an adjacent port when a signal arrives at one of the outputs.

small variations may be even smaller in the final design.

The tolerance to imbalances in the clock lines of the N23PS power splitter is smaller than for the previously studied Wilkinson 1221 configuration power splitter. However, the design of the N23PS Wilkinson incorporates a 50 Ω input impedance, eliminating the need for a separate high-bandwidth transformer. The maximum flat response design used for N23PS has a bandwidth appropriate to RQL circuit needs. Furthermore, it displayed adequate current uniformity. The design is also space efficient. Looking ahead, I will need four splitters on each chip with more than one clock phase and more than one parallel pipeline. For each chip, efficient size is a design necessity.

4.4 Test with RQL Circuits

After completing the test described in Section 4.3, I still needed to test the Wilkinson power splitter in an RQL circuit. From here forward, the design used was the maximum flat response Wilkinson power splitter in the 3111 configuration, fabricated on the N23PS and N20CLA circuits. This design minimizes the effects of imbalances and achieves a broad frequency operating range.

Despite design and fabrication problems, the chip pictured in Fig. 4.17 still had one functioning shift register powered by the Wilkinson power splitters. Although differential loading of the circuit by selectively turning shift registers on and off was not possible, I was still able to measure the clock power margins on the shift register. The power margins can be found by noting that the RQL shift register will fail if at any point along the clock line powering the shift register the current becomes too high or too low. If too low, the SFQ pulse will fail to propagate.
If too high, the junctions will switch without input and generate a continuous series of pulses. Both cases can be readily observed using an oscilloscope. In both cases, failure is defined as the power at which the output becomes probabilistic, with the output voltage sometimes registering as high and sometimes as low for a given pulse.

Figure 4.22: Odd mode test block diagram for N23PS. Four Wilkinson power splitters in the corners provide power and a clock signal to seven shift registers in the middle, labeled 1-7 and in groups A-C. The eighth line from the power splitters goes between splitter and combiner but does not supply power to a shift register. An eighth shift register (0) is powered directly by two clock lines and is completely separate from the rest of the circuit. Shift registers 1-3 are in group C, triggered by input on port C. Shift register 4 is triggered by an individual input port B in group B. Shift registers 5-7 are in group A and triggered by input from pad A. Each input has a corresponding return pad. Each shift register has its own output pad.

4.4.1 Power Margin Experiment

Figure 4.22 shows a schematic of the circuit I used to find the power margins in the N23PS circuit. Eight shift registers are labeled with a unique number (starting with 0 at the bottom) and a letter corresponding to four pairs of pads A-D. Each pair of pads is connected in series to a corresponding set of shift registers. Thus, shift register 0 (SR0) is part of group D, and so on. Each group of shift registers is triggered by the same input. Four power splitters supply two-phase clock power to the shift registers in A, B, and C, with an eighth line between the splitter and combiner which is unloaded by any digital circuits. Group D serves as a control circuit and has an identical but completely isolated shift register.
I used this circuit to check for malfunctions in either the other shift registers or the power network. Each shift register outputs to a different output pad. Not shown are two dc bias lines for D (separate) and A, B, and C (in series). This setup allows certain branches to be loaded while others are left unloaded to determine the effects of RQL loading on the power network. The network analyzer can also be used on the power networks, although not while the RQL circuits are engaged.

The experimental procedure is as follows. First I established the optimal operating point. For a given frequency, the optimal operating point (for flux bias and input phase) will depend on the phase between clocks, the offset between data and the clocks, and the clock power. Because of the self-correcting timing of RQL circuits the data offset has little effect over a broad range of operation. The relative clock phase between the two clock lines, which should be offset by π/2, is the most sensitive parameter. After I found a general operating point by adjusting the relative clock phase, clock power, and input phase, I turned the clock power down until the circuit was barely functional. I then adjusted the clock phases until correct operation was established. I then turned down the power again, repeating the process until any significant change of the clock phases resulted in failure. At the low-power, optimum clock phase point I turned the data phase up and down, noting failure points, before returning it to the middle of the range. This established the low power margin. From here, I turned the power up until failure. This established the high power margin.

One feature of RQL is that the reciprocal pulses need not follow on the same clock cycle. The Anritsu pattern generator I used is rated only up to 12 GHz for a non-return-to-zero (NRZ) output, essentially limiting me to 6 GHz data input speed. However, by "over-clocking"
the clock signal by a factor of 3/2 relative to the return-to-zero (RZ) data rate, the reciprocal pulses can follow one-and-a-half clock cycles later instead of one-half clock cycle later. For example, a data stream triggered off of a 6 GHz signal will generate a positive SFQ pulse every 166.7 ps. These pulses can be propagated on an RQL circuit being clocked at 9 GHz. Instead of following half a clock cycle later (as measured by the ac bias applied to all junctions), the reciprocal pulse follows one-and-a-half clock cycles later. Although this does not increase the data speed, it does allow the power splitter to work at higher frequencies where the power network may have better performance.

4.4.2 Data and Analysis

Figure 4.23 shows the results of my power margin measurements on the N23PS circuit. The input power for high and low margins (adjusted for the attenuators and other equipment attached to the probe) is shown by the black points. The region of correct operation is shown as the green shaded area between these data points. On the top of the figure, the power margin in dB is shown by the black curve. Red and blue curves are measurements of the S-parameters of the power network⁵ from input chip pad to output chip pad. Above 6.5 GHz data rate the clock rate was increased to 3/2 the data rate, which is marked by the "overclock" region in the figure.

Examining Fig. 4.23, we see that the S-parameters look similar to those measured previously (compare to Fig. 4.20). I also note that between about 4 GHz and 8.5 GHz the power margins are between 2 dB and 5 dB. The power margins are on average about 2.4 dB in width, or about ±36%. Above 8.5 GHz the data input rate is approaching 12 GHz NRZ, the maximum rated for the equipment. At 1.75 GHz, where the power splitter has a very high local throughput and almost no reflection, the power required takes a sharp drop, as expected. In the region between 4 GHz and 8.5 GHz the throughput (S12 and S21) and average power are flat.
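The electrical lengths listed in Table 4.3 can be cross-checked from the physical lengths quoted in this section. The sketch below is a rough consistency check under two assumptions that are not stated in the text: that the 4421 µm segment is exactly a quarter wavelength at the 7.5 GHz design frequency (which fixes a single propagation speed), and that the pad-to-pad path consists of twelve such segments plus the 10 718 µm clock line. All names are hypothetical, and the single propagation speed ignores the slight impedance-dependent variation mentioned in the text.

```python
QUARTER_WAVE_SEGMENT_UM = 4421.0   # quarter-wave segment length from the text
CLOCK_LINE_UM = 10718.0            # splitter-to-splitter clock line length
DESIGN_FREQ_GHZ = 7.5              # N23PS design center frequency
SEGMENTS_PAD_TO_PAD = 12           # assumption: six stages in splitter and in combiner


def propagation_speed_um_per_ps():
    """Speed implied by a quarter-wave segment at the design frequency."""
    wavelength_um = 4.0 * QUARTER_WAVE_SEGMENT_UM
    return wavelength_um * DESIGN_FREQ_GHZ * 1e-3  # 1 GHz = 1e-3 per ps


def wavelength_um(freq_ghz):
    """Wavelength on the line at freq_ghz, using the single implied speed."""
    return propagation_speed_um_per_ps() / (freq_ghz * 1e-3)


def electrical_length(length_um, freq_ghz):
    """Physical length expressed in wavelengths at freq_ghz."""
    return length_um / wavelength_um(freq_ghz)


pad_to_pad_um = SEGMENTS_PAD_TO_PAD * QUARTER_WAVE_SEGMENT_UM + CLOCK_LINE_UM

for f in (1.75, 2.75, 3.5, 6.5):
    print(f,
          round(electrical_length(pad_to_pad_um, f), 2),
          round(electrical_length(CLOCK_LINE_UM, f), 2))
```

Under these assumptions the implied propagation speed is about 133 µm/ps, and the computed electrical lengths agree with the tabulated values to within rounding.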
Table 4.3 shows the pad-to-pad and splitter-to-combiner lengths for the N23PS circuit in units of the wavelength at the frequencies of interest.

⁵ Note that these measurements are from N23PS. The data shown in Fig. 4.20 is taken from N21CLA. Because N23PS contained design and fabrication errors I have avoided relying on data from this circuit if at all possible. However, here I am comparing the behavior of the power network to the behavior of the circuit powered by this same power network. A valid comparison can only be made by this direct comparison.

Figure 4.23: Wilkinson-powered RQL circuit measurements of N23PS. [Axes: power (dBm) and S-parameter (dB) vs. f (GHz); labels: Overclock, Power Range, Power Margins, S12 & S21, S11 & S22.] Clock amplitude margins (black) overlaid with S-parameters (blue and red) of the on-chip power network. Green region shows the operational range of SR1 after adjusting input power for losses in the splitter, attenuator, and on-chip power network. Size of the margins is shown above, ranging from 0 dBm to almost 5 dBm. S-parameters are overlaid with throughput in red (S12 and S21) and reflection in blue (S11 and S22). Measurements in both directions are shown in solid and dashed lines. The overclock region shows where data was taken with the clock speed equal to 3/2 the data rate. Particularly noticeable is the correspondence between the throughput and the power range over the 3-9 GHz operational region. Frequency is given for the clock speed, not data input speed.

Table 4.3: Chip resonance lengths for frequencies f of interest. Here, λ = c′/f is the wavelength at frequency f.

f           Pad-to-Pad distance    Splitter-to-Splitter distance
1.75 GHz    0.84 λ                 0.14 λ
2.75 GHz    1.32 λ                 0.22 λ
3.5 GHz     1.68 λ                 0.28 λ
6.5 GHz     3.13 λ                 0.53 λ

The length of each quarter-wave segment in the power splitter is approximately 4421 µm, with
slight variation to account for different propagation speeds at different impedances. The length of the clock lines between power splitters is 10 718?m, with variations of less than 50?m due to design constraints. Even at 6.5 GHz, the length is still a small fraction of a wavelength (about 6%). At this frequency Fig. 4.23 shows a small, narrow drop in margins (a half-wavelength reflection between splitters could cause standing waves and account for the loss of margins). At 3 GHz the loss of margins is due to the high reflections of the power splitter. 4.5 Conclusions The Wilkinson power splitter is a common part of many microwave systems. However, the demands of RQL required a new kind of Wilkinson power splitter. Using even and odd mode analysis, I decomposed the design of the 8-way Wilkinson splitter into the design of quarter-wave impedance matching filters for the even mode and two-stage wilkinson power splitters for the odd mode. I considered four different configurations of the Wilkinson power splitter. The 4440, 2220, and 1221 configurations all performed best with the maximum flat response design. The key requirement for RQL circuits is a low VSRW on transmission lines connecting 161 power splitters, for which the maximum flat response is a clear choice. In the final design of the power splitter, I chose a 3111 configuration with 50? input and 32? output. Starting from an initial design using the maximum flat response, I used ADS software to further increase the performance of the power splitter. In the second half of this chapter I presented experimental measurements in Wilkinson power splitters that shown they can be used for power RQL circuits. In addition to even and odd mode analysis, I used general filter theory to create three designs. This resulted in the 3111 design shown in Fig. 4.10 with the design parameters shown in Tables 4.1 and 4.2. 
My measurements on this design revealed that the bandwidth was larger than expected and the margins were adequate.

Chapter 5
Experimental Verification of RQL Timing Parameters

5.1 Introduction

In Chapters 1 and 2, I described the foundations of Reciprocal Quantum Logic. In Chapter 3, I described the timing parameters of RQL and the corresponding VHDL models that I used to develop larger circuits. This chapter describes my effort to verify the timing behavior of junctions predicted by the analytic timing model (Eq. (3.8)) in Chapter 3. With this in mind, I designed and tested three circuits in order to measure: (1) the timing delay of junctions on a single phase as a function of input phase, clock frequency, and clock amplitude; (2) operating margins on the input phase and clock amplitude as a function of frequency; and (3) operational margins of the clock amplitude and frequency for a long, deep pipeline shift register. With these three experiments I was able to test the timing behavior predicted by equation (3.8), verify the operational boundaries derived from this equation, measure the switching time $t_0$, and test the self-correcting influence of the clock boundaries on input phase delay. These results confirmed the timing behavior of SFQ pulses, and also showed that RQL circuits can operate over greater input phase ranges than expected based on my models.

Figure 5.1 shows a microphotograph of the experimental chip. I designed this chip, designated Norwalk 22 Timing Experiment, or N22TE, and had it fabricated at Hypres (see Appendix E for details of the Hypres process). The circuits I used for the three experiments can be seen on the chip. I duplicated some of the circuits to mitigate the chance of fabrication errors disabling a circuit.

[Figure 5.1: chip microphotograph, 5 mm × 5 mm. Labeled regions: Short Shift Register, And-output Race Circuit, Long Shift Register, Two-output Race Circuit, Standards.]

Figure 5.1: Microphotograph of N22TE, fabricated at Hypres. This chip contains three experiments on RQL timing. The chip contains one long shift register (red box), two short shift registers (dark blue boxes), two And-output race circuits (green boxes), two Two-output race circuits, and a set of standards (yellow box). Pads on the far left include a set of standards with a through-line, a 50 Ω on-chip termination, an open, and a short. Note that this layout is flipped from the design layout by the fabrication process. The short shift registers were used only to confirm operation of the chip.

Three different circuits were used because no single circuit can test all aspects of the timing model. In the first circuit I measured the output timing as a function of input timing by comparing the timing difference between two identical SFQ pulses in a race condition. These two pulses propagate through two parallel JTL lines with different lengths, and each line has a separate output. I observed the output of this two-output race circuit experiment directly on a Tektronix TDS 8000 Digital Sampling oscilloscope (see footnote 1).

In the second experiment, I instead compared two propagating pulses by feeding them to an on-chip AND gate. While this method loses information on the relative timing, it eliminates the uncertainty introduced by the testing equipment. This and-output race circuit experiment provided a binary result of correct or incorrect operation as a function of input time, clock frequency, and clock amplitude, and thus was a good test of the operational margins.

In the third experiment I tested a long shift register circuit and examined the behavior of SFQ pulses crossing clock boundaries. In this experiment, I measured the maximum operating frequency of the long shift register with deep pipelines. The measured frequencies correspond to the maximum operating frequency of the circuit with a given pipeline and serve as an experimental measure of the switching time parameter $t_0$. Also, the timing model predicts a stability limit after which pulses will no longer have self-correcting timing (see pg. 88). I measured the phase boundaries for correct operation and checked these boundaries against the predictions of the analytic model.

Footnote 1: All references to an oscilloscope in this Chapter refer to this model.

5.2 Circuits and Simulation for Experiments 1, 2, 3

Figure 5.2 shows block diagrams and a view of the circuits in the Cadence computer design environment.

5.2.1 Simulation of Experiment 1: Two-output Race Circuit

In the first experiment, an SFQ pulse generated by a single input splits into two pulses propagating along separate paths with a different number of junctions (see Fig. 5.2(a)). The difference in the number of junctions between the paths was designed to accumulate enough time delay to be measured directly on the oscilloscope. Figure 5.3 shows waveforms of pulse propagation along the short and long paths simulated at 4 GHz. The short track contains 4 JTLs with 8 junctions and the long path contains 20 JTLs with 40 junctions. The last JTL on both paths is shaded in the figure. It is an amplification JTL with twice the critical current, required for the input to the output amplifier. One can see that the total accumulated delay is around 20 ps. This 20 ps delay is at the limit of what can be measured by the oscilloscope with sampling measurements in the range of frequencies between 1 and 6 GHz.

[Figure 5.2: panels (a) to (e). Panel titles: Two-Output Race Circuit, And-Output Race Circuit, Long, Deep Pipeline Circuit. Block labels: Long Track, Short Track, Amplifier, Phase 1 to Phase 4.]

Figure 5.2: Block diagram and layout of N22TE. Block diagrams show the essential features of the three timing experiments. Shaded JTL symbols indicate that higher critical currents were used in the junctions, $J_{c1}$ = 282 µA (instead of the normal $J_{c1}$ = 141 µA) and $J_{c2}$ = 400 µA (instead of $J_{c2}$ = 200 µA). (See Fig. 2.2 on pg. 44.) (a) & (b) Block diagram and layout of the two-output race circuit. A single input generates an SFQ pulse which is split to travel along two paths. The paths have different lengths and each path has its own output. The two outputs can be seen on the right of the layout, which has one track going down the bottom of the "T" shape, and the other along the top. (c) & (d) Block diagram and layout of the and-output race circuit. Similar to the previous circuit, one input is split into two tracks. The two tracks feed into an AND gate, which produces output when the circuit is operating. (e) Block diagram of the long, deep pipeline shift register. Unlike the previous circuits, no splitting of the SFQ pulse occurs and the circuit uses multiple clock phases.

[Figure 5.3: phase $\phi$ (rad, 0 to 2π) versus time t (ns, 0 to 1.4). Traces: Input, Short Track, Long Track; the delay difference is marked.]

Figure 5.3: Plot of the input and output phases versus time for the two-output race circuit shown in Fig. 5.2(a). This figure shows the phase of the first and last junctions in the two-output race circuit, as simulated in Spice for the Hypres 4.5 kA/cm² process at 4 GHz. The input shows the phase rise and fall with SFQ pulse pairs. The short track shows an output almost immediately after the input junctions switch. The long track takes noticeably longer. This timing difference between output from the short and long tracks is the delay difference between the two positive SFQ pulses.

The number of junctions in the long path was designed for pulse propagation in the range of frequencies up to 5 GHz at maximum clock amplitude. The total clock line path length difference is approximately 1200 µm, and the delay in the clock line between the two paths is approximately 12 ps. At 5 GHz the wavelength is approximately $2 \times 10^4$ µm. The path length difference therefore corresponds to approximately 6% of a wavelength at the maximum operating frequency.
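The three numbers quoted for the clock line (1200 µm, 12 ps, and a wavelength of about $2 \times 10^4$ µm at 5 GHz) are mutually consistent, which the short check below illustrates. It is only a sketch: it assumes a single propagation speed implied by the quoted length and delay, and the variable names are chosen here for illustration.

```python
# Consistency check of the clock-line numbers quoted in the text.
# Assumption: one propagation speed, inferred from the quoted length and delay.
path_difference_um = 1200.0   # clock-line path length difference (from the text)
delay_ps = 12.0               # clock-line delay between the two paths (from the text)
frequency_ghz = 5.0           # maximum operating frequency (from the text)

speed_um_per_ps = path_difference_um / delay_ps   # 100 um/ps, i.e. 1e8 m/s
period_ps = 1000.0 / frequency_ghz                # 200 ps at 5 GHz
wavelength_um = speed_um_per_ps * period_ps       # on-chip wavelength at 5 GHz
fraction_of_wavelength = path_difference_um / wavelength_um

print(f"wavelength at {frequency_ghz} GHz: {wavelength_um:.0f} um")
print(f"path difference: {100 * fraction_of_wavelength:.0f}% of a wavelength")
```

The result reproduces the quoted wavelength and the 6% figure, so the clock-line skew between the two paths is a small fraction of a clock period over the tested frequency range.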
There is always an unknown delay introduced by the output cables. However, this difference in delay is constant and can be subtracted out from the measurement data.

Figure 5.4 shows the predicted timing behavior in this experiment as a function of input phase as found from (3.8). There are three curves showing output phase delay for the path with 1, 8, and 20 JTLs (or 2, 16, and 40 Josephson junctions, respectively), and also a curve showing the accumulated output delay between the long and short paths. One can see that the cutoff point of the input phase is dominated by the longer path and that the time delay difference between long and short paths is fairly constant until the phase approaches the timing limit, at which point it sharply increases. This time delay difference can be measured in a real device as a function of input phase by adjusting the phase of the data relative to the clock.

Figure 5.4 shows the difference in timing behavior for both short and long paths. Three points are worth noting in this graph: (1) the cut-off point is dominated by the long path, for obvious reasons; (2) the timing difference is fairly constant until the phase reaches close to the timing limit, at which point it sharply increases; and (3) the delay flattens out at $\phi = 0$. These qualitative features should be readily observable in the actual data.

[Figure 5.4: Output Phase Delay $\delta(\phi)$ (rad) and Output Time Delay (ps) versus Input Phase $\phi$ (rad) and Input Time (ps). Curves: 2 JJs, 16 JJs (short), 40 JJs (long), and the difference between short and long.]

Figure 5.4: Two-output race circuit timing predictions. Output phase delay calculated from (3.8) for the JTL circuit in Experiment 1 for the Hypres 4.5 kA/cm² process at 2.5 GHz, with two parallel paths: short with 8 junctions total and long with 40 junctions total. Different curves show the phase delay as a function of input phase for 2 JJs, 16 JJs, and 40 JJs on a single phase. Paths with more junctions take longer. The red solid curve shows the expected difference in accumulated delay between the long and short paths.

5.2.2 Simulation of Experiment 2: And-output Race Circuit

The second experiment was similar in concept to the first experiment in that it also measured relative delay between a short and long path. The same number of junctions is used in the short and long track. However, instead of observing the delay on the oscilloscope, it was sampled by an AND gate. This experiment eliminates the uncertainty in measuring picosecond-scale delays on the oscilloscope. The AND gate compares the relative timing of the pulses (modulated by one clock period) and produces binary data. If two delayed pulses come within the same clock window, then the AND gate output is "one"; otherwise the output is "zero". By changing the amplitude of the bias current, I can modulate the AND output and observe the operation of the circuit for different input phases.

Figure 5.5 shows the expected regions with output and without output from the AND gate for this experimental circuit, calculated from (5.3b) and plotted for different frequencies.

[Figure 5.5: Clock Amplitude A ($I_b/I_c$) versus Input Phase $\phi$ (rad). Regions labeled "operational" and "failure"; curves for 1 GHz, 1.7 GHz, 2.5 GHz, and 3 GHz.]

Figure 5.5: Operational space of N22TE. The four curves show the calculated boundary conditions (from (5.3b)) for correct operation for four different frequencies for N = 40. The contours for a given frequency can be mapped out by varying clock amplitude and input phase until a change occurs.

These calculated curves on the plot are derived from the analytical expression (3.8)

$$\delta(\phi) = \arccos(\cos\phi - \alpha) - \phi. \tag{5.1}$$

This is the equation I used in Chapter 3 to derive the boundary conditions for SFQ pulse propagation.
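To make the shape of these predictions concrete, the following Python sketch evaluates the delay of (5.1) for a path of N junctions and the resulting long-minus-short difference. It is an illustration under stated assumptions, not the code used for Fig. 5.4: the function names are hypothetical, the N-junction form (N times the single-junction parameter inside the arccos) is taken from the difference expression (5.4a) that appears later in this chapter, and the per-junction value `alpha = 0.01` is an arbitrary placeholder rather than a measured or simulated value.

```python
import math

def phase_delay(phi, n_junctions, alpha):
    """Output phase delay arccos(cos(phi) - N*alpha) - phi, following (5.1).

    Returns None when the arccos argument falls below -1, i.e. when the
    pulse arrives too late in the clock cycle to propagate.
    """
    arg = math.cos(phi) - n_junctions * alpha
    if arg < -1.0:
        return None
    return math.acos(arg) - phi

def delay_difference(phi, n_long, n_short, alpha):
    """Difference in phase delay between the long and short paths."""
    long_delay = phase_delay(phi, n_long, alpha)
    short_delay = phase_delay(phi, n_short, alpha)
    if long_delay is None or short_delay is None:
        return None
    return long_delay - short_delay

alpha = 0.01  # hypothetical per-junction parameter, for illustration only
for phi in (0.0, 1.0, 2.0, 2.2, 2.3):
    diff = delay_difference(phi, n_long=40, n_short=16, alpha=alpha)
    label = "no propagation" if diff is None else f"{diff:.3f} rad"
    print(f"phi = {phi:.1f} rad -> long - short = {label}")
```

With these placeholder values the difference stays within a few tenths of a radian over most of the range and rises steeply just before the long path stops propagating, which is the qualitative behavior described for Fig. 5.4. The cutoff is set by the 40-junction path, since its arccos argument reaches -1 first.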
Experimentally the most convenient quantity to measure is the limiting clock amplitude $A_{lim}(\phi, f)$ at which operation fails as a function of clock input phase $\phi$ and frequency $f$. From $A_{lim}(\phi, f)$ I can find expressions for the limiting frequency $f_{lim}(A, \phi_{set})$ as a function of clock amplitude $A$ and a set, fixed input phase $\phi_{set}$. I can also find the "late limit" on the clock input phase $\phi_{lim}(f, A)$.

The boundary condition is determined from (3.8) by the criterion

$$\alpha - 1 < \cos\phi, \tag{5.2}$$

where $\alpha = 6\pi f t_0/A$. When this criterion is violated, the SFQ pulse arrives too late in the clock cycle to switch the junction. I expect these equations to give a very good approximation to the boundary conditions of SFQ pulse propagation in the limit of frequencies where the pulse timing fitting parameters are close to unity (see Section 3.3, page 95). (In the general case where the pulse timing fitting parameters are not close to unity, the simulation would need to iteratively calculate the next output based on the previous input, using the fitting parameters.) Solving (5.2) for $\phi$, $f$, and $A$ gives the expected behavior:

$$\phi_{lim}(A, f) = \arccos\left(\frac{2\pi N t_0 f}{A} - 1\right), \tag{5.3a}$$

$$f_{lim}(A, \phi_{set}) = \frac{1 + \cos\phi_{set}}{2\pi N t_0}\,A, \tag{5.3b}$$

$$A_{lim}(f, \phi_{set}) = \frac{2\pi N t_0}{1 + \cos\phi_{set}}\,f, \tag{5.3c}$$

with $\phi_{set}$ now a variable in operational space instead of the input phase. These three equations define the boundaries in operational space between successful pulse propagation and failure. From (5.3a) to (5.3c), one can see that for a given frequency there exists a minimum amplitude and a maximum phase input. As the frequency increases, the operational range decreases. All frequencies have a common upper limit on amplitude at $A = 1$, after which failure occurs due to overdriving of all junctions. This failure mechanism is unrelated to frequency or clock phase.

Coming back to Fig. 5.5, each curve divides the operating space into regions where pulses propagate (above the curve) and regions where the pulses do not propagate (below the curve). Notice that for a given clock amplitude, the operational space for correct operation grows as the frequency is decreased. That is, for slower clock frequencies the pulses may arrive later without preventing operation. For a given clock input phase, the clock amplitude can also decrease as the frequency decreases. Pulses arriving before $\phi = 0$ are not allowed (in this model) and at $A > 1$ a different failure mechanism takes over (all junctions switch regardless of input).

5.2.3 Simulation of Experiment 3: Long Shift Register

I designed the third circuit to verify the effect of phase boundaries on timing stability and to check the maximum operating frequency of an RQL circuit. Instead of trying to measure an RQL circuit in the 20 to 40 GHz range, which is challenging experimentally, I decided to design long, deep pipeline shift registers with 20 JTLs per stage. In this case, the maximum operating frequency from simulation was expected to be 5 GHz at maximum clock amplitude ($A = 1$). The timing behavior of this shift register is equivalent to that of a short shift register operating at 100 GHz with a pipeline depth of one JTL. This follows from the definition of $\alpha$. In addition, this experiment allowed me to check the self-correcting behavior of an RQL circuit by measuring the clock power margins as a function of input phase.

The detailed circuit schematic of the shift register is shown in Fig. 5.6; there are 1384 junctions and 40 junctions per phase, which gives 20 JTL segments per phase. The long length ensured that practically any phase delay introduced at the input would be corrected by the time output occurs. The testing equipment (see Section 4.4.1 on pg. 157) only worked up to a speed of about 6 GHz, and the circuits were designed to fail below this speed for maximum clock amplitude. Similar to the other two circuits, the final JTL in the shift register had critical currents of $J_{c1}$ = 282 µA and $J_{c2}$ = 400 µA. This was the only circuit on N22TE that I tested with two clock lines and four clock phases; the previous circuits only had one phase on a single AC clock line and one DC flux offset line. The phase difference between the clock lines was controlled externally (see next Section).

[Figure 5.6: schematic. Labels: Launch, Phase 00, Phase 01, Phase 10, Phase 11, 20 JTLs per phase, Amp; four phases, 1384 junctions.]

Figure 5.6: Long, deep pipeline shift register. Schematic of the long, deep pipeline shift register used in Experiment 3. (Intermediate stages are omitted for clarity.) Each phase contains 20 JTLs with 40 junctions, a pipeline depth designed to fail at f = 5 GHz and A = 1. Unlike the circuits of the previous experiments, this circuit has a four-phase clock with 80 JTLs per cycle, for a total of 17.25 full clock cycles of delay through the circuit. Two extra junctions form a SQUID and serve as amplification at the output.

5.3 Experimental Setup

Figure 5.7 shows a block diagram for the experimental setup, which has been reproduced from Fig. 2.14 in Chapter 2. However, the initial settings for the circuits were different. I obtained the optimal operating point under the assumption that the margins on clock amplitude, input phase, and relative clock phase are independent. To find the margins on amplitude, I first turned the amplitude down to near failure. I then changed the clock phase up and down to find preliminary margins on the phase. The clock input phase could be searched easily by introducing a small frequency difference between the two clock generators. At the middle of the phase range I then turned the clock amplitude down until I found the lower bound of the clock amplitude. At the same input phase, I next increased the amplitude until failure. The optimum amplitude is assumed to be the geometric mean of these two limiting clock amplitudes. To establish the optimum relative clock phase, I set the amplitude at its optimal value and then decreased the relative phase until failure, increased it until failure, and chose the mean value as the optimal operating value.

[Figure 5.7: block diagram. Blocks: Clock Generator #1 and #2, Pattern Gen., Hardware Delay, Trigger, Sync, variable delays $\Delta\phi_1$ and $\Delta\phi_2$, Attenuator (3 dB), Attenuator (40 dB), Bias-T, Low-Pass Filter, DC Data Offset, Junction DC Offset, Amplifier DC Source, Low Noise Amp, Oscilloscope, and the circuit at 4.2 K with lines a0, a0*, c0, c0*, c1, c1*, dc0, dc0*, q0.]

Figure 5.7: Experimental setup for timing experiments. Block diagram of the experimental setup for the timing experiments. a0: data input; a0*: data return; c0, c0*, c1, c1*: clock phases and returns; dc0: DC offset bias; dc0*: offset bias return; q0: experimental output. Similar to Fig. 2.14, although bias-Ts have been replaced with 3 dB attenuators in some cases. Low noise bandpass filters have a cutoff frequency of $f_C$ = 1 kHz. The low noise amplifier is a Miteq LNA with an operation range of 0.5 to 18 GHz and a 2.5 dB noise floor.

Table 5.1 shows the operating conditions I found for the dc bias currents of the circuits I tested on the chip. The nominal operating point was reasonably close to the design values, indicating good correspondence between designed and as-built parameters. To ensure margins on the bias currents were adequate, I checked that each was close to midway between the maximum and minimum limit currents while the circuit was at its optimal operating point.

Table 5.1: Operational bias conditions for N22TE. The nominal values are close to the design values. The minimum and maximum ranges show that the circuit is operating without major fabrication issues. The a0 input voltage range is approximate. The DC Data Offset value is dependent on the input data pattern; the value here is for a pseudo-random data pattern.

Input                        Nominal    Minimum     Maximum      Design
Amplifier DC Source          175 µA     100 µA      226 µA       168 µA
DC Data Offset               80 µA      27 µA       187 µA       (n/a)
dc0                          2 mA       0 mA        4 mA         2 mA
a0 (with 40 dB attenuator)   1 Vp-p     0.5 Vp-p    1.75 Vp-p    1 Vp-p

Each of the three experiments allowed me to test the timing behavior as a function of phase delay between clock and data input. To take data, I initially reduced the clock phase until failure occurred, even with the clock amplitude at the maximum value. Then I increased the clock phase in small increments. At each increment, I measured the high and low values of clock amplitude (in dBm). Failure on the high end was clear to observe: all digital output read "one." At low amplitudes, the circuits start to produce random errors that result in gradually decreasing voltage levels in sampling measurements on the oscilloscope and flickering measurements for non-sampling measurements. The low-end failure point was defined as the clock amplitude at which a further increase of 0.1 dB in the clock power produced no change in the output signal. Under this condition, the logical operation of the circuit remains the same with a further increase in clock power, i.e., the circuit is operating correctly. When the change in clock power produces a change in the output signal, some part of the circuit is not operating fully correctly.

5.4 Data and Analysis

For the time difference measurements, I define the delay as the difference in output time $\Delta t$ between the midpoints of two output pulses as measured by the user-adjustable markers on the oscilloscope. I also checked the output for feedthrough or line-to-line coupling by turning off the output bias current. To determine coupling between output and clock lines I turned the bias on and off to observe the change in output voltages.

5.4.1 Experiment 1 Results: Two-output Race Circuit

Figure 5.8 shows the main experimental results from Experiment 1 on the two-output race circuit. The measured timing differences (filled circles) are plotted as a function of the input phase offset. The error bars correspond to the resolution of the oscilloscope. The first thing to notice is that the circuit works over almost the entire clock cycle.
This is a surprise because the analytical model given by (3.8) and the VHDL model define operating regions that start from the beginning of the clock window, which is one-third of the clock period or a phase. One implication of this plot is that positive SFQ pulses arriving during negative clock bias wait at the JTL until the bias current reaches the critical value for propagation. In this circuit the delay of the short JTL is approximately constant, whereas the delay of the long JTL is far more dependent on the input clock phase. This suggests that the slight variations in the phase delay below 2.5 rad are due to the long line. The small maximum near 2.5 rad corresponds to $\phi = 0$ input phase relative to the clock. For larger offsets (beyond 2.5 rad) the behavior follows the predictions of (3.8) (see red curves shown in Fig. 5.4). The timing was fairly even until the largest phase offsets, where the timing difference quickly increased before failing.

[Figure 5.8: timing difference $\Delta t$ (ps, 20 ps scale bar) versus input phase $\phi_{in}$ (rad, 0 to 6). Data sets: 0.5 GHz, 1.0 GHz, 1.5 GHz.]

Figure 5.8: Two-output race circuit measured data. Measured timing difference $\Delta t$ between outputs from short and long JTL paths for several different frequencies plotted as a function of input phase $\phi_{in}$. $\phi_{in}$ is relative to the first recorded data point. The blue (dashed) curve is a fit of (5.4b) to all data points. The red (solid) curve is a fit of (5.4b) to the points at large phase offset where (3.8) is valid. The range of phase delays that result in SFQ propagation was considerably larger than predicted by (5.3c). The sharp upturn of the phase delay close to failure was a prediction of (5.3c) that was verified here. Figures are offset for clarity. The delay includes a contribution from the delay of the coaxial lines in the probe, making only a relative measurement of the timing difference possible.

For the first experiment, I used the oscilloscope to find the output times of two pulses. The delay of the equipment (probe, wires, etc.) was unknown but assumed to be constant. (The delays need not be the same for both pulses because any constant term will produce an offset but otherwise not affect the results.) The expected relationship for the timing output of a pulse is given by (5.1). Since I was looking for the timing difference, the results were expected to be of the form

$$\Delta\delta(\phi) = \delta_{long}(\phi) - \delta_{short}(\phi) = \arccos(\cos\phi - N_{long}\alpha) - \arccos(\cos\phi - N_{short}\alpha). \tag{5.4a}$$

In practice, I tried fitting to a function of the form

$$\Delta\delta(\phi) = \beta_4\beta_1\left[\arccos\left(\cos(\phi + \beta_2) - N_{long}\alpha/\beta_1\right) - \arccos\left(\cos(\phi + \beta_2) - N_{short}\alpha/\beta_1\right)\right] + \beta_3, \tag{5.4b}$$

where $\alpha = 3\omega t_0/A$ (as defined in Chapter 3), $N_{short}$ and $N_{long}$ are the number of junctions in the short and long paths, respectively, and $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are fitting parameters. Ideally I expected $\beta_1 = \beta_4 = 1$; $\beta_2$ and $\beta_3$ merely allow for the unknown offset due to the unknown length of the electrical lines. The parameters of particular interest are $\beta_1$ and $\beta_4$, which are related to the curvature and stretching of the curve. These factors are similar to those found in Chapter 3. Although I will shortly show two ways in which the clock amplitude could be determined, for this experiment the clock amplitude was still an unknown, but it was not critical to the measurements at hand. Likewise, as I discovered in Chapter 3, the effective value of $\alpha$ could change in real circuits due to leakage currents from adjacent JTLs and gates. The parameters $\beta_2$ and $\beta_3$ have no restrictions and account for the systematic uncertainty in the absolute phase and delay, respectively.

I plotted two different fits in Fig. 5.8 corresponding to (5.4b) with different fitting ranges. The red (solid) curve is analogous to the timing difference curve shown in Fig. 5.4, which starts at $\phi = 0$. The blue (dashed) curve fits the same function to the entire data range. As one can see in Fig. 5.8 there is a very good fit to the experimental data. The fitting parameters are given in Table 5.2. Both fits are similar to each other and closely follow the measured data. In both cases the abrupt increase in timing difference at large phase delays was captured by the function. This qualitatively confirms that the timing behavior is following (5.4b) but not necessarily (5.4a). The values of the parameters $\beta_1$ and $\beta_4$ for the red fit are closer to unity than those for the blue fit. This is not unexpected as the model was derived only for $\phi > 0$, whereas the blue curve is fit to data for which $\phi < 0$ as well.

Table 5.2: Fitting parameters of two-output circuit data. $\beta_1$ and $\beta_4$ were expected to be close to unity, but are still within the ranges of analogous parameters found by fitting simulations in Chapter 3. Compare to Table B.1. $\beta_2$ and $\beta_3$ merely account for systematic constant phase offset uncertainty in the experimental setup and carry no particular significance.

                 0.5 GHz     1.0 GHz     1.5 GHz
Red Fit    β1    1.0635      0.6677      1.5924
           β2    -2.5064     -2.4101     -2.9371
           β3    37.4684     67.2539     49.6685
           β4    0.5821      0.8502      0.2378
Blue Fit   β1    5.2273      0.6677      8.4889
           β2    -2.1259     -2.4101     -2.5474
           β3    48.1487     67.2539     49.6719
           β4    5.3251      0.8502      4.078

[Figure 5.9: Current ($I_b/I_c$) versus Phase $\phi$ (rad). Data sets: 1.0 GHz, 1.5 GHz, 2.0 GHz, 2.5 GHz.]

Figure 5.9: And-output race circuit data. This figure shows the data from Experiment 2 and predictions for the boundary based on (5.3c). Four frequencies between 1.0 GHz and 2.5 GHz are shown. Data between 1.0 GHz and 2.0 GHz follows the correct qualitative behavior. The data matches very closely for the phase limits and somewhat closely for the current limits.

5.4.2 Experiment 2 Results: And-output Race Circuit

Figure 5.9 shows the main results of the And-output race circuit experiment. The purpose of this experiment was to measure the operational margins of an RQL circuit as a function of clock frequency, amplitude, and input phase. In Fig. 5.9 I have plotted input phase $\phi$ versus the lower bound of bias current $I_b$.
The points are measured data, and the curves are predictions based on (3.12) (see pg. 172). This plot can be compared to Fig. 5.5. The data in Fig. 5.9 is shown for four frequencies (1.0 GHz, 1.5 GHz, 2.0 GHz, 2.5 GHz). The limiting factor on measurements was 182 the step size of the power of the clock signal generator. With a step size of 0.1 dBm, the error on bias current depends on the bias current A = Ib/Ic and goes as ?A = A log 10/200. The main observation is that for the 1.0, 1.5, and 2.0 GHz data, the phase limit is found near the expected value (as the solid curves) from (5.3b). However, the measured lower current bound occurs at a higher value than expected. As a result, there is less dependence of the current on clock phase at the lower measured limits of clock amplitude. Apart from from this, the observed behavior qualitatively matches the predictions shown in Fig. 5.5. The cutoff input phase for the three lower frequencies 1.0, 1.5, and 2.0 GHz matches data to within 5% when one compares the value of the predictions at Ib/Ic = 1 and the data point with the highest phase ?. However, the cutoff lower clock amplitude is notably above the values predicted. The data for these three frequencies shows a specific cutoff for the lower clock amplitude: about 0.25 for 1 GHz instead of 0.175, about 0.35 for 1.5 GHz instead of 0.25, and about 0.5 for 2.0 GHz instead of 0.35. Instead of a gradual decrease in minimum current, it seems a different effect may take over the lower clock amplitude limit. In the model, at the lowest clock bias values the junctions just barely manage to switch in time. The model considers only single junctions in isolation. In real circuits, junctions are coupled to each other through inductors. In particular, at phase boundaries, junctions on one side of the phase boundary will influence junc- tions on the other side (this can be seen in Fig. 1.12). 
Although the distribution of current is set by the choice of inductance values in JTL units on phase boundaries, this additional coupling is still present, adding resistance to the switching of the last junction on a phase. The result is that additional "torque" needs to be applied to the junction in order for it to switch. However, for later input phases, the bias current on the next phase will already be higher, making it easier for junctions on both sides of the phase boundary to switch. Hence, the effect goes away after a certain input phase.

For this experiment, I had to determine the bias $A = I_b/I_c$ from the measured clock output. The mean power in dBm for a given measured peak-to-peak voltage $V_{p-p}$ can be found from

$$P_{\mathrm{dBm}} = 10 \log_{10}\left( \frac{V_{p-p}^2}{8 \cdot 50\,\Omega} \cdot \frac{1000}{\mathrm{W}} \right), \qquad (5.5)$$

where the factor of 8 is necessary to convert the peak-to-peak voltage into a root-mean-squared value, 50 Ω is the internal impedance of the oscilloscope, and the factor 1000/W defines the dBm unit. The test setup (see Fig. 5.7) was symmetric between input and output with the exception of the power splitter on the clock output (which had a loss of approximately 6 dB) and any attenuators placed on the probe. If we account for all power attenuators on the clock lines with a factor $P_{attenuator}$, the power delivered to the circuit on-chip $P_{chip}$ is related to the input power $P_{in}$ and the output power $P_{out}$ at the oscilloscope by

$$P_{chip} = \frac{1}{2}\left(P_{in} - P_{attenuator} + P_{out}\right), \qquad (5.6)$$

where all values are in dBm. (The factor of 1/2 is due to the square root in the geometric mean when going from linear to log scale.) We can then express the current $I_b$ induced in the bias inductor as

$$I_b = \frac{M}{L}\sqrt{\frac{2 P_{chip}}{Z_{clock}}} = \frac{M}{L}\sqrt{\frac{2}{1000\, Z_{clock}}\, 10^{(P_{in} - P_{attenuator} + P_{out})/20}}, \qquad (5.7)$$

where M is the mutual inductance of the RQL clock-line transformer, L is the bias inductor, and $Z_{clock}$ is the clock line impedance (by design about 32 Ω). Equation (5.7) can be checked at a known reference point where $I_b = I_c$ and the actual attenuation can be found.

Table 5.3: Summary of measurements of $P_{in}$ and $V_{p-p}$ in N22TE used for calibration of the junction bias current $I_b$. $P_{attenuator}$ is the amount of attenuation found between the clock signal generator and the chip that is not found between the chip and the oscilloscope. $P_{out}$ is calculated from $V_{p-p}$ by (5.5). $P_{chip}$ is calculated by (5.6), and given in both dBm and mW. $I_b$ (total) is calculated using (5.7). $I_b$ (JJ1) and $I_b$ (JJ2) are calculated as fractions of $I_b$ (total) based on the distribution of supercurrents through two parallel inductances (see Fig. 2.2). The attenuator shown in Fig. 5.7 was measured to have an attenuation of 2.83 dB. Calibration of the circuit to a known current of $I_b = I_c$ gives an additional factor of 1.75 (or 2.43 dB) to current due to attenuation.

    f              1.0       1.5       2.0       2.5       2.5      GHz
    Pin            3.4       5.1       5.3       5.5       5.7      dBm
    Vp-p           81.1      84.9      82.2      74.6      76.2     mV
    Pattenuator    5.26      5.26      5.26      5.26      5.26     dBm
    Pout           -17.840   -17.442   -17.723   -18.566   -18.382  dBm
    Pchip          -9.850    -8.801    -8.842    -9.163    -8.971   dBm
    Pchip          0.104     0.132     0.131     0.121     0.127    mW
    Ib (total)     0.221     0.250     0.248     0.239     0.245    mA
    Ib (JJ1)       91.072    102.761   102.285   98.570    100.775  μA
    Ib (JJ2)       130.103   146.802   146.122   140.815   143.965  μA

Equation (5.7) was fundamental to the second experiment because I needed to find the input phase at known clock amplitudes. The highest working clock amplitude was very consistent and represented a failure mechanism independent of the input phase. This power set the baseline for the critical current through the junction. Using this as a reference point, I then found the current through the junction from

$$I_b = I_c \left( 10^{\Delta P/10} \right)^{1/2}, \qquad (5.8)$$

with $\Delta P = P_{high} - P_{low}$, the difference in measured values for the high and low clock power values, respectively.
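As a numerical check of (5.5) and (5.6), the following minimal Python sketch recomputes the 1.0 GHz column of Table 5.3 from the measured $V_{p-p}$ and $P_{in}$. The function names are illustrative choices, not part of the original analysis, and (5.7) is left out because the ratio M/L is not quoted in this section.

```python
import math

def vpp_to_dbm(v_pp, r_ohm=50.0):
    """Eq. (5.5): mean power in dBm of a sinusoid with peak-to-peak voltage v_pp (volts)."""
    p_watt = v_pp ** 2 / (8.0 * r_ohm)      # V_rms^2 / R with V_rms = V_pp / (2*sqrt(2))
    return 10.0 * math.log10(p_watt * 1000.0)

def chip_power_dbm(p_in, p_attenuator, p_out):
    """Eq. (5.6): on-chip power as the average (in dBm) of net input and output power."""
    return 0.5 * (p_in - p_attenuator + p_out)

def dbm_to_mw(p_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

# Inputs taken from the 1.0 GHz column of Table 5.3.
p_out = vpp_to_dbm(81.1e-3)                 # table lists -17.840 dBm
p_chip = chip_power_dbm(3.4, 5.26, p_out)   # table lists -9.850 dBm
p_chip_mw = dbm_to_mw(p_chip)               # table lists 0.104 mW
print(f"Pout = {p_out:.3f} dBm, Pchip = {p_chip:.3f} dBm = {p_chip_mw:.3f} mW")
```

Swapping in the other columns of Table 5.3 (for example 84.9 mV and 5.1 dBm at 1.5 GHz) gives the corresponding table entries in the same way.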
Using (5.8) avoided the effects of uncertainties in measurements of $V_{p-p}$, as well as the effect of losses in the probe, losses in devices, and the uncertainties in values of mutual and linear inductances. Furthermore, (5.8) and (5.7) give independent estimates for the critical current, and these values matched within the uncertainty of the attenuation of the clock splitter, which is about 12%.

Table 5.3 shows results from this analysis of the data. The main result is that calibration suggests a loss of 2.43 dB in power before reaching the chip. The measured attenuation of the 3 dB attenuator was actually 2.83 dB, which is within the tolerances of the device. The average currents $I_b$ (JJ1) and $I_b$ (JJ2) should be as close to their design values of 100 μA and 141 μA as possible. This occurs for $P_{attenuator} = 5.26$ dB, leaving a remaining 2.43 dB of unaccounted-for attenuation. This extra attenuation is likely found in the two variable delays.

In conclusion, the results from the and-output race circuit experiment qualitatively matched the predictions of the analytic timing model. There were clear cutoff values for both the clock current amplitude and the input phase, and the margins shrank as the frequency was increased. The cutoff values approximately matched the predictions from (5.1). However, whereas the predictions were for gradual transitions from one limit to the other, the measured data had a more abrupt and earlier cutoff. Also, at a clock frequency of 2.5 GHz, the data did not fit the predictions. Nevertheless, there was still an increasing minimum clock amplitude with increasing frequency.

[Figure 5.10 plot: clock power margins (dB) versus input phase (rad) for 2.5 GHz, 2.0 GHz, 1.5 GHz, and 1.0 GHz.]

Figure 5.10: Multi-phase shift register clock amplitude margins plotted versus input phase for four frequencies. The data has been centered at zero phase. The sharp drop in margins at either end confirmed the expected behavior of a long, deep pipeline shift register. For input phases of magnitude less than 1.5 rad the clock margins generally show less than 0.2 dBm variation. Maximum clock amplitude was 4.5 dBm (measured) for all frequencies.

5.4.3 Experiment 3 Results: Long, Deep Pipeline Shift Register

Figure 5.10 shows my main results from tests of the long, deep pipeline multi-phase shift register. In this figure I plot the clock margins versus the input phase. The power margins show a small variation over a range of ±1.5 radians. The rapid decline in clock margins at either extreme indicates rapid failure of the shift register. These wide margins confirm the main results of Experiment 1, in which I found the pulse propagation window extended well beyond that assumed by the model. I note also that as the clock frequency increased, the nominal clock power margins decreased.

The primary goal of this experiment was to measure the junction switching time parameter $t_0$. At failure the phase in the model reaches $\pi/2$, and using its definition we can express the baseline switching time as

$$t_0 = \frac{A}{3\sqrt{2}\,\pi f N}, \qquad (5.9)$$

where N = 40 is the number of junctions in each clock phase of the long, deep pipeline shift register in N22TE and A is the experimentally determined clock current amplitude in units of $I_c$. Table 5.4 shows the results of my analysis of the data, using the nominal clock power measurements shown in Fig. 5.10 and taking into account the calibration from Experiment 2. The average value for $t_0$ is 0.44 ps, with 20% variation between measurements at different frequencies. This compares well with the value of 0.44 ps expected from the $I_cR_N$ product of 0.75 mV and application of (3.7). Equivalently, the average $I_cR_N$ value implied by the measured $t_0$ values is 0.77 mV, close to the 0.75 mV assumed in design.

A secondary goal in testing the multi-phase shift register was to observe if there was any shift in the output timing as a function of input phase. No such shift was observed.
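The switching-time estimate of (5.9) can be reproduced from the $I_b/I_c$ values listed in Table 5.4. The sketch below is illustrative only. In particular, the conversion from $t_0$ to $I_cR_N$ assumes that (3.7) has the form $t_0 = \Phi_0/(2\pi I_cR_N)$, which is not stated in this section; that assumed form reproduces the tabulated $I_cR_N$ values to within about 1%.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb

def switching_time(a, f_hz, n_junctions=40):
    """Eq. (5.9): baseline switching time t0 in seconds.

    a is the clock amplitude in units of Ic, f_hz the clock frequency,
    and n_junctions the number of junctions per clock phase.
    """
    return a / (3.0 * math.sqrt(2.0) * math.pi * f_hz * n_junctions)

def icrn_from_t0(t0):
    """ASSUMED form of (3.7): IcRN = Phi0 / (2*pi*t0), in volts."""
    return PHI0 / (2.0 * math.pi * t0)

# (frequency in Hz, Ib/Ic) pairs from Table 5.4.
rows = [(1.0e9, 0.307), (1.5e9, 0.352), (2.0e9, 0.378), (2.5e9, 0.521)]

t0_ps = [switching_time(a, f) * 1e12 for f, a in rows]
icrn_mv = [icrn_from_t0(t * 1e-12) * 1e3 for t in t0_ps]
mean_t0_ps = sum(t0_ps) / len(t0_ps)

for (f, a), t, v in zip(rows, t0_ps, icrn_mv):
    print(f"f = {f / 1e9:.1f} GHz  A = {a:.3f}  t0 = {t:.3f} ps  IcRN = {v:.3f} mV")
print(f"mean t0 = {mean_t0_ps:.3f} ps")
```

The computed $t_0$ values agree with the tabulated 0.576, 0.441, 0.354, and 0.391 ps to within rounding, and their mean is close to the quoted 0.44 ps.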
Only at the very ends of each range was there a noticeable change in the power margins. Thus, these results showed that long pipelines with "slow" clocks could adequately propagate SFQ pulses. Similarly, from the 2.5 GHz maximum operating frequency measured for 40 junctions per phase, I project that an RQL circuit with 2 junctions per phase would operate at 50 GHz.

Table 5.4: Analysis of the long, deep shift register from N22TE. Estimated parameters for results of the junction switching time of the long, deep pipeline shift register experiment. f is the clock frequency; $P_{in}$ is the displayed value of power on the clock generator; $P_{out}$ is the measured output from the chip; $P_{chip}$ is the calculated power delivered to the chip for input phases of magnitude less than 1.5 rad; $I_b/I_c$ is the minimum bias current calculated from $P_{chip}$ using (5.8); $t_0$ is calculated using (5.9); and $I_cR_N$ is calculated using (3.7). The average value of $t_0$ is 0.440 ps. The average value of $I_cR_N$ is 0.770 mV. Both values have an uncertainty of about 20%. The value of $I_cR_N$ used for design was assumed to be 0.75 mV.

    f        1.0     1.5     2.0     2.5     GHz
    Pin      4.500   4.500   4.500   4.500   dBm
    Pout     1.800   2.400   2.700   4.100   dBm
    Pchip    2.700   2.100   1.800   0.400   dB
    Ib/Ic    0.307   0.352   0.378   0.521
    t0       0.576   0.441   0.354   0.391   ps
    IcRN     0.570   0.744   0.926   0.839   mV

5.5 Conclusions

In this Chapter I described three experiments to test different aspects of the analytic timing model. With Experiment 1, I examined the applicability of the model to actual junction switching events. With Experiment 2, I showed that the experimental limits of RQL circuits mainly follow the pattern predicted by the analytic model. A significant difference was that the data showed an earlier clock amplitude cutoff, implying that margins decrease in fabricated circuits. In Chapter 6, I examine further the issue of suppressed operating margins. Finally, in Experiment 3 I showed that the long, deep pipeline behavior of RQL circuits matches the timing behavior predicted by the analytic model fairly well. I note that an alternate analysis in Appendix D shows that a suppressed $I_cR_N$, or an increased ?c, explains the upper experimental limit of 2.5 GHz instead of the designed 5 GHz.

Chapter 6

Carry-Look Ahead Adder Experiment

6.1 Introduction

In Chapter 5, I described results from three experiments which showed the applicability of the timing model to simple test RQL circuits. What remains is to design and test more complex RQL circuits that could be used for practical applications. In this Chapter, I first describe the design of a digital adder. This circuit is powered by a Wilkinson power splitter, contains non-local interconnects, and has a fan-out of four in the chosen architecture. I then describe experiments I performed to verify operation of the device. Figure 6.1 shows a microphotograph of the completed circuit, which I designate Norwalk 21 Carry Look Ahead, or N21CLA.

6.2 Circuit Design

A Carry-Look Ahead (CLA) adder adds numbers with a minimum latency. I chose to build and test a CLA for several reasons. When built in RQL, an 8-bit CLA contains 815 junctions with non-local interconnects between physically separated logic gates. It is therefore complex enough to provide a demanding operational test for RQL. Also, the effective fanout was four, as in a CMOS design. This was achieved through the use of amplification junctions. These features would be essential in any real-world application.

Figure 6.1: Photo of N21CLA. Four power splitters (green boxes) were in the four corners of the chip. Data was input from the bottom and moved into a shift register (black box) on the middle-right. The CLA (red box) pipeline moved data from right to left. The density of circuit elements decreased towards the right. Eight high-fidelity output amplifiers (blue box) on the left amplified the output signal to measurable levels with a very low bit-error rate. Output was on the left side. (Chips were mirror imaged from design by the fabrication process.)

Many different CLA architectures exist in CMOS design. I chose to use the Kogge-Stone architecture [44]. Implementing this architecture in RQL required me to use the VHDL models of Chapter 3. Correct operation of the CLA depends on getting all the details right. For example, correct synchronization of clock signals is necessary, and the CLA thus provided a secondary test of the Wilkinson power splitter.

6.2.1 Propagate/Generate Logic

As noted above, I chose to build a Carry-Look Ahead (CLA) adder with a Kogge-Stone architecture for our digital adder demonstration. The CLA is based on the AndOr operation, making it particularly well suited to a demonstration of RQL. The CLA reduces latency (and size) by pre-calculating the propagate and generate signals of each bit being added. The propagate signal P is the OR or XOR operation between the bits, while the generate signal G is the AND operation, as follows:

$$P_i = A_i \oplus B_i, \qquad (6.1)$$

$$G_i = A_i \cdot B_i. \qquad (6.2)$$

Both these operations can be performed in one quarter of a clock cycle by the RQL AndOr gate, including use of JTLs to increase the fanout to four. The carry bit $C_{i+1}$ from any bit-level i is given by

$$C_{i+1} = G_i + (P_i \cdot C_i), \qquad (6.3)$$

where $A_i$ and $B_i$ are the i-th bits of the two binary numbers being added.

[Figure 6.2 schematic: (a) delay unit with inputs A, B and outputs P = A, G = B; (b) PG generation unit with inputs A, B and outputs P = A ⊕ B, G = A · B; (c) Carry-Look Ahead gate with inputs $A_i$, $A_j$, $B_i$, $B_j$ and outputs P = $A_i \cdot A_j$, G = ($A_i \cdot B_j$) + $B_i$.]

Figure 6.2: Carry-Look Ahead elements. The Kogge-Stone Adder architecture uses these three logical elements. Logical operations are shown below symbols. Note that each unit contains both a pipe for propagate (P) and generate (G) signals. (a) Simple delay unit. Corresponds to JTL units in RQL. (b) PG generation unit. This unit is the first step in any CLA and it is used only once at the beginning of each pipeline, changing separate 8-bit signals into PG signals. (c) Carry-Look Ahead gate. Takes four inputs, two each for P- and G-signals, and outputs a P- and G-signal.

Figure 6.2 shows the elements of the CLA adder. Figure 6.2(a) shows a delay unit, which is simply a JTL. Figure 6.2(b) shows the PG calculation unit that is necessary at the beginning of every pipeline and corresponds to the AndOr gate. Finally, the actual summing is done by the unit in Fig. 6.2(c), which takes two inputs each for the propagate and generate signals and generates the output according to (6.1) and (6.3).

In the Carry-Look Ahead scheme the carry bit is pre-calculated ahead of the actual sum. Whether or not a given carry will propagate or generate is pre-computed instead of waiting for the carry signal from lesser bits, as in the ripple-carry adder architecture. By recursively substituting (6.3) into itself for $i \to i+1$, one obtains the following results for the carries going into the first 3 bits. For example, for a four-bit adder,

$$C_1 = G_0 + (P_0 \cdot C_0), \qquad (6.4)$$

$$C_2 = G_1 + (P_1 \cdot C_1) = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0, \qquad (6.5)$$

$$C_3 = G_2 + (P_2 \cdot C_2) = G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0. \qquad (6.6)$$

By definition $C_0 = 0$, but (6.4) through (6.6) hold even if we allow $C_0 \neq 0$. The importance of (6.4) through (6.6) is that they show that the final carry bit $C_3$ can be calculated without waiting for the calculation of the previous carry bit, as in the ripple-carry adder. In the CLA scheme addition can be broken into groups of n bits. The group-carry and group-propagate bits can be passed to the next block of n bits, in which case $C_0 \neq 0$. This branching structure gives the CLA a latency of $O(\log n)$ [44], where n is the number of bits added.
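The equivalence between the carry recursion (6.3) and the expanded form (6.6) can be checked exhaustively for three-bit operands. The following Python sketch is only an illustration with hypothetical function names. It also shows that using OR instead of XOR for the propagate signal gives the same carries, which is why either definition of P can be used for the carry logic.

```python
from itertools import product

def carries(a_bits, b_bits, c0=0, use_or=False):
    """Carries [C0, C1, ..., Cn] from the recursion (6.3); bit lists are LSB first."""
    c = [c0]
    for a, b in zip(a_bits, b_bits):
        p = (a | b) if use_or else (a ^ b)   # propagate: OR or XOR form
        g = a & b                            # generate
        c.append(g | (p & c[-1]))            # C_{i+1} = G_i + P_i * C_i
    return c

def c3_expanded(p, g, c0):
    """Expanded carry of (6.6), written without reference to C1 or C2."""
    return (g[2]
            | (p[2] & g[1])
            | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & c0))

def check_all():
    """Compare recursion, expansion, and integer arithmetic for every input."""
    for bits in product((0, 1), repeat=7):
        a, b, c0 = list(bits[0:3]), list(bits[3:6]), bits[6]
        p = [x ^ y for x, y in zip(a, b)]
        g = [x & y for x, y in zip(a, b)]
        a_val = sum(bit << i for i, bit in enumerate(a))
        b_val = sum(bit << i for i, bit in enumerate(b))
        true_carry = ((a_val + b_val + c0) >> 3) & 1
        assert carries(a, b, c0)[3] == true_carry
        assert carries(a, b, c0, use_or=True)[3] == true_carry
        assert c3_expanded(p, g, c0) == true_carry
    return True

print(check_all())
```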
6.2.2 Kogge-Stone Architecture

The specific architecture of the CLA can be optimized for latency, size (transistor count in CMOS or junction count in RQL), or congestion [38], which is the number of crossings of data pathways [45]. The Kogge-Stone Architecture [44] has more crossings than other designs and requires more chip area, but has the fastest performance [45].

Figure 6.3 shows the generic architecture of an 8-bit Kogge-Stone Adder. The top row of operations is the pre-calculation of propagate and generate bits. The following three rows contain calculation units and delay units, each with identical latency. As the depth increases the number of crossings increases. Between the first and second rows there is one crossing bit path. Between the second and third rows there are three crossings, and between the third and fourth rows there are seven crossings. For two four-bit numbers, only the left-most four columns are needed, and the first three rows. Doubling the number of input bits requires adding only one additional row of logic gates, while doubling the number of pipes. In this design the branching structure leading to the $O(\log n)$ latency can be seen: the number of additional rows required per added input bit decreases as the number of input bits increases.

[Figure 6.3 schematic: inputs A7, B7 through A0, B0 across the top, followed by one row of PG generation units and three rows of Carry-Look Ahead gates and delay units.]

Figure 6.3: Generic Kogge-Stone CLA Architecture. Data flows from top to bottom. Most significant bit on left, least significant bit on right. Red lines indicate P-signal pipelines. Blue lines indicate G-signal pipelines. There are eight pipes and four stages. Latency of the Kogge-Stone architecture CLA is $O(\log n)$, where n is the number of bits added, here n = 8. The longest interconnect can be seen between steps 3 and 4, where many signal lines jump by 4 pipelines.
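The row structure described above can be illustrated with a short behavioral model. The Python sketch below is a generic software model of a Kogge-Stone prefix network, not the RQL netlist or the VHDL used for the actual design. It uses the combining rule of Fig. 6.2(c) for each row, passes unchanged values through where the figure has delay units, and counts the number of prefix rows after the initial PG row.

```python
def kogge_stone_add(a, b, n=8):
    """Add two n-bit integers with a Kogge-Stone parallel-prefix carry network.

    Returns (sum modulo 2**n, number of prefix rows after the PG row).
    """
    # PG generation row: one propagate and one generate bit per pipeline.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]

    group_p, group_g = p[:], g[:]
    rows = 0
    distance = 1
    while distance < n:
        new_p, new_g = group_p[:], group_g[:]   # positions below `distance` act as delay units
        for i in range(distance, n):
            # CLA gate: G = G_i + P_i * G_j, P = P_i * P_j with j = i - distance
            new_g[i] = group_g[i] | (group_p[i] & group_g[i - distance])
            new_p[i] = group_p[i] & group_p[i - distance]
        group_p, group_g = new_p, new_g
        distance *= 2
        rows += 1

    # The carry into bit i is the group generate of bits 0..i-1 (with C0 = 0).
    carry_in = [0] + group_g[:n - 1]
    total = 0
    for i in range(n):
        total |= (p[i] ^ carry_in[i]) << i
    return total, rows

print(kogge_stone_add(223, 115))
```

For n = 8 the loop runs with distances 1, 2, and 4, giving three prefix rows in addition to the PG row, consistent with the four stages of Fig. 6.3. Doubling n to 16 adds only one more row.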
Other than the crossings, columns 0 to 3 and 4 to 7 are nearly identical, with only a delay being swapped for a regular adding unit.

I made a number of refinements to the generic Kogge-Stone architecture in our circuit. I designed and synthesized an adder using the RQL gate library of Chapter 3, optimized for latency. The fan-in and fan-out of each stage of the design can be changed to increase power consumption but decrease latency. I also had a choice of the amount of congestion due to crossings. My design included non-local interconnects. Because the design did not include active transmission lines (a device which can transmit an SFQ pulse between junctions where the inductance between junctions is $L > \Phi_0/I_c$), some of the longer interconnects necessitated multiple delay units (JTLs) between the logic gates, which increased the latency. Additionally, RQL circuits have specific fan-in and fan-out requirements for each gate (see Section 2.2 on pg. 42). This leads to an optimization problem between size, congestion, and power. Remarkably, the design was done by use of CMOS design tools with the necessary VHDL additions.

The overall design of the CLA is shown in Fig. 6.4. Five sequential clock phases are shown in different colors and the connections are shown between logic elements. The branching structure of the Carry-Look Ahead design is shown by the decreasing number of data paths and the greater distances crossed by the data paths on the right in the figure compared to the left. The single colored boxes are logic elements while the long, graduated-color boxes represent delays. In this design, the critical path contains a non-local interconnect between the third and fourth phases, the path connecting bit pipeline 3 to pipeline 7. This interconnect spans not only four bits but crosses nine other data paths. In this design, this connection is made by three JTL elements (six junctions) and is the longest delay in the circuit.
We expect the timing constraints to be tightest on this particular pipeline. To alleviate this issue, we added a fifth phase to the design, which increased latency but widened the timing margins.

Figure 6.4: Final Carry-Look Ahead Adder design. This figure shows the synthesized layout of the CLA in schematic form, with sixteen bit pipelines and five stages. Data flows left to right, with the most-significant bit on top and least significant bit on bottom. The five clock phases are shown in color: 1 in blue, 2 in green, 3 in orange, 4 in yellow, and 5 in pink. Solid color elements are logic elements. Graduated-color elements are delays. Lines between elements are data connections. The pipeline with greatest timing constraints can be seen between phases 3 and 4, the top of three lines. This pipeline crosses nine other data paths.

6.3 Experimental Setup

The setup of this experiment is shown in a block diagram in Fig. 6.5. This Figure is similar to Figs. 2.14 and 5.7 with only a few notable differences. There are eight outputs, labeled q0 to q7, from least-significant to most-significant bit. Instead of bias-Ts on the clock inputs and outputs, I used 3 dB attenuators in an effort to reduce reflections and standing waves on the chip. The oscilloscope has six inputs. With eight data outputs, I decided to monitor one clock output, the data return from the input, and four of the eight CLA outputs. The other clock output was terminated with a matched load, as were all CLA outputs not fed into the oscilloscope. Because of the size of the bias-Ts necessary to apply DC current to the output amplifiers, no two adjacent outputs could be observed at once. The Figure shows an example where q1, q3, q5, and q7 are monitored. Observing q0, q2, q4, and q6 was simply a matter of unplugging the bias-Ts from one port and plugging them into the others. No additional cool-down was necessary, though the DC current sources were turned off during the switch.
The operating point of the N21CLA circuit was the same as that found for the N22TE circuit, shown in Table 5.1.

[Figure 6.5 block diagram: pattern generator, two clock generators, hardware delays, attenuators (40 dB and 3 dB), low-pass filter, DC offset sources, bias-Ts, low noise amplifiers, oscilloscope, and the circuit at 4.2 K with ports a0, a0*, c0, c0*, c1, c1*, dc0, dc0*, and q0 to q7.]

Figure 6.5: Block diagram of experimental setup for N21CLA. The same elements present in the experiments from Chapters 2 and 5 (see Fig. 5.7) are again found here. a0: data input; a0*: data return; c0, c0*, c1, c1*: clock phases and returns; dc0: DC offset bias; dc0*: offset bias return; q0 to q7: experimental output. Due to space limitations, only four outputs could be observed at once, and only on non-adjacent ports. Ports not used were terminated with matched impedances (not shown). Low noise bandpass filters have a cutoff frequency of $f_C$ = 1 kHz. Low noise amplifier is a Miteq LNA with an operation range of 0.518 GHz and a 2.5 dB noise floor.

There were four major circuit components on the CLA chip. The Wilkinson power splitters have been discussed previously in Chapter 4. A stack of 12 SQUIDs provided amplification of each output signal from the CLA core [46]. Figure 6.6 shows a simplified version of the shift register that was used to send input bits to the CLA. A 16-bit serial shift register provided input to the CLA. With each clock cycle the data progressed one bit in the shift register and was also input to the CLA. Bits A0 to A7 received bits 0 to 7 from the shift register. Bits B0 to B7 received bits 15 to 8 from the shift register. Note the reverse ordering of the shift register bits for input B. In Fig. 6.6 I use blue arrows for the A input and red arrows for the B input.
Any 16-bit pattern could be sent in serial form down the input to the shift register and then applied to the CLA. I tested two specific patterns (see Table 6.1). A full test of the CLA would measure the error rate for many different input patterns. Overall, the clocks and single data input were arranged in the same fashion described for the shift register in Chapter 5. (See Fig. 6.5.)

[Figure 6.6 schematic: input (a0) feeding the shift register, which feeds the CLA adder; outputs Bit 0 to Bit 7; input return (a0*).]

Figure 6.6: Shift register input for CLA. The sixteen input bits to the CLA are delivered by an on-chip shift register. Blue arrows represent bits of A, red arrows represent bits of B. The input to the register is on the bottom right and the output on the bottom left. This 16-bit register passes the first eight bits to the CLA in reverse order, with the first bit being the least significant bit, and passes the last eight bits to the CLA in reverse order, with the 8th bit being most significant.

6.4 Experimental Results

I first established the optimum operating point for clock power, relative clock phase, and data input phase using a procedure that was similar to the procedure in Chapter 5 for Experiments 1 to 3. I then tested the digital output of the circuit. Correct operation was verified by comparing the measured output to the expected output. Additionally, I measured the power margins for the CLA gates and compared them with simulation results. Finally, I measured the power dissipation of the CLA core and compared it with the expected dissipation.

6.4.1 Logic Test

The input a0 to the shift register was supplied with two arbitrary patterns of 16 bits each, at a repetition rate of f/16, where f is the clock frequency. Table 6.1 shows the two 16-bit input sequences I used and the expected output from the CLA. The ID number in the left column is for reference.
The columns labeled q7 to q0 are the output bits, from most-significant to least-significant. The far right column lists the corresponding base-10 number associated with each 8-bit output.

Table 6.1: Expected CLA output pattern for two cyclic input sequences. This table shows the expected output of each bit of the CLA in sequence in columns. Two short, arbitrary input sequences were chosen. Rows constitute simultaneous output values. Base-10 decimal numbers are given in the right column for reference. ID number on left is for reference only.

          Input: 1111111111111100          Input: 1110110111111100
    ID    q7 q6 q5 q4 q3 q2 q1 q0   D10    q7 q6 q5 q4 q3 q2 q1 q0   D10
     0     1  1  1  1  1  0  1  1   251     1  0  1  1  0  0  1  1   179
     1     1  1  1  1  1  0  0  0   248     1  1  0  1  0  1  0  0   212
     2     1  1  1  1  0  0  1  0   242     1  1  1  0  0  0  0  0   224
     3     1  1  1  0  0  1  1  0   230     1  1  0  1  1  1  0  1   221
     4     1  1  0  0  1  1  1  0   206     1  1  0  0  1  0  0  1   201
     5     1  0  0  1  1  1  1  0   158     1  0  0  1  1  0  1  0   154
     6     0  0  1  1  1  1  1  0    62     0  0  1  1  1  0  0  1    57
     7     1  1  1  1  1  1  1  0   254     1  1  1  1  0  1  0  1   245
     8     0  0  1  1  1  1  1  0    62     0  0  1  0  1  1  0  0    44
     9     1  0  0  1  1  1  1  0   158     0  1  1  1  1  0  1  0   122
    10     1  1  0  0  1  1  1  0   206     1  0  0  0  0  1  1  0   134
    11     1  1  1  0  0  1  1  0   230     0  1  0  1  0  1  1  0    86
    12     1  1  1  1  0  0  1  0   242     0  1  0  1  0  0  1  0    82
    13     1  1  1  1  1  0  0  0   248     0  1  1  1  1  0  0  0   120
    14     1  1  1  1  1  0  1  1   251     0  1  0  1  1  0  1  1    91
    15     1  1  1  1  1  1  0  0   252     0  1  1  0  1  1  0  0   108

Each output can be found by the following procedure. Shift the input pattern to the left by the value in the ID column. The first number becomes the last with every shift. Starting from the left, the first eight numbers are binary digits B0 through B7, in that order. The next eight numbers are binary digits A7 through A0, in that order. Note that the most significant bit of both numbers appears in the middle of the pattern. Add the decimal numbers $A = \sum_{i=0}^{7} A_i 2^i$ and $B = \sum_{i=0}^{7} B_i 2^i$ to get $D = A + B$, and then take the result modulo 256 to get $D_{10} = D \bmod 256$, which is shown in the table. Then q7 through q0 are the digits in the binary representation of $D_{10}$.
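The procedure above is mechanical enough to express in a few lines of code. The following Python sketch, with a hypothetical function name, rotates the 16-bit input pattern, splits it into the B and A operands in the bit order described, and returns $D_{10}$. It is offered only as a way to regenerate the expected values of Table 6.1.

```python
def expected_output(pattern, shift):
    """Expected CLA output D10 for a 16-bit input pattern and a given ID (cyclic shift)."""
    assert len(pattern) == 16 and set(pattern) <= {"0", "1"}
    rotated = pattern[shift:] + pattern[:shift]   # cyclic left shift by `shift` bits
    b_lsb_first = rotated[:8]                     # B0 .. B7, least significant first
    a_msb_first = rotated[8:]                     # A7 .. A0, most significant first
    b = int(b_lsb_first[::-1], 2)                 # reverse so B7 is the leading digit
    a = int(a_msb_first, 2)
    return (a + b) % 256

def output_bits(value):
    """Binary digits q7 .. q0 of an 8-bit value, as a string."""
    return format(value, "08b")

for shift in range(16):
    d10 = expected_output("1110110111111100", shift)
    print(shift, output_bits(d10), d10)
```

For example, `expected_output("1110110111111100", 12)` splits the rotated pattern into B = 115 and A = 223 and returns 82, the value listed for ID 12 of the second pattern.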
This procedure will give 16 different outputs, one for each of the 16 cyclic permutations of the input pattern. For example, for ID 12 from the second input pattern, $A = 11011111_2 = 223_{10}$ and $B = 01110011_2 = 115_{10}$. As shown in the table, $(A + B) \bmod 256 = 82_{10} = 01010010_2$.

Figures 6.7(a) and 6.7(b) show the measured output of the CLA while operating at a clock speed of 6.21 GHz, as measured by a sampling digital oscilloscope, when the first and second 16-bit input patterns, respectively, were fed to the shift register. I tested and found correct operation at 4 GHz as well. Because the Wilkinson power splitter operates better closer to the design frequency of 7.5 GHz, I will describe the results for the highest operational frequency. Additionally, I found no operating points at frequencies between 4 GHz and 6.21 GHz. An on-chip 50 MHz resonance between pads prevented operation at all but a few frequencies. No smoothing or averaging was applied. The top row is the returned input signal. The eight lower rows are, from top to bottom, the outputs from the most-significant to least-significant bits.

[Figure 6.7 plots: voltage V (V) versus time t (ns) for the input return and outputs q7 to q0. (a) Input pattern 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0. (b) Input pattern 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 0.]

Figure 6.7: Measured CLA Output. Two examples of CLA outputs for different inputs. Output voltage levels as measured on oscilloscope from the CLA chip with wafer coordinates +9+B. Top row shows the voltage level of the return from the shift register on chip. Next rows show output from CLA outputs q7 through q0. Digital one corresponds to a low voltage, digital zero corresponds to a high voltage. Black curves show expected digital output. Data for q4 is shown in red because its correct operation was recorded during a separate cool down of the chip.

As in my previous experiments, outputs had inverting amplifiers which made digital "zero" appear as sustained high output voltages, and digital "one" as low-dipping voltages. The input return signal was noisier than the CLA output bits, but it is not inverted. An input "one" was 1 V with a 36 dB attenuator applied to the chip and returned without amplification. The eight output bits were amplified on-chip by the SQUID amplifiers and then by the Miteq amplifiers at room temperature. Digital "zero" and "one" appear as peaks of different height, instead of peaks and flats, due to crosstalk between data and clock lines carrying a sinusoidal signal. Also, on this chip, the clock power was eight times greater than I used in my previous experiments, since each of the eight pipelines required a fully-powered clock line, and there are eight clock lines per Wilkinson power splitter. Crosstalk was also more pronounced for outputs with pads physically closer to the clock signal lines. Nevertheless, the output was clearly discernible. I verified that the difference in length of coaxial cables in the probe and from probe to equipment was small, such that the CLA outputs were not skewed in time relative to each other. An additional elbow connector used for q4 and q5 introduced the delay seen in Fig. 6.7.

The measured signals (solid curves) shown in Fig. 6.7 correspond to the expected output (black curves) listed in Table 6.1. The pattern was cyclic and I found the beginning of the pattern (corresponding to ID 0) by inspection.

6.4.2 Operating Margins

Figure 6.8(a) shows the expected power margins based on a SPICE simulation I did of the entire CLA (see Appendix F) and Fig. 6.8(b) shows the measured power margins of the CLA. In the figure, the optimal operating point from measurements is marked for 6.21 GHz and a clock amplitude of $0.88 I_c$. This is about 8% above the 6.21 GHz design value marked in Fig. 6.8(a).

The clock power supplied to the probe was measured using an Agilent E4416A EPM-P Series Power Meter, and checked against the oscilloscope readings. The output power was measured using an oscilloscope or a Hewlett-Packard 70620B Spectrum Analyzer. The power delivered on-chip was then calculated as the geometric mean of the applied and measured power, or the average value in dBm. Figure 6.8(b) shows the margins and the optimal operating point for power at 6.21 GHz. The optimal power is indicated by the marker and is centered between the high and low power margins. The nominal operating power of -2.4 dBm was found to correspond to the design power of -2.4 dBm to within 8%. The correct logical operation of the CLA, along with verification of the designed optimum operating point, also confirms that the Wilkinson power splitter is operating correctly.

A significant discrepancy in these results is that the predicted margins of 5.5 dB at 6.21 GHz were much greater than the 1 dB margins I measured. Nevertheless, the measured optimal operation point of -2.4 dBm clock power was the same as the design value. I attribute the narrowing of the margins to microwave effects such as resonances within the power network (between pads, between splitter and combiner, etc.), not to operational problems in the digital parts of the circuit, since any errors in the digital part of the circuit would be immediately apparent in the incorrect digital output seen on the oscilloscope. By measuring the power margins over a range much greater than the 50 MHz shown in the Figure one should be able to determine if microwave resonances are present.

[Figure 6.8 plots: (a) "Simulated Margins on AndOr Gate", clock amplitude margins (dBm) versus clock frequency (GHz) from 0 to 25 GHz, with 3 dB and 1.5 dB margins and the designed operating point marked. (b) "Measured Margins on CLA", clock power margins (dBm) versus clock frequency (GHz) from 6.19 to 6.22 GHz, with the measured optimal operating point marked.]

Figure 6.8: Power margins for CLA. (a) Simulation of the CLA (both powered and unpowered gate designs) shows the upper and lower power margins of the gate. The designed operating point of the AndOr gate is marked at approximately -2.4 dBm. (b) Measured power margins on the CLA in the immediate vicinity of 6.21 GHz. The optimal operating point is the geometric mean of the power to the chip at high failure and low failure. The optimal operating point is marked here at approximately -2.4 dBm.

I tried to find operational frequencies at 6.0 GHz, 6.1 GHz, and 6.3 GHz, none of which yielded positive results. In fact, at other clock speeds I found even smaller margins for clock power. I also found that the operating margins of clock power of the CLA fluctuated rapidly as a function of frequency and correlated with neighboring measurements only within small 50 MHz steps. This behavior is not at all what I expected from the simulation shown in Fig. 6.8(a). The 50 MHz scale corresponds to an on-chip pad-to-pad resonance, which I noted in Chapter 5. This made finding other operational frequencies difficult, but with further testing I found another good point at 4.0 GHz.

6.4.3 Power Dissipation Test

All power for the CLA came from two clock lines. The first clock line powered one phase of the CLA and the output amplifiers. The second clock powered only the other phase of the CLA. I measured the power drawn from both clocks. In Chapter 2, I showed that the amount of power drawn by a single junction in RQL was very small.
For the approximately 815 junctions in the CLA, a direct measurement of the power loss on the clock lines would require measuring 1 µW or less out of 1 mW. This was not possible with available equipment. Instead, I used an alternative method of measuring power dissipation that involves measuring the clock sidebands which appear due to phase and amplitude modulation.

Figure 6.9 shows the measurement of the clock signal using a spectrum analyzer. The main peak at 6.21 GHz stands well above the noise floor. Sidebands at 6.210258 GHz and 6.209742 GHz can clearly be seen in both measurements. The sidebands are at ±258.75 kHz with respect to the main band, because they were generated by a cyclic 24,000-bit input sequence containing 12,000 successive "zeros" and either 12,000 successive random bits or "ones." The key thing to notice is that the sideband peaks in Fig. 6.9(a) are higher than the sideband peaks in Fig. 6.9(b). This is because more power was drawn from Clock 0 (by the output amplifiers) than from Clock 1.

The calculation of the power drawn by the CLA at the optimal operating point follows in three parts. First, using the results of Chapter 2, Section 2.6.2, I determine the relative effect of junction switching on phase and amplitude modulation. In Chapter 2, I argued that the expected delay in the clock signal due to junction switching was 1.4 ps, independent of clock frequency. Since the time period corresponding to 6.21 GHz is 161 ps, the phase modulation of the signal from one junction switch is expected to be

\[ \Delta\phi = 2\pi\,\frac{1.4\ \mathrm{ps}}{161\ \mathrm{ps}} = 0.0546\ \mathrm{rad}. \tag{6.7} \]

Because this is the total variation, the expected amplitude of the phase variation is half the value given in (6.7):

\[ g_p = \Delta\phi/2 \approx 0.0273\ \mathrm{rad}. \tag{6.8} \]

[Figure 6.9, two spectrum-analyzer panels: power [dBm] versus frequency f [GHz] from 6.2097 GHz to 6.2103 GHz. (a) Clock 0 (in phase): sidebands 258 kHz from the carrier, ΔP = -69.26 dB. (b) Clock 1 (quad phase): sidebands 258 kHz from the carrier, ΔP = -79.33 dB.]

Figure 6.9: Measured power spectrum of Carry-Look Ahead Adder. This output from the Agilent spectrum analyzer displays three peaks. The center peak is at the carrier frequency (6.21 GHz) and the two sidebands are at the modulation frequency, ±258 kHz from the center peak. Data input was a sequence of 48,000 non-return-to-zero bits, composed of 12,000 return-to-zero random bits followed by 12,000 return-to-zero "zero" bits. (a) shows clock phase 0, (b) shows clock phase 1.

[Figure 6.10: sketch of a clock waveform with inactive and active intervals, labeled "Amplitude Modulation" and "Frequency Modulation".]

Figure 6.10: Modulation of Clock Signal by RQL Gate Operation. This figure shows a clock signal modulated by an input signal consisting of a sequence of random bits and all zeros. While the CLA is active the impedance of the clock lines changes, which results in both amplitude and frequency modulation. This modulation of the clock signal is a measure of the power drawn by the CLA while operational.

The power dissipation of the circuit is 0.6 µW (see Chapter 2, Section 2.6.2) for a total applied power of 12.5 µW [8]. Thus, the variation in power has an amplitude of

\[ g_a = 1 - \sqrt{\frac{12.5\ \mu\mathrm{W} - 0.6\ \mu\mathrm{W}}{12.5\ \mu\mathrm{W}}} = 0.0243, \tag{6.9} \]

which is close to the value of the phase modulation factor g_p. Thus, the two effects are different but we expect them to be of approximately equal strength.

Figure 6.10 shows the behavior of a clock signal while the CLA was at times active and at times inactive. While active, the loading of the circuit by the switching junctions drew power (decreasing amplitude), the effective inductance of the transformers changed, and this changed the propagation time (modulating phase).
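As a rough numerical check of the quantities quoted above, the following minimal Python sketch (not part of the original analysis) recomputes the sideband offset, the phase shift of (6.7), and the modulation factors of (6.8) and (6.9). The variable names are illustrative, and the square root in g_a is an assumption about the form of (6.9): it is the form that reproduces the quoted value of 0.0243.

```python
import math

# Sideband offset: the input pattern repeats every 24,000 clock cycles.
f_clk = 6.21e9            # clock frequency [Hz]
seq_len = 24_000          # bits in the cyclic input sequence
f_mod = f_clk / seq_len   # modulation (sideband offset) frequency [Hz]

# Phase modulation, Eqs. (6.7) and (6.8).
t_delay = 1.4e-12         # clock delay from junction switching [s]
period = 161e-12          # clock period at 6.21 GHz, rounded as in the text [s]
delta_phi = 2 * math.pi * t_delay / period   # total phase variation [rad]
g_p = delta_phi / 2                          # amplitude of the phase variation [rad]

# Amplitude modulation, Eq. (6.9); the square root is assumed (see lead-in).
p_applied = 12.5e-6       # total applied power [W]
p_dissipated = 0.6e-6     # dissipated power [W]
g_a = 1 - math.sqrt((p_applied - p_dissipated) / p_applied)

print(f"sideband offset = {f_mod / 1e3:.2f} kHz")
print(f"delta_phi = {delta_phi:.4f} rad, g_p = {g_p:.4f} rad, g_a = {g_a:.4f}")
```

With these inputs the sketch should land on the 258.75 kHz offset and on values close to 0.0546 rad, 0.0273 rad, and 0.0243.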
This modulation can be expressed as

\[ V(t) = V_0\,(1 + g_a \sin\omega_a t)\,\sin\!\left[\omega_0 t\,\bigl(1 + g_p \sin(\omega_p t)\bigr)\right], \tag{6.10} \]

where V_0 is the nominal output voltage, ω_0 is the nominal clock frequency, ω_a is the amplitude modulation frequency, ω_p is the phase modulation frequency, g_a is the amplitude modulation factor, and g_p is the phase modulation factor. For our test ω_a ≪ ω_0, ω_p ≪ ω_0, g_p ≪ 1, and g_a ≪ 1. A Fourier decomposition of (6.10) gives the main peak power P_0 at ω_0/2π and a sideband power ΔP below P_0 at (ω_0 ± nω_a)/2π, where n is an integer. As |n| grows larger, the sidebands grow smaller. I am only interested in the first sideband; sidebands with |n| > 1 are below the noise floor of the spectrum analyzer.

A Taylor series expansion of (6.10) will have three terms of interest: a constant value representing the main peak power at 6.21 GHz, and two terms at ω_0 + ω_a, one representing the effect of amplitude modulation and one representing the effect of phase modulation. The relative weights of each are 0.53 for g_a and 0.47 for g_p, which we can see by comparing (6.8) and (6.7).

To calculate the effect of amplitude modulation, I temporarily set g_p = 0 in (6.10). The relative strength of the effect of phase modulation on the clock is known from (6.7) and will be accounted for later. The voltage of the clock line as measured at the output is different while operational (on) and non-operational (off), and can be expressed as

\[ V_{on} = (1 + x)\,V_0, \tag{6.11a} \]
\[ V_{off} = (1 - x)\,V_0, \tag{6.11b} \]

where x = (V_on - V_off)/2V_0 is the ratio of the variation with respect to the average voltage V_0 for both periods, with and without data, together. Since x is small, the power dissipated while the chip is on and off is

\[ P_{on} = \frac{1}{R}(1 + x)^2 V_0^2 \approx P_0\,(1 + 2x), \tag{6.12a} \]
\[ P_{off} = \frac{1}{R}(1 - x)^2 V_0^2 \approx P_0\,(1 - 2x), \tag{6.12b} \]

where P_0 is the nominal power in the clock line. Here I have approximated (1 + x)^2 ≈ 1 + 2x. This gives the differential power to the chip as

\[ P_{chip} = P_{on} - P_{off} = 4xP_0, \tag{6.13} \]

or as a ratio,

\[ \frac{P_{chip}}{P_0} = 4x. \tag{6.14} \]

I now come to the key idea of the sideband measurement. The power measurement gives us the ratio of power between the main peak and the single sideband. To use (6.14) to calculate the power dissipation of the CLA, I can recast it in terms of the measured sideband power difference shown in Fig. 6.9. In both linear and logarithmic forms, I can write the power dissipation ratio as

\[ \left.\frac{P_{chip}}{P_0}\right|_{linear} = 2 \times 4 \times \sqrt{10^{\Delta P/10} \times 0.53} \times \frac{\pi}{4}, \tag{6.15a} \]
\[ \left.\frac{P_{chip}}{P_0}\right|_{dB} = 9\ \mathrm{dB} + \tfrac{1}{2}\Delta P - 1.38\ \mathrm{dB} - 1\ \mathrm{dB}, \tag{6.15b} \]

where ΔP = 10 log_10 x, or the ratio of P_chip to P_0 as given in dB, which can be seen in Fig. 6.9.

The terms in (6.15b) require some explanation. In (6.15a), there is a factor of two for the two sidebands in Fig. 6.10 at ω_0 + ω_a and ω_0 - ω_a. There is a factor of four from the derivation of (6.14). The factor of 0.53 comes from (6.7) because 53% of the variation is due to amplitude modulation and 47% due to phase modulation, whereas x is the power ratio taking both effects into account. Finally, the factor of π/4 is a correction factor due to the data being a square wave and not a sinusoid, as assumed in (6.10). This last correction is calculated as the difference in amplitude between the pure sine wave and the fundamental harmonic of a square wave. Equation (6.15b) follows directly from (6.15a), and is more useful here because the power measurements are made in dBm.

For eight 32 Ω clock lines with 2 mA AC current amplitude passing through each of the eight clock lines fed from the same input pad, the estimated power to the chip is

\[ P_0 = 8 \times \left(\frac{2\ \mathrm{mA}}{\sqrt{2}}\right)^{2} \times 32\ \Omega = 512\ \mu\mathrm{W}. \quad \text{(estimate)} \tag{6.16} \]

Table 6.2 shows the results of the calculation of P_chip. ΔP is measured from Fig. 6.9. P_chip/P_0 is given by (6.15b). P_in and P_out, the clock line input and output power, and the value of the attenuator on the probe are measured quantities used to calculate P_0. P_chip is calculated from P_chip/P_0 from (6.15b) and P_0.
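The chain from sideband level to chip power can be sketched in a few lines of Python. This is an illustration rather than the original calculation. The first function is (6.15b) as written. The second function is an assumption about how P_0 is formed: the dBm average (geometric mean) of the input power and the output power with the probe attenuator added back, which is the combination that reproduces the P_0 entries of Table 6.2. The function names are hypothetical, and the inputs are the Table 6.2 entries for the clock line with ΔP = -79.33 dB.

```python
def chip_to_clock_ratio_db(delta_p_db):
    """Eq. (6.15b): P_chip/P_0 in dB from the measured sideband level [dB]."""
    return 9.0 + 0.5 * delta_p_db - 1.38 - 1.0


def on_chip_clock_power_dbm(p_in_dbm, p_out_dbm, attenuator_db):
    """Assumed form of P_0: average in dBm (geometric mean in linear units)
    of the input power and the output power corrected for the attenuator."""
    return 0.5 * (p_in_dbm + p_out_dbm + attenuator_db)


def dbm_to_nw(p_dbm):
    """Convert a power in dBm to nanowatts (0 dBm = 1 mW = 1e6 nW)."""
    return 1e6 * 10 ** (p_dbm / 10)


# Table 6.2 entries for the clock line with delta P = -79.33 dB.
ratio_db = chip_to_clock_ratio_db(-79.33)
p0_dbm = on_chip_clock_power_dbm(p_in_dbm=1.77, p_out_dbm=-9.41, attenuator_db=2.83)
p_chip_dbm = p0_dbm + ratio_db      # adding dB values multiplies the linear ratios
p_chip_nw = dbm_to_nw(p_chip_dbm)

print(f"P_chip/P_0 = {ratio_db:.3f} dB")
print(f"P_0 = {p0_dbm:.3f} dBm")
print(f"P_chip = {p_chip_dbm:.2f} dBm = {p_chip_nw:.0f} nW")
```

For these inputs the sketch should come out close to the -33.045 dB, -2.405 dBm, and 285 nW entries listed for that clock line in Table 6.2.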
Because Clock 0 powers only half the CLA and Clock 1 powers both half the CLA and the output amplifiers, I estimated the CLA power as twice that being drawn from Clock 0. In the end, this gives the estimated CLA power as 570 nW. This does not take into account power dissipated by the junctions in the shift register, nor the slightly higher number of junctions powered by Clock 1 than Clock 0. The expected power for 815 junctions with a weighted average critical current of 162 µA (using (2.4)) is P = (1/3) I_c Φ_0 N f = 563 nW for the CLA at 6.21 GHz. The measurement and prediction match within 2%, which is remarkable given the many assumptions and corrections that needed to be made. Finally, I note that this corresponds to only 700 pW per junction, roughly a factor of 200 smaller than CMOS transistors.

6.5 Conclusions

In this Chapter I described an experiment on an RQL CLA that demonstrates that RQL circuits operate at very low power. At the time of this writing, common processors typically have 750 million transistors and use over 100 W of power, or about 130 nW/transistor. The CLA, with 570 nW power dissipation for 815 junctions, uses about 700 pW/junction, i.e. almost 200 times less power per junction than CMOS uses per transistor. An implication of this experiment is that RQL is one of the lowest-power digital technologies now known. Additionally, my experiments showed that RQL can be scaled to large, non-trivial circuits and also showed that RQL was compatible with existing CMOS design and analysis methods.

Table 6.2: Power Measurement Calculations. ΔP is measured from Fig. 6.9. P_chip/P_0 is calculated from (6.15b). P_in and P_out were measured separately using a power meter. P_0 is calculated from P_in, P_out, and the attenuator value (also measured separately). P_chip is calculated from P_chip/P_0 from (6.15b) and P_0. The sum of powers is calculated by adding the power dissipation from both clocks. CLA and Amp power are calculated assuming CLA drain from Clock 1 is the same as Clock 0. The predicted value of power for the CLA was 563 nW.

    Quantity      Clock 0     Clock 1     Unit
    ΔP            -79.33      -69.26      dB
    Pchip/P0      -33.045     -28.01      dB
    Pchip/P0      0.000496    0.001581    (linear)
    Pin           1.77        1.85        dBm
    Pout          -9.41       -8.66       dBm
    Attenuator    2.83        2.83        dB
    P0            -2.405      -1.99       dBm
    P0            0.57478     0.63241     mW
    Pchip         0.00029     0.00100     mW
    Pchip         285         1000        nW
    Pchip         -35.45      -30         dBm

    Total             [mW]        [nW]      [dBm]
    Sum of Both       0.0013      1285      -28.9
    CLA Only          0.000570    570.2     -32.4
    Amplifier Only    0.000715    714.9     -31.5

Chapter 7

Summary and Conclusions

7.1 Summary

In the opening to the first chapter, I mentioned that RQL is a superconducting technology that may eventually replace CMOS technology. It is a classical digital technology based on encoding classical digital data as decidedly quantum flux units. Demonstrated RSFQ technologies support the use of SFQ pulses to encode digital data. However, although RSFQ has many advantages, it is not suitable for many applications, nor for general purpose processing. An alternative encoding as pulse pairs gave birth to RQL (see Chapter 2).

Theoretical predictions for RQL estimated a power consumption of approximately Φ_0 I_c per junction switching event, which I confirmed in Chapter 2. This, together with the demonstration of functioning logic gates, opened up the possibility of large scale integrated RQL digital circuits. RQL logic gates have three or fewer junctions and are powered through an inductively coupled clock line, removing the power consumption from dc-biasing resistors. The gates behave as combinational gates on a higher level, but as state machines on a lower, pulse-based level.

In Chapter 3, I examined the detailed behavior of some RQL circuits. The analytic timing model provides a detailed description of the timing behavior of RQL junction switching events. Starting from this model, I also derived the self-correcting timing behavior of RQL circuits.
Finally, the model also provides estimates on the limits of operation for the number of junctions per phase, the clock frequency, clock amplitude, and input time. More than just a useful behavioral model, this analytic timing model was the basis for a VHDL behavioral model. Together with simulation results, the VHDL model described RQL circuits in an industry-standard language, opening up the possibility of applying the large library of existing CMOS design tools to RQL.

Chapter 4 was focused on developing a suitable power network to supply bias current to the junctions in an RQL circuit. This power network needed to meet a number of goals. My primary metric was the current amplitude distribution across the clock lines. Too much or too little current at any junction would cause improper operation of the whole chip. Too much variation in the current along the clock line leads to bad timing properties or a complete failure of the circuit. Because the clock power has to be pulled back off the chip, using a limited number of pads, I designed a power splitter/combiner with minimal current variation on the chip, minimal reflection from the chip, and maximum isolation of different clock lines on the chip. This was accomplished with a modified set of cascaded Wilkinson power splitters. The design was collapsed into only six stages with a maximally flat response, and then this design was optimized using numerical simulations. The test of two power networks, a 12-stage geometric design to test even mode operation and a 6-stage maximally flat design to test the odd mode, showed acceptable margins for the RQL circuit under test.

In Chapter 5, I described three experiments to test the analytic timing model. I directly measured the output timing difference between two pulses on Josephson transmission lines of different lengths to compare results with (3.8).
I measured the clock power margins as a function of input phase, testing the predictions of the limits of operation from the analytic timing model. Finally, I measured the clock power margins of a very long, deep pipeline shift register to directly get an estimate for the value of t_0. The final experiment on a long, deep pipeline shift register gave a value of the switching time parameter t_0 within 2% of the predicted value of 0.47 ps, but with 20% spread in the measured values. Together, these results demonstrated that the analytic timing model provides reasonably accurate descriptions of RQL junction switching behavior.

Finally, Chapter 6 put all previous results together and demonstrated the operation of a practical, integrated RQL circuit. I designed an eight-bit adder using the Kogge-Stone architecture. It was fully operational at 6.21 GHz with power margins of 1.5 dB. The device was also operational at about 4 GHz. I found only small operational regions in clock frequency, due to on-chip resonances. The power consumption was predicted to be 563 nW and measured to be 570 nW. For comparison, a CMOS transistor would be expected to require roughly two orders of magnitude more power than an RQL junction.

7.2 Conclusions and Future Work

RQL began as an alternative to RSFQ. Many of the problems with RSFQ appear to have been mitigated or resolved by RQL. This thesis has provided some key groundwork for future VHSIC applications of RQL. The timing model has been shown to be appropriate. The power supply, a limiting factor of modern CMOS design, is in RQL a matter of designing an appropriate Wilkinson power splitter. Unlike CMOS, only the power actually used in switching is dissipated on the chip. The VHDL models of RQL allow the combinational RQL gates to be used in much the same way as CMOS gates. Finally, the CLA experiment is proof that RQL can perform digital processing tasks.

Needless to say, despite the progress a number of hurdles remain.
The cryopackaging of superconducting digital logic of any sort remains expensive and complicated. Though recent advances in this technology may allow superconducting digital logic to be used in more mainstream applications, it is still a large barrier to the vast markets served by CMOS technology. Also, the 6.21 GHz clock frequency demonstrated here is not markedly faster than the current limits of CMOS, around 4 GHz. Although advanced materials promise critical current densities of 10 kA/cm^2 (a fourfold increase over the switching speeds displayed here), these processes are still at an experimental stage. To accommodate the non-local interconnects between logic elements in the CLA, I had to add an additional clock phase, and thus latency, to cover the distance. A passive transmission line will solve this issue but remains untested.

In addition, while the RQL gates described here constitute a universal set, many additional digital elements would be needed to produce the versatility of CMOS designs. Among others, the return-to-zero input requirement of RQL clashes directly with the non-return-to-zero (NRZ) patterns in CMOS. To interface with or be compatible with CMOS, RQL must adopt a method of NRZ input.

Perhaps the most glaring omission for a fully-functional CMOS replacement using RQL is memory. While the Set-Reset gate provides a limited memory functionality, contemporary users require vast arrays of memory. The Set-Reset gate cannot be efficiently scaled up to meet the needs of real data storage. A few possible solutions may exist. For example, in Chapter 1, I considered only junctions for which the phase difference between nodes of a junction with zero current was zero. Other kinds of junctions, π-Josephson junctions, instead have a phase difference of π when no static current flows through the junction. Such exotic junctions can exhibit hysteretic I-V curves [47], and they can be used to store nonvolatile data.
Implementing such a memory structure in RQL is the topic of ongoing research [48, 49, 50].

As a practical matter, the number of junctions on a single chip is limited. While junction counts into the millions have been reported [51], modern processors may require hundreds of millions of junctions. Scaling of the chip size to accommodate hundreds of millions of junctions will be very challenging because of fabrication errors. Instead, individual chips which each are part of a larger circuit may need to be fabricated and tested separately, only later to be included as part of a multi-chip module.

Finally, all aspects of RQL will benefit from advances in fabrication technologies. Circuit density is limited by the number of metal layers. The four used here lead to sometimes convoluted designs when multiple crossings are needed. A six-layer process may solve some of these issues. Congestion could further be relieved by adding more layers, such as the 10-layer process developed by Tanaka et al. [52]. In the horizontal instead of vertical direction, smaller lithography sizes will also allow improvement to circuit density.

7.3 Final Words

Originally conceived as a classical logic family capable of interfacing neatly with quantum computers and quantum bits, RQL's strengths make it a viable classical computer technology in its own right. Many of RQL's advantages are found in the inherently quantum mechanical behavior of Josephson junctions. Superconductivity gives rise to the quantization of flux, tunneling gives birth to the behavior of Josephson junctions, and the Schrödinger equation gives equations of motion which allow traveling wave solutions. In RQL, these traveling waves are not the solitons found on individual junctions, but chains of junctions coupled together through inductors. Unique to RQL, the data is encoded as pairs of pulses, switching junctions back and forth as it travels through the circuit.
In this sense, it has aspects of inherently quantum behavior, but is not quantum computing as it is currently understood.

RQL is still a technology in its infancy. Many aspects of my work on RQL have not been mentioned here. Development of the design environment and VHDL models are ongoing processes. Development of RQL logic gates is an ongoing process. Although the combinational behavior of RQL logic gates makes it similar to CMOS in many ways, CMOS has decades of research and development behind it. If RQL proves to be even partially as technologically successful as CMOS, there will undoubtedly be many discoveries in RQL in the future.

Although not yet realized, the idea of quantum computation is old enough to have been studied to a considerable extent in theory. The implications range from simple computational advantages [53] to world view-changing. As for RQL, it is in a somewhat unique place. The underlying behavior of the junctions is manifestly quantum, but the output decidedly classical, so RQL in some ways bridges the behavior between quantum and classical realms. The equations of motion for a junction are those of a damped pendulum, and yet the behavior of Josephson junctions is fantastically rich and complex. Perhaps someday RQL will help bring this complex quantum behavior into everyday use.

Appendix A

Numerical Solution of the Sine-Gordon Equation

Figure 1.12 on page 36 was generated by solving the sine-Gordon equation numerically for an AC bias current, four junctions per phase, and two phases. The final junction is overdamped to prevent reflections, and does not switch itself.
f=10; (* GHz *)
bf=Sqrt[2]; (* Beta Factor for Ic stepup *)
\[Beta]1=1.1; (* Damping Factor *)
\[Beta]2=\[Beta]1 bf; (* Alt Damping Factor *)
IcRN = 0.75; (* mV *)
L=9.9; (* pH *)
Amp = 0.77; (* Clock Amplitude *)
A2 = 0.8; (* SFQ Amplitude *)
Phi0 = 2.07; (* mV ps *)
DC=Phi0/(2L IcRN); (* DC Bias *)
t0=Phi0/(2 IcRN); (* Baseline switching time *)
\[Omega]=2\[Pi] f / 1000;
InterL=6.2;
endp=8\[Pi]/\[Omega]; (* End of simulation time *)
delta = \[Omega] t0 / Amp; (* calculated value *)
wc=(2\[Pi])/Phi0 IcRN; (* Calculated Value *)
wp1 = wc/Sqrt[\[Beta]1]; (* Calculated Value *)
wp2 = wc/Sqrt[\[Beta]2]; (* Calculated Value *)
R1=Phi0/(2\[Pi]) wc; (* Calculated Value *)
R2 = R1/bf; (* Calculated Value *)
\[Tau] = 2.3\[Pi]/\[Omega]; (* Pulse input time *)
x=\[Pi]/2; (* next clock phase *)

(* pDrive[t_,t1_]:=1/2 2\[Pi](-2+Erf[t1/(Sqrt[2] t0)]+Erf[(\[Pi]-t \[Omega]+t1 \[Omega])/(Sqrt[2] t0 \[Omega])]+Erfc[(-t+t1)/(Sqrt[2] t0)]+Erfc[(\[Pi]+t1 \[Omega])/(Sqrt[2] t0 \[Omega])]);*)
pD[t_,t1_]:=\[Pi] (1- Erf[Sqrt[2] ( IcRN (-t+t1))/(3 Phi0)]);
pDrive[t_,t1_]:=pD[t,t1]-pD[t,t1+\[Pi]/\[Omega]];

(* Junctions 1-4 are driven by the first clock phase, junctions 5-8 by the second (shifted by x) *)
eqn1=Phi0/(2\[Pi] L) (p1[t]-p2[t])==-(1/wp1^2)p1''[t]-1/wc p1'[t]-Sin[p1[t]]+Amp Sin[\[Omega] t] +0.5DC+A2 Sin[pDrive[t,\[Tau]]];
eqn2=InterL Phi0/(2\[Pi] L) (-p1[t]+p2[t]/bf+p2[t]-p3[t]/bf)==-(1/wp2^2)p2''[t]-1/wc p2'[t]-Sin[p2[t]]+Amp Sin[\[Omega] t] + DC;
eqn3=InterL Phi0/(2\[Pi] L) (-p2[t]/bf+p3[t]+p3[t]/bf-p4[t])==-(1/wp1^2)p3''[t]-1/wc p3'[t]-Sin[p3[t]]+Amp Sin[\[Omega] t] + DC;
eqn4=InterL Phi0/(2\[Pi] L) (-p3[t]+p4[t]/bf+p4[t]-p5[t]/bf)==-(1/wp2^2)p4''[t]-1/wc p4'[t]-Sin[p4[t]]+Amp Sin[\[Omega] t] + DC;
eqn5=InterL Phi0/(2\[Pi] L) (-p4[t]/bf+p5[t]+p5[t]/bf-p6[t])==-(1/wp1^2)p5''[t]-1/wc p5'[t]-Sin[p5[t]]+Amp Sin[\[Omega] t-x] + DC;
eqn6=InterL Phi0/(2\[Pi] L) (-p5[t]+p6[t]/bf+p6[t]-p7[t]/bf)==-(1/wp2^2)p6''[t]-1/wc p6'[t]-Sin[p6[t]]+Amp Sin[\[Omega] t-x] + DC;
eqn7=InterL Phi0/(2\[Pi] L) (-p6[t]/bf+p7[t]+p7[t]/bf-p8[t])==-(1/wp1^2)p7''[t]-1/wc p7'[t]-Sin[p7[t]]+Amp Sin[\[Omega] t-x] + DC;
eqn8=Phi0/(2\[Pi] L) (p7[t]-p8[t])==-(1/wp2^2)p8''[t]-1/wc p8'[t]-Sin[p8[t]]+Amp Sin[\[Omega] t-x] +DC;

(* All phases and their derivatives start at zero *)
s=NDSolve[{eqn1, eqn2, eqn3, eqn4, eqn5, eqn6, eqn7, eqn8,
  p1[0]==0, p1'[0]==0, p2[0]==0, p2'[0]==0, p3[0]==0, p3'[0]==0, p4[0]==0, p4'[0]==0,
  p5[0]==0, p5'[0]==0, p6[0]==0, p6'[0]==0, p7[0]==0, p7'[0]==0, p8[0]==0, p8'[0]==0},
  {p1, p2, p3, p4, p5, p6, p7, p8}, {t,0, endp}, MaxSteps-> 100000];

(* Both clock phases and all eight junction phases (in units of 2 Pi) around the input pulse *)
Plot[{Amp Sin[\[Omega] t],Amp Sin[\[Omega] t-x],
  1/(2\[Pi]) Evaluate[p1[t]/.s], 1/(2\[Pi]) Evaluate[p2[t]/.s],
  1/(2\[Pi]) Evaluate[p3[t]/.s], 1/(2\[Pi]) Evaluate[p4[t]/.s],
  1/(2\[Pi]) Evaluate[p5[t]/.s], 1/(2\[Pi]) Evaluate[p6[t]/.s],
  1/(2\[Pi]) Evaluate[p7[t]/.s], 1/(2\[Pi]) Evaluate[p8[t]/.s]},
  {t,\[Tau]-0.2 (2\[Pi])/\[Omega],\[Tau]+1.2 (2\[Pi])/\[Omega]}, PlotRange-> All]

(* The same traces over the full simulation time *)
Plot[{Amp Sin[\[Omega] t],Amp Sin[\[Omega] t-x],
  1/(2\[Pi]) Evaluate[p1[t]/.s], 1/(2\[Pi]) Evaluate[p2[t]/.s],
  1/(2\[Pi]) Evaluate[p3[t]/.s], 1/(2\[Pi]) Evaluate[p4[t]/.s],
  1/(2\[Pi]) Evaluate[p5[t]/.s], 1/(2\[Pi]) Evaluate[p6[t]/.s],
  1/(2\[Pi]) Evaluate[p7[t]/.s], 1/(2\[Pi]) Evaluate[p8[t]/.s]},
  {t,0,endp}, PlotRange-> All]

Appendix B

Parameters for fits

B.1 Timing Extraction Results for the JTL

Tables B.1, B.2, and B.3 below show the results from simulation and analysis discussed in Chapter 3, including the analytic function fit to the data for the JTL, the piecewise polynomial function fit to the data for the JTL, and the analytic function fit to the data for the AndOr gate OR operation.

Table B.1: Extracted JTL Timing Parameters. Analytic timing data function fit to data for JTL, as described in Section 3.3. A block diagram of the circuit simulated is shown in Fig. 3.7. f is the clock frequency of the circuit. ?1, ?2, and ?3 are fitting parameters defined in (3.13). In this simulation, A = 0.83 and ?c = 4.29. The netlist for the circuit is given in B.3.4 on page 253.
    f [GHz]   ?1       ?2       ?3
    1         1.158    0.9926   5.928
    2         1.121    1.013    5.592
    3         1.1      1.022    4.867
    4         1.088    1.029    4.01
    4.5       1.084    1.04     15.49
    5         1.081    1.034    3.085
    5.5       1.078    1.034    2.453
    6         1.075    1.032    2.246
    6.5       1.076    0.9538   0.4551
    7         1.072    0.9484   0.454
    7.5       1.07     0.9463   0.4658
    8         1.068    0.9526   0.5127
    8.5       1.065    0.9487   0.514
    9         1.064    0.9584   0.5778
    9.5       1.062    0.9614   0.61
    10        1.06     0.9698   0.6737
    10.5      1.059    0.9645   0.6624
    11        1.058    0.9584   0.6456
    11.5      1.057    0.9716   0.7328
    12        1.056    0.9692   0.7353
    13        1.055    0.9799   0.8295
    14        1.05     0.9919   0.9262
    15        1.048    0.9892   0.9253
    16        1.049    0.9811   0.8965
    17        1.042    0.9963   1.008

Table B.2: Extraction of JTL Timing Parameters (polynomial fit). Analytic timing data function fit to data for JTL, as described in Section 3.3. A block diagram of the circuit simulated is shown in Fig. 3.7. f is the clock frequency of the circuit. ?11, ?12, ?13, ?21, ?22, and ?23 are the parameters defined in (3.14). In this simulation, A = 0.83 and ?c = 4.29. The netlist for the circuit is given in B.3.4 on page 253.

    f [GHz]   ?11       ?12       ?13      ?21       ?22       ?23      w
    1         0.04943   -0.1293   0.1068   0.03458   -0.1199   0.1261   1.531
    2         0.06989   -0.1908   0.175    0.06889   -0.2382   0.2505   1.547
    2.5       0.07512   -0.2095   0.202    0.08034   -0.2757   0.2924   1.542
    3         0.08075   -0.2284   0.2287   0.09868   -0.3386   0.3575   1.558
    3.5       0.08178   -0.2359   0.2486   0.09908   -0.3334   0.3589   1.554
    4         0.08486   -0.2484   0.2712   0.1362    -0.4688   0.4936   1.583
    4.5       0.09129   -0.268    0.2973   0.1539    -0.5278   0.5535   1.591
    5         0.0895    -0.2679   0.3119   0.1859    -0.6462   0.6747   1.62
    5.5       0.09162   -0.2762   0.3308   0.2094    -0.7294   0.7603   1.633
    6         0.09336   -0.2834   0.3487   0.2163    -0.7487   0.7847   1.645
    6.5       0.09548   -0.2912   0.3668   0.3       -1.069    1.102    1.682
    7         0.09773   -0.2987   0.3843   0.2998    -1.06     1.099    1.699
    7.5       0.1039    -0.3172   0.4092   0.3305    -1.17     1.209    1.713
    8         0.1068    -0.3256   0.4267   0.3791    -1.355    1.397    1.729
    8.5       0.1121    -0.3414   0.4497   0.4216    -1.514    1.56     1.745
    9         0.1202    -0.3656   0.4789   0.523     -1.91     1.957    1.778
    9.5       0.1281    -0.3889   0.5073   0.5681    -2.086    2.144    1.798
    10        0.1387    -0.4205   0.5422   0.6981    -2.601    2.665    1.822
    10.5      0.1369    -0.413    0.5463   0.614     -2.246    2.308    1.826
    11        0.1414    -0.4224   0.5612   0.6775    -2.489    2.554    1.84
    11.5      0.1719    -0.5186   0.6482   0.9123    -3.445    3.539    1.871
    12        0.1905    -0.5773   0.7056   1.017     -3.872    3.989    1.894
    13        0.2589    -0.8039   0.9165   1.681     -6.631    6.883    1.935
    14        0.3398    -1.078    1.172    1.526     -5.903    6.081    1.973
    15        0.3885    -1.24     1.332    2.43      -9.689    10.08    2.01
    16        0.5837    -1.968    2.036    3.486     -14.16    14.86    2.05
    17        0.8756    -3.074    3.111    3.93      -15.95    16.68    2.097

Table B.3: Extraction of AndOr OR output timing parameters. Analytic timing data function fit to data for JTL, as described in Section 3.3. A block diagram of the circuit simulated is shown in Fig. 3.7. f is the clock frequency of the circuit. ?1, ?2, and ?3 are the parameters defined in (3.13). In this simulation, A = 0.83 and ?c = 4.29.

    f [GHz]   ?1       ?2       ?3
    1         1.532    0.3301   5.706
    2         0.8508   0.3227   13.53
    3         1.408    0.7678   7.683
    3.5       1.393    0.8385   6.265
    4         1.404    0.9214   5.169
    4.5       1.385    0.9689   4.473
    5         1.365    1.004    3.904
    5.5       1.343    0.9888   3.7
    6         1.329    1.009    3.281
    6.5       1.315    1.025    2.882
    7         1.302    1.038    2.441
    7.5       1.297    1.018    2.49
    8         1.287    1.029    2.139
    8.5       1.281    1.033    1.531
    9         1.26     0.899    0.1862
    9.5       1.252    0.9183   0.2147
    10        1.243    0.9177   0.2196
    10.5      1.238    0.865    0.1799
    11        1.233    0.8826   0.2019
    11.5      1.228    0.8991   0.2259
    12        1.224    0.9139   0.2516
    13        1.214    0.9414   0.3097
    14        1.205    0.9037   0.2685
    15        1.181    0.9523   0.3581
    16        1.167    0.9738   0.4234

B.2 Comparison of Threshold Values in Timing Extraction

Throughout this thesis, the timing parameters ?i have been used assuming ?c = 2π × 0.632. This is a convenient value for damped harmonic motion, but still an arbitrary choice. To be sure that the results do not depend strongly on my choice of ?c, I compare the results from both ?c = 2π × 0.632 and ?c = 2π × 0.75. As can be seen in Fig. B.1, the agreement between analyses is good. First, ?1 and ?2 lie close to 1. Though ?3 has a clear frequency dependence, as frequency increases ?3 goes to one. As for the comparison between ?c values, both results match closely in all but a few cases.
Particularly at low frequencies, some fitting parameters differ greatly from 1, and in these cases the particular data points are discarded and linear interpolation is used for that frequency.

[Figure B.1: ?i parameter versus f [GHz] from 0 to 20 GHz; curves for ?1, ?2, and ?3 at ?c = 2π × 0.632 and ?c = 2π × 0.750, with one region labeled "Anomalous Points".]

Figure B.1: Comparison of Threshold Values. The resulting ?i parameters are plotted as functions of frequency for ?c = 2π × 0.632 and ?c = 2π × 0.75. ?1 and ?2 agree for all cases. ?3 values differ in the region marked "Anomalous Points", though these points are not used in timing calculations due to their anomalous values. Otherwise, the values agree and the choice of ?c is of little importance.

B.3 Simulation File for Timing Extraction

The following Perl script performs the extraction of timing data from simulations described in Chapter 3. Inline notes in the code explain the steps performed. A number of smaller subroutines are called which do not reflect significant steps in the overall task.

Each run of the timing extraction started with the definition of several constants, such as the IcRN product, the number of junctions in the data path, and the clock amplitude. The threshold value ?c of the phase for crossing was also defined at this time. Then, for each frequency and amplitude, the simulation was run for different values of the input phase. The simulation was run by substituting dummy values in a template with the actual values desired.

The results of the simulation are a file containing time-phase (t-?) data pairs, which are analyzed with a separate script. One input to this script is the threshold value ?c. Simulations do not depend on this value; the analysis, however, does. The same simulation results can be analyzed with different values of ?c. Each run performs this same analysis for the desired values of ?c. This analysis runs through the time series data and monitors when each phase crosses the threshold value.
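The threshold-crossing step of this analysis can be illustrated with a minimal Python sketch. It is not the extraction script itself: the function name and the sample data are hypothetical, and refining the crossing time by linear interpolation between the two samples that bracket the threshold is assumed here as the simplest reasonable choice.

```python
def crossing_time(times, phases, threshold):
    """Return the time at which `phases` first rises through `threshold`,
    linearly interpolated between the two bracketing samples.
    Returns None if the threshold is never crossed."""
    for i in range(1, len(times)):
        below, above = phases[i - 1], phases[i]
        if below < threshold <= above:
            # Fraction of the sample interval at which the threshold is reached.
            frac = (threshold - below) / (above - below)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


# Hypothetical time series for two junctions on the same data path.
t = [0.0, 1.0, 2.0, 3.0]
phase_first = [0.0, 1.0, 3.0, 6.0]
phase_second = [0.0, 0.0, 1.0, 5.0]
threshold = 2.0

t_first = crossing_time(t, phase_first, threshold)
t_second = crossing_time(t, phase_second, threshold)
delay = t_second - t_first   # switching delay between the two junctions
print(t_first, t_second, delay)
```

Because the threshold only enters this post-processing step, the same simulated time series can be passed through the function again with a different threshold without rerunning the simulation.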
Linear interpolation between the points immediately preceding and following the crossing gives a more accurate value of the time at which the threshold is crossed. Once the switching time of the first junction has been found, the switching time of the second junction is found in the same way and the difference in time is calculated. The switching time of the first junction also gives the phase input time. This data pair of input phase time and delay in switching is recorded in a data file, to be analyzed separately once all simulations have been performed and analyzed.

After all input phases have been simulated for a given clock frequency, clock amplitude, and threshold value, a gnuplot script is called to fit all input phase-timing delay values to (3.13). The results of this fit are written to a file, each line of which contains the clock frequency, clock amplitude, threshold value, ?1, ?2, and ?3. (The gnuplot script also fits the data to (3.14), and creates a similar file containing the clock frequency, clock amplitude, threshold value, ?11, ?12, ?13, ?21, ?22, and ?23.) These files are shown in Tables B.1, B.2, and B.3.

B.3.1 Main Script

#!/usr/bin/perl

## This script is the grand-daddy script which runs the timing
## extraction for a whole circuit. It needs slight modification in a
## few places to account for changes in number of junctions track and
## so forth, but for the most part is automated. Only this script
## needs to be called to do a timing extraction; individual other
## scripts are called as needed.

## Verbose and trial are good for testing the script before committing
## an hour or two of computer time to simulations.

## Set the basic parameters here.
$zname = "JTL";   ## Name of the gate being simulated. Make sure it
                  ## matches the gate in print.tpl.
$zjj = 2; ## Junction count per gate $zicrn = 0.75; ## IcRN value (0.75 for Hypres) $ztol = 20; ## Fit parameter tolerance $znom = 0.83; ## Actual clock amplitude $verbo = 0; ## 0: runthrough; 1: verbose $trial = 0; ## 1: trial; 0: full parameter extraction ## Set up a "table" of values to simulate for. ## NO TRAILING ZEROES!!! @freqs = ("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "2.5", "3.5", "4.5", "5.5", "6.5", "7.5", "8.5", "9.5", "10.5", "11.5"); @inph = ("0", "1", "2", "3", "4"); @amps = ("0.9", "1", "1.1"); ## this value is the modifier of the nominal value, not the actual clock amplitude @amps = ("1"); ## single-amplitude ## This part changes the values if you just want a short runthough to ## test. if ($trial == 1) { @freqs = ("1", "4", "7", "14", "20"); @amps = ("1"); ## this value is the modifier of the nominal value, ## not the actual clock amplitude @inph = ("0", "1", "3"); } 236 ## The threshold (not to be confused with tolerance) is a value used # to indicate the timing of switching. Adding additional numbers # doesn?t require additional simulations, but does increase the number # of output files. The difference between any reasonable values is # small. uncomment the first line here to compare, but for practical # purposes, only the second line is needed. @thrs = ("0.6321", # "0.75"); @thrs = ("0.6321"); ## calculate the number of simulations needed. $runcounter = 1; $size_freqs = @freqs; $size_amps = @amps; $size_inph = @inph; $size_total = $size_freqs * $size_amps * $size_inph; $idx = 0; ## clear out old data my $status = system("\\rm -r datafiles/*"); my $status = system("\\rm -r points/*"); my $status = system("\\rm -r figs/*"); my $status = system("\\rm -r results/*"); my $status = system("\\rm -r latex/*"); my $status = system("\\rm -r compare*.plt"); ## run through ALL combinations of frequency and amplitude. 
foreach $zcl (@freqs) { foreach $zam (@amps) { $idx = $idx + 1; ## Run through each input phase to generate different sets of ## timing points on the same circuit for the same clock freq ## and amp. $phasecounter = 2; foreach $zpn (@inph) { ## verbose display my $status = system("clear"); print "\n ** NEW RUN ** \n"; print "** $zname **\n"; print "Frequency\t $zcl GHz\n"; print "Clock Amp\t $zam Nominal\n"; print "Input Time\t $zpn pi/6\n"; 237 print "Phase Count\t $phasecounter\n"; print "Run $runcounter out of $size_total\n\n"; $runcounter = $runcounter + 1; ## create a new script to simulate current run parameters open (fTempl, ?scan.tpl?); open (fOutpt, ?>scan.pl?); while () { chomp; $outdoc = $_; $outdoc =~ s/ZCL/$zcl/g; $outdoc =~ s/ZAM/$zam/g; $outdoc =~ s/ZPN/$zpn/g; print fOutpt "$outdoc\n"; } close(fTempl); close(fOutpt); ## run simulation (once) my $status = system("perl scan.pl"); ## The following if segment is depreciated and does not run. if ($phasecounter <= 1) { foreach $zth (@thrs) { open (fTempl, ?extract.tpl?); open (fOutpt, ?>extract.pl?); while () { chomp; $outdoc = $_; $outdoc =~ s/ZCL/$zcl/g; $outdoc =~ s/ZAM/$zam/g; $outdoc =~ s/ZTH/$zth/g; $outdoc =~ s/ZSTPT/$phasecounter/g; print fOutpt "$outdoc\n"; } close(fTempl); close(fOutpt); print ("\n\nNew Threshold: $zth\n"); ## This section needs not be modified ## my $status = system("perl extract.pl datafiles/start_${zname}_cl${zcl}_am${zam}_15_16.dat $zth $zcl"); } } $phasecounter = $phasecounter +1; } $phasecounter = 2; ## go through each threshold value and extract switching times foreach $zth (@thrs) { 238 ## create a new script to simulate current run parameters open (fTempl, ?extract.tpl?); open (fOutpt, ?>extract.pl?); while () { chomp; $outdoc = $_; $outdoc =~ s/ZCL/$zcl/g; $outdoc =~ s/ZAM/$zam/g; $outdoc =~ s/ZTH/$zth/g; $outdoc =~ s/ZSTPT/$phasecounter/g; print fOutpt "$outdoc\n"; } close(fTempl); close(fOutpt); print ("\n\nNew Threshold: $zth\n"); ## This section needs to be modified 
to match the files ## found in "print.tpl" ## my $status = system("perl extract.pl datafiles/${zname}_cl${zcl}_am${zam}_02_04.dat $zth $zcl"); my $status = system("perl extract.pl datafiles/${zname}_cl${zcl}_am${zam}_04_06.dat $zth $zcl"); my $status = system("perl extract.pl datafiles/${zname}_cl${zcl}_am${zam}_06_08.dat $zth $zcl"); } ## create a new gnuplot fitting script foreach $zth (@thrs) { open (fTempl, ?gnufit.tpl?); open (fOutpt, ?>gnufit.plt?); while () { chomp; $outdoc = $_; $outdoc =~ s/ZCL/$zcl/g; $outdoc =~ s/ZAM/$zam/g; $outdoc =~ s/ZNOM/$znom/g; $outdoc =~ s/ZTH/$zth/g; $outdoc =~ s/ZTOL/$ztol/g; $outdoc =~ s/ZICRN/$zicrn/g; $outdoc =~ s/ZJJ/$zjj/g; $outdoc =~ s/ZNAME/$zname/g; $outdoc =~ s/ZIDX/$idx/g; print fOutpt "$outdoc\n"; } close(fTempl); close(fOutpt); print ("\n Now Fitting Data to Parameters\n"); 239 ## sort data before analyzing my $status = system("sort -n -o sorted.dat points/points_cl${zcl}_am${zam}_th${zth}.dat"); ## determine first and last recorded data points my $status = system ("firstlast.sh"); ## perform the fit my $status = system ("gnuplot gnufit.plt"); } if ($verbo == 1) { use strict; use warnings; print "\nPlease press enter key to continue."; ; } } } ## output results foreach $zth (@thrs) { $fname = "compare_${zname}_v${zicrn}_th${zth}_x${ztol}.plt"; print "$fname \n"; open (fTempl, ?compare.tpl?); open (fOutpt, ?>?, $fname); while () { chomp; $outdoc = $_; $outdoc =~ s/ZNAME/$zname/g; $outdoc =~ s/ZICRN/$zicrn/g; $outdoc =~ s/ZTH/$zth/g; $outdoc =~ s/ZTOL/$ztol/g; print fOutpt "$outdoc\n"; } close(fTempl); close(fOutpt); } if ($trial == 1) { my $status = system ("evince figs/*"); } my $status = system("\\rm -r datafiles/*"); 240 B.3.2 Timing Extraction Script This script has a simple purpose. 
It analyzes the output of the simulation to determine when the output phase has crossed the given threshold value φc, determines the time difference between the two outputs, and calculates the corresponding input phase φin and phase delay Δφ.

This script reads in the data in the format produced by SPICE, discarding the simulation information provided in the file. The file is a series of time values and the phase values of the two junctions under consideration. The script checks the phase value of the first junction at every time step, noting when it has crossed the threshold value. When the threshold has been crossed, the time of crossing is interpolated from the value immediately preceding the crossing and the value immediately following it. The script flags the first junction as having crossed the threshold and waits for the second junction to cross the threshold. When the second junction crosses the threshold, the time is estimated in the same way as for the first junction. From these two time values tin and tout, and given the clock frequency f, the script calculates the input phase φin and phase delay Δφ and records these values to a file. In the script, Δφ = 2πf(tout − tin), and φin is 2πf·tin reduced to a single clock period. JTL simulations produce several data points with each execution of this script; gates produce only one.

#!/usr/bin/perl

## This is an extraction tool and generally does NOT need to be
## modified in any way.

$pi = 3.14159265358979323;

## read in data
(scalar(@ARGV) == 3) || die "SYNOPSIS: \n";
$thresh = $ARGV[1];
$frq    = $ARGV[2];
$fname  = $ARGV[0];
$stpt   = ZSTPT;

## open input file
open (dataF, $fname) || die "ERROR: Can't Open File $fname";

## skip junk lines
## The number of junk lines is set by the output format of WRspice.
$line=<dataF>;
$line=<dataF>;
$line=<dataF>;
$line=<dataF>;
$line=<dataF>;
$line=<dataF>;

## open output file (in append mode)
open (resultF, '>>points/points_clZCL_amZAM_thZTH.dat');
# open (resultG, '>>points/metric_clZCL_amZAM_thZTH.dat');
# printf resultF ("\# inPhase, DeltaPhase, OutPhase, frequency, threshold, flagtype, index\n");

## the flags keep track of phase 0 and 1 if they are in the flipped or
## unflipped state
$flag0 = 0;
$flag1 = 0;
$idx = 1;
$flag1st = 0;

while($line=<dataF>) {
    ($index, $time, $p0, $p1)=split(" ", $line);

    ## look for upward threshold on p0
    if ($p0 >= $thresh && $flag0 == 0) {
        $Dpost = $p0 - $thresh;
        $Dprior = $thresh - $p0old;
        $up0 = ($time* $Dprior + $timeold* $Dpost)/($Dpost + $Dprior);
        $flag0 = 1;
    }

    ## look for upward threshold on p1
    if ($p1 >= $thresh && $flag1 == 0) {
        $Dpost = $p1 - $thresh;
        $Dprior = $thresh - $p1old;
        $up1 = ($time* $Dprior + $timeold* $Dpost)/($Dpost + $Dprior);

        ## calculate normalized phase
        $DeltaT = $up1 - $up0;
        $DeltaP = $DeltaT * 2 * $pi * $frq * 1e9;
        $InputP = $up0 * 2 * $pi * $frq * 1e9;
        $InputP = $InputP / (2*$pi);
        $InputP = $InputP - int($InputP);
        $InputP = $InputP * 2 * $pi;
        $OutptP = $up1 * 2 * $pi * $frq * 1e9;
        $OutptP = $OutptP / (2*$pi);
        $OutptP = $OutptP - int($OutptP);
        $OutptP = $OutptP * 2 * $pi;

        ## print results to file
        ## file format: inPhase, DeltaPhase, OutPhase, frequency,
        ## threshold, flagtype, index
        if ($DeltaP >= 0 && $DeltaP <= $pi/2.0 && $InputP < $pi) {
            if ( $stpt == 0 && $flag1st == 0) {
                printf resultG ("earlyp = %f \# ZPN + \n", $InputP);
                $flag1st = 1;
            }
            if ( $stpt == 1 && $flag1st == 0) {
                printf resultG ("fitstart = %f \# ZSTPT +\n", $InputP);
                $flag1st = 1;
            }
            if ( $flag1st == 0) {
                printf resultF ("%f \t %f \t %f \t %f \t %f \t %f \t %f \n",
                                $InputP, $DeltaP, $OutptP, $frq, $thresh, $flag1, $idx);
            }
        }
        print "$InputP \t $DeltaP \t $flag1 \t $idx \n";
        $idx = $idx + 1;
        $flag1 = 1;
    }

    ## look for downward threshold on p0
    if ($p0 <= (1-$thresh) && $flag0 == 1) {
        $Dpost = (1-$thresh) - $p0;
        $Dprior = $p0old - (1-$thresh);
        $down0 = ($time* $Dprior + $timeold* $Dpost)/($Dpost + $Dprior);
        $flag0 = 0;
    }

    ## look for downward threshold on p1
    if ($p1 <= (1-$thresh) && $flag1 == 1) {
        $Dpost = (1-$thresh) - $p1;
        $Dprior = $p1old - (1-$thresh);
        $down1 = ($time* $Dprior + $timeold* $Dpost)/($Dpost + $Dprior);

        ## calculate normalized phase
        $DeltaT = $down1 - $down0;
        $DeltaP = $DeltaT * 2 * $pi * $frq * 1e9;
        $InputP = ($down0 * 2 * $pi * $frq * 1e9);
        $InputP = $InputP / (2*$pi);
        $InputP = $InputP - int($InputP);
        $InputP = $InputP * 2 * $pi - $pi;
        $OutptP = ($down1 * 2 * $pi * $frq * 1e9);
        $OutptP = $OutptP / (2*$pi);
        $OutptP = $OutptP - int($OutptP);
        $OutptP = $OutptP * 2 * $pi - $pi;

        ## print results to file
        ## file format: inPhase, DeltaPhase, OutPhase, frequency,
        ## threshold, flagtype
        if ($DeltaP >= 0 && $DeltaP <= $pi/2.0 && $InputP < $pi) {
            if ( $flag1st == 0) {
                printf resultF ("%f \t %f \t %f \t %f \t %f \t %f \t %f \n",
                                $InputP, $DeltaP, $OutptP, $frq, $thresh, $flag1, $idx);
            }
        }
        print "$InputP \t $DeltaP \t $flag1 \t $idx \n";
        $idx = $idx + 1;
        $flag1 = 0;
    }

    $p0old = $p0;
    $p1old = $p1;
    $timeold = $time;
}
close(resultF);
# close(resultG);

B.3.3 Gnuplot Fitting of Data

This gnuplot script fits the data to both (3.13) and (3.14). It also produces various graphs for analysis of the simulations.

## GNUplot fitting instructions.  Does not need to be modified.
##
## This gnuplot script does the bulk of the fitting work needed to
# extract the fitting parameters.  It is a general script which takes
# a standard input generated elsewhere in the timing extraction
# process.  Thus, it generally shouldn't need to be modified at all.
# A number of variables appear here in ALL CAPS, which means they
# will be replaced by the motherscript (generate_pulses.pl) before
# the run-copy is executed.  I have added a lot of commentary here to
# explain what is actually going on.

## Double hashes indicate comments and should not be changed.  Single
# hashes are used to comment out code that could, in principle, be
# run.  This line is the exception.

## IcRN is a fixed value for a given process.
IcRN = ZICRN                # fixed for each process, in mV

## These three are variables that change with each simulation, though
# N is generally constant for any given gate.
f = ZCL                     # in GHz
A = ZNOM*(0.72*ZAM+0.28)    ## linear fit for nominal, high, low values
N = ZJJ

## Initialize the fitting parameters to unity.  This is not just a
# mathematical convenience; in theory, these should stay at unity.
a1 = 1.0
a2 = 1.0
a3 = 1.0

## Do some math to calculate more easy-to-use parameters.
w = 2*pi*f                  ## angular frequency
ep = 1/100.0                ## fake derivative constant
t = 2.07 / (2*IcRN)         # calculate t0 from IcRN, the minimum switching time
d = N*w*t/A; d = d/1000     # fixes GHz x ps scale factor
z = acos(d-1)               # failure point
zE = acos(d+1)
# early window point (this is pretty much deprecated and not used at
# all, but is here for legacy reasons)
z1 = 1/a2*acos(d/a3-1)
# at this point z1 = z, though in principle it could change.  Again,
# something of a holdover.

## A custom digit cutoff function needed for output.  I don't want too
## many digits clogging up my results.  Results are normalized to
## values of order unity, so past the fourth digit or so no real
## information is lost.
rdx = 4
round(x) = (x != 0) ? \
    10**(floor(log10(x))-(rdx-1))*floor(0.5+x/(10**(floor(log10(x))-(rdx-1)))) \
    : 0

## Make things pretty for gnuplot.
reset
unset label
unset arrow
set print "-"
set term x11
set samples 600

## An interactive output.  Not important for big runs, but useful for
## debugging.
pr "\nParameters: "
pr "t0 = ", t, " ps", "\t", "A = ", A
pr "f = ", f, " GHz", "\t", "N = ", N
pr "a1 = ", a1, "\t", "a2 = ", a2, "\t", "a3 = ", a3
pr " "
pr "d = ", round(d)
pr "z = ", round(z), " rad \t(Failure Point)"
pr "z = ", round(zE), " rad \t(Early Failure Point)"

## Define the Taylor Series Expansion of ArcCos(x) and Cos(x) about 0
## and pi/2, respectively.
p_acos(x) = \
    pi/2 - x - \
    1/6.0 * x**3 - \
    3/40.0 * x**5 - \
    5/112.0 * x**7 - \
    35/1152.0 * x**9 - \
    63/2816.0 * x**11

# p_cos(x) = 1 - 1/2.0 * x**2 + 1/24.0 * x**4 - 1/720.0 * x**6 +
#            1/40320.0 * x**8 - 1/3628800.0 * x**10
# legacy fit about zero.  Improved by the following:
p_cos(x) = \
    -(x-pi/2) + \
    1.0/6 * (x-pi/2)**3 - \
    1.0/120 * (x-pi/2)**5 + \
    1.0/5040 * (x-pi/2)**7 - \
    1.0/362880 * (x-pi/2)**9

## Define the analytic timing equation, both in trig and taylor.
f(x) = acos(cos(x)-d)-x
p_f(x) = p_acos(p_cos(x)-d)-x

## Define three fitting values used for the fitting process.
c1=1.0
c2=1.0
c3=1.0

## Define the fit version of the timing equation.  So far, it is
## identical to f(x) and p_f(x).
g(x) = c1*(acos(cos(c2*x)-d*c3)-(c2*x))
p_g(x) = c1*(p_acos(p_cos(c2*x)-d*c3)-(c2*x))

## Define some limits.  This will plot a graph only in the range of
## the unfitted curve.
x_lim = 1.2 * z
y_lim = 1.2 * f(0)
# if (y_lim < f(0)) y_lim = 1.2*f(0)    # legacy, delete

## Do some good gnuplot stuff.  I'm really just making the graphs look
## nice.  This section isn't vital for fitting.
set xrange [0:x_lim]
set xlabel "Input Phase [rad]"
set xtics nomirror
set yrange [0:y_lim]
set ylabel "Output Phase Delay [rad]"
set ytics nomirror

## Here, I'm calculating the appropriate time (in ps) for the
## normalized time (i.e. input phase).
set x2range [0:x_lim/w*1000]
set x2label "Input Time [ps]"
set x2tics 0,floor(x_lim/w*1000/8.0)
set y2range [0:y_lim/w*1000]
set y2label "Output Time Delay [ps]"
# set y2tics 0,floor(y_lim/w*1000/8.0)
stack = (floor(y_lim/w*1000/8.0) > 1 ? floor(y_lim/w*1000/8.0) : 1)
set y2tics 0, stack

## This section uses Newton's method to calculate the point at which
## the meta-stable point is found.
x0 = 0.9*z                              ## start point
h(x) = f(x)-pi/2                        ## zero'd function
h1(x) = (h(x+ep)-h(x-ep))/(2*ep)        ## fake derivative
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)
x0 = x0 - h(x0)/h1(x0); x0 = real(x0)

## For N delta < 1 the metastable point coincides with the end of the
## timing window.  So just leave it there.
if (f(z)-pi/2 < 0) x0 = z; pr "x0 = z"
if (imag(x0)!=0) print "Stability Limit Not Found"
pr "x0 = ", round(x0), " rad \t(Stability Limit)"

## Now we're going to do the same with the stable timing point.
y0 = 0.0
h(x) = f(x)-pi/2
h1(x) = (h(x+ep)-h(x-ep))/(2*ep)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
y0 = y0 - h(y0)/h1(y0); y0 = real(y0)
if (imag(y0)!=0) print "Stability Point Not Found"
if (real(y0)< 0) print "Stability Point Probably at 0"; y0 = 0.0
pr "y0 = ", round(y0), " rad \t(Stability Point)"
pr " "
## End of message outputs

## Draw in some dots where the stability points are found.
set label at 0.9*z,f(0.9*z) point lt 0 pt 7 ps 1
set label at 0.9*z1,g(0.9*z1) point lt 0 pt 7 ps 1
set label at x0,f(x0) point lt 1 pt 7 ps 2
set label at y0,f(y0) point lt 1 pt 7 ps 2

## This part outputs the timing parameters individually in a file.
## It's a legacy part of the script and not essential.  This is
## different from the fitting parameters, which have not been
## calculated yet and are output separately.
set print "output_params.txt"
pr d, z, x0, y0
set print "-"

## END GENERAL INPUT
## This ends the setup for the fitting.  We now
## begin the actual fitting process.

## Make pretty gnuplot graphs
unset label

## Legacy stuff, delete
# fitstart = 0
# earlyp = 0
# load "points/metric_clZCL_amZAM_thZTH.dat"

## This file contains the range of fitting values.  We need to know
## over which range to fit.
load "firstlast.dat"

## Initialized fitting parameters.
c1 = 1.0
c2 = 1.0
c3 = 1.0

## fit to analytic equation, excluding the very last few points.  We
## only want the middle 90%.
fit \
    [firstpoint+0.05*(lastpoint-firstpoint):lastpoint-0.05*(lastpoint-firstpoint)] \
    g(x) "sorted.dat" using 1:2 via c1, c2, c3

## Save the fit values for later use.
b1 = c1
b2 = c2
b3 = c3

## Gnuplot doesn't work well with interdependent fitting variables.
## The Out-values here correspond with theory.
out1 = c1*c3
out2 = c2
out3 = 1.0/c3

## calculate the new cutoff point based on the fit data.
z1 = 1/c2*acos(d*c3-1)

## display the endpoint on the graph with an explicit dot
set label at z1,g(0.999*z1) point lt 2 pt 7 ps 1

## Introduce a piecewise 2nd-order polynomial with break at midway
p1(x) = (k11*x+k12)*x+k13
p2(x) = (k21*x+k22)*x+k23

## note that midway is read from the file "firstlast.dat"
w1 = midway

## Join the two polynomials in a piecewise fashion.
z2(x) = x < w1 ? p1(x) : (x < lastpoint ? p2(x) : 1/0)

## Do the fit for the middle 90%.
fit \
    [firstpoint+0.05*(lastpoint-firstpoint):lastpoint-0.05*(lastpoint-firstpoint)] \
    z2(x) "sorted.dat" using 1:2 via \
    k11, k12, k13, k21, k22, k23

## Legacy
# out1 = c1
# out2 = c2
# out3 = c3
# z1E= z1
# z1 = 1/out2*acos(d/out3-1)

## The tolerance is a factor chosen to weed out bad fits.  In
## principle, any fit works.  However, for practical reasons we may
## wish to ignore fits with excessively large or small values, as these
## tend to be hard to work with in VHDL.
tolerance = ZTOL

## gobot is a flag to check for values within the tolerance.  If any
## value is too big or too small, the value is NOT written to the
## results file (though it is written to the gnu-file for reference
## later).
gobot = 1
if (out1 > tolerance) gobot = 0
if (out2 > tolerance) gobot = 0
if (out3 > tolerance) gobot = 0
if (out1 < 1.0/tolerance) gobot = 0
if (out2 < 1.0/tolerance) gobot = 0
if (out3 < 1.0/tolerance) gobot = 0
if (gobot == 0) pr "Warning: Tolerances Exceeded"
pr ""
pr "params : \t a1 \t a2 \t a3"
pr "nu_fit : \t", out1, "\t", out2, "\t", out3
pr "polyfit: \t", b1, "\t", b2, "\t", b3
pr ""

## The fit is done and checked.  We now output the values to a file.
set print "results/TX_ZNAME_vZICRN_thZTH_xZTOL.dat" append
if (gobot == 1) pr f, A, out1, out2, out3, z1, firstpoint, lastpoint

## Also output a more human-readable version.
set print "results/TX_ZNAME_vZICRN_thZTH_xZTOL_human.dat" append
if (gobot == 1) pr f, " & ", A, " & ", round(out1), " & ", round(out2), \
    " & ", round(out3), " & ", round(z1), " & ", round(firstpoint), " & ", \
    round(lastpoint), " \\\\"

## Also output the polyfit values to a SEPARATE file.
set print "results/TX_ZNAME_vZICRN_thZTH_xZTOL_poly.dat" append
pr f, A, k11, k12, k13, k21, k22, k23, w1, firstpoint, lastpoint

## Human-readable polytext
set print "results/TX_ZNAME_vZICRN_thZTH_xZTOL_polyhuman.dat" append
pr f, " & ", round(A), " & ", round(k11), " & ", round(k12), " & ", \
    round(k13), " & ", round(k21), " & ", round(k22), " & ", round(k23), " & ", \
    round(w1), " & ", round(firstpoint), " & ", round(lastpoint), " \\\\"

idx = ZIDX

## We also want the values easily accessible in gnuplot.  The
## following outputs the same data in a gnuplot-readable file.
set print "results/gnu_ZNAME_vZICRN_thZTH_xZTOL.dat" append
pr \
    "f", idx, " = ", f, \
    "; A", idx, " = ", A, \
    "; outA", idx, " = ", round(out1), \
    "; outB", idx, " = ", round(out2), \
    "; outC", idx, " = ", round(out3), \
    "; z", idx, " = ", round(z1), \
    "; fp", idx, " = ", round(firstpoint), \
    "; lp", idx, " = ", round(lastpoint), "\n", \
    "nu_", idx, "(x) = ", c1, \
    "*(acos(cos(", c2, "*x)-", d*c3, ")-(", c2, "*x))\n", \
    "k11_", idx, " = ", round(k11), \
    "; k12_", idx, " = ", round(k12), \
    "; k13_", idx, " = ", round(k13), \
    "; k21_", idx, " = ", round(k21), \
    "; k22_", idx, " = ", round(k22), \
    "; k23_", idx, " = ", round(k23), \
    "; w1_", idx, " = ", round(w1), "\n", \
    "poly_", idx, "(x) = x < ", round(w1), " ? (", k11, "*x+", k12, ")*x+", k13, " : (x "maxidx = ", idx, "\n"

## check the limits of the functions to make sure we get a good view
## of the whole thing
ylim = (g(0) > f(0)) ? g(0) : f(0)
set yrange [0:1.2*ylim]
x_lim = (lastpoint > z1) ? lastpoint : z1
x_lim = 1.1 * x_lim
x_lim = x_lim > pi ? pi : x_lim

## display the relevant data on the graph
set label sprintf("a_1 = %g", out1) at x_lim/4.0, 1.0*ylim
set label sprintf("a_2 = %g", out2) at x_lim/4.0, 0.9*ylim
set label sprintf("a_3 = %g", out3) at x_lim/4.0, 0.8*ylim

## calculate a few values for putting the values on the figure
x_loc = 0.7*x_lim
y_loc = ylim*0.8
delta = ylim*0.07

## output the circuit run parameters for reference.
set label sprintf("f = %g GHz", f) at x_loc, y_loc-delta
# set label sprintf("$\\mathrm{t_0}$ = %g ps", t) at x_loc, y_loc-delta
set label sprintf("A = %g", A) at x_loc, y_loc-2*delta
set label sprintf("N = %g", N) at x_loc, y_loc-3*delta
set label sprintf("z = %g rad", round(z1)) at x_loc, y_loc-4*delta
set label at firstpoint,g(firstpoint) point lt 2 pt 13 ps 1
set label at lastpoint,g(lastpoint) point lt 2 pt 13 ps 1
# set arrow from firstpoint,0 to firstpoint,g(firstpoint) if
# (lastpoint

trun1(x) = x > -g1 ? (n20(x+g1)-n8(x+g1))+g2 : 1/0
trun2(x) = x > -k1 ? (m20(x+k1)-m8(x+k1))+k2 : 1/0
trun3(x) = x > -l1 ? (h20(x+l1)-h8(x+l1))+l2-25 : 1/0

## cosmetic appearance
set yrange [20:130]
set label at 5.2, 120 "0.5 GHz"
set label at 5.2, 80 "1.0 GHz"
set label at 5.2, 52 "1.5 GHz"
# set border 3
set xtics nomirror
set ytics nomirror

## plot the results
plot \
    "dat_twoout.dat" u 1:2:($3/2) w yerrorbars ls 0 pt 7, \
    "dat_twoout2.dat" u 1:($2+10):($3/2) w yerrorbars ls 0 pt 7, \
    "dat_twoout3.dat" u 1:($2-25):($3/2) w yerrorbars ls 0 pt 7, \
    trun1(x) ls 1 lw 3, \
    trun2(x)+10 ls 1 lw 3, \
    trun3(x) ls 1 lw 3, \
    (An20(x+Ag1)-An8(x+Ag1))+Ag2 ls 3 lw 1, \
    (Am20(x+Ak1)-Am8(x+Ak1))+Ak2+10 ls 3 lw 1, \
    (Ah20(x+Al1)-Ah8(x+Al1))+Al2-25 ls 3 lw 1

## Output the data in a useful fashion.
pr ""
pr "g1 = ", round(g1), " g2 = ", round(g2), " g3 = ", round(g3), " g4 = ", round(g4)
pr "k1 = ", round(k1), " k2 = ", round(k2), " k3 = ", round(k3), " k4 = ", round(k4)
pr "l1 = ", round(l1), " l2 = ", round(l2), " l3 = ", round(l3), " l4 = ", round(l4)
pr ""

set print "two_out.tex"
pr " & & 0.5 GHz & 1.0 GHz & 1.5 GHz \\\\ \\hline"
pr "\\multirow{4}{*}{Red Fit} & $\\gamma_1$ & ", \
    round(1.0/g3), " & ", round(1.0/k3), " & ", \
    round(1.0/l3), " \\\\"
pr " & $\\gamma_2$ & ", round(g1), " &", round(k1), \
    " & ", round(l1), " \\\\"
pr " & $\\gamma_3$ & ", round(g2), " & ", round(k2), \
    " & ", round(l2), " \\\\"
pr " & $\\gamma_4$ & ", round(g4/g3), " & ", \
    round(k4/k3), " & ", round(l4/l3), " \\\\ \\hline"
pr "\\multirow{4}{*}{Blue Fit} & $\\gamma_1$ & ", \
    round(1.0/Ag3), " & ", round(1.0/Ak3), " & ", \
    round(1.0/Al3), " \\\\"
pr " & $\\gamma_2$ & ", round(Ag1), " &", round(Ak1), \
    " & ", round(Al1), " \\\\"
pr " & $\\gamma_3$ & ", round(Ag2), " & ", round(Ak2), \
    " & ", round(Al2), " \\\\"
pr " & $\\gamma_4$ & ", round(Ag4/Ag3), " & ", \
    round(Ak4/Ak3), " & ", round(Al4/Al3), " \\\\ \\hline"

# set xrange [0:2*pi]
set term epslatex color
set output "fig_twoexp.tex"
replot
set term x11
system "epstopdf fig_twoexp.eps"
set term epslatex color standalone
set output "fig_twoexp_sa.tex"
replot
set term x11
set print "-"
pr "latex fig_twoexp_sa.tex; dvips fig_twoexp_sa; epstopdf fig_twoexp_sa.ps"

D.4 Calculation of Depressed IcRN Product

Experiment 2 in Chapter 5 indicated that at a known calibration point, Ib = Ic, the measurement of the bias current through the junctions could confirm the power delivered to the chip. The results matched expectations, given that the attenuators in the test setup are accurate only to approximately 10%. This reference was subsequently used in the calculations for Experiment 3. When adjusted using this calibration, the measured IcRN product of the experimental chip was 0.770 mV.

Alternatively, I can ignore this correction and proceed with the calculations. The results are shown in Table D.2. In this case, the average value of the timing parameter t0 is 0.77 ps and the average value of IcRN is 0.44 mV. Just as importantly, as the frequency increases the lower bias current limit increases to close to A = 1, which one would expect close to the highest operating frequency. Assuming a depression of the IcRN product by almost one-half, the maximum frequency of the circuits in the experiment would decrease by the same factor. This fits the observation that the maximum operating frequency in the experiments was 2.5 GHz, despite a design for a maximum frequency of 5 GHz.

While the IcRN product is generally constant for a given process, it can effectively be changed by a different βc value. The βc value is determined both by design and by fabrication. The value of the shunt resistor RN can be chosen to essentially scale the IcRN product and change the switching time of the junctions. However, misalignment during the fabrication process can change the resistor values from the design, thus slowing down the junctions. A test of the chip fabrication properties requires specialized circuits, and the results can depend greatly on the location of the chip on the wafer.
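The averages and the frequency scaling quoted in this section can be checked with a few lines of arithmetic. The sketch below is in Python and uses the t0 and IcRN rows transcribed from Table D.2. It rests on two assumptions that are not stated in the text: that the tabulated pairs follow IcRN·t0 ≈ Φ0/2π, a relation inferred here only from the tabulated numbers, and that the maximum frequency scales linearly with IcRN. The variable and function names are illustrative.

```python
import math

PHI0_MV_PS = 2.0678  # magnetic flux quantum in mV*ps

# Rows transcribed from Table D.2 (f = 1.0, 1.5, 2.0, 2.5 GHz).
t0_ps = [1.007, 0.771, 0.620, 0.684]
icrn_mv = [0.325, 0.425, 0.529, 0.479]

DESIGN_ICRN_MV = 0.75    # IcRN value assumed for design
DESIGN_FMAX_GHZ = 5.0    # designed maximum frequency


def mean(values):
    return sum(values) / len(values)


def icrn_from_t0(t0):
    """IcRN in mV from t0 in ps, assuming IcRN * t0 = Phi0 / (2*pi)."""
    return PHI0_MV_PS / (2 * math.pi * t0)


avg_t0 = mean(t0_ps)
avg_icrn = mean(icrn_mv)

# Assumed linear scaling of the maximum frequency with IcRN.
scale = avg_icrn / DESIGN_ICRN_MV
fmax_scaled = DESIGN_FMAX_GHZ * scale

print(f"average t0   = {avg_t0:.3f} ps")
print(f"average IcRN = {avg_icrn:.3f} mV")
print(f"scaled fmax  = {fmax_scaled:.2f} GHz")
for t0, icrn in zip(t0_ps, icrn_mv):
    print(f"t0 = {t0:.3f} ps -> IcRN = {icrn_from_t0(t0):.3f} mV (table: {icrn:.3f})")
```

Under these assumptions the averages reproduce the quoted 0.77 ps and 0.44 mV, and the scaled maximum frequency comes out a little under 3 GHz, the same order as the 2.5 GHz observed.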
This alternative explanation is given to account for different possible reasons for the low maximum frequency instead of only a shift in current values from those measured before calibration. Table D.2: Alternative switching time calculation results from the long, deep pipeline shift register experiment. (See Chapter 5, particularly Table 5.4, pg. 189.) Average timing parameter t0 is 0.77 ps. Average value of IcRN is 0.44 mV. Both averages have a spread of about 20%. The value of IcRN used for design was assumed to be 0.75 mV. The correction factor from Experiment 2 was not applied in this case. The lower IcRN value explains the lower maximum operating frequency of the circuits. f 1.0 1.5 2.0 2.5 GHz Pin 4.500 4.500 4.500 4.500 dBm Pout 1.800 2.400 2.700 4.100 dBm Pchip 2.700 2.100 1.800 0.400 dBm Ib/Ic 0.537 0.617 0.661 0.912 t0 1.007 0.771 0.620 0.684 ps IcRN 0.325 0.425 0.529 0.479 mV 283 Appendix E Hypres Fabrication Summary The experiments described in this thesis were performed on chips manufac- tured by Hypres, Inc. This appendix briefly summarizes the design rules for this process. The Hypres niobium integrated circuit fabrication design rules are avail- able on the Hypres website [29]. The designs described here follow the 24th revision, dated January 11, 2008. Niobium is the superconducting material in the Hypres integrated circuits. The junctions are Niobium/Aluminum-Oxide/Niobium SIS Josephson junctions fab- ricated using an in-situ trilayer over the entire wafer. Junction areas are created through photolithography and etching of the trilayer. The photolithography does not employ any size reduction. All my integrated circuits used the 4.5 kA/cm2 critical current density process. There are four superconducting niobium metal lay- ers. Josephson junctions are connected between the second and third metallization layers. Junctions are shunted by normal metal in a separate molybdenum normal metal layer. The molybdenum layer has a sheet resistance of 2.1? 0.3? 
per square. The metal layers are insulated with silicon dioxide. Josephson junctions are addi- tionally insulated by anodization of the base electrode of the trilayer. The specific capacitance of junctions is approximately 59 fF/?m2 for the 4.5kA/cm2 process. Fabrication is done on 6-inch (150 mm) diameter oxidized silicon wafers. The 284 Hypres design rules specify a number of constraints on design of integrated circuits and the accuracy of circuit elements in fabrication. These can be found in the published design rules. The physical design of integrated circuits is shown in Table E.1. This includes 11 process layers, a minimum feature size of 1?m, and current density tolerance and resistor tolerances of ?5% on chip and ?15% between runs. The maximum microstrip impedance using this process is 42?, using M0 as the signal layer, M3 as the ground plane. In this case, the width of the microstrip is 2.5?m and the spacing between the M0 signal and M0 ground is 2.5?m. Due to fabrication bias of ?0.2?m, the actual fabricated width of the microstrip is about 2.3?m wide. 285 Table E.1: Hypres fabrication design specifications. Taken from Hypres Design Rules. Some thicknesses not given or not applicable (na/ng) Layer Name Thickness Description (none) na/ng Niobium deposition M0 100? 10?m M0 paterning (holes in niobium ground plane) (none) na/ng SiO2 deposition I0 150? 15?m Contact (via) between M1 and ground plane (none) na/ng Niobium / Aluminum Oxide / Niobium trilayer deposition I1C 50? 5?m Counter-electrode (junction area) definition (none) na/ng Base electrode anodization AI 40? 5?m Anodization layer patterning M1 135? 10?m Trilayer base electrode patterning (none) 100? 10?m SiO2 deposition (none) na/ng Resistive layer deposition R2 na/ng Resistor patterning (none) 100? 10?m SiO2 deposition I1B na/ng Contact (via) between M2 and (I1A, R2, or M1) (none) na/ng Nb deposition M2 300? 20?m M2 layer patterning (none) 500? 
40?m SiO2 deposition I2 na/ng Contact (via) between M2 and M3 (none) na/ng Nb deposition M3 600? 50?m M3 layer patterning (none) na/ng Ti/Pd/Au contact metallization deposition R3 350? 60?m Contact pad patterning 286 Appendix F Spice Netlist of CLA * HNL Generated netlist of AN_chip21 .global 0 * MODEL Declarations * Found stopping cell - lm_open * Found stopping cell - junction * Found stopping cell - muind * Found stopping cell - rsj .subckt rsj a b jjmod=jj110D ic=0.25 icrn=0.7 rsh=2.8 lprsh=1.5p b0 a b phi jjmod area=ic r_sh a a1 rsh lp_rsh a1 b lprsh .ends rsj * Found stopping cell - lp * Found stopping cell - inductor * Found stopping cell - resistor * End MODEL Declarations .subckt on_input_wireup ci1 ci2 ci3 ci4 ci5 ci6 ci7 ci8 cq1 cq2 cq3 cq4 cq5 cq6 cq7 cq8 wi1 wi2 wi3 wi4 wi5 wi6 wi7 wi8 wq1 wq2 wq3 wq4 wq5 wq6 wq7 wq8 L15 wq1 cq1 1p L14 wq2 cq2 1p L13 wq3 cq3 1p L12 wq4 cq4 1p L11 wq5 cq5 1p L10 wq6 cq6 1p L9 wq7 cq7 1p L8 wq8 cq8 1p L7 wi8 ci8 1p L6 wi7 ci7 1p L5 wi6 ci6 1p L4 wi5 ci5 1p L3 wi4 ci4 1p L2 wi3 ci3 1p L1 wi2 ci2 1p 287 L0 wi1 ci1 1p .ends on_input_wireup .subckt on_output_wireup ci1 ci2 ci3 ci4 ci5 ci6 ci7 ci8 cq1 cq2 cq3 cq4 cq5 cq6 cq7 cq8 wi1 wi2 wi3 wi4 wi5 wi6 wi7 wi8 wq1 wq2 wq3 wq4 wq5 wq6 wq7 wq8 L15 wq1 cq1 1p L14 wq2 cq2 1p L13 wq3 cq3 1p L12 wq4 cq4 1p L11 wq5 cq5 1p L10 wq6 cq6 1p L9 wq7 cq7 1p L8 wq8 cq8 1p L7 wi8 ci8 1p L6 wi7 ci7 1p L5 wi6 ci6 1p L4 wi5 ci5 1p L3 wi4 ci4 1p L2 wi3 ci3 1p L1 wi2 ci2 1p L0 wi1 ci1 1p .ends on_output_wireup .subckt on_50_ohm_wps_40pin p1 p2 p3 p4 p5 p6 p7 p8 uwin R6 net49 net11 23.56 R5 net47 net13 20.62 R4 net43 net15 20.62 R3 p8 p7 58.62 R2 p6 p5 58.62 R1 p4 p3 58.62 R0 p2 p1 58.62 L16 net47 p8 1p L15 net47 p7 1p L14 net13 p6 1p L13 net13 p5 1p L12 net43 p4 1p L11 net43 p3 1p L10 net15 p2 1p L9 net15 p1 1p L8 net11 net15 1p L7 net11 net43 1p L6 net49 net13 1p L5 net49 net47 1p 288 L4 net53 net49 1p L3 net53 net11 1p L0 uwin net53 1p .ends on_50_ohm_wps_40pin .subckt n_bias_out bias c0 c1 d0 
+ d1
c0 c0 0 5.28f
K1 L1 L0 0.936529
K1d L1d L0 0.218218
L0 0 bias 1p
L1 c1 c0 13.18p
L1d d1 d0 5.25p
.ends n_bias_out

.subckt m_out_rql a b c c00 c01 c10 c11 dc0 dc1
L0 net72 net33 12.6p
L6 net41 net72 750f
L9 net57 net59 1.5p
Lf c10 c00 1f
L10 net59 net38 9.2p
L5 net61 b 962f
L4 net61 net59 962f
L3 net68 net59 750f
L2 a net68 750f
L8 net57 net70 2.56p
L7 net70 net72 1.05p
Lp0 net39 0 200f
Lc net41 c 2f
Lp4 net43 0 200f
Lp2 net45 0 200f
Lp3 net47 0 200f
Lp1 net49 0 200f
Xb2 net45 net57 rsj jjmod=Hyp5a ic=0.200 icrn=0.7 rsh=3.5 lprsh = 1.75p
Xb0 net39 net41 rsj jjmod=Hyp5a ic=0.400 icrn=0.35 rsh=875.000m lprsh = 437.5f
Xb4 net43 net61 rsj jjmod=Hyp5a ic=0.312 icrn=0.7 rsh=2.24359 lprsh = 1.121795p
Xb3 net47 net68 rsj jjmod=Hyp5a ic=0.400 icrn=0.7 rsh=1.75 lprsh = 875.000f
Xb1 net49 net70 rsj jjmod=Hyp5a ic=0.282 icrn=0.7 rsh=2.48227 lprsh = 1.241135p
XI14 net33 c11 net34 dc0 net37 n_bias_out
XI13 net38 net34 c01 net37 dc1 n_bias_out
.ends m_out_rql

.subckt m_out_squid a b c
c0 g 0 44f
c1 h 0 16f
c2 f 0 16f
c3 c f 5f
c4 d h 5f
c5 e bb 10f
K0 L0 L1 0.64
K1 L2 L3 0.64
Xb1 f g rsj jjmod=Hyp5a ic=0.168 icrn=0.42 rsh=2.5 lprsh = 1.25p
b0 h g phib0 Hyp5a area=0.168
L6 b bb 120p
L5 g a 120p
L7 c d 200f
L2 bb h 3.92p
L3 d e 5.17p
L1 e 0 5.17p
L0 f bb 3.92p
.ends m_out_squid

.subckt m_out12 a c00 c01 c10 c11 dc0 dc1 q
R1 net23 0 50
R0 0 net28 50
XI11 net77 net23 net33 net021 net022 c10 c11 dc0 net019 m_out_rql
XI10 net84 net77 net36 net030 net031 net021 net022 net019 net028 m_out_rql
XI9 net91 net84 net39 net033 net040 net030 net031 net028 net037 m_out_rql
XI8 net98 net91 net42 net042 net049 net033 net040 net037 net046 m_out_rql
XI7 net105 net98 net97 net051 net050 net042 net049 net046 net045 m_out_rql
XI6 net112 net105 net104 net066 net067 net051 net050 net045 net064 m_out_rql
XI5 net119 net112 net51 net069 net076 net066 net067 net064 net073 m_out_rql
XI4 net126 net119 net118 net078 net085 net069 net076 net073 net082 m_out_rql
XI3 net133 net126 net125 net087 net094 net078
+ net085 net082 net091 m_out_rql
XI2 net140 net133 net132 net096 net0103 net087 net094 net091 net0100 m_out_rql
XI1 net147 net140 net139 net0111 net0112 net096 net0103 net0100 net0109 m_out_rql
XI0 a net147 net146 c00 c01 net0111 net0112 net0109 dc1 m_out_rql
XI23 q net37 net33 m_out_squid
XI22 net37 net40 net36 m_out_squid
XI21 net40 net43 net39 m_out_squid
XI20 net43 net44 net42 m_out_squid
XI19 net44 net49 net97 m_out_squid
XI18 net49 net52 net104 m_out_squid
XI17 net52 net55 net51 m_out_squid
XI16 net55 net58 net118 m_out_squid
XI15 net58 net61 net125 m_out_squid
XI14 net61 net64 net132 m_out_squid
XI13 net64 net67 net139 m_out_squid
XI12 net67 net28 net146 m_out_squid
.ends m_out12

.subckt n_bias bias c0 c1 d0 d1
c0 c0 0 2.64f
L0 0 bias 1p
L1 c1 c0 6.59p
L1d d1 d0 5.25p
K1 L1 L0 0.662226
K1d L1d L0 0.218218
.ends n_bias

.subckt q_out400e_v a bias q
Xb1 net010 net050 rsj jjmod=Hyp5a ic=0.400 icrn=0.75 rsh=1.875 lprsh = 937.5f
Xb2 net035 net026 rsj jjmod=Hyp5a ic=0.200 icrn=0.42 rsh=2.1 lprsh = 1.05p
LPbias1 net011 0 1f
b3 net023 net030 phib3 Hyp5a area=0.141
Lg1 net050 0 200f
Lg2 net026 0 600f
Lg3 net023 net035 100f
Lg5 net035 q 100f
Lp net010 net012 55f
L7 net030 net011 3.5p
Lbias bias net010 9.3p
L5 net012 a 1.05p
L6 net012 net030 2.5p
.ends q_out400e_v

.subckt q_out282e a bias q
Lg0 net013 0 200f
Xb0 net08 net013 rsj jjmod=Hyp5a ic=0.282 icrn=0.75 rsh=2.659574 lprsh = 1.329787p
L3 a net08 1.5p
L4 net08 q 1.5p
Lbias bias net08 13.7p
.ends q_out282e

.subckt q_in a0 a1 q
Xb0 net020 net013 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
Lb0 net020 0 170f
L1 net013 q 200f
K2 L2 L0 0.484934
L0 0 net013 12p
L2 a1 a0 93p
.ends q_in

.subckt q_c0_io a a0 a1 c01 c11 dc0 dc1 q qv
XI4 net14 c01 net11 net13 dc1 n_bias
XI3 net19 net11 c11 dc0 net13 n_bias
XI2 net20 net14 qv q_out400e_v
XI1 a net19 net20 q_out282e
XI0 a0 a1 q q_in
.ends q_c0_io

.subckt a_jtl_chop a bias q
Lg0 net013 0 220f
Lg1 net05 0 150f
L6 net014 q 1f
Xb1 net014 net05 rsj
+ jjmod=Hyp5a ic=0.200 icrn=0.75 rsh=3.75 lprsh = 1.875p
Xb0 net021 net013 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
L3 a net021 3.0p
L4 net021 net019 3.0p
Lbias bias net019 13.4p
L5 net019 net014 2.1p
.ends a_jtl_chop

.subckt n_bias_ihm bias c0 c1 d0 d1
c0 c0 0 1.79f
L0 0 bias 1p
L1 c1 c0 4.47p
L1d d1 d0 3.67p
K1 L1 L0 0.402036
K1d L1d L0 0.260998
.ends n_bias_ihm

.subckt a_anotb a b d0 d1 q
cb b 0 58.71f
ca a 0 91.30f
Lp0 net19 net023 96f
Lg1 net018 0 172f
Lg0 net027 0 106f
Xb0 net023 net027 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
Xb1 net019 net018 rsj jjmod=Hyp5a ic=0.100 icrn=0.75 rsh=7.5 lprsh = 3.75p
K3 L10 L11 0.289
k05 L0 L5 0.0672226
k65 L5 L6 0.815404
L11 0 net031 1p
L10 d1 net044 1p
L3 b net019 2.1p
L7 net19 q 3.0p
L4 a net19 2.1p
L0 d0 net044 1p
L5 net019 0 18.355p
L6 net031 net023 19.913p
.ends a_anotb

.subckt a_jtle_chop a bias q
Lg0 net08 0 230f
Lg1 net010 0 150f
L6 net014 q 1f
Xb1 net014 net010 rsj jjmod=Hyp5a ic=0.200 icrn=0.75 rsh=3.75 lprsh = 1.875p
Xb0 net013 net08 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
L3 a net013 3.0p
L4 net013 net019 2.67p
Lbias bias net019 11.7p
L5 net019 net014 2.43p
.ends a_jtle_chop

.subckt a_or a b bias qo
L5 net10 net029 20p
L6 net10 net62 20p
L3 net050 net44 20.5p
L4 net58 net44 20.5p
L8 net51 qo 3.0p
L7 net19 net20 23.5p
L9 net059 net034 15.2p
Lbias bias net085 17.4p
Xb0 net51 net29 rsj jjmod=Hyp5a ic=0.118 icrn=0.75 rsh=6.355932 lprsh = 3.177966p
Xb1 net059 net41 rsj jjmod=Hyp5a ic=0.118 icrn=0.75 rsh=6.355932 lprsh = 3.177966p
Lp1 net20 net059 671f
Lp2 net19 net44 200f
Lp0 net19 net085 277f
Lb0 net29 0 245f
Lb1 net41 0 62f
Lp5 a net050 387f
Lp4 net51 net085 11f
Lp6 b net58 442f
Lp3 net20 net10 630f
k35 L3 L5 0.78
k46 L4 L6 0.78
LPag net034 0 1f
LPqq net050 net62 1f
LPq net58 net029 1f
.ends a_or

.subckt a_and a b bias qa
LPqq net030 net62 1f
LPq net026 net028 1f
L4 net026 net034 20.5p
L6 net58 net62 20p
L5 net58 net028 20p
L3 net030
+ net034 20.5p
L7 net19 net20 23.9p
L9 net018 qa 3.0p
Lbias bias net036 17.4p
Xb0 net036 net019 rsj jjmod=Hyp5a ic=0.128 icrn=0.375 rsh=2.929688 lprsh = 1.464844p
Xb1 net018 net049 rsj jjmod=Hyp5a ic=0.118 icrn=0.75 rsh=6.355932 lprsh = 3.177966p
Lp1 net20 net018 760f
Lp2 net19 net034 500f
Lp0 net19 net036 470f
Lg0 net019 0 170f
Lg1 net049 0 90f
Lp5 a net030 620f
Lp6 b net026 620f
Lp3 net20 net58 550f
k35 L3 L5 0.78
k46 L4 L6 0.78
.ends a_and

.subckt a_jtl_e a bias q
Lg0 net08 0 230f
Lg1 net010 0 230f
Xb1 net014 net010 rsj jjmod=Hyp5a ic=0.200 icrn=0.75 rsh=3.75 lprsh = 1.875p
Xb0 net022 net08 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
L3 a net022 3.0p
L4 net022 net019 3.3p
Lbias bias net019 11.2p
L5 net019 net014 1.8p
L6 net014 q 2.1p
.ends a_jtl_e

.subckt a_c3_b3 a b bias15 bias16 c00 c01 c10 c11 dc0 dc1 dc2 dc3 g gl gm pm1 q
L0 net070 c11 1p
XI47 bias16 net075 net070 net0153 net085 n_bias
XI46 bias15 c01 net075 net0148 net056 n_bias
XI5 net31 net54 net55 a_jtl_chop
XI40 net33 net0150 net0145 net085 net0148 n_bias_ihm
XI43 net25 net0100 net0120 net0103 net0123 n_bias_ihm
XI48 net0101 net097 c00 net059 dc1 n_bias
XI44 net36 c10 net0115 dc0 net0118 n_bias
XI22 net0149 net0145 net097 net056 net059 n_bias
XI38 net0134 net090 net0150 net0133 net0153 n_bias
XI26 net54 net0120 net090 net0123 net0133 n_bias
XI27 net047 net0115 net0100 net0118 net0103 n_bias
Xanotb a b dc2 dc3 net013 a_anotb
XI2 gm net0134 net22 a_jtle_chop
XI3 gl net0149 net34 a_jtle_chop
XI4 pm1 net0101 net32 a_jtle_chop
Xor net22 net55 net25 net49 a_or
Xand1 net32 net34 net33 net31 a_and
XI1 net49 net36 g a_jtl_e
XI6 net013 net047 q a_jtl_e
.ends a_c3_b3

.subckt a_jtle_ a bias q
Lg0 net013 0 230f
Lg1 net05 0 230f
Xb1 net023 net05 rsj jjmod=Hyp5a ic=0.200 icrn=0.75 rsh=3.75 lprsh = 1.875p
Xb0 net021 net013 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
L3 a net021 3.0p
L4 net021 net037 2.67p
Lbias bias net037 11.7p
L5 net037 net023 2.43p
L6 net023 q 2.1p
.ends
+ a_jtle_

.subckt a_and011 a b bias qa
LPqq net053 net62 1f
LPq net034 net029 1f
L4 net034 net051 20.5p
L6 net58 net62 20p
L5 net58 net029 20p
L3 net053 net051 20.5p
L7 net19 net20 12.5p
L9 net018 qa 3.0p
Lbias bias net020 13.0p
Xb0 net020 net019 rsj jjmod=Hyp5a ic=0.118 icrn=0.375 rsh=3.177966 lprsh = 1.588983p
Xb1 net018 net063 rsj jjmod=Hyp5a ic=0.118 icrn=0.75 rsh=6.355932 lprsh = 3.177966p
Lp1 net20 net018 616f
Lp2 net19 net051 76f
Lp0 net19 net020 400f
Lg0 net019 0 275f
Lg1 net063 0 96f
Lp5 a net053 450f
Lp6 b net034 450f
Lp3 net20 net58 503f
k35 L3 L5 0.78
k46 L4 L6 0.78
.ends a_and011

.subckt a_jtl a bias q
Lg0 net013 0 230f
Lg1 net010 0 230f
Xb1 net022 net010 rsj jjmod=Hyp5a ic=0.200 icrn=0.75 rsh=3.75 lprsh = 1.875p
Xb0 net021 net013 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
L3 a net021 3.0p
L4 net021 net019 3.0p
Lbias bias net019 13.4p
L5 net019 net022 2.1p
L6 net022 q 2.1p
.ends a_jtl

.subckt a_c2_b3 a a_ b b_ bias1a bias2a bias2i bias3 bias7 c00 c01 c10 c11 dc0 dc1 g1 g1a g2 g2a g2i g3 g7 g15 g16 gl gm p1 pl pm1 pm2
L0 c00 c10 1p
XI2 net222 g2i net253 a_jtl_chop
XI40 bias2a net168 net163 net165 net170 n_bias_ihm
XI41 bias1a c01 net168 net170 dc1 n_bias_ihm
XI33 net126 net153 net123 net125 net155 n_bias_ihm
XI42 net234 net198 net133 net135 net0149 n_bias
XI43 bias7 net133 c11 dc0 net135 n_bias
XI36 net231 net183 net198 net0149 net185 n_bias
XI30 net249 net193 net183 net185 net195 n_bias
XI31 net216 net188 net153 net155 net190 n_bias
XI39 bias2i net163 net158 net160 net165 n_bias
XI38 bias3 net158 net173 net175 net160 n_bias
XI35 net181 net173 net178 net180 net175 n_bias
XI34 net191 net178 net188 net190 net180 n_bias
XI17 net196 net203 net193 net195 net205 n_bias
XI28 net206 net123 net203 net205 net125 n_bias
XI12 a_ net191 net211 a_jtle_
XI11 b_ net181 net214 a_jtle_
XI1 gm net216 net217 a_jtle_chop
Xor net217 net253 net126 net218 a_or
Xand1 pm1 gl g1a net222 a_and011
Xand2 pm2 pl g2a net227 a_and011
XI4 net244 net231 p1
+ a_jtl_e
XI8 net250 net234 g2 a_jtl_e
XI7 net250 g7 g1 a_jtl_e
XI3 net227 g3 net244 a_jtl
XI13 net214 net206 net247 a_jtl
XI5 net218 net249 net250 a_jtl
XI15 net247 g15 b a_jtl
XI16 net262 g16 a a_jtl
XI14 net211 net196 net262 a_jtl
.ends a_c2_b3

.subckt a_jtl_chop_e a bias q
Lg0 net010 0 230f
Lg1 net015 0 158f
L6 net022 q 1f
Xb1 net022 net015 rsj jjmod=Hyp5a ic=0.200 icrn=0.75 rsh=3.75 lprsh = 1.875p
Xb0 net013 net010 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
L3 a net013 3.0p
L4 net013 net019 3.3p
Lbias bias net019 11.2p
L5 net019 net022 1.8p
.ends a_jtl_chop_e

.subckt m_outjtl282_400 a bias q
Lg0 net010 0 200f
Lg1 net04 0 200f
Xb1 net023 net04 rsj jjmod=Hyp5a ic=0.400 icrn=0.75 rsh=1.875 lprsh = 937.5f
Xb0 net013 net010 rsj jjmod=Hyp5a ic=0.282 icrn=0.75 rsh=2.659574 lprsh = 1.329787p
L3 a net013 1.5p
L4 net013 net027 1.5p
Lbias bias net027 6.2p
L5 net023 net027 1.05p
L6 net023 q 1.05p
.ends m_outjtl282_400

.subckt a_andor011 a b bias qa qo
LPqq net060 net022 1f
LPq net62 net031 1f
L5 net58 net031 20p
L6 net58 net022 20p
L3 net060 net035 20.5p
L4 net62 net035 20.5p
L8 net055 qo 3.0p
L7 net19 net20 20.0p
L9 net066 qa 3.0p
Lbias bias net51 11.8p
Xb0 net055 net29 rsj jjmod=Hyp5a ic=0.118 icrn=0.75 rsh=6.355932 lprsh = 3.177966p
Xb1 net066 net41 rsj jjmod=Hyp5a ic=0.118 icrn=0.75 rsh=6.355932 lprsh = 3.177966p
Lp1 net20 net066 760f
Lp2 net19 net035 500f
Lp0 net19 net51 470f
Lg0 net29 0 140f
Lg1 net41 0 90f
Lp5 a net060 620f
Lp4 net055 net51 1f
Lp6 b net62 620f
Lp3 net20 net58 550f
k35 L3 L5 0.78
k46 L4 L6 0.78
.ends a_andor011

.subckt a_c5_b5 a b c00 c01 c10 c11 dc0 dc1 dc2 dc3 q
XI31 b net072 net0110 a_jtl_chop_e
XI32 a net067 net0107 a_jtl_chop_e
XI51 net067 net068 net063 net065 net070 n_bias
XI50 net072 c00 net068 net070 net0123 n_bias
XI22 net059 net051 q m_outjtl282_400
XI47 net056 net078 net053 net081 net096 n_bias
XI48 net051 c11 net078 dc0 net081 n_bias
XI21 net062 net056 net059 a_jtle_
XI11 net49 net36 net062 a_jtl_e
XI38 net0134
+ net0145 net053 net0133 net0153 n_bias
XI26 net54 c01 net092 net0123 dc1 n_bias
XI40 net33 net092 net0145 net0153 net065 n_bias_ihm
Xandor net0110 net0107 net33 net31 net087 a_andor011
XI44 net36 c10 net063 net096 net0133 n_bias
Xanotb net22 net55 dc2 dc3 net49 a_anotb
XI1 net087 net0134 net22 a_jtl
XI2 net31 net54 net55 a_jtl
.ends a_c5_b5

.subckt a_jtle_e a bias q
Lg0 net09 0 250f
Lg1 net011 0 250f
Xb1 net014 net011 rsj jjmod=Hyp5a ic=0.200 icrn=0.75 rsh=3.75 lprsh = 1.875p
Xb0 net013 net09 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
L3 a net013 3.0p
Lbias bias net022 9.9p
L4 net013 net022 3.0p
L5 net022 net014 2.1p
L6 net014 q 2.1p
.ends a_jtle_e

.subckt a_c5_b0 b c00 c01 c10 c11 dc0 dc1 dc2 dc3 q
L1 dc2 dc3 1p
L0 net042 c11 1p
XI31 b net038 net033 a_jtl_e
XI50 net038 c00 net035 net037 dc1 n_bias
XI21 net060 net051 net041 a_jtle_
XI48 net046 net042 net047 dc0 net050 n_bias
XI47 net051 net047 net0150 net050 net069 n_bias
XI22 net041 net046 q m_outjtl282_400
XI1 net033 net0134 net49 a_jtle_e
XI11 net49 net36 net060 a_jtle_e
XI38 net0134 c01 net0150 net0123 net037 n_bias
XI44 net36 c10 net035 net069 net0123 n_bias
.ends a_c5_b0

.subckt a_c4_b5 _gl6 _gl7 a a_ c00 c01 c10 c11 dc0 dc1 g gl gl6_ gl7_ gm pm
XI5 net281 net220 net280 a_jtl_chop
XI23 net190 net186 net181 net189 net184 n_bias
XI24 net185 net181 net236 net184 net239 n_bias
XI28 net180 net176 c01 net179 dc1 n_bias
XI43 net195 net217 c10 dc0 net219 n_bias_ihm
XI40 net283 net227 net197 net199 net229 n_bias_ihm
XI22 net205 c00 net202 net204 net189 n_bias
XI55 net210 net222 net207 net209 net224 n_bias
XI26 net220 net207 net217 net219 net209 n_bias
XI38 net225 net197 net222 net224 net199 n_bias
XI16 net293 net202 net227 net229 net244 n_bias
XI18 net245 c11 net246 net244 net249 n_bias
XI25 net276 net236 net176 net239 net179 n_bias
XI19 net273 net246 net186 net249 net204 n_bias
XI13 gl net245 net253 a_jtl_e
XI10 a_ net185 net259 a_jtle_e
XI12 pm net180 net256 a_jtle_e
XI11 gm net190
+ net262 a_jtle_e
XI1 net259 net210 a a_jtle_
XI14 gl6_ net273 _gl6 a_jtl
XI15 gl7_ net276 _gl7 a_jtl
Xand1 net291 net294 net283 net281 a_and
Xor net297 net280 net195 g a_or
XI4 net256 net205 net291 a_jtle_chop
XI3 net253 net293 net294 a_jtle_chop
XI2 net262 net225 net297 a_jtle_chop
.ends a_c4_b5

.subckt a_c4_b6 _gl7 a a_ c00 c01 c10 c11 dc0 dc1 g gl gl7_ gm pm
XI5 net281 net220 net280 a_jtl_chop
XI23 net190 net186 net181 net189 net184 n_bias
XI24 net185 net181 net236 net184 net239 n_bias
XI28 net180 net236 c01 net239 dc1 n_bias
XI43 net195 net217 c10 dc0 net219 n_bias_ihm
XI40 net283 net227 net197 net199 net229 n_bias_ihm
XI22 net205 c00 net202 net204 net189 n_bias
XI55 net210 net222 net207 net209 net224 n_bias
XI26 net220 net207 net217 net219 net209 n_bias
XI38 net225 net197 net222 net224 net199 n_bias
XI16 net293 net202 net227 net229 net244 n_bias
XI18 net245 c11 net246 net244 net249 n_bias
XI19 net273 net246 net186 net249 net204 n_bias
XI13 gl net245 net253 a_jtl_e
XI10 a_ net185 net259 a_jtle_e
XI12 pm net180 net256 a_jtle_e
XI11 gm net190 net262 a_jtle_e
XI1 net259 net210 a a_jtle_
XI14 gl7_ net273 _gl7 a_jtl
Xand1 net291 net294 net283 net281 a_and
Xor net297 net280 net195 g a_or
XI4 net256 net205 net291 a_jtle_chop
XI3 net253 net293 net294 a_jtle_chop
XI2 net262 net225 net297 a_jtle_chop
.ends a_c4_b6

.subckt a_c4_b7 a a_ c00 c01 c10 c11 dc0 dc1 g gl gm pm
XI5 net281 net220 net280 a_jtl_chop
XI23 net190 net186 net181 net189 net184 n_bias
XI24 net185 net181 net236 net184 net239 n_bias
XI28 net180 net236 c01 net239 dc1 n_bias
XI43 net195 net217 c10 dc0 net219 n_bias_ihm
XI40 net283 net227 net197 net199 net229 n_bias_ihm
XI22 net205 c00 net202 net249 net189 n_bias
XI55 net210 net222 net207 net209 net224 n_bias
XI26 net220 net207 net217 net219 net209 n_bias
XI38 net225 net197 net222 net224 net199 n_bias
XI16 net293 net202 net227 net229 net244 n_bias
XI18 net245 c11 net186 net244 net249 n_bias
XI13 gl net245 net253 a_jtl_e
XI10 a_ net185 net259 a_jtle_e
XI12
+ pm net180 net256 a_jtle_e
XI11 gm net190 net262 a_jtle_e
XI1 net259 net210 a a_jtle_
Xand1 net291 net294 net283 net281 a_and
Xor net297 net280 net195 g a_or
XI4 net256 net205 net291 a_jtle_chop
XI3 net253 net293 net294 a_jtle_chop
XI2 net262 net225 net297 a_jtle_chop
.ends a_c4_b7

.subckt a_c4_b4 _gl5 _gl6 _gl7 a a_ bias1 bias10 c00 c01 c10 c11 dc0 dc1 g g_ gl5_
L0 net241 c11 1p
XI31 bias1 c00 net066 net068 net0108 n_bias
XI23 net190 net186 net181 net204 net184 n_bias
XI24 net185 net181 net236 net184 net239 n_bias
XI28 net180 net176 c01 net179 dc1 n_bias
XI55 net210 net222 c10 dc0 net224 n_bias
XI38 net225 net066 net222 net224 net068 n_bias
XI25 net276 net236 net176 net239 net179 n_bias
XI19 net273 net0100 net186 net249 net204 n_bias
XI30 bias10 net241 net0100 net0108 net249 n_bias
XI10 a_ net185 net259 a_jtle_e
XI11 g_ net190 net262 a_jtle_e
XI2 net262 net225 g a_jtle_
XI16 gl7_ net180 _gl7 a_jtle_
XI1 net259 net210 a a_jtle_
XI14 gl5_ net273 _gl5 a_jtl
XI15 gl6_ net276 _gl6 a_jtl
.ends a_c4_b4

.subckt a_c4_b3 _gl5 _gl6 a a_ bias1 bias10 c00 c01 c10 c11 dc0 dc1 g g1 g10 g_ gl5_ gl6_
L0 net186 c11 1p
XI31 bias1 c00 net052 net054 net077 n_bias
XI15 gl5_ net276 _gl5 a_jtl
XI23 net190 net059 net236 net249 net239 n_bias
XI28 net180 net176 c01 net179 dc1 n_bias
XI38 net225 net052 c10 dc0 net054 n_bias
XI30 bias10 net186 net059 net077 net249 n_bias
XI25 net276 net236 net176 net239 net179 n_bias
XI10 g_ g10 net259 a_jtle_e
XI11 a_ net190 net262 a_jtle_e
XI2 net262 net225 g a_jtle_
XI16 gl6_ net180 _gl6 a_jtle_
XI1 net259 g1 a a_jtle_
.ends a_c4_b3

.subckt a_c4_b2 _gl5 a a_ bias1 bias10 c00 c01 c10 c11 dc0 dc1 g g1 g10 g_ gl5_
L0 net186 c11 1p
XI30 bias10 net186 net059 net052 net249 n_bias
XI31 bias1 c00 net055 net057 net052 n_bias
XI23 net190 net059 net236 net249 net239 n_bias
XI28 net180 net236 c01 net239 dc1 n_bias
XI38 net225 net055 c10 dc0 net057 n_bias
XI10 g_ g10 net259 a_jtle_e
XI11 a_ net190 net262 a_jtle_e
XI2 net262 net225 g a_jtle_
XI1 net259 g1 a a_jtle_
XI16 gl5_ net180 _gl5 a_jtl
.ends a_c4_b2

.subckt a_c4_b1 a a_ c00 c01 c10 c11 dc0 dc1 g g1 g10 g_
L0 net186 c11 1p
XI23 net190 net186 c01 net249 dc1 n_bias
XI38 net225 c00 c10 dc0 net249 n_bias
XI10 g_ g10 net259 a_jtle_e
XI11 a_ net190 net262 a_jtle_e
XI2 net262 net225 g a_jtle_
XI1 net259 g1 a a_jtle_
.ends a_c4_b1

.subckt a_c4_b0 a a_ c00 c01 c10 c11 dc0 dc1
XI24 net185 c11 c01 net224 dc1 n_bias
XI55 net210 c00 c10 dc0 net224 n_bias
XI10 a_ net185 net259 a_jtle_e
XI1 net259 net210 a a_jtle_
.ends a_c4_b0

.subckt a_c3_b0 a b c00 c01 c10 c11 dc0 dc1 dc2 dc3 q
L0 c01 c11 1p
XI27 net047 c10 c00 dc0 dc1 n_bias
Xanotb a b dc2 dc3 net013 a_anotb
XI1 net013 net047 q a_jtl_e
.ends a_c3_b0

.subckt a_c3_b2 a b bias16 c00 c01 c10 c11 dc0 dc1 dc2 dc3 g g_ q
L0 net045 c11 1p
XI46 bias16 c01 net045 net034 dc1 n_bias
XI2 g_ net0134 net49 a_jtle_
XI44 net36 c10 net0115 dc0 net0118 n_bias
XI38 net0134 net0100 c00 net0123 net034 n_bias
XI27 net047 net0115 net0100 net0118 net0123 n_bias
Xanotb a b dc2 dc3 net013 a_anotb
XI1 net013 net047 q a_jtl_e
XI6 net49 net36 g a_jtl_e
.ends a_c3_b2

.subckt a_c3_b4 a b bias8 bias15 bias16 c00 c01 c10 c11 dc0 dc1 dc2 dc3 g g4 gl gm pm1 q
L0 net080 c11 1p
XI47 bias16 net075 net080 net0148 net059 n_bias
XI48 bias15 c01 net075 net068 net062 n_bias
XI5 net31 net54 net55 a_jtl_chop
XI40 net33 net0150 net0145 net0153 net0148 n_bias_ihm
XI43 net25 net0100 net0120 net0103 net0123 n_bias_ihm
XI45 bias8 net072 c00 net062 dc1 n_bias
XI44 net36 c10 net0115 dc0 net0118 n_bias
XI22 net0149 net0145 net072 net059 net068 n_bias
XI38 net0134 net090 net0150 net0133 net0153 n_bias
XI26 net54 net0120 net090 net0123 net0133 n_bias
XI27 net047 net0115 net0100 net0118 net0103 n_bias
Xanotb a b dc2 dc3 net013 a_anotb
XI2 gm net0134 net22 a_jtle_chop
XI3 gl net0149 net34 a_jtle_chop
XI4 pm1 g4 net32 a_jtle_chop
Xor net22 net55 net25 net49 a_or
Xand1 net32 net34 net33 net31 a_and
XI1 net013 net047 q a_jtl_e
XI6 net49 net36 g a_jtl_e
.ends a_c3_b4

.subckt a_c3_b7 a b
+ c00 c01 c10 c11 dc0 dc1 dc2 dc3 g g4 g8 g9 gl gm p pl pm1 pm2 q
L0 c01 c11 1p
XI40 net33 net0150 net0145 net0153 net0148 n_bias_ihm
XI43 net25 net0100 net0120 net0103 net0123 n_bias_ihm
XI39 net27 net0125 net0110 net0138 net0133 n_bias_ihm
XI42 net016 net072 c00 net068 dc1 n_bias
XI33 net094 net090 net0125 net093 net0138 n_bias
XI44 net36 c10 net0115 dc0 net0118 n_bias
XI22 net0149 net0145 net072 net0148 net068 n_bias
XI38 net0134 net0110 net0150 net0133 net0153 n_bias
XI26 net54 net0120 net090 net0123 net093 n_bias
XI27 net047 net0115 net0100 net0118 net0103 n_bias
Xanotb a b dc2 dc3 net013 a_anotb
XI8 pl g8 net018 a_jtle_
XI9 net018 g9 net30 a_jtl_chop
XI5 net31 net54 net55 a_jtl_chop
XI2 gm net0134 net22 a_jtle_chop
XI3 gl net0149 net34 a_jtle_chop
XI4 pm1 g4 net32 a_jtle_chop
XI7 pm2 net016 net29 a_jtle_chop
Xor net22 net55 net25 net49 a_or
Xand1 net32 net34 net33 net31 a_and
Xand2 net29 net30 net27 net28 a_and
XI1 net013 net047 q a_jtl_e
XI10 net28 net094 p a_jtl_e
XI6 net49 net36 g a_jtl_e
.ends a_c3_b7

.subckt a_c3_b5 a b bias8 c00 c01 c10 c11 dc0 dc1 dc2 dc3 g g4 g8 g9 gl gm p pl pm1 pm2 q
XI40 net33 net0150 net0145 net0153 net0148 n_bias_ihm
XI43 net25 net0100 net0120 net0103 net0123 n_bias_ihm
XI39 net27 net0125 net0110 net0138 net0133 n_bias_ihm
XI42 net016 net072 net073 net068 net074 n_bias
XI33 net094 net090 net0125 net093 net0138 n_bias
XI45 bias8 net073 c00 net074 dc1 n_bias
XI44 net36 c10 net0115 dc0 net0118 n_bias
XI22 net0149 net0145 net072 net0148 net068 n_bias
XI38 net0134 net0110 net0150 net0133 net0153 n_bias
XI26 net54 net0120 net090 net0123 net093 n_bias
XI27 net047 net0115 net0100 net0118 net0103 n_bias
L0 c01 c11 1p
Xanotb a b dc2 dc3 net013 a_anotb
XI8 pl g8 net018 a_jtle_
XI9 net018 g9 net30 a_jtl_chop
XI5 net31 net54 net55 a_jtl_chop
XI2 gm net0134 net22 a_jtle_chop
XI3 gl net0149 net34 a_jtle_chop
XI4 pm1 g4 net32 a_jtle_chop
XI7 pm2 net016 net29 a_jtle_chop
Xor net22 net55 net25 net49 a_or
Xand1 net32 net34 net33 net31 a_and
Xand2 net29 net30 net27 net28 a_and
XI1 net013 net047 q a_jtl_e
XI10 net28 net094 p a_jtl_e
XI6 net49 net36 g a_jtl_e
.ends a_c3_b5

.subckt a_jtle_chop_e a bias q
Xb1 net014 net09 rsj jjmod=Hyp5a ic=0.200 icrn=0.75 rsh=3.75 lprsh = 1.875p
Xb0 net021 net013 rsj jjmod=Hyp5a ic=0.141 icrn=0.75 rsh=5.319149 lprsh = 2.659575p
Lg0 net013 0 250f
Lg1 net09 0 250f
L6 net014 q 1f
L3 a net021 3.0p
Lbias bias net023 9.9p
L4 net021 net023 3.0p
L5 net023 net014 2.1p
.ends a_jtle_chop_e

.subckt q_c0_b7 _b a a_ b c00 c01 c10 c11 dc0 dc1
XI13 net89 net0114 a a_jtle_chop_e
XI12 net73 net0119 b a_jtle_chop_e
XI7 net70 net0104 _b a_jtle_e
XI6 net73 net0109 net70 a_jtle_e
XI5 net76 net0129 net73 a_jtle_e
XI4 net078 net0139 net76 a_jtle_e
XI3 net86 net099 net078 a_jtle_e
XI2 net89 net0124 net86 a_jtle_e
XI1 net92 net088 net89 a_jtle_e
XI0 a_ net091 net92 a_jtle_e
XI8 net091 net097 net34 net079 net0100 n_bias
XI10 net099 net60 net65 net67 net62 n_bias
XI17 net0104 net39 net60 net62 net22 n_bias
XI18 net0109 net19 net50 net22 net42 n_bias
XI15 net0114 net097 net096 net0100 net37 n_bias
XI14 net0119 c11 net34 dc0 net47 n_bias
XI11 net0124 net096 net19 net37 net67 n_bias
XI19 net0129 net39 c00 net42 net52 n_bias
XI9 net088 c10 net65 net47 net079 n_bias
XI16 net0139 c01 net50 net52 dc1 n_bias
.ends q_c0_b7

.subckt q_c0_b6 _a _b a a_ b b_ c00 c01 c10 c11 dc0 dc1
XI13 net89 net0114 a a_jtle_chop_e
XI12 net73 net0119 b a_jtle_chop_e
XI7 net70 net0104 _b a_jtle_e
XI6 net73 net0109 net70 a_jtle_e
XI5 net76 net0129 net73 a_jtle_e
XI4 b_ net0139 net76 a_jtle_e
XI3 net86 net099 _a a_jtle_e
XI2 net89 net0124 net86 a_jtle_e
XI1 net92 net088 net89 a_jtle_e
XI0 a_ net091 net92 a_jtle_e
XI8 net091 net097 net34 net079 net0100 n_bias
XI10 net099 net60 net65 net67 net62 n_bias
XI17 net0104 net39 net60 net62 net22 n_bias
XI18 net0109 net19 net50 net22 net42 n_bias
XI15 net0114 net097 net096 net0100 net37 n_bias
XI14 net0119 c11 net34 dc0 net47 n_bias
XI11 net0124 net096 net19 net37 net67 n_bias
XI19
+ net0129 net39 c00 net42 net52 n_bias
XI9 net088 c10 net65 net47 net079 n_bias
XI16 net0139 c01 net50 net52 dc1 n_bias
.ends q_c0_b6

.subckt a_c2_b5 a a_ b b_ bias1a bias2a bias2i bias3 bias4 bias7 bias9 c00 c01 c10 c11 dc0 dc1 g1 g1a g2 g2a g2i g3 g7 gl gm p1 p3 pl pm1 pm2
XI2 net222 g2i net253 a_jtl_chop
XI44 bias4 c10 net0129 net0123 net140 n_bias
XI47 bias9 net0129 c00 net145 net200 n_bias
XI40 bias2a net168 net163 net165 net170 n_bias_ihm
XI41 bias1a c01 net168 net170 dc1 n_bias_ihm
XI33 net126 net153 net123 net125 net155 n_bias_ihm
XI37 net240 net138 c11 dc0 net0123 n_bias
XI42 net234 net198 net133 net135 net0186 n_bias
XI43 bias7 net133 net138 net140 net135 n_bias
XI36 net231 net183 net143 net200 net185 n_bias
XI30 net249 net193 net148 net150 net195 n_bias
XI31 net216 net188 net153 net155 net190 n_bias
XI39 bias2i net163 net158 net160 net165 n_bias
XI38 bias3 net158 net173 net175 net160 n_bias
XI35 net181 net173 net178 net180 net175 n_bias
XI23 net186 net148 net183 net185 net150 n_bias
XI34 net191 net178 net188 net190 net180 n_bias
XI17 net196 net203 net193 net195 net205 n_bias
XI24 net201 net143 net198 net0186 net145 n_bias
XI28 net206 net123 net203 net205 net125 n_bias
XI12 a_ net191 net211 a_jtle_
XI11 b_ net181 net214 a_jtle_
XI1 gm net216 net217 a_jtle_chop
Xor net217 net253 net126 net218 a_or
Xand1 pm1 gl g1a net222 a_and011
Xand2 pm2 pl g2a net227 a_and011
XI4 net244 net231 p1 a_jtl_e
XI8 net250 net234 g2 a_jtl_e
XI7 net250 g7 g1 a_jtl_e
XI6 net244 net240 p3 a_jtl_e
XI3 net227 g3 net244 a_jtl
XI13 net214 net206 net247 a_jtl
XI5 net218 net249 net250 a_jtl
XI15 net247 net186 b a_jtl
XI16 net262 net201 a a_jtl
XI14 net211 net196 net262 a_jtl
.ends a_c2_b5

.subckt a_c2_b4 a a_ b b_ bias1a bias2a bias2i bias3 bias4 bias7 bias9 c00 c01 c10 c11 dc0 dc1 g1 g1a g2 g2a g2i g3 g7 gl gm p1 pl pm1 pm2
XI44 bias4 c10 net0125 dc0 net0156 n_bias
XI47 bias9 net0125 c00 net145 net0149 n_bias
XI2 net222 g2i net253 a_jtl_chop
XI40 bias2a net168 net163 net165 net170
+ n_bias_ihm
XI41 bias1a c01 net168 net170 dc1 n_bias_ihm
XI33 net126 net153 net123 net125 net155 n_bias_ihm
XI42 net234 net198 net133 net135 net200 n_bias
XI43 bias7 net133 c11 net0156 net135 n_bias
XI36 net231 net183 net143 net0149 net185 n_bias
XI30 net249 net193 net148 net150 net195 n_bias
XI31 net216 net188 net153 net155 net190 n_bias
XI39 bias2i net163 net158 net160 net165 n_bias
XI38 bias3 net158 net173 net175 net160 n_bias
XI35 net181 net173 net178 net180 net175 n_bias
XI23 net186 net148 net183 net185 net150 n_bias
XI34 net191 net178 net188 net190 net180 n_bias
XI17 net196 net203 net193 net195 net205 n_bias
XI24 net201 net143 net198 net200 net145 n_bias
XI28 net206 net123 net203 net205 net125 n_bias
XI12 a_ net191 net211 a_jtle_
XI11 b_ net181 net214 a_jtle_
XI1 gm net216 net217 a_jtle_chop
Xor net217 net253 net126 net218 a_or
Xand1 pm1 gl g1a net222 a_and011
Xand2 pm2 pl g2a net227 a_and011
XI4 net244 net231 p1 a_jtl_e
XI8 net250 net234 g2 a_jtl_e
XI7 net250 g7 g1 a_jtl_e
XI3 net227 g3 net244 a_jtl
XI13 net214 net206 net247 a_jtl
XI5 net218 net249 net250 a_jtl
XI15 net247 net186 b a_jtl
XI16 net262 net201 a a_jtl
XI14 net211 net196 net262 a_jtl
.ends a_c2_b4

.subckt a_c2_b6 a a_ b b_ bias1a bias2a bias2i bias3 bias4 bias7 bias9 c00 c01 c10 c11 dc0 dc1 g1a g2 g2a g2i g3 gl gm p1 p3 pl pm1 pm2
XI47 bias9 net0114 c00 net145 net0160 n_bias
XI44 bias4 c10 net0114 net0122 net140 n_bias
XI2 net222 g2i net253 a_jtl_chop
XI40 bias2a net168 net163 net165 net170 n_bias_ihm
XI41 bias1a c01 net168 net170 dc1 n_bias_ihm
XI33 net126 net153 net123 net125 net155 n_bias_ihm
XI37 net240 net138 c11 dc0 net0122 n_bias
XI42 net234 net198 net133 net135 net200 n_bias
XI43 bias7 net133 net138 net140 net135 n_bias
XI36 net231 net183 net143 net0160 net185 n_bias
XI30 net249 net193 net148 net150 net195 n_bias
XI31 net216 net188 net153 net155 net190 n_bias
XI39 bias2i net163 net158 net160 net165 n_bias
XI38 bias3 net158 net173 net175 net160 n_bias
XI35 net181 net173 net178 net180
+ net175 n_bias
XI23 net186 net148 net183 net185 net150 n_bias
XI34 net191 net178 net188 net190 net180 n_bias
XI17 net196 net203 net193 net195 net205 n_bias
XI24 net201 net143 net198 net200 net145 n_bias
XI28 net206 net123 net203 net205 net125 n_bias
XI12 a_ net191 net211 a_jtle_
XI11 b_ net181 net214 a_jtle_
XI1 gm net216 net217 a_jtle_chop
Xor net217 net253 net126 net218 a_or
Xand1 pm1 gl g1a net222 a_and011
Xand2 pm2 pl g2a net227 a_and011
XI4 net244 net231 p1 a_jtl_e
XI8 net250 net234 g2 a_jtl_e
XI6 net244 net240 p3 a_jtl_e
XI3 net227 g3 net244 a_jtl
XI13 net214 net206 net247 a_jtl
XI5 net218 net249 net250 a_jtl
XI15 net247 net186 b a_jtl
XI16 net262 net201 a a_jtl
XI14 net211 net196 net262 a_jtl
.ends a_c2_b6

.subckt a_c2_b7 a a_ b b_ bias4 c00 c01 c10 c11 dc0 dc1 g1a g2 g2a g2i g3 gl gm p1 p3 pl pm1 pm2
XI44 bias4 c10 c00 net135 net0114 n_bias
XI2 net222 g2i net253 a_jtl_chop
XI33 net126 net153 net123 net125 net155 n_bias_ihm
XI37 net240 net138 c11 dc0 net135 n_bias
XI42 net234 net198 net138 net0114 net200 n_bias
XI36 net231 net183 net143 net145 net185 n_bias
XI30 net249 net193 net148 net150 net195 n_bias
XI31 net216 net188 net153 net155 net190 n_bias
XI35 net181 c01 net178 net180 dc1 n_bias
XI23 net186 net148 net183 net185 net150 n_bias
XI34 net191 net178 net188 net190 net180 n_bias
XI17 net196 net203 net193 net195 net205 n_bias
XI24 net201 net143 net198 net200 net145 n_bias
XI28 net206 net123 net203 net205 net125 n_bias
XI12 a_ net191 net211 a_jtle_
XI11 b_ net181 net214 a_jtle_
XI1 gm net216 net217 a_jtle_chop
Xor net217 net253 net126 net218 a_or
Xand1 pm1 gl g1a net222 a_and011
Xand2 pm2 pl g2a net227 a_and011
XI4 net244 net231 p1 a_jtl_e
XI8 net250 net234 g2 a_jtl_e
XI6 net244 net240 p3 a_jtl_e
XI3 net227 g3 net244 a_jtl
XI13 net214 net206 net247 a_jtl
XI5 net218 net249 net250 a_jtl
XI15 net247 net186 b a_jtl
XI16 net262 net201 a a_jtl
XI14 net211 net196 net262 a_jtl
.ends a_c2_b7

.subckt a_c2_b2 a a_ b b_ bias1a bias2a bias2i bias3 bias7 bias15 c00
+ c01 c10 c11 dc0 dc1 g1 g1a g2 g2i g7 g15 g16 gl gm pm1
L0 c00 c10 1p
XI2 net222 g2i net253 a_jtl_chop
XI40 bias2a net168 net163 net165 net170 n_bias_ihm
XI41 bias1a c01 net168 net170 dc1 n_bias_ihm
XI33 net126 net153 net123 net125 net155 n_bias_ihm
XI42 net234 net198 net133 net135 net150 n_bias
XI43 bias7 net133 net0125 net0127 net135 n_bias
XI44 bias15 net0125 c11 dc0 net0127 n_bias
XI30 net249 net193 net198 net150 net195 n_bias
XI31 net216 net188 net153 net155 net190 n_bias
XI39 bias2i net163 net158 net160 net165 n_bias
XI38 bias3 net158 net173 net175 net160 n_bias
XI35 net181 net173 net178 net180 net175 n_bias
XI34 net191 net178 net188 net190 net180 n_bias
XI17 net196 net203 net193 net195 net205 n_bias
XI28 net206 net123 net203 net205 net125 n_bias
XI12 a_ net191 net211 a_jtle_
XI11 b_ net181 net214 a_jtle_
XI1 gm net216 net217 a_jtle_chop
Xor net217 net253 net126 net218 a_or
Xand1 pm1 gl g1a net222 a_and011
XI8 net250 net234 g2 a_jtl_e
XI7 net250 g7 g1 a_jtl_e
XI13 net214 net206 net247 a_jtl
XI5 net218 net249 net250 a_jtl
XI15 net247 g15 b a_jtl
XI16 net262 g16 a a_jtl
XI14 net211 net196 net262 a_jtl
.ends a_c2_b2

.subckt a_c2_b1 a a_ b b_ bias1a bias2i bias15 c00 c01 c10 c11 dc0 dc1 g1 g2 g7 g15 g16 g_
L0 c00 c10 1p
XI41 bias1a c01 net168 net165 dc1 n_bias_ihm
XI42 net0136 net0115 net086 net088 net0117 n_bias
XI44 bias15 net086 c11 dc0 net088 n_bias
XI30 net249 net193 net0115 net0117 net195 n_bias
XI31 net216 net188 net123 net125 net190 n_bias
XI39 bias2i net168 net158 net175 net165 n_bias
XI35 net181 net158 net178 net180 net175 n_bias
XI34 net191 net178 net188 net190 net180 n_bias
XI17 net196 net203 net193 net195 net205 n_bias
XI28 net206 net123 net203 net205 net125 n_bias
XI12 a_ net191 net211 a_jtle_
XI11 b_ net181 net214 a_jtle_
XI1 g_ net216 net218 a_jtle_chop
XI8 net250 net0136 g2 a_jtl_e
XI7 net250 g7 g1 a_jtl_e
XI13 net214 net206 net247 a_jtl
XI5 net218 net249 net250 a_jtl
XI15 net247 g15 b a_jtl
XI16 net262 g16 a a_jtl
XI14 net211 net196
+ net262 a_jtl
.ends a_c2_b1

.subckt a_c2_b0 a a_ b b_ c00 c01 c10 c11 dc0 dc1 g15 g16
L0 c00 c10 1p
XI35 net181 c01 net178 net180 dc1 n_bias
XI34 net191 net178 net123 net125 net180 n_bias
XI17 net196 net203 c11 dc0 net205 n_bias
XI28 net206 net123 net203 net205 net125 n_bias
XI12 a_ net191 net211 a_jtle_
XI11 b_ net181 net214 a_jtle_
XI13 net214 net206 net247 a_jtl
XI15 net247 g15 b a_jtl
XI16 net262 g16 a a_jtl
XI14 net211 net196 net262 a_jtl
.ends a_c2_b0

.subckt a_c1_b5 a5 b5 bias5 bias7 bias10 bias11 c00 c01 c10 c11 dc0 dc1 g1_chop g2 g3 g5 g7 g10 g11 p1_chop p2_chop p3_chop p4
XI29 bias11 net0124 net92 net94 net0126 n_bias
XI30 bias7 net0129 net97 net99 net0131 n_bias
XI31 bias5 net0134 net102 net104 net0136 n_bias
L0 c01 c11 1p
XI14 net85 c00 net82 net84 dc1 n_bias_ihm
XI28 bias10 net0109 net112 net114 net0111 n_bias
XI27 net138 net92 c10 dc0 net94 n_bias
XI26 net164 net97 net0124 net0126 net99 n_bias
XI25 net135 net102 net0129 net0131 net104 n_bias
XI24 net144 net122 net0134 net0136 net124 n_bias
XI17 net110 net112 net107 net109 net114 n_bias
XI16 net156 net127 net0109 net0111 net129 n_bias
XI22 net141 net107 net117 net119 net109 n_bias
XI23 net153 net117 net122 net124 net119 n_bias
XI15 net130 net82 net127 net129 net84 n_bias
XI10 net145 g10 p4 a_jtl_e
XI13 net154 net135 g3 a_jtl_e
XI12 net154 net138 g2 a_jtl_e
XI3 net157 net141 net142 a_jtl
XI4 net157 net144 net145 a_jtl
XI2 net158 net130 net148 a_jtl
XI5 net148 g5 net151 a_jtl
XI6 net148 net153 net154 a_jtl
XI1 net162 net156 net157 a_jtl
Xandor a5 b5 net85 net158 net162 a_andor011
XI9 net145 net164 p3_chop a_jtl_chop_e
XI8 net142 net110 p2_chop a_jtl_chop_e
XI11 net151 g11 g1_chop a_jtl_chop_e
XI7 net142 g7 p1_chop a_jtl_chop_e
.ends a_c1_b5

.subckt a_c1_b6 a5 b5 bias5 bias7 bias10 bias11 c00 c01 c10 c11 dc0 dc1 g2 g3 g10 p2_chop p3_chop p4
L0 c01 c11 1p
XI29 bias11 net0124 net92 net94 net0126 n_bias
XI30 bias7 net0129 net97 net99 net0131 n_bias
XI31 bias5 net0134 net102 net104 net0136 n_bias
XI14
+ net85 c00 net82 net84 dc1 n_bias_ihm
XI28 bias10 net059 net112 net0111 net0133 n_bias
XI27 net138 net92 c10 dc0 net94 n_bias
XI26 net164 net97 net0124 net0126 net99 n_bias
XI25 net135 net102 net0129 net0131 net104 n_bias
XI24 net144 net122 net0134 net0136 net124 n_bias
XI17 net110 net112 net107 net109 net0111 n_bias
XI16 net156 net127 net059 net0133 net129 n_bias
XI22 net141 net107 net117 net119 net109 n_bias
XI23 net153 net117 net122 net124 net119 n_bias
XI15 net130 net82 net127 net129 net84 n_bias
XI10 net145 g10 p4 a_jtl_e
XI13 net154 net135 g3 a_jtl_e
XI12 net154 net138 g2 a_jtl_e
XI3 net157 net141 net142 a_jtl
XI4 net157 net144 net145 a_jtl
XI2 net158 net130 net148 a_jtl
XI6 net148 net153 net154 a_jtl
XI1 net162 net156 net157 a_jtl
Xandor a5 b5 net85 net158 net162 a_andor011
XI9 net145 net164 p3_chop a_jtl_chop_e
XI8 net142 net110 p2_chop a_jtl_chop_e
.ends a_c1_b6

.subckt a_c1_b7 a5 b5 c00 c01 c10 c11 dc0 dc1 g3 g10 p4
L0 c01 c11 1p
XI14 net85 c00 net82 net84 dc1 n_bias_ihm
XI25 net135 net0134 c10 dc0 net104 n_bias
XI24 net144 net122 net0134 net104 net124 n_bias
XI16 net156 net127 net117 net119 net129 n_bias
XI23 net153 net117 net122 net124 net119 n_bias
XI15 net130 net82 net127 net129 net84 n_bias
XI10 net145 g10 p4 a_jtl_e
XI13 net154 net135 g3 a_jtl_e
XI4 net157 net144 net145 a_jtl
XI2 net158 net130 net148 a_jtl
XI6 net148 net153 net154 a_jtl
XI1 net162 net156 net157 a_jtl
Xandor a5 b5 net85 net158 net162 a_andor011
.ends a_c1_b7

.subckt a_c1_b1 a5 b5 bias5 bias10 bias11 c00 c01 c10 c11 dc0 dc1 g1_chop g2 g3 g5 g7 g10 g11 p1_chop p2_chop p4
L0 c01 c11 1p
XI29 bias11 net97 net92 net94 net0126 n_bias
XI31 bias5 net0134 net102 net104 net0136 n_bias
XI14 net85 c00 net82 net84 dc1 n_bias_ihm
XI28 bias10 net0109 net112 net114 net0111 n_bias
XI27 net138 net92 c10 dc0 net94 n_bias
XI25 net135 net102 net97 net0126 net104 n_bias
XI24 net144 net122 net0134 net0136 net124 n_bias
XI17 net110 net112 net107 net109 net114 n_bias
XI16 net156 net127 net0109 net0111
+ net129 n_bias
XI22 net141 net107 net117 net119 net109 n_bias
XI23 net153 net117 net122 net124 net119 n_bias
XI15 net130 net82 net127 net129 net84 n_bias
XI10 net145 g10 p4 a_jtl_e
XI13 net154 net135 g3 a_jtl_e
XI12 net154 net138 g2 a_jtl_e
XI3 net157 net141 net142 a_jtl
XI4 net157 net144 net145 a_jtl
XI2 net158 net130 net148 a_jtl
XI5 net148 g5 net151 a_jtl
XI6 net148 net153 net154 a_jtl
XI1 net162 net156 net157 a_jtl
Xandor a5 b5 net85 net158 net162 a_andor011
XI8 net142 net110 p2_chop a_jtl_chop_e
XI11 net151 g11 g1_chop a_jtl_chop_e
XI7 net142 g7 p1_chop a_jtl_chop_e
.ends a_c1_b1

.subckt a_c1_b0 a5 b5 bias10 c00 c01 c10 c11 dc0 dc1 g1_chop g2 g3 g5 g11 p4
L0 c01 c11 1p
XI14 net85 c00 net82 net84 dc1 n_bias_ihm
XI28 bias10 net0109 net089 net091 net0111 n_bias
XI27 net138 net97 c10 dc0 net0126 n_bias
XI25 net135 net096 net0134 net104 net098 n_bias
XI24 net144 net089 net112 net124 net091 n_bias
XI17 net110 net112 net096 net098 net124 n_bias
XI16 net156 net127 net0109 net0111 net129 n_bias
XI23 net153 net0134 net97 net0126 net104 n_bias
XI15 net130 net82 net127 net129 net84 n_bias
XI10 net145 net110 p4 a_jtl_e
XI13 net154 net135 g3 a_jtl_e
XI12 net154 net138 g2 a_jtl_e
XI4 net157 net144 net145 a_jtl
XI2 net158 net130 net148 a_jtl
XI5 net148 g5 net151 a_jtl
XI6 net148 net153 net154 a_jtl
XI1 net162 net156 net157 a_jtl
Xandor a5 b5 net85 net158 net162 a_andor011
XI11 net151 g11 g1_chop a_jtl_chop_e
.ends a_c1_b0

.subckt a_add a0 a1 c00_0 c00_1 c00_2 c00_3 c00_4 c00_5 c00_6 c00_7 c01_0 c01_1 c01_2 c01_3 c01_4 c01_5 c01_6 c01_7 c10_0 c10_1 c10_2 c10_3 c10_4 c10_5 c10_6 c10_7 c11_0 c11_1 c11_2 c11_3 c11_4 c11_5 c11_6 c11_7 dc0_0 dc0_1 dc0_2 dc0_3 dc0_4 dc0_5 dc0_6
+ dc0_7 dc1_0 dc1_1 dc1_2 dc1_3 dc1_4 dc1_5 dc1_6 dc1_7 dc2 dc3 q0 q1 q2 q3 q4 q5 q6 q7 qv
XI49 net0408 net0412 net0409 c10_0 c11_0 dc0_0 net0411 q0 m_out12
XI50 net0416 net0420 net0417 c10_1 c11_1 dc0_1 net0419 q1 m_out12
XI51 net0424 net0428 net0425 c10_2 c11_2 dc0_2 net0427 q2 m_out12
XI52 net0432
+ net0436 net0433 c10_3 c11_3 dc0_3 net0435 q3 m_out12
XI53 net0440 net0444 net0441 c10_4 c11_4 dc0_4 net0443 q4 m_out12
XI54 net0565 net0564 net0563 c10_5 c11_5 dc0_5 net0559 q5 m_out12
XI55 net0456 net0460 net0457 c10_6 c11_6 dc0_6 net0459 q6 m_out12
XI56 net0464 net0468 net0465 c10_7 c11_7 dc0_7 net0467 q7 m_out12
XI48 net90 a0 a1 c01_0 net86 net94 dc1_0 net83 qv q_c0_io
XI28 net141 net140 net0265 net0850 net134 net030 net035 net034 net029 net124 net053 net025 net039 net041 net032 net040 net038 a_c3_b3
XI14 net141 net127 net140 net311 net165 net177 net178 net179 net240 net316 net315 net134 net030 net124 net132 net128 net239 net032 net248 net249 net250 net166 net0295 net0294 net410 net129 net040 net145 net286 net285 a_c2_b3
XI42 net0617 net0615 net0612 net0611 net0460 net0457 net0459 net0605 net0248 net0292 net0456 a_c5_b5
XI45 net0653 net0658 net0656 net0655 net0436 net0433 net0435 net0647 net0259 net0303 net0432 a_c5_b5
XI47 net0677 net0681 net0680 net0679 net0420 net0417 net0419 net0673 net0319 net0314 net0416 a_c5_b5
XI43 net0603 net0600 net0596 net0595 net0564 net0563 net0559 net0589 net0304 net0248 net0565 a_c5_b5
XI41 net0629 net0627 net0626 net0625 net0468 net0465 net0467 net0619 net0292 dc3 net0464 a_c5_b5
XI44 net0637 net0644 net0640 net0639 net0444 net0441 net0443 net0631 net0303 net0304 net0440 a_c5_b5
XI46 net0667 net0671 net0670 net0669 net0428 net0425 net0427 net0661 net0314 net0259 net0424 a_c5_b5
XI40 net0690 net0689 net0688 net0412 net0409 net0411 net0683 net091 net0319 net0408 a_c5_b0
XI39 net0616 net0613 net0600 net0114 net0111 net0110 net0596 net0595 net0589 net0105 net0603 net0635 net0645 net0642 net0115 net0121 a_c4_b5
XI38 net0628 net0615 net0135 net0132 net0131 net0612 net0611 net0605 net0126 net0617 net0616 net0613 net0136 net0142 a_c4_b6
XI37 net0627 net072 net069 net068 net0626 net0625 net0619 net063 net0629 net0628 net073 net078 a_c4_b7
XI36 net0635 net0645 net0642 net0644 net055 net0477 net0478 net052 net051 net0640 net0639 net0631
+ net046 net0637 net056 net0651 net0659 net039 a_c4_b4
XI35 net0651 net0659 net0658 net038 net0506 net0507 net035 net034 net0656 net0655 net0647 net029 net0653 net0477 net0478 net039 net0665 net0702 a_c4_b3
XI34 net0665 net0671 net026 net0520 net0521 net023 net022 net0670 net0669 net0661 net018 net0667 net0506 net0507 net0702 net0715 a_c4_b2
XI33 net0681 net0715 net010 net09 net0680 net0679 net0673 net05 net0677 net0520 net0521 net013 a_c4_b1
XI32 net0690 net093 net090 net089 net0689 net0688 net0683 net084 a_c4_b0
XI31 net278 net082 net086 net273 net090 net089 net084 net267 net092 net091 net093 a_c3_b0
XI29 net245 net244 net0874 net238 net019 net023 net022 net018 net229 net025 net012 net0702 net235 net026 a_c3_b2
XI30 net264 net263 net0895 net07 net258 net010 net09 net05 net252 net012 net092 net0715 net0861 net013 a_c3_b2
XI27 net045 net168 net0122 net0294 net0295 net048 net047 net052 net051 net046 net050 net0112 net053 net056 net0240 net233 net049 net159 net055 a_c3_b4
XI24 net062 net061 net217 net216 net069 net068 net063 net067 dc2 net070 net073 net081 net079 net0807 net100 net213 net078 net103 net074 net076 net072 a_c3_b7
XI26 net111 net110 net0120 net106 net105 net0111 net0110 net0105 net95 net0113 net0112 net0115 net0783 net0122 net0755 net128 net0108 net0121 net040 net103 net99 net0114 a_c3_b5
XI25 net0125 net0124 net079 net0128 net189 net0132 net0131 net0126 net180 net070 net0113 net0136 net0805 net0120 net0144 net156 net0129 net0142 net159 net0137 net184 net0135 a_c3_b5
XI23 net16 net393 net13 net392 c00_7 c01_7 net391 net390 net396 dc1_7 q_c0_b7
XI22 net13 net28 net375 net25 net374 net16 c00_6 c01_6 net373 net372 net381 dc1_6 q_c0_b6
XI21 net25 net40 net352 net37 net351 net28 c00_5 c01_5 net350 net349 net360 dc1_5 q_c0_b6
XI20 net37 net52 net329 net49 net328 net40 c00_4 c01_4 net327 net326 net337 dc1_4 q_c0_b6
XI19 net49 net64 net55 net61 net48 net52 c00_3 c01_3 net51 net53 net57 dc1_3 q_c0_b6
XI18 net61 net76 net67 net73 net60 net64 c00_2 c01_2 net63
+ net65 net69 dc1_2 q_c0_b6
XI17 net73 net88 net79 net85 net72 net76 c00_1 c01_1 net75 net77 net81 dc1_1 q_c0_b6
XI16 net85 net90 net91 net83 net84 net88 c00_0 net86 net87 net89 net430 net94 q_c0_b6
XI15 net111 net353 net110 net109 net123 net116 net117 net118 net0783 net108 net0144 net97 net361 net106 net105 net95 net348 net100 net119 net0108 net120 net121 net122 net107 net113 net101 net103 net99 net115 net332 net331 a_c2_b5
XI13 net045 net330 net168 net334 net119 net120 net121 net122 net0240 net166 net0755 net154 net338 net048 net047 net050 net160 net156 net165 net049 net177 net178 net179 net108 net290 net312 net159 net287 net309 net308 a_c2_b4
XI12 net0125 net376 net0124 net194 net218 net226 net227 net228 net0805 net107 net0807 net182 net382 net0128 net189 net180 net188 net123 net0129 net116 net117 net118 net336 net185 net0137 net184 net333 net355 net199 a_c2_b6
XI11 net062 net394 net061 net219 net081 net209 net208 net217 net216 net067 net215 net218 net213 net226 net227 net228 net359 net212 net074 net076 net356 net378 net224 a_c2_b7
XI10 net245 net284 net244 net243 net239 net248 net249 net250 net261 net0848 net231 net230 net238 net019 net229 net236 net233 net260 net235 net265 net240 net0265 net0850 net429 net234 net406 a_c2_b2
XI9 net264 net255 net263 net262 net260 net265 net0876 net413 net253 net07 net258 net252 net400 net041 net0861 net261 net0848 net0874 net266 a_c2_b1
XI8 net278 net426 net082 net427 net269 net431 net086 net273 net267 net421 net0876 net0895 a_c2_b0
XI2 net67 net60 net300 net301 net318 net299 net63 net65 net231 net230 net236 net69 net290 net129 net243 net323 net324 net295 net322 net287 net286 net285 net284 a_c1_b5
XI3 net55 net48 net323 net324 net341 net322 net51 net53 net316 net315 net132 net57 net113 net312 net311 net346 net347 net318 net345 net115 net309 net308 net127 a_c1_b5
XI4 net329 net328 net346 net347 net344 net345 net327 net326 net154 net338 net160 net337 net336 net101 net334 net343 net342 net341 net340 net333 net332 net331 net330
+ a_c1_b5
XI5 net352 net351 net343 net342 net384 net340 net350 net349 net97 net361 net348 net360 net359 net185 net109 net387 net388 net344 net386 net356 net355 net199 net353 a_c1_b5
XI6 net375 net374 net387 net388 net385 net386 net373 net372 net182 net382 net188 net381 net212 net194 net384 net378 net224 net376 a_c1_b6
XI7 net393 net392 net391 net390 net209 net208 net215 net396 net219 net385 net394 a_c1_b7
XI1 net79 net72 net434 net295 net433 net75 net77 net413 net253 net400 net81 net410 net234 net262 net300 net301 net435 net299 net145 net406 net255 a_c1_b1
XI0 net91 net84 net435 net87 net89 net269 net431 net421 net430 net429 net266 net427 net434 net433 net426 a_c1_b0
.ends a_add

R7 dc0_7 w10 270.00m
R6 dc0_6 w10 270.00m
R5 dc0_5 w10 270.00m
R4 dc0_4 w10 270.00m
R3 dc0_3 w10 270.00m
R2 dc0_2 w10 270.00m
R1 dc0_1 w10 270.00m
R0 dc0_0 w10 270.00m
L2 net0154 net0152 1p
L5 net0148 net0146 1p
L6 net0146 net0144 1p
L7 net0144 net0142 1p
L3 net0152 net0150 1p
L0 s1 net0156 1p
L1 net0156 net0154 1p
L4 net0150 net0148 1p
L8 net0172 net0198 1p
XI6 net0188 net58 net59 net0182 net0342 net0343 net0344 net0174 net49 net0185 net0332 net0333 net53 net0335 net0336 net0337 net14 net15 net16 net0170 net0169 net19 net20 net0166 net5 net6 net7 net0161 net9 net0159 net0158 net0157 on_input_wireup
XI5 c10_7 c10_6 c10_5 c10_4 c10_3 c10_2 c10_1 c10_0 c11_7 c11_6 c11_5 c11_4 c11_3 c11_2 c11_1 c11_0 net0172 net0199 net37 net36 net0202 net0195 net33 net0204 net30 net0190 net28 net27 net0193 net25 net24 net23 on_output_wireup
XI4 net5 net6 net7 net0161 net9 net0159 net0158 net0157 e9 on_50_ohm_wps_40pin
XI3 net14 net15 net16 net0170 net0169 net19 net20 net0166 e2 on_50_ohm_wps_40pin
XI2 net23 net24 net25 net0193 net27 net28 net0190 net30 s2 on_50_ohm_wps_40pin
XI1 net0198 net0199 net37 net36 net0202 net0195 net33 net0204 n2 on_50_ohm_wps_40pin
XI0 s7 s6 net0174 net0344 net0343 net0342 net0182 net59 net58 net0188 net0337 net0336 net0335 net53 net0333 net0332 net0185 net49 c10_0 c10_1 c10_2
+ c10_3 c10_4 c10_5 c10_6 c10_7 c11_0 c11_1 c11_2 c11_3 c11_4 c11_5 c11_6 c11_7 dc0_0 dc0_1 dc0_2 dc0_3
+ dc0_4 dc0_5 dc0_6 dc0_7 net0156 net0154 net0152 net0150 net0148 net0146 net0144 net0142 n7 n6 w9 w8 w7 w6 w5 w4 w3 w2 s8 a_add
.end