ABSTRACT
Title of dissertation: CIRCUIT DESIGN AND ROUTING
FOR FIELD PROGRAMMABLE
ANALOG ARRAYS
Ji Luo, Doctor of Philosophy, 2005
Dissertation directed by: Professor Joseph Bernstein
Professor Martin Peckerar
Department of Electrical & Computer Engineering
Accurate, low-cost, rapid-prototyping techniques for analog circuits have long
been a dream of analog designers. However, due to the inherent nature of analog
systems, design automation in the analog domain is very difficult to realize, and
field programmable analog arrays (FPAAs) have not achieved the same success as
FPGAs in the digital domain. This results from several factors, including the lack of
supporting CAD tools, low circuit density, low speed, and significant parasitic effects
from the fixed routing wires. These factors are interrelated, making the design of a
high-performance FPAA a multidimensional problem. A critical reason behind these
difficulties is the non-ideal programming technology, which contributes a large share
of the parasitics in the sensitive analog system and thus degrades system performance.
This work addresses these difficulties through the development of a laser
field programmable analog array (LFPAA). The work has two parts:
routing for FPAAs and analog IC building block design. To facilitate router
development and provide a platform for FPAA application development, a generic
array-based FPAA architecture and a flexible CAB topology were proposed. The
routing algorithm was based on a modified and improved Pathfinder negotiated
routing algorithm and was implemented in C for a prototype FPAA. The parasitic
constraints on performance in analog routing were also investigated, and solutions
were proposed. In the area of analog circuit design, a novel differential difference op
amp was invented as the core building block. Two bandgap circuits, including a
low-voltage version, were developed to generate a stable reference voltage for the
FPAA. Based on the proposed FPAA architecture, several application examples were
demonstrated. The results show the flexible functionality of the FPAA. Moreover,
various laser Makelink test structures were studied on different CMOS processes
and a BiCMOS copper process. Laser Makelink proves to be a powerful programming
technology for analog IC design. A novel laser Makelink trimming method was
invented to reduce the op amp offset. The use of laser Makelink to
reconfigure analog circuit blocks was also presented.
CIRCUIT DESIGN AND ROUTING
FOR FIELD PROGRAMMABLE ANALOG ARRAYS
by
Ji Luo
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2005
Advisory Committee:
Professor Joseph Bernstein, Advisor
Professor Martin Peckerar, Advisor
Professor Neil Goldsman
Professor Pamela Abshire
Professor Ali Mosleh
© Copyright by
Ji Luo
2005
DEDICATION
To my parents
ACKNOWLEDGMENTS
One would be lucky enough to have an exceptional advisor. I have had two.
Six years ago, Professor Bernstein put me onto the right track toward this Ph.D.
degree. He shared with me his expertise in the academic area as well as his wisdom
about life. He has a unique angle of view when facing the tough problems and
can quickly grasp the key point. I hope I did learn a little from him. Two and a
half years ago, Professor Peckerar founded the Analog Systems Design Laboratory
(ASDL). It has been a privilege to be part of ASDL from the very beginning and to
conduct my doctoral work under his guidance. He is sharp, energetic and
extremely knowledgeable in almost every area. I not only improved my circuit
design techniques, but also learned a lot from him in the areas of device physics and
semiconductor processing. He has always made himself available for help or advice.
So, first of all, I want to thank them for their kindness, enthusiasm, and support. I
am very proud of being their student.
I would also like to thank the other professors who served on my dissertation
committee. In some sense, Professor Neil Goldsman is my "unofficial" advisor, and one
of my favorite instructors in the ECE department. I'm very grateful for his help
through the years. Professor Abshire has always been gracious and helpful since I
first met her. I benefited from some of the classic papers she collected, and I thank
her for her comments on my Ph.D. proposal and for the help from her research group.
I got to know Professor Mosleh six years ago when he taught me a mathematics class.
As Director of the Reliability Engineering Program, he has a very tight schedule. I
sincerely appreciate his spending valuable time reading my dissertation and
serving on the committee.
The path to completing this dissertation has included the discovery of new
friends and colleagues. I want to thank all my colleagues in the ASDL and the
Microelectronics Reliability group. They have made my graduate school experience a
cherished one. A special "thank you" goes to Dr. J. Ari Tuchman for managing
various projects. It has been a precious experience to work with him.
Furthermore, I would like to acknowledge the University of Maryland Graduate
School for providing me with a two-year fellowship. I thank Professor Bernstein and
Professor Peckerar for providing me with research assistantships through various
funding sources.
Finally, I am deeply indebted to my family for supporting me every step along
the journey. I thank my sister, Chun, for taking care of our parents and for
encouraging me to pursue this doctoral degree. I thank my son Kevin for inspiring me
and bringing me a lot of happiness. I'm extremely grateful to my dear wife, Jing. We
have walked through some of the hardest times together. Her love, encouragement,
and care make this dissertation as much hers as it is mine. My parents have
breathlessly awaited this dissertation, and they deserve the credit for every one of
my achievements. Thank you, Mom and Dad!
TABLE OF CONTENTS
List of Tables
List of Figures
1 Introduction to FPAA
1.1 Why Analog?
1.2 What is a Field Programmable Analog Array?
1.3 Evolution of FPAA and Other Programmable Analog Devices
1.4 Motivation of this work
1.4.1 TSMC018 CL/CM Process
1.4.2 Potential FPAA Applications
2 Programming Technology
2.1 Programming Technology Overview
2.1.1 SRAM
2.1.2 Antifuse
2.1.3 EPROM and EEPROM
2.2 Laser Makelink Technology
2.2.1 Laser Makelink Principle
2.2.2 Laser Makelink Design
2.2.3 Summary
2.3 Laser Makelink Applications
3 Routing for FPAA
3.1 What is routing
3.1.1 Architecture Overview
3.1.2 Switch Box and Connection Box
3.1.3 Definition of Legal Connections
3.2 Problem Formation
3.3 FPAA Routing Algorithm
3.3.1 Introduction
3.3.2 Pathfinder Negotiated Routing Algorithm
3.4 Data Structure
3.5 Investigations of Performance Constraints on the Routing
4 Configurable Analog Block
4.1 PCA and PRA
4.2 CAB Structure
5 The Differential Difference Op Amp Design
5.1 Op Amp Topology Selection
5.2 Design of the Differential Difference Op Amp
5.3 Results and Discussion
5.4 Application of Laser Makelink in the Op Amp Design
5.4.1 Offset Trimming
5.4.2 Laser Reconfiguration
6 Bandgap Reference
6.1 Introduction
6.2 Principle of Bandgap Reference
6.3 A CMOS Implementation of Bandgap Reference
6.3.1 Architecture
6.3.2 The Bandgap Core
6.3.3 Op Amp Design
6.3.4 The Complete Circuit
6.3.5 Layout Design
6.3.6 Results and Discussions
6.4 Laser Makelink Trimming for Precision
6.5 A Low Voltage, Curvature Compensated Bandgap Reference
7 FPAA Applications
7.1 CAB Based Applications
7.1.1 Gain Amplifier
7.1.2 Active Analog Filter
7.2 Temperature Measurement
7.3 A Hierarchical Implementation of an 8-bit Two-Step ADC
8 Conclusions and Future Work
A Chip Layout
A.1 Laser Makelink Test Chips
A.2 The Fully Differential Difference Amplifier
A.3 The Bandgap Reference
A.4 Two-Step ADC
B FPAA Router Documentation
Bibliography
LIST OF TABLES
5.1 Comparison between continuous time and discrete time
LIST OF FIGURES
1.1 A typical digital VLSI design flow
1.2 Anadigm's Field Programmable Analog Array AN10E40
1.3 Analog Design Tradeoffs
2.1 SRAM controlled MOSFET switch
2.2 The parasitic capacitance associated with a MOSFET switch
2.3 Actel antifuse (a) A cross section; (b) A simplified drawing; (c) Top view
2.4 Metal-metal antifuse. (a) An idealized cross section of a QuickLogic
metal-metal antifuse in a two-level metal process. (b) A metal-metal
antifuse in a three-level metal process that uses contact plugs. The
conductive link usually forms at the corner of the via where the
electric field is highest during programming.
2.5 An EPROM transistor. (a) With a high (> 12 V) programming
voltage, VPP, applied to the drain, electrons gain enough energy
to "jump" onto the floating gate (gate1). (b) Electrons stuck on
gate1 raise the threshold voltage so that the transistor is always off
for normal operating voltages. (c) Ultraviolet light provides enough
energy for the electrons stuck on gate1 to "jump" back to the bulk,
allowing the transistor to operate normally.
2.6 Vertical Laser Makelink structure (a) top view (b) cross-section view
2.7 FIB cross-section of a vertical Makelink structure
2.8 Energy effect on the vertical link (2um bottom metal line, 4um hole)
formation (a) E = 0.11uJ; (b) E = 0.49uJ
2.9 Four lateral link structures designed for NSC's 0.18 um CMOS process
2.10 Energy windows of the four lateral link structures and their average
resistance per link (2.2 um pitch)
2.11 Test chain yield of the four lateral link structures
2.12 Table 2.1: Comparison of laser Makelink with other programming
technologies
2.13 Reconfigurable MOS transistor aspect ratio
3.1 A simplified FPAA CAD design flow
3.2 An array based FPAA architecture
3.3 (a) A Connection Box; (b) Switch box pattern 1; (c) pattern 2
3.4 (a) A simplified FPAA architecture (b) the corresponding routing
resource graph (RRG)
3.5 (a) A directed graph (b) adjacency list (c) adjacency matrix
3.6 The role of routing resource graph generator
3.7 Lee's Maze Router
3.8 Dijkstra algorithm
3.9 Prim algorithm
3.10 The functionality of p(n) in resolving the congestion
3.11 The improved pathfinder negotiated routing algorithm
3.12 Pathfinder algorithm
3.13 Data structure definitions
3.14 Pathfinder algorithm
4.1 The resistors and capacitors arrangement inside the CAB [60] (a)
PCA; (b) PRA
4.2 The improved resistors and capacitors arrangement inside the CAB
(a) PCA; (b) PRA
4.3 The differential difference amplifier
4.4 The Sallen-Key bandpass filter [61]
4.5 The complete CAB structure
5.1 Analog Design Tradeoffs
5.2 Four single stage amplifier topologies
5.3 The DDA conceptual block diagram (a) symbol; (b) block diagram
5.4 The input stage of the DDA
5.5 The output stage of the DDA
5.6 A transistors-only common-mode feedback circuit
5.7 The common-mode feedback circuit used in this design
5.8 The Vth referenced biasing block (a) two possible operating points (b)
the complete biasing block with a startup circuit
5.9 Biasing the cascoded current mirror
5.10 The high swing biasing block
5.11 The amplifier core
5.12 The complete biasing block
5.13 The fully differential input stage
5.14 The class AB output stage
5.15 The complete amplifier layout
5.16 Supply independent biasing block at start-up (a) DC sweep; (b) transient
5.17 Temperature sweep of the supply independent biasing block
5.18 Open-loop frequency response
5.19 Common mode rejection ratio vs. frequency
5.20 Power supply rejection ratio vs. frequency
5.21 Large signal step response Gain=1 with 0.8V step
5.22 Small signal step response Gain=1 with 10mV step
5.23 Closed-loop gain as a function of frequency
5.24 The input referred noise as a function of frequency
5.25 Offset cancellation (a) Auto-zeroing; (b) Chopper stabilization
5.26 (a) The input stage of a fully differential CMOS op amp (b) The
internal configuration of the trim box
5.27 Offset trimming by laser Makelink: 10mV offset is reduced to 50uV
with a group of 10 "trim" transistors
5.28 The amplifier core showing multiple compensation
5.29 DDA open-loop frequency response: 250fF Cc vs. 450fF Cc
6.1 A generic Mixed-Signal System
6.2 Diode References
6.3 An Illustration of Bandgap Principle
6.4 AD580 Precision Bandgap Reference Based on Brokaw Cell, Analog Devices,
1974
6.5 Realization of Substrate PNP BJTs on the CMOS process [88]
6.6 A Block Diagram of the Proposed BGR
6.7 The BGR Core
6.8 Schematic of the 2-stage folded-cascode op amp
6.9 Op Amp frequency response with 2pF capacitive load
6.10 Op Amp frequency response with 10pF capacitive load
6.11 The complete BGR schematic
6.12 Resistor layout arrangement
6.13 BJT layout arrangement
6.14 Overall BGR Layout
6.15 BGR Temperature Sweep
6.16 BGR voltage as a function of supply voltage
6.17 BGR power supply rejection ratio
6.18 BGR output noise
6.19 BGR output noise with improvement
6.20 BGR programmable resistor for laser Makelink trimming
6.21 A low voltage BGR without curvature compensation
6.22 A low voltage BGR with curvature compensation
6.23 Comparison between BGRs with and without curvature compensation
6.24 BGR voltage as a function of supply voltage variation
6.25 BGR power supply rejection ratio
6.26 BGR noise performance
7.1 Non-inverting gain amplifier configuration
7.2 Non-inverting gain amplifier frequency response
7.3 Voltage controlled current source (VCCS) (a) schematic; (b) output
7.4 A reference voltage generation block for ADC
7.5 DDA as a modulation/multiplication cell (a) schematic; (b) output
7.6 Choice of filter as a function of the operating frequency range
7.7 Generalized Sallen-Key topology
7.8 A second order Sallen-Key narrow band-pass filter
7.9 A third order Butterworth low-pass filter based on Sallen-Key topology
7.10 A third order Butterworth high-pass filter based on Sallen-Key topology
7.11 A simplified schematic of the generation of VPTAT
7.12 Temperature monitoring/measurement block (a) diagram; (b) result
(0-100 °C)
7.13 A Fully Differential 2-step Flash ADC Diagram
7.14 (a) charge injection; (b) clock feedthrough
7.15 A fully differential S/H based on DDA follower (a) schematic; (b)
power spectrum of the sampled signal
7.16 A fully differential BPS S/H (a) schematic; (b) timing graph
7.17 BPS: Power spectrum of the sampled signal
7.18 The DDA based comparator
7.19 The DDA based implementation of the subtractor
7.20 The DDA based implementation of the subtractor
A.1 Al Makelink test chip - NSC 0.18um CMOS
A.2 Cu Makelink test chip - IBM 8HP 0.13um BiCMOS SiGe
A.3 The fully differential difference op amp - TSMC018 CM process
A.4 The first order bandgap reference chip with test transistors - TSMC018
CM process
A.5 The two-step flash analog-to-digital data converter - TSMC018 CM
process
Chapter 1
Introduction to FPAA
1.1 Why Analog?
Since the 1980s, digital signal processing algorithms have become increasingly
powerful. With today's advanced CMOS VLSI technology (both TI and
STMicroelectronics have successfully commercialized the 65 nm CMOS process [1], [2]),
millions of transistors can be integrated into a tiny silicon chip. These high-density,
powerful digital ICs have made many functions that were traditionally realized
in analog form easy to implement in the digital domain. This seems to
announce the demise of analog circuits. Why, then, are analog designs still in such
great demand?
After all, the real world is analog. Physical properties such as sound, light,
temperature, position, speed, and pressure are all "analog signals". Analog circuits
play an extremely important role in bringing the "analog world" together with
the "digital world". Real-world signals need to be conditioned before being
processed, in either the digital or the analog domain, and before driving an analog
output. Analog circuit blocks therefore find broad application in signal processing
and conditioning, power management ICs, industrial control, function generation,
and A/D and D/A converters. In fact, analog circuits exist as interface blocks in
almost every digital IC. "For every dollar spent on microprocessors, another $1.50
is required to create an interface to the rest of the system" [3]. Databeans, Inc.
estimates that the analog semiconductor market was worth about $31 billion in
2004. Following a relatively flat year in 2005, this market is expected to rebound
significantly in 2006, with up to 17 percent growth [4].
1.2 What is a Field Programmable Analog Array?
An important advantage of digital ICs has been their relative ease of design.
Figure 1.1 shows a typical digital system design flow [5]. Many CAD-compatible
digital IC design methodologies have been developed. For example, a standard ASIC
design flow includes hardware behavioral description (VHDL/Verilog), design
synthesis and optimization, place and route, and final fabrication. When time-to-market
and cost are primary concerns, the above flow can be implemented through Field
Programmable Gate Arrays (FPGAs). In contrast, analog design still relies on
an intuitive, manual approach, and its design automation is very difficult to realize.
First-pass success for analog silicon depends heavily on the designer's experience,
and the design cycle for a successful analog IC is very long.
It is well known that many (digital) ASICs can be quickly implemented and
verified on FPGAs with appropriate programming. Due to their low non-recurring
engineering (NRE) costs, short time-to-market, ease of design, and low testing costs,
FPGAs have become the most popular ASIC solution [6]. Naturally, we may ask:
can we have a field programmable analog array (FPAA) as the analog counterpart
of the FPGA?
In general, an FPAA is a monolithic collection of analog building blocks
(configurable analog blocks, or CABs), a programmable routing network used for
passing signals between CABs, and a block of memory (for SRAM-based FPAAs) storing
configuration data that defines both the functions and the structures.
Alternatively, the circuit topologies and routing structures may be defined by other
methods such as antifuse programming technologies. The layout of a commercial FPAA
chip (AN10E40) is shown in Figure 1.2. It contains a 4 x 5 CAB array, an interconnect
network, and 13 I/O blocks. A configuration bit stream stored in the on-chip
SRAM is used to configure the topology [7]. Each CAB can implement a number
of analog signal processing functions such as amplification, integration,
differentiation, addition, subtraction, multiplication, comparison, log, and exponential.
The interconnection network routes signals from one CAB to another, and to and from
the I/O blocks.
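The three ingredients described above (an array of CABs, a routing network of programmable switches, and stored configuration data) can be pictured with a small data-structure sketch. This is only an illustration: the class `FPAA`, its methods, and the `("cab", (row, col))` / `("io", k)` node naming are hypothetical and do not reflect the AN10E40's actual configuration format or the router developed in this work. Only the 4 x 5 array size and the 13 I/O blocks are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class FPAA:
    """Toy model of an array-based FPAA: CABs, I/O blocks, routing switches."""
    rows: int
    cols: int
    n_io: int
    # Configuration data: which function each CAB implements, keyed by (row, col).
    cab_function: dict = field(default_factory=dict)
    # Configuration data: which routing switches are closed, as (source, destination).
    closed_switches: set = field(default_factory=set)

    def _check(self, node):
        """Raise ValueError if node does not name a block on this chip."""
        kind, index = node
        if kind == "cab":
            row, col = index
            ok = 0 <= row < self.rows and 0 <= col < self.cols
        elif kind == "io":
            ok = 0 <= index < self.n_io
        else:
            ok = False
        if not ok:
            raise ValueError(f"no such block: {node!r}")

    def configure(self, row, col, function):
        """Select the analog function implemented by one CAB."""
        self._check(("cab", (row, col)))
        self.cab_function[(row, col)] = function

    def connect(self, source, destination):
        """Close the routing switch that passes a signal between two blocks."""
        self._check(source)
        self._check(destination)
        self.closed_switches.add((source, destination))


# Dimensions quoted in the text for the AN10E40: 4 x 5 CABs and 13 I/O blocks.
chip = FPAA(rows=4, cols=5, n_io=13)
chip.configure(0, 0, "amplification")
chip.configure(0, 1, "integration")
chip.connect(("io", 0), ("cab", (0, 0)))
chip.connect(("cab", (0, 0)), ("cab", (0, 1)))
chip.connect(("cab", (0, 1)), ("io", 1))
print(len(chip.closed_switches), "switches closed")
```

In an SRAM-based FPAA the two dictionaries would correspond to bits of the configuration bit stream; in an LFPAA the closed switches would instead be laser-formed links.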
Laser Field Programmable Analog Array, i.e., LFPAA, is a variant of FPAA.
All the switches of an LFPAA are implemented with Laser Makelink technology.
LFPAAs are programmed with an infrared (IR) laser.
Figure 1.1: A typical digital VLSI design flow
Figure 1.2: Anadigm's Field Programmable Analog Array AN10E40
1.3 Evolution of FPAA and Other Programmable Analog Devices
The first field-reconfigurable analog IC, originally intended primarily for the
synthesis and testing of analog neural-network architectures, was proposed by
Sivilotti [8]. CMOS transmission gates were used as the active switch elements that
connected basic resources such as differential pairs and current mirrors in a
hierarchical routing network. On-board memory (SRAM) was provided for storing the
state of each switch element, but no memory was provided for storing circuit
coefficients. In later work, Lee and Gulak [9] presented a low-power FPAA based on
MOS subthreshold circuit techniques, in which pass transistors controlled by
SRAM-based memory elements were used as the active switches. Multi-valued memories
were used to store circuit coefficients. However, die-to-die variations in
subthreshold model parameters posed challenges to circuit operation.
Simultaneously with [9], two patents were filed describing the design
of an FPAA. Pilkington Micro-Electronics [10] described an array of operational
amplifiers and associated programmable resistors and capacitors. Pass transistors
were used as interconnect switches, while programmable resistors were constructed
from multiple pairs of complementary MOS transistors. Each resistor was individually
compensated to allow for manufacturing tolerances and temperature variations.
Capacitors with a value of 5 pF were fabricated and then multiplied
by two impedance converters to a final value of 5 nF. Its applications were
in the areas of graphic equalizers, audio mixer desks, special-purpose filters,
spectrum analyzers, signal generators, prototyping, hands-free circuits for
telephones, and education. Sako [11] also described an FPAA design consisting of
operational amplifiers and passive resistor and capacitor elements interconnected
with pass transistors. More recently, Pankiewicz et al. proposed a CMOS
implementation of an OTA-based FPAA [12], which was especially attractive for
analog filter applications.
Some commercial programmable analog ICs are also available. One of the
first was the GAP-01 [13], the first attempt by industry to define a universal
analog building block that could be used in several applications by externally routing
signals present on the pins of the package. The first switched-capacitor-based FPAA
was introduced by IMP [14] in 1995. It was aimed at general-purpose signal
conditioning tasks in medical, industrial, and other instrumentation and control
systems, but its bandwidth was very small, only 150 kHz at unity gain. This product
was withdrawn from the market in 1997. In the same year, Zetex [15] introduced the
first continuous-time analog programmable device, Trac™. The bandwidth increased to
4 MHz, but the functionality it can realize is limited. To date, probably the most
successful FPAA products are those from Anadigm (the former FPAA group of Motorola).
Anadigm's FPAAs are also based on the switched-capacitor technique. The bandwidth
of their products has increased from 250 kHz (AN10E40) to 2 MHz (AN20E40) [16].
A set of pre-designed analog module libraries (CAMs, configurable analog modules)
and a software package are provided with Anadigm's FPAA chips, so many analog
functions can be implemented quickly and easily.
1.4 Motivation of this work
Several programmable analog circuits have been reported in the literature,
and some commercial chips are available on the market. However, the
functionality they implement is relatively limited and their bandwidth is small. A
general-purpose FPAA with good supporting CAD tools, suitable for high-frequency
applications, has not yet appeared. From a circuit design point of view, this could
be because (1) most previous designs are based on the switched-capacitor technique,
so the system bandwidth is limited by the clock and sampling rate; and (2) many of
them use MOS-transistor-based switches. As the array size grows, the numerous
switches can contribute a significant amount of parasitics to the circuit and
dramatically degrade the system performance. In the area of design automation, very
few papers [17], [18] address the development of CAD tools for analog arrays or
other programmable analog devices. The difficulty mainly comes from the inherent
differences between analog and digital systems in several respects:
(1) Loose form of hierarchy: the hierarchical decomposition of digital systems
is clearly defined with well-accepted levels (Figure 1.1), while analog designs have a
loose form of hierarchy because the hierarchical decomposition in analog is based on
an intuitive structural decomposition of the modules, rather than on the properties of
signal type and corresponding time representation at different abstraction levels as
in digital.
(2) Large spectrum of specifications: more performance specifications are imposed
on analog circuits than on digital ones. In addition, the specifications often
impose conflicting requirements on the design. This results in many trade-offs to be
managed during the design of the circuit, usually a multidimensional problem that
is difficult to handle (Figure 1.3 [19]).
Figure 1.3: Analog Design Tradeoffs
(3) Big influence of technology: technology and environmental parameters show
a larger influence on analog circuits. Process, biasing or temperature variations and
layout parasitics strongly influence the circuit performance and can even change the
functionality of the circuit.
(4) Interactions at the system level: analog circuits are also very sensitive to
interactions at the system level. The interactions may be between two analog blocks,
or between an analog block and a digital block of a large system (clock noise, for
example). Similarly, if several different channels of a data-acquisition system are
integrated on one chip, strong crosstalk may occur between these channels and cause
serious signal integrity issues.
Items (1) and (2) make automatic technology mapping and placement prohibitively
difficult to implement for FPAAs. Because of (3) and (4), parasitic-induced
performance degradation, such as loading and coupling effects, is much more
complicated for FPAAs than for FPGAs. To attack these difficulties, we need (1) a
flexible and efficient internal CAB and FPAA architecture; (2) a rich IP portfolio
that provides high-performance, pre-qualified analog/mixed-signal IPs; and (3) good
supporting CAD tools. Clearly, a great deal of work is involved. This work is
therefore an attempt to provide some initial solutions in these areas, with a focus
on analog IP design and the FPAA router. Although the idea originates from the
concept of a field programmable analog array, the author is trying to go beyond the
array-based approach and develop a hierarchical analog/mixed-signal design approach
by taking advantage of the flexibility that laser Makelink provides. A hierarchical
design is configurable and suitable for CAD methodologies. It provides pre-qualified
software and hardware components, and is able to translate complex analog circuits
into a simple set of high-level functions. It is therefore ideal for building
prototype systems or low-volume analog ASICs because of its short time-to-market.
It should also be noted that in this array based FPAA architecture, there are
abundant interconnect routing resources. The coupling between the wires and the
noise from substrate may be a serious issue. Therefore, careful layout design is ex-
tremely important. In this work, common centroid, interdigitated device structures,
and dummy devices are used extensively to improve matching.
Laser Makelink is an essential programming technology in this work. Makelinks
are used not only as routing switches, but also as a trimming method to improve
precision and reduce the cost incurred by extra circuits.
1.4.1 TSMC018 CL/CM Process
Most of the designs in this work were done on TSMC 0.18um Mixed-Signal
Mode Process with 3.3 V power supply. The default features of this Mixed-Signal
process include: 1p6M, 1.8 V/3.3 V MOS transistors, deep N-Well, linear MIM
capacitor, spiral inductor, MOS varactor, junction varactor, poly/diffusion resistors
and thick top metal interconnect [20].
1.4.2 Potential FPAA Applications
FPAAs and hierarchical designs will not be suitable for large-volume
semiconductor analog products, such as those in the flat panel display, storage,
and consumer electronics sectors. However, they are a cost-efficient solution for
relatively small-volume analog ASICs, or for quick system prototyping or
verification. The potential applications include:
• Signal Amplification, Summation, Filtering, Integration
• Signal Conditioning for A/D Converters: buffer, pre-amplifier
• Flexible AFEs for Data Acquisition
• Industry and Aerospace Control Circuit Block (PID application)
• Sensor Signal Conditioning
• Precision Voltage Monitoring
They may be used in areas that include discrete PCB design integration, aerospace
applications that require radiation-hard design, or as a sub-system of an SoC or
structured ASIC.
Chapter 2
Programming Technology
2.1 Programming Technology Overview
FPGAs/FPAAs can be categorized by how they are being programmed, i.e.,
how the switches are implemented. The programming technology has critical impact
on the system performance. The existing options today include SRAM controlled
MOSFET switch, antifuses and EPROM/EEPROM.
2.1.1 SRAM
SRAM controlled MOSFET switch (or transmission gate) is probably the most
widely used programming technology [22]. An SRAM-based FPGA/FPAA is pro-
grammed by loading the configuration bit stream from an external source into the
on-chip SRAM memory. Each switch, in most cases an MOS transistor, in the
CAB/logic and routing interconnect is controlled by a memory cell. Figure 2.1 is a
typical switch matrix and the controlling SRAM cell. Using SRAM programming
technology, users can reuse a chip during prototyping to reduce cost, and a system
can be configured using ISP (in-system programming). SRAM programming is
also useful for upgrades: a manufacturer may send customers a new configuration file
instead of a new chip to upgrade the system function. However, SRAM's biggest
advantage, reconfigurability, also brings a disadvantage: volatility. When the power
is off, the configuration data is lost, so an SRAM-based FPGA/FPAA must be
reprogrammed each time power is applied. SRAM-based programmable devices also
often consume more silicon area. Moreover, the relatively high resistance of the
MOS switch may severely limit the overall system bandwidth. As shown in Figure
2.2, each terminal of the MOSFET switch is associated with some parasitic
capacitance. As the number of switches grows, these parasitic capacitances and
resistances will dramatically reduce the speed.
Figure 2.1: SRAM controlled MOSFET switch. (a) An SRAM controlled switch matrix;
M represents an SRAM cell. (b) A 6-transistor SRAM cell [21]
Figure 2.2: The parasitic capacitance associated with an MOSFET switch
2.1.2 Antifuse
An antifuse is the opposite of a regular fuse: an antifuse is normally an open
circuit until a programming current flows through it.
Actel's Antifuse Technologies
Actel's oxide-nitride-oxide (ONO) antifuse is a well-known programming
technology. In this poly-diffusion antifuse, the high current density causes a large
power dissipation in a small area, which melts a thin insulating dielectric between
the polysilicon and diffusion electrodes and forms a thin, permanent, and resistive
silicon link. The programming process also drives dopant atoms from the poly and
diffusion electrodes into the link, and the final level of doping determines the
resistance value of the link. Actel calls this antifuse a programmable low-impedance
circuit element (PLICE) [23]. Figure 2.3 shows a poly-diffusion antifuse with an ONO
dielectric sandwich of SiO2 grown over the n-type antifuse diffusion, a Si3N4 layer,
and another thin SiO2 layer [24]. The average resistance of a blown antifuse is
controlled by the fabrication process and the programming current, but actual values
may vary in the range of 100 Ω to 800 Ω, with a nominal value of about 500 Ω. The ONO
antifuse has a smaller footprint and is radiation tolerant, but its fabrication
requires modifications of the standard CMOS process. For example, a double-metal,
single-poly CMOS process typically uses about 12 masks; the Actel process requires
an additional three masks. The n-type antifuse diffusion and antifuse polysilicon
require an extra two masks, and a 40 nm (thicker than normal) gate oxide (for the
high-voltage transistors that handle the programming voltage) uses one more masking
step. In addition, the link is a weak one-dimensional filament, which is not suitable
for carrying high current.
Figure 2.3: Actel antifuse. (a) A cross section; (b) a simplified drawing; (c) top view
Actel also has another antifuse called M2M. The M2M antifuse is composed of
layers of amorphous silicon and dielectrics, sandwiched between the top metal and the
via plug that connects the lower metal to the top metal. Application of
a 15 V programming pulse causes a phase change within the amorphous silicon, and a
filament of crystalline silicon forms between the metal layers. That filament is a
mixture of silicon and the metal-layer material. Typical connection resistance is
20 Ω to 100 Ω.
QuickLogic Metal-Metal Antifuse Technology
Figure 2.4 shows a QuickLogic metal-metal antifuse (ViaLink™). The QuickLogic
ViaLink is a tungsten plug connecting the two metal layers, with a layer of amorphous
silicon antifuse material deposited on top. The amorphous silicon provides
a high-resistance layer (>1 GΩ) insulating the tungsten plug. When the programming
voltage is applied, the amorphous silicon is converted to low-resistance silicon
with a typical resistance of 80 Ω.
Figure 2.4: Metal-metal antifuse. (a) An idealized cross section of a QuickLogic
metal-metal antifuse in a two-level metal process. (b) A metal-metal antifuse in a
three-level metal process that uses contact plugs. The conductive link usually forms
at the corner of the via where the electric field is highest during programming.
There are two advantages of a metal-metal antifuse over a poly-diffusion antifuse.
The first is that connections to a metal-metal antifuse are made directly to metal,
the wiring layers. Connections from a poly-diffusion antifuse to the wiring layers
require extra space and create additional parasitic capacitance. The second advantage
is that the direct connection to the low-resistance metal layers makes it easier to
use larger programming currents to reduce the antifuse resistance. The nominal
QuickLogic metal-metal antifuse resistance is approximately 80 Ω (with a standard
deviation of about 10 Ω) using a programming current of 15 mA, as opposed to an
average antifuse resistance of 500 Ω for a poly-diffusion antifuse.
The size of an antifuse is limited by the resolution of the lithography equipment
used to make ICs. The Actel antifuse connects diffusion and polysilicon, and both
of these materials are too resistive for use as signal interconnects. To connect the
antifuse to the metal layers requires contacts that take up more space than the
antifuse itself, reducing the advantage of the small antifuse size.
An antifuse is resistive and the addition of contacts adds parasitic capacitance.
The intrinsic parasitic capacitance of an antifuse is small, but to this we must add
the extrinsic parasitic capacitance that includes the capacitance of the diffusion and
poly electrodes (in a poly-diffusion antifuse) and connecting metal wires. These
unwanted parasitic elements could add considerable RC interconnect delay if the
number of antifuses connected in series is not kept to a minimum. Clever routing
techniques are therefore crucial to antifuse-based FPGAs [22]. The long-term
reliability of antifuses is also an important issue. Actel's research has shown that
the programmed link is fragile under over-current conditions. Such conditions occur
frequently in normal operation, making the field reliability of amorphous antifuses
questionable. High circuit speeds and large array sizes increase the likelihood of
over-current failure, limiting the speed and size attainable with an amorphous
antifuse. Programmed antifuses sometimes revert to a high-impedance state due
to cracking or a phenomenon called read disturb. The result is that the antifuse's
resistance jumps, which changes the corresponding logic circuit's propagation
delays and may even look like an open circuit to the logic. This reversion tends to
be self-healing; normal logic-high voltages are sufficient to reprogram the disturbed
antifuse. However, there is no guarantee that the node containing the disturbed
antifuse will see a logic-high voltage again once the change has occurred. Thus,
the tendency to self-heal is not a reliable antidote. Therefore, a designer using
the Actel M2M or QuickLogic antifuse must limit the current flow through it to
avoid stressing the filament, and it is virtually impractical to use for analog design.
2.1.3 EPROM and EEPROM
UV-erasable electrically programmable read-only memory (EPROM) cells are
used as the programming technology in many programmable devices, such as Altera
MAX 5000 EPLDs and Xilinx EPLDs. Altera's EPROM cell is shown in Figure
2.5 [24]. The EPROM cell is almost as small as an antifuse. An EPROM transistor
looks like a normal MOS transistor except that it has a second, floating gate (gate1 in
Figure 2.5). Applying a programming voltage VPP (usually greater than 12 V) to
the drain of the n-channel EPROM transistor programs the EPROM cell. A high
electric field causes electrons flowing toward the drain to move so fast that they
"jump" across the insulating gate oxide, where they are trapped on the bottom, floating
gate. These energetic electrons are said to be hot, and the effect is known as
hot-electron injection or avalanche injection. EPROM technology is sometimes called
floating-gate avalanche MOS (FAMOS).
Figure 2.5: An EPROM transistor. (a) With a high (> 12 V) programming voltage,
VPP, applied to the drain, electrons gain enough energy to "jump" onto the floating
gate (gate1). (b) Electrons stuck on gate1 raise the threshold voltage so that the
transistor is always off for normal operating voltages. (c) Ultraviolet light provides
enough energy for the electrons stuck on gate1 to "jump" back to the bulk, allowing
the transistor to operate normally.
Electrons trapped on the floating gate raise the threshold voltage of the
n-channel EPROM transistor. Once programmed, an n-channel EPROM device
remains off even with VDD applied to the top gate. An unprogrammed n-channel
device will turn on as normal with a top-gate voltage of VDD. The programming
voltage is applied either from a special programming box or by using on-chip charge
pumps. Exposure to an ultraviolet (UV) lamp will erase the EPROM cell. An
absorbed light quantum gives an electron enough energy to jump from the floating
gate. The manufacturer provides a software program that checks whether a part
is erased. EPLD parts are available in a windowed package for development, so that
a part can be erased and used again, or in a nonwindowed package for production, in
which case the part is programmed (or burned) only once. The packages get hot while
they are being erased, so the windowed option is available only with ceramic
packages, which are more expensive than plastic packages.
Programming an EEPROM transistor is similar to programming a UV-erasable
EPROM transistor, but the erase mechanism is different. In an EEPROM transistor,
an electric field is also used to remove electrons from the floating gate of a
programmed transistor. This is faster than using a UV lamp, and the chip does not
have to be removed from the system.
The advantages of EPROM/EEPROM are reconfigurability and non-volatility.
However, these cells have large resistance, occupy more silicon area, and require
multiple voltage sources to be programmed. Moreover, their fabrication is not
compatible with standard CMOS processes.
2.2 Laser Makelink Technology
All of the programming technologies introduced above suffer from various
problems, such as high resistance, large parasitic capacitance, inability to carry
large current, and incompatibility with standard CMOS processes. Ideally, we wish
the programmable switches to have the properties of a metal wire. Laser Makelink
technology is therefore the most promising candidate, especially for
analog/mixed-signal applications.
Laser processing techniques have been used in the semiconductor industry for
many years. Laser-induced cutting is one successful example. The technology
was first commercialized by IBM in 1979 [25], using a laser with a 1060 nm
wavelength to cut off defective memory cells and "replace" them with
redundant ones. During the early years, polysilicon was the target material.
However, with the development of multi-level metallization, deeply buried polysilicon
lines have become harder to cut. A laser diffused link was also reported, but high
resistance and current leakage limited its commercial application [26]. Hence,
attention turned to the shallower metal layers. Open-window metal cuts have
been used in commercial devices, such as LPGAs, but the exposed metal raises
reliability concerns and the process requires extra mask and process steps. The
most favorable metal cut structure would be hermetic, require no extra mask, and
be compatible with the standard CMOS process. Unfortunately, a recent study indicates
that the applicable laser processing window for buried cuts is too narrow to achieve
adequate yield [27].
As a complementary scheme, the laser-induced metal antifuse, i.e., laser Makelink,
has been proposed [28] [29] [30] [31], and has shown a much broader process window
and higher yield. The electrical connection is formed vertically between two levels
of metallic interconnect by applying an IR laser pulse (1047 nm wavelength) with a
duration of several nanoseconds. This link structure possesses inherent advantages:
extremely low parasitics, strong connections, high reliability, hermeticity, radiation
hardness, and CMOS process compatibility. Thus, this kind of link can be widely
used in digital logic and analog circuit integration.
2.2.1 Laser Makelink Principle
Figure 2.6 is the schematic of a typical vertical Makelink structure. A laser
Makelink is an electrical connection formed between two layers (vertical link) or
within the same layer (lateral link) of metallization by a commercial pulsed IR laser.
The principle of link formation exploits the contrast in material properties between
the metal and the surrounding dielectrics SiO2/Si3N4. The IR laser beam passing
through the square hole of the upper metal (M2) frame impinges on the lower
metal (M1) line with negligible loss of energy in the covering dielectrics. The laser
energy is absorbed at the surface of M1, resulting in a sharp metal temperature
increase. Due to the extremely low thermal conductivity and light absorbance of
the dielectrics, the dielectric temperature changes very little. Meanwhile,
metal expansion fractures the surrounding dielectrics along the stress concentration
paths, and molten metal fills the cracks. At an optimal laser energy and spot size,
dielectric cracks can be controlled to initiate from the upper corners of the M1 line
and terminate near the inside lower corners of the M2 frame without propagating
outside the structure or fracturing the top dielectric passivation. An FIB
cross-section image of a laser-induced Makelink interconnecting structure is shown
in Figure 2.7.
Figure 2.6: Vertical Laser Makelink structure (a) top view (b) cross-section view
Figure 2.7: FIB cross-section of a vertical Makelink structure
2.2.2 Laser Makelink Design
Much effort has been made to investigate the laser-metal interaction and the
resulting thermal/mechanical phenomena in different link structures [35]. Both
experiments and simulations indicate a broad processing window in terms of laser
energy, which ensures good tolerance of laser errors as well as fabrication-induced
variation. Several factors are involved in a successful laser linking process.
Generally, these factors can be divided into three categories: process characteristics,
laser conditions and link geometry. In most cases, designers have little
control over the process parameters, so only the latter two are discussed here.
• Energy Effect: Laser conditions include single pulse energy, pulse duration,
shape and laser spot size. Among them, only laser energy and spot size are
adjustable. In most cases, spot size is determined by the geometry of
the link structure and is usually a fixed parameter. The choice of the laser
energy depends on the specific process parameters (such as dielectric material
and thickness) and the link structure. Due to the non-uniformity of the
temperature distribution, the thermal stress-induced crack initiation time and
propagation direction differ around the annulus. Heat conduction along
the metal line causes a fairly deep temperature gradient beyond the thermal
diffusion length in a single pulse duration. If the energy is too low, the cracking
will stop midway between the two metal lines and fail to form a valid electrical
connection (Figure 2.8 (a)). If the energy is too high, the crack will continue to
propagate along the bottom plane after it reaches the frame. Excessive metal
flow results in large voids in the lower metal, which increases electromigration
risk; at the same time, the undesired crack outside the link frame can destroy
the integrity of the top passivation (Figure 2.8 (b)).
Figure 2.8: Energy effect on the vertical link (2 um bottom metal line, 4 um hole)
formation (a) E = 0.11 uJ; (b) E = 0.49 uJ
In order to characterize the yield and robustness of a Makelink structure, it is
useful to define an appropriate laser process window. The process window
in terms of absolute energy lacks universal significance. Thus a normalized
window is preferred [32]:
$$\text{RelativeEnergyWindow} = \frac{E_H - E_L}{E_{Avg}} \tag{2.1}$$
where $E_H$, $E_L$ and $E_{Avg}$ are the high, low and average energies,
respectively, at which a link can be formed. The relative window is a normalized,
non-dimensional term that eliminates the dependence of the absolute energy
window on the characteristics of different laser systems. It has been shown that
an acceptable energy window can always be found for the metal link process
in aluminum metallization processes insulated by SiO2 dielectric [33].
• Geometry Effect: Zero gap is desired in order to increase the link density.
However, for a laser beam with a Gaussian energy distribution, a link structure
with zero gap is not an efficient design. For example, for a link structure with
a 4 um line, a 4 um hole with zero gap and a 2 um frame width, 38.4% of the laser
energy is absorbed by the frame if the FWHM laser spot diameter is equal to
the metal line width [34]. Thus, the available energy window is significantly
reduced due to the increased probability of frame damage. In addition, due to the
lens effect of the passivation over metal2, part of the laser energy can
be absorbed by the frame face inside the hole. The lower corner of the metal2
frame is heated more quickly than in a planar structure receiving a normally
incident laser beam. This lens effect causes undesired link formation from the
upper corner [34].
Based on our extensive simulation and experimental results [35], we arrived at
a basic rule of thumb: for a vertical Makelink, the horizontal gap between the
top and bottom levels of metal should be roughly equal to the thickness of the
dielectric, because the vertical link usually forms in the 45° direction; for a
lateral link, the distance between the two adjacent metal wires is set equal to
the size determined by the specific design rule.
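The window definition in Equation (2.1) and the Gaussian-beam geometry argument above can be checked numerically. The following is a minimal sketch, not the method used in [32]-[35]: the function names are hypothetical, the beam is assumed to be a centered circular Gaussian, the frame is modeled as the square ring between the hole and the outer frame edge, and the average energy defaults to the midpoint of the window (an assumption, since the text does not say how the average is taken).

```python
import math


def relative_energy_window(e_high, e_low, e_avg=None):
    """Equation (2.1): (E_H - E_L) / E_Avg, a dimensionless quantity."""
    if e_avg is None:
        # Assumption: use the midpoint of the window as the average energy.
        e_avg = 0.5 * (e_high + e_low)
    return (e_high - e_low) / e_avg


def gaussian_fraction_in_square(half_side, fwhm):
    """Fraction of a centered circular Gaussian beam inside a square of
    side 2 * half_side (same length unit as fwhm)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    one_axis = math.erf(half_side / (sigma * math.sqrt(2.0)))
    return one_axis ** 2  # x and y integrals are independent


def frame_energy_fraction(hole, frame_width, fwhm):
    """Fraction of beam energy landing on the square frame around the hole."""
    inner = gaussian_fraction_in_square(hole / 2.0, fwhm)
    outer = gaussian_fraction_in_square(hole / 2.0 + frame_width, fwhm)
    return outer - inner


def vertical_link_gap(dielectric_thickness, crack_angle_deg=45.0):
    """Rule of thumb: horizontal gap for a crack at the given angle from the
    horizontal; at 45 degrees the gap equals the dielectric thickness."""
    return dielectric_thickness / math.tan(math.radians(crack_angle_deg))


if __name__ == "__main__":
    # 4 um hole, zero gap, 2 um frame width, FWHM equal to the 4 um line width.
    print(f"frame fraction: {frame_energy_fraction(4.0, 2.0, 4.0):.3f}")
    # Hypothetical window edges in uJ, for illustration only.
    print(f"relative window: {relative_energy_window(0.30, 0.20):.2f}")
```

Under these assumptions the frame fraction for the 4 um line, 4 um hole, 2 um frame case comes out close to the 38.4% quoted from [34], which suggests the quoted figure corresponds to this simple square-ring model.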
Vertical laser Makelinks have been successfully demonstrated on various CMOS
processes with aluminum metallization [35]. Successful, low-resistance link
formation and link yield are highly dependent on the specific process. Sometimes,
lateral link structures may provide better results. For example, Figure 2.9 shows
four lateral link structures designed on National Semiconductor's 0.18 um, five-layer
metallization process. Vertical link designs proved to be unsuccessful for
this process. Figure 2.10 shows the energy windows of the four structures and their
average resistance per link, with standard deviation smaller than 1 Ω (2.2 um pitch
test chip). Structure 4 has the lowest average resistance and standard deviation,
but it also has the narrowest energy window, because the links were laid out on
the top metal layer, where a small energy increase may easily break the Si3N4
passivation layer. Structure 3, which shows the highest average resistance and
standard deviation within the window, indicates that the three-line design in the
metal 4 layer has high resistance with a large variation, but it is likely to increase
the probability of link formation. For structures 2, 3 and 4, an optimal energy exists
within the energy window that produces the smallest resistance. Figure 2.11 shows the
link yield for the different structures. For this specific CMOS process, structure 2
achieves the highest yield and lowest resistance per link simultaneously at the
optimal energy of 0.25 uJ. Furthermore, its energy window curve follows its yield
curve. The experimental results show that the optimal energies for structures 2 and
3 are 0.25 uJ and 0.22 uJ, respectively. In the case of structure 4, 100% yield (no
opens or shorts in the test chains) was obtained with 0.25 uJ energy, at the cost of
a slightly higher average resistance [36]. Laser Makelinks are not necessarily limited
to aluminum links. They can also be formed using copper. Some novel copper test
structures are now being developed on IBM's BiCMOS SiGe process in Peckerar &
Bernstein's group. Appendix A shows some laser Makelink test chips designed on
various CMOS processes and the IBM SiGe copper process.
Figure 2.9: Four lateral link structures designed for NSC's 0.18 um CMOS process
Figure 2.10: Energy windows of the four lateral link structures and their average
resistance per link (2.2 um pitch)
Figure 2.11: Test chain yield of the four lateral link structures
2.2.3 Summary
Advantages of Laser Makelink in the application of programmable devices
include:
• Makelink is the ideal programmable switch: among all current programming
technologies, Makelink offers the lowest programmed switch resistance and
unprogrammed switch capacitance. For example, a typical Makelink switch
resistance is approximately 1 Ω, which is about 2-3 orders of magnitude smaller
than that of an Actel antifuse or a MOS switch. This makes the laser Makelink
technology ideal for high-speed, low-power and low-noise FPAA applications.
• High reliability and tolerance to high current density: in analog applications, the
switches are required to carry large currents. Therefore, ONO and
M2M antifuses cannot be used.
• Leakage current: because there are many antifuses on a chip, leakage currents
can amount to considerable power consumption. A 10 nA leakage current
in each of the typical 750,000 antifuses on a large FPAA would waste 7.5
mA. Since the amorphous silicon/dielectric layers in ONO and M2M antifuses
are very thin, they produce significantly higher leakage current than
Makelink does.
• Area efficiency: compared with SRAM technology, Makelink can save a significant
amount of silicon. For example, on a commercial CMOS process, let the
minimum-width transistor area be denoted by MWTA. For an SRAM
controlled MOS switch, the number of MWTAs needed for a switch is 1 + 5 = 6
(assuming a five-transistor SRAM cell). For a laser Makelink based FPAA,
no SRAM cell is needed to control the laser Makelink switch. In fact, only
the top 2 levels of metal are used and no silicon area is occupied. The silicon
under the routing interconnects can be used to build more active devices
and larger passive element matrices. Furthermore, considering its radiation
hardness, Makelink will save much more area than traditional SRAM based
technology. It is also worth noting that, at first glance, Makelink appears to
occupy greater area than ONO and M2M antifuses. In fact, however, ONO
and M2M antifuses require contacts to connect to the metal layers, and these
take up more space than the antifuse itself. Accordingly, ONO and M2M do
not offer density advantages over Makelink; the contact and metal spacing
design rules, rather than the size of the antifuse itself, limit how closely the
antifuses may be packed.
• CMOS compatible processing steps: unlike ONO and M2M antifuse technology,
Makelink is completely compatible with any commercial CMOS process.
No extra process step or photomask is required.
• Radiation hardness: since there are no active devices in the Makelink switch, it is
inherently a radiation-hard technology; Makelink consists of pure aluminum and is
therefore truly radiation hard. Accordingly, Makelink-based LFPAAs provide
significant cost savings and are ideal for high-reliability space missions.
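The two numerical claims in the list above (total leakage and the per-switch transistor count) are simple arithmetic, sketched below for reference. The function names are hypothetical, and the defaults are the figures quoted in the text (one pass transistor plus a five-transistor SRAM cell).

```python
import math


def total_leakage_amps(leak_per_antifuse_amps, n_antifuses):
    """Total static leakage if every antifuse leaks the same current."""
    return leak_per_antifuse_amps * n_antifuses


def sram_switch_mwta(pass_transistors=1, sram_cell_transistors=5):
    """Minimum-width transistor areas (MWTA) used by one SRAM controlled switch."""
    return pass_transistors + sram_cell_transistors


if __name__ == "__main__":
    leak = total_leakage_amps(10e-9, 750_000)
    print(f"total leakage: {leak * 1e3:.1f} mA")
    print(f"MWTA per SRAM switch: {sram_switch_mwta()}")
    # A Makelink switch sits in the top metal levels, so it uses no MWTA.
    assert math.isclose(leak, 7.5e-3)
```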
Property | SRAM | Actel ONO | Actel M2M | Makelink
CMOS compatibility | No additional mask levels | Additional mask levels required | Additional mask levels required | No additional mask levels
Net link area | 5X-6X | 1X | 1X | 2X
Leakage current | ~nA | 10 nA | 3 nA | Zero
Typical programmed resistance | 100-1000 Ω | 125 Ω | 25 Ω | 1-10 Ω
Typical unprogrammed capacitance | ~5-10 fF | 7.7 fF | 2.9 fF | < 0.05 fF
Application | Analog, digital, and mixed-signal | Digital | Digital | Analog, digital and mixed-signal
Current carrying capacity | ~ mA-mA | ~10 mA | ~mA | ~mA
Robustness | Robust, also reprogrammable | 1-dimensional poly/n+ diffusion filament | 1-dimensional weak metal silicide filament (cross-section area about 0.06 um2) | 2-dimensional strong metal sheet (cross-section area about 0.5 um2)
Radiation tolerance | Radiation soft | Tactical radiation hardness, TID = 500 Krad(Si) | Tactical radiation hardness, TID = 500 Krad(Si) | Strategic radiation hardness
Figure 2.12: Table 2.1, comparison of laser Makelink with other programming
technologies.
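To make the comparison in Table 2.1 more concrete, the resistance and capacitance entries can be combined into a rough RC figure of merit. The following is a minimal sketch under stated assumptions, not an analysis from this work: the per-switch values are single representative points picked from the table (the low end of the SRAM ranges and the upper bound of the Makelink ranges), the unprogrammed capacitance is used as a stand-in for the load each switch adds to a net, wire parasitics are ignored, and the Elmore formula for a uniform RC ladder is used to show how delay grows with the number of series switches. All names are hypothetical.

```python
# Representative (resistance in ohms, capacitance in fF) points taken from
# Table 2.1. Assumption: low end of the SRAM ranges, upper bound for Makelink.
SWITCHES = {
    "SRAM": (100.0, 5.0),
    "Actel ONO": (125.0, 7.7),
    "Actel M2M": (25.0, 2.9),
    "Makelink": (10.0, 0.05),
}


def rc_product_fs(r_ohm, c_ff):
    """R*C figure of merit; ohm * fF gives femtoseconds."""
    return r_ohm * c_ff


def elmore_delay_fs(r_ohm, c_ff, n_switches):
    """Elmore delay at the end of a ladder of n identical RC sections:
    sum over k = 1..n of (k * R) * C = R * C * n * (n + 1) / 2."""
    return r_ohm * c_ff * n_switches * (n_switches + 1) / 2.0


def rank_by_rc(switches):
    """Technology names ordered from smallest to largest R*C product."""
    return sorted(switches, key=lambda name: rc_product_fs(*switches[name]))


if __name__ == "__main__":
    for name in rank_by_rc(SWITCHES):
        r, c = SWITCHES[name]
        print(f"{name:10s} RC = {rc_product_fs(r, c):7.1f} fs, "
              f"8 switches in series: {elmore_delay_fs(r, c, 8):8.1f} fs")
```

The quadratic growth in the ladder formula is the reason the number of series switches on a net matters so much for the higher-resistance technologies.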
2.3 Laser Makelink Applications
Laser Makelink is a metallurgical connection. It is similar to a via but much
stronger, more reliable, and capable of carrying high current density. Unlike
SRAM-based programming technology, where the MOSFET functions only as a switch to
route signals, Makelinks can be used in the core circuit blocks as "mask metal lines".
For circuit designers, the beauty of laser Makelink is that it gives them the
capability to reconfigure the circuit topology at almost the mask level even after
fabrication. Moreover, laser Makelink can also be used as a low-cost "trimming" method
[37]. Redundant transistors, resistors and capacitors may be added alongside the
specific devices. Whenever necessary, a component value such as the transistor
aspect ratio W/L or a resistance can be fine-tuned for precision by adding or removing
redundant components. Figure 2.13 is an example of changing the MOSFET W/L
using Makelink. The detailed application of laser Makelink in active circuits will be
discussed in the following chapters.
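The trimming idea can be illustrated with a small selection routine. This is a minimal sketch under assumptions that are not taken from the text: each redundant transistor finger is connected in parallel with the main device by programming one Makelink, all fingers share the same channel length, links can only be made (not undone), and the finger widths and target value in the example are hypothetical.

```python
from itertools import combinations


def choose_trim_links(base_width, finger_widths, length, target_ratio):
    """Pick which redundant fingers to link in so W/L is closest to the target.

    Returns (indices of fingers to link, remaining absolute error in W/L).
    Exhaustive search is fine for the handful of fingers a trim network has.
    """
    best_links = ()
    best_error = abs(base_width / length - target_ratio)
    for count in range(1, len(finger_widths) + 1):
        for links in combinations(range(len(finger_widths)), count):
            width = base_width + sum(finger_widths[i] for i in links)
            error = abs(width / length - target_ratio)
            if error < best_error:
                best_links, best_error = links, error
    return best_links, best_error


if __name__ == "__main__":
    # Hypothetical device: 10 um main finger, binary-weighted trim fingers (um).
    links, err = choose_trim_links(10.0, [0.5, 1.0, 2.0, 4.0], 1.0, 13.5)
    print("program Makelinks for fingers:", links, "residual error:", err)
```

Binary-weighted fingers are used in the example only because they give a uniform trim step; any set of redundant widths works with the same search.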
Figure 2.13: Reconfigurable MOS transistor aspect ratio
Chapter 3
Routing for FPAA
Similar to the FPGA, implementing an analog circuit on an FPAA requires a
large number of switches to be programmed to the proper state so that the desired
circuit topology and signal path can be established. Clearly, if the end user has
to specify the state of each switch in the FPAA, the design cycle will be too long.
Therefore, the FPAA is designed so that the end user only describes a targeted
application at a high level of abstraction, typically using schematic entry with
the IP module/CAM (configurable analog module) library provided by the
manufacturer. This high-level description is then mapped and placed into a specific
FPAA architecture. A netlist file, which describes the set of connections to be made,
is generated after the placement phase. The FPAA router then takes this netlist file
as input and performs routing. Combined with the chip layout, the routing result tells
the end user which switches need to be turned on (e.g., laser programmed).
Figure 3.1: A simplified FPAA CAD design flow
3.1 What is routing
The FPAA routing problem is defined as follows: Given a netlist and a place-
ment of the CABs and IO cells, to route all the nets on the given FPAA architecture
without exceeding the total available routing resources and without overly degrading
the performance of the circuit [38].
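The resource constraint in this definition can be stated as a simple check on a candidate routing solution. The following is a minimal sketch, not the router described in this work: the representation of a route as a list of hashable resource identifiers, the identifier format, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict


def find_overused_resources(routes, capacity=1):
    """Return {resource: [nets]} for every routing resource used by more nets
    than its capacity. An empty result means the routing is legal with respect
    to resource usage (it says nothing about analog performance)."""
    users = defaultdict(set)
    for net, resources in routes.items():
        for resource in resources:
            users[resource].add(net)
    return {
        resource: sorted(nets)
        for resource, nets in users.items()
        if len(nets) > capacity
    }


if __name__ == "__main__":
    # Hypothetical resource ids: (channel direction, x, y, track number).
    routes = {
        "vin_p": [("chanx", 1, 0, 3), ("chany", 1, 1, 3)],
        "vin_n": [("chanx", 1, 0, 4), ("chany", 1, 1, 3)],  # shares a track
    }
    print(find_overused_resources(routes))
```

A negotiated-congestion router would tolerate such sharing in early iterations and penalize it until the returned dictionary is empty.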
Unlike custom analog IC designs, routing resources in FPAAs are fixed and
limited. All connections must be completed within the horizontal and vertical chan-
nels, via Manhattan paths. The FPAA routing architecture not only affects routing
but also has significant impact on the performance of the implemented circuit. To
facilitate the FPAA router development, an array-based FPAA architecture was
developed, as shown in figure 3.2.
Figure 3.2: An array based FPAA architecture
3.1.1 Architecture Overview
The following notations were used to describe some important parameters of
the FPAA routing architecture [40]. The number of wires or tracks contained in a
channel is denoted by W, i.e., width of the channel. The number of wires in each
channel to which a CAB pin can connect is called the connection block flexibility, or
Fc. The number of wires to which each incoming track can connect, in a switch block,
is called the switch block flexibility, or Fs. The length of a segment is measured
by the number of CAB blocks it spans. The segmentation distribution Fsd defines
what fraction of the tracks in each channel is of each length.
This FPAA architecture contains a 4X4 CAB array. Each CAB has 8 pins,
with 4 input pins on the left of the CAB and 4 output pins on the right of the CAB,
for fully differential circuit operation. Each CAB is surrounded by 4 connection
boxes. There are 8 tracks per horizontal and 8 tracks per vertical channel. In all,
there are 13 switch boxes and 32 I/O PADs, with 8 pads on each row/column of
CABs. The left column and bottom row PADs are for input only; the top row and
right column PADs are for output only. All the routing resources are uniformly
distributed. So, for this architecture, W is 8 for all channels, Fc is 8, Fs is 4, Fsd
is 1 and all segments have length 4. This FPAA architecture does not contain
segmentation but it can easily be modified, if segmentation is desired. Any array
based FPAA can be readily fitted into this basic architecture, with some appropriate
adjustment. A more versatile structure can be obtained by adding more pins, pads,
tracks or segmentation.
The coordinate system for the architecture runs, as defined in figure 3.2, from (0,
0) to (5, 5). The four corner positions, (0, 0), (0, 5), (5, 0), (5, 5), are blank areas,
i.e., no routing resources are available there. Each X- or Y-directed channel belongs
to the pad or CAB immediately below it or to its left, and has the same coordinates.
In the routing resource graph, a pin-pad-track (PPT) number is used to record the
internal index of CAB pins, I/O pads and tracks in the channel. The PPT number
for PADs ranges from 0 to 7, starting with bottom or left most PAD. CAB pins are
sorted from inputs to outputs. Input CAB pins have PPT numbers ranging from 0
to 3; output CAB pins have PPT numbers ranging from 4 to 7. The top left pin
(first CAB pin) has a PPT number of 0, and the bottom right (the last CAB pin)
has a PPT number of 7. Inside each channel, the PPT number ranges from 0 to 7,
with 0 always denoting the bottom or left most track.
3.1.2 Switch Box and Connection Box
As depicted in figure 3.3 (a), the input/output pins of the CAB connect to the
tracks in horizontal and vertical channels through a connection box. Connection
boxes are also used to connect I/O PADs to the tracks. Connections from vertical
to horizontal tracks, or vice versa, are switched at the intersection by a switch box.
There are two types of switch box patterns. Pattern 1: tracks with different
parity indices are connected. Pattern 2: tracks with same parity indices are con-
nected. In the FPAA architecture, these two patterns alternate in each column,
starting with switch box pattern 1, which is the first one on top left.
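The parity rule for the two switch box patterns can be sketched in C as follows. This is only an illustration of the rule as stated above, read literally; the names `sb_pattern_t` and `sb_tracks_connect` are hypothetical and do not come from the router source, and the sketch assumes tracks are identified by their PPT numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the two switch box patterns described in the text. */
typedef enum { SB_PATTERN_1 = 1, SB_PATTERN_2 = 2 } sb_pattern_t;

/* Returns true when a horizontal track and a vertical track (given by their
 * PPT numbers) can be joined by a switch in a box of the given pattern.
 * Pattern 1 joins tracks of different parity, pattern 2 joins tracks of the
 * same parity. */
static bool sb_tracks_connect(sb_pattern_t pattern, int h_track, int v_track)
{
    assert(h_track >= 0 && v_track >= 0);
    bool same_parity = (h_track % 2) == (v_track % 2);
    return (pattern == SB_PATTERN_2) ? same_parity : !same_parity;
}
```

With 8 tracks per channel, either pattern lets each incoming track reach 4 tracks of the crossing channel under this reading.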
3.1.3 Definition of Legal Connections
Based on the architecture above, the following are defined as legal routing
connections:
• LHS column and bottom row pads are for input only.
• RHS column and top row pads are for output only.
• LHS pads can connect to all the tracks in chany (0, 1).
• RHS pads can connect to all the tracks in chany (4, 1).
• Bottom row pads can connect to all the tracks in chanx (1, 0).
• Top row pads can connect to all the tracks in chanx (1, 4).
• Pins on the left of the CAB are for input; pins on the right of the CAB are for
output.
• Input CAB pins can connect to tracks in the channels immediately on the left,
top and bottom of the CAB.
• Output CAB pins can connect to tracks in the channels immediately on the
right, top and bottom of the CAB.
• Tracks in the horizontal channel can connect to tracks in the vertical channel if a
switch is available at the intersection.
Figure 3.3: (a) A connection box; (b) switch box pattern 1; (c) switch box pattern 2
• Direct connections between CAB pins are not allowed.
• Direct connections between PADs and CAB pins are not allowed.
• Doglegs are not allowed, i.e., a CAB pin cannot act as an intermediate vertex
when routing a net.
3.2 Problem Formation
Routing problems are generally studied as a graph problem [39]. All routing
resources and their relationships, capacities and constraints are incorporated into
a routing resource graph (RRG). The router uses this graph to solve the routing
problem. A simplified FPAA architecture, and its associated RRG, is shown in
figure 3.4. Each track, PAD or CAB block pin is represented by a vertex in the
RRG. Each switch is represented by an edge. For example, pin3 of the CAB1 block
is represented by vertex (3); wire b is represented by vertex (b). The red net is
shown as a red tree in the RRG. Each vertex has a capacity, which is defined as the
maximum number of nets that can use this vertex in a legal routing. Track segments
have capacity one because only one net can use each. Because the laser Makelink
switch is bidirectional, the RRG of the FPAA is an undirected graph.
To route a multi-terminal net with minimum distance or delay is equivalent
to finding a minimum-length tree on the routing resource graph that spans all
the connecting vertices of the net [39]. This is essentially a Minimum Steiner Tree
(MST) problem. Using the RRG, the routing problem is converted into a graph problem:
find multiple MSTs in the routing resource graph. The MST problem is NP-complete
[41], [42], [43]. Therefore, routing multiple multi-terminal nets for an FPAA
is also an NP-complete problem. Accordingly, no known efficient routing algorithm
can guarantee the optimal result, i.e., it is likely that only an approximate,
sub-optimal solution will be obtained.
Figure 3.4: (a) a simplified FPAA architecture (b) the corresponding routing
resource graph (RRG)
There are two standard ways to store an RRG: as a set of adjacency lists, figure
3.5(b), or as an adjacency matrix, figure 3.5(c) [41]. An adjacency list was used for
the routing algorithm development because it provides a more economical way to store
sparse graphs. The adjacency-list representation of a graph G = (V, E) consists
of an array of |V| lists, one for each vertex in V. For each u ∈ V, the adjacency
list contains all the vertices v such that there is an edge (u, v) ∈ E. A potential
disadvantage of the adjacency list is that there is no quicker way to determine whether
a given edge (u, v) is present in the graph than to search for v in the adjacency
list of u.
Figure 3.5: (a) a directed graph (b) adjacency list (c) adjacency matrix
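As a minimal sketch of the adjacency-list representation described above, the following C fragment stores an undirected graph as an array of singly linked neighbor lists. The type and function names (`graph_t`, `graph_add_edge`, `graph_has_edge`, and so on) are hypothetical and are not taken from the router; the sketch only illustrates why an edge query costs a list walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One entry in a vertex's neighbor list. */
typedef struct adj_node {
    int vertex;
    struct adj_node *next;
} adj_node_t;

/* Graph stored as an array of |V| singly linked neighbor lists. */
typedef struct {
    int num_vertices;
    adj_node_t **adj;
} graph_t;

static graph_t *graph_create(int num_vertices)
{
    assert(num_vertices > 0);
    graph_t *g = malloc(sizeof *g);
    if (g == NULL) return NULL;
    g->num_vertices = num_vertices;
    g->adj = calloc((size_t)num_vertices, sizeof *g->adj);
    if (g->adj == NULL) { free(g); return NULL; }
    return g;
}

/* Prepends v to the neighbor list of u. */
static bool push_neighbor(graph_t *g, int u, int v)
{
    adj_node_t *node = malloc(sizeof *node);
    if (node == NULL) return false;
    node->vertex = v;
    node->next = g->adj[u];
    g->adj[u] = node;
    return true;
}

/* Adds the undirected edge u-v by inserting each endpoint into the
 * other's neighbor list. */
static bool graph_add_edge(graph_t *g, int u, int v)
{
    assert(u >= 0 && u < g->num_vertices && v >= 0 && v < g->num_vertices);
    return push_neighbor(g, u, v) && push_neighbor(g, v, u);
}

/* Edge query: walks the neighbor list of u, so it costs O(degree(u)). */
static bool graph_has_edge(const graph_t *g, int u, int v)
{
    for (const adj_node_t *n = g->adj[u]; n != NULL; n = n->next)
        if (n->vertex == v) return true;
    return false;
}

static void graph_destroy(graph_t *g)
{
    for (int u = 0; u < g->num_vertices; u++) {
        adj_node_t *n = g->adj[u];
        while (n != NULL) { adj_node_t *next = n->next; free(n); n = next; }
    }
    free(g->adj);
    free(g);
}
```

Memory use grows with |V| + |E| rather than |V|², which is the saving that matters for a sparse routing resource graph.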
A unique RRG is required for routing each FPAA architecture. Manually cre-
ating such graphs is very time-consuming, or even impossible. In order to test as
many architecture variations as possible, and interactively optimize both the ar-
chitecture and router, a routing resource graph generator (RRGG) was developed
to automatically generate the RRG for each given architecture. The role of the
RRGG is shown schematically in figure 3.6 [40]. The RRGG converts the targeted
FPAA architecture into a highly detailed RRG, which will be used by the
router. The RRGG is transparent to the 'user' (who defines the architecture) and
the router. Moreover, if the architecture is modified, only the RRGG needs to
be modified; the router code does not need to be rewritten and can still function
correctly with very little modification.
Figure 3.6: The role of routing resource graph generator
When building the RRG, a coordinate system must be clearly defined. The
vertices are indexed from bottom to top and from left to right, i.e.,
(0, 0), (0, 1) ... (0, 5), (1, 0) ... (1, 5), ... (5, 5). The four blank positions, (0,
0), (5, 5), (0, 5), (5, 0), are skipped. The program starts building the routing
resource graph from position (0, 1). I/O PAD vertices are added to the RRG
first. Whenever there is a possible connection between a PAD and a routing track, an
edge (i.e., a neighbor of this vertex) is added to the linked list of that PAD vertex.
Since the Makelink switch is bidirectional and this is an undirected graph, an edge is
also added to the linked list of this vertex's neighbor. This work is done by the
subroutine creat_edge_list. Given the (x, y) coordinates of a vertex, its routing
resource type and its internal PPT number, the vertex index in the routing resource
graph can be calculated by calling the subroutine get_vertex_index. X, Y, routing
resource type and PPT number can be obtained directly from the loop control. For
example, PAD 0 (the first PAD in PAD group (0, 1)) is added to the RRG first.
According to the connection definition, it can connect to all the tracks in channel
Y (0, 1). The program loops over all the tracks at position (0, 1), from 0 to 7,
calculates their indices, and adds these vertices to the neighbor list of PAD 0. At the
same time, PAD 0 is added to the neighbor list of each of those tracks. CAB pins
and the associated X/Y tracks are added in the same way. If there is a switch box at
the intersection of the X and Y channels, the tracks in these channels are added to
each other's linked lists. Because there are two types of switch box pattern, care
should be taken when adding the tracks to the neighbor lists of the connected tracks.
Finally, after all the vertices have been processed, the generated RRG is written to an
RRG file.
3.3 FPAA Routing Algorithm
3.3.1 Introduction
Conventionally, the task of routing is carried out in two phases: global routing
and detailed routing [39], [40], [44], [45], [46], [47]. In the global routing phase, a list
of regions (channels) are assigned to each net, without specifying actual track-pin
connections; connections are completed in the detailed routing phase. This two-step
routing method is mainly due to the complexity of the problem. However, there are
two apparent drawbacks: (1) The task of detailed routing is usually very difficult or
impossible because the routing resource of FPGA/FPAA is fixed and limited and
the detailed routing is highly constrained by the decisions made during the global
routing phase; (2) Even when the circuit is routable, the overall routing result is
likely to be sub-optimal, even if an optimized result is obtained in each phase.
Therefore, a one-step, combined global-detailed routing scheme is preferred in our
routing algorithm development [48], [49], [50], [51].
As stated previously, the routing problem is essentially an MST problem in
graph theory. There are several algorithms available to attack this problem. Many
of these routing algorithms use some variation of Lee's maze router. A maze router
essentially consists of running Dijkstra's algorithm. The searching strategy is very
similar to the one used in Prim's algorithm. So, in this subsection, a brief overview
of these three important algorithms is given.
• Lee's Maze Algorithm [52] This algorithm is best illustrated by figure 3.7. The
task is to find a shortest path from the source, s, to the target, t. First, a grid is
overlaid on the plane; each grid cell is a location through which one wire can pass.
Each grid cell is then marked with its distance from the source. The search begins
at the source and finds all the grid cells at distance 1, distance 2, ... until it
reaches the destination, t. The algorithm thus proceeds in a manner similar to wave
propagation. This procedure guarantees that a shortest path will be found if one
exists.
Figure 3.7: Lee's Maze Router
• Dijkstra's Algorithm [41] Dijkstra's algorithm solves the single-source, shortest-
path problem on a weighted, directed graph G = (V, E), for the case in which
all edge weights are non-negative values. It is presented in figure 3.8.
Dijkstra(G, w, s)
for each u ∈ V[G] {
    dist(s,u) = ∞;
    pre(u) = NULL;
}
dist(s,s) = 0;
Done = ∅;
Q = V[G];
while Q != ∅ {
    find u ∈ Q with min. dist(s,u);
    Q = Q − {u};
    for each v adjacent to u
        if dist(s,v) > dist(s,u) + w(u,v) {
            dist(s,v) = dist(s,u) + w(u,v);
            pre(v) = u;
        }
    Done = Done ∪ {u};
}
Figure 3.8: Dijkstra algorithm
Dijkstra's algorithm maintains a set 'Done' of vertices whose final shortest-path
distances from the source s have already been determined. Initially, all the vertices
are enqueued in Q. The algorithm repeatedly selects the vertex u ∈ Q with
the minimum shortest-path estimate, updates the estimates and predecessors of its
neighbors where a shorter path through u exists, and inserts u into the set 'Done'.
• Prim's Algorithm [41] Prim's algorithm operates much like Dijkstra's algorithm
for finding shortest paths. At each step, a light edge is added. The distance
of a new vertex is calculated with respect to the existing, partially
finished tree (net). This algorithm applies a greedy strategy. The key to
efficiently implementing Prim's algorithm is to make it easy to select a new
edge to be added to the tree. During execution of the algorithm, all vertices
that are not in the partial tree (net) are stored in a priority queue. key[v] is
the priority value of vertex v, i.e., the minimum weight of any edge connecting v
to the partial tree. Prim's algorithm is shown in figure 3.9.
Prim(G, w, r)
for each u ∈ V[G] {
    key[u] <- ∞
    pi[u] <- NIL
}
key[r] <- 0
Q <- V[G]
while Q != ∅ {
    u <- Extract-Min(Q)
    for each v ∈ Adj[u]
        if v ∈ Q and w(u, v) < key[v] {
            pi[v] <- u
            key[v] <- w(u, v)
        }
}
Figure 3.9: Prim algorithm
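To make the shortest-path search concrete, the sketch below implements Dijkstra's algorithm in C on a small dense graph. It is a simplified illustration, not the router's code: it uses a weight matrix and a linear scan for the minimum instead of the adjacency list and priority queue used in this work, and the names `NV`, `NO_EDGE` and `dijkstra` are hypothetical.

```c
#include <assert.h>
#include <float.h>

#define NV 5              /* number of vertices in this toy graph (assumed) */
#define NO_EDGE (-1.0)    /* any negative weight marks "no edge" */

/* Computes shortest-path distances from src.
 * w[u][v] >= 0 is the weight of edge (u, v); a negative entry means no edge.
 * On return, dist[v] is the distance from src (DBL_MAX if unreachable) and
 * pre[v] is the predecessor of v on a shortest path (-1 if none). */
static void dijkstra(double w[NV][NV], int src, double dist[NV], int pre[NV])
{
    int done[NV] = {0};

    assert(src >= 0 && src < NV);
    for (int u = 0; u < NV; u++) {
        dist[u] = DBL_MAX;
        pre[u] = -1;
    }
    dist[src] = 0.0;

    for (int k = 0; k < NV; k++) {
        /* Select the unfinished vertex with the smallest distance estimate. */
        int u = -1;
        for (int i = 0; i < NV; i++)
            if (!done[i] && (u < 0 || dist[i] < dist[u]))
                u = i;
        if (u < 0 || dist[u] == DBL_MAX)
            break;                      /* remaining vertices are unreachable */
        done[u] = 1;

        /* Relax every edge leaving u. */
        for (int v = 0; v < NV; v++) {
            if (w[u][v] < 0.0 || done[v])
                continue;
            if (dist[v] > dist[u] + w[u][v]) {
                dist[v] = dist[u] + w[u][v];
                pre[v] = u;
            }
        }
    }
}
```

Following `pre[]` backwards from a target vertex to the source recovers the path, which is the same idea as the back-tracing step of a maze router.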
3.3.2 Pathfinder Negotiated Routing Algorithm
There are many trade-offs when routing a circuit netlist. For example, per-
formance and congestion may conflict. A pure, routability-driven router may pro-
duce poor performance, while pure performance-driven routing may result in an
unroutable circuit. How to balance these trade-offs is the major concern of the
router. An efficient way to do this is to incorporate those trade-offs into a
cost function. Most routers perform multiple routing iterations in which some or
all of the nets are ripped up and rerouted by different paths to resolve competition
for routing resources, or to improve circuit performance. Which net obtains a
contested resource is determined by the cost function. Therefore, cost
function design is critical for routing algorithm development.
The FPAA router is based on the Pathfinder Negotiated Routing Algorithm
[48], [49], [51], [53]. Depending on the cost function design, it can be either purely
routability-driven or balanced, congestion-performance-driven routing. However,
for this small-scale FPAA, a 4x4 CAB array comparable to Anadigm's AN10E40, a
routability-driven router is sufficient.
• Cost Function Definition [48], [54]
Before any further discussion of the algorithm, the cost function is defined.
The following equations were used for the cost function in this router:
$$\mathrm{Cost}(n) = b(n) \cdot h(n) \cdot p(n) \tag{3.1}$$
where b(n), h(n) and p(n) are the base cost, history congestion cost and present
congestion cost, respectively. The present and history congestion cost functions
are defined as follows:
$$p(n) = 1 + \mathrm{occupancy} \cdot \mathrm{pfac} \tag{3.2}$$
$$h(n)^{i} = h(n)^{i-1} + \mathrm{occupancy} \cdot \mathrm{hfac} \tag{3.3}$$
where pfac and hfac are experimental parameters, and i is the iteration number.
When i = 1, h(n) equals 1. The example in figure 3.10 shows how the
router can use p(n) to resolve congestion. During the first iteration, all
3 nets go through vertex B, which has the lowest cost. During subsequent iterations,
p(n) is updated, i.e., the penalty for using vertex B increases. Then, during
some later iteration, net 1 will find that a path through vertex A gives a lower
cost. Similarly, net 3 will find that a path through C gives an overall lower cost.
Figure 3.10: The functionality of p(n) in resolving the congestion
In this router, base cost is set to 1, for all the vertices. The performance of
the router is not very sensitive to how the exact base cost is chosen, since
the primary goal of the router is congestion avoidance, regardless of the base
cost value. In the p(n) and h(n) functions, pfac and hfac are two parameters
that determine how the routing is scheduled. Since h(n) is incremented after
every iteration and provides sufficient penalties for overused vertices, hfac can
be set to a constant value. hfac is set to 0.5 in this router. p(n) is updated
more frequently. To achieve high quality routing results, pfac should initially
be small, so that congestion carries little penalty, and should gradually increase
from iteration to iteration. The trade-off is that slowly increasing pfac gives
better quality routing, while quickly increasing pfac (which makes congestion
very expensive) speeds up the router. Here, pfac is initially set to 0.5 and
is then increased to 1.5 times its previous value with each iteration. Due to
the small scale of this FPAA, variations in these two parameters cause no
noticeable differences.
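The cost terms of equations (3.1) to (3.3) can be sketched in C as below. This is an illustrative reading of the equations, not the router's implementation: the struct `vertex_cost_t` and the function names are hypothetical, and `next_pfac` assumes that pfac is multiplied by 1.5 at each iteration.

```c
#include <assert.h>

/* Per-vertex quantities needed by the cost function (hypothetical layout). */
typedef struct {
    double base;      /* b(n): base cost, set to 1 for all vertices here */
    double hist;      /* h(n): history congestion cost, 1 in the first iteration */
    int occupancy;    /* number of nets currently using the vertex */
} vertex_cost_t;

/* Equation (3.2): p(n) = 1 + occupancy * pfac. */
static double present_cost(const vertex_cost_t *v, double pfac)
{
    return 1.0 + v->occupancy * pfac;
}

/* Equation (3.1): Cost(n) = b(n) * h(n) * p(n). */
static double vertex_cost(const vertex_cost_t *v, double pfac)
{
    assert(v != 0);
    return v->base * v->hist * present_cost(v, pfac);
}

/* Equation (3.3): applied once per vertex at the end of each iteration. */
static void update_history(vertex_cost_t *v, double hfac)
{
    v->hist += v->occupancy * hfac;
}

/* Assumed schedule: pfac grows to 1.5 times its previous value. */
static double next_pfac(double pfac)
{
    return pfac * 1.5;
}
```

For a vertex used by three nets with b(n) = 1, hfac = 0.5 and pfac = 0.5, the cost is 2.5 in the first iteration and rises to 8.125 in the second, which is the growing penalty that pushes nets onto alternative paths.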
• Pathfinder Negotiated Routing Algorithm
The detailed pathfinder negotiated routing algorithm is shown in figure 3.11.
RT(net i): a linked list used to store the set of vertices in the current routing of net i
While (overused resources exist && max iteration not exceeded) {
    For (each net i) {
        If RT(net i) is not empty then rip up the existing RT(net i) and update p(n);
        Initialize RT(net i) to the source terminal;
        For (each sink j of net i) {
            If PQ is not empty then free PQ and re-initialize PQ;
            Mark all the vertices as un-reached by wave expansion;
            Initialize PQ to RT(net i) and set pathcost equal to the base cost of
            each vertex in RT(net i);
            If (sink j is not found in RT(net i)) {
                do {
                    Dequeue the lowest-cost vertex m from PQ;
                    For (all fanout vertices n of vertex m) {
                        If (n is not a PIN or PAD and was un-reached during previous
                            wave expansion)
                            add n to PQ and update pathcost(n) = pathcost(m) + cost(n);
                        else if (n is a sink)
                            add n to a sink list;
                        else continue wave expansion;
                    }
                } while (no sink has been found); /* Wave expansion ends here */
            }
            if (more than one sink is found during this wave expansion) {
                add those sinks and their parents to RT(net i);
                update p(n) only if vertex n is not contained in RT(net i);
            }
            for (all vertices n in the path from RT(net i) to sink j) {
                /* Backtrace from the linked list of sinks */
                update p(n) only if vertex n is not contained in RT(net i);
                add n to RT(net i);
            } /* Backtracing ends here */
        }
    }
    Update h(n) for all n;
} /* End of one iteration */
Figure 3.11: The improved pathfinder negotiated routing algorithm
Pathfinder negotiated routing repeatedly rips up and reroutes every net in
the circuit until all congestion is eliminated. During the first routing
iteration, every net is routed for minimum cost, even if this leads to congestion.
After each routing iteration, the cost of overuse is increased. The router can
determine how to arrange the routing resources based on the cost of each
vertex. Consequently, if overuse exists at the end of a routing iteration, more
iterations are performed to resolve this congestion.
• Implementation of Pathfinder Negotiated Routing Algorithm
The router takes the netlist as its input and starts routing based on the RRG
generated by the RRGG. The flow chart for the program is shown in figure 3.12.
Figure 3.12: Pathfinder algorithm
There are three types of vertices in the router: routing resource graph (RRG) vertex;
routing tree (RT) vertex; and priority queue (PQ) vertex. A routing tree is used
to store the vertices in the partially finished, or finally completed, routing of
a net. Each RT corresponds to a net in the netlist. It is ripped up
after every routing iteration, until all the nets are successfully routed. In this
pathfinder negotiated routing, whenever a new iteration starts, the router first
rips up the existing RT/net. The cost of the vertices in the RT is recalculated and
the source of this net is reassigned to the RT. Then, the router loops over all the
terminals of this net. The router performs a breadth-first search (wave expansion)
over all the fanouts of the lowest-cost vertex in PQ. If no sink is found, all of the
fanouts are added to PQ. After the router finds a sink (or more sinks), it
begins back-tracing. If x (x > 1) sinks are found, the first x − 1 sinks and their
parents are added to the RT, then the router starts back-tracing from the last
one. Routing iterations stop when all the nets are successfully routed or when
the maximum number of routing iterations is exceeded.
When programming the router, the following should be noticed:
- Since some vertices in RT may appear more than once, the router must make
sure, when initializing PQ to RT, that there is no repeated vertex in PQ.
Otherwise, multiple wave expansions will be carried out from the same
vertex, which reduces the router's performance. Similarly,
during intermediate wave expansion stages, any vertices that have been
previously reached should be removed from future wave expansions.
- Intuitively, if a netlist is placed appropriately in an FPAA/FPGA, sinks
of the same net tend to stay close to each other. It is very likely that
more than one sink could be found during the same wave expansion.
However, in the original algorithm, the wave expansion procedure stops
whenever a sink is found. Then, a new wave expansion starts for
the next sink. Thus a significant amount of the router's work could be
wasted, especially when the wave expansion starts very deep inside the
RRG. A more efficient mechanism was therefore developed by introducing
a temporary sink list. Every wave expansion step (the scan over all fanouts
of the dequeued vertex) must be fully completed even if a sink has been found.
If more than one sink is found, those sinks are added to the temporary sink
list and then added to RT. Before a new iteration starts for the next sink,
the router first checks whether this sink is already contained in RT. The wave
expansion stops when the number of sinks found is greater than or equal to 1.
The back-tracing stage starts from the last sink in this temporary sink list.
Since sinks may sometimes be found out of the loop order, a flag variable
should be introduced to ensure every sink of a given net is found. The router
checks this flag before it moves on to the next sink in the loop, so it does
not miss any sink.
- The priority queue, PQ, is the critical data structure in implementing this
algorithm. The memory occupied by PQ must be appropriately allocated
and released after each iteration. In order to better manage the
dynamically allocated PQ memory, three special data members, size0,
avail0 and d0, are used to track the vertices that were historically in PQ.
Those three members represent a redundant array. This redundant array
is used to copy the locations of all the vertices that are currently in PQ,
or that used to be in PQ. Then, the router knows where and how to
release the memory for the new PQ.
3.4 Data Structure
The primary data structures used in the router are linked list and priority
queue.
There are three types of vertices in the router: routing resource graph (RRG)
vertex; routing tree RT vertex; and priority queue PQ vertex. RRG vertices and RT
vertices are maintained by linked list, while PQ vertices are maintained by priority
queue. Their definitions are shown in figure 3.13.
When a vertex is used by a net, its occupancy increases by 1. Capacity is 1
for all vertices, since only one net can legally use a vertex. A vertex's edge list is
designed as a 1-D array, for easy access. After the main program calls the build_rrg()
subroutine, all the RRG vertices are loaded into memory and are ready for use by
the router.
A routing tree is used to store the vertices in the current routing of a net.
Each RT corresponds to a net in the netlist. It will be ripped-up after each routing
iteration, until all the nets are successfully routed. Since all the information needed
in the back-tracing stage is stored in priority queue, RT was implemented just with
a simple linked list.
typedef struct {
    int index;          /* index of the vertex */
    short x;            /* integer x coordinate */
    short y;            /* integer y coordinate */
    short ppt_num;      /* pin, track or pad number, depending on rr_vertex type */
    t_rr_type type;     /* what is this routing resource? (t_rr_type is defined
                           elsewhere in the router) */
    int occupancy;      /* how many nets are using this vertex now? */
    int capacity;       /* how many nets can legally use this vertex? */
    int num_edges;      /* number of edges exiting this vertex, i.e., the number
                           of vertices to which it connects */
    int *edge_list;     /* pointer to the list of all its neighbors (a 1-D array) */
} t_rr_vertex;

/* Data structure of a routing tree member */
struct s_RTvertex {
    int index;                  /* the index of this vertex */
    short PQflag;
    struct s_RTvertex *pNext;   /* pointer to the next vertex */
};
typedef struct s_RTvertex t_RTvertex;

/* Data structure for priority queue vertices */
struct s_PQvertex {
    int index;                   /* the index of this vertex */
    struct s_PQvertex *pParent;  /* parent of this vertex in RRG, NOT the parent in PQ */
    double pathcost;             /* the pathcost of this vertex in the partial net */
};
Figure 3.13: Data structure definitions
The critical data structure in the routing algorithm development is the priority
queue, or more precisely, a minimum binary heap priority queue [39], [41], [55]. For
a regular queue, new items are added to one end of the queue and are removed
from its other end. Items are taken out of the queue in first-in-first-out (FIFO)
order. A priority queue is different from a regular queue in that the items
it contains are not arranged in the order in which they were enqueued,
but by their priority. When an item is removed from a priority queue, it has, of all
items, the highest priority (in the context of this router, the highest priority means
the item has the minimum pathcost).
A binary heap is basically a binary tree for which the following two properties
hold:
• Each vertex is associated with a scalar key value, called its priority.
• No vertex in the tree has children whose priority is higher than its own.
Binary heaps have two important properties. First, the vertex bearing the highest
priority is always the root vertex. Second, insertion or removal of records takes
O(log n) time, where n is the number of items in the heap. A binary heap priority
queue is a priority queue that internally uses a binary heap to organize its items.
There are many methods to implement a priority queue. An efficient
way is to use a plain 1-D array. Assume there are n vertices in PQ. The vertices
are stored in the array, with n slots, in which:
• the children of the vertex in slot i occupy slots 2i and 2i + 1
• the parent of the vertex in slot i lives in slot ⌊i/2⌋.
So, when removing the lowest cost vertex from PQ, the root vertex that sits in slot
1 is the one removed. There is a straightforward one-to-one correspondence between
binary heaps and flattened-out array representations of binary heaps. Since
the link relationship between any two vertices is directly obvious from their
respective slot indices, there is no need to explicitly store any links, which saves
substantial amounts of time and space.
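The array layout described above can be sketched as a small minimum binary heap in C. This is a simplified illustration rather than the router's PQ: it uses a fixed-capacity array instead of dynamically allocated memory, omits the parent pointer kept in `s_PQvertex`, and the names `pq_t`, `pq_push` and `pq_pop` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PQ_CAPACITY 64              /* illustrative fixed capacity */

typedef struct {
    int index;                      /* RRG vertex index */
    double pathcost;                /* priority key: smaller means higher priority */
} pq_item_t;

typedef struct {
    pq_item_t slot[PQ_CAPACITY + 1];  /* slot 0 is unused; the root is slot 1 */
    int size;
} pq_t;

static void pq_init(pq_t *pq) { pq->size = 0; }

static void pq_swap(pq_item_t *a, pq_item_t *b)
{
    pq_item_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Inserts an item in the last slot and sifts it up towards the root. */
static void pq_push(pq_t *pq, int index, double pathcost)
{
    assert(pq->size < PQ_CAPACITY);
    int i = ++pq->size;
    pq->slot[i].index = index;
    pq->slot[i].pathcost = pathcost;
    while (i > 1 && pq->slot[i].pathcost < pq->slot[i / 2].pathcost) {
        pq_swap(&pq->slot[i], &pq->slot[i / 2]);   /* parent lives in slot i/2 */
        i /= 2;
    }
}

/* Removes the minimum-pathcost item (slot 1); returns false if PQ is empty. */
static bool pq_pop(pq_t *pq, pq_item_t *out)
{
    if (pq->size == 0)
        return false;
    *out = pq->slot[1];
    pq->slot[1] = pq->slot[pq->size--];
    for (int i = 1;;) {
        int left = 2 * i, right = left + 1, min = i;   /* children: 2i, 2i+1 */
        if (left <= pq->size && pq->slot[left].pathcost < pq->slot[min].pathcost)
            min = left;
        if (right <= pq->size && pq->slot[right].pathcost < pq->slot[min].pathcost)
            min = right;
        if (min == i)
            break;
        pq_swap(&pq->slot[i], &pq->slot[min]);
        i = min;
    }
    return true;
}
```

Both operations touch at most one slot per tree level, which is where the O(log n) bound comes from, and no child or parent pointers are stored.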
3.5 Investigations of Performance Constraints on the Routing
The goal of routing is not only to complete all the required connections without
congestion, but also to satisfy a set of performance constraints. For a small-scale
FPAA (comparable to Anadigm's AN10E40 [4]), a routability-driven router
is sufficient. When the scale of the FPAA grows and the bandwidth of the instantiated
circuit increases significantly, performance-driven routing would be necessary.
The performance constraints imposed on analog routing are quite different
from those of digital routing. For an FPGA/digital circuit, performance is measured
by clock speed and/or delay on the critical path. However, for an FPAA/analog
circuit, the system performance is usually measured by its bandwidth, gain, slew
rate, output swing, CMRR, PSRR, linearity etc. Thus, signal delay is not the only
concern. Routing parasitics can affect the performance of an analog system in many
different ways. For example: (1) In an op amp circuit, a small capacitive coupling
may degrade the frequency response due to the Miller effect; (2) Stray coupling
which gives rise to positive feedback may lead to oscillations; (3) In some cascode
configurations, the output node usually has a very large resistance, Rout. When a net
travels a long distance, the parasitic capacitance to ground can introduce an extra
pole (for example, a pole very close to the dominant pole) that may deteriorate the
op amp's stability and slew rate.
To the best of our knowledge, there is no explicit timing definition for analog
circuits comparable to the digital counterpart. Therefore, FPAA routers cannot simply
compare the timing criticality/delay of two paths to decide the route. In the digital
domain, the performance constraints are in fact induced by RC delay, which can be
accounted for efficiently with the timing term in the cost function. However, the
performance constraints (tolerable variation of gain, bandwidth etc.) imposed on an
analog array are too abstract for the routing tools to handle directly; thus they
must be converted to a set of routing constraints, i.e., interconnect parasitic
constraints. Once the routing constraints are met by the router, the performance
constraints of the analog circuit should also be satisfied. The performance-driven
routing problem can be defined as follows [56], [57]:
Definition: For a set of performance functions $\{W_i\}$, $i = 1, 2, \ldots, N_w$ and a
set of parasitics $\{p_j\}$, $j = 1, 2, \ldots, N_p$, the parasitic constraints or routing
constraints on a subset of $\{p_j\}$ are defined as:
• Matching constraint: $p_j = p_k$
• Bounding constraint: $p_j \le p_{j,bound}$
and they ensure $\Delta W_i \le |\Delta W_{i,max}|$, where $|\Delta W_{i,max}|$ is the maximum allowed
performance variation due to the parasitics.
Parasitics that are to be controlled during routing are metal wire resistance,
switch resistance, metal wire to ground and metal-to-metal capacitance.
Modeling the interconnect as a true, frequency-dependent transmission line
can capture the behavior of the line more accurately. However, inductance is much
more complicated to extract than resistance or capacitance because of the loop
current definition of inductance. The critical length of a line can be determined
from knowing the desired signal frequency along with the speed of propagation of
the interconnect structure. As a rule of thumb, an interconnect structure should be
considered as a transmission line when its physical length approaches 1/4 to 1/10
the wavelength of the highest frequency signal [58], [59].
For the case of a simple microstrip line, the wavelength at a given frequency
is:
$$\lambda_g = \frac{300}{F \sqrt{\epsilon_{eff}}} \ \mathrm{mm} \tag{3.4}$$
where $\epsilon_{eff}$ is the effective dielectric constant given by
$\epsilon_{eff} = \frac{1}{2}(\epsilon_r + 1)$, and F is
the frequency in GHz. Assuming the highest frequency signal of interest (to pass) is
1 GHz, the corresponding wavelength is 191.69 mm. Assuming the 1/4 rule, the line
length for which transmission line properties become important is about 47.92 mm.
Therefore, the effect of parasitic inductance can be neglected for on-chip circuits.
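The figures quoted above can be reproduced approximately with the following worked calculation. The text does not state the relative dielectric constant used; the value $\epsilon_r \approx 3.9$ (typical of silicon dioxide) is an assumption made here only to show the arithmetic.

```latex
\begin{align*}
\epsilon_{eff} &= \tfrac{1}{2}(\epsilon_r + 1) \approx \tfrac{1}{2}(3.9 + 1) = 2.45 \\
\lambda_g &= \frac{300}{F \sqrt{\epsilon_{eff}}}\ \mathrm{mm}
           \approx \frac{300}{1 \times \sqrt{2.45}}\ \mathrm{mm}
           \approx 191.7\ \mathrm{mm} \\
\frac{\lambda_g}{4} &\approx 47.9\ \mathrm{mm}, \qquad
\frac{\lambda_g}{10} \approx 19.2\ \mathrm{mm}
\end{align*}
```

Even the stricter 1/10 rule gives a critical length of the order of centimetres, which is consistent with the conclusion that inductance can be neglected for on-chip interconnect at 1 GHz.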
Due to the unique advantages of laser Makelink technology, parasitic capac-
itance of the interconnect metal wires (figure 3.14) is the major concern in FPAA
routing. Parasitic resistance can also be taken into account, if needed.
Figure 3.14: Pathfinder algorithm
Imposing Bounding Constraints on Performance-driven Routing
Bounding constraints can be divided into two classes: (1) loading constraint
(to ground); (2) coupling constraint.
(1) Loading constraint: Usually the routing interconnects reside on the top
level metal layers. In many cases, the parasitic capacitance to ground, Cground, is
not a problem. But, if the metal wire travels a long distance (the net spans a
large portion of the FPAA chip), Cground can deteriorate op amp performance, such
as stability and transient response time, especially when the circuit node has a large
impedance. So, when routing a net, its accumulated parasitics are checked against
the pre-defined bound. If the bound is exceeded, the wave expansion terminates
and starts over again.
(2) Coupling constraints: for analog circuits, the coupling capacitance could be
more important than the parasitic capacitance to ground, since it usually has a much
larger value. The sub-problem can be defined as: given a set of sensitive pairs of
nets $(n_i, n_j)$ (sensitive pairs are pairs of nets between which coupling constraints
are imposed) and a set of associated bounds $C_{bound}(i,j)$, the completed routing
should satisfy $C(i,j) \le C_{bound}(i,j)$, where $C(i,j)$ is the coupling capacitance
between nets $n_i$ and $n_j$.
Capacitive coupling is present whenever two nets have segments that cross or
are parallel to each other. Thus, it can be further classified by crossover constraints
and adjacency constraints. For FPAA, the adjacency constraints are the dominant
factor because most of the capacitances induced by crossover can only occur at the
intersections of horizontal and vertical channels. A preliminary idea for imposing the
coupling constraints on the routing is as follows: when the routing of one net in the
sensitive pair is completed, the cost of those tracks that cross over it, or that are
parallel and immediately adjacent to it, is increased (some influence distance should be
set; to the first order, only the closest tracks are considered). The increase must be
larger than the regular cost due to congestion. This makes the router tend to use other
tracks, which have little or no coupling capacitance, to route the other net in the
sensitive pair. Then, net re-ordering is performed after each iteration. With this approach,
the effects of coupling constraints are effectively incorporated into the cost function.
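
The cost adjustment described above can be illustrated with a minimal Python sketch. It is not the router's actual C implementation: the track indexing by (channel, track index), the function name `penalize_coupled_tracks`, and the `influence` and `factor` parameters are all assumptions made for illustration. Only parallel, adjacent tracks within the same channel are penalized here; crossover tracks are left out for brevity.

```python
def penalize_coupled_tracks(congestion_cost, routed_tracks, influence=1, factor=2.0):
    """Return a new cost map with a coupling penalty on tracks next to a routed net.

    congestion_cost: dict mapping (channel, track_index) -> congestion cost
    routed_tracks:   set of (channel, track_index) used by the already-routed
                     net of a sensitive pair
    influence:       how many neighboring tracks on each side are penalized
    factor:          multiplier (> 1) that keeps the penalty above every
                     regular congestion cost
    """
    # The penalty must exceed the regular congestion cost, so it is scaled
    # from the largest congestion cost currently present.
    penalty = factor * max(congestion_cost.values())
    cost = dict(congestion_cost)
    for channel, index in routed_tracks:
        for offset in range(1, influence + 1):
            for neighbor in ((channel, index - offset), (channel, index + offset)):
                if neighbor in cost and neighbor not in routed_tracks:
                    cost[neighbor] = congestion_cost[neighbor] + penalty
    return cost


# Hypothetical example: one channel with four tracks plus a track in another channel.
congestion = {(0, 0): 1.0, (0, 1): 1.0, (0, 2): 1.0, (0, 3): 3.0, (1, 1): 1.0}
cost = penalize_coupled_tracks(congestion, {(0, 1)})
print(cost)
```

With the penalty in place, an unmodified congestion-driven search will tend to place the second net of the sensitive pair on tracks (0, 3) or (1, 1) rather than on the neighbors of track (0, 1).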
Imposing Matching Constraints on Performance-driven Routing
Fully differential topology is frequently used in the FPAA circuit. This results
in an additional need for the interconnect parasitics associated with appropriate
nodes or branches to nominally match, for impedance matching and noise cancel-
lation purposes. Bad matching not only reduces the CMRR but also increases the
offset voltage, or even affects proper functioning of the circuit. The matching con-
straints require: (1) For impedance matching, the capacitances to ground associated
with each matched pair of nets should be equal; (2) When a casual net (the net that
does not have any constraints) is close to a matched pair, the coupling capacitances
between that casual net and the pair of matched nets should match; (3) When two
pairs of matched nets come close to each other, it is necessary to match the direct-
coupling capacitances and cross-coupling capacitances. Besides having symmetrical
loading, this also ensures that equal levels of noise on the two nodes of one matched
pair causes the same on the other pair, if any coupling is present. The FPAA router
can employ a simple scheme to route the matched pairs. First, net ordering is performed.
Then the two nets in each matched pair are treated as a single net (called a
merged net) and routed. Another way to impose matching constraints is, after routing
one net of the matched pair, to intentionally decrease the cost of those tracks that are
symmetric to the segments of the finished net. The router will then tend to use
these "matched tracks" to finish the routing, so that matching constraints are also
effectively incorporated into the cost function.
For the current FPAA architecture, the routability-driven router is sufficient
because:
(1) The scale of the FPAA is quite small. The CAB specifications and
the targeted application speed are still well below the 100 MHz range.
(2) The pathfinder algorithm employs a strategy similar to Lee's maze algorithm,
which is used to solve the shortest path problem. Thus, although the router
developed is "congestion-only driven", it in fact not only resolves the congestion but
also tries to find the "shortest path". In other words, the accumulated parasitics
(especially the loading capacitance and series resistance) are automatically kept to
a near minimum value during the wave expansion process.
Appendix B is a brief program documentation for the FPAA router. The
output is the laser Makelink switch indices, which can be converted into physical
(x,y) coordinates on the actual layout.
Chapter 4
Configurable Analog Block
The configurable analog block, i.e., CAB, is a critical architecture building
block. The FPAA developed in this work is essentially an array of CABs. They are
connected by the surrounding interconnect network, including horizontal channels,
vertical channels, connection boxes and switch boxes, to route signals between
CABs and I/O pads. The CAB circuit and its internal arrangement strongly affect
the flexibility and functionality of the FPAA. It is always desirable to implement an
application using just one CAB or as few CABs as possible. An efficient
CAB architecture can minimize the unnecessary long external routing wires, which
contribute significantly more parasitics than the CAB internal wiring.
A CAB is usually composed of several programmable capacitor arrays (PCAs),
programmable resistor arrays (PRAs) and an op-amp-like analog core unit. In the
following section, PCA and PRA topologies are discussed first.
4.1 PCA and PRA
Generally, the most "expensive" parts in VLSI technology are not active devices
but capacitors and resistors, because these passive components occupy a large
portion of the chip area, resulting in a significant silicon real estate cost, and it is
difficult to precisely control their absolute values. However, for continuous-time mode
operation, resistors and capacitors cannot be completely removed or substituted with
active devices. They are used to realize feedback loops, signal coupling, integration,
differentiation and other analog signal processing functions. Thus, they are a must
for the proposed FPAA.
Those passive component values are obtained through the programmable capacitor
arrays (PCAs) and the programmable resistor arrays (PRAs). To minimize
area cost and increase the design flexibility, their values and arrangement inside the
CAB must be chosen with special care. In [60], the resistors in the PRA and the
capacitors in the PCA are all in parallel. Each PCA or PRA has
only two terminals. This is shown in Figure 4.1.
The drawback of this arrangement is that even if only one resistor/capacitor is used,
the rest of them are no longer usable because they share the terminals. This
waste considerably increases the chip cost because more PCAs and PRAs are
needed to increase the flexibility. Also, the way the PRA is constructed makes it
impossible to obtain a resistor value higher than 32x the unit resistance.
Figure 4.1: The resistor and capacitor arrangement inside the CAB [60] (a) PCA; (b) PRA

To remedy the above difficulties, a new PCA and PRA topology was developed,
as shown in figure 4.2. Because capacitances add when connected in parallel, the
binary-weighted capacitors in the PCAs are placed in parallel. The smallest capacitance
unit is denoted by 1x. The capacitors can be used individually, or users can pick any
number of them, or all of them, to obtain a larger desired capacitance by appropriately
programming the switches. The capacitance range achievable via the
PCA is therefore from 1x to 63x with a minimum resolution of 1x. By the same token, the
resistors in the PRA are in series. The switches SSx and SPx (x=1,2,3,4,5,6) allow
almost arbitrary connections between the resistors. For example, if switches SP1
through SP6 are all closed, the PRA essentially behaves as a metal wire (0 Ω) and
can be used to configure a unity gain buffer. If maximum resistance is desired,
one can simply close switches SS1 to SS6 and leave switches SP1 to SP6 open. If a
resistance of 10x is needed, switches other than SS2, SP3 and SS4 can be left open.
Also, each of the resistors or capacitors has its own terminals. Compared with figure
4.1, this allows re-use of the PRA and PCA.

Figure 4.2: The improved resistor and capacitor arrangement inside the CAB (a) PCA; (b) PRA
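
As an illustration of the 1x to 63x range of the binary-weighted PCA, the following Python sketch picks which unit-weighted capacitors to connect for a requested value. The function name `pca_selection` and the representation of the array as a tuple of weights are assumptions made for this example; the mapping from a selected capacitor to specific switches is not modeled.

```python
# Binary-weighted capacitors of one PCA, in multiples of the unit capacitance (1x).
PCA_WEIGHTS = (1, 2, 4, 8, 16, 32)


def pca_selection(target_units):
    """Return the capacitor weights to connect in parallel for `target_units` x.

    Parallel capacitances add, so any integer from 1x to 63x is simply the
    binary expansion of the requested value.
    """
    if not 1 <= target_units <= sum(PCA_WEIGHTS):
        raise ValueError("target must be between 1x and 63x")
    return [w for w in PCA_WEIGHTS if target_units & w]


print(pca_selection(10))   # capacitors that add up to 10x
print(pca_selection(63))   # all six capacitors in parallel
```

Because each capacitor has its own terminals in the improved topology, the capacitors left out of a selection remain available for other uses within the CAB.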
No FPAA can satisfy all application requirements. The exact unit capacitance
or resistance value should be determined by the specific range of operating frequency.
The basic rule of thumb is that this value should be large enough that the parasitic
capacitance of the transistors or interconnect wiring is negligible; at the same time,
it should not be so large that it overloads the core circuit or degrades the speed. From
an IC layout design perspective, all the resistors or capacitors should be built using the
unit cell (if the desired resistance range is too wide, the unit cell can be made of
two 2x unit value resistors in parallel).
4.2 CAB Structure
In this work, a fully differential difference amplifier (DDA) was used as the core circuit
block in the CAB (figure 4.3).

Figure 4.3: The differential difference amplifier

The number of PCAs and PRAs and their placement relative to the DDA
should be chosen with the target applications in mind. Considering the general use of
those resistors and capacitors, such as forming feedback loops or coupling components,
two pairs of resistors/capacitors would be needed. To form different types of filter
response, the capacitors should have the flexibility to be connected before or after the
resistor on the signal path. As mentioned earlier in this chapter,
whenever possible, an application should be implemented within a single
CAB because of the shorter signal traveling distance and thus faster speed. Bearing this
in mind, a Sallen-Key bandpass filter (figure 4.4), which requires fairly complex internal
wiring, was chosen as a starting point.
Figure 4.4: The Sallen-Key bandpass filter [61]

Four PCAs and four PRAs were chosen. Two pairs of them are put on the top
and bottom of the DDA, where they can be used to form feedback loops. Another two
pairs are put before the DDA inputs. These resistors and capacitors can be used in
the coupling path or in some active filter applications (Chapter 7). The overall CAB
architecture is shown in figure 4.5 [62].
The regular thin black lines are single wires. The thick red lines in figure
4.5 are "buses". Each red bus contains 6 single wires corresponding to the 6
pairs of terminals in the PCAs and PRAs. Each small square in the figure represents
a programming switch or a matrix of programming switches, depending on
the context. With this configuration, the sequence of the resistors or capacitors
appearing on the signal path can be easily adjusted by properly programming the
laser Makelink switches. Even one CAB is powerful enough to implement certain
complex analog functions, for instance a Sallen-Key low pass filter, bandpass filter,
subtractor, etc., as will be introduced in Chapter 7.
Figure 4.5: The complete CAB structure
Chapter 5
The Differential Difference Op Amp Design
Today's high density FPGAs usually feature a large number of modules and
interconnections that allow almost arbitrary configurations of combinatorial and se-
quential logic. However, due to the nature of analog system design, FPAAs typically
contain a relatively small number of CABs. The functionality that an FPAA can
implement is largely determined by the CAB circuit. Thus a good CAB internal
circuit topology not only provides more flexibility but also dramatically affects the
performance of the instantiated system.
A major choice when designing an FPAA is whether to operate it in discrete-
time or continuous-time. Discrete-time approaches are well suited for digital control,
and for low to medium resolution, they do not require on-chip tuning scheme for
VLSI implementations of the programmable components. Many discrete-time design
techniques are widely used, such as switched-capacitor circuit [63], [64], controlled
duty-cycle signal chopping and reconstruction [65], analog to digital conversion fol-
lowed by digital processing and digital to analog conversion [66], or switched-current
circuits [67]. However, such sampled-data techniques require that input signals be
band-limited to at most one half of the sampling frequency (Nyquist theorem [68]),
and hence anti-aliasing and reconstruction filters are needed. This requirement
significantly limits the bandwidth of discrete-time FPAA circuit implementations.
Continuous-time circuit techniques [69], [70], [71], [72], [73] do not require band-
limited input signals, but may need more complicated implementations to have
circuit components programmable over a large dynamic range. Continuous-time
techniques of both sub-threshold [74] and linear circuits have been used in pro-
grammable analog circuits. The sub-threshold approach, however, is difficult to
apply to a wide variety of analog circuits because of its increased sensitivity to
process variation and the parasitic effects.
Table 5.1: Comparison between continuous time and discrete time

Continuous time                        Discrete time
No pre and post filtering              Pre and post filtering
No sample and hold                     Sample and hold
Limited by op amp's bandwidth          Limited to less than 1/10th the op amp's bandwidth
Narrower component parameter range     Wider component parameter range
No clock noise                         Noise due to clock signals
Less routing                           Programmable routing for clock
Sensitive to switch nonidealities      Not sensitive to switches

As discussed in the previous chapter, a CAB usually contains some passive
component arrays (i.e., PCAs and PRAs), some interconnect switches, and an
op-amp-like unit. This unit is the core circuit building block of the FPAA. Its
functionality and performance dramatically affect the CAB and the overall system
specifications.
5.1 Op Amp Topology Selection
Like any other analog circuit, the design of an op amp is a multidimensional
problem that involves many trade-offs (figure 5.1). The choice of the topology
is highly dependent on the desired specifications. No op amp is suitable for all
application needs, because different specifications may impose conflicting
requirements on the design. For example, gain usually trades for bandwidth, and speed
usually trades for power. The op amp developed here is used as the core building
block for a general purpose FPAA, not for one specific application. Therefore
some typical op amp parameters were optimized, while others were not. Because
there is no well-defined application standard, instead of giving a set of rigorous
numbers, the following specifications of interest were proposed:
• Flexible Functionality: the op amp should be easily configurable to implement
many analog functions.
• High Gain: for better linearity and precision. The desired gain should be
larger than 80 dB.
• High Speed: desired unity-gain frequency fu ≥ 100 MHz with high slew rate
SR ≥ 100 V/µs.
• High Swing: for large dynamic range and high signal to noise ratio.
Other important specifications include fast settling, high common-mode rejection
ratio (CMRR) and power-supply rejection ratio (PSRR), large output swing (close
to rail-to-rail), good stability (phase margin PM ≥ 45°), etc. The op amp is not
meant to operate under ultra low power, low supply conditions, so power consumption
and low noise are not the major concerns for the prototype FPAA. Again, the design
was carried out on the TSMC 018 CM mixed-signal CMOS process.
Figure 5.1: Analog Design Tradeoffs
In the FPAA, all the interconnect wirings are pre-defined and fixed, so the
coupling effect and noise can be a serious issue. Naturally, when designing the op amp,
a fully differential configuration is desired because (1) it has a large output swing;
(2) the circuit is less susceptible to common-mode/coupling noise; and (3) there are no
even-order harmonics and thus better linearity [75].
There might be many ways to start the design to meet the above specifications.
It is probably easiest to start from the gain requirement. As CMOS technology
migrates to the deep submicron regime, op amp design becomes increasingly
challenging because the supply voltage and transistor channel lengths scale down with every
generation, while the threshold voltage does not scale accordingly.
The intrinsic gain of an MOS transistor can be expressed as:
\[
A_i = g_m r_o = \frac{2 L_{eff}}{V_{gs} - V_t}\left(\frac{\partial x_d}{\partial V_{ds}}\right)^{-1} = \frac{2}{V_{ov}} \cdot \frac{1}{\lambda} \qquad (5.1)
\]
where gm is the MOS transconductance, ro is the transistor output resistance, xd
is the width of the depletion region between the end of the channel and the drain,
Leff is the effective channel length, Vov is the overdrive voltage (Vgs − Vt) and λ
is the channel length modulation coefficient. As the device feature size decreases,
the effective channel length shrinks so much that the channel length modulation
effect (λ) becomes very prominent. Usually the overdrive voltage is on the order
of several hundred millivolts, and λ for short channel devices can be larger than 0.2.
Thus the intrinsic gain of a short channel MOS transistor is between 10 and 50, which
is a fairly small number. In order to increase the gain, channel lengths can be
increased to reduce λ (suppress the channel length modulation effect). However,
the achievable gain is still quite low. Also, as device size increases, the parasitic
capacitance associated with the device also increases, and the frequency response of the
device degrades. To attack this difficulty, cascoding and gain-boosting techniques
can be used. Figure 5.2 shows four candidate topologies. Figure 5.2(a) and 5.2(b)
are two simple topologies. They can increase the intrinsic gain by a factor ≈ gm ro.
But this still cannot meet the desired specification. Figure 5.2(c) and 5.2(d) use
gain-boosting technique, which can significantly boost up the intrinsic gain by a
factor of Avgmro3, where Av is the gain of the booster (i.e., the auxiliary amplifier).
They should satisfy the gain requirement at the cost of area, complexity and more
power consumption. These four amplifiers all have good speed, but due to the stack
of the cascoding devices, they have very limited output swing. Thus the dynamic
range is very small.
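
To make the numbers above concrete, the short Python sketch below evaluates the intrinsic gain of equation (5.1), Ai = 2/(Vov·λ), and the gain of a simple cascode taken as roughly (gm·ro)². The specific values Vov = 0.2 V to 0.5 V and λ = 0.2 are illustrative assumptions chosen within the ranges quoted in the text, not measured device data.

```python
import math


def intrinsic_gain(v_ov, lam):
    """Intrinsic gain gm*ro = 2 / (Vov * lambda), following equation (5.1)."""
    return 2.0 / (v_ov * lam)


def to_db(gain):
    """Convert a voltage gain to decibels."""
    return 20.0 * math.log10(gain)


# Assumed example values: overdrive of a few hundred millivolts, lambda = 0.2.
for v_ov in (0.2, 0.5):
    a_i = intrinsic_gain(v_ov, 0.2)
    # A simple cascode multiplies the intrinsic gain by roughly gm*ro again.
    a_cascode = a_i * a_i
    print(f"Vov={v_ov} V: Ai={a_i:.0f} ({to_db(a_i):.1f} dB), "
          f"cascode={a_cascode:.0f} ({to_db(a_cascode):.1f} dB)")
```

Under these assumptions, even the cascoded single stage stays below the 80 dB target, which is consistent with the conclusion that cascoding alone cannot meet the gain specification.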
Figure 5.2: Four single stage amplifier topologies (a) Telescopic; (b) Folded-cascode; (c) Gain-boosted telescopic; (d) Gain-boosted folded-cascode
To increase the open-loop gain and, at the same time, provide a large output
swing, a multi-stage topology may be employed. Although adding a third stage
can improve the gain, the drawbacks are obvious: (1) it dissipates more power;
(2) the third stage introduces at least one more pole, which usually makes the op amp
difficult to compensate and therefore deteriorates the overall frequency response.
On the other hand, there are several advantages of a two-stage topology. Firstly,
with appropriate design, a two-stage configuration can balance the gain and bandwidth
tradeoff well. Secondly, in a typical two-stage op amp, the noise of the second stage is
attenuated by the gain of the first stage when it is referred back to the inputs. Thus the
noise of a two-stage amplifier is comparable to that of a single stage amplifier.
Thirdly, the second stage or output stage can
be designed to source and sink large currents (push-pull) to improve the slew rate.
These benefits suggest that the tradeoffs among gain, noise, bandwidth and output
swing can be significantly mitigated by employing a two-stage topology. It should be
noted though, the traditional, simple two-stage topology is not sufficient due to its
limited gain (below 70dB) in the deep submicron regime. The cascoding structure
was adopted in this design. The gain-boosting technique was not used, because the
op amp appears in every CAB of the FPAA and the gain-boosting topology will add
significant area cost and power consumption to the system.
Input Stage On the TSMC018 CM process, the supply voltage is 3.3V. This
provides good voltage headroom. For the first stage, a topology that allows for
high gain, low noise and low power consumption is desired. Here, output swing is
less important since high swing can be obtained in the second stage. As discussed in
the previous section, the open-loop gain of a regular two-stage amplifier is fairly small
in deep submicron CMOS technology (below 70dB). The gain-boosting technique
should not be used, because it takes up more area and adds a significant amount of
power consumption to the FPAA. To increase the gain, the cascoding technique can be
adopted. The available options are folded-cascode and telescopic structures (figure
5.2). Telescopic structure has slightly higher gain and better frequency response
because the second dominant pole of the folded-cascode structure is closer to the
origin. When further comparing these two topologies, it should be noticed that,
to minimize power dissipation, the number of current legs in the amplifier must be
minimized. This favors telescopic topology compared to folded cascode. Also, in a
two-stage amplifier, noise is dominated by the first high gain stage. This means the
input devices and the active loads will contribute significant amplifier noise. The
folded cascode has more devices in the signal path, which contribute more noise.
Therefore a telescopic first stage will be a better choice. In this work, a novel fully
balanced, telescopic differential difference input stage is used as the first high gain
stage.
Output Stage For the second stage, the main concern in selecting the appropriate
configuration is the output swing and its driving capability. Comparing the
class A and class AB output stages, the latter allows for a smaller standby biasing
current while still being able to source and sink large currents during dynamic
transitions. These advantages make the class AB configuration outperform class A. In this
design, a common-source class AB output stage was employed. It allows a nearly
rail-to-rail output swing, i.e., to within one overdrive voltage of the supply rails, and low
power consumption.

Figure 5.3: The DDA conceptual block diagram (a) symbol; (b) block diagram
5.2 Design of the Differential Difference Op Amp
In this work, a novel differential difference amplifier (DDA) was developed
(figure 5.3). Using the notations similar to those in [75], the signal variables of
DDA are described by four vectors:
\[
\begin{pmatrix} V_{id} \\ V_{cy} \\ V_{cx} \\ V_{cd} \end{pmatrix}
=
\begin{pmatrix}
1 & -1 & -1 & 1 \\
1/2 & 1/2 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 \\
1/2 & -1/2 & 1/2 & -1/2
\end{pmatrix}
\begin{pmatrix} V_{yp} \\ V_{yn} \\ V_{xp} \\ V_{xn} \end{pmatrix}
\qquad (5.2)
\]
The differential voltage Vid = (vyp − vyn) − (vxp − vxn) is what needs to be amplified.
The other three vector components are common-mode voltages and usually should
not be amplified. Ideally, the DDA amplifies the differential voltage vD = Vid by a nearly
infinite amount and fully suppresses all common-mode voltages:
\[
v_o = A_0\left[(v_{yp} - v_{yn}) - (v_{xp} - v_{xn})\right] \qquad (5.3)
\]
where A0 is the open-loop gain. When negative feedback is applied and A0 → ∞,
vyp − vyn = vxp − vxn. As the open-loop gain decreases, the difference between the
two differential voltages increases. Therefore, the open-loop gain is required to be
as large as possible in order to improve the performance.
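
The transformation of equation (5.2) can be checked numerically with a few lines of Python. This is only an illustrative sketch: the function name `dda_signals` and the input voltages in the example are assumptions, chosen so that the two port differences are equal, as they would be under ideal negative feedback.

```python
# Rows of the matrix in equation (5.2); columns are (Vyp, Vyn, Vxp, Vxn).
DDA_MATRIX = (
    (1.0, -1.0, -1.0, 1.0),   # Vid: difference of the two port differences
    (0.5, 0.5, 0.0, 0.0),     # Vcy: common mode of the Y port
    (0.0, 0.0, 0.5, 0.5),     # Vcx: common mode of the X port
    (0.5, -0.5, 0.5, -0.5),   # Vcd: mean of the two port differences
)


def dda_signals(v_yp, v_yn, v_xp, v_xn):
    """Return (Vid, Vcy, Vcx, Vcd) for the four DDA input voltages."""
    inputs = (v_yp, v_yn, v_xp, v_xn)
    return tuple(sum(c * v for c, v in zip(row, inputs)) for row in DDA_MATRIX)


# Assumed example: both ports carry the same 0.5 V difference.
v_id, v_cy, v_cx, v_cd = dda_signals(1.5, 1.0, 1.25, 0.75)
print(v_id, v_cy, v_cx, v_cd)
```

In this example Vid is zero while the three common-mode components are not, so an ideal DDA following equation (5.3) would produce no output change from these inputs.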
The output of the non-ideal op amp with the parameters of its linear model
can be characterized as:
\[
V_o = A_d\left[(v_D - V_{off}) + \frac{1}{CMRR_y}(v_{cy} - V_{cy0}) + \frac{1}{CMRR_x}(v_{cx} - V_{cx0}) + \frac{1}{CMRR_d}(v_{cd} - V_{cd0})\right] \qquad (5.4)
\]
where Ad is the open-loop gain, Voff is the offset voltage, and CMRRy and CMRRx are the
Y port and X port common-mode rejection ratios. CMRRd is a new parameter that
is not available for general two-input op amps. It measures the effect of equal floating
voltages at the two input ports. The nonlinear function is linearized around the
biasing points vo = VCM0, vyp = Vyp0, vxp = Vxp0, vcd = Vcd0. This equation indicates
that, to improve the common-mode rejection ratio, not only should the transistors within
the X or Y port be matched, but the two differential pairs should also match each
other (to improve CMRRd). Thus a well-planned layout arrangement is crucial to
the amplifier's performance.
Input Stage
Figure 5.4 shows the schematic of the fully differential input stage. NMOS
transistors are faster than PMOS transistors due to the higher mobility of electrons
compared with holes. As a result, amplifiers with all NMOS transistors on the signal
paths will have higher speed than their PMOS counterparts¹. Therefore NMOS
transistors were used in the two differential pairs. Transistors M11 and M12 are the
tail current sources. Transistors M1 to M4 form the two differential pairs, which
convert the differential input voltages into currents. The two differential pairs are
drain cross-coupled. Transistors M21 and M22 are cascoding devices. Cascoded current
sources were used as active loads, where M5 and M6 are the cascoded current source
loads. They convert the differential currents into two differential output voltages,
which are the inputs to the second stage. Since the two outputs Vout1_L and Vout1_R
are high impedance nodes, a small mismatch between the currents through M11
and M12 and the currents of the top current sources through M21 and M22 will result in a
large drift from the desired common-mode output voltages. Thus a common-mode
feedback (CMFB) block must be employed. Here, transistors M5C and M6C
are used to adjust the common-mode output voltage of the first stage. Their gates
are controlled by the common-mode feedback signal. Voltages Vbp, Vb2, Vb1 and Vbn
are used to properly bias the telescopic stage. A supply independent, high swing
cascoding biasing block was used to generate these voltages. This biasing block
will be discussed later. One drawback of this input stage topology is its limited
common-mode input range.

Figure 5.4: The input stage of the DDA

¹For some low noise applications, it would be beneficial to use PMOS transistors as the input
pair because of their smaller 1/f noise. However, this should not be a major concern for this general
purpose FPAA.
The outputs of the first stage can be expressed as:
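\[
V_{out1} = V_{out1\_L} - V_{out1\_R} = A_1\left[(V_{yp} - V_{yn}) - (V_{xp} - V_{xn})\right] \qquad (5.5)
\]
where A1 is the small signal voltage gain of the input stage, as shown below:
\[
A_1 = g_{m1,2,3,4}\,(r_{o,up} \parallel r_{o,down}) \qquad (5.6)
\]
\[
r_{o,up} = r_{o21} + \left[1 + (g_{m21} + g_{mb21})\,r_{o21}\right](r_{o5} \parallel r_{o5c}) \qquad (5.7)
\]
\[
r_{o,down} = r_{o,down1} \parallel r_{o,down2} \qquad (5.8)
\]
\[
r_{o,down1} = r_{o1\_1} + \left[1 + (g_{m1\_1} + g_{mb1\_1})\,r_{o1\_1}\right] r_{o1} \qquad (5.9)
\]
\[
r_{o,down2} = r_{o3\_3} + \left[1 + (g_{m3\_3} + g_{mb3\_3})\,r_{o3\_3}\right] r_{o3} \qquad (5.10)
\]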
In the above equations, gm,x, gmb,x and ro,x are the transconductance, body
transconductance and output resistance of transistor x, respectively. The input stage
incorporates cascoding devices (M5-M21, M1-M1_1, etc.) with the two differential pairs.
It takes advantage of the telescopic structure, namely high gain and excellent
frequency response. This stage alone is able to provide a DC gain of approximately
60dB.
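
The structure of equations (5.6) to (5.10) can be sketched in Python as below. All device parameters in the example (ro = 50 kΩ, gm = 1 mS, gmb = 0.2 mS for every transistor) are hypothetical placeholders chosen only to exercise the formulas; they are not values from this design, and the helper names are likewise assumptions.

```python
import math


def parallel(r_a, r_b):
    """Resistance of two resistors in parallel (the // operator in the text)."""
    return r_a * r_b / (r_a + r_b)


def cascode_resistance(ro_casc, gm_casc, gmb_casc, r_below):
    """Resistance looking into a cascode device stacked on r_below.

    Same form as equations (5.7), (5.9) and (5.10):
    ro + [1 + (gm + gmb) * ro] * r_below
    """
    return ro_casc + (1.0 + (gm_casc + gmb_casc) * ro_casc) * r_below


def first_stage_gain(gm_in, ro_up, ro_down1, ro_down2):
    """A1 = gm * (ro_up // ro_down), with ro_down = ro_down1 // ro_down2."""
    ro_down = parallel(ro_down1, ro_down2)
    return gm_in * parallel(ro_up, ro_down)


# Hypothetical, identical small-signal parameters for every device.
RO, GM, GMB = 50e3, 1e-3, 0.2e-3
ro_up = cascode_resistance(RO, GM, GMB, parallel(RO, RO))   # M21 over M5 // M5C
ro_down1 = cascode_resistance(RO, GM, GMB, RO)              # M1_1 over M1
ro_down2 = cascode_resistance(RO, GM, GMB, RO)              # M3_3 over M3
a1 = first_stage_gain(GM, ro_up, ro_down1, ro_down2)
print(f"A1 = {a1:.0f} V/V = {20.0 * math.log10(a1):.1f} dB")
```

The sketch shows why the two cross-coupled differential pairs matter for the gain: their cascoded output resistances appear in parallel at each output node, which halves ro,down relative to a single pair.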
Output Stage The output stage is also a fully differential structure. It consists
of two identical common-source class AB output stages (M7-M10 and M17-M20)
[76], which can provide large output current driving capability with relatively
low standby power consumption. The common-source topology ensures a large output
swing, to within about one Vov of the supply rails. The common-mode
output voltage of the first stage is not the same as the DC bias point required by the
second stage input; this problem is solved through the use of the voltage level-shifting
transistors M15 and M16, which properly bias both stages. Since
transistors M7 and M8 are set to have the same size, to the first order they carry
the same current, and thus they have the same gate-source voltages (i.e., Vgs7 = Vgs8).
\[
V_{sg9} + V_{sg7} + V_{sg10} = V_{DD} \qquad (5.11)
\]
\[
V_{sg15} + V_{sg16} + V_{sg7} = V_{DD} \qquad (5.12)
\]
From equations (5.11) and (5.12), it can be derived that:
\[
V_{sg9} + V_{sg10} = V_{sg15} + V_{sg16} \qquad (5.13)
\]

Figure 5.5: The output stage of the DDA

To the first order, the current through transistors M9 and M10 equals the current
flowing through transistors M15 and M16. When the sizes of transistors M7, M8,
M9 and M10 are properly defined, the gate voltage of transistor M10 is
slightly larger than its threshold voltage. In other words, during standby mode, the
current through M9 and M10 can be set by the current source ISB, which is
usually a very small static current. Thus the static/standby power consumption
can be kept to a minimum. However, when the circuit is in normal operation, either
transistor M9 or M10 or both will work in the saturation region, or one is fully turned
on while the other is turned off (depending on the signal level at the outputs of the first
stage). The common-source configured transistors M9 and M10 then supply a large
current to drive the capacitive load.
The small signal gain of the second stage can be expressed as:
A2 = (gm9 +gm10)(ro9//ro10) (5.14)
In order to properly set the output common-mode voltage, the output voltages are
sensed and compared with the reference voltage. The generated control voltage is
fed back to adjust the output voltages of the first stage. Consequently, the common-
mode voltages of the second stage are adjusted.
Compensation for the Op Amp
Compensation is required to maintain stability of most amplifiers when they
are configured in some form of feedback loop. For this two-stage op amp, the domi-
nant pole is at node Vout1, and the first nondominant pole is at the final output node
Vout. The traditional Miller compensation scheme places a pole-splitting capacitor
between the final output of the amplifier and the output of the first stage of the
amplifier. This has the effect of creating a low frequency, dominant pole and moving
the second pole to a higher frequency which will ensure amplifier stability. Due to
the relatively small gm of MOS transistors, a right-half-plane (RHP) zero lies close to
the non-dominant poles [75]. This causes a stability problem. Several methods have
been developed to eliminate this undesired RHP zero such as adding a follower stage
at the cost of more power consumption and complexity. A simpler method would be
using a nulling resistor (which is usually implemented by a MOS transistor) to can-
cel the zero. However, this requires an extra biasing voltage to adjust the effective
resistance of the nulling transistor. Thus, an alternative method, cascode compen-
sation scheme was used in this design. It creates a dominant pole and two complex
poles at higher frequency by placing a compensation capacitor between the amplifier
output and first stage cascode node. This will also ensure amplifier stability when it
is placed in a feedback loop. Although both compensation schemes ensure stability,
the cascode compensation scheme improves the speed of the amplifier as compared
to the standard Miller compensation method [77].
Common-Mode-Feedback (CMFB) Block The current of the current
source loads on the top is essentially set by the gate biasing, while the current in the
bottom half of the circuit is set by the tail current source. Because of the high impedance
of the cascoded structure, a slight current mismatch will result in a large common-mode
variation. Therefore, a CMFB block is necessary for all fully differential amplifiers
with active loads. A switched-capacitor based CMFB scheme was not considered
here because the op amp was designed to operate in continuous-time mode. Figure 5.6
shows a widely used scheme that uses transistors only [78], [79]. This scheme does not
resistively load the op amp outputs, but the source-coupled pairs MC11 and MC22
capacitively load the op amp outputs. More importantly, the proper operation of
this CMFB block requires MC11, MC12, MC21 and MC22 to remain on during the
entire output swing. As a result, the output swing is limited. Since dynamic range
is an important parameter for this op amp, the CMFB block shown in figure 5.7 was
used. The common-mode output voltage Voc = (Vop + Von)/2 is sensed by the resistors,
and this value is compared with the desired common-mode reference voltage Vcmrf.
The amplified difference is sent back to control the gate bias in the first stage, which
in turn adjusts the output CM voltage until it is equal to Vcmrf. The CM sensing
resistors, together with the input capacitance of the CM sense amplifier differential pair,
introduce a pole in the CMFB loop. This degrades the CMFB loop gain at higher
frequencies. The two capacitors in parallel with the resistors are used to introduce
a left-half-plane zero to slow down the gain drop, so the CMFB still functions at fairly
high frequencies. Although this scheme may resistively load the amplifier, the gain is
already high enough to meet the requirement.

Figure 5.6: A transistors-only common-mode feedback circuit
Figure 5.7: The common-mode feedback circuit used in this design
The Supply Independent, High-Swing Biasing Block
The Supply Independent Biasing Block In an op amp and many other analog
circuits it is usually desirable that the quiescent conditions be stabilized with respect
to variations in the supply voltage. Reducing the sensitivity of current sources to
changes in the supply voltage is essential to making circuits immune to noise on
the supply. Generally, the current source is directly derived from the power supply
VDD. To reduce the sensitivity of the current source to VDD, a voltage other than
the power supply must be used as the reference. In this work, a Vth referenced
supply independent biasing circuit was used, as shown in figure 5.8. The current is
set by VGS1/R, which suggests that the current is independent of the power supply.
However, due to the finite output resistance of MOS transistors, the current will
still depend on VDD, although far less strongly than a current set directly by the
power supply. It should be noted that there are two possible operating points, as
shown in figure 5.8 (a). Point B, where only leakage currents flow, should normally
be unstable. However, in practical circuits the transistor current gains degrade at
very low currents. As a result, B may be a stable operating point. Therefore, a
start-up circuit was used to ensure that ID2 ≠ 0. When the circuit is stuck at point
B, transistor M5 pulls up the gate voltage of M2 until the circuit returns to the
normal operating point A. At this point, the gate-source voltage of M5 is much
smaller than the threshold (due to the two stacked diode-connected NMOS
transistors on the left) and it turns off.
Figure 5.8: The Vth referenced biasing block (a) two possible operating points (b)
the complete biasing block with a startup circuit.
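The current source ID2 is given as:
\[ I_{D2} = \frac{V_{GS1}}{R} \tag{5.15} \]
Since VGS1 = Vth1 + Vov1, the temperature dependence of the current source is:
\[
\begin{aligned}
TC_{Iref} &= \frac{1}{I_{D2}} \frac{\partial I_{D2}}{\partial T} \\
          &= \frac{1}{V_{GS1}} \left( \frac{\partial V_{th1}}{\partial T} + \frac{\partial V_{ov1}}{\partial T} \right) - \frac{1}{R} \frac{\partial R}{\partial T} \\
          &= \frac{V_{th1}}{V_{GS1}} TC_{Vth1} + \frac{V_{ov1}}{V_{GS1}} TC_{Vov1} - TC_{R}
\end{aligned}
\tag{5.16}
\]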
It is well known that the MOS transistor threshold voltage has a negative TC [75].
The overdrive voltage also has a negative TC, which is primarily due to the negative
temperature dependence of the electron/hole mobility. By properly choosing a type
of resistor with a certain amount of negative TC on this specific CMOS process,
the temperature coefficient of the source current can be minimized. The simulation
result shows an overall TCIref of approximately 240ppm/°C.
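The cancellation expressed by equation (5.16) can be checked numerically. The following is a minimal sketch, not the design flow used in this work: the function names and all numeric values are hypothetical and were chosen only to illustrate how a negative resistor TC offsets the negative TCs of Vth1 and Vov1.

```python
def tc_iref(vth, vov, tc_vth, tc_vov, tc_r):
    """Fractional TC of I = VGS1 / R, following equation (5.16).

    tc_vth, tc_vov and tc_r are fractional coefficients in 1/degC,
    e.g. (1/Vth) * dVth/dT.
    """
    vgs = vth + vov
    return (vth / vgs) * tc_vth + (vov / vgs) * tc_vov - tc_r


def tc_r_for_zero_tc(vth, vov, tc_vth, tc_vov):
    """Resistor TC that nulls the first-order TC of the current."""
    vgs = vth + vov
    return (vth / vgs) * tc_vth + (vov / vgs) * tc_vov


if __name__ == "__main__":
    # Hypothetical device values, not taken from the process data.
    vth, vov = 0.5, 0.2                    # volts
    tc_vth, tc_vov = -2000e-6, -1000e-6    # 1/degC
    ideal = tc_r_for_zero_tc(vth, vov, tc_vth, tc_vov)
    print(f"resistor TC for cancellation: {ideal * 1e6:.0f} ppm/degC")
    residual = tc_iref(vth, vov, tc_vth, tc_vov, -1500e-6)
    print(f"residual TC with a -1500 ppm/degC resistor: {residual * 1e6:.0f} ppm/degC")
```

In practice only a few resistor types are available on a given process, so the residual TC is set by how close the nearest available resistor TC comes to the ideal value.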
The High Swing Biasing Block The cascoded current source is a useful
structure that is widely used as an active load in many analog circuits. The easiest
way to bias the cascoded current mirrors is shown in figure 5.9.
Figure 5.9: Biasing the cascoded current mirror
To keep transistors M3 and M2 in saturation, the minimum voltage of nodes
P and Y should satisfy:
\[ V_{P,min} = V_{GS1} + V_{GS0} - V_{th3} = V_{th} + 2V_{ov} \tag{5.17} \]
\[ V_{Y} = V_{GS1} + V_{GS0} - V_{GS3} = V_{th} + V_{ov} \tag{5.18} \]
This means that keeping both M3 and M2 in saturation carries a high voltage
overhead, because ideally only 2Vov is needed. This VP voltage will limit the
amplifier swing in the first stage. To reduce the overhead, the voltage at node Y
should be minimized to a value slightly higher than Vov. A high swing biasing block
called the Sooch cascode current mirror, similar to figure 5.10 [75], was used in this
design. Transistor M5 is deliberately set to operate in the triode region. If all the
transistors have the same aspect ratio W/L, then when M5 is sized as (1/3)(W/L),
the gate voltage would be Vth + Vov. The drain voltage of M1 is about Vov. As a
result, one Vth is saved for the swing. The transistor M4 here also operates in the
saturation region. This ensures that M3 and M1 have the same drain-source voltage
and improves the current matching. In the actual design, the aspect ratio of M5
was smaller than (1/3)(W/L) to leave some room for M1 and M2 to stay in the
saturation region.
Figure 5.10: The high swing Biasing block
The Complete DDA Circuit The complete amplifier schematic, including
the amplifier core and the whole biasing block, is shown in figures 5.11 and 5.12.
Figure 5.11: The amplifier core
Figure 5.12: The complete biasing block
When designing this amplifier, the first thing determined was the tail current of
the two differential pairs. Based on previous design experience, the compensation
capacitance is in the range of a few hundred femtofarads to a picofarad. To achieve
a slew rate higher than 100V/us, a current of approximately 100uA was picked for
each tail current source. Because the drains of the two differential pairs are
cross-coupled, this guarantees a slew rate of at least 200V/us (see footnote 2). This
relatively high biasing current also improves the bandwidth and settling time of the
amplifier.
Footnote 2: If the capacitance at the amplifier's final output is comparable to or
larger than the compensation capacitance, it can be charged or discharged through
the class AB output stage, which essentially does not have a slew limitation.
As for the bandwidth, consider the worst case scenario with a compensation
capacitance of 1pF. When the amplifier is Miller compensated with phase margin
PM ≥ 45°, there is only one dominant pole. Thus it can be modeled as a first order
system with a unity gain bandwidth (or gain-bandwidth product GBW) of
gm/(2πCc) = Itail/(Vov 2πCc). Putting in the numbers, the estimated bandwidth is
about 110MHz, which certainly meets the desired specification.
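The two estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the overdrive voltage is not given in the text, so the value of 0.145V used below is hypothetical and was picked only because it lands near the quoted 110MHz; the function names are likewise illustrative.

```python
import math


def slew_rate_v_per_us(i_tail, c_c):
    """Slew rate of one differential pair charging Cc, in V/us."""
    return i_tail / c_c / 1e6


def gbw_hz(i_tail, v_ov, c_c):
    """First-order GBW = gm / (2*pi*Cc), with gm = Itail / Vov.

    gm = 2*Id/Vov for a square-law device and Id = Itail/2.
    """
    gm = i_tail / v_ov
    return gm / (2.0 * math.pi * c_c)


if __name__ == "__main__":
    i_tail = 100e-6   # A, from the text
    c_c = 1e-12       # F, worst case from the text
    v_ov = 0.145      # V, assumed; not stated in the text
    print(f"SR per pair: {slew_rate_v_per_us(i_tail, c_c):.0f} V/us")
    print(f"GBW: {gbw_hz(i_tail, v_ov, c_c) / 1e6:.1f} MHz")
```

With both pairs contributing, the slew rate doubles, which is consistent with the 200V/us figure quoted for the cross-coupled arrangement.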
Using laser Makelink, the first stage may be used alone as a single stage
amplifier or OTA (because there is only one high impedance node at its output).
To get adequate swing headroom, the sum of the overdrive voltages of the five
stacked transistors was chosen to be half of the supply voltage. The four PMOS
transistors M5, M6, M23 and M24 were allocated a higher overdrive of ≈ 400mV.
They also have a longer channel length. These two arrangements improve the overall
amplifier performance because (1) both threshold voltage and transconductance
parameter mismatches are inversely proportional to the square root of the transistor
area [80], so a longer channel length reduces the mismatch between current mirrors,
and a large overdrive minimizes the effect of the mismatch; (2) these four PMOS
transistors are the major noise contributors other than the four input devices. Since
they do not capacitively load the signal path, increasing their size will reduce the
noise without affecting the amplifier bandwidth. The four NMOS cascoding devices
and the four PMOS cascoding devices directly contribute to the dominant pole of
the amplifier, so they have a shorter channel length. Since the speed/bandwidth is
a major concern, the two differential pairs also have a shorter channel length, at
the cost of higher offset and noise. The transistors in the output stage also have a
shorter length but a large aspect ratio. This improves the amplifier's frequency
response.
The Nonidealities and Layout Considerations For any fully differential
topology, the mismatch between transistors always has a critical effect on the circuit
performance, especially on CMRR and offset voltage. The mismatch between the
transistors inside the same differential pair has been analyzed extensively in many
texts [75]. The emphasis here is on the effect of the mismatch between the two
differential pairs, CMRRd. For simplicity, it is assumed that the transistors within
the same differential pair are nominally matched.
Using the notation of equation (5.2), that equation can be written in a more
general form:
\[ v_o = A_{V2} \left[ f_y(v_{yp} - v_{yn}) - f_x(v_{xp} - v_{xn}) \right] \tag{5.19} \]
or:
\[ v_o = A_{V2} \left[ f_y\!\left(v_{cd} + \frac{v_{id}}{2}\right) - f_x\!\left(v_{cd} - \frac{v_{id}}{2}\right) \right] \tag{5.20} \]
Here AV2 is the gain of the second stage, and I = f(x) represents the voltage to
current transfer function of the input stage times the output resistance. Ideally,
when vid is zero, the output should be zero (AC ground).
The current Id in each differential branch satisfies:
\[
I_d =
\begin{cases}
-I_{tail} & \text{for } V_d \le -\sqrt{2 I_{tail}/\beta} \\
\frac{1}{2} \beta V_d \sqrt{\frac{4 I_{tail}}{\beta} - V_d^2} & \text{for } |V_d| \le \sqrt{2 I_{tail}/\beta} \\
I_{tail} & \text{for } V_d \ge \sqrt{2 I_{tail}/\beta}
\end{cases}
\tag{5.21}
\]
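The piecewise characteristic of equation (5.21) is easy to tabulate. The sketch below assumes the standard square-law expression for the differential output current of a source-coupled pair, which saturates at ±Itail once |Vd| exceeds sqrt(2 Itail/β); the function name and the numeric values are illustrative only.

```python
import math


def diff_pair_current(v_d, i_tail, beta):
    """Square-law differential current of a source-coupled pair.

    v_d    : differential input voltage (V)
    i_tail : tail current (A)
    beta   : transconductance parameter mu*Cox*W/L (A/V^2)
    """
    v_max = math.sqrt(2.0 * i_tail / beta)
    if v_d <= -v_max:
        return -i_tail          # fully steered to one side
    if v_d >= v_max:
        return i_tail           # fully steered to the other side
    return 0.5 * beta * v_d * math.sqrt(4.0 * i_tail / beta - v_d ** 2)


if __name__ == "__main__":
    # Hypothetical values: 100 uA tail current, beta = 1 mA/V^2.
    for v in (-0.6, -0.2, 0.0, 0.2, 0.6):
        i = diff_pair_current(v, 100e-6, 1e-3)
        print(f"Vd = {v:+.1f} V -> Id = {i * 1e6:+.1f} uA")
```

The expression is continuous at the breakpoints, since the middle branch evaluates to exactly ±Itail at |Vd| = sqrt(2 Itail/β).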
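Combining equations (5.20) and (5.21), the first stage differential gain AV1 is:
\[ A_{V1} = \left. \frac{\partial v_o}{\partial v_{id}} \right|_{v_{cd} = V_{CD0},\; v_{id} \to 0} \tag{5.22} \]
\[ \phantom{A_{V1}} = A_{V2} \sqrt{\beta I_d} \; \frac{1 - \frac{\beta}{I_d} V_{CD0}^2}{2 \sqrt{1 - \frac{\beta}{I_d} \frac{V_{CD0}^2}{2}}} \tag{5.23} \]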
where β is the MOS transistor transconductance parameter. The gain is highest
when the common-mode voltage of the two differential voltages, VCD0, is zero. If
βX and βY are the transconductance parameters of the transistors within the X
and Y ports, and they are not identical, then according to the definition
CMRR = Adm/Acm-dm (Adm is the differential gain, while Acm-dm is the
common-mode to differential gain), with some approximation:
\[ CMRR_{d,stage1} \approx \frac{1}{1 - \sqrt{\beta_Y / \beta_X}} \tag{5.24} \]
and the offset voltage between the two ports is:
\[ V_{off} = \left( \sqrt{\beta_Y / \beta_X} - 1 \right) \cdot V_{CD0} \tag{5.25} \]
Similarly, the second stage CMRR can be derived as:
\[ CMRR_{d,stage2} \approx \frac{1}{1 - \dfrac{\sqrt{\beta_N'} + \sqrt{\beta_P'}}{\sqrt{\beta_N} + \sqrt{\beta_P}}} \tag{5.26} \]
So the overall CMRR of this op amp is:
\[ CMRR_{d} \approx \frac{1}{1 - \sqrt{\beta_Y / \beta_X}} \cdot \frac{1}{1 - \dfrac{\sqrt{\beta_N'} + \sqrt{\beta_P'}}{\sqrt{\beta_N} + \sqrt{\beta_P}}} \tag{5.27} \]
The above equations show that β mismatch reduces the common mode rejection
ratio and increases the offset voltage. Since there are two tail currents in the first
stage, the mismatch between them also has a negative effect. The CMRR and
offset of the first stage due to tail current mismatch are:
\[ CMRR_{d1} \approx \frac{1}{1 - \frac{I_{tail1}}{I_{tail2}}} \cdot \frac{\left( 2 - \frac{\beta}{I_d} V_{CD0}^2 \right)^2}{2 + \frac{\beta}{I_d} V_{CD0}^2} \tag{5.28} \]
\[ V_{off} \approx \left( \frac{I_{tail1}}{I_{tail2}} - 1 \right) \cdot \frac{V_{CD0}}{2 - \frac{\beta}{I_d} V_{CD0}^2} \tag{5.29} \]
These errors could be due to (1) lithography and etching induced geometry
mismatch, which results in device sizes deviating from their ideal values; (2) process
induced mismatches, which result in variations of the threshold voltage, gate oxide
thickness, carrier mobility, etc. Because the amplifiers exist in a fixed, pre-defined
array based architecture, crosstalk and other noise sources are inevitable. Thus
minimizing these negative effects and improving CMRR is crucial to the overall
system performance. This can be achieved by careful layout design.
The critical matching components include the four NMOS transistors of the
two differential pairs and the two tail current sources. Each of the four transistors
in the differential pairs was split into four parts and arranged as

    A B C D D C B A
    D C B A A B C D

The two tail current sources also use interdigitated structures and have a
similar common-centroid arrangement. Dummy devices were used on the sides of the
block to ensure that the active devices have the same percentage of over/under
etching. Plenty of substrate contacts were placed around the block to ensure that
the devices have an even ground potential. They also function as a guard ring to
reduce the substrate coupling. The above layout techniques help to average out the
process variations across the area and improve the geometry matching. Figures 5.13
and 5.14 show the layout of the input and output stages, respectively. The same
strategy was used extensively throughout the amplifier layout. Figure 5.15 is the
final layout of the complete amplifier.
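The common-centroid property of the two-row arrangement given above can be verified by computing the centroid of each device's segments. The following is a small illustrative check, assuming unit-pitch placement of equal-size fingers; the helper name is hypothetical.

```python
# Two-row placement of the four input devices, as listed in the text.
LAYOUT = [
    "A B C D D C B A".split(),
    "D C B A A B C D".split(),
]


def centroids(layout):
    """Return {device: (mean column, mean row)} over all its segments."""
    sums = {}
    for row, devices in enumerate(layout):
        for col, name in enumerate(devices):
            sx, sy, n = sums.get(name, (0.0, 0.0, 0))
            sums[name] = (sx + col, sy + row, n + 1)
    return {name: (sx / n, sy / n) for name, (sx, sy, n) in sums.items()}


if __name__ == "__main__":
    for name, xy in sorted(centroids(LAYOUT).items()):
        print(name, xy)
```

All four devices share the same centroid, so a linear process gradient in either direction affects them equally to first order.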
Figure 5.13: The fully differential input stage
Figure 5.14: The class AB output stage
Figure 5.15: The complete amplifier layout
5.3 Results and Discussion
As mentioned previously, the start-up of the supply independent biasing block
must be carefully analyzed. The supply voltage was DC swept from zero to VDD
(to exclude false start-up caused by parasitic capacitance) and was also ramped in
a transient test [81], as shown in figure 5.16. Both simulation and experimental
results confirm the proper start-up of this block.
Figure 5.17 shows the biasing current as a function of temperature. With the
careful selection of a P+ poly resistor with an appropriate negative TC, the TC
of this current source is about 236ppm/°C, which is about an order of magnitude
better than the general specification (≈ 2000ppm/°C [81]).
The open-loop frequency response (parasitic-extracted post-layout simulation)
is shown in figure 5.18. The unity gain bandwidth is 641.2MHz, well above
100MHz, with a 62° phase margin at the nominal on-chip capacitance. Even
with a 1.5pF capacitive load, it still provides an adequate phase margin of 45°.
When the devices are nominally matched, the CMRR and PSRR are extremely
high for this fully differential amplifier. But this is an unrealistic condition
which will never occur. In practice the actual rejection ratio is always measured,
either explicitly or implicitly, with a certain offset present. Figure 5.19 shows
CMRRd as a function of frequency. Clearly the offset voltage has a big impact on
this figure of merit. In contrast, the amplifier still shows a very good power supply
rejection ratio PSRR+ even with a relatively large offset of 15mV. This benefits
from the cascode compensation scheme (the cascoding structure and less noise
feeding through Cc) and from the supply insensitive biasing, as introduced in
the previous section.
Figure 5.16: Supply independent biasing block at start-up (a) DC sweep, Isource
(uA) vs. supply voltage (V); (b) transient response, Isource (uA) and power supply
(V) vs. time (us)
Figure 5.17: Temperature sweep of the supply independent biasing block (Isource
(uA) vs. temperature (C))
The amplifier's large signal and small signal transient responses should also be
carefully examined. They are important specifications for op amps used in data
converters. Figures 5.21 and 5.22 show the amplifier's large signal and small signal
step responses, respectively. With a typical on-chip capacitive load, the slew rate
SR is about 723V/us, and the settling time to within 0.1% accuracy is 8.8ns.
The amplifier can be configured to implement different gains by using proper
feedback. Figure 5.23 demonstrates gains of 1, 2, 4, 8 and 16 as a function of
frequency.
Finally, figure 5.24 shows the input referred noise of this amplifier. At 1KHz,
the equivalent input noise voltage is about 2.1uV/sqrt(Hz). For very low noise
applications, the amplifier can be further modified to improve its noise performance
at the cost of area and more power consumption.
Figure 5.18: Open-loop frequency response (nominal on-chip capacitive load; plot
annotations: GBW = 641.2MHz, PM = 62.0°, DC gain = 90.5dB, open-loop 3dB
frequency = 11.7kHz)
Figure 5.19: Common mode rejection ratio vs. frequency (Vcm = 1.65V, T = 27C)
Figure 5.20: Power supply rejection ratio vs. frequency (Vcm = 1.65V, T = 27C)
Figure 5.21: Large signal step response, Gain=1 with 0.8V step
Figure 5.22: Small signal step response, Gain=1 with 10mV step
Figure 5.23: Closed-loop gain as a function of frequency (Gain = 1, 2, 4, 8, 16)
Figure 5.24: The input referred noise as a function of frequency (plot annotation:
2.11uV/sqrt(Hz) at 1KHz)
5.4 Application of Laser Makelink in the Op Amp Design
5.4.1 Offset Trimming
Due to geometry mismatch and other mismatches induced by fabrication process
variations (doping level, oxide thickness, etc.), a relatively large input offset
voltage often exists in CMOS op amps (and comparators) compared to that of
their bipolar counterparts. The amplified input offset introduces a DC shift at the
amplifier's outputs, which affects the output swing or may even drive the amplifier
into a nonlinear operating mode. Moreover, the input offset severely limits the
precision of the system. Sometimes it sets the limit on the maximum resolution
that the system can obtain. Thus an offset cancellation technique is always employed
in high precision analog and mixed-signal designs such as analog-to-digital
converters.
Traditionally, the offset voltage is canceled dynamically using clock controlled
MOSFET switches. Periodic refreshing is required, because the junction and
subthreshold leakage of the switches eventually corrupts the correction voltage
stored across the capacitors. In a typical design, the offset must be refreshed at a
rate of at least a few kilohertz. Previous offset cancellation techniques fall into two
categories: 1) autozeroing (AZ) or 2) chopper stabilization (CHS). Figure 5.25
illustrates these two techniques [82].
Figure 5.25: Offset cancellation (a) Auto-zeroing; (b) Chopper stabilization
AZ is essentially a sampling technique. Two clock phases are needed: (1)
sampling phase: the unwanted quantity (offset and noise) is sampled and stored on
the capacitors; (2) signal processing phase: this unwanted quantity is subtracted
from the contaminated signal at the input, at intermediate nodes or at the output
of the amplifier/comparator [83]. Unlike the AZ approach, the CHS technique does
not use sampling, but rather applies modulation to transpose the signal to a higher
frequency where there is no 1/f noise, and then demodulates it back to the baseband
after amplification. A low pass filter is usually required to recover the desired signal.
These traditional approaches can reduce the input offset as well as the low
frequency noise (mainly 1/f noise), but the drawbacks are also obvious: (a) increased
circuit complexity and transistor count; (b) extra clocks needed; (c) more silicon
real estate cost; (d) increased thermal and flicker noise and power consumption; (e)
increased production cost; (f) degraded circuit performance (for example, reduced
bandwidth).
Other offset cancellation techniques, such as DigiTrim(TM) from ADI and laser
trimmed thin film resistors, are also available. DigiTrim adjusts circuit performance
by programming digitally weighted current sources [84]. The latter method uses a
laser to cut thin film resistors. As the beam traces along a resistor, it effectively
changes the width of the resistor. Since the resistance is inversely proportional to
the width, this permanently changes the resistor's value. This requires special
process steps to deposit the thin film and thus increases the fabrication cost. For
CMOS op amps, this method is not very attractive because the resistor load
structure is not used very often.
In contrast, the laser trimming technique proposed here does not have
these limitations. The offset is measured and trimmed at the wafer level during
production.
The input offset voltage is caused by device mismatches induced by process
variation (doping level, lithographic errors, thermal/mechanical stress, etc.), which
include unmatched geometry sizes and threshold voltage or mobility mismatch. The
offset voltage of the differential pair can be expressed as [75]:
\[ V_{off} = \sqrt{\frac{2 I_{DS}}{\mu C'_{ox} \frac{W}{L}}} \left[ -\frac{\Delta I_{DS}}{2 I_{DS}} + \frac{\Delta (W/L)}{2 (W/L)} \right] - \Delta V_{th} \tag{5.30} \]
IDS is the drain current of one branch, ΔIDS is the current difference between
the two branches, ΔVth is the threshold voltage difference between the two input
transistors, and Δ(W/L) is the transistor size mismatch. Neither the first term nor
the second term on the right hand side of the above equation can be controlled to
zero in a practical fabrication process, but their difference may be minimized so that
the offset voltage is reduced. This goal can be achieved by adjusting the current
flowing in either branch. The idea is illustrated with the input stage of a fully
differential op amp, as shown in figure 5.26.
The W, L or Vth mismatch will produce unbalanced DC bias currents in the
two branches. Due to the high impedance presented at nodes InOut_L and InOut_R,
a fairly large voltage difference will occur at the output. Therefore, a non-zero
differential input voltage, i.e., the offset voltage, must be applied in order to drive
the output to the desired value, which in this design is the middle of the supply rail.
To compensate the input offset, some smaller PMOS transistors were intentionally
added (inside the Trim Box) in parallel with the PMOS current source. These extra
transistors have the same channel length as the PMOS current source (or a longer
channel length can be chosen to reduce the effect of loading) but binary weighted
channel widths: 1x Wmin, 2x Wmin, and so on.
Figure 5.26: (a) the input stage of a fully differential CMOS op amp (b) The internal
configuration of the trim box
Figure 5.27: Offset trimming by laser Makelink: 10mV offset is reduced to 50uV
with a group of 10 "trim" transistors (differential output Voutp - Voutn (V) vs. DC
input (V), for trim widths WtrimL from 950n to 1.45u)
Each PMOS transistor has a laser Makelink switch attached to it. One terminal of
the Makelink is connected to its drain; the other is connected to the drain of the
PMOS current source in the differential branch. A 5Ω resistor was inserted in series
with each transistor to represent the actual Makelink resistance. Before laser
processing/trimming, all the switches are in the off state. Once a Makelink is
zapped, the corresponding extra transistor is added to the current path. Based on
the actual measurement, the number of these "extra" transistors added to the
original circuit can be controlled. This is equivalent to increasing the width of the
PMOS current source at the metallization level. Thus the current in the two
differential branches can be adjusted with minimal effect on the normal operation
of the circuit.
The width step of the "extra" PMOS transistors should be determined based
on the fabrication statistics. This is usually achieved by running a Monte Carlo
simulation, or the statistics are obtained from field data. Based on the process
statistical data provided by TSMC, the maximum offset was found to be below
10mV (1000 samples). Assuming a 10mV offset, a group of 10 "trim transistors"
with a 0.1um step size is sufficient. Figure 5.27 shows the offset cancellation result.
After laser "trimming", the offset can be reduced to 50uV.
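Once the required extra width has been determined from the wafer-level measurement, choosing which Makelinks to zap is a simple quantization step. The sketch below is only an illustration under stated assumptions: it assumes a binary-weighted bank as described above, the number of links is hypothetical, and the mapping from measured offset to required width (which would come from simulation or measurement) is outside its scope.

```python
def select_links(target_width_um, w_unit_um, n_links):
    """Pick the binary-weighted trim transistors to connect.

    target_width_um : extra PMOS width needed (from measurement)
    w_unit_um       : width of the smallest trim transistor (1x Wmin)
    n_links         : number of trim transistors (widths 1x, 2x, 4x, ...)

    Returns (code, links, achieved_width_um), where `links` lists the
    bit positions whose Makelinks should be zapped.
    """
    max_code = (1 << n_links) - 1
    code = round(target_width_um / w_unit_um)
    code = min(max_code, max(0, code))      # clamp to the bank range
    links = [bit for bit in range(n_links) if (code >> bit) & 1]
    return code, links, code * w_unit_um


if __name__ == "__main__":
    # Hypothetical example: 1.3 um needed, 0.1 um unit width, 4 links.
    code, links, achieved = select_links(1.3, 0.1, 4)
    print(f"code = {code:04b}, zap links {links}, width = {achieved:.2f} um")
```

Because a Makelink cannot be undone, the selection would be made once per die; the residual offset is bounded by half a unit step as long as the target lies within the range of the bank.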
The advantages of laser trimming can be summarized as:
• Fully compatible with most commercial CMOS processes
• Has little or negligible effect on the amplifier/comparator's speed/bandwidth
• No aliasing or intermodulation issues
• Simple circuit scheme: a) minimum modification of the circuit topology; b)
lower component count and thus less silicon area; c) less power consumption; d)
very little extra thermal/flicker noise; e) no external/internal clocks needed.
5.4.2 Laser Reconfiguration
The application of laser Makelink is not limited to "trimming". The
flexibility of this technology can be seen in its capability to reconfigure the circuit
topology at an "equivalent-to-mask" level after fabrication.
An on-chip, internally compensated op amp is often designed to be
over-compensated to ensure circuit stability. The side effect of this
over-compensation is that it sacrifices the op amp's bandwidth and speed for
stability. Conversely, if the design underestimates the loading capacitance and does
not provide enough phase margin, the op amp will be unstable. To remedy this
over- or under-compensation, multiple compensation capacitors may be placed in
parallel on the path, as shown in figure 5.28. In this example, three capacitors with
values of 100fF, 150fF and 200fF are used in the compensation capacitor bank. The
achievable bandwidth can then be optimized according to the application specific
loading capacitance. The default compensation capacitance is 250fF. In some cases
a very large CL may be present at the output node, for example CL = 2pF. The
250fF compensation then provides only a 42.7° phase margin, which may not be
sufficient. To gain more phase margin, a 450fF compensation can be obtained by
connecting all three capacitors in parallel. The achievable phase margin is improved
to 64.8°.
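The set of compensation values reachable with the three-capacitor bank can be enumerated directly. This is a small illustrative sketch; reading the 250fF default as the 100fF and 150fF capacitors in parallel is an inference from the values, not something stated in the text.

```python
from itertools import combinations

# Capacitor bank values in fF, as given in the text.
BANK_FF = (100, 150, 200)


def reachable_values(bank):
    """Map each reachable total capacitance to one combination giving it."""
    out = {}
    for r in range(1, len(bank) + 1):
        for combo in combinations(bank, r):
            out.setdefault(sum(combo), combo)
    return dict(sorted(out.items()))


if __name__ == "__main__":
    for total, combo in reachable_values(BANK_FF).items():
        print(f"{total} fF <- {' + '.join(str(c) for c in combo)}")
```

Since Makelinks only add connections, a practical trim sequence can move from a smaller total to a larger one by connecting more capacitors, but not in the other direction.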
Figure 5.28: The amplifier core showing multiple compensation
Figure 5.29: DDA open-loop frequency response: 250fF Cc vs. 450fF Cc (open-loop
gain (dB) and phase (deg) vs. frequency (Hz))
Chapter 6
Bandgap Reference
6.1 Introduction
Voltage and current references are pivotal building blocks in Analog/Mixed-
Signal/RF designs. Not only are they used as stand-alone ICs, but they are also
used within the designs of many other ICs (Figure 6.1). They exist in the power
management block, the data converter reference and the amplifier biasing network.
Sometimes they have a major impact on the performance and accuracy of the whole
system. For example, a voltage reference is often required in high resolution data
converters to quantize the input signal. A ±1.2mV tolerance on a 1.2V reference
corresponds to ±0.1% accuracy. This limits the resolution to approximately 10 bits.
The bandgap references developed here can be used as the common-mode reference
for the DDA or to generate a data converter reference voltage, or they may be
combined with other components in the FPAA to implement various applications.
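The link between reference tolerance and usable resolution follows from requiring one LSB to be no smaller than the reference error. The sketch below is a rough estimate under that single assumption (other conventions, such as half an LSB or the full ± span, shift the result by about one bit); the function name is illustrative.

```python
import math


def resolution_bits(v_ref, tolerance):
    """Number of bits N for which one LSB, v_ref / 2**N, equals the tolerance."""
    return math.log2(v_ref / tolerance)


if __name__ == "__main__":
    bits = resolution_bits(1.2, 1.2e-3)
    print(f"1.2 mV on 1.2 V supports about {bits:.2f} bits")
```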
In general such reference circuits generate a DC quantity which exhibits
little dependence on process parameters, supply voltage or temperature (PVT) (see
footnote 1). There are many reference topologies available, depending on the
application and process technology. Figure 6.2 shows two simple implementations.
Footnote 1: Some references have a well-defined dependence on temperature instead
of temperature independence, for example a quantity that is directly proportional
to absolute temperature (PTAT). This kind of circuit is widely used in temperature
monitoring systems.
Figure 6.1: A generic Mixed-Signal System (blocks: ADC, DSP & Microsystem, DAC,
Amp, REF, Clocks & Timer, Power Management, Interface)
Although it is somewhat decoupled from the power supply rail, Figure 6.2 (a)
still has many deficiencies as a reference. The VBE value depends strongly on
process parameters and has a very strong temperature coefficient (TC) of about
-0.33%/°C. Figure 6.2 (b) shows a better implementation. A Zener diode (see
footnote 2) is used in conjunction with a regular diode, and an appreciably higher
output voltage is realized. Since the Zener diode has a positive TC between +1.5
and +5 mV/°C, the overall TC of the reference can be reduced to 100ppm/°C or
less with proper bias and carefully chosen diodes. However, Zener diode based
references must be driven from a supply voltage considerably higher than 5 V, and
they are not fully compatible with modern CMOS processes. Also, Zener diodes are
noisy due to the noisy nature of the (avalanche) breakdown mechanism. Therefore
they are not adequate for high performance CMOS ICs.
Footnote 2: Pure Zener breakdown usually occurs below 5 V. Nowadays diodes with
well-defined breakdown characteristics are all called Zener diodes even though their
breakdown mechanism is actually avalanche breakdown. They typically have a
breakdown voltage between 5 V and 8.5 V.
Figure 6.2: Diode References
6.2 Principle of Bandgap Reference
The bandgap reference (BGR) circuit has been the most elegant way to
implement a stable, accurate and temperature independent low voltage reference on
silicon. The principle relies on the proper summation of a
complementary-to-absolute-temperature (CTAT) voltage with a
proportional-to-absolute-temperature (PTAT) voltage, as shown in Figure 6.3 and
equation (6.1):
\[ V_{ref} = V_{CTAT} + V_{PTAT} = V_{BE} + x \, \Delta V_{BE} \tag{6.1} \]
so that ∂Vref/∂T = 0. Here x is a weighting factor.
Figure 6.3: An Illustration of Bandgap Principle
The PTAT voltage can be generated by two BJTs operated at different current
densities:
\[
\begin{aligned}
V_{BE1} &= V_T \ln(J_1 / J_s) \\
V_{BE2} &= V_T \ln(J_2 / J_s)
\end{aligned}
\tag{6.2}
\]
Then:
\[ \Delta V_{BE} = V_T \ln(J_1 / J_2) = V_T \ln m \tag{6.3} \]
and:
\[ \frac{\partial \Delta V_{BE}}{\partial T} = \frac{k}{q} \ln m \tag{6.4} \]
where k is the Boltzmann constant, q is the electron charge, JC is the collector
current density and VT is the thermal voltage. This shows that ΔVBE is PTAT.
Note that JC = ΔVBE/(R AE) = VT ln m/(R AE) is also a PTAT quantity, if the
temperature dependence of the resistor R is neglected for the moment.
It is well known that VBE has a negative TC with a nonlinear temperature
dependence. This can be found by taking the derivative of equation (6.2) (assuming
a linear temperature dependence of JC):
\[
\begin{aligned}
\frac{\partial V_{BE}}{\partial T} &= \frac{\partial V_T}{\partial T} \ln\frac{J_C}{J_S} + \frac{V_T}{J_C} \frac{\partial J_C}{\partial T} - \frac{V_T}{J_S} \frac{\partial J_S}{\partial T} \\
&= \frac{k}{q} \ln\frac{J_C}{J_S} + \frac{k}{q} - \frac{V_T}{J_S} \frac{\partial J_S}{\partial T}
\end{aligned}
\tag{6.5}
\]
The first two terms in equation (6.5) represent the linear part of the temperature
dependence of VBE, while the third term represents the higher order temperature
dependence. The saturation current JS can be approximated by [85]:
\[ J_S \approx C_1 T^{\eta} \exp\!\left( -\frac{E_G(T)}{kT} \right) \tag{6.6} \]
where η is a process-dependent parameter (representing the temperature dependence
of the intrinsic carrier concentration ni and the mobility μ) and EG(T) is the silicon
bandgap at temperature T. Substituting equation (6.6) into (6.5) and re-arranging
the three terms, with VG(T) = EG(T)/q:
\[ \frac{\partial V_{BE}}{\partial T} = \frac{V_{BE}(T) - (\eta - 1) V_T - V_G(T)}{T} \tag{6.7} \]
From equation (6.7), it is obvious that VBE has a non-linear temperature
dependence. But to first order, the variation of VBE with temperature can be
approximated as linear, with a TC between -1.5 mV/°C and -2.2 mV/°C depending
on the process parameters and temperature. Using equation (6.1) and properly
choosing the weighting factor x (a typical value is ≈ 23), a reference with near zero
TC can be obtained (see footnote 3).
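The first-order cancellation condition can be evaluated numerically from equations (6.1) and (6.4): the weighting factor must satisfy TC(VBE) + x (k/q) ln m = 0. The sketch below is illustrative only; the function names are hypothetical and the -2 mV/°C slope is an assumed mid-range value. Under this model the product x·ln m comes out near 23 for that slope, so a weighting factor of about 23 applies when ln m is close to 1 and scales down for larger area ratios.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant over electron charge, V/K


def delta_vbe(temp_k, m):
    """PTAT voltage of equation (6.3): VT * ln(m)."""
    return K_OVER_Q * temp_k * math.log(m)


def weighting_factor(tc_vbe, m):
    """x such that tc_vbe + x * (k/q) * ln(m) = 0 (first-order only)."""
    return -tc_vbe / (K_OVER_Q * math.log(m))


if __name__ == "__main__":
    tc_vbe = -2.0e-3  # V/degC, assumed first-order slope of VBE
    print(f"dVBE at 300 K, m = 8: {delta_vbe(300.0, 8) * 1e3:.2f} mV")
    print(f"x for m = 8: {weighting_factor(tc_vbe, 8):.2f}")
    print(f"x * ln(m): {weighting_factor(tc_vbe, 8) * math.log(8):.2f}")
```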
The bandgap reference technique is attractive in IC designs because of its
simplicity, the avoidance of Zener diodes and their noise, and, more importantly
these days, low voltage operation (<5 V). In 1971, Widlar introduced the first
bandgap reference, the LM113 [86]. It used conventional junction-isolated bipolar
IC technology to make a stable 1.220V reference. However, most of today's bandgap
references are based on the classical topology invented by Brokaw in 1974 [87],
shown in Figure 6.4. The two BJTs Q1 and Q2, with different emitter areas (1:8),
are operated at the same collector current by virtue of the closed loop around the
buffer amplifier and R8 = R7; thus m is 8 here (equation (6.3)). The PTAT voltage
ΔVBE drops across resistor R2 and defines the current I2 = I1 = ΔVBE/R2. The
voltage drop across resistor R1 is:
\[ V_1 = 2 \frac{R_1}{R_2} \Delta V_{BE} \tag{6.8} \]
The resistors may have a very high TC, but their ratio should be nearly temperature
independent. So V1 is PTAT. The bandgap voltage VZ is determined by:
\[
\begin{aligned}
V_Z &= V_{BE1} + V_1 \\
    &= V_{BE1} + 2 \frac{R_1}{R_2} V_T \ln 8 \\
    &= 1.205\,\text{V}
\end{aligned}
\tag{6.9}
\]
Footnote 3: Since most process parameters vary with temperature, if a quantity is
temperature independent, it is usually also process independent.
Figure 6.4: AD580 Precision Bandgap Reference Based on Brokaw Cell,
Analog Devices, 1974
The output voltage can be scaled up using the buffer amplifier and the resistor
ladder. This is a first order bandgap reference because it only compensates the
linear component in equation (6.7).
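Equation (6.9) also fixes the resistor ratio needed for a given output. The sketch below solves it for R1/R2; the VBE1 value of 0.65V and the 300K temperature are assumptions made for illustration and are not taken from the AD580 design, and the function names are hypothetical.

```python
import math

K_OVER_Q = 8.617333e-5  # V/K


def brokaw_vz(vbe1, r1_over_r2, temp_k=300.0, m=8):
    """VZ = VBE1 + 2 * (R1/R2) * VT * ln(m), as in equation (6.9)."""
    vt = K_OVER_Q * temp_k
    return vbe1 + 2.0 * r1_over_r2 * vt * math.log(m)


def ratio_for_target(vz_target, vbe1, temp_k=300.0, m=8):
    """Resistor ratio R1/R2 that yields vz_target for a given VBE1."""
    vt = K_OVER_Q * temp_k
    return (vz_target - vbe1) / (2.0 * vt * math.log(m))


if __name__ == "__main__":
    vbe1 = 0.65  # V, assumed
    ratio = ratio_for_target(1.205, vbe1)
    print(f"R1/R2 = {ratio:.2f} gives VZ = {brokaw_vz(vbe1, ratio):.3f} V")
```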
6.3 A CMOS Implementation of Bandgap Reference
The goal of this work is to develop a CMOS bandgap reference suitable for the
Field Programmable Analog Array and its associated applications. High performance
BJTs are not available in standard CMOS processes, so the classical Brokaw cell
cannot be implemented directly in its original form. Fortunately, for CMOS bandgaps,
the parasitic substrate PNP transistors (Figure 5) can be used. Even though they have
a low β, "a poorly performing bandgap reference is still considerably superior to
anything that can be built out of pure CMOS components".
Figure 6.5: Realization of Substrate PNP BJTs on the CMOS process [88]
[Block diagram labels: Start-up; Supply-Independent Current Mirrors; Op Amp; Summation]
Figure 6.6: A Block Diagram of the Proposed BGR
6.3.1 Architecture
A block diagram of the proposed BGR is shown in Figure 6. It consists of
a bandgap core (Q0 through Q3), which generates the PTAT and CTAT voltages;
a high-gain op amp, which maintains nodes A and B at the same potential
and regulates the current mirror biasing point to suppress supply voltage variation;
a summation branch, which generates the final bandgap voltage; and a start-up
circuit, which ensures the BGR operates properly at power-up. Note that
instead of one PN junction, there are two VBEs stacked together in each of the
branches. Besides directly providing a higher output reference voltage (≈2.5 V,
about twice the value of the general structure), this topology has the added
advantage of lowering the effect of the op amp offset error.
6.3.2 The Bandgap Core
Figure 7 shows the BGR core circuit. It contains two pairs of stacked
diode-connected substrate PNPs. Transistors Q1 and Q3, in conjunction with Q0 and Q2,
are used to develop the PTAT voltage. The emitter areas of the four transistors were
set as AE1 = AE3 = 4AE0 = 4AE2. Using TSMC018 CM process model parameters,
the TC of VBE was found to be −1.80 mV/°C at room temperature. The four identical
PMOS transistors have a channel length of 4 µm to reduce the channel length modulation
effect. A 40 µA biasing current with a relatively high overdrive voltage (Vov ≈ 0.55
V) was picked to improve the matching between the current mirrors. Since each of
the branches carries the same current, and nodes A and B have the same potential
due to the negative feedback around the op amp, the sum of the two VBE differences
drops across resistor R1:
\[ V_{R1} = (V_{BE1} - V_{BE0}) + (V_{BE3} - V_{BE2}) = 2 V_T \ln 4 \tag{6.10} \]
This defines the PTAT voltage. The drain-source voltages of the PMOS transistors
are well matched, so systematic current offset is not an issue. As will be
explained later, the accuracy and temperature dependence of resistor R1 are not
a problem. The primary error source here is the op amp offset, which causes a finite
potential difference between nodes A and B:
\[ V_{ERR} = V_{OFF} + \frac{V_C}{A_m} \tag{6.11} \]
where VOFF is the op amp offset voltage and Am is the gain. This error voltage VERR
should be kept small compared with the PTAT voltage 2VT ln(4). Because stacking two
PN junctions doubles the PTAT voltage while VERR stays the same, the relative effect
of the op amp offset error is halved.
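The benefit of stacking can be put into numbers with a minimal sketch. It compares the ratio of an error voltage to the PTAT voltage for one junction pair and for two stacked pairs, using the 1:4 emitter area ratio from the text. The 1 mV error voltage and the 300 K temperature are assumptions chosen only for illustration.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over elementary charge, V/K


def ptat_voltage(n_stacked, area_ratio, temp_k):
    """PTAT voltage across R1: n_stacked * VT * ln(area_ratio)."""
    return n_stacked * K_OVER_Q * temp_k * math.log(area_ratio)


def relative_offset_error(v_err, n_stacked, area_ratio=4, temp_k=300.0):
    """Ratio of the op amp error voltage to the PTAT voltage."""
    return v_err / ptat_voltage(n_stacked, area_ratio, temp_k)


if __name__ == "__main__":
    v_err = 1e-3  # assumed 1 mV error voltage (illustrative)
    for n in (1, 2):
        ptat_mv = ptat_voltage(n, 4, 300.0) * 1e3
        err_pct = relative_offset_error(v_err, n) * 100
        print(f"{n} junction pair(s): PTAT = {ptat_mv:.1f} mV, error = {err_pct:.2f} %")
```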
Figure 6.7: The BGR Core
6.3.3 Op Amp Design
The op amp plays an important role in the BGR. Intuitively, a high-gain,
low-offset op amp is desired. High gain can be achieved through cascoding or a
two-stage structure. To reduce offset, large transistors may be used for the input
differential pair, together with judicious layout techniques such as interdigitation
and common-centroid structures. Alternatively, autozeroing or chopper stabilization
techniques can be employed [89], [90], but these two methods do not seem to be
widely used in BGRs because of their complexity, higher power consumption
and switching noise. As discussed in the previous chapter, the laser Makelink based
trimming approach would be an excellent choice. It can also be used to trim the
poly resistors to obtain a high precision bandgap voltage. Another factor that needs
to be examined is the input common-mode range (ICMR). By inspecting Figure 7, the
ICMR is found to be within 2VBE < ICMR < VDD − VDsat. Other concerns, including
noise and power consumption, are especially important for low voltage operation.
With the above considerations, a folded-cascode two-stage topology was adopted.
Two NMOS transistors are used as the input differential pair for their speed and large
gm. Their common-mode operating voltage falls well within the required ICMR. The
amplifier has a very high gain (≈ 133 dB at DC) and, at the same time, provides enough
bandwidth (≈ 65 MHz unity-gain bandwidth with a 2 pF load). An effort was also made
to reduce the 1/f noise in the first stage by choosing fairly large PMOS transistors
and overdrive voltages. Compared with the telescopic structure, the folded-cascode
saves one Vdsat drop. Thus it can be easily modified for TSMC 018 CM process 1.8
V or sub-1.5 V operation. For a low to medium precision system, a simple one-stage
error amplifier could be used. However, for stability reasons, a fairly large capacitor
would have to be inserted between the gates of the PMOS mirrors and VDD (or gnd)
for dominant pole compensation. This not only occupies more area but also
makes the circuit susceptible to noise coupled from the power lines through the
large compensation capacitor.
Figure 6.8: Schematic of the 2-stage folded-cascode op amp
The op amp schematic is shown in Figure 8. Figure 9 and figure 10 show the
frequency response of this amplifier with 2 pF and 10 pF capacitive loads,
respectively. They show a DC gain of 133 dB and a unity-gain frequency of 65 MHz
with 83° phase margin when a 2 pF loading capacitance is present. Even with a 10 pF
load, this amplifier still has a sufficient phase margin of 60°.
6.3.4 The Complete Circuit
The complete BGR circuit is shown in figure 11. The branch consisting of BJTs
Q4 and Q5 (AE4 = AE5 = 4AE1) and PMOS transistors PM6 and PM7 is the
summation block. The special arrangement of resistors R1 and R2 will be explained
in the layout section. The final BGR output voltage is defined by:
\[ V_{ref} = V_{BE4} + V_{BE5} + I_{DS6} R_2 = 2 V_{BE} + \frac{R_2}{R_1}\, 2 V_T \ln m \tag{6.12} \]
Figure 6.9: Op amp frequency response with 2 pF capacitive load (open-loop gain and
phase vs. frequency; plot annotation: DC gain 136.8 dB, UGF 65.09 MHz, PM 83.64°)
Figure 6.10: Op amp frequency response with 10 pF capacitive load (open-loop gain and
phase vs. frequency; plot annotation: DC gain 113.8 dB, UGF 68.94 MHz, PM 60.6°)
Ideal resistors should have low voltage and temperature coefficients. In this
design, N+ poly resistors without silicide were used. They can be fabricated with
better accuracy compared with the N-Well resistors (±15% vs ±22.7%). Also, they
have a reasonable sheet resistance of 292 Ω/□. For the resistor values in the BGR,
this ensures that the resistor layout is long enough for good accuracy without
taking too much space. The non-silicide poly resistors can be modeled as:
\[ R = R_0 \left[1 + VC_1 \Delta V + VC_2 (\Delta V)^2\right] \left[1 + TC_1 \Delta T + TC_2 (\Delta T)^2\right] \tag{6.13} \]
where ΔT = T − 25°C and
\[ R_0 = R_\square\, \frac{L - \Delta L}{W - \Delta W} \]
VC and TC are the voltage coefficient and temperature coefficient, respectively. R□ is
the sheet resistance, ΔL and ΔW are the length and width variations, and R0 is the
nominal layout dependent resistance. Once the BGR is in normal operation, the voltage
variation across the resistors will be very small, so the VC terms are negligible.
Because the resistors are made of the same material, their ratio should be nearly
temperature independent. The main error source here is the mismatch caused by
geometry/process variation. This kind of mismatch can be minimized with careful layout
technique. By fine tuning the resistor ratio, an accurate, temperature insensitive
reference voltage can be generated.
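The ratio argument can be illustrated with a minimal sketch of the resistor model in equation (13). The temperature coefficients and nominal values below are hypothetical, chosen only to show that two resistors sharing the same coefficients keep a constant ratio while their absolute values drift.

```python
def poly_resistance(r0, delta_v, temp_c, vc1, vc2, tc1, tc2):
    """Equation (6.13): R = R0 * voltage factor * temperature factor."""
    delta_t = temp_c - 25.0
    voltage_factor = 1.0 + vc1 * delta_v + vc2 * delta_v ** 2
    temp_factor = 1.0 + tc1 * delta_t + tc2 * delta_t ** 2
    return r0 * voltage_factor * temp_factor


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    tc1, tc2 = -1.0e-3, 1.0e-6
    for temp_c in (0.0, 25.0, 85.0):
        r_a = poly_resistance(2000.0, 0.0, temp_c, 0.0, 0.0, tc1, tc2)
        r_b = poly_resistance(10000.0, 0.0, temp_c, 0.0, 0.0, tc1, tc2)
        print(f"{temp_c:5.1f} C: Ra = {r_a:8.1f}, Rb = {r_b:9.1f}, Rb/Ra = {r_b / r_a:.6f}")
```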
Although the BGR is essentially DC-operated, there are two important dynamic
issues related to the proper operation of BGR circuits: their behavior at start-up,
and their behavior in response to transient loads. For example, when the gate
voltages of the PMOS current mirrors are at VDD and their source voltages are at
0 (ground), no current flows through any of the branches. This is a
possible and stable operating point. Thus, like most self-biased or bootstrapped
topologies, a start-up circuit is necessary to ensure the normal operation of the BGR.
Transistors ST1 through ST4 perform this function. Initially all the transistors are
off, with VG6 = VDD and VS6 = 0. Thus transistor ST3 is off and VD,ST3 is near VDD.
This causes transistor ST2 to start conducting, which pulls down (i.e., discharges)
the gate voltages of all the PMOS current mirrors. Eventually they come
out of the "0" zone. At this point, ST3 starts conducting current and turns ST2 off.
When the BGR is in its normal operating mode, the start-up circuit should not affect
the main circuit. Here ST1 was designed to be a weak transistor to minimize the power
consumption.
Figure 6.11: The complete BGR schematic
With regard to the second dynamic issue, it is usually solved by adding a
high speed buffer amplifier to decouple the BGR block from the rest of the circuit
and improve the response time. The tradeoff here is between speed and power
consumption.
6.3.5 Layout Design
The importance of judicious layout cannot be overemphasized in analog/mixed-signal
IC design. In the BGR schematic, the critical matching components include:
BJTs Q0 through Q6, PMOS transistors PM0 through PM6, and resistors R1 and
R2.
The values of passive components such as resistors and capacitors cannot be
controlled precisely in integrated circuits. For N-Well resistors, the resistance
variation could be up to ±20% or more. So whenever possible, a precision value
should be based on a ratio instead of an absolute component value. Fortunately,
in the BGR design, the accurate resistor value is not of the utmost importance.
The point of interest lies mainly in matching the resistor ratios rather than the
absolute values. This goal can be achieved through careful layout design. As illustrated
in Figure 12, both R1 and R2 were laid out with a 2 µm wide, 36 µm long unit resistor.
R1 contains three unit resistors in parallel, while R2 contains five unit resistors in
series. Dummy resistors were placed on the sides to eliminate the uneven etching/doping
at the edges. With this arrangement, even if the geometry deviates from the
layout, the resistor ratio remains the same. Also, using eight resistors instead
of two gives the flexibility to arrange them in a symmetric, common-centroid
structure. This helps to average out the process gradient along either direction on
the wafer.
Figure 6.12: Resistor layout arrangement
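The unit-resistor arrangement can be checked with a short sketch. It builds R1 (three units in parallel) and R2 (five units in series) from one unit value, shows that R2/R1 = 15 regardless of that value, and evaluates equation (12) with m = 4. The ideal unit resistance (292 Ω/□ times 18 squares, end effects ignored) and VBE = 0.71 V are assumptions made for illustration, not extracted design values.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over elementary charge, V/K


def unit_resistance(sheet_ohm_per_sq, length_um, width_um):
    """Ideal resistance of one unit resistor (end effects ignored)."""
    return sheet_ohm_per_sq * length_um / width_um


def r1_r2_from_units(r_unit):
    """R1: three units in parallel; R2: five units in series."""
    return r_unit / 3.0, 5.0 * r_unit


def vref_estimate(v_be, r2_over_r1, m, temp_k):
    """Equation (6.12): Vref = 2*VBE + (R2/R1) * 2*VT*ln(m)."""
    return 2.0 * v_be + r2_over_r1 * 2.0 * K_OVER_Q * temp_k * math.log(m)


if __name__ == "__main__":
    temp_k = 298.15                                   # 25 C
    r_unit = unit_resistance(292.0, 36.0, 2.0)
    r1, r2 = r1_r2_from_units(r_unit)
    i_ptat = 2.0 * K_OVER_Q * temp_k * math.log(4) / r1
    v_ref = vref_estimate(0.71, r2 / r1, 4, temp_k)   # VBE = 0.71 V is assumed
    print(f"R1 = {r1:.0f} ohm, R2 = {r2:.0f} ohm, R2/R1 = {r2 / r1:.1f}")
    print(f"PTAT current = {i_ptat * 1e6:.1f} uA, Vref = {v_ref:.3f} V")
```

Under these idealized assumptions the PTAT current comes out near the 40 µA bias quoted for the core, and the estimated reference lands close to 2.5 V.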
Based on the same idea, BJTs Q1 and Q3 were used as the unit transistors. Q1,
Q0 and Q5 were grouped together, with Q1 placed in the center and surrounded by
Q0 and Q5. Similarly, Q3, Q2 and Q4 were laid out in another group with the same
structure. This arrangement can significantly improve the matching between these
BJTs.
The final BGR layout is shown in figure 14. To improve the matching between
the current mirrors, the two PMOS transistors PM6 and PM7 were placed in the
center. A multi-finger structure was used, and all the PMOS transistors were split
into smaller units and placed on the two sides.
Figure 6.13: BJT layout arrangement
6.3.6 Results and Discussions
Temperature stability is the primary specification for voltage references. This
BGR provides a reference voltage of 2.4927 V at 25°C. From 0°C to 85°C, the voltage
variation is within ±0.587 mV (figure 15). The maximum TC⁴ is 16.06 ppm/°C
at 85°C. It consumes about 1.4 mW at 25°C. Note that this design is not
optimized for low power operation, but it can be readily modified to significantly
reduce the power consumption by using fewer current branches and a lower biasing
current. Another important specification of a BGR is its insensitivity to power supply
variation, both at DC and at higher frequencies (AC). For small devices, especially
battery powered ones, the power supply variation may be up to ±10%. Figure 16
shows the BGR output voltage as a function of power supply voltage. The circuit
can operate properly at 3 V with a TC of 69.02 ppm/°C. It is more robust at higher
than standard supply voltages: at 3.6 V, the maximum TC is 21.7 ppm/°C. If line
regulation or cascoded current mirrors are used, the BGR output voltage will be
less sensitive to power supply variation. Figure 17 shows the power supply rejection
ratio (PSRR) of the BGR. A high PSRR can effectively reject the noise coupled
from the power supply line. For noisy environments, a BGR with a higher PSRR
can be achieved by using cascoded current mirrors, based on a concept similar to that
discussed in the amplifier chapter. Off-chip decoupling capacitors can also be used
to further cancel out the supply noise. Noise performance is a critical specification
for low voltage, high precision systems. This BGR circuit was not specifically
designed for low noise applications. The output noise at 1 kHz is about 10.2 µV/√Hz.
The regulation op amp generates most of the noise. By inspecting the amplifier
topology again, we can identify the input differential pair as the main noise
contributor. Because of their small sizes, 1/f noise is the dominant factor. This
has been verified by simulation. At the cost of more silicon area, the size of
the two transistors can be increased. Figure 19 shows the improved noise performance
of this BGR: an output noise of 1.91 µV/√Hz at 1 kHz with four times the size of the
original differential pair.
⁴ The temperature coefficient here is defined as: TC = (1/Vref)(∂Vref/∂T).
Figure 6.14: Overall BGR Layout
Figure 6.15: BGR Temperature Sweep (BGR voltage and temperature coefficient as
functions of temperature, 3.3 V supply)
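The temperature coefficient defined as TC = (1/Vref)(∂Vref/∂T) can be evaluated from a swept Vref curve with a simple finite difference. The sketch below is one possible way to do this; the sample points are synthetic and hypothetical, not the simulated data behind Figure 6.15, and the function name is an assumption.

```python
def tc_ppm_per_degc(temps_c, vrefs):
    """Return (midpoint temperature, TC in ppm/C) for each adjacent sample pair.

    TC = (1/Vref) * dVref/dT, with the derivative taken as a finite
    difference and Vref as the mean of the two samples.
    """
    results = []
    for i in range(len(temps_c) - 1):
        d_t = temps_c[i + 1] - temps_c[i]
        d_v = vrefs[i + 1] - vrefs[i]
        v_mid = 0.5 * (vrefs[i + 1] + vrefs[i])
        t_mid = 0.5 * (temps_c[i + 1] + temps_c[i])
        results.append((t_mid, d_v / d_t / v_mid * 1e6))
    return results


if __name__ == "__main__":
    # Synthetic, hypothetical sweep data (not the curve of Figure 6.15).
    temps = [0.0, 20.0, 40.0, 60.0, 85.0]
    volts = [2.49250, 2.49268, 2.49275, 2.49270, 2.49240]
    for t_mid, tc in tc_ppm_per_degc(temps, volts):
        print(f"T = {t_mid:5.1f} C: TC = {tc:7.2f} ppm/C")
```

A first order reference typically shows a TC that changes sign across the range, so the maximum magnitude tends to appear at one of the temperature extremes.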
6.4 Laser Makelink Trimming for Precision
Many of today's electronic systems are migrating to small footprints and low
voltage operation. The reduced supply voltage leaves very little room for errors
and increases the accuracy requirements of the reference block. As discussed in the
previous sections, the main error contributors in the BGR are the op amp and the
resistors. Considering the op amp offset and its finite gain, equation (12) can be
re-written as:
\[ V_{ref} = 2 V_{BE} + \frac{R_2}{R_1} \left(2 V_T \ln m + V_{ERR}\right) \tag{6.14} \]
Because of the op amp's high gain, the offset voltage is the main component of VERR.
It can be significantly reduced by the laser Makelink trimming method described in
Chapter 5.
Figure 6.16: BGR voltage as a function of supply voltage
Figure 6.17: BGR power supply rejection ratio (45.1 dB at DC)
Figure 6.18: BGR output noise
Figure 6.19: BGR output noise with improvement
Another error comes from the resistors. Sometimes, even with careful layout
design, the process/lithography induced mismatch is still not tolerable. Laser
trimmed thin film resistors are often used for high precision BGRs by many major
chip providers such as Texas Instruments, Analog Devices and National Semiconductor.
Thin film resistors are very temperature stable and can add to the thermal
stability and accuracy of a device, even without trimming. For better accuracy,
a laser beam is used to "cut" the thin film, and very precise resistor values can be
obtained. However, the fabrication of this kind of resistor is not compatible with
standard CMOS processes. It requires the integration of thin film deposition and
patterning, which increases the fabrication cost.
Compared with traditional laser trimmed thin film resistors, the laser Makelink
is an excellent alternative. It is fully compatible with all CMOS processes. It
forms links instead of "cutting". For this BGR, resistors R2 and R1 can
be arranged as shown in figure 20, where RN is the nominal resistor value. Resistors
RT1 through RT8 are digitally grouped together, with the minimum value determined by
Monte Carlo simulation and process statistics. Using this arrangement, 1-15 times
the minimum trimming resistor value can be obtained.
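The range of 1-15 times the minimum trimming value is what a set of binary-weighted segments would give. The sketch below enumerates the reachable values under that assumption: four segments weighted 1, 2, 4 and 8 times the minimum resistor, each switched in or left out by one link. The exact segment topology of figure 20 is not reproduced here, so the weights, the 50 Ω minimum value and the helper names are hypothetical.

```python
from itertools import product


def trim_values(r_min, weights=(1, 2, 4, 8)):
    """All trim resistances reachable by selecting any subset of segments."""
    values = set()
    for bits in product((0, 1), repeat=len(weights)):
        multiple = sum(b * w for b, w in zip(bits, weights))
        values.add(multiple * r_min)
    return sorted(values)


def closest_trim(target, r_min, weights=(1, 2, 4, 8)):
    """Pick the reachable trim value closest to the target resistance."""
    return min(trim_values(r_min, weights), key=lambda r: abs(r - target))


if __name__ == "__main__":
    r_min = 50.0  # hypothetical minimum trim resistor, ohms
    print(trim_values(r_min))
    print(closest_trim(430.0, r_min))
```

With a step of one minimum resistor, the residual error after trimming is at most half a step, which is one reason the minimum value would be sized from the expected mismatch statistics.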
6.5 A Low Voltage, Curvature Compensated Bandgap Reference
The design discussed so far is a first order bandgap reference, which should
be sufficient for a low to medium resolution system. However, some high precision
systems, especially those operated at a low power supply (≤ 1.8 V), put a more
stringent requirement on the reference accuracy. This mandates higher order, curvature
compensated BGRs.
Figure 6.20: BGR programmable resistor for laser Makelink trimming
In the previous BGR design, the temperature dependence of VBE was assumed to
be linear. This is only a first order approximation. A more accurate representation
of VBE is [91]:
\[ V_{BE}(T) = V_{G0} - \frac{T}{T_r}\left[V_G(T_r) - V_{BE}(T_r)\right] - (\eta - \alpha)\, V_T \ln\frac{T}{T_r} \tag{6.15} \]
where VG0 is the extrapolated bandgap voltage of silicon at 0 K, Tr is the reference
temperature, η is a process dependent parameter which is usually less than four,
and α is the order of the temperature dependence of the BJT collector current IC. If IC
is PTAT, then α is 1. The non-linear temperature dependence of VBE comes from
the third term in the equation. It can be further expanded using a Taylor series.
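To get a feel for the size of this curvature, the sketch below evaluates the logarithmic term −(η − α)VT ln(T/Tr) and measures how far its mid-range value departs from a straight line drawn through the end points of the 0°C to 85°C range. The values η = 4, α = 1 and Tr = 300 K are assumptions for illustration only, and the function names are hypothetical.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over elementary charge, V/K


def vbe_nonlinear_term(temp_k, t_ref_k, eta, alpha):
    """Logarithmic term of the VBE model: -(eta - alpha) * VT * ln(T/Tr)."""
    v_t = K_OVER_Q * temp_k
    return -(eta - alpha) * v_t * math.log(temp_k / t_ref_k)


def bow(t_low_k, t_high_k, t_ref_k, eta, alpha):
    """Mid-range value minus the straight line through the two end points."""
    t_mid = 0.5 * (t_low_k + t_high_k)
    line_mid = 0.5 * (vbe_nonlinear_term(t_low_k, t_ref_k, eta, alpha)
                      + vbe_nonlinear_term(t_high_k, t_ref_k, eta, alpha))
    return vbe_nonlinear_term(t_mid, t_ref_k, eta, alpha) - line_mid


if __name__ == "__main__":
    # eta = 4 and alpha = 1 are illustrative assumptions.
    deviation = bow(273.15, 358.15, 300.0, 4.0, 1.0)
    print(f"Curvature bow over 0-85 C: {deviation * 1e3:.2f} mV")
```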
Based on equation (15), many creative topologies have been developed to
approximately cancel the nonlinear component of VBE. A classical method proposed
by Song & Gray is to add a squared PTAT term to the output of the first order
bandgap [92]. The basic idea is to cancel the negative temperature dependence
of the logarithmic term in equation (15) with a positive parabolic term. The drawback
of this method is that a complex circuit is needed to generate the squared PTAT
voltage, which occupies more silicon area and consumes more power. Another technique,
developed by Lee et al. [93], exploits the temperature dependence of the BJT's
current gain to exponentially cancel the non-linear component of VBE. This is
a simple yet very effective technique. However, it is not suitable for low voltage
operation because at least a bandgap voltage plus an overdrive voltage is needed.
The CMOS curvature compensated BGR presented in this section was designed
based on the topology proposed by Malcovati [94], with some modifications. It can
be operated at a 1.8 V or even lower power supply. To demonstrate the effectiveness
of this method, a BGR without curvature compensation was designed first.
Its schematic is shown in figure 21 (the start-up circuit is not included). The same
amplifier (figure 8) was used with slight modifications. Compared with the previous
design, this BGR has a smaller number of current legs and thus consumes less power.
More importantly, it can be operated at a lower power supply as long as VDD ≥
VBE + VD,sat. The output voltage is defined as:
\[ V_{BGR} = \frac{R_3}{R_1} \left(V_T\, \frac{R_1}{R_0} \ln N + V_{BE}\right) \tag{6.16} \]
where N was chosen as 16 so that moderate resistor sizes could be used in the layout.
The temperature dependence of VBE is canceled to the first order by the first term
in the parentheses. The output reference voltage may be set arbitrarily by the
resistor ratio R3/R1, so non-standard values (i.e., < 1.2 V) can be generated.
Note that it is best to set VBGR to about 0.7 V ≈ VBE1 ≈ VBE0
to minimize the current mismatch between the current mirrors.
Figure 6.21: A low voltage BGR without curvature compensation
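A first order sizing of the resistor ratios in equation (16) can be sketched as follows. The ratio R1/R0 is chosen so that the PTAT slope (k/q)(R1/R0) ln N cancels the VBE slope, and R3/R1 then scales the sum to the desired output. The VBE slope of −1.8 mV/°C, VBE = 0.7 V and the 0.712 V target are assumptions for illustration; the actual design values come from simulation.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over elementary charge, V/K


def r1_over_r0_for_zero_tc(dvbe_dt, n):
    """Solve (k/q) * (R1/R0) * ln(N) + dVBE/dT = 0 for R1/R0."""
    return -dvbe_dt / (K_OVER_Q * math.log(n))


def vbgr(r3_over_r1, r1_over_r0, n, v_be, temp_k):
    """Equation (6.16): VBGR = (R3/R1) * (VT * (R1/R0) * ln(N) + VBE)."""
    v_t = K_OVER_Q * temp_k
    return r3_over_r1 * (v_t * r1_over_r0 * math.log(n) + v_be)


if __name__ == "__main__":
    n = 16
    ratio_10 = r1_over_r0_for_zero_tc(-1.8e-3, n)      # assumed VBE slope, V/K
    unscaled = vbgr(1.0, ratio_10, n, 0.7, 300.0)      # VBE = 0.7 V is assumed
    ratio_31 = 0.712 / unscaled                        # assumed target output
    print(f"R1/R0 = {ratio_10:.2f}, R3/R1 = {ratio_31:.3f}")
    print(f"VBGR = {vbgr(ratio_31, ratio_10, n, 0.7, 300.0):.3f} V")
```

The unscaled sum in the parentheses is close to the usual 1.2 V bandgap value, so a ratio R3/R1 below one is what brings the output down to a sub-1 V level.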
To further improve the BGR accuracy and reduce its TC, a solution proposed
by Gunawan et al. [95] was used. The basic idea is to compensate the logarithmic
term in equation (15) by a proper combination of a VBE with a temperature-independent
current IC (this implies α ≈ 0) and a VBE with a PTAT current
(α ≈ 1). Looking at figure 21, the collector currents through BJTs Q1
and Q0 are PTAT. Since VBGR is nearly temperature independent, the drain-source
current of PMOS PM6 is, to the first order, temperature independent (α ≈ 0). This
current can be mirrored and injected into another diode-connected BJT branch.
The new curvature compensated BGR is shown in figure 22. Again using equation
(15), the VBE of Q0, Q1 and Q6 can be expressed as:
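\[ V_{BE0,1}(T) = V_{G0} - \frac{T}{T_r}\left[V_G(T_r) - V_{BE0,1}(T_r)\right] - (\eta - 1)\, V_T \ln\frac{T}{T_r} \tag{6.17} \]
\[ V_{BE6}(T) = V_{G0} - \frac{T}{T_r}\left[V_G(T_r) - V_{BE6}(T_r)\right] - \eta\, V_T \ln\frac{T}{T_r} \tag{6.18} \]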
The VBE difference
\[ V_{NL} = V_{BE0,1}(T) - V_{BE6}(T) \approx V_T \ln\frac{T}{T_r} \tag{6.19} \]
is a nonlinear term which can be used to cancel the higher order temperature depen-
dence component of VBE. The nonlinear current INL defined by VNL/R4,5 is injected
into the BGR core. Then the BGR output voltage is:
\[ V_{BGR} = \frac{R_3}{R_1} \left(V_T\, \frac{R_1}{R_0} \ln N + V_{BE} + \frac{R_1}{R_{4,5}}\, V_{NL}\right) \tag{6.20} \]
where R4 and R5 are nominally matched. Because the last term in the parentheses
is used to cancel the nonlinear component of VBE, it is straightforward to find
that:
\[ R_{4,5} = \frac{R_1}{\eta - 1} \tag{6.21} \]
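The step from equation (20) to equation (21) can be made explicit. The short derivation below keeps only the terms proportional to VT ln(T/Tr), taking VNL ≈ VT ln(T/Tr) and a PTAT collector current (α = 1) for the core transistors. It is a sketch of the idealized case only.

```latex
\begin{aligned}
\left. V_{BE} + \frac{R_1}{R_{4,5}} V_{NL} \right|_{\text{nonlinear}}
  &= -(\eta - 1)\, V_T \ln\frac{T}{T_r} + \frac{R_1}{R_{4,5}}\, V_T \ln\frac{T}{T_r} \\
  &= \left[ \frac{R_1}{R_{4,5}} - (\eta - 1) \right] V_T \ln\frac{T}{T_r} = 0
  \quad\Longrightarrow\quad R_{4,5} = \frac{R_1}{\eta - 1}
\end{aligned}
```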
However, the above theoretical analysis cannot be used directly in the actual
design because some implicit assumptions were made. First, VBE(Tr) is not the
same for BJTs Q0 and Q6. Secondly, the resistors have non-zero TC, so IC0 is not
exactly PTAT and IDS6 ≈ IC6 is not exactly temperature insensitive. Therefore
equations (20) and (21) should only be used as general guidance. The exact resistor
values depend heavily on precise process parameter characterization and extensive
simulation. For this design, it was found that η − α is close to 0.5, which is much
smaller than the expected value of 3. Here, R0 through R3 are the same type of P+ poly
resistors. R4 and R5 were intentionally chosen as N+ poly resistors, which have a
slightly higher negative TC than the P+ poly resistors. This makes R1/R4,5
increase as temperature drops. This choice of resistors proved to compensate
the VBE curvature most effectively.
Figure 23 compares the two BGRs. The curvature compensated BGR clearly shows
a significant improvement in accuracy. The maximum
TC was reduced from 12.8 ppm/°C to 6.21 ppm/°C, with the maximum reference voltage
variation reduced from 297.8 µV to 41.0 µV between 0°C and 85°C. This BGR can
work properly with a 1.6 V power supply (it can work in the sub-1 V range with some
modifications to the op amp). It shows a power supply rejection ratio of 60 dB at
DC, and generates a noise voltage of 12.6 µV/√Hz at 1 kHz. It consumes 426
µW at 27°C.
Figure 6.22: A low voltage BGR with curvature compensation
Figure 6.23: Comparison between BGRs with and without curvature compensation
(temperature coefficient and BGR voltage as functions of temperature)
Figure 6.24: BGR voltage as a function of supply voltage variation
Figure 6.25: BGR power supply rejection ratio (60 dB at DC)
Figure 6.26: BGR noise performance (12.6 µV/√Hz at 1 kHz)
Chapter 7
FPAA Applications
7.1 CAB Based Applications
The differential difference op amp (DDA) is a powerful analog circuit building
block. The amplifier itself, combined with some passive components, can implement
many analog signal processing functions. This section is devoted to some CAB based
applications using this amplifier and the CAB architecture developed in the previous
chapter. All the circuits employ fully differential topologies.
7.1.1 Gain Amplifier
The most straightforward applications of this op amp are various gain amplifiers,
either inverting or non-inverting. Probably the easiest and most widely used
configuration is the fully differential unity-gain buffer, as shown in figure 7.1 (a).
Compared with the standard implementations, which require four matched resistors
with one 2-input, 2-output differential op amp, or two matched single-ended op amps,
this DDA based implementation is simpler and more accurate. Figure 7.1 (b) and
(c) are the inverting and non-inverting gain amplifier configurations. Note that,
in contrast to the inverting configuration, the non-inverting gain amplifier
displays a high input impedance that does not load the previous stage. For voltage
mode operation, it is usually desirable to use the non-inverting amplifier whenever
possible. The typical frequency response is shown in figure 7.2 for different gain
values. A differential transresistance amplifier can also be easily implemented with
one DDA, as shown in figure 7.1 (d).
Figure 7.1: Non-inverting gain amplifier configuration
Figure 7.3 (a) demonstrates another DDA application, a voltage-controlled current
source (VCCS). The output current is determined by Vin/R. Figure 7.3 (b) shows
a sinusoidal voltage controlled current source, with 500 mV over a 10 kΩ resistor.
In fact, the control voltage does not have to be an AC signal. A stable voltage source,
such as a bandgap, can be fed into the op amp as the control voltage (through a simple
single-ended to differential conversion). If the loading resistors are chosen to be the
same type of resistors, then this configuration can be used to generate the reference
voltages for ADCs or DACs, as shown in figure 7.4.
Figure 7.2: Non-inverting gain amplifier frequency response (gain vs. frequency for
Gain = 1, 2, 4, 8 and 16)
A DDA together with two MOS transistors operated in the triode region can also be
used as a modulation/multiplication cell, as shown in figure 7.5 (a). Two PMOS
transistors of the same size were used because their source and body can be shorted
together to reduce the signal dependent nonlinearity. When they are biased in the
triode region, the source-drain current follows a linear relationship:
\[ I_{DS} \approx 2\beta (V_S - V_G - V_{thp}) \tag{7.1} \]
where β is the transconductance parameter, and VG and VS are the gate and source
voltages, respectively. In the above schematic, the carrier signal Vc = vcp − vcn has
the same common-mode DC level as the modulating signal Vm. The gates of the
PMOS transistors are biased at 0. The modulated output is given as:
\[ V_{out} = 2\left[1 - 2\beta (V_c - V_{thp})\right] V_m \tag{7.2} \]
Figure 7.5 (b) shows the result.
Figure 7.3: Voltage controlled current source (VCCS) (a) schematic; (b) output
Figure 7.4: A reference voltage generation block for ADC
7.1.2 Active Analog Filter
Filters are fundamental building blocks in all kinds of analog signal processing
systems. They can be categorized into discrete analog passive filters,
switched-capacitor filters, active analog filters (including RC active filters and
Gm-C/MOSFET-C filters), passive LC filters and distributed (waveguide) filters.
Figure 7.6 summarizes the choice of filter type based on the desired operating
frequency [96]. One of the motivations of this work is to develop a continuous-mode
FPAA suitable for high frequency operation. Although switched-capacitor filters have
the advantage of being less sensitive to component precision, their bandwidth is
usually limited to 1 MHz. So only active RC filters are discussed here. Note,
however, that the DDA op amp itself can be used for either type of filter.
Figure 7.5: DDA as a modulation/multiplication cell (a) schematic; (b) output
Figure 7.6: Choice of filter as a function of the operating frequency range
When it is internally compensated with a phase margin larger than 45°, the
DDA can be modeled as a first order system with a transfer function of:
\[ H(s) \approx \frac{\omega_u}{s} \tag{7.3} \]
where ωu is the unity-gain frequency of the amplifier. As a general rule of thumb,
when used in an active filter, the bandwidth of the op amp should be at least 10
times the filter's cut-off frequency [97], because as the frequency goes up, the op
amp's dominant pole comes into play and there is more "unexpected" roll-off. One
exception is that for some low pass filters, this "extra" roll-off may be welcome since
it provides more attenuation.
Figure 7.7: Generalized Sallen-Key topology
The Sallen-Key structure [98] is a popular filter implementation method. It
requires only one op amp per biquad, so it is simple and especially attractive
when cost and power consumption are concerns. Figure 7.7 is a general representation
of this topology. The voltage transfer function is given as:
\[ \frac{V_o}{V_i} = \frac{K}{\dfrac{Z_1 Z_2}{Z_3 Z_4} + \dfrac{Z_1}{Z_3} + \dfrac{Z_2}{Z_3} + \dfrac{Z_1 (1 - K)}{Z_4} + 1} \tag{7.4} \]
where K = 1 + R4/R3 is the filter gain. By properly choosing the component types
and values, a low pass, high pass or bandpass filter response may be realized. Using
the DDA developed in chapter 5, a fully differential Sallen-Key filter can be readily
implemented.
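The generalized transfer function lends itself to a direct numerical evaluation with complex impedances. The sketch below implements equation (7.4) and applies it to a unity-gain low-pass case. The assignment Z1 = R1, Z2 = R2, Z3 = 1/(sC1), Z4 = 1/(sC2) is a common low-pass choice that is assumed here, since Figure 7.7 is not reproduced, and the component values are hypothetical.

```python
import math


def sallen_key_gain(z1, z2, z3, z4, k):
    """Equation (7.4): generalized Sallen-Key voltage transfer function."""
    denominator = (z1 * z2) / (z3 * z4) + z1 / z3 + z2 / z3 + z1 * (1 - k) / z4 + 1
    return k / denominator


def lowpass_response(freq_hz, r1, r2, c1, c2, k=1.0):
    """Low-pass case: resistors for Z1, Z2 and capacitors for Z3, Z4 (assumed mapping)."""
    s = 1j * 2 * math.pi * freq_hz
    return sallen_key_gain(r1, r2, 1 / (s * c1), 1 / (s * c2), k)


if __name__ == "__main__":
    # Hypothetical values: equal resistors and C2 = 2*C1 give a Butterworth (Q = 0.707) response.
    r, c1, c2 = 10e3, 1e-12, 2e-12
    f0 = 1 / (2 * math.pi * r * math.sqrt(c1 * c2))
    for f in (1e3, f0, 10 * f0):
        gain_db = 20 * math.log10(abs(lowpass_response(f, r, r, c1, c2)))
        print(f"f = {f:12.4g} Hz: gain = {gain_db:7.2f} dB")
```

With these values the response is flat at low frequency, about 3 dB down at f0, and falls at roughly 40 dB per decade above it, as expected for a second order section.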
It is usually difficult to design a fully differential Sallen-Key bandpass filter
using one standard op amp. The DDA provides an easy solution [99]. Figure 7.8
(a) is the implementation schematic. The center frequency and the quality factor
can be expressed as:
\[ \omega_0 = \sqrt{\frac{1 + k}{C_1 C_2 R_1 R_2}} \tag{7.5} \]
\[ Q = \frac{\sqrt{C_1 C_2 R_1 R_2 (1 + k)}}{C_1 R_1 + C_2 R_2 + C_2 R_1} \tag{7.6} \]
where k = 1 + R3/R4 is the gain. While this implementation is simple and less
sensitive to the component values [99], the quality factor is directly related to the
gain, so it is difficult to adjust them independently. Moreover, a high Q has to be
achieved by a high gain with wide bandwidth. This may mandate less compensation
and thus bring the risk of instability. Figure 7.8 (b) shows the simulated result.
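The coupling between gain and quality factor can be seen numerically. The sketch below evaluates the center frequency and Q, taking ω0 = sqrt((1 + k)/(C1 C2 R1 R2)) and Q = sqrt(C1 C2 R1 R2 (1 + k))/(C1 R1 + C2 R2 + C2 R1) as the working forms of equations (7.5) and (7.6). The component values are hypothetical and are not those of the filter in Figure 7.8.

```python
import math


def bandpass_center_and_q(r1, r2, c1, c2, k):
    """Return (center frequency in Hz, Q) for the DDA Sallen-Key bandpass."""
    omega_0 = math.sqrt((1 + k) / (c1 * c2 * r1 * r2))
    q = math.sqrt(c1 * c2 * r1 * r2 * (1 + k)) / (c1 * r1 + c2 * r2 + c2 * r1)
    return omega_0 / (2 * math.pi), q


if __name__ == "__main__":
    # Hypothetical components: 1 kohm resistors and 1 nF capacitors.
    for k in (2, 8, 35):
        f0, q = bandpass_center_and_q(1e3, 1e3, 1e-9, 1e-9, k)
        print(f"k = {k:3d}: f0 = {f0 / 1e3:7.1f} kHz, Q = {q:.2f}")
```

With equal resistors and capacitors, Q reduces to sqrt(1 + k)/3, so both Q and the center frequency grow only with the square root of the gain. This is why a narrow band requires a large k.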
It is also very convenient to implement low-pass and high-pass Sallen-Key
filters using the DDA and some passive components. Figure 7.9 (a) is a third order
Butterworth low-pass filter implemented by cascading a first order stage with a second
order Sallen-Key stage. Figure 7.9 (b) is the filter's frequency response. It shows a -3dB
cut-off at 10.3MHz with 60dB/decade roll-off in the transition band. For a Butterworth
filter, there's no ripple in the passband. Even though the component values
may not be precisely controlled, this filter can still be used on-chip as an
anti-aliasing filter for some high speed, low to medium resolution data converters
(for example, SNR ≤ 60dB), since the attenuation at the aliasing frequency is
already below the noise floor. The transfer function of this filter can be expressed
as:
\[
\frac{V_o}{V_i} = \frac{K}{\dfrac{Z_1 Z_2}{Z_3 Z_4} + \dfrac{Z_1}{Z_3} + \dfrac{Z_2}{Z_3} + \dfrac{Z_1(1-K)}{Z_4} + 1}
\tag{7.7}
\]
Figure 7.8: A second order Sallen-Key narrow band-pass filter. (a) Schematic; (b) output (dB)
versus frequency (Hz), titled "Second order Sallen-Key Bandpass Filter, Center: 120K, Q: 9.1".
Figure 7.9: A third order Butterworth low-pass filter based on Sallen-Key topology. (a) Schematic;
(b) gain (dB) versus frequency (Hz), titled "Third Order Butterworth Filter, Gain=1, 3dB = 10.3MHz".
Similarly, a third order Butterworth high-pass filter can also be implemented,
as shown in Figure 7.10.
7.2 Temperature Measurement
This section describes the application of using FPAA sub-components to im-
plement a bigger system, namely, a temperature monitoring block. The application
uses the BGR, the DDA, some passive components and the inter-CAB tracks of the
FPAA.
To measure the temperature, we need to find a physical value that has a stable
and accurate relationship with temperature and compare it with a temperature
independent parameter. Although at first glance the PN junction voltage VBE
might seem to be an option, it is not a good choice because the VBE temperature
dependence is non-linear and varies significantly from one fabrication process to another.
As introduced in Chapter 6, the voltage difference between two PN junctions operated
at different current densities is an excellent choice. This value has a precise PTAT
temperature dependence (equations (6.3) and (6.4)), and can be easily derived from the
BGR as shown in Figure 7.11. Theoretically, the floating voltage ΔVBE across R1
can be used differentially, but the Q2 collector voltage is at the lower boundary of the
DDA common-mode input range and may fall outside this range at low temperature. So
the PTAT voltage was developed across R2. The current Is = ΔVBE/R1 is not an
accurate PTAT value, but VPTAT = ΔVBE·(R2/R1) is, since the same type of
resistor was used for R1 and R2. By careful layout design, this ratio can be kept
accurate and is almost independent of temperature.
Figure 7.10: A third order Butterworth high-pass filter based on Sallen-Key topology. (a) Schematic;
(b) output (dB) versus frequency (Hz), titled "Third Order Butterworth High Pass Filter, Gain =1, 3dB at 10.3M".
Figure 7.11: A simplified schematic of the generation of VPTAT (labels: R1, R2, Q1, Q2, Is, VPTAT)
Figure 7.12 (a) shows the overall temperature monitoring block diagram. The
output of the BGR is a reference that cannot drive a resistive load, so a buffer
amplifier was used. Vref and VPTAT were pre-calibrated to the same value at
room temperature, 27°C, which serves as a reference point. As the temperature changes,
their difference is compared and amplified (by five in this design) by the DDA. The
temperature can be read according to the following formula:
\[
T = T_{ref} + \frac{V_T - V_{ref}}{K}
\tag{7.8}
\]
where Tref is 27°C, and K is the slope of the output voltage as a function of
temperature.
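The read-out in equation (7.8), read here as T = Tref + (VT - Vref)/K, is a single linear conversion. A minimal sketch follows; the function name and the numeric slope used in the usage note are hypothetical, since the measured slope is not quoted in the text.

```c
#include <assert.h>

/*
 * Convert a measured voltage to temperature using equation (7.8):
 *   T = Tref + (VT - Vref) / K
 * v_t     : measured voltage (volts)
 * v_ref   : reference voltage at the calibration point (volts)
 * k_slope : slope of the voltage versus temperature (volts per degree C)
 * t_ref   : calibration temperature (degrees C), 27 in the text
 */
double temperature_from_voltage(double v_t, double v_ref,
                                double k_slope, double t_ref)
{
    assert(k_slope != 0.0);   /* a zero slope carries no temperature information */
    return t_ref + (v_t - v_ref) / k_slope;
}
```

For instance, with an assumed slope of 0.01 V/°C, a reading 0.5 V above the reference maps to about 50°C above the 27°C calibration point. When the two voltages are equal, the function returns the calibration temperature itself.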
Figure 7.12: Temperature monitoring/measurement block (a) diagram; (b) result (0-100°C).
Panel (a) blocks: BGR, Buffer, R-divider, Compare & Amplify, with signals VBGR, Vref, VPTAT and Vo.
Panel (b): output voltage, VPTAT and Vref (V) versus temperature (°C), Tref = 27°C.
When this implementation is combined with some digital circuits and an ADC,
the output voltage may be directly converted into a temperature reading. A more
straightforward but useful application is to use it to monitor a critical temperature.
For example, VPTAT can be pre-calibrated to stay below Vref until
the chip temperature reaches the critical temperature. The output then triggers
a positive pulse, which can be used to shut down a certain circuit block or to lower
the power.
7.3 A Hierarchical Implementation of an 8-bit Two-Step ADC
An FPAA is essentially an analog system. All the applications developed so far
are still pure analog signal processing. However, using the flexibility provided
by the laser Makelink, namely reconfiguration at the metallization level, the array based
approach can be extended further into a hierarchical design methodology.
The analog-to-digital converter is probably the most important mixed-signal
circuit, building a bridge between the real analog world and the digital domain.
There are many types of ADCs [97], [100], [101]. The flash ADC has a simple
architecture and the fastest speed. Today's 6-bit CMOS flash converters can be operated at
GSPS (giga-samples per second) speeds [102], [103], but the architecture has a prominent
drawback: the number of comparators grows exponentially with the number of bits.
Increasing the number of comparators also increases the circuit area, as well as the
power consumption. The folding-and-interpolating architecture, originally developed
for bipolar technology, can reduce the number of comparators. But the folding
amplifiers are usually in an open-loop configuration to support high frequency operation,
and the large offset of a CMOS implementation makes open-loop amplifiers difficult to
implement. Also, since the coarse stage and the folding stages are inherently different,
timing error becomes a critical issue [101], [104], [105]. Another option
would be the pipelined architecture. It significantly reduces the number of comparators.
High speed, high resolution and low power may be achieved simultaneously by using
this architecture. However, its long latency (for an N-bit pipelined ADC, the latency
is usually N clock cycles or longer) may exclude it from many applications
[100]. Thus, a two-step flash architecture was chosen for this 8-bit ADC design.
Figure 7.13 is the block diagram of this two-step ADC. The traditional flash
architecture is separated into two subrange flash ADCs with feed-forward circuitry.
After the fully differential signal is sampled, a coarse estimate of the input signal is
obtained by the most-significant-bit (MSB) ADC, or coarse ADC. The result
is then converted back to an analog voltage with the DAC and subtracted from the
original input. The residue from the subtraction is multiplied by 2^4 and fed into the
fine ADC (LSB ADC) to generate the final four bits. The coarse conversion, DAC
conversion, subtraction and fine conversion have to be completed within the sampling
period. Among them, the subtractor is the slowest part. Because the major error
would come from the MSB ADC and the error in the fine ADC is at the LSB level, the
residue in this design was multiplied by 2 instead of 16 to improve the speed. Compared
to an 8-bit flash, which requires 255 comparators, this two-step architecture needs only
30 comparators. Most of the analog components in this design were based on the DDA
and CAB structure.
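The conversion sequence described above can be sketched as an ideal behavioral model. This is an illustration only: it assumes a unipolar input normalized to [0, 1), ideal quantizers, and the full residue gain of 2^4 rather than the reduced gain of 2 used in the actual design. All function names are hypothetical.

```c
#include <assert.h>

#define COARSE_BITS 4
#define FINE_BITS   4

/* Ideal quantizer for an input in [0, 1): floor(v * 2^bits), clamped to range. */
static int quantize(double v, int bits)
{
    int levels = 1 << bits;
    assert(bits > 0 && bits < 16);
    if (v <= 0.0) return 0;
    if (v >= 1.0) return levels - 1;
    return (int)(v * levels);
}

/* Two-step conversion: coarse ADC, DAC, subtractor, residue gain, fine ADC. */
unsigned two_step_convert(double vin)
{
    int coarse = quantize(vin, COARSE_BITS);            /* MSB (coarse) ADC   */
    double dac = (double)coarse / (1 << COARSE_BITS);   /* DAC reconstruction */
    double residue = vin - dac;                         /* subtractor         */
    double amplified = residue * (1 << COARSE_BITS);    /* residue gain 2^4   */
    int fine = quantize(amplified, FINE_BITS);          /* LSB (fine) ADC     */
    return ((unsigned)coarse << FINE_BITS) | (unsigned)fine;
}

/* Comparator counts: a full flash needs 2^N - 1, each sub-ADC needs 2^n - 1. */
int flash_comparators(int bits)
{
    return (1 << bits) - 1;
}

int two_step_comparators(int coarse_bits, int fine_bits)
{
    return flash_comparators(coarse_bits) + flash_comparators(fine_bits);
}
```

Calling flash_comparators(8) and two_step_comparators(4, 4) reproduces the 255 and 30 comparators quoted in the text. With a residue gain of 2, the same fine code could be obtained by scaling the fine ADC reference range instead, which this sketch does not model.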
Sample and Hold. The front-end S/H circuit plays a crucial role in the performance
of two-step flash ADCs. Without the S/H circuit, the maximum allowable
slew rate of the input signal is severely limited. This occurs because, if the analog
input signal varies rapidly during the conversion period, the signal level reproduced
by the DAC is not equal to the signal sensed by the subtractor. To avoid errors
during the conversion period, a high speed amplifier with a large slew rate is desired.
The DDA developed in Chapter 5 is an excellent choice.
Typically, the switches in the ADC are implemented with MOS transistors.
Figure 7.13: A Fully Differential 2-step Flash ADC Diagram
It is important that the MOS switches have a linear transfer function and constant
on resistance, which is effectively independent of input voltage so the RC time
constant for charging the capacitor is constant for all input signal amplitudes. More
importantly, two classical non-ideal effects associated with MOS switches usually
limit the performance of these switches. These two effects are known as charge
injection and clock feed-through [106], [107].
When a MOS switch is on, it operates in the triode region and its drain-to-source
voltage, VDS, is near zero. While the transistor is on, it holds
mobile charges in its channel. Once the transistor is turned off, these mobile charges
must flow out of the channel region into the drain and the source. This is
charge injection. Because the amount of charge in the channel is signal dependent,
this charge injection error is non-linear and difficult to remove completely. The
clock feedthrough is due to the MOS capacitance. It can be largely removed by fully
differential operation.
Figure 7.14: (a) charge injection; (b) clock feedthrough
Figure 7.15 (a) is a simple CAB/DDA based implementation. A dummy
NMOS transistor in series with the main switch has 1/2 the W/L of the main
switch transistor. To the first order, the injected charges are absorbed by the dummy
switch [102]. However, in the power spectrum of the sampled signal (100MSPS
with a 10.1258MHz input signal), the THD (total harmonic distortion) is no better
than 37dB. Even without considering the noise, this parameter by itself already limits
the ADC's ENOB (effective number of bits) to fewer than 6 bits. So this topology is
inadequate for an 8-bit ADC operating near one hundred MSPS.
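The ENOB limit quoted above is consistent with the standard relation ENOB = (SINAD - 1.76 dB) / 6.02 dB. The sketch below assumes distortion is the only impairment, so that the quoted distortion figure can stand in for SINAD; the function names are hypothetical.

```c
#include <assert.h>

/* Effective number of bits from SINAD in dB: (SINAD - 1.76) / 6.02. */
double enob_from_sinad_db(double sinad_db)
{
    return (sinad_db - 1.76) / 6.02;
}

/* Returns 1 if the given SINAD supports at least `bits` effective bits. */
int meets_resolution(double sinad_db, int bits)
{
    assert(bits > 0);
    return enob_from_sinad_db(sinad_db) >= (double)bits;
}
```

Under that assumption, 37dB corresponds to roughly 5.9 effective bits, below the 6-bit mark, while a distortion level near 60dB corresponds to roughly 9.7 bits and leaves margin for an 8-bit converter.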
In this design, a fully differential bottom plate sampling (BPS) S/H circuit was chosen [108],
as shown in Figure 7.16 (a) and (b). At time t1, the CLK1 MOS switches turn
off. The charge injection and clock feedthrough resulting from this action are
common-mode signals and can be largely reduced by the fully differential topology.
Figure 7.15: A fully differential S/H based on DDA follower (a) schematic; (b) power
spectrum of the sampled signal (power in dB, horizontal axis labeled "X1 (E6)")
Note that the time interval between t1 and t2 should be
small because the DDA is in open-loop operation. When the CLK2 controlled switches
turn off, the high impedance at the DDA's inputs leaves the sampling
capacitors' top plates floating, so the sampled voltage won't change. Finally, the charge
injection and clock feedthrough due to CLK3 turning off can also be reduced differentially.
The bottom plate of an integrated capacitor is always associated with a large parasitic
capacitance. The BPS connection brings two advantages: (1) the substrate coupling
noise doesn't directly feed into the DDA, nor does it affect the sampled voltage; (2)
reduced gain error [97].
Figure 7.17 shows the power spectrum of the sampled signal. With this
topology, the THD is improved to near 60dB, which is sufficient for an 8-bit ADC.
Comparator. The speed of the ADC strongly depends on the comparator
design. The comparison process is effectively a binary decision that generates
a logic HIGH or LOW. An op amp may be used directly as a comparator, but
its comparison speed is often very slow. Even in an open-loop configuration, the time
required to settle to a valid logic output is still not tolerable for high speed ADCs.
Since the comparator need not be linear or closed-loop, positive feedback can
be introduced to attain near infinite gain and, at the same time, improve the speed.
Note that, to avoid unwanted latch-up, the positive feedback
must be enabled only at the proper time. This usually means the comparator gain
changes from a small value to a very large value at the proper time [101].
In order to use as many existing FPAA components as possible, the
following design was developed. The comparator consists of a preAmp and a feed-forward
latch. The preAmp is actually just the first stage of the DDA with a reset
switch connecting the two outputs. It amplifies the input difference by a small
amount so as to overcome the offset of the latch, which is difficult to cancel. It also
functions as a buffer between the resistor ladder and the latch. An added benefit of
this structure is that the cascode devices help shield the kick-back from the
regenerative outputs of the latch, thus reducing the ADC bit-error-rate (BER) due to
the fluctuation in the resistor ladder caused by the kick-back noise. To complete
the comparator design, an extra block that doesn't exist in the pure analog array,
the feed-forward latch, has to be added. Through positive feedback, the latch
generates the final logic level quickly. Two sized inverters were added as logic buffers.
Figure 7.16: A fully differential BPS S/H (a) schematic; (b) timing graph (three clock phases over times t0 to t3)
Figure 7.17: BPS: Power spectrum of the sampled signal (power in dB, horizontal axis labeled "X1 (E6)")
When the sampling clock is high (CLK = 1), the preAmp is in reset mode, which
removes any residue left from the previous comparison. During the hold mode
(CLKb = 1), the outputs of the preAmp are shorted to the latch, thus forming a low gain
but high speed amplifier (pre-amplifying the input difference). The latch regenerates
the logic outputs while the preAmp is in reset mode. Figure 7.19 shows that the two reset
switches can significantly reduce the "over-drive" recovery time of the comparator.
The propagation delay is about 0.5ns from a negative full scale difference to a positive
1LSB difference.
Figure 7.18: The DDA based comparator (a) preAmp; (b) comparator
Figure 7.19: Comparator transient response: input difference, reference difference, positive
output, negative output and clock (V) versus time (ns)
Subtractor. Using the DDA and some CAB components, the implementation of
the subtractor is simple. It takes only one amplifier plus several feedback resistors,
as shown in Figure 7.20. Since gain trades directly against bandwidth, the subtractor is
the slowest component of this design. To increase the speed somewhat, a
gain of 2 instead of 16 was used. Of course, this reduces the value of the LSB in the
fine ADC, which puts a more stringent requirement on the comparator offset; the offset
may be reduced by using laser Makelink trimming.
Figure 7.20: The DDA based implementation of the subtractor
Other ADC components employ existing designs from the Analog System
Design Lab. To reduce the switch parasitics, a folded resistor string DAC was used.
The encoder is a gate level implementation which includes a bubble error correction
scheme [109]. The overall two-step ADC achieves a speed of 70MSPS, consumes
154mW and occupies 1600 x 1600 um^2. Compared to a similar full custom design,
it is about 50% slower and takes more area. Although this is not an optimal design,
it demonstrates the flexibility of the laser Makelink based hierarchical design
approach, and the overall design cycle was significantly reduced. Therefore,
when a short turn-around time is the primary concern, this design methodology
is an attractive option.
Chapter 8
Conclusions and Future Work
The main contributions of this work can be summarized as follows:
(1) CAD: proposed a generic array-based FPAA architecture and a flexible CAB topology;
improved a pathfinder negotiated routing algorithm and implemented the algorithm in C
for a prototype FPAA; investigated and proposed analog constraints for
performance-driven routing.
(2) Analog Sub-System Design: invented and designed a novel differential difference
op amp; designed two bandgap reference circuits, including a low voltage version; based
on the prototype FPAA architecture, developed several application examples.
(3) Laser Makelink: studied and designed various laser Makelink test structures on
different CMOS processes and a BiCMOS copper process; invented a novel offset trimming
method using laser Makelink; proposed some preliminary ideas on laser Makelink
reconfiguration of analog circuits.
However, FPAA development is a very complex project that requires a
significant amount of work in CAD, architecture and circuit design. Also, although
the idea of laser Makelink reconfiguration was proposed, its application in many
analog circuits has not been fully investigated or experimentally verified. Thus,
the following work is expected to continue in the future.
(a) Analog CAD Design Methodology. Due to the fundamental difference
between analog and digital systems, fully automatic design synthesis cannot
be obtained for analog IC design, but we may still borrow some digital IC design
methodologies. Instead of a traditional bottom-up design, a top-down hierarchical
process can be employed. Design entry can start from a Verilog-A or AHDL (analog
hardware description language) block. The overall system performance can be
estimated at an early design stage, thus reducing the risk of insufficient design or
over-design. Then the HDL-based design can be converted or optimized with a supporting
IP module library.
(b) Analog IP Module Library Development. Besides the universal op amp
unit and the accurate reference blocks, other analog building blocks also need to be
developed. For example, a fully differential wideband buffer with rail-to-rail input
and output range is desired. To properly drive off-chip loads, a high speed I/O
block needs to be developed. At a higher design level, various application specific
circuit functions should be added to the IP module library as pre-qualified
designs for the end users. These include filters, control circuits, etc.
(c) Applications. Although some of the FPAA functionalities have been demonstrated,
exact, practical application examples are still not clear. More investigation
needs to be carried out for field applications.
(d) Laser Reconfiguration. Laser Makelink is a powerful programming technology
that provides tremendous design flexibility. It can be used as a trimming method
and, furthermore, to reconfigure analog circuits into different forms and to modify
them to satisfy different application needs. For example, the active filters developed
in the previous chapter suffer from the poor precision of the integrated passive
components and thus from inaccurate filter responses, and they are difficult to tune. OTA-C
(also called Gm-C) would be an excellent choice for continuous-time filter implementation,
because it provides the flexibility of "tuning" by adjusting the transconductance.
Although the DDA op amp can also be used as an "OTA", its two stage
structure places the second pole very close to the dominant pole, so its speed is still
limited. Using laser Makelink, we can "cut off" the second output stage, and just
use the first stage as an OTA. This way the FPAA can be "reconfigured" for filter type
applications. The above method effectively gives us two critical building blocks: a
high-gain, flexible op amp and a high speed OTA. Alternatively, we can keep the two stage
structure with the original class AB output stage to provide the necessary swing,
but reconfigure the first stage as a diode connected differential input stage. The
application of laser Makelink reconfigurability has actually gone beyond the original
FPAA design concept. It can be treated as a hierarchical design approach that's
applicable to SoCs or other mixed-signal ASICs.
Appendix A
Chip Layout
A.1 Laser Makelink Test Chips
Figure A.1: Al Makelink test chip - NSC 0.18um CMOS
Figure A.2: Cu Makelink test chip - IBM 8HP 0.13um BiCMOS SiGe
A.2 The Fully Differential Difference Amplifier
Figure A.3: The fully differential difference op amp - TSMC018 CM process
A.3 The Bandgap Reference
Figure A.4: The first order bandgap reference chip with test transistors - TSMC018
CM process
A.4 Two-Step ADC
Figure A.5: The two-step flash analog-to-digital data converter - TSMC018 CM
process
Appendix B
FPAA Router Documentation
I. Variables 
• s_net (t_net): defined in header file netlist.h. This struct is used
to maintain a net. The definition is shown below:
*****************************************************************
struct s_net {
    int index;          /* index of this net in the netlist */
    int num_terminals;  /* total number of terminals of this net */
    int *pTerminals;    /* used to maintain the linked list of terminals */
};
typedef struct s_net t_net;
*****************************************************************
• s_RTvertex (t_RTvertex): defined in header file route.h. This struct defines the data
structure used by the vertices in routing tree RT.
*****************************************************************
struct s_RTvertex {
    int index;                  /* index: the index of this vertex */
    short PQflag;
    struct s_RTvertex *pNext;   /* pNext: pointer to the next vertex */
};
typedef struct s_RTvertex t_RTvertex;
*****************************************************************
• s_RT (t_RT): defined in header file route.h. This struct is used to
maintain a routing tree RT, which is a linked list. Each routing tree RT
corresponds to a partial or complete net.
*****************************************************************
struct s_RT {
    int neti;               /* index of this RT/net */
    int num_v;              /* total number of vertices in the final RT */
    t_RTvertex *pRTvertex;  /* Pointer to a member of this RT. In this version, it always */
                            /* points to the head of this routing tree because a new vertex */
                            /* is always added in front of the old ones */
};
typedef struct s_RT t_RT;
*****************************************************************
• s_PQvertex (t_PQvertex): defined in header file pqueue.h. This data structure is used to
maintain a vertex in the priority queue.
*****************************************************************
struct s_PQvertex {
    int index;                    /* the index of the vertex in RRG */
    struct s_PQvertex *pParent;   /* the parent of this vertex in RRG, NOT in the
                                     context of PQ. This parameter is used in the
                                     back-tracing stage. */
    double pathcost;              /* the path cost from the current partial routing tree to
                                     this vertex, including the cost of this vertex itself */
};
*****************************************************************
• pqueue: defined in header file pqueue.h. This data structure is used to maintain a
priority queue.
**********************************************************************************
struct pqueue {
    int size;      /* number of elements this priority queue actually contains */
    int size0;     /* how many vertices in PQ (including the vertices which have been removed) */
    int avail;     /* the maximum number of elements the array can still hold */
    int avail0;    /* the maximum number of elements the redundant array can still hold */
    int step;      /* the number of additional elements to be allocated */
    PQvertex *d0;  /* pointer to the starting position of the redundant priority queue */
    PQvertex *d;   /* pointer to the current location of the priority queue */
};
**********************************************************************************
• t_rr_type: defined in header file rrgtypes.h. This data structure defines the available
types of routing resource vertex.
typedef enum {IPAD, OPAD, IPIN, OPIN, CHANX, CHANY} t_rr_type;
• s_edge_list (t_edge_list): defined in header file rrgtypes.h. This data structure is used to
maintain all the neighbors of a vertex.
**********************************************************************************
struct s_edge_list {
    int index;
    struct s_edge_list *pNext;
};
typedef struct s_edge_list t_edge_list;
**********************************************************************************
• eList: defined in header file rrgtypes.h. This data structure maintains the linked list of
each vertex's edges.
******************************************************************
typedef struct {
    int index;
    int size;
    t_edge_list *p;
} eList;
******************************************************************
• t_rr_vertex: defined in header file rrgtypes.h. This is the main data structure used to
describe a routing resource graph vertex.
******************************************************************
typedef struct {
    int index;
    short x;
    short y;
    short ppt_num;
    t_rr_type type;
    int occupancy;
    int capacity;
    int num_edges;
    int *edge_list;
} t_rr_vertex;
/*****************************************************************************
 * index:     index of the vertex
 * x, y:      integer coordinates
 * type:      what is this routing resource?
 * occupancy: how many nets are using this vertex now?
 * capacity:  how many nets can legally use this vertex?
 * ppt_num:   pin, track or pad number, depending on rr_vertex type
 * num_edges: number of edges exiting this vertex, i.e. the number of vertices to which it connects
 * edge_list: pointer to the linked list of all its neighbors
 *****************************************************************************/
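The routing tree structures above describe a singly linked list in which new vertices are added at the front. The following is a minimal sketch of that behavior, not the router's actual source: the helper names rt_push_front, rt_contains and rt_free are hypothetical, the next and head members are assumed to be pointers, and PQflag is simply cleared because its meaning is not documented here.

```c
#include <assert.h>
#include <stdlib.h>

struct s_RTvertex {
    int index;                  /* index of this vertex */
    short PQflag;               /* meaning not documented; cleared on insert */
    struct s_RTvertex *pNext;   /* next vertex in the routing tree list */
};
typedef struct s_RTvertex t_RTvertex;

struct s_RT {
    int neti;                   /* index of this RT/net */
    int num_v;                  /* number of vertices currently in the RT */
    t_RTvertex *pRTvertex;      /* head of the list */
};
typedef struct s_RT t_RT;

/* Add a vertex at the front of the list. Returns 1 on success, 0 if out of memory. */
int rt_push_front(t_RT *rt, int vindex)
{
    t_RTvertex *v = malloc(sizeof *v);
    if (v == NULL)
        return 0;
    v->index = vindex;
    v->PQflag = 0;
    v->pNext = rt->pRTvertex;
    rt->pRTvertex = v;
    rt->num_v++;
    return 1;
}

/* Return 1 if a vertex with this index is already in the routing tree, else 0. */
int rt_contains(const t_RT *rt, int vindex)
{
    const t_RTvertex *v;
    for (v = rt->pRTvertex; v != NULL; v = v->pNext)
        if (v->index == vindex)
            return 1;
    return 0;
}

/* Release every vertex and reset the routing tree to empty. */
void rt_free(t_RT *rt)
{
    while (rt->pRTvertex != NULL) {
        t_RTvertex *next = rt->pRTvertex->pNext;
        free(rt->pRTvertex);
        rt->pRTvertex = next;
    }
    rt->num_v = 0;
}
```

Front insertion keeps each addition constant time, at the price of a linear scan in the membership check, which is acceptable when a routing tree holds only the vertices of one net.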
II. Function Descriptions
• t_net * read_netlist (char * fname): read in the netlist file to be routed. Return a
t_net type pointer, which points to the start position of the netlist stored in memory.
• void free_netlist (t_net * net): free the memory occupied by the nets to be routed.
• struct pqueue * pqinit (struct pqueue *, int): initialize the priority queue. Return a
pqueue type pointer to the start position of the priority queue in memory on success. If
memory is insufficient, return a NULL value.
• int pqinsert (struct pqueue *, PQvertex): insert an element into the priority queue.
Return 1 if the insertion succeeds. Return 0 if the insertion fails.
• PQvertex pqremove (struct pqueue *): remove the highest-ranking element from the
priority queue. Return a pointer to the memory location of that element. Return a
NULL value if the removal fails.
• int print_PQ (struct pqueue *q): print out the content of the priority queue. Return 1
if it successfully prints the priority queue. Return 0 if the priority queue is empty.
• void free_PQ (void): free the memory occupied by the priority queue.
• void init_RT (int neti): initialize a routing tree (RT) for each net.
• void add_v_RT (int vindex, int neti): add a vertex with index "vindex" into the
corresponding routing tree RT.
• int is_in_RT (int jv, int neti): check if a sink is already in the routing tree. Return 1 if this
sink is already contained in the routing tree. Return 0 if this sink hasn't been added to the
routing tree.
• void route (int neti): the primary subroutine used to route a single net.
• void print_RT (int neti): output the vertices currently contained in the routing tree. This
subroutine is for debugging purposes.
• void update_p (int vindex, int trend): update a vertex's present congestion cost and
total cost.
• void init_PQ_to_RT (int neti, int * reached): initialize the priority queue with the
vertices currently residing in the routing tree.
• void enqueue_PQ (int jn, PQvertex pParent): add the fanout vertices of vertex m
(which has been removed from PQ) to the priority queue and calculate the path cost.
• int is_sink (int jsink, int neti): determine whether this is a sink of net i. Return 1 if it's a sink.
Return 0 if it's not a sink.
• void build_rrg (void): build the routing resource graph and dump it out into a RRG file.
• void add_***_pads (int i, int j): add the pad at position (i, j) into the routing graph.
• void add_cab_pin (int i, int j): add the pins of the CAB (i, j) into the routing graph.
• void add_tracks (int i): connect tracks in channel x and channel y if there's a switch
box.
• void creat_edge_list (int present, int neighbor): add neighbors of the current vertex into
the linked list and count the total number of edges.
• void free_rrg (void): free the memory occupied by the routing resource graph.
• int get_vertex_index (int i, int j, enum t_rr_type rr_type, int ppt_num): calculate the
vertex index at the specified position. Return this vertex's index number.
• void init_cost (void): initialize the cost functions for all vertices.
• void free_cost_mem (void): free the memory occupied by the cost functions.
• void update_h (int vindex): update the history congestion function.
• void output_RT (int neti): print out the vertices in the routing tree to the screen for
debugging purposes.
• int overuse (void): check if overuse exists. Return 1 if overuse exists. Return 0 if there is no
overuse.
• double dump_RT (char *fname): dump out the finished routing into a file. Return
the track usage rate for this routing.
• void free_RT (void): free the memory occupied by the routing trees.
• void * my_malloc (size_t size): allocate a block of memory. Exit the program if there is
insufficient memory.
• int odd_or_even (int dividend, int divisor): determine the parity of an integer. Return 1 if the
integer is an odd number. Return 0 if the integer is an even number.
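The cost-related routines listed above (update_p, update_h, overuse) match the roles found in negotiated congestion routing. The sketch below shows the commonly published form of that cost, in which a vertex costs (b + h) * p, with a present congestion term p and an accumulated history term h. It is an assumption for illustration, not the router's actual cost functions, which are modified from that scheme and are not reproduced in this appendix. All names and factors are hypothetical.

```c
#include <assert.h>

/* Per-vertex congestion bookkeeping (hypothetical layout). */
typedef struct {
    int occupancy;      /* nets currently using this vertex */
    int capacity;       /* nets that may legally use this vertex */
    double base_cost;   /* b: intrinsic cost of the vertex */
    double hist_cost;   /* h: accumulated history congestion */
    double pres_cost;   /* p: present congestion multiplier */
} vertex_cost;

/* Present congestion: penalize a vertex that one more net would overuse. */
void sketch_update_p(vertex_cost *v, double pres_fac)
{
    int over = v->occupancy + 1 - v->capacity;
    v->pres_cost = 1.0 + (over > 0 ? over * pres_fac : 0.0);
}

/* History congestion: grows after each iteration in which the vertex is overused. */
void sketch_update_h(vertex_cost *v, double hist_fac)
{
    int over = v->occupancy - v->capacity;
    if (over > 0)
        v->hist_cost += over * hist_fac;
}

/* Cost of using the vertex: (b + h) * p. */
double sketch_vertex_cost(const vertex_cost *v)
{
    return (v->base_cost + v->hist_cost) * v->pres_cost;
}

/* Return 1 if any of the n vertices is used beyond its capacity, else 0. */
int sketch_overuse(const vertex_cost *v, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        if (v[i].occupancy > v[i].capacity)
            return 1;
    return 0;
}
```

In this scheme nets are ripped up and rerouted until the overuse check reports no conflict. The history term makes persistently contested vertices steadily more expensive, so competing nets are pushed toward alternative tracks.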
III. Instructions
The router program was developed on the Windows platform with Microsoft Visual
C/C++ 6.0. The executable can be built as follows: start Microsoft Visual Studio, create a new
workspace and a new project, add all the source files to this project and then build. Or
simply copy all the files in items 1-3 of the file list below into a directory and click the
build button in Microsoft Visual Studio.
Usage: name_of_executable input_netlist_file output_file
where input_netlist_file is the netlist to be routed and output_file is the user specified output
file that stores the finished routing result. For example, on a command line, type:
 pathfinder test.net test.r
• File List
1. Source Files: main.c, netlist.c, pqueue.c, route.c, rrg.c, utils.c
2. Header Files: netlist.h, pqueue.h, route.h, rrg_funcs.h, rrg_types.h,
utils.h
3. Workspace File & Project Files: pathfinder.dsw, pathfinder.dsp
4. Sample input netlist file: test.net
5. Executable: pathfinder.exe
6. Generated routing resource graph file: rr_graph.out
7. Sample output file: test.r
BIBLIOGRAPHY
[1] http://focus.ti.com/docs/pr/pressrelease.jhtml?prelId=sc04074
[2] http://www.st.com/stonline/press/news/year2004/t1573h.htm
[3] http://www.eetimes.com
[4] Databeans Inc. (http://www.databeans.net), "2005 Analog Markets Worldwide".
[5] Microelectronics Design Center, Simplified Digital Design Flow, Swiss Federal
Institute of Technology.
[6] Stephen M. Trimberger et al., Field Programmable Gate Array Technology,
Kluwer Academic Publishers, 1994
[7] http://www.anadigm.com
[8] M. Sivilotti, "A Dynamically Configurable Architecture for Prototyping Analog
Circuits", MIT VLSI Conference, pp. 237-258, 1988.
[9] E. Lee and G. Gulak, "A CMOS Field-programmable Analog Array", ISSCC
Digest of Technical Papers, Feb., 1991, pp. 186-188
[10] K. Austin, "Integrated Circuit for Analog System", U.S. Patent 5,196,740,
Pilkington Micro-Electronics, March 23, 1993
[11] N. Sako, "Integrated Circuit and Gate Array", U.S. Patent 5,298,806,
Kawasaki Steel Corp., March 29, 1994
[12] Bogdan Pankiewicz, Marek Wojcikowski et al., "A Field Programmable Analog
Array for CMOS Continuous-Time OTA-C Filter Applications", IEEE
Journal of Solid-State Circuits, Vol. 37, No. 2, February 2002
[13] Precision Monolithics Inc., "Analog Signal Processing Subsystem", GAP-01
Data Sheet 1982
[14] F. Goodenough, "Analog Counterparts of FPGAs Ease System Design",
Electronic Design, October 14, 1994
[15] http://www.zetex.com/3.0/a5-6.asp
[16] http://www.anadigm.com/
[17] Sree Ganesan and Ranga Vemuri, "A Methodology for Rapid Prototyping of
Analog Systems", 12th Intl. Conf. VLSI Design, pp.556-563, 1999
[18] Sree Ganesan and Ranga Vemuri, "FAAR: A Router for Field-Programmable
Analog Arrays", 12th Intl. Conf. VLSI Design, pp.556-563, 1999
[19] Behzad Razavi, "CMOS Technology Characterization for Analog and RF
Design", IEEE Journal of Solid-State Circuits, Vol. 34, No.3, March 1999
[20] TSMC 0.18 um Process Design Kit, http://www.mosis.org
[21] J. Baker et al., CMOS Circuit Design, Layout, and Simulation, IEEE Press,
1997
[22] S. Trimberger, Field-Programmable Gate Array Technology, Kluwer Academic
Publishers, 1994
[23] http://www.actel.com
[24] M. John and S. Smith, Application-Specific Integrated Circuits, VLSI Systems
Series, 1997
[25] R. T. Smith, J. D. Chlipala, "Laser Programmable Redundancy and Yield
Improvement in a 64K DRAM," IEEE J. Solid-State Circuits, vol. SC-16, pp.
506-514, Oct. 1981.
[26] S. S. Cohen and G. H. Chapman, "Laser Beam Processing and Wafer-Scale
Integration," Beam Processing Technologies, Academic Press, 1989.
[27] J. B. Bernstein, Y. Hua, and W. Zhang, "Laser energy limitation for buried
metal cuts," IEEE Elect. Dev. Let., vol. 19, no. 1, pp. 4-6, 1998.
[28] J. B. Bernstein, T. M. Ventura, and T. Radomski, "High Density Laser Linking
of Metal Interconnect," IEEE Trans. on Comp., Pack., and Manuf. Tech.,
Vol. 17, pp. 590-593, Dec. 1994.
[29] J. B. Bernstein, B. D. Colella, "Laser-Formed Metallic Connections Employing
a Lateral Link Structure," IEEE Trans. on Comp., Pack. and Manuf. Tech.,
Part A, Vol. 18, pp. 690-692, Sep. 1995.
[30] Y. L. Shen, S. Suresh, and J. B. Bernstein, "Laser Linking of Metal
Interconnect: Analysis and Design Considerations," IEEE Trans. on Elect. Dev.,
Vol. 43, pp. 402-410, Mar. 1996.
[31] J. B. Bernstein, W. Zhang and C. H. Nicholas, "Laser Formed Metallic
Connections," IEEE Trans. Comp. Pack. and Manuf. Tech., Part B: Advanced
Packaging, Vol. 21, No. 2, pp. 194, May 1998.
[32] J. Lee, Analysis of Laser Processing of Metal Wires used in Microelectron-
ics Applications, Doctoral dissertation, University of Maryland, College Park,
2001.
[33] W. Zhang, J. Lee and J. Bernstein, "Energy Effect of Laser-induced Vertical
Metallic Link", IEEE Trans. Semiconduct. Manufact., 2001.
[34] W. Zhang, Laser-induced Vertical Metallic Link and Implementations in VLSI,
Doctoral dissertation, University of Maryland, College Park, 2000.
[35] http://www.enre.umd.edu/JB.
[36] K. Chung, J. Luo, H. Huang, J. Tuchman and J. Bernstein, "Experimental
Verification of the Optimal Laser-Induced Advanced-Lateral MakeLink
Structures", submitted to IEEE Transactions on Semiconductor Manufacturing,
2005.
[37] J. Luo, M. Peckerar, J. Bernstein, "A Novel Method to Reduce Op
Amp/Comparator Offset", Patent Disclosure, University of Maryland, 2004.
[38] Sree Ganesan and Ranga Vemuri, "FAAR: A Router for Field-Programmable
Analog Arrays", 12th Intl. Conf. VLSI Design, pp. 556-563, 1999.
[39] Naveed A. Sherwani, Algorithms for VLSI Physical Design Automation,
Kluwer Academic Publishers, 1999.
[40] V. Betz, J. Rose and A. Marquardt, Architecture and CAD for Deep-
Submicron FPGAs, Kluwer Academic Publishers, 1999.
[41] T. Cormen, C. Leiserson et al., Introduction to Algorithms, McGraw-Hill, 2001.
[42] J. M. Ho, G. Vijayan, and C. K. Wong, "New algorithms for the rectilinear
Steiner tree problem", IEEE Trans. Computer-Aided Design, vol. 9, no. 2, 1990.
[43] Y. Sun, T. Wang et al., "Routing for Symmetric FPGA's and FPIC's", IEEE
Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol. 16, No.
1, January 1997.
[44] G. Lemieux, S. Brown, D. Vranesic, "On Two-Step Routing for FPGAs", ACM
Symp. on Physical Design, 1997.
[45] S. Brown, J. Rose et al., "A Detailed Router for Field-Programmable Gate
Arrays", IEEE Trans. on Computer-Aided Design, Vol. 11, No. 5, May 1992.
[46] J. Rose, "Parallel Global Routing for Standard Cells", IEEE Trans. on CAD,
Oct. 1990.
[47] Y. Chang, S. Thakur et al., "A New Global Routing Algorithm for FPGAs",
ICCAD, 1994.
[48] C. Ebeling, L. McMurchie et al., "Placement and Routing Tools for the
Triptych FPGA", IEEE Trans. on VLSI, Dec. 1995.
[49] G. Borriello, C. Ebeling, "The Triptych FPGA Architecture", IEEE Trans.
on VLSI Systems, Vol. 3, No. 4, Dec., 1995.
[50] Y. Wu, M. Sadowska, "Routing for Array-Type FPGA's", IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, Vol. 16, No. 5,
May 1997.
[51] L. McMurchie, C. Ebeling, "Pathfinder: A Negotiation-based
Performance-Driven Router for FPGAs", Univ. of Washington, 1996.
[52] C. Lee, "An Algorithm for Path Connections and its Applications", IRE
Trans. Electron. Comp., Vol. EC-10, 1961.
[53] J. Swartz, V. Betz and J. Rose, "A Fast Routability-Driven Router for
FPGAs", ACM/SIGDA International Symposium on Field Programmable Gate
Arrays, Monterey, CA, 1998.
[54] V. Betz, J. Rose, "VPR: A New Packing, Placement and Routing Tool for
FPGA Research", 1997 International Workshop on Field Programmable Logic
and Applications.
[55] R. Kruse, C. Tondo, B. Leung, Data Structures and Program Design in C,
Prentice Hall, 1997.
[56] U. Choudhury and A. Sangiovanni-Vincentelli, "Constraint-Based channel
routing for analog and mixed analog/digital circuits", IEEE Trans.
Computer-Aided Design of Integrated Circuits and Systems, Vol. 12, No. 4, 1993.
[57] U. Choudhury and A. Sangiovanni-Vincentelli, "Use of Performance
Sensitivities in Routing of Analog Circuits", CH2868-8/90, IEEE 1995.
[58] W. Kao, Cy. Lo, M. Basel and R. Singh, "Parasitic Extraction: Current State
of the Art and Future Trends", Proceedings of the IEEE, vol. 89, No. 5, May
2001.
[59] J. H. Chern, J. Huang, L. Arledge, P. C. Li and P. Yang, "Multilevel Metal
Capacitance Models for CAD Design Synthesis Systems", IEEE Elec. Dev.
Letters, 13 (1): 32-34, Feb. 1992.
[60] R. T. Edwards, K. Strohbehn and S. E. Jaskulek, "A Field-Programmable
Mixed-Signal Array Architecture Using Antifuse Interconnects", ISCAS, pp.
319-322, 2000.
[61] H. Alzaher and M. Ismail, "A CMOS Fully Balanced Four-Terminal
Floating Nullor", IEEE Trans. on Circuits & Systems I: Fundamental Theory &
Applications, Vol. 49, April 2002.
[62] J. Luo, H. Huang, J.B. Bernstein, J. Ari Tuchman and M. Peckerar, "A
Configurable Analog Block Architecture for Field Programmable Analog Arrays",
UMCP Patent Disclosure, 2004.
[63] A. Bratt and I. Macbeth, "Design and Implementation of a FPAA",
ACM/SIGDA FPGA'96, Monterey, CA, Feb. 11-13, pp. 88-93, 1996.
[64] H. Kutuk and S. Kang, "A Field-Programmable Analog Array (FPAA) Using
Switched-Capacitor Techniques", IEEE ISCAS, pp. 41-44, 1996.
[65] D. Vallancourt and Y. P. Tsividis, "Timing-Controlled Switched Analog
Filters with Full Digital Programmability", IEEE ISCAS, pp. 329-333, 1987.
[66] Adaptive Logic, AL220 Analog Micro Controller preliminary data sheet,
http://www.adaptivelogic.com
[67] S. T. Chang, B. R. Hayes-Gill and C. J. Paull, "Multi-Function Block for a
Switched Current Field Programmable Analog Array", Midwest Symposium
on Circuits and Systems, Ames, Iowa, August 18-21, 1996.
[68] Sophocles J. Orfanidis, Introduction to Signal Processing, Prentice Hall, 1996.
[69] E. Lee and P. G. Gulak, "Field-Programmable Analogue Array based on
MOSFET Transconductors", Electronics Letters 28(1), pp. 28-29, January 2, 1992.
[70] S. Chang, B. Hayes-Gill, and C. Paull, "Implementation of a Multi-Function
Signal Detection Block for a Field-Programmable Analogue Array", Fifth
Eurochip Workshop on VLSI Design Training, Oct. 17-19, pp. 226-231, Dresden,
Germany, 1994.
[71] E. Pierzchala, M. Perkowski, Paul Van Halen and Rolf Schaumann,
"Current-Mode Amplifier-Integrator for a Field-Programmable Analog Array",
ISSCC Digest of Technical Papers, pp. 196-197, Feb. 1995.
[72] C. Premont, R. Grisel, N. Abouchi and J. Chante, "Current-Conveyor Based
Field Programmable Analog Array", Midwest Symposium on Circuits and
Systems, Ames, Iowa, August 18-21, 1996.
[73] S. H. K. Embabi, X. Quan, N. Oki, A. Manjrekar and E. Sanchez-Sinencio, "A
Field Programmable Analog Signal Processing Array", Midwest Symposium
on Circuits and Systems, Ames, Iowa, August 18-21, 1996.
[74] J. Colinge, Silicon-on-Insulator Technology: Materials to VLSI, Kluwer Aca-
demic Publishers, 1991.
[75] P. Gray, P. Hurst, S. Lewis and R. Meyer, Analysis and Design of Analog
Integrated Circuits, John Wiley & Sons, Inc., 2000.
[76] H. Alzaher and M. Ismail, "A CMOS Fully Balanced Four-Terminal
Floating Nullor", IEEE Trans. on Circuits & Systems I: Fundamental Theory &
Applications, Vol. 49, April 2002.
[77] K. Nakamura, "An 85 mW, 10 b, 40 Msample/s CMOS Parallel-Pipelined
ADC", IEEE J. of Solid-State Circuits, Vol. 30, No. 3, pp. 629-633, March
1995.
[78] R. Whatley, "Fully Differential Operational Amplifier with DC
Common-Mode Feedback", U.S. Patent 4,573,020, Feb. 1986.
[79] C. Shih and P. Gray, "Reference Refreshing Cyclic Analog-to-Digital and
Digital-to-Analog Converters", IEEE J. of Solid-State Circuits, Vol. 21, No.
4, pp. 544-554, 1986
[80] M. Pelgrom, C. Duinmaijer and A. Welbers, "Matching Properties of MOS
Transistors", IEEE J. Solid-State Circuits, Vol. 24, No. 5, Oct. 1989.
[81] J. Baker, CMOS Circuit Design, Layout, and Simulation, 2nd Edition,
Wiley-IEEE, 2004
[82] Christian C. Enz, Gabor C. Temes, "Circuit Techniques for Reducing the
Effects of Op Amp Imperfections: Autozeroing, Correlated Double Sampling,
and Chopper Stabilization", Proceedings of the IEEE, Vol. 84, No. 11, pp.
1584-1614, November 1996.
[83] B. Razavi, Bruce Wooley, "Design Techniques for High-Speed, High-Resolution
Comparators", IEEE Journal of Solid-State Circuits, Vol. 27, No. 12,
December 1992.
[84] http://www.analog.com
[85] D. Hilbiber, "A New Semiconductor Voltage Standard", ISSCC Dig. of Tech.
Papers, pp. 32-33, February 1964
[86] LM113 Data Sheet, National Semiconductor Linear Data Book, 1972
[87] A.P. Brokaw, "A simple three-terminal IC bandgap reference", IEEE Journal
of Solid-State Circuits, vol. SC-9, pp. 388-393, Dec. 1974
[88] J. Baker, CMOS Circuit Design, Layout, and Simulation, 2nd Edition,
Wiley-IEEE, 2004
[89] M.A.T. Sanduleanu, A.J.M. van Tuijl, and R.F. Wassenaar, "Accurate low
power bandgap voltage reference in 0.5 um CMOS technology," IEE Electronics
Letters, vol. 34, pp. 1025-1026, 14th May 1998.
[90] V.G. Ceekala, L.D. Lewicki, J.B. Wieser, D. Varadarajan, and J. Mohan,
"A method for reducing the effects of random mismatch in CMOS bandgap
references," Proc. IEEE Intl. Solid-State Circuits Conf., vol. 2, pp. 318-319,
2002.
[91] Y. Tsividis, "Accurate analysis of temperature effects in IC-VBE
characteristics with application to bandgap reference sources", IEEE J. Solid
State Circuits, vol. 15, pp. 1076-1084, Dec. 1980.
[92] B. Song and P. Gray, "A Precision Curvature-Compensated CMOS Bandgap
Reference", IEEE J. Solid State Circuits, vol. SC-18, No. 6, 1983.
[93] I. Lee, G. Kim and W. Kim, "Exponential Curvature-Compensated BiCMOS
Bandgap References", IEEE J. Solid State Circuits, Vol. 29, No. 11, Nov.,
1994
[94] P. Malcovati, F. Maloberti et al., "Curvature-Compensated BiCMOS Bandgap
with 1-V Supply Voltage", IEEE J. Solid State Circuits, Vol. 35, No. 7, July
2001.
[95] M. Gunawan, G. Meijer, J. Fonderie, and H. Huijsing, "A curvature corrected
low-voltage bandgap reference", IEEE J. Solid-State Circuits, vol. 28, pp.
667-670, June 1993.
[96] R. Schaumann, Mac E. Van Valkenburg, Design of Analog Filters, Oxford
University Press, 2001.
[97] R. Baker, CMOS Mixed-signal Circuit Design, IEEE Press, 2003.
[98] , "A Practical Method of Designing RC Active Filters", IRE Trans. Circuits
Theory, Vol. CT-2, No. 3, pp. 74-85, 1955.
[99] H. Alzaher and M. Ismail, "A CMOS Fully Balanced Four-Terminal Floating
Nullor", IEEE Trans. on Circuits and Systems II: Fundamental Theory
and Applications, Vol. 49, No. 4, April 2002.
[100] R. Plassche, CMOS Integrated Analog-to-Digital and Digital-to-Analog Con-
verters, 2nd Edition, Kluwer Academic Publishers, 2003.
[101] B. Razavi, Principles of Data Conversion System Design, IEEE Press, 1995.
[102] M. Choi and A. Abidi, "A 6-b 1.3-Gsample/s A/D Converter in 0.35um
CMOS", IEEE Journal of Solid-State Circuits, Vol. 36, No. 12,
December 2001.
[103] P. Scholtens and M. Vertregt, "A 6-b 1.6-Gsample/s Flash ADC in 0.18um
CMOS Using Averaging Termination", IEEE Journal of Solid-State
Circuits, Vol. 37, No. 12, December 2002.
[104] M. Choe, B. Song, K. Bacrania, "A 13-b 40-MSample/s CMOS Pipelined
Folding ADC with Background Offset Trimming", IEEE J. Solid-State Circuits,
vol. 35, pp. 1781-1790, Dec. 2000.
[105] P. Vorenkamp, R. Roovers, "A 12-b, 60-MSample/s Cascaded Folding and
Interpolating ADC", IEEE J. Solid-State Circuits, vol. 32, pp. 1876-1886,
Dec. 1997.
[106] J. Shieh, M. Patil and B. Sheu, "Measurement and Analysis of Charge
Injection in MOS Analog Switches", IEEE J. Solid-State Circuits, Vol. 22, No. 2,
April 1987.
[107] G. Wegmann, E. Vittoz and F. Rahali, "Charge Injection in Analog MOS
Switches", IEEE J. Solid-State Circuits, Vol. 22, No. 6, December 1987.
[108] P. Li, M. Chin, P. Gray and R. Castello, "A Ratio-Independent Algorithmic
Analog-to-Digital Conversion Technique", IEEE J. Solid-State Circuits, Vol.
SC-19, No. 6, December 1984.
[109] Design of the 4-bit DAC and Encoder from H. Huang and K. Laurentz, ASDL
Lab, Department of Electrical and Computer Engineering, University of Mary-
land, College Park