ABSTRACT

Title of Dissertation: NONLINEAR COMPLEXITY OF BOOLEAN PERMUTATIONS
Thomas Gordon Draper, Doctor of Philosophy, 2009
Dissertation directed by: Dr. Lawrence C. Washington, Department of Mathematics

We introduce the concept of nonlinear complexity, where the complexity of a function is determined by the number of nonlinear building blocks required for construction. We group functions by linear equivalence, and induce a complexity hierarchy for the affine equivalent double cosets. We prove multiple invariants of double cosets over the affine general linear group, and develop a specialized double coset equivalence test. This is used to classify the 16! permutations over 4 bits into 302 equivalence classes, which have a maximal nonlinear depth of 6. In addition, we present a new complexity class defined in terms of nonlinearity.

NONLINEAR COMPLEXITY OF BOOLEAN PERMUTATIONS

by Thomas Gordon Draper

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2009

Advisory Committee:
Dr. Lawrence C. Washington, Chairman/Advisor
Dr. Jeffery Adams
Dr. John J. Benedetto
Dr. Kartik Prasanna
Dr. William Gasarch, Dean's Representative

© Copyright by Thomas Gordon Draper 2009

DEDICATION

To Tyra, Thomas, Lillian, Elizabeth and most of all Susan, who made it all possible.

ACKNOWLEDGEMENTS

Thoughtful consideration revealed many who contributed to my dissertation, in ways large and small. My parents head the list. From my first memories, they saw my affinity for mathematics and nurtured it. They challenged, inspired, balanced and loved me. I realize more and more how important this was. My professors and teachers at BYU challenged me further and helped me set a higher standard for myself, both in mathematics and in life. Their greatest contribution to me was helping me to center on God and then to build my life around Him.
Interestingly, I realize that much of this communication was made possible because of our common love of mathematics and science. My job has been one of the greatest experiences of my life. I would never have imagined the satisfaction received from puzzle solving with impact. Again and again I have found myself in the right place at the right time for the right opportunity. I consider myself very blessed and wish everyone could do what they love for a living. In addition, my bosses and coworkers have been extremely supportive of my dissertation work, even when it increased their own burden. I am very grateful for this and I know this is not the norm for the workplace. I wish to personally thank Dr. Washington and Dr. Gasarch. They were both extremely helpful in directing and focusing my research efforts. I admire them both as researchers and teachers. Finally, I wish to thank my children and especially my wife. I know we sacrificed much of our time together and I hope it will be worth it. We have a long and exciting road ahead to share together. I hope Tyra, Thomas, Lillian and Elizabeth will each find and enjoy their talents. I am excited to see their lives unfold. I conclude with an overwhelming appreciation for the love of my life, Susan. She encouraged me when I was tired. She strengthened me when I was weak. She burdened herself to make things easier for me. She loved me. I will always be in her debt. I adore her more than anyone else on this earth. I am looking forward to our many adventures yet to come.

TABLE OF CONTENTS

List of Tables
List of Figures

1 Introduction
1.1 The Difficulty with Difficulty
1.2 Combinatorics and Complexity
1.3 History
1.4 Summary of Primary Results
1.5 Classification Strategy
1.5.1 What Functions are Equivalent?
1.5.2 How Do We Induce a Complexity Hierarchy?
1.5.3 What Can We Compute?

2 Complexity Measures
2.1 Summary
2.2 A Different View of Complexity
2.3 Transposition Complexity Measure
2.4 Gate Complexity Measure
2.5 Reversible Gate Complexity Measure
2.6 Nonlinear Reversible Complexity Measure

3 Reversible Circuits
3.1 Summary
3.2 Classical Circuit Diagrams
3.3 Basic Reversible Gates
3.3.1 NOT Gate
3.3.2 Controlled NOT Gate (CNOT)
3.3.3 SWAP
3.3.4 Toffoli Gate
3.3.5 Example Permutation
3.3.6 Fredkin Gate
3.4 Linear transformations on reversible circuits
3.5 Controlled Linear Transformations
3.6 Affine Transformations
3.7 Scratch Space
3.7.1 3-bit AND
3.8 Conclusion

4 Which Permutations Should Be Equivalent?
4.1 Results and Application
4.2 Background
4.3 The Equivalence Choice
4.4 Affine Equivalence of Toffoli and Fredkin Gates
4.5 Controlled Affine and Linear Permutations

5 Counting Equivalent Functions
5.1 Summary and Results
5.2 Background
5.3 Counting Fixed Points Under a Double Coset Action
5.4 Counting Affine Equivalent Double Cosets in S_{2^n}
5.4.1 Example: Affine equivalent 3-bit permutations
5.4.2 Counting Affine Equivalent n-bit Permutations
5.5 Counting Affine Equivalent Double Cosets in A_{2^n}
5.6 Open problems

6 Hamiltonian Cycles over GL_n(2)
6.1 Results and Application
6.2 Background
6.3 Existence
6.4 Borel Subgroup
6.5 Heuristic Algorithm
6.6 Open Problems

7 Double Coset Representatives
7.1 Results and Application
7.2 Background
7.3 Basis Fixing Permutations
7.4 Basis Permuting Permutations
7.5 Minimal Basis Fixing Permutations
7.6 Open Problems

8 Multiple Rank Invariant
8.1 Results and Application
8.2 Background
8.3 The k-rank
8.4 The Invariance of the k-rank
8.5 Multiple Rank Invariant
8.6 Categorization of 3-bit Permutations by MRI
8.7 Categorization of 4-bit Permutations by MRI
8.8 Open Problems

9 Additional Invariants
9.1 Results and Application
9.2 Parity
9.3 Two-Way Permutations
9.4 Open Problems

10 Complexity Theory
10.1 Results
10.2 Background
10.3 Nonlinear Complexity Class
10.4 Nonlinear Polynomial Complexity
10.5 Uniform and Non-Uniform Nonlinear Circuits
10.6 Open Problems

11 Computation
11.1 Storage of Permutations
11.1.1 Truth Table Form
11.1.2 Function Table Form
11.1.3 Polynomial Form
11.2 Quick double coset test
11.2.1 Composition via a Hamiltonian Path
11.2.2 Linearity Test
11.3 Conclusion
12 Classification of 3-bit and 4-bit permutations
12.1 Summary
12.2 Permutations of Interest
12.2.1 Modular Addition
12.2.2 Modular Multiplication
12.2.3 Modular Affine
12.2.4 Modular Inverse
12.2.5 Modular Exponentiation
12.2.6 Pseudo-Inverse over GF(2)^n
12.3 Complexity Class NLC(i, j)
12.4 3-bit Permutations
12.4.1 NLC(3, 0)
12.4.2 NLC(3, 1)
12.4.3 NLC(3, 2)
12.4.4 NLC(3, 3)
12.4.5 No 3-bit Permutations are One-Way
12.5 4-bit Permutations
12.6 Open Problems

13 Case Studies
13.1 Summary
13.2 The Hiltgen function
13.3 Zech Logarithm
13.4 Incrementing modulo 2^n
13.5 Multiplication by 2^(n-1) + 1 modulo 2^n
13.6 Addition and Subtraction modulo 2^n

14 Conclusion

A Source Code

LIST OF TABLES

3.1 Truth Table: NOT Gate
3.2 Wire Program: NOT Gate
3.3 Truth Table: Controlled NOT Gate
3.4 Wire Program: CNOT Gate
3.5 Wire Program: SWAP
3.6 Truth Table: SWAP
3.7 Wire Program: Toffoli Gate
3.8 Truth Table: Toffoli Gate
3.9 Wire Program: f = (a2 ⊕ a0a1, a1 ⊕ 1, a2 ⊕ a0a1 ⊕ a0)
3.10 Truth Table: f = (a2 ⊕ a0a1, a1 ⊕ 1, a2 ⊕ a0a1 ⊕ a0)
3.11 Truth Table: Fredkin Gate
4.1 Number of Permutation Equivalence Classes
5.1 Cycle Sets of AGL(3,2)
5.2 MAGMA function for counting double cosets
5.3 Number of Affine Equivalent Permutations
5.4 Number of Affine Equivalent Even Permutations
8.1 Multiple Rank Invariant of 3-bit Permutations
8.2 Multiple Rank Invariant of 4-bit Permutations
9.1 Invariant Table for 4-bit Permutations
10.1 Space Requirements
11.1 Truth Table: f = (a2 ⊕ a0a1, a1 ⊕ 1, a2 ⊕ a0a1 ⊕ a0)
11.2 Monomial Conversion Table
12.1 NLC(3, 0) Permutations
12.2 NLC(3, 1) Permutations
12.3 NLC(3, 2) Permutations
12.4 NLC(3, 3) Permutations
12.5 Invariant Table for each 4-bit Permutation Depth
12.6 Depth of Various 4-bit Permutations
LIST OF FIGURES

3.1 Full Adder from XORs, ANDs and ORs
3.2 Circuit Diagram: NOT Gate
3.3 Circuit Diagram: Controlled NOT Gate
3.4 Circuit Diagram: SWAP
3.5 Circuit Diagram: Toffoli Gate
3.6 Circuit Diagram: f = (a2 ⊕ a0a1, a1 ⊕ 1, a2 ⊕ a0a1 ⊕ a0)
3.7 Circuit Diagram: Fredkin Gate
3.8 Circuit Diagram: Linear Matrix Example
3.9 Block matrix product acting on kets (A B = BA)
3.10 Two equivalent block matrix actions
3.11 Fredkin gate as a controlled linear transformation
3.12 Toffoli gate as a controlled linear transformation
3.13 Equivalent Affine Transformations
3.14 3-bit AND Using 3 Wires?
3.15 3-bit AND: Second Attempt
3.16 3-bit AND: Third Attempt
3.17 3-bit AND: Third Attempt (Revisited)
3.18 3-bit AND: Fourth Attempt
3.19 3-bit AND: Fifth Attempt
3.20 3-bit AND: Final Attempt
4.1 Equivalence of Fredkin and Toffoli gates
4.2 Affine Equivalent Controlled Transformations
6.1 Extending the Active Cycle
12.1 3-bit Permutations
13.1 Zech Logarithm for x^3 + x + 1
13.2 Zech Logarithm for x^4 + x + 1
13.3 Incrementing modulo 8
13.4 Incrementing modulo 16

Chapter 1

Introduction

1.1 The Difficulty with Difficulty

In 1996, James L. Massey delivered the IACR Distinguished Lecture at EUROCRYPT '96, entitled "The Difficulty with Difficulty" [Mas96]. This talk, which served as an inspirational starting point for my own research, addressed why measuring the complexity of Boolean functions is so difficult. This thesis presents a new method for measuring complexity and develops tools that allow us to examine a previously unknown structure: the minimal complexity of all 16! = 20,922,789,888,000 4-bit Boolean permutations. In particular, we introduce the notion of nonlinear complexity. We will prove that all permutations may be constructed using alternating rounds of linear and simple nonlinear permutations. In other words, every n-bit permutation p is a composition of the form

p = L_0 N_1 L_1 N_2 L_2 ... N_n L_n,

where each L_i is an n-bit linear function, and each N_i is a simple nonlinear permutation selected from a small set. The nonlinear complexity of a permutation will be the minimum number of rounds of nonlinearity needed to build the permutation. To find the minimal nonlinearity of all 4-bit Boolean permutations, faster double coset identification methods were needed; in particular, special methods for working with AGL_n(2) double cosets, since general double coset methods would take months to compute the results given in this thesis. Special Hamiltonian cycles accelerate the double coset equivalence test. Various invariants are used to accelerate double coset identification. The mathematical theory behind each of these advances is developed in this thesis. Using these new tools, we examine the nonlinear complexity of many common 3-bit and 4-bit permutations.

1.2 Combinatorics and Complexity

Let V = GF(2), the finite field over two elements.
Given a Boolean function f : V^n → V, how difficult is f to compute? Typically, the difficulty of computing f is associated with how many "building blocks" are needed to construct f. One common set is the AND, OR and NOT gates. AND, OR and NOT form a universal set, meaning any Boolean function can be constructed from a finite number of them. Such constructions are not unique, but there will be a minimal number of gates needed to construct a function (there may also be multiple minimal solutions). Minimal solutions are found by exhaustively constructing all possible functions and marking the minimal instances of a function where they first occur. This method is essentially a combinatorial search. Unfortunately, the number of functions f : V^n → V is 2^(2^n). Due to this doubly exponential growth, classifying the 2^(2^5) = 4,294,967,296 5-bit to 1-bit functions is a challenge for current desktop computers. If we could classify a billion billion functions (10^18) per second, then classifying all 2^(2^7) = 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 7-bit to 1-bit functions would take about 10^13 years, which is a thousand times longer than the current age of the universe (1.3 × 10^10 years). In response to this massive explosion of complexity, researchers chose to focus on certain equivalence classes of functions.

1.3 History

In 1951, researchers at the "Computational Laboratory of Harvard" exhaustively classified the 65,536 four-variable Boolean functions into 402 classes under permutation and complementation of the input variables [AtSotCL51]. Afterward, mathematicians realized Pólya theory could easily enumerate the number of classes. This sparked a large amount of theoretical interest in counting the number of Boolean functions in various manners [Lor64]. Studying methods for finding optimal or minimal representations for these functions was largely ignored. This is likely due to the fact that look-up tables work very efficiently for small functions.
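The double-exponential counts above can be checked with a few lines of integer arithmetic. A sketch only: the rate of 10^18 classifications per second is the hypothetical figure from the text, and `num_boolean_functions` is an illustrative helper name.

```python
import math

def num_boolean_functions(n):
    # Number of functions f : V^n -> V over V = GF(2).
    return 2 ** (2 ** n)

assert num_boolean_functions(5) == 4_294_967_296
assert num_boolean_functions(7) == 2 ** 128
assert math.factorial(16) == 20_922_789_888_000  # the 16! four-bit permutations

# At a hypothetical 10^18 classifications per second:
years = num_boolean_functions(7) / 10**18 / (3600 * 24 * 365)
assert 1e13 < years < 2e13  # about 10^13 years, as claimed
```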
Quantum computing has forced mathematicians to reconsider the foundations of computer science with the added constraint of reversibility. In the reversible context, it is more natural to study Boolean permutations (n-bit to n-bit invertible functions) as opposed to the more commonly studied Boolean functions (n-bit to 1-bit functions). Many foundational ideas that were thought to be settled have been resurrected. Even lowly addition has once again become a hot topic. In the search for efficient programs on a quantum computer, the exact calculation of small permutations is important. Look-up tables are problematic on a quantum computer since computation space is at a premium. Due to the difficulty of quantum computation, we are willing to spend a great deal of classical computational power to find an efficient quantum program. This paper presents a new method for categorizing the complexity of a function in terms of the number of rounds of nonlinearity needed to achieve the function. Functions which differ from each other by a linear transformation will be considered to have the same nonlinear complexity. A small set of simple nonlinear permutations will be used to connect the linearly related equivalence classes. By counting the minimal number of nonlinear transitions needed to create a function, we can assign a partial ordering to the equivalence classes.

1.4 Summary of Primary Results

A new theory of complexity is introduced, based on nonlinearity. This new notion of complexity proves useful in comparing the exact complexity of very small functions, but it also defines new complexity classes that fit nicely into the current complexity framework of computer science. We show how Boolean permutations can be separated into double coset equivalence classes based on affine transformations. We can then link the double cosets via a special class of nonlinear functions. The nonlinear connections induce a complexity hierarchy among Boolean permutations.
Previous methods for finding the complexity of Boolean functions focused on counting the minimal number of gates needed to realize a function from a given universal set. The doubly exponential growth of Boolean functions has made research beyond 3-bit Boolean permutations and 5-to-1-bit Boolean functions impractical. The central contribution of this paper is to count rounds of nonlinearity instead of counting gates. By relaxing the restriction from the fine granularity of gate count to a coarser measure, we can view the complexity structure present in permutations previously too difficult to analyze. Using the results of this paper, we will revisit the complexity of 3-bit permutations in a new light and present a previously unknown complexity structure for 4-bit permutations.

1.5 Classification Strategy

1.5.1 What Functions are Equivalent?

Linear functions play an integral role in virtually every field of mathematics. Why? Because we can solve, manipulate, invert, combine and expand them in an understandable way that does not carry over very neatly into the realm of nonlinear functions. We do not use linear functions because they are more common than nonlinear functions. We use linear functions because they represent the majority of problems that we know how to solve. A linear function of a bit vector is given by a matrix in which every entry is 0 or 1, with the resulting product reduced modulo 2. This is the same as the ring of matrices over the two-element finite field GF(2). Of special interest are the linear functions that are invertible. These linear functions form the group GL_n(2) and will be referred to as linear permutations, since they permute the 2^n n-bit vectors. Considering n-bit permutations in this light, linear permutations might be considered the permutations with minimal complexity. They form a subgroup in which it is easy to multiply and invert. Their matrix form (n^2 bits) is much more compact than their associated permutation description (n · 2^n bits).
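The claim that an invertible matrix over GF(2) permutes the 2^n bit vectors can be checked directly. A minimal sketch: the matrix `L` below is an arbitrary invertible example (upper triangular with 1s on the diagonal), not one taken from the text.

```python
from itertools import product

n = 3
# An invertible 0/1 matrix over GF(2); upper triangular with a unit
# diagonal, hence invertible.
L = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]

def apply(M, v):
    # Matrix-vector product with arithmetic reduced modulo 2.
    return tuple(sum(M[i][j] * v[j] for j in range(n)) % 2 for i in range(n))

vectors = list(product([0, 1], repeat=n))
images = [apply(L, v) for v in vectors]
# L permutes the 2^n bit vectors: the images are exactly the vectors,
# each occurring once.
assert sorted(images) == sorted(vectors)
```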
In fact, we will want to slightly enlarge our definition of minimal complexity permutations to include all n-bit affine permutations, since affine permutations also share the same nice properties. After deciding that affine permutations have minimal complexity, it is natural to associate two nonlinear functions if they are equivalent under affine transformations. This equivalence divides S_{2^n} into disjoint double cosets, where AGL_n(2) acts on the inputs and outputs of a permutation. It is interesting to note that the current best known bound between the complexity of a function and its inverse is based on a sparse linear transformation with a dense inverse [Hil93].

1.5.2 How Do We Induce a Complexity Hierarchy?

We obviously need a nonlinear permutation to add to our affine permutations, or we will never be able to realize all permutations in S_{2^n}. There are two reasonable choices here, and both will be considered. First, the Toffoli and Fredkin gates are two different universal gates for reversible computation. We will prove that these gates are affine equivalent, and thus they both induce the same complexity structure. This choice will be referred to as the Toffoli basis, B_T. Second, controlled affine permutations are also universal for reversible computation. Although this class of functions is not as simple as a single Toffoli gate, controlled affine functions have a complexity roughly equivalent to the linear transformations between each nonlinear round. They are also easy to invert and cover a broader range of permutations faster. These gates will be referred to as the controlled affine basis, B_CA. Once we have chosen which nonlinear basis we wish to use, we can induce the related complexity tree. We start with affine permutations as our level zero complexity (zero rounds of nonlinearity). Then all equivalence classes that can be reached using one gate from our nonlinear basis become level one complexity (one round of nonlinearity).
Then the new equivalence classes that can be reached from level one become level two complexity, and so on. We can then create a graph illustrating how different permutations relate, and the number of rounds of nonlinearity needed to achieve a particular function gives a rough complexity measure.

1.5.3 What Can We Compute?

Even though we have reduced the combinatorial complexity problem to a much more algebraic problem, we still face formidable computational problems. The equivalence classes themselves are double cosets, and there are currently no truly satisfactory algorithms for double coset enumeration, equivalence or canonical representation [Hol05]. Much of the paper develops the theory behind special methods for solving affine double coset problems specifically over S_{2^n}. The nonlinear complexity depth is computed for all 3- and 4-bit permutations.

Chapter 2

Complexity Measures

2.1 Summary

This chapter recalls three different, but known, complexity measures. These measures serve as a useful benchmark for assessing the advantages of the nonlinear complexity measure. The complexity measures introduced are:

- Transposition Complexity Measure
- Gate Complexity Measure
- Reversible Gate Complexity Measure

2.2 A Different View of Complexity

In computer science, studies of complexity usually focus on an infinite class of functions. For example, the complexity class P contains decision problems that can be computed in polynomial time, and NP contains decision problems whose answers can be verified in polynomial time. The open question of whether or not P = NP is the most important question in computer science. However, we may be interested in a particular function and not an infinite class. In designing a 64-bit adder, there will be optimal constructions given a set of building blocks (e.g. logic gates, transistors, etc.). Knowing which complexity class addition is in may provide a reasonable starting design, but finding an optimal design is a much harder problem.
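As a concrete illustration of building a specific function from a fixed set of building blocks, here is a minimal sketch (not taken from the dissertation) of a full adder built from XOR, AND and OR, composed into a ripple-carry adder. Optimal wide adders use cleverer structures (e.g. carry lookahead); this is only the naive composition.

```python
def full_adder(a, b, cin):
    # Sum and carry of two bits plus an incoming carry, from XOR/AND/OR.
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_add(x, y, width):
    # Add two `width`-bit integers bit by bit, least significant bit first.
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total  # sum modulo 2^width

assert ripple_add(25, 17, 8) == 42
```

The gate complexity of this design is simply the gate count summed over the `width` full adders; finding the minimal count is the hard problem the text describes.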
Let us now consider a number of measures for assessing the optimal construction of a function.

2.3 Transposition Complexity Measure

One simple complexity measure for which a complete theory can be easily developed is the transposition complexity measure. Recall that a permutation p on N = {1, 2, ..., n} is a bijective map from N to itself. One common representation for permutations is:

( 1    2    ...  n    )
( p(1) p(2) ... p(n)  )

NOTE: Due to the computational nature of the permutations studied in this paper, it will be more natural to consider "zero-based" permutations instead of the more traditional "one-based" permutations used in most algebra texts. Thus, permutations on n elements will be presented as:

( 0    1    ...  n-1    )
( p(0) p(1) ... p(n-1)  )

Example 2.3.1. Consider the function p(x) = 3x + 3 mod 8. Evaluating the function for each x in {0, 1, 2, 3, 4, 5, 6, 7} yields:

( 0 1 2 3 4 5 6 7 )
( 3 6 1 4 7 2 5 0 )

Lemma 2.3.2. Every permutation can be written as a composition of disjoint cycles.

Proof. See [Jac74, 49]

Example 2.3.3. Decomposing the permutation 3x + 3 mod 8 into cycles yields: (0347)(1652)

A transposition is a permutation that swaps two elements and leaves all of the other elements fixed. Essentially, it is the simplest non-trivial permutation. Using the key fact that any cycle can be decomposed into a product of transpositions (e.g. (0347) = (47)(37)(07)), we arrive at another simple, yet powerful result.

Lemma 2.3.4. Every permutation can be written as a composition of transpositions. Furthermore, every m-cycle can be written as the composition of m − 1 transpositions.

Proof. See [Jac74, 49-50]

We can now define the transposition complexity, C_T, of a permutation to be the minimum number of transpositions whose composition realizes that permutation.

Lemma 2.3.5. If p is a permutation on {0, 1, ..., n − 1}, then C_T(p) = n − c, where c is the number of disjoint cycles in p.

Proof.
See [Mas96]

Let us now consider the pros and cons of the transposition complexity measure.

Pro: C_T is easy to compute given a permutation.

Pro: C_T defines a partial ordering on the complexity measure of permutations.

Con: Linear functions have high complexity. Linear and affine functions, which are generally considered easy to compute, can attain close to maximal complexity.

Con: A permutation and its inverse always have the same complexity. This leads to the surprising result that there are no one-way permutations for the transposition complexity measure. If we believe that there are one-way permutations, then this complexity measure is not measuring the right thing.

2.4 Gate Complexity Measure

This is probably the complexity measure that first comes to mind for most researchers. First, choose a universal set of gates such as AND, OR and NOT, or all possible 2-bit gates, or even just NAND gates. Then count the number of gates necessary to compute a function. Functions are created by connecting the gates in an acyclic network and then computing the output given a specific input. The gate complexity of a given function with respect to a given set of universal gates is the minimum number of gates needed to realize the function. Let us now consider how gate complexity stacks up against our criteria.

Pro: Simple functions have simple constructions.

Pro: Counting gates defines a partial ordering on the complexity measure of functions.

Con: Computing the measure seems to be very difficult. It appears that exhaustively enumerating all possible functions from acyclic networks is the only way to find the minimum function in general.

Con: Functions that are closely related can have very different looking gate structures. It is also difficult to know if functions are only one gate away from each other without exhaustively checking.

Current state of the art: Exhaustively searching the 2^(2^6) = 2^64 functions that map 6 bits to 1 bit or the (2^4)! ≈
2^44 permutations over 4 bits is prohibitive. In a search of the literature, the author has found no indication that these cases have been exhausted. However, in 1997, the coordinated efforts of DESCHALL exhaustively searched the 2^56 DES keys in an RSA Security challenge. Thus, either of these classifications could possibly be done by a large coordinated effort. To date, it seems that only the minimum number of gates needed for all 3-bit permutations and 5-bit to 1-bit functions have been computed.

2.5 Reversible Gate Complexity Measure

Since it appears that look-up tables cannot be used very efficiently on a quantum computer, there has been increased interest in finding the optimal gate construction of a function under the restrictions of reversibility. Unfortunately, the search for optimal gate implementations in the reversible context faces many of the same challenges that plague the gate complexity measure [MDM07].

Pro: Counting reversible gates defines a partial ordering on the complexity measure of permutations.

Pro: Has low complexity for simple functions. Since classical computation can be efficiently simulated by a reversible computation, simple functions will have simple reversible gate implementations.

Con: Exhaustively enumerating all possible functions by applying new gates again appears to be the only way to find the minimum function in general.

2.6 Nonlinear Reversible Complexity Measure

Each of the previous complexity measures has been studied in depth, and they will not be treated further in this paper. However, each of these complexity measures provides a useful measuring stick for evaluating new complexity measures. We now present the basics of nonlinear complexity and analyze its advantages. Since linear functions are so simple, yet powerful, it is reasonable to suggest that we consider two functions to be equivalent if a linear transformation on the input and output of one makes it equal to the other.
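This equivalence can be sketched concretely: if A and B are linear permutations, then f = B ∘ g ∘ A is equivalent to g, and both sides require the same number of nonlinear rounds. The particular maps A, B and the choice of the Toffoli gate for g below are illustrative, not taken from the text.

```python
from itertools import product

n = 3
vecs = list(product([0, 1], repeat=n))

def linear(M):
    # The linear permutation v -> Mv over GF(2) given by a 0/1 matrix M.
    return lambda v: tuple(sum(M[i][j] * v[j] for j in range(n)) % 2
                           for i in range(n))

def toffoli(v):
    # A simple nonlinear permutation: flip the target when both controls are 1.
    a, b, c = v
    return (a, b, c ^ (a & b))

A = linear([[1, 0, 0], [1, 1, 0], [0, 0, 1]])  # invertible (unit lower triangular)
B = linear([[1, 0, 1], [0, 1, 0], [0, 0, 1]])  # invertible (unit upper triangular)

# f = B . toffoli . A is linearly equivalent to the Toffoli gate, so under
# this measure both have the same nonlinear complexity (one round).
f = lambda v: B(toffoli(A(v)))
assert sorted(f(v) for v in vecs) == sorted(vecs)  # f is still a permutation
```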
Using this basic notion of equivalence, it is now natural to measure a function by the number of rounds of nonlinearity necessary to achieve it. Obviously, we must restrict the class of nonlinear connections; otherwise every function is achievable in at most one round of nonlinearity (using itself!). The class of nonlinear connections should have low complexity and have inverses that are easy to compute. Inspired again by quantum computing, the most natural nonlinear function is the Toffoli gate. This gate is universal for reversible computation.

Pro: Since the complexity measure graph can be constructed from the identity up, we get a partial ordering similar to the gate complexity measure, except that in this case we are counting the number of nonlinear functions needed.

Pro: Simple functions have low complexity. Linear functions have complexity zero, since they require no rounds of nonlinearity. Simple nonlinear functions have implementations requiring few NAND gates. These can be converted into reversible functions using few Toffoli gates. Since each Toffoli gate can be used as a round of nonlinearity, this sets an upper bound on the number of rounds of nonlinearity a function needs.

Pro: We are able to compute a complexity graph for 4-bit permutations. This is currently not feasible for the gate complexity measure or the reversible gate complexity measure.

Con: Like the combinatorial complexity measures, this approach also quickly runs out of steam due to the doubly exponential size of the space we are exploring.

Possibly the most exciting result of the nonlinear complexity measure is that we are able to use more powerful mathematics to analyze the problem of complexity.

Chapter 3

Reversible Circuits

3.1 Summary

This chapter introduces the basic gates of reversible computing and illustrates how these gates may be composed. Three different methods for visualizing reversible circuits are introduced: circuit diagram, truth table and wire program.
The majority of the content can be found in the open literature. It is presented here to provide the reader sufficient background to understand the material in the later chapters.

3.2 Classical Circuit Diagrams

Classical circuits may be represented as diagrams connecting the inputs and outputs by a combination of wires connected by universal gates. There are certain rules for building circuits, such as no closed loops, but these restrictions will not be discussed in this paper.

Example 3.2.1. One very common building block of circuit design is the full adder, which computes the sum and carry of two bits and an incoming carry bit. A common presentation is given in Figure 3.1.

Figure 3.1: Full Adder from XORs, ANDs and ORs

It is also useful to represent reversible circuits with diagrams. In [Ben73], Bennett discusses the necessary conditions for reversible computing. For reversible circuits, the conditions imply that

1. The number of wires is constant throughout the computation.
2. Every gate or action must be a permutation.
3. Wires may not be split or joined.

3.3 Basic Reversible Gates

The three most commonly used reversible gates are the NOT gate, the controlled NOT gate or CNOT, and the Toffoli gate, also referred to as a controlled-controlled-NOT gate, a doubly controlled NOT gate or simply CCNOT. As we discuss reversible functions it is useful to view them in multiple forms: the wire program, circuit diagram and truth table. We now present some basic gates in each of the three forms.

In the wire programs presented in this thesis, the binary operation ⊕= will be used repeatedly. This notation is very useful in reversible computing since it ensures that the end function is a permutation. Similar to a C program, where a += 2 means that the value of a is incremented by 2, ⟨k⟩ ⊕= f(a_0, a_1, ..., a_n) means that the value of wire ⟨k⟩ is updated by XORing it with the value of f.

3.3.1 NOT Gate

The NOT gate is the simplest and acts on only one bit.
The wire program for the NOT gate acting on the kth bit is simply ⟨k⟩ ⊕= 1. In Figure 3.2, the wire ⟨0⟩ is negated.

Figure 3.2: Circuit Diagram: NOT Gate (a_0 -> b_0 = a_0 ⊕ 1)

Table 3.1: Truth Table: NOT Gate
Input (a_0) | Output (b_0)
0 | 1
1 | 0

Given an n-wire circuit, the NOT gate applied to various wires will generate any bit translation (e.g. f(x) = x ⊕ b where b ∈ Z_2^n).

Table 3.2: Wire Program: NOT Gate
Instruction | ⟨0⟩
            | a_0
⟨0⟩ ⊕= 1    | a_0 ⊕ 1

3.3.2 Controlled NOT Gate (CNOT)

The controlled NOT gate or CNOT adds the value of one wire to another. If ⟨i⟩ is the control and ⟨j⟩ is the target, then the wire program for the CNOT gate is ⟨j⟩ ⊕= ⟨i⟩. In Figure 3.3, ⟨0⟩ is the control and ⟨1⟩ is the target.

Figure 3.3: Circuit Diagram: Controlled NOT Gate (b_0 = a_0, b_1 = a_1 ⊕ a_0)

Table 3.3: Truth Table: Controlled NOT Gate
a_0 a_1 | b_0 b_1
0 0 | 0 0
0 1 | 0 1
1 0 | 1 1
1 1 | 1 0

Note that the control and target can be any wires. The CNOT gates on n wires generate the linear group GL_n(2). Combining CNOT and NOT gates will generate the affine linear group AGL_n(2).

Table 3.4: Wire Program: CNOT Gate
Instruction | ⟨0⟩ | ⟨1⟩
            | a_0 | a_1
⟨1⟩ ⊕= ⟨0⟩  | a_0 | a_1 ⊕ a_0

3.3.3 SWAP

Lemma 3.3.1. The value of two lines can be swapped using 3 CNOTs. (Referred to as SWAP.)

Proof. If we wish to switch the values stored in ⟨i⟩ and ⟨j⟩, we can simply apply the circuit in Figure 3.4.

Figure 3.4: Circuit Diagram: SWAP (b_0 = a_1, b_1 = a_0)

We can trace the values on line ⟨0⟩ and line ⟨1⟩ through the following wire program:

Table 3.5: Wire Program: SWAP
Instruction | ⟨0⟩ | ⟨1⟩
            | a_0 | a_1
⟨1⟩ ⊕= ⟨0⟩  | a_0 | a_1 ⊕ a_0
⟨0⟩ ⊕= ⟨1⟩  | a_1 | a_1 ⊕ a_0
⟨1⟩ ⊕= ⟨0⟩  | a_1 | a_0

Table 3.6: Truth Table: SWAP
a_0 a_1 | b_0 b_1
0 0 | 0 0
0 1 | 1 0
1 0 | 0 1
1 1 | 1 1

This is a well known trick from computer science for switching the values of two variables without using a temporary variable.

3.3.4 Toffoli Gate

The doubly controlled NOT gate, CCNOT or Toffoli gate adds the product of two wires to another. If ⟨i⟩ and ⟨j⟩ are the controls and ⟨k⟩ is the target, then the wire program for the CCNOT gate is ⟨k⟩ ⊕= ⟨i⟩⟨j⟩.
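The ⊕= semantics of these wire programs can be sketched directly in Python (an illustration added here, not the dissertation's code): each instruction XORs a wire with some function of the other wires, so each instruction is its own inverse and every wire program is automatically a permutation. The classic SWAP-by-3-CNOTs of Lemma 3.3.1 makes a compact demonstration:

```python
# Sketch: wire programs as XOR updates on a list of bits.
#   NOT:    w[k] ^= 1
#   CNOT:   w[j] ^= w[i]
#   CCNOT:  w[k] ^= w[i] & w[j]
# Each update is an involution, so any wire program is reversible.
from itertools import product

def swap_program(w):          # SWAP of wires 0 and 1 via 3 CNOTs
    w[1] ^= w[0]
    w[0] ^= w[1]
    w[1] ^= w[0]
    return w

# the program realizes the SWAP truth table on every input
for a0, a1 in product((0, 1), repeat=2):
    assert swap_program([a0, a1]) == [a1, a0]
```

Applying the same program twice returns every input to its starting state, which is exactly the reversibility guaranteed by the ⊕= form.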
In Figure 3.5, ⟨0⟩ and ⟨1⟩ are the controls and ⟨2⟩ is the target.

Figure 3.5: Circuit Diagram: Toffoli Gate (b_0 = a_0, b_1 = a_1, b_2 = a_2 ⊕ a_0 a_1)

The Toffoli is probably the best known universal reversible gate. Note that both the NOT and CNOT gates can be derived from the Toffoli gate by forcing certain control lines to be 1.

Table 3.7: Wire Program: Toffoli Gate
Instruction   | ⟨0⟩ | ⟨1⟩ | ⟨2⟩
              | a_0 | a_1 | a_2
⟨2⟩ ⊕= ⟨0⟩⟨1⟩ | a_0 | a_1 | a_2 ⊕ a_0 a_1

Table 3.8: Truth Table: Toffoli Gate
a_0 a_1 a_2 | b_0 b_1 b_2
0 0 0 | 0 0 0
0 0 1 | 0 0 1
0 1 0 | 0 1 0
0 1 1 | 0 1 1
1 0 0 | 1 0 0
1 0 1 | 1 0 1
1 1 0 | 1 1 1
1 1 1 | 1 1 0

3.3.5 Example Permutation

Example 3.3.2. Consider the function f(a_0, a_1, a_2) = (a_2 ⊕ a_0 a_1, a_1 ⊕ 1, a_2 ⊕ a_0 a_1 ⊕ a_0). The circuit diagram, wire program and truth table for f are in Figure 3.6 and Tables 3.9 and 3.10.

Figure 3.6: Circuit Diagram: f = (a_2 ⊕ a_0 a_1, a_1 ⊕ 1, a_2 ⊕ a_0 a_1 ⊕ a_0)

Table 3.9: Wire Program: f = (a_2 ⊕ a_0 a_1, a_1 ⊕ 1, a_2 ⊕ a_0 a_1 ⊕ a_0)
Instruction   | ⟨0⟩             | ⟨1⟩     | ⟨2⟩
              | a_0             | a_1     | a_2
⟨1⟩ ⊕= 1      | a_0             | a_1 ⊕ 1 | a_2
⟨2⟩ ⊕= ⟨0⟩⟨1⟩ | a_0             | a_1 ⊕ 1 | a_2 ⊕ a_0 a_1 ⊕ a_0
⟨0⟩ ⊕= ⟨2⟩    | a_2 ⊕ a_0 a_1   | a_1 ⊕ 1 | a_2 ⊕ a_0 a_1 ⊕ a_0

Table 3.10: Truth Table: f = (a_2 ⊕ a_0 a_1, a_1 ⊕ 1, a_2 ⊕ a_0 a_1 ⊕ a_0)
a_0 a_1 a_2 | b_0 b_1 b_2
0 0 0 | 0 1 0
0 0 1 | 1 1 1
0 1 0 | 0 0 0
0 1 1 | 1 0 1
1 0 0 | 0 1 1
1 0 1 | 1 1 0
1 1 0 | 1 0 0
1 1 1 | 0 0 1

3.3.6 Fredkin Gate

Another common universal gate for reversible computing is the Fredkin gate, or controlled swap gate. While the Toffoli gate has two controls and one target, the Fredkin gate has one control and two targets. If the control is 1, then the values of the two targets are swapped. Again, if the control is 0, then no action takes place. Writing wire programs with the Fredkin gate is awkward since it updates the values of two wires at once.
For this reason, most wire programs will be written using Toffoli gates instead of Fredkin gates. Due to the linear equivalence between Toffoli and Fredkin gates (which will be proven later), the necessary conversion is fairly trivial.

Figure 3.7: Circuit Diagram: Fredkin Gate (b_0 = a_0, b_1 = a_1 ⊕ a_0 a_1 ⊕ a_0 a_2, b_2 = a_2 ⊕ a_0 a_1 ⊕ a_0 a_2)

Table 3.11: Truth Table: Fredkin Gate
a_0 a_1 a_2 | b_0 b_1 b_2
0 0 0 | 0 0 0
0 0 1 | 0 0 1
0 1 0 | 0 1 0
0 1 1 | 0 1 1
1 0 0 | 1 0 0
1 0 1 | 1 1 0
1 1 0 | 1 0 1
1 1 1 | 1 1 1

3.4 Linear Transformations on Reversible Circuits

Linear transformations are central to the equivalence relation that will be developed later. Not surprisingly, linear permutations can be constructed using only a small subset of the gates presented so far.

Lemma 3.4.1. The CNOT gates on n wires generate GL_n(2).

Proof. Observe that the CNOT from ⟨i⟩ to ⟨j⟩ is the same as adding column i to column j, and the SWAP gate on ⟨i⟩ and ⟨j⟩ simply swaps columns i and j. These elementary column operations can be used to reduce any invertible linear transformation to the identity matrix. Thus, by simply reversing the echelon reduction, any invertible linear transformation can be made from CNOTs.

Instead of representing a linear transformation as a series of controlled NOT gates, we can represent the linear transformation as a single matrix action on a group of wires. This can be done without hiding too much complexity, since every linear permutation may be constructed using O(n) CNOTs [KMS07]. As the binary state progresses from left to right, it will be represented as a GF(2) column vector acted on by left multiplication by GF(2) matrices. Mirroring the physics convention for quantum vectors, we will represent the column vectors using the "ket" notation (e.g. the column vector [1,1,0]^T = |110⟩).

Example 3.4.2. Consider the vector |110⟩ acted upon by the linear transformation

[0 0 1]
[1 0 1]
[0 1 0].
The result will be the vector |110⟩ multiplied on the left by the matrix:

[0 0 1]         [0 0 1] [1]   [0]
[1 0 1] |110⟩ = [1 0 1] [1] = [1] = [0,1,1]^T = |011⟩.
[0 1 0]         [0 1 0] [0]   [1]

Figure 3.8: Circuit Diagram: Linear Matrix Example (the matrix sends |a_0 a_1 a_2⟩ to |a_2, a_0 ⊕ a_2, a_1⟩)

Remark 3.4.3. In order to maintain consistency with established notation, actions are applied to the input ket from left to right. When considering linear actions applied to the ket, the matrix multiplication is applied on the left of the column vector. Thus, to retain readability, we will use a notation similar to the opposite ring (R^op) notation when multiplying matrices in block form. We therefore define A ∘ B = BA for matrices A and B. Note that applying matrix A and then B to the ket |x⟩ yields B(A|x⟩) = (A ∘ B)|x⟩.

Figure 3.9: Block matrix product acting on kets (A ∘ B = BA)

It should also be noted that linear transformations can be enlarged to include bit lines they do not affect. Of course, the opposite is also true: bit lines that are unaffected can be dropped out of the block matrix.

Figure 3.10: Two equivalent block matrix actions (the block diagonal matrix diag(1, [a b; c d]) acts as [a b; c d] on the remaining two wires)

3.5 Controlled Linear Transformations

Controlled linear transformations operate in a wire diagram as anticipated. If the controls are all 1, then the diagram applies the linear transformation. Otherwise, no action is taken.

Example 3.5.1. We can express the Fredkin gate by controlling the matrix [0 1; 1 0].

Figure 3.11: Fredkin gate as a controlled linear transformation (a controlled [0 1; 1 0] is a controlled SWAP)

Example 3.5.2. Since a singly controlled affine shift is equivalent to a linear transformation, we can express the Toffoli gate as a controlled linear transformation.

Figure 3.12: Toffoli gate as a controlled linear transformation (a controlled [1 1; 0 1])

3.6 Affine Transformations

An affine transformation is simply a linear transformation plus the addition of a vector.
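The matrix-on-ket action of Example 3.4.2, the A ∘ B = BA convention of Remark 3.4.3, and the "linear plus vector" form can all be checked numerically with a small GF(2) sketch (helper names are illustrative, not from the thesis):

```python
# Sketch: GF(2) matrix actions on kets represented as 0/1 lists.

def mat_vec(M, v):                       # y = M v over GF(2)
    return [sum(M[i][j] & v[j] for j in range(len(v))) % 2
            for i in range(len(M))]

def mat_mul(A, B):                       # C = A B over GF(2)
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) % 2
             for j in range(n)] for i in range(n)]

M = [[0, 0, 1],
     [1, 0, 1],
     [0, 1, 0]]

# Example 3.4.2: M acting on |110> gives |011>
assert mat_vec(M, [1, 1, 0]) == [0, 1, 1]

# Remark 3.4.3: applying A, then B, to |x> equals (A o B)|x> with A o B = BA
A = [[1, 1], [0, 1]]
B = [[0, 1], [1, 0]]
x = [1, 0]
assert mat_vec(B, mat_vec(A, x)) == mat_vec(mat_mul(B, A), x)

# an affine map is the linear action followed by XOR with a fixed vector
affine = [y ^ c for y, c in zip(mat_vec(M, [1, 1, 0]), [1, 1, 0])]
assert affine == [1, 0, 1]               # M|110> XOR |110>
```

The ∘ convention simply lets block diagrams be read left to right while the underlying algebra still multiplies matrices on the left of the column vector.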
Flipping the state of a wire in a reversible circuit is typically represented by placing an ⊕ on the wire. While this notation is a little strange, it also makes sense: it is simply a controlled NOT without the control, and therefore just a NOT gate.

Example 3.6.1. We can express the affine transformation

[0 0 1]
[1 0 0] + |110⟩
[0 1 1]

in two different ways.

Figure 3.13: Equivalent Affine Transformations

Controlled affine transformations also operate in exactly the same manner as controlled linear transformations. If the controls are all 1, then the affine transformation is applied. Otherwise, no action is taken.

3.7 Scratch Space

Finally we consider scratch space, or additional computation space. For various calculations, additional wires may be needed to store intermediate values or the final result. In some cases such additional wires are necessary; in other cases they simply speed up the computation. It should be noted that although the Toffoli gate is universal, it is only universal given enough scratch space.

To illustrate how scratch space is used in reversible computation, we will consider a simple problem: computing the AND of three wires.

3.7.1 3-bit AND

Suppose we wanted to construct the 3-bit AND from Toffoli, CNOT and NOT gates. We will explore various ideas to show why two scratch bits are necessary and give motivation for the idea that works.

Since a permutation is a bijection, there do not exist functions f and g such that Figure 3.14 can have the output abc on the third wire (two values would have to map to one). Thus, we must utilize at least one additional wire to perform a reversible computation of the 3-bit AND.

Figure 3.14: 3-bit AND Using 3 Wires? (a -> f(a,b,c), b -> g(a,b,c), c -> abc)

Now consider the circuit in Figure 3.15.

Figure 3.15: 3-bit AND: Second Attempt

Figure 3.15 has computed the value of the 3-bit AND abc, but one of the wires has been modified.
It is often preferable not to modify the input variables in the event they are needed for other computations. This is corrected in the implementation in Figure 3.16.

Figure 3.16: 3-bit AND: Third Attempt (a, b, c preserved; scratch 0 -> abc)

Figure 3.16 now does seem to do the trick, but usually in reversible computing one wants to compute a value and have it XORed onto some set of bits. In this case we are relying on the scratch bit being initialized to zero for the 3-bit AND to work. Figure 3.17 shows the circuit evaluated with an arbitrary scratch bit d.

Figure 3.17: 3-bit AND: Third Attempt (Revisited)

Computing the 3-bit AND and XORing its value onto a fourth bit is a triply controlled NOT. Upon further reflection, we realize that the CCCNOT, or triply controlled NOT, is an odd permutation on 4 wires. Since the NOT, CNOT and CCNOT are all even permutations on 4 wires, we will need to add yet another scratch bit to create a permutation equivalent to CCCNOT. With 2 scratch bits, it doesn't take long to find the construction in Figure 3.18.

Figure 3.18: 3-bit AND: Fourth Attempt (a, b, c preserved; scratch 0 -> ab; e -> e ⊕ abc)

Thus, in Figure 3.18 all of the input values are preserved and the 3-bit AND is XORed onto the target wire ⟨e⟩. The only problem is that one of our scratch bits still contains some leftover information. For classical computation, this doesn't seem like much of an issue. But for reversible computation, erasure means heat dissipation. We would prefer to "uncompute" the scratch bit to return it to its original value. Furthermore, if the reversible computation is to be used on a quantum computer, it is necessary to return any scratch space used to its original state or it will destroy the superposition of the computation. Thus, Figure 3.19 is still a better version of our 3-bit AND.

Figure 3.19: 3-bit AND: Fifth Attempt (a, b, c preserved; scratch returned to 0; e -> e ⊕ abc)

Finally, there is one more surprising improvement that can be made.
The fourth scratch bit does not need to be initialized to zero for the computation to work. For any value of d, Figure 3.20 successfully computes the 3-bit AND of a, b, c, XORs the value onto e and returns d to its original value. This amazing trick is even more useful in quantum computation: qubit wires can be used as computation scratch space, even when we cannot observe their contents.

Figure 3.20: 3-bit AND: Final Attempt (a, b, c and d preserved; e -> e ⊕ abc)

3.8 Conclusion

Hopefully this small foray into reversible computation illustrates some of the difficulty in exploring permutation constructions. One important observation to make at this point is the relationship between scratch space and one-way permutations.

Any reversible computation that does not use any scratch space, or a computation that "uncomputes" the scratch space that it does use, is easily reversible: all of the gates can simply be run in the reverse order. Thus, for a permutation to be one-way, some circuit must exist which can compute one direction of the permutation efficiently. Furthermore, this circuit must leave "dirty" scratch space (not "uncomputed"); otherwise the inverse computation can simply run the circuit backwards.

Chapter 4

Which Permutations Should Be Equivalent?

4.1 Results and Application

The major results of this chapter are:

Affine equivalence is a maximal equivalence relation in S_{2^n}.

The two major universal reversible gates (Toffoli and Fredkin) are affine equivalent.

Singly controlled linear and affine permutations are affine equivalent.

4.2 Background

In the early 1950s, Aiken [AtSotCL51] and Moore [Moo52] exhaustively computed the 402 equivalence classes of the 65536 four-variable Boolean functions (4-bit to 1-bit) under permutation and complementation of the input variables. Shortly thereafter, mathematicians found combinatorial methods for counting that were far superior to the exhaustive methods used by Aiken and Moore.
This work aroused a lot of interest in using Polya theory to count equivalent Boolean functions. References to much of this work can be found in [Lor64]. Permutations as well as functions may be classified by Polya theory. This chapter explores various notions of equivalence, and why affine equivalence is considered optimal.

4.3 The Equivalence Choice

Any group action on the input and output variables will define an equivalence. Aiken and Moore's original research stemmed from the question of how many 4-input, 1-output "cans" had to be designed to create all 4-bit nonlinear functions. Since the input plugs could be arranged in any order, equivalence under permutation of the inputs was one of the first equivalences studied. The following is a list of some of the more natural and common choices.

1. Complementation of variables.
2. Permutation of variables.
3. Complementation and permutation of variables.
4. Linear transformation of variables.
5. Affine transformation of variables.

Table 4.1 is extracted from [Lor64]. It is helpful to see how the different equivalence relations on inputs and outputs change the number of classes of permutations. For most of the equivalence relations, no closed form solution for counting them is known; a lower bound is provided for the last four.

Table 4.1: Number of Permutation Equivalence Classes
Equivalence | Count or bound | n = 1 | n = 2 | n = 3 | n = 4
None | 2^n! | 2 | 24 | 40320 | 20922789888000
Complementation | (2^n! + (2^n - 1)^2 (2^(n-1))! 2^(2^(n-1))) / 2^(2n) | 1 | 6 | 924 | 81738720000
Permutation | > 2^n! / (n!)^2 | 2 | 7 | 1172 | 36325278240
Comp. & Perm. | > 2^n! / (2^(2n) (n!)^2) | 1 | 2 | 52 | 142090700
Linear | > 2^n! / 2^(2n^2) | 2 | 2 | 10 | 52246
Affine | > 2^n! / 2^(2n^2 + 2n) | 1 | 1 | 4 | 302

Note that the group of affine transformations contains all of the other listed group actions as subgroups. One obvious question is whether or not there is a subgroup larger than the affine transformations suitable for larger equivalence classes.

Theorem 4.3.1 (O'Nan-Scott). Let F_q be the finite field with q elements and F_q^n the n-dimensional vector space over F_q.
Let AGL(n,q), S(F_q^n) and A(F_q^n) be the affine general linear group, symmetric group and alternating group acting on F_q^n respectively. Then AGL(n,q) ∩ A(F_q^n) is a maximal subgroup of A(F_q^n).

Proof. See [LPS87].

This proves that AGL_n(2) is a maximal subgroup of A_{2^n}. Thus, adding any nonlinear element of S_{2^n} to the affine group will generate all of A_{2^n} or S_{2^n}. Although S_{2^n} contains other maximal subgroups, affine equivalence is the natural choice if we wish linear functions to have low complexity.

4.4 Affine Equivalence of Toffoli and Fredkin Gates

As we start our journey of studying the complexity of functions using affine equivalence, it is refreshing to see that the two universal gates for reversible computing are in fact affine equivalent.

Lemma 4.4.1. The Toffoli gate and the Fredkin gate are affine equivalent.

Figure 4.1: Equivalence of Fredkin and Toffoli gates (a controlled [0 1; 1 0] equals [0 1; 1 1], then a controlled [1 1; 0 1], then [1 1; 1 0], read left to right)

Proof. We need to show that when the top wire is 0, no action is taken by either side, and when the top wire is 1, the values in the bottom two lines are swapped. When the top wire is 0, the controlled gate is not applied. Thus no action is taken on the left, and the action on the right is

[0 1; 1 1] ∘ [1 1; 1 0] = [1 1; 1 0][0 1; 1 1] = [1 0; 0 1],

which is the identity, and therefore takes no action. In the case where the top wire is 1, the action on the right is

[0 1; 1 1] ∘ [1 1; 0 1] ∘ [1 1; 1 0] = [1 1; 1 0][1 1; 0 1][0 1; 1 1] = [0 1; 1 0],

which is equivalent to a swap. Thus the Toffoli and Fredkin gates are affine equivalent.

4.5 Controlled Affine and Linear Permutations

Lemma 4.5.1. A singly controlled affine permutation is equivalent to a singly controlled linear permutation composed with an affine permutation.

Proof. Separate the controlled affine function into the linear permutation and the vector addition. The singly controlled vector addition is just a product of CNOTs. Any composition of CNOTs is a linear permutation.
Thus, the singly controlled affine permutation is equivalent to a controlled linear permutation composed with a linear permutation. Another way to see the equivalence: conditionally negating wire ⟨i⟩ by wire ⟨j⟩ is the same as adding wire ⟨j⟩ to wire ⟨i⟩.

Example 4.5.2. Consider the following controlled affine permutation:

Figure 4.2: Affine Equivalent Controlled Transformations

Since the composition of CNOTs is simply a linear function, the controlled affine permutation on the left is linearly (and therefore affinely) equivalent to the controlled linear permutation on the right.

Chapter 5

Counting Equivalent Functions

5.1 Summary and Results

A method for counting affine equivalent double cosets in S_{2^n} is reviewed. This result is then extended to show how we can enumerate the affine equivalent double cosets in A_{2^n}.

5.2 Background

Polya theory is a very robust method for counting different types of objects subject to an equivalence relation. Let us first review the basic counting theorems and how they are applied to counting the number of affine equivalent double cosets.

Definition 5.2.1. Let f and g be n-bit permutations in S_{2^n}. Then f and g are affine equivalent, written f ~ g, if there exist a_0 and a_1, affine permutations in AGL_n(2), such that f = a_1 ∘ g ∘ a_0.

For a group G acting on a set X, let X_g = {x ∈ X : g·x = x} denote the fixed points of g ∈ G, and let G_x = {g ∈ G : g·x = x} denote the stabilizer of x ∈ X. Define

χ(g, x) = 1 if g·x = x, and 0 otherwise.

Then by summing over x or g we get

Σ_{g∈G} |X_g| = Σ_{g∈G, x∈X} χ(g, x) = Σ_{x∈X} |G_x|.

Recall the Orbit-Stabilizer relation (we use it twice): for all x ∈ X,

|G_x| · |G(x)| = |G|,

where G(x) denotes the orbit of x. Any group action divides X = O_1 ∪ O_2 ∪ ... ∪ O_N into a disjoint union of orbits, and it is exactly these orbits we wish to count. Consider the right hand sum over a single orbit generated by y. Since the stabilizers of points in the same orbit have the same size,

Σ_{x∈O_y} |G_x| = Σ_{x∈O_y} |G_y| = |G_y| · |G(y)| = |G|.

Thus, the sum of |G_x| over any orbit is |G|. Therefore,

Σ_{g∈G} |X_g| = Σ_{x∈X} |G_x| = Σ_{x ∈ O_1 ∪ ... ∪ O_N} |G_x| = Σ_{i=1}^{N} |G| = N|G|,

where N is the number of orbits. It follows immediately that the number of orbits is

N = (1/|G|) Σ_{g∈G} |X_g|.

5.3 Counting Fixed Points Under a Double Coset Action

Define Φ : (AGL_n(2) × AGL_n(2)) × S_{2^n} → S_{2^n} by Φ((a, b), p) = a p b^{-1}.
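The fixed-point counts that drive this orbit count can be sanity-checked by brute force in a tiny symmetric group (a sketch added here, not one of the dissertation's computations). For the action p -> a p b^{-1} with a = b = (0 1)(2 3) in S_4, the fixed permutations are exactly those conjugating b to a, and their number is 2!·2^2 = 8:

```python
# Sketch: count permutations p of {0,1,2,3} fixed by the double coset
# action (a, b): p -> a o p o b^{-1}, with a = b = (0 1)(2 3).
from itertools import permutations

b = (1, 0, 3, 2)                     # the permutation (0 1)(2 3)
a = b

def compose(p, q):                   # (p o q)(x) = p(q(x))
    return tuple(p[q[x]] for x in range(4))

def invert(p):
    inv = [0] * 4
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

fixed = [p for p in permutations(range(4))
         if compose(a, compose(p, invert(b))) == p]

# b has cycle set (0, 2): two 2-cycles, predicting 2! * 2^2 = 8 fixed points
assert len(fixed) == 8
```

The same product formula over all cycle lengths is what the lemmas of this section establish in general.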
In order to count the number of equivalence classes, we need to figure out how many permutations are fixed by the action of (a, b). For any permutation we define the cycle set to be a tuple (σ_1, σ_2, σ_3, ...) where each σ_i indicates the number of i-cycles in the permutation. For any permutation on N elements,

Σ_i i·σ_i = N.

Lemma 5.3.1. If the action of (a, b) on the permutation p fixes p, then a and b must have the same cycle set.

Proof. Suppose the action of (a, b) fixes p. Then a p b^{-1} = p or, equivalently, a p b^{-1} p^{-1} = 1. Let C_k = (b_0 b_1 ... b_{k-1}) be a k-cycle in b^{-1}. Then p C_k p^{-1} = (p(b_0) p(b_1) ... p(b_{k-1})). This can be verified by checking that

p C_k p^{-1}(p(b_i)) = p C_k(b_i) = p(b_{i+1}).

Since conjugating by p maps k-cycles to k-cycles for all k, conjugating by p takes b^{-1} to another permutation with the same cycle structure. Therefore a must also have the same cycle structure, since a is the inverse of p b^{-1} p^{-1}.

Lemma 5.3.2. Given two permutations a and b with the same cycle set (σ_1, σ_2, ...), the number of permutations fixed by the action of (a, b) is

∏_k σ_k! · k^{σ_k}.

Proof. Recall that when the action of (a, b) fixes a permutation p, we have a p b^{-1} p^{-1} = 1. Let C_{b,k} = (b_0 b_1 ... b_{k-1}) be a k-cycle in b^{-1}. Then p C_{b,k} p^{-1} = (p(b_0) p(b_1) ... p(b_{k-1})). Given another k-cycle C_{a,k} in a, there are k different maps p that will conjugate C_{b,k} to become the inverse of C_{a,k}. This is simply due to the fact that the cycle can be shifted in k possible ways, all of which represent the same permutation. If we have σ_k k-cycles in b^{-1} to match with σ_k k-cycles in a, there are σ_k! ways to choose the matching. With each matched pair we have shown that there are k conjugations taking C_{b,k} to C_{a,k}. Thus there are σ_k! · k^{σ_k} ways to match all the k-cycles. Since the conjugating permutation acts on each cycle independently, the total number of permutations fixed by the action of (a, b) is the product over all k.

For counting the number of double cosets in general, we would examine the cycle sets for the left and right action.

Theorem 5.3.3.
Let H < S_{2^n} and let H_σ = {h ∈ H | h has cycle set σ}. Then the number of H\S_{2^n}/H double cosets of S_{2^n} is

(1/|H|^2) Σ_σ |H_σ|^2 ∏_k σ_k! · k^{σ_k}.

Proof. Recall the double coset group action of G = H × H on S_{2^n}, Φ : (H × H) × S_{2^n} → S_{2^n} defined by Φ((a, b), p) = a p b^{-1}. By Theorem 5.2.6, the number of orbits (double cosets) is

(1/|G|) Σ_{g∈G} |X_g| = (1/|H|^2) Σ_{h_1, h_2 ∈ H} |X_{(h_1, h_2)}|.

By Lemma 5.3.1 the number of fixed points is zero unless h_1 and h_2 have the same cycle set. This reduces our equation to

(1/|H|^2) Σ_σ Σ_{h_1, h_2 ∈ H_σ} |X_{(h_1, h_2)}|.

Applying Lemma 5.3.2, we get

(1/|H|^2) Σ_σ Σ_{h_1, h_2 ∈ H_σ} ∏_k σ_k! · k^{σ_k}.

Since the summand is independent of h_1 and h_2, we arrive at the desired result:

(1/|H|^2) Σ_σ |H_σ|^2 ∏_k σ_k! · k^{σ_k}.

5.4 Counting Affine Equivalent Double Cosets in S_{2^n}

By Theorem 5.3.3, the number of affine equivalent double cosets in S_{2^n} is

(1/|AGL_n(2)|^2) Σ_σ |H_σ|^2 ∏_k σ_k! · k^{σ_k},

where |H_σ| is the number of elements of H = AGL_n(2) with cycle type σ. Recall that all the elements of any conjugacy class have the same cycle type. To compute |H_σ|, we will take the union of the conjugacy classes in AGL_n(2) with cycle type σ.

5.4.1 Example: Affine equivalent 3-bit permutations

We will now use MAGMA [BCP97] to find the cycle sets for the affine general linear group over GF(2).

> Classes(AGL(3,2));
Conjugacy Classes
-----------------
[1]  Order 1  Length 1    Rep Id($)
[2]  Order 2  Length 7    Rep (1, 2)(3, 4)(5, 6)(7, 8)
[3]  Order 2  Length 42   Rep (1, 5)(2, 6)
[4]  Order 2  Length 42   Rep (1, 2)(3, 4)(5, 8)(6, 7)
[5]  Order 3  Length 224  Rep (1, 7, 4)(2, 5, 8)
[6]  Order 4  Length 84   Rep (1, 4, 3, 2)(5, 8, 7, 6)
[7]  Order 4  Length 168  Rep (1, 6, 2, 5)(7, 8)
[8]  Order 4  Length 168  Rep (1, 8, 2, 5)(3, 6, 4, 7)
[9]  Order 6  Length 224  Rep (1, 5, 7, 8, 4, 2)(3, 6)
[10] Order 7  Length 192  Rep (1, 5, 2, 3, 4, 6, 8)
[11] Order 7  Length 192  Rep (1, 6, 3, 5, 8, 4, 2)

Note that some classes have the same cycle set even though they are not conjugate. These need to be grouped together in our counting. (E.g. conjugacy classes 2 and 4 are not conjugate, but both have a (0,4) cycle set.)
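As an independent cross-check of the MAGMA computation (a sketch added for illustration; not part of the dissertation's toolchain), a short Python script can enumerate all 1344 elements of AGL(3,2) as permutations of {0, ..., 7}, group them by cycle set, and apply the formula of Theorem 5.3.3 directly:

```python
# Sketch: count AGL(3,2) double cosets in S_8 via Theorem 5.3.3.
from collections import Counter
from math import factorial

def bits(x):
    return [(x >> i) & 1 for i in range(3)]

def apply_affine(M, c, x):               # x -> Mx + c over GF(2)
    v = bits(x)
    w = [(sum(M[i][j] & v[j] for j in range(3)) + c[i]) % 2
         for i in range(3)]
    return sum(w[i] << i for i in range(3))

def invertible(M):                       # Gaussian elimination over GF(2)
    A = [row[:] for row in M]
    for col in range(3):
        piv = next((r for r in range(col, 3) if A[r][col]), None)
        if piv is None:
            return False
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col and A[r][col]:
                A[r] = [a ^ b for a, b in zip(A[r], A[col])]
    return True

def cycle_type(perm):                    # sorted tuple of cycle lengths
    seen, lens = set(), []
    for s in range(len(perm)):
        if s not in seen:
            length, x = 0, s
            while x not in seen:
                seen.add(x)
                x = perm[x]
                length += 1
            lens.append(length)
    return tuple(sorted(lens))

counts = Counter()
for m in range(512):                     # all 3x3 GF(2) matrices
    M = [[(m >> (3 * i + j)) & 1 for j in range(3)] for i in range(3)]
    if not invertible(M):
        continue
    for cv in range(8):                  # all translation vectors
        perm = [apply_affine(M, bits(cv), x) for x in range(8)]
        counts[cycle_type(perm)] += 1

order = sum(counts.values())             # |AGL(3,2)| = 1344
total = 0
for ct, h in counts.items():
    term = h * h
    for k, mult in Counter(ct).items():  # multiply sigma_k! * k^sigma_k
        term *= factorial(mult) * k ** mult
    total += term
print(order, total, total // order ** 2)  # -> 1344 7225344 4
```

Running the sketch prints 1344, 7225344 and 4, agreeing with the conjugacy class data above and with the hand computation of the four 3-bit double cosets.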
Table 5.1: Cycle Sets of AGL(3,2)
Cycle Set | Classes | Total Count | Summand
(8) | 1 | 1 | 1^2 · 8!·1^8
(4,2) | 3 | 42 | 42^2 · 4!·1^4 · 2!·2^2
(0,4) | 2, 4 | 49 | 49^2 · 4!·2^4
(2,0,2) | 5 | 224 | 224^2 · 2!·1^2 · 2!·3^2
(0,0,0,2) | 6, 8 | 252 | 252^2 · 2!·4^2
(2,1,0,1) | 7 | 168 | 168^2 · 2!·1^2 · 1!·2^1 · 1!·4^1
(0,1,0,0,0,1) | 9 | 224 | 224^2 · 1!·2^1 · 1!·6^1
(1,0,0,0,0,0,1) | 10, 11 | 384 | 384^2 · 1!·1^1 · 1!·7^1

Thus, we calculate the number of affine equivalent 3-bit permutations by summing the terms in Table 5.1 and then dividing by |AGL(3,2)|^2 = 1344^2:

SUM = 1^2·8!·1^8 + 42^2·4!·1^4·2!·2^2 + 49^2·4!·2^4 + 224^2·2!·1^2·2!·3^2 + 252^2·2!·4^2 + 168^2·2!·1^2·1!·2^1·1!·4^1 + 224^2·1!·2^1·1!·6^1 + 384^2·1!·1^1·1!·7^1 = 7225344.

Thus the total number of double cosets is

SUM / 1344^2 = 7225344 / 1806336 = 4.

5.4.2 Counting Affine Equivalent n-bit Permutations

Table 5.2 provides a general MAGMA program to compute the number of affine equivalent permutations for any n. The computation is mostly limited by the difficulty of finding the conjugacy classes of AGL_n(2).

Table 5.2: MAGMA function for counting double cosets

Count_AGL_double_cosets_in_Sym := function(n)
    H := AGL(n,2);                  // Permutation group in Sym(2^n)
    ConjClass := ConjugacyClasses(H);
    CycleTypes := {};
    // Union together similar cycle sets
    for x in ConjClass do
        CycleTypes := CycleTypes join {CycleStructure(x[3])};
    end for;
    sum := 0;
    // Compute and add summands for each cycle type
    for x in CycleTypes do
        size := 0;
        for y in ConjClass do
            if (x eq CycleStructure(y[3])) then
                size := size + y[2];
            end if;
        end for;
        summand := size^2;
        for y in x do
            summand := summand * Factorial(y[2]) * (y[1]^y[2]);
        end for;
        sum := sum + summand;
    end for;
    return sum / (#H)^2;            // Divide by group order
end function;

Using the function in Table 5.2, the author was able to compute the number of affine equivalent double cosets for n ≤ 7. The total count grows doubly exponentially, so only the values for n ≤ 5 are provided in Table 5.3.

Table 5.3: Number of Affine Equivalent Permutations
n | Affine Equivalent | Total Permutations (2^n!)
2 | 1 | 24
3 | 4 | 40320
4 | 302 | 20922789888000
5 | 2569966041123963092 | 263130836933693530167218012160000000

5.5 Counting Affine Equivalent Double Cosets in A_{2^n}

Since the Toffoli gate is an even permutation for all n > 3, the Toffoli gate and AGL_n(2) can only generate A_{2^n}. For this reason, it will be more natural to examine the complexity structure of the double cosets in A_{2^n} and extend to S_{2^n} via an odd permutation only when necessary.

Let G be a group acting on a set X by conjugation. Let C_G(x) denote the centralizer and x^G the conjugacy class of an element x. We will consider both S_{2^n} and A_{2^n} acting on A_{2^n} by conjugation. For any π ∈ A_{2^n},

|C_{S_{2^n}}(π)| · |π^{S_{2^n}}| = (2^n)!,
|C_{A_{2^n}}(π)| · |π^{A_{2^n}}| = (2^n)!/2.

Note that C_{A_{2^n}}(π) = C_{S_{2^n}}(π) ∩ A_{2^n}.

Lemma 5.5.1. |C_{A_{2^n}}(π)| = |C_{S_{2^n}}(π)| if π has only odd distinct cycles, and (1/2)|C_{S_{2^n}}(π)| otherwise.

Proof. See [Sco87].

Corollary 5.5.2. Let S_{2^n} and A_{2^n} both act on A_{2^n} by conjugation, and let π ∈ A_{2^n}. Then

|π^{A_{2^n}}| = (1/2)|π^{S_{2^n}}| if π has only odd distinct cycles, and |π^{S_{2^n}}| otherwise.

Proof. The order of the centralizer times the size of the orbit must be (2^n)!/2. Thus if the centralizer doesn't change in A_{2^n}, the conjugacy class must split. Likewise, if the conjugacy class doesn't split in A_{2^n}, the centralizer must be half as large.

Since most permutations do not have a cycle type with only distinct odd cycles, the conjugacy class of most permutations is the same in S_{2^n} and A_{2^n}; however, their centralizers in A_{2^n} are half as large. Let τ denote the cycle types consisting of distinct odd cycles and σ the rest. Let τ_1 and τ_2 represent the two A_{2^n} conjugacy classes of cycle type τ. Then the equation of Theorem 5.3.3 becomes

(1/|AGL_n(2)|^2) ( Σ_σ |H_σ|^2 · (1/2) ∏_k σ_k!·k^{σ_k} + Σ_{τ, i=1,2} |H_{τ_i}|^2 ∏_k τ_k!·k^{τ_k} ).

The number of affine equivalent double cosets in A_{2^n} is close to half the number in S_{2^n}, due to the fact that most cycle types are not distinct odd cycles.

Lemma 5.5.3.
If the distinct odd cycle classes of AGL_n(2) split evenly into the two A_{2^n} conjugacy classes τ_1 and τ_2, then the number of affine equivalent double cosets in A_{2^n} is exactly half the number in S_{2^n}. That is, if for all distinct odd cycle types τ

|H_{τ_1}| = |H_{τ_2}| = (1/2)|H_τ|,

then A_{2^n} contains half of the double cosets of S_{2^n}.

Proof. Since the summand is already halved for the cycle types σ that are not odd and distinct, we need only consider the summands associated with the odd distinct cycle types τ:

Σ_{τ, i=1,2} |H_{τ_i}|^2 ∏_k τ_k!·k^{τ_k}
= Σ_{τ, i=1,2} (|H_τ|/2)^2 ∏_k τ_k!·k^{τ_k}
= (1/4) Σ_{τ, i=1,2} |H_τ|^2 ∏_k τ_k!·k^{τ_k}
= (1/4) · 2 Σ_τ |H_τ|^2 ∏_k τ_k!·k^{τ_k}
= (1/2) Σ_τ |H_τ|^2 ∏_k τ_k!·k^{τ_k}.

Thus the contribution is halved both for the odd distinct cycle types and for the others. Therefore the number of affine double cosets in A_{2^n} is exactly half that of S_{2^n}.

Elements of the same AGL_n(2) conjugacy class will obviously be in the same A_{2^n} conjugacy class; but when an AGL_n(2) class has a cycle type that is odd and distinct, there is no guarantee that a matching class lands in the other split A_{2^n} class. In order to test whether two permutations are in the same A_{2^n} conjugacy class, we will use the following result.

Lemma 5.5.4. Let p and q be permutations in S_{2^n} with the same odd distinct cycle structure. Since p and q have the same cycle structure, there exists σ ∈ S_{2^n} such that p = σ q σ^{-1}. If σ is odd, then p and q are in different A_{2^n} conjugacy classes. If σ is even, then p and q are in the same A_{2^n} conjugacy class.

Proof. Find σ such that p = σ q σ^{-1}. Such a σ is guaranteed to exist since p and q have the same cycle structure.

Assume σ is an odd permutation. If p and q were in the same A_{2^n} conjugacy class, there would exist an even permutation ρ such that q = ρ p ρ^{-1}. Thus

p = σ q σ^{-1} = σ ρ p ρ^{-1} σ^{-1} = (σρ) p (σρ)^{-1}.

Therefore σρ is an odd permutation in the centralizer of p, which is a contradiction, since the centralizer of p is contained in A_{2^n} (p has only odd distinct cycles).

Assume σ is an even permutation. Since σ ∈ A_{2^n}, this immediately implies that p and q are in the same A_{2^n} conjugacy class.
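This parity test can be seen concretely in a small brute-force sketch (added for illustration; S_7 stands in for S_{2^n}, but the parity argument is the same). A 7-cycle has an odd distinct cycle type, so its S_7 class splits into two A_7 classes, and the permutation reversing the cycle is a product of 3 transpositions, hence odd; by the argument of Lemma 5.5.4 the inverse cycle lands in the other A_7 class:

```python
# Sketch: class splitting in the alternating group, illustrated in S_7.
from itertools import permutations

n = 7
p = tuple((i + 1) % n for i in range(n))       # the 7-cycle (0 1 2 3 4 5 6)
p_inv = tuple((i - 1) % n for i in range(n))   # its inverse

def compose(a, b):                             # (a o b)(x) = a(b(x))
    return tuple(a[b[x]] for x in range(n))

def invert(a):
    inv = [0] * n
    for x, y in enumerate(a):
        inv[y] = x
    return tuple(inv)

def is_even(a):                                # parity via cycle lengths
    seen, parity = set(), 0
    for s in range(n):
        if s not in seen:
            length, x = 0, s
            while x not in seen:
                seen.add(x)
                x = a[x]
                length += 1
            parity ^= (length - 1) & 1
    return parity == 0

# A_7 conjugacy class of p: conjugate p by every even permutation
a7_class = {compose(s, compose(p, invert(s)))
            for s in permutations(range(n)) if is_even(s)}

assert len(a7_class) == 360    # half of the S_7 class of 720 seven-cycles
assert p_inv not in a7_class   # p and p^{-1} lie in different A_7 classes
```

Every conjugator sending p to p^{-1} differs from the cycle-reversing map by an element of the (even) centralizer of p, so all of them are odd: exactly the situation the lemma describes.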
Consider now the number of cycles in a permutation in S_{2^n} with odd distinct cycles. Since the lengths of the cycles must sum to 2^n, there must always be an even number of distinct odd cycles. This leads to a case that pairs two odd distinct cycle AGL_n(2) conjugacy classes.

Lemma 5.5.5. Let b be an affine permutation in AGL_n(2), n > 2, with an odd distinct cycle structure. Let m be the number of cycles in b. If m \equiv 2 \pmod{4}, then b and b^{-1} are in different A_{2^n} conjugacy classes.

Proof. Consider the cycles of the permutation b. If every cycle is reversed, we obtain the permutation b^{-1}. We consider the parity of the permutation \sigma that reverses every cycle by conjugation. Let C_k = (b_1\, b_2 \cdots b_k) be a cycle of b. Define \sigma on C_k so that \sigma(b_i) = b_{k-i+1}. Then
\[
\sigma C_k \sigma^{-1} = (\sigma(b_1)\, \sigma(b_2) \cdots \sigma(b_k)) = (b_k \cdots b_2\, b_1) = C_k^{-1}.
\]
Since k is odd, the midpoint of the cycle stays fixed, and \sigma restricted to C_k is thus composed of (k-1)/2 transpositions. Extending \sigma over all cycles, we see that \sigma is composed of (2^n - m)/2 transpositions. Since m \equiv 2 \pmod{4} and 2^n \equiv 0 \pmod{4}, the count (2^n - m)/2 is odd. Therefore \sigma is an odd permutation, and by Lemma 5.5.4, b and b^{-1} must be in different A_{2^n} conjugacy classes.

Since the inversion action stays within AGL_n(2), there must be an exact pairing between the two A_{2^n} conjugacy classes within AGL_n(2). Thus all distinct odd cycle types with m \equiv 2 \pmod{4} cycles split evenly between the A_{2^n} conjugacy classes, and the only cycle types that can affect the sum are the distinct odd cycle types with m \equiv 0 \pmod{4} cycles.

The case n = 5 is the first instance where there exists an odd distinct cycle type with m \equiv 0 \pmod{4} cycles. There are two conjugacy classes of size 15237120 with a (1, 3, 7, 21) cycle structure. A parity check between the two conjugacy classes reveals that they differ by an even conjugation. Lemma 5.5.4 then indicates that the two conjugacy classes lie in the same A_{2^n} conjugacy class. Let N_S be the number of affine equivalent 5-bit permutations.
Computing the number of affine equivalent 5-bit even permutations N_A yields
\[
N_A = \frac{N_S}{2} + \frac{(2 \cdot 15237120)^2 \cdot 3 \cdot 7 \cdot 21}{2\,|AGL_5(2)|^2} = 1284983020561981548.
\]
It should be noted that we only added half of the summand, since the other half was already included in N_S/2. The correction factor turns out to make a difference of only 2, since N_S/2 = 1284983020561981546.

In the cases n = 4, 6, there are no odd distinct cycle types with m \equiv 0 \pmod{4} cycles. Thus the number of affine equivalent even permutations is exactly half for those two cases. For n = 7 there are 10 conjugacy classes with odd distinct cycle type and m \equiv 0 \pmod{4} cycles.

Table 5.4: Number of Affine Equivalent Even Permutations

n | Affine equivalence classes in A_{2^n}
3 | 2
4 | 151
5 | 1284983020561981548
6 | 38115488450430370396302626146823126191571813695482842576562378932

The fact that there are 151 affine equivalence classes in A_16 will be used in the classification of 4-bit permutations.

5.6 Open problems

Develop a general method for computing the conjugacy classes of AGL_n(2). A method was developed in [Hou06], but the results are incomplete. Hou's calculation indicated that the number of affine equivalent 5-bit permutations was 2569966041123938084 instead of 2569966041123963092. A verification check by the author revealed that Hou was missing two conjugacy classes found by MAGMA. While the canonical form (conjugacy class) of linear matrices has been well studied, the problem is surprisingly not completely solved for finding the canonical form of affine transformations.

Develop a general method for counting the affine equivalent double cosets in A_{2^n}. Are there always half as many affine equivalent double cosets in A_{2^n} when n is even (n > 2)?

Chapter 6
Hamiltonian Cycles over GL_n(2)

6.1 Results and Application

Identifying whether or not two functions are affine equivalent is critical to the development of a nonlinear hierarchy.
This basic computation, which is essentially a double coset membership test, must be done many times, and thus requires a very efficient implementation. With permutations stored as bit vectors, CNOTs can be applied by simple masking, shifting and XORing. These operations are much faster than generically composing a permutation with a linear function. Since the double coset test requires exhausting over all bit matrices in GL_n(2), it is natural to ask whether a Hamiltonian cycle exists over GL_n(2), where the vertices are the invertible matrices and the edges correspond to adding one row to another.

The major results of this chapter are:

- A proof that the subgroups of upper triangular matrices in GL_n(2) have Hamiltonian cycles.
- A heuristic algorithm for generating Hamiltonian cycles over GL_n(2). The algorithm found Hamiltonian cycles over GL_n(2) for n = 2, 3, 4, 5.
- The Hamiltonian cycles found are used to construct a highly efficient double coset test.

6.2 Background

A graph contains a Hamiltonian cycle if there exists a sequence of connected edges that passes through each vertex exactly once, returning to the start vertex at the end. The Knight's Tour problem of finding a sequence of knight moves that touch each square of the chessboard once is a famous Hamiltonian cycle problem. The Gray code is another famous example of a Hamiltonian cycle. In general, determining whether a given graph has a Hamiltonian cycle is NP-complete.

Given a group G and a generating set S, the Cayley graph \Gamma = \Gamma(G, S) is constructed by assigning each element of G to a vertex, and then connecting two vertices if they differ by an element of S (that is, g is connected to g\sigma for each \sigma \in S). Since S is a generating set, \Gamma(G, S) must be connected. The following lemma and corollary will be used to construct Hamiltonian cycles in this chapter.

Lemma 6.2.1. Let \Gamma(G, S) be the Cayley graph for a group G and generating set S = \{\sigma_0, \sigma_1, \ldots\}. Let f : Z_n \to \{0, 1, 2, \ldots, |S|-1\} be a map such that H = [\sigma_{f(0)}, \sigma_{f(1)}, \ldots, \sigma_{f(n-1)}] is a Hamiltonian cycle on \Gamma.
Then any cyclic rotation of H is a Hamiltonian cycle on \Gamma; i.e., [\sigma_{f(k)}, \sigma_{f(k+1)}, \ldots, \sigma_{f(k+n-1)}] is a Hamiltonian cycle on \Gamma for all k \in Z.

Proof. Since H = [\sigma_{f(0)}, \sigma_{f(1)}, \ldots, \sigma_{f(n-1)}] is a Hamiltonian cycle, we may list the elements of G in the following order:
\[
\sigma_{f(0)},\quad \sigma_{f(0)}\sigma_{f(1)},\quad \sigma_{f(0)}\sigma_{f(1)}\sigma_{f(2)},\quad \ldots,\quad \sigma_{f(0)}\sigma_{f(1)}\cdots\sigma_{f(n-1)} = e.
\]
If we now act on the elements of G by multiplying on the left by \sigma_{f(0)}^{-1}, we get another ordering of the elements of the group G:
\[
e,\quad \sigma_{f(1)},\quad \sigma_{f(1)}\sigma_{f(2)},\quad \ldots,\quad \sigma_{f(1)}\sigma_{f(2)}\cdots\sigma_{f(n-1)}.
\]
Recall that H being a Hamiltonian cycle implies
\[
e = \sigma_{f(0)}\sigma_{f(1)}\sigma_{f(2)}\cdots\sigma_{f(n-1)}.
\]
Conjugating both sides by \sigma_{f(0)} yields
\[
e = \sigma_{f(1)}\sigma_{f(2)}\cdots\sigma_{f(n-1)}\sigma_{f(0)}.
\]
Moving the first term e to the end now gives the ordering
\[
\sigma_{f(1)},\quad \sigma_{f(1)}\sigma_{f(2)},\quad \ldots,\quad \sigma_{f(1)}\cdots\sigma_{f(n-1)},\quad \sigma_{f(1)}\cdots\sigma_{f(n-1)}\sigma_{f(0)}.
\]
This ordering is associated with the Hamiltonian cycle [\sigma_{f(1)}, \sigma_{f(2)}, \ldots, \sigma_{f(n-1)}, \sigma_{f(0)}]. Since a single circular shift yields a new Hamiltonian cycle, any circular shift yields a new Hamiltonian cycle. Thus [\sigma_{f(k)}, \sigma_{f(k+1)}, \ldots, \sigma_{f(k+n-1)}] is a Hamiltonian cycle for any k.

Corollary 6.2.2. Let \Gamma(G, S) be the Cayley graph for a group G and generating set S. Let G_1 be a subgroup of G. If H is a Hamiltonian cycle over the vertices in G_1, then any rotation of H is a Hamiltonian cycle over G_1.

Proof. Apply Lemma 6.2.1 to the Cayley graph \Gamma(G_1, S_1), where S_1 is the subset of S used in H.

Corollary 6.2.3. Let \Gamma(G, S) be the Cayley graph for a group G and generating set S. Let C_1 be a left coset of G. If H is a Hamiltonian cycle over the vertices in C_1, then any rotation of H is a Hamiltonian cycle over C_1.

Proof. Any Hamiltonian cycle over a coset is also a Hamiltonian cycle over the associated subgroup, since left translation preserves the right-multiplication edges of the Cayley graph.

The Lovász Conjecture states that every finite connected Cayley graph contains a Hamiltonian cycle. Many sub-cases of the Lovász Conjecture have been proved; these cases mostly deal with special generating sets for the symmetric group S_n. For a survey of these results see [CG96]. The author is not aware of any results over GL_n(2).
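Before turning to existence, the bit-vector trick mentioned in Section 6.1 can be sketched. Assuming a permutation is stored as a table of n-bit output words (an illustration of the idea, not the dissertation's exact data layout), composing a CNOT after it is one mask, shift and XOR per entry:

```python
def apply_cnot(table, c, t):
    """Compose a CNOT (control bit c, target bit t) after the stored permutation:
    each output word y becomes y XOR (bit c of y shifted to position t)."""
    return [y ^ (((y >> c) & 1) << t) for y in table]
```

Since a CNOT is an involution, applying the same call twice restores the original table, and no generic permutation composition is ever needed for the linear part of the search.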
6.3 Existence

The computational effect of an AND, shift and XOR is the same as applying a single CNOT gate, which is the same as the linear operation of adding one row to another. This will be demonstrated in detail later. It is well known from linear algebra that simple row operations generate any invertible matrix. Thus, the computational speed-up relies on the solution of the following problem.

Problem 6.3.1. Starting from the identity matrix, is it possible to step through all possible invertible matrices in GL_n(2) using only a single row operation at each step? In other words, is there a Hamiltonian cycle over GL_n(2) with the generating set S = \{\tau_{ij}\}, where \tau_{ij} is the operation of adding row i to row j?

For n = 2, the problem turns out to be quite easy. Since \tau_{ij}^2 = 1 for all i, j, the only choice is to alternate between \tau_{12} and \tau_{21}. It turns out that [\tau_{12}, \tau_{21}, \tau_{12}, \tau_{21}, \tau_{12}, \tau_{21}] is a Hamiltonian cycle, as follows:
\[
\begin{pmatrix}1&0\\0&1\end{pmatrix}
\xrightarrow{\tau_{12}}
\begin{pmatrix}1&0\\1&1\end{pmatrix}
\xrightarrow{\tau_{21}}
\begin{pmatrix}0&1\\1&1\end{pmatrix}
\xrightarrow{\tau_{12}}
\begin{pmatrix}0&1\\1&0\end{pmatrix}
\xrightarrow{\tau_{21}}
\begin{pmatrix}1&1\\1&0\end{pmatrix}
\xrightarrow{\tau_{12}}
\begin{pmatrix}1&1\\0&1\end{pmatrix}
\xrightarrow{\tau_{21}}
\begin{pmatrix}1&0\\0&1\end{pmatrix}
\]
Finding a solution for n = 3 turned out to be not so easy. Early attempts to solve this problem found many near misses. Even though the ability to add any row to any other row gives a lot of freedom, the extra freedom made searching very difficult. Appealing to the Lovász conjecture, a solution should exist for any generating set. Consider the following lemma.

Lemma 6.3.2. Let G = GL_n(2) and let \sigma_i be the row operation that adds row i to row i + 1; for i = n, the row operation wraps around and adds row n to row 1. Then S = \{\sigma_i : i \in [1..n]\} is a generating set for G.

Proof. Each \sigma_i = \tau_{i,i+1}. Noting that \tau_{ik}\tau_{kj}\tau_{ik}\tau_{kj} = \tau_{ij}, one can build up any \tau_{ij}.

Thus, in the case where n = 3, instead of considering S = \{\tau_{12}, \tau_{13}, \tau_{23}, \tau_{21}, \tau_{31}, \tau_{32}\}, the Lovász conjecture implies that a Hamiltonian cycle should exist using only S = \{\sigma_1, \sigma_2, \sigma_3\}. Using the reduced generating set, the following sequence of length 24 was found.
When the sequence is repeated 7 times, it generates all 168 matrices in GL_3(2):
\[
\sigma_1\,\sigma_2\,\sigma_1\,\sigma_2\,\sigma_1\,\sigma_2\,\sigma_1\,\sigma_3\;\;
\sigma_1\,\sigma_2\,\sigma_1\,\sigma_2\,\sigma_1\,\sigma_2\,\sigma_1\,\sigma_3\;\;
\sigma_1\,\sigma_2\,\sigma_1\,\sigma_2\,\sigma_1\,\sigma_3\,\sigma_1\,\sigma_2
\]
The cases n = 2, 3 are special in that they cannot be found using the heuristic algorithm presented later.

6.4 Borel Subgroup

Removing one generator from S = \{\sigma_i\}, namely \sigma_n, leaves a set that generates the lower triangular matrices, since rows can only add downward. This subgroup, also known as the Borel subgroup, is one generator away from GL_n(2). Hamiltonian cycles on cosets of this subgroup can be pieced together to form a Hamiltonian cycle for GL_n(2); this is how the heuristic algorithm presented later works.

Theorem 6.4.1. Let B_n be the lower triangular subgroup of GL_n(2) and let S = \{\sigma_i : i \in [1..(n-1)]\}. Then the Cayley graph \Gamma(B_n, S) has a Hamiltonian cycle.

Figure 6.1: Extending the Active Cycle

Proof. For n = 3, one can verify that [\sigma_1, \sigma_2, \sigma_1, \sigma_2, \sigma_1, \sigma_2, \sigma_1, \sigma_2] is a Hamiltonian cycle. By induction, assume that a Hamiltonian cycle exists for B_n. Identify B_n with the subgroup of B_{n+1} that is zero everywhere off the diagonal in the final row. The cosets of B_n partition B_{n+1}, and each coset can be traversed using the Hamiltonian cycle for B_n. Note that \sigma_n commutes with all of the generators of B_n except \sigma_{n-1}. The following process may be used to generate a Hamiltonian cycle for all of B_{n+1}.

1. Start with the Hamiltonian cycle associated with the subgroup B_n. This will be the first state of what will be referred to as the active cycle. Note that all the cosets of B_n may also be traversed by the same transition order as the Hamiltonian cycle for B_n. The goal is to connect each coset cycle to the active cycle, eventually enlarging the active cycle to cover all of B_{n+1}.

2. Starting at the identity e, follow the Hamiltonian path on the active cycle and test each vertex by applying \sigma_n, to see whether \sigma_n transitions to a vertex not on the active cycle.

3.
Let v_g be the first vertex on the active cycle such that v_g\sigma_n is not on the active cycle. Since S is a generating set, the only time this search can fail is when the active cycle already covers the entire group. (Note: in the case where we transition away from the active cycle at the initial point e, one of the incoming or outgoing edges at e must not be \sigma_{n-1}, so the rest of the argument still applies.)

4. Since \sigma_n commutes with all of the generators of B_n except \sigma_{n-1}, the edge transitioning into v_g must be \sigma_{n-1}. See Figure 6.1. Otherwise, commutativity would imply that we would have transitioned away from the active cycle on the previous step.

5. Let \sigma_k be the edge transitioning out of v_g on the active cycle. Then \sigma_k \neq \sigma_n, since \sigma_n must take us to a new coset cycle, and \sigma_k \neq \sigma_{n-1}, since \sigma_{n-1}^2 = e and the vertices on a Hamiltonian path are distinct. Thus \sigma_n\sigma_k = \sigma_k\sigma_n.

6. By Corollary 6.2.3, there is a Hamiltonian cycle over the coset, and by Corollary 6.2.2 we may rotate the cycle so that its last edge is \sigma_k.

7. Since the coset cycle is a Hamiltonian cycle, the product of all its transitions is the identity (it returns to its starting point). Thus, if we do not complete the last transition, the product of all the previous transitions must be \sigma_k^{-1}, which is simply \sigma_k since it has order 2.

8. Extend the active cycle to include the points of the coset cycle by branching out from the active cycle at v_g: continue around the coset's Hamiltonian cycle, stopping before the final transition \sigma_k, and return to the active cycle via \sigma_n. The total action is equivalent to \sigma_n\sigma_k\sigma_n, which equals \sigma_k by commutativity. Thus we return at exactly the next vertex on the active cycle and can complete the trip around the active cycle.

9. If the active cycle now covers all of B_{n+1}, stop. Otherwise, return to step 2 to extend the active cycle by another coset cycle.
6.5 Heuristic Algorithm

Although the evidence is clearly in favor of a Hamiltonian cycle existing for GL_n(2), the author was unable to find a general solution. However, the following heuristic algorithm was used to find Hamiltonian cycles over GL_n(2) for n = 4, 5. In this case note that \sigma_n = \tau_{n,1}: the final generator \sigma_n adds row n to row 1.

The only difference between this algorithm and the algorithm for finding Hamiltonian cycles over the Borel subgroup is that we cannot prove that this algorithm succeeds. It is possible to build an active cycle that cannot be extended but is not the full Hamiltonian cycle.

1. Start with the Hamiltonian cycle associated with the Borel subgroup B_n.

2. Apply \sigma_n to each vertex v_g. If \sigma_n transitions to a coset not contained in the active cycle, and the incoming and outgoing transitions for v_g are not both in \{\sigma_1, \sigma_{n-1}\}, then we have found a vertex from which we can extend the active cycle. NOTE: such a vertex may not exist; this is where the algorithm can fail.

3. Let \sigma_k be the transition with k \neq 1, n-1. If \sigma_k is an incoming transition, then choose the vertex immediately preceding v_g as the branching point.

4. In exactly the same manner as the previous algorithm, splice the new coset cycle into the active cycle in place of \sigma_k.

5. If the active cycle now covers all of GL_n(2), stop. Otherwise, return to step 2 to extend the active cycle by another coset cycle.

6.6 Open Problems

Prove that a Hamiltonian cycle exists over GL_n(2) using only row additions. This could include all possible row additions, not just the reduced set considered here.

Find a "Gray code" style algorithm for the Hamiltonian cycle. The current method only finds a path whose description is essentially a long list of which move to make at each step. For the Gray code, there is a function f(t) which indicates which bit should be flipped at time t. The ideal answer would be a simple function f(t) = (a, b) indicating that the next step adds row a to row b.
Chapter 7
Double Coset Representatives

7.1 Results and Application

This chapter defines a Basis Fixing Permutation (BFP) as a permutation that fixes the zero vector and each vector of weight one. Each double coset contains at least one, and usually many, BFPs. Additional reductions can be applied to find a minimal BFP from a given BFP. Unfortunately, many double cosets still contain multiple minimal BFPs. This set of minimal BFPs is used to identify a given double coset. Once found, minimal BFPs can quickly determine which class a given permutation is in. Unfortunately, finding all the minimal BFPs is computationally intensive.

The major results of this chapter are:

- Every permutation is affine equivalent to a basis fixing permutation (BFP).
- Each coordinate polynomial of a BFP has a single linear term.
- Given a lexicographic ordering and a BFP, a minimal BFP can be quickly computed.
- Affine equivalent representatives for each double coset are used to accelerate the identification algorithms.

7.2 Background

Let us first recall some basic terms used with linear and affine transformations. For the following definitions, let K be a field and the v_i's elements of a vector space over K.

Definition 7.2.1. The linear span of m vectors v_1, \ldots, v_m \in K^n is
\[
\left\{ \sum_{i=1}^m a_i v_i \;:\; a_i \in K \right\}.
\]
Similarly, the affine span of m + 1 vectors v_0, \ldots, v_m \in K^n is
\[
\left\{ \sum_{i=0}^m a_i v_i \;:\; \sum_{i=0}^m a_i = 1,\; a_i \in K \right\}.
\]

Definition 7.2.2. A set of m vectors v_1, \ldots, v_m is linearly independent if
\[
\sum_{i=1}^m a_i v_i = 0 \implies a_i = 0 \text{ for all } i.
\]
Similarly, a set of n + 1 vectors v_0, \ldots, v_n is affinely independent if
\[
\sum_{i=0}^n a_i v_i = 0 \;\text{ and }\; \sum_{i=0}^n a_i = 0 \implies a_i = 0 \text{ for all } i.
\]

Definition 7.2.3. Given n, the affine basis vectors are e_0 = \vec{0} and e_1, \ldots, e_n, where each e_i is a bit vector with zeros in every position except a 1 in the ith position, for i in 1, \ldots, n.

7.3 Basis Fixing Permutations

Definition 7.3.1. A Basis Fixing Permutation (BFP) is a function that sends each affine basis vector to itself. Thus, f is basis fixing if f(\vec{0}) = \vec{0}, f(\vec{e}_1) = \vec{e}_1, \ldots, f(\vec{e}_n) = \vec{e}_n.
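Checking the BFP property costs only n + 1 table lookups. A minimal sketch, assuming the lookup-table representation p[x] = output word for input word x:

```python
def is_bfp(p, n):
    """True iff p fixes the zero vector and every weight-one vector e_i = 1 << i."""
    return p[0] == 0 and all(p[1 << i] == 1 << i for i in range(n))
```

For example, the 3-bit Toffoli gate, which swaps only 011 and 111, is basis fixing, while a permutation exchanging two weight-one vectors is not.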
The first major reduction is the realization that every permutation is affine equivalent to a BFP. This was proved by the author and subsequently discovered to be in [Hou06].

Theorem 7.3.2. Every permutation in S_{2^n} is affine equivalent to a basis fixing permutation.

Proof. Suppose there exists a set of n + 1 affinely independent vectors x_0, x_1, \ldots, x_n such that p(x_0), p(x_1), \ldots, p(x_n) is also affinely independent. Then there exist affine maps a_0, a_1 such that for all i, a_0(e_i) = x_i and a_1(p(x_i)) = e_i. Thus a_1 p a_0(e_i) = e_i for all i, and a_1 p a_0 is therefore basis fixing. The proof thus reduces to finding an affinely independent set that maps to an affinely independent set via p.

We prove by induction that for every k with 0 \le k \le n there exists an affinely independent set X_k such that p(X_k) is also affinely independent. Clearly for k = 0, 1 any X_k and p(X_k) are affinely independent. Assume that for k < n, X_k is affinely independent. Let \langle X_k \rangle_{af} denote the affine span of the (k+1)-element set X_k. Consider the permutation p, restricted as follows:
\[
p : V_{2^n} \setminus \langle X_k \rangle_{af} \to V_{2^n} \setminus p(X_k).
\]
Since
\[
|p(V_{2^n} \setminus \langle X_k \rangle_{af})| + |V_{2^n} \setminus \langle p(X_k) \rangle_{af}| = 2(2^n - 2^k) > 2^n - (k+1) = |V_{2^n} \setminus p(X_k)|,
\]
the intersection p(V_{2^n} \setminus \langle X_k \rangle_{af}) \cap (V_{2^n} \setminus \langle p(X_k) \rangle_{af}) is non-empty, both sets being subsets of V_{2^n} \setminus p(X_k). Thus there exists an x not in the span \langle X_k \rangle_{af} that maps to a p(x) not in the span \langle p(X_k) \rangle_{af}. Adding x to X_k, we create a new set X_{k+1} = X_k \cup \{x\} such that X_{k+1} and p(X_{k+1}) are both affinely independent.

Using the final X_n constructed in this manner, we can now find maps a_0 and a_1 so that a_1 p a_0 is a basis fixing permutation.

Lemma 7.3.3. Let p be a BFP. Then each coordinate function p_i of p has the form p_i(x_1, \ldots, x_n) = x_i + f_i(x_1, \ldots, x_n), where f_i contains no constant or linear terms.

Proof. p(e_0) = e_0 = \vec{0} implies that no coordinate function can contain a constant term. Given that all coordinate functions have no constant terms, p(e_i) = e_i forces the linear part of each p_i to be exactly x_i.

7.4 Basis Permuting Permutations

One important subgroup of GL_n(2) is the group of matrices with a single one in each row and column.
This subgroup is isomorphic to S_n acting on \{1, 2, \ldots, n\} via the injective map \varphi : S_n \to GL_n(2) defined by the outer product sum
\[
\varphi(p) = \sum_{i=1}^n |e_{p(i)}\rangle\langle e_i|.
\]
Thus \varphi(p) is the linear function that maps the basis vector |e_i\rangle to the basis vector |e_{p(i)}\rangle for each i \in \{1, 2, \ldots, n\}.

In AGL_n(2), there is a subgroup isomorphic to S_{n+1} acting on \{0, 1, 2, \ldots, n\}, which allows e_0 to be permuted with the basis vectors. We define this subgroup via the injective map \psi : S_{n+1} \to AGL_n(2) defined by
\[
\psi(q) = \sum_{i=1}^n \big( |e_{q(i)}\rangle + |e_{q(0)}\rangle \big)\langle e_i| \;+\; |e_{q(0)}\rangle.
\]
The preceding affine form is applied to a vector |v\rangle by first multiplying by the matrix component and then adding the vector component. Thus
\[
\psi(q)|v\rangle = \sum_{i=1}^n \big( |e_{q(i)}\rangle + |e_{q(0)}\rangle \big)\langle e_i|v\rangle \;+\; |e_{q(0)}\rangle.
\]

Definition 7.4.1. A Basis Permuting Permutation (BPP) is a function that sends each affine basis vector to another affine basis vector. Thus, p is basis permuting if for all i in 0, 1, \ldots, n, p(\vec{e}_i) = \vec{e}_j for some j in 0, 1, \ldots, n.

Lemma 7.4.2. All of the elements of \psi(S_{n+1}) < AGL_n(2) are basis permuting permutations.

Proof. Given q \in Q = \psi(S_{n+1}), verifying that \psi(q) maps |e_j\rangle to |e_{q(j)}\rangle is sufficient to prove that \psi(q) is a BPP. Recall that
\[
\langle e_i | e_j \rangle =
\begin{cases}
1 & i = j \neq 0,\\
0 & \text{otherwise.}
\end{cases}
\]
When \psi(q) is evaluated at e_0 = \vec{0}, only the affine component |e_{q(0)}\rangle does not zero out, and the result \psi(q)|e_0\rangle = |e_{q(0)}\rangle is as expected. Evaluating \psi(q) at e_j where j \neq 0 gives:
\[
\psi(q)|e_j\rangle = \sum_{i=1}^n \big(|e_{q(i)}\rangle + |e_{q(0)}\rangle\big)\langle e_i|e_j\rangle + |e_{q(0)}\rangle
= |e_{q(j)}\rangle + |e_{q(0)}\rangle + |e_{q(0)}\rangle
= |e_{q(j)}\rangle.
\]
Thus \psi(q) permutes the affine basis vectors, and is therefore a BPP.

Lemma 7.4.3. Any basis fixing permutation conjugated by a basis permuting permutation is a basis fixing permutation.

Proof. Let q be a BPP and p be a BFP. Then for j in \{0, 1, \ldots, n\},
\[
q^{-1} p q(|e_j\rangle) = q^{-1} p(|e_{q(j)}\rangle) = q^{-1}(|e_{q(j)}\rangle) = |e_{q^{-1}(q(j))}\rangle = |e_j\rangle.
\]
Since q^{-1}pq fixes all affine basis vectors, q^{-1}pq is a BFP.

7.5 Minimal Basis Fixing Permutations

Conjugating by the elements of Q < AGL_n(2), we can construct many different BFPs from a given BFP.
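The action of \psi can be verified numerically. The sketch below uses a hypothetical bitmask encoding, e_0 = 0 and e_k = 1 << (k - 1), and applies \psi(q) exactly as displayed above: matrix component first, then the translation by |e_{q(0)}\rangle.

```python
def e(k):
    """Affine basis vectors: e_0 is the zero vector; e_k has a single 1 in position k."""
    return 0 if k == 0 else 1 << (k - 1)

def psi(q, n):
    """Return psi(q) as a map on n-bit words, where q is a permutation of {0, ..., n}."""
    def apply(v):
        out = e(q[0])                      # translation component |e_q(0)>
        for i in range(1, n + 1):
            if (v >> (i - 1)) & 1:         # <e_i | v>
                out ^= e(q[i]) ^ e(q[0])   # matrix column |e_q(i)> + |e_q(0)>
        return out
    return apply
```

As Lemma 7.4.2 asserts, \psi(q) sends every affine basis vector e_j to e_{q(j)}, including e_0.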
Determining which BFP is minimal depends on the lexicographic order chosen. For Gröbner bases, grevlex is preferred, since it is usually easier to compute an ideal with respect to that order.

Definition 7.5.1. The Graded Reverse Lexicographic order, or grevlex, is a polynomial ordering. Monomials are ordered first by total degree; if the degrees are equal, ties are decided by the degrees of the variables in reverse order.

Example 7.5.2. Polynomials ordered by grevlex: x_1x_3 > \ldots > 1.

Being the unique monomial of degree n, x_1x_2\cdots x_n takes a non-zero value at only one point, namely (1, 1, \ldots, 1). Every other monomial is non-zero at an even number of points. Since the coordinate function of a permutation must be balanced, and therefore have an even number of 0's and 1's (for n > 1), x_1x_2\cdots x_n cannot be contained in any coordinate function, because it would make the weight odd and therefore not balanced. Thus we only need to consider monomials of degree up to n - 1 for permutation coordinate functions.

Definition 8.5.1. The multiple rank invariant of an n-bit permutation p is an (n-1)-tuple of R_k invariant differences:
\[
I(p) = \big(R_1(p),\; R_2(p) - R_1(p),\; \ldots,\; R_{n-1}(p) - R_{n-2}(p)\big).
\]

Example 8.5.2. Consider the following permutations.

Let p_1 = (x_1, x_2, x_3 + x_1x_2). As computed earlier, R_1(p_1) = 2 and R_2(p_1) = 3. Thus the multiple rank invariant is the tuple (2, 1).

Let p_2 = (x_1, x_1 + x_2, x_3 + x_1x_2x_4, x_4 + x_1x_2). By inspection, R_1(p_2) = 2, R_2(p_2) = 3 and R_3(p_2) = 4. Thus the multiple rank invariant is the tuple (2, 1, 1).

8.6 Categorization of 3-bit Permutations by MRI

For the four 3-bit equivalence classes, the multiple rank invariant completely distinguishes each class.

Table 8.1: Multiple Rank Invariant of 3-bit Permutations

MRI | # of classes | Description
(3,0) | 1 | Affine
(2,1) | 1 | Toffoli
(1,2) | 1 |
(0,3) | 1 |

We will discover later that the multiple rank invariant also captures the relative complexity of all 3-bit permutations, ordering them in terms of those requiring 0, 1, 2 or 3 Toffoli gates.
8.7 Categorization of 4-bit Permutations by MRI

As seen in Table 8.2, the multiple rank invariant separates many of the 302 4-bit equivalence classes, but fails to distinguish among some of the most prevalent permutation types. When the complexity tree is developed later, we will see that the multiple rank invariant correlates roughly with the nonlinear complexity of a permutation.

8.8 Open Problems

For n \le 4 the multiple rank invariant is always the same for p and p^{-1}, even when they are in different double cosets. Is this true for all n?

Table 8.2: Multiple Rank Invariant of 4-bit Permutations

MRI | # of classes | Description
(4,0,0) | 1 | Affine
(3,1,0) | 1 | Toffoli
(3,0,1) | 1 | Triple controlled NOT
(2,2,0) | 2 |
(2,1,1) | 3 |
(2,0,2) | 4 |
(1,3,0) | 3 |
(1,2,1) | 9 |
(1,1,2) | 20 |
(1,0,3) | 13 |
(0,3,1) | 5 |
(0,2,2) | 39 |
(0,1,3) | 127 |
(0,0,4) | 74 |

Chapter 9
Additional Invariants

9.1 Results and Application

In addition to the multiple rank invariant, there are a handful of other minor invariants that can be used to refine the double coset identification process. This chapter presents a number of these invariants and their associated proofs.

The major results of this chapter are:

- The parity of a permutation is invariant under affine equivalence.
- Definition of a two-way permutation and its use as an invariant.

These additional invariants are used extensively to speed up computation of the affine equivalence complexity tree.

9.2 Parity

The parity of a permutation is often a useful characteristic, and it will play a central role in classifying the complexity of permutations.

Lemma 9.2.1. Any permutation that does not involve all of the bit wires is an even permutation.

Proof. Let p be a permutation that does not involve a certain wire. Without loss of generality, we can assume that the high bit is not involved in the permutation. Then p can be decomposed into two disjoint cycle sets: those whose high bit is 0, and those whose high bit is 1.
Furthermore, since the high bit is not involved in p, the two cycle sets must have exactly the same structure. Thus the overall permutation must be even.

Corollary 9.2.2. For n > 2, the parity of a permutation is invariant under affine equivalence.

Proof. All affine permutations are generated by the NOT gate on any single bit and by CNOT gates between any two bits. When n > 2, there is at least one bit that is not involved in the action of the NOT or the CNOT. Thus by Lemma 9.2.1, CNOT and NOT are both even permutations, and therefore every affine permutation is an even permutation.

As proved earlier, there are an equal number of even and odd double cosets. The parity test is independent of the multiple rank invariant, distinguishing some permutations with identical MRIs.

9.3 Two-Way Permutations

A one-way permutation is a permutation which is easy to compute but whose inverse is hard to compute. Proving the existence of one-way permutations has proven extremely difficult, and is closely related to the question of P =? NP. We will not be solving this problem here, but rather introduce the notion of the opposite of a one-way permutation.

Definition 9.3.1. A permutation p is a two-way permutation if there exist a, b \in AGL_n(2) such that apb = p^{-1}.

Thus a two-way permutation is affine equivalent to its inverse. This implies that if a method were found to compute p (respectively p^{-1}) faster, it would automatically yield a faster method for computing p^{-1} (respectively p). Thus no two-way permutation has a chance of being a one-way permutation.

Lemma 9.3.2. If p is a two-way permutation, then every permutation affine equivalent to p is a two-way permutation.

Proof. Assume p is a two-way permutation. Then there exist a, b \in AGL_n(2) such that apb = p^{-1}. Consider any other element of the affine double coset of p, having the form gph. Then
\[
(gph)^{-1} = h^{-1}p^{-1}g^{-1} = h^{-1}apbg^{-1} = (h^{-1}ag^{-1})(gph)(h^{-1}bg^{-1}).
\]
Thus the inverse of gph is affine equivalent to gph. Therefore gph is a two-way permutation.
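The conjugation identity in the proof above can be traced numerically. In the sketch below the choices are hypothetical illustrations: p is the 3-bit Toffoli gate, which as an involution is trivially two-way with a = b = identity, while g is an affine translation and h a CNOT.

```python
def compose(f, g):
    """Function composition (f o g)(x) = f(g(x)) on permutation lookup tables."""
    return [f[g[x]] for x in range(len(f))]

def inverse(f):
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return inv

# p: the Toffoli gate on 3 wires, an involution (swaps 011 and 111)
p = list(range(8))
p[0b011], p[0b111] = 0b111, 0b011

# two affine permutations: translation by 101, and CNOT from bit 0 into bit 1
g = [x ^ 0b101 for x in range(8)]
h = [x ^ (((x >> 0) & 1) << 1) for x in range(8)]

gph = compose(g, compose(p, h))
```

With a = b = identity, the displayed identity reduces to (gph)^{-1} = (h^{-1}g^{-1})(gph)(h^{-1}g^{-1}), so gph is again affine equivalent to its own inverse.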
Thus if an affine equivalence double coset contains a two-way permutation, the entire double coset is made up of two-way permutations; "two-way-ness" is invariant over the double coset. Now let us consider the relationship between involutions and two-way permutations.

Lemma 9.3.3. If a permutation p is affine equivalent to an involution, then p is a two-way permutation.

Proof. Suppose p is in a double coset containing an involution i. Then there exist a, b \in AGL_n(2) such that apb = i. Thus,
\[
(apb)(apb) = i^2 = 1 \implies pbapba = 1 \implies bapba = p^{-1}.
\]
Therefore p is affine equivalent to p^{-1}, and is therefore a two-way permutation.

It is unknown whether the converse is true. Table 9.1 shows how the parity and two-way invariants further refine permutation identification.

9.4 Open Problems

Is every two-way permutation affine equivalent to an involution? For n \le 4 it is true.

Are there other useful invariants?

Table 9.1: Invariant Table for 4-bit Permutations
Class counts by two-way / not two-way and even / odd parity:
(4,0,0): 1
(3,1,0): 1
(3,0,1): 1
(2,2,0): 2
(2,1,1): 1, 2
(2,0,2): 3, 1
(1,3,0): 3
(1,2,1): 1, 4, 4
(1,1,2): 11, 3, 2, 4
(1,0,3): 4, 9
(0,3,1): 3, 2
(0,2,2): 12, 3, 10, 14
(0,1,3): 14, 35, 42, 36
(0,0,4): 32, 26, 6, 10

Chapter 10
Complexity Theory

10.1 Results

The goal of this chapter is to illustrate the usefulness of determining the complexity of a function by measuring its nonlinearity. The major results of this chapter are as follows:

- Definition of a new complexity class based on the nonlinearity of a computation.
- Proofs relating nonlinear complexity to AC and NC.
- Proofs relating the classical and nonlinear versions of P and P/poly.

10.2 Background

An excellent introduction to the complexity classes relevant to circuits is given in [Vol99]. This section provides important complexity class definitions from that book. To determine a circuit class, we first choose a basis for constructing functions. Let us consider two of the most common bases.

Definition 10.2.1.
B_0 = \{\neg, \vee^2, \wedge^2\} is the standard bounded fan-in basis. B_1 = \{\neg, (\vee^n)_{n \in N}, (\wedge^n)_{n \in N}\} is the standard unbounded fan-in basis.

In B_0 you may only take the AND or OR of two values, whereas in B_1 you may take the AND or OR of as many wires as you wish. The depth of a circuit is the length of the longest path from the input bits to the output bits.

Definition 10.2.2. Let B be a basis and let s, d : N \to N. We define the following complexity classes:

1. SIZE_B(s) is the class of all sets A \subseteq \{0,1\}^* for which there is a circuit family C over basis B of size O(s) that accepts A.

2. DEPTH_B(d) is the class of all sets A \subseteq \{0,1\}^* for which there is a circuit family C over basis B of depth O(d) that accepts A.

3. SIZE-DEPTH_B(s, d) is the class of all sets A \subseteq \{0,1\}^* for which there is a circuit family C over basis B of size O(s) and depth O(d) that accepts A.

Two important complexity classes capture the notion of which circuits can be parallelized efficiently: NC and AC.

Definition 10.2.3. For i \ge 0, define
\[
NC^i = \text{SIZE-DEPTH}_{B_0}(n^{O(1)}, (\log n)^i), \qquad NC = \bigcup_{i \ge 0} NC^i.
\]
Similarly, for i \ge 0, define
\[
AC^i = \text{SIZE-DEPTH}_{B_1}(n^{O(1)}, (\log n)^i), \qquad AC = \bigcup_{i \ge 0} AC^i.
\]
Thus NC and AC both represent polynomial sized circuits with polylog depth. The only difference is that AC allows unbounded fan-in (using basis B_1) as opposed to bounded fan-in for NC (using basis B_0). The following two results are well known in complexity theory.

Theorem 10.2.4. For i \ge 0, NC^i \subseteq AC^i.

Corollary 10.2.5. NC = AC.

For a proof of these results, see [Vol99].

10.3 Nonlinear Complexity Class

Similar to the notions of size and depth for classical circuits, we introduce the notions of width and depth for nonlinear reversible circuits. We also need to introduce the notion of a nonlinear basis. Recall that T_{ijk} is a Toffoli gate with control wires i, j acting on the target wire k.
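Viewed as a permutation of n-bit words, T_{ijk} flips bit k exactly when bits i and j are both set. The sketch below builds such lookup tables and confirms that Toffoli gates on disjoint wires commute, which is what allows several of them to share a single nonlinear round:

```python
def toffoli_perm(n, c1, c2, t):
    """T_{c1, c2, t} on n wires: flip bit t of x when bits c1 and c2 are both set."""
    return [x ^ ((((x >> c1) & (x >> c2)) & 1) << t) for x in range(2 ** n)]

def compose(f, g):
    """(f o g)(x) = f(g(x)) on permutation lookup tables."""
    return [f[g[x]] for x in range(len(f))]
```

On 6 wires, T_{0,1,2} and T_{3,4,5} act on independent triples of wires, so composing them in either order gives the same permutation.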
Also recall that (i, j, A) represents the controlled affine permutation, with i control wires dictating whether or not the affine transformation A on j wires is applied to the next j wires.

Definition 10.3.1. Let C be a reversible circuit acting on n wires. Then B_T = \{T_{ijk} : i, j, k \in [0, n)\} is the Toffoli basis for C, and B_{CA} = \{(i, j, A) : i \in [1, n-1),\; j \in [1, n-i),\; A \in AGL_j(2)\} is the controlled affine basis for C.

The depth of a nonlinear circuit is the number of times the nonlinear basis must be used. Note that multiple nonlinear basis gates may be used in a single round, as long as they operate on independent wires.

Example 10.3.2. The 6-bit permutation computing T_{0,1,2} and T_{3,4,5} can be done in one round of nonlinearity, since the two Toffoli gates act on independent sets of wires.

Definition 10.3.2. Let B be a nonlinear basis and let w, d : N \to N. We define the following complexity classes:

1. WIDTH_B(w) is the class of all functions f : \{0,1\}^* \to \{0,1\}^* for which there is a reversible circuit family C over basis B using O(w) wires that computes f.

2. DEPTH_B(d) is the class of all functions f : \{0,1\}^* \to \{0,1\}^* for which there is a reversible circuit family C over basis B using O(d) rounds of B that computes f.

3. WIDTH-DEPTH_B(w, d) is the class of all functions f : \{0,1\}^* \to \{0,1\}^* for which there is a reversible circuit family C over basis B using O(w) wires and O(d) rounds of B that computes f.

In the same spirit in which NC and AC were defined, we now define the class of nonlinear circuits with polynomial width and polylog depth.

Definition 10.3.3. For i \ge 0, define
\[
NLC^i = \text{WIDTH-DEPTH}_{B_{CA}}(n^{O(1)}, (\log n)^i), \qquad NLC = \bigcup_{i \ge 0} NLC^i.
\]
Note that NLC is defined using the controlled affine basis B_{CA}. Another complexity class similar in nature to NC^i can be defined using the Toffoli basis B_T. Although NLC may seem overpowered, given its ability to perform arbitrary affine transformations between every nonlinear step, we will see that NLC^i fits nicely within the AC hierarchy.

Lemma 10.3.4.
ACi ⊆ NLCi.

Proof. Let C be a circuit in ACi with s(n) gates and depth d(n). Since there are only s(n) gates in the circuit, there are at most s(n) + n different wires to join in a gate at any given time: n for the inputs and s(n) for the outputs of the gates. Thus the maximal fan-in gate of the circuit has fewer than s(n) + n inputs. Construct a nonlinear circuit of width w(n) = (s(n) + 1)(s(n) + n). This space will be allocated according to Table 10.1.

Table 10.1: Space Requirements

Size           Description
n              Initial input.
s(n)(s(n)+n)   Wires to hold the inputs for each of the s(n) gates.
s(n)           Wires to hold the output of each of the s(n) gates.

To populate the circuit with gates, we follow the flow of the original circuit C. Let Gi be the set of gates computed at depth i in the circuit C. Each gate g ∈ Gi is constructed using the following steps.

1. Using CNOTs, copy the values of the inputs of g to the input staging area allocated for g.
2. If g is an OR gate, negate all input wires.
3. Apply a multiple controlled NOT from the inputs of g to the output wire allocated for g.
4. If g is an OR gate, negate the output wire.

Note that step 3 is the only nonlinear step. All CNOTs and negations are affine functions, and can be computed in the linear round between each nonlinear round. Furthermore, since each g ∈ Gi only depends on values from earlier rounds, all g ∈ Gi can be computed in the same nonlinear round. Thus the nonlinear circuit will also have depth d(n). Since C ∈ ACi, s(n) is polynomial, and thus w(n) = (s(n) + 1)(s(n) + n) is also polynomial. Also, C ∈ ACi implies d(n) = O((log n)^i). Therefore ACi ⊆ NLCi.

Lemma 10.3.5. NLCi ⊆ NC_{i+1}.

Proof. Recall that

NLCi = WIDTH-DEPTH_BCA(n^O(1), (log n)^i).

This implies

NLCi = ∪_{k≥0} WIDTH-DEPTH_BCA(n^k, (log n)^i).

We will prove that for each k,

WIDTH-DEPTH_BCA(n^k, (log n)^i) ⊆ NC_{i+1}.

The construction of NLCi as a union will then imply that NLCi ⊆ NC_{i+1}. Let C be a nonlinear circuit in WIDTH-DEPTH_BCA(n^k, (log n)^i).
Then C consists of O((log n)^i) rounds of alternating linearity and nonlinearity on a width of O(n^k) wires. Let us consider the depth, in NC terms, of the linear and nonlinear rounds separately.

In each linear round, an affine matrix acts on the O(n^k) wires. Since each output wire of the affine transformation is essentially the parity of some subset of the wires, and PARITY is in NC1 (polynomially many gates, log depth), the affine transformation can be computed with polynomially many gates and O(log(n^k)) = O(k log n) = O(log n) depth.

In each nonlinear round, multiple controlled affine transformations are applied. For each wire i, compute the output value ai as if the affine transformation associated with it were applied. Similar to the reasoning above, this can be done in O(log n) depth with polynomially many gates. Also, for each i compute the control value ci, which is the AND of the control wires for the affine function associated with i. The AND of O(n^k) inputs can also be computed in O(log n) depth in NC. Let ni be the original value of wire i with no transformation applied. The correct output for wire i is then

(ni ∧ ¬ci) ∨ (ai ∧ ci).

Examination verifies that if all the controls are 1, the output of wire i will be ai, and otherwise ni. This final computation only requires depth 2, so the total depth for the nonlinear round is O(log n).

Since the nonlinear circuit has O((log n)^i) linear and nonlinear rounds, and each round has NC depth O(log n), the entire circuit will have NC depth O((log n)^i) · O(log n) = O((log n)^{i+1}). Since the entire circuit uses only polynomially many gates,

WIDTH-DEPTH_BCA(n^k, (log n)^i) ⊆ NC_{i+1}.

Furthermore, since this inclusion holds for all k, NLCi ⊆ NC_{i+1}.

Theorem 10.3.6. NLC = AC = NC.

Proof. Combining the two previous lemmas, we get

NCi ⊆ ACi ⊆ NLCi ⊆ NC_{i+1} ⊆ AC_{i+1} ⊆ NLC_{i+1}.

Since NC, AC and NLC are all infinite unions, this is sufficient to prove NLC = AC = NC.
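Both proofs above reduce each wire update to the expression (ni ∧ ¬ci) ∨ (ai ∧ ci). As a quick illustrative check (plain Python, not part of the dissertation's code), this is just a depth-2 multiplexer selecting between the original and the transformed value:

```python
# Sanity check of the wire-update formula (n_i AND NOT c_i) OR (a_i AND c_i):
# it selects the transformed value a_i when the control c_i is 1 and keeps
# the original value n_i otherwise, using only depth 2.
def wire_output(n_i, a_i, c_i):
    return (n_i & (1 - c_i)) | (a_i & c_i)

# Exhaustive verification over all bit values.
for n_i in (0, 1):
    for a_i in (0, 1):
        assert wire_output(n_i, a_i, 1) == a_i   # all controls 1: take a_i
        assert wire_output(n_i, a_i, 0) == n_i   # otherwise: keep n_i
```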
10.4 Nonlinear Polynomial Complexity

Consider now the complexity class with polynomial width and a polynomial number of nonlinear rounds.

Theorem 10.4.1. WIDTH-DEPTH_BCA(n^O(1), n^O(1)) = SIZE_B1(n^O(1)).

Proof. For each k, let C be a circuit class in SIZE_B1(n^k). Convert every AND and OR gate into NAND gates, using NOT gates on the inputs and outputs as needed. Let s(n) be the number of NAND gates necessary to compute C. Construct a nonlinear circuit of width n + s(n). The first n wires will contain the function input, and the last s(n) will be used to hold the output from each of the s(n) NAND gates. Populate the circuit by computing the NAND gates in the same order as for the circuit computation, storing each output in the s(n) available positions. In the worst case, each NAND gate must be computed at a separate nonlinear depth. Thus the depth of the nonlinear circuit is also O(n^k). This circuit conversion implies

SIZE_B1(n^k) ⊆ WIDTH-DEPTH_BCA(n^k, n^k).

Since this containment holds for any k,

SIZE_B1(n^O(1)) ⊆ WIDTH-DEPTH_BCA(n^O(1), n^O(1)).

Alternatively, for each k let C be a circuit class in WIDTH-DEPTH_BCA(n^k, n^k). Let us consider the number of gates required for the linear and nonlinear rounds separately. In the linear rounds, we need to compute a matrix multiplication followed by an affine shift. This can be done in O((n^k)^2) = O(n^{2k}) gates. Similarly, in the nonlinear rounds, compute the potential affine output ai for each wire i. This also requires O((n^k)^2) = O(n^{2k}) gates. Also, for each i compute the control value ci, which is the AND of the control wires for the affine function associated with i. Finally, let ni be the original value of wire i with no transformation applied. The correct output for wire i is then

(ni ∧ ¬ci) ∨ (ai ∧ ci).

Using O(n^{2k}) gates for each of the O(n^k) wires potentially requires O(n^{3k}) gates. Combining together all O(n^k) rounds of nonlinearity, we require O(n^{3k}) · O(n^k) = O(n^{4k}) gates.
Thus,

WIDTH-DEPTH_BCA(n^k, n^k) ⊆ SIZE_B1(n^{4k}).

Since this containment holds for any k,

WIDTH-DEPTH_BCA(n^O(1), n^O(1)) ⊆ SIZE_B1(n^O(1)).

Combining the two containments, we now conclude

WIDTH-DEPTH_BCA(n^O(1), n^O(1)) = SIZE_B1(n^O(1)).

10.5 Uniform and Non-Uniform Nonlinear Circuits

Theorem 10.4.1 states that any nonlinear circuit with polynomial width and depth can be converted into a circuit with polynomially many gates. We can therefore derive a relation between nonlinear circuits and the complexity classes P and P/poly. The difference between uniform and non-uniform polynomial circuits is subtle yet important. For a circuit class to be uniform, all circuits must be described by a finite algorithm, or finite set of instructions. Even though the circuit generated by the algorithm may get larger and larger as n increases, the same finite description is used to generate every circuit. For non-uniform circuit classes, there is no blueprint. Each circuit may be different. Even if each circuit is polynomially sized, the collective description of the circuits may be infinite.

Definition 10.5.1. Define the following classes:

P is the class of functions computed by uniform circuit families of polynomial size.
P/poly is the class of functions computed by non-uniform circuit families of polynomial size.
nonlinearP is the class of functions computed by uniform circuits with polynomial width and polynomially many rounds of nonlinearity.
nonlinearP/poly is the class of functions computed by non-uniform circuits with polynomial width and polynomially many rounds of nonlinearity.

Corollary 10.5.2. nonlinearP = P and nonlinearP/poly = P/poly.

Proof. Using Theorem 10.4.1, we can convert back and forth between polynomial circuits and nonlinear circuits with polynomial width and rounds of nonlinearity. Since the conversion description is finite, any uniformity will be preserved.

10.6 Open Problems

The results of this chapter show that measuring the complexity of a function or algorithm via its nonlinearity captures most of the essence of "polynomialness".
It is hoped that this point of view will be useful in developing new complexity proofs.

As quantum computing was part of the inspiration for finding optimal reversible computation, it is interesting to consider what nonlinearity means in the quantum computing context. For quantum computing, only the CNOT and arbitrary single-qubit unitary gates are needed for computation. It is unclear to the author what subset of this computation should be chosen to form the affine equivalence.

Chapter 11

Computation

Due to the doubly exponential nature of boolean permutations, extremely efficient computational methods are needed to perform many of the calculations in this paper. This chapter outlines a number of the more important methods used. To describe an arbitrary function f : Vn → Vn, n·2^n storage bits are required. Even though n-bit permutations theoretically require only ⌈log2(2^n!)⌉ ≈ n·2^n − 2^n storage bits, it is useful to store the permutations using all n·2^n bits, especially since there are cases where we are unsure whether or not we have a permutation. Each of the following algorithms will assume that all n·2^n bits are in a single multi-precision integer. Note that for n = 4, n·2^n = 64, and thus a 4-bit permutation may be stored as a single word on a 64-bit machine. Thus the computation time increases significantly when extending beyond 4-bit permutations.

11.1 Storage of Permutations

There are multiple ways to store bit permutations, and there are algorithmic reasons for preferring each. In this section, we will discuss the three types used in this paper and how to convert between them.

11.1.1 Truth Table Form

We can always describe a permutation using its truth table.
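As an illustrative sketch of the single-integer storage just described (plain Python with hypothetical helper names; the dissertation's actual routines are the Cython listings below), entry f(i) occupies bits [i·n, (i+1)·n) of the word:

```python
# Illustrative sketch of single-integer truth-table storage: the value f(i)
# of an n-bit function occupies bits [i*n, (i+1)*n) of one big integer.
# `pack` and `extract` are hypothetical names, not the dissertation's.
def pack(perm, n):
    w = 0
    for i, v in enumerate(perm):
        w |= v << (i * n)
    return w

def extract(w, i, n):
    # Shift right by i*n bits, then mask off the low n bits.
    return (w >> (i * n)) & ((1 << n) - 1)

# For n = 4 the whole table is n * 2**n = 64 bits: one machine word.
assert 4 * 2**4 == 64

# Round trip on a 3-bit permutation.
perm = [2, 7, 0, 5, 3, 6, 1, 4]
w = pack(perm, 3)
assert [extract(w, i, 3) for i in range(8)] == perm
```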
Recall the example permutation considered earlier:

Table 11.1: Truth Table: f = (a2 ⊕ a0a1, a1 ⊕ 1, a2 ⊕ a0a1 ⊕ a0)

a0a1a2   b0b1b2
0 0 0    0 1 0
0 0 1    1 1 1
0 1 0    0 0 0
0 1 1    1 0 1
1 0 0    0 1 1
1 0 1    1 1 0
1 1 0    1 0 0
1 1 1    0 0 1

Given the values in a particular order, storing the inputs is unnecessary, and we only need to store the outputs of the permutation in the correct order. Since the author works on a little endian architecture, it was easier to store the least significant bits on the right. Thus the truth table is stored in the following order:

f(111) f(110) f(101) f(100) f(011) f(010) f(001) f(000).

Thus the permutation is stored as the binary word

w = 001 100 110 011 101 000 111 010,

with f(000) = 010 occupying the low n bits. Extract the value f(i) by shifting w by i·n bits to the right and then masking off the low n bits:

f(i) = (w >> (i*n)) & ((1 << n) - 1).

[Extraction gap: fragments of the intervening pages survive here, covering example permutation families such as x + b mod m, ax mod m, ax + b mod m, x^{-1} mod m, and ax mod p.]

self.x_mask = <ulong*> malloc(sizeof(ulong)*self.n)
for i in range(self.n):
    self.x_mask[i] = (self.p_mask // ((1 << (1<<(i+1)))-1) ) \
                     * ((1 << (1<<(i)))-1)

def x_test(cls):
    for i in range(cls.n):
        print rbin(cls.x_mask[i], 64)

def p_id(cls):
    """ Mysterious bug: p_id() != p_identity """
    return int(cls.p_identity)

def p_mk(cls):
    return int(cls.p_mask)

################################################################
# Print methods
################################################################

def str_v(cls, v):
    """Binary form of vector
    >>> print Size(3).str_v(0)
    000
    >>> print Size(3).str_v(6)
    011
    """
    return rbin(v, cls.n)

def str_m(cls, m):
    """Binary form of matrix
    >>> print Size(3).str_m(0421)
    100
    010
    001
    >>> print Size(3).str_m(0153)
    110
    101
    100
    """
    s = ''
    for i in xrange(cls.n):
        t = (m >> cls.n*i) & cls.r_mask
        s += rbin(t, cls.n) + '\n'
    s = s[:-1]
    return s

def str_a(cls, a):
    """Binary form of affine matrix
    >>> print Size(3).str_a(07421)
    100
    010
    001
    111
    >>> print Size(3).str_a(04153)
    110
    101
    100
    001
    """
    s = ''
for i in xrange(cls.n + 1): t = (a >> cls.n*i) & cls.r_mask s += rbin(t, cls.n) + ?\n? s = s[:-1] return s def str_p(cls, p): ???Binary form of permutation >>> s = Size(3) >>> print s.str_p(s.p_identity) 0 1 2 3 4 5 6 7 >>> print s.str_p(067452301) 1 0 3 2 5 4 7 6 ??? s = str(p & cls.r_mask) for i in xrange(1, cls.N): s += ? ?+str((p >> (i*cls.n)) & cls.r_mask) return s def str_t(cls, p): ???Binary form of permutation >>> s = Size(3) >>> print s.str_t(s.p_identity), 000 001 010 011 100 101 110 111 >>> print s.str_t(067452301), 001 000 011 010 126 101 100 111 110 ??? s = ?? for i in xrange(cls.N): s += bin((p >> (i*cls.n)) & cls.r_mask, cls.n)+?\n? return s ################################################################ # Conversion methods ################################################################ ??? NOTE: Finite Functions acting on [0,1,...,2**n-1] 0 1 2 3 4 5 ==> 8 7 6 5 4 3 2 1 0 6 7 8 Functions are named as follows _ReturnType_InputTypes_FunctionDescription Possible types are: m matrix v vector r row vector c column vector a affine b bit p permutation ? controlled affine (condition off of most significant bits) ??? cpdef ulong r_c(cls, ulong c): ???Converts column vector to row vector >>> s = Size(3) >>> print s.str_v(s.r_c(s.c_mask)) 111 ??? c ^= c >> (2*cls.n-2) c ^= (c & (cls.r_mask << cls.n)) >> (cls.n - 1) c &= cls.r_mask return c cpdef ulong c_r(cls, ulong r): ???Converts row vector to column vector 127 >>> s = Size(3) >>> print s.str_v(s.r_c(s.c_r(s.r_mask))) 111 ??? r ^= (r >> 1) << cls.n r ^= (r >> 2) << 2*cls.n r &= cls.c_mask return r cpdef ulong p_v(cls, ulong v): ???Returns permutation equal to adding vector v. >>> s = Size(3) >>> print s.str_p(s.p_v(1)) 1 0 3 2 5 4 7 6 ??? return cls.p_identity ^ (v * cls.p_base) cpdef ulong p_m(cls, ulong m): ???Returns permutation equal to matrix multiplication by m. >>> print Size(3).str_p(Size(3).p_m(0153)) 0 3 5 6 1 2 4 7 ??? 
cdef uint i cdef ulong p p = 0 for i in range(cls.N): p ^= cls.r_rm_mul(i, m) << (i * cls.n) return p cpdef ulong p_a(cls, ulong a): ???Permutation equal to affine matrix multiplication by a. >>> print Size(3).str_p(Size(3).p_a(04153)) 4 7 1 2 5 6 0 3 ??? cdef ulong p, i p = 0 for i in range(cls.N): p ^= cls.r_ra_mul(i, a) << (i * cls.n) return p cpdef ulong a_v(cls, ulong v): ???Returns affine matrix equal to adding vector v. >>> print Size(3).str_a(Size(3).a_v(6)) 100 010 128 001 011 ??? return cls.m_identity ^ (v << cls.nn) cpdef ulong m_v(cls, ulong v): ???Return matrix with char polynomial associated with v. >>> print Size(3).str_m(Size(3).m_v(3)) 010 001 110 ??? cdef ulong m m = cls.m_identity << 1 m ^= ( v << (cls.n - 1) * cls.n ) & cls.m_mask return m def a_m(cls, m): return m # TODO: Tensor two permutations together ################################################################ # Coercion methods ################################################################ cpdef ulong v_p(cls, ulong p): ???Returns zero shift of permutation. Coerce - Inverts p_v >>> s = Size(3) >>> print s.str_v(s.v_p(s.p_identity)) 000 ??? return p & cls.r_mask cpdef ulong m_p(cls, ulong p): ???Matrix function of permutation basis. Coerce - Inverts p_m >>> s = Size(3) >>> print s.str_m(s.m_p(s.p_identity)) 100 010 001 ??? cdef uint i 129 m = 0 for i in range(cls.n): m ^= (p >> ( (1 << i) * cls.n) & cls.r_mask) \ << (i * cls.n) return m cpdef ulong a_p(cls, ulong p): ???Returns affine function of permutation basis. Coerce - Inverts p_a >>> s = Size(3) >>> print s.str_a(s.a_p(s.p_identity)) 100 010 001 000 >>> print s.str_a(s.a_p(067452301)) 100 010 001 100 ??? cdef uint i cdef ulong v, a a = 0 v = p & cls.r_mask for i in range(cls.n): a ^= (v ^ (p >> ( (1 << i) * cls.n) & cls.r_mask)) \ << (i * cls.n) return a ^ (v << cls.nn) cpdef ulong v_a(cls, ulong a): ???Returns affine shift. Coerce - Inverts a_v >>> s = Size(3) >>> print s.str_v(s.v_a(07421)) 111 ??? 
return (a >> cls.nn) & cls.r_mask cpdef ulong m_a(cls, ulong a): ???Returns zero shift of permutation. Coerce - Inverts convert.a_m >>> s = Size(3) >>> print s.str_m(s.m_a(07421)) 100 130 010 001 ??? return a & cls.m_mask ################################################################ # Get and set methods ################################################################ cpdef ulong r_pi_get(cls, ulong p, uint index): ???Returns the value p(i) from the permutation p. >>> s = Size(3) >>> print s.str_v(s.r_pi_get(s.p_identity, 4)) 001 ??? return (p >> cls.n*index) & cls.r_mask cpdef ulong b_vi_get(cls, ulong v, uint index): ??? Candidate static method. ??? return (v >> index) & 1 cpdef ulong b_mij_get(cls, ulong m, uint i, uint j): ???Returns the (i,j) entry of the matrix m. >>> s = Size(3) >>> print s.b_mij_get(s.m_identity,0,0) 1 >>> print s.b_mij_get(s.m_identity,1,0) 0 ??? return (m >> (i*cls.n +j)) & 1 ??? _b_ci_get _r_mi_get _c_mi_get _vib_set ??? ################################################################ # Matrix methods ################################################################ cpdef ulong r_rm_mul(cls, ulong r, ulong m): 131 ???Row vector times matrix >>> s = Size(3) >>> print s.str_v(s.r_rm_mul(1, 0421)) 100 >>> print s.str_v(s.r_rm_mul(7, 0437)) 000 >>> print s.str_v(s.r_rm_mul(7, 0467)) 101 ??? cdef ulong v v = cls.c_r(r) m &= v * cls.r_mask return cls.r_m_xor_cols(m) cpdef ulong r_ra_mul(cls, ulong r, ulong a): ???Row vector times affine matrix >>> s = Size(3) >>> print s.str_v(s.r_ra_mul(1, 03421)) 010 >>> print s.str_v(s.r_ra_mul(7, 05437)) 101 >>> print s.str_v(s.r_ra_mul(7, 01467)) 001 ??? cdef ulong v,m v = cls.c_r(r) m = a & (v * cls.r_mask) return cls.r_m_xor_cols(m) ^ (a >> cls.nn) cpdef ulong r_mr_mul(cls, ulong m, ulong r): ???Matrix vector times row >>> s = Size(3) >>> print s.str_v(s.r_mr_mul(0421, 1)) 100 >>> print s.str_v(s.r_mr_mul(0665, 7)) 000 >>> print s.str_v(s.r_mr_mul(0467, 7)) 101 ??? 
cdef ulong v m &= r * cls.c_mask v = cls.c_m_xor_rows(m) return cls.r_c(v) cpdef ulong r_ar_mul(cls, ulong a, ulong r): 132 ???Affine matrix times row vecto First multiply by matrix and then add affine component >>> print Size(3).str_v(Size(3).r_ar_mul(04421, 1)) 101 >>> print Size(3).str_v(Size(3).r_ar_mul(07566, 1)) 110 ??? cdef ulong m, v m = a & (r * cls.c_mask) & cls.m_mask v = cls.c_m_xor_rows(m) return cls.r_c(v) ^ (a >> cls.nn) cpdef ulong m_mm_mul(cls, ulong m1, ulong m2): ???Matrix multiplication >>> s = Size(3) >>> print s.str_m(s.m_mm_mul(0421, 0421)) 100 010 001 >>> print s.str_m(s.m_mm_mul(311, 311)) 101 010 001 ??? cdef uint i cdef ulong f,r,t t = cls.m_m_transpose(m2) r = cls.c_m_xor_rows(m1 & t) for i in range(1, cls.n): f = cls.m_mi_upshift_rows(m1, i) f = cls.c_m_xor_rows(f & t) f = cls.m_mi_upshift_rows(f, cls.n - i) r ^= f << (cls.n - i) r = cls.m_m_ishift_rows(r) return r cpdef ulong a_aa_mul(cls, ulong a1, ulong a2): ???Affine Matrix multiplication >>> print Size(3).str_a(Size(3).a_aa_mul(0421, 0421)) 100 010 001 000 >>> print Size(3).str_a(Size(3).a_aa_mul(07467, 07467)) 133 101 010 001 010 ??? cdef ulong m,v m = cls.m_mm_mul(a1 & cls.m_mask, a2 & cls.m_mask) v = cls.r_ra_mul((a1 >> cls.nn) & cls.r_mask, a2) return m ^ (v << cls.nn) cpdef ulong m_m_inv(cls, ulong m): ???Assuming the matrix is invertible. Returns the inverse. >>> s = Size(3) >>> print s.str_m(s.m_m_inv(0421)) 100 010 001 >>> print s.str_m(s.m_m_inv(143)) 010 001 111 >>> print s.str_m(s.m_m_inv(0124)) 001 010 100 >>> print Size(2).str_m(Size(2).m_m_inv(6)) 01 10 ??? 
cdef ulong m1,m2,sval,oval,val cdef uint i,j m1 = m m2 = cls.m_identity for i in range(cls.n): for j in range(i, cls.n): # pivot a 1 to (i,i) position if cls.b_mij_get(m1, j, i) == 1: break m1 = cls.m_mij_swap_rows(m1, i, j) m2 = cls.m_mij_swap_rows(m2, i, j) val = (m1 >> i) & cls.c_mask # which rows added val ^= 1ull << i*cls.n # zero out pivot position sval = val * ((m1 >> i*cls.n) & cls.r_mask) oval = val * ((m2 >> i*cls.n) & cls.r_mask) m1 ^= sval 134 m2 ^= oval return m2 cpdef ulong a_a_inv(cls, ulong a): ???Affine inverse. Assume multiplication on right. >>> Size(3).a_a_inv(07567) 1003 ??? cdef ulong m,v m = cls.m_m_inv(a & cls.m_mask) v = cls.r_rm_mul(a >> cls.nn, m) return m ^ (v << cls.nn) cpdef ulong m_m_transpose(cls, ulong m): ???Transpose of Bit Matrix >>> Size(3).m_m_transpose(0421) 273 >>> Size(3).m_m_transpose(0153) 143 ??? cdef ulong f cdef uint i f = 0 for i in range(cls.n): t = cls.c_mask & (m >> i) f ^= cls.r_c(t) << (cls.n * i) return f def m_random(cls): ???Returns a random invertible bit matrix >>> s = Size(3); m=s.m_random(); >>> s.m_mm_mul(m, s.m_m_inv(m)) == s.identity True ??? while True: n = rand.randrange(1<>> s = Size(3); a=s.a_random(); >>> s.a_aa_mul(a, s.a_a_inv(a)) == s.identity True ??? 135 while True: n = rand.randrange(1<>> for i in Size(4).density2(): print Size(4).str_v(i) 1100 1010 1001 0110 0101 0011 ??? m = 3 for i in xrange(cls.n-1): current = (m << i) yield current for j in xrange(i+1, cls.n-1): current ^= m << j yield current return def density3(cls): ???Generate all weight 2 words. >>> for i in Size(5).density3(): print Size(5).str_v(i) 11100 11010 11001 10110 10101 10011 01110 01101 01011 00111 ??? 
m = 7 136 for i in xrange(cls.n-2): last = (m << i) current = last yield current for j in xrange(i+1, cls.n-2): for k in xrange(j+1, cls.n-1): current ^= 3 << k yield current last ^= 5 << j current = last yield current return """ ################################################################ # Helper methods ################################################################ cpdef ulong r_m_xor_cols(cls, ulong m): ???XOR the values in each column to form a row vector. >>> s = Size(3) >>> s.r_m_xor_cols(s.m_identity) == s.r_mask True ??? cdef uint i for i in range(1, cls.n): m ^= (m >> (cls.n * i) ) & cls.r_mask m &= cls.r_mask return m cpdef ulong c_m_xor_rows(cls, ulong m): ???XOR the values in each row to form a column vector. >>> s = Size(3) >>> s.c_m_xor_rows(s.m_identity) == s.c_mask True ??? cdef uint i for i in range(1, cls.n): m ^= (m >> i) & cls.c_mask m &= cls.c_mask return m cpdef ulong m_mij_swap_rows(cls, ulong m, uint i, uint j): ??? >>> Size(3).m_mij_swap_rows(0421, 0, 2) 137 84 ??? cdef ulong val val = (m >> cls.n*i) ^ (m >> cls.n*j) val &= cls.r_mask m ^= val << cls.n*i m ^= val << cls.n*j return m cpdef ulong m_mi_upshift_rows(cls, ulong m, int i): ??? Circular shift of rows upward >>> Size(3).m_mi_upshift_rows(0421, 1) 98 ??? return (m>>cls.n*i)^((m<<(cls.n*(cls.n-i)))&cls.m_mask) cpdef ulong m_m_ishift_rows(cls, ulong m): ??? Circular shift ith row by i >>> Size(3).m_m_ishift_rows(0421) 161 ??? cdef uint i cdef ulong f f = m & cls.r_mask for i in range(1, cls.n): val = (m >> cls.n*i) & cls.r_mask val = cls.r_r_circ_shift(val, i) f ^= val << cls.n*i return f cpdef ulong r_r_circ_shift(cls, ulong r, uint i): return ((r>>(cls.n-i)&cls.r_mask)^(r<<(i)))&cls.r_mask cpdef ulong m_rr_outer_product(cls, ulong r1, ulong r2): ??? Outer product >>> s = Size(3) >>> print s.str_m(s.m_rr_outer_product(05, 07)) 111 000 111 ??? r1 = cls.c_r(r1) return r1 * r2 138 ??? m_m_canonical form b_vv_dotproduct ??? 
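The bit-sliced Gaussian elimination in m_m_inv above can be cross-checked against a straightforward GF(2) row reduction. This plain-Python sketch (hypothetical names, not the dissertation's code) stores each row as an integer whose bit j is the entry in column j, and, like m_m_inv, assumes the matrix is invertible:

```python
# Plain-Python GF(2) matrix inverse by Gauss-Jordan elimination, for
# cross-checking the bit-sliced m_m_inv. Rows are Python ints whose bit j
# is the entry in column j. Assumes the matrix is invertible.
def gf2_inv(rows):
    n = len(rows)
    m = list(rows)
    inv = [1 << i for i in range(n)]           # start from the identity
    for col in range(n):
        piv = next(r for r in range(col, n) if (m[r] >> col) & 1)
        m[col], m[piv] = m[piv], m[col]        # pivot a 1 into position
        inv[col], inv[piv] = inv[piv], inv[col]
        for r in range(n):
            if r != col and (m[r] >> col) & 1: # clear the column elsewhere
                m[r] ^= m[col]
                inv[r] ^= inv[col]
    return inv

identity = [1, 2, 4]            # 3x3 identity over GF(2)
assert gf2_inv(identity) == identity
swap = [2, 1]                   # the 2x2 swap matrix is its own inverse
assert gf2_inv(swap) == swap
```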
################################################################ # Permutation methods ################################################################ cpdef ulong p_p_inv(cls, ulong p): ???Assuming the function is a permutation. Returns inverse. >>> print Size(3).str_p(Size(3).p_p_inv(024710536)) 3 4 7 1 6 2 0 5 ??? cdef ulong i cdef ulong f f = 0 for i in range(cls.N): f ^= i << ( ( (p >> i*cls.n) & cls.r_mask ) * cls.n ) return f cpdef ulong p_p_reverse(cls, ulong p): ???Reverses permuation as a list >>> print Size(3).str_p(Size(3).p_p_reverse(024710536)) 2 4 7 1 0 5 3 6 ??? cdef uint i cdef ulong f f = 0 for i in range(cls.N): f ^= ((p>>i*cls.n)&cls.r_mask)<<(cls.N-i-1)*cls.n return f cpdef ulong p_pp_mul(cls, ulong p0, ulong p1): ???Composition function. >>> print Size(3).str_p(Size(3).p_pp_mul(024710536,024710536)) 4 0 7 6 3 2 1 5 >>> print Size(3).str_p(Size(3).p_pp_mul(024710536,053610742)) 3 0 6 2 4 5 1 7 ??? cdef uint i cdef ulong f, p0i, p1j f = 0 for i in range(cls.N): 139 p0i = (p0 >> i*cls.n) & cls.r_mask p1j = (p1 >> p0i*cls.n) & cls.r_mask f ^= p1j << cls.n*i return f cpdef ulong p_pa_mul(cls, ulong p, ulong a): ???Composition function. >>> s = Size(3) >>> print s.str_p(s.p_pa_mul(053610742,07124)) 5 6 0 7 3 4 1 2 ??? cdef uint i cdef ulong f, pi f = 0 for i in range(cls.N): pi = (p >> i*cls.n) & cls.r_mask f ^= cls.r_ra_mul(pi, a) << cls.n*i return f cpdef ulong p_ap_mul(cls, ulong a, ulong p): ???Composition function. (Slower than reversed composition) >>> s = Size(3) >>> print s.str_p(s.p_ap_mul(07124,053610742)) 5 0 6 4 3 7 1 2 ??? 
pa = cls.p_a(a) return cls.p_pp_mul(pa,p) cpdef ulong p_p_fixBasis(cls, ulong p): cdef ulong a while not cls.is_inv_a(cls.a_p(p)): p = cls.p_ap_mul(cls.a_random(), p) a = cls.a_a_inv(cls.a_p(p)) p = cls.p_pa_mul(p,a) return p cpdef uint is_a_pp_mul(cls, ulong p, ulong q): cdef ulong v000,v001,v010,v100,v101,v110,v111 cdef ulong t v000 = (q >> (cls.n * (p&cls.r_mask))) v001 = (q >> (cls.n * ((p >> cls.n) & cls.r_mask))) v010 = (q >> (cls.n * ((p >> 2*cls.n) & cls.r_mask))) v011 = (q >> (cls.n * ((p >> 3*cls.n) & cls.r_mask))) if ((v000 ^ v001 ^ v010 ^ v011) & cls.r_mask) != 0: return 0 140 v100 = (q >> (cls.n * ((p >> 4*cls.n) & cls.r_mask))) v101 = (q >> (cls.n * ((p >> 5*cls.n) & cls.r_mask))) if ((v000 ^ v001 ^ v100 ^ v101) & cls.r_mask) != 0: return 0 v110 = (q >> (cls.n * ((p >> 6*cls.n) & cls.r_mask))) if ((v000 ^ v010 ^ v100 ^ v110) & cls.r_mask) != 0: return 0 v111 = (q >> (cls.n * ((p >> 7*cls.n) & cls.r_mask))) if ((v000 ^ v001 ^ v110 ^ v111) & cls.r_mask) != 0: return 0 t = cls.p_pp_mul(p,q) if cls.is_a_p(t) == 1: # Just test the darn thing then! return t else: return 0 cpdef uint is_inv_a(cls, ulong a): ???Determines whether affine transformation is invertible. >>> Size(3).is_inv_a(02743) False >>> Size(3).is_inv_a(07764) True ??? ai = cls.a_a_inv(a) return cls.a_aa_mul(a,ai) == cls.m_identity cpdef int cmp_p(cls, ulong p0, ulong p1): ???Compare using lexicographic value Examples: >>> Size(2).cmp_p(0xe4,0xd9) True ??? 
cdef uint i cdef ulong val0, val1 for i in range(cls.N): val0 = cls.r_pi_get(p0, i) val1 = cls.r_pi_get(p1, i) if val0 == val1: continue else: return val0 < val1 return 0 cpdef ulong p_random_s(cls): 141 cdef ulong p cdef ulong i,j p = cls.p_identity for i in range(cls.N): j = rand.randrange(i,cls.N) p = cls.p_pij_swap(p,i,j) return p cpdef ulong p_random(cls): cdef ulong p cdef ulong i,j p = cls.p_identity for i in range(cls.N): j = (random() % (cls.N - i)) + i p = cls.p_pij_swap(p,i,j) return p cpdef ulong p_pij_swap(cls, ulong p, uint i, uint j): cdef ulong val val = cls.r_pi_get(p,i) ^ cls.r_pi_get(p,j) p ^= val << cls.n*i p ^= val << cls.n*j return p cpdef ulong p_p_next(cls, ulong p): cdef uint i,j cdef ulong pi,pj i = cls.N - 1 pj = cls.r_pi_get(p,i) while True: i = i -1 if i < 0: return 0 pi = cls.r_pi_get(p, i) if pi < pj: break pj = pi j = cls.N while cls.r_pi_get(p, j-1) <= pi: j = j - 1 p = cls.p_pij_swap(p,i,j-1) 142 i = i + 1 j = cls.N -1 while i>> s = Size(3); p = s.p_identity >>> print s.str_p(s.p_p_cnot(p,0)) 1 0 3 2 5 4 7 6 ??? return p ^ (cls.p_base << target) cpdef ulong p_p_cnot(cls, ulong p, ulong source, ulong target): ???Control NOT >>> s = Size(3); p = s.p_identity >>> print s.str_p(s.p_p_cnot(p,0,1)) 0 3 2 1 4 7 6 5 ??? cdef ulong d d = (p & (cls.p_base << source)) >> source return p ^ (d << target) cpdef ulong p_p_ccnot(cls, ulong p, ulong source1, \ ulong source2, ulong target): ???Double control NOT >>> s = Size(3); p = s.p_identity >>> print s.str_p(s.p_p_ccnot(p,0,1,2)) 0 3 2 1 4 7 6 5 ??? cdef ulong d d = (p & (cls.p_base << source1)) >> source1 d &= (p & (cls.p_base << source2)) >> source2 return p ^ (d << target) cpdef ulong p_p_cccnot(cls, ulong p, ulong source1, \ ulong source2, ulong source3, ulong target): ???Triple control NOT >>> s = Size(4); p = s.p_identity >>> print s.str_p(s.p_p_cccnot(p,0,1,2,3)) ??? 
143 cdef ulong d d = (p & (cls.p_base << source1)) >> source1 d &= (p & (cls.p_base << source2)) >> source2 d &= (p & (cls.p_base << source3)) >> source3 return p ^ (d << target) def Test3(cls, p): L = [] L.append(cls.p_identity) L.append(cls.p_p_ccnot(L[0],0,1,2)) L.append(cls.p_p_ccnot(L[1],0,2,1)) L.append(cls.p_p_ccnot(L[2],1,2,0)) for i in range(4): if cls.DC(p,L[i]): return i return ?Fail? def Test4(cls,p): if cls.b_p_parity(p) == 0: return ?Even: ?+str(cls.Test4Even(p)) else: return ?Odd: ?+str(cls.Test4Odd(p)) def order(cls, p): ???Find order of permutation >>> s=Size(3); s.order(s.p_identity) == 1 True >>> s.order(057136420) 7 ??? d = p i = 1 while True: if d == cls.p_identity: return i d = cls.p_pp_mul(d,p) i += 1 if i > 999: return None ################################################################ # Rank Algorithms ################################################################ 144 cpdef uint b_p_parity(cls, ulong p): ???Return the parity of the permutation. >>> s = Size(4); p = s.p_identity >>> s.b_p_parity(p) 0 >>> p = s.p_random(); s.b_p_parity(s.p_pp_mul(p,p)) 0 >>> print Size(3).b_p_parity(067543210) # Toffoli Gate 1 ??? cdef uint a,c,i,j a = 0; c = 0 for j in range(cls.N): if (a>>j)&1 == 0: c ^= 1 i = j while True: a ^= 1<> 32 if (p & 0xffff) == 0: n += 16 p = p >> 16 if (p & 0xff) == 0: n += 8 p = p >> 8 if (p & 0xf) == 0: n += 4 p = p >> 4 if (p & 0x3) == 0: n += 2 p = p >> 2 if (p & 0x1) == 0: n += 1 return n 145 cpdef uint is_a_p(cls, ulong p): ???Determines whether a permutation is an affine function. >>> print Size(3).is_a_p(Size(3).p_identity) True >>> print Size(3).is_a_p(067543210) # Toffoli Gate False >>> print Size(3).is_a_p(067452301) # NOT Gate on lsb True ??? 
cdef uint i cdef ulong a,v a = 0 v = p & cls.r_mask for i in range(cls.n): a ^= (v ^ (p >> ( (1 << i) * cls.n) & cls.r_mask)) \ << (i * cls.n) a ^= (v << cls.nn) for i in range(cls.N): if cls.r_ra_mul(i,a)!=((p>>(i*cls.n))&cls.r_mask): return 0 return 1 cpdef uint is_a_p4(cls, ulong p): ???Determines whether a permutation is an affine function. >>> print Size(4).is_a_p(Size(4).p_identity) True >>> print Size(3).is_a_p(067543210) # Toffoli Gate False >>> print Size(3).is_a_p(067452301) # NOT Gate on lsb True ??? p ^= ( p & 0xf)*0x1111111111111111ull p ^= ((p >> 4) & 0xf)*0x1010101010101010ull p ^= ((p >> 8) & 0xf)*0x1100110011001100ull if (p & 0xffff) != 0: return 0 p ^= ((p >> 16) & 0xf)*0x1111000011110000ull if (p & 0xffffffff) != 0: return 0 p ^= ((p >> 32) & 0xf)*0x1111111100000000ull return (p == 0) cpdef uint is_t_p(cls, ulong p): ???Determines whether a permutation is a.e. to Toffoli. 146 ??? cdef uint i cdef ulong t,q i = cls.n_p_signature(p) if i == 11: return 1 else: return 0 cpdef uint is_t3_p(cls, ulong p): ???Determines whether a permutation is a.e. to Toffoli. ??? cdef uint i cdef ulong t,q i = cls.n_p_signature(p) if i == 3: return 1 else: return 0 cpdef uint n_p_rank(cls, ulong p): ???Determines rank of the four vector truth tables. ??? 
cdef uint rank, n, a, b rank = 0 while p != 0: n = cls.n_p_lowbit(p) b = n & 0x3 a = n - b p ^= ((p>>b)&0x1111111111111111ull)*((p>>a)&0xf) rank += 1 return rank cpdef ulong p_p_reduce_affine(cls, ulong p): p ^= ( p & 0xf)*0x1111111111111111ull p ^= ((p >> 4) & 0xf)*0x1010101010101010ull p ^= ((p >> 8) & 0xf)*0x1100110011001100ull p ^= ((p >> 16) & 0xf)*0x1111000011110000ull p ^= ((p >> 32) & 0xf)*0x1111111100000000ull return p cpdef ulong p_p_reduce_quadratic(cls, ulong p): p ^= ((p >> 3*4) & 0xf)*0x1000100010001000ull p ^= ((p >> 5*4) & 0xf)*0x1010000010100000ull p ^= ((p >> 6*4) & 0xf)*0x1100000011000000ull 147 p ^= ((p >> 9*4) & 0xf)*0x1010101000000000ull p ^= ((p >> 10*4) & 0xf)*0x1100110000000000ull p ^= ((p >> 12*4) & 0xf)*0x1111000000000000ull return p cpdef ulong p_p_reduce_cubic(cls, ulong p): p ^= ((p >> 7*4) & 0xf)*0x1000000010000000ull p ^= ((p >> 11*4) & 0xf)*0x1000100000000000ull p ^= ((p >> 13*4) & 0xf)*0x1010000000000000ull p ^= ((p >> 14*4) & 0xf)*0x1100000000000000ull return p def rank_signature(cls, ulong p): cdef uint prev, next sig = [ cls.b_p_parity(p) ] prev = cls.n_p_rank(p) p = cls.p_p_reduce_affine(p) next = cls.n_p_rank(p) sig.append(prev - next) prev = next p = cls.p_p_reduce_quadratic(p) next = cls.n_p_rank(p) sig.append(prev - next) prev = next p = cls.p_p_reduce_cubic(p) next = cls.n_p_rank(p) sig.append(prev - next) prev = next return sig cpdef uint n_p_signature(cls, ulong p): cdef uint prev, next, sig p = cls.p_p_reduce_affine(p) sig = 4 - cls.n_p_rank(p) p = cls.p_p_reduce_quadratic(p) sig ^= (4 - sig - cls.n_p_rank(p)) << 3 return sig cpdef uint n_p_signature2(cls, ulong p): cdef uint sig sig = cls.n_p_signature(p) sig ^= cls.DC(p,cls.p_p_inv(p)) << 6 sig ^= cls.b_p_parity(p) << 7 return sig 148 def L_p_signature(cls, ulong p): ???Signature List (Linear, Quadratic, Parity, CommutatorSize, involution, order3, order4) ??? 
    cdef uint sig
    sig = cls.n_p_signature(p)
    L = []
    L.append(sig & 0x7)
    L.append(sig >> 3)
    L.append(cls.b_p_parity(p))
    L.append(len(cls.S_p_affineCommutator(p)))
    L.append(cls.DC(p, cls.p_p_inv(p)))
    """Order does not appear to be working
    L.append(cls.b_p_AE_Order3(p))
    L.append(cls.b_p_AE_Order4(p))
    L.append(cls.b_p_AE_Order5(p))
    L.append(cls.b_p_AE_Order6(p))
    L.append(cls.b_p_AE_Order7(p))
    L.append(cls.b_p_AE_Order8(p))
    L.append(cls.b_p_AE_Order9(p))
    L.append(cls.b_p_AE_Order16(p))
    """
    return L

################################################################
# Double Coset
################################################################

cpdef uint equivDC(cls, ulong a, ulong b):
    cdef uint h
    cdef ulong t
    for h in range(1 << (cls.nn + cls.n)):
        if cls.is_inv_a(h):
            t = cls.p_pa_mul(a, h)
            t = cls.p_pp_mul(t, b)
            if cls.is_a_p_fastfail(t):
                return 1
    return 0

cpdef linear(cls):
    cdef ulong h
    H = set()
    for h in range(1 << (cls.nn + cls.n)):
        ...  # remainder of this routine is missing from the source text

cpdef uint DC(cls, ulong p, ulong q):
    # Header reconstructed (the source text is damaged here); the body
    # parallels DC_Toff below and is invoked elsewhere as cls.DC(p, q).
    cdef ulong* coset
    cdef ulong* qlist
    cdef uint i, j
    coset = <ulong*> malloc(sizeof(ulong)*20160)
    qlist = <ulong*> malloc(sizeof(ulong)*16)
    p = cls.p_p_inv(p)
    coset[0] = p
    p = cls.p_p_4bit_linear(p, coset+1)
    for j in range(16):
        qlist[j] = cls.p_ap_mul(cls.a_v(j), q)
    for i in range(20160):
        for j in range(16):
            # if cls.is_a_pp_mul(coset[i], qlist[j]) != 0:
            if cls.is_a_p4(cls.p_pp_mul(coset[i], qlist[j])) != 0:
                free(coset)
                free(qlist)
                return 1
    free(coset)
    free(qlist)
    return 0

cpdef ulong a_pp_DC(cls, ulong p, ulong q):
    cdef ulong* coset
    cdef ulong* qlist
    cdef ulong t, a, pi
    cdef uint i, j
    coset = <ulong*> malloc(sizeof(ulong)*20160)
    qlist = <ulong*> malloc(sizeof(ulong)*16)
    pi = cls.p_p_inv(p)                          # pi = p^-1
    coset[0] = pi
    cls.p_p_4bit_linear(pi, coset+1)
    for j in range(16):
        qlist[j] = cls.p_ap_mul(cls.a_v(j), q)
    for i in range(20160):
        for j in range(16):
            if cls.is_a_p4(cls.p_pp_mul(coset[i], qlist[j])) != 0:
                t = cls.p_pp_mul(coset[i], qlist[j])
                a = cls.a_a_inv(cls.a_p(t)) << 32
                # a = cls.a_p(cls.p_p_inv(t)) << 32
                # a = cls.a_p(t) << 32
                t = cls.p_pp_mul(p, t)
                t = cls.p_pp_mul(t, cls.p_p_inv(q))
                free(coset)
                free(qlist)
                return a ^ cls.a_p(t)
    free(coset)
    free(qlist)
    return 0

cpdef uint DC_Toff(cls, ulong p, ulong q):
    """Tests if p and q affinely differ by a Toffoli.
    There exists a,b,c such that apbqc=T.
    """
    cdef ulong* coset
    cdef ulong* qlist
    cdef ulong t
    cdef uint i,j
    coset = <ulong*> malloc(sizeof(ulong)*20160)
    qlist = <ulong*> malloc(sizeof(ulong)*16)
    p = cls.p_p_inv(p)
    coset[0] = p
    p = cls.p_p_4bit_linear(p, coset+1)
    for j in range(16):
        qlist[j] = cls.p_ap_mul(cls.a_v(j), q)
    for i in range(20160):
        for j in range(16):
            if cls.is_t_p(cls.p_pp_mul(coset[i], qlist[j])) != 0:
                free(coset)
                free(qlist)
                return 1
    free(coset)
    free(qlist)
    return 0

cpdef uint DC_T3(cls, ulong p, ulong q):
    """Tests if p and q affinely differ by a CCCNOT.
    There exists a,b,c such that apbqc=T^3.
    """
    cdef ulong* coset
    cdef ulong* qlist
    cdef ulong t
    cdef uint i,j
    coset = <ulong*> malloc(sizeof(ulong)*20160)
    qlist = <ulong*> malloc(sizeof(ulong)*16)
    p = cls.p_p_inv(p)
    coset[0] = p
    p = cls.p_p_4bit_linear(p, coset+1)
    for j in range(16):
        qlist[j] = cls.p_ap_mul(cls.a_v(j), q)
    for i in range(20160):
        for j in range(16):
            if cls.is_t3_p(cls.p_pp_mul(coset[i], qlist[j])) != 0:
                free(coset)
                free(qlist)
                return 1
    free(coset)
    free(qlist)
    return 0

# cpdef uint b_p_AE_Involution(cls, ulong p):
def b_p_AE_Involution(cls, p):
    """Tests if p is Affine Equivalent to an involution.
    There exists a,b such that apb=i where i^2=1.
    """
    cdef ulong* coset
    cdef ulong* qlist
    cdef ulong t
    cdef uint i,j
    coset = <ulong*> malloc(sizeof(ulong)*20160)
    qlist = <ulong*> malloc(sizeof(ulong)*16)
    coset[0] = p
    cls.p_p_4bit_linear(p, coset+1)
    for j in range(16):
        qlist[j] = cls.p_ap_mul(cls.a_v(j), p)
    for i in range(20160):
        for j in range(16):
            t = cls.p_pp_mul(coset[i], qlist[j])
            if cls.is_a_p4(t) != 0:
                t = cls.p_p_inv(t)
                t = cls.p_pp_mul(p, t)
                if cls.p_pp_mul(t, t) == cls.p_identity:
                    free(coset)
                    free(qlist)
                    return 1
    free(coset)
    free(qlist)
    return 0

# Hamiltonian cycle
cdef inline ulong a_aij_cnot(cls, ulong a, uint i, uint j):
    return a ^ (((a >> i) & cls.p_base) << j)

cdef ulong p_p_01(cls, ulong p, ulong* coset):
    p ^= ((p >> 1) & cls.p_base); coset[0] = p   # CNOT(1,0)
    p ^= ((p & cls.p_base) << 1); coset[1] = p   # CNOT(0,1)
    p ^= ((p >> 1) & cls.p_base); coset[2] = p   # CNOT(1,0)
    p ^= ((p & cls.p_base) << 1); coset[3] = p   # CNOT(0,1)
    p ^= ((p >> 1) & cls.p_base); coset[4] = p   # CNOT(1,0)
    return p

cdef ulong p_p_10(cls, ulong p, ulong* coset):
    p ^= ((p & cls.p_base) << 1); coset[0] = p   # CNOT(0,1)
    p ^= ((p >> 1) & cls.p_base); coset[1] = p   # CNOT(1,0)
    p ^= ((p & cls.p_base) << 1); coset[2] = p   # CNOT(0,1)
    p ^= ((p >> 1) & cls.p_base); coset[3] = p   # CNOT(1,0)
    p ^= ((p & cls.p_base) << 1); coset[4] = p   # CNOT(0,1)
    return p

cdef ulong p_p_20(cls, ulong p, ulong* coset):
    cdef ulong mask
    mask = cls.p_base << 2
    p = cls.p_p_10(p, coset)
    p ^= ((p & mask) >> 1); coset[5] = p         # CNOT(2,1)
    p = cls.p_p_10(p, coset+6)
    p ^= ((p & mask) >> 2); coset[11] = p        # CNOT(2,0)
    p = cls.p_p_10(p, coset+12)
    p ^= ((p & mask) >> 1); coset[17] = p        # CNOT(2,1)
    p = cls.p_p_10(p, coset+18)
    return p

cdef ulong p_p_21(cls, ulong p, ulong* coset):
    cdef ulong mask
    mask = cls.p_base << 2
    p = cls.p_p_01(p, coset)
    p ^= ((p & mask) >> 2); coset[5] = p         # CNOT(2,0)
    p = cls.p_p_01(p, coset+6)
    p ^= ((p & mask) >> 1); coset[11] = p        # CNOT(2,1)
    p = cls.p_p_01(p, coset+12)
    p ^= ((p & mask) >> 2); coset[17] = p        # CNOT(2,0)
    p = cls.p_p_01(p, coset+18)
    return p

cdef ulong p_p_0212(cls, ulong p, ulong* coset):
    cdef uint i,j
    for i in range(7):
        j = ((i >> 2) ^ i) & 1                   # produces pattern 0101101
        if j == 0:
            p = cls.p_p_20(p, coset+i*24)
        else:
            p = cls.p_p_21(p, coset+i*24)
        if i < 6:
            p = cls.a_aij_cnot(p, j, 2); coset[(i+1)*24-1] = p
    return p

cdef ulong p_p_30(cls, ulong p, ulong* coset):
    p = cls.p_p_0212(p, coset)
    p = cls.a_aij_cnot(p,3,2); coset[168-1] = p
    p = cls.p_p_0212(p, coset+168)
    p = cls.a_aij_cnot(p,3,1); coset[2*168-1] = p
    p = cls.p_p_0212(p, coset+2*168)
    p = cls.a_aij_cnot(p,3,2); coset[3*168-1] = p
    p = cls.p_p_0212(p, coset+3*168)
    p = cls.a_aij_cnot(p,3,0); coset[4*168-1] = p
    p = cls.p_p_0212(p, coset+4*168)
    p = cls.a_aij_cnot(p,3,2); coset[5*168-1] = p
    p = cls.p_p_0212(p, coset+5*168)
    p = cls.a_aij_cnot(p,3,1); coset[6*168-1] = p
    p = cls.p_p_0212(p, coset+6*168)
    p = cls.a_aij_cnot(p,3,2); coset[7*168-1] = p
    p = cls.p_p_0212(p, coset+7*168)
    return p

cdef ulong p_p_31(cls, ulong p, ulong* coset):
    p = cls.p_p_0212(p, coset)
    p = cls.a_aij_cnot(p,3,2); coset[168-1] = p
    p = cls.p_p_0212(p, coset+168)
    p = cls.a_aij_cnot(p,3,0); coset[2*168-1] = p
    p = cls.p_p_0212(p, coset+2*168)
    p = cls.a_aij_cnot(p,3,2); coset[3*168-1] = p
    p = cls.p_p_0212(p, coset+3*168)
    p = cls.a_aij_cnot(p,3,1); coset[4*168-1] = p
    p = cls.p_p_0212(p, coset+4*168)
    p = cls.a_aij_cnot(p,3,2); coset[5*168-1] = p
    p = cls.p_p_0212(p, coset+5*168)
    p = cls.a_aij_cnot(p,3,0); coset[6*168-1] = p
    p = cls.p_p_0212(p, coset+6*168)
    p = cls.a_aij_cnot(p,3,2); coset[7*168-1] = p
    p = cls.p_p_0212(p, coset+7*168)
    return p

cdef ulong p_p_4bit_linear(cls, ulong p, ulong* coset):
    p = cls.p_p_30(p, coset)
    p = cls.a_aij_cnot(p,0,3); coset[1*1344-1] = p
    p = cls.p_p_31(p, coset+1344)
    p = cls.a_aij_cnot(p,2,3); coset[2*1344-1] = p
    p = cls.p_p_31(p, coset+2*1344)
    p = cls.a_aij_cnot(p,0,3); coset[3*1344-1] = p
    p = cls.p_p_30(p, coset+3*1344)
    p = cls.a_aij_cnot(p,1,3); coset[4*1344-1] = p
    p = cls.p_p_31(p, coset+4*1344)
    p = cls.a_aij_cnot(p,2,3); coset[5*1344-1] = p
    p = cls.p_p_31(p, coset+5*1344)
    p = cls.a_aij_cnot(p,0,3); coset[6*1344-1] = p
    p = cls.p_p_30(p, coset+6*1344)
    p = cls.a_aij_cnot(p,1,3); coset[7*1344-1] = p
    p = cls.p_p_31(p, coset+7*1344)
    p = cls.a_aij_cnot(p,0,3); coset[8*1344-1] = p
    p = cls.p_p_31(p, coset+8*1344)
    p = cls.a_aij_cnot(p,2,3); coset[9*1344-1] = p
    p = cls.p_p_30(p, coset+9*1344)
    p = cls.a_aij_cnot(p,1,3); coset[10*1344-1] = p
    p = cls.p_p_31(p, coset+10*1344)
    p = cls.a_aij_cnot(p,0,3); coset[11*1344-1] = p
    p = cls.p_p_31(p, coset+11*1344)
    p = cls.a_aij_cnot(p,2,3); coset[12*1344-1] = p
    p = cls.p_p_30(p, coset+12*1344)
    p = cls.a_aij_cnot(p,0,3); coset[13*1344-1] = p
    p = cls.p_p_30(p, coset+13*1344)
    p = cls.a_aij_cnot(p,3,2)                    # One wasted step.
    p = cls.a_aij_cnot(p,2,3); coset[14*1344-1] = p
    p = cls.p_p_30(p, coset+14*1344)
    return p

###############################################################
# Set Manipulators
###############################################################

cdef ulong* A_S(cls, S):
    cdef ulong* A
    cdef uint i,n
    n = len(S)
    A = <ulong*> malloc(sizeof(ulong)*n)
    i = 0
    for x in S:
        A[i] = x
        i += 1
    return A

cdef S_An(cls, ulong* A, uint n):
    cdef uint i
    S = set()
    for i in range(n):
        S.add(A[i])
    return S

cdef ulong* A_An_inv(cls, ulong* A, uint n):
    cdef uint i
    for i in range(n):
        A[i] = cls.p_p_inv(A[i])
    return A

cdef ulong* A_p_leftAffineCoset(cls, ulong p):
    cdef ulong* coset
    cdef uint i,j
    coset = <ulong*> malloc(sizeof(ulong)*20160*16)
    coset[0] = p
    p = cls.p_p_4bit_linear(p, coset+1)
    for i in range(16):
        for j in range(20160):
            coset[20160*i+j] = coset[j] ^ (cls.p_base * i)
    return coset

def S_p_leftAffineCoset(cls, ulong p):
    cdef ulong* coset
    coset = cls.A_p_leftAffineCoset(p)
    return cls.S_An(coset, 322560)

cdef ulong* A_p_leftLinearCoset(cls, ulong p):
    cdef ulong* coset
    cdef uint i,j
    coset = <ulong*> malloc(sizeof(ulong)*20160)
    coset[0] = p
    p = cls.p_p_4bit_linear(p, coset+1)
    return coset

def S_p_leftLinearCoset(cls, ulong p):
    cdef ulong* coset
    coset = cls.A_p_leftLinearCoset(p)
    return cls.S_An(coset, 20160)

cdef ulong* A_p_affineCommutator(cls, ulong p):
    cdef ulong* S
    cdef ulong* C
    cdef uint i, count
    C = <ulong*> malloc(sizeof(ulong)*(20160*16+1))
    S = cls.A_p_leftAffineCoset(p)               # pH
    p = cls.p_p_inv(p)
    count = 1
    for i in range(322560):
        if cls.is_a_pp_mul(S[i], p) != 0:        # pHp^-1
            C[count] = cls.p_pp_mul(S[i], p)
            count += 1
    C[0] = count - 1
    free(S)
    return C

def S_p_affineCommutator(cls, ulong p):
    cdef ulong* S
    S = cls.A_p_affineCommutator(p)
    result = cls.S_An(S+1, S[0])
    free(S)
    return result

cdef ulong* A_p_linearCommutator(cls, ulong p):
    cdef ulong* S
    cdef ulong* C
    cdef uint i, count
    C = <ulong*> malloc(sizeof(ulong)*(20160+1))
    S = cls.A_p_leftLinearCoset(p)               # pH
    p = cls.p_p_inv(p)
    count = 1
    for i in range(20160):
        if cls.is_a_pp_mul(S[i], p) != 0:        # pHp^-1
            if cls.v_p(cls.p_pp_mul(S[i], p)) == 0:
                C[count] = cls.p_pp_mul(S[i], p)
                count += 1
    C[0] = count - 1
    free(S)
    return C

def S_p_linearCommutator(cls, ulong p):
    cdef ulong* S
    S = cls.A_p_linearCommutator(p)
    result = cls.S_An(S+1, S[0])
    free(S)
    return result

def S_p_affineTransversal(cls, ulong p):
    cdef ulong t
    H = cls.S_p_affineCommutator(p)
    G = cls.S_p_leftAffineCoset(cls.p_identity)
    for h in H:
        G.remove(h)
    H.remove(cls.p_identity)                     # t*id will be removed by pop()
    T = set([ cls.p_identity ])
    while len(G) > 0:
        t = G.pop()
        for h in H:
            G.remove(cls.p_pp_mul(t,h))
        T.add(t)
    return T

def S_p_linearTransversalToff(cls):
    """Affine shifts commute through a Toffoli gate into linear.
    Thus, the transversal can be only linear functions.
    """
    cdef ulong t,p
    p = 0xbedcfa9836547210                       # Toffoli
    H = cls.S_p_linearCommutator(p)
    G = cls.S_p_leftLinearCoset(cls.p_identity)
    for h in H:
        G.remove(h)
    H.remove(cls.p_identity)                     # t*id will be removed by pop()
    T = set([ cls.p_identity ])
    while len(G) > 0:
        t = G.pop()
        for h in H:
            G.remove(cls.p_pp_mul(t,h))
        T.add(t)
    return T

def is_pS_DC(cls, ulong p, S):
    cdef ulong s
    cdef uint sig
    sig = cls.n_p_signature(p)
    for s in S:
        if cls.n_p_signature(s) != sig:
            continue
        if cls.DC(p, s) != 0:
            return 1
    return 0

def p_pS_DC(cls, ulong p, S):
    cdef ulong s
    cdef uint sig
    sig = cls.n_p_signature(p)
    for s in S:
        if cls.n_p_signature(s) != sig:
            continue
        if cls.DC(p, s) != 0:
            return s
    return 0

cdef ulong is_pAn_DC(cls, ulong p, ulong* A, uint n):
    cdef ulong* coset
    cdef ulong t
    cdef uint i,j,k
    coset = <ulong*> malloc(sizeof(ulong)*20160)
    p = cls.p_p_inv(p)
    coset[0] = p
    p = cls.p_p_4bit_linear(p, coset+1)
    for i in range(20160):
        for j in range(16):
            t = coset[i] ^ (cls.p_base * j)
            for k in range(n):
                if cls.is_a_pp_mul(t, A[k]) != 0:
                    free(coset)
                    return 1
    free(coset)
    return 0

"""
Is Basis Fixing
Is Normal (zero fixing)
Is involution
Cycle Index Polynomial
Conjugation
Random
Hilbert polynomial
    One and two variable
"""

if __name__ == '__main__':
    import doctest, sys
    doctest.testmod(sys.modules[__name__])
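The signature routines above (n_p_rank and the p_p_reduce_* reductions) all come down to rank computations over GF(2) on truth-table bit-vectors. The sketch below is not part of the dissertation's code: it is a plain-Python version of the same rank computation, using arbitrary-precision ints instead of the packed 64-bit representation, applied to the coordinate truth tables of the 4-bit identity permutation.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a list of integers viewed as bit-vectors."""
    pivots = {}                      # leading-bit position -> reduced pivot row
    rank = 0
    for v in rows:
        while v:
            h = v.bit_length() - 1   # position of the leading set bit
            if h in pivots:
                v ^= pivots[h]       # eliminate the leading bit
            else:
                pivots[h] = v        # v becomes a new pivot row
                rank += 1
                break
    return rank

# Truth tables (over x = 0..15) of the coordinate functions of the 4-bit
# identity permutation: bit x of t[i] is bit i of x.
t = [sum(((x >> i) & 1) << x for x in range(16)) for i in range(4)]
```

An affine 4-bit permutation's coordinate truth tables lie in the 5-dimensional span of the constant 1 and the four coordinates x0..x3; a nonlinear coordinate such as x0*x1 (truth table 0x8888) falls outside that span and raises the rank, which is the fact the affine test exploits.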
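The coset walk p_p_4bit_linear enumerates all 20160 elements of GL(4,2) by applying one CNOT per step, a Hamiltonian path in the Cayley graph on CNOT generators. The following is an independent cross-check, not taken from the dissertation: a straightforward breadth-first search verifying that CNOT gates on n wires generate exactly GL(n,2) (|GL(2,2)| = 6, |GL(3,2)| = 168; the same computation at n = 4 yields the constant 20160 hard-wired throughout the coset routines).

```python
def cnot_group_size(n):
    """Count the permutations of {0,..,2^n-1} generated by CNOT gates.

    A CNOT with control i and target j maps y to y ^ (bit_i(y) << j);
    these are the elementary transvections, which generate GL(n, 2).
    """
    N = 1 << n

    def cnot(p, i, j):
        # compose CNOT(i, j) after the permutation p
        return tuple(y ^ (((y >> i) & 1) << j) for y in p)

    start = tuple(range(N))          # the identity permutation
    seen = {start}
    frontier = [start]
    while frontier:
        nxt = []
        for p in frontier:
            for i in range(n):
                for j in range(n):
                    if i == j:
                        continue
                    q = cnot(p, i, j)
                    if q not in seen:
                        seen.add(q)
                        nxt.append(q)
        frontier = nxt
    return len(seen)
```

The BFS visits each group element once but revisits edges; the Hamiltonian walk in the code above is tighter, reaching each of the 20160 elements with exactly one gate application per step.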
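DC and its variants decide affine equivalence (does q = a o p o b hold for affine a, b?) by enumerating one factor's affine coset. The same double-coset logic can be written down directly at 3 bits, where AGL(3,2) has only 1344 elements and brute force is instant. This is a pure-Python sketch with hypothetical helper names of my own, not the dissertation's routines; it uses the identity a = q o b^-1 o p^-1.

```python
from itertools import product

def affine_perms(n):
    """All invertible affine maps x -> Mx ^ c on GF(2)^n, as permutation
    tuples p with p[x] the image of x."""
    N = 1 << n
    out = []
    for cols in product(range(N), repeat=n):     # candidate columns of M
        lin = []
        for x in range(N):
            y = 0
            for j in range(n):
                if (x >> j) & 1:
                    y ^= cols[j]                 # Mx = XOR of selected columns
            lin.append(y)
        if len(set(lin)) == N:                   # keep M only if invertible
            for c in range(N):
                out.append(tuple(v ^ c for v in lin))
    return out

def inv(p):
    q = [0] * len(p)
    for x, y in enumerate(p):
        q[y] = x
    return tuple(q)

def compose(p, q):
    """(p o q)[x] = p[q[x]]"""
    return tuple(p[y] for y in q)

def affine_equivalent(p, q, n):
    """Double-coset test: is q = a o p o b for some affine a, b?
    Equivalently: is q o b^-1 o p^-1 affine for some affine b?"""
    A = affine_perms(n)
    aset = set(A)
    pi = inv(p)
    return any(compose(compose(q, inv(b)), pi) in aset for b in A)
```

For example, the 3-bit Toffoli permutation (which swaps 6 and 7) is not affine equivalent to the identity, since a o id o b is itself affine; this mirrors what the signature-based tests detect at 4 bits.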