ABSTRACT

Title of dissertation: APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN LEARNING QUANTUM SYSTEMS

Ruizhi Pan, Doctor of Philosophy, 2023

Dissertation directed by: Professor Charles W. Clark, Department of Physics

Quantum machine learning is an emerging field that combines techniques from the disciplines of machine learning (ML) and quantum physics. Research in this field takes three broad forms: applications of classical ML techniques to quantum physical systems, quantum computing and algorithms for classical ML problems, and new ideas inspired by the intersection of the two disciplines. In this work we focus mainly on the power of artificial neural networks (NNs) in quantum-state representation and phase classification. In the first part of the dissertation, we study NN quantum states, which are used as wave-function ansätze in quantum many-body physics. These states have successfully simulated low-lying eigenstates and short-time unitary dynamics of quantum systems, and they efficiently represent particular states such as those with a stabilizer nature. Even so, a more rigorous quantitative analysis of their expressibility and complexity is warranted. Here, our analysis of the restricted Boltzmann machine (RBM) state representation of one-dimensional (1D) quantum spin systems provides new insight into their computational complexity. We define a class of long-range-fast-decay (LRFD) RBM states with quantifiable upper bounds on truncation errors. We also provide numerical evidence that the ground states of a large class of 1D quantum systems may be approximated by LRFD RBMs of at most polynomial complexity. These results lead us to conjecture that the ground states of a wide range of quantum systems may be exactly represented by LRFD RBMs or a variant of them, even in cases where other state representations become less efficient. Finally, we describe the relations among multiple typical state manifolds.
Our work proposes a paradigm for complexity analysis of generic long-range RBMs, which naturally yields a further classification of this manifold. This paradigm and our characterization of the nonlocal structures of these RBMs may pave the way for understanding the natural measure of complexity for quantum many-body states described by RBMs. Both generalize to higher-dimensional systems and to deep neural-network quantum states. In the second part, we use RBMs to investigate, in dimensions D = 1 and 2, the many-body excitations of long-range power-law interacting quantum spin models. We develop an energy-shift method to calculate the excited states of such spin models and obtain a high-precision momentum-resolved low-energy spectrum. This enables us to identify numerically the critical exponent at which the maximal quasiparticle group velocity changes from finite to divergent in the thermodynamic limit. In D = 1, the results agree with an analysis based on field theory and semiclassical spin-wave theory. Furthermore, we generalize the RBM method for learning excited states in nonzero-momentum sectors from 1D to 2D systems. Finally, we analyze and provide all possible values (3/2, 2 and 3) of the critical exponent for generic 1D quadratic bosonic and fermionic Hamiltonians with long-range hoppings and pairings, which helps in understanding the speed of information propagation in quantum systems. In the third part, we study deep NNs as phase classifiers. We first analyze the phase diagram of a 2D topologically nontrivial fermionic model Hamiltonian with pairing terms. We then demonstrate that deep NNs can learn the band-gap closing conditions based only on wave-function samples of several typical energy eigenstates. They are therefore able to identify the phase-transition point without knowledge of the Hamiltonian.
APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN LEARNING QUANTUM SYSTEMS

by

Ruizhi Pan

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2023

Advisory Committee:
Professor Mohammad Hafezi, Chair
Professor Charles W. Clark, Co-Chair/Advisor
Professor Victor Yakovenko
Professor Alexey Gorshkov
Professor Christopher Jarzynski, Dean's representative

© Copyright by Ruizhi Pan 2023

Acknowledgments

I would first like to express my heartfelt gratitude to my advisor Charles Clark, who has supported me throughout my Ph.D. study with great patience and encouragement. I really appreciate the freedom Charles gave me in choosing research topics and exploring them at my own pace. I am very grateful for his guidance on how to delve into a specific research area, and for the so-called paper-torture sessions in which we discussed manuscript writing in detail. In fact, they were not torture for me; I really enjoyed and cherished the process. Working under the guidance of a knowledgeable, prestigious and amiable professor was my long-cherished wish, and I think it was fulfilled at the University of Maryland, College Park. I would like to thank the many professors at the Joint Quantum Institute, the Joint Center for Quantum Information and Computer Science and other research institutes who offered me tremendous help in my Ph.D. research, including Alexey Gorshkov, Victor Yakovenko, Mohammad Hafezi, Andrew Childs, Ian Spielman, Sankar Das Sarma, Jay Deep Sau, Ana Maria Rey, Christopher Jarzynski, Jacob Taylor, Chris Greene, Ricardo Nochetto, Xiaoji Zhou, and Xuzong Chen. I really appreciate their willingness to share their knowledge and their enthusiasm for science.
I also want to thank my friends in the departments of physics, mathematics and computer science: Wenbo Li, Fangli Liu, Yi-Hsieh Wang, Yiming Cai, Peizhi Du, Bin Cao, Yuchen Yue, Tongyang Li, Linfeng Zhang, Dong-ling Deng, Yidan Wang, Zhexuan Gong, Renxiong Wang, Xunnong Xu, Haitan Xu, Chunxiao Liu, Alejandra Maldonado-Trapp, Ben Eller, Tengfei Su, Peng Zhang and many other colleagues. Their support and encouragement give me warmth and courage. Finally, I would like to thank my parents. Their love and support are my permanent source of strength to explore the world.

Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Chapter 1: Introduction
  1.1 Background
  1.2 Outline of dissertation
Chapter 2: Efficiency of neural-network state representations of one-dimensional quantum spin systems
  2.1 Introduction
  2.2 The restricted Boltzmann machine as a wave-function ansatz
    2.2.1 Long-range-fast-decay RBMs
    2.2.2 Effects of wave-function truncation for fixed system sizes
    2.2.3 Scaling of complexity
    2.2.4 Spin-correlation information
  2.3 Ground-state applications
  2.4 State manifolds and complexity classification
  2.5 Transformation from RBMs to MPSs
  2.6 Conclusion
Chapter 3: Learning quasiparticle excitations in long-range interacting quantum systems using neural networks
  3.1 Introduction
  3.2 Learning excited states with RBMs
  3.3 Group velocity transition
  3.4 Correlations in excited states
  3.5 2D generalization
  3.6 Summary of quadratic Hamiltonians with long-range hoppings and pairings
  3.7 Experimental relevance
  3.8 Conclusion
Chapter 4: Deep neural networks as phase classifiers
  4.1 Introduction
  4.2 Model and physical intuition
  4.3 Zero-energy edge modes
  4.4 Phase diagram at the sweet spot
  4.5 Application of neural networks as phase classifiers
  4.6 Conclusion
Appendix A: Appendix for Chapter "Efficiency of neural-network state representations of one-dimensional quantum spin systems"
  A.1 Proof of the convergence of $|\Psi^{(L,\infty)}\rangle$ for long-range-fast-decay RBMs
  A.2 Proof of upper bounds on truncation errors for LRFD RBMs
  A.3 Proof of spin correlation formula
  A.4 LRFD RBMs approximating the Kronecker delta function
  A.5 Error curves
Appendix B: Appendix for Chapter "Learning quasiparticle excitations in long-range interacting quantum systems using neural networks"
  B.1 Analysis of energy-shift method
  B.2 Field-theory formula analysis and data collapse
Appendix C: Appendix for Chapter "Neural networks as phase classifiers"
  C.1 Hamiltonian in Majorana representation
  C.2 Derivation of the phase boundary between gapped SC and gapless phases
Appendix D: Other research projects
  D.1 Optomechanics and novel quantum phase of the Bose-Einstein Condensate with the cavity mediated spin-orbit coupling
Bibliography

List of Tables

2.1 Complexity estimates for distinct typical settings of $\mu(r)$ and $\lambda(\tilde{k})$. "−" in the $\mu(r)$ column denotes all $\mu(r)$ functions that make $Q(L)$ converge as $L \to \infty$. "−" in the $\lambda(\tilde{k})$ column denotes all $\lambda(\tilde{k})$ functions that give $P(N_h/L)$ the asymptotic behavior $O(1/\ln(N_h/L))$ as $N_h \to \infty$. In all settings, $\delta_P > 1$ and $\alpha_P > 1/2$, and each entry describes the asymptotic behavior of the corresponding function.
2.2 Failure of the algorithm for finding an optimal transformation from an RBM to an MPS. The RBM to be transformed has the network structure shown in Fig. 2.6. The original algorithm uses a random selection of "separation sets".
2.3 Success of generating an MPS using our improved algorithm. The RBM to be transformed is the same as that in Table 2.2.
3.1 The critical exponent $\alpha_c$ for generic fermionic quadratic Hamiltonians with long-range hoppings and pairings. $d_j$ with $1 \le j \le 7$ are constants dependent on $J_0$, $J_1$, $\Delta$ and $\alpha$.
3.2 The critical exponent $\alpha_c$ for generic bosonic quadratic Hamiltonians with long-range hoppings and pairings. $d'_j$ with $1 \le j \le 9$ are constants dependent on $J_0$, $J_1$, $\Delta$ and $\alpha$.
List of Figures

2.1 (a) Network structure of RBMs as a wave-function ansatz for 1D quantum spin systems. Long-range RBMs usually imply full connectivity between the visible and hidden layers. The hidden layer is divided into $N_h/L$ levels, each containing $L$ hidden nodes. (b) Importance measure $\eta(j, \tilde{k}, L)$ for an LRFD RBM with translational symmetry. The RBM is constructed as in Eqs. (2.24)–(2.26), where $\lambda(\tilde{k}) = \tilde{k}^{-\alpha_P}$, $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for $r \neq 0$, $\mu(0) = \delta_Q = 0.5$, $\alpha_P = 0.75$, $\alpha_Q = 1.5$, $c_w = 1 + i$, $c_b = 0$, $a_0 = 0$ and $L = 11$. The inset plots the decay of the maximum of $\eta(j, \tilde{k}, L)$ over all $j$ at each level as $\tilde{k}$ increases, on a log-log scale. The linearity of the curve reveals a power-law decay of the "ridge" (red circles) of the 3D structure.

2.2 (a),(b) Comparison between the exact and estimated truncation errors $\varepsilon(L, N_h)$ as a function of $N_h$ with fixed $L$ for 1D SPT cluster states with a perturbation part. E Ψ: exact values, first-type truncation errors; U Ψ: upper-bound-based estimation, first-type; E CZ: exact, second-type, $\hat{B} = \hat{\sigma}^z_1\hat{\sigma}^z_2$; E CX: exact, second-type, $\hat{B} = \hat{\sigma}^x_1\hat{\sigma}^x_2$; U CX: upper-bound-based estimation, second-type, $\hat{B} = \hat{\sigma}^x_1\hat{\sigma}^x_2$. The perturbation part is constructed as in Eqs. (2.24)–(2.26), with $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for $r \neq 0$, $\mu(0) = \delta_Q = 0.1$, $\alpha_Q = 3$, $c_w = c_b = 1 + i$, $a_0 = 0$ and $L = 11$. (a) Exponential decay of $\lambda(\tilde{k})$ (vertical axis: log scale), $\lambda(\tilde{k}) = 0.2\,\delta_P^{-(\tilde{k}-1)}$ with $\delta_P = 1.5$. (b) Power-law decay of $\lambda(\tilde{k})$ (log-log scale), $\lambda(\tilde{k}) = \tilde{k}^{-\alpha_P}$ with $\alpha_P = 3$. (c) Schematic interpretation of variables used in the proofs. The inset in (c) shows the distribution of data points of the ratio $\psi^{(L,\infty)}(\vec{\sigma})/\psi^{(L,N_h)}(\vec{\sigma})$ with $N_h = L$, which corresponds to a truncation that removes hidden nodes starting from the second level. The parameter setting is the same as in (b). As our analysis predicts, the data points all lie in the neighborhood of $z = 1$ in the complex plane, enclosed by the red solid curves in (c). (d) Scaling of $N_h^*(L, \varepsilon_0)$ with $L$ for two fixed values of the first-type truncation error $\varepsilon_0$, with the same parameter setting as in (b). Red circles: $\varepsilon_0 = 10^{-7}$; blue squares: $\varepsilon_0 = 10^{-10}$. The inset in (d) shows the scaling of $N_h$ estimated from our upper bounds with $\varepsilon_0 = 10^{-3}$. Magenta solid: exact values of the upper bounds; brown dashed: leading-order estimates. The two curves almost coincide.

2.3 Spin correlations in the z direction as a function of distance $r$ on a log-log scale. The LRFD RBMs are constructed as in Eqs. (2.24)–(2.26), where $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for $r \neq 0$, $\mu(0) = \delta_Q = 0.2$, $\lambda(\tilde{k}) = \tilde{k}^{-\alpha_P}$, $\alpha_P = 3.5$, $c_w = 1$, $c_b = 0$, $a_0 = 0$, $L = 22$ and $N_h = 5L$. The inset shows the spin correlation $\langle\hat{\sigma}^z_1\hat{\sigma}^z_{1+L/2}\rangle$, with $r$ equal to half the chain length, for varying $L$ (log-log scale). It shows convergence of $\langle\hat{\sigma}^z_1\hat{\sigma}^z_{1+L/2}\rangle$ to an $L$-independent constant (almost attaining the maximum value 1) for $\alpha_Q = 1/2$ and a decay for $\alpha_Q = 2$.

2.4 Importance measure $\eta(j, \tilde{k}, L)$ for the RBMs approximating ground states of two critical systems with $L = 15$. (a) TFIM with $B_x = 1$. (b) XXZ model with $J_z = -0.2$. The insets in each subfigure show, on a log-log scale, the decay of the maximum importance measure at each level as the level number $\tilde{k}$ increases, for system sizes $L = 9, 11, 13$ and $15$. The purple dashed curve indicates that these decaying curves can be upper bounded by a power-law decay. By numerical fitting, the corresponding $\alpha_P$ for (a) and (b) are 2.957 and 1.232, respectively.

2.5 Relations between multiple typical state manifolds. $S_1$: short-range RBMs; $S_2$: LRFD RBMs; $S_2^{(j)}$ (for $1 \le j \le 6$, $j \in \mathbb{N}$): LRFD RBMs with distinct parameter conditions, specified in Table 2.1; $S_3$: RBMs with spatial complexities scaling at most polynomially in system size; $S_4$: RBMs with a faster-than-polynomial scaling of spatial complexity in system size, corresponding to inefficient representation; $S_5$: ground states of 1D quantum spin systems. The dashed boundary of $S_5$ means that its relations with the other manifolds have not been fully determined.

2.6 Network structure of a sparse RBM to be transformed into an MPS. This RBM serves as an example showing the possible failure of the original transformation algorithm and the effect of our improvement.

3.1 Low-energy spectra (red circles) obtained by RBMs with hidden-unit density $\beta = 1$ compared to the results obtained by exact diagonalization (solid blue line), and the relative errors in the energy values using RBMs with $\beta = 1$ and $2$ (dashed lines). (a) Gapped long-range TFIM with $\alpha = 1.2$, $B = 5$ and $L = 18$. (b) Gapless long-range XXZ model with $\alpha = 1.2$, $J_z = -0.5$ and $L = 18$.

3.2 (a,b) Dispersion relations shifted vertically so that the ground-state energy is zero. The results are obtained by RBMs with $\beta = 1$. (a) TFIM with $\alpha = 1.5, 2, 3$ and $\infty$, with $B = 4$ and $L = 53$. (b) XXZ model with $\alpha = 1.5, 2.5, 3, 3.5$ and $\infty$, with $J_z = -0.5$ and $L = 53$. (c) Group velocity $v_g(k_1)$ as a function of $\alpha$ for different system sizes $L$ for the TFIM with $B = 4$. (d) Data collapse for (c). The insets in the lower right corner show the variance in the data-collapse processing for a range of $\alpha_c$.

3.3 Longitudinal correlations in low-lying excited states of the long-range TFIM in different phases. (a) The lowest excited states in sectors $k = 0$ and $k = 2\pi/L$ in the paramagnetic phase with $\alpha = 1.8$, $B = 4$ and $L = 70$. Inset: the long-distance limits of longitudinal correlations in the lowest excited state in the sector $k = 0$ for $30 \le L \le 70$. (b) The lowest excited state in the sector $k = 2\pi/L$ in the ferromagnetic phase with $\alpha = 1.8$, $B = 0.9$ and $L = 70$. Inset: the long-distance limits of longitudinal correlations in the lowest excited state in the sector $k = 2\pi/L$ for $50 \le L \le 70$.

3.4 2D long-range TFIM on a square lattice. (a) Low-lying energy spectrum by generalized RBMs with hidden-unit density $\beta = 2$; $\alpha = 3.5$, $B = 6$, $L = 4 \times 4 = 16$. (b) Relative error in the energies in (a) compared to exact-diagonalization solutions. $E_{j_1 j_2}$ denotes $E(k_y = \frac{2\pi}{\sqrt{L}} j_1, k_z = \frac{2\pi}{\sqrt{L}} j_2)$. (c,d) Energy spectra by generalized RBMs with $\beta = 1$; $\alpha = 2.5$ for (c) and $\alpha = 4.5$ for (d), $B = 8$, $L = 8 \times 8 = 64$. In (a), (c) and (d), the red points correspond to the ground states.

3.5 2D long-range TFIM on a triangular lattice. (a) Low-lying energy spectrum by generalized RBMs with $\beta = 2$; $\alpha = 3$, $B = 8$, $L = 4 \times 4 = 16$. (b) Relative error in the energies in (a) compared to exact-diagonalization solutions. $E_{j_1 j_2}$ denotes $E(k_1 = \frac{2\pi}{\sqrt{L}} j_1, k_2 = \frac{2\pi}{\sqrt{L}} j_2)$. (c,d) Energy spectra by generalized RBMs with $\beta = 1$; $\alpha = 2.7$ for (c) and $\alpha = 4.5$ for (d), $B = 9$, $L = 8 \times 8 = 64$. In (a,c,d), the red points correspond to the ground states. In (a,c), the insets are heat maps of the corresponding energy spectra and exhibit the expected rotational symmetry. In (d), the inset shows the scaling of the y-component $v_y$ of the group velocity at $\vec{k} = \vec{0}$ with system size ($L = 16, 36$ and $64$) for different $\alpha$, obtained by RBM learning.

4.1 Physical intuition for the generation of zero-energy edge modes in the case $\mu = 0$. (a) Kitaev's 1D spinless p-wave SC quantum wire. Two neighboring MFs constitute a normal fermion. The blue arrows in the upper chain signify the internal pairing of MFs, with no unpaired MFs remaining. The red arrows in the lower chain indicate the inter-cell pairing of MFs, with two unpaired MFs at the ends of the chain. (b) MF coupling at a single armchair edge of a 2D honeycomb lattice. The upper two and lower left subfigures are for our model, which has no 3-fold rotational symmetry (RS). The solid bonds are the net Majorana couplings contributed by the terms related to $(\Delta, t)$, $(\Delta_\uparrow, t_\uparrow)$ and $(\Delta_\downarrow, t_\downarrow)$ in $\hat{H}$, respectively. The shaded and colored cells denote the dangling MFs in a hexagon at a single armchair edge of the lattice. For comparison with our model, the lower right subfigure shows an example of unexpected MF couplings when rotational symmetry is present.

4.2 Schematic of our physical system. (a) The three vectors $\vec{\delta}_j$ connecting nearest-neighbor sites and the three vectors $\vec{r}_j$ connecting NNN sites, for $j = 1, 2, 3$. (b) Angular distribution of the sign in the amplitude and phase of the defined pairing, which resembles a domain-wall structure. (c) The distribution of zero modes at a single armchair edge. The heights of the pillars denote the amplitudes of the wave function at each site.

4.3 Band structure for $\mu = 0$ in four cases characterized by different gap conditions and winding numbers $w$. Fixed parameters: $t = 1$, $t_\uparrow = 0.4$, $t_\downarrow = 0.6$. (a) $\vec{p} = (0.9K_x, 0)$; (b) $\vec{p} = (0, 3K_y)$; (c) $\vec{p} = (0.1K_x, 0.2K_y)$; (d) $\vec{p} = (0.6K_x, 3K_y)$, where $K_x = 2\pi/3$ and $K_y = K_x/\sqrt{3}$. Each group of blue curves represents a bulk band. The dark red straight lines are the zero-energy edge modes, and the brown and green curves in (b) and (d) are gapless edge modes. In (a) and (b), the zero modes are 4-fold degenerate; in (c) and (d), they are 2-fold degenerate. The inset in (c) zooms in on the split zero modes (magenta) and the completely flat zero modes (dark red) separately.

4.4 Phase diagram at the generalized "sweet spot" $\mu = 0$, obtained by numerical simulation. (a) shows the phase boundary between the gapped SC phase (purple) and the gapless phase (light green). (b) shows the phase boundaries between phases with different winding numbers. The green region indicates $w = 0$, the blue region $w = 1$ and the red region $w = -1$. Together, the two panels give the full 4-phase diagram.

4.5 Regions of the gapped SC phase in the 3D parameter space $(\sin\Phi_1, \sin\Phi_2, \sin\Phi_3)$. The shaded surface near the 12 edges of the cube shows the domain in which at least one inequality $|\cos\Phi_j| + |\cos\Phi_k| \ge |\cos\Phi_l|$ is violated, where $(j, k, l)$ is a permutation of $(1, 2, 3)$. This surface defines the gapped SC phase. Note that the only points in this parameter space with physical significance are those for which $\Phi_2 = \Phi_1 + \Phi_3$ (in both the near-edge and kernel regions).

4.6 (a) Training process using the deep-learning toolbox provided by Matlab. The prediction accuracy increases and the loss decreases as training proceeds. (b) Learning the phase transition between the gapped SC phase and the gapless phase (both with winding number 0). $p_y/K_y$ is fixed at 0. The exact phase-transition point is $p_x/K_x = 2/3$. The output of a fully trained deep NN exhibits a sudden change when $p_x/K_x$ crosses the phase-transition point. The wave-function samples used as training data cover only a small parameter range far from the critical point.

A.1 Distribution of the squared normalized wave-function amplitudes in spin-configuration space, which can approximate the Kronecker delta function. The LRFD RBMs are constructed as in Eqs. (A.141)–(A.143), with $\mu_0 = 0.1$, $L = 13$ and $\lambda(\tilde{k}) = \delta_P^{-(\tilde{k}-1)}$ with $\delta_P > 1$. The horizontal axis denotes the numerical indices of all $2^L$ spin configurations, sorted in monotonically decreasing order of their amplitudes. The inset shows the ratio $|\psi^{(L,\infty)}(\vec{\sigma}_0)/\psi^{(L,\infty)}(\vec{\sigma}_1)|^2$ as a function of $1/\delta_P$.

A.2 (a) Approximation errors as a function of the number of hidden nodes for truncated LRFD RBMs and the optimal RBMs. (b) Approximation errors in the spin correlations in the x and z directions as a function of the number of levels kept in the truncated LRFD RBMs for the XXZ model, compared with exact-diagonalization results. The figure is plotted on a log-log scale.

B.1 Long-range TFIM with $B = 6$ learned with an RBM with hidden-unit density 1. (a) Group velocity $v_g(k_1)$ as a function of $\alpha$ for a range of system sizes $L$. (b) Data collapse for (a); the inset shows the variance in the data-collapse processing for different choices of $\alpha_c$.

List of Abbreviations

BEC  Bose-Einstein Condensate
CQED  Cavity Quantum Electrodynamics
EM  Electromagnetic
LRFD  Long-range-fast-decay
MD  Multi-fold Degeneracy
MF  Majorana Fermion
ML  Machine Learning
MPS  Matrix Product State
NN  Neural Network
NNN  Next Nearest Neighbor
RBM  Restricted Boltzmann Machine
SC  Superconducting
SOC  Spin-orbit Coupling
TFIM  Transverse Field Ising Model
TRS  Time-Reversal Symmetry

Chapter 1: Introduction

1.1 Background

Quantum machine learning is an emerging field that combines techniques from the disciplines of machine learning (ML) and quantum physics [1, 2, 3, 4, 5, 6, 7]. Research in this field takes three broad forms [2]: applications of classical ML techniques to quantum physical systems [4, 5, 6, 7, 8, 9, 10], quantum computing and algorithms for classical ML problems [11, 12, 13, 14], and new ideas inspired by the intersection of the two disciplines [15, 16].
In the field of learning quantum systems, there has been tremendous progress in applying ML techniques to identifying quantum phases and transitions [3, 17, 18, 19, 20, 21], molecular modeling [22, 23], quantum state tomography [24, 25], and accelerating Monte Carlo simulations [26, 27]. ML encompasses a wide range of modeling tools and computational algorithms suited to different needs in theoretical modeling and information processing. Within it, artificial neural networks (NNs), computing systems whose architectures are inspired by the biological neural networks in animal brains, often play an important role because of their power in function approximation, classification and data processing. One rapidly advancing field in recent years is the investigation of neural-network quantum states in quantum many-body physics [5, 6, 7, 8, 9, 28, 29, 30, 31]. The core idea in this field is to postulate an ansatz for the wave function in terms of a neural network (NN) [6], which targets a low-dimensional manifold in the exponentially large Hilbert space for state approximation [32], and to apply ML algorithms to find a specific solution. One of the most commonly used neural networks is the restricted Boltzmann machine (RBM) [6, 7, 8, 28, 29]. It is a bipartite stochastic construct that combines the concept of thermodynamic partition functions with that of classical artificial neural networks. The RBM usually serves as the building block for understanding and training deeper networks because of its relatively simple structure for inference and its power in parametric modeling as a universal approximator of discrete distributions [33]. It has successfully represented a wide range of quantum states, such as low-lying eigenstates of quantum many-body-localized systems [6, 34], code words of stabilizer codes [30, 35, 36] and chiral topological states [8, 37, 38].
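The bipartite, partition-function structure described above can be illustrated with a short sketch. It evaluates an RBM wave-function amplitude in the commonly used form $\psi(\vec{\sigma}) = \sum_{\vec{h}} \exp(\sum_j a_j\sigma_j + \sum_i b_i h_i + \sum_{ij} W_{ij} h_i\sigma_j) = e^{\sum_j a_j\sigma_j}\prod_i 2\cosh(b_i + \sum_j W_{ij}\sigma_j)$ in two ways: by brute-force summation over hidden configurations (a partial partition function), and by the closed form obtained from tracing out the hidden layer. The parameter names `a`, `b`, `W`, the ±1 spin convention and the random parameter values are illustrative assumptions. This is not the specific parameterization used later in this dissertation.

```python
import itertools

import numpy as np


def rbm_amplitude(sigma, a, b, W):
    """Unnormalized RBM amplitude with the hidden layer traced out analytically.

    sigma: (L,) visible spins in {+1, -1}; a: (L,) visible biases;
    b: (Nh,) hidden biases; W: (Nh, L) visible-hidden couplings.
    Complex parameters let the amplitude carry signs and phases.
    """
    theta = b + W @ sigma  # effective field felt by each hidden unit
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(theta))


def rbm_amplitude_bruteforce(sigma, a, b, W):
    """Same amplitude from an explicit sum of Boltzmann weights over all
    2**Nh hidden configurations (a partial partition function)."""
    total = 0.0 + 0.0j
    for h in itertools.product([1, -1], repeat=len(b)):
        h = np.array(h)
        total += np.exp(a @ sigma + b @ h + h @ W @ sigma)
    return total


def rbm_state(L, a, b, W):
    """Normalized amplitudes over all 2**L spin configurations."""
    configs = np.array(list(itertools.product([1, -1], repeat=L)))
    psi = np.array([rbm_amplitude(s, a, b, W) for s in configs])
    return configs, psi / np.linalg.norm(psi)


# Usage with small random complex parameters (illustrative values only).
rng = np.random.default_rng(0)
L, Nh = 4, 3
a = 0.1 * (rng.normal(size=L) + 1j * rng.normal(size=L))
b = 0.1 * (rng.normal(size=Nh) + 1j * rng.normal(size=Nh))
W = 0.1 * (rng.normal(size=(Nh, L)) + 1j * rng.normal(size=(Nh, L)))

sigma = np.array([1, -1, -1, 1])
closed_form = rbm_amplitude(sigma, a, b, W)
brute_force = rbm_amplitude_bruteforce(sigma, a, b, W)
configs, psi = rbm_state(L, a, b, W)
print(np.isclose(closed_form, brute_force))  # the two routes should agree
```

The closed form makes evaluating one amplitude cost $O(N_h L)$ instead of $O(2^{N_h})$. This is one reason the RBM is convenient as a variational ansatz. The brute-force route is kept only as a check.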
One of the central challenges in the representation theory of quantum many-body states is to find efficient representations from which the physical quantities of the global quantum system can be extracted with as little information loss as possible [32]. It has been demonstrated that RBMs, as representative NNs, have nonnegligible advantages in representing particular sets of quantum states. These include states whose entanglement entropy violates an area law, states of high-dimensional quantum systems and states related to chiral topological order [7]. It remains to be determined, however, what the natural measure of complexity is for quantum many-body states described by RBMs, and how to quantitatively study the expressibility of this state manifold. To fully understand and exploit the power of NN state representation, we apply it to the investigation of long-range interacting quantum systems, i.e., those with interactions decaying as a power law $1/r^\alpha$ in the distance $r$. The low-lying eigenstates of such systems can possess long-range quantum correlations and entanglement entropy violating the area law. They are therefore often associated with a higher parameterization complexity. In recent years, numerous atomic, molecular and optical systems exhibiting long-range interactions have emerged as versatile platforms for quantum computation and quantum simulation [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]. Applying NNs to these systems may therefore help in understanding more complex quantum states and provide benchmarks for large-scale near-term quantum simulators. Besides parameterizing wave functions of quantum systems, NNs with more complicated architectures, such as convolutional NNs, can also benefit statistical physics in other ways.
One such application is to use deep NNs to identify phases and phase transitions. This exploits their ability to detect multiple types of order parameters, including those of nontrivial states without conventional order. It has been shown that NNs can identify phase transitions in correlated many-body systems without knowledge of the locality conditions of the Hamiltonian [3]. Here we ask how NNs perform in learning the phase-transition points of quantum models with pairing terms.

1.2 Outline of dissertation

In this dissertation, we focus on three topics. We quantitatively analyze the expressibility and complexity of NN quantum states in quantum many-body physics. We investigate their application to learning many-body excitations of long-range power-law interacting quantum spin models. We also use deep NNs for phase classification. In Chapter 2, we study the efficiency of RBMs in state representation and provide a paradigm for complexity analysis of generic long-range RBMs. We propose a new concept: long-range-fast-decay (LRFD) RBM states with a quantified nonlocal structure. We derive upper bounds on the truncation errors associated with two measures of state difference. We then identify distinct asymptotic scaling laws of spatial complexity for RBMs, determined by the nonlocal interaction pattern between physical and virtual particles in their forms. Based on this analysis, we provide numerical results supporting the claim that ground states of a wide range of 1D quantum systems may be approximated by LRFD RBMs with at most polynomial complexity. These results offer evidence for the potentially high efficiency of RBMs in regimes where other state parameterizations become less efficient. Finally, we describe the relations among multiple typical state manifolds. In Chapter 3, we use RBMs to investigate the many-body excitations of long-range power-law interacting quantum spin models in dimensions D = 1 and 2.
We develop an energy-shift method to calculate the excited states of such spin models and obtain a high-precision momentum-resolved low-energy spectrum. We numerically identify the critical exponent at which the maximal quasiparticle group velocity changes from finite to divergent in the thermodynamic limit. In D = 1, we show that the results agree with an analysis based on field theory and semiclassical spin-wave theory. Furthermore, we generalize the RBM method for learning excited states in nonzero-momentum sectors from 1D to 2D systems. Finally, we analyze and provide all possible values (3/2, 2 and 3) of the critical exponent for generic 1D quadratic bosonic and fermionic Hamiltonians with long-range hoppings and pairings.

In Chapter 4, we study deep NNs as phase classifiers. We design a 2D topologically nontrivial fermionic model Hamiltonian with pairing terms and first study its phase diagram. We then demonstrate that deep NNs can learn the band-gap closing conditions based only on wave-function samples of several typical energy eigenstates (zero-energy edge modes). They are thus able to identify the phase-transition point without knowledge of the Hamiltonian.

Chapter 2: Efficiency of neural-network state representations of one-dimensional quantum spin systems

2.1 Introduction

In recent years, considerable activity has been devoted to the investigation of neural-network quantum states in quantum many-body physics [5, 6, 7, 8, 9, 28, 29, 30, 31]. The core idea is to postulate an ansatz for the wave function in terms of a neural network (NN) [6], which targets a low-dimensional manifold in the exponentially large Hilbert space for state approximation [32], and to apply ML algorithms to find a specific solution. The restricted Boltzmann machine (RBM) [6, 7, 8, 28, 29] is a bipartite stochastic construct that combines the concept of thermodynamic partition functions with that of classical artificial neural networks.
The RBM usually serves as a building block for understanding and training deeper networks because of its relatively simple structure for inference and its power in parametric modeling as a universal approximator of discrete distributions [33]. It has been used successfully to represent a wide range of quantum states, such as low-lying eigenstates of quantum many-body-localized systems [6, 34], code words of stabilizer codes [30, 35, 36], and chiral topological states [8, 37, 38].

While RBMs have demonstrated their power in numerical simulations, we have strong motivations to investigate the expressibility and complexity of generic long-range RBMs theoretically and quantitatively. Here, long-range RBMs are RBMs with full connectivity between the visible (spin) and hidden layers, whose structure is usually characterized by dense networks, in contrast to the so-called short-range or sparse RBMs [7, 30, 31].

The first motivation is that the exact RBM representation of a generic target quantum state is likely to have a long-range form and to need at least exponentially many parameters. This argument is based on the empirical observation that most RBMs returned by ML algorithms after full training on specific quantum states have dense network structures, and that the magnitudes of the connectivity parameters generally decay, but do not vanish, as the site separation between visible and hidden nodes increases. As more hidden nodes are added to the network, which corresponds to increasing computational resources, the RBM can capture correlations of increasingly high order between spins [6]. Another piece of evidence is that, in learning ground states of quantum systems, succinct RBM forms that exactly represent the ground states have been found for only very few Hamiltonian families, such as those with a stabilizer nature [30, 31].
The difficulty of finding exact RBM solutions stems from the need to reduce the number of nonlinear equations constraining the parameters from exponential to polynomial (or even linear) in the system size, and then to solve them. Because RBM forms are highly nonlinear, an exact ground-state solution may require at least exponentially many hidden nodes.

The second motivation is that the RBM solutions provided by ML algorithms are approximators of the exact target states, and these approximations often feature the long-range form and fast parameter decay described above, even when the exact RBM representations of the target states are unknown, less efficient, or lack these features. One example comes from learning chiral topological states [8], where numerically obtained approximate RBM representations of Jastrow wave functions achieve high accuracy with lower complexity. The exact RBM constructions of these states can be derived in principle, but their size scales polynomially with a comparatively large exponent, which makes them less economical. Moreover, the RBM construction for a target state, such as the ground state of a given Hamiltonian, is in some cases not unique even when the global phase is fixed, that is, after the degree of freedom associated with a global gauge transformation has been eliminated. A simple example is a spin-1/2 system whose ground state is the state with all spins up. Therefore, if the algorithm for obtaining an RBM approximator is stochastic [6], the RBM form to which the solution converges may depend on the initialization and the hyperparameter settings. As shown in many numerical works [6, 8], RBM approximators in a wide range of cases exhibit long-range and fast-decay features, and they resemble a finite truncation of an RBM with infinitely many parameters from which the parameters of sufficiently small magnitude have been removed.
Magnitude-based pruning can also be applied to RBMs with a finite number of parameters, and its effects may be phase- and system-dependent [53, 54]. Thus, it is worthwhile to generalize the RBM wave-function ansatz to include infinitely many hidden nodes, to analyze the effects of finite truncations quantitatively, and to justify mathematically the faithfulness of these truncated long-range RBM approximators. Viewing a conventional long-range RBM as a truncation of an “infinitely large” RBM is analogous to approximating an analytic function at a given point by a truncated Taylor series: each Taylor polynomial is an approximation, and the sequence of Taylor polynomials converges to the exact value.

The third motivation for studying long-range RBMs stems from the central goal of finding effective compressed state representations. This goal includes understanding the natural measure of complexity for quantum states described by a specific representation [7], and understanding how the global information and physical properties of the states are encoded in that description, in the same way that tensor network states have been interpreted [32]. Additionally, several works have studied the relationship between RBMs and other state representations, such as string-bond states [8], correlator product states [55], and tensor network states [38, 56]. In particular, the transformation from RBMs to matrix product states (MPSs) [56] provides, in principle, one way to analyze the spatial complexity of RBMs. However, such transformations may introduce redundancy when the RBMs are not short-range or sparse, because they use only the structural information of the network. It may also be inconvenient to analyze the effects of truncating an RBM through an extra intermediate transformation to another representation.
Furthermore, the RBM architecture naturally describes quantum states in a nonlocal manner [7], unlike tensor network states, which aim to encode the global wave-function information into local tensors [32]. Therefore, it is highly desirable to analyze the spatial complexity of long-range RBMs, and to extract information about the physical properties of the states they describe, directly from the RBMs themselves.

In this work, we analyze the efficiency of long-range RBM state representations for 1D quantum spin systems. Our procedure is as follows:

1. In Sec. 2.2.1, we generalize the RBM wave-function ansatz to a regime with infinitely many hidden nodes and define a subset of generic RBM states, the long-range-fast-decay (LRFD) RBM states, whose parameter conditions constrain the nonlocal interactions between spins (visible nodes) and virtual particles (hidden nodes).
2. In Sec. 2.2.2, we derive upper bounds on the truncation errors associated with two measures of state differences for the sequence of truncated LRFD RBM states. One measure is the (squared) $l_2$-norm of the state-vector difference and the other is a Hermitian-operator-based expectation-value difference.
3. In Sec. 2.2.3, we identify how the spatial complexity of LRFD RBMs in state approximation depends on the decay rates specified in the nonlocal interaction pattern.
4. In Sec. 2.3, we provide numerical evidence supporting the conjecture that the ground states of a wide range of 1D quantum spin systems, including some critical systems with logarithmic entanglement entropy, can be approximated by LRFD RBMs whose spatial complexity scales at most polynomially in both the system size and the inverse approximation error.
5. In Sec. 2.3, we also establish the relations between several typical state manifolds, which show the importance of the LRFD RBM concept for efficiency analysis in state-representation theory.
Our results offer evidence for the utility of RBMs in cases where other state parameterizations, such as matrix product states (MPSs), become less efficient. Our work proposes a paradigm for complexity analysis of general long-range RBMs, not limited to short-range or sparse RBMs, which naturally yields a further classification of this manifold based on complexity scaling. We find that the nonlocal structure of LRFD RBMs can be characterized by two conditions. Each condition is determined by bounds associated with two degrees of freedom, defined within the framework of levels depicted in Fig. 2.1(a). The first degree of freedom is a single-level decay factor that resembles a localized orbital and encodes information about correlations between spins (Sec. 2.2.4). The second is a level-decay factor, which strongly influences the complexity of the RBMs (Sec. 2.2.3). This paradigm and our characterization of the nonlocal structures may advance the understanding of the natural measure of complexity for quantum many-body states described by RBMs, and may be generalizable to higher-dimensional systems and to deep neural-network quantum states.

2.2 The restricted Boltzmann machine as a wave-function ansatz

We use the RBM as a wave-function ansatz for 1D quantum many-body spin-1/2 systems [5, 6, 7]. As a basic construct of deep NNs, the RBM has two layers. The first (visible) layer represents a spin configuration $\vec\sigma$ in the usual way. Here, the vector $\vec\sigma = (\sigma_1, \ldots, \sigma_L)$ represents a system of $L$ spins with $\sigma_j = \pm 1$ for $j = 1, \ldots, L$. The second layer is a hidden layer. It is composed of $N_h$ nodes, denoted by a vector $\vec h = (h_1, \ldots, h_{N_h})$ with $h_k = \pm 1$ for $k = 1, \ldots, N_h$. The $h_k$'s are introduced as auxiliary particles in the probability model; they play roles similar to those of virtual particles in the valence-bond picture of MPSs [32, 57]. Given a specific spin configuration $\vec\sigma$, the RBM outputs the corresponding wave-function amplitude

$$\psi(\vec\sigma) = 2^{-N_h}\sum_{\{\vec h:\, h_k=\pm1\}} \exp\Big(\sum_{j=1}^{L} a_j\sigma_j + \sum_{k=1}^{N_h} b_k h_k + \sum_{1\le j\le L,\ 1\le k\le N_h,\ j,k\in\mathbb{N}} W_{j,k}\,\sigma_j h_k\Big) \tag{2.1}$$

$$= \prod_{j=1}^{L} e^{a_j\sigma_j}\prod_{k=1}^{N_h}\cosh\Big(b_k + \sum_{j=1}^{L}\sigma_j W_{j,k}\Big). \tag{2.2}$$

Here, $a_j$ and $b_k$ are the bias parameters for the $j$-th spin and the $k$-th hidden node, respectively, $W_{j,k}$ is a weight parameter describing the interlayer interaction between the $j$-th spin and the $k$-th hidden node, and $\mathbb{N}$ denotes the set of natural numbers. The $a_j$, $b_k$, and $W_{j,k}$ are complex numbers. All such amplitudes defined on the computational basis yield a quantum state vector $|\Psi\rangle = \sum_{\vec\sigma}\psi(\vec\sigma)|\vec\sigma\rangle$, where the summation is over all $2^L$ spin configurations.

Figure 2.1: (a) Network structure of RBMs as a wave-function ansatz for 1D quantum spin systems: the input $|\vec\sigma\rangle$ (spins $\sigma_1, \ldots, \sigma_L$) is mapped to the output $\psi(\vec\sigma)$ through the hidden nodes $h_1, \ldots, h_{N_h}$. Long-range RBMs usually imply full connectivity between the visible and hidden layers. The hidden layer is divided into $N_h/L$ levels ($\tilde k = 1, 2, \ldots, \tilde k_{\max} = N_h/L$), each containing $L$ hidden nodes. (b) Importance measure $\eta(j, \tilde k, L)$ for an LRFD RBM with translational symmetry. The RBM is constructed as in Eqs. (2.24)–(2.26), where $\lambda(\tilde k) = \tilde k^{-\alpha_P}$, $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for $r \neq 0$, $\mu(0) = \delta_Q = 0.5$, $\alpha_P = 0.75$, $\alpha_Q = 1.5$, $c_w = 1 + i$, $c_b = 0$, $a_0 = 0$, and $L = 11$. The inset plots, on a log-log scale, the maximum of $\eta(j, \tilde k, L)$ over all $j$ at each level as $\tilde k$ increases. The linearity of the curve reveals a power-law decay of the “ridge” (red circles) of the 3D structure.
Note that we adopt the RBM form with a prefactor $2^{-N_h}$. This choice allows us to use infinitely many hidden nodes $h_k$, provided that $b_k$ and $W_{j,k}$ decay fast enough to ensure the convergence of $\psi(\vec\sigma)$ as $N_h \to \infty$ for a fixed system size $L$. In particular, it ensures that adding hidden nodes whose associated parameters ($b_k$ and $W_{j,k}$) are zero does not change the value of the wave function, because each such node contributes a factor $\cosh(0) = 1$ in Eq. (2.2). This choice facilitates the asymptotic analysis below.

As mentioned in Sec. 2.1, the RBMs obtained by ML algorithms to approximate target states often feature a long-range form and fast parameter decay. As more hidden nodes are added to the network, the RBM can capture higher-order correlations between spins [6], leading to higher approximation accuracy. The parameter decay is manifested by the decay of the weight parameters $W_{j,k}$ with increasing index separation $|j - k|$, as well as by the decay of $b_k$ with increasing $k$. In this work, we assume $N_h$ to be an integer multiple of $L$, which facilitates the scaling analysis, especially for translationally invariant systems. When $N_h$ is not an integer multiple of $L$, we can simply pad the last level with hidden nodes whose parameters are zero, without changing the wave-function values. We divide the hidden layer into multiple levels, each of which contains $L$ hidden nodes (Fig. 2.1(a)). Thus, there are $N_h/L$ levels in total; the ratio $N_h/L$ is called the hidden-unit density in some references [6]. We will show that, after a reordering of all hidden nodes of a general RBM, hidden nodes at the same level capture correlations of the same order between spins. This point will be further clarified when we use the translationally symmetric RBM form to represent the ground states of 1D translationally invariant quantum systems [6].
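A minimal numerical sketch of Eqs. (2.1) and (2.2), written by us in Python/NumPy and not taken from the original work, may help here. It checks that the explicit sum over hidden configurations in Eq. (2.1) equals the product of hyperbolic cosines in Eq. (2.2), and that padding the hidden layer with zero-parameter nodes up to a full level leaves $\psi(\vec\sigma)$ unchanged. The function names, the random complex parameters, and the sizes $L = 3$ and $N_h = 4$ are illustrative assumptions.

```python
import itertools

import numpy as np


def rbm_amplitude(sigma, a, b, W):
    """Product form, Eq. (2.2): prod_j exp(a_j s_j) * prod_k cosh(b_k + sum_j s_j W_jk)."""
    theta = b + sigma @ W                       # one effective field per hidden node
    return np.exp(sigma @ a) * np.prod(np.cosh(theta))


def rbm_amplitude_bruteforce(sigma, a, b, W):
    """Eq. (2.1): 2^{-Nh} times the explicit sum over all 2^{Nh} hidden configurations."""
    n_hidden = len(b)
    total = 0j
    for h in itertools.product([1, -1], repeat=n_hidden):
        h = np.array(h, dtype=float)
        total += np.exp(sigma @ a + b @ h + sigma @ W @ h)
    return total / 2**n_hidden


# Hypothetical small example: L = 3 spins, Nh = 4 hidden nodes, random complex parameters.
rng = np.random.default_rng(0)
L, Nh = 3, 4
a = 0.3 * (rng.normal(size=L) + 1j * rng.normal(size=L))
b = 0.3 * (rng.normal(size=Nh) + 1j * rng.normal(size=Nh))
W = 0.3 * (rng.normal(size=(L, Nh)) + 1j * rng.normal(size=(L, Nh)))

# Pad to 2L = 6 hidden nodes (two complete levels) with zero-valued parameters.
b_pad = np.concatenate([b, np.zeros(2 * L - Nh)])
W_pad = np.hstack([W, np.zeros((L, 2 * L - Nh))])

configs = [np.array(s, dtype=float) for s in itertools.product([1, -1], repeat=L)]
psi = np.array([rbm_amplitude(s, a, b, W) for s in configs])   # unnormalized state vector
psi_brute = np.array([rbm_amplitude_bruteforce(s, a, b, W) for s in configs])
psi_pad = np.array([rbm_amplitude(s, a, b_pad, W_pad) for s in configs])

print(np.max(np.abs(psi - psi_brute)), np.max(np.abs(psi - psi_pad)))
```

Because the brute-force sum runs over $2^{N_h}$ hidden configurations, it is only useful as a check; the product form costs $O(L N_h)$ operations per spin configuration.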
One example of a quantum state that can be exactly represented by a short-range RBM [30, 31] is the 1D symmetry-protected topological (SPT) cluster state. The Hamiltonian of the SPT cluster system is defined on a 1D $L$-site lattice with periodic boundary conditions as $\hat H_{\rm cluster} = -\sum_{j=1}^{L}\hat\sigma^z_{j-1}\hat\sigma^x_j\hat\sigma^z_{j+1}$, where $\hat\sigma^x$ and $\hat\sigma^z$ are Pauli matrices. A conventional $r_0$-range RBM is defined as an RBM satisfying $W_{j,k} = 0$ for any $|j-k| > r_0$. A short-range RBM usually refers to an $r_0$-range RBM with $r_0$ a small constant independent of the system size $L$. It was shown in Refs. [30, 31] that the ground state of $\hat H_{\rm cluster}$ can be exactly represented by a 1-range RBM with $L$ hidden nodes defined by

$$\begin{aligned} &a_j = 0 \quad (\text{for all } j \in \{1, 2, \ldots, L\}),\\ &b_k = i\pi/4,\quad W_{k-1,k} = i\pi/2,\quad W_{k,k} = 3i\pi/4,\quad W_{k+1,k} = i\pi/4 \quad (\text{for all } k \in \{1, 2, \ldots, L\}),\\ &W_{j,k} = 0 \quad (\text{for } |j-k| > 1), \end{aligned} \tag{2.3}$$

by exploiting the stabilizer nature of the system to reduce the number of equations constraining the parameters from exponential to linear in $L$. In our language of levels, this RBM has just one level, and its weight parameters at this level have a very short support, which reflects the fact that its entanglement satisfies an area law. Moreover, the RBM form inherits the translational symmetry of the system: its parameters for different hidden nodes can be generated by the action of a translation operator on those for a single hidden node [6].
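The stabilizer argument can be checked numerically. The sketch below is our own illustration (the system size $L = 6$ and the function names are hypothetical choices): it evaluates the RBM of Eq. (2.3) with periodic indices on all $2^L$ configurations and applies $\hat H_{\rm cluster}$ directly in the $\hat\sigma^z$ basis, using $(\hat\sigma^z_{j-1}\hat\sigma^x_j\hat\sigma^z_{j+1}\psi)(\vec\sigma) = \sigma_{j-1}\sigma_{j+1}\,\psi(\vec\sigma^{(j)})$, where $\vec\sigma^{(j)}$ is $\vec\sigma$ with spin $j$ flipped. Since each stabilizer term has eigenvalues $\pm 1$, the lowest possible energy is $-L$, so the RBM represents the ground state if $(\hat H_{\rm cluster}\psi)(\vec\sigma) = -L\,\psi(\vec\sigma)$ for every $\vec\sigma$.

```python
import itertools

import numpy as np


def cluster_rbm(L):
    """Amplitudes of the 1-range RBM of Eq. (2.3) on all 2^L configurations (periodic b.c., a_j = 0)."""
    W = np.zeros((L, L), dtype=complex)          # W[j, k]: spin j <-> hidden node k (0-based)
    for k in range(L):
        W[(k - 1) % L, k] = 1j * np.pi / 2
        W[k, k] = 3j * np.pi / 4
        W[(k + 1) % L, k] = 1j * np.pi / 4
    b = np.full(L, 1j * np.pi / 4)
    return {
        s: np.prod(np.cosh(b + np.array(s, dtype=float) @ W))
        for s in itertools.product([1, -1], repeat=L)
    }


def apply_cluster_hamiltonian(psi, L):
    """(H psi)(s) = -sum_j s_{j-1} s_{j+1} psi(s with spin j flipped), periodic indices."""
    out = {}
    for s in psi:
        total = 0j
        for j in range(L):
            flipped = list(s)
            flipped[j] = -flipped[j]
            total += s[(j - 1) % L] * s[(j + 1) % L] * psi[tuple(flipped)]
        out[s] = -total
    return out


L = 6
psi = cluster_rbm(L)
h_psi = apply_cluster_hamiltonian(psi, L)
residual = max(abs(h_psi[s] + L * psi[s]) for s in psi)   # ~0 if H psi = -L psi
print(residual, min(abs(v) for v in psi.values()), max(abs(v) for v in psi.values()))
```

All amplitudes have the same modulus $2^{-L/2}$, so the cluster state is encoded entirely in the signs produced by the cosine factors.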
Inspired by the extensibility of the system of equations (2.3) to growing system sizes, and considering the need to capture higher-order correlations between spins [6] and stronger entanglement between subsystem blocks [31], we expect that the RBM representation of a general quantum state has multiple, possibly infinitely many, levels, and that the length of the support of the weight parameters at each level may increase from a small constant up to the maximum length $L$. This motivates us to analyze generic long-range RBMs with properly specified nonlocal interactions between spins and hidden nodes (virtual particles).

2.2.1 Long-range-fast-decay RBMs

We now discuss the aspects of the nonlocal structure of LRFD RBMs summarized at the end of Sec. 2.1, which leads to specific definitions of the two conditions mentioned there.

We begin by generalizing the RBM wave-function ansatz to a regime with infinitely many hidden nodes. An RBM state $|\Psi^{(L,\infty)}\rangle$ with infinitely many hidden nodes and system size $L$ can be defined as

$$|\Psi^{(L,\infty)}\rangle = \sum_{\vec\sigma}\psi^{(L,\infty)}(\vec\sigma)|\vec\sigma\rangle, \tag{2.4}$$

where

$$\psi^{(L,\infty)}(\vec\sigma) = \prod_{j=1}^{L} e^{a^{(L)}_j\sigma_j}\prod_{k=1}^{\infty}\cosh\Big(b^{(L)}_k + \sum_{j=1}^{L}\sigma_j W^{(L)}_{j,k}\Big). \tag{2.5}$$

The corresponding truncated-RBM sequence is defined as $\{|\Psi^{(L,N_h)}\rangle\}$, where

$$|\Psi^{(L,N_h)}\rangle = \sum_{\vec\sigma}\psi^{(L,N_h)}(\vec\sigma)|\vec\sigma\rangle \tag{2.6}$$

and

$$\psi^{(L,N_h)}(\vec\sigma) = \prod_{j=1}^{L} e^{a^{(L)}_j\sigma_j}\prod_{k=1}^{N_h}\cosh\Big(b^{(L)}_k + \sum_{j=1}^{L}\sigma_j W^{(L)}_{j,k}\Big) \tag{2.7}$$

is obtained by removing the hyperbolic-cosine factors with $k \ge N_h + 1$ from $\psi^{(L,\infty)}(\vec\sigma)$. We then define a subset of generic RBM states with infinitely many hidden nodes, the long-range-fast-decay (LRFD) RBM states, as the RBMs whose parameters satisfy the following two conditions.

Condition 1 (boundedness of $W_{j,k}$).
There exist an $L$-independent integer $\tilde k_s \in \mathbb{N}$ and three nonnegative, monotonically decreasing real functions $\lambda_R(\tilde k)$, $\lambda_I(\tilde k)$, and $\mu(r)$ such that, after a reordering of all hidden nodes, for all $k > \tilde k_s L$,

$$|\mathrm{Re}(W^{(L)}_{j,k})| \le \lambda_R(\tilde k)\,\mu(|j - j_c|_{\rm circ}), \tag{2.8}$$

$$|\mathrm{Im}(W^{(L)}_{j,k})| \le \lambda_I(\tilde k)\,\mu(|j - j_c|_{\rm circ}), \tag{2.9}$$

where $\tilde k \in \{1, 2, \ldots, N_h/L\}$ is the level index; $j_c$, the center spin of the $k$-th hidden node, is the site index of the spin with which the interaction of the $k$-th hidden node is maximal among all $j \in \{1, 2, \ldots, L\}$; $|m|_{\rm circ} = \min\{m, L - m\}$ in accordance with the periodic boundary conditions; and $r \in \{0, 1, \ldots, (L-1)/2\}$ denotes the distance between $j$ and $j_c$ (we assume $L$ is odd, which does not affect the validity of the asymptotic analysis below). The functions $\lambda_R(\tilde k)$, $\lambda_I(\tilde k)$, and $\mu(r)$ are required to satisfy the condition that there exist finite $L$-independent nonnegative constants $P_0$ and $\mu_0$ such that

$$\sum_{\tilde k = \tilde k_s + 1}^{\infty}\big(\lambda_R^2(\tilde k) + \beta_1^2\lambda_I^2(\tilde k)\big) = P_0 < \infty, \tag{2.10}$$

$$\mu(r) \le \mu_0 < \infty \quad (\text{for all } r \ge 0), \tag{2.11}$$

where the constant $\beta_1 = 3\sqrt{2\ln 2}/\pi$ arises in the convergence proof given in Appendix A.1.

We interpret the new variables as follows. Both $\tilde k = \tilde k(k)$ and $j_c = j_c(k)$ are functions of $k$, and the correspondence between $k$ and the pair $(\tilde k, j_c)$ is a bijection. Thus every hidden node is associated with a unique pair and can be uniquely positioned in the RBM network after the reordering (Fig. 2.1(a)). Hidden nodes capturing correlations of the same order between spins are grouped at the same level, so that the new index pair $(\tilde k, j_c)$ of each hidden node directly indicates the order of correlations it captures. This labeling also makes the symmetry manifest for quantum states with translational symmetry.
The reordering step addresses the fact that stochastic ML algorithms are often unable to group the hidden nodes by level automatically, and the site positions of the hidden nodes they return usually appear random.

Condition 2 (boundedness of $b_k$). After the same reordering of all hidden nodes described in Condition 1, for all $k > \tilde k_s L$,

$$|\mathrm{Re}(b^{(L)}_k)| \le \lambda_R(\tilde k)\,\mu(0), \tag{2.12}$$

$$|\mathrm{Im}(b^{(L)}_k)| \le \lambda_I(\tilde k)\,\mu(0). \tag{2.13}$$

The definition of LRFD RBMs should be understood from the point of view of state manifolds [32, 58]. A state manifold for quantum many-body states usually refers to a subspace of the full Hilbert space spanned by a parameterized wave-function family [58], and is thus a set containing specific types of quantum states. Accordingly, the manifold of LRFD RBMs can be defined as the space spanned by all parameterized wave functions each of which belongs to a sequence of quantum states, indexed by system size, that satisfies Conditions 1 and 2 above. An LRFD RBM state is an element of this manifold. This definition is in the same spirit as the definition of MPSs with different scaling laws [32, 57].

Condition 1 gives an upper bound on the magnitudes of the RBM weight parameters and thereby describes the nonlocal interaction between spins and hidden nodes (virtual particles). It requires that $|\mathrm{Re}(W^{(L)}_{j,k})|$ and $|\mathrm{Im}(W^{(L)}_{j,k})|$ be bounded from above by $\lambda_R(\tilde k)\mu(r)$ and $\lambda_I(\tilde k)\mu(r)$, respectively. The monotonically decreasing functions $\lambda_R(\tilde k)$ and $\lambda_I(\tilde k)$ can be regarded as level-decay factors, while $\mu(r)$ describes the decay with increasing distance between the spin site $j$ and the center spin $j_c(k)$ of the $k$-th hidden node.
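Conditions 1 and 2 can be checked mechanically once the hidden nodes are ordered by level. The sketch below is our own illustration under stated assumptions: the hidden nodes are assumed to be already reordered so that node $k$ sits at level $\tilde k = \lceil k/L \rceil$, the center spin $j_c$ is taken as the site with the largest $|W_{j,k}|$, $L$ is odd, and the functions `lam_R`, `lam_I`, and `mu`, as well as the example parameters, are hypothetical. It tests only the pointwise bounds (2.8), (2.9), (2.12), and (2.13) on a finite RBM; the summability condition (2.10) concerns infinitely many levels and is not checked.

```python
import math

import numpy as np


def satisfies_lrfd_bounds(W, b, lam_R, lam_I, mu, k_s=0, rtol=1e-12):
    """Check the pointwise bounds (2.8), (2.9), (2.12), (2.13) on a finite RBM.

    W has shape (L, Nh) with L odd and b has shape (Nh,). Hidden node k (1-based) is
    assumed to sit at level ceil(k / L); its center spin is the site with the largest |W_jk|.
    """
    L, n_hidden = W.shape
    for k in range(k_s * L + 1, n_hidden + 1):
        kt = math.ceil(k / L)
        col = W[:, k - 1]
        jc = int(np.argmax(np.abs(col))) + 1
        for j in range(1, L + 1):
            m = abs(j - jc)
            r = min(m, L - m)                                  # |j - jc|_circ
            if (abs(col[j - 1].real) > lam_R(kt) * mu(r) * (1 + rtol)
                    or abs(col[j - 1].imag) > lam_I(kt) * mu(r) * (1 + rtol)):
                return False                                   # Condition 1 violated
        if (abs(b[k - 1].real) > lam_R(kt) * mu(0) * (1 + rtol)
                or abs(b[k - 1].imag) > lam_I(kt) * mu(0) * (1 + rtol)):
            return False                                       # Condition 2 violated
    return True


# Hypothetical example: L = 3, two levels, lambda(k~) = 0.5**k~, mu(0) = 1, mu(1) = 0.3.
def lam(kt):
    return 0.5 ** kt


def mu(r):
    return 1.0 if r == 0 else 0.3


L, n_hidden = 3, 6
W = np.zeros((L, n_hidden), dtype=complex)
b = np.zeros(n_hidden, dtype=complex)
for k in range(1, n_hidden + 1):
    kt = math.ceil(k / L)
    jc = k - (kt - 1) * L
    for j in range(1, L + 1):
        m = abs(j - jc)
        W[j - 1, k - 1] = (1 + 1j) * lam(kt) * mu(min(m, L - m))   # saturates the bounds
    b[k - 1] = 0.5 * lam(kt) * mu(0)

ok_example = satisfies_lrfd_bounds(W, b, lam, lam, mu)
W_bad = W.copy()
W_bad[0, 5] *= 10                          # one level-2 weight now exceeds its bound
bad_example = satisfies_lrfd_bounds(W_bad, b, lam, lam, mu)
print(ok_example, bad_example)             # expected: True False
```

The small relative tolerance `rtol` only guards against rounding when a parameter saturates its bound exactly.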
The function $\mu(r)$ is localized, as reflected by its monotonic decrease with increasing $r$, and resembles a unimodal localized orbital in the physics of periodic potentials, such as a Wannier mode [59]. This description therefore captures the parameter decay induced both by increasing level and by growing system size, providing two degrees of freedom for characterizing the nonlocal interaction pattern. The real and imaginary parts are treated separately because they enter the RBM wave function inequivalently, as shown in Appendix A.1.

Condition 2 implies that the contribution of the $b_k$-related terms can be bounded by the largest $W_{j,k}$-related terms at each level, so that the weight parameters $W_{j,k}$ play the dominant role in the asymptotic analysis (Appendix A.1). Since there is often freedom in choosing the value of $\mu(0)$, Condition 2 can be satisfied by a wide range of RBM states.

Conditions 1 and 2 are designed to ensure the convergence of the state vector (Eq. (2.5)) and to quantify the rate of parameter decay, which forms the basis of the complexity analysis. A rigorous proof of the convergence of the state vector under Conditions 1 and 2 is given in Appendix A.1. This proof is important not only because it shows that the generalization of RBMs to a regime with infinitely many hidden nodes makes sense, by defining such RBMs as limits of infinite sequences, but also because it introduces the key mathematical techniques and concepts needed to analyze the effects of truncation. The core idea is to show that the sequence $\{\psi^{(L,nL)}(\vec\sigma) : n \in \mathbb{N}\}$ is a Cauchy sequence in the field of complex numbers $\mathbb{C}$ [60].
The convergence proof is motivated by the observation that, when $b^{(L)}_k$ and $W^{(L)}_{j,k}$ decay sufficiently fast, the complex-valued ratio $\psi^{(L,(n+m)L)}(\vec\sigma)/\psi^{(L,nL)}(\vec\sigma)$ quickly falls into a neighborhood of the point $z = 1$ in the complex plane as $n$ increases. We therefore derive upper and lower bounds on the ratio's modulus $|\psi^{(L,(n+m)L)}(\vec\sigma)/\psi^{(L,nL)}(\vec\sigma)|$ that both converge to 1, and an upper bound on the magnitude of its argument $|\arg(\psi^{(L,(n+m)L)}(\vec\sigma)/\psi^{(L,nL)}(\vec\sigma))|$ that converges to 0, as $n$ increases. We then show that the magnitude sequence $\{|\psi^{(L,nL)}(\vec\sigma)|\}$ and the argument sequence $\{\arg(\psi^{(L,nL)}(\vec\sigma))\}$ are Cauchy sequences in the field of real numbers $\mathbb{R}$; hence $\{\psi^{(L,nL)}(\vec\sigma)\}$ is a Cauchy sequence in $\mathbb{C}$.

Figure 2.2: (a)(b) Comparison between the exact and estimated truncation errors $\varepsilon^{(L,N_h)}$ as a function of $N_h$ at fixed $L$ for 1D SPT cluster states with a perturbation. E Ψ: exact values, first-type truncation errors; U Ψ: upper-bound-based estimate, first type; E CZ: exact, second type, $\hat B = \hat\sigma^z_1\hat\sigma^z_2$; E CX: exact, second type, $\hat B = \hat\sigma^x_1\hat\sigma^x_2$; U CX: upper-bound-based estimate, second type, $\hat B = \hat\sigma^x_1\hat\sigma^x_2$. The perturbation is constructed as in Eqs. (2.24)–(2.26), with $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for $r \neq 0$, $\mu(0) = \delta_Q = 0.1$, $\alpha_Q = 3$, $c_w = c_b = 1 + i$, $a_0 = 0$, and $L = 11$. (a) Exponential decay of $\lambda(\tilde k)$ (vertical axis on a log scale): $\lambda(\tilde k) = 0.2\,\delta_P^{-(\tilde k - 1)}$ with $\delta_P = 1.5$. (b) Power-law decay of $\lambda(\tilde k)$ (log-log scale): $\lambda(\tilde k) = \tilde k^{-\alpha_P}$ with $\alpha_P = 3$. (c) Schematic interpretation of the variables used in the proofs. The inset in (c) shows the distribution of the ratio $\psi^{(L,\infty)}(\vec\sigma)/\psi^{(L,N_h)}(\vec\sigma)$ with $N_h = L$, which corresponds to a truncation that removes all hidden nodes from the second level onward. The parameter setting is the same as in (b).
As our analysis predicts, the data points all lie in the neighborhood of $z = 1$ enclosed by the red solid curves in (c). (d) Scaling of $N^*_h(L, \varepsilon_0)$ with $L$ for two fixed values of the first-type truncation error $\varepsilon_0$, with the same parameter setting as in (b). Red circles: $\varepsilon_0 = 10^{-7}$; blue squares: $\varepsilon_0 = 10^{-10}$. The inset in (d) shows the scaling of $N_h$ estimated from our upper bounds with $\varepsilon_0 = 10^{-3}$. Magenta solid line: using exact values of the upper bounds; brown dashed line: using leading-order estimates. The two curves almost coincide.

2.2.2 Effects of wave-function truncation for fixed system sizes

We derive upper bounds on truncation errors associated with two measures of state differences for the sequence of truncated LRFD RBM states. Let $\varepsilon^{(L,N_h)}$ denote a given type of truncation error incurred by using $|\Psi^{(L,N_h)}\rangle$ to approximate $|\Psi^{(L,\infty)}\rangle$. A natural measure of state differences is the square of the $l_2$-norm [61] of the state-vector difference, $\||\tilde\Psi^{(L,\infty)}\rangle - |\tilde\Psi^{(L,N_h)}\rangle\|_2^2$, where the tilde denotes the corresponding normalized state. Note that the RBM wave-function ansatz is not automatically normalized, and estimating the normalization factor $\langle\Psi|\Psi\rangle$ is important and often tricky, as shown in Appendix A.2. This measure of truncation errors is used in foundational works on the faithfulness and efficiency of other wave-function ansätze, such as MPSs [62, 63], so it allows a direct comparison between the efficiency of RBMs and that of other state representations. A second measure of state differences is a Hermitian-operator-based expectation-value difference defined as

$$|\langle\hat B\rangle^{(L,\infty)} - \langle\hat B\rangle^{(L,N_h)}| = |\langle\tilde\Psi^{(L,\infty)}|\hat B|\tilde\Psi^{(L,\infty)}\rangle - \langle\tilde\Psi^{(L,N_h)}|\hat B|\tilde\Psi^{(L,N_h)}\rangle|. \tag{2.14}$$

Here, $\hat B$ can be any Hermitian operator of the form $\hat B = \bigotimes_{j=1}^{L}\hat\sigma^{(m_j)}_j$, where $\otimes$ denotes the tensor product, $m_j \in \{0, 1, 2, 3\}$, $\hat\sigma^{(0)}_j = I_{2\times2}$ is the 2-by-2 identity matrix, and $\{\hat\sigma^{(1)}_j, \hat\sigma^{(2)}_j, \hat\sigma^{(3)}_j\}$ denote the Pauli matrices.
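Both error measures are straightforward to evaluate for small systems where the full state vector fits in memory. The following sketch is our illustration with hypothetical function names. It assumes that amplitudes are stored in the order produced by `itertools.product([1, -1], repeat=L)`, so that spin 1 is the most significant position and $\sigma_j = +1$ (spin up) comes first; this matches the ordering of `np.kron` with $\hat\sigma^{(3)} = \mathrm{diag}(1, -1)$. The first measure is not invariant under a relative global phase between the two vectors, which is harmless in the truncation setting because the ratio $\psi^{(L,\infty)}/\psi^{(L,N_h)}$ approaches 1 as $N_h$ grows.

```python
from functools import reduce

import numpy as np

PAULI = {
    0: np.eye(2, dtype=complex),                          # identity
    1: np.array([[0, 1], [1, 0]], dtype=complex),         # sigma^x
    2: np.array([[0, -1j], [1j, 0]], dtype=complex),      # sigma^y
    3: np.array([[1, 0], [0, -1]], dtype=complex),        # sigma^z
}


def normalize(psi):
    return psi / np.linalg.norm(psi)


def first_type_error(psi_a, psi_b):
    """Squared l2-norm of the difference of the normalized state vectors."""
    return float(np.linalg.norm(normalize(psi_a) - normalize(psi_b)) ** 2)


def pauli_string(ms):
    """B = sigma^(m_1) (x) ... (x) sigma^(m_L) with m_j in {0, 1, 2, 3}."""
    return reduce(np.kron, [PAULI[m] for m in ms])


def second_type_error(psi_a, psi_b, ms):
    """|<B>_a - <B>_b| of Eq. (2.14); <B> is real because B is Hermitian."""
    B = pauli_string(ms)
    u, v = normalize(psi_a), normalize(psi_b)
    return float(abs(np.vdot(u, B @ u).real - np.vdot(v, B @ v).real))


up_up = np.array([1, 0, 0, 0], dtype=complex)        # sigma = (+1, +1)
down_down = np.array([0, 0, 0, 1], dtype=complex)    # sigma = (-1, -1)
e1 = first_type_error(up_up, 3 * up_up)              # 0: normalization removes the factor 3
e2 = first_type_error(up_up, down_down)              # 2: orthogonal normalized states
e3 = second_type_error(up_up, down_down, (3, 3))     # 0: <Z1 Z2> = +1 for both
e4 = second_type_error(up_up, down_down, (3, 0))     # 2: <Z1> = +1 versus -1
print(e1, e2, e3, e4)
```

Building $\hat B$ as a dense $2^L \times 2^L$ matrix is only practical for small $L$; for larger systems one would apply the Pauli string directly to the amplitudes instead.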
We adopt this measure because $\{\hat\sigma^{(m)}_j : m = 0, 1, 2, 3\}$ is a complete basis for operators on the local Hilbert space of the $j$-th spin, and a wide range of typical physical observables, such as spin correlations and the total energy, are Hermitian operators of this type or linear combinations of polynomially many of them. We can then prove the following lemma, which provides upper bounds on both types of truncation errors for the sequence of truncated LRFD RBM states.

Lemma 3 (upper bounds on truncation errors). For LRFD RBMs satisfying Conditions 1 and 2, after the same reordering of all hidden nodes described in Condition 1, for all $N_h > \tilde k_s L$,

$$\||\tilde\Psi^{(L,\infty)}\rangle - |\tilde\Psi^{(L,N_h)}\rangle\|_2^2 \le F_1\big(L\,Q(L)\,P(N_h/L)\big), \tag{2.15}$$

$$|\langle\hat B\rangle^{(L,\infty)} - \langle\hat B\rangle^{(L,N_h)}| \le F_2\big(L\,Q(L)\,P(N_h/L)\big), \tag{2.16}$$

where

$$F_1(x) = 2 - 2\exp[-2(1+\beta_1^2)x]\cos(4\beta_2 x) \tag{2.17}$$

$$= c_1 x + O(x^2) \quad (\text{as } x \to 0), \tag{2.18}$$

$$F_2(x) = \max\{|\exp(4x) - 1|,\ |1 - \exp(-4\beta_1^2 x)|\} + \max\Big\{\big[\exp(8x) - 2\exp(4x)\cos(8\beta_2 x) + 1\big]^{1/2},\ \big[\exp(-8\beta_1^2 x) - 2\exp(-4\beta_1^2 x)\cos(8\beta_2 x) + 1\big]^{1/2}\Big\} \tag{2.19}$$

$$= c_2 x + O(x^{3/2}) \quad (\text{as } x \to 0), \tag{2.20}$$

$$P(m) = \sum_{\tilde k = m+1}^{\infty}\lambda^2(\tilde k) \quad (m \ge \tilde k_s,\ m \in \mathbb{N}), \tag{2.21}$$

$$Q(L) = \Big(\sum_{r=0}^{(L-1)/2}\mu(r)\Big)^2, \tag{2.22}$$

the relevant constants are $\beta_2 = 3\sqrt{3}/\pi$, $c_1 = 4(1 + \beta_1^2)$, and $c_2 = 4\beta_1^2 + 4(\beta_1^4 + 4\beta_2^2)^{1/2}$, and we have assumed $\lambda_R(\tilde k) = \lambda_I(\tilde k) = \lambda(\tilde k)$ for simplicity; this assumption is kept throughout the following discussion.

The proof, given in Appendix A.2, uses arguments similar to those in the convergence proof for LRFD RBMs. Based on the intuition that the ratio $\psi^{(L,\infty)}(\vec\sigma)/\psi^{(L,N_h)}(\vec\sigma)$ converges quickly to 1 with increasing $N_h$, we derive an upper bound $\sqrt{R_1}$ and a lower bound $\sqrt{R_2}$ on the ratio's modulus $|\psi^{(L,\infty)}(\vec\sigma)/\psi^{(L,N_h)}(\vec\sigma)|$, and an upper bound $\Theta$ on the magnitude of its argument $|\arg(\psi^{(L,\infty)}(\vec\sigma)/\psi^{(L,N_h)}(\vec\sigma))|$.
Both types of truncation errors can be bounded in terms of these three quantities, and the two bounds can finally be expressed as functions ($F_1(x)$ and $F_2(x)$) of $L\,Q(L)\,P(N_h/L)$, which decreases to zero as $N_h$ increases at fixed $L$. The idea of the proof is shown schematically in Fig. 2.2(c).

In terms of our description of the nonlocal interactions between spins and virtual particles, and in the language of levels, $P(N_h/L)$ is a sum of the squared level-decay factors over all levels from $\tilde k = N_h/L + 1$ to $\tilde k = \infty$, while $Q(L)$ represents the localized “orbital” at each single level and contributes a factor that reflects only the growth of the system size, independent of the level. The two types of truncation errors correspond to two different forms of the function $F(x)$, but both are analytic at $x = 0$. The scaling of the truncation errors with $N_h$ follows: if $Q(x) = O(q(x))$ as $x \to \infty$, $P(x) = O(1/p_d(x))$ as $x \to \infty$, and $F(x) = O(f(x))$ as $x \to 0$, then

$$\varepsilon^{(L,N_h)} = O\Big(f\Big(\frac{L\,q(L)}{p_d(N_h/L)}\Big)\Big) \quad (\text{as } N_h \to \infty). \tag{2.23}$$

Our construction of LRFD RBMs and our theoretical analysis of the truncation errors can be further clarified by numerical computations. We can construct translationally symmetric LRFD RBMs whose parameters exactly satisfy

$$W^{(L)}_{j,k} = c_w\,\lambda(\tilde k)\,\mu(|j - j_c|_{\rm circ}), \tag{2.24}$$

$$b^{(L)}_k = c_b\,\lambda(\tilde k)\,\mu(0), \tag{2.25}$$

$$a^{(L)}_j = a_0 \tag{2.26}$$

for any $1 \le j \le L$ and $1 \le k \le N_h$, where $a_0$, $c_w$, and $c_b$ are complex constants with $|c_b| \le |c_w|$ to satisfy Condition 2, $\tilde k = \tilde k(k) = \lceil k/L \rceil$, $j_c = j_c(k) = k - (\tilde k - 1)L$, $N_h$ is an integer multiple of $L$, $\lceil x \rceil$ denotes the ceiling function, and $\tilde k_s = 0$ in this case.
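As a rough numerical illustration, the sketch below builds the translationally symmetric LRFD RBM of Eqs. (2.24)–(2.26) and evaluates the first-type truncation error for an increasing number of retained levels. It is our own sketch, not the code behind Fig. 2.2: it uses the perturbation parameters quoted for Fig. 2.2(a) without the cluster-state part, takes a smaller $L = 7$ so that all $2^L$ amplitudes can be enumerated, and approximates $|\Psi^{(L,\infty)}\rangle$ by the truncation at $N_h = 30L$ (an assumed cutoff).

```python
import itertools
import math

import numpy as np


def lrfd_rbm(L, n_hidden, lam, mu, c_w, c_b, a0):
    """Parameters (a, b, W) of Eqs. (2.24)-(2.26); n_hidden must be a multiple of L."""
    if n_hidden % L:
        raise ValueError("n_hidden must be an integer multiple of L")
    W = np.zeros((L, n_hidden), dtype=complex)
    b = np.zeros(n_hidden, dtype=complex)
    for k in range(1, n_hidden + 1):            # 1-based indices, as in the text
        kt = math.ceil(k / L)                    # level index k~ = ceil(k / L)
        jc = k - (kt - 1) * L                    # center spin j_c(k)
        for j in range(1, L + 1):
            m = abs(j - jc)
            W[j - 1, k - 1] = c_w * lam(kt) * mu(min(m, L - m))   # mu(|j - jc|_circ)
        b[k - 1] = c_b * lam(kt) * mu(0)
    a = np.full(L, a0, dtype=complex)
    return a, b, W


def state_vector(a, b, W):
    """Unnormalized amplitudes of Eq. (2.7) on all 2^L spin configurations."""
    S = np.array(list(itertools.product([1, -1], repeat=len(a))), dtype=float)  # (2^L, L)
    return np.exp(S @ a) * np.prod(np.cosh(b[None, :] + S @ W), axis=1)


def first_type_error(psi_ref, psi):
    """Squared l2-norm of the difference between the normalized state vectors."""
    diff = psi_ref / np.linalg.norm(psi_ref) - psi / np.linalg.norm(psi)
    return float(np.linalg.norm(diff) ** 2)


# Hypothetical settings: perturbation parameters of Fig. 2.2(a), but L = 7 and no cluster part.
L, delta_q, alpha_q, delta_p = 7, 0.1, 3.0, 1.5
c_w = c_b = 1 + 1j


def mu(r):
    return delta_q if r == 0 else 0.5 * delta_q * r ** (-alpha_q)


def lam(kt):
    return 0.2 * delta_p ** (-(kt - 1))


n_ref = 30                                        # N_h = 30 L stands in for N_h = infinity
psi_ref = state_vector(*lrfd_rbm(L, n_ref * L, lam, mu, c_w, c_b, 0.0))
errors = [
    first_type_error(psi_ref, state_vector(*lrfd_rbm(L, n * L, lam, mu, c_w, c_b, 0.0)))
    for n in range(1, 6)
]
print(errors)
```

With these settings the errors should fall roughly geometrically with the number of retained levels, consistent with the exponential-decay case of Eq. (2.23); the reference cutoff would need to be raised if much smaller errors were of interest.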
For any finite $N_h$, the RBM form of Eqs. (2.24)–(2.26) can be transformed directly into the form proposed in Ref. [6] for the ground states of 1D translationally invariant systems; here we generalize it to the regime of infinitely many hidden nodes. Since the parameters for different hidden nodes can be generated by the action of a translation operator on those for a single hidden node, we need only focus on one representative hidden node per level. We therefore propose an importance measure $\eta(j, \tilde k, L)$ for a set of edges, defined as

$$\eta(j, \tilde k, L) = \Big|\mathrm{Re}\big(W^{(L)}_{j,(L+1)/2+(\tilde k-1)L}\big)\Big|^2 + \beta_1^2\Big|\mathrm{Im}\big(W^{(L)}_{j,(L+1)/2+(\tilde k-1)L}\big)\Big|^2, \tag{2.27}$$

and present it as a function of the spin-site index $j$ and the level index $\tilde k$. Its 3D structure reflects the decay of both $\lambda(\tilde k)$ and $\mu(r)$, with the center of the “orbital” at every level located at $j = (L+1)/2$. Plotting the peak at each level as a function of $\tilde k$ therefore reveals the decay of $\lambda(\tilde k)$. An example of such an LRFD RBM with a power-law-decaying $\lambda(\tilde k)$ is shown in Fig. 2.1(b).

Figures 2.2(a) and 2.2(b) show the two types of truncation errors $\varepsilon^{(L,N_h)}$ as functions of $N_h$ at fixed $L$ for 1D SPT cluster states with a perturbation; that is, the RBM parameters are the sum of those defined in Eq. (2.3) and a perturbation specified by Eqs. (2.24)–(2.26). Results are given for exponentially and power-law-decaying $\lambda(\tilde k)$. As described above, the 1D SPT cluster state can be exactly represented by a short-range (1-range) RBM [30]. In our description, this RBM representation has just one level, so the corresponding $\lambda(\tilde k)$ and $\mu(r)$ drop to zero for $\tilde k > 1$ and $r > 1$. Adding the perturbation makes the composite RBM an LRFD RBM, so that its truncation errors can be studied.
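The importance measure of Eq. (2.27) and the ridge shown in the inset of Fig. 2.1(b) can be reproduced with a short sketch. The code below is our illustration; it uses the parameters quoted in the caption of Fig. 2.1(b), and the number of levels (20) is an arbitrary choice. Since $c_w = 1 + i$ and $\mu$ peaks at $r = 0$, the ridge equals $(1+\beta_1^2)\lambda^2(\tilde k)\mu^2(0)$, so its log-log slope should be $-2\alpha_P = -1.5$.

```python
import numpy as np

BETA1 = 3 * np.sqrt(2 * np.log(2)) / np.pi     # beta_1 from Eq. (2.10)


def importance_measure(L, n_levels, lam, mu, c_w):
    """eta(j, k~, L) of Eq. (2.27) for the RBM of Eqs. (2.24)-(2.26), L odd.

    Uses the representative node k = (L+1)/2 + (k~-1)L of each level, whose center
    spin is jc = (L+1)/2. Returns an array indexed by (j-1, k~-1).
    """
    jc = (L + 1) // 2
    eta = np.empty((L, n_levels))
    for kt in range(1, n_levels + 1):
        for j in range(1, L + 1):
            m = abs(j - jc)
            w = c_w * lam(kt) * mu(min(m, L - m))      # W_{j,k} from Eq. (2.24)
            eta[j - 1, kt - 1] = w.real ** 2 + BETA1 ** 2 * w.imag ** 2
    return eta


# Parameters quoted in the caption of Fig. 2.1(b).
alpha_p, alpha_q, delta_q, L = 0.75, 1.5, 0.5, 11


def lam(kt):
    return kt ** (-alpha_p)


def mu(r):
    return delta_q if r == 0 else 0.5 * delta_q * r ** (-alpha_q)


eta = importance_measure(L, 20, lam, mu, 1 + 1j)
ridge = eta.max(axis=0)                          # peak of each level (the "ridge")
peak_sites = eta.argmax(axis=0) + 1              # 1-based site j of each peak
levels = np.arange(1, 21)
slope = np.polyfit(np.log(levels), np.log(ridge), 1)[0]
print(peak_sites, slope)
```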
For the second type of error, $\hat B$ is the spin correlation between spins 1 and 2 along the $z$ or $x$ direction. Over the range of $N_h$ shown at fixed $L$, the numerically computed truncation errors are well bounded from above by our estimates in inequalities (2.15) and (2.16), which substantiates our theoretical analysis. The numerical results also indicate that Eq. (2.23) correctly captures the asymptotic behavior of $\varepsilon^{(L,N_h)}$ with varying $N_h$. Moreover, the exact curve of $\varepsilon^{(L,N_h)}$ and our estimate for $\hat B = \hat\sigma^x_1\hat\sigma^x_2$ have the same slope, which implies that Eq. (2.23) gives an asymptotically optimal upper bound. Hence, for the second-type truncation errors (inequality (2.16)), there is still room to improve the constant prefactor of our estimate, but the bound cannot be improved qualitatively. In contrast, for the first-type truncation errors (inequality (2.15)), there is room to improve both the scaling of the bound and its constant prefactor.

2.2.3 Scaling of complexity

Table 2.1: Complexity estimates for distinct typical settings of $\mu(r)$ and $\lambda(\tilde k)$. “−” in the $\mu(r)$ column denotes all functions $\mu(r)$ that make $Q(L)$ converge as $L \to \infty$. “−” in the $\lambda(\tilde k)$ column denotes all functions $\lambda(\tilde k)$ for which $P(N_h/L)$ behaves as $O(1/\ln(N_h/L))$ as $N_h \to \infty$. In all settings, $\delta_P > 1$ and $\alpha_P > 1/2$, and each entry describes the asymptotic behavior of the corresponding function.
Manifold µ(r) Q(L) λ(k̃) P (Nh/L) N∗ h(L, ε) S (1) 2 - converge δ−k̃P O(δ −2Nh/L P ) O(L ln(L/ε)) S (2) 2 - converge k̃−αP O((Nh/L) 1−2αP ) O((L2αP /ε)1/(2αP−1)) S (3) 2 r−1 O((lnL)2) δ−k̃P O(δ −2Nh/L P ) O(L ln(L/ε)) S (4) 2 r−1 O((lnL)2) k̃−αP O((Nh/L) 1−2αP ) O((L2αP (lnL)2/ε)1/(2αP−1)) S (5) 2 → µ∞ > 0 O(L2) δ−k̃P O(δ −2Nh/L P ) O(L ln(L/ε)) S (6) 2 → µ∞ > 0 O(L2) k̃−αP O((Nh/L) 1−2αP ) O((L2αP+2/ε)1/(2αP−1)) S (7) 2 - converge - O(1/ ln(Nh/L)) O(L exp (L/ε)) We can investigate the scaling of spatial complexity in system sizes for LRFD RBMs as the results in Sec. 2.2.1 and Sec. 2.2.2 still hold for varying L. We give an upper-bound estimation of the complexity of RBM representations which depends on the asymptotic behavior at x = ∞ of the functions P (x) (Eq. (2.21)) and Q(x) (Eq. (2.22)), and thus is determined by the decaying rates specified by λ(k̃) and µ(r). Define the minimum Nh to achieve a sufficiently small approximation error ε0 as N∗ h(L, ε0) = inf{Nh : ε(L,Nh) ≤ ε0}. (2.28) Using Lemma 3, the sufficient condition for ε(L,Nh) ≤ ε0 is that the corresponding upper bound on truncation errors is no larger than ε0. So this provides one way to get an upper 28 bound on N∗ h(L, ε0) for LRFD RBMs. It can be shown that N∗ h(L, ε0) = O(Lp−1 d ( L q(L) f−1(ε0) )) (as L→ ∞), (2.29) where q(x), pd(x) and f(x) are functions to specify the asymptotic behaviors of Q(x), P (x) and F (x) as defined above and the superscript “−1” denotes the inverse of the corresponding function. Rich information can be extracted from Eq. (2.29). First, the first factor L comes from our assumption that Nh is an integer multiple of the system size L and the second factor L in front of q(L) is extracted using the translational symmetry of the wave function. So these two factors reflect the growing system sizes and the remaining factors reflect the distinction in complexity for different LRFD RBMs. 
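Eq. (2.28) suggests a simple numerical procedure: scan Nh over multiples of L and stop at the first value whose error bound falls below ε0. The sketch below assumes a hypothetical bound of the form F(L Q(L) P(Nh/L)) with F(x) = x, a converging Q(L) replaced by the constant 1, and P(x) = x^(1−2αP) as in the $S_2^{(2)}$ row of Table 2.1; these stand-ins are illustrative and are not Eqs. (2.21) and (2.22) themselves. Restricting Nh to multiples of L builds in the ceiling on Nh/L assumed in the text.

```python
def n_star_upper(L, eps0, P, Q, F=lambda x: x, m_max=10**6):
    """Smallest N_h = m*L whose hypothetical error bound F(L*Q(L)*P(m)) <= eps0.

    Mirrors Eq. (2.28) with eps(L, N_h) replaced by an upper bound of the
    form F(L Q(L) P(N_h/L)); P, Q and F are illustrative stand-ins.
    """
    for m in range(1, m_max + 1):          # N_h restricted to multiples of L
        if F(L * Q(L) * P(m)) <= eps0:
            return m * L
    raise ValueError("no admissible N_h up to m_max * L")

alpha_P = 1.5
P_pow = lambda x: x ** (1 - 2 * alpha_P)   # S_2^(2) row: P ~ (N_h/L)^(1 - 2 alpha_P)
Q_conv = lambda L: 1.0                      # converging Q(L), unit constant assumed

eps0 = 3e-3
results = {L: n_star_upper(L, eps0, P_pow, Q_conv) for L in (8, 16, 32, 64)}
for L, n in results.items():
    # Table 2.1 scaling for S_2^(2) with F(x) = x: (L^(2 alpha_P)/eps)^(1/(2 alpha_P - 1))
    predicted = (L ** (2 * alpha_P) / eps0) ** (1 / (2 * alpha_P - 1))
    print(f"L={L:3d}  N_h*={n:6d}  N_h*/prediction={n / predicted:.3f}")
```

With these stand-ins the ratio to the Table 2.1 scaling stays close to 1, and the small excess comes entirely from rounding Nh/L up to an integer.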
Second, P(Nh/L) and Q(L) (and thus λ(k̃) and µ(r)), which characterize the nonlocal structure of RBMs in our description, have qualitatively different influences on the complexity. Specifically, when µ(r) decays sufficiently fast, Q(L) converges to a finite L-independent constant in the thermodynamic limit and does not affect the complexity for sufficiently localized "orbitals". With the upper-boundedness condition on µ(r) (Eq. (2.11)), Q(L) can contribute at most a linear factor to this upper bound on $N_h^*(L,\varepsilon_0)$. By contrast, the asymptotic behavior of P(x) strongly influences the complexity and may render RBM representations inefficient if λ(k̃) decays sufficiently slowly. That would imply that there are too many high-order correlations between spins for the RBM to capture, so that polynomially many parameters are not enough to compress the information fully into the RBM form. But as long as $p_d^{-1}(x)$ depends on x at most as a power law, this upper-bound estimate implies that the complexity is at most polynomial in both the system size L and 1/ε0 for the two types of truncation errors above. It is also worth noting that our estimate only provides an upper bound on the complexity, so a faster-than-polynomial scaling of the bound (such as for $S_2^{(7)}$ in Table 2.1) does not necessarily imply that the representation is inefficient: the bound may not be tight, and the actual complexity may still be at most polynomial in this case. Third, the asymptotic behavior of F(x) at x = 0 also influences the scaling of the complexity, since it acts directly on ε0. We have shown that, for the two types of truncation errors described above, the corresponding functions F1(x) and F2(x) are both analytic at x = 0.
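The step from analyticity of F at x = 0 to a power-law dependence on 1/ε0 can be spelled out in one line. This is a sketch under our own assumptions that F(0) = 0, that F is increasing near 0 (so that f^{−1} exists there), and that c_m > 0 is its first nonvanishing Taylor coefficient, of order m ≥ 1:

```latex
F(x) = c_m x^m + O\!\left(x^{m+1}\right)
\;\Longrightarrow\;
f^{-1}(\varepsilon_0) = \left(\frac{\varepsilon_0}{c_m}\right)^{1/m}\left[1 + O\!\left(\varepsilon_0^{1/m}\right)\right]
\;\Longrightarrow\;
\frac{1}{f^{-1}(\varepsilon_0)} = \Theta\!\left(\varepsilon_0^{-1/m}\right) \quad (\varepsilon_0 \to 0).
```

Inserted into Eq. (2.29), a power-law $p_d^{-1}$ then yields an $N_h^*$ that is polynomial in 1/ε0. For example, the ε-dependence (1/ε)^{1/(2αP−1)} in the $S_2^{(2)}$ row of Table 2.1 is consistent with m = 1 and $p_d^{-1}(x) = x^{1/(2\alpha_P-1)}$.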
For general types of truncation errors that can be upper-bounded by a function F(L Q(L) P(Nh/L)), $1/f^{-1}(\varepsilon_0)$ depends on 1/ε0 as a power law whenever F(x) is analytic at x = 0, as follows from the Taylor expansion of F.

This result suggests that µ(r) and λ(k̃) play separate roles. The scaling of the entanglement entropy, an important measure of the complexity of quantum many-body states, is governed by µ(r), whereas λ(k̃) strongly influences the spatial complexity of the parameterization in LRFD RBM representations. The length of the support of µ(r), which determines the "range" r0 of RBMs, directly affects the scaling of the entanglement entropy between subregions [31] but does not directly contribute a faster-than-polynomial factor to the parameterization complexity. This result possibly provides further theoretical evidence for the high efficiency of RBMs in representing states whose entanglement entropy scales faster than an area law in the system size [31].

We apply our complexity estimate to several typical settings of µ(r) and λ(k̃) in Table 2.1. The manifolds $S_2^{(j)}$ with 1 ≤ j ≤ 6 all correspond to a spatial complexity that is at most polynomial in L. We also apply this analysis to RBMs constructed as the 1D SPT cluster states with a perturbation part. Our numerical results on the scaling of $N_h^*(L,\varepsilon_0)$ in L with fixed ε0 (Fig. 2.2(d)) for small system sizes are consistent with our theoretical analysis summarized in Table 2.1. The piecewise linearity of $N_h^*(L,\varepsilon_0)$ as a function of L, with a slope that grows very slowly, implies that the scaling is perhaps only slightly faster than linear, consistent with our estimate based on the parameter settings. The piecewise linearity is due to our assumption that Nh is an integer multiple of L: this amounts to a ceiling operation on the ratio Nh/L, which does not change when L varies within a small range. The inset in Fig. 2.2(d) shows that the upper bounds $N_h^U(L,\varepsilon_0)$ on $N_h^*(L,\varepsilon_0)$ obtained from the exact values of the right-hand side of inequality (2.15) and from its leading-order estimate are almost the same, and both scale as a power law in L, as indicated by Eq. (2.29). This supports the validity of our complexity analysis.

2.2.4 Spin-correlation information

In this subsection, we analyze what information about the physical properties of the quantum states can be extracted from the LRFD RBM form using our description of the nonlocal structure. Here, we focus on a small-parameter regime in which |aj|, |bk|, |Wj,k| ≤ ε1 with ε1 ≪ 1/L and ε1 ≪ 1/Nh. In this subsection, we drop the superscript "(L)" on RBM parameters and assume that the RBM has a finite number Nh of hidden nodes. Based on the proof given in Appendix A.3, we find that the correlation in the z direction between spins at distance r for an LRFD RBM with translational symmetry is

$$C^z_{\mathrm{unnorm}}(r) = \langle\Psi(L,N_h)|\hat{\sigma}^z_1\hat{\sigma}^z_{1+r}|\Psi(L,N_h)\rangle \tag{2.30}$$

$$= 2\left(\mathrm{Re}(WW^T)\right)_{1,1+r} + 4\,\mathrm{Re}(a_1)\,\mathrm{Re}(a_{1+r}) + O(\varepsilon_1^3) \quad (\text{as } \varepsilon_1\to 0). \tag{2.31}$$

Note that this is the r-dependent part of the spin correlation; the actual value is $C^z_{\mathrm{unnorm}}(r)$ divided by the r-independent normalization factor ⟨Ψ(L,Nh)|Ψ(L,Nh)⟩. Hence, for RBMs constructed as in Eqs. (2.24)–(2.26) with a0 = 0 for simplicity,

$$C^z_{\mathrm{unnorm}}(r) \approx |c_w|^2 \sum_{\tilde{k}=1}^{N_h/L} |\lambda(\tilde{k})|^2 \sum_{j_c=1}^{L} \mu(|1-j_c|_{\mathrm{circ}})\,\mu(|1+r-j_c|_{\mathrm{circ}}). \tag{2.32}$$

The µ(r)-dependent factor above therefore describes how the z-direction spin correlations decay with the distance r, while the r-independent λ(k̃)-dependent factor does not affect the decay rate if only the leading-order terms in Eq. (2.31) are considered. Eq. (2.31) also suggests an interpretation of the role of hidden nodes: they can be viewed as intermediate virtual particles that relate spins (physical particles) at different lattice sites. When an RBM is short-range, the term $(\mathrm{Re}(WW^T))_{1,1+r}$ vanishes for large enough r, since no virtual particle has nonzero connectivity to two spins separated by r. More intermediate hidden nodes are then needed to transmit such relations, which means that higher-order terms must be considered. This is additional evidence that long-range RBMs can represent states with strong quantum correlations. It is shown in Appendix A.3 that, even when µ(r) → 0 as r → ∞, we can still construct LRFD RBMs whose z-direction spin correlations have long-range decays lower-bounded by Θ(1/r^{αQ}) (for µ(r) = Θ(1/r^{αQ}) with αQ > 1), Θ(ln r/r) (for µ(r) = Θ(1/r)), and even Θ(1) (for µ(r) = Θ(1/r^{αQ}) with 0 < αQ ≤ 1/2). These three kinds of decay are demonstrated numerically in Fig. 2.3. The spin correlation ⟨σ̂^z_1 σ̂^z_{1+r}⟩ almost saturates the maximum value 1 for αQ = 1/2 and has different long-range decay rates for αQ = 1 and αQ = 2 as r increases.

Figure 2.3: Spin correlations in the z direction as a function of distance r on a log-log scale. The LRFD RBMs are constructed as in Eqs. (2.24)–(2.26), where µ(r) = (1/2) δQ r^{−αQ} for r ≠ 0, µ(0) = δQ = 0.2, λ(k̃) = k̃^{−αP}, αP = 3.5, cw = 1, cb = 0, a0 = 0, L = 22, and Nh = 5L. The inset shows the spin correlation ⟨σ̂^z_1 σ̂^z_{1+L/2}⟩, with r equal to the half-chain length, for varying L (on a log-log scale). It shows that ⟨σ̂^z_1 σ̂^z_{1+L/2}⟩ converges to an L-independent constant (almost attaining the maximum value 1) for αQ = 1/2 and decays for αQ = 2.

Figure 2.4: Importance measure η(j, k̃, L) for the RBMs approximating ground states of two critical systems with L = 15. (a) TFIM with Bx = 1. (b) XXZ model with Jz = −0.2. The insets in each subfigure show, on a log-log scale, the decay of the maximum importance measure at each level as the level number k̃ increases, for system sizes L = 9, 11, 13, and 15. The purple dashed curve indicates that these decay curves can be upper-bounded by a power-law decay. Numerical fitting gives αP = 2.957 for (a) and αP = 1.232 for (b).

2.3 Ground-state applications

Having introduced LRFD RBMs and analyzed their spatial complexity, it is natural to explore their applications to learning quantum states of specific models. First, we prove in Appendix A.4 that the state with all spins pointing up in the z direction, which is the ground state of a spin-1/2 system subject only to a magnetic field in the z direction and has the form of a Kronecker delta function, can be approximated by LRFD RBMs with arbitrary accuracy. We find that the RBM construction for this target state is not unique even after fixing the global phase (which eliminates the degree of freedom associated with a global gauge transformation), and we give one construction. This provides one example of the utility of LRFD RBMs in representing states of arbitrarily large system sizes.

Second, we are particularly interested in the behavior of RBMs in cases where other state representations become less efficient. We numerically study the representation of ground states of finite-size critical systems, for which the MPS representation becomes less efficient [62, 63], although MPSs have achieved notable success in representing quantum many-body states whose entanglement entropy satisfies an area law [32, 57]. We use RBMs with translational symmetry and apply the conventional quantum Monte Carlo algorithm (also a variational method) with stochastic-reconfiguration optimization [6, 64, 65, 66] to learn the ground states of two typical quantum models, the 1D transverse-field Ising model (TFIM) and the XXZ model, described respectively by the Hamiltonians

$$\hat{H} = -\sum_{j=1}^{L}\hat{\sigma}^z_j\hat{\sigma}^z_{j+1} - B_x\sum_{j=1}^{L}\hat{\sigma}^x_j \tag{2.33}$$

and

$$\hat{H} = \sum_{j=1}^{L}\left(-\hat{\sigma}^x_j\hat{\sigma}^x_{j+1} - \hat{\sigma}^y_j\hat{\sigma}^y_{j+1} + J_z\hat{\sigma}^z_j\hat{\sigma}^z_{j+1}\right) \tag{2.34}$$

with periodic boundary conditions, where Bx denotes the strength of the transverse field and Jz the strength of the coupling in the z direction. We use RBMs to learn the ground state of the TFIM with Bx = 1, at which the system sits exactly at the phase-transition point between a ferromagnetic and a paramagnetic phase [67], and of the XXZ model with Jz = −0.2, at which the system is in a gapless disordered XY phase [68]. Both are critical systems whose ground-state entanglement entropy scales logarithmically with the system size [67, 69, 70]. The ground states of these two Hamiltonians (at least for small system sizes) can be learned well by RBMs, as demonstrated by the high accuracy of the spin-correlation calculations in Appendix A.5. The importance measures η(j, k̃, L) for these two RBMs are shown in Figs. 2.4(a) and 2.4(b). The numerical results show that the RBM representations of the ground states of these two critical systems have forms very similar to LRFD RBMs. The overall 3D structures of the importance measures η(j, k̃, L) are similar to the one in Fig. 2.1(b), which corresponds to a standard LRFD RBM. The weight parameters for hidden nodes at the same level are quite localized and decay quickly as the level number k̃ increases and as the spin-site index j moves away from the center. Moreover, the "ridge" of η(j, k̃, L) for varying system sizes appears to be upper-bounded by an L-independent power-law decay curve, from which we can extract a corresponding αP characterizing the rate of level decay for these small-system-size wave functions.
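The text reports fitted αP values but does not spell out the fitting procedure at this point. The sketch below assumes an ordinary least-squares line through the per-level maxima on log-log axes, and it is applied to synthetic values generated from an exact power law (not the data behind Fig. 2.4); the function name `fit_power_law_exponent` is ours.

```python
import numpy as np

def fit_power_law_exponent(levels, peaks):
    """Fit peaks ~ C * levels**(-alpha) by least squares on log-log axes; return alpha.

    A generic log-log linear fit, assumed here as one way to extract alpha_P
    from the per-level maxima of eta(j, k, L).
    """
    x = np.log(np.asarray(levels, dtype=float))
    y = np.log(np.asarray(peaks, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)   # degree-1 fit: y = slope * x + b
    return -slope

# Synthetic ridge values (illustrative only): exact power law with alpha = 3.
levels = np.arange(1, 8)
peaks = 0.5 * levels ** -3.0
alpha_P = fit_power_law_exponent(levels, peaks)
print(f"fitted alpha_P = {alpha_P:.3f}")
```

An upper-bounding envelope such as the dashed curves in the insets of Fig. 2.4 would keep the fitted slope but raise the intercept until the line lies on or above every point.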
If these features persist as L increases toward infinity, these states will form LRFD RBMs belonging to $S_2^{(2)}$ or $S_2^{(6)}$ in Table 2.1, and the corresponding λ(k̃) and µ(r) can be defined. Moreover, these results exhibit a feature that also appears in the theory of MPS representations. It has been shown [62] that, although MPSs become less efficient in representing the ground states of critical systems, the bond dimension required to achieve an approximation error ε0 can still be upper-bounded by a function scaling polynomially in the system size L. The exponent in this power-law dependence of the spatial complexity of MPSs on L depends on the central charge c, a quantity that roughly quantifies the "degrees of freedom of the theory" in conformal field theory [57]. A larger c leads to a higher exponent in that estimate, implying a higher complexity of the MPS representation. The TFIM at the above phase-transition point has c = 1/2, and the XXZ model in the disordered XY phase has c = 1 [71]; correspondingly, our numerical results show a smaller fitted αP for the XXZ model (1.232 versus 2.957 for the TFIM), which implies that the XXZ model has more intrinsic "complexity" than the TFIM and thus needs more parameters to capture it.

2.4 State manifolds and complexity classification

Strictly speaking, numerical results for finite-size systems only provide evidence that these states may be LRFD RBMs; they cannot prove it, since the properties of RBMs as the thermodynamic limit is approached are not yet known. Based on the success of RBMs in numerical simulations and the fact that they can often achieve high accuracy even with a constant number of levels (at least for small system sizes), we conjecture that the ground states of a wide range of quantum systems may be exactly represented by LRFD RBMs or a variant of them.
Here, the term "variant" means generalizing the forms specified in Conditions 1 and 2 by adding further factors that can be naturally incorporated into our complexity analysis. For example, the functional forms λ(k̃) and µ(r), which are L-independent in our definition of LRFD RBMs, can be generalized to λ(k̃, L) and µ(r, L), respectively, and their effects can be easily evaluated using our paradigm for complexity analysis.

We summarize the relations between multiple typical state manifolds so that the significance of the concept of LRFD RBMs can be better understood. A state manifold usually refers to a subspace of the whole Hilbert space spanned by a parameterized wave-function family [58]; it is thus a set containing a specific scope of quantum states. The manifolds S1, S2, $S_2^{(j)}$ (for 1 ≤ j ≤ 6, j ∈ ℕ), S3, and S4 are defined as the spaces spanned by quantum states represented by RBMs satisfying the corresponding conditions given in Fig. 2.5, while S5 is defined as the manifold spanned by all ground states of 1D quantum many-body spin systems. The definitions of these manifolds directly imply that S1 ⊊ $S_2^{(j)}$ ⊊ S2 (for 1 ≤ j ≤ 6). Our complexity analysis for LRFD RBMs (Sec. 2.2.3) gives $S_2^{(j)}$ ⊆ (S2 ∩ S3).

Figure 2.5: Relations between multiple typical state manifolds. S1: short-range RBMs (SR RBM); S2: LRFD RBMs; $S_2^{(j)}$ (for 1 ≤ j ≤ 6, j ∈ ℕ): LRFD RBMs with distinct parameter conditions, specified in Table 2.1; S3: RBMs whose spatial complexity scales at most polynomially in the system size (RBM-Polynomial); S4: RBMs whose spatial complexity scales faster than polynomially in the system size, corresponding to inefficient representations (RBM-Inefficient); S5: ground states of 1D quantum spin systems (1D Ground States). The dashed boundary of S5 indicates that its relations with the other manifolds have not been fully determined.
Previous research shows that a set of problems for which RBMs appear to be powerful is related to topological states; among them, the 1D SPT cluster states belong to S5 ∩ S1 [30]. The Laughlin wave functions, which have the structure of Jastrow wave functions and are associated with chiral topological order, can be exactly represented by RBMs in S3 with Nh scaling quadratically in L, but approximations by long-range RBMs of lower complexity are often used instead [8]. S4 contains all other sets in Fig. 2.5, since RBMs without restriction on the number of hidden nodes are universal approximators for discrete distributions [33]. Numerical results seem to support that a "large fraction" of S5 lies in its intersection with S2. We argue that the concept of S2 may help in understanding which fraction of S5 falls into its intersection with S3, thereby also advancing the understanding of the complexity of quantum many-body states.

It is worth noting that our paradigm for complexity analysis and our characterization of the nonlocal structure of RBMs for 1D quantum spin systems can be generalized to higher-dimensional systems, e.g., lattices, by generalizing the description of single-level "orbitals" from µ(r) to µ(r⃗) while keeping λ(k̃) as a level-decay factor. For deep NN quantum states, we can still view each single hidden layer as a combination of multiple levels that capture correlations of different orders. We can then calculate the truncation errors for each hidden layer associated with specific nodal functions and analyze the propagation of errors through the layers.

Figure 2.6: Network structure of a sparse RBM, with visible nodes σ1, ..., σ6 and hidden nodes h1, ..., h4, to be transformed into an MPS. This RBM serves as an example to show how the original transformation algorithm can fail and the effect of our improvement.
2.5 Transformation from RBMs to MPSs

Several works study the relationship between RBMs and other state representations, such as string-bond states [8], correlator product states [55], and tensor network states [38, 56]. In particular, the transformation from RBMs to MPSs has been analyzed [56]. However, such transformations may lead to redundant parameterization when the RBMs are not sufficiently short-range or sparse, since they only use structural information about the network. An algorithm for finding an optimal mapping from RBMs to MPSs is given in Ref. [56], but we point out that this algorithm may fail and mainly works for short-range or very sparse RBMs. The core idea of the optimal transformation algorithm is to find a minimal-size "separation set" when dichotomizing the visible nodes into two parts. In other words, when the nodes in the "separation set" are excluded, the remaining two subsets of visible nodes are not connected. This implies that the wave function can be factorized into two independent parts conditioned on fixed values of the spins in the "separation set". The "separation set" is very similar to the concept of a "Markov blanket" in an undirected graph, but differs from it in that it allows nodes of the separated sets to be included.

We explore the reason for the failure of the optimal transformation algorithm and propose an improvement. The algorithm fails if the two node sets to be separated (the "X" and "Y" sets in Ref. [56]) share visible nodes, because these nodes should be included in the "separation set" but cannot be, as their corresponding indices have already been contracted in the previous step. This happens when the "separation set" at some step of the tensor construction contains too many visible nodes waiting to be contracted, so the algorithm fails when the RBM is not sufficiently sparse.
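The separation property at the heart of this algorithm can be checked with a breadth-first search on the RBM graph. The sketch below is our illustration, not the algorithm of Ref. [56]: it only tests whether a candidate set S disconnects X \ S from Y \ S, and it uses the edges of the Fig. 2.6 RBM as we read them off from the w_jk terms listed in Table 2.3 (an assumption). The names `EDGES` and `separates` are hypothetical.

```python
from collections import deque

# Visible-hidden edges of the Fig. 2.6 RBM, inferred from the w_jk terms in
# Table 2.3 (w_jk present <=> sigma_j is connected to h_k); treat as an assumption.
EDGES = [
    ("s1", "h1"), ("s2", "h1"), ("s3", "h1"), ("s4", "h1"),
    ("s2", "h2"), ("s3", "h2"), ("s4", "h2"),
    ("s3", "h3"), ("s4", "h3"), ("s5", "h3"),
    ("s4", "h4"), ("s5", "h4"), ("s6", "h4"),
]

def separates(edges, X, Y, S):
    """True if no path joins X \\ S to Y \\ S once the nodes in S are removed.

    Nodes of X or Y may themselves belong to S, the relaxation of a Markov
    blanket described in the text.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    blocked = set(S)
    targets = set(Y) - blocked
    queue = deque(n for n in X if n not in blocked)
    seen = set(queue)
    while queue:                      # breadth-first search from X \ S
        node = queue.popleft()
        if node in targets:
            return False              # reached Y \ S: S does not separate
        for nb in adj.get(node, ()):
            if nb not in blocked and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return True

# (X, Y, separation set) for steps 2-4 of Table 2.3
steps = [
    ({"s1", "s2"}, {"s3", "s4", "s5", "s6"}, {"s2", "h1"}),
    ({"s2", "h1", "s3"}, {"s4", "s5", "s6"}, {"s3", "s4"}),
    ({"s3", "s4"}, {"s5", "s6"}, {"h3", "h4"}),
]
for X, Y, S in steps:
    print(sorted(S), separates(EDGES, X, Y, S))
```

Removing only h1 at step 2, by contrast, does not separate the chain, because σ2 still reaches σ3 through h2.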
One example of the failure of this algorithm is given in Fig. 2.6 and Table 2.2. We point out that the algorithm can be improved as follows: when there are several choices of "separation set" in constructing tensor A[σj], instead of choosing one at random, choose the one containing σj if possible. This principle extends the number of steps for which a separation set exists. The effect of adopting it is shown in Table 2.3: the improved algorithm generates an MPS, whereas the original one fails for the same RBM. In both tables, wjk is an abbreviation of Wjk σj hk.

Table 2.2: Failure of the algorithm for finding an optimal transformation from an RBM to an MPS. The RBM to be transformed has the network structure shown in Fig. 2.6. The original algorithm uses a random selection of "separation sets".

step | X | Y | separation set | tensor
--- | --- | --- | --- | ---
1 | σ1 | σ2, ..., σ6 | h1 | $A^{(1)}_{1,h_1}[\sigma_1] = \exp(a_1\sigma_1 + w_{11})$
2 | h1, σ2 | σ3, ..., σ6 | σ3, σ4 | Algorithm fails

Table 2.3: Success in generating an MPS using our improved algorithm. The RBM to be transformed is the same as that in Table 2.2.

step | X | Y | separation set | tensor
--- | --- | --- | --- | ---
1 | σ1 | σ2, ..., σ6 | σ1 | $A^{(1)}_{1,\sigma_1}[\sigma_1] = \exp(a_1\sigma_1)$
2 | σ1, σ2 | σ3, ..., σ6 | σ2, h1 | $A^{(2)}_{\sigma_1,\sigma_2 h_1}[\sigma_2] = \exp(a_2\sigma_2 + w_{11} + w_{21})$
3 | σ2, h1, σ3 | σ4, ..., σ6 | σ3, σ4 | $A^{(3)}_{\sigma_2 h_1,\sigma_3\sigma_4}[\sigma_3] = \sum_{h_1,h_2}\exp(a_3\sigma_3 + w_{22} + w_{31} + w_{32} + w_{41} + w_{42} + b_1h_1 + b_2h_2)$
4 | σ3, σ4 | σ5, σ6 | h3, h4 | $A^{(4)}_{\sigma_3\sigma_4,h_3h_4}[\sigma_4] = \exp(a_4\sigma_4 + w_{33} + w_{43} + w_{44})$
5 | h3, h4, σ5 | σ6 | σ6 | $A^{(5)}_{h_3h_4,\sigma_6}[\sigma_5] = \sum_{h_3,h_4}\exp(a_5\sigma_5 + w_{53} + w_{54} + w_{64} + b_3h_3 + b_4h_4)$
6 | σ6 | ∅ | ∅ | $A^{(6)}_{\sigma_6,1}[\sigma_6] = \exp(a_6\sigma_6)$

2.6 Conclusion

In this work, we define a subset of generic RBM quantum states, the long-range-fast-decay (LRFD) RBM states.
Using the language of levels, we describe the nonlocal structure of LRFD RBMs with two functions: µ(r), which captures the localization of the spatial distribution of the wave function for each single level and encodes information about spin correlations, and λ(k̃), a level-decay factor that captures correlations of different orders and strongly influences the complexity of the RBMs. We derive upper bounds on truncation errors, which allow us to analyze how the spatial complexity of LRFD RBMs scales with the system size and the approximation error. We provide numerical results supporting that the ground states of a wide range of 1D quantum spin systems, including some critical systems, may be approximated by LRFD RBMs with at most polynomial complexity. Finally, we describe the relationships between state manifolds of different computational complexity and identify hierarchies of RBM-efficient approximation. Generalizing the RBM wave-function ansatz to the regime of infinitely many hidden nodes and proposing the concept of LRFD RBMs does not imply the use of an infinitely large neural network for state representation. Rather, these serve to define the completeness of a set of variational states and act as a tool for complexity analysis, owing to the good extensibility and analyzability of LRFD-RBM forms. This concept may promote a general understanding of the intrinsic complexity of quantum many-body states.

Chapter 3: Learning quasiparticle excitations in long-range interacting quantum systems using neural networks

3.1 Introduction

The interface between machine learning and quantum information processing is an emerging and rapidly advancing field [2, 3, 5, 6, 24, 72, 73].
Besides research on quantum algorithms [11, 12, 13, 14], on physics-inspired ideas to enhance traditional machine learning [15, 16], and on the fundamental limitations of physical agents based on quantum information theory [74, 75], there has also been tremendous progress in directly applying machine learning techniques to quantum systems. These techniques have been applied to a wide range of topics, including the identification of quantum phases and transitions [3, 17, 18, 19, 21], molecular modeling [22, 23], and quantum state tomography [24, 25]. Neural network states have been studied extensively in recent years, as they provide a compact wave-function ansatz for quantum many-body physics [5, 6, 7, 8, 9, 28, 30, 31]. These states have been successful in simulating the low-lying eigenstates and short-time unitary dynamics of quantum many-body-localized systems [6, 34] and in representing particular states such as code words of a stabilizer code [30, 35, 36] and chiral topological states [8, 37].

Numerous atomic, molecular, and optical systems exhibiting long-range interactions are emerging as versatile platforms for quantum computation and quantum simulation [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]. These long-range interactions include dipolar interactions (decaying as 1/r^3 with distance r) between electric [39, 40] or magnetic dipoles [41, 42, 43], strong van der Waals (1/r^6) interactions between Rydberg atoms [39, 44, 45] or Rydberg polaritons [46], and more general forms of interaction between trapped ions (1/r^α, 0 ≤ α ≤ 3) [47, 48, 49, 50, 51, 52]. Following a quench, quantum information in long-range interacting systems can propagate faster than in short-range interacting ones [48, 49, 76, 77, 78]. The notion of quasiparticles as elementary excitations often provides an effective way of understanding non-equilibrium dynamics and quantum thermalization [79, 80, 81].
For instance, the dynamics of correlation spreading and entanglement growth are often constrained by an effective light cone and, at low energy, can often be ascribed to the propagation of quasiparticles through the system [76, 77, 82, 83, 84, 85, 86, 87, 88, 89, 90]. This light cone is linear (t ∼ r) for short-range interactions but, for small α, can become sub-linear (e.g., t ∼ r^β with 0 < β < 1) or even instantaneous (t = 0 for any r in the thermodynamic limit). In the case of linear light cones, the maximum speed of quantum information propagation can often be related to the maximum group velocity of quasiparticles, while sub-linear and instantaneous light cones can often be associated with infinite quasiparticle group velocities in the thermodynamic limit. In higher dimensions D and in the presence of strongly long-range interactions, it can be hard to study the system numerically with widely adopted methods such as the density matrix renormalization group, owing to the violation of the entanglement area law [69, 91, 92] and the intractability of contracting higher-dimensional projected entangled pair states [93].

In this work, we use restricted Boltzmann machines (RBMs) to learn the momentum-resolved low-energy spectrum of long-range interacting systems that are studied in current quantum simulation experiments [47, 48, 49, 50, 51, 52]. We introduce an energy-shift method to calculate the excited states. The core idea of the method is to shift the energies of the original Hamiltonian so that a target excited state becomes the ground state of the new Hamiltonian. An advantage of this method is that the target excited state is represented by a single RBM along the entire optimization path, rather than by multiple RBMs [34].
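The core idea can be checked by exact diagonalization on a tiny system. The sketch below is an idealization under our own assumptions: it uses the TFIM of Eq. (2.33) with L = 4 and Bx = 1 as a test Hamiltonian, takes the lower eigenstates exactly from NumPy instead of from trained RBMs, and lifts each of them with a normalized projector (the form this chapter specifies in Eq. (3.2)) weighted by a single shift larger than the whole spectral width. The lowest eigenvalue of each shifted Hamiltonian should then coincide with the j-th eigenvalue of the original one.

```python
import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def site_op(op, j, L):
    """Embed a single-site operator `op` at site j (0-based) of an L-site chain."""
    out = op if j == 0 else I2
    for k in range(1, L):
        out = np.kron(out, op if k == j else I2)
    return out

def tfim_hamiltonian(L, Bx):
    """Eq. (2.33) with periodic boundaries: -sum sz_j sz_{j+1} - Bx sum sx_j."""
    H = np.zeros((2**L, 2**L))
    for j in range(L):
        H -= site_op(SZ, j, L) @ site_op(SZ, (j + 1) % L, L)
        H -= Bx * site_op(SX, j, L)
    return H

def shifted_hamiltonian(H0, lower_states, shifts):
    """H0 + sum_l E_l |psi_l><psi_l| / <psi_l|psi_l> over already-found lower states."""
    H = H0.astype(complex)
    for psi, e_shift in zip(lower_states, shifts):
        H += e_shift * np.outer(psi, psi.conj()) / np.vdot(psi, psi)
    return H

L, Bx = 4, 1.0
H0 = tfim_hamiltonian(L, Bx)
energies, vecs = np.linalg.eigh(H0)
# One shift larger than any energy difference, as the method requires.
e_shift = energies[-1] - energies[0] + 1.0

lowest_after_shift = []
for j in range(5):
    lower = [vecs[:, l] for l in range(j)]   # exact stand-ins for learned RBM states
    Hj = shifted_hamiltonian(H0, lower, [e_shift] * j)
    lowest_after_shift.append(np.linalg.eigvalsh(Hj)[0])
```

Because only energies are compared, this check also holds when levels are degenerate; singling out one eigenvector inside a degenerate level is a separate issue.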
Such a single-RBM representation has a smaller parameter space, and its variational parameters have a more natural physical interpretation, so that geometrical properties of the optimization path, such as the quantum Fisher matrix, can be studied more easily and help in understanding the corresponding quantum phase [94, 95, 96]. Combined with a fixed-quantum-number method [34] generalized to long-range interactions, this method can resolve the full quasiparticle dispersion relation even in the presence of strongly long-range interactions, and it works independently of the ground-state degeneracy and of the size of the gap above the ground-state manifold. Based on the low-energy spectrum, we identify the critical exponent αc at which the maximal quasiparticle group velocity changes from finite to divergent in the thermodynamic limit in D = 1. The results are quantitatively consistent with an analysis using field theory and linear spin-wave theory [68, 76, 85, 97]. Our use of RBMs to learn the spectrum of two-dimensional (2D) models is particularly noteworthy, since other numerical methods, such as tensor network states, are often computationally intractable in 2D [93]. Our results can help in understanding the information propagation speed, entanglement growth, and rate of thermalization in long-range interacting quantum systems. Our work can also provide benchmarks for large-scale state-of-the-art 1D and 2D quantum simulators.

3.2 Learning excited states with RBMs

We use the RBM as a variational wave-function ansatz to learn the eigenstates of quantum many-body spin systems. An RBM is a stochastic artificial neural network for probability modeling defined on a bipartite undirected graph [6]. Its first layer is the visible layer, which takes as input a spin configuration σ⃗ = (σ1, ..., σL) with L degrees of freedom, where σj = ±1 (j = 1, ..., L) for spin-1/2 systems. The second layer consists of hidden nodes h⃗ = (h1, ..., h_{Nh}) (hk = ±1 for k = 1, ..., Nh), which are introduced as auxiliary spins in the probability model. Given a spin configuration σ⃗, the RBM outputs a wave-function amplitude

$$\psi(\vec{\sigma}) = \sum_{\vec{h}} e^{\sum_j a_j\sigma_j + \sum_k b_k h_k + \sum_{j,k} W_{jk}\sigma_j h_k} = \prod_{j=1}^{L} e^{a_j\sigma_j}\prod_{k=1}^{N_h} 2\cosh\Big(b_k + \sum_j \sigma_j W_{jk}\Big), \tag{3.1}$$

where aj and bk are the visible and hidden biases, respectively, Wjk are the weights of the interlayer interactions, and all these parameters are complex numbers. The amplitudes on the computational basis together define the quantum state vector |Ψ⟩ = Σ_σ⃗ ψ(σ⃗)|σ⃗⟩. This RBM ansatz has good expressive power [7] and can often be optimized (trained) by well-developed numerical methods such as the stochastic reconfiguration method [6, 64, 65] and stochastic gradient descent [66]. While the RBM can be used to find ground states by reinforcement learning [6], its applicability to low-lying excited states of various quantum many-body systems, and the development of efficient optimization methods for this purpose, are less well explored.

Here we introduce an energy-shift method, which shifts the energies of the original Hamiltonian by adding projection operators so that a target excited state becomes the ground state of the new Hamiltonian. Let |Ψj⟩ denote the j-th eigenstate (in order of increasing energy) of a given Hamiltonian Ĥ0 for a system of L spins, where j = 0, 1, ..., 2^L − 1 and j = 0 corresponds to the ground state. To calculate |Ψj⟩, the idea is to first calculate all lower eigenstates |Ψl⟩ (0 ≤ l ≤ j − 1) and then modify Ĥ0 by adding projection operators that lift the energies of all |Ψl⟩:

$$\hat{H}_j = \hat{H}_0 + \sum_{l=0}^{j-1} E^{(\mathrm{shift})}_l\,\frac{|\Psi_l\rangle\langle\Psi_l|}{\langle\Psi_l|\Psi_l\rangle}. \tag{3.2}$$

This treatment is analogous to the quadratic penalty method for optimization problems with equality constraints [98] and makes the RBM converge to |Ψj⟩ under the effective imaginary-time evolution [6], provided that $E^{(\mathrm{shift})}_l$ is larger than the energy difference between |Ψj⟩ and |Ψl⟩.
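The product form in Eq. (3.1) holds because each hidden spin hk = ±1 enters the exponent linearly and can be summed out independently, giving e^{θk} + e^{−θk} = 2 cosh θk with θk = bk + Σj σj Wjk. A minimal NumPy check of this identity follows; the small random complex parameters are purely illustrative, not trained values.

```python
import itertools
import numpy as np

def rbm_amplitude_sum(sigma, a, b, W):
    """Left-hand side of Eq. (3.1): explicit sum over all 2**Nh hidden configurations."""
    total = 0.0 + 0.0j
    for h in itertools.product((-1, 1), repeat=len(b)):
        h = np.array(h)
        total += np.exp(a @ sigma + b @ h + sigma @ W @ h)
    return total

def rbm_amplitude_product(sigma, a, b, W):
    """Right-hand side of Eq. (3.1): hidden spins traced out analytically."""
    theta = b + sigma @ W          # effective field on each hidden node, shape (Nh,)
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(theta))

rng = np.random.default_rng(0)
L, Nh = 4, 3
a = 0.1 * (rng.normal(size=L) + 1j * rng.normal(size=L))
b = 0.1 * (rng.normal(size=Nh) + 1j * rng.normal(size=Nh))
W = 0.1 * (rng.normal(size=(L, Nh)) + 1j * rng.normal(size=(L, Nh)))

sigma = np.array([1, -1, -1, 1])
lhs = rbm_amplitude_sum(sigma, a, b, W)
rhs = rbm_amplitude_product(sigma, a, b, W)
```

The product form costs O(L·Nh) operations per amplitude instead of the O(2^Nh·L·Nh) of the explicit sum, which is what makes evaluating the ansatz practical.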
These projection operators can be implemented efficiently (Appendix B.1), so at least the low-lying excited states can in principle be calculated successively. This method is also applicable to resolving nearly degenerate ground states. However, for systems with exact ground-state degeneracy, such as spin glasses [99] and topologically ordered systems [100, 101], how to improve the accuracy under the sign problem [102] and whether RBMs can properly diagonalize the degenerate subspace deserve further investigation. For translationally invariant systems, the symmetry information can be used to calculate the lowest-energy eigenstate in each momentum sector [34]. In this RBM variant, the original RBM wave-function ansatz (Eq. (3.1)) applies only to canonical spin configurations, while the amplitudes of other configurations are constructed by mapping them onto canonical configurations, so that the whole state satisfies the translational-symmetry constraints. Specifically, in D = 1, let T̂ denote the unit-distance translation operators for spin configurations along the cha