ABSTRACT

Title of dissertation: APPLICATIONS OF ARTIFICIAL
NEURAL NETWORKS IN LEARNING
QUANTUM SYSTEMS

Ruizhi Pan
Doctor of Philosophy, 2023

Dissertation directed by: Professor Charles W. Clark
Department of Physics

Quantum machine learning is an emerging field that combines techniques from the disciplines of machine learning (ML) and quantum physics. Research in this field takes three broad forms: applications of classical ML techniques to quantum physical systems, quantum computing and algorithms for classical ML problems, and new ideas inspired by the intersection of the two disciplines. In this work, we mainly focus on the power of artificial neural networks (NNs) in quantum-state representation and phase classification.

In the first part of the dissertation, we study NN quantum states, which are used as wave-function ansätze in the context of quantum many-body physics. While these states have succeeded in simulating low-lying eigenstates and short-time unitary dynamics of quantum systems, and in efficiently representing particular states such as those with a stabilizer nature, a more rigorous quantitative analysis of their expressibility and complexity is warranted. Here, our analysis of the restricted Boltzmann machine (RBM) state representation of one-dimensional (1D) quantum spin systems provides new insight into their computational complexity. We define a class of long-range-fast-decay (LRFD) RBM states with quantifiable upper bounds on truncation errors and provide numerical evidence that a large class of 1D quantum systems may be approximated by LRFD RBMs of at most polynomial complexity. These results lead us to conjecture that the ground states of a wide range of quantum systems may be exactly represented by LRFD RBMs or a variant of them, even in cases where other state representations become less efficient. Finally, we establish the relations between multiple typical state manifolds. Our work proposes a paradigm for complexity analysis of generic long-range RBMs, which naturally yields a further classification of this manifold. This paradigm and our characterization of the nonlocal structures of these RBMs may pave the way for understanding the natural measure of complexity for quantum many-body states described by RBMs, and they are generalizable to higher-dimensional systems and deep neural-network quantum states.

In the second part, we use RBMs to investigate the many-body excitations of long-range power-law interacting quantum spin models in dimensions D = 1 and 2. We develop an energy-shift method to calculate the excited states of such spin models and obtain a high-precision momentum-resolved low-energy spectrum. This enables us to identify numerically the critical exponent at which the maximal quasiparticle group velocity changes from finite to divergent in the thermodynamic limit. In D = 1, the results agree with an analysis using field theory and semiclassical spin-wave theory. Furthermore, we generalize the RBM method for learning excited states in nonzero-momentum sectors from 1D to 2D systems. Finally, we analyze and provide all possible values (3/2, 2, and 3) of the critical exponent for generic 1D quadratic bosonic and fermionic Hamiltonians with long-range hoppings and pairings, which helps in understanding the speed of information propagation in quantum systems.

In the third part, we study deep NNs as phase classifiers. We first analyze the phase diagram of a 2D topologically nontrivial fermionic model Hamiltonian with pairing terms, and then demonstrate that deep NNs can learn the band-gap closing conditions based only on wave-function samples of several typical energy eigenstates, and are thus able to identify the phase-transition point without knowledge of the Hamiltonian.


APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS
IN LEARNING QUANTUM SYSTEMS

by

Ruizhi Pan

Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment

of the requirements for the degree of
Doctor of Philosophy

2023

Advisory Committee:
Professor Mohammad Hafezi, Chair
Professor Charles W. Clark, Co-Chair/Advisor
Professor Victor Yakovenko
Professor Alexey Gorshkov
Professor Christopher Jarzynski, Dean’s representative


© Copyright by
Ruizhi Pan

2023


Acknowledgments

I would first like to express my heartfelt gratitude to my advisor, Charles Clark, who has supported me throughout my Ph.D. study with great patience and encouragement. I really appreciate the freedom that Charles gave me in choosing research topics and exploring them at my own pace. I am very grateful for his guidance on how to delve into a specific research area and for the so-called paper-torture sessions, in which we discussed manuscript writing in detail. In fact, they are not torture for me; I really enjoy and cherish the process. Working under the guidance of a knowledgeable, prestigious and amiable professor was my long-cherished wish, and I think it has been fulfilled at the University of Maryland, College Park.

I would like to thank the many professors at the Joint Quantum Institute, the Joint Center for Quantum Information and Computer Science and other research institutes who offered me tremendous help in my Ph.D. research, including Alexey Gorshkov, Victor Yakovenko, Mohammad Hafezi, Andrew Childs, Ian Spielman, Sankar Das Sarma, Jay Deep Sau, Ana Maria Rey, Christopher Jarzynski, Jacob Taylor, Chris Greene, Ricardo Nochetto, Xiaoji Zhou, and Xuzong Chen. I really appreciate their willingness to share their knowledge and their enthusiasm for science.

I also want to thank my friends in the departments of physics, mathematics and computer science: Wenbo Li, Fangli Liu, Yi-Hsieh Wang, Yiming Cai, Peizhi Du, Bin Cao, Yuchen Yue, Tongyang Li, Linfeng Zhang, Dong-ling Deng, Yidan Wang, Zhexuan Gong, Renxiong Wang, Xunnong Xu, Haitan Xu, Chunxiao Liu, Alejandra Maldonado-Trapp, Ben Eller, Tengfei Su, Peng Zhang and many other colleagues. Their support and encouragement give me warmth and courage.

Finally, I would like to thank my parents. Their love and support are my permanent source of strength to explore the world.



Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Abbreviations

Chapter 1: Introduction
1.1 Background
1.2 Outline of dissertation

Chapter 2: Efficiency of neural-network state representations of one-dimensional quantum spin systems
2.1 Introduction
2.2 The restricted Boltzmann machine as a wave-function ansatz
2.2.1 Long-range-fast-decay RBMs
2.2.2 Effects of wave-function truncation for fixed system sizes
2.2.3 Scaling of complexity
2.2.4 Spin-correlation information
2.3 Ground-state applications
2.4 State manifolds and complexity classification
2.5 Transformation from RBMs to MPSs
2.6 Conclusion

Chapter 3: Learning quasiparticle excitations in long-range interacting quantum systems using neural networks
3.1 Introduction
3.2 Learning excited states with RBMs
3.3 Group velocity transition
3.4 Correlations in excited states
3.5 2D generalization
3.6 Summary of quadratic Hamiltonians with long-range hoppings and pairings
3.7 Experimental relevance
3.8 Conclusion

Chapter 4: Deep neural networks as phase classifiers
4.1 Introduction
4.2 Model and physical intuition
4.3 Zero-energy edge modes
4.4 Phase diagram at the sweet spot
4.5 Application of neural networks as phase classifiers
4.6 Conclusion

Appendix A: Appendix for Chapter “Efficiency of neural-network state representations of one-dimensional quantum spin systems”
A.1 Proof of the convergence of $|\Psi^{(L,\infty)}\rangle$ for long-range-fast-decay RBMs
A.2 Proof of upper bounds on truncation errors for LRFD RBMs
A.3 Proof of spin correlation formula
A.4 LRFD RBMs approximating the Kronecker delta function
A.5 Error curves

Appendix B: Appendix for Chapter “Learning quasiparticle excitations in long-range interacting quantum systems using neural networks”
B.1 Analysis of energy-shift method
B.2 Field-theory formula analysis and data collapse

Appendix C: Appendix for Chapter “Neural networks as phase classifiers”
C.1 Hamiltonian in Majorana representation
C.2 Derivation of the phase boundary between gapped SC and gapless phases

Appendix D: Other research projects
D.1 Optomechanics and novel quantum phase of the Bose-Einstein Condensate with the cavity-mediated spin-orbit coupling

Bibliography


List of Tables

2.1 Complexity estimations for distinct typical settings of µ(r) and λ(k̃). “−” in the µ(r) column denotes all µ(r) functions that make Q(L) converge as L → ∞. “−” in the λ(k̃) column denotes all λ(k̃) functions that make P(Nh/L) have the asymptotic behavior O(1/ln(Nh/L)) as Nh → ∞. In all settings, δP > 1 and αP > 1/2, and each entry describes the asymptotic behavior of the corresponding function.

2.2 Failure of the algorithm for finding an optimal transformation from an RBM to an MPS. The RBM to be transformed has the network structure shown in Fig. 2.6. The original algorithm uses a random selection of “separation sets”.

2.3 Successful generation of an MPS using our improved algorithm. The RBM to be transformed is the same as that in Table 2.2.

3.1 The critical exponent αc for generic fermionic quadratic Hamiltonians with long-range hoppings and pairings. dj (1 ≤ j ≤ 7) are constants dependent on J0, J1, ∆ and α.

3.2 The critical exponent αc for generic bosonic quadratic Hamiltonians with long-range hoppings and pairings. d′j (1 ≤ j ≤ 9) are constants dependent on J0, J1, ∆ and α.


List of Figures



2.1 (a) Network structure of RBMs as a wave-function ansatz for 1D quantum spin systems. Long-range RBMs usually imply full connectivity between the visible and hidden layers. The hidden layer is divided into $N_h/L$ levels, each containing $L$ hidden nodes. (b) Importance measure $\eta(j, \tilde{k}, L)$ for an LRFD RBM with translational symmetry. The RBM is constructed as Eqs. (2.24)–(2.26) show, where $\lambda(\tilde{k}) = \tilde{k}^{-\alpha_P}$, $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for $r \neq 0$, $\mu(0) = \delta_Q = 0.5$, $\alpha_P = 0.75$, $\alpha_Q = 1.5$, $c_w = 1 + i$, $c_b = 0$, $a_0 = 0$ and $L = 11$. The inset plots, on a log-log scale, the decay of the maximum of $\eta(j, \tilde{k}, L)$ over all $j$ at each level as $\tilde{k}$ increases. The linearity of the curve reveals a power-law decay of the “ridge” (red circles) of the 3D structure.

2.2 (a)(b) Comparison between the exact and estimated truncation errors $\varepsilon(L, N_h)$ as a function of $N_h$ with fixed $L$ for 1D SPT cluster states with a perturbation part. E Ψ: exact values, first-type truncation errors; U Ψ: upper-bound-based estimation, first-type; E CZ: exact, second-type, $\hat{B} = \hat{\sigma}^z_1 \hat{\sigma}^z_2$; E CX: exact, second-type, $\hat{B} = \hat{\sigma}^x_1 \hat{\sigma}^x_2$; U CX: upper-bound-based estimation, second-type, $\hat{B} = \hat{\sigma}^x_1 \hat{\sigma}^x_2$. The perturbation part is constructed as Eqs. (2.24)–(2.26) show, with $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for $r \neq 0$, $\mu(0) = \delta_Q = 0.1$, $\alpha_Q = 3$, $c_w = c_b = 1 + i$, $a_0 = 0$ and $L = 11$. (a) Exponential decay of $\lambda(\tilde{k})$ (vertical axis: log scale), with $\lambda(\tilde{k}) = 0.2\,\delta_P^{-(\tilde{k}-1)}$ and $\delta_P = 1.5$. (b) Power-law decay of $\lambda(\tilde{k})$ (log-log scale), with $\lambda(\tilde{k}) = \tilde{k}^{-\alpha_P}$ and $\alpha_P = 3$. (c) Schematic interpretation of variables used in the proofs. The inset in (c) shows the distribution of data points of the ratio $\psi^{(L,\infty)}(\vec{\sigma})/\psi^{(L,N_h)}(\vec{\sigma})$ with $N_h = L$, which corresponds to a truncation removing hidden nodes starting from the second level. The parameter setting is the same as in (b). As our analysis predicts, the data points are all localized in the neighborhood of $z = 1$ in the complex plane enclosed by the red solid curves in (c). (d) Scaling of $N_h^*(L, \varepsilon_0)$ in $L$ for two fixed values of the first-type truncation error $\varepsilon_0$, with the same parameter setting as in (b). Red circles: $\varepsilon_0 = 10^{-7}$; blue squares: $\varepsilon_0 = 10^{-10}$. The inset in (d) shows the scaling of $N_h$ estimated from our upper bounds with $\varepsilon_0 = 10^{-3}$. Magenta solid: using exact values of the upper bounds; brown dashed: using leading-order estimates. The two curves almost coincide.

2.3 Spin correlations in the z direction as a function of distance $r$ on a log-log scale. The LRFD RBMs are constructed as Eqs. (2.24)–(2.26) show, where $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for $r \neq 0$, $\mu(0) = \delta_Q = 0.2$, $\lambda(\tilde{k}) = \tilde{k}^{-\alpha_P}$, $\alpha_P = 3.5$, $c_w = 1$, $c_b = 0$, $a_0 = 0$, $L = 22$ and $N_h = 5L$. The inset shows the spin correlation $\langle \hat{\sigma}^z_1 \hat{\sigma}^z_{1+L/2} \rangle$, with $r$ equal to half the chain length, for varying $L$ (log-log scale). It shows a convergence of $\langle \hat{\sigma}^z_1 \hat{\sigma}^z_{1+L/2} \rangle$ to an $L$-independent constant (almost attaining the maximum value 1) for $\alpha_Q = 1/2$ and a decay for $\alpha_Q = 2$.



2.4 Importance measure η(j, k̃, L) for the RBMs approximating ground states of two critical systems with L = 15. (a) TFIM with Bx = 1. (b) XXZ model with Jz = −0.2. The insets in each subfigure show, on a log-log scale, the decay of the maximum importance measure at each level as the level number k̃ increases, for system sizes L = 9, 11, 13 and 15. The purple dashed curve indicates that these decay curves can be upper bounded by a power-law decay. By numerical fitting, the corresponding αP for (a) and (b) are 2.957 and 1.232, respectively.

2.5 Relations between multiple typical state manifolds. $S_1$: short-range RBMs; $S_2$: LRFD RBMs; $S_2^{(j)}$ (for $1 \le j \le 6$, $j \in \mathbb{N}$): LRFD RBMs with distinct parameter conditions, specified in Table 2.1; $S_3$: RBMs with spatial complexities scaling at most polynomially in system size; $S_4$: RBMs with a faster-than-polynomial scaling of spatial complexity in system size, corresponding to an inefficient representation; $S_5$: ground states of 1D quantum spin systems. The dashed boundary of $S_5$ means that its relations with the other manifolds have not been fully determined.

2.6 Network structure of a sparse RBM to be transformed into an MPS. This RBM serves as an example to show the possible failure of the original transformation algorithm and the effect of our improvement.

3.1 Low-energy spectra (red circles) obtained by RBMs with hidden-unit density β = 1 compared to the results obtained by exact diagonalization (solid blue line), and the relative errors of the energy values using RBMs with β = 1 and 2 (dashed lines). (a) Gapped long-range TFIM with α = 1.2, B = 5 and L = 18. (b) Gapless long-range XXZ model with α = 1.2, Jz = −0.5 and L = 18.

3.2 (a,b) Dispersion relations shifted vertically so that the ground-state energy is zero. The results are obtained by RBMs with β = 1. (a) TFIM with α = 1.5, 2, 3 and ∞, with B = 4 and L = 53. (b) XXZ model with α = 1.5, 2.5, 3, 3.5 and ∞, with Jz = −0.5 and L = 53. (c) Group velocity vg(k1) as a function of α for different system sizes L for the TFIM with B = 4. (d) Data collapse for (c). The insets in the lower right corner show the variance in the data-collapse procedure for a range of αc.

3.3 Longitudinal correlations in low-lying excited states of the long-range TFIM in different phases. (a) The lowest excited states in sectors k = 0 and k = 2π/L in the paramagnetic phase with α = 1.8, B = 4 and L = 70. Inset: the long-distance limits of longitudinal correlations in the lowest excited state in the sector k = 0 for 30 ≤ L ≤ 70. (b) The lowest excited state in the sector k = 2π/L in the ferromagnetic phase with α = 1.8, B = 0.9 and L = 70. Inset: the long-distance limits of longitudinal correlations in the lowest excited state in the sector k = 2π/L for 50 ≤ L ≤ 70.



3.4 2D long-range TFIM on a square lattice. (a) Low-lying energy spectrum by generalized RBMs with hidden-unit density $\beta = 2$; $\alpha = 3.5$, $B = 6$, $L = 4 \times 4 = 16$. (b) Relative error in the energies in (a) compared to exact-diagonalization solutions. $E_{j_1 j_2}$ denotes $E\left(k_y = \frac{2\pi}{\sqrt{L}} j_1,\ k_z = \frac{2\pi}{\sqrt{L}} j_2\right)$. (c,d) Energy spectra by generalized RBMs with $\beta = 1$; $\alpha = 2.5$ (for (c)) and $\alpha = 4.5$ (for (d)), $B = 8$, $L = 8 \times 8 = 64$. In (a), (c) and (d), the red points correspond to the ground states.

3.5 2D long-range TFIM on a triangular lattice. (a) Low-lying energy spectrum by generalized RBMs with $\beta = 2$; $\alpha = 3$, $B = 8$, $L = 4 \times 4 = 16$. (b) Relative error in the energies in (a) compared to exact-diagonalization solutions. $E_{j_1 j_2}$ denotes $E\left(k_1 = \frac{2\pi}{\sqrt{L}} j_1,\ k_2 = \frac{2\pi}{\sqrt{L}} j_2\right)$. (c,d) Energy spectra by generalized RBMs with $\beta = 1$; $\alpha = 2.7$ (for (c)) and $\alpha = 4.5$ (for (d)), $B = 9$, $L = 8 \times 8 = 64$. In (a,c,d), the red points correspond to the ground states. In (a,c), the insets are heat maps of the corresponding energy spectra and exhibit the expected rotational symmetry. In (d), the inset shows the scaling of the y-component $v_y$ of the group velocity at $\vec{k} = \vec{0}$ with system size ($L = 16$, 36, and 64) for different $\alpha$, obtained by RBM learning.



4.1 Physical intuition for the generation of zero-energy edge modes in the case µ = 0. (a) Kitaev’s 1D spinless p-wave SC quantum wire. Two neighboring MFs constitute a normal fermion. The blue arrows in the upper chain signify the internal pairing of MFs with no unpaired MFs remaining. The red arrows in the lower chain indicate the inter-cell pairing of MFs with two unpaired MFs at the ends of the chain. (b) MF coupling at a single armchair edge of a 2D honeycomb lattice. The upper two and lower left subfigures are for our model, in which there is no 3-fold rotational symmetry (RS). The solid bonds are the net Majorana couplings contributed by terms related to (∆, t), (∆↑, t↑) and (∆↓, t↓) in Ĥ, respectively. The shaded and colored cells denote the dangling MFs in a hexagon at a single armchair edge of the lattice. For comparison with our model, the lower right subfigure shows an example of unexpected MF couplings in the presence of rotational symmetry.

4.2 Schematic of our physical system. (a) The three vectors (δ⃗j) connecting nearest-neighbor sites and the three vectors (r⃗j) connecting NNN sites, for j = 1, 2, 3. (b) Angular distribution of the sign in the amplitude and phase of the defined pairing, which resembles a domain-wall structure. (c) The distribution of zero modes at a single armchair edge. The heights of the pillars denote the amplitudes of the wave function at each site.

4.3 Band structure for $\mu = 0$ in four cases characterized by different gap conditions and winding numbers $w$. Fixed parameters: $t = 1$, $t_\uparrow = 0.4$, $t_\downarrow = 0.6$. (a) $\vec{p} = (0.9K_x, 0)$; (b) $\vec{p} = (0, 3K_y)$; (c) $\vec{p} = (0.1K_x, 0.2K_y)$; (d) $\vec{p} = (0.6K_x, 3K_y)$, where $K_x = 2\pi/3$ and $K_y = K_x/\sqrt{3}$. Each group of blue curves represents a bulk band. The dark red straight lines are the zero-energy edge modes, and the brown and green curves in (b) and (d) are gapless edge modes. In (a) and (b), the zero modes are 4-fold degenerate; in (c) and (d), they are 2-fold degenerate. The inset in (c) zooms in on the split zero modes (magenta) and the completely flat zero modes (dark red) separately.

4.4 Phase diagram at the generalized “sweet spot” when µ = 0, obtained by numerical simulation. (a) shows the phase boundary between the gapped SC phase (purple) and the gapless phase (light green). (b) shows the phase boundary between phases with different winding numbers. The green region indicates w = 0, the blue region w = 1 and the red region w = −1. Combining these two panels gives the full 4-phase diagram.

4.5 Regions of the gapped SC phase in the 3D parameter space (sinΦ1, sinΦ2, sinΦ3). The shaded surface near the 12 edges of the cube shows the domain in which at least one inequality |cosΦj| + |cosΦk| ≥ |cosΦl| is violated, where (j, k, l) is a permutation of (1, 2, 3). This surface defines the gapped SC phase. Note that the only points in this parameter space that have physical significance are those for which Φ2 = Φ1 + Φ3 (including both the near-edge and kernel regions).



4.6 (a) Training process using the deep-learning toolbox provided by Matlab software. The prediction accuracy increases and the loss decreases as training proceeds. (b) Results of learning the phase transition between the gapped SC phase and the gapless phase (both with a winding number of 0). py/Ky is fixed at 0. The exact phase-transition point is px/Kx = 2/3. The output of a fully trained deep NN exhibits a sudden change when px/Kx varies across the phase-transition point. The wave-function samples used as training data cover only a small parameter range far from the critical point.

A.1 Distribution of the squared normalized wave-function amplitudes in the spin-configuration space, which can approximate the Kronecker delta function. The LRFD RBMs are constructed as Eqs. (A.141)–(A.143) show, with $\mu_0 = 0.1$, $L = 13$ and $\lambda(\tilde{k}) = \delta_P^{-(\tilde{k}-1)}$ with $\delta_P > 1$. The horizontal axis denotes the numerical indices of all $2^L$ spin configurations, sorted in monotonically decreasing order of their corresponding amplitudes. The inset shows the ratio $|\psi^{(L,\infty)}(\vec{\sigma}_0)/\psi^{(L,\infty)}(\vec{\sigma}_1)|^2$ as a function of $1/\delta_P$.
A.2 (a) Approximation errors as a function of the number of hidden nodes for truncated LRFD RBMs and the optimal RBMs. (b) Approximation errors for the calculations of spin correlations in the x and z directions as a function of the number of levels kept in the truncated LRFD RBMs for the XXZ model, compared with results from exact diagonalization. The figure is plotted on a log-log scale.

B.1 Long-range TFIM with B = 6 learned with an RBM with hidden-unit density equal to 1. (a) Group velocity vg(k1) as a function of α for a range of system sizes L. (b) Data collapse for (a); the inset shows the variance in the data-collapse procedure for different choices of αc.



List of Abbreviations

BEC Bose-Einstein Condensate
CQED Cavity Quantum Electrodynamics
EM Electromagnetic
LRFD Long-range-fast-decay
MD Multi-fold Degeneracy
MF Majorana Fermion
ML Machine Learning
MPS Matrix Product State
NN Neural Network
NNN Next Nearest Neighbor
RBM Restricted Boltzmann Machine
SC Superconducting
SOC Spin-orbit Coupling
TFIM Transverse Field Ising Model
TRS Time-Reversal Symmetry



Chapter 1: Introduction

1.1 Background

Quantum machine learning is an emerging field that combines techniques from the disciplines of machine learning (ML) and quantum physics [1, 2, 3, 4, 5, 6, 7]. Research in this field takes three broad forms [2]: applications of classical ML techniques to quantum physical systems [4, 5, 6, 7, 8, 9, 10], quantum computing and algorithms for classical ML problems [11, 12, 13, 14], and new ideas inspired by the intersection of the two disciplines [15, 16]. In the field of learning quantum systems, there has been tremendous progress in applying ML techniques to identifying quantum phases and transitions [3, 17, 18, 19, 20, 21], molecular modeling [22, 23], quantum state tomography [24, 25], and accelerating Monte Carlo simulations [26, 27]. While ML encompasses a wide range of modeling tools and computational algorithms suited to different needs in theoretical modeling and information processing, artificial neural networks (NNs), which are computing systems with specific architectures inspired by the biological neural networks in animal brains, often play an important role due to their power in function approximation, classification and data processing.

One rapidly advancing field in recent years is the investigation of neural-network quantum states in the context of quantum many-body physics [5, 6, 7, 8, 9, 28, 29, 30, 31]. The core idea in this field is to postulate an ansatz for the wave function in terms of a neural network (NN) [6], which targets a low-dimensional manifold in the exponentially large Hilbert space for state approximation [32], and to apply ML algorithms to find a specific solution. One of the most commonly used NNs is the restricted Boltzmann machine (RBM) [6, 7, 8, 28, 29], a bipartite stochastic construct that combines the concept of a thermodynamic partition function with those of classical artificial neural networks. The RBM usually serves as the building block for understanding and training deeper networks because of its relatively simple structure for inference and its power in parametric modeling as a universal approximator for discrete distributions [33]. It has succeeded in representing a wide range of quantum states, such as low-lying eigenstates of quantum many-body-localized systems [6, 34], code words of a stabilizer code [30, 35, 36] and chiral topological states [8, 37, 38].
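To make the RBM ansatz concrete, consider the parameterization most commonly used for spin-1/2 systems: visible spins σ_i = ±1, hidden units h_j = ±1, visible biases a_i, hidden biases b_j, and couplings W_ji only between the two layers. Because there are no hidden-hidden couplings, the hidden units can be summed out analytically, giving ψ(σ) = exp(Σ_i a_i σ_i) Π_j 2cosh(b_j + Σ_i W_ji σ_i). The Python/NumPy sketch below assumes this parameterization; the names a, b, W and all function names are illustrative and not taken from this dissertation. It checks the closed form against an explicit sum over hidden configurations.

```python
import itertools

import numpy as np


def rbm_amplitude(sigma, a, b, W):
    """Unnormalized RBM amplitude psi(sigma) with the hidden units summed out.

    sigma : (N,) visible spins, entries +1 or -1
    a     : (N,) visible biases (may be complex)
    b     : (M,) hidden biases (may be complex)
    W     : (M, N) visible-hidden couplings (may be complex)
    """
    theta = b + W @ sigma  # effective field acting on each hidden unit
    # Summing each h_j over +1 and -1 yields one factor 2*cosh(theta_j).
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(theta))


def rbm_amplitude_bruteforce(sigma, a, b, W):
    """Reference value: explicit sum of exp(a.s + b.h + h.W.s) over all h."""
    total = 0.0 + 0.0j
    for h in itertools.product([1, -1], repeat=len(b)):
        h = np.array(h)
        total += np.exp(a @ sigma + b @ h + h @ W @ sigma)
    return total


def rbm_state(a, b, W):
    """Normalized state vector over all 2^N visible configurations."""
    configs = np.array(list(itertools.product([1, -1], repeat=len(a))))
    psi = np.array([rbm_amplitude(s, a, b, W) for s in configs])
    return configs, psi / np.linalg.norm(psi)


# Usage: small random complex parameters (illustrative values only).
rng = np.random.default_rng(0)
N, M = 4, 3
a = 0.1 * (rng.normal(size=N) + 1j * rng.normal(size=N))
b = 0.1 * (rng.normal(size=M) + 1j * rng.normal(size=M))
W = 0.1 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))
sigma = np.array([1, -1, -1, 1])
print(rbm_amplitude(sigma, a, b, W), rbm_amplitude_bruteforce(sigma, a, b, W))
configs, psi = rbm_state(a, b, W)
```

Complex-valued parameters let the same expression encode both amplitudes and phases. Enumerating all 2^N configurations is only feasible for small N; for larger systems, such amplitudes are typically evaluated on sampled configurations instead.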

One of the central challenges in the representation theory of quantum many-body states is to find efficient representations from which the physical quantities of the global quantum system can be extracted with as little information loss as possible [32]. It has been demonstrated that RBMs, as representatives of NNs, have nonnegligible advantages in representing particular sets of quantum states, including states whose entanglement entropy violates an area law, states of high-dimensional quantum systems and states related to chiral topological order [7]. However, it remains to be determined what the natural measure of complexity is for quantum many-body states described by RBMs, and how to quantitatively study the expressibility of this state manifold.

To fully understand and exploit the power of NN state representation, we apply it to the investigation of long-range interacting quantum systems, namely those with interactions decaying as a power law 1/r^α in distance r. The low-lying eigenstates of such systems can possess quantum correlations that decay slowly with distance and entanglement entropy that violates the area law, and they are thus often associated with a higher parameterization complexity. In recent years, numerous atomic, molecular, and optical systems exhibiting long-range interactions have emerged as versatile platforms for quantum computation and quantum simulation [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]. Investigating the application of NNs to these systems may therefore improve our understanding of more complex quantum states and provide potential benchmarks for large-scale near-term quantum simulators.

Besides parameterizing wave functions of quantum systems, NNs with more complicated architectures, such as convolutional NNs, can also benefit the field of statistical physics in other ways. One of these is the use of deep NNs to identify phases and phase transitions, based on their ability to detect multiple types of order parameters, including in some nontrivial states without conventional order. It has been shown that NNs can identify phase transitions in correlated many-body systems without knowledge of the locality conditions of the Hamiltonians [3]. We are curious about how NNs behave in learning the phase-transition points of quantum models with pairing terms.

1.2 Outline of dissertation

In this dissertation, we focus on quantitatively analyzing the expressibility and complexity of NN quantum states in the context of quantum many-body physics, investigating their application to learning many-body excitations of long-range power-law interacting quantum spin models, and using deep NNs for phase classification.

In Chapter 2, we study the efficiency of RBMs in state representation and provide a paradigm for complexity analysis of generic long-range RBMs. We propose a new concept, long-range-fast-decay (LRFD) RBM states, with a quantified nonlocal structure. We derive upper bounds on truncation errors associated with two measures of state differences. Then we identify distinct asymptotic scaling laws of spatial complexities for RBMs, determined by the nonlocal interaction pattern between physical and virtual particles in their forms. Based on this analysis, we provide numerical results supporting the claim that ground states of a wide range of 1D quantum systems may be approximated by LRFD RBMs with at most polynomial complexity. These results offer evidence for the potentially high efficiency of RBMs in regimes where other state parameterizations become less efficient. Finally, we establish the relations between multiple typical state manifolds.
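The idea of truncating an LRFD RBM level by level can be illustrated numerically in a toy setting. The sketch below is not a reproduction of Eqs. (2.24)–(2.26). For illustration only, it assumes a translation-invariant product form W_{(k̃,j),i} = c_w λ(k̃) µ(r_ij) for the coupling between hidden node j on level k̃ and spin i, with λ(k̃) = k̃^{−αP}, µ(0) = δQ and µ(r) = ½δQ r^{−αQ} (the functional forms quoted in the captions of Figs. 2.1–2.3), and zero biases. As a stand-in for the dissertation's two types of truncation errors, it measures the infidelity 1 − |⟨ψ_all|ψ_trunc⟩|² between the state built from all n_levels levels and the state that keeps only the first few levels. All helper names and parameter values are hypothetical.

```python
import itertools

import numpy as np


def ring_distance(i, j, L):
    """Distance between sites i and j on a periodic chain of L sites."""
    d = abs(i - j) % L
    return min(d, L - d)


def lrfd_like_weights(L, n_levels, alpha_p, alpha_q, delta_q, c_w):
    """HYPOTHETICAL couplings W[(k-1)*L + j, i] = c_w * lam(k) * mu(r_ij).

    Level k = 1..n_levels holds L hidden nodes, one above each site j.
    lam(k) = k**(-alpha_p); mu(0) = delta_q, mu(r) = 0.5*delta_q*r**(-alpha_q).
    """
    W = np.zeros((n_levels * L, L), dtype=complex)
    for k in range(1, n_levels + 1):
        lam = k ** (-alpha_p)
        for j in range(L):
            for i in range(L):
                r = ring_distance(i, j, L)
                mu = delta_q if r == 0 else 0.5 * delta_q * r ** (-alpha_q)
                W[(k - 1) * L + j, i] = c_w * lam * mu
    return W


def rbm_state(W):
    """Normalized RBM state with zero biases, psi(s) = prod_h 2*cosh((W s)_h)."""
    L = W.shape[1]
    configs = np.array(list(itertools.product([1, -1], repeat=L)))
    theta = configs @ W.T  # shape (2**L, number of hidden nodes)
    # Add log-cosh terms, then exponentiate; exp(sum(log z)) = prod(z) on any branch.
    log_psi = np.sum(np.log(2.0 * np.cosh(theta)), axis=1)
    psi = np.exp(log_psi - log_psi.real.max())  # global rescaling only
    return psi / np.linalg.norm(psi)


def truncation_infidelity(W_full, n_keep_levels, L):
    """1 - |<psi_all|psi_trunc>|^2 when only the first n_keep_levels levels are kept."""
    psi_all = rbm_state(W_full)
    psi_trunc = rbm_state(W_full[: n_keep_levels * L])
    return 1.0 - abs(np.vdot(psi_all, psi_trunc)) ** 2


# Usage with illustrative parameters (alpha_P = 3, alpha_Q = 1.5, delta_Q = 0.5).
L, n_levels = 8, 5
W = lrfd_like_weights(L, n_levels, alpha_p=3.0, alpha_q=1.5, delta_q=0.5, c_w=1 + 1j)
for n_keep in range(1, n_levels + 1):
    print(n_keep * L, truncation_infidelity(W, n_keep, L))
```

With a fast-decaying λ(k̃), the hidden nodes on high levels see only small effective fields, so each factor 2cosh(θ) is close to a constant and dropping those levels barely changes the normalized state. This is the intuition that the quantitative upper bounds in Chapter 2 make precise.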

In Chapter 3, we use RBMs to investigate the many-body excitations of long-range power-law interacting quantum spin models in dimensions D = 1 and 2. We develop an energy-shift method to calculate the excited states of such spin models and obtain a high-precision momentum-resolved low-energy spectrum. We numerically identify the critical exponent at which the maximal quasiparticle group velocity changes from finite to divergent in the thermodynamic limit. In D = 1, we show that the results agree with an analysis using field theory and semiclassical spin-wave theory. Furthermore, we generalize the RBM method for learning excited states in nonzero-momentum sectors from 1D to 2D systems. Finally, we analyze and provide all possible values (3/2, 2, and 3) of the critical exponent for generic 1D quadratic bosonic and fermionic Hamiltonians with long-range hoppings and pairings.
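The finite-to-divergent transition of the maximal group velocity can already be seen in the simplest quadratic example, a 1D chain with power-law hopping only. For illustration, assume the dispersion ε(k) = −2 Σ_{r=1}^{L/2} r^{−α} cos(kr) with α > 1, so that v(k) = dε/dk = 2 Σ_r r^{1−α} sin(kr). This toy model has no pairing terms and is not the analysis of Chapter 3. For α > 2, the bound |v(k)| ≤ 2 Σ_r r^{1−α} converges, whereas for 1 < α < 2 the velocity at the smallest momentum k = 2π/L grows roughly as L^{2−α}. In this toy case the threshold is therefore α = 2, one of the three values quoted above. The function name below is hypothetical.

```python
import numpy as np


def max_group_velocity(alpha, L):
    """Max |d eps/dk| over lattice momenta for a toy power-law hopping chain.

    Assumed dispersion: eps(k) = -2 * sum_{r=1}^{L//2} cos(k r) / r**alpha,
    hence v(k) = 2 * sum_r r**(1 - alpha) * sin(k r).  The r = L/2 term
    vanishes at lattice momenta k = 2*pi*n/L, so its multiplicity on a
    ring does not matter here.
    """
    r = np.arange(1, L // 2 + 1)
    k = 2.0 * np.pi * np.arange(1, L // 2 + 1) / L  # v(k) is odd, so k in (0, pi] suffices
    v = 2.0 * np.sin(np.outer(k, r)) @ (r ** (1.0 - alpha))
    return np.max(np.abs(v))


for alpha in (1.5, 3.0):
    # Values that keep growing with L signal a divergent maximal velocity.
    print(alpha, [round(float(max_group_velocity(alpha, L)), 3) for L in (200, 800, 3200)])
```

For α = 3 the printed values saturate quickly with L, while for α = 1.5 they grow steadily, roughly doubling each time L is quadrupled.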



In Chapter 4, we study deep NNs as phase classifiers. We first design a 2D topologically nontrivial fermionic model Hamiltonian with pairing terms and study its phase diagram, and then demonstrate that deep NNs can learn the band-gap closing conditions based only on wave-function samples of several typical energy eigenstates (zero-energy edge modes), and are thus able to identify the phase-transition point without knowledge of the Hamiltonian.



Chapter 2: Efficiency of neural-network state representations of one-dimensional quantum spin systems

2.1 Introduction

In recent years, considerable activity has been devoted to the investigation of neural-network quantum states in the context of quantum many-body physics [5, 6, 7, 8, 9, 28, 29, 30, 31]. The core idea is to postulate an ansatz for the wave function in terms of a neural network (NN) [6], which targets a low-dimensional manifold in the exponentially large Hilbert space for state approximation [32], and to apply ML algorithms to find a specific solution. The restricted Boltzmann machine (RBM) [6, 7, 8, 28, 29] is a bipartite stochastic construct that combines the concept of thermodynamic partition functions with that of classical artificial neural networks. The RBM usually serves as a building block for understanding and training deeper networks because of its relatively simple structure for inference and its power in parametric modeling as a universal approximator for discrete distributions [33]. It has achieved success in representing a wide range of quantum states, such as the low-lying eigenstates of quantum many-body-localized systems [6, 34], code words of a stabilizer code [30, 35, 36] and chiral topological states [8, 37, 38].

While RBMs have demonstrated their power in numerical simulations, there are strong motivations to investigate theoretically and quantitatively the expressibility and complexity of generic long-range RBMs. Here, long-range RBMs refer to RBMs with full connectivity between the visible (spin) and hidden layers, whose structure is usually characterized by dense networks, in contrast to the so-called short-range or sparse RBMs [7, 30, 31].

The first motivation is that the exact RBM representation of a generic target quantum state plausibly has a long-range form and may require at least exponentially many parameters. This argument is based on the empirical observation that most RBMs returned by ML algorithms after full training on specific quantum states have dense network structures, and that the magnitudes of the parameters characterizing the connectivity generally decay, but do not vanish, as the site separation between visible and hidden nodes increases. As more hidden nodes are added to the network, which corresponds to increasing computational resources, the RBM can capture correlations of increasingly higher order between spins [6].

Another piece of evidence is that, in the field of learning ground states of quantum systems, a succinct RBM form that exactly represents the ground states has been found for only very few Hamiltonian families, such as those with a stabilizer nature [30, 31]. The difficulty of finding exact RBM solutions originates from the difficulty of reducing the number of nonlinear equations for the parameters from exponential to polynomial, or even linear, in the system size, and then solving them. Because of the high nonlinearity of the RBM form, an exact ground-state solution may need at least exponentially many hidden nodes.

The second motivation is that the RBM solutions provided by relevant ML algorithms are approximations of the exact target states, and these approximations often feature the long-range form and the fast parameter decay described above, even when the exact RBM representations of the target states are unknown, less efficient, or lack such features. One example comes from learning chiral topological states [8], where numerically obtained approximate RBM representations of Jastrow wave functions achieve high accuracy with lower complexity. The exact RBM constructions of these states can be derived in principle, but they scale polynomially with a comparatively larger exponent and are thus less economical.

Moreover, we find that the RBM construction for a target state, such as the ground state of a given Hamiltonian, is in some cases not unique even with a fixed global phase, that is, even after eliminating the degree of freedom associated with a global gauge transformation. A simple example is a spin-1/2 system whose ground state is the state with all spins up. Therefore, if the algorithm for obtaining an RBM approximator has a stochastic nature [6], the RBM form to which the solution converges may depend on the initial conditions and hyperparameter settings.
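The all-spins-up example can be checked directly with the product form of the RBM amplitude, Eq. (2.2). The Python/NumPy sketch below assumes one hidden node per spin, coupled only to that spin; the two parameter sets (b, W) = (iπ/4, −iπ/4) and (−3iπ/4, −iπ/4) are our own illustrative choices, not constructions taken from the cited references. For even L both give amplitude 1 on the all-up configuration and (numerically) 0 elsewhere, so they describe the same state with the same global phase.

```python
import itertools
import numpy as np

def rbm_amplitudes(a, b, W):
    """Amplitudes prod_j exp(a_j s_j) * prod_k cosh(b_k + sum_j s_j W[j, k])
    for all 2^L configurations s in {+1, -1}^L (row 0 is all spins up)."""
    L = W.shape[0]
    configs = np.array(list(itertools.product([1, -1], repeat=L)))
    theta = b[None, :] + configs @ W          # shape (2^L, Nh)
    return np.exp(configs @ a) * np.prod(np.cosh(theta), axis=1)

L = 4                                          # even L, so (-1)^L = 1 below
a = np.zeros(L, dtype=complex)

# Illustrative construction 1: cosh(i*pi/4 - i*pi/4*s) = 1 if s = +1, 0 if s = -1.
b1 = np.full(L, 1j * np.pi / 4)
W1 = np.diag(np.full(L, -1j * np.pi / 4))

# Illustrative construction 2: cosh(-3i*pi/4 - i*pi/4*s) = -1 if s = +1, 0 if s = -1.
b2 = np.full(L, -3j * np.pi / 4)
W2 = np.diag(np.full(L, -1j * np.pi / 4))

psi1 = rbm_amplitudes(a, b1, W1)
psi2 = rbm_amplitudes(a, b2, W2)
print(np.allclose(psi1, psi2))                # same amplitudes, same global phase
print(np.round(psi1.real, 12))                # 1 for all-up, 0 otherwise
```

The two parameter sets differ by a shift of b by −iπ, which flips the sign of every cosh factor; for even L these signs cancel, which is one simple mechanism behind non-uniqueness at fixed global phase.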

As shown in many numerical works [6, 8], the RBM approximators in a wide range of cases exhibit the long-range and fast-decay features and resemble a finite truncation of an RBM with infinitely many parameters, in which the parameters of sufficiently small magnitude have been removed. Magnitude-based pruning can also be applied to RBMs with a finite number of parameters, and its effects may be phase and system dependent [53, 54]. Thus, it is worthwhile to generalize the RBM wave-function ansatz to include infinitely many hidden nodes, to quantitatively analyze the effects of finite truncations, and to mathematically justify the faithfulness of these truncated long-range RBM approximators. The core idea of viewing a conventional long-range RBM as a truncation of an “infinitely large” RBM is analogous to analyzing a finite truncation of the Taylor series of an analytic function at a specific point, where the infinite sequence of Taylor polynomials converges to the exact value.

The third motivation for studying long-range RBMs stems from the central goal of exploring effective compressed state representations. This includes understanding the natural measure of complexity for quantum states described by a specific representation [7] and understanding how the global information and physical properties of the states are encoded in that description, in the same way that researchers interpret tensor network states [32]. Additionally, some works have studied the relationship between RBMs and other state representations, such as string-bond states [8], correlator product states [55] and tensor network states [38, 56]. In particular, the transformation from RBMs to matrix product states (MPSs) [56] provides, in principle, one way to analyze the spatial complexity of RBMs. However, such transformations may introduce redundancy when the RBMs are not short-range or sparse, since they use only the structural information of the network. It may also be inconvenient to analyze the effects of truncations applied to RBMs through an extra intermediate transformation to another representation.

Furthermore, the RBM is an architecture that can naturally describe quantum states in a nonlocal manner [7], unlike tensor network states, which aim to encode the global wave-function information into local tensor operators [32]. Therefore, it is strongly desirable to find a way to analyze the spatial complexity, and to extract information about the physical properties, of long-range RBMs directly.

In this work, we analyze the efficiency of long-range RBM state representations for 1D quantum spin systems. Our procedure is as follows:

1. In Sec. 2.2.1, we generalize the RBM wave-function ansatz to an infinitely-many-hidden-node regime and define a subset of generic RBM states, the long-range-fast-decay (LRFD) RBM states, whose parameter conditions constrain the nonlocal interactions between spins (visible nodes) and virtual particles (hidden nodes).

2. In Sec. 2.2.2, we derive an upper bound on truncation errors associated with two measures of state differences for the sequence of truncated LRFD RBM states. One measure is the squared l2-norm of the state-vector difference and the other is a Hermitian-operator-based expectation-value difference.

3. In Sec. 2.2.3, we identify how the spatial complexity of LRFD RBMs in state approximation depends on the decay rates specified in the nonlocal interaction pattern.

4. In Sec. 2.3, we provide numerical evidence supporting the conjecture that the ground states of a wide range of 1D quantum spin systems, including some critical systems with logarithmic entanglement entropy, can be approximated by LRFD RBMs whose spatial complexity scales at most polynomially in both the system size and the inverse of the approximation error.

5. In Sec. 2.3, we also establish relations between several typical state manifolds, which manifest the importance of the concept of LRFD RBMs in efficiency analysis for state-representation theory.

Our results offer evidence for the utility of RBMs in cases where other state parameterizations, such as MPSs, become less efficient. Our work proposes a paradigm for complexity analysis of general long-range RBMs, rather than being limited to short-range or sparse RBMs, and naturally yields a further classification of this manifold based on the complexity scaling.

We find that the nonlocal structure of LRFD RBMs can be characterized by two conditions. Each of these conditions is determined by bounds associated with two degrees of freedom, defined within a framework of levels depicted in Fig. 2.1(a). The first degree of freedom is a single-level decay factor that resembles a localized orbital and encodes information about correlations between spins (Sec. 2.2.4). The second is a level-decay factor, which strongly influences the complexity of the RBMs (Sec. 2.2.3).

This paradigm and our characterization of the nonlocal structures may promote the understanding of the natural measure of complexity for quantum many-body states described by RBMs, and may be generalizable to higher-dimensional systems and to deep neural-network quantum states.

2.2 The restricted Boltzmann machine as a wave-function ansatz

We use the RBM as a wave-function ansatz for 1D quantum many-body spin-1/2 systems [5, 6, 7]. The RBM usually serves as a building block for understanding and training deeper networks because of its relatively simple structure for inference and its power in parametric modeling as a universal approximator for discrete distributions [33].

As basic constructs of deep NNs, RBMs have two layers. The first (visible) layer represents a spin configuration σ⃗ in the usual way. Here, the vector σ⃗ = (σ1, . . . , σL) represents a system of L spins with σj = ±1 for j = 1, . . . , L. The second layer is a hidden layer. It is composed of Nh nodes, denoted by a vector h⃗ = (h1, . . . , hNh) with hk = ±1 for k = 1, . . . , Nh. The hk's are introduced as auxiliary particles in the probability model; they play roles similar to those of virtual particles in the valence-bond picture for MPSs [32, 57].

[Figure 2.1: (a) network diagram: input |σ⃗⟩ on visible nodes σ1, σ2, σ3, . . . , σL; hidden nodes h1, . . . , hL at level k̃ = 1, hL+1, . . . , h2L at level k̃ = 2, and so on up to level k̃max = Nh/L; output ψ(σ⃗). (b) 3D plot of η(j, k̃, L) with an inset.]

Figure 2.1: (a) Network structure of RBMs as a wave-function ansatz for 1D quantum spin systems. Long-range RBMs usually imply full connectivity between the visible and hidden layers. The hidden layer is divided into Nh/L levels, each containing L hidden nodes. (b) Importance measure η(j, k̃, L) for an LRFD RBM with translational symmetry. The RBM is constructed as Eqs. (2.24)–(2.26) show, where $\lambda(\tilde k) = \tilde k^{-\alpha_P}$, $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for r ≠ 0, µ(0) = δQ = 0.5, αP = 0.75, αQ = 1.5, cw = 1 + i, cb = 0, a0 = 0 and L = 11. The inset plots the decay of the maximum of η(j, k̃, L) over all j at each level with increasing k̃ on a log-log scale. The linearity of the curve reveals a power-law decay of the “ridge” (red circles) of the 3D structure.

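Given a specific spin configuration σ⃗, the RBM outputs the corresponding wave-function amplitude

$$\psi(\vec\sigma) = 2^{-N_h}\sum_{\{\vec h:\,h_k=\pm1\}}\exp\Big(\sum_{j=1}^{L}a_j\sigma_j+\sum_{k=1}^{N_h}b_kh_k+\sum_{\substack{1\le j\le L,\ 1\le k\le N_h\\ j,k\in\mathbb{N}}}W_{j,k}\sigma_jh_k\Big) \tag{2.1}$$

$$= \prod_{j=1}^{L}e^{a_j\sigma_j}\prod_{k=1}^{N_h}\cosh\Big(b_k+\sum_{j=1}^{L}\sigma_jW_{j,k}\Big). \tag{2.2}$$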

Here, aj and bk are the bias parameters for the j-th spin and the k-th hidden node, respectively, Wj,k is a weight parameter describing the interlayer interaction between the j-th spin and the k-th hidden node, and N denotes the set of all natural numbers. The aj, bk and Wj,k are complex numbers. All such amplitudes defined on the computational basis yield a quantum state vector $|\Psi\rangle = \sum_{\vec\sigma}\psi(\vec\sigma)|\vec\sigma\rangle$, where the summation is over all $2^L$ spin configurations. Note that we adopt the RBM form with a factor of $2^{-N_h}$. Because summing $\exp(x_kh_k)$ over $h_k = \pm1$ gives $2\cosh(x_k)$, this prefactor makes Eq. (2.2) hold exactly, and a hidden node whose parameters bk and Wj,k are all zero contributes a factor cosh(0) = 1. This choice therefore allows us to use infinitely many hidden nodes hk, as long as bk and Wj,k decay sufficiently fast to ensure the convergence of ψ(σ⃗) as Nh → ∞ for a fixed system size L. In other words, adding hidden nodes whose associated parameters (bk and Wj,k) are zero does not change the value of the wave function. This choice will facilitate the asymptotic analysis below.
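As a sanity check of Eqs. (2.1) and (2.2) and of the role of the $2^{-N_h}$ prefactor, the following Python/NumPy sketch (with arbitrary random complex parameters of our own choosing) compares the explicit sum over hidden configurations with the product form, and confirms that appending hidden nodes with zero parameters leaves the amplitude unchanged. The brute-force sum over h costs $2^{N_h}$ terms and is only meant for tiny illustrative sizes.

```python
import itertools
import numpy as np

def psi_hidden_sum(sigma, a, b, W):
    """Eq. (2.1): 2^{-Nh} times the explicit sum over all h in {+1, -1}^Nh."""
    Nh = W.shape[1]
    total = 0j
    for h in itertools.product([1, -1], repeat=Nh):
        h = np.array(h)
        total += np.exp(a @ sigma + b @ h + sigma @ W @ h)
    return total / 2**Nh

def psi_product(sigma, a, b, W):
    """Eq. (2.2): prod_j exp(a_j sigma_j) * prod_k cosh(b_k + sum_j sigma_j W_{j,k})."""
    return np.exp(a @ sigma) * np.prod(np.cosh(b + sigma @ W))

rng = np.random.default_rng(0)
L, Nh = 4, 3
a = 0.3 * (rng.normal(size=L) + 1j * rng.normal(size=L))
b = 0.3 * (rng.normal(size=Nh) + 1j * rng.normal(size=Nh))
W = 0.3 * (rng.normal(size=(L, Nh)) + 1j * rng.normal(size=(L, Nh)))
sigma = np.array([1, -1, -1, 1])

# Append two hidden nodes whose parameters are all zero.
b_pad = np.concatenate([b, np.zeros(2)])
W_pad = np.hstack([W, np.zeros((L, 2))])

print(psi_hidden_sum(sigma, a, b, W), psi_product(sigma, a, b, W))
print(psi_product(sigma, a, b_pad, W_pad))   # unchanged: each new node gives cosh(0) = 1
```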



As mentioned in Sec. 2.1, the RBMs obtained by relevant ML algorithms to approximate target states often feature a long-range form and a fast parameter decay. As more hidden nodes are added to the network, the RBM can capture higher-order correlations between spins [6], leading to higher approximation accuracy. The parameter decay manifests itself as the decay of the weight parameters Wj,k with increasing index separation |j − k|, as well as the decay of bk with increasing k.

In this work, we assume Nh to be an integer multiple of L, which facilitates the scaling analysis, especially for translationally invariant systems. When Nh is not an integer multiple of L, we can simply fill the last fragment with hidden nodes associated with zero-valued parameters without changing the wave-function values. We divide the hidden layer into multiple levels, each of which contains L hidden nodes (Fig. 2.1(a)). Thus, there are Nh/L levels in total; the ratio Nh/L is called the hidden-unit density in some references [6]. We will show that, for general RBMs, after a suitable reordering of all hidden nodes, the hidden nodes at the same level capture correlations of the same order between spins. This point will be further clarified when we use the RBM form with translational symmetry to represent the ground states of 1D translationally invariant quantum systems, as shown below [6].

One example of a quantum state that can be exactly represented by short-range RBMs [30, 31] is the 1D symmetry-protected topological (SPT) cluster state. The Hamiltonian of the SPT cluster system is defined on a 1D L-site lattice with periodic boundary conditions as $\hat H_{\mathrm{cluster}} = -\sum_{j=1}^{L}\hat\sigma^z_{j-1}\hat\sigma^x_j\hat\sigma^z_{j+1}$, where $\hat\sigma^x$ and $\hat\sigma^z$ are Pauli matrices. A conventional r0-range RBM is defined as an RBM satisfying Wj,k = 0 for any |j − k| > r0. A short-range RBM usually refers to an r0-range RBM with r0 being a small constant independent of the system size L. It was shown in Refs. [30, 31] that the ground state of Ĥcluster can be exactly represented by a 1-range RBM with L hidden nodes defined as

$$\begin{aligned} a_j &= 0 \quad (\text{for any } j\in\{1,2,\ldots,L\}),\\ b_k &= i\pi/4,\quad W_{k-1,k} = i\pi/2,\quad W_{k,k} = 3i\pi/4,\quad W_{k+1,k} = i\pi/4 \quad (\text{for any } k\in\{1,2,\ldots,L\}),\\ W_{j,k} &= 0 \quad (\text{for } |j-k|>1), \end{aligned} \tag{2.3}$$

by using the stabilizer nature of the system to reduce the number of equation constraints for the parameters from exponential to linear in L. In our language of levels, this RBM has just one level, and its weight parameters at this single level have a support of very short length, which is a manifestation of its quantum entanglement satisfying an area law. Moreover, the translational symmetry of the system is inherited by the RBM form. The parameter pattern of this RBM is also translationally symmetric, which means that its parameters for different hidden nodes can be generated by the action of a translational-symmetry transformation operator on those for a single hidden node [6].
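The claim that Eq. (2.3) exactly represents the cluster ground state can be checked by brute force for a small chain. The Python/NumPy sketch below builds Ĥcluster as a dense matrix for L = 6, evaluates the RBM amplitudes of Eq. (2.3) through the product form (2.2), and verifies Ĥcluster|Ψ⟩ = −L|Ψ⟩, the lowest possible eigenvalue since each stabilizer term has eigenvalues ±1. We assume periodic indices for W_{k±1,k}, 0-based site labels, and σ = +1 as the +1 eigenstate of σ̂z; the dense construction scales as 4^L and is only meant for small L.

```python
import itertools
import numpy as np

def rbm_state(a, b, W):
    """Amplitudes of Eq. (2.2); spin 0 is the leftmost tensor factor and
    sigma = +1 corresponds to the Z eigenstate |0>."""
    L = W.shape[0]
    configs = np.array(list(itertools.product([1, -1], repeat=L)))
    theta = b[None, :] + configs @ W
    return np.exp(configs @ a) * np.prod(np.cosh(theta), axis=1)

def site_op(op, j, L):
    """Embed a single-site 2x2 operator at site j of an L-site chain."""
    mats = [np.eye(2)] * L
    mats[j] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

L = 6
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Cluster Hamiltonian with periodic boundary conditions.
H = np.zeros((2**L, 2**L), dtype=complex)
for j in range(L):
    H -= site_op(Z, (j - 1) % L, L) @ site_op(X, j, L) @ site_op(Z, (j + 1) % L, L)

# RBM parameters of Eq. (2.3): hidden node k couples to spins k-1, k, k+1.
a = np.zeros(L, dtype=complex)
b = np.full(L, 1j * np.pi / 4)
W = np.zeros((L, L), dtype=complex)
for k in range(L):
    W[(k - 1) % L, k] = 1j * np.pi / 2
    W[k, k] = 3j * np.pi / 4
    W[(k + 1) % L, k] = 1j * np.pi / 4

psi = rbm_state(a, b, W)
print(np.allclose(H @ psi, -L * psi))  # psi is an eigenvector with energy -L
```

With these parameters every cosh factor has modulus √2/2, so the RBM state happens to be normalized for any L.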

Inspired by the extensibility of the system of equations (2.3) with growing system size, and considering the need to capture higher-order correlations between spins [6] and stronger quantum entanglement between subsystem blocks [31], we expect that the RBM representation of general quantum states has multiple, possibly infinitely many, levels, and that the length of the support of the weight parameters at each level may increase from a small constant to the maximum length L. This motivates us to analyze generic long-range RBMs with properly specified nonlocal interactions between spins and hidden nodes (virtual particles).



2.2.1 Long-range-fast-decay RBMs

We now discuss aspects of the nonlocal structure of LRFD RBMs that were summarized at the end of Sec. 2.1. This leads to specific definitions of the two conditions mentioned there.

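We begin by generalizing the RBM wave-function ansatz to an infinitely-many-hidden-node regime. An RBM state $|\Psi^{(L,\infty)}\rangle$ with infinitely many hidden nodes and a system size L can be defined as

$$|\Psi^{(L,\infty)}\rangle = \sum_{\vec\sigma}\psi^{(L,\infty)}(\vec\sigma)|\vec\sigma\rangle, \tag{2.4}$$

where

$$\psi^{(L,\infty)}(\vec\sigma) = \prod_{j=1}^{L}e^{a^{(L)}_j\sigma_j}\prod_{k=1}^{\infty}\cosh\Big(b^{(L)}_k+\sum_{j=1}^{L}\sigma_jW^{(L)}_{j,k}\Big). \tag{2.5}$$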

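Its corresponding truncated-RBM sequence is defined as $\{|\Psi^{(L,N_h)}\rangle\}$, where

$$|\Psi^{(L,N_h)}\rangle = \sum_{\vec\sigma}\psi^{(L,N_h)}(\vec\sigma)|\vec\sigma\rangle \tag{2.6}$$

and

$$\psi^{(L,N_h)}(\vec\sigma) = \prod_{j=1}^{L}e^{a^{(L)}_j\sigma_j}\prod_{k=1}^{N_h}\cosh\Big(b^{(L)}_k+\sum_{j=1}^{L}\sigma_jW^{(L)}_{j,k}\Big) \tag{2.7}$$

is constructed by removing the hyperbolic cosine terms with k ≥ Nh + 1 from $\psi^{(L,\infty)}(\vec\sigma)$.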

Then, we define a subset of generic RBM states with infinitely many hidden nodes—

long-range-fast-decay (LRFD) RBM states—as the RBMs whose parameters satisfy the

following two conditions.

Condition 1 (boundedness of Wj,k). There exist an L-independent integer k̃s ∈ N and three nonnegative, monotonically decreasing real functions λR(k̃), λI(k̃) and µ(r) such that, after a reordering of all hidden nodes, for all k > k̃sL,

$$|\mathrm{Re}(W^{(L)}_{j,k})| \le \lambda_R(\tilde k)\,\mu(|j-j_c|_{\mathrm{circ}}), \tag{2.8}$$

$$|\mathrm{Im}(W^{(L)}_{j,k})| \le \lambda_I(\tilde k)\,\mu(|j-j_c|_{\mathrm{circ}}), \tag{2.9}$$

where k̃ ∈ {1, 2, . . . , Nh/L} designates the level index; jc, the center spin for the k-th hidden node, denotes the site index of the spin with which the interaction of the k-th hidden node reaches its maximum among all j ∈ {1, 2, . . . , L}; |m|circ = min{m, L − m} in accordance with the periodic boundary conditions; and r ∈ {0, 1, . . . , (L − 1)/2} denotes the distance between j and jc, where we assume L is odd without affecting the validity of the following asymptotic analysis. The functions λR(k̃), λI(k̃) and µ(r) satisfy the condition that there exist finite L-independent nonnegative constants P0 and µ0 such that

$$\sum_{\tilde k=\tilde k_s+1}^{\infty}\Big(\lambda_R^2(\tilde k)+\beta_1^2\lambda_I^2(\tilde k)\Big) = P_0 < \infty, \tag{2.10}$$

$$\mu(r) \le \mu_0 < \infty \quad (\text{for all } r\ge 0), \tag{2.11}$$

where $\beta_1 = 3\sqrt{2\ln 2}/\pi$ arises in the convergence proof given in Appendix A.1.
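As a worked illustration of condition (2.10) (our own elementary calculation, not taken from Appendix A.1), assume λR = λI = λ, as is done later in Lemma 3, and consider the two level-decay profiles used in Fig. 2.2: a power law λ(k̃) = k̃^{−αP}, and an exponential decay λ(k̃) = λ0 δP^{−(k̃−1)} with δP > 1 (λ0 is our notation; λ0 = 0.2 and δP = 1.5 in Fig. 2.2(a)). The sum in Eq. (2.10) then gives

$$P_0 = (1+\beta_1^2)\sum_{\tilde k=\tilde k_s+1}^{\infty}\tilde k^{-2\alpha_P} < \infty \quad\Longleftrightarrow\quad \alpha_P > \tfrac{1}{2},$$

$$P_0 = (1+\beta_1^2)\,\lambda_0^2\sum_{\tilde k=\tilde k_s+1}^{\infty}\delta_P^{-2(\tilde k-1)} = (1+\beta_1^2)\,\frac{\lambda_0^2\,\delta_P^{-2\tilde k_s}}{1-\delta_P^{-2}},$$

with $\beta_1^2 = 18\ln 2/\pi^2 \approx 1.264$. Both power-law exponents used in this chapter (αP = 0.75 in Fig. 2.1(b) and αP = 3 in Fig. 2.2(b)) therefore satisfy Eq. (2.10), as does any exponential decay with δP > 1.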

We interpret each new variable as follows. k̃ = k̃(k) and jc = jc(k) are both functions of k, and the correspondence between the pair (k̃, jc) and k is a bijective map. That is, every hidden node is associated with a unique pair and can thus be uniquely positioned in the RBM network after the reordering (Fig. 2.1(a)). The hidden nodes capturing correlations of the same order between spins are grouped at the same level, so that the new indices (k̃, jc) of all hidden nodes manifest the order of correlations. This characterization also makes the symmetry manifest for quantum states with translational symmetry. The reordering step is needed because ML algorithms with a stochastic nature are often unable to automatically group the hidden nodes by level, and their site positions usually appear random.
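For the translation-invariant construction used later in this section (Eqs. (2.24)–(2.26)), the reordered labels are k̃ = ⌈k/L⌉ and jc = k − (k̃ − 1)L. The short Python sketch below assumes that convention and makes the bijection k ↔ (k̃, jc) explicit; the helper names are hypothetical, and a general trained RBM would first need its hidden nodes sorted by some criterion that we do not specify here.

```python
def level_and_center(k, L):
    """Map a 1-based hidden-node index k to (level k_tilde, center spin j_c)."""
    k_tilde = (k + L - 1) // L          # integer ceiling of k / L
    j_c = k - (k_tilde - 1) * L
    return k_tilde, j_c

def node_index(k_tilde, j_c, L):
    """Inverse map (k_tilde, j_c) -> k."""
    return (k_tilde - 1) * L + j_c

L, Nh = 5, 15
pairs = [level_and_center(k, L) for k in range(1, Nh + 1)]
print(pairs[:6])   # [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1)]
print(all(node_index(kt, jc, L) == k for k, (kt, jc) in enumerate(pairs, start=1)))  # True
```

Integer arithmetic is used for the ceiling so that the map stays exact for arbitrarily large k.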

Condition 2 (boundedness of bk). After the same reordering of all hidden nodes that is described in Condition 1, for all k > k̃sL,

$$|\mathrm{Re}(b^{(L)}_k)| \le \lambda_R(\tilde k)\,\mu(0), \tag{2.12}$$

$$|\mathrm{Im}(b^{(L)}_k)| \le \lambda_I(\tilde k)\,\mu(0). \tag{2.13}$$


The definition of LRFD RBMs should be understood from the point of view of state manifolds [32, 58]. A state manifold for quantum many-body states usually refers to the subset of the whole Hilbert space formed by a parameterized wave-function family [58], and is thus a set containing specific types of quantum states. The manifold of LRFD RBMs can then be defined as the set formed by all parameterized wave functions, each of which belongs to a quantum-state sequence associated with a varying system size and satisfying Conditions 1 and 2 above. An LRFD-RBM state refers to an element of this manifold. This definition is in the same spirit as the definition of MPSs with different scaling laws [32, 57].

Condition 1 gives an upper bound on the magnitude of the RBM weight parameters and provides a description of the nonlocal interaction between spins and hidden nodes (virtual particles). It requires that $|\mathrm{Re}(W^{(L)}_{j,k})|$ and $|\mathrm{Im}(W^{(L)}_{j,k})|$ be upper bounded by the products λR(k̃)µ(r) and λI(k̃)µ(r), respectively. The monotonically decreasing functions λR(k̃) and λI(k̃) can be regarded as level-decay factors, while µ(r) describes the decay due to the increasing distance between the spin-site index j and the site index jc(k) of the center spin of the k-th hidden node. The function µ(r) has a localization feature, reflected by its monotonic decrease with increasing r, and resembles a single-mode localized orbital in the physics of periodic potentials, such as a Wannier mode [59]. This description therefore captures the parameter decays induced by both the level increase and the growth of the system size, providing two degrees of freedom for characterizing the nonlocal interaction pattern. The real and imaginary parts are treated separately because they enter the RBM wave-function form inequivalently, as shown in Appendix A.1.


Condition 2 implies that the contribution of the bk-related terms can be upper bounded by the largest Wj,k-related terms at each level, so that the weight parameters Wj,k play a dominant role in the asymptotic analysis (Appendix A.1). Since there is often freedom in choosing the value of µ(0), Condition 2 can be satisfied for a wide range of RBM states.

Conditions 1 and 2 are proposed to ensure the convergence of the state vector (Eq. (2.5)) and to provide a clear quantification of the rate of parameter decay, on the basis of which a complexity analysis can be conducted. A rigorous proof of the convergence of the state vector when Conditions 1 and 2 are satisfied is given in Appendix A.1. This proof is important not only because it ensures that the generalization of RBMs to an infinitely-many-hidden-node regime makes sense, by defining such RBMs as limits of infinite sequences, but also because it introduces the key mathematical tricks and concepts needed to analyze the effects of truncations.

The core idea of the proof is to show that the sequence $\{\psi^{(L,nL)}(\vec\sigma): n\in\mathbb{N}\}$ is a Cauchy sequence in the field of complex numbers $\mathbb{C}$ [60]. This is motivated by the fact that, when $b^{(L)}_k$ and $W^{(L)}_{j,k}$ decay sufficiently fast, the complex-valued ratio $\psi^{(L,(n+m)L)}(\vec\sigma)/\psi^{(L,nL)}(\vec\sigma)$ quickly falls into the neighborhood of the point z = 1 in the complex plane as n increases. We therefore derive upper and lower bounds on the ratio's modulus $|\psi^{(L,(n+m)L)}(\vec\sigma)/\psi^{(L,nL)}(\vec\sigma)|$ that converge to 1, and an upper bound on the magnitude of its argument $|\arg(\psi^{(L,(n+m)L)}(\vec\sigma)/\psi^{(L,nL)}(\vec\sigma))|$ that converges to 0 as n increases. We then show that the corresponding magnitude sequence $\{|\psi^{(L,nL)}(\vec\sigma)|\}$ and argument sequence $\{\arg(\psi^{(L,nL)}(\vec\sigma))\}$ are Cauchy sequences in the field of real numbers $\mathbb{R}$, so that $\{\psi^{(L,nL)}(\vec\sigma)\}$ is a Cauchy sequence in $\mathbb{C}$.
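The mechanism behind this proof can be illustrated numerically. The Python/NumPy sketch below builds a translation-invariant LRFD RBM of the form of Eqs. (2.24)–(2.26) (with a0 = 0 and the power-law parameters quoted for Fig. 2.2(b), our choice for illustration), keeps the first n levels, and prints the largest deviation |ψ(L,(n+m)L)/ψ(L,nL) − 1| over all 2^L configurations. The deviation shrinks as n grows, consistent with the ratio collapsing onto z = 1; this is an illustration for one small L, not a substitute for the proof. The helper lrfd_params is hypothetical.

```python
import itertools
import numpy as np

def lrfd_params(L, n_levels, alpha_p=3.0, alpha_q=3.0, delta_q=0.1,
                c_w=1 + 1j, c_b=1 + 1j):
    """Parameters of the form of Eqs. (2.24)-(2.25) with lambda(kt) = kt^(-alpha_p),
    mu(0) = delta_q and mu(r) = delta_q * r^(-alpha_q) / 2 for r != 0."""
    j = np.arange(1, L + 1)
    W = np.zeros((L, n_levels * L), dtype=complex)
    b = np.zeros(n_levels * L, dtype=complex)
    for k in range(1, n_levels * L + 1):
        kt = (k + L - 1) // L                    # level index, ceil(k / L)
        jc = k - (kt - 1) * L                    # center spin of hidden node k
        d = np.abs(j - jc)
        r = np.minimum(d, L - d)                 # circular distance |j - jc|_circ
        mu = np.where(r == 0, delta_q, 0.5 * delta_q * np.maximum(r, 1.0) ** (-alpha_q))
        lam = kt ** (-alpha_p)
        W[:, k - 1] = c_w * lam * mu
        b[k - 1] = c_b * lam * delta_q
    return b, W

L, n_max, m = 7, 12, 4
b, W = lrfd_params(L, n_max)
configs = np.array(list(itertools.product([1, -1], repeat=L)))
factors = np.cosh(b[None, :] + configs @ W)      # cosh factor of every hidden node (a0 = 0)

def psi_truncated(n):
    """psi^{(L, nL)} for all 2^L configurations: keep the first n levels only."""
    return np.prod(factors[:, : n * L], axis=1)

for n in [1, 2, 4, 8]:
    ratio = psi_truncated(n + m) / psi_truncated(n)
    print(n, np.max(np.abs(ratio - 1)))
```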

[Figure 2.2: panels (a) to (d); plotted data not recoverable from the text extraction, see caption below.]

Figure 2.2: (a)(b) Comparison between the exact and estimated truncation errors ε(L,Nh) as a function of Nh with fixed L for 1D SPT cluster states with a perturbation part. E Ψ: exact values, first-type truncation errors; U Ψ: upper-bound-based estimation, first-type; E CZ: exact, second-type, $\hat B = \hat\sigma^z_1\hat\sigma^z_2$; E CX: exact, second-type, $\hat B = \hat\sigma^x_1\hat\sigma^x_2$; U CX: upper-bound-based estimation, second-type, $\hat B = \hat\sigma^x_1\hat\sigma^x_2$. The perturbation part is constructed as Eqs. (2.24)–(2.26) show, with $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for r ≠ 0, µ(0) = δQ = 0.1, αQ = 3, cw = cb = 1 + i, a0 = 0 and L = 11. (a) Exponential decay of λ(k̃) (vertical axis: log scale), with $\lambda(\tilde k) = 0.2\,\delta_P^{-(\tilde k-1)}$ and δP = 1.5. (b) Power-law decay of λ(k̃) (log-log scale), with $\lambda(\tilde k) = \tilde k^{-\alpha_P}$ and αP = 3. (c) Schematic interpretation of the variables used in the proofs. The inset in (c) shows the distribution of data points of the ratio $\psi^{(L,\infty)}(\vec\sigma)/\psi^{(L,N_h)}(\vec\sigma)$ with Nh = L, which corresponds to a truncation removing hidden nodes starting from the second level. The parameter setting is the same as that in (b). The data points are all localized in the neighborhood of z = 1 in the complex plane enclosed by the red solid curves in (c), as our analysis predicts. (d) Scaling of $N_h^*(L,\varepsilon_0)$ in L for two fixed values of the first-type truncation error ε0, with the same parameter setting as in (b). Red circles: $\varepsilon_0 = 10^{-7}$; blue squares: $\varepsilon_0 = 10^{-10}$. The inset in (d) shows the scaling of Nh estimated from our upper bounds with $\varepsilon_0 = 10^{-3}$. Magenta solid: using exact values of the upper bounds; brown dashed: using leading-order estimates. The two curves almost coincide.



2.2.2 Effects of wave-function truncation for fixed system sizes

We derive upper bounds on the truncation errors associated with two measures of state differences for the sequence of truncated LRFD RBM states. Let ε(L,Nh) denote a specific type of truncation error for using |Ψ(L,Nh)⟩ to approximate |Ψ(L,∞)⟩.

A natural measure of state differences is the square of the l2-norm [61] of the state-vector difference, $\||\tilde\Psi^{(L,\infty)}\rangle - |\tilde\Psi^{(L,N_h)}\rangle\|_2^2$, where the tilde denotes the corresponding state after normalization. Note that the RBM wave-function ansatz is not automatically normalized, and estimating the normalization factor ⟨Ψ|Ψ⟩ is important and often tricky, as shown in Appendix A.2. This measure of truncation errors is adopted in fundamental works on the faithfulness and efficiency of other wave-function ansätze, such as MPSs [62, 63], so it allows a direct comparison between the efficiencies of RBMs and other state representations.

A second measure of state differences is a Hermitian-operator-based expectation-value difference defined as

$$|\langle\hat B\rangle^{(L,\infty)} - \langle\hat B\rangle^{(L,N_h)}| = |\langle\tilde\Psi^{(L,\infty)}|\hat B|\tilde\Psi^{(L,\infty)}\rangle - \langle\tilde\Psi^{(L,N_h)}|\hat B|\tilde\Psi^{(L,N_h)}\rangle|. \tag{2.14}$$

Here, B̂ can be any Hermitian operator of the form $\hat B = \bigotimes_{j=1}^{L}\hat\sigma^{(m_j)}_j$, where ⊗ is the tensor product symbol, mj ∈ {0, 1, 2, 3}, $\hat\sigma^{(0)}_j = I_{2\times2}$ is the 2-by-2 identity matrix, and $\{\hat\sigma^{(1)}_j, \hat\sigma^{(2)}_j, \hat\sigma^{(3)}_j\}$ denote the Pauli matrices. We use this measure because $\{\hat\sigma^{(m)}_j: m = 0, 1, 2, 3\}$ is a complete basis set for operators on the local Hilbert space of the j-th spin, and a wide range of typical physical observables, such as spin correlations and the total energy, correspond to Hermitian operators of this type or to linear combinations of polynomially many such operators.
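Both measures are straightforward to evaluate for small systems whose full state vectors fit in memory. The Python/NumPy sketch below (helper names are ours) normalizes two amplitude vectors, computes the squared l2 distance, and evaluates Eq. (2.14) for a Pauli string B̂ specified by the indices mj. It assumes that site 1 is the leftmost tensor factor of the basis ordering; note also that, as written, the first measure depends on the relative global phase of the two vectors.

```python
import numpy as np
from functools import reduce

PAULI = [
    np.eye(2, dtype=complex),                          # sigma^(0) = identity
    np.array([[0, 1], [1, 0]], dtype=complex),         # sigma^(1) = X
    np.array([[0, -1j], [1j, 0]], dtype=complex),      # sigma^(2) = Y
    np.array([[1, 0], [0, -1]], dtype=complex),        # sigma^(3) = Z
]

def pauli_string(ms):
    """B = tensor product of sigma^(m_j) over the sites, m_j in {0, 1, 2, 3}."""
    return reduce(np.kron, [PAULI[m] for m in ms])

def normalize(v):
    return v / np.linalg.norm(v)

def first_type_error(psi_ref, psi_trunc):
    """Squared l2 distance between the normalized state vectors."""
    return np.linalg.norm(normalize(psi_ref) - normalize(psi_trunc)) ** 2

def second_type_error(psi_ref, psi_trunc, ms):
    """|<B>_ref - <B>_trunc| for B = pauli_string(ms), as in Eq. (2.14)."""
    B = pauli_string(ms)
    u, v = normalize(psi_ref), normalize(psi_trunc)
    return abs(np.vdot(u, B @ u) - np.vdot(v, B @ v))

ref = np.array([1, 0, 0, 0], dtype=complex)     # |up, up>
bell = np.array([1, 0, 0, 1], dtype=complex)    # unnormalized |up, up> + |down, down>
print(first_type_error(ref, bell))              # 2 - sqrt(2)
print(second_type_error(ref, bell, [1, 1]))     # |0 - 1| = 1 for B = X1 X2
```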

Then we can prove a lemma which provides upper bounds on truncation errors of the

above two types for the sequence of truncated LRFD RBM states.

Lemma 3 (upper bounds on truncation errors). For LRFD RBMs satisfying Conditions 1 and 2, after the same reordering of all hidden nodes described in Condition 1, for all Nh > k̃sL,

$$\||\tilde\Psi^{(L,\infty)}\rangle - |\tilde\Psi^{(L,N_h)}\rangle\|_2^2 \le F_1\big(LQ(L)P(N_h/L)\big), \tag{2.15}$$

$$|\langle\hat B\rangle^{(L,\infty)} - \langle\hat B\rangle^{(L,N_h)}| \le F_2\big(LQ(L)P(N_h/L)\big), \tag{2.16}$$

where

$$F_1(x) = 2 - 2\exp[-2(1+\beta_1^2)x]\cos(4\beta_2x) \tag{2.17}$$

$$= c_1x + O(x^2) \quad (\text{as } x\to0), \tag{2.18}$$

$$F_2(x) = \max\{|\exp(4x)-1|,\ |1-\exp(-4\beta_1^2x)|\} + \max\Big\{\big[\exp(8x)-2\exp(4x)\cos(8\beta_2x)+1\big]^{1/2},\ \big[\exp(-8\beta_1^2x)-2\exp(-4\beta_1^2x)\cos(8\beta_2x)+1\big]^{1/2}\Big\} \tag{2.19}$$

$$= c_2x + O(x^{3/2}) \quad (\text{as } x\to0), \tag{2.20}$$

$$P(m) = \sum_{\tilde k=m+1}^{\infty}\lambda^2(\tilde k) \quad (m\ge\tilde k_s,\ m\in\mathbb{N}), \tag{2.21}$$

$$Q(L) = \Big(\sum_{r=0}^{(L-1)/2}\mu(r)\Big)^2, \tag{2.22}$$

the relevant constants are $\beta_2 = 3\sqrt{3}/\pi$, $c_1 = 4(1+\beta_1^2)$ and $c_2 = 4\beta_1^2 + 4(\beta_1^4 + 4\beta_2^2)^{1/2}$, and we have assumed for simplicity that λR(k̃) = λI(k̃) = λ(k̃), which holds throughout the following discussion.
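The small-x expansions (2.18) and (2.20) can be checked numerically. The Python/NumPy sketch below evaluates F1 and F2 exactly as written in Eqs. (2.17) and (2.19), with β1 = 3√(2 ln 2)/π and β2 = 3√3/π, and divides them by c1x and c2x; both ratios approach 1 as x → 0. Since β1² ≈ 1.26 > 1, the leading coefficient c2 comes from the second branch of each max.

```python
import numpy as np

BETA1 = 3 * np.sqrt(2 * np.log(2)) / np.pi
BETA2 = 3 * np.sqrt(3) / np.pi
C1 = 4 * (1 + BETA1**2)
C2 = 4 * BETA1**2 + 4 * np.sqrt(BETA1**4 + 4 * BETA2**2)

def F1(x):
    """Eq. (2.17)."""
    return 2 - 2 * np.exp(-2 * (1 + BETA1**2) * x) * np.cos(4 * BETA2 * x)

def F2(x):
    """Eq. (2.19)."""
    first = max(abs(np.exp(4 * x) - 1), abs(1 - np.exp(-4 * BETA1**2 * x)))
    second = max(
        np.sqrt(np.exp(8 * x) - 2 * np.exp(4 * x) * np.cos(8 * BETA2 * x) + 1),
        np.sqrt(np.exp(-8 * BETA1**2 * x)
                - 2 * np.exp(-4 * BETA1**2 * x) * np.cos(8 * BETA2 * x) + 1),
    )
    return first + second

for x in [1e-2, 1e-3, 1e-4, 1e-5]:
    print(x, F1(x) / (C1 * x), F2(x) / (C2 * x))   # both ratios tend to 1 as x -> 0
```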

The proof, given in Appendix A.2, uses arguments similar to those in the convergence proof for LRFD RBMs. Based on the intuition that the ratio $\psi^{(L,\infty)}(\vec\sigma)/\psi^{(L,N_h)}(\vec\sigma)$ quickly converges to 1 with increasing Nh, we derive an upper bound $\sqrt{R_1}$ and a lower bound $\sqrt{R_2}$ on the ratio's modulus $|\psi^{(L,\infty)}(\vec\sigma)/\psi^{(L,N_h)}(\vec\sigma)|$, and an upper bound Θ on the magnitude of its argument $|\arg(\psi^{(L,\infty)}(\vec\sigma)/\psi^{(L,N_h)}(\vec\sigma))|$. Both types of truncation errors can be upper bounded using these three quantities, and the two upper bounds can finally be expressed as functions (F1(x) and F2(x)) of LQ(L)P(Nh/L), which decreases to zero with increasing Nh at fixed L. The idea of the proof is shown schematically in Fig. 2.2(c).

In terms of our description of the nonlocal interactions between spins and virtual particles and the language of levels, P(Nh/L) is the sum of the squared level-decay factors λ²(k̃) over all levels from k̃ = Nh/L + 1 to k̃ = ∞, while Q(L) represents the localized “orbital” at every single level and contributes a factor reflecting the pure influence of system-size growth, independent of levels. The two types of truncation errors correspond to two different forms of the function F(x), but both of them are analytic at the point x = 0.

We give the scaling of the truncation errors in Nh as follows. If Q(x) = O(q(x)) as x → ∞, P(x) = O(1/pd(x)) as x → ∞, and F(x) = O(f(x)) as x → 0, then

$$\varepsilon^{(L,N_h)} = O\Big(f\Big(\frac{L\,q(L)}{p_d(N_h/L)}\Big)\Big) \quad (\text{as } N_h\to\infty). \tag{2.23}$$
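To make Eq. (2.23) concrete, take the first-type error with F1(x) ≈ c1x, the power-law profile λ(k̃) = k̃^{−αP}, and µ(r) = ½δQ r^{−αQ} (r ≠ 0), µ(0) = δQ, with the values quoted for Fig. 2.2(b). Then P(m) ≈ m^{1−2αP}/(2αP − 1) for large m and Q(L) tends to a constant, so the smallest admissible Nh grows polynomially in L and 1/ε0. The Python/NumPy sketch below searches for that Nh by brute force using only this leading-order estimate, in the spirit of the inset of Fig. 2.2(d); the helper nh_star and the tail cutoff n_terms are our own assumptions, not the code behind the figure.

```python
import numpy as np

BETA1 = 3 * np.sqrt(2 * np.log(2)) / np.pi
C1 = 4 * (1 + BETA1**2)                 # leading coefficient of F1, Eq. (2.18)

def P(m, alpha_p=3.0, n_terms=100_000):
    """Tail sum of Eq. (2.21) for lambda(kt) = kt^(-alpha_p), cut off after n_terms."""
    kt = np.arange(m + 1, m + 1 + n_terms, dtype=float)
    return np.sum(kt ** (-2 * alpha_p))

def Q(L, delta_q=0.1, alpha_q=3.0):
    """Eq. (2.22) with mu(0) = delta_q and mu(r) = delta_q * r^(-alpha_q) / 2."""
    r = np.arange(1, (L - 1) // 2 + 1, dtype=float)
    return (delta_q + np.sum(0.5 * delta_q * r ** (-alpha_q))) ** 2

def nh_star(L, eps0):
    """Smallest Nh (a multiple of L) for which the leading-order first-type bound
    c1 * L * Q(L) * P(Nh / L) is at most eps0."""
    m = 1
    while C1 * L * Q(L) * P(m) > eps0:
        m += 1
    return m * L

for L in [11, 21, 41, 81]:
    print(L, nh_star(L, 1e-3), nh_star(L, 1e-7))
```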

Our construction of LRFD RBMs and the theoretical analysis of the truncation errors can be further clarified with numerical computations. We can construct LRFD RBMs with translational symmetry whose parameters exactly satisfy

$$W^{(L)}_{j,k} = c_w\lambda(\tilde k)\mu(|j-j_c|_{\mathrm{circ}}), \tag{2.24}$$

$$b^{(L)}_k = c_b\lambda(\tilde k)\mu(0), \tag{2.25}$$

$$a^{(L)}_j = a_0 \tag{2.26}$$

for any 1 ≤ j ≤ L, 1 ≤ k ≤ Nh, where a0, cw and cb are complex constants with |cb| ≤ |cw| to satisfy Condition 2, k̃ = k̃(k) = ⌈k/L⌉, jc = jc(k) = k − (k̃ − 1)L, Nh is an integer multiple of L, ⌈x⌉ denotes the ceiling function, and k̃s = 0 in this case. Such an RBM form can be directly transformed into the RBM form proposed in Ref. [6] to represent ground states of 1D translationally invariant systems for any finite Nh, but here we generalize it to an infinitely-many-hidden-node regime. Since the parameters for different hidden nodes can be generated by the action of a translational-symmetry transformation operator on those for a single hidden node, we need only focus on one representative hidden node per level. We therefore propose an importance measure η(j, k̃, L) for a set of edges, defined as

$$\eta(j,\tilde k,L) = \Big|\mathrm{Re}\big(W^{(L)}_{j,(L+1)/2+(\tilde k-1)L}\big)\Big|^2 + \beta_1^2\Big|\mathrm{Im}\big(W^{(L)}_{j,(L+1)/2+(\tilde k-1)L}\big)\Big|^2, \tag{2.27}$$

and present it as a function of the spin-site index j and the level index k̃. Its 3D structure reflects the decay of both λ(k̃) and µ(r), with the center of the “orbital” at every level localized around j = (L + 1)/2. Plotting the peak at every level as a function of the level number k̃ thus reveals the decay of λ(k̃). An example of such an LRFD RBM with a power-law decaying λ(k̃) is shown in Fig. 2.1(b).
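The importance measure can be tabulated directly from Eqs. (2.24) and (2.27). The Python/NumPy sketch below uses the parameter values quoted for Fig. 2.1(b) (the function name importance_measure is ours) and fits the slope of the ridge max_j η(j, k̃, L) on a log-log scale. With cw = 1 + i and real positive λ and µ, η = (1 + β1²)λ²(k̃)µ²(r), so the ridge decays as k̃^{−2αP}, consistent with the linear inset of Fig. 2.1(b).

```python
import numpy as np

BETA1 = 3 * np.sqrt(2 * np.log(2)) / np.pi

def importance_measure(L, n_levels, alpha_p=0.75, alpha_q=1.5, delta_q=0.5, c_w=1 + 1j):
    """eta(j, kt, L) of Eq. (2.27) for the representative hidden node
    k = (L+1)/2 + (kt-1)L of each level, with W from Eq. (2.24),
    lambda(kt) = kt^(-alpha_p), mu(0) = delta_q, mu(r) = delta_q * r^(-alpha_q) / 2.
    Returns an array of shape (L, n_levels) indexed by (j - 1, kt - 1)."""
    j = np.arange(1, L + 1)
    jc = (L + 1) // 2                             # center spin of the representative node
    d = np.abs(j - jc)
    r = np.minimum(d, L - d)                      # |j - jc|_circ
    mu = np.where(r == 0, delta_q, 0.5 * delta_q * np.maximum(r, 1.0) ** (-alpha_q))
    lam = np.arange(1, n_levels + 1, dtype=float) ** (-alpha_p)
    W = c_w * np.outer(mu, lam)                   # W[j-1, kt-1] for the representative nodes
    return np.abs(W.real) ** 2 + BETA1**2 * np.abs(W.imag) ** 2

eta = importance_measure(L=11, n_levels=50)
ridge = eta.max(axis=0)                           # peak over j at each level
# On a log-log scale the ridge is a straight line with slope -2 * alpha_p.
slope = np.polyfit(np.log(np.arange(1, 51)), np.log(ridge), 1)[0]
print(slope)
```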

We show the two types of truncation errors ε(L,Nh) as functions of Nh with fixed L for 1D SPT cluster states with a perturbation part (Figs. 2.2(a) and 2.2(b)). That is, the RBM parameters are the sum of the setting defined in the system of equations (2.3) and a perturbation part specified by Eqs. (2.24)–(2.26). Numerical results are given for λ(k̃) with both exponential and power-law decays. As described above, the 1D SPT cluster state can be exactly represented by a short-range (1-range) RBM [30]. In our description, its RBM representation has just one level, and the corresponding λ(k̃) and µ(r) vanish for k̃ > 1 and r > 1. Adding the perturbation part makes the composite RBM an LRFD RBM, so that we can study the truncation errors. We give the results for both types of truncation errors and let B̂ be the spin-correlation operator between spins 1 and 2 in the z and x directions.

In our numerical experiments at fixed L, the truncation errors as functions of Nh are well upper bounded by our estimates in inequalities (2.15) and (2.16), which substantiates our theoretical analysis. These experiments also indicate that Eq. (2.23) correctly captures the asymptotic behavior of ε(L,Nh) as Nh varies. Moreover, the curve of the exact ε(L,Nh) and that of our estimate for B̂ = $\hat\sigma^x_1\hat\sigma^x_2$ have exactly the same slope, which implies that Eq. (2.23) gives an asymptotically optimal upper bound. In other words, for the second-type truncation errors (inequality (2.16)), the constant prefactors in our estimate can still be improved, but the upper bound cannot be improved qualitatively. In contrast, for the first-type truncation errors (inequality (2.15)), there is room both to improve the upper bound qualitatively and to improve the constant prefactors.



2.2.3 Scaling of complexity

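Table 2.1: Complexity estimations for distinct typical settings of µ(r) and λ(k̃). “−” in the µ(r) column denotes all µ(r) functions that make Q(L) converge as L → ∞. “−” in the λ(k̃) column denotes all λ(k̃) functions that make P(Nh/L) have the asymptotic behavior O(1/ln(Nh/L)) as Nh → ∞. In all settings, δP > 1, αP > 1/2, and each entry describes the asymptotic behavior of the corresponding function.

| Manifold | µ(r) | Q(L) | λ(k̃) | P(Nh/L) | $N^*_h(L,\varepsilon)$ |
|---|---|---|---|---|---|
| $S_2^{(1)}$ | − | converges | $\delta_P^{-\tilde k}$ | $O(\delta_P^{-2N_h/L})$ | $O(L\ln(L/\varepsilon))$ |
| $S_2^{(2)}$ | − | converges | $\tilde k^{-\alpha_P}$ | $O((N_h/L)^{1-2\alpha_P})$ | $O((L^{2\alpha_P}/\varepsilon)^{1/(2\alpha_P-1)})$ |
| $S_2^{(3)}$ | $r^{-1}$ | $O((\ln L)^2)$ | $\delta_P^{-\tilde k}$ | $O(\delta_P^{-2N_h/L})$ | $O(L\ln(L/\varepsilon))$ |
| $S_2^{(4)}$ | $r^{-1}$ | $O((\ln L)^2)$ | $\tilde k^{-\alpha_P}$ | $O((N_h/L)^{1-2\alpha_P})$ | $O((L^{2\alpha_P}(\ln L)^2/\varepsilon)^{1/(2\alpha_P-1)})$ |
| $S_2^{(5)}$ | $\to\mu_\infty>0$ | $O(L^2)$ | $\delta_P^{-\tilde k}$ | $O(\delta_P^{-2N_h/L})$ | $O(L\ln(L/\varepsilon))$ |
| $S_2^{(6)}$ | $\to\mu_\infty>0$ | $O(L^2)$ | $\tilde k^{-\alpha_P}$ | $O((N_h/L)^{1-2\alpha_P})$ | $O((L^{2\alpha_P+2}/\varepsilon)^{1/(2\alpha_P-1)})$ |
| $S_2^{(7)}$ | − | converges | − | $O(1/\ln(N_h/L))$ | $O(L\exp(L/\varepsilon))$ |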

Because the results of Secs. 2.2.1 and 2.2.2 still hold for varying L, we can investigate how the spatial complexity of LRFD RBMs scales with the system size. We give an upper-bound estimate of the complexity of RBM representations that depends on the asymptotic behavior of the functions P(x) (Eq. (2.21)) and Q(x) (Eq. (2.22)) as x → ∞, and is therefore determined by the decay rates specified by λ(k̃) and µ(r).

Define the minimum Nh to achieve a sufficiently small approximation error ε0 as

$$N^*_h(L,\varepsilon_0) = \inf\{N_h : \varepsilon(L,N_h) \le \varepsilon_0\}. \tag{2.28}$$

By Lemma 3, a sufficient condition for ε(L,Nh) ≤ ε0 is that the corresponding upper bound on the truncation errors is no larger than ε0. This provides one way to obtain an upper bound on $N^*_h(L,\varepsilon_0)$ for LRFD RBMs. It can be shown that

$$N^*_h(L,\varepsilon_0) = O\!\left(L\,p_d^{-1}\!\left(\frac{L\,q(L)}{f^{-1}(\varepsilon_0)}\right)\right) \quad (L\to\infty), \tag{2.29}$$

where q(x), pd(x), and f(x) are the functions specifying the asymptotic behaviors of Q(x), P(x), and F(x) defined above, and the superscript “−1” denotes the inverse of the corresponding function.
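Definition (2.28) and estimate (2.29) can be compared in a small numerical sketch. Purely for illustration, we assume that the truncation-error bound is exactly F(L Q(L) P(Nh/L)) with F(x) = x, Q(L) = q(L), and P(x) = 1/pd(x) with unit constants, and we take the power-law setting pd(x) = x^{2αP−1} of $S_2^{(2)}$ in Table 2.1. The hypothetical function `n_star_upper` searches for the smallest admissible Nh, restricted to integer multiples of L as in the text, and `n_star_closed_form` evaluates Eq. (2.29) with the ceiling that this restriction introduces.

```python
import math


def n_star_upper(L, eps0, q, p_d, max_levels=10**6):
    """Smallest N_h = m * L whose (model) bound L*q(L)/p_d(m) is <= eps0, cf. Eq. (2.28).

    Assumed model: the bound equals F(L Q(L) P(N_h/L)) with F(x) = x, Q = q, P = 1/p_d.
    """
    for m in range(1, max_levels + 1):       # m = N_h / L = number of levels
        if L * q(L) / p_d(m) <= eps0:
            return m * L
    raise ValueError("no admissible N_h with at most max_levels levels")


def n_star_closed_form(L, eps0, q, alpha_P):
    """Eq. (2.29) with f^{-1}(eps0) = eps0 and p_d^{-1}(y) = y**(1/(2 alpha_P - 1)).

    The ceiling reflects that N_h must be an integer multiple of L.
    """
    return L * math.ceil((L * q(L) / eps0) ** (1.0 / (2 * alpha_P - 1)))


def q_conv(L):
    """Q(L) that converges as L -> infinity (unit constant)."""
    return 1.0


alpha_P = 2.0


def p_d(x):
    """Power-law level decay: P(x) = O(x**(1 - 2*alpha_P))."""
    return x ** (2 * alpha_P - 1)


for L in (20, 50, 100):
    print(L, n_star_upper(L, 1e-3, q_conv, p_d), n_star_closed_form(L, 1e-3, q_conv, alpha_P))
# 20 560 560
# 50 1850 1850
# 100 4700 4700
```

For a fixed number of levels m, N = mL grows linearly in L, and m only jumps when L q(L)/ε0 crosses a threshold; this is the origin of piecewise-linear behavior in Nh versus L. Other rows of Table 2.1 can be explored by swapping q and p_d.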

Several conclusions can be drawn from Eq. (2.29). First, the leading factor L arises from our assumption that Nh is an integer multiple of the system size L, and the factor L in front of q(L) is extracted using the translational symmetry of the wave function. These two factors therefore reflect the growth of the system size, while the remaining factors distinguish the complexities of different LRFD RBMs.

Second, P(Nh/L) and Q(L) (and hence λ(k̃) and µ(r), respectively), which characterize the nonlocal structure of RBMs in our description, influence the complexity in qualitatively different ways. Specifically, when µ(r) decays sufficiently fast, i.e., for sufficiently localized “orbitals,” Q(L) converges to a finite L-independent constant in the thermodynamic limit and does not affect the complexity. Under the upper-boundedness condition for µ(r) (Eq. (2.11)), Q(L) contributes at most a linear factor to this upper bound on $N^*_h(L,\varepsilon_0)$.

By contrast, the asymptotic behavior of P(x) strongly influences the complexity and may make the RBM representation inefficient if λ(k̃) decays sufficiently slowly. That would imply that there are too many high-order correlations between spins for the RBM to capture, so that polynomially many parameters are not enough to compress the information fully into the RBM form. However, as long as $p_d^{-1}(x)$ depends on x at most as a power law, this upper-bound estimate implies that, for the two types of truncation errors above, the complexity is at most polynomial in both the system size L and 1/ε0. Note also that our estimate only provides an upper bound on the complexity, so a faster-than-polynomial scaling of the bound (such as $S_2^{(7)}$ in Table 2.1) does not necessarily imply that the representation is inefficient: the upper bound may not be tight, and the true complexity may still be at most polynomial in this case.

Third, the asymptotic behavior of F(x) at x = 0 also influences the complexity scaling, acting directly on ε0. We have shown that, for the two types of truncation errors described above, the corresponding functions F1(x) and F2(x) are both analytic at x = 0. More generally, for any type of truncation error that can be upper bounded by a function F(L Q(L) P(Nh/L)), the Taylor expansion of F(x) implies that $1/f^{-1}(\varepsilon_0)$ depends on 1/ε0 as a power law, as long as F(x) is analytic at x = 0.
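The Taylor-series argument can be written out explicitly. Assume, as for a bound that vanishes with its argument, that F(0) = 0, and let cn be the first nonvanishing Taylor coefficient of F at x = 0 (n ≥ 1). The second line inserts the result into Eq. (2.29) for the power-law level decay pd(x) = x^{2αP−1} used in Table 2.1.

```latex
F(x) = c_n x^n + O(x^{n+1}) \;\Longrightarrow\; f(x) = x^n, \qquad
\frac{1}{f^{-1}(\varepsilon_0)} = \varepsilon_0^{-1/n},
% inserting into Eq. (2.29) with p_d^{-1}(y) = y^{1/(2\alpha_P - 1)}:
N^*_h(L,\varepsilon_0) = O\!\left(L\,\bigl(L\,q(L)\bigr)^{1/(2\alpha_P-1)}\,
  \varepsilon_0^{-1/[n(2\alpha_P-1)]}\right).
```

For n = 1, this reproduces the power-law entries of Table 2.1; for example, q(L) = O(1) gives $O((L^{2\alpha_P}/\varepsilon_0)^{1/(2\alpha_P-1)})$, the entry for $S_2^{(2)}$.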

This result suggests that µ(r) and λ(k̃) play distinct roles. The scaling of the entanglement entropy, an important measure of the complexity of quantum many-body states, is influenced by µ(r), whereas λ(k̃) strongly influences the spatial complexity of the parameterization in LRFD RBM representations. The length of the support of µ(r), which determines the “range” r0 of an RBM, directly influences the scaling of the entanglement entropy between subregions [31] but does not directly contribute a faster-than-polynomial factor to the parameterization complexity. This result may provide further theoretical evidence for the high efficiency of RBMs in representing states whose entanglement entropy scales faster than an area law in the system size [31].

We apply our complexity estimate to several typical settings of µ(r) and λ(k̃) in Table 2.1. The manifolds $S_2^{(j)}$ with 1 ≤ j ≤ 6 all correspond to a spatial complexity that is at most polynomial in L. We also apply this analysis to RBMs constructed from the 1D SPT cluster states with a perturbation part. Our numerical results on the scaling of $N^*_h(L,\varepsilon_0)$ with L at fixed ε0 (Fig. 2.2(d)) for small system sizes are consistent with the theoretical analysis summarized in Table 2.1. $N^*_h(L,\varepsilon_0)$ is piecewise linear in L with a very slowly growing slope, which suggests that the scaling is perhaps only slightly faster than linear, consistent with our estimate based on the parameter settings. The piecewise linearity arises from our assumption that Nh is an integer multiple of L: the number of levels Nh/L is obtained through a ceiling operation and therefore does not change when L varies within a small range. The inset in Fig. 2.2(d) shows that the upper bounds $N^U_h(L,\varepsilon_0)$ on $N^*_h(L,\varepsilon_0)$, obtained from the exact right-hand side of inequality (2.15) and from its leading-order estimate, are almost identical, and both scale as a power law in L, as indicated by Eq. (2.29). This supports the validity of our complexity analysis.

2.2.4 Spin-correlation information

In this subsection, we analyze what information about the physical properties of quantum states can be extracted from the LRFD RBM form using our description of the nonlocal structure. We focus on a small-parameter regime in which |aj|, |bk|, |Wj,k| ≤ ε1, with ε1 ≪ 1/L and ε1 ≪ 1/Nh. Throughout this subsection, we omit the superscript “(L)” on RBM parameters and assume that the RBM has only a finite number (Nh) of hidden nodes.

Based on the proof given in Appendix A.3, we find that the correlation in the z direction between spins separated by a distance r for an LRFD RBM with translational symmetry is

$$C^z_{\mathrm{unnorm}}(r) = \langle\Psi(L,N_h)|\hat\sigma^z_1\hat\sigma^z_{1+r}|\Psi(L,N_h)\rangle \tag{2.30}$$

$$= 2\left(\mathrm{Re}(WW^T)\right)_{1,1+r} + 4\,\mathrm{Re}(a_1)\,\mathrm{Re}(a_{1+r}) + O(\varepsilon_1^3) \quad (\varepsilon_1\to 0). \tag{2.31}$$

[Figure 2.3: log-log plot of $\langle\hat\sigma^z_1\hat\sigma^z_{1+r}\rangle$ versus r (r = 1 to 11); inset: $\langle\hat\sigma^z_1\hat\sigma^z_{1+L/2}\rangle$ versus L (L = 14 to 30).]

Figure 2.3: Spin correlations in the z direction as a function of distance r on a log-log scale. The LRFD RBMs are constructed as Eqs. (2.24)–(2.26) show, where $\mu(r) = \frac{1}{2}\delta_Q r^{-\alpha_Q}$ for r ≠ 0, µ(0) = δQ = 0.2, $\lambda(\tilde k) = \tilde k^{-\alpha_P}$, αP = 3.5, cw = 1, cb = 0, a0 = 0, L = 22, and Nh = 5L. The inset shows the spin correlation $\langle\hat\sigma^z_1\hat\sigma^z_{1+L/2}\rangle$, with r equal to the half-chain length, for varying L (on a log-log scale). It shows that $\langle\hat\sigma^z_1\hat\sigma^z_{1+L/2}\rangle$ converges to an L-independent constant (almost attaining the maximum value 1) for αQ = 1/2 and decays for αQ = 2.

Note that Eq. (2.31) gives the r-dependent part of the spin correlation; the actual correlation is $C^z_{\mathrm{unnorm}}(r)$ divided by the r-independent normalization factor ⟨Ψ(L,Nh)|Ψ(L,Nh)⟩. Thus, for RBMs constructed according to Eqs. (2.24)–(2.26) with a0 = 0 for simplicity,

$$C^z_{\mathrm{unnorm}}(r) \approx |c_w|^2 \sum_{\tilde k=1}^{N_h/L} |\lambda(\tilde k)|^2 \sum_{j_c=1}^{L} \mu(|1-j_c|_{\mathrm{circ}})\,\mu(|1+r-j_c|_{\mathrm{circ}}). \tag{2.32}$$



Thus, the µ(r)-dependent factor above determines how the z-direction spin correlation decays with the distance r, whereas the r-independent, λ(k̃)-dependent factor does not affect the decay rate if we consider only the leading-order terms in Eq. (2.31).
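The leading-order expression (2.32) is straightforward to evaluate numerically. The Python/NumPy sketch below uses the parameter choices of Fig. 2.3 with αQ = 2 and assumes that |·|circ is the periodic distance on a ring of L sites; the function names are hypothetical. As a consistency check with Eq. (2.31), it also builds W from Eq. (2.24) with real cw and compares $(WW^T)_{1,1+r}$ with the right-hand side of Eq. (2.32). The two agree term by term, so Eq. (2.32) is the W-dependent part of Eq. (2.31) up to its constant factor of 2.

```python
import numpy as np


def circ_dist(i, j, L):
    """Periodic (ring) distance between sites i and j (assumed meaning of |.|_circ)."""
    d = abs(i - j) % L
    return min(d, L - d)


def mu_powerlaw(r, alpha_Q, delta_Q=0.2):
    """mu(r) of Fig. 2.3: delta_Q at r = 0, (1/2) delta_Q r^(-alpha_Q) otherwise."""
    return delta_Q if r == 0 else 0.5 * delta_Q * r ** (-alpha_Q)


def cz_leading(r, L, n_levels, lam, mu, c_w=1.0):
    """Right-hand side of Eq. (2.32) for spins 1 and 1 + r (a0 = 0)."""
    level_factor = sum(abs(lam(kt)) ** 2 for kt in range(1, n_levels + 1))
    orbital_overlap = sum(mu(circ_dist(1, jc, L)) * mu(circ_dist(1 + r, jc, L))
                          for jc in range(1, L + 1))
    return abs(c_w) ** 2 * level_factor * orbital_overlap


def lrfd_weights(L, n_levels, lam, mu, c_w=1.0):
    """Real weight matrix of Eq. (2.24); W[j-1, k-1] = W_{j,k}."""
    W = np.zeros((L, n_levels * L))
    for k in range(1, n_levels * L + 1):
        kt = -(-k // L)                  # level index ceil(k / L)
        jc = k - (kt - 1) * L            # orbital center
        for j in range(1, L + 1):
            W[j - 1, k - 1] = c_w * lam(kt) * mu(circ_dist(j, jc, L))
    return W


L, n_levels = 22, 5                      # N_h = 5L, as in Fig. 2.3


def lam(kt):
    return kt ** -3.5                    # alpha_P = 3.5


def mu(r):
    return mu_powerlaw(r, alpha_Q=2.0)


C = np.array([cz_leading(r, L, n_levels, lam, mu) for r in range(L)])
W = lrfd_weights(L, n_levels, lam, mu)
WWt = W @ W.T
print(np.allclose(WWt[0, :], C))         # True: Eq. (2.32) equals (W W^T)_{1,1+r}
```

Replacing `alpha_Q=2.0` by 1.0 or 0.5 gives the other two decay regimes discussed below, at the level of this leading-order, unnormalized expression.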

Equation (2.31) also suggests an interpretation of the role of hidden nodes: they can be viewed as intermediate virtual particles that relate spins (physical particles) at different lattice sites. When an RBM is short-range, the term $\left(\mathrm{Re}(WW^T)\right)_{1,1+r}$ vanishes for sufficiently large r, because no virtual particle has nonzero connectivity to both of two spins separated by r. More intermediate hidden nodes are then needed to transmit such correlations, which means that higher-order terms must be considered. This is further evidence that long-range RBMs can represent states with strong quantum correlations. It is shown in Appendix A.3 that, even when µ(r) → 0 as r → ∞, we can still construct LRFD RBMs in which the z-direction spin correlations decay slowly over long distances, with lower bounds of $\Theta(1/r^{\alpha_Q})$ (for $\mu(r) = \Theta(1/r^{\alpha_Q})$ with αQ > 1), $\Theta(\ln r/r)$ (for $\mu(r) = \Theta(1/r)$), and even $\Theta(1)$ (for $\mu(r) = \Theta(1/r^{\alpha_Q})$ with 0 < αQ ≤ 1/2).

These three kinds of decay are demonstrated by numerical computations (Fig. 2.3). The spin correlation $\langle\hat\sigma^z_1\hat\sigma^z_{1+r}\rangle$ almost saturates its maximum value 1 for αQ = 1/2 and decays at different long-range rates for αQ = 1 and αQ = 2 as r increases.



[Figure 2.4: panels (a) and (b) show the importance measure η(j, k̃, L); insets show the level decay of its maximum.]

Figure 2.4: Importance measure η(j, k̃, L) for the RBMs approximating the ground states of two critical systems with L = 15. (a) TFIM with Bx = 1. (b) XXZ model with Jz = −0.2. The insets in each panel show, on a log-log scale, the decay of the maximum importance measure at each level as the level number k̃ increases, for system sizes L = 9, 11, 13, and 15. The purple dashed curve indicates that these decay curves can be upper bounded by a power-law decay. By numerical fitting, the corresponding αP for (a) and (b) are 2.957 and 1.232, respectively.

2.3 Ground-state applications

With the concept of LRFD RBMs and the analysis of their spatial complexity in place, it is natural to explore their application to learning the quantum states of specific models. First, in Appendix A.4 we prove that the state with all spins pointing up in the z direction, which is the ground state of a spin-1/2 system subject only to a magnetic field in the z direction and takes the form of a Kronecker delta function, can be approximated by LRFD RBMs with arbitrary accuracy. The RBM construction for this target state is not unique, even after the global phase is fixed (i.e., after eliminating the degree of freedom associated with a global gauge transformation); we give one such construction. This provides an example of the utility of LRFD RBMs in state representation for arbitrarily large system sizes.

Second, we are particularly interested in the behavior of RBMs in cases where other state representations become less efficient. Although MPSs have achieved notable success in representing quantum many-body states whose entanglement entropy satisfies an area law [32, 57], the MPS representation becomes less efficient for the ground states of critical systems [62, 63]. We therefore numerically study RBM representations of the ground states of finite-size critical systems.

We use RBMs with translational symmetry and apply the conventional quantum Monte Carlo algorithm (a variational method) with stochastic-reconfiguration optimization [6, 64, 65, 66] to learn the ground states of two typical quantum models, the 1D transverse-field Ising model (TFIM) (Eq. (2.33)) and the XXZ model (Eq. (2.34)), described by the Hamiltonians

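$$\hat H = -\sum_{1\le j\le L}\hat\sigma^z_j\hat\sigma^z_{j+1} - B_x\sum_{j=1}^{L}\hat\sigma^x_j, \tag{2.33}$$

and

$$\hat H = \sum_{1\le j\le L}\left(-\hat\sigma^x_j\hat\sigma^x_{j+1} - \hat\sigma^y_j\hat\sigma^y_{j+1} + J_z\hat\sigma^z_j\hat\sigma^z_{j+1}\right) \tag{2.34}$$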

with periodic boundary conditions, where Bx denotes the strength of the transverse field and Jz denotes the strength of the coupling in the z direction. We use RBMs to learn the ground state of the TFIM with Bx = 1, at which the system sits exactly at the phase-transition point between the ferromagnetic and paramagnetic phases [67], and that of the XXZ model with Jz = −0.2, for which the system is in the gapless disordered XY phase [68]. Both are critical systems whose ground-state entanglement entropy scales logarithmically with the system size [67, 69, 70]. The ground states of these two Hamiltonians (at least for small system sizes) can be learned well by RBMs, as demonstrated by the high accuracy of the spin-correlation calculations given in Appendix A.5.
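For small L, the ground states of Eqs. (2.33) and (2.34) can also be obtained by exact diagonalization, which gives a reference against which RBM results can be benchmarked. The NumPy sketch below builds both Hamiltonians as dense 2^L × 2^L matrices with periodic boundary conditions (site L + 1 identified with site 1). It is an illustrative reference calculation, not the variational Monte Carlo procedure used in this work, and it is practical only for small L (on the order of 10) because the dense matrices grow as 4^L.

```python
from functools import reduce

import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def site_op(op, j, L):
    """Single-site operator `op` acting on site j (0-indexed) of an L-site chain."""
    ops = [I2] * L
    ops[j] = op
    return reduce(np.kron, ops)


def tfim(L, Bx):
    """Dense matrix of Eq. (2.33) with periodic boundary conditions."""
    H = np.zeros((2 ** L, 2 ** L), dtype=complex)
    for j in range(L):
        H -= site_op(SZ, j, L) @ site_op(SZ, (j + 1) % L, L)
        H -= Bx * site_op(SX, j, L)
    return H


def xxz(L, Jz):
    """Dense matrix of Eq. (2.34) with periodic boundary conditions."""
    H = np.zeros((2 ** L, 2 ** L), dtype=complex)
    for j in range(L):
        jn = (j + 1) % L
        H -= site_op(SX, j, L) @ site_op(SX, jn, L)
        H -= site_op(SY, j, L) @ site_op(SY, jn, L)
        H += Jz * site_op(SZ, j, L) @ site_op(SZ, jn, L)
    return H


def ground_energy(H):
    """Lowest eigenvalue of a Hermitian matrix."""
    return np.linalg.eigvalsh(H)[0]


E_tfim = ground_energy(tfim(8, 1.0))    # critical TFIM, Bx = 1
E_xxz = ground_energy(xxz(8, -0.2))     # XXZ model in the XY phase, Jz = -0.2
print(E_tfim / 8, E_xxz / 8)            # ground-state energies per site
```

For the TFIM at Bx = 1, the result can be cross-checked against the free-fermion value E0 = −2/sin(π/(2L)), about −10.25 for L = 8.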

The importance measures η(j, k̃, L) for these two RBMs are shown in Figs. 2.4(a) and 2.4(b).

The numerical results show that the RBM representations of the ground states of these two critical systems have forms very similar to LRFD RBMs. The overall 3D structures of the importance measures η(j, k̃, L) resemble the one in Fig. 2.1(b), which corresponds to a standard LRFD RBM. The weight parameters of hidden nodes at the same level are well localized and decay rapidly both as the level number k̃ increases and as the spin-site index j moves away from the center. Moreover, the “ridge” of η(j, k̃, L) for varying system sizes appears to be upper bounded by an L-independent power-law decay curve, from which we can extract a corresponding αP characterizing the rate of level decay for these small-system-size wave functions. If these features persist as L → ∞, these states will be LRFD RBMs belonging to the set $S_2^{(2)}$ or $S_2^{(6)}$ in Table 2.1, with well-defined λ(k̃) and µ(r).

Moreover, these results exhibit a feature that also appears in the theory of MPS representations. It has been shown [62] that, although MPSs become less efficient in representing the ground states of critical systems, the bond dimension required to achieve an approximation error ε0 can still be upper bounded by a function scaling polynomially in the system size L. The exponent of this power-law dependence on L depends on the central charge c, a quantity that roughly quantifies the “degrees of freedom of the theory” in conformal field theory [57]. A larger c leads to a higher exponent in that estimate and thus to a higher complexity of the MPS representation. The TFIM at the above phase-transition point has c = 1/2, whereas the XXZ model in the disordered XY phase has c = 1 [71]. Consistently, our numerical results show a smaller fitted αP for the XXZ model, which implies that the XXZ model has more intrinsic “complexity” than the TFIM and thus needs more parameters to capture it.

2.4 State manifolds and complexity classification

Strictly speaking, numerical results for finite-size systems only provide evidence that these states may be LRFD RBMs; they cannot prove it, since the behavior of the RBMs as the thermodynamic limit is approached is not yet known. Based on the success of RBMs in numerical simulations and the fact that they often achieve high accuracy even with a constant number of levels (at least for small system sizes), we conjecture that the ground states of a wide range of quantum systems may be exactly represented by LRFD RBMs or a variant of them. Here, a “variant” means a generalization of the forms specified in Conditions 1 and 2 that adds further factors which can be naturally incorporated into our complexity analysis. For example, the functions λ(k̃) and µ(r), which are L-independent in our definition of LRFD RBMs, can be generalized to λ(k̃, L) and µ(r, L), respectively, and their effects can be readily evaluated using our paradigm for complexity analysis.

We summarize the relations between multiple typical state manifolds to clarify the significance of the concept of LRFD RBMs. A state manifold usually refers to a subspace of the full Hilbert space spanned by a parameterized wave-function family [58]; it is thus a set containing a specific class of quantum states. The manifolds S1, S2, $S_2^{(j)}$ (for 1 ≤ j ≤ 6, j ∈ ℕ), S3, and S4 are defined as the spaces spanned by quantum states represented by RBMs satisfying the corresponding conditions given in Fig. 2.5, while S5 is defined as the manifold spanned by all ground states of 1D quantum many-body spin systems.

[Figure 2.5: diagram of the state manifolds S1 (SR RBM), $S_2^{(j)}$, S2 (LRFD RBM), S3 (RBM-Polynomial), S4 (RBM-Inefficient), and S5 (1D Ground States).]

Figure 2.5: Relations between multiple typical state manifolds. S1: short-range RBMs; S2: LRFD RBMs; $S_2^{(j)}$ (for 1 ≤ j ≤ 6, j ∈ ℕ): LRFD RBMs with distinct parameter conditions, specified in Table 2.1; S3: RBMs with spatial complexities scaling at most polynomially in system sizes; S4: RBMs with a faster-than-polynomial scaling of spatial complexities in system sizes, corresponding to inefficiency of representation; S5: ground states of 1D quantum spin systems. The dashed boundary of S5 means that its relations with other manifolds have not been fully determined.

The definitions of these manifolds directly imply that $S_1 \subsetneq S_2^{(j)} \subsetneq S_2$ (for 1 ≤ j ≤ 6). Our complexity analysis for LRFD RBMs (Sec. 2.2.3) shows that $S_2^{(j)} \subseteq (S_2 \cap S_3)$. Previous research shows that many problems for which RBMs appear to be powerful involve topological states; among these, the 1D SPT cluster states belong to S5 ∩ S1 [30]. The Laughlin wave functions, which have the structure of Jastrow wave functions and are associated with chiral topological order, can be exactly represented by RBMs in S3 with Nh scaling quadratically in L, although approximations by long-range RBMs of lower complexity are often used instead [8]. S4 contains all the other sets in Fig. 2.5, because RBMs with an unrestricted number of hidden nodes are universal approximators for discrete distributions [33]. Numerical results seem to suggest that a “large fraction” of S5 lies within S2. We argue that the concept of S2 may help in understanding which fraction of S5 lies within S3, thereby also advancing the understanding of the complexity of quantum many-body states.

Notably, our paradigm for complexity analysis and our characterization of the nonlocal structures of RBMs for 1D quantum spin systems can be generalized to higher-dimensional systems, e.g., higher-dimensional lattices, by generalizing the description of single-level “orbitals” from µ(r) to µ(r⃗) while keeping λ(k̃) as a level-decay factor. For deep NN quantum states, we can still view each hidden layer as a combination of multiple levels that capture correlations of different orders. We can then calculate the truncation errors for each hidden layer associated with specific nodal functions and analyze the propagation of errors through the layers.

[Figure 2.6: RBM with visible nodes σ1, ..., σ6 and hidden nodes h1, ..., h4.]

Figure 2.6: Network structure of a sparse RBM to be transformed into an MPS. This RBM serves as an example of how the original transformation algorithm can fail and of the effect of our improvement.

2.5 Transformation from RBMs to MPSs

Several works have studied the relationship between RBMs and other state representations, such as string-bond states [8], correlator product states [55], and tensor network states [38, 56]. In particular, the transformation from RBMs to MPSs has been analyzed [56]. However, because such transformations use only the structural information of the network, they may lead to redundant parametrization when the RBMs are not short-range or sparse. An algorithm for finding an optimal mapping from RBMs to MPSs is given in Ref. [56], but we point out that this algorithm may fail and works mainly for short-range or very sparse RBMs.

The core idea of the optimal transformation algorithm is to find a minimal-size “separation set” when dichotomizing the visible nodes into two parts: when the nodes in the separation set are excluded, the remaining two subsets of visible nodes are disconnected. This implies that the wave function of the quantum state can be factorized into two independent parts conditioned on fixed values of the spins in the separation set. The separation set is very similar to a “Markov blanket” in an undirected graph, but differs from it in that it may include nodes from the sets being separated.
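The separation-set condition itself is a plain graph-reachability test, sketched below in Python. The edge list of the Fig. 2.6 RBM is our reading of the wjk terms that appear in Table 2.3 (visible node σj is written `s{j}`), so it should be treated as an assumption. The function `separates` is a hypothetical helper that only checks the condition; it does not implement the transformation algorithm of Ref. [56].

```python
from collections import deque

# Bipartite edges (visible, hidden) of the Fig. 2.6 RBM, as read off from Table 2.3.
FIG_2_6_EDGES = [
    ("s1", "h1"), ("s2", "h1"), ("s3", "h1"), ("s4", "h1"),
    ("s2", "h2"), ("s3", "h2"), ("s4", "h2"),
    ("s3", "h3"), ("s4", "h3"), ("s5", "h3"),
    ("s4", "h4"), ("s5", "h4"), ("s6", "h4"),
]


def separates(edges, X, Y, S):
    """True if, after deleting the nodes in S, no path joins X minus S to Y minus S."""
    S = set(S)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    targets = set(Y) - S
    start = [x for x in X if x not in S]
    seen, queue = set(start), deque(start)
    while queue:                              # breadth-first search avoiding S
        node = queue.popleft()
        if node in targets:
            return False
        for nb in adj.get(node, ()):
            if nb not in S and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return True


print(separates(FIG_2_6_EDGES, ["s1"], ["s2", "s3", "s4", "s5", "s6"], ["s1"]))        # True
print(separates(FIG_2_6_EDGES, ["s2", "h1", "s3"], ["s4", "s5", "s6"], ["s3", "s4"]))  # True
print(separates(FIG_2_6_EDGES, ["s3", "s4"], ["s5", "s6"], ["h3", "h4"]))              # True
print(separates(FIG_2_6_EDGES, ["s1"], ["s6"], ["h2"]))                                # False
```

The first three checks correspond to steps 1, 3, and 4 of Table 2.3; the last shows that removing h2 alone does not disconnect σ1 from σ6, since the path σ1, h1, σ3, h3, σ5, h4, σ6 remains.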

We examine why the optimal transformation algorithm fails and propose an improvement. The algorithm fails if the two node sets to be separated (the “X” and “Y” sets in Ref. [56]) share common visible nodes, because such nodes should be included in the separation set but cannot be, since their corresponding indices have already been contracted in the previous step. This happens when the separation set at some step of the tensor construction contains too many visible nodes waiting to be contracted. Consequently, the algorithm fails when the RBM is not sufficiently sparse. An example of this failure is given in Fig. 2.6 and Table 2.2. We point out that the algorithm can be improved by not choosing the separation set randomly when constructing the tensor A[σj] and several choices exist, but instead choosing the one containing σj whenever possible. This principle increases the number of steps for which a separation set exists. The effect of adopting it is shown in Table 2.3: the improved algorithm generates an MPS, whereas the original one fails for the same RBM. In both tables, $w_{jk}$ is an abbreviation of $W_{jk}\sigma_j h_k$.



| step | X | Y | separation set | tensor |
|---|---|---|---|---|
| 1 | σ1 | σ2, ..., σ6 | h1 | $A^{(1)}_{1,h_1}[\sigma_1] = \exp(a_1\sigma_1 + w_{11})$ |
| 2 | h1, σ2 | σ3, ..., σ6 | σ3, σ4 | Algorithm fails |

Table 2.2: Failure of the algorithm for finding an optimal transformation from RBM to MPS. The RBM to be transformed has the network structure shown in Fig. 2.6. The original algorithm selects “separation sets” randomly.

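| step | X | Y | separation set | tensor |
|---|---|---|---|---|
| 1 | σ1 | σ2, ..., σ6 | σ1 | $A^{(1)}_{1,\sigma_1}[\sigma_1] = \exp(a_1\sigma_1)$ |
| 2 | σ1, σ2 | σ3, ..., σ6 | σ2, h1 | $A^{(2)}_{\sigma_1,\sigma_2 h_1}[\sigma_2] = \exp(a_2\sigma_2 + w_{11} + w_{21})$ |
| 3 | σ2, h1, σ3 | σ4, ..., σ6 | σ3, σ4 | $A^{(3)}_{\sigma_2 h_1,\sigma_3\sigma_4}[\sigma_3] = \sum_{h_1,h_2}\exp(a_3\sigma_3 + w_{22} + w_{31} + w_{32} + w_{41} + w_{42} + b_1h_1 + b_2h_2)$ |
| 4 | σ3, σ4 | σ5, σ6 | h3, h4 | $A^{(4)}_{\sigma_3\sigma_4,h_3h_4}[\sigma_4] = \exp(a_4\sigma_4 + w_{33} + w_{43} + w_{44})$ |
| 5 | h3, h4, σ5 | σ6 | σ6 | $A^{(5)}_{h_3h_4,\sigma_6}[\sigma_5] = \sum_{h_3,h_4}\exp(a_5\sigma_5 + w_{53} + w_{54} + w_{64} + b_3h_3 + b_4h_4)$ |
| 6 | σ6 | ∅ | ∅ | $A^{(6)}_{\sigma_6,1}[\sigma_6] = \exp(a_6\sigma_6)$ |

Table 2.3: Success of generating an MPS using our improved algorithm. The RBM to be transformed is the same as that in Table 2.2.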

2.6 Conclusion

In this work, we define a subset of generic RBM quantum states, the long-range-fast-decay (LRFD) RBM states. Using the language of levels, we describe the nonlocal structure of LRFD RBMs with two functions: µ(r), which captures the localization of the spatial distribution of the wave function at each level and encodes information about spin correlations, and λ(k̃), a level-decay factor that captures correlations of different orders and strongly influences the complexity of the RBMs. We derive upper bounds on truncation errors, which allow us to analyze how the spatial complexity of LRFD RBMs scales with the system size and the approximation error. We provide numerical results supporting that the ground states of a wide range of 1D quantum spin systems, including some critical systems, may be approximated by LRFD RBMs with at most polynomial complexity. Finally, we describe the relationships between state manifolds of different computational complexity and identify hierarchies of RBM-efficient approximation.

Generalizing the RBM wave-function ansatz to the regime of infinitely many hidden nodes and proposing the concept of LRFD RBMs does not imply the use of an infinitely large neural network for state representation. Rather, these constructions serve to define the completeness of a set of variational states and provide a tool for complexity analysis that exploits the extensibility and analyzability of the LRFD-RBM form. This concept may promote a general understanding of the intrinsic complexity of quantum many-body states.



Chapter 3: Learning quasiparticle excitations in long-range interacting quantum systems using neural networks

3.1 Introduction

The interface between machine learning and quantum information processing is an emerging and rapidly advancing field [2, 3, 5, 6, 24, 72, 73]. Besides research on quantum algorithms [11, 12, 13, 14], on physics-inspired ideas to enhance traditional machine learning [15, 16], and on the fundamental limitations of physical agents based on quantum information theory [74, 75], there has also been tremendous progress in applying machine learning techniques directly to quantum systems. Indeed, these techniques have been applied to a wide range of topics, including the identification of quantum phases and transitions [3, 17, 18, 19, 21], molecular modeling [22, 23], and quantum state tomography [24, 25]. Neural network states have been studied extensively in recent years, as they provide a compact wave-function ansatz in the context of quantum many-body physics [5, 6, 7, 8, 9, 28, 30, 31]. These states have been used successfully to simulate the low-lying eigenstates and short-time unitary dynamics of quantum many-body-localized systems [6, 34] and to represent particular states such as code words of a stabilizer code [30, 35, 36] and chiral topological states [8, 37].



Numerous atomic, molecular, and optical systems exhibiting long-range interactions are emerging as versatile platforms for quantum computation and quantum simulation [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]. These long-range interactions include dipolar interactions (decaying as $1/r^3$ with distance r) between electric [39, 40] or magnetic dipoles [41, 42, 43], strong van der Waals ($1/r^6$) interactions between Rydberg atoms [39, 44, 45] or Rydberg polaritons [46], and more general forms of interaction between trapped ions ($1/r^\alpha$, 0 ≤ α ≤ 3) [47, 48, 49, 50, 51, 52]. Following a quench, quantum information in long-range interacting systems can propagate faster than in short-range interacting ones [48, 49, 76, 77, 78]. The notion of quasiparticles as elementary excitations often provides an effective way to understand non-equilibrium dynamics and quantum thermalization [79, 80, 81]. For instance, the dynamics of correlation spreading and entanglement growth are often constrained by an effective light cone and, at low energy, can often be ascribed to the propagation of quasiparticles through the system [76, 77, 82, 83, 84, 85, 86, 87, 88, 89, 90]. This light cone is linear (t ∼ r) for short-range interactions but, for small α, can become sub-linear (e.g., $t \sim r^\beta$, 0 < β < 1) or even instantaneous (t = 0 for any r in the thermodynamic limit). In the case of linear light cones, the maximum speed of quantum information propagation can often be related to the maximum group velocity of quasiparticles, while sub-linear and instantaneous light cones can often be associated with infinite quasiparticle group velocities in the thermodynamic limit. In higher dimensions D and in the presence of strongly long-range interactions, it can be hard to study the system numerically with some widely adopted methods, such as the density matrix renormalization group, owing to the violation of the entanglement area law [69, 91, 92] and the intractability of contracting higher-dimensional projected entangled-pair states [93].



In this work, we use restricted Boltzmann machines (RBMs) to learn the momentum-resolved low-energy spectrum of long-range interacting systems that are studied in current quantum simulation experiments [47, 48, 49, 50, 51, 52]. We introduce an energy-shift method to calculate the excited states. The core idea of the method is to shift the energies of the original Hamiltonian so that a target excited state becomes the ground state of the new Hamiltonian. This method has the advantage that the target excited state is represented by a single RBM along the entire optimization path, rather than by multiple RBMs [34]. Such a single-RBM representation has a smaller parameter space, and its variational parameters have a more natural physical interpretation, so that geometric properties of the optimization path, such as the quantum Fisher matrix, can be studied more easily and can help in understanding the corresponding quantum phase [94, 95, 96]. Combined with a fixed-quantum-number method [34] generalized to long-range interactions, this method can resolve the full quasiparticle dispersion relation even in the presence of strongly long-range interactions, and it works independently of the ground-state degeneracy and of the size of the gap above the ground-state manifold.
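The idea of shifting energies with projection operators can be illustrated on a small matrix whose eigenstates are known exactly. In the Python/NumPy sketch below, a random real symmetric matrix stands in for Ĥ0, and the lower eigenstates are added back as projectors with a shift Δ larger than the spectral width, which makes the j-th excited state the new ground state. This is a generic illustration under those assumptions (exact lower eigenstates, a nondegenerate spectrum); in the variational setting the lower states are themselves approximate RBM states, and the construction used in this work is the one described in Sec. 3.2.

```python
import numpy as np


def shifted_hamiltonian(H, lower_states, shift):
    """H' = H + shift * sum_i |psi_i><psi_i| over the given (normalized) lower eigenstates."""
    H_shift = np.array(H, dtype=complex)
    for psi in lower_states:
        H_shift += shift * np.outer(psi, psi.conj())
    return H_shift


rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
H0 = (A + A.T) / 2                           # stand-in Hamiltonian (real symmetric)
E, V = np.linalg.eigh(H0)                    # E ascending; V[:, i] = |Psi_i>

j = 2                                        # target: second excited state
shift = (E[-1] - E[0]) + 1.0                 # pushes |Psi_0>, ..., |Psi_{j-1}> above E_max
E_new, V_new = np.linalg.eigh(shifted_hamiltonian(H0, [V[:, i] for i in range(j)], shift))

print(np.isclose(E_new[0], E[j]))                         # True
print(abs(np.vdot(V[:, j], V_new[:, 0])) > 1 - 1e-10)     # True (up to a global phase)
```

Because each projector commutes with H0, the shifted spectrum is {E_i + Δ for i < j} together with {E_i for i ≥ j}; choosing Δ > E_max − E_0 guarantees that E_j is the new minimum.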

Based on the low-energy spectrum, we identify the critical exponent αc where the

maximal quasiparticle group velocity transits from finite to divergent in the thermodynamic

limit in D = 1. The results are quantitatively consistent with an analysis using field theory

and linear spin-wave theory [68, 76, 85, 97]. Our use of RBMs to learn the spectrum of

two-dimensional (2D) models is particularly noteworthy since other numerical methods,

such as tensor network states, are often computationally intractable in 2D [93]. Our results

can help understand the information propagation speed, entanglement growth, and the rate

of thermalization in long-range interacting quantum systems. Our work can also be used

to provide benchmarks for large-scale 1D and 2D state-of-the-art quantum simulators.
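The group-velocity criterion can be illustrated with a minimal sketch. It assumes that the lowest energy in each momentum sector k_n = 2πn/L (n = 0, ..., L − 1) of a periodic chain has already been obtained, for instance with the fixed-quantum-number method mentioned above. The maximal group velocity is then estimated from central finite differences of E(k). The function name `max_group_velocity` and the nearest-neighbour cosine dispersion used as input are hypothetical illustrations. They are not the long-range models or the exact procedure of this work.

```python
import numpy as np

def max_group_velocity(energies):
    """Estimate max_k |dE/dk| from energies E(k_n), k_n = 2*pi*n/L (n = 0..L-1),
    using central finite differences that wrap around the periodic Brillouin zone."""
    energies = np.asarray(energies, dtype=float)
    L = energies.size
    dk = 2 * np.pi / L
    # (E(k_{n+1}) - E(k_{n-1})) / (2 dk); np.roll supplies the periodic neighbours
    velocity = (np.roll(energies, -1) - np.roll(energies, 1)) / (2 * dk)
    return np.max(np.abs(velocity))

# Hypothetical toy input: nearest-neighbour dispersion E(k) = -2 J cos k with J = 1,
# whose exact maximal group velocity is 2 J (not a long-range model).
L = 200
k = 2 * np.pi * np.arange(L) / L
v_max = max_group_velocity(-2.0 * np.cos(k))
print(f"estimated v_max = {v_max:.4f}")
```

For a long-range model, one would repeat such an estimate for increasing L. An estimate that keeps growing with L rather than saturating is suggestive of, though not a proof of, a divergent velocity in the thermodynamic limit.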

3.2 Learning excited states with RBMs

We use the RBM as a variational wave-function ansatz to learn the eigenstates of

quantum many-body spin systems. An RBM is a stochastic artificial neural network for

probability modeling defined on a bipartite undirected graph [6]. Its first layer is the visible

layer containing a spin configuration σ⃗ as input, where σ⃗ = (σ1, . . . , σL) has L degrees of

freedom and σj = ±1 (j = 1, . . . , L) for spin-1/2 systems. The second layer consists

of hidden nodes h⃗ = (h1, . . . , hNh) (hk = ±1 for k = 1, . . . , Nh), which are introduced as

auxiliary spins in the probability model. Given a specific spin configuration σ⃗, the RBM

outputs a wave-function amplitude:

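$$
\psi(\vec{\sigma}) = \sum_{\vec{h}} e^{\sum_j a_j\sigma_j + \sum_k b_k h_k + \sum_{j,k} W_{jk}\sigma_j h_k}
= \prod_{j=1}^{L} e^{a_j\sigma_j} \prod_{k=1}^{N_h} 2\cosh\Big(b_k + \sum_j \sigma_j W_{jk}\Big), \qquad (3.1)
$$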

where aj and bk are the visible and hidden biases, respectively, Wjk are the weights

corresponding to interlayer interactions, and all these parameters are complex numbers. Taken

together, these amplitudes on the computational basis define the quantum state vector

|Ψ⟩ = ∑σ⃗ ψ(σ⃗)|σ⃗⟩. This RBM ansatz possesses good expressive power [7] and can often be

optimized (trained) by several well-developed numerical methods, such as the stochastic

reconfiguration method [6, 64, 65] and stochastic gradient descent [66].
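As a check of Eq. (3.1), the following minimal NumPy sketch compares two ways of computing the amplitude. The closed form traces out the hidden spins analytically into cosh factors. The brute-force version sums explicitly over all 2^Nh hidden configurations. The function names, the random complex parameters, and the tiny sizes (L = 4, Nh = 3) are illustrative assumptions. Enumerating all 2^L visible configurations to assemble |Ψ⟩ is feasible only for very small systems; in practice, amplitudes are evaluated on Monte Carlo samples.

```python
import itertools
import numpy as np

def rbm_amplitude(sigma, a, b, W):
    """psi(sigma) via the closed form of Eq. (3.1): hidden spins traced out."""
    sigma = np.asarray(sigma)
    theta = b + sigma @ W  # effective field on each hidden unit, shape (N_h,)
    return np.exp(a @ sigma) * np.prod(2 * np.cosh(theta))

def rbm_amplitude_bruteforce(sigma, a, b, W):
    """psi(sigma) by explicitly summing over all 2^N_h hidden configurations."""
    sigma = np.asarray(sigma)
    total = 0j
    for h in itertools.product([-1, 1], repeat=b.size):
        h = np.array(h)
        total += np.exp(a @ sigma + b @ h + sigma @ W @ h)
    return total

rng = np.random.default_rng(0)
L, N_h = 4, 3
# Small random complex parameters (illustrative only, not trained values)
a = 0.1 * (rng.normal(size=L) + 1j * rng.normal(size=L))
b = 0.1 * (rng.normal(size=N_h) + 1j * rng.normal(size=N_h))
W = 0.1 * (rng.normal(size=(L, N_h)) + 1j * rng.normal(size=(L, N_h)))

# Unnormalized state vector |Psi> = sum_sigma psi(sigma)|sigma> over all 2^L configurations
configs = list(itertools.product([-1, 1], repeat=L))
psi = np.array([rbm_amplitude(s, a, b, W) for s in configs])
psi_bf = np.array([rbm_amplitude_bruteforce(s, a, b, W) for s in configs])
print(np.allclose(psi, psi_bf))
```

The closed form costs O(L·Nh) operations per amplitude, whereas the explicit hidden-unit sum grows as 2^Nh. This is why the traced-out form is the practical one.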

While the RBM can be used to find ground states by reinforcement learning [6], its

applicability to low-lying excited states of various quantum many-body systems and the

development of efficient optimization methods for such states remain less thoroughly explored. Here we introduce

an energy-shift method which shifts the energies of the original Hamiltonian by adding

projection operators to make a target excited state the ground state of the new Hamiltonian.

Let |Ψj⟩ denote the j-th excited state of a given Hamiltonian Ĥ0 for a system containing

L spins, where j = 0, 1, . . . , 2^L − 1 and j = 0 corresponds to the ground state. To calculate

|Ψj⟩, the idea is to first calculate all lower-lying eigenstates |Ψl⟩ (0 ≤ l ≤ j − 1) and then

modify Ĥ0 by adding projection operators that lift the energies of all |Ψl⟩:

$$
\hat{H}_j = \hat{H}_0 + \sum_{l=0}^{j-1} E^{(\mathrm{shift})}_l \frac{|\Psi_l\rangle\langle\Psi_l|}{\langle\Psi_l|\Psi_l\rangle}. \qquad (3.2)
$$

This treatment is analogous to the quadratic penalty method for optimization problems

with equality constraints [98]. It leads to the convergence of the RBM to |Ψj⟩ in the

effective imaginary-time evolution [6], provided that each E_l^(shift) is larger than the energy

difference between |Ψj⟩ and |Ψl⟩. These projection operators can be efficiently implemented

(Appendix B.1), so, in principle, at least the low-lying excited states can be calculated successively.
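The shift condition can be seen directly in the eigenbasis of Ĥ0. As an idealization, assume that the lower states |Ψl⟩ are exact, mutually orthogonal eigenstates of Ĥ0 with energies El. Assume also that Ej < Ej+1. Each projector in Eq. (3.2) then acts only on its own state, so

$$
\hat{H}_j|\Psi_m\rangle =
\begin{cases}
\big(E_m + E^{(\mathrm{shift})}_m\big)\,|\Psi_m\rangle, & m < j,\\
E_m\,|\Psi_m\rangle, & m \ge j,
\end{cases}
\qquad\Longrightarrow\qquad
|\Psi_j\rangle \text{ is the unique ground state of } \hat{H}_j
\iff E^{(\mathrm{shift})}_l > E_j - E_l \ \ \text{for all } 0 \le l < j .
$$

When the |Ψl⟩ are only approximate RBM states, this argument holds only approximately.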

This method is also applicable to resolving nearly degenerate ground states. However, for systems

with exact ground-state degeneracy, such as spin glasses [99] and topologically ordered

systems [100, 101], two questions deserve further investigation: how to improve the accuracy in the presence of the sign problem [102],

and whether RBMs can properly diagonalize the degenerate subspace.
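The successive construction of Eq. (3.2) can also be checked numerically on a small system. In the minimal sketch below, exact diagonalization replaces the RBM optimization, an assumption made purely for illustration. A short-range open transverse-field Ising chain with hypothetical couplings J = 1 and h = 0.7 stands in for the long-range models studied in this chapter. Each new state is taken as the ground state of Ĥj. A single shift is chosen larger than the spectral width, so that it exceeds every gap Ej − El. All function names are hypothetical.

```python
import numpy as np
from functools import reduce

# Pauli matrices and the single-site identity
SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def site_op(op, site, L):
    """Embed a single-site operator at `site` in the 2^L-dimensional Hilbert space."""
    return reduce(np.kron, [op if j == site else I2 for j in range(L)])

def tfim_hamiltonian(L, J=1.0, h=0.7):
    """Hypothetical stand-in model: open transverse-field Ising chain
    H0 = -J sum_j Z_j Z_{j+1} - h sum_j X_j (not a long-range model)."""
    H = np.zeros((2**L, 2**L))
    for j in range(L - 1):
        H -= J * site_op(SZ, j, L) @ site_op(SZ, j + 1, L)
    for j in range(L):
        H -= h * site_op(SX, j, L)
    return H

def energy_shift_states(H0, n_states, e_shift):
    """Find the n_states lowest eigenstates one at a time: each new state is the
    ground state of H0 plus shifted projectors onto all previously found states."""
    found = []
    for _ in range(n_states):
        Hj = H0.copy()
        for psi in found:
            Hj += e_shift * np.outer(psi, psi.conj()) / np.vdot(psi, psi)
        _, vecs = np.linalg.eigh(Hj)  # exact diagonalization stands in for RBM training
        found.append(vecs[:, 0])
    return found

L = 6
H0 = tfim_hamiltonian(L)
exact = np.linalg.eigvalsh(H0)
# Any shift larger than the spectral width exceeds every gap E_j - E_l
e_shift = exact[-1] - exact[0] + 1.0
states = energy_shift_states(H0, 4, e_shift)
energies = [np.vdot(s, H0 @ s).real / np.vdot(s, s).real for s in states]
print(np.allclose(energies, exact[:4]))
```

Because each state is an exact eigenvector here, the recovered energies coincide with the lowest exact eigenvalues. With variational RBM states, errors in the lower states would be expected to carry over into the shifted Hamiltonians for higher states.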

For translationally invariant systems, symmetry information can be exploited to

calculate the lowest-energy eigenstate in each momentum sector [34]. In this RBM variant,

the original RBM wave-function ansatz (Eq. (3.1)) applies only to canonical spin configurations. The amplitudes of other configurations are constructed by mapping

them onto canonical configurations, so that the whole state satisfies translational symmetry

constraints. Specifically, in D = 1, let T̂ denote the unit-distance translation operator

for spin configurations along the cha