ABSTRACT

Title of Dissertation: AGE OF INCORRECT INFORMATION: A NEW PERFORMANCE METRIC IN SEMANTIC COMMUNICATIONS

Yutao Chen, Doctor of Philosophy, 2023

Dissertation Directed by: Professor Anthony Ephremides, Department of Electrical and Computer Engineering

With the increasing popularity of smart devices and the rapid development of networking and communication technologies, cyber-physical system applications have been widely deployed and are receiving increasing attention. Some examples of these systems include vehicle networks, where vehicles collect real-time external information through their on-board sensors and cameras to generate a reliable description of the surroundings; intelligent transportation systems, where real-time monitoring of road conditions and traffic congestion is essential; and natural or man-made disaster prevention and management, where real-time monitoring of warning signs and disaster propagation is crucial. A common feature of these systems is the stringent requirement on the timeliness of the acquired information, which has led to the development of optimization frameworks aimed at capturing information freshness. Age of Information (AoI) is a prime example, but it has the drawback of considering only information freshness while ignoring the importance of content. As a result, the Age of Incorrect Information (AoII) has been developed to capture both the freshness and the content of information. In this dissertation, we study the fundamental nature and optimization of AoII in a variety of systems.

With the proliferation of smart devices, energy consumption has become a major concern. In the first part, we focus on the characteristics and performance of AoII under limited resources. In particular, we propose an efficient algorithm to obtain the AoII-optimal policy under resource-constrained conditions and compute the performance of the optimal policies. The massive connectivity of modern communication systems has made scheduling a hot research topic.
In the second part, we analyze and optimize the performance of AoII in the scheduling problem. We present Whittle's index policy for AoII, whose superior performance has been recognized in many other problems. However, it also has limitations. Therefore, we propose a new scheduling policy, the indexed priority policy, which has performance comparable to Whittle's index policy but broader applicability. With the unprecedented increase in the amount and variety of data to be transmitted and the impact of external factors such as urban construction, data transmission will experience numerous uncertainties. Therefore, in the third part, we study the characteristics and optimization of AoII in an environment with random delays. Specifically, in the first half, we consider the case where the communication channel suffers from a random delay. In the second half, we build on the first half and consider the case where the transmitter has preemption capability. For both halves, we precisely compute the performance of some canonical policies and theoretically characterize the optimal policies, which lays the foundation for further generalization and application of AoII.

AGE OF INCORRECT INFORMATION: A NEW PERFORMANCE METRIC IN SEMANTIC COMMUNICATIONS

by Yutao Chen

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2023

Advisory Committee:
Professor Anthony Ephremides, Chair/Advisor
Professor Sennur Ulukus
Professor Prakash Narayan
Professor Behtash Babadi
Professor Ashok Agrawala

© Copyright by Yutao Chen 2023

Dedication

This dissertation is dedicated to my family for their unwavering and boundless support.

Acknowledgments

Foremost, I would like to thank my advisor, Dr. Ephremides, whose support and guidance have been the cornerstone of my academic journey.
As a beacon of scholarly excellence, he has exemplified the traits of a good researcher and has been a constant guide and role model for my future endeavors. I still remember how nervous I was when I first met him, and how overwhelmed I was to become the student of such a prestigious scholar. However, his amiable personality dispelled my worries and strengthened my resolve to emulate him in the future. I am always grateful for his mentorship, which has left an indelible mark on my life.

I am also grateful to all the scholars who have helped me over the years. Whether our paths have crossed or not, their advice is one of the things that motivates me to work hard and move forward. I also thank all the members of my defense committee, whose constructive feedback has helped me refine my dissertation.

Next, I must express my heartfelt appreciation to my family, despite the distance that separates us. Without their support and understanding, it would have been difficult for me to pursue my personal goals with determination. Their resolute backing of every decision I made, even when the path was unclear, has been a vital source of strength and determination. Once again, I would like to express my highest respect and gratitude to my family.

Finally, I would like to thank my friends, both in my home country and here. They have been invaluable companions on this journey of life and have helped me in countless ways, both academically and personally. They have shared in my joys and sorrows, and their unwavering camaraderie is an asset that I cherish. To them, I offer my most profound thanks and deep respect.

I would be remiss if I failed to acknowledge the mysterious force of luck that has been a fortuitous presence in my life. Its timely intervention during critical moments has been a crucial element in shaping who I am today.
To luck, I extend my deepest appreciation, and I sincerely hope that its influence will continue to grace my life.

Table of Contents

Dedication
Acknowledgments
Table of Contents
List of Tables
List of Figures

Chapter 1: Introduction
    1.1 Background and Motivation
    1.2 Semantic Communication
        1.2.1 Information Freshness
        1.2.2 Age of Incorrect Information
    1.3 Research Trends
    1.4 Outline and Contribution

Chapter 2: Freshness under Limited Resources
    2.1 Overview
    2.2 System Overview
        2.2.1 System Model
        2.2.2 System Dynamic
        2.2.3 Problem Formulation
    2.3 Problem Optimization
        2.3.1 Lagrangian Approach
        2.3.2 Structural Results
        2.3.3 Finite-State MDP Approximation
        2.3.4 Expected Transmission Rate
        2.3.5 Optimal Policy
    2.4 Numerical Results
    2.5 Conclusion

Chapter 3: Scheduling for Freshness
    3.1 Overview
    3.2 System Overview
        3.2.1 System Model
        3.2.2 System Dynamic
        3.2.3 Problem Formulation
    3.3 Structural Properties of the Optimal Policy
    3.4 Whittle's Index Policy
        3.4.1 Relaxed Problem
        3.4.2 Decoupled Model
        3.4.3 Indexability
        3.4.4 Whittle's Index Policy
    3.5 Optimal Policy for Relaxed Problem
        3.5.1 Optimal Policy for Single User
        3.5.2 Optimal Policy for RP
    3.6 Indexed Priority Policy
        3.6.1 Primal-Dual Heuristic
        3.6.2 Indexed Priority Policy
    3.7 Numerical Results
    3.8 Conclusions

Chapter 4: Freshness against Generic Delay
    4.1 Overview
    4.2 System Overview
        4.2.1 System Model
        4.2.2 System Dynamic
        4.2.3 Problem Formulation
    4.3 MDP Characterization
        4.3.1 State Transition Probability
    4.4 Policy Performance Analysis
        4.4.1 Multi-step State Transition Probability
        4.4.2 Stationary Distribution
        4.4.3 Expected AoII
    4.5 Optimal Policy
        4.5.1 Existence of the Optimal Policy
        4.5.2 Value Iteration Algorithm
        4.5.3 Policy Iteration Algorithm
        4.5.4 Simplifying the Bellman Equation
        4.5.5 Optimality Proof
    4.6 Numerical Results
        4.6.1 Verification of Condition 1
        4.6.2 Performance of the Optimal Policy
    4.7 Conclusion

Chapter 5: Freshness with Preemption
    5.1 Overview
    5.2 System Overview
        5.2.1 System Model
        5.2.2 Problem Formulation
    5.3 MDP Characterization
    5.4 Strong Preemptive Policy
    5.5 Optimal Policy
        5.5.1 Existence of the Optimal Policy
        5.5.2 Value Iteration Algorithm
        5.5.3 Policy Iteration Algorithm
        5.5.4 Geometric Delay
        5.5.5 Zipf Delay
    5.6 Numerical Results
        5.6.1 Verification of Condition 2
        5.6.2 Performance of the Optimal Policy
    5.7 Conclusion

Chapter 6: Conclusion and Outlook
    6.1 Conclusion
    6.2 Outlook

Appendix A: Freshness under Limited Resources
    A.1 Proof of Lemma 1
    A.2 Proof of Proposition 1
    A.3 Proof of Theorem 1
    A.4 Proof of Proposition 2
    A.5 Proof of Corollary 1
    A.6 Proof of Corollary 2
    A.7 Proof of Theorem 2

Appendix B: Scheduling for Freshness
    B.1 Proof of Lemma 2
    B.2 Proof of Lemma 3
    B.3 Proof of Theorem 3
    B.4 Proof of Corollary 4
    B.5 Proof of Proposition 3
    B.6 Proof of Proposition 4
    B.7 Proof of Proposition 6
    B.8 Proof of Proposition 7
    B.9 Proof of Theorem 4
    B.10 Proof of Proposition 8
    B.11 Proof of Theorem 5
    B.12 Proof of Proposition 10

Appendix C: Freshness against Generic Delay
    C.1 Details of State Transition Probability
    C.2 Proof of Lemma 4
    C.3 Proof of Lemma 5
    C.4 Proof of Theorem 6
    C.5 Proof of Corollary 6
    C.6 Proof of Lemma 6
    C.7 Proof of Theorem 7
    C.8 Proof of Theorem 8
    C.9 Proof of Lemma 8
    C.10 Proof of Theorem 9
    C.11 Proof of Theorem 10
    C.12 Proof of Theorem 11
    C.13 Proof of Theorem 12

Appendix D: Freshness with Preemption
    D.1 Details of State Transition Probability
    D.2 Proof of Lemma 9
    D.3 Proof of Corollary 7
    D.4 Proof of Lemma 11
    D.5 Proof of Theorem 14
    D.6 Proof of Theorem 15
    D.7 Proof of Theorem 17

Bibliography

List of Tables
    2.1 Optimal thresholds for different p
    2.2 Optimal thresholds for different ps
    5.1 Condition 2 Verification

List of Figures
    1.1 An illustration of the status update system.
    1.2 Sample paths of ∆AoII(Xt, X̂t, t). In the figure, Xt and X̂t are the state of the dynamic source and the receiver's estimate at time slot t, respectively. Gi and Di are the sampling time and delivery time of the i-th update, respectively. Note that the sampling decisions in the graph are random. We assume that sampling occurs at the beginning of a time slot and the update arrives at the end of a time slot.
    2.1 Illustrations of the Markovian source.
    2.2 Illustrations of the evolution of d.
    2.3 Illustrations of the AoII-optimal policy and the AoI-optimal policy. The truncation parameter in ASM is m = 800 and the tolerance in the bisection search is ξ = 0.01. RVI converges when the maximum difference between the results of two consecutive iterations is less than ϵ = 0.01.
    3.1 The structure of the system model.
    3.2 Performance when the source processes vary. We choose pi = 0.05 + 0.4(i−1)/(N−1), fi(s) = s, γi = 0.6, p0e,i = p0e, and p1e,i = 0.1 for 1 ≤ i ≤ N.
    3.3 Performance when the communication goals vary. We choose fi(s) = s^(0.5 + (i−1)/(N−1)), pi = 0.3, γi = 0.6, p0e,i = p0e, and p1e,i = 0.1 for 1 ≤ i ≤ N.
    3.4 Performance in systems with random parameters when N = 5. The parameters for each user are chosen randomly within the following intervals: γ ∈ [0, 1], p ∈ [0.05, 0.45], p0e ∈ I, p1e ∈ [0, 0.45], and f(s) = s^τ where τ ∈ [0.5, 1.5].
    4.1 An illustration of the system model, where Xk and X̂k are the state of the dynamic source and the receiver's estimate at time slot k, respectively.
    4.2 Illustrations of the expected AoII as a function of p and τ. We set the upper limit of the transmission time tmax = 5 and the success probability in the Geometric distribution ps = 0.7.
    4.3 Illustrations of the expected AoII as a function of ps and τ. We set the upper limit of the transmission time tmax = 5 and the source dynamic p = 0.35.
    4.4 Illustrations of the expected AoII as a function of tmax and τ. We set the success probability in the Geometric distribution ps = 0.7 and the source dynamic p = 0.35.
    5.1 An illustration of the system model.
    5.2 A visual representation of the numerical check of Condition 2 when the transmission delay follows the Zipf distribution with a = 2.25, under different tmax and p. In the figure, a check mark indicates that Condition 2 is verified, and a cross indicates that it is not.
    5.3 The performance comparison when the transmission delay follows the Geometric distribution. In this case, there are two system parameters: the Markovian source dynamics p and the success probability ps in the Geometric distribution. Therefore, we fix one of the parameters and plot the corresponding results when the other parameter varies.
    5.4 The performance comparison when the transmission delay follows the Zipf distribution. In this case, there are three system parameters: the Markovian source dynamics p, the constant a in the Zipf distribution, and the upper bound on the transmission time tmax. We fix two of these parameters in the calculations, then vary the remaining one and plot the corresponding results.
    B.1 Illustration of the DTMC induced by the threshold policy n = (n0, n1). In the figure, c1 = (1 − γ)(1 − p) + γα and c2 = (1 − γ)β + γα.

Chapter 1: Introduction

1.1 Background and Motivation

With the development of communication technology, communication systems have become ubiquitous in all aspects of our daily lives, and are no longer used only for transmitting text, voice, and images [1–4].
This pervasiveness of communication systems and their expanding purposes require us to set higher performance requirements. However, as the requirements increase, the traditional optimization frameworks appear increasingly ineffectual. To illustrate, let us consider the status update system depicted in Figure 1.1. In this system, the dispatcher controls the transmission of updates over a communication channel to a remote destination to ensure the destination has the best possible real-time knowledge of the dynamic source.

Figure 1.1: An illustration of the status update system (Dynamic Source, Dispatcher, Channel, Destination).

The performance of the system depends on the chosen optimization framework. In traditional optimization frameworks, we choose to optimize metrics such as throughput and delay. However, these metrics are objective-blind; that is, they do not take into account the communication goals of the system. They assume that optimizing these metrics will always result in the best system performance. Unfortunately, using these traditional metrics can in fact result in reduced performance and wasted resources. To see this concretely, we again consider a time-sensitive application with the system model given by Figure 1.1, where the dispatcher is now an elementary queue and the dynamic source submits data at an adjustable rate. We note that the most critical task in time-sensitive applications is to guarantee the timeliness of the information at the receiving end. The most common optimization framework for throughput is to maximize it. To do so, we need to increase the data generation rate to the point where system resources are fully utilized. Increasing the data generation rate this way may sound appealing because resources are well spent.
However, this can cause problems in time-sensitive applications because a higher data generation rate can cause data to wait too long in the queue, which significantly ages the data. So blindly maximizing the throughput is not the best option. At the same time, delay causes the same dissatisfaction for the opposite reason. In delay minimization frameworks, we minimize the time it takes for data to arrive at its destination from the start of transmission. This is usually achieved by carefully adjusting the data generation rate so that the data spends the minimum amount of time in the queue. However, delays are incurred only when data is actually transmitted. Therefore, to reduce the average delay, the system can simply skip transmissions whose expected delay is too large. Reducing the delay by skipping transmissions results in too little data being transmitted, which harms the timeliness of the information at the receiving end.

One of the reasons these optimization frameworks fall short in time-sensitive applications is that these metrics treat all transmitted updates equally, ignoring the fact that not all updates are equally important to the receiver for the communication purpose. For example, in time-sensitive applications, an update after a long idle period is more important than an update after a period of sustained transmissions. Hence, adopting traditional metrics alone is not enough. To optimize communication performance, it is critical to consider other factors, such as the freshness and content of the information being transmitted. By taking these factors into account, we can achieve a more efficient communication system that better meets the specific communication purposes of the data exchange.

1.2 Semantic Communication

Awareness of these problems and the importance of solving them has led to new ways of designing and evaluating communication networks. One example is semantic communication.
According to [5], semantic communication is defined as "the provisioning of the right and significant piece of information to the right point of computation (or actuation) at the right time". However, traditional network optimization frameworks do not incorporate the semantics of information, which can be highly inefficient given the unprecedented growth in data exchange. Therefore, the first step in addressing this problem is to define the semantics of information. In [5], the authors argue that the semantics of information is "defined not necessarily as the meaning of the messages, but as their significance, possibly within a real-time constraint, relative to the purpose of the data exchange". Hence, in semantic communications, updates are treated differently based on their significance relative to the communication purpose. To achieve this vision, unlike the metrics considered in conventional optimization frameworks, the metrics in semantic communications should incorporate one or more semantic measures. The paper outlines three examples of semantic measures. The first is Freshness, which captures the timeliness of the information; this measure is crucial in applications that require real-time information, such as vehicular networks. The second is Relevance, which measures the change in a process since the last sample; this measure can alter the sampling strategies for remote estimation or tracking with low-power devices. The third is Value, which evaluates the benefit of having a particular sample compared to the cost of transmitting it; this measure is particularly applicable to control networks, where incorporating the value of information can help achieve the same level of performance with reduced data traffic. Overall, semantic communication marks a departure from traditional approaches to processing and transmitting data by incorporating the semantics of information.
1.2.1 Information Freshness

One of the most successful efforts to incorporate the semantics of information is the introduction of the Age of Information (AoI) in [6]. AoI captures the freshness of information by tracking the time elapsed since the generation of the last received update. Let V(t) be the most recent generation time among the updates received up to time t. Then, the AoI at time t is defined by

∆AoI(t) = t − V(t).

The age increases when no new updates are received, and the delivery of a new update can bring the age down. Therefore, AoI is small when the receiver has relatively timely information. Note that, under AoI, not all updates are equal. For example, when an update can significantly reduce the age at the receiver side, it becomes important and worth the extra resources to transmit. Since its introduction, AoI has attracted extensive attention and research [7–9]. In general, AoI analysis can be divided into the following cases.

• Time-average age: The average AoI over a time period T can be calculated by

⟨∆⟩_T = (1/T) ∫_0^T ∆AoI(t) dt.

In this case, there are two commonly used analysis tools. The first is a graphical method that divides the zigzag age dynamics into several trapezoids. The other leverages the notion of a stochastic hybrid system (SHS), a mathematical framework used to model complex systems that exhibit both continuous and discrete behavior.

• Peak age [10]: The peak age A_n is the age at the instant just before the delivery of the n-th update. The average peak age is defined by

∆^(p) = lim_{T→∞} (1/N(T)) Σ_{n=1}^{N(T)} A_n,

where N(T) is the number of peaks in the time period T. Peak age is more tractable than time-average AoI.

• Functions of age [11]: The age penalty function is denoted by p(∆AoI(t)), where p : [0, ∞) → R is non-decreasing. One can specify the age penalty function based on the application.
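These definitions can be made concrete in a short numerical sketch (not taken from the dissertation; the sample path below, with generation times G and delivery times D of four updates, is an arbitrary assumption). It samples ∆AoI(t) = t − V(t) on a grid and computes the time-average and peak ages:

```python
# A minimal sketch of the AoI quantities defined above.
# The sample path (G, D) is an assumed toy example.

def aoi_trajectory(G, D, T, dt=0.01):
    """Sample ∆AoI(t) = t − V(t), where V(t) is the largest generation
    time among updates delivered by time t (0 before the first delivery,
    i.e., the age is measured from t = 0)."""
    times, ages = [], []
    steps = round(T / dt) + 1
    for i in range(steps):
        t = i * dt
        delivered = [g for g, d in zip(G, D) if d <= t]
        V = max(delivered) if delivered else 0.0
        times.append(t)
        ages.append(t - V)
    return times, ages

def time_average_age(ages):
    # Riemann-sum approximation of (1/T) ∫ ∆AoI(t) dt on the uniform grid
    return sum(ages) / len(ages)

def peak_ages(G, D):
    # A_n: the age at the instant just before the n-th delivery
    peaks = []
    for d in D:
        before = [g for g, dd in zip(G, D) if dd < d]
        V = max(before) if before else 0.0
        peaks.append(d - V)
    return peaks

G = [0.0, 2.0, 5.0, 7.0]   # generation times of updates 1..4
D = [1.0, 4.0, 6.0, 9.0]   # delivery times (FCFS, so D[i] > G[i])
_, ages = aoi_trajectory(G, D, T=10.0)
print(time_average_age(ages))   # time-average AoI over [0, 10]
print(peak_ages(G, D))          # [1.0, 4.0, 4.0, 4.0]
```

Note how each peak equals the system time of the previous update plus the inter-delivery gap, which is exactly the quantity the trapezoid-based graphical method integrates.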
Equipped with the above analysis approaches, the majority of the work on AoI can be roughly divided into the following categories.

• AoI in queuing systems [6, 10, 12–17]: AoI is studied in a system with one or more queues, where sources submit updates according to stochastic processes.

• AoI under resource constraints [18–22]: AoI is examined in systems where the capability of the transmitter is constrained by external factors such as limited or unstable power supplies. Hence, the transmitter should choose wisely how to utilize the limited resources to maximize the freshness of the information.

• AoI in remote estimation [11, 23–25]: The core task of remote estimation is to reconstruct the real-time information of the transmitting end at the receiving end. Consequently, the timeliness of the information is important, and the relationship between estimation error and AoI has become a hot research direction.

• AoI in wireless networks [26–30]: AoI is studied in systems where the data is distributed over wireless networks. The literature mainly focuses on AoI analysis and optimization under conventional protocols. At the same time, age-based protocols have also been proposed and analyzed [31–35].

Although AoI captures well the timeliness of information, its limitation is that it ignores the information content of the transmitted updates. Therefore, it falls short in the context of remote estimation. For example, suppose we want to estimate a rapidly changing process remotely. A small AoI does not necessarily mean that the receiver has accurate information about the process. Likewise, when the process changes slowly, the receiver may not need very timely information to estimate it relatively accurately. It is shown in [23] that optimizing the AoI does not necessarily lead to the minimum estimation error. Thus, considering only the freshness of the information may lead to suboptimal performance in some applications.
At the same time, we recall that error-based metrics, such as real-time error, are typical for remote estimation tasks. Consequently, researchers have sought to establish a more sophisticated framework that combines freshness and error-based metrics, which brings us to the Age of Incorrect Information (AoII) introduced in [8].

1.2.2 Age of Incorrect Information

We consider a basic transmitter-receiver pair in a slotted-time system, where the transmitter observes a dynamic source and transmits updates over a communication channel to the remote receiver. In light of the shortcomings discussed in the previous subsection, a more sophisticated performance metric should capture the following two aspects of the system.

The first aspect is the purpose of the update transmission. Let us take remote estimation as an example for a more intuitive understanding. Generally, in remote estimation problems, the purpose of the update transmission is to provide the receiver with enough information to reconstruct the state of the remote source in real time. Thus, as long as the receiver has enough information for its reconstruction, we can consider the purpose of the update transmission achieved, and no further penalty needs to be paid. We note that AoI does not always capture this. For example, suppose we want to monitor the water level in a reservoir. The water level changes slowly, so old information may still correctly describe it, and we need not waste resources keeping the information at the receiver as fresh as possible. For a more formal elaboration, let Xt and X̂t be the information content of the dynamic source and of the receiver at time slot t, respectively. Then, we define the information penalty function as g(Xt, X̂t), where g : S × S′ → [0, +∞), and S and S′ are the state spaces of Xt and X̂t, respectively.
Then, we can choose g(Xt, X̂t) so that it quantifies the information inadequacy on the receiver side. Note that we define g(Xt, X̂t) ≜ 0 as the case where the receiver has enough information to achieve the communication purpose. In practice, we can choose different information penalty functions according to the sensitivity of different systems to information inadequacy. For example, g(Xt, X̂t) = |Xt − X̂t|, g(Xt, X̂t) = (Xt − X̂t)², g(Xt, X̂t) = 1{|Xt − X̂t| > c}, and so on. The second aspect is the impact of inadequate information on system performance over time. We notice that, for many real-world applications, the impact of information inadequacy on system performance accumulates over time. In other words, an old, persistent inadequacy may have more severe consequences than a new, short-term one. However, this is rarely captured by traditional performance metrics. For example, when we use real-time error as the performance metric, we focus on the estimation error of the moment and ignore its history, which can lead to huge costs because small errors that persist for a long time can cause as much damage to the system as a short-lived significant error. A real-world example of this is machine temperature monitoring, where the damage caused by overheating accumulates rapidly over time. Therefore, we introduce the time penalty function to capture the time effect. To this end, we first define Ut = max{k ≤ t : g(Xk, X̂k) = 0}. Hence, t − Ut captures the time elapsed since the last time the receiver had enough information to achieve the communication purpose. Because of the different definitions of Ut and Vt, there is a fundamental difference between AoI and t − Ut. Recall that AoI will continue to increase when no new update is received, and the successful delivery of a new update can bring down AoI. However, this is not true for t − Ut.
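The example penalty functions above are easy to write down concretely. A minimal Python sketch (the threshold c is an arbitrary illustrative value):

```python
# Example information penalty functions g(X_t, X_hat_t) from the text.
def g_abs(x, x_hat):
    """Absolute error penalty: |X_t - X_hat_t|."""
    return abs(x - x_hat)

def g_sq(x, x_hat):
    """Squared error penalty: (X_t - X_hat_t)^2."""
    return (x - x_hat) ** 2

def g_thresh(x, x_hat, c=1):
    """Indicator penalty: 1{|X_t - X_hat_t| > c}."""
    return 1 if abs(x - x_hat) > c else 0

# All three vanish exactly when the receiver's estimate is adequate.
print(g_abs(3, 3), g_sq(3, 3), g_thresh(3, 3))   # 0 0 0
print(g_abs(5, 2), g_sq(5, 2), g_thresh(5, 2))   # 3 9 1
```

Any of these plugs into the AoII definition below; the choice only determines how harshly a given mismatch is penalized.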
Note that the old information may still convey enough information for the receiver to accomplish the communication purpose, and the new update may still be insufficient due to the communication delay. Hence, the delivery of updates does not necessarily lead to a decrease or increase in the value of t − Ut. Then, on top of Ut, we define the time penalty function as f(t − Ut), where f(·) : [0, +∞) → [0, +∞) can be any non-decreasing function. We choose f(t − Ut) so that it captures the aging process of the inadequate information. In an application, we are free to choose which time penalty function to adopt based on the characteristics of the system under consideration. For example, f(t) = t², f(t) = log(1 + t), f(t) = eᵗ, and so on. Then, combined with the information penalty function introduced earlier, we are ready to present the formal definition of AoII. In a slotted-time system, the AoII is defined by

∆AoII(Xt, X̂t, t) = Σ_{k=Ut+1}^{t} g(Xk, X̂k) F(k − Ut),

where F(t) ≜ f(t) − f(t − 1) captures the increment of the time penalty. We notice that AoII increases if the receiver does not have enough information to achieve the communication purpose, and the increase is amplified by the degree of inadequacy. Hence, AoII captures both the inadequacy of the information and the aging process of the inadequate information. To better understand the metric, we consider the case where we want to estimate a two-state dynamic source remotely. Hence, Xt ∈ {0, 1} and X̂t ∈ {0, 1}. Moreover, we choose g(Xt, X̂t) = |Xt − X̂t| and f(t) = t. Then, AoII simplifies to ∆AoII(Xt, X̂t, t) = t − Ut. We note that for this particular choice of penalty functions, AoII is simply the time elapsed since the last time the receiver's estimate was correct. A sample path of ∆AoII(Xt, X̂t, t) in this case is depicted in Figure 1.2a. We also consider a more sophisticated setting, where the source process has eight states. More precisely, Xt ∈ {0, 1, ..., 7} and X̂t ∈ {0, 1, ..., 7}.
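The two-state simplification above can be checked numerically against the general definition. A minimal sketch (the sample trajectory is an arbitrary illustration, and slot 0 is assumed to be in sync so that Ut is well defined):

```python
def aoii(xs, xhats, g, f):
    """AoII at the last slot of trajectories xs, xhats, per the definition in
    the text. Slots are indexed 1..t; F(k) = f(k) - f(k-1)."""
    t = len(xs)
    # U_t: last slot k <= t with zero information penalty (slot 0 assumed in sync).
    U = max([0] + [k for k in range(1, t + 1) if g(xs[k - 1], xhats[k - 1]) == 0])
    F = lambda k: f(k) - f(k - 1)
    return sum(g(xs[k - 1], xhats[k - 1]) * F(k - U) for k in range(U + 1, t + 1))

# Two-state source with g = |x - xhat| and f(t) = t: AoII reduces to t - U_t.
xs    = [1, 0, 1, 0, 0, 1, 1]
xhats = [1, 1, 1, 1, 0, 0, 0]
g = lambda x, y: abs(x - y)
delta = aoii(xs, xhats, g, f=lambda t: t)
print(delta)  # 2: the estimate was last correct at slot 5, so AoII = 7 - 5 = 2
```

Swapping in f(t) = t² on the same trajectory gives 1·F(1) + 1·F(2) = 1 + 3 = 4, illustrating how the time penalty amplifies persistent mismatch.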
Meanwhile, we choose f(t) = t² and g(Xt, X̂t) = 1{|Xt − X̂t| ≥ 2}. A sample path of AoII with this setting is shown in Figure 1.2b. Unlike traditional metrics such as throughput and delay, AoII treats each update differently depending on how well the information it carries can achieve the communication purpose and on the current AoII of the system. For example, consider the case where an update carries enough information about the source and the current AoII of the system is high. In this case, it is worth spending extra resources to ensure the timely and accurate delivery of the update to the receiving end. At the same time, AoII also differs from AoI in that AoI ignores the information content of the transmitted updates. In AoII, the information content carried by the update strongly influences the evolution of AoII.

1.3 Research Trends

In [36], the authors consider a transmitter-receiver pair in a slotted-time system. In the setting, the updates are transmitted over an unreliable channel. The minimization of AoII for the transmitter with and without power constraints is studied. The power constraints limit the average number of transmissions a transmitter can initiate. When the transmitter is power-constrained, the results show that the optimal policy is a mixture of two threshold policies, where a threshold policy is one under which the transmitter transmits the update only when the AoII exceeds the threshold. Then, the authors extend the results to the case of the generic time penalty function in [37]. Similar results are obtained in this case. Moreover, three real-life applications are studied to highlight the performance advantages of AoII over AoI and real-time estimation error.

[Figure 1.2a: sample path of ∆AoII for a two-state source with g(Xt, X̂t) = |Xt − X̂t| and f(t) = t.]
[Figure 1.2b: sample path of ∆AoII for an eight-state source with g(Xt, X̂t) = 1{|Xt − X̂t| ≥ 2} and f(t) = t².]

Figure 1.2: Sample paths of ∆AoII(Xt, X̂t, t). In the figure, Xt and X̂t are the state of the dynamic source and the receiver's estimate at time slot t, respectively. Gi and Di are the sampling time and delivery time of the i-th update, respectively. Note that the sampling decisions in the graph are random. We assume that the sampling occurs at the beginning of a time slot and the update arrives at the end of a time slot.

The authors of [38] consider a problem where we need to monitor symmetric binary sources over a communication channel with random packet delivery times. More precisely, the source decides whether or not to sample at the beginning of each time slot. The sample enters the channel and, after a delay of a random number of time slots, is received error-free at the monitor. The authors consider three performance metrics: AoII, AoI, and real-time error. They formulate the optimal sampling problem as a Markov decision process (MDP) and apply a simplified relative value iteration (RVI) algorithm to obtain the optimal policy for each metric. Furthermore, the performance of the obtained optimal policy is compared, through simulation results, with two canonical sampling policies: the sample-at-change policy, which samples the source whenever the source changes state, and the zero-wait policy, which samples the source whenever possible. The numerical results indicate that, for various delay distributions and AoII penalty functions, the optimal policies for the real-time error and AoII coincide with the sample-at-change policy. In contrast, the optimal policy for AoI is a threshold policy.
In most cases, the threshold is zero, which is equivalent to the zero-wait policy. The authors of [39] study the AoII minimization problem in the context of scheduling. They consider a system where a central scheduler monitors multiple Markovian sources and needs to update multiple users. The central scheduler can only update a subset of the users in each time slot. Hence, the central scheduler must choose which users to update to minimize the AoII. A fundamental assumption made in the paper is that the central scheduler cannot know the states of the sources prior to transmission. In this case, the central scheduler cannot know the exact value of AoII in each time slot. To overcome this problem, the authors introduce the belief value, which can be interpreted as the probability that the state at the receiver side is correct. Then, the Whittle's index policy is developed, and its performance is compared with the Whittle's index policy developed when AoI is adopted. In the real world, we usually do not know the statistical model of the dynamic process we want to observe, or we do not know the parameters of the statistical model. Minimizing AoII in this case is usually difficult, but very important. Therefore, the authors of [40] consider the problem of minimizing AoII without knowing the parameters of a Markovian process. The relationship between the estimation error and AoII is studied in [41]. Moreover, a variant of AoII, the Age of Incorrect Estimates (AoIE), is proposed in [42]. AoIE is defined as the product of an estimation error and the time elapsed since the last time at which the receiver had a sufficiently correct estimate. In the paper, the authors consider a slotted-time system where a source generates status updates about an auto-regressive Markov process and sends the updates to a remote receiver.
Then, the receiver makes estimates of the realization of the process based on the previously received updates using the least-squares estimation strategy. The paper considers the optimal sampling problem, and the results show that the optimal policy is a threshold-type policy. Another variant called Age of Outdated Information (Ao2I) is introduced in [43]. In this paper, Ao2I quantifies the elapsed time between the present and the first time when the stored information at the destination becomes outdated compared to its source. Then, Ao2I is studied in the context of scheduling, and the authors derive a theoretical lower bound for the minimum Ao2I and propose a low-complexity online scheduler.

1.4 Outline and Contribution

In Chapter 2, we investigate the problem of minimizing AoII under certain constraints. Specifically, we consider the AoII in a system where a transmitter sends updates about a multi-state Markovian source to a remote receiver over an unreliable channel. The communication goal is to minimize AoII subject to a power constraint. We cast the problem into a constrained Markov decision process (CMDP) and prove that the optimal policy is a combination of two deterministic threshold policies. Afterward, by leveraging the notion of RVI and the structural properties of threshold policies, we propose an efficient algorithm to find the threshold policies as well as the mixing coefficient. In Chapter 3, we investigate the properties and optimization of AoII in scheduling problems. More precisely, we study a slotted-time system where a base station needs to update multiple users simultaneously. Due to the limited resources, only part of the users can be updated in each time slot. We consider the problem of minimizing AoII when imperfect channel state information (CSI) is available. Leveraging the notion of MDP, we obtain the structural properties of the optimal policy.
By introducing a relaxed version of the original problem, we develop the Whittle's index policy under a simple condition. However, indexability is required to ensure the existence of the Whittle's index. To circumvent the indexability requirement, we develop a new scheduling policy called the indexed priority policy based on the optimal policy for the relaxed problem. In Chapter 4 and Chapter 5, we investigate the effect of delay on the performance and optimization of AoII. Specifically, we consider a transmitter-receiver pair in a slotted-time system. The transmitter observes a dynamic source and sends updates to a remote receiver over an error-free communication channel that suffers a random delay. The receiver estimates the state of the dynamic source using the received updates. In Chapter 4, we assume that when the channel is busy, the transmitter can do nothing but wait for the channel to become idle. Then, we investigate the problem of optimizing the transmitter's action in each time slot to minimize AoII. We first characterize the optimization problem using the MDP and investigate the performance of the threshold policy, under which the transmitter transmits updates only when the transmission is allowed and the AoII exceeds the threshold τ. By delving into the characteristics of the system evolution, we precisely compute the expected AoII achieved by the threshold policy using the Markov chain. Then, we prove that the optimal policy exists and provide a computable RVI algorithm to estimate the optimal policy. Furthermore, by leveraging the policy improvement theorem, we theoretically prove that, under an easily verifiable condition, the optimal policy is the threshold policy with τ = 1. In Chapter 5, we consider a similar problem, but the transmitter has the ability to preempt. In this scenario, we prove the existence of the optimal policy and provide a feasible value iteration algorithm to approximate the optimal policy.
We also analyze the system characteristics under two canonical delay distributions and theoretically obtain the corresponding optimal policies using the policy improvement theorem.

Chapter 2: Freshness under Limited Resources

2.1 Overview

With the massive deployment of communication devices, the energy consumption of these devices has become a critical design consideration. Typically, these devices can only access limited resources to perform their tasks because they must share a centralized energy source with other devices or obtain their own energy from the outside world to operate for extended periods of time without human intervention. This leads to the question of how to optimize system performance with limited resources. In this chapter, we study the optimization problem in the presence of power constraints. The existing literature [36, 37] only considers simple communication models and penalty functions. Thus, the performance of more general AoII in a more complicated system is still unclear. In particular, we study a system where the source is modeled by a multi-state Markov chain. Meanwhile, the considered AoII adopts a non-binary information penalty function. Similar constraints are considered in [18], with the goal of minimizing the AoI. Our problem is more complicated because AoI ignores the information content of the transmitted updates. Remote estimation under resource constraints is studied in [44–47]. However, these works focus mainly on minimizing the estimation error and ignore the effect of time, even though, in many real-world applications, persistent errors will cause more damage to the system than short-lived errors. The main contributions of this chapter can be summarized as follows. 1) We study the minimization of AoII under power constraints. 2) We consider the AoII that adopts a non-binary information penalty function and model the source using a multi-state Markov chain.
3) We rigorously prove the structural properties of the optimal policy and propose an efficient algorithm to obtain the optimal policy. This chapter summarizes the work in [48], and the rest is organized as follows. Section 2.2 elaborates on the modeling of the system and discusses the system dynamics under the chosen AoII. In Section 2.3, we present a step-by-step analysis of the optimization problem and detail the proposed algorithm. Lastly, in Section 2.4, numerical results are laid out.

2.2 System Overview

2.2.1 System Model

We consider a slotted-time system in which a transmitter sends updates about a process to a remote receiver through an unreliable channel. The transmitted update will not be corrupted during the transmission, but the transmission will not necessarily succeed. When the transmission fails, the update is discarded, and the failure does not affect the transmitter's decision at the next time slot. We denote the channel realization as rt, where rt = 1 if the transmission succeeds and rt = 0 otherwise. We assume rt is independent and identically distributed over the time slots. We define Pr(rt = 1) = ps and Pr(rt = 0) = 1 − ps = pf. We notice that, in many status-update systems, the size of the update is very small, so the transmission time for an update is much smaller than the time unit used to measure the dynamics of the process. Thus, when a transmission attempt succeeds, the update is assumed to be received instantly by the receiver.

[Figure 2.1: Illustration of the Markovian source: an N-state birth-death chain on {1, ..., N} with self-transition probability 1 − 2p, transition probability p to each neighbor for interior states, and probability 2p to the single neighbor for the boundary states 1 and N.]

This assumption will provide us with analytical benefits, and a similar assumption is also made in [49]. The source process {Xt}t∈N is modeled by an N-state Markov chain in which transitions occur only between adjacent states, with total probability 2p, while the chain remains in its current state with probability 1 − 2p. An illustration of the Markovian source is shown in Figure 2.1.
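The source chain of Figure 2.1 is simple to simulate. A minimal Python sketch (parameter values and the frozen estimate are arbitrary illustrations):

```python
import random

def step_source(x, N, p):
    """One transition of the N-state birth-death source of Figure 2.1.
    Interior states move to each neighbor w.p. p; boundary states move to
    their single neighbor w.p. 2p; otherwise the state is unchanged."""
    u = random.random()
    if x == 1:
        return 2 if u < 2 * p else 1
    if x == N:
        return N - 1 if u < 2 * p else N
    if u < p:
        return x - 1
    if u < 2 * p:
        return x + 1
    return x

# Sample a trajectory and track the penalty d_t = |X_t - Xhat_t| for a
# frozen estimate (no updates delivered).
random.seed(0)
N, p, x_hat = 5, 0.2, 3
x, ds = 3, []
for _ in range(10):
    x = step_source(x, N, p)
    ds.append(abs(x - x_hat))
print(ds)  # each d_t lies in {0, 1, ..., N-1} and moves by at most 1 per slot
```

This also previews the key fact used below: since the source moves by at most one state per slot, the penalty dt is itself a birth-death process.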
The transmitter is capable of generating an update Xt by sampling the process at will. However, the sampling opportunities only occur at the beginning of each time slot. We assume the transmitter is also capable of making a transmission attempt in the same time slot in which the sampling happens. Every time the transmission succeeds, the receiver uses the received update as its new estimate X̂t. The receiver sends an ACK/NACK packet to inform the transmitter whether it has received a new update. We suppose that the ACK/NACK packets are delivered reliably and that their transmission time is negligible, as such packets are very small in general. Therefore, if an ACK is received, the transmitter knows that the transmission succeeded and that the receiver's estimate changed to the transmitted update. If a NACK is received, the transmitter knows that the receiver did not receive the new update and that the receiver's estimate did not change. Hence, we can assume that the transmitter always knows the receiver's estimate. The system adopts the AoII as the performance measure. Since the dynamic source has N states, we let Xt ∈ {1, ..., N} and X̂t ∈ {1, ..., N}. Then, we choose g(Xt, X̂t) = |Xt − X̂t| and f(t) = t. Consequently, F(t) = 1 by its definition, and dt ≜ g(Xt, X̂t) ∈ {0, 1, ..., N − 1}.

2.2.2 System Dynamics

Now, we characterize the system dynamics, which can be fully captured by the dynamics of the pair (dt, ∆t), where ∆t is short for ∆AoII(Xt, X̂t, t). Thus, it is essential to characterize the relationship between (dt+1, ∆t+1) and (dt, ∆t). We notice that the relationship depends on the transmitter's action and its result. Therefore, we define at ∈ {0, 1} as the transmitter's action at time t, where at = 1 if the transmitter makes a transmission attempt and at = 0 otherwise. Then, we can divide our discussion into the following three cases. Case 1: at = 0. In this case, no new update is received by the receiver.
Thus, the estimate X̂t+1 will be nothing but X̂t, and Xt will evolve following the Markov chain shown in Figure 2.1. When the state of the source process does not change, which happens with probability 1 − 2p, we have Xt+1 = Xt and therefore dt+1 = dt. When the state of the source process changes, dt+1 depends on the value of dt. Thus, we further distinguish between the following cases:
• When dt = 0, according to the Markovian source reported in Figure 2.1, dt+1 = 1 with probability 2p.
• When 1 ≤ dt ≤ N − 2, to simplify our analysis, we assume, for any Xt ∈ {1, 2, ..., N}, Pr(Xt+1 = Xt − 1 | Xt) = Pr(Xt+1 = Xt + 1 | Xt) = p. Then, when Xt > X̂t, dt+1 = |Xt ± 1 − X̂t| = |dt ± 1| = dt ± 1. When Xt < X̂t, dt+1 = |Xt ± 1 − X̂t| = |−dt ± 1| = dt ∓ 1. Combining the two, dt+1 = dt ± 1 with equal probability p.
• When dt = N − 1, Xt must be either 1 or N, and X̂t must be either N or 1, respectively. Combining with the Markovian source reported in Figure 2.1, dt+1 = N − 2 with probability 2p.

[Figure 2.2: Illustration of the evolution of d: a birth-death chain on {0, 1, ..., N − 1} with self-transition probability p1, transition probability p2 to each neighbor for interior states, and probability 2p2 to the single neighbor for the boundary states 0 and N − 1.]

Let us denote by Pd,d′ the transition probability from d to d′. Then, the results can be summarized as follows.

Pd,d = 1 − 2p,  0 ≤ d ≤ N − 1,
Pd,d+1 = Pd,d−1 = p,  1 ≤ d ≤ N − 2,
P0,1 = PN−1,N−2 = 2p.   (2.1)

Such dynamics can be characterized by the Markov chain shown in Figure 2.2 with p1 = 1 − 2p and p2 = p. Meanwhile, the value of ∆t+1 can be captured by the following two cases.
• When dt+1 = 0, the receiver's estimate at time t + 1 is correct. By definition, Ut+1 = t + 1. Hence, ∆t+1 = 0.
• When dt+1 ≠ 0, the receiver's estimate at time t + 1 is incorrect. In this case, Ut+1 = Ut by definition. Therefore, ∆t+1 = ∆t + dt+1.
To sum up, ∆t+1 = 1{dt+1 ≠ 0}(∆t + dt+1), where 1{A} is an indicator function that takes value 1 when A is true and 0 otherwise. Case 2: at = 1 but rt = 0. In this case, we notice that no new update is received by the receiver.
Following the same trajectory as in Case 1, we can conclude that the dynamics of dt can be characterized by the Markov chain shown in Figure 2.2 with p1 = pf(1 − 2p) and p2 = pf·p. ∆t+1 is fully dictated by ∆t and dt+1, as detailed in Case 1. Case 3: at = 1 and rt = 1. In this case, the receiver receives the update instantly. Thus, Ut+1 = t, and we can conclude that ∆t+1 = dt+1. Since the update is received instantly, we have dt+1 ∈ {0, 1}. More precisely,

Pr[(dt+1, ∆t+1) = (0, 0) | (dt, ∆t), at = 1] = ps(1 − 2p),
Pr[(dt+1, ∆t+1) = (1, 1) | (dt, ∆t), at = 1] = 2ps·p.

Combining the above three cases, we can fully capture the evolution of (dt, ∆t).

2.2.3 Problem Formulation

We consider the problem where there is a unit power consumption along with each transmission attempt, regardless of the result. At the same time, the transmitter has a power budget α < 1. We define ϕ = (a0, a1, ...) as a sequence of actions the transmitter takes and denote the set of all feasible action sequences by Φ. Then, our problem can be formulated as

min_{ϕ∈Φ} ∆̄ϕ ≜ lim_{T→∞} (1/T) Eϕ[ Σ_{t=0}^{T−1} ∆t ]
subject to R̄ϕ ≜ lim_{T→∞} (1/T) Eϕ[ Σ_{t=0}^{T−1} at ] ≤ α.   (2.2)

As (2.2) shows, the system is resource-constrained. Thus, it is natural to require that transmission attempts help minimize AoII. More precisely, we require Pr[(0, 0) | (dt, ∆t), at = 1] ≥ Pr[(0, 0) | (dt, ∆t), at = 0] for any (dt, ∆t). Leveraging the system dynamics in Section 2.2.2, we conclude that it is sufficient to require p ∈ [0, 1/3]. Therefore, we only consider the case of p ∈ [0, 1/3] throughout the rest of this chapter. A similar assumption is also made in [37]. We notice that solving problem (2.2) is equivalent to solving a constrained Markov decision process (CMDP). To this end, we adopt the Lagrangian approach.

2.3 Problem Optimization

2.3.1 Lagrangian Approach

First of all, we write the constrained optimization problem (2.2) in its Lagrangian form.
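As a sanity check on the dynamics of Section 2.2.2, a small Python sketch (parameter values are arbitrary illustrations) assembles the one-step transition probabilities of dt under both actions, following (2.1) and Cases 2–3, and verifies that they are proper distributions:

```python
def P_idle(N, p):
    """Transition matrix (nested lists) of d_t under a_t = 0, per (2.1):
    stay w.p. 1 - 2p; interior states move to each neighbor w.p. p;
    boundary states 0 and N-1 move to their single neighbor w.p. 2p."""
    P = [[0.0] * N for _ in range(N)]
    for d in range(N):
        P[d][d] = 1 - 2 * p
    P[0][1] = P[N - 1][N - 2] = 2 * p
    for d in range(1, N - 1):
        P[d][d - 1] = P[d][d + 1] = p
    return P

def P_attempt(N, p, ps):
    """d_t transitions under a_t = 1: w.p. pf = 1 - ps the attempt fails and d
    evolves as in the idle case (Case 2); w.p. ps the update is delivered
    instantly and d_{t+1} = 0 w.p. 1 - 2p or d_{t+1} = 1 w.p. 2p (Case 3)."""
    pf = 1 - ps
    P = [[pf * q for q in row] for row in P_idle(N, p)]
    for row in P:
        row[0] += ps * (1 - 2 * p)
        row[1] += ps * 2 * p
    return P

N, p, ps = 5, 0.2, 0.7
for P in (P_idle(N, p), P_attempt(N, p, ps)):
    assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)  # rows are distributions
```

The ∆t component is then deterministic given dt+1, via ∆t+1 = 1{dt+1 ≠ 0}(∆t + dt+1), so these two matrices fully determine the controlled process.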
L(ϕ, λ) = lim_{T→∞} (1/T) Eϕ[ Σ_{t=0}^{T−1} (∆t + λat) ] − λα,

where λ is the Lagrange multiplier. Then, the corresponding dual function is

G(λ) = min_{ϕ∈Φ} L(ϕ, λ).   (2.3)

According to the results in [50], the optimal policy for the constrained problem (2.2) can be characterized by the optimal policies for the minimization problem (2.3) under a certain λ. Thus, we start with solving problem (2.3) for any given λ ≥ 0. As λα is independent of the policy, we can ignore it, which leads to the following optimization problem.

minimize_{ϕ∈Φ} lim_{T→∞} (1/T) Eϕ[ Σ_{t=0}^{T−1} (∆t + λat) ].   (2.4)

The above problem can be cast into an infinite-horizon average-cost Markov decision process (MDP) M = (X, A, P, C), where
• X denotes the state space: the state is x = (d, ∆). We define xd = d and x∆ = ∆. We will use x and (d, ∆) to represent the state interchangeably for the rest of this chapter.
• A denotes the action space: the two feasible actions are making the transmission attempt (a = 1) and staying idle (a = 0). The action space is independent of the state and the time. More precisely, a ∈ A(x, t) = A = {0, 1}.
• P denotes the state transition probabilities: we define Px,x′(a) as the probability that action a at state x will lead to state x′. The values of Px,x′(a) can be obtained easily from Section 2.2.2.
• C denotes the instant cost: when the system is at state x and action a is chosen, the instant cost is C(x, a) = x∆ + λa.

2.3.2 Structural Results

In this section, we provide the key structural properties of the optimal policy for M, which play a vital role in the analysis later on. The optimal policy for M is captured by its value function V(x), which can be obtained by solving the Bellman equation. For an infinite-horizon average-cost MDP, the Bellman equation is

θ + V(x) = x∆ + min_{a∈{0,1}} { λa + Σ_{x′∈X} Px,x′(a) V(x′) },   (2.5)

where θ is the minimal value of (2.4).
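To make (2.5) concrete, the sketch below runs a relative value iteration on a small truncated instance of M (the truncation level m, the parameter values, the reference state, and the fixed iteration count are all illustrative choices; the formal algorithm, its convergence, and the truncation are treated in the text that follows):

```python
def rvi_solve(N=4, m=20, p=0.2, ps=0.7, lam=5.0, iters=2000):
    """Relative value iteration for a truncated version of the AoII MDP M.
    State (d, delta) with delta capped at m; cost delta + lam * a; transitions
    follow Section 2.2.2. Returns (theta, policy), where policy[(d, delta)]
    is the minimizing action. Parameter values are illustrative only."""
    pf = 1 - ps
    states = [(d, dl) for d in range(N) for dl in range(m + 1)]

    def next_d(d):  # (d', prob) pairs for the source-evolution branch
        if d == 0:
            return [(0, 1 - 2 * p), (1, 2 * p)]
        if d == N - 1:
            return [(d, 1 - 2 * p), (d - 1, 2 * p)]
        return [(d, 1 - 2 * p), (d - 1, p), (d + 1, p)]

    def trans(d, dl, a):  # list of ((d', delta'), prob)
        w = pf if a == 1 else 1.0  # weight of the no-delivery branch
        out = [((dn, 0 if dn == 0 else min(dl + dn, m)), w * pr)
               for dn, pr in next_d(d)]
        if a == 1:  # successful delivery: d' in {0, 1}, AoII resets accordingly
            out += [((0, 0), ps * (1 - 2 * p)), ((1, 1), ps * 2 * p)]
        return out

    def q_values(V, d, dl):
        return [dl + lam * a + sum(pr * V[s2] for s2, pr in trans(d, dl, a))
                for a in (0, 1)]

    V, ref = {s: 0.0 for s in states}, (0, 0)
    for _ in range(iters):
        Q = {s: min(q_values(V, *s)) for s in states}
        theta = Q[ref]                       # running estimate of the average cost
        V = {s: Q[s] - theta for s in states}
    policy = {s: (0 if (qs := q_values(V, *s))[0] <= qs[1] else 1)
              for s in states}
    return theta, policy

theta, policy = rvi_solve()
print(round(theta, 3))  # estimated optimal average cost for this toy instance
print(policy[(0, 0)])   # 0: state (0, 0) is never active, cf. Proposition 1
```

Note that, at (0, 0), transmitting induces the same next-state distribution as idling but incurs the extra cost λ, which is why the computed policy leaves that state inactive.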
A canonical procedure for solving the Bellman equation is to apply the relative value iteration (RVI) [51]. To this end, we denote by Vν(·) the estimated value function at iteration ν of RVI and initialize V0(x) = x∆. Then, the estimated value function is updated in the following way.

Vν+1(x) = Qν+1(x) − Qν+1(x^ref),   (2.6)

where x^ref is the reference state and Qν+1(x) is the interim value function, calculated by applying the right-hand side of (2.5). More precisely,

Qν+1(x) = x∆ + min_{a∈{0,1}} { λa + Σ_{x′∈X} Px,x′(a) Vν(x′) }.   (2.7)

RVI is guaranteed to converge to V(·) as ν → +∞ regardless of the initialization [51]. However, it requires infinitely many iterations to achieve the exact solution. To overcome this impracticality, we leverage the special properties of our system and provide the structural property of Vν(·), which turns out to be enough to characterize the structural properties of the optimal policy for M. We start with the following lemma.

Lemma 1. The estimated value function Vν(x) is increasing in both xd and x∆ at any iteration ν.

Proof. Leveraging the iterative nature of RVI, we use induction to prove the desired results. The complete proof can be found in Appendix A.1.

We refer to state x as active if the optimal action at x is making the transmission attempt, and as inactive otherwise. Then, leveraging Lemma 1, we provide the key structural properties of the optimal policy for M.

Proposition 1. The optimal policy for M under any λ ≥ 0 is a threshold policy with the following properties.
• State (0, 0) will never be active.
• For states x with fixed xd ≠ 0, the optimal action a* switches from a* = 0 to a* = 1 as x∆ increases, and the switching point (i.e., the threshold) is non-increasing in xd.

Proof. The optimal action at state x is captured by the sign of δV(x) ≜ V¹(x) − V⁰(x), where V^a(x) is the value function resulting from taking action a at state x. Then, we characterize the sign of δV(x) using Lemma 1.
The complete proof can be found in Appendix A.2.

We define the threshold for the states with fixed d ≠ 0 as the smallest ∆ such that a* = 1. We notice that state (0, 0) is never active when the optimal policy is adopted. Hence, we can characterize the optimal policy using the thresholds. In the following, an optimal policy is denoted by a vector nλ, where (nλ)i is the threshold for the states with d = i. The subscript λ indicates the dependency between the optimal policy and λ.

2.3.3 Finite-State MDP Approximation

In the sequel, we tackle the problem of finding the optimal policy for M. Direct application of RVI is impractical, as we would need to estimate infinitely many value functions at each iteration. To overcome this difficulty, we use the approximating sequence method (ASM) [52] and rigorously show the convergence of this approximation. To this end, we construct another MDP M(m) = (X(m), A, P(m), C) by truncating the value of ∆. More precisely, we impose

X(m): x(m)_d ∈ {0, 1, ..., N − 1},  x(m)_∆ ∈ {0, 1, ..., m},

where m is the predetermined maximal value of ∆. The transition probabilities from x ∈ X(m) to z ∈ X − X(m) (called excess probabilities) are redistributed to the states x′ ∈ X(m) in the following way.

P(m)_{x,x′}(a) = Px,x′(a) + Σ_{z∈X−X(m)} Px,z(a) qz(x′),

where qz(x′) is the probability of redistributing state z to state x′ and Σ_{x′∈X(m)} qz(x′) = 1. We choose qz(x′) = 1{x′_d = z_d} × 1{x′_∆ = m}. So, the transition probabilities P(m)_{x,x′}(a) satisfy the following.

P(m)_{x,x′}(a) = Px,x′(a), if x′_∆ < m,
P(m)_{x,x′}(a) = Px,x′(a) + Σ_{z∈G(z,x′)} Px,z(a), if x′_∆ = m,

where G(z, x′) = {z : z_d = x′_d, z_∆ > m}. The action space A and the instant cost C are the same as defined in M.

Theorem 1. The sequence of optimal policies for M(m) converges to the optimal policy for M as m → ∞.

Proof. We show that our system verifies the two assumptions given in [52]. Then, the convergence is guaranteed according to the results in [52].
The complete proof can be found in Appendix A.3.

For a given truncation parameter m, the state space X(m) is finite with size |X(m)| ∝ m. When m is huge, the basic RVI will be inefficient, since the minimum operator in (2.7) requires calculations for both feasible actions at every state. In the following, we propose an improved RVI that avoids the minimum operator at certain states. To this end, we claim that the properties in Proposition 1 are also possessed by the optimal policies for M(m) at any iteration ν of RVI. The proof is omitted since it is very similar to what we did in Section 2.3.2. Utilizing Proposition 1, we can conclude that, for any state x(m), if there exists an active state y(m) such that y(m)_∆ ≤ x(m)_∆ and y(m)_d ≤ x(m)_d, then x(m) must also be active. The update step at each iteration of the improved RVI can be summarized as follows. For each x(m) ∈ X(m),
• if such a y(m) exists, we can determine the optimal action at x(m) immediately, and the minimum operator is avoided;
• if no such y(m) exists, the optimal action at x(m) is determined by applying the minimum operator.
In this way, we avoid Σ_{i=1}^{N}(m − ni) minimum operators at each iteration of RVI, where ni = (nν)i and nν is the policy at iteration ν of RVI. In almost all cases, we have ni ≪ m. The pseudocode is given in Algorithm 1. In the algorithm, Vν is considered converged when the maximum difference between the results of two consecutive iterations is less than ϵ.

2.3.4 Expected Transmission Rate

In this section, we calculate the expected transmission rate R̄n under a given threshold policy n. This enables us to develop an efficient algorithm for finding the optimal policy for (2.2). As we can see in (2.2), R̄n is nothing but the expected average number of transmission attempts made.
Thus, it can be fully captured by the stationary distribution of the discrete-time Markov chain (DTMC) induced by n. More precisely,

R̄n = Σ_{d=1}^{N−1} Σ_{∆=nd}^{+∞} πd(∆),

where πd(∆) is the steady-state probability of state (d, ∆) and nd = (n)d. To obtain the stationary distribution, we utilize the balance equations associated with the induced DTMC, which take the following form.

π(x) = Σ_{x′∈X} Px′,x(a^n_{x′}) π(x′),   (2.8)

where a^n_{x′} is the action suggested by policy n at state x′. A difficulty arises because the state space of the induced DTMC is infinite. To overcome it, we present the following proposition.

Algorithm 1: Improved Relative Value Iteration
1: procedure RelativeValueIteration(M)
2:   Initialize V0(x) = x∆; ν = 0
3:   Choose x^ref ∈ X arbitrarily
4:   while Vν has not converged do
5:     for x ∈ X do
6:       if ∃ an active state y s.t. yd ≤ xd and y∆ ≤ x∆ then
7:         a*(x) = 1
8:         Qν+1(x) = C(x, 1) + Σ_{x′} Px,x′(1) Vν(x′)
9:       else
10:        for a ∈ A do
11:          Hx,a = C(x, a) + Σ_{x′} Px,x′(a) Vν(x′)
12:        a*(x) = argmin_a {Hx,a}
13:        Qν+1(x) = Hx,a*(x)
14:     for x ∈ X do
15:       Vν+1(x) = Qν+1(x) − Qν+1(x^ref)
16:     ν = ν + 1
17:   return n ← a*(x)

Proposition 2. The expected transmission rate under threshold policy n is

R̄n = Σ_{d=1}^{N−1} ( Σ_{∆=nd}^{τ−1} πd(∆) + Πd(τ) ),

where τ = max{n}, and the πd(∆)'s and Πd(τ)'s are the solution to the following finite system of linear equations:

π0(0) = (1 − 2p)π0(0) + p Σ_{∆=1}^{τ−1} (1 − ps a1,∆) π1(∆) + pf p Π1(τ) + ps(1 − 2p) Σ_{d=1}^{N−1} ( Σ_{∆=nd}^{τ−1} πd(∆) + Πd(τ) ),   (2.9)

π1(1) = 2p π0(0) + 2 ps p Σ_{d=1}^{N−1} ( Σ_{∆=nd}^{τ−1} πd(∆) + Πd(τ) ),   (2.10)

Σ_{d=1}^{N−1} ( Σ_{∆=ld}^{τ−1} πd(∆) + Πd(τ) ) + π0(0) = 1,   (2.11)

and, for each 1 ≤ d ≤ N − 1,

πd(∆) = 0,  0 < ∆ < ld,   (2.12)

πd(∆) = Σ_{d′=1}^{N−1} Pd′,d (1 − ps ad′,∆−d) πd′(∆ − d),  max{2, ld} ≤ ∆ ≤ τ − 1,   (2.13)

Πd(τ) = Σ_{d′=1}^{N−1} Pd′,d ( Σ_{∆=τ−d}^{τ−1} (1 − ps ad′,∆) πd′(∆) + pf Πd′(τ) ),   (2.14)

where ld = (d² + d)/2 and ad,∆ is the action suggested by n at state (d, ∆).

Proof.
We cast the induced infinite-state Markov chain into an equivalent finite-state Markov chain whose size depends on the policy. Then, the $\pi_d(\Delta)$'s and $\Pi_d(\tau)$'s are the steady-state probabilities of the finite-state Markov chain. The complete proof can be found in Appendix A.4.

Remark 1. We can verify that (2.9) is a linear combination of the other equations in the system. Therefore, we can exclude (2.9) in practice.

The system of linear equations can be written in matrix form as $A\pi = b$, where $\pi$ is the vector of unknowns and $A$, $b$ are obtained directly from Proposition 2. However, solving a finite system of linear equations can still be costly, especially when the system is large. For our problem, the size of $A$ is $O((N-1)\tau)$, and we note that $A$ is sparse. In general, solving a system of linear equations of size $O(n)$ requires $O(n^2)$ storage and $O(n^3)$ floating-point operations when $A$ is dense. When $A$ is sparse, the computational cost is lower: sparse matrix algorithms are designed to solve such systems in time and space proportional to $O(n) + O(cn)$, where $c$ is the average number of non-zero entries per column. Although there are cases where this linear target cannot be met, the complexity of sparse linear algebra is far below that of the dense case [53]. Generally speaking, the complexity depends on the sparsity of $A$. By exploiting the zero entries, we can often reduce the storage and computational requirements to $O(cn)$ and $O(cn^2)$, respectively.

The calculation of $\bar{R}_n$ is needed repeatedly, and it takes a significant amount of time when the thresholds in $\mathbf{n}$ are large. Hence, we provide an efficient alternative that approximates the expected transmission rate in this case.

Corollary 1. When the thresholds in $\mathbf{n}$ are large, the expected transmission rate under policy $\mathbf{n}$ can be approximated as
$\bar{R}_n \approx \sum_{d=1}^{N-1} \Big( \sum_{\Delta=n_d}^{\tau-1} \pi_d(\Delta) + \Pi_d(\tau) \Big)$,
where $\tau = \max\{\mathbf{n}\}$.
The $\pi_d(\Delta)$'s and $\Pi_d(\tau)$'s are the solution to the following finite system of linear equations:
$\pi_0(0) = (1-2p)\pi_0(0) + p\,\Pi_1(\eta) + p_f p\,\Pi_1(\tau) + p\sum_{\Delta=\eta+1}^{\tau-1}(1-p_s a_{1,\Delta})\pi_1(\Delta) + p_s(1-2p)\sum_{d=1}^{N-1}\Big(\sum_{\Delta=n_d}^{\tau-1}\pi_d(\Delta)+\Pi_d(\tau)\Big)$, (2.15)
$\pi_1(1) = 2p\pi_0(0) + 2p_s p \sum_{d=1}^{N-1}\Big(\sum_{\Delta=n_d}^{\tau-1}\pi_d(\Delta)+\Pi_d(\tau)\Big)$, (2.16)
$\sum_{d=1}^{N-1}\Big(\Pi_d(\eta) + \sum_{\Delta=\eta+1}^{\tau-1}\pi_d(\Delta)+\Pi_d(\tau)\Big) + \pi_0(0) = 1$, (2.17)
$\Pi_1(\eta) - \pi_1(1) + \pi_1(\eta+1) = \sum_{d'=1}^{N-1} P_{d',1}\Pi_{d'}(\eta)$, (2.18)
$\Pi_d(\eta) + \sum_{\Delta=\eta+1}^{\eta+d}\pi_d(\Delta) = \sum_{d'=1}^{N-1} P_{d',d}\Pi_{d'}(\eta)$, $2 \le d \le N-1$, (2.19)
and for each $1 \le d \le N-1$:
$\pi_d(\Delta) = \rho\sum_{d'=1}^{N-1} P_{d',d}(1-p_s a_{d',\Delta-d})\sigma_{d'}(\Delta-d)$, $\eta+1 \le \Delta \le \eta+d$, (2.20)
$\pi_d(\Delta) = \sum_{d'=1}^{N-1} P_{d',d}(1-p_s a_{d',\Delta-d})\pi_{d'}(\Delta-d)$, $\eta+d+1 \le \Delta \le \tau-1$, (2.21)
$\Pi_d(\tau) = \sum_{d'=1}^{N-1} P_{d',d}\Big(\sum_{\Delta=\tau-d}^{\tau-1}(1-p_s a_{d',\Delta})\pi_{d'}(\Delta) + p_f \Pi_{d'}(\tau)\Big)$, (2.22)
where $a_{d,\Delta}$ is the action prescribed by the threshold policy $\mathbf{n}$ at state $(d, \Delta)$, $\eta = \min\{\mathbf{n}\} - 1$, and $\rho = \frac{\pi_0(0)}{\sigma_0(0)}$. $\sigma_d(\Delta)$ is the stationary distribution associated with the Markov chain induced by another threshold policy $\mathbf{n}' = [\eta', ..., \eta']$, where $\eta' = \eta + 1$.

Proof. When the thresholds in $\mathbf{n}$ are large, the expected transmission rate is small. Combining with (2.10), we have $\pi_1(1) \approx 2p\pi_0(0)$. Consequently, we can show that, for any state $(d, \Delta)$, $\pi_d(\Delta) \approx c^{n}_{d,\Delta}\pi_0(0)$, where $c^{n}_{d,\Delta}$ is a scalar that depends on the policy and the state. At the same time, we notice that, for any two threshold policies $\mathbf{n}^1$ and $\mathbf{n}^2$, the prescribed actions at states with $\Delta < \min\{[\mathbf{n}^1, \mathbf{n}^2]\}$ are the same. We denote by $G(\mathbf{n}^1, \mathbf{n}^2)$ the set of these states. Then, we can prove that, for $(d, \Delta) \in G(\mathbf{n}^1, \mathbf{n}^2)$,
$\frac{\pi^1_d(\Delta)}{\pi^2_d(\Delta)} \approx \frac{c_{d,\Delta}\pi^1_0(0)}{c_{d,\Delta}\pi^2_0(0)} = \frac{\pi^1_0(0)}{\pi^2_0(0)}$, (2.23)
where $\pi^1_d(\Delta)$ and $\pi^2_d(\Delta)$ are the stationary distributions when $\mathbf{n}^1$ and $\mathbf{n}^2$ are adopted, respectively. Based on (2.23), we can obtain the two systems of linear equations. The complete proof is in Appendix A.5.

Remark 2. For the same reason as in Remark 1, we can exclude (2.15) in practice.
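Solving balance equations together with the normalization constraint, as Proposition 2 and Remark 1 describe, can be sketched on a toy finite chain (for the large sparse system $A\pi = b$, a sparse solver such as `scipy.sparse.linalg.spsolve` would typically replace the dense call; the helper name below is illustrative).

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1 for a finite DTMC.

    P: (n, n) row-stochastic matrix. The balance equations alone are
    linearly dependent (cf. Remark 1), so one of them is overwritten
    with the normalization constraint before solving."""
    n = P.shape[0]
    A = P.T - np.eye(n)   # balance equations: (P^T - I) pi = 0
    A[-1, :] = 1.0        # replace the last row with sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)
```

With the stationary distribution in hand, expectations such as $\bar{R}_n$ reduce to weighted sums over the relevant states.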
In Corollary 1, instead of solving one large system of linear equations of size $O((N-1)\tau)$, we approximate $\bar{R}_n$ by solving two systems of linear equations of sizes $O((N-1)(\eta+1))$ and $O((N-1)(\tau-\eta+1))$, respectively. It is worth noting that when $\tau \approx \eta$ or $\tau \gg \eta$, the complexity reduction of Corollary 1 is limited. In other cases, Corollary 1 can significantly reduce the complexity, and the resulting error is negligible.

The methodology presented in Proposition 2 can also be applied to the calculation of the expected AoII $\bar{\Delta}_n$. More precisely, we can use the following corollary.

Corollary 2. The expected AoII under threshold policy $\mathbf{n}$ is
$\bar{\Delta}_n = \sum_{d=1}^{N-1}\Big(\sum_{\Delta=l_d}^{\tau-1}\omega_d(\Delta) + \Omega_d(\tau)\Big)$,
where $\tau = \max\{\mathbf{n}\}$ and $l_d = \frac{d^2+d}{2}$. The $\omega_d(\Delta)$'s and $\Omega_d(\tau)$'s are the solution to the following finite system of linear equations. For each $1 \le d \le N-1$:
$\omega_0(0) = 0$; $\omega_d(\Delta) = 0$, $\Delta < l_d$, (2.24)
$\omega_d(\Delta) = \Delta\,\pi_d(\Delta)$, $l_d \le \Delta \le \tau-1$, (2.25)
$\Omega_d(\tau) = \sum_{d'=1}^{N-1} P_{d',d}\Big(\sum_{\Delta=\tau-d}^{\tau-1}(1-p_s a_{d',\Delta})\omega_{d'}(\Delta) + p_f \Omega_{d'}(\tau)\Big) + d\,\Pi_d(\tau)$, (2.26)
where $a_{d,\Delta}$ is the action prescribed by the threshold policy $\mathbf{n}$ at state $(d, \Delta)$. The $\pi_d(\Delta)$'s and $\Pi_d(\tau)$'s can be obtained using Proposition 2 with the same threshold policy $\mathbf{n}$.

Proof. We define $\omega_d(\Delta) \triangleq \Delta\,\pi_d(\Delta)$ and $\Omega_d(\tau) \triangleq \sum_{\Delta=\tau}^{+\infty}\omega_d(\Delta)$. Then, we combine the states with $\Delta \ge \tau$ as we did in Proposition 2. After some rearrangement, we obtain the system of linear equations shown above. The complete proof is in Appendix A.6.

2.3.5 Optimal Policy

At this point, we are able to find the optimal policy for problem (2.4). However, our goal is to find the optimal policy for the constrained problem (2.2). Based on the work in [50], the optimal policy for problem (2.2) can be expressed as a mixture of two deterministic policies that are both optimal for problem (2.4) with $\lambda = \lambda^*$. More precisely, the optimal policy can be summarized in the following theorem.

Theorem 2.
The optimal policy for the constrained problem (2.2) can be expressed as a mixture of two deterministic policies $\mathbf{n}_{\lambda^{*+}}$ and $\mathbf{n}_{\lambda^{*-}}$ that are both optimal for problem (2.4) with $\lambda = \lambda^* \triangleq \inf\{\lambda > 0 : \bar{R}_\lambda \le \alpha\}$, where $\bar{R}_\lambda$ is the expected transmission rate resulting from policy $\mathbf{n}_\lambda$. More precisely, if we choose
$\mu = \frac{\alpha - \bar{R}_{\lambda^{*+}}}{\bar{R}_{\lambda^{*-}} - \bar{R}_{\lambda^{*+}}}$, (2.27)
the mixed policy $\mathbf{n}_{\lambda^*}$, which selects $\mathbf{n}_{\lambda^{*-}}$ with probability $\mu$ and $\mathbf{n}_{\lambda^{*+}}$ with probability $1-\mu$ each time the system reaches state $(0, 0)$, is optimal for the constrained problem (2.2), and the constraint in (2.2) is met with equality.

Proof. We verify that our system satisfies all the assumptions given in [50]. Then, combining the characteristics of our system with the results in [50], we obtain the optimal policy. The complete proof is in Appendix A.7.

Next, we describe an efficient algorithm for obtaining the optimal policy for the constrained problem (2.2). The core of obtaining the optimal policy is finding $\lambda^*$. We recall that, for any given $\lambda$, the deterministic policy $\mathbf{n}_\lambda$ is obtained by applying the improved RVI, and the resulting $\bar{R}_\lambda$, which is non-increasing in $\lambda$ [50], is calculated using Proposition 2. Hence, $\bar{R}_\lambda$ can be regarded as a non-increasing function of $\lambda$, and we can use Bisection search with tolerance $\xi$ to find $\lambda^*$. Then, $\lambda^{*+}$ and $\lambda^{*-}$ are taken as the boundaries of the final interval. More precisely, we initialize $\lambda^- = 0$ and $\lambda^+ = 1$. The procedure can be summarized as follows.
• As long as $\bar{R}_{\lambda^+} \ge \alpha$, we set $\lambda^- = \lambda^+$ and $\lambda^+ = 2\lambda^+$. In this way, we end up with an interval $I = [\lambda^-, \lambda^+]$.
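The randomization in Theorem 2 happens only at state $(0, 0)$: each time the system re-enters that state, the controller re-draws which of the two deterministic threshold vectors to follow. A minimal sketch (all names hypothetical; the threshold rule assumes, as in the chapter, that the policy transmits at state $(d, \Delta)$ when $\Delta$ reaches the $d$-th threshold):

```python
import random

def mixed_threshold_policy(n_minus, n_plus, mu, seed=0):
    """Return an action function for the mixed policy of Theorem 2 (sketch).

    n_minus, n_plus : threshold vectors, entry d-1 applies to states (d, .).
    mu              : probability of following n_minus until the next
                      visit to state (0, 0)."""
    rng = random.Random(seed)
    current = None

    def act(state):
        nonlocal current
        d, delta = state
        if (d, delta) == (0, 0) or current is None:
            # re-draw the deterministic policy at every visit to (0, 0)
            current = n_minus if rng.random() < mu else n_plus
        if d == 0:
            return 0                           # never transmit when synced
        return int(delta >= current[d - 1])    # threshold rule
    return act
```

Averaged over renewal cycles, the transmission rate of this mixture interpolates between the rates of the two deterministic policies, which is exactly how (2.27) makes the constraint tight.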
Algorithm 2 Bisection Search
1: procedure BisectionSearch($\mathcal{M}^{(m)}(\lambda)$, $\alpha$)
2:     Initialize $\lambda^- = 0$; $\lambda^+ = 1$
3:     $\mathbf{n}_{\lambda^+} = RVI(\mathcal{M}^{(m)}(\lambda^+), \epsilon)$ using Algorithm 1
4:     $\bar{R}_{\lambda^+} = \bar{R}(\mathbf{n}_{\lambda^+})$ using Proposition 2
5:     while $\bar{R}_{\lambda^+} \ge \alpha$ do
6:         $\lambda^- = \lambda^+$; $\lambda^+ = 2\lambda^+$
7:         $\mathbf{n}_{\lambda^+} = RVI(\mathcal{M}^{(m)}(\lambda^+), \epsilon)$ using Algorithm 1
8:         $\bar{R}_{\lambda^+} = \bar{R}(\mathbf{n}_{\lambda^+})$ using Proposition 2
9:     while $\lambda^+ - \lambda^- \ge \xi$ do
10:        $\lambda = \frac{\lambda^+ + \lambda^-}{2}$
11:        $\mathbf{n}_\lambda = RVI(\mathcal{M}^{(m)}(\lambda), \epsilon)$ using Algorithm 1
12:        $\bar{R}_\lambda = \bar{R}(\mathbf{n}_\lambda)$ using Proposition 2
13:        if $\bar{R}_\lambda \ge \alpha$ then
14:            $\lambda^- = \lambda$
15:        else
16:            $\lambda^+ = \lambda$
17:    return $(\lambda^{*+}, \lambda^{*-}) \leftarrow (\lambda^+, \lambda^-)$

• We apply Bisection search on the interval $I$ until the length of $I$ is less than the tolerance $\xi$. Then, the algorithm returns $\lambda^{*+}$ and $\lambda^{*-}$.
The pseudocode is given in Algorithm 2. Finally, the mixing coefficient $\mu$ is calculated using (2.27), and the resulting expected AoII is calculated using Corollary 2. The algorithm is efficient for the following reasons.
• We obtain $\mathbf{n}_\lambda$ using the improved RVI, which avoids the minimum operator at certain states.
• When calculating $\bar{R}_\lambda$, we cast the induced infinite-state Markov chain into a finite-state Markov chain.
• We find $\lambda^{*+}$ and $\lambda^{*-}$ using Bisection search, which has logarithmic complexity.

[Figure 2.3: Illustrations of the AoII-optimal policy and the AoI-optimal policy. (a) The expected AoII as a function of p. (b) The expected AoII as a function of ps. (c) The expected AoII as a function of α. The truncation parameter in ASM is m = 800 and the tolerance in Bisection search is ξ = 0.01. RVI converges when the maximum difference between the results of two consecutive iterations is less than ϵ = 0.01.]
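The bracketing-then-bisection structure of Algorithm 2 can be sketched independently of the MDP machinery: below, `R_bar` is a stand-in for "solve the Lagrangian problem with the improved RVI, then evaluate the rate via Proposition 2", and is only assumed non-increasing in $\lambda$.

```python
def bisection_lambda(R_bar, alpha, xi=1e-2):
    """Sketch of Algorithm 2: find the interval bracketing lambda*.

    R_bar(lam) : non-increasing expected transmission rate under the
                 lambda-optimal deterministic policy (assumed given).
    Returns (lambda*-, lambda*+), the boundaries of the final interval."""
    lo, hi = 0.0, 1.0
    while R_bar(hi) >= alpha:      # grow the bracket until R_bar(hi) < alpha
        lo, hi = hi, 2 * hi
    while hi - lo >= xi:           # standard bisection on [lo, hi]
        mid = (lo + hi) / 2
        if R_bar(mid) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo, hi
```

Each probe of `R_bar` corresponds to one RVI solve plus one linear-system solve, so the logarithmic number of probes is what makes the overall procedure cheap.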
2.4 Numerical Results

In this section, we provide numerical results that highlight the effect of the system parameters on the performance of the AoII-optimal policy. We also compare the AoII-optimal policy with the AoI-optimal policy derived in [18].

Effect of p: We compare the performance of the AoII-optimal policy under different values of p. To this end, we fix N = 7 and ps = 0.8. We also set α = 0.06. We vary the value of p and plot the corresponding results. As we can see in Figure 2.3a, the expected AoII is increasing in p. To explain this trend, we notice that as p increases, the source process becomes more inclined to change state at the next time slot. Then, successfully transmitted updates are more likely to be obsolete at the next time slot. As the power budget α, which dictates the transmission rate, is fixed, the expected AoII increases with p.

We also show, in Table 2.1, the deterministic policies $\mathbf{n}_{\lambda^{*+}}$, $\mathbf{n}_{\lambda^{*-}}$ and the corresponding mixing coefficient µ for some values of p.¹ The optimal policy is the mixture of $\mathbf{n}_{\lambda^{*+}}$ and $\mathbf{n}_{\lambda^{*-}}$

¹In the table, "/" indicates the threshold where the two policies differ.

Table 2.1: Optimal thresholds for different p

            Mixing Coef.   n1    n2     n3     n4   n5   n6
p = 0.1     µ = 0.7176     15    6/7    1      1    1    1
p = 0.2     µ = 0.0331     37    16     8/9    1    1    1
p = 0.3     µ = 0.1178     69    25/26  15     1    1    1

Table 2.2: Optimal thresholds for different ps

            Mixing Coef.   n1    n2     n3     n4   n5     n6
ps = 0.2    µ = 0.6712     556   228    140    96   70/71  60
ps = 0.4    µ = 0.3260     151   62     36/37  24   17     1
ps = 0.6    µ = 0.4089     67    27/28  16     1    1      1
ps = 0.8    µ = 0.0331     37    16     8/9    1    1      1

with mixing coefficient µ as described in Theorem 2. We can see that the thresholds are, in general, increasing in p. The reason is as follows. When p is small, successfully transmitted updates are more likely to remain accurate in the next few time slots. In other words, the transmission is more "efficient". We refer to a transmission as "efficient" if it reduces the age to the greatest extent.
This allows the transmitter to make transmission attempts when the age is relatively low without violating the power constraint.

Effect of ps: In this scenario, we fix p = 0.2 and investigate the effect of the channel reliability ps on the performance of the AoII-optimal policy. We again consider N = 7 and α = 0.06. The corresponding results are shown in Figure 2.3b. As ps increases, the expected AoII decreases. The reason is as follows. As ps increases, transmitted updates are more likely to succeed, so the transmission is more "efficient". As the power budget is fixed, the expected AoII decreases as ps increases. We also present some selected thresholds in Table 2.2. As we see in the table, the thresholds are, in general, decreasing in ps. We recall that as ps increases, the transmission becomes more "efficient". Thus, the transmitter can make transmission attempts when the age is relatively low while keeping the transmission rate within the power budget.

Effect of α: Next, we analyze the performance of the AoII-optimal policy under different values of α. We adopt N = 7 and ps = 0.8. We also set p = 0.2. The expected AoII achieved by the AoII-optimal policy is plotted in Figure 2.3c. As we see, the expected AoII decreases as α increases. The reason is simple: as the power budget increases, more transmission attempts are allowed. We recall that we require a transmission attempt to always help reduce the age. With this in mind, we conclude that the expected AoII is decreasing in α. It is worth noting that as α increases, the expected AoII stops decreasing before α = 1. To explain this, we recall that the transmitter never makes a transmission attempt at state (0, 0) under an optimal policy. Thus, as α becomes large, the transmitter has enough budget to make transmission attempts at any state other than (0, 0).
At that point, the transmission rate is saturated, and the expected AoII will not decrease further.

Comparison with AoI-optimal policy: Lastly, we compare the AoII-optimal policy with the AoI-optimal policy. From Figure 2.3, we can see that the performance gap widens as p increases and as ps and α decrease. The reason lies in the value of transmission attempts. As p increases, the source process becomes more inclined to change states, so transmission attempts are needed more to bring correct information to the receiver. As ps decreases, the number of successful transmissions decreases and the AoII builds up faster. In this case, transmission attempts are more valuable because they greatly reduce AoII once they succeed. As α decreases, transmission attempts become more valuable because fewer attempts are allowed. Due to the different definitions of age, there are often cases where AoI is large but AoII is small. In these cases, the AoI-optimal policy wastes valuable transmission attempts. Therefore, the increase in the value of transmission attempts leads to a widening of the performance gap.

2.5 Conclusion

In this chapter, we considered a system where the source process is modeled by an N-state Markov chain, and the AoII adopts a non-binary information penalty function. We studied the problem of minimizing the AoII subject to a power constraint. By casting the problem as a CMDP, we proved that the optimal policy is a mixture of two deterministic threshold policies. Then, an efficient algorithm was proposed to find these policies and the mixing coefficient. Lastly, numerical results were provided to illustrate the performance of the AoII-optimal policy and compare it with the AoI-optimal policy.

Chapter 3: Scheduling for Freshness

3.1 Overview

With the development of communication and sensor technology, the number of users that a base station needs to handle is constantly increasing.
However, due to resource limitations, the base station can only serve a subset of the users simultaneously. This motivates the study of how the base station should schedule its resources so that the users receive the best possible service under a given performance metric. Therefore, in this chapter, we study the AoII minimization problem in the context of scheduling. We consider a system where a base station can update only a subset of the available users in each time slot. Meanwhile, we consider a generic time penalty function and study the minimization problem in the presence of imperfect channel state information (CSI). Due to the presence of CSI, the Whittle's index policy becomes infeasible in general. Therefore, we introduce another scheduling policy that is more versatile and has comparable performance to the Whittle's index policy.

The scheduling problem with AoI as the performance measure has been studied under various system settings in [54–58]. The problem studied in this chapter is different and more complicated because AoII considers the aging process of inconsistent information rather than the aging process of updates. Moreover, none of these works considers the case where CSI is available. The problem of optimizing information freshness in the presence of CSI is studied in [59, 60]. However, those works focus on the single-user system and mainly discuss cases where the CSI is perfect. Scheduling problems with the goal of minimizing an error-based performance measure are considered in [61–63]. Our problem is fundamentally different because AoII also accounts for the effect of time. Moreover, we consider a system in which a base station observes multiple sources simultaneously and must send updates to multiple destinations.

The main contributions of this chapter can be summarized as follows. 1) We study the problem of minimizing AoII in a multi-user system where imperfect CSI is available. Meanwhile, the time penalty function is generic.
2) We derive the structural properties of the optimal policy for the considered problem. 3) We establish the indexability of the considered problem under a simple condition and develop the Whittle's index policy. 4) We obtain the optimal policy for a relaxed version of the original problem. By exploring the characteristics of the relaxed problem, we provide an efficient algorithm to obtain this optimal policy. 5) Based on the optimal policy for the relaxed problem, we develop the indexed priority policy, which is free of indexability requirements and has comparable performance to the Whittle's index policy.

This chapter summarizes the work in [64], and the rest is organized as follows. Section 3.2 introduces the system model and formulates the primal problem. In Section 3.3, we explore the structural properties of the optimal policy for the primal problem. Under a simple condition, we develop the Whittle's index policy in Section 3.4. Section 3.5 presents the optimal policy for a relaxed version of the primal problem. On this basis, we develop the indexed priority policy in Section 3.6. Finally, in Section 3.7, the numerical results are laid out.

[Figure 3.1: The structure of the system model.]

3.2 System Overview

3.2.1 System Model

We consider a slotted-time system with N users and one base station. Each user has a source process, a channel, and a receiver. We assume all the users share the same structure, but the parameters differ. The structure of the communication model is provided in Figure 3.1. For user i, the source process is modeled by a two-state Markov chain where transitions happen between the two states with probability $p_i > 0$ and self-transitions happen with probability $1 - p_i$.
At any time slot $t$, the state of the source process $X_{i,t} \in \{0, 1\}$ is reported to the base station as an update, and the base station decides whether to transmit this update through the corresponding channel. The channel is unreliable, but an estimate of the CSI is available at the beginning of each time slot. Let $r_{i,t} \in \{0, 1\}$ be the CSI at time $t$. We assume that $r_{i,t}$ is independent across time and user indices. $r_{i,t} = 1$ if and only if a transmission attempt at time $t$ would succeed, and $r_{i,t} = 0$ otherwise. Then, we denote by $\hat{r}_{i,t} \in \{0, 1\}$ the estimate of $r_{i,t}$. We assume that $\hat{r}_{i,t}$ is an independent Bernoulli random variable with parameter $\gamma_i$, i.e., $\hat{r}_{i,t} = 1$ with probability $\gamma_i \in [0, 1]$ and $\hat{r}_{i,t} = 0$ with probability $1 - \gamma_i$. However, the estimate is imperfect. We assume that the error depends only on the user and its estimate. More precisely, we define the probability of error as $p^{\hat{r}_i}_{e,i} \triangleq \Pr[r_i \neq \hat{r}_i \mid \hat{r}_i]$. We assume $p^{\hat{r}_i}_{e,i} < 0.5$, because we can flip the estimate if $p^{\hat{r}_i}_{e,i} > 0.5$. We are not interested in the case of $p^{\hat{r}_i}_{e,i} = 0.5$, since $\hat{r}_{i,t}$ is useless in this case. Although the channel is unreliable, each transmission attempt takes exactly one time slot regardless of the result, and a successfully transmitted update is not corrupted. Every time an update is received, the receiver uses it as the new estimate $\hat{X}_{i,t} \in \{0, 1\}$. The receiver sends an ACK/NACK packet to inform the base station of its reception of the new update. Since an ACK/NACK packet is generally very small and simple, we assume that it is transmitted reliably and received instantaneously. Then, if an ACK is received, the base station knows that the receiver's estimate changed to the transmitted update. If a NACK is received, the base station knows that the receiver's estimate did not change. Therefore, the base station always knows the estimate at the receiver side.
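The CSI model above, with an estimate drawn first and the true channel state then differing from it with an estimate-dependent error probability, can be sketched as follows (a sampling sketch under the stated assumptions; the function name is illustrative).

```python
import random

def sample_csi(gamma, pe1, pe0, rng):
    """Draw one (r_hat, r) pair for a user (sketch of Section 3.2.1).

    gamma : Pr[r_hat = 1] (Bernoulli parameter of the estimate).
    pe1   : Pr[r != r_hat | r_hat = 1], pe0 likewise for r_hat = 0;
            both assumed < 0.5 as in the model."""
    r_hat = 1 if rng.random() < gamma else 0
    pe = pe1 if r_hat == 1 else pe0
    r = r_hat ^ (rng.random() < pe)   # flip the estimate w.p. pe
    return r_hat, r
```

Because the error probability is conditioned on the estimate, the marginal reliability of the channel is $\gamma(1-p^1_e) + (1-\gamma)p^0_e$, which the base station can exploit when deciding whom to schedule.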
At the beginning of each time slot, the base station receives an update from each source and a CSI estimate for each channel. Old updates and estimates are discarded upon the arrival of new ones. Then, the base station decides which updates to transmit, and the decision is independent of the transmission history. Due to the limited resources, at most $M < N$ updates are allowed per transmission attempt. We consider a base station that always transmits $M$ updates.

All the users adopt AoII as the performance metric, but the choices of penalty functions may vary. We consider the case where the users adopt the same information penalty function $g(X_t, \hat{X}_t) = |X_t - \hat{X}_t|$ but possibly different time penalty functions. To ease the analysis, we require $f(t)$ to be unbounded. Combined with the monotonicity requirement on the time penalty function, we require $f(t_1) \le f(t_2)$ for $t_1 < t_2$ and $\lim_{t\to+\infty} f(t) = +\infty$. Without loss of generality, we assume $f(0) = 0$. Meanwhile, as the source is modeled by a two-state Markov chain, $g(X_t, \hat{X}_t) \in \{0, 1\}$. Hence, the AoII can be written as
$\Delta_{AoII}(X_t, \hat{X}_t, t) = \sum_{k=U_t+1}^{t} F(k - U_t) = f(s_t)$,
where $U_t$ is defined in Section 1.2.2 and $s_t \triangleq t - U_t$. Therefore, the evolution of $s_t$ is sufficient to characterize the evolution of the AoII. To this end, we distinguish between the following cases.
• When the receiver's estimate is correct at time $t+1$, we have $U_{t+1} = t+1$. Then, by definition, $s_{t+1} = 0$.
• When the receiver's estimate is incorrect at time $t+1$, we have $U_{t+1} = U_t$. Then, by definition, $s_{t+1} = t + 1 - U_t = s_t + 1$.
To sum up, we get
$s_{t+1} = \mathbb{1}\{U_{t+1} \neq t+1\}(s_t + 1)$. (3.1)
In the remainder of this chapter, we use $f_i(\cdot)$ to denote the time penalty function user $i$ adopts.
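The recursion (3.1) is easy to simulate. The sketch below tracks a single user with no transmissions (the receiver's estimate stays frozen), so $s_t$ resets exactly when the flipping source happens to return to the estimate; the function name and default penalty are illustrative.

```python
import random

def simulate_aoii(p, T, f=lambda s: s, seed=0):
    """Average time penalty f(s_t) over T slots for one user with a frozen
    receiver estimate (sketch of recursion (3.1)). The two-state source
    flips with probability p each slot."""
    rng = random.Random(seed)
    X, X_hat, s, total = 0, 0, 0, 0.0
    for _ in range(T):
        if rng.random() < p:
            X ^= 1                            # source transition
        s = 0 if X == X_hat else s + 1        # recursion (3.1)
        total += f(s)
    return total / T
```

For $p = 0.5$ the source state is i.i.d. uniform, so mismatch slots occur with probability 1/2 independently and the long-run average of $s_t$ (with $f(s) = s$) is 1, which makes a convenient sanity check.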
Then, the system dynamics can be fully characterized by the dynamics of $x_t \triangleq (x_{1,t}, ..., x_{N,t})$. Hence, it suffices to characterize the value of $x_{t+1}$ given $x_t$ and the base station's action. To this end, we denote by $a_t = (a_{1,t}, ..., a_{N,t})$ the base station's action at time $t$: $a_{i,t} = 1$ if the base station transmits the update from user $i$ at time $t$, and $a_{i,t} = 0$ otherwise. We notice that, given action $a_t$, the users are independent, and the action taken on user $i$ affects only that user. Consequently,
$\Pr(x_{t+1} \mid x_t, a_t) = \prod_{i=1}^{N} \Pr(x_{i,t+1} \mid x_{i,t}, a_t) = \prod_{i=1}^{N} \Pr(x_{i,t+1} \mid x_{i,t}, a_{i,t})$.
Combined with the fact that all the users share the same structure, it is sufficient to study the dynamics of a single user. In the following discussion, we drop the user-dependent subscript $i$. We recall that $\hat{r}_{t+1}$ is an independent Bernoulli random variable. Then, we have
$\Pr(x_{t+1} \mid x_t, a_t) = P(\hat{r}_{t+1}) \times \Pr(s_{t+1} \mid x_t, a_t)$.
By definition, $P(\hat{r}_{t+1} = 1) = \gamma$ and $P(\hat{r}_{t+1} = 0) = 1 - \gamma$. Then, we only need to determine the value of $\Pr(s_{t+1} \mid x_t, a_t)$. To this end, we distinguish between the following cases.
• When $x_t = (0, \hat{r}_t)$, the estimate at time $t$ is correct (i.e., $\hat{X}_t = X_t$). Hence, for the receiver, $X_t$ carries no new information about the source process. In other words, $\hat{X}_{t+1} = \hat{X}_t$ regardless of whether an update is transmitted at time $t$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. Since the source is binary, we obtain $U_{t+1} = U_t$ if $X_{t+1} \neq X_t$, which happens with probability $p$, and $U_{t+1} = t+1$ otherwise. According to (3.1), we obtain
$\Pr(1 \mid (0, \hat{r}_t), a_t) = p$, $\Pr(0 \mid (0, \hat{r}_t), a_t) = 1 - p$.
• When $a_t = 0$ and $x_t = (s_t, \hat{r}_t)$ with $s_t > 0$, the channel is not used and no new update is received by the receiver, and so $\hat{X}_{t+1} = \hat{X}_t$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. Since $X_t \neq \hat{X}_t$ and the source is binary, we have $U_{t+1} = U_t$ if $X_{t+1} = X_t$, which happens with probability $1 - p$, and $U_{t+1} = t+1$ otherwise.
According to (3.1), we obtain
$\Pr(s_t + 1 \mid (s_t, \hat{r}_t), a_t = 0) = 1 - p$, $\Pr(0 \mid (s_t, \hat{r}_t), a_t = 0) = p$.
• When $a_t = 1$ and $x_t = (s_t, 1)$ with $s_t > 0$, the transmission attempt succeeds with probability $1 - p^1_e$ and fails with probability $p^1_e$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. When the transmission attempt succeeds (i.e., $\hat{X}_{t+1} = X_t$), $U_{t+1} = U_t$ if $X_{t+1} \neq X_t$ and $U_{t+1} = t+1$ otherwise. When the transmission attempt fails (i.e., $\hat{X}_{t+1} = \hat{X}_t \neq X_t$), we have $U_{t+1} = U_t$ if $X_{t+1} = X_t$ and $U_{t+1} = t+1$ otherwise. Combining (3.1) with the dynamics of the source process, we obtain
$\Pr(s_t + 1 \mid (s_t, 1), a_t = 1) = p^1_e(1-p) + (1-p^1_e)p \triangleq \alpha$,
$\Pr(0 \mid (s_t, 1), a_t = 1) = p^1_e p + (1-p^1_e)(1-p) = 1 - \alpha$.
• When $a_t = 1$ and $x_t = (s_t, 0)$ with $s_t > 0$, following the same line of reasoning, we obtain
$\Pr(s_t + 1 \mid (s_t, 0), a_t = 1) = p^0_e p + (1-p^0_e)(1-p) \triangleq \beta$,
$\Pr(0 \mid (s_t, 0), a_t = 1) = p^0_e(1-p) + (1-p^0_e)p = 1 - \beta$.
Combining all cases, we obtain the value of $\Pr(s_{t+1} \mid x_t, a_t)$. As only $M$ out of $N$ updates are allowed per transmission attempt, it is natural to require that transmission attempts always help minimize the AoII. This is equivalent to imposing
$\Pr(s_{t+1} > s_t \mid (s_t, \hat{r}_t), a_t = 0) > \Pr(s_{t+1} > s_t \mid (s_t, \hat{r}_t), a_t = 1)$
for any $(s_t, \hat{r}_t)$. Leveraging the results above, it is sufficient to require $p < 0.5$. As all the users share the same structure, we assume, for the rest of this chapter, that $0 < p_i < 0.5$ for $1 \le i \le N$.

3.2.3 Problem Formulation

The communication goal is to minimize the expected AoII. Therefore, the problem can be formulated as follows:
$\arg\min_{\phi \in \Phi} \lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}_\phi\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N} f_i(s_{i,t})\Big)$ subject to $\sum_{i=1}^{N} a_{i,t} = M$ $\forall t$, (3.2)
where $\Phi$ is the set of all causal policies. We refer to the constrained minimization problem in (3.2) as the primal problem (PP). We notice that the PP is a restless multi-armed bandit (RMAB) problem. The optimal policy for this type of problem is generally out of reach, since the problem is PSPACE-hard [65].
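The per-user quantities $\alpha$ and $\beta$ derived in Section 3.2.2 can be collected in a few lines (a sketch; the function name is illustrative). Note that $p < 0.5$ and $p^{\hat r}_e < 0.5$ make both quantities smaller than $1 - p$, which is exactly the "transmission always helps" condition.

```python
def transmit_growth_probs(p, pe1, pe0):
    """Probability that the age s grows when transmitting (sketch).

    p   : source transition probability (assumed < 0.5).
    pe1 : Pr[error | r_hat = 1]; pe0 : Pr[error | r_hat = 0].
    Returns (alpha, beta): growth probability under CSI estimate 1 and 0.
    Without a transmission, the age grows with probability 1 - p."""
    alpha = pe1 * (1 - p) + (1 - pe1) * p   # a_t = 1, r_hat = 1
    beta = pe0 * p + (1 - pe0) * (1 - p)    # a_t = 1, r_hat = 0
    return alpha, beta
```

Algebraically, $\beta = (1-p) - p^0_e(1-2p)$, so transmitting on a pessimistic estimate still strictly beats staying idle whenever $p^0_e > 0$ and $p < 0.5$.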
However, we can still derive the structural properties of the optimal policy. These structural properties can guide the development of scheduling policies and can indicate the good performance of the developed scheduling policies.

3.3 Structural Properties of the Optimal Policy

In this section, we investigate the structural properties of the optimal policy for the PP. We first define an infinite-horizon average-cost Markov decision process (MDP) $\mathcal{M}^N(w, M) = (\mathcal{X}^N, \mathcal{A}^N(M), \mathcal{P}^N, C^N(w))$, where
• $\mathcal{X}^N$ denotes the state space. The state is $x = (x_1, ..., x_N)$, where $x_i = (s_i, \hat{r}_i)$.
• $\mathcal{A}^N(M)$ denotes the action space. A feasible action is $a = (a_1, ..., a_N)$, where $a_i \in \{0, 1\}$ and $\sum_{i=1}^{N} a_i = M$. Note that the feasible actions are independent of the state and the time.
• $\mathcal{P}^N$ denotes the state transition probabilities. We define $P_{x,x'}(a)$ as the probability that action $a$ at state $x$ leads to state $x'$. It is calculated by
$P_{x,x'}(a) = \prod_{i=1}^{N} P(\hat{r}'_i)\,P_{s_i,s'_i}(a_i, \hat{r}_i)$,
where $P_{s_i,s'_i}(a_i, \hat{r}_i)$ is the transition probability from $s_i$ to $s'_i$ when the CSI estimate is $\hat{r}_i$ and action $a_i$ is taken. The values of $P_{s_i,s'_i}(a_i, \hat{r}_i)$ follow directly from the results in Section 3.2.2.
• $C^N(w)$ denotes the instant cost. When the system is at state $x$ and action $a$ is taken, the instant cost is $C(x, a) \triangleq \sum_{i=1}^{N} C(x_i, a_i) \triangleq \sum_{i=1}^{N} \big(f_i(s_i) + w a_i\big)$.
We notice that the PP can be cast as $\mathcal{M}^N(0, M)$. Since $w = 0$, the instant cost is independent of the action $a$; therefore, we abbreviate $C(x, a)$ as $C(x)$. To simplify the analysis, we consider the case of $M = 1$; equivalently, we investigate the structural properties of the optimal policy for $\mathcal{M}^N(0, 1)$.

Remark 3. For the case of $M > 1$, we can apply the same methodology. However, as $M$ increases, the action space grows quickly, requiring more feasible actions to be considered in each step of the proof. Hence, to better demonstrate the methodology, we only consider the case of $M = 1$ in this chapter.
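The product form of $\mathcal{P}^N$ means the joint kernel never has to be stored: a joint transition probability is computed on the fly from per-user factors. A minimal sketch (all names hypothetical; `per_user` stands for the per-user kernel $P_{s,s'}(a, \hat r)$ of Section 3.2.2):

```python
def joint_transition_prob(per_user, gamma, x, x_next, a):
    """Probability of moving from joint state x to x_next under action a,
    using the product form of P^N (sketch).

    per_user(s, s2, a_i, r_hat) : per-user kernel P_{s,s'}(a_i, r_hat).
    gamma[i]                    : Pr[r_hat_i = 1] for user i.
    x, x_next                   : lists of (s_i, r_hat_i) pairs."""
    prob = 1.0
    for i, ((s, r_hat), (s2, r_hat2)) in enumerate(zip(x, x_next)):
        p_rhat2 = gamma[i] if r_hat2 == 1 else 1 - gamma[i]
        prob *= p_rhat2 * per_user(s, s2, a[i], r_hat)
    return prob
```

This factorization is what keeps the relaxed, per-user analysis in the later sections tractable even though the joint state space grows exponentially in N.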
It is well known that the optimal policy for $\mathcal{M}^N(0, 1)$ can be characterized by the value function. We denote the value function of state $x$ by $V(x)$. A canonical procedure for calculating $V(x)$ is the value iteration algorithm (VIA). To this end, we define $V_\nu(\cdot)$ as the estimated value function at iteration $\nu$ of VIA and initialize $V_0(\cdot) = 0$. Then, VIA updates the estimated value functions as follows:
$V_{\nu+1}(x) = C(x) - \theta + \min_{a \in \mathcal{A}^N(1)} \Big\{ \sum_{x' \in \mathcal{X}^N} P_{x,x'}(a)\,V_\nu(x') \Big\}$, (3.3)
where $\theta$ is the optimal value of $\mathcal{M}^N(0, 1)$. VIA is guaranteed to converge to the value function [51]; more precisely, $V_\nu(\cdot) \to V(\cdot)$ as $\nu \to +\infty$. However, the exact value function is impossible to obtain, since we would need infinitely many iterations and the state space is infinite. Instead, we provide two structural properties of the value function.

Lemma 2. For $\mathcal{M}^N(0, 1)$, $V(x)$ is non-decreasing in $s_i$ for $1 \le i \le N$.

Proof. Leveraging the iterative nature of VIA, we use mathematical induction to prove the desired result. The complete proof can be found in Appendix B.1.

Before introducing the next structural property, we make the following definition.

Definition 1 (Statistically identical). Two users are said to be statistically identical if the user-dependent parameters and the adopted time penalty functions are the same.

For users that are statistically identical, we can prove the following lemma.

Lemma 3. For $\mathcal{M}^N(0, 1)$, if users $j$ and $k$ are statistically identical, then $V(x) = V(\mathcal{P}(x))$, where $\mathcal{P}(x)$ is state $x$ with $x_j$ and $x_k$ exchanged.

Proof. Leveraging the iterative nature of VIA, we use mathematical induction to prove the desired result. At each iteration, we show that, for each feasible action at state $x$, we can find an equivalent action at state $\mathcal{P}(x)$; two actions are equivalent if they lead to the same value function. The complete proof can be found in Appendix B.2.

Equipped with the above lemmas, we proceed to characterize the structural properties of the optimal policy.
We recall that the optimal action at each state can be characterized by the value function. Hence, we denote by $V^j(x)$ the value function resulting from choosing user $j$ to update at state $x$. Then, $V^j(x)$ can be calculated by
$V^j(x) = C(x) - \theta + \sum_{x' \setminus x'_j} \Big(\prod_{i \neq j} P_{x_i,x'_i}(0)\Big) \sum_{\hat{r}'_j} P(\hat{r}'_j) \sum_{s'_j} P_{s_j,s'_j}(1, \hat{r}_j)\,V(x')$.
If $V^j(x) < V^k(x)$ for all $k \neq j$, it is optimal to transmit the update from user $j$. When $V^j(x) = V^k(x)$, the two choices are equally desirable. In the following, we characterize the properties of $\delta_{j,k}(x) \triangleq V^j(x) - V^k(x)$ for any $j$ and $k$.

Theorem 3. For $\mathcal{M}^N(0, 1)$, $\delta_{j,k}(x)$ has the following properties:
1. $\delta_{j,k}(x) \le 0$ if $\hat{r}_k = p^0_{e,k} = 0$. The equality holds when $s_j = 0$ or $\hat{r}_j = p^0_{e,j} = 0$.
2. $\delta_{j,k}(x)$ is non-increasing in $\hat{r}_j$ and non-decreasing in $\hat{r}_k$ when $s_j, s_k > 0$. At the same time, $\delta_{j,k}(x)$ is independent of $\hat{r}_i$ for any $i \neq j, k$.
3. $\delta_{j,k}(x) \le 0$ if $s_k = 0$. The equality holds when $s_j = 0$ or $\hat{r}_j = p^0_{e,j} = 0$.
4. $\delta_{j,k}(x)$ is non-increasing in $s_j$ if $\Gamma^{\hat{r}_j}_j \le \Gamma^{\hat{r}_k}_k$ and non-decreasing in $s_k$ if $\Gamma^{\hat{r}_j}_j \ge \Gamma^{\hat{r}_k}_k$ when $s_j, s_k > 0$. We define $\Gamma^1_i \triangleq \frac{\alpha_i}{1-p_i}$ and $\Gamma^0_i \triangleq \frac{\beta_i}{1-p_i}$ for $1 \le i \le N$.
5. $\delta_{j,k}(x) \le 0$ if $s_j \ge s_k$, $\hat{r}_j \ge \hat{r}_k$, and users $j$ and $k$ are statistically identical.

Proof. The proof can be found in Appendix B.3.

We notice that $\Gamma^{\hat{r}_i}_i$ can be written as
$\Gamma^{\hat{r}_i}_i = \frac{\Pr(s_i + 1 \mid (s_i, \hat{r}_i), a_i = 1)}{\Pr(s_i + 1 \mid (s_i, \hat{r}_i), a_i = 0)} < 1$,
where $s_i$ can be any positive integer. Consequently, $\Gamma^{\hat{r}_i}_i$ is independent of any $s_i > 0$ and indicates the decrease in the probability of increasing $s_i$ caused by action $a_i = 1$. When $\Gamma^{\hat{r}_i}_i$ is large, action $a_i = 1$ achieves only a small decrease in the probability of increasing $s_i$. In the following, we provide an intuitive interpretation of why the monotonicity in Property 4 of Theorem 3 depends on $\Gamma^{\hat{r}_i}_i$. We take the case of $\Gamma^{\hat{r}_j}_j \le \Gamma^{\hat{r}_k}_k$ as an example and assume that there are only users $j$ and $k$ in the system.
Then, according to Section 3.2.2, the dynamics of $s_j$ and $s_k$ can be divided into the following three cases:
• Neither $s_j$ nor $s_k$ increases. In this case, both $s_j$ and $s_k$ become zero.
• Either $s_j$ or $s_k$ increases, and the other becomes zero. We denote by $P^k_j$ the probability that only $s_k$ increases when $a_j = 1$. The notation for the other cases is defined analogously. The probabilities can be obtained easily using the