ABSTRACT

Title of Dissertation: AGE OF INCORRECT INFORMATION: A NEW PERFORMANCE METRIC IN SEMANTIC COMMUNICATIONS

Yutao Chen, Doctor of Philosophy, 2023

Dissertation Directed by: Professor Anthony Ephremides, Department of Electrical and Computer Engineering

With the increasing popularity of smart devices and the rapid development of networking and communication technologies, cyber-physical system applications have been widely deployed and are receiving increasing attention. Some examples of these systems include vehicle networks, where vehicles collect real-time external information through their on-board sensors and cameras to generate a reliable description of the surroundings; intelligent transportation systems, where real-time monitoring of road conditions and traffic congestion is essential; and natural or man-made disaster prevention and management, where real-time monitoring of warning signs and disaster propagation is crucial. A common feature of these systems is the stringent requirement on the timeliness of the acquired information, which has led to the development of optimization frameworks aimed at capturing information freshness. Age of Information (AoI) is a prime example, but it has the drawback of considering only information freshness while ignoring the importance of content. As a result, the Age of Incorrect Information (AoII) has been developed to capture both the freshness and the content of information. In this dissertation, we study the fundamental nature and optimization of AoII in a variety of systems.

With the proliferation of smart devices, energy consumption has become a major concern. In the first part, we focus on the characteristics and performance of AoII under limited resources. In particular, we propose an efficient algorithm to obtain the AoII-optimal policy under resource-constrained conditions and compute the performance of the optimal policies. The massive connectivity of modern communication systems has made scheduling a hot research topic.
In the second part, we analyze and optimize the performance of AoII in the scheduling problem. We present Whittle's index policy for AoII, whose superior performance has been recognized in many other problems. However, it also has limitations. Therefore, we propose a new scheduling policy, the indexed priority policy, which has performance comparable to Whittle's index policy but broader applicability. With the unprecedented increase in the amount and variety of data to be transmitted and the impact of external factors such as urban construction, data transmission will experience numerous uncertainties. Therefore, in the third part, we study the characteristics and optimization of AoII in an environment with random delays. Specifically, in the first half, we consider the case where the communication channel suffers from a random delay. In the second half, we build on the first half and consider the case where the transmitter has preemption capability. For both halves, we precisely compute the performance of some canonical policies and theoretically characterize the optimal policies, which lays the foundation for further generalization and application of AoII.

AGE OF INCORRECT INFORMATION: A NEW PERFORMANCE METRIC IN SEMANTIC COMMUNICATIONS

by Yutao Chen

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2023

Advisory Committee:
Professor Anthony Ephremides, Chair/Advisor
Professor Sennur Ulukus
Professor Prakash Narayan
Professor Behtash Babadi
Professor Ashok Agrawala

© Copyright by Yutao Chen 2023

Dedication

This dissertation is dedicated to my family for their unwavering and boundless support.

Acknowledgments

Foremost, I would like to thank my advisor, Dr. Ephremides, whose support and guidance have been the cornerstone of my academic journey.
As a beacon of scholarly excellence, he has exemplified the traits of a good researcher and has been a constant guide and role model for my future endeavors. I still remember how nervous I was when I first met him, and how overwhelmed I was to become the student of such a prestigious scholar. However, his amiable personality dispelled my worries and strengthened my resolve to emulate him in the future. I am always grateful for his mentorship, which has left an indelible mark on my life.

I am also grateful to all the scholars who have helped me over the years. Whether our paths have crossed or not, their advice is one of the things that motivates me to work hard and move forward. I also thank all the members of my defense committee, whose constructive feedback has helped me refine my dissertation.

Next, I must express my heartfelt appreciation to my family, despite the distance that separates us. Without their support and understanding, it would have been difficult for me to pursue my personal goals with determination. Their resolute backing of every decision I made, even when the path was unclear, has been a vital source of strength and determination. Once again, I would like to express my highest respect and gratitude to my family.

Finally, I would like to thank my friends, both in my home country and here. They have been invaluable companions on this journey of life and have helped me in countless ways, both academically and personally. They have shared in my joys and sorrows, and their unwavering camaraderie is an asset that I cherish. To them, I offer my most profound thanks and deep respect.

I would be remiss if I failed to acknowledge the mysterious force of luck that has been a fortuitous presence in my life. Its timely intervention during critical moments has been a crucial element in shaping who I am today.
To luck, I extend my deepest appreciation, and I sincerely hope that its influence will continue to grace my life.

Table of Contents

Dedication
Acknowledgments
Table of Contents
List of Tables
List of Figures

Chapter 1: Introduction
    1.1 Background and Motivation
    1.2 Semantic Communication
        1.2.1 Information Freshness
        1.2.2 Age of Incorrect Information
    1.3 Research Trends
    1.4 Outline and Contribution

Chapter 2: Freshness under Limited Resources
    2.1 Overview
    2.2 System Overview
        2.2.1 System Model
        2.2.2 System Dynamic
        2.2.3 Problem Formulation
    2.3 Problem Optimization
        2.3.1 Lagrangian Approach
        2.3.2 Structural Results
        2.3.3 Finite-State MDP Approximation
        2.3.4 Expected Transmission Rate
        2.3.5 Optimal Policy
    2.4 Numerical Results
    2.5 Conclusion

Chapter 3: Scheduling for Freshness
    3.1 Overview
    3.2 System Overview
        3.2.1 System Model
        3.2.2 System Dynamic
        3.2.3 Problem Formulation
    3.3 Structural Properties of the Optimal Policy
    3.4 Whittle's Index Policy
        3.4.1 Relaxed Problem
        3.4.2 Decoupled Model
        3.4.3 Indexability
        3.4.4 Whittle's Index Policy
    3.5 Optimal Policy for Relaxed Problem
        3.5.1 Optimal Policy for Single User
        3.5.2 Optimal Policy for RP
    3.6 Indexed Priority Policy
        3.6.1 Primal-Dual Heuristic
        3.6.2 Indexed Priority Policy
    3.7 Numerical Results
    3.8 Conclusions

Chapter 4: Freshness against Generic Delay
    4.1 Overview
    4.2 System Overview
        4.2.1 System Model
        4.2.2 System Dynamic
        4.2.3 Problem Formulation
    4.3 MDP Characterization
        4.3.1 State Transition Probability
    4.4 Policy Performance Analysis
        4.4.1 Multi-step State Transition Probability
        4.4.2 Stationary Distribution
        4.4.3 Expected AoII
    4.5 Optimal Policy
        4.5.1 Existence of the Optimal Policy
        4.5.2 Value Iteration Algorithm
        4.5.3 Policy Iteration Algorithm
        4.5.4 Simplifying the Bellman Equation
        4.5.5 Optimality Proof
    4.6 Numerical Results
        4.6.1 Verification of Condition 1
        4.6.2 Performance of the Optimal Policy
    4.7 Conclusion

Chapter 5: Freshness with Preemption
    5.1 Overview
    5.2 System Overview
        5.2.1 System Model
        5.2.2 Problem Formulation
    5.3 MDP Characterization
    5.4 Strong Preemptive Policy
    5.5 Optimal Policy
        5.5.1 Existence of the Optimal Policy
        5.5.2 Value Iteration Algorithm
        5.5.3 Policy Iteration Algorithm
        5.5.4 Geometric Delay
        5.5.5 Zipf Delay
    5.6 Numerical Results
        5.6.1 Verification of Condition 2
        5.6.2 Performance of the Optimal Policy
    5.7 Conclusion

Chapter 6: Conclusion and Outlook
    6.1 Conclusion
    6.2 Outlook

Appendix A: Freshness under Limited Resources
    A.1 Proof of Lemma 1
    A.2 Proof of Proposition 1
    A.3 Proof of Theorem 1
    A.4 Proof of Proposition 2
    A.5 Proof of Corollary 1
    A.6 Proof of Corollary 2
    A.7 Proof of Theorem 2

Appendix B: Scheduling for Freshness
    B.1 Proof of Lemma 2
    B.2 Proof of Lemma 3
    B.3 Proof of Theorem 3
    B.4 Proof of Corollary 4
    B.5 Proof of Proposition 3
    B.6 Proof of Proposition 4
    B.7 Proof of Proposition 6
    B.8 Proof of Proposition 7
    B.9 Proof of Theorem 4
    B.10 Proof of Proposition 8
    B.11 Proof of Theorem 5
    B.12 Proof of Proposition 10

Appendix C: Freshness against Generic Delay
    C.1 Details of State Transition Probability
    C.2 Proof of Lemma 4
    C.3 Proof of Lemma 5
    C.4 Proof of Theorem 6
    C.5 Proof of Corollary 6
    C.6 Proof of Lemma 6
    C.7 Proof of Theorem 7
    C.8 Proof of Theorem 8
    C.9 Proof of Lemma 8
    C.10 Proof of Theorem 9
    C.11 Proof of Theorem 10
    C.12 Proof of Theorem 11
    C.13 Proof of Theorem 12

Appendix D: Freshness with Preemption
    D.1 Details of State Transition Probability
    D.2 Proof of Lemma 9
    D.3 Proof of Corollary 7
    D.4 Proof of Lemma 11
    D.5 Proof of Theorem 14
    D.6 Proof of Theorem 15
    D.7 Proof of Theorem 17

Bibliography

List of Tables
    2.1 Optimal thresholds for different p
    2.2 Optimal thresholds for different ps
    5.1 Condition 2 Verification

List of Figures
    1.1 An illustration of the status update system.
    1.2 Sample paths of ∆AoII(Xt, X̂t, t). In the figure, Xt and X̂t are the state of the dynamic source and the receiver's estimate at time slot t, respectively. Gi and Di are the sampling time and delivery time of the i-th update, respectively. Note that the sampling decisions in the graph are random. We assume that sampling occurs at the beginning of a time slot and the update arrives at the end of a time slot.
    2.1 Illustrations of the Markovian source.
    2.2 Illustrations of the evolution of d.
    2.3 Illustrations of the AoII-optimal policy and the AoI-optimal policy. The truncation parameter in ASM is m = 800 and the tolerance in the bisection search is ξ = 0.01. RVI converges when the maximum difference between the results of two consecutive iterations is less than ϵ = 0.01.
    3.1 The structure of the system model.
    3.2 Performance when the source processes vary. We choose pi = 0.05 + 0.4(i−1)/(N−1), fi(s) = s, γi = 0.6, p0e,i = p0e, and p1e,i = 0.1 for 1 ≤ i ≤ N.
    3.3 Performance when the communication goals vary. We choose fi(s) = s^(0.5 + (i−1)/(N−1)), pi = 0.3, γi = 0.6, p0e,i = p0e, and p1e,i = 0.1 for 1 ≤ i ≤ N.
    3.4 Performance in systems with random parameters when N = 5. The parameters for each user are chosen randomly within the following intervals: γ ∈ [0, 1], p ∈ [0.05, 0.45], p0e ∈ I, p1e ∈ [0, 0.45], and f(s) = s^τ where τ ∈ [0.5, 1.5].
    4.1 An illustration of the system model, where Xk and X̂k are the state of the dynamic source and the receiver's estimate at time slot k, respectively.
    4.2 Illustrations of the expected AoII as a function of p and τ. We set the upper limit of the transmission time tmax = 5 and the success probability in the Geometric distribution ps = 0.7.
    4.3 Illustrations of the expected AoII as a function of ps and τ. We set the upper limit of the transmission time tmax = 5 and the source dynamic p = 0.35.
    4.4 Illustrations of the expected AoII as a function of tmax and τ. We set the success probability in the Geometric distribution ps = 0.7 and the source dynamic p = 0.35.
    5.1 An illustration of the system model.
    5.2 A visual representation of the numerical check of Condition 2 when the transmission delay follows the Zipf distribution with a = 2.25, under different tmax and p. In the figure, a check mark indicates that Condition 2 is verified, and a cross indicates that it is not.
    5.3 The performance comparison when the transmission delay follows the Geometric distribution. In this case, there are two system parameters: the Markovian source dynamics p and the success probability ps in the Geometric distribution. Therefore, we fix one of the parameters and plot the corresponding results when the other parameter varies.
    5.4 The performance comparison when the transmission delay follows the Zipf distribution. In this case, there are three system parameters: the Markovian source dynamics p, the constant a in the Zipf distribution, and the upper bound on the transmission time tmax. We fix two of these parameters in the calculations, then vary the remaining one and plot the corresponding results.
    B.1 Illustration of the DTMC induced by the threshold policy n = (n0, n1). In the figure, c1 = (1 − γ)(1 − p) + γα and c2 = (1 − γ)β + γα.

Chapter 1: Introduction

1.1 Background and Motivation

With the development of communication technology, communication systems have become ubiquitous in all aspects of our daily lives, and are no longer used only for transmitting text, voice, and images [1–4].
This pervasiveness of communication systems and their expanding purposes require us to set higher performance requirements. However, as the requirements increase, the traditional optimization frameworks appear increasingly ineffectual. To illustrate, let us consider the status update system depicted in Figure 1.1. In this system, the dispatcher controls the transmission of updates over a communication channel to a remote destination to ensure the destination has the best possible real-time knowledge of the dynamic source.

Figure 1.1: An illustration of the status update system (Dynamic Source, Dispatcher, Channel, Destination).

The performance of the system depends on the chosen optimization framework. In traditional optimization frameworks, we choose to optimize metrics such as throughput and delay. However, these metrics are objective-blind; that is, they do not take into account the communication goals of the system. They assume that optimizing these metrics will always result in the best system performance. Unfortunately, using these traditional metrics can in fact result in reduced performance and wasted resources. To see this concretely, we again consider a time-sensitive application with the system model given by Figure 1.1, where the dispatcher is now an elementary queue and the dynamic source submits data at an adjustable rate. We note that the most critical task in time-sensitive applications is to guarantee the timeliness of the information at the receiving end. The most common optimization framework for throughput is to maximize it. To do so, we need to increase the data generation rate to the point where system resources are fully utilized. Increasing the data generation rate this way may sound appealing because resources are well spent.
However, this can cause problems in time-sensitive applications because a higher data generation rate can cause data to wait too long in the queue, which significantly ages the data. So blindly maximizing the throughput is not the best option. At the same time, delay causes the same dissatisfaction for the opposite reason. In delay minimization frameworks, we minimize the time it takes for data to arrive at its destination from the start of transmission. This is usually achieved by carefully adjusting the data generation rate so that the data spends the minimum amount of time in the queue. However, delays are incurred only when data is actually transmitted. Therefore, to reduce the average delay, the system can simply skip transmissions whose expected delay is too large. Reducing the delay by skipping transmissions results in too little data being transmitted, which harms the timeliness of the information at the receiving end.

One of the reasons these optimization frameworks fall short in time-sensitive applications is that these metrics treat all transmitted updates equally, ignoring the fact that not all updates are equally important to the receiver for the communication purpose. For example, in time-sensitive applications, an update after a long idle period is more important than an update after a period of sustained transmissions. Hence, adopting traditional metrics alone is not enough. To optimize communication performance, it is critical to consider other factors, such as the freshness and content of the information being transmitted. By taking these factors into account, we can achieve a more efficient communication system that better meets the specific communication purposes of the data exchange.

1.2 Semantic Communication

Awareness of these problems and the importance of solving them has led to new ways of designing and evaluating communication networks. One example is semantic communication.
According to [5], semantic communication is defined as "the provisioning of the right and significant piece of information to the right point of computation (or actuation) at the right time". However, traditional network optimization frameworks do not incorporate the semantics of information, which can be highly inefficient given the unprecedented growth in data exchange. Therefore, the first step in addressing this problem is to define the semantics of information. In [5], the authors argue that the semantics of information is "defined not necessarily as the meaning of the messages, but as their significance, possibly within a real-time constraint, relative to the purpose of the data exchange". Hence, in semantic communications, updates are treated differently based on their significance relative to the communication purpose. To achieve this vision, unlike the metrics considered in conventional optimization frameworks, the metrics in semantic communications should incorporate one or more semantic measures. The paper outlines three examples of semantic measures. The first is Freshness, which captures the timeliness of the information; this measure is crucial in applications that require real-time information, such as vehicular networks. The second is Relevance, which measures the change in a process since the last sample; this measure can alter the sampling strategies for remote estimation or tracking with low-power devices. The third is Value, which evaluates the benefit of having a particular sample compared to the cost of transmitting it; this measure is particularly applicable to control networks, where incorporating the value of information can help achieve the same level of performance with reduced data traffic. Overall, semantic communication marks a departure from traditional approaches to processing and transmitting data by incorporating the semantics of information.
1.2.1 Information Freshness

One of the most successful efforts to incorporate the semantics of information is the introduction of the Age of Information (AoI) in [6]. AoI captures the freshness of information by tracking the time elapsed since the generation of the last received update. Let V(t) be the most recent generation time among the updates received up to time t. Then, the AoI at time t is defined by

∆AoI(t) = t − V(t).

The age increases when no new updates are received, and the delivery of a new update can bring the age down. Therefore, AoI is small when the receiver has relatively timely information. Note that, under AoI, not all updates are equal. For example, when an update can significantly reduce the age at the receiver side, it becomes important and worth the extra resources to transmit. Since its introduction, AoI has attracted extensive attention and research [7–9]. In general, AoI analysis can be divided into the following cases.

• Time-average age: The average AoI over a time period T can be calculated by

⟨∆⟩_T = (1/T) ∫_0^T ∆AoI(t) dt.

In this case, there are two commonly used analysis tools. The first is a graphical method that divides the zigzag age dynamics into several trapezoids. The other leverages the notion of a stochastic hybrid system (SHS), a mathematical framework used to model complex systems that exhibit both continuous and discrete behavior.

• Peak age [10]: The peak age A_n is the age at the instant just before the delivery of the n-th update. The average peak age is defined by

∆^(p) = lim_{T→∞} (1/N(T)) Σ_{n=1}^{N(T)} A_n,

where N(T) is the number of peaks in the time period T. Peak age is more tractable than time-average AoI.

• Functions of age [11]: The age penalty function is denoted by p(∆AoI(t)), where p : [0, ∞) → R is non-decreasing. One can specify the age penalty function based on the application.
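These definitions can be made concrete in a short numerical sketch (not taken from the dissertation; the sample path below, with generation times G and delivery times D of four updates, is an arbitrary assumption). It samples ∆AoI(t) = t − V(t) on a grid and computes the time-average and peak ages:

```python
# A minimal sketch of the AoI quantities defined above.
# The sample path (G, D) is an assumed toy example.

def aoi_trajectory(G, D, T, dt=0.01):
    """Sample ∆AoI(t) = t − V(t), where V(t) is the largest generation
    time among updates delivered by time t (0 before the first delivery,
    i.e., the age is measured from t = 0)."""
    times, ages = [], []
    steps = round(T / dt) + 1
    for i in range(steps):
        t = i * dt
        delivered = [g for g, d in zip(G, D) if d <= t]
        V = max(delivered) if delivered else 0.0
        times.append(t)
        ages.append(t - V)
    return times, ages

def time_average_age(ages):
    # Riemann-sum approximation of (1/T) ∫ ∆AoI(t) dt on the uniform grid
    return sum(ages) / len(ages)

def peak_ages(G, D):
    # A_n: the age at the instant just before the n-th delivery
    peaks = []
    for d in D:
        before = [g for g, dd in zip(G, D) if dd < d]
        V = max(before) if before else 0.0
        peaks.append(d - V)
    return peaks

G = [0.0, 2.0, 5.0, 7.0]   # generation times of updates 1..4
D = [1.0, 4.0, 6.0, 9.0]   # delivery times (FCFS, so D[i] > G[i])
_, ages = aoi_trajectory(G, D, T=10.0)
print(time_average_age(ages))   # time-average AoI over [0, 10]
print(peak_ages(G, D))          # [1.0, 4.0, 4.0, 4.0]
```

Note how each peak equals the system time of the previous update plus the inter-delivery gap, which is exactly the quantity the trapezoid-based graphical method integrates.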
Equipped with the above analysis approaches, the majority of the work on AoI can be roughly divided into the following categories.

• AoI in queuing systems [6, 10, 12–17]: AoI is studied in a system with one or more queues, where sources submit updates according to stochastic processes.

• AoI under resource constraints [18–22]: AoI is examined in systems where the capability of the transmitter is constrained by external factors such as limited or unstable power supplies. Hence, the transmitter should choose wisely how to utilize the limited resources to maximize the freshness of the information.

• AoI in remote estimation [11, 23–25]: The core task of remote estimation is to reconstruct the real-time information of the transmitting end at the receiving end. Consequently, the timeliness of the information is important, and the relationship between estimation error and AoI has become a hot research direction.

• AoI in wireless networks [26–30]: AoI is studied in systems where the data is distributed over wireless networks. The literature mainly focuses on AoI analysis and optimization under conventional protocols. At the same time, age-based protocols have also been proposed and analyzed [31–35].

Although AoI captures well the timeliness of information, its limitation is that it ignores the information content of the transmitted updates. Therefore, it falls short in the context of remote estimation. For example, suppose we want to estimate a rapidly changing process remotely. A small AoI does not necessarily mean that the receiver has accurate information about the process. Likewise, when the process changes slowly, the receiver may not need very timely information to estimate it relatively accurately. It is shown in [23] that optimizing the AoI does not necessarily lead to the minimum estimation error. Thus, considering only the freshness of the information may lead to suboptimal performance in some applications.
At the same time, we recall that error-based metrics, such as real-time error, are typical for remote estimation tasks. Consequently, researchers have sought to establish a more sophisticated framework that combines freshness and error-based metrics, which brings us to the Age of Incorrect Information (AoII) introduced in [8].

1.2.2 Age of Incorrect Information

We consider a basic transmitter-receiver pair in a slotted-time system, where the transmitter observes a dynamic source and transmits updates over a communication channel to the remote receiver. In light of the shortcomings discussed in the previous subsection, a more sophisticated performance metric should capture the following two aspects of the system.

The first aspect is the purpose of the update transmission. Let us take remote estimation as an example for a more intuitive understanding. Generally, in remote estimation problems, the purpose of the update transmission is to provide the receiver with enough information to reconstruct the state of the remote source in real time. Thus, as long as the receiver has enough information for its reconstruction, we can consider the purpose of the update transmission achieved, and no further penalty needs to be paid. We note that AoI does not always capture this. For example, suppose we want to monitor the water level in a reservoir. The water level changes slowly, so old information may still correctly describe it, and we need not waste resources keeping the information at the receiver as fresh as possible. For a more formal elaboration, let Xt and X̂t be the information content of the dynamic source and of the receiver at time slot t, respectively. Then, we define the information penalty function as g(Xt, X̂t), where g : S × S′ → [0, +∞), and S and S′ are the state spaces of Xt and X̂t, respectively.
Then, we can choose g(Xt, X̂t) so that it quantifies the information inadequacy on the receiver side. Note that we define g(Xt, X̂t) ≜ 0 as the case where the receiver has enough information to achieve the communication purpose. In practice, we can choose different information penalty functions according to the sensitivity of different systems to information inadequacy. For example, g(Xt, X̂t) = |Xt − X̂t|, g(Xt, X̂t) = (Xt − X̂t)², g(Xt, X̂t) = 1{|Xt − X̂t| > c}, and so on. The second aspect is the impact of inadequate information on system performance over time. We notice that, for many real-world applications, the impact of information inadequacy on system performance accumulates over time. In other words, an old, persistent inadequacy may have more severe consequences than a new, short-term one. However, this is rarely captured by traditional performance metrics. For example, when we use real-time error as the performance metric, we focus on the estimation error of the moment and ignore its history, which can lead to huge costs because small errors that persist for a long time can cause as much damage to the system as a short-lived significant error. A real-world example of this is machine temperature monitoring, where the damage caused by overheating accumulates rapidly over time. Therefore, we introduce the time penalty function to capture the time effect. To this end, we first define Ut = max{k ≤ t : g(Xk, X̂k) = 0}. Hence, t − Ut captures the time elapsed since the last time the receiver had enough information to achieve the communication purpose. Because of the different definitions of Ut and Vt, there is a fundamental difference between AoI and t − Ut. Recall that AoI will continue to increase when no new update is received, and the successful delivery of a new update can bring down AoI. However, this is not true for t − Ut.
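The example penalty functions above are easy to write down concretely. A minimal Python sketch (the threshold c is an arbitrary illustrative value):

```python
# Example information penalty functions g(X_t, X_hat_t) from the text.
def g_abs(x, x_hat):
    """Absolute error penalty: |X_t - X_hat_t|."""
    return abs(x - x_hat)

def g_sq(x, x_hat):
    """Squared error penalty: (X_t - X_hat_t)^2."""
    return (x - x_hat) ** 2

def g_thresh(x, x_hat, c=1):
    """Indicator penalty: 1{|X_t - X_hat_t| > c}."""
    return 1 if abs(x - x_hat) > c else 0

# All three vanish exactly when the receiver's estimate is adequate.
print(g_abs(3, 3), g_sq(3, 3), g_thresh(3, 3))   # 0 0 0
print(g_abs(5, 2), g_sq(5, 2), g_thresh(5, 2))   # 3 9 1
```

Any of these plugs into the AoII definition below; the choice only determines how harshly a given mismatch is penalized.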
Note that the old information may still convey enough information for the receiver to accomplish the communication purpose, and the new update may still be insufficient due to the communication delay. Hence, the delivery of updates does not necessarily lead to a decrease or increase in the value of t − Ut. Then, on top of Ut, we define the time penalty function as f(t − Ut), where f(·) : [0, +∞) → [0, +∞) can be any non-decreasing function. We choose f(t − Ut) so that it captures the aging process of the inadequate information. In an application, we are free to choose which time penalty function to adopt based on the characteristics of the system under consideration. For example, f(t) = t², f(t) = log(1 + t), f(t) = eᵗ, and so on. Then, combined with the information penalty function introduced earlier, we are ready to present the formal definition of AoII. In a slotted-time system, the AoII is defined by

∆AoII(Xt, X̂t, t) = Σ_{k=Ut+1}^{t} g(Xk, X̂k) F(k − Ut),

where F(t) ≜ f(t) − f(t − 1) captures the increment of the time penalty. We notice that AoII increases if the receiver does not have enough information to achieve the communication purpose, and the increase is amplified by the degree of inadequacy. Hence, AoII captures both the inadequacy of the information and the aging process of the inadequate information. To better understand the metric, we consider the case where we want to estimate a two-state dynamic source remotely. Hence, Xt ∈ {0, 1} and X̂t ∈ {0, 1}. Moreover, we choose g(Xt, X̂t) = |Xt − X̂t| and f(t) = t. Then, AoII simplifies to ∆AoII(Xt, X̂t, t) = t − Ut. We note that for this particular choice of penalty functions, AoII is simply the time elapsed since the last time the receiver's estimate was correct. A sample path of ∆AoII(Xt, X̂t, t) in this case is depicted in Figure 1.2a. We also consider a more sophisticated setting, where the source process has eight states. More precisely, Xt ∈ {0, 1, ..., 7} and X̂t ∈ {0, 1, ..., 7}.
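The two-state simplification above can be checked numerically against the general definition. A minimal sketch (the sample trajectory is an arbitrary illustration, and slot 0 is assumed to be in sync so that Ut is well defined):

```python
def aoii(xs, xhats, g, f):
    """AoII at the last slot of trajectories xs, xhats, per the definition in
    the text. Slots are indexed 1..t; F(k) = f(k) - f(k-1)."""
    t = len(xs)
    # U_t: last slot k <= t with zero information penalty (slot 0 assumed in sync).
    U = max([0] + [k for k in range(1, t + 1) if g(xs[k - 1], xhats[k - 1]) == 0])
    F = lambda k: f(k) - f(k - 1)
    return sum(g(xs[k - 1], xhats[k - 1]) * F(k - U) for k in range(U + 1, t + 1))

# Two-state source with g = |x - xhat| and f(t) = t: AoII reduces to t - U_t.
xs    = [1, 0, 1, 0, 0, 1, 1]
xhats = [1, 1, 1, 1, 0, 0, 0]
g = lambda x, y: abs(x - y)
delta = aoii(xs, xhats, g, f=lambda t: t)
print(delta)  # 2: the estimate was last correct at slot 5, so AoII = 7 - 5 = 2
```

Swapping in f(t) = t² on the same trajectory gives 1·F(1) + 1·F(2) = 1 + 3 = 4, illustrating how the time penalty amplifies persistent mismatch.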
Meanwhile, we choose f(t) = t² and g(Xt, X̂t) = 1{|Xt − X̂t| ≥ 2}. A sample path of AoII with this setting is shown in Figure 1.2b. Unlike traditional metrics such as throughput and delay, AoII treats each update differently depending on how well the information it carries can achieve the communication purpose and on the current AoII of the system. For example, consider the case where an update carries enough information about the source and the current AoII of the system is high. In this case, it is worth spending extra resources to ensure the timely and accurate delivery of the update to the receiving end. At the same time, AoII also differs from AoI in that AoI ignores the information content of the transmitted updates. In AoII, the information content carried by the update strongly influences the evolution of AoII.

1.3 Research Trends

In [36], the authors consider a transmitter-receiver pair in a slotted-time system. In the setting, the updates are transmitted over an unreliable channel. The minimization of AoII for the transmitter with and without power constraints is studied. The power constraints limit the average number of transmissions a transmitter can initiate. When the transmitter is power-constrained, the results show that the optimal policy is a mixture of two threshold policies, where a threshold policy is one under which the transmitter transmits the update only when the AoII exceeds the threshold. Then, the authors extend the results to the case of the generic time penalty function in [37]. Similar results are obtained in this case. Moreover, three real-life applications are studied to highlight the performance advantages of AoII over AoI and real-time estimation error.

[Figure 1.2a: sample path of ∆AoII for a two-state source with g(Xt, X̂t) = |Xt − X̂t| and f(t) = t.]
[Figure 1.2b: sample path of ∆AoII for an eight-state source with g(Xt, X̂t) = 1{|Xt − X̂t| ≥ 2} and f(t) = t².]

Figure 1.2: Sample paths of ∆AoII(Xt, X̂t, t). In the figure, Xt and X̂t are the state of the dynamic source and the receiver's estimate at time slot t, respectively. Gi and Di are the sampling time and delivery time of the i-th update, respectively. Note that the sampling decisions in the graph are random. We assume that the sampling occurs at the beginning of a time slot and the update arrives at the end of a time slot.

The authors of [38] consider a problem where we need to monitor symmetric binary sources over a communication channel with random packet delivery times. More precisely, the source decides whether or not to sample at the beginning of each time slot. The sample enters the channel and, after a delay of a random number of time slots, is received error-free at the monitor. The authors consider three performance metrics: AoII, AoI, and real-time error. They formulate the optimal sampling problem as a Markov decision process (MDP) and apply a simplified relative value iteration (RVI) algorithm to obtain the optimal policy for each metric. Furthermore, the performance of the obtained optimal policy is compared, through simulation results, with two canonical sampling policies: the sample-at-change policy, which samples the source whenever the source changes state, and the zero-wait policy, which samples the source whenever possible. The numerical results indicate that, for various delay distributions and AoII penalty functions, the optimal policies for the real-time error and AoII coincide with the sample-at-change policy. In contrast, the optimal policy for AoI is a threshold policy.
In most cases, the threshold is zero, which is equivalent to the zero-wait policy. The authors of [39] study the AoII minimization problem in the context of scheduling. They consider a system where a central scheduler monitors multiple Markovian sources and needs to update multiple users. The central scheduler can only update a subset of the users in each time slot. Hence, the central scheduler must choose which users to update to minimize the AoII. A fundamental assumption made in the paper is that the central scheduler cannot know the states of the sources prior to transmission. In this case, the central scheduler cannot know the exact value of AoII in each time slot. To overcome this problem, the authors introduce the belief value, which can be interpreted as the probability that the state at the receiver side is correct. Then, the Whittle's index policy is developed, and its performance is compared with the Whittle's index policy developed when AoI is adopted. In the real world, we usually do not know the statistical model of the dynamic process we want to observe, or we do not know the parameters of the statistical model. Minimizing AoII in this case is usually difficult, but very important. Therefore, the authors of [40] consider the problem of minimizing AoII without knowing the parameters of a Markovian process. The relationship between the estimation error and AoII is studied in [41]. Moreover, a variant of AoII, the Age of Incorrect Estimates (AoIE), is proposed in [42]. AoIE is defined as the product of an estimation error and the time elapsed since the last time at which the receiver had a sufficiently correct estimate. In the paper, the authors consider a slotted-time system where a source generates status updates about an auto-regressive Markov process and sends the updates to a remote receiver.
Then, the receiver makes estimates of the realization of the process based on the previously received updates using the least-squares estimation strategy. The paper considers the optimal sampling problem, and the results show that the optimal policy is a threshold-type policy. Another variant called Age of Outdated Information (Ao2I) is introduced in [43]. In this paper, Ao2I quantifies the elapsed time between the present and the first time when the stored information at the destination becomes outdated compared to its source. Then, Ao2I is studied in the context of scheduling, and the authors derive a theoretical lower bound for the minimum Ao2I and propose a low-complexity online scheduler.

1.4 Outline and Contribution

In Chapter 2, we investigate the problem of minimizing AoII under certain constraints. Specifically, we consider the AoII in a system where a transmitter sends updates about a multi-state Markovian source to a remote receiver over an unreliable channel. The communication goal is to minimize AoII subject to a power constraint. We cast the problem into a constrained Markov decision process (CMDP) and prove that the optimal policy is a combination of two deterministic threshold policies. Afterward, by leveraging the notion of RVI and the structural properties of threshold policies, we propose an efficient algorithm to find the threshold policies as well as the mixing coefficient. In Chapter 3, we investigate the properties and optimization of AoII in scheduling problems. More precisely, we study a slotted-time system where a base station needs to update multiple users simultaneously. Due to the limited resources, only part of the users can be updated in each time slot. We consider the problem of minimizing AoII when imperfect channel state information (CSI) is available. Leveraging the notion of MDP, we obtain the structural properties of the optimal policy.
By introducing a relaxed version of the original problem, we develop the Whittle's index policy under a simple condition. However, indexability is required to ensure the existence of the Whittle's index. To circumvent the indexability requirement, we develop a new scheduling policy called the indexed priority policy based on the optimal policy for the relaxed problem. In Chapter 4 and Chapter 5, we investigate the effect of delay on the performance and optimization of AoII. Specifically, we consider a transmitter-receiver pair in a slotted-time system. The transmitter observes a dynamic source and sends updates to a remote receiver over an error-free communication channel that suffers a random delay. The receiver estimates the state of the dynamic source using the received updates. In Chapter 4, we assume that when the channel is busy, the transmitter can do nothing but wait for the channel to become idle. Then, we investigate the problem of optimizing the transmitter's action in each time slot to minimize AoII. We first characterize the optimization problem using the MDP and investigate the performance of the threshold policy, under which the transmitter transmits updates only when the transmission is allowed and the AoII exceeds the threshold τ. By delving into the characteristics of the system evolution, we precisely compute the expected AoII achieved by the threshold policy using the Markov chain. Then, we prove that the optimal policy exists and provide a computable RVI algorithm to estimate the optimal policy. Furthermore, by leveraging the policy improvement theorem, we theoretically prove that, under an easily verifiable condition, the optimal policy is the threshold policy with τ = 1. In Chapter 5, we consider a similar problem, but the transmitter has the ability to preempt. In this scenario, we prove the existence of the optimal policy and provide a feasible value iteration algorithm to approximate the optimal policy.
We also analyze the system characteristics under two canonical delay distributions and theoretically obtain the corresponding optimal policies using the policy improvement theorem.

Chapter 2: Freshness under Limited Resources

2.1 Overview

With the massive deployment of communication devices, the energy consumption of these devices has become a critical design consideration. Typically, these devices can only access limited resources to perform their tasks because they must share a centralized energy source with other devices or obtain their own energy from the outside world to operate for extended periods of time without human intervention. This leads to the question of how to optimize system performance with limited resources. In this chapter, we study the optimization problem in the presence of power constraints. The existing literature [36, 37] only considers simple communication models and penalty functions. Thus, the performance of more general AoII in a more complicated system is still unclear. In particular, we study a system where the source is modeled by a multi-state Markov chain. Meanwhile, the considered AoII adopts a non-binary information penalty function. Similar constraints are considered in [18], with the goal of minimizing the AoI. Our problem is more complicated because AoI ignores the information content of the transmitted updates. Remote estimation under resource constraints is studied in [44–47]. However, these works focus mainly on minimizing the estimation error and ignore the effect of time, even though, in many real-world applications, persistent errors will cause more damage to the system than short-lived errors. The main contributions of this chapter can be summarized as follows. 1) We study the minimization of AoII under power constraints. 2) We consider the AoII that adopts a non-binary information penalty function and model the source using a multi-state Markov chain.
3) We rigorously prove the structural properties of the optimal policy and propose an efficient algorithm to obtain the optimal policy. This chapter summarizes the work in [48], and the rest is organized as follows. Section 2.2 elaborates on the modeling of the system and discusses the system dynamics under the chosen AoII. In Section 2.3, we present a step-by-step analysis of the optimization problem and detail the proposed algorithm. Lastly, in Section 2.4, numerical results are laid out.

2.2 System Overview

2.2.1 System Model

We consider a slotted-time system in which a transmitter sends updates about a process to a remote receiver through an unreliable channel. The transmitted update will not be corrupted during the transmission, but the transmission will not necessarily succeed. When the transmission fails, the update is discarded, and the failure does not affect the transmitter's decision at the next time slot. We denote the channel realization as rt, where rt = 1 if the transmission succeeds and rt = 0 otherwise. We assume rt is independent and identically distributed over the time slots. We define Pr(rt = 1) = ps and Pr(rt = 0) = 1 − ps = pf. We notice that, in many status-update systems, the size of the update is very small, so the transmission time for an update is much smaller than the time unit used to measure the dynamics of the process. Thus, when a transmission attempt succeeds, the update is assumed to be received instantly by the receiver.

[Figure 2.1: Illustration of the Markovian source: an N-state birth-death chain on {1, ..., N} with self-transition probability 1 − 2p, transition probability p to each neighbor for interior states, and probability 2p to the single neighbor for the boundary states 1 and N.]

This assumption will provide us with analytical benefits, and a similar assumption is also made in [49]. The source process {Xt}t∈N is modeled by an N-state Markov chain in which transitions occur only between adjacent states, with total probability 2p, while the chain remains in its current state with probability 1 − 2p. An illustration of the Markovian source is shown in Figure 2.1.
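The source chain of Figure 2.1 is simple to simulate. A minimal Python sketch (parameter values and the frozen estimate are arbitrary illustrations):

```python
import random

def step_source(x, N, p):
    """One transition of the N-state birth-death source of Figure 2.1.
    Interior states move to each neighbor w.p. p; boundary states move to
    their single neighbor w.p. 2p; otherwise the state is unchanged."""
    u = random.random()
    if x == 1:
        return 2 if u < 2 * p else 1
    if x == N:
        return N - 1 if u < 2 * p else N
    if u < p:
        return x - 1
    if u < 2 * p:
        return x + 1
    return x

# Sample a trajectory and track the penalty d_t = |X_t - Xhat_t| for a
# frozen estimate (no updates delivered).
random.seed(0)
N, p, x_hat = 5, 0.2, 3
x, ds = 3, []
for _ in range(10):
    x = step_source(x, N, p)
    ds.append(abs(x - x_hat))
print(ds)  # each d_t lies in {0, 1, ..., N-1} and moves by at most 1 per slot
```

This also previews the key fact used below: since the source moves by at most one state per slot, the penalty dt is itself a birth-death process.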
The transmitter is capable of generating an update Xt by sampling the process at will. However, the sampling opportunities only occur at the beginning of each time slot. We assume the transmitter is also capable of making a transmission attempt in the same time slot in which the sampling happens. Every time the transmission succeeds, the receiver uses the received update as its new estimate X̂t. The receiver sends an ACK/NACK packet to inform the transmitter whether it has received a new update. We suppose that the ACK/NACK packets are delivered reliably and that their transmission time is negligible, as such packets are very small in general. Therefore, if an ACK is received, the transmitter knows that the transmission succeeded and that the receiver's estimate changed to the transmitted update. If a NACK is received, the transmitter knows that the receiver did not receive the new update and that the receiver's estimate did not change. Hence, we can assume that the transmitter always knows the receiver's estimate. The system adopts the AoII as the performance measure. Since the dynamic source has N states, we let Xt ∈ {1, ..., N} and X̂t ∈ {1, ..., N}. Then, we choose g(Xt, X̂t) = |Xt − X̂t| and f(t) = t. Consequently, F(t) = 1 by its definition, and dt ≜ g(Xt, X̂t) ∈ {0, 1, ..., N − 1}.

2.2.2 System Dynamics

Now, we characterize the system dynamics, which can be fully captured by the dynamics of the pair (dt, ∆t), where ∆t is short for ∆AoII(Xt, X̂t, t). Thus, it is essential to characterize the relationship between (dt+1, ∆t+1) and (dt, ∆t). We notice that the relationship depends on the transmitter's action and its result. Therefore, we define at ∈ {0, 1} as the transmitter's action at time t, where at = 1 if the transmitter makes a transmission attempt and at = 0 otherwise. Then, we can divide our discussion into the following three cases. Case 1: at = 0. In this case, no new update is received by the receiver.
Thus, the estimate X̂t+1 will be nothing but X̂t, and Xt will evolve following the Markov chain shown in Figure 2.1. When the state of the source process does not change, which happens with probability 1 − 2p, we have Xt+1 = Xt and therefore dt+1 = dt. When the state of the source process changes, dt+1 depends on the value of dt. Thus, we further distinguish between the following cases:
• When dt = 0, according to the Markovian source reported in Figure 2.1, dt+1 = 1 with probability 2p.
• When 1 ≤ dt ≤ N − 2, to simplify our analysis, we assume, for any Xt ∈ {1, 2, ..., N}, Pr(Xt+1 = Xt − 1 | Xt) = Pr(Xt+1 = Xt + 1 | Xt) = p. Then, when Xt > X̂t, dt+1 = |Xt ± 1 − X̂t| = |dt ± 1| = dt ± 1. When Xt < X̂t, dt+1 = |Xt ± 1 − X̂t| = |−dt ± 1| = dt ∓ 1. Combining the two, dt+1 = dt ± 1 with equal probability p.
• When dt = N − 1, Xt must be either 1 or N, and X̂t must be either N or 1, respectively. Combining with the Markovian source reported in Figure 2.1, dt+1 = N − 2 with probability 2p.

[Figure 2.2: Illustration of the evolution of d: a birth-death chain on {0, 1, ..., N − 1} with self-transition probability p1, transition probability p2 to each neighbor for interior states, and probability 2p2 to the single neighbor for the boundary states 0 and N − 1.]

Let us denote by Pd,d′ the transition probability from d to d′. Then, the results can be summarized as follows.

Pd,d = 1 − 2p,  0 ≤ d ≤ N − 1,
Pd,d+1 = Pd,d−1 = p,  1 ≤ d ≤ N − 2,
P0,1 = PN−1,N−2 = 2p.   (2.1)

Such dynamics can be characterized by the Markov chain shown in Figure 2.2 with p1 = 1 − 2p and p2 = p. Meanwhile, the value of ∆t+1 can be captured by the following two cases.
• When dt+1 = 0, the receiver's estimate at time t + 1 is correct. By definition, Ut+1 = t + 1. Hence, ∆t+1 = 0.
• When dt+1 ≠ 0, the receiver's estimate at time t + 1 is incorrect. In this case, Ut+1 = Ut by definition. Therefore, ∆t+1 = ∆t + dt+1.
To sum up, ∆t+1 = 1{dt+1 ≠ 0}(∆t + dt+1), where 1{A} is an indicator function that takes value 1 when A is true and 0 otherwise. Case 2: at = 1 but rt = 0. In this case, we notice that no new update is received by the receiver.
Following the same trajectory as in Case 1, we can conclude that the dynamics of dt can be characterized by the Markov chain shown in Figure 2.2 with p1 = pf(1 − 2p) and p2 = pf·p. ∆t+1 is fully dictated by ∆t and dt+1, as detailed in Case 1. Case 3: at = 1 and rt = 1. In this case, the receiver receives the update instantly. Thus, Ut+1 = t, and we can conclude that ∆t+1 = dt+1. Since the update is received instantly, we have dt+1 ∈ {0, 1}. More precisely,

Pr[(dt+1, ∆t+1) = (0, 0) | (dt, ∆t), at = 1] = ps(1 − 2p),
Pr[(dt+1, ∆t+1) = (1, 1) | (dt, ∆t), at = 1] = 2ps·p.

Combining the above three cases, we can fully capture the evolution of (dt, ∆t).

2.2.3 Problem Formulation

We consider the problem where there is a unit power consumption along with each transmission attempt, regardless of the result. At the same time, the transmitter has a power budget α < 1. We define ϕ = (a0, a1, ...) as a sequence of actions the transmitter takes and denote the set of all feasible action sequences by Φ. Then, our problem can be formulated as

min_{ϕ∈Φ} ∆̄ϕ ≜ lim_{T→∞} (1/T) Eϕ[ Σ_{t=0}^{T−1} ∆t ]
subject to R̄ϕ ≜ lim_{T→∞} (1/T) Eϕ[ Σ_{t=0}^{T−1} at ] ≤ α.   (2.2)

As (2.2) shows, the system is resource-constrained. Thus, it is natural to require that transmission attempts help minimize AoII. More precisely, we require Pr[(0, 0) | (dt, ∆t), at = 1] ≥ Pr[(0, 0) | (dt, ∆t), at = 0] for any (dt, ∆t). Leveraging the system dynamics in Section 2.2.2, we conclude that it is sufficient to require p ∈ [0, 1/3]. Therefore, we only consider the case of p ∈ [0, 1/3] throughout the rest of this chapter. A similar assumption is also made in [37]. We notice that solving problem (2.2) is equivalent to solving a constrained Markov decision process (CMDP). To this end, we adopt the Lagrangian approach.

2.3 Problem Optimization

2.3.1 Lagrangian Approach

First of all, we write the constrained optimization problem (2.2) in its Lagrangian form.
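As a sanity check on the dynamics of Section 2.2.2, a small Python sketch (parameter values are arbitrary illustrations) assembles the one-step transition probabilities of dt under both actions, following (2.1) and Cases 2–3, and verifies that they are proper distributions:

```python
def P_idle(N, p):
    """Transition matrix (nested lists) of d_t under a_t = 0, per (2.1):
    stay w.p. 1 - 2p; interior states move to each neighbor w.p. p;
    boundary states 0 and N-1 move to their single neighbor w.p. 2p."""
    P = [[0.0] * N for _ in range(N)]
    for d in range(N):
        P[d][d] = 1 - 2 * p
    P[0][1] = P[N - 1][N - 2] = 2 * p
    for d in range(1, N - 1):
        P[d][d - 1] = P[d][d + 1] = p
    return P

def P_attempt(N, p, ps):
    """d_t transitions under a_t = 1: w.p. pf = 1 - ps the attempt fails and d
    evolves as in the idle case (Case 2); w.p. ps the update is delivered
    instantly and d_{t+1} = 0 w.p. 1 - 2p or d_{t+1} = 1 w.p. 2p (Case 3)."""
    pf = 1 - ps
    P = [[pf * q for q in row] for row in P_idle(N, p)]
    for row in P:
        row[0] += ps * (1 - 2 * p)
        row[1] += ps * 2 * p
    return P

N, p, ps = 5, 0.2, 0.7
for P in (P_idle(N, p), P_attempt(N, p, ps)):
    assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)  # rows are distributions
```

The ∆t component is then deterministic given dt+1, via ∆t+1 = 1{dt+1 ≠ 0}(∆t + dt+1), so these two matrices fully determine the controlled process.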
L(ϕ, λ) = lim_{T→∞} (1/T) Eϕ[ Σ_{t=0}^{T−1} (∆t + λat) ] − λα,

where λ is the Lagrange multiplier. Then, the corresponding dual function is

G(λ) = min_{ϕ∈Φ} L(ϕ, λ).   (2.3)

According to the results in [50], the optimal policy for the constrained problem (2.2) can be characterized by the optimal policies for the minimization problem (2.3) under a certain λ. Thus, we start with solving problem (2.3) for any given λ ≥ 0. As λα is independent of the policy, we can ignore it, which leads to the following optimization problem.

minimize_{ϕ∈Φ} lim_{T→∞} (1/T) Eϕ[ Σ_{t=0}^{T−1} (∆t + λat) ].   (2.4)

The above problem can be cast into an infinite-horizon average-cost Markov decision process (MDP) M = (X, A, P, C), where
• X denotes the state space: the state is x = (d, ∆). We define xd = d and x∆ = ∆. We will use x and (d, ∆) to represent the state interchangeably for the rest of this chapter.
• A denotes the action space: the two feasible actions are making the transmission attempt (a = 1) and staying idle (a = 0). The action space is independent of the state and the time. More precisely, a ∈ A(x, t) = A = {0, 1}.
• P denotes the state transition probabilities: we define Px,x′(a) as the probability that action a at state x will lead to state x′. The values of Px,x′(a) can be obtained easily from Section 2.2.2.
• C denotes the instant cost: when the system is at state x and action a is chosen, the instant cost is C(x, a) = x∆ + λa.

2.3.2 Structural Results

In this section, we provide the key structural properties of the optimal policy for M, which play a vital role in the analysis later on. The optimal policy for M is captured by its value function V(x), which can be obtained by solving the Bellman equation. For an infinite-horizon average-cost MDP, the Bellman equation is

θ + V(x) = x∆ + min_{a∈{0,1}} { λa + Σ_{x′∈X} Px,x′(a) V(x′) },   (2.5)

where θ is the minimal value of (2.4).
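To make (2.5) concrete, the sketch below runs a relative value iteration on a small truncated instance of M (the truncation level m, the parameter values, the reference state, and the fixed iteration count are all illustrative choices; the formal algorithm, its convergence, and the truncation are treated in the text that follows):

```python
def rvi_solve(N=4, m=20, p=0.2, ps=0.7, lam=5.0, iters=2000):
    """Relative value iteration for a truncated version of the AoII MDP M.
    State (d, delta) with delta capped at m; cost delta + lam * a; transitions
    follow Section 2.2.2. Returns (theta, policy), where policy[(d, delta)]
    is the minimizing action. Parameter values are illustrative only."""
    pf = 1 - ps
    states = [(d, dl) for d in range(N) for dl in range(m + 1)]

    def next_d(d):  # (d', prob) pairs for the source-evolution branch
        if d == 0:
            return [(0, 1 - 2 * p), (1, 2 * p)]
        if d == N - 1:
            return [(d, 1 - 2 * p), (d - 1, 2 * p)]
        return [(d, 1 - 2 * p), (d - 1, p), (d + 1, p)]

    def trans(d, dl, a):  # list of ((d', delta'), prob)
        w = pf if a == 1 else 1.0  # weight of the no-delivery branch
        out = [((dn, 0 if dn == 0 else min(dl + dn, m)), w * pr)
               for dn, pr in next_d(d)]
        if a == 1:  # successful delivery: d' in {0, 1}, AoII resets accordingly
            out += [((0, 0), ps * (1 - 2 * p)), ((1, 1), ps * 2 * p)]
        return out

    def q_values(V, d, dl):
        return [dl + lam * a + sum(pr * V[s2] for s2, pr in trans(d, dl, a))
                for a in (0, 1)]

    V, ref = {s: 0.0 for s in states}, (0, 0)
    for _ in range(iters):
        Q = {s: min(q_values(V, *s)) for s in states}
        theta = Q[ref]                       # running estimate of the average cost
        V = {s: Q[s] - theta for s in states}
    policy = {s: (0 if (qs := q_values(V, *s))[0] <= qs[1] else 1)
              for s in states}
    return theta, policy

theta, policy = rvi_solve()
print(round(theta, 3))  # estimated optimal average cost for this toy instance
print(policy[(0, 0)])   # 0: state (0, 0) is never active, cf. Proposition 1
```

Note that, at (0, 0), transmitting induces the same next-state distribution as idling but incurs the extra cost λ, which is why the computed policy leaves that state inactive.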
A canonical procedure for solving the Bellman equation is to apply the relative value iteration (RVI) [51]. To this end, we denote by Vν(·) the estimated value function at iteration ν of RVI and initialize V0(x) = x∆. Then, the estimated value function is updated in the following way.

Vν+1(x) = Qν+1(x) − Qν+1(x^ref),   (2.6)

where x^ref is the reference state and Qν+1(x) is the interim value function, calculated by applying the right-hand side of (2.5). More precisely,

Qν+1(x) = x∆ + min_{a∈{0,1}} { λa + Σ_{x′∈X} Px,x′(a) Vν(x′) }.   (2.7)

RVI is guaranteed to converge to V(·) as ν → +∞ regardless of the initialization [51]. However, it requires infinitely many iterations to achieve the exact solution. To overcome this impracticality, we leverage the special properties of our system and provide the structural property of Vν(·), which turns out to be enough to characterize the structural properties of the optimal policy for M. We start with the following lemma.

Lemma 1. The estimated value function Vν(x) is increasing in both xd and x∆ at any iteration ν.

Proof. Leveraging the iterative nature of RVI, we use induction to prove the desired results. The complete proof can be found in Appendix A.1.

We refer to state x as active if the optimal action at x is making the transmission attempt, and as inactive otherwise. Then, leveraging Lemma 1, we provide the key structural properties of the optimal policy for M.

Proposition 1. The optimal policy for M under any λ ≥ 0 is a threshold policy with the following properties.
• State (0, 0) will never be active.
• For states x with fixed xd ≠ 0, the optimal action a* switches from a* = 0 to a* = 1 as x∆ increases, and the switching point (i.e., the threshold) is non-increasing in xd.

Proof. The optimal action at state x is captured by the sign of δV(x) ≜ V¹(x) − V⁰(x), where V^a(x) is the value function resulting from taking action a at state x. Then, we characterize the sign of δV(x) using Lemma 1.
The complete proof can be found in Appendix A.2.

We define the threshold for the states with fixed d ≠ 0 as the smallest ∆ such that a* = 1. We notice that state (0, 0) is never active when the optimal policy is adopted. Hence, we can characterize the optimal policy using the thresholds. In the following, an optimal policy is denoted by a vector nλ, where (nλ)i is the threshold for the states with d = i. The subscript λ indicates the dependency between the optimal policy and λ.

2.3.3 Finite-State MDP Approximation

In the sequel, we tackle the problem of finding the optimal policy for M. Direct application of RVI is impractical, as we would need to estimate infinitely many value functions at each iteration. To overcome this difficulty, we use the approximating sequence method (ASM) [52] and rigorously show the convergence of this approximation. To this end, we construct another MDP M(m) = (X(m), A, P(m), C) by truncating the value of ∆. More precisely, we impose

X(m): x(m)_d ∈ {0, 1, ..., N − 1},  x(m)_∆ ∈ {0, 1, ..., m},

where m is the predetermined maximal value of ∆. The transition probabilities from x ∈ X(m) to z ∈ X − X(m) (called excess probabilities) are redistributed to the states x′ ∈ X(m) in the following way.

P(m)_{x,x′}(a) = Px,x′(a) + Σ_{z∈X−X(m)} Px,z(a) qz(x′),

where qz(x′) is the probability of redistributing state z to state x′ and Σ_{x′∈X(m)} qz(x′) = 1. We choose qz(x′) = 1{x′_d = z_d} × 1{x′_∆ = m}. So, the transition probabilities P(m)_{x,x′}(a) satisfy the following.

P(m)_{x,x′}(a) = Px,x′(a), if x′_∆ < m,
P(m)_{x,x′}(a) = Px,x′(a) + Σ_{z∈G(z,x′)} Px,z(a), if x′_∆ = m,

where G(z, x′) = {z : z_d = x′_d, z_∆ > m}. The action space A and the instant cost C are the same as defined in M.

Theorem 1. The sequence of optimal policies for M(m) converges to the optimal policy for M as m → ∞.

Proof. We show that our system verifies the two assumptions given in [52]. Then, the convergence is guaranteed according to the results in [52].
The complete proof can be found in Appendix A.3.

For a given truncation parameter m, the state space X(m) is finite with size |X(m)| ∝ m. When m is huge, the basic RVI will be inefficient, since the minimum operator in (2.7) requires calculations for both feasible actions at every state. In the following, we propose an improved RVI that avoids the minimum operator at certain states. To this end, we claim that the properties in Proposition 1 are also possessed by the optimal policies for M(m) at any iteration ν of RVI. The proof is omitted since it is very similar to what we did in Section 2.3.2. Utilizing Proposition 1, we can conclude that, for any state x(m), if there exists an active state y(m) such that y(m)_∆ ≤ x(m)_∆ and y(m)_d ≤ x(m)_d, then x(m) must also be active. The update step at each iteration of the improved RVI can be summarized as follows. For each x(m) ∈ X(m),
• if such a y(m) exists, we can determine the optimal action at x(m) immediately, and the minimum operator is avoided;
• if no such y(m) exists, the optimal action at x(m) is determined by applying the minimum operator.
In this way, we avoid Σ_{i=1}^{N}(m − ni) minimum operators at each iteration of RVI, where ni = (nν)i and nν is the policy at iteration ν of RVI. In almost all cases, we have ni ≪ m. The pseudocode is given in Algorithm 1. In the algorithm, Vν is considered converged when the maximum difference between the results of two consecutive iterations is less than ϵ.

2.3.4 Expected Transmission Rate

In this section, we calculate the expected transmission rate R̄n under a given threshold policy n. This enables us to develop an efficient algorithm for finding the optimal policy for (2.2). As we can see in (2.2), R̄n is nothing but the expected average number of transmission attempts made.
Thus, it can be fully captured by the stationary distribution of the discrete-time Markov chain (DTMC) induced by n. More precisely,

R̄n = Σ_{d=1}^{N−1} Σ_{∆=nd}^{+∞} πd(∆),

where πd(∆) is the steady-state probability of state (d, ∆) and nd = (n)d. To obtain the stationary distribution, we utilize the balance equations associated with the induced DTMC, which take the following form.

π(x) = Σ_{x′∈X} Px′,x(a^n_{x′}) π(x′),   (2.8)

where a^n_{x′} is the action suggested by policy n at state x′. A difficulty arises because the state space of the induced DTMC is infinite. To overcome it, we present the following proposition.

Algorithm 1: Improved Relative Value Iteration
1: procedure RelativeValueIteration(M)
2:   Initialize V0(x) = x∆; ν = 0
3:   Choose x^ref ∈ X arbitrarily
4:   while Vν has not converged do
5:     for x ∈ X do
6:       if ∃ an active state y s.t. yd ≤ xd and y∆ ≤ x∆ then
7:         a*(x) = 1
8:         Qν+1(x) = C(x, 1) + Σ_{x′} Px,x′(1) Vν(x′)
9:       else
10:        for a ∈ A do
11:          Hx,a = C(x, a) + Σ_{x′} Px,x′(a) Vν(x′)
12:        a*(x) = argmin_a {Hx,a}
13:        Qν+1(x) = Hx,a*(x)
14:     for x ∈ X do
15:       Vν+1(x) = Qν+1(x) − Qν+1(x^ref)
16:     ν = ν + 1
17:   return n ← a*(x)

Proposition 2. The expected transmission rate under threshold policy n is

R̄n = Σ_{d=1}^{N−1} ( Σ_{∆=nd}^{τ−1} πd(∆) + Πd(τ) ),

where τ = max{n}, and the πd(∆)'s and Πd(τ)'s are the solution to the following finite system of linear equations:

π0(0) = (1 − 2p)π0(0) + p Σ_{∆=1}^{τ−1} (1 − ps a1,∆) π1(∆) + pf p Π1(τ) + ps(1 − 2p) Σ_{d=1}^{N−1} ( Σ_{∆=nd}^{τ−1} πd(∆) + Πd(τ) ),   (2.9)

π1(1) = 2p π0(0) + 2 ps p Σ_{d=1}^{N−1} ( Σ_{∆=nd}^{τ−1} πd(∆) + Πd(τ) ),   (2.10)

Σ_{d=1}^{N−1} ( Σ_{∆=ld}^{τ−1} πd(∆) + Πd(τ) ) + π0(0) = 1,   (2.11)

and, for each 1 ≤ d ≤ N − 1,

πd(∆) = 0,  0 < ∆ < ld,   (2.12)

πd(∆) = Σ_{d′=1}^{N−1} Pd′,d (1 − ps ad′,∆−d) πd′(∆ − d),  max{2, ld} ≤ ∆ ≤ τ − 1,   (2.13)

Πd(τ) = Σ_{d′=1}^{N−1} Pd′,d ( Σ_{∆=τ−d}^{τ−1} (1 − ps ad′,∆) πd′(∆) + pf Πd′(τ) ),   (2.14)

where ld = (d² + d)/2 and ad,∆ is the action suggested by n at state (d, ∆).

Proof.
We cast the induced infinite-state Markov chain into an equivalent finite-state Markov chain whose size depends on the policy. Then, the $\pi_d(\Delta)$'s and $\Pi_d(\tau)$'s are the steady-state probabilities of the finite-state Markov chain. The complete proof can be found in Appendix A.4.

Remark 1. We can verify that (2.9) is a linear combination of the other equations in the system. Therefore, we can exclude (2.9) in practice.

The system of linear equations can be written in matrix form as $A\pi = b$, where $\pi$ is the vector of unknowns and $A$, $b$ are obtained directly from Proposition 2. However, solving a finite system of linear equations can still be costly, especially when the system is large. For our problem, the size of $A$ is $O((N-1)\tau)$, and we note that $A$ is sparse. In general, solving a system of linear equations of size $O(n)$ requires $O(n^2)$ storage and $O(n^3)$ floating-point operations when $A$ is dense. When $A$ is sparse, the computational cost is lower: sparse matrix algorithms are designed to solve such systems in time and space proportional to $O(n) + O(cn)$, where $c$ is the average number of non-zero entries per column. Although there are cases where this linear target cannot be met, the complexity of sparse linear algebra is far below that of the dense case [53]. Generally speaking, the complexity depends on the sparsity of $A$. By exploiting the zero entries, we can often reduce the storage and computational requirements to $O(cn)$ and $O(cn^2)$, respectively.

The calculation of $\bar{R}_n$ is needed repeatedly, and it takes a significant amount of time when the thresholds in $\mathbf{n}$ are large. Hence, we provide an efficient alternative that approximates the expected transmission rate in this case.

Corollary 1. When the thresholds in $\mathbf{n}$ are large, the expected transmission rate under policy $\mathbf{n}$ can be approximated as
$\bar{R}_n \approx \sum_{d=1}^{N-1} \Big( \sum_{\Delta=n_d}^{\tau-1} \pi_d(\Delta) + \Pi_d(\tau) \Big)$,
where $\tau = \max\{\mathbf{n}\}$.
The $\pi_d(\Delta)$'s and $\Pi_d(\tau)$'s are the solution to the following finite system of linear equations:
$\pi_0(0) = (1-2p)\pi_0(0) + p\,\Pi_1(\eta) + p_f p\,\Pi_1(\tau) + p\sum_{\Delta=\eta+1}^{\tau-1}(1-p_s a_{1,\Delta})\pi_1(\Delta) + p_s(1-2p)\sum_{d=1}^{N-1}\Big(\sum_{\Delta=n_d}^{\tau-1}\pi_d(\Delta)+\Pi_d(\tau)\Big)$, (2.15)
$\pi_1(1) = 2p\pi_0(0) + 2p_s p \sum_{d=1}^{N-1}\Big(\sum_{\Delta=n_d}^{\tau-1}\pi_d(\Delta)+\Pi_d(\tau)\Big)$, (2.16)
$\sum_{d=1}^{N-1}\Big(\Pi_d(\eta) + \sum_{\Delta=\eta+1}^{\tau-1}\pi_d(\Delta)+\Pi_d(\tau)\Big) + \pi_0(0) = 1$, (2.17)
$\Pi_1(\eta) - \pi_1(1) + \pi_1(\eta+1) = \sum_{d'=1}^{N-1} P_{d',1}\Pi_{d'}(\eta)$, (2.18)
$\Pi_d(\eta) + \sum_{\Delta=\eta+1}^{\eta+d}\pi_d(\Delta) = \sum_{d'=1}^{N-1} P_{d',d}\Pi_{d'}(\eta)$, $2 \le d \le N-1$, (2.19)
and for each $1 \le d \le N-1$:
$\pi_d(\Delta) = \rho\sum_{d'=1}^{N-1} P_{d',d}(1-p_s a_{d',\Delta-d})\sigma_{d'}(\Delta-d)$, $\eta+1 \le \Delta \le \eta+d$, (2.20)
$\pi_d(\Delta) = \sum_{d'=1}^{N-1} P_{d',d}(1-p_s a_{d',\Delta-d})\pi_{d'}(\Delta-d)$, $\eta+d+1 \le \Delta \le \tau-1$, (2.21)
$\Pi_d(\tau) = \sum_{d'=1}^{N-1} P_{d',d}\Big(\sum_{\Delta=\tau-d}^{\tau-1}(1-p_s a_{d',\Delta})\pi_{d'}(\Delta) + p_f \Pi_{d'}(\tau)\Big)$, (2.22)
where $a_{d,\Delta}$ is the action prescribed by the threshold policy $\mathbf{n}$ at state $(d, \Delta)$, $\eta = \min\{\mathbf{n}\} - 1$, and $\rho = \frac{\pi_0(0)}{\sigma_0(0)}$. $\sigma_d(\Delta)$ is the stationary distribution associated with the Markov chain induced by another threshold policy $\mathbf{n}' = [\eta', ..., \eta']$, where $\eta' = \eta + 1$.

Proof. When the thresholds in $\mathbf{n}$ are large, the expected transmission rate is small. Combining with (2.10), we have $\pi_1(1) \approx 2p\pi_0(0)$. Consequently, we can show that, for any state $(d, \Delta)$, $\pi_d(\Delta) \approx c^{n}_{d,\Delta}\pi_0(0)$, where $c^{n}_{d,\Delta}$ is a scalar that depends on the policy and the state. At the same time, we notice that, for any two threshold policies $\mathbf{n}^1$ and $\mathbf{n}^2$, the prescribed actions at states with $\Delta < \min\{[\mathbf{n}^1, \mathbf{n}^2]\}$ are the same. We denote by $G(\mathbf{n}^1, \mathbf{n}^2)$ the set of these states. Then, we can prove that, for $(d, \Delta) \in G(\mathbf{n}^1, \mathbf{n}^2)$,
$\frac{\pi^1_d(\Delta)}{\pi^2_d(\Delta)} \approx \frac{c_{d,\Delta}\pi^1_0(0)}{c_{d,\Delta}\pi^2_0(0)} = \frac{\pi^1_0(0)}{\pi^2_0(0)}$, (2.23)
where $\pi^1_d(\Delta)$ and $\pi^2_d(\Delta)$ are the stationary distributions when $\mathbf{n}^1$ and $\mathbf{n}^2$ are adopted, respectively. Based on (2.23), we can obtain the two systems of linear equations. The complete proof is in Appendix A.5.

Remark 2. For the same reason as in Remark 1, we can exclude (2.15) in practice.
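Solving balance equations together with the normalization constraint, as Proposition 2 and Remark 1 describe, can be sketched on a toy finite chain (for the large sparse system $A\pi = b$, a sparse solver such as `scipy.sparse.linalg.spsolve` would typically replace the dense call; the helper name below is illustrative).

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1 for a finite DTMC.

    P: (n, n) row-stochastic matrix. The balance equations alone are
    linearly dependent (cf. Remark 1), so one of them is overwritten
    with the normalization constraint before solving."""
    n = P.shape[0]
    A = P.T - np.eye(n)   # balance equations: (P^T - I) pi = 0
    A[-1, :] = 1.0        # replace the last row with sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)
```

With the stationary distribution in hand, expectations such as $\bar{R}_n$ reduce to weighted sums over the relevant states.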
In Corollary 1, instead of solving one large system of linear equations of size $O((N-1)\tau)$, we approximate $\bar{R}_n$ by solving two systems of linear equations of sizes $O((N-1)(\eta+1))$ and $O((N-1)(\tau-\eta+1))$, respectively. It is worth noting that when $\tau \approx \eta$ or $\tau \gg \eta$, the complexity reduction of Corollary 1 is limited. In other cases, Corollary 1 can significantly reduce the complexity, and the resulting error is negligible.

The methodology presented in Proposition 2 can also be applied to the calculation of the expected AoII $\bar{\Delta}_n$. More precisely, we can use the following corollary.

Corollary 2. The expected AoII under threshold policy $\mathbf{n}$ is
$\bar{\Delta}_n = \sum_{d=1}^{N-1}\Big(\sum_{\Delta=l_d}^{\tau-1}\omega_d(\Delta) + \Omega_d(\tau)\Big)$,
where $\tau = \max\{\mathbf{n}\}$ and $l_d = \frac{d^2+d}{2}$. The $\omega_d(\Delta)$'s and $\Omega_d(\tau)$'s are the solution to the following finite system of linear equations. For each $1 \le d \le N-1$:
$\omega_0(0) = 0$; $\omega_d(\Delta) = 0$, $\Delta < l_d$, (2.24)
$\omega_d(\Delta) = \Delta\,\pi_d(\Delta)$, $l_d \le \Delta \le \tau-1$, (2.25)
$\Omega_d(\tau) = \sum_{d'=1}^{N-1} P_{d',d}\Big(\sum_{\Delta=\tau-d}^{\tau-1}(1-p_s a_{d',\Delta})\omega_{d'}(\Delta) + p_f \Omega_{d'}(\tau)\Big) + d\,\Pi_d(\tau)$, (2.26)
where $a_{d,\Delta}$ is the action prescribed by the threshold policy $\mathbf{n}$ at state $(d, \Delta)$. The $\pi_d(\Delta)$'s and $\Pi_d(\tau)$'s can be obtained using Proposition 2 with the same threshold policy $\mathbf{n}$.

Proof. We define $\omega_d(\Delta) \triangleq \Delta\,\pi_d(\Delta)$ and $\Omega_d(\tau) \triangleq \sum_{\Delta=\tau}^{+\infty}\omega_d(\Delta)$. Then, we combine the states with $\Delta \ge \tau$ as we did in Proposition 2. After some rearrangement, we obtain the system of linear equations shown above. The complete proof is in Appendix A.6.

2.3.5 Optimal Policy

At this point, we are able to find the optimal policy for problem (2.4). However, our goal is to find the optimal policy for the constrained problem (2.2). Based on the work in [50], the optimal policy for problem (2.2) can be expressed as a mixture of two deterministic policies that are both optimal for problem (2.4) with $\lambda = \lambda^*$. More precisely, the optimal policy can be summarized in the following theorem.

Theorem 2.
The optimal policy for the constrained problem (2.2) can be expressed as a mixture of two deterministic policies $\mathbf{n}_{\lambda^{*+}}$ and $\mathbf{n}_{\lambda^{*-}}$ that are both optimal for problem (2.4) with $\lambda = \lambda^* \triangleq \inf\{\lambda > 0 : \bar{R}_\lambda \le \alpha\}$, where $\bar{R}_\lambda$ is the expected transmission rate resulting from policy $\mathbf{n}_\lambda$. More precisely, if we choose
$\mu = \frac{\alpha - \bar{R}_{\lambda^{*+}}}{\bar{R}_{\lambda^{*-}} - \bar{R}_{\lambda^{*+}}}$, (2.27)
the mixed policy $\mathbf{n}_{\lambda^*}$, which selects $\mathbf{n}_{\lambda^{*-}}$ with probability $\mu$ and $\mathbf{n}_{\lambda^{*+}}$ with probability $1-\mu$ each time the system reaches state $(0, 0)$, is optimal for the constrained problem (2.2), and the constraint in (2.2) is met with equality.

Proof. We verify that our system satisfies all the assumptions given in [50]. Then, combining the characteristics of our system with the results in [50], we obtain the optimal policy. The complete proof is in Appendix A.7.

Next, we describe an efficient algorithm for obtaining the optimal policy for the constrained problem (2.2). The core of obtaining the optimal policy is finding $\lambda^*$. We recall that, for any given $\lambda$, the deterministic policy $\mathbf{n}_\lambda$ is obtained by applying the improved RVI, and the resulting $\bar{R}_\lambda$, which is non-increasing in $\lambda$ [50], is calculated using Proposition 2. Hence, $\bar{R}_\lambda$ can be regarded as a non-increasing function of $\lambda$, and we can use Bisection search with tolerance $\xi$ to find $\lambda^*$. Then, $\lambda^{*+}$ and $\lambda^{*-}$ are taken as the boundaries of the final interval. More precisely, we initialize $\lambda^- = 0$ and $\lambda^+ = 1$. The procedure can be summarized as follows.
• As long as $\bar{R}_{\lambda^+} \ge \alpha$, we set $\lambda^- = \lambda^+$ and $\lambda^+ = 2\lambda^+$. In this way, we end up with an interval $I = [\lambda^-, \lambda^+]$.
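The randomization in Theorem 2 happens only at state $(0, 0)$: each time the system re-enters that state, the controller re-draws which of the two deterministic threshold vectors to follow. A minimal sketch (all names hypothetical; the threshold rule assumes, as in the chapter, that the policy transmits at state $(d, \Delta)$ when $\Delta$ reaches the $d$-th threshold):

```python
import random

def mixed_threshold_policy(n_minus, n_plus, mu, seed=0):
    """Return an action function for the mixed policy of Theorem 2 (sketch).

    n_minus, n_plus : threshold vectors, entry d-1 applies to states (d, .).
    mu              : probability of following n_minus until the next
                      visit to state (0, 0)."""
    rng = random.Random(seed)
    current = None

    def act(state):
        nonlocal current
        d, delta = state
        if (d, delta) == (0, 0) or current is None:
            # re-draw the deterministic policy at every visit to (0, 0)
            current = n_minus if rng.random() < mu else n_plus
        if d == 0:
            return 0                           # never transmit when synced
        return int(delta >= current[d - 1])    # threshold rule
    return act
```

Averaged over renewal cycles, the transmission rate of this mixture interpolates between the rates of the two deterministic policies, which is exactly how (2.27) makes the constraint tight.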
Algorithm 2 Bisection Search
1: procedure BisectionSearch($\mathcal{M}^{(m)}(\lambda)$, $\alpha$)
2:     Initialize $\lambda^- = 0$; $\lambda^+ = 1$
3:     $\mathbf{n}_{\lambda^+} = RVI(\mathcal{M}^{(m)}(\lambda^+), \epsilon)$ using Algorithm 1
4:     $\bar{R}_{\lambda^+} = \bar{R}(\mathbf{n}_{\lambda^+})$ using Proposition 2
5:     while $\bar{R}_{\lambda^+} \ge \alpha$ do
6:         $\lambda^- = \lambda^+$; $\lambda^+ = 2\lambda^+$
7:         $\mathbf{n}_{\lambda^+} = RVI(\mathcal{M}^{(m)}(\lambda^+), \epsilon)$ using Algorithm 1
8:         $\bar{R}_{\lambda^+} = \bar{R}(\mathbf{n}_{\lambda^+})$ using Proposition 2
9:     while $\lambda^+ - \lambda^- \ge \xi$ do
10:        $\lambda = \frac{\lambda^+ + \lambda^-}{2}$
11:        $\mathbf{n}_\lambda = RVI(\mathcal{M}^{(m)}(\lambda), \epsilon)$ using Algorithm 1
12:        $\bar{R}_\lambda = \bar{R}(\mathbf{n}_\lambda)$ using Proposition 2
13:        if $\bar{R}_\lambda \ge \alpha$ then
14:            $\lambda^- = \lambda$
15:        else
16:            $\lambda^+ = \lambda$
17:    return $(\lambda^{*+}, \lambda^{*-}) \leftarrow (\lambda^+, \lambda^-)$

• We apply Bisection search on the interval $I$ until the length of $I$ is less than the tolerance $\xi$. Then, the algorithm returns $\lambda^{*+}$ and $\lambda^{*-}$.
The pseudocode is given in Algorithm 2. Finally, the mixing coefficient $\mu$ is calculated using (2.27), and the resulting expected AoII is calculated using Corollary 2. The algorithm is efficient for the following reasons.
• We obtain $\mathbf{n}_\lambda$ using the improved RVI, which avoids the minimum operator at certain states.
• When calculating $\bar{R}_\lambda$, we cast the induced infinite-state Markov chain into a finite-state Markov chain.
• We find $\lambda^{*+}$ and $\lambda^{*-}$ using Bisection search, which has logarithmic complexity.

[Figure 2.3: Illustrations of the AoII-optimal policy and the AoI-optimal policy. (a) The expected AoII as a function of p. (b) The expected AoII as a function of ps. (c) The expected AoII as a function of α. The truncation parameter in ASM is m = 800 and the tolerance in Bisection search is ξ = 0.01. RVI converges when the maximum difference between the results of two consecutive iterations is less than ϵ = 0.01.]
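The bracketing-then-bisection structure of Algorithm 2 can be sketched independently of the MDP machinery: below, `R_bar` is a stand-in for "solve the Lagrangian problem with the improved RVI, then evaluate the rate via Proposition 2", and is only assumed non-increasing in $\lambda$.

```python
def bisection_lambda(R_bar, alpha, xi=1e-2):
    """Sketch of Algorithm 2: find the interval bracketing lambda*.

    R_bar(lam) : non-increasing expected transmission rate under the
                 lambda-optimal deterministic policy (assumed given).
    Returns (lambda*-, lambda*+), the boundaries of the final interval."""
    lo, hi = 0.0, 1.0
    while R_bar(hi) >= alpha:      # grow the bracket until R_bar(hi) < alpha
        lo, hi = hi, 2 * hi
    while hi - lo >= xi:           # standard bisection on [lo, hi]
        mid = (lo + hi) / 2
        if R_bar(mid) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo, hi
```

Each probe of `R_bar` corresponds to one RVI solve plus one linear-system solve, so the logarithmic number of probes is what makes the overall procedure cheap.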
2.4 Numerical Results

In this section, we provide numerical results that highlight the effect of the system parameters on the performance of the AoII-optimal policy. We also compare the AoII-optimal policy with the AoI-optimal policy derived in [18].

Effect of p: We compare the performance of the AoII-optimal policy under different values of p. To this end, we fix N = 7 and ps = 0.8. We also set α = 0.06. We vary the value of p and plot the corresponding results. As we can see in Figure 2.3a, the expected AoII is increasing in p. To explain this trend, we notice that as p increases, the source process becomes more inclined to change state at the next time slot. Then, successfully transmitted updates are more likely to be obsolete at the next time slot. As the power budget α, which dictates the transmission rate, is fixed, the expected AoII increases with p.

We also show, in Table 2.1, the deterministic policies $\mathbf{n}_{\lambda^{*+}}$, $\mathbf{n}_{\lambda^{*-}}$ and the corresponding mixing coefficient µ for some values of p.¹ The optimal policy is the mixture of $\mathbf{n}_{\lambda^{*+}}$ and $\mathbf{n}_{\lambda^{*-}}$

¹In the table, "/" indicates the threshold where the two policies differ.

Table 2.1: Optimal thresholds for different p

            Mixing Coef.   n1    n2     n3     n4   n5   n6
p = 0.1     µ = 0.7176     15    6/7    1      1    1    1
p = 0.2     µ = 0.0331     37    16     8/9    1    1    1
p = 0.3     µ = 0.1178     69    25/26  15     1    1    1

Table 2.2: Optimal thresholds for different ps

            Mixing Coef.   n1    n2     n3     n4   n5     n6
ps = 0.2    µ = 0.6712     556   228    140    96   70/71  60
ps = 0.4    µ = 0.3260     151   62     36/37  24   17     1
ps = 0.6    µ = 0.4089     67    27/28  16     1    1      1
ps = 0.8    µ = 0.0331     37    16     8/9    1    1      1

with mixing coefficient µ as described in Theorem 2. We can see that the thresholds are, in general, increasing in p. The reason is as follows. When p is small, successfully transmitted updates are more likely to remain accurate in the next few time slots. In other words, the transmission is more "efficient". We refer to a transmission as "efficient" if it reduces the age to the greatest extent.
This allows the transmitter to make transmission attempts when the age is relatively low without violating the power constraint.

Effect of ps: In this scenario, we fix p = 0.2 and investigate the effect of the channel reliability ps on the performance of the AoII-optimal policy. We again consider N = 7 and α = 0.06. The corresponding results are shown in Figure 2.3b. As ps increases, the expected AoII decreases. The reason is as follows. As ps increases, transmitted updates are more likely to succeed, so the transmission is more "efficient". As the power budget is fixed, the expected AoII decreases as ps increases. We also present some selected thresholds in Table 2.2. As we see in the table, the thresholds are, in general, decreasing in ps. We recall that as ps increases, the transmission becomes more "efficient". Thus, the transmitter can make transmission attempts when the age is relatively low while keeping the transmission rate within the power budget.

Effect of α: Next, we analyze the performance of the AoII-optimal policy under different values of α. We adopt N = 7 and ps = 0.8. We also set p = 0.2. The expected AoII achieved by the AoII-optimal policy is plotted in Figure 2.3c. As we see, the expected AoII decreases as α increases. The reason is simple: as the power budget increases, more transmission attempts are allowed. We recall that we require a transmission attempt to always help reduce the age. With this in mind, we conclude that the expected AoII is decreasing in α. It is worth noting that as α increases, the expected AoII stops decreasing before α = 1. To explain this, we recall that the transmitter never makes a transmission attempt at state (0, 0) under an optimal policy. Thus, as α becomes large, the transmitter has enough budget to make transmission attempts at any state other than (0, 0).
At that point, the transmission rate is saturated, and the expected AoII will not decrease further.

Comparison with AoI-optimal policy: Lastly, we compare the AoII-optimal policy with the AoI-optimal policy. From Figure 2.3, we can see that the performance gap widens as p increases and as ps and α decrease. The reason lies in the value of transmission attempts. As p increases, the source process becomes more inclined to change states, so transmission attempts are needed more to bring correct information to the receiver. As ps decreases, the number of successful transmissions decreases and the AoII builds up faster. In this case, transmission attempts are more valuable because they greatly reduce AoII once they succeed. As α decreases, transmission attempts become more valuable because fewer attempts are allowed. Due to the different definitions of age, there are often cases where AoI is large but AoII is small. In these cases, the AoI-optimal policy wastes valuable transmission attempts. Therefore, the increase in the value of transmission attempts leads to a widening of the performance gap.

2.5 Conclusion

In this chapter, we considered a system where the source process is modeled by an N-state Markov chain, and the AoII adopts a non-binary information penalty function. We studied the problem of minimizing the AoII subject to a power constraint. By casting the problem as a CMDP, we proved that the optimal policy is a mixture of two deterministic threshold policies. Then, an efficient algorithm was proposed to find these policies and the mixing coefficient. Lastly, numerical results were provided to illustrate the performance of the AoII-optimal policy and compare it with the AoI-optimal policy.

Chapter 3: Scheduling for Freshness

3.1 Overview

With the development of communication and sensor technology, the number of users that a base station needs to handle is constantly increasing.
However, due to resource limitations, the base station can only serve a subset of the users simultaneously. This motivates the study of how the base station should schedule its resources so that the users receive the best possible service under a given performance metric. Therefore, in this chapter, we study the AoII minimization problem in the context of scheduling. We consider a system where a base station can update only a subset of the available users in each time slot. Meanwhile, we consider a generic time penalty function and study the minimization problem in the presence of imperfect channel state information (CSI). Due to the presence of CSI, the Whittle's index policy becomes infeasible in general. Therefore, we introduce another scheduling policy that is more versatile and has comparable performance to the Whittle's index policy.

The scheduling problem with AoI as the performance measure has been studied under various system settings in [54–58]. The problem studied in this chapter is different and more complicated because AoII considers the aging process of inconsistent information rather than the aging process of updates. Moreover, none of these works considers the case where CSI is available. The problem of optimizing information freshness in the presence of CSI is studied in [59, 60]. However, those works focus on the single-user system and mainly discuss cases where the CSI is perfect. Scheduling problems with the goal of minimizing an error-based performance measure are considered in [61–63]. Our problem is fundamentally different because AoII also accounts for the effect of time. Moreover, we consider a system in which a base station observes multiple sources simultaneously and must send updates to multiple destinations.

The main contributions of this chapter can be summarized as follows. 1) We study the problem of minimizing AoII in a multi-user system where imperfect CSI is available. Meanwhile, the time penalty function is generic.
2) We derive the structural properties of the optimal policy for the considered problem. 3) We establish the indexability of the considered problem under a simple condition and develop the Whittle's index policy. 4) We obtain the optimal policy for a relaxed version of the original problem. By exploring the characteristics of the relaxed problem, we provide an efficient algorithm to obtain this optimal policy. 5) Based on the optimal policy for the relaxed problem, we develop the indexed priority policy, which is free of indexability requirements and has comparable performance to the Whittle's index policy.

This chapter summarizes the work in [64], and the rest is organized as follows. Section 3.2 introduces the system model and formulates the primal problem. In Section 3.3, we explore the structural properties of the optimal policy for the primal problem. Under a simple condition, we develop the Whittle's index policy in Section 3.4. Section 3.5 presents the optimal policy for a relaxed version of the primal problem. On this basis, we develop the indexed priority policy in Section 3.6. Finally, in Section 3.7, the numerical results are laid out.

[Figure 3.1: The structure of the system model.]

3.2 System Overview

3.2.1 System Model

We consider a slotted-time system with N users and one base station. Each user has a source process, a channel, and a receiver. We assume all the users share the same structure, but the parameters differ. The structure of the communication model is provided in Figure 3.1. For user i, the source process is modeled by a two-state Markov chain where transitions happen between the two states with probability $p_i > 0$ and self-transitions happen with probability $1 - p_i$.
At any time slot $t$, the state of the source process $X_{i,t} \in \{0, 1\}$ is reported to the base station as an update, and the base station decides whether to transmit this update through the corresponding channel. The channel is unreliable, but an estimate of the CSI is available at the beginning of each time slot. Let $r_{i,t} \in \{0, 1\}$ be the CSI at time $t$. We assume that $r_{i,t}$ is independent across time and user indices. $r_{i,t} = 1$ if and only if a transmission attempt at time $t$ would succeed, and $r_{i,t} = 0$ otherwise. Then, we denote by $\hat{r}_{i,t} \in \{0, 1\}$ the estimate of $r_{i,t}$. We assume that $\hat{r}_{i,t}$ is an independent Bernoulli random variable with parameter $\gamma_i$, i.e., $\hat{r}_{i,t} = 1$ with probability $\gamma_i \in [0, 1]$ and $\hat{r}_{i,t} = 0$ with probability $1 - \gamma_i$. However, the estimate is imperfect. We assume that the error depends only on the user and its estimate. More precisely, we define the probability of error as $p^{\hat{r}_i}_{e,i} \triangleq \Pr[r_i \neq \hat{r}_i \mid \hat{r}_i]$. We assume $p^{\hat{r}_i}_{e,i} < 0.5$, because we can flip the estimate if $p^{\hat{r}_i}_{e,i} > 0.5$. We are not interested in the case of $p^{\hat{r}_i}_{e,i} = 0.5$, since $\hat{r}_{i,t}$ is useless in this case. Although the channel is unreliable, each transmission attempt takes exactly one time slot regardless of the result, and a successfully transmitted update is not corrupted. Every time an update is received, the receiver uses it as the new estimate $\hat{X}_{i,t} \in \{0, 1\}$. The receiver sends an ACK/NACK packet to inform the base station of its reception of the new update. Since an ACK/NACK packet is generally very small and simple, we assume that it is transmitted reliably and received instantaneously. Then, if an ACK is received, the base station knows that the receiver's estimate changed to the transmitted update. If a NACK is received, the base station knows that the receiver's estimate did not change. Therefore, the base station always knows the estimate at the receiver side.
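The CSI model above, with an estimate drawn first and the true channel state then differing from it with an estimate-dependent error probability, can be sketched as follows (a sampling sketch under the stated assumptions; the function name is illustrative).

```python
import random

def sample_csi(gamma, pe1, pe0, rng):
    """Draw one (r_hat, r) pair for a user (sketch of Section 3.2.1).

    gamma : Pr[r_hat = 1] (Bernoulli parameter of the estimate).
    pe1   : Pr[r != r_hat | r_hat = 1], pe0 likewise for r_hat = 0;
            both assumed < 0.5 as in the model."""
    r_hat = 1 if rng.random() < gamma else 0
    pe = pe1 if r_hat == 1 else pe0
    r = r_hat ^ (rng.random() < pe)   # flip the estimate w.p. pe
    return r_hat, r
```

Because the error probability is conditioned on the estimate, the marginal reliability of the channel is $\gamma(1-p^1_e) + (1-\gamma)p^0_e$, which the base station can exploit when deciding whom to schedule.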
At the beginning of each time slot, the base station receives an update from each source and a CSI estimate for each channel. Old updates and estimates are discarded upon the arrival of new ones. Then, the base station decides which updates to transmit, and the decision is independent of the transmission history. Due to the limited resources, at most $M < N$ updates are allowed per transmission attempt. We consider a base station that always transmits $M$ updates.

All the users adopt AoII as the performance metric, but the choices of penalty functions may vary. We consider the case where the users adopt the same information penalty function $g(X_t, \hat{X}_t) = |X_t - \hat{X}_t|$ but possibly different time penalty functions. To ease the analysis, we require $f(t)$ to be unbounded. Combined with the monotonicity requirement on the time penalty function, we require $f(t_1) \le f(t_2)$ for $t_1 < t_2$ and $\lim_{t\to+\infty} f(t) = +\infty$. Without loss of generality, we assume $f(0) = 0$. Meanwhile, as the source is modeled by a two-state Markov chain, $g(X_t, \hat{X}_t) \in \{0, 1\}$. Hence, the AoII can be written as
$\Delta_{AoII}(X_t, \hat{X}_t, t) = \sum_{k=U_t+1}^{t} F(k - U_t) = f(s_t)$,
where $U_t$ is defined in Section 1.2.2 and $s_t \triangleq t - U_t$. Therefore, the evolution of $s_t$ is sufficient to characterize the evolution of the AoII. To this end, we distinguish between the following cases.
• When the receiver's estimate is correct at time $t+1$, we have $U_{t+1} = t+1$. Then, by definition, $s_{t+1} = 0$.
• When the receiver's estimate is incorrect at time $t+1$, we have $U_{t+1} = U_t$. Then, by definition, $s_{t+1} = t + 1 - U_t = s_t + 1$.
To sum up, we get
$s_{t+1} = \mathbb{1}\{U_{t+1} \neq t+1\}(s_t + 1)$. (3.1)
In the remainder of this chapter, we use $f_i(\cdot)$ to denote the time penalty function user $i$ adopts.
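The recursion (3.1) is easy to simulate. The sketch below tracks a single user with no transmissions (the receiver's estimate stays frozen), so $s_t$ resets exactly when the flipping source happens to return to the estimate; the function name and default penalty are illustrative.

```python
import random

def simulate_aoii(p, T, f=lambda s: s, seed=0):
    """Average time penalty f(s_t) over T slots for one user with a frozen
    receiver estimate (sketch of recursion (3.1)). The two-state source
    flips with probability p each slot."""
    rng = random.Random(seed)
    X, X_hat, s, total = 0, 0, 0, 0.0
    for _ in range(T):
        if rng.random() < p:
            X ^= 1                            # source transition
        s = 0 if X == X_hat else s + 1        # recursion (3.1)
        total += f(s)
    return total / T
```

For $p = 0.5$ the source state is i.i.d. uniform, so mismatch slots occur with probability 1/2 independently and the long-run average of $s_t$ (with $f(s) = s$) is 1, which makes a convenient sanity check.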
Then, the system dynamics can be fully characterized by the dynamics of $x_t \triangleq (x_{1,t}, ..., x_{N,t})$. Hence, it suffices to characterize the value of $x_{t+1}$ given $x_t$ and the base station's action. To this end, we denote by $a_t = (a_{1,t}, ..., a_{N,t})$ the base station's action at time $t$: $a_{i,t} = 1$ if the base station transmits the update from user $i$ at time $t$, and $a_{i,t} = 0$ otherwise. We notice that, given action $a_t$, the users are independent, and the action taken on user $i$ affects only that user. Consequently,
$\Pr(x_{t+1} \mid x_t, a_t) = \prod_{i=1}^{N} \Pr(x_{i,t+1} \mid x_{i,t}, a_t) = \prod_{i=1}^{N} \Pr(x_{i,t+1} \mid x_{i,t}, a_{i,t})$.
Combined with the fact that all the users share the same structure, it is sufficient to study the dynamics of a single user. In the following discussion, we drop the user-dependent subscript $i$. We recall that $\hat{r}_{t+1}$ is an independent Bernoulli random variable. Then, we have
$\Pr(x_{t+1} \mid x_t, a_t) = P(\hat{r}_{t+1}) \times \Pr(s_{t+1} \mid x_t, a_t)$.
By definition, $P(\hat{r}_{t+1} = 1) = \gamma$ and $P(\hat{r}_{t+1} = 0) = 1 - \gamma$. Then, we only need to determine the value of $\Pr(s_{t+1} \mid x_t, a_t)$. To this end, we distinguish between the following cases.
• When $x_t = (0, \hat{r}_t)$, the estimate at time $t$ is correct (i.e., $\hat{X}_t = X_t$). Hence, for the receiver, $X_t$ carries no new information about the source process. In other words, $\hat{X}_{t+1} = \hat{X}_t$ regardless of whether an update is transmitted at time $t$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. Since the source is binary, we obtain $U_{t+1} = U_t$ if $X_{t+1} \neq X_t$, which happens with probability $p$, and $U_{t+1} = t+1$ otherwise. According to (3.1), we obtain
$\Pr(1 \mid (0, \hat{r}_t), a_t) = p$, $\Pr(0 \mid (0, \hat{r}_t), a_t) = 1 - p$.
• When $a_t = 0$ and $x_t = (s_t, \hat{r}_t)$ with $s_t > 0$, the channel is not used and no new update is received by the receiver, and so $\hat{X}_{t+1} = \hat{X}_t$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. Since $X_t \neq \hat{X}_t$ and the source is binary, we have $U_{t+1} = U_t$ if $X_{t+1} = X_t$, which happens with probability $1 - p$, and $U_{t+1} = t+1$ otherwise.
According to (3.1), we obtain
$\Pr(s_t + 1 \mid (s_t, \hat{r}_t), a_t = 0) = 1 - p$, $\Pr(0 \mid (s_t, \hat{r}_t), a_t = 0) = p$.
• When $a_t = 1$ and $x_t = (s_t, 1)$ with $s_t > 0$, the transmission attempt succeeds with probability $1 - p^1_e$ and fails with probability $p^1_e$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. When the transmission attempt succeeds (i.e., $\hat{X}_{t+1} = X_t$), $U_{t+1} = U_t$ if $X_{t+1} \neq X_t$ and $U_{t+1} = t+1$ otherwise. When the transmission attempt fails (i.e., $\hat{X}_{t+1} = \hat{X}_t \neq X_t$), we have $U_{t+1} = U_t$ if $X_{t+1} = X_t$ and $U_{t+1} = t+1$ otherwise. Combining (3.1) with the dynamics of the source process, we obtain
$\Pr(s_t + 1 \mid (s_t, 1), a_t = 1) = p^1_e(1-p) + (1-p^1_e)p \triangleq \alpha$,
$\Pr(0 \mid (s_t, 1), a_t = 1) = p^1_e p + (1-p^1_e)(1-p) = 1 - \alpha$.
• When $a_t = 1$ and $x_t = (s_t, 0)$ with $s_t > 0$, following the same line of reasoning, we obtain
$\Pr(s_t + 1 \mid (s_t, 0), a_t = 1) = p^0_e p + (1-p^0_e)(1-p) \triangleq \beta$,
$\Pr(0 \mid (s_t, 0), a_t = 1) = p^0_e(1-p) + (1-p^0_e)p = 1 - \beta$.
Combining all cases, we obtain the value of $\Pr(s_{t+1} \mid x_t, a_t)$. As only $M$ out of $N$ updates are allowed per transmission attempt, it is natural to require that transmission attempts always help minimize the AoII. This is equivalent to imposing
$\Pr(s_{t+1} > s_t \mid (s_t, \hat{r}_t), a_t = 0) > \Pr(s_{t+1} > s_t \mid (s_t, \hat{r}_t), a_t = 1)$
for any $(s_t, \hat{r}_t)$. Leveraging the results above, it is sufficient to require $p < 0.5$. As all the users share the same structure, we assume, for the rest of this chapter, that $0 < p_i < 0.5$ for $1 \le i \le N$.

3.2.3 Problem Formulation

The communication goal is to minimize the expected AoII. Therefore, the problem can be formulated as follows:
$\arg\min_{\phi \in \Phi} \lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}_\phi\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N} f_i(s_{i,t})\Big)$ subject to $\sum_{i=1}^{N} a_{i,t} = M$ $\forall t$, (3.2)
where $\Phi$ is the set of all causal policies. We refer to the constrained minimization problem in (3.2) as the primal problem (PP). We notice that the PP is a restless multi-armed bandit (RMAB) problem. The optimal policy for this type of problem is generally out of reach, since the problem is PSPACE-hard [65].
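The per-user quantities $\alpha$ and $\beta$ derived in Section 3.2.2 can be collected in a few lines (a sketch; the function name is illustrative). Note that $p < 0.5$ and $p^{\hat r}_e < 0.5$ make both quantities smaller than $1 - p$, which is exactly the "transmission always helps" condition.

```python
def transmit_growth_probs(p, pe1, pe0):
    """Probability that the age s grows when transmitting (sketch).

    p   : source transition probability (assumed < 0.5).
    pe1 : Pr[error | r_hat = 1]; pe0 : Pr[error | r_hat = 0].
    Returns (alpha, beta): growth probability under CSI estimate 1 and 0.
    Without a transmission, the age grows with probability 1 - p."""
    alpha = pe1 * (1 - p) + (1 - pe1) * p   # a_t = 1, r_hat = 1
    beta = pe0 * p + (1 - pe0) * (1 - p)    # a_t = 1, r_hat = 0
    return alpha, beta
```

Algebraically, $\beta = (1-p) - p^0_e(1-2p)$, so transmitting on a pessimistic estimate still strictly beats staying idle whenever $p^0_e > 0$ and $p < 0.5$.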
However, we can still derive the structural properties of the optimal policy. These structural properties can guide the development of scheduling policies and can indicate the good performance of the developed scheduling policies.

3.3 Structural Properties of the Optimal Policy

In this section, we investigate the structural properties of the optimal policy for the PP. We first define an infinite-horizon average-cost Markov decision process (MDP) $\mathcal{M}^N(w, M) = (\mathcal{X}^N, \mathcal{A}^N(M), \mathcal{P}^N, C^N(w))$, where
• $\mathcal{X}^N$ denotes the state space. The state is $x = (x_1, ..., x_N)$, where $x_i = (s_i, \hat{r}_i)$.
• $\mathcal{A}^N(M)$ denotes the action space. A feasible action is $a = (a_1, ..., a_N)$, where $a_i \in \{0, 1\}$ and $\sum_{i=1}^{N} a_i = M$. Note that the feasible actions are independent of the state and the time.
• $\mathcal{P}^N$ denotes the state transition probabilities. We define $P_{x,x'}(a)$ as the probability that action $a$ at state $x$ leads to state $x'$. It is calculated by
$P_{x,x'}(a) = \prod_{i=1}^{N} P(\hat{r}'_i)\,P_{s_i,s'_i}(a_i, \hat{r}_i)$,
where $P_{s_i,s'_i}(a_i, \hat{r}_i)$ is the transition probability from $s_i$ to $s'_i$ when the CSI estimate is $\hat{r}_i$ and action $a_i$ is taken. The values of $P_{s_i,s'_i}(a_i, \hat{r}_i)$ follow directly from the results in Section 3.2.2.
• $C^N(w)$ denotes the instant cost. When the system is at state $x$ and action $a$ is taken, the instant cost is $C(x, a) \triangleq \sum_{i=1}^{N} C(x_i, a_i) \triangleq \sum_{i=1}^{N} \big(f_i(s_i) + w a_i\big)$.
We notice that the PP can be cast as $\mathcal{M}^N(0, M)$. Since $w = 0$, the instant cost is independent of the action $a$; therefore, we abbreviate $C(x, a)$ as $C(x)$. To simplify the analysis, we consider the case of $M = 1$; equivalently, we investigate the structural properties of the optimal policy for $\mathcal{M}^N(0, 1)$.

Remark 3. For the case of $M > 1$, we can apply the same methodology. However, as $M$ increases, the action space grows quickly, requiring more feasible actions to be considered in each step of the proof. Hence, to better demonstrate the methodology, we only consider the case of $M = 1$ in this chapter.
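The product form of $\mathcal{P}^N$ means the joint kernel never has to be stored: a joint transition probability is computed on the fly from per-user factors. A minimal sketch (all names hypothetical; `per_user` stands for the per-user kernel $P_{s,s'}(a, \hat r)$ of Section 3.2.2):

```python
def joint_transition_prob(per_user, gamma, x, x_next, a):
    """Probability of moving from joint state x to x_next under action a,
    using the product form of P^N (sketch).

    per_user(s, s2, a_i, r_hat) : per-user kernel P_{s,s'}(a_i, r_hat).
    gamma[i]                    : Pr[r_hat_i = 1] for user i.
    x, x_next                   : lists of (s_i, r_hat_i) pairs."""
    prob = 1.0
    for i, ((s, r_hat), (s2, r_hat2)) in enumerate(zip(x, x_next)):
        p_rhat2 = gamma[i] if r_hat2 == 1 else 1 - gamma[i]
        prob *= p_rhat2 * per_user(s, s2, a[i], r_hat)
    return prob
```

This factorization is what keeps the relaxed, per-user analysis in the later sections tractable even though the joint state space grows exponentially in N.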
It is well known that the optimal policy for $\mathcal{M}^N(0, 1)$ can be characterized by the value function. We denote the value function of state $x$ by $V(x)$. A canonical procedure for calculating $V(x)$ is the value iteration algorithm (VIA). To this end, we define $V_\nu(\cdot)$ as the estimated value function at iteration $\nu$ of VIA and initialize $V_0(\cdot) = 0$. Then, VIA updates the estimated value functions as follows:
$V_{\nu+1}(x) = C(x) - \theta + \min_{a \in \mathcal{A}^N(1)} \Big\{ \sum_{x' \in \mathcal{X}^N} P_{x,x'}(a)\,V_\nu(x') \Big\}$, (3.3)
where $\theta$ is the optimal value of $\mathcal{M}^N(0, 1)$. VIA is guaranteed to converge to the value function [51]; more precisely, $V_\nu(\cdot) \to V(\cdot)$ as $\nu \to +\infty$. However, the exact value function is impossible to obtain, since we would need infinitely many iterations and the state space is infinite. Instead, we provide two structural properties of the value function.

Lemma 2. For $\mathcal{M}^N(0, 1)$, $V(x)$ is non-decreasing in $s_i$ for $1 \le i \le N$.

Proof. Leveraging the iterative nature of VIA, we use mathematical induction to prove the desired result. The complete proof can be found in Appendix B.1.

Before introducing the next structural property, we make the following definition.

Definition 1 (Statistically identical). Two users are said to be statistically identical if the user-dependent parameters and the adopted time penalty functions are the same.

For users that are statistically identical, we can prove the following lemma.

Lemma 3. For $\mathcal{M}^N(0, 1)$, if users $j$ and $k$ are statistically identical, then $V(x) = V(\mathcal{P}(x))$, where $\mathcal{P}(x)$ is state $x$ with $x_j$ and $x_k$ exchanged.

Proof. Leveraging the iterative nature of VIA, we use mathematical induction to prove the desired result. At each iteration, we show that, for each feasible action at state $x$, we can find an equivalent action at state $\mathcal{P}(x)$; two actions are equivalent if they lead to the same value function. The complete proof can be found in Appendix B.2.

Equipped with the above lemmas, we proceed to characterize the structural properties of the optimal policy.
We recall that the optimal action at each state can be characterized by the value function. Hence, we denote by $V^j(x)$ the value function resulting from choosing user $j$ to update at state $x$. Then, $V^j(x)$ can be calculated by
$V^j(x) = C(x) - \theta + \sum_{x' \setminus x'_j} \Big(\prod_{i \neq j} P_{x_i,x'_i}(0)\Big) \sum_{\hat{r}'_j} P(\hat{r}'_j) \sum_{s'_j} P_{s_j,s'_j}(1, \hat{r}_j)\,V(x')$.
If $V^j(x) < V^k(x)$ for all $k \neq j$, it is optimal to transmit the update from user $j$. When $V^j(x) = V^k(x)$, the two choices are equally desirable. In the following, we characterize the properties of $\delta_{j,k}(x) \triangleq V^j(x) - V^k(x)$ for any $j$ and $k$.

Theorem 3. For $\mathcal{M}^N(0, 1)$, $\delta_{j,k}(x)$ has the following properties:
1. $\delta_{j,k}(x) \le 0$ if $\hat{r}_k = p^0_{e,k} = 0$. The equality holds when $s_j = 0$ or $\hat{r}_j = p^0_{e,j} = 0$.
2. $\delta_{j,k}(x)$ is non-increasing in $\hat{r}_j$ and non-decreasing in $\hat{r}_k$ when $s_j, s_k > 0$. At the same time, $\delta_{j,k}(x)$ is independent of $\hat{r}_i$ for any $i \neq j, k$.
3. $\delta_{j,k}(x) \le 0$ if $s_k = 0$. The equality holds when $s_j = 0$ or $\hat{r}_j = p^0_{e,j} = 0$.
4. $\delta_{j,k}(x)$ is non-increasing in $s_j$ if $\Gamma^{\hat{r}_j}_j \le \Gamma^{\hat{r}_k}_k$ and non-decreasing in $s_k$ if $\Gamma^{\hat{r}_j}_j \ge \Gamma^{\hat{r}_k}_k$ when $s_j, s_k > 0$. We define $\Gamma^1_i \triangleq \frac{\alpha_i}{1-p_i}$ and $\Gamma^0_i \triangleq \frac{\beta_i}{1-p_i}$ for $1 \le i \le N$.
5. $\delta_{j,k}(x) \le 0$ if $s_j \ge s_k$, $\hat{r}_j \ge \hat{r}_k$, and users $j$ and $k$ are statistically identical.

Proof. The proof can be found in Appendix B.3.

We notice that $\Gamma^{\hat{r}_i}_i$ can be written as
$\Gamma^{\hat{r}_i}_i = \frac{\Pr(s_i + 1 \mid (s_i, \hat{r}_i), a_i = 1)}{\Pr(s_i + 1 \mid (s_i, \hat{r}_i), a_i = 0)} < 1$,
where $s_i$ can be any positive integer. Consequently, $\Gamma^{\hat{r}_i}_i$ is independent of any $s_i > 0$ and indicates the decrease in the probability of increasing $s_i$ caused by action $a_i = 1$. When $\Gamma^{\hat{r}_i}_i$ is large, action $a_i = 1$ achieves only a small decrease in the probability of increasing $s_i$. In the following, we provide an intuitive interpretation of why the monotonicity in Property 4 of Theorem 3 depends on $\Gamma^{\hat{r}_i}_i$. We take the case of $\Gamma^{\hat{r}_j}_j \le \Gamma^{\hat{r}_k}_k$ as an example and assume that there are only users $j$ and $k$ in the system.
Then, according to Section 3.2.2, the dynamics of $s_j$ and $s_k$ can be divided into the following three cases:
• Neither $s_j$ nor $s_k$ increases. In this case, both $s_j$ and $s_k$ become zero.
• Either $s_j$ or $s_k$ increases, and the other becomes zero. We denote by $P^k_j$ the probability that only $s_k$ increases when $a_j = 1$. The notation for the other cases is defined analogously. The probabilities can be obtained easily using the