ABSTRACT

Title of Dissertation: TOWARD INTEGRATING INTELLIGENCE INTO EVERYTHING AROUND US

Nakul Garg
Doctor of Philosophy, 2025

Dissertation Directed by: Professor Nirupam Roy
Department of Computer Science

The vision of ambient intelligence promises a world where computational capabilities seamlessly integrate into everyday objects and environments, creating systems that sense, learn, and adapt to human needs while remaining invisible to users. Despite significant advances in miniaturization and low-power computing, true ambient intelligence has remained elusive, hindered by a fundamental challenge: current intelligent systems require substantial energy, complex hardware, and frequent maintenance, making widespread deployment impractical.

We introduce a paradigm shift in how we create intelligent systems by fundamentally reimagining sensing and computing architectures from first principles for extreme resource constraints. This thesis centers on encoding intelligence directly into the physical domain through novel hardware-software co-design, where passive structures perform initial signal transformations without consuming power. Through novel architectures across acoustic, radio frequency, and optical domains, we demonstrate systems that achieve spatial perception, global positioning, and environmental monitoring with orders of magnitude less power than conventional approaches. These innovations enable intelligence in previously impossible contexts: insect-scale robots that navigate complex environments, sticker-sized tags that provide GPS-like tracking for years on a single battery, and wireless sensors that monitor food quality throughout global supply chains. By bridging the gap between what intelligent systems can do and what resource-constrained platforms can support, this work establishes a foundation for truly pervasive intelligence that operates sustainably at large scale.
TOWARD INTEGRATING INTELLIGENCE INTO EVERYTHING AROUND US

by

Nakul Garg

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2025

Advisory Committee:
Professor Nirupam Roy, Chair/Advisor
Professor Ramani Duraiswami
Professor Lin Zhong
Professor Alan Liu
Professor Sennur Ulukus, Dean's Representative

© Copyright by Nakul Garg 2025

To my family - Kavita, Sunil, and Rishabh

Acknowledgments

I would like to express my sincere gratitude to my advisor, Professor Nirupam Roy, for his exceptional guidance and mentorship throughout my doctoral studies. His support has been instrumental in shaping my research capabilities and analytical thinking. I have thoroughly enjoyed our brainstorming sessions, where several creative ideas emerged and I learned how to transform research challenges into opportunities. He taught me invaluable skills in organizing complex thoughts, approaching problems with critical rigor, and communicating my ideas. Without his dedication and belief in my potential, the work presented in this dissertation would not have been possible.

I would like to express my deep appreciation to my committee members for their guidance and support. Professor Ramani Duraiswami has been incredibly supportive of my work since day one and provided invaluable feedback through to the end of my academic job search. I am very grateful to have Professor Lin Zhong on my committee; his leadership and mentorship have been exemplary. He has been a role model for me and has shown unwavering support for our research directions, providing validation for the structure-assisted spatial sensing work we pursued. I extend my heartfelt thanks to Professor Alan Liu, from whom I learned how to build and scale networked systems, both while taking his class on cloud networking and computing and during our discussions about the future of Edge IoT.
I truly value his thoughtful feedback on this dissertation and his help in crafting my thesis story. I am fortunate to have Professor Sennur Ulukus on my committee. As an expert in wireless communications, she has offered keen insights that have been particularly valuable as I explore scalable wireless systems in this work.

I am incredibly grateful to the mentors and supporters who helped me through the academic job market. Thank you to Nirupam Roy, Karthik Sundaresan, Ranveer Chandra, Lin Zhong, and Siavash Alamouti. Karthik mentored me during my internship at NEC and beyond. He is a brilliant researcher and collaborator who believed in my abilities from the start. I want to thank him for collaborating with me on the scalable UWB project and for inspiring me to aim higher. I met Ranveer during my internship at Microsoft Research. He is the best mentor anyone could ask for, and his leadership and ability to solve impactful problems have inspired me to the core. I am fortunate for his feedback and mentorship during my job search and lucky to have him as a mentor. Siavash is a rockstar researcher and has been a strong advocate for my work. Whenever I got a chance to meet him, he always gave great advice on my research directions and was especially supportive of my low-power wireless work. I am lucky to know him as a friend, a mentor, and a 10x entrepreneur.

I would also like to thank Akshay Gadre, Ish Jain, Nivedita Arora, Akarsh Prabhakar, Justin Chan, Vikram Iyer, Suman Banerjee, Yasaman Ghasempour, Venkat Arun, Dinesh Bharadia, Swarun Kumar, Chahatdeep Singh, Nitin Sanket, and Tara Boroushaki for their time and invaluable advice throughout my job search journey.

I am grateful to my labmates and Spacewalkers - Yang Bai, Aritrik Ghosh, Irtaza Shahid, and Harsh Takawale (the iCoSMoS Lab). Thank you, Yang, for working with me on the spatial acoustics projects. I learned a lot about perseverance and discipline from your approach to research.
Special thanks to Aritrik for collaborating on the low-power cellular work. I learned a lot about theoretical research and cherish the many deep physics conversations we had. I would like to thank Irtaza for working with me on the UWB project and for spending nights creating testbeds and calibrating our sensors. I also cherish all our fundamental discussions and our recoveries from breakdowns while thinking through signal processing problems. Thanks to Harsh for working with me on the food sensing project; I have always enjoyed brainstorming with you. Thanks to the entire iCoSMoS lab for always providing feedback on my papers and talks, and for supporting me through both the ups and downs of my PhD journey.

I must express my heartfelt gratitude to my amazing friends who have been my support system throughout this journey. To Satyarth, who taught me life philosophy and positivity - I'm sorry I couldn't attend your wedding. To Deepak, Ravin, Smriti, Varenya, Shweta, and Abhishek - I found a family away from family, especially during COVID, with our Spring TP group. To Aditi, for helping me with medicines during all my conference travels and for your perfect song suggestions. Special thanks to Meenal for her unwavering support. You are my best friend, and I am lucky to have you. I am deeply grateful to have Nitin bhaiya as a friend and a mentor - I have learned so much from our conversations, especially your advice and guidance in the early years of my PhD when I felt lost. Chahat bhaiya, thank you for all your advice throughout my PhD journey, from my first project to my most recent one, and especially for all the job talk and interview tips. Thanks also to Anoorag, Priyal, Stella, Pooja, and Mrunal. To my dearest friends whose support and belief in me has been a driving force - Tanmay, Bhawana, Prerna, Ishani, Devina, and Sidharth - I'm super lucky to have all of you. Thank you, Ayush, for being a huge inspiration to pursue academia and for being my longest friend.
Most importantly, I want to thank my family: my parents (Kavita and Sunil), my brother (Rishabh), my grandparents (Ilaychi and Ramchandra), and the entire RCG family. I cannot thank my parents enough; they have sacrificed so much and faced hardships just to make my life better and to prioritize my education. My brother is my best friend, who unknowingly taught me so much: to dream big, never lose hope, and smile through challenges. My grandfather is the mathematical prodigy I grew up with, and he taught me mathematics and discipline. My grandmother, amma, taught me to be a good human being above everything else. I am lucky to have grown up learning from her. My uncle (Anil) has taught me to be selfless, to provide value and support, and to remain optimistic no matter what. Thanks to my brothers Lakshay and Gaurav for being my biggest pillars in life. My sister (Mahima) has been a role model, helping me pursue science and teaching me the fundamentals. You are the biggest inspiration for me to pursue academia. Special thanks to my superstar sisters - Himanshi, Mugdhal; my aunts - Anita, Mamta, and Babita; my brother Akshay; and everyone for their love.

Table of Contents

Dedication . . . ii
Acknowledgements . . . iii
Table of Contents . . . vii
List of Tables . . . xi
List of Figures . . . xii
List of Abbreviations . . . xix

Chapter 1: Introduction . . . 1
1.1 Challenges in Scaling Ambient Intelligence . . . 3
1.2 Designing Ultra-Low-Power Intelligent Systems . . . 5
1.3 Systems Developed . . . 8
1.3.1 Structure-Assisted Spatial Intelligence . . . 8
1.3.2 Scalable nextG Systems . . . 10
1.3.3 Systems for Sustainability . . . 12
1.3.4 Other Contributions . . . 13
1.4 Organization . . . 14

Chapter 2: Structure-Assisted Spatial Audio Sensing . . . 15
2.1 Introduction . . . 15
2.2 Core Intuitions and Primers . . . 20
2.2.1 Metamaterials for Passive Filtering . . . 22
2.3 System Design . . . 24
2.3.1 Processing for DoA Estimation . . . 25
2.3.2 Eliminating Source Signal Dependency . . . 25
2.3.3 Eliminating Environmental Dependency . . . 28
2.3.4 Synthetic Training for Deep Learning . . . 29
2.3.5 Optimizing 3D Stencil Design . . . 31
2.4 Prototype Development . . . 36
2.4.1 3D-Printing Stencil Caps . . . 36
2.4.2 Calibration and Data Collection . . . 36
2.5 Evaluation . . . 37
2.5.1 Evaluation Setup and Results Summary . . . 37
2.5.2 Impacts of external conditions . . . 39
2.5.3 Performance in different environments . . . 41
2.5.4 Impact of different sound sources . . . 42
2.5.5 Performance in known environment . . . 42
2.5.6 Localization Performance . . . 43
2.5.7 Comparison with traditional methods . . . 44
2.5.8 Comparison between learning models . . . 44
2.5.9 Energy consumption . . . 45
2.6 Discussion . . . 48
2.7 Related Work . . . 49
2.8 Conclusion . . . 51

Chapter 3: Microstructure-Assisted Vision for Ubiquitous Tiny Robots . . . 52
3.1 Introduction . . . 52
3.2 Core Intuitions and Primers . . . 57
3.2.1 Coded Signal Projection with Structures . . . 58
3.2.2 Single-Receiver Depth Mapping . . . 60
3.3 System Design . . . 61
3.3.1 Low-Power Scene Reconstruction . . . 61
3.3.2 Directional Code Projection . . . 63
3.3.3 Optimal microstructure design . . . 68
3.3.4 Motion stacking . . . 69
3.4 Evaluation . . . 70
3.4.1 Metrics . . . 71
3.4.2 Overall Performance . . . 71
3.4.3 Impact of the Environment . . . 72
3.4.4 Impact of system parameters . . . 74
3.4.5 Impact of scene parameters . . . 76
3.4.6 Computation techniques . . . 78
3.4.7 Power consumption . . . 80
3.5 Related Work . . . 83
3.6 Discussion . . . 85
3.7 Conclusion . . . 86

Chapter 4: Ultra-Low-Power Self-Localization Using a Single Antenna . . . 87
4.1 Introduction . . . 87
4.2 Core Intuition and Feasibility . . . 92
4.3 System Design . . . 94
4.3.1 Ultra-low Power Receiver . . . 94
4.3.2 AoA Estimation . . . 95
4.3.3 Localization with Independent Beacons . . . 97
4.3.4 Designing Programmable Directional Gain . . . 102
4.3.5 Pin-diodes as RF Switches . . . 103
4.3.6 Directional Code to AoA Mapping . . . 105
4.4 Evaluation . . . 107
4.4.1 AoA Performance . . . 109
4.4.2 Localization . . . 109
4.4.3 Impact on RF Communication . . . 110
4.4.4 Power Consumption . . . 111
4.4.5 Robustness . . . 112
4.5 Limitation and Future Work . . . 118
4.6 Related Work . . . 119
4.7 Conclusion . . . 122

Chapter 5: Scalable Asset Tracking with NextG Cellular Signals . . . 123
5.1 Introduction . . . 123
5.2 Cellular Networks Primer . . . 128
5.3 System Design . . . 131
5.4 LiTEfoot Prototype Implementation . . . 144
5.5 Evaluation . . . 147
5.6 Related Work . . . 154
5.7 Discussion . . . 155
5.8 Conclusion . . . 158

Chapter 6: Large Network UWB Localization: Algorithms and Implementation . . . 159
6.1 Introduction . . . 159
6.2 System Design . . . 163
6.2.1 Joint Range-Angle Localization . . . 164
6.2.2 Scaling to Large Networks . . . 167
6.2.3 Reference Frame Transformation . . . 174
6.3 Opportunistic Anchor Integration . . . 177
6.4 Implementation . . . 179
6.5 Evaluation . . . 179
6.5.1 Localization Evaluation . . . 180
6.5.2 Latency Evaluation . . . 185
6.5.3 Micro-Benchmarks . . . 186
6.6 Discussion . . . 191
6.7 Related Work . . . 193
6.8 Conclusion . . . 194

Chapter 7: Low-Cost and Dynamic Food Quality Sensing at the Pallet-Level . . . 195
7.1 Introduction . . . 195
7.2 Understanding Food Quality Metrics . . . 197
7.3 Physics and Core Intuition . . . 198
7.4 Design and Implementation . . . 201
7.5 Performance Evaluation . . . 203
7.6 Conclusion . . . 206

Chapter 8: Conclusion and Future Directions . . . 207
8.1 Summary of Contributions . . . 207
8.2 Broader Impact . . . 209
8.3 Future Directions . . . 210
8.4 Closing Thoughts . . . 211

Bibliography . . . 213

List of Tables

2.1 Comparison of prototype cost, size, median error, and energy consumption of Owlet with a microphone array. . . . 39
3.1 Breakdown of energy consumption for the hardware and software submodules. . . . 81
3.2 Comparison of different computation optimizations showing the total energy consumed per scene reconstruction. Prototype on Raspberry Pi 4. . . . 83
4.1 Breakdown of energy consumption. . . . 112
4.2 Summary of related work. . . . 119
5.1 Key characteristics of evaluation routes. . . . 147
5.2 Comparison between latency, accuracy, and power consumption. . . . 149
5.3 Overall power and energy per inference. . . . 150
5.4 RF frontend power consumption. . . . 150
5.5 Baseband power and time per inference. . . . 151

List of Figures

1.1 Examples of intelligent, low-power, and scalable sensors developed in this thesis. . . . 2
1.2 The intelligence-energy tradeoff: Conventional intelligent systems (top left) operate at high energy levels and offer good capabilities. Current resource-constrained systems (bottom right) provide limited intelligence. This thesis explores approaches to bridge this gap by creating systems that offer 100× more intelligence at similar energy levels. . . . 4
1.3 Our approach to sustainable ambient intelligence combines nature-inspired architectures, physics-informed AI, and scalable networks. . . . 6
1.4 Structure-assisted sensing systems: (A) SPiDR's depth imaging using 3D-printed acoustic metamaterial, (B) SPiDR's acoustic stencil and its internal spatial filtering channels, (C) Owlet's acoustic stencil and its internal direction-finding channels, (D) Owlet's direction finding with passive acoustic structures. . . . 8
1.5 LiTEfoot leverages cellular signals for nationwide tracking using an ultra-low-power miniaturized receiver. . . . 10
1.6 Locate3D enables infrastructure-free tracking at scale through optimized peer-to-peer measurements. . . . 11
2.1 The vision and technical overview of Owlet, a low-power and miniaturized system for extracting spatial information from sound. Owlet uses acoustic microstructures to embed direction-specific signatures on the recorded sound and develops a learning-based approach for signature recovery and mapping in real-time. . . . 16
2.2 The concept of using a stencil with direction-specific hole patterns and microstructures for passive filtering of the incoming sound. The stencil embeds a directional response to the recorded signals. . . . 20
2.3 The concept of passive directional filtering using a stencil of acoustic microstructure. The stencil embeds a directional signature to the recorded sound unique to its direction of arrival (DoA). The spectrum of complex gains represents the signature for further computation. . . . 21
2.4 Different types of metamaterial stencils used in our experiments. . . . 23
2.5 Angular diversity of the microphone with and without the microstructure stencil. . . . 24
2.6 Comparison of the diversity in frequency responses (amplitude and phase) of the three types of metamaterial stencils. . . . 24
2.7 The two-microphone model for eliminating source and environmental dependency. . . . 26
2.8 The architecture of the proposed CNN model. . . . 29
2.9 The behavior of the sound field at the outer surface of an obstacle. (a) When the object's size is much larger than the wavelength of the sound, the obstacle creates a shadow region. (b) When the object's size is comparable to the wavelength of the sound, the wave diffracts around the object, creating high pressure over a larger region of the surface. It also creates a high-pressure region directly opposite to the sound's direction where the sound fields from the top and bottom sides meet. . . . 32
2.10 (a) A one-hole stencil to measure surface pressure levels. (b) Sound amplitude at different angles from the sound's direction of arrival. . . . 33
2.11 Comparison of diversity in phase and amplitude patterns for an optimal and a sub-optimal design of the stencil. . . . 35
2.12 The Owlet prototype used in the evaluation experiment (left) and a 9-element uniform linear microphone array used as a baseline for comparison (right). The array is 12 cm wide, whereas Owlet is significantly smaller, measuring less than 2 cm in its largest dimension. . . . 37
2.13 Various locations for system evaluations: (a) indoor laboratory, (b) indoor lobby, (c) outdoor. . . . 38
2.14 Overall performance of the Owlet system compared to traditional microphone arrays of various sizes. Owlet requires 100× less energy than state-of-the-art array systems while achieving better accuracy than a 9-element array. . . . 39
2.15 Performance under external conditions: (a) The impact of varying types and loudness levels of ambient noise on the median DoA estimation error. (b) The CDF of errors when the sound source is located at varying distances from the receiver. (c) The CDF plot of estimation error for different elevation angles or vertical positions of the sound source. (d) The CDF plots of errors that show the impact of dynamic movements in the environment. . . . 40
2.16 The performance of sound tracking while the source is constantly moving near the sensor. The movement of the source creates a dynamic multipath scenario. . . . 41
2.17 The CDF of median error for (a) different environments and (b) different types of sound sources. . . . 42
2.18 The performance for DoA estimation with known room size: (a) The confusion matrix and (b) the CDF of error in degrees of angle. . . . 43
2.19 The localization error as (a) a heatmap and (b) an empirical CDF. . . . 44
2.20 Performance comparison of Owlet with implementations of the beamscan, MVDR, and MUSIC algorithms: (a) The CDF of median errors, (b) The spatial spectrum for an incoming signal from a 20° angle. . . . 45
2.21 Performance comparison of Owlet with different deep learning models and architectures. . . . 45
2.22 The setup for evaluating energy consumption. The setup tracks the energy requirements of Owlet and baseline microphone arrays under various conditions using a Keysight E6313A power supply and monitor. . . . 46
2.23 Energy consumption of (a) the MSP430FR5969 low-power ADC [1] for different sampling rates and (b) the Keysight Data Acquisition System [2] for different numbers of microphones. . . . 47
2.24 Overall energy consumption of array-based systems and Owlet. . . . 48
3.1 SPiDR, an ultra-low-power acoustic spatial sensing system for mobile robots. The system uses a carefully designed 3D-printed microstructure for projecting spatially coded signals for imaging. . . . 53
3.2 The concept of the spatially coded channel sounding method. The received signal is the weighted linear combination of the reflections that bear the direction-specific signature. . . . 54
3.3 Diversity projection with the stencil, whose internal channels encode unique gains to the signals, probing each pixel on the object plane with a unique signature. . . . 58
3.4 (left) The 3D design of the stencil and (right) the internal structure showing the tubular helical paths of different lengths. . . . 59
3.5 Ultrasound emitted from the speaker without (left) and with the stencil (right). The stencil spreads the signal energy over the region of interest. . . . 59
3.6 The amplitude of the acoustic signal at the cross section of the image plane when the speaker (left) does not have any stencil and (right) has a stencil. The internal microstructure of the stencil diversifies the signal amplitude as direction codes. . . . 61
3.7 (a) Energy consumption for different numbers of columns in the channel matrix H. (b) The correlation between the signals at nearby locations. Pixels within 3 cm have a correlation higher than 0.5. . . . 62
3.8 Row 1: Comparison of the correlation of the time-domain signals projected at different angles with an arbitrary stencil (left) and an optimized stencil (right). Row 2: Comparison of the location detection performance for a small object (1 cm wide) placed at different angles from the sensor with an arbitrary stencil (left) and an optimized stencil (right). . . . 64
3.9 Different sizes of stencils used in our experiments. . . . 65
3.10 F values with (a) different lengths of tubes inside the stencil, (b) different diameters of tubes. . . . 67
3.11 Confusion matrix of imaging accuracy for different frequencies and after combining all frequencies together. . . . 68
3.12 Stacking multiple frames suppresses spurious objects in the scene. Above we show the result of motion stacking of 5 frames taken as the robot moves. . . . 70
3.13 Depth-map reconstruction using SPiDR for various real-world scenes. . . . 72
3.14 Overall performance of SPiDR compared to an Intel RealSense lidar and an ultrasound distance sensor mounted on a servo motor. SPiDR consumes a fraction of the power of lidar and motor-based systems while delivering high accuracy in depth-map reconstruction. . . . 73
3.15 Impact of varying (a) types and (b) levels of noise on depth-map reconstruction. . . . 73
3.16 SPiDR's performance in different environments. . . . 74
3.17 Cross-sectional depth-map reconstruction performance in terms of (a) RMS error and (b) structural similarity, as a function of the sparsity of the scene. . . . 75
3.18 Horizontal localization performance in terms of (a) RMS error and (b) structural similarity, as a function of the sparsity of the scene. . . . 75
3.19 CDF plot of (a) RMS error and (b) structural similarity for depth-map reconstruction at varying resolutions. . . . 76
3.20 Scene reconstruction results at 1 cm and 0.5 cm resolutions. We modify the number of columns in the channel matrix to achieve a higher resolution. . . . 76
3.21 The depth-map reconstruction performance for (a) different materials and (b) different proximity between two objects. . . . 77
3.22 Depth-map reconstruction outputs for varying proximity between objects. . . . 77
3.23 Depth-map reconstruction outputs for different depths of the objects. . . . 78
3.24 Performance of depth-map reconstruction with different depths of the objects. . . . 78
3.25 Scene reconstruction of a horizontal bar with and without frequency or motion stacking. (a) Ground truth, (b) Raw output, (c) Only motion stacked, (d) Only frequency stacked, (e) Both motion and frequency stacked. . . . 79
3.26 The scene reconstruction with and without the fractional computing method. (a) Ground truth, and the scene reconstruction (b) without and (c) with fractional computing. . . . 79
3.27 Performance with different sizes of Hmeta. . . . 80
3.28 SPiDR prototype for power evaluation. . . . 80
3.29 Power consumption during sampling and computation. . . . 82
3.30 Comparison of power consumption with different sizes of Hmeta. Prototype on MSP430FR5969. . . . 83
4.1 Sirius dynamically switches the beam pattern of the antenna to embed a direction-specific signature in the received signal. The vector of amplitudes contains the unique signature, which maps to the angle-of-arrival, θ. . . . 88
4.2 Sirius uses a pin diode to switch the gain pattern of an antenna. Connecting and disconnecting a conductive patch to the surface of the antenna can change the shape of the gain pattern. This figure shows four different configurations of the antenna controlled using a set of two switches. . . . 90
4.3 Feasibility study demonstrating dynamic gain pattern switching. The figure shows gain patterns for (a) our reconfigurable antenna and (b) a regular antenna. . . . 93
4.4 The passive envelope detector used by Sirius captures incoming signal energy without requiring power-hungry components such as oscillators and down-converters. . . . 95
4.5 Gain patterns for the first 3 paths the signal takes to reach the antenna. The first is the direct path, which constitutes the majority of the energy, followed by weaker reflections. . . . 97
4.6 Sirius uses triangulation to localize mobile nodes. It estimates angles from at least two anchors with known locations. . . . 98
4.7 The figure depicts the envelope detector's received signal: (top) Anchor 1 transmitting, (middle) Anchor 2 transmitting, and (bottom) both Anchor 1 and Anchor 2 transmitting. . . . 99
4.8 Anchor beacon signals designed with varying duty cycles create distinct time windows for interference-free reception. (left) Case 1: Anchor 2's window as the outer neighbor of the collision windows. (right) Case 2: Anchor 2's window as the inner neighbor of the collision window. . . . 101
4.9 The figure displays anchor detection algorithm outputs: (top) time-domain signal, (middle) total energy from all configurations, and (bottom) the signal's first derivative. . . . 102
4.10 Using a pin diode as an RF switch. (a) Biasing circuit (b, c) Equivalent lumped model for on and off states. . . . 103
4.11 Switching and sampling techniques: (a) Sequential sampling: constant-time switch, hold, and sample, (b) Uniform sampling: continuous configuration switching at a constant rate, (c) Burst sampling: high-speed switching and sampling with long duty cycles for omnidirectional communication. . . . 103
4.12 Sirius's prototype for localization. . . . 107
4.13 We evaluate Sirius on different antennas in the 900 MHz and 2.4 GHz ISM bands. The figure shows the fabricated reconfigurable antennas used in the prototype. . . . 107
4.14 Setup for outdoor long-range data collection. The map shows the static anchors and node locations. . . . 108
4.15 (a) Overall AoA estimation accuracy of Sirius. (b) CDF of AoA error. . . . 109
4.16 (a) CDF of localization error. (b) Localization error per location showing the impact of distance from the APs. . . . 110
4.17 Effect of pattern switching speed on communication. . . . 111
4.18 Impact of antenna reconfiguration on bit error rate for different antennas. . . . 111
4.19 Energy distribution between the direct path and multipath for various indoor locations. . . . 113
4.20 AoA errors for different indoor locations. . . . 113
4.21 AoA errors for different reflection materials. Each cluster of bars represents one type of material, and each bar within a cluster represents a different indoor location. . . . 114
4.22 Inverse distance matrix for varying numbers of gain patterns (N). . . . 115
4.23 AoA estimation errors: (left) varying levels of multipath density in the environment, (right) varying numbers of gain patterns. . . . 115
4.24 (a) Correlation of gain pattern amplitudes. (b) Mean and median AoA errors showing the long-term stability of gain patterns in a dynamic environment. . . . 116
4.25 The figure shows gain patterns collected at two different indoor locations in a dynamic environment. . . . 117
4.26 CDF of AoA error for different environments. . . . 117
4.27 Our experimental setup at different indoor locations with varying levels of static and dynamic multipath. . . . 117
4.28 AoA error for varying distance from the anchor. . . . 118
4.29 Impact of clock drift on AoA estimation. . . . 120
4.30 Impact of frequency shift on AoA estimation. . . . 120
5.1 LiTEfoot - an ultra-low-power wireless tracker next to a US quarter for scale. . . . 124
5.2 LTE cells and frame structure. . . . 129
5.3 Intermodulation and spectrum folding. . . . 132
5.4 Confusion matrix for PCI estimation using SSS2 (i.e., the SSS signal after non-linearity). . . . 136
5.5 LTE packets before and after stacking (a) 1.4 MHz bandwidth (b) 10 MHz bandwidth. After stacking, the PSS and SSS strength increases in the 10 MHz case whereas the data fades into a DC bias.
138 5.6 (a) Blind Separation of weighted superposition (b) Linear phase change introduced due to sub-sample offset in the OFDM subcarriers. . . . . . . . . . . . . . . . . 140 5.7 The high-level circuit schematic of LiTEfoot. . . . . . . . . . . . . . . . . . . . 144 5.8 LiTEfoot PCB tag prototype showing the low-power RF frontend. . . . . . . . . . 145 5.9 LiTEfoot’s estimated trajectories and localization errors for in (a) urban and (b), (c) rural environments. The black lines denote the GPS trajectory and blue markers denote LiTEfoot’s estimated trajectory. The cell towers that are detected during the route are shown in red. (d) The empirical CDF of the localization errors. . . . 145 5.10 Diversity in downlink center frequencies used by the cells in a 200-meter radius. . 148 5.11 PCI estimation accuracy (F1 score) vs. number of stacked frames for LTE, 5G-NR, and NB-IoT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 5.12 (a) CDF of phase offsets in measurements before and after sub-sample offset correction. (b) Localization error for varying speed of vehicle. . . . . . . . . . . 152 xvi 5.13 (a) An alert is generated when the tag exits the boundary of the marked region. (b) Alert generation distance from the map boundaries. . . . . . . . . . . . . . . . . 154 6.1 Locate3D’s approach to include both angle and edge constraints for faster and efficient localization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 6.2 Comparative analysis of constraints: Incorporating angles reduces the number of edges required to attain the same level of accuracy as the ”Ranges only” approach. . . . . . . 166 6.3 (a) Histogram of localization errors for all spanning trees. (b) Reported angles by COTS UWB sensor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 6.4 Different spanning trees representing rigid and non-rigid graphs. 
Solid lines indicate both range+angle edges, and dashed lines indicate range-only edges. (a) A connected but non-rigid graph due to missing angle information in an edge. (b) The subgraph is free to rotate. (c) Adding a range measurement makes the graph rigid. . . . . . . . . . . . . . 171 6.5 The displacements corresponding to zero eigenvalues represent the translational and rotational motions that the nodes can undergo without violating any constraint. . . . . . 172 6.6 (a) When Node 1 is aligned with the global frame of reference, it reports that Node 2 is positioned at angle θ. (b) When Node 1 is rotated by γ relative to the global frame of reference in a 3D space, it reports a distinctly different angle θ′, for the same node. . . . 174 6.7 The relation between AoAs reported by two nodes (a) when nodes are perfectly aligned, and (b) when nodes are oriented with angle γ relative to the global frame of reference. . . 176 6.8 (a) Time to reach ‘0 False positives’. (b) Percentage of users registered for varying trajectory matching thresholds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 6.9 Room-scale evaluation: CDF of localization errors. . . . . . . . . . . . . . . . . . . . 180 6.10 3D Localization performance of Locate3D compared to baseline system - Cappella [3] - which uses visual odometry along with UWB. Results show performance in different lighting conditions for (a) Moving nodes and (b) Static nodes. . . . . . . . . . . . . . 181 6.11 (a) Prototype built using Raspberry Pi, UWB sensor, Intel Realsense, IR markers for ground truth and a battery pack. (b) Room-scale evaluation. (c) Building-scale evaluation (d) 3D Lidar scan of the building for reference (not used in computation) (e) Snapshot of estimated locations and MST. (f) AprilTags [4] captured by nodes for ground truth (not used in computation). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
182 6.12 City-scale analysis: (a) CDF of localization errors of 30k nodes, 3800m× 3800m area, and 15 anchors. (b) Errors for 30k, 60k, and 100k nodes in 1500m× 1500m area and 1 anchor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 6.13 New York City wide-area simulation results: CDF of localization errors for 100,000 nodes using 1 (left figure) and 5 (right figure) anchors in a 22000m× 3200m area. . . . . . . 185 6.14 Impact of submodules on (a) latency and (b) accuracy. . . . . . . . . . . . . . . . . . 186 6.15 Localization errors (in meters) for infrastructure baseline [5] and Locate3D for varying nodes and anchors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 6.16 (a) Number of unreachable nodes and (b) Total number of NLOS measurements made with varying anchor area density. . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 6.17 (a) LOS and NLOS localization errors. (b) Static vs Mobile nodes localization errors . . 188 6.18 (a) City-scale results for a 1000 node topology simulation in a 200 × 200 × 50 meter 3D space for a varying number of static anchors. (b) Room-scale localization error for real-world 20-node experiments with varying numbers of anchors registered. . . . . . . 189 xvii 6.19 Ranging and AoA errors for various range-angles. . . . . . . . . . . . . . . . . . . . 190 6.20 (a) CDF orientation errors. (b) Percent of users registered as Virtual anchors over time. . 191 7.1 Complex permittivity of electromagnetic waves in water. . . . . . . . . . . . . . 199 7.2 FreshSense: Different frequencies travel at different speeds through water resulting in an additional frequency dependent delay. . . . . . . . . . . . . . . . . . . . . 200 7.3 Estimated dispersion for varying percentage of water in the box. . . . . . . . . . 201 7.4 Evaluation setup with a box of avocados. . . . . . . . . . . . . . . . . . . . . . . 202 7.5 Images of avocados captured across 14 days. . . . . . . . . . . 
. . . . . . . . . . 204 7.6 Ground truth measured for 5 individual fruits and their average trend. . . . . . . . 204 7.7 Correlation between estimated and true DM% for the testing data. . . . . . . . . 205 7.8 Estimated and true DM% across 14 days. . . . . . . . . . . . . . . . . . . . . . . 205 xviii List of Abbreviations ADC Analog-to-Digital Converter AI Artificial Intelligence AP Access Point AoA Angle of Arrival API Application Programming Interface ASIC Application-Specific Integrated Circuit BLE Bluetooth Low Energy CDF Cumulative Distribution Function CGI Cell Global Identity CMOS Complementary Metal–Oxide–Semiconductor CNN Convolutional Neural Network COTS Commercial Off-The-Shelf DAQ Data Acquisition System DC Direct Current DM Dry Matter DoA Direction of Arrival DoF Degree of Freedom eDRX Extended Discontinuous Reception eNodeB Evolved Node B FCC Federal Communications Commission FDA Food and Drug Administration FDD Frequency Division Duplex FFT Fast Fourier Transform FMCW Frequency-Modulated Continuous Wave FoV Field of View GHz Gigahertz GPS Global Positioning System GS Gerchberg-Saxton HFSS High Frequency Structure Simulator IoT Internet of Things IoU Intersection over Union IQ In-phase and Quadrature LEA Low Energy Accelerator LiDAR Light Detection and Ranging xix LNA Low-Noise Amplifier LOS Line of Sight LTE Long-Term Evolution MCU Microcontroller Unit MDS Multidimensional Scaling MEMS Microelectromechanical Systems MHz Megahertz ML Machine Learning MLP Multilayer Perceptron MST Minimum Spanning Tree MUSIC MUltiple SIgnal Classification MUT Micromachined Ultrasound Transducer NB-IoT Narrowband Internet of Things NeRF Neural Radiance Fields NIR Near-Infrared NLOS Non-Line of Sight NSSS Narrowband Secondary Synchronization Signal OFDM Orthogonal Frequency Division Multiplexing PA Power Amplifier PCA Principal Component Analysis PCB Printed Circuit Board PCI Physical Cell Identity PIN Positive-Intrinsic-Negative (diode) PSS Primary Synchronization 
Signal QPSK Quadrature Phase Shift Keying R-squared Coefficient of Determination RF Radio Frequency RLS Recursive Least Squares RMSE Root Mean Square Error RSSI Received Signal Strength Indicator SAR Successive Approximation Register SDR Software-Defined Radio SIMD Single Instruction, Multiple Data SMACOF Scaling by MAjorizing a COmplicated Function SNR Signal-to-Noise Ratio SPL Sound Pressure Level SPI Serial Peripheral Interface SSIM Structural Similarity Index SSS Secondary Synchronization Signal STL Stereolithography xx SWaP Size, Weight, and Power SWaP-C Size, Weight, Power, and Cost TSS Total Soluble Solids ToF Time-of-Flight UDP User Datagram Protocol UE User Equipment UHF Ultra High Frequency UWB Ultra-Wideband VIO Visual Inertial Odometry VNA Vector Network Analyzer xxi Chapter 1: Introduction Imagine a world where intelligence is woven into the fabric of our physical environment. Insect-scale robots navigate disaster zones, paper-thin tags track goods across global supply chains without manual maintenance, and medical implants detect disease markers years before symptoms appear. The vision of intelligent systems seamlessly embedded in the environment has attracted sustained interest from the research community since the late 1990s, when pioneers like Mark Weiser envisioned ubiquitous computing [6] and Kristofer S. J. Pister, Joe Kahn, and Bernhard Boser conceptualized smart dust [7]. While today’s smart devices offer glimpses of this future, they represent only incremental steps toward true ambient intelligence—systems that vanish into the background while autonomously sensing, learning, and adapting to human needs. The fundamental challenge lies in scaling intelligence sustainably: current approaches rely on power-hungry hardware, frequent maintenance, and expensive components, making widespread deployment impractical [8]. 
Achieving this vision of ambient intelligence at scale requires fundamentally new approaches that can operate within extreme resource constraints. Over the past two decades, we have pursued miniaturization and low-power operation as primary paths toward embedding intelligence in everyday objects. This approach has yielded impressive advances in microelectronics, MEMS sensors, and efficient computing. However, simply making conventional systems smaller encounters fundamental physical barriers that force difficult tradeoffs between intelligence capabilities and energy requirements. For instance, consider the contrast between perception systems for autonomous vehicles and insect-scale robots: self-driving cars use LiDAR sensors consuming watts of power, while a robot at millimeter scale must achieve similar functionality within sub-milliwatt constraints. Sensor arrays required for spatial perception cannot be reduced beyond wavelength-dependent limits without sacrificing resolution or accuracy. Furthermore, modern deep-learning systems demand substantial computational resources that conflict with the strict power budgets of tiny devices. Our goal in this thesis is not merely to reduce power consumption, but to maintain intelligent capabilities while operating within extreme resource limitations. To summarize, these challenges indicate that miniaturization alone cannot bridge the gap between current technology and true ambient intelligence. We need to fundamentally redesign our approaches to sensing, computing, and communication in resource-constrained environments.

Figure 1.1: Examples of intelligent, low-power, and scalable sensors developed in this thesis.

In this thesis, we investigate how to achieve ambient intelligence by fundamentally reimagining sensing and computing architectures from first principles.
We show how bio-inspired meta-structures (see Figure 1.1) enable spatial perception with single sensors that match the performance of multi-element arrays while using 1000× less power; how ultra-low-power techniques can provide GPS-like positioning with sticker-sized tags operating for years on a single battery; and how novel sensing approaches can monitor food quality non-invasively throughout global supply chains. Through innovations at the intersection of physics and computation, we create sensing systems that achieve orders-of-magnitude improvements in energy efficiency, size, and scalability, making intelligence practical in environments previously considered impossible.

1.1 Challenges in Scaling Ambient Intelligence

The gap between our vision of integrating intelligence in everything and current reality stems from three fundamental challenges:

Nature's Efficiency vs. Artificial Systems: Nature has created intelligent systems that operate within incredible efficiency constraints. A fruit fly navigates complex 3D environments, avoids predators, and locates food with a brain of merely 100,000 neurons consuming just microwatts of power [9]. Similarly, desert ants perform precise navigation across vast distances without GPS, using minimal neural hardware [10]. In stark contrast, today's artificial intelligent systems capable of comparable perception [11], reasoning, and decision-making [12] require orders of magnitude more resources—hundreds of megabytes of memory and several watts of power [13]. This fundamental mismatch prevents direct application of current AI approaches in resource-constrained devices, creating a critical barrier to embedding intelligence in everyday objects.

Physical Limitations: Traditional sensing paradigms encounter fundamental physical barriers when scaled down to micro-devices.
Spatial perception systems like radar, sonar, and camera arrays rely on sampling-theory principles that demand multiple sensors separated by wavelength-dependent distances to avoid aliasing. The Nyquist-Shannon sampling theorem states that to accurately capture a signal with maximum frequency component fmax, sampling must occur at a rate of at least 2fmax [14, 15]. This creates critical hardware requirements: higher sampling rates demand faster ADCs with correspondingly higher power consumption. Additionally, diffraction limits optical sensing according to Abbe's criterion (d = λ/2NA), while signal-to-noise ratio deteriorates as sensor size decreases. These physics-based constraints collectively create seemingly insurmountable barriers to effective sensing in tiny platforms.

Figure 1.2: The intelligence-energy tradeoff: Conventional intelligent systems (top left) operate at high energy levels and offer good capabilities. Current resource-constrained systems (bottom right) provide limited intelligence. This thesis explores approaches to bridge this gap by creating systems that offer 100× more intelligence at similar energy levels.

Energy Sustainability at Scale: Achieving ambient intelligence requires not just creating intelligent systems but ensuring they can operate sustainably at planetary scale. As illustrated in Figure 1.2, current intelligent systems consume orders of magnitude more power, requiring frequent battery replacements or continuous power delivery. The energy gap is large mainly because perception typically requires hundreds of milliwatts, while energy harvesting provides only microwatts in many scenarios [16]. This gap cannot be bridged through incremental improvements alone; it demands radically different approaches to energy-conscious intelligence.

These challenges collectively highlight that simply miniaturizing conventional sensing and computing approaches is insufficient.
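To make these floors concrete, consider a back-of-the-envelope sketch of what the Nyquist rate and half-wavelength array spacing imply for a conventional ultrasonic sensing array. The numbers and names here are illustrative choices, not figures from this thesis:

```python
# Illustrative sketch (hypothetical numbers) of the hardware floors that
# sampling theory and wave physics impose on a conventional array sensor.

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def nyquist_rate(f_max_hz: float) -> float:
    """Minimum ADC rate for a signal band-limited to f_max (Nyquist-Shannon)."""
    return 2.0 * f_max_hz

def min_element_spacing(f_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Half-wavelength element spacing an array needs to avoid spatial aliasing."""
    return c / f_hz / 2.0

# For a 40 kHz ultrasonic signal, a common ranging frequency:
f = 40e3
print(f"ADC rate        >= {nyquist_rate(f) / 1e3:.0f} kS/s")
print(f"Element spacing >= {min_element_spacing(f) * 1e3:.2f} mm")
# A 9-element linear array spans roughly 8 x 4.3 mm, about 34 mm, already
# larger than an insect-scale robot; faster ADCs only raise the power draw.
```

At 40 kHz the spacing floor is about 4.3 mm per element, which is why shrinking an array below these wavelength-set limits sacrifices resolution rather than just size.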
The vision of ambient intelligence requires fundamentally reimagining how we design intelligent systems from first principles, particularly for extreme resource constraints.

1.2 Designing Ultra-Low-Power Intelligent Systems

Traditional approaches to intelligent systems follow a common pipeline: collect high-resolution signals using sensors, convert the analog signals to digital data, and process this data through computationally intensive, large ML models. This conventional paradigm tightly integrates sensing with digital processing, requiring substantial computational resources to extract meaningful information from raw sensor data. While this approach works well for resource-rich platforms like self-driving cars, drones, and humanoids, it fundamentally breaks down in resource-constrained scenarios. Tiny robots, wearable devices, and battery-free sensor networks cannot support the high-dimensional data acquisition, digital conversion, and large-model inference required by conventional intelligent systems. This mismatch between learning capabilities and hardware limitations represents a critical barrier to realizing ambient intelligence at scale.

Our approach departs from this conventional pipeline by fundamentally rethinking the relationship between sensing hardware and computation. We relax the strict hardware requirements for sensing on resource-constrained devices and instead leverage tiny ML models that operate on physically pre-processed signals. The key idea is to merge the fundamental laws of physics with learning principles, creating sensing frontends that perform signal transformations in the analog domain before any power-consuming digital processing occurs. In other words, these passive structures act as physical neural encoders, encoding domain knowledge directly into hardware that requires zero power to operate.
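The paradigm can be caricatured in a few lines. This is a conceptual sketch only: the random matrix below is a stand-in for a real passive structure, and every name in it is hypothetical:

```python
# Conceptual sketch (hypothetical code) of physical pre-processing: a passive
# structure applies a fixed linear transform "for free" in the analog domain,
# so the ADC digitizes a few features instead of the raw waveform, and the
# digital "model" can be tiny.
import numpy as np

rng = np.random.default_rng(0)
PHYSICAL_ENCODER = rng.standard_normal((8, 1024))  # fixed by geometry; 0 power

def sense(raw_signal: np.ndarray) -> np.ndarray:
    """What actually gets digitized: 8 features instead of 1024 samples."""
    return PHYSICAL_ENCODER @ raw_signal

def tiny_model(features: np.ndarray, centroids: np.ndarray) -> int:
    """Nearest-centroid classifier, small enough for a microcontroller."""
    return int(np.argmin(np.linalg.norm(centroids - features, axis=1)))
```

Here the 128× reduction in dimensionality happens before any power-consuming digitization; the systems in Section 1.3 realize such transforms with metamaterials and reconfigurable antennas rather than a matrix.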
For instance, in acoustic sensing, we implement this concept through 3D-printed metamaterials that transform omnidirectional microphones into direction-aware sensors (Figures 1.1a, 1.1b). For wireless applications, we develop gain-pattern reconfigurable antennas that embed spatial information directly into received signal strength measurements (Figure 1.1c). By performing these transformations in the physical domain, our approach dramatically reduces both data dimensionality and energy requirements. The resulting sensor outputs contain high-level features extracted with minimal energy consumption, requiring significantly smaller neural models for downstream processing while providing information that naturally aligns with the hierarchical structure of neural networks.

Figure 1.3: Our approach to sustainable ambient intelligence combines nature-inspired architectures, physics-informed AI, and scalable networks.

This approach combines three complementary strategies that together enable sustainable ambient intelligence, as illustrated in Figure 1.3. First, we draw inspiration from nature's efficient solutions to similar problems. Biological systems have evolved sophisticated sensing capabilities within strict resource constraints—owls can precisely locate prey in total darkness using asymmetric ear structures, insects navigate complex environments with tiny brains, and even plants respond to environmental stimuli through passive mechanisms. By studying these biological systems, we identify principles for efficient spatial sensing that can be translated into engineered systems through biomimetic design. This bio-inspired approach leads to sensing architectures that achieve remarkable capabilities with minimal active components.

Second, we integrate physics-informed machine learning to bridge the gap between simple hardware and complex perception tasks.
Our systems incorporate passive components that perform initial signal transformations in the analog domain without consuming power, effectively encoding domain knowledge directly into the hardware. These physical structures act as computational elements, extracting meaningful features before digital processing begins. The resulting signals naturally align with the hierarchical feature extraction performed by neural networks, dramatically reducing the computational burden on subsequent processing stages. This physics-informed approach enables us to extract maximum information from minimal hardware, achieving capabilities that would traditionally require complex, power-hungry systems.

Third, we develop techniques for scaling these intelligent systems across large networks and diverse environments. By reimagining how devices communicate, collaborate, and leverage existing infrastructure, we create systems that can be deployed at unprecedented scales—from city-wide sensor networks to micro-robotic swarms. This scalable approach enables widespread deployment of ambient intelligence without requiring dedicated infrastructure or frequent maintenance, making these systems practical for real-world applications.

Together, these strategies enable a new class of intelligent systems with transformative potential. They deliver extreme energy efficiency, supporting sensing functions that typically demand complex digital systems with only minimal power use. They improve robustness by leveraging core physical principles for better generalization. They also allow real-time adaptability through dynamically reconfigurable designs. Most importantly, they close the gap between ambient intelligence's promise and the practical limits of embedded platforms, making it feasible to deploy at scale.
1.3 Systems Developed

1.3.1 Structure-Assisted Spatial Intelligence

Spatial perception systems fundamentally rely on sensor arrays spanning multiple wavelengths, consuming hundreds of milliwatts of power and requiring substantial physical space. This dependency on arrays creates a critical barrier for resource-constrained devices like micro-robots and IoT sensors. We developed a different approach that achieves spatial sensing without arrays by combining carefully designed passive structures with minimal active components, significantly reducing both power and computational requirements.

Figure 1.4: Structure-assisted sensing systems: (A) SPiDR's depth imaging using a 3D-printed acoustic metamaterial, (B) SPiDR's acoustic stencil and its internal spatial filtering channels, (C) Owlet's acoustic stencil and its internal direction-finding channels, (D) Owlet's direction finding with passive acoustic structures.

Bio-inspired DoA Estimation. Conventional acoustic DoA estimation requires multiple synchronized microphones separated by half-wavelength distances. Drawing inspiration from how barn owls achieve precise sound localization through asymmetric ear structures, we developed Owlet [17], a system that reimagines direction estimation. The key insight lies in leveraging diffraction and Helmholtz resonance through 3D-printed metamaterials to create direction-dependent acoustic filtering (Figures 1.4C, 1.4D). By wrapping a single microphone with a structured pattern of holes and resonant cavities, each sound direction creates a unique spectral signature. We solved the critical challenge of environmental robustness through a two-microphone architecture that eliminates both source and environmental dependencies. The system achieves 3.6° angular error—matching a 9-microphone array while using 100× less power.

Single-sensor Depth Perception.
Insect-scale robots operating in unknown environments require depth perception for navigation, but existing solutions like LiDAR and ultrasound arrays are power-hungry and bulky. We developed SPiDR [18], a fundamentally new approach to depth perception using a single microphone-speaker pair. The core idea is to spatially encode the acoustic channel through a metamaterial "stencil" that creates unique signatures for each point in 3D space (Figures 1.4A, 1.4B). These physics-optimized waveguides embed both direction and distance information in single measurements, eliminating the need for scanning or arrays. Through sparse recovery algorithms informed by wave interference patterns, SPiDR achieves centimeter-level depth accuracy while consuming only 0.83 mJ per frame—a 400× improvement over traditional solutions. This work establishes new possibilities for perception in resource-constrained robotics.

Passive Spectral Analysis. Extending our structure-assisted sensing approach to spectral processing, we drew inspiration from how the human cochlea naturally decomposes sound into frequencies. We developed Lyra [19], a system that uses standing-wave resonators to implement FFT-like spectral analysis without power-hungry ADCs or digital processing. By leveraging wave interference patterns in carefully designed cavity structures, Lyra enables always-on acoustic monitoring with microwatt power consumption, making it suitable for continuous environmental sensing in battery-constrained or energy-harvesting scenarios.

RF Self-localization. Building on our success with acoustic metamaterials, we extended these principles to RF signals with Sirius [20]. While passive structures worked well for acoustic sensing, the small wavelength diversity in RF necessitated a different approach. We developed a gain-pattern reconfigurable antenna that dynamically embeds direction-specific codes in received signals.
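A minimal sketch of how such direction codes can be decoded follows. This is hypothetical code: Sirius ultimately uses a learned neural decoder, and the cosine gain model below is a stand-in for measured antenna patterns:

```python
# Illustrative sketch of decoding direction from gain-pattern amplitudes.
# Cycling the antenna through N patterns yields one envelope amplitude per
# pattern, a_i = G_i(theta) * s; normalizing the vector removes the unknown
# signal strength s, leaving a signature unique to the angle theta.
import numpy as np

N_PATTERNS = 4
ANGLES = np.arange(360)  # candidate AoAs in degrees

def pattern_gains(theta_deg: float) -> np.ndarray:
    """Stand-in gain model: N rotated cardioid-like patterns."""
    th = np.radians(theta_deg)
    return 1.0 + 0.5 * np.cos(th - np.radians(90.0 * np.arange(N_PATTERNS)))

# Pre-measured signature table: one normalized amplitude vector per angle.
TABLE = np.stack([pattern_gains(a) for a in ANGLES])
TABLE /= np.linalg.norm(TABLE, axis=1, keepdims=True)

def estimate_aoa(amplitudes: np.ndarray) -> int:
    """Match the normalized amplitude vector to the closest angle signature."""
    sig = amplitudes / np.linalg.norm(amplitudes)
    return int(ANGLES[np.argmax(TABLE @ sig)])

# A transmitter at 40 degrees with unknown strength s = 3.7:
print(estimate_aoa(3.7 * pattern_gains(40)))  # 40
```

The normalization step is what makes an amplitude-only (phase-free) frontend viable for spatial sensing: the unknown transmit power cancels, and only the direction-dependent shape of the gain vector remains.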
Recent advances show that envelope detectors enable ultra-low-power communication, but they cannot extract the phase information needed for spatial sensing. We solved this through a neural pipeline that learns to decode spatial information directly from signal amplitudes, achieving 7° angular accuracy while consuming 1000× less energy than array-based positioning systems. This enables sustainable sensor networks across agricultural fields, supply chains, and wildlife habitats where traditional GPS (25 mJ per fix) would quickly deplete batteries.

Figure 1.5: LiTEfoot leverages cellular signals for nationwide tracking using an ultra-low-power miniaturized receiver.

1.3.2 Scalable nextG Systems

From tracking perishable goods to monitoring elderly patients with dementia, continuous location tracking of small assets and individuals has become essential across numerous domains. However, existing solutions like GPS rely on bulky batteries or dedicated infrastructure, creating fundamental barriers to widespread deployment. We developed two complementary systems that reimagine global positioning for resource-constrained scenarios, enabling seamless tracking from nationwide supply chains to dense urban environments.

Ultra-low-power Location Tracking. Traditional cellular localization requires frequency hopping across multi-GHz bandwidths using power-hungry oscillators and IQ demodulators that consume over 100 mW. We developed LiTEfoot [21], a cellular-based self-localization system that uses non-linear intermodulation to simultaneously capture synchronization signals across 3 GHz of spectrum through a passive envelope detector (Figure 1.5). The system decodes Physical Cell Identities from the folded spectrum and performs multilateration, achieving 19 m accuracy while consuming only 40 µJ—a 625× reduction compared to GPS.
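The multilateration step can be sketched with a standard linearized least-squares fit. This is illustrative code with made-up coordinates, not the LiTEfoot implementation: once decoded PCIs are mapped to known tower positions and coarse range estimates, the tag position follows from the range equations ||p - t_i|| = r_i:

```python
# Illustrative multilateration sketch (hypothetical code and coordinates).
# Subtracting the first range equation from the others cancels the quadratic
# ||p||^2 term, leaving a linear system solvable by least squares.
import numpy as np

def multilaterate(towers: np.ndarray, ranges: np.ndarray) -> np.ndarray:
    """Estimate a 2D position from >= 3 tower positions and range estimates."""
    t0, r0 = towers[0], ranges[0]
    A = 2.0 * (towers[1:] - t0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(towers[1:]**2, axis=1)
         - np.sum(t0**2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Three towers and ranges consistent with a tag at (300, 400) meters:
towers = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 1000.0]])
tag = np.array([300.0, 400.0])
ranges = np.linalg.norm(towers - tag, axis=1)
print(multilaterate(towers, ranges))  # ≈ [300. 400.]
```

With noisy, quantized ranges the least-squares formulation degrades gracefully, which is what makes coarse cellular ranges usable for the tens-of-meters accuracy reported above.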
This enables 11-year operation on a coin-cell battery for nationwide asset tracking in supply chains, precision agriculture, wildlife conservation, and healthcare monitoring, all without requiring dedicated infrastructure.

Figure 1.6: Locate3D enables infrastructure-free tracking at scale through optimized peer-to-peer measurements.

Infrastructure-free 6DoF Tracking at Scale. For scenarios without cellular infrastructure, we developed Locate3D [22], a peer-to-peer system enabling infrastructure-free 6-degree-of-freedom tracking using UWB radios for massive-scale networks. The system introduces angle measurements alongside ranging to reduce the minimum number of edges needed for unique topology realization by 4× (see Figure 1.6). Using rigidity-aware spanning tree optimization and non-rigid graph decomposition, Locate3D achieves 0.86 m accuracy in building-scale networks and 12.09 m in city-scale deployments while reducing latency by 75%. The system seamlessly scales to track 100,000 devices across cities, enabling transformative applications from coordinating disaster-response teams to managing autonomous delivery fleets and monitoring smart-city infrastructure.

1.3.3 Systems for Sustainability

While 800 million people face hunger globally, nearly 40% of food produced is lost to waste, primarily due to inadequate monitoring during distribution. In collaboration with Microsoft Research, we developed sensing systems to enable data-driven food quality monitoring across global supply chains.

Non-invasive Food Quality Monitoring. Current food quality assessment methods require destructive testing, making continuous monitoring throughout distribution impractical. We developed FreshSense [23], a wireless sensing system that monitors dry matter content non-invasively at the pallet level.
The key challenge lies in measuring subtle changes in water content through densely-packed produce where traditional RF sensing fails due to complex multi-path effects. We solved this through a dispersion-based sensing approach that exploits frequency-dependent wave propagation in water-rich environments. By analyzing electromagnetic delays with physics-informed neural networks, FreshSense achieves robust quality assessment while eliminating environmental variations.

1.3.4 Other Contributions

In my Ph.D., I have also developed solutions across security, AI verification, and audio processing, demonstrating the broader applicability of our approach.

Security and Self-defense for Drones. As drones become trusted delivery systems and law enforcement tools, they face increasing risk of mid-air attacks and vandalism. We developed DopplerDodge [24], an acoustic sensing system enabling real-time threat detection and avoidance in resource-constrained drones. Using just a single microphone and Doppler effect analysis, the system detects incoming projectiles with 100 ms advance warning, enabling autonomous evasive maneuvers while consuming minimal power. This work establishes a new direction in embedded defense systems for tiny autonomous vehicles.

Side-channel Security in Embedded AI. With the proliferation of edge AI, verifying the trustworthiness of model inference is increasingly important. We developed ThermWare [25], a system that leverages thermal side-channels to detect anomalous computations in embedded AI systems. By capturing spatiotemporal heat signatures with a thermal camera, the system identifies unauthorized operations with 94% accuracy, enabling non-invasive run-time AI model monitoring without requiring system-level access or modifications.

Noise-cancellation for Wearables.
To improve voice communication in noisy environments, we developed VoiceFind [26], a speech enhancement system that uses just two microphones to achieve spatial filtering of desired speech. Through a combination of harmonic-based direction finding and conditional generative adversarial networks, the system improves speech intelligibility by 16% in real-world environments. This work demonstrates how physics-informed machine learning can enable sophisticated audio processing on resource-constrained wearables.

1.4 Organization

The remainder of this thesis is organized as follows. Chapter 2 introduces a bio-inspired acoustic sensing system that achieves direction finding with a single microphone through carefully designed metamaterials, enabling spatial audio perception in wearable devices. Chapter 3 presents a novel approach to depth perception for micro-robotics that provides high-resolution imaging with minimal power requirements through physics-based spatial encoding. Chapter 4 explores self-localization for IoT devices using reconfigurable RF structures and machine learning techniques, achieving long-range positioning with significantly reduced energy consumption. Chapter 5 details a cellular-based localization framework that enables nationwide tracking with sticker-sized tags by exploiting existing cellular infrastructure for applications in supply chains and healthcare. Chapter 6 describes an infrastructure-free tracking system that supports large-scale coordination of devices in urban environments through innovative peer-to-peer algorithms. Chapter 7 presents a non-invasive approach to food quality monitoring that creates three-dimensional maps of food quality within pallets through physics-informed sensing techniques. Chapter 8 concludes with reflections on the broader implications and future directions of this thesis.
Chapter 2: Structure-Assisted Spatial Audio Sensing

2.1 Introduction

Acoustic devices are increasingly becoming pervasive in our everyday environments. Beyond just voice interfaces, a broad spectrum of applications is emerging that leverages multiple facets of context-awareness and analytics. These applications encompass indoor activity monitoring based on sound [27–29], health monitoring through acoustic signals [30, 31], speech development support and acoustic environment sensing with on-body wearables [32, 33], along with numerous outdoor use-cases utilizing distributed sensor nodes [34, 35]. With the advent of new low-power and battery-free technologies [36, 37], it has become feasible to continuously capture and process sound using independent sensing modules distributed throughout the environment. Adding spatial analysis of sound and source localization can significantly enhance the capabilities of such context-aware systems. Meanwhile, spatial sensing of sound is also critical for robotic navigation and situational awareness systems, both in aerial environments [38–40] and underwater scenarios [41, 42]. However, conventional methods for obtaining spatial sound information typically rely on capturing multiple synchronized audio streams through microphone arrays, which is a power-intensive hardware requirement that is challenging for standalone sensing modules. In this chapter, we aim to build an acoustic sensing system that enables spatial information processing on power-constrained ubiquitous devices with compact form factors.

Figure 2.1: The vision and technical overview of Owlet, a low-power and miniaturized system for extracting spatial information from sound.
Owlet uses acoustic microstructures to embed direction-specific signatures on the recorded sound and develops a learning-based approach for signature recovery and mapping in real-time.

Estimating spatial features of sound, such as direction-of-arrival (DoA) or source location, traditionally relies on sampling the acoustic wavefront across space using a microphone array. Since conventional DoA algorithms are fundamentally based on this spatial sampling model, both the array dimensions and the number of microphones directly impact their performance. According to the sampling theorem [14], microphones in a linear array must ideally be spaced at half the signal's wavelength (λ/2) for accurate DoA estimation. Moreover, the angular resolution, inversely proportional to the half-power beamwidth, improves with the total length of the array aperture. Thus, achieving fine-grained DoA resolution typically demands large physical arrays. Additionally, these arrays require simultaneous sampling across all microphones, causing power consumption and hardware complexity to grow with array size. Although acoustic devices are increasingly common in ubiquitous computing, the power, hardware, and form-factor constraints limit the feasibility of traditional array-based spatial sensing. In this chapter, we explore an alternative approach to spatial signal processing that moves away from the standard spatio-temporal sampling paradigm, leveraging wave-structure interactions to enable low-power, compact, and simple designs.

Directional hearing aided by structural interactions is widespread in nature. The symmetric placement of ears in most mammals, including humans, effectively forms a two-element array for directional processing. However, biophysical studies show that fine-grained localization relies heavily on how sound interacts with the complex three-dimensional geometry of the head [43].
Owls, for example, possess asymmetrically positioned ears along both horizontal and vertical planes [44], allowing them to precisely localize low-frequency sounds, a task difficult to achieve with symmetric structures alone. Remarkably, certain insects with body sizes much smaller than one-tenth of the sound wavelength can localize sound as accurately as mammals [45]. A grasshopper, for instance, with a body width of just 3 mm, achieves precise localization despite its small size relative to the sound wavelength. This is enabled by asymmetrical body structures that produce direction-dependent responses to incoming sounds. The sensory and neural systems of these organisms have evolved to map such responses to sound direction. Inspired by these biological mechanisms, we design a structure-assisted DoA estimation system suited for power-constrained and miniaturized sensing platforms.

In this chapter, we present the design and prototype of an acoustic localization system that introduces physical structures around a microphone to embed directional cues. As acoustic waves propagate, they interact with physical structures, resulting in transformations of the wave field. Such interactions are evident at large scales in room acoustics, where the same sound can differ based on the room's shape, size, and object placement. We demonstrate that small 3D-printed structures can similarly manipulate sound waves, imprinting unique signatures onto the passing sound. By placing a microphone within a structure only a few centimeters in size, the recorded signals inherently carry these signatures. With careful design, the structure can embed distinct signatures for sounds arriving from different angles, achieving angular resolutions of a few degrees. Our system detects these embedded signatures to infer the direction of arrival (DoA) of sound. We name this system Owlet, inspired by a bird known for its exceptional auditory capabilities.
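The signature-to-direction mapping just described can be made concrete with a toy lookup sketch. The signatures below are synthetic placeholders, and Euclidean nearest-neighbor matching stands in for Owlet's actual learned model, which is described later in this chapter.

```python
import math, random

# Minimal illustration of signature-based DoA inference: a table of
# direction-specific signatures, then nearest-signature matching at run time.
# Signatures here are random stand-ins; Owlet learns the mapping instead.
random.seed(0)
N_FREQ, ANGLES = 32, range(0, 360, 10)

# One signature vector per direction (a toy "calibration table").
table = {th: [random.gauss(0, 1) for _ in range(N_FREQ)] for th in ANGLES}

def estimate_doa(signature):
    """Return the table angle whose signature is closest in Euclidean distance."""
    return min(table, key=lambda th: math.dist(table[th], signature))

# A noisy observation of the 130-degree signature still maps back to 130.
observed = [g + random.gauss(0, 0.1) for g in table[130]]
print(estimate_doa(observed))  # 130
```

The matching stays reliable as long as the structure keeps signatures for different angles well separated, which is exactly the design goal discussed in the stencil-optimization section.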
The idea of leveraging environmental variations in sound fields for localization is not new. Prior work has explored fingerprinting multipath environments and analyzing reflections for localization [46]. Closest to our approach is [47], which places objects in a 60 × 60 cm area with a central microphone, showing that scattered sound carries directional cues that can be used for DoA estimation. While Owlet builds on similar principles, it differs in two critical ways. First, we focus on creating a centimeter-scale sensing system suitable for resource-constrained robots and ubiquitous sensing applications. Our Owlet prototype achieves angular resolutions comparable to or better than previous work, with a compact 1.5 cm × 1.3 cm sensor. Second, we address robustness to environmental changes. Owlet is designed to operate beyond controlled environments like anechoic chambers and does not require location-specific training data.

A primary challenge for Owlet is achieving sufficient multipath diversity within a small form factor. Because low-frequency acoustic signals have large wavelengths, traditional reflection-based approaches require similarly large reflectors to create diversity, directly limiting spatial resolution. We overcome this limitation by developing a diffraction-based technique for miniature acoustic structures. When sound passes through small apertures, it diffracts, effectively generating new secondary sources. We exploit this phenomenon by designing a 3D-printed cylindrical cover, called a stencil, that surrounds the microphone. These stencils incorporate optimally patterned holes that create complex but predictable multipath interference inside the structure. The resulting interference patterns carry signatures encoding the direction of arrival. To enhance angular diversity, we also integrate principles of metamaterial design into the stencil.
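The diffraction-as-virtual-sources picture above can be illustrated with a toy interference sum: each hole re-radiates the incident plane wave, and the microphone observes the superposition. The hole layout and dimensions below are invented for illustration and are not an Owlet design.

```python
import cmath, math

# Toy model of diffraction through small apertures: each hole acts as a
# virtual point source, and the microphone observes their interference.
# Geometry is made up for illustration; the amplitude at the mic varies with
# the incidence angle, which is the direction-dependence the stencil exploits.

C = 343.0                                 # speed of sound, m/s
HOLES = [(0.006, 0.0), (0.004, 0.0045), (-0.002, 0.0057), (-0.006, -0.002)]
MIC = (0.0, 0.0)                          # microphone at the center

def mic_response(angle_deg, freq):
    """Interference sum at the mic for a plane wave arriving from angle_deg."""
    k = 2 * math.pi * freq / C
    d = (math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg)))
    total = 0j
    for hx, hy in HOLES:
        phase_in = k * (hx * d[0] + hy * d[1])      # plane-wave phase at hole
        r = math.hypot(hx - MIC[0], hy - MIC[1])    # hole-to-mic path length
        total += cmath.exp(1j * (phase_in + k * r)) / r
    return total

for ang in (0, 60, 120, 180):
    print(f"{ang:3d} deg: |u| = {abs(mic_response(ang, 7000.0)):.1f}")
```

Even this crude four-hole model produces angle-dependent amplitudes; the real design additionally shapes the frequency response through the internal microstructures.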
The Owlet system learns these signatures through a one-time calibration and maps them to DoA estimates during operation.

Another major challenge is ensuring that the design remains robust against environmental changes that can unpredictably alter incoming sound. For practical deployment, the system must function reliably across diverse environments while requiring only a one-time calibration during manufacturing. As previously noted, room acoustics can distort the sound field and compromise the mapping between directional signatures and angles. To address this, Owlet introduces a reference microphone and adopts a communication-theoretic approach to suppress transient multipath effects during signature generation and mapping. This technique enhances Owlet's robustness to environmental variations, making it viable for real-world applications.

This chapter explores acoustic structures as passive components for creating low-power, miniaturized solutions in ubiquitous sensing. Potential applications include wearable devices for acoustic environment sensing, such as systems for assessing speech development in infants [48, 49] or personal acoustic analytics [50, 51], where sound direction is critical. Navigation in SWaP-constrained [52, 53] aerial and underwater robots can also benefit from spatial sensing capabilities enabled by Owlet. Moreover, Owlet offers a path toward directional sensing and localization in energy-harvesting systems, a task challenging for traditional microphone arrays. Figure 2.1 illustrates the broader vision and technical overview of this work. While many application opportunities emerge from this platform, this chapter focuses on developing the core capabilities and understanding the system's fundamental limits.

In this chapter, we make the following three contributions:

• A novel method of using passive elements for directional sensing, enabling a low-power, low-complexity, and miniaturized system for acoustic localization.
The sensing and signal processing techniques ensure robust DoA estimation with a single in-lab calibration. The system achieves a median DoA error of 3.6°, comparable to microphone array-based solutions while significantly reducing power and size requirements.

• A replicable process for designing and 3D-printing optimal acoustic structures that encode incoming sounds with directional cues, presenting a new approach for shaping sound fields using controlled diffraction in compact metamaterial structures.

• A complete hardware and software prototype of the Owlet system, made available for the community to reproduce, evaluate, and extend.

In the following sections, we detail the core intuition, system design, and key findings of this work.

Figure 2.2: The concept of using a stencil with direction-specific hole patterns and microstructures for passive filtering of the incoming sound. The stencil embeds a directional response to the recorded signals.

2.2 Core Intuitions and Primers

The core idea of our system is to engineer a controlled environment around the microphone such that the recorded signal carries a unique, direction-specific channel impulse response. This impulse response, extracted from the microphone recording, serves as a signature of the sound's angle of arrival. While larger objects or room acoustics naturally introduce diverse multipath effects that embed directional cues, our goal is to achieve finer-grained diversity within a compact form factor by combining principles of diffraction, interference, and structural resonance. Toward this end, we design a porous cap for the microphone, referred to as a stencil. The stencil features patterned holes on different sides, as illustrated in Figure 2.2.
Incoming sound waves pass through these holes, and depending on the angle of incidence, different hole patterns influence the wave before it reaches the microphone. The holes are coupled with internal microstructures of varying parameters, imparting distinct frequency responses to the sound. The stencil acts as a metamaterial, where the internal microstructures modulate the incoming sound and imprint a unique directional signature. Because the response of these microstructures is frequency-dependent, the directional signature is represented as a vector of complex frequency gains, Gθ. This concept is depicted in Figure 2.3.

Figure 2.3: The concept of passive directional filtering using a stencil of acoustic microstructure. The stencil embeds a directional signature to the recorded sound unique to its direction of arrival (DoA). The spectrum of complex gains represents the signature for further computation.

2.2.1 Metamaterials for Passive Filtering

When sound interacts with physical structures, its frequencies are either amplified or attenuated. At larger scales, multipath reflections create such frequency variations through constructive and destructive interference. While reflections can embed directional signatures into sounds, they typically require structures comparable in size to the acoustic wavelength. Since Owlet targets low-frequency audible signals with large wavelengths, traditional reflection-based approaches would demand structures on the order of half a meter, prohibitively large for our goals. To achieve passive filtering within a compact design, we leverage the concept of acoustic metamaterials. Metamaterials are artificially structured materials composed of subwavelength elements that endow the material with novel properties.
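The half-meter figure above is quick arithmetic from the acoustic wavelength (speed of sound taken as 343 m/s; the frequencies below are illustrative):

```python
# Reflector-size arithmetic: a reflector must be comparable to the wavelength,
# and low audible frequencies have wavelengths from centimeters up to a meter.
# Speed of sound assumed 343 m/s; frequencies chosen for illustration.
C = 343.0  # m/s

for f in (343.0, 1000.0, 7000.0):
    lam = C / f
    print(f"{f:5.0f} Hz: wavelength = {lam*100:5.1f} cm, lambda/2 = {lam*50:5.1f} cm")
```

At 343 Hz the half-wavelength is 50 cm, which is the "half a meter" scale quoted above, while a 1.5 cm stencil is deeply subwavelength across the entire audible band, hence the need for metamaterial techniques.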
In constructing the metamaterial stencil, we utilize three key principles: (a) diffraction, (b) capillary effects, and (c) structural resonance.

(a) Diffraction: When waves encounter the edge of an obstacle, they bend around it, a phenomenon known as diffraction. This behavior is particularly pronounced when sound passes through an aperture smaller than its wavelength [54]. In such cases, the hole behaves as a virtual point source. When a receiver is placed behind a barrier with multiple small apertures, it observes a multipath-like environment formed by multiple virtual sources. The interaction of signals from these sources creates patterns of constructive and destructive interference, influenced by both the receiver's position and the sound frequency. We exploit this property by designing diverse hole patterns on the stencil, enabling a rich multipath environment around the microphone within a small form factor.

(b) Capillary Effect: As sound propagates through narrow capillary tubes, its acoustic impedance varies significantly [55]. The dimensions of these tubes, particularly their length and cross-sectional area, influence the speed and phase of the transmitted sound. By integrating capillary tubes of different geometries into the stencil, we introduce controlled phase shifts between sound paths. This enhances frequency diversity, even when the physical separations between the holes are small.

Figure 2.4: Different types of metamaterial stencils used in our experiments: a cylindrical stencil with capillary tubes, a cylindrical stencil with micro-resonators, and a hemispherical stencil with capillary tubes (US penny for scale), along with the internal cavity structure of the stencils.

(c) Structural Resonance: Certain sound frequencies are amplified when oscillating air pressures interact with cavities along their path [56]. This phenomenon, known as Helmholtz resonance, is commonly observed in whistling bottles.
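The resonance frequency of such a cavity can be estimated with the standard Helmholtz formula, f = (c/2π)·√(A/(V·L_eff)). The dimensions below are illustrative only, not the actual Owlet resonator geometry, and the end-correction factor is a common textbook approximation:

```python
import math

# Helmholtz resonance of a millimeter-scale cavity (textbook formula;
# dimensions are illustrative, not an actual Owlet resonator design).
c = 343.0                      # speed of sound, m/s
r = 0.3e-3                     # neck radius: 0.3 mm
A = math.pi * r**2             # neck cross-sectional area
L = 2.0e-3                     # neck length: 2 mm
L_eff = L + 1.7 * r            # neck length with a common end correction
V = 15e-9                      # cavity volume: 15 mm^3

f_res = (c / (2 * math.pi)) * math.sqrt(A / (V * L_eff))
print(f"resonance ~ {f_res:.0f} Hz")   # lands in the audible band
```

Millimeter-scale necks and cavities thus place resonances within the audible band Owlet operates in, which is why varying resonator geometry yields diverse frequency responses.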
We design millimeter-scale Helmholtz resonators embedded within the stencil, connected to the sound holes. By varying the shapes and dimensions of these resonators, we generate arbitrary resonance effects across different frequencies. Figure 2.4 shows examples of 3D-printed stencils with embedded microstructures for directional filtering. Figure 2.5 illustrates the improvement in angular diversity provided by the stencil, comparing the amplitude variation of a 7 kHz tone with and without the stencil. Finally, Figure 2.6 presents the corresponding diversity in direction-specific frequency responses across different stencil designs.

Figure 2.5: Angular diversity of the microphone with and without the microstructure stencil.

2.3 System Design

The system design focuses on two key objectives: (a) creating an optimal stencil structure that maximizes angular diversity, and (b) developing signal processing techniques to estimate the direction of arrival (DoA) from the recorded signal. Naturally, the accuracy of the system depends directly on the diversity introduced by the stencil. Our algorithms optimize the stencil design by simulating wave propagation around small structures and subsequently fabricating the optimized stencil using 3D printing. Before delving into stencil design details, we first describe the signal processing and DoA estimation techniques, providing an overview of the complete system.
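One simple way to score the "angular diversity" objective above is the minimum pairwise Euclidean distance between direction-specific gain patterns. The sketch below uses random placeholder patterns in place of simulated wave-propagation results, and a keep-the-best loop as a stand-in for the actual design search:

```python
import math, random

# Sketch of the diversity objective: score a candidate stencil design by the
# minimum pairwise Euclidean distance between its direction-specific gain
# patterns. Patterns here are random placeholders for simulator output.
random.seed(0)

def min_pairwise_distance(patterns):
    vals = list(patterns.values())
    return min(math.dist(a, b)
               for i, a in enumerate(vals) for b in vals[i + 1:])

def random_design(n_angles=12, n_freq=64):
    """One candidate design: a gain-pattern vector per direction."""
    return {a: [random.gauss(0, 1) for _ in range(n_freq)]
            for a in range(0, 360, 360 // n_angles)}

# Keep the best of several candidates (a stand-in for the simulation loop).
best = max((random_design() for _ in range(20)), key=min_pairwise_distance)
print(f"best design min-distance: {min_pairwise_distance(best):.2f}")
```

Maximizing the worst-case pairwise distance corresponds to guaranteeing a minimum DoA resolution in every direction, the same criterion used later in the stencil-optimization section.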
Figure 2.6: Comparison of the diversity in frequency responses (amplitude and phase) of the three types of metamaterial stencils: (a) Cylinder_capillary, (b) Cylinder_resonator, and (c) Hemisphere_capillary, each measured over 4–8 kHz for sources at 0°, 60°, 120°, and 180°.

2.3.1 Processing for DoA Estimation

At a high level, Owlet's DoA estimation operates in two stages. First, during a one-time in-lab calibration, we generate a table of direction-specific signatures Gθ by sending known wideband signals from various directions. These signals are recorded by a microphone equipped with the stencil cap, capturing the direction-dependent modifications imposed by the structure. This calibration process is analogous to the procedures used for commercial microphone arrays. The second stage occurs at run-time, where the system processes incoming sounds to extract the stencil-induced signature, Hstencil, and matches it against the pre-collected signature table to infer the DoA. In practice, we train a deep learning model on variations of the signature table, allowing the system to predict the DoA directly from pre-processed signals during real-world operation. A critical aspect of this processing is accurately extracting Hstencil from real-world signals, which must overcome two challenges: (i) separating the stencil's signature from the unknown source signal, and (ii) mitigating distortions caused by environmental multipath.
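In the idealized case of a known source and no environment, the extraction step reduces to frequency-domain division of the recorded spectrum by the source spectrum. A self-contained numerical check of that special case (synthetic signals; naive DFT to avoid dependencies):

```python
import cmath, random

# Known-source, no-environment extraction: dividing the recorded spectrum by
# the source spectrum recovers the stencil response exactly. Signals are
# synthetic; a naive O(n^2) DFT keeps the sketch dependency-free.

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

random.seed(1)
x = [random.gauss(0, 1) for _ in range(64)]      # known calibration signal
h = [0.9, 0.0, -0.4, 0.2] + [0.0] * 60           # toy stencil impulse response
X, H = dft(x), dft(h)
Y = [Xi * Hi for Xi, Hi in zip(X, H)]            # recorded spectrum: Y = X * H
H_est = [Yi / Xi for Yi, Xi in zip(Y, X)]        # spectral division
h_est = idft(H_est)
print(abs(h_est[0].real - 0.9) < 1e-9)           # True: h recovered
```

With an unknown source or room reflections this division is no longer available, which is exactly what the two-microphone design in the following subsections addresses.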
We first describe our method for eliminating dependency on the source signal, followed by techniques to suppress environmental effects.

2.3.2 Eliminating Source Signal Dependency

The signal recorded by the microphone inside the stencil is the source signal modified by the stencil's directional response. Assuming no environmental effects, if X(ω) denotes the source signal and Yin(ω) the recorded signal, their relationship in the frequency domain is:

Yin(ω) = X(ω) Hstencil    (2.1)

When the source signal X(ω) is known, the stencil response Hstencil can be directly obtained by dividing the recorded signal by the source signal, i.e., Hstencil = Yin(ω)/X(ω). In certain applications, such as navigation where a robot localizes itself using a known control signal, this assumption holds. However, in many other applications, including ambient sound localization or user direction detection from speech, the source signal is unknown. To address this, Owlet introduces a secondary microphone placed outside the stencil. This reference microphone records the incoming sound without the stencil's influence, providing an independent observation of the source signal. Unlike traditional microphone arrays, the secondary microphone can be placed arbitrarily close to the primary microphone without affecting system operation. Figure 2.7 illustrates the physical setup and realistic signal model of the system.

Figure 2.7: The two-microphone model for eliminating source and environmental dependency. The inside microphone records Yin = X · Henv · Hstencil and the outside microphone records Yout = X · H′env, where Henv and H′env are the environmental multipath responses and Hstencil is the direction-specific response of the stencil.

Consider the channel frequency responses from the source to the inside and outside microphones as Henv and H′env, respectively. These responses capture the effects of multipath propagation, including reflections from nearby objects.
The presence of the stencil around the internal microphone introduces an additional modulation represented by the frequency response Hstencil. Assuming linearity of the system, the signal recorded by the inside microphone experiences both environmental and stencil-induced transformations, as illustrated in Figure 2.7. Thus, the signals recorded by the inside and outside microphones, Yin(ω) and Yout(ω), can be expressed as:

Yin(ω) = X(ω) Henv Hstencil + N(ω)
Yout(ω) = X(ω) H′env + N′(ω)    (2.2)

where X(ω) denotes the source signal, and N(ω) and N′(ω) represent independent noise at the two microphones. Dividing Yin(ω) by Yout(ω) cancels the dependency on the unknown source signal but retains a residual environmental dependency:

Yin(ω)/Yout(ω) = Hstencil (Henv/H′env) + N′′(ω),    (2.3)

where N′′(ω) ≪ Hstencil (Henv/H′env). This residual term, Henv/H′env, implies that without further correction, the system remains sensitive to environmental variations. Thus, the stencil calibration or the training of the deep learning model would need to be performed for every new environment to ensure accurate angle prediction. Such a setup may be feasible in controlled scenarios where the positions of sound sources and sensing modules are fixed, such as object tracking on a conveyor belt. However, in most practical applications, the source location is unknown and variable, making exhaustive environment-specific training impractical. To overcome this limitation, we introduce a technique to eliminate environmental dependency, enabling Owlet to operate reliably with a single in-lab calibration.

2.3.3 Eliminating Environmental Dependency

Our approach to removing environmental dependency is based on the observation that, although Henv and H′env can vary unpredictably across environments, their ratio Hratio = Henv/H′env remains bounded when the microphones are placed closely together. This can be intuitively understood by considering the origin of environmental diversity.
After emission from the source, sound waves reflect off various surfaces, creating multiple paths that arrive at the microphones with different delays. These differences in path lengths cause variations in the observed environmental response. When microphones are widely separated, these path differences are significant, leading to substantial variability in Henv and H′env. However, when the microphones are placed within a few centimeters of each other, the differences in reflection paths become small and bounded. At the limit, if the microphones are exactly co-located, they observe identical environmental responses. Therefore, the ratio Hratio = Henv/H′env has a narrow distribution across frequencies when the microphones are closely spaced. We validate this property through both simulated ray tracing and real-world experiments, demonstrating that environmental variations can be effectively suppressed, enabling robust DoA estimation without location-specific retraining.

Once the distribution of Hratio is known and Hstencil is collected through the calibration process, we generate synthetic training data by sampling from the distribution of Hratio and combining it with Hstencil. This synthetic data enables robust training of the deep learning model for angle prediction without requiring real-world sound traces. Moreover, if the dimensions of the target environment and the positions of major reflectors are known, the synthetic training data can be tailored accordingly. Such customization accelerates model convergence and improves prediction accuracy. At run-time, the system extracts Hratio Hstencil from the two recorded channels, Yin(ω) and Yout(ω). To enhance this extraction, we employ a Recursive Least Squares (RLS) adaptive filter [57] operating in system identification mode.
The adaptive filter exploits the uncorrelated Gaussian noise between the two channels to estimate Hratio Hstencil by minimizing the following error term via gradient descent:

e(ω) = Yin(ω) − Yout(ω) (Henv Hstencil / H′env)    (2.4)

2.3.4 Synthetic Training for Deep Learning

We use the synthetic channel responses generated above to introduce environmental diversity into the training set for the neural network model. Specifically, we simulate different room environments and various source-microphone placements to create a wide range of Hratio Hstencil samples. Each sample is represented as a vector of 400 equally spaced frequency components between 0 and 8 kHz. Rather than using raw complex values, we separately extract the amplitude and phase spectra for training.

Figure 2.8: The architecture of the proposed CNN model. The input matrix of amplitude and phase spectra (400 × 2) passes through three convolutional layers with kernels 2×7, 1×5, and 1×3 (output sizes 394 × 64, 390 × 128, and 388 × 256), followed by a fully connected layer and a regression layer producing the predicted angle (1 × 1).

We adopt a Convolutional Neural Network (CNN)-based regression model for DoA estimation. CNNs are well-suited for environmental sound processing [58] and offer low-latency performance due to their compact parameter sets. Our model consists of a one-dimensional CNN with three convolutional layers followed by a fully connected layer and an output regression layer.
The convolutional layers have 64, 128, and 256 filters with kernel sizes of 2×7, 1×5, and 1×3, respectively. The regression layer minimizes the half-mean-squared-error loss for angular prediction. We customize the loss function according to the target range and resolution of the directional angles. ReLU activation functions and batch normalization layers are used between convolutional layers to accelerate training, and stochastic gradient descent serves as the optimizer. The model is trained for 100 epochs with a learning rate of 1×10⁻⁶. The block diagram of the CNN architecture is shown in Figure 2.8.

In addition to the regression model, we also develop a CNN-based classification model for evaluation and comparison, as described in Section 2.5.8. This model largely mirrors the regression architecture but replaces the output with a fully connected layer of size 360 (one node per degree) followed by a Softmax and classification layer.

2.3.5 Optimizing 3D Stencil Design

The performance of our system depends critically on the diversity of the frequency gain patterns across different angles. Our initial feasibility studies using randomly distributed holes on the stencil demonstrated a median DoA error of 7°. However, the angular resolution was non-uniform, with some directions exhibiting significantly worse accuracy. This issue arises from suboptimal distributions of holes and internal microstructures, leading to similar gain patterns for different angles. To address this, we systematically optimize the 3D stencil design to guarantee a minimum DoA detection resolution across all directions. Ideally, the stencil should maximize the diversity of frequency gain patterns associated with each possible DoA. This design problem parallels the information-theoretic challenge of constructing maximally diverse code sequences. Each frequency gain pattern, Gθ, associated with an angle θ, can be thought of as a codeword.
Our goal is to design a set of N codewords that are maximally distant from each other (i.e., maximizing the Euclidean distance between all pairs of codewords). The number of codewords, N, determines the angular resolution, ∆θ = 2π/N. Initially, we attempt to design ideal codewords of discrete frequencies and use them as guidelines for constructing desired gain patterns Gθ. These patterns are then mapped to physical arrangements of pinholes on the stencil surface at corresponding angles. Given the number of holes N, the distances from the microphone to the holes rn, the distance from the microphone to the stencil surface D, and the acoustic wavelength λ, the superposition of waves at the microphone can be modeled as:

u(λ) = Σₙ₌₁ᴺ (D / (jλ rn²)) e^(j2πrn/λ)    (2.5)

This equation defines the resu