ABSTRACT

Title of Dissertation: WIRELESS SENSING AND ANALYTICS FOR MOTION MONITORING AND MAPPING

Guozhen Zhu
Doctor of Philosophy, 2023

Dissertation Directed by: Professor K. J. Ray Liu
Department of Electrical and Computer Engineering

Environmental perception is pivotal for intelligent systems, enabling them to adeptly capture, interpret, and act upon contextual cues. Grasping the intricacies of the environment—its objects, occupants, floor plan, and dynamics—is fundamental for the effective deployment of technologies, including robotics, the Internet of Things (IoT), and augmented reality. Traditional perception mechanisms, such as video surveillance and sensor-based monitoring, are often hampered by privacy concerns, substantial infrastructural costs, energy inefficiencies, and limited coverage. In contrast, WiFi sensing stands out for its non-intrusive, cost-effective, and pervasive attributes. Capitalizing on ubiquitous WiFi signals that permeate both indoor and outdoor spaces, WiFi sensing delivers unparalleled advantages over its traditional counterparts, sidestepping the need for extra hardware while offering profound environmental insights. Its capability to penetrate walls and other obstructions further broadens its range, covering areas beyond the reach of conventional sensors. These unique advantages elevate the value of WiFi sensing across diverse applications, spanning smart homes, health monitoring, location-based services, and security systems. Amplifying environmental perception via WiFi sensing is more than just an innovation in ubiquitous computing; it is a leap towards forging safer, more efficient, and smarter environments. This dissertation explores monitoring and mapping environments leveraging motion analytics based on commodity WiFi. In the first part of this dissertation, we introduce an efficient and cost-effective system for precise floor plan construction by integrating RF and inertial sensing techniques.
The proposed system harnesses detailed insights from RF tracking and broad context from inertial metrics, such as magnetic field strength, to produce an accurate map. The system employs a robot for trajectory collection and requires only a single Access Point arbitrarily installed in the space, both of which are widely available nowadays. Impressively, the system can produce detailed maps even with minimal data, making it adaptable for diverse structures such as shopping centers, offices, and residences without significant expenses. We validated the efficacy of the proposed system using a DJI RoboMaster S1 robot equipped with standard WiFi across three distinct buildings, demonstrating its capability to produce reliable maps for the intended regions. Given the widespread presence of WiFi setups and the increasing prevalence of domestic robots, the proposed approach paves the way for universal intelligent systems offering indoor mapping services. In the second and third parts, we present two innovative strategies leveraging WiFi to identify the motions of human and various non-human subjects. Initially, we detail a novel passive, non-intrusive methodology tailored for edge devices. By extracting and analyzing physically and statistically plausible features of motion, our system recognizes human and diverse non-human subjects through walls using a single WiFi link. Experimental results from four distinct buildings with various moving subjects validate its efficiency on edge devices. Advancing to more intricate cases, we put forth a deep learning-based WiFi sensing paradigm, which delves into the efficacy of diverse deep learning models on human and non-human object recognition and probes the feasibility of transferring image-trained models to fulfill the WiFi sensing task. Designed with a robust statistic invariant to the environment and position, this system efficiently adapts to new surroundings.
Comprehensive experimental evaluations affirm our framework’s precision in pinpointing intricate human and non-human subjects, and its readiness for integration into prevalent intelligent systems, thereby boosting their perceptual capacities. In the final part of this dissertation, we propose a pioneering through-wall indoor intrusion detection system that adeptly filters out interference from non-human subjects using ubiquitous WiFi signals. A novel deep learning architecture is proposed for single-link WiFi signal analysis. It employs a ResNet-18-based module to extract features of indoor moving subjects and an LSTM-based module to incorporate temporal information for efficient intrusion detection. Notably, the system is invariant to environmental changes, angles, and positions, enabling swift deployment in new environments without additional training. Evaluation in five indoor environments with various types of interference yielded high intrusion detection accuracy and a low false alarm rate, even without model tuning for unseen settings. The results underscore the system’s exceptional adaptability, positioning it as a top contender for widespread intelligent indoor security applications.

WIRELESS SENSING AND ANALYTICS FOR MOTION MONITORING AND MAPPING

by

Guozhen Zhu

Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2023

Advisory Committee:
Professor K. J. Ray Liu, Chair/Advisor
Professor Gang Qu, Co-Chair
Professor Min Wu
Dr. Beibei Wang
Professor Lawrence C. Washington

© Copyright by Guozhen Zhu 2023

Dedication

To my family — Xinhua Zhu and Wenchuan Guo

Acknowledgments

I am immensely grateful for the collective support, mentorship, and camaraderie that have contributed to the completion of this dissertation. First and foremost, my heartfelt gratitude goes out to my advisor, Professor K. J.
Ray Liu, for his belief in my abilities, which paved the way for the exploration of challenging yet profoundly intriguing projects throughout my four-year journey. His continual availability and prompt responses to my questions and requests for guidance never wavered; there has not been an instance when I reached out for advice and he did not generously lend his time. The experience of collaborating with and learning from such an extraordinary individual has been an unparalleled privilege and pleasure. I am sincerely thankful to my mentors, Dr. Chenshu Wu and Dr. Beibei Wang, for their invaluable advice, patience, mentorship, and for constantly challenging me to achieve more. I am truly inspired by their commitment and dedication to academic excellence. Their suggestions, discussions, and dedication to my progress have been a beacon throughout this journey. I would like to thank my co-authors, Dr. Yuqian Hu, Dr. Xiaolu Zeng, and Weihang Gao, for their valuable contributions, intellectual stimulation, and spirited discussions. They have enriched my research experience and boosted my learning curve. I would like to extend my gratitude to my committee members, Prof. Min Wu, Prof. Gang Qu, and Prof. Lawrence C. Washington, for their precious time and efforts in serving on my dissertation committee and reviewing the manuscript. My journey would not have been the same without the camaraderie and support from my lab mates, Wei-Hisang Wang, Dr. Sai Deepika Regani, Dr. Muhammed Zahid Ozturk, Dr. Fengyu Wang, and Sakila S. Jayaweera. Their collective contribution in providing a collaborative, intellectually stimulating environment was invaluable. I am fortunate to have shared this journey with such a dedicated and friendly group of individuals. I wish to express my special thanks to my best friend Mengting Xing, for her support, love, and understanding. Her constant motivation and companionship provided comfort and strength during challenging times.
I also want to thank my wonderful friends here, Ke Gong, Yexin Cao, and Xiaojue Chen, for their friendship and unwavering support during this journey. Finally, I would like to express my deepest gratitude to my parents, Mr. Xinhua Zhu and Dr. Wenchuan Guo. Your unconditional love, encouragement, advice, and faith in my abilities have been my rock. Thank you for instilling in me the values of hard work and perseverance, and for your endless sacrifices to provide me with the best opportunities. This achievement is as much yours as it is mine. To all mentioned and those unmentioned who have contributed to my academic journey in any form, I am eternally grateful. I am reminded of an African proverb, “If you want to go fast, go alone. If you want to go far, go together.” This journey has been far and enriching, and I could not have done it without each and every one of you. Thank you all.

Table of Contents

Dedication . . . . . . . . ii
Acknowledgments . . . . . . . . iii
Table of Contents . . . . . . . . v
List of Tables . . . . . . . . viii
List of Figures . . . . . . . . ix
List of Abbreviations . . . . . . . . xi

Chapter 1: Introduction . . . . . . . . 1
1.1 Motivation . . . . . . . . 1
1.2 Related Works . . . . . . . . 3
1.2.1 Floor Plan Construction . . . . . . . . 3
1.2.2 Human and Non-Human Motion Identification . . . . . . . . 6
1.2.3 Indoor Intrusion Detection . . . . . . . . 8
1.3 Dissertation Outline and Contributions . . . . . . . . 10
1.3.1 Automatic Floor Plan Construction (Chapter 2) . . . . . . . . 10
1.3.2 Human and Non-human Motion Discrimination (Chapter 3) . . . . . . . . 11
1.3.3 Deep Learning for Human and Non-human Motion Identification (Chapter 4) . . . . . . . . 12
1.3.4 Through-the-wall Indoor Intrusion Detection (Chapter 5) . . . . . . . .
12
1.4 Preliminaries on Wireless Sensing . . . . . . . . 13
1.4.1 Received Signal Strength Indicator . . . . . . . . 14
1.4.2 Channel State Information . . . . . . . . 15

Chapter 2: Automatic Floor Plan Construction . . . . . . . . 17
2.1 System Design . . . . . . . . 19
2.1.1 System Overview . . . . . . . . 19
2.1.2 Trajectory Acquisition . . . . . . . . 20
2.1.3 Trajectory Segmentation . . . . . . . . 21
2.1.4 Segment Matching . . . . . . . . 23
2.1.5 Trajectory Bundling . . . . . . . . 28
2.1.6 Trajectory Fusion and Area Shaping . . . . . . . . 29
2.2 Experiment . . . . . . . . 32
2.2.1 Experimental Setup and Data Collection . . . . . . . . 32
2.2.2 Reconstructed Floor Plans . . . . . . . . 33
2.2.3 Performance Evaluation . . . . . . . . 34
2.3 Summary . . . . . . . . 39

Chapter 3: Human and Non-human Motion Discrimination . . . . . . . . 40
3.1 Overview . . . . . . . . 42
3.1.1 Challenges . . . . . . . . 42
3.1.2 Insight . . . . . . . . 43
3.1.3 System Overview . . . . . . . . 44
3.2 Motion Detection and Speed Estimation . . . . . . . . 45
3.2.1 Preliminary . . . . . . . . 45
3.2.2 Motion Detection and Speed Estimation . . . . . . . .
47
3.3 Feature Extraction and Motion Recognition . . . . . . . . 49
3.3.1 Feature Extraction . . . . . . . . 49
3.3.2 Recognition Model Design . . . . . . . . 54
3.3.3 State Machine . . . . . . . . 55
3.4 Evaluation . . . . . . . . 56
3.4.1 Methodology . . . . . . . . 56
3.4.2 Recognition Performance . . . . . . . . 60
3.5 Discussion . . . . . . . . 63
3.5.1 Effectiveness of State Machine . . . . . . . . 63
3.5.2 Latency . . . . . . . . 64
3.5.3 Feature Efficiency . . . . . . . . 65
3.5.4 Length of Motion Segments . . . . . . . . 65
3.5.5 Sounding Rate . . . . . . . . 66
3.5.6 Edge Device Deployment . . . . . . . . 67
3.6 Summary . . . . . . . . 67

Chapter 4: Deep Learning for Human and Non-human Motion Identification . . . . . . . . 69
4.1 WiFi Signal Preprocessing . . . . . . . . 71
4.1.1 Environment-independent Statistic Extraction . . . . . . . . 72
4.1.2 Motion Detection and Segmentation . . . . . . . . 75
4.1.3 Input Preparation . . . . . . . . 75
4.2 Deep Learning for Human and Non-human Recognition . . . . . . . . 77
4.2.1 Forward Neural Network . . . . . . . . 78
4.2.2 Convolutional Neural Network . . . . . . . .
78
4.2.3 Recurrent Neural Network . . . . . . . . 79
4.2.4 Transformer . . . . . . . . 80
4.2.5 Transfer from Pre-trained Models . . . . . . . . 81
4.3 Experimental Details . . . . . . . . 82
4.3.1 Hardware . . . . . . . . 82
4.3.2 Experimental Environment . . . . . . . . 83
4.3.3 Dataset and Metrics . . . . . . . . 83
4.3.4 Implementation Details . . . . . . . . 85
4.4 Evaluation . . . . . . . . 86
4.4.1 Classification Performance Evaluation . . . . . . . . 86
4.4.2 Recognition in Unseen Environments for Seen Subjects . . . . . . . . 87
4.4.3 Recognition in Unseen Environment for Unseen Subjects . . . . . . . . 88
4.4.4 Recognition in Unseen Environment for Coexisting Multiple Subjects . . . . . . . . 89
4.4.5 Performance of Transfer Learning . . . . . . . . 90
4.4.6 Convergence . . . . . . . . 91
4.4.7 Window Length . . . . . . . . 93
4.4.8 Sounding Rate . . . . . . . . 94
4.4.9 Computational Complexity and Model Parameter . . . . . . . . 95
4.5 Summary . . . . . . . . 96

Chapter 5: Through-the-wall Indoor Intrusion Detection . . . . . . . . 97
5.1 Overview . . . . . . . . 99
5.1.1 Challenges . . . . . . . . 99
5.1.2 WiFi-based Intrusion Detection Framework . . . . . . . . 100
5.2 WiFi Signal Preprocessing . . . . . . . .
. . . . . . . . . . . . . . . . . . 101 5.2.1 A-ACF Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.2.2 Motion Detection and Segmentation . . . . . . . . . . . . . . . . . . . . 103 5.3 Intrusion Detection Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 5.3.1 Motion Recognition Module . . . . . . . . . . . . . . . . . . . . . . . . 103 5.3.2 Intrusion Detection Module . . . . . . . . . . . . . . . . . . . . . . . . . 105 5.4 Implementation and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.4.1 Hardware and Experimental Environment . . . . . . . . . . . . . . . . . 107 5.4.2 Dataset and Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.4.3 Evaluation on Classification Performance . . . . . . . . . . . . . . . . . 110 5.4.4 Evaluation on Intrusion Detection Performance . . . . . . . . . . . . . . 111 5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.5.1 Effectiveness of Intrusion Detection Module . . . . . . . . . . . . . . . . 114 5.5.2 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.5.3 Computational Complexity and Memory Requirements . . . . . . . . . . 116 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Chapter 6: Conclusion and Future Work 118 6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Bibliography 122 vii List of Tables 2.1 Evaluation results of hallway shape . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.2 Evaluation results of construction efficiency . . . . . . . . . . . . . . . . . . . . 39 3.1 Human and non-human motion identification performance . . . . . . . . . . . . 60 3.2 Adaptivity to other non-human subjects . . . . . . . . . . . . . . . . . . . . . . 
62
3.3 State machine performance evaluation . . . . . . . . 64
3.4 Recognition performance with feature selection . . . . . . . . 65
3.5 Performance at different sounding rates . . . . . . . . 67
4.1 Summary of dataset . . . . . . . . 85
4.2 Classification accuracy of deep learning models for human and non-human motion identification . . . . . . . . 86
4.3 Classification accuracy evaluation on transfer learning . . . . . . . . 91
4.4 Computational resource requirements and model size . . . . . . . . 96
5.1 Summary of dataset . . . . . . . . 109
5.2 Architecture of Wi-IntruNet . . . . . . . . 111
5.3 Classification accuracy of MRM for human and non-human motion identification . . . . . . . . 112
5.4 Evaluation of Wi-IntruNet in unseen environments with seen, unseen and coexisting subjects . . . . . . . . 113
5.5 Computational complexity and memory requirements . . . . . . . . 116

List of Figures

2.1 System overview. . . . . . . . . 19
2.2 Crowdsourcing trajectory collection devices. . . . . . . . . 20
2.3 Atomic segments. . . . . . . . . 22
2.4 An example of the matching process. . . . . . . . . 24
2.5 Time series of MFS and RSSI on two paths. . . . . . . . . 26
2.6 Bundling process. . . . . . . . . 29
2.7 Trajectory fusion process. . . . . . . . . 30
2.8 Curved trajectory positioning process. . . . . . . .
32
2.9 Ground truth floor plans of (a) office, (b) home, and (c) campus court. . . . . . . . . 33
2.10 Construction result of Scenario I. (a) Trajectories bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan. . . . . . . . . 34
2.11 Construction result of Scenario II. (a) Trajectories bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan. . . . . . . . . 34
2.12 Construction result of Scenario III. (a) Trajectories bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan. . . . . . . . . 35
2.13 Reconstructed hallway plan results of Scenario I using different numbers of trajectories. (a) Result of using 24 trajectories. (b) Result of using 35 trajectories. (c) Result of using 46 trajectories. (d) Result of using 52 trajectories. (e) Result of using 61 trajectories. . . . . . . . . 38
3.1 An illustration of non-human motion interference with sensing systems. . . . . . . . . 41
3.2 An illustration of moving patterns of human, dog and cleaning robot. . . . . . . . . 44
3.3 System overview. . . . . . . . . 45
3.4 CSI, ACF, motion statistics and speed estimation for (a) human, (b) pet, and (c) cleaning robot. . . . . . . . . 46
3.5 Histograms of (a) motion statistics and (b) speed estimations for human and pet. . . . . . . . . 49
3.6 Example of human stride cycle. . . . . . . . . 50
3.7 Stride cycle time and stride length of human and pet. . . . . . . . . 52
3.8 (a) 25th percentile and (b) 75th percentile of speed estimations for human, robot, pet. . . . . . . .
53
3.9 (a) ACF at a time instance, (b) average ACF peak value, and (c) average ACF valley values of human, robot, pet. . . . . . . . . 54
3.10 Floor plans of (a) Scenario I: an apartment, (b) Scenario II: a townhouse, (c) Scenario III: a single family house, and (d) Scenario IV: an office building. Multiple pairs of transceivers are adopted to enhance data collection efficiency and data diversity, but only data from a single transceiver pair is used for feature extraction and recognition. . . . . . . . . 57
3.11 (a) Off-the-shelf WiFi device, (b) pet in Scenario I. . . . . . . . . 58
3.12 Adaptivity to human motion types. . . . . . . . . 63
3.13 Impact of length of motion segments. . . . . . . . . 66
3.14 CPU and memory requirements. . . . . . . . . 68
4.1 CSI of (a) human motion in Environment A, (b) pet motion in Environment A, (c) cleaning robot motion in Environment A, (d) human motion in Environment B, (e) pet motion in Environment B, and (f) cleaning robot motion in Environment B. . . . . . . . . 72
4.2 A-ACF of (a) human motion in Environment A, (b) pet motion in Environment A, (c) cleaning robot motion in Environment A, (d) human motion in Environment B, (e) pet motion in Environment B, and (f) cleaning robot motion in Environment B. . . . . . . . . 74
4.3 Motion detection and A-ACF segmentation with motion statistics. . . . . . . . . 76
4.4 Illustration of data input preparation. . . . . . . . . 77
4.5 (a) Tx, and (b) Rx in Scenario III. . . . . . . .
82
4.6 Floor plan of (a) Scenario I, an apartment, (b) Scenario II, a townhouse, (c) Scenario III, a single family house, (d) Scenario IV, a single family house, and (e) Scenario V, an office building. . . . . . . . . 84
4.7 Recognition accuracy in unseen environment with familiar subjects. . . . . . . . . 88
4.8 Recognition accuracy in unseen environment with unseen subjects. . . . . . . . . 89
4.9 Recognition accuracy in unseen environment with multiple subjects coexisting. . . . . . . . . 90
4.10 Training losses of deep learning models over 100 epochs. . . . . . . . . 91
4.11 Validation losses of deep learning models over 100 epochs. . . . . . . . . 92
4.12 Training procedures of deep learning models regarding training accuracy. . . . . . . . . 93
4.13 Evaluation on impact of motion segment length. . . . . . . . . 94
4.14 Evaluation on impact of sounding rate. . . . . . . . . 95
5.1 Overview of Wi-IntruNet. . . . . . . . . 101
5.2 An illustration of LSTM. . . . . . . . . 105
5.3 (a) Tx, and (b) Rx in Scenario III. . . . . . . . . 107
5.4 Floor plan of (a) Scenario I, an apartment, (b) Scenario II, a townhouse, (c) Scenario III, a single family house, (d) Scenario IV, a single family house, and (e) Scenario V, an office building. . . . . . . . . 108
5.5 Effectiveness evaluation of IDM. . . . . . . .
115

List of Abbreviations

A-ACF Amplified Autocorrelation Function
ACF Autocorrelation Function
AoA Angle of Arrival
AP Access Point
CDF Cumulative Distribution Function
CFO Channel Frequency Offset
CFR Channel Frequency Response
CIR Channel Impulse Response
CNN Convolutional Neural Network
CPU Central Processing Unit
CSI Channel State Information
DTW Dynamic Time Warping
DFS Doppler Frequency Shift
EM Electromagnetic
FMCW Frequency Modulated Continuous Wave
FNN Forward Neural Network
FFT Fast Fourier Transform
FPR False Positive Rate
GHz Gigahertz
GPU Graphics Processing Unit
GRUNet Gated Recurrent Unit Network
HMM Hidden Markov Model
IDM Intrusion Detection Module
IoT Internet of Things
IMU Inertial Measurement Unit
LBS Location-Based Services
LOS Line of Sight
LSTM Long Short-Term Memory
MFS Magnetic Field Strength
MHz Megahertz
MIMO Multiple Input Multiple Output
MLP Multilayer Perceptron
mmWave Millimeter-Wave
MRC Maximum Ratio Combining
MRM Motion Recognition Module
NLOS Non Line of Sight
NMI Normalized Mutual Information
PCA Principal Component Analysis
PDR Pedestrian Dead Reckoning
RAM Random-Access Memory
RF Radio Frequency
RNN Recurrent Neural Network
RSS Received Signal Strength
RSSI Received Signal Strength Indicator
Rx Receiver
SFO Sampling Frequency Offset
SLAM Simultaneous Localization and Mapping
SNR Signal-to-Noise Ratio
STFT Short-Time Fourier Transform
STO Symbol Timing Offset
SVM Support-Vector Machines
ToA Time of Arrival
ToF Time of Flight
TPR True Positive Rate
TRRS Time Reversal Resonance Strength
Tx Transmitter
UWB Ultra-Wide Band
ViT Vision Transformer

Chapter 1: Introduction

1.1 Motivation

Environmental perception is integral to the effectiveness of intelligent systems. It enables these systems to understand, map, and monitor their surroundings, which is crucial for their proper functioning and optimization.
Mapping the environment allows a system to accurately determine its position and plan routes, a fundamental aspect for applications like autonomous vehicles and robotics. Similarly, monitoring enables a system to detect changes or anomalies in the environment, key for predictive maintenance, security, or hazard detection. In essence, the ability to perceive the environment underpins the core functionalities of intelligent systems, enhancing their effectiveness, safety, and efficiency. WiFi signals are ubiquitous in our modern digital world, providing wireless connectivity for a plethora of devices ranging from smartphones to smart home appliances. These signals, which invisibly traverse our environment, are a cornerstone of wireless communication and have opened up a new avenue of WiFi sensing. By detecting alterations in WiFi signal characteristics due to interactions with objects or individuals, WiFi sensing provides a new layer of environmental awareness. It offers numerous advantages, including: 1) non-intrusive and privacy-preserving sensing, 2) lower costs and energy consumption by leveraging existing infrastructure, 3) the ability to penetrate walls and other opaque objects for hard-to-reach sensing areas, 4) high scalability due to the ubiquity of WiFi, and 5) resilience to lighting conditions for continuous operation. Motivated by the compelling benefits of WiFi-based sensing and the critical need to augment the perceptual capabilities of intelligent systems, this dissertation explores the innovative use of ubiquitous WiFi signals for environmental perception enhancement through motion analytics, with a specific focus on mapping and monitoring. We present an advanced suite of solutions that includes an automated algorithm for floor plan construction, two distinctive approaches for identifying human and non-human subjects, and a robust indoor intrusion detection system.
By pushing the frontiers of WiFi sensing, this research marks a significant advancement in the realm of intelligent systems, fostering their evolution towards heightened perceptual precision and superior situational awareness.

• Automatic floor plan construction is a vital component for intelligent systems. Providing a bird’s eye view of an environment, floor plans are crucial for navigation, localization, and planning. For systems such as autonomous vehicles and mobile robots, they enable efficient path planning and collision avoidance. Furthermore, they assist in resource allocation for indoor positioning systems, improving services like indoor navigation and emergency response. Therefore, the development of automated floor plan construction techniques significantly enhances the perceptual capabilities and operational efficiency of intelligent systems.

• Human and non-human motion identification is a critical capability for intelligent systems, particularly for applications in security, automation, and user interaction. By distinguishing between human and non-human entities, systems can perform tasks more efficiently and accurately, ensuring proper responses based on the detected subject type. Given the prevalence of pets, robotic vacuum cleaners, and electrical appliances, especially in residential environments, it is crucial to develop a reliable system that can accurately recognize human and non-human subjects.

• Indoor intrusion detection is integral for maintaining safety and security in various environments. An advanced intrusion detection system can provide real-time alerts and minimize the false alarms caused by non-human subjects. In the context of intelligent systems, the incorporation of robust intrusion detection can significantly elevate their utility and effectiveness, making them indispensable tools for proactive security management.
By enhancing the ability of these systems to detect intrusions and mitigating the interference from non-human motion, we can ensure a higher level of security and peace of mind for individuals and organizations alike.

1.2 Related Works

The related works of this dissertation cover floor plan construction, human and non-human motion identification, and indoor intrusion detection, which are reviewed in the following subsections, respectively.

1.2.1 Floor Plan Construction

Although projects like Google Indoor Maps, Point Inside, and Micello Indoor Map aim at collecting indoor maps, the availability of indoor maps is still very limited considering the huge number of buildings worldwide. Furthermore, most floor plans in these projects are manually generated and uploaded. The traditional manual methods require professional technicians to draw the floor plan using specialized measurement devices, which is time-consuming, costly, and thus unaffordable to cover all buildings. In addition, potential changes in the indoor environment would require much effort to update the maps. Efforts have been taken to generate indoor maps automatically. Most indoor map construction approaches are based on Simultaneous Localization and Mapping (SLAM), which is widely used in robotics to locate robots in unknown scenarios by mapping the environment simultaneously. Among numerous SLAM systems, many are based on vision [1–5]. These systems can construct 3D indoor maps by modeling the environment through images obtained by cameras. However, cameras have a high requirement for light, making these systems unable to work in environments with poor lighting. Besides, vision-based SLAM systems raise many privacy concerns. Lasers are commonly used to combat the concerns of vision-based systems. However, laser-based systems are costly and need special equipment.
An Inertial Measurement Unit (IMU) is free of these drawbacks, but it suffers from odometry deviation, especially over long distances. The Pedestrian Dead Reckoning (PDR) algorithm is often employed to estimate trajectories from inertial sensor data [6–14]. However, trajectories estimated by PDR from inertial sensors usually suffer from large deviations, especially in direction. Thus, when these trajectories are employed for floor plan construction, several methods have been proposed to correct the deviation errors caused by dead reckoning. In CrowdInside [7] and SenseWit [6], a large number of anchor points, such as water dispensers and doors, are utilized to overcome the drift of the IMU. However, in many buildings, these anchor points are too sparse to correct the IMU errors. Thus, the accuracy and generalization performance of their methods would decrease in buildings without sufficient anchor points. Furthermore, to construct a map from erroneous trajectories, inertial sensor-based systems require a vast amount of data to estimate the reachable area in the environment. Different from current works based on PDR using inertial sensors, EZMap requires neither anchor points nor a large number of trajectories. We propose a hierarchical matching scheme to accurately match trajectories at the exact location by considering both the geometric shape estimated from RF tracking and ambient properties such as the RSSI and the MFS. Different approaches have been developed to generate maps from crowdsourced data, which can be classified into two categories: grid-based and topological-based. For grid-based map construction, a Gaussian distribution is often used to compute the accessibility probability of each cell to create the occupancy grid map. Grids in the matrix are described by values between the two states of 0 and 1, where 0 represents that the grid is accessible and 1 represents that objects fully occupy the grid.
The value between 0 and 1 represents the percentage of the grid area that an object occupies [8, 9]. These approaches are easy to build and maintain but require enormous storage space and are time-consuming [15]. Since the topological-based approach is more efficient than the grid-based approach, it has attracted more attention from researchers. Walkie-Markie [16] applies the spring relaxation concept, treating the encountered WiFi landmarks as nodes and the user trajectories as springs. As more user trajectories are collected, the node positions are adjusted so as to achieve the minimum system potential energy. CrowdInside [7] uses identifiable acceleration patterns of elevators, escalators, and stairs to correct dead reckoning and hence construct the map. This approach fails when the users mainly stay on the same floor, e.g., an exhibition in a one-storey hall or a single-floor office such as that in our experiments. In [17], users have to hold the Arduino sensor and wait for the servomotor to sweep the surroundings. [18] requires the users to walk in a straight line, and the corridors should be perpendicular to each other, which means it cannot be applied to structures with curved paths. A preliminary version of this work has been published in [19] based on crowdsourcing with carts. In this work, we improve upon it by tackling curved paths and providing detailed information about the environment, such as open spaces and rooms. In addition, existing approaches mainly crowdsource data from a large number of mobile users and target public spaces like malls and office buildings. Small private spaces like homes are less explored. Differently, EZMap employs the increasingly popular home robots for data collection and extends the scope to home environments.
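As a concrete illustration of the grid-based representation discussed above, here is a minimal sketch, not taken from any of the cited systems: each cell of an occupancy grid holds a value in [0, 1], where 0 means fully accessible, 1 means fully occupied, and intermediate values reflect partial occupancy. The simple counting scheme below is an assumption for illustration.

```python
import numpy as np

def occupancy_grid(visits, hits):
    """Estimate per-cell occupancy as the fraction of observations in
    which the cell was found blocked (hits / visits); unvisited cells
    default to 0 (assumed accessible)."""
    occ = np.zeros(visits.shape, dtype=float)
    mask = visits > 0
    occ[mask] = hits[mask] / visits[mask]
    return occ

visits = np.array([[4, 4], [2, 0]])  # observation counts per cell
hits = np.array([[0, 4], [1, 0]])    # how often each cell was blocked
grid = occupancy_grid(visits, hits)
# cell (0,0): never blocked        -> 0.0 (accessible)
# cell (0,1): always blocked       -> 1.0 (occupied)
# cell (1,0): blocked half the time -> 0.5 (partially occupied)
```

The storage cost scales with the number of cells regardless of how much of the space is actually interesting, which is exactly the drawback of grid-based maps noted above.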
1.2.2 Human and Non-Human Motion Identification

While numerous indoor activity recognition and monitoring methods have been proposed over the years, only a few have taken into account the impact of non-human motion. Some researchers have attempted to filter out pet motion from human motion due to its potential interference with their systems. In this section, we review the current methods used to differentiate pets from humans, focusing on non-RF-based and RF-based methods. Non-RF-based methods distinguishing pet motion from human motion mainly rely on heat maps obtained from thermal sensors [20] or image/face recognition techniques using cameras [21]. These methods have limited coverage, are often restricted to Line-of-Sight (LOS) conditions, and camera-based methods further introduce privacy concerns [22]. As for RF-based methods, recent attempts to differentiate between humans and pets have employed radar [23] and ultra-wideband (UWB) [24], focusing on analyzing biometric indicators such as heartbeat and respiration. However, these methods require the pet to remain stationary during recognition and impose constraints on its spatial position and distance from the device, thereby rendering them impractical and ineffective for identifying moving subjects. While [25] enables the recognition of canine movement distinct from human movement through radar, it demands that the device be installed on the ceiling and restricts the movement of humans and pets to a limited area under the device. Furthermore, the method presented in [26] can differentiate fan motion from human respiration, yet it remains inapplicable to human activities like walking and running. WiFi-based sensing systems are attractive for indoor activity recognition and monitoring due to their ubiquity and low cost [27, 28]. Numerous WiFi-based systems have been designed to monitor human motion and the environment.
WiFi-based motion detection systems [29, 30] detect the occurrence of human motion within an environment, while WiFi-based activity recognition methods [31–35] attempt to recognize human daily activities like running, walking, and sitting. Identification systems like [36–39] recognize human identity through gait or other biometric features extracted from WiFi signals. In addition, mapping and tracking systems that leverage WiFi [40–43] plot human movement paths and generate floor maps. Nonetheless, these systems fail to account for the influence of non-human entities, like pets and vacuum robots, on their operational accuracy. These non-human subjects, capable of affecting WiFi signals similarly to humans, introduce a potential source of interference. Consequently, these systems' reliability for indoor monitoring and recognition can be compromised, particularly when non-human subjects are present in the environment. Some researchers have proposed to detect intrusion using WiFi. Among the proposed systems, PetFree [44] and M-WiFi [45] proposed methods to reduce false alarms caused by pets. For example, PetFree distinguishes pets from humans by mapping the subject's effective interference height (EIH) to CSI measurements. This method, however, requires the transmitter and receiver to be positioned at the same height, and it operates solely under LOS conditions. Moreover, its performance can be affected by the multipath effect in environments with complex layouts. M-WiFi differentiates between pets and humans by examining the RF disturbance. However, this system encounters issues when a pet moves close to the wireless link. Such movement can induce signal fading akin to that of a person moving farther from the wireless link, resulting in a significant number of false alarms, regardless of differences in RF disturbance. Unlike the existing RF-based approaches, we place no restriction on pet movement, device placement, or environment complexity.
Notably, most existing works only consider the interference of pets, and few tackle the motion of indoor robots and household appliances. In this dissertation, we systematically differentiate various typical non-human motions from human motions.

1.2.3 Indoor Intrusion Detection

Human detection systems not based on Radio Frequency (RF) primarily utilize technologies such as video cameras [21, 46], sound [47, 48], and infrared sensors [49, 50]. Though camera- and sound-based detection systems can effectively identify intruders in an environment, their practicality as ideal intrusion detection systems is marred by privacy concerns. Furthermore, camera-based detection devices demand good environmental lighting conditions and struggle under Non-Line-of-Sight (NLOS) scenarios. Infrared intrusion detection devices, due to their relatively better privacy protection and cost-effectiveness, are quite popular. However, these devices come with their own set of limitations. They require precise positioning, offer smaller coverage areas, and fail under NLOS conditions. Most importantly, their detection accuracy can be influenced by environmental temperature, and they lack the ability to distinguish between human and non-human targets like pets. As for RF-based intrusion detection, there has recently been an uptick in indoor intrusion detection projects using devices like millimeter-wave radars. While these devices have superior accuracy and lower false-positive rates compared to infrared-based devices, their high cost and extremely limited coverage area, along with their inability to operate under NLOS conditions, pose significant limitations. As IoT devices and WiFi become increasingly ubiquitous, a surge in the development of WiFi-based sensing systems has been observed [51, 52], thanks to their omnipresence and cost-effectiveness. Of late, numerous studies have attempted to leverage WiFi for human detection [29, 53–55].
However, these systems fail to eliminate false alarms triggered by non-human movements in the environment, caused by pets, cleaning robots, fans, and the like. M-WiFi [45] proposed to distinguish pets from humans based on the difference in RF disturbance, but it failed to reduce false alarm rates, as a pet approaching the wireless link can produce interference similar to that of a human moving away from it. PetFree [44] introduced a method to eliminate pet interference but required all equipment to be placed at the same height, and its performance decreases when applied in complex environments. Moreover, these existing approaches only accounted for pet interference and are incapable of excluding disturbances from other non-human entities like cleaning robots or moving household appliances. Our previous work [56] proposed an SVM-based method to differentiate various non-human targets from humans, offering a more comprehensive way to discern between human and non-human movements in the environment. However, as a classification methodology, this work did not consider the temporal correlation of targets' appearances in the environment. In this dissertation, we leverage a more robust deep learning model, designed specifically for WiFi-based intrusion detection, and combine current target detection results with historical information, enabling an intrusion detection system capable of eliminating interference from various non-human targets in the environment.

1.3 Dissertation Outline and Contributions

In this dissertation, we initially lay out the fundamentals of WiFi sensing in Section 1.4, encompassing the Received Signal Strength Indicator (RSSI) and Channel State Information (CSI). Following this, in Chapter 2, we delineate a method for automatic floor plan construction.
Chapters 3 and 4 respectively detail a Support Vector Machine (SVM) based method for distinguishing human and non-human motion using edge computing, and a framework for human and non-human subject recognition with deep learning methods. Chapter 5 introduces a robust indoor intrusion detection system resilient to non-human movement, underpinned by deep learning. Finally, in Chapter 6, we conclude the dissertation and discuss future work.

1.3.1 Automatic Floor Plan Construction (Chapter 2)

In this chapter, we present EZMap, a high-accuracy, low-cost floor plan construction system that fuses RF and inertial sensing. EZMap combines the fine-grained yet local information from RF tracking with the coarse-grained but global contexts from inertial sensing (e.g., magnetic field strength), which together make for an accurate map. Our system employs a robot for trajectory collection and requires only a single Access Point to be arbitrarily installed in the space, both of which are widely available nowadays. Furthermore, it can generate a map even when only a small amount of data is available, allowing it to scale to different buildings like malls, office buildings, and homes with little cost. We validate the performance using a Dji RoboMaster S1 robot with commodity WiFi in three different buildings. The results show that our system can efficiently generate faithful maps for the targeted areas. With the ubiquity of WiFi infrastructure and the rise of home robots, we believe our approach will pave the way for pervasive indoor map services.

1.3.2 Human and Non-human Motion Discrimination (Chapter 3)

In this chapter, we propose a novel system, Wi-MoID, that passively and unobtrusively distinguishes moving human and various non-human subjects using a single pair of commodity WiFi transceivers, without requiring any device on the subjects or restricting their movements.
Wi-MoID leverages a novel statistical electromagnetic wave theory-based multipath model to detect moving subjects, extracts physically and statistically explainable features of their motion, and accurately differentiates human and various non-human movements through walls, even in complex environments. In addition, Wi-MoID is suitable for edge devices, requiring minimal computing resources and storage, and is environment-independent, making it easy to deploy in new environments with minimum effort. We evaluate the performance of Wi-MoID in four distinct buildings with various moving subjects, including pets, vacuum robots, humans, and fans. The results demonstrate that it achieves 97.34% accuracy and a 1.75% false alarm rate for the identification of human and non-human motion, and 92.61% accuracy in unseen environments without model tuning, demonstrating its robustness for ubiquitous use.

1.3.3 Deep Learning for Human and Non-human Motion Identification (Chapter 4)

In this chapter, we design a deep learning framework to recognize human and non-human subjects with WiFi signals through the wall. Our system extracts environment-independent features from single-link WiFi. We investigate the performance of popular deep neural networks and explore the efficacy of transferring models trained on image datasets to WiFi sensing tasks. We implement our framework and evaluate its performance in five environments with commodity WiFi devices. On a challenging dataset that considers large pets and cases where multiple subjects coexist, our proposed framework achieves an average validation accuracy of 95.84% and an average testing accuracy of 91.71% in unseen environments without further training or parameter tuning. These results underline the robustness of our approach and its readiness for integration into ubiquitous intelligent IoT systems and applications.
1.3.4 Through-the-wall Indoor Intrusion Detection (Chapter 5)

In this chapter, we propose Wi-IntruNet, the first robust through-wall indoor intrusion detection system that mitigates the interference from non-human indoor objects based on widely available WiFi signals. Wi-IntruNet introduces a novel deep learning framework for single-link WiFi signal analysis, employing a ResNet model to extract features of indoor moving targets and an LSTM to incorporate temporal data for efficient intrusion detection. Notably, Wi-IntruNet is invariant to environmental changes, angle, and position, enabling swift deployment in new environments without additional training. We implement and extensively evaluate Wi-IntruNet with commodity WiFi devices in 5 typical indoor environments with various interference from pets, cleaning robots, and fans. The results reveal that Wi-IntruNet achieves an average of 98.91% intrusion detection accuracy and a 2.84% false alarm rate in unseen environments without model tuning, underscoring its robustness and readiness for ubiquitous indoor security applications.

1.4 Preliminaries on Wireless Sensing

Wireless sensing represents a pivotal advancement in the field of remote detection and measurement technologies. It leverages pervasive and invisible radio waves to detect, identify, track, and understand phenomena within their range without requiring a physical connection or proximity. One notable subset of this technology is WiFi sensing, which utilizes the omnipresent WiFi signals as a source of information to enable context-aware applications. WiFi sensing is capable of detecting minute changes in the wireless signals caused by various objects and movements in the environment, providing rich information about the surroundings.
This transformative technology not only offers an innovative way to interact with the environment but also opens up a plethora of opportunities for applications in various sectors such as healthcare, security, home automation, and more. 5 GHz WiFi, part of the dual-band WiFi capabilities present in many modern devices, provides a faster, less congested frequency range compared to its counterpart, 2.4 GHz WiFi. This frequency band not only allows for greater data transmission speeds but also supports a wider range of channels, reducing the potential for interference from other devices. When utilized for WiFi sensing, the 5 GHz spectrum can offer improved performance and accuracy due to these characteristics. WiFi sensing on this band leverages the scattering, reflection, and absorption of the 5 GHz signals by objects and people in the environment to create a rich tapestry of data about the surroundings. This information can be analyzed and interpreted to enable a range of applications, such as detecting movement or occupancy in a room, monitoring vital signs for healthcare, or even identifying gestures for interaction with smart devices. The use of 5 GHz WiFi sensing thus combines the benefits of higher-frequency wireless communication with the growing potential of context-aware technologies.

1.4.1 Received Signal Strength Indicator

The Received Signal Strength Indicator (RSSI) is a metric used in wireless communications to measure the power level received by a device from a source transmitter. In an IEEE 802.11 system, RSSI is the relative received signal strength in a wireless environment. RSSI is an indication of the power level being received by the receiving radio after the antenna and possible cable loss. A higher RSSI value denotes a stronger received signal. This metric is crucial for various applications, including network diagnostics, link quality estimation, and location-based services.
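As a hedged illustration of how RSSI supports such location-based services, the following sketch inverts the standard log-distance path-loss model to turn an RSSI reading into a rough distance estimate. The 1 m reference power `rssi_1m` and the path-loss exponent `n` are assumed calibration constants for illustration, not values from this dissertation.

```python
def rssi_to_distance(rssi_dbm, rssi_1m=-40.0, n=2.0):
    """Invert RSSI = rssi_1m - 10 * n * log10(d) to estimate d in meters.

    rssi_1m: assumed received power at 1 m from the transmitter (dBm).
    n: assumed path-loss exponent (2 in free space, larger indoors).
    """
    return 10.0 ** ((rssi_1m - rssi_dbm) / (10.0 * n))

# 20 dB below the 1 m reference: with n = 2, every 20 dB of extra loss
# corresponds to a tenfold increase in distance.
d = rssi_to_distance(-60.0)  # -> 10.0 m
```

Ranging estimates like this from several access points can then be combined (e.g., by trilateration) for coarse indoor positioning, which is the use of RSSI described in the next paragraph.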
Factors affecting RSSI include the distance from the transmitter and physical obstructions. RSSI plays a vital role in various WiFi sensing applications. One of the primary applications of RSSI in WiFi sensing is indoor positioning. By measuring the RSSI values from multiple access points, a device can estimate its location based on signal strength triangulation. The stronger the signal from a particular access point, the closer the device is likely to be to that access point. Beyond positioning, RSSI also aids in detecting movement, gauging link quality, assisting in network management, and offering insights into human-device interactions. While RSSI offers a singular measure of overall signal strength, indicative of the total received power from a transmitter, there are some cases, such as motion detection, through-wall sensing, and gait recognition, where finer granularity is required to capture the intricate nuances of the wireless environment. This is where Channel State Information (CSI) comes into play, providing detailed information on the channel's response across different frequencies. CSI captures the amplitude and phase of each signal path, allowing for a deeper understanding of environmental dynamics. In scenarios demanding precision and comprehensive insights into the wireless channel, CSI is the preferred choice over RSSI.

1.4.2 Channel State Information

CSI is an intricate matrix containing the properties of the wireless communication channel. It captures comprehensive information regarding the propagation path between each transmit-receive antenna pair, including the path loss, multipath fading, phase shift, and Doppler effect due to relative movement. CSI thus provides a detailed spectral view of the WiFi signal as it traverses through, and interacts with, the complex wireless medium.
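To make the multipath nature of CSI concrete, the following numerical sketch builds a two-path channel frequency response H(t, f) = sum_l a_l(t) exp(-j 2 pi f tau_l(t)) over 64 subcarriers, applies random initial and linear phase distortions, and checks that the channel power response |H(t, f)|^2 is insensitive to those distortions (noise is omitted here for clarity). All amplitudes, delays, and frequencies are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
freqs = np.linspace(5.18e9, 5.20e9, 64)   # 64 subcarriers around 5 GHz
amps = np.array([1.0, 0.4])               # a_l: per-path complex amplitudes
delays = np.array([30e-9, 55e-9])         # tau_l: per-path delays (s)

# Multipath channel frequency response H(t, f) for one time instant
H = (amps[:, None]
     * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])).sum(axis=0)

# Random initial and linear phase distortions (alpha, beta) as in a real
# CFR measurement; the additive noise term is omitted for clarity.
alpha, beta = rng.uniform(0, 2 * np.pi), 1e-11
H_tilde = np.exp(-1j * (alpha + beta * freqs)) * H

# Channel power response: the unknown phase offsets cancel when taking
# the magnitude squared, so G depends only on the channel itself.
G = np.abs(H_tilde) ** 2
assert np.allclose(G, np.abs(H) ** 2)
```

This phase-cancellation property is exactly why power-based quantities derived from the measured CFR are attractive for sensing on commodity hardware, where the raw phase is corrupted by synchronization offsets.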
The CSI is often calculated as the frequency response of the channel, which can be modeled as

H(t, f) = \sum_{l=1}^{L} a_l(t) \exp(-j 2\pi f \tau_l(t)),

where a_l(t) and \tau_l(t) denote the complex amplitude and propagation delay of the l-th multipath component (MPC), respectively, and L stands for the number of MPCs. Due to the timing and frequency synchronization offsets and additive thermal noise, the real measurement of the CFR, \tilde{H}(t, f), is expressed as

\tilde{H}(t, f) = \exp(-j(\alpha(t) + \beta(t) f)) H(t, f) + n(t, f),

where \alpha(t) and \beta(t) are the random initial and linear phase distortions at time t, respectively. Define the channel power response G(t, f) as the squared magnitude of \tilde{H}(t, f):

G(t, f) \triangleq |\tilde{H}(t, f)|^2 = |H(t, f)|^2 + 2\,\mathrm{Re}\{n^*(t, f) H(t, f) \exp(-j(\alpha(t) + \beta(t) f))\} + |n(t, f)|^2 \triangleq |H(t, f)|^2 + \varepsilon(t, f), (1.1)

where the superscript * denotes the complex conjugate operator, \mathrm{Re} denotes the real part of a complex number, and n(t, f) is the noise term, which can be approximated as additive white Gaussian noise (AWGN) with variance \sigma_n^2 and is statistically independent of H(t, f) [57].

Chapter 2: Automatic Floor Plan Construction

Location-Based Services (LBS), such as navigation, location-based searching, and social network services, are gaining increasing popularity thanks to the ever-growing presence of smart devices with built-in location systems and maps. However, most of these services are only available in outdoor environments. One of the most critical constraints on the development of indoor LBS is the lack of digital indoor maps [58]. Recent advances in RF-based tracking enable an opportunity for a breakthrough [59–62]. Notably, a recent work, RIM [63], an RF-based Inertial Measurement system, has enabled centimeter-level distance tracking on commodity WiFi in both Line-of-Sight (LOS) and Non-LOS (NLOS) conditions with only a single Access Point.
RIM allows us to collect high-accuracy trajectories with commodity WiFi clients (e.g., home robots) and further construct precise floor plans from the gathered trajectories. There are multiple advantages if we could leverage this opportunity for floor plan construction. First, it only uses commodity WiFi, notably without expensive hardware like lidar or privacy-intrusive modules like cameras. Second, it promises high accuracy over broad coverage with only one AP, including both LOS and NLOS areas throughout the space covered by the WiFi signals [64]. Third, different from previous works that assume many mobile users contributing large amounts of data, we could achieve high-quality construction with a small amount of data efficiently collected by a single home robot. However, leveraging RF-based tracking like RIM for high-precision floor plan construction entails great challenges. First, RIM only provides distance estimation (using a linear array available on most devices). To recover a trajectory, we still need the direction information, for which we can only utilize the inaccurate inertial sensors. Second, despite its high moving-distance estimation accuracy, RF-based tracking lacks global reference information, a necessity for connecting different trajectories and constructing a floor plan. And it becomes very difficult to obtain reliable global reference information when there is only a single AP in the environment [65]. Third, given the imperfect information, it is non-trivial to fuse trajectories collected by a robot to recover a floor plan. In this chapter, we overcome the above challenges and propose EZMap, a novel system to boost the automatic construction of indoor floor plans. EZMap employs a commodity home robot equipped with WiFi and inertial sensors to roam around the environment and collect data.
It requires only a single AP and can map not only public spaces such as malls and office buildings but also home environments, which are barely explored by existing works. The reconstructed map represents the hallway structure, room layout, as well as their sizes (i.e., corridor widths and room sizes). This chapter is organized as follows. Section 2.1 elaborates on the system design and detailed solutions. Section 2.2.3 shows the implementation, experiments, and evaluation results. In Section 2.3, we conclude this chapter.

[Figure 2.1: System overview. Pipeline: acquisition, trajectory segmentation, segment matching (geometric constraints and ambient properties), trajectory bundling, and trajectory fusing and shaping, yielding the constructed floor plan.]

2.1 System Design

2.1.1 System Overview

In this part, we present an overview of the EZMap system, as Fig. 2.1 shows. The traces can be collected by humans or robots with one AP deployed in the environment. The traces contain the distance information obtained by RIM and direction information derived from inertial sensors. The RSSI of the AP and the magnetic field strength (MFS) are also recorded. Then, the collected trajectories, which could be of arbitrary lengths, are divided into short segments, named atomic segments, to overcome the potential errors accumulated in orientation while leveraging the accurate distance estimation (Trajectory Segmentation). The atomic segments are then clustered by their intrinsic geometric constraints as well as the accompanying time series of RSSI and MFS (Segment Matching). The clustered segments are then positioned by bundling the long trajectories and thereby inferring their relative positions (Trajectory Bundling). Finally, all trajectories are fused, taking their original shapes into account, to output the reconstructed map with corridor widths and room sizes (Trajectory Fusion and Shaping).
[Figure 2.2: Crowdsourcing trajectory collection devices, each carrying the tracking device: (a) cart; (b) Dji robot.]

2.1.2 Trajectory Acquisition

The crowdsourced trajectory tracking is based on RIM [63] for distance estimation and inertial sensors for orientation estimation. RIM is good at estimating the moving distance of wheeled platforms and robots [40]. Leveraging multipath profiles as virtual antennas with a super-resolution virtual antenna alignment algorithm, RIM achieves centimeter accuracy in moving distance over a multipath-rich area. RIM needs only a single AP and captures Channel State Information (CSI) from data packets for its moving distance calculation. EZMap relies on RIM's algorithm to estimate the moving distance with high accuracy while using inertial sensors to measure the turning angles and heading information, which together shape the geometric properties. The device setup can be found in Fig. 2.2, where we can use a pushing cart (Fig. 2.2a) or a Dji robot (Fig. 2.2b). In both cases, the tracking device is customized hardware with a commodity WiFi chipset and an IMU on it. We use customized hardware because CSI extraction is not yet ready on the Dji robot. EZMap itself is a software solution making no changes to hardware.

2.1.3 Trajectory Segmentation

The acquired trajectories come in various lengths, from meters to tens of meters or even longer. EZMap prefers long trajectories in general, as they likely cover and connect wider areas and thus contain more information to be uniquely identified. However, long trajectories suffer from large accumulative errors, particularly in orientation. To overcome this issue, we propose to decompose all collected trajectories into short pieces in a novel form of atomic segments that better preserve the accurate geometric shape information. We divide each long trajectory into structures that consist of a straight segment with two turns at the two ends, as shown in Fig. 2.3(a).
We name such a structure an atomic segment and use it as the basic unit of trajectories in EZMap. Each atomic segment contains the distance information of the path section and the angle information of the two turns at its ends. Such atomization can be accomplished by simple turn detection based on inertial sensor readings. In EZMap, we calculate the change rate of the angle (the first-order derivative of angle with respect to time) from accelerometer and gyroscope data to detect turns. The procedure of trajectory segmentation is shown in Fig. 2.3(b). The design of the atomic segment is inspired by the following insights. First, the atomic segments disconnect the original trajectories at each turn, because orientation errors easily accumulate during a turn (as seen in the example in Fig. 2.3(c)). As a result, atomization preserves the accurate distance information while resetting significant orientation errors before they can accumulate over multiple turns. Second, the two turns are kept as parts of each atomic segment because they provide more information to cluster the segments that have similar lengths.

[Figure 2.3: Atomic segments. (a) Example atomic segments: a path section bounded by a 1st and a 2nd turn. (b) Procedure of trajectory segmentation: long trajectory, turn detection, segmentation, atomic segments. (c) Direction estimation errors accumulated at turns.]

Taking the segments in Fig. 2.3(a) as an example, two atomic segments, ending with a left turn and a right turn respectively, can
be separated even if they have the same length. Third, considering real-world building layouts, such atomic segments can be frequently obtained. And as will be demonstrated later, the global location information (i.e., the RSSI and MFS time series) associated with such segments suffices to uniquely cluster and reconnect them, by jointly considering the precise geometric shape of the segments. Note that some trajectories will not be segmented by the atomization. For example, if the target circles around and moves along a curved pathway, the resultant trajectory would be continuously "turning" and will never be cut. These curved segments are still useful, as they possibly provide information about some open spaces and/or rooms, and will be handled separately in area shaping, as in Section 2.1.6.2.

2.1.4 Segment Matching

To generate an accurate floor plan with tracking traces, we need to accurately estimate the relative position of each trajectory. However, this is very challenging because we have neither a global reference nor the start point of each trajectory. As discussed in the previous section, we divide the long trajectories into atomic segments and resort to determining the relative position of each atomic segment. To accurately predict the relative position of each segment, we take two steps. We first cluster the atomic segments belonging to the same location. Then the atomic segments in the same cluster are positioned and bundled by inferring their relative positions from the long trajectories. In this section, we focus on the first step, that is, identifying the segments at the same location and distinguishing those at different locations. We present a trajectory matching algorithm that recognizes the segments belonging to the same location and distinguishes those belonging to different locations by their geometric shapes and the time series of RSSI and MFS. The matching process is shown in Fig. 2.4.
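Looking back at the atomization step in Section 2.1.3, turn-detection-based segmentation can be sketched as follows. The heading-rate threshold and the convention of cutting trajectories exactly at turn samples are illustrative simplifications (EZMap additionally keeps the two turns as part of each atomic segment), not the system's actual parameters.

```python
import numpy as np

def segment_by_turns(headings_deg, rate_thresh_deg=20.0):
    """Cut a trajectory into straight sections wherever the heading's
    per-sample rate of change exceeds rate_thresh_deg (a turn).
    Returns (start, end) index pairs of the straight sections."""
    rate = np.abs(np.diff(headings_deg))   # heading change per sample
    turning = rate > rate_thresh_deg
    segments, start = [], 0
    for i, is_turn in enumerate(turning):
        if is_turn:
            if i > start:                  # close the section before the turn
                segments.append((start, i))
            start = i + 1                  # next section begins after the turn
    if start < len(headings_deg) - 1:      # trailing straight section
        segments.append((start, len(headings_deg) - 1))
    return segments

# A straight run, a 90-degree turn over one sample, then another straight run
headings = np.array([0, 0, 0, 90, 90, 90, 90], dtype=float)
print(segment_by_turns(headings))  # -> [(0, 2), (3, 6)]
```

Cutting at turns is precisely what bounds the accumulated orientation error within each atomic segment while leaving the accurate distance estimates intact.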
[Figure 2.4: An example of the matching process, combining ambient properties and geometric constraints.]

2.1.4.1 Geometric Constraints

With the unique structure that we designed for segments, each atomic segment has a geometric shape that contains the angle information of the two turns and the distance information of the path section in between. Atomic segments related to the same path show similar geometric shapes. The geometric constraints are designed to group segments with similar shapes based on this geometric information. Examples of atomic segments belonging to the same path are shown in Fig. 2.3(a). The geometric constraints have two parts: a distance constraint and an angle constraint. For the distance constraint, based on experimental observations and the distance tracking accuracy of the tracking client, we limit the distance difference between segments on the same path to no more than 3 m. In other words, if the difference between the estimated distances of two path sections is greater than 3 m, the two segments are unlikely to be at the same location. For the angle constraint, allowing for certain angle errors, the angle difference between the same turn is limited to no more than 30 degrees. We choose 30 degrees because it tolerates the orientation estimation error of the IMU without matching the trajectories of different turns into one category. If the constraint were too strict, segments of the same path might not be matched together due to the orientation deviations caused by the IMU; if it were too loose, segments with different turning angles would be mistakenly matched. Note that atomic segments from the same location but with different turning directions (left vs. right) would not be clustered together by this method.
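As a sketch, the two constraints (3 m distance difference, 30-degree angle difference) can be checked as follows; the dictionary representation of a segment is a hypothetical simplification for illustration.

```python
def geometry_compatible(seg_a, seg_b, d_max=3.0, a_max=30.0):
    """Check the distance and angle constraints between two atomic segments.

    Each segment is a dict with 'length' (path-section length in metres)
    and 'turns' (signed angles of the first and second turns in degrees).
    d_max = 3 m and a_max = 30 degrees follow the thresholds in the text.
    """
    # Distance constraint: path-section lengths must be close.
    if abs(seg_a['length'] - seg_b['length']) > d_max:
        return False
    # Angle constraint: both turns must agree within a_max degrees.
    return all(abs(ta - tb) <= a_max
               for ta, tb in zip(seg_a['turns'], seg_b['turns']))
```

With signed angles, a left turn and a right turn differ by far more than 30 degrees, so segments with opposite turning directions fail the check, as the text describes.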
However, this does not affect the final reconstructed map, because such segments are recognized and merged together by the trajectory fusion described later in Section 2.1.6.1. To verify the effectiveness of the geometric constraints, we conduct a comparative experiment in which over 200 atomic segments are used to compare the matching accuracy with and without geometric constraints. The experimental results validate that the geometric constraints effectively improve the accuracy of segment matching, as will be shown in Section 2.2.3.1.

2.1.4.2 Ambient Properties

The geometric constraints alone are insufficient to support segment matching with high accuracy, since they cannot separate segments that come from different paths but have similar geometric shapes. Therefore, a robust global reference is crucial to the matching accuracy and the construction performance. In EZMap, we leverage the time series of MFS and RSSI along each segment, denoted as ambient properties, as global reference information, and propose a robust algorithm to match them.

[Figure 2.5: Time series of MFS and RSSI on two paths. (a) Time series of MFS. (b) Time series of RSSI.]

As described in CrowdMap [8], the magnetic field anomalies caused by ferromagnetic construction materials (e.g., reinforcing steel bars) along an indoor path are generally stationary and unique over time. The time series of MFS are collected when a user or a robot walks along a specific indoor path and are used as the representative feature of the corresponding path. As shown in Fig.
2.5(a), the time series of MFS along the same indoor path are similar, while those along different indoor paths differ considerably. Similar characteristics also appear in the time series of RSSI values measured from a single AP, as shown in Fig. 2.5(b). Although RSSI values at a single location suffer from very limited spatial resolution, a series of RSSI values along a path provides more distinctive information. A cluster can be generated when the similarity value of two atomic segments, computed from the time series of MFS and RSSI, is below a threshold. For each atomic segment, a similarity vector can be calculated, measuring the similarity between the segment itself and each existing cluster of atomic segments. For an atomic segment $j$, we apply dynamic time warping (DTW) [66] to calculate its RSSI similarity vector $D_{j,R}$ and MFS similarity vector $D_{j,M}$, respectively. $D_{j,R}$ and $D_{j,M}$ are normalized with min-max normalization and summed together to obtain the similarity vector as follows:
$$D_j = D_{j,R} + D_{j,M} = [d_{j,1}, d_{j,2}, \ldots, d_{j,i}, \ldots, d_{j,N}], \qquad (2.1)$$
where $d_{j,i}$ denotes the similarity value between atomic segment $j$ and the $i$-th cluster, and $N$ denotes the total number of clusters. If $d_{j,i}$ is below a pre-defined threshold (e.g., 0.2), the atomic segment is assigned to the $i$-th cluster and can be matched with the other segments in that cluster. If an atomic segment cannot be matched with any existing cluster, we generate a new cluster for it. Based on our experimental observations, two segments with a similarity value below 0.2 can be considered matched. Since we normalize the RSSI and MFS vectors during fusion, the threshold generalizes to various environments. In some corner cases, the threshold may not work well, for example, when: 1) the similarity value of two matched segments (i.e., two segments from the same location) is greater than 0.2; or 2) there are multiple similarity values lower than 0.2 but only one segment should be matched.
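A minimal sketch of the cluster-assignment rule of Eq. (2.1) follows. The DTW recurrence is the textbook formulation, and applying min-max normalization across the similarity vector before fusion is one plausible reading of the text; the real system's normalization and cluster representatives may differ.

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic-time-warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def assign_cluster(seg_rssi, seg_mfs, clusters, thresh=0.2):
    """Assign a segment to an existing cluster or signal a new one.

    `clusters` is a list of (rssi_series, mfs_series) cluster
    representatives. Returns the matched cluster index, or None if the
    caller should open a new cluster (also when no cluster exists yet).
    """
    if not clusters:
        return None
    d_r = np.array([dtw(seg_rssi, r) for r, _ in clusters])
    d_m = np.array([dtw(seg_mfs, m) for _, m in clusters])

    def minmax(v):                      # min-max normalization of a vector
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

    d = minmax(d_r) + minmax(d_m)       # fused similarity vector (Eq. 2.1)
    i = int(np.argmin(d))
    return i if d[i] < thresh else None
```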
In the first case, although the segments are mistakenly treated as belonging to two clusters, we are able to merge them through the trajectory fusion in Section 2.1.6.1. In the second case, we match the target segment with the most similar cluster, i.e., the one with the smallest similarity value. The threshold is verified to be valid in our experiments in three scenarios, as will be shown in Section 2.2. Note that for the matched segments within the same cluster, we not only know that they come from the same location but also have the alignment information between each pair of segments, e.g., how each turn of one atomic segment corresponds to the turns of the other matched segments.

2.1.5 Trajectory Bundling

In the previous section, we identified the segments belonging to the same location. To reconstruct an accurate floor plan with these segments, we also need to identify their relative positions. In this section, a trajectory bundling algorithm is designed to determine the positions of atomic segments. The long trajectories are bundled by leveraging their matched atomic segments. Every two long trajectories that have matched segments are bundled in two steps: 1) the first turning points of the two matched segments are stitched together; 2) the long trajectories are rotated so that the directions of the path sections of the two matched atomic segments are consistent. Every long trajectory is bundled to the others by aligning their matched segments. After the long trajectories are bundled, we can determine the coordinates of the atomic segments by utilizing their spatial relationships on their original long trajectories. An example of trajectory bundling is shown in Fig. 2.6, where two trajectories are bundled by aligning their matched segments. However, as we treat each long trajectory as a rigid body during bundling, long trajectories containing large accumulated direction errors, as shown in Fig.
2.3(c), can reduce the accuracy of the generated map. In the next section, we adjust the positions of the atomic segments on each long trajectory to limit the impact of the accumulated errors on the generated map.

[Figure 2.6: Bundling process: matched segments, 1st turn aligned, matched segments aligned.]

2.1.6 Trajectory Fusion and Area Shaping

2.1.6.1 Trajectory Fusion

Although the rough position of an atomic segment can be inferred by bundling long trajectories, atomic segments belonging to the same path still have severe coordinate deviations, as shown in Fig. 2.7, which is undesirable. Hence, we propose a trajectory fusion algorithm to generate a more accurate and well-shaped map. Leveraging the characteristics of matched segments, we fuse matched atomic segments in two steps. First, for matched atomic segments on the same path, the inner-cluster constraints, including the angles and positions of their turns, are utilized to update their positions. We calculate the median coordinate of their endpoints at the same turn and update the coordinate of each endpoint by
$$\text{new location} = (1 - b) \cdot \text{cluster median location} + b \cdot \text{original location}, \qquad (2.2)$$
where $b \in (0, 1)$ is a parameter balancing the cluster median location and the original location.

[Figure 2.7: Trajectory fusion process: four bundled trajectories are fused together via location update and Helmert transformation.]

Then, the intra-cluster global information can be leveraged to merge turns that are mistakenly separated, which happens when atomic segments on the same path are classified into different clusters, as discussed at the end of Section 2.1.4.2. DTW can be applied again to merge the mistakenly separated turns. After the trajectories are bundled, we search for turns within a small area.
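The endpoint update of Eq. (2.2) above can be sketched as follows; the value of $b$ is an illustrative choice, since the text only constrains it to (0, 1).

```python
import numpy as np

def update_endpoint(original, cluster_median, b=0.5):
    """Blend an endpoint toward its cluster median, following Eq. (2.2).

    b in (0, 1) balances the cluster median against the original
    coordinate; b = 0.5 is an illustrative default, not from the text.
    """
    original = np.asarray(original, dtype=float)
    cluster_median = np.asarray(cluster_median, dtype=float)
    return (1.0 - b) * cluster_median + b * original
```

A small $b$ pulls matched endpoints tightly onto the cluster median; a large $b$ preserves each trajectory's original geometry.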
Unlike the earlier matching step, where DTW was applied to all pairs of atomic segments, here we apply DTW only to turns that are close to each other. If the DTW distances of the MFS and RSSI of two turns are below a threshold (e.g., 0.05 according to our experiments), the two turns are considered to be at the same location and merged together. Thus, mistakenly separated segments belonging to the same path are correctly merged. Finally, with the new coordinates of its two endpoints, each atomic segment is transformed to its new position by the Helmert transformation [67], a similarity transformation frequently used in geodesy to convert between datums. A two-dimensional Helmert transformation is performed here. Given the original and updated coordinates of the two turning points, the Helmert transformation calculates the updated coordinates of every point on the segment. The transformed segments preserve the shape and 2D information of the original atomic segments. Fig. 2.7 shows an example of the trajectory fusion process, where the Helmert transformation is performed on four trajectories.

2.1.6.2 Area Shaping

When segmenting the long trajectories, some atomic segments have no straight path section in between; instead, they contain curved sections with continuous "turnings". While the straight segments, as discussed before, reflect the major layout of the building, the curved segments also contain useful information about rooms, curved corridors, and open spaces. Therefore, we process them separately to add more details to the recovered floor plan. The position of a curved segment is deduced from its spatial relationship with straight segments. We first find the straight segments that directly connect to the curved segment. Then, the positions of the straight segments are utilized to estimate the position of the curved segment, and the Helmert transformation is performed on the curved segment.
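A two-dimensional Helmert (similarity) transform fixed by two control points, as used above, can be sketched with complex arithmetic. This is the standard two-point formulation, not the dissertation's implementation.

```python
import numpy as np

def helmert_2d(src_pair, dst_pair, points):
    """2-D Helmert (similarity) transform determined by two control points.

    src_pair : original coordinates of the two turning points, [(x, y), (x, y)].
    dst_pair : their updated coordinates after trajectory fusion.
    points   : (N, 2) array of the segment's points to transform.
    Encoding each point (x, y) as the complex number x + iy, a similarity
    transform is z' = a*z + b, where a carries rotation and uniform scale
    and b the translation; shape is therefore preserved.
    """
    z = np.asarray(src_pair, float).view(complex).ravel()
    w = np.asarray(dst_pair, float).view(complex).ravel()
    a = (w[1] - w[0]) / (z[1] - z[0])   # rotation + scale
    b = w[0] - a * z[0]                 # translation
    p = np.asarray(points, float).view(complex).ravel()
    q = a * p + b
    return np.column_stack([q.real, q.imag])
```

Because the transform has four free parameters and the two control points supply four equations, it is exactly determined by the original and updated turning points.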
This process is shown in Fig. 2.8. Without losing the shape information, the curved segments are grouped with the straight segments to reconstruct curved corridors, rooms, and open spaces. Hence, the map construction algorithm can not only generate straight corridors but also estimate the locations and areas of rooms, making the floor plan more detailed and comprehensive. Finally, to provide a well-shaped floor plan, we use the alpha-shape method [68] to extract the contour of the trajectories, which creates a bounding area that envelops the set of points on all trajectories. Specifically, it computes a Delaunay triangulation of the points and then deletes some triangle borders according to the parameter α to produce a better-fitting hull.

[Figure 2.8: Curved trajectory positioning process, grouping a curved atomic segment with straight atomic segments.]

2.2 Experiment

2.2.1 Experimental Setup and Data Collection

The proposed algorithm is evaluated in three scenarios: Scenario I is an office with an area of 22.0 m × 36.5 m, which consists of corridors, rooms, doors, and elevators, as shown in Fig. 2.9(a); Scenario II is the second floor of a townhouse with an area of 5.8 m × 12.2 m, consisting of a kitchen and a living room, as shown in Fig. 2.9(b); and Scenario III is a dining court of a campus building with an area of 49.0 m × 12.5 m, which consists of a dining hall and the hallways around it, as shown in Fig. 2.9(c). The goal is to reconstruct a detailed floor plan using the proposed system. In Scenario I, the tracking device is installed on a cart pushed by a human, as shown in Fig. 2.2(a). Traces are collected on 12 different days over 5 months, with a total of 64 crowdsourced trajectories. Since different turning directions are distinguished, there are 18 clusters in this data set. In Scenarios II and III, the tracking device is mounted on a robot, a DJI RoboMaster S1, as shown in Fig.
2.2(b), which collects 49 trajectories with 8 clusters over 16 different days in 5 months for Scenario II, and 54 trajectories with 19 clusters over 7 different days in 2 months for Scenario III.

[Figure 2.9: Ground truth floor plans of (a) the office (22.0 m × 36.5 m), (b) the home (5.8 m × 12.2 m), and (c) the campus court (49.0 m × 12.5 m), with the AP location marked in each.]

We collect data over a long period to validate the robustness of EZMap against environmental dynamics, while only a small number of trajectories is needed to demonstrate the effectiveness of the proposed system. EZMap employs a commodity home robot to collect data because it is labor-saving and efficient; moreover, home robots are widely available nowadays. However, EZMap is not limited to high-precision robotic tracking based on RIM: our map construction algorithm is also applicable to high-precision pedestrian tracking, such as WiBall [69].

2.2.2 Reconstructed Floor Plans

The results for the three scenarios are presented in Fig. 2.10, Fig. 2.11, and Fig. 2.12, respectively. For all three scenarios, the first subfigure (i.e., Fig. 2.10(a), Fig. 2.11(a), and Fig. 2.12(a)) shows the bundled trajectories as colored lines. The reconstructed floor plan of Scenario I is shown in Fig. 2.10(b), where we can see that the accessible areas of rooms and open spaces are well reconstructed from the curved trajectories. The reconstructed hallway plan of Scenario I is shown in Fig. 2.10(c), while the ground-truth hallway plan extracted by alpha-shape is shown in Fig. 2.10(d).

[Figure 2.10: Construction result of Scenario I. (a) Trajectory bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan.]

[Figure 2.11: Construction result of Scenario II. (a) Trajectory bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan.]
In Scenario II, the reconstructed floor plan is shown in Fig. 2.11(c), where the open space in the kitchen is well reconstructed by our system. Without the anchor points required by most related works, our system demonstrates the capability to reconstruct floor plans not only for large public areas but also for small private environments such as homes. Fig. 2.12(c) shows the well-reconstructed floor plan of Scenario III. These results show that our system can accurately reconstruct the floor plan for a variety of areas with only a single AP.

[Figure 2.12: Construction result of Scenario III. (a) Trajectory bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan.]

2.2.3 Performance Evaluation

We evaluate the matching performance, room size accuracy, hallway shape accuracy, and construction efficiency of EZMap. For hallway shape and room size, we compare our system with SenseWit [6] and CrowdInside [7] using the evaluation results reported in [6]. We do not implement the two systems for comparison because their prerequisites make them inapplicable to our settings.

2.2.3.1 Matching Performance

The matching performance is evaluated by the purity measure and the Normalized Mutual Information (NMI) score. The purity score $s_c$ is defined as
$$s_c = \frac{1}{M} \sum_{i=1}^{c} C_i, \qquad (2.3)$$
where $C_i$ is the number of atomic segments in the largest class inside the $i$-th cluster, $c$ is the number of clusters, and $M$ is the total number of atomic segments. The NMI is defined as
$$\mathrm{NMI}(Y, C) = \frac{2 \times I(Y; C)}{H(Y) + H(C)}, \qquad (2.4)$$
where $Y$ denotes the class labels, $C$ denotes the cluster labels, $H(\cdot)$ is the entropy, and $I(Y; C)$ is the mutual information between $Y$ and $C$. In this experiment, we collect 206 atomic segments belonging to 18 true clusters, which are clustered into 20 clusters, as our algorithm incorrectly separates one of the clusters into three.
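As a sketch, the purity and NMI scores of Eqs. (2.3)-(2.4) can be computed directly from class and cluster labels (natural logarithms cancel in the NMI ratio):

```python
import math
from collections import Counter

def purity(classes, clusters):
    """Purity, Eq. (2.3): fraction of items in the majority class
    of their assigned cluster."""
    by_cluster = {}
    for y, c in zip(classes, clusters):
        by_cluster.setdefault(c, []).append(y)
    return sum(max(Counter(ys).values())
               for ys in by_cluster.values()) / len(classes)

def nmi(classes, clusters):
    """NMI, Eq. (2.4): 2 * I(Y;C) / (H(Y) + H(C))."""
    n = len(classes)

    def entropy(labels):
        return -sum(k / n * math.log(k / n) for k in Counter(labels).values())

    joint = Counter(zip(classes, clusters))
    py, pc = Counter(classes), Counter(clusters)
    i_yc = sum(nyc / n * math.log(n * nyc / (py[y] * pc[c]))
               for (y, c), nyc in joint.items())
    return 2 * i_yc / (entropy(classes) + entropy(clusters))
```

A perfect clustering yields purity and NMI of 1, while assigning everything to one cluster drives the NMI to 0.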
All the atomic segments belong to the largest class inside their clusters, except for one, resulting in a purity score of 99.51% and an NMI score of 95.05%. To show that the geometric constraint is highly beneficial to atomic segment matching, a comparison experiment is conducted in which the matching performance is calculated without the geometric constraint. The purity score drops to 87.38%, and the NMI score becomes 81.80%, confirming that the geometric constraint improves the matching performance significantly. On the other hand, it also demonstrates, perhaps surprisingly, that a reasonable segment matching accuracy of over 80% can be achieved using only magnetism and the RSSI of a single AP.

2.2.3.2 Room Size

The room size is evaluated by the error between the reconstructed room size and the ground-truth area, calculated as
$$\text{Error} = \frac{|\text{reconstructed room size} - \text{ground-truth size}|}{\text{ground-truth size}}. \qquad (2.5)$$
We compare the error of our system with SenseWit and CrowdInside using the data collected in the office. The average error of SenseWit is 31.4%, and that of CrowdInside is 40.6%. Our system has an average error of 36.1%, which is lower than CrowdInside and higher than SenseWit. Without requiring the many anchor points needed by SenseWit, the accuracy of our system is still comparable. The estimation error is mainly due to obstacles such as tables, drawers, and chairs in the office, which prevent the cart or robot from traversing the entire room.

2.2.3.3 Hallway Shape

We evaluate the hallway shape with the same metrics as SenseWit [6]:
$$P = \frac{|S_{\mathrm{gen}} \cap S_{\mathrm{true}}|}{|S_{\mathrm{gen}}|}, \quad R = \frac{|S_{\mathrm{gen}} \cap S_{\mathrm{true}}|}{|S_{\mathrm{true}}|}, \quad F = \frac{2 \cdot P \cdot R}{P + R}, \qquad (2.6)$$
where $P$ is the precision of the hallway shape, $R$ is the recall, and $F$ is the harmonic mean of precision and recall. $P$ is the overlapped area divided by the reconstructed hallway area, and $R$ is the overlapped area divided by the ground-truth hallway area.
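The shape metrics of Eq. (2.6) can be evaluated on rasterized occupancy grids; representing the continuous areas as boolean masks is an illustrative simplification, not the dissertation's procedure.

```python
import numpy as np

def shape_prf(gen_mask, true_mask):
    """Precision/recall/F-score of a reconstructed shape, Eq. (2.6).

    Both inputs are boolean occupancy grids of the same size: True cells
    belong to the (generated or ground-truth) hallway area.
    """
    inter = np.logical_and(gen_mask, true_mask).sum()  # |S_gen ∩ S_true|
    p = inter / gen_mask.sum()                         # precision
    r = inter / true_mask.sum()                        # recall
    f = 2 * p * r / (p + r)                            # harmonic mean
    return p, r, f
```

An over-wide reconstruction lowers precision while keeping recall high; an over-narrow one (as after aggressive multi-trajectory fusion) does the opposite.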
Table 2.1: Evaluation results of hallway shape

        EZMap      SenseWit    CrowdInside
  P     78.18%     75.3%       59.5%
  R     75.10%     82.4%       47.1%
  F     76.61%     78.69%      52.0%

The evaluation results are shown in Table 2.1. The origin point and orientation of the ground-truth hallway and the generated hallway are aligned. From the table, we can see that the P, R, and F of our system are better than CrowdInside and comparable with SenseWit. The precision of our system is higher than SenseWit, but the recall is lower, because our multi-trajectory fusion makes the hallway widths slightly smaller than the ground truth. Note that SenseWit needs various anchors to achieve performance comparable to the proposed system, showing that the proposed system can generate accurate hallway plans for various environments, especially private environments without anchor points, such as homes.

[Figure 2.13: Reconstructed hallway plans of Scenario I using (a) 24, (b) 35, (c) 46, (d) 52, and (e) 61 trajectories.]

2.2.3.4 Construction Efficiency

Unlike most crowdsourcing-based indoor map construction systems, EZMap can generate a hallway map with far fewer trajectories while achieving comparable precision. We evaluate the construction efficiency of EZMap in Scenario I. Fig. 2.13 shows the reconstructed hallway using 24, 35, 46, 52, and 61 trajectories, respectively. The corresponding hallway construction accuracies are 53.76%, 67.44%, 71.99%, 73.46%, and 76.61%. We compare the construction efficiency of EZMap with SenseWit and CrowdInside in Table 2.2. SenseWit uses over 300 trajectories to cover a campus library with a 464 m² area, and CrowdInside employs over 150 trajectories to cover an environment with a 448 m² area.
We can see that EZMap achieves accuracy comparable to SenseWit with about one fifth of the number of trajectories in an environment nearly twice as large, and EZMap is more accurate than CrowdInside while needing far fewer trajectories. The efficiency evaluation shows that EZMap can rapidly reconstruct accurate hallway maps. In addition, our design employs low-cost home robots, which also saves significant manpower during data collection.

Table 2.2: Evaluation results of construction efficiency

                            EZMap       SenseWit    CrowdInside
  Trajectory number         61          300         150
  Area                      803 m²      464 m²      448 m²
  Trajectory number/area    0.08 /m²    0.65 /m²    0.33 /m²

2.3 Summary

In this chapter, we present EZMap, a universal automatic floor plan construction system that needs no prior knowledge of the building. EZMap benefits from recent advances in centimeter-accuracy indoor tracking using RF signals: it leverages commodity WiFi to estimate accurate moving distances and employs inertial sensors for orientation reckoning. EZMap then processes crowdsourced trajectories with a novel pipeline of trajectory segmentation, matching, bundling, and shaping, which ultimately reconstructs a floor plan containing not only skeletal layouts but also detailed area sizes of straight and curved corridors, open spaces, and rooms. Requiring minimal infrastructure and a small amount of data, EZMap can scale to various buildings, including public malls, offices, and home environments. To the best of our knowledge, EZMap is the first RF-based system that can accurately reconstruct maps of private environments such as homes.

Chapter 3: Human and Non-human Motion Discrimination

With the proliferation of Internet of Things (IoT) devices, indoor intelligent applications such as security surveillance, intruder detection, occupancy monitoring, and activity recognition have gained significant attention [28].
However, these applications frequently suffer from an elevated rate of false alarms due to the inability to distinguish human subjects from non-human subjects, such as pets, robotic vacuum cleaners, and electrical appliances, as depicted in Fig. 3.1. This ability to differentiate is essential, especially for applications related to security, health monitoring, automation, and energy management. Misidentification can lead to user frustration, erode trust, and hamper the practical and widespread adoption of these technologies.

[Figure 3.1: An illustration of non-human motion interfering with sensing systems ("Intrusion Detected" vs. "Room Occupied").]

Pets, vacuum robots, and electrical appliances such as fans are prevalent in indoor environments, especially in homes. According to the American Pet Products Association [70], about 70% of families in the United States (about 85 million families) had pets as of 2022, and the percentage is increasing year by year. The global robotic vacuum cleaner market is expected to grow from $5.59 billion in 2021 to $7.83 billion in 2026, at a compound annual growth rate (CAGR) of 6.9%. Therefore, given the prevalence of pets, robotic vacuum cleaners, and electrical appliances, especially in residential environments [70], it is crucial to develop a reliable system that can accurately recognize human and non-human subjects. In this chapter, we introduce the first system, Wi-MoID (WiFi-based human and non-human motion identification), that can accurately identify various human and non-human subjects through walls. The system utilizes ubiquitous WiFi signals and operates unobtrusively and without contact, eliminating the need for additional instrumentation or environmental restrictions. In contrast to existing systems that require pets, robots, and humans to move along predefined paths, directions, or areas, Wi-MoID can detect freely moving subjects, allowing them to walk, turn, stand, or stay without any predefined restrictions.
Wi-MoID automatically detects the movement of the subject and extracts context-independent features, allowing it to function effectively in diverse environments without requiring additional training or parameter tuning. The structure of this chapter is organized as follows: Section 3.1 provides an overview of Wi-MoID, including the challenges, the key insight, and a system overview. Section 3.2 delves into the details of motion detection and speed estimation. Section 3.3 introduces feature extraction and motion recognition. Experimental setups and evaluations are described in Section 3.4. We discuss the impact of various factors in Section 3.5 and draw our conclusions in Section 3.6.

3.1 Overview

3.1.1 Challenges

Realizing such a flexible system is not an easy task; several major challenges need to be addressed. The first difficulty lies in deriving motion features in both Line-Of-Sight (LOS) and Non-Line-Of-Sight (NLOS) settings using only a single-link WiFi signal. The speed pattern, a key component of the motion features, presents its own unique challenge. While various WiFi-based speed estimation approaches, such as those using the Doppler Frequency Shift (DFS), have been attempted, they generally require multiple links or are only effective under LOS conditions. In this chapter, we draw inspiration from WiSpeed [57] and adopt a rich-scattering model to overcome this challenge by treating subjects as collections of scatterers. Secondly, developing a system that operates effectively in unseen contexts without requiring additional training or parameter adjustment is a substantial challenge. Most existing WiFi-based activity recognition methods rely on features extracted directly from raw CSI, which are susceptible to environmental changes and degrade in new environments without retraining. To address this issue, we leverage a second-order statistic of the CSI, the autocorrelation function (ACF), and extract only features caused by changes in the multipath profile.
Unlike raw CSI, the ACF is insensitive to static subjects and reflects only the movement of moving subjects, making it well suited for motion recognition across different environments. Third, accurately recognizing pet motion in real-world scenarios is challenging, especially when the movement path, area, or activities are not restricted. Previous works that aim to recognize pet motion often make unrealistic assumptions about the pets' movement, such as assuming a specific path or confined area, considering only walking, or requiring pets to remain quietly close to the devices for extended periods [44, 45]. In this chapter, we propose a new approach that combines physical and statistical features to classify human and non-human motion, and further enhances the classification with a Hidden Markov Model (HMM)-based state machine that leverages temporal information to boost accuracy. By incorporating temporal information, Wi-MoID can accurately and reliably distinguish pet motion from human motion, even when pets move freely and engage in different activities.

3.1.2 Insight

A moving subject can be categorized as either human or non-human, the latter encompassing a range of entities such as pets or vacuum robots. When we refer to human motion, we mean the movement of the human body, encompassing activities such as walking, running, and sneaking. In contrast, non-human motion pertains to the movement of other animate or inanimate subjects, including animals, robots, and other physical objects; we classify any motion originating from non-human entities as non-human motion. Interestingly, although pets, cleaning robots, and even electrical appliances such as fans can all generate motion that is picked up by sensing systems, their movements are distinct from those of humans.
The differences in gait patterns are a key aspect of this distinction: humans, pets, and robots each demonstrate unique movement patterns due to their bipedal, quadrupedal, and wheeled locomotion, respectively. These differences are visually represented in Fig. 3.2. As a result of these disparate movement patterns, the analysis of features such as speed patterns becomes an essential factor in differentiating the motion of these subjects.

[Figure 3.2: An illustration of the moving patterns of a human (stride cycle with stance and swing phases), a dog (stride cycle with swing and stance phases), and a cleaning robot (moving at a constant speed).]

3.1.3 System Overview

Wi-MoID, as illustrated in Fig. 3.3, primarily consists of the following three steps:

1) First, we preprocess the CSI received by the receiver. We compute the ACF of the CSI and, based on it, calculate the motion statistic to determine the presence of moving targets in the environment. Additionally, we derive the speed of the moving target from the ACF.

2) Second, we extract a series of features of the moving target. Using the ACF, the motion statistic, and the speed, we extract physically interpretable features and statistical signal features of the moving target. These features are independent of environment, location, and direction.

3) Last, using an SVM and a state machine, we output the final identification result. We first use the SVM to classify the target's features and then employ a state machine to correct potential misclassifications based on the transition characteristics between states.

[Figure 3.3: System overview: CSI acquisition and ACF calculation, motion detection and speed estimation, extraction of physically plausible and statistical features, and SVM-based classification into human, pet, or cleaning robot.]
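As an illustration of the temporal smoothing in step 3, per-window classifier probabilities can be decoded through an HMM with the Viterbi algorithm; the "sticky" transition matrix and all numeric values below are assumptions for illustration, not parameters of Wi-MoID.

```python
import numpy as np

def viterbi_smooth(frame_probs, trans, prior):
    """Smooth per-window classifier outputs with Viterbi decoding.

    frame_probs : (T, S) per-window class probabilities from the classifier.
    trans       : (S, S) state-transition matrix; a sticky diagonal
                  discourages implausibly rapid class switches.
    prior       : (S,) initial state distribution.
    Returns the most likely state sequence as a list of ints.
    """
    T, S = frame_probs.shape
    logp = np.log(frame_probs + 1e-12)
    logt = np.log(trans + 1e-12)
    delta = np.log(prior + 1e-12) + logp[0]   # best log-score per state
    back = np.zeros((T, S), int)              # backpointers
    for t in range(1, T):
        cand = delta[:, None] + logt          # (from-state, to-state)
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + logp[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a sticky transition matrix, a single outlier window (e.g., one frame where the classifier briefly favors "pet" during a human walk) is overridden by its temporal context.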
3.2 Motion Detection and Speed Estimation

This section presents our approach to detecting the motion of subjects and estimating their speeds through walls, using a single WiFi link.

3.2.1 Preliminary

To accurately detect the motion of a subject in an indoor environment, it is crucial to consider the presence of numerous multipath signals arising from rich scattering. A statistical model that accounts for all these multipath components can be employed to estimate the speed of the target [29, 36, 57].

[Figure 3.4: CSI, ACF, motion statistics, and speed estimation for (a) a human, (b) a cleaning robot, and (c) a pet.]

In indoor environments, rich scattering occurs due to both static and dynamic scatterers. Static scatterers comprise walls, floors, and stationary furniture, while dynamic scatterers include moving people or objects. The superposition principle of electromagnetic (EM) waves allows us to decompose the CSI as
$$H(t, f) = \sum_{i \in \Omega_s(t)} H_i(t, f) + \sum_{j \in \Omega_d(t)} H_j(t, f) + \varepsilon(t, f), \qquad (3.1)$$
where $\Omega_s(t)$ represents the set of static scatterers and $\Omega_d(t)$ the set of dynamic scatterers.
$H_i(t, f)$ and $H_j(t, f)$ represent the contributions of the $i$-th and $j$-th scatterers, respectively. The noise term, $\varepsilon(t, f)$, is statistically independent of $H_i(t, f)$ and $H_j(t, f)$ [57]. Each scatterer acts as a "virtual transmitter", scattering the EM waves it receives. The CSI represents the aggregate of the electric fields of all incoming EM waves. The ACF of the CSI $H(t, f)$ with a time lag $\tau$, denoted as $\rho_H(\tau, f)$, can be determined by calculating the covariance between $H(t, f)$ and $H(t + \tau, f)$ divided by the variance of $H(t, f)$ itself. Mathematically, it can be expressed as:
$$\rho_H(\tau, f) = \frac{\mathrm{Cov}[H(t, f), H(t + \tau, f)]}{\mathrm{Var}[H(t, f)]}.$$
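A minimal sketch of the empirical per-subcarrier ACF follows; feeding in the CSI magnitude (or power) as a real-valued series is an assumption for illustration, and the window length and lag range are arbitrary.

```python
import numpy as np

def csi_acf(h, max_lag):
    """Sample autocorrelation of one subcarrier's CSI response.

    h       : 1-D real array, e.g. |H(t, f)|^2 over a window at a fixed f.
    max_lag : largest lag (in samples) to evaluate.
    Returns rho[tau] for tau = 0 .. max_lag, the empirical counterpart of
    rho_H(tau, f) = Cov[H(t,f), H(t+tau,f)] / Var[H(t,f)].
    """
    h = np.asarray(h, float)
    x = h - h.mean()                         # remove the static component
    var = np.dot(x, x) / len(x)              # sample variance
    return np.array([np.dot(x[:len(x) - t], x[t:]) / len(x) / var
                     for t in range(max_lag + 1)])
```

By construction, rho[0] = 1; for a purely static scene the mean-removed series is noise-dominated and the ACF decays immediately, which is what makes ACF-based motion statistics insensitive to static subjects.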