ABSTRACT

Title of Dissertation: WIRELESS SENSING AND ANALYTICS FOR MOTION MONITORING AND MAPPING

Guozhen Zhu
Doctor of Philosophy, 2023

Dissertation Directed by: Professor K. J. Ray Liu
Department of Electrical and Computer Engineering

Environmental perception is pivotal for intelligent systems, enabling them to adeptly capture, interpret, and act upon contextual cues. Grasping the intricacies of the environment—its objects, occupants, floor plan, and dynamics—is fundamental for the effective deployment of technologies, including robotics, the Internet of Things (IoT), and augmented reality. Traditional perception mechanisms, such as video surveillance and sensor-based monitoring, are often hampered by privacy concerns, substantial infrastructural costs, energy inefficiencies, and limited coverage. In contrast, WiFi sensing stands out for its non-intrusive, cost-effective, and pervasive attributes. Capitalizing on ubiquitous WiFi signals that permeate both indoor and outdoor spaces, WiFi sensing delivers unparalleled advantages over its traditional counterparts, sidestepping the need for extra hardware while offering profound environmental insights. Its capability to penetrate walls and other obstructions further broadens its range, covering areas beyond the reach of conventional sensors. These unique advantages elevate the value of WiFi sensing across diverse applications, spanning smart homes, health monitoring, location-based services, and security systems. Amplifying environmental perception via WiFi sensing is more than just an innovation in ubiquitous computing; it is a leap towards forging safer, more efficient, and smarter environments. This dissertation explores monitoring and mapping environments leveraging motion analytics based on commodity WiFi. In the first part of this dissertation, we introduce an efficient and cost-effective system for precise floor plan construction by integrating RF and inertial sensing techniques.
The proposed system harnesses detailed insights from RF tracking and broad context from inertial metrics, such as magnetic field strength, to produce an accurate map. The system employs a robot for trajectory collection and requires only a single Access Point arbitrarily installed in the space, both of which are widely available nowadays. Impressively, the system can produce detailed maps even with minimal data, making it adaptable for diverse structures such as shopping centers, offices, and residences without significant expenses. We validated the efficacy of the proposed system using a DJI RoboMaster S1 robot equipped with standard WiFi across three distinct buildings, demonstrating its capability to produce reliable maps for the intended regions. Given the widespread presence of WiFi setups and the increasing prevalence of domestic robots, the proposed approach paves the way for universal intelligent systems offering indoor mapping services. In the second and third parts, we present two innovative strategies leveraging WiFi to identify the motions of human and various non-human subjects. Initially, we detail a novel passive, non-intrusive methodology tailored for edge devices. By extracting and analyzing physically and statistically plausible features of motion, our system recognizes human and diverse non-human subjects through walls using a single WiFi link. Experimental results from four distinct buildings with various moving subjects validate its efficiency on edge devices. Advancing to more intricate cases, we put forth a deep learning-based WiFi sensing paradigm, which delves into the efficacy of diverse deep learning models on human and non-human object recognition and probes the feasibility of transferring image-trained models to fulfill the WiFi sensing task. Designed with a robust statistic invariant to the environment and position, this system efficiently adapts to new surroundings.
Comprehensive experimental evaluations affirm our framework’s precision in pinpointing intricate human and non-human subjects, and its readiness for integration into prevalent intelligent systems, thereby boosting their perceptual capacities. In the final part of this dissertation, we propose a pioneering through-wall indoor intrusion detection system that adeptly filters out interference from non-human subjects using ubiquitous WiFi signals. A novel deep learning architecture is proposed for single-link WiFi signal analysis. It employs a ResNet-18-based module to extract features of indoor moving subjects and an LSTM-based module to incorporate temporal information for efficient intrusion detection. Notably, the system is invariant to environmental changes, angles, and positions, enabling swift deployment in new environments without additional training. Evaluation in five indoor environments with various types of interference yielded high intrusion detection accuracy and a low false alarm rate, even without model tuning for unseen settings. The results underscore the system’s exceptional adaptability, positioning it as a top contender for widespread intelligent indoor security applications.

WIRELESS SENSING AND ANALYTICS FOR MOTION MONITORING AND MAPPING

by

Guozhen Zhu

Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2023

Advisory Committee:
Professor K. J. Ray Liu, Chair/Advisor
Professor Gang Qu, Co-Chair
Professor Min Wu
Dr. Beibei Wang
Professor Lawrence C. Washington

© Copyright by Guozhen Zhu 2023

Dedication

To my family — Xinhua Zhu and Wenchuan Guo

Acknowledgments

I am immensely grateful for the collective support, mentorship, and camaraderie that have contributed to the completion of this dissertation. First and foremost, my heartfelt gratitude goes out to my advisor, Professor K. J.
Ray Liu, for his belief in my abilities, which paved the way for the exploration of challenging yet profoundly intriguing projects throughout my four-year journey. His continual availability and prompt responses to my questions and requests for guidance never wavered; there has not been an instance when I reached out for advice and he did not generously lend his time. The experience of collaborating with and learning from such an extraordinary individual has been an unparalleled privilege and pleasure. I am sincerely thankful to my mentors, Dr. Chenshu Wu and Dr. Beibei Wang, for their invaluable advice, patience, mentorship, and for constantly challenging me to achieve more. I am truly inspired by their commitment and dedication to academic excellence. Their suggestions, discussions, and dedication to my progress have been a beacon throughout this journey. I would like to thank my co-authors, Dr. Yuqian Hu, Dr. Xiaolu Zeng, and Weihang Gao, for their valuable contributions, intellectual stimulation, and spirited discussions. They have enriched my research experience and boosted my learning curve. I would like to extend my gratitude to my committee members, Prof. Min Wu, Prof. Gang Qu, and Prof. Lawrence C. Washington, for their precious time and efforts in serving on my dissertation committee and reviewing the manuscript. My journey would not have been the same without the camaraderie and support from my lab mates, Wei-Hisang Wang, Dr. Sai Deepika Regani, Dr. Muhammed Zahid Ozturk, Dr. Fengyu Wang, and Sakila S. Jayaweera. Their collective contribution in providing a collaborative, intellectually stimulating environment was invaluable. I am fortunate to have shared this journey with such a dedicated and friendly group of individuals. I wish to express my special thanks to my best friend Mengting Xing, for her support, love, and understanding. Her constant motivation and companionship provided comfort and strength during challenging times.
I also want to thank my wonderful friends here, Ke Gong, Yexin Cao, and Xiaojue Chen, for their friendship and unwavering support during this journey. Finally, I would like to express my deepest gratitude to my parents, Mr. Xinhua Zhu and Dr. Wenchuan Guo. Your unconditional love, encouragement, advice, and faith in my abilities have been my rock. Thank you for instilling in me the values of hard work and perseverance, and for your endless sacrifices to provide me with the best opportunities. This achievement is as much yours as it is mine. To all mentioned and those unmentioned who have contributed to my academic journey in any form, I am eternally grateful. I am reminded of an African proverb, “If you want to go fast, go alone. If you want to go far, go together.” This journey has been far and enriching, and I could not have done it without each and every one of you. Thank you all.

Table of Contents

Dedication . . . . . . . . ii
Acknowledgments . . . . . . . . iii
Table of Contents . . . . . . . . v
List of Tables . . . . . . . . viii
List of Figures . . . . . . . . ix
List of Abbreviations . . . . . . . . xi

Chapter 1: Introduction . . . . . . . . 1
1.1 Motivation . . . . . . . . 1
1.2 Related Works . . . . . . . . 3
1.2.1 Floor Plan Construction . . . . . . . . 3
1.2.2 Human and Non-Human Motion Identification . . . . . . . . 6
1.2.3 Indoor Intrusion Detection . . . . . . . . 8
1.3 Dissertation Outline and Contributions . . . . . . . . 10
1.3.1 Automatic Floor Plan Construction (Chapter 2) . . . . . . . . 10
1.3.2 Human and Non-human Motion Discrimination (Chapter 3) . . . . . . . . 11
1.3.3 Deep Learning for Human and Non-human Motion Identification (Chapter 4) . . . . . . . . 12
1.3.4 Through-the-wall Indoor Intrusion Detection (Chapter 5) . . . . . . . .
12
1.4 Preliminaries on Wireless Sensing . . . . . . . . 13
1.4.1 Received Signal Strength Indicator . . . . . . . . 14
1.4.2 Channel State Information . . . . . . . . 15

Chapter 2: Automatic Floor Plan Construction . . . . . . . . 17
2.1 System Design . . . . . . . . 19
2.1.1 System Overview . . . . . . . . 19
2.1.2 Trajectory Acquisition . . . . . . . . 20
2.1.3 Trajectory Segmentation . . . . . . . . 21
2.1.4 Segment Matching . . . . . . . . 23
2.1.5 Trajectory Bundling . . . . . . . . 28
2.1.6 Trajectory Fusion and Area Shaping . . . . . . . . 29
2.2 Experiment . . . . . . . . 32
2.2.1 Experimental Setup and Data Collection . . . . . . . . 32
2.2.2 Reconstructed Floor Plans . . . . . . . . 33
2.2.3 Performance Evaluation . . . . . . . . 34
2.3 Summary . . . . . . . . 39

Chapter 3: Human and Non-human Motion Discrimination . . . . . . . . 40
3.1 Overview . . . . . . . . 42
3.1.1 Challenges . . . . . . . . 42
3.1.2 Insight . . . . . . . . 43
3.1.3 System Overview . . . . . . . . 44
3.2 Motion Detection and Speed Estimation . . . . . . . . 45
3.2.1 Preliminary . . . . . . . . 45
3.2.2 Motion Detection and Speed Estimation . . . . . . . .
47
3.3 Feature Extraction and Motion Recognition . . . . . . . . 49
3.3.1 Feature Extraction . . . . . . . . 49
3.3.2 Recognition Model Design . . . . . . . . 54
3.3.3 State Machine . . . . . . . . 55
3.4 Evaluation . . . . . . . . 56
3.4.1 Methodology . . . . . . . . 56
3.4.2 Recognition Performance . . . . . . . . 60
3.5 Discussion . . . . . . . . 63
3.5.1 Effectiveness of State Machine . . . . . . . . 63
3.5.2 Latency . . . . . . . . 64
3.5.3 Feature Efficiency . . . . . . . . 65
3.5.4 Length of Motion Segments . . . . . . . . 65
3.5.5 Sounding Rate . . . . . . . . 66
3.5.6 Edge Device Deployment . . . . . . . . 67
3.6 Summary . . . . . . . . 67

Chapter 4: Deep Learning for Human and Non-human Motion Identification . . . . . . . . 69
4.1 WiFi Signal Preprocessing . . . . . . . . 71
4.1.1 Environment-independent Statistic Extraction . . . . . . . . 72
4.1.2 Motion Detection and Segmentation . . . . . . . . 75
4.1.3 Input Preparation . . . . . . . . 75
4.2 Deep Learning for Human and Non-human Recognition . . . . . . . . 77
4.2.1 Forward Neural Network . . . . . . . . 78
4.2.2 Convolutional Neural Network . . . . . . . .
78
4.2.3 Recurrent Neural Network . . . . . . . . 79
4.2.4 Transformer . . . . . . . . 80
4.2.5 Transfer from Pre-trained Models . . . . . . . . 81
4.3 Experimental Details . . . . . . . . 82
4.3.1 Hardware . . . . . . . . 82
4.3.2 Experimental Environment . . . . . . . . 83
4.3.3 Dataset and Metrics . . . . . . . . 83
4.3.4 Implementation Details . . . . . . . . 85
4.4 Evaluation . . . . . . . . 86
4.4.1 Classification Performance Evaluation . . . . . . . . 86
4.4.2 Recognition in Unseen Environments for Seen Subjects . . . . . . . . 87
4.4.3 Recognition in Unseen Environment for Unseen Subjects . . . . . . . . 88
4.4.4 Recognition in Unseen Environment for Coexisting Multiple Subjects . . . . . . . . 89
4.4.5 Performance of Transfer Learning . . . . . . . . 90
4.4.6 Convergence . . . . . . . . 91
4.4.7 Window Length . . . . . . . . 93
4.4.8 Sounding Rate . . . . . . . . 94
4.4.9 Computational Complexity and Model Parameter . . . . . . . . 95
4.5 Summary . . . . . . . . 96

Chapter 5: Through-the-wall Indoor Intrusion Detection . . . . . . . . 97
5.1 Overview . . . . . . . . 99
5.1.1 Challenges . . . . . . . . 99
5.1.2 WiFi-based Intrusion Detection Framework . . . . . . . . 100
5.2 WiFi Signal Preprocessing . . . . . . . .
. . . . . . . . . . . . . . . . . . 101 5.2.1 A-ACF Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.2.2 Motion Detection and Segmentation . . . . . . . . . . . . . . . . . . . . 103 5.3 Intrusion Detection Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 5.3.1 Motion Recognition Module . . . . . . . . . . . . . . . . . . . . . . . . 103 5.3.2 Intrusion Detection Module . . . . . . . . . . . . . . . . . . . . . . . . . 105 5.4 Implementation and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.4.1 Hardware and Experimental Environment . . . . . . . . . . . . . . . . . 107 5.4.2 Dataset and Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.4.3 Evaluation on Classification Performance . . . . . . . . . . . . . . . . . 110 5.4.4 Evaluation on Intrusion Detection Performance . . . . . . . . . . . . . . 111 5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.5.1 Effectiveness of Intrusion Detection Module . . . . . . . . . . . . . . . . 114 5.5.2 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.5.3 Computational Complexity and Memory Requirements . . . . . . . . . . 116 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Chapter 6: Conclusion and Future Work 118 6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Bibliography 122 vii List of Tables 2.1 Evaluation results of hallway shape . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.2 Evaluation results of construction efficiency . . . . . . . . . . . . . . . . . . . . 39 3.1 Human and non-human motion identification performance . . . . . . . . . . . . 60 3.2 Adaptivity to other non-human subjects . . . . . . . . . . . . . . . . . . . . . . 
62
3.3 State machine performance evaluation . . . . . . . . 64
3.4 Recognition performance with feature selection . . . . . . . . 65
3.5 Performance at different sounding rates . . . . . . . . 67
4.1 Summary of dataset . . . . . . . . 85
4.2 Classification accuracy of deep learning models for human and non-human motion identification . . . . . . . . 86
4.3 Classification accuracy evaluation on transfer learning . . . . . . . . 91
4.4 Computational resource requirements and model size . . . . . . . . 96
5.1 Summary of dataset . . . . . . . . 109
5.2 Architecture of Wi-IntruNet . . . . . . . . 111
5.3 Classification accuracy of MRM for human and non-human motion identification . . . . . . . . 112
5.4 Evaluation of Wi-IntruNet in unseen environments with seen, unseen and coexisting subjects . . . . . . . . 113
5.5 Computational complexity and memory requirements . . . . . . . . 116

List of Figures

2.1 System overview. . . . . . . . . 19
2.2 Crowdsourcing trajectory collection devices. . . . . . . . . 20
2.3 Atomic segments. . . . . . . . . 22
2.4 An example of the matching process. . . . . . . . . 24
2.5 Time series of MFS and RSSI on two paths. . . . . . . . . 26
2.6 Bundling process. . . . . . . . . 29
2.7 Trajectory fusion process. . . . . . . . . 30
2.8 Curved trajectory positioning process. . . . . . . .
32
2.9 Ground truth floor plans of (a) office, (b) home, and (c) campus court. . . . . . . . . 33
2.10 Construction result of Scenario I. (a) Trajectories bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan. . . . . . . . . 34
2.11 Construction result of Scenario II. (a) Trajectories bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan. . . . . . . . . 34
2.12 Construction result of Scenario III. (a) Trajectories bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan. . . . . . . . . 35
2.13 Reconstructed hallway plan results of Scenario I using different numbers of trajectories. (a) Result of using 24 trajectories. (b) Result of using 35 trajectories. (c) Result of using 46 trajectories. (d) Result of using 52 trajectories. (e) Result of using 61 trajectories. . . . . . . . . 38
3.1 An illustration of non-human motion interference with sensing systems. . . . . . . . . 41
3.2 An illustration of moving patterns of human, dog and cleaning robot. . . . . . . . . 44
3.3 System overview. . . . . . . . . 45
3.4 CSI, ACF, motion statistics and speed estimation for (a) human, (b) pet, and (c) cleaning robot. . . . . . . . . 46
3.5 Histograms of (a) motion statistics and (b) speed estimations for human and pet. . . . . . . . . 49
3.6 Example of human stride cycle. . . . . . . . . 50
3.7 Stride cycle time and stride length of human and pet. . . . . . . . . 52
3.8 (a) 25th percentile and (b) 75th percentile of speed estimations for human, robot, pet. . . . . . . .
53
3.9 (a) ACF at a time instance, (b) average ACF peak value, and (c) average ACF valley values of human, robot, pet. . . . . . . . . 54
3.10 Floor plans of (a) Scenario I: an apartment, (b) Scenario II: a townhouse, (c) Scenario III: a single family house, and (d) Scenario IV: an office building. Multiple pairs of transceivers are adopted to enhance data collection efficiency and data diversity, but only data from a single transceiver pair is used for feature extraction and recognition. . . . . . . . . 57
3.11 (a) Off-the-shelf WiFi device, (b) pet in Scenario I. . . . . . . . . 58
3.12 Adaptivity to human motion types. . . . . . . . . 63
3.13 Impact of length of motion segments. . . . . . . . . 66
3.14 CPU and memory requirements. . . . . . . . . 68
4.1 CSI of (a) human motion in Environment A, (b) pet motion in Environment A, (c) cleaning robot motion in Environment A, (d) human motion in Environment B, (e) pet motion in Environment B, and (f) cleaning robot motion in Environment B. . . . . . . . . 72
4.2 A-ACF of (a) human motion in Environment A, (b) pet motion in Environment A, (c) cleaning robot motion in Environment A, (d) human motion in Environment B, (e) pet motion in Environment B, and (f) cleaning robot motion in Environment B. . . . . . . . . 74
4.3 Motion detection and A-ACF segmentation with motion statistics. . . . . . . . . 76
4.4 Illustration of data input preparation. . . . . . . . . 77
4.5 (a) Tx, and (b) Rx in Scenario III. . . . . . . .
82
4.6 Floor plan of (a) Scenario I, an apartment, (b) Scenario II, a townhouse, (c) Scenario III, a single family house, (d) Scenario IV, a single family house, and (e) Scenario V, an office building. . . . . . . . . 84
4.7 Recognition accuracy in unseen environment with familiar subjects. . . . . . . . . 88
4.8 Recognition accuracy in unseen environment with unseen subjects. . . . . . . . . 89
4.9 Recognition accuracy in unseen environment with multiple subjects coexisting. . . . . . . . . 90
4.10 Training losses of deep learning models over 100 epochs. . . . . . . . . 91
4.11 Validation losses of deep learning models over 100 epochs. . . . . . . . . 92
4.12 Training procedures of deep learning models regarding training accuracy. . . . . . . . . 93
4.13 Evaluation on impact of motion segment length. . . . . . . . . 94
4.14 Evaluation on impact of sounding rate. . . . . . . . . 95
5.1 Overview of Wi-IntruNet. . . . . . . . . 101
5.2 An illustration of LSTM. . . . . . . . . 105
5.3 (a) Tx, and (b) Rx in Scenario III. . . . . . . . . 107
5.4 Floor plan of (a) Scenario I, an apartment, (b) Scenario II, a townhouse, (c) Scenario III, a single family house, (d) Scenario IV, a single family house, and (e) Scenario V, an office building. . . . . . . . . 108
5.5 Effectiveness evaluation of IDM. . . . . . . .
115

List of Abbreviations

A-ACF Amplified Autocorrelation Function
ACF Autocorrelation Function
AoA Angle of Arrival
AP Access Point
CDF Cumulative Distribution Function
CFO Channel Frequency Offset
CFR Channel Frequency Response
CIR Channel Impulse Response
CNN Convolutional Neural Network
CPU Central Processing Unit
CSI Channel State Information
DTW Dynamic Time Warping
DFS Doppler Frequency Shift
EM Electromagnetic
FMCW Frequency Modulated Continuous Wave
FNN Forward Neural Network
FFT Fast Fourier Transform
FPR False Positive Rate
GHz Gigahertz
GPU Graphics Processing Unit
GRUNet Gated Recurrent Unit Network
HMM Hidden Markov Model
IDM Intrusion Detection Module
IoT Internet of Things
IMU Inertial Measurement Unit
LBS Location-Based Services
LOS Line of Sight
LSTM Long Short-Term Memory
MFS Magnetic Field Strength
MHz Megahertz
MIMO Multiple Input Multiple Output
MLP Multilayer Perceptron
mmWave Millimeter-Wave
MRC Maximum Ratio Combining
MRM Motion Recognition Module
NLOS Non Line of Sight
NMI Normalized Mutual Information
PCA Principal Component Analysis
PDR Pedestrian Dead Reckoning
RAM Random-Access Memory
RF Radio Frequency
RNN Recurrent Neural Network
RSS Received Signal Strength
RSSI Received Signal Strength Indicator
Rx Receiver
SFO Sampling Frequency Offset
SLAM Simultaneous Localization and Mapping
SNR Signal-to-Noise Ratio
STFT Short-Time Fourier Transform
STO Symbol Timing Offset
SVM Support-Vector Machines
ToA Time of Arrival
ToF Time of Flight
TPR True Positive Rate
TRRS Time Reversal Resonance Strength
Tx Transmitter
UWB Ultra-Wide Band
ViT Vision Transformer

Chapter 1: Introduction

1.1 Motivation

Environmental perception is integral to the effectiveness of intelligent systems. It enables these systems to understand, map, and monitor their surroundings, which is crucial for their proper functioning and optimization.
Mapping the environment allows a system to accurately determine its position and plan routes, a fundamental aspect for applications like autonomous vehicles and robotics. Similarly, monitoring enables a system to detect changes or anomalies in the environment, key for predictive maintenance, security, or hazard detection. In essence, the ability to perceive the environment underpins the core functionalities of intelligent systems, enhancing their effectiveness, safety, and efficiency. WiFi signals are ubiquitous in our modern digital world, providing wireless connectivity for a plethora of devices ranging from smartphones to smart home appliances. These signals, which invisibly traverse our environment, are a cornerstone of wireless communication and have opened up a new avenue of WiFi sensing. By detecting alterations in WiFi signal characteristics due to interactions with objects or individuals, WiFi sensing provides a new layer of environmental awareness. It offers numerous advantages, including: 1) non-intrusive and privacy-preserving sensing, 2) lower costs and energy consumption by leveraging existing infrastructure, 3) the ability to penetrate walls and other opaque objects for hard-to-reach sensing areas, 4) high scalability due to the ubiquity of WiFi, and 5) resilience to lighting conditions for continuous operation. Motivated by the compelling benefits of WiFi-based sensing and the critical need to augment the perceptual capabilities of intelligent systems, this dissertation explores the innovative use of ubiquitous WiFi signals for environmental perception enhancement through motion analytics, with a specific focus on mapping and monitoring. We present an advanced suite of solutions that includes an automated algorithm for floor plan construction, two distinctive approaches for identifying human and non-human subjects, and a robust indoor intrusion detection system.
By pushing the frontiers of WiFi sensing, this research marks a significant advancement in the realm of intelligent systems, fostering their evolution towards heightened perceptual precision and superior situational awareness.

• Automatic floor plan construction is a vital component for intelligent systems. Providing a bird’s eye view of an environment, floor plans are crucial for navigation, localization, and planning. For systems such as autonomous vehicles and mobile robots, they enable efficient path planning and collision avoidance. Furthermore, they assist in resource allocation for indoor positioning systems, improving services like indoor navigation and emergency response. Therefore, the development of automated floor plan construction techniques significantly enhances the perceptual capabilities and operational efficiency of intelligent systems.

• Human and non-human motion identification is a critical capability for intelligent systems, particularly for applications in security, automation, and user interaction. By distinguishing between human and non-human entities, systems can perform tasks more efficiently and accurately, ensuring proper responses based on the detected subject type. Given the prevalence of pets, robotic vacuum cleaners, and electrical appliances, especially in residential environments, it is crucial to develop a reliable system that can accurately recognize human and non-human subjects.

• Indoor intrusion detection is integral for maintaining safety and security in various environments. An advanced intrusion detection system can provide real-time alerts and minimize the false alarms caused by non-human subjects. In the context of intelligent systems, the incorporation of robust intrusion detection can significantly elevate their utility and effectiveness, making them indispensable tools for proactive security management.
By enhancing the ability of these systems to detect intrusions and mitigating the interference from non-human motion, we can ensure a higher level of security and peace of mind for individuals and organizations alike.

1.2 Related Works

The related works of this dissertation cover floor plan construction, human and non-human motion identification, and indoor intrusion detection, which are reviewed in the following subsections, respectively.

1.2.1 Floor Plan Construction

Although projects like Google Indoor Maps, Point Inside, and Micello Indoor Map aim at collecting indoor maps, the availability of indoor maps is still very limited considering the huge number of buildings worldwide. Furthermore, most floor plans in these projects are manually generated and uploaded. The traditional manual methods require professional technicians to draw the floor plan using specialized measurement devices, which is time-consuming, costly, and thus unaffordable to cover all buildings. In addition, potential changes in the indoor environment would require much effort to update the maps. Efforts have been taken to generate indoor maps automatically. Most indoor map construction approaches are based on Simultaneous Localization and Mapping (SLAM), which is widely used in robotics to locate robots in unknown scenarios by mapping the environment simultaneously. Among numerous SLAM systems, many are based on vision [1–5]. These systems can construct 3D indoor maps by modeling the environment through images obtained by cameras. However, cameras have a high requirement for light, making these systems unable to work in environments with poor lighting. Besides, vision-based SLAM systems raise many privacy concerns. Lasers are commonly used to combat the concerns of vision-based systems. However, laser-based systems are costly and need special equipment.
An Inertial Measurement Unit (IMU) is free of these drawbacks, but it suffers from odometry deviation, especially over long distances. The Pedestrian Dead Reckoning (PDR) algorithm is often employed to estimate trajectories from inertial sensor data [6–14]. However, trajectories estimated by PDR from inertial sensors usually suffer from large deviations, especially in direction. Thus, when these trajectories are employed for floor plan construction, several methods have been proposed to correct the deviation errors caused by dead reckoning. In CrowdInside [7] and SenseWit [6], a large number of anchor points, such as water dispensers and doors, are utilized to overcome the drift of the IMU. However, in many buildings, these anchor points are too sparse to correct the IMU errors. Thus, the accuracy and generalization performance of their methods would decrease in buildings without sufficient anchor points. Furthermore, to construct a map from erroneous trajectories, inertial sensor-based systems require a vast amount of data to estimate the reachable area in the environment. Different from current works based on PDR using inertial sensors, EZMap requires neither anchor points nor a large number of trajectories. We propose a hierarchical matching scheme to accurately match trajectories at the exact location by considering both the geometric shape estimated from RF tracking and ambient properties such as the RSSI and the MFS. Different approaches have been developed to generate maps from crowdsourced data, which can be classified into two categories: grid-based and topological-based. For grid-based map construction, a Gaussian distribution is often used to compute the accessibility probability of each cell to create the occupancy grid map. Grids in the matrix are described by values between the two states of 0 and 1, where 0 represents that the grid is accessible and 1 represents that objects fully occupy the grid.
The value between 0 and 1 represents the percentage of the grid area that an object occupies [8, 9]. These approaches are easy to build and maintain but require enormous storage space and are time-consuming [15]. Since the topological-based approach is more efficient than the grid-based approach, it has attracted more attention from researchers. Walkie-Markie [16] applies the spring relaxation concept, treating the encountered WiFi landmarks as nodes and the user trajectories as springs. As more user trajectories are collected, the node positions are adjusted so as to achieve the minimum system potential energy. CrowdInside [7] uses identifiable acceleration patterns of elevators, escalators, and stairs to correct dead reckoning and hence construct the map. This approach fails when the users mainly stay on the same floor, e.g., an exhibition in a one-storey hall or a single-floor office such as that in our experiments. In [17], users have to hold the Arduino sensor and wait for the servomotor to sweep the surroundings. [18] requires the users to walk in a straight line, and the corridors should be perpendicular to each other, which means it cannot be applied to structures with curved paths. A preliminary version of this work has been published in [19] based on crowdsourcing with carts. In this work, we improve upon it by tackling curved paths and providing detailed information about the environment, such as open spaces and rooms. In addition, existing approaches mainly crowdsource data from a large number of mobile users and target public spaces like malls and office buildings. Small private spaces like homes are less explored. Differently, EZMap employs the increasingly popular home robots for data collection and extends the scope to home environments.
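As a concrete illustration of the grid-based representation discussed above, here is a minimal sketch, not taken from any of the cited systems: each cell of an occupancy grid holds a value in [0, 1], where 0 means fully accessible, 1 means fully occupied, and intermediate values reflect partial occupancy. The simple counting scheme below is an assumption for illustration.

```python
import numpy as np

def occupancy_grid(visits, hits):
    """Estimate per-cell occupancy as the fraction of observations in
    which the cell was found blocked (hits / visits); unvisited cells
    default to 0 (assumed accessible)."""
    occ = np.zeros(visits.shape, dtype=float)
    mask = visits > 0
    occ[mask] = hits[mask] / visits[mask]
    return occ

visits = np.array([[4, 4], [2, 0]])  # observation counts per cell
hits = np.array([[0, 4], [1, 0]])    # how often each cell was blocked
grid = occupancy_grid(visits, hits)
# cell (0,0): never blocked        -> 0.0 (accessible)
# cell (0,1): always blocked       -> 1.0 (occupied)
# cell (1,0): blocked half the time -> 0.5 (partially occupied)
```

The storage cost scales with the number of cells regardless of how much of the space is actually interesting, which is exactly the drawback of grid-based maps noted above.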
1.2.2 Human and Non-Human Motion Identification

While numerous indoor activity recognition and monitoring methods have been proposed over the years, only a few have taken into account the impact of non-human motion. Some researchers have attempted to filter out pet motion from human motion due to its potential interference with their systems. In this section, we review the current methods used to differentiate pets from humans, focusing on non-RF-based and RF-based methods. Non-RF-based methods distinguishing pet motion from human motion mainly rely on heat maps obtained from thermal sensors [20] or image/face recognition techniques using cameras [21]. These methods have limited coverage, are often restricted to Line-of-Sight (LOS) conditions, and camera-based methods further introduce privacy concerns [22]. As for RF-based methods, recent attempts to differentiate between humans and pets have employed radar [23] and ultra-wideband (UWB) [24], focusing on analyzing biometric indicators such as heartbeat and respiration. However, these methods require the pet to remain stationary during recognition and impose constraints on its spatial position and distance from the device, thereby rendering them impractical and ineffective for identifying moving subjects. While [25] enables the recognition of canine movement distinct from human movement through radar, it demands that the device be installed on the ceiling and restricts the movement of humans and pets to a limited area under the device. Furthermore, the method presented in [26] can differentiate fan motion from human respiration, yet it remains inapplicable to human activities like walking and running. WiFi-based sensing systems are attractive for indoor activity recognition and monitoring due to their ubiquity and low cost [27, 28]. Numerous WiFi-based systems have been designed to monitor human motion and the environment.
WiFi-based motion detection systems [29, 30] detect the occurrence of human motion within an environment, while WiFi-based activity recognition methods [31–35] attempt to recognize human daily activities like running, walking, and sitting. Identification systems like [36–39] recognize human identity through gait or other biometric features extracted from WiFi signals. In addition, mapping and tracking systems that leverage WiFi [40–43] plot human movement paths and generate floor maps. Nonetheless, these systems fail to account for the influence of non-human entities, like pets and vacuum robots, on their operational accuracy. These non-human subjects, capable of affecting WiFi signals similarly to humans, introduce a potential source of interference. Consequently, these systems' reliability for indoor monitoring and recognition can be compromised, particularly when non-human subjects are present in the environment. Some researchers have proposed to detect intrusion using WiFi. Among the proposed systems, PetFree [44] and M-WiFi [45] proposed methods to reduce false alarms caused by pets. For example, PetFree distinguishes pets from humans by mapping the subject's effective interference height (EIH) to CSI measurements. This method, however, requires the transmitter and receiver to be positioned at the same height, and it operates solely under LOS conditions. Moreover, its performance can be affected by the multipath effect in environments with complex layouts. M-WiFi differentiates between pets and humans by examining the RF disturbance. However, this system encounters issues when a pet moves close to the wireless link. Such movement can induce signal fading akin to that of a person moving farther from the wireless link, resulting in a significant number of false alarms, regardless of differences in RF disturbance. Unlike the existing RF-based approaches, we place no restriction on pet movement, device placement, or environment complexity.
Notably, most existing works only consider the interference of pets, and few tackle the motion of indoor robots and household appliances. In this dissertation, we systematically differentiate various typical non-human motions from human motions.

1.2.3 Indoor Intrusion Detection

Human detection systems not based on Radio Frequency (RF) primarily utilize technologies such as video cameras [21, 46], sound [47, 48], and infrared sensors [49, 50]. Though camera- and sound-based detection systems can effectively identify intruders in an environment, their practicality as ideal intrusion detection systems is marred by privacy concerns. Furthermore, camera-based detection devices demand good environmental lighting conditions and struggle under Non-Line-of-Sight (NLOS) scenarios. Infrared intrusion detection devices, due to their relatively better privacy protection and cost-effectiveness, are quite popular. However, these devices come with their own set of limitations. They require precise positioning, offer smaller coverage areas, and fail under NLOS conditions. Most importantly, their detection accuracy can be influenced by environmental temperature, and they lack the ability to distinguish between human and non-human targets like pets. As for RF-based intrusion detection, there has recently been an uptick in indoor intrusion detection projects using devices like millimeter-wave radars. While these devices have superior accuracy and lower false-positive rates compared to infrared-based devices, their high cost and extremely limited coverage area, along with their inability to operate under NLOS conditions, pose significant limitations. As IoT devices and WiFi become increasingly ubiquitous, a surge in the development of WiFi-based sensing systems has been observed [51, 52], thanks to their omnipresence and cost-effectiveness. Of late, numerous studies have attempted to leverage WiFi for human detection [29, 53–55].
However, these systems fail to eliminate false alarms triggered by non-human movements in the environment, caused by pets, cleaning robots, fans, and the like. M-WiFi [45] proposed to distinguish pets from humans based on the difference in RF disturbance, but it failed to reduce false alarm rates, as a pet approaching the wireless link can produce interference similar to that of a human moving away from it. PetFree [44] introduced a method to eliminate pet interference but required all equipment to be placed at the same height, and its performance decreases when applied in complex environments. Moreover, these existing approaches only accounted for pet interference and are incapable of excluding disturbances from other non-human entities like cleaning robots or moving household appliances. Our previous work [56] proposed an SVM-based method to differentiate various non-human targets from humans, offering a more comprehensive way to discern between human and non-human movements in the environment. However, as a classification methodology, this work did not consider the temporal correlation of targets' appearances in the environment. In this dissertation, we leverage a more robust deep learning model, designed specifically for WiFi-based intrusion detection, and combine current target detection results with historical information, enabling an intrusion detection system capable of eliminating interference from various non-human targets in the environment.

1.3 Dissertation Outline and Contributions

In this dissertation, we initially lay out the fundamentals of WiFi sensing in Section 1.4, encompassing the Received Signal Strength Indicator (RSSI) and Channel State Information (CSI). Following this, in Chapter 2, we delineate a method for automatic floor plan construction.
Chapters 3 and 4 respectively detail a Support Vector Machine (SVM) based method for distinguishing human and non-human motion using edge computing, and a framework for human and non-human subject recognition with deep learning methods. Chapter 5 introduces a robust indoor intrusion detection system resilient to non-human movement, underpinned by deep learning. Finally, in Chapter 6, we conclude the dissertation and discuss future work.

1.3.1 Automatic Floor Plan Construction (Chapter 2)

In this chapter, we present EZMap, a high-accuracy, low-cost floor plan construction system that fuses RF and inertial sensing. EZMap combines the fine-grained yet local information from RF tracking with the coarse-grained but global contexts from inertial sensing (e.g., magnetic field strength), which together make for an accurate map. Our system employs a robot for trajectory collection and requires only a single Access Point to be arbitrarily installed in the space, both of which are widely available nowadays. Furthermore, it can generate a map even when only a small amount of data is available, allowing it to scale to different buildings like malls, office buildings, and homes with little cost. We validate the performance using a Dji RoboMaster S1 robot with commodity WiFi in three different buildings. The results show that our system can efficiently generate faithful maps for the targeted areas. With the ubiquity of WiFi infrastructure and the rise of home robots, we believe our approach will pave the way for pervasive indoor map services.

1.3.2 Human and Non-human Motion Discrimination (Chapter 3)

In this chapter, we propose a novel system, Wi-MoID, that passively and unobtrusively distinguishes moving human and various non-human subjects using a single pair of commodity WiFi transceivers, without requiring any device on the subjects or restricting their movements.
Wi-MoID leverages a novel statistical electromagnetic wave theory-based multipath model to detect moving subjects, extracts physically and statistically explainable features of their motion, and accurately differentiates human and various non-human movements through walls, even in complex environments. In addition, Wi-MoID is suitable for edge devices, requiring minimal computing resources and storage, and is environment-independent, making it easy to deploy in new environments with minimum effort. We evaluate the performance of Wi-MoID in four distinct buildings with various moving subjects, including pets, vacuum robots, humans, and fans. The results demonstrate that it achieves 97.34% accuracy and a 1.75% false alarm rate for the identification of human and non-human motion, and 92.61% accuracy in unseen environments without model tuning, demonstrating its robustness for ubiquitous use.

1.3.3 Deep Learning for Human and Non-human Motion Identification (Chapter 4)

In this chapter, we design a deep learning framework to recognize human and non-human subjects with WiFi signals through the wall. Our system extracts environment-independent features from single-link WiFi. We investigate the performance of popular deep neural networks and explore the efficacy of transferring models trained on image datasets to WiFi sensing tasks. We implement our framework and evaluate its performance in five environments with commodity WiFi devices. On a challenging dataset that considers large pets and cases where multiple subjects coexist, our proposed framework achieves an average validation accuracy of 95.84% and an average testing accuracy of 91.71% in unseen environments without further training or parameter tuning. These results underline the robustness of our approach and its readiness for integration into ubiquitous intelligent IoT systems and applications.
1.3.4 Through-the-wall Indoor Intrusion Detection (Chapter 5)

In this chapter, we propose Wi-IntruNet, the first robust through-wall indoor intrusion detection system that mitigates the interference from non-human indoor objects based on widely available WiFi signals. Wi-IntruNet introduces a novel deep learning framework for single-link WiFi signal analysis, employing a ResNet model to extract features of indoor moving targets and an LSTM to incorporate temporal data for efficient intrusion detection. Notably, Wi-IntruNet is invariant to environmental changes, angle, and position, enabling swift deployment in new environments without additional training. We implement and extensively evaluate Wi-IntruNet with commodity WiFi devices in 5 typical indoor environments with various interference from pets, cleaning robots, and fans. The results reveal that Wi-IntruNet achieves an average of 98.91% intrusion detection accuracy and a 2.84% false alarm rate in unseen environments without model tuning, underscoring its robustness and readiness for ubiquitous indoor security applications.

1.4 Preliminaries on Wireless Sensing

Wireless sensing represents a pivotal advancement in the field of remote detection and measurement technologies. It leverages pervasive and invisible radio waves to detect, identify, track, and understand phenomena within their range without requiring a physical connection or proximity. One notable subset of this technology is WiFi sensing, which utilizes the omnipresent WiFi signals as a source of information to enable context-aware applications. WiFi sensing is capable of detecting minute changes in the wireless signals caused by various objects and movements in the environment, providing rich information about the surroundings.
This transformative technology not only offers an innovative way to interact with the environment but also opens up a plethora of opportunities for applications in various sectors such as healthcare, security, home automation, and more. 5 GHz WiFi, part of the dual-band WiFi capabilities present in many modern devices, provides a faster, less congested frequency range compared to its counterpart, 2.4 GHz WiFi. This frequency band not only allows for greater data transmission speeds but also supports a wider range of channels, reducing the potential for interference from other devices. When utilized for WiFi sensing, the 5 GHz spectrum can offer improved performance and accuracy due to these characteristics. WiFi sensing on this band leverages the scattering, reflection, and absorption of the 5 GHz signals by objects and people in the environment to create a rich tapestry of data about the surroundings. This information can be analyzed and interpreted to enable a range of applications, such as detecting movement or occupancy in a room, monitoring vital signs for healthcare, or even identifying gestures for interaction with smart devices. The use of 5 GHz WiFi sensing thus combines the benefits of higher-frequency wireless communication with the growing potential of context-aware technologies.

1.4.1 Received Signal Strength Indicator

The Received Signal Strength Indicator (RSSI) is a metric used in wireless communications to measure the power level received by a device from a source transmitter. In an IEEE 802.11 system, RSSI is the relative received signal strength in a wireless environment. RSSI is an indication of the power level being received by the receiving radio after the antenna and possible cable loss. A higher RSSI value denotes a stronger received signal. This metric is crucial for various applications, including network diagnostics, link quality estimation, and location-based services.
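As a hedged illustration of how RSSI supports such location-based services, the following sketch inverts the standard log-distance path-loss model to turn an RSSI reading into a rough distance estimate. The 1 m reference power `rssi_1m` and the path-loss exponent `n` are assumed calibration constants for illustration, not values from this dissertation.

```python
def rssi_to_distance(rssi_dbm, rssi_1m=-40.0, n=2.0):
    """Invert RSSI = rssi_1m - 10 * n * log10(d) to estimate d in meters.

    rssi_1m: assumed received power at 1 m from the transmitter (dBm).
    n: assumed path-loss exponent (2 in free space, larger indoors).
    """
    return 10.0 ** ((rssi_1m - rssi_dbm) / (10.0 * n))

# 20 dB below the 1 m reference: with n = 2, every 20 dB of extra loss
# corresponds to a tenfold increase in distance.
d = rssi_to_distance(-60.0)  # -> 10.0 m
```

Ranging estimates like this from several access points can then be combined (e.g., by trilateration) for coarse indoor positioning, which is the use of RSSI described in the next paragraph.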
Factors affecting RSSI include the distance from the transmitter and physical obstructions. RSSI plays a vital role in various WiFi sensing applications. One of the primary applications of RSSI in WiFi sensing is indoor positioning. By measuring the RSSI values from multiple access points, a device can estimate its location based on signal strength triangulation. The stronger the signal from a particular access point, the closer the device is likely to be to that access point. Beyond positioning, RSSI also aids in detecting movement, gauging link quality, assisting in network management, and offering insights into human-device interactions. While RSSI offers a singular measure of overall signal strength, indicative of the total received power from a transmitter, there are some cases, such as motion detection, through-wall sensing, and gait recognition, where finer granularity is required to capture the intricate nuances of the wireless environment. This is where Channel State Information (CSI) comes into play, providing detailed information on the channel's response across different frequencies. CSI captures the amplitude and phase of each signal path, allowing for a deeper understanding of environmental dynamics. In scenarios demanding precision and comprehensive insights into the wireless channel, CSI is the preferred choice over RSSI.

1.4.2 Channel State Information

CSI is an intricate matrix containing the properties of the wireless communication channel. It captures comprehensive information regarding the propagation path between each transmit-receive antenna pair, including the path loss, multipath fading, phase shift, and Doppler effect due to relative movement. CSI thus provides a detailed spectral view of the WiFi signal as it traverses through, and interacts with, the complex wireless medium.
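To make the multipath nature of CSI concrete, the following numerical sketch builds a two-path channel frequency response H(t, f) = sum_l a_l(t) exp(-j 2 pi f tau_l(t)) over 64 subcarriers, applies random initial and linear phase distortions, and checks that the channel power response |H(t, f)|^2 is insensitive to those distortions (noise is omitted here for clarity). All amplitudes, delays, and frequencies are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
freqs = np.linspace(5.18e9, 5.20e9, 64)   # 64 subcarriers around 5 GHz
amps = np.array([1.0, 0.4])               # a_l: per-path complex amplitudes
delays = np.array([30e-9, 55e-9])         # tau_l: per-path delays (s)

# Multipath channel frequency response H(t, f) for one time instant
H = (amps[:, None]
     * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])).sum(axis=0)

# Random initial and linear phase distortions (alpha, beta) as in a real
# CFR measurement; the additive noise term is omitted for clarity.
alpha, beta = rng.uniform(0, 2 * np.pi), 1e-11
H_tilde = np.exp(-1j * (alpha + beta * freqs)) * H

# Channel power response: the unknown phase offsets cancel when taking
# the magnitude squared, so G depends only on the channel itself.
G = np.abs(H_tilde) ** 2
assert np.allclose(G, np.abs(H) ** 2)
```

This phase-cancellation property is exactly why power-based quantities derived from the measured CFR are attractive for sensing on commodity hardware, where the raw phase is corrupted by synchronization offsets.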
The CSI is often calculated as the frequency response of the channel, which can be modeled as

H(t, f) = \sum_{l=1}^{L} a_l(t) \exp(-j 2\pi f \tau_l(t)),

where a_l(t) and \tau_l(t) denote the complex amplitude and propagation delay of the l-th multipath component (MPC), respectively, and L stands for the number of MPCs. Due to the timing and frequency synchronization offsets and additive thermal noise, the real measurement of the CFR, \tilde{H}(t, f), is expressed as

\tilde{H}(t, f) = \exp(-j(\alpha(t) + \beta(t) f)) H(t, f) + n(t, f),

where \alpha(t) and \beta(t) are the random initial and linear phase distortions at time t, respectively. Define the channel power response G(t, f) as the squared magnitude of \tilde{H}(t, f):

G(t, f) \triangleq |\tilde{H}(t, f)|^2 = |H(t, f)|^2 + 2\,\mathrm{Re}\{n^*(t, f) H(t, f) \exp(-j(\alpha(t) + \beta(t) f))\} + |n(t, f)|^2 \triangleq |H(t, f)|^2 + \varepsilon(t, f), (1.1)

where the superscript * denotes the complex conjugate operator, \mathrm{Re} denotes the real part of a complex number, and n(t, f) is the noise term, which can be approximated as additive white Gaussian noise (AWGN) with variance \sigma_n^2 and is statistically independent of H(t, f) [57].

Chapter 2: Automatic Floor Plan Construction

Location-Based Services (LBS), such as navigation, location-based searching, and social network services, are gaining increasing popularity thanks to the ever-growing presence of smart devices with built-in location systems and maps. However, most of these services are only available in outdoor environments. One of the most critical constraints on the development of indoor LBS is the lack of digital indoor maps [58]. Recent advances in RF-based tracking enable an opportunity for a breakthrough [59–62]. Notably, a recent work, RIM [63], an RF-based Inertial Measurement system, has enabled centimeter-level distance tracking on commodity WiFi in both Line-of-Sight (LOS) and Non-LOS (NLOS) conditions with only a single Access Point.
RIM allows us to collect high-accuracy trajectories with commodity WiFi clients (e.g., home robots) and further construct precise floor plans from the gathered trajectories. There are multiple advantages if we could leverage this opportunity for floor plan construction. First, it only uses commodity WiFi, notably without expensive hardware like lidar or privacy-intrusive modules like cameras. Second, it promises high accuracy over broad coverage with only one AP, including both LOS and NLOS areas throughout the space covered by the WiFi signals [64]. Third, different from previous works that assume many mobile users contributing large amounts of data, we could achieve high-quality construction with a small amount of data efficiently collected by a single home robot. However, leveraging RF-based tracking like RIM for high-precision floor plan construction entails great challenges. First, RIM only provides distance estimation (using a linear array available on most devices). To recover a trajectory, we still need the direction information, for which we can only utilize the inaccurate inertial sensors. Second, despite its high moving-distance estimation accuracy, RF-based tracking lacks global reference information, a necessity for connecting different trajectories and constructing a floor plan. And it becomes very difficult to obtain reliable global reference information when there is only a single AP in the environment [65]. Third, given the imperfect information, it is non-trivial to fuse trajectories collected by a robot to recover a floor plan. In this chapter, we overcome the above challenges and propose EZMap, a novel system to boost the automatic construction of indoor floor plans. EZMap employs a commodity home robot equipped with WiFi and inertial sensors to roam around the environment and collect data.
It requires only a single AP and can map not only public spaces such as malls and office buildings but also home environments, which are barely explored by existing works. The reconstructed map represents the hallway structure, room layout, as well as their sizes (i.e., corridor widths and room sizes). This chapter is organized as follows. Section 2.1 elaborates on the system design and detailed solutions. Section 2.2.3 shows the implementation, experiments, and evaluation results. In Section 2.3, we conclude this chapter.

[Figure 2.1: System overview. Pipeline: acquisition, trajectory segmentation, segment matching (geometric constraints and ambient properties), trajectory bundling, and trajectory fusing and shaping, yielding the constructed floor plan.]

2.1 System Design

2.1.1 System Overview

In this part, we present an overview of the EZMap system, as Fig. 2.1 shows. The traces can be collected by humans or robots with one AP deployed in the environment. The traces contain the distance information obtained by RIM and direction information derived from inertial sensors. The RSSI of the AP and the magnetic field strength (MFS) are also recorded. Then, the collected trajectories, which could be of arbitrary lengths, are divided into short segments, named atomic segments, to overcome the potential errors accumulated in orientation while leveraging the accurate distance estimation (Trajectory Segmentation). The atomic segments are then clustered by their intrinsic geometric constraints as well as the accompanying time series of RSSI and MFS (Segment Matching). The clustered segments are then positioned by bundling the long trajectories and thereby inferring their relative positions (Trajectory Bundling). Finally, all trajectories are fused, taking their original shapes into account, to output the reconstructed map with corridor widths and room sizes (Trajectory Fusion and Shaping).
[Figure 2.2: Crowdsourcing trajectory collection devices, each carrying the tracking device: (a) cart; (b) Dji robot.]

2.1.2 Trajectory Acquisition

The crowdsourced trajectory tracking is based on RIM [63] for distance estimation and inertial sensors for orientation estimation. RIM is good at estimating the moving distance of wheeled platforms and robots [40]. Leveraging multipath profiles as virtual antennas with a super-resolution virtual antenna alignment algorithm, RIM achieves centimeter accuracy in moving distance over a multipath-rich area. RIM needs only a single AP and captures Channel State Information (CSI) from data packets for its moving distance calculation. EZMap relies on RIM's algorithm to estimate the moving distance with high accuracy while using inertial sensors to measure the turning angles and heading information, which together shape the geometric properties. The device setup can be found in Fig. 2.2, where we can use a pushing cart (Fig. 2.2a) or a Dji robot (Fig. 2.2b). In both cases, the tracking device is customized hardware with a commodity WiFi chipset and an IMU on it. We use customized hardware because CSI extraction is not yet ready on the Dji robot. EZMap itself is a software solution making no changes to hardware.

2.1.3 Trajectory Segmentation

The acquired trajectories come in various lengths, from meters to tens of meters or even longer. EZMap prefers long trajectories in general, as they likely cover and connect wider areas and thus contain more information to be uniquely identified. However, long trajectories suffer from large accumulative errors, particularly in orientation. To overcome this issue, we propose to decompose all collected trajectories into short pieces in a novel form of atomic segments that better preserve the accurate geometric shape information. We divide each long trajectory into structures that consist of a straight segment with two turns at the two ends, as shown in Fig. 2.3(a).
We name such a structure an atomic segment and use it as the basic unit of trajectories in EZMap. Each atomic segment contains the distance information of the path section and the angle information of the two turns at its ends. Such atomization can be accomplished by simple turn detection based on inertial sensor readings. In EZMap, we calculate the change rate of the angle (the first-order derivative of angle with respect to time) from accelerometer and gyroscope data to detect turns. The procedure of trajectory segmentation is shown in Fig. 2.3(b). The design of the atomic segment is inspired by the following insights. First, the atomic segments disconnect the original trajectories at each turn, because orientation errors easily accumulate during a turn (as seen in the example in Fig. 2.3(c)). As a result, atomization preserves the accurate distance information while resetting significant orientation errors before they can accumulate over multiple turns. Second, the two turns are kept as parts of each atomic segment because they provide more information to cluster the segments that have similar lengths.

[Figure 2.3: Atomic segments. (a) Example atomic segments: a path section bounded by a 1st and a 2nd turn. (b) Procedure of trajectory segmentation: long trajectory, turn detection, segmentation, atomic segments. (c) Direction estimation errors accumulated at turns.]

Taking the segments in Fig. 2.3(a) as an example, two atomic segments, ending with a left turn and a right turn respectively, can
be separated even if they have the same length. Third, considering real-world building layouts, such atomic segments can be frequently obtained. And as will be demonstrated later, the global location information (i.e., the RSSI and MFS time series) associated with such segments suffices to uniquely cluster and reconnect them, by jointly considering the precise geometric shape of the segments. Note that some trajectories will not be segmented by the atomization. For example, if the target circles around and moves along a curved pathway, the resultant trajectory would be continuously "turning" and will never be cut. These curved segments are still useful, as they possibly provide information about some open spaces and/or rooms, and will be handled separately in area shaping, as in Section 2.1.6.2.

2.1.4 Segment Matching

To generate an accurate floor plan with tracking traces, we need to accurately estimate the relative position of each trajectory. However, this is very challenging because we have neither a global reference nor the start point of each trajectory. As discussed in the previous section, we divide the long trajectories into atomic segments and resort to determining the relative position of each atomic segment. To accurately predict the relative position of each segment, we take two steps. We first cluster the atomic segments belonging to the same location. Then the atomic segments in the same cluster are positioned and bundled by inferring their relative positions from the long trajectories. In this section, we focus on the first step, that is, identifying the segments at the same location and distinguishing those at different locations. We present a trajectory matching algorithm that recognizes the segments belonging to the same location and distinguishes those belonging to different locations by their geometric shapes and the time series of RSSI and MFS. The matching process is shown in Fig. 2.4.
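Looking back at the atomization step in Section 2.1.3, turn-detection-based segmentation can be sketched as follows. The heading-rate threshold and the convention of cutting trajectories exactly at turn samples are illustrative simplifications (EZMap additionally keeps the two turns as part of each atomic segment), not the system's actual parameters.

```python
import numpy as np

def segment_by_turns(headings_deg, rate_thresh_deg=20.0):
    """Cut a trajectory into straight sections wherever the heading's
    per-sample rate of change exceeds rate_thresh_deg (a turn).
    Returns (start, end) index pairs of the straight sections."""
    rate = np.abs(np.diff(headings_deg))   # heading change per sample
    turning = rate > rate_thresh_deg
    segments, start = [], 0
    for i, is_turn in enumerate(turning):
        if is_turn:
            if i > start:                  # close the section before the turn
                segments.append((start, i))
            start = i + 1                  # next section begins after the turn
    if start < len(headings_deg) - 1:      # trailing straight section
        segments.append((start, len(headings_deg) - 1))
    return segments

# A straight run, a 90-degree turn over one sample, then another straight run
headings = np.array([0, 0, 0, 90, 90, 90, 90], dtype=float)
print(segment_by_turns(headings))  # -> [(0, 2), (3, 6)]
```

Cutting at turns is precisely what bounds the accumulated orientation error within each atomic segment while leaving the accurate distance estimates intact.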
[Figure 2.4: An example of the matching process, combining ambient properties and geometric constraints.]

2.1.4.1 Geometric Constraints

With the unique structure that we designed for segments, each atomic segment has a geometric shape that contains the angle information of the two turns and the distance information of the path section in between. Atomic segments related to the same path show similar geometric shapes. The geometric constraints are designed to group segments with similar shapes based on this geometric information. Examples of atomic segments belonging to the same path are shown in Fig. 2.3(a). The geometric constraints have two parts: a distance constraint and an angle constraint. For the distance constraint, based on experimental observations and the distance tracking accuracy of the tracking client, we limit the distance difference between segments on the same path to no more than 3 m. In other words, if the difference between the estimated distances of two path sections is greater than 3 m, the two segments are unlikely to be at the same location. For the angle constraint, allowing for certain angle errors, the angle difference between the same turn is limited to no more than 30 degrees. We choose 30 degrees because it tolerates the orientation estimation error of the IMU without matching the trajectories of different turns into one category. If the constraint were too strict, segments of the same path might not be matched together due to the orientation deviations caused by the IMU; if it were too loose, segments with different turning angles would be mistakenly matched. Note that atomic segments from the same location but with different turning directions (left vs. right) would not be clustered together by this method.
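As a sketch, the two constraints (3 m distance difference, 30-degree angle difference) can be checked as follows; the dictionary representation of a segment is a hypothetical simplification for illustration.

```python
def geometry_compatible(seg_a, seg_b, d_max=3.0, a_max=30.0):
    """Check the distance and angle constraints between two atomic segments.

    Each segment is a dict with 'length' (path-section length in metres)
    and 'turns' (signed angles of the first and second turns in degrees).
    d_max = 3 m and a_max = 30 degrees follow the thresholds in the text.
    """
    # Distance constraint: path-section lengths must be close.
    if abs(seg_a['length'] - seg_b['length']) > d_max:
        return False
    # Angle constraint: both turns must agree within a_max degrees.
    return all(abs(ta - tb) <= a_max
               for ta, tb in zip(seg_a['turns'], seg_b['turns']))
```

With signed angles, a left turn and a right turn differ by far more than 30 degrees, so segments with opposite turning directions fail the check, as the text describes.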
However, this does not affect the final reconstructed map, because such segments are recognized and merged together by the trajectory fusion described later in Section 2.1.6.1. To verify the effectiveness of the geometric constraints, we conduct a comparative experiment in which over 200 atomic segments are used to compare the matching accuracy with and without geometric constraints. The experimental results validate that the geometric constraints effectively improve the accuracy of segment matching, as will be shown in Section 2.2.3.1.

2.1.4.2 Ambient Properties

The geometric constraints alone are insufficient to support segment matching with high accuracy, since they cannot separate segments that come from different paths but have similar geometric shapes. Therefore, a robust global reference is crucial to the matching accuracy and the construction performance. In EZMap, we leverage the time series of MFS and RSSI along each segment, denoted as ambient properties, as global reference information, and propose a robust algorithm to match them.

[Figure 2.5: Time series of MFS and RSSI on two paths. (a) Time series of MFS. (b) Time series of RSSI.]

As described in CrowdMap [8], the magnetic field anomalies caused by ferromagnetic construction materials (e.g., reinforcing steel bars) along an indoor path are generally stationary and unique over time. The time series of MFS are collected when a user or a robot walks along a specific indoor path and are used as the representative feature of the corresponding path. As shown in Fig.
2.5(a), the time series of MFS along the same indoor path are similar, while those along different indoor paths differ considerably. Similar characteristics also appear in the time series of RSSI values measured from a single AP, as shown in Fig. 2.5(b). Although RSSI values at a single location suffer from very limited spatial resolution, a series of RSSI values along a path provides more distinctive information. A cluster can be generated when the similarity value of two atomic segments, computed from the time series of MFS and RSSI, is below a threshold. For each atomic segment, a similarity vector can be calculated, measuring the similarity between the segment itself and each existing cluster of atomic segments. For an atomic segment $j$, we apply dynamic time warping (DTW) [66] to calculate its RSSI similarity vector $D_{j,R}$ and MFS similarity vector $D_{j,M}$, respectively. $D_{j,R}$ and $D_{j,M}$ are normalized with min-max normalization and summed together to obtain the similarity vector as follows:
$$D_j = D_{j,R} + D_{j,M} = [d_{j,1}, d_{j,2}, \ldots, d_{j,i}, \ldots, d_{j,N}], \qquad (2.1)$$
where $d_{j,i}$ denotes the similarity value between atomic segment $j$ and the $i$-th cluster, and $N$ denotes the total number of clusters. If $d_{j,i}$ is below a pre-defined threshold (e.g., 0.2), the atomic segment is assigned to the $i$-th cluster and can be matched with the other segments in that cluster. If an atomic segment cannot be matched with any existing cluster, we generate a new cluster for it. Based on our experimental observations, two segments with a similarity value below 0.2 can be considered matched. Since we normalize the RSSI and MFS vectors during fusion, the threshold generalizes to various environments. In some corner cases, the threshold may not work well, for example, when: 1) the similarity value of two matched segments (i.e., two segments from the same location) is greater than 0.2; or 2) there are multiple similarity values lower than 0.2 but only one segment should be matched.
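A minimal sketch of the cluster-assignment rule of Eq. (2.1) follows. The DTW recurrence is the textbook formulation, and applying min-max normalization across the similarity vector before fusion is one plausible reading of the text; the real system's normalization and cluster representatives may differ.

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic-time-warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def assign_cluster(seg_rssi, seg_mfs, clusters, thresh=0.2):
    """Assign a segment to an existing cluster or signal a new one.

    `clusters` is a list of (rssi_series, mfs_series) cluster
    representatives. Returns the matched cluster index, or None if the
    caller should open a new cluster (also when no cluster exists yet).
    """
    if not clusters:
        return None
    d_r = np.array([dtw(seg_rssi, r) for r, _ in clusters])
    d_m = np.array([dtw(seg_mfs, m) for _, m in clusters])

    def minmax(v):                      # min-max normalization of a vector
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

    d = minmax(d_r) + minmax(d_m)       # fused similarity vector (Eq. 2.1)
    i = int(np.argmin(d))
    return i if d[i] < thresh else None
```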
In the first case, although the segments are mistakenly treated as belonging to two clusters, we are able to merge them through the trajectory fusion in Section 2.1.6.1. In the second case, we match the target segment with the most similar cluster, i.e., the one with the smallest similarity value. The threshold is verified to be valid in our experiments in three scenarios, as will be shown in Section 2.2. Note that for the matched segments within the same cluster, we not only know that they come from the same location but also have the alignment information between each pair of segments, e.g., how each turn of one atomic segment corresponds to the turns of the other matched segments.

2.1.5 Trajectory Bundling

In the previous section, we identified the segments belonging to the same location. To reconstruct an accurate floor plan with these segments, we also need to identify their relative positions. In this section, a trajectory bundling algorithm is designed to determine the positions of atomic segments. The long trajectories are bundled by leveraging their matched atomic segments. Every two long trajectories that have matched segments are bundled in two steps: 1) the first turning points of the two matched segments are stitched together; 2) the long trajectories are rotated so that the directions of the path sections of the two matched atomic segments are consistent. Every long trajectory is bundled to the others by aligning their matched segments. After the long trajectories are bundled, we can determine the coordinates of the atomic segments by utilizing their spatial relationships on their original long trajectories. An example of trajectory bundling is shown in Fig. 2.6, where two trajectories are bundled by aligning their matched segments. However, as we treat each long trajectory as a rigid body during bundling, long trajectories containing large accumulated direction errors, as shown in Fig.
2.3(c), can reduce the accuracy of the generated map. In the next section, we adjust the positions of the atomic segments on each long trajectory to limit the impact of the accumulated errors on the generated map.

[Figure 2.6: Bundling process: matched segments, 1st turn aligned, matched segments aligned.]

2.1.6 Trajectory Fusion and Area Shaping

2.1.6.1 Trajectory Fusion

Although the rough position of an atomic segment can be inferred by bundling long trajectories, atomic segments belonging to the same path still have severe coordinate deviations, as shown in Fig. 2.7, which is undesirable. Hence, we propose a trajectory fusion algorithm to generate a more accurate and well-shaped map. Leveraging the characteristics of matched segments, we fuse matched atomic segments in two steps. First, for matched atomic segments on the same path, the inner-cluster constraints, including the angles and positions of their turns, are utilized to update their positions. We calculate the median coordinate of their endpoints at the same turn and update the coordinate of each endpoint by
$$\text{new location} = (1 - b) \cdot \text{cluster median location} + b \cdot \text{original location}, \qquad (2.2)$$
where $b \in (0, 1)$ is a parameter balancing the cluster median location and the original location.

[Figure 2.7: Trajectory fusion process: four bundled trajectories are fused together via location update and Helmert transformation.]

Then, the intra-cluster global information can be leveraged to merge turns that are mistakenly separated, which happens when atomic segments on the same path are classified into different clusters, as discussed at the end of Section 2.1.4.2. DTW can be applied again to merge the mistakenly separated turns. After the trajectories are bundled, we search for turns within a small area.
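The endpoint update of Eq. (2.2) above can be sketched as follows; the value of $b$ is an illustrative choice, since the text only constrains it to (0, 1).

```python
import numpy as np

def update_endpoint(original, cluster_median, b=0.5):
    """Blend an endpoint toward its cluster median, following Eq. (2.2).

    b in (0, 1) balances the cluster median against the original
    coordinate; b = 0.5 is an illustrative default, not from the text.
    """
    original = np.asarray(original, dtype=float)
    cluster_median = np.asarray(cluster_median, dtype=float)
    return (1.0 - b) * cluster_median + b * original
```

A small $b$ pulls matched endpoints tightly onto the cluster median; a large $b$ preserves each trajectory's original geometry.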
Unlike the earlier matching step, where DTW was applied to all pairs of atomic segments, here we apply DTW only to turns that are close to each other. If the DTW distances of the MFS and RSSI of two turns are below a threshold (e.g., 0.05 according to our experiments), the two turns are considered to be at the same location and merged together. Thus, mistakenly separated segments belonging to the same path are correctly merged. Finally, with the new coordinates of its two endpoints, each atomic segment is transformed to its new position by the Helmert transformation [67], a similarity transformation frequently used in geodesy to convert between datums. A two-dimensional Helmert transformation is performed here. Given the original and updated coordinates of the two turning points, the Helmert transformation calculates the updated coordinates of every point on the segment. The transformed segments preserve the shape and 2D information of the original atomic segments. Fig. 2.7 shows an example of the trajectory fusion process, where the Helmert transformation is performed on four trajectories.

2.1.6.2 Area Shaping

When segmenting the long trajectories, some atomic segments have no straight path section in between; instead, they contain curved sections with continuous "turnings". While the straight segments, as discussed before, reflect the major layout of the building, the curved segments also contain useful information about rooms, curved corridors, and open spaces. Therefore, we process them separately to add more details to the recovered floor plan. The position of a curved segment is deduced from its spatial relationship with straight segments. We first find the straight segments that directly connect to the curved segment. Then, the positions of the straight segments are utilized to estimate the position of the curved segment, and the Helmert transformation is performed on the curved segment.
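A two-dimensional Helmert (similarity) transform fixed by two control points, as used above, can be sketched with complex arithmetic. This is the standard two-point formulation, not the dissertation's implementation.

```python
import numpy as np

def helmert_2d(src_pair, dst_pair, points):
    """2-D Helmert (similarity) transform determined by two control points.

    src_pair : original coordinates of the two turning points, [(x, y), (x, y)].
    dst_pair : their updated coordinates after trajectory fusion.
    points   : (N, 2) array of the segment's points to transform.
    Encoding each point (x, y) as the complex number x + iy, a similarity
    transform is z' = a*z + b, where a carries rotation and uniform scale
    and b the translation; shape is therefore preserved.
    """
    z = np.asarray(src_pair, float).view(complex).ravel()
    w = np.asarray(dst_pair, float).view(complex).ravel()
    a = (w[1] - w[0]) / (z[1] - z[0])   # rotation + scale
    b = w[0] - a * z[0]                 # translation
    p = np.asarray(points, float).view(complex).ravel()
    q = a * p + b
    return np.column_stack([q.real, q.imag])
```

Because the transform has four free parameters and the two control points supply four equations, it is exactly determined by the original and updated turning points.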
This process is shown in Fig. 2.8. Without losing the shape information, the curved segments are grouped with the straight segments to reconstruct curved corridors, rooms, and open spaces. Hence, the map construction algorithm can not only generate straight corridors but also estimate the locations and areas of rooms, making the floor plan more detailed and comprehensive. Finally, to provide a well-shaped floor plan, we use the alpha-shape method [68] to extract the contour of the trajectories, which creates a bounding area that envelops the set of points on all trajectories. Specifically, it computes a Delaunay triangulation of the points and then deletes some triangle borders according to the parameter α to produce a better-fitting hull.

[Figure 2.8: Curved trajectory positioning process, grouping a curved atomic segment with straight atomic segments.]

2.2 Experiment

2.2.1 Experimental Setup and Data Collection

The proposed algorithm is evaluated in three scenarios: Scenario I is an office with an area of 22.0 m × 36.5 m, which consists of corridors, rooms, doors, and elevators, as shown in Fig. 2.9(a); Scenario II is the second floor of a townhouse with an area of 5.8 m × 12.2 m, consisting of a kitchen and a living room, as shown in Fig. 2.9(b); and Scenario III is a dining court of a campus building with an area of 49.0 m × 12.5 m, which consists of a dining hall and the hallways around it, as shown in Fig. 2.9(c). The goal is to reconstruct a detailed floor plan using the proposed system. In Scenario I, the tracking device is installed on a cart pushed by a human, as shown in Fig. 2.2(a). Traces are collected on 12 different days over 5 months, with a total of 64 crowdsourced trajectories. Since different turning directions are distinguished, there are 18 clusters in this data set. In Scenarios II and III, the tracking device is mounted on a robot, a DJI RoboMaster S1, as shown in Fig.
2.2(b), which collects 49 trajectories with 8 clusters over 16 different days in 5 months for Scenario II, and 54 trajectories with 19 clusters over 7 different days in 2 months for Scenario III.

[Figure 2.9: Ground truth floor plans of (a) the office (22.0 m × 36.5 m), (b) the home (5.8 m × 12.2 m), and (c) the campus court (49.0 m × 12.5 m), with the AP location marked in each.]

We collect data over a long period to validate the robustness of EZMap against environmental dynamics, while only a small number of trajectories is needed to demonstrate the effectiveness of the proposed system. EZMap employs a commodity home robot to collect data because it is labor-saving and efficient; moreover, home robots are widely available nowadays. However, EZMap is not limited to high-precision robotic tracking based on RIM: our map construction algorithm is also applicable to high-precision pedestrian tracking, such as WiBall [69].

2.2.2 Reconstructed Floor Plans

The results for the three scenarios are presented in Fig. 2.10, Fig. 2.11, and Fig. 2.12, respectively. For all three scenarios, the first subfigure (i.e., Fig. 2.10(a), Fig. 2.11(a), and Fig. 2.12(a)) shows the bundled trajectories as colored lines. The reconstructed floor plan of Scenario I is shown in Fig. 2.10(b), where we can see that the accessible areas of rooms and open spaces are well reconstructed from the curved trajectories. The reconstructed hallway plan of Scenario I is shown in Fig. 2.10(c), while the ground-truth hallway plan extracted by alpha-shape is shown in Fig. 2.10(d).

[Figure 2.10: Construction result of Scenario I. (a) Trajectory bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan.]

[Figure 2.11: Construction result of Scenario II. (a) Trajectory bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan.]
In Scenario II, the reconstructed floor plan is shown in Fig. 2.11(c), where the open space in the kitchen is well reconstructed by our system. Without the anchor points required by most related works, our system demonstrates the capability to reconstruct floor plans not only for large public areas but also for small private environments such as homes. Fig. 2.12(c) shows the well-reconstructed floor plan of Scenario III. These results show that our system can accurately reconstruct the floor plan for a variety of areas with only a single AP.

[Figure 2.12: Construction result of Scenario III. (a) Trajectory bundling result. (b) Reconstructed floor plan. (c) Alpha-shape of reconstructed floor plan. (d) Alpha-shape of ground-truth floor plan.]

2.2.3 Performance Evaluation

We evaluate the matching performance, room size accuracy, hallway shape accuracy, and construction efficiency of EZMap. For hallway shape and room size, we compare our system with SenseWit [6] and CrowdInside [7] using the evaluation results reported in [6]. We do not implement the two systems for comparison because their prerequisites make them inapplicable to our settings.

2.2.3.1 Matching Performance

The matching performance is evaluated by the purity measure and the Normalized Mutual Information (NMI) score. The purity score $s_c$ is defined as
$$s_c = \frac{1}{M} \sum_{i=1}^{c} C_i, \qquad (2.3)$$
where $C_i$ is the number of atomic segments in the largest class inside the $i$-th cluster, $c$ is the number of clusters, and $M$ is the total number of atomic segments. The NMI is defined as
$$\mathrm{NMI}(Y, C) = \frac{2 \times I(Y; C)}{H(Y) + H(C)}, \qquad (2.4)$$
where $Y$ denotes the class labels, $C$ denotes the cluster labels, $H(\cdot)$ is the entropy, and $I(Y; C)$ is the mutual information between $Y$ and $C$. In this experiment, we collect 206 atomic segments belonging to 18 true clusters, which are clustered into 20 clusters, as our algorithm incorrectly separates one of the clusters into three.
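As a sketch, the purity and NMI scores of Eqs. (2.3)-(2.4) can be computed directly from class and cluster labels (natural logarithms cancel in the NMI ratio):

```python
import math
from collections import Counter

def purity(classes, clusters):
    """Purity, Eq. (2.3): fraction of items in the majority class
    of their assigned cluster."""
    by_cluster = {}
    for y, c in zip(classes, clusters):
        by_cluster.setdefault(c, []).append(y)
    return sum(max(Counter(ys).values())
               for ys in by_cluster.values()) / len(classes)

def nmi(classes, clusters):
    """NMI, Eq. (2.4): 2 * I(Y;C) / (H(Y) + H(C))."""
    n = len(classes)

    def entropy(labels):
        return -sum(k / n * math.log(k / n) for k in Counter(labels).values())

    joint = Counter(zip(classes, clusters))
    py, pc = Counter(classes), Counter(clusters)
    i_yc = sum(nyc / n * math.log(n * nyc / (py[y] * pc[c]))
               for (y, c), nyc in joint.items())
    return 2 * i_yc / (entropy(classes) + entropy(clusters))
```

A perfect clustering yields purity and NMI of 1, while assigning everything to one cluster drives the NMI to 0.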
All the atomic segments belong to the largest class inside their clusters, except for one, resulting in a purity score of 99.51% and an NMI score of 95.05%. To show that the geometric constraint is highly beneficial to atomic segment matching, a comparison experiment is conducted in which the matching performance is calculated without the geometric constraint. The purity score drops to 87.38%, and the NMI score becomes 81.80%, confirming that the geometric constraint improves the matching performance significantly. On the other hand, it also demonstrates, perhaps surprisingly, that a reasonable segment matching accuracy of over 80% can be achieved using only magnetism and the RSSI of a single AP.

2.2.3.2 Room Size

The room size is evaluated by the error between the reconstructed room size and the ground-truth area, calculated as
$$\text{Error} = \frac{|\text{reconstructed room size} - \text{ground-truth size}|}{\text{ground-truth size}}. \qquad (2.5)$$
We compare the error of our system with SenseWit and CrowdInside using the data collected in the office. The average error of SenseWit is 31.4%, and that of CrowdInside is 40.6%. Our system has an average error of 36.1%, which is lower than CrowdInside and higher than SenseWit. Without requiring the many anchor points needed by SenseWit, the accuracy of our system is still comparable. The estimation error is mainly due to obstacles such as tables, drawers, and chairs in the office, which prevent the cart or robot from traversing the entire room.

2.2.3.3 Hallway Shape

We evaluate the hallway shape with the same metrics as SenseWit [6]:
$$P = \frac{|S_{\mathrm{gen}} \cap S_{\mathrm{true}}|}{|S_{\mathrm{gen}}|}, \quad R = \frac{|S_{\mathrm{gen}} \cap S_{\mathrm{true}}|}{|S_{\mathrm{true}}|}, \quad F = \frac{2 \cdot P \cdot R}{P + R}, \qquad (2.6)$$
where $P$ is the precision of the hallway shape, $R$ is the recall, and $F$ is the harmonic mean of precision and recall. $P$ is the overlapped area divided by the reconstructed hallway area, and $R$ is the overlapped area divided by the ground-truth hallway area.
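The shape metrics of Eq. (2.6) can be evaluated on rasterized occupancy grids; representing the continuous areas as boolean masks is an illustrative simplification, not the dissertation's procedure.

```python
import numpy as np

def shape_prf(gen_mask, true_mask):
    """Precision/recall/F-score of a reconstructed shape, Eq. (2.6).

    Both inputs are boolean occupancy grids of the same size: True cells
    belong to the (generated or ground-truth) hallway area.
    """
    inter = np.logical_and(gen_mask, true_mask).sum()  # |S_gen ∩ S_true|
    p = inter / gen_mask.sum()                         # precision
    r = inter / true_mask.sum()                        # recall
    f = 2 * p * r / (p + r)                            # harmonic mean
    return p, r, f
```

An over-wide reconstruction lowers precision while keeping recall high; an over-narrow one (as after aggressive multi-trajectory fusion) does the opposite.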
Table 2.1: Evaluation results of hallway shape

        EZMap      SenseWit    CrowdInside
  P     78.18%     75.3%       59.5%
  R     75.10%     82.4%       47.1%
  F     76.61%     78.69%      52.0%

The evaluation results are shown in Table 2.1. The origin point and orientation of the ground-truth hallway and the generated hallway are aligned. From the table, we can see that the P, R, and F of our system are better than CrowdInside and comparable with SenseWit. The precision of our system is higher than SenseWit, but the recall is lower, because our multi-trajectory fusion makes the hallway widths slightly smaller than the ground truth. Note that SenseWit needs various anchors to achieve performance comparable to the proposed system, showing that the proposed system can generate accurate hallway plans for various environments, especially private environments without anchor points, such as homes.

[Figure 2.13: Reconstructed hallway plans of Scenario I using (a) 24, (b) 35, (c) 46, (d) 52, and (e) 61 trajectories.]

2.2.3.4 Construction Efficiency

Unlike most crowdsourcing-based indoor map construction systems, EZMap can generate a hallway map with far fewer trajectories while achieving comparable precision. We evaluate the construction efficiency of EZMap in Scenario I. Fig. 2.13 shows the reconstructed hallway using 24, 35, 46, 52, and 61 trajectories, respectively. The corresponding hallway construction accuracies are 53.76%, 67.44%, 71.99%, 73.46%, and 76.61%. We compare the construction efficiency of EZMap with SenseWit and CrowdInside in Table 2.2. SenseWit uses over 300 trajectories to cover a campus library with a 464 m² area, and CrowdInside employs over 150 trajectories to cover an environment with a 448 m² area.
We can see that EZMap achieves accuracy comparable to SenseWit with about one fifth of the number of trajectories in an environment nearly twice as large, and EZMap is more accurate than CrowdInside while needing far fewer trajectories. The efficiency evaluation shows that EZMap can rapidly reconstruct accurate hallway maps. In addition, our design employs low-cost home robots, which also saves significant manpower during data collection.

Table 2.2: Evaluation results of construction efficiency

                            EZMap       SenseWit    CrowdInside
  Trajectory number         61          300         150
  Area                      803 m²      464 m²      448 m²
  Trajectory number/area    0.08 /m²    0.65 /m²    0.33 /m²

2.3 Summary

In this chapter, we present EZMap, a universal automatic floor plan construction system that needs no prior knowledge of the building. EZMap benefits from recent advances in centimeter-accuracy indoor tracking using RF signals: it leverages commodity WiFi to estimate accurate moving distances and employs inertial sensors for orientation reckoning. EZMap then processes crowdsourced trajectories with a novel pipeline of trajectory segmentation, matching, bundling, and shaping, which ultimately reconstructs a floor plan containing not only skeletal layouts but also detailed area sizes of straight and curved corridors, open spaces, and rooms. Requiring minimal infrastructure and a small amount of data, EZMap can scale to various buildings, including public malls, offices, and home environments. To the best of our knowledge, EZMap is the first RF-based system that can accurately reconstruct maps of private environments such as homes.

Chapter 3: Human and Non-human Motion Discrimination

With the proliferation of Internet of Things (IoT) devices, indoor intelligent applications such as security surveillance, intruder detection, occupancy monitoring, and activity recognition have gained significant attention [28].
However, these applications frequently suffer from an elevated rate of false alarms due to the inability to distinguish human subjects from non-human subjects, such as pets, robotic vacuum cleaners, and electrical appliances, as depicted in Fig. 3.1. This ability to differentiate is essential, especially for applications related to security, health monitoring, automation, and energy management. Misidentification can lead to user frustration, erode trust, and hamper the practical and widespread adoption of these technologies.

[Figure 3.1: An illustration of non-human motion interfering with sensing systems ("Intrusion Detected" vs. "Room Occupied").]

Pets, vacuum robots, and electrical appliances such as fans are prevalent in indoor environments, especially in homes. According to the American Pet Products Association [70], about 70% of families in the United States (about 85 million families) had pets as of 2022, and the percentage is increasing year by year. The global robotic vacuum cleaner market is expected to grow from $5.59 billion in 2021 to $7.83 billion in 2026, at a compound annual growth rate (CAGR) of 6.9%. Therefore, given the prevalence of pets, robotic vacuum cleaners, and electrical appliances, especially in residential environments [70], it is crucial to develop a reliable system that can accurately recognize human and non-human subjects. In this chapter, we introduce the first system, Wi-MoID (WiFi-based human and non-human motion identification), that can accurately identify various human and non-human subjects through walls. The system utilizes ubiquitous WiFi signals and operates unobtrusively and without contact, eliminating the need for additional instrumentation or environmental restrictions. In contrast to existing systems that require pets, robots, and humans to move along predefined paths, directions, or areas, Wi-MoID can detect freely moving subjects, allowing them to walk, turn, stand, or stay without any predefined restrictions.
Wi-MoID automatically detects the movement of the subject and extracts context-independent features, allowing it to function effectively in diverse environments without requiring additional training or parameter tuning. The structure of this chapter is organized as follows: Section 3.1 provides an overview of Wi-MoID, including the challenges, the key insight, and a system overview. Section 3.2 delves into the details of motion detection and speed estimation. Section 3.3 introduces feature extraction and motion recognition. Experimental setups and evaluations are described in Section 3.4. We discuss the impact of various factors in Section 3.5 and draw our conclusions in Section 3.6.

3.1 Overview

3.1.1 Challenges

Realizing such a flexible system is not an easy task; several major challenges need to be addressed. The first difficulty lies in deriving motion features in both Line-Of-Sight (LOS) and Non-Line-Of-Sight (NLOS) settings using only a single-link WiFi signal. The speed pattern, a key component of the motion features, presents its own unique challenge. While various WiFi-based speed estimation approaches, such as those using the Doppler Frequency Shift (DFS), have been attempted, they generally require multiple links or are only effective under LOS conditions. In this chapter, we draw inspiration from WiSpeed [57] and adopt a rich-scattering model to overcome this challenge by treating subjects as collections of scatterers. Secondly, developing a system that operates effectively in unseen contexts without requiring additional training or parameter adjustment is a substantial challenge. Most existing WiFi-based activity recognition methods rely on features extracted directly from raw CSI, which are susceptible to environmental changes and degrade in new environments without retraining. To address this issue, we leverage a second-order statistic of the CSI, the autocorrelation function (ACF), and extract only features caused by changes in the multipath profile.
Unlike raw CSI, the ACF is insensitive to static subjects and reflects only the movement of moving subjects, making it well suited for motion recognition across different environments. Third, accurately recognizing pet motion in real-world scenarios is challenging, especially when the movement path, area, or activities are not restricted. Previous works that aim to recognize pet motion often make unrealistic assumptions about the pets' movement, such as assuming a specific path or confined area, considering only walking, or requiring pets to remain quietly close to the devices for extended periods [44, 45]. In this chapter, we propose a new approach that combines physical and statistical features to classify human and non-human motion, and further enhances the classification with a Hidden Markov Model (HMM)-based state machine that leverages temporal information to boost accuracy. By incorporating temporal information, Wi-MoID can accurately and reliably distinguish pet motion from human motion, even when pets move freely and engage in different activities.

3.1.2 Insight

A moving subject can be categorized as either human or non-human, the latter encompassing a range of entities such as pets or vacuum robots. When we refer to human motion, we mean the movement of the human body, encompassing activities such as walking, running, and sneaking. In contrast, non-human motion pertains to the movement of other animate or inanimate subjects, including animals, robots, and other physical objects; we classify any motion originating from non-human entities as non-human motion. Interestingly, although pets, cleaning robots, and even electrical appliances such as fans can all generate motion that is picked up by sensing systems, their movements are distinct from those of humans.
The differences in gait patterns are a key aspect of this distinction: humans, pets, and robots each demonstrate unique movement patterns due to their bipedal, quadrupedal, and wheeled locomotion, respectively. These differences are visually represented in Fig. 3.2. As a result of these disparate movement patterns, the analysis of features such as speed patterns becomes an essential factor in differentiating the motion of these subjects.

[Figure 3.2: An illustration of the moving patterns of a human (stride cycle with stance and swing phases), a dog (stride cycle with swing and stance phases), and a cleaning robot (moving at a constant speed).]

3.1.3 System Overview

Wi-MoID, as illustrated in Fig. 3.3, primarily consists of the following three steps:

1) First, we preprocess the CSI received by the receiver. We compute the ACF of the CSI and, based on it, calculate the motion statistic to determine the presence of moving targets in the environment. Additionally, we derive the speed of the moving target from the ACF.

2) Second, we extract a series of features of the moving target. Using the ACF, the motion statistic, and the speed, we extract physically interpretable features and statistical signal features of the moving target. These features are independent of environment, location, and direction.

3) Last, using an SVM and a state machine, we output the final identification result. We first use the SVM to classify the target's features and then employ a state machine to correct potential misclassifications based on the transition characteristics between states.

[Figure 3.3: System overview: CSI acquisition and ACF calculation, motion detection and speed estimation, extraction of physically plausible and statistical features, and SVM-based classification into human, pet, or cleaning robot.]
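As an illustration of the temporal smoothing in step 3, per-window classifier probabilities can be decoded through an HMM with the Viterbi algorithm; the "sticky" transition matrix and all numeric values below are assumptions for illustration, not parameters of Wi-MoID.

```python
import numpy as np

def viterbi_smooth(frame_probs, trans, prior):
    """Smooth per-window classifier outputs with Viterbi decoding.

    frame_probs : (T, S) per-window class probabilities from the classifier.
    trans       : (S, S) state-transition matrix; a sticky diagonal
                  discourages implausibly rapid class switches.
    prior       : (S,) initial state distribution.
    Returns the most likely state sequence as a list of ints.
    """
    T, S = frame_probs.shape
    logp = np.log(frame_probs + 1e-12)
    logt = np.log(trans + 1e-12)
    delta = np.log(prior + 1e-12) + logp[0]   # best log-score per state
    back = np.zeros((T, S), int)              # backpointers
    for t in range(1, T):
        cand = delta[:, None] + logt          # (from-state, to-state)
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + logp[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a sticky transition matrix, a single outlier window (e.g., one frame where the classifier briefly favors "pet" during a human walk) is overridden by its temporal context.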
3.2 Motion Detection and Speed Estimation

This section presents our approach to detecting the motion of subjects and estimating their speeds through walls, using a single WiFi link.

3.2.1 Preliminary

To accurately detect the motion of a subject in an indoor environment, it is crucial to consider the presence of numerous multipath signals arising from rich scattering. A statistical model that accounts for all these multipath components can be employed to estimate the speed of the target [29, 36, 57].

[Figure 3.4: CSI, ACF, motion statistics, and speed estimation for (a) a human, (b) a cleaning robot, and (c) a pet.]

In indoor environments, rich scattering occurs due to both static and dynamic scatterers. Static scatterers comprise walls, floors, and stationary furniture, while dynamic scatterers include moving people or objects. The superposition principle of electromagnetic (EM) waves allows us to decompose the CSI as
$$H(t, f) = \sum_{i \in \Omega_s(t)} H_i(t, f) + \sum_{j \in \Omega_d(t)} H_j(t, f) + \varepsilon(t, f), \qquad (3.1)$$
where $\Omega_s(t)$ represents the set of static scatterers and $\Omega_d(t)$ the set of dynamic scatterers.
$H_i(t, f)$ and $H_j(t, f)$ represent the contributions of the $i$-th and $j$-th scatterers, respectively. The noise term, $\varepsilon(t, f)$, is statistically independent of $H_i(t, f)$ and $H_j(t, f)$ [57]. Each scatterer acts as a "virtual transmitter", scattering the EM waves it receives. The CSI represents the aggregate of the electric fields of all incoming EM waves. The ACF of the CSI $H(t, f)$ with a time lag $\tau$, denoted as $\rho_H(\tau, f)$, can be determined by calculating the covariance between $H(t, f)$ and $H(t + \tau, f)$ divided by the variance of $H(t, f)$ itself. Mathematically, it can be expressed as:
$$\rho_H(\tau, f) = \frac{\mathrm{Cov}[H(t, f), H(t + \tau, f)]}{\mathrm{Var}[H(t, f)]}.$$
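A minimal sketch of the empirical per-subcarrier ACF follows; feeding in the CSI magnitude (or power) as a real-valued series is an assumption for illustration, and the window length and lag range are arbitrary.

```python
import numpy as np

def csi_acf(h, max_lag):
    """Sample autocorrelation of one subcarrier's CSI response.

    h       : 1-D real array, e.g. |H(t, f)|^2 over a window at a fixed f.
    max_lag : largest lag (in samples) to evaluate.
    Returns rho[tau] for tau = 0 .. max_lag, the empirical counterpart of
    rho_H(tau, f) = Cov[H(t,f), H(t+tau,f)] / Var[H(t,f)].
    """
    h = np.asarray(h, float)
    x = h - h.mean()                         # remove the static component
    var = np.dot(x, x) / len(x)              # sample variance
    return np.array([np.dot(x[:len(x) - t], x[t:]) / len(x) / var
                     for t in range(max_lag + 1)])
```

By construction, rho[0] = 1; for a purely static scene the mean-removed series is noise-dominated and the ACF decays immediately, which is what makes ACF-based motion statistics insensitive to static subjects.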