ABSTRACT Title of dissertation: DATA-DRIVEN GEOMETRIC DESIGN SPACE EXPLORATION AND DESIGN SYNTHESIS Wei Chen, Doctor of Philosophy, 2019 Dissertation directed by: Professor Mark D. Fuge Department of Mechanical Engineering A design space is the space of all potential design candidates. While the design space can be of any kind, this work focuses on exploring geometric design spaces, where geometric parameters are used to represent designs and will largely affect a given design?s functionality or performance (e.g., airfoil, hull, and car body designs). By exploring the design space, we evaluate different design choices and look for desired solutions. However, a design space may have unnecessarily high dimensionality and implicit boundaries, which makes it difficult to explore. Also, if we synthesize new designs by randomly sampling design variables in the high-dimensional design space, there is a high chance that the designs are not feasible, as correlation exists between feasible design variables. This dissertation introduces ways of capturing a compact representation (which we call a latent space) that describes the variability of designs, so that we can synthesize designs and explore design options using this compact representation instead of the original high-dimensional design variables. The main research question answered by this dissertation is: how does one effectively learn this compact representation from data and efficiently explore this latent space so that we can quickly find desired design solutions? The word ?quickly? here means to eliminate or reduce the iterative ideation, prototyping, and evaluation steps in a conventional design process. This also reduces human intervention, and hence facilitates design automation. This work bridges the gap between machine learning and geometric design in engineering. It contributes new pieces of knowledge within two topics: design space exploration and design synthesis. Specifically, the main contributions are: 1. A method for measuring the intrinsic complexity of a design space based on design data manifolds. 2. Machine learning models that incorporate prior knowledge from the domain of design to improve latent space exploration and design synthesis quality. 3. New design space exploration tools that expand the design space and search for desired designs in an unbounded space. 4. Geometrical design space benchmarks with controllable complexity for testing data-driven design space exploration and design synthesis. DATA-DRIVEN GEOMETRIC DESIGN SPACE EXPLORATION AND DESIGN SYNTHESIS by Wei Chen Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2019 Advisory Committee: Professor Mark D. Fuge, Chair/Advisor Professor Shapour Azarm Professor Katrina Groth Professor David Jacobs (Dean?s Representative) Professor Linda Schmidt ?c Copyright by Wei Chen 2019 Acknowledgments Over the past four years of my PhD study, I have received tremendous help and support from people around me. Dr. Mark Fuge, who is the most brilliant, responsi- ble, and considerate advisor I have ever met, has been consistently offering guidance on not only research, but also my career path. You gave me great advice for particular problems, while also kept reminding me to step back and see the bigger picture. Also because of you, I was able to work efficiently in a nice and friendly environment in the past four years. I would also like to thank my committee members, Shapour Azarm, Katrina Groth, David Jacobs, and Linda Schmidt, who have provided insightful com- ments for my research. In addition, I would like to acknowledge my colleagues from the IDEAL lab at UMD. I have learned a lot from our collaborations and discussions. Also thanks to the Product Design, Modeling and Simulation Research Group in Siemens Corporate Technology. You offered me a great internship experience?an opportunity where I can integrate my research into real products. Finally, I am grateful that my family and friends are always providing me the strongest support. You have brought me happiness, helped me go through hard times, and given me the motivation to excel in my research. ii Table of Contents Acknowledgements ii 1 Introduction 1 1.1 Why use Data-Driven Methods to Create Design Representations? . . 2 1.2 Research Questions and Contributions of the Dissertation . . . . . . . 4 1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 How to Use This Dissertation . . . . . . . . . . . . . . . . . . . . . . 7 2 Background 9 2.1 Geometric Design Representation . . . . . . . . . . . . . . . . . . . . 9 2.2 The Manifold Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.1 Learning the Latent Space . . . . . . . . . . . . . . . . . . . . 11 2.2.2 Exploring the Latent Space . . . . . . . . . . . . . . . . . . . 12 2.3 Designs Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.3.1 Rule-Based Approaches . . . . . . . . . . . . . . . . . . . . . . 14 2.3.2 Assembly-Based Approaches . . . . . . . . . . . . . . . . . . . 15 2.3.3 How Data-Driven Approaches Differ from Other Synthesis Ap- proaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3 Learning the Intrinsic Complexity of Design Spaces 17 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.3 How Our Contributions Relate to Prior Work . . . . . . . . . . . . . 20 3.4 Samples and Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.4.1 Synthetic Benchmark . . . . . . . . . . . . . . . . . . . . . . . 22 3.4.2 Real-World Data . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.5 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.5.1 Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.5.2 Learning Design Space Properties . . . . . . . . . . . . . . . . 28 3.5.3 Embedding and Shape Synthesis . . . . . . . . . . . . . . . . . 33 3.5.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 iii 3.6 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.6.1 Design Space Properties . . . . . . . . . . . . . . . . . . . . . 39 3.6.2 Principal Attributes . . . . . . . . . . . . . . . . . . . . . . . 41 3.6.3 Are all regions on the manifold equally valid? . . . . . . . . . 42 3.6.4 Sample Arrangement in Latent Space . . . . . . . . . . . . . . 43 3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 4 Aerodynamic Design Synthesis, Optimization, and Shape Exploration 46 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.2.1 Optimization Methods . . . . . . . . . . . . . . . . . . . . . . 48 4.2.2 Shape Parameterization . . . . . . . . . . . . . . . . . . . . . 50 4.2.3 Design Space Dimensionality Reduction . . . . . . . . . . . . 51 4.3 Obtaining Disentangled Latent Representation Using Generative Ad- versarial Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.4 Spline-Based Shape Synthesis . . . . . . . . . . . . . . . . . . . . . . 54 4.5 Optimization over the Learned Latent Space . . . . . . . . . . . . . . 55 4.5.1 The Optimization Problem in the Latent Space . . . . . . . . 55 4.5.2 Unbounded Bayesian Optimization . . . . . . . . . . . . . . . 57 4.6 Experiment: Airfoil Synthesis and Shape Optimization . . . . . . . . 58 4.6.1 Dataset and Preprocessing . . . . . . . . . . . . . . . . . . . . 58 4.6.2 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . 59 4.6.3 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.6.4 GA Refining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 5 Synthesizing Designs with Inter-Part Dependencies 66 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.2.1 Design Space Dimensionality Reduction . . . . . . . . . . . . 69 5.2.2 Data-Driven Design Synthesis . . . . . . . . . . . . . . . . . . 70 5.2.3 Generative Adversarial Networks . . . . . . . . . . . . . . . . 72 5.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.3.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . 73 5.3.2 Model Architecture . . . . . . . . . . . . . . . . . . . . . . . . 75 5.4 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.4.2 Network and Training . . . . . . . . . . . . . . . . . . . . . . 79 5.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 80 5.5.1 Visual Inspection. . . . . . . . . . . . . . . . . . . . . . . . . . 81 5.5.2 Constraint Satisfaction. . . . . . . . . . . . . . . . . . . . . . . 85 5.5.3 Distance between Data and Generator Distributions. . . . . . 86 iv 5.5.4 Diversity of Generated Designs. . . . . . . . . . . . . . . . . . 87 5.5.5 Latent Space Consistency. . . . . . . . . . . . . . . . . . . . . 89 5.5.6 Effect of Encoding Inter-Part Dependencies. . . . . . . . . . . 91 5.6 Limitations and Future Work . . . . . . . . . . . . . . . . . . . . . . 93 5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 6 Feasible Domain Identification with Active Expansion Sampling 96 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 6.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 6.2.1 Feasible Domain Identification . . . . . . . . . . . . . . . . . . 98 6.2.2 Active Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 99 6.2.3 Gaussian Process Classification (GPC) . . . . . . . . . . . . . 101 6.3 Active Expansion Sampling (AES) . . . . . . . . . . . . . . . . . . . 103 6.3.1 -Margin Probability . . . . . . . . . . . . . . . . . . . . . . . 105 6.3.2 Exploitation and Exploration . . . . . . . . . . . . . . . . . . 106 6.4 Dynamic Local Pool Generation . . . . . . . . . . . . . . . . . . . . . 111 6.4.1 Scope of an Optimal Query . . . . . . . . . . . . . . . . . . . 112 6.4.2 Pool for the Exploration Stage . . . . . . . . . . . . . . . . . . 116 6.4.3 Pool for the Exploitation Stage . . . . . . . . . . . . . . . . . 117 6.4.4 Choosing when to Exploit versus Explore . . . . . . . . . . . . 118 6.5 Theoretical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.5.1 Accuracy Analysis . . . . . . . . . . . . . . . . . . . . . . . . 119 6.5.2 Query Density . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.5.3 Influence of Hyperparameters . . . . . . . . . . . . . . . . . . 122 6.6 Using Design Manifolds to Synthesize Novel Designs . . . . . . . . . . 123 6.7 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 125 6.7.1 Effect of Hyperparameters . . . . . . . . . . . . . . . . . . . . 126 6.7.2 Unbounded versus Bounded . . . . . . . . . . . . . . . . . . . 132 6.7.3 Effect of Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 6.7.4 Effect of Dimensionality . . . . . . . . . . . . . . . . . . . . . 135 6.7.5 Nowacki Beam Example . . . . . . . . . . . . . . . . . . . . . 136 6.7.6 Detecting Novel Designs . . . . . . . . . . . . . . . . . . . . . 137 6.8 Additional Experimental Results . . . . . . . . . . . . . . . . . . . . 140 6.8.1 Hosaki Example . . . . . . . . . . . . . . . . . . . . . . . . . . 140 6.8.2 Results of Straddle Heuristic . . . . . . . . . . . . . . . . . . . 143 6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 7 Global Optimization with Trust Region Bayesian Optimization 147 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 7.2 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 148 7.2.1 Gaussian Process . . . . . . . . . . . . . . . . . . . . . . . . . 149 7.2.2 Acquisition Function . . . . . . . . . . . . . . . . . . . . . . . 149 v 7.2.3 Previous Work on Unbounded Bayesian Optimization . . . . . 150 7.3 Trust Region Bayesian Optimization . . . . . . . . . . . . . . . . . . 151 7.3.1 Acquisition Strategy . . . . . . . . . . . . . . . . . . . . . . . 151 7.3.2 Feasible Domain Bounds . . . . . . . . . . . . . . . . . . . . . 153 7.3.3 Adaptive Exploration-Exploitation Trade-off . . . . . . . . . . 155 7.3.4 Local Search for Better Exploitation . . . . . . . . . . . . . . 159 7.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 7.4.1 Experimental Protocol . . . . . . . . . . . . . . . . . . . . . . 160 7.4.2 Synthetic Benchmarks . . . . . . . . . . . . . . . . . . . . . . 161 7.4.3 Constrained Problems . . . . . . . . . . . . . . . . . . . . . . 164 7.4.4 MLP on MNIST . . . . . . . . . . . . . . . . . . . . . . . . . 165 7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 8 Conclusion 169 8.1 Dissertation Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 170 8.2 Broader Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 8.3 Limitations and Future Research Directions . . . . . . . . . . . . . . 173 A Publications 176 A.1 Journal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 A.2 Peer-Reviewed Conference . . . . . . . . . . . . . . . . . . . . . . . . 177 Bibliography 178 vi Chapter 1: Introduction A design space is the space of all potential design candidates defined by design variables. Through exploring the design space, we can look for desired solutions by evaluating different design alternatives before realizing them in the real world. While some of the methods described in this dissertation can be applied to any design space, the primary focus is on geometric design spaces. Geometric design in general is a branch of computational geometry and focuses on curve and surface modeling and representation. Here, by geometric design, we refer to the design of any object?s geometry. For example, it can be the design of the car body, mechanical parts, or shape of any object, as opposed to the design of materials, color, or machining processes. A geometric design can have various representations. For example, we can represent a two-dimensional design by using spline curves [105, 179] or grid-point coordinates [40, 49], while a three-dimensional design can be represented by free-form deformation (FFD) [66] or point clouds [228]. Although different representations lead to different design spaces, geometric design spaces are generally high-dimensional and complex, which complicates both human exploration and computational optimization. This dissertation addresses this by analyzing geometric design spaces and generating new representations that are more interpretable and compact. 1 1.1 Why use Data-Driven Methods to Create Design Representations? Many traditional design processes go through the cycle of ideation, prototyping, evaluating, and improving. This iterative process usually requires human interven- tion, which can be expensive and time-consuming. Eventually, only a few designs are implemented and tested, leaving limited time for exploring design alternatives. De- signers can find a design that satisfies all requirements (i.e., feasible to all constraints), but the true optimal design is hard to uncover since insufficient design alternatives are explored. Alternatively, one can represent a class of designs by a consistent parameteriza- tion, so that a design can be synthesized by setting those parameters and the process of searching for solutions can be automated by optimization algorithms. For example, people have parameterized designs using polynomials, splines, or free-form deforma- tion (FFD) to represent aerodynamic or hydrodynamic designs (e.g., airfoils and hulls) [66, 105, 122, 163, 179]. Given a parameterization, designs can be optimized in that parameter space (i.e., the design space) by using methods like the genetic algorithm [79] or topology optimization [16, 140, 147]. These types of approaches also have downsides: 1. The dimensionality of the design space is often higher than the minimal dimen- sions needed to capture design variability, which creates unnecessary computa- tional cost. For example, if we use B-spline curves to parameterize airfoils, the design space will be the coordinates of those B-spline control points. However, we cannot arbitrarily change these coordinates (e.g., because airfoils possess properties like curvature continuity, etc.). Thus feasible designs lie on a low- dimensional manifold within that large design space. The dimensionality of this manifold is the minimal dimensionality that preserves design variability. 2 We call it the intrinsic dimensionality. 2. Designs that can be represented by a certain parameterization are usually lim- ited. This can be mitigated by adding parameters (e.g., increasing the number of B-spline control points). However, it is hard to decide on the number of parameters (i.e., design space dimensionality) manually, as we need to trade-off between representation capacity and computational cost. 3. The boundaries of the design space are often unknown. It is also tricky to set these boundaries, because the design space should be large enough to include the optimal solution, while as small as possible to reduce the computational cost on exploring infeasible regions. To address some of these downsides, we can extract useful information from exist- ing designs and leverage this information to create new designs. Designers have made many designs based on certain knowledge set. Usually, new designs are created based on the same knowledge set. So why can?t we make machines learn the knowledge set from previous designs, and infer new designs based on the learned knowledge set? Techniques from machine learning have made this possible. Once trained properly, machine learning models can automatically capture the intrinsic dimensionality, vari- ability, and boundaries of designs from data, thus ameliorating the above problems. In this dissertation, I propose data-driven methods to address problems in traditional design, and facilitate the design process so that it frees the time of designers, pro- vides inspiration for them, and even allows practitioners to create products without the help of a design expert. 3 1.2 Research Questions and Contributions of the Dissertation Data-driven design, especially in the field of engineering, usually requires extra consideration compared to problems dealt with in common machine learning areas like computer vision and language processing. Specifically, the following three aspects make data-driven design difficult: 1. Unlike the bounded input space of images or sentences, the bounds for the design space are usually unknown, so one has to assume the bounds during tasks like design optimization. For example, people usually predefine the parameter ranges of given parametric models when optimizing airfoils or hulls [105, 122, 163]. As mentioned previously, these predefined design space boundaries are sometimes problematic. 2. Design synthesis differs from synthesis commonly studied in the machine learn- ing scenario (e.g., image, music, sentence, or video synthesis), as it is subject to stricter functional (e.g., the lift/drag coefficient of airfoils) or geometrical (e.g., surface smoothness) constraints. 3. There are special requirements associated with data-driven design. For example, we want to regularize the latent space such that designs change consistently along any direction in the latent space. This property is desirable for design exploration over the latent space, as it regularizes the mapping from the latent space to the design space or the performance space. In response to these difficulties, this dissertation explores the following fundamen- tal research questions and contributions needed to answer those questions: RQ 1: How does one measure the complexity of a design space? Contribution: 4 Chapter 3 represents this complexity by the intrinsic dimensionality, non-linearity, and separability of the design data manifold [40, 49]. RQ 2: How does one find a low-dimensional latent representation that captures de- sign variability? Contribution: Chapter 3 derives latent spaces by using differ- ent models and then evaluates each latent space. Chapter 5 further describes a model that hierarchically captures a regularized latent space of each component in a design [50]. RQ 3: How does one explore the design space without specifying a fixed boundary? Contribution: Chapters 6 and 7 describe an adaptive sampling method that identifies feasible regions and a Bayesian optimization method that searches for global optimal designs, respectively, without the need for predefined design space bounds [43, 44]. This also allows the discovery of novel designs outside the domain of existing designs. By building data-driven models to find out answers to these questions, we can solve the three problems in traditional design described earlier. Particularly, the proposed data-driven design methods will help in three ways: 1. Feasible domain identification. By estimating a function that predicts the fea- sibility of designs, we can identify the feasible domains of the design space with respect to implicit constraints. 2. Design space exploration. We can capture a latent space that represents the low- dimensional manifold encompassing the major variability of designs. Design space exploration can then be performed on this latent space instead of the higher-dimensional design space. 5 3. Design recommendation. We can generate new designs from the learned model that captures real-world design variability. These new designs will provide in- spiration for designers and allow practitioners to create products without the help of design experts. 1.3 Overview In this dissertation, I will use design data to solve existing problems in design, and help facilitate design space exploration and design synthesis. Specifically, each chapter addresses the following: Chapter 2: reviews previous work that helps data-driven design space exploration and synthesis. Chapter 3: shows how to measure the intrinsic complexity and dimensionality of a design space, and generate fundamental knowledge about how designs differ from one another. It deepens our understanding of design complexity in general. Chapter 4: introduces a deep generative model of aerodynamic designs (specifically airfoils) that reduces the dimensionality of the optimization problem by learning from shape variations in an airfoil database, and proposes a two-stage optimiza- tion method that prioritizes the optimization of major attributes. Chapter 5: describes a method to synthesize designs with inter-part dependencies. It decomposes the design space of the whole design into the design space of each component while keeping the inter-component dependencies satisfied, so that we can perform design synthesis and design space exploration for each component separately. 6 Chapter 6: introduces Active Expansion Sampling (AES), a method that identifies (possibly disconnected) feasible domains over an unbounded input space. It avoids the need for setting explicit design variable bounds, and can be used for discovering potentially novel and creative designs. Chapter 7: proposes a Bayesian optimization approach, Trust Region Bayesian Op- timization (TRBO), that only needs to specify an initial search space that does not necessarily include the global optimum, and expands the search space when necessary. While the primary focus of this dissertation is on geometric design spaces, some of the methods described here can be applied to other design spaces. Specifically, when demonstrating dimensionality reduction techniques in Chapters 3-5, although we only use geometrical designs as examples since they are easy to visualize, the proposed techniques are applicable to studying any design spaces that can be reduced and represented by a lower-dimensional space. The design space exploration methods described in Chapters 6 and 7 can be used in any design space. 1.4 How to Use This Dissertation Although this dissertation is organized in a way that each chapter should be read sequentially, specific audiences may benefit more from certain portions of the work. Engineering Design Practitioners looking to develop data-driven design automation applications. Chapters 4 and 5 discuss the technique for learning generative models from design databases, so that new designs can be automatically synthesized from those models. Sections 4.5-4.6 and Chapters 6-7 further introduce 7 tools that can be built on those generative models and automatically search for desired designs. Machine Learning Researchers looking for new domains of application. Despite the wide usage of machine learning in areas including social media, robotics, and medical science, its application in engineering design is currently limited. In general, the whole dissertation discusses machine learning techniques that address engineering design problems. Particularly, Chapter 2 reviews how we can use dimen- sionality reduction and Bayesian approaches to facilitate design synthesis and design space exploration. Sections 3.4.1 and 5.4.1 provide synthetic datasets that could serve as benchmarks for design space study. Finally, Section 8.3 brings up some open questions and future research directions that may be addressed by more advanced machine learning techniques. Those who are interested in design of experiments, system design, algo- rithm hyperparameter optimization, structural optimization, etc. Chap- ters 6 and 7 present design space exploration tools based on the Bayesian approach. Specifically, Chapter 6 introduces an adaptive sampling method for feasible domain identification and Chapter 7 introduces a global optimization method. Different from conventional methods, they gradually expand the search space while updating the Bayesian model, so that the operating domain is no longer limited within fixed vari- able bounds. They are most useful when the variable ranges are hard to specify. Researchers looking to reproduce the results of this dissertation. The code and data for reproducing the experimental results are open-source through the GitHub link in each chapter. 8 Chapter 2: Background This dissertation focuses on the data-driven solution of two tasks: design space exploration and design synthesis. Both tasks have been dealt with in areas of design, machine learning, and computer graphics. This chapter reviews some fundamental concepts and the existing state of the art in data-driven design space exploration and synthesis. 2.1 Geometric Design Representation To create a geometric design space, one has to define a consistent representation that can express a set of designs. Note that in most cases the representation has to be consistent (e.g., having the same dimensionality1 and one-to-one mapping from a representation to a design) to form a design space, which allows machine learning or optimization algorithms to operate on. Using discrete geometry is a straight-forward way to represent designs. Specifi- cally, sequences of point coordinates is used for representing 2D shapes that consist of curves or pen strokes [40, 49, 91]. For 3D models, we use representations such as point clouds [71, 99, 228], depth maps [203], geometry images [196, 197], voxel grids [24, 137, 199, 231], and octrees [176, 212, 226]. These discrete representations 1Note that sometimes a design can also be represented as a sequence of variable length (i.e., variable dimensionality) [91]. In such cases we may still optimize the design as long as we can get a consistent latent representation. 9 are usually flexible enough to model any possible shape variation. However, it is their flexibility that makes those representations under-constrained. For example, when generating 3D models with smooth surfaces, the representation of voxel grids cannot enforce the smoothness by itself. Another issue is that they often suffer from their high dimensionality, which is also the source of their representation flexibility. In contrast, using parameterization of shapes as a representation reduces the di- mensionality of discrete representations and usually enforces properties like smooth- ness and watertightness. The spline curve is a common 2D shape parameterization approach, and is used commonly for aerodynamic shapes [105, 179]. Representation such as abstract deformation handles [241, 242] and free-form deformation (FFD) [66] are used for 3D models. Though these representations enforce desirable properties, they also limit the diversity of shapes so that some minor features may not be rep- resented. Also, the mapping from a parameterization and a design is sometimes not one-to-one. For example, different rational Be?zier curve representations (i.e., control points, weights and parameter variables combinations) can result in the same curve. This may bring difficulties to exploring the design space. 2.2 The Manifold Hypothesis Though parameterization reduces the design space dimensionality required for discrete geometry representations, the intrinsic dimensionality is usually much lower. Thus a large body of work has been focusing on reducing the design space to a low- dimensional latent space. The design space is where potential design candidates exist. We can find de- sired designs by searching the design space using methods such as the genetic algo- rithm [131]. However, the design space contains mostly invalid solutions, and valid 10 Figure 2.1: Valid glass designs (parameterized by Be?zier curves) are on a low-dimensional manifold. Here the high-dimensional design space is shown in 3D for visualization. designs are usually on a lower-dimensional manifold (Fig. 2.1). Also, based on the curse of dimensionality [14], the cost of exploring the design space grows exponen- tially with its dimensionality. It is expensive and unnecessary to directly search a high-dimensional geometric design space X . Instead, we can capture one or multiple lower-dimensional manifolds (i.e., the latent space Z) that valid designs lie on. This is done by learning a mapping f : X ? Z. Since the latent space captures major design variability, we can then search for designs in the latent space. 2.2.1 Learning the Latent Space Since the high-dimensional design space is hard to visualize or explore and con- tains mostly invalid solutions, we can use a lower-dimensional latent space to capture the variability of designs. Previous efforts have been made to achieve the goal of discovering how designs vary in the latent space. Work in this area is highly related to dimensionality reduction, manifold learning, and representation learning. We can model the mapping f : X ? Z as a linear mapping, which uses a set of 11 optimal directions or basis functions to represent designs such that the variance of shape geometry is maximized. Such methods include the Karhunen-Loe?ve expansion (KLE) [51, 66], principal component analysis (PCA) [69], and the active subspaces approach [214]. These methods usually have closed-form solutions to both the map- ping f and its reverse mapping g. However, in practice, it is more reasonable to assume that design variables lie on a non-linear manifold, rather than a hyperplane. This brings up the need to study non-linear methods. The non-linearity can be achieved by 1) applying linear reduction techniques locally to construct a non-linear global manifold [69, 130, 167?169]; 2) using kernel methods with linear reduction tech- niques [49, 69]; 3) latent variable models like Gaussian process latent variable model (GPLVM) and generative topographic mapping (GTM) [224]; and 4) neural networks based approaches such as self-organizing maps [165] and autoencoders [27, 49, 59, 69]. These previous approaches are for capturing a low-dimensional latent space that represents major variability of designs. They are usually applied for reducing the complexity of design optimization [165, 224] or creating interactive tools to visualize and explore the design space [8, 162, 241]. 2.2.2 Exploring the Latent Space When designs are associated with requirements or performance scores, we can find feasible or optimal designs by relating the latent space to relevant quantities of interest (QoI). In applications like design space exploration [64, 128, 237] and reliability analy- sis [133, 248], people need to find feasible domains within which solutions are valid. Usually these constraints are implicit and cannot be expressed analytically (e.g., aesthetics, functionality, or performance requirements) and require expensive evalu- 12 ations (e.g., simulation, experiments, or human evaluation). This problem is studied in the fields of adaptive sampling and active learning, where we want the mapping h from the design space X to the evaluation space Y ? {?1, 1} (-1 for infeasible and 1 for feasible designs) to be estimated through as few measurements as possible. Specifically, we iteratively update the estimated mapping h. Within each iteration t, a design x(t) is chosen to be evaluated such that its result y(t) ? Y is the most informative for updating the estimation?this property is referred to as informative- ness within the Active Learning community. This informativeness can be based on label ambiguity [100, 136, 188], estimated expected error [31, 125, 154, 247], the re- duction of the version space [215], classifier disagreement [7, 148], or the predictive variance [26, 86, 111]. Similar approaches have also been developed for design optimization, where the evaluation space Y ? R in most cases. Bayesian optimization (BO) is one type of method that aims to reduce the number of evaluations. Instead of selecting the most informative design at each iteration, BO selects designs to evaluate based on their likelihood of being the optimal design. This likelihood is quantified as probabil- ity of improvement [127], expected improvement [217], or upper confidence bounds (UCB) [206]. When using adaptive sampling to identify feasible domains or using Bayesian optimization to find global optimal designs, we normally need to specify some fixed design variable bounds. In cases such as algorithm hyperparameter tuning [190, 204, 207] and shape optimization [164], setting the variable bounds are not trivial. It is hard to guarantee that any fixed bounds will include the entire feasible domain or the true global optimum. To solve this problem, this dissertation demonstrates new methods for adaptive sampling or Bayesian optimization that require no fixed design variable bounds. This also allows the discovery of novel designs outside the domain 13 of existing designs. Due to the curse of dimensionality, the number of evaluations required to precisely estimate h increases exponentially with the dimensionality of X . Thus instead of learning the mapping h : X ? Y , we can learn a different mapping h? : Z ? Y , given that we have the mapping from the latent space to the design space, i.e., g : Z ? X . We need g for synthesizing designs to perform the evaluation (i.e., experiments or simulations) and get the desired results (feasible or optimal designs). This brings up the need for design synthesis. 2.3 Designs Synthesis Design synthesis is not only used in feasible domain identification and design optimization, but also for simplifying the design process by recommending new designs and visualizing the variability of design candidates. While this dissertation mainly deals with data-driven design synthesis, there are two other approaches to generating new designs, namely rule-based approaches and assembly-based approaches. 2.3.1 Rule-Based Approaches Rule-based approaches generate new shapes or designs via explicit rules or con- straints. These approaches involves procedural modeling [150, 210] and computational design synthesis (CDS) [30, 118, 119, 158, 232]. However, such explicit rules are usu- ally hard-coded based on prior knowledge, and are difficult to specify or generalize to diverse design objectives. Instead of requiring rules of shapes themselves, topology optimization [16, 140, 147] specifies a set of rules including a performance objective, constraints, and the design?s physical interaction with the environment (i.e., loads and/or boundary condi- 14 tions), which are usually clear in a specific problem definition. Topology optimization generates new designs by searching for the solution of an optimization problem de- fined by those rules. Since normally topology optimization relies on gradient-based optimization, it cannot be applied on black-box objectives (e.g., aesthetics, customer preference, or experimental performance). 2.3.2 Assembly-Based Approaches Assembly-based approaches synthesize new shapes by assembling or reorganizing parts from an existing shape database, while preserving the desired structures [36, 109, 208, 234, 246]. The shapes are usually parameterized by high-level abstract representations, such as hand-crafted feature vectors [109] or shape grammars [208]. While these methods generate designs by using parts in the database, they edit shapes at a high-level and do not control each part?s local geometry. 2.3.3 How Data-Driven Approaches Differ from Other Synthesis Ap- proaches In contrast to rule-based or assembly-based design synthesis, data-driven ap- proaches learn rules, constraints, and shape deformation from data. They eliminate the need for explicit rules, and often generate designs with more flexibility. More importantly, we can identify a continuous compact latent space that encodes major variations of valid designs. This simplifies design space exploration. As mentioned previously, we embed designs into low-dimensional latent spaces by learning the mapping f : X ? Z. Meanwhile, given any coordinate in a latent space, one can also synthesize a design if the inverse mapping g : Z ? X is also learned. Usually, dimensionality reduction techniques allow inverse transformations 15 from the latent space back to the design space, thus can synthesize new designs from latent variables [49, 51, 59, 69]. For example, under the PCA model, the latent variables define a linear combination of principal components to synthesize a new design [51]; for local manifold based approaches, a new design can be synthesized via interpolation between neighboring points on the local manifold [130]; and under the autoencoder model, the trained decoder maps any given point in the latent space to a new design [27, 59, 197]. Different from models like PCA and autoencoders, a generative model learns the probability distribution of designs Px. Usually it projects latent variables z draw from some prior distribution Pz to a design x ? Px. Researchers have employed generative models such as kernel density estimation [209], Boltzmann machines [99], variational autoencoders (VAEs) [153], and generative adversarial nets (GANs) [137, 231] to learn the distribution of samples in the design space, and synthesize new designs by drawing samples from the learned distribution. Though the aforementioned machine learning models can be directly applied to design synthesis, in practice there are limitations that hinder their performance. For example, engineering designs are usually subject to specific geometrical (e.g., surface smoothness of aerodynamic shapes) considerations that subjects commonly synthe- sized in the machine learning scenario (e.g., images, music, sentences, or videos) do not have. These considerations are prior knowledge that we can incorporate into our model to improve data-driven design synthesis. To guide data-driven design space exploration and design synthesis, the next chap- ter starts by understanding important properties of a design space, i.e., intrinsic dimension, data separability, and non-linearity. 16 Chapter 3: Learning the Intrinsic Complexity of Design Spaces Portions of the work in this chapter were published in the Journal of Mechanical Design [49] and the ASME International Design Technical Conference [41] 3.1 Introduction Products differ among many design parameters. For example, a wine glass con- tour can be designed using coordinates of B-spline control points: with 20 B-spline control points, a glass would have at least 40 design parameters. However, we cannot arbitrarily set these parameters because the contour must still look like a wine glass. Therefore, high-dimensional design parameters actually lie on a lower-dimensional design manifold (Fig. 3.1) or latent space that encodes major variability of design attributes, such as roundness or slenderness. A manifold?s intrinsic dimension is the minimal dimensionality we need to faithfully represent how those high-dimensional design parameters vary. Past researchers, both within and beyond design, have proposed many algo- rithms for mapping high-dimensional spaces to lower-dimensional manifolds and back again?what we call a design embedding. But how do you ensure the embedding has captured all the geometric variability among a collection of designs, while not us- ing more dimensions than necessary? How do you evaluate which embedding best captures a given design space? What properties should a good design embedding 17 Figure 3.1: 3D visualization of high dimensional design space showing that design parame- ters actually lie on a 2-dimensional manifold. possess? This chapter answers those questions by proposing methods to study design em- beddings. Specifically, our method measures the complexity of a high-dimensional design space and the quality of an embedding. We demonstrate our approach on syn- thetic superformula examples [79] with varying complexity and real-world glassware and airfoil examples. While this chapter focuses on mathematically understanding design spaces, our approach ultimately has implications for several important sub-fields of Engineering Design. In engineering optimization, the number of design variables severely impacts accuracy and convergence. In consumer preference models, high-dimensional design spaces complicate capturing human opinion inexpensively or accurately. In design in- terfaces, designers have difficulty exploring and manipulating high-dimensional design spaces. Because our approach automatically determines a design space?s complexity and dimension, these sub-fields can assess 1) their task?s fundamental difficulty, and 2) how to best reduce that difficulty. Mathematically, our work deepens our un- derstanding of design complexity in general by illuminating how design embeddings model a design space, how to judge the inherent complexity of a design space, and 18 how to measure the ways designs differ from one another. Our main contributions are: 1. A method for embedding designs in the fewest dimensions needed to control shape variations. 2. A minimal set of performance metrics for embeddings, including reconstruc- tion error, topology preservation, and whether the manifold captures key shape variations. 3. Manifold benchmarks with controllable complexity (i.e., non-linearity, dimen- sionality, and separability) for testing design embeddings. 3.2 Related Work Past research has taken two different approaches to understanding design spaces and synthesizing new shapes or designs?knowledge-driven methods and data-driven methods [233]. Knowledge-driven methods generate new shapes or designs via explicit rules. One representative example is procedural modeling, which creates 3D models and textures of objects (such as buildings and cities) from sets of rules [150, 210]. An- other example is Computational Design Synthesis (CDS), which synthesizes designs (such as gearboxes and bicycle frames) based on topological or parametric rules or constraints [30, 118, 119, 158, 232]. However, such explicit rules are hard to specify or generalize to diverse design objectives. Data-driven methods, by contrast, learn the representation of geometric structure from examples, adjusting the model to match provided design data. We refer the readers to the survey by Xu et al. [233] for an overview of data-driven shape processing techniques. Our work uses a data-driven approach to learn the inherent complexity and the latent representation of a shape collection, and synthesize shapes by exploring that 19 latent representation. Generally, there are three main data-driven approaches: 1) assembly-based modeling, where parts from an existing shape database are assem- bled onto a new shapes [35?37, 90, 109]; 2) statistical-based modeling, where a prob- ability distribution is fitted to shapes in the design space, and plausible shapes are generated based on that distribution [73, 109, 209]; and 3) shape editing, where a low-dimensional representation is learned from high-dimensional design parameters, and designers create shapes by exploring that low-dimensional representation. This chapter falls into that third category. Many approaches use manifold learning techniques such as Multi-Dimensional Scaling (MDS) to map the design space to a low-dimensional embedding space [8]. However such approaches generally construct only one-way mappings (high to low). Some methods can also directly learn a two-way mapping between the design space and the embedding space, such as autoencoders, which uses neural networks to project designs from high to low dimension and back again [28, 240]. Another way is to associate shapes with their semantic attributes by crowd-sourcing, and then learn a mapping from the semantic attributes to the shapes, such that new shapes can be generated by editing these semantic attributes [241]. 3.3 How Our Contributions Relate to Prior Work Our hypothesis is that while shapes are represented in a high-dimensional design space, they actually vary in a lower-dimensional manifold, or surface within the larger design space [17, 216]. One would then navigate a shape space by moving across this manifold?like an ant walking across a surface of a sphere?while still visiting all valid designs. This raises several questions: How does one find the manifold (if it exists)? How does one ?jump back? to the high-dimensional space (raw geometry) once one has wandered around on the manifold? How does one choose a ?useful? manifold, 20 and how do ones evaluate that ?usefulness?? A common issue for previous shape synthesis methods is that they often do not address the inherent properties (e.g., intrinsic dimension, non-linearity) of the geo- metric design space before embedding, choosing instead to set parameters (such as dimensionality) manually. This causes problems during embedding and shape synthe- sis because this may not adequately capture shape variability or may use unnecessary dimensions along which designs do not vary. Unlike past work, this chapter directly investigates the inherent complexity of a design space under the assumption that dif- ferent parts of that space may differ in complexity. We discover information such as discontinuities in the design space and the intrinsic dimension of each segment. We then adjust the embeddings and design manifolds based on that information. Another issue is that given various embeddings (constructed by MDS, autoen- coders, etc.), how does one choose the embedding that best enables a smooth and accurate exploration of the space? Reconstruction error is one common metric for evaluating embeddings. It measures how accurately the model can reconstruct in- put data as it embeds data from high dimension to low and back again. However, sometimes embeddings can excel at reconstruction but ultimately place the data in a lower-dimensional space with unexpected and unintuitive topological structures, such as linear filaments in Deep Autoencoders [68]. While these structures may aid accu- rate reconstruction, they make it difficult for users to explore the embedded space. In this chapter, we propose evaluation metrics that measure three properties of an embedding: (1) its reconstruction accuracy, (2) how well it preserves a design space?s topology, and (3) whether it captures the design space?s principal semantic attributes. In practice, there is often a trade-off between these metrics, and our approach allows a designer to visualize and decide among embeddings with respect to that trade-off. 21 3.4 Samples and Data Before we discuss our approach, we need to introduce some concrete benchmarks of design spaces that we will demonstrate and validate our method over. By design space, we mean any M -dimensional vector (x ? RM) that controls a given design?s form or function?e.g., its shape, material, power, etc. To create a high-dimensional design space X , we generate a set of design parameters or shape representations X ? X . For ease of explanation and visualization, this chapter uses designs described by 2D curved contours; however, the proposed methods extend to non-geometric design spaces as well. Specifically, we uniformly sample points along the shape contours and (i) (i) use their coordinates as X where xj and yj are the x and y coordinates of the jth point on the contour of the ith? sample: ? ?? (1) (1) (1) (1)?x1 y1 . . . xn yn ??? . . ?X = . . .. . . . .. .. .. ??? (m) (m) (m) (m) x1 y1 . . . xn yn The sample shapes come from two sources: 1) the superformula [79] as a synthetic example whose design space complexity we can directly control, and 2) glassware and airfoil contours as real-world examples. 3.4.1 Synthetic Benchmark Since this chapter?s goal is to capture a design space?s inherent properties, we first need a benchmark dataset whose properties we can directly control; this allows us to measure performance with respect to a known ground truth. Since, to our knowledge, no such benchmark exists for design embeddings, we created one using a generaliza- tion of the ellipse?called the superformula (See Fig. 3.2 for examples)?that allows 22 Figure 3.2: Examples of superformula shapes. us to control the following properties: (non-)Linearity, number of separate mani- folds, intrinsic dimension, and manifold separability/intersection. Two-dimensional superformula shapes have a multidimensional parameter space in R6 with parame- ters (a, b,m, n1, n2, n3) [79] in Eqn. (3.1). With a radius (r) and an angle (?), the superformula is expressed in p(o?lar coord?inates? as:??? n ?n )2 3 ? 1n?cos(m? ) ??? ???sin(m? ) ? 1r(?) = 4 + 4 ?? (3.1)a ? ? b ? thus the Cartesian coordinates X are: (x, y) = (r(?) cos ?, r(?) sin ?) Controlling linearity. By tuning the parameters in Eq. 3.1 we can vary the non- linearity of X; for example, by changing the aspect ratio s of the shapes, we can linearly vary X: (x, y) = (s ? r(?) cos ?, r(?) sin ?) (3.2) We can control the linearity of X by tuning the linear switch s in Eqn. (3.2) or the nonlinear switches {a, b,m, n1, n2, n3} in Eqn. (3.1). Fig. 3.3a and 3.3b are examples of controlling linearity of the design space. A design space?s linearity is reflected by the curvature of the manifold?higher non-linearity results in higher curvature. Controlling the intrinsic dimension. To artificially control the intrinsic dimen- sion M of the design space, we construct M -dimensional subspaces of the superfor- 23 (a) Linear design space (d = 1) (b) Nonlinear space (d = 2) varying s. varying s & n3. Abnormal! m = 5 m = 3 m = 4 (c) Design space with multiple shape categories. Figure 3.3: 3D visualization of the superformula design space created by a linear mapping from the high-dimensional design space X to a 3-dimensional space, solely for visualization. Each point represents a design. 24 mula parameter space by varying M parameters choosing from {s, a, b,m, n1, n2, n3}, and keeping other parameters fixed, as shown in Fig. 3.3a (1-D) and 3.3b (2-D). Controlling the number of shape categories. A collection of designs does not always consist of just one category of shapes. For example, a collection may contain not only glassware, but also bottles, which have very different contours to glassware and likely have different manifold properties. A na??ve embedding, which lumps glasses and bottles together, should perform poorly here, and thus we need our synthetic benchmark to create similar cases with distinct manifolds. It is possible to have multiple categories of superformula shapes by discretely changing the value of m in Eqn. (3.1). Because the parameter m controls the period of the right-hand-side function in Eqn. (3.1), we can use it to set the superformula to a m-pointed-star, as shown in Fig. 3.3c. The discrete change of m forms separate clusters or sub-manifolds in the design space. Another benefit of this formulation is that we can control the separability of the sub-manifold, since we can make these clusters intersect one another (e.g., Fig. 3.7b). We can generate intersecting clusters by setting the ranges of varying parameters n2 and n3, such that all the clusters contain ellipses or circles. Because this superformula benchmark can generate design manifolds with many different properties, we believe it should be useful not only for benchmarking and improving future design embedding techniques, but also for evaluating manifold learning techniques in general. 3.4.2 Real-World Data Glassware. Glassware is a good real-world example because 1) shape representation using B-splines smoothly fits the glass contours, and 2) we can interpret the semantic attributes of glassware?e.g., roundness, slenderness, type of drink, etc. We use 128 25 glass images, including wine, beer, champagne, and cocktail glasses. We fit each glass contour with a B-spline, and build the design space X using the coordinates of points uniformly sampled across the B-spline curves. Airfoils. An airfoil is the cross-section shape of a wing or blade (of a propeller, rotor, or turbine). Like glassware, we can also represent airfoils via 2D contours and they have discernible semantic attributes that allow us to verify different embeddings. A good airfoil shape embedding can aid airfoil optimization; for example, the pro- posed method can provide a continuous space with the fewest number of necessary dimensions for an airfoil optimization algorithm to optimize over, improving conver- gence speed and accuracy. Our airfoil samples are from the UIUC Airfoil Coordinates Database, which provides the Cartesian coordinates for nearly 1,600 airfoils.1 We use linear interpolation to ensure that each airfoil has the same number of coordinates in the design space X . 3.5 Methodology First, we try to achieve good design embeddings by understanding the complexity or properties of the design space?for example, detecting the number of sub-manifolds and what their dimension might be. Based on those results, we then apply embeddings of appropriate complexity and type to the different sub-manifolds. We will review the details of our methodology. Interested readers can review the full source code for needed to reproduce all detail in this chapter.2 1http://m-selig.ae.illinois.edu/ads/coord_database.html 2https://github.com/IDEALLab/design_embeddings_jmd_2016 26 3.5.1 Pre-processing Our raw data may come from shapes with various height and width, or have inconsistent or very high dimensionality. These issues obstruct manifold learning and embedding. We apply shape correspondence and linear dimension reduction as pre-processing steps to mitigate these issues. Shape correspondence. We ensure that the coordinates for all designs correspond to consistent cardinality and areas in space. This step is important for our techniques to work well; however, the choice of correspondence technique is not central to the contributions of the chapter. In this work, we fixed the coordinates of the start and end points, and uniformly sample a constant number of points on the shape outline. Specifically, 1. For the superformula example, we set the point whose angle is 0 in the polar coordinate system as the start point, and the point which has the largest angle in the polar coordinate system as the end point; 2. For the glass example, we set the start and end point at the top and bottom of the left contour respectively; 3. For an airfoil, its outline has two curves ? the upper and lower curves. We set the left and right most point as the start and end point of each curve, respectively. Then for the superformula and glass examples, we standardized each sample such that the y coordinates are in [0, 1]; and for the airfoil example, we standardized each sample such that the x coordinates are in [0, 1]. 27 Linear dimension reduction. Before performing non-linear dimension reduction, we first use Principal Component Analysis as a linear mapping (f ? : X ? RD ? X ? ? RD?) to reduce X such that it retains 99.5% of the variance. The original design space X is normally high dimensional. For example, if 100 points are used to represent the contour of each glass, we will have a 200-dimensional design space. It is difficult to learn from such a high dimensional space because of the following reasons: 1) the standard Euclidean distance metric is no longer a reliable measurement of similarity in a high dimensional space; 2) the number of samples required grows exponentially with the dimension; and 3) computing high dimensional data requires more time and resources. The first two reasons are the consequences of what is called the curse of dimensionality [15]. Despite their high dimensionality, the design parameters X are usually highly redundant. For example, in our experiments, by using principal component analysis (PCA), the 200-dimensional glassware design parameters can be reduced to 30-dimensional while still retaining at least 99.5% of the variance between designs. Therefore, our first step is to apply simple linear dimensionality reduction to map the high-dimensional design space to a lower (but still high) dimensional space: f ? : X ? RD ? X ? ? RD? where f ? is a linear mapping and D ? D?. 3.5.2 Learning Design Space Properties If designs do lie on manifolds, we first need to know the number, intrinsic dimen- sion, and complexity of those manifolds for two reasons. First, knowing the intrinsic dimension M sets the dimensionality d of the latent space F . Setting d < M makes it impossible to completely capture all variability of designs; while d > M introduces 28 unnecessary dimensions along which designs do not vary?this unnecessarily impedes exploration or optimization. Second, separating different categories of designs helps exclude invalid designs. To illustrate this point, we can first look into the case when there are multiple manifolds in a design space, as shown in Fig. 3.3c. Suppose we treat these multiple manifolds as one, and perform embedding and shape synthesis. We have to use a 3-dimensional latent space to completely capture the variation between the designs. Since there are no design samples in between two manifolds?what we call a design cavity, we do not know whether a design from that area is valid. Consequently, in that area we might synthesize a new shape which looks like a weird hybrid of two designs from two different manifolds, like the abnormal shape in Fig. 3.3c. In contrast, if we separate these manifolds and then do embedding and shape synthesis on each manifold/design category, we can avoid generating invalid new designs.3 For the above reasons, we apply clustering and intrinsic dimension estimation over the dimensionality-reduced design space X ? before embedding designs. Clustering. To separate manifolds, we use a method based on robust multiple man- ifolds structure learning (RMMSL) [83]. It assumes that within each cluster, samples should not only be close to each other, but also form a flat and smooth manifold. A manifold can be flat and smooth if it has small curvature everywhere. The method thus constructs an affinity matrix which incorporates both pairwise distance and a curvature measurement, which is computed via principal angles between local tangent spaces. RMMSL uses a curved-level measurement R(x) to indicate curvature and flatness: 3An astute reader may notice that designs ?off-the-manifold? may be, in some sense, creative or innovative. We return to that discussion in Sect. 3.6.3. 29 ? ??(Ji, J)? R(x) = (3.3) d(xi, x) xi?N(x) where ?(Ji, J) measures the principal angle between the tangent spaces Ji at xi and J at x, which indicates the curvature of the manifold; d(xi, x) is the geodesic distance between xi and x; and N(x) is the spatial neighborhood point set for x. In general, RMMSL first performs local manifold structure estimation: it estimates ? the local tangent space J ? RD ?dii at each sample xi, where i = 1, ..., n and di is the local intrinsic dimension at xi (the method of estimating di is introduced in Sect. 5.2.2). Then it learns the global manifold structure by constructing an affinity matrix W ? Rn?n using both a pairwise distance kernel w1(xi, xj) and a curved level kernel w2(xi, xj): Wij = w1(xi, xj)w2(xi, xj) (3.4) where w1(xi, xj) = exp(??xi ? xj?2/(?i?j)), w2(xi, xj) = exp(??(Ji, Jj)2/(?xi ? x ?2j ?2c/?i?j)), ?i and ?j are local bandwidth [244], and ?c is a coefficient to con- trol the effect of the curved level kernel. The time complexity for estimating the local tangent spaces is O(ND?(k2max + D?k ?2max + D )), where kmax is the maximum neighborhood size. The complexity for computing the affinity matrix using pairwise distances and curved-level measurements is O(N2D?d2max), where dmax is the maximum local intrinsic dimension. Thus the overall computational cost for getting the affinity matrix is O(ND?(k2 ?max +D kmax + D?2 +Nd2max)). To compute the curvature measurement, we need the neighboring point set N(x) for each sample point x. We use the neighborhood contraction and expansion algo- rithm proposed by Zhang et al. [245]. The basic idea is that the neighborhood size k selected for each point x should not only reflect the local geometric structure of the 30 manifold, but also have enough overlap among nearby neighborhoods to allow local information propagation. Specifically, the neighborhood contraction and expansion algorithm first uses k- NN to get a neighborhood Ni of size kmax for ea?ch point xi, and constructs a matrix Xi? x? 1Ti , where Xi = [xi1 , ..., xi ] and x?i = k?k xi . Then it contracts Ni bykmax max k removing the farthest neighbor until some preset minimal neighborhood size kmin is reached or until the followin?g inequality hold?s:? ? (i) 2 (i)(?j ) ? ? (? 2j ) (3.5) j>di j?di (i) (i) (i) where ?j is the jth singular value of Xi ? x?i1T (?1 ? ... ? ?k ), di is the localmax intrinsic dimension of the manifold at xi, which we will introduce in the next section. and the small constant ? ? (0, 0.5). Based on [245], forN samples the algorithm has complexityO(N(kmax?kmin)k2max(kmax+ D? ? )), when sample x ? RDi . Normally given the correct number of manifolds C (i.e., group number), RMMSL has good performances in separating them [83]. However, it requires manually speci- fying the number C. To automatically detect the group number C, we apply a method based on self-tuning spectral clustering (STSC) [244]. This method automatically in- fers C by exploiting the structure of the eigenvectors V of the normalized affinity matrix A (i.e., the Laplacian matrix) using an iterative algorithm (with T number of iterations). For each possible group number C, the STSC algorithm tries to find a rotation R? such that each row in the matrix Z = VCR? has a single non-zero entry (VC is the 31 first C columns of V ). This is achieved by minimizing a cost function: ?n ?C L = (Z2 2ij ?Mi ) i=1 j=1 where Mi = maxj Zij. Then the best group number C is the one with the lowest cost. Once we get the optimal Z, the sample xi can be assigned to cluster c if and only if max 2 2j(Zij) = Zic. The eigenvalue decomposition of the Laplacian matrix takes O(N3) time. Given a group number C, at each iteration of the optimization, the time for computing Z via Givens rotation [82] is O(C2(C3 + N)). Thus the total time for computing Z is O(TC3 3max(Cmax + N)), where Cmax is the maximum group number, and T is the number of iterations. Assigning samples to clusters takes O(NCmax) time. Therefore the overall computational cost for estimating the optimal group number is O(N3 + TC3 3max(Cmax +N)). While the scale complexity on Cmax is high, in practice Cmax is often a small number, so the complexity penalty is manageable. In sum, we first obtain the nearest neighbors N(xi) for each sample xi, then apply RMMSL to compute the affinity matrix W . Finally, we use W as the affinity matrix for STSC to determine the group number C and assign samples to different groups. Intrinsic dimension estimation. Following the manifold clustering procedure, we apply intrinsic dimension estimation over each manifold/design category. The esti- mator is based on the local dimension estimation method mentioned in RMMSL [83]. We first obtain the K nearest neighbors of each sample xi. We set the neighborhood size K using the adaptive method proposed in [180]. With the neighbors of xi and a weight matrix S, we construct a local structure matrix: Ti = (Xi ? x 1Ti )SST (X T Ti ? xi1 ) 32 where S is a diagonal matrix and can be set as S = 1/(?2 + ?2jj n ?(?xi ? xi? )), ?2j 2 n and ?2 indicate the scales of the noise and the error, and ?(?) is a monotonically non-decreasing function with non-negative domain (e.g., a quadratic) [83]. The local intrinsic dimension di is estimated from the eigenvalues of Ti?similar to dimensionality estimation using PCA. A category?s overall intrinsic dimension is the mean intrinsic dimension of all points in that category. In practice, point-wise intrinsic dimension can be noisy (and thus appear to change dimensionality often). We apply a kernel density smoother to local dimensionality estimates to account for our assumption that the local dimensionality should not vary drastically within a neighborhood on a smooth manifold. For example, if the local dimensionality estimations for xi and its five neighbors are {2, 3, 2, 2, 2, 2}, the second estimation is likely to be incorrect. After applying KDE, the new estimations will be {2, 2, 2, 2, 2, 2}. We use the Epanechnikov kernel to limit density estimation to a local neighborhood, and set the kernel bandwidth adaptively based on the distance between each sample and its Kth neighbor. 3.5.3 Embedding and Shape Synthesis We used methods involving PCA, kernel PCA (with a RBF kernel) [183], and stacked denoising autoencoders (SdA) [221], which all simultaneously learn a mapping ? f from the space X ? to the latent space F (f : X ? ? RD ? F ? Rd) and a reverse ? mapping g from the F back to X ? (g : F ? Rd ? X ? ? RD ) where D? ? d. PCA performs an orthogonal linear transformation on the input data. Kernel PCA extends PCA, achieving a nonlinear transformation via the kernel trick [182]. The SdA extends the stacked autoencoder [220], which is an multilayer artificial neural network that nonlinearly maps X ? and F using nonlinear activation functions [18]. 33 Invalid! (a) Create a convex hull of the(b) Copy the boundary of the convex(c) Remove designs outside the training set in the latent space. hull to the grid of new designs gener-boundary. ated from the latent space. Figure 3.4: Set boundary of the feasible latent space. We split any design data into a training set and test set. We further split the training data via 5-fold cross validation to optimize the hyperparameters using the sequential model-based algorithm configuration (SMAC) [102]. After optimizing any hyperparameters, the model trains each example with the entire training set. We test each model with the test set. After mapping from design space X ? to latent space F , we also need to map back to the original X from F?i.e., given certain latent codes, we want to generate new shapes. This is achieved by first applying mapping g, and then applying g? (g? : X ? ? RD? ? X ? RD), which is the inverse mapping of f ?. To visualize the mapping g? : F ? X , we uniformly sample from F , map those samples back to X to synthesize their shapes, and then plot them in Fig. 3.4b. Note that although we only visualize a limited number of shapes generated from the latent space F , we can generate infinitely many shapes when continuously exploring in F . Every latent space should have a boundary beyond which designs are not guaran- teed to be valid. For example, at the top left of Fig. 3.4b, the glass contours on the two sides intersect. We call these infeasible shapes, i.e., shapes that are unrealistic or invalid in the real-world. To limit F to only feasible shapes, we take the convex hull of the training samples in F as shown in Fig. 3.4a, and set its boundary as the 34 boundary for the feasible latent space. We sample and synthesize shapes inside the feasible latent space, as shown in Fig. 3.4c. Because training samples all have feasible shapes, any designs lying between any two training samples should be valid if the la- tent space preserves the original design space?s topology (i.e., the space is not highly distorted). As a result, this method may not explore innovative, unusual designs. While this chapter?s main focus is understanding the complexity of an existing design space, we discuss how to find innovative designs in Sect. 3.6.3. 3.5.4 Evaluation To evaluate embeddings, we consider these questions: 1. Given known latent codes (e.g., roundness and slenderness), can the embedding precisely restore the original design parameters that created it? 2. Do shapes in the latent space (F) vary similarly compared to the original shapes in the design space (X )? 3. Does the embedding precisely capture all the attributes that control the varia- tion of shapes? To answer these questions, we propose three metrics for comparing embeddings: 1) re- construction error, 2) geodesic distance inconsistency, and 3) principal attributes. Reconstruction error. Reconstruction error measures how the actual design pa- rameters X ? differ from the design parameters of the input data once we project them onto the low-dimensional manifold and un-project back into high-dimensional space. We use the symmetric mean absolute percentage error (SMAPE) [144] to measure reconstruction error: 1 ?m ?n | (i)rj ? (i)sj |  = (3.6) mn (i) (i) i=1 j=1 |rj |+ |sj | 35 X A B F1 C Embedding 1 A B C: B reduce rim diameter dAB A dAC increase stem diameter C A C F B2 A C B: Embedding 2 d irregular shape variationAC dAB Figure 3.5: Illustration of pairwise distance preservation. Similar designs (A and B) have similar shape representations in X , thus are closer in X than dissimilar designs (A and C). We want such relation of pairwise distances to be preserved in F (i.e., dAB < dAC) such that shapes will vary in the same manner as they do in X . where m is the sample size, n is the number of design parameters for each sample, (i) (i) rj is the ith reconstructed design parameter for the jth sample, and sj is the ith original design parameter for the jth sample. Pairwise distance preservation. To answer the second question, we can compare the pairwise distances of samples in the high-dimensional design space X versus the low-dimensional latent space F . Generally, as shown in Fig. 3.5, similar designs (A and B) have similar shape representations (e.g., Cartesian coordinates of shape outlines) in X , thus are closer in X than dissimilar designs (A and C). We want such pairwise distances to be preserved in F (i.e., dAB < dAC) such that shapes will vary in the same manner as they do in X . Since we assume our samples lie on a manifold in X , we use the pairwise geodesic distances along the manifold as the pairwise distances to be preserved after the embedding. Specifically, we construct a nearest neighbor graph G over the samples X ? X to model the manifold structure. G is a weighted graph where the edge weight between neighbors is their Euclidean distance. The nearest neighbors are selected using the neighborhood contraction and expansion algorithm mentioned in the neighborhood selection section. This algorithm adaptively chooses the neighborhood size for each sample based on its local manifold geometry. The neighborhood size is large where 36 the manifold is flat, and small where it is curvy. Thus this method prevents ?short- cuts? across high-curvature manifolds and maintains large enough overlap among nearby neighborhoods. Then we compute all pairs shortest paths for the graph G, and construct a geodesic distance matrix DG. We compare DG with the pairwise Euclidean distance matrix D of the embedded samples F ? F using Pearson?s cor- relation coefficient. The geodesic distance inconsistency (GDI) can be expressed as GDI = 1? ?(D ,D)2G (3.7) where ?(DG, D) is the Pearson?s correlation coefficient between DG and D. Lower GDI indicates the embedding better preserves pairwise distances. Normally, embedding methods like Isomap will have low GDIs because they explic- itly optimize pairwise distances. However, they cannot simultaneously learn two-way mappings between the design space and the embedding space. Methods like au- toencoders are able to learn two-way mappings, but they usually have higher GDIs because they minimize reconstruction error rather than preserve distances. There is often a trade-off between reconstruction and distance preservation; embeddings that explicitly optimize both objectives would be an interesting topic for future research. Principal attributes. For superformula examples, we know what their correct latent spaces should look like by looking at their parameter spaces. A parameter space P (e.g., Fig. 3.6a) for the superformula contains the shapes generated by all possible combinations of values for all the parameters used to modify the shapes. It is this space from which we randomly select training and testing samples. For example, if we fix parameters {a, b,m, n2, n3} and vary {s, n1} in Eqn. (3.1) and Eqn. (3.2), the superformula parameter space will have two dimensions (i.e., P ? R2). Along the first dimension some shape attribute (e.g., aspect ratio) changes with s, and along 37 (a) Shapes in the superformula (b) Generated shapes in the la- parameter space. tent space. Figure 3.6: An example comparing shapes generated from the latent space F versus the su- performula parameter space P. If the embedding precisely captures the principal attributes, shapes from F should look like those from P?with neither extra unexpected shape variation nor missing diversity. the second dimension another attribute (e.g., roundness) changes with n1. We call these attributes the principal attributes. We can also vary these two attributes along a single dimension by simultaneously varying s and n1 (e.g., let s = ?t and n1 = ?t, where ? and ? are coefficients, and t is a variable). Fig. 3.7a shows an example where two shape attributes vary along each dimension?the aspect ratio4 changes along one dimension and roundness along the other, and the number of arms changes over both dimensions. Because the number of arms is not continuous, the clustering algorithm should separate shapes with a different number of arms. And within each cluster, a good embedding should capture the other two attributes?the aspect ratio and roundness. A good embedding should precisely capture the right principal attributes, such that tuning these attributes creates diverse but valid shapes. By comparing the shapes generated from the latent space F with those from the superformula parameter space P , we can evaluate whether the embedding precisely captures all the attributes that control the variation of shapes. That is, shapes generated from F should look similar to those from P?with neither unexpected shape variation nor missing diversity, as 4All the shapes shown in the figures are scaled to the same height. Thus the aspect ratio will matter instead of the height or the width. 38 (a) Shapes in the superformula (b) Result of manifold clus- parameter space. tering (as in Fig. 3.3, the de- sign space X is visualized in three dimensions). (c) Generated shapes in latent spaces. Since there are three categories, we have three separated latent spaces. Figure 3.7: Multiple superformula categories with intersection. Our approach correctly separates the three sub-manifolds, even though they all connect via a common seam. shown in Fig. 3.6. 3.6 Results and Discussion We evaluated our method?s performance at recovering various design space prop- erties. We also compared how well different embedding approaches captured shape attributes. 3.6.1 Design Space Properties We use the superformula examples to test the accuracy of our clustering algorithm and intrinsic dimension estimator. We conducted experiments with the number of clusters C set from 1 to 5, intrinsic dimension from 1 to 3, and three levels of linearity (curvature of the manifold). The dataset used in each experiment has a sample size 39 (a) Shapes in the superformula pa-(b) Result of manifold clus- (c) Generated shapes rameter space. tering (as in Fig. 3.3, the in latent spaces. design space X is visualized in three dimensions). Figure 3.8: Multiple superformula categories with intersection and different intrinsic dimen- sions. Our intrinsic dimension estimator automatically detects the appropriate dimension- ality of the latent space for each design category (c). of 500. All the experiments correctly separated the superformula shape categories and estimated the correct intrinsic dimensions (Fig. 3.7). In cases where multiple manifolds intersect (Fig. 3.7b), it is improper to use metrics like the rand index to evaluate clustering accuracy, because samples at the intersection can be classified to any adjacent group. For example, shapes turn into ellipses at the intersection (Fig. 3.7a), so it does not matter whether they belong to the four- or five-pointed star group. Figure 3.7c shows that our approach captures this case. Figure 3.8 demonstrates a case where two manifolds with different intrinsic dimen- sions intersect: one superformula category with an intrinsic dimension of 1 intersects with another category with an intrinsic dimension of 2. Our method correctly sepa- rates the two categories and estimates the intrinsic dimension for each category, as shown in Fig. 3.8c. 40 eter Rim diam Stem diameter Figure 3.9: Synthesized glassware shapes in a 3D latent space. The embedding captured three shape attributes?the rim diameter, the stem diameter, and the curvature. 3.6.2 Principal Attributes As mentioned above, a good embedding should precisely capture the right prin- cipal attributes. For superformula examples, we check this by comparing the shapes generated from the latent space F with those from the superformula parameter space P . For the example shown in Fig. 3.7, the clustering separated shapes with a different number of arms, and each cluster?s embedding correctly captured the two principal attributes?aspect ratio and roundness. In Fig. 3.8?s example, as expected, the em- bedding captured the aspect ratio and the roundness attributes of the four-pointed stars, and just the roundness for the five-pointed stars. Figure 3.9 shows the synthesized glassware in a 3-dimensional latent space. This embedding captured three attributes?the rim diameter, the stem diameter, and the curvature. With the variation of these three attributes, we synthesized a collection of shapes that generally covers all the training samples?wine, beer, champagne, and cocktail glasses. Similarly, the airfoil embedding shown in Fig. 3.10 also captured three attributes?the upper and lower surface protrusion, and the trailing edge direc- tion. For the airfoil example, the uncovered design manifold allows us to optimize airfoil 41 Curvature ction edge dire railin g Upper surface protrusion T Figure 3.10: Synthesized airfoil shapes in a 3D latent space. The embedding captured three shape attributes?the upper and lower surface protrusion, and the trailing edge direction. B B A A (a) Arrangement of training samples in (b) Synthesized shapes. the latent space (for simplicity, this is a 2D projection of the 3D latent space). Figure 3.11: Point A has high sample density, and thus higher confidence that synthesized shapes will look similar to nearby real-world samples. In contrast, point B has low sample density, and thus lower confidence but higher chance of generating an unusual or creative shape. Shade darkness correlates with higher local density. shapes over this continuous low-dimensional space instead of the original geometric design space, using metamodeling techniques like Bayesian optimization. In our future work, we will compare the performance of metamodeling techniques with and without applying our dimensionality reduction method. 3.6.3 Are all regions on the manifold equally valid? The different shades in Fig. 3.7?3.10 represent for local density of the shape col- lection (i.e., training samples) distributed in the latent space F . At positions nearby 42 Lower surface protrusion the training samples (e.g., point A in Fig. 3.11), we plot the synthesized shape with a darker color. This means we are more certain of its validity because it lies closer to real-world training samples. In contrast, in locations where training samples are sparse (e.g., point B in Fig. 3.11), we plot the synthesized shape with a lighter color. This means we are less certain of its validity, and our model may create shapes that diverge from the training samples. In the latent space where the training samples are absent?what we call a design cavity?new designs might be innovative or they may be unrealistic. For example, as shown in Fig. 3.9, inside regions where shapes have light colors, we synthesized glassware that is not similar to any of our training samples. In this case, the model generated innovative designs rather than unrealistic ones; however, to our knowledge there is no formal mathematical way of differenti- ating those two cases automatically. Doing so would be a compelling topic of future research. 3.6.4 Sample Arrangement in Latent Space Figure 3.12 shows the comparison between results obtained from different em- bedding methods. For this superformula example, PCA has a high reconstruction error because the design space is nonlinear. This results in some abnormal shapes (i.e., shapes with attributes that do not exist in the parameter space in Fig. 3.12a) in the latent space created by PCA (Fig. 3.12c). SdA and kernel PCA have similar reconstruction errors. However, the latent space created by kernel PCA (Fig. 3.12d) better aligns with the original parameter space than that of SdA (Fig. 3.12e), because SdA generates abnormal shapes (Fig. 3.12e, top right). Since these abnormal shapes have light colors, we know that they are inside the design cavity; while in fact if the embedding preserved the geodesic distances between samples (i.e., low GDI), the 43 (a) Shapes in the superformula param- (b) Reconstruction error and geodesic dis- eter space. tance inconsistency. (c) Embedding and shape syn- (d) Embedding and shape syn- (e) Embedding and shape syn- thesis result by PCA. thesis result by kernel PCA. thesis result by a stacked de- noising autoencoder. Figure 3.12: Comparison of different embedding methods. The abnormal shapes generated by PCA and SdA are due to high reconstruction error and high GDI respectively. sample arrangement should resemble that of PCA or kernel PCA, where there is no design cavity (Fig. 3.12c and 3.12d). Therefore both high reconstruction errors (e.g., PCA in this example) and high GDI (e.g., SdA in this example) can create abnormal or invalid shapes. 3.7 Summary We introduced an approach to learn the inherent properties of a design space and evaluate two-way mappings between a design space and a latent space. By correctly identifying the design space properties such as the number of design categories and the intrinsic dimension, one can then create an embedding that precisely captures the principal attributes of each design category, assuming that the embedding is 44 well-chosen based on the reconstruction error and the pairwise distance preservation. This means the synthesized shapes will have no unexpected shape variation, missing diversity, or repeated shapes (brought about by unnecessary dimensions). We also introduced a benchmark for rigorously testing design embeddings that accounted for non-linear, multiple (potentially intersecting) manifolds of controllable intrinsic dimension. We encourage others to use this benchmark to improve future design embeddings. While this chapter mainly addressed geometric design spaces, our approach would extend to any type of design space, including those that involve text, materials, or combinations with geometry (For an example of a combined material and geometry space). It could be applied to improving interfaces that help novices explore designs, aiding high-dimensional design optimization, and helping model consumer preferences in high-dimensional design spaces. Our work?s main implication is that choosing a de- sign embedding carries with it important choices about what you value in your latent space: Should it reconstruct designs consistently? Should it preserve the topology of the design space? What semantic attributes should the latent space capture? Choos- ing an embedding with the properties you want is not straightforward; our approach provides a principled way to compare and contrast embeddings?to help navigate those options and identify useful properties of both the embedding and your design space in general. So far this dissertation has shown how to measure the intrinsic complexity of a design space, which guides data-driven design synthesis process. Purely data-driven design synthesis methods ignore designs? functional or geometrical constraints and characteristics, and thus will have limitations. The next two chapters will look at in- corporating prior knowledge into data-driven models to improve the quality of design synthesis. 45 Chapter 4: Aerodynamic Design Synthesis, Optimization, and Shape Exploration The work in this chapter was published in the AIAA SciTech 2019 Forum [42] 4.1 Introduction Aerodynamic shape optimization is a necessary step in designing parts like air- craft wings and (propeller/rotor/turbine) blades. It has been an active research area for over 60 years [198]. The bottleneck of most global optimization methods for aerodynamic design is the computational cost of the computational fluid dynam- ics (CFD) simulations. To combat this, surrogate-based modeling approaches are used [4, 5, 76, 123, 146] to reduce the number of simulations by balancing explo- ration and exploitation while sampling the design space. However, the computational cost of sampling the design space increases exponentially with the dimensionality of the design space due to the curse of dimensionality [14]. Previous research has looked into dimensionality reduction (DR) of the original parametric design space (i.e., the space of designs represented by shape parameters such as B-spline control points). This permits faster exploration by capturing only those dimensions that either affect the final design?s performance [19?21, 88, 141] or capture major shape variability [55, 56, 223, 224, 239]. But these DR models may not accurately capture the true variation that we observed in real-world airfoils, e.g., those in the UIUC 46 airfoils database. A vast amount of research on DR has been conducted in the field of machine learning, where deep neural networks such as variational autoencoders (VAEs) [117] and generative adversarial networks (GANs) [85] have successfully rep- resented data from complex high-dimensional distributions, such as images, by using low-dimensional latent variables. In this chapter, we address the problem of reducing the design space dimensionality for aerodynamic shape optimization. We apply GANs to learn an interpretable low- dimensional space (i.e., the latent space) that encodes how aerodynamic shapes vary. To avoid the limitation caused by shape parameterizations (e.g., curve-fitting errors and the lack of representation flexibility), we learn directly the distribution of points along the curves instead of curve parameters (such as Be?zier control points). However, na??ve application of neural network techniques to airfoil designs does not work well because the output is noisy and does not conserve important continuity properties important for aerodynamic shapes. Therefore, we use Be?zier-GANs [46] to generate aerodynamic shapes. The Be?zier-GAN model can be used for reducing the design parameters of any smooth shapes such as aerodynamic or hydrodynamic designs. The design optimization method demonstrated in this chapter is applicable to cases where the design space is reducible and represented by a lower-dimensional representation. The specific scientific contributions of this paper are: 1. A new type of generative model appropriate for smooth geometry (such as those expressed via splines or Bezier curves) that improves the sample quality and convergence rate compared to traditional GANs. 2. A study of the comparative optima and convergence rate of several compet- ing optimization methods?multiple parametric forms including our approach, Principal Component Analysis, PARSEC, and NURBS representations, along 47 with two optimization strategies (Bayesian Optimization and Genetic Algo- rithms)?and illuminate the performance conditions under which different ap- proaches improve over others. 3. A two-stage optimization method that prioritizes the optimization of major shape attributes. 4.2 Background In this section, we introduce previous work on common algorithms used in aero- dynamic design optimization (Sec. 4.2.1), parameterization techniques (Sec. 4.2.2), and methods for reducing design space dimensionality (Sec. 4.2.3). 4.2.1 Optimization Methods Aerodynamic design is, in large part, an optimization problem. One common objective is to find design variables that minimize the drag coefficient CD, while maximizing or constraining the lift coefficient CL [20, 88, 223]. There are primarily three approaches for solving the optimization problem: evolutionary algorithms (EA), surrogate-based optimization (SBO), and gradient-based methods. Evolutionary algorithms (EA) are gradient-free optimization algorithms that mimic the process of biological evolution through mutation, recombination, and reproduc- tion of different designs. Genetic algorithms (GA), a type of EA, is widely used in aerodynamic shape optimization [56, 106, 224]. Work has also been done to augment GA with the Bees algorithm [211] and adaptive mutation rates [104], resulting in more accurate optimization and/or faster convergence. Other EA methods applied in aerodynamic optimization are differential evolution [139] and particle swarm op- timization [173, 219]. However, due to the large number of function calls needed in 48 each generation, EAs can be prohibitively expensive computationally, especially if every evaluation requires a high-fidelity computational fluid dynamics simulation. Surrogate-based optimization (SBO) uses an inexpensive surrogate model to ap- proximate the expensive function of the quantity of interest (QoI) (i.e., the opti- mization objective). Bayesian optimization (BO) is a commonly used method for SBO. It consists of two components ? a sampling method (e.g., maximum expected improvement [108] or maximum upper confidence bound [205]) and a surrogate mod- eling method (e.g., Gaussian process regression, also know as kriging [171]). In each iteration, the sampling method samples a point in the design space for eval- uation of the QoI, and then that point and its QoI update the surrogate model. Compared to methods like genetic algorithms, surrogate-based optimization reduces the number of expensive CFD evaluations needed in aerodynamic shape optimiza- tion [20, 55, 92, 141, 224]. However, for a high-dimensional design space, the number of evaluations will still be inevitably high due to the curse of dimensionality [14, 174]. Note that in these cases, kriging can also be prohibitively expensive at the later stage when the model is trained on a large number of evaluated samples since its compu- tational cost scales cubically with the sample size (though practical approximation methods do exist to reduce this cost). Gradient-based methods search for the optimal solution based on the gradient of the objective function. When the objective is based on CFD simulations, auto- matic differentiation (AD)?a generalization of adjoint methods used by the CFD community?is usually used to compute the gradients. It provides a relatively fast and exact method of calculating numerical gradients. The computer records ev- ery elemental operation used to calculate a QoI (?forward pass?) before reversing through this ?tape? to determine the sensitivity of the QoI with respect to each parameter. Generally, gradient calculations are exact and have a computational 49 cost within an order of magnitude of the forward pass. Because of this, previous work [65, 112, 145, 184, 185, 213, 225] have used AD for gradient calculations. Com- bined with optimization algorithms such as SQP [65, 225], steepest descent [184], and Newton- and quasi-Newton methods [112, 145, 185], AD can drastically acceler- ate gradient calculations in the optimization process, even in complex or turbulent models [145, 185, 225]. However, for optimization using state-of-the-art turbulence models such as Large Eddy Simulation, one cannot use adjoint methods because the chaotic butterfly ex- ponential divergence of trajectories makes the adjoint ill-posed [129]. In addition, an AD gradient is only applicable at one point; thus, unlike e.g.analytical derivatives, where a single equation provides exact derivatives at any point, AD requires a forward pass before each new gradient calculation, contributing to a large portion of the op- timization cost. In terms of memory, building the tape of operators can be expensive compared to, e.g., a finite difference method. Additionally, as a method of gradient calculation, AD will still maintain the disadvantage inherent in gradient-based algo- rithms, e.g., converging to local minima. As a workaround solution, Berguin et al. [21] use solutions to SBO as starting points for AD methods, hoping to find good local optima. 4.2.2 Shape Parameterization Parameterization maps a set of parameters to points along a smooth curve or sur- face via a parametric function. Common parameterization for aerodynamic shapes in- cludes splines (e.g., B-spline and Be?zier curves) [105, 178, 218], free-form deformation (FFD) [113, 186], class-shape transformations (CST) [126, 152], PARSEC [138, 202], and Be?zier-PARSEC [63]. While this work does not study paramterization, we show 50 the optimization performance of two paramterization approaches, namely nonuniform rational B-splines (NURBS) [134] and PARSEC [138], in comparison to our proposed method. Usually during design optimization, parameters are sampled to generate design candidates [92]. There are two main issues when optimizing these parameters from conventional parameterization: (1) one has to guess the limits of the parameters to form a bounding-box within which the optimization operates, and (2) the design space dimensionality is usually higher than the underlying dimensionality for representing sufficient shape variability [49] ?i.e., to capture sufficient shape variation, manually designed shape parameterizations require higher dimensions than are strictly neces- sary. 4.2.3 Design Space Dimensionality Reduction It is computationally expensive to search for solutions in the design space directly due to the space?s high dimensionality. Factor screening methods [151, 230] are used to select the most relevant design variables for a design problem while fixing the rest as constant during optimization. These methods fail to consider the correlation between design variables. Thus, researchers have found ways to capture the low- dimensional subspace that identifies important directions with respect to the change of response (i.e., QoI or performance measure) [19?21, 88, 141]. This response-based dimensionality reduction usually has several issues: 1) it requires many simulations when collecting samples of response gradients; 2) variation in gradients can only capture non-linearity rather than variability in the response, so extra heuristics are required to select latent dimensions that capture steep linear response changes; 3) the learned latent space is not reusable for any different design space exploration or 51 optimization task (i.e., when a different response is used); and 4) the linear DR techniques applied in previous work may fail on responses with a non-linear correlation between partial derivatives. The first three issues can be avoided by directly applying DR on design vari- ables without associating them with the response. Note that by doing this, we are assuming that if changes in a design are negligible, changes of responses are also negligible. In the area of aerodynamic design, researchers use linear models such as proper orthogonal decomposition (POD), also known as principal component anal- ysis (PCA) [55, 56, 239], and nonlinear models like generative topographic map- ping [223, 224] to reduce the dimension of design variables. More work on DR has been done in other fields such as image processing and computer graphics [84, 132], where DR is used for generating and visualizing data. Deep neural networks such as VAEs and GANs have been widely applied in these areas to learn the latent rep- resentation of data. These methods are known for their ability to learn complex high-dimensional data distributions. Our work extends this class of techniques by considering the generation of smooth geometries such as those needed in spline-based representations. Note that as DR models map latent variables to shapes, we can treat the latent variables and the mapping as parameters and the parametric function. Thus, in a broader sense, we will also refer to these methods as parameterization in Sec. 4.6. 4.3 Obtaining Disentangled Latent Representation Using Generative Adversarial Networks We use a method based on GANs [85] to train a generative model that synthesizes aerodynamic shapes from interpretable low-dimensional latent codes. GANs are one 52 type of deep neural network architecture which consists of two components: a gen- erator and a discriminator. The generator takes in random noise from some known prior distribution Pz. Its objective is to generate samples from the desired distribu- tion (i.e., data distribution Pdata). The discriminator takes in a sample (either from the training data or synthesized by the generator) and predicts the probability of the sample coming from the training data. The generator tries to make the generative distribution PG look like Pdata to fool the discriminator; the discriminator tries not to be fooled. GANs achieve this by minimizing the objective: min maxV (D,G) = Ex?P [logD(x)] + Ez?Pz [log(1?D(G(z)))] (4.1)data G D where D is the discriminator, and G is the generator. Both components improve via training until the discriminator cannot differentiate between real and fake inputs, implying that the generative distribution resembles the data distribution. A trained generator thus can map from the predefined noise distribution to the distribution of designs. The noise input z is considered as the latent representation in the dimen- sionality reduction scenario, since z captures the variability of the data. Standard GANs are not built for learning latent representations; thus, they can- not be used to reduce the dimensionality of the design space. To compensate for this weakness, InfoGANs [52] encourage interpretable latent representations by maximiz- ing the mutual information between some noise variables (called latent codes) and the generated samples. Thus, InfoGAN?s generator takes both latent codes c and random noise z as inputs. Unfortunately, it is hard to directly maximize the mutual informa- tion I(c;G(c, z)), so instead an InfoGAN approximates the solution by maximizing a lower bound. In practice, this is realized by adding an extra fully connected layer to the discriminator to predict the latent codes. Please refer to Ref. [52] and Sect. 5.2.3 53 for more details about the InfoGAN. We build upon this line of work below, extending it to spline-based geometry. 4.4 Spline-Based Shape Synthesis Typical approaches to generative shape models (such as GANs) represent shapes as a collection of discrete samples (e.g., as pixels or voxels) owing to their original development in the computer vision community. For example, a na??ve way of synthe- sizing shapes like airfoils would be to generate this discrete representation directly using the generator, such as generating a fixed number of coordinates sampled along the airfoils boundary curve (e.g., Fig. 4.2, right). However, in practice, airfoils typ- ically possess substantial smoothness/continuity and are typically represented using parametric curve families like splines, Be?zier curves, or NURBS surfaces. The na??ve GAN representation of predicting discretized curves from the generator usually (1) creates noisy curves that have low smoothness and (2) has parametric outputs that are harder for humans to interpret and use in standard CAD packages compared to equivalent curve representations (e.g., Be?zier curves). This creates problems, partic- ularly in aerodynamic shape synthesis. To solve this issue, we modified the InfoGAN?s generator such that it only gener- ates smooth shapes that conform to Be?zier curves. We call this generative adversarial network a Be?zier-GAN [46]. As shown in Fig. 4.1, most of its architecture is adapted from the InfoGAN. However, before outputting discrete coordinates along the curve, the generator synthesizes control points P , weights w, and parameter variables t of rational Be?zier curves. The last layer?the Be?zier layer?converts this rational Be?zier 54 curve representation into discrete representation X: ??n ( )ni=0 tij(1? t )n?ij Pi iwiXj = n ( )n , j = 0, ...,m (4.2)i i=0 tj(1? t )n?iwi j i where n is the Be?zier degree, and the number of discrete points to represent the curve is m + 1. Since variables {Pi}, {wi}, and {tj} are differentiable in Eq. 4.2, we can train the network using back propagation. Figure 4.2 compares synthesized shapes with and without using a Be?zier layer. Generator Discriminator Control points Real/Fake Weights Latent codes Synthesized Latent codes Deconvolutional Bezier Convolutional layers layer layers Fully Noise connected Parameter layers variables Fully connected Samples from layers dataset Figure 4.1: Model architecture of the Be?zier-GAN. Use a B?zier layer Not use a B?zier layer Figure 4.2: Synthesized airfoils using a generator with and without a Be?zier layer. 4.5 Optimization over the Learned Latent Space 4.5.1 The Optimization Problem in the Latent Space The optimal aerodynamic shape can be solved by x? = arg minx f(x), where x is an aerodynamic shape (expressed in this case by the latent codes and the Be?zier curve parameters) and f(x) is some performance measure defined over x (e.g., lift, 55 drag, etc.). Since the function f is usually non-convex, methods such as EA or SBO are often used for optimization [62, 70, 77, 249]. These methods search for the global optimum by exploring the design space X . However, since X is usually high- dimensional, it takes many performance evaluations to find the optimal solution due to the curse of dimensionality [14]. Since we can modify x by changing the latent code c, which has a lower dimension than x, finding the optimal shape x? is equivalent to finding an optimal latent code c?. Thus, we solve the following problem instead: c? = arg minh(c) = arg min f(Ez?Pz [G(c, z)]) (4.3) c c and then use c? to synthesize the optimal shape x? = E [G(c?z?Pz , z)]. We can then apply SBO to find c? instead of x?. There are two major components in SBO: an inference model and an acquisition function. In this chapter we use a SBO method called Efficient Global Optimization (EGO) [108], which uses Gaussian process (GP) regression [171] as the inference model, and expected improvement (EI) [108] as the acquisition function: E[I(c)] = E[max(hmin ? h(c), 0)] (4.4) where hmin is the current best function value. At each step t of EGO we want to find c(t) that is expected to best improve upon the current optimal solution: c(t) = arg maxE[I(c)] (4.5) c Now with the ingredients of GP regression and EI, the EGO process simply repeats the following steps: 56 1. Estimate the function h by using GP regression; 2. Compute EI using Eq. 4.4 and the learned GP model; 3. Search for the point c(t) that has the highest EI (solving Eq. 4.5); 4. Evaluate the function h at c(t), and add the new (c(t), h(c(t))) pair into the dataset for learning GP regression. 4.5.2 Unbounded Bayesian Optimization To find c(t) in Step 3, we can use gradient-based optimization algorithms like BFGS [25, 74, 81, 192]. However, these methods may get stuck in local optima and are unstable if operated in an unbounded space (i.e., the solution may go too far from feasible regions and thus will diverge). Random search is a simple alternative approach that searches c(t) within fixed variable bounds; however, the optimal solu- tion may be located outside those bounds. To circumvent these issues, we search for the solution near c(t?1) (i.e., the point evaluated at step t ? 1) without requiring a boundary. Specifically, we search for c(t) among samples drawn from the distribution N (c(t?1), ?2), where ? controls the dispersion of the drawn samples (Fig. 4.3). Com- bined with the EI criteria, each iteration of the search area moves in the direction which is expected to improve the current optimal solution. This moving search area eliminates the limitation of variable bounds. A larger ? encourages exploration and prevents the solution from getting stuck in local optima while a smaller ? encourages exploitation and refines the current optimal solution. We use a decreasing ? over it- erations (i.e., in each iteration, multiply ? by a constant ? that is close to but smaller than 1), so that the algorithm first explores then focuses more on exploitation. 57 Figure 4.3: Unbounded sampling in Bayesian optimization. 4.6 Experiment: Airfoil Synthesis and Shape Optimization In this section, we demonstrate our method via an airfoil optimization task. Rather than targeting one specific airfoil model (e.g., the NACA 0012 airfoil in Ref. [88] or the RAE 2822 airfoil in Ref. [224]) and its perturbations, we search for the optimal design within all the existing major airfoil models. We show that Be?zier- GAN learns realistic shape variations from these airfoil models and that optimizing in the latent space accelerates convergence. 4.6.1 Dataset and Preprocessing We use the UIUC airfoil database1 as our training data for the Be?zier-GAN. It provides the geometries of nearly 1,600 real-world airfoil designs, each of which is represented by discrete coordinates along their upper and lower surfaces. The number 1http://m-selig.ae.illinois.edu/ads/coord_database.html 58 of coordinates for each airfoil is inconsistent across the database, so we use B-spline interpolation to obtain consistent shape representations. Specifically, we interpolate 192 points over each airfoil with the concentration of these points along the B-spline curve based on the curvature [105]. The preprocessed data are visualized at the top of Fig. 4.4. c c c c2 c2 c2 c1 c1 c1 Figure 4.4: Examples in the airfoil database and synthesized airfoil shapes in three- dimensional latent spaces (visualized by uniform slices of multiple two-dimensional spaces). 4.6.2 Dimensionality Reduction We build a Be?zier-GAN model based on the architecture in Fig. 4.1. The la- tent codes are from a three-dimensional uniform distribution, and the input noise is from a ten-dimensional Gaussian distribution. In the discriminator, we use six one-dimensional convolutional layers followed by fully connected layers to predict la- tent codes and the probability of the input data coming from the dataset. For the generator, we use three one-dimensional deconvolutional layers [243] to predict the control points {Pi|i = 0, ..., n} and the weights {wi|i = 0, ..., n}, and three fully con- nected layers followed by a softmax activation to predict discrete differences between 59 parameter variables {tj+1 ? tj|j = 0, ...,m ? 1}. Interested readers can refer to de- tailed network architectures and hyperparameters in our Tensorflow implementation available on Github2. We optimize the Be?zier-GAN using an Adam optimizer [116] and train it on an Nvidia Titan X GPU. The wall-clock training time is about 1 hour, and the inference takes less than 15 seconds. Figure 4.4 shows synthesized airfoil shapes by linearly interpolating points in the latent space. The middle subplot shows that airfoils synthesized by Be?zier-GAN are realistic and capture most variation in the airfoil dataset. We also obtained an interpretable latent space: the horizontal axis (c1) captured the leading edge angle, the vertical axis (c2) captured the trailing edge angle, and the third axis (c3) captured the thickness. We use PCA as a baseline DR method to compare the synthesis quality. The latent space is also set to three-dimensional. The results of PCA are shown at the bottom of Fig. 4.43. Compared to Be?zier-GAN, PCA shows the limitations of a linear DR model by synthesizing unrealistic designs in some regions of the latent space. 4.6.3 Optimization Our optimization objective is to maximize the lift to drag ratio CL/CD. We use XFOIL [67] to compute the lift and drag coefficients C and C .4L D The XFOIL operation conditions are set as follows: Reynolds number Re = 1.8 ? 106, Mach number Ma = 0.01, and angle of attack ? = 0?. 2https://github.com/IDEALLab/airfoil-opt-gan 3The bounds of the visualized latent space are based on the latent coordinates of the data, i.e., the minimum bounding box for data points projected onto the latent space. 4We use XFOIL here to demonstrate our scientific contributions, however, our approach is not limited to XFOIL. One can apply our techniques to any CFD or performance code including RANS or LES. 60 For Be?zier-GAN and PCA, we apply EGO on the three-dimensional latent spaces and use the trained Be?zier-GAN generator or the inverse transformation of PCA to synthesize the optimal airfoils corresponding to the optimal latent codes. We also compare these results to optimizing directly in the parametric design space. Specifically, we use two parameterizations, NURBS and PARSEC, and two optimization algorithms, EGO and GA, as additional experiments. We use the NACA 0012 airfoil as the initial design. The NURBS parameterization is based on Ref. [224]. The design space is defined as a ?0.1 perturbation of the initial NURBS control point coordinates or a 20% perturbation of the initial PARSEC parameters. The population size of the GA is 100, and the chance of mutation (i.e., the probability of mutating an individual?s parameter) is 0.1. In each generation, we choose 30 best and 10 random individuals for crossover, and produce 5 children for each pair. We direct interested readers to our code for more details. We run each experiment so that the total number of CL/CD evaluations is 1000. The results of each experiment setting are averaged over 10 runs. Figure 4.5 shows the best-so-far CL/CD versus the number of evaluations. It shows that the value reached in 100 XFOIL evaluations by Be?zier-GAN+EGO takes other methods at least 500 XFOIL evaluations to reach. Figure 4.6 shows the optimal airfoils for all experi- ment settings. Runs from the same scenario are plotted on the same subplot. For PCA+EGO and NURBS+GA, the final optimal solutions are inconsistent compared to other methods, indicating the optimization converged to different local optima. The values of maximal CL/CD after 100 and 1000 evaluations are shown in Table 4.1. 61 Figure 4.5: Optimization history (averaged over 10 runs). Figure 4.6: Optimal airfoils (airfoils in the same subplot are under the same experimental configuration but different runs). 4.6.4 GA Refining Figure 4.5 shows that when using Be?zier-GAN, the optimal CL/CD stops im- proving after 100 evaluations, whereas the optimal CL/CD improves continuously, though slowly, when using the NURBS parameterization. This is because the three- dimensional latent space does not contain as much shape variation as the NURBS design space. However, while the three-dimensional latent space captures major shape variations, minor shape variations are captured by the noise space (i.e., the space of the random input noise of Be?zier-GAN). Therefore, we can further search for an im- provement in that noise space. We achieve this by using the optimal solution of EGO 62 Table 4.1: Values of CL/CD for optimal solutions. # Eval. Be?zierGAN+EGO PCA+EGO NURBS+EGO NURBS+GA 100 195.41 ? 2.94 149.86? 13.46 122.75? 20.41 64.97? 4.82 1000 200.80 ? 2.12 181.86? 21.87 197.37? 3.22 142.35? 20.38 # Eval. PARSEC+EGO PARSEC+GA 100 26.44? 3.88 24.11? 0.90 1000 27.26? 4.07 53.77? 3.24 after 100 evaluations as the initial design and run GAs in both the latent space and the noise space. We call this GA refining. Specifically, we allow larger shape varia- tion on the noise variables while limiting the variation on the latent variables during mutation. The results are shown in Figs. 4.7 and 4.8. In this way, the optimal CL/CD keeps improving even after the latent space is exploited. Figure 4.7: Optimization history for Be?zierGAN+EGO with and without GA refining. Figure 4.8: Optimal airfoils for Be?zierGAN+EGO with and without GA refining. 63 4.7 Summary We use a Be?zier-GAN to capture a low-dimensional latent space that encodes major shape variability of aerodynamic designs. Design optimization can then be conducted in this latent space to reduce the number of evaluations required to find the optimal solution. Our results show that our Be?zier-GAN method significantly accelerates convergence and finds optimal designs that are comparable to those found by other algorithms. Because the latent space discards minor variability in designs that can potentially contribute to higher performance, the final optimal solution may be not as good as directly optimizing in the design space given a sufficient number of evaluations. The GA refining mitigates this issue by continuing to explore the input noise space of the GAN after discovering good latent variables. There are other ways to improve the optimal solution while maintaining fast convergence. For example, the optimal solution obtained by our method can be used as a good start point for gradient-based optimization methods (e.g., as in Berguin et al. [21]). For future research, we can concatenate a trained Be?zier-GAN generator and an automatic differentiation solver to obtain the gradient of a QoI with respect to each latent variable directly. The low-dimensional gradients can then be applied to solve optimization problems. Different from previous DR research for aerodynamic shape optimization which only targets one specific QoI (i.e., response-based DR) or one airfoil model, the learned latent space in this work is reusable for optimizing any QoI for any airfoil model included in the UIUC airfoil database. So far this dissertation has shown how to measure the intrinsic complexity of a design space to guide data-driven design synthesis and how to constrain the data- driven model to generate valid aerodynamic shapes. The next chapter will introduce 64 a design synthesis model that considers the inner-part dependencies of designs, which is another example of incorporating prior knowledge into data-driven design synthesis models. 65 Chapter 5: Synthesizing Designs with Inter-Part Dependencies Portions of the work in this chapter were accepted to the Journal of Mechanical De- sign [48] and published in the ASME International Design Technical Conference [50] 5.1 Introduction Representing a high-dimensional design space with a lower-dimensional latent space makes it easier to explore, visualize, or optimize complex designs. This of- ten means finding a latent representation, or a manifold, along which valid designs, such as geometries, lie [44, 49]. While this works well for single parts, designs usually have multiple parts with inter-part dependencies. For example, the size and position of a conduit, lightening, or alignment hole in an airframe structure depend on the shape of the airfoil. Here we assume that design components are synthesized sequentially and do not consider bi- directional inter-part dependencies, although this kind of dependency exists in some design tasks (e.g., alternating optimization of Part A and Part B in a design). We leave this topic for future research. Thus we can describe inter-part dependencies in a design by using a directed acyclic graph (DAG). This DAG captures whether the geometry of a part depends on the geometry of its parent part(s). In this case, one may want to identify first the parent manifold that captures the major variation of parent shapes, and then the child manifold that captures the major variation of feasible child 66 Figure 5.1: Manifolds of parent and child shapes shapes conditioned on any parent shape (Fig. 5.1). Because, for example, we may first optimize the airfoil shape on the airfoil manifold (parent) to obtain the optimal lift and drag; and then given the optimal airfoil, we may optimize the hole?s size and position on the hole manifold (child) for other considerations like light-weighting, etc. This chapter addresses the problem of capturing the latent representation for multi-components designs. However, finding individual part manifolds that both rep- resent the design space well, while also satisfying part configuration, is non-trivial. Traditionally, to learn the inter-part dependency, one has to either define explicit constraints [120, 121] or learn implicit constraints via adaptive sampling [44, 45]. The former uses hard-coded (often application-specific) constraints and hence lacks flexibility; whereas the latter queries external sources by human annotation, experi- ment, or simulation, and thus is expensive. In this chapter, we solve these problems by instead learning these constraints given examples. We assume that we only have the prior knowledge on inter-part dependencies, but not the specific types of con- straints (e.g., concentric, alignment, tangent, etc.) that confine the geometry of each part. We do this by identifying different levels of manifolds, where the higher-level (parent) manifold imposes implicit constraints on the lower-level (child) manifolds. 67 We propose a deep generative model that synthesizes designs in a hierarchical man- ner according to those inter-part dependencies: it first synthesizes parent parts, and then synthesizes parts conditioned on those parent parts and so on. At each level, the model simultaneously captures a parent manifold and an infinite number of child manifolds which are conditioned on parent parts. This results in latent spaces that can synthesize and search each part individually as well as their assemblies. Importantly, our method is fully data-driven and requires no hard-coded rules or querying external sources, except for providing an example dataset of designs with part correspondence and known inter-part dependencies. For designs without part correspondence, we can apply unsupervised co-segmentation [194] as a pre-processing step. This work facilitates the understanding of complex design spaces where inter-part dependencies exist and can be used for efficient hierarchical design space exploration. The chapter?s key contributions are as follows: 1. A novel deep generative model architecture that simultaneously learns a design?s inter-part dependencies and each part?s geometry variation conditioned on the corresponding parent part(s). It decouples each part?s latent space so that we can perform design space exploration in separate low-dimensional latent spaces. 2. New benchmark datasets?both real-world and synthetic?that can be used for studying different kinds of inter-part dependencies including: type of geometric constraints, depth of hierarchy, and branching factor of parent/child relation- ships. This dataset can aid in the future evaluation of generative models of hierarchical parts. 3. Characterizing the effects of sample size and part dependencies? complexity (depth and branching factor) on the synthesis performance of our generative model. 68 4. A new evaluation metric for generative models that measures the consistency of shape variation in the latent space. 5.2 Related Work Our work produces generative models that synthesize designs from latent repre- sentations. There are primarily two streams of related research from the fields of engineering design and computer graphics?Specifically, design space dimensional- ity reduction and design synthesis. We also review generative adversarial networks (GANs) [85], which we use to build our model. 5.2.1 Design Space Dimensionality Reduction While designs can be parametrized by various techniques [179], the number of de- sign variables (i.e., the dimensionality of a design space) increases with the geometric variability of designs. In tasks like design optimization, to find better designs we usually need a design space with higher variability, i.e., higher dimensionality. This demand creates the problem of exploring a high-dimensional design space. Based on the curse of dimensionality [14], the cost of exploring the design space grows exponen- tially with its dimensionality. Thus, researchers have studied approaches for reducing the design space dimensionality. Normally, dimensionality reduction methods identify a lower-dimensional latent space that captures most of the design space?s variability. This can be grouped into linear and non-linear methods. Linear dimensionality reduction methods select a set of optimal directions or ba- sis functions where the variance of shape geometry or certain simulation output is maximized. Such methods include the Karhunen-Loe?ve expansion (KLE) [51, 66], principal component analysis (PCA) [69], and the active subspaces approach [214]. 69 In practice, it is more reasonable to assume that design variables lie on a non- linear manifold, rather than a hyper-plane. Thus, researchers also apply non-linear methods to reduce the dimensionality of design spaces. This non-linearity can be achieved by (1) applying linear reduction techniques locally to construct a non-linear global manifold [69, 130, 167?169]; (2) using kernel methods with linear reduction techniques (i.e., using linear methods in a Reproducing Kernel Hilbert Space that then induces non-linearity in the original design space) [49, 69]; (3) latent variable models like Gaussian process latent variable model (GPLVM) and generative topo- graphic mapping (GTM) [224]; and 4) neural networks based approaches such as self-organizing maps [165] and autoencoders [27, 49, 59, 69]. This work differs from these past approaches in that we aim at identifying two- level latent spaces with the lower-level encodes inter-part dependencies, rather than learning only one latent space for the complete design. 5.2.2 Data-Driven Design Synthesis Design synthesis methods can be divided into two categories: rule-based and data- driven design synthesis. The former (e.g., grammars-based design synthesis [80, 120, 121]) requires labeling of the reference points or surfaces and defining rule sets, so that new designs are synthesized according to this hard-coded prior knowledge; while the latter learns rules/constraints from a database and generates plausible new designs with similar structure/function to exemplars in the database. Usually, dimensionality reduction techniques allow inverse transformations from the latent space back to the design space, thus can synthesize new designs from latent variables [49, 51, 59, 69]. For example, under the PCA model, the latent variables define a linear combination of principal components to synthesize a new 70 design [51]; for local manifold based approaches, a new design can be synthesized via interpolation between neighboring points on the local manifold [130]; and under the autoencoder model, the trained decoder maps any given point in the latent space to a new design [27, 59]. Researchers have also employed generative models such as kernel density estimation [209], Boltzmann machines [99], variational autoencoders (VAEs) [153], and generative adversarial nets (GANs) [137, 231] to learn the distribu- tion of samples in the design space, and synthesize new designs by drawing samples from the learned distribution. Discriminative models like deep residual networks [197] are also used to generate 3D shapes. These aforementioned models synthesize a design or a shape as a whole. There are methods that synthesize new shapes by assembling or reorganizing parts from an existing shape database, while preserving the desired structures [36, 109, 208, 234, 246]. The shapes are usually parametrized by high-level abstract representations, such as hand-crafted feature vectors [109] or shape grammars [208]. While these methods edit shapes at a high-level, they do not control the local geometry of each synthesized component. Previously the inter-part dependencies of shapes have been modeled by grammar induction [208], kernel density estimation [73], probabilistic graph models [36, 99, 109], and recursive autoencoders [137]. Those methods handle part relations and design synthesis separately. In contrast, our method encodes part relations through the model architecture, so that it simultaneously learns the inter-part dependencies and single part geometry variation. The model can also be used for inferring the generative distribution of each part conditioned on any parent part. 71 5.2.3 Generative Adversarial Networks As introduced in Sect. 4.3, generative adversarial nets [85] model a game between a generative model (generator) and a discriminative model (discriminator). One advantage GANs hold over Variational Auto-encoders (VAEs) [117] is that GANs tend to generate more realistic data [166]. But a disadvantage of the original GAN formulation is that it cannot learn an interpretable latent representation. Built upon these ?vanilla? GANs, the InfoGAN [52] aims at regularizing the latent representation of the data space by maximizing a lower bound of the mutual information between a set of latent codes c and the generated data. The generator is provided with both z and c. Thus, the generator distribution PG = G(c, z) is conditioned on c. The mutual information lower bound LI is LI(G,Q) = Ec?P (c),x?G(c,z)[logQ(c|x)] +H(c) (5.1) where H(c) is the entropy of the latent codes, and Q(c|x) is called the auxiliary distribution which approximates P (c|x). We direct interested readers to [52] for the derivation of LI . The InfoGAN objective combines LI with the standard GAN objective: min maxVInfoGAN(D,G,Q) = V (D,G)? ?LI(G,Q) (5.2) G,Q D where ? is a weight parameter. In practice, H(c) is treated as constant if the distribution of c is fixed. The auxiliary distribution Q is parametrized by a neural network?here we call it the auxiliary network. In our design synthesis scenario, the latent codes c can represent any continuous or discrete factor that controls the geometry of the design, e.g., the upper/lower surface 72 z Figure 5.2: Architectures of the standard GAN and the InfoGAN protrusion of the airfoil. Previously there are GAN-based models that generate specific types of data (i.e., images and videos) by a two-level hierarchy [159, 227]. For example, a generator first generates the structure of an image, and then conditioned on that structure, another generator generates the texture of that image. In the case of design synthesis, it is intuitive to generate each design component given all its dependencies, which results in a hierarchical model structure with multiple levels of generators. 5.3 Method In this section, we introduce our proposed deep neural network architecture and its training details. 5.3.1 Problem Formulation We can use a directed acyclic graph to define inter-part dependencies of a design. We call this graph a part dependency graph. For example, suppose we want to design an airfoil with two holes inside (top left in Fig. 5.3). We might first design the airfoil (Part A), and then set the position and diameter of one hole (Part B) based on the shape of the airfoil, followed by the design of the second hole (Part C) based on both 73 the airfoil shape and the first hole. Thus, the dependencies can be expressed by the graph shown in the bottom left of Fig. 5.3. This chapter deals with a design synthesis and design space exploration problem, where we have a database of geometric designs, each of which has multiple parts. We have no prior knowledge of the specific types of constraints that confine the geometry of each part, but only inter-part dependencies. We propose a model that learns to synthesize designs based on those inter-part dependencies. We want to use this model to (1) correctly synthesize each part that follows both the shape feasibility and the dependency constraints; and (2) use a low-dimensional latent spaces C to represent the design spaces X of each part. The ability to learn separate representations of each part is useful for decomposing a design space exploration problem (e.g., in design optimization). Desired latent spaces satisfy the following requirements: 1. Any child latent space should be conditioned on a parent shape (e.g., for the design in Fig. 5.3, any latent space of the first hole should be conditioned on an airfoil). 2. Major variability across designs in the database should be captured by those latent spaces. 3. Designs should changes consistently as we move along any basis of the latent space. This regularity/consistency/smoothness will improve latent space design exploration and optimization. To meet the first requirement we construct a composite generative model?a model with multiple generators, each of which learns a (conditional) generative distribution of a part from the design. We ensure the rest of the requirements by adapting In- foGAN?s architecture and objective, i.e., conditioning the generator on latent codes 74 cA zA Figure 5.3: An example of part dependency and the corresponding Hierarchical GAN ar- chitecture. The interaction between generators (GA, GB, and GC) and auxiliary networks (QA, QB, and QC) is based on the connection of the part dependency graph. This figure is best viewed in color. and maximizing a lower bound of the mutual information between the synthesized designs and their latent representations. This is known to (1) make latent codes tar- get salient features of the designs, and (2) disentangle the latent representation of the data space[52]. We introduce the details of our model?s architecture in the following section. 5.3.2 Model Architecture To learn separate distributions/representations of each part, we use a generative adversarial net with multiple generators/auxiliary networks. We call this network the Hierarchical Generative Adversarial Networks (HGAN). An example of its architec- ture is shown in Fig. 5.3. We use multiple generators to synthesize different parts in a design. The interaction between generators/auxiliary networks changes with the part dependency graph. The latent code cj defines the latent representation of the j-th part, where j = 1, ..., n, and n is the number of parts. The generator Gj learns a (conditional) shape distribution P (xj|cj,xPar(j)), where Par(j) denotes the set of 75 parent(s) of the j-th part. For the example in Fig. 3, Par(A) = ?, Par(B) = {A}, and Par(C) = {A,B}. The output distribution PG = Gj(cj, zj, x?Par(j)) represents thej distribution of the synthesized part x?j, where x?Par(j) denotes the synthesized parent part(s) of xj, and zj is the noise input to Gj. The auxiliary networkQj predicts the latent code distribution of the corresponding part, i.e., to estimate the conditional distributions P (cj|xj,xPar(j)). The discrimina- tor D predicts whether a full design is from the database or generated by generators. A properly trained D should distinguish designs with unrealistic or mismatched parts. The objective of HGAN is expressed as min maxVHGAN(D, {Gj}, {Qj}) = V (D, {Gj})? ?LI({Gj}, {Qj}) (5.3) {Gj},{Qj} D where {Gj} and {Qj} denotes the set of generators and auxiliary networks, respec- tively. The first term in Eqn. (5.3) denotes the standard GAN objective: V (D, {Gj}) = E{x }?P [logD({xj data j})]+E{cj?P (c )},{z ?P (z )}[log(1?D({x?j}))] (5.4)j j j where x?j = Gj(cj, zj, x?Par(j)). The second term in Eqn. (5.3) is the lower bound of the mutual information between the latent codes and the synthesized designs: ?n LI({Gj}, {Qj}) = Ec ?P (c ),x ?G (c ,z ,x? )[logQj(cj|xj, x?Par(j))] +H(c )j j j j j j Par(j) j (5.5) j=1 5.4 Experimental Setup To demonstrate the performance of our model, we built six datasets with differ- ent ground-truth inter-part dependencies. We train the proposed network on these datasets, and evaluate the generative performance, constraint satisfaction, and the 76 B A B C A A A A A C A A AH AHH B B B A B C SCC B SEoEi B A B C C SE SEiEo C C B Figure 5.4: Datasets with different inter-part dependencies latent space property through qualitative and quantitative measures. 5.4.1 Dataset We created the following datasets, as shown in Fig. 5.4: 1. AH: An 2D airfoil (Part A) with a hole (circle, Part B) inside. 2. AHH: An 2D airfoil (Part A) with two non-intersecting holes (Parts B and C) inside. The centers of the two holes lie on a horizontal line. 3. SE: A superformula [79] (Part A) with a concentric ellipse (Part B) inside. 4. SEoEi: A superformula (Part A) with two ellipses inside. The second ellipse (Part C) is also inside the first one (Part B). All three shapes are concentric. 5. SEiEo: A superformula (Part A) with two ellipses inside. The second ellipse (Part C) is also outside the first one (Part B). All three shapes are concentric.1 6. SCC: A superformula (Part A) with two tangent circles?one (Part B) inside and the other (Part C) outside. 1Note that the assembly of SEiEo is the same as SEoEi, but since we synthesize its parts in a different order, the inter-part dependency changes (see Fig. 5.4). We create this dataset for comparing different inter-part dependencies, rather than simulating practical use case. 77 In addition, we control the part dependency graph?s depth and branching factor by adding more circles/ellipses to SCC/SEoEi. This results in six extra datasets?S, SC, SCCC, SCCCC, SEEE, and SEEEE. Here the dataset of superformulas (S) is used as a baseline where no inter-part dependency is presented. Specifically, the airfoil shapes are from the UIUC airfoil coordinates database2, which provides the Cartesian coordinates for nearly 1,600 airfoils. Each airfoil is re- parametrized and represented with 64 Cartesian coordinates, resulting in a 64 ? 2 matrix. Though targeted for real-world applications, the airfoils may not be a perfect experimental dataset to visualize the latent space, because the ground truth intrinsic dimension (ID)3 of the airfoil dataset is unknown. Thus, we create another synthetic dataset using superformulas and ellipses, the IDs of which are controllable. The superformula is a generalization of the ellipse [79]. We generate superformulas using the following equations: n1 = 10s1 n2 = n3 = 10(s1 + s2) (5.6) ? 1 r(?) = (|cos ?|n2 + |sin ?|n3) n1 (x, y) = (r(?) cos ?, r(?) sin ?) where s1, s2 ? [0, 1], and (x, y) is a Cartesian coordinate. For each superformula, we sample 64 evenly spaced ? from 0 to 2?, and get 64 grid-point Cartesian coordinates. Equations (5.6) show that we can control the deformation of the superformula shape with s1 and s2. Thus, the ground truth ID of our superformula dataset is two. 2http://m-selig.ae.illinois.edu/ads/coord_database.html 3The intrinsic dimension is the minimum number of variables required to represent the data. It indicates the degrees of freedom we have to control the shape and position. 78 AH AHH SE SEoEi/SEiEo SCC Figure 5.5: Samples drawn from datasets For further details on how to modify the superformula parameters to adjust the ID, complexity, and number of manifolds see [49]. The other shapes (e.g., circles and ellipses) are also represented using 64 Cartesian coordinates. The ground truth ID of ellipses in SE, SEoEi, and SEiEo is two, since we fix their centers and only change their semi-major axis and semi-minor axis lengths. The ground truth ID of circles in SCC is also two, since they can change their radii and move tangentially to the superformulas. The Part B (circle) in AH and AHH has a ground truth ID of three, as both their centers and radii can change; while the Part C in AHH has a ground truth ID of two, since the y-coordinate of its center is fixed. Figure 5.5 shows samples drawn from each dataset. For each dataset, we run experiments with sample sizes ranging from 500 to 10,000. 5.4.2 Network and Training We implement our airfoil/superformula generators by adopting the generator ar- chitecture of the Be?zierGAN [42, 47]. The circle/ellipse generators first generate shape parameters (e.g., the center coordinates and the radius for a circle), and then convert them into grid-point coordinates using corresponding parametric functions. 79 For a generator with parent parts as inputs, we use an encoder to convert each parent part into a feature vector before concatenating all inputs and feeding them into the generator. Similarly, we use an encoder to convert each generated part into a feature vector before feeding them into the auxiliary networks and the discriminator. The use of encoders reduces the model complexity by reducing the input dimension of the generators, the auxiliary networks, and the discriminator. But one has to carefully choose the feature vector dimensions to avoid loss of information from dimensionality reduction. At training, we sample the latent codes from uniform distribution U(0, 1), and the noise inputs from normal distribution N (0, 0.25). The hyper-parameter ? in Eqn. (5.3) was set to 0.1 in all experiments. The network was optimized using Adam [116] with the momentum terms ?1 = 0.5 and ?2 = 0.999. The learning rates were set to 0.0001. The total number of training steps was 100,000. The batch size was 32. Interested readers who wish to reproduce our exact architectures, hyper- parameters, and training procedures are directed to our code located on GitHub4. The training procedure is summarized in Algorithm 1. We used TensorFlow [1] to build the networks. We trained our networks on an Nvidia Titan X GPU. For each experiment, the training process took around 2.2 hours for SE and AH, and 3 hours for other examples. The inference took less than 10 seconds. 5.5 Results and Discussion We evaluated the performance of our trained generative models using both vi- sual inspection (Figs. 5.6-5.10) and quantitative measures (Fig. 5.11). We analyze 4https://github.com/IDEALLab/hgan_jmd_2019 80 Algorithm 1 Train HGAN for designs with n parts 1: . T : number of training steps 2: . m: batch size 3: . X: training set 4: procedure Train(T , m) 5: for t = 1 : T do 6: Sample {x1, ...,xm} from X, where xi = [xi1, ...,xin], i = 1, ...,m 7: for j = 1 : n do 8: Sample {c1j , ..., cmj } from a uniform distribution 9: Sample {z1j , ...,zmj } from a normal distribution 10: end for 11: . Based on Eqn. (5.3-5.5): 12: Train D using {x1, ...,xm}, fixing G1, ..., Gn 13: x?i := [G (ci , zi , x?i1 1 1 Par(1)), ..., Gn(c i i i n, zn, x?Par(n))] 14: Train Q1, ..., Qn and D using {x?1, ..., x?m}, fixing G1, ..., Gn 15: Train G1, ..., Gn, fixing Q1, ..., Qn and D 16: end for 17: end procedure the effect of sample size and problem complexity on those quantitative performance metrics. 5.5.1 Visual Inspection. The captured latent spaces for different examples are visualized in Figs. 5.6-5.9. All these plots are generated using a sample size of 10,000. The results show that each child latent space adjusts itself according to its parent part, so that the sizes/positions of child parts match their parent parts. This indicates that the child generator fig- ures out the implicit constraints encoded in data. The latent spaces capture major shape variations and show consistent shape change. For example, in Fig. 5.6, the outer ellipse (middle subplot) has a consistently decreasing width from left to right, and increasing height from top to bottom. Interestingly, Fig. 5.8 shows that the circles (middle and right subplots) change in the latent space according to a polar 81 Figure 5.6: Latent space visualization of the SEoEi example. Left: synthesized super- formulas in a 2-D latent space; middle: synthesized outer ellipses in a 2-D latent space conditioned on a random superformula; right: synthesized inner ellipses in a 2-D latent space conditioned on a random outer ellipse. Figure 5.7: Latent space visualization of the SEiEo example. Left: synthesized super- formulas in a 2-D latent space; middle: synthesized inner ellipses in a 2-D latent space conditioned on a random superformula; right: synthesized outer ellipses in a 2-D latent space conditioned on that same superformula and a random outer ellipse. 82 Figure 5.8: Latent space visualization of the SCC example. Left: synthesized superformulas in a 2-D latent space; middle: synthesized inner circles in a 2-D latent space conditioned on a random superformula; right: synthesized outer circles in a 2-D latent space conditioned on that same superformula. Interestingly, the HGAN automatically learns a polar-coordinate representation for the tangent constraint. Figure 5.9: Latent space visualization of the AHH example. Top: Synthesized airfoils in a 3-D latent space (visualized by multiple slices of 2-D latent spaces); middle: Synthesized holes in a 3-D latent space conditioned on a random airfoil; Bottom: Synthesized holes in a 2-D latent space conditioned on that same airfoil and another random hole. Unfeasible synthesized designs occur when the two holes intersect (indicated by red boxes). 83 Figure 5.10: Latent spaces learned without maximizing the mutual information lower bound. Left: the latent code c failed to capture major shape variation; right: inconsistent shape variation along latent space bases (LSC=0.562? 0.008). coordinate system, instead of a Cartesian coordinate system like in other examples. For example, the inner circle?s radius decreases with the radial coordinate, and its position moves with the angular coordinate. This behavior is interesting because we did not explicitly encode this polar-coordinate representation into the HGAN architecture?rather, the HGAN automatically learns that such a representation is appropriate for this constraint. Figure 5.10 shows that when we removed the mutual information lower bound LI in Eqn. (5.3), latent spaces either failed to capture major shape variation, or became entangled and inconsistent. In the AHH example (Fig. 5.9), unfeasible synthesized designs occur when the two holes intersect. The figure shows that while the second hole moves from one side of the airfoil to the other side, it has to pass through a narrow unfeasible region. This unfeasible region cuts off the latent space, but the generator ignores this fact and learns a continuous latent space by interpolating designs inside the unfeasible region. The small volume of this unfeasible region may cause the discriminator to ignore it. In other words, the generator is willing to take the minor loss incurred by this small region of the infeasible design space in order to avoid making the latent space representation more complicated. To solve this problem, we can perform adaptive sampling in the latent space to more accurately identify the feasible region(s) [44, 45]. 84 Figure 5.11: Quantitative measure of synthesized design quality and latent space properties. LSC A, LSC B, LSC C denote the LSC for the latent spaces of Parts A, B, C, respectively. 5.5.2 Constraint Satisfaction. We measure the precision of constraint satisfaction by computing the proportion of feasible designs among all the synthesized designs. We call this metric the Con- straint Satisfaction Score (CSS). Specifically, different constraints define feasibility in different examples: 1. AH: Each point on the hole should be inside the airfoil. 2. AHH: (i) Each point on both holes should be inside the airfoil; (ii) the centers of the two holes should have a vertical distance of less than 0.01 (alignment constraint);5 and (iii) the distance of the two centers should be larger than the sum of the two radii (non-intersection constraint). 5All designs are rescaled such that the airfoils and the superformulas have unit widths. 85 3. SE: (i) Each point on the ellipse should be inside the superformula; and (ii) The distance between the origin and the center of the ellipse should not exceed 0.01 (concentric constraint).6 4. SEoEi: (i) Each point on both ellipses should be inside the superformula; (ii) both the semi-major and semi-minor axis lengths of the first ellipse should be larger than the second one; and (iii) for both ellipses, the distance between their centers and the origin should not exceed 0.01 (concentric constraint). 5. SEiEo: Same as the SEoEi example, except that the first and the second ellipses are swapped. 6. SCC: For each circle with a center Co and a radius r, the difference between r and the distance from Co to the superformula should be less than 0.03 (tangent constraint). The results on CSS in Fig. 5.11 show that the tangent constraint in the example SCC is the hardest to learn. But the learning performance improves with larger sam- ple size. The SE example outperforms SEoEi and SEiEo on CSS, which is expected since the task of learning constraints in SE can be considered as a sub-task in SEoEi and SEiEo. This also applies to AH and AHH. It is also expected that SEoEi out- performs SEiEo, since the two are dealing with the same design but the former has fewer dependencies. 5.5.3 Distance between Data and Generator Distributions. We measure how well our generator approximates the real data distribution by computing the kernel maximum mean discrepancy (MMD) [87] between the data and 6For all the examples we assume that the superformula is centered at the origin. 86 the generator distribution: [ ] MMD2(Pdata, PG) = E ? ?xd,x? ?P ?data;xg ,xg?P k(xd,xG d)? 2k(xd,xg) + k(xg,x )d g where k(x,x?) = exp (??x? x??2/(2?2)). A lower kernel MMD indicates that the generator distribution is closer to the data distribution. The top middle plot in Fig. 5.11 shows the results of MMD. Here the SEoEi and SEiEo examples have similar MMD values, which suggests that synthesized designs in both examples have similar perceptual quality, despite the fact that SEiEo requires more model parameters to learn the additional dependency. The MMD plot provides insight into how realistic the generated designs can get as the training sample size changes. In general, MMD first decreases steeply with the sample size, and then reaches a plateau at some point. This point indicates the smallest sample size required to reach a ?perceptually good? synthesis performance. 5.5.4 Diversity of Generated Designs. A common problem in GANs? training is mode collapse, during which the genera- tor only generates a few types of designs to fool the discriminator instead of properly learning the complete data distribution. Therefore it is important to measure the diversity of the synthesized designs. We use Relative Diversity (R-Div) to measure the relative level of variability captured by the generator. We define the diversity of N samples X ? Rd?N as7 1 Div(X) = trace(cov[X,X]) N 7Here each sample is represented as a column vector containing all the coordinates of the design. 87 Figure 5.12: Mode collapse occurs when the sample size is 500. The diversity is low in these randomly generated designs. The R-Div metric can then be expressed as Div(Xg) R-Div = Div(Xdata) where Xg and Xdata denote the set of synthesized designs and designs from the dataset, respectively. A R-Div close to 0.0 means that there is little variation within the synthesized designs, which could be an indicator of mode collapse. A R-Div around 1.0 indicates that the synthesized designs have a similar level of variability with the dataset. Note that high diversity does not always indicate good performance, as there could be unrealistic designs being synthesized, which also contribute to diversity. Thus, we should view this metric in concert with the kernel MMD to determine how well the generator performs. Figure 5.11 shows that synthesized designs tend to have a high divergence of R-Div when the sample size is small. Particularly, in the AHH example with a sample size of 500, the low R-Div combined with the high kernel MMD indicates the occurrence of mode collapse (Fig. 5.12). Increasing the sample size stabilized the R-Div and eventually bounded it between 0.8 and 1.1. 88 5.5.5 Latent Space Consistency. A desirable latent space has two properties: 1) disentanglement : each latent vari- able is related to only one factor; and 2) consistency : shapes change consistently along any basis of the latent space. Note that this consistency is evaluated along one direction at a time, since scales along different directions may vary. To the best of our knowledge, existing quantitative measurements for the first property?latent space disentanglement?are supervised, i.e., the ground-truth independent factors causing shape deformation have to be provided [39, 95, 114]. The second property is important for latent space design exploration. When searching for designs along a direction in the latent space, optimization algorithms and humans usually prefer if shapes change consistently, such that the objective function over the latent space is less complicated (i.e., has better Lipschitz continuity) and has fewer local optima. We propose Latent Space Consistency (LSC) as a quantitative measure of how consistently shapes change along any basis of the latent space. Since the change from one shape to another can be measured by their dissimilarity, distances between samples along a certain direction in the latent space should be consistent with the dissimilarity between those samples. We use Pearson correlation coefficient to mea- sure this consistency. Algorithm 2 describes how to compute the LSC. The choice of the dissimilarity function d is not central to the overall method. In our experiments, we simply use the Euclidean distance to measure the dissimilarity of designs. Both the middle plots in Figs. 5.6 and 5.7 show latent spaces with LSCs above 0.9. In contrast, the right plot in Fig. 5.10 provides a visual example of an LSC of around 0.56. The bottom plots in Fig. 5.11 shows that larger sample size does not improve LSCs of Part A (at least with sample sizes in the range from 500 to 10,000), but improves the LSCs of Parts B and C in most cases. 89 Algorithm 2 Evaluate Latent Space Consistency 1: procedure LatentConsistency(G, m, n, d) 2: . G: the mapping from a latent space to a design space 3: . m: the number of lines to be evaluated 4: . n: the number of points sampled on each line 5: . d: a dissimilarity function 6: . C: the latent space 7: . X : the design space 8: sum = 0 9: for i = 1 : m do 10: Sample a line L parallel to any basis of C 11: Sample n points {c1, c2, ..., cn} along L 12: {x1,x2, ...,xn} := {G(c1), G(c2), ..., G(cn)} 13: D := {?ci ? cj?}, D := {d(xi,xjC X )}, where i, j ? {1, ..., n} 14: Compute Pearson correlation coefficient: cov(DC, DX ) ? := ?(DC)?(DX ) 15: sum := sum+ ? 16: end for 17: Return sum/m 18: end procedure 90 C2 C4 C6 C1 C3 C5 Figure 5.13: The latent space learned by a standard InfoGAN. This plot visualizes a 6- D latent space of the SEoEi example by showing each two dimensions while setting the latent coordinates of other dimensions to zero. In each dimension, three parts change simultaneously. This shows that the latent space learned by a standard InfoGAN does not disentangle each part?s shape variation. In contrast, the results of the HGAN disentangle and separate each part?s latent space. 5.5.6 Effect of Encoding Inter-Part Dependencies. To further study the effect of encoding inter-part dependencies in the GAN, we compared the results of our HGAN with a standard InfoGAN, where there is only one generator and all parts are synthesized from a single latent space. The latent space dimension was set to the sum of all latent space dimensions in the HGAN experiments. Figure 5.13 visualizes the latent space learned by a standard InfoGAN. It shows that in each latent dimension, three parts change simultaneously, which indicates that the latent space does not disentangle each part?s shape variation. We also study the effects of part dependency graph?s depth and branching factor on the model?s synthesis performance. The examples S, SE, SEE, SEEE, and SEEEE simulate increasing depths, respectively; while the examples S, SC, SCC, SCCC, and SCCCC simulate increasing branch factors, respectively. The results are shown in Fig. 5.14. The InfoGAN model is used as a baseline where we do not encode inter-part dependencies. As expected for both HGAN and InfoGAN, CSS decreases 91 Figure 5.14: Effects of depth (top) and branching factor (bottom) on synthesis performance. Here we denote the example SEoEi as SEE for simplicity. with an increased number of parts, since additional parts bring extra constraints. Theoretically, encoding inter-part dependencies introduces extra constraints to the model and hence complicates the overall task. However, based on the results, the HGAN shows no significant performance drop comparing to the InfoGAN, except for the CSS when increasing the branching factor and the R-Div when increasing the depth. Note that the MMD values are below 0.05 and do not change notably. The lower CSS or R-Div indicates that HGAN compromises its synthesis precision (i.e., the precision of satisfying constraints) or generator distribution?s coverage for disentangling each part?s latent space. We also included the training history of HGAN and InfoGAN in the online supplemental material. 92 5.6 Limitations and Future Work One limitation of our approach is that it is difficult to achieve high precision in satisfying some constraints. This problem exists in all purely data-driven design methods, where there is no process of incorporating restricted constraints in the generative model or validating the constraint satisfaction of the outputs. While we can address this by encoding constraints explicitly in the generative model, this requires us to know these constraints in advance and create hand-coded rules for all types of constraints. For example, to generate concentric ellipses, one can simply fix the center of the second ellipse to have the same coordinate as the first one. However, it may be difficult to incorporate some constraints in the generative model (e.g., the constraint of one part being inside another). Also, it is sometimes hard to even know the type of the exact constraint between parts (e.g., aesthetic preferences when placing handles on a vase). Thus, here we assume that we have no prior knowledge of the types of constraints, and inter-part dependencies are the only knowledge we need for our model. Despite the limitation of purely data-driven design methods, they can be used in the conceptual design stage for exploring a wide range of design alternatives and inspiring novel designs. We can also use validation based on simulation, experiment, or human annotation to exclude infeasible synthesized designs when performing latent space exploration [44, 89], which could be an interesting avenue for future work. Another limitation is that all designs must have the same part dependency graph, which is impractical in some cases. For example, not all tables have four legs, thus for some designs, their part dependency graphs might miss some nodes. Future study needs to address this situation. Sometimes the design data is not partitioned into separate components. One possible solution to this problem is to apply unsupervised co-segmentation [194] to 93 partition designs and establish correspondences between common components in dif- ferent designs. Then we can use HGAN to learn the latent spaces and generate new designs. 5.7 Summary We introduced a GAN-based generative model for synthesizing designs with inter- part dependencies. It decomposes the synthesis into synthesizing each part con- ditioned on the corresponding parent part. This also creates a conditioned low- dimensional latent representation that allows accelerated design exploration. This model is built for problems where design space exploration over the latent space has to be staged since the optimal solution of one part depends on the geometry of another. Such models can accelerate design optimization problems, which we are exploring in our future work. An advantage of neural-network-based generative models (e.g., GANs and VAEs), compared to other dimensionality reduction models (e.g., PCA and GPLVM), is that one can define or regularize latent distributions. Our model adapts InfoGAN?s mutual information objective to derive a consistent latent space, where the change of shapes is consistent along any basis of the latent space. This property is desirable in latent space design exploration, as the objective function over the latent space is less complicated and has less local optima. We also created new benchmark datasets for studying different kinds of inter-part dependencies including: type of geometric constraints, depth of hierarchy, and branch- ing factor of parent/child relationships. By using these datasets, we characterized the effects of sample size and part dependencies? complexity (depth and branching factor) on the synthesis performance of our generative model. We also proposed a new eval- 94 uation metric for generative models that measures the consistency of shape variation in the latent space. Compared to a standard InfoGAN, the HGAN disentangles each part?s latent space at the cost of weakened synthesis precision when the branching of the part dependency increases. The concept of decoupling the latent space of a design is important for design space exploration in general. It allows separate exploration and synthesis of each part, and helps us understand how different constraints control shape variation. Though we use GANs to achieve this goal, the idea of encoding inter-part dependencies, learning conditioned generative distributions, and the latent space analysis is also applicable to other generative models. Thus this chapter?s techniques lay the foundation for learned generative models to extend to more realistic engineering systems where inter-part dependencies are widespread. So far this dissertation has shown how to measure the intrinsic complexity of a design space to guide data-driven design synthesis and how to incorporate prior knowl- edge into data-driven design synthesis models. The compact latent spaces learned by the proposed models can reduce the cost of design space exploration. But there is another problem in design space exploration?the bounds of the design space are hard to specify. Usually one cannot guarantee that the desired designs are inside the user-specified bounds. Also sometimes the desired design is remarkably different from existing ones, which makes it unreliable to set bounds based on the boundary of any design dataset. To address this problem, the next two chapters will introduce design space exploration methods that gradually expand an input space (i.e., either a design space or a latent space), so that no fixed bounds are required. 95 Chapter 6: Feasible Domain Identification with Active Expansion Sam- pling Portions of the work in this chapter were published as journal papers in the Journal of Mechanical Design [44] and Structural and Multidisciplinary Optimization [43] 6.1 Introduction In applications like design space exploration [64, 128, 237] and reliability analy- sis [133, 248], people need to find feasible domains within which solutions are valid. Sometimes the constraints that define those feasible domains are implicit, i.e., they cannot be represented analytically. Examples of these constraints are aesthetics, functionality, or performance requirements, which are usually evaluated by human assessment, experiments, or time-consuming computer simulations. Thus usually it is expensive to detect the feasibility of a given input. In such cases, one would like to use as few samples as possible while still approximating the feasible domain well. To solve such problems, researchers have used active learning (or adaptive sam- pling)1 to sequentially select the most informative instances and query their feasibil- ity, so that the number of queries can be minimized [101, 128, 133, 175, 248]. These methods require fixed bounds over the input space, and only pick queries inside those bounds. But what if we do not know how wide to set those bounds? If we set the 1Note that in this chapter the terms ?active learning? and ?adaptive sampling? are interchange- able. 96 bounds too large, an active learner will require an excessively large budget to explore the input space; whereas if we set the bounds too small, we cannot guarantee that an algorithm will recover all the feasible domains [44]. In this case, we need an active learning method that can gradually expand our knowledge about the input space until we have either discovered all feasible domains or used up our remaining query budget. This chapter proposes a method ? which we call Active Expansion Sampling (AES) ? to solve that problem by casting the detection of feasible domains as an unbounded domain estimation problem. In an unbounded domain estimation problem, given an expensive function h : X ? Rd ? {?1, 1} that evaluates any point x in an unbounded input data space X , we want to find (possibly disconnected) feasible domains in which h(x) = 1. Specifically, h could be costly computation, time-consuming experiments, or human evaluation, so that the problem cannot be solved analytically. By un- bounded, we mean that we do not manually bound the input space. Thus the input space can be considered as infinite, and theoretically if the query budget allows, our method can keep expanding the explored area of the input space. To use as few function evaluations as necessary to identify feasible domains, AES first fully exploits (up to an accuracy threshold) any feasible domains it knows about and then, budget permitting, searches outward to discover other feasible domains. The main contributions of this chapter are: 1. We introduce the AES method for identifying (possibly disconnected) feasible domains over an unbounded input space. 2. We provide a framework that transfers bounded active learning methods into methods that can operate over unbounded input space. 3. We introduce a dynamic local pool method that efficiently finds near-optimal 97 solutions to the global optimization problem (Eq. 6.9) for selecting queries. 4. We prove a constant theoretical bound for AES?s misclassification error at any iteration inside the explored region. 6.2 Related Work Essentially, the unbounded domain estimation problem breaks down into two tasks explored by past researchers: 1) the active learning task, where we efficiently query the feasibility of inputs; and 2) the classification task, where we estimate decision boundaries (i.e., boundaries of feasible domains) that separates the feasible class and the infeasible class (i.e., feasible regions and infeasible regions). For the first task, we will review relevant past work on active learning. For the second task, we use the Gaussian Process as the classifier in this chapter and will introduce basic concepts of Gaussian Processes. 6.2.1 Feasible Domain Identification Past work in design and optimization has proposed ways to identify feasible do- mains or decision boundaries of expensive functions. Generally, those methods were proposed to reduce the number of simulation runs and improve the accuracy of surro- gate models in simulation-based design and reliability assessment [13, 133]. Also, the problem of feasible domain identification is equivalent to estimating the level set or the threshold boundaries of a function, where the feasible/infeasible region becomes superlevel/sublevel set [26, 86]. Such methods select samples that are expected to best improve the surrogate model?s accuracy. A common rule is to sample on the es- timated decision boundary, but not close to existing sample points. Existing methods achieve this by (1) explicitly optimizing or constraining the decision function or the 98 distance between the new sample and the existing samples [12, 13, 195], or (2) select- ing points based on the estimated function values and their confidence at candidate points [26, 53, 54, 86, 133, 235]. 6.2.2 Active Learning Methods for feasible domain identification usually require strategies that sequen- tially sample points in an input space, such that the sample size is minimized. These strategies fall under the larger category of active learning. There are three main scenarios of active learning problems: (1) membership query synthesis, (2) stream-based selective sampling, and (3) pool-based sampling [187]. In the membership query model, the learner generates samples de novo for labeling. For classification tasks, researchers have typically applied membership query models to learning finite concept classes [6, 9, 103, 115] and halfspaces [3, 38]. In the stream- based selective sampling model, an algorithm draws each unlabeled sample from an incoming data distribution, and then decides whether or not to query that label. This decision can be based on some informativeness measure of the drawn sample [2, 33, 34, 58, 61, 75, 160, 181], or whether the drawn sample is inside a region of uncertainty [57, 60]. In the pool-based sampling model, there is a small pool of labeled samples and a large (but finite) pool of unlabeled samples, where the learner selects new queries from the unlabeled pool. The unbounded domain estimation problem assumes that synthesizing an unla- beled sample from the input space is not expensive (as in the membership query scenario), since otherwise we have to use existing samples and the input space will be bounded. An example that satisfies this assumption is experimental design, where we can form an experiment by selecting a set of parameters. With this assumption, 99 our proposed method approximates the pool-based sampling setting by synthesizing a pool of unlabeled samples in each iteration. A pool-based sampling method first trains a classifier using the labeled samples. Then it ranks the unlabeled samples based on their informativeness indicated by an acquisition function. A query is then selected from the pool of unlabeled samples ac- cording to their rankings. After that, we add the selected query into the set of labeled data and repeat the previous process until our query budget is reached. Many of these methods use the informativeness criteria that select queries with the maximum label ambiguity [100, 136, 188], contributing the highest estimated expected classification error [31, 125, 154, 247], best reducing the version space [215], or where different classifiers disagree the most [7, 148]. Such methods are usually good at exploitation, since they keep querying points close to the decision boundary, refining our estimate of it. However, when the input space may have multiple regions of interest (i.e., feasible regions), these methods may not work well if the active learner is not aware of all the regions of interest initially. Note that while some of the methods mentioned above also consider representativeness [100, 148, 154, 188, 247], or the diversity of queries [97, 236], they do not explicitly explore unknown regions and discover other regions of interests. To address this issue, an active learner also has to allow for exploration (i.e., to query in unexplored regions where no labeled sample has been seen yet). A learner must trade-off exploitation and exploration. To query in an unexplored region, there are methods that (1) take into account the predictive variance at unlabeled samples when selecting new queries [26, 86, 111], (2) naturally balance exploitation/exploration by looking a the expected error [143], or (3) make exploitative and exploratory queries separately using different strategies [10, 22, 96, 98, 124, 161]. In previous methods, the exploitation-exploration trade-off 100 was performed in a bounded input space or a fixed sampling pool. However, in the unbounded domain estimation problem, there is no fixed sampling pool and we are usually uncertain about how to set the bounds of the input space for performing active learning. If the bounds are too small, we might miss feasible domains; while if the bounds are too large, the active learner has to query more samples than necessary to achieve the required accuracy. In this chapter, we introduce a method of using active learning to expand our knowledge about an unbounded input data space, and discover feasible domains in that space. A na??ve solution would be to progressively expand a bounded input space, and apply the existing active learning techniques. However, there are two problems with this na??ve solution: (1) it is difficult to explicitly specify when and how fast we expand the input space; and (2) the area we need to evaluate increases over time increasing the computational cost. Thus existing active learning techniques cannot apply directly to the unbounded domain estimation problem. To the best of our knowledge, [44] is the first to deal with the active learning problem over an unbounded input space (i.e., the unbounded domain estimation problem). The AES method proposed in this chapter improves upon that previous work (as illustrated in Sect. 6.3). 6.2.3 Gaussian Process Classification (GPC) Gaussian Processes (GP, also called Kriging) are often used as a classifier in active learning [26, 53, 54, 86, 111, 133]. Compared to other commonly used classifiers such as Support Vector Machines or Logistic Regression, GP naturally models probabilistic predictions. This offers us a way to evaluate a sample?s informativeness based on its predictive probability distribution. 101 The Gaussian process uses a kernel (covariance) function k(x,x?) to measure the similarity between the two points x and x?. It encodes the assumption that ?similar inputs should have similar outputs?. Some commonly used kernels are the Gaussian kernel and the exponential kernel [110, 124, 142, 143]. In this chapter we use the Gaussian kernel: ( ? )?x? x ?2 k(x,x?) = exp ? (6.1) 2l2 where l is the length scale. For binary GP classification, we place a GP prior over the latent function f(x), and then ?squash? f(x) through the logistic function to obtain a prior on ?(x) = ?(f(x)) = P (y = 1|x). In the feasible domain identification setting, we can consider f : X ? Rd ? R as an estimation of feasibility, thus we can call it estimated feasibility function. Under the Laplace approximation, given the labeled data (XL,y), the posterior of the latent function f(x) at any x ? XU is a Gaussian distribution: f(x)|XL,y,x ? N (f?(x), V (x)) with the mean and the variance expressed as f?(x) = k(x)TK?1f? = k(x)T? logP (y|f?) (6.2) V (x) = k(x,x)? k(x)T (K +W?1)?1k(x) (6.3) where W = ??? logP (y|f) is a diagonal matrix with non-negative diagonal ele- ments; f is the vector of latent function values at X , i.e., f = f(x(i)) where x(i)L i ? XL; K is the covariance matrix of the training samples, i.e., K = k(x (i),x(j)ij ); k(x) is the vector of covariances between x and the training samples, i.e., k (x) = k(x,x(i)i ); and f? = arg maxf P (f |X,y). When using the Gaussian kernel shown in Eq. 6.1, k(x,x) = 1. We refer interested readers to a detailed description by Rasmussen [172] about the Laplace approximation for the binary GP classifier. 102 The decision boundary corresponds to f?(x) = 0 or ??(x) = 0.5. We predict y = ?1 when f?(x) < 0, and y = 1 otherwise. 6.3 Active Expansion Sampling (AES) Algorithm 3 summarizes our proposed Active Expansion Sampling method. Over- all, the method consists of the following steps: 1. Select an initial sample x(0) to label. 2. In each subsequent iteration, (a) check the exploitation/exploration status (Sect. 6.4.4), (b) generate a pool of candidate samplesXU based on the exploitation/exploration status and previous queries (Sect. 6.4.2 and 6.4.3), (c) train a GP classifier using the labeled set XL to evaluate the informative- ness of candidate samples in XU , (d) select a sample from XU based on its informativeness and its distance from c (Sect. 6.3.1), (e) label the new sample and put it into XL. 3. Exit when the query budget is reached. This AES method improves upon our previous domain expansion method [44] in several ways. For example, the previous method generates a pool XU that expands with the explored region each iteration. So its pool size and hence the computational cost increase significantly over time if using a constant sample density. To avoid this problem, this chapter proposes a dynamic local pool method (Sect. 6.4). Another 103 Algorithm 3 The Active Expansion Sampling algorithm 1: Inputs: Query budget T Initial point x(0) and its label y0 d-dimensional evaluation function h(?) Hyperparameters  and ? 2: Initialize: XL ? {x(0)}, YL ? {y0}, INIT ? True 3: for t = 1, 2, . . . , T do 4: if INIT is True then 5: if XL consists of only one class (all feasible or all infeasible) then 6: c? x(0) 7: else 8: INIT ? False 9: c? centroid of positive samples in XL 10: end if 11: end if 12: Train the GP classifier using XL 13: Compute ?exploit using Eq. 6.18 14: XU ? uniform samples inside the (d? 1)-sphere C(x(t?1), ?exploit) 15: Compute f?(x), V (x), and p(x) for x ? XU using Eq. 6.2, (6.3), and (6.4) 16: if there are both f?(x) < 0 and f?(x) > 0 for {x ? XU |p(x) > ?} then . Exploitation stage 17: Select a new query x(t) from XU based on Eq. 6.9 18: else . Exploration stage 19: Compute ?explore using Eq. 6.17 20: if previous iteration is in exploitation stage then 21: x?? argmaxx?X ?x? c?L 22: XU ? uniform samples inside the (d? 1)-sphere C(x?, ?explore) 23: else 24: XU ? uniform samples inside the (d? 1)-sphere C(x(t?1), ?explore) 25: end if 26: Compute f?(x), V (x), and p(x) for x ? XU using Eq. 6.2, 6.3, and 6.4 27: Select a new query x(t) from XU based on Eq. 6.9 28: end if 29: y ? h(x(t)t ) 30: XL ? XL ? {x(t)}, YL ? YL ? {yt} 31: end for 104 Figure 6.1: The probability density function of the latent function f(x) [44]. The blue areas represent the -margin probability p(x). major difference is that AES provides a verifiable way to distinguish between ex- ploitation and exploration (Sect. 6.4.4); while the previous method uses a heuristic based on the labels of last few queries (which is more likely to make mistakes). In this section and Sect. 6.7, we show comprehensive theoretical analysis and experiments to prove favorable properties of our new method. 6.3.1 -Margin Probability We train a GP classification model to evaluate the informativeness of candidate samples based on the -margin probability (Fig. 6.1): ?????P (f(x) < ?|x), if y? = 1p(x) = ??P (f(x) > |x), if y? = ?1 = P (?y?f(x) > |x) (6.4) = P ((y?f(x) < ?|x)) = ? ?|f??(x)|+  V (x) 105 where y? is the estimated label of x, the margin  > 0, and ?(?) is the cumulative distribution function of standard Gaussian distribution N (0, 1). The -margin proba- bility represents the probability of x being misclassified with some degree of certainty (controlled by the margin ). Let the misclassification loss be ?????max{0,?f(x)}, if y = 1L(x) = ??max{0, f(x)}, if y = ?1 (6.5) = max{0,?yf(x)} where y is the true label of x. L(x) measures the deviation of the estimated feasibility function value f(x) from 0 when the class prediction is wrong. Then, based on Eq. 6.4 and 6.5, p(x) = P (L(x) > ), which is the probability that the expected misclassification loss exceeds . A high p(x) indicates that x is very likely to be misclassified, and requires further evaluation. Thus we use this probability to measure informativeness. 6.3.2 Exploitation and Exploration Since our input space is unbounded, na??vely maximizing the -margin probability (informativeness) will always query points infinitely far away from previous queries.2 To avoid this issue, one solution is to query informative samples that are close to previously labeled samples. This allows the active learner to progressively expand its knowledge as the queries cover an increasingly large area of the input space. When a new decision boundary is discovered during expansion, we want a query strategy that continues querying points on that decision boundary, such that the new feasible region 2A point infinitely far away from previous queries has the f?(x) close to 0 and the maximum V (x), thus the highest p(x). 106 New Other queried Figure 6.2: Queries at the exploitation stage (left) and the exploration stage (right). The gray area is the ground truth of the feasible domain. The solid line is the decision boundary estimated by the GP classifier, and the dotted line is the isocontour of p(x). At the exploitation stage (left), the center c is the previous query, which makes the next query stay along the decision boundary. At the exploration stage (right), c is the centroid of the initial positive samples, which keeps the queries centered around the existing (real-world) samples rather than biasing towards some direction. can be identified as quickly as possible. Therefore, to enable continuous exploitation of the decision boundary, we propose the following query strategy min V (x) x?XU (6.6) s.t. p(x) ? ? where V (x) is the predictive variance at x, and ? is a threshold of the informativeness measure p(x). Theorem 6.1. The solution to Eq. 6.6 will lie at the intersection of the estimated decision boundary (f?(x) = 0) and the isocontour of p(x) = ? (Point A in Fig. 6.2), if that intersection A exists. Proof. In the following proof, we denote f?P = f?(xP ), and VP = V (xP ). For a sample xA at the intersection of f?(x) = 0 and p(x) = ? , we have f?A = 0 and ? p(xA) = ?(?/ VA) = ? (Point A in Fig. 6.2); and for a sample xB that is any ? feasible solution to Eq. 6.6, we have p(xB) = ?(?(|f?B| + )/ VB) ? ? (Point B in 107 ? ? ? Fig. 6.2). Thus we get / VB ? (|f?B|+ )/ VB ? / VA. Therefore, VA ? VB. The equality holds when |f?B| = 0 and p(xB) = ? , i.e., xB is also at the intersection of f?(x) = 0 and p(x) = ? . Thus we proved the intersection has the minimal predictive variance among feasible solutions to Eq. 6.6, and hence it is the optimal solution. Theorem 6.1 indicates that when applying the query strategy shown in Eq. 6.6, the active learner will only query points at the estimated decision boundary3 as long as the estimated decision boundary and the isocontour of p(x) = ? intersect. The fact that this intersection exists indicates that there are points on the decision boundary that are informative to some extent (i.e., with p(x) ? ?). We call this stage the exploitation stage ? at this stage the active learner exploits the decision boundary. Equation 6.6 ensures that the queries are always on the estimated decision boundary until the exploitation stage ends (i.e., there are no longer informative points on the decision boundary). If the estimated decision boundary and the isocontour of p(x) = ? do not inter- sect, then the algorithm has fully exploited any informative points on the estimated decision boundary (i.e., for all the points on the estimated decision boundary, we have p(x) < ?). We call this stage the exploration stage since the active learner starts to search for another decision boundary (Fig. 6.2). In this stage, we want the new query to be both informative and close to where we started, since we do not want the new query to deviate too far from where we start. Therefore, the query strategy at the exploration stage is min ?x? c? x?XU (6.7) s.t. p(x) ? ? 3In Sect. 6.3, we assume that the queried point is the exact solution to the query strategy. However since we approximate the exact solution by using a pool-based sampling setting, the query may deviate from the exact solution slightly. 108 where the objective function is the Euclidean distance between x and a center c. This objective keeps the new query selected by Eq. 6.7 close to c. In practice, initially when there are only samples from one class, we set c as the initial point x(0) to keep new queries close to where we start; once there are both positive and negative samples, we set c as the centroid of these initial positive samples, since we want to keep new queries close to the initial feasible region. Theorem 6.2. Given x? as the solution to Eq. 6.7, we have p(x ?) = ? , if p(c) < ? . Proof. Since p(c) < ? , c itself is not the solution of Eq. 6.9. Thus ?x? ? c? > 0. Then we have p(x) < ? at any point within a (d?1)-sphere centered at c with radius ?x?? c?, because otherwise the query will be inside the sphere. Thus on that sphere we have p(x) ? ? . So p(x?) ? ? , since x? is on that sphere. Because x? is a feasible solution to Eq. 6.9, we also have p(x ?) ? ? at x?. Therefore p ?(x ) = ? . Theorem 6.2 shows that in each iteration, the optimal query x? selected by Eq. 6.7 is on the isocontour of p(x) = ? . For both Eq. 6.6 and 6.7, the feasible solutions are in the region of p(x) ? ? . Intuitively this means that we only query samples with at least some level of informativeness. We call the region where p(x) ? ? the unexplored region, since it contains informative samples (feasible solutions) that our query strategy cares about; while we call the rest of the input space (p(x) ? ?) the explored region (Fig. 6.2). The upper bound of p(x) is ?(?/ supx V (x)), and it lies infinitely far away from the labeled samples. In Eq. 6.3, K + W?1 is positive semidefinite, thus k(x)T (K + W?1)?1k(x) ? 0 and V (x) ? k(x,x). For a kernel k(?) with k(x,x) = 1 (e.g., the Gaussian or the exponential kernel), we have V (x) ? 1. Thus p(x) ? ?(?). To ensure that Eq. 6.9 has a feasible solution, we have to set ? ? ?(?). Therefore, we 109 can set ? = ?(??), where ? ? 1.0. Then the constraint in Eq. 6.6 and 6.7 can be expressed as ( ) ?|f??(x)|+ ? ? ?(??) V (x) which can be written as ? ? V (x)? |f?(x)| ?  (6.8) The left-hand side of Eq. 6.8 is identical to the acquisition function of the straddle heuristic when ? = 1.96 [26]. The straddle heuristic queries the sample with the largest value of the acquisition function. This acquisition function accounts for the ambiguity of samples in terms of their confidence intervals [86]: a(x) = min{??minQ(x),maxQ(x)} = 1.96 V (x)? |f?(x)| where Q(x) is the 95% confidence interval of x. Substituting Eq. 6.8 for the constraint in Eq. 6.6 and 6.7, and combining the exploitation and exploration stages, our overall query strategy becomes min V (?x) ??x? c?1?? x?XU (6.9) s.t. ? V (x)? |f?(x)| ?  where the indicator ? is 1 at the exploitation stage, and 0 otherwise. Section 6.4.4 introduces how to set ? (i.e., when to exploit vs explore). In general, the unbounded domain estimation problem can be solved using a family 110 of query strategies with the following form min D(x, XL) x?XU s.t. I(x) ? ? whereD(x, XL) is a function that increases as x moves away from the labeled samples, and I(x) is the informativeness measure that is used in any bounded active learning methods. Our query strategies of Eqn 6.6 and 6.7 all have this form. Comparatively, for bounded active learning methods, the query strategies are usually in the form of maxx?X I(x).U 6.4 Dynamic Local Pool Generation We cast our problem as pool-based sampling by generating a pool of unlabeled instances de novo in each iteration. A na??ve way to generate this pool is to try to sample points anywhere near the p(x) = ? isocontour. However, intuitively, as the algorithm searches progressively larger volumes of the input space, the pool volume will likewise expand. This expansion means that the size of the pool will increase dramatically over time (assuming we want a constant sample density). This increase, however, makes the computation of Eq. 6.2 and 6.3 expensive during later expansion stages. To bypass this problem, we propose a dynamic local pool method that generates the pool of candidate samples only at a certain location in each iteration, rather than sampling the entire domain.4 The key insight behind our local pooling method is that while the optimal solution to Eq. 6.9 can, in principle, occur anywhere on 4Sampling methods like random sampling or Poisson-disc sampling [23] can be used to generate the pool. We use random sampling here thereby for simplicity. The specific choice of the sampling method within the local pool is not central to the overall method. 111 the p(x) = ? isocontour, in practice, multiple points on the isocontour are equally optimal. All we need to do is sample points around any one of those optima. Below, we derive guarantees for how to sample volumes near one of those optima, thus only needing to sample a small fraction of the total domain volume. 6.4.1 Scope of an Optimal Query Theorem 6.3. Let ? be the distance between an optimal query5 and its nearest labeled sample. We have ? < ?l (6.10) where ? is a coefficient depends on , ?, and the GP model. 5The optimal query means the exact solution to the AES query strategy shown in Eq. 6.6, 6.7, or 6.9 112 Proof. According to Eq. 6.2, given an optimal query x?, we have |f?(x?)| = ??|k(x ?)T? log p?(y|f?)|?? ???? = ??? ? ?(y1f1) ??? k(x? ? T? ??? .? ) log ..? ?? ?????? ? ?? ? ?(yt?1ft?1) ??? ? ????? ?? y ?1N (f1)/?(y1f1)? ? ??? ? T ? . ??= ?k(x ) .. ??? ?? ?? ?? ? ? yt?1N (ft?1)/?(yt?1ft?1) ?? = ?? ? t?1 N (fi) ?? k(x?,x(i))yi ? ??? ?(yi=1 ifi) ? t?1 ?? ?? N (fi) ?k(x?,x(i))y ?i ? ? ?(y f )i=1 i it?1 N (fi) < kmsign(y i=1 ?i)yi?(yifi) ? ??? y1N (f1)/?(y1f1)? ?T ? . ?= kmsign(y) .. ??? yt?1N (ft?1)/?(yt?1ft?1) = km? where km = ma(x k(x?,x(i))x(i)?XL ) min ?x(i)?X ?x ? x(i)?2 = exp ? L (6.11) 2l2 2 2 = e?? /(2l ) and ? = sign(y)T? log p(y|f?) (6.12) 113 Similarly, V (x?) = 1? k(x?)T (K +W?1)?1k(x?) > 1? (km1)T (K +W?1)?1(km1) (6.13) = 1? k2 T ?1m1 (K +W )?11 = 1? k2m? where ? = 1T (K +W?1)?11 (6.14) Therefore for the optimal query x? we have ( ) ( ) ? ?|f??(x?)|+  km?+ p(x ) = ? > ? ?? V (x?) 1? k2m? Both Theorem 6.1 and 6.2 state that p ?(x ) = ? , thus ( ? )km?+ ? ? < ? 1? k2m? When ? = ?(??), we have ?km?+  > ? (6.15) 1? k2m? Plugging Eq. 6.11 into Eq. 6.15 and solving for the distance ?, we get ? < ?l where ? ? = 2 log ? ?2 + ?22? (6.16) ? ?2 + (?2 ? 1)2? ? ? 114 Figure 6.3: Dynamic local pools (dashed circles) at the exploitation stage (left) and the exploration stage (right). During the exploitation stage, the estimated decision boundary divided the unexplored region into two subregions: unexplored negative R1 = {x|p(x) > ?, f?(x) < 0} and unexplored positive R2 = {x|p(x) > ?, f?(x) > 0}; while during the exploration stage, there will be at most one of R1 and R2 in the unexplored region. This property can be used to distinguish between the exploitation/exploration stages. Theorem 6.3 indicates that if we set the pool boundary by extending the current labeled sample range by ?l, then that pool is guaranteed to contain all solutions to Eq. 6.9; that is, extending the overall pool boundary further will not increase the chances of sampling near p(x) = ? , and will only decrease the sample density (given a fixed pool size) or increase the evaluated samples (given a fixed sample density). However, if we generate the pool based solely on this principle (i.e., extending the current labeled sample range by ?l), the pool size will still increase over time as the domain size grows. The next two sections show how, for the exploration and exploitation stages respectively, we can further reduce the sample boundary to only a local hyper-sphere. 115 6.4.2 Pool for the Exploration Stage Theorem 6.4. During the exploration stage of Active Expansion Sampling, the dis- tance between an optimal query and its nearest labeled sample is ? < ?explore = ?l (6.17) Theorem 6.4 is derived from Eq. 6.10. The nearest labeled sample of the optimal query could be any border point (a sample lying on the periphery of the labeled set). There are multiple local optima that are equally useful for expanding the explored region (Fig. 6.3). Thus we just sample near one of those optima. Specifically, we approximate the nearest labeled sample as the previous query. With this approxima- tion, incorporating Theorem 6.4, the optimal query will be inside C(x(t?1), ?explore), the (d?1)-sphere with a radius of ?explore, centered at the previous query x(t?1). Thus during the exploration stage, we set the pool boundary to be that sphere (Fig. 6.3). Sometimes when AES switches from exploitation to exploration, the previous query may not lie on the periphery of the labeled samples. This causes samples around the previous query to have low values of p(x). In this case, there might not be a feasible solution to Eq. 6.9. Thus, every time AES switches from exploitation to exploration, we center the pool around the farthest labeled sample from the centroid of the initial positive samples (i.e., argmaxx?X ?x ? c?). This ensures that AESL generates pool samples near the periphery of the labeled samples. 116 6.4.3 Pool for the Exploitation Stage Theorem 6.5. During the exploitation stage of Active Expansion Sampling, the dis- tance between an optimal query and its nearest labeled sample is ? < ?exploit = ?l (6.18) where ? is a coefficient depends on , ?, and the GP model. Proof. Theorem 6.1 states that the optimal query in the exploitation stage lies at the intersection of f?(x) = 0 and p(x) = ? . By substituting ?(??) for ? , we have ? 1V (x ) = (6.19) ?2 According to Eq. 6.13, we have V (x?) > 1 ? k2m?. Combining Eq. 6.11, 6.14, and 6.19, we get ? < ?exploit = ?l where ? ?2? ? = log (6.20) ?2 ? 1 Similar to the exploration stage, based on Theorem 6.5, we define the pool bound- ary during the exploitation stage as C(x(t?1), ?exploit), a (d ? 1)-sphere with a radius of ?exploit, centered at the previous query x (t?1) (Fig. 6.3). 117 6.4.4 Choosing when to Exploit versus Explore Since we use different rules to generate the pool at the exploitation and exploration stage, we need to distinguish between the two stages at the beginning of each iteration. In the exploitation stage, according to Theorem 6.5, the optimal query lies within the (d ? 1)-sphere C(x(t?1), ?exploit) centered at the previous query. While, according to Theorem 6.1, that same query must lie where the estimated decision boundary and the isocontour of p(x) = ? intersect. Thus, the decision boundary and the isocontour divide the sphere C into four regions (Fig. 6.3): unexplored negative R1 = {x|p(x) > ?, f?(x) < 0}; unexplored positive R2 = {x|p(x) > ?, f?(x) > 0}; explored negative R3 = {x|p(x) < ?, f?(x) < 0}; and explored positive R4 = {x|p(x) < ?, f?(x) > 0}. In contrast, during exploration the estimated decision boundary and the p(x) = ? isocontour do not intersect ? meaning, unlike exploitation, there exist only two of the four regions (either R1 & R3 or R2 & R4). In particular, within the unexplored region, f?(x) will be either all positive or all negative, i.e., R1 and R2 cannot exist simultaneously (Fig. 6.3). We use this property to detect exploitation or exploration by generating a pool (a set of uniformly distributed samples) within the boundary C(x(t?1), ?exploit) and checking if, for samples with p(x) > ? , samples differ in f?(x) > 0 and f?(x) < 0. If so, AES is in the exploitation stage; otherwise it is in the exploration stage. 118 6.5 Theoretical Analysis In this section, we derive a theoretical accuracy bound for AES with respect to its hyperparameters. We further discuss the influence of those hyperparameters on the classification accuracy, the query density, and the exploration speed. The results of this section guide the selection of proper hyperparameters given an accuracy or budget requirement. 6.5.1 Accuracy Analysis It is impossible to discuss the function accuracy across the entire input space, since the input space is unbounded. However, we can consider ways to bound the accuracy within bounded explored regions at any time step. As mentioned in Sect. 6.3.1, p(x) = P (L(x) > ), where L(x) is the misclassifi- cation loss at x defined in Eq. 6.5. Thus within the explored region, we have P (L(x) ? ) ? ? ?x ? {x|p(x) ? ?} or P (L(x) ? ) ? 1? ? ?x ? {x|p(x) ? ?} (6.21) This shows that at any location within the explored region of the input space, the proposed method guarantees an upper bound  of misclassification loss with a probability of at least 1 ? ? at any given point. Since, in the exploration stage, the estimated decision boundary lies inside the p(x) ? ? region (as discussed in Sect. 6.3.2), we have P (L(x) ? ) ? 1? ? ?x ? {x|f?(x) = 0} 119 This means that in the exploration stage, the estimated decision boundary f?(x) = 0 lies in between the isocontours of f(x) = ? with a probability of at least 1 ? ? , where f is the true latent function. Note that Eq. 6.21 shows that AES?s accuracy bound within the explored region is independent of the number of iterations or labeled samples. One advantage of keeping a constant accuracy bound for AES is that the accuracy in the explored region meets our requirements6 whenever AES stops. This also means that the estimation within the explored region is reliable at any iteration (although this is not true if one includes the unexplored region). In contrast, bounded active learning methods usually only achieve required accuracy after a certain number of iterations, before which the estimation may not be reliable. Therefore, AES can be used for real-time prediction of samples? feasibility in the explored region. 6.5.2 Query Density In Gaussian Processes, given a fixed homoscedastic Gaussian or exponential kernel, we can measure the query density by looking at the predictive variance at queried points. According to Eq. 6.3, V (x) only depends on k(x), which is affected by the distances between x and other queries. A smaller variance at a query indicates that it is closer to other queries, and hence a higher query density; and vise versa. Theorem 6.6. The predictive variance of an optimal query in the exploitation and exploration stage is 1 V (xexploit) = (6.22) ?2 6We can set  and ? such that the accuracy bound is as required. Details about how to set hyperparameters are in Sect. 6.5.3. 120 and ( ) 1 | 2f?(xexplore)| V (xexplore) = 1 + (6.23) ?2  where xexploit and xexplore are optimal queries at the exploitation stage and exploration stage, respectively. Proof. According to Eq. 6.19, the predictive variance of an optimal query xexploit in the exploitation stage is 1 V (xexploit) = ?2 While in the exploration stage, we have p(xexplore) = ? at the optimal query xexplore (Theorem 6.2). And by applying Eq. 6.4 and setting ? = ?(??), we have ( ) | | 21 f?(xexplore) V (xexplore) = 1 + ?2  This theorem indicates that the predictive variances of queries at the exploitation stage are always smaller than those at the exploration stage (as |f?(xexplore)| > 0). Thus the query density at the exploitation stage is always higher than that at the exploration stage. The property of having a denser set of points along the decision boundary (queried during the exploitation stage) and a sparser set of points at other regions (queried during the exploration stage) is desirable because we want to save our query budget for refining the decision boundary rather than other regions of the input space. Equation 6.21 and 6.22 also reflect the trade-off between the accuracy and the running time. When the query density near the decision boundary is high (small V (xexploit) in Eq. 6.22), ? is large, thus ? in Eq. 6.21 is small, which means our model will have a higher probability of having a misclassification loss less than . However, 121 as the query density gets higher, we need more queries to cover a certain region, thus the running time increases. 6.5.3 Influence of Hyperparameters There are four hyperparameters that control Active Expansion Sampling ? the initial point x(0),  and ? in the exploitation/exploration stage, and the length scale l of the GP kernel. The choice of the kernel function and length scale depends on assumptions regarding the nature and smoothness of the underlying feasibility function. Such kernel choices have been covered extensively in prior research and we refer interested readers to [172] for multiple methods of choosing l. Note that it is difficult to optimize the length scale at each iteration, since the length scale will eventually be pushed to extremes. In the exploitation stage, for example, once the length scale is smaller than the previous iteration, the distance between the new query and its nearest query will also be smaller (due to Eq. 12). Then the maximum marginal likelihood estimation will result in a smaller length scale, as the estimated function is steeper. This process will repeat and eventually cause the optimal length scale to converge to 0. The initial point x(0) can be any point not too far away from the boundary of feasible regions, since otherwise it will take a large budget to just search for a sample from the opposite class. Here we focus on the analysis of the other two hyperparameters ?  and ?. According to Eq. 6.21,  and ? affect the classification accuracy in a probabilistic way. When ? = ?(??), we have P (L(x) ? ) ? 1 ? ?(??) in the explored region. This offers us a guideline for setting  and ? with respect to a given accuracy requirement. According to Eq. 6.22 and 6.23, ? controls the density of queries in both ex- 122 ploitation and exploration stages. Specifically, as we increase ?, Vexploit and Vexplore decreases, increasing the query density and essentially placing labeled points closer together. In contrast,  only controls the distances between queries in the exploration stage7. Increasing  decreases Vexplore and hence increases the density of queries in the explo- ration stage. This density of queries affects (1) how fast we can expand the explored region, and (2) how likely we are to capture small feasible regions. When ? or  increases, we expand the explored region slower, making it more likely that we will discover smaller feasible regions. Likewise, we also slow down the expansion in ex- ploitation stages, making the classifier more likely to capture a sudden change along domain boundaries. Note that when  = 0, the constraint of p(x) ? ? in Eq. 6.9 is equivalent to f?(x) = 0, thus theoretically all queries should lie near the estimated decision boundary. In this case, the Active Expansion Sampling acts like Uncertainty Sampling [135, 136]. In practice, however, AES will be unable to find a feasible solution when  = 0 since no candidate sample will be exactly on the decision boundary under the pool-based sampling setting. 6.6 Using Design Manifolds to Synthesize Novel Designs It is difficult to directly explore a real-world design space since it is usually high- dimensional, and most points in that space will be unrealistic or nonfunctional. To handle real-world design spaces, our proposed method assumes that design vari- ables original expressed in a high-dimensional design space X usually lie on a lower- 7Technically, due to sampling error introduced when generating the pool, the exploitation stage will be influenced by  (since f?(x?) is only ? 0). But this effect is negligible compared to ?s influence on the exploration stage. 123 Real-world samples Design space Valid design Design manifold Invalid design Real-world samples Figure 6.4: 3D visualization of high dimensional design space showing that design variables actually lie on a 2-dimensional manifold [41, 49]. At a point away from the real-world stemless glass samples, the glass contours are self-intersecting; at another point, the shape becomes a stem glass. dimensional design manifold (Fig. 6.4) [17, 41, 49, 216]. We can thus project designs on that lower-dimensional manifold F by a mapping g : X ? F . Then given any point in that space, we can synthesize new designs by a reverse mapping g? : F ? X . This reduces the problem of exploring the high-dimensional design space X to ex- ploring a corresponding embedding space F . Just like in the original design space, there are boundaries for feasible designs in the embedded space. The function that evaluates feasible domains can thus be expressed as h : F ? {?1, 1}. Figure 6.4 shows an example of a glassware design manifold, where the synthesized contours are self-intersecting at a point away from the real-world samples. We call these invalid designs?designs that are unrealistic or nonfunctional in the real-world. Since real-world samples are all valid designs, normally designs lying between any two real-world samples (i.e., inside the convex hull of all the real-world samples in the embedding space) will also be valid [41, 49]. However, this assumption may not hold for designs that lie on the manifold but beyond the real-world samples. Since they are on the manifold, they obey similar 124 rules of variation and deformation learned from the real-world samples, but because they are away from the real-world samples, there is no guarantee that those designs are functional or aesthetically valid. On a manifold that preserves pairwise distances between samples in the design space, designs far away from the real-world samples will look different from them [41, 49]. If these designs remain valid, despite being far away from the real-world samples, then we are discovering innovative designs. The methods presented in this chapter are one way to achieve this creative exploration, even in high dimensional design spaces. Note that this method assumes that the dimensionality of the design space can be reduced. In cases where this assumption does not hold, we have to directly explore the original high-dimensional design space. 6.7 Experimental Evaluation We evaluate the performance of AES in capturing feasible domains using both synthesized and real-world examples. The performance is measured by the F1 score, which is expressed as ? precision ? recallF1 = 2 precision + recall where true positives precision = true positives + false positives and true positives recall = true positives + false negatives We compare AES with two conventional bounded adaptive sampling methods ? the Neighborhood-Voronoi (NV) algorithm [195] and the straddle heuristic [26]. We also investigate the effects of noise and dimensionality on AES. 125 We use the same pool size (500 candidate samples8) in all the experiments. In Fig. 6.8-6.11 and 6.14, the F1 scores are averaged over 100 runs. We run all 2- dimensional experiments on a Dell Precision Tower 5810 with 16 GB RAM, a 3.5 GHz Intel Xeon CPU E5-1620 v3 processor, and a Ubuntu 16.04 operating sys- tem. We run all higher-dimensional experiments on a Dell Precision Tower 7810 with 32 GB RAM, a 2.4 GHz Intel Xeon CPU E5-2620 v3 processor, and a Red Hat Enterprise Linux Workstation 7.2 operating system. The Python code needed to reproduce our AES algorithm, our baseline implementations of NV and Straddle, and all of our below experiments is available at https://github.com/IDEALLab/ Active-Expansion-Sampling. 6.7.1 Effect of Hyperparameters We first use two 2-dimensional test functions ? the Branin function and Hosaki function, respectively ? as indicator functions to evaluate whether an input is in- side the feasible domain. Both examples construct an input space with multiple disconnected feasible regions, which makes the feasible domain identification task challenging. The Branin function is ( )2 ( ) 5.1 5 1 g(x) = x2 ? x2 + x1 ? 6 + 10 1? cosx1 + 10 4?2 1 ? 8? We define the label y = 1 if x ? {x|g(x) ? 8,?9 < x1 < 14,?7 < x2 < 17}; and y = ?1 otherwise. The resulting feasible domains resemble three isolated feasible regions (Fig. 6.5). The initial point x(0) = (3, 3). For the Gaussian process, we use a 8For NV algorithm, its pool size refers to the test samples generated for the Monte Carlo simu- lation. 126 Exploitation stage I Exploration stage I Exploitation stage II Exploitation stage II Exploration stage II Exploitation stage III Exploitation stage III Exploration stage III Figure 6.5: Querying sequence for Active Expansion Sampling ( = 0.5 and ? = 1.3). The solid lines are estimated decision boundaries, and the dotted lines are the isocontour p(x) = ? . The gray areas are actual feasible regions. 90 queries 180 queries 270 queries 90 queries 180 queries 270 queries (a) Neighborhood-Voronoi algorithm. (b) Straddle heuristic. Figure 6.6: Querying sequence for bounded adaptive sampling methods. The dashed lines are pool boundaries. 127 Exploration stage II Exploration stage I Exploration stage III Exploitation stage I Exploitation stage III Exploitation stage II Figure 6.7: F1 score plot for Fig. 6.5. During exploitation stages, the F1 score increases stochastically as the decision boundary changes; while in the exploration stage, the current decision boundaries have been exploited and do not change, thus the F1 score also does not change. (a) Changing  (? = 1.3). (b) Changing ? ( = 0.3). Figure 6.8: AES with different  and ? on the Branin example. The upper plots show their F1 scores averaged over 100 runs. The lower plots show queried points during one of the 100 runs. 128 Figure 6.9: AES and NV (with different input variable bounds) on the Branin example. The pool boundaries set in the Neighborhood-Voronoi algorithm are shown as dashed lines. Figure 6.10: F1 scores of AES ( = 0.3 and ? = 1.3), NV, and Straddle (with tight bounds) on the Branin example within the explored region (i.e., the p(x) < ? region, where ? = ?(??),  = 0.3, and ? = 1.3). (a) Bernoulli noise. (b) Gaussian noise. Figure 6.11: AES and NV on the Branin example using noisy labels. 129 AES NV Straddle Figure 6.12: Queried points under uniform label noise (p = 0.2). Figure 6.13: The 3-dimensional double-sphere example. The gray regions are the feasible domains. The dashed boxes are the input space bounds for the NV algorithm. (a) F1 scores. (b) Total running time. Figure 6.14: AES and NV on high-dimensional double-sphere examples. 130 Figure 6.15: AES on the Nowacki beam example. Gaussian kernel (Eq. 6.1). We set the kernel length scale l = 0.9. To compute the F1 scores, we generate samples along a 100? 100 grid as the test set in the region where x1 ? [?13, 18] and x2 ? [?8, 23]. This section mainly describes the Branin example ? as both the Branin and Hosaki examples show similar results ? however, we direct interested readers to Sect. 6.8.1, where we describe the Hosaki example in detail and show its experimental results. For both examples, we use three levels of  (0.1, 0.3, 0.5) and ? (1.2, 1.3, 1.4) to demonstrate their effects on AES?s performance. Figures 6.5 and 6.6 show the sequence of queries selected by AES and the two bounded adaptive sampling methods, respectively, applied on the Branin example. For AES, there are three exploitation stages, as there are three disconnected feasible domains. AES starts by querying samples along the initial estimated decision bound- ary, and then expands queries outward to discover other feasible regions. In contrast, the straddle heuristic simultaneously explores the whole bounded input space, and 131 Table 6.1: Input space bounds for the NV algorithm and the straddle heuristic (Branin example). Tight Loose Insufficient x1 ? [?9, 14], x1 ? [?14, 19], x1 ? [?4, 9],Branin x2 ? [?7, 17] x2 ? [?12, 22] x2 ? [?2, 12] refines all three decision boundaries. Fig. 6.7 shows the corresponding F1 scores of the experiment in Fig. 6.5. During exploitation stages, AES?s F1 score non-monotonically increases as part of the estimated decision boundary is outside the explored region (where AES has confidence in the accuracy); while in the exploration stage, the cur- rent decision boundaries are inside the explored region and remain unchanged, thus the F1 score stabilizes. Figures 6.8a and 6.8b demonstrate the effects of hyperparameters  and ?, re- spectively, on AES?s performance. Increasing  or ? leads to slower expansion of the explored region and a higher F1 score. This means that using a higher  or ? enables accuracy improvement but requires a larger query budget. In both examples, the F1 score is more sensitive to ? than . 6.7.2 Unbounded versus Bounded We use the NV algorithm and the straddle heuristic as examples of bounded adaptive sampling methods. Because these two methods do not progressively expand the region (as in AES), but rather assumes a fixed region, we create a ?bounding box? in the input space, and generate queries inside this box. When comparing AES with the bounded methods, we use  = 0.3 and ? = 1.3 for AES. In each experiment, we change the size of the input space bounds to evaluate the effect of bound size on these methods. Specifically, we simulate the cases where 132 we set tight, loose, and insufficient bounds, as shown in Tab. 6.1 and Fig. 6.9. ?Tight? means the bounds cover the entire feasible domain while being no larger than needed (in practice we use bounds slightly larger than this to ensure the feasible domain boundary is inside the tight bounds); ?loose? means the bounds cover the entire feasible domain but are larger than the tight bounds; ?insufficient? means the variable bounds do not cover the entire feasible domain. As shown in Fig. 6.9, the NV algorithm makes fast accuracy improvement at early stages, and slows down after some iterations. The F1 score of NV is almost monotonically increasing; while AES?s score fluctuates because it focuses first on refining the domains it knows about during exploitation (at the expense of accuracy on domains it has not seen yet). This causes AES to have a lower F1 score early on. For the NV algorithm, when the input variable bounds are set properly, both AES and NV achieve similar final F1 scores. However, NV requires more iterations to achieve similar final accuracy to AES, especially when the bounds are set too large, where NV exhausts its query budget exploring unknown regions. When the bounds are set too small to cover certain feasible regions, NV stops improving the F1 score when it begins to over-sample the space and is unable to reach similar accuracy as AES. Note that in this case, we purposefully set the bounds such that they cover the vast majority of the feasible region, leaving only a small feasible area outside of those bounds. Our explicit purpose here is to demonstrate how sensitive such bounded heuristics can be when their bounds are misspecified (even by small amounts). The performance of bounded methods degrades rapidly as their bound sizes decrease further. Although AES shows slow accuracy improvement over the entire test region, it keeps a constant accuracy bound within the explored region (as discussed in Sect. 6.5.1). Fig. 6.10 shows the F1 scores within the p(x) < ? region, which is AES?s explored region. Specifically, we set  = 0.3, ? = 1.3, and ? = ?(??). For the 133 Table 6.2: Final F1 scores and running time (Branin example). F1 score Time (s) AES ( = 0.3, ? = 1.3) 0.90? 0.004 92.34? 0.62 AES ( = 0.1, ? = 1.3) 0.87? 0.008 95.71? 0.37 AES ( = 0.5, ? = 1.3) 0.90? 0.002 89.71? 0.38 AES ( = 0.3, ? = 1.2) 0.87? 0.006 96.73? 0.26 AES ( = 0.3, ? = 1.4) 0.91? 0.002 80.70? 0.33 NV (tight) 0.83? 0.021 64.40? 0.09 NV (loose) 0.75? 0.030 63.68? 0.06 NV (insufficient) 0.41? 0.028 63.83? 0.06 Straddle (tight) 0.82? 0.012 43.72? 0.22 Straddle (loose) 0.71? 0.014 41.72? 0.22 Straddle (insufficient) 0.34? 0.009 54.44? 0.21 NV algorithm, we use the tight input space bounds from the previous experiments. By just considering the explored region, AES?s F1 scores are quite stable throughout the sampling sequence; while NV?s F1 scores are low at the beginning, and then in- crease until stable.9 Since AES?s accuracy inside the explored region is invariant of the number of iterations, it can be used for real-time prediction of samples? feasibility in the explored region. Table 6.2 shows the final F1 scores and wall-clock running time of AES, NV, and the straddle heuristic. Note that the confidence interval for NV?s averaged F1 scores are much larger than AES. This is because during some runs NV fails to discover all the three feasible regions (Fig. 6.9 for example). 6.7.3 Effect of Noise Label noise is usually inevitable in active learning tasks. The noise comes from, for example, simulation/experimental error or human annotators? mistakes. We test 9This difference is because NV?s explored region covers more area than AES at the beginning. 134 Branin (350 queries) the cases where the labels are under (1) uniform noise and (2) Gaussian noise centered at the decision boundary. We simulate the first case by randomly flipping the labels. The noisy label is set as y? = (?1)?y, where ? ? Bernoulli(p), p is the parameter of the Bernoulli distribution that indicates the noise level, and y is the true label. The second case is probably more common in practice, since it is usually harder to decide the labels near the decision boundary. To simulate this case, we add Gaussian noise to the test functions: g?(x) = g(x) + e, where g(x) is the Branin or Hosaki function, and e ? s ? N (0, 1). In each case, we compare the performance of AES ( = 0.3, ? = 1.3) and NV (with tight bounds) under two noise levels. As expected, adding noise to the labels decreases the accuracy of both methods (Fig. 6.11a and 6.11b). However, in both cases (Bernoulli noise and Gaussian noise), the noise appears to influence NV more than AES. As shown in Fig. 6.12, when adding noise to the labels, NV has high error mostly along the input space boundaries, where it cannot query samples outside to further investigate those apparent feasible regions. In contrast, AES tries to exploit those rogue points to try to find new feasible regions, realizing after a few new samples that they are noise. 6.7.4 Effect of Dimensionality To test the effects of dimensionality on AES?s performance, we apply both AES and NV on higher-dimensional examples where the feasible domains are inside two (d? 1)-spheres of radius 1 centered at a and b respectively. Here a = 0 and b = (3, 0, ..., 0). Fig. 6.13 shows the input space of the 3-dimensional double-sphere example. The initial point x(0) = 0. For the Gaussian process, we use a Gaussian kernel with a 135 length scale of 0.5. We set  = 0.3 and ? = 1.3. To compute the F1 scores, we randomly generate 10,000 samples uniformly within the region where x1 ? [?2, 5] and xk ? [?2, 2], k = 2, ..., d. The input space bounds for the NV algorithm are x1 ? [?1.5, 4.5] and xk ? [?1.5, 1.5], k = 2, ..., d. We get the F1 scores and running time after querying 1,000 points. As shown in Fig. 6.14, both AES and NV shows an accuracy drop and running time increase as the problem?s dimensionality increases. This is expected, since based on the curse of dimensionality [14], the number of queries needed to achieve the same accuracy increases with the input space dimensionality. The curse of dimensionality is inevitable in machine learning problems. However, since AES explores the input space only when necessary (i.e., only after it has seen the entire decision boundary of the discovered feasible domain), its queries do not need to fill up the large volume of the high-dimensional space. Therefore, AES?s accuracy drop with problem dimensionality is not as severe as bounded methods like NV. For particularly high-dimensional design problems, another complementary approach is to construct explicit lower-dimensional design manifolds upon which to run AES [44, 49]. 6.7.5 Nowacki Beam Example To test AES?s performance in a real-world scenario, we consider the Nowacki beam problem [157]. The original Nowacki beam problem is a design optimiza- tion problem where we minimize the cross-section area A of a cantilever beam of length l with concentrated load F at its end. The design variables are the beam?s breadth b and height h. We turn this problem into a feasible domain identification problem by replacing the objective with a constraint A = bh ? 0.0025m2. Other constraints are (1) the maximum tip deflection ? = Fl3/(3EIY ) ? 5mm, (2) the 136 maximum bending stress ?B = 6Fl/(bh 2) ? ?Y , (3) the maximum shear stress ? = 1.5F/(bh) ???Y /2, (4) the ratio h/b ? 10, and (5) the failure force of buck- ling F 2crit = (4/l ) (GIT )(EIZ)/(1? ?2) ? fF , where IY = bh3/12, IZ = b3h/12, IT = IY + IZ , and f is the safety factor. And ?Y , E, ?, and G are the yield stress, Young?s modulus, Poisson?s ratio, and shear modulus of the beam?s material, respectively. We use the settings from [195], where l = 0.5m, F = 5kN, f = 2, ?Y = 240MPa, E = 216.62GPa, ? = 0.27, and G = 86.65GPa. As shown in Fig. 6.15, the feasible domain is a crescent-shaped region. Given only these constraints, it is unclear what appropriately tight bounds on the design variables should be. In this experiment, we set the Gaussian kernel?s length scale as 0.005,  = 0.3 and ? = 1.3. The initial point x(0) = (b0, h0) = (0.05, 0.05). The test samples are generated along a 100? 100 grid in the region where b ? [0, 0.02] and h ? [0.1, 0.16]. After 242 iterations, the F1 score of AES reaches 0.933 and remains constant. Note that mostly the estimation error comes from the two sharp ends of the crescent- shaped feasible region (Fig. 6.15). This is because the kernel?s assumption on function smoothness (i.e., similar inputs should have similar outputs) causes the GP to have a bad performance where the labels shift frequently. The similar problem also exists when using other classifiers like SVM, where a kernel is also used to enforce similar outputs between similar inputs. This problem can be alleviated by using a smaller kernel length scale. 6.7.6 Detecting Novel Designs We use two other real-world design examples to show how our proposed method discovers novel designs. Similar to the airfoil example, we represent each design with 100 Cartesian coordinates from their 2D outlines, and use the inverse mapping 137 Figure 6.16: Some of the initial designs used in the stemless glassware example. g : F ? R2 ? X ? R200 generated by PCA to synthesize samples. Different from previous examples, we use human judgment as the feasibility cri- terion. Although we can use some rules to determine the feasibility (e.g., whether the outlines of a design are self-intersecting), a design?s actual feasibility may depend on more complicated factors (e.g., functionality and aesthetics). Therefore, initially we start with limited number and styles of real-world designs and do not have labels (i.e., valid or invalid) for synthesized designs outside the real-world design domain until human assessment. In the following two experiments, we select a sample in F , and synthesize a design x ? X , where x consists of the Cartesian coordinates of that design?s outline [49]. Then we use an interface that shows a human oracle the picture of that design, and then the oracle labels the design valid or invalid based on whether the designed shape is aesthetically pleasing. This process forms the function h : F ? {?1, 1}. The specific choice of human oracle and interface is not central to the contributions of this chapter, serving only as an illustrative real-world example. In practice, we need to take human annotators? mistakes and disagreements into consideration. For example, we can model human disagreement by adding a white noise kernel to the Gaussian process kernel function. The white noise kernel is expressed as k(x,x?) = r if x = x? and 0 otherwise, where r is the noise level. In the future, we will look into ways of dealing with the problem of input-dependent noise level (i.e., different degree of human mistakes and disagreements for different design instances) [177, 193]. 138 Figure 6.17: The discovered feasible domain and valid designs. The top figure shows the initial and queried samples and the estimated feasible domain in the embedding space F The solid dots represent valid designs, while the hollow dots represent invalid ones. Started with the stemless glasses shown in Fig. 6.16, the proposed method discovered other types of revolved objects such as vases and bowls. Figure 6.18: Some of the initial designs used in the bottle example. Example: stemless glasses as initial samples. In this example, we used only stemless glasses as initial designs (Fig. 6.16). We set the length scale of the Gaussian kernel as l = 2.4, and the margin  = 1.5. As shown in Fig. 6.17, on the given design manifold, our proposed method discovered a feasible domain with other revolved objects such as vases and bowls. The new designs increasingly differ as they get farther away from the initial design samples. Example: bottles as initial samples. Similar to the previous example, we used only bottles as initial designs (Fig. 6.18). We set the length scale of the GP classifier l = 1.3, and the margin  = 0.7. As shown in Fig. 6.19, our proposed method 139 Figure 6.19: The discovered feasible domains and valid designs. The solid dots represent valid designs, while the hollow dots represent invalid ones. Started with the bottles shown in Fig. 6.18, the proposed method discovered two feasible domains, between which there are designs with self-intersecting contours. discovered two feasible domains. In the left feasible domain, there are designs that look like bowling balls and flasks; and in the right feasible domain, there are designs that still look like bottles, but with a larger aspect ratio. Between these two feasible domains, designs have self-intersecting contours and thus are invalid. 6.8 Additional Experimental Results 6.8.1 Hosaki Example We use the Hosaki example as an additional 2-dimensional example to demonstrate the performance of our proposed method. Different from the Branin example, the Hosaki example has feasible domains of different scales. Its feasible domains resemble two isolated feasible regions ? a large ?island? and a small one (Fig. 6.20a). The 140 (a) Changing  (? = 1.3). (b) Changing ? ( = 0.3). Figure 6.20: AES with different  and ? on the Hosaki example. Figure 6.21: AES and NV (with different input variable bounds) on the Hosaki example. 141 (a) Bernoulli noise. (b) Gaussian noise. Figure 6.22: AES and NV on the Hosaki example using noisy labels. Table 6.3: Input space bounds for the NV algorithm and the straddle heuristic (Hosaki example). Tight Loose Insufficient x1 ? [0, 6], x1 ? [?2.5, 8.5], xHosaki 1 ? [1, 6], x2 ? [0, 5] x2 ? [?3, 8] x2 ? [0, 4.5] Hosaki function is ( ) g(x) = 1? 8x + 7x2 ? 7 1x3 + x4 x2e?x21 1 3 1 4 1 2 We define the label y = 1 if x ? {x|g(x) ? ?1, 0 < x1, x2 < 5}; and y = ?1 otherwise. For AES, we set the initial point x(0) = (3, 3). We use a Gaussian kernel with a length scale l = 0.4. The test set to compute F1 scores is generated along a 100?100 grid in the region where x1 ? [?3, 9] and x2 ? [?3.5, 8.5]. For NV and straddle, the input space bounds are shown in Tab. 6.3. Table 6.4 shows the final F1 scores and running time of AES, NV, and the straddle heuristic. Fig. 6.20 shows the F1 scores and queries under different  and ?. Fig. 6.20. Fig. 6.21 compares the performance of AES and NV with different boundary sizes. 142 Table 6.4: Final F1 scores and running time (Hosaki example). F1 score Time (s) AES ( = 0.3, ? = 1.3) 0.95? 0.003 28.25? 0.25 AES ( = 0.1, ? = 1.3) 0.94? 0.004 30.86? 0.19 AES ( = 0.5, ? = 1.3) 0.95? 0.002 28.32? 0.33 AES ( = 0.3, ? = 1.2) 0.94? 0.003 31.69? 0.45 AES ( = 0.3, ? = 1.4) 0.96? 0.002 26.39? 0.38 NV (tight) 0.95? 0.003 22.58? 0.03 NV (loose) 0.93? 0.004 22.28? 0.03 NV (insufficient) 0.69? 0.010 22.27? 0.03 Straddle (tight) 0.95? 0.002 16.20? 0.19 Straddle (loose) 0.88? 0.005 14.00? 0.14 Straddle (insufficient) 0.69? 0.010 16.92? 0.25 Fig. 6.22 shows the performance of AES and NV under Bernoulli and Gaussian noise. 6.8.2 Results of Straddle Heuristic In this section, we list experimental results related to the straddle heuristic. Specif- ically, Fig. 6.23 shows straddle?s F1 scores and queries using different sizes of input variable bounds, and the comparison with AES. Fig. 6.24 shows the comparison of AES and straddle under noisy labels. 6.9 Summary We presented a pool-based sampling method, AES, for identifying (possibly dis- connected) feasible domains over an unbounded input space. Unlike conventional methods that sample inside a fixed boundary, AES progressively expands our knowl- edge of the input space under an accuracy guarantee. We showed that AES uses successive exploitation and exploration stages to switch between learning the deci- sion boundary and searching for new feasible domains. To avoid increasing the pool 143 Hosaki (200 queries) (a) Branin example. (b) Hosaki example. Figure 6.23: AES and straddle (with different input variable bounds). (a) Branin example under Bernoulli noise. (b) Branin example under Gaussian noise. (c) Hosaki example under Bernoulli noise. (d) Hosaki example under Gaussian noise. Figure 6.24: AES and straddle under noisy labels. 144 size and hence the computation cost as the explored area grows, we proposed a dy- namic local pool generation method that samples the pool locally at a certain location in each iteration. We showed that at any point within the explored region, AES guar- antees an upper bound  of misclassification loss with a probability of at least 1? ? , regardless of the number of iterations or labeled samples. This means that AES can be used for real-time prediction of samples? feasibility inside the explored region. We also demonstrated that, compared to existing methods, AES can achieve comparable or higher accuracy without needing to set exact bounds on the input space. Note that AES cannot be applied to input spaces where synthesizing a useful sample is difficult. For example, in an image classification task, we cannot directly synthesize an image by arbitrarily setting its pixels, since most of the synthesized images may be unrealistic and hence useless. Usually in such cases, we use real-world samples as the pool and apply bounded active learning methods (since we know the bounds of real-world samples). Or instead, we first embed the original inputs onto a lower-dimensional space, such that given the low-dimensional representation, we can synthesize realistic samples. We can then apply AES on that embedded space. This approach can be used for discovering novel feasible domains (i.e., finding feasible inputs that are nonexistent in the real-world). We refer interested readers to a detailed introduction of this approach by [44]. One limitation of AES is that the accuracy improves slowly at the early stage compared to bounded active learning methods. This is because AES focuses on only the explored region (which is small at the beginning), while bounded active learning methods usually do space-filling at first. In the situation where we want fast accuracy improvement at the beginning, one possible way of tackling this problem is by dynamically setting AES?s hyperparameters. Specifically, since the expansion speed increases as  or ? decreases, we can accelerate AES?s accuracy improvement 145 at earlier stages by setting small values of  and ?, so that queries quickly fill up a larger region. Then to achieve high final accuracy, we can increase  and ? to meet the accuracy requirement. So far this dissertation has shown how to measure the intrinsic complexity of a design space to guide data-driven design synthesis, how to incorporate prior knowledge into data-driven design synthesis models, and how to identify feasible domains and discover novel designs when the design space or the latent space bounds are unclear. The next chapter will introduce a global optimization method that also expands an input space to allow the discovery of the optimal design being far away from existing designs. 146 Chapter 7: Global Optimization with Trust Region Bayesian Opti- mization The work in this chapter has been submitted to the Conference on Neural Information Processing Systems (NeurIPS). 7.1 Introduction Bayesian optimization (BO) is a global optimization technique targeted for expen- sive black-box functions [191]. Particularly, one of its important application in the machine learning community is automated hyperparameter tuning [201, 204, 207]. In a standard BO process, the objective function is modeled as a random function with a prior distribution. This prior updates to form a posterior after new observations (i.e., a Gaussian process or GP [172]). The decision about which observation to collect next is made by globally maximizing an acquisition function based on the posterior. This step requires fixed variable bounds, which are sometimes not trivial to set. It is hard to guarantee that any fixed bounds will include the true global optimum. In this chapter, we modified the standard BO approach so that the fixed vari- able bounds are not required. When the search space is unbounded, the acquisition function can have suprema at infinity, where the uncertainty is maximized. Thus we search only in the region with sufficiently low uncertainty, which we referred to as the trust region in reference to conceptually similar approaches from trust region 147 optimization. This trust region expands as we add more observations. We call this method Trust Region Bayesian Optimization (TRBO). The main technical contributions of this chapter are: 1. A trust-region-based acquisition strategy to bound the GP model uncertainty and expand the search space; and 2. Theoretical results regarding how to adaptively set the threshold of the uncer- tainty bound to avoid the over-exploration problem that occurs in an expanding search space. 7.2 Bayesian Optimization Bayesian Optimization uses a sequential strategy to search for the global opti- mum of expensive black-box functions. Assuming we have an objective function: f : Rd ? R, and the observation of its output has Gaussian noise: y ? N (f(x), ?2n). It is expensive to evaluate either the function f or its gradient (assuming we can only approximate the gradient by the finite difference method when f is a black-box func- tion and that Automatic Differentiation methods cannot be used). Thus the goal of BO is to minimize the number of evaluations needed to find the global minimum. BO treats the objective as a random function that has a prior distribution, and update this prior to form a posterior distribution over the function after observing data. This can be done by using a Gaussian process (GP). The posterior distribution can then be used to form an acquisition criterion that proposes to evaluate f at a promising point, so that the regret is minimized. The GP posterior can then be updated after the new observation. This process repeats until the evaluation budget runs out or a satisfied solution is achieved. We will elaborate on the Gaussian process and the acquisition function in the following sections. 148 7.2.1 Gaussian Process The Gaussian process (GP) [172] estimates the distribution of the objective func- tion. A kernel (covariance) function k(x,x?) is used to measure the similarity between two points x and x?. It encodes the assumption that ?similar inputs should have sim- ilar outputs?. The specific choice of kernel is not central to the core contributions of the chapter. In this chapter we use the RBF kernel: ( ) 1 k(x,x?) = exp ? (x? x?)>??1(x? x?) (7.1) 2 where ? = diag(l21, ..., l 2 d) with li the length scale of the i-th dimension. Given N observations D = (X,y) = {(xi, yi)|i = 1, ..., N}, the GP posterior f(x) at any point x is a Gaussian distribution: f(x)|D,x ? N (?(x), ?2(x)) with the mean and the variance expressed as ?(x) = k(x)>(K + ?2I)?1n y (7.2) ?2(x) = k(x,x)? k(x)>(K + ?2I)?1n k(x) (7.3) where k(x) is an N -dimensional vector with the i-th dimension being k(x,xi), and K is an N ?N covariance matrix with Kij = k(xi,xj). 7.2.2 Acquisition Function Bayesian optimization picks the next point to evaluate by maximizing an acqui- sition function, which is computed based on the GP posterior. Common acquisition functions include the probability of improvement (PI) [127], the expected improve- ment (EI) [108], the Gaussian Process upper confidence bound (GP-UCB) [205], and 149 those based on entropy search [93, 94, 229]. In this chapter, we use EI as our acquisition function. It measures the expected amount of improvement over the current best solution based on the learned GP model: EI(x) = ?E[max{0, f(x)? f ?}]+? = (f ? f ?)N (f ;?(x), ?2(x))df (7.4) f ? = ?(x)(u?(u) + ?(u)) where f ? is the current best objective function value, u = (?(x) ? f ?)/?(x), and ? and ? are the cumulative density function (CDF) and probability density function (PDF) of the standard normal distribution, respectively. 7.2.3 Previous Work on Unbounded Bayesian Optimization Normally Bayesian optimization is performed within fixed variable bounds. But in cases such as algorithm hyperparameter tuning [190, 204, 207] and shape optimiza- tion [164], setting the variable bounds are not trivial. It is hard to guarantee that any fixed bounds will include the true global optimum. Two types of solutions were proposed to handle this problem: 1) performing BO in an unbounded space by regularization via non-stationary prior means so that the acquisition function?s suprema will not be at infinity [170, 190]; and 2) performing BO in ?soft bounds? that are gradually expanded over iterations [155, 156, 190]. The first solution computes an acquisition function that is biased toward regions near some user-specified center point, thus insufficient exploitation may occur when the optimal solution is far from the center. The second solution either expands each direction equally, which often yields to unnecessarily large search spaces [170, 190]; or expands only to the promising region where the upper confidence bound (UCB) is 150 larger than the lower confidence bound (LCB) of the current best solution [155, 156]. But the latter approach has to perform global search twice at each iteration?one for the maximum LCB to filter out the non-promising region; and one for the maximum acquisition function value. It expands the bounds of the first search according to a hard-coded rule, and hence may show a lack of adaptability to different optimization problems. The trust region Bayesian optimization is different from the previous methods in that it adaptively expands the search space based on the uncertainty of the GP model. It can essentially avoid the aforementioned issues by employing a strategy that we will introduce in the following section. 7.3 Trust Region Bayesian Optimization In this section we will introduce the main ingredients of TRBO, namely, its acquisi- tion strategy (Sect. 7.3.1), global optimization of the acquisition function (Sect. 7.3.2), the way of adaptively balancing exploration and exploitation (Sect. 7.3.3), and a trick to improve exploitation when expanding the search space (Sect. 7.3.4). 7.3.1 Acquisition Strategy Our acquisition strategy can be expressed as the following constrained optimiza- tion problem: max EI(x) x?Rd (7.5) s.t. ?2(x) ? ?k0 where ? ? (0, 1) is a coefficient controlling the aggressiveness/conservativeness of exploration, and k = ?20 (x?) with x? denotes a point infinitely far away from the observations. Based on Eq. 7.3, we have k0 = ? 2(x?) = k(x,x). When using the 151 kernel shown in Eq. 7.1, simply we have k0 = 1. Under this acquisition strategy, only points with low GP model uncertainty will be picked for evaluation. We also included the acquisition strategy for the constrained BO setting in the supplementary material. To avoid excessive local exploitation, we can modify the equation for EI (Eq. 7.4) as EI(x) = E[max{0, f(x)? (f ? + )}] (7.6) where  > 0 is the minimum improvement parameter [107, 189, 190]. In many real-world cases, Bayesian optimization needs to deal with constrained problems of the following two kinds: 1) there are infeasible regions in the input space (e.g., some experimental configurations are infeasible); and 2) the objective function f does not have definition in some regions of the input space (e.g., when the hyperparameters of a neural network are not properly chosen, exploding gradients may occur, which may lead to NaN weight values and hence the NaN accuracy). This is especially common when the we have an unbounded or expanding search space. Therefore, it is worth extending TRBO to make it suitable for constrained problems. Specifically, we can modify Eq. 7.5 based on Refs. [11] and [78]: max EI(x)Pr(C(x)) x?Rd s.t. ?2(x) ? ?k (7.7)0 Pr(C(x)) ? 0.5 where C(x) is an indicator of whether the constraints are satisfied or whether the objective function has definition. 152 Figure 7.1: Feasible domain bounds. In each iteration, we expand the minimum bounding box of evaluated samples along the i-th axis by ri. 7.3.2 Feasible Domain Bounds The feasible domain defined by Eq. 7.5 is bounded by the isocontour ?2(x) = ?k0. However, it is easier to search inside a bounding box instead of an irregular isocontour when solving the global optimization problem in Eq. 7.5. We can show that the solution to Eq. 7.5 is inside a bounding box, which we call the feasible domain bounds. In this section, we will derive the feasible domain bounds. For any point x? on the isocontour (Fig. 7.1), i.e., ? 2(x? ) = ?k0, based on Eq. 7.3 we have k0 ? k>? Ak? = ?k0, or k>? Ak? = (1? ?)k0 (7.8) where A = (K + ?2nI) ?1 and k? = k(x? ). Since A is symmetric, we have ? k>k ? k>Ak >min ? ? ? ? ? ?maxk? k? , where ?min and ?max are the smallest and largest eigenvalue of A, respectively. Thus k > ? k? have the following bounds for any x? : (1? ?)k /? ? k>0 max ? k? ? (1? ?)k0/?min (7.9) 153 Suppose xnl is the nearest evaluated point to x? (Fig. 7.1), the following inequality holds: k>? k? < Nk 2(xnl,x? ) (7.10) where N > 1 is the number of evaluated points. According to Eq. 7.9 and Eq. 7.10, we have Nk2(xnl,x? ) > (1? ?)k0/?max (7.11) for any x? . Also, based on Eq. 7.1, we have ( ?d ( ) )2 k2 ?i (xnl,x? ) = exp ? (7.12) l i=1 i where ? = |xnl ? x? |. Substituting Eq. 7.12 into Eq. 7.11, we get the following inequality ?d ?2i < 1 (7.13) Cl2 i=1 i where C = ? log((1 ? ?)k0/(N?max)). Equation 7.13 shows that x? is inside a d- dimensional hyperellipsoid that centered at xnl with ? ? ri = Cli = li ? log((1? ?)k0/(N?max)) (7.14) corresponding to half the[length of the i-th principal axis. ]Thus by setting the bounds (j) (j) of the i-th dimension as minj{xi } ? ri,maxj{xi }+ ri , we can include the entire feasible domain of Eq. 7.5. This means that in each iteration, we get the minimum bounding box of all evaluated samples, and expand the bounding box along the i-th axis by ri (Fig. 7.1). Then constrained global optimization of the acquisition function is performed within the new bounds. 154 FBO-EI TRBO (?=0.99) TRBO Current-best True optimum GP Posterior True function Evaluated points New evaluation Figure 7.2: Over-exploration in FBO-EI (left) and TRBO with large ? (middle); By adap- tively setting ? , TRBO enforces exploitation based on an accuracy criterion (right). FBO-EI (ybest=4.82) TRBO (ybest=0.01) True optimum Solution found Evaluations Figure 7.3: In contrast with TRBO (right), FBO-EI (left) spends too much budget on randomly exploring the search space (left). In practice, because Eq. 7.10 is usually quite loose (especially when N is large), the above derived bounds are usually unnecessarily large, causing large volume of infeasible domain inside the bounds. In that case, we can replace ?max with ?min in Eq. 7.11, i.e., substituting the upper bound of k>? k? (Eq. 7.9) into Eq. 7.11. 7.3.3 Adaptive Exploration-Exploitation Trade-off A problem of an expanding search space is that a new evaluation may get too far away from the region of interest, due to the high uncertainty and hence the high EI in far-away regions (Fig. 7.2). The informativeness of those high-uncertainty regions, however, is low because of the sparsity of observed data near them. Thus sampling at 155 those regions is like shooting in the dark. This is fine in the fixed-bound BO, because it will exploit the region of interest eventually after finishing exploring those high- uncertainty regions. However, with expanding bounds, the high-uncertainty regions are expanding and BO could continuously sample in those regions and never head back to exploit the region of interest. As a result, the algorithm will spend too much budget on randomly exploring the search space but have insufficient exploitation, as shown in the left plot of Fig. 7.3. We call this over-exploration. This over-exploration problem can exist in every unbounded Bayesian optimiza- tion algorithm with an aggressive expansion strategy. Trust region BO can solve this problem by avoiding exploring in regions where our estimated model is uncertain (i.e., constraining the GP?s predictive variance ?2(x), see Eq. 7.5). However, one has to choose a proper coefficient ? to set the uncertainty threshold. In this section, we derive a way of setting ? adaptively to balance exploration and exploitation as the search space expands. The simplest way to avoid over-exploration is to force the algorithm to stop ex- ploring and start to refine the solution by exploiting near the current best point. In TRBO, exploration is performed by sampling points along the feasible domain boundary (i.e., ?2(x) = ?k0). Thus, we can avoid over-exploration by decreasing ? so that the expected improvement on the boundary is lower than that near the current best solution x?. The expected improvement on the boundary can be expressed via the predictive mean ?? : ( ) ( ? ? f ? ? ? ? f ?)? ? ?EI? (?? ) = (?? ? f )? ? + ?k0? ? (7.15) ?k0 ?k0 156 Also we have ( ) max {EI? (?? )} = EI? max {??} = EI? (?m) (7.16) ?2(x)=?k 20 ? (x)=?k0 since EI monotonically increases with the predictive mean. The expected improvement near the current best solution x? is ( ) ( ) ? ? ?+ ? f ? ? ?+ ? f EI+ = (?+ f )? + ?+? (7.17) ?+ ?+ where ?+ and ?+ are the predictive mean and standard deviation respectively at a point x+ near the current best solution. Assuming that the GP mean function ?(x) is Lipschitz continuous, we have f ???+ = ?, where ? is a small positive real number. Thus we have u+ = ??/?+. Now we can set EI+ > EI? (?m) to encourage exploitation. However, we do not want pure exploitation. Specifically, we want to stop exploitation at x+ whenever the room for improvement over the current solution f ? within the neighborhood of x? is sufficiently low with a high probability: Pr(f+ ? f ? ? ?) ? 1? ? (7.18) where f ? N (? , ?2+ + +), and ? ? 0 and 0 < ? < 1 are small real numbers. From Eq. 7.18 we can derive ?+ ? ? + ? = ?0 ??1(1? ?) Thus we only need to exploit at x+ when ?+ > ?0. By substituting it into Eq. 7.17, 157 we get a lower bound for EI+: EI+ > ???(??/?0) + ?0?(??/?0) = EI0 (7.19) since EI monotonically increases with the predictive variance. We can set this lower bound EI0 equal to EI? (?m) to enforce EI+ > EI? (?m) when ?+ > ?0 (i.e., when exploitation is necessary). Using Eq. 7.15, we can write EI? (?m) = EI0 as ( ?) ? ( ) ? ? ??m ? f ??m ? f ? (?m f )? + ?k0? = EI0 (7.20) ?k0 ?k0 We can solve for ? by using any root finding algorithm (e.g., Newton?s method). At the beginning of the optimization process, we do not need to make sure the room for improvement over f ? is small within the neighborhood of x?. Rather, we want to explore other regions that may contain better local optima. Thus we can set ? = ?0 at the beginning, where a larger ?0 allows more exploration, and then anneal ? over iterations until ? = 0 (e.g., towards the end of a computational budget). As a result, TRBO?s focus gradually switches from exploration to exploitation. In practice, we can set ?m as the prior mean (0 by default), since it is usually the case when over-exploration occurs. Thus this adaptive approach can effectively avoid over-exploration, as shown in Fig. 7.2 and Fig. 7.3. If in reality ?m < 0, then EI? (?m) < EI0 < EI+, TRBO will exploit near the current best solution even when it is unnecessary (i.e., ?+ < ?0); while if ?m > 0, then EI? (?m) > EI0, TRBO may explore when it should exploit. 158 7.3.4 Local Search for Better Exploitation In practice, to perform the global optimization of Eq. 7.5, we can sample initial candidate solutions within the bounds derived in Sect. 7.3.2, and refine those solu- tions using a gradient-based optimization method (e.g., L-BFGS-B [29]). If the GP kernel is fixed, the search space is always expanding, because ? is monotonically non- increasing as the number of observation increases. Specifically, the space near the queried point will be added to the search space volume. This results in a volume increase that is exponential with respect to the search space dimensionality. It will become harder for the candidate solutions to maintain the coverage of the search space as the optimization proceeds, especially when the problem has high dimensionality. Although we keep increasing exploitation by annealing ?, it does not guarantee that we will exploit near the current best solution in a large search space. This problem was not addressed in previous unbounded Bayesian optimization methods [155, 190]. A straightforward way to solve the problem is to increase the density of search al- gorithms, but this continuously increases the computational cost for each iteration. Alternatively, we propose local search near the current best solution to allow better exploitation. Specifically, in each iteration, we generate the same number of candi- date solutions but divide it for two tasks?global search and local search. Global search tries to find a promising point in the entire feasible domain in Eq. 7.5; while local search tries to find a promising point near the current best solution. This avoids insufficient exploitation but will not increase the computational cost. The optimization process is summarized in Algorithm 4. 159 Algorithm 4 Trust region Bayesian optimization 1: . Given objective function f , initial bounds B, initial evaluation n, and evaluation budget N 2: procedure Optimize(f,B, n,N) 3: Sample n points {x1, ...,xn} in B using LHS 4: yi ? f(xi), ?i = 1, ..., n 5: D ? {(x1, y1), ..., (xn, yn)} 6: for t = (n+ 1) : N do 7: Fit the GP model M to D 8: Compute ? based on Eq. 7.20 9: Expand the minimum bounding box of D (Eq. 7.14) as the feasible domain bounds B? 10: Search f?or the solution xt to Eq. 7.5 or Eq. 7.7 inside B ? 11: yt ? f(xt) 12: D ? D {(xt, yt)} 13: end for 14: end procedure 7.4 Experiments We evaluate the TRBO on both a range of synthetic test functions and an MLP hyperparameter optimization task. We also demonstrate the effect of dimensionality on TRBO?s performance and the significance of adaptive exploration-exploitation trade-off in TRBO. 7.4.1 Experimental Protocol The evaluation budget was set to 50d, and an initial sample size of 5d was drawn by using Latin hypercube sampling [108, 149]. For simplicity we used an isotropic kernel for the GP (i.e., l1 = ... = ld). We normalized the observed function outputs before fitting a GP regression model. We set ?0 = 0.1, ? = 0.1,  = 0.01, and ? = 0.01. For each test function, we set the initial bounds to be [10%, 30%] of its original bounds, as was also configured in Ref. [155]. All the initial bounds do not 160 Table 7.1: Optimization results for synthetic benchmarks Method SixHumpCamel Branin Rastrigin Hartmann3 TRBO ?1.03? 0.00 0.41? 0.01 0.21? 0.39 ?3.74? 0.17 FBO-EI ?0.97? 0.05 1.27? 1.49 4.52? 2.29 ?1.50? 0.93 FBO-UCB ?0.85? 0.09 1.43? 0.61 5.50? 2.60 ?2.31? 1.34 EI-Q ?0.28? 0.37 2.95? 1.73 8.10? 1.47 ?2.43? 0.65 EI-H ?0.47? 0.46 1.89? 1.00 7.39? 1.29 ?3.41? 0.25 Method Hartmann6 Beale Rosenbrock TRBO ?3.30? 0.04 0.26? 0.29 0.44? 0.40 FBO-EI ?3.30? 0.03 0.41? 0.32 7.72? 9.25 FBO-UCB ?3.26? 0.04 0.46? 0.37 17.39? 33.04 EI-Q ?2.32? 0.23 4.25? 2.83 17.45? 20.12 EI-H ?2.82? 0.15 3.87? 3.31 20.63? 19.71 include global optima. We compared TRBO to methods from Ref. [190] (i.e., EI-Q and EI-H) and Ref. [155] (i.e., FBO-EI and FBO-UCB). 7.4.2 Synthetic Benchmarks We used seven standard global optimization test functions. As shown in Table 7.1, TRBO out-performs other methods on most test functions. Note that TRBO?s results have lower variance compared to other methods, which is an indication of robustness. Figure 7.4 shows the optimization history on benchmark functions. It shows that compared to the other two state-of-the-art methods, trust region Bayesian optimiza- tion (TRBO) converged faster and achieved a better solution in most cases. We also demonstrated the effects of problem dimensionality on TRBO?s perfor- mance by using two synthetic benchmarks, as shown in Fig. 7.5. Here we define the optimality gap e = ysol?yopt, where ysol and yopt are the minimal observation and the true minimum of the objective function, respectively. We compared TRBO to: 1) the other two state-of-the-art methods?FBO-EI and EI-H, and 2) the standard BO with 161 Figure 7.4: Optimization history for synthetic benchmarks (noisy) (noisy) Figure 7.5: The effect of problem dimension on optimality gaps. 162 Best-So-Far Observation (adaptive) Figure 7.6: The effect of fixed and adaptive ? on optimality gaps. Here FBO-EI is shown as a baseline. the original function bounds (which include global optima). We also tested the case where the observations are corrupted by Gaussian noise with a standard deviation of 0.1. The results show that the optimality gap increases with the problem dimension, which can be explained by the curse of dimensionality [14]. TRBO demonstrated the best performance among methods dealing with an unbounded or expanding search space, and is almost as good as the standard BO on the Rastrigin function. Since the global optimum of the Rosenbrock function is inside a narrow flat valley, it is trivial to find the valley but difficult to converge to the global optimum. Thus it requires large budget for exploitation in that valley (i.e., exploitation-intense). Methods like FBO- EI, EI-H, and the standard BO may have unnecessarily large search space and hence waste budget on exploring regions far from the global optimum, rather than exploiting the valley. Thus compared to TRBO, it is more difficult for these three methods to find good solutions on the Rosenbrock function, especially when the dimensionality is high (Fig. 7.5). To demonstrate the effectiveness of adaptive exploration-exploitation trade-off, we ran TRBO with both fixed ? and adaptive ? solved from Eq. 7.20. The two test functions, Rosenbrock and Rastrigin, have different characteristics and hence prefer different exploration-exploitation trade-offs. Since the Rastrigin function has 163 a large number of local optima, the difficulty for optimizing on Rastrigin is to avoid getting stuck in those local optima. Thus an algorithm with a higher search space expanding rate (e.g., TRBO with a large ?) is likely to perform better since it will spend less budget exploiting local optima and more budget expanding towards the global optimum (i.e., exploration-intense). In contrast, due to the narrow flat valley in the Rosenbrock function, the difficult part is exploiting near the global optimum to refine the solution. Thus a lower expanding rate (e.g., TRBO with a small ?) is likely to be preferred since less budget will be wasted for exploration. The results shown in Fig. 7.6 are consistent with our expectation: the optimality gap increases with the value of the fixed ? on the Rosenbrock function, while the opposite behavior was observed on the Rastrigin function. However, by using an adaptive ? , TRBO performs better than most other configurations on both test functions. Note that the behavior of FBO-EI is similar to TRBO with a large ? (without considering the high performance variance on the Rosenbrock function). As every objective function weights exploitation and exploration differently, BO methods with a fixed expansion schedule may succeed for one function, but fail for another. The TRBO can avoid this by adaptively balancing exploitation-exploration while expanding the search space. 7.4.3 Constrained Problems We created two test problems to evaluate the performance of TRBO in dealing with constrained problems. Specifically, the constrained Rastrigin problem uses the Rastrigin function as the objective function, and the feasible domain is defined by an ellipse 0.01x2 + (x + 2)21 2 ? 1 (Fig. 7.7). The Nowacki beam problem is a real-world test problem originally described by Nowacki [157, 195]. The goal is to minimize the 164 cross-sectional area of a tip-loaded cantilever beam subject to certain constraints.1 The results of the two problems are shown in Figs. 7.7 and 7.8. For the constrained Rastrigin problem, TRBO achieved a better solution than the other two methods. For the Nowacki beam problem, FBO-EI?s solution has the lowest mean value but a very high variance; while TRBO found a fairly close optimal solution with much lower variance. The evaluated points by TRBO were dense near optima (either local or global). This behavior was, however, not obvious for the other two methods. This is likely because that FBO-EI and EI-H over-trusted the GP posterior even where its uncertainty was high. This resulted in sampling patterns with too much randomness, and hence higher variance of optimal solutions. 7.4.4 MLP on MNIST We use the hyperparameter optimization of a multilayer perceptron (MLP) as a real-world example to demonstrate the performance of the TRBO. MNIST was used as the training data. The MLP has 512 hidden units with ReLU activations and was implemented using TensorFlow [1]. We used Adam [116] as the MLP?s optimizer. We optimized seven hyperparameters, namely the learning rate, the learning rate decay, the dropout rate, and the L1 and L2 regularization coefficients for each layer. We performed the optimization in the log space (base 10) with the initial bounds of [?5,?4]7. The objective is to maximize the accuracy of the MLP. The observed accuracy values are usually left-skewed, because usually there are a few very bad values initially sampled by LHS, after which most observations have high accuracy. Thus we apply a cube transformation before normalizing the accuracy values. As 1The original problem is a multi-objective optimization problem that minimizes both the cross- sectional area and the bending stress. Here we only consider the first objective and limit the second objective (i.e., the bending stress should be smaller than the yield stress of the material) to form an extra constraint. 165 Figure 7.7: Optimization history and evaluated points for the constrained Rastrigin problem 166 TRBO FBO-EI EI-H Initial bounds True optimum Solution found Evaluations Figure 7.8: Optimization history and evaluated points for the Nowacki beam problem Figure 7.9: Optimization history for the hyperparameter tuning of a MLP trained on MNIST. 167 shown in Fig. 7.9, TRBO converged faster and found a better optimal solution than FBO-EI and EI-H. 7.5 Summary We proposed a Bayesian optimization method, TRBO, that gradually expands the search space, so that we can find the global optimum without having to specify the input space bounds that include it. The proposed method only evaluates samples at regions with low GP model uncertainty, and expands the search space adaptively to avoid over-exploration in an expanding search space. This method is useful in cases where we are not confident about the range of the global optimum. The experimental results show that our method outperforms the other state-of-the-art methods in most cases. In the standard BO, even if the input space bounds are set large enough to cover the global optimum, too much budget may be spent on needlessly exploring the large space. This will result in bad solutions when optimizing an exploitation-intense objective function, as shown by the Rosenbrock example. So far this dissertation has shown how to measure the intrinsic complexity of a design space to guide data-driven design synthesis, how to incorporate prior knowl- edge into data-driven design synthesis models, and how to explore design spaces or latent spaces when their bounds are unclear. The next chapter will summarize these contributions, look at their broader implications, and discuss possible future research directions. 168 Chapter 8: Conclusion Many traditional design processes go through the cycle of ideation, prototyping, evaluating, and improving. These processes involve human intervention, and can be expensive and time-consuming. Given a limited budget, only a few designs can be implemented and tested, and the true optimal design is hard to uncover since insufficient design alternatives are explored. Currently, topology optimization can be used to avoid human intervention in the traditional design process and automatically generate suboptimal designs. However, it does not reveal the underlying properties of a design space?e.g., what is the least number of factors that are sufficient to control the shape variation, how does the shape change in the design space, and what is the global optimal solution to a design problem. This dissertation addressed some of these related problems in traditional design processes and emergent generative design approaches by learning a compact represen- tation from data. The compact representation encodes important properties of the design space and makes it less expensive to search for desired designs. This chapter summarizes the main ideas in this dissertation, discusses some broader implications for the whole body of work, and proposes some potential future research directions. 169 8.1 Dissertation Summary This dissertation first reviewed problems in traditional design processes, includ- ing repetitive and expensive human intervention. Parametric design synthesis and design space exploration use consistent parameterization and automated algorithms to avoid human intervention, but still, the high-dimensional design space is difficult to understand and search for solutions. To solve these problems, this dissertation introduced data-driven methods that learn the low-dimensional compact representa- tion of designs from data, and presented efficient approaches to explore this compact space. To understand those data-driven methods, Chapter 2 reviewed some fundamental concepts and the existing state of the art in data-driven design space exploration and synthesis. It first introduced existing design representations and their issues. Then it reviewed the manifold hypothesis and how to learn and explore latent spaces on those manifolds. That chapter also discussed current approaches for synthesizing new designs, namely rule-based, assembly-based, and data-driven approaches. The dissertation continued by showing how to measure the intrinsic complexity and dimensionality of a design space. The method introduced in Chapter 3 first captures the inherent properties of a design space and then chooses the appropri- ate design embedding based on the captured properties. It successfully identified multiple sub-manifolds and their intrinsic dimensions. By generating fundamental knowledge about the inherent complexity of a design space and how designs differ from one another, this approach allows us to improve design optimization, consumer preference learning, geometric modeling, and other design applications that rely on navigating complex design spaces. Ultimately, this deepens our understanding of design complexity in general. 170 The learned properties of the design space can guide data-driven design synthesis. However, purely data-driven methods ignore designs? functional or geometrical con- straints and characteristics, and thus will have limitations. Chapters 4 and 5 looked at incorporating prior knowledge into data-driven models to improve synthesis qual- ity. Specifically, Chapter 4 introduced a deep generative model that can synthesize aerodynamic shapes by incorporating smoothness constraints. It also demonstrated that the generative model both (1) learns realistic and compact airfoil shape repre- sentations and (2) empirically accelerates optimization convergence by over an order of magnitude. Chapter 5 presented a generative model for synthesizing designs with multiple parts. The method decomposes the problem of synthesizing/optimizing the whole design into synthesizing/optimizing each part separately but keeping the inter- part dependencies satisfied. This technique of capturing dependencies among parts lay the foundation for learned generative models to extend to more realistic engineer- ing systems where such relationships are widespread. The latent spaces constructed by those above-mentioned models reduce the cost for design space exploration. But sometimes the desired design is remarkably different from existing ones (i.e., outside the region of existing designs in the latent space), which makes it difficult to specify latent space bounds for exploration. Chapters 6 and 7 introduced design space exploration methods that gradually expand a design space or a latent space, so that no fixed bounds are required. Specifically, Chap- ter 6 introduced a method, Active Expansion Sampling (AES), that identifies feasible domains when the design space/latent space bounds are unclear. The method both learns the domain boundary of feasible designs, while also expanding our knowledge of the design space/latent space as available budget increases. That chapter also showed how coupling design manifolds with AES allows us to actively expand high- dimensional design spaces without incurring this exponential penalty, and it enables 171 the discovery of designs that have different appearance and functionality from its initial design set. Chapter 7 presented a global optimization method, Trust Region Bayesian Optimization (TRBO), that can be used in a similar scenario, i.e., it only needs to specify an initial search space that does not necessarily include the global optimum, and will expand the search space when necessary. TRBO avoids the over- exploration problem that often occurs in unbounded Bayesian optimization. Results show that it out-performs the current state-of-the-art methods. 8.2 Broader Implications This dissertation brought together some important aspects for understanding de- sign spaces. It started by looking at the complexity of the design space?the intrinsic dimensionality, the data separability, and the non-linearity. It also showed that we can understand the shape variation in a design space by visualizing a low-dimensional latent space that encodes major shape changes. The dissertation then introduced a way of decomposing a space of assemblies (i.e., designs with multiple components) into hierarchical latent spaces. This not only suits some practical needs (e.g., hier- archical design synthesis/optimization), but also converts a composite design space into subspaces that are more comprehensible by human beings. Through these tech- niques, we can see clearly what factors change or constrain our designs, so that we can explore alternative solutions within appropriate limits. The techniques of under- standing the design space can be extended to any engineering systems where design space exploration is needed and data on previous designs are sufficient. This allows us to explore the design space more efficiently compared to conventional exploration techniques and mitigates the curse of dimensionality. The unbounded design space exploration methods introduced in Chapters 6 and 7 172 (i.e., AES and TRBO) essentially eliminate the need for performing adaptive sampling or Bayesian optimization within a fixed bounding box. While this dissertation mainly demonstrated their application in expanding the design space, they can be applied in any circumstances where the variable bounds are unclear. For example, they are useful in deciding the parameters of any engineering systems or algorithms where those parameter bounds are hard to specify, such that any conventional methods that work within fixed bounds would possibly miss either part of the feasible domain (when performing feasible domain identification) or the true global optimum (when performing global optimization) as they may be outside the specified bounds. 8.3 Limitations and Future Research Directions The main idea of this dissertation is to learn a compact representation of the design space from data, so that it enables efficient design synthesis and design space exploration. However, unlike learning from art or literature (e.g., images, videos, or articles), engineering design usually has strict geometry, functionality, or performance requirements. Violating those requirements may result in invalid designs. This makes purely data-driven models unreliable in practice. Chapters 4 and 6 have demonstrated the combination of a data-driven model with aerodynamic simulation or human an- notation, which are used to evaluate or justify synthesized designs. Chapter 4 and 5 also used geometric constraints or inter-part dependencies as prior knowledge to define the data-driven model, so that synthesized designs automatically satisfy those prior beliefs. It is also interesting to incorporate physics into the model so that the synthesized designs will have desired functionality or performance [32, 238]. In the future, we can consider a broader range of prior knowledge regarding the geometry, physical, and mathematical properties of designs. 173 Designs generated from purely data-driven models will not look much different from existing ones, as these models only learn from past examples. This (poten- tially) limits the creation of innovative designs. The unbounded design space explo- ration described in Chapters 6 and 7 mitigated this problem by extending the design manifold. But still, new designs will (by construction) be restricted to the learned manifold. In contrast, techniques like topology optimization, though having its own disadvantages, can generate innovative designs by considering primarily physics-based objectives and constraints. Thus it is interesting to combine machine learning with physics-based solvers to learn a compact solution space for design optimization prob- lems with physics objectives/constraints. Most design examples used in this work have relatively simple geometries. They are good for demonstrating the concepts and effectiveness of our proposed methods. However, we acknowledge that there are more complex real-world designs and the scalability of our model on data complexity needs to be further investigated in the future. It is also useful to develop techniques for reducing the computational cost for complex designs. Different from the datasets of images or text that are widely used in the machine learning community, the public access of huge engineering design dataset is unrealistic, which impedes the development of effective data-driven design models. One promising future research avenue would then be improving the sample efficiency of those models. Machine learning researchers have already applied few-shot learning to tasks such as classification, regression, and reinforcement learning [72, 200, 222], where the model can learn from only a few examples. Realizing few-shot learning for engineering design tasks will be extremely useful considering the limited design data. On the other hand, as humans create more and more designs, in the future we are able to maintain a diverse range of design datasets. By leveraging such resource of 174 existing designs, we can extract useful knowledge for creating new designs and even- tually eliminate or reduce the cost of human intervention in the design process. This work has brought us one step closer to the goal where designers can use automated algorithms to efficiently search for any desired solutions, so that it frees the time of designers, provides inspiration for them, and even allows practitioners to create products while minimizing required expert input. This, in turn, will help augment the design datasets and further improve the quality of generated designs. 175 Appendix A: Publications A.1 Journal 1. Chen W, Fuge M. Synthesizing Designs with Inter-Part Dependencies using Hierarchical Generative Adversarial Networks. ASME. J. Mech. Des. 2019;():1- 15. doi:10.1115/1.4044076. (Accepted) Code: https://github.com/IDEALLab/hgan_jmd_2019 2. Chen W, Fuge M. Active expansion sampling for learning feasible domains in an unbounded input space. Structural and Multidisciplinary Optimization. 2018;57(3):925-945. doi:10.1007/s00158-017-1894-y. Code: https://github.com/IDEALLab/Active-Expansion-Sampling 3. Chen W, Fuge M. Beyond the Known: Detecting Novel Feasible Domains over an Unbounded Design Space. ASME. J. Mech. Des. 2017;139(11):111405- 111405-10. doi:10.1115/1.4037306. Code: https://github.com/IDEALLab/domain_expansion_jmd_2017 4. Chen W, Fuge M, Chazan J. Design Manifolds Capture the Intrinsic Complex- ity and Dimension of Design Spaces. ASME. J. Mech. Des. 2017;139(5):051102- 051102-10. doi:10.1115/1.4036134. Code: https://github.com/IDEALLab/design_embeddings_jmd_2016 176 A.2 Peer-Reviewed Conference 1. Chen W, Fuge M. Expanding Search Space for Global Optimization with Trust Region Bayesian Optimization. Advances in Neural Information Processing Systems. 2019. (under review) 2. Chen W, Chiu K, Fuge M. Aerodynamic Design Optimization and Shape Ex- ploration using Generative Adversarial Networks. AIAA Scitech 2019 Forum. doi:10.2514/6.2019-2351. (Invited talk) Code: https://github.com/IDEALLab/airfoil-opt-gan 3. Chen W, Jeyaseelan A, Fuge M. Synthesizing Designs With Inter-Part Depen- dencies Using Hierarchical Generative Adversarial Networks. ASME. Interna- tional Design Engineering Technical Conferences and Computers and Informa- tion in Engineering Conference, Volume 2A: 44th Design Automation Confer- ence:V02AT03A007. doi:10.1115/DETC2018-85339. Code: https://github.com/IDEALLab/hgan_idetc2018 4. Chen W, Chazan J, Fuge M. How Designs Differ: Non-Linear Embeddings Illuminate Intrinsic Design Complexity. ASME. International Design Engi- neering Technical Conferences and Computers and Information in Engineering Conference, Volume 2A: 42nd Design Automation Conference:V02AT03A014. doi:10.1115/DETC2016-60112. Code: https://github.com/IDEALLab/design_embeddings_idetc_2016 177 Bibliography [1] Mart??n Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, San- jay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Leven- berg, Dan Mane?, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Vie?gas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org. [2] Alekh Agarwal. Selective sampling algorithms for cost-sensitive multiclass pre- diction. ICML (3), 28:1220?1228, 2013. [3] Ibrahim Alabdulmohsin, Xin Gao, and Xiangliang Zhang. Efficient active learn- ing of halfspaces via query synthesis. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 2483?2489. AAAI Press, 2015. [4] Anand Amrit, Leifur T Leifsson, Slawomir Koziel, and Yonatan Afework Tes- fahunegn. Efficient multi-objective aerodynamic optimization by design space dimension reduction and co-kriging. In 17th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, page 3515, 2016. [5] George R Anderson, Marian Nemec, and Michael J Aftosmis. Aerodynamic shape optimization benchmarks with error control and automatic parameteri- zation. In 53rd AIAA Aerospace Sciences Meeting, page 1719, 2015. [6] Dana Angluin. Queries revisited. Theoretical Computer Science, 313(2):175? 194, 2004. [7] Shlomo Argamon-Engelson and Ido Dagan. Committee-based sample selection for probabilistic classifiers. J. Artif. Intell. Res.(JAIR), 11:335?360, 1999. 178 [8] Melinos Averkiou, Vladimir G Kim, Youyi Zheng, and Niloy J Mitra. Shapesynth: Parameterizing model collections for coupled shape exploration and synthesis. In Computer Graphics Forum, volume 33, pages 125?134. Wiley Online Library, 2014. [9] Pranjal Awasthi, Vitaly Feldman, and Varun Kanade. Learning using local membership queries. In Shai Shalev-Shwartz and Ingo Steinwart, editors, Pro- ceedings of the 26th Annual Conference on Learning Theory, volume 30 of Pro- ceedings of Machine Learning Research, pages 398?431, Princeton, NJ, USA, 12?14 Jun 2013. PMLR. [10] Yoram Baram, Ran El Yaniv, and Kobi Luz. Online choice of active learning algorithms. Journal of Machine Learning Research, 5(Mar):255?291, 2004. [11] Anirban Basudhar, Christoph Dribusch, Sylvain Lacaze, and Samy Missoum. Constrained efficient global optimization with support vector machines. Struc- tural and Multidisciplinary Optimization, 46(2):201?221, 2012. [12] Anirban Basudhar and Samy Missoum. Adaptive explicit decision functions for probabilistic design and optimization using support vector machines. Computers & Structures, 86(19):1904?1917, 2008. [13] Anirban Basudhar and Samy Missoum. An improved adaptive sampling scheme for the construction of explicit boundaries. Structural and Multidisciplinary Optimization, 42(4):517?529, 2010. [14] R.E. Bellman. Dynamic programming. Princeton University Press, Princeton, NY, 1957. [15] Richard E Bellman. Adaptive control processes: a guided tour. Princeton uni- versity press, 2015. [16] Martin P Bends?e. Topology optimization. Springer, 2009. [17] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. Pattern Analysis and Machine Intelligence, IEEE Transac- tions on, 35(8):1798?1828, Aug 2013. [18] Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, et al. Greedy layer-wise training of deep networks. Advances in neural information processing systems, 19:153, 2007. [19] Steven H Berguin and Dimitri N Mavris. Dimensional design space exploration of expensive functions with access to gradient. In 15th AIAA/ISSMO Multidis- ciplinary Analysis and Optimization Conference, page 2174, 2014. 179 [20] Steven H Berguin and Dimitri N Mavris. Dimensionality reduction using prin- cipal component analysis applied to the gradient. AIAA Journal, 53(4):1078? 1090, 2014. [21] Steven H Berguin, David Rancourt, and Dimitri N Mavris. Method to facili- tate high-dimensional design space exploration using computationally expensive analyses. AIAA Journal, 53(12):3752?3765, 2015. [22] Djallel Bouneffouf. Exponentiated gradient exploration for active learning. Computers, 5(1):1, 2016. [23] Robert Bridson. Fast poisson disk sampling in arbitrary dimensions. In ACM SIGGRAPH 2007 Sketches, SIGGRAPH ?07, New York, NY, USA, 2007. ACM. [24] Andrew Brock, Theodore Lim, James M Ritchie, and Nick Weston. Generative and discriminative voxel modeling with convolutional neural networks. arXiv preprint arXiv:1608.04236, 2016. [25] Charles George Broyden. The convergence of a class of double-rank minimiza- tion algorithms 1. general considerations. IMA Journal of Applied Mathematics, 6(1):76?90, 1970. [26] Brent Bryan, Robert C Nichol, Christopher R Genovese, Jeff Schneider, Christo- pher J Miller, and Larry Wasserman. Active learning for identifying function threshold boundaries. In Advances in neural information processing systems, pages 163?170, 2006. [27] Alexander Burnap, Ye Liu, Yanxin Pan, Honglak Lee, Richard Gonzalez, and Panos Y Papalambros. Estimating and exploring the product form design space using deep generative models. In ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Con- ference, pages V02AT03A013?V02AT03A013. American Society of Mechanical Engineers, 2016. [28] Alexander Burnap, Ye Liu, Yanxin Pan, Honglak Lee, Richard Gonzalez, and Panos Y Papalambros. Estimating and exploring the product form design space using deep generative models. In ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Confer- ence, page V02AT03A013. American Society of Mechanical Engineers, 2016. [29] Richard H Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited mem- ory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190?1208, 1995. 180 [30] Jonathan Cagan, Matthew I Campbell, Susan Finger, and Tetsuo Tomiyama. A framework for computational design synthesis: model and applications. Journal of Computing and Information Science in Engineering, 5(3):171?181, 2005. [31] Colin Campbell, Nello Cristianini, and Alex J Smola. Query learning with large margin classifiers. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 111?118. Morgan Kaufmann Publishers Inc., 2000. [32] Ruijin Cang, Hope Yao, and Yi Ren. One-shot generation of near-optimal topol- ogy through theory-driven machine learning. Computer-Aided Design, 109:12? 21, 2019. [33] Giovanni Cavallanti, Nicolo Cesa-Bianchi, and Claudio Gentile. Linear classifi- cation and selective sampling under low noise conditions. In Advances in Neural Information Processing Systems, pages 249?256, 2009. [34] Nicolo Cesa-Bianchi, Claudio Gentile, and Francesco Orabona. Robust bounds for classification via selective sampling. In Proceedings of the 26th annual in- ternational conference on machine learning, pages 121?128. ACM, 2009. [35] Siddhartha Chaudhuri, Evangelos Kalogerakis, Stephen Giguere, and Thomas Funkhouser. Attribit: content creation with semantic attributes. In Proceedings of the 26th annual ACM symposium on User interface software and technology, pages 193?202. ACM, 2013. [36] Siddhartha Chaudhuri, Evangelos Kalogerakis, Leonidas Guibas, and Vladlen Koltun. Probabilistic reasoning for assembly-based 3d modeling. In ACM Transactions on Graphics (TOG), volume 30, page 35. ACM, 2011. [37] Siddhartha Chaudhuri and Vladlen Koltun. Data-driven suggestions for creativ- ity support in 3d modeling. ACM Transactions on Graphics (TOG), 29(6):183, 2010. [38] Lin Chen, Hamed Hassani, and Amin Karbasi. Near-optimal active learn- ing of halfspaces via query synthesis in the noisy setting. arXiv preprint arXiv:1603.03515, 2016. [39] Tian Qi Chen, Xuechen Li, Roger Grosse, and David Duvenaud. Isolat- ing sources of disentanglement in variational autoencoders. arXiv preprint arXiv:1802.04942, 2018. [40] Wei Chen, Jonah Chazan, and Mark Fuge. How designs differ: Non-linear em- beddings illuminate intrinsic design complexity. In ASME International Design Engineering Technical Conferences. ASME, August 2016. 181 [41] Wei Chen, Noa Chazan, and Mark Fuge. How designs differ: Non-linear em- beddings illuminate intrinsic design complexity. In ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, pages V02AT03A014?V02AT03A014. American Soci- ety of Mechanical Engineers, August 2016. [42] Wei Chen, Kevin Chiu, and Mark Fuge. Aerodynamic design optimization and shape exploration using generative adversarial networks. In AIAA Scitech 2019 Forum, page 2351, 2019. [43] Wei Chen and Mark Fuge. Active expansion sampling for learning feasible domains in an unbounded input space. Structural and Multidisciplinary Opti- mization, pages 1?21, 2017. [44] Wei Chen and Mark Fuge. Beyond the known: Detecting novel feasible domains over an unbounded design space. Journal of Mechanical Design, 139(11):111405?111405?10, 2017. [45] Wei Chen and Mark Fuge. Active expansion sampling for learning feasible domains in an unbounded input space. Structural and Multidisciplinary Opti- mization, 57(3):925?945, 2018. [46] Wei Chen and Mark Fuge. Be?ziergan: Automatic generation of smooth curves from interpretable low-dimensional parameters. arXiv preprint arXiv:1808.08871, 2018. [47] Wei Chen and Mark Fuge. Be?ziergan: Automatic generation of smooth curves from interpretable low-dimensional parameters. arXiv preprint arXiv:1808.08871, 2018. [48] Wei Chen and Mark Fuge. Synthesizing designs with inter-part dependencies using hierarchical generative adversarial networks. Journal of Mechanical De- sign, 2019. [49] Wei Chen, Mark Fuge, and Noa Chazan. Design manifolds capture the intrinsic complexity and dimension of design spaces. Journal of Mechanical Design, 139(5):051102?051102?10, 2017. [50] Wei Chen, Ashwin Jeyaseelan, and Mark Fuge. Synthesizing designs with inter- part dependencies using hierarchical generative adversarial networks. In ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Quebec City, Canada, Aug 2018. ASME. 182 [51] Xi Chen, Matteo Diez, Manivannan Kandasamy, Zhiguo Zhang, Emilio F Cam- pana, and Frederick Stern. High-fidelity global optimization of shape design by dimensionality reduction, metamodels and deterministic particle swarm. Engi- neering Optimization, 47(4):473?494, 2015. [52] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. Infogan: Interpretable representation learning by information maximiz- ing generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2172?2180, 2016. [53] Zhenzhong Chen, Siping Peng, Xiaoke Li, Haobo Qiu, Huadi Xiong, Liang Gao, and Peigen Li. An important boundary sampling method for reliability- based design optimization using kriging model. Structural and Multidisciplinary Optimization, 52(1):55?70, 2015. [54] Zhenzhong Chen, Haobo Qiu, Liang Gao, Xiaoke Li, and Peigen Li. A local adaptive sampling method for reliability-based design optimization using krig- ing model. Structural and Multidisciplinary Optimization, 49(3):401?416, 2014. [55] Davide Cinquegrana and Emiliano Iuliano. Efficient global optimization of a transonic wing with geometric data reduction. In 35th AIAA Applied Aerody- namics Conference, page 3057, 2017. [56] Davide Cinquegrana and Emiliano Iuliano. Investigation of adaptive design vari- ables bounds in dimensionality reduction for aerodynamic shape optimization. Computers & Fluids, 174:89?109, 2018. [57] David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning. Machine learning, 15(2):201?221, 1994. [58] Ido Dagan and Sean P Engelson. Committee-based sampling for training prob- abilistic classifiers. In In Proceedings of the Twelfth International Conference on Machine Learning, 1995. [59] Danny D?Agostino, Andrea Serani, Emilio F Campana, and Matteo Diez. Deep autoencoder for off-line design-space dimensionality reduction in shape opti- mization. In 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, page 1648, 2018. [60] Sanjoy Dasgupta, Adam Tauman Kalai, and Claire Monteleoni. Analysis of perceptron-based active learning. Journal of Machine Learning Research, 10(Feb):281?299, 2009. [61] Ofer Dekel, Claudio Gentile, and Karthik Sridharan. Selective sampling and active learning from single and multiple teachers. Journal of Machine Learning Research, 13(Sep):2655?2697, 2012. 183 [62] Pierluigi Della Vecchia, Elia Daniele, and Egidio D?Amato. An airfoil shape optimization technique coupling parsec parameterization and evolutionary al- gorithm. Aerospace Science and Technology, 32(1):103?110, 2014. [63] RW Derksen and Tim Rogalsky. Bezier-parsec: An optimized aerofoil parame- terization for design. Advances in engineering software, 41(7-8):923?930, 2010. [64] Srikanth Devanathan and Karthik Ramani. Creating polytope representations of design spaces for visual exploration using consistency techniques. Journal of Mechanical Design, 132(8):081011, 2010. [65] Tristan Dhert, Turaj Ashuri, and Joaquim R. R. A. Martins. Aerody- namic shape optimization of wind turbine blades using a Reynolds-averaged Navier?Stokes model and an adjoint method. Wind Energy, 20:909?926, 05/2017 2017. [66] Matteo Diez, Emilio F Campana, and Frederick Stern. Design-space dimension- ality reduction in shape optimization by karhunen?loe?ve expansion. Computer Methods in Applied Mechanics and Engineering, 283:1525?1544, 2015. [67] Mark Drela. Xfoil: An analysis and design system for low reynolds number airfoils. In Low Reynolds number aerodynamics, pages 1?12. Springer, 1989. [68] David K Duvenaud, Oren Rippel, Ryan P Adams, and Zoubin Ghahramani. Avoiding pathologies in very deep networks. In AISTATS, pages 202?210, 2014. [69] Danny DAgostino, Andrea Serani, Emilio F Campana, and Matteo Diez. Non- linear methods for design-space dimensionality reduction in shape optimization. In International Workshop on Machine Learning, Optimization, and Big Data, pages 121?132. Springer, 2017. [70] M Ebrahimi and A Jahangirian. Aerodynamic optimization of airfoils using adaptive parameterization and genetic algorithm. Journal of Optimization The- ory and Applications, 162(1):257?271, 2014. [71] Haoqiang Fan, Hao Su, and Leonidas Guibas. A point set generation network for 3d object reconstruction from a single image. In Conference on Computer Vision and Pattern Recognition (CVPR), volume 38, page 1, 2017. [72] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1126?1135. JMLR. org, 2017. [73] Noa Fish, Melinos Averkiou, Oliver Van Kaick, Olga Sorkine-Hornung, Daniel Cohen-Or, and Niloy J Mitra. Meta-representation of shape families. ACM Transactions on Graphics (TOG), 33(4):34, 2014. 184 [74] Roger Fletcher. A new approach to variable metric algorithms. The computer journal, 13(3):317?322, 1970. [75] Yoav Freund, H Sebastian Seung, Eli Shamir, and Naftali Tishby. Selective sampling using the query by committee algorithm. Machine learning, 28(2):133? 168, 1997. [76] Francesca Fusi, Giuseppe Quaranta, Alberto Guardone, and Pietro M Congedo. Drag minimization of an isolated airfoil in transonic inviscid flow by means of genetic algorithms. In 53rd AIAA Aerospace Sciences Meeting, page 1722, 2015. [77] Adam Gaier, Alexander Asteroth, and Jean-Baptiste Mouret. Aerodynamic de- sign exploration through surrogate-assisted illumination. In 18th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, page 3330, 2017. [78] Michael A Gelbart, Jasper Snoek, and Ryan P Adams. Bayesian optimiza- tion with unknown constraints. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, pages 250?259. AUAI Press, 2014. [79] Johan Gielis. A generic geometric transformation that unifies a wide range of natural and abstract shapes. American journal of botany, 90(3):333?338, 2003. [80] Thomas Gmeiner and Kristina Shea. A spatial grammar for the computational design synthesis of vise jaws. In ASME 2013 International Design Engineering Technical Conferences and Computers and Information in Engineering Con- ference, pages V03AT03A006?V03AT03A006. American Society of Mechanical Engineers, 2013. [81] Donald Goldfarb. A family of variable-metric methods derived by variational means. Mathematics of computation, 24(109):23?26, 1970. [82] Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU Press, 2012. [83] Dian Gong, Xuemei Zhao, and Ge?rard Medioni. Robust multiple manifolds structure learning. arXiv preprint arXiv:1206.4624, 2012. [84] Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. Deep learning, volume 1. MIT press Cambridge, 2016. [85] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde- Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adver- sarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672?2680. Curran Associates, Inc., 2014. 185 [86] Alkis Gotovos, Nathalie Casati, Gregory Hitz, and Andreas Krause. Active learning for level set estimation. In Proceedings of the Twenty-Third Interna- tional Joint Conference on Artificial Intelligence, IJCAI ?13, pages 1344?1350. AAAI Press, 2013. [87] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Scho?lkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723?773, 2012. [88] Zachary J Grey and Paul G Constantine. Active subspaces of airfoil shape parameterizations. AIAA Journal, 56(5):2003?2017, 2018. [89] Tinghao Guo, Daniel R Herber, and James T Allison. Circuit synthesis using generative adversarial networks (GANs). In AIAA 2019 Science and Technology Forum and Exposition, number AIAA 2019-2350, San Diego, CA, USA, January 2019. [90] Xuekun Guo, Juncong Lin, Kai Xu, and Xiaogang Jin. Creature grammar for creative modeling of 3d monsters. Graphical Models, 76(5):376?389, 2014. [91] David Ha and Douglas Eck. A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477, 2017. [92] Zhong-Hua Han, Jing Chen, Ke-Shi Zhang, Zhen-Ming Xu, Zhen Zhu, and Wen-Ping Song. Aerodynamic shape optimization of natural-laminar-flow wing using surrogate-based approach. AIAA Journal, 56(7):2579?2593, 2018. [93] Philipp Hennig and Christian J Schuler. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13(Jun):1809?1837, 2012. [94] Jose? Miguel Herna?ndez-Lobato, Matthew W Hoffman, and Zoubin Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. In Advances in neural information processing systems, pages 918?926, 2014. [95] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. Beta-VAE: Learning basic visual concepts with a constrained variational framework. In Proceedings of the 5th International Conference on Learning Representations, 2017. [96] Trong Nghia Hoang, Bryan Kian Hsiang Low, Patrick Jaillet, and Mohan Kankanhalli. Nonmyopic -bayes-optimal active learning of gaussian processes. In Eric P. Xing and Tony Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learn- ing Research, pages 739?747, Bejing, China, 22?24 Jun 2014. PMLR. 186 [97] Steven CH Hoi, Rong Jin, Jianke Zhu, and Michael R Lyu. Semisupervised svm batch mode active learning with applications to image retrieval. ACM Transactions on Information Systems (TOIS), 27(3):16, 2009. [98] Wei-Ning Hsu and Hsuan-Tien Lin. Active learning by learning. In Proceed- ings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI?15, pages 2659?2665. AAAI Press, 2015. [99] Haibin Huang, Evangelos Kalogerakis, and Benjamin Marlin. Analysis and synthesis of 3d shape families via deep-learned generative models of surfaces. In Computer Graphics Forum, volume 34, pages 25?38. Wiley Online Library, 2015. [100] Sheng-Jun Huang, Rong Jin, and Zhi-Hua Zhou. Active learning by querying informative and representative examples. In Advances in neural information processing systems, pages 892?900, 2010. [101] Yen-Chih Huang and Kuei-Yuan Chan. A modified efficient global optimization algorithm for maximal reliability in a probabilistic constrained space. Journal of Mechanical Design, 132(6):061002, 2010. [102] Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Sequential model- based optimization for general algorithm configuration. In International Confer- ence on Learning and Intelligent Optimization, pages 507?523. Springer, 2011. [103] Jeffrey C Jackson. An efficient membership-query algorithm for learning dnf with respect to the uniform distribution. Journal of Computer and System Sciences, 55(3):414?440, 1997. [104] A. R. Jahangirian and M. Ebrahimi. Airfoil shape optimization with adap- tive mutation genetic algorithm. Journal of Aerospace Science and Technology, 11(1):?, 2017. [105] Je?, r iuml, me Le?, pine, Fran-atilde, ois Guibault, Jean-Yves Tre?, panier, Fran- atilde, ois Pe?, et al. Optimized nonuniform rational b-spline geometrical rep- resentation for aerodynamic design of wings. AIAA journal, 39(11):2033?2041, 2001. [106] Jae-Ho Jeong and Soo-Hyun Kim. Optimization of thick wind turbine air- foils using a genetic algorithm. Journal of Mechanical Science and Technology, 32(7):3191?3199, Jul 2018. [107] Donald R Jones. A taxonomy of global optimization methods based on response surfaces. Journal of global optimization, 21(4):345?383, 2001. 187 [108] Donald R Jones, Matthias Schonlau, and William J Welch. Efficient global optimization of expensive black-box functions. Journal of Global optimization, 13(4):455?492, 1998. [109] Evangelos Kalogerakis, Siddhartha Chaudhuri, Daphne Koller, and Vladlen Koltun. A probabilistic model for component-based shape synthesis. ACM Transactions on Graphics (TOG), 31(4):55, 2012. [110] Kirthevasan Kandasamy, Jeff Schneider, and Barnaba?s Po?czos. Query efficient posterior estimation in scientific experiments via bayesian active learning. Ar- tificial Intelligence, 243:45?56, 2017. [111] Ashish Kapoor, Kristen Grauman, Raquel Urtasun, and Trevor Darrell. Gaus- sian processes for object categorization. International journal of computer vi- sion, 88(2):169?188, 2010. [112] Gaetan K. W. Kenway, Graeme J. Kennedy, and Joaquim R. R. A. Martins. Scalable parallel approach for high-fidelity steady-state aeroelastic analysis and adjoint derivative computations. AIAA Journal, 52:935?951, 2014. [113] Gaetan KW Kenway and Joaquim RRA Martins. Buffet-onset constraint for- mulation for aerodynamic shape optimization. AIAA Journal, 55(6):1930?1947, 2017. [114] Hyunjik Kim and Andriy Mnih. Disentangling by factorising. arXiv preprint arXiv:1802.05983, 2018. [115] Ross D King, Kenneth E Whelan, Ffion M Jones, Philip GK Reiser, Christo- pher H Bryant, Stephen H Muggleton, Douglas B Kell, and Stephen G Oliver. Functional genomic hypothesis generation and experimentation by a robot sci- entist. Nature, 427(6971):247?252, 2004. [116] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimiza- tion. arXiv preprint arXiv:1412.6980, 2014. [117] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013. [118] Corinna Ko?nigseder and Kristina Shea. Analyzing generative design grammars. In Design Computing and Cognition?14, pages 363?381. Springer International Publishing, 2015. [119] Corinna Ko?nigseder and Kristina Shea. Comparing strategies for topologic and parametric rule application in automated computational design synthesis. Journal of Mechanical Design, 138(1):011102, 2016. 188 [120] Corinna Ko?nigseder and Kristina Shea. Visualizing relations between grammar rules, objectives, and search space exploration in grammar-based computational design synthesis. Journal of Mechanical Design, 138(10):101101, 2016. [121] Corinna Ko?nigseder, Tino Stankovic?, and Kristina Shea. Improving design grammar development and application through network-based analysis of tran- sition graphs. Design Science, 2, 2016. [122] Konstantinos V Kostas, Alexandros I Ginnis, Constantinos G Politis, and Pana- giotis D Kaklis. Shape-optimization of 2d hydrofoils using an isogeometric bem solver. Computer-Aided Design, 82:79?87, 2017. [123] Slawomir Koziel and Leifur Leifsson. Multi-level surrogate-based airfoil shape optimization. In 51st AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, page 778, 2013. [124] Andreas Krause and Carlos Guestrin. Nonmyopic active learning of gaussian processes: an exploration-exploitation approach. In Proceedings of the 24th international conference on Machine learning, pages 449?456. ACM, 2007. [125] Georg Krempl, Daniel Kottke, and Vincent Lemaire. Optimised probabilistic active learning (opal). Machine Learning, 100(2-3):449?476, 2015. [126] Brenda M Kulfan. Universal parametric geometry representation method. Jour- nal of Aircraft, 45(1):142?158, 2008. [127] Harold J Kushner. A new method of locating the maximum point of an arbi- trary multipeak curve in the presence of noise. Journal of Basic Engineering, 86(1):97?106, 1964. [128] Brad J Larson and Christopher A Mattson. Design space exploration for quantifying a system models feasible domain. Journal of Mechanical Design, 134(4):041010, 2012. [129] Johan Larsson and Qiqi Wang. The prospect of using large eddy and detached eddy simulations in engineering design, and the research required to get there. Phil. Trans. R. Soc. A, 372(2022):20130329, 2014. [130] Guenhael Le Quilliec, Balaji Raghavan, and Piotr Breitkopf. A manifold learning-based reduced order model for springback shape characterization and optimization in sheet metal forming. Computer Methods in Applied Mechanics and Engineering, 285:621?638, 2015. [131] Rodolphe Le Riche and Raphael T Haftka. Optimization of laminate stacking sequence for buckling load maximization by genetic algorithm. AIAA journal, 31(5):951?956, 1993. 189 [132] John A. Lee and Michel Verleysen. Nonlinear Dimensionality Reduction. Springer, New York, 2007. [133] Tae Hee Lee and Jae Jun Jung. A sampling technique enhancing accuracy and efficiency of metamodel-based rbdo: Constraint boundary sampling. Computers & Structures, 86(13):1463?1476, 2008. [134] Jerome Lepine, Jean-Yves Trepanier, and Francois Pepin. Wing aerodynamic design using an optimized nurbs geometrical representation. In 38th Aerospace Sciences Meeting and Exhibit, page 669, 2000. [135] David D Lewis and Jason Catlett. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the eleventh international conference on machine learning, pages 148?156, 1994. [136] David D Lewis and William A Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 3?12. Springer-Verlag New York, Inc., 1994. [137] Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Zhang, and Leonidas Guibas. Grass: Generative recursive autoencoders for shape structures. ACM Transactions on Graphics (TOG), 36(4):52, 2017. [138] Pei Li, Richard Seebass, and Helmut Sobieczky. Manual aerodynamic optimiza- tion of an oblique flying wing. In 36th AIAA Aerospace Sciences Meeting and Exhibit, page 598, 1998. [139] Zijing LIU, Xuejun LIU, and Xinye CAI. A new hybrid aerodynamic optimiza- tion framework based on differential evolution and invasive weed optimization. Chinese Journal of Aeronautics, 31(7):1437 ? 1448, 2018. [140] Danny J Lohan, Ercan M Dede, and James T Allison. Topology optimization for heat conduction using generative design algorithms. Structural and Multi- disciplinary Optimization, 55(3):1063?1077, 2017. [141] Trent W Lukaczyk, Paul Constantine, Francisco Palacios, and Juan J Alonso. Active subspaces for shape optimization. In 10th AIAA Multidisciplinary Design Optimization Conference, page 1171, 2014. [142] Yifei Ma, Roman Garnett, and Jeff Schneider. Active area search via bayesian quadrature. In Artificial Intelligence and Statistics, pages 595?603, 2014. [143] Oisin Mac Aodha, Neill DF Campbell, Jan Kautz, and Gabriel J Brostow. Hierarchical subquery evaluation for active learning on a graph. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 564?571, 2014. 190 [144] Spyros Makridakis. Accuracy measures: theoretical and practical concerns. International Journal of Forecasting, 9(4):527?529, 1993. [145] K Mani, B.A. Lockwood, and D.J. Mavriplis. Adjoint-based unsteady airfoil de- sign optimization with application to dynamic stall. Annual Forum Proceedings - AHS International, 3:1940?1952, 01 2012. [146] Dominic A Masters, Nigel J Taylor, TCS Rendall, Christian B Allen, and Daniel J Poole. Geometric comparison of aerofoil shape parameterization meth- ods. AIAA Journal, pages 1575?1589, 2017. [147] Justin Matejka, Michael Glueck, Erin Bradner, Ali Hashemi, Tovi Grossman, and George Fitzmaurice. Dream lens: Exploration and visualization of large- scale generative design datasets. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, page 369. ACM, 2018. [148] Andrew McCallum, Kamal Nigam, et al. Employing em and pool-based active learning for text classification. In ICML, volume 98, pages 359?367, 1998. [149] Michael D McKay, Richard J Beckman, and William J Conover. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239?245, 1979. [150] Pascal Mu?ller, Peter Wonka, Simon Haegler, Andreas Ulmer, and Luc Van Gool. Procedural modeling of buildings. In Acm Transactions On Graphics (Tog), volume 25, pages 614?623. ACM, 2006. [151] Raymond H Myers, Douglas C Montgomery, et al. Response surface method- ology: process and product optimization using designed experiments, volume 3. Wiley New York, 1995. [152] Siva Nadarajah, Patrice Castonguay, and Arash Mousavi. Survey of shape parameterization techniques and its effect on three-dimensional aerodynamic shape optimization. In 18th AIAA computational fluid dynamics conference, page 3837, 2007. [153] Charlie Nash and Chris KI Williams. The shape variational autoencoder: A deep generative model of part-segmented 3d objects. In Computer Graphics Forum, volume 36, pages 1?12. Wiley Online Library, 2017. [154] Hieu T. Nguyen and Arnold Smeulders. Active learning using pre-clustering. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ?04, pages 79?, New York, NY, USA, 2004. ACM. [155] Vu Nguyen, Sunil Gupta, Santu Rana, Cheng Li, and Svetha Venkatesh. Filter- ing bayesian optimization approach in weakly specified search space. Knowledge and Information Systems, pages 1?29, 2018. 191 [156] Vu Nguyen, Sunil Gupta, Santu Rane, Cheng Li, and Svetha Venkatesh. Bayesian optimization in weakly specified search space. In 2017 IEEE Interna- tional Conference on Data Mining (ICDM), pages 347?356. IEEE, 2017. [157] Horst Nowacki. Modelling of design decisions for cad. In Computer Aided Design Modelling, Systems Engineering, CAD-Systems, pages 177?223. Springer, 1980. [158] Matthias Oberhauser, Sky Sartorius, Thomas Gmeiner, and Kristina Shea. Computational design synthesis of aircraft configurations with shape grammars. In Design Computing and Cognition?14, pages 21?39. Springer International Publishing, 2015. [159] Katsunori Ohnishi, Shohei Yamamoto, Yoshitaka Ushiku, and Tatsuya Harada. Hierarchical video generation from orthogonal information: Optical flow and texture. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. [160] Francesco Orabona and Nicolo Cesa-Bianchi. Better algorithms for selective sampling. In Proceedings of the 28th international conference on Machine learn- ing (ICML-11), pages 433?440, 2011. [161] Thomas Osugi, Deng Kim, and Stephen Scott. Balancing exploration and ex- ploitation: A new algorithm for active machine learning. In Data Mining, Fifth IEEE International Conference on. IEEE, 2005. [162] Maks Ovsjanikov, Wilmot Li, Leonidas Guibas, and Niloy J Mitra. Exploration of continuous variability in collections of 3d shapes. In ACM Transactions on Graphics (TOG), volume 30, page 33. ACM, 2011. [163] Simon Painchaud-Ouellet, Christophe Tribes, Jean-Yves Tre?panier, and Do- minique Pelletier. Airfoil shape optimization using a nonuniform rational b- splines parametrization under thickness constraint. AIAA journal, 44(10):2170? 2178, 2006. [164] Pramudita Satria Palar and Koji Shimoyama. Efficient global optimization with ensemble and selection of kernel functions for engineering design. Structural and Multidisciplinary Optimization, 59(1):93?116, 2019. [165] Haobo Qiu, Yanjiao Xu, Liang Gao, Xiaoke Li, and Li Chi. Multi-stage de- sign space reduction and metamodeling optimization method based on self- organizing maps and fuzzy clustering. Expert Systems with Applications, 46:180?195, 2016. [166] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015. 192 [167] Balaji Raghavan, Piotr Breitkopf, Yves Tourbier, and Pierre Villon. Towards a space reduction approach for efficient structural shape optimization. Structural and Multidisciplinary Optimization, 48(5):987?1000, 2013. [168] Balaji Raghavan, Guenhael Le Quilliec, Piotr Breitkopf, Alain Rassineux, Jean- Marc Roelandt, and Pierre Villon. Numerical assessment of springback for the deep drawing process by level set interpolation using shape manifolds. Interna- tional Journal of Material Forming, 7(4):487?501, 2014. [169] Balaji Raghavan, Liang Xia, Piotr Breitkopf, Alain Rassineux, and Pierre Vil- lon. Towards simultaneous reduction of both input and output spaces for inter- active simulation-based structural design. Computer Methods in Applied Me- chanics and Engineering, 265:174?185, 2013. [170] Tom Rainforth, Tuan Anh Le, Jan-Willem van de Meent, Michael A Osborne, and Frank Wood. Bayesian optimization for probabilistic programs. In Advances in Neural Information Processing Systems, pages 280?288, 2016. [171] Carl Edward Rasmussen. Gaussian processes in machine learning. In Advanced lectures on machine learning, pages 63?71. Springer, 2004. [172] C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learn- ing. the MIT Press, 2006. [173] T Ray and HM Tsai. Swarm algorithm for single-and multiobjective airfoil design optimization. AIAA journal, 42(2):366?373, 2004. [174] Jeffrey C Regier and Philip B Stark. Mini-minimax uncertainty quantification for emulators. SIAM/ASA Journal on Uncertainty Quantification, 3(1):686? 708, 2015. [175] Yi Ren and Panos Y Papalambros. A design preference elicitation query as an optimization process. Journal of Mechanical Design, 133(11):111004, 2011. [176] Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 3, 2017. [177] Filipe Rodrigues, Francisco Pereira, and Bernardete Ribeiro. Gaussian process classification and active learning with multiple annotators. In Tony Jebara and Eric P. Xing, editors, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 433?441. JMLR Workshop and Conference Proceedings, 2014. [178] T Rogalsky, S Kocabiyik, and RW Derksen. Differential evolution in aerody- namic optimization. Canadian Aeronautics and Space Journal, 46(4):183?190, 2000. 193 [179] Jamshid A Samareh. Survey of shape parameterization techniques for high- fidelity multidisciplinary shape optimization. AIAA journal, 39(5):877?884, 2001. [180] Oksana Samko, A David Marshall, and Paul L Rosin. Selection of the opti- mal parameter value for the isomap algorithm. Pattern Recognition Letters, 27(9):968?979, 2006. [181] Greg Schohn and David Cohn. Less is more: Active learning with support vector machines. In ICML, pages 839?846, 2000. [182] Bernhard Scho?lkopf, Alexander Smola, and Klaus-Robert Mu?ller. Kernel prin- cipal component analysis. In International Conference on Artificial Neural Net- works, pages 583?588. Springer, 1997. [183] Bernhard Scho?lkopf, Alexander Smola, and Klaus-Robert Mu?ller. Nonlin- ear component analysis as a kernel eigenvalue problem. Neural computation, 10(5):1299?1319, 1998. [184] Matthias Schramm, Bernhard Stoevesandt, and Joachim Peinke. Simulation and optimization of an airfoil with leading edge slat. Journal of Physics: Con- ference Series, 753(2):022052, 2016. [185] Matthias Schramm, Bernhard Stoevesandt, and Joachim Peinke. Optimization of airfoils using the adjoint approach and the influence of adjoint turbulent viscosity. Computation, 6(1):5, Jan 2018. [186] Thomas W Sederberg and Scott R Parry. Free-form deformation of solid geo- metric models. ACM SIGGRAPH computer graphics, 20(4):151?160, 1986. [187] Burr Settles. Active learning literature survey. University of Wisconsin, Madi- son, 52(55-66):11, 2010. [188] Burr Settles and Mark Craven. An analysis of active learning strategies for sequence labeling tasks. In Proceedings of the conference on empirical methods in natural language processing, pages 1070?1079. Association for Computational Linguistics, 2008. [189] Bobak Shahriari. Practical Bayesian optimization with application to tuning machine learning algorithms. PhD thesis, University of British Columbia, 2016. [190] Bobak Shahriari, Alexandre Bouchard-Co?te?, and Nando Freitas. Unbounded bayesian optimization via regularization. In Artificial Intelligence and Statistics, pages 1168?1176, 2016. 194 [191] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of bayesian opti- mization. Proceedings of the IEEE, 104(1):148?175, 2015. [192] David F Shanno. Conditioning of quasi-newton methods for function minimiza- tion. Mathematics of computation, 24(111):647?656, 1970. [193] Viktoriia Sharmanska, Daniel Herna?ndez-Lobato, Jose Miguel Hernandez- Lobato, and Novi Quadrianto. Ambiguity helps: Classification with disagree- ments in crowdsourced annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2194?2202, 2016. [194] Oana Sidi, Oliver van Kaick, Yanir Kleiman, Hao Zhang, and Daniel Cohen-Or. Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering. ACM Trans. Graph., 30(6):126:1?126:10, December 2011. [195] Prashant Singh, Joachim Van Der Herten, Dirk Deschrijver, Ivo Couckuyt, and Tom Dhaene. A sequential sampling strategy for adaptive classification of computationally expensive data. Structural and Multidisciplinary Optimization, 55(4):1425?1438, 2017. [196] Ayan Sinha, Jing Bai, and Karthik Ramani. Deep learning 3d shape surfaces using geometry images. In European Conference on Computer Vision, pages 223?240. Springer, 2016. [197] Ayan Sinha, Asim Unmesh, Qixing Huang, and Karthik Ramani. Surfnet: Generating 3d shape surfaces using deep residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6040?6049, 2017. [198] Shaun N Skinner and Hossein Zare-Behtash. State-of-the-art in aerodynamic shape optimisation methods. Applied Soft Computing, 62:933?962, 2018. [199] Edward Smith and David Meger. Improved adversarial systems for 3d object generation and reconstruction. arXiv preprint arXiv:1707.09557, 2017. [200] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few- shot learning. In Advances in Neural Information Processing Systems, pages 4077?4087, 2017. [201] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian opti- mization of machine learning algorithms. In Advances in neural information processing systems, pages 2951?2959, 2012. [202] Helmut Sobieczky. Parametric airfoils and wings. In Recent development of aerodynamic design methodologies, pages 71?87. Springer, 1999. 195 [203] Amir Arsalan Soltani, Haibin Huang, Jiajun Wu, Tejas D Kulkarni, and Joshua B Tenenbaum. Synthesizing 3d shapes via modeling multi-view depth maps and silhouettes with deep generative networks. In The IEEE conference on computer vision and pattern recognition (CVPR), volume 3, page 4, 2017. [204] Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, and Frank Hutter. Bayesian optimization with robust bayesian neural networks. In Advances in Neural Information Processing Systems, pages 4134?4142, 2016. [205] Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995, 2009. [206] Niranjan Srinivas, Andreas Krause, Matthias Seeger, and Sham M Kakade. Gaussian process optimization in the bandit setting: No regret and experimen- tal design. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 1015?1022, 2010. [207] Kevin Swersky, Jasper Snoek, and Ryan P Adams. Multi-task bayesian opti- mization. In Advances in neural information processing systems, pages 2004? 2012, 2013. [208] Jerry Talton, Lingfeng Yang, Ranjitha Kumar, Maxine Lim, Noah Goodman, and Radom??r Me?ch. Learning design patterns with bayesian grammar induction. In Proceedings of the 25th annual ACM symposium on User interface software and technology, pages 63?74. ACM, 2012. [209] Jerry O Talton, Daniel Gibson, Lingfeng Yang, Pat Hanrahan, and Vladlen Koltun. Exploratory modeling with collaborative design spaces. ACM Trans- actions on Graphics-TOG, 28(5):167, 2009. [210] Jerry O Talton, Yu Lou, Steve Lesser, Jared Duke, Radom??r Me?ch, and Vladlen Koltun. Metropolis procedural modeling. ACM Transactions on Graphics (TOG), 30(2):11, 2011. [211] Emad Tandis and Ehsanolah Assareh. Inverse design of airfoils via an intelligent hybrid optimization technique. Engineering with Computers, 33(3):361?374, Jul 2017. [212] Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox. Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs. In Proc. of the IEEE International Conf. on Computer Vision (ICCV), vol- ume 2, page 8, 2017. 196 [213] Yonatan A. Tesfahunegn, Slawomir Koziel, Leifur Leifsson, and Adrian Bekasiewicz. Surrogate-based airfoil design with space mapping and adjoint sensitivity. Procedia Computer Science, 51:795 ? 804, 2015. International Con- ference On Computational Science, ICCS 2015. [214] Marco Tezzele, Filippo Salmoiraghi, Andrea Mola, and Gianluigi Rozza. Di- mension reduction in heterogeneous parametric spaces with application to naval engineering shape design problems. arXiv preprint arXiv:1709.03298, 2017. [215] Simon Tong and Daphne Koller. Support vector machine active learning with applications to text classification. Journal of machine learning research, 2(Nov):45?66, 2001. [216] LJP Van der Maaten, EO Postma, and HJ Van den Herik. Dimensionality reduction: A comparative review. Technical Report TiCC TR 2009-005, 2009. [217] Emmanuel Vazquez and Julien Bect. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. Journal of Statistical Planning and inference, 140(11):3088?3095, 2010. [218] P Venkataraman. A new procedure for airfoil definition. In 13th Applied Aero- dynamics Conference, page 1875, 1995. [219] Gerhard Venter and Jaroslaw Sobieszczanski-Sobieski. Particle swarm optimiza- tion. AIAA journal, 41(8):1583?1589, 2003. [220] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning, pages 1096?1103. ACM, 2008. [221] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre- Antoine Manzagol. Stacked denoising autoencoders: Learning useful represen- tations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371?3408, 2010. [222] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in neural information processing systems, pages 3630?3638, 2016. [223] Asha Viswanath, AIJ Forrester, and AJ Keane. Constrained design optimiza- tion using generative topographic mapping. AIAA journal, 52(5):1010?1023, 2014. [224] Asha Viswanath, AI J. Forrester, and AJ Keane. Dimension reduction for aerodynamic design optimization. AIAA journal, 49(6):1256?1266, 2011. 197 [225] Kun Wang, Shengjiao Yu, and Tiegang Liu. Airfoil optimization based on iso- geometric discontinuous galerkin. In Proceedings of the 2018 2Nd International Conference on Algorithms, Computing and Systems, ICACS ?18, pages 227?231, New York, NY, USA, 2018. ACM. [226] Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. O- cnn: Octree-based convolutional neural networks for 3d shape analysis. ACM Transactions on Graphics (TOG), 36(4):72, 2017. [227] Xiaolong Wang and Abhinav Gupta. Generative image modeling using style and structure adversarial networks. In European Conference on Computer Vision, pages 318?335. Springer, 2016. [228] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829, 2018. [229] Zi Wang and Stefanie Jegelka. Max-value entropy search for efficient bayesian optimization. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 3627?3635. JMLR. org, 2017. [230] William J Welch, Robert J Buck, Jerome Sacks, Henry P Wynn, Toby J Mitchell, and Max D Morris. Screening, predicting, and computer experiments. Technometrics, 34(1):15?25, 1992. [231] Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenen- baum. Learning a probabilistic latent space of object shapes via 3d generative- adversarial modeling. In Advances in Neural Information Processing Systems, pages 82?90, 2016. [232] David F Wyatt, David C Wynn, Jerome P Jarrett, and P John Clarkson. Supporting product architecture design using computational design synthesis with network structure constraints. Research in Engineering Design, 23(1):17? 52, 2012. [233] Kai Xu, Vladimir G Kim, Qixing Huang, Niloy Mitra, and Evangelos Kaloger- akis. Data-driven shape analysis and processing. In SIGGRAPH ASIA 2016 Courses, page 4. ACM, 2016. [234] Kai Xu, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen. Fit and diverse: set evolution for inspiring 3d shape galleries. ACM Transactions on Graphics (TOG), 31(4):57, 2012. [235] Xufeng Yang, Yongshou Liu, Yi Gao, Yishang Zhang, and Zongzhan Gao. An active learning kriging model for hybrid reliability analysis with both ran- dom and interval variables. Structural and Multidisciplinary Optimization, 51(5):1003?1016, 2015. 198 [236] Yi Yang, Zhigang Ma, Feiping Nie, Xiaojun Chang, and Alexander G Haupt- mann. Multi-class active learning by uncertainty sampling with diversity max- imization. International Journal of Computer Vision, 113(2):113?127, 2015. [237] Bernard Yannou, Faysal Moreno, Henri J Thevenot, and Timothy W Simp- son. Faster generation of feasible design points. In ASME 2005 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, pages 355?363. American Society of Mechanical Engi- neers, 2005. [238] Houpu Yao, Yi Ren, and Yongming Liu. Fea-net: A deep convolutional neural network with physicsprior for efficient data driven pde learning. In AIAA Scitech 2019 Forum, page 0680, 2019. [239] QIU Yasong, BAI Junqiang, LIU Nan, and WANG Chen. Global aerodynamic design optimization based on data dimensionality reduction. Chinese Journal of Aeronautics, 31(4):643?659, 2018. [240] Mehmet Ersin Yumer, Paul Asente, Radomir Mech, and Levent Burak Kara. Procedural modeling using autoencoder networks. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, UIST ?15, pages 109?118, New York, NY, USA, 2015. ACM. [241] Mehmet Ersin Yumer, Siddhartha Chaudhuri, Jessica K. Hodgins, and Lev- ent Burak Kara. Semantic shape editing using deformation handles. ACM Trans. Graph., 34(4):86:1?86:12, July 2015. [242] Mehmet Ersin Yumer and Levent Burak Kara. Co-constrained handles for defor- mation in shape collections. ACM Transactions on Graphics (TOG), 33(6):187, 2014. [243] Matthew D Zeiler, Dilip Krishnan, Graham W Taylor, and Rob Fergus. De- convolutional networks. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2528?2535. IEEE, 2010. [244] Lihi Zelnik-Manor and Pietro Perona. Self-tuning spectral clustering. In NIPS, volume 17, page 16, 2004. [245] Zhenyue Zhang, Jing Wang, and Hongyuan Zha. Adaptive manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2):253? 265, 2012. [246] Youyi Zheng, Daniel Cohen-Or, and Niloy J. Mitra. Smart variations: Func- tional substructures for part compatibility. Computer Graphics Forum (Euro- graphics), 32(2pt2):195?204, 2013. 199 [247] Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani. Combining active learn- ing and semi-supervised learning using gaussian fields and harmonic functions. In ICML 2003 workshop on the continuum from labeled to unlabeled data in machine learning and data mining, volume 3, 2003. [248] Xiaotian Zhuang and Rong Pan. A sequential sampling strategy to improve reliability-based design optimization with implicit constraint functions. Journal of Mechanical Design, 134(2):021002, 2012. [249] Lavi R Zuhal, Cahya Amalinadhi, Yohanes B Dwianto, Pramudita S Palar, and Koji Shimoyama. Benchmarking multi-objective bayesian global optimization strategies for aerodynamic design. In 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, page 0914, 2018. 200