ABSTRACT

Title of dissertation: INTEGRATED FIELD INVERSION AND MACHINE LEARNING WITH EMBEDDED NEURAL NETWORK TRAINING FOR TURBULENCE MODELING

Jonathan Holland, Doctor of Philosophy, 2019

Dissertation directed by: Dr. James Baeder, Professor, Department of Aerospace Engineering, University of Maryland, College Park, and Dr. Karthik Duraisamy, Associate Professor, Department of Aerospace Engineering, University of Michigan, Ann Arbor

A rich set of experimental and high fidelity simulation data is available to improve Reynolds Averaged Navier Stokes (RANS) models of turbulent flow. In practice, using this data is difficult, as measured quantities cannot be used to improve models directly. The Field Inversion and Machine Learning (FIML) approach addressed this challenge through an inference step, in the form of an inverse problem, which treats inconsistencies between the models and the data in a consistent manner. However, a separate learning algorithm is not always able to learn accurately from the data generated by the inverse problem. Two new methods of incorporating higher fidelity data into RANS turbulence models via machine learning are proposed and applied for the first time in this thesis. Both build on the FIML framework by performing learning during the inference step, instead of considering the inference and learning steps separately as in the classic FIML approach. The first new approach embeds neural network learning into the RANS solver, and the second trains the weights of the neural network directly. Additionally, for the first time, the inverse problem can incorporate higher fidelity data from multiple cases simultaneously, promising to improve the generalization of the augmented model. The two new methods and the classic approach are demonstrated with a simple model problem, as well as a number of challenging RANS cases. For a 2D airfoil case, all three FIML augmentations are shown to improve predictions, with the new methods demonstrating increased regularization. Additionally, a model augmentation is generated by considering seven angles of attack of an airfoil in the inference step, and the augmentation is shown to improve predictions on a different airfoil. Additional cases are considered, including a transonic shock wave boundary layer interaction and the NASA wall-mounted hump. In all cases, the inference is shown to improve predictions. For the first time, the inverse problem accounts for the limitations of the learning procedure, guaranteeing that the model discrepancy is optimal for the chosen learning algorithm. The results in this thesis prove that learning during the inference step provides additional regularization, and guarantees the inference produces learnable model discrepancy.

Integrated Field Inversion and Machine Learning With Embedded Neural Network Training for Turbulence Modeling

by Jonathan Richard Holland

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2019

Advisory Committee: Dr. James Baeder, Chair/Advisor; Dr. Karthik Duraisamy, Co-Advisor; Dr. Stuart Laurence; Dr. Kenneth Yu; Dr. Robert Sanner; Dr. Nikhil Chopra
© Copyright by Jonathan Richard Holland 2019

Acknowledgments

This research was supported by the Vertical Lift Research Center Of Excellence (VLRCOE) at the University of Maryland and the ONR program Advancing Predictive Strategies for Wall-Bounded Turbulence by Fundamental Studies and Data-driven Modeling at the University of Michigan. Computational resources were provided by the University of Maryland Deepthought2 High Performance Computing cluster.

Table of Contents

Acknowledgments
List of Tables
List of Figures
List of Abbreviations
List of Symbols

1 Introduction
  1.1 Motivation
  1.2 Physical Problem
  1.3 A Brief History of Turbulent Boundary Layer Analysis
  1.4 Physics of Boundary Layers
    1.4.1 Turbulent Boundary Layer Structure
    1.4.2 Factors Affecting Turbulent Boundary Layers
    1.4.3 Shock Wave Turbulent Boundary Layer Interactions
  1.5 Challenges in Turbulent Boundary Layer Modeling
    1.5.1 RANS Modeling of Turbulent Flows Over Airfoils
    1.5.2 RANS Predictions of Shock Wave Turbulent Boundary Layer Interactions
  1.6 Machine Learning with Neural Networks for Regression
  1.7 Leveraging Machine Learning for Turbulence Modeling
  1.8 Objectives
  1.9 Contributions of Thesis
  1.10 Scope and Organization of Thesis

2 Progress in Data-Informed Turbulence Modeling
  2.1 Calibrating Turbulence Models From Data
  2.2 Improving Turbulence Models with Data and Machine Learning
  2.3 Summary

3 Field Inversion and Machine Learning With Embedded Neural Network Training Development
  3.1 Overview
  3.2 Field Inversion and Machine Learning: FIML-Classic
    3.2.1 Neural Networks for Regression
  3.3 Field Inversion and Machine Learning with Embedded Backpropagation: FIML-Embedded
  3.4 Field Inversion and Machine Learning with Direct Training: FIML-Direct
  3.5 Computational Cost
  3.6 Summary

4 Numerical Methodology
  4.1 Governing Equations
  4.2 Reynolds-Averaged Navier-Stokes Equations
    4.2.1 The Spalart-Allmaras Turbulence Model
  4.3 The Spalart-Allmaras 1-equation BC Transitional Model (SA-BC)
    4.3.1 Strain Adaptive Spalart Allmaras Model: SA-SALSA
  4.4 Neural Network Structure and Training
    4.4.1 Feature Selection and Scaling
  4.5 Field Inversion and Machine Learning
    4.5.1 FIML-Classic
    4.5.2 FIML-Embedded
    4.5.3 FIML-Direct
    4.5.4 Unconstrained Optimization By Gradient Descent
      4.5.4.1 The BFGS Method
      4.5.4.2 The Stochastic Gradient Descent Method
  4.6 Complex Step Method
  4.7 Adjoint Methods
    4.7.1 Discrete Adjoint Implementation
  4.8 Model Problem: The 1D Heat Equation
    4.8.1 Model Problem: FI-Classic Implementation
    4.8.2 Model Problem: FIML-Embedded
    4.8.3 Model Problem: FIML-Direct
  4.9 Summary

5 One-Dimensional Heat Equation Simulations
  5.1 Overview
  5.2 FIML-Classic
  5.3 FIML-Embedded
  5.4 FIML-Direct
    5.4.1 Inversion Convergence Comparison
  5.5 Summary

6 FIML Results for RANS Applications
  6.1 Overview
  6.2 S809 Airfoil
    6.2.1 FI-Classic
    6.2.2 Feature Selection and Scaling
    6.2.3 FIML-Classic
    6.2.4 FIML-Embedded
    6.2.5 FIML-Direct
    6.2.6 Comparison
    6.2.7 FIML-Direct for Simultaneous Inversion of Multiple Cases
  6.3 RAE2822 Airfoil
    6.3.1 FI-Classic
    6.3.2 FIML-Direct
  6.4 NASA Langley Hump
    6.4.1 Overview
    6.4.2 FI-Classic
    6.4.3 FIML-Direct
  6.5 Summary

7 Conclusions
  7.1 Summary
  7.2 Key Observations
    7.2.1 FIML-Embedded
    7.2.2 FIML-Direct
    7.2.3 1D Heat Equation Model Problem
    7.2.4 2D Incompressible Airfoil Results
    7.2.5 2D Incompressible Airfoil Results for Multiple Cases
    7.2.6 Transonic Airfoil Results
    7.2.7 NASA Langley Hump Results
    7.2.8 Other RANS Applications
  7.3 Recommendations for Future Work

A Hypersonic Wedge
  A.1 Overview
  A.2 FI-Classic
  A.3 Conclusions

B Onera M6
  B.1 Overview
  B.2 FI-Classic
  B.3 Conclusions

List of Tables

6.1 Comparison of target (k_d), inverse, and augmented model lift coefficients.
6.2 Comparison of C_L following inversion and/or model augmentation for FIML approaches on the S809 airfoil; the asterisk (*) is applied to emphasize that some results, as discussed, are not fully converged so confident conclusions about the relative performance of the three FIML approaches cannot be made.

List of Figures

1.1 Illustration (not to scale) of a typical turbulent boundary layer velocity profile and a boundary layer subject to an adverse pressure gradient, adapted from [1, 2, 3].
1.2 Spalart Allmaras Predicted Lift Coefficient for the S809 Airfoil Compared With Wind Tunnel Data.
1.3 Spalart Allmaras Predicted Eddy Viscosity for the S809 airfoil at 14.2° angle of attack.
1.4 Mach contour of a transonic RAE2822 airfoil at 2.31° angle of attack and Mach 0.725.
1.5 Illustration of a smeared shock foot, adapted from [4].
1.6 Pressure coefficient contour of a transonic RAE2822 airfoil at 2.31° angle of attack and Mach 0.725.
1.7 Flow structure of a ramp induced separated SWTBLI, adapted from [4].
3.1 Chart of FIML-Classic procedure illustrating learning separate from inversion process.
3.2 FIML-Classic Procedure for Producing Model Augmentations Incorporating Information Learned from Multiple Inversions.
3.3 Chart of FIML-Embedded and FIML-Direct procedure illustrating that offline learning is not required for these approaches.
3.4 FIML-Direct procedure for simultaneous inversion of multiple cases to produce single augmentation.
4.1 Illustration of a single neuron in a neural network.
4.2 Illustration of chosen neural network structure.
4.3 Flowchart of FIML-Classic RANS implementation in SU2 illustrating input and output of major modules.
4.4 Flowchart of FIML-Embedded RANS implementation in SU2 illustrating input and output of major modules.
4.5 Flowchart of FIML-Direct RANS implementation in SU2 illustrating input and output of major modules.
4.6 Flowchart of FIML-Direct RANS implementation for multiple cases.
4.7 Comparison of the error as a function of step size for the finite difference method and complex variable step method.
4.8 Illustration of the iteration process for 1D Heat direct (primal) solver for FIML-Embedded.
5.1 Figure of FI-Classic results for T∞ = 50.
5.2 FI-Classic results displaying computed β(z) for T∞ = 50.
5.3 Objective function convergence history of (J_c) for T∞ = 50.
5.4 FIML-Classic Neural Network Error Following Training.
5.5 FIML-Classic results of model augmentation for training temperatures.
5.6 FIML-Classic results of model augmentation for holdout temperatures.
5.7 FIML-Classic neural network correction for holdout conditions.
5.8 Features for FIML-Classic training and holdout cases.
5.9 FIML-Embedded results of model augmentation.
5.10 FIML-Embedded correction output from model augmentation.
5.11 FIML-Embedded objective function minimization history.
5.12 FIML-Direct results of model augmentation for a single training temperature.
5.13 FIML-Direct correction for a single training temperature.
5.14 FIML-Direct correction for a single training temperature.
5.15 FIML-Direct results of model augmentation for four training temperatures.
5.16 FIML-Direct Objective Function, J_d, Convergence using BFGS minimization for four training temperatures.
5.17 FIML-Direct correction, β(T, T∞), for training using four temperatures.
5.18 FIML-Direct results of model augmentation for four unseen T∞(z) distributions.
5.19 FIML-Direct correction, β(T, T∞), for holdout T∞ distributions.
5.20 FIML-Direct feature space depiction for training and holdout T∞(z) distributions.
5.21 FIML-Direct stochastic gradient descent results for training temperatures.
5.22 FIML-Direct stochastic gradient descent results for holdout T∞(z) distributions.
5.23 FIML-Direct stochastic gradient descent objective function J_d convergence history.
5.24 Convergence history of the objective functions for all three methods.
6.1 Baseline SA-BC streamlines and Cp contours for S809 airfoil.
6.2 Initial gradient for FI-Classic.
6.3 Convergence history of S809 airfoil for FI-Classic.
6.4 Optimal correction field found by FI-Classic.
6.5 Magnitude of the gradient vector of the objective function during the inversion for FI-Classic.
6.6 Scatterplot pairs for some considered features.
6.7 Correlation coefficient pairs for some considered features.
6.8 Scatterplot pairs for some considered features mapped to normal distribution.
6.9 Correlation coefficient pairs for some considered features mapped to normal distribution.
6.10 R^2 scores for each subset of features during sequential back selection.
6.11 Comparison between correction found in inversion and output of neural network following learning.
6.12 Pressure distribution plot showing improvement in prediction for the FIML-Classic augmentation.
6.13 Convergence history of S809 airfoil field inversion.
6.14 C_L convergence at minimum J_e evaluation.
6.15 SSE convergence at minimum J_e evaluation.
6.16 FIML-Embedded comparison of training correction β_T and correction output by converged neural network augmentation β.
6.17 Convergence history of S809 airfoil FIML-Direct inversion for λ = 10^-4.
6.18 Correction field (β) for FIML-Direct with λ = 10^-4.
6.19 Correction field (β) for FIML-Direct with λ = 10^-5.
6.20 Comparison of pressure distributions for baseline SA-BC model and FIML augmentations.
6.21 Comparison of production term correction (β) for FIML inversions.
6.22 Convergence history of S809 airfoil FIML-Direct simultaneous inversion at 3 Angles of Attack.
6.23 Correction for all three angles of attack at best evaluation.
6.24 Summary of augmentation performance for training and holdout cases for augmentation trained on three angles of attack.
6.25 S809 augmentation lift performance for training set of all seven angles of attack.
6.26 S809 augmentation drag performance for training set of all seven angles of attack.
6.27 S814 baseline SA-BC pressure and streamline predictions for α = 16.2°.
6.28 S814 C_L resulting from augmentation trained on S809 angles of attack.
6.29 S814 β fields for three angles of attack.
6.30 Pressure distribution plot for the S814 at 16.2° angle of attack showing improvement in prediction for the model augmentation.
6.31 Convergence of RAE2822 FI-Classic objective function.
6.32 RAE2822 FI-Classic pressure distribution results.
6.33 RAE2822 FI-Classic pressure distribution results near shock location.
6.34 FI-Classic correction for RAE2822 application.
6.35 Pressure distribution and FI-Classic correction near the smeared shock foot.
6.36 FIML-Direct convergence for RAE2822 airfoil test case with λ = 10^-6.
6.37 FIML-Direct correction (β(w)) for RAE2822 airfoil test case.
6.38 FIML-Direct correction at the shock interaction location for RAE2822 airfoil test case.
6.39 FIML-Direct pressure distribution results.
6.40 FIML-Direct pressure distribution results at the shock location.
6.41 Baseline SA model predictions for pressure coefficient over the NASA Langley wall mounted hump.
6.42 FI-Classic convergence for NASA Langley Hump test case with λ = 0.
6.43 FI-Classic correction field β for case with λ = 0.
6.44 FI-Classic Cp comparison to available data k_d for test case with λ = 0.
6.45 FI-Classic convergence for test case with λ = 10^-6.
6.46 FI-Classic correction field β for λ = 10^-6.
6.47 FI-Classic correction field β with streamlines for test case with λ = 10^-6.
6.48 FI-Classic Cp comparison to available data k_d for test case with λ = 10^-6.
6.49 FIML-Direct convergence for test case with λ = 10^-7.
6.50 FIML-Direct correction field β with streamlines for test case with λ = 10^-7.
6.51 FIML-Direct Cp comparison to available data k_d for test case with λ = 10^-7.
A.1 Illustration of geometry for hypersonic wedge case.
A.2 Mach number contours for baseline SA-SALSA model.
A.3 Pressure coefficient contours for baseline SA-SALSA model.
A.4 Temperature (K) contours for baseline SA-SALSA model.
A.5 Objective function convergence for FI-Classic hypersonic wedge application.
A.6 Correction (β) at interaction region for hypersonic wedge FI-Classic application.
A.7 Correction (β) upstream of interaction region for hypersonic wedge FI-Classic application.
A.8 Pressure distribution near the interaction region for FI-Classic hypersonic wedge application.
A.9 Wall heat flux near the interaction region for FI-Classic hypersonic wedge application (W/m^2).
B.1 Surface pressure C_P contours for baseline SA model on upper surface of Onera M6 wing.
B.2 FI-Classic results for Onera M6 wing.
B.3 Inverse results at span η = 0.2.
B.4 Inverse results at span η = 0.44.
B.5 Inverse results at span η = 0.95.
B.6 Scatterplot colored by error in pressure coefficient (|C_P* - C_P|).
B.7 Correction (β) at span η = 0.2.
B.8 Correction (β) at span η = 0.2, with wing surface contoured by Cp.
B.9 Correction (β) at span η = 0.95.
List of Abbreviations

AoA       Angle of Attack
APG       Adverse Pressure Gradient
BFGS      Broyden Fletcher Goldfarb Shanno Quasi-Newton Minimization Algorithm
CFD       Computational Fluid Dynamics
DNS       Direct Numerical Simulation
DES       Detached Eddy Simulation
FIML      Field Inversion and Machine Learning
HPC       High Performance Computing
LES       Large Eddy Simulation
L-BFGS-B  Limited Memory BFGS Method
NASA      National Aeronautics and Space Administration
RANS      Reynolds Averaged Navier Stokes
SA        Spalart-Allmaras Turbulence Model
SA-BC     Spalart-Allmaras Bas Cakmakcioglu One Equation Algebraic Transition Model
SGD       Stochastic Gradient Descent
SSE       Sum Squared Error
SU2       Stanford University Unstructured
SWTBLI    Shock Wave Turbulent Boundary Layer Interaction

List of Symbols

β              Correction to Turbulent Production
Cp, CD, CL     Pressure, Drag, and Lift Coefficients
cp, cv         Specific Heat at Constant Pressure and Volume
D              Turbulent Destruction in Spalart-Allmaras Model
Δ              Delta Criterion
e              Internal Energy
γ              Intermittency
Jc, Je, Jd     FIML-Classic, Embedded, and Direct Objective Functions
k              Thermal Conductivity
κ              Von Kármán Constant
km, kd         Model Output and Higher Fidelity Data
λ              Objective Function Regularization Constant
M              Mach Number
M              Model
ν              Kinematic Viscosity
νt             Eddy Viscosity
ν̃              Spalart-Allmaras Model Transported Variable
Ω              Vorticity
P              Turbulent Production in Spalart-Allmaras Model
p              Pressure
R              Ideal Gas Constant
Re             Reynolds Number
ρ              Density
S              Strain
T              Temperature
Tu∞            Freestream Turbulence Intensity
τw             Shear Stress at Wall
U+             Non-dimensional Boundary Layer Velocity
Uτ             Friction Velocity
xi             Spatial Dimension
y+             Non-dimensional Distance from Wall

Chapter 1: Introduction

1.1 Motivation

Many practical fluid dynamic applications require the accurate characterization of the dynamics of turbulent boundary layers. Even simple flows, such as incompressible flow over an airfoil, can require accurate modeling of boundary layer effects to predict the lift and drag accurately, even at moderate angles of attack. Other examples are readily available, such as shock wave turbulent boundary layer interactions (SWTBLI). Computational Fluid Dynamics (CFD) is commonly selected to analyze these cases, but unfortunately turbulence modeling for CFD is notoriously difficult. Despite immense effort and resources, many practical engineering applications are still difficult to predict computationally. The continued accelerating growth of computational power predicted by Moore's law has enabled the development of modeling applications with higher fidelity and potential accuracy, such as Large Eddy Simulation (LES) and Direct Numerical Simulation (DNS). However, LES remains out of reach for most practical engineering applications, introduces additional modeling uncertainties, and still requires immense computational resources. DNS has incredible usefulness for providing the exact solution but will remain out of reach for anything but the simplest applications for decades. RANS modeling will remain essential for fluid dynamics predictions, yet progress in RANS accuracy improvements has been frustratingly slow. Experiments, DNS, and LES have provided a wealth of information to leverage for turbulence modeling; unfortunately RANS models have yet to see substantial gains by incorporating inferences from these more fundamental sources of information. Machine learning provides an exciting new avenue to incorporate inferences from these higher fidelity simulations and experiments into RANS simulations, to increase accuracy and reliability.
It has been demonstrated that, through careful problem definition, corrections to RANS models can be generated for the fluid dynamic problem of interest to the modeler. These model discrepancies can be used by the modeler to improve RANS predictions. Additionally, by leveraging machine learning methods, a machine learned model can be generated that corrects and augments the turbulence model in a predictive environment. The application of these machine learning methods is not intended to replace the modeler, but rather to provide an avenue for generating inferences and incorporating patterns that are critical to RANS accuracy but may not be apparent to the modeler through other traditional analysis techniques. In summary, the lack of higher order CFD techniques for high-Reynolds number flows and the difficulty of incorporating higher fidelity data into RANS models provide strong motivation for developing methods of inferring corrections and augmenting existing models using machine learning.

1.2 Physical Problem

When fluid flows over a wall, viscosity slows the fluid at the wall to zero. In a thin layer near the wall, a velocity profile develops where the velocity is zero at the wall and increases away from the wall until it reaches its freestream value. This thin layer where viscous effects are critical is known as the boundary layer, and its concept is attributed to Prandtl [5]. Boundary layers can be divided into two types: laminar and turbulent. Laminar boundary layers are characterized by the streamlines of the fluid being largely parallel to the wall, with little mixing between the layers of the boundary layer and negligible motion normal to the wall. Laminar boundary layers can be destabilized through a variety of mechanisms (surface roughness, freestream turbulence, curvature, etc.), resulting in transition to turbulent boundary layers. Transition is the process by which the ordered, parallel streamlines of the laminar boundary layer destabilize into the disordered, chaotic structure of the turbulent boundary layer.

Despite being contained in a very thin layer near the wall, the turbulent boundary layer is responsible for numerous important fluid dynamic phenomena. Due to increased mixing in the turbulent boundary layer, the higher momentum from layers farther from the wall is transported closer to the wall. This creates an increased velocity gradient near the wall, resulting in increased skin friction and heat transfer. Additionally, the boundary layer can separate when subjected to an adverse pressure gradient. Turbulent boundary layers are less susceptible to boundary layer separation due to the increased momentum near the wall, which counteracts an adverse pressure gradient. This effect is important both for airfoils at high angles of attack and for shock boundary layer interactions, as the boundary layer in both cases is subjected to a strong adverse pressure gradient.

1.3 A Brief History of Turbulent Boundary Layer Analysis

Reynolds first distinguished between laminar and turbulent flows with his famous experiments in 1883. He observed dyed fluid in pipes and noted the point at which the dye became irregular (where the fluid transitions from laminar to turbulent flow). Through dimensional arguments and observations he discovered the now ubiquitous Reynolds number [6]:

$$ Re = \frac{\rho U L}{\mu} $$

This dimensionless quantity measures the ratio of the inertial forces (numerator) to the viscous forces (denominator).
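As a quick concrete check of the definition, the snippet below evaluates Re for sea-level air over a one-meter chord; this is a minimal sketch, and the flow values are illustrative rather than taken from any case in this thesis.

```python
# Reynolds number Re = rho*U*L/mu: ratio of inertial to viscous forces.
rho = 1.225    # air density at sea level, kg/m^3
mu  = 1.81e-5  # dynamic viscosity of air, kg/(m*s)
U   = 50.0     # freestream velocity, m/s (illustrative value)
L   = 1.0      # chord length, m (illustrative value)

Re = rho * U * L / mu
print(f"Re = {Re:.3e}")  # ~3.4e6, well into the turbulent regime for an airfoil
```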
Reynolds observed that the flow in a pipe transitions to turbulence at particular Reynolds numbers, and additionally observed that the transition location is particularly dependent on the upstream flow conditions (i.e., the transition location depends on upstream disturbances) [6, 7].

The concept of the "law of the wall", first presented by von Kármán, was developed from first principles and observations and was quickly accepted. This foundational work presented a simple relation describing the velocity profile of turbulent boundary layers [8]. Subsequent experiments, such as the experiments of Coles, found experimental values for the constants in the relation to calibrate the theory. The theory has been widely accepted and forms the basis and rationale for numerous computational models, including RANS turbulence models. Despite the ubiquitous nature of the law of the wall, it is still the subject of some controversy. There is still significant debate over the constants of the theory, or even whether the law of the wall is valid in a general sense. The debate is perhaps best explained by George [10].

Turbulent boundary layer analysis has increasingly made use of high fidelity simulations, due to the continuous advancement of computational resources. The first fully resolved DNS simulations of a turbulent channel were presented by Kim et al. [11]. This data is extremely valuable, and DNS data has in general proven to be the most accurate way of analyzing and understanding turbulent physics. Unfortunately, due to computational limitations, DNS remains far out of reach for practical Reynolds numbers. Experiments can measure flows with Reynolds numbers far out of reach of DNS simulations, but physical limitations of experimental methods still prevent direct measurement of turbulent statistics at Reynolds numbers relevant to large aircraft [10].

1.4 Physics of Boundary Layers

Turbulent boundary layer analysis is a rich field with an immense volume of established research and thought. This section briefly introduces the terminology and underlying physical principles of turbulent boundary layer analysis and modeling.

1.4.1 Turbulent Boundary Layer Structure

The fluid dynamics of a turbulent boundary layer are extremely complex. It is often useful to analyze the turbulent boundary layer by looking at the mean velocity profile. The instantaneous velocity in a turbulent boundary layer is chaotic, but in a mean sense the velocity will be nearly parallel to the wall, similar to the laminar boundary layer. Thus, we can simplify the velocity field to a mean velocity profile Ū(y) along the wall, with y being the distance from the wall. These are dimensional variables, but typically boundary layer velocity profiles are analyzed using the non-dimensional plus coordinates:

$$ y^+ = \frac{U_\tau \, y}{\nu} \qquad (1.1) $$

$$ U^+ = \frac{\bar{U}}{U_\tau} \qquad (1.2) $$

$$ U_\tau = \sqrt{\frac{\tau_w}{\rho}} \qquad (1.3) $$

where ν is the kinematic viscosity and τw is the shear stress at the wall. By noting that the mean velocity profile near the wall will depend on the distance to the wall (y), the shear stress at the wall (τw), the kinematic viscosity (ν), and the fluid density (ρ), from dimensional analysis we can find the law of the wall, which concludes that the mean velocity profile will be a function of the distance from the wall [12]:

$$ U^+ = f(y^+) \qquad (1.4) $$

Nearest to the wall, the viscous forces dominate in a region commonly referred to as the viscous sublayer. In this region (in plus coordinates) the mean velocity is equivalent to the distance from the wall (U+ = y+). At approximately y+ ≈ 5 there begins a buffer layer which connects the viscous sublayer to the log-law region. von Kármán first proposed the since widely accepted logarithmic law of the wall [8]. From first principles the law of the wall can be written:

$$ U^+ = \frac{1}{\kappa} \ln y^+ + B \qquad (1.5) $$

Conceptually, in plus coordinates, the turbulent boundary layer velocity profile is illustrated in Figure 1.1.

[Figure 1.1: Illustration (not to scale) of a typical turbulent boundary layer velocity profile and a boundary layer subject to an adverse pressure gradient, adapted from [1, 2, 3].]
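To make Eqs. (1.1)-(1.5) concrete, the sketch below evaluates the mean-velocity profile in plus coordinates. It is a minimal illustration assuming the commonly quoted constants κ ≈ 0.41 and B ≈ 5.0 (not necessarily the calibrated values used elsewhere in this thesis), and it omits the buffer-layer blending for brevity.

```python
import numpy as np

KAPPA = 0.41   # von Karman constant (commonly quoted value, assumed here)
B     = 5.0    # log-law intercept (commonly quoted value, assumed here)

def friction_velocity(tau_w, rho):
    """Eq. (1.3): U_tau = sqrt(tau_w / rho)."""
    return np.sqrt(tau_w / rho)

def u_plus(y_plus):
    """Mean velocity in plus coordinates: viscous sublayer (U+ = y+) below
    y+ ~ 5, log law (Eq. 1.5) farther out; the buffer layer in between is
    not blended here, so this switch is deliberately crude."""
    y_plus = np.asarray(y_plus, dtype=float)
    log_law = np.log(np.maximum(y_plus, 1e-12)) / KAPPA + B
    return np.where(y_plus < 5.0, y_plus, log_law)

u_tau  = friction_velocity(tau_w=1.5, rho=1.225)  # illustrative wall state
y_plus = np.logspace(0, 3, 7)                     # y+ from 1 to 1000
print(u_tau, np.round(u_plus(y_plus), 2))
```

For example, at y+ = 100 the log law gives U+ = ln(100)/0.41 + 5.0 ≈ 16.2, the slow logarithmic growth characteristic of the outer part of the profile in Figure 1.1.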
1.4.2 Factors Affecting Turbulent Boundary Layers

Boundary layer analysis is more difficult when considering more practical engineering flows, such as turbulent flow over an airfoil, or the reaction of the boundary layer to a sudden increase in pressure from a shock wave. Many factors can influence a turbulent boundary layer, and some of these factors are introduced in this section.

Real walls are not perfectly smooth, and naturally a viscous fluid interacting with a wall boundary will be influenced by the roughness of that boundary. Surface roughness primarily influences the viscous sublayer (the nearest layer to the wall) and has little effect on the outer layers. As the surface roughness increases, the log region will move lower and to the right in Figure 1.1. This will increase the velocity gradient at the wall. Because the shear stress at the wall is proportional to this gradient (Eq. 1.6), an increase in surface roughness is associated with an increase in τw and associated parameters, such as the skin friction (Cf = τw/(0.5ρv^2)) [1].

$$ \tau_w = \mu \frac{\partial U}{\partial y} \qquad (1.6) $$

If surface roughness is considered at all, it is often correlated only to the average roughness size, even though it is clear that this is not the only parameter required to capture the effects of surface roughness [1]. For all the cases considered in this thesis, the boundary layer data was obtained from experiments with models manufactured to be as smooth as possible. Therefore, as is typical, surface roughness effects are neglected.

Most importantly for the present work, boundary layers are strongly influenced by pressure gradients. Opposite to surface roughness, pressure gradients have the strongest effect on the outer layer. If subjected to an adverse pressure gradient (dp/dx > 0), the fluid will slow, and the boundary layer profile shown in Figure 1.1 will begin to show a slower velocity profile in the outer layer. It has been shown that the inner layer velocity profile is remarkably resilient to the influence of pressure gradients [1]; however, in a strong adverse pressure gradient the defect layer begins closer to the wall as the outer layer velocity slows [3]. Conceptually, this is shown in Figure 1.1. Ultimately, given a strong enough adverse pressure gradient, the momentum in the defect layer can slow to the point that the boundary layer separates. Therefore, accurately modeling the behavior of the boundary layer is critical for a variety of practical engineering flows.

1.4.3 Shock Wave Turbulent Boundary Layer Interactions

Another challenge for turbulent boundary layer analysis is shock wave boundary layer interaction. A shock wave marks a very rapid increase in pressure, temperature, and density. In many fluid dynamic problems of interest a shock wave impinges on a turbulent boundary layer.
For example, transonic flow over the suction side of a transonic airfoil, as well as supersonic inlets and isolators are common ex- amples of flow conditions that result in shock boundary layer interactions. When a shock wave impinges on a turbulent boundary layer the boundary layer is subjected to a strong adverse pressure gradient. The reaction of the boundary layer to the shock wave can be very difficult to predict. 1.5 Challenges in Turbulent Boundary Layer Modeling Turbulent boundary layer predictions for many practical engineering applica- tions rely on boundary layer models. Experimental data is often expensive, or diffi- cult to obtain. Computational fluid dynamic (CFD) models of turbulent boundary layers are often used to generate predictions. CFD seeks to numerically solve the physical equations that govern fluid flows: the Navier Stokes equations. Ideally, the exact equations could be solved (DNS), but for turbulent flows at practical Reynolds numbers it is not computationally possible to resolve the exceedingly small turbu- lent eddies that define the dynamics of these problems. The required grid sizes to do so are exceedingly small, requiring immense computational resources. LES re- laxes this requirement somewhat, but still requires unattainable resources for high Reynolds number cases and introduces additional model uncertainties. Therefore, for practical turbulent CFD predictions it is common to apply the RANS equations. For RANS, the effects of turbulence are modeled, and the development, calibration, testing, uncertainty quantification, and validation of these models on all varieties 10 of turbulent flows has been an intense area of research for decades. This section will focus on the difficulties of RANS predictions for two broad classes of turbulent applications used in the current work: turbulent flow over airfoils and shock wave turbulent boundary layer interactions. 1.5.1 RANS Modeling of Turbulent Flows Over Airfoils While RANS models yield excellent predictions for a variety of applications, there remain many common engineering applications for which RANS models re- main surprisingly deficient. Specifically, for 2D flow over subsonic airfoils at low angles of attack predictions of the forces and moments on the airfoil are generally well predicted by a variety of RANS models. However, as the angle of attack is in- creased a strong adverse pressure gradient develops on the upper (suction) surface. Eventually, the boundary layer will separate, and a recirculating region will form. It has been well documented that RANS models have difficulty predicting the size of this recirculating region, and therefore have an inaccurate prediction of the forces and moments once the boundary layer separates [3, 13, 14]. Celic and Hirschel evaluated a variety of algebraic, one, and two equation eddy viscosity models on four adverse pressure gradient cases, including separated flow over 2D airfoils. They concluded that none of the eddy viscosity models showed a clear advantage for all the considered cases of adverse pressure gradient flows. In particular, it was shown that the prediction accuracy for the 2D airfoil case was generally far poorer than the other cases considered; although this could be contributed at least in part to 11 additional experimental uncertainty for this case [14]. Matyushenko et al. performed a detailed investigation into the prediction errors of RANS models for 2D airfoils at high angles of attack. 
A variety of turbulence models were examined for a set of 2D airfoils including the S809 wind turbine airfoil. Errors in the transition model were considered, as well as various sources of experimental error were examined parametrically, to evaluate the likely sources of observed RANS prediction errors in the separated angle of attack region of the airfoils. Ultimately it was concluded that the primary source of error must be the turbulence models themselves [15]. Figure 1.2 shows results for RANS predictions of a two dimensional airfoil designed for wind turbine applications (S809 airfoil). Experimentally derived lift coefficients for this airfoil are shown from Somers and Tangler [16], and are compared to RANS predictions for a range of angles of attack using the Spalart Allmaras turbulence model [17] with an algebraic transition model [18]. Note that at low angles of at- tack (< 5?) the RANS predicted lift coefficient agrees well with the experiment. At high angles of attack the result is quite poor. This result is typical for a variety of RANS models for 2D airfoils [14, 15]. All RANS closure models utilize assumptions and some level of empiricism to provide closure to the RANS equations. One of these assumptions present in the Spalart Allmaras turbulence model (and others) is the assumption of an equilibrium boundary layer in the log law region [17]. In the presence of an adverse pressure gradient this assumption is violated. The adverse pressure gradient slows the flow in the outer regions of the boundary layer, and the defect layer begins closer to the wall. As the adverse pressure gradient increases, the momentum in the log law region is 12 Figure 1.2: Spalart Allmaras Predicted Lift Coefficient for the S809 Airfoil Com- pared With Wind Tunnel Data. decreased, and the equilibrium assumption of the eddy viscosity model is violated. As demonstrated by Medida, the eddy viscosity is over-predicted in the presence of an adverse pressure gradient [3, 13]. This over-prediction results in the RANS prediction of a boundary layer that is less susceptible to separation than observed in experiments. The resulting predicted separation location is then too far aft, and the separated region is too small. Ultimately this gives an over-prediction in the lift coefficient in the separated region. These observations are consistent with Spalart?s observation in the original presentation of the model, where he noted that the model has a tendency to under-predict the thickness of the boundary layer under adverse 13 pressure gradients, and observed that this may lead to a prediction of a boundary layer less susceptible to separation [17]. Another violated assumption appears in the recirculating region on the suction side of the airfoil after the flow separates. RANS turbulence models are calibrated based on attached turbulent flow over simple geometries, such as flat plate boundary layers. In this regime it is natural to use the wall distance as a length scale. However, when the flow separates the wall distance is no longer an appropriate length scale, as the turbulent flow moves far from the wall. In this region it is more appropriate to use the length scale associated with the mesh size, as is used in large eddy simulation (LES) [19, 20]. Therefore, again (due to violated assumptions) it is expected that the prediction of the eddy viscosity will be inaccurate in the recirculating region. 
Figure 1.3 shows the eddy viscosity predicted for the S809 airfoil at a high angle of attack using the Spalart-Allmaras turbulence model. The eddy viscosity is over- predicted in the highlighted regions, due to the violated assumptions used in the formulation of the turbulence model. 1.5.2 RANS Predictions of Shock Wave Turbulent Boundary Layer Interactions Another challenge for turbulent boundary layer analysis is shock wave bound- ary layer interaction. A shock wave marks a very rapid increase in pressure, tem- perature, and density. In many fluid dynamic problems of interest a shock wave impinges on a turbulent boundary layer. For example, transonic flow over the suc- 14 Figure 1.3: Spalart Allmaras Predicted Eddy viscosity for the S809 airfoil at 14.2? angle of attack. tion side of a transonic airfoil, as well as supersonic inlets and isolators are common examples of flow conditions that result in shock boundary layer interactions. When a shock wave impinges on a turbulent boundary layer, the boundary layer is sub- jected to a strong adverse pressure gradient, and the reaction of the boundary layer to the shock wave can be very difficult to predict. Two forms of shock boundary layer interaction will be examined in the current work. The first is a transonic airfoil as illustrated in Figure 1.4. As shown, the air accelerates over the suction side of the airfoil and a turbulent boundary layer develops. Despite the freestream Mach number being less than 1 there is a supersonic region above the airfoil. As described by Babinsky and Harvey, the upper surface of the airfoil is convex, which results in the creation of expansion waves from the 15 surface. These expansion waves reflect from the sonic line as compression waves, which coalesce and form a normal shock that ends the supersonic region [4]. This normal shock is characterized by a sudden increase in pressure and a decrease in velocity to subsonic conditions. Figure 1.4: Mach contour of a transonic RAE2822 airfoil at 2.31? angle of attack and Mach 0.725. When the shock impinges on the boundary layer a shock wave turbulent bound- ary layer interaction (SWTBLI) occurs. While for supersonic flow no information can travel upstream, there is a subsonic region at some point close to the wall, 16 separated by a sonic line, because the velocity must stagnate at the wall. For an attached interaction the subsonic region experiences an adverse pressure gradient and thickens the boundary layer not only downstream, but also some distance up- stream of the shock due to the subsonic region in the boundary layer. The sonic line also moves further from the wall, which creates compression waves in the supersonic region just ahead of the impinging (nearly) normal shock wave. When visualized, either through CFD or experimental methods, these compression waves make the shock appear smeared over a small distance upstream of the interaction, which is the reason this effect is referred to as a smeared shock-foot [4]. A smeared shock-foot is illustrated in Figure 1.5. Also note that unlike the subsonic airfoil, there is a region of favorable pres- sure gradient in the supersonic region because the airfoil is convex; therefore, the Mach number increases slightly in the supersonic region with a corresponding slight decrease in pressure. This favorable gradient is terminated by the strong adverse pressure gradient of the shock, and the adverse pressure gradient continues in the subsonic region following the shock. This is illustrated in Figure 1.6. 
Additionally, strong transonic SWTBLI can result in flow separations that can have a substantial impact on airfoil or wing performance. A strong SWTBLI can result in a separation at the shock, which can either reattach downstream or remain separated. The increased boundary layer thickness downstream of the shock results in less momentum near the wall in the boundary layer. This gives the boundary layer downstream of a SWTBLI a decreased ability to resist separation, and can result in a separated region near the trailing edge of the airfoil [4]. Accurately predicting the 17 Figure 1.5: Illustration of a smeared shock foot, adapted from [4]. interaction location and flow separation remain a challenge for RANS models. For similar reasoning as discussed for the subsonic airfoils, the boundary layer reaction to this impinging shock and associated adverse pressure gradient is difficult to predict with RANS models that are based on reasoning and calibrated for flat plate boundary layers. Note: it is expected that RANS turbulence models will perform better in the supersonic region with the favorable pressure gradient than in the strong adverse pressure gradient of the shock and the subsequent adverse pressure gradient in the subsonic region. The second type of SWTBLI considered in this work is a turbulent boundary 18 Figure 1.6: Pressure coefficient contour of a transonic RAE2822 airfoil at 2.31?angle of attack and Mach 0.725. layer encountering a supersonic compression corner. An oblique shock forms when the supersonic air approaches a concave corner. If the compression angle is low enough, an oblique shock wave will form at the corner and turn the air to the new wall angle. This shock impinges on the boundary layer and creates a SWTBLI. If the shock is strong enough the boundary layer will separate, creating a separation bubble that extends upstream and downstream at the corner. Upstream, the sep- arated bubble turns the supersonic air creating an oblique separation shock. The air above the bubble is characterized by a strong shear layer, which then reattaches downstream of the corner. The reattachment point generates a reattachment shock, 19 with a greater shock angle, such that downstream the separation and reattachment shocks impinge on each other generating a shock-shock interaction. For hypersonic cases, this interaction is characterized by very high heat fluxes with the peak heat flux occurring downstream of the reattachment point. Conceptually, the 2D sepa- rated ramp SWTBLI is shown in Figure 1.7. Figure 1.7: Flow structure of a ramp induced separated SWTBLI, adapted from [4]. There is a wealth of information from wind tunnel testing for 2D compression corners and transonic airfoils (and other SWTBLI) [21, 22, 23, 4, 24, 25, 26]. Despite this widely available experimental data for a variety of SBLI cases, RANS models are generally very poor at predicting the boundary layer reaction to an impinging shock. 20 Numerous studies have been devoted to investigating this error [27, 28, 29, 30, 21]. For transonic shock boundary layer interactions the shock/interaction location is generally poorly predicted [31, 32, 26], leading to an incorrect prediction of lift and drag on the airfoil. For compression corner interactions, RANS models are typically incapable of correctly predicting the separation length in the interaction region, and wall heat flux predictions are similarly poor [26, 21]. 
There does not appear to be a clear consensus whether any particular RANS turbulence model outperforms others. 1.6 Machine Learning with Neural Networks for Regression Neural networks are commonly used in machine learning applications because of their flexibility, robustness, and relative ease of computation. Neural networks for regression allow modelers to connect inputs (features) with outputs, by generating a function based on example (training) data. What makes neural networks truly extraordinary is the ability to scale with numerous inputs1, and their ability to be efficiently trained using massive quantities of data. Neural networks are not a new technology, but with the progress in modern computing there has been renewed interest in algorithms that can efficiently process the large quantity of data that is created by modern computing systems. Additionally, a property of neural networks that makes them particularly attractive is that they are universal approximators of nonlinear functions [33, 34]. Note that the properties of neural networks discussed here make them ideal 1or in other applications, numerous outputs as well 21 for augmenting RANS turbulence models (or other complex physical models). First, the required correction to augment a RANS turbulence model has an unknown, and non-linear functional form. It is difficult, therefore, to augment the model with a correction that could correct the model in a general sense.2 Additionally, turbulence models utilize numerous variables in addition to the flow variables, all of which can influence the turbulent physics, and therefore meaningully affect the required correction. Neural networks can accept numerous inputs efficiently. In his book, Dreyfus states, ?in general, neural networks make the best use of the available data for models with more than 2 inputs [33].? It is a safe assumption that any model aiming to augment a RANS equation will require more than two inputs due to the complexity of the problem. Due to the ability of neural networks to model general functions efficiently, incorporate a large quantity of data, and accept as many inputs as required by the problem, neural networks are employed in this work to produce augmented RANS turbulence models. It is important to note here that neural networks do not eliminate the modeler from the learning process. In fact, like all machine learning methods there are a number of choices that the modeler must make to implement an effective neural network. These choices include the choice of inputs, scaling of the inputs, and the size and structure of the network itself. Additionally, there is a choice of how to introduce the nonlinearity of the network (activation function). These factors can 2Although some corrections are possible for specific applications, as the SA-APG correction proposed by Medida can be considered an augmented Spalart Allmaras model for adverse pressure gradient applications [3]. 22 dramatically affect the performance of the learning and cannot be neglected. 1.7 Leveraging Machine Learning for Turbulence Modeling Other researchers have presented the successful application of neural networks for physics-based simulations. Samareh and Wong used neural networks to model the dynamics of several complex physical simulations, including CFD, to predict the output of the simulation. In this way the user could be informed of the likelihood of success of the simulation before performing the computation [35]. 
These applications demonstrate the flexibility and efficiency that machine learning methods can have on complex physics based simulations. Specifically for CFD, several researchers have utilized machine learning to either provide modeling inferences or enhanced models. Wang et al. utilized DNS solutions to train a machine learned correction to the Reynolds stress tensor using mean flow features. The resulting data-informed RANS predictions were shown to be substantially more accurate than the baseline model even for geometries different than the DNS training case [36]. Wu et al. used a similar methodology to produce a random forest model that provided an a-priori estimation of prediction confidence of a RANS solution [37]. Ling and Templeton used DNS and LES solutions to develop machine learned classifiers to identify where RANS turbulence models were likely to have large errors or violated assumptions [38]. Unfortunately, LES and DNS training data is unavailable at useful scales for even moderate Reynolds numbers. Additionally, directly learning from higher fi- 23 delity data is not likely to have much success, due to the lack of consistency between the higher fidelity data and the modeling environment. Measured data contains real, physical quantities, while RANS models employ empiricism and modeled quantities that are inconsistent with measured data. To resolve this inconsistency, Duraisamy and co-workers developed the Field Inversion and Machine Learning (FIML) ap- proach [39]. Instead of learning directly from the data, an inverse problem is for- mulated such that information is generated that is consistent with the modeling environment in an inference procedure. A misfit function (J) between the data and the model output is formulated. One or more model discrepancy corrections (?) is (are) embedded within the model at each point on the computational domain and a gradient based optimization algorithm is used to find the optimal correction field. Since the discrepancy field is constrained by the model, the inference step resolves the inconsistency between the data and the model. Following the inference step, a feature set (?) can be constructed from the model variables and a machine learned model is trained separate (offline) from the inference step. It has been demonstrated that given appropriate training cases, the FIML approach produces a machine learned model that improves predictions [39, 40, 41, 13, 42, 43]. Despite these demonstrated successes, there remains a drawback to the FIML approach. While the inconsistency between the data and the modeling environment has been alleviated, it was not completely removed because there is no guarantee that the information generated by the inversion can be learned perfectly, if at all. This in- troduces an inconsistency between the information and the modeling environment due to the limitations of the chosen machine learning algorithm being neglected in 24 the inference step. 1.8 Objectives The discussion in this chapter has identified several known areas where, despite a wealth of widely available experimental data, RANS models remain deficient. Additionally, machine learning with neural networks was introduced, along with a brief discussion of previous efforts to leverage machine learning methods to improve CFD predictions. 
Specifically, the FIML approach was shown to have a strong advantage over other machine learning augmentation methods, due to the inverse procedure; but a deficiency was noted in that imperfect training of the machine learning algorithm introduces an inconsistency between the information generated in the inversion and the model augmentation.

The objective of this thesis is to improve RANS predictions by leveraging machine learning approaches to augment RANS turbulence models. The goal is not to machine learn a new model, but to learn a correction to an existing model that improves predictions where the model is known to be deficient, but does not diminish the accuracy where the model already performs well. Additionally, the augmentation needs to be as consistent as possible with the modeling framework. To accomplish these broad goals this work builds on the FIML approach, but seeks to improve the consistency of the FIML framework by accounting for the limitations of the machine learning framework during the inversion. In doing so, the information generated by the inversion will be guaranteed to be learnable, optimal for the chosen machine learning method, and as consistent as possible with the prediction environment. These methods must be able to incorporate enough data to produce a useful augmentation, and be efficient in order to be practically implemented on a modern high performance computing system.

1.9 Contributions of Thesis

The primary contributions of this thesis are as follows:

1. Implemented the Field Inversion and Machine Learning (FIML) framework inside the Stanford University Unstructured (SU2) code, a fully unstructured RANS solver, demonstrating robustness and practicality of the approach. For the first time, the SU2 code was adapted to perform the FIML approach.

2. Proposed and developed a new FIML approach that trains the neural network during inversion (FIML-Embedded), demonstrating improved regularization of the correction, guaranteeing that the correction can be learned, improving the consistency between the inversion and the model augmentation, and producing a neural network augmentation in the inversion step. This is a unique FIML method, entirely developed and applied for the first time in this thesis. The new method was demonstrated on both the simple model problem and several RANS applications.

3. Proposed and developed a new FIML method that utilizes a novel approach of training neural networks from physics-based models (FIML-Direct). This procedure improves upon the FIML-Classic approach similarly to FIML-Embedded, but with less complexity. This is also a unique FIML approach, entirely developed and applied for the first time in this thesis. This new method was also demonstrated on both the simple model problem and several RANS applications.

(a) Developed and applied a novel method of performing field inversions on numerous cases simultaneously, increasing generalization and preventing overtraining of the model augmentation. For the first time, the inference step can incorporate information from an unlimited number of datasets simultaneously, instead of solving multiple inverse problems as required by the classic approach.

(b) Developed efficient and scalable algorithms and demonstrated performance on high performance computing (HPC) architectures.

1.10 Scope and Organization of Thesis

Two new methods of improving upon the FIML framework were proposed, developed, and applied to both a simple model problem and a variety of RANS applications.
Both new methods, wholly developed and applied for the first time in this thesis, improve on the classic FIML approach by accounting for the limitations of the chosen machine learning algorithm in the inversion process. The development, application, and results of these new approaches are documented in this thesis with the following organization.

• Chapter 1 provides an introduction and motivation to the physical problem, as well as the major contributions of this thesis.

• Chapter 2 provides a detailed survey of previous work to augment RANS models with data via machine learning methods. Previous work on the FIML approach is discussed in particular detail.

• Chapter 3 introduces two new methods, developed for the first time in this effort, that improve upon the classic FIML approach by accounting for the limitations of the machine learning algorithm in the inversion process.

• Chapter 4 discusses the numerical methodology for the FIML approaches for both the RANS applications and a simple 1D heat equation model problem.

• Chapter 5 presents the results for the FIML applications to the 1D model problem. The two new FIML approaches are applied here for the first time.

• Chapter 6 presents the results for the RANS applications. The FIML methods are demonstrated on a variety of canonical and practical RANS cases to explore the advantages/disadvantages of each approach. For the first time, the two new approaches are applied to several RANS applications.

• Chapter 7 presents the overall conclusions and summary for this effort, as well as some suggestions for future work.

• Appendix A presents inference results for a hypersonic wedge application.

• Appendix B presents inference results for the Onera M6 transonic wing.

Chapter 2: Progress in Data-Informed Turbulence Modeling

A wealth of experimental and high fidelity simulation data sets exist for turbulent flows, and naturally turbulence modelers have leveraged these data sets to either improve predictions or quantify uncertainties in turbulence models. In this chapter the various methods of leveraging data for turbulence modeling are examined, in two sections based on the objective and approach of the effort. The first section reviews efforts to utilize data to calibrate models. These approaches do not attempt to modify the functional form of the underlying model, but instead calibrate closure constants such that the prediction matches observations. The next section discusses efforts to utilize data to improve predictions by modifying the functional form of the model itself.

2.1 Calibrating Turbulence Models From Data

RANS turbulence models contain a number of free parameters, known as closure constants, that can modify the behavior of the model. These closure constants were originally calibrated from simple flows such as flat plate boundary layers, and therefore many authors have attempted to recalibrate the various closure constants in order to improve predictions for specific applications. While many authors have demonstrated success in improving predictions for various classes of flow, this simple approach to utilizing data is not likely to improve the model in general, as it does nothing to address the functional errors contained in the models themselves.

In a calibration exercise, higher fidelity data, d, is identified that the modeler expects could improve the model, M(c).
Free parameters in the model, c, are identified by the modeler, preferably parameters that have considerable uncertainty or violated assumptions for the chosen application [40]. The free parameters are then calibrated such that the output of the model more closely matches the chosen data.

Examples of RANS model calibration include Kato et al., who examined a single parameter of the k–ω SST turbulence model and showed that calibrating the parameter based on a backwards facing step would also somewhat improve predictions on other cases, such as the RAE2822 airfoil [44]. Lanzafame et al. calibrated closure constants in the γ–Re_θ SST transition model to improve force predictions on wind turbine airfoils [45]. Note that this approach neglects the error present in the turbulence model itself entirely, and attempts to correct any error via manipulation of the transition model. Similarly, Mauro et al. calibrated the γ–Re_θ SST transition model to improve force predictions on an S809 airfoil at a high angle of attack. This again assumes that the prediction error is due to a deficiency in the prediction of transition, and neglects the error in the turbulence model itself, as identified by the analysis of Matyushenko et al. for this airfoil and others [46, 15]. Rocha et al. examined the calibration of the k–ω SST turbulence model for small wind turbines and demonstrated that modification of a closure constant could improve performance for these cases [47, 48].

It is perhaps not surprising that modification of the closure constants can improve performance for some cases. It is, however, not likely a viable approach to improve the turbulence model in general, because it inherently assumes that the functional form of the model is sufficient. Additionally, recalibration creates a new turbulence model variant, which can result in confusion and difficulty in assessing model accuracy [40]. Clearly, the models were originally calibrated based on certain simple flows, and recalibrating to different flows is unlikely to improve the accuracy of the model in general.

A more rigorous approach to calibration is provided by statistical inference. Bayesian analysis is a statistical inference technique that allows the calculation of posterior distributions of the parameters given the observations. In this approach both the data, d, and the parameters of the model, c, have associated probability distributions. P[c] is termed the prior distribution and represents the probability of the model without any observations. Given observations, d, information about the probability of the model being correct can be determined. Typically this involves sampling methods such as Markov Chain Monte-Carlo (MCMC). Formally, the likelihood, P[d|c], is the probability of the model being consistent with the observations, which can be found through MCMC sampling. From Bayes' theorem the posterior probability, the probability of the parameters given the observations, P[c|d], is proportional to the likelihood times the prior (2.1):

    P[c|d] ∝ P[d|c] × P[c]   (2.1)

The posterior distribution gives statistical insight that can be used to calibrate the model (via the values of the parameters that maximize the posterior probability distribution), along with the uncertainty of those values. This differs from traditional uncertainty quantification, where the uncertainties are simply propagated through the model to estimate their effect on the output; while this can be valuable information, it does not require observations. A good example of this manner of (data-free) analysis for the Spalart-Allmaras turbulence model is given by Schaefer et al. [49, 50, 51, 52].

Several researchers have leveraged Bayesian techniques to recalibrate turbulence models. Ray et al. utilized Bayesian calibration techniques to calibrate closure constants of the k–ε
model to improve predictions for a jet in crossflow. Another Bayesian calibration study was performed by Guillas et al., based on data for a street canyon. Of note, the best parameters identified by Guillas et al. varied substantially from those identified by Ray et al. [54, 53]. Oliver and Moser utilized DNS solutions of channel flow to perform uncertainty quantification and calibration of four common RANS models [55]. Papadimitriou and Papadimitriou used DNS solutions of flow over a backwards facing step to find posterior distributions of closure constants of the Spalart-Allmaras model [56]. Edeling et al. used mean velocity profiles of flat plate boundary layers under various conditions to find posterior distributions for the closure constants of a number of popular RANS models. The results were remarkable in that the posterior probabilities differed substantially depending on the experimental data considered, despite the simple flat-plate application [57]. In summary, Edeling states: "These results suggest that there is no single best choice of turbulence model or closure coefficients, and no obvious way to choose an appropriate model a priori [57]."

2.2 Improving Turbulence Models with Data and Machine Learning

This section reviews previous efforts to leverage data and machine learning to either replace model components with machine-learned functions, or to augment the model to improve its accuracy. These approaches modify the functional form of the model itself to correct for errors in the functional form, as opposed to calibration exercises, which simply modify the closure constants of the model and do not address errors in the functional form. The general process these approaches utilize is first to identify higher fidelity data, d, that could improve a deficient model, M. A discrepancy, β, is introduced into the model and inferred. Machine learning is then applied to allow for the generalization of β to additional cases, yielding the data-informed model M̃ = M(β).

The use of data to make predictions can be grouped into three separate overall approaches. The first is to simply use the data to directly make predictions. In this approach there is no model and the data is used directly. This, in general, is not practical, as it would require either an extreme amount of data of perfect quality, or physics so simple that modeling is not necessary. This is certainly not appropriate for RANS modeling, as turbulence is an extremely complex physical process, and there is certainly not enough data available to make predictions for all required cases.

The second category of utilizing data to inform predictions does involve modeling. In this approach the higher fidelity data is injected directly into the model. While appealing in its simplicity, this approach is also not appropriate for RANS modeling. The higher fidelity data (typically from experiments, DNS, or LES) contain real, physical quantities. The RANS models, however, make use of modeled quantities that rely on some empiricism and were carefully calibrated when the model was created.
There is, therefore, a lack of consistency between the data injected into the model and the modeled quantities, which will degrade the performance of the data-augmented prediction. An investigation by Raiesi et al. exemplifies the pitfalls of this approach. In order to evaluate and examine the underlying assumptions of various RANS models, Raiesi et al. computed exact turbulent quantities from LES and DNS solutions. These exact quantities were then injected into the model and the performance compared to the baseline model. None of the turbulence models were improved by using the exact quantities, and most showed inferior performance [58]. Attempts to inject exact quantities into turbulence models have not resulted in enhanced insights or improved models [59]. Even though the modeled quantities represent real, physical quantities, the models have been calibrated such that if exact quantities are injected into a portion of the model the predictive capability is reduced.

The third approach is appropriate for RANS modeling and other complex physics-based models. This approach is characterized by an inverse problem by which useful information is generated from the data. The data is inconsistent with the modeling environment: as noted, the data contains real, physical quantities, but RANS models contain numerous assumptions, calibrated quantities, and model variables that are not consistent with the data itself. Therefore, to resolve this inconsistency, an inverse problem is formulated and solved by which information is generated that is consistent with the modeling environment. Note that in order to make predictions for new cases a model must be trained on this information to create an augmented model. This offline learning step introduces an additional opportunity for inconsistency, as the training introduces additional inconsistency between the information (from the inverse problem) and the resulting augmented model.

Emory et al. utilized an eigen-decomposition of the anisotropy tensor to decompose the Reynolds stress tensor and perturb the shape of the Reynolds stresses by modifying the eigenvalues. Eddy viscosity models typically only produce Reynolds stress tensor shapes in a very small region of the realizable range due to the Boussinesq assumption. Thus, by perturbing the shape of the Reynolds stresses the structural uncertainty of eddy viscosity models can be measured. This data-free uncertainty quantification has been applied to a variety of applications [60, 61], and has also been implemented in the SU2 package [62, 63].

To incorporate inferred Reynolds stress shape error from data, Xiao et al. utilized a sampling method to infer Reynolds stress anisotropy discrepancies from higher fidelity simulations. This discrepancy was spatially correlated, β(x), and the information was not able to be applied to a dissimilar application. Wang et al. built on this work and utilized machine learning methods (random forests) to build a function to reproduce the discrepancy from mean flow features, η. The resulting discrepancy function, β(η), was then shown to improve RANS predictions on cases other than the one it was built from. Wu et al. used a similar Bayesian sampling method to infer posterior distributions of the model discrepancy, and then applied those inferences to improve predictions on similar flows in a predictive setting [36, 64, 66, 65]. Edeling et al.
utilized two additional transport equations that perturbed the Reynolds stress anisotropy from the eddy viscosity model baseline. In the data-driven variant of this method, Bayesian inference was used to infer the coefficients of these transport equations, such that the prediction of the model more closely matched the data [67]. Again, sampling methods were used, and the inference was not learned such that it could be applied in a predictive setting.

In an alternative to sampling methods, which can be prohibitively expensive for RANS applications, an inversion procedure was implemented by Wang and Dow in order to infer eddy viscosity fields from DNS data. A cost function was formulated that measured the distance of the RANS velocity field from the average velocity in the DNS solution. The eddy viscosity was treated as a parameter to be optimized, and the cost function was minimized, such that the optimal eddy viscosity was found that resulted in the RANS solution that most accurately matched the DNS solution [68]. This approach, however, completely neglected existing eddy viscosity models and is therefore not a complete methodology for improving them.

Duraisamy et al. pioneered and developed the Field Inversion and Machine Learning (FIML) approach. This procedure utilizes a similar inverse problem; however, unlike the work of Wang and Dow, the design variable is a discrepancy function introduced into the model. The inverse procedure finds the optimal discrepancy to minimize the distance between the output of the model and the data. This discrepancy function is initially spatially correlated (β(x)), but machine learning methods are then utilized to create a discrepancy function from the flow features (β(x) → β(η)) [40, 69, 43, 13, 70, 41, 71, 72, 73]. Additionally, the methodology was implemented in a Bayesian framework such that posterior distributions could be determined along with the optimal correction [41, 39]. A variety of applications have been demonstrated, including 2D airfoils and shock boundary layer interactions. Additionally, Singh et al. presented a methodology for incorporating information from a number of cases into a single machine-learned model augmentation. It was shown that an augmentation built from inverse information for a 2D S809 airfoil improved predictions on similar airfoils, demonstrating exciting generalization capabilities of the approach [72, 13]. It was also demonstrated that the FIML augmentations are portable: the augmentation was implemented and demonstrated on a prediction code separate from the implementation used for the inversion [13]. Parish and Duraisamy demonstrated the FIML approach on a simple model problem that will also be utilized in this work [39].

2.3 Summary

This chapter reviewed recent efforts by researchers to utilize higher fidelity data to improve turbulence models. The first section discussed efforts to calibrate models. In this approach the functional form of the model is not modified, but the various closure constants of the model are tuned in order to match data. There are numerous examples of successfully tuning models to match experimental or high fidelity simulation data; however, it was noted that this method is unlikely to correct RANS turbulence models in general. RANS models were developed with simplifying assumptions, and calibrated based on simple flows such as flat plate boundary layers.
Therefore, for more complicated applications, like flow over an airfoil, it is certain that the primary source of error in the model is the functional form of the model itself, due to violated assumptions. Methods that do not seek to modify the functional form of the model are therefore unlikely to show substantial improvement outside of the applications considered in the calibration process.

The next section discussed efforts to produce data-augmented turbulence models. The discussion was delineated by three approaches. The first, taking data and directly making predictions, is the simplest, but is only appropriate for applications with exceedingly simple physics or large quantities of perfect data. The second approach involves taking higher fidelity data and applying that data directly to the model. For RANS modeling, some researchers have demonstrated computing quantities from DNS or LES simulations and injecting those quantities directly into RANS models. The predictions are typically worse in this approach, as there is an inconsistency between the real, physical quantities and the modeled quantities that have been calibrated in the model. The FIML approach addresses this inconsistency by generating model-consistent information from higher fidelity data. This information can then be learned by a machine learning model and used in an augmented predictive model. This is the framework for the Field Inversion and Machine Learning approach, and the successes of this approach were briefly surveyed. The next chapter continues the FIML discussion, beginning with details about the framework developed by previous researchers, and continuing with the new approaches that build on this body of work.

Chapter 3: Field Inversion and Machine Learning With Embedded Neural Network Training Development

3.1 Overview

This chapter introduces the field inversion and machine learning (FIML) framework. First, the methodology pioneered by Duraisamy et al. and utilized by previous researchers (referred to here as FIML-Classic) is discussed in detail. The motivation for this approach is discussed, as well as the procedure for incorporating information from multiple cases. To enable a discussion of the framework, a brief introduction to neural networks is also provided. Subsequent sections introduce two new methods proposed and developed for the first time in this effort: 1. FIML-Embedded, where the backpropagation algorithm is embedded into the solver, and 2. FIML-Direct, where the weights of the neural network are trained directly. The advantages and potential drawbacks of both approaches are discussed, as well as procedures to incorporate information from multiple cases. Specifically for FIML-Direct, a new approach to performing the inversion on numerous cases simultaneously is discussed, which provides an exciting path for increased generalization and regularization over the classic approach.

3.2 Field Inversion and Machine Learning: FIML-Classic

The Field Inversion and Machine Learning process was pioneered by Duraisamy et al. and has been shown to be a flexible framework to derive optimal RANS correction fields. This approach has been successfully applied to a variety of RANS problems [40, 43, 69, 13, 70, 41, 39]. Additionally, by casting the model in a Bayesian framework, posterior distributions can be computed. The discussion of the FIML process below will be restricted to the specific application in the current work, and to a non-Bayesian setting.
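Before the detailed development, the structure of the inversion step can be made concrete with a small, self-contained toy problem. The sketch below (in Python) substitutes a deliberately simple linear stand-in model for the RANS solver and plain steepest descent for the optimizers actually used; it illustrates only the shape of the regularized inverse problem, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # linear stand-in for k_m(beta)
beta_true = 1.0 + 0.5 * np.sin(np.linspace(0.0, np.pi, n))
k_d = A @ beta_true                                  # synthetic "higher fidelity data"
lam = 1e-3                                           # regularization constant

beta = np.ones(n)                                    # prior design: beta = 1 (no correction)
for _ in range(500):
    misfit = k_d - A @ beta                          # k_d - k_m(beta)
    grad = -2.0 * A.T @ misfit + 2.0 * lam * (beta - 1.0)   # gradient of the objective
    beta -= 0.05 * grad                              # steepest-descent update

print(np.linalg.norm(beta - beta_true))              # inferred correction ~ beta_true
```

In the RANS setting the forward evaluation is a full CFD solve and the gradient is supplied by an adjoint solver, but the structure of the minimization is the same.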
First, an objective function (J) is defined to represent how closely the RANS solution matches the data. This cost function could take many forms, but typically J takes the form of a squared error between the model prediction (k_m) and the higher fidelity data (k_d). To differentiate the objective functions of the different methods, the objective function for FIML-Classic is denoted J_c. For FIML-Classic and the cases used in the present work, the objective function takes the following form:

    FIML-Classic:  J_c(β) = ‖k_d − k_m(β)‖₂² + λ‖β − 1.0‖₂²   (3.1)

The first part of the objective function (J_c) represents the misfit between the current design and the higher fidelity data. The second term is the regularization term; it penalizes corrections far from the prior (baseline) design and prevents corrections that are unnecessarily large. The regularization constant (λ) is set to a small value. With analogies to Bayesian inversion, the value of this constant has an intuitive meaning: it represents the confidence in the prior solution versus the confidence in the data.

The inversion procedure takes the form of a minimization (optimization) problem: β(x) = arg min_β(J_c). Following the inversion, the optimal correction field (β) is obtained. Note that the optimal β is very useful to the modeler by itself, as it shows the correction required for the model to match the data. If, however, the modeler wishes to incorporate this information so that it can be used to make predictions, a machine-learned model, in this case a feed-forward neural network, is then trained so that the correction can be reproduced from the features (η), which are an appropriate selection of variables from the model. Previous works showed the capability of this approach, including the demonstration that the resulting data-augmented model can improve predictions in applications not in the training set, and that the augmented model is portable [13]. Conceptually, the FIML process is illustrated in Figure 3.1.

Figure 3.1: Chart of the FIML-Classic procedure, illustrating learning separate from the inversion process.

In Figure 3.1 the data box represents the selection of appropriate higher fidelity data that the modeler has identified to improve the model. Examples of relevant data for RANS applications include skin friction, surface pressure distributions, and heat flux distributions. The FIML process utilizes this data to produce an augmented model that better matches the chosen data. The inversion box for FIML-Classic represents the determination of an optimal β(x) to minimize J_c. Adjoint methods efficiently compute the gradients required to employ gradient-based optimization methods to solve the minimization problem. The learning box then represents the process of extracting features (η) from the solution of the inversion step, and training a machine learning model that can produce the correction β(η) from the features. At the end of the FIML process, the model has been augmented with a machine-learned model online, so that the correction is applied to the model without the need to perform the expensive offline inversion process.

Note that the offline training process introduces an inconsistency into the FIML-Classic approach. The inverse correction distribution is the solution of the optimization problem min_β(J_c(β)), which gives the optimal spatial distribution for the correction (β(x)). (Strictly, the solution of min_β(J_c(β)) will be a local optimum: due to the cost of evaluating J_c(β), gradient-based optimization methods are employed for this minimization, and the minimization is therefore susceptible to converging to local optima instead of the preferred global optimum.) Regardless of the machine learning method used to produce the function β(η),
there will always be residual training error, as there is no guarantee that there is an algorithm that can produce β(η) exactly. For neural networks, a large network (a large number of nodes or hidden layers) may be required to reduce the training error, but such a network may be too costly to compute. Additionally, overly large neural networks suffer from overtraining, where the network represents the training data accurately but exhibits poor generalization to cases not in the training set. This training error is not accounted for in the inversion, and therefore the evaluation of β(η) will not perfectly reproduce the optimal distribution found in the inversion. In practice, this training error is significant and results in a degradation of the performance of the resulting augmented model.

Ideally, we want a model augmentation that improves the model prediction for a variety of cases. To accomplish this, the machine learning model must be trained on data that sufficiently represents the entire feature space of interest. In this way, a neural network trained on sufficient data will be interpolating on the data it was trained on when asked to improve the prediction on a case not considered in the training set. Because there are features and a correction at every point in the computational domain, there is a relatively large quantity of training data available per case in the FIML-Classic RANS application. However, to cover the feature space in non-trivial problems it is necessary to train the neural network using information from multiple data sets.

In FIML-Classic, information learned from multiple cases is incorporated in a relatively straightforward manner. First the cases of interest are identified, and for each case i higher fidelity data is identified that the modeler expects will improve the model (k_d^i). The inversion is then performed on each case individually, and the features η_k^i and optimal corrections β_k^i are compiled from each case into a single training dataset {[η_k^1, β_k^1], [η_k^2, β_k^2], ..., [η_k^i, β_k^i]}. The subscript k denotes that there is a feature vector and a correction defined at every point in the computational domain. Note that the design variables used in the inversion are different for each case, which prevents us from performing the inversion on all cases simultaneously. The chosen neural network is then trained on the information from all cases in the training set. This generates a model augmentation that can reproduce the correction distribution from the features (training converts β(x_j) → β(η)). This augmentation can be applied to cases not considered in the training set (holdout cases). Graphically, this is represented in Figure 3.2.

Figure 3.2: FIML-Classic procedure for producing model augmentations incorporating information learned from multiple inversions.

3.2.1 Neural Networks for Regression

In order to better discuss the FIML framework, a brief introduction to the numerical structure of neural networks is presented here, with much more detail in Chapter 4. For the FIML problem, a fully connected, feed-forward multilayer perceptron with a hyperbolic tangent activation function on the hidden layers is utilized.
The input to the neural network is some subset of the flow variables (the features) at a node in the computational domain, and the output is the training correction at that node. In a fully connected feed-forward neural network, the input to the j-th neuron (x_j) is a linear combination of the outputs of the neurons i in the previous layer, so that:

    x_j = Σ_i w_ji y_i   (3.2)

The output of node j is then computed using the activation function. For a hyperbolic tangent activation function:

    y_j(x_j) = tanh(x_j) = (1 − e^(−2x_j)) / (1 + e^(−2x_j))   (3.3)

The output layer in this case is a single linear (y_j = x_j) neuron, and the output of this neuron is the output of the neural network. The input layer has a number of neurons equal to the number of inputs to the neural network (the number of features). Following the notation of Dreyfus, a single hidden layer network is given by (3.4) [33]:

    y(x, w) = Σ_{i=1}^{N_c} [ w_{N_c+1,i} tanh( Σ_{j=0}^{n} w_{i,j} x_j ) ] + w_{N_c+1,0}   (3.4)

where N_c is the number of hidden nodes and n is the number of inputs. Note that the neural network output depends on two variables: the features (η = x for the input layer) and the weights (w). The weights are determined in the training process. Typically the data required for the training process is assembled prior to training, and subsequently the weights are obtained by training the network on the data via the backpropagation technique [74] or another algorithm. Following training, the weights are fixed so that the neural network is a function of the features alone.

In backpropagation, the weights are first initialized to small random values and then iteratively updated to minimize a loss function; here the loss function was chosen to be the sum squared error over N training samples:

    SSE = (1/2) Σ_{k=1}^{N} (y_k(w) − y_k^T)²   (3.5)

where w are the weights of the neural network and y is the output of the neural network. y^T is the target output of the network (the output the neural network will reproduce if well trained). The derivative of the loss function with respect to the weights is then efficiently computed using reverse mode differentiation, and the weights are updated in the steepest descent direction indicated by the gradients. This algorithm is repeated until the loss function is small enough or stops improving. Full documentation of the backpropagation algorithm is given by Rumelhart et al. [74], and the algorithm is discussed in detail in Chapter 4.

3.3 Field Inversion and Machine Learning with Embedded Backpropagation: FIML-Embedded

A new approach was developed and applied for the first time in this thesis, termed FIML-Embedded, which trains the neural network by backpropagation simultaneously with the model, so that the network is trained while the physics solver (and therefore the neural network inputs/features) is converging. FIML-Embedded involves the simultaneous solution of two minimization problems with different design variables. The first is the neural network cost function (3.5), for which the minimization problem min_w(SSE(w)) is solved by backpropagation. The second problem is the minimization of the FIML objective function (min_{β_T}(J_e)) using the training correction (β_T) as the design variable:

    FIML-Embedded:  J_e(β_T) = ‖k_d − k_m(β_T)‖₂² + λ‖β_T − 1.0‖₂²   (3.6)

Note that both minimization problems must sufficiently converge, which adds complexity to the inversion procedure, as it is sometimes difficult to sufficiently minimize equation (3.5) throughout the inversion.
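To make the interleaving concrete, a minimal sketch of one FIML-Embedded solver iteration is given below. The helper names (features, flow_step) and the small net object are hypothetical placeholders, not SU2 interfaces; the point is only the ordering: the features are rebuilt from the converging state, the network is forward propagated, and one backpropagation update is taken toward the current training correction β_T on every solver iteration:

```python
def embedded_iteration(state, beta_T, net, lr=1e-3):
    """One combined solver/training iteration (hypothetical interfaces)."""
    eta = features(state)                      # rebuild and scale features from the state
    beta = net.forward(eta)                    # current network output beta(eta, w)
    grad_w = net.backprop(eta, beta - beta_T)  # d(SSE)/dw for targets beta_T, eq. (3.5)
    net.w -= lr * grad_w                       # one steepest-descent weight update
    return flow_step(state, beta)              # advance the RANS solution using beta
```

The outer optimizer then updates β_T from the adjoint gradient of J_e, so the two minimizations of (3.5) and (3.6) proceed together.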
If min_w(SSE(w)) sufficiently converges, however, the nonlinearity of β(w) due to the neural network is effectively hidden from min_{β_T}(J_e(β_T)), resulting in easier convergence of that minimization algorithm. Thus FIML-Embedded has the potential to be much more efficient than FIML-Direct (presented in the next section), especially for large and/or multi-layered neural networks. Conceptually, the FIML-Embedded (and FIML-Direct) process is illustrated in Figure 3.3.

Figure 3.3: Chart of the FIML-Embedded and FIML-Direct procedure, illustrating that offline learning is not required for these approaches.

In Figure 3.3, the data and augmented model blocks are unchanged from the FIML-Classic chart (Figure 3.1). Now, however, the offline learning step is no longer required, because machine learning is included in the inversion step. For FIML-Embedded, the inversion block now represents the simultaneous solution of two minimization problems: min_w(SSE(w)) and min_{β_T}(J_e(β_T)). For both FIML-Embedded and FIML-Direct, the solution of the inversion procedure is the augmented model, so offline training is no longer necessary.

Another potential drawback of the FIML-Embedded procedure is that it is not as natural to incorporate multiple cases into a single neural network. In order to provide more generalization to the resulting model augmentation, it will be necessary to consider multiple cases. In other words, the model augmentation will need to incorporate information from multiple (perhaps numerous) inversions. For FIML-Classic the training data is simply assembled, and the learning is performed offline, separate from the inversion. For FIML-Embedded the learning is performed in the inversion. How to best account for information from multiple inversions in a single model augmentation remains an unresolved question for FIML-Embedded. There are several possibilities that remain unexplored, and there are concerns about all of them:

Unexplored Option 1. Perform FIML-Embedded on many cases, which will effectively regularize the resulting β distributions by accounting for the limitations of the chosen machine learning algorithm for each case. Then, similar to the FIML-Classic procedure, assemble the features from each inversion and train a new model augmentation that incorporates all the inversions together. This approach is not ideal, as it is unsatisfying to simply throw out the augmentations for each case and train another. Additionally, while it would be guaranteed that each β distribution could be learned by the chosen machine learning algorithm, there would be no guarantee that the set of inverse solutions could be learned, despite the increased regularization.

Unexplored Option 2. Perform FIML-Embedded inversions sequentially, with each inversion incorporating the inverse information learned from the previous inversions. This approach would be similar to Option 1; however, it could potentially alleviate the issue of guaranteeing that the set of inverse solutions could be learned. However, each subsequent inversion would become more and more costly, as the training algorithm would be learning on more and more data. Therefore, this approach would likely become intractable as the number of cases grows.

Unexplored Option 3. Construct an ensemble model from each individual augmentation. Ensemble models for decision trees are utilized effectively in the random forests machine learning algorithm.
The individual model augmentations could potentially be incorporated into an ensemble-averaged model that would incorporate the information learned from each inversion. This would be the most straightforward approach; however, it would still result in a lack of consistency between the inversion environment and the model augmentation, as the augmentation would not reproduce precisely the same correction distribution.

Unexplored Option 4. Perform training offline (like FIML-Classic) but compute the derivative of the training algorithm, so that the limitations of that algorithm are accounted for in the inversion. Option 4 is perhaps the most interesting of the unexplored FIML-Embedded options for incorporating multiple cases. This approach would account for the limitations in the inversion, and it would also enable the simultaneous inversion of multiple cases (similar to FIML-Direct, discussed in the next section). In this approach the inversion would be performed using the current output of the neural network, β(η, w), to find each case's objective function j^i, and the derivative of the cost function would be computed via adjoint methods to find dj^i/dβ. Then, the training data would be assembled from each case and a single neural network trained on the information from all cases. The derivative of the training algorithm, dβ/dβ_T, could be computed by finite difference methods. The chain rule would then be applied to each case to provide the required derivative, dJ_e/dβ_T, in order to minimize the composite cost function J_e = Σ_{i=1}^{n} j^i(β_T), where n is the total number of cases. There are two potential issues with this unexplored approach. The first is that the neural network is not defined for the first iteration, so the solver would be required to use β = 1.0 at the beginning of the inversion process. The second, and more complicating, is that adjoint methods or any reverse mode differentiation approach would not be advantageous for the neural network derivative computation dβ/dβ_T, because the total number of β_T elements is equivalent to the number of β elements. Still, this is an embarrassingly parallel problem that could be easily parallelized. However, for large neural networks, a large number of cases, or large computational domains, this derivative computation could become costly.

3.4 Field Inversion and Machine Learning with Direct Training: FIML-Direct

Another new FIML approach was proposed, developed, and applied for the first time in this thesis: FIML-Direct. For FIML-Direct, we begin with a straightforward modification to the cost function. Now the design variables are the weights of the neural network, and the output of the neural network is the correction, so in (3.4), y = β. The minimization problem is now min_w(J_d):

    FIML-Direct:  J_d(w) = ‖k_d − k_m(w)‖₂² + λ‖β(w) − 1.0‖₂²   (3.7)

The correction, β, is determined by computing the output of the neural network by applying (3.4). The immediate disadvantage of FIML-Direct is that the nonlinearity of the neural network is now fully exposed to the optimization algorithm (min_w(J_d)). Typically neural networks are trained with static data: the computation of the gradient by backpropagation is very efficient, and tens of thousands of backpropagation iterations can be performed quickly. With FIML-Direct, each gradient computation is far more costly, because k_m(w) must first be evaluated by a CFD simulation, followed by the evaluation of dJ_d/dw_j by adjoint methods.
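A sketch of the resulting design loop is shown below. solve_rans_with_net and adjoint_gradient are hypothetical stand-ins for the direct solver (with β evaluated from the network every iteration) and for the adjoint evaluation of dJ_d/dw; each optimizer iteration therefore costs one direct and one adjoint solve:

```python
import numpy as np

def fiml_direct(w0, k_d, lam=1e-3, n_iter=20, step=1e-2):
    """Network weights as design variables (hypothetical solver interfaces)."""
    w = w0.copy()
    history = []
    for _ in range(n_iter):
        k_m, beta = solve_rans_with_net(w)        # direct CFD solve, beta = net(eta, w)
        J = np.sum((k_d - k_m)**2) + lam * np.sum((beta - 1.0)**2)   # eq. (3.7)
        history.append(J)
        w -= step * adjoint_gradient(w)           # dJd/dw supplied by the adjoint solver
    return w, history
```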
Despite the potentially challenging minimization, FIML-Direct has many advantages over FIML-Classic and FIML-Embedded. First, for turbulent correction applications, FIML-Direct results in substantially fewer design variables for typical neural networks, as the number of weights is substantially smaller than the number of points in the computational domain. Second, the optimal solution for the weights is independent of the computational grid and of each particular flow solution. We are seeking one neural network that corrects the model for all applications of interest, and therefore the weights should be the same between cases. Because the weights are independent of the case and computational grid, they can be applied to dissimilar cases (such as different airfoils).

To consider the simultaneous inversion of multiple cases, we first identify data that can improve each case i, for n cases. The total cost function is then defined as the composite of the cost functions of the individual cases: if each case i has objective function j^i, the total cost function is J_d = Σ_{i=1}^{n} j^i. Note there is no requirement that each j^i have the same functional form. This is important, as the type of data that improves case 1 may not be the same type that improves case 2. Dataset 1 could use the experimental lift coefficient, and dataset 2 could utilize the heat flux distribution. Both cases would have different functional forms for j^i, but the composite derivative would still simply be the sum of each case's derivative (Equation (3.9)). Additionally, the regularization constant λ^i can be tuned for each case: for a case where the model performs well and the modeler has high confidence in the result, λ^i can be increased for that case.

    J_d = Σ_{i=1}^{n} j^i   (3.8)

    dJ_d/dw = Σ_{i=1}^{n} dj^i(w)/dw   (3.9)

The inversion is then carried out as in the single-case inversion, minimizing the composite cost function with respect to the weights: min_w(J_d(w)). The resulting augmentation is then optimal considering all the data (all k_d^i) and considering the limitations of the chosen machine learning algorithm. There is also no inconsistency between the inversion environment and the model augmentation, as the same augmentation found in the inversion is used for prediction. Conceptually, this approach is illustrated in Figure 3.4.

Figure 3.4: FIML-Direct procedure for simultaneous inversion of multiple cases to produce a single augmentation.

3.5 Computational Cost

Modern CFD simulations are often extremely computationally expensive. The FIML-Classic approach requires an inversion utilizing a gradient-based optimization procedure that results in a series of CFD simulations (direct and adjoint), and naturally for large cases this method can become quite expensive. Two novel approaches have been proposed here; this section discusses the computational cost of the new approaches and methods of mitigating the cost through efficient computation on modern high performance computing (HPC) architectures.

The minimization (min(J)) for the FIML approaches could be performed using minimization algorithms that do not require gradient information, such as Powell's method [75], but in practice more evaluations are required for gradient-free approaches. Since efficiently computed gradient information is available via adjoint methods, all FIML applications in this work utilize gradient-based minimization approaches. (For applications where evaluations of the cost function are efficient, gradient-free genetic algorithms could be utilized [76], at dramatically increased expense, but without the need for gradient information and with an increased ability to locate the global optimum.)
Therefore, for the FIML-Classic inversion (min_β(J_c(β))), each new design requires at a minimum an updated objective function and gradient, so each iteration costs approximately two CFD simulations (direct + adjoint). The number of optimizer iterations required to substantially minimize the cost function is application dependent, but is typically on the order of 10–20 iterations, corresponding to 20–40 simulations. Thus the inversion cost per case is 20–40 times the cost of the baseline SA model. The inversion must be performed on enough cases to cover the feature space of the problem of interest, so for n cases the cost is 20n–40n simulations. The cost of performing the offline training is not trivial, but for the applications here it was far less than the cost of the baseline CFD. Following offline training, the model has been augmented, and the only increased cost of utilizing the augmented model is the extra cost of the floating point operations in the neural network, which is trivial compared to solving the extra transport equation associated with the baseline SA model.

The added cost for FIML-Embedded is associated with the additional operations that must be performed per iteration inside the direct and adjoint solvers. Each iteration, the features at each node are updated and scaled, the forward propagation is performed to obtain the current output of the neural network (β(η, w)), and then the backpropagation algorithm is performed to update the weights to minimize the error between β and β_T. Exact timing studies have not been performed to determine precisely how much these operations cost; however, in the applications considered in this work the additional computational cost was not noticeable. It was noted, though, that under some conditions the backpropagation algorithm would not easily converge and would ultimately affect the convergence of the flow solver, requiring more flow solver iterations to converge sufficiently. Improved robustness of the FIML-Embedded approach has been identified as an area deserving of future work. No offline training is required following the inversion, as the training/augmentation has already been performed.

For FIML-Direct, the added cost is primarily just the cost of assembling the features every solver iteration and forward propagating to obtain the current β. The added solver cost for this operation was not noticeable, and the convergence issues observed for FIML-Embedded were not an issue for the FIML-Direct applications in this thesis. The number of optimizer iterations required for FIML-Direct was comparable to that required for FIML-Classic for the cases considered. Therefore, the computational cost of FIML-Direct is comparable to FIML-Classic, except that there is no need for offline training following the inversion, as the learning and augmentation have already been performed.

For FIML-Direct with multiple cases, again enough cases must be included to sufficiently cover the feature space for the application of interest. For n cases, each optimizer iteration for FIML-Direct will require 2n simulations (n direct + n adjoint) to evaluate the cost function J_d(w) = Σ_{i=1}^{n} j_d^i(w) and its gradient dJ_d(w)/dw = Σ_{i=1}^{n} dj_d^i(w)/dw.
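Because the per-case direct and adjoint solves are independent, the composite cost and gradient of (3.8)–(3.9) parallelize naturally. A minimal sketch follows, with a hypothetical per-case helper cost_and_grad that runs one direct and one adjoint solve and returns (j^i, dj^i/dw):

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
import numpy as np

def composite_cost_and_grad(w, cases):
    """Evaluate J_d = sum_i j^i and dJ_d/dw = sum_i dj^i/dw across cases in parallel."""
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(partial(cost_and_grad, w), cases))  # one (j, dj/dw) per case
    costs, grads = zip(*results)
    return sum(costs), np.sum(grads, axis=0)
```

In practice each case is itself an MPI job running on its own nodes; the process pool here only stands in for whatever scheduler launches the n independent solves.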
So again, the cost is roughly comparable to FIML-Classic applied to n cases, with 20n–40n simulations required. For FIML-Direct there is also the potential issue that the inversions are coupled, such that all cases must be considered at the same time during the inversion process. This is not an issue on modern HPC systems, as the computation of the composite cost function and the composite gradient is an embarrassingly parallel problem. Additionally, as will be shown, it is not necessary to compute the full gradient: a randomly selected subset of the cases can be considered a batch for that iteration, and only the partial gradient of the cases in the batch is computed. The weights are updated with the partial gradient and a new batch is selected. This is similar to stochastic gradient descent and is demonstrated in Chapter 5.

3.6 Summary

The Field Inversion and Machine Learning methodology was introduced. The FIML-Classic procedure developed by Duraisamy and co-workers was discussed in detail, including the methodology to incorporate numerous datasets (cases) into a single model augmentation. The overall advantage of the FIML procedure versus other machine learning approaches was discussed. Other machine learning approaches for physics-based modeling suffer from an inconsistency between the modeling environment and the prediction environment. By using the field inversion approaches presented in this chapter, information is generated by solving the inverse problem that is consistent with the prediction/modeling environment. A potential shortcoming of the FIML-Classic procedure was also presented, in that it requires offline learning or training separate from the inversion. As discussed, this introduces an opportunity for inconsistency, as perfect learning is not possible in practice. This error is not accounted for in the inversion, and therefore there is an inconsistency between the inversion and the prediction environment, albeit much improved over not performing the inversion at all.

Two new FIML approaches, both developed and applied for the first time in this thesis, were introduced in this chapter. Both new approaches address the identified deficiency in the FIML-Classic procedure. The first, FIML-Embedded, was described, where the backpropagation algorithm is embedded into the model solver so that the limitations of the training algorithm are implicitly considered in the inversion. Additionally, following the inversion the neural network has been trained and the model augmented. The potential shortfalls of this approach were also discussed: first, there is considerable added complexity over the FIML-Classic and FIML-Direct approaches; second, it is not straightforward to include information learned from multiple cases. Potential methods to overcome this shortfall were proposed, but have not been developed and applied.

The second new FIML procedure, FIML-Direct, was also introduced in this chapter. In this approach the weights are trained directly (without backpropagation or offline training) in the inversion, with the weights considered the design variables of the cost function minimization. The advantages of this approach were discussed. For FIML-Direct the design variables are consistent across multiple cases, and therefore the inversion can be performed on numerous cases simultaneously.
Thus, the machine learning model limitations are considered, and the resulting augmentation incorporates information from numerous cases. Both features are expected to improve the regularization and generalization of the resulting augmentation.

Finally, the computational cost of all three FIML approaches was discussed. It was shown that the two new approaches do not present an unacceptable increase in computational cost and should be straightforward to perform on modern HPC architectures.

Chapter 4: Numerical Methodology

This chapter discusses the numerical methodology utilized in this effort. First the Navier-Stokes equations are presented, followed by a discussion of the RANS approach. The turbulence models used are then presented. Neural networks, discussed only briefly in Chapter 3, are discussed in detail in this chapter. Methods to perform feature selection and scaling, procedures critical to machine learning, are also presented. The 1D heat equation model problem is presented, which provides a much simpler environment for examining the FIML methodology than the RANS applications. Finally, details on the numerical implementation of FIML-Classic, FIML-Embedded, and FIML-Direct are presented for both the model problem and the RANS applications.

4.1 Governing Equations

The governing equations for fluid flows are given by the Navier-Stokes equations. Their derivation relies on the continuum hypothesis of an ideal gas and the assumption of a Newtonian fluid. From the fundamental laws of conservation of mass, momentum, and energy, the Navier-Stokes equations on a rectilinear coordinate frame (x, y, z) are given by Equation (4.1) [77]:

    ∂U/∂t + ∂F/∂x + ∂G/∂y + ∂H/∂z = J   (4.1)

    U = [ρ, ρu, ρv, ρw, ρ(e + V²/2)]ᵀ   (4.2)

    F = [ρu,
         ρu² + p − τ_xx,
         ρvu − τ_xy,
         ρwu − τ_xz,
         ρ(e + V²/2)u + pu − k ∂T/∂x − uτ_xx − vτ_xy − wτ_xz]ᵀ   (4.3)

    G = [ρv,
         ρuv − τ_yx,
         ρv² + p − τ_yy,
         ρwv − τ_yz,
         ρ(e + V²/2)v + pv − k ∂T/∂y − uτ_yx − vτ_yy − wτ_yz]ᵀ   (4.4)

    H = [ρw,
         ρuw − τ_zx,
         ρvw − τ_zy,
         ρw² + p − τ_zz,
         ρ(e + V²/2)w + pw − k ∂T/∂z − uτ_zx − vτ_zy − wτ_zz]ᵀ   (4.5)

    J = [0, ρf_x, ρf_y, ρf_z, ρ(uf_x + vf_y + wf_z) + ρq̇]ᵀ   (4.6)

The column vectors F, G, and H are the flux terms, and J is a source term. U is referred to as the solution vector, because typically in CFD solvers the elements of U are the variables that are solved for numerically [77]. The last element of U is the density times the total energy per unit mass, E = internal energy + kinetic energy = e + V²/2. The first equation (row) in (4.1) is derived from conservation of mass (continuity), the next three from conservation of momentum in the three spatial dimensions, and the final one from conservation of energy. ρ is the fluid density, {u, v, w} are the components of velocity in the x, y, and z directions respectively, V = √(u² + v² + w²), e is the internal energy, and p is the pressure. The terms in J represent body forces; J is zero if body forces are negligible. As a shorthand, the spatial dimensions x, y, z will also be referred to as x_j, where j = 1, 2, 3, in the following discussion.
The k ∂T/∂x_j terms are the thermal conduction terms arising from temperature gradients in the fluid, where k is the thermal conductivity. The thermal conductivity is computed from the following relation, assuming a constant Prandtl number, where Pr = 0.72 is the Prandtl number for air [78]:

    k = c_p μ / Pr   (4.7)

τ_ij is the viscous stress tensor. For Newtonian fluids the shear stress is proportional to the time rate of strain, and the viscous stress tensor becomes:

    τ_ij = μ (∂u_i/∂x_j + ∂u_j/∂x_i) + λ (∂u_k/∂x_k) δ_ij,   δ_ij = 1 if i = j; δ_ij = 0 if i ≠ j   (4.8)

where μ is the molecular viscosity of the fluid, which can be easily computed as a function of temperature by Sutherland's law [79], given by Equation (4.9) along with the constants for air (4.10):

    μ = μ_ref (T/T_ref)^(3/2) (T_ref + S)/(T + S)   (4.9)

    μ_ref = 1.716 × 10⁻⁵ [kg/(m·s)],  T_ref = 273.15 [K],  S = 110.4 [K]   (4.10)

λ is the bulk viscosity, for which the Stokes hypothesis, given by Equation (4.11), is used. This hypothesis is widely used and accepted but has not been proven [77], although there is no evidence that contradicts it [78].

    λ = −(2/3) μ   (4.11)

For an ideal gas we have the following familiar relationship between pressure, density, and temperature:

    p = ρRT   (4.12)

where R is the ideal gas constant. For a calorically perfect gas we have the following relationships for the specific heats at constant volume (c_v) and constant pressure (c_p):

    c_v = R/(γ − 1),  c_p = γR/(γ − 1)   (4.13)

We can also rewrite e in terms of the temperature:

    e = c_v T   (4.14)

Thus we ultimately have seven unknowns, {ρ, ρu, ρv, ρw, ρE, p, T}, that can be solved for with the five Navier-Stokes equations (4.1), the equation of state (4.12), and equation (4.14) [77, 78]. For all cases considered in this work a calorically perfect gas is assumed, where the ratio of specific heats (γ = c_p/c_v) is assumed constant: γ = 1.4. For all of the applications, with the exception of the hypersonic wedge, this assumption is valid. The extreme temperature range encountered in the hypersonic wedge case somewhat challenges the calorically perfect gas assumption; however, even in this application the variation in γ is expected to be small (about 5%).

4.2 Reynolds-Averaged Navier-Stokes Equations

Solving the Navier-Stokes equations (4.1) is typically feasible for laminar flows, but far more involved and costly for turbulent flows. The issue arises from the need to accurately resolve all the small turbulent eddies, leading to the need for extremely spatially resolved grids compared to the laminar case. This requirement becomes more and more challenging with increasing Reynolds number: as Blazek presents, the number of grid points required to sufficiently resolve turbulent flows grows proportional to Re^(9/4) and the solution time proportional to Re³ [78]. With Reynolds numbers for typical engineering applications easily in the millions, the RANS equations are typically solved instead, due to the difficulty of accurately resolving all turbulent length scales in DNS.

For the RANS equations the flow variables are decomposed into a mean component and a fluctuating component (4.15). Substituting Equation (4.15) into (4.1), the resulting RANS equations are almost identical, but now for the mean quantities, and with the addition of the Reynolds stresses in the momentum equations, which arise from the fact that the mean of the product of two fluctuating components is not zero.

    u_i = ū_i + u_i′,  p = p̄ + p′,  ρ = ρ̄ + ρ′,  T = T̄ + T′   (4.15)
4.2 Reynolds-Averaged Navier-Stokes Equations

Solving the Navier-Stokes equations (4.1) is typically feasible for laminar flows, but far more involved and costly for turbulent flows. The issue arises from the need to accurately resolve all the small turbulent eddies, leading to grids that must be far more spatially resolved than in the laminar case. This requirement becomes more and more challenging with increasing Reynolds number: as Blazek presents, the number of grid points needed to sufficiently resolve a turbulent flow grows proportionally to Re^(9/4) and the solution time proportionally to Re³ [78]. With Reynolds numbers for typical engineering applications easily in the millions, the RANS equations are typically solved instead, due to the difficulty of accurately resolving all turbulent length scales with DNS.

For the RANS equations the flow quantities are decomposed into mean and fluctuating components (4.15), where an overbar denotes the mean and a prime the fluctuation. Substituting Equation (4.15) into (4.1), the resulting RANS equations are almost identical, but now govern the mean quantities, with the addition of the Reynolds stresses in the momentum equations; these arise because the mean of the product of two fluctuating components is not zero.

ui = ūi + ui′,  p = p̄ + p′,  ρ = ρ̄ + ρ′,  T = T̄ + T′    (4.15)

The Reynolds stress is given in (4.16), where ⟨·⟩ denotes the Reynolds average of a product of fluctuations, and represents the mean momentum fluxes induced by the turbulence [80]:

τᴿij = −ρ⟨ui′uj′⟩    (4.16)

Boussinesq proposed the eddy viscosity concept, which models the effect of turbulence as an additional effective viscosity, referred to as the eddy viscosity, μt. This reasoning leads to the Boussinesq hypothesis:

τᴿij = −ρ⟨ui′uj′⟩ = μt(∂ūi/∂xj + ∂ūj/∂xi) − (ρ/3)⟨uk′uk′⟩δij    (4.17)

The second term in (4.17) is often rewritten in terms of the turbulent kinetic energy, k = (1/2)⟨uk′uk′⟩, as −(2/3)ρk δij. Turbulence models provide the missing terms in (4.17), namely μt and k.

4.2.1 The Spalart-Allmaras Turbulence Model

The Spalart-Allmaras (SA) turbulence model [17] is a popular one-equation eddy viscosity model used to provide closure to the RANS equations for turbulent flow simulations. This model is chosen as the baseline into which the embedded model discrepancy is introduced. The transport equation for the SA model is given below:

∂ν̃/∂t + uj ∂ν̃/∂xj = cb1(1 − ft2)S̃ν̃ − [cw1 fw − (cb1/κ²)ft2](ν̃/d)² + (1/σ)[∂/∂xj((ν + ν̃)∂ν̃/∂xj) + cb2 (∂ν̃/∂xi)(∂ν̃/∂xi)]    (4.18)

The terms on the right-hand side of (4.18) are referred to as the production (P), destruction (D), and diffusion terms of the SA model, such that:

P = cb1(1 − ft2)S̃ν̃    (4.19)

D = [cw1 fw − (cb1/κ²)ft2](ν̃/d)²    (4.20)

The eddy viscosity is given by μt = ρν̃fv1. Additionally,

fv1 = χ³/(χ³ + cv1³);  χ = ν̃/ν;  cw1 = cb1/κ² + (1 + cb2)/σ    (4.21)

fw = g[(1 + cw3⁶)/(g⁶ + cw3⁶)]^(1/6);  g = r + cw2(r⁶ − r);  r = min[ν̃/(S̃κ²d²), 10.0]    (4.22)

The ft2 term was originally implemented to model transition but is often neglected, as was done in this application (ft2 = 0). This variant of the Spalart-Allmaras turbulence model is referred to as the "SA-noft2" model.

The Spalart-Allmaras model uses some empiricism and a number of closure constants (σ, κ, cbi, cwi, cti) which have their own uncertainty. Of particular note is the equilibrium assumption in the log layer of the boundary layer: the production, destruction, and diffusion terms on the right-hand side of Equation (4.18) are assumed to balance (cancel) in the log layer. This assumption is violated in strong adverse pressure gradients. Additionally, note that the wall distance, d, is utilized as a length scale in both the production and destruction terms. This is natural for the problems for which the Spalart-Allmaras model was calibrated, namely attached boundary layers. However, this length scale is not appropriate far from the wall, where d becomes large.

Recently, Schaefer et al. performed sensitivity analyses on the SA model closure coefficients. The authors concluded that the uncertainties in the von Kármán constant (κ) and the turbulent Prandtl number (σ) were the primary sources of uncertainty in quantities of interest such as the pressure coefficient (Cp) [51, 50]. Other efforts seeking to recalibrate the closure constants of turbulence models were discussed in detail in Chapter 2. While these calibration efforts consider parametric uncertainty, it is intuitive to expect that a larger source of uncertainty is the structural form of the model.
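For reference, the mapping from the transported variable ν̃ to the eddy viscosity is only a few lines of code. The sketch below uses the published SA closure constant cv1 = 7.1 [17]; it is a minimal illustration of Equation (4.21) and the relation μt = ρν̃fv1, not the SU2 implementation.

```python
import numpy as np

C_V1 = 7.1  # published SA closure constant [17]

def eddy_viscosity(rho, nu_tilde, nu):
    """mu_t = rho * nu_tilde * fv1, with fv1 from Eq. (4.21)."""
    chi = nu_tilde / nu
    fv1 = chi**3 / (chi**3 + C_V1**3)
    return rho * nu_tilde * fv1
```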
4.3 The Spalart-Allmaras 1-equation BC Transitional Model (SA-BC)

For many cases of common engineering interest the effects of transition are significant. For cases where transition was anticipated to be significant, the transition model proposed by Cakmakcioglu et al. was used [18]. Note that a correction to the original model is documented on the turbulence modeling resource website maintained by Rumsey, where additional comments and discussion of the model are also available [25]. To model laminar-to-turbulent transition, an intermittency function, γBC, is applied to the production term of the SA-noft2 turbulence model. The intermittency function approaches 0 in laminar regions and approaches 1.0 where the flow is predicted to be turbulent. In this way there is minimal turbulent production in the laminar regions, while the standard SA-noft2 model is recovered in turbulent regions. The intermittency is computed as follows:

γBC = 1 − exp(−√Term1 − √Term2)    (4.23)

Term1 = max(Re_θ − Re_θc, 0.0)/(χ1 Re_θc),  Term2 = max(νBC − χ2, 0.0)/χ2    (4.24)

Re_θ = Re_ν/2.193,  Re_ν = ρd²Ω/μ    (4.25)

Re_θc = 803.73(Tu∞ + 0.6067)^(−1.027)    (4.26)

νBC = νt/(Ud),  χ1 = 0.002,  χ2 = 5.0/Re    (4.27)

Here Ω is the vorticity magnitude and U the local velocity magnitude. The variables χ1 and χ2 were calibrated to a flat plate test case and set to χ1 = 0.002 and χ2 = 5.0/Re, where Re is the freestream Reynolds number. Re_ν, Re_θ, and Re_θc are the vorticity, momentum thickness, and experimental transition onset critical momentum thickness Reynolds numbers, respectively. Tu∞ is the freestream turbulence intensity. Like other RANS transition models [25, 81, 82, 3], the SA-BC model predicts transition by comparing the estimated momentum thickness Reynolds number to the estimated critical momentum thickness Reynolds number. Unlike other transition models, the SA-BC model is an algebraic model, meaning that there are no additional transport equations (no additional transported variables).

4.3.1 Strain Adaptive Spalart-Allmaras Model: SA-SALSA

The SA-SALSA model was implemented in SU2 in order to better model hypersonic shock boundary layer interactions. This formulation of the SA model restates several terms with the intention of improving its predictive capability under non-equilibrium conditions [83]. This SA variant has shown an ability to better predict the separation length in hypersonic SWTBLI [26]. The modifications required for the SALSA variant relative to the standard model are documented below [25, 83, 17]. The S̃ term is redefined (4.28) along with r (4.29):

S̃ = S*(1/χ + fv1)    (4.28)

r = 1.6 tanh[0.7 √(ρ0/ρ) ν̃/(S̃κ²d²)]    (4.29)

ρ0 is the freestream stagnation density, S* is defined as S* = √(2S*ij S*ij), and S*ij is similar (but not equivalent) to the vorticity (4.30):

S*ij = (1/2)(∂ui/∂xj + ∂uj/∂xi) − (1/3)(∂uk/∂xk)δij    (4.30)

The term cb1 is replaced by c*b1, which is not constant and is defined by (4.31):

c*b1 = 0.1355√Γ,  Γ = min[1.25, max(γ, 0.75)],  γ = max(α1, α2)    (4.31)

The additional α terms are then defined as:

α1 = [1.1 ν̃/(S*κ²d²)]^0.65,  α2 = {max[0, 1 − tanh(χ/68)]}^0.65    (4.32)

Finally, unlike the standard SA model, the turbulent kinetic energy term in the Boussinesq equation (4.17) is not neglected, and k is given by:

k = νt S*/√0.09    (4.33)

4.4 Neural Network Structure and Training

At this point neural networks have been discussed in some detail, but only a cursory discussion of the structure of the network and the algorithm used to determine the network weights has been presented. This section adds detail on the neural network structure used in this thesis, and then presents the algorithm for training (determining the weights of) the network.

The first task in training a network is to assemble the training dataset. For all the applications in this thesis the output of the neural network is the correction to the model, β.
The inputs of the neural network are the features (η). For the RANS applications the features are a subset of the local flow variables, discussed in detail in later sections. Assembly of the training data is a critical step in neural network training, as the features must have a strong functional relationship to the desired output or any attempt to train a network will be futile. Numerical methodology, feature selection, and feature scaling are discussed in detail in the next section.

Prior to learning, the neural network structure must be selected. In this application we exclusively use fully connected feed-forward neural networks, but many other topologies and variations exist [33, 84, 85]. For this topology the primary parameters that define the structure (so-called "hyperparameters") are the number of layers, the number of nodes in each layer, and the choice of "activation function" on the nodes. A neural network is made up of "neurons." [Footnote: The term "neural network" is somewhat confusing nomenclature, as the analogy of a neural network neuron to a neuron in a biological organism is poor at best. Dreyfus has an excellent discussion of this inconsistency and states clearly that the progress made in neural networks is due to understanding the mathematics and statistics, and not due to an understanding of how the human brain functions [33]. Therefore the present work will not attempt to make any comparisons of the human brain to the functionality of a neural network; however, the nomenclature of neural networks is well established and will be used here.] A single neuron is illustrated in Figure 4.1 [84].

Figure 4.1: Illustration of a single neuron in a neural network.

The neurons are placed in layers, such that the inputs are placed in the first "input layer." The subsequent layers are termed the hidden layers. If there are multiple hidden layers, the network can also be referred to as a "deep" network, although there is some disagreement about how many layers constitute a deep network. The final or output layer of the neural network constructs the output of the network. In all applications in the present work the network has only a single output, β. Each layer also utilizes bias nodes, which are treated as additional inputs (input of 1.0) to the neurons, each with an associated weight. This topology is illustrated in Figure 4.2.

Figure 4.2: Illustration of chosen neural network structure.

Note that neural networks are often displayed graphically to illustrate the connections between neurons, but they can also easily be written as equations. For Figure 4.1 the inputs from the previous layer are denoted yj, j = 1, ..., n, where n is the number of nodes in the previous layer. If the previous layer is the input layer, then n is the number of inputs and the yj are the inputs. Each input to the neuron has a weight, wij, where i is the layer of the neuron. Each weight multiplies its input, and the weighted inputs are summed to create the input to the neuron, ai. The neuron then applies an "activation function" to create its output. There are many activation functions to choose from (Rectified Linear Unit (ReLU), Radial Basis Functions (RBF), etc.), but for all applications a hyperbolic tangent (sigmoid) activation function (yi = tanh(ai)) was employed in the hidden layers. As is typical for unbounded regression problems, the activation function for the output layer was chosen to be linear (yi = ai).
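The forward propagation just described can be written compactly. The sketch below is a minimal illustration of the chosen topology (tanh hidden layers, linear scalar output, biases held in separate vectors); the array names and sizes are illustrative.

```python
import numpy as np

def forward(eta, weights, biases):
    """Evaluate the network output beta for one feature vector eta.
    weights/biases: lists of per-layer matrices W and bias vectors b."""
    y = eta
    for W, b in zip(weights[:-1], biases[:-1]):
        y = np.tanh(W @ y + b)              # hidden layers: tanh activation
    return float(weights[-1] @ y + biases[-1])  # output layer: linear activation

# Example: 2 features -> 5 hidden nodes -> 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 2)), rng.normal(size=(1, 5))]
biases = [rng.normal(size=5), rng.normal(size=1)]
print(forward(np.array([0.1, -0.3]), weights, biases))
```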
The activation function introduces nonlinearity: with this structure the output of the neural network (β) is nonlinear with respect to the inputs of the network, but linear with respect to the outputs of the final hidden layer. Equation 3.4 is equivalent to the topology shown in Figure 4.2 for a single hidden layer.

The process of determining the weights of the neural network is referred to as "training" or "learning." The broad category of learning implemented in this thesis is called "supervised" learning, which simply means that the model is fitted to assembled (known) training data in order to replicate the outputs as closely as possible given the same inputs. With the training data assembled and the structure of the network defined, the next task is to determine the weights of the network. Each line in Figure 4.1 represents a weight, so naturally the number of weights grows substantially with the number of nodes and hidden layers. Typical neural networks have hundreds of weights that must be trained, and often there is a large quantity of training data, so the computational efficiency of the training algorithm is important.

To efficiently train the weights of the neural network, gradient descent methods are typically used to minimize a cost function that measures the distance of the current output of the neural network from the desired output defined by the training data. Gradient-based methods require a gradient, and the gradient is efficiently computed by a reverse mode differentiation algorithm referred to as "backpropagation." This algorithm was first presented by Rumelhart et al. [74]; here we largely follow the notation of Reed and Marks [84], with minor changes to reflect the more specific application in the present work.

First, we define the cost function as the sum squared error in Equation (4.34). For P training samples (input/output pairs in the training set), E is the total sum squared error of the current outputs of the network (β̂) compared to the target outputs in the training set, β:

E = (1/2) Σ_{p=1}^{P} (β_p − β̂_p)²    (4.34)

As a reverse mode differentiation algorithm, the derivative computation starts at the output of the neural network by computing the value δi for each node, i, as defined in Equation (4.35):

δi = ∂E_p/∂ai = (∂E_p/∂yi)(∂yi/∂ai)    (4.35)

For the output node, ∂yi/∂ai evaluates to 1 because this is a regression problem using a linear activation function on the output node. The derivative of the error for this pattern with respect to the output of the output node (∂E_p/∂yi) is then simply the derivative of the cost function. For hidden nodes, δi is evaluated using information from the nodes k that lie between node i and the output, already computed in the backward sweep:

δi = ∂E_p/∂ai = Σ_k (∂E_p/∂a_k)(∂a_k/∂ai) = Σ_k δ_k (∂a_k/∂ai)    (4.36)

Then, to evaluate ∂a_k/∂ai:

∂a_k/∂ai = f′i w_ki if node i connects to node k;  0 otherwise    (4.37)

The term f′i is the derivative of the activation function, which captures the nonlinearity of the node at its current activation value. For yi = tanh(ai) we have f′i = 1 − f²(ai). We ultimately arrive at the following algorithm to compute the δ values, Equation (4.38). Note that the algorithm begins at the output layer and works backwards: the output layer provides the k nodes to the last hidden layer (the i nodes), the last hidden layer then provides the k nodes to the layer before it, and so on.

δi = (β_pi − β̂_pi) for the output node;  δi = f′i Σ_k w_ki δ_k for hidden nodes    (4.38)
Finally, the derivative of the cost function with respect to each weight is given by Equation (4.39):

∂E_p/∂w_ij = δi yj    (4.39)

Summing Equation (4.39) across all patterns in the training set gives the required total error derivative with respect to that weight. Note that while this algorithm is relatively easy to implement, it is possible to bypass the sometimes tricky coding of the backpropagation algorithm by using automatic differentiation [86]. This can greatly simplify the construction of large and/or complicated network connection structures by computing the gradient in Equation (4.39) with little or no user burden to implement the derivative computation. It also eliminates the requirement that the activation function have an analytical derivative. Additionally, the derivative could be estimated by finite differences if the associated increase in cost and decrease in accuracy is tolerable (if using a real-valued step). In the present effort, the backpropagation algorithm was implemented as written above due to the relative simplicity of the neural networks employed.

With the gradient information, the weights can be updated along a descent direction in order to reduce the current error. This can be as simple as a constant step ("learning rate") in the steepest descent direction (−∂E_p/∂w). The update rule for the weights is then given by Equation (4.40):

w^(n+1) = w^(n) − α ∂E_p/∂w    (4.40)

where n is the current training iteration, the learning rate (α) is a small number (often 0.001 or smaller), and w is a vector containing the weights, with corresponding gradient vector ∂E_p/∂w. For the first iteration the weights are initialized to small random numbers. The update algorithm given by Equation (4.40) is equivalent to steepest descent with a fixed line search step. This is trivial to implement, but exhibits poor performance compared to more advanced algorithms such as BFGS or its limited memory variant, L-BFGS-B [87, 88]. Both BFGS variants construct estimates of the Hessian, giving superior performance over the steepest descent algorithm for minimization problems in high dimensions, such as neural network training, which involves hundreds of weights (dimensions) even for small networks.

Additionally, we hope that the neural network exhibits good generalization, so that the model is capable of not only replicating the data it was trained on, but also predicting cases that were not available in the training set (holdout data). The machine learning term for a network that performs well on the patterns in the training set but poorly on other patterns is an "overtrained" network. The terminology refers to the phenomenon where a neural network exhibits good generalization in earlier training iterations, but diminished generalization to holdout data later in the training evolution. Therefore, it is standard practice to periodically test the network on holdout data in order to quantify the performance on cases not in the training set. Overtraining is particularly troublesome for large networks, and it is therefore advantageous not to over-parameterize the network, i.e., to use the smallest network that provides sufficient accuracy on the training data. A minimal sketch of one training update, following Equations (4.34) through (4.40), is given below.
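The following sketch applies Equations (4.34) through (4.40) to the single-hidden-layer case for one training pattern. The sign convention here folds the leading minus of the descent step into δ, which is equivalent to (4.38) followed by a steepest descent update; names and sizes are illustrative.

```python
import numpy as np

def train_step(eta, beta_target, W1, b1, W2, b2, lr=1e-3):
    """One steepest-descent weight update (Eqs. 4.34-4.40) for one pattern."""
    # Forward pass: tanh hidden layer, linear scalar output.
    a1 = W1 @ eta + b1
    y1 = np.tanh(a1)
    beta_hat = float(W2 @ y1 + b2)
    # Backward pass, Eq. (4.38): delta for the linear output node...
    delta_out = beta_hat - beta_target
    # ...and for the hidden nodes, using f' = 1 - tanh^2, Eq. (4.37).
    delta_hid = (1.0 - y1**2) * (W2.ravel() * delta_out)
    # Gradients dE/dw_ij = delta_i * y_j, Eq. (4.39); update, Eq. (4.40).
    W2 -= lr * delta_out * y1[None, :]
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, eta)
    b1 -= lr * delta_hid
    return 0.5 * (beta_target - beta_hat) ** 2  # this pattern's share of E
```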
Therefore, the modeler's choice of neural network topology is certainly not trivial and is often problem dependent. The selection of the number of nodes and layers is often based on previous work on similar problems, on trial and error, or on a parametric study in which numerous networks are constructed with varying hyperparameters and the resulting performance is examined.

4.4.1 Feature Selection and Scaling

Feature selection and scaling are important considerations in the construction of useful neural networks that are sometimes neglected or overlooked. The features must have a strong functional relationship to the desired output of the model. For complex physics-based applications, such as RANS modeling, there is a long list of variables that may influence the output, and it is not readily apparent which features are the most important. As noted, over-parameterizing the problem can result in over-training, so for generalization (and to minimize complexity) it is important to use the fewest features possible that still accurately define the output. The process of determining the features is termed feature selection. Additionally, certain training methods for machine learning algorithms are sensitive to outliers and to large differences in scale among the features. Neural networks trained by gradient descent are one such case, in which poor feature scaling can result in networks that are difficult or impossible to train sufficiently. Because of the importance of feature selection, a substantial effort was made to identify suitable features for the RANS applications in the current work. The numerical methodologies employed for feature selection are documented individually below.

Feature Selection Method: Correlation Coefficients

Pearson correlation coefficients [Footnote: Also known as Pearson's r, the Pearson product-moment correlation coefficient, or bivariate correlation.] are a measure of the linear correlation between two quantities. The correlation coefficient between n samples of two variables x and y is computed by first finding the sample covariance between the two variables and then dividing by √(σ²_X σ²_Y). The correlation coefficient is therefore given by Equation (4.41) [89]:

ρ_XY = [(1/n) Σ_{i=1}^{n} (xi − x̄)(yi − ȳ)] / √(σ²_X σ²_Y)    (4.41)

Correlation coefficients can help expose strong linear correlations (ρ approaches 1) or strong inverse linear correlations (ρ approaches −1). Potential features whose correlation coefficients with the output are near 0 may not be the best choices. Similarly, if two potential features are strongly correlated with each other, using only one of the two in the feature set may provide the same information as using both. [Footnote: Singular value decompositions were also used on potential features to help identify whether features were providing unique information.] To further examine relationships between features, two-dimensional scatter plots were examined between all features and the desired output (β). The features were assembled and correlation coefficients computed using the pandas library [90], and plots were generated using the seaborn library, which provides an interface to matplotlib [91] that enables the visualization of large quantities of data. A short example of this screening step follows.
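A minimal example of this screening with pandas is sketched below; the feature names and the synthetic data are purely illustrative stand-ins for the assembled flow features and the inversion output β.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "chi": rng.lognormal(mean=1.0, sigma=0.5, size=n),  # stand-in features
    "strain": rng.normal(size=n),
    "wall_dist": rng.uniform(size=n),
})
# Synthetic correction: strongly tied to one feature, plus noise.
df["beta"] = 1.0 + 0.3 * np.tanh(df["strain"]) + 0.05 * rng.normal(size=n)

# Pearson correlation of each candidate feature with beta, Eq. (4.41).
print(df.corr(method="pearson")["beta"].drop("beta").sort_values())
```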
Feature Selection Method: Sequential Back Selection

Sequential back selection is a costly method of evaluating features, but it is straightforward to implement and can be applied to any machine learning algorithm. The machine learning model is trained using all but one of the features and is scored. This score could be any measure of accuracy, such as mean squared error or maximum absolute error, but in this application the coefficient of determination was used, given by Equation (4.42):

R²(β, β̂) = 1 − [Σ_{i=1}^{n} (βi − β̂i)²] / [Σ_{i=1}^{n} (βi − β̄)²]    (4.42)

Values closer to 1 indicate that additional samples will be predicted well by the model. After models have been trained holding each feature out in turn, the feature set with the best score is carried forward and the feature not in that set is dropped. This continues until only one feature remains or the desired number of features is reached. Often models improve when certain features are removed; this is an indication that there are features in the set that are not providing useful information and are complicating training by inappropriately raising the dimensionality of the problem. Eventually the score begins to decrease as additional features are removed, indicating that the remaining features are providing useful information to the model and should likely be retained. Note that the sequential back selection method can quickly become impractical with large quantities of training data, because so many models must be trained and compared. A sketch of the procedure is given at the end of this section.

Additionally, two methods of feature scaling were utilized.

Feature Scaling Method: Z-Scaling Normalization

For the RANS applications discussed later, the features are of different scales and there are very large outliers. Often the inputs have different physical units and scales, and if they are not normalized the smallest-scaled features will be ignored in training [33]. To mitigate the difference in scales, Z-scaling was attempted, which simply scales each feature to a mean of 0 and a standard deviation of 1. For each feature, i, and pattern, p, we perform the following normalization:

ηi^p ← (ηi^p − η̄i)/σ_ηi    (4.43)

For the applications in this thesis this scaling method performed worse than mapping to a Gaussian distribution.

Feature Transformation Method: Map to Normal Distribution

Extremely skewed data can be mapped to a Gaussian distribution to mitigate the negative impact of outliers on neural network training performance. Mapping to a normal distribution tends to spread out closely spaced features and to reduce the impact and spread of the outliers. Two methods of accomplishing this mapping were used. For the offline training cases in the FIML-Classic procedure (discussed in this chapter), the quantile transformer in the scikit-learn package was used [92]. The second approach was implemented for the neural networks used inside the SU2 solver. First, the minimum and maximum of each feature were computed and bins created. Using the bins, each feature's probability distribution was approximated by computing an estimated distribution function. Typically 10 bins were used, but testing showed the algorithm was not very sensitive to the number of bins. The estimated distribution function was then transformed by the logit function, which is a close approximation to the inverse cumulative distribution function of a normal distribution. Each feature was then mapped through the transformed estimated distribution function to find its transformed value. While somewhat cumbersome and costly to iterate through all the features several times to compute the transformed distribution, this transformation dramatically improved neural network training for the RANS applications due to its ability to mitigate the influence of marginal outliers.
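The sequential back selection loop described above is sketched here using a small scikit-learn network and the R² score of Equation (4.42); the model choice and sizes are illustrative, and in practice the score should be evaluated on holdout data rather than on the training set.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor

def back_selection(X, y, names, n_keep=2):
    """Drop one feature per round, keeping the subset that scores best."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        scored = []
        for drop in active:
            cols = [j for j in active if j != drop]
            model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                                 random_state=0).fit(X[:, cols], y)
            scored.append((r2_score(y, model.predict(X[:, cols])), drop))
        best_r2, drop = max(scored)          # the removal that hurts least
        active.remove(drop)
        print(f"dropped {names[drop]}: best remaining R^2 = {best_r2:.3f}")
    return [names[j] for j in active]
```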
4.5 Field Inversion and Machine Learning

Numerical considerations for the FIML methods, introduced in Chapter 3, are discussed here. Two different implementations were assembled for the present work: one for the RANS applications, and the other for the 1D heat equation model problem. The heat equation numerics are considerably simpler and are discussed later in this chapter. The SU2 framework is the main focus of the following sections.

The FIML approach for both the 1D heat equation and the RANS implementation relies on three major computations. The outermost solver is the minimization (optimization) routine. The optimizer minimizes the cost function, J, by manipulating the design variables, θ. For the applications in this work the BFGS optimization routine and its limited memory variant, L-BFGS-B, were used. The optimizer initializes the design variables and calls the other components to compute the current value of the objective function, J(θ), and the gradient vector, dJ/dθ. For the RANS applications the objective function is determined by solving the RANS equations using the modified SU2 package for the problem of interest, and comparing the output of the model, k_m(θ), to the data, k_d. The derivative is computed by solving the corresponding adjoint equations; the modified SU2 autodifferentiated discrete adjoint solver was used to compute the gradient. The following sections detail this numerical framework for each FIML approach.

4.5.1 FIML-Classic

For FIML-Classic, the design variables are the corrections to the production term of the SA model at each point in the computational domain (θ → β). The objective function is given by Equation (3.1). The numerical solution procedure is illustrated in Figure 4.3. The optimizer initializes the β field (typically β = 1.0 everywhere to start, corresponding to the baseline model solution for J_c) and passes β to the flow (direct) solver. [Footnote: "Direct solver" here refers to the model solver used to evaluate the objective function. This is also known as the "primal" solver, and the terms are used interchangeably throughout this thesis. For RANS applications the direct or primal solver is the flow solver used to evaluate J and obtain converged flow variables U*. The adjoint solver refers to the solver used to obtain the gradient of the objective function, dJ/dβ, via the adjoint equations.]

Figure 4.3: Flowchart of FIML-Classic RANS implementation in SU2 illustrating input and output of major modules.

The direct solver solves the RANS equations with the turbulence model, with β applied to the production term. This gives the value of k_m(β), and therefore also J_c(β), which is returned to the optimizer. The adjoint solver requires the converged flow variables U* from the direct solver as well as the current value of β set by the optimizer. The adjoint solver is executed and returns the gradient of the objective function with respect to the correction. With the current objective function value and gradient, the optimizer can then determine the search direction and update the correction field to minimize the objective function. This updated correction is passed to the flow solver, and the process is repeated until no new minimum can be found or the modeler terminates the algorithm because the objective function has been reduced sufficiently. The numerical details of both the optimizer and the adjoint solver are discussed in later sections of this chapter; the outer loop structure is sketched below with a toy stand-in for the primal and adjoint solvers.
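The sketch below wires a toy quadratic "model" and its hand-derived "adjoint" gradient into SciPy's L-BFGS-B driver, purely to illustrate the optimizer/primal/adjoint structure. In the actual framework the two callbacks launch the modified SU2 primal and adjoint solvers; everything here is an illustrative stand-in.

```python
import numpy as np
from scipy.optimize import minimize

k_d = np.linspace(0.0, 1.0, 50)      # stand-in for the data k_d

def primal(beta):                    # stand-in for k_m(beta) from a flow solve
    return 0.5 * beta**2

def objective(beta):                 # J(beta) = 0.5 * sum (k_m - k_d)^2
    return 0.5 * np.sum((primal(beta) - k_d) ** 2)

def gradient(beta):                  # what the adjoint solver would return
    return (primal(beta) - k_d) * beta

beta0 = np.ones_like(k_d)            # baseline model: beta = 1 everywhere
res = minimize(objective, beta0, jac=gradient, method="L-BFGS-B")
print(res.fun, res.nit)
```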
4.5.2 FIML-Embedded

For FIML-Embedded, the objective function (J_e) is given by Equation 3.6. The design variables in this case are the training corrections (θ → β_T). The backpropagation algorithm was embedded in the SU2 flow solver (embedded backpropagation). Every flow solver iteration, the required features are assembled from the flow variables and scaled, and the current values of the correction β are found at each node in the computational domain. The current output of the neural network is compared to the training correction and the weights are updated via backpropagation. Ideally, the backpropagation algorithm converges well and the converged weights w* replicate β_T, so that β(η, w*) ≈ β_T. The adjoint solver requires additional information from the direct solver in the FIML-Embedded case: the converged values of the weights, the flow variables, and β are all passed to the adjoint solver. A flowchart of this method is shown in Figure 4.4.

Figure 4.4: Flowchart of FIML-Embedded RANS implementation in SU2 illustrating input and output of major modules.

4.5.3 FIML-Direct

The flow structure of the FIML-Direct approach is largely similar to the FIML-Classic implementation, substituting the weights for the correction field. For FIML-Direct the objective function is given by Equation 3.7, and the weights of the neural network are the design variables (θ → w). Inside the flow solver the flow features are assembled and scaled from the current values of the flow variables, and the current value of β is updated. The optimizer therefore influences the eddy viscosity by manipulating the weights of the neural network that augments the production term of the SA model. The rest of the solver structure is largely similar to the FIML-Classic implementation, as illustrated in Figure 4.5.

Figure 4.5: Flowchart of FIML-Direct RANS implementation in SU2 illustrating input and output of major modules.

When considering multiple cases, the flow structure is similar, except that each evaluation of the composite objective function (3.8) involves multiple flow solutions (one for each case), and each gradient evaluation involves multiple adjoint solutions. Conceptually this is illustrated in Figure 4.6.

Figure 4.6: Flowchart of FIML-Direct RANS implementation for multiple cases.

4.5.4 Unconstrained Optimization by Gradient Descent

All three FIML methods require an unconstrained minimization algorithm in order to efficiently minimize the corresponding cost function, J. As discussed for the backpropagation algorithm, it is possible to simply take a small fixed step in the steepest descent direction (defined by the negative of the gradient with respect to the design variables). This approach, however, is not efficient, especially for problems such as the FIML approaches here that are very highly dimensional. The BFGS algorithm [Footnote: BFGS stands for four names: Broyden, Fletcher, Goldfarb, and Shanno.] [93] is very efficient at minimizing functions of many variables, and most of the objective function minimizations in this thesis utilized either the BFGS algorithm or its limited memory variant (L-BFGS-B) [88] when constructing the full Hessian estimate was intractable due to the exceedingly high dimensionality of the application.

4.5.4.1 The BFGS Method

The gradient (vector of first derivatives) of the cost function, ∇J, can be efficiently computed via adjoint methods.
Hessian (matrix of second derivatives) information would be very valuable if available, as efficient Newton methods could then be used for the minimization of the cost function. Papadimitriou and Giannakoglou survey different approaches for computing the Hessian for CFD applications via adjoint methods; at best, the cost of computing the Hessian is N + 1 primal simulations [94, 95]. That cost is not achievable for applications with hundreds (potentially millions) of design variables. Therefore, the computational cost of computing the Hessian of J prevents its exact computation for the FIML RANS applications.

Quasi-Newton methods provide an alternative to the direct computation of the Hessian. This family of methods estimates the Hessian from gradient evaluations at successive points. The quasi-Newton condition relies on the approximation given by Equation (4.44), where J is the function to be minimized, x is the vector of design variables, k is an index indicating the current design, ∇J is the gradient of J, and H is the Hessian of J:

H^(k+1)(x_{k+1} − x_k) ≈ ∇J(x_{k+1}) − ∇J(x_k)    (4.44)

The algorithm is summarized as follows. The estimate of the Hessian, B_k, is initialized to the identity matrix for the first iteration only. The current search direction, p_k, is found by solving B_k p_k = −∇J(x_k). Note that on the first iteration the search direction is the steepest descent direction, but on subsequent iterations it is informed by the estimate of the Hessian. If B_k were the exact Hessian we would recover Newton's method. After p_k is found, a line search is performed to minimize J along the search direction. This is a one-dimensional minimization problem to find α_k = arg min J(x_k + α p_k). Once α_k is chosen by the line search algorithm, the variable s_k = α_k p_k is set and the next design is defined by x_{k+1} = x_k + α_k p_k. The variable y_k is defined as the difference between the gradient at the new design and at the previous one: y_k = ∇J(x_{k+1}) − ∇J(x_k). The estimated Hessian can then be updated with the new information as shown in Equation (4.45) [93, 87]:

B_{k+1} = B_k + (y_k y_kᵀ)/(y_kᵀ s_k) − (B_k s_k s_kᵀ B_k)/(s_kᵀ B_k s_k)    (4.45)

Note that the Hessian matrix is an n × n matrix, where n is the number of design variables (elements of x). Therefore the memory required to store B_k for problems with many design variables can be prohibitively large. For neural network training n > 100, and for FIML-Classic and FIML-Embedded n > 50,000, depending on the application. Thus for FIML-Classic and neural network training applications a limited memory variant of BFGS was used, specifically the "L-BFGS-B" algorithm, which stores a small number of vectors containing the Hessian estimate information [88]. In this way a sparse estimate of the Hessian is used and the memory requirements are dramatically diminished. The L-BFGS-B algorithm can also incorporate design variable constraints. For the applications in this thesis, design variable constraints were only used to prevent the optimizer from taking unnecessarily large line search steps, and the constraints were never active in the optimal designs. For the FIML results the SciPy implementations of BFGS and L-BFGS-B were utilized [96], and for offline training in FIML-Classic the L-BFGS-B implementation in the scikit-learn package was used to train the weights [92].
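For completeness, the rank-two update (4.45) is only a few lines of code. This sketch assumes a symmetric positive definite current estimate B and that the curvature condition yᵀs > 0 holds, which the line search is responsible for ensuring.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Eq. (4.45): s = x_{k+1} - x_k, y = grad J(x_{k+1}) - grad J(x_k)."""
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
```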
4.5.4.2 The Stochastic Gradient Descent Method

The stochastic gradient descent (SGD) method [Footnote: Also known as incremental gradient descent, and similar to "on-line" training as presented by Reed and Marks [84].] is a natural extension of the steepest descent approach that is sometimes applied when multiple cases (or patterns) contribute to a composite gradient. This is a common minimization procedure for weight updates in neural networks [84]. Instead of computing the exact gradient at each iteration, only portions of the full gradient are computed, and a small step is taken in the resulting partial steepest descent direction. In this effort the SGD method was implemented to minimize the composite cost function for the FIML-Direct approach, where the full cost function across n cases is given by J = Σ_{i=1}^{n} j_i. In the SGD method, m < n cases are selected at random, and the gradient is estimated by Equation (4.46):

dJ/dw ≈ Σ_{l=1}^{m} dj_l/dw    (4.46)

The design variables are then updated by moving the design a small step in the steepest descent direction approximated by (4.46). Naturally, the gradient estimate will not be equivalent to the true steepest descent direction, but in some applications this is advantageous. First, it is less computationally expensive to compute the partial gradient. Second, the partial gradient introduces some randomness into the search direction, which in some applications can improve the chances of escaping local minima [84].

4.6 Complex Step Method

Gradient descent methods require the computation of the gradient. For simple problems the gradient can often be computed analytically (and exactly). When an analytical gradient is not available, the gradient is often estimated via finite differences. Writing the Taylor expansion of a function F(x) about an arbitrary point x₀, we have Equation (4.47):

F(x₀ + Δx) = F(x₀) + Δx F′(x₀) + (Δx²/2!) F″(x₀) + (Δx³/3!) F‴(x₀) + ⋯ + (Δxⁿ/n!) F⁽ⁿ⁾(x₀) + ⋯    (4.47)

It is then straightforward to rearrange Equation (4.47) to obtain the first-order accurate finite difference approximation for the first derivative:

F′(x₀) = [F(x₀ + Δx) − F(x₀)]/Δx + O(Δx)    (4.48)

As Δx goes to 0 we recover the exact first derivative; however, on computing systems using floating point arithmetic the accuracy is limited. Floating point numbers can represent extremely small numbers, but round-off error becomes significant when subtracting nearly equal numbers. When applying Equation (4.48), as Δx approaches 0 the numerator (F(x₀ + Δx) − F(x₀)) also becomes very small, while the value of F(x₀) remains constant. If F(x₀) is much larger than F(x₀ + Δx) − F(x₀), the subtractive cancellation error can be severe, and in practice there is a minimum Δx below which this cancellation error starts to dominate the error from neglecting the higher order terms in Equation (4.47). Therefore, in practice, machine accurate estimates of the gradient are not possible with the finite difference method due to subtractive cancellation error. This introduces another complication: the estimate of F′(x₀) depends on the step size, and the best choice of Δx is problem dependent.

Alternatively, the complex variable method (or complex step method) enables the computation of the gradient of a real-valued function to machine accuracy [97] by leveraging the additional information that is stored in floating point complex numbers. To compute the gradient we simply take a small complex step (Δx → ih).
The Taylor series expansion for this step is then:

F(x₀ + ih) = F(x₀) + ih F′(x₀) − (h²/2!) F″(x₀) − (ih³/3!) F‴(x₀) + ⋯ + ((ih)ⁿ/n!) F⁽ⁿ⁾(x₀) + ⋯    (4.49)

Therefore, by adding a small complex step to the argument of any real-valued function, the real part of the result is an O(h²) accurate estimate of F(x₀):

Re{F(x₀ + ih)} = F(x₀) + O(h²)    (4.50)

The imaginary part can then be used to estimate F′(x₀) with O(h²) accuracy:

F′(x₀) = Im{F(x₀ + ih)}/h + O(h²)    (4.51)

The advantage of using a complex step over a real-valued step is that the gradient of a real-valued function can be estimated to machine accuracy, because the small imaginary step is never added to a much larger number. Adding a small real-valued step to a much larger number causes the subtractive cancellation errors discussed above and limits the smallest usable step size. With a complex step the subtractive cancellation error is avoided, enabling the use of extraordinarily small step sizes that estimate the gradient so accurately that the missing O(h²) terms fall below the round-off error. At that point the gradient is estimated to machine accuracy, and the result is insensitive to the chosen step size.

The complex variable method is easily demonstrated, and we show the results for the same function used by Squire and Trapp [97]. We use the test function f(x) = x^(9/2) and compare the complex variable step method (4.51) with the first-order forward finite difference method given by (4.48), which requires an additional function evaluation.

Figure 4.7: Comparison of the error as a function of step size for the finite difference method and the complex variable step method.

As shown in Figure 4.7, the complex variable method has several advantages. First, the second-order accuracy of the complex variable method manifests as a steeper error convergence slope than that of the first-order accurate real-valued finite difference method. Most importantly, the finite difference error never approaches a constant value, and instead grows as the step size is decreased past the point where subtractive cancellation error begins to dominate the error balance. In fact, the approximation returns 0.0 as the step size approaches zero, because there is so much subtractive cancellation error in the numerator of Equation (4.48). The complex variable method, however, converges to a very small error, at which point the gradient is estimated to machine accuracy. Note that the machine accurate gradient is returned for a wide range of step sizes, indicating that the estimate is not sensitive to the step size in this region, which greatly simplifies the choice of step size.

Note that the complex variable method has two potential shortfalls: first, it only applies to real-valued functions. Second, like the finite difference method, it is a forward differentiation approach. In other words, it computes the gradient of any number of outputs with respect to a single input. This can become costly for functions with large numbers of inputs, as each element of the gradient requires a function evaluation. Despite these drawbacks, the approach is very practical for complicated and costly simulations, as it is extremely easy to implement: simply by changing all real-type variables to complex type, a real-valued simulation can return machine accurate gradients. The method is now widely used for a variety of physics-based simulations due to its accuracy and ease of implementation [97, 98, 99, 100, 101].
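The comparison behind Figure 4.7 can be reproduced in a few lines. The sketch below evaluates both estimates for f(x) = x^(9/2) at an arbitrary point (x₀ = 1.5 is an illustrative choice) and prints their errors as the step size shrinks; note the forward difference collapsing to zero at the tiniest step.

```python
import numpy as np

f = lambda x: x ** 4.5              # test function f(x) = x^(9/2)
df = lambda x: 4.5 * x ** 3.5       # exact derivative, for the error
x0 = 1.5

for h in [1e-4, 1e-8, 1e-12, 1e-200]:
    cs = np.imag(f(x0 + 1j * h)) / h   # complex step, Eq. (4.51)
    fd = (f(x0 + h) - f(x0)) / h       # forward difference, Eq. (4.48)
    print(f"h={h:.0e}  cs_err={abs(cs - df(x0)):.1e}  fd_err={abs(fd - df(x0)):.1e}")
```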
4.7 Adjoint Methods

Adjoint methods allow the efficient computation of the exact gradient (to machine precision) that is required by gradient-based optimization methods. Many excellent explanations of adjoint methods have been presented (by Johnson, for example); we follow the notation and explanation of Giles and Pierce here [103, 102]. Given a linear system of equations Au = f, where A is a matrix and f is a vector, it is equivalent to solve the dual form Aᵀv = g. These formulations are equivalent in the sense shown in (4.52):

vᵀf = vᵀAu = (Aᵀv)ᵀu = gᵀu    (4.52)

Now consider a set of equations with state variables U that depend on design variables θ and satisfy:

N(U, θ) = 0    (4.53)

For the Navier-Stokes equations the variables U are the flow variables; in the model problem discussed in later sections, U is the temperature. θ is a vector of design variables that we wish to modify to minimize the cost function, J, which depends on θ and U. The gradient of the cost function is given by (4.54):

dJ/dθ = (∂J/∂U)(dU/dθ) + ∂J/∂θ    (4.54)

In (4.54) the term dU/dθ satisfies the linearized form of (4.53):

(∂N/∂U)(dU/dθ) + ∂N/∂θ = 0    (4.55)

Rewriting (4.55):

A ≡ ∂N/∂U,  u ≡ dU/dθ,  f ≡ −∂N/∂θ  ⟹  Au = f    (4.56)

And we recover the conventional definition of the gradient of the cost function in (4.57):

gᵀ ≡ ∂J/∂u,  dJ/dθ = gᵀu + ∂J/∂θ    (4.57)

Note that the term ∂J/∂θ is typically easy to compute analytically. The first term, involving dU/dθ, however, can be costly. Because gᵀu ≡ vᵀf, we can evaluate the first term in (4.57) either from the direct formulation Au = f or from the adjoint form Aᵀv = g. For a single design variable there is no difference in cost, and there is no benefit to evaluating the adjoint form. However, for n design variables we must evaluate the direct formulation (Au = f) n times, because each design variable has a different f = −∂N/∂θ. This is not the case for the adjoint form, as there is only a single g = [∂J/∂u]ᵀ common to all the design variables. Therefore, it is far less costly to evaluate the adjoint formulation when there are many design variables [103].

Alternatively, and perhaps more intuitively, the adjoint variables can be derived by considering them as Lagrange multipliers [103]. Because N(U, θ) = 0, the augmented objective function, I(U, θ), can be written as in (4.58):

I(U, θ) = J(U, θ) − ψᵀN(U, θ);  N(U, θ) = 0 ⟹ I(U, θ) = J(U, θ)    (4.58)

Here ψ are the adjoint variables. Differentiating:

dI = (∂J/∂U − ψᵀ ∂N/∂U) dU + (∂J/∂θ − ψᵀ ∂N/∂θ) dθ    (4.59)

Note that ψ can be chosen to be any value without violating (4.58), so it is chosen to conveniently eliminate the first term in (4.59). We then again recover the adjoint equation (4.60) [103]:

(∂N/∂U)ᵀ ψ = (∂J/∂U)ᵀ    (4.60)

And the derivative is computed as:

dI/dθ = ∂J/∂θ − ψᵀ ∂N/∂θ    (4.61)

For FIML-Classic and FIML-Embedded we have a design variable at every spatial point in the domain. For RANS applications this typically means tens of thousands of design variables and direct solutions requiring hours of computational time. In this setting it is not practical to evaluate Au = f for each design variable, and the adjoint formulation is required. For FIML-Direct the number of design variables is typically somewhat reduced, as even large neural networks here have only hundreds of weights. However, it would still be far too costly to evaluate the direct formulation once per design variable, and therefore the adjoint formulation is again required for the FIML-Direct approach.
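The identity (4.52) on which all of this rests is easily verified numerically. The sketch below solves a small random system both ways and shows that gᵀu and vᵀf agree to round-off; the matrix construction is an arbitrary illustrative choice that guarantees invertibility.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = rng.normal(size=(n, n)) + n * np.eye(n)  # diagonally dominated, invertible
f = rng.normal(size=n)                       # one right-hand side per design variable
g = rng.normal(size=n)                       # objective sensitivity [dJ/du]^T

u = np.linalg.solve(A, f)                    # direct:  A u = f
v = np.linalg.solve(A.T, g)                  # adjoint: A^T v = g
print(g @ u, v @ f)                          # equal to machine precision, Eq. (4.52)
```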
4.7.1 Discrete Adjoint Implementation

In this section the discrete adjoint implementation in the chosen solver (SU2) is discussed, and the modifications required to implement the various FIML approaches in SU2 are presented. The SU2 open source CFD software suite was chosen, in part, due to its implementation of an autodifferentiated discrete adjoint solver, which is presented and well explained by Albring et al. This discrete adjoint solver was originally developed for aerodynamic design optimization problems, where the design variables modify an aerodynamic surface to minimize a cost function (minimize drag, for example). The flexibility of SU2 and the autodifferentiated discrete adjoint approach has enabled the extension of the SU2 package to solve a variety of optimization problems [104, 105, 106, 107, 108, 109]. Given that the present effort made use of the SU2 discrete adjoint solver, we follow the notation and explanation of the adjoint solver given by Albring et al. [104], with some modifications to represent the changes required for the FIML procedures.

Given a cost function, J, design variables, θ, and state variables, U, we have the following optimization problem:

min_θ J(U(θ))    (4.62)

subject to

U(θ) = G(U(θ), X(θ))    (4.63)

where G(·) results from the discretization and iterative solution of N(U(θ)) = 0:

U^(n+1) = U^n + N(U^n(θ))    (4.64)

G(U^n) ≡ U^n + N(U^n(θ))    (4.65)

N(U(θ)) is the residual at the current iteration. If the system of equations has been solved, denoted by a star, then N(U*) = 0, which implies that with converged U we have U* = G(U*).

This framework is sufficient for the FIML-Classic and FIML-Direct implementations, for which θ = β and θ = w, respectively. However, for the FIML-Embedded implementation (θ = β_T) there is an additional constraint on the minimization. Let X(θ) = M(U(θ), θ) denote the backpropagation (training) algorithm that produces the weights X; this is an additional constraint on the minimization problem, such that (4.62) becomes (4.66):

min_θ J(U(θ), X(θ))    (4.66)

subject to

U(θ) = G(U(θ), X(θ))    (4.67)

X(θ) = M(U(θ), θ)    (4.68)

Then, following the Lagrange multiplier explanation, we write the Lagrangian, where the adjoint variables are marked with a bar to denote the association to the variable in question (Ū is the adjoint of the flow variables, X̄ is the adjoint of the neural network weights):

I(θ, U, X, Ū, X̄) = J(U, X) + [G(U, X) − U]ᵀŪ + [M(U, θ) − X]ᵀX̄    (4.69)
= O(U, Ū, X) − UᵀŪ + [M(U, θ) − X]ᵀX̄    (4.70)

where O(U, Ū, X) is introduced as:

O(U, Ū, X) ≡ J(U, X) + Gᵀ(U, X)Ū    (4.71)

The values of the adjoint variables are again chosen to eliminate the troublesome derivatives ∂U/∂θ and ∂X/∂θ from the derivative of (4.69), giving:

Ū = ∂O/∂U = (∂J/∂U)ᵀ + (∂G/∂U)ᵀŪ    (4.72)

X̄ = ∂O/∂X = (∂J/∂X)ᵀ + (∂G/∂X)ᵀŪ    (4.73)

And the required derivative is then:

dJ/dθ = (dM(θ)/dθ)ᵀ X̄    (4.74)

To solve for the adjoint variables, a fixed point iteration is performed:

Ū^(n+1) = ∂O/∂U (U*, Ūⁿ, X*)    (4.75)

X̄^(n+1) = ∂O/∂X (U*, Ūⁿ, X*)    (4.76)

Following convergence of (4.75) and (4.76), the adjoint variables have been found and the derivative (4.74) can be computed. It can be shown that the adjoint fixed point iteration inherits the convergence properties of the forward (direct/primal) solution; therefore, if the flow solver converges, the adjoint solver will as well. The additional adjoint variables for the weights, X̄,
are required for the FIML-Embedded adjoint computation, but are dropped from the computation for the FIML-Classic approach, as well as for the FIML-Direct approach: in FIML-Direct the weights are not converging (there is no backpropagation), and therefore the derivative information of the forward propagation can be computed via autodifferentiation without the additional adjoint variables.

Autodifferentiation tools were used to compute the required derivatives in (4.75) and (4.76). It is possible to compute the derivatives by hand, but autodifferentiation greatly simplifies the process. Essentially, the code used in the primal solver to produce U* is recorded; for the derivative computation, the recording is replayed with every elementary operation replaced by its derivative. This dramatically simplifies the development process. SU2 makes use of the autodifferentiation tool CoDiPack [110]. Because of the use of autodifferentiation, the development effort was minimized. The primary effort in modifying SU2 to support the FIML approaches was to redefine the design variables, assemble and scale the features as required, and implement the neural network forward propagation and backpropagation algorithms. For FIML-Embedded, the additional modification of adding adjoint variables for the weights was required. For FIML-Direct, additional adjoint variables for the weights are not required because the weights are held fixed during the flow solution, and the required derivative information is automatically accounted for by the autodifferentiation algorithm.

4.8 Model Problem: The 1D Heat Equation

To demonstrate all three methods we consider the 1D heat equation given by Equation (4.77), as also presented by Parish and Duraisamy [39]:

d²T/dz² = ε(T)(T∞⁴ − T⁴) + h(T∞ − T),  z ∈ [0, 1],  h = 0.5    (4.77)

The emissivity, ε, is taken to be a nonlinear function of temperature:

ε(T) = [1 + 5 sin(3πT/200) + exp(0.02T) + N(0, 0.1²)] × 10⁻⁴    (4.78)

To model this process we use the following imperfect model. The imperfect model is missing the linear term, so there is a deficiency in the functional form of the model. It is, therefore, a good analogy to the RANS application, as the error in RANS models is not likely due to uncertainty in the closure constants, but rather to errors in the functional form of the models:

d²T/dz² = ε₀(T∞⁴ − T⁴),  z ∈ [0, 1],  ε₀ = 5 × 10⁻⁴    (4.79)

To correct the imperfect model through the FIML inversion process we multiply the right-hand side of (4.79) by the correction function β. For FIML-Classic the inverse process first solves for β(z) and then trains a neural network to determine β(T, T∞). FIML-Embedded and FIML-Direct solve for β(T, T∞) directly in the inverse procedure. For the 1D heat equation model problem we use the following objective functions:

FIML-Classic:  J_c(β) = (1/2) Σ_{i=1}^{N} (T_i(β) − T_truth,i)²    (4.80)

FIML-Embedded:  J_e(β_T) = (1/2) Σ_{i=1}^{N} (T_i(β_T) − T_truth,i)²    (4.81)

FIML-Direct:  J_d(w) = (1/2) Σ_{i=1}^{N} (T_i(w) − T_truth,i)²    (4.82)

Note that for this model problem there are essentially only two variables available as inputs for an augmented model. Therefore, the features for our augmented model are chosen to be η = {T, T∞} at each spatial point. In this form, β has an analytical solution that, if found in the inversion process, results in a fully corrected model:

β(T, T∞) = (1/ε₀)[1 + 5 sin(3πT/200) + exp(0.02T) + N(0, 0.1²)] × 10⁻⁴ + h(T∞ − T)/[ε₀(T∞⁴ − T⁴)]    (4.83)
Note that it is not guaranteed that the chosen machine learning algorithm can sufficiently replicate (4.83). For FIML-Classic the procedure is to first perform the inversion to find the optimal correction β(z) = arg min_β J_c, and then train a machine learned model to replicate the solution of the inversion. It is not always possible to sufficiently replicate β(z) with a machine learning algorithm (β(η)), which results in a degradation of the performance of the augmented model due to imperfect learning. FIML-Embedded and FIML-Direct have a substantial advantage in this regard, as the training is performed in the inversion step. This naturally restricts the solution of the inversion to what is learnable by the chosen algorithm, and by including the limitations of the algorithm in the inversion, the optimization procedure finds the optimal solution that can be learned by the machine learning algorithm.

The 1D heat equation model problem is considered for two important reasons. First, the 1D heat equation is substantially simpler than a RANS simulation. By using a single equation in a single dimension it is substantially easier to understand the algorithms presented. Second, it demonstrates the applicability of all three FIML approaches to other physics-based systems. FIML-Classic, FIML-Embedded, and FIML-Direct are applicable to many physics-based simulations, and demonstrating these approaches on the 1D heat equation demonstrates this flexibility.

4.8.1 Model Problem: FI-Classic Implementation

The 1D heat equation implementation is relatively straightforward. First the correction field β(z) is uniformly initialized to β(z) = 1.0. The correction field is then applied to the right-hand side of (4.79) to give (4.84):

d²T/dz² = β(z) ε₀ (T∞⁴ − T⁴),  z ∈ [0, 1],  ε₀ = 5 × 10⁻⁴    (4.84)

Equation (4.84) is then solved by finite differences. In this application, second-order central finite difference approximations were used for the second derivative, with one-sided second-order finite difference approximations on the boundaries. This gives the following discretization:

(1/Δz²) B T = β(z) ε₀ (T∞⁴ − T⁴)    (4.85)

where the matrix B is the discrete approximation of the second-order accurate spatial second derivative of T:

B =
[  2  −5   4  −1   0   ⋯   0 ]
[  1  −2   1   0   ⋯         ]
[  0   1  −2   1             ]
[       ⋱   ⋱   ⋱            ]
[           1  −2   1   0    ]
[       0   1  −2   1        ]
[  0   ⋯   0  −1   4  −5   2 ]    (4.86)

The system of equations is then solved by adding a pseudo-time and marching to the steady solution of (4.84). This results in the update equation (4.87). Dirichlet boundary conditions of T = 0 are enforced at both ends of the spatial domain.

T^(n+1) = T^(n) + ΔT,  ΔT = (Δt/Δz²) B T^(n) − Δt β(z) ε₀ (T∞⁴ − (T^(n))⁴)    (4.87)
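A minimal sketch of the marching scheme (4.87) is given below. The grid size, pseudo-time step, and convergence tolerance are illustrative choices; the interior rows of B reduce to the central stencil [1, −2, 1], and the Dirichlet end points are simply held at zero rather than using the one-sided boundary rows of (4.86).

```python
import numpy as np

def solve_model(beta, T_inf=50.0, nz=129, eps0=5e-4, max_iter=500_000):
    """Pseudo-time march of Eq. (4.87) to the steady solution of Eq. (4.84)."""
    dz = 1.0 / (nz - 1)
    dt = 0.4 * dz**2                     # explicit diffusion stability limit
    T = np.zeros(nz)                     # Dirichlet T = 0 at both ends
    for _ in range(max_iter):
        d2T = np.zeros(nz)
        d2T[1:-1] = (T[:-2] - 2.0 * T[1:-1] + T[2:]) / dz**2
        dT = dt * (d2T - beta * eps0 * (T_inf**4 - T**4))
        dT[0] = dT[-1] = 0.0             # hold the boundary values fixed
        T += dT
        if np.max(np.abs(dT)) < 1e-10:   # temperature has stopped changing
            break
    return T

T_base = solve_model(np.ones(129))       # baseline model: beta = 1 everywhere
```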
Here, using the linearized form of the equations to form the matrix A we have: A = BT ? 4??oT 3 (4.88) 109 Note that the temperature distribution (T ? u) has been found from the direct (primal) solver, so the adjoint solver requires a converged solution from the direct solver. For the model problem we use the objective function Jc given by (4.80) and therefore we have the following form for g: [ ]T ? ?Jg = ?(T? ? T ) (4.89) ?u The adjoint variables, v, are then solved similarly to the direct solution by solving for ATv = g. Finally, the required derivative is computed from Equation (4.90): dJ ?J dU ?J ?J dJ = + = gTu+ gTu ? vTf ? = vT ?0(T 4 4? ? T ) (4.90)d? ?U d? ?? ?? d? This gives the required gradient. For each design (each ?(z)) the direct solver produces the temperature distribution T (z) that satisfies the model problem and returns T (z) and the current objective function value Jc(?). The temperature dis- tribution is then passed to the adjoint solver which computes the adjoint variables, v that satisfy ATv = g and returns the gradient, dJ/d?. The optimizer then gener- ates a new ? distribution that the gradient based optimizer (such as BFGS) expects will produce a lower cost function and the process is repeated. In this application the BFGS implementation provided in the open source library SciPy was utilized [96, 87]. 110 4.8.2 Model Problem: FIML-Embedded For FIML-Embedded, the minimization for Je begins with the initialization of ?T (z) = 1.0. This is the ?training? correction, and is passed to the direct (primal) solver to evaluate Je. At the beginning of each new direct solve new weights, w, are initialized randomly. In other words, we train a new neural network every time Je is evaluated. To solve for the current temperature distribution first the current feature values are collected and scaled, for the 1D heat equation the features are ? = {T, T?}. The weights and the features are then used to solve for the current value of the correction, ?(?, w) using Equation (3.4). The backpropagation algorithm is then performed to update the weights to minimize the sum squared error between the current correction (?(?, w)) and the training correction ?T (z). Note that ?T (z) is constant every Je evaluation, however ?, w, and ? are updated every solver iteration. The temperatures are then updated using Equation (4.87). The process then repeats (gather and scale ?, update ?(?, w), update w, update T ) until the temperature distribution stops changing. In this fashion the temperature distribution converges with the weights. Following each Je evaluation a neural network has been trained. Figure 4.8 displays the algorithm for evaluating the current value of Je: Complex step differentiation was utilized for the gradient computation in the 1D heat equation FIML-Embedded implementation. This greatly simplifies the algorithm development and still yields machine accurate gradient information, albeit at a much greater computational cost over the adjoint approach. The complex step differentiation algorithm is, however, an embarrassingly parallel problem for each 111 Figure 4.8: Illustration of the iteration process for 1D Heat direct (primal) solver for FIML-Embedded. element of the gradient and therefore the process was parallelized to minimize the required run time. With the gradient computation complete, the design variables, ?T , are updated using the BFGS algorithm in order to minimize the cost function. 
As is likely apparent from this discussion, the FIML-Embedded approach has more complexity than the FIML-Classic approach, as well as the FIML-Direct method discussed in the next section.

4.8.3 Model Problem: FIML-Direct

The solution process for FIML-Direct is largely the same as for FIML-Classic, except that the design variables are the weights of the neural network. This modification requires only minor changes to the direct and adjoint solvers.

First the weights w are initialized randomly by the optimizer. This will initially give a very poor estimate of β, and the initial solution to Jd(w) will not be equivalent to the baseline model (β(w) ≠ 1.0 everywhere). In order to have the baseline solution for β be close to 1 everywhere for the initial solution, the neural network learns Δβ, so that β = 1 + Δβ, and the weights are initialized to small random values. This is equivalent to setting the output layer bias node to 1 for the first iteration; to simplify the notation, β will be considered the output of the neural network in this discussion. This initialization detail is not strictly necessary, but it helps avoid potential numerical solution issues with an initial design far from the baseline model. Initializing the weights to 0 everywhere is not possible, as this would also result in dJd/dw = 0 because the weights would have no influence on the value of β.

The initialized weights are given to the direct (primal) solver in order to evaluate Jd. The features are initialized and scaled, and the forward propagation algorithm is performed to find β(w, η). The temperatures are updated, and the process is repeated (gather and scale features, update β(w, η), update T) until the temperature distribution stops changing, at which point β and T have converged to their final values. Jd is computed and passed to the optimizer.

To find the gradient, the same adjoint numerical method discussed in the FIML-Classic section of this chapter is performed again with the converged value of β(w, η) found in the direct solver. This produces the gradient dJd/dβ. Then complex step differentiation is performed on the forward propagation algorithm to produce the gradient dβ/dw. The chain rule is then applied to find the required gradient for the optimizer, dJd/dw. The BFGS optimizer can then update the weights, and the next evaluation of Jd begins, until Jd has been sufficiently minimized.
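The gradient assembly just described can be sketched as follows. Here forward_prop(w, eta) is an assumed helper that evaluates the network output β for a flat (complex-valued) weight vector, and dJd_dbeta is the adjoint-computed gradient dJd/dβ. Each loop iteration is independent of the others, which is why the computation is embarrassingly parallel.

```python
import numpy as np

def dJd_dw(w, eta, dJd_dbeta, forward_prop, h=1e-30):
    # Assemble dJd/dw = (dJd/dbeta)^T (dbeta/dw), one weight at a time.
    grad = np.zeros(w.size)
    for k in range(w.size):
        wc = w.astype(complex)
        wc[k] += 1j * h                    # complex-step perturbation of weight k
        # The imaginary part recovers dbeta/dw_k to machine precision,
        # with no subtractive cancellation error.
        dbeta_dwk = forward_prop(wc, eta).imag / h
        grad[k] = dJd_dbeta @ dbeta_dwk    # chain rule for weight k
    return grad
```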
4.9 Summary

In this chapter the numerical methodology was presented. The first section focused on the equations governing fluid flow and the accommodations that must be made in order to efficiently solve turbulent problems at practical Reynolds numbers. The turbulence models utilized in this work to provide closure to the RANS equations were then presented.

The next sections provided greater detail on machine learning with neural networks for regression. Neural networks, only discussed in passing up until this chapter, were explained along with the algorithmic documentation of the method for training these networks. A section was devoted to feature selection and scaling, as this is a critical step that any modeler must consider in order to successfully construct a useful neural network.

Numerical details concerning the three FIML approaches used were then presented, along with the minimization algorithm used in all three approaches: the quasi-Newton BFGS method. This method is popular for high-dimensional minimization problems.

The next sections discussed methods of obtaining gradient information, including the complex step finite difference method and adjoint methods. The adjoint equations were presented, along with a detailed explanation of the discrete adjoint approach implemented and modified in the SU2 code.

Finally, the 1D heat equation model problem was presented. This simplified problem provides a simple modeling environment to test all three FIML approaches, and much of the numerical methodology presented in this chapter. It also demonstrates that the FIML approach can be applied to other physics-based simulations (other than RANS simulations). The next chapter begins to present the results of this methodology, beginning with the 1D heat equation.

Chapter 5: One-Dimensional Heat Equation Simulations

5.1 Overview

This chapter presents the results of all three FIML methods applied to the 1D heat equation model problem. Results for the FIML-Embedded and FIML-Direct methods are presented here for the first time. First, the results of the 1D heat equation using FIML-Classic are presented for a single case. The inversion is then performed on four cases and a neural network augmentation trained on the information resulting from the inversion. The augmentation is then tested on holdout data. Then the FIML-Embedded results are presented for a single case. This demonstration is promising, but as discussed, the methodology for considering multiple cases with FIML-Embedded has not yet been developed, so such results cannot be presented here. The results for FIML-Direct are presented for a single case, as well as for the simultaneous inversion of multiple cases. The results from this model problem are very encouraging and are discussed. Finally, the FIML-Direct procedure is repeated using the stochastic gradient descent algorithm, as an example of an algorithm that may be more favorable for considering large numbers of cases in certain computing environments.

5.2 FIML-Classic

First, the truth distribution (4.77) is sampled, which gives the truth temperature distribution to which we compare our imperfect model given by (4.79). Then the minimization procedure is performed using gradient-based methods to minimize (4.80) with respect to β. In other words, the minimization finds the spatially distributed β(z) = arg min_β(Jc). This completes the field inversion process (FI-Classic; note the drop of the "ML" from FIML — this notation denotes cases where the inversion has been performed to generate the model-consistent information, but the information has not yet been learned by a neural network, i.e., offline training has not yet been performed). An example of the inversion results is shown in Figure 5.1.

Figure 5.1: FI-Classic results for T∞ = 50.

As shown in Figure 5.1, the uncorrected model shows significant error. This error is almost completely eliminated following the inversion (right panel of Figure 5.1). The residual error is insignificant.

Figure 5.2: FI-Classic results displaying the computed β(z) for T∞ = 50.

However, as shown in Figure 5.2, the inversion did not exactly reproduce the β distribution of the analytical solution. This is because there are portions of the solution for this T∞ where the model performs well and is relatively insensitive to β. This residual error is not eliminated by additional optimizer iterations. Figure 5.3 shows the inversion history of the objective (4.80), Jc.
The gradient-based optimization (BFGS) substantially reduced the value of Jc, indicating that the optimization was successful in minimizing the error of the model. As with all gradient-based optimization procedures, there is no guarantee that the global optimum has been found. In this case, as there is an analytical solution for the optimum (4.83), we know that the final solution is very close to, but not identical to, the global (analytic) optimum.

Figure 5.3: Objective function convergence history of Jc for T∞ = 50.

Also note that the inversion initially converges very quickly, in the first 20 iterations. For this simulation it is relatively inexpensive to evaluate Jc and dJc/dβ; however, this convergence behavior could be very beneficial for more expensive applications such as RANS turbulence modeling, as substantial improvement can be made with few optimizer iterations.

At this point only the field inversion process has been completed. The inversion is repeated for a number of cases (several T∞) in order to generate training data for the machine learning algorithm. In this case we choose T∞ = {20.0, 30.0, 40.0, 50.0} as the training set. The inversion is performed for each of these cases and the results saved.

From this training set we wish to create a machine learning model that can predict β without performing an inversion. For this application we use fully connected feed-forward neural networks for regression. In this application there are only three variables (T, T∞, z). The inversion created a spatially correlated correction distribution β(z), and we now want to create a function β(T, T∞). T and T∞ are the "features" (η) that are the inputs to the neural network. The number of layers in the neural network and the number of nodes are free parameters that must be chosen by the modeler. In this application a single layer with 20 nodes was used. The hidden layer used a hyperbolic tangent activation function (y = tanh(x)) and, as is typical for regression networks, the output node used a linear activation function (y = x). The neural network was trained using the Scikit-Learn software [92]. The limited memory BFGS (L-BFGS-B) algorithm [88] was used to minimize the loss function with respect to the weights of the neural network. 20% of the training data points were chosen at random and used as holdout points in order to test the algorithm and guard against overfitting/overtraining, where the trained neural network performs very well on the training set but exhibits poor generalization to cases other than the training data (unseen data).
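The offline training step just described can be sketched with Scikit-Learn. Here X (the features gathered from the four inversions) and y (the corresponding β values) are assumed to be assembled already, and the hyperparameters mirror those stated above.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# 20% of the points are held out to guard against overtraining.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

net = MLPRegressor(hidden_layer_sizes=(20,),   # single hidden layer, 20 nodes
                   activation="tanh",          # y = tanh(x) hidden activation
                   solver="lbfgs",             # limited-memory BFGS training
                   max_iter=5000)
net.fit(X_train, y_train)
print("holdout R^2:", net.score(X_test, y_test))
```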
The training results are shown in Figure 5.4. Note that for this neural network structure and choice of activations, some significant error remains following training. Though it is possible another neural network could perform better, there will always be residual error following learning. Note that this is error that was not considered during the inversion; naturally, this will result in reduced performance of the augmented model due to imperfect training.

Figure 5.4: FIML-Classic neural network error following training.

The truth temperature distribution, baseline model, and neural-network-augmented model results for each temperature in the training set are shown in Figure 5.5. The error is almost completely eliminated for the temperatures in the training set, indicating that the errors observed due to imperfect training (Figure 5.4) were not significant enough to produce significant errors in the augmented model. This is not true in general, as will be demonstrated in the FIML-Classic application to the S809 airfoil in Chapter 6.

To further explore the generalization capabilities, the augmented model is tested for conditions that were not included in the training set (conditions for which an inversion was not performed). Additionally, the augmented model was tested on a variable T∞ case, while the neural network was trained using only uniform T∞ cases. The performance of the augmented model for these unseen conditions is shown in Figure 5.6. The correction β(T, T∞) produced by the neural network augmentation is shown in Figure 5.7.

Figure 5.5: FIML-Classic results of model augmentation for training temperatures.
Figure 5.6: FIML-Classic results of model augmentation for holdout temperatures.
Figure 5.7: FIML-Classic neural network correction for holdout conditions.

As shown in Figure 5.6, there is some diminished performance for the holdout conditions compared to the training conditions. However, the error is substantially reduced in the augmented model compared to the baseline, so the augmentation is certainly improving the model in all cases. The augmentation improved the prediction despite some extrapolation, which is illustrated in Figure 5.8. As shown, the holdout cases test the generalization of the network not only for interpolated cases but also in some extrapolation conditions, demonstrating the robustness of the augmented model resulting from the FIML-Classic process.

Figure 5.8: Features for FIML-Classic training and holdout cases.

5.3 FIML-Embedded

The FIML-Embedded procedure was also tested on the 1D heat equation model problem. Again, the truth distribution was sampled 100 times by solving the full "true" heat equation with the full right hand side (4.77). The weights of the neural network are initialized randomly at the start of the model (primal) solver and then updated via backpropagation. The weights converge to their final values along with the temperature distribution. For the initial design the neural network is learning a uniform distribution (β_T = 1.0), and the initial temperature distribution will closely match that of the baseline model (4.79). The gradient is computed via the complex step finite difference method, which yields a machine-accurate estimate of the gradient but is far less computationally efficient in this application than adjoint methods, because it is a forward differentiation approach. This shortcoming is mitigated in this implementation by parallelizing the gradient computation. The gradient computation is embarrassingly parallel, so this implementation is trivial and performs adequately well for a limited number of design variables. For FIML-Embedded a maximum of 60 nodes (equivalent to the number of design variables) was used. For RANS applications, adjoint methods were used to compute the FIML-Embedded gradient, due to the much larger number of design variables in that application.

The BFGS method was again used to perform the minimization of the cost function (4.81). The temperature distributions of the baseline model and the augmented model are shown in Figure 5.9. Note that unlike FIML-Classic, the neural network training is performed in the inversion; therefore, following inversion, the model has already been augmented.

The error following inversion is slightly larger than that shown for the FI-Classic inversion (Figure 5.1).
This is expected, because the FIML-Embedded approach accounts for the limitations of the chosen neural network in the inversion process. Therefore, it is guaranteed that the resulting inversion from the FIML-Embedded procedure can be learned. In other words, the inversion will not accept a solution that the backpropagation algorithm cannot learn.

FIML-Embedded results in a correction field, β(T, T∞), that does not match the true distribution (4.83) as closely as the FI-Classic inverse solution. This is due to the increased regularization of the FIML-Embedded approach.

Figure 5.9: FIML-Embedded results of model augmentation.

The correction field (the output of the neural network augmentation) is shown in Figure 5.10. Additionally, the history of the BFGS minimization of (4.81) is shown in Figure 5.11. This shows strong convergence and a substantial improvement over only a handful of iterations. By training the neural network inside the iterative solver, we can train the neural network with only slightly increased computational effort in the model solver, and we are still not required to perform excessive minimization iterations.

Note that unlike FIML-Classic and FIML-Direct, it is not readily apparent how to incorporate data from multiple inversion cases (multiple T∞) into a single augmented model. Additionally, FIML-Embedded relies on the sufficient convergence of two minimization problems: min_w(SSE) and min_{β_T}(Je). The requirement to sufficiently converge the error term (SSE) by backpropagation inside the solver while the features are converging is sometimes difficult, and it adds substantial complexity to the evaluation of the current design (current Je). These are both drawbacks of the FIML-Embedded approach. However, with FIML-Classic there is no guarantee that the chosen machine learning algorithm can learn the inverse solution; with FIML-Embedded the inverse solution is already learned, and there is no need for offline training. Additionally, FIML-Embedded utilizes the backpropagation algorithm to update the weights. Backpropagation provides a very efficient weight update and enables the training of very large neural networks; depending on the application, neural networks with multiple layers can easily have weights numbering in the hundreds. FIML-Embedded, like FIML-Classic, can efficiently train these large networks. This may not be as straightforward when training the weights directly, as discussed in the following section (FIML-Direct).

Figure 5.10: FIML-Embedded correction output from model augmentation.
Figure 5.11: FIML-Embedded objective function minimization history.

5.4 FIML-Direct

The FIML-Direct algorithm was also demonstrated using the model problem. Again, the truth model was sampled 100 times to generate the truth distribution (k_d). The weights (the FIML-Direct design variables) are initialized randomly. The model problem is solved by computing the current β(w, T, T∞) and applying it to the right hand side of the model equation. The cost function Jd is evaluated by Equation (3.7). The gradient for FIML-Direct (dJd/dw) is evaluated through adjoint methods to compute dJd/dβ, and dβ/dw is computed for each weight via complex step differentiation. The weights are then updated via the BFGS method in order to minimize the cost function Jd. Once the inversion is complete and a minimum Jd is found, the augmentation has been generated and is tested on holdout conditions.
The weights are then held constant, and the correction is a function only of the flow features, β(T, T∞). Figure 5.12 shows the results for FIML-Direct using a uniform T∞ = 50.0. For this application, a single hidden layer of five neurons with hyperbolic tangent activation functions was used.

Figure 5.12: FIML-Direct results of model augmentation for a single training temperature.

The optimal correction, β(T, T∞), is shown in Figure 5.13. The derived correction shows remarkable agreement with the true correction distribution for this temperature and neural network configuration. This agreement is not expected in a general application, because the limitations of the neural network are being considered in the inversion and will limit the convergence of the neural network output to the true distribution in some cases. Despite this, the excellent convergence is very promising, especially considering that neural networks are typically trained using large quantities of training data and thousands of weight updates. As shown in Figure 5.14, the objective function converged several orders of magnitude in a few iterations.

Figure 5.13: FIML-Direct correction for a single training temperature.
Figure 5.14: FIML-Direct objective function convergence for a single training temperature.

A major advantage of FIML-Direct over the FIML-Classic and FIML-Embedded approaches is that the design variables in the inversion minimization problem are consistent across all cases — in this application, across T∞ distributions. Therefore the inversion can be performed on numerous cases simultaneously, simply by creating a composite objective function that is the sum of the cost functions (Jd) from each case. The gradient is similarly just the sum of each case's gradient. This is demonstrated for the model problem, with results shown in Figure 5.15. Note that the augmentation is very successful, with very little residual error for the training temperatures. This is confirmed by the excellent convergence of Jd shown in Figure 5.16.

Figure 5.15: FIML-Direct results of model augmentation for four training temperatures.
Figure 5.16: FIML-Direct objective function (Jd) convergence using BFGS minimization for four training temperatures.

Following the inversion/augmentation on four training temperatures, the augmentation was tested on holdout (unseen) conditions, including a variable T∞ distribution. As shown in Figure 5.18, the augmentation performs extremely well on the holdout conditions. This shows that the neural network is exhibiting good generalization capability — or, in machine learning terminology, the neural network is not overtrained. Despite only training on four uniform T∞(z) distributions, the augmentation is able to substantially improve the prediction on a variable T∞(z) holdout case. Additionally, some extrapolation is possible: the highest temperature in the training set was T∞(z) = 50, but the error is still dramatically reduced when the augmentation is tested on the holdout condition of T∞(z) = 55, and similarly on the variable T∞ distribution that reaches T∞ = 60 as z approaches 1.0.

The feature space is illustrated in Figure 5.20 and shows that the model was tested for both interpolations and modest extrapolation from the training set.
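A minimal sketch of the composite objective just described, assuming a helper solve_case(w, T_inf_dist) that performs one direct and one adjoint solve and returns that case's (Jd, dJd/dw), and an initial small random weight vector w0.

```python
import numpy as np
from scipy.optimize import minimize

def composite(w, cases):
    # Composite objective: sum of per-case cost functions and gradients.
    J, grad = 0.0, np.zeros_like(w)
    for T_inf_dist in cases:                   # one direct + one adjoint per case
        Ji, gi = solve_case(w, T_inf_dist)     # assumed per-case solver interface
        J, grad = J + Ji, grad + gi
    return J, grad

cases = [np.full(n, c) for c in (20.0, 30.0, 40.0, 50.0)]   # training T_inf
res = minimize(composite, w0, args=(cases,), jac=True, method="BFGS")
```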
Also note that excessive training conditions were not required to cover the feature space. This is a promising observation, as we seek augmentations that apply to a broad variety of applications, and the results suggest that excessive sampling may not be required to cover the feature space of those applications in order to provide a substantial improvement to the performance of the augmented model.

Figure 5.17: FIML-Direct correction, β(T, T∞), for training using four temperatures.
Figure 5.18: FIML-Direct results of model augmentation for four unseen T∞(z) distributions.
Figure 5.19: FIML-Direct correction, β(T, T∞), for holdout T∞ distributions.
Figure 5.20: FIML-Direct feature space depiction for training and holdout T∞(z) distributions.

The 1D heat equation was also tested using the FIML-Direct stochastic gradient descent approach. This method may be advantageous for performing the training/inversion on numerous cases when limited computational resources are available, as the gradient computation is substantially more efficient. For four training T∞ distributions, 8 simulations are required to compute the full gradient via adjoint methods (4 direct + 4 adjoint simulations). For the stochastic gradient descent demonstration on the 1D heat equation, only a single case was considered as the batch for the gradient computation. Therefore each gradient computation to update the weights costs two simulations (1 direct + 1 adjoint). Note that without the full gradient, advanced gradient descent methods (like BFGS) should not be used for the direction finding, so a small step in the steepest descent direction is taken after each gradient evaluation for the stochastic descent method. It is expected, therefore, that many more steps will be required for the stochastic implementation to achieve an equivalent drop in the objective function compared to the full gradient approach; however, each step is far more efficient. Additionally, the computation of the full gradient is an embarrassingly parallel problem (since the computation of the partial gradient for each case does not require any information from the other cases), so given enough computational resources there should be no need to avoid the computation of the full gradient. The results of the model augmentation for the training temperatures after using stochastic gradient descent for the inversion/training are shown in Figure 5.21.

Figure 5.21: FIML-Direct stochastic gradient descent results for training temperatures.
Figure 5.22: FIML-Direct stochastic gradient descent results for holdout T∞(z) distributions.

Similar to the FIML-Direct results when using the full gradient, the model augmentation resulting from the partial gradient shows substantial improvement on the holdout cases (Figure 5.22). As expected, the convergence rate when using the full gradient is substantially better than when using only the partial gradient, as shown in Figure 5.23. However, each evaluation of the SGD method is roughly 4 times more efficient than using the full gradient for this application.
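The stochastic variant can be sketched as follows, reusing the assumed solve_case interface from the previous sketch; the step size and step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd(w, cases, n_steps=500, step=1e-2):
    for _ in range(n_steps):
        T_inf_dist = cases[rng.integers(len(cases))]   # batch of a single case
        _, g = solve_case(w, T_inf_dist)               # 1 direct + 1 adjoint solve
        w = w - step * g                               # small steepest-descent step
    return w
```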
5.4.1 Inversion Convergence Comparison

The convergence histories for the cases discussed for all three methods are shown in Figure 5.24. First note that, because of the imperfect convergence of the backpropagation algorithm for FIML-Embedded and the random initialization of the weights for FIML-Direct, the objective function values (Jc, Je, Jd) are not equivalent on the first evaluation. The objective function values for the multi-temperature inversions are normalized by the number of T∞ distributions for comparison to the single-temperature cases.

Figure 5.23: FIML-Direct stochastic gradient descent objective function (Jd) convergence history.

Note that all methods show strong convergence of the objective function (a drop of at least a few orders of magnitude). The FIML-Classic convergence of Jc is by far the best of all the methods tested. This is expected, as the FI-Classic approach does not account for the limitations of the chosen machine learning algorithm in the inversion process. The FIML-Embedded convergence indicates a successful augmentation, but is the worst of the approaches compared. This is likely due to the imperfect convergence of the backpropagation algorithm, which limits the achievable accuracy. As noted previously, the robustness of the FIML-Embedded approach remains an issue, and the added complexity of the FIML-Embedded procedure is a drawback of this algorithm. Most remarkable is the convergence of the FIML-Direct cases. For a single temperature the FIML-Direct algorithm converges roughly six orders of magnitude; this is excellent convergence considering the algorithm is training the neural network in the inversion process. As more cases are considered, the convergence is somewhat adversely affected when using the FIML-Direct approach, but the objective functions still show a drop of over five orders of magnitude. Also note that all methods show strong convergence early in the inversion history, indicating that successful augmentations could still be obtained if computational cost considerations preclude hundreds of inversion iterations. The FIML-Direct stochastic convergence is by far the worst on a per-iteration basis, but because each evaluation is efficient it is possible that the stochastic approach would be advantageous when considering more cases than considered here.

Figure 5.24: Convergence history of the objective functions for all three methods.

5.5 Summary

Results were presented for all three FIML methods on the 1D heat equation model problem. It was shown that all three methods can generate augmentations that correct the model for the training temperatures. The FIML-Classic and FIML-Direct methods were able to almost perfectly recover the truth β distribution for a single training temperature, while the FIML-Embedded approach dramatically reduced the model error but did not converge to the true distribution.

The FIML-Classic and FIML-Direct methods were also demonstrated using four T∞(z) distributions, and it was shown that it was not possible to achieve perfect offline training for the FIML-Classic approach. This results in a lack of consistency between the inversion results and the model augmentation that can diminish performance. For this simple implementation, however, nearly perfect model augmentations were demonstrated for the training temperatures for both FIML-Classic and FIML-Direct.

It was demonstrated that the FIML-Direct method can efficiently perform the inversion on multiple cases simultaneously. This is a somewhat surprising result, as FIML-Direct trains a neural network in an unconventional way. FIML-Direct does not train a neural network by the typical approach of acquiring large quantities of training data and utilizing the backpropagation algorithm (or another approach) to train the network to reproduce the training set. FIML-Direct operates on the weights directly by computing the gradient of the cost function (Jd) with respect to the weights.
In this way there really is no training data in the classical sense (a large quantity of input/output pairs). The algorithm simply finds the weights that cause the augmentation to minimize the cost function. The only data being utilized is the truth distributions, while the output of the neural network is β and the inputs are the features, {T, T∞}. For FIML-Classic (and typical neural network training) we must assemble a representative dataset of inputs and outputs, but FIML-Direct completely avoids this sometimes cumbersome step. It was anticipated, but not observed, that the nonlinearity of the neural network might complicate the training and give poor inversion results. For the model problem this was not an issue, as the FIML-Direct approach required a comparable number of iterations to perform the inversion and augmentation as the FI-Classic inversion.

In summary, the 1D heat equation provides a simple environment to evaluate and compare the FIML approaches. Of the two new methods presented in this thesis (FIML-Embedded and FIML-Direct), the FIML-Direct approach shows the most promising results for the model problem. The simultaneous inversion and training of multiple cases provides an exciting novel approach to train neural networks for physics-based simulations. As demonstrated by the 1D heat equation cases considered in this chapter, the FIML-Direct augmentation shows comparable or better accuracy than the FIML-Classic augmentation without the need for offline training. By including the limitations of the neural network in the inversion, the modeler guarantees consistency between the inversion environment and the resulting augmentation.

Chapter 6: FIML Results for RANS Applications

6.1 Overview

This chapter presents results for all three FIML approaches on several historically challenging RANS applications. Results for the FIML-Embedded and FIML-Direct approaches for RANS applications are presented here for the first time. The FIML-Classic, FIML-Embedded, and FIML-Direct methods were implemented in a modified version of the Stanford University Unstructured code (SU2) [107] for the first time. This software suite is dramatically more complex than the 1D heat equation simulation in the previous chapter. This demonstrates the robustness and flexibility of the FIML approach; however, it also complicates the discussion and conclusions, as the physical problem and numerical methods involved are substantially more complex.

In the first section, results for simulations concerning the S809 airfoil are presented, and all three methods are demonstrated. Of particular interest are the FIML-Direct results for this application, which demonstrate the capability of this new approach to perform the simultaneous inversion of multiple cases. A new augmentation is constructed from this inversion and demonstrated on unseen data: another airfoil, the S814.

To demonstrate that the methods can also improve predictions for compressible cases, all three methods are demonstrated on a transonic airfoil with a shock wave turbulent boundary layer interaction (SWTBLI). Another historically difficult problem is the NASA Langley wall-mounted hump [111]. These applications are presented in this chapter to demonstrate performance on a variety of problematic RANS applications.
As will be shown in this chapter, the FIML approach demonstrates promising performance on all of these cases, with the 2D airfoil results being the most promising, as these results are perhaps the most developed of all the applications presented. This chapter only presents results for cases where one of the new methods (FIML-Embedded and FIML-Direct) was applied. Additional cases performed with only the FI-Classic approach are presented in Appendices A and B, which show results for a hypersonic wedge SWTBLI application and a 3D Onera M6 wing.

6.2 S809 Airfoil

The S809 airfoil is a thick airfoil designed for wind turbine applications. As discussed in Chapter 1, RANS predictions on 2D airfoils at a high angle of attack often suffer from an overprediction in eddy viscosity. This results in a boundary layer more resistant to separation than observed in experiments, leading to an overprediction in lift coefficient at angles of attack sufficient to cause separation on the suction surface [15, 3]. In general the SA model overpredicts eddy viscosity in adverse pressure gradients, as noted by Spalart and Allmaras [17], and regions where the SA model is anticipated to perform poorly are illustrated in Figure 1.3.

The S809 airfoil was considered for all three FIML approaches. The data chosen to improve the baseline model was the experimental lift coefficient [16] (k_d = C_L,exp) for all cases. In the following sections, the results of the augmentation for a single angle of attack (single case) are first presented for each method, along with a discussion of the features selected for the neural network augmentation. Finally, results for the simultaneous inversion of multiple cases using the FIML-Direct algorithm are presented. This augmentation, trained using only S809 data, is then tested on another 2D airfoil, the S814.

6.2.1 FI-Classic

The classic-method field inversion, "FI-Classic," was performed on the S809 airfoil at α = 14.2° angle of attack. The baseline SA-BC model was run for a number of angles of attack and compared to the experiment. Note that the SA-BC transition model was used for all of the S809 airfoil cases, as the natural-transition experimental data was used. As shown in Figure 1.2, the lift coefficient is substantially over-predicted by the baseline SA-BC model, indicating that there is room for improvement by incorporating the information from the data into an augmented model. The objective function for the FI-Classic inversion is given by (6.1), where n is the number of points in the computational domain:

$$J_c(\beta) = \frac{1}{2}\left(C_{L,exp} - C_L\right)^2 + \lambda \sum_{i=1}^{n}\left(\beta_i - 1.0\right)^2 \qquad (6.1)$$

For this angle of attack, C_L,exp = 1.0546 and λ = 1.0×10⁻⁴. Recall that the regularization constant λ represents the confidence in the baseline solution relative to the data, such that higher λ indicates more confidence in the model. In this case λ was chosen to be similar to that explored by Singh [42]. To show the effect of the regularization constant, values of the objective function without the regularization term (λ = 0) will also be shown, denoted J′c, such that J′c ≤ Jc.

The baseline SA-BC model solution (no correction, β = 1.0 everywhere) is shown in Figure 6.1. There is a large separated region at this angle of attack, indicated by the recirculating streamlines. This solution corresponds to the first evaluation of Jc in the minimization.
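A small sketch of the regularized objective (6.1) and its unregularized companion J′c used in the convergence plots; get_CL is an assumed interface to a converged direct RANS solve for the current β field.

```python
import numpy as np

CL_exp, lam = 1.0546, 1.0e-4   # experimental lift and regularization constant

def objective(beta):
    CL = get_CL(beta)                              # assumed: converged direct solve
    J_prime = 0.5 * (CL_exp - CL)**2               # J'_c: data misfit alone
    J = J_prime + lam * np.sum((beta - 1.0)**2)    # J_c of Eq. (6.1)
    return J, J_prime
```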
Because there is a β at every node of the domain, the gradients can be easily visualized. The gradients found by the adjoint solver for the first evaluation (β = 1.0 everywhere) are shown in Figure 6.2. Note that most of the design variables are inactive, meaning that their gradients are equal to 0; they do not affect the experimental lift coefficient and therefore will not be modified by the optimization algorithm during the minimization. The remaining gradients are almost universally positive, meaning that the downhill direction is to decrease production on the upper surface of the airfoil, as expected.

The convergence history of the FI-Classic algorithm for the S809 airfoil is shown in Figure 6.3. Each "evaluation" represents a solution for a chosen β(x) correction field. A direct (primal) and adjoint solution is found for each evaluation to obtain Jc(β) and dJc/dβ. Note that the convergence for this case is very good, with a substantial decrease in the lift coefficient error in a few iterations. The right side of Figure 6.3 shows good convergence and also illustrates the purpose of the regularization constant: J′c shows a three-order-of-magnitude drop in the objective function at evaluation 3, but Jc is much higher there, indicating that the correction at evaluation 3 is excessively large. The minimum for Jc occurs at evaluation 11, which is a much less intrusive correction that still substantially corrects the lift coefficient.

Figure 6.1: Baseline SA-BC streamlines and Cp contours for the S809 airfoil.
Figure 6.2: Initial gradient for FI-Classic.
Figure 6.3: Convergence history of the S809 airfoil for FI-Classic.

To further evaluate the inversion convergence history, the L2 norm of the gradient, ‖∇Jc‖₂, is shown in Figure 6.5. Note that the magnitude of the gradient is substantially reduced throughout the inversion, exhibiting the expected behavior for convergence to a local minimum.

The optimal correction field, found at evaluation 11, is shown at the top of Figure 6.11. Note that this correction is rather substantial, and confined to a limited region in space. The sudden changes in β will complicate the determination of a function to reconstruct it from local flow features. At this point the inversion is complete, but no learning has occurred. There remains the process of taking the spatially correlated correction field and learning a function to reproduce it: β(x) → β(η).

Figure 6.4: Optimal correction field found by FI-Classic.
Figure 6.5: Magnitude of the gradient vector of the objective function during the inversion for FI-Classic.

6.2.2 Feature Selection and Scaling

The methodology used to examine potential features, η, is discussed in this section. In order to successfully train a neural network that can reproduce the correction field, the modeler must select appropriate features that exhibit a strong functional relationship to the output. Ideally these features will be local and easily computed in the flow solver, so that they are available to the model augmentation. To start, the features were visualized with scatter plots and correlation coefficients. A large set of features already used in the SA turbulence model was examined. Intuitively, these are quantities that have already been identified by previous researchers as important to accurately model the eddy viscosity, and therefore it is likely that a subset of these features will be adequate to produce a function to predict the correction to the production term. Figures 6.6 and 6.7 show the scatterplot and correlation coefficient plot of a large subset of the features considered during this effort.
P/D is the (dimensionless) ratio of the production to destruction terms of the SA model. S̃ is the SA model variable. |S|/|Ω| is the ratio of the strain to vorticity magnitudes. ζ is the ratio of the local turbulent shear stress to the shear stress at the wall, which was also used by Medida [3] in his PhD thesis to create an adverse pressure gradient correction to the SA model; it is given by (6.2). fw is the SA wall function variable. The features have been scaled to unit mean and standard deviation. First note that some of the considered features have extreme outliers. As noted previously, this can complicate neural network training. Additionally, the correlation coefficients are low (with the possible exception of ζ), indicating that the features do not show a strong linear correlation to the correction.

Figure 6.6: Scatterplot pairs for some considered features.
Figure 6.7: Correlation coefficient pairs for some considered features.

$$\zeta = \frac{\mu_t\,|S_{ij}|}{1.5\,\tau_w} \qquad (6.2)$$

A number of scaling operations were considered in order to account for large outliers. The most successful of these is shown in Figures 6.8 and 6.9. In this scaling the features have been mapped to a normal distribution. Note that the outliers have been dramatically reduced, and the correlation coefficients have also increased substantially. In practice, mapping the features to a normal distribution produced far better augmentations than scaling to unit mean and standard deviation, due to the more robust treatment of outliers. Also note that there is a strong relationship between fw and P/D, suggesting that perhaps both features do not need to be included in the feature set.

Figure 6.8: Scatterplot pairs for some considered features mapped to a normal distribution.
Figure 6.9: Correlation coefficient pairs for some considered features mapped to a normal distribution.

The features were also ranked in order of importance in two ways. First, a random forest regressor was trained on the features. Random forests naturally produce a feature ranking as part of the learning process. This may give some indication of which features show the strongest functional relationship to the output, but since in this application we are using neural networks, it is perhaps not the most appropriate approach. The feature ranking from the random forest regression training, in order of most to least important, was: 1. ζ, 2. fw, 3. χ, 4. |S|/|Ω|, 5. S̃, 6. P/D.

Next, the sequential back selection (SBS) algorithm was performed on the potential features. This algorithm also ranks features, but in this case the proper learning algorithm (neural networks) is used. Note, however, that the results are dependent on the chosen neural network structure, feature scaling, and specific training algorithm, so again conclusions are difficult. The feature ranking for the sequential back selection algorithm was: 1. fw, 2. P/D, 3. χ, 4. |S|/|Ω|, 5. ζ, 6. S̃.
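Both preprocessing steps described above have standard Scikit-Learn analogues. In this sketch, X is assumed to hold the candidate features column-wise (with names in feature_names) and y the inversion correction; the quantile transformer performs the normal-distribution mapping, and the random-forest importances give the first of the two rankings.

```python
from sklearn.preprocessing import QuantileTransformer
from sklearn.ensemble import RandomForestRegressor

# Map each feature to a normal distribution to tame the extreme outliers.
qt = QuantileTransformer(output_distribution="normal")
X_normal = qt.fit_transform(X)

# Random-forest importances give an importance-ordered feature ranking.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_normal, y)
ranking = sorted(zip(feature_names, forest.feature_importances_),
                 key=lambda pair: -pair[1])
for name, importance in ranking:
    print(f"{name:8s} {importance:.3f}")
```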
Additionally, the R² scores for each subset of features were plotted. This can help determine the optimal number of inputs for the neural network: networks with too many features (features that do not provide additional useful information) suffer from overtraining that can diminish performance. The scores for each subset of features are shown in Figure 6.10. Note that the network with all 6 features shows a worse score than the 5-feature subset (P/D removed). This is likely because fw and P/D are highly correlated (Figure 6.9), and therefore only one is needed in the feature set. Also note that the R² score only starts decreasing for the 2- or 3-feature subsets, indicating that perhaps 3-4 features are an appropriate number of inputs for this application, feature selection, training algorithm, and neural network structure.

Definitive conclusions concerning feature selection for machine learning are difficult to make, because the optimal features are highly dependent on choices made by the modeler. Ultimately, four features were selected, given by Equation (6.3):

$$\eta = \left[\,\frac{P}{D},\;\; \zeta,\;\; \frac{|S|}{|\Omega|},\;\; \chi\,\right] \qquad (6.3)$$

Figure 6.10: R² scores for each subset of features during sequential back selection.

6.2.3 FIML-Classic

The four features selected in the feature selection section were used to train a neural network augmentation to the SA-BC model. This completes the FIML-Classic augmentation and creates a model that could be used to make predictions for additional (unseen) cases, without the need for the expensive inversion process (and the corresponding need for additional data).

The SciKit-Learn package was used to train the neural network. A rigorous hyperparameter search was not performed; however, a 3-hidden-layer by 20-node network was selected. The network was trained via the L-BFGS-B algorithm, and the resulting output of the neural network is shown in Figure 6.11.

Figure 6.11: Comparison between the correction found in the inversion and the output of the neural network following learning.

Clearly, the residual training error shown in Figure 6.11 will have a negative effect on the accuracy of the resulting augmentation. While another neural network topology or another machine learning algorithm could likely perform better than shown [13], no learning algorithm will have perfect training. Therefore, for FIML-Classic there will always be some level of inconsistency between the inversion and the prediction environment. For this case, the penalty due to the inconsistent augmentation is shown in Table 6.1. The penalty in this case is an error in the lift coefficient that is 15% greater than that of the inverse solution; thus, the imperfect neural network training increases the error by 15%. Fortunately, for this case the residual error is still quite low.

                CL        Error
Target          1.0546    -
Baseline        1.15826   0.10366
Inverse         1.06726   0.01266
Augmentation    1.06914   0.01454

Table 6.1: Comparison of target (k_d), inverse, and augmented model lift coefficients.

The pressure distributions from the experiment, baseline model, and FIML-Classic augmentation are shown in Figure 6.12. As shown, the improvement in prediction is due to better prediction of the separation location, consistent with the known deficiencies in the RANS models discussed previously. Note that for the FIML-Classic augmentation the only higher fidelity data used was the experimental lift coefficient at 14.2° angle of attack: a single number. Despite this limited data set, the augmentation has corrected the pressure distribution, indicating that the augmentation is making a physically relevant prediction.

Figure 6.12: Pressure distribution plot showing improvement in prediction for the FIML-Classic augmentation.

6.2.4 FIML-Embedded

The FIML-Embedded approach was performed on the same application (S809 at 14.2° AoA). The same features were used; however, in this application a 2-hidden-layer by 20-node neural network was used. A two-hidden-layer network proved sufficient to minimize the loss function in the flow solver. Again the hyperbolic tangent activation function was used.
The features were again mapped to a normal distribution; however, the method implemented in SU2 to accomplish the mapping is different from that implemented in the SciKit-Learn package [92]. The SU2 implementation computes the estimated distribution function of each feature by binning, then applies the inverse cumulative distribution function of a normal distribution, as described in Chapter 4. The objective function for the FIML-Embedded inversion is given by (6.4):

$$J_e(\beta_T) = \frac{1}{2}\left(C_{L,exp} - C_L\right)^2 + \lambda \sum_{i=1}^{n}\left(\beta_{T,i} - 1.0\right)^2 \qquad (6.4)$$

The minimization history of the cost function, min_{β_T} Je(β_T), is shown in Figure 6.13. Note that the lift coefficient error has been substantially reduced, as shown on the left-hand side of the figure. On the right, the regularized cost function Je has been substantially reduced. The lowest value of Je was obtained at evaluation 6.

Figure 6.13: Convergence history of the S809 airfoil field inversion.

As noted previously, a drawback of the FIML-Embedded approach is the additional complexity required to perform the simultaneous minimization associated with the backpropagation algorithm (minimizing the SSE loss function) inside the flow solver, and the minimization of Je. In practice, this sometimes resulted in poor convergence inside the flow solver. Improving the robustness of the FIML-Embedded implementation remains an area for future work, especially for RANS applications. For this case, the robustness issue prevented further convergence of the minimization algorithm, and the inversion terminated prematurely. This almost certainly limits the achievable performance of this FIML-Embedded case, so direct comparison to the other FIML algorithms is not appropriate until the robustness issues are resolved. Nevertheless, Figures 6.14 and 6.15 show the lift convergence and SSE loss function convergence histories for evaluation 6, indicating that the flow solver and backpropagation algorithm show sufficient convergence for this case. Therefore, regardless of the premature inversion termination, because the objective function is substantially smaller than for the initial design, and the backpropagation algorithm converged, this design is valid and represents a successful augmentation that improves the prediction of the model.

Additionally, Figure 6.16 shows the optimal β_T and β correction fields. Recall that β_T contains the design variables for the optimizer, while β is the correction applied to the production term and also the output of the neural network being trained by the backpropagation algorithm. Note that in Figure 6.16 the neural network training is satisfactory, but not perfect. The β field is slightly more dispersed and generally of lower magnitude, indicating increased regularization relative to the β_T field. This is due to imperfect learning in the backpropagation algorithm and is expected. Since this training error is now inside the inversion, it is accounted for in the optimization process. This ensures that the optimal correction can be learned. Also, unlike FIML-Classic, this implementation trains a neural network in the inversion and therefore guarantees consistency between the inversion solution and the resulting augmentation.

Figure 6.14: CL convergence at the minimum Je evaluation.

6.2.5 FIML-Direct

The FIML-Direct algorithm was also performed for the same case (S809 at 14.2° AoA). Compared to the FIML-Embedded implementation, the FIML-Direct approach is substantially more straightforward.
Figure 6.15: SSE convergence at the minimum Je evaluation.
Figure 6.16: FIML-Embedded comparison of the training correction β_T and the correction output by the converged neural network augmentation β.

The weights of the neural network are now the design variables and are set by the optimizer at each Jd(w) evaluation. The weights remain fixed during each evaluation. The forward propagation algorithm is still performed at each flow solver iteration to find the current correction β(w). For this application the same features were used; however, a smaller network was selected, using a single hidden layer with 20 nodes. More layers were tested but did not result in improved performance for this case. The objective function for this FIML-Direct case is given by (6.5), and initially λ = 1.0×10⁻⁴. Note that unlike the other methods, the gradient cannot be visualized in the same manner, because the design variables are no longer defined at each node but are now the weights of the neural network. Although less easily visualized, FIML-Direct substantially lowers the dimensionality of the inversion for this application (the number of weights is much smaller than the number of nodes in the computational domain). Due to the many fewer design variables, the limited memory BFGS variant (L-BFGS-B) was not necessary, and the BFGS algorithm was used. The convergence of Jd(w) during the inversion/optimization procedure, min_w Jd(w), is shown in Figure 6.17.

$$J_d(w) = \frac{1}{2}\left(C_{L,exp} - C_L\right)^2 + \lambda \sum_{i=1}^{n}\left(\beta_i - 1.0\right)^2 \qquad (6.5)$$

Note that the convergence shown in Figure 6.17 is relatively poor compared to FIML-Classic and FIML-Embedded. However, the FIML-Direct approach results in substantially more regularization than the other methods, as shown in Figure 6.18. It is therefore appropriate to lower the regularization constant, because in this approach the required regularization is being applied by the limitations of the neural network.

Figure 6.17: Convergence history of the S809 airfoil FIML-Direct inversion for λ = 10⁻⁴.
Figure 6.18: Correction field (β) for FIML-Direct with λ = 10⁻⁴.

Figure 6.19 shows the correction for the same case with λ = 10⁻⁵. Note that the correction has a larger magnitude, and the residual error in the lift coefficient (discussed in the next section) is much lower with the decreased regularization constant. It is possible to lower the regularization constant for the other methods as well, with the risk of increased difficulty in learning the de-regularized correction. Unlike the convergence history shown in Figure 6.17 for the λ = 10⁻⁴ FIML-Direct case, the λ = 10⁻⁵ results did not demonstrate strong (asymptotic) convergence during the inversion. However, the achieved minimum gives a much closer match to the data than the higher-regularization-constant solution, as expected, due to the decreased regularization allowing a more intrusive correction.

Figure 6.19: Correction field (β) for FIML-Direct with λ = 10⁻⁵.

6.2.6 Comparison

Comparisons between the three FIML methods tested on the S809 airfoil for this single case are presented in this section. First, note that the three methods use different neural network structures and numerical approaches, so comparisons must be made with caution.
Additionally, the FIML-Embedded solution was not sufficiently converged due to robustness issues, as discussed. Convergence verification for the FIML-Direct application with λ = 10⁻⁵ also remains an item for future work. Therefore, because of these considerations, confident comparisons of the performance of each method will not be made here, and the relative achievable performance of the methods remains uncertain. Table 6.2 compares the lift coefficients achieved for all three FIML approaches.

                               CL           Error
Target                        1.0546        -
Baseline                      1.15826       0.10366
FI-Classic, λ = 10⁻⁴          1.06726     ≈ 0.01266
FIML-Classic, λ = 10⁻⁴        1.06914     ≈ 0.01454
FIML-Embedded, λ = 10⁻⁴       1.07372     ≈ 0.01912
FIML-Direct, λ = 10⁻⁴         1.09483     ≈ 0.04023
FIML-Direct, λ = 10⁻⁵         1.05834     ≈ 0.003740

Table 6.2: Comparison of CL following inversion and/or model augmentation for the FIML approaches on the S809 airfoil; the "≈" is applied to emphasize that some results, as discussed, are not fully converged, so confident conclusions about the relative performance of the three FIML approaches cannot be made.

Note that all the FIML approaches result in augmentations that dramatically reduce the error in the predicted lift coefficient. The penalty for imperfect training in the FIML-Classic approach is shown in Table 6.2: the difference between the FI-Classic and FIML-Classic results is simply due to the fact that the neural network used for the augmentation cannot perfectly replicate the correction field. FIML-Embedded with the same regularization constant performs somewhat worse than the FIML-Classic augmentation for this choice of neural network hyperparameters and regularization constant. It is certain that the performance of the FIML-Embedded algorithm would improve if a solution to the observed robustness issues for the RANS applications were developed, enabling the inversion to converge further.

The FIML-Direct method performs worse than the other methods for the same regularization constant; but since the regularization is being provided by considering the limitations of the neural network in the inversion, it is appropriate to decrease the regularization constant. The lower FIML-Direct entry in the table (λ = 10⁻⁵) shows the best performance of all the augmentations. Also note that, in general, it is possible that a different optimization algorithm or improved implementation could achieve better inversion performance than the results presented here.

The experimental lift coefficient at a single angle of attack was the only higher fidelity data used to perform the inversions: a single number. It is somewhat remarkable, therefore, that the entire pressure distribution on the airfoil is corrected by the augmentations, as illustrated in Figure 6.20. Note that the baseline model shows substantial error in the experimentally observed separation location, but all the FIML approaches substantially reduce this error. Also note that the three FIML approaches result in different optimal solutions for the SA production term correction (β), as shown in Figure 6.21. This is due to the differing ways of accounting for (or neglecting) the limitations of the learning algorithm in the inverse problem.

Figure 6.20: Comparison of pressure distributions for the baseline SA-BC model and FIML augmentations.
Figure 6.21: Comparison of the production term correction (β) for the FIML inversions.
6.2.7 FIML-Direct for Simultaneous Inversion of Multiple Cases

As discussed in Chapter 4, because the design variables are consistent across multiple cases for the FIML-Direct approach, the simultaneous inversion of multiple cases can be performed. In other words, because we are seeking a single neural network that augments the model for a variety of cases, we can perform the FIML-Direct inversion on multiple cases simultaneously. Figure 3.4 illustrates this process. There is no requirement for the objective function to be the same for each case, meaning whatever data is available can be used for each case. Additionally, there is no requirement to use the same computational grid, and therefore multiple geometries can be considered simultaneously.

To demonstrate the simultaneous inversion of multiple cases, the FIML-Direct inversion was performed on the S809 airfoil at three angles of attack, α_train = {1.02°, 8.2°, 14.2°}. The individual objective functions had the same form as in the single-angle-of-attack case; however, the regularization constants differed between the angles of attack considered, reflecting the difference in confidence between the conditions. At α = 1.02° the confidence in the prediction is high, and therefore the regularization constant was set at λ = 10⁻⁴. For the high angles of attack the model prediction is much less confident, due to the high error with respect to the experimental data; therefore, the regularization constant for these cases was set at λ = 10⁻⁵. The convergence for this case is shown in Figure 6.22. Note that each evaluation corresponds to the solution of 3 direct and 3 adjoint simulations, with $J_d(w) = \sum_{i=1}^{3} j_i(w)$, for roughly the equivalent cost of 6 baseline solutions.

Figure 6.22: Convergence history of the S809 airfoil FIML-Direct simultaneous inversion at 3 angles of attack.

The minimum was found at evaluation 14, and the resulting correction for each angle of attack is shown in Figure 6.23.

To evaluate the generalization ability of the resulting augmentation, the network was tested on four additional (unseen) angles of attack that were not included in the training set. These are referred to as the holdout cases: α_holdout = {5.13°, 11.21°, 12.22°, 15.24°}. The baseline results were then compared to the augmentation trained on the three α_train conditions. The results are summarized in Figure 6.24. Note that for all the angles of attack in the training and holdout cases, the predicted lift coefficient improved. Additionally, mild extrapolations in angle of attack from the training set were possible with the augmented model.

Figure 6.23: Correction for all three angles of attack at the best evaluation.
Figure 6.24: Summary of augmentation performance for training and holdout cases for the augmentation trained on three angles of attack.

To further improve the augmentation, the inversion was continued with all seven angles of attack. This highlights the flexibility of the FIML-Direct approach, as the modeler can choose to continue training with additional data as necessary, or an existing augmentation can incorporate new data as it becomes available. The result of the inversion with all seven cases is shown in Figure 6.25. The drag prediction is also improved, as shown in Figure 6.26.

Figure 6.25: S809 augmentation lift performance for a training set of all seven angles of attack.
Figure 6.26: S809 augmentation drag performance for a training set of all seven angles of attack.
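The case-dependent composite objective used for the simultaneous S809 inversion can be sketched as follows; solve_case is the same assumed per-case (objective, gradient) interface as in the model-problem sketches, here taking the angle of attack and its regularization constant.

```python
import numpy as np

# Case-dependent regularization: high confidence (larger lambda) for the
# attached low-alpha case, lower confidence at the separated high angles.
cases = [
    {"alpha": 1.02,  "lam": 1e-4},
    {"alpha": 8.2,   "lam": 1e-5},
    {"alpha": 14.2,  "lam": 1e-5},
]

def J_total(w):
    # 3 direct + 3 adjoint solves per evaluation: J_d(w) = sum_i j_i(w).
    J, grad = 0.0, np.zeros_like(w)
    for case in cases:
        Ji, gi = solve_case(w, case["alpha"], case["lam"])  # assumed interface
        J, grad = J + Ji, grad + gi
    return J, grad
```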
To test the augmentation created from S809 lift coefficient data, augmented predictions were made for the S814 airfoil. The Mach number contour for the baseline SA-BC prediction on the S814 airfoil at 15.25° angle of attack is shown in Figure 6.27. The S809 augmentation also improves prediction on the S814 airfoil, as shown in Figure 6.28.

Figure 6.27: S814 baseline SA-BC pressure and streamline predictions for α = 16.2°.

Figure 6.28: S814 C_L resulting from augmentation trained on S809 angles of attack.

The corrections output by the neural network for three of the S814 angles of attack are shown in Figure 6.29. Additionally, the pressure distribution for the S814 at 16.2° angle of attack is compared to the experimentally obtained pressure distribution in Figure 6.30. The pressure distribution has been substantially improved by the augmentation.

Figure 6.29: S814 β fields for three angles of attack.

Figure 6.30: Pressure distribution plot for the S814 at 16.2° angle of attack showing improvement in prediction for the model augmentation.

The S814 results demonstrate an important characteristic of machine learning augmented models. Despite the S814 having a different shape than the S809, an augmentation trained using the FIML-Direct process on S809 information substantially improves predictions on the S814. Although the airfoils are different, the feature space of the S814 has been covered by the S809 cases, and therefore the model does not need information from the S814 cases to improve predictions. In other words, while two cases may be different, their features may be closer than expected, and the information learned from one case may be surprisingly effective on a geometrically dissimilar case.

6.3 RAE2822 Airfoil

The RAE2822 transonic airfoil is another canonical RANS test case. This airfoil exhibits a shock boundary layer interaction on the suction surface that produces a strong adverse pressure gradient. This violates the assumptions in the RANS models, and the models have difficulty predicting the precise shock location as well as the response of the boundary layer to the large adverse pressure gradient of the shock. This error results in incorrect force predictions. RAE2822 wind tunnel measurements are provided by Cook et al. [24], and the higher fidelity data for the FIML applications was the experimentally derived pressure coefficient (k_d = C_{P,exp}). The freestream Mach number was M_∞ = 0.729 and the angle of attack was 2.31°, corresponding to the conditions on the NPARC Wind validation website [112]. The experimental pressure distribution is interpolated onto the boundary points of the airfoil mesh.
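The interpolation of the sparse experimental pressure data onto the mesh boundary points, and the data-mismatch term it feeds, can be sketched as follows. This is a minimal sketch assuming the surface is parameterized monotonically in x/c (with upper and lower surfaces handled separately); the function names are illustrative, not the exact routines used in this thesis.

```python
import numpy as np

def interpolate_cp_to_mesh(xc_exp, cp_exp, xc_mesh):
    """Interpolate sparse experimental C_p onto mesh boundary points.

    Assumes xc_exp and xc_mesh parameterize a single surface (upper or
    lower) by chordwise coordinate x/c."""
    order = np.argsort(xc_exp)
    return np.interp(xc_mesh, xc_exp[order], cp_exp[order])

def data_mismatch(cp_target, cp_model):
    """Data term of the objective: half the sum of squared C_p errors."""
    return 0.5 * np.sum((cp_target - cp_model) ** 2)
```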
6.3.1 FI-Classic

The objective function for the FI-Classic application is given by Equation 6.6, where k is the total number of boundary points defining the airfoil in the computational domain, n is the total number of points in the domain, and \bar{C}_{p_{exp}} represents the interpolated pressure coefficient from the experiment.

J_c(\beta) = \frac{1}{2}\sum_{i=1}^{k}\left(\bar{C}_{p_{exp},i} - C_{P,i}(\beta)\right)^2 + \lambda\sum_{j=1}^{n}\left(\beta_j - 1.0\right)^2    (6.6)

Again, the BFGS optimizer was used. The regularization constant was chosen to be λ = 10⁻⁷. The initial solution is quite good, with the predicted shock location being relatively close to the experiment. The pressure immediately behind the shock is too low, however, indicating that the model is not predicting the correct boundary layer reaction to the strong adverse pressure gradient behind the shock. The convergence of the objective function throughout the inversion is shown in Figure 6.31. The pressure distributions of the experiment, baseline SA model, and FI-Classic inverse solutions are shown in Figures 6.32 and 6.33.

Figure 6.31: Convergence of RAE2822 FI-Classic objective function.

Figure 6.32: RAE2822 FI-Classic pressure distribution results.

Figure 6.33: RAE2822 FI-Classic pressure distribution results near shock location.

As shown, the inversion has corrected the small error in predicted shock location (the inverse solution shock is slightly forward), and the inverse solution shock location agrees better with the experiment. Also note that the pressure rise through the smeared shock foot now occurs over a larger distance that more closely matches the experiment. As noted, the SA model is not expected to be very accurate in this region due to the strong adverse pressure gradient, and therefore it is expected that the inverse solution will show the most improvement in this area. With the improved agreement in the shock location, the pressure following the shock is higher than in the baseline model and in better agreement with the experiment.

The correction is also largest near the shock foot. Figure 6.34 shows the correction over the whole airfoil, and Figure 6.35 illustrates the correction and pressure distribution near the SWTBLI.

Figure 6.34: FI-Classic correction for RAE2822 application.

Figure 6.35: Pressure distribution and FI-Classic correction near the smeared shock foot.

6.3.2 FIML-Direct

The FIML-Direct approach was also applied to the RAE2822 airfoil. The same experimental data was used, giving the objective function in Equation 6.7. λ = 10⁻⁶ was used for this case, as well as a neural network with two hidden layers of 20 neurons each. The same features were used as in the S809 airfoil case. The convergence is shown in Figure 6.36.

J_d(w) = \frac{1}{2}\sum_{i=1}^{k}\left(\bar{C}_{p_{exp},i} - C_{P,i}(w)\right)^2 + \lambda\sum_{j=1}^{n}\left(\beta_j(w) - 1.0\right)^2    (6.7)

Note that in Figure 6.36 there is little difference between J_d and the objective with the regularization term removed, indicating that the regularization term is not contributing much to the overall objective function despite the regularization constant being set much higher than in the FI-Classic approach. The reason for this becomes apparent when looking at the optimal correction field output from the neural network, shown in Figures 6.37 and 6.38 over the entire airfoil and at the interaction region, respectively.

Figure 6.36: FIML-Direct convergence for RAE2822 airfoil test case with λ = 10⁻⁶.

Figure 6.37: FIML-Direct correction (β(w)) for RAE2822 airfoil test case.

Figure 6.38: FIML-Direct correction at the shock interaction location for RAE2822 airfoil test case.

Similar to the S809 airfoil results, the FIML-Direct correction is much more regularized, in that the magnitude of the correction is smaller and is spread over a larger area. This regularization is not being provided by the cost function regularization constant (λ), but by the limitations of the network that are now considered in the inversion. The pressure distribution resulting from this inversion is shown in Figure 6.39. Even with the increased regularization from the neural network, the FIML-Direct inverse solution improved the shock location. This is better illustrated in Figure 6.40. Note, however, that the correction to the smeared shock foot that was possible in the FI-Classic application is not shown in the FIML-Direct results. This is likely because the required correction near the shock is high in magnitude and confined to very small regions near the interaction, which likely prevented learning of the correction needed to adequately adjust the boundary layer reaction to the shock.

Figure 6.39: FIML-Direct pressure distribution results.

Figure 6.40: FIML-Direct pressure distribution results at the shock location.
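Stepping back from the RAE2822 specifics, in every FIML-Direct application in this chapter the design variables handed to the optimizer are the flattened network weights, and the correction field is simply the network evaluated at each mesh point's features. The sketch below illustrates that parameterization for a two-hidden-layer, 20-neuron tanh network; the feature count is a placeholder, and the helper names are illustrative rather than taken from the thesis implementation.

```python
import numpy as np

N_FEATURES = 4                    # placeholder feature count
SIZES = [N_FEATURES, 20, 20, 1]   # two hidden layers of 20 tanh neurons

def unpack(w):
    """Split the flat design vector w into per-layer weights and biases."""
    layers, i = [], 0
    for n_in, n_out in zip(SIZES[:-1], SIZES[1:]):
        W = w[i:i + n_in * n_out].reshape(n_in, n_out)
        i += n_in * n_out
        b = w[i:i + n_out]
        i += n_out
        layers.append((W, b))
    return layers

def beta_field(w, features):
    """Correction beta(w) at every mesh point from its scaled features.

    features: array of shape (n_points, N_FEATURES)."""
    a = features
    layers = unpack(w)
    for W, b in layers[:-1]:
        a = np.tanh(a @ W + b)        # hidden tanh layers
    W, b = layers[-1]
    return (a @ W + b).ravel()        # linear output: one beta per point

# Length of the design vector seen by the optimizer:
n_design = sum(i * o + o for i, o in zip(SIZES[:-1], SIZES[1:]))
```

For these layer sizes the design vector has only a few hundred entries, which helps explain why a gradient-based optimizer can drive it to a minimum in tens of evaluations, as observed in this chapter.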
For the RAE2822 case, it is possible that improved feature selection or a different neural network structure could improve the FIML-Direct performance. Also note that to apply the information given by the FI-Classic inverse solution, that information would still need to be learned; for the FIML-Direct result the neural network has already been trained.

6.4 NASA Langley Hump

6.4.1 Overview

The NASA Langley wall mounted hump was also examined. Similar to the S809 cases, this is an incompressible, 2D, separated flow application. The case is based on a wind tunnel model hump that was placed directly on the tunnel wall. A fully developed turbulent boundary layer passes over the hump, and a large separation region develops on the leeward side. It has been shown that RANS models are unable to adequately predict the size of the separated region [111]. The computational domain was developed to mimic the conditions in the tunnel experiment. The top of the domain is an inviscid wall with a contoured shape to account for blockage effects in the wind tunnel experiment. The bottom of the domain is a viscous wall with sufficient length to develop the boundary layer consistent with the experiment. Details concerning this computational setup are provided on the turbulence modeling resource website [25]. The pressure contours and streamlines predicted by the baseline SA model are shown in Figure 6.41.

Figure 6.41: Baseline SA model predictions for pressure coefficient over the NASA Langley wall mounted hump.

As shown in Figure 6.41, there is a strong adverse pressure gradient just ahead of the separated region on the top of the hump. Similar to the S809 results, we expect the eddy viscosity to be too high in this region. Additionally, in the separated region the wall-distance assumptions of the SA model are invalid, and therefore it is expected that the eddy viscosity is poorly modeled there.

6.4.2 FI-Classic

The FI-Classic field inversion was performed on the NASA Langley 2D hump. The experimentally derived pressure distribution was used as the higher fidelity data (k_d = C_P). Where experimental pressure was available, the pressure distribution was interpolated onto each point in the computational domain on the hump. This results in the objective function in Equation 6.8, where k is the total number of boundary points defining the hump in the computational domain, n is the total number of points in the domain, and \bar{C}_{p_{exp}} represents the interpolated pressure coefficient from the experiment.

J_c(\beta) = \frac{1}{2}\sum_{i=1}^{k}\left(\bar{C}_{p_{exp},i} - C_{P,i}(\beta)\right)^2 + \lambda\sum_{j=1}^{n}\left(\beta_j - 1.0\right)^2    (6.8)
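A minimal FI-Classic driver built around Equation 6.8 might look like the sketch below, using SciPy's BFGS (the optimizer used throughout this chapter) and hypothetical wrappers `solve_flow` and `solve_adjoint` standing in for the direct and adjoint RANS solutions. It illustrates how λ enters the problem; it is not the SU2-based implementation of this thesis.

```python
import numpy as np
from scipy.optimize import minimize

def run_fi_classic(cp_target, n_points, lam, solve_flow, solve_adjoint):
    """FI-Classic inversion: BFGS on the field of corrections beta_j."""

    def J(beta):
        # Equation (6.8): data mismatch plus regularization toward beta = 1
        cp = solve_flow(beta)
        return (0.5 * np.sum((cp_target - cp) ** 2)
                + lam * np.sum((beta - 1.0) ** 2))

    def dJ(beta):
        # adjoint gradient of the data term plus the analytic
        # regularization gradient 2 * lam * (beta - 1)
        return solve_adjoint(beta) + 2.0 * lam * (beta - 1.0)

    beta0 = np.ones(n_points)  # beta = 1 recovers the baseline SA model
    return minimize(J, beta0, jac=dJ, method="BFGS")
```

Setting lam = 0 removes the pull toward the baseline model entirely, which is exactly the experiment performed next.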
First, a case was run with λ = 0. In general, this is not expected to produce a useful correction field, because the resulting correction is expected to be excessively large; λ = 0 corresponds to no confidence in the baseline model. Clearly the baseline SA model is not believed to be completely wrong in this case; however, the resulting inversion illustrates the reaction of the inversion to different regularization constants. The convergence history of J_c with λ = 0 is shown in Figure 6.42. The convergence is excellent and likely would have continued; however, the inversion was terminated because the correction field was extreme, as shown in Figure 6.43. Despite this, the model shows a dramatic improvement in the match to the data over the baseline solution, as shown in Figure 6.44. This illustrates the effect of the regularization constant: it is a compromise between matching the available data and the modeler's confidence in the model.

Figure 6.42: FI-Classic convergence for NASA Langley Hump test case with λ = 0.

Figure 6.43: FI-Classic correction field β for case with λ = 0.

Figure 6.44: FI-Classic C_p comparison to available data k_d for test case with λ = 0.

For a much more useful inversion, λ = 10⁻⁶ was chosen. The convergence for this case is shown in Figure 6.45. The convergence is still quite good, and the resulting correction field is much more reasonable (Figures 6.46 and 6.47): it is much lower in magnitude and more closely localized to where assumptions in the SA model are violated.

Figure 6.45: FI-Classic convergence for test case with λ = 10⁻⁶.

Figure 6.46: FI-Classic correction field β for λ = 10⁻⁶.

Figure 6.47: FI-Classic correction field β with streamlines for test case with λ = 10⁻⁶.

Figure 6.48: FI-Classic C_p comparison to available data k_d for test case with λ = 10⁻⁶.

Note that in Figure 6.47 the correction is reducing eddy viscosity production in the strong adverse pressure gradient just before the boundary layer separates. This is similar to the behavior observed in the S809 cases. Then, also similar to the S809 cases, the turbulent production is reduced in the separated region. Unlike the S809 cases, there is also a large increase in production in the shear layer above the recirculating region.

Note that only the field inversion process has been performed. If an augmentation were to be created, offline training would be performed on the correction and flow features found in the inversion.

6.4.3 FIML-Direct

The FIML-Direct method was also applied to the NASA Langley hump. The computational domain was identical to that used for the FI-Classic application. This results in the following objective function for this case:

J_d(w) = \frac{1}{2}\sum_{i=1}^{k}\left(\bar{C}_{p_{exp},i} - C_{P,i}(w)\right)^2 + \lambda\sum_{j=1}^{n}\left(\beta_j(w) - 1.0\right)^2    (6.9)

Again, because of the increased regularization caused by the FIML-Direct method accounting for the limitations of the neural network in the inversion, the regularization constant was decreased to λ = 10⁻⁷ for the FIML-Direct application. A single hidden layer of 20 hyperbolic tangent neurons was used. The convergence history of min_w J_d(w) is shown in Figure 6.49. The minimum occurred on evaluation 19, with the optimizer subsequently struggling to find an appropriate line search step size near the minimum. Note that the regularization term appears to be inactive: the objective is essentially unchanged when the regularization term is removed. Curiously, increasing the regularization constant severely penalized the achievable match to the pressure data, and therefore the regularization constant was left at 10⁻⁷.

Figure 6.49: FIML-Direct convergence for test case with λ = 10⁻⁷.

Looking at the correction at the minimum (Figure 6.50), the correction has a much larger spatial extent than in the equivalent FI-Classic case. It is possible that the features used could not sufficiently differentiate between the boundary layer and the rest of the domain, resulting in corrections far from the separated region, where they likely have little effect on the pressure distribution. Nevertheless, the achieved pressure distribution closely matches the experimental data, indicating that the neural network augmentation has been successful in reducing the prediction error (Figure 6.51).
Note that unlike the FI-Classic results, for the FIML-Direct inversion the model has now been augmented, and that augmentation could be applied to other cases.

Figure 6.50: FIML-Direct correction field β with streamlines for test case with λ = 10⁻⁷.

Also note that the correction for the FIML-Direct application (Figure 6.50) is substantially different from that of the FI-Classic application (Figure 6.47). Despite the different correction fields, the achieved pressure distributions both closely match the experiment. This suggests, as expected, that the optimal correction will be different when considering what correction can be learned by the chosen algorithm. In other words, the FIML-Direct correction is optimal for the chosen neural network features, scaling, and structure.

Figure 6.51: FIML-Direct C_p comparison to available data k_d for test case with λ = 10⁻⁷.

6.5 Summary

This chapter built on the 1D heat equation results by presenting the FIML-Classic, FIML-Embedded, and FIML-Direct results for the RANS equations. All three methods were successfully demonstrated on the S809 airfoil using experimentally derived lift coefficient data. The full FIML-Classic methodology was performed for this application, followed by FIML-Embedded and FIML-Direct, all of which produced model augmentations based on the experimental data that improved predictions. Most notably, the simultaneous inversion of the S809 airfoil at various angles of attack was performed. Performing the inversion on three angles of attack produced an augmentation that corrected predictions for the entire angle of attack range. This demonstrated the improved generalization provided by the FIML-Direct approach, especially when considering multiple conditions in a single inversion and augmentation. The inversion was then continued considering the S809 at seven angles of attack. This further improved the augmentation and demonstrated the improved flexibility of the FIML-Direct method, since the design variables are consistent across any number of cases. The final augmentation, incorporating solely experimental lift coefficient data from the S809, was tested on a different airfoil: the S814. The augmented S814 airfoil predictions showed universal improvement across the angle of attack range, further demonstrating the generalization capabilities of the FIML-Direct approach.

Additionally, the FI-Classic and FIML-Direct approaches were applied to the NASA Langley wall mounted hump. The FI-Classic inversion was shown both with and without the regularization constant, demonstrating the effect of the regularization constant on the result of the inversion. It was shown that without the regularization constant the inverse pressure distribution closely matched the data and the objective function convergence was very good. However, the inverse solution is likely not realistic for that case, as it was extremely high in magnitude and not localized close to where the SA model is likely inaccurate (adverse pressure gradients and separated regions). It was shown that the regularization constant allows the modeler to constrain the inversion to more reasonable corrections, without much penalty in accuracy with respect to the available data. The FIML-Direct results for the NASA Langley wall mounted hump were also presented in this chapter.
It was shown that the FIML-Direct method can provide similar inversion performance as the FI-Classic method, while simultaneously training the network and augmenting the model. The effect of considering the limitations of the network was also discussed. It was observed that the correction for FIML-Direct was substantially different than the FI-Classic result. This demonstrates the effect of considering the neural network limitations, as the inverse solution is now dependent on the feature selection, feature scaling, and network structure. Therefore, the optimal solution will likely differ in some applications, such as the NASA hump, as the ideal solution adapts to the limitations of the chosen learning method.

Chapter 7: Conclusions

7.1 Summary

RANS turbulence modeling continues to be a challenging field. Despite decades of effort in the turbulence modeling community, RANS models continue to be deficient or unreliable for a variety of practical engineering applications. For example, RANS models typically fail to predict the correct forces on an airfoil at high angles of attack due to an incorrect prediction of the separation location. As DNS and LES are still decades away from being capable of simulating Reynolds numbers high enough for many applications, RANS models will remain the method of choice for the majority of engineering computations for some time. There has been, however, substantial investment in high fidelity numerical simulations and physical experiments. These efforts provide higher fidelity data that turbulence modelers can draw on to improve RANS models. Despite the availability of this data, progress is difficult, as measured quantities cannot be used to improve models directly. Thus, there is substantial motivation to utilize machine learning models to fully leverage this existing body of higher fidelity data to improve the RANS models where they are deficient. Several researchers have attempted to learn directly from the higher fidelity data, but this approach suffers from a lack of consistency between the higher fidelity data and the model. The FIML approach addressed this concern by first performing an inference step (in the form of an inverse problem) to generate model consistent information from the higher fidelity data that can then be learned and applied in the form of an augmented model. Despite this improvement, there remained a lingering inconsistency with the FIML approach: the model consistent information was not always able to be easily learned, or learned sufficiently at all, during offline training. It was this observation that motivated the present work; the disconnect between the model augmentation and the information generated by the inversion could be avoided if the learning algorithm were considered during the inversion.

The current effort attempted to resolve the inconsistency between the inverse problem and the model augmentation with the following approach, enumerated below as broad tasks.

1. The Stanford University Unstructured (SU2) open source CFD package was modified to perform the FIML process, referred to in this work as FIML-Classic. SU2 was selected due to its automatically differentiated (AD) discrete adjoint solver, already configured for optimization problems. The use of AD for this effort enabled efficient experimentation for the new FIML approaches, as much of the development effort for the adjoint solver is handled automatically through AD.
The FIML method was implemented in SU2 for the first time, and the FIML-Classic approach was performed on a variety of canonically difficult RANS problems. Additionally, the FI-Classic information generated from 2D airfoil simulation results was used to perform a feature selection analysis. This enabled the analysis of potential inputs to the model augmentation, and ultimately the selection of appropriate features and appropriate feature scaling for this problem.

2. A new FIML approach, termed FIML-Embedded, was proposed and analyzed. This unique approach was developed for this thesis and applied for the first time. The machine learning training algorithm was moved inside the model solver itself, such that the training is performed while the model variables are converging, and therefore the inverse procedure will account for the dynamics of the learning algorithm. Feedforward neural networks for regression were chosen as the machine learning algorithm, and the backpropagation training algorithm was implemented. The neural network is trained as the solver is converging, requiring several additional operations each solver iteration (feature scaling, backpropagation, and forward propagation). For the first time, this method was implemented both for a 1D heat equation model problem and in the modified SU2 code for RANS applications. The 1D heat equation provided a simple, easily implemented problem that fully demonstrates the method and numerics involved, while the SU2 implementation demonstrates that the method can be applied to complex RANS simulations.

3. Another new FIML approach, termed FIML-Direct, was proposed and analyzed. This new FIML approach was also developed in this thesis and applied for the first time. In this approach the weights of the neural network are considered the design variables of the inversion. This directly trains the neural network without the need for the backpropagation algorithm. The FIML-Direct method was also implemented for the 1D heat equation model problem and the modified SU2 package. Additionally, a methodology for incorporating information from multiple cases was developed for the FIML-Direct approach. For the first time, multiple cases can be considered in the inference step simultaneously. For the 1D heat equation this enabled the simultaneous inversion of multiple temperature distributions, and for the 2D airfoil results the inversion was performed for up to seven cases simultaneously. The resulting augmentation was then shown to improve predictions on a different airfoil, not included in the training set. This effectively demonstrated the methodology for including multiple cases in the inversion and augmentation, which is a requirement for generating augmentations with adequate generalization for practical predictions. Additionally, for the first time this approach was also applied to other individual RANS cases, including the NASA Langley wall mounted hump and the RAE2822 transonic airfoil.

7.2 Key Observations

For the first time, it was shown that the FIML process can be improved by accounting for the limitations of the chosen machine learning algorithm inside the inversion process. By doing so, learning is performed during the inference step, the model correction will be optimally regularized for the chosen learning algorithm, and there is the highest possible consistency between the inference step and the prediction environment.
It was shown that the FIML-Classic inverse information can be difficult to learn due to insufficient regularization. This can be managed somewhat through careful construction of the objective function and the regularization constant (λ), but ultimately there is no guarantee that the optimal discrepancy is learnable by the learning algorithm selected by the modeler. Even when the inverse information can be learned, it was shown that there is a noticeable decrease in performance of the augmentation due to imperfect training, which illustrated the potential inconsistency between the inference and predictive environments in the FIML-Classic approach. Both new FIML approaches were shown to address this inconsistency by performing the learning in the inversion process. First, general conclusions for the two new FIML approaches and the FIML-Classic approach are discussed. Then, specific conclusions from the FIML applications to the model problem and RANS cases are presented.

7.2.1 FIML-Embedded

The FIML-Embedded approach, developed and applied for the first time in this thesis, was shown to provide increased regularization over the FIML-Classic method. Additionally, by performing the backpropagation algorithm inside the solver during the inversion, the inverse solution was guaranteed to be learnable. The optimal discrepancies for the FIML-Embedded applications showed increased regularization over the FIML-Classic procedure for all applications considered. It was also demonstrated that the gradient could be efficiently computed through additional adjoint variables, with only a minor increase in computational cost over the FIML-Classic approach.

Unfortunately, there were also negative observations associated with this method. Performing the backpropagation algorithm in the solver itself increased the complexity of the simulation. This added complexity adversely affected the robustness of the inverse solution and is a drawback to the approach. Additionally, as currently formulated, it is not readily apparent how to incorporate information learned from multiple cases with the FIML-Embedded approach. Possible solutions to this drawback were proposed, but have not yet been explored.

Nevertheless, it was observed that the FIML-Embedded approach could be performed with similar inverse problem convergence as the FIML-Classic approach. If the backpropagation algorithm converges sufficiently, the nonlinearity of the neural network is effectively hidden from the optimizer. By hiding this nonlinearity, it is expected that the FIML-Embedded approach could have substantial advantages over the FIML-Direct approach for large and/or complex network structures. For the applications considered, however, the convergence of the FIML-Classic, FIML-Embedded, and FIML-Direct approaches was comparable.

7.2.2 FIML-Direct

The FIML-Direct approach, developed and applied for the first time in this thesis, was shown to give increased regularization over the FIML-Classic approach for the RANS applications; in fact, it provided more regularization than the FIML-Embedded approach as well. It was demonstrated that the FIML-Direct approach could efficiently train the neural network by considering the weights as the design variables directly. This is a somewhat surprising result, as typically thousands of iterations of the backpropagation algorithm are required to train a neural network from static data.
It was shown that by considering the weights as the design variables in the FIML cost function, the network could be minimized in a manageable number of evaluations (typically 10-20 observed). This is an exciting observation, and because of this result the FIML-Direct approach provides an alternative, efficient, and advantageous approach to training neural network augmentations that improve models with data.

It was also demonstrated that the FIML-Direct approach has a substantial advantage over the FIML-Embedded procedure because the design variables in the inversion are the same across multiple cases. The model discrepancy is not expected to be the same across multiple cases, but the weights of the neural network are (a single augmentation that improves predictions for multiple cases is desired). Therefore, the inverse problem can be posed for an unlimited number of cases simultaneously by minimizing a composite cost function. For the first time, the FIML inversion was performed considering multiple cases simultaneously. It was observed, both for the model problem and the RANS applications, that the data from multiple cases could be successfully incorporated into a single augmentation via the FIML-Direct approach with simultaneous inversion. Additionally, it was demonstrated with the model problem that it is not necessary to compute the full gradient (which is costly); alternatively, the partial gradient could be computed and the stochastic gradient descent method used to train the augmentation. This method could be advantageous when considering a large number of cases.

7.2.3 1D Heat Equation Model Problem

The 1D heat equation model problem was shown to be a computationally efficient problem to evaluate the FIML procedures and make direct comparisons between algorithms. The FIML-Classic, FIML-Embedded, and FIML-Direct algorithms were all implemented for this problem, and all produced model augmentations that dramatically reduced the error of the augmented model. Thus it was demonstrated that the FIML-Embedded and FIML-Direct approaches can both produce useful model augmentations during the inversion step, unlike the FIML-Classic method, which requires offline training following inversion. The exact analytical solution for the optimal correction is known for the 1D heat equation case. It was shown that the FIML-Classic and FIML-Direct methods, when applied to a single case, can find the optimal correction distribution. The FIML-Embedded implementation did not find the true distribution, but was still capable of producing a model augmentation that substantially reduced prediction error. The objective function convergence of the three methods was also compared, and it was shown that, as expected, the convergence of the FI-Classic approach was by far the best, because the limitations of the learning algorithm are neglected in the inverse problem of FI-Classic. The convergence of the FIML-Direct method was perhaps most surprising, as it demonstrated excellent convergence despite learning during the inversion step.

Model augmentations considering information from multiple cases were also generated and tested for the FIML-Classic and FIML-Direct methods. For the FIML-Classic method, this requires the solution of the inverse problem for each case to generate training data. The data from each case is then compiled into a single training set, and a model augmentation is trained offline.
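A minimal sketch of this offline learning step follows, using scikit-learn's MLPRegressor as a stand-in for the feedforward network implementation of this thesis, and synthetic placeholder data in place of real inversion output:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-ins for the compiled training set: (features, beta) pairs gathered
# from a separate FI-Classic inversion of each case.
per_case = [(rng.normal(size=(500, 4)), rng.normal(1.0, 0.1, size=500))
            for _ in range(3)]

X = np.vstack([f for f, _ in per_case])        # pooled scaled features
y = np.concatenate([b for _, b in per_case])   # pooled correction values

# Offline learning step of FIML-Classic: fit beta = net(features).
net = MLPRegressor(hidden_layer_sizes=(20, 20), activation="tanh",
                   solver="lbfgs", max_iter=5000).fit(X, y)
```

Any mismatch between the fitted network and the compiled correction fields at this stage is exactly the imperfect-training penalty discussed above.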
For the FIML-Direct case, it was shown that multiple cases could be considered simultaneously, such that the inversion not only learns during the inversion, but also learns information from multiple cases. For the first time, the inverse problem can incorporate information from multiple cases simultaneously. The augmentation was trained using several constant T_∞ distributions, and was shown to improve predictions for constant T_∞ distributions not in the training set, and also on variable T_∞ holdout data.

In summary, the conclusions from the 1D heat equation were:

1. Learning during the inversion process is feasible, either with FIML-Embedded or FIML-Direct, with only a minor penalty on the achievable accuracy of the resulting model augmentation relative to the FIML-Classic approach.

2. The FIML-Direct method exhibits convergence characteristics comparable to the FIML-Classic approach, while simultaneously learning the model augmentation. This is unexpected and exciting, given the number of iterations often required to train neural networks.

3. The FIML-Direct method can learn from multiple cases simultaneously, considering the limitations of the network during the inversion, and producing the optimal model augmentation for the chosen network structure.

4. Only a limited number of training cases are required to produce model augmentations that improve predictions for cases that interpolate in, or mildly extrapolate from, the feature space of the training set.

5. For FIML-Direct, the full gradient computation is not required to produce useful augmentations. The stochastic gradient descent method was demonstrated using a single case as a batch (see the sketch after this list), which may have practical advantages over the full gradient computation when considering a large number of cases.
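A minimal sketch of conclusion 5, treating a single case as a batch; `solve_adjoint` is a hypothetical wrapper returning the partial gradient of one case's objective with respect to the network weights:

```python
import numpy as np

def sgd_over_cases(w0, cases, solve_adjoint, lr=0.01, epochs=20, seed=0):
    """Stochastic gradient descent over cases for FIML-Direct.

    Each update uses the partial gradient from a single case (one adjoint
    solve) instead of the full gradient (one adjoint solve per case)."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(cases)):
            w = w - lr * solve_adjoint(cases[i], w)
    return w
```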
It was shown that the regularization parameter could be decreased because the required regularization was being provided by considering the limitation of the neural network in the inversion process. In summary, the conclusions from the S809 FIML applications at a single angle of attack were: 1. The FIML-Classic procedure can result in an inconsistency between the inverse solution and the model augmentation due to imperfect learning. 2. Both new methods, FIML-Embedded and FIML-Direct, were applied to RANS cases for the first time and were shown to produce useful model augmentations for the RANS problems considered. 3. The FIML-Embedded approach, as currently formulated, has significant ro- bustness issues that can complicate the inversion procedure. 4. The FIML-Direct approach demonstrates surprisingly efficient training of the neural network on a RANS application, consistent with the heat equation conclusion. 210 7.2.5 2D Incompressible Airfoil Results for Multiple Cases For the FIML-Direct method, a model augmentation was generated from the S809 airfoil considering experimentally observed lift coefficient data for up to 7 an- gles of attack simultaneously. A composite objective function was constructed, and initially an augmentation was produced considering three angles of attack. It was shown that the FIML-Direct algorithm could train a neural network that improved force predictions for the three angles of attack in the training set, and also for four additional angles of attack that were not. This further demonstrates that only a limited amount of higher fidelity data is required to produce useful model augmen- tations. In this case, the only higher fidelity data used was the experimental lift coefficient at three angles of attack, and the resulting model augmentation reduced model error for the entire S809 angle of attack range. The model augmentation training was then continued using the FIML-Direct method and all seven S809 angles of attack. Minor improvement in the model augmentation was shown when considering this extra information. This model aug- mentation, trained exclusively on S809 airfoil data, was then tested on a different geometry: the S814 airfoil. It was shown that the model augmentation substantially improved the S814 predictions where the model was deficient (high angles of attack), but did not hurt accuracy where the model was already accurate (lower angles of attack). In summary, the FIML-Direct application to the S809 at multiple angles of attack resulted in the following conclusions: 211 1. The simultaneous inversion of multiple RANS cases, performed for the first time in this thesis, produces model augmentations with good generalization capabilities. 2. The FIML-Direct algorithm exhibits good convergence during the inversion, even when considering multiple RANS cases. 3. Only a limited quantity of higher fidelity information is required to produce useful model augmentations via the FIML-Direct approach. 7.2.6 Transonic Airfoil Results The RAE2822 transonic airfoil was considered for both the FI-Classic and FIML-Direct approaches. For the conditions considered this airfoil has a substantial supersonic region which terminates in a shock boundary layer interaction. Following the shock there is a subsonic region with an adverse pressure gradient. The base- line SA model showed a small error in predicted shock location, and the pressure distribution was poorly predicted in the vicinity of the SWTBLI. 
Wind tunnel pressure coefficient data was used to formulate the objective function, and the FI-Classic and FIML-Direct methods were performed. The model results following the FI-Classic inversion compared very well with the experimental data, and there was a large correction in the SWTBLI region. This modified the location of the shock and the response of the boundary layer to the shock, which smeared the pressure rise over a larger area. This is consistent with the dynamics of a smeared shock foot that is characteristic of transonic airfoil SWTBLI. The FIML-Direct method yielded a model augmentation that substantially improved the prediction of the shock location, but was not able to improve the pressure distribution through the SWTBLI as well as the FI-Classic inversion. This is likely due to the increased regularization of the FIML-Direct approach, which prevented the precise correction required in the region of the SWTBLI. It is possible that the features used, chosen for the S809 case, are not optimal for SWTBLI cases. It is also possible that a more rigorous investigation into the neural network structure (size of hidden layers, number of neurons, choice of activation function, etc.) could produce better results.

In summary, the conclusions from the RAE2822 case were:

1. The SA baseline model incorrectly predicts the shock location, and does not accurately model the pressure distribution through the smeared shock foot.

2. The inverse solution from the FI-Classic approach can accurately correct the observed SA model deficiencies, but this correction requires relatively intrusive corrections at very precise locations, indicating that the inverse solution is poorly regularized.

3. The FIML-Direct method can be successfully applied to SWTBLI and move the shock location, but it was not possible to adjust the pressure distribution through the SWTBLI for the chosen learning algorithm, neural network structure, and features.

7.2.7 NASA Langley Hump Results

The NASA Langley wall mounted hump was also considered. This is an additional 2D, incompressible, separated flow application that has been shown to be difficult to successfully model using RANS. The experimentally obtained pressure distribution was used as the higher fidelity data, and both the FI-Classic and FIML-Direct approaches were applied. Results for the FI-Classic approach both with and without the regularization constant were presented.

In summary, the conclusions for the NASA Langley hump results were:

1. The regularization constant is critical to producing meaningful model augmentations; without this parameter the inverse solution can be unnecessarily intrusive.

2. When considering the limitations of the neural network, the model correction from the FIML-Direct approach can be substantially different than the correction resulting from the FI-Classic method.

7.2.8 Other RANS Applications

Two other RANS applications were considered: the 2D hypersonic wedge and the 3D Onera M6 wing (results in Appendices A and B). The FI-Classic procedure was performed, and in both applications the inverse solution more closely matched the higher fidelity data. These cases further demonstrate the capability of the inverse procedure to generate corrections that improve the model for a wide range of historically difficult RANS cases.

7.3 Recommendations for Future Work

In this section recommendations for future work are presented.
The FIML approach up to this point has been developed and demonstrated on a variety of difficult RANS applications. There remains the task of constructing a robust augmented model that is shown to improve predictions over a broad range of applications. Such an effort would not be trivial, as it would be necessary to construct inverse problems that adequately cover the feature space of the problem of interest, and despite the relative efficiency of the FIML approach over sampling methods, substantial computational resources would still be required. Nevertheless, the present work and the FIML approach in general have certainly demonstrated that such an effort could be successful. Of the two new FIML methods proposed, implemented, and analyzed in the present effort, the FIML-Direct approach is comparatively more developed than the FIML-Embedded method, as evidenced by the results in Chapters 5 and 6 and the detailed discussion below.

1. Feature Selection: Chapter 6 presented the feature selection approach used in the current effort, and discussed the reasons that conclusions concerning feature selection are difficult. In particular, the best features will be application dependent. The limited feature selection study in this effort focused on FI-Classic inversion results for the S809 airfoil, and it is not expected that conclusions from this limited dataset apply to RANS applications in general. Additionally, the appropriate features may depend on the chosen machine learning algorithm. For FIML-Embedded and FIML-Direct, feature selection is particularly important: since the limitations of the learning algorithm are considered in the inversion, the inversion cannot be successful unless the features are appropriately chosen. In particular, it was observed that the FIML-Direct correction for the NASA Langley hump case was particularly intrusive compared to the regularized FI-Classic result. It is likely that improved feature selection, tailored to this application, could produce a less intrusive augmentation than achieved in this effort.

2. Feature Scaling and Outlier Rejection: A related issue is feature scaling and outlier rejection. Extreme outliers were observed in the selected features that, if not properly accounted for, adversely affected the training of the neural network. In the current effort, the features were mapped to a normal distribution (a rank-based sketch of such a mapping appears after this list). This minimized the effect of outliers and enabled sufficient training for the FIML approaches presented in the current work. However, it is likely that the treatment of outliers could be improved, and this would likely improve the performance of all three FIML approaches. Perhaps a filtering algorithm could be implemented. The current effort tested a filtering method based on a shielding function often used in delayed detached eddy simulations (DDES)¹, but it was found that this did not sufficiently reduce the effect of outliers and therefore was not used for any of the results presented in this thesis. A more robust treatment of outliers would almost certainly improve performance.

¹The SA-DDES shielding function, f_d [113], was used to filter out all points not in the boundary layer for an S809 airfoil at high angle of attack. This method was found to not sufficiently minimize the effect of outliers, and it also filtered out some of the separated region, which could remove from consideration some areas that require correction.

3. Learning Algorithm: As with all machine learning applications, the results are highly dependent on the choice of machine learning algorithm.
Only neural networks were considered in the present effort, but many other algorithms could be applied. In particular, for neural networks the modeler has many choices, including the number of hidden layers, number of nodes, choice of activation function, etc. There are also many types of neural networks, and only fully connected feedforward neural networks were considered. To fully explore the capability of both FIML-Embedded and FIML-Direct, a more rigorous investigation into all of these choices is required.

4. FIML-Embedded Development: The FIML-Embedded approach was found to be difficult to perform for RANS applications. To succeed, the implementation of the backpropagation algorithm must not adversely affect the solver routines. Unfortunately, such adverse effects were observed in the RANS applications presented in this thesis, resulting in premature termination of the inversion. Feature scaling may have some influence on this issue, as it was found that mapping the features to a normal distribution substantially improved inversion performance. More robust treatment of outliers may keep intermediate weight values from predicting excessive corrections that prevent convergence of the flow solver. Thus, more effective outlier treatment may also resolve this robustness issue with the FIML-Embedded approach.

5. FIML-Embedded for Multiple Cases: The ability to perform the inversion incorporating information from data for multiple cases has not been developed for the FIML-Embedded approach. Currently, this gives a strong incentive to use the FIML-Direct approach, as this capability is likely required in order to adequately cover the feature space for a practical (useful) augmentation. Several possibilities for improving on the current FIML-Embedded methodology were proposed in Chapter 3, but these concepts remain unexplored.

6. Learning Without Forgetting: Perhaps the biggest obstacle to obtaining an augmentation that improves a RANS model for a wide range of applications is a practical issue. How can a single researcher or team practically or efficiently augment a model that applies to every case in an area of interest? Even if such a model is developed, how is it updated if it is found to be deficient in a new application? Currently, for FIML-Classic, to incorporate new information all the previous information is also required. It is certainly possible to retrain an existing augmentation with new data, but if the old data is not included in the updated training set, the augmentation will "forget" the old data, such that the updated augmentation will have diminished performance on the old training set. Therefore, to retrain the augmentation with new information, the new information must be added to the old and a new augmentation learned. This is also an issue with the FIML-Direct approach, as new data can be incorporated, but the inversion must be continued with all of the cases (old and new), as was demonstrated with the S809 airfoil augmentation in Chapter 6. Clearly, it will be impractical to share, store, and distribute large quantities of such data within the turbulence modeling community. Therefore, there is a need for algorithms that can incorporate new information without forgetting the old, such that a model augmentation can be updated efficiently without forgetting what has already been learned.
This is termed "learning without forgetting," and is the concept of incorporating new data into an already trained model without access to the original data used to learn that model. Some algorithms exist [114, 115], and this capability in the FIML framework would greatly increase the ability of a community to develop useful model augmentations collaboratively.

7. Uncertainty Quantification: The FIML implementations in this work largely neglected formal uncertainty quantification. This is a major shortcoming, as the model and the data both have uncertainties that should be rigorously evaluated and quantified. In some applications, the FIML-Classic approach used a Bayesian formalism to evaluate model posterior uncertainties [39]. For the applications of the new methods in this thesis, the Bayesian framework was not used, and only a cursory accounting of the prior model confidence was utilized through the regularization constant. Additionally, neural networks can incorporate Bayesian uncertainty quantification by formally assigning uncertainty distributions to their parameters, so that the model posterior probability can be evaluated. Utilizing Bayesian neural networks in the FIML approach could help avoid over-fitting and enable more successful learning from small datasets. Formal uncertainty quantification for the FIML-Direct and FIML-Embedded methods remains an area for future work.
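As a concrete illustration of the normal-distribution feature mapping referenced in item 2, the sketch below uses scikit-learn's QuantileTransformer on synthetic heavy-tailed data. This is a rank-based stand-in for the scaling implemented in the thesis, and the data is fabricated for illustration only.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
raw = rng.lognormal(size=(10000, 4))      # heavy-tailed placeholder features
raw[rng.random(10000) < 0.001] *= 1e4     # inject a few extreme outliers

# Rank-based mapping of each feature to a standard normal distribution:
# outliers land in the tails instead of dominating the scaled range.
scaler = QuantileTransformer(output_distribution="normal", subsample=10000)
scaled = scaler.fit_transform(raw)
```

Because the mapping depends only on ranks, a single extreme outlier cannot stretch the scale seen by the network, which is the failure mode that min-max or standard scaling suffers from.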
Appendix A: Hypersonic Wedge

A.1 Overview

Accurate prediction of shock wave turbulent boundary layer interactions is also a difficult problem for the design of hypersonic vehicles. At these much higher Mach numbers, concave corners can create very strong SWTBLI, and the prediction of the surface heat flux and pressures involved in these interactions can have a dramatic impact on design decisions for these vehicles. RANS models have difficulty predicting hypersonic SWTBLI due to the very strong adverse pressure gradients and strong nonequilibrium effects in the boundary layer.

To examine the ability of FIML to improve RANS predictions for a hypersonic SWTBLI, a 2D compression corner was modeled. The 36° compression corner presented by Holden et al. [116, 26] was modeled using the SA turbulence model. Incoming Mach 11.3 air encounters a flat plate at 0° incidence and a turbulent boundary layer develops¹. This turbulent boundary layer encounters a 36° ramp, which results in a SWTBLI that separates the boundary layer upstream of the ramp corner. The experiment was performed at the CUBRC shock tunnel facilities, and the model was wide enough that three-dimensional effects (from finite width) are not expected to be significant. It has been shown [116, 26] that RANS models are unable to accurately predict the separation length, pressure distribution, and heat flux distribution in the interaction region and downstream on the ramp. The geometry for this experiment is illustrated in Figure A.1.

¹The experiment used natural transition [116], but the SA-BC transition model was not used for this case due to the poor performance of this model at these hypersonic conditions. The CFD results are therefore fully turbulent.

Figure A.1: Illustration of geometry for hypersonic wedge case.

The SA model used to this point (SA-noft2) [17] was found to produce poor results for this application. This model predicted no separated region, and thus the physical dynamics of interest for this problem were not predicted at all. Another variant of the SA model, SA-SALSA [83], has been shown to produce better predictions for shock boundary layer interactions, and has been implemented in codes typically applied to hypersonic applications [117]. The SU2 v5.0 package did not include the SA-SALSA variant, but it is a relatively straightforward modification, as outlined in Chapter 4, and it was implemented for use on this hypersonic SWTBLI application.

Figures A.2, A.3, and A.4 show the Mach, C_P, and temperature contour predictions for the baseline SA-SALSA model at the interaction region. Note that the computational domain extends much further to the left and right of the region shown, in order to develop the incoming boundary layer from freestream conditions on the plate and to predict the flow conditions downstream of the interaction region on the wedge.

Figure A.2: Mach number contours for baseline SA-SALSA model.

Figure A.3: Pressure coefficient contours for baseline SA-SALSA model.

Figure A.4: Temperature (K) contours for baseline SA-SALSA model.

Note that in Figure A.2 the boundary layer separates in front of the corner and a recirculating region develops. A shock forms in the boundary layer (the separation shock) at the separation location and then impinges on the much stronger shock formed by the wedge. Note that there is a large range of pressure coefficient magnitudes, as shown in Figure A.3. Additionally, extremely high temperatures are predicted in the interaction region, as shown in Figure A.4. These temperatures are greatest near the reattachment location on the wedge. The wall is modeled as an isothermal boundary with T_wall = 300 K, and this extremely high temperature produces very high heat flux at the wall.

A.2 FI-Classic

The FI-Classic procedure was applied to the hypersonic wedge case using experimental pressure coefficient information C_P. This pressure distribution was interpolated to the wall boundary nodes in the computational domain to produce the target pressure distribution \bar{C}_P. Note that for this application the range of C_P values is very large. Upstream of the wedge the pressures will be low, as there will only be a very weak compression from the boundary layer growth, and a stronger, but still relatively small, compression from the separation shock upstream of the wedge corner. Downstream of the wedge corner the pressures will be dramatically higher from the compression of the very strong oblique shock that turns the air to the wedge angle. To account for this large magnitude range of the target (experimental) pressure distribution, the objective function was normalized by the target distribution at each point in the domain. The objective function is then given by Equation A.1.

J_c(\beta) = \frac{1}{2}\sum_{i=1}^{k}\left(\frac{\bar{C}_{p_{exp},i} - C_{P,i}(\beta)}{\bar{C}_{p_{exp},i}}\right)^2 + \lambda\sum_{j=1}^{n}\left(\beta_j - 1.0\right)^2    (A.1)

The convergence history of Equation A.1 is shown in Figure A.5. Note that the convergence is generally quite good, but requires more evaluations than other cases presented in this thesis. Additionally, the large increases observed in the intermediate evaluations (12-21) are due to line search steps that resulted in separation lengths larger than the experiment. This gives a large value for the numerator in the first term of Equation A.1 that is normalized by a very small value, yielding a very large objective function value. This is not an issue, as the optimizer quickly found a lower objective value in the subsequent step of the line search.
Despite the good trend in objective function value, the inversion was terminated after 30 evaluations because subsequent attempts to improve the objective further were not successful due to flow solver convergence issues.

Figure A.5: Objective function convergence for FI-Classic hypersonic wedge application.

The cause of these issues is apparent when the correction is visualized (Figures A.6 and A.7). Note that at the interaction region (Figure A.6) the magnitude of the correction is very large (on the order of 5) and is very localized. The correction in this case is required to be very large and in very specific locations in the interaction region. Note that there is a very large correction at the separation location, followed by a region where β ≈ 1.0. Then, closer to the wall, the shear layer above the separated region shows β < 1.0, corresponding to destruction of eddy viscosity, while the adjacent separated region shows a very large positive correction, indicating the eddy viscosity should be increased there. Subsequent evaluations of the optimizer attempted to produce even larger magnitude corrections than shown in Figure A.6; however, this resulted in numerical convergence issues of the flow solver, and therefore the inversion was terminated.

Figure A.6: Correction (β) at interaction region for hypersonic wedge FI-Classic application.

Figure A.7: Correction (β) upstream of interaction region for hypersonic wedge FI-Classic application.

Nevertheless, the FI-Classic inversion substantially improved the pressure distribution, and the FI-Classic inverse solution more closely matches the experimental data, as shown in Figure A.8. Note that the separation length is predicted very well by the inverse solution, and the resulting pressure distribution downstream of the interaction region is also improved. The magnitude of the pressure in the separated region is improved, but is still substantially lower than the experiment.

Figure A.8: Pressure distribution near the interaction region for FI-Classic hypersonic wedge application.

Experimental data for wall heat flux is also available for this experiment. The heat flux distribution results are shown in Figure A.9 in units of W/m².

Figure A.9: Wall heat flux near the interaction region for FI-Classic hypersonic wedge application (W/m²).

Note that the heat flux results show positive and negative trends. The baseline SA-SALSA model correctly predicts the heat flux on the flat plate ahead of the interaction region. The large corrections in this region, which are shown to help the pressure distribution, negatively impact the heat flux prediction on the flat plate. Additionally, the heat flux prediction was adversely affected in the separated region. Clearly this is problematic. Nevertheless, the heat flux distribution downstream of the corner shows substantial improvement over the baseline model. The peak heat flux prediction is poor for the baseline SA-SALSA model, but much improved for the inverse solution.

A.3 Conclusions

A hypersonic wedge producing a SWTBLI was considered. Compared to the other SWTBLI cases presented in this thesis, this case produces a much stronger shock, and therefore it is expected that the required correction to the baseline model will be comparatively larger. The baseline SA model was shown to produce poor results (no separated region), and therefore the SA-SALSA turbulence model was implemented in SU2.
Nevertheless, the FI-Classic inversion substantially improved the pressure distribution, and the FI-Classic inverse solution more closely matches the experimental data, as shown in Figure A.8. Note that the separation length is predicted very well by the inverse solution, and the resulting pressure distribution downstream of the interaction region is also improved. The magnitude of the pressure in the separated region is improved as well, but it is still substantially lower than the experiment.

Experimental data for wall heat flux is also available for this experiment. The heat flux distribution results are shown in Figure A.9, in units of W/m².

Figure A.7: Correction (β) upstream of interaction region for hypersonic wedge FI-Classic application.

Figure A.8: Pressure distribution near the interaction region for FI-Classic hypersonic wedge application.

Figure A.9: Wall heat flux near the interaction region for FI-Classic hypersonic wedge application (W/m²).

The heat flux results show both positive and negative trends. The baseline SA-SALSA model correctly predicts the heat flux on the flat plate ahead of the interaction region; the large corrections in this region, though shown to help the pressure distribution, negatively impact the heat flux prediction there. Additionally, the heat flux prediction was adversely affected in the separated region. Clearly this is problematic. Nevertheless, the heat flux distribution downstream of the corner shows substantial improvement over the baseline model: the peak heat flux prediction is poor for the baseline SA-SALSA model, but much improved for the inverse solution.

A.3 Conclusions

A hypersonic wedge producing a SWTBLI was considered. Compared to the other SWTBLI cases presented in this thesis, this case produces a much stronger shock, and it is therefore expected that the required correction to the baseline model will be comparatively larger. The baseline SA model was shown to produce poor results (no separated region), and therefore the SA-SALSA turbulence model was implemented in SU2. The SA-SALSA model predicted a separated region, but the separation length and the pressure throughout the interaction region were still poorly predicted.

The FI-Classic method was then applied. The experimentally measured pressure distribution was used to construct the cost function, and the error term was normalized by the local experimental CP to account for the large range in CP magnitude for this case. The inversion was performed and the objective function showed good improvement. However, the required correction eventually degraded the convergence of the flow solver, and therefore the inversion was terminated while the cost function was still improving. Nevertheless, substantial improvement in the pressure distribution was observed prior to these numerical issues. The required correction was very large in magnitude; in the interaction region in particular, the correction changes sign and magnitude at very specific locations in the separated region. Such a poorly regularized correction could be difficult to learn, either offline in the FIML-Classic framework or during the inversion for FIML-Direct applications.

Additionally, it was observed that the inverse solution negatively affected the prediction of the wall heat flux on the flat plate and in the interaction region. Despite this, the heat flux prediction downstream of the interaction region was improved. Overall, the heat flux results are disappointing; however, it is likely that the inverse solution could be improved by including the experimental heat flux data in the objective function. Determining the best objective function for hypersonic SWTBLI cases remains an area for future work.

Appendix B: Onera M6

B.1 Overview

The FIML framework was applied to the Onera M6 transonic wing. Unlike the other RANS applications in this thesis, this case is three dimensional. It is a good demonstration of the advantage of implementing the FIML framework in the SU2 package, and it shows that the FIML approach can be applied to more complicated geometries and to cases with grid sizes large enough to model practical engineering applications.

The Onera M6 transonic wing is a low aspect ratio design with experimental data presented by Schmitt and Charpin [118]. For the case considered, the freestream Mach number is M∞ = 0.84 and the wing is at 3.06° angle of attack. Under these conditions the wing is transonic, and experimental data shows a "lambda" shock structure on the upper surface of the wing [119], the name referring to the shape of the shock structure. Similar to the RAE2822 airfoil, this establishes SWTBLIs on the upper surface with strong adverse pressure gradients that are difficult to predict accurately with RANS models.

Predictions for the Onera M6 wing were made using the SA turbulence model. Figure B.1 displays the surface pressure contours for this case. Note the lambda shock structure and the adverse pressure gradient behind the shocks, similar to the RAE2822 case.

Figure B.1: Surface pressure CP contours for baseline SA model on upper surface of Onera M6 wing.

B.2 FI-Classic

The FI-Classic procedure was applied to the Onera M6 wing. The experimental pressure distribution was used as the higher fidelity data. This data is relatively sparse, available at only 7 chordwise stations along the wing [120, 118]. A 2D interpolation was performed to find the expected (interpolated) pressure coefficient C*_P at each node on the surface of the wing; a minimal sketch of this step is given below. Note that no extrapolations were performed, so portions of the wing near the root and tip are excluded from the objective function calculation.
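The following is a self-contained sketch of this interpolation step using scipy.interpolate.griddata with synthetic stand-in data. The station locations, the analytic C_P surrogate, and the randomly placed surface nodes are illustrative assumptions, not the experimental values:

```python
import numpy as np
from scipy.interpolate import griddata

# Illustrative stand-in data: C_P samples at 7 spanwise stations, each
# sampled chordwise -- standing in for the experimental data [118, 120].
stations = np.array([0.2, 0.44, 0.65, 0.8, 0.9, 0.95, 0.99])  # span fraction eta
xc = np.linspace(0.0, 1.0, 25)                                # chordwise x/c
exp_points = np.array([(x, eta) for eta in stations for x in xc])
exp_cp = -np.cos(3.0 * exp_points[:, 0]) * (1.0 - 0.5 * exp_points[:, 1])

# Surface mesh nodes (x/c, eta) where the target distribution C_P* is needed.
rng = np.random.default_rng(0)
surf_nodes = rng.uniform(0.0, 1.0, size=(5000, 2))

# Linear 2D interpolation; nodes outside the convex hull of the data
# (here, eta < 0.2 or eta > 0.99, i.e., near root and tip) are left as
# NaN, so no extrapolation is performed.
cp_target = griddata(exp_points, exp_cp, surf_nodes, method="linear")

# Only nodes with a valid target contribute to the objective function.
valid = ~np.isnan(cp_target)
print(f"{valid.sum()} of {len(surf_nodes)} nodes enter the objective")
```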
The objective function for this case is the same as for the RAE2822 FI-Classic implementation, and is given by Equation B.1. For this case, λ = 10⁻⁶ was used.

\[
J_c(\beta) = \frac{1}{2}\sum_{i=1}^{n}\left(C^{*}_{P,i} - C_{P,i}(\beta)\right)^{2} + \lambda\sum_{j=1}^{k}\left(\beta_j - 1.0\right)^{2} \tag{B.1}
\]

The objective function history during the inversion (min_β J_c(β)) is shown in the bottom right panel of Figure B.2, along with the pressure distribution at the span locations (η) where experimental data is available.

Figure B.2: FI-Classic results for Onera M6 wing.

The largest improvement is shown at stations η = 0.2, 0.44, and 0.95, better illustrated in Figures B.3, B.4, and B.5. This improvement is primarily due to moving the aft shock location slightly forward at these stations.

Figure B.3: Inverse results at span η = 0.2.

Figure B.4: Inverse results at span η = 0.44.

Figure B.5: Inverse results at span η = 0.95.

Additionally, the improvement in prediction can be visualized with the error scatterplot shown in Figure B.6, which again shows that the most improvement is near the mid-chord of the wing, closer to the root of the span.

Figure B.6: Scatterplot colored by error in pressure coefficient (|C*_P − CP|).

In 3D it is more difficult to visualize the correction (β), but in this case the correction is largely localized near η = 0.2, as shown in Figures B.7 and B.8, with a small correction at η = 0.95, as shown in Figure B.9. Note that at η = 0.2 the correction almost universally decreases eddy viscosity production, which is consistent with the observations for the RAE2822 airfoil. The SA model tends to over-predict eddy viscosity in adverse pressure gradients, and the FI-Classic inversion shows that predictions are improved if the eddy viscosity is decreased in these regions.

Figure B.7: Correction (β) at span η = 0.2.

Figure B.8: Correction (β) at span η = 0.2, with wing surface contoured by CP.

Figure B.9: Correction (β) at span η = 0.95.

B.3 Conclusions

FI-Classic results for the Onera M6 wing were presented in this appendix. These results demonstrate that the FIML approach can also be applied to 3D RANS applications at a practical scale. Naturally the computational expense increases; however, this case demonstrates that the added cost of considering 3D cases is not insurmountable.

For this application, experimental pressure data were interpolated onto the surface of the wing at every node in the computational domain. The inversion converged well, and the resulting error distribution showed that the primary increase in accuracy was towards the root of the wing. The location of the second shock was moved slightly forward, which improved the pressure distribution. When the correction was visualized, it was shown that the largest correction was in this region, and that the correction almost universally decreased eddy viscosity production. This result is consistent with the 2D RAE2822 airfoil result, in that the eddy viscosity tends to be over-produced by the baseline SA model. This 3D application demonstrates that the FIML procedure can be applied to the more complicated geometries and larger computational domains that will be required for practical engineering applications.

Bibliography

[1] Joseph A Schetz and Rodney DW Bowersox. Boundary layer analysis. American Institute of Aeronautics and Astronautics, 2011.
[2] Francis H Clauser. The turbulent boundary layer. In Advances in Applied Mechanics, volume 4, pages 1–51. Elsevier, 1956.

[3] Shivaji Medida. Correlation-based Transition Modeling for External Aerodynamic Flows. PhD thesis, University of Maryland, College Park, MD, USA, July 2014. URL http://hdl.handle.net/1903/15150.

[4] Holger Babinsky and John K Harvey. Shock Wave-Boundary-Layer Interactions, volume 32. Cambridge University Press, 2011.

[5] Ludwig Prandtl. Motions of fluids with very little viscosity. National Advisory Committee for Aeronautics, (452), 1928. URL https://ntrs.nasa.gov/search.jsp?R=19930090813.

[6] Osborne Reynolds. III. An experimental investigation of the circumstances which determine whether the motion of water shall be direct or sinuous, and of the law of resistance in parallel channels. Proceedings of the Royal Society of London, 35(224-226):84–99, 1883. doi: 10.1098/rspl.1883.0018. URL http://rspl.royalsocietypublishing.org/content/35/224-226/84.abstract.

[7] Bruno Eckhardt. Introduction. Turbulence transition in pipe flow: 125th anniversary of the publication of Reynolds' paper. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 367(1888):449–455, 2009. ISSN 1364-503X. doi: 10.1098/rsta.2008.0217. URL http://rsta.royalsocietypublishing.org/content/367/1888/449.

[8] Theodore von Kármán. Mechanical similitude and turbulence. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen - Fachgruppe I (Mathematik), 5:58–76, March 1930. URL https://ntrs.nasa.gov/search.jsp?R=19930094805.

[9] Donald Coles. The law of the wake in the turbulent boundary layer. Journal of Fluid Mechanics, 1(2):191–226, 1956. doi: 10.1017/S0022112056000135.

[10] William K George. Is there a universal log law for turbulent wall-bounded flows? Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 365(1852):789–806, 2007. ISSN 1364-503X. doi: 10.1098/rsta.2006.1941. URL http://rsta.royalsocietypublishing.org/content/365/1852/789.

[11] John Kim, Parviz Moin, and Robert Moser. Turbulence statistics in fully developed channel flow at low Reynolds number. Journal of Fluid Mechanics, 177:133–166, 1987. doi: 10.1017/S0022112087000892.

[12] P.S. Bernard and J.M. Wallace. Turbulent Flow: Analysis, Measurement, and Prediction, chapter 4, pages 111–112. John Wiley & Sons, 2002. ISBN 9780471332190. URL https://books.google.com/books?id=N44_L3SCN6EC.

[13] Anand Pratap Singh, Shivaji Medida, and Karthik Duraisamy. Machine-learning-augmented predictive modeling of turbulent separated flows over airfoils. AIAA Journal, 55(7):2215–2227, 2017. URL https://doi.org/10.2514/1.J055595.

[14] Alan Celic and Ernst H Hirschel. Comparison of eddy-viscosity turbulence models in flows with adverse pressure gradient. AIAA Journal, 44(10):2156–2169, 2006.

[15] Aleksey A Matyushenko, Eugeniy V Kotov, and Andrey V Garbaruk. Calculations of flow around airfoils using two-dimensional RANS: an analysis of the reduction in accuracy. St. Petersburg Polytechnical University Journal: Physics and Mathematics, 3(1):15–21, 2017.

[16] Dan M. Somers and James Tangler. Design and experimental results for the S809 airfoil. Technical report, National Renewable Energy Laboratory, January 1997. URL https://www.nrel.gov/docs/legosti/old/6918.pdf.

[17] P. R. Spalart and S. R. Allmaras. A one-equation turbulence model for aerodynamic flows. In 30th Aerospace Sciences Meeting and Exhibit, 1992. URL https://doi.org/10.2514/6.1992-439.
[18] Samet Caka Cakmakcioglu, Onur Bas, and Unver Kaynak. A correlation-based algebraic transition model. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 232(21):3915–3929, 2018. doi: 10.1177/0954406217743537. URL https://doi.org/10.1177/0954406217743537.

[19] PR Spalart, WH Jou, M Strelets, and SR Allmaras. Comments on the feasibility of LES for wings, and on a hybrid RANS/LES approach. In 1st AFOSR Int. Symp. Eng. Turb. Modelling and Measurements, pages 24–26, May 1997.

[20] Philippe R Spalart. Detached-eddy simulation. Annual Review of Fluid Mechanics, 41:181–202, 2009.

[21] Michael S Holden, Timothy P Wadhams, John K Harvey, and Graham V Candler. Comparisons between measurements in regions of laminar shock wave boundary layer interaction in hypersonic flows with Navier-Stokes and DSMC solutions. Technical report, Aerothermal and Aero-Optics Evaluation Center, Buffalo, New York, 2006.

[22] Michael S Holden, Tim P Wadhams, and Matthew G MacLean. Measurements in regions of shock wave/turbulent boundary layer interaction from Mach 4 to 10 for open and blind code evaluation/validation. In 21st AIAA Computational Fluid Dynamics Conference, page 2836, 2013.

[23] Michael Holden. Measurements in regions of shock wave/turbulent boundary layer interaction from Mach 4 to 7 at flight duplicated velocities to evaluate and improve the models of turbulence in CFD codes. In 2018 Fluid Dynamics Conference, page 3706, 2018.

[24] PH Cook, MCP Firmin, and MA McDonald. Aerofoil RAE 2822: pressure distributions, and boundary layer and wake measurements. RAE, 1977.

[25] Christopher L Rumsey. Recent developments on the turbulence modeling resource website. In 22nd AIAA Computational Fluid Dynamics Conference, page 2927, 2015. URL https://turbmodels.larc.nasa.gov/.

[26] Joseph G Marvin, James L Brown, and Peter A Gnoffo. Experimental database with baseline CFD solutions: 2-D and axisymmetric hypersonic shock-wave/turbulent-boundary-layer interactions. 2013. URL https://turbmodels.larc.nasa.gov/Other_exp_Data/SBLI_various_marvin/NASATM_2013_216604.pdf.

[27] Francis K Acquaye. Evaluation of various turbulence models for shock-wave boundary layer interaction flows. Master's thesis, Washington University in St. Louis, 2016. URL https://doi.org/10.7936/K7RJ4GS6.

[28] Sergio Pirozzoli, Alexandre Beer, Matteo Bernardini, and Francesco Grasso. Computational analysis of impinging shock-wave boundary layer interaction under conditions of incipient separation. Shock Waves, 19(6):487, 2009.

[29] Dennis C Jespersen, Thomas H Pulliam, and Marissa Lynn Childs. Overflow turbulence modeling resource validation results. 2016. URL https://ntrs.nasa.gov/search.jsp?R=20190000252.

[30] Brandon Morgan, K Duraisamy, N Nguyen, S Kawai, and SK Lele. Flow physics and RANS modelling of oblique shock/turbulent boundary layer interaction. Journal of Fluid Mechanics, 729:231–284, 2013.

[31] Pietro Catalano and Marcello Amato. An evaluation of RANS turbulence modelling for aerodynamic applications. Aerospace Science and Technology, 7(7):493–509, 2003.

[32] Keerti K Bhamidipati, Daniel A Reasor, and Crystal L Pasiliao. Unstructured grid simulations of transonic shockwave-boundary layer interaction-induced oscillations. In 22nd AIAA Computational Fluid Dynamics Conference, page 2287, 2015.
[33] Gérard Dreyfus. Neural Networks: Methodology and Applications. Springer-Verlag, Berlin Heidelberg, 1 edition, 2005.

[34] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:451–460, 1989.

[35] Jamshid A. Samareh and Jay Ming Wong. Training knowledge bots for physics-based simulations using artificial neural networks. Technical report, NASA, November 2014. URL https://ntrs.nasa.gov/search.jsp?R=20150000596.

[36] Jian-Xun Wang, Jin-Long Wu, and Heng Xiao. A physics informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data. Physical Review Fluids, 2(3), 2017.

[37] Jin-Long Wu, Jian-Xun Wang, Heng Xiao, and Julia Ling. A priori assessment of prediction confidence for data-driven turbulence modeling. Flow, Turbulence and Combustion, 99(1):25–46, 2017. URL https://doi.org/10.1007/s10494-017-9807-0.

[38] J. Ling and J. Templeton. Evaluation of machine learning algorithms for prediction of regions of high Reynolds averaged Navier Stokes uncertainty. Physics of Fluids, 27(8):085103, 2015. doi: 10.1063/1.4927765. URL https://aip.scitation.org/doi/abs/10.1063/1.4927765.

[39] Eric J. Parish and Karthik Duraisamy. A paradigm for data-driven predictive modeling using field inversion and machine learning. Journal of Computational Physics, 305:758–774, 2016. ISSN 0021-9991. doi: https://doi.org/10.1016/j.jcp.2015.11.012. URL http://www.sciencedirect.com/science/article/pii/S0021999115007524.

[40] Karthik Duraisamy, Gianluca Iaccarino, and Heng Xiao. Turbulence modeling in the age of data. Annual Review of Fluid Mechanics, 51:357–377, 2019.

[41] Anand Pratap Singh and Karthik Duraisamy. Using field inversion to quantify functional errors in turbulence closures. Physics of Fluids, 28(4):045110, 2016. doi: 10.1063/1.4947045. URL https://doi.org/10.1063/1.4947045.

[42] Anand Pratap Singh. A Framework to Improve Turbulence Models Using Full-field Inversion and Machine Learning. PhD thesis, University of Michigan, Ann Arbor, MI, USA, June 2018. URL http://hdl.handle.net/2027.42/144034.

[43] Karthik Duraisamy, Ze Jia Zhang, and Anand Pratap Singh. New approaches in turbulence and transition modeling using data-driven techniques. In 53rd AIAA Aerospace Sciences Meeting, AIAA SciTech Forum, 2015. URL https://doi.org/10.2514/6.2015-1284.

[44] Hiroshi Kato, Keiichi Ishiko, and Akira Yoshizawa. Optimization of parameter values in the turbulence model aided by data assimilation. AIAA Journal, pages 1512–1523, 2016.

[45] Rosario Lanzafame, Stefano Mauro, and Michele Messina. 2D CFD modeling of H-Darrieus wind turbines using a transition turbulence model. Energy Procedia, 45:131–140, 2014.

[46] S Mauro, R Lanzafame, M Messina, and D Pirrello. Transition turbulence model calibration for wind turbine airfoil characterization through the use of a micro-genetic algorithm. International Journal of Energy and Environmental Engineering, 8(4):359–374, 2017.

[47] PA Costa Rocha, HH Barbosa Rocha, FO Moura Carneiro, ME Vieira da Silva, and A Valente Bueno. k-ω SST (shear stress transport) turbulence model calibration: A case study on a small scale horizontal axis wind turbine. Energy, 65:412–418, 2014.

[48] PA Costa Rocha, HH Barbosa Rocha, FO Moura Carneiro, ME Vieira da Silva, and C Freitas de Andrade. A case study on the calibration of the k-ω SST (shear stress transport) turbulence model for small scale wind turbines designed with cambered and symmetrical airfoils. Energy, 97:144–150, 2016.
[49] John Schaefer, Serhat Hosder, Thomas West, Christopher Rumsey, Jan-Renee Carlson, and William Kleb. Uncertainty quantification of turbulence model closure coefficients for transonic wall-bounded flows. AIAA Journal, 55(1):195–213, 2016.

[50] John Schaefer, Andrew Cary, Mori Mani, and Philippe Spalart. Uncertainty quantification and sensitivity analysis of SA turbulence model coefficients in two and three dimensions. In 55th AIAA Aerospace Sciences Meeting, AIAA SciTech Forum, 2017. URL https://doi.org/10.2514/6.2017-1710.

[51] John Schaefer, Serhat Hosder, Thomas West, Christopher Rumsey, Jan-Renee Carlson, and William Kleb. Uncertainty quantification of turbulence model closure coefficients for transonic wall-bounded flows. AIAA Journal, 55(1):195–213, 2017. URL https://doi.org/10.2514/1.J054902.

[52] John Anthony Schaefer. Uncertainty quantification of turbulence model closure coefficients for transonic wall-bounded flows. Master's thesis, Missouri University of Science and Technology, Rolla, MO, USA, 2014. URL http://scholarsmine.mst.edu/masters_theses/7480/.

[53] Jaideep Ray, Sophia Lefantzi, Srinivasan Arunajatesan, and Lawrence J DeChant. Bayesian calibration of a k-ε turbulence model for predictive jet-in-crossflow simulations. In 44th AIAA Fluid Dynamics Conference, page 2085, 2014.

[54] Serge Guillas, Nina Glover, and Liora Malki-Epshtein. Bayesian calibration of the constants of the k-ε turbulence model for a CFD model of street canyon flow. Computer Methods in Applied Mechanics and Engineering, 279:536–553, 2014.

[55] Todd A Oliver and Robert D Moser. Bayesian uncertainty quantification applied to RANS turbulence models. In Journal of Physics: Conference Series, volume 318, page 042032. IOP Publishing, 2011.

[56] Dimitrios I Papadimitriou and Costas Papadimitriou. Bayesian uncertainty quantification of turbulence models based on high-order adjoint. Computers & Fluids, 120:82–97, 2015.

[57] WN Edeling, Pasquale Cinnella, and Richard P Dwight. Predictive RANS simulations via Bayesian model-scenario averaging. Journal of Computational Physics, 275:65–91, 2014.

[58] Hassan Raiesi, Ugo Piomelli, and Andrew Pollard. Evaluation of turbulence models using direct numerical and large-eddy simulation data. Journal of Fluids Engineering, 133(2):021203, 2011.

[59] Paul A Durbin. Some recent developments in turbulence closure modeling. Annual Review of Fluid Mechanics, 50:77–103, 2018.

[60] Michael Emory, Johan Larsson, and Gianluca Iaccarino. Modeling of structural uncertainties in Reynolds-averaged Navier-Stokes closures. Physics of Fluids, 25(11):110822, 2013.

[61] Michael Emory, Rene Pecnik, and Gianluca Iaccarino. Modeling structural uncertainties in Reynolds-averaged computations of shock/boundary layer interactions. In 49th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, page 479, 2011.

[62] Aashwin Mishra, J Mukhopadhaya, Gianluca Iaccarino, and Juan Alonso. Uncertainty quantification of turbulence models for complex aerospace flows, December 2018.

[63] Aashwin Ananda Mishra, Jayant Mukhopadhaya, Gianluca Iaccarino, and Juan Alonso. Uncertainty estimation module for turbulence model predictions in SU2. AIAA Journal, 57(3):1066–1077, 2018.

[64] Heng Xiao, J-L Wu, J-X Wang, Rui Sun, and CJ Roy. Quantifying and reducing model-form uncertainties in Reynolds-averaged Navier-Stokes simulations: A data-driven, physics-informed Bayesian approach. Journal of Computational Physics, 324:115–136, 2016.
[65] Jin-Long Wu, Jian-Xun Wang, and Heng Xiao. A Bayesian calibration-prediction method for reducing model-form uncertainties with application in RANS simulations. Flow, Turbulence and Combustion, 97(3):761–786, 2016.

[66] Heng Xiao and Paola Cinnella. Quantification of model uncertainty in RANS simulations: a review. Progress in Aerospace Sciences, 2019.

[67] WN Edeling, G Iaccarino, and P Cinnella. Data-free and data-driven RANS predictions with quantified uncertainty. Flow, Turbulence and Combustion, 100(3):593–616, 2018.

[68] Q Wang and EA Dow. Quantification of structural uncertainties in the k-ω turbulence model. In Proceedings of the Summer Program, page 41, 2010.

[69] Brendan D Tracey, Karthikeyan Duraisamy, and Juan J Alonso. A machine learning strategy to assist turbulence model development. In 53rd AIAA Aerospace Sciences Meeting, page 1287, 2015.

[70] Anand Pratap Singh, Shaowu Pan, and Karthikeyan Duraisamy. Characterizing and improving predictive accuracy in shock-turbulent boundary layer interactions using data-driven models. In 55th AIAA Aerospace Sciences Meeting, AIAA SciTech Forum, 2017. URL https://doi.org/10.2514/6.2017-0314.

[71] Anand Pratap Singh, Karthikeyan Duraisamy, and Ze Jia Zhang. Augmentation of turbulence models using field inversion and machine learning. In 55th AIAA Aerospace Sciences Meeting, page 0993, 2017.

[72] Anand Pratap Singh, Racheet Matai, Asitav Mishra, Karthikeyan Duraisamy, and Paul A Durbin. Data-driven augmentation of turbulence models for adverse pressure gradient flows. In 23rd AIAA Computational Fluid Dynamics Conference, page 3626, 2017.

[73] Jonathan R Holland, James D Baeder, and Karthik Duraisamy. Towards integrated field inversion and machine learning with embedded neural networks for RANS modeling. In AIAA SciTech 2019 Forum, page 1884, 2019.

[74] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986. doi: 10.1038/323533a0. URL http://dx.doi.org/10.1038/323533a0.

[75] Michael JD Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7(2):155–162, 1964.

[76] David E Goldberg and John H Holland. Genetic algorithms and machine learning. Machine Learning, 3(2):95–99, 1988.

[77] John David Anderson and J Wendt. Computational Fluid Dynamics, volume 206. Springer, 1995.

[78] Jiri Blazek. Computational Fluid Dynamics: Principles and Applications. Butterworth-Heinemann, 2015.

[79] W Sutherland. LII. The viscosity of gases and molecular force. Lond. Edinburgh Dublin Philos. Mag. J. Sci., 36(223):507–531, 1893.

[80] Peter Davidson. Turbulence: An Introduction for Scientists and Engineers. Oxford University Press, 2015.

[81] Robin B Langtry and Florian R Menter. Correlation-based transition modeling for unstructured parallelized computational fluid dynamics codes. AIAA Journal, 47(12):2894–2906, 2009.

[82] FR Menter, R Langtry, and S Völker. Transition modelling for general purpose CFD codes. Flow, Turbulence and Combustion, 77(1-4):277–303, 2006.

[83] T Rung, U Bunge, M Schatz, and F Thiele. Restatement of the Spalart-Allmaras eddy-viscosity model in strain-adaptive formulation. AIAA Journal, 41(7):1396–1399, 2003.
[84] Russell Reed and Robert J Marks. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. 1999.

[85] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[86] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.

[87] Josef Stoer and Roland Bulirsch. Introduction to Numerical Analysis, volume 12. Springer Science & Business Media, 2013.

[88] Richard H Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.

[89] Peter Flach. Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press, 2012.

[90] Wes McKinney et al. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference, volume 445, pages 51–56. Austin, TX, 2010.

[91] John D Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.

[92] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[93] Charles George Broyden. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6(1):76–90, 1970.

[94] DI Papadimitriou and KC Giannakoglou. Direct, adjoint and mixed approaches for the computation of Hessian in airfoil design problems. International Journal for Numerical Methods in Fluids, 56(10):1929–1943, 2008.

[95] Devendra Ghate and Michael Giles. Efficient Hessian calculation using automatic differentiation. In 25th AIAA Applied Aerodynamics Conference, page 4059, 2007.

[96] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. URL http://www.scipy.org/.

[97] William Squire and George Trapp. Using complex variables to estimate derivatives of real functions. SIAM Review, 40(1):110–112, 1998. URL https://doi.org/10.1137/S003614459631241X.

[98] James C Newman, W Kyle Anderson, and David L Whitfield. Multidisciplinary sensitivity derivatives using complex variables. Mississippi State University Publication, MSSU-EIRS-ERC-98-08, 1998.

[99] Robert T Biedron, Jan-Renee Carlson, Joseph M Derlaga, Peter A Gnoffo, Dana P Hammond, William T Jones, Bill Kleb, Elizabeth M Lee-Rausch, Eric J Nielsen, Michael A Park, et al. FUN3D manual: 13.4. 2018.

[100] Martin S Ridout. Statistical applications of the complex-step method of numerical differentiation. The American Statistician, 63(1):66–74, 2009.

[101] Ravi Kiran and Kapil Khandelwal. Complex step derivative approximation for numerical evaluation of tangent moduli. Computers & Structures, 140:1–13, 2014.
[102] Steven G. Johnson. Notes on adjoint methods for 18.336. Online posting, December 2012. URL https://math.mit.edu/~stevenj/18.336/adjoint.pdf.

[103] Michael B Giles and Niles A Pierce. An introduction to the adjoint approach to design. Flow, Turbulence and Combustion, 65(3-4):393–415, 2000.

[104] Tim A Albring, Max Sagebaum, and Nicolas R Gauger. Efficient aerodynamic design using the discrete adjoint method in SU2. In 17th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, page 3518, 2016.

[105] Beckett Yx Zhou, Tim A Albring, Nicolas R Gauger, Thomas D Economon, and Juan J Alonso. An efficient unsteady aerodynamic and aeroacoustic design framework using discrete adjoint. In 17th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, page 3369, 2016.

[106] Thomas D Economon, Juan J Alonso, Tim A Albring, and Nicolas R Gauger. Adjoint formulation investigations of benchmark aerodynamic design cases in SU2. In 35th AIAA Applied Aerodynamics Conference, page 4363, 2017.

[107] Thomas D. Economon, Francisco Palacios, Sean R. Copeland, Trent W. Lukaczyk, and Juan Alonso. SU2: An open-source suite for multiphysics simulation and design. AIAA Journal, 54(3):828–846, 2016. URL https://doi.org/10.2514/1.J053813.

[108] Beckett Yx Zhou, Tim A Albring, Nicolas R Gauger, Thomas D Economon, Francisco Palacios, and Juan J Alonso. A discrete adjoint framework for unsteady aerodynamic and aeroacoustic optimization. In 16th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, page 3355, 2015.

[109] Beckett Yx Zhou, Tim Albring, Nicolas R Gauger, Carlos R Ilario da Silva, Thomas D Economon, and Juan J Alonso. A discrete adjoint approach for jet-flap interaction noise reduction. In 58th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, page 0130, 2017.

[110] M. Sagebaum, T. Albring, and N. R. Gauger. High-performance derivative computations using CoDiPack. arXiv preprint arXiv:1709.07229, 2017. URL https://arxiv.org/abs/1709.07229.

[111] Christopher L Rumsey. Proceedings of the 2004 workshop on CFD validation of synthetic jets and turbulent separation control. 2007.

[112] R Bush, G Power, and C Towne. Wind: The production flow solver of the NPARC alliance. In 36th AIAA Aerospace Sciences Meeting and Exhibit, page 935, 1998. URL https://www.grc.nasa.gov/WWW/wind/valid/raetaf/raetaf05/raetaf05.html.

[113] Philippe R Spalart, Sebastien Deck, Michael L Shur, Kyle D Squires, M Kh Strelets, and Andrei Travin. A new version of detached-eddy simulation, resistant to ambiguous grid densities. Theoretical and Computational Fluid Dynamics, 20(3):181, 2006.

[114] Robi Polikar, Lalita Upda, Satish S Upda, and Vasant Honavar. Learn++: An incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 31(4):497–508, 2001.

[115] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2018.

[116] Michael Holden, Timothy Wadhams, Matthew MacLean, and Erik Mundy. Experimental studies of shock wave/turbulent boundary layer interaction in high Reynolds number supersonic and hypersonic flows to evaluate the performance of CFD codes. In 40th Fluid Dynamics Conference and Exhibit, page 4468, 2010.

[117] Michael J Wright, Todd White, and Nancy Mangini. Data parallel line relaxation (DPLR) code user manual: Acadia-version 4.01.1. 2009.
[118] V Schmitt and F Charpin. Pressure distributions on the ONERA-M6-Wing at transonic Mach numbers. Experimental Data Base for Computer Program Assessment, 4, 1979.

[119] Alexander Kuzmin. On the lambda-shock formation on ONERA M6 wing. International Journal of Applied Engineering Research, 9(20):7029–7038, 2014.

[120] Daniel Destarac and Antoine Dumont. ONERA M6 wing test-case, original and TMR. 2016. URL https://turbmodels.larc.nasa.gov/Onerawingnumerics_val/ONERA_M6_Test_Case_TMR.pdf.