ABSTRACT 

 Title of Thesis:  DEVELOPMENT OF LOW-COST 
 AUTONOMOUS RESEARCH SYSTEMS 

 Logan M. Saar, Master of Science, 2023 

 Thesis directed by:  Professor Ichiro Takeuchi, Materials Science and 
 Engineering 

 A central challenge of materials discovery for improved technologies arises from the 

 increasing compositional, processing, and structural complexity involved when synthesizing 

 hitherto unexplored material systems.  Traditional Edisonian and combinatorial high-throughput 

 methods have not been able to keep up with the exponential growth in potential materials and 

 relevant property metrics.  Autonomously operated Self-Driving Labs (SDLs) - guided by the 

 optimal experiment design sub-field of machine learning, known as active learning - have arisen 

 as promising candidates for intelligently searching these high-dimensional search spaces.  In the 

 fields of biology, pharmacology, and chemistry, these SDLs have allowed for expedited 

 experimental discovery of new drugs, catalysts, and more.  However, in materials science, highly

 specialized workflows and bespoke robotics have limited the impact of SDLs and contributed to 

 their exorbitant costs. In order to equip the next generation workforce of scientists and advanced 

 manufacturers with the skills needed to coexist with, improve, and understand the benefits and 

 limitations of these autonomous systems, a low-cost and modular SDL must be available to 

 them.  This thesis describes the development of such a system and its implementation in an 

 undergraduate and graduate machine learning for materials science course.  The low-cost SDL 

 system developed is shown to be affordable for primary through graduate level adoption, and 

 provides a hands-on method for simultaneously teaching active learning, robotics, measurement 


 science, programming, and teamwork: all necessary skills for an autonomous compatible 

 workforce.  A novel hypothesis generation and validation active learning scheme is also 

 demonstrated in the discovery of simple composition/acidity relationships. 


 DEVELOPMENT OF LOW-COST AUTONOMOUS RESEARCH SYSTEMS 

 by 

 Logan M. Saar 

 Thesis submitted to the Faculty of the Graduate School of the 
 University of Maryland, College Park, in partial fulfillment 

 of the requirements for the degree of 
 Master of Science 

 2023 

 Advisory Committee 
 Professor Ichiro Takeuchi, Chair 
 Professor Ji-Cheng Zhao 
 Dr. Aaron Gilad Kusne 


 ©Copyright by 
 Logan M. Saar 

 2023 



Acknowledgements 
 

I am extremely lucky to have had so many encouraging and helpful people in my life 

over the past two and a half years, all of whom deserve appreciation for their support.   

To my research advisor, Dr. Ichiro Takeuchi, I am truly thankful for the opportunity I 

have had to receive your guidance and assistance throughout my research.  It has been a pleasure 

to be mentored by you and I am grateful for the chance you gave me to implement my research 

in the classroom as an educational tool.  To my other advisor, Dr. Gilad Kusne, I am equally 

thankful for your guidance and patience when helping me to overcome the hurdles in this project.  

To both of you, your constructive communication, patience, and friendliness have been a stellar 

example to me of not only how to conduct research, but also how to treat others well.   

Thank you to my entire research group and other students I have had the pleasure to work 

with during my time at university: you have made me feel welcome and helped me grow as an 

individual by sharing your knowledge and offering your support.  Thank you to Haotong for the 

countless hours of helping me troubleshoot code and help students; to Felix for the valuable 

assistance in facilitating the use of the robots in the classroom; to Alex, whose introduction to 

the project and previous work set me up well for success; and to the many others who have helped 

me in small ways - I hope to return the favor with my own support.   

I would like to give a big thanks to my family and my friends, who have always given me 

unconditional support, love, and loyalty.  You have all carried me through this busy time in my 

life and always made it easier with your attention and presence.   

Thank you to my defense committee members for being so generous with their time.  To 

Dr. JC Zhao, your support of this project and my education has been a true blessing, and I am 

extremely grateful for your confidence in me.  To Adaire Parker, Dr. Isabel Lloyd, and the entire 


MSE department at UMD: thank you for presenting me with all the opportunities to give back to 

the program and for your assistance in helping me navigate my coursework.  Finally, a big 

thanks to the National Institute of Standards and Technology for their financial support during 

the Summer Undergraduate Research Fellowship in 2021.   

 

Table of Contents 
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  1.1 Perspective & Motivation
  1.2 Thesis Overview
  1.3 Background
    1.3.1 Active Learning Schema
    1.3.2 Relevant ML Concepts
    1.3.3 Autonomous Experimentation in Materials Science
    1.3.4 Educating an SDL Compatible Workforce
  1.4 Thesis Outline: Ch.2 - Ch.4 (Development and Operation)
Chapter 2: Systems Development
  2.1 Design Principles
  2.2 Mechanical Design
  2.3 System Construction & Preparation
  2.4 Connecting to & Calibrating LEGOLAS
  2.5 Developmental Stages
Chapter 3: Educational Implementation
  3.1 Henderson-Hasselbalch Exercise
    3.1.1 Introductory Exercises
    3.1.2 Autonomous Closed Loop Exercise
  3.2 Class Implementations
    3.2.1 Fall 2021
    3.2.2 Fall 2022
    3.2.3 Lessons Learned
Chapter 4: Autonomous Model Exploration
  4.1 Bayesian Optimization
  4.2 Hypothesis Validation Objective
    4.2.1 Generating and Evaluating Candidate Hypotheses
    4.2.2 Informational Entropy Acquisition Function
Chapter 5: Future Work and Conclusions
  5.1 Future Work & Scope
    5.2.1 Alternate Educational Exercises
    5.2.2 System Modifications
  5.2 Conclusions
Appendices
  I: Associated Media
Bibliography
 


List of Tables 

Table 1.1 Summary of some recent low-cost SDL platforms

Table 2.1 Cost breakdown for a single LEGOLAS robot

Table 2.2 Summary of FDM printed parts

 

List of Figures 

Figure 1.1 Picture of LEGOLAS & its use in the Fall ’21 Machine Learning for Materials Science course at UMD

Figure 1.2 Pictorial representation of Bayes’ Theorem for iteratively updating our beliefs

Figure 1.3 Three target foundational pillars for educational development identified by an MGI working group

Figure 1.4 Images of low-cost SDLs from Table 1.1

Figure 1.5 Simplified pictorial representation for an SDL such as LEGOLAS

Figure 2.1 Experimental components on the trolley

Figure 2.2 Camera attached to the bottom of the trolley

Figure 2.3 All 3D printed parts needed to build one LEGOLAS robot

Figure 2.4 Fully assembled stand

Figure 2.5 Fully assembled bridge

Figure 2.6 Assembled trolley resting on the bridge

Figure 2.7 Sample space setup for the Henderson-Hasselbalch pH study

Figure 2.8 Location of relevant wires and electronic devices for LEGOLAS

Figure 2.9 GUI calibration window and liquid-volume/gear-step calibration using a mg digital scale

Figure 2.10 Example usage of fundamental movement functions and a synthesis/measurement loop

Figure 2.11 Liquid handling robot that inspired the LEGOLAS design

Figure 2.12 LEGOLAS (Generation I)

Figure 2.13 LEGOLAS (Generation II)

Figure 2.14 Classroom setup for Fall ’21 implementation

Figure 2.15 LEGOLAS (Generation III)

Figure 2.16 LEGOLAS (Generation IV)

Figure 2.17 Students working on LEGOLAS exercises during the Fall ’22 implementation

Figure 3.1 The Henderson-Hasselbalch equation

Figure 3.2 Percent error in the Henderson-Hasselbalch equation as a function of sodium hydroxide concentration

Figure 3.3 Gaussian process demonstrated for 5 samples within a composition range of %acid = [5-95]

Figure 3.4 Gaussian process evolution (RBF kernel) for first 5 experimental samples with an exploration-based CO

Figure 4.1 Sampling of potential models across the prior distribution determined by the kernel

Figure 4.2 Probabilistic interpretation of the GP for uncertainty quantification and propagation

Figure 4.3 Effect of new data on GP structure and a purely exploration-based acquisition function

Figure 4.4 Process-flow for experimentation, hypothesis generation, and evaluation

Figure 4.5 Process-flow for creating GMMs over sample space

Figure 4.6 Using the informational entropy metric to select the next composition

Figure 5.1 Camera images of sample wells from the color mixing study

Figure 5.2 Analog aqueous conductivity probe, spectrophotometer, and heating element

 
 Chapter 1: Introduction 

 1.1  Perspective & Motivation 

 In the past 50 years, automated systems - systems that perform tasks with little or no 

 human control - revolutionized the manner in which society operated [1].  These new 

 technologies increased productivity and efficiency by automating repetitive and precision-based 

 tasks; tasks for which humans were more poorly suited [2].  These automated systems, created 

 by and for humans, greatly improved society’s ability to produce on a large scale [1,2].  Inherent 

 to this revolution was a shift in the needs of the workplace.  On one hand, there was a new 

 demand for people to develop and improve the automated systems themselves [3].  On the other 

 hand, there was an increase in demand for workers - humans - who could tend to, interact with, 

 and understand both the hardware and software elements of the automated systems [3].  At all 

 levels of education, classes that taught the fundamentals of automated systems - from computer 

 programming to robotics - began receiving a greater degree of emphasis [1,3].  These were all 

 efforts to equip the next workforce with skills necessary to coexist with automated systems and 

 thrive in an automated world. 

 For materials science, this automated revolution expanded the ability to explore and 

 experiment with a larger space of compositions, processing parameters, and performance metrics 

 [4].  In large part, the substantial successes of combinatorial and high-throughput 

 experimentation (CHT) were made possible by automated systems that accelerated the synthesis 

 and characterization steps [4].  However, in automated systems, the “lead scientists” - the 

 experiment-directing, field-specific human experts - are never supplanted by the robotic 

 task-performing machines [4].  Incapable of straying beyond the means of their typical tasks - 

 even if preprogrammed to act differently in certain situations - the automated systems could not 

 take on the intuition-based activity of understanding the world through the scientific 

 method: observation, hypothesis, and experiment [5,6].  At most, they played a role in this 

 process, as a tool which could facilitate and accelerate experiments. 

 In the past few decades, with the advent of machine learning (ML) and the broader field 

 of artificial intelligence (AI), it has become possible to merge these new technologies with 

 automated robotics into what are known as  autonomous  systems.  Capable of  approximating 

 human intelligence - mainly the faculty of learning and adapting -  these systems were endowed 

 with adaptive algorithms for responding to varying stimuli in a non-preprogrammed, 

 non-predetermined fashion.  To what degree AI & ML constitute intelligence with respect to 

 our human definition of the word is already a subject of much debate, but what is clear - from 

 their widespread adoption (albeit still mostly in the development stage) and rising popularity - is that 

 they are going to play a large role in the world of the future [1,7-9]. Self-driving cars, 

 autonomous factories, autonomous drones, the internet-of-things: all examples of a data-driven 

 world in which machines have begun to usurp certain previously human-performed 

 decision-making tasks [10,11]. The pervading question, however, is to what degree these 

 autonomous systems could function in roles where the workflow is not only highly complex and 

 dynamic, but also not clearly defined or understood by humans themselves [5,12-14].  How may 

 an autonomous system perform within a profession in which the role can be ambiguous, 

 dependent on insight, and itself a method of discovery, such as that of a “lead scientist” 

 [5,7,8,12,15,16]? 

 To consider this question, as well as work to better illuminate what a “lead scientist” 

 really does, is one of the essential steps in preparing the ground for autonomous experimentation 

 (AE) as a fruitful field for cultivation [5,15,17].  A second related question that arises is: what 

 role will  humans  continue to play in science, and thereby what skills & fundamentals will be 

 useful for them to cultivate to continue thriving in this changing world? [18-21]   This thesis 

 attempts to approach and begin to answer both of these questions.  In the background section, the 

 fundamentals of ML, AI, and  active learning  will be  discussed in terms of their implementation 

 in AE.  Next, the merit of AE will be discussed with respect to prior successes and challenges 

 within the field of Materials Science and Engineering (MSE).  We present a LEGO-based 

 low-cost autonomous scientist as an embodiment of an autonomous science kit to be used for 

 educating the next generation workforce. The requirements for an autonomous compatible 

 workforce of  human  researchers will be discussed,  focusing on the skills needed and methods of 

 teaching those skills.  Additionally, the role of a “lead scientist” and the ability to potentially 

 encapsulate the scientific process of hypothesis generation and testing within an AE system’s 

 “intelligence” will be considered. 

 1.2 Thesis Overview 

 This thesis describes my development of low-cost robotic systems for closed-loop AE. 

 These systems are designed to be affordable, modular, and simple to operate to ensure usability 

 at all educational levels.  They are composed of inexpensive components, including LEGOs, to 

 allow for the affordability and modularity desired.  From here on out, they will be referred to as 

 LEGOLAS: LEGO-based Low-cost Autonomous Scientist(s) (Figure 1.1). 

 I began development of LEGOLAS during the Summer Undergraduate Research 

 Fellowship (SURF) program at the National Institute of Standards and Technology (NIST) in 

 2020.  The first task was building the robot and producing a live run with a purely 

 exploration-based campaign objective (Section 4.1).  Some figures shown in this section are from the 

 resulting colloquium presentation provided at the end of that summer [22].  I was then able to 

 further develop the Autonomous Model Exploration modules (Section 4.2) while working as an 

 undergraduate research assistant in the ML group at UMD (co-led by Dr. Ichiro Takeuchi at 

 UMD and Dr. Gilad Kusne at NIST).  Many of the figures in Sections 4.1 & 4.2 are from a 

 presentation I gave in December of 2021 at the Materials Research Society (MRS) Symposium 

 on Accelerating Experimental Materials Research with Machine Learning [23].  We were then 

 also able to submit a publication on this work to MRS Bulletin in May of 2022 (published 

 November 2022) [24].  Further publications on the use of LEGOLAS in the classroom for 

 solid-state materials research, and multi-objective studies, are expected in the coming months 

 and years. 

 Chapter 2 describes the design principles and methods I used in determining how to 

 construct, operate, and interface with LEGOLAS.  Chapter 3 then reviews the use of LEGOLAS 

 in two undergraduate / graduate courses for Machine Learning for Materials Science (ENMA 

 437/637) offered at the University of Maryland (UMD) (Figure 1.1).  In these classes, I was the 

 lead teaching fellow and designed the exercises for completion, which served as the final project 

 for students. 

 Chapter 4 discusses the use of LEGOLAS to perform novel on-the-fly hypothesis 

 generation and validation using symbolic regression and an acquisition function rooted in the use 

 of informational entropy as a metric of uncertainty [25,26].  The process of active learning 

 development displayed in Chapter 4 is meant to be a testament to the ability of LEGOLAS to 

 facilitate a creative and responsive learning environment in which the encoding of patterns of 

 human “intelligence” into AE systems is more easily achieved.  Chapter 5 talks about the future 

 for LEGOLAS and the educational opportunities it provides. 

Figure 1.1: (left) Picture of LEGOLAS & (right) its use in the Fall ‘21 Machine Learning for
Materials Science course at UMD

1.3 Background

1.3.1 Active Learning Schema

Active Learning is itself a sub-field of ML, and its application in AE systems is what

distinguishes them from the rigid, prescriptive behaviors of purely automated systems.  AE

systems are flexible and capable of guiding and altering experimental design - that is, the choice

of experimental prerequisites (composition, processing parameters) and/or characterization

schemes (means of collecting feedback/data) - to achieve some user-defined “goal” [27].  These

goals, known as campaign objectives (COs), may be divided into roughly 3 main categories:

exploration, exploitation, & mechanistic study [27].  These COs will be described in more detail

later but briefly:

- an exploration CO aims to “explore” a large amount of sample space, prioritizing the

investigation of areas in which we have little information (i.e., the unknown),

- an exploitation CO - also known as an optimization CO - usually aims to maximize or

minimize some response metric (feedback or performance measure) and thereby

investigate areas in which we believe the optimal metric could be found (i.e., use the

known to make a prediction),

- and a mechanistic study CO aims to either determine or attribute an underlying functional

form to the data - a relation between input parameters and output values - with the

highest possible confidence, always considering that this form is selected from amongst a sea

of other potential descriptive forms.

 In reality, most active learning schemas are a combination of these COs, and will implement 

 them together or in succession depending on the nature of the experiment.  As an example, in a 

 hypothetical materials “discovery” experiment, it may be beneficial to first  explore  the 

 composition space, conducting initial experiments into unknown areas of sample space, and then 

 - once some of the sample space is known - shift to an  exploitation  CO in which subsequent 

 experiments search at or nearby established maxima/minima.  In this way, the active learning 

 schema provides an “intelligent” means of searching through the sample space, potentially 

 avoiding the headache of investigating all samples (i.e., exploring the entire time), and 

 additionally avoiding the pitfall of trying to optimize the property when nothing is known (i.e., 

 “shooting in the dark,” or exploiting from the start).  This idea of alternating between COs in an 

 active learning schema is known as “scheduling.”  It is important to note that in all these COs, 

 we are typically basing our decision making criteria on some idea of the assumed  value  of 

 running a certain experiment.  The various methods for quantifying this value and using it for 

 experimental guidance are at the heart of active learning, and will be elucidated in more detail 

 when  acquisition functions  are discussed. 
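
 The scheduling idea described above can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions, not the LEGOLAS implementation: the function names, the use of distance to the nearest measured point as a stand-in for "the unknown," the fixed switch from exploration to exploitation after a set number of iterations, and the toy response function are all hypothetical.

```python
def distance_to_nearest(x, measured_x):
    """Distance from candidate x to the closest already-measured input."""
    return min(abs(x - m) for m in measured_x)


def choose_next(candidates, measured, campaign_objective):
    """Pick the next experiment under a given campaign objective (CO).

    `measured` maps each sampled input to its observed response.
    """
    untested = [c for c in candidates if c not in measured]
    if campaign_objective == "explore":
        # Exploration: go where the least is known (farthest from any data).
        return max(untested, key=lambda c: distance_to_nearest(c, measured))
    # Exploitation: stay close to the best response observed so far.
    best_x = max(measured, key=measured.get)
    return min(untested, key=lambda c: abs(c - best_x))


def schedule(iteration, n_explore):
    """Hypothetical schedule: explore for n_explore steps, then exploit."""
    return "explore" if iteration < n_explore else "exploit"


def run_campaign(experiment, candidates, seed_x, n_iterations, n_explore):
    """Closed loop: choose a CO, choose an experiment, measure, repeat."""
    measured = {seed_x: experiment(seed_x)}
    history = []
    for i in range(n_iterations):
        co = schedule(i, n_explore)
        x = choose_next(candidates, measured, co)
        measured[x] = experiment(x)
        history.append((co, x))
    return measured, history


def toy_response(x):
    """Hypothetical stand-in for a real measurement (maximum at x = 70)."""
    return -(x - 70) ** 2


candidates = list(range(0, 101, 10))
measured, history = run_campaign(
    toy_response, candidates, seed_x=0, n_iterations=5, n_explore=3
)
print(history)
```

 In this toy run the three exploration steps spread out over the input range, after which the exploitation steps stay next to the best response found so far.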

 When active learning is implemented in AE towards the achievement of these COs, it is 

 often in pursuit of a reduced experimental cost.  This “reduced cost” can take on several 

 meanings depending on the constraints and nature of the specific experiment: 

 - (1) acceleration of the process due to a lack of time or of the funds required to carry out 

 experiments, 

 - (2) reduction in the number of experiments due to a lack of funds, time, or resources, 

 - (3) reduction in the number of experiments due to a prohibitively large sample space. 

 Reasons (1) & (2) are quite straightforward when considering costly and time-restricted studies - 

 for example, those involving synchrotron beam time or expensive constituents [28] - and reason 

 (1) is to some degree satisfied by automated experimentation systems (for example, 

 those of combinatorial experimentation studies) since they can reduce sample transfer time, 

 human error, and other slowdowns associated with human-executed processes [4,12,29-31]. 

 Reason (3) arises from the exponential growth in the number of “combinations” of possible 

 experimental conditions with the addition of each processing/composition variable [7,27].  For 

 example, the exhaustive study of 10 compositions at 10 temperatures would only necessitate 100 

 experiments.  Upon additionally investigating 10 pertinent pressure values and including a 

 new constituent at 10 concentrations, there are now $10^4$ possible experiments.  This is often 

 prohibitive for the implementation of an exhaustive or “grid” search of the sample space, and 

 although it is quite obvious that an Edisonian approach would be inapt, it is also true that a 

 purely  automated  grid-search may not be appropriate  given the time/money costs [27].  This 

 “curse of dimensionality” problem is highly prevalent in materials science, especially in the 

 fields of materials discovery and property optimization since there are a large number of 

 potential constituents (considering binary, ternary, and quaternary compounds consisting of 

 elements selected from across the periodic table), a large number of processing parameters 

 (temperature, pressure, etc.), a large number of structural considerations (film thickness, growth 

 technique, crystal structure, etc.), and a large number of relevant property metrics (optical, 

 electrical, mechanical, etc.) that affect any target performance criterion [14,27].  When 

 considering the complex web of process/structure/property/performance relationships, is there a 

 way to “intelligently experiment” to reduce the number of experiments necessary to achieve a 

 CO, and how should one do this? 
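
 The growth of an exhaustive grid with each added variable can be checked with a short calculation.  The sketch below reproduces the 10 x 10 and 10 x 10 x 10 x 10 examples from the text; the helper names and the figure of 10 minutes per experiment are assumptions made only for illustration.

```python
from itertools import product
from math import prod


def grid_size(levels_per_variable):
    """Number of experiments in an exhaustive grid search."""
    return prod(levels_per_variable)


def grid_conditions(levels_per_variable):
    """Lazily enumerate every combination of level indices."""
    return product(*(range(n) for n in levels_per_variable))


two_variables = [10, 10]            # 10 compositions x 10 temperatures
four_variables = [10, 10, 10, 10]   # plus 10 pressures and 10 concentrations

print(grid_size(two_variables))     # 100
print(grid_size(four_variables))    # 10000

# Hypothetical cost model: 10 minutes per experiment (an assumption).
MINUTES_PER_EXPERIMENT = 10
days = grid_size(four_variables) * MINUTES_PER_EXPERIMENT / (60 * 24)
print(round(days, 1))               # 69.4
```

 Under that assumed cost, the four-variable grid would occupy an instrument for more than two months of continuous operation, which is the kind of budget an active learning schema is meant to reduce.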

 This is not a new question, and the Design of Experiments (DoE) field has existed to 

 address the large sample space problem for many decades [32]. This field is concerned with 

 optimal experiment design when under resource or time constraints, and aims to identify causal 

 variables.  It may explore relationships between independent and dependent variables in a 

 multifactorial fashion, alter experimental design based on collected results (sequential analysis), 

 and use randomization - among other techniques - to achieve this [33]. 

 In AE, active learning takes the place of DoE [32].  Specifically, the quantification of 

 value for a potential experiment - as mentioned earlier - is done through what is known as an 

 acquisition function.  In the closed-loop AE process flow, an acquisition function is generated 

 and updated after each successive data point is measured, and signifies the value metric that each 

 new potential experiment “should” have (see Section 1.3.2 for ML background).  The next 

 experimental conditions can be chosen on the basis of the optimal value metric [27].  An 

 acquisition function is meant to adapt and change after each successive data point is collected, 

 reflecting a new “mindset” from which the robot works as it incorporates new data.  This is 

 roughly akin to a human taking interest in guiding the experiment a certain or different way once 

 they have conducted some runs/trials - something that purely automated systems are incapable 

 of achieving.  The degree, however, to which this characteristic truly embodies the ineffable 

 process of human intuition in guiding experiment is a subject of much debate [5].  Nonetheless, 

 the acquisition function is the next logical step towards approximating the process of our 

 adaptable intelligence and intuition.  Since, inherently, the acquisition function is a supposition 

 of experimental value, it is of great importance that acquisition functions be modular, flexible, 

 and, to whatever degree possible, informed [16,34].  Without these characteristics, it is easy for 

 the acquisition function to be ill-suited for providing a real benefit towards intelligently 

 searching the space, and in fact certain comparative studies have shown little to no benefit at all 

 from relatively uninformed active learning schema in guiding certain experiments [9].  To inform 

 an acquisition function means providing it with prior data, physics axioms and principles (as in 

 physics-informed AE), or even - in the case of human-in-the-loop AE - some human supervision 

 to ensure that it doesn’t become too  unreasonable  or counterproductive in its experimental 

 suggestions [34,35].  Reasonableness, however, is quite hard to quantify, and to a large degree 

 the predominant intention for implementing active learning/AE guidance in certain fields is that 

 humans themselves may be ill-suited toward - and sometimes biased against - seeing the 

 data-driven “signs” that point towards valuable next-experiments [27,36].  In summary, active 

 learning, like any tool, has its pros and cons, and it should be the goal of any teaching program to 

 develop a critical thinking mindset for distinguishing between its pros and cons so that it is 

 properly applied in a useful manner (see Section 2.1 for Design Principles). 
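
 To make the notion of an acquisition function concrete, the sketch below builds a value metric over all untested candidates, rebuilds it after every new data point, and shows one simple way of "informing" it.  Everything here is an illustrative assumption rather than the method used in this thesis: the distance-based value proxy, the `prior_weight` hook, and the rule that inputs above 80 are unreasonable are all hypothetical.

```python
def exploration_value(x, measured_x):
    """Uninformed value proxy: distance to the nearest measured point."""
    return min(abs(x - m) for m in measured_x)


def acquisition(candidates, measured_x, prior_weight=lambda x: 1.0):
    """Value metric for every untested candidate.

    `prior_weight` is a hypothetical hook for prior knowledge (physics
    constraints or human supervision); 1.0 everywhere means uninformed.
    """
    return {
        x: exploration_value(x, measured_x) * prior_weight(x)
        for x in candidates
        if x not in measured_x
    }


def next_experiment(acq):
    """Choose the candidate with the optimal (here, largest) value."""
    return max(acq, key=acq.get)


def informed(x):
    """Assumed prior knowledge: inputs above 80 are known to be unreasonable."""
    return 0.0 if x > 80 else 1.0


candidates = list(range(0, 101, 10))
measured_x = [0]

uninformed_pick = next_experiment(acquisition(candidates, measured_x))

informed_picks = []
for _ in range(2):
    acq = acquisition(candidates, measured_x, informed)  # rebuilt every loop
    x = next_experiment(acq)
    informed_picks.append(x)
    measured_x.append(x)  # stands in for running and measuring the experiment

print(uninformed_pick, informed_picks)
```

 The uninformed function sends the first experiment to the far edge of the range, while the informed one stays inside the region deemed reasonable and then shifts its preference once the new point has been incorporated.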

 1.3.2 Relevant ML Concepts 

 The ability to autonomously guide experimental design, as discussed in Section 1.3.1, 

 relies on a quantification of supposed value as a function of experimental prerequisites, which in 

 turn emerges from the underlying ML framework utilized.  In this thesis, the predominant ML 

 frameworks used are that of Bayesian statistics and Gaussian processes. In our application of ML 

 to active learning for AE, the task will often be of regression: minimization of the discrepancy 

 between our models predictions (i.e., our beliefs) and the observed experimental data (i.e., 

 reality).  Our predictive models in the case of parameterized models may rely on the selection 

 and tuning of a functional form, or in other cases (such as with Gaussian processes) lack the need 

 for an explicit functional form but instead require the selection of an appropriate kernel, which is 

 essentially a covariance function [37]. 
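
 As a concrete illustration of a kernel acting as a covariance function, the sketch below builds a Gaussian process posterior for a one-dimensional input using a squared-exponential (RBF) kernel. It is a minimal, dependency-free example and not the implementation used in this thesis: the kernel choice, the hyperparameters (length_scale, variance, noise), the helper names (rbf_kernel, solve, gp_posterior), and the observations are all assumptions made for illustration. In practice a numerical library would replace the hand-written linear solver.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy of A
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior(x_train, y_train, x_query, noise=1e-6, kernel=rbf_kernel):
    """Posterior mean and variance of a zero-mean GP at one query point."""
    n = len(x_train)
    # Covariance of the training inputs, with observation noise on the diagonal
    K = [[kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [kernel(x_query, xi) for xi in x_train]
    alpha = solve(K, y_train)   # K^-1 y
    v = solve(K, k_star)        # K^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(x_query, x_query) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)  # clamp tiny negative round-off

# Hypothetical observations (made-up numbers, not thesis data)
x_obs = [0.0, 1.0, 2.0]
y_obs = [0.0, 0.8, 0.9]
mean_near, var_near = gp_posterior(x_obs, y_obs, 1.0)   # at a measured point
mean_far, var_far = gp_posterior(x_obs, y_obs, 10.0)    # far from all data
```

 The posterior variance collapses near measured points and returns to the prior variance far from them, which is the uncertainty signal that exploration-oriented acquisition functions draw on.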

 The utility of Bayesian statistics with regard to active learning arises from its ability to 

 quantify model  uncertainty  [37,38].  As discussed  in  Section 1.3.1  , some COs (exploration, 

 exploitation) explicitly require some encoding of the concept of what is known or unknown (i.e., 

 where we are confident in our beliefs, and where we are not).  These concepts can be equated 

 with the quantification of uncertainty in our model [37,38].  The Bayesian statistical approach is 

 based on the work of Thomas Bayes, and is exemplified by Bayes' theorem (  Equation 1.1  ) [38]. 

 (Eq.  1.1)[38]   $P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$

 This equation is intended to assist in producing the  posterior probability  [$P(A \mid B)$], a conditional 
 probability that represents the probability of event A given event B is true [38]. We rely on 

 inputs of a  prior probability  [$P(A)$], which represents  the initial beliefs about the probability of 
 event A, and a  marginal probability  [$P(B)$], which  may be expanded into constituent terms to 
 allow for calculation [38].  The Bayesian framework is used often in the analysis of positivity 

 rates in medical testing, but to see its application to active learning, it may be rewritten (  Equation 

 1.2  ) [38]: 

 (Eq.  1.2)[38]   $P(\text{model} \mid \text{data}) = \frac{P(\text{data} \mid \text{model})\, P(\text{model})}{P(\text{data})}$



 In this case we can presuppose a predictive model, and our  posterior  represents the probability of 

 our predictive model being true given our observed data.  If we recognize that our predictive 

 model is a continuous explanatory function across our sample space, we can see that we actually 

 calculate a  posterior distribution  across this same  space [37,38].  This may be viewed as 

 analogous to a  confidence  distribution for our model (i.e., where  we are more and less certain about its predictive 

 capabilities and whether or not it is “true”).  In addition to inputting a predictive model, we must 

 also input a  prior  (which represents our initial beliefs/confidence  for our model) [38].  Our other 

 conditional probability [$P(\text{data} \mid \text{model})$]  (which represents  the likelihood of observing this data, 
 assuming our model is correct) is the main metric that is updated as we gain new data.  In this 

 way, Bayes' theorem can be used iteratively between closed-loop runs (synthesis & 

 measurement) to constantly update our  prior  and use  the calculated  posterior  to construct our 

 acquisition function (based on quantification of uncertainty) (  Figure 1.2  ). 
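
 The iterative use of Bayes' theorem described above can be sketched for a discrete set of candidate models. This is only an illustration under stated assumptions: the two candidate models, the Gaussian noise level sigma, the measurements, and the function names (gaussian_likelihood, bayes_update) are hypothetical and are not taken from the LEGOLAS code.

```python
import math

def gaussian_likelihood(observed, predicted, sigma):
    """P(data | model) for one measurement with Gaussian noise."""
    z = (observed - predicted) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_update(prior, likelihoods):
    """Return the posterior P(model | data) over a discrete set of models."""
    unnormalized = [p * lk for p, lk in zip(prior, likelihoods)]
    evidence = sum(unnormalized)  # marginal probability P(data)
    return [u / evidence for u in unnormalized]

# Two hypothetical candidate models for a response y(x)
models = {
    "linear": lambda x: 2.0 * x,
    "quadratic": lambda x: 2.0 * x * x,
}
names = list(models)
belief = [0.5, 0.5]  # initial prior: no preference between the models

# Made-up measurements (x, y) that roughly follow the quadratic trend
measurements = [(0.5, 0.55), (1.5, 4.4), (2.0, 8.1)]
for x, y in measurements:
    likes = [gaussian_likelihood(y, models[n](x), sigma=0.5) for n in names]
    belief = bayes_update(belief, likes)  # posterior becomes the next prior

posterior = dict(zip(names, belief))
```

 Each pass through the loop plays the role of one closed-loop run: the posterior from the previous measurement is reused as the prior for the next one.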

 Figure 1.2:  Pictorial representation of Bayes' theorem  for iteratively updating our beliefs 

 Two of the active learning techniques implemented on LEGOLAS to study the simple 

 chemical system described here (  Chapter 3  ) involve  Bayesian ML to some degree: Bayesian 

 optimization using Gaussian processes (  Section 4.1  )  and Bayesian inference for parameter 

 refinement in preselected models (not discussed in this thesis). The third technique - Hypothesis 



 Generation and Validation - utilizes an informational entropy based approach for quantification 

 of experimental value (i.e. the acquisition function) (  Section 4.2  ).  To produce potential 

 hypotheses, several candidates are either preselected by the user or generated via a symbolic 

 regression package (genetic programming) (  Section  4.2.1  ).  Fitting of these candidate hypotheses 

 occurs either through simple non-linear least squares regression, or automatically within the 

 symbolic regression functions.  Hypothesis validation is accomplished by evaluating one of two 

 possible metrics, both of which reward goodness of fit and penalize complexity: the Bayesian Information 

 Criterion (BIC) and a mean squared error (MSE)/complexity score (  Section  4.2.1  ).  The fundamental statistical 

 workings of symbolic regression will not be described in this thesis, but more can be learned via 

 sources in the bibliography [25,39].  Only the Bayesian optimization active learning scheme is 

 currently taught in the  Machine Learning for Materials  Science  course (ENMA 437/637) at the 

 University of Maryland, where LEGOLAS is used as a teaching tool, although other techniques 

 may be introduced in future implementations (  Chapter  3  ). 
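
 A minimal sketch of the BIC-style hypothesis comparison mentioned above is given below. It assumes Gaussian residuals, for which the criterion can be written (up to an additive constant) as BIC = n ln(RSS/n) + k ln(n), with n data points, k fitted parameters, and residual sum of squares RSS. The two candidate hypotheses (a constant and a straight line, fitted in closed form rather than by non-linear least squares), the data, and all function names are illustrative assumptions, not the metric implementation of Section 4.2.1.

```python
import math

def fit_constant(xs, ys):
    """Least-squares fit of y = c; returns (model, number of parameters)."""
    c = sum(ys) / len(ys)
    return (lambda x: c), 1

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return (lambda x: slope * x + intercept), 2

def bic(xs, ys, model, k):
    """BIC for Gaussian residuals: fit term plus complexity penalty k*ln(n)."""
    n = len(xs)
    rss = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    return n * math.log(rss / n) + k * math.log(n)

# Made-up data that roughly follow y = 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]

candidates = {"constant": fit_constant, "linear": fit_line}
scores = {}
for name, fit in candidates.items():
    model, k = fit(xs, ys)
    scores[name] = bic(xs, ys, model, k)

best = min(scores, key=scores.get)  # lower BIC is preferred
```

 The k ln(n) term is what penalizes complexity: an extra parameter is only worthwhile if it lowers the residual term by more than ln(n).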

 1.3.3 Autonomous Experimentation in Materials Science 

 Many thorough reviews regarding the successes, experimental extents, and scopes of AE 

 in theoretical, computational, and experimental materials science have been published in the past 

 decade [7-9,12-14,16,18,27,29].  The use of AE for exploration, optimization, and discovery of 

 materials systems has already been demonstrated in many areas of materials science:  reducing the 

 number of neutron diffraction experiments for determination of transition temperatures of 

 magnetic materials [28], incorporation of physics intuition and structural phase mapping to 

 accelerate the discovery of best-in-class phase change memory materials [34], autonomous study 

 of processing parameters to optimize carbon nanotube growth rate [40,41], autonomous 

 optimization of mechanical properties of 3D printed structures [42], accelerated discovery of 



 new metallic glasses [43], and active learning for tuning of interatomic potentials in atomistic 

 simulations [44], among many others [29,45-49]. 

 Perhaps the greatest misconception surrounding AE is that it could lead to the 

 elimination of humans from the scientific process altogether [12,14,27].  In all these studies 

 mentioned, however, a great deal of human guiding, monitoring, and encapsulation of human 

 intuition into ML frameworks was required to create meaningful results [29,34,40-49].  The 

 hopes of those developing AE systems are that they could (1) free up time for experts by 

 alleviating them of laborious and repetitive laboratory procedures, (2) equip them with 

 data-driven methods of navigating within highly dimensional sample spaces perhaps beyond the 

 scope of human intuition & theory based analysis, and (3) forgo analytical delays between 

 experiments to allow accelerated exploration with automated equipment. 

 The field of AE itself has its beginnings in biology and chemistry, where workflows 

 generally involve more liquid handling and are less complex [17,50-52].  In the field of Materials 

 Science, workflows may require high amounts of dexterity, and can be extremely complex and 

 highly dissimilar from case to case [13,29].  Insights from thermodynamic phase stability, 

 structural considerations, and defect equilibria (among other confounding factors) have proven 

 necessary for making some sense of the inherently high-dimensional and highly-complex search 

 spaces (process/structure/property/performance relationships) [9,34,35].  The call to utilize 

 autonomous exploration techniques is therefore warranted, yet it is further complicated by 

 expensive and complex characterization techniques [12-14].  Due to these restrictions, the use of 

 Self-Driving Labs (SDLs) - controlling synthesis, measurement, and analysis functionalities - has 

 been limited to universities and groups with large funding sources, resulting in highly specialized 

 and extremely expensive setups that are not easily accessible to the majority of research groups, 



 and completely out of reach for most educational instruction purposes [49,53,54].  A few 

 commercial modular SDL platforms are on the market, but again many of them cost hundreds of 

 thousands of dollars, leading to a lack of availability for educational institutions [53,54]. 

 In order to fully embrace SDL technologies, expertise across many domains 

 is needed, including computer science, decision-making theory, experimental science, machine 

 learning, statistics, materials science, systems engineering, robotics, and software design [19,55]. 

 An ideal research group working in SDL technologies might consist of experts in various fields, 

 but it is quite unreasonable to expect any one person to possess the utmost expertise in all 

 necessary domains.  An SDL compatible workforce should at least be conversant in these fields 

 and understand their basics [19,55] (  Section 1.3.4  ).  For developing a 

 manufacturing workforce in industry to manage autonomously operating factories, the same is 

 true [19,20]. Additionally, since collaboration is required to achieve the large-scale goals of 

 autonomous manufacturing or experimentation, teamwork should be emphasized in developing 

 the workforce [19].  One of the main barriers to the expansion of AE as a field, however, is the 

 lack of lower-cost, accessible teaching platforms and curricula for the education 

 of an SDL compatible workforce [12,13,18,19,27,55] (  Section 1.3.4  ). 

 1.3.4 Educating an SDL Compatible Workforce 

 The Materials Genome Initiative (MGI) is a large-scale multi-agency federal initiative 

 launched in 2011, with the goal of “deploying advanced materials twice as fast and at a fraction 

 of the cost compared to traditional methods” [55].  The MGI was developed in response to the 

 10-20 year delay time from materials discovery to commercial implementation that handicaps 

 our ability to respond to existential societal threats such as climate change and sustainable energy 

 needs [7,55].  In 2021, the MGI's strategic plan consisted of three main goals, one of which was 



 to “Educate, Train, and Connect the Materials Research & Development Workforce” [19,55]. 

 From its conception, MGI set out to ensure our workforce was “trained for careers in academia 

 or industry, including high-tech manufacturing jobs” [55]. The first objective to tackle in 

 achieving this goal was to “Address Current Challenges in R&D Education” at the 

 undergraduate, graduate, and foundational (K-12) levels.  The second objective was to “Train the 

 Next-Generation Workforce,” which includes mid and late career post-graduates, manufacturing 

 workers, and other industry employees [20,55].  In 2019, an MGI working group identified three 

 foundational pillars and their associated necessary competencies: data management, 

 computation, and experimentation (  Figure 1.3  ) [19,55]. 

 Figure 1.3:  Three target foundational pillars for  educational development identified by 
 an MGI working group [19] 

 The MGI working group emphasized that “students need not be experts in all three areas but 

 should be conversant in multiple topics across this spectrum” [55].  They also pointed out that 

 the contrasting vernacular and cultures of theorists, experimentalists, computational scientists, 

 and other contributing experts necessitated a convergent educational preparation in order to 

 provide the optimal cross-pollination between groups.  Teamwork -  a critical aspect of large 

 scale success - was emphasized as well [55]. 



 Existing educational solutions for the development of MGI-related skills have touched 

 upon perhaps 1-2 of the foundational pillars, but rarely all three at once.  For instance, at the 

 graduate level, the Data-Enabled Discovery and Design of Energy Materials (D³EM) program at Texas 

 A&M University provides a wonderful preset track for the convergence of data informatics, 

 computational methods, teamwork, leadership, and materials science fundamentals [56]. 

 However, it currently lacks emphasis on automation and autonomous technologies: a crucial 

 aspect of experimental acceleration techniques [56].  Although other established programs at the 

 undergraduate and graduate level - as well as informational bootcamps and summer programs - 

 have also begun to incorporate ML and active learning techniques into their curricula, they 

 currently lack an available  tangible platform  from  which to  experimentally  teach these 

 techniques [56-61].  For the  Machine Learning for  Materials Science  graduate/undergraduate 

 course (ENMA 437/637) taught at the University of Maryland, this has been a challenge since 

 the inception of the course in 2019.  The majority of the course modules focus on ML and 

 applied ML techniques, and at the end, when active learning was discussed, it was observed that 

 many students lacked the ability to fully incorporate active learning as a decision making engine 

 (either in the class assignments or into their own applied research). 

 In other developing fields such as drone technology and self-driving cars, where there is 

 also a large emphasis on autonomous development, educators have succeeded by 

 implementing tangible platforms in the classroom such as self-driving RC cars with multitudes 

 of sensors and autonomously flying drones [62-66].  It was observed that the use of real 

 platforms helped identify and remedy deficiencies in understanding of the robot point-of-view 

 (POV) (i.e. what the robot knows vs. what the human knows), robotics hardware proficiency, 

 coding and data management skills, and teamwork capabilities [66].  In developing these skills, 



 students became more adept at distinguishing between realistic and useful applications of 

 autonomous control, and those that were “impractically ambitious” [66].  In this way, the students 

 came to fundamentally deconvolute the seemingly nebulous concepts of AI, ML, and 

 autonomous control to facilitate practical applications [66]. Although the use of simulated 

 experiments or previously collected data can be helpful in teaching the data-driven side of 

 autonomous systems, it can also lead to a lack of proficiency and understanding from the 

 experimental side.  It is a goal of MGI to have students be at least conversant in  all  of these 

 fields [19,55]. 

 Some of the tangible and hands-on platforms for autonomous drone technology were only 

 a few hundred dollars and composed of widely available DIY robotics kits (Raspberry Pi, 

 Arduino, etc.) [64]. However, in the case of autonomous driving, the F1Tenth program noted that 

 their price per system (nearly $5,000) was cost prohibitive (considering a class typically needed 

 10 systems) for university programs, and certainly at the foundational (K-12) levels [62]. 

 Therefore, for maximum educational impact, it should be a goal of any tangible platform to be 

 low-cost and affordable (  Section 2.1: Design Principles  - Affordability  ).  At the same time, one 

 of the main benefits of a hands-on educational tool is ultimately the captivation of student 

 interest.  It should also be a goal of the system to entice students and motivate them with exciting 

 and fun challenges (  Section 2.1: Design Principles  - Usability & Modularity  ) [13].  In the case of 

 the F1Tenth car, this was achieved by replacing exams with racing events [62].  Lastly, one key 

 aspect of all tangible platforms is that they attempt to accurately portray “real-world” 

 autonomous situations (  Section 2.1: Design Principles  - Transferability  ).  For hands-on 

 educational platforms, there must be a careful balance between system complexity and usability, 

 so that the majority of students are neither overwhelmed nor bored during its use [66]. 



 Although SDLs are being developed at research institutions for fruitful materials science 

 studies (  Section 1.3  ), these bespoke systems are typically  hundreds of thousands of dollars, so 

 that they are neither available nor abundant enough for widespread  educational use 

 [13,49,53,54].  Several groups in the past decade have seen an opportunity to produce tangible 

 SDL platforms for teaching at a wide range of costs ($100 - $10,000) (  Table 1.1, Figure 1.4  ) 

 [67-73].  Typically, these are Cartesian-style robots that use syringe pumps or peristaltic pumps to 

 manipulate liquids, and perform characterization through contact measurements or via computer 

 vision techniques (composition-color relationships, composition-acidity relationships, and even 

 autonomously refining cocktails based on human feedback) [67,69-71,73-79].  There are also 

 examples of autonomous 3D print optimization schemes, networking robots, and some attempts 

 at solid-state experiments (using powders or wax) [68].  Table 1.1  shows the strengths and 

 drawbacks of each of these systems.  Some SDL platforms are high quality, modular, usable, and 

 transferable, but simply too expensive for educational use [67,72,76,77,79].  On the other hand, 

 some are extremely cheap, but lack modularity, appeal, and most importantly  transferability 

 when assessed along the MGI emphasized competencies [19,55,68-70]. 



 Table 1.1:  Summary of some recent low-cost SDL platforms. 



 Figure 1.4:  Images of low-cost SDLs from  Table 1.1 

 In addition to the SDLs shown here, there has also been a rise in so-called “cloud labs” 

 which allow students or collaborators to interact with experimental hardware from a distance, in 

 some cases using AI to facilitate complex manufacturing processes [54,80,81].  Although these 

 technologies have the potential to revolutionize certain repeatable synthesis routes and improve 

 upon the “crisis of reproducibility” increasingly apparent in recent human-produced studies, they 



 are likely not good tools for educational purposes because they distance the student from the 

 tangible aspects of experimentation [12,29-31].  In addition, it is clear from reviews and 

 analyses of many SDL based setups and experiments that it is highly unlikely that humans will 

 be completely removed from the physical laboratory [9,12,27,30].  In nearly every study it can be 

 seen that a human is eventually needed - to replace experimental consumables, to troubleshoot 

 hardware issues, etc. - and at best the “closed-loop” can only operate for so long before it needs 

 tuning [13,15,17,29,46,51].  Additionally, thorough knowledge of experimental hardware 

 contributes to the reduction in “impractically ambitious” expectations often inherent to pure 

 theorists [62].  Therefore, it should be a definite goal of educational platforms to continue to 

 instill the importance of working with your hands and with tools on the equipment  (Section 2.1: 

 Transferability). 

 Of the SDL platforms surveyed (  Table 1.1  ), the Chemorobotic  Robot - essentially a 

 liquid-handling Cartesian gantry - was identified as the optimal option due to the incorporation of 

 3D printed modular parts, a captivating application, smaller profile, and robust design [79]. 

 However, due to the incorporation of syringe pumps and bespoke parts, this SDL had an elevated 

 cost - although cheaper syringe pump alternatives now exist [82] - and required a complicated 

 construction process [79].  For the work represented in this thesis, a LEGO based chemistry 

 robot design for STEM education (  Figure 1.1  ) was developed  to facilitate much of the same 

 functionality at a reduced cost  (~$950; for a full  breakdown of parts and costs, see Section 2.3  ). 

 This was done through 3D printing many of the parts, as well as replacing the syringe pumps 

 with a simple plastic syringe.  Computer vision and contact measurements were made possible 

 by incorporation of a low-cost ($30) reliable pH sensor and a small USB Camera [83,84]. 



 LEGOLAS is a Cartesian-style gantry robot that can be expanded beyond the tasks elucidated in 

 this thesis due to its high degree of modularity (  Chapter  5  ). 

 The design principles followed for creating LEGOLAS are described in detail in  Section 

 2.1  , but briefly, the incorporation of LEGO parts  was intended to elevate student interest in the 

 project, remove the stigma of complexity from the physical system (so non-experts could learn 

 as well), and allow for low-cost modularity (i.e. creative students could shape the course of the 

 robotics design quickly and cheaply).  By simplifying the experimental setup and reducing its 

 costs, LEGOLAS facilitates financially achievable adoption at K-12, undergraduate, graduate, 

 and industrial education levels, and effectively prepares students for the following competencies: 

 MGI Foundational Pillar 1 (Data): 

 -  Data Handling:  Students manage incoming measurements and manipulate data 

 structures to facilitate ML analysis. (  LEGOLAS Challenge  1 - 3 in Chapter 3) 

 -  Software and codes to manage MG workflows:  Students embrace workflows in the 

 easy-to-learn yet complexity tolerant Python language, where they manage both robotic 

 control functions and active learning analysis in live-run closed loop AE fashion. 

 (  LEGOLAS Challenge 4 in Chapter 3  ) 

 MGI Foundational Pillar 2 (Computation): 

 -  Microstructure Evolution and material response:  Students study the relationship between 

 composition and acidity in the main challenges described here.  In a future exercise being 

 developed for LEGOLAS (  Chapter 5  ), students may be  able to use computer vision to 

 study the solidification kinetics and resulting growth structures of precipitating salt 

 crystals in solution. 



 MGI Foundational Pillar 3 (Experiments): 

 -  Multi-objective design and decision-making under uncertainty:  Students learn to 

 quantify uncertainty using Bayesian ML techniques, and guide multi-objective 

 optimization studies involving color mixing, solidification kinetics, and acidity. 

 -  Measurement methods and tools:  Students work hands-on with an electrochemical probe 

 for measuring pH, calibrate the sensor, and build an understanding of the noise inherent 

 to it (  Chapter 3  ).  In addition, they must calibrate  the synthesis syringe using a mg 

 sensitive scale and calibrate the sample well locations. 

 -  Sensor fusion, high-throughput methods, and automation:  This is LEGOLAS's greatest 

 strength, combining synthesis, measurement, and active learning into an autonomous 

 closed-loop AE cycle (  Section 3.1.2  ) (  Figure 1.5  ). 

 Figure 1.5:  Simplified pictorial representation for  an SDL such as LEGOLAS 
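
 The closed-loop cycle of deciding, synthesizing, measuring, and updating can be sketched in a few lines. The example below is a simulation under loud assumptions: run_experiment is a made-up stand-in for the robot's synthesis and measurement steps, the response curve and noise level are invented, and the acquisition function is a deliberately crude exploration-only rule (distance to the nearest measured composition) rather than the Bayesian acquisition functions used with LEGOLAS.

```python
import math
import random

def run_experiment(fraction, rng):
    """Stand-in for synthesis + measurement: returns a noisy simulated reading."""
    true_value = 7.0 - 4.0 * math.tanh(6.0 * (fraction - 0.5))  # invented response
    return true_value + rng.gauss(0.0, 0.05)

def acquisition(candidate, measured_x):
    """Exploration-only score: distance to the nearest measured composition."""
    return min(abs(candidate - x) for x in measured_x)

def closed_loop(n_iterations, seed=0):
    rng = random.Random(seed)
    candidates = [i / 20 for i in range(21)]  # allowed compositions 0.00 to 1.00
    measured_x = [0.0, 1.0]                    # two seed experiments
    measured_y = [run_experiment(x, rng) for x in measured_x]
    for _ in range(n_iterations):
        # 1. Decide: pick the candidate with the highest acquisition value
        next_x = max(candidates, key=lambda c: acquisition(c, measured_x))
        # 2. Synthesize and measure (simulated here)
        next_y = run_experiment(next_x, rng)
        # 3. Update the data set that the next decision is based on
        measured_x.append(next_x)
        measured_y.append(next_y)
    return measured_x, measured_y

xs, ys = closed_loop(n_iterations=5)
```

 Swapping acquisition for a model-based score (for example, one built from a posterior variance) changes only step 1; the rest of the loop stays the same.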



 1.4 Thesis Outline: Ch.2 - Ch.4 (Development and Operation) 

 An instructional overview of the construction and operation of LEGOLAS is outlined in 

 Sections 2.2 & 2.3  , with links to more detailed instructions therein.  Worksheets and challenge 

 templates for educational implementation are overviewed in  Chapter 3.  Code used to implement 

 the active learning techniques for autonomous model discovery is outlined in  Chapter 4  .  All 

 detailed instruction files, worksheets, challenge templates, and code for any acquisition 

 functions described here can be found centrally at the  LEGOLAS GitHub page  . 

 Additional software modules for relevant chemistry experiments, 

 challenge worksheets for students, active learning code, and machine learning code can also be 

 requested via email from:  takeuchi@umd.edu  ,  agiladk@gmail.com  . 



 Chapter 2:  Systems Development 

 2.1 Design Principles 

 LEGOLAS is designed with several overarching principles in mind:  affordability, 

 usability, modularity, and transferability.  These principles are held paramount for the purpose of 

 achieving maximum educational impact.  A large educational impact can best be realized 

 through a system that is widely accessible and helpful in  realistically  portraying the benefits, 

 drawbacks, simplicities, and challenges of closed-loop AE.  It should also be a goal of the 

 educational tool to promote critical thinking - with respect to active learning and experimental 

 design - and not merely a prescriptive carrying out of established steps and rules (  Chapter 3  ). 

 Most closed-loop AE systems are currently cost-prohibitive, leading edge research tools 

 that are available only to a small, select few scientists and researchers who are far along in their 

 careers [13,30].  A low-cost, easily reproducible AE system encourages the instruction of AE 

 fundamentals at earlier ages; facilitating implementation in high-school, undergraduate, and 

 graduate education.  It is likely true that an early introduction - and thereby a wider reach - for 

 the AE field could only serve to enliven and further enrich its goals of improving our ability to 

 produce societally beneficial materials [7,13,30]. 

 Usability of an educational AE system refers not only to its ability to operate reliably and 

 smoothly, but also to its capacity to captivate and inspire students 

 [13,62,66].  LEGO components were selected to elevate this level of interest, as well as reduce 

 the stigma of complexity that can often serve to overwhelm certain students.  These LEGO 

 components, however, are by no means of poor quality, and provide the robot with structure, 

 modularity, and a pleasant aesthetic quality.  All aspects of the design embody the principles of 

 interactive usability and simplicity:  manual controls on motors, open and accessible joints, 



 removable trolley & bridge, light-weight & sturdy aluminum frame, Arduino & Raspberry Pi 

 integration, remote Wi-Fi connectivity, and Python based Jupyter Notebook control. 

 The modularity of an educational AE system is important to ensure it can be extended 

 beyond its original mode of use into new and fresh domains for exploration by creative students 

 [13,30].  Physically, the LEGOLAS system described here is quite modular: 3D printed (and 

 thereby replaceable) components, semi-permanent joining methods (LEGO pins, machine 

 screws), LEGO parts & motors that can be rearranged for different types of experiments, and an 

 open frame and trolley design providing space for additional sensors, probes, cameras, and other 

 experimental hardware.  These attributes allow LEGOLAS to be expanded into new 

 experiments beyond the ones described here (  Chapter  5  ).  With respect to the software side of an 

 AE education system, ideally there should be a control interface that is relatively easy to learn, 

 robust, and tolerant of complexity [30].  For LEGOLAS, a Graphical User Interface (GUI) has 

 been developed to  reduce experimental calibration complexity.  Coding in the easy to learn & 

 widely supported Python language (Jupyter Notebooks) allows for both beginner and 

 experienced programmers to get what they need out of the platform.  Within Python, community 

 supported active learning/ML and robot control functions can coexist, allowing students to 

 visualize the process-flow of closed-loop AE quite well. 

 Finally, the aspect of transferability - albeit hard to quantify - is key to setting up an 

 educational AE system for maximum impact.  Transferability refers to the educational AE 

 system’s ability to accurately  approximate  the challenges,  drawbacks, and benefits of a 

 larger-scale research grade AE system in a way that will prepare students for the “real thing.” 

 The key fundamentals of research level closed-loop AE will be elaborated upon later (  Chapter 

 3  ), but briefly, they are: 



 -  (1) the skills of working with your hands & with tools on hardware, 

 -  (2) the knowledge to interface with motors, sensors, & robotic components, 

 -  (3) the patience and attention to detail to properly calibrate and tend to the 

 machine, 

 -  (4) the knowledge of ML techniques for prediction, clustering, etc., 

 -  (5) the skills of programming and data manipulation, 

 -  (6) the insight to encapsulate research “objectives” into active learning schema, 

 -  (7) the foresight and knowledge to simulate live-runs, 

 -  (8) the skills of presenting data in a digestible & meaningful fashion, 

 -  (9) and the discernment to understand pros & cons of AE. 

 These are skills that are often distributed amongst a team of specialists in larger-scale 

 research-grade AE groups, but all are necessary to ensure a successful, and  meaningful  use of AE 

 for research purposes [19,66]. By possessing many or all of these skills, an individual has a 

 greater ability to integrate their research goals and experimental processes to achieve relevant 

 applications.  However, nearly all high-level research occurs in group settings, so it should still 

 be a goal of an educational AE system to promote teamwork [19].  In LEGOLAS, the principle 

 of transferability is largely determined by the way in which it is used and taught, rather than any 

 physical characteristics of the machine itself.  This will be discussed more in the educational 

 section (  Chapter 3  ). However, certain aspects of the  LEGOLAS system - as with any machine - 

 are liable to occasionally malfunction.  These challenging moments provided some of 

 the greatest learning benefits for the students, since, in reality, no AE system is truly a 

 “closed” loop (humans will need to periodically intervene) [13,30,46,51]. 



 2.2 Mechanical Design 

 LEGOLAS is a Cartesian gantry style robot with a frame track (y-axis), a bridge track 

 (x-axis), and a trolley cart that rests on the bridge.  Each assembly contains an assortment of 

 LEGO components, aluminum frame extrusions, electronics, and/or 3D printed parts.  Trolley, 

 bridge, and frame may be constructed from the constituent components with common tools and 

 adhesives (  Section 2.3  ) and are easily removable for  maintenance or alterations. 

 The X & Y position of the trolley cart can be controlled via keyboard or manually 

 through control knobs (  Figure 2.1, 2.5  ).  Force sensors  aligned along the X & Y axes allow for 

 automatic recalibration of relative experimental coordinates (i.e. the location of the sample wells 

 or reservoirs) in the case of the bridge or trolley being derailed or dislodged.  The trolley cart 

 contains all experimental components (pH Sensor, Syringe & Plunger, and/or camera), which are 

 free to move in the Z-direction (  Figure 2.1, 2.2  ). 

 LEGOLAS was inspired by a fully LEGO based liquid-handling robot developed for 

 chemistry experiments [85] (  Figure 2.11  ).  I have  developed four generations of LEGOLAS since 

 2019, aiming to give each new generation lower cost, higher stability, and greater 

 interactivity.  The chronological development and modifications made for each of these 

 models are shown in  Section 2.5  .  To better visualize  the operation of LEGOLAS (  Generation III  ), 

 one can see it performing experiments in this  video  [86]. 



 Figure 2.1:  Experimental components on the trolley  (  Generation IV  LEGOLAS).  Axes of 
 motion (  double sided arrows  ) and manual control knob  locations (  rotational arrows  ) shown. 



 Figure 2.2:  USB Camera attached to the bottom of  the Trolley  (Generation IV  LEGOLAS). 
 The camera can easily be attached and removed for different types of experiments/studies. 



 2.3 System Construction & Preparation 

 This section contains the general process flow of sourcing components, 3D printing parts, 

 constructing LEGOLAS, and preparing the experimental consumables needed for the simple 

 chemistry experiment described in  Chapter 3  .  Links to in-depth instruction modules are 

 provided within each section. 

 Sourcing Components and Cost Breakdown: 

 The LEGO components used in the LEGOLAS design were originally sourced from the 

 LEGO® MINDSTORMS® EV3 Core (45544) and Robot Inventor (51515) sets, but can be 

 sourced more easily from individual part retailers and via the LEGO Education website (for 

 motors and sensors).  Aluminum frame components can be acquired from MakerBeam B.V., and 

 additional non-LEGO electronics and chemistry equipment from Amazon and other online 

 retailers.  See the detailed links for specifics on part sourcing.  A cost breakdown is shown in 

 Table 2.1  .  Note that this excludes the cost of chemicals and of a USB-compatible camera,

 which vary with the intended application but typically bring the full cost to around

 $1,000.

 Detailed Links:  LEGO Parts List  ,  Sourcing & Cost  of LEGO Parts  ,  Sourcing & Cost of 
 Non-LEGO Parts 



 Table 2.1:  Cost Breakdown for a single LEGOLAS robot 

 Category               Included                                                       Cost

 LEGO Parts             Structural Components                                          ~ $60.00

 LEGO Electronics       Motors, Force Sensors                                          ~ $280.00

 Frame Components       MakerBeam Structural Parts                                     ~ $100.00

 Non-LEGO Electronics   Raspberry Pis, BuildHAT, Arduino, pH Sensor, Chargers, Wiring  ~ $470.00

 Chemistry Equipment    Wells, Stand, Syringe, Dispenser Tips                          ~ $40.00

 Total:                                                                                $950

 Printing Parts:  Parts were printed with Ø1.75 mm Galaxy Silver Prusament Polylactic Acid 

 (PLA) using a Fused Deposition Modeling (FDM) style Original Prusa i3 MK3S printer (  Figure

 2.3  ).  With the exception of one Raspberry Pi Holder  Case [87], all parts were developed in the 

 Autodesk Fusion360 CAD software.  Total PLA needed for all prints is ~410 g, for around 

 $12.50 in material costs.  Table 2.2  summarizes some  relevant characteristics for each part. 

 Information on optimal orientation, advanced printer settings, post-processing tips, as well as the 

 CAD (.f3d) and printable (.stl, .3mf, .obj) files for each part are in the detailed links below. 

 Detailed Links:  3D Printing Guide  ,  Printable Files 



 Table 2.2:  Summary of FDM printed parts 

 Part                        Purpose                                 Supports?  Quantity  Material Used (g)  Cost*

 pH Sensor Guide Tube        Guide pH sensor towards sample wells    Yes        1         42.45              $1.28

 pH Sensor Sleeve            Protect pH Sensor                       No         1         11.29              $0.34

 Trolley Base                Hold experimental hardware              No         1         38.71              $1.17

 Mid-Axle Supports           Support mid-spans of X-axis axles       No         6         14.77              $0.45

 End-Axle Supports           Support ends of X-axis axles            No         4         12.21              $0.37

 Side Assembly               Hold R-Pi & Force Sensors on Bridge     Yes        1         66.42              $2.01

 Reservoir Tank and Stands   Contain liquid constituents             Yes        1         203.41             $6.15

 Force Sensor Holders        Fasten force sensors to frame           No         2         2.18               $0.07

 R-Pi Case                   Hold Raspberry Pi                       No         1         16.51              $0.50

 *Based on $30/kg PLA costs
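
 The material totals quoted in the text can be checked against Table 2.2 with a few lines of Python.  This is a minimal sketch, assuming that the mass listed for each part type already covers the full quantity printed and that the $30/kg rate from the table footnote applies exactly; the names PART_MASS_G and material_cost are illustrative only.

```python
# Printed mass per part type in grams, copied from Table 2.2.
# Assumption: each value is the total for the listed quantity.
PART_MASS_G = {
    "pH Sensor Guide Tube": 42.45,
    "pH Sensor Sleeve": 11.29,
    "Trolley Base": 38.71,
    "Mid-Axle Supports": 14.77,
    "End-Axle Supports": 12.21,
    "Side Assembly": 66.42,
    "Reservoir Tank and Stands": 203.41,
    "Force Sensor Holders": 2.18,
    "R-Pi Case": 16.51,
}
PLA_COST_PER_KG = 30.0  # USD, the rate stated in the table footnote


def material_cost(mass_g, cost_per_kg=PLA_COST_PER_KG):
    """Return the filament cost in USD for a printed mass in grams."""
    return mass_g / 1000.0 * cost_per_kg


total_mass_g = sum(PART_MASS_G.values())
total_cost = material_cost(total_mass_g)
print(f"Total PLA: {total_mass_g:.2f} g, about ${total_cost:.2f}")
```

 At exactly $30/kg the listed masses sum to roughly 408 g and a little over $12, close to the rounded figures quoted in the text.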

 Figure 2.3:  All 3D printed parts needed to build one  LEGOLAS robot 



 Constructing Stand:  The gantry stand is composed of MakerBeam (10mm x 10mm profile) 

 miniature slotted aluminum extrusion pieces, 2 force sensor holders (3D printed), MakerBeam 

 fasteners, a LEGO force sensor, and LEGO gear rack pieces (composing the y-axis of the gantry) 

 (  Figure 2.4  ).  Store bought superglue and a 2 mm hex  key are all that are needed to construct it. 

 Instructions are included in the links below. 

 Detailed Links:  Stand Instructions 

 Figure 2.4:  Fully assembled stand (  320 x 340 x 60  mm profile  ) 



 Constructing Bridge:  Similar to the frame, the bridge is constructed of an aluminum MakerBeam 

 frame with associated fasteners (  Figure 2.5  ).  It also includes a LEGO force sensor, 3D printed 

 components (4 End-Axle supports, 6 Mid-Axle Supports, 1 Side Assembly), assorted LEGO 

 pieces, a Raspberry Pi, a BuildHAT, and LEGO rack pieces (composing the x-axis of the gantry). 

 Like the stand, it only requires the hex key and superglue to be assembled. 

 Detailed Links:  Bridge Instructions 

 Figure 2.5:  (  left  ) Assembled bridge, with X-axis (  double  sided arrows  ) and Y-axis manual 
 control gear location (  rotational arrow  ) shown. (  right  )  3D printed Side Assembly with 

 Raspberry Pi + BuildHAT attached in their holder piece.



 Constructing Trolley:  The trolley is built of assorted LEGO pieces, 4 LEGO motors, a Raspberry 

 Pi, a BuildHAT, and experimental equipment (plastic syringe, Arduino pH sensor, USB camera) 

 that all rest on the 3D Printed Trolley Base frame (  Figure 2.6  ).  One must first build the Syringe 

 Plunger Assembly, then the Syringe Holder Assembly, and finally follow the Trolley Instructions 

 to complete construction.  The only additional components needed to build it are scissors, 

 superglue, 18-24” of braided fishing line, and 3 small barrel swivels. 

 Detailed Links:  Trolley Instructions  ,  Syringe Plunger  Assembly Instructions  ,  Syringe Holder 

 Assembly Instructions 

 Figure 2.6:  (  left  ) Assembled trolley resting on the bridge, with Raspberry Pi + BuildHAT 
 assembly and X-axis manual control gear location (  rotational arrow  ) shown. (  right  ) Syringe 

 Plunger & Holder assembly, prior to being inserted into the Trolley Frame. 



 Sample Space:  The sample space shown in  Figure 2.7  is applicable to the Henderson- 

 Hasselbalch pH study (  Chapter 3  ), and can be modified in terms of components and their 

 orientation for other types of experiments.  This setup contains the constituent reservoir tanks 

 (3D printed and coated with an epoxy to ensure solution impermeability for acid, base, and 

 Deionized (DI) water solutions), 48 sample wells (each Ø1.5 cm, 3.5 mL volume), and a mg 

 accurate digital scale for volume calibration (  2.4:  System Calibration Section  ).  Each reservoir 

 tank holds 50 mL of either acid or base solution, facilitating the experimental use of all 48 

 sample wells if needed (assuming 2 mL sample volume).  These components are held in place by 

 two adjustable MakerBeam aluminum extrusion pieces, and can easily be removed from the 

 sample space by sliding them out through the open end for post-experiment clean-up.  The 

 sample wells and reservoir are elevated several cm off of the ground, providing ample space for 

 the insertion of hot plates and other process-altering equipment for modular experiment design, if 

 desired (  Chapter 5  ). 

 Detailed Links:  Chemistry Guide  (  step #2  ) 



 Figure 2.7:  Sample space setup for the Henderson-Hasselbalch  pH study 

 Chemistry:  Preparing the solutions for the Henderson-Hasselbalch pH study involves non-toxic 

 constituents, and produces chemicals with acidity on the level of vinegar and milk of magnesia, 

 allowing safe classroom use even at primary school levels.  Only a mg accurate scale, 10 mL 

 graduated cylinder, small glassware components, and common chemicals are required (  see 

 Chemistry Guide for more details  ). 

 Detailed Links:  Chemistry Guide 



 2.4 Connecting to & Calibrating LEGOLAS 

 This section describes the means of properly wiring LEGOLAS, downloading the 

 appropriate configuration files, connecting to the robot via WiFi, and calibrating both the system 

 and pH sensor (only needed if running acidity based studies as in  Chapter 3  ). 

 Wiring:  LEGOLAS has 2 Raspberry Pi & BuildHAT stacks, with one (  R-Pi A  ) located on the 

 Side Assembly of the bridge (controls Y-axis motor, Y-axis force sensor, X-axis force sensor), 

 and one (  R-Pi B  ) located on the Trolley (controls  Syringe Z-motor, Syringe Plunger motor, pH 

 Sensor Z-motor, X-axis motor).  Two 120V wall plugs must be available for the BuildHAT

 chargers, which are the sole power source for LEGOLAS and for the Arduino; the Arduino

 connects to the pH sensor and to  R-Pi A  .  A chemistry lab retort stand is used to suspend the pH sensor’s

 BNC cable and keep it out of LEGOLAS’s path of motion (  Figure 2.8  ). 

 Detailed Links:  Wiring Guide 

 Figure 2.8:  Location of relevant wires and electronic  devices for LEGOLAS 



 Raspberry Pis & WiFi Connection:  Each Raspberry Pi + BuildHAT stack must have the

 Raspberry Pi operating system installed, as well as packages for automatic remote SSH

 connection (RPyC Server) and BuildHAT interfacing.  A WiFi connection (from your computer

 to LEGOLAS) may be enabled through a local router to facilitate DHCP Client List observations

 and Address Reservations.  To expedite setup of the second Raspberry Pi + BuildHAT stack, the

 Raspberry Pi configuration can also be copied from one microSD card to another using a USB

 microSD card reader (  see Raspberry Pi & BuildHAT Setup Guide for more details on proper

 setup  ).

 Detailed Links:  Raspberry Pi & BuildHAT Setup Guide 

 File Set-up:  The preferred LEGOLAS programming environment uses Jupyter Notebooks 

 through the Anaconda platform.  Files for calibration (manual.py, config.py), movement 

 functions (core.py), demonstration (LegolasDemo.ipynb), and classroom use 

 (LegolasOutline.ipynb) can be downloaded from the LEGOLAS GitHub page (  see LEGOLAS

 Scripts  ).  Instructions for directory management and use are at the beginning of the LEGOLAS

 Calibration & Use Guide.

 Detailed Links:  LEGOLAS Scripts  ,  LEGOLAS Calibration  & Use Guide  (  pages 1-7  ) 

 System Calibration:  Prior to using LEGOLAS for experiments, one must calibrate the positions

 of relevant experimental locations (reservoirs, sample wells, DI water tank, syringe/pH sensor

 device offset) in Cartesian space relative to a user-set origin point (which is defined by an XY

 offset from the X-axis and Y-axis force sensors).  If the LEGOLAS trolley or bridge is later

 derailed, the robot can re-establish these locations relative to that origin point.  Once XY positions are



 calibrated, the user must define the range of motion (in the Z-direction) for the pH sensor and 

 syringe, as well as their distance of offset in the XY plane.  Finally, the liquid-volume/gear-step 

 ratio is defined via calibration with a mg accurate scale.  One may use keyboard controls or the 

 manual control gears to perform calibration steps.  All aforementioned steps use a GUI that 

 reduces calibration time to ~10 minutes (  Figure 2.9  ).  The calibration values are exported from 

 the GUI into  config.yaml  (  to be called in the Jupyter  Notebooks  ).  Additionally, the pH sensor 

 should be calibrated (  see Arduino pH Calibration Code  ) prior to experiments, which requires

 the Arduino IDE and a 2-point buffer solution calibration.  To extend the lifetime of the pH

 sensor bulb, it is advised to avoid long periods of exposure (> 1 hr) outside

 the pH storage solution (1 M KCl).
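
 The liquid-volume/gear-step calibration described above amounts to fitting a straight line through dispensed mass versus motor steps.  The following is a minimal sketch under stated assumptions, not the calibration code shipped with LEGOLAS: the data points are hypothetical, the function names are illustrative, and DI water is taken to have a density of 1 g/mL.

```python
# Hypothetical calibration data: plunger motor steps commanded and the
# mass of DI water (in grams) read from the scale after each dispense.
steps = [100, 200, 300, 400, 500]
mass_g = [0.41, 0.79, 1.21, 1.60, 2.01]

WATER_DENSITY_G_PER_ML = 1.0  # assumed; adequate for room-temperature DI water


def volume_per_step(steps, mass_g, density=WATER_DENSITY_G_PER_ML):
    """Least-squares slope through the origin, in mL per motor step."""
    volumes = [m / density for m in mass_g]
    numerator = sum(s * v for s, v in zip(steps, volumes))
    denominator = sum(s * s for s in steps)
    return numerator / denominator


def steps_for_volume(volume_ml, ml_per_step):
    """Whole number of motor steps needed to dispense volume_ml."""
    return round(volume_ml / ml_per_step)


ml_per_step = volume_per_step(steps, mass_g)
print(f"{ml_per_step * 1000:.3f} uL per step")
print("steps for a 2 mL sample:", steps_for_volume(2.0, ml_per_step))
```

 Forcing the fit through the origin reflects the expectation that zero steps dispense zero volume; a fit with an intercept could instead be used to expose backlash in the plunger mechanism.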

 Detailed Links:  LEGOLAS Calibration & Use Guide  ,  Arduino  pH Calibration Code 

 Figure 2.9:  (  left  ) GUI calibration window, and (  right  )  liquid-volume/gear-step calibration using 
 the mg accurate digital scale. 



 Coding Interface:  Once the system is physically calibrated, one can connect to LEGOLAS and

 verify its calibration using the  LegolasDemo.ipynb  file in Jupyter Notebooks (  Figure 2.10  ).  Students

 may also become comfortable with the motor movement functions here, before moving on to

 LegolasOutline.ipynb  (  which contains the worksheet problems in Markdown text  ) for completion

 of the assigned challenges.  Installation and usage of common ML packages (such as GPy and

 scikit-learn) in this Python environment allows students to visualize the process flow of AE in 

 one central location. 

 Detailed Links:  LegolasDemo.ipynb  ,  LegolasOutline.ipynb 

 Figure 2.10:  (  top  ) Example usage of fundamental movement  functions (  contained in core.py  ) 
 and (  bottom  ) example of a simple loop for experimental  synthesis and measurement in a 4x6 grid 

 of sample wells  (no active learning involved).  These snippets are included in the 
 LegolasDemo.ipynb  file. 



 2.5 Developmental Stages 

 This section visually depicts the 4 generations of LEGOLAS, in which mechanical 

 design, interface methods, and reliability were iteratively improved based on student feedback 

 and for the purpose of reducing material costs (  Figures 2.11-2.17  ).

 Figure 2.11:  Liquid Handling robot that inspired the  LEGOLAS design [85].  This Robot was 
 designed using LEGO® MINDSTORMS® EV3 robotic and frame components, and was capable

 of liquid dispensing and color detection (using LEGO color sensors). 



 Figure 2.12:  LEGOLAS (  Generation I  ) using a MINDSTORMS® EV3 Brick.  This model was
 accessible for control via WiFi or Bluetooth, and now included a sturdier aluminum frame, as 

 well as a pH sensor for acidity measurements. 



 Figure 2.13:  LEGOLAS (  Generation II  ) using the LEGO®  Robot Inventor Hub.  This model 
 was accessible for control via Bluetooth only, and was created to utilize LEGO’s newest robotic 
 kit as they phased out production of the MINDSTORMS® EV3 sets.  The Inventor hub acted only

 as a microcontroller, providing limited functionality compared with the EV3 kit. 



 Figure 2.14:  Classroom Setup for Fall ‘21 Implementation,  which used  Generation I & II 
 LEGOLAS. 



 Figure 2.15:  LEGOLAS (  Generation III  ) using integrated Raspberry Pi & BuildHATs to
 interface with LEGO motors.  This model was accessible for control via SSH over WiFi.

 This design featured new 3D printed axle supports and greater mechanical stability.  It was
 utilized to conduct the first live runs of the on-the-fly hypothesis validation active learning
 schema.  The Raspberry Pi & BuildHAT interface allowed the same programming freedom

 available with the prior MINDSTORMS® EV3 kits.



Figure 2.16: (top) LEGOLAS (Generation IV) used in the Fall ‘22 UMD course and (bottom)
an image of the robot with a black frame (MakerBeam extrusions).  This LEGOLAS had more
3D printed components (trolley base, pH sensor lowering assembly, and side cart) that reduced

the overall cost of the robot while improving its stability and robustness.  Generation IV
LEGOLAS could also be equipped with a camera for computer vision based AE (Chapter 5).



 Figure 2.17:  Students working on the LEGOLAS exercises during the Fall ‘22 Implementation, 
 which utilized  Generation IV  LEGOLAS. 



 Chapter 3: Educational Implementation 

 3.1 Henderson-Hasselbalch Exercise 

 The Henderson-Hasselbalch exercise was selected as a simple example for the teaching 

 of AE fundamentals in the  Machine Learning for Materials  Science  course at the University of 

 Maryland in the Fall ‘21 and Fall ‘22 semesters as a final project.  Students had spent the initial

 12 weeks of the class surveying ML techniques (regression, classification, supervised

 learning, unsupervised learning, etc.), and had just been introduced to  active learning  about 1

 week prior to the start of the project.  The selected exercise was  intentionally  simple, in that

 it contains only one control variable (mixture ratio of acid and base solutions) and one response

 variable (acidity or pH), so that fundamental concepts could be discussed in conjunction with

 hands-on implementation.  It is fundamentally a chemistry problem, but the ideas of 

 composition-property relationships can be extended into materials science based problems 

 naturally.  Although the physical system is simple, layers of complexity can be added with the 

 use of novel active learning techniques (Gaussian Processes, Bayesian Inference, and Entropy 

 Based Acquisition functions) without a related increase in system price.  Additionally, the

 chemical constituents are non-toxic and safe to work with, minimizing the risk of injury to

 students.

 The Henderson-Hasselbalch equation is a simplified and rearranged mass-action equation 

 that relates the pH of a buffer solution to the relative concentration of an acid and its conjugate 

 base (  Equation 3.1, Figure 3.1  ). 

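 (Eq.  3.1) [88]   $\mathrm{pH} = \mathrm{p}K_a + \log_{10}\left(\frac{[A^-]}{[HA]}\right)$
 $[A^-]$ = conjugate base conc., $[HA]$ = acid conc., $\mathrm{p}K_a = -\log_{10} K_a$ ($K_a$ = acid dissociation constant)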
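
 As a worked illustration of Equation 3.1, the ideal pH can be computed directly from the mixture ratio.  This is a minimal sketch, assuming acid and base stock solutions of equal molarity (so that [A-]/[HA] equals %base/%acid by volume) and using the acetic acid pKa of 4.756 quoted in this section; the function name is illustrative only.

```python
import math

PKA_ACETIC = 4.756  # pKa of acetic acid used in this chapter


def henderson_hasselbalch(pct_acid, pka=PKA_ACETIC):
    """Ideal pH of an acid/conjugate-base mixture.

    Assumes both stock solutions have equal molarity, so that
    [A-]/[HA] equals %base / %acid by volume.
    """
    if not 0.0 < pct_acid < 100.0:
        raise ValueError("pct_acid must lie strictly between 0 and 100")
    pct_base = 100.0 - pct_acid
    return pka + math.log10(pct_base / pct_acid)


for pct in (10, 50, 90):
    print(f"{pct}% acid -> ideal pH {henderson_hasselbalch(pct):.3f}")
```

 At a 50/50 mixture the logarithmic term vanishes and the ideal pH equals the pKa, which is the property the later exercises exploit.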



 Figure 3.1:  The Henderson-Hasselbalch equation for  the system of study as expressed in 
 a Titration Curve.  The trend here does not appear exactly logarithmic due to the scale of the 

 x-axis (  not in units of [base]/[acid]  )[89] 

 The equation relies on several assumptions, namely that the acid is monobasic, 

 self-ionization of water may be ignored, the salt base completely dissociates in solution, and the 

 activity coefficient quotient remains constant within experimental conditions [88].  These 

 assumptions are reliably met with buffer solutions of sodium acetate salt (NaOAc) and acetic 

 acid (CH₃COOH) each prepared at 1M concentrations (acetic acid pKa = 4.756), although the

 Henderson-Hasselbalch equation begins to deviate from experimental results at high acid 

 compositions due to the violation of certain assumptions in the simplification (  Figure 3.2  ) [88]. 

 This deviation from expected behavior is later used as a teaching example of the benefits of 

 non-parametric models such as Gaussian Processes in materials exploration.



 Figure 3.2:  Percent error in Henderson-Hasselbalch equation as a function of sodium hydroxide
 volume per 100 mL of weak acid (  see pKa = 5 as a sufficient analogue to our system with pKa =
 4.756  ).  The red region indicates solutions of higher acid concentration in which the pH does not

 correspond well with the Henderson-Hasselbalch equation [88] 

 The Henderson-Hasselbalch equation represents the functional form of interest in the 

 mechanistic study of the composition-property relationship.  In using this example, the key idea 

 to be transferred to students is that the  exploration  of the acidity as a function of composition 

 could aid in accelerating the  exploitation  of a desired  pH value (  Section 3.1.2  ).  In the exercises 

 described in the next two sections, students become familiar with manipulation of LEGOLAS 

 robotics for experimentation, quantify acidity measurement uncertainty, extract a meaningful 

 physical constant of the system (dissociation constant) by fitting a Henderson-Hasselbalch

 equation to the data with non-linear least squares regression, and finally execute a closed-loop 

 Bayesian Optimization based AE by developing a scheduled acquisition function and running the 

 live experiment themselves.  Each exercise is intended to build upon skills developed in previous 

 exercises, and to leverage their knowledge of concepts learned in the  Machine Learning for 

 Materials Science  course.  The implementation of these  exercises - as well as the team-based 



 nature of the project - are fundamental to the  transferability  aspect of LEGOLAS outlined in 

 Section 2.1.  This is further discussed in  Section  3.2.3  .  Worksheets and Python templates for all 

 exercises may be found at the links below. 

 Detailed Links:  LEGOLAS System Exercises  ,  LegolasOutline.ipynb 

 3.1.1 Introductory Exercises 

 Prior to completing any of the exercises, the students were encouraged to master the 

 system calibration tasks, and peruse LegolasDemo.ipynb, to familiarize themselves with how the 

 robot could be operated.  This was a critical step in addressing skills (1), (2), and (3) of 

 transferability:  working with their hands on hardware, interfacing with robotics/motors, and 

 demonstrating attention to detail when calibrating the machine.  During the implementation of

 LEGOLAS as the final project for the machine learning class, it was noticed that groups that

 tried to forge ahead to the exercises without devoting time to these critical

 steps had considerable trouble later when attempting closed-loop AE runs.

 Exercise 1:  This task asked the students to synthesize one 2 mL sample (50% acid, 50% base), 

 and then measure the pH 10 separate times, cleaning the probe with DI water between each 

 iteration.  From these measurements, they were asked to calculate the mean and variance of the 

 pH and present this in the notebook.  Completion of this assignment ensured that the group was 

 capable of running the constituent acquisition & deposition functions (by creating the sample), 

 had properly calibrated the liquid-volume/gear ratio (by creating exactly a 2 mL sample), had 

 properly calibrated the pH sensor (by verifying $\mathrm{pH} = 4.75$), and were giving enough time for
 the pH measurement to stabilize (~20 sec, for a $\sigma^{2} < 0.1$).  The peculiarities of the measurement



 and synthesis system presented here are helpful in addressing aspects (1), (2), and (3) of 

 transferability.  It was also important for the students to get some idea of the magnitude of the 

 noise factor in their measurements as they continued to approach the other exercises. 
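
 The statistics requested in Exercise 1 can be sketched as follows.  The readings are hypothetical values rather than course data, and the 0.05 tolerance on the mean is an assumption; only the target pH of 4.75 and the variance threshold of 0.1 come from the exercise description.

```python
import statistics

# Hypothetical repeated pH readings of one 50% acid / 50% base sample.
readings = [4.74, 4.77, 4.73, 4.76, 4.75, 4.78, 4.72, 4.75, 4.76, 4.74]

mean_ph = statistics.mean(readings)
var_ph = statistics.variance(readings)  # sample variance (n - 1 denominator)
std_ph = statistics.stdev(readings)

print(f"mean pH = {mean_ph:.3f}")
print(f"variance = {var_ph:.5f}, std = {std_ph:.3f}")

# Checks mirroring the exercise: the mean should sit near the expected 4.75
# and the variance should stay below the 0.1 stabilization threshold.
TARGET_PH = 4.75
calibrated = abs(mean_ph - TARGET_PH) < 0.05  # tolerance is an assumption
stable = var_ph < 0.1
print("sensor calibrated:", calibrated, "| measurement stable:", stable)
```

 The standard deviation obtained here is also a natural choice for the noise level of any simulated measurements used later.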

 Exercise 2:  In this task, the students were given the functional form of the 

 Henderson-Hasselbalch equation, as well as descriptions of the variables therein.  They were 

 asked to synthesize a grid of compositions (  %acid  = [10, 20, … 80], %base = 100 - %acid  ), 

 taking pH measurements between each synthesis step (cleaning the probe with DI water again as 

 well). Students were then asked to use a nonlinear least squares protocol to fit  Equation 3.2  to 

 the collected data for the purpose of extracting a pK value (  ideally, pK = 4.75  ): 

 (Eq.  3.2)   $\mathrm{pH}(x, \mathrm{p}K) = \mathrm{p}K + \log_{10}(x)$,   $x = [A^-]/[HA] = \%\,\mathrm{base}/\%\,\mathrm{acid}$
 While this exercise again required utilizing skills (1), (2), and (3) of transferability, it was

 specifically suited towards building upon skills (4) and (5): utilizing knowledge of ML

 techniques, and programming for data manipulation.  Non-linear least squares regression is

 a critical tool for any scientist dealing with data processing and, in this example, was shown to

 produce the “discovery” of a physical constant inherent to the acetic acid buffer system.  From

 the programming perspective, students had to arrange incoming data into an array of values, 

 define a function for optimization, and research the specific nonlinear least squares regression 

 documentation to understand the parameters and returns.  It was observed that a small group of 

 students did not understand the reasoning behind alternating between synthesis and measurement 

 (to prepare for an active learning style study) when they could just as easily have prepared all the 

 samples prior to measurement.  However, after a discussion, they began to understand the 



 differences between the format of a grid-like exhaustive study as shown here, and the proposed 

 format of a closed-loop AE study.  This was a subtle point of misunderstanding that went 

 apparently undetected during the purely theoretical data-science active learning sessions of the 

 earlier weeks (  i.e., without the hands-on application  ). 
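
 The fit requested in Exercise 2 can be sketched with SciPy's curve_fit routine.  This is one possible approach rather than the solution distributed with the course: the "measured" data are synthetic (the ideal curve plus Gaussian noise with an assumed standard deviation of 0.03), and the name hh_model is illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit


def hh_model(x, pk):
    """Equation 3.2: pH as a function of x = %base / %acid."""
    return pk + np.log10(x)


# Composition grid from the exercise: %acid = 10, 20, ..., 80.
pct_acid = np.arange(10, 90, 10, dtype=float)
x = (100.0 - pct_acid) / pct_acid

# Synthetic stand-in for measured data: ideal curve plus Gaussian noise.
rng = np.random.default_rng(seed=0)
ph_measured = hh_model(x, 4.75) + rng.normal(0.0, 0.03, size=x.size)

# Non-linear least squares fit with an initial guess for pK.
popt, pcov = curve_fit(hh_model, x, ph_measured, p0=[5.0])
pk_fit = popt[0]
pk_err = float(np.sqrt(pcov[0, 0]))  # one-sigma uncertainty on pK
print(f"fitted pK = {pk_fit:.3f} +/- {pk_err:.3f}")
```

 Because the model is linear in pK, the fitted value coincides with the mean of pH - log10(x); the general-purpose optimizer is used here because it carries over unchanged to models with more parameters.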

 Exercise 3:  In this task, students were introduced to the use of a Gaussian Process and the 

 importance of its kernel and model hyperparameters.  There was no experimental use of 

 LEGOLAS within this exercise, as it used the data collected from the grid-like study of  Exercise 

 2  .  Students were asked to use a Radial Basis Function  (RBF) kernel, and output the Gaussian 

 Process hyperparameters (kernel length scale, kernel variance, noise variance) after several 

 iterations of a hyperparameter optimizer.  Next, they were asked to repeat this process, but now 

 using only 6 data points (  %acid = [10, 20, … 60]  ).  As mentioned in  Section 3.1  , the

 assumptions of the Henderson-Hasselbalch equation are valid only within a certain composition

 range, which motivates this restriction of relevant data points (  Figure 3.3  ).  Students were observed to

 ask questions about this limitation, leading to further discussions that helped solidify

 understanding of the flexibility of a Gaussian Process over a rigid parameterized functional form

 as seen in  Exercise 2  .  This discussion briefly addressed aspect (9) of transferability: the

 discernment to understand pros and cons of AE.
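
 To make the role of the RBF kernel hyperparameters concrete, the following is a from-scratch NumPy sketch of Gaussian Process regression.  It is not the code used in the course, where packages such as GPy or scikit-learn would also optimize the hyperparameters; here the values are fixed by hand for illustration, the data are the ideal curve rather than measurements, and all function names are assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential (RBF) covariance between 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, length_scale, variance, noise):
    """Posterior mean and latent standard deviation of a GP.

    The prior mean is a constant set to the average of the training targets.
    """
    y_mean = y_train.mean()
    k_tt = rbf_kernel(x_train, x_train, length_scale, variance)
    k_tt += noise * np.eye(x_train.size)  # noise variance on the diagonal
    k_ts = rbf_kernel(x_train, x_test, length_scale, variance)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = k_ts.T @ alpha + y_mean
    v = np.linalg.solve(chol, k_ts)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Synthetic stand-in for the Exercise 2 grid (ideal curve, pK = 4.75).
pct_acid = np.arange(10.0, 90.0, 10.0)
ph = 4.75 + np.log10((100.0 - pct_acid) / pct_acid)
grid = np.linspace(5.0, 95.0, 91)

# Length scales chosen by hand to contrast short, moderate, and long values.
for ls in (2.0, 20.0, 200.0):
    mu, sd = gp_posterior(pct_acid, ph, grid, ls, variance=1.0, noise=1e-3)
    print(f"length scale {ls:6.1f}: max predictive std = {sd.max():.3f}")
```

 A very short length scale leaves the model almost uninformed between samples, whereas a very long one forces a nearly rigid trend; plotting mu and sd over grid for each setting makes the contrast visible.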



 Figure 3.3:  Gaussian Process (RBF Kernel) demonstrated  for 5 samples within a composition 
 range of %acid = [5,.. 95].  The 95% Confidence Interval of the GP is represented in light blue, 

 with the mean as a dark blue line.  It can be seen that the data points (x’s) deviate from the 
 expected Henderson-Hasselbalch equation (dotted line) at high acid compositions (red region).

 In the second portion of Exercise 3, students went further into aspect (9) of 

 transferability, by purposefully implementing bad model assumptions for the Gaussian Process, 

 and analyzing the effects on model outputs.  They were asked to rigidly fix certain

 hyperparameter values - such as length scale - at values that were either too low or too high to

 see that this could either “overfit” or “underfit” the data, respectively.  Next, they were asked to use an

 inappropriate and less flexible kernel (standard periodic) to see that this imposed an (incorrect) 

 periodic form to the Gaussian Process model.  Through this type of study, students began to see 

 that these data processing and active learning tools are not necessarily always applicable 

 off-the-shelf, and could sometimes require  human insight  or tuning to be of real service. 

 Throughout this exercise, students were also asked to plot the Gaussian Process (mean, 2σ
 Confidence Interval), idealized Henderson-Hasselbalch model (for reference), and experimental



 data (  see Figure 3.4 for example graphs  ).  This ensured that they were also working on skill (8) of

 transferability: presenting data in a digestible & meaningful fashion.

 3.1.2 Autonomous Closed Loop Exercise 

 The Autonomous Closed Loop exercise was the culminating task for the students

 and was intended to demand the most time & thought on their part.  It was designed to combine all

 9 critical aspects of transferability, with emphasis on (6), (7), and (9):  the insight to encapsulate 

 research “objectives” into active learning schema, the foresight and knowledge to simulate 

 live-runs, and the discernment to understand AE pros & cons.  Students who did not have a good 

 grasp of the theoretical fundamentals of active learning (from the prior classwork) struggled to 

 understand the significance and development of acquisition functions.  Many students also had a 

 hard time differentiating between what  they  knew about the composition-property relationships 

 (because they were given the Henderson-Hasselbalch equation) and what  the robot  knew 

 (because in this toy problem we were assuming it began uninformed) [66].  This was surprising, 

 as these same students had succeeded in completing simulated active learning assignments 

 earlier in the course.  However, it highlighted the most beneficial aspect of LEGOLAS:  AE as a 

 field cannot be taught reliably and effectively without a real-world tangible system for 

 experimental application  .  The majority of these same students who struggled to understand the

 benefits of active learning and AE were then capable - after successfully completing  Exercise 4  - 

 of clearly elucidating their thought processes and acquisition function implementation while 

 presenting their final project results. 

 Exercise 4:  In this task, the students were able to use the Gaussian Process understanding and 

 code they developed in  Exercise 3  within a closed-loop AE process framework to  autonomously 



 determine the composition (mixture ratio) at which the pH = 4.75.  As stated earlier, this led to 

 some confusion among students, as  they  already knew  what composition this occurred at (50% 

 acid, 50% base),  and  they had already collected enough  data in Exercises 1-3 to reliably extract 

 this.  It was also confusing to students that this was indeed an  exploitation  based CO, because the 

 property value they were targeting was neither a maximum nor minimum acidity metric.  The 

 students were told to discuss the merits of different acquisition function schemas, and 

 encouraged to combine  exploitation  and  exploration  COs in some manner.  This was an iterative

 process in which students would come to the TAs for guidance.  Collectively, the class would

 first talk through what it  meant  to explore (looking  towards the  unknown  ), what it  meant  to 

 exploit (where do we think our target is considering what we know?), and what would be an 

 appropriate order in which to execute those initiatives.  The drawbacks of exploiting from the 

 start (i.e., “shooting in the dark”) and purely exploring (effectively a grid search) were discussed. 

 A question that often came up was  why  we were doing this in the first place (the goal is to

 reduce the number of experiments).  Of course, a toy problem like the Henderson-Hasselbalch

 equation does not seem to need this reduction in experiments, but the discussion of the merit of AE

 in this case was a valuable expression of aspect (9) of transferability: discerning between AE

 pros and cons.

 Once the theoretical and philosophical discussions of the CO order were understood, the 

 students again came for guidance with respect to the mathematical encapsulation of these  human 

 ideas  into quantitative acquisition functions.  This  discussion, along with the prior philosophical 

 one, embodied aspect (6) of transferability most directly; the insight to encapsulate research 

 “objectives” into active learning schema.  Some groups decided on a hardcoded cutoff between a 

 purely exploration based CO (  Equation 3.3  ) and purely  exploitation based CO (  Equation 3.4  ). 



 This could, for example, consist of doing 4 experiments exploring, then spending the remainder 

 of the experiments exploiting.  Other groups recognized that although this

 approach could work, it depended on our  prior knowledge  of the simplicity of the

 system for picking this cutoff point.  Those groups, instead, utilized their knowledge of other

 techniques - namely, a modified upper confidence bound (UCB) - to have an adaptable and 

 continuous transition between these COs (  Equation  3.5  )[90].  Certain aspects of UCB tuning led 

 to further discussions, broadening the students' understanding of the potential data research 

 opportunities within the acquisition function based side of AE. 

 (Eq.  3.3)   $\alpha = \left[\sigma^{2}\right]$
 (Eq.  3.4)   $\alpha = \left[-\left|\mu - 4.75\right|\right]$
 (Eq. 3.5) [90]   $\alpha = \left[\left(-\left|\mu - 4.75\right|\right) + \sigma\,(\ldots)\right]$
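
 The acquisition strategies discussed above can be sketched as functions of the Gaussian Process mean and standard deviation.  This is a minimal illustration under stated assumptions: the toy mu and sigma values are invented, the hard cutoff of 4 exploratory runs follows the example in the text, and the constant weight on sigma in blended is a hypothetical stand-in, not the weighting term of Equation 3.5.

```python
import numpy as np

TARGET_PH = 4.75


def explore(mu, sigma):
    """Pure exploration: favor the largest predictive variance."""
    return sigma**2


def exploit(mu, sigma, target=TARGET_PH):
    """Pure exploitation: favor predicted pH closest to the target."""
    return -np.abs(mu - target)


def scheduled(mu, sigma, n_done, n_explore=4, target=TARGET_PH):
    """Hard cutoff: explore for the first n_explore runs, then exploit."""
    if n_done < n_explore:
        return explore(mu, sigma)
    return exploit(mu, sigma, target)


def blended(mu, sigma, weight, target=TARGET_PH):
    """UCB-like blend; `weight` on sigma is a hypothetical tuning knob."""
    return exploit(mu, sigma, target) + weight * sigma


# Toy GP output on five candidate compositions (%acid).
candidates = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
mu = np.array([5.7, 5.1, 4.8, 4.4, 3.8])
sigma = np.array([0.05, 0.30, 0.10, 0.40, 0.60])

print("explore ->", candidates[np.argmax(explore(mu, sigma))])
print("exploit ->", candidates[np.argmax(exploit(mu, sigma))])
print("blended ->", candidates[np.argmax(blended(mu, sigma, weight=2.0))])
```

 In each case the next experiment is the candidate that maximizes the acquisition value; shrinking the weight over successive runs would move the blended rule smoothly from exploration toward exploitation.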

 In the actual implementation of  Exercise 4,  the students also began to understand that

 they needed an  exit criterion  at which the AE loop  would terminate.  Upon further discussions, it 

 became clear that this could introduce even more complexity to their AE loop.  If, by chance, 

 they serendipitously selected a composition of 50% acid, 50% base (with a pH sufficiently close 

 to 4.75) during their  exploration  initiative portion,  would that constitute success?  Should they 

 verify their result?  Was it important to continue searching the composition space for other 

 potential compositions that might produce this specific acidity?  No singular answer to these 

 questions was identified as correct, and the discussion itself was the teaching point.  Without a 

 tangible experimental system such as LEGOLAS, pertinent questions like this are easily 

 overlooked  . 
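
 One possible exit criterion is sketched below.  As noted above, no single rule was identified as correct, so this is only an illustration: the tolerance, the number of confirming measurements, and the experiment budget are all hypothetical defaults, as is the function name.

```python
def should_stop(history, target=4.75, tol=0.05, n_confirm=2, max_runs=20):
    """One possible exit rule for the closed loop.

    history: list of (pct_acid, measured_ph) tuples in run order.
    Stops once the last n_confirm measurements all fall within tol of
    the target, or when the experiment budget max_runs is exhausted.
    """
    if len(history) >= max_runs:
        return True
    recent = history[-n_confirm:]
    if len(recent) < n_confirm:
        return False
    return all(abs(ph - target) <= tol for _, ph in recent)


# A single lucky hit does not end the run; a confirmed hit does.
print(should_stop([(50.0, 4.76)]))
print(should_stop([(50.0, 4.76), (50.0, 4.74)]))
```

 Requiring a confirming measurement addresses the question of whether a serendipitous hit during exploration should count as success, at the cost of at least one extra experiment.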



 As mentioned before, some groups who did not give much attention to calibration (  aspect 

 (3)  ) either had hardware malfunctions with the robot  (they are not supposed to touch or assist the 

 robot during the final closed-loop study), or erroneous pH metrics that led to incorrect answers. 

 Another critical aspect of designing their final closed-loop AE was to make use of

 simulated runs.  Because each synthesis/measurement/analysis loop took about 90

 seconds on average, groups that attempted to troubleshoot their active learning code during live runs

 often became frustrated and fell behind the others because of the time spent

 resetting the experiment.  Those that followed the advice to use simulated experimental data

 (i.e., a Henderson-Hasselbalch function with random white noise added) refined their

 active learning code first and, once they knew it worked, proceeded confidently with the

 experiment (aspect (7) of transferability).
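 A simulated measurement of this kind can be sketched as follows.  The sketch assumes a pKa of 4.75 (so that a 50/50 mixture gives the target pH), acid and base stock solutions of equal concentration, and Gaussian noise with a hypothetical standard deviation; the function and parameter names are illustrative only.

```python
import math
import random

PKA = 4.75  # assumed pKa; a 50% acid / 50% base mixture then gives pH 4.75


def simulated_ph(base_fraction, noise_sd=0.05, rng=None):
    """Henderson-Hasselbalch pH for a given base fraction, plus white noise.

    pH = pKa + log10([base] / [acid]), with the concentration ratio taken as
    base_fraction / (1 - base_fraction).
    """
    if not 0.0 < base_fraction < 1.0:
        raise ValueError("base_fraction must lie strictly between 0 and 1")
    rng = rng or random
    ideal = PKA + math.log10(base_fraction / (1.0 - base_fraction))
    return ideal + rng.gauss(0.0, noise_sd)


# A seeded generator makes debugging runs repeatable.
rng = random.Random(0)
fake_measurements = [simulated_ph(x, rng=rng) for x in (0.1, 0.5, 0.9)]
```

 Swapping a function like this in for the robot's measurement call lets the active learning code be exercised in milliseconds rather than 90 seconds per sample.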

 An example of a live-run evolution of the Gaussian Process for a purely exploration-based

 CO (Equation 3.3) is shown in Figure 3.4, for samples 1 through 5.  The Gaussian Process

 can be seen to approach the logarithmic behavior of the Henderson-Hasselbalch equation, as the

 acquisition function prioritizes compositions with high model uncertainty.
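 The behavior described here can be reproduced offline with a small, dependency-free sketch.  It is not the class code: the kernel length scale, noise level, candidate grid, starting composition, and the noise-free Henderson-Hasselbalch stand-in for the measurement are all assumptions chosen for illustration.

```python
import math

PKA = 4.75  # assumed pKa of the buffer system


def true_ph(x):
    """Noise-free Henderson-Hasselbalch stand-in for a real measurement."""
    return PKA + math.log10(x / (1.0 - x))


def rbf(a, b, length=0.2):
    """RBF kernel with unit signal variance (length scale is an assumed value)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and variance at x_new for a GP fitted to (xs, ys)."""
    offset = sum(ys) / len(ys)  # constant mean: centre the data before fitting
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - offset for y in ys])
    v = solve(K, k)
    mean = offset + sum(ki * ai for ki, ai in zip(k, alpha))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v))
    return mean, max(var, 0.0)


def run_exploration(n_samples=5):
    """Pure exploration (Equation 3.3): always sample where variance is largest."""
    grid = [i / 20 for i in range(1, 20)]  # candidate base fractions 0.05 to 0.95
    xs, ys = [0.5], [true_ph(0.5)]
    while len(xs) < n_samples:
        variances = [gp_posterior(xs, ys, x)[1] for x in grid]
        x_next = grid[variances.index(max(variances))]
        xs.append(x_next)
        ys.append(true_ph(x_next))
    return xs, ys


sampled_x, sampled_ph = run_exploration()
```

 Because the rule ignores the predicted mean entirely, the samples spread across the composition range rather than clustering near pH 4.75, which is why the fitted curve fills in the overall logarithmic shape.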

 Figure 3.4:  Gaussian Process evolution (RBF kernel) for the first 5 experimental samples with a
 purely exploration-based acquisition function.

 3.2 Class Implementations 

 This section summarizes the implementations of LEGOLAS within the

 Machine Learning for Materials Science course in the Fall 2021 and Fall 2022 semesters.  This class

 (3 credits) is listed on the University of Maryland Schedule of Classes as ENMA437 (previously

 ENMA 489L) for undergraduates, and ENMA637 for graduate students.  It has been offered once

 a year in the Fall semester since 2020.  Enrollment has typically been 12-16

 students, about half graduate and half undergraduate.  I served as the main

 teaching fellow for the class in 2021 and 2022, when LEGOLAS was fully implemented as the

 final project tool.  I spent the summer before the Fall 2021 offering developing

 LEGOLAS.

 Course Description:  Familiarizes students with basic  as well as state of the art knowledge of 

 machine learning and its applications to materials science and engineering. Covers the range of 

 machine learning topics with applications including feature identification and extraction, 

 determining predictive descriptors, uncertainty analysis, and identifying the most informative 

 experiment to perform next. One focus of the class is to build the skills necessary for developing 

 an autonomous materials research system, where machine learning controls experiment design, 

 execution, and analysis in a closed-loop. 

 The LEGOLAS Exercises were designated as the final project for the class, and students 

 were split into teams of 4-6 per robot.  They were given the last 3-4 weeks of in-class

 time (3 hrs/week) to work on the project, and were allowed extra lab time if they felt they needed

 it.  Credit was given for each exercise either through TA observation of its completion, or through

 a video taken by group members (see LEGOLAS Rubric in Detailed Links).  Additionally,

 students prepared a final presentation (20 minutes) in which they summarized their completion of

 the Exercises, elucidated their decision making processes, highlighted the difficulties they had, 

 and offered suggestions for course improvements. 

 Detailed Links:  LEGOLAS Rubric 

 3.2.1 Fall 2021 

 In the Fall of 2021, the class utilized  Generation  I & II  LEGOLAS systems (  Figures 2.12 

 - 2.14  ).  There were 3 teams of 6 students, w