ABSTRACT

Title of Dissertation: ECOLOGICAL APPLICATIONS OF MACHINE LEARNING TO DIGITIZED NATURAL HISTORY DATA

Alexander John Robillard, Doctor of Philosophy, 2022

Dissertation directed by: Associate Professor Christopher Rowe, University of Maryland Center for Environmental Science

Natural history collections are a valuable resource for assessment of biodiversity and species decline. Over the past few decades, digitization of specimens has increased the accessibility and value of these collections. As such, the number and size of these digitized data sets have outpaced the tools needed to evaluate them. To address this, researchers have turned to machine learning to automate data-driven decisions. Specifically, applications of deep learning to complex ecological problems are becoming more common. Accordingly, this dissertation aims to contribute to this trend by addressing, in three distinct chapters, conservation, evolutionary, and ecological questions using deep learning models.

In the first chapter we focus on the trade in hawksbill sea turtle derived products, which continues internationally in physical and online marketplaces despite current regulations prohibiting their sale and distribution. To curb the sale of illegal tortoiseshell, application of new technologies like convolutional neural networks (CNNs) is needed. Therein we describe a curated data set (n = 4,428) which was used to develop a CNN application we are calling "SEE Shell", which can identify real and faux hawksbill derived products from image data. Developed on a MobileNetV2 using TensorFlow, SEE Shell was tested against a validation (n = 665) and test (n = 649) set, where it achieved an accuracy between 82.6% and 92.2% depending on the certainty threshold used. We expect SEE Shell will give potential buyers more agency in their purchasing decisions, in addition to enabling retailers to rapidly filter their online marketplaces.

In the second chapter we focus on recent research which utilized geometric morphometrics, associated genetic data, and Principal Component Analysis to successfully delineate Chelonia mydas (green sea turtle) morphotypes from carapace measurements. Therein we demonstrate a similar, yet more rapid, approach to this analysis using computer vision models. We applied a U-Net to isolate carapace pixels in images (n = 204) of juvenile C. mydas from multiple foraging grounds across the Eastern Pacific, Western Pacific, and Western Atlantic. These images were then sorted based on general alignment (shape) and coloration of the pixels within the image using a pre-trained computer vision model (MobileNetV2). The dimensions of these data were then reduced and projected using Uniform Manifold Approximation and Projection. Associated vectors were then compared to simple genetic distance using a Mantel test. Data points were then labeled post hoc for exploratory analysis. We found clear congruence between carapace morphology and genetic distance between haplotypes, suggesting that our image data have biological relevance. Our findings also suggest that carapace morphotype is associated with specific haplotypes within C. mydas. Our cluster analysis (k = 3) corroborates past research which suggests there are at least three morphotypes across the Eastern Pacific, Western Pacific, and Western Atlantic.

Finally, within the third chapter we discuss the sharp increase in agricultural and infrastructure development and the paucity of widespread data available to support conservation management decisions around the Amazon.
To address these issues, we outline a more rapid and accurate tool for identifying fish fauna in the world's largest freshwater ecosystem, the Amazon. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification, or genetic testing for species recognition at a molecular level. To overcome these challenges, we built an image masking model (U-Net) and a CNN to mask and classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru in 2018 and 2019. Species identifications in the training images (n = 3,068) were verified by expert ichthyologists. These images were supplemented with photographs taken of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian's National Museum of Natural History. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to more effectively participate in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.

ECOLOGICAL APPLICATIONS OF MACHINE LEARNING TO DIGITIZED NATURAL HISTORY DATA

by

Alexander John Robillard

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2022

Advisory Committee:
Dr. Christopher Rowe, Chair
Dr. Helen Bailey, Co-Chair
Dr. Vyacheslav Lyubchich
Dr. Rebecca Dikow
Dr. Jeffrey Seminoff
Dean's Representative: Dr. Gerald Wilkinson

© Copyright by Alexander John Robillard 2022

Dedication

This manuscript is dedicated to the increase and diffusion of knowledge.
https://www.iucn.org/news/secretariat/202105/a-tribute-lee-merriam-talbot-1930-2021

Acknowledgements

This doctoral dissertation was supported through the Smithsonian Institution Predoctoral Fellowship, the UMD Covid Relief Grant, and through a UMCES Graduate Student FRA. Significant funding was generously given to support this research by The Bently Foundation (Ch. 1), FAPESP (#2016/19075-9), Smithsonian's Global Genome Initiative (GGI-Rolling-2019-2020; 2019-242), and GeoPark Perú (Ch. 3). Computation was performed on the Smithsonian High-Performance Computing Cluster (SI-HPC; doi.org/10.25572/SIHPC). I'd like to thank Brad Nahill and the entire SEE Turtles network for their significant contributions and continued support for this work. Special thanks go to Dr. Christine Madden Hof of the World Wildlife Fund for Nature for her contributions to the development of the SEE Shell project and associated research. I would like to thank Cristian Ramirez-Gallego, Didiher Chacón Chaverri, Karla G. Barrientos-Munoz, Callie A. Veelenturf, Muhammad Jayuli, Dr. Hiltrud Cordes, Didiher Chacón Vargas, Marino Abrego, and Captain Genoveva Forero from the SEE Turtles network for aiding in image data collection. I'd also like to thank Dr. Jose Urteaga for his feedback on the manuscript and overall scope of the SEE Shell project. My sincere thanks go to Dr. Rocío Álvarez-Varas for her support, enthusiasm, initiated research, collaboration, and vital commentary on drafts of Chapter 2. Additionally, I'd like to thank Dr.
Daniel Godoy, Alejandro Fallabrino, Dr. Gabriela Vélez, Dr. Eduardo Reséndiz, Maike Heidemeyer, Dr. Juan Pablo Muñoz, Daniela Alarcón, Dr. Joanna Alfaro, and Dr. Jeffrey Mangel for their initiated field work. Their collective body of work, their permitting, and excellent data curation were the foundation for the research conducted in Chapter 2. Additional thanks go to UMCES REU student Sammy Arnold who, for his summer project, assisted me with development of the turtle shell masking model. Special thanks go to Dr. Michael Jensen for his support on several chapters of this dissertation, including data collection and offering vital commentary on this manuscript. I'd like to thank Dr. Jessica Deichmann of the Smithsonian National Zoo and Conservation Biology Institute for her significant contributions and support for the work carried out in Chapter 3; her help and guidance throughout this process were pivotal to the success of this dissertation. Additional thanks to Morgan Ruiz-Tafur and Edgard Leonardo Dávila Panduro for their input on drafts of Chapter 3, and for their initiated field efforts. Thanks go to Dr. C. David de Santana for his vital input and edits on Chapter 3 as well. I thank the people of the Achuar native community of Brasilia for access to their territory and their interest and contribution to the project. I thank the Indigenous Socio-Environmental Monitors (MOSAI) and local indigenous community experts from the Wampis and Achuar nations for invaluable help with data collection in the field, including Sem Flores Bautista, Antonio Dias Pinedo, Persiles Dias Espinar, Wilson Chichipe Villega and Peas Mukuik Tsunki. We are grateful to Diego Balbuena, Diana Vásquez, Ernesto Yallico and the GeoPark Perú HSE team for logistical support in the field. Homero Sánchez Riveiro and James Garcia Ayala contributed to species identifications. I would like to thank Lynne Parenti, Sandra Raredon, Kris Murphy, and the entire Smithsonian Museum Support Center staff for granting us access to and enabling us to successfully traverse the USNM ichthyological collections. Additional thanks go to Julianna Hazera, Erika Ali, Shauna Rasband and Guillem Millan for assistance photographing USNM specimens. Fish sampling in Peru in 2018 and 2019 was conducted under permits RD N264-2018-PRODUCE-DGPCHDI and RD N358-2019-PRODUCE-DGPCHDI. This is contribution #65 of the Peru Biodiversity Program of the Smithsonian's Center for Conservation and Sustainability. I'd like to thank my dissertation committee: Helen, Rebecca, Jeff, Chris, and Slava. Through the highs and lows of the development of this manuscript, you were unrelenting in your support. Through your collective acts of patience, leadership, and generosity I have come to learn the true meaning of mentorship. Additionally, my sincere gratitude goes to Dr. George Zug, and the late Dr. Lee Talbot, for their contributions to the field of ecology as well as their guidance throughout this journey. Special thanks go to Mike Trizna, Dr. Alex White, Dr. Mirian Tsuchiya and the rest of the group at the Smithsonian OCIO Data Science Lab; your thoughtful input and help at every step were only outmatched by your friendship. Additional appreciation goes to Nicole Barbour, Amber Fandel, Ben Colbert and the entire Bailey lab group; your kindness, collaboration and friendship were vital to my success.
I'd like to thank my roommates Blake Klocke, Fiorella Andrea Briceño Huerta, Cecilia Barriga Bahamonde, and Anne Safiya Clay, for all the friendship and dinner parties. Finally, I'd like to thank my parents Mark and Regina, my siblings Janine and Steven, and my partner Brigid, for their undying support over the years, which made this dissertation a reality.

Table of Contents

Dedication ..... ii
Acknowledgements ..... iii
Table of Contents ..... vi
List of Tables ..... viii
List of Figures ..... ix
Introduction ..... 1
    References ..... 5
Chapter 1: SEE Shell ..... 14
    Introduction ..... 14
    Methods ..... 17
        Preprocessing Steps ..... 19
        Training, Validation, Testing ..... 20
    Results ..... 21
    Discussion ..... 24
    Conclusions ..... 28
    References ..... 28
Chapter 2: Mapping Genetic Lineage Through Morphology ..... 35
    Introduction ..... 36
    Methods and Experimental Design ..... 41
        Data Collection and Study Sites ..... 41
        Data Processing ..... 41
        Semi-Supervised Classification and Dimensionality Reduction ..... 42
        Genetic Analysis ..... 43
        Analysis and Visualization ..... 45
    Results ..... 45
    Discussion ..... 50
    Conclusions ..... 60
    References ..... 60
Chapter 3: Application of a Deep Learning Image Classifier for Identification of Amazonian Fishes ..... 83
    Introduction ..... 84
    Methods ..... 87
        Preprocessing Steps ..... 88
        Identification Model Architecture, Training, and Validation ..... 88
    Results ..... 89
    Discussion ..... 93
    Conclusions ..... 96
    References ..... 96
Appendices ..... 105

List of Tables

Table 1.1 ..... 24
Table 2.1 ..... 44
Table 3.1 ..... 91
Table A.1 ..... Appendix I
List of Figures

Figure 1.1 ..... 19
Figure 1.2 ..... 21
Figure 1.3 ..... 22
Figure 1.4 ..... 23
Figure 1.5 ..... 28
Figure 2.1 ..... 42
Figure 2.2 ..... 46
Figure 2.3 ..... 47
Figure 2.4 ..... 48
Figure 2.5 ..... 49
Figure 2.6 ..... 50
Figure 3.1 ..... 89
Figure 3.2 ..... 92
Figure A.1 ..... Appendix II

Introduction

The collection and archival of physical voucher specimens in museums and other repositories have proven extremely useful for analyzing long-term ecological trends (Shaffer et al., 1998). Natural history collection data have proven to be a valuable resource for assessment of species decline (Shaffer et al., 1998) and evaluation of biodiversity (Ponder et al., 2001; O'Connell Jr et al., 2004). Over the past decade the prevalence of large biodiversity datasets has grown rapidly (Weinstein, 2018). Digitization, the generation of digital images of physical voucher specimens, has greatly expanded the usage of collections (Hedrick et al., 2020). Throughout history, ecologists, naturalists, and evolutionary biologists have utilized drawings, paintings, and photographs to study the natural world (Lürig et al., 2021; Hayashi & Yasuda, 2022). As an archival method, such data have proven to be useful for documenting historical natural heritage (Hayashi & Yasuda, 2022). The recent push to convert specimen data for mobilization on online platforms, like iDigBio (Matsunaga et al., 2013), has enhanced the value of these collections by creating secondary workflows that are entirely digital (Hedrick et al., 2020). Similarly, platforms like iNaturalist (Van Horn et al., 2018) have even circumvented the physical voucher and utilize direct data capture. Implementation of these digital-only pipelines has only accelerated the growth in size and scope of such collections (Hedrick et al., 2020). As such, the number and size of image data sets have largely outpaced the tools necessary to evaluate them (Weinstein, 2018; Lürig et al., 2021).
Machine learning, a discipline with roots in the mid-twentieth century, applies advanced algorithms that can make data-driven decisions without being explicitly programmed for each task (Thessen, 2016). Machine learning is often utilized in ecology for time-series forecasting (Lin et al., 2018; Li et al., 2021; Lucas, 2020) and dynamic time warping (Hegg & Kennedy, 2021). Machine learning can enable users to find insights from otherwise uninterpretable data. For example, Okamoto et al. (2020) were able to successfully estimate sea turtle species composition by using a random forest classifier on catch records from 10,490 longline fishing operations. Deep learning, which was popularized in 2012, is a branch of machine learning that utilizes neural networks with multiple layers to automate detection of features, often applied to more complex problems (Christin et al., 2019). Computer vision by way of deep learning is a growing technology that sorts and categorizes image data directly from input images. Today, such tools are answering any number of key questions about photographic datasets and their underlying information (Weinstein, 2018; Lürig et al., 2021). A rapid increase in the computational power behind machine learning models (Thessen, 2016; Maeda-Gutiérrez et al., 2020) has acted as a catalyst for implementation of computer vision across several biological (Brosch et al., 2014; Hua et al., 2015) and ecological studies (Grinblat et al., 2016; Norouzzadeh et al., 2018; Younis et al., 2018; Borowiec et al., 2021). Unfortunately, many past applications of machine learning to the natural world have focused on predicting rather than understanding ecology (Lucas, 2020), but this paradigm is changing (Borowiec et al., 2021; Lürig et al., 2021). More recently, deep learning has been applied to animal behavior (Mönck et al., 2018), population genetics (Sheehan & Song, 2016), niche modeling (White et al., 2019) and species delimitation studies (Saryan et al., 2020). Within this dissertation, the hope is that similar explorations at the intersection of ecology and deep learning can be made. Specifically, each chapter describes an independent study carried out with the intention that it can act as a case study for applying computer vision to digitized natural history data to address conservation, evolutionary, and ecological issues. Respectively, each of these areas is representative of one chapter of this dissertation. The first chapter outlines a novel application of computer vision to identifying Eretmochelys imbricata (hawksbill) derived products known as "tortoiseshell". Herein is a description of how a Convolutional Neural Network (CNN), which can accurately identify tortoiseshell products, was generated. This is undoubtedly a major conservation issue given that the endangered E. imbricata are by far the most illegally traded sea turtle species on the planet (Frazier et al., 2003; Miller et al., 2019; Nahill et al., 2020). Application of such a tool has the potential for direct and wide-ranging impact, enabling data collection on, and prevention of, illegal sales which perpetuate the trade in both physical and virtual marketplaces (Nahill et al., 2020). Although past efforts have applied machine learning to the illegal wildlife trade (Di Minin et al., 2019), there is no evidence to date which suggests such efforts have applied computer vision to the sea turtle trade specifically.
This case study may prove to be a critical first step toward applying this novel technique to other endangered species derived products. The second chapter describes a novel approach for analyzing and contextualizing morphometric image data of a heritable phenotypic trait (carapace) in Chelonia mydas (green sea turtle). Past research has used machine learning to delimit species across several taxa (Derkarabetian et al., 2019; Derkarabetian et al., 2022; Perez et al., 2022), including terrestrial turtles (Martin et al., 2021). Separately, dimensionality reduction by way of Principal Component Analysis (PCA) on multivariate data has also been applied to morphological data as a means to delimit species (Álvarez-Varas et al., 2019). In tandem, machine learning and PCA have proven to be complementary tools with the ability to identify species boundaries from morphological data (Saryan et al., 2020). Uniform Manifold Approximation and Projection (UMAP) is another form of dimensionality reduction (McInnes et al., 2018) which, when used in tandem with a machine learning model, has the capacity to outperform PCA (Chari et al., 2021; Yang et al., 2021). Proper delimitation of species has evolutionary implications which can directly impact species counts and biodiversity assessments; this in turn impacts advocacy and funding (Seminoff & Shanker, 2008; Funk et al., 2019; Saryan et al., 2020), making comparisons among similar methods vital. The overall goal for the second chapter is to use a machine learning model to sort the C. mydas carapace image data based on general alignment (shape) and coloration of the pixels within the image, and project it into a morphospace similar to Álvarez-Varas et al. (2019). By using UMAP to examine an expanded version of the Álvarez-Varas et al. (2019) data, one can make direct comparisons between dimensionality reduction strategies. The output embeddings from this analysis will be compared to associated genetic data in hopes of validating the biological relevance of the image morphology data. This suggested pipeline, which utilizes image data rather than individual measurements, may prove to expedite future morphometric analysis and could prove to be the theoretical foundation for other computer vision applications focused on sea turtle morphology. The research in the third and final chapter of this dissertation attempts to generate a Convolutional Neural Network (CNN) which is able to accurately identify Amazonian fish species. Specifically, we describe the methods used to build this CNN, and the underlying data set on which it was trained. The Amazon basin is home to over 2,700 species of freshwater fish (Junk et al., 2007; Dagosta & De Pinna, 2019), many of which are critical to the health and economy of the people living in the Amazon (Moreau & Coomes, 2007; Coomes et al., 2010). Given the difficulty associated with identifying and differentiating species of fish (Kirsch et al., 2018), it is no surprise that many of the species within this region are not described (Reis et al., 2016). Past efforts have attempted to utilize computer vision to identify fish with varying levels of success (Hernández-Serna & Jiménez-Segura, 2014; Sun et al., 2016; Alsmadi et al., 2019). Given that the region our data comes from, within the sub-drainages of the Marañón river, is one of the most undersampled in the Amazon (Jézéquel et al., 2020), such a tool may prove extremely useful for bridging this informational gap. Additionally, the methods and dataset described within the third chapter should provide a foundation for other computer vision projects seeking to identify freshwater fish within the Amazon.
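To make the dimensionality-reduction comparison described above for the second chapter more concrete, the following minimal sketch, written against TensorFlow/Keras, scikit-learn, and the umap-learn package, embeds masked carapace images with an ImageNet-pretrained MobileNetV2 and projects the same feature vectors with both PCA and UMAP. It is not the code used in Chapter 2; the folder name and all parameter values are illustrative assumptions.

```python
# Minimal sketch (assumed setup): MobileNetV2 features, then PCA and UMAP morphospaces.
import glob

import numpy as np
import tensorflow as tf
import umap  # provided by the umap-learn package
from sklearn.decomposition import PCA

# MobileNetV2 as a fixed feature extractor (no classification head).
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg", weights="imagenet")

def embed(image_paths):
    """Return one 1280-dimensional MobileNetV2 feature vector per image."""
    batch = []
    for path in image_paths:
        img = tf.keras.utils.load_img(path, target_size=(224, 224))
        arr = tf.keras.utils.img_to_array(img)
        batch.append(tf.keras.applications.mobilenet_v2.preprocess_input(arr))
    return backbone.predict(np.stack(batch), verbose=0)

features = embed(sorted(glob.glob("carapace_masks/*.png")))  # hypothetical folder of masked images

# Two 2-D morphospaces from identical inputs, allowing a direct PCA-versus-UMAP comparison.
pca_xy = PCA(n_components=2).fit_transform(features)
umap_xy = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42).fit_transform(features)
```

Because both projections start from the same feature vectors, any difference between the resulting point clouds reflects the dimensionality-reduction strategy rather than the underlying image data.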
Each of the projects outlined within this dissertation applies deep learning methods to digitized image data in an attempt to answer complex biological questions. Although each chapter focuses these efforts on different aquatic organisms, the methodologies put forth here can be applied across taxa. Considering the increasing rate at which machine learning and computer vision tools are being deployed to answer similar ecological questions (Weinstein, 2018), the possibility that such tools will be an essential part of every biologist's tool kit seems increasingly certain (Lürig et al., 2021).

References

Alsmadi, M. K., Tayfour, M., Alkhasawneh, R. A., Badawi, U., Almarashdeh, I., & Haddad, F. (2019). Robust feature extraction methods for general fish classification. International Journal of Electrical & Computer Engineering, 9, 2088-8708. https://doi.org/10.11591/ijece.v9i6.pp5192-5204

Álvarez-Varas, R., Véliz, D., Vélez-Rubio, G.M., Fallabrino, A., Zárate, P., Heidemeyer, M., Godoy, D.A. & Benítez, H.A. (2019). Identifying genetic lineages through shape: An example in a cosmopolitan marine turtle species using geometric morphometrics. PLOS ONE, 14(10), e0223587.

Borowiec, M. L., Frandsen, P., Dikow, R., McKeeken, A., Valentini, G., & White, A. E. (2021). Deep learning as a tool for ecology and evolution. EcoEvoRxiv, 1-30. https://doi.org/10.32942/osf.io/nt3as

Brosch, T., Yoo, Y., Li, D.K., Traboulsee, A. & Tam, R. (2014). Modeling the variability in brain morphology and lesion distribution in multiple sclerosis by deep learning. International Conference on Medical Image Computing and Computer-Assisted Intervention, 8674, 462-469. https://doi.org/10.1007/978-3-319-10470-6_5

Chari, T., Banerjee, J. & Pachter, L. (2021). The specious art of single-cell genomics. bioRxiv, 1-25. https://doi.org/10.1101/2021.08.25.457696

Christin, S., Hervet, É., & Lecomte, N. (2019). Applications for deep learning in ecology. Methods in Ecology and Evolution, 10(10), 1632-1644. https://doi.org/10.1111/2041-210X.13256

Coomes, O. T., Takasaki, Y., Abizaid, C., & Barham, B. L. (2010). Floodplain fisheries as natural insurance for the rural poor in tropical forest environments: evidence from Amazonia. Fisheries Management and Ecology, 17(6), 513-521. https://doi.org/10.1111/j.1365-2400.2010.00750.x

Dagosta, F.C. & De Pinna, M. (2019). The fishes of the Amazon: distribution and biogeographical patterns, with a comprehensive list of species. Bulletin of the American Museum of Natural History, (431), 1-163. https://doi.org/10.1206/0003-0090.431.1.1

Derkarabetian, S., Castillo, S., Koo, P.K., Ovchinnikov, S. & Hedin, M. (2019). A demonstration of unsupervised machine learning in species delimitation. Molecular Phylogenetics and Evolution, 139, 106562. https://doi.org/10.1016/j.ympev.2019.106562

Derkarabetian, S., Starrett, J., & Hedin, M. (2022). Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data. Frontiers in Zoology, 19(1), 1-15. https://doi.org/10.1186/s12983-022-00453-0

Di Minin, E., Fink, C., Hiippala, T., & Tenkanen, H. (2019). A framework for investigating illegal wildlife trade on social media with machine learning. Conservation Biology, 33(1), 210. https://doi.org/10.1111/cobi.13104

Frazier, J., Lutz, P., Musick, J., & Wyneken, J. (2003).
Prehistoric and ancient historic interactions between humans and marine turtles. In P.L. Lutz, J.A. Musick & J. Wyneken, The Biology of Sea Turtles, Volume II, 1-38. https://doi.org/10.1201/9781420040807

Funk, W.C., Forester, B.R., Converse, S.J., Darst, C. & Morey, S. (2019). Improving conservation policy with genomics: a guide to integrating adaptive potential into US Endangered Species Act decisions for conservation practitioners and geneticists. Conservation Genetics, 20(1), 115-134. https://doi.org/10.1007/s10592-018-1096-1

Grinblat, G.L., Uzal, L.C., Larese, M.G. & Granitto, P.M. (2016). Deep learning for plant identification using vein morphological patterns. Computers and Electronics in Agriculture, 127, 418-424. https://doi.org/10.1016/j.compag.2016.07.003

Hayashi, R., & Yasuda, Y. (2022). Past biodiversity: Japanese historical monographs document the trans-Pacific migration of the black turtle, Chelonia mydas agassizii. Ecological Research, 37(1), 151-155. https://doi.org/10.1111/1440-1703.12265

Hedrick, B.P., Heberling, J.M., Meineke, E.K., Turner, K.G., Grassa, C.J., Park, D.S., Kennedy, J., Clarke, J.A., Cook, J.A., Blackburn, D.C. & Edwards, S.V. (2020). Digitization and the future of natural history collections. BioScience, 70(3), 243-251. https://doi.org/10.1093/biosci/biz163

Hegg, J. C., & Kennedy, B. P. (2021). Let's do the time warp again: non-linear time series matching as a tool for sequentially structured data in ecology. Ecosphere, 12(9), e03742. https://doi.org/10.1002/ecs2.3742

Hernández-Serna, A. & Jiménez-Segura, L.F. (2014). Automatic identification of species with neural networks. PeerJ, 2, e563. https://doi.org/10.7717/peerj.563

Hua, K.L., Hsu, C.H., Hidayati, S.C., Cheng, W.H. & Chen, Y.J. (2015). Computer-aided classification of lung nodules on computed tomography images via deep learning technique. OncoTargets and Therapy, 8. https://doi.org/10.2147/OTT.S80733

Jézéquel, C., Tedesco, P.A., Darwall, W., Dias, M.S., Frederico, R.G., Hidalgo, M., Hugueny, B., Maldonado-Ocampo, J., Martens, K., Ortega, H. & Torrente-Vilara, G. (2020). Freshwater fish diversity hotspots for conservation priorities in the Amazon Basin. Conservation Biology, 34(4), 956-965. https://doi.org/10.1111/cobi.13466

Junk, W.J., Soares, M.G.M. & Bayley, P.B. (2007). Freshwater fishes of the Amazon River basin: their biodiversity, fisheries, and habitats. Aquatic Ecosystem Health & Management, 10(2), 153-173. https://doi.org/10.1080/14634980701351023

Kirsch, J.E., Day, J.L., Peterson, J.T. & Fullerton, D.K. (2018). Fish misidentification and potential implications to monitoring within the San Francisco Estuary, California. Journal of Fish and Wildlife Management, 9(2), 467-485. https://doi.org/10.3996/032018-JFWM-020

Li, M. F., Glibert, P. M., & Lyubchich, V. (2021). Machine Learning Classification Algorithms for Predicting Karenia brevis Blooms on the West Florida Shelf. Journal of Marine Science and Engineering, 9(9), 999. https://doi.org/10.3390/jmse9090999

Lin, C. H. M., Lyubchich, V., & Glibert, P. M. (2018). Time series models of decadal trends in the harmful algal species Karlodinium veneficum in Chesapeake Bay. Harmful Algae, 73, 110-118. https://doi.org/10.1016/j.hal.2018.02.002

Lucas, T. C. (2020). A translucent box: interpretable machine learning in ecology. Ecological Monographs, 90(4), e01422. https://doi.org/10.1002/ecm.1422

Lürig, M. D., Donoughe, S., Svensson, E. I., Porto, A., & Tsuboi, M. (2021).
Computer vision, machine learning, and the promise of phenomics in ecology and evolutionary biology. Frontiers in Ecology and Evolution, 9, 148. https://doi.org/10.3389/fevo.2021.642774

Maeda-Gutiérrez, V., Galvan-Tejada, C.E., Zanella-Calzada, L.A., Celaya-Padilla, J.M., Galván-Tejada, J.I., Gamboa-Rosales, H., Luna-Garcia, H., Magallanes-Quintanar, R., Guerrero Mendez, C.A. & Olvera-Olvera, C.A. (2020). Comparison of convolutional neural network architectures for classification of tomato plant diseases. Applied Sciences, 10(4), 1245. https://doi.org/10.3390/app10041245

Martin, B. T., Chafin, T. K., Douglas, M. R., Placyk Jr, J. S., Birkhead, R. D., Phillips, C. A., & Douglas, M. E. (2021). The choices we make and the impacts they have: Machine learning and species delimitation in North American box turtles (Terrapene spp.). Molecular Ecology Resources, 21(8), 2801-2817. https://doi.org/10.1111/1755-0998.13350

Matsunaga, A., Thompson, A., Figueiredo, R.J., Germain-Aubrey, C.C., Collins, M., Beaman, R.S., MacFadden, B.J., Riccardi, G., Soltis, P.S., Page, L.M., Fortes, J.A.B. (2013). A computational- and storage-cloud for integration of biodiversity collections. IEEE 9th International Conference on e-Science, 78-87. https://doi.org/10.1109/eScience.2013.48

McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint, arXiv:1802.03426. https://doi.org/10.48550/arXiv.1802.03426

Miller, E. A., McClenachan, L., Uni, Y., Phocas, G., Hagemann, M. E., & Van Houtan, K. S. (2019). The historical development of complex global trafficking networks for marine wildlife. Science Advances, 5(3), eaav5948. https://doi.org/10.1126/sciadv.aav5948

Mönck, H.J., Jörg, A., von Falkenhausen, T., Tanke, J., Wild, B., Dormagen, D., Piotrowski, J., Winklmayr, C., Bierbach, D. & Landgraf, T. (2018). BioTracker: an open-source computer vision framework for visual animal tracking. arXiv preprint, arXiv:1803.07985. https://doi.org/10.48550/arXiv.1803.07985

Moreau, M. A., & Coomes, O. T. (2007). Aquarium fish exploitation in western Amazonia: conservation issues in Peru. Environmental Conservation, 34(1), 12-22. https://doi.org/10.1017/S0376892907003566

Nahill, B., von Weller, P., & Barrios-Garrido, H. (2020). The global tortoiseshell trade. Oregon, USA: SEE Turtles, 1-83. https://static1.squarespace.com/static/5369465be4b0507a1fd05af0/t/5f37089ddc88be5b0fce18fe/1597442219875/Global+Tortoiseshell+Report.pdf

Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), E5716-E5725. https://doi.org/10.1073/pnas.1719367115

O'Connell Jr, A. F., Gilbert, A. T., & Hatfield, J. S. (2004). Contribution of natural history collection data to biodiversity assessment in national parks. Conservation Biology, 18(5), 1254-1261. https://doi.org/10.1111/j.1523-1739.2004.00034.x-i1

Okamoto, K., Kanaiwa, M., & Ochi, D. (2020). Machine learning approach to estimate species composition of unidentified sea turtles that were recorded on the Japanese longline observer program. Collective Volumes of Scientific Papers ICCAT, 76(9), 175-178.

Perez, M. F., Bonatelli, I. A., Romeiro-Brito, M., Franco, F. F., Taylor, N. P., Zappi, D. C., & Moraes, E. M. (2022).
Coalescent-based species delimitation meets deep learning: Insights from a highly fragmented cactus system. Molecular Ecology Resources, 22(3), 1016-1028. https://doi.org/10.1111/1755-0998.13534

Ponder, W. F., Carter, G. A., Flemons, P., & Chapman, R. R. (2001). Evaluation of museum collection data for use in biodiversity assessment. Conservation Biology, 15(3), 648-657. https://doi.org/10.1046/j.1523-1739.2001.015003648.x

Reis, R.E., Albert, J.S., Di Dario, F., Mincarone, M.M., Petry, P. & Rocha, L.A. (2016). Fish biodiversity and conservation in South America. Journal of Fish Biology, 89(1), 12-47. https://doi.org/10.1111/jfb.13016

Saryan, P., Gupta, S., & Gowda, V. (2020). Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery. Applications in Plant Sciences, 8(7), e11377. https://doi.org/10.1002/aps3.11377

Seminoff, J. A., & Shanker, K. (2008). Marine turtles and IUCN Red Listing: a review of the process, the pitfalls, and novel assessment approaches. Journal of Experimental Marine Biology and Ecology, 356(1-2), 52-68. https://doi.org/10.1016/j.jembe.2007.12.007

Shaffer, H.B., Fisher, R.N. & Davidson, C. (1998). The role of natural history collections in documenting species declines. Trends in Ecology & Evolution, 13(1), 27-30. https://doi.org/10.1016/S0169-5347(97)01177-4

Sheehan, S., & Song, Y. S. (2016). Deep learning for population genetic inference. PLOS Computational Biology, 12(3), e1004845. https://doi.org/10.1371/journal.pcbi.1004845

Sun, X., Shi, J., Dong, J., & Wang, X. (2016). Fish recognition from low-resolution underwater images. 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 471-476. https://doi.org/10.1109/CISP-BMEI.2016.7852757

Thessen, A. (2016). Adoption of machine learning techniques in ecology and earth science. One Ecosystem, 1, e8621. https://doi.org/10.3897/oneeco.1.e8621

Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P. & Belongie, S. (2018). The iNaturalist species classification and detection dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8769-8778. https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00914

Weinstein, B. G. (2018). A computer vision for animal ecology. Journal of Animal Ecology, 87(3), 533-545. https://doi.org/10.1111/1365-2656.12780

White, A. E., Trizna, M. G., Frandsen, P. B., Dorr, L. J., Dikow, R. B., & Schuettpelz, E. (2019). Evaluating Geographic Patterns of Morphological Diversity in Ferns and Lycophytes Using Deep Neural Networks. Biodiversity Information Science and Standards, (4), e37559. https://doi.org/10.3897/biss.3.37559

Yang, Y., Sun, H., Zhang, Y., Zhang, T., Gong, J., Wei, Y., Duan, Y.G., Shu, M., Yang, Y., Wu, D. & Yu, D. (2021). Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data. Cell Reports, 36(4), 109442. https://doi.org/10.1016/j.celrep.2021.109442

Younis, S., Weiland, C., Hoehndorf, R., Dressler, S., Hickler, T., Seeger, B., & Schmidt, M. (2018). Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks. Botany Letters, 165(3-4), 377-383. https://doi.org/10.1080/23818107.2018.1446357

Chapter 1: SEE Shell - A Deep Learning Model for Detecting Hawksbill Derived Products

Abstract

1.
Despite current regulations prohibiting the sale and distribution of hawksbill sea turtle derived products, the tortoiseshell trade continues internationally, in physical and online marketplaces. To curb the sale of illegal tortoiseshell, application of new technologies like convolutional neural networks (CNNs) is needed.
2. Here we describe a curated data set (n = 4,428) which was used to develop a CNN application we are calling "SEE Shell", which can identify real and faux hawksbill derived products from image data.
3. Developed on a MobileNetV2 using TensorFlow, SEE Shell was tested against a validation (n = 665) and test (n = 649) set, where it achieved an accuracy between 82.6% and 92.2% depending on the certainty threshold used.
4. We expect SEE Shell will give potential buyers more agency in their purchasing decisions, in addition to enabling retailers to rapidly filter their online marketplaces.

Introduction

Illegal harvest is known to negatively impact sea turtle populations around the globe (Alvarado-Díaz et al., 2001; Senko et al., 2014; Cheng et al., 2018; Joseph et al., 2019). Despite current international mandates and domestic legislation, which, in many countries, regulate or prohibit the collection, consumption and sale of any sea turtle products (e.g., carapace, meat, adipose tissue, organs, blood, eggs), individuals primarily from the family Cheloniidae (and, to a lesser extent, Dermochelyidae) continue to be sold in markets and online in many communities worldwide (Aguirre et al., 2006; Nada & Casale, 2011; Rudrud, 2010; Harrison et al., 2017; Quiñones et al., 2017; Nahill et al., 2020). Contemporary research suggests that of all sea turtles, hawksbills (Eretmochelys imbricata) are by far the most illegally traded, critically endangered, sea turtle species on the planet (Frazier et al., 2003; Miller et al., 2019; Nahill et al., 2020). Although every species of sea turtle is under threat of poaching, hawksbills are the most prized for their exceptionally ornate carapace. Their malleable carapaces, colloquially referred to as "tortoiseshell" or "bekko", are often harvested and fashioned into jewelry and other items of adornment (Harrison et al., 2017; Nahill et al., 2020). Conservative estimates suggest that over a 150-year period, approximately 9 million hawksbills were illegally harvested (Miller et al., 2019). Despite access to less risky sources of protein and legal protections in some areas, illegal harvest of hawksbills still persists (Mancini et al., 2011). A recent report reviewing the global tortoiseshell trade across 40 countries suggests that since 2017, online trade of hawksbill products may have surpassed in-person sales (Nahill et al., 2020). Thus, to curb the continued trade of hawksbill and other sea turtles, new technologies like deep learning can be a tool for those on the frontlines of conservation and enforcement. Convolutional Neural Networks (CNNs) are one deep-learning method successfully used in computer vision to extract valuable information from images, for example, learning to accurately classify them based on a labeled dataset of training images (Norouzzadeh et al., 2018). A CNN can be used to identify features within an image, which can be correlated with morphological traits such as carapace shape, enabling the model to accurately differentiate between classes.
Ultimately this allows a user to take a photo of an unknown target, pass it through a trained model, and receive immediate feedback (a label) as to what is likely depicted in the image (Schuettpelz et al., 2017). Synthesis of community science and image classification has proven to be an effective method for sorting scientific data (Norouzzadeh et al., 2018; Sullivan et al., 2018). For example, implementation of deep learning on a training and test dataset, labeled by community scientists, resulted in 96.6% of nearly 3.2 million camera trap images of mammals being accurately classified (Norouzzadeh et al., 2018). Although past efforts have used similar image classification tools to expedite such research (Yang et al., 2009; Yu et al., 2013), those not using a CNN often require costly amounts of valuable time to pre-label large subsets, crop images, and tune parameters (Norouzzadeh et al., 2018). This is time which could be saved by a more rapid and objective method such as the application of a CNN. Image classification, without the assistance of a CNN, has been utilized for sea turtle research in the past. Early studies utilized visual inspection of photographs to non-invasively identify individual sea turtles (McDonald et al., 1996; Dutton et al., 2005; Reisser et al., 2008; Schofield et al., 2008). Early efforts successfully utilized key-point matching of head patterns to differentiate between images of individual leatherback sea turtles in Trinidad (De Zeeuw et al., 2010). Calmanovici et al. (2018) utilized the I3S Pattern image software to implement a mark-recapture regime with identification accuracy of 85% for free-swimming turtles, and 97% for captured individuals. Although implementation of this tool reduced their overall analysis time by 80%, it still required human identification and outlining of anchor points for each photo (Calmanovici et al., 2018). Similarly, Hanna et al. (2021) were able to successfully match all their (n = 309) images of green sea turtles to seven individuals using a tool called "Hotspotter". Gatto et al. (2018) utilized a similar software called APHIS to classify hatchlings and adult turtles with 92.9% and 81.8% accuracy (respectively) based on images of their flippers. Like the I3S and Hotspotter software, APHIS requires significant human point anchoring and identification (Gatto et al., 2018). Unfortunately, many of the software packages implemented in past research have extensive and highly technical user manuals, limiting their immediate usability by untrained individuals (Calmanovici et al., 2018; Gatto et al., 2018). Although image classification for sea turtle research has been implemented in the past, previous efforts to classify the tortoiseshell products derived from sea turtles have been limited to genetic studies (Jensen et al., 2019; LaCasella et al., 2021) and confiscations (Rice & Moore, 2008). An automated tool which can quickly identify the nuances of tortoiseshell in a non-destructive manner to delineate "Real" and "Fake" hawksbill derived products is currently lacking. A readily available application would enable scientists, law enforcement, community scientists, and other conservation professionals involved with monitoring efforts to collect fast and accurate trade data, or simply avoid purchasing tortoiseshell items. Using deep learning, we trained and tested a CNN model to detect real and fake tortoiseshell, which formed the basis for a broadly available tortoiseshell trade monitoring and consumer choice application. This model can be used by multiple audiences for rigorous data collection, direct detection, monitoring, consumer purchasing power, and social outreach to enhance sea turtle conservation.

Methods

Images were collected by conservation professionals from across Asia, Oceania, North, Central and South America. Additional images were scraped from the internet using the DuckDuckGo image search engine. All images were visually verified by the first two authors (Author & Brad Nahill) of this paper (n = 4,428). Images were labeled "real" (n = 1,409) or
This model can be used by multiple audiences for rigorous data collection, direct detection, monitoring, consumer purchasing power, and social outreach to enhance sea turtle conservation. Methods Images were collected by conservation professionals from across Asia, Oceania, North, Central and South America. Additional images were scraped from the internet using the DuckDuckGo image search engine. All images were visually verified by the first two authors (Author & Brad Nahill) of this paper (n = 4,428). Images were labeled ?real? (n = 1,409) or 17 ?fake? (n = 3,019). Real item images were considered to be items made from hawksbill sea turtle carapace. A small subset (n = 50) of the real images were previously verified with genetic analysis (Jensen et al. 2019). Fake item images were derived from a variety of mimics including resin, coconut, conch shell, cow horn, wood, and ceramics. An additional ?test set? (n = 649) was collected from the same sources but of different products. The test set images were not preprocessed and instead were collected with specific guidelines given to image takers. Specifically, we asked those collecting test images to (A) try to center and focus on items in their photos, (B) refrain from having multiple types of items or full store backgrounds, and (C) avoid other obstructions like capturing appendages or glares in the image frame. Figures 1.2 (A-C) show violations of these rules, while Figure 1.1 D is an example of an optimal image. The CNN classifier was developed using an Nvidia GeForce (V100; 32 GB VRAM) GPU implementing the Tensorflow library (Abadi et al., 2016) and was developed by retraining the last few layers of a MobileNetV2 pretrained architecture (Howard et al., 2017). 18 Figure 1.1 Examples of low quality (A-C) and high quality (D) test images. Preprocessing Steps To better-enable targeted training, images were manually augmented to ensure quality of the training data was maintained. Images were standardized in size (224 x 224 pixels), cropped and centered so target items were the central focus. In some cases, items were clipped from their background entirely to ensure that only one class of item (real or fake) was in the image. Microsoft Paint3D was used for all image adjustments and augmentations. 19 Training, Validation, Testing To develop our image classifier model, images were randomly subset into training (n= 3,763) and validation (n= 665) sets. This was an 85-15 split, respectively. We trained our model over 20 epochs (iterations to update the model coefficients to improve performance on the training set) at a learning rate of 0.00083, at which point the training and validation loss were minimized. Finally, we selected the model with the lowest and most even loss, with the highest accuracy in the validation set. The final model was then applied to a test set of 649 images. The test set consisted of both real (n = 142) and fake (n = 507) items. To test the model at different sensitivities we adjusted the threshold of certainty for its predictions, and measured precision and recall. Precision is the fraction of correct predictions among all positive results, whether they are incorrect or not. Recall is the fraction of correct predictions among the total items targeted, regardless of whether they were correctly predicted or not. From this information a Receiver Operating Characteristics (ROC) curve was generated and the Area Under Curve (AUC) was calculated (Davis & Goadrich, 2006). 
Results

Our computer vision model, henceforth referred to as "SEE Shell", was able to accurately identify 92.2% of the 665 validation images, with a precision of 0.87, recall of 0.89 and an F1 score of 0.88. These results are summarized in a confusion matrix (Figure 1.3). When applied to our test set, SEE Shell obtained accuracies between 82.6% and 90.3% at different certainty thresholds, and an F1 score between 0.69 and 0.79. Results for the certainty thresholds, inconclusive image counts and rates, accuracy, precision, recall and F1 are summarized in Table 1.1. The ROC curve for our model revealed an AUC of 0.87 (Figure 1.4).

Figure 1.3 Confusion matrix summarizing validation dataset (n = 665) results, comparing the predicted counts relative to their actual classification. SEE Shell obtained a class accuracy of 89.2% for real item images and 93.6% for fake item images on the validation set.

Figure 1.4 Receiver Operating Characteristic (ROC) curve for our SEE Shell image classifier. The vertical axis is the true positive rate, also known as the recall. The horizontal axis is the false positive rate and represents the probability of a false alarm. The ROC curve is based on variable prediction certainty thresholds. The blue dashed line represents a "no skill" classifier, which would have an equal (50-50) random chance of selecting the correct response.

Table 1.1 SEE Shell test dataset (n = 649) inconclusive counts, accuracy, precision, recall and F1 based on adjusted certainty threshold.

Certainty Threshold | Inconclusive Images | Inconclusive Results (%) | Accuracy (%) | Precision | Recall | F1
None                | 0                   | -                        | 82.6         | 0.566     | 0.880  | 0.690
0.60                | 16                  | 2.5                      | 82.8         | 0.574     | 0.879  | 0.695
0.75                | 38                  | 5.9                      | 83.8         | 0.591     | 0.882  | 0.708
0.80                | 48                  | 7.4                      | 84.5         | 0.601     | 0.895  | 0.719
0.85                | 67                  | 10.3                     | 85.7         | 0.628     | 0.901  | 0.740
0.90                | 91                  | 14.0                     | 86.6         | 0.640     | 0.894  | 0.746
0.95                | 117                 | 18.0                     | 87.2         | 0.654     | 0.889  | 0.754
0.99                | 170                 | 26.2                     | 88.7         | 0.672     | 0.911  | 0.773
1.00                | 217                 | 33.4                     | 90.3         | 0.696     | 0.920  | 0.792

Discussion

In this study, our validation and test set results showed that SEE Shell is an accurate method for detecting real tortoiseshell from images. Our results demonstrate that the application of deep learning to targeted wildlife trafficking projects can generate models, like SEE Shell, which may be immediately useful for conservation. By cultivating the largest known database of tortoiseshell products to date, we were able to train a computer vision model to differentiate between item types found within the image collection with high accuracy. Wide circulation of SEE Shell and the mobile application will allow stakeholders to monitor, avoid, and study the illegal tortoiseshell trade and support consumers in making more informed choices. Stakeholders include community scientists and conservation professionals seeking to monitor tortoiseshell products in their area, web-based marketplaces or e-commerce ventures looking to filter out illegal solicitation on their platforms, and officials monitoring, detecting, or confiscating trafficked products at transaction points like shipping ports, customs checkpoints, and tourist areas. The specific goal of our application was to identify "Real"
tortoiseshell products, and in doing so emphasize the importance of recall on any given version of our model in the context of its usage. This is because not catching instances of real tortoiseshell is much worse than accidentally flagging fake tortoiseshell for further scrutiny. Results considered "inconclusive" can also be flagged for inspection to err on the side of caution. SEE Shell has the potential to dramatically cut down on the number of items needing to be scrutinized, while expanding the overall monitoring effort's reach. For example, from 1997 to 2003, legal wildlife shipments into the United States increased from 57,491 to 115,667 per year, yet the number of staff monitoring these shipments remained the same. This led to a substantial reduction in inspection rates, which dropped from 36% in 1997 to 22% in 2003 (Rice & Moore, 2008). With an increasing volume of shipments to monitor, image classification tools like SEE Shell may be able to increase the reach of monitoring efforts, allowing officials to maintain their current workload but limit their purview to instances with high classification uncertainty only. This same logic can be applied to the discreet filtering of illegal tortoiseshell solicitations on web-based marketplaces or e-commerce sites, where there have been more recent reports of shifts towards and increases in usage and activity (Rice & Moore, 2008; Nahill et al., 2020). Another direct benefit of using a computer vision model such as SEE Shell is the cost efficiency of its immediate detection. Where current methods of identifying tortoiseshell rely on visual expert opinion, burning the item (Brad Nahill, personal comm.) or destructive genetic testing (Jensen et al., 2019), SEE Shell does not require any physical sampling of collected shell material. Although monitors can be trained to expertly discern tortoiseshell from mimics, this takes time and effort, is susceptible to human error, and often requires funding. A CNN such as SEE Shell offers a less expensive and more efficient means of enabling identification. Given the demand for highly accurate deep learning models, we expect to continue to improve SEE Shell's performance and expand its associated data-collection capabilities for monitoring and reporting purposes. Specifically, our model's relative change in accuracy, precision, recall, and F1 between the validation and test set results can be explained by the nature of the two data sets. Although neither set was seen by the model during training, training data inherently resembles validation data, which makes validation results at least partly biased (James et al., 2013; Kuhn & Johnson, 2013). It is likely the preprocessing augmentations made to our initial image set, which was subdivided for training and validation, created ideal conditions for the model to perform optimally without any adjustments to prediction certainty thresholds. This lack of variability in our training data likely contributed to our slight loss in overall performance on the test set. Additionally, our test images were collected with minimal instructions given, which resulted in several images that were less than optimal for prediction (Figure 1.1, A-C). Some test images were of non-target items (e.g., cats, power tools, plants), which may have also lowered SEE Shell's performance. In a few cases test item images were indistinguishable even for a human (an author of this paper) due to poor lighting and focus (Figure 1.5).
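To make the certainty-threshold behaviour summarized in Table 1.1 concrete, the following minimal sketch shows one way predictions below a chosen confidence level could be routed to an "inconclusive" bucket before precision, recall, and F1 are computed on the remainder. The function name, variable names, and example threshold are illustrative assumptions, not the implementation used by SEE Shell.

```python
# Hedged sketch of threshold-based "inconclusive" filtering and metric computation.
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

def evaluate_at_threshold(p_real, y_true, threshold=0.90):
    """p_real: predicted probability that each item is real tortoiseshell (numpy array in [0, 1]).
    y_true: numpy array with 1 for real items and 0 for fake items.
    Items whose confidence in either class falls below `threshold` are flagged
    inconclusive and excluded from the precision/recall/F1 calculation."""
    confidence = np.maximum(p_real, 1.0 - p_real)   # certainty of whichever class was predicted
    conclusive = confidence >= threshold
    y_pred = (p_real >= 0.5).astype(int)
    return {
        "inconclusive": int((~conclusive).sum()),
        "precision": precision_score(y_true[conclusive], y_pred[conclusive]),
        "recall": recall_score(y_true[conclusive], y_pred[conclusive]),
        "f1": f1_score(y_true[conclusive], y_pred[conclusive]),
    }
```

Raising the threshold sets aside more images as inconclusive while the metrics computed on the remaining images improve, which is the trade-off reported in Table 1.1.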
To address this, we plan for future versions of our model to use an ensemble method, adding multiple models to enhance our application's output as implemented in other computer vision projects (Guo et al., 2020; Yang et al., 2021). Specifically, we believe a blur detector, a bounding box, and a target detector will enhance SEE Shell's precision and overall performance. Addition of a blur detector would filter images such as those seen in Figure 1.1 A. Inclusion of a bounding box would segment images and enable prediction on multiple items within the same photo, addressing images like Figure 1.1 B-C. Finally, inclusion of a target detector would simply filter out images of irrelevant items or full storefront photographs. Each of these would prompt the user to retake the photograph, reminding them of the underlying instructions. It is worth noting that our underlying architecture was a MobileNetV2, which is condensed for deployment to mobile devices. Although increasing the size of a neural network doesn't always improve performance (Malach & Shalev-Shwartz, 2019), it is possible that a larger neural network architecture may yield improved results given how complicated identifying tortoiseshell products has proven to be. Shallow architectures have proven useful on simple or well-constrained problems but can be limited when dealing with more complicated real-world visuals and scenes (Yoo, 2015). Use of other, larger architectures may lead to greater accuracy, but could limit portability, especially in areas where limited cellular service restricts access to server-side assets. As computational technologies advance, we recommend further exploration of new architectures and mobile deployment tools utilizing our dataset.

Figure 1.5 Example of exceptionally challenging test images.

Conclusions

Here, we presented our deep learning model, SEE Shell, which proved to be a viable approach for accurately and nondestructively identifying and detecting real and fake tortoiseshell products. We have deployed this through a mobile application, with the expectation that it will be used as an operational tool for monitoring hawksbill derived products in the field. Although we plan to continue to improve the performance of SEE Shell in the future, as a standalone mobile tool its current accuracy can benefit tortoiseshell monitoring efforts globally. Our SEE Shell application can act as a case study for other applications which seek to implement computer vision to combat the illegal wildlife trafficking of other threatened species.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M. & Kudlur, M. (2016). Tensorflow: A system for large-scale machine learning. 12th USENIX symposium on operating systems design and implementation, 265-283. https://doi.org/10.5281/zenodo.5949169

Aguirre, A.A., Gardner, S.C., Marsh, J.C., Delgado, S.G., Limpus, C.J. & Nichols, W.J. (2006). Hazards associated with the consumption of sea turtle meat and eggs: a review for health care workers and the general public. EcoHealth, 3(3), 141-153. https://doi.org/10.1007/s10393-006-0032-x

Alvarado-Díaz, J., Delgado-Trejo, C. & Suazo-Ortuño, I. (2001). Evaluation of the black turtle project in Michoacan, Mexico. Marine Turtle Newsletter, 92, 4-7. http://www.seaturtle.org/mtn/archives/mtn92/mtn92p4.shtml?nocount

Calmanovici, B., Waayers, D., Reisser, J., Clifton, J. & Proietti, M. (2018).
I3S Pattern as a mark-recapture tool to identify captured and free-swimming sea turtles: an assessment. Marine Ecology Progress Series, 589, 263-268. https://doi.org/10.3354/meps12483 Cheng, I.J., Cheng, W.H. & Chan, Y.T. (2018). Geographically close, yet so different: Contrasting long-term trends at two adjacent sea turtle nesting populations in Taiwan due to different anthropogenic effects. PLOS ONE, 13(7), e0200063. https://doi.org/10.1371/journal.pone.0200063 Davis, J. & Goadrich, M. (2006) The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd international conference on Machine learning, 233-240. https://doi.org/10.1145/1143844.1143874 De Zeeuw, P. M., Pauwels, E. J., Ranguelova, E. B., Buonantony, D. M., & Eckert, S. A., (2010). Computer assisted photo identification of Dermochelys coriacea. Proceedings of the International Conference on Pattern Recognition (ICPR), 165-172. Dutton, D.L., Dutton, P.H., Chaloupka, M. & Boulon, R.H. (2005). Increase of a Caribbean leatherback turtle Dermochelys coriacea nesting population linked to long-term nest 29 protection. Biological Conservation, 126(2), 186-194. https://doi.org/10.1016/j.biocon.2005.05.013 Frazier, J., Lutz, P., Musick, J., & Wyneaken, J. (2003). Prehistoric and ancient historic interactions between humans and marine turtles. In P.L Lutz, J. A. Musick & J. Wyneken, The Biology of Sea Turtles, Volume II, 1-38. https://doi.org/10.1201/9781420040807 Gatto, C.R., Rotger, A., Robinson, N.J. & Tomillo, P.S. (2018). A novel method for photo- identification of sea turtles using scale patterns on the front flippers. Journal of Experimental Marine Biology and Ecology, 506, 18-24. https://doi.org/10.1016/j.jembe.2018.05.007 Guo, P., Xue, Z., Mtema, Z., Yeates, K., Ginsburg, O., Demarco, M., Long, L.R., Schiffman, M. & Antani, S. (2020). Ensemble deep learning for cervix image selection toward improving reliability in automated cervical precancer screening. Diagnostics, 10(7), 451. https://doi.org/10.3390/diagnostics10070451 Hanna, M.E., Chandler, E.M., Semmens, B.X., Eguchi, T., Lemons, G.E. & Seminoff, J.A. (2021). Citizen-sourced sightings and underwater photography reveal novel insights about green sea turtle distribution and ecology in southern California. Frontiers in Marine Science, 8, 500. https://doi.org/10.3389/fmars.2021.671061 Harrison, E., von Weller, P., & Nahill, B. (2017). Endangered Souvenirs- Hawksbill Sea Turtle Products Sale in Latin America & the Caribbean. Oregon, USA: SEE Turtles, 1-35. https://www.seeturtles.org/s/Endangered-Souvenirs-Report-Final.pdf 30 Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv, 1704.04861. https://doi.org/10.48550/arXiv.1704.04861 James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013). An Introduction to Statistical Learning. New York: Springer, (112), 176. https://doi.org/10.1007/978-1-0716-1418-1 Jensen, M.P., LaCasella, E.L., Dutton, P.H. & Madden Hof, C.A. (2019). Cracking the Code: Developing a tortoiseshell DNA extraction and source detection method. Australia: World Wildlife Fund for Nature- Australia, 1-20. https://www.wwf.org.au/ArticleDocuments/353/pub-cracking-the-code-2019- 21aug19.pdf.aspx Joseph, J., Nishizawa, H., Alin, J.M., Othman, R., Jolis, G., Isnain, I. & Nais, J. (2019). Mass sea turtle slaughter at Pulau Tiga, Malaysia: Genetic studies indicate poaching locations and its potential effects. 
Global Ecology and Conservation, 17, e00586. https://doi.org/10.1016/j.gecco.2019.e00586 Kuhn, M. & Johnson, K. (2013). Applied Predictive Modeling. New York: Springer, (26), 67. https://doi.org/10.1007/978-1-4614-6849-3 LaCasella, E.L., Jensen, M.P., Madden Hof, C.A., Bell, I.P., Frey, A. & Dutton, P.H. (2021). Mitochondrial DNA profiling to combat the illegal trade in tortoiseshell products. Frontiers in Marine Science, 7, 1225. https://doi.org/10.3389/fmars.2020.595853 Malach, E., & Shalev-Shwartz, S. (2019). Is deeper better only when shallow is good?. Advances in Neural Information Processing Systems, 32, 6429-6438. https://doi.org/10.48550/arXiv.1903.03488 31 Mancini, A., Senko, J., Borquez-Reyes, R., P?o, J. G., Seminoff, J. A., & Koch, V. (2011). To poach or not to poach an endangered species: elucidating the economic and social drivers behind illegal sea turtle hunting in Baja California Sur, Mexico. Human Ecology, 39(6), 743-756. https://doi.org/10.1371/journal.pone.0001041 McDonald, D., Dutton, P., Brander, R. & Basford, S. (1996). Use of pineal spot (pink spot) photographs to identify leatherback turtles. Herpetological Review, 27(1), 11. Miller, E.A., McClenachan, L., Uni, Y., Phocas, G., Hagemann, M.E. & Van Houtan, K.S. (2019). The historical development of complex global trafficking networks for marine wildlife. Science Advances, 5(3), eaav5948. https://doi.org/10.1126/sciadv.aav5948 Nada, M. & Casale, P. (2011). Sea turtle bycatch and consumption in Egypt threatens Mediterranean turtle populations. Oryx, 45(1), 143-149. https://doi.org/10.1017/S0030605310001286 Nahill, B., von Weller, P., & Barrios-Garrido., H. (2020). The global tortoiseshell trade. Oregon, USA: SEE Turtles, 1-83. https://static1.squarespace.com/static/5369465be4b0507a1fd05af0/t/5f37089ddc88be5b0f ce18fe/1597442219875/Global+Tortoiseshell+Report.pdf Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera- trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), E5716-E5725. https://doi.org/10.1073/pnas.1719367115 Qui?ones, J., Quispe, S. & Galindo, O. (2017). Illegal capture and black market trade of sea turtles in Pisco, Peru: the never-ending story. Latin American Journal of Aquatic Research, 45(3), 615-621. http://dx.doi.org/10.3856/vol45-issue3-fulltext-11 32 Reisser, J., Proietti, M., Kinas, P. & Sazima, I. (2008). Photographic identification of sea turtles: method description and validation, with an estimation of tag loss. Endangered Species Research, 5(1), 73-82. https://doi.org/10.3354/esr00113 Rice, S.M. & Moore, M.K. (2008). Trade secrets: a ten year overview of the illegal import of sea turtle products into the United States. Marine Turtle Newsletter, (121), 1-5. http://hdl.handle.net/1834/30789 Rudrud, R.W. (2010). Forbidden sea turtles: Traditional laws pertaining to sea turtle consumption in Polynesia (Including the Polynesian Outliers). Conservation and Society, 8(1), 84-97. https://doi.org/10.4103/0972-4923.62669 Schuettpelz, E., Frandsen, P.B., Dikow, R.B., Brown, A., Orli, S., Peters, M., Metallo, A., Funk, V.A. & Dorr, L.J. (2017). Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal, 1(5). https://dx.doi.org/10.3897%2FBDJ.5.e21139 Schofield, G., Katselidis, K.A., Dimopoulos, P. & Pantis, J.D. (2008). 
Investigating the viability of photo-identification as an objective tool to study endangered sea turtle populations. Journal of Experimental Marine Biology and Ecology, 360(2), 103-108. https://doi.org/10.1016/j.jembe.2008.04.005 Senko, J., Mancini, A., Seminoff, J.A. & Koch, V. (2014). Bycatch and directed harvest drive high green turtle mortality at Baja California Sur, Mexico. Biological Conservation, 169, 24-30. https://doi.org/10.1016/j.biocon.2013.10.017 Sullivan, D.P., Winsnes, C.F., ?kesson, L., Hjelmare, M., Wiking, M., Schutten, R., Campbell, L., Leifsson, H., Rhodes, S., Nordgren, A. & Smith, K. (2018). Deep learning is 33 combined with massive-scale citizen science to improve large-scale image classification. Nature Biotechnology, 36(9), 820. https://doi.org/10.1038/nbt.4225 Yang, J., Yu, K., Gong, Y. & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. 2009 IEEE Conference on computer vision and pattern recognition (IEEE), 1794-1801. https://doi.org/10.1109/CVPR.2009.5206757 Yang, X., Zhang, Y., Lv, W. & Wang, D. (2021). Image recognition of wind turbine blade damage based on a deep learning model with transfer learning and an ensemble learning classifier. Renewable Energy, 163, 386-397. https://doi.org/10.1016/j.renene.2020.08.125 Yoo, H.J. (2015). Deep convolution neural networks in computer vision: a review. IEIE on Transactions on Smart Processing and Computing, 4(1), 35-43. https://doi.org/10.5573/IEIESPC.2015.4.1.035 Yu, X., Wang, J., Kays, R., Jansen, P.A., Wang, T. & Huang, T. (2013). Automated identification of animal species in camera trap images. Journal on Image and Video Processing, 52(2013), 1-10. https://doi.org/10.1186/1687-5281-2013-52 34 Chapter 2: Mapping Genetic Lineage through Morphology- A novel application of computer vision for morphological orientation from digitized field data Abstract 1. Recent research utilized geometric morphometrics, associated genetic data, and PCA to successfully delineate Chelonia mydas (green sea turtle) morphotypes from carapace measurements. Here we demonstrate a similar, yet more rapid approach to this analysis using computer vision models. 2. We applied a U-Net to isolate carapace pixels of (n = 204) of juvenile C. mydas from multiple foraging grounds across the Eastern Pacific, Western Pacific, and Western Atlantic. These images were then sorted based on general alignment (shape) and coloration of the pixels within the image using a pre-trained computer vision model (MobileNetV2). 3. The dimensions of these data were then reduced and projected using UMAP. Associated vectors were then compared to simple genetic distance using a Mantel test. Data points were then labeled post-hoc for exploratory analysis. 4. We found clear congruence between carapace morphology and genetic distance between haplotypes, suggesting that our image data have biological relevance. Our findings also suggest that carapace morphotype is associated with specific haplotypes within C. mydas. 5. Our cluster analysis (k = 3) corroborates past research which suggests there are at least three morphotypes from across the Eastern Pacific, Western Pacific, and Western Atlantic. 35 Introduction Species are considered cryptic if they are taxonomically classified with two or more species as a single nominal unit, where one species should actually be several. This typically occurs when distinct species, at least superficially, resemble one another morphologically (Bickford et al., 2007). 
This is frequently reflected early in the ontogenetic stages of many organisms (Heldstab et al., 2020; Chatterji et al., 2022). Recognition of cryptic species has grown exponentially since 1975, mostly due to the advent of advanced technologies like Polymerase Chain Reaction (Bickford et al., 2007). More recently, advanced genetic analyses have been applied to aid in the delineation of cryptic species such as those observed in Canis spp. (Wolves, Jackals and Dogs; Koepfli et al., 2015) or Macroclemys spp. (Alligator Snapping Turtles; Roman et al., 1999). Although useful as independent tools, recent efforts which have paired genetic analysis with machine learning have proven to reliably analyze trends (e.g. gene flow, morphological diversity) over spatial gradients (Howes et al., 2009; White et al., 2019). Unfortunately, many machine learning studies focus on simple target prediction with little evaluation of ecology or evolution (Lucas, 2020). One exception has been the application of machine learning for species delimitation using DNA sequences, as has occurred for insects, plants (Perez et al., 2022), arachnids (Derkarabetian et al., 2019; Derkarabetian et al., 2022) and North American Box turtles (Terrapene spp.; Martin et al., 2021). The delimitation of species has taxonomic and evolutionary implications for our understanding of speciation processes, and directly affects species? counts and biodiversity assessments, which can impact the conservation status of populations within a region (Funk et al., 2019; Saryan et al., 2020). 36 In the criteria for defining one species versus another, consistency is difficult to achieve (Mallet, 1995; Shanker, 2001). In recent years, researchers have begun to utilize the emerging tool of deep learning, a form of machine learning, to delineate phenotypic differences between image data sets and to distinguish between morphologically similar organisms (Schuettpelz et al., 2017; Earl et al., 2019; White et al., 2019). For example, Earl et al. (2019) were able to accurately (90%) differentiate between divergent genetic lineages (15 subgenus and 178 species) of Bombus (bumblebees) using images (n = 45,000) that may inherently represent morphological features. Similar model accuracy was achieved when machine learning was applied to herbarium specimens (Schuettpelz et al., 2017; White et al., 2019). More specifically, deep learning and image classification have been used to rapidly classify and distinguish key morphological features from similar images in several biological (Brosch et al., 2014; Hua et al., 2015) and ecological studies (Grinblat et al., 2016; Norouzzadeh et al., 2018; Younis et al., 2018). Past efforts to make consistent determinations for accurate classification of closely related species, or subspecies, is a long-standing topic of scientific debate within the sea turtle research community (Karl & Bowen, 1999). Sea turtles are particularly difficult to study due to their complex reproductive cycles, inaccessible localities, large home ranges, and generally enigmatic life histories (Reich et al., 2007; Kahn et al., 2016). Due to these complexities, sea turtles are considered one of the highest priority reptile taxa for the application of novel technologies (Komoroske et al., 2017). 
Although one of the most abundant sea turtle species, Chelonia mydas (green sea turtle) around the globe are of particular conservation interest, given the rapid degradation of their habitat and climate change-related decline observed in some populations (Jensen et al., 2018; Maurer et al., 2021). Over several decades, multiple studies have instigated a debate around the 37 taxonomy of Chelonia mydas and potential conspecifics (Baur, 1890; Carr, 1961; Caldwell, 1962; Kamezaki & Matsui, 1995; Dutton et al., 1996; Parham & Zug, 1996; Karl & Bowen, 1999; Pritchard, 1999; Bowen & Karl, 2000; Shanker, 2001; Okamoto & Kamezaki, 2014; ?lvarez-Varas et al., 2019). Whether ?split? or ?lumped,? the outcome of such determinations has clear financial and advocacy implications for C. mydas populations (Seminoff & Shanker, 2008). Until recently, much of the taxonomic focus has been on two morphotypes of C. mydas: a black morph with a conical carapace which is associated with the Eastern Pacific, and a light cream-yellow morph with an oval carapace which is believed to be distributed from the Atlantic to the Western Pacific (Parker et al., 2011; Z?rate et al., 2015). Chelonia mydas have demonstrated clear morphological variation (e.g., skull morphometrics, carapace scute arrangement, carapace length, flipper size) across their global distribution (Kamezaki & Matsui, 1995; Wyneken et al., 1999; Seminoff et al., 2015; S?nmez, 2019). Despite the clear phenotypic differences, early genetic research examining mitochondrial and nuclear phylogenetic positioning of the black morphotype dismissed the idea that C. mydas is anything but a single species based on a polyphyletic nature of its morphotypes (Karl & Bowen, 1999). These findings put application of genetic isolation and morphological analysis definitions for C. mydas at odds with one another. Although genetic data provide some resolution and morphology additional context, neither alone can be used to draw the line between sea turtle species (Seminoff & Shanker, 2008). Over the last two decades, research has brought to light new evidence which might help elucidate C. mydas and its conspecific taxonomic relation to one another. For example, a machine learning study using color histograms was able to accurately identify sea turtle species 38 based on carapace color (Paixao et al., 2018). Generally, turtle carapace shape is known to be heritable and associated with phylogeographic variation in tortoises (Chiari et al., 2009; Poulakakis et al., 2015), freshwater turtles (Lamb & Avise, 1992; Meyers et al., 2006; Rivera, 2008), and marine turtles (?lvarez-Varas et al., 2019). Contemporary geometric morphometric research suggests at least three morphotypes of C. mydas exist across the Eastern Pacific, Western Pacific and Atlantic (?lvarez-Varas et al., 2019). Additional research has demonstrated definitive allele frequency differences across nuclear markers, corresponding to differences between the Eastern and Western Pacific regional morphotypes (?lvarez-Varas et al., 2021). Past efforts to quantify sea turtle morphological variability among populations, by estimating morphospace occupation, were based on multiple carapace measurements using Principal Component Analysis (PCA), a dimensionality reduction technique (?lvarez-Varas et al., 2019). Although such data capture techniques are effective, the efficiency of machine learning using convolutional neural networks (CNNs) is more rapid and less prone to human error (Yildirim & Cinar, 2022). 
Additionally, the implementation of another dimensionality reduction tool, known as Uniform Manifold Approximation and Projection (UMAP; McInnes et al., 2018), in tandem with a CNN, has the capacity to outperform PCA (Yang et al., 2021; Chari et al., 2021). Past research has demonstrated that dimensionality reduction, when used in tandem with a machine learning model, can effectively delimitate species boundaries (Saryan et al., 2020). A recent debate has focused on the biological relevance of images, questioning the extent to which photographs can be used to delimitate species (Shatalkin & Galinskaya, 2017). Historically, there are examples of species that have been described purely by way of photographic evidence (Marshall & Evenhuis, 2015). These typeless species classifications are 39 entirely without physical voucher and are considered to be extremely questionable (Santos et al., 2016; Shatalkin & Galinskaya, 2017). Alternatively, there have been instances where image data has led to the direct discovery and eventual voucher of new species (Skejo & Caballero, 2016), and in some instances extinct species are only known to us by way of paintings or photographs (Amorim et al., 2016). Paintings and photographs have been used by ecologists for decades to study the natural world when physical vouchers were not possible (L?rig et al., 2021), and in some instances have been proven valid. For example, one study discovered the description of a black morphotype C. mydas present in Japanese waters, their unusual appearance being meticulously documented in a naturalist's drawing (Hayashi & Yasuda, 2022). This documented presence of the Eastern Pacific associated morphotype implies migration to the Western Pacific was occurring back in the Edo period (1600-1868; Hayashi & Yasuda, 2022), and still appears to occur today (Fukuoka et al., 2015). In order to answer the question of whether or not images can have biological relevance, congruence between potentially morphologically representative images and associated genetic data should be established. To streamline geometric morphometric analysis and examine the morphometric relationship of C. mydas and its conspecifics, we aim to estimate morphospace occupation for comparison of morphology and genetic distance, across populations using image data. This technique has previously been used to successfully evaluate diversity-disparity relationships within Polypodiopsida (ferns; White et al., 2019). By projecting morphological (e.g. turtle carapace) image data into a morphospace and confirming those embedding vectors are correlated to genetic distance we may be able to consistently quantify species occupation. Given that contemporary research has suggested that there is a connection, at the population level, between C. mydas carapace morphology and mitochondrial genetic lineages (?lvarez-Varas et 40 al., 2019), we would expect that a natural correlation exists between the genetic distance and a CNN-generated morphospace projection derived from associated image data. Utilizing mtDNA analysis in synthesis with deep machine learning tools, we hope to provide a case study for deploying CNNs to elucidate intraspecific phenotypic plasticity and natural selection across biogeographic scales. This research aims to use a pre-trained machine learning model to sort our carapace image data based on general alignment (shape) and coloration of the pixels within the image and compare it to associated genetic data. 
Methods and Experimental Design

Data Collection and Study Sites

Genetic samples and images of juvenile C. mydas (n = 204; Table 2.1) were captured during surveys at foraging sites located in the Southwestern Atlantic (Uruguay), Eastern Pacific (from north to south: Mexico, Costa Rica, Galápagos-Ecuador, Peru and Chile), and Australasia (Australia and New Zealand). Images were selected for analysis if they had an unobstructed dorsal view of the carapace scutes and associated mtDNA haplotype data.

Data Processing

To sort our carapace image data with a model that had not been trained on them, and to prevent background pixels from non-biological items interfering with carapace categorization, we first needed to eliminate the possible influence of background pixels on CNN performance. To achieve this, we trained a U-Net segmentation model (Ronneberger et al., 2015) to mask non-carapace pixels. The generated masks zero out (black) background pixels while retaining carapace pixels (Figure 2.1). Specifically, we manually masked a subset of images (n = 77) and used the methods of White et al. (2020) to develop a training set to build the U-Net segmentation model. The segmentation model was built on a ResNet-34 architecture pretrained on the ImageNet dataset (Deng et al., 2009). All images were then masked by our trained U-Net. All images were standardized in size (224 x 224 pixels).

Figure 2.1 Example of an original (left) and masked (right) image of a female juvenile C. mydas.

Semi-Supervised Classification and Dimensionality Reduction

For the second step in our analysis, we wanted to take our masked images and enable a CNN, pre-trained to identify basic shapes and colors, to sort our data. Using a MobileNetV2 CNN (Howard et al., 2017) pretrained on the ImageNet dataset (Deng et al., 2009), we passed our masked carapace images through the network weights, sorting images based on general alignment (shape) and coloration of the pixels within the image. The MobileNetV2 was not trained on any of our carapace image data, and no classification labels were used to help the CNN decide how to embed each carapace image. The MobileNetV2 was deployed on an Nvidia V100 GPU (32 GB VRAM) using the TensorFlow library (Abadi et al., 2016). To remove irrelevant features (noise) from our data and improve interpretability, dimensionality was reduced using the UMAP algorithm, and the low-dimensional embeddings were visualized. This semi-supervised approach has been shown to allow useful hypothesis-driven biological discovery by way of accurate latent space representation (Chari et al., 2021). The weighted MobileNetV2 output was then passed into a visualized projection space, generating reduced-dimension coordinates for each carapace image using UMAP with 9 nearest neighbors, chosen based on collection sample size (McInnes et al., 2018).
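The embedding and projection steps described above can be illustrated with a short Python sketch. This is a minimal outline only, assuming the images have already been masked and resized to 224 x 224, that the umap-learn package is available, and that an ImageNet-pretrained MobileNetV2 is used purely as a fixed feature extractor; the function and variable names are illustrative rather than taken from our code.

```python
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input
import umap

# ImageNet-pretrained MobileNetV2 with the classification head removed;
# global average pooling yields one 1280-dimensional vector per image.
feature_extractor = MobileNetV2(weights="imagenet", include_top=False,
                                pooling="avg", input_shape=(224, 224, 3))

def embed_images(masked_images: np.ndarray) -> np.ndarray:
    """masked_images: array of shape (n, 224, 224, 3) with background pixels zeroed."""
    x = preprocess_input(masked_images.astype("float32"))
    return feature_extractor.predict(x, verbose=0)

def project_embeddings(embeddings: np.ndarray, seed: int = 42) -> np.ndarray:
    """Reduce the embeddings to a 2-D morphospace using 9 nearest neighbors."""
    reducer = umap.UMAP(n_neighbors=9, n_components=2,
                        metric="euclidean", random_state=seed)
    return reducer.fit_transform(embeddings)
```

Because the network is used only for inference, no labels or fine-tuning are involved; the resulting two-dimensional coordinates correspond to the morphospace projections visualized below.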
Genetic Analysis

Haplotypes associated with carapace images were approximately 770 base pairs (bp) of the mitochondrial control region, generated by Álvarez-Varas et al. (2020) and Jensen et al. (2018). Simple genetic distance was calculated as the nucleotide difference between haplotypes in Geneious Prime 2022.1.1 (https://www.geneious.com). To label the haplotypes post-hoc, we assigned clades based primarily on the nomenclature of Jensen et al. (2019), except for three clustered haplotypes (CmP97.1, CmP109.1, CmP207.1) which were not part of that phylogenetic analysis. Alternatively, Naro-Maciel et al. (2014) had found this unnamed haplotypic cluster, tied to the Line Islands in the Polynesian Triangle, aligned phylogenetically between the Eastern Pacific (Clade IX) and the Australasia Clades (III-V) according to the nomenclature used in Jensen et al. (2019). Uncertain of which clade these haplotypes belong in, we labeled them as Clade VS for identification purposes.

Table 2.1 Metadata for C. mydas carapace image data and associated haplotypes. Includes location of sample collection, number of images used for this study, haplotypes found at each location (Álvarez-Varas et al., 2020), and reference for data capture method.

Analysis and Visualization

Due to the need to control for the inherent size differences between male and female turtles (Klingenberg, 2016; Álvarez-Varas et al., 2019), we took a subset of our juvenile turtle data (n = 77), which were sexed by a combination of probing for gonads and hormone testing (Jensen et al., 2018). We separately projected these masked carapace images using UMAP. We then compared the slopes of the regression lines for each sex using an ANCOVA in R. Euclidean vectors between carapace image points, projected using UMAP, were extracted, and centroids were calculated based on haplotype grouping. Centroid vectors were formed into a distance matrix and compared (α = 0.05) to an associated genetic distance matrix (Appendix I) using a two-tailed Mantel test with the package Mantel (Carr, 2021) in Python (Van Rossum & Drake, 2009). The Mantel test was repeated with haplotypes of low sample size (n ≤ 2) removed from the analysis to ensure accurate centroid triangulation. Centroids were calculated and visualized based on associated grouping and, separately, collection area. Grouping centroids were labeled based on the ocean basin associated with the majority of data point haplotypes within the cluster. A validity index analysis was conducted in R (R Core Team, 2021) using the package "fpc" (Hennig, 2020) to substantiate the n-nearest neighbors (N. Neighbors = 9, 10, 30, and 77, respectively) used and the groupings observed in the projection. All morphospace projections were then visualized and labeled with metadata post-hoc for exploratory analysis.
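The centroid-versus-genetic-distance comparison described above can be sketched as follows. This is an illustrative outline only: it assumes the 2-D UMAP coordinates and per-image haplotype labels from the previous step, a dictionary of pairwise nucleotide differences exported from Geneious, and the Mantel package cited above (Carr, 2021), whose mantel.test interface and return values are assumed here; all names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
import mantel  # Carr (2021), https://github.com/jwcarr/mantel

def haplotype_centroids(coords, haplotypes):
    """Mean UMAP position of all carapace images sharing a haplotype."""
    labels = sorted(set(haplotypes))
    haps = np.asarray(haplotypes)
    centroids = np.array([coords[haps == h].mean(axis=0) for h in labels])
    return labels, centroids

def morphology_vs_genetics(coords, haplotypes, genetic_dist):
    """genetic_dist[(h1, h2)]: nucleotide difference between two haplotypes."""
    labels, centroids = haplotype_centroids(coords, haplotypes)
    morph_dm = squareform(pdist(centroids, metric="euclidean"))
    gen_dm = np.array([[0.0 if a == b else genetic_dist[tuple(sorted((a, b)))]
                        for b in labels] for a in labels])
    # Two-tailed Mantel test with 10,000 permutations (assumed interface);
    # returns the correlation, permutation p-value and z-score.
    r, p, z = mantel.test(morph_dm, gen_dm, perms=10000,
                          method="pearson", tail="two-tail")
    return r, p, z
```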
Results

All (n = 204) juvenile carapace images were successfully masked and projected using UMAP (Figures 2.1-2.6; Appendix II). Comparison of male (R² = 0.274) and female (R² = 0.196) projection regressions (Figure 2.2) revealed no difference between the slopes of the regression lines (F(1,73) = 0.16, p = 0.68). Cluster validity indices unanimously suggested that the data contained three clusters (k = 3). Mantel test results indicated a significant correlation (p = 0.0002, Z = 4.69) between the haplotypic distances and the projected morphological centroid distances. The same result was found (p = 0.0001, Z = 10.265) when small samples (n ≤ 2) were removed (Figure 2.5).

Figure 2.2 Euclidean UMAP projection (N. Neighbors = 9) and regression of male (n = 25) and female (n = 52) juvenile Chelonia mydas carapace images from a subset of the data.

Figure 2.3 Euclidean UMAP projection (N. Neighbors = 9) of Chelonia mydas carapace images (n = 204) embedded using a MobileNetV2 neural network. Data were labeled post-hoc by Clade. Number of samples by Clade: II, n = 10; III, n = 8; IV, n = 61; V, n = 33; VS, n = 14; IX, n = 78. Cladistic nomenclature based on the alignments of Naro-Maciel et al. (2014) and Jensen et al. (2019).

Figure 2.4 Euclidean space UMAP projection (N. Neighbors = 9) of Chelonia mydas carapace image haplotype centroids embedded using a MobileNetV2 neural network. Centroid shape indicates phylogenetic Clade ("Circle" = II, "Star" = III, "X" = IV, "Diamond" = V, "Plus" = VS, "Square" = IX) based on the alignments of Naro-Maciel et al. (2014) and Jensen et al. (2019). Point size indicates relative sample size.

Figure 2.5 Euclidean space UMAP projection (N. Neighbors = 9) of Chelonia mydas carapace image centroids (n = 204) embedded using a MobileNetV2 neural network. Centroids were calculated based on the area of collection. Centroid shape indicates ocean basin ("Circle" = Atlantic, "Diamond" = Western Pacific, "X" = Central Pacific, "Square" = Eastern Pacific).

Figure 2.6 Centroid Chelonia mydas carapace UMAP projection. Colored circles represent centroid embeddings; adjacent visual examples were selected based on their proximity to the centroid.

Discussion

Our results found three separate clusters (k = 3) associated with our CNN-sorted carapace image data, indicating that there are carapace shape and coloration differences between turtles from the Atlantic and Pacific Oceans and among individuals within the Pacific Ocean (Figure 2.6). Results from our Mantel test substantiate the genetic congruence between carapace shape and approximate genetic origin, while also demonstrating the biological relevance of such an image analysis using a CNN. Specifically, our semi-supervised CNN model sorted masked carapace images based on general alignment (shape) and coloration of the pixels within the image without prior training on our data. Although three distinct clusters were observed and verified through our validity index analysis, with the additional context of associated data labels we can clearly visualize the congruence observed between morphology and genetic distance from our Mantel test (Figures 2.3 & 2.4). This sorting was, at least in part, reflective of the genetic signal phenotypically held within the image. Based on our findings, our morphological data do group within phylogenetic alignments, while also appearing to contextualize Clade VS's previously uncertain positioning. For example, the two studies which did phylogenetically place CmP97.1 did so within the same relative space, but with different relative relationships. Specifically, Naro-Maciel et al. (2014) examined the phylogenetics of C. mydas at Palmyra Atoll relative to the Eastern and Western Pacific and found that haplotype CmP97.1 was most closely neighbored to a cluster that includes CmP47.1, which in turn aligned closer to CmP20.1 and CmP22.1, respectively. Similarly, another study examined French Polynesian nesting C. mydas in the context of the Western Pacific and Indo-Pacific and found that CmP97.1 aligned between CmP47.1 and the second cluster of CmP20.1 and CmP22.1 (Boissin et al., 2019). According to Jensen et al. (2019), CmP47.1 belongs to Clade IV and is associated with the South Western Pacific, while CmP20.1 and CmP22.1 are aligned within Clade III in the Central West and Central South Pacific, respectively. Applying the context of our results (Figure 2.4) suggests, at least in terms of morphological distance, that CmP97.1 aligns closest with CmP22.1, while CmP47.1 aligns closer to CmP20.1. This means that, based on our results, CmP97.1 would more likely align with Clade III, and closest with the Central South Pacific evolutionarily distinct region (Jensen et al., 2019).
Considering the locations CmP97.1 was found (Rapa Nui, New Zealand and Costa Rica) this alignment agrees with the proximity of these locations to the Central South Pacific, than to the other two evolutionary distinct regions defined by Jensen et al. (2019). Additionally, network analysis on telemetry data for C. mydas revealed the clear connectivity between the Eastern and Central South Pacific (Kot et al., 2022). Given UMAPs ability to maintain local signal integrity within a morphospace (Chari et al., 2021), and the observed congruence with genetic distance within this study, we believe relative distances between our data points can offer an added level of resolution to taxonomic and phylogenetic efforts. Within the context of our study, which examined the morphology of foraging juvenile C. mydas collected across a geographic gradient, we observed several patterns. For example, clusters appeared to be made up almost entirely of haplotypes from within the same ocean basin. This also is reflected in the calculated centroid position based on collection area, where we observe a clear division of ocean basins, with Rapa Nui as a central point between Eastern and Western Pacific (Figure 2.5). Second, we observe a genetic partitioning (Figure 2.3), which is likely explained by the strong natal-rookery homing observed in C. mydas (Bowen & Karl, 2007; Dutton et al., 2014). This is also reflected in our Mantel test results. We believe there are a few potential explanations for the observed congruence between mtDNA and carapace shape. Specifically, we believe the origin of this signal could be an artifact from natural nuclear genetic exchange, or lack thereof, between populations that has resulted in an observable reflection within the mtDNA (Carreras et al., 2007). Despite their low mutation and evolution rates (Avise et al., 1992; Lee et al., 2020), multiple studies have identified a clear genetic distinction between several C. mydas populations, including between the Atlantic and 52 Pacific, using nuclear and mitochondrial markers (Roberts et al., 2004; Okamoto & Kamezaki, 2014; Jensen et al., 2019; ?lvarez-Varas et al., 2021). Future research should evaluate whether the same relationship between carapace morphology and nuclear genetic distance exists. An alternative explanation may be related to mtDNA's environmental selection and its physiological effects on turtle carapace composition. For aquatic turtles, studies have found that environmental factors like hydrology, thermal and chemical composition can impact turtle carapace shape (Rivera et al., 2014; Nagle et al., 2018). Additionally, there are clear deleterious effects of hypoxia by way of slowed mineralization and decreased material strength of shell composition (Jackson et al., 2000; Odegard et al., 2018). Chelonia mydas are specially adapted for long anaerobic dives, which require specialized ATP-yielding capabilities to help avoid these effects (Hochachka & Storey, 1975). Compared to other sea turtles like Caretta caretta (loggerhead; 60.3 Torr), C. mydas has a markedly higher oxygen affinity (32.6 Torr), this may generally expose them to greater rates of tissue hypoxia (Isaacks et al., 1978; Woods et al., 1984; Jackson, 1985) which could increase the influence this might have as a selection pressure relative to other sea turtle species. 
Additionally, substantial evidence suggests that the physical intensity that comes with avoiding hypoxia or complete anoxia has put sea turtle mitogenomes under considerable selection pressure (Norman et al., 1994; Ramos et al., 2020). Research has found mitochondrial selection specifically associated with aerobic respiration and shell composition in aquatic turtles (Escalona et al., 2017). We believe the genetic and geographic fidelity we observed in our results (Figures 2.3 ? 2.6) amongst C. mydas morphotypes may be driven by habitat-specific differences impacting selection at the mitochondrial and metabolic level. For example, within the Eastern Pacific, dissolved oxygen content and temperature are naturally lower than they are in the Atlantic (Dunbar et al., 1994; Helly & Levin, 2004; Breitburg 53 et al., 2018; Gr?goire et al., 2021). Generally, C. mydas seek out habitats with higher dissolved oxygen and warmer temperatures due to greater food availability associated with those areas (Hays et al., 2000; Attum & Rabia, 2021). Studies have shown that Atlantic C. mydas appear to outright avoid coastal hypoxic events (Craig et al., 2001) which resemble the habitat conditions (O2 < 62 ?mol kg?1) their Eastern Pacific counterparts live in persistently (Breitburg et al., 2018; Gr?goire et al., 2021). The potential influence dissolved oxygen and temperature may have on morphologic variability may be reflected in foraging behavior differences between C. mydas morphotypes across and within ocean basins. For example, Eastern Pacific C. mydas appear to quickly traverse long distances for coastal foraging (Seminoff et al., 2002; Seminoff & Jones, 2006; Blanco et al., 2012), where their Atlantic counterparts are less inclined to do so (Godley et al., 2003; Makowski et al., 2006; Schultz, 2016; Doherty et al., 2020). This is validated by the observation that C. mydas that consume non-sedentary prey (e.g. free floating algae, crustaceans, cnidarians), like those in the Eastern Pacific often do (Carri?n-Cortez et al., 2010; Parker et al., 2011; Tomaszewicz et al., 2018), are more likely to traverse long distances when compared to those feeding on seagrass in the Atlantic (Godley et al., 2003). Studies examining diel movements of foraging Eastern Pacific C. mydas found that daily foraging activity appears to be substantially higher (8.2 km) than that of their Atlantic (1.2 - 4.1 km) and Western Pacific (0.9 - 4.9 km) conspecifics (Mendon?a, 1983; Whiting & Miller, 1998; Seminoff & Jones, 2006). Lower water temperatures, like those experienced in the Eastern Pacific, typically result in increased dive duration and frequency of deep dives for C. mydas (Dunbar et al., 1994; Enstipp et al., 2011; Enstipp et al., 2016; Madrak et al., 2022). Additional evidence suggests that the lower temperatures at depth may enable Pacific C. mydas to dive 54 deeper than their Atlantic counterparts (Hatase et al., 2006). Even within the Atlantic, comparisons have demonstrated that C. mydas in the lower dissolved-oxygen waters (Breitburg et al., 2018; Gr?goire et al., 2021) near Ascension Island dive deeper and stay at shallow depths less frequently than their Mediterranean counterparts, with the latter experiencing easier foraging (Hays et al., 2002). This difference in foraging strategy may comparatively be reflected in the particularly slow growth rates experienced by the black morphotype when compared to their Atlantic counterparts (Labrada-Martag?n et al., 2017). If C. 
mydas in the Eastern Pacific have adapted to more frequent and longer deep dives due to lower temperatures and decreased food availability, we can reasonably expect metabolic adaptations. These adaptations at the mitochondrial level, likely in conjunction with nuclear genes, could be reflected in changes to shell morphology and composition, given the association between hypoxia avoidance and carapace composition (Norman et al., 1994; Escalona et al., 2017; Ramos et al., 2020). Additional evidence for a relationship between environmental conditions and carapace morphology can be found by looking within the Eastern Pacific where both yellow and black morphotypes have been observed foraging (Amorocho et al., 2012; Sampson et al., 2014; Z?rate et al., 2015; Sampson et al., 2018), with markedly fewer yellow turtles than black in the region (Amorocho et al., 2012; Sampson et al., 2014; Z?rate et al., 2015; Sampson et al., 2018). Genetic evidence suggests that yellow C. mydas found in the Eastern Pacific are of Western Pacific origin (Amorocho et al., 2012). As such, studies have found that yellow turtles remain in Eastern Pacific foraging grounds for shorter periods of time than black turtles (Sampson et al., 2014). Additionally, Eastern Pacific C. mydas demonstrate morphotype-specific life histories, which appear to be related to resource usage (Sampson et al., 2014; Seminoff et al., 2021). For example, stable isotope analysis in the Galapagos indicated that black C. mydas may be more 55 likely to forage in deeper offshore sites than yellow turtles within the same habitat (Z?rate et al., 2012). Another study found that yellow C. mydas foraging off the Pacific coast of Costa Rica had diets akin to the neritic Eretmochelys imbricata (hawksbill) rather than their black counterparts (Clyde-Brockway, 2019). Similarly, another study found that within one Eastern Pacific foraging area, C. mydas spent more time swimming and diving, and are more active at depth then they are in other parts of the world (Seminoff et al., 2021). This may be indicative of differing morphotype-specific foraging strategies (Z?rate et al., 2012; Sampson et al., 2014; Seminoff et al., 2021), which could demonstrate an adaptation by Eastern Pacific C. mydas. Based on the body of evidence put forth here, we would expect yellow turtles to be less habituated to the lower temperature and dissolved oxygen conditions within the Eastern Pacific than their black counterparts, yet more so than Atlantic turtles. A similar Atlantic-Eastern Pacific difference has been observed in the deep diving Dermochelys coriacea (leatherback sea turtle), with Eastern Pacific populations experiencing similar foraging difficulties as C. mydas (Seminoff & Jones, 2006; Bailey et al., 2012). It is also worth noting the exceptionally hydrodynamic morphological (Seminoff et al., 2012; Bang et al., 2016) similarities between D. coriacea and the black C. mydas morphotype. Both turtles have an exceptionally conical or tear-drop shape carapace and black coloration (Parker et al., 2011; Seminoff et al., 2012). These similarities could be a sign of morphological convergence and may indicate that the black C. mydas are transitioning into a deeper diving species. Additional evidence for this environmentally driven morphology change come from genetic research which indicates C. mydas nuclear markers are under clear morphotype-specific selection at genes associated with hypoxia management, melanin, UV regulation and thermoregulation (?lvarez-Varas et al., 2021). 
Considering these findings in light of our 56 observations, the mitochondrial signal might actually be reflective of true physiological processes at play, rather than a secondary genetic signal. We recommend future research into the dive behavior and physiology of the black-morphotype, with consistent reporting throughout the Eastern Pacific with comparison to Atlantic and Western Pacific individuals. We do consider differences in nesting ecology and ontogenetic changes as a potential driver for our observed genetic signal. Specifically, there are clear differences in nesting conditions (Rubinoff, 1968) and success between the Eastern Pacific and Atlantic sea turtles (Pike, 2014). In C. mydas it seems higher nesting temperatures, especially in dry conditions, increase the likelihood of anomalous shell scute pattern in hatchlings (Zimm et al., 2017). Given the Eastern Pacific generally experiences greater variability (18 ?C) in temperatures than the Western Atlantic (6 ?C; Rubinoff, 1968), we might expect increased exposure to extreme temperatures to impact the rate at which anomalous shell patterns appear. Preliminary results from lab experiments suggest that hatchling phenotype (e.g carapace size, hatchling mass, swim thrust, stroke rate) is likely influenced by both maternal origin and nesting temperature (Booth et al., 2013). We recommend future research to compare nesting conditions, inter-nesting temperature, and its impacts on hatchling morphology across C. mydas populations from different ocean basins. Regardless of signal origin, our results clearly imply that the images themselves have real-world value and carry at least some of the necessary biological information needed to conduct an objective study of species morphology and population genetics using a CNN. Implementing a CNN on images of genetically associated morphological traits, across turtles and other taxa may enable a more consistent method of delineating species-subspecies boundaries and aid in phylogenetic alignment. We believe our results reflect actual biological processes and 57 set the stage for the development of CNNs for genetic inference. Given the congruence between C. mydas mtDNA and carapace shape, future development of computer vision models trained specifically on carapace image classes associated with specific haplotypes may be able to assist conservation professionals in the field. Specifically, a computer vision application that can accurately identify a turtle's haplotype or clade from just a single captured image, may be possible. Another novel insight which was not a part of the original intent of this study was identification of individual turtles at foraging grounds, whose mtDNA signature likely belongs to a distant rookery. Although relatively few, there were outliers beyond their clade or expected grouping. Whether these are simply morphological outliers or instances of corrupted pixel arrangement (e.g. severe presence of barnacles, algae, misaligned scutes, mismasked pixels, etc.), we are not certain. Although our technique does account for pixel arrangement (shape) and color, we do not currently have a scale-specific understanding of each axis within our morphospace. To quantify the axis of latent morphospace projections going forward, we recommend the implementation of Generative Adversarial Networks (GAN), a type of CNN based architecture (Goodfellow et al., 2021). GANs can generate detailed composite images given a training set (Do et al., 2018). 
Within a latent morphospace projection, which is now a quantified pixel space, you can take a centroid of data points at your axis extrema and compare them. The output image from each of these should return a relative sense of what shifts along the axis should produce in terms of morphological differences. Our semi-supervised image analysis is likely at least in part independent of size. Specifically, we did not observe any of the documented size differences (Klingenberg, 2016; ?lvarez-Varas et al., 2019) between males and females in the regression analysis (Figure 2.2). 58 This might be due to some plasticity in photo quality. For example, small juvenile turtles could be photographed up close, making them appear larger than they are when compared to their counterparts, and vice versa. Additionally, a lack of size reference due to masking might also control for size differences. Without a consistent object (e.g., tape measure or ruler) included in the images, there is no sense of scale for the model to reference in sorting other than what information is available. In our case, this would be limited to the carapace and the relative size and coloration of the scutes to one another. Further investigation is needed to determine the extent to which pixel orientation (object shape) and coloration within a CNN effect size signals in such an analysis. For example, future studies may want to analyze the impact that carapace orientation or lighting has on embedding position. The overwhelming majority of our carapace images were oriented in the same dorsal direction under day light conditions (Figure 2.1). Although we would expect the dimensionality reduction to control for signal noise (Chari et al., 2021), we cannot be certain that our results are not at least in part influenced by small differences in carapace orientation or residual coloration alteration due to differences in lighting. Although we observed a clear genetic signal from our data, there is variability within each grouping. Inter-group carapace variability may be an artifact of underrepresentation due to sample size or a lack of control over the specific age of juveniles. Specifically, research has illuminated age-specific differences in juvenile carapace shell shape, with a clear ?widening,? and in turn ?slimming? throughout this life stage (Salmon et al., 2018). Given the stark contrast of our groupings we wouldn?t expect such changes to dramatically augment image position within the morphological projection, however such variability was not accounted for in this study and should be researched further. 59 Conclusions Our CNN-sorted carapace image results found three separate clusters (k = 3) indicating there are carapace shape and coloration differences between turtles from the Atlantic and Pacific Oceans and among individuals within the Pacific Ocean, corroborating the findings of ?lvarez- Varas et al. (2019). Our Mantel test results demonstrate the biological relevance of image analysis using a CNN and substantiate the genetic congruence between carapace shape and approximate genetic origin. Given the clear genetic congruence between carapace shape and haplotype, we believe future efforts should attempt to develop computer vision techniques which can utilize this connection to accurately assign genetic origin to individual turtles based on carapace images. 
Future exploration of the intersection of morphology and genetics, especially within the context of taxonomic delineation, should utilize CNNs to aid in quantifying key phenotypic traits. We believe research put forth in this manuscript demonstrates the efficacy of utilizing such a pipeline for holistically analyzing evolutionary trends at genetic and biogeographic scales. Specifically, we have put forth evidence which suggests CNNs can accurately and rapidly mimic the outputs of other dimensionality reduction techniques, improving phylogenetic articulation of potentially cryptic species. References Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M. & Kudlur, M. (2016). Tensorflow: A system for large-scale machine learning. 12th USENIX symposium on operating systems design and implementation, 265-283. https://doi.org/10.5281/zenodo.5949169 60 ?lvarez-Varas, R., Contardo, J., Heidemeyer, M., Forero-Rozo, L., Brito, B., Cort?s, V., Brain, M.J., Pereira, S. & Vianna, J.A. (2017). Ecology, health and genetic characterization of the southernmost green turtle (Chelonia mydas) aggregation in the Eastern Pacific: implications for local conservation strategies. Latin American Journal of Aquatic Research, 45(3), 540-554. http://dx.doi.org/10.3856/vol45-issue3-fulltext-4 ?lvarez-Varas, R., Heidemeyer, M., Riginos, C., Ben?tez, H.A., Res?ndiz, E., Lara-Uc, M., Godoy, D.A., Mu?oz-P?rez, J.P., Alarc?n-Ruales, D.E., V?lez-Rubio, G.M. & Fallabrino, A. (2020). Integrating morphological and genetic data at different spatial scales in a cosmopolitan marine turtle species: challenges for management and conservation. Zoological Journal of the Linnean Society, 191(2), 434-453. https://doi.org/10.1093/zoolinnean/zlaa066 ?lvarez-Varas, R., Rojas-Hern?ndez, N., Heidemeyer, M., Riginos, C., Ben?tez, H.A., Araya- Donoso, R., Res?ndiz, E., Lara-Uc, M., Godoy, D.A., Mu?oz-P?rez, J.P. and Alarc?n- Ruales, D.E. (2021). Green, yellow or black? Genetic differentiation and adaptation signatures in a highly migratory marine turtle. Proceedings of the Royal Society B: Biological Science, 288(1954), 20210754. https://doi.org/10.1098/rspb.2021.0754 ?lvarez-Varas, R., V?liz, D., V?lez-Rubio, G.M., Fallabrino, A., Z?rate, P., Heidemeyer, M., Godoy, D.A. & Ben?tez, H.A. (2019). Identifying genetic lineages through shape: An example in a cosmopolitan marine turtle species using geometric morphometrics. PLOS ONE, 14(10), e0223587. Amorim, D.S., Santos, C.M.D., Krell, F.T., Dubois, A., Nihei, S.S., Oliveira, O.M., Pont, A., Song, H., Verdade, V.K., Fachin, D.A. & Klassa, B. (2016). Timeless standards for 61 species delimitation. Zootaxa, 4137(1), 121-128. http://doi.org/10.11646/zootaxa.4137.1.9 Amorocho, D. F., Abreu-Grobois, F. A., Dutton, P. H., & Reina, R. D. (2012). Multiple distant origins for green sea turtles aggregating off Gorgona Island in the Colombian Eastern Pacific. PLOS ONE, 7(2), e31486. https://doi.org/10.1371/journal.pone.0031486 Attum, O., & Rabia, B. (2021). Green (Chelonia mydas) and loggerhead (Caretta caretta) habitat use of the most environmentally extreme sea turtle feeding ground in the Mediterranean basin. Journal of Coastal Conservation, 25(1), 1-7. https://doi.org/10.1007/s11852-020- 00793-1 Avise, J.C., Bowen, B.W., Lamb, T., Meylan, A.B. & Bermingham, E. (1992). Mitochondrial DNA evolution at a turtle's pace: evidence for low genetic variability and reduced microevolutionary rate in the Testudines. 
Molecular Biology and Evolution, 9(3), 457- 473. https://doi.org/10.1093/oxfordjournals.molbev.a040735 Bailey, H., Fossette, S., Bograd, S.J., Shillinger, G.L., Swithenbank, A.M., Georges, J.Y., Gaspar, P., Str?mberg, K.P., Paladino, F.V., Spotila, J.R. & Block, B.A. (2012). Movement patterns for a critically endangered species, the leatherback turtle (Dermochelys coriacea), linked to foraging success and population status. PLOS ONE, 7(5), e36401. https://doi.org/10.1371/journal.pone.0036401 Bang, K., Kim, J., Lee, S. I., & Choi, H. (2016). Hydrodynamic role of longitudinal dorsal ridges in a leatherback turtle swimming. Scientific reports, 6(1), 1-10. https://doi.org/10.1038/srep34283 Baur, G. (1890). The genera of the Cheloniidae. American Naturalist, 1890, 486-487. 62 Bickford, D., Lohman, D.J., Sodhi, N.S., Ng, P.K., Meier, R., Winker, K., Ingram, K.K. & Das, I., (2007). Cryptic species as a window on diversity and conservation. Trends in Ecology & Evolution, 22(3), 148-155. Blanco, G. S., Morreale, S. J., Bailey, H., Seminoff, J. A., Paladino, F. V., & Spotila, J. R. (2012). Post-nesting movements and feeding grounds of a resident East Pacific green turtle Chelonia mydas population from Costa Rica. Endangered Species Research, 18(3), 233-245. https://doi.org/10.3354/esr00451 Boissin, E., Neglia, V., Boulet Colomb D?hauteserre, F., Tatarata, M., & Planes, S. (2019). Evolutionary history of green turtle populations, Chelonia mydas, from French Polynesia highlights the putative existence of a glacial refugium. Marine Biodiversity, 49(6), 2725- 2733. https://doi.org/10.1007/s12526-019-01001-6 Booth, D. T., Feeney, R., & Shibata, Y. (2013). Nest and maternal origin can influence morphology and locomotor performance of hatchling green turtles (Chelonia mydas) incubated in field nests. Marine Biology, 160(1), 127-137. https://doi.org/10.1007/s00227-012-2070-y Bowen, B.W., & Karl, S.A. (2000). Meeting report: taxonomic status of the East Pacific green turtle (Chelonia agassizii). Marine Turtle Newsletter, 89, 20-22. http://www.seaturtle.org/mtn/archives/mtn89/mtn89p20.shtml?nocount Bowen, B. W., & Karl, S. A. (2007). Population genetics and phylogeography of sea turtles. Molecular Ecology, 16(23), 4886-4907. https://doi.org/10.1111/j.1365- 294X.2007.03542.x Breitburg, D., Levin, L.A., Oschlies, A., Gr?goire, M., Chavez, F.P., Conley, D.J., Gar?on, V., Gilbert, D., Guti?rrez, D., Isensee, K. & Jacinto, G.S. (2018). Declining oxygen in the 63 global ocean and coastal waters. Science, 359(6371), eaam7240. https://doi.org/10.1126/science.aam7240 Brosch, T., Yoo, Y., Li, D.K., Traboulsee, A. & Tam, R. (2014). Modeling the variability in brain morphology and lesion distribution in multiple sclerosis by deep learning. International Conference on Medical Image Computing and Computer-Assisted Intervention, 8674, 462-469. https://doi.org/10.1007/978-3-319-10470-6_5 Caldwell, D. K. (1962). Sea turtles in Baja Californian waters (with special reference to those of the Gulf of California), and the description of a new sub- species of northeastern Pacific green turtle. Los Angeles County Museum Contributions in Science, (61), 3-31. https://iucn-tftsg.org/wp-content/uploads/file/Articles/Caldwell_1962.pdf Carr, A. (1961). Pacific turtle problem. Natural History, 70, 64-71. Carr, J. W. (2021). Jwcarr/mantel: Python implementation of the mantel test, a significance test of the correlation between two distance matrices. 
GitHub, Retrieved March 15th, 2022, from https://github.com/jwcarr/mantel#readme Carreras, C., Pascual, M., Cardona, L., Aguilar, A., Margaritoulis, D., Rees, A., Turkozan, O., Levy, Y., Gasith, A., Aureggi, M. & Khalil, M. (2007). The genetic structure of the loggerhead sea turtle (Caretta caretta) in the Mediterranean as revealed by nuclear and mitochondrial DNA and its conservation implications. Conservation Genetics, 8(4), 761- 775. https://doi.org/10.1007/s10592-006-9224-8 Carri?n-Cortez, J. A., Z?rate, P., & Seminoff, J. A. (2010). Feeding ecology of the green sea turtle (Chelonia mydas) in the Galapagos Islands. Journal of the Marine Biological Association of the United Kingdom, 90(5), 1005-1013. https://doi.org/10.1017/S0025315410000226 64 Chari, T., Banerjee, J. & Pachter, L. (2021). The specious art of single-cell genomics. bioRxiv, 1- 25. https://doi.org/10.1101/2021.08.25.457696 Chatterji, R. M., Hipsley, C. A., Sherratt, E., Hutchinson, M. N., & Jones, M. E. (2022). Ontogenetic allometry underlies trophic diversity in sea turtles (Chelonioidea). Evolutionary Ecology, 1-30. https://doi.org/10.1007/s10682-022-10162-z Chiari, Y., Hyseni, C., Fritts, T.H., Glaberman, S., Marquez, C., Gibbs, J.P., Claude, J. & Caccone, A. (2009). Morphometrics parallel genetics in a newly discovered and endangered taxon of Gal?pagos tortoise. PLOS ONE, 4(7), e6272. https://doi.org/10.1371/journal.pone.0006272 Clyde-Brockway, C. E. (2019). Foraging Ecology and Stress in Sea Turtles (Doctoral Dissertation) Purdue University Graduate School, West Lafayette, IN. Craig, J. K., Crowder, L. B., Gray, C. D., McDaniel, C. J., Kenwood, T. A., & Hanifen, J. G. (2001). Ecological Effects of Hypoxia on Fish, Sea Turtles, and Marine Mammals in the Northwestern Gulf of Mexico. Coastal Hypoxia: Consequences for Living Resources and Ecosystems, 58, 269-291. https://doi.org/10.1029/CE058p0269 Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition, 248-255. https://doi.org/10.1109/CVPR.2009.5206848 Derkarabetian, S., Castillo, S., Koo, P.K., Ovchinnikov, S. & Hedin, M. (2019). A demonstration of unsupervised machine learning in species delimitation. Molecular Phylogenetics and Evolution, 139, 106562. https://doi.org/10.1016/j.ympev.2019.106562 65 Derkarabetian, S., Starrett, J., & Hedin, M. (2022). Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data. Frontiers in Zoology, 19(1), 1-15. https://doi.org/10.1186/s12983-022-00453-0 Do, N.T., Na, I.S. & Kim, S.H. (2018). Forensics face detection from GANs using convolutional neural network. ISITC, 2018, 376-379. Doherty, P.D., Broderick, A.C., Godley, B.J., Hart, K.A., Phillips, Q., Sanghera, A., Stringell, T.B., Walker, J.T. & Richardson, P.B. (2020). Spatial ecology of sub-adult green turtles in coastal waters of the Turks and Caicos Islands: implications for conservation management. Frontiers in Marine Science, 690. https://doi.org/10.3389/fmars.2020.00690 Dunbar, R. B., Wellington, G. M., Colgan, M. W., & Glynn, P. W. (1994). Eastern Pacific Sea surface temperature since 1600 AD: The ?18O record of climate variability in Gal?pagos corals. Paleoceanography, 9(2), 291-315. https://doi.org/10.1029/93PA03501 Dutton, P. H., Davis, S. K., Guerra, T., & Owens, D. (1996). Molecular phylogeny for marine turtles based on sequences of the ND4-leucine tRNA and control regions of mitochondrial DNA. 
Molecular Phylogenetics and Evolution, 5(3), 511-521. https://doi.org/10.1006/mpev.1996.0046 Dutton, P.H., Jensen, M.P., Frey, A., LaCasella, E., Balazs, G.H., Z?rate, P., Chassin?Noria, O., Sarti?Martinez, A.L. & Velez, E. (2014). Population structure and phylogeography reveal pathways of colonization by a migratory marine reptile (Chelonia mydas) in the central and eastern Pacific. Ecology and Evolution, 4(22), 4317-4331. https://doi.org/10.1002/ece3.1269 66 Earl, C., White, A.E., Trizna, M.G., Frandsen, P.B., Kawahara, A.Y., Brady, S.G. & Dikow, R.B. (2019). Discovering Patterns of Biodiversity in Insects Using Deep Machine Learning. Biodiversity Information Science and Standards, (4), e37525. https://doi.org/10.3897/biss.3.37525 Enstipp, M.R., Ballorain, K., Ciccione, S., Narazaki, T., Sato, K. & Georges, J.Y. (2016). Energy expenditure of adult green turtles (Chelonia mydas) at their foraging grounds and during simulated oceanic migration. Functional Ecology, 30(11), 1810-1825. https://doi.org/10.1111/1365-2435.12667 Enstipp, M.R., Ciccione, S., Gineste, B., Milbergue, M., Ballorain, K., Ropert-Coudert, Y., Kato, A., Plot, V. & Georges, J.Y. (2011). Energy expenditure of freely swimming adult green turtles (Chelonia mydas) and its link with body acceleration. Journal of Experimental Biology, 214(23), 4010-4020. https://doi.org/10.1242/jeb.062943 Escalona, T., Weadick, C. J., & Antunes, A. (2017). Adaptive patterns of mitogenome evolution are associated with the loss of shell scutes in turtles. Molecular Biology and Evolution, 34(10), 2522-2536. https://doi.org/10.1093/molbev/msx167 Fukuoka, T., Narazaki, T., & Sato, K. (2015). Summer-restricted migration of green turtles Chelonia mydas to a temperate habitat of the northwest Pacific Ocean. Endangered Species Research, 28(1), 1-10. https://doi.org/10.3354/esr00671 Funk, W.C., Forester, B.R., Converse, S.J., Darst, C. & Morey, S. (2019). Improving conservation policy with genomics: a guide to integrating adaptive potential into US Endangered Species Act decisions for conservation practitioners and geneticists. Conservation Genetics, 20(1), 115-134. https://doi.org/10.1007/s10592-018-1096-1 67 Gr?goire, M., Gar?on, V., Garcia, H.E., Breitburg, D.L., Isensee, K., Oschlies, A., Telszewski, M., Barth, A., Bittig, H.C., Carstensen, J. & Carval, T. (2021). A Global Ocean Oxygen Database and Atlas for assessing and predicting deoxygenation and ocean health in the open and coastal ocean. Frontiers in Marine Science, 8, 724913. http://dx.doi.org/10.3389/fmars.2021.724913 Grinblat, G.L., Uzal, L.C., Larese, M.G. & Granitto, P.M. (2016). Deep learning for plant identification using vein morphological patterns. Computers and Electronics in Agriculture, 127, 418-424. https://doi.org/10.1016/j.compag.2016.07.003 Godley, B.J., Lima, E.H.S.M., ?kesson, S., Broderick, A.C., Glen, F., Godfrey, M.H., Luschi, P. & Hays, G.C. (2003). Movement patterns of green turtles in Brazilian coastal waters described by satellite tracking and flipper tagging. Marine Ecology Progress Series, 253, 279-288. https://doi.org/10.3354/meps253279 Godoy, D. A. (2016). The ecology and conservation of green turtles (Chelonia mydas) in New Zealand (Doctoral dissertation). Massey University, Albany, NZ. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. & Bengio, Y. (2021). Generative adversarial networks. arXiv Preprint, arXiv:1406.2661. 
https://doi.org/10.48550/arXiv.1406.2661 Hatase, H., Sato, K., Yamaguchi, M., Takahashi, K., & Tsukamoto, K. (2006). Individual variation in feeding habitat use by adult female green sea turtles (Chelonia mydas): are they obligately neritic herbivores?. Oecologia, 149(1), 52-64. https://doi.org/10.1007/s00442-006-0431-2 68 Hayashi, R., & Yasuda, Y. (2022). Past biodiversity: Japanese historical monographs document the trans?Pacific migration of the black turtle, Chelonia mydas agassizii. Ecological Research, 37(1), 151-155. https://doi.org/10.1111/1440-1703.12265 Hays, G. C., Adams, C. R., Broderick, A. C., Godley, B. J., Lucas, D. J., Metcalfe, J. D., & Prior, A. A. (2000). The diving behaviour of green turtles at Ascension Island. Animal Behaviour, 59(3), 577-586. https://doi.org/10.1006/anbe.1999.1326 Hays, G. C., Glen, F., Broderick, A. C., Godley, B. J., & Metcalfe, J. D. (2002). Behavioural plasticity in a large marine herbivore: contrasting patterns of depth utilisation between two green turtle (Chelonia mydas) populations. Marine Biology, 141(5), 985-990. https://doi.org/10.1007/s00227-002-0885-7 Heidemeyer, M., Arauz-Vargas, R., & L?pez-Ag?ero, E. (2014). New foraging grounds for hawksbill (Eretmochelys imbricata) and green turtles (Chelonia mydas) along the northern Pacific coast of Costa Rica, Central America. Revista de Biologia Tropical, 62(4), 109-118. http://www.redalyc.org/articulo.oa?id=44958812009 Heldstab, S. A., Isler, K., Schuppli, C., & van Schaik, C. P. (2020). When ontogeny recapitulates phylogeny: Fixed neurodevelopmental sequence of manipulative skills among primates. Science advances, 6(30), eabb4685. https://doi.org/10.1126/sciadv.abb4685 Helly, J. J., & Levin, L. A. (2004). Global distribution of naturally occurring marine hypoxia on continental margins. Deep Sea Research Part I: Oceanographic Research Papers, 51(9), 1159-1168. https://doi.org/10.1016/j.dsr.2004.03.009 Hennig, C. (2020). Package ?fpc?. fpc: Flexible Procedures for Clustering, Retrieved March 15th, 2022, from https://cran.r-project.org/web/packages/fpc/index.html 69 Hochachka, P.W. & Storey, K.B. (1975). Metabolic Consequences of Diving in Animals and Man: The diving habit calls for controlled oscillation between aerobic and anaerobic metabolism. Science, 187(4177), 613-621. https://doi.org/10.1126/science.163485 Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint, arXiv:1704.04861. https://doi.org/10.48550/arXiv.1704.04861 Howes, B.J., Brown, J.W., Gibbs, H.L., Herman, T.B., Mockford, S.W., Prior, K.A. & Weatherhead, P.J. (2009). Directional gene flow patterns in disjunct populations of the black ratsnake (Pantheropis obsoletus) and the Blanding?s turtle (Emydoidea blandingii). Conservation Genetics, 10(2), 407-417. https://doi.org/10.1007/s10592-008-9607-0 Hua, K.L., Hsu, C.H., Hidayati, S.C., Cheng, W.H. & Chen, Y.J. (2015). Computer-aided classification of lung nodules on computed tomography images via deep learning technique. OncoTargets and Therapy, 8. https://dx.doi.org/10.2147%2FOTT.S80733 Isaacks, R.R., Harkness, D.R. & Whitham, P.R. (1978). Relationship between the major phosphorylated metabolic intermediates and oxygen affinity of whole blood in the loggerhead (Caretta caretta) and the green sea turtle (Chelonia mydas mydas) during development. Developmental Biology, 62, 344?353. 
https://doi.org/10.1016/0012- 1606(78)90221-X Jackson, D.C. (1985). Respiration and respiratory control in the green turtle, Chelonia mydas. Copeia, (3), 664-671. https://doi.org/10.2307/1444760 70 Jackson, D.C., Ramsey, A.L., Paulson, J.M., Crocker, C.E. & Ultsch, G.R. (2000). Lactic acid buffering by bone and shell in anoxic softshell and painted turtles. Physiological and Biochemical Zoology, 73(3), 290-297. https://doi.org/10.1086/316754 Jensen, M.P., Allen, C.D., Eguchi, T., Bell, I.P., LaCasella, E.L., Hilton, W.A., Hof, C.A. & Dutton, P.H. (2018). Environmental warming and feminization of one of the largest sea turtle populations in the world. Current Biology, 28(1),154-159. https://doi.org/10.1016/j.cub.2017.11.057 Jensen, M. P., FitzSimmons, N. N., Bourjea, J., Hamabata, T., Reece, J., & Dutton, P. H. (2019). The evolutionary history and global phylogeography of the green turtle (Chelonia mydas). Journal of Biogeography, 46(5), 860-870. https://doi.org/10.1111/jbi.13483 Kamezaki, N., & Matsui, M. (1995). Geographic variation in skull morphology of the green turtle, Chelonia mydas, with a taxonomic discussion. Journal of Herpetology, 51-60. Karl, S. A., & Bowen, B. W. (1999). Evolutionary significant units versus geopolitical taxonomy: molecular systematics of an endangered sea turtle (genus Chelonia). Conservation Biology, 13(5), 990-999. https://doi.org/10.1046/j.1523-1739.1999.97352.x Khan, S., Nabi, G., Ullah, M.W., Yousaf, M., Manan, S., Siddique, R. & Hou, H. (2016). Overview on the Role of Advance Genomics in Conservation Biology of Endangered Species. International Journal of Genomics, 2016, 3460416-3460416. http://dx.doi.org/10.1155/2016/3460416 Klingenberg, C.P. (2016). Size, shape, and form: concepts of allometry in geometric morphometrics. Development Genes and Evolution, 226(3), 113-137. https://doi.org/10.1007/s00427-016-0539-2 71 Koepfli, K.P., Pollinger, J., Godinho, R., Robinson, J., Lea, A., Hendricks, S., Schweizer, R.M., Thalmann, O., Silva, P., Fan, Z. & Yurchenko, A.A. (2015). Genome-wide evidence reveals that African and Eurasian golden jackals are distinct species. Current Biology, 25(16), 2158-2165. https://doi.org/10.1016/j.cub.2015.06.060 Kot, C.Y., ?kesson, S., Alfaro?Shigueto, J., Amorocho Llanos, D.F., Antonopoulou, M., Balazs, G.H., Baverstock, W.R., Blumenthal, J.M., Broderick, A.C., Bruno, I. & Canbolat, A.F. (2022). Network analysis of sea turtle movements and connectivity: A tool for conservation prioritization. Diversity and Distributions, 28(4), 810-829. https://doi.org/10.1111/ddi.13485 Komoroske, L.M., Jensen, M.P., Stewart, K.R., Shamblin, B.M. & Dutton, P.H. (2017). Advances in the application of genetics in marine turtle biology and conservation. Frontiers in Marine Science, 4, 156. https://doi.org/10.3389/fmars.2017.00156 Lamb, T. & Avise, J.C. (1992). Molecular and population genetic aspects of mitochondrial DNA variability in the diamondback terrapin, Malaclemys terrapin. Journal of Heredity, 83(4), 262-269. https://doi.org/10.1093/oxfordjournals.jhered.a111211 Labrada-Martag?n, V., Tener?a, F.A.M., Herrera-Pav?n, R. & Negrete-Philippe, A. (2017). Somatic growth rates of immature green turtles Chelonia mydas inhabiting the foraging ground Akumal Bay in the Mexican Caribbean Sea. Journal of Experimental Marine Biology and Ecology, 487, 68-78. https://doi.org/10.1016/j.jembe.2016.11.015 Lee, L.S., Navarro-Dom?nguez, B.M., Wu, Z., Montiel, E.E., Badenhorst, D., Bista, B., Gessler, T.B. & Valenzuela, N. (2020). 
Karyotypic evolution of sauropsid vertebrates illuminated by optical and physical mapping of the painted turtle and slider turtle genomes. Genes, 11(8), 928. https://doi.org/10.3390/genes11080928 72 Lucas, T. C. (2020). A translucent box: interpretable machine learning in ecology. Ecological Monographs, 90(4), e01422. https://doi.org/10.1002/ecm.1422 L?rig, M. D., Donoughe, S., Svensson, E. I., Porto, A., & Tsuboi, M. (2021). Computer vision, machine learning, and the promise of phenomics in ecology and evolutionary biology. Frontiers in Ecology and Evolution, 9, 148. https://doi.org/10.3389/fevo.2021.642774 Madrak, S.V., Lewison, R.L., Eguchi, T., Klimley, A.P. & Seminoff, J.A. (2022). Effects of ambient temperature on dive behavior of East Pacific green turtles before and after a power plant closure. Marine Ecology Progress Series, 683, 157-168. https://doi.org/10.3354/meps13940 Makowski, C., Seminoff, J. A., & Salmon, M. (2006). Home range and habitat use of juvenile Atlantic green turtles (Chelonia mydas L.) on shallow reef habitats in Palm Beach, Florida, USA. Marine Biology, 148(5), 1167-1179. https://doi.org/10.1007/s00227-005- 0150-y Mallet, J. (1995). A species definition for the modern synthesis. Trends in Ecology & Evolution, 10(7), 294-299. https://doi.org/10.1016/0169-5347(95)90031-4 Marshall, S. A., & Evenhuis, N. L. (2015). New species without dead bodies: a case for photo- based descriptions, illustrated by a striking new species of Marleyimyia Hesse (Diptera, Bombyliidae) from South Africa. ZooKeys, (525), 117. https://dx.doi.org/10.3897%2Fzookeys.525.6143 Martin, B. T., Chafin, T. K., Douglas, M. R., Placyk Jr, J. S., Birkhead, R. D., Phillips, C. A., & Douglas, M. E. (2021). The choices we make and the impacts they have: Machine learning and species delimitation in North American box turtles (Terrapene spp.). 73 Molecular Ecology Resources, 21(8), 2801-2817. https://doi.org/10.1111/1755- 0998.13350 Maurer, A. S., Seminoff, J. A., Layman, C. A., Stapleton, S. P., Godfrey, M. H., & Reiskind, M. O. B. (2021). Population viability of sea turtles in the context of global warming. BioScience, 71(8), 790-804. https://doi.org/10.1093/biosci/biab028 McInnes, L., Healy, J., & Melville, J. (2018). Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint, arXiv:1802.03426. https://doi.org/10.48550/arXiv.1802.03426 Mendon?a, M. T. (1983). Movements and feeding ecology of immature green turtles (Chelonia mydas) in a Florida lagoon. Copeia, 1013-1023. https://doi.org/10.2307/1445104 Myers, E.M., Janzen, F.J., Adams, D.C. & Tucker, J.K. (2006). Quantitative genetics of plastron shape in slider turtles (Trachemys scripta). Evolution, 60(3), 563-572. https://doi.org/10.1111/j.0014-3820.2006.tb01137.x Nagle, R. D., Rowe, C. L., Grant, C. J., Sebastian, E. R., & Martin, B. E. (2018). Abnormal shell shapes in northern map turtles of the Juniata River, Pennsylvania, USA. Journal of Herpetology, 52(1), 59-66. https://doi.org/10.1670/17-030 Naro-Maciel, E., Gaughran, S.J., Putman, N.F., Amato, G., Arengo, F., Dutton, P.H., McFadden, K.W., Vintinner, E.C. & Sterling, E.J. (2014). Predicting connectivity of green turtles at Palmyra Atoll, central Pacific: a focus on mtDNA and dispersal modelling. Journal of the Royal Society Interface, 11(93), 20130888. https://doi.org/10.1098/rsif.2013.0888 Norman, J.A., Moritz, C. & Limpus, C.J. (1994). Mitochondrial DNA control region polymorphisms: genetic markers for ecological studies of marine turtles. 
Molecular Ecology, 3(4), 363-373. https://doi.org/10.1111/j.1365-294X.1994.tb00076.x 74 Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera- trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), E5716-E5725. https://doi.org/10.1073/pnas.1719367115 Odegard, D.T., Sonnenfelt, M.A., Bledsoe, J.G., Keenan, S.W., Hill, C.A. & Warren, D.E., (2018). Changes in the material properties of the shell during simulated aquatic hibernation in the anoxia-tolerant painted turtle. Journal of Experimental Biology, 221(18), jeb176990. https://doi.org/10.1242/jeb.176990 Okamoto, K. & Kamezaki, N. (2014). Morphological variation in Chelonia mydas (Linnaeus, 1758) from the coastal waters of Japan, with special reference to the turtles allied to Chelonia mydas agassizii Bocourt, 1868. Current Herpetology, 33(1), 46-56. https://doi.org/10.5358/hsj.33.46 Paixao, W. R., Paixao, T. M., Costa, M. C. B., Andrade, J. O., Pereira, F. G., & Komati, K. S. (2018). Texture classification of sea turtle shell based on color features: color histograms and chromaticity moments. International Journal of Artificial Intelligence and Applications (IJAIA), 9(2), 55-67. http://dx.doi.org/10.5121/ijaia.2018.9205 Parham, J.F. & Zug, G.R. (1996). Chelonia agassizii-valid or not. Marine Turtle Newsletter, 72(2). http://www.seaturtle.org/mtn/archives/mtn72/mtn72p2b.shtml?nocount Parker, D.M., Dutton, P.H. & Balazs, G.H. (2011). Oceanic diet and distribution of haplotypes for the green turtle, Chelonia mydas, in the Central North Pacific. Pacific Science, 65(4), 419-431. https://doi.org/10.2984/65.4.419 Perez, M. F., Bonatelli, I. A., Romeiro?Brito, M., Franco, F. F., Taylor, N. P., Zappi, D. C., & Moraes, E. M. (2022). Coalescent?based species delimitation meets deep learning: 75 Insights from a highly fragmented cactus system. Molecular Ecology Resources, 22(3), 1016-1028. https://doi.org/10.1111/1755-0998.13534 Pike, D. A. (2014). Forecasting the viability of sea turtle eggs in a warming world. Global Change Biology, 20 (1), 7-15. http://dx.doi.org/10.1111/gcb.12397 Poulakakis, N., Edwards, D.L., Chiari, Y., Garrick, R.C., Russello, M.A., Benavides, E., Watkins-Colwell, G.J., Glaberman, S., Tapia, W., Gibbs, J.P. & Cayot, L.J., (2015). Description of a new Gal?pagos giant tortoise species (Chelonoidis; Testudines: Testudinidae) from Cerro Fatal on Santa Cruz Island. PLOS ONE, 10(10), e0138779. https://doi.org/10.1371/journal.pone.0138779 Pritchard, P.C. (1999). Status of the black turtle. Conservation Biology, 1000-1003. https://www.jstor.org/stable/2641731 Ramos, E.K.D.S., Freitas, L. & Nery, M.F. (2020). The role of selection in the evolution of marine turtles mitogenomes. Scientific Reports, 10(1), 1-13. https://doi.org/10.1038/s41598-020-73874-8 R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, R version 4.1.2. https://www.R-project.org/ Reich, K.J., Bjorndal, K.A. & Bolten, A.B. (2007). The ?lost years? of green turtles: using stable isotopes to study cryptic life stages. Biology Letters, 3(6),712-714. https://doi.org/10.1098/rsbl.2007.0394 Res?ndiz, E., Fern?ndez-Sanz, H., & Lara-Uc, M. M. (2018). Baseline health indicators of eastern Pacific green turtles (Chelonia mydas) from Baja California Sur, Mexico. Comparative Clinical Pathology, 27(5), 1309-1320. 
https://doi.org/10.1007/s00580-018- 2740-3 76 Rivera, G. (2008). Ecomorphological variation in shell shape of the freshwater turtle Pseudemys concinna inhabiting different aquatic flow regimes. Integrative and Comparative Biology, 48(6), 769-787. https://doi.org/10.1093/icb/icn088 Rivera, G., Davis, J. N., Godwin, J. C., & Adams, D. C. (2014). Repeatability of habitat- associated divergence in shell shape of turtles. Evolutionary Biology, 41(1), 29-37. https://doi.org/10.1007/s11692-013-9243-6 Roberts, M.A., Schwartz, T.S. & Karl, S.A. (2004). Global population genetic structure and male-mediated gene flow in the green sea turtle (Chelonia mydas): analysis of microsatellite loci. Genetics, 166(4), 1857-1870. https://doi.org/10.1093/genetics/166.4.1857 Roman, J., Santhuff, S. D., Moler, P. E., & Bowen, B. W. (1999). Population structure and cryptic evolutionary units in the alligator snapping turtle. Conservation Biology, 13(1), 135-142. https://doi.org/10.1046/j.1523-1739.1999.98007.x Ronneberger, O., Fischer, P. & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-assisted Intervention, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28 Rubinoff, I. (1968). Central American Sea-Level Canal: Possible Biological Effects: An opportunity for the greatest biological experiment in man's history may not be exploited. Science, 161(3844), 857-861. https://doi.org/10.1126/science.161.3844.857 Salmon, M., Mott, C.R. & Bresette, M.J. (2018). Biphasic allometric growth in juvenile green turtles Chelonia mydas. Endangered Species Research, 37, 301-308. https://doi.org/10.3354/esr00930 77 Sampson, L., Giraldo, A., Pay?n, L. F., Amorocho, D. F., Ramos, M. A., & Seminoff, J. A. (2018). Trophic ecology of green turtle Chelonia mydas juveniles in the Colombian Pacific. Journal of the Marine Biological Association of the United Kingdom, 98(7), 1817-1829. https://doi.org/10.1017/S0025315417001400 Sampson, L., Pay?n, L. F., Amorocho, D. F., Seminoff, J. A., & Giraldo, A. (2014). Intraspecific variation of the green turtle, Chelonia mydas (Cheloniidae), in the foraging area of Gorgona Natural National Park (Colombian Pacific). Acta Biol?gica Colombiana, 19(3), 461-470. http://dx.doi.org/10.15446/abc.v19n3.42615 Santos, C.M.D., Amorim, D.S., Klassa, B., Fachin, D.A., Nihei, S.S., De Carvalho, C.J., Falaschi, R.L., Mello-Patiu, C.A., Couri, M.S., Oliveira, S.S. & Silva, V.C. (2016). On typeless species and the perils of fast taxonomy. Systematic Entomology, 41(3), 511-515. https://doi.org/10.1111/syen.12180 Saryan, P., Gupta, S., & Gowda, V. (2020). Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery. Applications in Plant Sciences, 8(7), e11377. https://doi.org/10.1002/aps3.11377 Schuettpelz, E., Frandsen, P.B., Dikow, R.B., Brown, A., Orli, S., Peters, M., Metallo, A., Funk, V.A. and Dorr, L.J. (2017). Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal, 1(5). https://dx.doi.org/10.3897%2FBDJ.5.e21139 Schultz, E. A. (2016). Genetic analysis, movement, and nesting patterns of the green sea turtle (Chelonia mydas) in St. Croix, Virgin Islands (USA): A regional analysis for the Caribbean (Doctoral dissertation). Savannah State University, Savannah, GA. 78 Seminoff, J. A., Alfaro-Shigueto, J., Amorocho, D., Aruaz, R., Baquero Gallegos, A., Chacon Chaverri, D., Gaos, A. R., Kelez, S., Mangel, J. 
C., Urteaga, J., & Wallace, B. P. (2012). Biology and Conservation of Sea Turtles in the Eastern Pacific. In: Sea Turtles in the Eastern Pacific Advances in Research and Conservation. Tucson, AZ, USA: University of Arizona Press, 11-38. https://doi.org/10.2307/j.ctv21hrddc Seminoff, J.A., Allen, C.D., Balazs, G.H., Dutton, P.H., Eguchi, T., Haas, H.L., Hargrove, S., Jensen, M.P., Klemm, D.L., Lauritsen, M. & MacPherson, S.L. (2015). Status review of the green turtle (Chelonia mydas) under the US Endangered Species Act. NOAA Technical Memorandum NOAA-TM-NMFS-SWFSC-539. Seminoff, J. A., & Jones, T. T. (2006). Diel movements and activity ranges of green turtles (Chelonia mydas) at a temperate foraging area in the Gulf of California, Mexico. Herpetological Conservation and Biology, 1(2), 81-86. https://www.herpconbio.org/volume_1/issue_2/Seminoff_Jones_2006.pdf Seminoff, J. A., & Shanker, K. (2008). Marine turtles and IUCN Red Listing: a review of the process, the pitfalls, and novel assessment approaches. Journal of Experimental Marine Biology and Ecology, 356(1-2), 52-68. https://doi.org/10.1016/j.jembe.2007.12.007 Seminoff, J. A., Resendiz, A., & Nichols, W. J. (2002). Diet of East Pacific green turtles (Chelonia mydas) in the central Gulf of California, Mexico. Journal of Herpetology, 36(3), 447-453. https://doi.org/10.1670/0022- 1511(2002)036[0447:DOEPGT]2.0.CO;2 Seminoff, J. A., Whitman, E. R., Wallace, B. P., Bayless, A., Resendiz, A., & Jones, T. T. (2020). No rest for the weary: restricted resting behaviour of green turtles (Chelonia mydas) at a deep-neritic foraging area influences expression of life history traits. Journal 79 of Natural History, 54(45-46), 2979-3001. https://doi.org/10.1080/00222933.2021.1887387 Shanker, K. (2001). A review of species concepts: ideas for a new concept and implications for the green?black turtle debate. Twenty-first Annual Symposium on Sea Turtle Biology and Conservation, Philadelphia: NOAA Technical Memorandum NOAA-NMFS-SEFSC- 528, 323-325. https://repository.library.noaa.gov/view/noaa/3412/noaa_3412_DS1.pdf#page=341 Shatalkin, A. I., & Galinskaya, T. V. (2017). A commentary on the practice of using the so- called typeless species. ZooKeys, (693), 129. https://doi.org/10.3897/zookeys.693.10945 Skejo, J. O. S. I. P., & Caballero, J. H. S. (2016). A hidden pygmy devil from the Philippines: Arulenus miae sp. nov.?a new species serendipitously discovered in an amateur Facebook post (Tetrigidae: Discotettiginae). Zootaxa, 4067(3), 383-393. http://doi.org/10.11646/zootaxa.4067.3.7 S?nmez, B. (2019). Morphological variations in the green turtle (Chelonia mydas): A field study on an eastern Mediterranean nesting population. Zoological Studies, 58. https://dx.doi.org/10.6620%2FZS.2019.58-16 Tomaszewicz, C. N. T., Seminoff, J. A., Avens, L., Goshe, L. R., Rguez-Baron, J. M., Peckham, S. H., & Kurle, C. M. (2018). Expanding the coastal forager paradigm: long-term pelagic habitat use by green turtles Chelonia mydas in the eastern Pacific Ocean. Marine Ecology Progress Series, 587, 217-234. https://doi.org/10.3354/meps12372 Van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual. Scotts Valley, CA, USA: CreateSpace. https://www.python.org/ 80 V?lez-Rubio, G. M., Cardona, L., L?pez-Mendilaharsu, M., Mart?nez Souza, G., Carranza, A., Gonz?lez-Paredes, D., & Tom?s, J. (2016). Ontogenetic dietary changes of green turtles (Chelonia mydas) in the temperate southwestern Atlantic. Marine Biology, 163(3), 1-16. 
https://doi.org/10.1007/s00227-016-2827-9 White, A.E., Dikow, R.B., Baugh, M., Jenkins, A. & Frandsen, P.B. (2020). Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning. Applications in Plant Sciences, 8(6), e11352. https://doi.org/10.1002/aps3.11352 White, A. E., Trizna, M. G., Frandsen, P. B., Dorr, L. J., Dikow, R. B., & Schuettpelz, E. (2019). Evaluating Geographic Patterns of Morphological Diversity in Ferns and Lycophytes Using Deep Neural Networks. Biodiversity Information Science and Standards, (4). https://doi.org/10.3897/biss.3.37559 Whiting, S. D., & Miller, J. D. (1998). Short term foraging ranges of adult green turtles (Chelonia mydas). Journal of Herpetology, 330-337. https://doi.org/10.2307/1565446 Wood, S.C., Gatz, R.N. & Glass, M.L. (1984). Oxygen transport in the green sea turtle. Journal of Comparative Physiology B, 154(3), 275-280. https://doi.org/10.1007/BF02464407 Wyneken, J., Balazs, G. H., Murakawa, S., & Anderson, Y. (1999). Size differences in hind limbs and carapaces of hatchling green turtles (Chelonia mydas) from Hawaii and Florida, USA. Chelonian Conservation and Biology, 3(3), 491-495. Yang, Y., Sun, H., Zhang, Y., Zhang, T., Gong, J., Wei, Y., Duan, Y.G., Shu, M., Yang, Y., Wu, D. & Yu, D. (2021). Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data. Cell Reports, 36(4), 109442. https://doi.org/10.1016/j.celrep.2021.109442 81 Yildirim, M., & Cinar, A. (2022). Classification with respect to colon adenocarcinoma and colon benign tissue of colon histopathological images with a new CNN model: MA_ColonNET. International Journal of Imaging Systems and Technology, 32(1), 155- 162. https://doi.org/10.1002/ima.22623 Younis, S., Weiland, C., Hoehndorf, R., Dressler, S., Hickler, T., Seeger, B., & Schmidt, M. (2018). Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks. Botany Letters, 165(3-4), 377-383. https://doi.org/10.1080/23818107.2018.1446357 Z?rate, P. M., Bjorndal, K. A., Seminoff, J. A., & Bolten, A. B. (2012). Understanding migratory and foraging behavior of green turtles Chelonia mydas in the Galapagos Islands through stable isotopes. Thirty-first Annual Symposium on Sea Turtle Biology and Conservation, San Diego: NOAA Technical Memorandum NOAA-NMFS-SEFSC-631. Z?rate, P.M., Bjorndal, K.A., Seminoff, J.A., Dutton, P.H. & Bolten, A.B. (2015). Somatic growth rates of green turtles (Chelonia mydas) and hawksbills (Eretmochelys imbricata) in the Galapagos Islands. Journal of Herpetology, 49(4), 641-648. https://doi.org/10.1670/14-078 Zimm, R., Bentley, B. P., Wyneken, J., & Moustakas-Verho, J. E. (2017). Environmental causation of turtle scute anomalies in ovo and in silico. Integrative and Comparative Biology, 57(6), 1303-1311. https://doi.org/10.1093/icb/icx066 82 Chapter 3: Application of a Deep Learning Image Classifier for Identification of Amazonian Fishes Abstract 1. Given the sharp increase in agricultural and infrastructure development and the paucity of widespread data available to support conservation management decisions, a more rapid and accurate tool for identifying fish fauna in the world's largest freshwater ecosystem, the Amazon, is needed. 2. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification or genetic testing for species recognition at a molecular level. 3. 
To overcome these challenges, we built an image masking model (U-Net) and a convolutional neural network (CNN) to classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru, in 2018 and 2019. 4. Species identifications in the training images (n = 3,068) were verified by expert ichthyologists. These images were supplemented with photographs taken of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian's National Museum of Natural History. 5. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to more effectively participate in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.

Introduction

The Amazon basin is home to over 2,700 species of freshwater fishes (Junk et al., 2007; Dagosta & De Pinna, 2019), many of which are of conservation concern (Albert et al., 2011; García-Dávila et al., 2018; Pelicice et al., 2021). Freshwater fishes provide one of the few reliable sources of protein for Amazonian communities and represent an important economic opportunity through the aquarium trade (Moreau & Coomes, 2007; Coomes et al., 2010). This unique ichthyofauna is facing unprecedented threats, such as deforestation (Junk et al., 2007; Lobón-Cerviá et al., 2015), construction of hydropower dams (Winemiller et al., 2016), mining (Azevedo-Santos et al., 2021), climate change (Bodmer et al., 2017), and in some cases, overexploitation (Moreau & Coomes, 2007). While advances in sampling poorly explored areas and describing the diversity of Amazonian fish have been made over the last decade (e.g., Alofs et al., 2014; de Santana et al., 2019; de Santana et al., 2021), the sub-drainages of the Marañón River remain among the most undersampled regions in South America (Jézéquel et al., 2020). In less populated areas of the Amazon, subsistence fishing, for both consumption and the pet trade, can be essential to sustaining life (Moreau & Coomes, 2007; Coomes et al., 2010). Due to the urgency of these economic and ecological threats, efficient data collection and long-term monitoring are needed to better inform mitigation strategies and policy.

Traditional ichthyological sampling methods include focused netting and fishing efforts, followed by extensive manual sorting, documentation, and identification. Although effective, and necessary in the Amazon where countless fishes remain to be described (Reis et al., 2016), these methods are time-consuming and raise the potential for misidentification bias (Kirsch et al., 2018). As a result, many have turned to community scientists to aid in catch effort and identification of individual landings, yet accurate species identification remains a challenge (Gardiner et al., 2012; Swanson et al., 2016).
Genetic approaches have also been implemented to identify many of the fish species inhabiting the Amazon (Garc?a-D?vila et al., 2017; de Santana et al., 2021), but these approaches also rely on well-identified and vouchered genetic libraries that are still missing for Amazonian fishes. These techniques require expensive storage and sample processing technology, which are not readily available in most institutions within the Amazon (de Santana et al., 2021). In order to address the ever-growing need for data and cost-effective solutions, contemporary fisheries research has called for the development and application of a rapid solution, namely by way of machine learning models, such as Convolutional Neural Networks (CNNs, e.g., Perdig?o et al., 2020). CNNs have the potential to enable rapid identification of fish to monitor fishery stocks, diversity, bycatch, and to combat illegal fishing (Marini et al., 2018; Perdig?o et al., 2020). Machine learning techniques have been successfully implemented in niche modeling, prediction of mass mortality events, and the development of non-linear ecological time-series models (Recknagel, 2001; Crisci et al., 2012; Miller-Coleman et al., 2012). Image classification deep learning models show promise in being applied to highly diverse taxa and collections (Weinstein, 2017; Norouzzadeh et al., 2018; Sullivan et al., 2018; W?ldchen & M?der, 2018; Borowiec et al., 2021). Past attempts to identify fish taxa using computer vision have had 85 varying degrees of success across a wide breadth of ichthyological data sets. For example, early attempts by Alsmadi et al. (2010) were able to identify 20 families of marine fish from 610 images with an accuracy of 84%. More recent work improved accuracy to 90% (Alsmadi et al., 2019). Hern?ndez-Serna and Jim?nez-Segura (2014) used seven museum collections that included both marine and Amazonian freshwater fish (images per collection ranged from 422 to 2,392) and obtained accuracies between 72-92%. Sun et al. (2016) obtained a species identification accuracy of 77.27% from 9,160 AUV images of fish. A study by Qin et al. (2015) was able to identify 23 deep sea fish species with an accuracy of 98% using a substantial number of training images (n = 22,370). In this study, we developed two deep learning computer vision models: one that segments fish pixels from background pixels, and one that classifies images of Amazonian fishes to the genus level. As the first image classifier for ichthyological monitoring in the megadiverse Peruvian Amazon basin, we hope this case study will act as a primer for further development of deep learning models, as tools for conservation stakeholders. Deep learning for taxonomic image classification has proven to be efficient and highly accurate, demonstrating promise for improving participatory monitoring initiatives (Norouzzadeh et al., 2018; Sullivan et al., 2018). Specifically, these tools will enable communities involved in participatory monitoring to fill knowledge gaps and improve data reliability. These models can also provide a basis on which to build new models for other species of conservation concern and public health interest. Our data 86 and pipeline are publicly available, which will enable others to apply these techniques to other taxa. Methods In July 2018, we sampled freshwater fishes in small white-water rivers, and black and white-water streams in seasonally flooded forests of the upper Morona River valley in Achuar native territory, Loreto, Peru. 
Sites were resampled in November 2018 and November 2019. Fish were identified by specialists with the aid of dichotomous taxonomic keys considering morphological, meristic, and morphometric characteristics. Taxonomic nomenclature follows Fricke et al. (2018). A total of 141 fish species belonging to 89 genera and 29 families were identified across all sites and seasons (Morgan Ruiz-Tafur, unpublished data). Captured fish (n = 1,967) were placed on a 1 cm grid or a neutral background (leaves, hands, ground, etc.) and photographed using a Nikon D3500 camera prior to preservation. Specimens were deposited in the ichthyology collection at the Instituto de Investigaciones de la Amazonía Peruana (IIAP) in Iquitos, Peru. Due to the limited number of images we had per species, we restricted our analysis to genera (n = 33), using a minimum threshold of 20 field images per genus (n = 1,615). To supplement field images, we incorporated additional images (n = 1,453) taken of specimens housed at the Smithsonian National Museum of Natural History, Department of Vertebrate Zoology, Division of Fishes collection (USNM), using both a Nikon B500 and a Nikon W100. Fish specimens were photographed on both blank and 1 cm grid backgrounds from multiple angles. In total, our dataset consisted of 3,068 images prior to processing.

Preprocessing Steps

To build a training dataset, we first removed all incidentally taken/non-fish and unidentified fish images. We then built a U-Net (Ronneberger et al., 2015) segmentation model to classify pixels in images as fish or background, using methods similar to those of White et al. (2020). Specifically, we manually masked a subset of images (n = 66; 2 images from each genus), following White et al. (2020), to use as a training set for the U-Net. Our generated masks zeroed out (blacked) background pixels while retaining fish pixels. The model was built on a resnet-34 architecture pretrained on the ImageNet dataset (Deng et al., 2009). All field and museum images were then masked by our trained U-Net. Images that were unsuccessfully masked, where no component of the original input image remained within the photo, were removed from the dataset. The remaining images, which retained at least some component of the target object with no background, were then subdivided for training and validation of the genus identification model.

Identification Model Architecture, Training, and Validation

We trained our image classifier to distinguish between 33 fish genera based on masked images. The classifier was developed on an NVIDIA GPU (V100; 32 GB VRAM) using the Fast.ai library (Howard & Gugger, 2020) in PyTorch (Paszke et al., 2019). The model was built on a resnet-101 architecture pretrained on the ImageNet dataset (Deng et al., 2009). To develop the classifier, masked images were randomly divided into training (n = 2,387) and validation (n = 596) sets, split 80/20 respectively, to maximize accuracy (Hernández-Serna & Jiménez-Segura, 2014). All images were resized by "squishing" them to 300 x 300 pixels. We trained the model over 60 epochs, with one training session of random transformations accounting for 6 of the 60 epochs (a minimal code sketch of this two-stage training setup is provided below).

Figure 3.1 Example of unmasked (left) and masked (right) images of a fish (Bario steindachneri).

Results

The U-Net masking model was trained over 20 epochs, at which point the training loss and validation loss were minimized. Our U-Net successfully masked 97.23% (n = 2,983) of our images.
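To make the two-stage pipeline described above more concrete, the following is a minimal, illustrative sketch of how such a workflow could be set up with the Fast.ai library (Howard & Gugger, 2020). It is not the authors' published code: the directory layout, file names, batch size, and random seed are assumptions for illustration, while the architectural choices (resnet-34 U-Net, resnet-101 classifier, ImageNet pretraining, 80/20 split, 300 x 300 "squish" resizing, random transformations) follow the text.

```python
from fastai.vision.all import *

# --- Stage 1: U-Net segmentation (fish vs. background), resnet-34 backbone pretrained on ImageNet ---
# Hypothetical layout: raw images in data/masking/images, hand-drawn binary masks in data/masking/masks
seg_path = Path("data/masking")

def label_func(fn):
    # Each of the 66 manually masked training images is paired with a mask of the same file name.
    return seg_path / "masks" / fn.name

seg_dls = SegmentationDataLoaders.from_label_func(
    seg_path, get_image_files(seg_path / "images"), label_func,
    codes=["background", "fish"], item_tfms=Resize(300), bs=8  # batch size is an assumption
)
seg_learn = unet_learner(seg_dls, resnet34)
seg_learn.fine_tune(20)  # the Results report roughly 20 epochs for the masking model

# --- Stage 2: genus classifier, resnet-101 pretrained on ImageNet, trained on the masked images ---
# Hypothetical layout: data/masked/<genus>/*.jpg after background pixels have been zeroed out
cls_dls = ImageDataLoaders.from_folder(
    Path("data/masked"),
    valid_pct=0.2, seed=42,                  # 80/20 train/validation split
    item_tfms=Resize(300, method="squish"),  # "squish" every image to 300 x 300 pixels
    batch_tfms=aug_transforms()              # random transformations (augmentation)
)
cls_learn = cnn_learner(cls_dls, resnet101, metrics=accuracy)
cls_learn.fine_tune(60)
cls_learn.export("fish_genus_classifier.pkl")  # hypothetical file name, reused in later sketches
```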
Images that were not successfully masked (n = 85) were removed from training and validation. Our Amazonian fish image classifier trained over 50 epochs, at which point the training loss and validation loss were minimized. The validation set results, predicted class versus actual class, are summarized in a confusion matrix (Figure 3.2). Of the 596 validation images, the image classifier predicted 97.99% correctly. Accuracy by genus is summarized in Table 3.1 and ranged from 88.89% to 100%. Accuracy by genus and training data type is summarized in Table 3.2. The models and associated metadata are available at the Smithsonian figshare repository (DOI: 10.25573/data.17315126). The application for both models is available online (https://github.com/MikeTrizna/streamlit_fish_masking).

Table 3.1 Summary of validation set (n = 596) results by genus.

Figure 3.2 Confusion matrix visualization of computer vision model validation results. The x-axis depicts the genus predicted by the model. The y-axis depicts the actual genus to which the image belongs, organized by taxonomic class, family, and genus according to Fricke et al. (2018). Correct identifications are depicted along the left-to-right diagonal, with darker colors indicating more correct identifications and blank yellow squares indicating zeros. Masked image examples on the y-axis are as follows: A- Bryconops, B- Tetragonopterus, C- Astyanax, D- Moenkhausia, E- Gymnotus, F- Ancistrus, G- Corydoras, H- Bujurquina.

Discussion

We were able to efficiently build a state-of-the-art model that can rapidly identify standardized Amazonian fish images to the genus level (n = 33) with 97.99% accuracy. Of the 12 incorrectly classified images in our validation set, 7 were misclassified outside of their family and 2 were misclassified outside of their order. After visually examining the incorrectly classified images, it was evident that some were likely more difficult to classify because incidental masking of fish pixels bisected the specimen in the image. In short, we believe our masking rendered a few of our images unidentifiable; this is arguably an artifact of the data pipeline rather than a source of true error in the image classifier. To improve accuracy when used in the field, we recommend capturing multiple clear images of individual fish to ensure at least one is successfully masked prior to inference for identification. This can be done by using a background that is sufficiently distinct from the coloration of the fish being photographed. Most misidentifications in our model involved tetras, small characids that are the dominant fish fauna in Amazonian small rivers and streams (de Oliveira et al., 2009). Historically, species-rich and closely related tetras have been difficult to identify due to cryptic species diversity (where a single nominal species may in fact comprise several undescribed species) and the lack of exclusive morphological characters for identifying some genera (e.g., Astyanax >170 species and Hyphessobrycon >130 species; Escobar-Camacho et al., 2015; Oliveira et al., 2011; Barreto et al., 2017). In addition, an estimated 40% of species in the region have yet to be described (e.g., Reis et al., 2016).
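The per-genus accuracies (Table 3.1) and the confusion matrix (Figure 3.2) can be reproduced from a trained Fast.ai learner using its built-in interpretation utilities. The sketch below assumes the `cls_learn` and `cls_dls` objects from the earlier training sketch; it is illustrative rather than the authors' published evaluation code.

```python
import pandas as pd
from fastai.vision.all import ClassificationInterpretation

# Assumes `cls_learn` and `cls_dls` from the training sketch above are still in memory.
interp = ClassificationInterpretation.from_learner(cls_learn)

# Confusion matrix of predicted versus actual genus for the validation set (cf. Figure 3.2).
interp.plot_confusion_matrix(figsize=(12, 12), dpi=100)

# Per-genus accuracy: correct predictions on the diagonal divided by validation images per genus (cf. Table 3.1).
cm = interp.confusion_matrix()
per_genus_acc = pd.Series(cm.diagonal() / cm.sum(axis=1), index=list(cls_dls.vocab))
print(per_genus_acc.sort_values())

# Overall validation accuracy (the study reports 97.99% on 596 validation images).
print(f"Overall accuracy: {cm.diagonal().sum() / cm.sum():.4f}")
```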
Thus, species misidentifications due to taxonomically complex groups, such as tetras and other cryptic assemblages, are common problems in manual morphological identification as well as in genetic identification approaches (e.g., de Santana et al., 2021), and this must be taken into account when building an image classifier for Amazonian fishes. In short, the output given by an image classification model is only as good as the label given to each class during training. If the target class is not well defined, this ultimately may disrupt the classification accuracy for those genera. Our model is deployed through a publicly accessible web-based application available on Streamlit (a minimal sketch of such a front end is given below). Emerging technologies such as mobile applications, wireless sensor networks, augmented/virtual reality, and high-throughput computing are already advancing scientific research by enabling community scientists to bridge the training gap through instant "expert" verification (Newman et al., 2012). In the case of remote locations in the Amazon basin, like those sampled for this study, collection of accurate, reliable data is vital for monitoring freshwater ecosystem health and local fish stocks. Fish are key indicators of water quality and the health of aquatic ecosystems (Harris, 1995), and for many indigenous Amazonian communities, fish are a reliable source of protein, especially in times of hardship (Swierk & Madigosky, 2014). When deployed in the field, our model will empower community-led initiatives to monitor fish in the Amazon River basin, collect more accurate information, and identify ecological trends in this integral source of food and income (Finer et al., 2008). While we obtained a high level of accuracy in line with the results of other deep learning fish studies implementing image classifiers (Qin et al., 2015; Alsmadi et al., 2019), our study is novel because we were able to combine museum and field-collected images, and we think this is a robust framework for future studies. Using these different data sources in combination can enable novel insights that may not have been found by building separate museum and field models (Lendemer et al., 2020). Given the remoteness of the localities sampled in this study, and the cryptic nature of the species endemic to these sites, we will always be limited in the number of field images we can acquire, which can limit the scope and breadth of the model we can train. By utilizing a hybrid approach, and digitizing specimens in the museum collection, we were able to double the total number of images available to generate the model. Although past efforts have applied image classification to citizen science data collected in the field (Van Horn et al., 2018), none have targeted freshwater fish in sites as highly diverse as the upper Morona River valley. Image classification models such as the one presented here increase the accessibility of the taxonomic identification needed to accurately monitor ecosystem health and natural resources (Gardiner et al., 2012; Newman et al., 2012). In such an incredibly diverse ecosystem, a model that accurately identifies fish to the genus level is a first step that will provide motivation for increased digitization efforts to obtain sufficient images for training a model at the species level.
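Because the model is served through a publicly accessible Streamlit application (the repository is linked in the Results), a minimal sketch of how such a front end could wrap an exported learner is shown here. This is an assumption-laden illustration, not the code in the linked repository: the model file name and page layout are hypothetical.

```python
# app.py -- minimal illustrative Streamlit front end for the genus classifier.
# Run with: streamlit run app.py
import streamlit as st
from fastai.vision.all import load_learner, PILImage

st.title("Amazonian fish genus classifier (demo sketch)")

# Hypothetical file name; the published models are archived on the Smithsonian figshare repository.
learn = load_learner("fish_genus_classifier.pkl")

uploaded = st.file_uploader("Upload a (preferably masked) fish photograph", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    img = PILImage.create(uploaded)
    st.image(img, caption="Input image", use_column_width=True)
    genus, idx, probs = learn.predict(img)
    st.write(f"Predicted genus: {genus} (confidence {float(probs[idx]):.2%})")
```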
In order to bolster future modeling efforts, and enable advancements in training data acquisition, collection of photographic data must be considered standard protocol going forward on any field or museum-based ecological study. This is especially true given the rapid advancement of mobile phone photography (Rasmusson et al., 2004). Although our images were originally taken at a high resolution of varying sizes, ultimately, they were resized to just 300 x 300 pixels. At the time of this publication, there are several types of mobile phones which have cameras capable of capturing higher resolution images than the size used when training our model (Gonz?lez & Pozo, 2019). Although past efforts have been able to accurately identify fish with varying degrees of success, we expect the ever-growing amount of image data available to enable generation of even more robust and more accurate models. Collection of more fish image data from this region, and continued development, will allow for a more sophisticated version of this model to be developed in the future. While this model is specific to the Peruvian Amazon, the workflow 95 itself is publicly available on GitHub, and can be applied to any taxonomic survey. In order to aid such efforts and expand this type of initiative to a global level, we recommend that those collecting ichthyological, or any taxonomic data, incorporate targeted image capture as part of their standardized protocols, and that they be made publicly available. Conclusions We present an application that can be used to rapidly and accurately classify freshwater fish from the upper Morona River valley in the northwest Amazon to genus for scientific research. Although able to classify 33 genera present in the current study area, the model described here provides a solid foundation for future projects. The application, which can be used to classify single images to genus, is accessible to the community online. The model's application to images taken from geographic areas outside of the northwestern Amazon has yet to be explored. References Albert, J.S., Carvalho, T.P., Petry, P., Holder, M.A., Maxime, E.L., Espino, J., Corahua, I., Quispe, R., Rengifo, B., Ortega, H. & Reis, R.E. (2011). Aquatic biodiversity in the Amazon: habitat specialization and geographic isolation promote species richness. Animals, 1(2), 205-241. https://doi.org/10.3390/ani1020205 Alofs, K.M., Liverpool, E.A., Taphorn, D.C., Bernard, C.R. & L?pez?Fern?ndez, H. (2014). Mind the (information) gap: the importance of exploration and discovery for assessing conservation priorities for freshwater fish. Diversity and Distributions, 20(1), 107-113. https://doi.org/10.1111/ddi.12127 96 Alsmadi, M. K., Omar, K. B., Noah, S. A., & Almarashdeh, I. (2010). Fish recognition based on robust features extraction from color texture measurements using back-propagation classifier. Journal of Theoretical and Applied Information Technology, 18(1), 11-18 Alsmadi, M. K., Tayfour, M., Alkhasawneh, R. A., Badawi, U., Almarashdeh, I., & Haddad, F. (2019). Robust feature extraction methods for general fish classification. International Journal of Electrical & Computer Engineering, 9, 2088-8708. https://doi.org/10.11591/ijece.v9i6.pp5192-5204 Azevedo-Santos, V.M., Arcifa, M.S., Brito, M.F., Agostinho, A.A., Hughes, R.M., Vitule, J.R., Simberloff, D., Olden, J.D. & Pelicice, F.M. (2021). Negative impacts of mining on Neotropical freshwater fishes. Neotropical Ichthyology, 19. 
https://doi.org/10.1590/1982-0224-2021-0001
Barreto, C.A.V., Granja, M.M.C., Vidigal, P.M.P., Carmo, A.O. & Dergam, J.A. (2017). Complete mitochondrial genome sequence of neotropical fish Astyanax giton Eigenmann 1908 (Ostariophysi; Characidae). Mitochondrial DNA Part B, 2(2), 839-840. https://doi.org/10.1080/23802359.2017.1403869
Bodmer, R., Fang, T., Antunez, M., Puertas, P., Chota, K., Pittet, M., Kirkland, M., Walkey, M., Rios, C., Perez-Peña, P. & Mayor, P. (2017). Impact of recent climate fluctuations on biodiversity and people in flooded forests of the Peruvian Amazon. CBD Technical Series, 89, 81-90.
Borowiec, M. L., Frandsen, P., Dikow, R., McKeeken, A., Valentini, G., & White, A. E. (2021). Deep learning as a tool for ecology and evolution. EcoEvoRxiv, 1-30. https://doi.org/10.32942/osf.io/nt3as
Coomes, O. T., Takasaki, Y., Abizaid, C., & Barham, B. L. (2010). Floodplain fisheries as natural insurance for the rural poor in tropical forest environments: evidence from Amazonia. Fisheries Management and Ecology, 17(6), 513-521. https://doi.org/10.1111/j.1365-2400.2010.00750.x
Crisci, C., Ghattas, B. & Perera, G. (2012). A review of supervised machine learning algorithms and their applications to ecological data. Ecological Modelling, 240, 113-122. https://doi.org/10.1016/j.ecolmodel.2012.03.001
Dagosta, F.C. & De Pinna, M. (2019). The fishes of the Amazon: distribution and biogeographical patterns, with a comprehensive list of species. Bulletin of the American Museum of Natural History, (431), 1-163. https://doi.org/10.1206/0003-0090.431.1.1
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition, 248-255. https://doi.org/10.1109/CVPR.2009.5206848
de Oliveira, R.R., Rocha, M.S., dos Anjos, M.B., Zuanon, J. & Py-Daniel, L.H.R. (2009). Fish fauna of small streams of the Catua-Ipixuna Extractive Reserve, state of Amazonas, Brazil. Check List, 5(2), 154-172. https://doi.org/10.15560/5.2.154
de Santana, C.D., Crampton, W.G., Dillman, C.B., Frederico, R.G., Sabaj, M.H., Covain, R., Ready, J., Zuanon, J., de Oliveira, R.R., Mendes-Júnior, R.N. & Bastos, D.A. (2019). Unexpected species diversity in electric eels with a description of the strongest living bioelectricity generator. Nature Communications, 10(1), 1-10. https://doi.org/10.1038/s41467-019-11690-z
de Santana, C.D., Parenti, L.R., Dillman, C.B., Coddington, J.A., Bastos, D.A., Baldwin, C.C., Zuanon, J., Torrente-Vilara, G., Covain, R., Menezes, N.A. & Datovo, A. (2021). The critical role of natural history museums in advancing eDNA for biodiversity studies: a case study with Amazonian fishes. bioRxiv. https://doi.org/10.1101/2021.04.18.440157
Escobar-Camacho, D., Barriga, R. & Ron, S.R. (2015). Discovering Hidden Diversity of Characins (Teleostei: Characiformes) in Ecuador's Yasuní National Park. PLoS ONE, 10(8), e0135569. https://doi.org/10.1371/journal.pone.0135569
Finer, M., Jenkins, C.N., Pimm, S.L., Keane, B. & Ross, C. (2008). Oil and gas projects in the western Amazon: threats to wilderness, biodiversity, and indigenous peoples. PLoS ONE, 3(8), e2932. https://doi.org/10.1371/journal.pone.0002932
Fricke, R., Eschmeyer, W. N., & Van der Laan, R. (2018). Catalog of fishes: genera, species, references. California Academy of Sciences, San Francisco, CA, USA. http://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp
García-Dávila, C. R., Flores, M., Pinedo, L., Loyola, R., Castro-Ruiz, D., Angulo, C., Mejia, E., Sánchez, H., García, A., Chota, W., Estivals, G., Panduro, H., Nolorbe, C., Chuquipiondo, C., Duponchelle, F., & Renno, J. F. (2017). Aplicación del Barcoding al Manejo y Conservación de Peces y sus Subproductos en la Amazonía Peruana. Folia Amazónica, 26(2), 195-204. https://doi.org/10.24841/fa.v26i2.329
García-Dávila, C., Sánchez Riveiro, H., Flores Silva, M. A., Mejía de Loayza, E., Angulo Chávez, C., Castro Ruiz, D., Estivals, G., García Vásquez, A., Nolorbe Payahua, C., Vargas Dávila, G., Núñez, J., Mariac, C., Duponchelle, F., & Renno, J. F. (2018). Peces de Consumo de la Amazonía Peruana. Instituto de Investigaciones de la Amazonía Peruana (IIAP), 218, ISBN: 978-612-4372-11-7.
Gardiner, M. M., Allee, L. L., Brown, P. M., Losey, J. E., Roy, H. E., & Smyth, R. R. (2012). Lessons from lady beetles: accuracy of monitoring data from US and UK citizen-science programs. Frontiers in Ecology and the Environment, 10(9), 471-476. https://doi.org/10.1890/110185
González, A.B. & Pozo, J. (2019). The Industrial Camera Modules Market: Market review and forecast until 2022. PhotonicsViews, 16(2), 24-26. https://doi.org/10.1002/phvs.201970207
Harris, J. H. (1995). The use of fish in ecological assessments. Australian Journal of Ecology, 20(1), 65-80. https://doi.org/10.1111/j.1442-9993.1995.tb00523.x
Hernández-Serna, A. & Jiménez-Segura, L.F. (2014). Automatic identification of species with neural networks. PeerJ, 2, e563. https://doi.org/10.7717/peerj.563
Howard, J., & Gugger, S. (2020). Fastai: A layered API for deep learning. Information, 11(2), 108. https://doi.org/10.3390/info11020108
Jézéquel, C., Tedesco, P.A., Darwall, W., Dias, M.S., Frederico, R.G., Hidalgo, M., Hugueny, B., Maldonado-Ocampo, J., Martens, K., Ortega, H. & Torrente-Vilara, G. (2020). Freshwater fish diversity hotspots for conservation priorities in the Amazon Basin. Conservation Biology, 34(4), 956-965. https://doi.org/10.1111/cobi.13466
Junk, W.J., Soares, M.G.M. & Bayley, P.B. (2007). Freshwater fishes of the Amazon River basin: their biodiversity, fisheries, and habitats. Aquatic Ecosystem Health & Management, 10(2), 153-173. https://doi.org/10.1080/14634980701351023
Kirsch, J.E., Day, J.L., Peterson, J.T. & Fullerton, D.K. (2018). Fish misidentification and potential implications to monitoring within the San Francisco Estuary, California. Journal of Fish and Wildlife Management, 9(2), 467-485. https://doi.org/10.3996/032018-JFWM-020
Lendemer, J., Thiers, B., Monfils, A.K., Zaspel, J., Ellwood, E.R., Bentley, A., LeVan, K., Bates, J., Jennings, D., Contreras, D. & Lagomarsino, L. (2020). The extended specimen network: A strategy to enhance US biodiversity collections, promote research and education. BioScience, 70(1), 23-30. https://doi.org/10.1093/biosci/biz165
Lobón-Cerviá, J., Hess, L.L., Melack, J.M. & Araujo-Lima, C.A. (2015). The importance of forest cover for fish richness and abundance on the Amazon floodplain. Hydrobiologia, 750(1), 245-255. https://doi.org/10.1007/s10750-014-2040-0
Marini, S., Fanelli, E., Sbragaglia, V., Azzurro, E., Fernandez, J. D. R., & Aguzzi, J. (2018). Tracking fish abundance by underwater image recognition. Scientific Reports, 8(1), 1-12. https://doi.org/10.1038/s41598-018-32089-8
Miller-Coleman, R.L., Dodsworth, J.A., Ross, C.A., Shock, E.L., Williams, A.J., Hartnett, H.E., McDonald, A.I., Havig, J.R. & Hedlund, B.P. (2012). Korarchaeota diversity, biogeography, and abundance in Yellowstone and Great Basin hot springs and ecological niche modeling based on machine learning. PLoS ONE, 7(5), e35964. https://doi.org/10.1371/journal.pone.0035964
Moreau, M. A., & Coomes, O. T. (2007). Aquarium fish exploitation in western Amazonia: conservation issues in Peru. Environmental Conservation, 34(1), 12-22. https://doi.org/10.1017/S0376892907003566
Newman, G., Wiggins, A., Crall, A., Graham, E., Newman, S., & Crowston, K. (2012). The future of citizen science: emerging technologies and shifting paradigms. Frontiers in Ecology and the Environment, 10(6), 298-304. https://doi.org/10.1890/110294
Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), E5716-E5725. https://doi.org/10.1073/pnas.1719367115
Oliveira, C., Avelino, G.S., Abe, K.T., Mariguela, T.C., Benine, R.C., Ortí, G., Vari, R.P. & e Castro, R.M.C. (2011). Phylogenetic relationships within the speciose family Characidae (Teleostei: Ostariophysi: Characiformes) based on multilocus analysis and extensive ingroup sampling. BMC Evolutionary Biology, 11(1), 1-25. https://doi.org/10.1186/1471-2148-11-275
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L. & Desmaison, A. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 8026-8037. https://pytorch.org/
Pelicice, F.M., Bialetzki, A., Camelier, P., Carvalho, F.R., García-Berthou, E., Pompeu, P.S., Mello, F.T.D. & Pavanelli, C.S. (2021). Human impacts and the loss of Neotropical freshwater fish diversity. Neotropical Ichthyology, 19. https://doi.org/10.1590/1982-0224-2021-0134
Perdigão, P., Lousã, P., Ascenso, J., & Pereira, F. (2020). Visual monitoring of High-Sea fishing activities using deep learning-based image processing. Multimedia Tools and Applications, 79, 22131-22156. https://doi.org/10.1007/s11042-020-08949-9
Qin, H., Li, X., Liang, J., Peng, Y., & Zhang, C. (2015). DeepFish: Accurate underwater live fish recognition with a deep architecture. Neurocomputing, 187, 49-58. https://doi.org/10.1016/j.neucom.2015.10.122
Rasmusson, J., Dahlgren, F., Gustafsson, H. & Nilsson, T. (2004). Multimedia in mobile phones: The ongoing revolution. Ericsson Review, 2, 98-107.
Recknagel, F. (2001). Applications of machine learning to ecological modelling. Ecological Modelling, 146(1-3), 303-310. https://doi.org/10.1016/S0304-3800(01)00316-7
Reis, R.E., Albert, J.S., Di Dario, F., Mincarone, M.M., Petry, P. & Rocha, L.A. (2016). Fish biodiversity and conservation in South America. Journal of Fish Biology, 89(1), 12-47. https://doi.org/10.1111/jfb.13016
Ronneberger, O., Fischer, P. & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
Schuettpelz, E., Frandsen, P.B., Dikow, R.B., Brown, A., Orli, S., Peters, M., Metallo, A., Funk, V.A. & Dorr, L.J. (2017). Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal, 5, e21139.
https://doi.org/10.3897/BDJ.5.e21139
Sullivan, D.P., Winsnes, C.F., Åkesson, L., Hjelmare, M., Wiking, M., Schutten, R., Campbell, L., Leifsson, H., Rhodes, S., Nordgren, A. & Smith, K. (2018). Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nature Biotechnology, 36(9), 820. https://doi.org/10.1038/nbt.4225
Sun, X., Shi, J., Dong, J., & Wang, X. (2016). Fish recognition from low-resolution underwater images. 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 471-476. https://doi.org/10.1109/CISP-BMEI.2016.7852757
Swanson, A., Kosmala, M., Lintott, C. & Packer, C. (2016). A generalized approach for producing, quantifying, and validating citizen science data from wildlife images. Conservation Biology, 30(3), 520-531. https://doi.org/10.1111/cobi.12695
Swierk, L., & Madigosky, S. R. (2014). Environmental perceptions and resource use in rural communities of the Peruvian Amazon (Iquitos and vicinity, Maynas Province). Tropical Conservation Science, 7(3), 382-402. https://doi.org/10.1177/194008291400700303
Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P. & Belongie, S. (2018). The iNaturalist species classification and detection dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8769-8778. https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00914
Wäldchen, J. & Mäder, P. (2018). Machine learning for image based species identification. Methods in Ecology and Evolution, 9(11), 2216-2225. https://doi.org/10.1111/2041-210X.13075
Weinstein, B. G. (2017). A computer vision for animal ecology. Journal of Animal Ecology, 87(3), 533-545. https://doi.org/10.1111/1365-2656.12780
White, A. E., Dikow, R. B., Baugh, M., Jenkins, A., & Frandsen, P. B. (2020). Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning. Applications in Plant Sciences, 8(6), e11352. https://doi.org/10.1002/aps3.11352
Winemiller, K.O., McIntyre, P.B., Castello, L., Fluet-Chouinard, E., Giarrizzo, T., Nam, S., Baird, I.G., Darwall, W., Lujan, N.K., Harrison, I. & Stiassny, M.L.J. (2016). Balancing hydropower and biodiversity in the Amazon, Congo, and Mekong. Science, 351(6269), 128-129. https://doi.org/10.1126/science.aac7082

Appendices

Appendix I. Sample-Size-Controlled Haplotype Distance Matrices

Table A.1. Haplotype distance matrices: genetic distance expressed as pairwise nucleotide differences (top) and Euclidean distances between morphospace centroids (bottom). https://tinyurl.com/3av2x665

Appendix II. Morphospace UMAP Projection with Image Overlay

Figure A.1. Euclidean UMAP projection (number of neighbors = 9) of Chelonia mydas carapace images (n = 204) embedded using a MobileNetV2 neural network, with the original images overlaid on the projection.
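
For readers who wish to reproduce the layout of Figure A.1, the short Python sketch below outlines the embedding-and-projection step summarized in the caption. It is a minimal illustration rather than the exact pipeline used here: only the Euclidean metric and the neighborhood size of nine come from the caption, while the ImageNet weights, the 224 x 224 input size, the carapace_images/ directory, and the random seed are assumptions added for the example.

from pathlib import Path

import numpy as np
import tensorflow as tf
import umap  # provided by the umap-learn package

# Pretrained MobileNetV2 used as a fixed feature extractor (ImageNet weights, assumed);
# global average pooling collapses each image to a single 1280-dimensional vector.
encoder = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(224, 224, 3)
)

def embed_images(image_paths):
    """Load, resize, and preprocess images, then return MobileNetV2 feature vectors."""
    batch = []
    for path in image_paths:
        img = tf.keras.utils.load_img(path, target_size=(224, 224))
        arr = tf.keras.utils.img_to_array(img)
        batch.append(tf.keras.applications.mobilenet_v2.preprocess_input(arr))
    return encoder.predict(np.stack(batch), verbose=0)

# Hypothetical folder holding the masked carapace images (n = 204 in Figure A.1).
carapace_paths = sorted(str(p) for p in Path("carapace_images").glob("*.png"))
features = embed_images(carapace_paths)

# Euclidean UMAP projection with the neighborhood size reported in the caption.
reducer = umap.UMAP(n_neighbors=9, metric="euclidean", random_state=42)
coords = reducer.fit_transform(features)  # (n_images, 2) morphospace coordinates

Scatter-plotting coords and drawing each source photograph at its projected position (for example with matplotlib's OffsetImage annotation boxes) yields an image-overlay plot in the style of Figure A.1.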