ABSTRACT

Title of Dissertation: ECOLOGICAL APPLICATIONS OF MACHINE LEARNING TO DIGITIZED NATURAL HISTORY DATA

Alexander John Robillard, Doctor of Philosophy, 2022

Dissertation directed by: Associate Professor Christopher Rowe, University of Maryland Center for Environmental Science

Natural history collections are a valuable resource for assessment of biodiversity and species decline. Over the past few decades, digitization of specimens has increased the accessibility and value of these collections. As such, the number and size of these digitized data sets have outpaced the tools needed to evaluate them. To address this, researchers have turned to machine learning to automate data-driven decisions. Specifically, applications of deep learning to complex ecological problems are becoming more common. Accordingly, this dissertation aims to contribute to this trend by addressing, in three distinct chapters, conservation, evolutionary, and ecological questions using deep learning models.

In the first chapter we focus on the trade in hawksbill sea turtle derived products, which continues internationally in physical and online marketplaces despite current regulations prohibiting their sale and distribution. To curb the sale of illegal tortoiseshell, application of new technologies like convolutional neural networks (CNNs) is needed. Therein we describe a curated data set (n = 4,428) which was used to develop a CNN application we are calling "SEE Shell", which can identify real and faux hawksbill derived products from image data. Developed on a MobileNetV2 using TensorFlow, SEE Shell was tested against a validation (n = 665) and test (n = 649) set, where it achieved an accuracy between 82.6% and 92.2% depending on the certainty threshold used. We expect SEE Shell will give potential buyers more agency in their purchasing decisions, in addition to enabling retailers to rapidly filter their online marketplaces.

In the second chapter we focus on recent research which utilized geometric morphometrics, associated genetic data, and Principal Component Analysis to successfully delineate Chelonia mydas (green sea turtle) morphotypes from carapace measurements. Therein we demonstrate a similar, yet more rapid, approach to this analysis using computer vision models. We applied a U-Net to isolate carapace pixels in images (n = 204) of juvenile C. mydas from multiple foraging grounds across the Eastern Pacific, Western Pacific, and Western Atlantic. These images were then sorted based on general alignment (shape) and coloration of the pixels within the image using a pre-trained computer vision model (MobileNetV2). The dimensions of these data were then reduced and projected using Uniform Manifold Approximation and Projection. Associated vectors were then compared to simple genetic distance using a Mantel test. Data points were then labeled post hoc for exploratory analysis. We found clear congruence between carapace morphology and genetic distance between haplotypes, suggesting that our image data have biological relevance. Our findings also suggest that carapace morphotype is associated with specific haplotypes within C. mydas. Our cluster analysis (k = 3) corroborates past research which suggests there are at least three morphotypes across the Eastern Pacific, Western Pacific, and Western Atlantic.

Finally, within the third chapter we discuss the sharp increase in agricultural and infrastructure development and the paucity of widespread data available to support conservation management decisions around the Amazon.
To address these issues, we outline a more rapid and accurate tool for identifying fish fauna in the world's largest freshwater ecosystem, the Amazon. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification, or genetic testing for species recognition at a molecular level. To overcome these challenges, we built an image masking model (U-Net) and a CNN to mask and classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru in 2018 and 2019. Species identifications in the training images (n = 3,068) were verified by expert ichthyologists. These images were supplemented with photographs taken of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian's National Museum of Natural History. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to more effectively participate in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.

ECOLOGICAL APPLICATIONS OF MACHINE LEARNING TO DIGITIZED NATURAL HISTORY DATA

by

Alexander John Robillard

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2022

Advisory Committee:
Dr. Christopher Rowe, Chair
Dr. Helen Bailey, Co-Chair
Dr. Vyacheslav Lyubchich
Dr. Rebecca Dikow
Dr. Jeffrey Seminoff
Dean's Representative: Dr. Gerald Wilkinson

© Copyright by Alexander John Robillard 2022

Dedication

This manuscript is dedicated to the increase and diffusion of knowledge.
https://www.iucn.org/news/secretariat/202105/a-tribute-lee-merriam-talbot-1930-2021

Acknowledgements

This doctoral dissertation was supported through the Smithsonian Institution Predoctoral Fellowship, the UMD Covid Relief Grant, and through a UMCES Graduate Student FRA. Significant funding was generously given to support this research by The Bently Foundation (Ch. 1), FAPESP (#2016/19075-9), Smithsonian's Global Genome Initiative (GGI-Rolling-2019-2020; 2019-242), and GeoPark Perú (Ch. 3). Computation was performed on the Smithsonian High-Performance Computing Cluster (SI-HPC; doi.org/10.25572/SIHPC). I'd like to thank Brad Nahill and the entire SEE Turtles network for their significant contributions and continued support for this work. Special thanks go to Dr. Christine Madden Hof of the World Wildlife Fund for Nature for her contributions to the development of the SEE Shell project and associated research. I would like to thank Cristian Ramirez-Gallego, Didiher Chacón Chaverri, Karla G. Barrientos-Munoz, Callie A. Veelenturf, Muhammad Jayuli, Dr. Hiltrud Cordes, Didiher Chacón Vargas, Marino Abrego, and Captain Genoveva Forero from the SEE Turtles network for aiding in image data collection. I'd also like to thank Dr. Jose Urteaga for his feedback on the manuscript and overall scope of the SEE Shell project. My sincere thanks go to Dr. Rocío Álvarez-Varas for her support, enthusiasm, initiated research, collaboration, and vital commentary on drafts of Chapter 2. Additionally, I'd like to thank Dr.
Daniel Godoy, Alejandro Fallabrino, Dr. Gabriela Vélez, Dr. Eduardo Reséndiz, Maike Heidemeyer, Dr. Juan Pablo Muñoz, Daniela Alarcón, Dr. Joanna Alfaro, and Dr. Jeffrey Mangel for their initiated field work. Their collective body of work, their permitting, and excellent data curation were the foundation for the research conducted in Chapter 2. Additional thanks go to UMCES REU student Sammy Arnold who, for his summer project, assisted me with development of the turtle shell masking model. Special thanks go to Dr. Michael Jensen for his support on several chapters of this dissertation, including data collection and offering vital commentary on this manuscript. I'd like to thank Dr. Jessica Deichmann of the Smithsonian National Zoo and Conservation Biology Institute for her significant contributions and support for the work carried out in Chapter 3; her help and guidance throughout this process were pivotal to the success of this dissertation. Additional thanks to Morgan Ruiz-Tafur and Edgard Leonardo Dávila Panduro for their input on drafts of Chapter 3, and for their initiated field efforts. Thanks go to Dr. C. David de Santana for his vital input and edits on Chapter 3 as well. I thank the people of the Achuar native community of Brasilia for access to their territory and their interest and contribution to the project. I thank the Indigenous Socio-Environmental Monitors (MOSAI) and local indigenous community experts from the Wampis and Achuar nations for invaluable help with data collection in the field, including Sem Flores Bautista, Antonio Dias Pinedo, Persiles Dias Espinar, Wilson Chichipe Villega and Peas Mukuik Tsunki. We are grateful to Diego Balbuena, Diana Vásquez, Ernesto Yallico and the GeoPark Perú HSE team for logistical support in the field. Homero Sánchez Riveiro and James Garcia Ayala contributed to species identifications. I would like to thank Lynne Parenti, Sandra Raredon, Kris Murphy, and the entire Smithsonian Museum Support Center staff for granting us access to and enabling us to successfully traverse the USNM ichthyological collections. Additional thanks go to Julianna Hazera, Erika Ali, Shauna Rasband and Guillem Millan for assistance photographing USNM specimens. Fish sampling in Peru in 2018 and 2019 was conducted under permits RD N264-2018-PRODUCE-DGPCHDI and RD N358-2019-PRODUCE-DGPCHDI. This is contribution #65 of the Peru Biodiversity Program of the Smithsonian's Center for Conservation and Sustainability. I'd like to thank my dissertation committee: Helen, Rebecca, Jeff, Chris, and Slava. Through the highs and lows of the development of this manuscript, you were unrelenting in your support. Through your collective acts of patience, leadership, and generosity I have come to learn the true meaning of mentorship. Additionally, my sincere gratitude goes to Dr. George Zug, and the late Dr. Lee Talbot, for their contributions to the field of ecology as well as their guidance throughout this journey. Special thanks go to Mike Trizna, Dr. Alex White, Dr. Mirian Tsuchiya and the rest of the group at the Smithsonian OCIO Data Science Lab; your thoughtful input and help at every step were only outmatched by your friendship. Additional appreciation goes to Nicole Barbour, Amber Fandel, Ben Colbert and the entire Bailey lab group; your kindness, collaboration and friendship were vital to my success.
I'd like to thank my roommates Blake Klocke, Fiorella Andrea Briceño Huerta, Cecilia Barriga Bahamonde, and Anne Safiya Clay, for all the friendship and dinner parties. Finally, I'd like to thank my parents Mark and Regina, my siblings Janine and Steven, and my partner Brigid, for their undying support over the years, which made this dissertation a reality.

Table of Contents

Dedication ..... ii
Acknowledgements ..... iii
Table of Contents ..... vi
List of Tables ..... viii
List of Figures ..... ix
Introduction ..... 1
    References ..... 5
Chapter 1: SEE Shell ..... 14
    Introduction ..... 14
    Methods ..... 17
        Preprocessing Steps ..... 19
        Training, Validation, Testing ..... 20
    Results ..... 21
    Discussion ..... 24
    Conclusions ..... 28
    References ..... 28
Chapter 2: Mapping Genetic Lineage Through Morphology ..... 35
    Introduction ..... 36
    Methods and Experimental Design ..... 41
        Data Collection and Study Sites ..... 41
        Data Processing ..... 41
        Semi-Supervised Classification and Dimensionality Reduction ..... 42
        Genetic Analysis ..... 43
        Analysis and Visualization ..... 45
    Results ..... 45
    Discussion ..... 50
    Conclusions ..... 60
    References ..... 60
Chapter 3: Application of a Deep Learning Image Classifier for Identification of Amazonian Fishes ..... 83
    Introduction ..... 84
    Methods ..... 87
        Preprocessing Steps ..... 88
        Identification Model Architecture, Training, and Validation ..... 88
    Results ..... 89
    Discussion ..... 93
    Conclusions ..... 96
    References ..... 96
Appendices ..... 105

List of Tables

Table 1.1 ..... 24
Table 2.1 ..... 44
Table 3.1 ..... 91
Table A.1 ..... Appendix I
List of Figures

Figure 1.1 ..... 19
Figure 1.2 ..... 21
Figure 1.3 ..... 22
Figure 1.4 ..... 23
Figure 1.5 ..... 28
Figure 2.1 ..... 42
Figure 2.2 ..... 46
Figure 2.3 ..... 47
Figure 2.4 ..... 48
Figure 2.5 ..... 49
Figure 2.6 ..... 50
Figure 3.1 ..... 89
Figure 3.2 ..... 92
Figure A.1 ..... Appendix II

Introduction

The collection and archival of physical voucher specimens in museums and other repositories have proven extremely useful for analyzing long-term ecological trends (Shaffer et al., 1998). Natural history collection data have proven to be a valuable resource for assessment of species decline (Shaffer et al., 1998) and evaluation of biodiversity (Ponder et al., 2001; O'Connell Jr et al., 2004). Over the past decade the prevalence of large biodiversity datasets has grown rapidly (Weinstein, 2018). Digitization, the generation of digital images of physical voucher specimens, has greatly expanded the usage of collections (Hedrick et al., 2020). Throughout history, ecologists, naturalists, and evolutionary biologists have utilized drawings, paintings, and photographs to study the natural world (Lürig et al., 2021; Hayashi & Yasuda, 2022). As an archival method, such data have proven to be useful for documenting historical natural heritage (Hayashi & Yasuda, 2022). The recent push to convert specimen data for mobilization on online platforms, like iDigBio (Matsunaga et al., 2013), has enhanced the value of these collections by creating secondary workflows that are entirely digital (Hedrick et al., 2020). Similarly, platforms like iNaturalist (Van Horn et al., 2018) have even circumvented the physical voucher and utilize direct data capture. Implementation of these digital-only pipelines has only accelerated the growth in size and scope of such collections (Hedrick et al., 2020). As such, the number and size of image data sets have largely outpaced the tools necessary to evaluate them (Weinstein, 2018; Lürig et al., 2021).
Machine learning, a discipline with roots in the mid-twentieth century, applies advanced algorithms that can make data-driven decisions without being explicitly programmed for each task (Thessen, 2016). Machine learning is often utilized in ecology for time-series forecasting (Lin et al., 2018; Li et al., 2021; Lucas, 2020) and dynamic time warping (Hegg & Kennedy, 2021). Machine learning can enable users to find insights from otherwise uninterpretable data. For example, Okamoto et al. (2020) were able to successfully estimate sea turtle species composition by using a random forest classifier on catch records from 10,490 longline fishing operations. Deep learning, which was popularized in 2012, is a branch of machine learning that utilizes neural networks with multiple layers to automate detection of features, often applied to more complex problems (Christin et al., 2019). Computer vision by way of deep learning is a growing technology that sorts and categorizes image data directly from input images. Today, such tools are answering any number of key questions about photographic datasets and their underlying information (Weinstein, 2018; Lürig et al., 2021). A rapid increase in the computational power behind machine learning models (Thessen, 2016; Maeda-Gutiérrez et al., 2020) has acted as a catalyst for implementation of computer vision across several biological (Brosch et al., 2014; Hua et al., 2015) and ecological studies (Grinblat et al., 2016; Norouzzadeh et al., 2018; Younis et al., 2018; Borowiec et al., 2021). Unfortunately, many past applications of machine learning to the natural world have focused on predicting rather than understanding ecology (Lucas, 2020), but this paradigm is changing (Borowiec et al., 2021; Lürig et al., 2021). More recently, deep learning has been applied to animal behavior (Mönck et al., 2018), population genetics (Sheehan & Song, 2016), niche modeling (White et al., 2019) and species delimitation studies (Saryan et al., 2020). Within this dissertation, the hope is that similar explorations at the intersection of ecology and deep learning can be made. Specifically, each chapter describes an independent study carried out with the intention that it can act as a case study for applying computer vision to digitized natural history data to address conservation, evolutionary, and ecological issues. Respectively, each of these areas is representative of one chapter of this dissertation. The first chapter outlines a novel application of computer vision to identifying Eretmochelys imbricata (hawksbill) derived products known as "tortoiseshell". Herein is a description of how a Convolutional Neural Network (CNN), which can accurately identify tortoiseshell products, was generated. This is undoubtedly a major conservation issue given that the endangered E. imbricata are by far the most illegally traded sea turtle species on the planet (Frazier et al., 2003; Miller et al., 2019; Nahill et al., 2020). Application of such a tool has the potential for direct and wide-ranging impact, enabling data collection on, and prevention of, illegal sales which perpetuate the trade in both physical and virtual marketplaces (Nahill et al., 2020). Although past efforts have applied machine learning to the illegal wildlife trade (Di Minin et al., 2019), there is no evidence to date which suggests such efforts have applied computer vision to the sea turtle trade specifically.
This case study may prove to be a critical first step toward applying this novel technique to other endangered species derived products. The second chapter describes a novel approach for analyzing and contextualizing morphometric image data of a heritable phenotypic trait (carapace) in Chelonia mydas (green sea turtle). Past research has used machine learning to delimit species across several taxa (Derkarabetian et al., 2019; Derkarabetian et al., 2022; Perez et al., 2022), including terrestrial turtles (Martin et al., 2021). Separately, dimensionality reduction by way of Principal Component Analysis (PCA) on multivariate data has also been applied to morphological data as a means to delimit species (Álvarez-Varas et al., 2019). In tandem, machine learning and PCA have proven to be complementary tools with the ability to identify species boundaries from morphological data (Saryan et al., 2020). Uniform Manifold Approximation and Projection (UMAP) is another form of dimensionality reduction (McInnes et al., 2018) which, when used in tandem with a machine learning model, has the capacity to outperform PCA (Chari et al., 2021; Yang et al., 2021). Proper delimitation of species has evolutionary implications which can directly impact species counts and biodiversity assessments; this in turn impacts advocacy and funding (Seminoff & Shanker, 2008; Funk et al., 2019; Saryan et al., 2020), making comparisons among similar methods vital. The overall goal for the second chapter is to use a machine learning model to sort the C. mydas carapace image data based on general alignment (shape) and coloration of the pixels within the image, and project it into a morphospace similar to Álvarez-Varas et al. (2019). By using UMAP to examine an expanded version of the Álvarez-Varas et al. (2019) data, one can make direct comparisons between dimensionality reduction strategies. The output embeddings from this analysis will be compared to associated genetic data in hopes of validating the biological relevance of the image morphology data. This suggested pipeline, which utilizes image data rather than individual measurements, may prove to expedite future morphometric analysis and could prove to be the theoretical foundation for other computer vision applications focused on sea turtle morphology. The research in the third and final chapter of this dissertation attempts to generate a Convolutional Neural Network (CNN) which is able to accurately identify Amazonian fish species. Specifically, we describe the methods used to build this CNN, and the underlying data set on which it was trained. The Amazon basin is home to over 2,700 species of freshwater fish (Junk et al., 2007; Dagosta & De Pinna, 2019), many of which are critical to the health and economy of the people living in the Amazon (Moreau & Coomes, 2007; Coomes et al., 2010). Given the difficulty associated with identifying and differentiating species of fish (Kirsch et al., 2018), it is no surprise that many of the species within this region are not described (Reis et al., 2016). Past efforts have attempted to utilize computer vision to identify fish with varying levels of success (Hernández-Serna & Jiménez-Segura, 2014; Sun et al., 2016; Alsmadi et al., 2019). Given that the region our data comes from, within the sub-drainages of the Marañón river, is one of the most undersampled in the Amazon (Jézéquel et al., 2020), such a tool may prove extremely useful for bridging this informational gap. Additionally, the methods and dataset described within the third chapter should provide a foundation for other computer vision projects seeking to identify freshwater fish within the Amazon.
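To make the dimensionality-reduction comparison described above for the second chapter more concrete, the following minimal sketch, written against TensorFlow/Keras, scikit-learn, and the umap-learn package, embeds masked carapace images with an ImageNet-pretrained MobileNetV2 and projects the same feature vectors with both PCA and UMAP. It is not the code used in Chapter 2; the folder name and all parameter values are illustrative assumptions.

```python
# Minimal sketch (assumed setup): MobileNetV2 features, then PCA and UMAP morphospaces.
import glob

import numpy as np
import tensorflow as tf
import umap  # provided by the umap-learn package
from sklearn.decomposition import PCA

# MobileNetV2 as a fixed feature extractor (no classification head).
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg", weights="imagenet")

def embed(image_paths):
    """Return one 1280-dimensional MobileNetV2 feature vector per image."""
    batch = []
    for path in image_paths:
        img = tf.keras.utils.load_img(path, target_size=(224, 224))
        arr = tf.keras.utils.img_to_array(img)
        batch.append(tf.keras.applications.mobilenet_v2.preprocess_input(arr))
    return backbone.predict(np.stack(batch), verbose=0)

features = embed(sorted(glob.glob("carapace_masks/*.png")))  # hypothetical folder of masked images

# Two 2-D morphospaces from identical inputs, allowing a direct PCA-versus-UMAP comparison.
pca_xy = PCA(n_components=2).fit_transform(features)
umap_xy = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42).fit_transform(features)
```

Because both projections start from the same feature vectors, any difference between the resulting point clouds reflects the dimensionality-reduction strategy rather than the underlying image data.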
Each of the projects outlined within this dissertation applies deep learning methods to digitized image data in an attempt to answer complex biological questions. Although each chapter focuses these efforts on different aquatic organisms, the methodologies put forth here can be applied across taxa. Considering the increasing rate at which machine learning and computer vision tools are being deployed to answer similar ecological questions (Weinstein, 2018), the possibility that such tools will be an essential part of every biologist's tool kit seems increasingly certain (Lürig et al., 2021).

References

Alsmadi, M. K., Tayfour, M., Alkhasawneh, R. A., Badawi, U., Almarashdeh, I., & Haddad, F. (2019). Robust feature extraction methods for general fish classification. International Journal of Electrical & Computer Engineering, 9, 2088-8708. https://doi.org/10.11591/ijece.v9i6.pp5192-5204

Álvarez-Varas, R., Véliz, D., Vélez-Rubio, G.M., Fallabrino, A., Zárate, P., Heidemeyer, M., Godoy, D.A. & Benítez, H.A. (2019). Identifying genetic lineages through shape: An example in a cosmopolitan marine turtle species using geometric morphometrics. PLOS ONE, 14(10), e0223587.

Borowiec, M. L., Frandsen, P., Dikow, R., McKeeken, A., Valentini, G., & White, A. E. (2021). Deep learning as a tool for ecology and evolution. EcoEvoRxiv, 1-30. https://doi.org/10.32942/osf.io/nt3as

Brosch, T., Yoo, Y., Li, D.K., Traboulsee, A. & Tam, R. (2014). Modeling the variability in brain morphology and lesion distribution in multiple sclerosis by deep learning. International Conference on Medical Image Computing and Computer-Assisted Intervention, 8674, 462-469. https://doi.org/10.1007/978-3-319-10470-6_5

Chari, T., Banerjee, J. & Pachter, L. (2021). The specious art of single-cell genomics. bioRxiv, 1-25. https://doi.org/10.1101/2021.08.25.457696

Christin, S., Hervet, É., & Lecomte, N. (2019). Applications for deep learning in ecology. Methods in Ecology and Evolution, 10(10), 1632-1644. https://doi.org/10.1111/2041-210X.13256

Coomes, O. T., Takasaki, Y., Abizaid, C., & Barham, B. L. (2010). Floodplain fisheries as natural insurance for the rural poor in tropical forest environments: evidence from Amazonia. Fisheries Management and Ecology, 17(6), 513-521. https://doi.org/10.1111/j.1365-2400.2010.00750.x

Dagosta, F.C. & De Pinna, M. (2019). The fishes of the Amazon: distribution and biogeographical patterns, with a comprehensive list of species. Bulletin of the American Museum of Natural History, (431), 1-163. https://doi.org/10.1206/0003-0090.431.1.1

Derkarabetian, S., Castillo, S., Koo, P.K., Ovchinnikov, S. & Hedin, M. (2019). A demonstration of unsupervised machine learning in species delimitation. Molecular Phylogenetics and Evolution, 139, 106562. https://doi.org/10.1016/j.ympev.2019.106562

Derkarabetian, S., Starrett, J., & Hedin, M. (2022). Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data. Frontiers in Zoology, 19(1), 1-15. https://doi.org/10.1186/s12983-022-00453-0

Di Minin, E., Fink, C., Hiippala, T., & Tenkanen, H. (2019). A framework for investigating illegal wildlife trade on social media with machine learning. Conservation Biology, 33(1), 210. https://doi.org/10.1111/cobi.13104

Frazier, J., Lutz, P., Musick, J., & Wyneken, J. (2003).
Prehistoric and ancient historic interactions between humans and marine turtles. In P.L. Lutz, J.A. Musick & J. Wyneken, The Biology of Sea Turtles, Volume II, 1-38. https://doi.org/10.1201/9781420040807

Funk, W.C., Forester, B.R., Converse, S.J., Darst, C. & Morey, S. (2019). Improving conservation policy with genomics: a guide to integrating adaptive potential into US Endangered Species Act decisions for conservation practitioners and geneticists. Conservation Genetics, 20(1), 115-134. https://doi.org/10.1007/s10592-018-1096-1

Grinblat, G.L., Uzal, L.C., Larese, M.G. & Granitto, P.M. (2016). Deep learning for plant identification using vein morphological patterns. Computers and Electronics in Agriculture, 127, 418-424. https://doi.org/10.1016/j.compag.2016.07.003

Hayashi, R., & Yasuda, Y. (2022). Past biodiversity: Japanese historical monographs document the trans-Pacific migration of the black turtle, Chelonia mydas agassizii. Ecological Research, 37(1), 151-155. https://doi.org/10.1111/1440-1703.12265

Hedrick, B.P., Heberling, J.M., Meineke, E.K., Turner, K.G., Grassa, C.J., Park, D.S., Kennedy, J., Clarke, J.A., Cook, J.A., Blackburn, D.C. & Edwards, S.V. (2020). Digitization and the future of natural history collections. BioScience, 70(3), 243-251. https://doi.org/10.1093/biosci/biz163

Hegg, J. C., & Kennedy, B. P. (2021). Let's do the time warp again: non-linear time series matching as a tool for sequentially structured data in ecology. Ecosphere, 12(9), e03742. https://doi.org/10.1002/ecs2.3742

Hernández-Serna, A. & Jiménez-Segura, L.F. (2014). Automatic identification of species with neural networks. PeerJ, 2, e563. https://doi.org/10.7717/peerj.563

Hua, K.L., Hsu, C.H., Hidayati, S.C., Cheng, W.H. & Chen, Y.J. (2015). Computer-aided classification of lung nodules on computed tomography images via deep learning technique. OncoTargets and Therapy, 8. https://doi.org/10.2147/OTT.S80733

Jézéquel, C., Tedesco, P.A., Darwall, W., Dias, M.S., Frederico, R.G., Hidalgo, M., Hugueny, B., Maldonado-Ocampo, J., Martens, K., Ortega, H. & Torrente-Vilara, G. (2020). Freshwater fish diversity hotspots for conservation priorities in the Amazon Basin. Conservation Biology, 34(4), 956-965. https://doi.org/10.1111/cobi.13466

Junk, W.J., Soares, M.G.M. & Bayley, P.B. (2007). Freshwater fishes of the Amazon River basin: their biodiversity, fisheries, and habitats. Aquatic Ecosystem Health & Management, 10(2), 153-173. https://doi.org/10.1080/14634980701351023

Kirsch, J.E., Day, J.L., Peterson, J.T. & Fullerton, D.K. (2018). Fish misidentification and potential implications to monitoring within the San Francisco Estuary, California. Journal of Fish and Wildlife Management, 9(2), 467-485. https://doi.org/10.3996/032018-JFWM-020

Li, M. F., Glibert, P. M., & Lyubchich, V. (2021). Machine Learning Classification Algorithms for Predicting Karenia brevis Blooms on the West Florida Shelf. Journal of Marine Science and Engineering, 9(9), 999. https://doi.org/10.3390/jmse9090999

Lin, C. H. M., Lyubchich, V., & Glibert, P. M. (2018). Time series models of decadal trends in the harmful algal species Karlodinium veneficum in Chesapeake Bay. Harmful Algae, 73, 110-118. https://doi.org/10.1016/j.hal.2018.02.002

Lucas, T. C. (2020). A translucent box: interpretable machine learning in ecology. Ecological Monographs, 90(4), e01422. https://doi.org/10.1002/ecm.1422

Lürig, M. D., Donoughe, S., Svensson, E. I., Porto, A., & Tsuboi, M. (2021).
Computer vision, machine learning, and the promise of phenomics in ecology and evolutionary biology. Frontiers in Ecology and Evolution, 9, 148. https://doi.org/10.3389/fevo.2021.642774

Maeda-Gutiérrez, V., Galvan-Tejada, C.E., Zanella-Calzada, L.A., Celaya-Padilla, J.M., Galván-Tejada, J.I., Gamboa-Rosales, H., Luna-Garcia, H., Magallanes-Quintanar, R., Guerrero Mendez, C.A. & Olvera-Olvera, C.A. (2020). Comparison of convolutional neural network architectures for classification of tomato plant diseases. Applied Sciences, 10(4), 1245. https://doi.org/10.3390/app10041245

Martin, B. T., Chafin, T. K., Douglas, M. R., Placyk Jr, J. S., Birkhead, R. D., Phillips, C. A., & Douglas, M. E. (2021). The choices we make and the impacts they have: Machine learning and species delimitation in North American box turtles (Terrapene spp.). Molecular Ecology Resources, 21(8), 2801-2817. https://doi.org/10.1111/1755-0998.13350

Matsunaga, A., Thompson, A., Figueiredo, R.J., Germain-Aubrey, C.C., Collins, M., Beaman, R.S., MacFadden, B.J., Riccardi, G., Soltis, P.S., Page, L.M., Fortes, J.A.B. (2013). A computational- and storage-cloud for integration of biodiversity collections. IEEE 9th International Conference on e-Science, 78-87. https://doi.org/10.1109/eScience.2013.48

McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint, arXiv:1802.03426. https://doi.org/10.48550/arXiv.1802.03426

Miller, E. A., McClenachan, L., Uni, Y., Phocas, G., Hagemann, M. E., & Van Houtan, K. S. (2019). The historical development of complex global trafficking networks for marine wildlife. Science Advances, 5(3), eaav5948. https://doi.org/10.1126/sciadv.aav5948

Mönck, H.J., Jörg, A., von Falkenhausen, T., Tanke, J., Wild, B., Dormagen, D., Piotrowski, J., Winklmayr, C., Bierbach, D. & Landgraf, T. (2018). BioTracker: an open-source computer vision framework for visual animal tracking. arXiv preprint, arXiv:1803.07985. https://doi.org/10.48550/arXiv.1803.07985

Moreau, M. A., & Coomes, O. T. (2007). Aquarium fish exploitation in western Amazonia: conservation issues in Peru. Environmental Conservation, 34(1), 12-22. https://doi.org/10.1017/S0376892907003566

Nahill, B., von Weller, P., & Barrios-Garrido, H. (2020). The global tortoiseshell trade. Oregon, USA: SEE Turtles, 1-83. https://static1.squarespace.com/static/5369465be4b0507a1fd05af0/t/5f37089ddc88be5b0fce18fe/1597442219875/Global+Tortoiseshell+Report.pdf

Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), E5716-E5725. https://doi.org/10.1073/pnas.1719367115

O'Connell Jr, A. F., Gilbert, A. T., & Hatfield, J. S. (2004). Contribution of natural history collection data to biodiversity assessment in national parks. Conservation Biology, 18(5), 1254-1261. https://doi.org/10.1111/j.1523-1739.2004.00034.x-i1

Okamoto, K., Kanaiwa, M., & Ochi, D. (2020). Machine learning approach to estimate species composition of unidentified sea turtles that were recorded on the Japanese longline observer program. Collective Volumes of Scientific Papers ICCAT, 76(9), 175-178.

Perez, M. F., Bonatelli, I. A., Romeiro-Brito, M., Franco, F. F., Taylor, N. P., Zappi, D. C., & Moraes, E. M. (2022).
Coalescent-based species delimitation meets deep learning: Insights from a highly fragmented cactus system. Molecular Ecology Resources, 22(3), 1016-1028. https://doi.org/10.1111/1755-0998.13534

Ponder, W. F., Carter, G. A., Flemons, P., & Chapman, R. R. (2001). Evaluation of museum collection data for use in biodiversity assessment. Conservation Biology, 15(3), 648-657. https://doi.org/10.1046/j.1523-1739.2001.015003648.x

Reis, R.E., Albert, J.S., Di Dario, F., Mincarone, M.M., Petry, P. & Rocha, L.A. (2016). Fish biodiversity and conservation in South America. Journal of Fish Biology, 89(1), 12-47. https://doi.org/10.1111/jfb.13016

Saryan, P., Gupta, S., & Gowda, V. (2020). Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery. Applications in Plant Sciences, 8(7), e11377. https://doi.org/10.1002/aps3.11377

Seminoff, J. A., & Shanker, K. (2008). Marine turtles and IUCN Red Listing: a review of the process, the pitfalls, and novel assessment approaches. Journal of Experimental Marine Biology and Ecology, 356(1-2), 52-68. https://doi.org/10.1016/j.jembe.2007.12.007

Shaffer, H.B., Fisher, R.N. & Davidson, C. (1998). The role of natural history collections in documenting species declines. Trends in Ecology & Evolution, 13(1), 27-30. https://doi.org/10.1016/S0169-5347(97)01177-4

Sheehan, S., & Song, Y. S. (2016). Deep learning for population genetic inference. PLOS Computational Biology, 12(3), e1004845. https://doi.org/10.1371/journal.pcbi.1004845

Sun, X., Shi, J., Dong, J., & Wang, X. (2016). Fish recognition from low-resolution underwater images. 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 471-476. https://doi.org/10.1109/CISP-BMEI.2016.7852757

Thessen, A. (2016). Adoption of machine learning techniques in ecology and earth science. One Ecosystem, 1, e8621. https://doi.org/10.3897/oneeco.1.e8621

Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P. & Belongie, S. (2018). The iNaturalist species classification and detection dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8769-8778. https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00914

Weinstein, B. G. (2018). A computer vision for animal ecology. Journal of Animal Ecology, 87(3), 533-545. https://doi.org/10.1111/1365-2656.12780

White, A. E., Trizna, M. G., Frandsen, P. B., Dorr, L. J., Dikow, R. B., & Schuettpelz, E. (2019). Evaluating Geographic Patterns of Morphological Diversity in Ferns and Lycophytes Using Deep Neural Networks. Biodiversity Information Science and Standards, (4), e37559. https://doi.org/10.3897/biss.3.37559

Yang, Y., Sun, H., Zhang, Y., Zhang, T., Gong, J., Wei, Y., Duan, Y.G., Shu, M., Yang, Y., Wu, D. & Yu, D. (2021). Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data. Cell Reports, 36(4), 109442. https://doi.org/10.1016/j.celrep.2021.109442

Younis, S., Weiland, C., Hoehndorf, R., Dressler, S., Hickler, T., Seeger, B., & Schmidt, M. (2018). Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks. Botany Letters, 165(3-4), 377-383. https://doi.org/10.1080/23818107.2018.1446357

Chapter 1: SEE Shell - A Deep Learning Model for Detecting Hawksbill Derived Products

Abstract

1.
Despite current regulations prohibiting the sale and distribution of hawksbill sea turtle derived products, the tortoiseshell trade continues internationally, in physical and online marketplaces. To curb the sale of illegal tortoiseshell, application of new technologies like convolutional neural networks (CNNs) is needed.
2. Here we describe a curated data set (n = 4,428) which was used to develop a CNN application we are calling "SEE Shell", which can identify real and faux hawksbill derived products from image data.
3. Developed on a MobileNetV2 using TensorFlow, SEE Shell was tested against a validation (n = 665) and test (n = 649) set, where it achieved an accuracy between 82.6% and 92.2% depending on the certainty threshold used.
4. We expect SEE Shell will give potential buyers more agency in their purchasing decisions, in addition to enabling retailers to rapidly filter their online marketplaces.

Introduction

Illegal harvest is known to negatively impact sea turtle populations around the globe (Alvarado-Díaz et al., 2001; Senko et al., 2014; Cheng et al., 2018; Joseph et al., 2019). Despite current international mandates and domestic legislation, which, in many countries, regulate or prohibit the collection, consumption and sale of any sea turtle products (e.g., carapace, meat, adipose tissue, organs, blood, eggs), individuals primarily from the family Cheloniidae (and, to a lesser extent, Dermochelyidae) continue to be sold in markets and online in many communities worldwide (Aguirre et al., 2006; Nada & Casale, 2011; Rudrud, 2010; Harrison et al., 2017; Quiñones et al., 2017; Nahill et al., 2020). Contemporary research suggests that of all sea turtles, hawksbills (Eretmochelys imbricata) are by far the most illegally traded, critically endangered, sea turtle species on the planet (Frazier et al., 2003; Miller et al., 2019; Nahill et al., 2020). Although every species of sea turtle is under threat of poaching, hawksbills are the most prized for their exceptionally ornate carapace. Their malleable carapaces, colloquially referred to as "tortoiseshell" or "bekko", are often harvested and fashioned into jewelry and other items of adornment (Harrison et al., 2017; Nahill et al., 2020). Conservative estimates suggest that over a 150-year period, approximately 9 million hawksbills were illegally harvested (Miller et al., 2019). Despite access to less risky sources of protein and legal protections in some areas, illegal harvest of hawksbills still persists (Mancini et al., 2011). A recent report reviewing the global tortoiseshell trade across 40 countries suggests that since 2017, online trade of hawksbill products may have surpassed in-person sales (Nahill et al., 2020). Thus, to curb the continued trade of hawksbill and other sea turtles, new technologies like deep learning can be a tool for those on the frontlines of conservation and enforcement. Convolutional Neural Networks (CNNs) are one deep-learning method successfully used in computer vision to extract valuable information from images, for example, learning to accurately classify them based on a labeled dataset of training images (Norouzzadeh et al., 2018). A CNN can be used to identify features within an image, which can be correlated with morphological traits such as carapace shape, enabling the model to accurately differentiate between classes.
Ultimately this allows a user to take a photo of an unknown target, pass it through a trained model, and receive immediate feedback (a label) as to what is likely depicted in the image (Schuettpelz et al., 2017). Synthesis of community science and image classification has proven to be an effective method for sorting scientific data (Norouzzadeh et al., 2018; Sullivan et al., 2018). For example, implementation of deep learning on a training and test dataset, labeled by community scientists, resulted in 96.6% of nearly 3.2 million camera trap images of mammals being accurately classified (Norouzzadeh et al., 2018). Although past efforts have used similar image classification tools to expedite such research (Yang et al., 2009; Yu et al., 2013), those not using a CNN often require costly amounts of valuable time to pre-label large subsets, crop images, and tune parameters (Norouzzadeh et al., 2018). This is time which could be saved by a more rapid and objective method such as the application of a CNN. Image classification, without the assistance of a CNN, has been utilized for sea turtle research in the past. Early studies utilized visual inspection of photographs to non-invasively identify individual sea turtles (McDonald et al., 1996; Dutton et al., 2005; Reisser et al., 2008; Schofield et al., 2008). Early efforts successfully utilized key-point matching of head patterns to differentiate between images of individual leatherback sea turtles in Trinidad (De Zeeuw et al., 2010). Calmanovici et al. (2018) utilized the I3S Pattern image software to implement a mark-recapture regime with identification accuracy of 85% for free-swimming turtles, and 97% for captured individuals. Although implementation of this tool reduced their overall analysis time by 80%, it still required human identification and outlining of anchor points for each photo (Calmanovici et al., 2018). Similarly, Hanna et al. (2021) were able to successfully match all their (n = 309) images of green sea turtles to seven individuals using a tool called "Hotspotter". Gatto et al. (2018) utilized a similar software called APHIS to classify hatchlings and adult turtles with 92.9% and 81.8% accuracy (respectively) based on images of their flippers. Like the I3S and Hotspotter software, APHIS requires significant human point anchoring and identification (Gatto et al., 2018). Unfortunately, many of the software packages implemented in past research have extensive and highly technical user manuals, limiting their immediate usability by untrained individuals (Calmanovici et al., 2018; Gatto et al., 2018). Although image classification for sea turtle research has been implemented in the past, previous efforts to classify the tortoiseshell products derived from sea turtles have been limited to genetic studies (Jensen et al., 2019; LaCasella et al., 2021) and confiscations (Rice & Moore, 2008). An automated tool which can quickly identify the nuances of tortoiseshell in a non-destructive manner to delineate "Real" and "Fake" hawksbill derived products is currently lacking. A readily available application would enable scientists, law enforcement, community scientists, and other conservation professionals involved with monitoring efforts to collect fast and accurate trade data, or simply avoid purchasing tortoiseshell items. Using deep learning, we trained and tested a CNN model to detect real and fake tortoiseshell, which formed the basis for a broadly available tortoiseshell trade monitoring and consumer choice application. This model can be used by multiple audiences for rigorous data collection, direct detection, monitoring, consumer purchasing power, and social outreach to enhance sea turtle conservation.

Methods

Images were collected by conservation professionals from across Asia, Oceania, North, Central and South America. Additional images were scraped from the internet using the DuckDuckGo image search engine. All images were visually verified by the first two authors (Author & Brad Nahill) of this paper (n = 4,428). Images were labeled "real" (n = 1,409) or
This model can be used by multiple audiences for rigorous data collection, direct detection, monitoring, consumer purchasing power, and social outreach to enhance sea turtle conservation. Methods Images were collected by conservation professionals from across Asia, Oceania, North, Central and South America. Additional images were scraped from the internet using the DuckDuckGo image search engine. All images were visually verified by the first two authors (Author & Brad Nahill) of this paper (n = 4,428). Images were labeled ?real? (n = 1,409) or 17 ?fake? (n = 3,019). Real item images were considered to be items made from hawksbill sea turtle carapace. A small subset (n = 50) of the real images were previously verified with genetic analysis (Jensen et al. 2019). Fake item images were derived from a variety of mimics including resin, coconut, conch shell, cow horn, wood, and ceramics. An additional ?test set? (n = 649) was collected from the same sources but of different products. The test set images were not preprocessed and instead were collected with specific guidelines given to image takers. Specifically, we asked those collecting test images to (A) try to center and focus on items in their photos, (B) refrain from having multiple types of items or full store backgrounds, and (C) avoid other obstructions like capturing appendages or glares in the image frame. Figures 1.2 (A-C) show violations of these rules, while Figure 1.1 D is an example of an optimal image. The CNN classifier was developed using an Nvidia GeForce (V100; 32 GB VRAM) GPU implementing the Tensorflow library (Abadi et al., 2016) and was developed by retraining the last few layers of a MobileNetV2 pretrained architecture (Howard et al., 2017). 18 Figure 1.1 Examples of low quality (A-C) and high quality (D) test images. Preprocessing Steps To better-enable targeted training, images were manually augmented to ensure quality of the training data was maintained. Images were standardized in size (224 x 224 pixels), cropped and centered so target items were the central focus. In some cases, items were clipped from their background entirely to ensure that only one class of item (real or fake) was in the image. Microsoft Paint3D was used for all image adjustments and augmentations. 19 Training, Validation, Testing To develop our image classifier model, images were randomly subset into training (n= 3,763) and validation (n= 665) sets. This was an 85-15 split, respectively. We trained our model over 20 epochs (iterations to update the model coefficients to improve performance on the training set) at a learning rate of 0.00083, at which point the training and validation loss were minimized. Finally, we selected the model with the lowest and most even loss, with the highest accuracy in the validation set. The final model was then applied to a test set of 649 images. The test set consisted of both real (n = 142) and fake (n = 507) items. To test the model at different sensitivities we adjusted the threshold of certainty for its predictions, and measured precision and recall. Precision is the fraction of correct predictions among all positive results, whether they are incorrect or not. Recall is the fraction of correct predictions among the total items targeted, regardless of whether they were correctly predicted or not. From this information a Receiver Operating Characteristics (ROC) curve was generated and the Area Under Curve (AUC) was calculated (Davis & Goadrich, 2006). 
Results

Our computer vision model, henceforth referred to as "SEE Shell", was able to accurately identify 92.2% of the 665 validation images, with a precision of 0.87, recall of 0.89 and an F1 score of 0.88. These results are summarized in a confusion matrix (Figure 1.3). When applied to our test set, SEE Shell obtained accuracies between 82.6% and 90.3% at different certainty thresholds, and an F1 score between 0.69 and 0.79. Results for the certainty thresholds, inconclusive image counts and rates, accuracy, precision, recall and F1 are summarized in Table 1.1. The ROC curve for our model revealed an AUC of 0.87 (Figure 1.4).

Figure 1.3 Confusion matrix summarizing validation dataset (n = 665) results, comparing the predicted counts relative to their actual classification. SEE Shell obtained a class accuracy of 89.2% for real item images and 93.6% for fake item images on the validation set.

Figure 1.4 Receiver Operating Characteristic (ROC) curve for our SEE Shell image classifier. The vertical axis is the true positive rate, also known as the recall. The horizontal axis is the false positive rate and represents the probability of a false alarm. The ROC curve is based on variable prediction certainty thresholds. The blue dashed line represents a "no skill" classifier, which would have an equal (50-50) random chance of selecting the correct response.

Table 1.1 SEE Shell test dataset (n = 649) inconclusive counts, accuracy, precision, recall and F1 based on adjusted certainty threshold.

Certainty Threshold | Inconclusive Images | Inconclusive Results (%) | Accuracy (%) | Precision | Recall | F1
None                | 0                   | -                        | 82.6         | 0.566     | 0.880  | 0.690
0.60                | 16                  | 2.5                      | 82.8         | 0.574     | 0.879  | 0.695
0.75                | 38                  | 5.9                      | 83.8         | 0.591     | 0.882  | 0.708
0.80                | 48                  | 7.4                      | 84.5         | 0.601     | 0.895  | 0.719
0.85                | 67                  | 10.3                     | 85.7         | 0.628     | 0.901  | 0.740
0.90                | 91                  | 14.0                     | 86.6         | 0.640     | 0.894  | 0.746
0.95                | 117                 | 18.0                     | 87.2         | 0.654     | 0.889  | 0.754
0.99                | 170                 | 26.2                     | 88.7         | 0.672     | 0.911  | 0.773
1.00                | 217                 | 33.4                     | 90.3         | 0.696     | 0.920  | 0.792

Discussion

In this study, our validation and test set results showed that SEE Shell is an accurate method for detecting real tortoiseshell from images. Our results demonstrate that the application of deep learning to targeted wildlife trafficking projects can generate models, like SEE Shell, which may be immediately useful for conservation. By cultivating the largest known database of tortoiseshell products to date, we were able to train a computer vision model to differentiate between item types found within the image collection with high accuracy. Wide circulation of SEE Shell and the mobile application will allow stakeholders to monitor, avoid, and study the illegal tortoiseshell trade and support consumers in making more informed choices. Stakeholders include community scientists and conservation professionals seeking to monitor tortoiseshell products in their area, web-based marketplaces or e-commerce ventures looking to filter out illegal solicitation on their platforms, and officials monitoring, detecting, or confiscating trafficked products at transaction points like shipping ports, customs checkpoints, and tourist areas. The specific goal of our application was to identify "Real"
tortoiseshell products, and in doing so emphasize the importance of recall on any given version of our model in the context of its usage. This is because not catching instances of real tortoiseshell is much worse than accidentally flagging fake tortoiseshell for further scrutiny. Results considered "inconclusive" can also be flagged for inspection to err on the side of caution. SEE Shell has the potential to dramatically cut down on the number of items needing to be scrutinized, while expanding the overall monitoring effort's reach. For example, from 1997 to 2003, legal wildlife shipments into the United States increased from 57,491 to 115,667 per year, yet the number of staff monitoring these shipments remained the same. This led to a substantial reduction in inspection rates, which dropped from 36% in 1997 to 22% in 2003 (Rice & Moore, 2008). With an increasing volume of shipments to monitor, image classification tools like SEE Shell may be able to increase the reach of monitoring efforts, allowing officials to maintain their current workload but limit their purview to instances with high classification uncertainty only. This same logic can be applied to the discreet filtering of illegal tortoiseshell solicitations on web-based marketplaces or e-commerce sites, where there have been more recent reports of shifts towards and increases in usage and activity (Rice & Moore, 2008; Nahill et al., 2020). Another direct benefit of using a computer vision model such as SEE Shell is the cost efficiency of its immediate detection. Where current methods of identifying tortoiseshell rely on visual expert opinion, burning the item (Brad Nahill, personal comm.) or destructive genetic testing (Jensen et al., 2019), SEE Shell does not require any physical sampling of collected shell material. Although monitors can be trained to expertly discern tortoiseshell from mimics, this takes time and effort, is susceptible to human error, and often requires funding. A CNN such as SEE Shell offers a less expensive and more efficient means of enabling identification. Given the demand for highly accurate deep learning models, we expect to continue to improve SEE Shell's performance and expand its associated data-collection capabilities for monitoring and reporting purposes. Specifically, our model's relative change in accuracy, precision, recall, and F1 between the validation and test set results can be explained by the nature of the two data sets. Although neither set was seen by the model during training, training data inherently resembles validation data, which makes validation results at least partly biased (James et al., 2013; Kuhn & Johnson, 2013). It is likely the preprocessing augmentations made to our initial image set, which was subdivided for training and validation, created ideal conditions for the model to perform optimally without any adjustments to prediction certainty thresholds. This lack of variability in our training data likely contributed to our slight loss in overall performance on the test set. Additionally, our test images were collected with minimal instructions given, which resulted in several images that were less than optimal for prediction (Figure 1.1, A-C). Some test images were of non-target items (e.g., cats, power tools, plants), which may have also lowered SEE Shell's performance. In a few cases test item images were indistinguishable even for a human (an author of this paper) due to poor lighting and focus (Figure 1.5).
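To make the certainty-threshold behaviour summarized in Table 1.1 concrete, the following minimal sketch shows one way predictions below a chosen confidence level could be routed to an "inconclusive" bucket before precision, recall, and F1 are computed on the remainder. The function name, variable names, and example threshold are illustrative assumptions, not the implementation used by SEE Shell.

```python
# Hedged sketch of threshold-based "inconclusive" filtering and metric computation.
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

def evaluate_at_threshold(p_real, y_true, threshold=0.90):
    """p_real: predicted probability that each item is real tortoiseshell (numpy array in [0, 1]).
    y_true: numpy array with 1 for real items and 0 for fake items.
    Items whose confidence in either class falls below `threshold` are flagged
    inconclusive and excluded from the precision/recall/F1 calculation."""
    confidence = np.maximum(p_real, 1.0 - p_real)   # certainty of whichever class was predicted
    conclusive = confidence >= threshold
    y_pred = (p_real >= 0.5).astype(int)
    return {
        "inconclusive": int((~conclusive).sum()),
        "precision": precision_score(y_true[conclusive], y_pred[conclusive]),
        "recall": recall_score(y_true[conclusive], y_pred[conclusive]),
        "f1": f1_score(y_true[conclusive], y_pred[conclusive]),
    }
```

Raising the threshold sets aside more images as inconclusive while the metrics computed on the remaining images improve, which is the trade-off reported in Table 1.1.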
To address this, we plan for future versions of our model to use an ensemble method, adding multiple models to enhance our application's output as implemented in other computer vision projects (Guo et al., 2020; Yang et al., 2021). Specifically, we believe a blur detector, a bounding box, and a target detector will enhance SEE Shell's precision and overall performance. Addition of a blur detector would filter images such as those seen in Figure 1.1 A. Inclusion of a bounding box would segment images and enable prediction on multiple items within the same photo, addressing images like Figure 1.1 B-C. Finally, inclusion of a target detector would simply filter out images of irrelevant items or full storefront photographs. Each of these would prompt the user to retake the photograph, reminding them of the underlying instructions. It is worth noting that our underlying architecture was a MobileNetV2, which is condensed for deployment to mobile devices. Although increasing the size of a neural network doesn't always improve performance (Malach & Shalev-Shwartz, 2019), it is possible that a larger neural network architecture may yield improved results given how complicated identifying tortoiseshell products has proven to be. Shallow architectures have proven useful on simple or well-constrained problems but can be limited when dealing with more complicated real-world visuals and scenes (Yoo, 2015). Use of other, larger architectures may lead to greater accuracy, but could limit portability, especially in areas where limited cellular service restricts access to server-side assets. As computational technologies advance, we recommend further exploration of new architectures and mobile deployment tools utilizing our dataset.

Figure 1.5 Example of exceptionally challenging test images.

Conclusions

Here, we presented our deep learning model, SEE Shell, which proved to be a viable approach for accurately and nondestructively identifying and detecting real and fake tortoiseshell products. We have deployed this through a mobile application, with the expectation that it will be used as an operational tool for monitoring hawksbill derived products in the field. Although we plan to continue to improve the performance of SEE Shell in the future, as a standalone mobile tool its current accuracy can benefit tortoiseshell monitoring efforts globally. Our SEE Shell application can act as a case study for other applications which seek to implement computer vision to combat the illegal wildlife trafficking of other threatened species.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M. & Kudlur, M. (2016). Tensorflow: A system for large-scale machine learning. 12th USENIX symposium on operating systems design and implementation, 265-283. https://doi.org/10.5281/zenodo.5949169

Aguirre, A.A., Gardner, S.C., Marsh, J.C., Delgado, S.G., Limpus, C.J. & Nichols, W.J. (2006). Hazards associated with the consumption of sea turtle meat and eggs: a review for health care workers and the general public. EcoHealth, 3(3), 141-153. https://doi.org/10.1007/s10393-006-0032-x

Alvarado-Díaz, J., Delgado-Trejo, C. & Suazo-Ortuño, I. (2001). Evaluation of the black turtle project in Michoacan, Mexico. Marine Turtle Newsletter, 92, 4-7. http://www.seaturtle.org/mtn/archives/mtn92/mtn92p4.shtml?nocount

Calmanovici, B., Waayers, D., Reisser, J., Clifton, J. & Proietti, M. (2018).
I3S Pattern as a mark-recapture tool to identify captured and free-swimming sea turtles: an assessment. Marine Ecology Progress Series, 589, 263-268. https://doi.org/10.3354/meps12483 Cheng, I.J., Cheng, W.H. & Chan, Y.T. (2018). Geographically close, yet so different: Contrasting long-term trends at two adjacent sea turtle nesting populations in Taiwan due to different anthropogenic effects. PLOS ONE, 13(7), e0200063. https://doi.org/10.1371/journal.pone.0200063 Davis, J. & Goadrich, M. (2006) The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd international conference on Machine learning, 233-240. https://doi.org/10.1145/1143844.1143874 De Zeeuw, P. M., Pauwels, E. J., Ranguelova, E. B., Buonantony, D. M., & Eckert, S. A., (2010). Computer assisted photo identification of Dermochelys coriacea. Proceedings of the International Conference on Pattern Recognition (ICPR), 165-172. Dutton, D.L., Dutton, P.H., Chaloupka, M. & Boulon, R.H. (2005). Increase of a Caribbean leatherback turtle Dermochelys coriacea nesting population linked to long-term nest 29 protection. Biological Conservation, 126(2), 186-194. https://doi.org/10.1016/j.biocon.2005.05.013 Frazier, J., Lutz, P., Musick, J., & Wyneaken, J. (2003). Prehistoric and ancient historic interactions between humans and marine turtles. In P.L Lutz, J. A. Musick & J. Wyneken, The Biology of Sea Turtles, Volume II, 1-38. https://doi.org/10.1201/9781420040807 Gatto, C.R., Rotger, A., Robinson, N.J. & Tomillo, P.S. (2018). A novel method for photo- identification of sea turtles using scale patterns on the front flippers. Journal of Experimental Marine Biology and Ecology, 506, 18-24. https://doi.org/10.1016/j.jembe.2018.05.007 Guo, P., Xue, Z., Mtema, Z., Yeates, K., Ginsburg, O., Demarco, M., Long, L.R., Schiffman, M. & Antani, S. (2020). Ensemble deep learning for cervix image selection toward improving reliability in automated cervical precancer screening. Diagnostics, 10(7), 451. https://doi.org/10.3390/diagnostics10070451 Hanna, M.E., Chandler, E.M., Semmens, B.X., Eguchi, T., Lemons, G.E. & Seminoff, J.A. (2021). Citizen-sourced sightings and underwater photography reveal novel insights about green sea turtle distribution and ecology in southern California. Frontiers in Marine Science, 8, 500. https://doi.org/10.3389/fmars.2021.671061 Harrison, E., von Weller, P., & Nahill, B. (2017). Endangered Souvenirs- Hawksbill Sea Turtle Products Sale in Latin America & the Caribbean. Oregon, USA: SEE Turtles, 1-35. https://www.seeturtles.org/s/Endangered-Souvenirs-Report-Final.pdf 30 Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv, 1704.04861. https://doi.org/10.48550/arXiv.1704.04861 James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013). An Introduction to Statistical Learning. New York: Springer, (112), 176. https://doi.org/10.1007/978-1-0716-1418-1 Jensen, M.P., LaCasella, E.L., Dutton, P.H. & Madden Hof, C.A. (2019). Cracking the Code: Developing a tortoiseshell DNA extraction and source detection method. Australia: World Wildlife Fund for Nature- Australia, 1-20. https://www.wwf.org.au/ArticleDocuments/353/pub-cracking-the-code-2019- 21aug19.pdf.aspx Joseph, J., Nishizawa, H., Alin, J.M., Othman, R., Jolis, G., Isnain, I. & Nais, J. (2019). Mass sea turtle slaughter at Pulau Tiga, Malaysia: Genetic studies indicate poaching locations and its potential effects. 
Global Ecology and Conservation, 17, e00586. https://doi.org/10.1016/j.gecco.2019.e00586 Kuhn, M. & Johnson, K. (2013). Applied Predictive Modeling. New York: Springer, (26), 67. https://doi.org/10.1007/978-1-4614-6849-3 LaCasella, E.L., Jensen, M.P., Madden Hof, C.A., Bell, I.P., Frey, A. & Dutton, P.H. (2021). Mitochondrial DNA profiling to combat the illegal trade in tortoiseshell products. Frontiers in Marine Science, 7, 1225. https://doi.org/10.3389/fmars.2020.595853 Malach, E., & Shalev-Shwartz, S. (2019). Is deeper better only when shallow is good?. Advances in Neural Information Processing Systems, 32, 6429-6438. https://doi.org/10.48550/arXiv.1903.03488 31 Mancini, A., Senko, J., Borquez-Reyes, R., P?o, J. G., Seminoff, J. A., & Koch, V. (2011). To poach or not to poach an endangered species: elucidating the economic and social drivers behind illegal sea turtle hunting in Baja California Sur, Mexico. Human Ecology, 39(6), 743-756. https://doi.org/10.1371/journal.pone.0001041 McDonald, D., Dutton, P., Brander, R. & Basford, S. (1996). Use of pineal spot (pink spot) photographs to identify leatherback turtles. Herpetological Review, 27(1), 11. Miller, E.A., McClenachan, L., Uni, Y., Phocas, G., Hagemann, M.E. & Van Houtan, K.S. (2019). The historical development of complex global trafficking networks for marine wildlife. Science Advances, 5(3), eaav5948. https://doi.org/10.1126/sciadv.aav5948 Nada, M. & Casale, P. (2011). Sea turtle bycatch and consumption in Egypt threatens Mediterranean turtle populations. Oryx, 45(1), 143-149. https://doi.org/10.1017/S0030605310001286 Nahill, B., von Weller, P., & Barrios-Garrido., H. (2020). The global tortoiseshell trade. Oregon, USA: SEE Turtles, 1-83. https://static1.squarespace.com/static/5369465be4b0507a1fd05af0/t/5f37089ddc88be5b0f ce18fe/1597442219875/Global+Tortoiseshell+Report.pdf Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera- trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), E5716-E5725. https://doi.org/10.1073/pnas.1719367115 Qui?ones, J., Quispe, S. & Galindo, O. (2017). Illegal capture and black market trade of sea turtles in Pisco, Peru: the never-ending story. Latin American Journal of Aquatic Research, 45(3), 615-621. http://dx.doi.org/10.3856/vol45-issue3-fulltext-11 32 Reisser, J., Proietti, M., Kinas, P. & Sazima, I. (2008). Photographic identification of sea turtles: method description and validation, with an estimation of tag loss. Endangered Species Research, 5(1), 73-82. https://doi.org/10.3354/esr00113 Rice, S.M. & Moore, M.K. (2008). Trade secrets: a ten year overview of the illegal import of sea turtle products into the United States. Marine Turtle Newsletter, (121), 1-5. http://hdl.handle.net/1834/30789 Rudrud, R.W. (2010). Forbidden sea turtles: Traditional laws pertaining to sea turtle consumption in Polynesia (Including the Polynesian Outliers). Conservation and Society, 8(1), 84-97. https://doi.org/10.4103/0972-4923.62669 Schuettpelz, E., Frandsen, P.B., Dikow, R.B., Brown, A., Orli, S., Peters, M., Metallo, A., Funk, V.A. & Dorr, L.J. (2017). Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal, 1(5). https://dx.doi.org/10.3897%2FBDJ.5.e21139 Schofield, G., Katselidis, K.A., Dimopoulos, P. & Pantis, J.D. (2008). 
Investigating the viability of photo-identification as an objective tool to study endangered sea turtle populations. Journal of Experimental Marine Biology and Ecology, 360(2), 103-108. https://doi.org/10.1016/j.jembe.2008.04.005 Senko, J., Mancini, A., Seminoff, J.A. & Koch, V. (2014). Bycatch and directed harvest drive high green turtle mortality at Baja California Sur, Mexico. Biological Conservation, 169, 24-30. https://doi.org/10.1016/j.biocon.2013.10.017 Sullivan, D.P., Winsnes, C.F., ?kesson, L., Hjelmare, M., Wiking, M., Schutten, R., Campbell, L., Leifsson, H., Rhodes, S., Nordgren, A. & Smith, K. (2018). Deep learning is 33 combined with massive-scale citizen science to improve large-scale image classification. Nature Biotechnology, 36(9), 820. https://doi.org/10.1038/nbt.4225 Yang, J., Yu, K., Gong, Y. & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. 2009 IEEE Conference on computer vision and pattern recognition (IEEE), 1794-1801. https://doi.org/10.1109/CVPR.2009.5206757 Yang, X., Zhang, Y., Lv, W. & Wang, D. (2021). Image recognition of wind turbine blade damage based on a deep learning model with transfer learning and an ensemble learning classifier. Renewable Energy, 163, 386-397. https://doi.org/10.1016/j.renene.2020.08.125 Yoo, H.J. (2015). Deep convolution neural networks in computer vision: a review. IEIE on Transactions on Smart Processing and Computing, 4(1), 35-43. https://doi.org/10.5573/IEIESPC.2015.4.1.035 Yu, X., Wang, J., Kays, R., Jansen, P.A., Wang, T. & Huang, T. (2013). Automated identification of animal species in camera trap images. Journal on Image and Video Processing, 52(2013), 1-10. https://doi.org/10.1186/1687-5281-2013-52 34 Chapter 2: Mapping Genetic Lineage through Morphology- A novel application of computer vision for morphological orientation from digitized field data Abstract 1. Recent research utilized geometric morphometrics, associated genetic data, and PCA to successfully delineate Chelonia mydas (green sea turtle) morphotypes from carapace measurements. Here we demonstrate a similar, yet more rapid approach to this analysis using computer vision models. 2. We applied a U-Net to isolate carapace pixels of (n = 204) of juvenile C. mydas from multiple foraging grounds across the Eastern Pacific, Western Pacific, and Western Atlantic. These images were then sorted based on general alignment (shape) and coloration of the pixels within the image using a pre-trained computer vision model (MobileNetV2). 3. The dimensions of these data were then reduced and projected using UMAP. Associated vectors were then compared to simple genetic distance using a Mantel test. Data points were then labeled post-hoc for exploratory analysis. 4. We found clear congruence between carapace morphology and genetic distance between haplotypes, suggesting that our image data have biological relevance. Our findings also suggest that carapace morphotype is associated with specific haplotypes within C. mydas. 5. Our cluster analysis (k = 3) corroborates past research which suggests there are at least three morphotypes from across the Eastern Pacific, Western Pacific, and Western Atlantic. 35 Introduction Species are considered cryptic if they are taxonomically classified with two or more species as a single nominal unit, where one species should actually be several. This typically occurs when distinct species, at least superficially, resemble one another morphologically (Bickford et al., 2007). 
This is frequently reflected early in the ontogenetic stages of many organisms (Heldstab et al., 2020; Chatterji et al., 2022). Recognition of cryptic species has grown exponentially since 1975, mostly due to the advent of advanced technologies like Polymerase Chain Reaction (Bickford et al., 2007). More recently, advanced genetic analyses have been applied to aid in the delineation of cryptic species such as those observed in Canis spp. (Wolves, Jackals and Dogs; Koepfli et al., 2015) or Macroclemys spp. (Alligator Snapping Turtles; Roman et al., 1999). Although useful as independent tools, recent efforts which have paired genetic analysis with machine learning have proven to reliably analyze trends (e.g. gene flow, morphological diversity) over spatial gradients (Howes et al., 2009; White et al., 2019). Unfortunately, many machine learning studies focus on simple target prediction with little evaluation of ecology or evolution (Lucas, 2020). One exception has been the application of machine learning for species delimitation using DNA sequences, as has occurred for insects, plants (Perez et al., 2022), arachnids (Derkarabetian et al., 2019; Derkarabetian et al., 2022) and North American Box turtles (Terrapene spp.; Martin et al., 2021). The delimitation of species has taxonomic and evolutionary implications for our understanding of speciation processes, and directly affects species? counts and biodiversity assessments, which can impact the conservation status of populations within a region (Funk et al., 2019; Saryan et al., 2020). 36 In the criteria for defining one species versus another, consistency is difficult to achieve (Mallet, 1995; Shanker, 2001). In recent years, researchers have begun to utilize the emerging tool of deep learning, a form of machine learning, to delineate phenotypic differences between image data sets and to distinguish between morphologically similar organisms (Schuettpelz et al., 2017; Earl et al., 2019; White et al., 2019). For example, Earl et al. (2019) were able to accurately (90%) differentiate between divergent genetic lineages (15 subgenus and 178 species) of Bombus (bumblebees) using images (n = 45,000) that may inherently represent morphological features. Similar model accuracy was achieved when machine learning was applied to herbarium specimens (Schuettpelz et al., 2017; White et al., 2019). More specifically, deep learning and image classification have been used to rapidly classify and distinguish key morphological features from similar images in several biological (Brosch et al., 2014; Hua et al., 2015) and ecological studies (Grinblat et al., 2016; Norouzzadeh et al., 2018; Younis et al., 2018). Past efforts to make consistent determinations for accurate classification of closely related species, or subspecies, is a long-standing topic of scientific debate within the sea turtle research community (Karl & Bowen, 1999). Sea turtles are particularly difficult to study due to their complex reproductive cycles, inaccessible localities, large home ranges, and generally enigmatic life histories (Reich et al., 2007; Kahn et al., 2016). Due to these complexities, sea turtles are considered one of the highest priority reptile taxa for the application of novel technologies (Komoroske et al., 2017). 
Although one of the most abundant sea turtle species, Chelonia mydas (green sea turtle) around the globe are of particular conservation interest, given the rapid degradation of their habitat and climate change-related decline observed in some populations (Jensen et al., 2018; Maurer et al., 2021). Over several decades, multiple studies have instigated a debate around the 37 taxonomy of Chelonia mydas and potential conspecifics (Baur, 1890; Carr, 1961; Caldwell, 1962; Kamezaki & Matsui, 1995; Dutton et al., 1996; Parham & Zug, 1996; Karl & Bowen, 1999; Pritchard, 1999; Bowen & Karl, 2000; Shanker, 2001; Okamoto & Kamezaki, 2014; ?lvarez-Varas et al., 2019). Whether ?split? or ?lumped,? the outcome of such determinations has clear financial and advocacy implications for C. mydas populations (Seminoff & Shanker, 2008). Until recently, much of the taxonomic focus has been on two morphotypes of C. mydas: a black morph with a conical carapace which is associated with the Eastern Pacific, and a light cream-yellow morph with an oval carapace which is believed to be distributed from the Atlantic to the Western Pacific (Parker et al., 2011; Z?rate et al., 2015). Chelonia mydas have demonstrated clear morphological variation (e.g., skull morphometrics, carapace scute arrangement, carapace length, flipper size) across their global distribution (Kamezaki & Matsui, 1995; Wyneken et al., 1999; Seminoff et al., 2015; S?nmez, 2019). Despite the clear phenotypic differences, early genetic research examining mitochondrial and nuclear phylogenetic positioning of the black morphotype dismissed the idea that C. mydas is anything but a single species based on a polyphyletic nature of its morphotypes (Karl & Bowen, 1999). These findings put application of genetic isolation and morphological analysis definitions for C. mydas at odds with one another. Although genetic data provide some resolution and morphology additional context, neither alone can be used to draw the line between sea turtle species (Seminoff & Shanker, 2008). Over the last two decades, research has brought to light new evidence which might help elucidate C. mydas and its conspecific taxonomic relation to one another. For example, a machine learning study using color histograms was able to accurately identify sea turtle species 38 based on carapace color (Paixao et al., 2018). Generally, turtle carapace shape is known to be heritable and associated with phylogeographic variation in tortoises (Chiari et al., 2009; Poulakakis et al., 2015), freshwater turtles (Lamb & Avise, 1992; Meyers et al., 2006; Rivera, 2008), and marine turtles (?lvarez-Varas et al., 2019). Contemporary geometric morphometric research suggests at least three morphotypes of C. mydas exist across the Eastern Pacific, Western Pacific and Atlantic (?lvarez-Varas et al., 2019). Additional research has demonstrated definitive allele frequency differences across nuclear markers, corresponding to differences between the Eastern and Western Pacific regional morphotypes (?lvarez-Varas et al., 2021). Past efforts to quantify sea turtle morphological variability among populations, by estimating morphospace occupation, were based on multiple carapace measurements using Principal Component Analysis (PCA), a dimensionality reduction technique (?lvarez-Varas et al., 2019). Although such data capture techniques are effective, the efficiency of machine learning using convolutional neural networks (CNNs) is more rapid and less prone to human error (Yildirim & Cinar, 2022). 
Additionally, the implementation of another dimensionality reduction tool, known as Uniform Manifold Approximation and Projection (UMAP; McInnes et al., 2018), in tandem with a CNN, has the capacity to outperform PCA (Yang et al., 2021; Chari et al., 2021). Past research has demonstrated that dimensionality reduction, when used in tandem with a machine learning model, can effectively delimitate species boundaries (Saryan et al., 2020). A recent debate has focused on the biological relevance of images, questioning the extent to which photographs can be used to delimitate species (Shatalkin & Galinskaya, 2017). Historically, there are examples of species that have been described purely by way of photographic evidence (Marshall & Evenhuis, 2015). These typeless species classifications are 39 entirely without physical voucher and are considered to be extremely questionable (Santos et al., 2016; Shatalkin & Galinskaya, 2017). Alternatively, there have been instances where image data has led to the direct discovery and eventual voucher of new species (Skejo & Caballero, 2016), and in some instances extinct species are only known to us by way of paintings or photographs (Amorim et al., 2016). Paintings and photographs have been used by ecologists for decades to study the natural world when physical vouchers were not possible (L?rig et al., 2021), and in some instances have been proven valid. For example, one study discovered the description of a black morphotype C. mydas present in Japanese waters, their unusual appearance being meticulously documented in a naturalist's drawing (Hayashi & Yasuda, 2022). This documented presence of the Eastern Pacific associated morphotype implies migration to the Western Pacific was occurring back in the Edo period (1600-1868; Hayashi & Yasuda, 2022), and still appears to occur today (Fukuoka et al., 2015). In order to answer the question of whether or not images can have biological relevance, congruence between potentially morphologically representative images and associated genetic data should be established. To streamline geometric morphometric analysis and examine the morphometric relationship of C. mydas and its conspecifics, we aim to estimate morphospace occupation for comparison of morphology and genetic distance, across populations using image data. This technique has previously been used to successfully evaluate diversity-disparity relationships within Polypodiopsida (ferns; White et al., 2019). By projecting morphological (e.g. turtle carapace) image data into a morphospace and confirming those embedding vectors are correlated to genetic distance we may be able to consistently quantify species occupation. Given that contemporary research has suggested that there is a connection, at the population level, between C. mydas carapace morphology and mitochondrial genetic lineages (?lvarez-Varas et 40 al., 2019), we would expect that a natural correlation exists between the genetic distance and a CNN-generated morphospace projection derived from associated image data. Utilizing mtDNA analysis in synthesis with deep machine learning tools, we hope to provide a case study for deploying CNNs to elucidate intraspecific phenotypic plasticity and natural selection across biogeographic scales. This research aims to use a pre-trained machine learning model to sort our carapace image data based on general alignment (shape) and coloration of the pixels within the image and compare it to associated genetic data. 
Methods and Experimental Design

Data Collection and Study Sites

Genetic samples and images of juvenile C. mydas (n = 204; Table 2.1) were captured during surveys at foraging sites located in the Southwestern Atlantic (Uruguay), Eastern Pacific (from north to south: Mexico, Costa Rica, Galápagos-Ecuador, Peru and Chile), and Australasia (Australia and New Zealand). Images were selected for analysis if they had an unobstructed dorsal view of the carapace scutes and associated mtDNA haplotype data.

Data Processing

To sort our carapace image data with a model that had not been trained on them, and to prevent background pixels from non-biological items interfering with carapace categorization, we first needed to eliminate the possible influence of background pixels on CNN performance. To achieve this, we trained a U-Net segmentation model (Ronneberger et al., 2015) to mask non-carapace pixels. The generated masks zero out (black) background pixels while retaining carapace pixels (Figure 2.1). Specifically, we manually masked a subset of images (n = 77) and used the methods of White et al. (2020) to develop a training set to build the U-Net segmentation model. The segmentation model was built on a ResNet-34 architecture pretrained on the ImageNet dataset (Deng et al., 2009). All images were then masked by our trained U-Net. All images were standardized in size (224 x 224 pixels).

Figure 2.1 Example of an original (left) and masked (right) image of a female juvenile C. mydas.

Semi-Supervised Classification and Dimensionality Reduction

For the second step in our analysis, we wanted to take our masked images and enable a CNN, pre-trained to identify basic shapes and colors, to sort our data. Using a MobileNetV2 CNN (Howard et al., 2017) pretrained on the ImageNet dataset (Deng et al., 2009), we passed our masked carapace images through the network weights, sorting images based on general alignment (shape) and coloration of the pixels within the image. The MobileNetV2 was not trained on any of our carapace image data, and no classification labels were used to help the CNN decide how to embed each carapace image. The MobileNetV2 was deployed on an Nvidia V100 GPU (32 GB VRAM) using the TensorFlow library (Abadi et al., 2016). To remove irrelevant features (noise) from our data and improve interpretability, dimensionality was reduced using the UMAP algorithm, and the low-dimensional embeddings were visualized. This semi-supervised approach has been shown to allow useful hypothesis-driven biological discovery by way of accurate latent space representation (Chari et al., 2021). The weighted MobileNetV2 output was then passed into a visualized projection space, generating reduced-dimension coordinates for each carapace image using UMAP with 9 nearest neighbors, chosen based on collection sample size (McInnes et al., 2018).
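The embedding and projection steps described above can be illustrated with a short Python sketch. This is a minimal outline only, assuming the images have already been masked and resized to 224 x 224, that the umap-learn package is available, and that an ImageNet-pretrained MobileNetV2 is used purely as a fixed feature extractor; the function and variable names are illustrative rather than taken from our code.

```python
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input
import umap

# ImageNet-pretrained MobileNetV2 with the classification head removed;
# global average pooling yields one 1280-dimensional vector per image.
feature_extractor = MobileNetV2(weights="imagenet", include_top=False,
                                pooling="avg", input_shape=(224, 224, 3))

def embed_images(masked_images: np.ndarray) -> np.ndarray:
    """masked_images: array of shape (n, 224, 224, 3) with background pixels zeroed."""
    x = preprocess_input(masked_images.astype("float32"))
    return feature_extractor.predict(x, verbose=0)

def project_embeddings(embeddings: np.ndarray, seed: int = 42) -> np.ndarray:
    """Reduce the embeddings to a 2-D morphospace using 9 nearest neighbors."""
    reducer = umap.UMAP(n_neighbors=9, n_components=2,
                        metric="euclidean", random_state=seed)
    return reducer.fit_transform(embeddings)
```

Because the network is used only for inference, no labels or fine-tuning are involved; the resulting two-dimensional coordinates correspond to the morphospace projections visualized below.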
Genetic Analysis

Haplotypes associated with carapace images were approximately 770 base pairs (bp) of the mitochondrial control region, generated by Álvarez-Varas et al. (2020) and Jensen et al. (2018). Simple genetic distance was calculated as the nucleotide difference between haplotypes in Geneious Prime 2022.1.1 (https://www.geneious.com). To label the haplotypes post-hoc, we assigned clades based primarily on the nomenclature of Jensen et al. (2019), except for three clustered haplotypes (CmP97.1, CmP109.1, CmP207.1) which were not part of that phylogenetic analysis. Alternatively, Naro-Maciel et al. (2014) had found this unnamed haplotypic cluster, tied to the Line Islands in the Polynesian Triangle, aligned phylogenetically between the Eastern Pacific (Clade IX) and the Australasia Clades (III-V) according to the nomenclature used in Jensen et al. (2019). Uncertain of which clade these haplotypes belong in, we labeled them as Clade VS for identification purposes.

Table 2.1 Metadata for C. mydas carapace image data and associated haplotypes. Includes location of sample collection, number of images used for this study, haplotypes found at each location (Álvarez-Varas et al., 2020), and reference for data capture method.

Analysis and Visualization

Due to the need to control for the inherent size differences between male and female turtles (Klingenberg, 2016; Álvarez-Varas et al., 2019), we took a subset of our juvenile turtle data (n = 77), which were sexed by a combination of probing for gonads and hormone testing (Jensen et al., 2018). We separately projected these masked carapace images using UMAP. We then compared the slopes of the regression lines for each sex using an ANCOVA in R. Euclidean vectors between carapace image points, projected using UMAP, were extracted, and centroids were calculated based on haplotype grouping. Centroid vectors were formed into a distance matrix and compared (α = 0.05) to an associated genetic distance matrix (Appendix I) using a two-tailed Mantel test with the package Mantel (Carr, 2021) in Python (Van Rossum & Drake, 2009). The Mantel test was repeated with haplotypes of low sample size (n ≤ 2) removed from the analysis to ensure accurate centroid triangulation. Centroids were calculated and visualized based on associated grouping and, separately, collection area. Grouping centroids were labeled based on the ocean basin associated with the majority of data point haplotypes within the cluster. A validity index analysis was conducted in R (R Core Team, 2021) using the package "fpc" (Hennig, 2020) to substantiate the n-nearest neighbors (N. Neighbors = 9, 10, 30, and 77, respectively) used and the groupings observed in the projection. All morphospace projections were then visualized and labeled with metadata post-hoc for exploratory analysis.
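The centroid-versus-genetic-distance comparison described above can be sketched as follows. This is an illustrative outline only: it assumes the 2-D UMAP coordinates and per-image haplotype labels from the previous step, a dictionary of pairwise nucleotide differences exported from Geneious, and the Mantel package cited above (Carr, 2021), whose mantel.test interface and return values are assumed here; all names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
import mantel  # Carr (2021), https://github.com/jwcarr/mantel

def haplotype_centroids(coords, haplotypes):
    """Mean UMAP position of all carapace images sharing a haplotype."""
    labels = sorted(set(haplotypes))
    haps = np.asarray(haplotypes)
    centroids = np.array([coords[haps == h].mean(axis=0) for h in labels])
    return labels, centroids

def morphology_vs_genetics(coords, haplotypes, genetic_dist):
    """genetic_dist[(h1, h2)]: nucleotide difference between two haplotypes."""
    labels, centroids = haplotype_centroids(coords, haplotypes)
    morph_dm = squareform(pdist(centroids, metric="euclidean"))
    gen_dm = np.array([[0.0 if a == b else genetic_dist[tuple(sorted((a, b)))]
                        for b in labels] for a in labels])
    # Two-tailed Mantel test with 10,000 permutations (assumed interface);
    # returns the correlation, permutation p-value and z-score.
    r, p, z = mantel.test(morph_dm, gen_dm, perms=10000,
                          method="pearson", tail="two-tail")
    return r, p, z
```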
Results

All (n = 204) juvenile carapace images were successfully masked and projected using UMAP (Figures 2.1-2.6; Appendix II). Comparison of male (R² = 0.274) and female (R² = 0.196) projection regressions (Figure 2.2) revealed no difference between the slopes of the regression lines (F(1,73) = 0.16, p = 0.68). Cluster validity indices unanimously suggested that the data contained three clusters (k = 3). Mantel test results indicated a significant correlation (p = 0.0002, Z = 4.69) between the haplotypic distances and the projected morphological centroid distances. The same result was found (p = 0.0001, Z = 10.265) when small samples (n ≤ 2) were removed (Figure 2.5).

Figure 2.2 Euclidean UMAP projection (N. Neighbors = 9) and regression of male (n = 25) and female (n = 52) juvenile Chelonia mydas carapace images from a subset of the data.

Figure 2.3 Euclidean UMAP projection (N. Neighbors = 9) of Chelonia mydas carapace images (n = 204) embedded using a MobileNetV2 neural network. Data were labeled post-hoc by Clade. Number of samples by Clade: II, n = 10; III, n = 8; IV, n = 61; V, n = 33; VS, n = 14; IX, n = 78. Cladistic nomenclature based on the alignments of Naro-Maciel et al. (2014) and Jensen et al. (2019).

Figure 2.4 Euclidean space UMAP projection (N. Neighbors = 9) of Chelonia mydas carapace image haplotype centroids embedded using a MobileNetV2 neural network. Centroid shape indicates phylogenetic Clade ("Circle" = II, "Star" = III, "X" = IV, "Diamond" = V, "Plus" = VS, "Square" = IX) based on the alignments of Naro-Maciel et al. (2014) and Jensen et al. (2019). Point size indicates relative sample size.

Figure 2.5 Euclidean space UMAP projection (N. Neighbors = 9) of Chelonia mydas carapace image centroids (n = 204) embedded using a MobileNetV2 neural network. Centroids were calculated based on the area of collection. Centroid shape indicates ocean basin ("Circle" = Atlantic, "Diamond" = Western Pacific, "X" = Central Pacific, "Square" = Eastern Pacific).

Figure 2.6 Centroid Chelonia mydas carapace UMAP projection. Colored circles represent centroid embeddings; adjacent visual examples were selected based on their proximity to the centroid.

Discussion

Our results found three separate clusters (k = 3) associated with our CNN-sorted carapace image data, indicating that there are carapace shape and coloration differences between turtles from the Atlantic and Pacific Oceans and among individuals within the Pacific Ocean (Figure 2.6). Results from our Mantel test substantiate the genetic congruence between carapace shape and approximate genetic origin, while also demonstrating the biological relevance of such an image analysis using a CNN. Specifically, our semi-supervised CNN model sorted masked carapace images based on general alignment (shape) and coloration of the pixels within the image without prior training on our data. Although three distinct clusters were observed and verified through our validity index analysis, with the additional context of associated data labels we can clearly visualize the congruence observed between morphology and genetic distance from our Mantel test (Figures 2.3 & 2.4). This sorting was, at least in part, reflective of the genetic signal phenotypically held within the image. Based on our findings, our morphological data do group within phylogenetic alignments, while also appearing to contextualize Clade VS's previously uncertain positioning. For example, the two studies which did phylogenetically place CmP97.1 did so within the same relative space, but with different relative relationships. Specifically, Naro-Maciel et al. (2014) examined the phylogenetics of C. mydas at Palmyra Atoll relative to the Eastern and Western Pacific and found that haplotype CmP97.1 was most closely neighbored to a cluster that includes CmP47.1, which in turn aligned closer to CmP20.1 and CmP22.1, respectively. Similarly, another study examined French Polynesian nesting C. mydas in the context of the Western Pacific and Indo-Pacific and found that CmP97.1 aligned between CmP47.1 and the second cluster of CmP20.1 and CmP22.1 (Boissin et al., 2019). According to Jensen et al. (2019), CmP47.1 belongs to Clade IV and is associated with the South Western Pacific, while CmP20.1 and CmP22.1 are aligned within Clade III in the Central West and Central South Pacific, respectively. Applying the context of our results (Figure 2.4) suggests, at least in terms of morphological distance, that CmP97.1 aligns closest with CmP22.1, while CmP47.1 aligns closer to CmP20.1. This means that, based on our results, CmP97.1 would more likely align with Clade III, and closest with the Central South Pacific evolutionarily distinct region (Jensen et al., 2019).
Considering the locations CmP97.1 was found (Rapa Nui, New Zealand and Costa Rica) this alignment agrees with the proximity of these locations to the Central South Pacific, than to the other two evolutionary distinct regions defined by Jensen et al. (2019). Additionally, network analysis on telemetry data for C. mydas revealed the clear connectivity between the Eastern and Central South Pacific (Kot et al., 2022). Given UMAPs ability to maintain local signal integrity within a morphospace (Chari et al., 2021), and the observed congruence with genetic distance within this study, we believe relative distances between our data points can offer an added level of resolution to taxonomic and phylogenetic efforts. Within the context of our study, which examined the morphology of foraging juvenile C. mydas collected across a geographic gradient, we observed several patterns. For example, clusters appeared to be made up almost entirely of haplotypes from within the same ocean basin. This also is reflected in the calculated centroid position based on collection area, where we observe a clear division of ocean basins, with Rapa Nui as a central point between Eastern and Western Pacific (Figure 2.5). Second, we observe a genetic partitioning (Figure 2.3), which is likely explained by the strong natal-rookery homing observed in C. mydas (Bowen & Karl, 2007; Dutton et al., 2014). This is also reflected in our Mantel test results. We believe there are a few potential explanations for the observed congruence between mtDNA and carapace shape. Specifically, we believe the origin of this signal could be an artifact from natural nuclear genetic exchange, or lack thereof, between populations that has resulted in an observable reflection within the mtDNA (Carreras et al., 2007). Despite their low mutation and evolution rates (Avise et al., 1992; Lee et al., 2020), multiple studies have identified a clear genetic distinction between several C. mydas populations, including between the Atlantic and 52 Pacific, using nuclear and mitochondrial markers (Roberts et al., 2004; Okamoto & Kamezaki, 2014; Jensen et al., 2019; ?lvarez-Varas et al., 2021). Future research should evaluate whether the same relationship between carapace morphology and nuclear genetic distance exists. An alternative explanation may be related to mtDNA's environmental selection and its physiological effects on turtle carapace composition. For aquatic turtles, studies have found that environmental factors like hydrology, thermal and chemical composition can impact turtle carapace shape (Rivera et al., 2014; Nagle et al., 2018). Additionally, there are clear deleterious effects of hypoxia by way of slowed mineralization and decreased material strength of shell composition (Jackson et al., 2000; Odegard et al., 2018). Chelonia mydas are specially adapted for long anaerobic dives, which require specialized ATP-yielding capabilities to help avoid these effects (Hochachka & Storey, 1975). Compared to other sea turtles like Caretta caretta (loggerhead; 60.3 Torr), C. mydas has a markedly higher oxygen affinity (32.6 Torr), this may generally expose them to greater rates of tissue hypoxia (Isaacks et al., 1978; Woods et al., 1984; Jackson, 1985) which could increase the influence this might have as a selection pressure relative to other sea turtle species. 
Additionally, substantial evidence suggests that the physical intensity that comes with avoiding hypoxia or complete anoxia has put sea turtle mitogenomes under considerable selection pressure (Norman et al., 1994; Ramos et al., 2020). Research has found mitochondrial selection specifically associated with aerobic respiration and shell composition in aquatic turtles (Escalona et al., 2017). We believe the genetic and geographic fidelity we observed in our results (Figures 2.3 ? 2.6) amongst C. mydas morphotypes may be driven by habitat-specific differences impacting selection at the mitochondrial and metabolic level. For example, within the Eastern Pacific, dissolved oxygen content and temperature are naturally lower than they are in the Atlantic (Dunbar et al., 1994; Helly & Levin, 2004; Breitburg 53 et al., 2018; Gr?goire et al., 2021). Generally, C. mydas seek out habitats with higher dissolved oxygen and warmer temperatures due to greater food availability associated with those areas (Hays et al., 2000; Attum & Rabia, 2021). Studies have shown that Atlantic C. mydas appear to outright avoid coastal hypoxic events (Craig et al., 2001) which resemble the habitat conditions (O2 < 62 ?mol kg?1) their Eastern Pacific counterparts live in persistently (Breitburg et al., 2018; Gr?goire et al., 2021). The potential influence dissolved oxygen and temperature may have on morphologic variability may be reflected in foraging behavior differences between C. mydas morphotypes across and within ocean basins. For example, Eastern Pacific C. mydas appear to quickly traverse long distances for coastal foraging (Seminoff et al., 2002; Seminoff & Jones, 2006; Blanco et al., 2012), where their Atlantic counterparts are less inclined to do so (Godley et al., 2003; Makowski et al., 2006; Schultz, 2016; Doherty et al., 2020). This is validated by the observation that C. mydas that consume non-sedentary prey (e.g. free floating algae, crustaceans, cnidarians), like those in the Eastern Pacific often do (Carri?n-Cortez et al., 2010; Parker et al., 2011; Tomaszewicz et al., 2018), are more likely to traverse long distances when compared to those feeding on seagrass in the Atlantic (Godley et al., 2003). Studies examining diel movements of foraging Eastern Pacific C. mydas found that daily foraging activity appears to be substantially higher (8.2 km) than that of their Atlantic (1.2 - 4.1 km) and Western Pacific (0.9 - 4.9 km) conspecifics (Mendon?a, 1983; Whiting & Miller, 1998; Seminoff & Jones, 2006). Lower water temperatures, like those experienced in the Eastern Pacific, typically result in increased dive duration and frequency of deep dives for C. mydas (Dunbar et al., 1994; Enstipp et al., 2011; Enstipp et al., 2016; Madrak et al., 2022). Additional evidence suggests that the lower temperatures at depth may enable Pacific C. mydas to dive 54 deeper than their Atlantic counterparts (Hatase et al., 2006). Even within the Atlantic, comparisons have demonstrated that C. mydas in the lower dissolved-oxygen waters (Breitburg et al., 2018; Gr?goire et al., 2021) near Ascension Island dive deeper and stay at shallow depths less frequently than their Mediterranean counterparts, with the latter experiencing easier foraging (Hays et al., 2002). This difference in foraging strategy may comparatively be reflected in the particularly slow growth rates experienced by the black morphotype when compared to their Atlantic counterparts (Labrada-Martag?n et al., 2017). If C. 
mydas in the Eastern Pacific have adapted to more frequent and longer deep dives due to lower temperatures and decreased food availability, we can reasonably expect metabolic adaptations. These adaptations at the mitochondrial level, likely in conjunction with nuclear genes, could be reflected in changes to shell morphology and composition, given the association between hypoxia avoidance and carapace composition (Norman et al., 1994; Escalona et al., 2017; Ramos et al., 2020). Additional evidence for a relationship between environmental conditions and carapace morphology can be found by looking within the Eastern Pacific where both yellow and black morphotypes have been observed foraging (Amorocho et al., 2012; Sampson et al., 2014; Z?rate et al., 2015; Sampson et al., 2018), with markedly fewer yellow turtles than black in the region (Amorocho et al., 2012; Sampson et al., 2014; Z?rate et al., 2015; Sampson et al., 2018). Genetic evidence suggests that yellow C. mydas found in the Eastern Pacific are of Western Pacific origin (Amorocho et al., 2012). As such, studies have found that yellow turtles remain in Eastern Pacific foraging grounds for shorter periods of time than black turtles (Sampson et al., 2014). Additionally, Eastern Pacific C. mydas demonstrate morphotype-specific life histories, which appear to be related to resource usage (Sampson et al., 2014; Seminoff et al., 2021). For example, stable isotope analysis in the Galapagos indicated that black C. mydas may be more 55 likely to forage in deeper offshore sites than yellow turtles within the same habitat (Z?rate et al., 2012). Another study found that yellow C. mydas foraging off the Pacific coast of Costa Rica had diets akin to the neritic Eretmochelys imbricata (hawksbill) rather than their black counterparts (Clyde-Brockway, 2019). Similarly, another study found that within one Eastern Pacific foraging area, C. mydas spent more time swimming and diving, and are more active at depth then they are in other parts of the world (Seminoff et al., 2021). This may be indicative of differing morphotype-specific foraging strategies (Z?rate et al., 2012; Sampson et al., 2014; Seminoff et al., 2021), which could demonstrate an adaptation by Eastern Pacific C. mydas. Based on the body of evidence put forth here, we would expect yellow turtles to be less habituated to the lower temperature and dissolved oxygen conditions within the Eastern Pacific than their black counterparts, yet more so than Atlantic turtles. A similar Atlantic-Eastern Pacific difference has been observed in the deep diving Dermochelys coriacea (leatherback sea turtle), with Eastern Pacific populations experiencing similar foraging difficulties as C. mydas (Seminoff & Jones, 2006; Bailey et al., 2012). It is also worth noting the exceptionally hydrodynamic morphological (Seminoff et al., 2012; Bang et al., 2016) similarities between D. coriacea and the black C. mydas morphotype. Both turtles have an exceptionally conical or tear-drop shape carapace and black coloration (Parker et al., 2011; Seminoff et al., 2012). These similarities could be a sign of morphological convergence and may indicate that the black C. mydas are transitioning into a deeper diving species. Additional evidence for this environmentally driven morphology change come from genetic research which indicates C. mydas nuclear markers are under clear morphotype-specific selection at genes associated with hypoxia management, melanin, UV regulation and thermoregulation (?lvarez-Varas et al., 2021). 
Considering these findings in light of our 56 observations, the mitochondrial signal might actually be reflective of true physiological processes at play, rather than a secondary genetic signal. We recommend future research into the dive behavior and physiology of the black-morphotype, with consistent reporting throughout the Eastern Pacific with comparison to Atlantic and Western Pacific individuals. We do consider differences in nesting ecology and ontogenetic changes as a potential driver for our observed genetic signal. Specifically, there are clear differences in nesting conditions (Rubinoff, 1968) and success between the Eastern Pacific and Atlantic sea turtles (Pike, 2014). In C. mydas it seems higher nesting temperatures, especially in dry conditions, increase the likelihood of anomalous shell scute pattern in hatchlings (Zimm et al., 2017). Given the Eastern Pacific generally experiences greater variability (18 ?C) in temperatures than the Western Atlantic (6 ?C; Rubinoff, 1968), we might expect increased exposure to extreme temperatures to impact the rate at which anomalous shell patterns appear. Preliminary results from lab experiments suggest that hatchling phenotype (e.g carapace size, hatchling mass, swim thrust, stroke rate) is likely influenced by both maternal origin and nesting temperature (Booth et al., 2013). We recommend future research to compare nesting conditions, inter-nesting temperature, and its impacts on hatchling morphology across C. mydas populations from different ocean basins. Regardless of signal origin, our results clearly imply that the images themselves have real-world value and carry at least some of the necessary biological information needed to conduct an objective study of species morphology and population genetics using a CNN. Implementing a CNN on images of genetically associated morphological traits, across turtles and other taxa may enable a more consistent method of delineating species-subspecies boundaries and aid in phylogenetic alignment. We believe our results reflect actual biological processes and 57 set the stage for the development of CNNs for genetic inference. Given the congruence between C. mydas mtDNA and carapace shape, future development of computer vision models trained specifically on carapace image classes associated with specific haplotypes may be able to assist conservation professionals in the field. Specifically, a computer vision application that can accurately identify a turtle's haplotype or clade from just a single captured image, may be possible. Another novel insight which was not a part of the original intent of this study was identification of individual turtles at foraging grounds, whose mtDNA signature likely belongs to a distant rookery. Although relatively few, there were outliers beyond their clade or expected grouping. Whether these are simply morphological outliers or instances of corrupted pixel arrangement (e.g. severe presence of barnacles, algae, misaligned scutes, mismasked pixels, etc.), we are not certain. Although our technique does account for pixel arrangement (shape) and color, we do not currently have a scale-specific understanding of each axis within our morphospace. To quantify the axis of latent morphospace projections going forward, we recommend the implementation of Generative Adversarial Networks (GAN), a type of CNN based architecture (Goodfellow et al., 2021). GANs can generate detailed composite images given a training set (Do et al., 2018). 
Within a latent morphospace projection, which is now a quantified pixel space, you can take a centroid of data points at your axis extrema and compare them. The output image from each of these should return a relative sense of what shifts along the axis should produce in terms of morphological differences. Our semi-supervised image analysis is likely at least in part independent of size. Specifically, we did not observe any of the documented size differences (Klingenberg, 2016; ?lvarez-Varas et al., 2019) between males and females in the regression analysis (Figure 2.2). 58 This might be due to some plasticity in photo quality. For example, small juvenile turtles could be photographed up close, making them appear larger than they are when compared to their counterparts, and vice versa. Additionally, a lack of size reference due to masking might also control for size differences. Without a consistent object (e.g., tape measure or ruler) included in the images, there is no sense of scale for the model to reference in sorting other than what information is available. In our case, this would be limited to the carapace and the relative size and coloration of the scutes to one another. Further investigation is needed to determine the extent to which pixel orientation (object shape) and coloration within a CNN effect size signals in such an analysis. For example, future studies may want to analyze the impact that carapace orientation or lighting has on embedding position. The overwhelming majority of our carapace images were oriented in the same dorsal direction under day light conditions (Figure 2.1). Although we would expect the dimensionality reduction to control for signal noise (Chari et al., 2021), we cannot be certain that our results are not at least in part influenced by small differences in carapace orientation or residual coloration alteration due to differences in lighting. Although we observed a clear genetic signal from our data, there is variability within each grouping. Inter-group carapace variability may be an artifact of underrepresentation due to sample size or a lack of control over the specific age of juveniles. Specifically, research has illuminated age-specific differences in juvenile carapace shell shape, with a clear ?widening,? and in turn ?slimming? throughout this life stage (Salmon et al., 2018). Given the stark contrast of our groupings we wouldn?t expect such changes to dramatically augment image position within the morphological projection, however such variability was not accounted for in this study and should be researched further. 59 Conclusions Our CNN-sorted carapace image results found three separate clusters (k = 3) indicating there are carapace shape and coloration differences between turtles from the Atlantic and Pacific Oceans and among individuals within the Pacific Ocean, corroborating the findings of ?lvarez- Varas et al. (2019). Our Mantel test results demonstrate the biological relevance of image analysis using a CNN and substantiate the genetic congruence between carapace shape and approximate genetic origin. Given the clear genetic congruence between carapace shape and haplotype, we believe future efforts should attempt to develop computer vision techniques which can utilize this connection to accurately assign genetic origin to individual turtles based on carapace images. 
Future exploration of the intersection of morphology and genetics, especially within the context of taxonomic delineation, should utilize CNNs to aid in quantifying key phenotypic traits. We believe research put forth in this manuscript demonstrates the efficacy of utilizing such a pipeline for holistically analyzing evolutionary trends at genetic and biogeographic scales. Specifically, we have put forth evidence which suggests CNNs can accurately and rapidly mimic the outputs of other dimensionality reduction techniques, improving phylogenetic articulation of potentially cryptic species. References Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M. & Kudlur, M. (2016). Tensorflow: A system for large-scale machine learning. 12th USENIX symposium on operating systems design and implementation, 265-283. https://doi.org/10.5281/zenodo.5949169 60 ?lvarez-Varas, R., Contardo, J., Heidemeyer, M., Forero-Rozo, L., Brito, B., Cort?s, V., Brain, M.J., Pereira, S. & Vianna, J.A. (2017). Ecology, health and genetic characterization of the southernmost green turtle (Chelonia mydas) aggregation in the Eastern Pacific: implications for local conservation strategies. Latin American Journal of Aquatic Research, 45(3), 540-554. http://dx.doi.org/10.3856/vol45-issue3-fulltext-4 ?lvarez-Varas, R., Heidemeyer, M., Riginos, C., Ben?tez, H.A., Res?ndiz, E., Lara-Uc, M., Godoy, D.A., Mu?oz-P?rez, J.P., Alarc?n-Ruales, D.E., V?lez-Rubio, G.M. & Fallabrino, A. (2020). Integrating morphological and genetic data at different spatial scales in a cosmopolitan marine turtle species: challenges for management and conservation. Zoological Journal of the Linnean Society, 191(2), 434-453. https://doi.org/10.1093/zoolinnean/zlaa066 ?lvarez-Varas, R., Rojas-Hern?ndez, N., Heidemeyer, M., Riginos, C., Ben?tez, H.A., Araya- Donoso, R., Res?ndiz, E., Lara-Uc, M., Godoy, D.A., Mu?oz-P?rez, J.P. and Alarc?n- Ruales, D.E. (2021). Green, yellow or black? Genetic differentiation and adaptation signatures in a highly migratory marine turtle. Proceedings of the Royal Society B: Biological Science, 288(1954), 20210754. https://doi.org/10.1098/rspb.2021.0754 ?lvarez-Varas, R., V?liz, D., V?lez-Rubio, G.M., Fallabrino, A., Z?rate, P., Heidemeyer, M., Godoy, D.A. & Ben?tez, H.A. (2019). Identifying genetic lineages through shape: An example in a cosmopolitan marine turtle species using geometric morphometrics. PLOS ONE, 14(10), e0223587. Amorim, D.S., Santos, C.M.D., Krell, F.T., Dubois, A., Nihei, S.S., Oliveira, O.M., Pont, A., Song, H., Verdade, V.K., Fachin, D.A. & Klassa, B. (2016). Timeless standards for 61 species delimitation. Zootaxa, 4137(1), 121-128. http://doi.org/10.11646/zootaxa.4137.1.9 Amorocho, D. F., Abreu-Grobois, F. A., Dutton, P. H., & Reina, R. D. (2012). Multiple distant origins for green sea turtles aggregating off Gorgona Island in the Colombian Eastern Pacific. PLOS ONE, 7(2), e31486. https://doi.org/10.1371/journal.pone.0031486 Attum, O., & Rabia, B. (2021). Green (Chelonia mydas) and loggerhead (Caretta caretta) habitat use of the most environmentally extreme sea turtle feeding ground in the Mediterranean basin. Journal of Coastal Conservation, 25(1), 1-7. https://doi.org/10.1007/s11852-020- 00793-1 Avise, J.C., Bowen, B.W., Lamb, T., Meylan, A.B. & Bermingham, E. (1992). Mitochondrial DNA evolution at a turtle's pace: evidence for low genetic variability and reduced microevolutionary rate in the Testudines. 
Molecular Biology and Evolution, 9(3), 457- 473. https://doi.org/10.1093/oxfordjournals.molbev.a040735 Bailey, H., Fossette, S., Bograd, S.J., Shillinger, G.L., Swithenbank, A.M., Georges, J.Y., Gaspar, P., Str?mberg, K.P., Paladino, F.V., Spotila, J.R. & Block, B.A. (2012). Movement patterns for a critically endangered species, the leatherback turtle (Dermochelys coriacea), linked to foraging success and population status. PLOS ONE, 7(5), e36401. https://doi.org/10.1371/journal.pone.0036401 Bang, K., Kim, J., Lee, S. I., & Choi, H. (2016). Hydrodynamic role of longitudinal dorsal ridges in a leatherback turtle swimming. Scientific reports, 6(1), 1-10. https://doi.org/10.1038/srep34283 Baur, G. (1890). The genera of the Cheloniidae. American Naturalist, 1890, 486-487. 62 Bickford, D., Lohman, D.J., Sodhi, N.S., Ng, P.K., Meier, R., Winker, K., Ingram, K.K. & Das, I., (2007). Cryptic species as a window on diversity and conservation. Trends in Ecology & Evolution, 22(3), 148-155. Blanco, G. S., Morreale, S. J., Bailey, H., Seminoff, J. A., Paladino, F. V., & Spotila, J. R. (2012). Post-nesting movements and feeding grounds of a resident East Pacific green turtle Chelonia mydas population from Costa Rica. Endangered Species Research, 18(3), 233-245. https://doi.org/10.3354/esr00451 Boissin, E., Neglia, V., Boulet Colomb D?hauteserre, F., Tatarata, M., & Planes, S. (2019). Evolutionary history of green turtle populations, Chelonia mydas, from French Polynesia highlights the putative existence of a glacial refugium. Marine Biodiversity, 49(6), 2725- 2733. https://doi.org/10.1007/s12526-019-01001-6 Booth, D. T., Feeney, R., & Shibata, Y. (2013). Nest and maternal origin can influence morphology and locomotor performance of hatchling green turtles (Chelonia mydas) incubated in field nests. Marine Biology, 160(1), 127-137. https://doi.org/10.1007/s00227-012-2070-y Bowen, B.W., & Karl, S.A. (2000). Meeting report: taxonomic status of the East Pacific green turtle (Chelonia agassizii). Marine Turtle Newsletter, 89, 20-22. http://www.seaturtle.org/mtn/archives/mtn89/mtn89p20.shtml?nocount Bowen, B. W., & Karl, S. A. (2007). Population genetics and phylogeography of sea turtles. Molecular Ecology, 16(23), 4886-4907. https://doi.org/10.1111/j.1365- 294X.2007.03542.x Breitburg, D., Levin, L.A., Oschlies, A., Gr?goire, M., Chavez, F.P., Conley, D.J., Gar?on, V., Gilbert, D., Guti?rrez, D., Isensee, K. & Jacinto, G.S. (2018). Declining oxygen in the 63 global ocean and coastal waters. Science, 359(6371), eaam7240. https://doi.org/10.1126/science.aam7240 Brosch, T., Yoo, Y., Li, D.K., Traboulsee, A. & Tam, R. (2014). Modeling the variability in brain morphology and lesion distribution in multiple sclerosis by deep learning. International Conference on Medical Image Computing and Computer-Assisted Intervention, 8674, 462-469. https://doi.org/10.1007/978-3-319-10470-6_5 Caldwell, D. K. (1962). Sea turtles in Baja Californian waters (with special reference to those of the Gulf of California), and the description of a new sub- species of northeastern Pacific green turtle. Los Angeles County Museum Contributions in Science, (61), 3-31. https://iucn-tftsg.org/wp-content/uploads/file/Articles/Caldwell_1962.pdf Carr, A. (1961). Pacific turtle problem. Natural History, 70, 64-71. Carr, J. W. (2021). Jwcarr/mantel: Python implementation of the mantel test, a significance test of the correlation between two distance matrices. 
GitHub, Retrieved March 15th, 2022, from https://github.com/jwcarr/mantel#readme Carreras, C., Pascual, M., Cardona, L., Aguilar, A., Margaritoulis, D., Rees, A., Turkozan, O., Levy, Y., Gasith, A., Aureggi, M. & Khalil, M. (2007). The genetic structure of the loggerhead sea turtle (Caretta caretta) in the Mediterranean as revealed by nuclear and mitochondrial DNA and its conservation implications. Conservation Genetics, 8(4), 761- 775. https://doi.org/10.1007/s10592-006-9224-8 Carri?n-Cortez, J. A., Z?rate, P., & Seminoff, J. A. (2010). Feeding ecology of the green sea turtle (Chelonia mydas) in the Galapagos Islands. Journal of the Marine Biological Association of the United Kingdom, 90(5), 1005-1013. https://doi.org/10.1017/S0025315410000226 64 Chari, T., Banerjee, J. & Pachter, L. (2021). The specious art of single-cell genomics. bioRxiv, 1- 25. https://doi.org/10.1101/2021.08.25.457696 Chatterji, R. M., Hipsley, C. A., Sherratt, E., Hutchinson, M. N., & Jones, M. E. (2022). Ontogenetic allometry underlies trophic diversity in sea turtles (Chelonioidea). Evolutionary Ecology, 1-30. https://doi.org/10.1007/s10682-022-10162-z Chiari, Y., Hyseni, C., Fritts, T.H., Glaberman, S., Marquez, C., Gibbs, J.P., Claude, J. & Caccone, A. (2009). Morphometrics parallel genetics in a newly discovered and endangered taxon of Gal?pagos tortoise. PLOS ONE, 4(7), e6272. https://doi.org/10.1371/journal.pone.0006272 Clyde-Brockway, C. E. (2019). Foraging Ecology and Stress in Sea Turtles (Doctoral Dissertation) Purdue University Graduate School, West Lafayette, IN. Craig, J. K., Crowder, L. B., Gray, C. D., McDaniel, C. J., Kenwood, T. A., & Hanifen, J. G. (2001). Ecological Effects of Hypoxia on Fish, Sea Turtles, and Marine Mammals in the Northwestern Gulf of Mexico. Coastal Hypoxia: Consequences for Living Resources and Ecosystems, 58, 269-291. https://doi.org/10.1029/CE058p0269 Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition, 248-255. https://doi.org/10.1109/CVPR.2009.5206848 Derkarabetian, S., Castillo, S., Koo, P.K., Ovchinnikov, S. & Hedin, M. (2019). A demonstration of unsupervised machine learning in species delimitation. Molecular Phylogenetics and Evolution, 139, 106562. https://doi.org/10.1016/j.ympev.2019.106562 65 Derkarabetian, S., Starrett, J., & Hedin, M. (2022). Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data. Frontiers in Zoology, 19(1), 1-15. https://doi.org/10.1186/s12983-022-00453-0 Do, N.T., Na, I.S. & Kim, S.H. (2018). Forensics face detection from GANs using convolutional neural network. ISITC, 2018, 376-379. Doherty, P.D., Broderick, A.C., Godley, B.J., Hart, K.A., Phillips, Q., Sanghera, A., Stringell, T.B., Walker, J.T. & Richardson, P.B. (2020). Spatial ecology of sub-adult green turtles in coastal waters of the Turks and Caicos Islands: implications for conservation management. Frontiers in Marine Science, 690. https://doi.org/10.3389/fmars.2020.00690 Dunbar, R. B., Wellington, G. M., Colgan, M. W., & Glynn, P. W. (1994). Eastern Pacific Sea surface temperature since 1600 AD: The ?18O record of climate variability in Gal?pagos corals. Paleoceanography, 9(2), 291-315. https://doi.org/10.1029/93PA03501 Dutton, P. H., Davis, S. K., Guerra, T., & Owens, D. (1996). Molecular phylogeny for marine turtles based on sequences of the ND4-leucine tRNA and control regions of mitochondrial DNA. 
Molecular Phylogenetics and Evolution, 5(3), 511-521. https://doi.org/10.1006/mpev.1996.0046 Dutton, P.H., Jensen, M.P., Frey, A., LaCasella, E., Balazs, G.H., Z?rate, P., Chassin?Noria, O., Sarti?Martinez, A.L. & Velez, E. (2014). Population structure and phylogeography reveal pathways of colonization by a migratory marine reptile (Chelonia mydas) in the central and eastern Pacific. Ecology and Evolution, 4(22), 4317-4331. https://doi.org/10.1002/ece3.1269 66 Earl, C., White, A.E., Trizna, M.G., Frandsen, P.B., Kawahara, A.Y., Brady, S.G. & Dikow, R.B. (2019). Discovering Patterns of Biodiversity in Insects Using Deep Machine Learning. Biodiversity Information Science and Standards, (4), e37525. https://doi.org/10.3897/biss.3.37525 Enstipp, M.R., Ballorain, K., Ciccione, S., Narazaki, T., Sato, K. & Georges, J.Y. (2016). Energy expenditure of adult green turtles (Chelonia mydas) at their foraging grounds and during simulated oceanic migration. Functional Ecology, 30(11), 1810-1825. https://doi.org/10.1111/1365-2435.12667 Enstipp, M.R., Ciccione, S., Gineste, B., Milbergue, M., Ballorain, K., Ropert-Coudert, Y., Kato, A., Plot, V. & Georges, J.Y. (2011). Energy expenditure of freely swimming adult green turtles (Chelonia mydas) and its link with body acceleration. Journal of Experimental Biology, 214(23), 4010-4020. https://doi.org/10.1242/jeb.062943 Escalona, T., Weadick, C. J., & Antunes, A. (2017). Adaptive patterns of mitogenome evolution are associated with the loss of shell scutes in turtles. Molecular Biology and Evolution, 34(10), 2522-2536. https://doi.org/10.1093/molbev/msx167 Fukuoka, T., Narazaki, T., & Sato, K. (2015). Summer-restricted migration of green turtles Chelonia mydas to a temperate habitat of the northwest Pacific Ocean. Endangered Species Research, 28(1), 1-10. https://doi.org/10.3354/esr00671 Funk, W.C., Forester, B.R., Converse, S.J., Darst, C. & Morey, S. (2019). Improving conservation policy with genomics: a guide to integrating adaptive potential into US Endangered Species Act decisions for conservation practitioners and geneticists. Conservation Genetics, 20(1), 115-134. https://doi.org/10.1007/s10592-018-1096-1 67 Gr?goire, M., Gar?on, V., Garcia, H.E., Breitburg, D.L., Isensee, K., Oschlies, A., Telszewski, M., Barth, A., Bittig, H.C., Carstensen, J. & Carval, T. (2021). A Global Ocean Oxygen Database and Atlas for assessing and predicting deoxygenation and ocean health in the open and coastal ocean. Frontiers in Marine Science, 8, 724913. http://dx.doi.org/10.3389/fmars.2021.724913 Grinblat, G.L., Uzal, L.C., Larese, M.G. & Granitto, P.M. (2016). Deep learning for plant identification using vein morphological patterns. Computers and Electronics in Agriculture, 127, 418-424. https://doi.org/10.1016/j.compag.2016.07.003 Godley, B.J., Lima, E.H.S.M., ?kesson, S., Broderick, A.C., Glen, F., Godfrey, M.H., Luschi, P. & Hays, G.C. (2003). Movement patterns of green turtles in Brazilian coastal waters described by satellite tracking and flipper tagging. Marine Ecology Progress Series, 253, 279-288. https://doi.org/10.3354/meps253279 Godoy, D. A. (2016). The ecology and conservation of green turtles (Chelonia mydas) in New Zealand (Doctoral dissertation). Massey University, Albany, NZ. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. & Bengio, Y. (2021). Generative adversarial networks. arXiv Preprint, arXiv:1406.2661. 
https://doi.org/10.48550/arXiv.1406.2661 Hatase, H., Sato, K., Yamaguchi, M., Takahashi, K., & Tsukamoto, K. (2006). Individual variation in feeding habitat use by adult female green sea turtles (Chelonia mydas): are they obligately neritic herbivores?. Oecologia, 149(1), 52-64. https://doi.org/10.1007/s00442-006-0431-2 68 Hayashi, R., & Yasuda, Y. (2022). Past biodiversity: Japanese historical monographs document the trans?Pacific migration of the black turtle, Chelonia mydas agassizii. Ecological Research, 37(1), 151-155. https://doi.org/10.1111/1440-1703.12265 Hays, G. C., Adams, C. R., Broderick, A. C., Godley, B. J., Lucas, D. J., Metcalfe, J. D., & Prior, A. A. (2000). The diving behaviour of green turtles at Ascension Island. Animal Behaviour, 59(3), 577-586. https://doi.org/10.1006/anbe.1999.1326 Hays, G. C., Glen, F., Broderick, A. C., Godley, B. J., & Metcalfe, J. D. (2002). Behavioural plasticity in a large marine herbivore: contrasting patterns of depth utilisation between two green turtle (Chelonia mydas) populations. Marine Biology, 141(5), 985-990. https://doi.org/10.1007/s00227-002-0885-7 Heidemeyer, M., Arauz-Vargas, R., & L?pez-Ag?ero, E. (2014). New foraging grounds for hawksbill (Eretmochelys imbricata) and green turtles (Chelonia mydas) along the northern Pacific coast of Costa Rica, Central America. Revista de Biologia Tropical, 62(4), 109-118. http://www.redalyc.org/articulo.oa?id=44958812009 Heldstab, S. A., Isler, K., Schuppli, C., & van Schaik, C. P. (2020). When ontogeny recapitulates phylogeny: Fixed neurodevelopmental sequence of manipulative skills among primates. Science advances, 6(30), eabb4685. https://doi.org/10.1126/sciadv.abb4685 Helly, J. J., & Levin, L. A. (2004). Global distribution of naturally occurring marine hypoxia on continental margins. Deep Sea Research Part I: Oceanographic Research Papers, 51(9), 1159-1168. https://doi.org/10.1016/j.dsr.2004.03.009 Hennig, C. (2020). Package ?fpc?. fpc: Flexible Procedures for Clustering, Retrieved March 15th, 2022, from https://cran.r-project.org/web/packages/fpc/index.html 69 Hochachka, P.W. & Storey, K.B. (1975). Metabolic Consequences of Diving in Animals and Man: The diving habit calls for controlled oscillation between aerobic and anaerobic metabolism. Science, 187(4177), 613-621. https://doi.org/10.1126/science.163485 Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint, arXiv:1704.04861. https://doi.org/10.48550/arXiv.1704.04861 Howes, B.J., Brown, J.W., Gibbs, H.L., Herman, T.B., Mockford, S.W., Prior, K.A. & Weatherhead, P.J. (2009). Directional gene flow patterns in disjunct populations of the black ratsnake (Pantheropis obsoletus) and the Blanding?s turtle (Emydoidea blandingii). Conservation Genetics, 10(2), 407-417. https://doi.org/10.1007/s10592-008-9607-0 Hua, K.L., Hsu, C.H., Hidayati, S.C., Cheng, W.H. & Chen, Y.J. (2015). Computer-aided classification of lung nodules on computed tomography images via deep learning technique. OncoTargets and Therapy, 8. https://dx.doi.org/10.2147%2FOTT.S80733 Isaacks, R.R., Harkness, D.R. & Whitham, P.R. (1978). Relationship between the major phosphorylated metabolic intermediates and oxygen affinity of whole blood in the loggerhead (Caretta caretta) and the green sea turtle (Chelonia mydas mydas) during development. Developmental Biology, 62, 344?353. 
https://doi.org/10.1016/0012- 1606(78)90221-X Jackson, D.C. (1985). Respiration and respiratory control in the green turtle, Chelonia mydas. Copeia, (3), 664-671. https://doi.org/10.2307/1444760 70 Jackson, D.C., Ramsey, A.L., Paulson, J.M., Crocker, C.E. & Ultsch, G.R. (2000). Lactic acid buffering by bone and shell in anoxic softshell and painted turtles. Physiological and Biochemical Zoology, 73(3), 290-297. https://doi.org/10.1086/316754 Jensen, M.P., Allen, C.D., Eguchi, T., Bell, I.P., LaCasella, E.L., Hilton, W.A., Hof, C.A. & Dutton, P.H. (2018). Environmental warming and feminization of one of the largest sea turtle populations in the world. Current Biology, 28(1),154-159. https://doi.org/10.1016/j.cub.2017.11.057 Jensen, M. P., FitzSimmons, N. N., Bourjea, J., Hamabata, T., Reece, J., & Dutton, P. H. (2019). The evolutionary history and global phylogeography of the green turtle (Chelonia mydas). Journal of Biogeography, 46(5), 860-870. https://doi.org/10.1111/jbi.13483 Kamezaki, N., & Matsui, M. (1995). Geographic variation in skull morphology of the green turtle, Chelonia mydas, with a taxonomic discussion. Journal of Herpetology, 51-60. Karl, S. A., & Bowen, B. W. (1999). Evolutionary significant units versus geopolitical taxonomy: molecular systematics of an endangered sea turtle (genus Chelonia). Conservation Biology, 13(5), 990-999. https://doi.org/10.1046/j.1523-1739.1999.97352.x Khan, S., Nabi, G., Ullah, M.W., Yousaf, M., Manan, S., Siddique, R. & Hou, H. (2016). Overview on the Role of Advance Genomics in Conservation Biology of Endangered Species. International Journal of Genomics, 2016, 3460416-3460416. http://dx.doi.org/10.1155/2016/3460416 Klingenberg, C.P. (2016). Size, shape, and form: concepts of allometry in geometric morphometrics. Development Genes and Evolution, 226(3), 113-137. https://doi.org/10.1007/s00427-016-0539-2 71 Koepfli, K.P., Pollinger, J., Godinho, R., Robinson, J., Lea, A., Hendricks, S., Schweizer, R.M., Thalmann, O., Silva, P., Fan, Z. & Yurchenko, A.A. (2015). Genome-wide evidence reveals that African and Eurasian golden jackals are distinct species. Current Biology, 25(16), 2158-2165. https://doi.org/10.1016/j.cub.2015.06.060 Kot, C.Y., ?kesson, S., Alfaro?Shigueto, J., Amorocho Llanos, D.F., Antonopoulou, M., Balazs, G.H., Baverstock, W.R., Blumenthal, J.M., Broderick, A.C., Bruno, I. & Canbolat, A.F. (2022). Network analysis of sea turtle movements and connectivity: A tool for conservation prioritization. Diversity and Distributions, 28(4), 810-829. https://doi.org/10.1111/ddi.13485 Komoroske, L.M., Jensen, M.P., Stewart, K.R., Shamblin, B.M. & Dutton, P.H. (2017). Advances in the application of genetics in marine turtle biology and conservation. Frontiers in Marine Science, 4, 156. https://doi.org/10.3389/fmars.2017.00156 Lamb, T. & Avise, J.C. (1992). Molecular and population genetic aspects of mitochondrial DNA variability in the diamondback terrapin, Malaclemys terrapin. Journal of Heredity, 83(4), 262-269. https://doi.org/10.1093/oxfordjournals.jhered.a111211 Labrada-Martag?n, V., Tener?a, F.A.M., Herrera-Pav?n, R. & Negrete-Philippe, A. (2017). Somatic growth rates of immature green turtles Chelonia mydas inhabiting the foraging ground Akumal Bay in the Mexican Caribbean Sea. Journal of Experimental Marine Biology and Ecology, 487, 68-78. https://doi.org/10.1016/j.jembe.2016.11.015 Lee, L.S., Navarro-Dom?nguez, B.M., Wu, Z., Montiel, E.E., Badenhorst, D., Bista, B., Gessler, T.B. & Valenzuela, N. (2020). 
Karyotypic evolution of sauropsid vertebrates illuminated by optical and physical mapping of the painted turtle and slider turtle genomes. Genes, 11(8), 928. https://doi.org/10.3390/genes11080928 72 Lucas, T. C. (2020). A translucent box: interpretable machine learning in ecology. Ecological Monographs, 90(4), e01422. https://doi.org/10.1002/ecm.1422 L?rig, M. D., Donoughe, S., Svensson, E. I., Porto, A., & Tsuboi, M. (2021). Computer vision, machine learning, and the promise of phenomics in ecology and evolutionary biology. Frontiers in Ecology and Evolution, 9, 148. https://doi.org/10.3389/fevo.2021.642774 Madrak, S.V., Lewison, R.L., Eguchi, T., Klimley, A.P. & Seminoff, J.A. (2022). Effects of ambient temperature on dive behavior of East Pacific green turtles before and after a power plant closure. Marine Ecology Progress Series, 683, 157-168. https://doi.org/10.3354/meps13940 Makowski, C., Seminoff, J. A., & Salmon, M. (2006). Home range and habitat use of juvenile Atlantic green turtles (Chelonia mydas L.) on shallow reef habitats in Palm Beach, Florida, USA. Marine Biology, 148(5), 1167-1179. https://doi.org/10.1007/s00227-005- 0150-y Mallet, J. (1995). A species definition for the modern synthesis. Trends in Ecology & Evolution, 10(7), 294-299. https://doi.org/10.1016/0169-5347(95)90031-4 Marshall, S. A., & Evenhuis, N. L. (2015). New species without dead bodies: a case for photo- based descriptions, illustrated by a striking new species of Marleyimyia Hesse (Diptera, Bombyliidae) from South Africa. ZooKeys, (525), 117. https://dx.doi.org/10.3897%2Fzookeys.525.6143 Martin, B. T., Chafin, T. K., Douglas, M. R., Placyk Jr, J. S., Birkhead, R. D., Phillips, C. A., & Douglas, M. E. (2021). The choices we make and the impacts they have: Machine learning and species delimitation in North American box turtles (Terrapene spp.). 73 Molecular Ecology Resources, 21(8), 2801-2817. https://doi.org/10.1111/1755- 0998.13350 Maurer, A. S., Seminoff, J. A., Layman, C. A., Stapleton, S. P., Godfrey, M. H., & Reiskind, M. O. B. (2021). Population viability of sea turtles in the context of global warming. BioScience, 71(8), 790-804. https://doi.org/10.1093/biosci/biab028 McInnes, L., Healy, J., & Melville, J. (2018). Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint, arXiv:1802.03426. https://doi.org/10.48550/arXiv.1802.03426 Mendon?a, M. T. (1983). Movements and feeding ecology of immature green turtles (Chelonia mydas) in a Florida lagoon. Copeia, 1013-1023. https://doi.org/10.2307/1445104 Myers, E.M., Janzen, F.J., Adams, D.C. & Tucker, J.K. (2006). Quantitative genetics of plastron shape in slider turtles (Trachemys scripta). Evolution, 60(3), 563-572. https://doi.org/10.1111/j.0014-3820.2006.tb01137.x Nagle, R. D., Rowe, C. L., Grant, C. J., Sebastian, E. R., & Martin, B. E. (2018). Abnormal shell shapes in northern map turtles of the Juniata River, Pennsylvania, USA. Journal of Herpetology, 52(1), 59-66. https://doi.org/10.1670/17-030 Naro-Maciel, E., Gaughran, S.J., Putman, N.F., Amato, G., Arengo, F., Dutton, P.H., McFadden, K.W., Vintinner, E.C. & Sterling, E.J. (2014). Predicting connectivity of green turtles at Palmyra Atoll, central Pacific: a focus on mtDNA and dispersal modelling. Journal of the Royal Society Interface, 11(93), 20130888. https://doi.org/10.1098/rsif.2013.0888 Norman, J.A., Moritz, C. & Limpus, C.J. (1994). Mitochondrial DNA control region polymorphisms: genetic markers for ecological studies of marine turtles. 
Molecular Ecology, 3(4), 363-373. https://doi.org/10.1111/j.1365-294X.1994.tb00076.x 74 Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera- trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), E5716-E5725. https://doi.org/10.1073/pnas.1719367115 Odegard, D.T., Sonnenfelt, M.A., Bledsoe, J.G., Keenan, S.W., Hill, C.A. & Warren, D.E., (2018). Changes in the material properties of the shell during simulated aquatic hibernation in the anoxia-tolerant painted turtle. Journal of Experimental Biology, 221(18), jeb176990. https://doi.org/10.1242/jeb.176990 Okamoto, K. & Kamezaki, N. (2014). Morphological variation in Chelonia mydas (Linnaeus, 1758) from the coastal waters of Japan, with special reference to the turtles allied to Chelonia mydas agassizii Bocourt, 1868. Current Herpetology, 33(1), 46-56. https://doi.org/10.5358/hsj.33.46 Paixao, W. R., Paixao, T. M., Costa, M. C. B., Andrade, J. O., Pereira, F. G., & Komati, K. S. (2018). Texture classification of sea turtle shell based on color features: color histograms and chromaticity moments. International Journal of Artificial Intelligence and Applications (IJAIA), 9(2), 55-67. http://dx.doi.org/10.5121/ijaia.2018.9205 Parham, J.F. & Zug, G.R. (1996). Chelonia agassizii-valid or not. Marine Turtle Newsletter, 72(2). http://www.seaturtle.org/mtn/archives/mtn72/mtn72p2b.shtml?nocount Parker, D.M., Dutton, P.H. & Balazs, G.H. (2011). Oceanic diet and distribution of haplotypes for the green turtle, Chelonia mydas, in the Central North Pacific. Pacific Science, 65(4), 419-431. https://doi.org/10.2984/65.4.419 Perez, M. F., Bonatelli, I. A., Romeiro?Brito, M., Franco, F. F., Taylor, N. P., Zappi, D. C., & Moraes, E. M. (2022). Coalescent?based species delimitation meets deep learning: 75 Insights from a highly fragmented cactus system. Molecular Ecology Resources, 22(3), 1016-1028. https://doi.org/10.1111/1755-0998.13534 Pike, D. A. (2014). Forecasting the viability of sea turtle eggs in a warming world. Global Change Biology, 20 (1), 7-15. http://dx.doi.org/10.1111/gcb.12397 Poulakakis, N., Edwards, D.L., Chiari, Y., Garrick, R.C., Russello, M.A., Benavides, E., Watkins-Colwell, G.J., Glaberman, S., Tapia, W., Gibbs, J.P. & Cayot, L.J., (2015). Description of a new Gal?pagos giant tortoise species (Chelonoidis; Testudines: Testudinidae) from Cerro Fatal on Santa Cruz Island. PLOS ONE, 10(10), e0138779. https://doi.org/10.1371/journal.pone.0138779 Pritchard, P.C. (1999). Status of the black turtle. Conservation Biology, 1000-1003. https://www.jstor.org/stable/2641731 Ramos, E.K.D.S., Freitas, L. & Nery, M.F. (2020). The role of selection in the evolution of marine turtles mitogenomes. Scientific Reports, 10(1), 1-13. https://doi.org/10.1038/s41598-020-73874-8 R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, R version 4.1.2. https://www.R-project.org/ Reich, K.J., Bjorndal, K.A. & Bolten, A.B. (2007). The ?lost years? of green turtles: using stable isotopes to study cryptic life stages. Biology Letters, 3(6),712-714. https://doi.org/10.1098/rsbl.2007.0394 Res?ndiz, E., Fern?ndez-Sanz, H., & Lara-Uc, M. M. (2018). Baseline health indicators of eastern Pacific green turtles (Chelonia mydas) from Baja California Sur, Mexico. Comparative Clinical Pathology, 27(5), 1309-1320. 
https://doi.org/10.1007/s00580-018- 2740-3 76 Rivera, G. (2008). Ecomorphological variation in shell shape of the freshwater turtle Pseudemys concinna inhabiting different aquatic flow regimes. Integrative and Comparative Biology, 48(6), 769-787. https://doi.org/10.1093/icb/icn088 Rivera, G., Davis, J. N., Godwin, J. C., & Adams, D. C. (2014). Repeatability of habitat- associated divergence in shell shape of turtles. Evolutionary Biology, 41(1), 29-37. https://doi.org/10.1007/s11692-013-9243-6 Roberts, M.A., Schwartz, T.S. & Karl, S.A. (2004). Global population genetic structure and male-mediated gene flow in the green sea turtle (Chelonia mydas): analysis of microsatellite loci. Genetics, 166(4), 1857-1870. https://doi.org/10.1093/genetics/166.4.1857 Roman, J., Santhuff, S. D., Moler, P. E., & Bowen, B. W. (1999). Population structure and cryptic evolutionary units in the alligator snapping turtle. Conservation Biology, 13(1), 135-142. https://doi.org/10.1046/j.1523-1739.1999.98007.x Ronneberger, O., Fischer, P. & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-assisted Intervention, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28 Rubinoff, I. (1968). Central American Sea-Level Canal: Possible Biological Effects: An opportunity for the greatest biological experiment in man's history may not be exploited. Science, 161(3844), 857-861. https://doi.org/10.1126/science.161.3844.857 Salmon, M., Mott, C.R. & Bresette, M.J. (2018). Biphasic allometric growth in juvenile green turtles Chelonia mydas. Endangered Species Research, 37, 301-308. https://doi.org/10.3354/esr00930 77 Sampson, L., Giraldo, A., Pay?n, L. F., Amorocho, D. F., Ramos, M. A., & Seminoff, J. A. (2018). Trophic ecology of green turtle Chelonia mydas juveniles in the Colombian Pacific. Journal of the Marine Biological Association of the United Kingdom, 98(7), 1817-1829. https://doi.org/10.1017/S0025315417001400 Sampson, L., Pay?n, L. F., Amorocho, D. F., Seminoff, J. A., & Giraldo, A. (2014). Intraspecific variation of the green turtle, Chelonia mydas (Cheloniidae), in the foraging area of Gorgona Natural National Park (Colombian Pacific). Acta Biol?gica Colombiana, 19(3), 461-470. http://dx.doi.org/10.15446/abc.v19n3.42615 Santos, C.M.D., Amorim, D.S., Klassa, B., Fachin, D.A., Nihei, S.S., De Carvalho, C.J., Falaschi, R.L., Mello-Patiu, C.A., Couri, M.S., Oliveira, S.S. & Silva, V.C. (2016). On typeless species and the perils of fast taxonomy. Systematic Entomology, 41(3), 511-515. https://doi.org/10.1111/syen.12180 Saryan, P., Gupta, S., & Gowda, V. (2020). Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery. Applications in Plant Sciences, 8(7), e11377. https://doi.org/10.1002/aps3.11377 Schuettpelz, E., Frandsen, P.B., Dikow, R.B., Brown, A., Orli, S., Peters, M., Metallo, A., Funk, V.A. and Dorr, L.J. (2017). Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal, 1(5). https://dx.doi.org/10.3897%2FBDJ.5.e21139 Schultz, E. A. (2016). Genetic analysis, movement, and nesting patterns of the green sea turtle (Chelonia mydas) in St. Croix, Virgin Islands (USA): A regional analysis for the Caribbean (Doctoral dissertation). Savannah State University, Savannah, GA. 78 Seminoff, J. A., Alfaro-Shigueto, J., Amorocho, D., Aruaz, R., Baquero Gallegos, A., Chacon Chaverri, D., Gaos, A. R., Kelez, S., Mangel, J. 
C., Urteaga, J., & Wallace, B. P. (2012). Biology and Conservation of Sea Turtles in the Eastern Pacific. In: Sea Turtles in the Eastern Pacific Advances in Research and Conservation. Tucson, AZ, USA: University of Arizona Press, 11-38. https://doi.org/10.2307/j.ctv21hrddc Seminoff, J.A., Allen, C.D., Balazs, G.H., Dutton, P.H., Eguchi, T., Haas, H.L., Hargrove, S., Jensen, M.P., Klemm, D.L., Lauritsen, M. & MacPherson, S.L. (2015). Status review of the green turtle (Chelonia mydas) under the US Endangered Species Act. NOAA Technical Memorandum NOAA-TM-NMFS-SWFSC-539. Seminoff, J. A., & Jones, T. T. (2006). Diel movements and activity ranges of green turtles (Chelonia mydas) at a temperate foraging area in the Gulf of California, Mexico. Herpetological Conservation and Biology, 1(2), 81-86. https://www.herpconbio.org/volume_1/issue_2/Seminoff_Jones_2006.pdf Seminoff, J. A., & Shanker, K. (2008). Marine turtles and IUCN Red Listing: a review of the process, the pitfalls, and novel assessment approaches. Journal of Experimental Marine Biology and Ecology, 356(1-2), 52-68. https://doi.org/10.1016/j.jembe.2007.12.007 Seminoff, J. A., Resendiz, A., & Nichols, W. J. (2002). Diet of East Pacific green turtles (Chelonia mydas) in the central Gulf of California, Mexico. Journal of Herpetology, 36(3), 447-453. https://doi.org/10.1670/0022- 1511(2002)036[0447:DOEPGT]2.0.CO;2 Seminoff, J. A., Whitman, E. R., Wallace, B. P., Bayless, A., Resendiz, A., & Jones, T. T. (2020). No rest for the weary: restricted resting behaviour of green turtles (Chelonia mydas) at a deep-neritic foraging area influences expression of life history traits. Journal 79 of Natural History, 54(45-46), 2979-3001. https://doi.org/10.1080/00222933.2021.1887387 Shanker, K. (2001). A review of species concepts: ideas for a new concept and implications for the green?black turtle debate. Twenty-first Annual Symposium on Sea Turtle Biology and Conservation, Philadelphia: NOAA Technical Memorandum NOAA-NMFS-SEFSC- 528, 323-325. https://repository.library.noaa.gov/view/noaa/3412/noaa_3412_DS1.pdf#page=341 Shatalkin, A. I., & Galinskaya, T. V. (2017). A commentary on the practice of using the so- called typeless species. ZooKeys, (693), 129. https://doi.org/10.3897/zookeys.693.10945 Skejo, J. O. S. I. P., & Caballero, J. H. S. (2016). A hidden pygmy devil from the Philippines: Arulenus miae sp. nov.?a new species serendipitously discovered in an amateur Facebook post (Tetrigidae: Discotettiginae). Zootaxa, 4067(3), 383-393. http://doi.org/10.11646/zootaxa.4067.3.7 S?nmez, B. (2019). Morphological variations in the green turtle (Chelonia mydas): A field study on an eastern Mediterranean nesting population. Zoological Studies, 58. https://dx.doi.org/10.6620%2FZS.2019.58-16 Tomaszewicz, C. N. T., Seminoff, J. A., Avens, L., Goshe, L. R., Rguez-Baron, J. M., Peckham, S. H., & Kurle, C. M. (2018). Expanding the coastal forager paradigm: long-term pelagic habitat use by green turtles Chelonia mydas in the eastern Pacific Ocean. Marine Ecology Progress Series, 587, 217-234. https://doi.org/10.3354/meps12372 Van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual. Scotts Valley, CA, USA: CreateSpace. https://www.python.org/ 80 V?lez-Rubio, G. M., Cardona, L., L?pez-Mendilaharsu, M., Mart?nez Souza, G., Carranza, A., Gonz?lez-Paredes, D., & Tom?s, J. (2016). Ontogenetic dietary changes of green turtles (Chelonia mydas) in the temperate southwestern Atlantic. Marine Biology, 163(3), 1-16. 
https://doi.org/10.1007/s00227-016-2827-9 White, A.E., Dikow, R.B., Baugh, M., Jenkins, A. & Frandsen, P.B. (2020). Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning. Applications in Plant Sciences, 8(6), e11352. https://doi.org/10.1002/aps3.11352 White, A. E., Trizna, M. G., Frandsen, P. B., Dorr, L. J., Dikow, R. B., & Schuettpelz, E. (2019). Evaluating Geographic Patterns of Morphological Diversity in Ferns and Lycophytes Using Deep Neural Networks. Biodiversity Information Science and Standards, (4). https://doi.org/10.3897/biss.3.37559 Whiting, S. D., & Miller, J. D. (1998). Short term foraging ranges of adult green turtles (Chelonia mydas). Journal of Herpetology, 330-337. https://doi.org/10.2307/1565446 Wood, S.C., Gatz, R.N. & Glass, M.L. (1984). Oxygen transport in the green sea turtle. Journal of Comparative Physiology B, 154(3), 275-280. https://doi.org/10.1007/BF02464407 Wyneken, J., Balazs, G. H., Murakawa, S., & Anderson, Y. (1999). Size differences in hind limbs and carapaces of hatchling green turtles (Chelonia mydas) from Hawaii and Florida, USA. Chelonian Conservation and Biology, 3(3), 491-495. Yang, Y., Sun, H., Zhang, Y., Zhang, T., Gong, J., Wei, Y., Duan, Y.G., Shu, M., Yang, Y., Wu, D. & Yu, D. (2021). Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data. Cell Reports, 36(4), 109442. https://doi.org/10.1016/j.celrep.2021.109442 81 Yildirim, M., & Cinar, A. (2022). Classification with respect to colon adenocarcinoma and colon benign tissue of colon histopathological images with a new CNN model: MA_ColonNET. International Journal of Imaging Systems and Technology, 32(1), 155- 162. https://doi.org/10.1002/ima.22623 Younis, S., Weiland, C., Hoehndorf, R., Dressler, S., Hickler, T., Seeger, B., & Schmidt, M. (2018). Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks. Botany Letters, 165(3-4), 377-383. https://doi.org/10.1080/23818107.2018.1446357 Z?rate, P. M., Bjorndal, K. A., Seminoff, J. A., & Bolten, A. B. (2012). Understanding migratory and foraging behavior of green turtles Chelonia mydas in the Galapagos Islands through stable isotopes. Thirty-first Annual Symposium on Sea Turtle Biology and Conservation, San Diego: NOAA Technical Memorandum NOAA-NMFS-SEFSC-631. Z?rate, P.M., Bjorndal, K.A., Seminoff, J.A., Dutton, P.H. & Bolten, A.B. (2015). Somatic growth rates of green turtles (Chelonia mydas) and hawksbills (Eretmochelys imbricata) in the Galapagos Islands. Journal of Herpetology, 49(4), 641-648. https://doi.org/10.1670/14-078 Zimm, R., Bentley, B. P., Wyneken, J., & Moustakas-Verho, J. E. (2017). Environmental causation of turtle scute anomalies in ovo and in silico. Integrative and Comparative Biology, 57(6), 1303-1311. https://doi.org/10.1093/icb/icx066 82 Chapter 3: Application of a Deep Learning Image Classifier for Identification of Amazonian Fishes Abstract 1. Given the sharp increase in agricultural and infrastructure development and the paucity of widespread data available to support conservation management decisions, a more rapid and accurate tool for identifying fish fauna in the world's largest freshwater ecosystem, the Amazon, is needed. 2. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification or genetic testing for species recognition at a molecular level. 3. 
To overcome these challenges, we built an image masking model (U-Net) and a convolutional neural network (CNN) to classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru, in 2018 and 2019. 4. Species identifications in the training images (n = 3,068) were verified by expert ichthyologists. These images were supplemented with photographs taken of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian's National Museum of Natural History. 5. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to more effectively participate in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.

Introduction

The Amazon basin is home to over 2,700 species of freshwater fishes (Junk et al., 2007; Dagosta & De Pinna, 2019), many of which are of conservation concern (Albert et al., 2011; García-Dávila et al., 2018; Pelicice et al., 2021). Freshwater fishes provide one of the few reliable sources of protein for Amazonian communities and represent an important economic opportunity through the aquarium trade (Moreau & Coomes, 2007; Coomes et al., 2010). This unique ichthyofauna is facing unprecedented threats, such as deforestation (Junk et al., 2007; Lobón-Cerviá et al., 2015), construction of hydropower dams (Winemiller et al., 2016), mining (Azevedo-Santos et al., 2021), climate change (Bodmer et al., 2017), and in some cases, overexploitation (Moreau & Coomes, 2007). While advances in sampling poorly explored areas and describing the diversity of Amazonian fish have been made over the last decade (e.g., Alofs et al., 2014; de Santana et al., 2019; de Santana et al., 2021), the sub-drainages of the Marañón River remain among the most undersampled regions in South America (Jézéquel et al., 2020). In less populated areas of the Amazon, subsistence fishing, for both consumption and the pet trade, can be essential to sustaining life (Moreau & Coomes, 2007; Coomes et al., 2010). Due to the urgency of these economic and ecological threats, efficient data collection and long-term monitoring are needed to better inform mitigation strategies and policy.

Traditional ichthyological sampling methods include focused netting and fishing efforts, followed by extensive manual sorting, documentation, and identification. Although effective, and necessary in the Amazon where countless fishes remain to be described (Reis et al., 2016), these methods are time-consuming and raise the potential for misidentification bias (Kirsch et al., 2018). As a result, many have turned to community scientists to aid in catch effort and identification of individual landings, yet accurate species identification remains a challenge (Gardiner et al., 2012; Swanson et al., 2016).
Genetic approaches have also been implemented to identify many of the fish species inhabiting the Amazon (Garc?a-D?vila et al., 2017; de Santana et al., 2021), but these approaches also rely on well-identified and vouchered genetic libraries that are still missing for Amazonian fishes. These techniques require expensive storage and sample processing technology, which are not readily available in most institutions within the Amazon (de Santana et al., 2021). In order to address the ever-growing need for data and cost-effective solutions, contemporary fisheries research has called for the development and application of a rapid solution, namely by way of machine learning models, such as Convolutional Neural Networks (CNNs, e.g., Perdig?o et al., 2020). CNNs have the potential to enable rapid identification of fish to monitor fishery stocks, diversity, bycatch, and to combat illegal fishing (Marini et al., 2018; Perdig?o et al., 2020). Machine learning techniques have been successfully implemented in niche modeling, prediction of mass mortality events, and the development of non-linear ecological time-series models (Recknagel, 2001; Crisci et al., 2012; Miller-Coleman et al., 2012). Image classification deep learning models show promise in being applied to highly diverse taxa and collections (Weinstein, 2017; Norouzzadeh et al., 2018; Sullivan et al., 2018; W?ldchen & M?der, 2018; Borowiec et al., 2021). Past attempts to identify fish taxa using computer vision have had 85 varying degrees of success across a wide breadth of ichthyological data sets. For example, early attempts by Alsmadi et al. (2010) were able to identify 20 families of marine fish from 610 images with an accuracy of 84%. More recent work improved accuracy to 90% (Alsmadi et al., 2019). Hern?ndez-Serna and Jim?nez-Segura (2014) used seven museum collections that included both marine and Amazonian freshwater fish (images per collection ranged from 422 to 2,392) and obtained accuracies between 72-92%. Sun et al. (2016) obtained a species identification accuracy of 77.27% from 9,160 AUV images of fish. A study by Qin et al. (2015) was able to identify 23 deep sea fish species with an accuracy of 98% using a substantial number of training images (n = 22,370). In this study, we developed two deep learning computer vision models: one that segments fish pixels from background pixels, and one that classifies images of Amazonian fishes to the genus level. As the first image classifier for ichthyological monitoring in the megadiverse Peruvian Amazon basin, we hope this case study will act as a primer for further development of deep learning models, as tools for conservation stakeholders. Deep learning for taxonomic image classification has proven to be efficient and highly accurate, demonstrating promise for improving participatory monitoring initiatives (Norouzzadeh et al., 2018; Sullivan et al., 2018). Specifically, these tools will enable communities involved in participatory monitoring to fill knowledge gaps and improve data reliability. These models can also provide a basis on which to build new models for other species of conservation concern and public health interest. Our data 86 and pipeline are publicly available, which will enable others to apply these techniques to other taxa. Methods In July 2018, we sampled freshwater fishes in small white-water rivers, and black and white-water streams in seasonally flooded forests of the upper Morona River valley in Achuar native territory, Loreto, Peru. 
Sites were resampled in November 2018 and November 2019. Fish were identified by specialists with the aid of dichotomous taxonomic keys considering morphological, meristic, and morphometric characteristics. Taxonomic nomenclature follows Fricke et al. (2018). A total of 141 fish species belonging to 89 genera and 29 families were identified across all sites and seasons (Morgan Ruiz-Tafur, unpublished data). Captured fish (n = 1,967) were placed on a 1 cm grid or a neutral background (leaves, hands, ground, etc.) and photographed using a Nikon D3500 camera prior to preservation. Specimens were deposited in the ichthyology collection at the Instituto de Investigaciones de la Amazonía Peruana (IIAP) in Iquitos, Peru. Due to the limited number of images we had per species, we restricted our analysis to genera (n = 33), using a minimum threshold of 20 field images per genus (n = 1,615). To supplement field images, we incorporated additional images (n = 1,453) taken of specimens housed at the Smithsonian National Museum of Natural History, Department of Vertebrate Zoology, Division of Fishes collection (USNM), using both a Nikon B500 and a Nikon W100. Fish specimens were photographed on both blank and 1 cm grid backgrounds from multiple angles. In total, our dataset consisted of 3,068 images prior to processing.

Preprocessing Steps

To build a training dataset, we first removed all incidentally taken/non-fish and unidentified fish images. We then built a U-Net (Ronneberger et al., 2015) segmentation model to classify pixels in images as fish or background, using methods similar to those of White et al. (2020). Specifically, we manually masked a subset of images (n = 66; 2 images from each genus), following White et al. (2020), to use as a training set for the U-Net. Our generated masks zeroed out (blacked) background pixels while retaining fish pixels. The model was built on a resnet-34 architecture pretrained on the ImageNet dataset (Deng et al., 2009). All field and museum images were then masked by our trained U-Net. Images that were unsuccessfully masked, where no component of the original input image remained within the photo, were removed from the dataset. The remaining images, which retained at least some component of the target object with no background, were then subdivided for training and validation of the genus identification model.

Identification Model Architecture, Training, and Validation

We trained our image classifier to distinguish between 33 fish genera based on masked images. The classifier was developed on an NVIDIA GPU (V100; 32 GB VRAM) using the Fast.ai library (Howard & Gugger, 2020) in PyTorch (Paszke et al., 2019). The model was built on a resnet-101 architecture pretrained on the ImageNet dataset (Deng et al., 2009). To develop the classifier, masked images were randomly divided into training (n = 2,387) and validation (n = 596) sets, split 80/20 respectively, to maximize accuracy (Hernández-Serna & Jiménez-Segura, 2014). All images were resized by "squishing" them to 300 x 300 pixels. We trained the model over 60 epochs, with one training session of random transformations accounting for 6 of the 60 epochs (a minimal code sketch of this two-stage training setup is provided below).

Figure 3.1 Example of unmasked (left) and masked (right) images of a fish (Bario steindachneri).

Results

The U-Net masking model was trained over 20 epochs, at which point the training loss and validation loss were minimized. Our U-Net successfully masked 97.23% (n = 2,983) of our images.
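To make the two-stage pipeline described above more concrete, the following is a minimal, illustrative sketch of how such a workflow could be set up with the Fast.ai library (Howard & Gugger, 2020). It is not the authors' published code: the directory layout, file names, batch size, and random seed are assumptions for illustration, while the architectural choices (resnet-34 U-Net, resnet-101 classifier, ImageNet pretraining, 80/20 split, 300 x 300 "squish" resizing, random transformations) follow the text.

```python
from fastai.vision.all import *

# --- Stage 1: U-Net segmentation (fish vs. background), resnet-34 backbone pretrained on ImageNet ---
# Hypothetical layout: raw images in data/masking/images, hand-drawn binary masks in data/masking/masks
seg_path = Path("data/masking")

def label_func(fn):
    # Each of the 66 manually masked training images is paired with a mask of the same file name.
    return seg_path / "masks" / fn.name

seg_dls = SegmentationDataLoaders.from_label_func(
    seg_path, get_image_files(seg_path / "images"), label_func,
    codes=["background", "fish"], item_tfms=Resize(300), bs=8  # batch size is an assumption
)
seg_learn = unet_learner(seg_dls, resnet34)
seg_learn.fine_tune(20)  # the Results report roughly 20 epochs for the masking model

# --- Stage 2: genus classifier, resnet-101 pretrained on ImageNet, trained on the masked images ---
# Hypothetical layout: data/masked/<genus>/*.jpg after background pixels have been zeroed out
cls_dls = ImageDataLoaders.from_folder(
    Path("data/masked"),
    valid_pct=0.2, seed=42,                  # 80/20 train/validation split
    item_tfms=Resize(300, method="squish"),  # "squish" every image to 300 x 300 pixels
    batch_tfms=aug_transforms()              # random transformations (augmentation)
)
cls_learn = cnn_learner(cls_dls, resnet101, metrics=accuracy)
cls_learn.fine_tune(60)
cls_learn.export("fish_genus_classifier.pkl")  # hypothetical file name, reused in later sketches
```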
Images that were not successfully masked (n = 85) were removed from training and validation. Our Amazonian fish image classifier trained over 50 epochs, at which point the training loss and validation loss were minimized. The validation set results, predicted class versus actual class, are summarized in a confusion matrix (Figure 3.2). Of the 596 validation images, the image classifier predicted 97.99% correctly. Accuracy by genus is summarized in Table 3.1 and ranged from 88.89% to 100%. Accuracy by genus and training data type is summarized in Table 3.2. The models and associated metadata are available at the Smithsonian figshare repository (DOI: 10.25573/data.17315126). The application for both models is available online (https://github.com/MikeTrizna/streamlit_fish_masking).

Table 3.1 Summary of validation set (n = 596) results by genus.

Figure 3.2 Confusion matrix visualization of computer vision model validation results. The x-axis depicts the genus predicted by the model. The y-axis depicts the actual genus to which the image belongs, organized by taxonomic class, family, and genus according to Fricke et al. (2018). Correct identifications are depicted along the left-to-right diagonal, with darker colors indicating more correct identifications and blank yellow squares indicating zeros. Masked image examples on the y-axis are as follows: A- Bryconops, B- Tetragonopterus, C- Astyanax, D- Moenkhausia, E- Gymnotus, F- Ancistrus, G- Corydoras, H- Bujurquina.

Discussion

We were able to efficiently build a state-of-the-art model that can rapidly identify standardized Amazonian fish images to the genus level (n = 33) with 97.99% accuracy. Of the 12 incorrectly classified images in our validation set, 7 were misclassified outside of their family and 2 were misclassified outside of their order. After visually examining the incorrectly classified images, it was evident that some were likely more difficult to classify because incidental masking of fish pixels bisected the specimen in the image. In short, we believe our masking rendered a few of our images unidentifiable; this is arguably an artifact of the data pipeline rather than a source of true error in the image classifier. To improve accuracy when used in the field, we recommend capturing multiple clear images of individual fish to ensure at least one is successfully masked prior to inference for identification. This can be done by using a background that is sufficiently distinct from the coloration of the fish being photographed. Most misidentifications in our model involved tetras, small characids that are the dominant fish fauna in Amazonian small rivers and streams (de Oliveira et al., 2009). Historically, species-rich and closely related tetras have been difficult to identify due to cryptic species diversity (where a single nominal species may in fact comprise several undescribed species) and the lack of exclusive morphological characters for identifying some genera (e.g., Astyanax >170 species and Hyphessobrycon >130 species; Escobar-Camacho et al., 2015; Oliveira et al., 2011; Barreto et al., 2017). In addition, an estimated 40% of species in the region have yet to be described (e.g., Reis et al., 2016).
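The per-genus accuracies (Table 3.1) and the confusion matrix (Figure 3.2) can be reproduced from a trained Fast.ai learner using its built-in interpretation utilities. The sketch below assumes the `cls_learn` and `cls_dls` objects from the earlier training sketch; it is illustrative rather than the authors' published evaluation code.

```python
import pandas as pd
from fastai.vision.all import ClassificationInterpretation

# Assumes `cls_learn` and `cls_dls` from the training sketch above are still in memory.
interp = ClassificationInterpretation.from_learner(cls_learn)

# Confusion matrix of predicted versus actual genus for the validation set (cf. Figure 3.2).
interp.plot_confusion_matrix(figsize=(12, 12), dpi=100)

# Per-genus accuracy: correct predictions on the diagonal divided by validation images per genus (cf. Table 3.1).
cm = interp.confusion_matrix()
per_genus_acc = pd.Series(cm.diagonal() / cm.sum(axis=1), index=list(cls_dls.vocab))
print(per_genus_acc.sort_values())

# Overall validation accuracy (the study reports 97.99% on 596 validation images).
print(f"Overall accuracy: {cm.diagonal().sum() / cm.sum():.4f}")
```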
Thus, species misidentifications due to taxonomically complex groups, such as tetras and other cryptic assemblages, are common problems in manual morphological identification as well as in genetic identification approaches (e.g., de Santana et al., 2021), and this must be taken into account when building an image classifier for Amazonian fishes. In short, the output given by an image classification model is only as good as the label given to each class during training. If the target class is not well defined, this ultimately may disrupt the classification accuracy for those genera. Our model is deployed through a publicly accessible web-based application available on Streamlit (a minimal sketch of such a front end is given below). Emerging technologies such as mobile applications, wireless sensor networks, augmented/virtual reality, and high-throughput computing are already advancing scientific research by enabling community scientists to bridge the training gap through instant "expert" verification (Newman et al., 2012). In the case of remote locations in the Amazon basin, like those sampled for this study, collection of accurate, reliable data is vital for monitoring freshwater ecosystem health and local fish stocks. Fish are key indicators of water quality and the health of aquatic ecosystems (Harris, 1995), and for many indigenous Amazonian communities, fish are a reliable source of protein, especially in times of hardship (Swierk & Madigosky, 2014). When deployed in the field, our model will empower community-led initiatives to monitor fish in the Amazon River basin, collect more accurate information, and identify ecological trends in this integral source of food and income (Finer et al., 2008). While we obtained a high level of accuracy in line with the results of other deep learning fish studies implementing image classifiers (Qin et al., 2015; Alsmadi et al., 2019), our study is novel because we were able to combine museum and field-collected images, and we think this is a robust framework for future studies. Using these different data sources in combination can enable novel insights that may not have been found by building separate museum and field models (Lendemer et al., 2020). Given the remoteness of the localities sampled in this study, and the cryptic nature of the species endemic to these sites, we will always be limited in the number of field images we can acquire, which can limit the scope and breadth of the model we can train. By utilizing a hybrid approach, and digitizing specimens in the museum collection, we were able to double the total number of images available to generate the model. Although past efforts have applied image classification to citizen science data collected in the field (Van Horn et al., 2018), none have targeted freshwater fish in sites as highly diverse as the upper Morona River valley. Image classification models such as the one presented here increase the accessibility of the taxonomic identification needed to accurately monitor ecosystem health and natural resources (Gardiner et al., 2012; Newman et al., 2012). In such an incredibly diverse ecosystem, a model that accurately identifies fish to the genus level is a first step that will provide motivation for increased digitization efforts to obtain sufficient images for training a model at the species level.
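Because the model is served through a publicly accessible Streamlit application (the repository is linked in the Results), a minimal sketch of how such a front end could wrap an exported learner is shown here. This is an assumption-laden illustration, not the code in the linked repository: the model file name and page layout are hypothetical.

```python
# app.py -- minimal illustrative Streamlit front end for the genus classifier.
# Run with: streamlit run app.py
import streamlit as st
from fastai.vision.all import load_learner, PILImage

st.title("Amazonian fish genus classifier (demo sketch)")

# Hypothetical file name; the published models are archived on the Smithsonian figshare repository.
learn = load_learner("fish_genus_classifier.pkl")

uploaded = st.file_uploader("Upload a (preferably masked) fish photograph", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    img = PILImage.create(uploaded)
    st.image(img, caption="Input image", use_column_width=True)
    genus, idx, probs = learn.predict(img)
    st.write(f"Predicted genus: {genus} (confidence {float(probs[idx]):.2%})")
```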
In order to bolster future modeling efforts, and enable advancements in training data acquisition, collection of photographic data must be considered standard protocol going forward on any field or museum-based ecological study. This is especially true given the rapid advancement of mobile phone photography (Rasmusson et al., 2004). Although our images were originally taken at a high resolution of varying sizes, ultimately, they were resized to just 300 x 300 pixels. At the time of this publication, there are several types of mobile phones which have cameras capable of capturing higher resolution images than the size used when training our model (Gonz?lez & Pozo, 2019). Although past efforts have been able to accurately identify fish with varying degrees of success, we expect the ever-growing amount of image data available to enable generation of even more robust and more accurate models. Collection of more fish image data from this region, and continued development, will allow for a more sophisticated version of this model to be developed in the future. While this model is specific to the Peruvian Amazon, the workflow 95 itself is publicly available on GitHub, and can be applied to any taxonomic survey. In order to aid such efforts and expand this type of initiative to a global level, we recommend that those collecting ichthyological, or any taxonomic data, incorporate targeted image capture as part of their standardized protocols, and that they be made publicly available. Conclusions We present an application that can be used to rapidly and accurately classify freshwater fish from the upper Morona River valley in the northwest Amazon to genus for scientific research. Although able to classify 33 genera present in the current study area, the model described here provides a solid foundation for future projects. The application, which can be used to classify single images to genus, is accessible to the community online. The model's application to images taken from geographic areas outside of the northwestern Amazon has yet to be explored. References Albert, J.S., Carvalho, T.P., Petry, P., Holder, M.A., Maxime, E.L., Espino, J., Corahua, I., Quispe, R., Rengifo, B., Ortega, H. & Reis, R.E. (2011). Aquatic biodiversity in the Amazon: habitat specialization and geographic isolation promote species richness. Animals, 1(2), 205-241. https://doi.org/10.3390/ani1020205 Alofs, K.M., Liverpool, E.A., Taphorn, D.C., Bernard, C.R. & L?pez?Fern?ndez, H. (2014). Mind the (information) gap: the importance of exploration and discovery for assessing conservation priorities for freshwater fish. Diversity and Distributions, 20(1), 107-113. https://doi.org/10.1111/ddi.12127 96 Alsmadi, M. K., Omar, K. B., Noah, S. A., & Almarashdeh, I. (2010). Fish recognition based on robust features extraction from color texture measurements using back-propagation classifier. Journal of Theoretical and Applied Information Technology, 18(1), 11-18 Alsmadi, M. K., Tayfour, M., Alkhasawneh, R. A., Badawi, U., Almarashdeh, I., & Haddad, F. (2019). Robust feature extraction methods for general fish classification. International Journal of Electrical & Computer Engineering, 9, 2088-8708. https://doi.org/10.11591/ijece.v9i6.pp5192-5204 Azevedo-Santos, V.M., Arcifa, M.S., Brito, M.F., Agostinho, A.A., Hughes, R.M., Vitule, J.R., Simberloff, D., Olden, J.D. & Pelicice, F.M. (2021). Negative impacts of mining on Neotropical freshwater fishes. Neotropical Ichthyology, 19. 
https://doi.org/10.1590/1982-0224-2021-0001
Barreto, C.A.V., Granja, M.M.C., Vidigal, P.M.P., Carmo, A.O. & Dergam, J.A. (2017). Complete mitochondrial genome sequence of neotropical fish Astyanax giton Eigenmann 1908 (Ostariophysi; Characidae). Mitochondrial DNA Part B, 2(2), 839-840. https://doi.org/10.1080/23802359.2017.1403869
Bodmer, R., Fang, T., Antunez, M., Puertas, P., Chota, K., Pittet, M., Kirkland, M., Walkey, M., Rios, C., Perez-Peña, P. & Mayor, P. (2017). Impact of recent climate fluctuations on biodiversity and people in flooded forests of the Peruvian Amazon. CBD Technical Series, 89, 81-90.
Borowiec, M. L., Frandsen, P., Dikow, R., McKeeken, A., Valentini, G., & White, A. E. (2021). Deep learning as a tool for ecology and evolution. EcoEvoRxiv, 1-30. https://doi.org/10.32942/osf.io/nt3as
Coomes, O. T., Takasaki, Y., Abizaid, C., & Barham, B. L. (2010). Floodplain fisheries as natural insurance for the rural poor in tropical forest environments: evidence from Amazonia. Fisheries Management and Ecology, 17(6), 513-521. https://doi.org/10.1111/j.1365-2400.2010.00750.x
Crisci, C., Ghattas, B. & Perera, G. (2012). A review of supervised machine learning algorithms and their applications to ecological data. Ecological Modelling, 240, 113-122. https://doi.org/10.1016/j.ecolmodel.2012.03.001
Dagosta, F.C. & De Pinna, M. (2019). The fishes of the Amazon: distribution and biogeographical patterns, with a comprehensive list of species. Bulletin of the American Museum of Natural History, (431), 1-163. https://doi.org/10.1206/0003-0090.431.1.1
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition, 248-255. https://doi.org/10.1109/CVPR.2009.5206848
de Oliveira, R.R., Rocha, M.S., dos Anjos, M.B., Zuanon, J. & Py-Daniel, L.H.R. (2009). Fish fauna of small streams of the Catua-Ipixuna Extractive Reserve, state of Amazonas, Brazil. Check List, 5(2), 154-172. https://doi.org/10.15560/5.2.154
de Santana, C.D., Crampton, W.G., Dillman, C.B., Frederico, R.G., Sabaj, M.H., Covain, R., Ready, J., Zuanon, J., de Oliveira, R.R., Mendes-Júnior, R.N. & Bastos, D.A. (2019). Unexpected species diversity in electric eels with a description of the strongest living bioelectricity generator. Nature Communications, 10(1), 1-10. https://doi.org/10.1038/s41467-019-11690-z
de Santana, C.D., Parenti, L.R., Dillman, C.B., Coddington, J.A., Bastos, D.A., Baldwin, C.C., Zuanon, J., Torrente-Vilara, G., Covain, R., Menezes, N.A. & Datovo, A. (2021). The critical role of natural history museums in advancing eDNA for biodiversity studies: a case study with Amazonian fishes. bioRxiv. https://doi.org/10.1101/2021.04.18.440157
Escobar-Camacho, D., Barriga, R. & Ron, S.R. (2015). Discovering Hidden Diversity of Characins (Teleostei: Characiformes) in Ecuador's Yasuní National Park. PLoS ONE, 10(8), e0135569. https://doi.org/10.1371/journal.pone.0135569
Finer, M., Jenkins, C.N., Pimm, S.L., Keane, B. & Ross, C. (2008). Oil and gas projects in the western Amazon: threats to wilderness, biodiversity, and indigenous peoples. PLoS ONE, 3(8), e2932. https://doi.org/10.1371/journal.pone.0002932
Fricke, R., Eschmeyer, W. N., & Van der Laan, R. (2018). Catalog of fishes: genera, species, references. California Academy of Sciences, San Francisco, CA, USA. http://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp
García-Dávila, C. R., Flores, M., Pinedo, L., Loyola, R., Castro-Ruiz, D., Angulo, C., Mejia, E., Sánchez, H., García, A., Chota, W., Estivals, G., Panduro, H., Nolorbe, C., Chuquipiondo, C., Duponchelle, F., & Renno, J. F. (2017). Aplicación del Barcoding al Manejo y Conservación de Peces y sus Subproductos en la Amazonía Peruana. Folia Amazónica, 26(2), 195-204. https://doi.org/10.24841/fa.v26i2.329
García-Dávila, C., Sánchez Riveiro, H., Flores Silva, M. A., Mejía de Loayza, E., Angulo Chávez, C., Castro Ruiz, D., Estivals, G., García Vásquez, A., Nolorbe Payahua, C., Vargas Dávila, G., Núñez, J., Mariac, C., Duponchelle, F., & Renno, J. F. (2018). Peces de Consumo de la Amazonía Peruana. Instituto de Investigaciones de la Amazonía Peruana (IIAP), 218, ISBN: 978-612-4372-11-7.
Gardiner, M. M., Allee, L. L., Brown, P. M., Losey, J. E., Roy, H. E., & Smyth, R. R. (2012). Lessons from lady beetles: accuracy of monitoring data from US and UK citizen-science programs. Frontiers in Ecology and the Environment, 10(9), 471-476. https://doi.org/10.1890/110185
González, A.B. & Pozo, J. (2019). The Industrial Camera Modules Market: Market review and forecast until 2022. PhotonicsViews, 16(2), 24-26. https://doi.org/10.1002/phvs.201970207
Harris, J. H. (1995). The use of fish in ecological assessments. Australian Journal of Ecology, 20(1), 65-80. https://doi.org/10.1111/j.1442-9993.1995.tb00523.x
Hernández-Serna, A. & Jiménez-Segura, L.F. (2014). Automatic identification of species with neural networks. PeerJ, 2, e563. https://doi.org/10.7717/peerj.563
Howard, J., & Gugger, S. (2020). Fastai: A layered API for deep learning. Information, 11(2), 108. https://doi.org/10.3390/info11020108
Jézéquel, C., Tedesco, P.A., Darwall, W., Dias, M.S., Frederico, R.G., Hidalgo, M., Hugueny, B., Maldonado-Ocampo, J., Martens, K., Ortega, H. & Torrente-Vilara, G. (2020). Freshwater fish diversity hotspots for conservation priorities in the Amazon Basin. Conservation Biology, 34(4), 956-965. https://doi.org/10.1111/cobi.13466
Junk, W.J., Soares, M.G.M. & Bayley, P.B. (2007). Freshwater fishes of the Amazon River basin: their biodiversity, fisheries, and habitats. Aquatic Ecosystem Health & Management, 10(2), 153-173. https://doi.org/10.1080/14634980701351023
Kirsch, J.E., Day, J.L., Peterson, J.T. & Fullerton, D.K. (2018). Fish misidentification and potential implications to monitoring within the San Francisco Estuary, California. Journal of Fish and Wildlife Management, 9(2), 467-485. https://doi.org/10.3996/032018-JFWM-020
Lendemer, J., Thiers, B., Monfils, A.K., Zaspel, J., Ellwood, E.R., Bentley, A., LeVan, K., Bates, J., Jennings, D., Contreras, D. & Lagomarsino, L. (2020). The extended specimen network: A strategy to enhance US biodiversity collections, promote research and education. BioScience, 70(1), 23-30. https://doi.org/10.1093/biosci/biz165
Lobón-Cerviá, J., Hess, L.L., Melack, J.M. & Araujo-Lima, C.A. (2015). The importance of forest cover for fish richness and abundance on the Amazon floodplain. Hydrobiologia, 750(1), 245-255. https://doi.org/10.1007/s10750-014-2040-0
Marini, S., Fanelli, E., Sbragaglia, V., Azzurro, E., Fernandez, J. D. R., & Aguzzi, J. (2018). Tracking fish abundance by underwater image recognition. Scientific Reports, 8(1), 1-12. https://doi.org/10.1038/s41598-018-32089-8
Miller-Coleman, R.L., Dodsworth, J.A., Ross, C.A., Shock, E.L., Williams, A.J., Hartnett, H.E., McDonald, A.I., Havig, J.R. & Hedlund, B.P. (2012). Korarchaeota diversity, biogeography, and abundance in Yellowstone and Great Basin hot springs and ecological niche modeling based on machine learning. PLoS ONE, 7(5), e35964. https://doi.org/10.1371/journal.pone.0035964
Moreau, M. A., & Coomes, O. T. (2007). Aquarium fish exploitation in western Amazonia: conservation issues in Peru. Environmental Conservation, 34(1), 12-22. https://doi.org/10.1017/S0376892907003566
Newman, G., Wiggins, A., Crall, A., Graham, E., Newman, S., & Crowston, K. (2012). The future of citizen science: emerging technologies and shifting paradigms. Frontiers in Ecology and the Environment, 10(6), 298-304. https://doi.org/10.1890/110294
Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), E5716-E5725. https://doi.org/10.1073/pnas.1719367115
Oliveira, C., Avelino, G.S., Abe, K.T., Mariguela, T.C., Benine, R.C., Ortí, G., Vari, R.P. & e Castro, R.M.C. (2011). Phylogenetic relationships within the speciose family Characidae (Teleostei: Ostariophysi: Characiformes) based on multilocus analysis and extensive ingroup sampling. BMC Evolutionary Biology, 11(1), 1-25. https://doi.org/10.1186/1471-2148-11-275
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L. & Desmaison, A. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 8026-8037. https://pytorch.org/
Pelicice, F.M., Bialetzki, A., Camelier, P., Carvalho, F.R., García-Berthou, E., Pompeu, P.S., Mello, F.T.D. & Pavanelli, C.S. (2021). Human impacts and the loss of Neotropical freshwater fish diversity. Neotropical Ichthyology, 19. https://doi.org/10.1590/1982-0224-2021-0134
Perdigão, P., Lousã, P., Ascenso, J., & Pereira, F. (2020). Visual monitoring of High-Sea fishing activities using deep learning-based image processing. Multimedia Tools and Applications, 79, 22131-22156. https://doi.org/10.1007/s11042-020-08949-9
Qin, H., Li, X., Liang, J., Peng, Y., & Zhang, C. (2015). DeepFish: Accurate underwater live fish recognition with a deep architecture. Neurocomputing, 187, 49-58. https://doi.org/10.1016/j.neucom.2015.10.122
Rasmusson, J., Dahlgren, F., Gustafsson, H. & Nilsson, T. (2004). Multimedia in mobile phones: The ongoing revolution. Ericsson Review, 2, 98-107.
Recknagel, F. (2001). Applications of machine learning to ecological modelling. Ecological Modelling, 146(1-3), 303-310. https://doi.org/10.1016/S0304-3800(01)00316-7
Reis, R.E., Albert, J.S., Di Dario, F., Mincarone, M.M., Petry, P. & Rocha, L.A. (2016). Fish biodiversity and conservation in South America. Journal of Fish Biology, 89(1), 12-47. https://doi.org/10.1111/jfb.13016
Ronneberger, O., Fischer, P. & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
Schuettpelz, E., Frandsen, P.B., Dikow, R.B., Brown, A., Orli, S., Peters, M., Metallo, A., Funk, V.A. & Dorr, L.J. (2017). Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal, 5, e21139.
https://doi.org/10.3897/BDJ.5.e21139
Sullivan, D.P., Winsnes, C.F., Åkesson, L., Hjelmare, M., Wiking, M., Schutten, R., Campbell, L., Leifsson, H., Rhodes, S., Nordgren, A. & Smith, K. (2018). Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nature Biotechnology, 36(9), 820. https://doi.org/10.1038/nbt.4225
Sun, X., Shi, J., Dong, J., & Wang, X. (2016). Fish recognition from low-resolution underwater images. 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 471-476. https://doi.org/10.1109/CISP-BMEI.2016.7852757
Swanson, A., Kosmala, M., Lintott, C. & Packer, C. (2016). A generalized approach for producing, quantifying, and validating citizen science data from wildlife images. Conservation Biology, 30(3), 520-531. https://doi.org/10.1111/cobi.12695
Swierk, L., & Madigosky, S. R. (2014). Environmental perceptions and resource use in rural communities of the Peruvian Amazon (Iquitos and vicinity, Maynas Province). Tropical Conservation Science, 7(3), 382-402. https://doi.org/10.1177/194008291400700303
Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P. & Belongie, S. (2018). The iNaturalist species classification and detection dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8769-8778. https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00914
Wäldchen, J. & Mäder, P. (2018). Machine learning for image based species identification. Methods in Ecology and Evolution, 9(11), 2216-2225. https://doi.org/10.1111/2041-210X.13075
Weinstein, B. G. (2017). A computer vision for animal ecology. Journal of Animal Ecology, 87(3), 533-545. https://doi.org/10.1111/1365-2656.12780
White, A. E., Dikow, R. B., Baugh, M., Jenkins, A., & Frandsen, P. B. (2020). Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning. Applications in Plant Sciences, 8(6), e11352. https://doi.org/10.1002/aps3.11352
Winemiller, K.O., McIntyre, P.B., Castello, L., Fluet-Chouinard, E., Giarrizzo, T., Nam, S., Baird, I.G., Darwall, W., Lujan, N.K., Harrison, I. & Stiassny, M.L.J. (2016). Balancing hydropower and biodiversity in the Amazon, Congo, and Mekong. Science, 351(6269), 128-129. https://doi.org/10.1126/science.aac7082

Appendices

Appendix I. Sample-Size-Controlled Haplotype Distance Matrices

Table A.1. Haplotype distance matrices: genetic distance expressed as pairwise nucleotide differences (top) and Euclidean distances between morphospace centroids (bottom). https://tinyurl.com/3av2x665

Appendix II. Morphospace UMAP Projection with Image Overlay

Figure A.1. Euclidean UMAP projection (number of neighbors = 9) of Chelonia mydas carapace images (n = 204) embedded using a MobileNetV2 neural network, with the original images overlaid on the projection.
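
For readers who wish to reproduce the layout of Figure A.1, the short Python sketch below outlines the embedding-and-projection step summarized in the caption. It is a minimal illustration rather than the exact pipeline used here: only the Euclidean metric and the neighborhood size of nine come from the caption, while the ImageNet weights, the 224 x 224 input size, the carapace_images/ directory, and the random seed are assumptions added for the example.

from pathlib import Path

import numpy as np
import tensorflow as tf
import umap  # provided by the umap-learn package

# Pretrained MobileNetV2 used as a fixed feature extractor (ImageNet weights, assumed);
# global average pooling collapses each image to a single 1280-dimensional vector.
encoder = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(224, 224, 3)
)

def embed_images(image_paths):
    """Load, resize, and preprocess images, then return MobileNetV2 feature vectors."""
    batch = []
    for path in image_paths:
        img = tf.keras.utils.load_img(path, target_size=(224, 224))
        arr = tf.keras.utils.img_to_array(img)
        batch.append(tf.keras.applications.mobilenet_v2.preprocess_input(arr))
    return encoder.predict(np.stack(batch), verbose=0)

# Hypothetical folder holding the masked carapace images (n = 204 in Figure A.1).
carapace_paths = sorted(str(p) for p in Path("carapace_images").glob("*.png"))
features = embed_images(carapace_paths)

# Euclidean UMAP projection with the neighborhood size reported in the caption.
reducer = umap.UMAP(n_neighbors=9, metric="euclidean", random_state=42)
coords = reducer.fit_transform(features)  # (n_images, 2) morphospace coordinates

Scatter-plotting coords and drawing each source photograph at its projected position (for example with matplotlib's OffsetImage annotation boxes) yields an image-overlay plot in the style of Figure A.1.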