ABSTRACT

Title of dissertation: A WIDE SCALE INVESTIGATION INTO LNCRNA IN BOS TAURUS
Alexis Marceau, Doctor of Philosophy, 2023
Dissertation directed by: Professor Li Ma
University of Maryland Department of Animal and Avian Science

Although the history of genetic research has focused on genes and gene products, there is an interesting emerging subclass of genetic elements: long noncoding RNAs (lncRNAs). These are portions of the genome that are transcribed from DNA into RNA molecules longer than 200 nucleotides but do not yield a protein. The function of lncRNA is wide-reaching and difficult to define; however, lncRNA are predominantly linked to the regulation of gene expression. This regulation occurs via transcriptional control, translational control, pre- and post-transcriptional and translational control, epigenetic modifications, RNA processing, and other mechanisms [1].

In this dissertation, multiple Bos taurus tissues across various life conditions were investigated in order to identify lncRNA and to begin making predictions about the role and function of identified transcripts. First, lncRNA were identified and analyzed in Bos taurus rumen tissue in pre-weaning and post-weaning cattle. These lncRNA were implicated in the weaning process and demonstrated enrichment in complex traits, indicating the continued impact rumen-associated lncRNA have on dairy cattle. Following this study, mammary tissues from dry and lactating cattle were used for lncRNA analysis in relation to the lactation process. This study revealed both the presence and impact of mammary lncRNA, and identified lncRNA associated with genes and biological processes that are strongly linked to lactation and mammary tissue function. Subsequently, immune system related tissues were analyzed for lncRNA and their roles. This investigation demonstrated lncRNA to be present in all investigated tissues, including transcripts found repeatedly across tissues.
Further analysis associated the identified lncRNA with genes and functions that are crucial to immune response. Finally, a tutorial was created to make lncRNA identification research more easily accessible to future researchers. The findings and creations of this dissertation increase the knowledge base of lncRNA and their role, allowing for further research endeavors and improvements in Bos taurus husbandry.

A WIDE SCALE INVESTIGATION INTO LNCRNA IN BOS TAURUS

by

Alexis Marceau

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy
2023

Advisory Committee:
Professor Li Ma, Chair/Advisor
Professor Lindley Darden
Professor Najib El-Sayed
Dr. George Liu
Professor Mohamed Salem

© Copyright by Alexis Marceau 2023

Dedication

To my mom, I can’t make you a coauthor so I hope this is enough.

Acknowledgments

There are more people to thank than words in this dissertation. From the smallest words of encouragement to those who read and reread my ramblings, none of this could have happened without all of you.

First, I would like to thank my advisor, Dr. Li Ma, and the defense committee, for believing in me and allowing me to work in my own strange ways. I am forever grateful for your role in this chapter of my life. This gratitude extends to the Department of Animal and Avian Science, the Graduate School, and the University of Maryland for their varied support.

I would also like to thank all my lab mates, past and present. From commiserating to encouraging to checking in, you have all provided a support network I could not have succeeded without.

On more personal notes, there are many people who deserve to have their names associated with this work. As an homage to my time working in a library, in alphabetical order:
Tera German, for holding me accountable, even when I railed against it.
Jordan Graham, for contributions that are uncountable, invaluable, and everlasting.
Zach Hood, for being both my moral and logical compass for the past 15 years.
Tori Iqbal, for being a lighthouse in the storm of graduate school since well before my first day.
Jonathan Mason, for being a new mentor when I didn’t know I needed one.
And Shannon Schreiner, who I truly owe everything to.

I’d also like to thank my family, both human and animal, for their love and support. It was undying and unwavering even when I felt as though I was dying or wavering.

Finally, on the most personal note, I would like to thank my partner of the past decade, Nathan White. You have brightened my life since we were kids; without your humor, support, and love, I don’t know where I’d be, but I sure wouldn’t have laughed nearly as much.

Table of Contents

Dedication
Acknowledgments
1 Literature Review
    1.1 Study Objectives
    1.2 An Overview of Cattle
    1.3 Functions of Tissues of Interest
        1.3.1 Functions of Rumen Tissue
        1.3.2 Mammary Tissue Function
        1.3.3 Immune System Tissues and Their Functions
    1.4 Development of Tissues of Interest
        1.4.1 Development of Rumen Tissue
        1.4.2 Development of Mammary Tissue
        1.4.3 Development of the Immune System and Associated Tissues
    1.5 Genetic Changes in Tissues of Interest
        1.5.1 Genetic Changes of Rumen Tissue Through Development
        1.5.2 Genetic Changes in Mammary Tissue As Development Progresses
        1.5.3 Genes Associated with the Immune System and its Activation
    1.6 Long Non-Coding RNA
        1.6.1 The History of lncRNA
        1.6.2 Notable lncRNA and their Functions
        1.6.3 Gaps in lncRNA Identification Software
    1.7 Contemporaneous Studies
        1.7.1 The Conservation and Signatures of lincRNAs in Marek’s Disease of Chicken
        1.7.2 Comprehensive Analysis of Differentially Expressed mRNA, lncRNA and circRNA and Their ceRNA Networks in the Longissimus Dorsi Muscle of Two Different Pig Breeds
        1.7.3 Genome-Wide Identification and Characterization of Long Non-Coding RNAs in Longissimus dorsi Skeletal Muscle of Shandong Black Cattle and Luxi Cattle
    1.8 References
2 Investigation of Rumen Long Noncoding RNA Before and After Weaning in Cattle
    2.1 Abstract
    2.2 Background
    2.3 Results
        2.3.1 lncRNA Identification
        2.3.2 lncRNA Characteristics with Comparison to Protein Coding Transcripts
        2.3.3 Differential Expression Analysis Before and After Weaning
        2.3.4 Analysis of lncRNA Sequence Conservation
        2.3.5 Transcriptional Annotation of Common and Conserved lncRNA
        2.3.6 SNP heritability Enrichment Analysis on Cattle Traits
    2.4 Discussion
    2.5 Conclusions
    2.6 Materials and Methods
        2.6.1 Animals and Tissue Collection
        2.6.2 RNA sequencing, Transcriptional Mapping and Assembly
        2.6.3 lncRNA Identification
        2.6.4 Comparison to Coding Transcripts
        2.6.5 PhastCons Analysis
        2.6.6 Transcriptional Annotation
        2.6.7 Heritability Enrichment Analysis
    2.7 References
3 Investigation of lncRNA in Bos taurus Mammary Tissue During Dry and Lactation Periods
    3.1 Abstract
    3.2 Background
    3.3 Results
        3.3.1 lncRNA Identification
        3.3.2 lncRNAs as Compared to Protein Coding Transcripts
        3.3.3 Differential Expression of lncRNA in Dry Versus Lactating Mammary Tissue
        3.3.4 Sequence Conservation of lncRNA
        3.3.5 Annotation of lncRNA of Interest Based on Homology
        3.3.6 lncRNA-Gene Coexpression Correlation and Ontology
        3.3.7 lncRNA-Gene Coexpression Correlation Annotation
        3.3.8 SNP Heritability Enrichment Analysis on Cattle Traits
    3.4 Discussion
    3.5 Conclusions
    3.6 Materials and Methods
        3.6.1 Data Collection
        3.6.2 Sequence Assembly and Mapping
        3.6.3 lncRNA Identification
        3.6.4 Comparison to Coding Transcripts
        3.6.5 Differential Expression Analysis
        3.6.6 Phastcon Analysis
        3.6.7 Transcriptional Annotation Based on Sequence Homology
        3.6.8 Gene Co-Expression Correlation and Ontology
        3.6.9 Transcriptional Annotation Based on Correlation
        3.6.10 Heritability Enrichment Analysis
    3.7 References
4 An Exploration of lncRNA in Immune System Associated Tissues in Bos taurus
    4.1 Abstract
    4.2 Background
    4.3 Results
        4.3.1 lncRNA Identification
        4.3.2 Comparisons Between lncRNA and Coding Transcripts
        4.3.3 Common lncRNA and Differential Expression
        4.3.4 Sequence Conservation of lncRNA
        4.3.5 Transcriptional Annotation of lncRNA Based on Sequence Homology
        4.3.6 Genetic Co-expression Correlation and Annotation
        4.3.7 Gene Cluster Ontology Based on lncRNA Correlation
        4.3.8 Integration of GWAS Data Using SNP Heritability Enrichment Analysis
        4.3.9 Immune-system Related lncRNA as Compared to Other lncRNA Profiles
    4.4 Discussion
    4.5 Conclusions
    4.6 Materials and Methods
        4.6.1 Data Collection
        4.6.2 Sequence Assembly and Mapping
        4.6.3 lncRNA Identification
        4.6.4 Comparison to Coding Transcripts
        4.6.5 Differential Expression Analysis
        4.6.6 Phastcon Analysis
        4.6.7 Transcriptional Annotation Based on Sequence Homology
        4.6.8 Gene Co-Expression Correlation
        4.6.9 Gene Ontology
        4.6.10 Heritability Enrichment Analysis
    4.7 References
5 A Comprehensive Tutorial for Identification of lncRNA from RNA-Seq Data
    5.1 Introduction
    5.2 Materials
        5.2.1 Introduction
        5.2.2 Set Up
        5.2.3 Software Installation
            Hisat2
            Samtools
            ChromToUcsc
            Stringtie
            Cufflinks
            lncRNA Identification Pipeline
            Bedtools
            Seqtk
            CPC2
            Hmmer
            BLAST
            SRA-Toolkit
        5.2.4 Reference Files
            Introduction
            Reference Genome(s)
            Reference Fasta
            Chromosome Alias Files
            Unscaffolded Chromosome File
            Pfam Database
            BLAST Database
    5.3 Methods
        5.3.1 Introduction
        5.3.2 Tutorial
            Obtain data
            Build index
            Align data
            Secondary alignment
            Convert SAM file to BAM file
            Sort BAM file
            Standardize reference genome naming conventions
            Transcript Assembly
            Split files
            Cuffcompare
            Reconstitute files
            Intergenic list
            Generate intergenic GTF
            Generate a summary file
            Filter by size
            Convert file format
            Combine samples
            Split bed file
            Obtain fasta file
            Reconstitute fasta file
            Find reverse transcript
            Coding Potential Calculator
            Coding Potential Filtration
            Generate noncoding lists
            Finalize noncoding list
            Convert file
            Split bed file
            Obtain fasta file
            Reconstitute fasta file
            Move codon.txt
            Convert to protein sequence
            Remove excessively long transcripts
            Split protein files
            Parse Pfam database
            Merge all outputs
            Clean up
            Filter Pfam list
            List tidying
            Convert to bed file
            Split bed file
            Obtain fasta file
            Reconstitute fasta file
            Prepare BLAST database
            Run BLAST
            Filter BLAST results
            Generate match list
            Convert list file format
            Generate final lncRNA list
    5.4 Further Analysis and Notes
        5.4.1 Further Analysis
            Expression
            Conservation
            Annotation
            Other
        5.4.2 Notes
            Data Management
            Error Troubleshooting
            Words of Encouragement
    5.5 References
6 Conclusion
Bibliography

Chapter 1: Literature Review

1.1 Study Objectives

The overall objective of this study is to explore lncRNA across multiple Bos taurus tissues and developmental time points to build a deeper understanding of the genetic landscape that shapes these important production animals.

This dissertation is organized as follows: Chapter 1 seeks to establish a knowledge base about topics related to each research endeavor. Chapter 2 investigates lncRNA in cattle rumen tissue before and after weaning. Chapter 3 explores lncRNA in cattle mammary tissue during the dry and lactating periods. Chapter 4 is a multi-tissue analysis focusing on tissues associated with the immune system in Bos taurus. In all studies, after identification, lncRNA are analyzed for trends and functional annotation. Chapter 5 is a plain-language tutorial for lncRNA identification from genetic sequence data, intended to fill the current gap in accessible lncRNA identification tools. The final chapter, Chapter 6, summarizes the conclusions of all lncRNA identification projects, discusses avenues for future research, and provides closing remarks.

To begin this dissertation, I will review pertinent literature on related topics. This will include an overview of cattle; the function, development, and genetic changes associated with the investigated tissues; an overview of long non-coding RNA (including their history, influential lncRNA, and the current status of identification software); reviews of recent, related studies; and the objectives of the studies conducted within this dissertation.

1.2 An Overview of Cattle

Bos taurus, or cattle, are hoofed ruminant livestock animals commonly used for their meat and dairy.
Descended from wild ox, modern cattle have been traced back to a small population of aurochs dating back 10,500 years and have been selectively bred to increase meat and milk production since their domestication [4]. Today, Bos taurus cattle tend to be divided into two groups: those raised for meat (including breeds such as Angus and Hereford) and those raised for dairy (breeds include Holstein, Guernsey, Brown Swiss, and Jersey, to name a few), although it is possible to use a single animal for both purposes [1, 32]. They are found worldwide and are more concentrated in regions with large, open areas where herds feed on herbaceous plant material. Cattle have a ruminant digestive system, using a four-compartment stomach to break down plant material into usable nutrients [2].

Cattle tend to reproduce on a yearly basis, gestating for 9 months before birthing a single calf (multiple births are possible but uncommon) [2]. Like most mammals, once a cow has birthed her offspring, she will begin to lactate. She will lactate for up to 12 months, during which she is milked by hand or machine before her milk is further processed into consumer-ready milk or other dairy products [31]. Depending on the nature of the farm (dairy or beef), the offspring may be kept in the herd to generate more milk and/or offspring, or it may be raised to be processed for beef when it reaches the appropriate size [38].

1.3 Functions of Tissues of Interest

1.3.1 Functions of Rumen Tissue

As previously mentioned, cattle are ruminant animals. This means they use a multi-chambered stomach to ferment feed, breaking it down from indigestible material into energy precursors [41]. The largest portion of the chambered system is the rumen; this large stomach compartment is made up of several sacs and holds more than 25 gallons of material. It performs several key functions: absorption, transport, volatile fatty acid metabolism, and protection [12, 42].
When feed is ingested by cattle, it is chewed quickly and swallowed. The swallowed material enters the rumen, where liquid material passes quickly to the rest of the stomach and solid material faces one of two fates: being regurgitated and chewed further (this is called “chewing the cud”) or forming a dense mat within the rumen. The mat of solid material ferments in the rumen for up to 48 hours; during this time, a carefully balanced community of microbes goes to work breaking down the material into nutrients the animal can use [41]. This process is assisted by the constant movement and contractions of the rumen; a healthy animal should have one to two rumen contractions a minute. This movement functions to mix the rumen content, force rumen content into contact with undigested material, compact material into the fermenting mat, and move appropriately digested material out of the rumen [39]. This process allows cattle to extract more nutrients from their food and allows farmers to feed by-products to their livestock, keeping feed costs down while still ensuring adequate nutrition for their herd [33].

1.3.2 Mammary Tissue Function

As time has progressed, dairy cattle have been selectively bred for higher milk production, as a more productive animal is a more valuable animal. The most important organ for this production is the mammary gland: a pair of apocrine glands located on each side of the anterior chest wall. This gland is made up of three distinct regions: skin, stroma, and parenchyma [23]. The skin is represented by the nipple, which functions to secrete milk, and the areola, which lubricates the nipple during nursing [3]. The stroma is composed of fibrous and fatty components, which act together to create the framework of the breast around the parenchyma. Finally, the parenchyma is made up of 15 to 20 alveoli-containing lobules that branch into a duct network.
These ducts come together to form several major (or main) ducts that each drain into a lactiferous sinus, where milk is collected. These sinuses are arranged radially around the nipple and are stimulated to release milk when a calf begins to suckle (or when a cow is milked) [23].

Given that milk is primarily water, the additional nutrients must be supplied by the animal’s diet. Once the food material has been adequately digested, the nutrients are absorbed into the bloodstream and shuttled to various locations, one of which is the mammary gland. Once the nutrient-rich blood reaches the mammary gland, epithelial cells take up the nutrients and infuse them into the developing milk [20]. The excretion of milk is a two-stage process: a preparatory phase and an excretion phase. The preparatory phase is composed of cellular and enzymatic changes within the alveolar cells. The excretion phase is driven by hormonal changes, with prolactin playing a key role [40]. Overall, the milk production process is complex and multifaceted.

1.3.3 Immune System Tissues and Their Functions

The world is riddled with pathogens that, although just trying to live and reproduce, cause sickness and disease. The immune system is the line of defense against these pathogens. Its function can be summarized as three main tasks: fight pathogens, neutralize harmful environmental particulates, and protect the body from rogue cells (like cancer) [21]. The immune system protects the body from outside threats (both living and environmental) with two distinct defense systems: the innate immune response and the adaptive immune response. The innate immune system uses preventative measures, such as skin and mucous membranes, and responsive cells, such as phagocytes, to act immediately, although in a non-pathogen-specific way.
Phagocytes enclose and digest foreign particles, neutralizing the threat; they are also able to trigger cascades of signaling pathways to prepare the adaptive immune system for a pathogen-specific response. The adaptive immune system requires activation before it mounts a response targeted at specific pathogens. This system is made up of lymphocytes (both B cells and T cells) and antibodies. T cells use a cell surface protein to interact with germs in a lock-and-key fashion, replicating and creating more T cells once a matching pathogen has been found. There are three types of T cells: helper T cells, which use chemical messengers to activate further adaptive immune responses; cytotoxic T cells, which detect virus-infected cells and tumor cells and destroy them; and memory T cells, which are helper T cells that remain after infection and are primed to initiate an immune response if the pathogen returns. B cells are activated by helper T cells: the T cell interacts with a matching B cell (the match being made between the cell surface proteins), causing the B cell to multiply and transform into antibody-producing plasma cells. The transformed B cells can then release the antibodies into the bloodstream, creating a flurry of pathogen-specific antibodies within the blood. The final component of the adaptive immune system is antibodies. These are protein-sugar compounds that circulate within the bloodstream and can quickly detect pathogens. Once an antibody binds to a pathogen, the pathogen is neutralized and other immune cells are attracted to begin combating the infection. Antibodies also act in a specific lock-and-key fashion. The final function of the immune system is to protect the body from cells that have been compromised by DNA damage or viral infection. Natural killer cells scan the surface of cells, looking for abnormalities, and use cell toxins to destroy the cells they identify, in a fashion similar to cytotoxic T cells [37].
1.4 Development of Tissues of Interest

1.4.1 Development of Rumen Tissue

Although cattle are ruminants, their rumen is underdeveloped and nonfunctional at birth. In a newborn calf, the rumen takes up approximately 35% of the stomach capacity and is not used, as the calf feeds exclusively on milk. When the calf suckles, the milk travels down the esophageal groove, bypassing the rumen and directly entering the abomasum for digestion. Rumen maturation is two-fold: the development of rumen papillae and the development of rumen muscle mass. Fermentation is required to increase the size and muscle mass of the rumen, but this is not possible without first establishing a microbial community within the rumen. Luckily, the birthing process, exposure to the postnatal environment, and the introduction of dry matter all lead to the colonization of the rumen by microbes and assist in the development of this crucial microbial environment [36]. When dry feed is introduced, acetic, propionic, and butyric acids are produced, lowering the pH of the anaerobic rumen and allowing for continued microbial growth and development; the development of the microbial ecosystem also stimulates the growth of papillae. The introduction of dry matter also physically stimulates the development of rumen mass by forcing the tissue to expand to accommodate the feed. The combination of physically increasing the rumen size and the development of necessary papillae and microbes leads to the maturation of the rumen [42].

1.4.2 Development of Mammary Tissue

Similar to rumen tissue, mammary tissue is underdeveloped at birth. Unlike rumen tissue, however, mammary tissue remains underdeveloped until the cow approaches puberty. For dairy cows, puberty occurs after 13-14 months, when they begin to ovulate and are physically ready to calve [18]. At this point, the increase in hormone levels leads to an increase in mammary tissue size.
However, maturation arrests after the mammary fat pad fills. The mammary tissue is larger and the duct network has begun to branch, but it is still not fully developed or functional [13]. The maturation of mammary tissue resumes during gestation and after birth with a surge of hormones like progesterone and estrogen. The next stage of development includes another increase in size and the finalization of the duct network branching. The duct network also becomes surrounded by connective tissue, developing into secretory alveoli. The final stage of development is triggered by an increase in prolactin and leads the ducts to fill with milk. The mammary gland is now fully developed and ready to nurse a calf. The mammary gland will be refilled with milk after the calf nurses until the calf is weaned. At that point, the gland will remain at its final size, but the alveoli will shrink and disappear until the cow is pregnant again, at which point the cycle continues. Once a cow reaches menopause and is no longer able to become pregnant, the lack of estrogen will cause the mammary gland to atrophy and decrease in size [23].

1.4.3 Development of the Immune System and Associated Tissues

The immune system in calves begins developing from conception and does not reach full maturity until 6 months after birth. Given the high number of pathogens young animals are exposed to, it is crucial that their immune system develops strongly. Fetal calves rely on their mother’s immune system and their own innate immune system, as they have yet to be exposed to any pathogens that would trigger and train their adaptive immune system. It is not until late gestation that calves develop their phagocytic cells, increasing the strength of their innate immune system. Regarding the adaptive immune system, unless a calf has been infected prior to birth, newborn calves lack antibodies completely. B and T cells are present in fetal calves, but their numbers decrease as birth approaches.
Since newborn calves are immunologically naïve, they rely heavily on the passive immunity provided by their mother’s colostrum (which is primarily composed of antibodies, cytokines, and cells). In healthy calves, the colostrum is absorbed by the intestinal cells, giving the calf’s immune system a jump start [10]. As the immune system develops, the number of immune cells must increase. Most immune cells are made in the bone marrow before traveling to other parts of the body. Lymphocytes are made in the bone marrow, and those that remain there mature into B cells. Other lymphocytes travel to the thymus, where they mature into T cells. After development, many immune cells reside in strategically placed lymph nodes and within lymphatic fluid. The placement of these immune cell home bases allows for rapid immune responses when an infection looms. The spleen also holds a small army of immune cells; as blood passes through the spleen, it is filtered for pathogens and the local immune cells can respond to any detected threats [34].

1.5 Genetic Changes in Tissues of Interest

1.5.1 Genetic Changes of Rumen Tissue Through Development

While rumen development is reliant on feed intake, genetic changes are also necessary to facilitate that development. In fact, over 6,000 genes have been found to be differentially expressed at various stages of tissue development, and functional annotation and protein-protein interaction analyses pointed to 9 genes directly involved in rumen development [45]. The uptick in butyrate as dry feed is introduced can lead to changes in structural integrity, epigenetic regulation, signaling pathways, and more [11]. Direct addition of butyrate via milk or top-dressed feeds led to the identification of 1,977 differentially expressed genes in rumen epithelial cells.
These genes were associated with functions such as regulatory and signaling pathways [27]. Changes in chromatin structure, epigenetic interactions (for example, DNA methylation and histone modification), and extracellular interactions were also seen as calves were weaned off milk and onto dry feed [14]. The growing microbial community within the rumen also influences gene expression in the early development of rumen tissue. A multifaceted study investigating the relationship between the rumen microbiota and host transcriptomes revealed that the volatile fatty acids released by the microbial colony led to changes in host mRNA and miRNA expression, with three gene modules and one miRNA associating significantly with volatile fatty acids. Thirteen genetic elements were also associated with a rumen bacteria cluster [30]. Findings that link rumen development, both physical and microbiological, to genes indicate that genetic changes occur alongside the morphological changes of the developing rumen.

1.5.2 Genetic Changes in Mammary Tissue As Development Progresses

Given the large changes that occur in the mammary gland during mammogenesis, many genes must be differentially expressed to generate such changes. Knockout models uncovered 2 genes that, when knocked out, disrupted the formation of mammary placodes. In investigations of mammary gland aplasia, a mutation in the TBX3 gene was found to lead to a lack of mammary placodes; this gene appears to be key to mammary gland development. Ductal tree formation also appears to rely on the parathyroid hormone-related protein (PTHrP) gene, as disruptions in this gene lead to bud formation that ceases before branching into ductal trees [19]. When mammary tissue was analyzed at various time points relating to pregnancy, and thus to mammary gland development, 4,843 differentially expressed genes were identified.
Among those genes were 110 that showed a peak profile associated only with lactation, with much lower expression during pregnancy and after lactation. The identified genes also included several transcription factors, indicating that differentially expressed genes have cascading effects as maturation progresses [9]. Various growth factor expression profiles also change over the course of mammary development, from prenatal mammary cell differentiation to pubescent and gestational maturation. For these profiles to vary, the expression patterns of their associated genes must also vary [29].

1.5.3 Genes Associated with the Immune System and its Activation

While a variety of genes are obviously needed to create the various cells of the immune system, are there differences in gene expression in response to immune system activation? The answer is yes. Toll-like receptors (TLRs) are a key component of innate immunity, and a comparison of unstimulated monocytes with TLR4-stimulated monocytes found 1,471 expression quantitative trait loci (eQTLs) that were present in the stimulated monocytes but absent in the unstimulated monocytes. These eQTLs could be associated with genes related to TLR4 activation. These findings demonstrate that genetic changes are key to the innate immune response [24]. Changes in gene expression are equally crucial to the adaptive immune system. Without these changes, adaptive immune cells would have no way to modify their cell surface proteins and become specialized. First, to transform from an early thymic progenitor cell into a T cell, B cell, natural killer cell, myeloid cell, or dendritic cell, the expression of genes like Flt3, CD24, and CCR9 must vary. The interactions these cells have with specific ligands also trigger cascades of gene expression changes, leading cells to differentiate and follow various pathways.
Activation of adaptive immune cells likewise depends on interactions that lead to gene expression changes; these changes in turn produce distinct subpopulations of immune cells, all ready to fight specific pathogens [8].

1.6 Long Non-Coding RNA

1.6.1 The History of lncRNA

It has been known since the 1950s that the amount of haploid DNA shows little to no correlation with an organism’s size or complexity. Were the two correlated, a larger genome would mean a more complex creature; this simply is not the case. In the 1970s, it became apparent that not all of the genome encoded genes, as the amount of material present far exceeded what the predicted number of human genes could account for. This led to the belief that the majority of the genome was noncoding “junk,” too rife with transposons, pseudogenes, and repeating sequences to be of any use. The idea that the rest of the genome was just junk was short-lived: it was soon uncovered that portions of the genome were transcribed into various forms of RNA (rRNA and tRNA being the first two major classes discovered). 1977 brought the discovery of introns, and the 1980s bore snRNAs and snoRNAs. The discovery of snRNAs and snoRNAs gave way to the theory that these small RNAs act in a regulatory fashion, being implicated in post-transcriptional RNA processing. It was not until whole genome sequencing technologies began to rise in the early 2000s that researchers were able to appreciate the vastness of the non-coding portions of the genome [25].

The early 1990s brought the identification of the first long noncoding RNAs (lncRNA), noncoding RNA transcripts that exceed 200 base pairs in length. These findings included H19 in 1990 and Xist in 1992, though they were hotly contested because the transcripts could not readily be assigned a function, which cast doubt on the value of the findings. As technology advanced, it became clear lncRNA were here to stay [25].
lncRNA have now been discovered in every group of organisms in which they have been sought (from viruses to prokaryotes to fungi to animals). They play roles in cis- and trans-acting gene expression, cis- and trans-acting chromatin remodeling, molecular scaffolding, cellular process regulation, cellular differentiation, sex determination, sequence divergence, and likely other lncRNA-specific functions that have yet to be discovered [22].

1.6.2 Notable lncRNA and their Functions

In January of 1990, Molecular and Cellular Biology published Brannan et al.’s article “The product of the H19 gene may function as an RNA.” This was the first publication to feature a lncRNA. The mouse gene H19 appeared to encode a hepatic fetal-specific mRNA under the transcriptional control of a trans-acting locus termed “raf.” The “gene” was transcribed into mRNA, but what protein it was supposed to produce was unclear. The transcript contained many small open reading frames and multiple termination codons. The RNA also did not appear to interact with translational machinery. All this evidence led the group to hypothesize that H19 was not a normal RNA that would be translated into a functional protein, but an RNA molecule that never proceeded further along the transcription-translation pathway [5].

The next big lncRNA, though not yet classified as one, entered the scene a year later in January of 1991: XIST. This gene was implicated in X chromosome inactivation. In situ hybridization demonstrated that XIST repeatedly localized to the Xq13 region in inactivated X chromosomes [7]. Further research into XIST revealed the “gene product” was actually a 15 kb inactive X-specific transcript that lacked a conserved open reading frame. The transcript was riddled with tandem repeats and did not associate with translational machinery; it is also found almost exclusively in the nucleus.
These findings led researchers to believe XIST was a functional RNA supporting X chromosome inactivation, and it was later classified as a lncRNA [6].

Perhaps one of the most recognizable lncRNA to be discovered is HOTAIR, a member of the HOX noncoding RNA (ncRNA) family. What makes the HOX ncRNA family so memorable is its members’ tendency to be expressed along developmental axes and to mark regions of differential histone modification and variable RNA polymerase accessibility. HOTAIR is a 2.2 kilobase ncRNA located within the HOXC locus. It appears to interact with the polycomb repressive complex 2 (PRC2) and is a necessary component for both PRC2 occupancy and histone H3 lysine-27 trimethylation of the HOXD locus. If other lncRNA act in a manner similar to HOTAIR, lncRNA may be able to act as gene silencers from a distance, associating them with gene regulation [35].

Today, lncRNA have been implicated in a wide range of biological functions: chromatin remodeling, transcriptional activation, transcriptional interference, RNA processing, and mRNA translation. These associations connect lncRNA to growth, development, stress responses, cell differentiation, regulation, cell cycle control, and gene expression control at all levels. These non-coding transcripts are pervasive and extensive [44].

1.6.3 Gaps in lncRNA Identification Software

With the growing interest in lncRNA, an increase in lncRNA identification tools would be expected, and while such an increase has occurred, the available tools are sorely lacking in usability.

Developed in 2014, PLEK, or “the predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme”, is a computational pipeline used to distinguish lncRNA from mRNA using a k-mer scheme. The tool was demonstrated to be robust and accurate in humans, with highly accurate results in other vertebrates as well.
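To make the k-mer idea concrete, the short Python sketch below counts overlapping k-mers in a transcript and converts them to relative frequencies, the kind of sequence-composition feature a classifier could use to separate lncRNA from mRNA. It is illustrative only: it does not reproduce PLEK’s improved k-mer scheme or its classifier, and the function name kmer_frequencies and the toy sequence are assumptions introduced here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k):
    """Return the relative frequency of every possible k-mer in a transcript."""
    # Treat RNA and DNA input alike by mapping U to T.
    seq = sequence.upper().replace("U", "T")
    # Slide a window of width k along the sequence, one base at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k k-mers so every transcript yields the same feature set.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are left out of the total.
    total = sum(counts[kmer] for kmer in kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


# Hypothetical toy sequence; real lncRNA transcripts exceed 200 bases.
features = kmer_frequencies("ACGUACGU", 2)
print(len(features), round(features["AC"], 3))
```

Returning a value for every possible k-mer, including those with zero counts, keeps the feature vector the same length for every transcript, which is what a downstream classifier would need.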
Although useful, PLEK functions more to distinguish between two groups of transcripts than to identify lncRNA from scratch [26].

Lncident was developed in 2016 as a novel method of identifying lncRNA, based on the researchers’ belief that rapid discrimination between lncRNA and mRNA was crucial for the functional discovery of lncRNA. The developers claim the program achieves high speed without loss of accuracy. Lncident is available as an R package, which limits its usefulness to users who are comfortable with R. A web server was created for this tool, but it was limited to human, mouse, and C. elegans models; at this time, the web server is not available [16].

LncFinder is a tool created in 2019. It was developed as an integrated platform to promote lncRNA research, intended both for the identification of lncRNA and for the attribution of properties. It also aimed to improve on the tools available at the time. The developers found that the tool outperformed contemporaneous tools and was robust, providing satisfactory results. Like Lncident, this tool is available as an R package and requires users to be comfortable with R. Again like Lncident, a web server was developed but is currently not available [15].

Other lncRNA identification tools and tutorials are available; however, they all appear to have serious drawbacks. With problems ranging from lack of maintenance to limited functionality, it is not easy to find a useful, plain-language guide to identifying lncRNA at this time.

1.7 Contemporaneous Studies

1.7.1 The Conservation and Signatures of lincRNAs in Marek’s Disease of Chicken

In this study, He et al. endeavored to investigate lincRNA expression as it relates to the T cell lymphoma induced in chickens by the Marek’s disease virus.
Marek’s disease virus (MDV), and the resulting Marek’s disease (MD), is highly contagious and deadly; the virus is also prone to increases in virulence, which can render current vaccines ineffective. Investigation of chicken bursa tissue at 3 time points post-infection revealed 1,225 transcripts in 1,056 loci.

Compared to coding transcripts, the identified transcripts were shorter, had fewer exons per transcript, and showed lower expression and less sequence conservation. A conserved transcript of note showed positional equivalence with HOTTIP, a lincRNA that has been well documented in the human and mouse genomes. HOTTIP is believed to act as a 5’ HOXA gene activator, and correlation analysis showed this transcript to be positively correlated with HOXA expression in the analyzed samples. Positional analysis was used to ascertain the genes neighboring the identified lincRNA; these neighboring genes were then assessed for correlation with the transcripts, and finally the correlated genes were assessed for gene ontology. This analysis revealed lincRNA to be connected to the regulation of transcription, signaling, and the regulation of development.

When connecting lincRNA to Marek’s disease and resistance to the virus, linc-satb1 emerged as an interesting finding. This lincRNA was positively associated with defense response, inflammatory response, lymphocyte activation, and response to external stimulus, and negatively associated with cell cycle-related functions such as cell cycle process and DNA replication. It was also highly expressed only in infected birds from an MD-resistant line, implicating this lincRNA in the immune response to MDV. The nearby gene, SATB1, is a genome organizer that regulates chromatin structure and a transcription factor that controls many genes involved in T cell development and activation; these associations further connect this gene and the lincRNA to the immune response initiated by MDV.
Further investigation revealed that SATB1 could promote tumor emergence and progression. In conclusion, this study successfully identified lincRNA in chicken bursa and was able to associate the identified transcripts with functions of interest. One lincRNA was also identified as playing a role in the immunological response initiated by a Marek’s disease virus infection [17].

1.7.2 Comprehensive Analysis of Differentially Expressed mRNA, lncRNA and circRNA and Their ceRNA Networks in the Longissimus Dorsi Muscle of Two Different Pig Breeds

In pigs, meat quality is crucial to sale price and economic value; muscle growth and fatness are complex traits that greatly influence meat quality. In this study, two pig breeds with differing muscularity traits were compared using mRNA, lncRNA, and circRNA. The investigation identified 854 mRNA, 233 lncRNA, and 66 circRNA that were differentially expressed in the longissimus dorsi muscles of Chinese Huainan pigs (HN pigs) and Western commercial Duroc × (Landrace × Yorkshire) pigs (DLY pigs). Identification efforts found 9,068 lncRNA to be present in the samples, of which 4,859 were common to both HN and DLY pigs. Of these lncRNA, 233 were statistically significantly differentially expressed between the two breeds.

Because lncRNA have been regularly implicated as regulatory agents, the differentially expressed lncRNA were then combined with circRNA, miRNA, and mRNA data to construct potential regulatory networks. Common nodes across networks included the genes MYOD1 and PPARD and the miRNAs miR-423-5p and miR-874. In the constructed networks, PPARD was targeted by three miRNAs, which in turn targeted five lncRNA and 16 circRNA; MYOD1 was connected to miR-296-5p, which in turn connected to two lncRNA and three circRNA.
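The network construction described above can be pictured as grouping transcripts by the miRNAs they share. The Python sketch below is a minimal illustration under stated assumptions, not the pipeline used in [43]: the function name build_cerna_network, the tuple layout of the input, and the transcript names lncRNA_A, lncRNA_B, and circRNA_X are hypothetical placeholders; only MYOD1 and miR-296-5p are taken from the study.

```python
from collections import defaultdict


def build_cerna_network(mirna_targets):
    """Group non-coding transcripts with the mRNAs that share a miRNA with them.

    mirna_targets: iterable of (miRNA, transcript, transcript_type) tuples,
    where transcript_type is "mRNA", "lncRNA", or "circRNA".
    Returns {mRNA: {miRNA: set of lncRNA/circRNA sharing that miRNA}}.
    """
    # First pass: collect the targets of each miRNA, split by transcript type.
    by_mirna = defaultdict(lambda: defaultdict(set))
    for mirna, transcript, kind in mirna_targets:
        by_mirna[mirna][kind].add(transcript)

    # Second pass: link each mRNA to the non-coding transcripts that
    # share the same miRNA, i.e. its candidate competing endogenous RNAs.
    network = defaultdict(dict)
    for mirna, kinds in by_mirna.items():
        competitors = kinds["lncRNA"] | kinds["circRNA"]
        for gene in kinds["mRNA"]:
            network[gene][mirna] = set(competitors)
    return dict(network)


# Hypothetical edges: placeholder transcript names, not results from [43].
edges = [
    ("miR-296-5p", "MYOD1", "mRNA"),
    ("miR-296-5p", "lncRNA_A", "lncRNA"),
    ("miR-296-5p", "lncRNA_B", "lncRNA"),
    ("miR-296-5p", "circRNA_X", "circRNA"),
]
network = build_cerna_network(edges)
print(sorted(network["MYOD1"]["miR-296-5p"]))
```

A real analysis would derive the edges from miRNA target prediction and expression correlation rather than from a hand-written list.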
Overall, the identified lncRNA appear to function in the regulation of biological processes that affect muscle growth in a variety of ways. The researchers hypothesized that lncRNA may change chromatin spatial conformation by binding to target genes, altering their expression level. They also theorized that lncRNA may interact with transcription factors. In the constructed regulatory networks, lncRNA could be associated with miRNA that led to differential expression of genes that may contribute to the differences seen in porcine meat composition [43].

1.7.3 Genome-Wide Identification and Characterization of Long Non-Coding RNAs in Longissimus dorsi Skeletal Muscle of Shandong Black Cattle and Luxi Cattle

Although lncRNA have been repeatedly associated with the regulation of cell differentiation, fat synthesis, and embryonic development, this study sought to investigate the role of lncRNA in skeletal muscle development in Shandong Black and Luxi cattle. As in pork production, meat quality in cattle is economically very important and is affected by many factors; the genetics behind improving meat quality are of utmost importance, and many breeding decisions are made to maximize the development of cattle muscle. Comparisons between the two cattle breeds did reveal genetic differences.

The muscle fibers of the two breeds differed demonstrably: Shandong Black cattle muscle fibers were significantly larger, showed lower muscle fiber density, and had a lower ratio of both fast and slow twitch fibers when compared to Luxi cattle. Further analysis of the genetic elements contributing to those differences identified 1,415 differentially expressed mRNA and 480 differentially expressed lncRNA. Correlation analysis between the differentially expressed mRNA and the differentially expressed lncRNA yielded 43,844 correlated pairs and generated a list of 387 lncRNA for further analysis.
Based on location, the 387 lncRNA targeted 1,164 genes, including MYORG, Wnt4, PAK1, and ADCY7. Gene ontology analysis of the targeted genes revealed GO items such as cellular organization development, single multicellular organization process, and multicellular organic process. KEGG analysis of the 1,164 targeted genes showed enrichment of the calcium signaling pathway, the AMPK signaling pathway, the cGMP-PKG signaling pathway, and the PPAR signaling pathway; these pathways relate heavily to muscle development. Interaction network development also associated multiple lncRNA with several muscle types, including skeletal muscle.

The most promising lncRNA identified appeared to target bta-miR-133a. Binding between the lncRNA and this miRNA may influence the regulation of skeletal muscle development by competitively inhibiting the expression of the target gene Pax7. In conclusion, the findings of this study associated lncRNA with skeletal muscle development in cattle; the differential expression of these lncRNA between the analyzed beef cattle breeds indicates that lncRNA may contribute to the differences seen between the two breeds and may allow for continued improvement of beef cattle meat quality [28].

1.8 References

[1] 5 Best Beef Cattle Breeds. URL: https://americancowboy.com/lifestyle/5-best-beef-cattle-breeds-24440/.
[2] ADW: Bos taurus: INFORMATION. URL: https://animaldiversity.org/accounts/Bos_taurus/.
[3] Areola - anatomy - Britannica. URL: https://www.britannica.com/science/areola.
[4] Ruth Bollongino et al. “Modern Taurine Cattle Descended from Small Number of Near-Eastern Founders”. In: Molecular Biology and Evolution 29.9 (Sept. 2012), pp. 2101–2104. ISSN: 0737-4038. DOI: 10.1093/molbev/mss092. URL: https://dx.doi.org/10.1093/molbev/mss092.
[5] Camilynn I. Brannan et al. “The product of the H19 gene may function as an RNA”. In: Molecular and Cellular Biology 10.1 (Jan. 1990), pp. 28–36. ISSN: 0270-7306.
DOI: 10.1128/MCB.10.1.28-36.1990. URL: https://pubmed.ncbi.nlm.nih.gov/1688465/.
[6] Neil Brockdorff et al. “The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus”. In: Cell 71.3 (Oct. 1992), pp. 515–526. ISSN: 0092-8674. DOI: 10.1016/0092-8674(92)90519-I. URL: https://pubmed.ncbi.nlm.nih.gov/1423610/.
[7] Carolyn J. Brown et al. “Localization of the X inactivation centre on the human X chromosome in Xq13”. In: Nature 349.6304 (1991), pp. 82–84. ISSN: 1476-4687. DOI: 10.1038/349082a0. URL: https://www.nature.com/articles/349082a0.
[8] R Luz Elena Cano and H. Damaris E. Lopera. “Introduction to T and B lymphocytes”. In: (July 2013). URL: https://www.ncbi.nlm.nih.gov/books/NBK459471/.
[9] Theresa Casey et al. “Transcriptome analysis of epithelial and stromal contributions to mammogenesis in three week prepartum cows”. In: PloS one 6.7 (2011). ISSN: 1932-6203. DOI: 10.1371/journal.pone.0022541. URL: https://pubmed.ncbi.nlm.nih.gov/21829467/.
[10] Christopher C.L. Chase, David J. Hurley, and Adrian J. Reber.
“Neonatal Immune Development in the Calf and Its Impact on Vaccine Response”. In: The Veterinary Clinics of North America. Food Animal Practice 24.1 (Mar. 2008), p. 87. ISSN: 0749-0720. DOI: 10.1016/j.cvfa.2007.11.001. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7127081/.
[11] Qiyu Diao, Rong Zhang, and Tong Fu. “Review of strategies to promote rumen development in calves”. In: Animals 9.8 (Aug. 2019), p. 490. ISSN: 2076-2615. DOI: 10.3390/ani9080490.
[12] Digestive Anatomy in Ruminants. URL: http://www.vivo.colostate.edu/hbooks/pathphys/digestion/herbivores/rumen_anat.html.
[13] Laurence Finot, Eric Chanat, and Frederic Dessauge. “Bovine mammary gland development: new insights into the epithelial hierarchy”. In: bioRxiv (Jan. 2018), p. 251637. DOI: 10.1101/251637. URL: https://www.biorxiv.org/content/10.1101/251637v1.
[14] Yahui Gao et al. “Single-cell transcriptomic analyses of dairy cattle ruminal epithelial cells during weaning”. In: Genomics 113.4 (July 2021), pp. 2045–2055. ISSN: 0888-7543. DOI: 10.1016/j.ygeno.2021.04.039.
[15] Siyu Han et al. “LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information and physicochemical property”. In: Briefings in Bioinformatics 20.6 (Nov. 2019), pp. 2009–2027. ISSN: 1477-4054. DOI: 10.1093/bib/bby065. URL: https://academic.oup.com/bib/article/20/6/2009/5062950.
[16] Siyu Han et al. “Lncident: A Tool for Rapid Identification of Long Noncoding RNAs Utilizing Sequence Intrinsic Composition and Open Reading Frame Information”. In: International Journal of Genomics 2016 (2016). ISSN: 2314-4378. DOI: 10.1155/2016/9185496.
[17] Yanghua He et al. “The conservation and signatures of lincRNAs in Marek’s disease of chicken”. In: Scientific Reports 5.1 (Oct. 2015), pp. 1–17. ISSN: 2045-2322. DOI: 10.1038/srep15184. URL: www.nature.com/scientificreports.
[18] Heifer Development: Puberty - The Cattle Site. URL: https://www.thecattlesite.com/articles/901/heifer-development-puberty.
[19] Julie R. Hens and John J. Wysolmerski. “Molecular mechanisms involved in the formation of the embryonic mammary gland”. In: Breast Cancer Research 7.5 (Oct. 2005), pp. 220–224. ISSN: 1465-5411. DOI: 10.1186/bcr1306. URL: https://breast-cancer-research.biomedcentral.com/articles/10.1186/bcr1306.
[20] How Do Cows Make Milk? - Organic Valley. URL: https://www.organicvalley.coop/blog/from-grass-to-glass-how-do-cows-make-milk/.
[21] “How does the immune system work?” In: (Apr. 2020). URL: https://www.ncbi.nlm.nih.gov/books/NBK279364/.
[22] Julien Jarroux, Antonin Morillon, and Marina Pinskaya. “History, Discovery, and Classification of lncRNAs”. In: Advances in Experimental Medicine and Biology 1008 (2017), pp. 1–46. ISSN: 0065-2598. DOI: 10.1007/978-981-10-5203-3_1. URL: https://pubmed.ncbi.nlm.nih.gov/28815535/.
[23] Yusuf S. Khan and Hussain Sajjad. “Anatomy, Thorax, Mammary Gland”. In: StatPearls (July 2021). URL: https://www.ncbi.nlm.nih.gov/books/NBK547666/.
[24] Sarah Kim et al. “Characterizing the genetic basis of innate immune response in TLR4-activated human monocytes”. In: Nature Communications 5.1 (Oct. 2014), pp. 1–7. ISSN: 2041-1723. DOI: 10.1038/ncomms6236. URL: https://www.nature.com/articles/ncomms6236.
[25] Johnny T.Y. Kung, David Colognori, and Jeannie T. Lee. Long noncoding RNAs: Past, present, and future. 2013. DOI: 10.1534/genetics.112.146704. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3583990/.
[26] Aimin Li, Junying Zhang, and Zhongyin Zhou.
“PLEK: A tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme”. In: BMC Bioinformatics 15.1 (Sept. 2014), pp. 1–10. ISSN: 1471-2105. DOI: 10.1186/1471-2105-15-311. URL: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-311.
[27] Shudai Lin et al. “Establishment and transcriptomic analyses of a cattle rumen epithelial primary cells (REPC) culture by bulk and single-cell RNA sequencing to elucidate interactions of butyrate and rumen development”. In: Heliyon 6.6 (June 2020), e04112. ISSN: 2405-8440. DOI: 10.1016/j.heliyon.2020.e04112.
[28] Ruili Liu et al. “Genome-Wide Identification and Characterization of Long Non-Coding RNAs in Longissimus dorsi Skeletal Muscle of Shandong Black Cattle and Luxi Cattle”. In: Frontiers in Genetics 13 (May 2022), p. 849399. ISSN: 1664-8021. DOI: 10.3389/fgene.2022.849399.
[29] Hector Macias and Lindsay Hinck. “Mammary Gland Development”. In: Wiley Interdisciplinary Reviews. Developmental Biology 1.4 (July 2012), p. 533.
ISSN: 1759-7684. DOI: 10.1002/wdev.35. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3404495/.
[30] Nilusha Malmuthuge, Guanxiang Liang, and Le Luo Guan. “Regulation of rumen development in neonatal ruminants through microbial metagenomes and host transcriptomes”. In: Genome Biology 20.1 (Aug. 2019), pp. 1–16. ISSN: 1474-760X. DOI: 10.1186/s13059-019-1786-0. URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1786-0.
[31] Managing Cow Lactation Cycles - The Cattle Site. URL: https://www.thecattlesite.com/articles/4248/managing-cow-lactation-cycles/.
[32] Name That Cow: The 6 Great Dairy Breeds - Dairy Discovery Zone. URL: https://www.dairydiscoveryzone.com/blog/name-cow-6-great-dairy-breeds.
[33] Adam I Orr. “How Cows Eat Grass Exploring Cow Digestion”. In: (2011). URL: http://www.fda.gov/AnimalVeterinary.
[34] Parts of the Immune System - Children’s Hospital of Philadelphia. URL: https://www.chop.edu/centers-programs/vaccine-education-center/human-immune-system/parts-immune-system.
[35] John L. Rinn et al. “Functional Demarcation of Active and Silent Chromatin Domains in Human HOX Loci by Non-Coding RNAs”. In: Cell 129.7 (June 2007), p. 1311. ISSN: 00928674. DOI: 10.1016/J.CELL.2007.05.022. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2084369/.
[36] Rumen development of calves - Beef. URL: https://www.canr.msu.edu/news/rumen-development-of-calves.
[37] “The innate and adaptive immune systems”. In: (July 2020). URL: https://www.ncbi.nlm.nih.gov/books/NBK279396/.
[38] The Life Cycle of Beef Cattle Production - Peterson Farm Brothers.
URL: https://petersonfarmbrothers.com/the-life-cycle-of-beef-cattle-production/.
[39] The ruminant digestive system. URL: https://extension.umn.edu/dairy-nutrition/ruminant-digestive-system#stomach-compartments-1000460.
[40] H Allen Tucker. “Physiological Control of Mammary Growth, Lactogenesis, and Lactation”. In: Journal of Dairy Science 64 (1981), pp. 1403–1421. DOI: 10.3168/jds.S0022-0302(81)82711-7.
[41] Understanding the Ruminant Animal Digestive System - Mississippi State University Extension Service. URL: http://extension.msstate.edu/publications/understanding-the-ruminant-animal-digestive-system.
[42] Ransom L Baldwin VI and Erin E Connor. “Rumen Function and Development”. In: (2017). DOI: 10.1016/j.cvfa.2017.06.001. URL: http://dx.doi.org/10.1016/j.cvfa.2017.06.001.
[43] Jing Wang et al. “Comprehensive Analysis of Differentially Expressed mRNA, lncRNA and circRNA and Their ceRNA Networks in the Longissimus Dorsi Muscle of Two Different Pig Breeds”. In: International Journal of Molecular Sciences 20.5 (Mar. 2019). ISSN: 14220067. DOI: 10.3390/IJMS20051107. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6429497/.
[44] Xiaopei Zhang et al. “Mechanisms and Functions of Long Non-Coding RNAs at Multiple Regulatory Levels”. In: International Journal of Molecular Sciences 20.22 (Nov. 2019). ISSN: 14220067. DOI: 10.3390/IJMS20225573. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6888083/.
[45] Yapeng Zhang et al. “Transcriptome Analysis of Bovine Rumen Tissue in Three Developmental Stages”. In: Frontiers in Genetics 13 (Mar. 2022), p. 821406. ISSN: 16648021. DOI: 10.3389/FGENE.2022.821406.

Chapter 2: Investigation of Rumen Long Noncoding RNA Before and After Weaning in Cattle

2.1 Abstract

Background: This study aimed to identify long non-coding RNA (lncRNA) in the rumen tissue of dairy cattle, explore their features including expression and conservation levels, and reveal potential links between lncRNA and complex traits that may indicate important functional impacts of rumen lncRNA during the transition to the weaning period.

Results: Six cattle rumen samples were taken, with three replicates each from before and after weaning. Total RNA was extracted and sequenced, and lncRNAs were discovered based on size, coding potential, sequence homology, and known protein domains. As a result, 404 and 234 rumen lncRNAs were identified before and after weaning, respectively. However, only nine of them were shared between the two conditions, with 395 lncRNAs found only in pre-weaning tissues and 225 only in post-weaning samples. Interestingly, none of the nine common lncRNAs were differentially expressed between the two weaning conditions.
LncRNAs averaged shorter length, lower expression, and lower conservation scores than the genome overall, which is consistent with general lncRNA characteristics. By integrating rumen lncRNA before and after weaning with large-scale GWAS results in cattle, we reported significant enrichment of both pre- and post-weaning lncRNA with traits of economic importance including production, reproduction, health, and body conformation phenotypes.

Conclusions: The majority of rumen lncRNAs are uniquely expressed in one of the two weaning conditions, indicating a functional role of lncRNA in rumen development and the weaning transition. Notably, both pre- and post-weaning lncRNA showed significant enrichment with a variety of complex traits in dairy cattle, suggesting the importance of rumen lncRNA for cattle performance in the adult stage. These relationships should be further investigated to better understand the specific roles that lncRNAs play in rumen development and cow performance.

Note: This chapter was previously published in BMC Genomics. Alexis Marceau was the main contributor and is the first author. The original citation is as follows: Marceau, A., Gao, Y., Baldwin, R.L. et al. Investigation of rumen long noncoding RNA before and after weaning in cattle. BMC Genomics 23, 531 (2022). https://doi.org/10.1186/s12864-022-08758-4

2.2 Background

For many years, genomic research has focused on the direct line from genes to gene products to phenotypic observations. Recent advances in sequencing technologies have led to more in-depth exploration of the genomic regions that are transcribed into RNA but rarely into a protein product. Connections have been established between these non-coding RNAs and regulation of gene expression in many organisms. Examples include X chromosome inactivation, allelic imprinting, pluripotency control, cancer, and many other biological processes [23].
One subset of these noncoding RNAs is the long noncoding RNAs (lncRNAs): these transcripts are at least 200 nucleotides in length, with some reaching up to 32,000 nucleotides, and are found across nearly all species [34]. Previous research has identified large numbers of lncRNA across many model and non-model species, ranging from over 7,000 transcripts in cattle [20] to over 270,000 lncRNA in humans [28]. Recent research in cattle has revealed more lncRNA transcripts in many tissues across cattle breeds: over 4,000 in 6 different tissues between 2 Chinese cattle breeds, nearly 10,000 in 18 different bovine tissues, over 23,000 lncRNA in bovine testes tissue as they mature, and 1,535 lncRNAs in bovine oocytes [13, 11, 22]. Additionally, almost 8,000 lncRNAs were found to be associated with metabolic efficiency [31].

A key characteristic of the cow is the four-part stomach system, where feed enters the reticulum and rumen, passes through the omasum, and reaches the abomasum, where digestion occurs in a manner similar to humans and other livestock animals. Many organs comprise a small proportion of the body as the animal matures, but the rumen increases from 30 to 70% of the capacity of the gut during weaning and continues to grow throughout lactation [43, 2]. However, at birth, a calf has only fully developed the abomasum. The fermentation vat, comprising the reticulum and rumen, is sterile at birth and takes several weeks to establish a bacterial colony suitable for ruminant digestion. Therefore, new calves are mostly fed milk or a milk replacer, which can use an esophageal groove to bypass the under-developed digestive system, although dry feed is required to develop rumen bacterial activity. When dry feed is added, the anaerobic nature of the under-developed rumen makes it a vessel suitable for anaerobic bacterial growth, producing acetic, propionic, and butyric acids, lowering the pH of the rumen, and continuing bacterial growth [3].
As the rumen develops postnatally, the bacterial colonies grow and genetic/genomic changes likely occur to aid in the transition from a milk-based, pre-weaning diet to the feed or grass diet after weaning. It has been shown that rumen development and rumen microbiomes are affected by the weaning process across different weaning strategies [30]. It has been well documented that gene expression changes have implications in rumen pH maintenance, gastrointestinal tract cell proliferation, growth, and development [4]. LncRNAs have been shown to have regulatory roles in gene expression. In humans, approximately 5% of the genome is conserved solely due to genomic regions with regulatory roles, with 80% of these regions being associated with chromatin state, adding more evidence to the hypothesis that lncRNAs are regulatory elements [12]. Given the recent increase in interest in non-coding genomic regions, it would not be surprising to find connections between these non-coding RNAs and changes in gastrointestinal behaviors, such as weaning [18].

Previous research has shown that the weaning process does lead to changes in gene expression in the rumen. Butyrate, a volatile fatty acid often produced by rumen microbes, can promote rumen development through assisting with structural integrity, epigenetic regulation, signaling pathways, and more [7]. Butyrate enters the rumen via milk or top-dressed feeds, and 1,977 genes were identified as differentially expressed between rumen epithelial tissue with and without butyrate treatment; these genes included many key regulators and signaling pathways [25]. Weaning also led to variability in the chromatin structures and extracellular interactions in rumen tissues. The shift from milk to feed led to epigenetic changes including DNA methylation, histone modification, and more [10]. Lin et al. provided increasing evidence that rumen development hinges on changes in genetic interactions, both expression-wise and extracellularly [25].
The 6,679 novel lncRNAs previously identified in rumen tissue give additional support [29]. As lncRNAs are associated with pre-transcriptional regulation, transcriptional regulation, and post-transcriptional regulation, and also act as signaling molecules, decoys, scaffolds, and guides, lncRNAs are expected to function in the rumen tissue [24]. The previous research also lends credence to the hypothesis that the rumen samples from the two weaning conditions will show differential expression levels in lncRNA.

Given the regulatory roles associated with lncRNA, analysis of lncRNAs and their differences before and after weaning in the rumen may reveal important regulatory pathways that allow a calf to be adequately weaned and begin life with a healthy digestive system. To further explore the features and potential functions of lncRNA in cattle, here we report a genome-wide study that isolates and investigates RNA transcripts meeting the coding and length criteria for lncRNA classification, using rumen epithelial tissue in calves before and after weaning. With industrial tools, samples were sequenced and mapped to a robust genome before being filtered based on size, coding status, coding potential, and similarities to known genes (Supplemental Figure 1). The resulting transcript lists were then analyzed for differential expression, sequence conservation, multi-species orthologs, and enrichment with GWAS results of 42 dairy cattle traits that involve production, reproduction, body conformation, and health components. These findings shed light on the lncRNA landscape within cattle and their rumen, allowing us to further unravel the genetic mechanisms at work for tissue development and dairy cattle performance.

2.3 Results

2.3.1 lncRNA Identification

To identify lncRNA, Illumina HiSeq 2500 (PE150) sequencing was performed on three pre-weaning rumen samples and three post-weaning samples in Holstein cattle.
We generated a total of over 125,000,000 sequence reads, averaging 21,085,595 reads per sample. Double-iterative mapping via Hisat yielded approximately 87.97% alignment in each sample [21]. Stringtie and Stringtie-merge yielded a consensus sequence with 312,432 fragments for the pre-weaning samples and 315,559 fragments for the post-weaning samples [41, 33]. Consensus sequences were compared to the current ARS-UCD1.2 Bos taurus reference genome, which annotated transcripts based on known loci and genes [37]. Results from the comparisons allowed for filtering based on overlap with known loci and transcripts in both reference genomes, which left 33,975 transcripts in the pre-weaning samples and 35,621 transcripts in the post-weaning samples. These transcripts were then analyzed with a coding potential calculator, a BLAST search, and comparisons to the Pfam database to remove any remaining coding transcripts [1, 35, 17]. Finally, the filtered results produced a list of 404 candidate lncRNA transcripts in the pre-weaning tissue and 234 transcripts in the post-weaning tissue (Fig 2.1).

Figure 2.1: Filtering of Transcripts in Pre-weaning and Post-Weaning Rumen Tissue Samples. Once a consensus sequence was generated for each sample, a series of filtering steps was used to isolate candidate lncRNA. Steps included removing known protein coding transcripts, transcripts possessing coding potential, and those that demonstrated nucleotide and protein sequence homology. In the pre-weaning rumen tissue sample, 404 transcripts remained, and 234 transcripts remained in the post-weaning rumen tissue sample.

2.3.2 lncRNA Characteristics with Comparison to Protein Coding Transcripts

Based on the ARS-UCD1.2 Bos taurus annotation [37], the size of gene transcripts ranged from 200 to over a million nucleotides, averaging 28,239 nucleotides. This is in stark contrast to the length of the lncRNA transcripts we identified in rumen, which averaged 674 nucleotides overall; pre-weaning lncRNAs averaged 466 nucleotides and post-weaning lncRNAs averaged 1,033 nucleotides (Fig 2.2). All identified lncRNAs were at least 200 nucleotides in length, as required by the definition, and ranged from 200 to approximately 18,000 nucleotides in pre-weaning samples and from 200 to approximately 57,000 nucleotides in post-weaning samples (Fig 2.2). This is in line with previous findings of lncRNA transcripts being shorter than their coding counterparts. All identified lncRNA had, on average, 1.01 exons per transcript, whereas the NCBI gene annotation had 1-335 exons per transcript with an average of 9.2 exons per transcript. A distinctly lower number of exons per transcript in lncRNA compared to coding transcripts appears to be common across many studies [12, 6].

In pre-weaning cattle, the expression level of coding genes averaged 26.12 fragments per kilobase of transcript per million mapped reads (FPKM). This is roughly five times the level of the lncRNAs identified in pre-weaning samples, which averaged 5.24 FPKM. In post-weaning samples, lncRNA expression averaged 7.89 FPKM, compared to the 27.89 FPKM average of all gene transcripts (Fig 2.3). The lower expression level of lncRNA compared to coding genes is also consistent with previous studies.

2.3.3 Differential Expression Analysis Before and After Weaning

The expression level of rumen lncRNA was similar before and after weaning, with averages of 5.24 and 7.89 FPKM pre- and post-weaning, respectively. When analyzing the lncRNA profiles of pre- and post-weaning rumen tissue, the majority of lncRNAs identified were present in only one condition, with 395 found only in pre-weaning tissues and 225 only in post-weaning samples. Still, nine transcripts were present under both weaning conditions.
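
The split into condition-specific and shared transcripts can be illustrated with a simple interval-overlap rule, since Table 2.1 treats transcripts as common when their coordinates overlap. The following is a minimal Python sketch of that rule and is not the pipeline used in this study: the `Interval` type, the `overlaps` and `partition` helpers, and the Chr3 and Chr9 transcripts are hypothetical, while the Chr1 and Chr20 coordinates are taken from Table 2.1.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A transcript location (hypothetical helper type for this sketch)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def partition(pre, post):
    """Split transcripts into pre-only, post-only, and shared (overlapping) pairs."""
    shared = [(a, b) for a in pre for b in post if overlaps(a, b)]
    shared_pre = {a for a, _ in shared}
    shared_post = {b for _, b in shared}
    pre_only = [a for a in pre if a not in shared_pre]
    post_only = [b for b in post if b not in shared_post]
    return pre_only, post_only, shared


pre = [
    Interval("Chr1", 143606037, 143606970),   # from Table 2.1
    Interval("Chr20", 23927430, 23927631),    # from Table 2.1
    Interval("Chr3", 1000, 1500),             # hypothetical pre-only transcript
]
post = [
    Interval("Chr1", 143606047, 143607059),   # from Table 2.1
    Interval("Chr20", 23927420, 23927884),    # from Table 2.1
    Interval("Chr9", 5000, 5600),             # hypothetical post-only transcript
]

pre_only, post_only, shared = partition(pre, post)
print(len(pre_only), len(post_only), len(shared))
```

Applied to the full profiles, the same bookkeeping gives 404 - 9 = 395 pre-weaning-only and 234 - 9 = 225 post-weaning-only transcripts, or 395 + 225 + 9 = 629 distinct candidates in total.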
However, none of these common transcripts showed expression levels that varied at a statistically significant level (Table 2.1).

Figure 2.2: Length distribution of candidate lncRNA transcripts. A Length of all candidate lncRNA transcripts. The average transcript length of 674 base pairs is indicated by the red line. B Zoomed-in length distribution of all candidate lncRNA transcripts, excluding those longer than 2000 base pairs for added clarity. C Length of pre-weaning transcripts, ranging from 200 to 17809 and averaging 466 nucleotides. D Length of post-weaning transcripts, ranging from 200 to 56626 and averaging 1033 nucleotides.

Figure 2.3: Expression of lncRNA candidate transcripts. A FPKM values of transcripts expressed in pre-weaning tissue. Expression levels ranged from 0.17 to 46.81 FPKM, averaging 5.24 FPKM. The average expression level is indicated by the red line. B FPKM values of transcripts expressed in post-weaning tissue. Expression levels ranged from 0.72 to 106 FPKM, averaging 7.89 FPKM. The average expression level is indicated by the red line.

Table 2.1: T-test of expression levels between pre- and post-weaning rumen tissue for nine common lncRNA transcripts. Transcripts were isolated as common if they overlapped with each other. Average expression was calculated for each transcript and is reported in FPKM. A paired Student's t-test was performed with degrees of freedom equaling 8.
This yielded a critical value of 1.860, which none of the t-test scores surpassed, indicating none of the lncRNAs identified as common to both conditions were differentially expressed at a significant level.

Pre-weaning lncRNA          Expression   Post-weaning lncRNA         Expression   P-value
Chr1:143606037–143606970    1.167        Chr1:143606047–143607059    2.963        0.743
Chr1:146868890–146870164    0.798        Chr1:146868985–146869483    2.004        0.798
Chr20:23927430–23927631     46.81        Chr20:23927420–23927884     106.4        0.584
Chr4:119469477–119469730    3.01         Chr4:119469677–119470016    1.776        0.514
Chr5:118054620–118054855    0.722        Chr5:118054288–118054860    1.607        0.654
Chr7:46217152–46217456      3.399        Chr7:46217122–46217472      5.292        0.515
Chr7:12950022–12950226      13.54        Chr7:12950002–12950242      11.11        0.464
Chr7:36273737–36274435      1.494        Chr7:36273748–36274421      1.348        0.471
Chr7:43944991–43945389      1.430        Chr7:43944829–43945374      1.757        0.516

2.3.4 Analysis of lncRNA Sequence Conservation

lncRNA tends to show lower rates of sequence conservation when compared to coding genes. To investigate this in cattle, transcript profiles for all coding transcripts, intergenic transcripts, and lncRNAs for both pre- and post-weaning conditions were converted to the human genome equivalents using the LiftOver v3.13 software and run through the PhastCons program to calculate conservation scores among transcripts of 46 vertebrates [40].

As expected, whole genome gene profiles showed higher scores overall than both intergenic and lncRNA profiles (Fig 2.4). Interestingly, the median conservation score was slightly higher in lncRNA than in intergenic transcripts, although when plotted as a violin graph the lncRNA scores are localized near 0 and intergenic regions appear slightly more conserved. All transcripts averaged a PhastCons score of 0.103, intergenic regions scored an average of 0.104, and lncRNA had the lowest average score of 0.100.
Complete profiles ranged in scores from 0.000111 to 0.999982, intergenic profiles ranged from 0 to 1, and lncRNA scores ranged from 0.000183 to 0.998854. Between pre- and post-weaning, conservation scores were very similar (Fig 2.4). Pre-weaning scores ranged from 0.000873 to 0.879405, averaging 0.09963, although it should be noted that the median score of conserved pre-weaning transcripts was 0.0434. In post-weaning transcripts, scores ranged from 0.000183 to 0.98854 with an average score of 0.101 and a median score of 0.0485. LncRNA PhastCons scores were plotted based on score rank to illustrate clustering patterns (Fig 2.5). Most lncRNAs showed low sequence conservation scores, with a small number showing much higher scores. These findings are consistent with lncRNA trends regarding conservation. The conservation scores of the nine common transcripts were also calculated; they ranged from 0.008296 to 0.161162 with an average score of 0.0477 and a median of 0.0261.

Figure 2.4: PhastCons scores of pre- and post-weaning conditions at whole genome, intergenic region, and lncRNA levels. A Boxplot of PhastCons scores for all transcripts, all intergenic transcripts, and all lncRNA candidate transcripts. B Violin plot of all six profiles: all pre-weaning transcripts, pre-weaning intergenic regions, pre-weaning lncRNA transcripts, all post-weaning transcripts, post-weaning intergenic regions, and post-weaning lncRNA transcripts.

Figure 2.5: Scatter plot of lncRNA PhastCons scores. Most lncRNAs show scores well below 0.50 with a small number being well conserved across many species.
Pre-weaning scores ranged from 0.000873 to 0.879405, and post-weaning scores ranged from 0.000183 to 0.658853, with an outlier of 0.98854.

2.3.5 Transcriptional Annotation of Common and Conserved lncRNA

In addition to the nine transcripts common to the two weaning conditions, the top 10% of lncRNAs based on conservation scores were also isolated from both weaning profiles for further analyses, in an attempt to ascertain lncRNA function. Twenty-three transcripts were kept from the pre-weaning profile, and 14 transcripts were kept from the post-weaning profile. Using the UCSC genome browser, these 46 transcripts were compared to the human and mouse genomes to identify previously annotated genes. Of the nine common lncRNA transcripts, three matched to human and mouse genes: ESYT2, SEC24A, and ZNF491/Zfp811. A fourth lncRNA matched to the Clic6 gene in Rattus norvegicus. Functions of these genes include ion channel formation, cellular lipid transport, transport vesicle formation, transcriptional activity, and more. Eleven of the 23 top conserved pre-weaning lncRNA showed matches to human and/or mouse genes. These include NLGN1, SAP130, ATP13A2/Padi3, BCL1, NTNG1, SLC2A1, APBB2, OAZ1, MEX3D, ANXA6, and DOCK11. Roles include nervous system development, cell surface protein interactions, transcriptional repression, signal transduction, neuronal circuit formation, glucose transportation, post-transcriptional regulation, and much more. Of the post-weaning transcripts, nine of the 14 candidates matched to human/mouse genes: 4930448K20Rik, LAPTM4A, MBD5, IGSF21, F11R, EPS15, RAB21, COL25A1, UHRF1, and PLK5. Roles include nucleoside transport, heterochromatin binding, synapse inhibition, membrane traffic control, fibrillization inhibition, epigenetic regulation, cell cycle progression, and a number of other roles.
Most notably, the F11R gene plays a role in epithelial tight junction formation; given that the rumen is lined with stratified squamous epithelial cells, this has interesting implications for the relationship between lncRNA and rumen development [19]. Another interesting finding is the EPS15 gene, which has associations with cell growth regulation [38]. Given the physical changes taking place as the weaning process progresses, cell growth is a key component of rumen maturation. The large number of roles these equivalent genes play fits with previous research on the vast number of functions lncRNAs exhibit in biological systems.

2.3.6 SNP Heritability Enrichment Analysis on Cattle Traits

By integrating 404 pre- and 234 post-weaning lncRNAs with large-scale GWAS data in Holstein cattle [14, 9], we revealed an interesting relationship between rumen lncRNA and cattle complex traits of economic importance. Using a method for partitioning SNP heritability named MPH (https://github.com/jiang18/mph), we quantified SNP heritability enrichment as the ratio of per-SNP heritability near the lncRNAs to the genome-wide per-SNP heritability. A P-value was also computed by comparing the enrichment level to 1 using a Wald test. Finally, our enrichment analysis used 7,988 and 4,856 variants within the lncRNAs identified for pre-weaning and post-weaning conditions, respectively.

As a result, we found a significant enrichment of per-SNP heritability near the lncRNAs under pre- and post-weaning conditions across cattle production, reproduction, health, and body conformation traits (Table 2.2). Overall, post-weaning lncRNAs showed slightly more significant enrichment with cattle traits than pre-weaning lncRNAs, suggesting a more important function of the rumen lncRNAs after weaning for cow performance in the adult stage.
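
The enrichment P-values can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the MPH implementation: it assumes the Wald statistic is z = (Enr - 1) / SE, referred to a standard normal distribution with a one-sided alternative (enrichment greater than 1). The source states only that a Wald test against 1 was used, so the one-sided form and the function name `wald_one_sided_p` are assumptions. The inputs are the stature and milk enrichment estimates reported in this section, with their standard errors from Table 2.2.

```python
import math


def wald_one_sided_p(enrichment: float, se: float, null: float = 1.0) -> float:
    """One-sided Wald P-value for H1: enrichment > null (assumed form).

    z = (enrichment - null) / se; the upper-tail probability of a standard
    normal is computed from the complementary error function.
    """
    z = (enrichment - null) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# (trait, condition) -> (enrichment, standard error), as reported in Table 2.2
estimates = {
    ("stature", "pre"): (11.67, 2.43),
    ("stature", "post"): (25.01, 4.02),
    ("milk", "pre"): (3.49, 1.76),
    ("milk", "post"): (5.38, 2.52),
}

for (trait, condition), (enr, se) in estimates.items():
    p = wald_one_sided_p(enr, se)
    print(f"{trait:8s} {condition:5s} enrichment={enr:6.2f} P={p:.2g}")
```

Under these assumptions the computed values are close to the reported P-values (for example, roughly 0.08 for pre-weaning milk), which is why a one-sided test is assumed here.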
Stature was highly significantly associated with lncRNAs in both pre- (11.67x; P=5.7E-6) and post-weaning (25.01x; P=1.1E-9) conditions, reflecting the important functions of rumen lncRNAs under both conditions in overall tissue development and growth. Livability was associated with lncRNAs under both weaning conditions (11.34x; P=0.01 for pre and 15.07x; P=0.01 for post, respectively), suggesting that rumen development may have a functional role in the regulation of immunity and disease resistance. Milk's SNP heritability was only significantly enriched in post-weaning lncRNAs (5.38x; P=0.04). Daughter pregnancy rate (DPR) was likewise only significantly enriched with post-weaning lncRNAs (14.9x; P=6.4E-4).

Table 2.2: Enrichment of rumen lncRNA in cattle GWAS results. Traits analyzed for SNP heritability enrichment included cattle production (milk), reproduction (daughter pregnancy rate, DPR), health (livability), and body conformation (stature). Enrichment was analyzed in both pre- and post-weaning tissue conditions. Enr: enrichment; SE: standard error; P: P-value.

        Milk                   DPR                      Stature                   Livability
        Enr    SE     P        Enr    SE     P          Enr     SE     P          Enr     SE     P
Pre     3.49   1.76   0.08     3.76   2.55   0.14       11.67   2.43   5.7E-06    11.34   4.46   0.01
Post    5.38   2.52   0.04     14.9   4.31   6.4E-04    25.01   4.02   1.1E-09    15.07   6.23   0.01

2.4 Discussion

In this study, Illumina high throughput RNA-seq data were used to detect and analyze lncRNA in cattle rumen tissue before and after calf weaning. This was done both to identify rumen lncRNA and to find transcripts that are differentially expressed as the rumen develops from immature to mature. RNA-seq data were aligned to a robust reference genome and then progressively filtered by criteria such as coding potential, intergenic support, and size, resulting in a list of 629 lncRNA transcripts. Candidate transcripts averaged a shorter length, fewer exons per transcript, lower expression, and lower conservation scores when compared to whole genomic transcripts.
The 629 candidates represent 404 transcripts in the pre-weaning profile and 234 transcripts in the post-weaning profile, with 9 transcripts common to both profiles. Interestingly, the 9 common transcripts are expressed at similar levels, indicating they likely play a basal role in rumen tissue that is independent of rumen tissue development.

This study confirms that there are lncRNA transcripts expressed in rumen tissue, and furthermore that transcripts are expressed differently as weaning progresses. Identification of hundreds of transcripts unique to the before and after weaning conditions suggests the expression of these transcripts is related to the maturation of the animal, likely being tied to the weaning process. Overall, the variability in identified lncRNA transcripts indicates they likely play a role in the biological changes that occur as the calf's digestive system develops. This also supports theories that although these transcripts are not made directly into a gene product, they are essential for the growth and development of organisms. Enrichment analysis demonstrated that, although isolated from rumen tissue, these lncRNA transcripts likely influence traits outside of those associated with digestion/weaning, as several of the transcripts show significant enrichment based on genome wide association studies. That a subset of the transcripts unique to each tissue condition shows enrichment in complex traits could mean many things; however, it is evidence that these transcripts are involved in the development of other characteristics as the calf is weaned. This could demonstrate that several transcripts may have broad applications in calf development and later performance overall.

When integrating the identified rumen lncRNA with existing GWAS results, we reported significant enrichments of GWAS signals in or near lncRNA regions across a wide spectrum of cattle traits, suggesting the importance of lncRNA for the genetics of complex traits.
Considering the functional impact of lncRNA on gene regulation, future studies should no longer ignore lncRNA in GWAS and fine-mapping studies. In addition, genomic selection may also consider adding important SNPs near lncRNA onto newer SNP arrays and using different weights in the modelling process.

Previous studies have identified at least 7,000 lncRNA within cattle, as well as 20,000+ in other systems, and this study was able to identify over 600 transcripts that met lncRNA criteria when analyzing rumen tissue from Bos taurus [20, 28, 16]. Compared to other studies of cattle lncRNA, the current study is limited by focusing only on rumen tissue and two weaning conditions. With only three replicates within each condition, we may not be able to discover all the important lncRNAs in rumen due to limited detection power. Still, we emphasize the validity and importance of our results based on the evidence of consistent genomic features of the identified lncRNAs with existing studies, higher conservation across species than genome average, and enrichment with GWAS signals of important dairy cattle traits. Further analyses of multiple tissues may also expand our findings and identify lncRNAs that are expressed exclusively in other tissues, as well as reveal those expressed in multiple tissues, including the rumen.

Research has also shown that lncRNA has connections with many biological processes such as chromosome X inactivation, allelic imprinting, pluripotency control, and cancer, so it is not surprising to find that lncRNA identified in Bos taurus rumen may be involved in development of the rumen, and likely across the organism [23]. This is supported by our findings that there are very different lncRNA profiles as a calf is weaning. For both common and highly conserved lncRNAs, the corresponding genes in human and mouse models indicate that these lncRNAs serve a number of roles related to the rumen.
Two of the identified genes are of particular interest: F11R and EPS15. F11R is related to the formation of the epithelial tight junctions, which is an interesting research endeavor given the rumen is lined wi