ABSTRACT

Title of dissertation: EXPANDING THE TOOLKIT: STRUCTURE, DYNAMICS, AND DRUG INTERACTIONS OF THE “PRIMING LOOP” FROM HEPATITIS B VIRUS PRE-GENOMIC RNA BY SOLUTION NMR SPECTROSCOPY

Lukasz T. Olenginski, Doctor of Philosophy, 2022

Dissertation directed by: Theodore Kwaku Dayie, Professor, Chemistry and Biochemistry

RNAs are dynamic macromolecules that function as essential components of biological pathways that result in human disease, making them attractive therapeutic targets. Yet RNA structural biology lags significantly behind that of proteins, limiting mechanistic understanding of RNA chemical biology. Fortunately, solution NMR spectroscopy can probe the structure, dynamics, and interactions of RNA in solution at atomic resolution, opening the door to a functional understanding of these molecules. However, NMR analysis of RNA, which has only four unique ribonucleotide building blocks, suffers from spectral crowding and broad linewidths, especially as RNAs grow in size. One effective strategy to overcome these challenges is to introduce NMR-active stable isotopes into RNA in an atom- and position-specific manner. Here, we outline the development of labeling technologies, their benefits for RNA dynamics measurements, and their application to the structure, dynamics, and interactions of a conserved regulatory RNA stem-loop from hepatitis B virus that is critical for viral replication.

EXPANDING THE TOOLKIT: STRUCTURE, DYNAMICS, AND DRUG INTERACTIONS OF THE “PRIMING LOOP” FROM HEPATITIS B VIRUS PRE-GENOMIC RNA BY SOLUTION NMR SPECTROSCOPY

by

Lukasz T. Olenginski

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2022

Advisory committee:
Professor Theodore Kwaku Dayie, Chair
Professor Nicole LaRonde-LeBlanc
Professor John Orban
Professor Lai-Xi Wang
Professor Wade Winkler

©
Copyright by Lukasz Tyler Olenginski 2022

Foreword

As the name of my thesis implies, this work is my attempt at merging the two general realms of my research: tool development (i.e., chemical and enzymatic synthesis) and its application to biological questions (e.g., what part does RNA play in HBV genome replication?). The ultimate goal, of course, is to push beyond the current limitations of solution NMR spectroscopy to provide the greater RNA research community, from cell biologists to biophysical chemists, a better means of understanding RNA structure-dynamics-function relationships. To this end, I have organized my thesis as follows.

My first chapter provides a thorough introduction to the world in which I have lived and breathed for the past five years: RNA labeling as it relates to solution NMR spectroscopy. This chapter is a combination of review articles (two published in Molecules and one in Chemical Reviews) that I co-wrote with other graduate students from our research group (i.e., Drs. Owen Becette and Regan LeBlanc and Mary Taiwo).

The second chapter provides my original contributions to the RNA labeling field, highlighting the work that Dr. Owen Becette started while in graduate school as a collaboration between our group and Dr. Serge Beaucage at the Food and Drug Administration. This work has led, in part, to three first-author manuscripts (two published in Monatshefte für Chemie - Chemical Monthly and one in Journal of Biomolecular NMR).

The third chapter shifts focus slightly, introducing the important topic of RNA dynamics. A full understanding of RNA dynamics is not only important for the chapters to come, but also provides the opportunity to showcase how RNA labeling directly benefits NMR dynamics measurements. This chapter draws on one of the previously mentioned review articles (Chemical Reviews) and one of the research articles (Monatshefte für Chemie - Chemical Monthly) and therefore contains contributions from fellow graduate student Mary Taiwo.
The fourth chapter is where everything that came before can finally be put to use. This chapter highlights a large collaborative effort between our research group and groups at the National Cancer Institute (NCI) that has run for many years, spanning the PhD tenures of three graduate students: Drs. Andrew Longhini and Regan LeBlanc, and now myself. It details our attempt to provide structural dynamic insight into the namesake of this thesis, the so-called “priming loop” RNA from hepatitis B virus (HBV). The contributions of Drs. Longhini and LeBlanc, along with those of Drs. Wojciech Kasprzak and Bruce Shapiro, were critical for the RNA structure to come to life. My contribution was in the final analysis and manuscript preparation (Journal of Biomolecular Structure and Dynamics). The fourth chapter concludes with my original contributions, made with collaborators Drs. Christina Bergonzo, Wojciech Kasprzak, and Bruce Shapiro, to provide a robust RNA dynamics analysis; this work is currently under review (Journal of Molecular Biology).

The fifth chapter includes our attempts to identify RNA-targeting compounds that can function as anti-HBV therapeutics. It draws partly on our collaborative manuscript (Journal of Biomolecular Structure and Dynamics) with collaborators from the NCI (Drs. Stuart LeGrice, John Schneekloth Jr., and Bruce Shapiro), and partly on my original unpublished work, again with collaborator Dr. Wojciech Kasprzak, that we intend to submit for publication in the near future.

The sixth and final chapter draws conclusions and details the future research needed to push these projects forward. My hope is that this work will not only be a useful resource for future students in our research group but can also be applied broadly to aid in future solution NMR and RNA-based studies.

Acknowledgments

The work described herein is indebted to many others who helped me in various ways and at various stages of my scientific development.
To start, I would not be where I am today without the excellent mentorship of Drs. Scott Brewer and Christine Phillips-Piro at Franklin and Marshall College, as well as Peggy Hsieh at the National Institutes of Health. Their time and commitment prepared me well for the rigors of graduate school.

Now, to my time at the University of Maryland. I want to extend my deep gratitude to my advisor, Dr. Kwaku Dayie, whose mentorship enabled me to mature into an independent scientist. Moreover, my work with Dr. Dayie presented diverse research opportunities and exposed me to a multitude of wonderful collaborations. Throughout all the hustle and bustle of work in the laboratory, Dr. Dayie was always open to and interested in the lives of his students outside of the laboratory. These small gestures cultivated a wonderful working environment and a talented and motivated research team, which has provided me a tremendous amount of support.

To previous members, Drs. Owen Becette, Bin Chen, Regan LeBlanc, Andrew Longhini, and Hyeyeon Nam, thank you for helping me successfully navigate the many demands of graduate school. I want to extend a special thanks to Drs. LeBlanc and Becette for their continued support, collaboration, and advice over the past few years. To current members, Solomon Attionu, Joshua Cooksey, Rita Dill, Erica Henninger, Lily Nguyen, Frances Stump, Mary Taiwo, and Rose Willis, thank you for making the hard work not as hard and the long days not as long. I am grateful for all of you, but especially Mary Taiwo. We rotated in and joined Dr. Dayie's research group together, and the rest is history. I cannot imagine my time in graduate school without her and I cannot wait to see what she will accomplish in her career!

In addition to our research group, I want to thank the newly minted Drs. Emily Luteran and Daniel Trettel, who just successfully defended their theses, as well as Bill Evans.
Their friendship helped motivate me throughout graduate school and I am grateful that Daniel, Emily, and I reached the finish line together, with Bill following not far behind!

I would also like to thank my committee members, who have challenged me to think critically and read more broadly. Your guidance has allowed the work described herein to reach its full potential.

In addition, I want to acknowledge the many collaborators with whom I had the pleasure of working over the years: Drs. Serge Beaucage, Christina Bergonzo, Victoria D'Souza, Andrzej Grajkowski, Wojciech Kasprzak, Christoph Kreutz, Stuart LeGrice, John Schneekloth Jr., and Bruce Shapiro. I want to extend a special thanks to Dr. Kreutz. Our collaborative efforts over the years have deepened my passion for the intersection of chemistry and biophysics. In addition, I am grateful for the mentorship of Drs. Beaucage and Grajkowski, which introduced me to the world of chemical synthesis. Finally, I would like to extend my gratitude to Drs. Bergonzo and Kasprzak for their computational expertise and willingness to answer my emails and field my many questions at all hours of the day.

These acknowledgments would not be complete without mention of my biggest support system, my family. To my mother and father, thank you for igniting my curiosity at a young age and providing me the resources to follow it all the way to graduate school. One of the interesting outcomes of being in graduate school during the COVID-19 pandemic and doing research on a viral RNA is that you become your family's local “expert” and are bombarded with question after question. To be clear, I am far from an expert on the virus that I study, let alone SARS-CoV-2, but their line of questioning is emblematic of their love for their son. I am their pride and joy, and they are mine.

To my identical twin brother, thank you for constantly setting the bar for success so high. In college, you joined Dr. Piro's laboratory first.
After graduation, you secured a research position first. You finished medical school first. And now, you are a medical doctor. Greg, you will always be the smarter twin. I am lucky to have you to lean on.

And last, but certainly not least, I am filled with joy to welcome my own family into the world. To my wife, Madelyn, it is hard to imagine where I would be without you. At times, your love, friendship, and support were the only things keeping me going. I love you dearly, and I am forever proud to be your husband. To our daughter-to-be, Rhys, who will be born at the end of April, I cannot wait to meet you and begin our lives together! To our dog, Sweetie, no one has spent more time with me while writing my thesis than you, and I would not have it any other way.

Table of Contents

Foreword
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Schemes
Abbreviations
1 RNA isotope labeling in the context of solution NMR spectroscopy
1.1 Introduction
1.2 Stable isotopes in NMR spectroscopy
1.2.1 Proton isotope
1.2.2 Heteronuclear 13C and 15N isotopes
1.2.3 Deuteration in context of heteronuclear 13C and 15N isotopes
1.2.4 Fluorination in context of 2H, 13C, and 15N isotopes
1.3 Stable isotope labeling of RNA building blocks
1.3.1 Commercial isotope sources
1.3.2 Biomass labeling
1.3.2.1 Uniform biomass labeling
1.3.2.2 Atom-specific biomass labeling
1.3.3 rNTP de novo biosynthesis
1.3.3.1 Purine de novo biosynthesis
1.3.3.2 Pyrimidine de novo biosynthesis
1.3.4 Chemo-enzymatic labeling
1.3.4.1 Nucleobase labeling
1.3.4.1.1 Pyrimidine 2H, 13C, 15N, and 19F labeling
1.3.4.1.2 Purine C8 labeling
1.3.4.1.3 Purine C2 labeling
1.3.4.1.4 Pyrimidine N1, N3, and N4 labeling
1.3.4.1.5 Purine N1, N3, N7, and N9 labeling
1.3.4.1.6 Nucleobase labels: summary and outlook
1.3.4.2 Enzymatic coupling of nucleobase and ribose sources
1.3.4.2.1 Synthesis from D-glucose
1.3.4.2.2 Synthesis from D-ribose
1.3.4.2.3 Synthesis from inosine
1.3.4.2.4 rNTP labels: summary and outlook
1.3.5 RNA phosphoramidite labeling
1.3.5.1 2H, 13C, and 15N labeling
1.3.5.2 19F labeling and post-transcriptional modifications
1.3.5.3 Synergy between labeling methods
1.3.5.4 Phosphoramidite labels: summary and outlook
1.4 RNA preparation methods
1.4.1 Solid-phase RNA synthesis
1.4.2 T7 RNA polymerase-based in vitro transcription
1.4.3 Enzymatic ligation
1.4.3.1 T4 DNA and RNA ligation
1.4.3.2 Segmental RNA labeling
1.4.4 Enzymatic position-specific RNA labeling
1.4.4.1 Position-Selective Labeling of RNA (PLOR)
1.4.4.2 Chemo-enzymatic position-specific labeling
1.5 Conclusion
2 Chemical and enzymatic synthesis of RNA building blocks
2.1 Introduction
2.2 Synthesis of [2-13C, 7-15N]-ATP
2.2.1 Motivation
2.2.2 Synthetic overview
2.2.3 Synthetic details
2.2.4 Applications to NMR studies
2.2.4.1 RNA transcription
2.2.4.2 NMR structure measurements
2.2.4.3 NMR dynamics measurements
2.2.4.4 NMR spectroscopy details
2.3 Synthesis of [1′,6-13C2, 5-2H]-uridine 2′-O-CEM amidite
2.3.1 Motivation
2.3.2 Synthetic overview
2.3.3 Synthetic details
2.3.4 Application to NMR studies
2.3.4.1 Synthesis of unlabeled 2′-O-CEM amidites
2.3.4.2 RNA synthesis with 2′-O-tBDMS amidites
2.3.4.3 RNA synthesis with 2′-O-CEM amidites
2.4 Conclusion
3 NMR probes of accurate RNA dynamics
3.1 Introduction
3.2 Probing fast motions
3.2.1 Dipolar coupling and spin relaxation
3.2.1.1 Effects of long-range 13C-13C dipolar couplings
3.2.1.2 Measurements in uniformly and selectively labeled RNA
3.2.1.3 Theoretical simulation details
3.2.1.4 RNA transcription
3.2.1.5 NMR spectroscopy details
3.3 Probing slow motions
3.3.1 Scalar coupling, relaxation dispersion, and saturation transfer
3.3.1.1 CPMG in selectively labeled RNA
3.3.1.2 RNA transcription
3.3.1.3 NMR spectroscopy details
3.4 Conclusion
4 Structural dynamic insights of hepatitis B virus pre-genomic RNA
4.1 Introduction
4.2 HBV lifecycle and genome replication
4.2.1 The transformation from rcDNA to cccDNA to pgRNA
4.2.2 pgRNA packaging and reverse transcription
4.2.3 (-)-DNA strand synthesis
4.2.4 (+)-DNA strand synthesis
4.2.5 Structural dynamics of ε
4.2.6 P protein structure and host interactions
4.2.7 Determinants for HBV replication
4.3 Structural analysis of full-length ε
4.3.1 Resonance assignment
4.3.2 Confirmation of ε secondary structure
4.3.3 Assessing ε solvent accessibility and local dynamics
4.3.4 Full-length ε structural modeling
4.3.5 Experimental details
4.3.5.1 NMR sample preparation
4.3.5.2 NMR resonance assignment
4.3.5.3 sPRE measurements
4.3.5.4 13C ?xy rate measurements
4.3.5.5 SAXS measurements
4.3.5.6 Structure calculation
4.4 Probing full-length ε dynamics on multiple time scales
4.4.1 Full-length ε undergoes fast motions on the ps-ns time scale
4.4.2 Full-length ε experiences slow motions on the μs-ms time scale
4.4.3 MD simulations show high-frequency motions within full-length ε
4.4.4 Full-length ε undergoes vast conformational sampling
4.4.5 Experimental details
4.4.5.1 NMR sample preparation
4.4.5.2 13C spin relaxation measurements
4.4.5.3 1H CPMG measurements
4.4.5.4 MD simulations
4.5 Conclusion
4.5.1 ε dynamics occur in highly conserved nucleotides
4.5.2 Conformational dynamics of ε: implications for HBV replication
5 Discovery of potential anti-HBV therapeutics
5.1 Introduction
5.2 Global burden of HBV
5.3 Current treatments of chronic HBV infection
5.3.1 FDA-approved chronic HBV treatments
5.3.1.1 NRTI treatment
5.3.1.2 IFN-α treatment
5.3.2 Alternative NRTIs in clinical trials for chronic HBV treatment
5.4 Alternative anti-HBV therapies
5.4.1 Targeting RNase H activity
5.4.2 Targeting protein priming
5.4.3 Targeting the ε-P interaction
5.5 Discovery of ε-targeting ligands
5.5.1 High-throughput screening strategy
5.5.1.1 Lead compound generation
5.5.1.2 Raloxifene selectively targets the ε priming loop
5.5.1.3 Modeling the ε-Raloxifene complex
5.5.1.4 Raloxifene is unable to inhibit HBV protein priming
5.5.1.5 Experimental details
5.5.1.5.1 Small molecule microarray
5.5.1.5.2 NMR titrations
5.5.1.5.3 Dye-displacement assay
5.5.1.5.4 Computational docking
5.5.1.5.5 MD simulations
5.5.2 Virtual screening strategy
5.5.2.1 Lead compound generation
5.5.2.2 Daclatasvir selectively targets the ε priming loop
5.5.2.3 Modeling the ε-Daclatasvir complex
5.5.2.4 Experimental details
5.5.2.4.1 Virtual screening and computational docking
5.5.2.4.2 Dye-displacement assay
5.5.2.4.3 NMR titrations
5.5.2.4.4 MD simulations
5.6 Conclusion
6 Conclusions and future directions
6.1 Summary of work
6.2 Future directions
6.2.1 RNA labeling
6.2.2 Structural modeling of full-length ε
6.2.3 Probing slow motions in full-length ε
6.2.4 Ensemble-based virtual screening
6.2.5 ε and N6-methylation
Appendix
Bibliography

List of Tables

Table 1.1. Stable isotopes relevant to RNA NMR spectroscopy
Table 1.2. Price of commercial isotope-labeled rNTPs and RNA amidites
Table 1.3. Enzymes for the de novo biosynthesis of purine rNTPs
Table 1.4. Enzymes for the de novo biosynthesis of pyrimidine rNTPs
Table 1.5. Summary of nucleobase labels
Table 1.6. Enzymes for Gilles-Schramm-Williamson rNTP synthesis method
Table 1.7. Enzymes for Dayie rNTP synthesis method
Table 1.8. Enzymes for Serianni rNTP synthesis method
Table 1.9. Summary of rNTP labels
Table 1.10. Summary of RNA phosphoramidite labels
Table 4.1. ε sequence requirements for HBV replication
Table 4.2. ε structure requirements for HBV replication
Table 4.3. NMR and refinement statistics for full-length (61 nt) ε
Table A.1. Nuclei excluded from sPRE analysis due to spectral overlap
Table A.2. Nuclei excluded from 13C ?xy analysis due to spectral overlap
Table A.3. Individual data fitting of 1H CPMG measurements
Table A.4. 1H CPMG data fitting from different analysis programs
Table A.5. Combined results for PL angle-based clustering of ε R3
Table A.6. Combined results for global RMSD-based clustering of ε R3
Table A.7. NMR parameters for 13C R1 and R1ρ experiments
Table A.8. Additional NMR parameters for 13C spin relaxation experiments
Table A.9. Nuclei excluded from dynamics analysis due to spectral overlap
Table A.10. NMR parameters for 1H CPMG experiments
Table A.11. Additional NMR parameters for 1H CPMG experiments

List of Figures

Figure 1.1. Structural biology statistics
Figure 1.2. Limitations of 1H NMR of RNA
Figure 1.3. Nomenclature of RNA labeling
Figure 1.4.
Examples of N-labeled 2′-O-tBDMS amidites
Figure 1.5. Examples of 2H-, 13C-, and/or 15N-labeled 2′-O-CEM amidites
Figure 1.6. Commercially available 13C-labeled modified 2′-O-tBDMS amidites
Figure 1.7. Commercially available ribose-labeled 2′-O-tBDMS amidites
Figure 1.8. Overview of solid-phase RNA synthesis
Figure 1.9. Overview of T7 RNAP-based in vitro transcription
Figure 1.10. Overview of enzymatic RNA ligation
Figure 1.11. Overview of segmental RNA labeling
Figure 1.12. List of possible labeling topologies
Figure 2.1. Scheme of [2-13C, 7-15N]-ATP labeled RNA
Figure 2.2. NMR structure experiments in [2-13C, 7-15N]-ATP labeled RNA
Figure 2.3. NMR dynamics measurements in [2-13C, 7-15N]-ATP labeled RNA
Figure 2.4. Scheme of position-specifically [1′,6-13C2, 5-2H]-uridine labeled RNA
Figure 2.5. Schematic of scalar and dipolar couplings in RNA
Figure 3.1. RNA dynamics by solution NMR spectroscopy
Figure 3.2. Effect of adjacent 13C-13C dipolar coupling on R1 rates in RNA
Figure 3.3. Simulated contributions to Ade-C2 relaxation rates
Figure 3.4. Simulated Ade-C2 R1 rates in uniformly/selectively labeled RNA
Figure 3.5. Pulse scheme for 13C R1 in selectively labeled RNA
Figure 3.6. RNA constructs for dynamics measurements
Figure 3.7. Measured Ade-C2 R1 rates in uniformly/selectively labeled RNA
Figure 3.8.
Simulated NMR relaxation dispersion and CEST experiments
Figure 3.9. Pulse scheme for 1H CPMG in selectively labeled RNA
Figure 3.10. RNA construct for CPMG measurements
Figure 3.11. CPMG measurements in selectively labeled RNA
Figure 4.1. HBV genome organization
Figure 4.2. HBV lifecycle
Figure 4.3. HBV protein-priming and template translocation
Figure 4.4. HBV rcDNA genome synthesis
Figure 4.5. Summary of previous NMR studies of ε
Figure 4.6. ε sequence and structure requirements for HBV replication
Figure 4.7. ε modular constructs to facilitate resonance assignment
Figure 4.8. Assignment of ε priming loop resonances
Figure 4.9. Assignment of additional ε priming loop resonances
Figure 4.10. Novel ε secondary structure information
Figure 4.11. Full-length ε exhibits high solvent accessibility and local dynamics
Figure 4.12. NMR and SAXS refined solution structure of full-length ε
Figure 4.13. NMR samples used to probe full-length ε dynamics
Figure 4.14. 13C relaxation experiments indicate dynamic regions of full-length ε
Figure 4.15. Mapping dynamic “hot spots” in full-length ε
Figure 4.16. 1H CPMG experiments reveal chemical exchange in full-length ε
Figure 4.17.
Quantifying chemical exchange processes in full-length ? .................. 137 Figure 4.18. MD simulations reveal high-frequency motions within full-length ? ..... 139 Figure 4.19. Structural insight into ns-?s sampling within full-length ? .................... 140 Figure 4.20. Global ? motions correlate with priming loop conformations ............... 141 Figure 4.21. Vast conformational sampling of full-length ? ...................................... 143 Figure 4.22. Full-length ? dynamics occur in highly conserved nucleotides ............ 152 Figure 5.1. Chronic HBV infection statistics ............................................................ 155 Figure 5.2. Timeline of FDA-approved chronic HBV treatment ............................... 156 Figure 5.3. FDA-approved NRTIs for chronic HBV treatment ................................. 157 Figure 5.4. NRTIs in clinical trials for chronic HBV treatment ................................. 161 Figure 5.5. Examples of HBV RNase H inhibitors ................................................... 162 Figure 5.6. Examples of ?-P binding inhibitors ........................................................ 164 Figure 5.7. Predicted ligand cavity in full-length ? ................................................... 165 Figure 5.8. Initial hit compounds from SMM ............................................................ 166 Figure 5.9. CSP mapping of Raloxifene binding to full-length ? .............................. 168 Figure 5.10. CSP mapping of Raloxifene binding to PL ? ....................................... 169 Figure 5.11. CSP mapping of SERMS binding to full-length ? ................................ 170 Figure 5.12. Quantifying the affinity of SERMS binding to full-length ? ................... 170 Figure 5.13. Computational docking and MD simulations of ? R3-Raloxifene ......... 172 Figure 5.14. Raloxifene has no effect on HBV protein priming ............................... 173 Figure 5.15. 
Virtual screen lead compound selection strategy ................................ 179 Figure 5.16. Virtual screen-identified lead compounds ........................................... 180 Figure 5.17. Lead compound binding to full-length ? ............................................... 181 Figure 5.18. Daclatasvir selectively binds full-length ? ............................................ 182 Figure 5.19. Mapping Daclatasvir binding to full-length ? ........................................ 183 Figure 5.20. Computational docking of ? R3-Daclatasvir ........................................ 184 Figure 5.21. MD simulations of ? R3-Daclatasvir .................................................... 185 Figure A.1. 1H NMR spectrum of compound 122 .................................................... 197 Figure A.2. 13C NMR spectrum of compound 122 ................................................... 197 Figure A.3. 1H NMR spectrum of compound 151 .................................................... 198 Figure A.4. 1H NMR spectrum of compound 152 .................................................... 198 Figure A.5. 1H NMR spectrum of compound 153 .................................................... 199 Figure A.6. 13C NMR spectrum of compound 153 ................................................... 199 Figure A.7. 1H NMR spectrum of compound 154 .................................................... 200 Figure A.8. 13C NMR spectrum of compound 154 ................................................... 200 Figure A.9. 1H NMR spectrum of compound 155 .................................................... 201 Figure A.10. 13C NMR spectrum of compound 155 ................................................. 201 Figure A.11. 1H NMR spectrum of compound 156 .................................................. 202 Figure A.12. 13C NMR spectrum of compound 156 ................................................. 202 Figure A.13. 
ESI-TOF MS spectrum of compound 156 ........................................... 203 Figure A.14. 31P NMR spectrum of compound 162 ................................................. 203 Figure A.15. 1H NMR spectrum of compound 168 .................................................. 204 Figure A.16. 13C NMR spectrum of compound 168 ................................................. 204 Figure A.17. 1H NMR spectrum of compound 171 .................................................. 205 Figure A.18. 13C NMR spectrum of compound 171 ................................................. 205 Figure A.19. 1H NMR spectrum of compound 172 .................................................. 206 xiii Figure A.20. 13C NMR spectrum of compound 172 ................................................. 206 Figure A.21. 31P NMR spectrum of compound 172 ................................................. 207 Figure A.22. ESI-TOF MS of compound 172 .......................................................... 207 Figure A.23. ESI-TOF MS of compound 173 ........................................................... 208 Figure A.24. ESI-TOF MS of compound 174 ........................................................... 208 Figure A.25. ESI-TOF MS of compound 175 ........................................................... 209 Figure A.26. ESI-TOF MS of compound 176 ........................................................... 209 Figure A.27. Simulated Ade-C2 relaxation in uniformly/selectively labeled RNA .... 210 Figure A.28. Representative TROSY spectra in Ade-C2 relaxation experiments ... 210 Figure A.29. Measured Ade-C2 relaxation in uniformly/selectively labeled RNA .... 211 Figure A.30. ? modular constructs faithfully recapitulate full-length ? ...................... 212 Figure A.31. Comparison of full-length and apical loop ? NMR structures .............. 213 Figure A.32. Full-length ? SAXS data ...................................................................... 214 Figure A.33. 
Relaxation decay curves for full-length ? ............................................ 215 Figure A.34. Comparison of ? R3 MD trajectories ................................................... 218 Figure A.35. Small molecule microarray set-up ...................................................... 222 Figure A.36. NMR titration experiments for SMM-selected compounds ................. 223 Figure A.37. Testing selectivity of Raloxifene and Pentamidine ............................. 224 Figure A.38. Priming loop docking validation .......................................................... 225 Figure A.39. Non-binding lead compounds ............................................................. 226 xiv List of Schemes Scheme 1.1. Synthesis of [6-13C, 5-2H]-uracil ........................................................ 19 Scheme 1.2. Synthesis of [6-2H]-5FU ..................................................................... 20 Scheme 1.3. Synthesis of [6-13C]-thymine ............................................................. 21 Scheme 1.4. Synthesis of [8-13C]-adenine .............................................................. 21 Scheme 1.5. Synthesis of [8-13C]-adenine .............................................................. 21 Scheme 1.6. Synthesis of [1-15N]-adenine .............................................................. 23 Scheme 1.7. Synthesis of [3-15N]-adenine .............................................................. 24 Scheme 1.8. Synthesis of [7-15N]-guanine .............................................................. 24 Scheme 1.9. Synthesis of [9-15N]-adenine .............................................................. 25 Scheme 1.10. rNTP synthesis from D-glucose ....................................................... 28 Scheme 1.11. rNTP synthesis from D-ribose .......................................................... 30 Scheme 1.12. rNTP synthesis from inosine ............................................................ 
32 Scheme 1.13. Synthesis of [6-13C, 5-2H]-uridine 2?-O-TOM amidite ........................ 34 Scheme 1.14. Synthesis of [6-13C, 5-2H]-N4-Ac-cytidine 2?-O-TOM amidite ............ 35 Scheme 1.15. Synthesis of [8-13C]-N6-Bz-adenosine 2?-O-TOM amidite ................ 36 Scheme 1.16. Synthesis of [8-13C]-N2-iBu-guanosine 2?-O-TOM amidite ............... 36 Scheme 1.17. Synthesis of [1,3-15N2]-cmo5-uridine 2?-O-tBDMS amidite ................ 40 Scheme 1.18. Synthesis of [2,8-13C2]-N6-methyladenosine 2?-O-tBDMS amidite ... 41 Scheme 1.19. Synthesis of [5-13C, 5-19F]-uridine 2?-O-tBDMS amidite ................... 42 Scheme 1.20. Synthesis of [5-13C, 5-19F]-N4-Ac-cytidine 2?-O-tBDMS amidite ....... 43 Scheme 2.1. Synthesis of [2-13C, 7-15N]-adenine .................................................... 59 Scheme 2.2. Synthesis of [2-13C, 7-15N]-ATP .......................................................... 60 Scheme 2.3. Synthesis of [1?,6-13C2, 5-2H]-uridine .................................................. 70 Scheme 2.4. Synthesis of [1?,6-13C2, 5-2H]-uridine 2?-O-CEM amidite ..................... 71 Scheme 2.5. Synthesis of N6-Pac-adenosine 2?-O-CEM amidite ............................ 78 Scheme 2.6. Synthesis of N2-Pac-guanosine 2?-O-CEM amidite ............................ 78 Scheme 2.7. Synthesis of N4-Ac-cytidine 2?-O-CEM amidite .................................. 78 Scheme 2.8. Synthesis of uridine 2?-O-CEM amidite .............................................. 
Abbreviations

1D, one-dimension(al)
2D, two-dimension(al)
3D, three-dimension(al)
5FU, [5-13C, 5-19F, 6-2H]-uracil
A, adenosine
Ac, acetyl
Ac2O, acetic anhydride
ACE, bis(acetoxyethoxy)methyl ether
Ade, adenosine
Ade-R1,C2, adenosine C2 R1 rate
Ade-R1,C2(selective), adenosine C2 R1 rate from a selectively labeled RNA
Ade-R1,C2(uniform), adenosine C2 R1 rate from a uniformly labeled RNA
ADMET, absorption, distribution, metabolism, excretion, and toxicity
ADV, Adefovir dipivoxil
AIC, Akaike Information Criterion
AICA, 5-aminoimidazole-4-carboxamide
AL, apical loop
ALT, alanine aminotransferase
AMP(s), adenosine 5′-monophosphate(s)
APRT, adenine phosphoribosyltransferase
arom, aromatic
ATBR, 1′-O-acetyl-(2′,3′,5′-O-tribenzoyl)-β-D-ribofuranose
ATP(s), adenosine 5′-triphosphate(s)
amidites, phosphoramidites
B0, static magnetic field
bp, (Watson-Crick) base pair
BIL, Biliverdin
br, broad
BSA, bovine serum albumin
BSV, Besifovir dipivoxil maleate
BTT, 5-(benzylthio)-1H-tetrazole
Bz, benzoyl
C, cytidine
C, HBV core protein
12C, carbon-12
13C, carbon-13
13C-KCN, 13C-potassium cyanide
cccDNA, covalently closed circular DNA
CDI, 1,1′-carbonyldiimidazole
CEF, Ceftaroline fosamil
CEM, 2-cyanoethoxymethyl
CEP, 2-cyanoethyl-N,N-diisopropylamino
CEST, Chemical Exchange Saturation Transfer
CH3, methyl
(CH3)2C(OCH3)2, dimethoxypropane
cHBV, chronic HBV
CHCl3, chloroform
CH2Cl2, dichloromethane
CH3CN, acetonitrile
CH3NO, formamide
C5H9NOS, 2-cyanoethyl methylthiomethylether
CH3OH, methanol
C2H5ONa, sodium ethoxide
CIL, Cambridge Isotope Laboratories
CK, creatine kinase
CLV, Clevudine
cmo5U, uridine 5-oxyacetic acid
CMP(s), cytidine 5′-monophosphate(s)
CO2, carbon dioxide
CP, creatine phosphate
CPMG, Carr-Purcell-Meiboom-Gill
cryo-EM, cryo-electron microscopy
CSA, chemical shielding anisotropy
CSP, chemical shift perturbation
CT, catalase
CTP(s), cytidine 5′-triphosphate(s)
CTPS, CTP synthase
Cyt, cytosine/cytidine
d, doublet
DAC, Daclatasvir
dATP(s), deoxyadenosine 5′-triphosphate(s)
D2O, deuterium oxide
DDX3, DEAD-box RNA helicase 3
DEMA, diethoxymethyl acetate
DMAM, (di-N-methylamino)methylene
DMAP, N-dimethyl aminopyridine
DMF, dimethylformamide
DMSO, dimethyl sulfoxide
DMSO-d6, deuterated dimethyl sulfoxide
DNA, deoxyribonucleic acid
DR1, direct repeat 1
DR2, direct repeat 2
DSS, sodium-3-(trimethylsilyl)-1-propanesulfonate
DtBS, di-tert-butylsilyl bis(trifluoromethanesulfonate)
DTT, dithiothreitol
EC, Enzyme Commission
E. coli, Escherichia coli
EDC, N-ethyl-N′-(3-dimethyl aminopropyl) carbodiimide
EDTA, ethylenediaminetetraacetic acid
ELB, Elbasvir
ENO, enolase
eq, equivalents
ESI, electrospray ionization
ETV, Entecavir
19F, fluorine-19
F, phenylalanine
FOL, Folinic acid
fs, femtosecond
G, guanosine
g, grams
Gd-DTPA-BMA, gadolinium-diethylenetriamine pentaacetic acid-bismethylamide
GDN, Geldanamycin
GND, phosphogluconate dehydrogenase
GK, guanylate kinase
GLUD, Glutamate dehydrogenase (NAD(P)+)
GMP(s), guanosine 5′-monophosphate(s)
GTP(s), guanosine 5′-triphosphate(s)
Gua, guanine/guanosine
h, hour
1H, hydrogen-1, proton
2H, hydrogen-2, deuterium
HBeAg, hepatitis B e antigen
HBsAg, hepatitis B surface antigen
HBV, hepatitis B virus
HCC, hepatocellular carcinoma
HCl, hydrochloric acid
HC(OC2H5)3, triethyl orthoformate
HCOOH, formic acid
HDV, hepatitis delta virus
HEM, hemin
HEPES, 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
HF, hydrogen fluoride
HF, host factor
HH, hammerhead
H2SO4, sulfuric acid
HID, N-Hydroxyisoquinolinediones
HIV-1, human immunodeficiency virus
Hsp, heat shock protein
HNO, N-Hydroxynaphthyridinones
hNOE, heteronuclear Overhauser effect
HOCH2CH2CN, 2-cyanoethanol
HPLC, high-performance liquid chromatography
HPD, N-Hydroxypyridinediones
HTS, high-throughput screening
HXK, hexokinase
Hz, hertz
iBu, isobutyryl
IFN, interferon
IMP, inosine 5′-monophosphate
IUBMB, International Union of Biochemistry and Molecular Biology
IUPAC, International Union of Pure and Applied Chemistry
IVE, Ivermectin
IVT, in vitro transcription
kB, kilobase(s)
kDa, kilodalton
KCl, potassium chloride
kAB, forward rate (i.e., from A to B)
kBA, reverse rate (i.e., from B to A)
kex, exchange rate constant
LDH, L-lactate dehydrogenase
LED, Ledipasvir
LH, lower helix
L. lactis, Lactococcus lactis
LLC, Limited Liability Company
LMV, Lamivudine
M, molar
m1A, N1-methyladenine
m6A, N6-methyladenine
m3C, N3-methylcytidine
M. extorquens, Methylorubrum extorquens
mg, milligram
MgCl2, magnesium chloride
min, minute
MIN, Minocycline
MK, myokinase
mL, milliliter
M. methylotrophus, Methylophilus methylotrophus
mM, millimolar
mmol, millimole
MOPS, 3-morpholinopropane-1-sulfonic acid
mRNA, messenger RNA
ms, millisecond
MS, mass spectrometry
MTM, methylthiomethyl
N, one normal acid or base
14N, nitrogen-14
15N, nitrogen-15
nc, non-(Watson-Crick) canonical base pair
NH3, ammonia
NH4Cl, ammonium chloride
15NH4Cl, 15N-ammonium chloride
NH4F, ammonium fluoride
NH4OH, ammonium hydroxide
15NH4OH, 15N-ammonium hydroxide
NAD(P)H/NADP+, nicotinamide adenine dinucleotide phosphate
NaH13CO3, 13C-sodium bicarbonate
NaN3, sodium azide
NaNO2, sodium nitrite
Na15NO2, sodium 15N-nitrite
NaOD, sodium deuteroxide
Na3PO4, sodium phosphate
Na2SO4, sodium sulfate
Na2S2O3, sodium thiosulfate
Na2S2O4, sodium dithionite
NAT, Natamycin
NDB, Nucleic Acid Database
NH415NO3, 15N-ammonium nitrate
NIS, N-iodosuccinimide
nm, nanometer
nM, nanomolar
nmol, nanomoles
NMPK, nucleoside-monophosphate kinase
NMR, nuclear magnetic resonance
N,N-DMA, N,N-dimethylaniline
NOE, nuclear Overhauser effect
NRTI(s), nucleos(t)ide reverse transcriptase inhibitor(s)
nt(s), nucleotide(s)
NTCP, sodium-taurocholate co-transporting polypeptide receptor
OH, hydroxyl
OMP, orotidine 5′-monophosphate
31P, phosphorus-31
P, HBV polymerase protein
pA, population of state A
Pa, pascal
pB, population of state B
Pac, phenoxyacetyl
PAGE, polyacrylamide gel electrophoresis
PEG, polyethylene glycol
PDB, Protein Data Bank
PDDF, pair distance distribution function
P. furiosus, Pyrococcus furiosus
PGI1, glucose-6-phosphate isomerase
pgRNA, pre-genomic RNA
pH, potential hydrogen
PKC-?, protein kinase C-?
PL, priming loop
PLOR, position-selective labeling of RNA
PME, particle mesh Ewald
PNPase, purine nucleoside phosphorylase
POCl3, phosphorus oxychloride
Pol II, RNA polymerase II
poly-A, poly-adenylate(d)
ppm, parts per million
PPP-IX, Protoporphyrin IX
PPP-IX-Na, Protoporphyrin IX disodium
preC, HBV pre-core protein
preS1, HBV pre-surface protein (large)
preS2, HBV pre-surface protein (medium)
PRPP, 5-phospho-D-ribosyl-α-1-pyrophosphate
PRPPS, phosphoribosylpyrophosphate synthetase
ps, picosecond
PTL, pseudo-triloop
pTSA, para-toluene sulfonic acid
PYKF, pyruvate kinase
QUE, Quercetin
R1, longitudinal relaxation rate
R1,A, longitudinal relaxation rate of state A
R1,B, longitudinal relaxation rate of state B
R1ρ, rotating-frame relaxation rate
R2, transverse relaxation rate
R2,A, transverse relaxation rate of state A
R2,B, transverse relaxation rate of state B
R2,eff, effective R2 rate
R2?, fast relaxing anti-TROSY component
R2?, slowly relaxing TROSY component
rad, radian
rcDNA, relaxed circular DNA
RD, relaxation dispersion
RDC, residual dipolar coupling
Rex, exchange contribution to R2
RF, radiofrequency
RK, ribokinase
RMSD, root-mean square deviation
RNA, ribonucleic acid
RNAP, RNA polymerase
ROS, Rosmarinic acid
RPI1, Ribose-5-phosphate isomerase
rRNA, ribosomal RNA
rNMP(s), ribonucleotide 5′-monophosphate(s)
rNTP(s), ribonucleotide 5′-triphosphate(s)
R5P, ribulose-5-phosphate
S, generalized order parameter
S, HBV pre-surface protein (small)
s, second
s, singlet
SAH, S-adenosylhomocysteine
SAM, S-adenosylmethionine
SAQ, Saquinavir
SAS, sample-and-select
SAXS, small-angle X-ray scattering
SD, standard deviation
SE, standard error
SELEX, Systematic Evolution of Ligands by EXponential Enrichment
SERM(s), selective estrogen receptor modulator(s)
SIM, Simeprevir
SiO2, silica
SMM, small molecule microarray
SO2Cl2, sulfuryl chloride
sPRE, solvent paramagnetic relaxation enhancement
SPS, solid-phase synthesis
T4 PNK, T4 polynucleotide kinase
t, triplet
T, tesla
TAF, Tenofovir alafenamide
tBDMS, tert-butyldimethylsilyl
TBV, Telbivudine
TDF, Tenofovir disoproxil fumarate
TEA-3HF, triethylamine trihydrofluoride
TEL, Telithromycin
TEN, Tenofovir
TfOH, Trifluoromethanesulfonic acid
THF, anhydrous tetrahydrofuran
Thy, thymine
TiPCEP, 2-cyanoethyl N,N,N′,N′-tetraisopropylphosphorodiamidite
TIPP, thermostable inorganic pyrophosphatase
TIPS-Cl2, 1,3-dichloro-1,1,3,3-tetraisopropyldisiloxane
TOCSY, total correlation spectroscopy
TOM, [(triisopropylsilyl)oxy]methyl
Trelax, constant time relaxation period
tRNA, transfer RNA
TXL, Tenofovir exalidex
u, uniformly (labeled)
U, uridine
UH, upper helix
UMP(s), uridine 5′-monophosphate(s)
UPase, uridine phosphorylase
UPRT, uracil phosphoribosyltransferase
Ura, uracil
Uri, uridine
UTP(s), uridine 5′-triphosphate(s)
VEL, Velpatasvir
VS, Varkud satellite
VS, virtual screening
W, watt
WAXS, wide-angle X-ray scattering
XGPRT, xanthine-guanine phosphoribosyltransferase
XO, xanthine oxidase
Y, tyrosine
YIBO, 3-phosphoglycerate mutase
ZWF, glucose-6-phosphate dehydrogenase
α, favorable, highly populated NMR state (aligned parallel with B0)
αR1P, α-D-ribofuranosyl-1-phosphate sodium salt
αHT, α-Hydroxytropolones
β, unfavorable, low populated NMR state (aligned antiparallel with B0)
γ, gyromagnetic ratio
?()*''(, chemical shift change that would be present if pA = pB
??, change in chemical shift
μL, microliter
μM, micromolar
μmol, micromole
μs, microsecond
νCPMG, frequency at which 180° pulses are applied in a CPMG experiment
τC, overall correlation time
τex, exchange lifetime
Ψ, pseudouridine
ω, Larmor frequency
ω1, B1 field strength
ΩA, offset of the B1 spin-lock field from the peak in state A
ΩB, offset of the B1 spin-lock field from the peak in state B
Ω, offset from the spin-lock carrier frequency
ωobs, resonance frequency of the observed nucleus
ωrf, spinlock transmitter frequency
Å, angstrom
°C, degrees Celsius

1 RNA isotope labeling in the context of solution NMR spectroscopy

*This chapter is adapted from the following [1–3].

1.1 Introduction

Ribonucleic acid (RNA) is a dynamic macromolecule with many biological functions, including gene regulation [4–8], catalysis [9–11], structural organization [12,13], and viral replication [14,15], to name a few. Almost without exception, RNA's intricate three-dimensional (3D) structure and conformational plasticity are required to carry out these functions [16,17]. A robust understanding of RNA function therefore requires high-resolution structure and dynamics information. Unfortunately, RNA 3D structures are scarce compared to those of proteins. RNA-only structures account for less than 1% of the entries deposited in the Protein Data Bank (PDB), whereas protein-only structures account for a staggering 87% (Figure 1.1A). This paucity undercuts current understanding of RNA structure-function relationships.

Nuclear magnetic resonance (NMR) spectroscopy accounts for 35% of the RNA structures deposited in the Nucleic Acid Database (NDB) and ~7% of the protein structures in the PDB, making it competitive with other biophysical tools such as X-ray crystallography and, more recently, cryo-electron microscopy (cryo-EM) (Figure 1.1B) [18]. Moreover, NMR spectroscopy provides high-resolution structural and dynamic information in solution, rendering it an ideal tool to study RNA and its interactions with macromolecules, small drug-like compounds, or both [19–26].
However, unlike proteins, which are made up of 20 unique amino acid building blocks, RNAs are composed of only four aromatic nucleotides (nts), namely the purines adenosine (Ade, A) and guanosine (Gua, G) and the pyrimidines cytidine (Cyt, C) and uridine (Uri, U), that resonate over a very narrow chemical shift region. This poor chemical shift dispersion is further exacerbated with increasing RNA size. In fact, only 23 RNA structures >60 nt have been solved by NMR (some requiring additional methodologies, e.g., cryo-EM). To overcome these limitations, novel isotope labeling strategies that incorporate atom-specific labels (e.g., uridine 13C6) or expand the number of NMR probes (e.g., 13C-19F) have been developed.

Figure 1.1. Structural biology statistics [3]. (A) Percentage of RNA-only or protein-only structures deposited in the PDB. Given that this analysis excluded DNA-only structures and structures of protein-RNA/DNA complexes, the percentages do not sum to 100%. (B) Percentage of RNA-only and protein-only structures deposited in the NDB and PDB, sorted by structure determination technique. Given that this analysis is self-contained within categories, the percentages sum to 100%. NMR accounts for a larger fraction of RNA structures as compared to proteins. (C) Histogram of RNA NMR structures in the NDB, sorted by RNA size (in nt, bin = 10 nt). Given the challenges faced by RNA NMR, there are only 23 NMR structures corresponding to RNAs >60 nt. PDB (https://www.rcsb.org/) and NDB (http://ndbserver.rutgers.edu/) statistics were accessed in March 2022.

The two main approaches to obtaining isotope-labeled RNA are enzymatic and chemical synthesis. For the enzymatic approach, almost all methods rely on DNA template-directed in vitro transcription (IVT) with T7 RNA polymerase (RNAP) using ribonucleoside 5′-triphosphates (rNTPs) [27–30]. The alternative method is chemical solid-phase synthesis (SPS) using RNA phosphoramidites (amidites) [31–34].
Both approaches can use unlabeled and isotope-labeled building blocks (i.e., rNTPs and amidites) to generate versatile RNA labeling patterns. The four strategies to obtain such building blocks for enzymatic RNA synthesis are: (1) purchase commercially available isotope-labeled rNTPs, (2) use simple organisms to incorporate isotope-labeled precursors into their rNTPs, (3) complete de novo biosynthesis of rNTPs, or (4) utilize a hybrid chemo-enzymatic approach that combines chemical syntheses of ribose and nucleobases and their enzymatic coupling to prepare rNTPs.

In this chapter, we outline the common NMR-active isotopes (Section 1.2) and then detail the benefits and limitations of the four methods to prepare isotope-labeled RNA building blocks (Section 1.3), with special emphasis on chemo-enzymatic and amidite labeling. My specific contributions to these approaches, however, will not be discussed until Chapter 2. With labeled building blocks in hand, we will then discuss the various ways of using them to make isotope-labeled RNA in the context of NMR studies (Section 1.4). Finally, we will conclude with some remarks on where the RNA labeling field is and where it is headed (Section 1.5). More broadly, this chapter is meant as an introduction to a variety of topics, techniques, and terminology that will be interwoven throughout the remaining chapters. It is also important to note that this chapter is adapted from three different review articles and therefore includes contributions from other graduate students from our group: Drs. Owen Becette and Regan LeBlanc as well as Mary Taiwo.

1.2 Stable isotopes in NMR spectroscopy

Frederick Soddy is credited with coining the word "isotope" from the Greek isos (ἴσος) and topos (τόπος), meaning "same place" [35], with the idea that stable isotopes are chemical elements that occupy the same position in the periodic table but differ in mass due to a different number of neutrons within the atomic nucleus.
Stable isotopes have been used in a wide range of applications in industry, academia, and medicine [35]. In particular, the use of stable isotopes as probes, labels, or standards has significantly impacted methods such as NMR spectroscopy and mass spectrometry (MS). For this work, we will focus on how these probes impact RNA NMR spectroscopy, with special emphasis on the following stable isotopes: proton (hydrogen-1 or 1H), deuterium (hydrogen-2 or 2H), carbon-13 (13C), nitrogen-15 (15N), fluorine-19 (19F), and phosphorus-31 (31P) (Table 1.1).

Table 1.1. Stable isotopes relevant to RNA NMR spectroscopy [3,36,37].

| Isotope | Natural abundance (%) | γ (rad Hz T⁻¹) | Spin |
|---|---|---|---|
| 1H | 99.99 | 26.752 × 10⁷ | 1/2 |
| 2H | 0.01 | 4.107 × 10⁷ | 1 |
| 12C | 98.90 | NMR inactive | NMR inactive |
| 13C | 1.10 | 6.728 × 10⁷ | 1/2 |
| 14N | 99.63 | 1.934 × 10⁷ | 1 |
| 15N | 0.37 | -2.713 × 10⁷ | 1/2 |
| 19F | 100.00 | 25.181 × 10⁷ | 1/2 |
| 31P | 100.00 | 10.839 × 10⁷ | 1/2 |

1.2.1 Proton isotope

The proton isotope has high natural abundance (~100%) and the highest sensitivity among stable, NMR-active nuclei (Table 1.1). Therefore, homonuclear two-dimensional (2D) 1H-1H NMR methods were attractive in the early days of NMR analysis. However, the very limited resolution of ribose and aromatic nucleobase resonances in RNA 1H spectra restricted such studies to small RNAs (<5 kilodalton, kDa). Within the ribose, all protons except H1′ (i.e., H2′, H3′, H4′, H5′, and H5″) are clustered within a narrow ~0.6-0.8 ppm range (Figure 1.2A) [38]. Within the nucleobase, the chemical shift distribution of all protons is limited to 1 ppm or less, except for imino protons, which have a dispersion of ~4 ppm (Figure 1.2A) [39,40]. Taken together, the distribution of proton resonances leads to severe chemical shift overlap that worsens as RNAs grow in size due to increased resonance line broadening (Figure 1.2B).

Figure 1.2. Limitations of 1H NMR of RNA [3]. (A) One-dimensional (1D) 1H NMR spectrum of a 61 nt RNA to emphasize the narrow chemical shift dispersion of RNA protons.
Here, bp and nc refer to canonical Watson-Crick base pairs and non-canonical pairs, respectively. A schematic of RNA ribose and nucleobase structures and numbering is shown as an inset. (B) Nucleobase region of 1H NMR spectra for RNAs of increasing size. Both signal overlap and broad linewidths worsen as RNAs grow in size. In fact, for the best visual representation, the signals corresponding to the 61 and 232 nt RNAs were scaled up to display them on a similar scale to that of the 14 nt RNA.

1.2.2 Heteronuclear 13C and 15N isotopes

Unlike protons, with a chemical shift span of 2-15 ppm, 13C and 15N nuclei in RNA have a larger chemical shift distribution among the various atomic sites. For example, 13C nuclei in RNA have chemical shifts from 61 ppm (ribose C5′) to 170 ppm (non-protonated pyrimidine carbons, i.e., Cyt- and Uri-C4), and 15N nuclei from 70 ppm (amino nitrogen) to 240 ppm (non-protonated purine nitrogens, i.e., Ade- and Gua-N7) [38–40]. Introduction of the 15N isotope (0.37%) into RNA nucleobases circumvents the extensive resonance line broadening that arises from the electric quadrupole moment of the naturally abundant 14N isotope (99.63%) (Table 1.1). As a spin-1/2 nucleus with low gyromagnetic ratio (γ) (Table 1.1), the 15N isotope provides very narrow spectral lines. Moreover, given the wider chemical shift dispersion of nitrogen relative to the 1H nucleus and its narrower linewidths relative to 13C and 1H nuclei, nitrogen atoms located in nucleic acid major and minor grooves (sites of metal, drug, or macromolecule interactions) can be readily monitored, even in larger RNAs. However, low γ is also an "Achilles heel". In the absence of appropriate NMR cryogenic probes and high magnetic fields, detecting low-γ nuclei such as 15N has been unattractive. Increasing availability of such probes is expected to reverse this trend. Nevertheless, these considerations raised hopes that heteronuclear NMR methods might be a solution for the shortcomings of proton NMR [41].
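To put the low gyromagnetic ratio penalty in numbers, the short sketch below converts the Table 1.1 values for the spin-1/2 nuclei into Larmor frequencies and into an approximate per-nucleus sensitivity relative to 1H. Two assumptions are not from the text: the field strength of 14.1 T is chosen purely for illustration, and the cube-of-the-ratio scaling is a common rule of thumb that ignores natural abundance, relaxation, and probe design.

```python
import math

# Gyromagnetic ratios from Table 1.1 in units of 1e7 rad s^-1 T^-1.
# Only the spin-1/2 nuclei are included.
GAMMA_1E7 = {"1H": 26.752, "13C": 6.728, "15N": -2.713, "19F": 25.181, "31P": 10.839}

B0_TESLA = 14.1  # assumed static field, for illustration only


def larmor_mhz(nucleus, b0=B0_TESLA):
    """Larmor frequency |gamma| * B0 / (2*pi), in MHz."""
    gamma = GAMMA_1E7[nucleus] * 1e7
    return abs(gamma) * b0 / (2 * math.pi) / 1e6


def relative_sensitivity(nucleus, reference="1H"):
    """Rule-of-thumb sensitivity per nucleus relative to the reference: |gamma ratio|**3."""
    return abs(GAMMA_1E7[nucleus] / GAMMA_1E7[reference]) ** 3


if __name__ == "__main__":
    for nuc in GAMMA_1E7:
        print(f"{nuc:>3}: {larmor_mhz(nuc):6.1f} MHz, "
              f"relative sensitivity {relative_sensitivity(nuc):.4f}")
```

Under these assumptions 15N comes out roughly three orders of magnitude less sensitive than 1H per nucleus, which is consistent with the emphasis placed on cryogenic probes and high fields above.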
Beginning in the 1980s, several groups introduced 13C, 15N, and 2H labels to facilitate NMR studies of RNAs and proteins [42–51]. Depending on the scientific question, these labels were introduced uniformly (e.g., 13C enriched at all carbon sites) or selectively (e.g., Uri-C4) using bacteria in vivo or enzyme-catalyzed synthesis in vitro. Selective enrichment was achieved by growing auxotrophs on obligate chemically synthesized compounds. Selective 13C incorporation into bacterial tRNAs [49–51] and 15N-labeling of tRNA and 5S rRNA enabled various atomic sites in these RNAs to be monitored by NMR. Uniform enrichment with 15N was applied to 5S rRNA in vivo [45–48]. To extend this labeling to additional RNAs, several research groups developed in vitro methods to convert ribonucleoside 5′-monophosphates (rNMPs) isolated from bacteria grown on 2H-, 13C-, and/or 15N-sources into rNTPs for IVT [52–56]. These uniform 13C- and 15N-labeling technologies did extend the use of NMR to medium-sized RNAs (<20 kDa). However, two perennial challenges remained: low signal-to-noise and decreased spectral resolution. The latter arises from the reintroduction of spectral overlap along the heteronuclear dimension as the RNA grows in size, and the former arises from the increased relaxation that results from the slower overall tumbling of large biomolecules. The next section will describe recent labeling methods to overcome both problems.

1.2.3 Deuteration in context of heteronuclear 13C and 15N isotopes

Deuteration (i.e., replacement of 1H with 2H) simplifies the multiplicity of spin-spin interactions, eliminates nonessential resonances, reduces spectral overlap, helps to identify coupling patterns, and improves the precision with which coupling constants can be determined [57]. Given the smaller γ of 2H relative to 1H (γD ≈ γH/6.5) (Table 1.1), the relaxation rates are scaled to roughly 2% of their 1H values [(γD/γH)² ≈ 0.02].
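The scaling quoted above can be checked directly from the gyromagnetic ratios in Table 1.1. The sketch below is only that arithmetic; the function name is illustrative, and the calculation deliberately reproduces the gyromagnetic-ratio factor alone, ignoring spin quantum number factors and any other contribution to relaxation.

```python
# Gyromagnetic ratios from Table 1.1, in rad s^-1 T^-1.
GAMMA_H = 26.752e7  # 1H
GAMMA_D = 4.107e7   # 2H


def gamma_squared_scaling(gamma_new, gamma_old):
    """Factor (gamma_new / gamma_old)**2 by which a gamma-squared-dependent rate changes."""
    return (gamma_new / gamma_old) ** 2


ratio = GAMMA_H / GAMMA_D                           # how many times larger gamma(1H) is
scaling = gamma_squared_scaling(GAMMA_D, GAMMA_H)   # fraction of the 1H value that remains

if __name__ == "__main__":
    print(f"gamma_H / gamma_D = {ratio:.2f}")
    print(f"(gamma_D / gamma_H)^2 = {scaling:.4f}")
```

With these inputs the ratio is about 6.5 and the squared factor is a little over 0.02, in line with the approximate values given in the text.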
By eliminating competing relaxation pathways of dipolar coupled protons, deuteration suppresses spin diffusion within a relaxation network, leading to smaller linewidths and higher signal-to-noise for the remaining protons and directly attached 13C and 15N spins [52,57,58]. Given these advantages, 2H-labeling has played an important role in probing the structure, dynamics, and interactions of large RNAs by NMR [18,59–64].

1.2.4 Fluorination in context of 2H, 13C, and 15N isotopes

In addition to 2H, magnetically active nuclei such as 19F have valuable spectroscopic properties that confer clear advantages in the study of macromolecular structure and conformational changes [65]. These benefits include the 100% natural abundance of 19F, a comparably large γ (94% of 1H), and a superior chemical shift dispersion that is ~6-fold that of 1H (Table 1.1) [26,66]. Furthermore, 19F is sensitive to changes in its local chemical environment, making it a useful probe of conformational changes [26,65,66]. Fluorine has an atomic radius (1.35 Å) slightly larger than that of a hydrogen (1.20 Å) but slightly smaller than that of a methyl group (2.00 Å). The 19F nucleus is therefore expected to substitute for either group without serious structural perturbation [67], making it a valuable tool for the in vitro study of medically important RNAs [68]. Finally, 19F is virtually absent in biological systems and therefore offers 19F NMR the bioorthogonal advantage of background-free drug screening [69]. Taken together, 19F is an attractive probe for studying RNAs in solution.

1.3 Stable isotope labeling of RNA building blocks

Throughout this work, we adopt the IUPAC/IUBMB guidelines for RNA atom numbering (Figure 1.2A) [70,71]. When describing RNA labeling, we will use four categories: (1) uniform, (2) nucleotide-specific, (3) atom-specific, and (4) position-specific (Figure 1.3).

Figure 1.3. Nomenclature of RNA labeling.
Schematic of an RNA molecule that is either (A) uniformly, (B) nucleotide-specifically, (C) atom-specifically, or (D) position-specifically labeled. The RNAs in A-C are prepared by IVT using either uniformly 13C/15N-labeled rNTPs, uniformly 13C/15N-labeled UTP, or [1′,6-13C2, 5-2H]-UTP, respectively. The RNA in D is prepared by SPS with a [1′,6-13C2, 5-2H]-uridine amidite. Orange nucleotides represent those harboring stable isotope labels.

Uniform labeling is when every atom of a certain type (e.g., 2H, 13C, 15N, or both of the latter) is enriched, nucleotide-specific labeling is when every nucleotide of a certain type (e.g., all uridines) is enriched, atom-specific labeling is when every atom of a certain type (e.g., Uri-C6) is enriched, and position-specific labeling is when an individual nucleotide (e.g., uridine 7) is labeled (Figure 1.3). In the latter case, the nucleotide that is incorporated position-specifically can itself be uniformly or atom-specifically labeled. As such, these labeling categories are not mutually exclusive.

1.3.1 Commercial isotope sources

The simplest approach to obtain isotope-labeled rNTPs and amidites is to purchase the desired 2H, 13C, and/or 15N isotope labels from a commercial source. As of January 2022, Cambridge Isotope Laboratories (CIL), Sigma-Aldrich, Cassia LLC, Silantes, and INNotope are the major suppliers of isotope-labeled rNTPs, whereas Silantes and INNotope are the only suppliers of isotope-labeled RNA amidites. Unfortunately, these products can be prohibitively expensive. Uniformly 2H-, 13C-, 15N-, 2H/15N-, and 13C/15N-labeled and atom-specifically 2H-, 13C-, and 2H/13C-labeled rNTPs are available for $800-5,600 per 100 micromoles (µmol) or 50 milligrams (mg) (Table 1.2). Additionally, uniformly 13C-, 15N-, and 13C/15N-labeled and atom-specifically 2H-, 13C-, 2H/13C-, and 15N-labeled amidites are available for $900-6,600 per 50 mg (Table 1.2).
For reference, IVT (20 milliliter (mL)) typically requires 250-1,000 microliter (µL) per rNTP (100 µmol stock) and yields 0.2-2.0 millimolar (mM) in 300 µL of RNA (Section 1.4.2). SPS (1 µmol) generally requires 10-20 mg per amidite (0.1 molar (M) stock) coupling and yields 0.2-0.6 mM in 300 µL of RNA (Section 1.4.1).

Table 1.2. Price of commercial isotope-labeled rNTPs and RNA amidites1.

| Category | Building block (a) | Price (b) ($) | Supplier (c) |
|---|---|---|---|
| Uniformly 2H-labeled rNTPs | ATP, GTP, CTP, or UTP | 1,300 | Silantes |
| Selectively 2H-labeled rNTPs | [3′,4′,5′,5′′-2H4]-ATP or -GTP | 800 | CIL |
| | [2-2H]-ATP | 1,200 | CIL |
| | [5,1′,2′,3′,4′,5′,5′′-2H7]-CTP | 1,800 | CIL |
| | [5,3′,4′,5′,5′′-2H5]-CTP or -UTP | 800 | CIL |
| | [5,6-2H2]-CTP | 1,800 | CIL |
| | [5,1′,2′,3′,4′,5′,5′′-2H7]-UTP | 1,400 | CIL |
| Uniformly 13C-labeled rNTPs and amidites | ATP, GTP, CTP, or UTP | 1,400 | Silantes |
| | A, C, G, or U | 6,600 | Silantes |
| Selectively 13C-labeled rNTPs and amidites | [8-13C]-ATP or -GTP | 1,400 | Silantes |
| | [8-13C]-A | 900 | Silantes |
| | [2,8-13C2]-A | 2,700 | Silantes |
| | [8-13C]-G | 1,000 | INNotope |
| | [1′,8-13C2]-A or -G | 2,800 | Silantes |
| Selectively 13C-labeled methylated amidites (d) | [13C]-m6A | 1,200 | INNotope |
| | [2,8-13C2]-m6A | 1,200 | INNotope |
| | [13C]-m1A | 1,200 | INNotope |
| | [13C]-m3C | 1,200 | INNotope |
| Selectively 2H/13C-labeled rNTPs and amidites | [6-13C, 5-2H]-CTP or -UTP | 1,600 | Silantes |
| | [6-13C, 5-2H]-C or -U | 1,000 | INNotope |
| | [1′,6-13C2, 5-2H]-C or -U | 2,800 | Silantes |
| Uniformly 15N-labeled rNTPs and amidites | ATP, GTP, CTP, or UTP | 900 | CIL |
| | A, G, C, or U | 1,400 | Silantes |
| Selectively 15N-labeled amidites | [1-15N]-A | 1,000 | INNotope |
| | [1-15N]-G | 1,100 | INNotope |
| | [3-15N]-C or -U | 1,000 | Silantes |
| | [1,3-15N2]-C | 1,200 | Silantes |
| | [1,3,4-15N3]-C | 1,000 | INNotope |
| | [1,3-15N2]-U | 1,000 | INNotope |
| Uniformly 2H/15N-labeled rNTPs | ATP, GTP, CTP, or UTP | 5,600 | Silantes |
| Uniformly 13C/15N-labeled rNTPs and amidites | ATP, GTP, CTP, or UTP | 1,100 | CIL |
| | A, G, C, or U | 5,300 | Silantes |

a. For simplicity, all nucleobase (A, G, or C) exocyclic amine protecting groups are neglected for amidites.
b. Prices of rNTPs (CIL, per 100 µmol or Silantes, per 50 mg) and amidites (Silantes or INNotope, per 50 mg) were accessed from their respective websites (CIL: https://www.isotope.com/, Silantes: https://www.silantes.com/, and INNotope: https://www.innotope.at/) in March 2022.
c. While some of the isotope-labeled material is available from multiple suppliers, all prices reflect the cheapest available and are rounded up to the nearest $100.
d. More information on these amidites can be found in Section 1.3.5.2 and Figure 1.6.

These considerations underscore a sobering fact: the cost per NMR sample when using commercial building blocks often exceeds $1,000. Since it takes multiple samples for complete NMR resonance assignment, robust RNA analysis by NMR can easily reach $10,000. For example, a recently determined structure of the 43 nt SAM/SAH-binding riboswitch required seven uniformly and nucleotide-specifically labeled samples by IVT and 13 atom- and position-specifically labeled samples by SPS72,73. This financial burden partly explains the slow rate of RNA structure depositions (Figure 1.1C). It is therefore crucial to reduce the costs of obtaining isotope-labeled RNA for NMR studies.

1.3.2 Biomass labeling

Biomass labeling incorporates isotope-labeled building blocks into simple organisms' RNA. This approach was established by the Pardi54 and Williamson55 research groups. Their method includes growing organisms on 13C- and/or 15N-labeled source(s), harvesting cells and extracting RNA, hydrolyzing RNA to rNMPs, and converting the rNMPs to rNTPs. The latter step can be achieved by chemical74 or enzymatic75 means, depending on the expertise and resources available. Enzymatic conversion is usually superior to the chemical approach, yielding rNTPs of >95% purity54,55,74–76. Although biomass methods permit new and commercially unavailable rNTP labeling patterns, the overall cost advantage is minimal, and the purification steps are laborious.
Nevertheless, many research groups have used biomass labeling to prepare RNA for NMR studies.

1.3.2.1 Uniform biomass labeling

Uniform 13C labeling was first achieved by growing Escherichia coli (E. coli)54,55, Methylophilus methylotrophus (M. methylotrophus)54, or Methylorubrum extorquens (M. extorquens)55 with either 13C-glucose or 13C-methanol. For uniform 13C/15N labeling, E. coli was grown with 13C-glucose and 15N-ammonium sulfate54,55. The use of M. methylotrophus and M. extorquens gained popularity due to their compatibility with the more cost-effective 13C-methanol54. Nevertheless, these organisms have significantly lower rNTP contents and more difficult growth conditions as compared to E. coli54. The rNTPs obtained from this method were used in IVT to make uniformly and nucleotide-specifically labeled RNAs for multi-dimensional NMR experiments54,55,76,77. These new experiments greatly simplified resonance assignment strategies and the structure determination of small (<30 nts) RNAs. However, alternative labeling strategies were needed to overcome spectral overlap in larger RNAs.

1.3.2.2 Atom-specific biomass labeling

Hoffman and Holland modified the biomass method to make atom-specifically labeled rNTPs78. In their approach, E. coli were grown with different 13C-sodium acetate sources to make various 13C-labeled rNTPs. For example, [2-13C]-sodium acetate labeled purine-C2, -C5, and -C8 (>95%), pyrimidine-C5 and -C6 (>90%), and ribose-C1′, -C4′, and -C5′ (~90%). Alternatively, [1-13C]-sodium acetate labeled purine-C4 and -C6 (>90%), pyrimidine-C2 and -C4 (>95%), and ribose-C3′ (~75%)78. Among others, Hoogstraten and co-workers employed a similar methodology wherein E. coli strains deficient in enzymes involved in the tricarboxylic acid cycle (DL323)79 or the oxidative pentose phosphate pathway (K10-1516)80 were grown with various 13C-glycerol sources81.
While most labeling patterns had yields <50%, [2′,4′-13C2]-AMP was created with an 80% yield in K10-1516 cells grown with [2-13C]-glycerol81. Importantly, Hoogstraten and co-workers demonstrated that 13C-13C dipolar couplings present in uniformly 13C/15N-labeled samples lead to over-estimated relaxation rates81. Our research group used a similar method but with a different isotope source82. Purine-C2 and -C8 (~95%), pyrimidine-C5 (~98%), and ribose-C1′ (42%) and -C5′ (95%) were labeled in DL323 cells fed with [3-13C]-pyruvate82. To demonstrate the utility of these labeling patterns, we used our atom-specifically labeled rNTPs to make a 27 nt RNA via IVT for NMR studies. In agreement with previous work81, R1 rate measurements showed a discrepancy between uniformly and atom-specifically labeled samples for pyrimidine-C5 and ribose-C1′ and -C5′82.

1.3.3 rNTP de novo biosynthesis

Ribonucleotide de novo biosynthesis uses enzymes from the pentose phosphate and nucleotide salvage biosynthetic pathways, various cofactor regeneration systems, and isotope-labeled precursor compounds to synthesize purine83 and pyrimidine84 rNTPs in a one-pot enzymatic reaction. The benefits of this route include reduced reaction time and increased product yield and specificity, compared with traditional chemical synthesis85–89. Moreover, this approach produces cost-effective uniformly 13C/15N-labeled and atom-specifically labeled rNTPs. On the other hand, laborious molecular cloning is involved, but the payoffs in preparation and purification more than make up for this initial outlay. De novo labeling has been used with some success to prepare RNA for NMR studies.

1.3.3.1 Purine de novo biosynthesis

Williamson and co-workers were the first to describe the de novo biosynthesis of isotope-labeled purine rNTPs83.
Their approach used enzymes from the pentose phosphate pathway to convert glucose to 5-phospho-D-ribosyl-α-1-pyrophosphate (PRPP) and enter a linear cascade of reactions to assemble the purine ring and produce inosine 5′-monophosphate (IMP), a precursor for both ATP and GTP. The de novo biosynthesis of ATP and GTP also required NAD(P)H and rNTP regeneration systems, folate, aspartate, and glutamine. Isotope-labeled precursor compounds 13C-D-glucose, 13C-sodium bicarbonate (NaH13CO3), 15N-ammonium chloride (15NH4Cl), and 13C/15N-L-serine enabled atom-specific labeling. Specifically, N1, N2, N3, N6, and N9 are derived from 15NH4Cl. Similarly, 13C/15N-L-serine labels C2, C4, C5, C8, and N7, and NaH13CO3 labels C6. Finally, 13C-D-glucose provides the label for the ribose carbons (i.e., C1′, C2′, C3′, C4′, and C5′) and, through the CO2 released during PRPP production, purine-C6.

While this methodology is very powerful, it also comes with a number of drawbacks. The use of isotope-labeled D-glucose can limit the potential labeling patterns if both ribose and nucleobase labels are desired. For example, production of carbon dioxide (CO2) via decarboxylation of 6-phosphogluconate to ribulose-5-phosphate (R5P) during PRPP production links the isotope label of the C1 of D-glucose to purine-C6. If ribose labeling is not desired, and only purine-C6 labeling is needed, then PRPP must be made directly from unlabeled D-ribose. Similarly, care must be taken to prevent isotopic dilution from atmospheric CO2 if both ribose and purine-C6 labeling are desired. Additionally, the C2 and C8 positions are labeled together or not at all. Finally, C6 and ribose labeling are limited by commercial sources of D-glucose.

In summary, a total of 28 biosynthetic enzymes (Table 1.3) were used for the one-pot synthesis of ATP and GTP over two days with yields of up to 66%83.
Specifically, 23 enzymes were used to synthesize [2,8-13C2]-ATP in a 57% yield, and 26, 24, and 27 enzymes helped synthesize uniformly 13C-, 15N-, and 13C/15N-labeled GTP in 66, 24, and 42% yields, respectively83. These atom-specific rNTPs were used in IVT to make a 30 nt RNA for NMR studies83. Their labeling patterns helped identify specific nucleobase interactions and greatly reduced spectral overlap.

Table 1.3. Enzymes for the de novo biosynthesis of purine rNTPs1,83.

| Enzyme (a) | Gene | EC number | Source |
|---|---|---|---|
| Hexokinase | hxk1/2 | 2.7.1.1 | Baker's yeast |
| Glucokinase | glk | 2.7.1.2 | E. coli |
| Glucose-6-phosphate dehydrogenase | zwf1 | 1.1.1.49 | Baker's yeast |
| Phosphogluconate dehydrogenase | gndA | 1.1.1.44 | E. coli |
| Ribose-5-phosphate isomerase | rpiA | 5.3.1.6 | E. coli |
| Ribose-phosphate diphosphate kinase | prsA | 2.7.6.1 | E. coli |
| Amido phosphoribosyl-transferase | purF | 2.4.2.14 | E. coli |
| Phosphoribosylamine-glycine ligase | purD | 6.3.4.13 | E. coli |
| Phosphoribosylglycinamide formyltransferase | purN | 2.1.2.2 | E. coli |
| Phosphoribosylformylglycinamidine synthase | purL | 6.3.5.3 | E. coli |
| Phosphoribosylformylglycinamidine cyclo-ligase | purM | 6.3.3.1 | E. coli |
| Phosphoribosylamino-imidazole carboxylase (catalytic subunit) | purE | 4.1.1.21 | E. coli |
| Phosphoribosylamino-imidazole carboxylase (ATPase subunit) | purK | 4.1.1.21 | E. coli |
| Phosphoribosylamino-imidazole-succinocarboxamide synthase | purC | 6.3.2.6 | E. coli |
| Adenylosuccinate lyase | purB | 4.3.2.2 | E. coli |
| Phosphoribosylamino-imidazole-carboxamide formyltransferase | purH | 2.1.2.3 | E. coli |
| Inosine-monophosphate cyclohydrolase | purH | 3.5.4.10 | E. coli |
| Adenylosuccinate synthase | purA | 6.3.4.4 | E. coli |
| Inosine-monophosphate dehydrogenase | guaB | 1.1.1.205 | E. coli |
| Guanosine-monophosphate synthase | guaA | 6.3.5.2 | E. coli |
| Adenylate kinase | plsA | 2.7.4.3 | Chicken muscle |
| Creatine phosphokinase | ckmT | 2.7.3.2 | E. coli |
| Guanylate kinase | spoR | 2.7.4.8 | Rabbit muscle |
| Glycine hydroxymethyltransferase | glyA | 2.1.2.1 | E. coli |
| Methylene-tetrahydrofolate dehydrogenase | folD | 1.5.1.5 | E. coli |
| Methenyl-tetrahydrofolate cyclohydrolase | folD | 3.5.4.9 | E. coli |
| Aspartate ammonia-lyase | aspA | 4.3.1.1 | E. coli |
| Glutamate dehydrogenase (NAD(P)+) | glud1 | 1.4.1.3 | Bovine liver |
| Glutamate dehydrogenase (NADP+) | gdhA | 1.4.1.4 | E. coli |
| Glutamine synthase | glnA | 6.3.1.2 | E. coli |
| Inorganic diphosphatase | ppa | 3.6.1.1 | E. coli |

a. All enzymes are commercially available except phosphoribosylformylglycinamidine cyclo-ligase, phosphoribosylamino-imidazole carboxylase, phosphoribosylamino-imidazole-carboxamide formyltransferase, inosine-monophosphate cyclohydrolase, and methenyl-tetrahydrofolate cyclohydrolase.

1.3.3.2 Pyrimidine de novo biosynthesis

Extending previous work, Williamson and co-workers developed the first de novo biosynthesis of isotope-labeled pyrimidines84. In contrast to purine synthesis, where the nucleobase was constructed step-by-step on the ribose, the nucleobase was directly and enzymatically coupled to the ribose to synthesize pyrimidines. Rather than directly coupling uracil, orotidine 5′-monophosphate (OMP) was produced and then converted to UTP. In a final step, UTP was converted to CTP with CTP synthetase (CTPS) (EC 6.3.4.2) and NH4Cl90. This method still relied on enzymes from the pentose phosphate and nucleotide salvage biosynthetic pathways, albeit with two enzymes cloned from species other than E. coli: the carbamoyl-phosphate synthase-like carbamate kinase enzyme was cloned from the thermophile Pyrococcus furiosus (P. furiosus) (EC 6.3.5.5), and dihydro-orotate dehydrogenase from Lactococcus lactis (L. lactis) (EC 1.3.5.2). The de novo biosynthesis of UTP and CTP also required ATP and NADPH regeneration systems. As before, isotope-labeled precursor compounds 13C-D-glucose, NaH13CO3, 15NH4Cl, and 13C/15N-L-aspartate enable atom-specific labeling. Specifically, C4, C5, C6, and N1 were derived from 13C/15N-L-aspartate, Cyt-N3 and -N4 were delivered by 15NH4Cl, and C2 came from NaH13CO3. All ribose carbons were provided by 13C-D-glucose.
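The precursor-to-atom assignments just listed for pyrimidines can be captured in a small lookup table, which makes it easy to ask which atoms a given precursor mix would label. This is a minimal sketch, not part of the published method: the names `PYRIMIDINE_ATOM_SOURCES`, `labeled_atoms`, and `precursors_for` are hypothetical, the entries are transcribed from the paragraph above, and the table deliberately ignores the CO2-mediated label transfer to C2 discussed in the next paragraph.

```python
# Atom sources for de novo pyrimidine labeling, transcribed from the text.
# Assumption: label scrambling through released CO2 is NOT modeled here.
PYRIMIDINE_ATOM_SOURCES = {
    "13C/15N-L-aspartate": ["C4", "C5", "C6", "N1"],
    "15NH4Cl": ["N3", "N4"],  # stated in the text for cytosine (Cyt-N3, Cyt-N4)
    "NaH13CO3": ["C2"],
    "13C-D-glucose": ["C1'", "C2'", "C3'", "C4'", "C5'"],  # all ribose carbons
}


def labeled_atoms(precursors):
    """Return the sorted atoms labeled by the chosen isotope-labeled precursors."""
    atoms = set()
    for precursor in precursors:
        atoms.update(PYRIMIDINE_ATOM_SOURCES[precursor])
    return sorted(atoms)


def precursors_for(atom):
    """Return the precursors that deliver a label to the given atom."""
    return [p for p, atoms in PYRIMIDINE_ATOM_SOURCES.items() if atom in atoms]


print(labeled_atoms(["NaH13CO3", "15NH4Cl"]))
print(precursors_for("C6"))
```

A dictionary keyed by precursor mirrors how the labeling is planned in practice: the precursor is what is purchased, and the labeled atoms follow from it.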
Again, this labeling methodology has a number of drawbacks. The use of isotope-labeled D-glucose can limit the potential labeling patterns if both ribose and nucleobase labels are desired. For example, production of CO2 via decarboxylation of 6-phosphogluconate to R5P during PRPP production, and of OMP to UMP, links the isotope label of the C1 of glucose and the C1 of aspartate to pyrimidine-C2. If only pyrimidine-C2 labeling is wanted, then PRPP must again be made directly from unlabeled D-ribose. Similarly, care must be taken to prevent isotopic dilution from solvent and atmospheric CO2 if both ribose and pyrimidine-C2 labeling are desired. Alternatively, commercially available [1-13C]-L-aspartate can be used to label C2 without C4, C5, and C6 labeling.

To summarize, a total of 16 biosynthetic enzymes were used for the efficient one-pot synthesis of UTP (and CTP) over three-to-four days with yields of up to 45%84 (Table 1.4). Specifically, 15 enzymes were used to synthesize atom-specific [1′,6-13C2]-UTP with a 25% yield, and batches of 15, 16, and 16 enzymes helped synthesize uniformly 13C-, 13C/15N-, and 15N-[5,3′,4′,5′,5′′-2H5]-labeled UTP with 40, 45, and 30% yields, respectively84. Additionally, CTPS converted uniformly 15N-[5,3′,4′,5′,5′′-2H5]-UTP to its CTP counterpart with a 48% yield84. The utility of these UTP and CTP labels was demonstrated for the same 30 nt RNA model system84. This labeling scheme reduced spectral overlap even more than before, owing to ribose deuteration.

Table 1.4. Enzymes for the de novo biosynthesis of pyrimidine rNTPs1,84.

| Enzyme (a) | Gene | EC number | Source |
|---|---|---|---|
| Hexokinase | hxk1/2 | 2.7.1.1 | Baker's yeast |
| Glucose-6-phosphate isomerase | pgi | 5.3.1.9 | Baker's yeast |
| Glucose-6-phosphate dehydrogenase | zwf1 | 1.1.1.49 | Baker's yeast |
| Phosphogluconate dehydrogenase | gndA | 1.1.1.44 | E. coli |
| Ribose-5-phosphate isomerase | rpiA | 5.3.1.6 | Spinach |
| Ribose-phosphate diphosphate kinase | prsA | 2.7.6.1 | E. coli |
| Carbamate kinase-like carbamoyl-P synthase | cpkA | 6.3.4.16 | P. furiosus |
| Aspartate carbamoyl transferase | pyrI/B | 2.1.3.2 | E. coli |
| Dihydro-orotase | pyrC | 3.5.2.3 | E. coli |
| Dihydro-orotate dehydrogenase | pydA | 1.3.3.1 | L. lactis |
| Orotate phosphoribosyl transferase | pyrE | 2.4.2.10 | E. coli |
| Orotate monophosphate decarboxylase | pyrF | 4.1.1.23 | E. coli |
| CTP synthase | pyrG | 6.3.4.2 | E. coli |
| Uridine monophosphate kinase | pyrH | 2.7.4.22 | E. coli |
| Cytidine monophosphate kinase | cmk | 2.7.4.14 | E. coli |
| Adenylate kinase | plsA | 2.7.4.3 | Chicken muscle |
| Glutamate dehydrogenase (NAD(P)+) | glud1 | 1.4.1.3 | Bovine liver |
| Creatine phosphokinase | ckmT | 2.7.3.2 | Chicken muscle |

a. All enzymes are commercially available except carbamate kinase-like carbamoyl-P synthase, dihydro-orotate dehydrogenase, and CTP synthase.

1.3.4 Chemo-enzymatic labeling

Chemo-enzymatic labeling is a hybrid approach that we developed, taking inspiration from a variety of other research groups75,91–98. In brief, this method uses enzymes from the nucleotide salvage biosynthetic pathways and cofactor regeneration systems to couple a nucleobase and ribose, followed by subsequent phosphorylation to the rNTP in a one-pot enzymatic reaction99–101. Moreover, the nucleobase and ribose building blocks can be unlabeled, isotope-labeled, chemically synthesized, or commercially available, permitting a diverse set of labeling patterns. In collaboration with the Kreutz research group, we have prepared rNTPs with a variety of commercially unavailable labeled nucleobases at reduced costs99,100.

1.3.4.1 Nucleobase labeling

Before detailing this hybrid method, we will first outline the chemical synthetic methods for atom-specific labeling of RNA nucleobases with stable isotopes. These nucleobases can then serve as the building blocks for the synthesis of the rNTPs (Section 1.3.4.2) or amidites (Section 1.3.5) that enable the eventual enzymatic (Section 1.4.1) or chemical (Section 1.4.2) production of isotope-labeled RNAs for NMR studies.
1.3.4.1.1 Pyrimidine 2H, 13C, 15N, and 19F labeling

The uracil (Ura) nucleobase is easily assembled using the Poulter-SantaLucia-Kreutz approach102–104. In the original eight-step synthetic pathway described by Roberts and Poulter, the 13C label can be placed in any position of the six-membered ring simply by changing the 13C-source102. SantaLucia, Tinoco, and co-workers streamlined this to a three-step reaction scheme to make 13C-cyanoacetyl urea from inexpensive commercially available 13C-labeled precursors103. A slightly modified approach from Kreutz and co-workers uses bromoacetic rather than chloroacetic acid. Bromoacetic acid is the preferred starting material due to the lower costs and better handling of the cyanide reagent100,104. Other methods with fewer steps exist, such as condensation of malic or propiolic acid and urea105,106. Even though these are simple two-step reactions, execution is not as convenient or cost-effective.

Using the Poulter-SantaLucia-Kreutz approach100,102–104, [1-13C]- and [2-13C]-bromoacetic acid selectively label Ura-C4 and -C5, respectively. Use of 13C-urea, on the other hand, delivers 13C at the C2 site, and that of 13C-potassium cyanide (13C-KCN) labels the C6 site. Finally, 15N-urea installs 15N at Ura-N1 and -N3. All possible uracil heteroatom positions can therefore be labeled in good yields, and these reactions can be easily scaled to gram quantities100,104.

An example of a synthetic scheme using the Poulter-SantaLucia-Kreutz approach100,102–104 is shown for Ura-C6 labeling (Scheme 1.1)100,104,107. In brief, bromoacetic acid 1 reacts with 13C-KCN and sodium carbonate (Na2CO3) in a Kolbe nitrile reaction to form 2-[cyano-13C]acetic acid 2. Treatment of 2 with urea in the presence of acetic anhydride (Ac2O) then yields a urea intermediate 3 that can be readily converted to [6-13C]-uracil 4 using a palladium catalyst (e.g., Pd/BaSO4) under hydrogen atmosphere (H2).
Because pyrimidine H5 and H6 protons share a three-bond scalar coupling (3JH5/H6 ~8 Hz38) and a strong dipolar coupling (H5-H6 distance of 2 Å) that complicate NMR experiments, the H5 site is selectively and quantitatively deuterated by reacting 4 with triethylamine (TEA) to form the desired [6-13C, 5-2H]-uracil 5.107 Taken together, 5 was synthesized in four steps with a 63% total yield (Scheme 1.1)100,104,107.

Scheme 1.1. Synthesis of [6-13C, 5-2H]-uracil3. Additional detail is found in the original works100,104,107.
[Scheme graphic. KEY: D = 2H; 13C sites are marked. 1 → 2: KCN, H2O, Na2CO3; 16 h, 80 °C-rt; 96%. 2 → 3: urea, Ac2O; 30 min, 90 °C; 93%. 3 → 4: 1) 5% Pd/BaSO4 in H2, 2) 50% CH3COOH (aq); 16 h, rt; 71%. 4 → 5: TEA, D2O; 90 h, 110 °C; ~100%.]

Given the valuable spectroscopic properties of 19F (Section 1.2.4), uracil can be fluorinated with the commercially available Selectfluor™, as recently reported26,66,104,108. This synthetic procedure is similar to that described for Ura-C6 labeling (Scheme 1.1)100,104,107, except that [2-13C]-bromoacetic acid 6 is the starting material. Kolbe nitrile reaction of 6 forms 7, which reacts with 15N-urea and Ac2O to yield 8. Addition of Pd/BaSO4 in H2 to 8 then forms [5-13C, 1,3-15N2]-uracil 9, which can be fluorinated with Selectfluor™ to yield [5-13C, 5-19F, 1,3-15N2]-uracil (5FU) 10. Again, heating 10 in sodium deuteroxide (NaOD) achieves selective and quantitative deuteration to form [6-2H]-5FU 11, removing the unwanted three-bond 1H-19F coupling (3JH6F5 ~7.1 Hz109) that complicates NMR experiments26,108,110. In summary, 11 was synthesized in five steps with an overall yield of 38% (Scheme 1.2)26,66,104,107,108.

Scheme 1.2. Synthesis of [6-2H]-5FU3. Additional detail is found in the original works26,66,104,107,108.
[Scheme graphic. KEY: D = 2H; N = 15N; F = 19F; 13C sites are marked; the structure of F-Selectfluor™ is drawn in the graphic. 6 → 7: KCN, H2O, Na2CO3; 16 h, 80 °C-rt; 96%. 7 → 8: urea, Ac2O; 30 min, 90 °C; 93%. 8 → 9: 1) 5% Pd/BaSO4 in H2, 2) 50% CH3COOH (aq); 16 h, rt; 71%. 9 → 10: F-Selectfluor™ in H2O, NaB(C6H5)4; 1) 4 h, 90 °C, 2) sublimation; 63%. 10 → 11: NaOD; 4 h, 60 °C; 95%.]

Finally, the C6 site of the DNA pyrimidine thymine (Thy) can be selectively labeled in a manner similar to uracil (Schemes 1.1 and 1.2)26,66,100,104,107,108. In brief, bromopropionic acid 12 is used in a Kolbe nitrile reaction followed by addition of urea and Ac2O to form intermediates 13 and 14, respectively. Then, reaction of 14 with Pd/BaSO4 in H2 forms the desired [6-13C]-thymine 15 in 45% total yield (Scheme 1.3)111.

Scheme 1.3. Synthesis of [6-13C]-thymine3. Additional detail is found in the original work111.
[Scheme graphic. KEY: 13C sites are marked. 12 → 13: KCN, H2O, Na2CO3; 3 h, 60 °C; 97%. 13 → 14: urea, Ac2O; 2 h, 90 °C; 67%. 14 → 15: 1) 5% Pd/BaSO4 in H2, 2) 50% CH3COOH (aq); 20 h, rt; 70%.]

1.3.4.1.2 Purine C8 labeling

Purines can also be selectively isotope-labeled using commercially available precursors. In the early 1990s, SantaLucia, Tinoco, and co-workers described an effective purine synthesis using 13C-formic acid to label purine-C8 103. More recently, Kreutz and co-workers streamlined such labeling into one-step reactions99,107,112. Here, the condensation of 13C-formic acid 16 with morpholine forms a morpholinium formate intermediate that reacts with either 4,5,6-triaminopyrimidine sulfate 17 to yield [8-13C]-adenine 18 (Scheme 1.4) or 2,5,6-triaminopyrimidin-4-ol sulfate 19 to form [8-13C]-guanine 20 (Scheme 1.5) in 64 and 94% yield, respectively107.

Scheme 1.4. Synthesis of [8-13C]-adenine3. Additional detail is found in the original works99,107,112.
[Scheme graphic. KEY: 13C sites are marked. 16 + 17 (H2SO4 salt) → 18: morpholine; 3 h, 100-200 °C; 64%.]

Scheme 1.5. Synthesis of [8-13C]-guanine3. Additional detail is found in the original works99,107,112.
[Scheme graphic. KEY: 13C sites are marked. 16 + 19 (H2SO4 salt) → 20: morpholine; 3 h, 100-200 °C; 94%.]

1.3.4.1.3 Purine C2 labeling

As with purine-C8 labeling, Ade-C2 can be readily labeled. One synthetic route begins with 5-aminoimidazole-4-carboxamide (AICA) and ethylsodium 13C-xanthate to form [2-13C]-hypoxanthine, -adenine, or -guanine113. A preferred alternative for purine-C2 labeling uses the method of Battaglia, Ouwerkerk, and co-workers, wherein sodium ethoxide (C2H5ONa) mediates cyclization of ethyl cyanoacetate with 13C-thiourea to give [2-13C]-6-amino-2-thiouracil114,115. Sodium nitrite (NaNO2) is then used for nitrosylation. Importantly, Ade-N7 can also be labeled if Na15NO2 is used in this step. Sodium dithionite (Na2S2O4) further mediates reduction of the nitroso group followed by desulfurization over Raney-Nickel to form [2-13C]-5,6-diamino-4-pyrimidinone116. Treatment of the product with sulfuric (H2SO4) and formic (HCOOH) acids yields [2-13C]-hypoxanthine117. Subsequent reaction with phosphorus oxychloride (POCl3) and N,N-dimethylaniline (N,N-DMA) yields [2-13C]-6-chloropurine118. In the final step, reaction with methanolic NH3 in a microwave reactor yields the desired [2-13C]-adenine114. Alternative purine synthesis pathways have been devised to enable specific labeling of Ade-C2 and/or any purine nitrogen position113–115,117,119–125. Indeed, we recently synthesized [2-13C, 7-15N]-adenine in seven steps with an overall yield of 18% and showed its utility in NMR analysis of RNA structure and dynamics124. This will be discussed in greater detail in Chapter 2.

1.3.4.1.4 Pyrimidine N1, N3, and N4 labeling

As described above, using the Poulter-SantaLucia-Kreutz approach100,102–104, 15N-urea delivers 15N at Ura-N1 and -N3 sites.
Cytosine labeling, on the other hand, occurs through uracil, given that the corresponding CTP can be built directly by enzymatic conversion of UTP (with NH4Cl)90,100 or by chemical synthesis from a transiently protected uridine amidite107. In this way, all uracil labeling patterns will be retained in CTP and cytidine amidites. Moreover, the Cyt-N4 amino (NH2) group can also be labeled by 15NH4Cl in an enzymatic90,100 (Section 1.3.4.2) or chemical107 (Section 1.3.5) reaction.

1.3.4.1.5 Purine N1, N3, N7, and N9 labeling

Synthesis with Ade-N1 labeling occurs in two steps116. Here, commercially available 5-aminoimidazole-4-carbonitrile 21 reacts with diethoxymethyl acetate (DEMA) to yield intermediate 22. Subsequent reaction of 22 with aqueous ammonia (NH3) readily forms the desired [1-15N]-adenine 23 in 66% total yield (Scheme 1.6)116.

Scheme 1.6. Synthesis of [1-15N]-adenine3. Additional detail is found in the original work116.
[Scheme graphic. KEY: N = 15N; the structure of DEMA is drawn in the graphic. 21 → 22: DEMA; 6 h, 115-120 °C; 91%. 22 → 23: NH3 in C2H5OH, (CH3)2CHOH; ~20 min, ~-10 °C / 24 h, 160-165 °C; ~66%.]

Ade-N3 labeling, on the other hand, is carried out in six steps120. In brief, commercially available 4-imidazolecarboxylic acid 24 is nitrated with 15N-ammonium nitrate (NH415NO3) to afford 5-[nitro-15N]-1H-imidazole-4-carboxylic acid 25. Activation of 25 with 1,1′-carbonyldiimidazole (CDI) in dimethylformamide (DMF) and excess NH3 forms carboxamide 26. Addition of 15NH4Cl in this step can also label the N1 site, permitting the synthesis of [1,3-15N2]-adenine. Catalytic reduction of 26 affords 15N-AICA 27. Ring closure of 27 with triethyl orthoformate (HC(OC2H5)3) gives a hypoxanthine 28, which readily forms [3-15N]-6-chloropurine 29 upon chlorination with POCl3 and N,N-DMA. Finally, ammonolysis with ammonium hydroxide (NH4OH) yields the desired [3-15N]-adenine 30 with an overall yield of 47% (Scheme 1.7)120.
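The overall yields quoted for these multi-step syntheses follow from multiplying the individual step yields. The sketch below illustrates that arithmetic; the function name `overall_yield` and the dictionary layout are hypothetical, and the step yields are transcribed from the scheme graphics (Schemes 1.1, 1.2, 1.3, and 1.7), with the "~100%" step of Scheme 1.1 treated as exactly 100% as an assumption.

```python
from math import prod


def overall_yield(step_yields_percent):
    """Multiply fractional step yields and return the overall yield in percent."""
    return 100.0 * prod(y / 100.0 for y in step_yields_percent)


# Step yields (%) as read from the schemes; "~100%" is assumed to be 100.
SCHEME_STEP_YIELDS = {
    "Scheme 1.1, [6-13C, 5-2H]-uracil": [96, 93, 71, 100],
    "Scheme 1.2, [6-2H]-5FU": [96, 93, 71, 63, 95],
    "Scheme 1.3, [6-13C]-thymine": [97, 67, 70],
    "Scheme 1.7, [3-15N]-adenine": [76, 95, 86, 88, 91, 94],
}

for name, steps in SCHEME_STEP_YIELDS.items():
    print(f"{name}: {overall_yield(steps):.0f}% over {len(steps)} steps")
```

Rounded to the nearest percent, these products should match the overall yields of 63, 38, 45, and 47% stated in the text. Because the product is independent of step order, any uncertainty in which yield belongs to which arrow in a scheme does not affect the result.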
The Ade-N(6)H2 group can also be labeled by 15NH4OH in the final step.

Scheme 1.7. Synthesis of [3-15N]-adenine3. Additional detail is found in the original work120.
[Scheme graphic. KEY: N = 15N. 24 → 25: NH4NO3 in H2SO4; 12 h, 100 °C; 76%. 25 → 26: 1) CDI in DMF, 2) NH3 (g); 1) 4 h, rt, 2) 48 h, rt; 95%. 26 → 27: CH3OH, CH3COOH, Pd/C in H2; 6 h, rt; 86%. 27 → 28: HC(OC2H5)3 in DMF; 12 h, 140 °C; 88%. 28 → 29: POCl3, N,N-DMA; 30 min, 130 °C; 91%. 29 → 30: NH4OH in CH3OH; 17 h, 150 °C; 94%.]

In addition, purine-N7 labeling is readily achieved and has been widely adapted114,115,117,122,124–126. For example, synthesis with Gua-N7 labeling is achieved in three steps. Nitrosylation of commercially available 2,6-diaminopyrimidin-4-ol 31 by Na15NO2 yields 32. Reduction of 32 with Na2S2O4 followed by acidification with H2SO4 forms 33. In the final step, reflux of 33 with formamide (CH3NO) followed by HCOOH provides the desired [7-15N]-guanine 34 in 65% total yield (Scheme 1.8)126.

Scheme 1.8. Synthesis of [7-15N]-guanine3. Additional detail is found in the original work126.
[Scheme graphic. KEY: N = 15N. 31 → 32: 1) NaNO2 in NaOH, 2) CH3COOH; 79%. 32 → 33: Na2S2O4 (aq) / H2SO4 (aq); 83%. 33 → 34: HCONH2, HCOOH; ~99%. Reaction times and temperatures shown in the graphic: 1 h, rt; 1 h, 100-0 °C / 1 h, 110 °C / 2 h, 180 °C; 20 min, 80 °C.]

Several direct routes to 15N-labeled adenine initiate from commercially available aminopyrimidines117,122. Micura, Kreutz, and co-workers reported one such synthesis of [7-15N]-adenine using a hypoxanthine intermediate125 in a manner similar to that of Battaglia, Ouwerkerk, and co-workers114,115 (Section 1.3.4.1.3). Again, we recently showcased a similar synthetic scheme to form [2-13C, 7-15N]-adenine in seven steps with an overall yield of 18%124. This will be discussed in greater detail in Chapter 2.

Finally, in the synthesis with Ade-N9 labeling, 5-amino-4,6-dichloropyrimidine 35 is converted to chloropurine 36 using aqueous 15NH3 and DEMA.
Addition of aqueous NH3 then yields the desired [9-15N]-adenine 37. This simple three-step reaction forms 37 in 83% total yield (Scheme 1.9)119. Scheme 1.9. Synthesis of [9-15N]-adenine3. Additional detail is found in the original work119. KEY: N = 15N Cl 1) 26% NH (aq) Cl NH H N 3 2 2 N 2) DEMA N N 28% NH3 (aq) N N Cl N 1) 7 h, 120-130 ?C N 7 h, 150 ?C N2) 3.5 h, 100 ?C H N 95% H N ~83% 35 36 37 1.3.4.1.6 Nucleobase labels: summary and outlook As described in Section 1.3.4.1 and shown in Schemes 1.1-1.9, a wide range of isotope- labeled nucleobases (Table 1.5) are available to the scientific community. Of all synthetic procedures, purine C8 sites (Schemes 1.4 and 1.5)107 are most readily labeled in one chemical step in a single day and with high yield (64-94%) (Table 1.5). Conversely, Ade- N3 (Scheme 1.7)120 is the least readily labeled, taking 11 days whereas C2 and N7 labeling124 have the lowest overall yields of 18% (Table 1.5). Future work must focus on improving yields and reducing the number of chemical steps. However, these RNA labeling patterns are commonly chosen based on the experimental information required, and less often dictated by the relative time and yield of the building blocks. 25 Table 1.5. Summary of nucleobase labels3. Nucleobase label Time (days)a Chemical stepsb Yield (%) Reference [8-13C]-adenine 1 1 64 [99, 107, 112] [8-13C]-guanine 1 1 94 [99, 107, 112] [2-13C]-adeninec 2.5 7 (1) 18 [124] [1-15N]-adenine 2.5 2 (1) 66 [116] [3-15N]-adenine 11 6 (2) 47 [120] [7-15N]-adeninec 2.5 7 (1) 18 [124] [7-15N]-guanine 1.5 3 65 [126] [9-15N]-adenine 5.5 3 (3) 79 [119] [6-13C, 5-2H]-uracil 7 4 63 [100, 104, 107] [5-13C, 5-19F, 6-2H]-uracil 8 5 38 [26, 66, 104, 107, 108] [6-13C]-thymine 2.5 3 45 [111] a. The total reaction time was based on the time required for all chemical steps. In addition, 16 h were added for any explicit mention of overnight procedures and 24 h were added for any chromatographic purifications. b. 
The number in parentheses represents the number of chromatographic purification steps.
c. All data for [2-13C]-adenine and [7-15N]-adenine labeling came from the same doubly labeled [2-13C, 7-15N]-adenine labeling scheme [124], which will be detailed in Chapter 2.

1.3.4.2 Enzymatic coupling of nucleobase and ribose sources

With chemically synthesized isotope-labeled nucleobases in hand, we now outline the enzymatic methods that can be used to build them into rNTPs. While these methods share many similarities, they can broadly be classified by the precursor compounds that are built into the rNTP ribose moiety.

1.3.4.2.1 Synthesis from D-glucose

The first enzymatic approach to prepare isotope-labeled rNTPs was motivated by the early work of Schramm [93,94,96], Gilles [95], and co-workers and was later optimized by the Williamson research group [97,98]. This Gilles-Schramm-Williamson pentose phosphate pathway method [93–98] uses isotope-labeled D-glucoses as the precursor and requires 14 enzymes (Table 1.6) and several coenzymes [97,98]. This approach is appealing for uniform ribose labeling using commercially available uniformly 13C- and/or 2H-labeled D-glucoses.

Table 1.6. Enzymes for the Gilles-Schramm-Williamson rNTP synthesis method [93–98] [3].
Enzyme (a) | Abbreviation | EC number | Source
Hexokinase | HXK | 2.7.1.1 | Baker's yeast
Glucose-6-phosphate isomerase | PGI1 | 5.3.1.9 | Baker's yeast
Glucose-6-phosphate dehydrogenase | ZWF | 1.1.1.49 | L. mesenteroides
Phosphogluconate dehydrogenase | GND | 1.1.1.44 | Torula yeast
Ribose-5-phosphate isomerase | RPI1 | 5.3.1.6 | Spinach
Phosphoribosylpyrophosphate synthetase | PRPPS | 2.7.6.1 | E. coli
Adenine phosphoribosyltransferase | APRT | 2.4.2.7 | JM109/pTTA6
Uracil phosphoribosyltransferase | UPRT | 2.4.2.9 | JM109/pTTU2
Xanthine-guanine phosphoribosyltransferase | XGPRT | 2.4.2.22 | JM109/pTTG2
Nucleoside-monophosphate kinase | NMPK | 2.7.4.4 | Bovine liver
Myokinase (Adenylate kinase) | MK | 2.7.4.3 | Rabbit muscle
Guanylate kinase | GK | 2.7.4.8 | Porcine brain
3-Phosphoglycerate mutase | YIBO | 5.4.2.1 | Rabbit muscle
Enolase | ENO | 4.2.1.11 | Baker's yeast
Pyruvate kinase | PYKF | 2.7.1.40 | Rabbit muscle
Glutamate dehydrogenase (NAD(P)+) | GLUD | 1.4.1.3 | Bovine liver
CTP synthase | CTPS | 6.3.4.2 | JM109/pMW5
L-Lactate dehydrogenase | LDH | 1.1.1.27 | Rabbit muscle
a. All enzymes are commercially available except APRT, UPRT, XGPRT, CTPS, and RK.

In brief, hexokinase (HXK) (EC 2.7.1.1) phosphorylates 13C-labeled D-glucose 38 at its O6 position to yield glucose-6-phosphate 39. Then, glucose-6-phosphate dehydrogenase (ZWF) (EC 1.1.1.49) oxidizes 39 to form 6-phosphogluconate 40. Phosphogluconate dehydrogenase (GND) (EC 1.1.1.44) further oxidizes 40 to 41. Ribose-5-phosphate isomerase (RPI1) (EC 5.3.1.6) then isomerizes 41 to ribose-5-phosphate 42. Following isomerization, phosphoribosylpyrophosphate synthetase (PRPPS) (EC 2.7.6.1) pyrophosphorylates 42 at its O1′ site to afford 43. Then, adenine (APRT) (EC 2.4.2.7), guanine (XGPRT) (EC 2.4.2.22), or uracil (UPRT) (EC 2.4.2.9) phosphoribosyltransferases facilitate the nucleophilic attack of adenine or guanine N9 or uracil N1 on the C1′ of 43 to yield 5′-monophosphates 44-46, respectively. Phosphorylation of 44-46 is achieved by adenylate (MK) (EC 2.7.4.3), guanylate (GK) (EC 2.7.4.8), or nucleoside monophosphate (NMPK) (EC 2.7.4.4) kinases to form the 5′-diphosphates 47-49, respectively. Pyruvate kinase (PYKF) (EC 2.7.1.40) then catalyzes the final phosphorylation to form the 5′-triphosphates 50-52 (Scheme 1.10) [97,98].
Finally, UTP 52 can be converted to CTP 53 by CTP synthase (CTPS) (EC 6.3.4.2), and the Cyt-N(4)H2 can also be labeled by 15NH3 in this step (Scheme 1.10) [97,98].

Scheme 1.10. rNTP synthesis from D-glucose [3]. Additional detail is found in the original works [97,98].
[Scheme 1.10 graphic not reproduced. It shows the conversion of D-glucose 38 through intermediates 39-43 to the 5′-mono-, di-, and triphosphates 44-52 and CTP 53, together with the cofactor regeneration cycles labeled X and Y. KEY: marked atoms = 13C; N = 15N.]

This approach can use any of the chemically synthesized nucleobases described in Section 1.3.4.1. Moreover, Scott, Hennig, and co-workers demonstrated that the Gilles-Schramm-Williamson method [93–98] is compatible with 19F-labeled nucleobases [67,127,128] by synthesizing [2-19F]-ATP [127] and [5-19F]-UTP [128] and -CTP [128]. However, D-ribose is a more cost-effective labeled precursor than D-glucose for selective 13C and/or 2H ribose labeling [129].

1.3.4.2.2 Synthesis from D-ribose

Based on earlier work by Whitesides and co-workers [75,91,92], our group truncated the relatively complex Gilles-Schramm-Williamson method [93–98] to use 10 enzymes instead of 18, together with two cofactor regeneration systems (i.e., dATP and creatine phosphate) (Table 1.7). This chemo-enzymatic strategy is a versatile method to couple nucleobase and ribose followed by subsequent phosphorylation to the rNTP in a one-pot enzymatic reaction [99–101].
This approach has many advantages over previously reported biomass [54,55,76–82], de novo [83,84], and chemical [85–89] synthesis methods, including fewer enzymes, fewer synthetic steps, and greater yields.

Table 1.7. Enzymes for the Dayie rNTP synthesis method [99–101] [3].
Enzyme (a) | Abbreviation | EC number | Source
Ribokinase | RK | 2.7.1.15 | E. coli
Phosphoribosylpyrophosphate synthetase | PRPPS | 2.7.6.1 | E. coli
Adenine phosphoribosyltransferase | APRT | 2.4.2.7 | JM109/pTTA6
Uracil phosphoribosyltransferase | UPRT | 2.4.2.9 | JM109/pTTU2
Xanthine-guanine phosphoribosyltransferase | XGPRT | 2.4.2.22 | JM109/pTTG2
Uridine monophosphate kinase | UMPK | 2.7.4.22 | E. coli
Guanylate kinase | GK | 2.7.4.8 | Porcine brain
Myokinase (Adenylate kinase) | MK | 2.7.4.3 | Rabbit muscle
CTP synthase | CTPS | 6.3.4.2 | JM109/pMW5
Creatine kinase | CK | 2.7.3.2 | Chicken muscle
a. All enzymes are commercially available except APRT, UPRT, XGPRT, CTPS, and RK.

We used this method to synthesize [1′,5′,6-13C3, 1,3-15N2]-pyrimidine rNTPs and [1′,8-13C2]-, [2′,8-13C2]-, or [1′,5′,8-13C3]-purine rNTPs with six and five enzymes, respectively (Table 1.7) [130]. First, ribokinase (RK) (EC 2.7.1.15) phosphorylates 13C-labeled D-ribose 54 at its O5 position to yield ribose-5-phosphate 55. Then, PRPPS pyrophosphorylates 55 at its O1′ site to afford 56. APRT, XGPRT, or UPRT then catalyzes the nucleophilic attack of the adenine or guanine N9 or uracil N1 on the C1′ of 56 to yield 5′-monophosphates 57-59, respectively. Phosphorylation of 57-59 is achieved by MK, GK, or UMP kinase (UMPK) (EC 2.7.4.22) to form the 5′-diphosphates 60-62, respectively. Creatine kinase (CK) (EC 2.7.3.2) then facilitates the final phosphorylation to afford the 5′-triphosphates 63-65 in 90, 75, and 90% total yield, respectively (Scheme 1.11) [99–101]. As in Scheme 1.10 [97,98], UTP 65 can be converted to CTP 66 by CTPS in 95% yield, and the Cyt-N(4)H2 can also be labeled by 15NH4Cl in this step (Scheme 1.11) [99–101].
These atom-specifically labeled rNTPs can then be used in IVT to make a variety of RNAs. Importantly, these labeling patterns reduced spectral overlap, increased signal-to-noise ratios, and facilitated direct carbon detection experiments.

Scheme 1.11. rNTP synthesis from D-ribose [3]. Additional detail is found in the original works [99–101].
[Scheme 1.11 graphic not reproduced. It shows the conversion of D-ribose 54 through intermediates 55 and 56 to the 5′-mono-, di-, and triphosphates 57-65 and CTP 66. KEY: marked atoms = 13C; N = 15N; dATP regeneration system: MK (dAMP + dATP to 2 dADP) and CK (dADP + creatine phosphate to dATP + creatine).]

As with the Gilles-Schramm-Williamson method [93–98], our approach is also compatible with the chemically synthesized nucleobases described in Section 1.3.4.1 and 19F-labeled nucleobases (e.g., [2-19F]-adenine and 5FU [26,108]). The main disadvantage of our approach, however, is the need to express and purify five non-commercial enzymes (Table 1.7). Still, the benefits afforded by our method, particularly the ability to generate non-commercially available atom-specific labeling patterns, outweigh this drawback.

1.3.4.2.2 Synthesis from inosine

Serianni and co-workers have developed a complementary approach to enzymatically couple nucleobase and ribose sources using only four enzymes (Table 1.8) [131]. Their method uses hypoxanthine 67 and 1-O-acetyl-2,3,5-tri-O-benzoyl-β-D-ribofuranoside (ATBR) 68 in a Vorbrüggen reaction [89] to yield inosine 69. Then, purine nucleoside phosphorylase (PNPase) (EC 2.4.2.1) facilitates the nucleophilic attack of phosphate at the C1′
position of 69 to give α-D-ribofuranosyl-1-phosphate sodium salt (αR1P) 70. Then, PNPase couples 70 to adenine or guanine, or UPase (EC 2.4.2.3) couples 70 to uracil, to yield nucleosides 71-73 (Scheme 1.12) [131]. These nucleosides can then be converted to the desired rNTP or amidite with further enzymatic or chemical synthesis, respectively. As with the Gilles-Schramm-Williamson [93–98] and Dayie [99–101] methods, this approach is also compatible with the chemically synthesized nucleobases described in Section 1.3.4.1.

Table 1.8. Enzymes for the Serianni rNTP synthesis method [131] [3].
Enzyme (a) | Abbreviation | EC number | Source
Purine nucleoside phosphorylase | PNPase | 2.4.2.1 | E. coli
Xanthine oxidase | XO | 1.1.3.22 | Buttermilk
Catalase | CT | 1.11.1.6 | Bovine liver
Uridine phosphorylase | UPase | 2.4.2.3 | E. coli
a. All enzymes are commercially available.

Scheme 1.12. rNTP synthesis from inosine [3]. Additional detail is found in the original work [131].
[Scheme 1.12 graphic not reproduced. It shows the conversion of hypoxanthine 67 and ATBR 68 to inosine 69, αR1P 70, and nucleosides 71-73, with XO and CT drawn alongside the PNPase step. KEY: marked atoms = 13C.]

1.3.4.2.3 rNTP labels: summary and outlook

As described in Section 1.3.4.2.2 and shown in Scheme 1.11, the chemo-enzymatic labeling method developed by Dayie and co-workers [99–101] permits the synthesis of a versatile assortment of rNTPs with atom-specific isotope labels (Table 1.9). While there are other enzymatic methods to generate such labels (e.g., the Gilles-Schramm-Williamson [93–98] or Serianni [131] methods shown in Schemes 1.10 and 1.12, respectively), no other technique offers the versatility and simplicity afforded by the Dayie method. Our one-pot chemo-enzymatic approach can produce isotope-labeled purine and pyrimidine rNTPs in a few days and with high yield (75-95%) (Table 1.9).
The main disadvantage of this method is the need to express and purify five non-commercial enzymes in-house (Table 1.7). However, we are in the process of providing the corresponding plasmids to Addgene to make our method widely accessible to the field.

Table 1.9. Summary of rNTP labels [3].
rNTP label (a) | Time (days) (b) | Enzymatic steps (c) | Yield (%) | Reference
[8-13C]-ATP | 1.5 | 1 (1) | 90 | [100]
[8-13C]-GTP | 1.5 | 1 (1) | 75 | [100]
[1′,5′,6-13C3, 1,3-15N2]-CTP | 3 | 3 (2) | 95 | [101]
[1′,5′,6-13C3, 1,3-15N2]-UTP | 2.5 | 2 (2) | 90 | [101]
a. The [8-13C]-adenine and -guanine were coupled to [1-13C]-, [2-13C]-, or [1,5-13C2]-D-ribose to generate a variety of ATPs and GTPs [99]. The [6-13C, 1,3-15N2]-uracil and -cytosine nucleobases, on the other hand, were coupled to [1′,5′-13C2]-D-ribose only [100]. Nevertheless, the reported times, enzymatic steps, and yields are representative of all ATP, GTP, CTP, and UTP reactions made with this method.
b. The total reaction time was based on the time required for all chemical steps. In addition, 24 h were added for any chromatographic purification.
c. The number in parentheses represents the number of chromatographic purification steps. Since the time of our original publication [100], pyrimidine rNTP synthesis now only requires one chromatographic purification [26,108].

1.3.5 RNA phosphoramidite labeling

Thus far, all discussions of RNA building blocks (Sections 1.3.2-1.3.4) have focused on rNTPs for use in IVT to make atom-specifically labeled RNA (Section 1.4.2). However, position-specifically labeled RNA can also be prepared by chemical SPS with unlabeled and isotope-labeled amidites (Section 1.4.1). Several groups have developed strategies to obtain isotope-labeled amidites. Initial efforts were led by the Pitsch [132] and Jones [133–136] research groups. More recently, the Micura [69,125,137] and Kreutz [66,104,107,111,112] groups have improved the efficiency and scalability of amidite synthesis for NMR studies.
This section will detail the synthetic methods to obtain isotope-labeled amidites with several 2′-hydroxyl (OH) protecting groups. However, even though amidite labeling is currently the most effective and widely used method for position-specific RNA labeling, its utility for NMR studies is limited to RNAs of up to ~60 nt.

1.3.5.1 2H, 13C, and 15N labeling

The Micura [69,125,137] and Kreutz [66,104,107,111] groups have used isotope-labeled nucleobases (Figures 1.6-1.10) to prepare 2′-O-[(triisopropylsilyl)oxy]methyl (TOM) [138] and 2′-O-tert-butyldimethylsilyl (tBDMS) [33] amidites for NMR studies. Representative examples of [6-13C, 5-2H]-pyrimidine 2′-O-TOM amidite syntheses are shown in Schemes 1.13 and 1.14 [107]. In brief, [6-13C, 5-2H]-uracil 5 is coupled to ATBR 68 under Vorbrüggen conditions [89] to give the 2′,3′,5′-O-benzoyl (Bz)-protected 74, which is then fully deprotected to nucleoside 75 after treatment with methylamine (CH3NH2) in ethanol (C2H5OH). Addition of 4,4′-dimethoxytrityl chloride (DMT-Cl) and TOM-Cl protects the 5′- and 2′-OH to form 76 and 77, respectively. Finally, phosphitylation of the 3′-OH of 77 with 2-cyanoethyl N,N-diisopropylchlorophosphoramidite (CEP-Cl) and N,N-diisopropylethylamine (DiPEA) yields the desired [6-13C, 5-2H]-uridine 2′-O-TOM amidite 78 in five steps with an overall yield of 22% (Scheme 1.13) [107]. Given that all amidites described herein are 5′-O-DMT-protected, we will omit this designation from their names and instead focus on which 2′-OH protecting group is used.

Scheme 1.13. Synthesis of [6-13C, 5-2H]-uridine 2′-O-TOM amidite [3]. Additional detail is found in the original work [107].
[Scheme 1.13 graphic not reproduced. Stepwise yields as printed in the scheme: 98%, 100%, 70%, 40%, and 80%. KEY: D = 2H; marked atoms = 13C.]

The corresponding cytidine derivative is obtained from 77 in four additional steps (Scheme 1.14) [107]. First, the 3′-OH of 77 is transiently acetylated with Ac2O to afford 79. Then, treatment of 79 with 2,4,6-triisopropylbenzenesulfonyl chloride (TiBSC) and TEA yields the 5′-O-DMT-2′-O-TOM cytidine 80, which is immediately N4-acetylated (Ac) with Ac2O to form 81. Finally, 3′-OH phosphitylation yields the desired [6-13C, 5-2H]-N4-Ac-cytidine 2′-O-TOM amidite 82 in eight steps with 14% total yield (Scheme 1.14) [107].

Scheme 1.14. Synthesis of [6-13C, 5-2H]-N4-Ac-cytidine 2′-O-TOM amidite [3]. Additional detail is found in the original work [107].
[Scheme 1.14 graphic not reproduced. Stepwise yields as printed in the scheme (from 77): 90%, 68%, 90%, and 90%. KEY: D = 2H; marked atoms = 13C.]

In contrast to pyrimidines, the starting purine is protected before beginning the nucleosidation reaction. Representative examples of [8-13C]-purine 2′-O-TOM amidite syntheses are shown in Schemes 1.15 and 1.16 [107]. Starting with N6-Bz-protected adenine 83, a Vorbrüggen reaction [89] gives the 2′,3′,5′-O-Bz-protected 84, which is readily 2′,3′,5′-O-deprotected to nucleoside 85 after treatment with sodium hydroxide (NaOH) in pyridine and C2H5OH. Then, 5′-OH tritylation, 2′-OH TOM protection, and 3′-OH phosphitylation yield 86, 87, and 88, respectively.
Taken together, [8-13C]-N6-Bz-adenosine 2′-O-TOM amidite 88 was synthesized in five steps with 20% total yield (Scheme 1.15). Guanosine synthesis, on the other hand, starts from an N2-isobutyryl (iBu)-protected guanine 89 but otherwise proceeds as with adenosine. That is, 89 is reacted under Vorbrüggen conditions [89] to form 90, which is then 2′,3′,5′-O-deprotected to nucleoside 91. Again, 5′-OH tritylation, 2′-OH TOM protection, and 3′-OH phosphitylation yield 92, 93, and 94, respectively. In summary, [8-13C]-N2-iBu-guanosine 2′-O-TOM amidite 94 was synthesized in five steps with an overall yield of 23% (Scheme 1.16).

Scheme 1.15. Synthesis of [8-13C]-N6-Bz-adenosine 2′-O-TOM amidite [3]. Additional detail is found in the original work [107].
[Scheme 1.15 graphic not reproduced. Stepwise yields as printed in the scheme: 75%, ~100%, 74%, 49%, and 72%. KEY: marked atoms = 13C.]

Scheme 1.16. Synthesis of [8-13C]-N2-iBu-guanosine 2′-O-TOM amidite [3]. Additional detail is found in the original work [107].
[Scheme 1.16 graphic not reproduced. Stepwise yields as printed in the scheme: 83%, 99%, 77%, 52%, and 70%. KEY: marked atoms = 13C.]

In addition to 2H and 13C isotopes, the synthetic routes described above (Schemes 1.13-1.16) can easily incorporate 15N labels by building them into the nucleobase using the methods described in Section 1.3.4.1 (Schemes 1.1-1.9).
Alternatively, Kreutz, Micura, and co-workers developed synthetic procedures to directly incorporate 15N labels at the imino (i.e., purine-N1 and pyrimidine-N3) sites of 2′,3′,5′-O-Ac-protected nucleosides [137]. Using this approach, [1-15N]-N6-Ac-adenosine 95 (10% yield in nine steps), [1-15N]-N2-(di-N-methylamino)methylene (DMAM)-guanosine 96 (6% yield in 12 steps), [3-15N]-uridine 97 (12% yield in seven steps), and [3-15N]-N4-Ac-cytidine 98 (14% yield in nine steps) 2′-O-tBDMS amidites were prepared (Figure 1.4) [137]. Importantly, these procedures can synthesize either 2′-O-tBDMS or -TOM amidites by altering the 2′-OH protection reaction [137]. Rather than summarizing the synthetic procedures here, we will detail representative examples of 2′-O-tBDMS amidite syntheses in the next section.

[Figure 1.4 graphic not reproduced. It shows the structures of amidites 95-98. KEY: N = 15N.]
Figure 1.4. Examples of 15N-labeled 2′-O-tBDMS amidites. Additional details can be found in the original work [137]. Synthetic procedures for 2′-O-tBDMS amidites can be found in the next section.

These 2′-O-tBDMS or -TOM amidites are not suitable for producing RNAs >60 nt. Instead, amidites with 2-cyanoethoxymethyl (CEM) as the 2′-OH protecting group [139–141] are used because of their increased coupling efficiency, which rivals that in DNA synthesis [142]. Using a protocol developed by Yano and co-workers [140,141], Kreutz and co-workers prepared [8-13C]-N6-Ac-adenosine 99 (30% yield in five steps), [8-13C]-N2-phenoxyacetyl (Pac)-guanosine 100 (14% yield in nine steps), [6-13C, 5-2H]-uridine 101 (14% yield in five steps), and [6-13C, 5-2H]-N4-Ac-cytidine 102 (18% yield in eight steps) 2′-O-CEM amidites, and the modified [1,3-15N2]-dihydrouridine 103 (14% yield in 10 steps) and [2,8-13C2]-inosine 104 (3% yield in 12 steps) 2′-O-CEM amidites (Figure 1.5) [112].
While the higher coupling efficiency of CEM amidites is attractive, the method has not gained widespread use because 2′-O-CEM amidites are not commercially available. As such, researchers must synthesize both unlabeled and isotope-labeled amidites, as well as the CEM group itself, in order to prepare atom- and position-specifically labeled RNA. Rather than detailing the synthetic procedures here, we will showcase our synthesis of 2′-O-CEM amidites in Chapter 2.

[Figure 1.5 graphic not reproduced. It shows the structures of amidites 99-104. KEY: D = 2H; marked atoms = 13C; N = 15N.]
Figure 1.5. Examples of 2H-, 13C-, and/or 15N-labeled 2′-O-CEM amidites. Additional details can be found in the original work [112]. Synthetic procedures for 2′-O-CEM amidites can be found in Chapter 2.

1.3.5.2 19F labeling and post-transcriptional modifications

Another benefit of labeling with amidites is the position-specific incorporation of modified building blocks. Indeed, many post-transcriptional modifications modulate the structure, dynamics, and interactions of RNAs, and NMR is providing new insights into their functions [143]. These studies have been greatly aided by the synthesis of 13C- and/or 15N-labeled amidites bearing uridine 5-oxyacetic acid (cmo5U) [144], N6-methyl(CH3)-adenosine (m6A) [145,146], and pseudouridine (Ψ) [147]. In collaboration with the Al-Hashimi group, Kreutz and co-workers synthesized a 15N-labeled cmo5U amidite [144]. Their synthetic route begins from bromoacetic acid 1 and proceeds through intermediates 105 and 106 to assemble [1,3-15N2]-uracil 107, as in Schemes 1 and 2 [26,66,100,104,107,108]. Then, 107 was coupled to ATBR 68 under Vorbrüggen conditions [89], 2′,3′,5′-O-deprotected, and hydroxylated at the C5 position to yield 108, 109, and 110, respectively.
Addition of para-toluenesulfonic acid (pTSA) and dimethoxypropane ((CH3)2C(OCH3)2) then formed the 2′,3′,5′-O-protected nucleoside 111. Further addition of ethyl-2-iodoacetate in C2H5OH and NaOH transformed the 5-OH of 111 into an ethylcarboxymethoxy group while also deprotecting the 5′-OH to afford 112. After transient 2′,3′-O-deprotection of 112 to form 113, the 3′- and 5′-OH were immediately protected with di-tert-butylsilyl bis(trifluoromethanesulfonate) (DtBS), and the 2′-OH was protected with tBDMS-Cl, to yield 114. Addition of pyridine and CH3OH to 114 forms 115, and subsequent treatment with nitrophenyl ethanol (NPE), 4-dimethylaminopyridine (DMAP), and N-ethyl-N′-(3-dimethylaminopropyl)carbodiimide (EDC) gives the NPE-protected cmo5U 116. Reaction of 116 with hydrogen fluoride (HF) affords the 3′,5′-O-deprotected 117, which can then be 5′-O-tritylated and 3′-O-phosphitylated to yield 118 and 119, respectively. The final reaction was carried out with 5-(benzylthio)-1H-tetrazole (BTT) and 2-cyanoethyl N,N,N′,N′-tetraisopropylphosphorodiamidite (TiPCEP). Taken together, [1,3-15N2]-cmo5U 2′-O-tBDMS amidite 119 was synthesized in 15 steps with 1% total yield (Scheme 1.17) [144].

Another example from the Al-Hashimi and Kreutz groups showcases the synthesis of a 13C-labeled m6A amidite [145]. Their synthetic route follows the recommendations of Battaglia, Ouwerkerk, and co-workers, wherein C2H5ONa mediates cyclization of ethyl cyanoacetate 120 with 13C-thiourea 121 to give [2-13C]-6-amino-2-thiouracil 122 [114,115]. Addition of NaNO2 to 122 forms the nitroso-containing 123, which is then reduced by Na2S2O4 and desulfurized by Raney nickel [116] to yield 124 and 125, respectively. Treatment of 125 with H2SO4 and H13COOH yields [2,8-13C2]-hypoxanthine 126 [117].
Then, the familiar Vorbrüggen reaction [89] of 126 and ATBR 68 yields the 2′,3′,5′-O-Bz-protected 127, followed by addition of sulfuryl chloride (SO2Cl2) to yield 6-chloropurine nucleoside 128. Sequential addition of CH3NH2 in C2H5OH and then H2O affords the m6A nucleoside 129. As above, the synthetic route concludes with 2′-O-tBDMS protection, 5′-O-tritylation, and 3′-O-phosphitylation to yield 130, 131, and 132, respectively. In summary, [2,8-13C2]-N6-methyladenosine 2′-O-tBDMS amidite 132 was synthesized in 11 steps with an overall yield of 4% (Scheme 1.18) [145].

Scheme 1.17. Synthesis of [1,3-15N2]-cmo5-uridine 2′-O-tBDMS amidite [3]. Additional detail is found in the original work [144].
[Scheme 1.17 graphic not reproduced. Stepwise yields as printed in the scheme (printed order, which may differ from reaction order): ~100%, 71%, 40%, ~100%, 45%, 97%, 65%, ~100%, 92%, ~100%, 26%, 68%, 87%, ~100%, and 79%. KEY: N = 15N.]

Scheme 1.18. Synthesis of [2,8-13C2]-N6-methyladenosine 2′-O-tBDMS amidite [3]. Additional detail is found in the original work [145].
[Scheme 1.18 graphic not reproduced. Stepwise yields as printed in the scheme (printed order, which may differ from reaction order): ~100%, 82%, 65%, ~100%, 94%, 63%, 62%, 90%, 67%, 61%, and 61%. KEY: marked atoms = 13C.]

In addition, Chow and co-workers synthesized a [1,3-15N2]-Ψ 2′-O-bis(acetoxyethoxy)methyl ether (ACE) amidite [147] in 11 steps with 6% total yield. Moreover, INNotope offers [2,8-13C2]-m6A 132, [13CH3]-m6A 133, [13CH3]-N6-chloroAc-m1A 134, and [13CH3]-N4-Ac-m3C 135 2′-O-tBDMS amidites (Table 1.2 and Figure 1.6).

[Figure 1.6 graphic not reproduced. It shows the structures of amidites 132-135. KEY: marked atoms = 13C.]
Figure 1.6. Commercially available 13C-labeled modified 2′-O-tBDMS amidites. All amidites are available from INNotope (https://www.innotope.at/).

Additionally, building on the work shown in Scheme 1.2 [26,66,104,107,108], Kreutz and co-workers showcased new methods to incorporate 19F-13C into the pyrimidine nucleobase of amidites [66]. Starting from [6-13C]-uracil 4, fluorination is achieved with Selectfluor™ to yield 5FU 136, as in Scheme 1.2 [26,66,104,107,108]. The remaining chemical steps are similar to other 2′-O-tBDMS amidite syntheses (Schemes 1.17 and 1.18) [144,145].
That is, 136 is coupled to ATBR 68 under Vorbrüggen conditions [89], 2′,3′,5′-O-deprotected, and then 3′,5′-O-protected and 2′-O-tBDMS protected to yield 137, 138, and 139, respectively. To conclude, 139 is 5′-O-tritylated and 3′-O-phosphitylated to yield 140 and 141, respectively. Taken together, [5-13C, 5-19F]-uridine 2′-O-tBDMS amidite 141 was synthesized in six steps with 8% total yield (Scheme 1.19) [66].

Scheme 1.19. Synthesis of [5-13C, 5-19F]-uridine 2′-O-tBDMS amidite [3]. Additional detail is found in the original work [66].
[Scheme 1.19 graphic not reproduced. Stepwise yields as printed in the scheme (printed order, which may differ from reaction order): 64%, 47%, 70%, 88%, 47%, and 89%. KEY: marked atoms = 13C; F = 19F.]

The corresponding cytidine derivative is obtained from 139 through intermediates 142-144 to afford the desired 145, as in Scheme 1.14 [107]. In summary, [5-13C, 5-19F]-N4-Ac-cytidine 2′-O-tBDMS amidite 145 was synthesized in eight steps with an overall yield of 4% (Scheme 1.20) [66]. These labeling topologies not only capitalize on the beneficial spectroscopic properties of the 19F nucleus (Section 2.4) but also open the door to NMR studies of large RNAs, as described elsewhere [26,66,108].

Scheme 1.20. Synthesis of [5-13C, 5-19F]-N4-Ac-cytidine 2′-O-tBDMS amidite [3]. Additional detail is found in the original work [66].
[Scheme 1.20 graphic not reproduced. Stepwise yields as printed in the scheme (from 139): 62%, 64%, 70%, and 74%. KEY: marked atoms = 13C; F = 19F.]

1.3.5.3 Synergy between labeling methods

In principle, any nucleobase labeling scheme described in Section 1.3.4.1 can be coupled to any commercially available 13C- and/or 2H-labeled D-ribose (from Omicron Biochemicals or CIL) with the chemo-enzymatic method (Section 1.3.4.2) and built into an amidite with a variety of 2′-OH protecting groups (Section 1.3.5). Indeed, our group recently made [1′,8-13C2]-N6-Bz-adenosine 2′-O-tBDMS [148] and [1′,6-13C2, 5-2H]-uridine 2′-O-CEM [149] amidites via chemo-enzymatic synthesis, dephosphorylation with rSAP, and chemical synthesis. Synthesis of the latter amidite will be detailed in Chapter 2. These amidites can then be used to make RNA via solid-phase synthesis. Given that the Kreutz and Micura groups have implemented a wide variety of atom-specific labeling schemes into the nucleobase of RNAs [66,69,104,107,111,112,125,137], this hybrid approach is only needed if atom-specific ribose labeling is desired in a position-specific manner. However, Silantes offers [1′,8-13C2]-N6-Ac-adenosine 146, [1′,2,8-13C3]-N6-Ac-adenosine 147, [1′,8-13C2]-N2-Ac-guanosine 148, [1′,6-13C2, 5-2H]-uridine 149, and [1′,6-13C2, 5-2H]-N4-Ac-cytidine 150 2′-O-tBDMS amidites (Table 1.2 and Figure 1.7).

[Figure 1.7 graphic not reproduced. It shows the structures of amidites 146-150. KEY: D = 2H; marked atoms = 13C.]
Figure 1.7. Commercially available ribose-labeled 2′-O-tBDMS amidites. All amidites are available from Silantes (https://www.silantes.com/).
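The overall yields quoted for the amidite syntheses in Section 1.3.5.1 can be checked as the product of the stepwise yields printed in the corresponding schemes. The following is a minimal sketch in Python, not part of the original work: the function name `overall_yield` is hypothetical, the stepwise percentages are those read from Schemes 1.13, 1.15, and 1.16, and "~100%" is treated as exactly 100% (an assumption).

```python
from math import prod


def overall_yield(step_yields_percent):
    """Return the overall yield (%) of a linear synthesis.

    The overall yield is the product of the fractional stepwise yields.
    """
    return 100.0 * prod(y / 100.0 for y in step_yields_percent)


# Stepwise yields (%) as printed in the schemes.
# Assumption: "~100%" is taken as exactly 100.
schemes = {
    "Scheme 1.13, uridine 2'-O-TOM amidite 78": [98, 100, 70, 40, 80],
    "Scheme 1.15, adenosine 2'-O-TOM amidite 88": [75, 100, 74, 49, 72],
    "Scheme 1.16, guanosine 2'-O-TOM amidite 94": [83, 99, 77, 52, 70],
}

for name, yields in schemes.items():
    print(f"{name}: {overall_yield(yields):.0f}% overall")
```

Under these assumptions the products round to 22%, 20%, and 23%, in agreement with the overall yields stated in the text for amidites 78, 88, and 94. Because the yield is a product, a single low-yielding step (here the 2′-O-TOM protection at 40-52%) dominates the outcome.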
1.3.5.4 Phosphoramidite labels: summary and outlook

As described in Section 1.3.5 and shown in Schemes 1.13-1.20 and Figures 1.4-1.7, a wide range of isotope-labeled amidites are becoming available to the scientific community. For all synthetic protocols, pyrimidine-C5/C6 and purine-C8 sites are most readily labeled (Table 1.10). The production of these 2′-O-TOM amidites is streamlined, proceeding quickly (~1 week) and with reasonable yields (20-34%) (Table 1.10). The introduction of 19F labels and post-transcriptional modifications, on the other hand, dramatically increases the time of synthesis (i.e., up to 10 days) and reduces the overall reaction yields (i.e., as low as 1%) (Table 1.10). Nevertheless, the benefits afforded by the position-specific incorporation of these labels into RNA more than offset these shortcomings. As with nucleobase labeling, researchers are typically motivated by the scientific question they are pursuing rather than the relative yields of each labeling reaction. Still, improvements in reaction yields and reduction in chemical steps would be advantageous for future work.

Table 1.10. Summary of RNA phosphoramidite labels [3].
RNA phosphoramidite label (a) | Time (days) (b) | Chemical steps (c) | Yield (%) | Reference
[8-13C]-N6-Bz-adenosine (TOM) | 4.5 | 5 (4) | 20 | [107]
[2,8-13C2]-N6-methyladenosine (tBDMS) | 8 | 11 (5) | 4 | [145]
[8-13C]-N2-iBu-guanosine (TOM) | 5 | 5 (4) | 23 | [107]
[6-13C, 5-2H]-N4-Ac-cytidine (TOM) | 8 | 8 (6) | 14 | [107]
[5-13C, 5-19F]-N4-Ac-cytidine (tBDMS) | 10 | 8 (6) | 4 | [66]
[6-13C, 5-2H]-uridine (TOM) | 4 | 5 (3) | 22 | [107]
[5-13C, 5-19F]-uridine (tBDMS) | 7.5 | 6 (4) | 8 | [66]
[1,3-15N2]-cmo5-uridine (tBDMS) | 8 | 15 (3) | 1 | [144]
a. The 2′-OH protecting groups are listed in parentheses.
b. The total reaction time was based on the time required for all chemical steps. In addition, 16 h were added for any explicit mention of overnight procedures and 24 h were added for any chromatographic purifications.
c.
The reactions for amidites harboring post-transcriptional modifications begin with isotope-labeled precursors whereas reactions for unmodified amidites begin with isotope-labeled protected nucleobase. Also, the number in parentheses represents the number of chromatographic purification steps. 1.4 RNA preparation methods With isotope-labeled building blocks in-hand, we can now discuss how they are used to prepare isotope-labeled RNA for NMR studies. In general, amidites are used in SPS to make RNAs <60 nts with atom- and position-specific labeling. All other RNA preparation methods require T7 RNAP-based IVT and therefore use rNTPs. IVT is the most widely used method to prepare large RNAs (>60 nts) but has major NMR limitations due to spectral overlap and broad linewidths, as previously discussed. Nevertheless, large RNAs can still be profitably analyzed if made from atom-specifically labeled rNTPs. The remaining RNA preparation methods rely on segmental labeling of large RNAs from smaller fragments or the enzymatic incorporation of position-specific labels. 1.4.1 Solid-phase RNA synthesis Originally developed for DNA synthesis by Beaucage and Caruthers142, the amidite method has since been adapted to RNA31?34. SPS is carried out in an automated synthesizer, requires amidite building blocks, and occurs in four-steps (Figure 1.8). First, 45 the DMT protecting group at the 5?-OH of the solid-support bound 3?-nucleoside is removed. Second, the deprotected 5?-OH then attacks the activated amidite to couple the two nucleosides. Synthesis grows the RNA molecule 3?-to-5? (Figure 1.8) by repeating the first two steps following oxidation of the phosphite-triester to the phosphotriester and subsequent deprotection. Cleavage from the solid-support terminates the cycle. DMTO O N base(n) O OR 4. New cycle (or release) 1. DeprotectionDMTO HO Nbase(n-1) O N base(n) O O O OR RNA solid phase synthesis O ORNC P O O Nbase(n)O DMTO base(n-1) 3. Oxidation 2. 
Coupling / capping O N O OR NC O P O OR DMTO Nbase(n-1) NO NC O P O OR O Nbase(n)O KEY: O base(n) O = solid-support O N O OR R = 2?-OH protecting group Nbase(n) = 3?-end nucleoside O OR Nbase(n-1) = 3?-end nucleoside - 1 Figure 1.8. Overview of solid-phase RNA synthesis2. Schematic of the solid-phase synthesis cycle, growing the RNA polymer from 3'-5'. SPS efficiency depends on the protecting group choice. RNA amidites are 5?-O- DMT-protected, and the nucleobase (Ade, Gua, and Cyt) exocyclic amino groups are protected with Ac, Pac, Bz, or iBu groups. The choice of the various 2?-OH protecting groups requires careful deliberation. These groups can be classified as acid-150, photo- 151, and fluoride-labile33,138,140,141. While RNAs have been synthesized with a variety of 2?- OH protecting groups, 2?-O-TOM138, -tBDMS33, and -CEM139?141 amidites are most 46 commonly used for NMR studies. SPS is therefore compatible with unlabeled and isotope-labeled amidites. The latter can either be purchased (Section 1.3.1, Table 1.2, and Figures 1.6 and 1.7) or prepared in-house (Sections 1.3.5, Schemes 1.13-1.20, and Figures 1.4 and 1.5). As such, this method can yield atom- and position-specifically labeled RNA for NMR studies. However, in practice, SPS is rarely used to make RNAs >60 nts and therefore alternative approaches must be considered to label large RNAs. 1.4.2 T7 RNA polymerase-based in vitro transcription IVT with DNA-dependent RNAPs from bacteriophage SP6, T3, or T7 (EC 2.7.7.6) is a widely used enzymatic method for RNA synthesis29,30,152?154. IVT is undoubtedly the standard approach for making RNAs for NMR analysis. In practice, IVT is performed with chemically synthesized single- or double-stranded DNA templates with one of two T7 RNAP promoter sequences (i.e., class II ?2.5 or class III ?6.5)30,154,155 (Figure 1.9). While this approach overcomes the size restrictions of SPS, it has limitations of its own. 
First, the widely used class III promoter is GTP-initiated and requires 5′-GG for efficient initiation [154]. Second, repeated failed transcription initiation results in 5′-end heterogeneity [156,157]. Third, T7 RNAP often adds non-templated nucleotides to the 3′-end of the nascent RNA [30,158]. Lastly, T7 RNAP is not amenable to position-specific labeling, though exceptions may arise by chance (e.g., an RNA sequence with a single A, G, C, or U). Fortunately, the 3′- and 5′-end heterogeneities are dramatically reduced by adding ribozyme sequences to the template in cis and trans [159,160], by incorporating 2′-O-CH3 rNTPs at the 3′-end [27], or by judicious choice of 5′-sequences. In addition to template modification, the efficiency of IVT with nucleotides bearing 2′-F, 2′-NH2, or 2′-O-CH3 modified ribose is enhanced by introducing Y639F and H784A mutations into T7 RNAP [161–164]. Although the addition of non-templated nucleotides remains a challenge to IVT, Roy and co-workers found no detectable non-templated 3′-end addition products when transcribing RNA of various sizes at higher temperature [165]. Despite its limitations, IVT is an extremely versatile method and is compatible with unlabeled and isotope-labeled rNTPs. The latter can either be purchased (Section 1.3.1) or prepared in-house by biomass, de novo biosynthesis, or chemo-enzymatic methods (Sections 1.3.2-1.3.4). As such, this method can yield uniformly, nucleotide-specifically, and atom-specifically labeled RNA for NMR studies.

Figure 1.9. Overview of T7 RNAP-based in vitro transcription. Transcription requires a DNA template that includes the 18 nt T7 RNAP promoter sequence. The resulting RNA is produced from the coding strand shown in green. T7 RNAP also requires Mg2+ and the four rNTPs to produce the RNA of interest.

1.4.3 Enzymatic ligation

One approach that enables position-specific and segmental isotope labeling is the ligation of two RNA molecules by T4 DNA (EC 6.5.1.1) or RNA (EC 6.5.1.3) ligase.
These ligating enzymes have also been combined with self-cleaving ribozymes to segmentally label RNA. In these methods, multiple fragments of RNA are ligated to produce a larger isotope-labeled RNA that can be studied by NMR. Depending on the RNA sequence under investigation, researchers can devise unique labeling patterns to incorporate position-specific labels, greatly reducing spectral overlap and simplifying NMR analysis.

1.4.3.1 T4 DNA and RNA ligation

The standard method for RNA ligation uses T4 DNA ligase [166]. In the presence of ATP, this enzyme recognizes a nicked double-stranded substrate and joins a 5′-monophosphate (P) RNA donor with a 3′-OH RNA acceptor (Figure 1.10A). The donor and acceptor RNA fragments can be prepared by either SPS or IVT. In the former case, the 5′-P can be added during donor RNA synthesis or afterward using T4 polynucleotide kinase (T4 PNK) (EC 2.7.1.78) and ATP. In the latter case, a donor RNA can be initiated with GMP or a 5′-XpG-3′ dinucleotide. Alternatively, transcribed RNAs can be dephosphorylated with rSAP and then phosphorylated with T4 PNK [167]. Because 3′- and 5′-end heterogeneities dramatically reduce ligation efficiency, great care must be taken to purify the RNA of interest [166]. The two main advantages of T4 DNA ligation are that undesired side products (e.g., circularization and oligomerization) are minimized and enzymatic activity is independent of the ligation junction sequence. A major disadvantage of the methodology is that T4 DNA ligase requires large quantities of RNA and is relatively inefficient at joining RNA strands [166].

An alternative method for RNA ligation uses T4 RNA ligase. Like its DNA counterpart, RNA ligase requires a 5′-P donor, a 3′-OH acceptor, and ATP (Figure 1.10B) [168]. However, RNA ligase requires single-stranded ligation junctions, complicating the use of cDNA as a template.
To overcome this limitation, Bain and Switzer designed a single-stranded DNA splint that positioned the donor and acceptor in close proximity, was compatible with T4 RNA ligase, and resulted in ligation efficiencies of 53% [169]. Building on this work, Rader and co-workers optimized ligation efficiency to near completion in less than an hour [170]. To achieve this, they protected the donor 3′-OH with a 2′-O-ACE group to minimize side products, chemically incorporated the 5′-P to minimize 5′-end heterogeneities, and designed an optimized linker at the ligation junction [170].

Figure 1.10. Overview of enzymatic RNA ligation [1]. DNA-splinted ligation schemes are shown using (A) T4 DNA ligase and (B) T4 RNA ligase. Sequence requirements for T4 RNA ligase are also shown. Additional details can be found in the original works [166,170].

1.4.3.2 Segmental RNA labeling

Another unique ligation strategy employs RNase H and hammerhead (HH), Varkud satellite (VS), and hepatitis delta virus (HDV) self-cleaving ribozymes. This approach dramatically reduces 3′- and 5′-end heterogeneities and has therefore been embraced as a popular method to segmentally label large RNAs for NMR studies. Two such examples have come from the Puglisi [171] and Lukavsky [172] research groups. Their method was streamlined by assembling a plasmid containing a T7 RNAP promoter, the 3′-fragment and its 3′-HH ribozyme in cis, and the RNA of interest. Despite the attractive design, the protocol took 12-14 days and only yielded 20-22 nmol RNA [171,172]. Nevertheless, this approach simplified NMR structural analysis of 74 nt [172] and 77 nt [171] RNAs. Building on this work, Wijmenga and co-workers developed an efficient two-step ligation method to selectively label central positions of large RNAs [173]. The utility of these labeling patterns to simplify NMR resonance assignment was showcased with a 61 nt RNA. Still, this method only yielded 15-30 nmol RNA and required 9-11 days [173].
Finally, Allain and co-workers developed an alternative approach for segmental labeling of RNA based on IVT of two full-length RNAs with identical sequence: one unlabeled and one isotope-labeled [174]. The RNAs were flanked at the 5′- and 3′-end by the HH and VS ribozymes, respectively. After ribozyme and RNase H cleavage steps, the acceptor and donor fragments were cross-ligated using T4 DNA or RNA ligase (Figure 1.11) [174]. The power of this method was demonstrated in a 72 nt non-coding RNA containing four stem-loops. Four NMR samples were made, each with only one of the four stem-loops isotope-labeled. This approach also provided ~10-fold better yield (i.e., 90-260 nmol RNA) and required less time (i.e., five to seven days) than did previous methods [174].

Figure 1.11. Overview of segmental RNA labeling [1]. Segmental labeling is achieved via IVT of identical unlabeled and isotope-labeled fragments along with HH (cis) and VS (trans) ribozymes, which are cleaved co-transcriptionally (step 1). Then, site-specific RNase H cleavage is facilitated by a DNA/RNA chimera (step 2), followed by cross-ligation reactions with T4 DNA or RNA ligases to yield segmentally labeled RNA (step 3). Additional details can be found in the original work [174].

1.4.4 Enzymatic position-specific RNA labeling

The final two RNA preparation methods rely on enzymatic incorporation of position-specific isotope labels within RNA. These approaches hold promise for enabling site-specific NMR studies of large RNAs, and therefore combine the benefits of both enzymatic (i.e., IVT and ligation) and chemical (i.e., SPS) preparation methods.

1.4.4.1 Position-Selective Labeling of RNA (PLOR)

Wang and co-workers developed a hybrid solid-liquid phase transcription technique that employs an automated robotic platform known as position-selective labeling of RNA (PLOR) [175].
In PLOR, the DNA template is attached to beads and RNA synthesis is initiated by the addition of T7 RNA polymerase and a mixture of three of the four rNTP building blocks. The beads are then washed and a new rNTP mixture is added, this time containing the previously omitted building block. As such, PLOR can incorporate any isotope-labeled rNTP position-specifically, assuming the desired labeling site does not coincide with a stretch of identical nucleotides (e.g., UUU). While isotope labeling by PLOR has aided NMR studies of RNA [175–177], its widespread use is still limited by the specialized equipment required, the need for stoichiometric amounts of T7 RNAP and DNA template, and its laborious nature.

1.4.4.2 Chemo-enzymatic position-specific labeling

Schwalbe and co-workers developed an alternative chemo-enzymatic approach for position-specific labeling [178]. Importantly, this method uses standard laboratory equipment and commercially available enzymes (T4 RNA ligase 1, T4 RNA ligase 2, and rSAP), making it more accessible than PLOR. In their method, a modified nucleoside 3′,5′-bisphosphate is incorporated at the 3′-end of an RNA fragment by T4 RNA ligase 1, followed by dephosphorylation by rSAP and DNA-splinted ligation by T4 RNA ligase 2. This technique has been used to introduce modified nucleosides (i.e., photocaged, photoswitchable, and isotope-labeled) into RNAs up to 392 nts [178]. While this method holds great promise for NMR applications, low yields of bis-phosphorylation (6-22%) and ligation (9-49%) reactions are a major drawback [178]. More recent efforts by Schwalbe and co-workers to improve this technology include the addition of magnetic streptavidin beads as a solid support and 5′-biotinylated RNA [179].
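
To give a feel for why the bis-phosphorylation and ligation yields quoted above are described as a major drawback, the following minimal sketch multiplies the two reported ranges into a combined range. It assumes, purely for illustration, that the two ranges apply to consecutive steps of a single preparation and that every other step (e.g., dephosphorylation) is quantitative; neither assumption is stated in the source, and the function name `yield_bounds` is hypothetical.

```python
def yield_bounds(step_ranges):
    """Return (low, high) cumulative yield for consecutive steps.

    step_ranges: list of (low, high) fractional yields, one pair per step.
    Assumes the steps are sequential and their yields are independent.
    """
    low, high = 1.0, 1.0
    for step_low, step_high in step_ranges:
        low *= step_low
        high *= step_high
    return low, high


# Ranges quoted in the text, converted to fractions.
reported = [
    (0.06, 0.22),  # bis-phosphorylation of the modified nucleoside
    (0.09, 0.49),  # ligation
]

low, high = yield_bounds(reported)
print(f"combined yield: {low:.1%} to {high:.1%}")
```

Under these assumptions the combined yield falls between roughly 0.5% and 11% relative to the modified nucleoside, which is consistent with the need for the solid-support improvements mentioned above.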

1.5 Conclusion

We have presented a detailed overview of the common NMR-active isotopes (Section 1.2), the diverse chemical and enzymatic methods to synthesize isotope-labeled RNA building blocks (Section 1.3), and how they are used to prepare isotope-labeled RNA for NMR studies (Section 1.4). Despite these advances, a number of substantial limitations remain for RNAs prepared enzymatically (i.e., IVT) and chemically (i.e., SPS). The former is incapable of position-specific labeling and the latter is limited by size. Unlike IVT, amidite labeling and SPS can provide direct read-outs of the biophysical consequences of post-transcriptional modifications [143], which is a tremendous advantage to the field. However, despite this strength, the "size problem" of SPS limits the production of RNAs to ~60 nt, beyond which it is exceedingly difficult to prepare NMR samples with sufficient yield and purity. Even though the 2′-O-CEM [112,139–141] protecting group initially held promise for synthesizing larger RNAs, it has not gained widespread use. Conversely, although much larger RNAs can be transcribed enzymatically, they always carry with them more extensive spectral overlap and broader linewidths. These complications make NMR analysis of RNAs >60 nt extremely difficult, even when atom-specific labeling is used (Figure 1.12).

[Figure 1.12 graphic; KEY: D = 2H, N = 15N, F = 19F, and 13C sites are marked in the original. Structures shown:
Labeled nucleobase: [1-15N]-Ade, [3-15N]-Ade, [7-15N]-Ade, [7-15N]-Gua, [9-15N]-Ade, [2-13C]-Ade, [8-13C]-Ade, [8-13C]-Gua, [1,3-15N2]-Cyt, [1,3,4-15N3]-Cyt, [1,3-15N2]-Uri, [5-13C, 6-1/2H]-Cyt, [5-13C, 6-1/2H]-Uri, [6-13C, 5-1/2H]-Cyt, [6-13C, 5-1/2H]-Uri.
Labeled modified nucleobase: [1,3-15N2]-cmo5-Uri, [2,8-13C2]-m6-Ade, [13CH3]-m6-Ade, [5-13C, 5-19F, 6-1/2H]-Cyt, [5-13C, 5-19F, 6-1/2H]-Uri, [1,3-15N2]-Ψ.
Labeled ribose: [1′-13C]-, [2′-13C]-, [3′-13C]-, [4′-13C]-, [5′-13C]-, and [1′,2′,3′,4′,5′-13C5]-ribose; [1′-2H]-, [2′-2H]-, [3′-2H]-, [4′-2H]-, [5′,5″-2H2]-, and [1′,2′,3′,4′,5′,5″-2H6]-ribose.]

Figure 1.12. List of possible labeling topologies [3]. These can be coupled to form rNTPs via chemo-enzymatic synthesis (Section 1.3.4) but also converted into amidites with further chemical synthesis (Section 1.3.5). Nucleobase labeling patterns (unmodified and modified) are based on the synthetic schemes described in Sections 1.3.4 and 1.3.5. These need not be mutually exclusive, and some labeled sites can be incorporated simultaneously. Labeled riboses, on the other hand, are available from commercial sources (Omicron Biochemicals and CIL).

For example, there is a tremendous bottleneck for complete NMR resonance assignment, which is a prerequisite for RNA structure determination and useful analysis of dynamics data. On the one hand, the costs associated with NMR sample preparation are substantial (Section 1.4). Many samples are needed for complete resonance assignment, and even more are needed for RNA structure determination. On the other hand, extreme time investments are required to characterize RNAs. The work needed to determine the structure of a medium-sized RNA (~40 nt) often spans an entire PhD or postdoc, if not more. Our example from Section 1.3.1 illustrates this point. NMR structure determination of a 43 nt RNA required 20 NMR samples and 10 contributing authors [72,73]. The costs of materials and labor for NMR sample preparation are prohibitive for most research groups.
Moreover, this RNA is only of modest size and studying larger RNAs will therefore involve greater financial and time commitments. However, RNA structural biology is moving toward larger and larger RNAs, especially as cryo-EM gains in resolution and popularity [18,180,181]. Nevertheless, solution NMR studies, unlike X-ray crystallography and cryo-EM, can approximate physiological conditions and temperatures and are therefore better suited for investigating the structural dynamics (from picoseconds to seconds) of macromolecules. Certainly, the technical advances and explosion of new data from X-ray and/or cryo-EM structures indicate exciting times for the use of solution NMR in integrative structural biology and drug discovery projects [18,180,182–184].

Still, the challenges associated with NMR studies of large systems must be met head-on. Attention must center on: (1) developing approaches for efficient position-specific or segmental labeling of large RNAs, (2) strategic "divide-and-conquer" designs of atom-specifically labeled RNAs, (3) leveraging new labeling topologies, and/or (4) reducing the cost of selective deuteration. First, improving ligation efficiency would open the door for larger RNAs to be constructed from any combination of atom-specifically labeled RNAs from IVT and/or position-specifically labeled RNAs from SPS. Second, careful design of multiple small, functional, and folded core RNAs that represent larger RNAs would be powerful if used in combination with atom-specific labeling. Indeed, this "divide-and-conquer" strategy [185,186] has been used successfully to study RNAs as large as 155 nts [18,60]. Third, introducing 13C-19F spin pairs into RNA [26,66,108] and leveraging the spectral properties of 15N nuclei [61,187] have shown initial progress toward lessening the burden imposed by large RNAs.
Lastly, NMR experiments with deuterated rNTPs have been used with great success by the Summers research group to determine the structures of large RNAs [18,59–64]. However, the prices of these rNTPs prevent their widespread use (Table 1.2).

Taken together, analysis of large RNAs by solution NMR spectroscopy will always be a challenge. Such studies will involve trade-offs between costs and efficient labeling methods. Fortunately, the methodological developments described herein and the versatile assortment of current labeling topologies (Figure 1.12) demonstrate a research community that has adapted to previous challenges and will continue to do so.

2 Chemical and enzymatic synthesis of RNA building blocks

*This chapter is adapted from refs. [124,149].

2.1 Introduction

As discussed in Chapter 1, NMR is a powerful biophysical tool to study RNA structure, dynamics, and interactions in solution and at high resolution. However, these analyses face two major obstacles: spectral overlap and broad linewidths, both of which worsen as RNAs grow in size. To overcome these challenges, new stable isotope labeling strategies are needed. In this chapter, we outline two recent examples of such efforts. First, we highlight the chemical synthesis of an atom-specifically labeled adenine [124] that can be coupled to any ribose source using our chemo-enzymatic approach [99–101] (Section 2.2). Second, we showcase a combined enzymatic and chemical synthesis of an atom-specific nucleobase and ribose labeled uridine 2′-O-CEM amidite [149] (Section 2.3). The latter section includes contributions from Dr. Owen Becette in collaboration with Dr. Serge Beaucage. As broadly discussed in Chapter 1, these isotope-labeled building blocks can then be used to make atom- and position-specifically labeled RNA. In this chapter, we detail how to make such RNA via IVT (Section 2.2.4.1) and SPS (Section 2.3.4.2).

2.2 Synthesis of [2-13C, 7-15N]-ATP

2.2.1 Motivation

A number of 13C- and/or 15N-labeled adenine derivatives are known [113–115,117,119–125], some of which are described in Sections 1.3.4.1.2, 1.3.4.1.3, and 1.3.4.1.5 and shown in Schemes 1.4-1.7 and 1.9. However, the specific combination of [2-13C, 7-15N]-adenine has only been reported by Moody and co-workers to study toyocamycin biosynthesis [114]. Moreover, this labeling pattern has yet to be implemented in the corresponding rNTP building block for use in IVT of RNA. This particular labeling topology is advantageous for NMR analysis to detect canonical or noncanonical base pair interactions and ligand binding that occur on the Watson-Crick (C2) or Hoogsteen (N7) face of adenosines. Additionally, our labeling permits straightforward and accurate probing of RNA dynamics (more on this in Chapter 3). Here, we present a chemo-enzymatic procedure to synthesize [2-13C, 7-15N]-ATP and applications to probe Ade-C2 and -N7 sites in a large 61 nt RNA with NMR structure and dynamics measurements [124]. Importantly, these experiments can be readily adapted to any RNA of interest for facile NMR studies.

2.2.2 Synthetic overview

The atom-specifically labeled [2-13C, 7-15N]-ATP was assembled with a hybrid chemical and enzymatic approach. First, we chemically synthesized 13C/15N-adenine following previous protocols [115,117]. Several routes to 15N-labeled adenine initiate from di- or triaminopyrimidines [117,122], but they were not adopted in the present study due to the desired Ade-C2 labeling. We therefore employed a C2H5ONa-mediated cyclization of ethyl cyanoacetate 120 with [13C]-thiourea 121 to form the thiouracil 122 [188]. Subsequent nitrosation of 122 installed the 15N label by electrophilic substitution using the cost-effective isotope source Na15NO2 to yield 151 [117]. A Na2S2O4-mediated reduction of the nitroso group of 151 gave 152, and desulfurization over Raney-Nickel formed the diaminopyrimidone 153 [116].
Treatment of 153 with H2SO4 and HCOOH yielded hypoxanthine 154 [117]. Subsequent reaction of 154 with POCl3 and N,N-DMA gave the chloropurine 155, which required chromatographic purification [118]. In the final step, reaction of 155 with NH3 in CH3OH in a microwave reactor yielded the desired [2-13C, 7-15N]-adenine 156 [118]. All intermediate compounds 122 and 151-155 displayed the expected 1H and 13C NMR spectra reported in the literature [115,117,118] (Figures A.1-A.10). 1H (Figure A.11) and 13C (Figure A.12) NMR and electrospray ionization (ESI) MS data (Figure A.13) all confirmed the identity of 156.

Scheme 2.1. Synthesis of [2-13C, 7-15N]-adenine [124]. Additional detail is found below.
[Scheme graphic; 13C and 15N sites are marked in the original. Steps as drawn:
120 + 121 → 122: C2H5ONa; 2 h, 80 °C; 71%
122 → 151: HCl (aq), Na15NO2; 5 h, 0 °C; 90%
151 → 152: Na2S2O4, NaHCO3 (sat); 6 h, 0 °C; ~99%
152 → 153: Raney-Ni, 5% NH3 (aq); 2 h, 100 °C; ~99%
153 → 154: H2SO4 (aq), HCOOH (aq); 19 h, 100 °C; 33%
154 → 155: POCl3, N,N-DMA; 30 min, 150 °C; ~99%
155 → 156: NH3 (aq) in CH3OH; 2 h, 150 °C, then 18 h, rt; 87%]

Then, enzymes from the pentose phosphate and nucleotide salvage biosynthetic pathways and a dATP regeneration system were used to couple the newly assembled 156 with D-ribose, followed by phosphorylation to the corresponding ATP, as in Scheme 1.11 [97,98,101]. This one-pot enzymatic reaction followed our previously published procedure [99] with minor alterations. In brief, RK phosphorylates D-ribose 157 at its O5′ position to yield 158. Then, PRPPS pyrophosphorylates 158 at its O1′ site to form 159. APRT then facilitates the nucleophilic attack of the C1′ site of 159 by the N9 of 156 to form the 5′-monophosphate 160. Finally, MK phosphorylates 160 to afford the 5′-diphosphate 161, which is then phosphorylated a final time by CK to yield the desired [2-13C, 7-15N]-ATP 162 (Scheme 2.2).
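
As a quick consistency check on the route as a whole, the cumulative yield of a linear synthesis is simply the product of its step yields. The sketch below uses the isolated yields reported for compounds 122, 151-156, and 162 in this section, with "~99%" approximated as 0.99 and ~90% taken for the enzymatic step (both are assumptions of this sketch); the names `STEP_YIELDS` and `overall_yield` are illustrative only.

```python
import math

# Isolated yields (as fractions), keyed by product compound number.
# "~99%" in the text is approximated here as 0.99.
STEP_YIELDS = {
    "122": 0.71,  # cyclization to the thiouracil
    "151": 0.90,  # nitrosation with Na15NO2
    "152": 0.99,  # Na2S2O4 reduction
    "153": 0.99,  # Raney-Nickel desulfurization
    "154": 0.33,  # ring closure to hypoxanthine
    "155": 0.99,  # chlorination
    "156": 0.87,  # amination to adenine
    "162": 0.90,  # one-pot enzymatic conversion to ATP
}


def overall_yield(yields):
    """Cumulative yield of a linear route: the product of the step yields."""
    return math.prod(yields.values())


total = overall_yield(STEP_YIELDS)
bottleneck = min(STEP_YIELDS, key=STEP_YIELDS.get)

# Hypothetical what-if: the same route if the limiting step reached 99%.
improved = overall_yield({**STEP_YIELDS, bottleneck: 0.99})

print(f"overall yield: {total:.0%}")
print(f"lowest-yielding step: formation of {bottleneck}")
print(f"overall yield if that step reached 99%: {improved:.0%}")
```

With these inputs the product is about 16%, and the single lowest-yielding step is the formation of 154; the what-if figure (about 48%) is hypothetical and only illustrates how strongly that one step limits the route.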
Complete triphosphate formation of 162 was monitored by 31P NMR and was in agreement with previous reports [189] (Figure A.14).

Scheme 2.2. Synthesis of [2-13C, 7-15N]-ATP [124]. Additional detail is found below.
[Scheme graphic; 13C and 15N sites are marked in the original. Steps as drawn: 157 → 158 (RK; dATP → dADP), 158 → 159 (PRPPS; dATP → dAMP), 159 + 156 → 160 (APRT; PPi released), 160 → 161 (MK; dATP → dADP), 161 → 162 (CK; creatine phosphate → creatine). dATP regeneration system: MK, dAMP + dATP → 2 dADP; CK, dADP + creatine phosphate → dATP + creatine.]

Taken together, our synthetic route (Schemes 2.1 and 2.2) provides 162 in 16% total yield with seven chemical steps, one enzymatic step, and two chromatographic purifications. The overall yield was limited by the low-yielding (33%) formation of 154, which resulted from complications with filtration in our hands and not from incomplete conversion of 153 to hypoxanthine 154. Nevertheless, all other reactions proceeded with excellent yield to assemble the first reported [2-13C, 7-15N]-ATP. What is more, our approach enables the rapid production of additional ATP building blocks by coupling 156 to different 2H- and/or 13C-labeled D-ribose sources.

2.2.3 Synthetic details

[2-13C]-6-amino-2-thiouracil (122, C313CH5N3OS)
C2H5ONa (4.74 g, 69.61 mmol, 1.05 eq.) was dissolved in C2H5OH (150 mL) and then ethyl cyanoacetate 120 (7.50 g, 66.30 mmol, 1.00 eq.) was added. Next, 13C-thiourea 121 (5.10 g, 66.1 mmol, 1.00 eq.) was added and the mixture was refluxed at 80 °C for 2 h. The solvents were removed under reduced pressure and the solid was dissolved in water (90 mL). Then, CH3COOH was added to pH 7 to precipitate the product. The solid was vacuum filtered and successively washed with water (20 mL), C2H5OH (20 mL), and acetone (20 mL) to give pure compound 122 as a white powder. Yield: 6.76 g (71%); 1H NMR (300 MHz, DMSO-d6): δ
= 11.60 (s, 1H, N(3)H), 11.51 (s, 1H, N(1)H), 6.36 (s, 2H, N(6)H2), 4.70 (s, 1H, C(5)H) ppm; 13C NMR (75 MHz, DMSO-d6): δ = 175.07 (13C(2)) ppm.

[2-13C]-6-amino-5-[nitroso-15N]-2-thiouracil (151, C313CH4N315NO2S)
Compound 122 (6.76 g, 46.89 mmol, 1.00 eq.) was suspended in 1 N hydrochloric acid (HCl) (135 mL) and Na15NO2 (3.45 g, 49.23 mmol, 1.05 eq.) (41 mL) was added. The reaction was cooled to 0 °C and stirred for 5 h until a red precipitate formed. The solid was vacuum filtered and successively washed with cold water (50 mL) and C2H5OH (50 mL) to give pure compound 151 as a red powder. Yield: 7.35 g (90%); 1H NMR (300 MHz, DMSO-d6): δ = 12.56 (s, 1H, N(3)H), 11.23 (s, 1H, N(1)H), 7.71 (s, 2H, N(6)H2) ppm.

[2-13C]-6-amino-5-[amino-15N]-2-thiouracil (152, C313CH6N315NOS)
Compound 151 (7.35 g, 42.20 mmol, 1.00 eq.) was suspended in saturated NaHCO3 (200 mL). The reaction was cooled to 0 °C with an ice bath and solid Na2S2O4 (19.10 g, 109.72 mmol, 2.60 eq.) was added in 4 equal portions at 0 °C. The mixture was stirred for 6 h and then CH3COOH was added until pH 6 was reached. The resulting precipitate was vacuum filtered and successively washed with ice-cold water (50 mL) and C2H5OH (50 mL) to give pure compound 152 as a pale-yellow solid. Yield: 6.76 g (~99%); 1H NMR (300 MHz, DMSO-d6): δ = 5.68 (s, 4H, N(5)H2, N(6)H2) ppm.

[2-13C]-6-amino-5-[amino-15N]-4-pyrimidinone (153, C313CH6N315NO)
Compound 152 (5.37 g, 33.50 mmol, 1.00 eq.) was dissolved in 5% NH3 (100 mL) and then Raney-Nickel (20 mL of a 50% slurry) was added and the mixture was refluxed at 100 °C for 2 h with vigorous stirring. The hot reaction mixture was filtered over celite and the filter cake was washed several times with boiling water (30 mL total). The filtrate was evaporated under reduced pressure and the residual yellow solid was co-evaporated with C2H5OH. Pure compound 153 was obtained as a pale-yellow solid. Yield: 4.29 g (~99%); 1H NMR (300 MHz, DMSO-d6): δ
= 7.41 (d, 1JH2C2 = 200.32 Hz, 1H, 13C(2)H), 5.42 (s, 4H, N(5)H2, N(6)H2) ppm; 13C NMR (75 MHz, DMSO-d6): δ = 140.01 (13C(2)) ppm.

[2-13C, 7-15N]-hypoxanthine (154, C413CH4N315NO)
Compound 153 (1.75 g, 13.66 mmol, 1.00 eq.) was suspended in water (5.5 mL) to form a pale-yellow mixture. Then, concentrated H2SO4 (1.34 g, 13.66 mmol, 1.00 eq.) and HCOOH (0.94 g, 20.49 mmol, 1.50 eq.) were successively added and the mixture was refluxed at 100 °C for 19 h. The mixture was allowed to cool to room temperature and neutralized with 28% NH3 and CH3COOH. The resulting red precipitate was vacuum filtered and successively washed with water (10 mL), C2H5OH (10 mL), and acetone (10 mL) to give pure compound 154 as a red solid. Yield: 0.63 g (33%); 1H NMR (300 MHz, DMSO-d6): δ = 12.56 (br, 2H, N(1)H, N(9)H), 8.12 (d, 1H, C(8)H), 7.98 (d, 1JH2C2 = 204.56 Hz, 1H, 13C(2)H) ppm; 13C NMR (75 MHz, DMSO-d6): δ = 145.10 (13C(2)) ppm.

[2-13C, 7-15N]-6-chloropurine (155, C413CH3ClN315N)
Compound 154 (0.63 g, 4.53 mmol, 1.00 eq.) was dissolved in POCl3 (16.7 mL) and N,N-DMA (1.4 mL) and refluxed at 150 °C for 30 min. The black solution was cooled to room temperature and the solvent was removed under reduced pressure to form a black gum. Excess POCl3 was removed under high vacuum. The residual oil was dissolved in 25% NH3 (6 mL), silica gel (SiO2) (1.0 g) was added, and the crude product was purified via column chromatography (18.0 g SiO2, CH2Cl2/CH3OH = 7/3 (v/v)) to obtain pure compound 155 as a brown solid. Yield: 0.71 g (~99%); Rf: 0.90 (CHCl3/CH3OH = 9/1 (v/v)); 1H NMR (300 MHz, DMSO-d6): δ = 8.75 (d, 1JH2C2 = 209.32 Hz, 1H, 13C(2)H), 8.69 (d, 1H, C(8)H) ppm; 13C NMR (75 MHz, DMSO-d6): δ = 151.95 (13C(2)) ppm.

[2-13C, 7-15N]-adenine (156, C413CH5N415N)
Compound 155 (0.71 g, 4.53 mmol, 1.00 eq.) was dissolved in 7 N NH3 in CH3OH (14 mL) and heated at 150 °C in a sealed tube in a microwave reactor (300 W) for 2 h. The reaction stood overnight at room temperature until a precipitate formed.
The solution was then vacuum filtered, the filtrate was evaporated, and pure compound 156 was obtained as a red solid. Yield: 0.54 g (87%); 1H NMR (300 MHz, DMSO-d6): δ = 8.12 (d, 1JH2C2 = 197.20 Hz, 1H, 13C(2)H), 8.11 (d, 1H, C(8)H), 7.13 (s, 2H, N(6)H2) ppm; 13C NMR (75 MHz, DMSO-d6, 25 °C): δ = 152.77 (13C(2)) ppm; MS (ESI): m/z calculated for C413CH5N415N [M+H]+ 138.06271 Da, found 138.0629 Da.

[2-13C, 7-15N]-adenosine 5′-triphosphate (162, C913CH12N415NO11P34-)
Compound 162 was enzymatically synthesized in vitro. The 10 mL reaction was carried out in 50 mM sodium phosphate (Na3PO4) (pH 8), 100 mM potassium chloride (KCl), 0.2% sodium azide (NaN3), 10 mM magnesium chloride (MgCl2), 10 mM dithiothreitol (DTT), 0.5 mM dATP, 0.1% bovine serum albumin (BSA), 100 mM creatine phosphate (CP), 8 mM 156, 10 mM D-ribose 157, 0.05 mg/mL CK, 0.01 U/μL MK, 0.1 U/μL thermostable inorganic pyrophosphatase (TIPP, EC 3.6.1.1), 9 × 10^-6 U/μL RK, 9 × 10^-6 U/μL PRPPS (EC 2.7.6.1), and 9 × 10^-6 U/μL APRT. The reaction was incubated at 37 °C for 24 h. Crude compound 162 was purified by boronate affinity chromatography (Eluent A: 1 M TEA pH 9; Eluent B: acidified water pH 4), lyophilized to a powder, and resuspended in UltraPure water. Yield: 37 mg (~90%); 31P NMR (324 MHz, D2O): δ = -5.74 (d, 1P, P(γ)), -10.92 (d, 1P, P(α)), and -21.26 (t, 1P, P(β)) ppm.

2.2.4 Applications to NMR studies

Our motivation to synthesize building block 162 was for use in straightforward and accurate NMR analysis of RNA. We therefore used 162 along with unlabeled GTP, UTP, and CTP to make an atom-specifically labeled 61 nt RNA by IVT (Figure 2.1) and probed Ade-C2 and -N7 sites with NMR structure and dynamics measurements.

Figure 2.1. Scheme of [2-13C, 7-15N]-ATP labeled RNA [124]. Secondary structure of an atom-specifically labeled 61 nt RNA made from IVT. Nucleotides harboring building block 162 are colored orange and numbered.
One- (1JH2C2) and two-bond (2JH8N7) couplings38 that will be used for magnetization transfer in NMR experiments are shown. Additional details can be found below.

2.2.4.1 RNA transcription
RNA was prepared as previously described30. In brief, IVT was carried out in 40 mM Tris-HCl (pH 8 at 37 °C), 1 mM spermidine, 0.01% Triton X-100, 80 mg/mL polyethylene glycol, 0.3 µM DNA templates (RNA of interest: 5′-GGTCCATGCCCCAAAGCCACCCAAGGCACAGCTTGGAGGCTTGAACAGTAGGACATGAACCTATAGTGAGTCGTATTAG-3′ and T7 RNAP promoter: 5′-CTAATACGACTCACTATAG-3′), 1 mM DTT, 2 U/µL TIPP, 1.88 mM 162, 1.88 mM GTP, 1.88 mM UTP, 1.88 mM CTP, 7.5 mM MgCl2, and 0.1 mg/mL T7 RNAP. The reaction proceeded for 3 h at 37 °C. After IVT, the sample was extracted with acid-phenol:chloroform (CHCl3), ethanol precipitated, purified by preparative denaturing polyacrylamide gel electrophoresis (PAGE), and electroeluted. The sample was subsequently dialyzed 3-5 times against UltraPure water, folded in NMR buffer A (10 mM Na3PO4, 0.02% NaN3, 0.1 mM ethylenediaminetetraacetic acid (EDTA), 0.1 mM sodium-3-(trimethylsilyl)-1-propanesulfonate (DSS), pH 6.7), lyophilized, and resuspended in D2O. The RNA NMR sample was 0.3 mM in 300 µL (calculated using an extinction coefficient of 768.3 mM^-1 cm^-1).

2.2.4.2 NMR structure measurements
As a first application, we implemented 2D heteronuclear single quantum coherence (HSQC) experiments using one- (1JH2C2) and two-bond (2JH8N7) couplings for magnetization transfer. In all, eight Ade-H8-N7 resonances were clearly resolved, whereas Ade-H2-C2 resonances from palindromic nucleotides A6 and A55 (Figure 2.1) showed overlap (Figure 2.2A). An important source of inter-proton distances for determining RNA structure by NMR spectroscopy is the nuclear Overhauser effect (NOE). Therefore, as a second application we sought to resolve 1H-1H NOEs into a third dimension using 13C- and 15N-edited 3D NOESY HSQC experiments to uncover all protons within ~5 Å
of adenosine H2 and H8 protons. In A-helical RNA, Ade-H2 protons show cross-peaks to 3′-neighboring H1′ protons on both sides of the helical strand and to their own H2′ protons38 (Figure 2.2B). Ade-H8 protons, on the other hand, show cross-peaks to 5′-neighboring H1′ protons as well as to their own H1′ and H2′ protons38 (Figure 2.2B). Correspondingly, we used the chemical shift of Ade-C2 and -N7 to resolve NOE cross-peaks to H1′ and H2′ protons of H2 and H8 resonances, respectively (Figure 2.2C).

Figure 2.2. NMR structure experiments in [2-13C, 7-15N]-ATP labeled RNA124. (A) 2D 1H-13C and 1H-15N HSQC spectra showing H2-C2 and H8-N7 resonances. (B) Representation of NOE cross-peaks to H2 and H8 protons for nucleotide A13. (C) 2D 1H-1H slice along a single 13C and 15N frequency. Overlapped H1′ and H2′ NOE cross-peaks are resolved by the C2 and N7 chemical shift. All NMR spectra are annotated with RNA resonance assignments.

2.2.4.3 NMR dynamics measurements
As a final application, we measured dynamics in our 61 nt RNA. The two main dynamics parameters are the longitudinal (R1) and transverse (R2) relaxation rates, both of which will be explained in more detail in Chapter 3. An alternative method to measure R2 in RNA is a transverse rotating-frame (R1ρ) experiment whereby a radio frequency pulse spin-locks the magnetization in the rotating frame and the relaxation rate constant along the effective field is measured190,191. Due to the large adenosine 1JH2C2 (203 Hz) as compared to 2JH8N7 (11 Hz) coupling38, we used transverse relaxation optimized spectroscopy (TROSY)-detected pulse schemes to measure 13C R1 and R2 (from R1ρ) rates in the 61 nt RNA. All Ade-C2 nuclei showed monoexponential decay and the corresponding relaxation rates were easily obtained (Figure 2.3). These relaxation measurements can help identify which RNA nucleotides are flexible or in chemical exchange and are therefore critical to understand RNA function.

Figure 2.3.
NMR dynamics measurements in [2-13C, 7-15N]-ATP labeled RNA124. Representative 13C R1 and R1ρ decay curves are shown for RNA nucleotide A13. Extracted rates and curve fits are shown.

2.2.4.4 NMR spectroscopy details
2D HSQC, 3D NOESY HSQC, and R1 and R1ρ NMR experiments were performed on an Avance III Bruker Ultrashield Plus 600 MHz spectrometer with a room temperature triple resonance probe. For the 2D HSQC experiments, the sweep widths were set to 2.5 (13C), 2.2 (15N), and 13 (1H-15N HSQC) or 16 (1H-13C HSQC) (1H) ppm with 64 scans and 64 increments. Carriers were centered at 150.6 (13C), 230.6 (15N), and 4.7 (1H) ppm. The 13C-192 and 15N-edited193 3D NOESY HSQC experiments were recorded using previously described pulse sequences. For the 13C-edited experiment, INEPT transfer delays (1/(4J)) were set to 1.2 ms, which is optimal for a 203 Hz 1JH2C2 coupling38. The sweep widths were set to 2.75 (13C) and 10.0 (1H) ppm. Carriers were centered at 150.6 (13C), 8.0 (indirect 1H), and 4.7 (1H) ppm. For the 15N-edited experiment, INEPT transfer delays (1/(4J)) were set to 16.7 ms, which is optimal for a 15 Hz 2JH8N7 coupling38. The sweep widths were set to 2.2 (15N) and 10.0 (1H) ppm. Carriers were centered at 230.6 (15N), 10.0 (indirect 1H), and 4.7 (1H) ppm. For each 3D data set, 200x64 complex point sampling matrices were used for the indirect 1H and 13C/15N dimensions along with 96 increments, and non-uniform sine-weighted Poisson-gap sampling of 10% was applied194. TROSY-detected measurements of 13C R1 and R1ρ relaxation rates were adapted from previous pulse sequences190,191,195. The sweep widths were set to 2.6 (13C) and 10.0 (1H) ppm with 32 scans and 120 increments. Carriers were centered at 151.1 (13C) and 4.7 (1H) ppm, and a 203 Hz 1JH2C2 coupling38 was used for coherence transfer. For R1 experiments, relaxation delays of 0.10, 0.20 (x2), 0.40, 0.60, 0.80, 1.00, and 1.20 s were used. For R1ρ
experiments, relaxation delays of 1.0 (x2), 3.0, 5.0, 7.0, 9.0, 11.0, and 13.0 ms were used. The strength of the spin-lock field (ω1) was 1.882 kHz and the offset from the spin-lock carrier frequency (Ω) was 6 Hz. All NMR data were collected at 25 °C with a recycle delay of 1.5 s, and analyzed using TopSpin 4.0, NMRFx Processor, and NMRViewJ196. R1 and R1ρ relaxation rates were determined by fitting peak intensities to a monoexponential decay. Uncertainties in R1 rates were estimated by propagating the error in peak intensities from duplicated delay points (indicated by "x2"). Uncertainties in R1ρ rates were estimated from spectral signal-to-noise with RELAXFIT197. R2 rates were corrected for the off-resonance ω1 by190,191

$$R_{1\rho} = R_1\cos^2\theta + R_2\sin^2\theta \quad \text{and} \quad \theta = \tan^{-1}\left(\frac{\omega_1}{\Omega}\right) \qquad (2.1,\ 2.2)$$

2.3 Synthesis of [1′,6-13C2, 5-2H]-uridine 2′-O-CEM amidite
2.3.1 Motivation
Although the syntheses of [8-13C]-N6-Ac-adenosine, [8-13C]-N2-Pac-guanosine, [6-13C, 5-2H]-N4-Ac-cytidine, and [6-13C, 5-2H]-uridine 2′-O-CEM amidites are known (Figure 1.5)112, the labeling of both nucleobase and ribose moieties has yet to be implemented into such building blocks. Recent work from our group and Kreutz and co-workers148 demonstrated initial success in this direction. In that work, chemo-enzymatic nucleobase and ribose coupling99-101 was combined with chemical synthesis to produce an [1′,8-13C2]-N6-Bz-adenosine 2′-O-tBDMS amidite in four steps with 11% total yield148. This hybrid approach is beneficial because it facilitates unambiguous resonance assignment in the nucleobase (i.e., H6-C6) and ribose (i.e., H1′-C1′). Moreover, given the increased coupling efficiency of the CEM112,139-141 2′-OH protecting group as compared to TOM138 and tBDMS33 groups, this resonance assignment can help the study of large RNAs >60 nt. Here, we present a combined enzymatic and chemical method to synthesize [1′,6-13C2, 5-2H]-uridine 2′-O-CEM amidite.
Importantly, this precursor can be used in solid-phase synthesis to produce large atom- and position-specifically labeled RNAs amenable for NMR analysis, which will be the focus of future work.

2.3.2 Synthetic overview
The atom-specific nucleobase and ribose labeled uridine 2′-O-CEM amidite was assembled with a hybrid enzymatic and chemical approach. First, we used enzymes from the nucleotide salvage biosynthetic pathways97,98,101 to couple [6-13C, 5-2H]-uracil 5 (Scheme 1.1)100,104,107 and commercially available [1-13C]-D-ribose 163. This one-pot reaction was adapted from our previously published procedure100,148. In brief, RK phosphorylated 163 at its O5 site to yield 164. Then, PRPPS pyrophosphorylated 164 at its O1 site to form 165. UPRT then facilitated the nucleophilic attack of the C1 site of 165 by the N1 of 5 to afford the 5′-monophosphate 166. Finally, 166 was dephosphorylated by rSAP to yield the desired [1′,6-13C2, 5-2H]-uridine 167 (Scheme 2.3).

Scheme 2.3. Synthesis of [1′,6-13C2, 5-2H]-uridine149. Additional detail is found below. [Scheme: RK converts 163 to 164 (dATP to dADP); PRPPS converts 164 to 165 (dATP to dAMP); UPRT couples 165 with 5 to give 166 (releasing PPi); rSAP converts 166 to 167 (releasing NaPi). Key: D = 2H; 13C sites are marked.]

The newly assembled 167 was then used as the starting point to chemically synthesize the desired 2′-O-CEM amidite following established protocols112,139-141. To ensure selective 2′-O-alkylation, the 3′- and 5′-OH groups of 167 were protected with 1,3-dichloro-1,1,3,3-tetraisopropyldisiloxane (TIPS-Cl2) to yield the 3′,5′-di-O-TIPS-uridine 168. Subsequent reaction of 168 with 2-cyanoethyl methylthiomethyl ether (C5H9NOS, prepared as previously described139) as the alkylating agent and N-iodosuccinimide (NIS) as the activator led to efficient production of 2′-O-CEM uridine 169 at low temperature (-45 °C).
Treatment of 169 with ammonium fluoride (NH4F) led to removal of the 3′,5′-di-O-TIPS group to yield 170. In the second to last step, the 5′-OH group of 170 was tritylated with DMT-Cl to form 171. Lastly, 3′-OH phosphitylation of 171 with CEP-Cl yielded the desired [1′,6-13C2, 5-2H]-uridine 2′-O-CEM amidite 172 (Scheme 2.4). Pure intermediate compounds 168 and 171 displayed the expected 1H and 13C NMR spectra reported in the literature115,117,118 (Figures A.15-A.18). 1H (Figure A.19), 13C (Figure A.20), and 31P (Figure A.21) NMR and ESI MS data (Figure A.22) all confirmed the identity of 172, albeit with some phosphonate impurity (Figure A.21).

Scheme 2.4. Synthesis of [1′,6-13C2, 5-2H]-uridine 2′-O-CEM amidite149. Additional detail is found below. [Scheme: 167 to 168, TIPS-Cl2 in pyridine, 1.5 h, rt, 39%; 168 to 169, (1) C5H9NOS in THF, 30 min, -45 °C, (2) TfOH, 10 min, -45 °C, (3) NIS, 15 min, -45 °C, ~100%; 169 to 170, NH4F in CH3OH, 5 h, 50 °C, ~100%; 170 to 171, DMT-Cl in pyridine, 3 h, rt, 60%; 171 to 172, DiPEA, CEP-Cl in CH2Cl2, 16 h, rt, 45%. Key: D = 2H; 13C sites are marked.]

Taken together, the combined enzymatic (Scheme 2.3) and chemical (Scheme 2.4) synthetic route provides 172 in an overall yield of 8% with two enzymatic steps, five chemical steps, and five chromatographic purifications. This route was impacted by the low-yielding (39%) formation of 168, which resulted from complications with purification of the 3′,5′-di-O-TIPS product in our hands and not with conversion from 167 to 168. Nevertheless, all other reactions proceeded with reasonable yield to assemble the first report of an atom-specific nucleobase and ribose labeled 2′-O-CEM amidite.
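As a quick arithmetic check on the overall yield quoted above, the following minimal sketch multiplies the per-step isolated yields reported for compounds 167 through 172 in the synthetic details (Section 2.3.3). The dictionary labels and the function name `overall_yield` are illustrative, and treating the "~80%" and "~100%" entries as exactly 0.80 and 1.00 is an assumption.

```python
from math import prod

# Per-step isolated yields as reported in the text, written as fractions.
# Assumption: "~80%" and "~100%" are treated as exactly 0.80 and 1.00.
STEP_YIELDS = {
    "167 (enzymatic coupling and rSAP dephosphorylation)": 0.80,
    "168 (3',5'-di-O-TIPS protection)": 0.39,
    "169 (2'-O-CEM alkylation)": 1.00,
    "170 (TIPS removal)": 1.00,
    "171 (5'-O-DMT protection)": 0.60,
    "172 (3'-O-phosphitylation)": 0.45,
}


def overall_yield(step_yields):
    """Return the overall fractional yield of a linear synthesis."""
    return prod(step_yields)


total = overall_yield(STEP_YIELDS.values())
print(f"Overall yield over {len(STEP_YIELDS)} isolated products: {total:.1%}")
```

Under these assumptions the product of the step yields rounds to the 8% stated in the text, and the 39% TIPS protection step is visibly the largest single loss.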
What is more, our approach enables the rapid production of additional building blocks by building 168 into various 2′-O-protected amidites, or by coupling different combinations of isotope-labeled nucleobase and ribose prior to chemical synthesis.

2.3.3 Synthetic details
[1′,6-13C2, 5-2H]-uridine (167, C713C2H112HN2O6) (by Dr. Owen Becette)
Compound 167 was enzymatically synthesized in vitro. The 200 mL reaction was carried out in 50 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) pH 8, 0.2% NaN3, 10 mM MgCl2, 10 mM DTT, 0.5 mM dATP, 0.1% BSA, 100 mM CP, 10 mM 5, 8 mM 163, 0.005 mg/mL CK, 0.01 U/mL MK, 0.1 mg/mL TIPP, 1 × 10^-5 U/µL RK, 1 × 10^-5 U/µL PRPPS, and 0.1 mg/mL UPRT. The reaction was split into five 40 mL aliquots and incubated at 37 °C for 24 h. The 5′-monophosphate 166 was converted into crude compound 167 by adding rSAP (1.81 µL per mL of reaction) and incubating at 37 °C for an additional 24 h. Crude compound 167 was purified by boronate affinity chromatography (Eluent A: 1 M TEA pH 9; Eluent B: acidified water pH 4) and lyophilized to an off-white oil. Finally, the oil was dissolved (CHCl3/CH3OH = 1/1 (v/v)) and purified via column chromatography (7.5 g SiO2, (CHCl3/CH3OH = 9/1 (v/v))) to yield pure compound 167 as a white foam after drying under high vacuum. Yield: 351 mg (~80% relative to input uracil for each 40 mL aliquot); Rf: 0.75 (CHCl3/CH3OH = 3/2 (v/v)). 1H NMR: not determined; 13C NMR: not determined.

[1′,6-13C2, 5-2H]-3′,5′-di-O-TIPS-uridine (168, C1913C2H372HN2O7Si2)
Compound 167 (351 mg, 1.42 mmol, 1.00 eq.) was co-evaporated with anhydrous pyridine and then dissolved in fresh anhydrous pyridine (1.81 mL). Then, TIPS-Cl2 (500 mg, 1.59 mmol, 1.10 eq.) was added dropwise over 1 h and the mixture was stirred under argon atmosphere at room temperature for 30 min. After TLC (CHCl3/CH3OH = 9/1 (v/v)) showed total conversion, the solution was diluted with saturated NaHCO3 (20 mL) and extracted with dichloromethane (CH2Cl2) (20 mL).
The organic phase was dried over anhydrous sodium sulfate (Na2SO4) and the solution was evaporated to dryness. The residual oil was co-evaporated with anhydrous toluene twice and dried under high vacuum. Crude product 168 was purified via column chromatography (6.5 g, SiO2, (CHCl3/CH3CN = 100/0-80/20 (v/v))) to obtain pure compound 168 as a white foam after drying under high vacuum. Yield: 271 mg (39%); Rf: 0.46; 1H NMR (300 MHz, DMSO-d6): δ = 11.35 (s, 1H, N(3)H); 7.98 (d, 1JH6C6 = 181.43 Hz, 1H, 13C(6)H); 5.82 (d, 1JH1′C1′ = 173.11 Hz, 1H, 13C(1′)H); 5.58 (d, 3JHH = 9.17 Hz, 1H, C(2′)OH); 4.28-4.11 (m, 3H, C(2′)H; C(3′)H; C(4′)H); 3.98-3.89 (m, 2H, C(5′)H; C(5″)H); 1.05-0.97 (m, 28H, 4xSi-CH-(CH3)2; 4xSi-CH-(CH3)2); 13C NMR (75 MHz, DMSO-d6): δ = 140.21 (13C(6)); 90.99 (13C(1′)).

[1′,6-13C2, 5-2H]-2′-O-CEM-3′,5′-di-O-TIPS-uridine (169, C2313C2H422HN3O8Si2)
Compound 168 (260 mg, 0.53 mmol, 1.00 eq.) was dissolved in anhydrous tetrahydrofuran (THF) (4 mL) and C5H9NOS (146 mg, 1.12 mmol, 2.10 eq.) was added. The solution was cooled to -45 °C with a dry ice/acetonitrile (CH3CN) bath and stirred under argon atmosphere for 30 min. Trifluoromethanesulfonic acid (TfOH) (167 mg, 1.12 mmol, 2.10 eq.) was carefully dropped into the mixture over a period of 10 min and then NIS (113 mg, 1.12 mmol, 2.10 eq.) was added in one portion. The reaction mixture was stirred for 15 min at -45 °C and then TEA (113 mg, 1.12 mmol, 2.10 eq.) was slowly added over a period of 20 min to quench the reaction. The mixture was diluted with ethyl acetate (10 mL) and washed with saturated sodium thiosulfate (Na2S2O3) (10 mL) and saturated NaHCO3 (10 mL). The organic layers were evaporated to dryness and the residual oil was dissolved in ethyl acetate (10 mL). The mixture was successively washed with water (10 mL), saturated Na2S2O3 (10 mL), and saturated sodium chloride (10 mL). The organic layer was dried over anhydrous Na2SO4 and the solution was evaporated to dryness.
The residual light brown oil was isolated as crude compound 169 and dried under high vacuum. No further purification steps were used and the crude product 169 was used in the next synthetic step. Yield: assumed to be 304 mg (100%); Rf: 0.52 (CHCl3/CH3OH = 9/1 (v/v)); 1H NMR: not determined; 13C NMR: not determined.

[1′,6-13C2, 5-2H]-2′-O-CEM-uridine (170, C1113C2H162HN3O7)
Crude compound 169 (304 mg, 0.53 mmol, 1.00 eq.) was dissolved in anhydrous CH3OH (7.5 mL) and NH4F (72.2 mg, 1.95 mmol, 3.67 eq.) was added. The reaction mixture was heated to 50 °C and stirred for 5 h under argon atmosphere. After TLC (CHCl3/CH3OH = 9/1 (v/v)) showed total conversion, CH3OH was removed under reduced pressure. The residue was dissolved in CH3CN (10 mL) and the white precipitate that formed was removed by vacuum filtration and washed with CH3CN. The solution was extracted with n-hexane twice, the hexane layers were discarded, the CH3CN layer was dried over anhydrous Na2SO4, and the solution was evaporated to dryness. The residual oil was isolated as crude compound 170 and dried under high vacuum. No further purification steps were used and the crude product 170 was used in the next synthetic step. Yield: assumed to be 175 mg (100%); Rf = 0.18 (CHCl3/CH3OH = 9/1 (v/v)); 1H NMR: not determined; 13C NMR: not determined.

[1′,6-13C2, 5-2H]-2′-O-CEM-5′-O-DMT-uridine (171, C3213C2H342HN3O9)
Crude compound 170 (175 mg, 0.53 mmol, 1.00 eq.) was co-evaporated with anhydrous pyridine and then dissolved in fresh anhydrous pyridine (3.5 mL). Then, DMT-Cl (215 mg, 0.64 mmol, 1.20 eq.) was added and the mixture was stirred at room temperature under argon atmosphere for 3 h. After TLC (CHCl3/CH3OH = 9/1 (v/v)) showed total conversion, the reaction was quenched with cold water (10 mL) and extracted with CHCl3 (10 mL) twice. The organic phase was dried over anhydrous Na2SO4 and the solution was evaporated to dryness and dried under high vacuum.
The crude product 171 was purified via column chromatography (6.0 g, SiO2, (CHCl3/CH3OH = 100/0-98/2 (v/v)) + 0.5% pyridine) to obtain pure compound 171 as an off-white solid. Yield: 200 mg (60%); Rf = 0.33 (CHCl3/CH3OH = 9/1 (v/v)); 1H NMR (300 MHz, DMSO-d6): δ = 11.39 (s, 1H, N(3)H); 8.04 (d, 1JH6C6 = 181.75 Hz, 1H, 13C(6)H); 7.40-7.14 (m, 9H, arom. CH); 6.92-6.89 (d, 3JHH = 8.98 Hz, 4H, arom. CH-C-OCH3); 5.50 (d, 1JH1′C1′ = 170.37 Hz, 1H, 13C(1′)H); 5.37 (d, 3JHH = 6.22 Hz, 1H, C(3′)OH); 4.83 (s, 2H, -O-CH2-O-); 4.26-4.24 (m, 2H, C(2′)H; C(3′)H); 3.98 (singlettoid, 1H, C(4′)H); 3.744 (s, 6H, 2x -OCH3); 3.70-3.64 (m, 2H, -O-CH2-CH2-); 3.33-3.21 (m, 2H, C(5′)H; C(5″)H); 2.79 (dd, 2JHH = 15.86 Hz, 3JHH = 3.92 Hz, 2H, -O-CH2-CH2); 13C NMR (75 MHz, DMSO-d6): δ = 140.55 (13C(6)); 87.98 (13C(1′)).

[1′,6-13C2, 5-2H]-2′-O-CEM-5′-O-DMT-uridine amidite (172, C4113C2H512HN5O10P)
Compound 171 (195 mg, 0.31 mmol, 1.00 eq.) was dissolved in anhydrous CH2Cl2 (3 mL). Then, both DiPEA (119 mg, 0.92 mmol, 3.00 eq.) and CEP-Cl (109 mg, 0.46 mmol, 1.5 eq.) were added and the solution was stirred overnight (~16 h). Monitoring with TLC (CHCl3/CH3OH = 9/1 (v/v)) showed an incomplete reaction, so additional CEP-Cl (0.5 eq.) was added, and the solution was stirred until TLC (CHCl3/CH3OH = 9/1 (v/v)) showed total conversion (~1 h). The reaction was quenched with water (10 mL) and extracted with CH2Cl2 (10 mL) twice. The organic phase was dried over anhydrous Na2SO4 and the solution was evaporated to dryness and dried under high vacuum. The crude product 172 was purified via column chromatography (7.5 g, SiO2, (benzene/TEA = 9/1 (v/v))) to obtain pure compound 172 as a white foam consisting of a mixture of two diastereomers after drying under high vacuum. Yield: 120 mg (45%); Rf = 0.54 + 0.58 (CHCl3/CH3OH = 9/1 (v/v)); 1H NMR (300 MHz, DMSO-d6): δ = 11.38 (s, 1H, N(3)H); 8.06 (d, 1JH6C6 = 181.75 Hz, 1H, 13C(6)H); 7.38-7.23 (m, 9H, arom. CH); 6.92-6.88 (d, 3JHH = 11.51 Hz, 4H, arom.
CH-C-OCH3); 6.14 (d, 1JH1′C1′ = 169.46 Hz, 1H, 13C(1′)H); 4.85-4.79 (m, 2H, -O-CH2-O-); 4.43-4.40 (m, 2H, C(3′)H; C(2′)H); 4.13-3.99 (m, 2H, C(4′)H; -P-O-CH′2-CH2-); 3.74 (s, 6H, 2x -OCH3); 3.73-3.69 (m, 2H, -P-O-CH″2-CH2-; -CH2-O-CH′2-CH2-); 3.55-3.41 (m, 3H, C(5′)H; C(5″)H; -CH2-O-CH″2-CH2-); 2.80-2.73 (m, 2H, -P-O-CH2-CH2-; -CH2-O-CH2-CH′2-); 2.64-2.60 (m, 1H, -CH2-O-CH2-CH″2-); 1.24-1.08 (m, 14H, 2x -N-CH-(CH3)2; 2x -N-CH-(CH3)2); 13C NMR (75 MHz, DMSO-d6): δ = 140.71 (13C(6)); 88.49 (13C(1′)); 31P NMR (122 MHz, C6D6): δ = 151.33 (s); 149.62 (s); MS (ESI): m/z calculated for C4113C2H512HN5O10P [M-H]- 831.3498 Da, found 831.3481 Da.

2.3.4 Applications to NMR studies
We synthesized building block 172 for NMR analysis of large RNAs. We plan to use 172 along with unlabeled adenosine, guanosine, cytidine, and uridine 2′-O-CEM amidites to make position-specifically labeled RNAs >60 nt by SPS (Figure 2.4). However, as mentioned in Section 1.3.5.1, 2′-O-CEM amidites are commercially unavailable and therefore the unlabeled building blocks must also be synthesized.

Figure 2.4. Scheme of position-specifically [1′,6-13C2, 5-2H]-uridine labeled RNA. Secondary structure of an atom- and position-specifically labeled 63 nt RNA made from SPS. Nucleotide harboring building block 172 is colored orange and numbered. One-bond (1JH6C6 and 1JH1′C1′) couplings38 that will be used for magnetization transfer are shown, and a hypothetical NMR spectrum is presented.

2.3.4.1 Synthesis of unlabeled 2′-O-CEM phosphoramidites
Using the same overall synthetic route shown in Scheme 2.4, and the recommendations of previous work112,139-141, we synthesized N6-Pac-adenosine 173 (7% yield in six steps) (Scheme 2.5), N2-Pac-guanosine 174 (11% yield in five steps) (Scheme 2.6), N4-Ac-cytidine 175 (37% yield in four steps) (Scheme 2.7), and uridine 176 (35% yield in four steps) (Scheme 2.8) 2′-O-CEM amidites.
Each amidite displayed the expected 1H, 13C, and 31P NMR spectra reported in the literature115,117,118 (data not shown) and their identities were further confirmed by ESI MS (Figures A.23-A.26).

Scheme 2.5. Synthesis of N6-Pac-adenosine 2′-O-CEM amidite. Additional detail is found above. [Scheme: TIPS-Cl2 in pyridine, 1.5 h, rt, ~50%; Ac2O, CH3COOH in DMSO, 24 h, rt, 36% (MTM intermediate); (1) HOCH2CH2CN in THF, 30 min, -45 °C, (2) TfOH, NIS, 20 min, -45 °C, ~100%; TEA-3HF in THF, 2 h, 45 °C, ~100%; DMT-Cl in pyridine, 1.5 h, rt, 72%; DiPEA, CEP-Cl in CH2Cl2, 2 h, rt, 70%; product 173.]

Scheme 2.6. Synthesis of N2-Pac-guanosine 2′-O-CEM amidite. Additional detail is found above. [Scheme: TIPS-Cl2 in pyridine, 1.5 h, rt, ~50%; (1) C5H9NOS in THF, 30 min, -45 °C, (2) TfOH, 10 min, -45 °C, (3) NIS, 15 min, -45 °C, ~100%; TEA-3HF in THF, 2 h, 35 °C, ~100%; DMT-Cl in pyridine, 3 h, 35 °C, 50%; DiPEA, CEP-Cl in CH2Cl2, 2.5 h, rt, 44%; product 174.]

Scheme 2.7. Synthesis of N4-Ac-cytidine 2′-O-CEM amidite. Additional detail is found above. [Scheme: (1) C5H9NOS in THF, 30 min, -45 °C, (2) TfOH, 10 min, -45 °C, (3) NIS, 15 min, -45 °C, ~100%; TEA-3HF in THF, 2 h, 45 °C, ~100%; DMT-Cl in pyridine, 1.5 h, rt, 61%; DiPEA, CEP-Cl in CH2Cl2, 5 h, rt, 61%; product 175.]

Scheme 2.8. Synthesis of uridine 2′-O-CEM amidite. Additional detail is found above.
[Scheme: (1) C5H9NOS in THF, 30 min, -45 °C, (2) TfOH, 10 min, -45 °C, (3) NIS, 15 min, -45 °C, ~100%; NH4F in CH3OH, 5 h, 50 °C, ~64%; DMT-Cl in pyridine, 1.5 h, rt, 82%; DiPEA, CEP-Cl in CH2Cl2, 2 h, rt, 86%; product 176.]

However, there were some notable differences in some of the reactions. For example, based on commercially available material in hand, syntheses of 173 and 174 were initiated from N-protected nucleosides (Schemes 2.5 and 2.6) whereas those of 175 and 176 started from N-protected 3′,5′-di-O-TIPS nucleosides (Schemes 2.7 and 2.8). Moreover, in the synthesis of 173, 2′-O-CEM protection proceeded in two steps. First, a methylthiomethyl (MTM) group was installed by Ac2O and CH3COOH in DMSO and then it was converted to CEM by the addition of 2-cyanoethanol (HOCH2CH2CN), TfOH, and NIS at low temperature (-45 °C) (Scheme 2.5). Lastly, while 3′,5′-di-O-TIPS removal was carried out with NH4F in CH3OH for 172 and 176 (Schemes 2.4 and 2.8), this step required TEA-3HF for N-protected 3′,5′-di-O-TIPS nucleosides 173-175, in some cases at slightly different temperatures (Schemes 2.5-2.7).

2.3.4.2 RNA synthesis with 2′-O-tBDMS amidites
With isotope-labeled and unlabeled 2′-O-CEM amidites now in hand, we are ready to synthesize atom- and position-specifically labeled RNA by SPS. However, prior to using our newly synthesized 2′-O-CEM amidites 172-176, we wanted to first gain familiarity with SPS. To this end, we used commercially available N6-Pac-adenosine, N2-Pac-guanosine, N4-Pac-cytidine, and uridine 2′-O-tBDMS amidites to synthesize a 63 nt RNA using standard techniques and procedures137.
In brief, RNA synthesis (1 µmol scale) was carried out on an Applied Biosystems 394 DNA/RNA synthesizer using the DMT-on method and 0.4 M tetrazole in CH3CN, THF/Ac2O, 0.05 M iodine in pyridine/water, 3% dichloroacetic acid in CH2Cl2, and anhydrous CH3CN as the activator, capping, oxidation, deblocking, and wash solutions, respectively. Following synthesis, RNA cleavage and nucleobase deprotection were carried out using 1:3 (v/v) ethanol:NH4OH, and 2′-O-deprotection was achieved with TEA-3HF. Final high-performance liquid chromatography (HPLC) purification of the crude RNA cleavage solution confirmed the successful synthesis of our 63 nt RNA (data not shown).

2.3.4.3 RNA synthesis with 2′-O-CEM amidites
We have yet to use our 2′-O-CEM amidites 172-176 in SPS. However, future work in this direction will be carried out following previously reported procedures112. In brief, synthesis (1 µmol scale) will be completed on an Applied Biosystems Expedite 8909 Nucleic Acid Synthesizer using the DMT-on method and 0.25 M BTT in CH3CN, 3% trichloroacetic acid in CH2Cl2, 0.1 M iodine in 7:1:2 (v/v/v) THF/pyridine/water, 5% phenoxyacetic anhydride in THF and 1:1:8 (v/v/v) N-methylimidazole/2,6-lutidine/THF, and anhydrous CH3CN as the activator, deblocking, oxidation, capping, and wash solutions, respectively. Following synthesis, RNA cleavage and nucleobase deprotection will be carried out using 1:3 (v/v) ethanol:NH4OH, and 2′-O-deprotection will be achieved with anhydrous 1 M tetrabutylammonium fluoride solution (with 1% nitromethane) in DMSO. RNA purification will then be carried out using preparative denaturing PAGE.
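To illustrate why the per-step coupling efficiency of the 2′-OH protecting group matters for RNAs of this length, the sketch below estimates the fraction of chains that reach full length in a stepwise solid-phase synthesis. The per-step efficiencies in the loop are hypothetical values chosen only for illustration and are not measurements from this work. The function name `full_length_fraction` and the assumption that the first nucleoside is preloaded on the support (so an n-mer needs n - 1 couplings) are likewise assumptions.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of chains reaching full length after (length_nt - 1) couplings.

    Assumes the first nucleoside is preloaded on the solid support and that
    every coupling step proceeds with the same efficiency.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length_nt - 1)


# Hypothetical per-step efficiencies, for illustration only.
for efficiency in (0.97, 0.98, 0.99):
    fraction = full_length_fraction(63, efficiency)
    print(f"{efficiency:.2f} per step -> {fraction:.1%} full-length 63-mer")
```

Because the losses compound over 62 couplings, even a one-percentage-point gain per step changes the full-length fraction substantially in this simple model.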
With building blocks 172-176 in hand and the above procedure in place, future efforts will focus on producing large >60 nt atom- and position-specifically labeled RNAs for unambiguous NMR resonance assignment and straightforward dynamics probing (Figure 2.4).

2.4 Conclusion
We have presented two examples of new stable isotope labeling strategies to facilitate solution NMR studies of RNA structure and dynamics. Our first example showcases a combined chemical and enzymatic synthesis of an atom-specifically labeled [2-13C, 7-15N]-ATP 162 (Section 2.2). In addition to detailing the synthetic procedures, we also summarize how 162 can be used in IVT to produce atom-specifically labeled RNA (Section 2.2.4.1). Our second example features a combined enzymatic and chemical synthesis of an atom-specific nucleobase and ribose labeled [1′,6-13C2, 5-2H]-uridine 2′-O-CEM amidite 172 (Section 2.3). Once more, we also detail how 172 can be used to make atom- and position-specifically labeled RNA (Section 2.3.4.3).

Atom-specific labeling benefits NMR studies by alleviating the burdens imposed by spectral overlap and broad linewidths. While spectral overlap is circumvented by reducing the number of probe sites, broad linewidths are reduced by removing (or minimizing) excess dipolar-coupled partners. These dipolar couplings introduce competing relaxation pathways that broaden linewidths and reduce signal-to-noise (Figure 2.5). Atom-specific labeling also removes scalar couplings and therefore leads to more efficient coherence transfers (i.e., increased signal-to-noise) (Figure 2.5). The removal of these couplings therefore directly assists NMR structure measurements. Their removal also benefits NMR dynamics measurements, which will be detailed in Chapter 3.

Figure 2.5. Schematic of scalar and dipolar couplings in RNA.
Example of atom-specifically [6-13C, 5-2H]-Uri and uniformly [13C/15N]-labeled-Uri with all dipolar- (left) and scalar-coupled38 (right) partners to the H6-C6 spin pair of interest (shaded green circle) shown in blue or orange arrows, respectively. Removal of excess dipolar coupling will lead to sharper Uri-H6-C6 linewidths. Removal of excess magnetization transfer between Uri-H6-C6 will improve signal-to-noise of Uri-H6-C6 signals.

3 NMR probes of accurate RNA dynamics
*This chapter is adapted from the following3,198.

3.1 Introduction
In Section 1.1, we argued that the knowledge of RNA structure is a prerequisite for a robust mechanistic understanding of RNA function. While this remains true, RNA structure is only part of the story. In reality, RNAs dynamically interconvert between conformational states to carry out their biological functions16,17. A robust understanding of RNA function therefore requires both high-resolution structure and dynamics information. Originating more than 45 years ago, early investigations of RNA dynamics were limited to the study of bacterial tRNAs using 1D NMR methods199. More than a decade later, development of 1D and 2D heteronuclear polarization transfer schemes to measure heteronuclear relaxation rates200-202 uniquely positioned solution NMR spectroscopy to probe protein203-206 and RNA207-211 dynamics. With multidimensional NMR, we can measure the dynamics of ribose, nucleobase, and phosphorus nuclei distributed along the entire RNA structure17,212-215. In particular, we can characterize motions that range from picoseconds to seconds and visualize conformers that are transient and sparsely populated (Figure 3.1). For these low populated states, we can extract chemical shifts (structure), rates (kinetics), and populations (thermodynamics) under various physiological conditions of temperature, salt, pH, and cellular environment.
Finally, we can examine how the cellular milieu modulates the structure, dynamics, and interactions of RNA in real time. This chapter will explore the theory and application of fast (i.e., ps-ns) (Section 3.2) and slow (i.e., µs-ms) (Section 3.3) motions, and how RNA labeling benefits such experiments (Sections 3.2.1 and 3.3.1).

Figure 3.1. RNA dynamics by solution NMR spectroscopy3. Dynamic processes in RNA and corresponding NMR methods and RNA nuclei that can be utilized and monitored to characterize such motions. The highlighted 13C and 15N sites have been used extensively in NMR spin relaxation and relaxation dispersion experiments143, whereas 31P213,214 and 2H212 sites are probed less frequently. Alternative time charts can be found elsewhere17,143,216,217.

3.2 Probing fast motions
On the picosecond-to-nanosecond (ps-ns) time scale, spin relaxation provides information about the amplitude of motions powered by the bond vectors (e.g., 1H-13C, 1H-1H, 13C-19F, and 1H-15N) reorienting relative to the external applied magnetic field218-220. Longitudinal relaxation describes the return to the equilibrium distribution of spins along the z-axis, with a characteristic exponential time constant T1 (or rate constant R1 = 1/T1). Transverse relaxation, on the other hand, describes the decay of magnetization in the transverse xy-plane, with a characteristic decay time constant T2 (or rate constant R2 = 1/T2). The heteronuclear NOE (hNOE) measures the rate at which proton magnetization is transferred to the heteroatom via their dipolar interaction. For an isolated pair of spin-1/2 nuclei S and I (here, S is 13C, 15N, 19F, 31P and I is 1H), R1, R2 and the hNOE of nucleus S are related to the rotational diffusion tensor of the molecule according to well-known relations221,222

$$R_1 = 3\,(d^2 + c^2)\,J(\omega_S) + d^2\,[J(\omega_I - \omega_S) + 6J(\omega_I + \omega_S)] \qquad (3.1)$$

$$R_2 = \tfrac{1}{2}(d^2 + c^2)\,[4J(0) + 3J(\omega_S)] + \tfrac{d^2}{2}\,[J(\omega_I - \omega_S) + 6J(\omega_I + \omega_S) + 6J(\omega_I)] + R_{ex} \qquad (3.2)$$

$$\mathrm{hNOE} = 1 + \left(\frac{\gamma_I}{\gamma_S}\right)\frac{d^2}{R_1}\,[6J(\omega_I + \omega_S) - J(\omega_I - \omega_S)] \qquad (3.3)$$

$$d = \frac{\mu_0 h \gamma_I \gamma_S}{16\pi^2 r_{IS}^3} \quad \text{and} \quad c = \frac{\omega_S \Delta\sigma_S}{3} \qquad (3.4,\ 3.5)$$

where d and c are the well-known dipolar and chemical shielding anisotropy (CSA) constants, $\Delta\sigma_S = \sqrt{\sigma_x^2 + \sigma_y^2 - \sigma_x\sigma_y}$, $\sigma_x = \sigma_{33} - \sigma_{11}$, $\sigma_y = \sigma_{33} - \sigma_{22}$, $\sigma_{11}$, $\sigma_{22}$, and $\sigma_{33}$ are the principal components of the CSA tensor223,224, $J(\omega)$ is a spectral density function which is assumed to be a Lorentzian (e.g., simplest form is $J(\omega) = \tfrac{2}{5}\,\tau_c/(1 + \omega^2\tau_c^2)$), [...] 40 ns, especially at higher magnetic fields. Additional details can be found below.

In u-[13C/15N]-ATP labeled RNAs, Ade-C2 is dipolar coupled to the attached H2, the adjacent N1 and N3, and the long-range (>2 Å) C4, C5, and C6 atoms (Figure 3.4A). Moreover, in fully protonated RNA, Ade-C2 also experiences long-range dipolar contributions from protons within the same nucleotide, those 3′ and on the same strand, and those 3′ and on the cross-strand38. If these long-range dipolar couplings contribute significantly to RNA relaxation, theoretical simulations should reveal discrepancies in adenosine Ade-R1,C2 and -R2,C2 rates and Ade-hNOEC2 values in u-[13C/15N]-ATP (i.e., Ade-R1,C2(uniform), Ade-R2,C2(uniform), and Ade-hNOE,C2(uniform)) and [2-13C]-ATP (i.e., Ade-R1,C2(selective), Ade-R2,C2(selective), and Ade-hNOE,C2(selective)) labeled RNAs. To test this hypothesis, we used previously reported CSA values derived from solution NMR223 and Equations 3.1-3.5221,222 to simulate Ade-R1,C2 and -R2,C2 rates and Ade-hNOEC2 values in u-[13C/15N]-ATP and [2-13C]-ATP labeled RNAs. We do not observe differences in the simulated Ade-R2,C2(uniform) and -R2,C2(selective) rates or Ade-hNOE,C2(uniform) and -hNOE,C2(selective) values (Figure A.27), in agreement with previous studies227. However, our simulations do predict discrepancies in Ade-R1,C2(uniform) and -R1,C2(selective) (Figure 3.4A), an observation similar to that recently reported for purine-C8 sites227.
Specifically, dipolar interactions result in overestimated R1,C2(uniform) rates that increase with higher magnetic fields and molecular weights (Figures 3.4A and B), as predicted by Equation 3.1 (Figure 3.3).

Figure 3.4. Simulated Ade-C2 R1 rates in uniformly/selectively labeled RNA198. (A) Simulated Ade-R1,C2(uniform) and -R1,C2(selective) rates with a scheme of adenine (numbered by atom and with interatomic distances (Å) to C2) shown as an inset. (B) Simulated Ade-R1,C2 difference (as defined above). (C) Simulated Ade-R1,C2 difference for each dipolar coupling partner (i.e., Ade-C4, -C5, -C6, -N1, and -N3). Simulations in B were carried out with increasing magnetic field strengths and τc whereas those in A and C are at 800 MHz only. Our simulations suggest that dipolar interactions (primarily 13C-13C) result in overestimated Ade-R1,C2(uniform) rates that increase with higher magnetic fields and molecular weights. Additional details can be found below.

Moreover, the Ade-R1,C2 difference (as defined above) is predicted to be as large as 80% at 1.2 GHz and a τc of 100 ns (Figure 3.4B). While RNAs of this size are rarely probed by NMR, the simulated discrepancies are still significant for smaller RNAs. As highlighted by our simulations, 13C-13C dipolar interactions dominate the discrepancy whereas Ade-N1 and -N3 have almost no effect. Moreover, the 13C-13C contributions decrease with increasing distance from Ade-C2, with Ade-C4 (2.2 Å) having the greatest effect followed by Ade-C6 (2.3 Å) and then Ade-C5 (2.7 Å) (Figure 3.4A and C).

3.2.1.2 Measurements in uniformly and selectively labeled RNA

Our newly synthesized ATP 162 removes unwanted dipolar interactions and was therefore used along with u-[13C/15N]-ATP to empirically test our simulations.
To this end, we used pulse schemes (based on the isolated 1H-15N backbone amide spin pair in proteins195) to leverage the isolated 1H-13C spin pair afforded by our atom-specific labels (e.g., 162) (Figure 3.5) and measure Ade-R1,C2(uniform) and Ade-R1,C2(selective) rates in a u-[13C/15N]-ATP or 162 labeled 61 nt RNA (Figure 3.6) at 800 MHz and 25 °C.

Figure 3.5. Pulse scheme for 13C R1 in selectively labeled RNA. Pulse scheme is adapted from previous reports195. The open half-ellipse represents a selective 180° REBURP234 pulse. Quadrature detection and sensitivity-enhanced/gradient-selection is implemented using the Rance-Kay235,236 echo/anti-echo scheme with the polarity of G1 inverted and φ4 and φ5 incremented by 180° for each second FID of the quadrature pair. Thin and thick bars represent 90° and 180° pulses, respectively. All pulses are applied along the x-axis unless noted otherwise. Pulsed field gradients (PFG) are shown as half-ellipses, and the remaining labeled intervals (including t) represent time delays. This pulse program is titled 13C_TD_T1_ALH10ms_3D_HN on the UMD spectrometers.

Figure 3.6. RNA constructs for dynamics measurements198. Secondary structure of a u-[13C/15N]-ATP (top) or 162 (bottom) labeled 61 nt RNA made from IVT. Nucleotides harboring isotope labels are colored orange and numbered. Additional details can be found below.

In agreement with our simulations (Figures 3.3 and 3.4), measured Ade-R1,C2(uniform) rates were significantly higher than Ade-R1,C2(selective) for six of the eight Ade-C2 nuclei (Figure 3.7A). Moreover, the average Ade-R1,C2 difference for measured Ade-R1,C2 rates was 4.7% (Figure 3.7B) compared to the simulated 5.4% (Figure 3.7B) for an RNA with a τc of 11.2 ± 1.1 ns at 800 MHz (measured from R2/R1237). While this discrepancy is small and can likely be ignored, our simulations (Figure 3.7B) suggest that this may no longer be true for larger RNAs (e.g., τc > 20 ns).
To experimentally verify that the discrepancy in Ade-R1,C2 increases at higher molecular weights, we repeated our Ade-R1,C2(uniform) and -R1,C2(selective) rate measurements at 5 °C to mimic an RNA with a higher molecular weight. To maximize signal-to-noise and minimize experiment time, we reduced the sweep-width and time-domain points while increasing the number of scans. As a result, only four of eight Ade-H2-C2 resonances were well-resolved (Figure A.28). Nevertheless, measured Ade-R1,C2(uniform) rates were again observed to be significantly higher than Ade-R1,C2(selective) for all four well-resolved Ade-C2 nuclei (Figure 3.7A). Moreover, the average measured Ade-R1,C2 difference at 5 °C was significantly higher than that measured at 25 °C (Figure 3.7B), in agreement with our simulations (Figures 3.3 and 3.4). Specifically, the average Ade-R1,C2 difference in measured Ade-R1,C2 rates was 25.5% (Figure 3.7B), compared to the simulated 15.6% (Figure 3.7B) for an RNA with a τc of 21.3 ± 1.1 ns at 800 MHz (measured from R2/R1237).

Figure 3.7. Measured Ade-C2 R1 rates in uniformly/selectively labeled RNA198. (A) Ade-R1,C2(uniform) and -R1,C2(selective) rate measurements at 800 MHz and 5 (right) or 25 (left) °C. Mean rates are shown with dashed lines and error bars represent ± standard deviation (SD). Experimental R1,C2(uniform) rates at 25 °C were larger (outside experimental error) than Ade-R1,C2(selective) for all Ade-C2 nuclei except A29 and A55 (designated no significance, NS). Experimental R1,C2(uniform) rates at 5 °C were larger than Ade-R1,C2(selective) for all four well-resolved Ade-C2 nuclei. (B) Average Ade-R1,C2 difference (as defined above) taken from the data in A. The average Ade-R1,C2 difference in measured Ade-R1,C2 rates was 4.7 and 25.5% at 25 and 5 °C, respectively. Taken together, our simulations and experimental measurements suggest that the discrepancy between Ade-R1,C2(uniform) and -R1,C2(selective) increases with higher molecular weights.
Additional details can be found below.

For completeness, we wanted to compare experimental Ade-R2,C2 rates and Ade-hNOEC2 values in u-[13C/15N]-ATP or 162 labeled 61 nt RNA (Figure 3.6) to our simulations shown in Figure A.13. We did not observe differences in measured Ade-R2,C2(uniform) and -R2,C2(selective) rates or Ade-hNOEC2(uniform) and -hNOEC2(selective) values (Figure A.29), in agreement with our simulations (Figure A.27). Taken together, Ade-C2 experiences long-range 13C-13C dipolar couplings from Ade-C4, -C5, and -C6 that cannot be ignored or circumvented with selective pulses in u-[13C/15N]-ATP labeled RNA. That is, these dipolar contributions must be explicitly taken into account191 when interpreting Ade-R1,C2 rates in terms of motional models for large RNAs. We will showcase an example of robust 13C spin relaxation measurements and analysis in Chapter 4.

3.2.1.3 Theoretical simulation details

13C Ade-R1,C2 and -R2,C2 relaxation rates and steady-state 13C{1H} Ade-hNOEC2 values were simulated using Equations 3.1-3.5221,222, assuming isotropic tumbling. These relaxation parameters were simulated for Ade-C2 in u-[13C/15N]-ATP (i.e., Ade-R1,C2(uniform), -R2,C2(uniform), and -hNOEC2(uniform)) and [2-13C]-ATP (i.e., Ade-R1,C2(selective), -R2,C2(selective), and -hNOEC2(selective)) labeled RNAs and included dipolar contributions from Ade-C4, -C5, -C6, -N1, and -N3 at average distances of 2.20, 2.70, 2.30, 1.40, and 1.30 Å, respectively. In addition, Ade-C2 experiences dipolar contributions with the following protons: (1) Ade-H2, (2) those within the same nucleotide, (3) those 3′ on the same strand, and (4) those 3′ on the cross-strand38. Protons in (2) are H1′ and H2′ at average distances of 4.35 and 4.20 Å, respectively. Protons in (3) are H1, H2, amino (N)H2, and H1′ at average distances of 4.00, 4.40, 4.33, and 4.45 Å, respectively. Protons in (4) are H1, H3, amino (N)H2, and H1′ at average distances of 4.10, 4.50, 4.40, and 4.95 Å, respectively.
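As a rough illustration of the simulation described in this section, the following Python sketch evaluates Equation 3.1 for Ade-C2 under isotropic tumbling. It is not the code used to generate the figures. The function names, the physical-constant values, the spectral density J(ω) = (2/5)τc/(1 + ω²τc²), and the definition of the percent difference (extra 13C-13C contribution relative to the selective rate) are all assumptions made for the example. The 13C-13C terms are obtained by applying the dipolar part of Equation 3.1 to each carbon pair with both spins at the 13C Larmor frequency, and the 13C-15N and remote-proton terms are omitted for brevity. Distances, CSA components, and the C-H bond length are the values quoted in this section.

```python
import math

# Physical constants (SI units); values are standard but entered by hand here.
MU0_OVER_4PI = 1.0e-7        # T m / A
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1 (1H)
GAMMA_C = 6.728284e7         # rad s^-1 T^-1 (13C)

# Values quoted in the text for Ade-C2.
CARBON_PARTNERS = {"C4": 2.20, "C5": 2.70, "C6": 2.30}  # distances in angstrom
CSA_PPM = (89.0, 15.0, -104.0)                          # sigma11, sigma22, sigma33
R_CH = 1.104                                            # aromatic C-H bond, angstrom


def spectral_density(omega, tau_c):
    """Assumed Lorentzian: J(w) = (2/5) tau_c / (1 + (w tau_c)^2)."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)


def dipolar_constant(gamma_i, gamma_s, r_m):
    """d = mu0 h gI gS / (16 pi^2 r^3), rewritten with hbar; r in metres."""
    return MU0_OVER_4PI * HBAR * gamma_i * gamma_s / (2.0 * r_m ** 3)


def r1_dipolar(gamma_i, gamma_s, r_angstrom, b0, tau_c):
    """Dipolar part of Eq. 3.1 for observed spin S coupled to spin I."""
    d2 = dipolar_constant(gamma_i, gamma_s, r_angstrom * 1e-10) ** 2
    w_i, w_s = gamma_i * b0, gamma_s * b0
    return d2 * (spectral_density(w_i - w_s, tau_c)
                 + 3.0 * spectral_density(w_s, tau_c)
                 + 6.0 * spectral_density(w_i + w_s, tau_c))


def r1_csa(gamma_s, sigma_ppm, b0, tau_c):
    """CSA part of Eq. 3.1: 3 c^2 J(wS), with c = wS * delta_sigma / 3."""
    s11, s22, s33 = sigma_ppm
    sx, sy = s33 - s11, s33 - s22
    delta_sigma = math.sqrt(sx ** 2 + sy ** 2 - sx * sy) * 1e-6
    c = gamma_s * b0 * delta_sigma / 3.0
    return 3.0 * c ** 2 * spectral_density(gamma_s * b0, tau_c)


def ade_c2_r1(b0, tau_c):
    """Return (selective R1, dict of extra 13C-13C contributions) in s^-1."""
    selective = (r1_dipolar(GAMMA_H, GAMMA_C, R_CH, b0, tau_c)
                 + r1_csa(GAMMA_C, CSA_PPM, b0, tau_c))
    extra = {name: r1_dipolar(GAMMA_C, GAMMA_C, r, b0, tau_c)
             for name, r in CARBON_PARTNERS.items()}
    return selective, extra


def percent_difference(b0, tau_c):
    """Assumed definition: 100 * (uniform - selective) / selective."""
    selective, extra = ade_c2_r1(b0, tau_c)
    return 100.0 * sum(extra.values()) / selective


# Static field corresponding to an 800 MHz 1H Larmor frequency.
B0_800 = 2.0 * math.pi * 800.0e6 / GAMMA_H
```

Calling `percent_difference(B0_800, 11.2e-9)` and `percent_difference(B0_800, 21.3e-9)` gives the two correlation times discussed in Section 3.2.1.2. In this simplified model the 13C-13C term grows with τc because it contains a J(0)-like contribution, whereas the directly bonded 1H-13C and CSA terms shrink.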
All proton distances above were calculated from PDB 2ixy238 as a representative A-helical RNA. Solution NMR derived CSA values (σ11 = 89, σ22 = 15, σ33 = −104)223 and an aromatic CH bond length of 1.104 Å239 were used in these simulations.

3.2.1.4 RNA transcription

RNA was prepared as previously described30 and as outlined in Section 2.2.4.1, with minor modifications. Specifically, IVT was carried out with u-[13C/15N]-ATP (CIL) and 162. Both u-[13C/15N]-ATP and 162-labeled RNA NMR samples were 0.3 mM in 300 μL of NMR buffer A (calculated using an extinction coefficient of 768.3 mM⁻¹ cm⁻¹).

3.2.1.5 NMR spectroscopy details

All NMR experiments were performed on an 800 MHz Avance III Bruker spectrometer equipped with a triple resonance cryogenic probe. NMR relaxation data were collected at either 5 or 25 °C as specified in Section 3.2.1.2 and the legends of Figures 3.6 and A.15. TROSY-detected measurements of 13C R1 (Figure 3.5) and R1ρ relaxation rates and steady-state 13C{1H} hNOE values were adapted from previous pulse sequences190,191,195. For experiments at 25 °C, the sweep widths were set to 2.6 (13C) and 8.0 (1H) ppm with 32 scans and 100 increments. For experiments at 5 °C, the sweep widths were set to 2.0 (13C) and 10.0 (1H) ppm with 512 scans and 16 increments. In all experiments, carriers were centered at 150.9 (13C) and 4.7 (1H) ppm and a 203 Hz 1JH2C2 coupling38 was used for coherence transfer. For R1 experiments at 25 °C, relaxation delays of 0.10, 0.20 (x2), 0.36, 0.50, 0.90, and 1.20 s were used for both u-[13C/15N]-ATP and 162-labeled RNA. For R1 experiments at 5 °C, relaxation delays of 0.10, 0.20 (x2), 0.80, 1.00, 1.20 s or 0.10, 0.20 (x2), 0.90, 1.10, 1.30 s were used for u-[13C/15N]-ATP and 162-labeled RNA, respectively. For R1ρ experiments, relaxation delays of 1.5, 2.4, 3.4, 4.6, 6.1, 8.0, and 11.0 ms or 1.0, 2.0, 4.0, 5.0, and 6.0 ms were used at 25 and 5 °C, respectively, and the spin-lock field strength (ω1) and offset (Ω) were 1.882 kHz and 6 Hz, respectively. R1 and R1ρ
experiments were acquired in an interleaved manner as a pseudo-3D experiment and using a recycle delay of 2.5 s. For hNOE saturation experiments, recycle and saturation delays of 1.5 and 7.0 s were used, and proton saturation was achieved using a hard 180° pulse. In the hNOE no-saturation experiments, a delay of 8.5 s was used in order to match the combined recycle and saturation delays of the saturation experiment. For experiments on u-[13C/15N]-ATP labeled RNA, 15N was decoupled and selective pulses were applied as previously described191. Shaped pulses used for on-resonance 13C inversion, on-resonance 13C refocusing, and off-resonance 13C inversion were Q3240, RSNOB241, and IBURP2234, respectively. The Q3 pulse selectively inverts the 13C magnetization of interest, whereas RSNOB and IBURP2 selectively refocus (i.e., invert) 13C magnetization to eliminate 13C-13C scalar coupling evolution. Pulse lengths for Q3, RSNOB, and IBURP2 were 937.5, 1,000, and 450 μs, respectively. The offset and bandwidth for IBURP2 were -40 and 50 ppm, respectively.

NMR spectra were processed and analyzed using TopSpin 4.0, NMRFx Processor, and NMRViewJ196. R1 and R1ρ relaxation rates were determined by fitting peak intensities to a monoexponential decay. Uncertainties in R1 rates were estimated by propagating the error in peak intensities from duplicated delay points (indicated by "x2"). Uncertainties in R1ρ rates were estimated from spectral signal-to-noise with RELAXFIT197. R2 rates were corrected for the off-resonance spin-lock field (ω1) using Equations 2.1 and 2.2190,191. The hNOE values were obtained using Equation 3.3221,222 and their uncertainties were estimated by propagating the error in peak intensities from duplicated experiments.

3.3 Probing slow motions

Spin-½ nuclei with a positive gyromagnetic ratio (γ) either align parallel (α, high-populated, favorable energetic state) to the static NMR magnetic field (B0) or antiparallel (β, low-populated, unfavorable state).
The net bulk magnetization, oriented parallel to B0, can be realigned with radiofrequency (RF) pulses along a direction perpendicular to B0. The magnetization then precesses about B0 at a resonant Larmor frequency (ω) characteristic of the nucleus. When Fourier transformed, this detectable oscillating time-domain signal yields a frequency-domain NMR spectrum with signals at characteristic frequencies for each nucleus. When referenced against a standard frequency (e.g., DSS for 1H chemical shifts), we obtain a field-independent chemical shift that is directly proportional to the energy difference between the α and β states. For RNA exchanging between two states A and B, the chemical shift difference (Δω) between the two states and the exchange rate constant (kex, sum of the forward (kAB) and reverse (kBA) rate constants) or the exchange lifetime (τex = 1/kex) determine if two distinct NMR peaks are observed, and what signal intensity and linewidth are obtained for a given nucleus242,243. In the slow exchange regime, two distinct peaks are detected at the chemical shifts of the individual states, and the peak intensities are proportional to the populations of each state. In the fast exchange regime, kex is much larger than Δω, and therefore a single peak is observed at the population-weighted average chemical shift. In the intermediate exchange regime, which, as its name implies, lies between the fast and slow time scales, kex ≈ Δω. Regardless of the exchange regime, if chemical exchange is present, R2 increases by Rex, which depends on kex and Δω and can therefore be modulated by magnetic field strength242–246. Dynamics on the intermediate and slow time scales (i.e., μs-ms) can be characterized with relaxation dispersion (RD) using R1ρ244,247, Carr-Purcell-Meiboom-Gill (CPMG)248–250, or Chemical Exchange Saturation Transfer (CEST)251 experiments, and even processes slower than seconds can be studied with real-time NMR252 (Figure 3.1).
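The dependence of the exchange regime and of Rex on the ratio between the exchange rate and the chemical shift difference can be made concrete with a small Python sketch. Everything here is illustrative: the function names are hypothetical, the factor-of-ten boundary between regimes is an arbitrary assumption (the regimes blend continuously), and the Rex expression is the standard fast-exchange limit for two sites, Rex = pA·pB·Δω²/kex, rather than a general result.

```python
import math


def delta_omega_rad(delta_ppm, larmor_mhz):
    """Convert a shift difference in ppm to rad/s (ppm x MHz = Hz)."""
    return 2.0 * math.pi * delta_ppm * larmor_mhz


def exchange_regime(kex, delta_omega, factor=10.0):
    """Classify two-site exchange; the 'factor' threshold is an assumption."""
    ratio = kex / abs(delta_omega)
    if ratio > factor:
        return "fast"
    if ratio < 1.0 / factor:
        return "slow"
    return "intermediate"


def fast_exchange_shift(p_b, shift_a_ppm, shift_b_ppm):
    """Population-weighted average shift observed in fast exchange."""
    return (1.0 - p_b) * shift_a_ppm + p_b * shift_b_ppm


def fast_exchange_rex(p_b, delta_omega, kex):
    """Fast-exchange limit: Rex = pA * pB * delta_omega^2 / kex (s^-1)."""
    return (1.0 - p_b) * p_b * delta_omega ** 2 / kex


# Hypothetical example: a 1 ppm 13C shift difference at two field strengths.
dw_600 = delta_omega_rad(1.0, 150.9)   # 13C Larmor frequency at 600 MHz (1H)
dw_800 = delta_omega_rad(1.0, 201.2)   # 13C Larmor frequency at 800 MHz (1H)
rex_600 = fast_exchange_rex(0.10, dw_600, 20000.0)
rex_800 = fast_exchange_rex(0.10, dw_800, 20000.0)
```

Because Δω in rad/s scales linearly with B0, the fast-exchange Rex in this sketch scales with the square of the field, which is one way the field dependence mentioned above can be used to detect exchange.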
For two-site exchange, a general expression for the R2 rate constant (RCPMG(τcp)) for state A (where pA > pB), that encompasses all conformational exchange time scales, is given by the Carver-Richards equation243,253

$$R_{CPMG}(\tau_{cp}) = \frac{1}{2}\left( R_2^A + R_2^B + k_{ex} - \frac{1}{2\tau_{cp}}\cosh^{-1}\left[ D_+\cosh(\eta_+) - D_-\cos(\eta_-) \right] \right) \tag{3.6}$$

$$\eta_\pm = \sqrt{2}\,\tau_{cp}\left[ \pm\psi + (\psi^2 + \zeta^2)^{1/2} \right]^{1/2} \tag{3.7}$$

$$D_\pm = \frac{1}{2}\left[ \pm 1 + \frac{\psi + 2\Delta\omega^2}{(\psi^2 + \zeta^2)^{1/2}} \right] \tag{3.8}$$

$$\psi = (R_2^A - R_2^B - p_A k_{ex} + p_B k_{ex})^2 - \Delta\omega^2 + 4p_A p_B k_{ex}^2 \tag{3.9}$$

$$\zeta = 2\Delta\omega\,(R_2^A - R_2^B - p_A k_{ex} + p_B k_{ex}) \tag{3.10}$$

where $R_2^{A/B}$ and $p_{A/B}$ are the R2 rate and relative populations of the A/B state, respectively. A main disadvantage of the CPMG experiment is that only the magnitude (and not the sign) of Δω is obtained. Still, this disadvantage of the CPMG experiment is offset by the relative ease of CPMG experimental implementation and data analysis. That is, conformational exchange is easily detected by a non-flat CPMG curve when plotting the effective R2 rate (R2,eff) as a function of the frequency at which 180° pulses are applied (νCPMG) (Figure 3.8A). Non-exchanging nuclei, on the other hand, have no dependence of R2,eff on νCPMG and therefore appear as flat curves (Figure 3.8A).

Figure 3.8. Simulated NMR relaxation dispersion and CEST experiments3. (A) CPMG curves for two nuclei: one in exchange (in red, Rex > 0) using the parameters kex = 794 s⁻¹, pB = 8.7%, and Δω = 228 Hz (150 MHz 13C-Larmor frequency) and one without (in black, Rex = 0, or Δω = 0, or both), based on published data254. (B) CEST profile for a given nucleus showing evidence of two states A and B. Calculations assumed kex = 121 s⁻¹, pB = 10.8%, γ(1H)B0/2π = 600 MHz, Δω = -4 ppm, R1A = R1B, T = 0.3 s, and the B1 fields specified on the figure. (C) R1ρ profile for a given nucleus showing evidence of two states A and B. Calculations used the same parameters as in (B) but with different B1 fields, which are again specified on the figure. As seen in the CEST and R1ρ
profiles, at higher B1 fields, linewidths broaden to the point where state B becomes increasingly difficult to detect. CEST and R1ρ profiles are based on published data255.

R1ρ and CEST experiments provide more robust information regarding the chemical shifts of state B. For a two-site model, Δω, kex, and pB can be extracted from CEST profiles using the Bloch-McConnell 7x7 matrix (including the equilibrium magnetization terms)256–258. By combining all data sets, global kex and pB values can be fit numerically for all the CEST profiles, plotted as I/I0 versus spin-lock offset (in Hz) (Figure 3.8B). The 7x7 two-site Bloch-McConnell equation is derived from the relaxation matrix and the kinetic rate matrix for an exchanging two-site system251,255–257

$$
\frac{d}{dt}
\begin{pmatrix} E/2 \\ I_x^A \\ I_y^A \\ I_z^A \\ I_x^B \\ I_y^B \\ I_z^B \end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -R_2^A - k_{AB} & -\omega_A & \omega_1 & k_{BA} & 0 & 0 \\
0 & \omega_A & -R_2^A - k_{AB} & 0 & 0 & k_{BA} & 0 \\
2R_1^A p_A & -\omega_1 & 0 & -R_1^A - k_{AB} & 0 & 0 & k_{BA} \\
0 & k_{AB} & 0 & 0 & -R_2^B - k_{BA} & -\omega_B & \omega_1 \\
0 & 0 & k_{AB} & 0 & \omega_B & -R_2^B - k_{BA} & 0 \\
2R_1^B p_B & 0 & 0 & k_{AB} & -\omega_1 & 0 & -R_1^B - k_{BA}
\end{pmatrix}
\begin{pmatrix} E/2 \\ I_x^A \\ I_y^A \\ I_z^A \\ I_x^B \\ I_y^B \\ I_z^B \end{pmatrix}
\tag{3.11}
$$

where $R_1^{A/B}$, $\omega_{A/B}$, and $\omega_1$ are the R1 rate of the A/B state, the offset of the B1 spin-lock field from the peaks in the A/B state (in rad s⁻¹), and the B1 field strength (in rad s⁻¹), respectively. The evolution of magnetization for the peak in state A during the CEST spinlock period is given by

$$\mathbf{I}(t) = e^{\mathbf{L}t}\,\mathbf{I}(0) \tag{3.12}$$

with $\mathbf{L}$ the 7x7 matrix of Equation 3.11. Similarly, under the R1ρ model for two-site exchange, the R1ρ value for state A magnetization is given by259

$$R_{1\rho} = -\frac{1}{T_{relax}} \ln\left( \hat{M}^{T} \cdot e^{\hat{R}\,T_{relax}} \cdot \hat{M} \right) \tag{3.13}$$

$$
\hat{R} =
\begin{pmatrix}
-R_2^A - k_{AB} & -\omega_A & 0 & k_{BA} & 0 & 0 \\
\omega_A & -R_2^A - k_{AB} & -\omega_1 & 0 & k_{BA} & 0 \\
0 & \omega_1 & -R_1^A - k_{AB} & 0 & 0 & k_{BA} \\
k_{AB} & 0 & 0 & -R_2^B - k_{BA} & -\omega_B & 0 \\
0 & k_{AB} & 0 & \omega_B & -R_2^B - k_{BA} & -\omega_1 \\
0 & 0 & k_{AB} & 0 & \omega_1 & -R_1^B - k_{BA}
\end{pmatrix}
\tag{3.14}
$$

$$\hat{M} = \begin{pmatrix} \sin\theta & 0 & \cos\theta & 0 & 0 & 0 \end{pmatrix}^{T} \quad \text{and} \quad \theta = \tan^{-1}\left(\frac{\omega_1}{\Omega}\right) \tag{3.15, 3.16}$$

where Ω = ωrf −
ωobs is the difference between the resonance frequency of the observed nucleus (ωobs) and the spinlock transmitter frequency (ωrf). For R1ρ experiments, Rex can be detected by plotting R2,eff versus Ω/2π (Figure 3.8C). The expressions for CEST and R1ρ (Equations 3.11-3.16) provide insight into the parameters that are important for acquiring useful data. For example, higher B1 fields decrease chemical shift resolution between states and also broaden linewidths (Figures 3.8B and C).

While almost all RD studies involve two-site systems, expressions for CPMG, R1ρ, and CEST models for characterizing N-site exchange have been described by Arthur Palmer III and co-workers243. Indeed, work from Al-Hashimi and co-workers on Watson-Crick mismatches and base pair reshuffling in RNA features R1ρ and CEST data that describe three-site exchange260.

3.3.1 Scalar coupling, relaxation dispersion, and saturation transfer

As with spin relaxation, the scalar and dipolar couplings present in u-[13C/15N]-labeled RNA can lead to complications in RD and CEST experiments. As we have discussed elsewhere99, numerous spectroscopic solutions have been proposed to circumvent the problems that arise from 13C-13C couplings that exist in uniformly labeled RNA. These advances include constant time evolution261–264, adiabatic band selective decoupling265–267, and selective cross-polarization with weak RF fields268–270. These solutions have benefited RD and CEST experiments to varying degrees in RNA. Specifically, 13C-13C scalar couplings (e.g., C1′-C2′ or C5-C6) complicate CPMG experiments271,272 to a much larger degree than both CEST and R1ρ. However, these couplings still pose a problem to CEST273,274 and R1ρ255, and oscillations are sometimes observed in the decay profiles of ribose C1′ and Pyr-C6 nuclei. Moreover, as with spin relaxation, these couplings must be explicitly taken into consideration in data analysis.
The number of coupled homogeneous differential equations (n) is equal to (2 × 4^m) − 1, where m is the number of weakly coupled nuclear spins in an m-spin system. Therefore, for one-, two-, and three-spin systems, n = 7, 31, and 127, respectively256,257,273. This transforms the CEST matrix (Equation 3.11) from 7x7 to 31x31 for 13C-13C scalar coupled spin pairs found in the nucleobase and ribose moieties. Atom-specific labeling (Sections 1.3.4 and 1.4.2), on the other hand, circumvents this problem entirely, and dramatically simplifies NMR spectra, especially when incorporated position-specifically via SPS (Sections 1.3.5 and 1.4.1). Nevertheless, using both selective and uniformly labeled RNA, CEST and R1ρ experiments have now been applied to the protonated nucleobase (Pyr-C5 and -C6, Pur-C8, and Ade-C2) and ribose (C1′-C5′) carbons, the nucleobase imino (Gua-N1 and Thy/Uri-N3) and amino (Gua-N2) nitrogens, nucleobase (Uri-H3, Gua-H1, Ade-H2, Pur-H8, and Pyr-H5 and -H6) and ribose H1′ protons, as well as non-protonated (Ade-N1 and Pur-N7) and amino nitrogen (Cyt-N4) sites (Figure 3.1)143.

3.3.1.1 CPMG in selectively labeled RNA

Given the considerations described above, CPMG experiments are solely implemented on atom-specifically labeled RNA, and mainly from our research group99,148,272 and the Kreutz research group104,107,254,275, though not exclusively232. To emphasize this point, we used pulse schemes (based on methyl groups in protein side chains276,277) to leverage the isolated 1H-13C spin pair afforded by our labels (Figure 3.9) and measure 1H CPMG in an atom-specifically labeled (i.e., [8-13C]-Ade/Gua and [6-13C, 5-2H]-Cyt/Uri) 40 nt RNA (Figure 3.10). These experiments probe μs-ms motions within Pur-H8 and Pyr-H6 sites. As such, conformational exchange processes can be identified by non-flat CPMG curves (Figure 3.8A), from which kex can be obtained. If the exchange processes are slow on the NMR time scale, then Δω and pB can also be determined250,253,278.
Figure 3.9. Pulse scheme for 1H CPMG in selectively labeled RNA. Pulse scheme is adapted from previous reports272,276,277. The CPMG element is shown as a constant-time period with a variable number N of equally spaced 180° pulses. The open half-ellipse represents a selective 180° REBURP234 pulse. Quadrature detection and sensitivity-enhanced/gradient-selection is implemented using the Rance-Kay235,236 echo/anti-echo scheme with the polarity of G1 inverted and φ4 and φ5 incremented by 180° for each second FID of the quadrature pair. Thin and thick bars represent 90° and 180° pulses, respectively. All pulses are applied along the x-axis unless noted otherwise. PFG are shown as half-ellipses, and the remaining labeled intervals (including t) represent time delays. This pulse program is similar to that titled tkd_TROSY_HCP_1HCPMG.RML on the UMD spectrometers.

Figure 3.10. RNA construct for CPMG measurements. Secondary structure of an atom-specifically labeled (i.e., [8-13C]-Ade/Gua and [6-13C, 5-2H]-Cyt/Uri) 40 nt RNA made from IVT. Nucleotides harboring isotope labels are colored orange. Additional details can be found below.

All exchange events in our 40 nt RNA, however, are fast on the NMR time scale and therefore only kex (and τex) could be extracted250,253,278. As shown in Figure 3.11, conformational exchange processes occur in the upper helix with a wide range of exchange rates (i.e., ~3,300-20,000 s⁻¹) and lifetimes (i.e., ~50-490 μs). We will showcase an example of robust 1H CPMG measurements and analysis in Chapter 4.

Figure 3.11. CPMG measurements in selectively labeled RNA. (A) Representative example of a non-flat (top) (U30, Rex > 0) and flat (bottom) (G19, Rex = 0) CPMG curve. (B) Extracted kex (top) and τex (bottom) values from Luz-Meiboom (fast exchange model) fitting with RING NMR Dynamics278. Error bars represent ± SD. (C) Slow μs-ms motions mapped onto the RNA secondary structure, based on the CPMG data in B.
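To illustrate the difference between the flat and non-flat curves in Figure 3.11A, the sketch below evaluates the fast-exchange (Luz-Meiboom) expression, R2,eff = R2,0 + Rex[1 − (4νCPMG/kex)·tanh(kex/(4νCPMG))], over the νCPMG values used in this work. It is a minimal illustration only: the function names are hypothetical, the R2,0 and Rex values are invented for the example, and no fitting is performed (the fits reported here were carried out with RING NMR Dynamics).

```python
import math

# CPMG frequencies (Hz) as listed in the NMR spectroscopy details.
NU_CPMG_HZ = [50, 100, 200, 250, 300, 400, 500,
              1000, 1500, 2000, 2500, 3000, 3500]


def luz_meiboom_r2eff(nu_cpmg, r2_0, rex, kex):
    """Fast-exchange R2,eff (s^-1) at one CPMG frequency (Hz)."""
    if rex == 0.0 or kex == 0.0:
        return r2_0  # no exchange contribution: flat dispersion curve
    x = kex / (4.0 * nu_cpmg)
    return r2_0 + rex * (1.0 - math.tanh(x) / x)


def exchange_lifetime_us(kex):
    """Exchange lifetime tau_ex = 1/kex, returned in microseconds."""
    return 1.0e6 / kex


# Hypothetical parameters: R2,0 = 10 s^-1, Rex = 5 s^-1, kex = 3,300 s^-1.
exchanging = [luz_meiboom_r2eff(v, 10.0, 5.0, 3300.0) for v in NU_CPMG_HZ]
non_exchanging = [luz_meiboom_r2eff(v, 10.0, 0.0, 3300.0) for v in NU_CPMG_HZ]
```

In this model the exchanging curve falls from nearly R2,0 + Rex at low νCPMG toward R2,0 at high νCPMG, while the non-exchanging curve stays at R2,0 throughout.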
3.3.1.2 RNA transcription

RNA was prepared as previously described30 and as outlined in Section 2.2.4.1, with minor modifications. Specifically, a different DNA template (RNA) was used (5′-GGATGCCCCAAAGCCCGAAGGCTTGAACAGTAGGACATCCTATAGTGAGTCGTATTAG-3′). In addition, IVT was carried out with [8-13C]-ATP, [8-13C]-GTP, [6-13C, 5-2H]-CTP, and [6-13C, 5-2H]-UTP (prepared as described in Sections 1.3.4.2.2 and 2.3.3). The RNA NMR sample was 0.5 mM in 300 μL of NMR buffer A (calculated using an extinction coefficient of 418.8 mM⁻¹ cm⁻¹).

3.3.1.3 NMR spectroscopy details

TROSY-detected 1H CPMG experiments were adapted from previous pulse sequences (Figure 3.9)272,276,277 and collected at 25 °C on an 800 MHz Avance III Bruker spectrometer equipped with a triple resonance cryogenic probe. The sweep widths were set to 10.5 (13C) and 10.0 (1H) ppm with 32 scans and 100 increments. Carriers were centered at 138.0 (13C) and 4.7 (1H) ppm and a 200 Hz 1JHC coupling38 was used for coherence transfer. CPMG modules were used to obtain R2,eff as a function of νCPMG during a constant time relaxation period (Trelax)279. The CPMG module is a set of N repeated blocks of delay (τ)-180° pulse (π)-delay (τ) (τCPMG-π-τCPMG) such that 2 × N × τCPMG = Trelax. In the CPMG experiment, Trelax was set to 40 ms, recycle delays of 1.5 s were used, and the following νCPMG values were used: 50, 100, 200, 250, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, and 3,500 Hz.

NMR spectra were processed and analyzed with TopSpin 4.0, NMRFx Processor, and NMRViewJ196. The R2,eff rates were determined according to the following equation278

$$R_{2,eff} = -\frac{1}{T_{relax}} \ln\left( \frac{I(\nu_{CPMG})}{I_0} \right) \tag{3.19}$$

where I0 and I(νCPMG) are peak intensities measured without and with the constant-time relaxation period, respectively. Uncertainties in R2,eff rates were estimated from spectral signal-to-noise with RELAXFIT197.
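Equation 3.19 converts each pair of peak intensities into a rate, which can be written as a short Python sketch. The function names and the intensity values are hypothetical, and this is not the processing code used in this work. Only the 40 ms constant-time period is taken from the text.

```python
import math

T_RELAX_S = 0.040  # constant-time relaxation period (40 ms)


def r2_eff(intensity, reference_intensity, t_relax=T_RELAX_S):
    """Effective R2 (s^-1) from Eq. 3.19: -(1/T_relax) * ln(I / I0)."""
    if intensity <= 0.0 or reference_intensity <= 0.0:
        raise ValueError("peak intensities must be positive")
    return -math.log(intensity / reference_intensity) / t_relax


def r2_eff_curve(intensities, reference_intensity, t_relax=T_RELAX_S):
    """Apply Eq. 3.19 to a series of intensities, one per CPMG frequency."""
    return [r2_eff(i, reference_intensity, t_relax) for i in intensities]


# Hypothetical intensities (arbitrary units) for three CPMG frequencies.
example_curve = r2_eff_curve([520.0, 580.0, 640.0], 1000.0)
```

A larger intensity after the relaxation period corresponds to a smaller R2,eff, so intensities that rise with νCPMG produce the decaying dispersion profile expected for an exchanging nucleus.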
For R2 relaxation dispersion, three exchange models exist: (1) no exchange (R2,eff = R2,0), (2) fast exchange (kex >> Δω), and (3) slow exchange (i.e., outside of the fast regime). Fast exchange can be modeled as243,250

$$R_{2,eff} = R_{2,0} + R_{ex}\left[ 1 - \frac{4\nu_{CPMG}}{k_{ex}} \tanh\left( \frac{k_{ex}}{4\nu_{CPMG}} \right) \right] \tag{3.20}$$

where R2,0 is R2,eff at infinite νCPMG or no exchange (Rex = 0). In the fast exchange regime, it is not possible to fit for pB or Δω. Instead, only R2,0, Rex, kex, and a minimum chemical shift difference can be obtained250,253,278, where the latter parameter represents the chemical shift change that would be present if the populations were equal. For exchange outside the fast regime, data can be fit to a model using the Carver-Richards equations253 to extract R2,0, kex, pB, and Δω. CPMG curves were fit to each of the three exchange models (i.e., fast, slow, or no exchange) using RING NMR Dynamics278 and the appropriate model was selected on the basis of the Akaike Information Criterion (AIC)280.

3.4 Conclusion

Solution NMR spectroscopy is a well-suited biophysical tool to probe RNA motions on a wide range of time scales (Figure 3.1). As such, we have presented a detailed account of the theory and application of RNA dynamics, with a special emphasis on fast (i.e., ps-ns) (Section 3.2) and slow (i.e., μs-ms) (Section 3.3) motions. We also showcase how atom-specific labeling can benefit these measurements to ensure artefact-free probing of RNA dynamics. Specifically, the removal of dipolar coupling simplifies R1 rate measurements and analysis (Section 3.2.1) whereas the removal of scalar coupling facilitates relaxation dispersion (i.e., CPMG and R1ρ) and saturation transfer (i.e., CEST) measurements and analysis (Section 3.3.1). Given that RNA dynamics are critical to their many functions16,17, the accurate probing of RNA dynamics is a prerequisite to a detailed mechanistic understanding of RNA biology.
While we demonstrate simple examples of 13C spin relaxation (3.2.1.1) and 1H CPMG (3.3.1.1) measurements in selectively labeled RNA, Chapter 4 will showcase a project where the robust probing of RNA structural dynamics helped provide mechanistic details to an important biological phenomenon.

4 Structural dynamic insights of hepatitis B virus pre-genomic RNA

*This chapter is adapted from the following20,281.

4.1 Introduction

Chapters 1-3 were meant to introduce how chemical tools can be broadly applied to study RNA structure, dynamics, and interactions by solution NMR spectroscopy. This chapter, however, describes the specific application of these tools to investigate the structural dynamics of a 61 nt regulatory RNA element in hepatitis B virus (HBV) pre-genomic RNA (pgRNA), designated epsilon (ε). To start, we will introduce HBV biology and its viral lifecycle (Section 4.2). Then, we will showcase our recent solution NMR structure of ε20 (Section 4.3), which is just one of 23 NMR structures of RNAs >60 nt (Figure 1.1C). This section is mainly the work of former graduate (Drs. Regan LeBlanc and Andrew Longhini) and undergraduate (Maryia) students in the group, with long-standing collaborations with NCI (Drs. Wojciech Kasprzak and Bruce Shapiro) and National Institute for Standards and Technology (Dr. Christina Bergonzo) research groups. My contribution was in final analysis and manuscript20 preparation. With the foundation of NMR dynamics already in place (Chapter 3), we will then outline our robust probing of ε dynamics by 13C spin relaxation, 1H CPMG, and molecular dynamics (MD) simulations281 (Section 4.4). To conclude, we will hypothesize how our combined ε structure (Section 4.3) and dynamics (Section 4.4) data inform our understanding of HBV genome replication (Section 4.5).
4.2 HBV lifecycle and genome replication

HBV is a member of the hepadnaviral family and is the smallest animal-infecting DNA virus with a genome of only 3.2 kilobases (kb) (Figure 4.1A)282–286. The HBV genome is partially double-stranded, relaxed circular DNA (rcDNA) and is covalently attached to a multifunctional viral polymerase protein (P) (Figure 4.1A)282,287,288. Unlike other viruses, HBV P comprises four domains in a single polypeptide: a reverse transcriptase (RT), a middle spacer, an RNase H (RH), and a terminal protein (TP) domain289–294.

Figure 4.1. HBV genome organization. (A) Schematic of the 3.2 kb HBV rcDNA genome, showing the pgRNA, (-)- and (+)-DNA strands, and its four open reading frames and seven genes. Critical genomic elements (e.g., attachment to P, direct repeats 1 (DR1) and 2 (DR2), and ε) are also shown. (B) Representation of the pgRNA and four messenger RNAs (mRNAs) that are transcribed from the cccDNA template using the host RNA polymerase II (Pol II) and their seven protein products. All 5′-caps are shown as spheres and 3′-poly-adenylated (poly-A) tails are shown as pA.

HBV replication begins with binding of infectious virions to the sodium-taurocholate co-transporting polypeptide receptor (NTCP) and heparan sulfate proteoglycans (HSPG) of the host liver cell295. Following infection, the rcDNA is imported to the host cell nucleus and repaired to form a covalently closed circular DNA (cccDNA), reminiscent of plasmid DNA282,296. This cccDNA is transcribed into several genomic and sub-genomic RNAs by the liver host Pol II282,296. Viral transcripts are exported to the cytoplasm, where pgRNA provides the mRNA translated into viral proteins (more on this in Section 4.2.1). The pgRNA is also selectively packaged into immature capsids and is reverse transcribed by the co-packaged P into new rcDNA genomes288,289,296–303.
Matured rcDNA-containing nucleocapsids are then used for intracellular cccDNA amplification or enveloped and released from the host cell as progeny virions. The HBV lifecycle is shown in Figure 4.2. Below, we discuss these genome conversions, with emphasis on the step involving the interaction of ε and P and subsequent protein-primed reverse transcription.

Figure 4.2. HBV lifecycle. Schematic representation of HBV genome replication, from liver cell entry to complete genome replication. The HBV lifecycle can be separated into the following phases: rcDNA repair to form cccDNA, RNA transcription of cccDNA by host Pol II, pgRNA and P packaging, and pgRNA reverse transcription to restore the rcDNA genome. This figure is inspired by the following304.

4.2.1 The transformation from rcDNA to cccDNA to pgRNA

Chronic viral infections require that viral genomes persist in infected cells. This requires genomes stable enough to survive cell division so that they can be continuously passed on to progeny cells. While each virus has its own strategy to achieve this goal, HBV and other hepadnaviruses use their cccDNA, which avoids the need for terminal redundancy. That is, on the circle, the Pol II promoter and enhancer sequences are in front of the start sites for the genomic RNAs, facilitating replication282–286. Naturally HBV-infected hepatocytes contain ~50 or more copies of cccDNA305, each with a long half-life (e.g., ~30-60 days for the related duck HBV system305,306). Taken together, these features ensure that cccDNA survives cell division and help it persist during antiviral therapy307. Given these facts, a critical process in the HBV lifecycle is the formation of the cccDNA (Figure 4.2). However, the HBV genome enters the host cell as rcDNA (Figure 4.2), which has many unique characteristics. The (-)-DNA strand (i.e., opposite polarity to the mRNAs) is complete whereas the (+)-DNA strands (i.e., same polarity as the mRNAs) are not (Figure 4.1A).
Moreover, the 5′-end of the (-)-DNA is covalently linked to P whereas the 5′-end of the (+)-strand consists of an RNA oligonucleotide derived from the pgRNA (more on this in Section 4.2.2) (Figure 4.1A). The necessary formation of cccDNA requires that all these modifications are removed and that both strands are covalently ligated. Exactly how these processes are achieved is not well understood, owing to the ambiguity of cccDNA detection in the presence of excess rcDNA308.

HBV RNAs are transcribed by host Pol II using cccDNA as the template, which encodes seven gene products in four open reading frames: pre-core (preC), core (C), P, pre-surface 1 (preS1), pre-surface 2 (preS2), surface (S), and X (Figure 4.1B)282. Each RNA contains a 5′-cap structure and a 3′-poly-A tail and serves as an mRNA. The RNA relevant for HBV replication is the pgRNA, encompassing the entire genome plus a terminal redundancy of ~120 nt that contains a second copy of the DR1 and the ε signal as well as the poly-A tail (Figure 4.1A)282–286. One role of the pgRNA is to serve as the mRNA for C and P proteins282 (Figure 4.1B). The remaining five viral proteins are translated from preC, preS1, preS2/S, and X mRNAs (Figure 4.1B)282. Another function of the pgRNA is to template the reverse transcription of new DNA genomes288,289,296–303.

4.2.2 pgRNA packaging and reverse transcription

The next critical step in HBV replication is the packaging of pgRNA and P into immature capsids and subsequent reverse transcription (Figure 4.2). These phenomena require the binding of P to ε, an ~85 nt cis-acting regulatory stem-loop RNA located at the 5′-end of the pgRNA (Figure 4.3)289,298,299,301–303. The ε motif is also located at the 3′-end of the pgRNA but this copy does not bind P (Figure 4.3)282–286. The 5′-end ε-P interaction triggers the initiation of protein-primed reverse transcription301,309–311 and packaging288,300 of both P and the pgRNA by C dimers into immature capsids.
The product of the priming reaction is a 3 nt DNA motif, whose 5′-end is covalently attached to a tyrosine residue (Y63) in the TP domain, and which is templated from the 6 nt priming loop (PL) bulge within ε290–292,301,309–311 (Figure 4.3). The complex then translocates to the 3′-proximal DR1 element where (-)-DNA strand synthesis begins from the 3 nt DNA primer (Figure 4.3)312,313.

4.2.3 (-)-DNA strand synthesis

While initial HBV (-)-DNA strand synthesis is templated from the 5′-UUC-3′ sequence in the ε PL, (-)-DNA strand elongation is templated from the same sequence in the 3′-end DR1 motif nearly 3 kb away (Figure 4.3). As such, the TP-bound DNA must translocate a substantial distance in sequence space. Given that there are ~20 additional 5′-UUC-3′ motifs within the pgRNA282, and that <4 nt of sequence identity is required between template and target301, additional regulatory elements must ensure proper translocation. One such model is that 3′-DR1 and 5′-ε are brought into close proximity via closed-loop formation of the pgRNA facilitated by cellular proteins (e.g., eukaryotic translation initiation factor 4G (eIF4G), which links 5′-cap314 and 3′-poly-A315 binding proteins) (Figure 4.3). The fact that pgRNA packaging requires close proximity of ε and the 5′-cap314 supports this model. An alternative model suggests that a long-range RNA interaction between ε and a new cis-element designated φ, which has partial complementarity to ε and is slightly upstream of DR1, is required for efficient (-)-DNA synthesis (Figure 4.3)316–318. Indeed, mutations affecting the ε-φ interaction reduce the efficiency of (-)-DNA synthesis319.

Figure 4.3. HBV protein-priming and template translocation. The TP domain of P synthesizes a 4 nt DNA using the ε PL bulge as a template. TP-linked DNA then translocates to the 3′-end DR1 motif where (-)-DNA strand elongation commences. Dashed lines highlight the translocation event of the TP-linked DNA and the proposed interaction between ε and φ.
This figure is inspired by the following304.

Above all, the translocation process must remodel the ε-P complex. That is, P first facilitates the protein priming with TP-Y63, then enables DNA growth, and finally replaces the ε template with DR1. P must therefore have distinct initiation and elongation modes, like other protein-priming polymerases320. The end product of (-)-DNA synthesis is a DNA copy of the pgRNA from its 5′-cap to the 5′-UUC-3′ motif in the 3′-DR1, including ~10 nt 3′- and 5′-end redundancy (Figure 4.3). As the (-)-DNA strand is synthesized, the pgRNA is degraded by the RH domain of P with the exception of its 5′-terminal ~18 nt, which includes the 5′-end DR1321. This 5′-capped RNA then functions as the primer for (+)-DNA strand synthesis (Figure 4.4A).

Figure 4.4. HBV rcDNA genome synthesis. (A) The TP-linked DNA extends (-)-DNA strand synthesis toward the 5′-end of the pgRNA, which is concomitantly degraded by the RH domain of P. (B) The RNA primer then translocates to the DR2 motif and is extended toward the 5′-end of the (-)-DNA strand, thereby initiating (+)-DNA strand synthesis. Here, the 3′- and 5′-r refer to the 10 nt redundancy that is generated with the (-)-DNA strand. (C) After copying the 5′-r, the growing 3′-end of the (+)-DNA strand translocates to the 3′-r on the (-)-DNA strand to permit further elongation. (D) Final extension along the (-)-DNA strand template yields (+)-DNA strands of various lengths to give the final rcDNA. This figure is inspired by the following304.

4.2.4 (+)-DNA strand synthesis

To ensure rcDNA formation and not simply double-stranded linear DNA, the RNA primer must be transferred to the 3′-proximal DR2 motif (Figure 4.4B). This second template switch requires RNA primers with their 5′-cap and the DR1 motif. Interestingly, the 5′-DR1-containing primer translocates to the 3′-DR2 rather than the initial 3′-DR1 motif, even though it has more complementarity to the latter.
As before, this observation suggests an additional level of control of efficient (+)-DNA strand synthesis. From its new location on DR2, the RNA primer is elongated towards the TP-bound 5′-end of the (-)-DNA, including the 5′-end redundancy (Figure 4.4B). Further elongation requires circularization, which is facilitated by a third template switch (Figure 4.4C). That is, the growing (+)-DNA strand is transferred from the 5′- to 3′-end redundancy on the (-)-DNA template where its final extension yields rcDNA (Figure 4.4D). While the sequence requirements of both redundant ends are critical, additional cis-acting elements (e.g., long-range base pairing) have been hypothesized to play important roles318,322. Intramolecular base pairing is likely an important mechanism that ensures the proper shape and necessary contacts within the HBV genome needed to facilitate the three template switches that form the rcDNA318.

4.2.5 Structural dynamics of ε

While the entire HBV lifecycle is of critical importance, our focus is centered on ε and its interaction with P. The secondary structure of a 61 nt ε was confirmed (Figure 4.5A)298,303, and its role in P binding323,324, pgRNA packaging298,303,310,325, and DNA synthesis301,310,323,325 has been established by biochemical and mutational analyses (more on this in Section 4.2.7). This ε construct contains the entire stem-loop region and will therefore be referred to as full-length ε throughout the text. Moreover, the ε sequence is highly conserved in other mammalian hepadnaviruses and between different isolates326,327. While secondary structure analysis provides a useful starting point, a detailed understanding requires 3D structural information. Fortunately, the structure of the truncated 27 nt ε upper helix (UH), solved by solution NMR (Figure 4.5B)238,328, revealed that the apical loop (AL) actually consists of only three nucleotides in a pseudo-triloop (PTL) structure (Figure 4.5).
Moreover, the solution structure suggests that the entire upper loop forms a nearly contiguous A-helix, with the exception of the PTL and U43 (full-length numbering) bulge (Figure 4.5B)238.

Figure 4.5. Summary of previous NMR studies of ε281. (A) Secondary structure of full-length ε298,303 with secondary structure abbreviations that will be used throughout the text. The A13:U49 “base pair” has been left as a dashed line for reasons that will be explained in Section 4.3. (B) Top-ranked AL ε solution NMR conformer (PDB 2ixy)238 with 13C dynamics data329 mapped onto the 3D structure. The AL contains nucleotides in the UH and PTL, as indicated by the shaded gray box in A. Nucleotides in the PTL and U43 bulge (full-length numbering) exhibit fast ps-ns (and to a lesser extent slower µs-ms) motions that have been hypothesized to facilitate P binding.

In addition to structural data, previous NMR studies have also measured AL ε dynamics329. Specifically, 13C spin relaxation measurements demonstrate fast ps-ns motions in PTL nucleotides U32-U34 and C36 as well as the U43 bulge (all full-length numbering) (Figure 4.5B)329. 13C R1ρ-based measurements also hinted at slower µs-ms motions for C31-C1′, C36-C5, and U43-C5 nuclei (Figure 4.5B)329. These motions are hypothesized to facilitate P binding329. Interestingly, Systematic Evolution of Ligands by EXponential Enrichment (SELEX) experiments indicate that diverse ε sequence mutations are compatible with P binding, suggesting structural flexibility of ε330. Taken together, these results implicate RNA dynamics in the ε-P interaction and HBV genome replication more broadly. While these data provide useful insight into the RNA-assisted HBV replication mechanism, data on the PL bulge region, which is the template for protein-primed reverse transcription290–292,301,309–311, are entirely lacking. Recent work from our group to fill this critical knowledge gap will be the focus of Sections 4.3 (ref. 20) and 4.4 (ref. 281).

4.2.6 P protein structure and host interactions

Currently, there are no structures of HBV or hepadnaviral P proteins, though homology models have been proposed for the RT331 and RH332 domains. The RT model agrees with drug resistance data, and it is supported by mutational analysis of the putative dNTP pocket of the duck P protein where a single phenylalanine residue (F451) was shown to have a homologous role in dNTP versus rNTP discrimination as in HIV-1 RT333. However, outside the active site the accuracy of the modeled structure is unknown. The situation is even worse for the TP domain, which shares no significant sequence similarity with any other protein in the PDB, not even with the few other TPs involved in viral genome replication320,334. Moreover, those TPs are not covalently linked to their polymerases, so homology-based efforts have limited utility. Structure determination of P or its individual domains is therefore a desirable goal but has proven a difficult task. P is difficult to express in soluble amounts sufficient for structural biology techniques. While this challenge has been partially overcome by including solubility fusion partners, the resulting fusion proteins still form soluble aggregates at high concentrations334–336.

While structural understanding of P is limited, much is known about host factors (HFs) that modulate HBV and hepadnaviral P protein function. The first HF shown to interact with P is the heat shock protein 90 (Hsp90) complex, which includes Hsp90, Hsp70, Hsp40, Hop/p60, and p23337–339. This chaperone complex is required to establish and maintain the P conformation that binds ε337–339. In addition, and similar to (-)-DNA strand synthesis, eIF4E (not to be confused with eIF4G as in Section 4.2.3) binds to P340. This interaction can occur in an RNA-independent manner, but the presence of the pgRNA enhances the P-eIF4E binding340.
Another RNA-independent interacting partner is the human cytidine deaminase Apobec3G341, which inhibits early stages of (-)-DNA strand synthesis independent of its deaminase activity342,343. There is also a complex interplay between HBV and the host immune response. For example, the immune modulatory DEAD-box RNA helicase 3 (DDX3) was shown to interact with P in an RNA-independent manner344, though its exact function is not well understood. Additional work has also shown that P binds to both protein kinase C-δ (PKC-δ) and importin-α5345, two proteins involved in nuclear translocation. As with DDX3, more work is required to uncover the regulatory mechanism of PKC-δ and importin-α5 in HBV replication.

4.2.7 Determinants for HBV replication

The basic requirements for the ε-P interaction323,324, pgRNA packaging298,303,310,325, and DNA synthesis301,310,323,325 have been defined with biochemical and mutational analyses. In terms of proteins, the TP and RT domains of P are required for ε-P binding323,324. Cellular chaperones (e.g., Hsp90, Hsp70, Hsp40, Hop/p60, and p23)337–339 and HFs (e.g., eIF4E340, Apobec3G341–343, DDX3344, PKC-δ345, and importin-α5345) are further needed to assist the ε interaction and later pgRNA packaging. As regards the pgRNA, there is a proximity requirement between the 5′-end ε and the 5′-cap for efficient reverse transcription314. Moreover, mutational studies298,301,303,310,323–325 show that different ε regions contribute to P binding, protein-priming, and pgRNA packaging in both a sequence- and structure-specific manner (Figure 4.6 and Tables 4.1 and 4.2).

Figure 4.6. ε sequence and structure requirements for HBV replication281. RNA sequence and secondary structure requirements for ε-P binding (specifically the RT domain), pgRNA packaging, and DNA synthesis are shown, as summarized from previous biochemical and mutational studies298,301,303,310,323–325.

Table 4.1. ε sequence requirements for HBV replication281.
Replication function    Sequence requirements        Reference
pgRNA packaging         a. G8-U12 and G50-C54        a. 310, 324
                        b. C31-C36                   b. 303, 324
                        c. G44-U48                   c. 303, 310, 324
DNA synthesis           a. A20-C24                   a. 310
                        b. G8-U12 and G50-C54        b. 310
RT binding              a. G8-C11 and G51-C54        a. 324
                        b. C14 and U15               b. 324
                        c. A21-C23 and G45-U47       c. 324

Table 4.2. ε structure requirements for HBV replication281.

Replication function    Structure requirements       Reference
pgRNA packaging         a. upper part of the LH      a. 303, 310, 324
                        b. PL                        b. 298, 301, 303, 310, 323, 324
                        c. upper part of the UH      c. 303, 324
                        d. PTL                       d. 298, 324
                        e. U43                       e. 303, 324
DNA synthesis           a. PL                        a. 301, 310, 323
                        b. A:U base pairs in the UH  b. 325
                        c. PTL                       c. 323
RT binding              a. PL                        a. 323, 324
                        b. upper part of the UH      b. 324
                        c. U43                       c. 324

Specifically, the upper portion of the LH has primary sequence requirements for P binding324 and pgRNA packaging303,310. Intriguingly, the lower portion of the UH has sequence requirements for P binding324 and DNA synthesis on its 5′-side310 and pgRNA packaging on its 3′-side303,310, whereas the upper portion of the UH solely has a structural role303. The PL bulge structure is essential for P binding324, pgRNA packaging298,301,303,323,324, and DNA synthesis298,303,310,323, whereas its 5′- and 3′-ends function separately in P binding324 and protein priming310, respectively. The PTL303,324 and U43 bulge298,324 are essential for pgRNA packaging but only the former structure is dispensable for P binding324. Finally, the A:U base pairs after the PL and preceding the PTL are required for protein-priming and (-)-DNA strand elongation, respectively325. While this information provides a useful starting point, a detailed mechanistic understanding of the ε-P interaction and its subsequent functions requires high-resolution structural dynamics studies. These data are completely lacking for P, and only available for truncated ε RNAs, making NMR studies of full-length 61 nt ε a necessary next step.
The rest of this chapter is devoted to filling this critical knowledge gap.

4.3 Structural analysis of full-length ε

4.3.1 Resonance assignment

To date, the only ε structure is a truncated 27 nt fragment derived from its AL (Figure 4.5B)238,328. Here, we report the first structure of full-length 61 nt ε that includes the previously undetermined PL bulge. As discussed in Chapter 1, structure determination of large RNAs by solution NMR is challenged by severe chemical shift overlap and resonance line broadening. These difficulties create a large bottleneck for complete resonance assignment, even when atom-specific labeling (Sections 1.34 and 1.4.2) is used. To lessen this burden and expedite resonance assignment, we sub-divided ε into modular constructs (i.e., PL and AL) (Figures 4.7 and A.30A).

Figure 4.7. ε modular constructs to facilitate resonance assignment. Secondary structure representations of ε constructs made from IVT for NMR studies, with important structural regions colored in a scheme that will be used throughout the text. The PL ε construct contains PL nucleotides C14-C19, four flanking base pairs on either side, an additional three base pairs to stabilize the LH and improve transcription154, and a UUCG tetraloop to close the UH. The AL ε construct contains nucleotides G22-C46 of the UH and PTL with an additional terminal G:C base pair to improve transcription154.

We observed good agreement between our u-[13C/15N]-labeled ε constructs when comparing the chemical shifts in both aromatic (i.e., H6-C6/H8-C8) (Figure A.30B) and ribose (i.e., H1′-C1′) (Figure A.16C) regions, suggesting that our modular constructs faithfully recapitulate full-length ε. We therefore used u-[13C/15N]-labeled PL ε to simplify resonance assignment with various through-bond and through-space multidimensional NMR experiments, which formed the basis for verifying the assignment in full-length ε.
To this end, 3D HCP-HCCH total correlation spectroscopy (TOCSY)346 provided sequential assignments of non-exchangeable H1′-C1′ resonances for nucleotides U15-U18 (Figure 4.8A), and 13C-edited NOESY experiments provided further connectivity of nucleotides A13-A20 via inter-strand H1′-H6/8 NOEs (Figure 4.8B).

Figure 4.8. Assignment of ε priming loop resonances20. (A) 3D HCP-HCCH TOCSY and (B) 13C-edited NOESY experiments were carried out on u-[13C/15N]-labeled PL ε to initiate resonance assignment. TOCSY data provided unambiguous sequential assignments for non-exchangeable H1′-C1′ resonances for nucleotides U15-U18. NOESY data provided further connectivity between nucleotides A13-A20 via inter-strand H1′-H6/8 NOEs. NMR spectra were recorded at 25 °C and 800 MHz.

Additional non-exchangeable resonances were assigned by applying a combined asymmetric 13C-labeling and isotopic filter/edit NOESY strategy347 to atom-specifically labeled (i.e., [2′,8-13C2]-Ade, [1′,6-13C2, 5-2H]-Cyt/Uri, and [1′,8-13C2]-Gua) full-length ε (Figure 4.9A). This technique dramatically reduced spectral overlap and therefore helped confirm many of the assignments from the PL ε construct (Figure 4.8). Moreover, these experiments identified additional Ade-H2 NOE cross-peaks for nucleotides A13 and A20 to other protons in the PL (Figure 4.9B).

Figure 4.9. Assignment of additional ε priming loop resonances20. (A) Secondary structure of atom-specifically labeled (i.e., [2′,8-13C2]-Ade, [1′,6-13C2, 5-2H]-Cyt/Uri, and [1′,8-13C2]-Gua) full-length ε made from IVT. Nucleotides harboring isotope labels are colored orange. (B) 2D 1H-13C slice from a 3D filter/edit experiment on atom-specifically labeled full-length ε (from A). Resonances marked (*) refer to signals that are aliased (i.e., folded in). NMR spectra were recorded at 25 °C and 800 MHz.

4.3.2 Confirmation of ε
secondary structure

To further confirm base pairing in ε, HNN-COSY experiments were collected on u-[15N]-labeled full-length and PL ε constructs. Using the latter, only two A:U and seven G:C base pairs were observed, whereas five A:U and eight G:C base pairs were expected from the previously reported secondary structure238,298,303,328 of this ε region. The absence of visible terminal G:C base pairs due to fraying is common in NMR, but the lack of detectable A13:U49, A20:U48, and A21:U47 base pairs in HNN-COSY experiments suggests either solvent exchange or incorrect assignment of secondary structure. To provide more insight into these possibilities, we employed a two-bond H(2)-NN-COSY experiment348 on the PL ε construct. By transferring magnetization from Ade-H2, this experiment reports on dynamic, solvent-labile base pairs that are otherwise undetectable in classic HNN-COSY experiments349 because of solvent exchange of Uri-H3. With this experiment, the only undetectable base pair was A13:U49 (Figure 4.10).

Figure 4.10. Novel ε secondary structure information20. H(2)-NN-COSY NMR experiment on u-[15N]-labeled PL ε. Base pairs are indicated with a gray line connecting Ade-N1 (gray) and Uri-N3 (black) peaks. All A:U base pairs were detected except A13:U49. NMR spectra were recorded at 25 °C and 800 MHz.

4.3.3 Assessing ε solvent accessibility and local dynamics

To complement the NMR observables described above and aid our structural interpretation, we measured solvent paramagnetic relaxation enhancement (sPRE) by titrating u-[13C/15N]-labeled full-length ε with a magnetic resonance imaging contrast dye (e.g., gadolinium-diethylenetriamine pentaacetic acid-bismethylamide, Gd-DTPA-BMA). In these experiments, R1 is measured for RNA protons as a function of Gd-DTPA-BMA concentration and the slope provides quantitative solvent accessibility and distance-to-surface information350, with 0 and 1 being the least and most solvent accessible, respectively.
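
As an illustration of how an sPRE value follows from such a titration, the slope of proton R1 versus Gd-DTPA-BMA concentration can be obtained from an ordinary least-squares line. The sketch below is only a minimal example under stated assumptions: the function name fit_line and all numerical values are hypothetical and noise-free (they are not measured data), and the concentration range simply mirrors the 0.8-4.2 mM range given in Section 4.3.5.3.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical, noise-free values for illustration only (not measured data).
gd_mM = [0.8, 1.6, 2.4, 3.2, 4.2]           # Gd-DTPA-BMA concentration (mM)
r1_loop = [1.0 + 0.50 * c for c in gd_mM]   # proton R1 (s^-1), exposed site
r1_helix = [1.0 + 0.10 * c for c in gd_mM]  # proton R1 (s^-1), buried site

# The slope of R1 versus concentration is the sPRE (s^-1 mM^-1).
spre_loop, _ = fit_line(gd_mM, r1_loop)
spre_helix, _ = fit_line(gd_mM, r1_helix)
print(f"exposed-site sPRE ~ {spre_loop:.2f} s^-1 mM^-1")
print(f"buried-site sPRE  ~ {spre_helix:.2f} s^-1 mM^-1")
```

In this toy example the solvent-exposed proton has the steeper slope, which is the qualitative behavior used to distinguish exposed from buried nucleotides.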
These measurements demonstrate that most helical regions of ε are buried (i.e., below or around the mean) whereas nucleotides in the PL (U15-C19), PTL (U32-U34 and C36), and U43 bulge display elevated sPRE values (>1 SD above the mean) suggestive of solvent accessibility and/or local motion (Figure 4.11A). Moreover, the PL-adjacent nucleotide U49 also showed an elevated sPRE value whereas A13 did not (Figure 4.11A). These data are consistent with the undetectable A13:U49 base pair from our H(2)-NN-COSY experiment (Figure 4.10).

To determine whether the solvent exposed regions of ε also undergo local motions, we measured cross-correlated relaxation (ηxy) rates351–353. To alleviate spectral overlap and circumvent deleterious 13C-13C dipolar coupling81,82,100,198,227,232, we used atom-specifically labeled (i.e., [2′,8-13C2]-Ade, [1′,6-13C2, 5-2H]-Cyt/Uri, and [1′,8-13C2]-Gua) full-length ε (Figure 4.9A). In these experiments, attenuated (<1 SD below the mean) normalized 13C ηxy rates indicate fast ps-ns motions. These measurements demonstrate that most helical regions of ε are rigid (i.e., above or around the mean) whereas nucleotides in the PL (C14-C19), PTL (U32-U34 and C36), and U43 bulge display attenuated (<1 SD below the mean) normalized 13C ηxy rates suggestive of local motion on the ps-ns time scale (Figure 4.11B). These data also suggest that A13 is rigid whereas U49 is flexible (Figure 4.11B), indicating that they are unlikely to base pair, in support of our sPRE (Figure 4.11A) and H(2)-NN-COSY348 (Figure 4.10) measurements. Taken together, A13 and U49 were designated unpaired in structure calculations.

Figure 4.11. Full-length ε exhibits high solvent accessibility and local dynamics20. (A) sPRE values of u-[13C/15N]-labeled full-length ε were measured for Ade-H2, Pyr-H5 and -H6, and Pur-H8 nuclei as well as ribose H1′ resonances at various Gd-DTPA-BMA concentrations. (B) Normalized (norm) 13C ηxy rates of atom-specifically labeled full-length ε
(Figure 4.9A) were measured for Pur-C8 and Pyr-C6 nuclei. In A and B, ε structural regions are abbreviated and colored as in Figure 4.7. Error bars represent ± SD and dashed lines represent 1 SD above (sPRE) or below (ηxy) the mean (calculated from helical nuclei). Nucleotides in the PL (including adjacent U49), PTL, and U43 bulge experience solvent accessibility and fast ps-ns motions. NMR measurements were recorded at 25 °C and either 600 (ηxy) or 800 MHz (sPRE).

4.3.4 Full-length ε structural modeling

Full-length ε structure calculation was carried out using the Xplor-NIH 2.48 framework based on the protocol in the gb1_rdc example included in the software package354, including the recently published RNA torsion potential RNA-ff1355. Initial structures were calculated from NOE distance, hydrogen bonding, and dihedral angle restraints. Then, torsion database statistics, residual dipolar coupling (RDC), sPRE, and small/wide-angle X-ray scattering (SAXS/WAXS) data were used to refine the structure (Figure 4.12 and Table 4.3). The nbTargetPot350 module was used to correlate sPRE data with distance-from-surface refinements. Following standard refinement protocols, the top 10 of 200 NMR structures are reported with a root-mean-square deviation (RMSD) of 1.8 Å, indicating a well-folded converged structure (Figure 4.12).

Figure 4.12. NMR and SAXS refined solution structure of full-length ε20. (A) Correlation plot between the experimentally measured and back-calculated RDCs for the top-ranked ε conformer. RDC measurements were recorded at 600 MHz and 25 °C. (B) Back-calculated scattering curve of the top-ranked ε conformer fit to experimental SAXS data. (C) Bundle of top 10 lowest energy Xplor-NIH354 ε structures (PDB 6var20) (RMSD of 1.8 Å) shown inside the SAXS envelope and with a zoomed-in view of the PL in three NMR conformers (rank 3, 5, and 6). These three conformers share the backbone kink centered at U15 followed by partially stacked G16 and U17. ε
structure regions are colored as in Figure 4.7.

Our NMR and SAXS refined structural model shows good agreement with the existing 27 nt AL ε solution NMR structure238 (Figure A.31). Importantly, our structure provides novel insight into the PL structure, revealing its unique orientation. Specifically, nucleotides U15-C19 remain well-oriented with G16 and U17 partially stacked in three of the top 10-ranked NMR structures (Figure 4.12). Moreover, a kink is apparent in the backbone of these ε conformers between nucleotides C14-G16 (Figure 4.12). However, our sPRE and ηxy measurements also suggest that ε is dynamic and likely does not adopt one single structure. Nevertheless, our NMR model is a critical first step toward understanding which ε structural transitions facilitate the replication of the HBV genome. This will be investigated in greater detail in Section 4.4.

Table 4.3. NMR and refinement statistics for full-length (61 nt) ε20.

NMR distance and dihedral restraints
  Distance restraints
    Total NOE                                  478
    Intra-residue                              177
    Inter-residue                              301
    Hydrogen bonds                             108
    RDC                                        47
  Total dihedral-angle restraints              423
    Backbone                                   408
    Sugar pucker                               15
  Other restraints
    sPRE                                       104
Structure statistics
  Violations (mean ± SD)
    NOE (Å)                                    0.08 ± 0.00
    Dihedral angle (°)                         6.64 ± 0.89
    RDC (Hz)                                   0.62 ± 0.09
    sPRE (mM-1s-1)                             0.57 ± 0.00
  Deviations from idealized geometry
    Bond length (Å)                            0.83 ± 0.01
    Bond angles (°)                            0.01 ± 0.00
    Impropers (°)                              0.98 ± 0.02
  Heavy atom RMSD from mean structure (Å)
    Overall (nt G1-C61)                        1.83 ± 0.14
    Lower helix (nt G1-U12 and G50-C61)        1.34 ± 0.08
    Priming loop (nt A13-A20, U48, and U49)    0.85 ± 0.12
    Apical loop (nt A21-U47)                   1.57 ± 0.05

4.3.5 Experimental details

4.3.5.1 NMR sample preparation

Full-length 61 nt ε was prepared by IVT as in Section 2.2.4.1, except using the p2RZ HDV ribozyme-containing plasmid356. A combination of u-[13C/15N]-, u-[15N]-, and atom-specifically labeled full-length ε samples were used for NMR resonance assignment and structural studies. All ε
NMR samples were 0.5-1.5 mM in 300 µL of NMR buffer A (calculated using an extinction coefficient of 768.3 mM-1cm-1).

4.3.5.2 NMR resonance assignment (by Drs. Regan LeBlanc and Andrew Longhini)

All NMR experiments were performed at 25 °C on either an Avance III Bruker Ultrashield Plus 600 MHz spectrometer with a room temperature triple resonance probe or an 800 MHz Avance III Bruker spectrometer equipped with a triple resonance cryogenic probe. Starting points for non-exchangeable (i.e., Ade-H2, Pur-H8, Pyr-H6, and ribose H1′ and H2′) assignments were identified with an atom-specifically labeled (i.e., [2′,8-13C2]-Ade, [1′,6-13C2, 5-2H]-Cyt/Uri, and [1′,8-13C2]-Gua) (Figure 4.9A) sample that gave unambiguous contacts between unlabeled Ade-H2 (1H-12C) and labeled ribose H1′ (1H-13C) resonances, as previously reported347 when applying a 3D 13C-filter/edit NOESY experiment. Additional non-exchangeable protons were assigned with a combination of through-bond HCCH-COSY-TOCSY357, TROSY-detected HCN358, and 3D 13C-edited and 13C-filtered 1H-1H NOESY experiments347,359. Assignment of base pairs in helical regions was confirmed with 1H-1H NOESY of the imino protons with RNA-PAIRS360. A through-bond HNN-COSY experiment348,349 confirmed hydrogen bonding restraints for the ε LH and UH. Any ambiguity in the above experiments was resolved using ε modular constructs (i.e., PL and AL) (Figures 4.7, 4.8, and A.30). Taken together, these methods permitted complete assignment of the Ade-H2-C2, Pur-H8-C8, Pyr-H6-C6, ribose H1′-C1′ and H2′-C2′, Gua-H1-N1, and Uri-H3-N3 resonances. Raw NMR data were processed and analyzed using Topspin 4.0, NMRPipe361, NMRFx Processor, and NMRViewJ196. All NMR chemical shifts have been deposited in the Biological Magnetic Resonance Data Bank under accession number 50136 (ref. 20) and additional information can be found there.

4.3.5.3 sPRE measurements (by Dr.
Regan LeBlanc) To assess the solvent accessibility in ?, 1H-13C HSQC spectra were collected with and without Gd-DTPA-BMA350. All measurements were collected on u-[13C/15N]-labeled full- length ? on an 800 MHz Avance III Bruker spectrometer equipped with a triple resonance cryogenic probe and at 25 ?C. Using a saturation-recovery approach, proton R1 rates were measured as a function of increasing Gd-DTPA-BMA concentration (0.8-4.2 mM). R1 experiments were acquired in an interleaved manner as a pseudo-3D experiment and R1 rates were determined by fitting peak intensities to a monoexponential decay. The slope of the best fit linear relationship between proton R1 and dye Gd-DTPA-BMA concentration gave the sPRE value for each measured proton. Uncertainties in sPRE measurements were estimated by propagating the error in peak intensities from duplicated delay points in in R1 experiments. Due to spectral overlap, data from certain resonances were excluded from analysis (Table A.1). 4.3.5.4 13C ?xy rate measurements (by Dr. Andrew Longhini) NMR cross-correlated relaxtion experiments were performed on atom-specifically labeled (i.e., [2?,8-13C2]-Ade, [1?,6-13C2, 5-2H]-Cyt/Uri, and [1?,8-13C2]-Gua) (Figure 4.9A) full-length ? on an Avance III Bruker Ultrashield Plus 600 MHz spectrometer with a room temperature triple resonance probe and at 25 ?C. Hahn-Echo TROSY-detected measurements of the fast (anti-TROSY, R2?) and slowly (TROSY, R2?) relaxing TROSY components were adapted from previous pulse sequences190. To alleviate spectral overlap and circumvent deleterious 13C-13C dipolar coupling81,82,100,198,227,232, atom-specifically labeled (i.e., [2?,8- 13C2]-Ade, [1?,6-13C2, 5-2H]-Cyt/Uri, and [1?,8-13C2]-Gua) full-length ? (Figure 4.9A) was 129 used. R2? and R2? experiments were acquired in an interleaved manner as a pseudo-3D experiment. ?xy rates351?353 were obtained by the following relation ?%? ? ? ? = %?Of (4.1) 2 where R2? and R2? 
rates were determined by fitting peak intensities to a monoexponential decay. Uncertainties in R2α and R2β rates were estimated from spectral signal-to-noise with RELAXFIT [197]. Given that ηxy is dependent on the CSA, purine and pyrimidine values were normalized by dividing ηxy rates by their corresponding maximum value. Due to spectral overlap, data from certain resonances were excluded from analysis (Table A.2).

4.3.5.5 SAXS measurements (by Dr. Andrew Longhini)

Full-length unlabeled 61 nt ε was prepared by IVT as in Section 2.2.4.1. ε samples of increasing concentrations (1.0, 2.5, and 5.0 mg/mL) were prepared in SAXS buffer (10 mM 3-morpholinopropane-1-sulfonic acid (MOPS), 0.1 mM EDTA, pH 6.7) and SAXS data were collected at the Advanced Photon Source at Argonne National Laboratory. Raw data were reduced, each data set was buffer subtracted, and the three data sets were then merged and extrapolated back to q = 0 for subsequent analysis. The Kratky plot (i.e., plot of q^2·I(q) versus q) indicated a well-folded RNA and real-space pair distance distribution function (PDDF) analysis gave a radius of gyration (RG) and maximum distance (Dmax) of 29.8 and 97.0 Å, respectively (Figure A.32).

4.3.5.6 Structure calculation (by Drs. Regan LeBlanc and Andrew Longhini)

After the chemical shifts of HBV ε NMR resonances were assigned using a combined asymmetric 13C-labeling and isotopic filter/edit NOESY strategy [347], the NOE distance restraints, dihedral angle restraints, sPRE [350], and RDC restraints were combined for use in structure calculations in Xplor-NIH (version 2.48) [354]. Additional restraint tables were taken from SAXS/WAXS data. Python scripts to fold and refine the RNA structure were adapted from the Xplor-NIH website [354].
Using previously reported protocols [350,354,355], the extended RNA sequence was subjected to high temperature MD in torsion angle space, simulated annealing, gradient minimization in torsion angle space, and gradient minimization in Cartesian space. These stages folded the RNA according to the restraint files collected from the established torsion database, NMR distances, hydrogen bond distances, and SAXS/WAXS data. Then, 200 structures and violation files were generated, and the 10 lowest energy structures were checked with MolProbity [362]. The atomic coordinates for the reported solution structure ensemble have been deposited with the PDB under accession number 6var [20].

4.4 Probing full-length ε dynamics on multiple time scales

As discussed above, our 13C ηxy relaxation measurements [20] (Section 4.3.3) hint at ps-ns motions within ε (Figure 4.11B), in agreement with previous studies [329] on the AL ε (Section 4.2.5 and Figure 4.5B). Still, several open questions remain about the dynamics of full-length ε: what are the extent and time scales of the motions it experiences? Does it adopt multiple conformations, and how do these modulate ε-P binding interactions and downstream functions? As a first step toward assessing the relevance of RNA dynamics for the ε-P interaction, we present a detailed description of ε motions using 13C spin relaxation and 1H CPMG measurements as well as MD simulations.

4.4.1 Full-length ε undergoes fast motions on the ps-ns time scale

To confirm and extend previous and preliminary dynamics data on full-length ε, we employed 13C spin relaxation experiments to probe motions on the ps-ns time scale in both the nucleobase (i.e., Pur-C8 and Pyr-C6) and ribose (i.e., C1') moieties. To alleviate spectral overlap and circumvent deleterious 13C-13C dipolar coupling [81,82,100,198,227,232], atom-specifically labeled (i.e., [1',8-13C2]-Ade/Gua and [1',6-13C2, 5-2H]-Cyt/Uri) ε samples (Figure 4.13) were used in all 13C spin relaxation experiments.

Figure 4.13.
NMR samples used to probe full-length ε dynamics [281]. For all NMR experiments, three atom-specifically labeled full-length ε samples were prepared by IVT: (1) [1',8-13C2]-ATP and [1',8-13C2]-GTP labeled, (2) [1',6-13C2, 5-2H]-CTP labeled, and (3) [1',6-13C2, 5-2H]-UTP labeled. Nucleotides harboring isotope labels are shown in orange and bolded. These labeling topologies approximate isolated spin-pairs in both the nucleobase (i.e., purine H8-C8 and pyrimidine H6-C6) and ribose (i.e., H1'-C1') moieties.

R1 and R2 rates and {1H}-13C hNOE values were determined for all well-resolved Pur-C8, Pyr-C6, and ribose C1' nuclei in full-length ε. Monoexponential decay was observed in all relaxation experiments, indicating the absence of 13C-13C cross-relaxation and Hartmann-Hahn transfer (Figure A.33). Slight variations in data are observed for Pur-C8 and Pyr-C6 nuclei within the same secondary structural elements, reflecting their different CSAs (Figure 4.14A). Given that the ribose C1' CSA is small and less dependent on secondary structure [230], these data provide a view of dynamics orthogonal to that from nucleobase C6 and C8 sites (Figure 4.14B). Nucleobase (Figure 4.14A) and ribose (Figure 4.14B) measurements show that most nucleotides in helical regions of ε are rigid whereas nucleotides in the PL (C14-C19, including adjacent U48 and U49), PTL (U32-C36), and U43 bulge show elevated R1 and hNOE values (>1 SD above the mean) and attenuated R2 rates (<1 SD below the mean), indicative of increased flexibility (Figure 4.14).

Figure 4.14. 13C relaxation experiments indicate dynamic regions of full-length ε [281]. Measurements of 13C R1, R2, and hNOE values are shown for all well-resolved (A) nucleobase (i.e., Pur-C8 and Pyr-C6) and (B) ribose (i.e., C1') nuclei, with ε structural regions abbreviated and colored as in Figure 4.7. Error bars represent ±
SD and dashed lines represent 1 SD above (R1 and hNOE) or below (R2) the mean (calculated from helical nuclei). Nucleotides in the PL (including U48 and U49), PTL, and U43 bulge experience fast ps-ns motions. NMR data were collected at 600 (nucleobase) and 800 (nucleobase and ribose) MHz and at 25 °C.

We then used the R2/R1 ratio to determine which regions of full-length ε undergo motion on the ps-ns time scale. This analysis is approximately independent of the site-specific variations in dipolar coupling and CSA and is therefore less artifact prone. These data demonstrated that, compared to helical regions, nucleobase sites in the PL (C14-C19 and U49), PTL (U32-U34 and C36), U43 bulge (and adjacent G41), and 5'- and 3'-ends (G1-U3 and G58) show attenuated (<1 SD below the mean) R2/R1 values, suggestive of high-frequency motions (Figure 4.15A).

Figure 4.15. Mapping dynamic "hot spots" in full-length ε [281]. R2/R1 values derived from the data shown in Figure 4.14 for all well-resolved (A) nucleobase (i.e., Pur-C8 and Pyr-C6) and (B) ribose (i.e., C1') nuclei, with ε structural regions abbreviated and colored as in Figure 4.7. Error bars represent ± SD and dashed lines represent 1 SD below the mean (calculated from helical nuclei). (C) Summary of 13C spin relaxation data mapped onto the top-ranked ε NMR conformer (PDB 6var [20]) to indicate dynamic "hot spots" that undergo fast ps-ns motions.

Fast dynamics were also observed for ribose nuclei in the PL (C14-C19 and adjacent A20, U48, and U49), PTL (U32, G33, G35, C36, and adjacent G30), U43 bulge (and adjacent G42), and 5'-end (G2) (Figure 4.15B). Taken together, our 13C relaxation data (Figures 4.14 and 4.15) indicate that the PL, PTL, and U43 bulge are dynamic "hot spots" that undergo fast ps-ns motions (Figure 4.15C).

4.4.2 Full-length ε
experiences slow motions on the μs-ms time scale

Given that previous measurements [329] hint at μs-ms motions within AL ε, we sought to unambiguously probe for such conformational exchange in full-length ε. We therefore employed 1H CPMG relaxation dispersion measurements (Section 3.3 and Figure 3.8A) [248-250]. To alleviate spectral overlap and circumvent deleterious 13C-13C scalar coupling [271,272], atom-specifically labeled (i.e., [1',8-13C2]-Ade/Gua and [1',6-13C2, 5-2H]-Cyt/Uri) ε samples (Figure 4.13) were used in 1H CPMG experiments. CPMG curves were obtained for all well-resolved Pur-H8, Pyr-H6, and ribose H1' nuclei in full-length ε, as outlined in Section 3.3. This analysis indicated that several nucleotides in the PL (C14, U15, U17-C19, U48, and U49), PTL (U32, U34, and C36), and U43 bulge experience conformational exchange whereas the majority of helical nucleotides do not (Figure 4.16). As with our 13C spin relaxation data, our 1H CPMG data show nucleotides with different motions in their nucleobase and ribose moieties. For example, exchange was observed in both nucleobase and ribose nuclei for U17, U18, U32, and U43 (Figure 4.16). Conversely, C14, C19, and U34 displayed Rex only in their nucleobase nuclei whereas U15, C36, U48, and U49 exhibited exchange only in their ribose nuclei (Figure 4.16).

Figure 4.16. 1H CPMG experiments reveal chemical exchange in full-length ε [281]. 1H CPMG relaxation dispersion profiles are shown for nucleobase (i.e., Pyr-H6) and ribose (i.e., H1') nuclei showing evidence of exchange (i.e., a non-flat dispersion where Rex > 0). NMR data were collected at 800 MHz and 25 °C. All CPMG curves are shown with best fits to the Luz-Meiboom (fast exchange) model using ShereKhan [363]. Representative nucleobase (C24-H6) and ribose (C24-H1') profiles that do not experience conformational exchange (i.e., Rex = 0) are shown as a reference. Error bars represent ± SD.
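The fast-exchange fits referred to in the Figure 4.16 caption use the Luz-Meiboom model. As a minimal sketch, assuming the standard textbook form R2,eff = R2,0 + (Φex/kex)[1 - (4νCPMG/kex)·tanh(kex/(4νCPMG))] with νCPMG = 1/(2τcp), the shape of a dispersion curve and the lifetime τex = 1/kex can be computed as below. The function names and all parameter values are illustrative assumptions; they are not taken from ShereKhan or from the fitted values in Table A.3.

```python
import math

def luz_meiboom_r2eff(nu_cpmg, r2_0, phi_ex, kex):
    """Effective R2 (s^-1) under two-site fast exchange (Luz-Meiboom form).

    nu_cpmg : CPMG frequency in Hz
    r2_0    : exchange-free transverse relaxation rate in s^-1
    phi_ex  : pA * pB * (delta omega)^2 in rad^2 s^-2
    kex     : exchange rate in s^-1
    """
    x = kex / (4.0 * nu_cpmg)
    # (4*nu/kex) * tanh(kex / (4*nu)) is written here as tanh(x) / x
    return r2_0 + (phi_ex / kex) * (1.0 - math.tanh(x) / x)

def tau_ex_us(kex):
    """Exchange lifetime in microseconds, tau_ex = 1 / kex."""
    return 1.0e6 / kex

# Hypothetical parameters, chosen only to illustrate the curve shape.
nu_values = (50.0, 100.0, 250.0, 500.0, 1000.0)
curve = [luz_meiboom_r2eff(nu, r2_0=15.0, phi_ex=4.0e4, kex=4000.0)
         for nu in nu_values]
```

In this form R2,eff falls monotonically toward R2,0 as νCPMG increases, which is the "non-flat dispersion" signature used to identify exchanging nuclei; a flat profile corresponds to Φex = 0.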
All CPMG dispersion curves were then fit to a two-site exchange model and the appropriate exchange regime (i.e., fast or slow exchange) was selected based on chi-squared minimization [363] (Table A.3). Moreover, all CPMG data were analyzed with three different fitting programs to ascertain the robustness of the fits (see Section 4.4.5 and Table A.4). These analyses revealed that all nuclei exhibit exchange that is fast on the NMR time scale (Tables A.3 and A.4). Therefore, and as discussed in Section 3.3.1.1, we are only able to extract kex and τex [250,253,278]. Interestingly, conformational exchange processes in ε occur with a variety of kex values (Table A.3 and Figure 4.17A). Considering such events in both nucleobase and ribose moieties, PL nuclei have kex values ranging from ~1,700-4,700 s-1, those in the PTL have kex values between ~3,400-11,700 s-1, and U43 has a kex of ~3,800-4,600 s-1 (Table A.3 and Figure 4.17A). These rates correspond to τex values of ~200-600, ~90-290, and ~220-260 μs for nuclei in the PL, PTL, and U43 bulge, respectively (Table A.3 and Figure 4.17B). Taken together, our 1H CPMG data reveal μs-ms chemical exchange that localizes to the same structural regions (i.e., PL, PTL, and U43 bulge) (Figure 4.17) as the motions on the ps-ns time scale (Figure 4.15C), implying that relatively slower motions are superimposed onto faster ones within full-length ε.

Figure 4.17. Quantifying chemical exchange processes in full-length ε [281]. (A) All kex and τex values, derived from the 1H CPMG data presented in Figure 4.16 and Table A.3, are shown with ε structural regions abbreviated and colored as in Figure 4.7. The kex values are the direct output from Luz-Meiboom (fast exchange model) fitting with ShereKhan [363] and τex = 1/kex. Error bars represent ± SD. (B) Summary of 1H CPMG data mapped onto the top-ranked ε NMR conformer (PDB 6var [20]).

4.4.3 MD simulations show high-frequency motions within full-length ε
To gain further insight into the conformational dynamics of full-length ε, MD simulations were employed. This computational technique provides some structural insight and reports on motions that occur on time scales between those of NMR spin relaxation and relaxation dispersion (i.e., ns-μs motions). Three independent 1 μs simulations were carried out on ε NMR conformer 3 [20], designated R3. This model was chosen due to its unique PL orientation [20]. A large ensemble of states is evident from overlays of ε conformations sampled throughout an MD run (Figure 4.18A). When aligning the backbone phosphorus atoms of all nucleotides, the ensemble centers around an average fold, similar to that of the deposited NMR model [20]. However, when the alignment only includes nucleotides in the LH, this no longer holds true. Instead, a vast conformational ensemble is observed with large deviations within the remaining portion of the RNA (Figure 4.18A). Taken together, this suggests that global flexibility of full-length ε correlates with the flexibility of its PL, PTL, and U43 bulge (Figure 4.18). In agreement with this idea, MD simulations show elevated RMSD values (>1 SD above the mean) for nucleotides in the PL (A13-A21 and U48-G50), PTL (C31-C37), and U43 bulge (G42-G44), indicative of flexibility far above that in helical regions (Figure 4.18B). While variations exist between the three MD runs (Figure A.34), the overall trends are unambiguous. Moreover, the MD data (Figure 4.18B) agree with the 13C spin relaxation data (Figures 4.14 and 4.15), and we therefore conclude that ε exhibits increased sampling on the ns-μs time scale within the same structural regions (i.e., PL, PTL, and U43 bulge) that undergo both ps-ns and μs-ms motions (Figures 4.16 and 4.17).

Figure 4.18. MD simulations reveal high-frequency motions within full-length ε [281].
(A) Representative structural overlay of PDB snapshots taken every 10 ns of the first (of three) 1 μs MD trajectory, with backbone phosphorus atom alignments shown using all (i.e., G2-C61) (left) or LH (i.e., G1-U12, G50-C60) (right) nucleotides. (B) Whole-atom RMSD averaged over the entire 1 μs trajectory (see Section 4.4.5.4). Error bars represent ± SD and the dashed line represents 1 SD above the mean (calculated from helical nucleotides). In A and B, ε structural regions are abbreviated and colored as in Figure 4.7.

To provide structural insight into the conformational sampling of ε, PDB snapshots were taken throughout the MD trajectories. These analyses reveal that nucleotides in the PL undergo a series of nucleobase flipping events and backbone reorganizations on the ns-μs time scale. For example, the nucleobases of A13 and C14 move toward one another into a partially stacked conformation. The nucleobase of G16, on the other hand, flips away from the PL central cavity to partially stack with U15 (Figure 4.19). Finally, the nucleobases of U17, U18, and C19 flip in and out of the PL central cavity, facilitated by kinking of their backbone dihedrals (Figure 4.19). Similar types of motions are also observed in the PTL and U43 bulge, though to a lesser extent (Figure 4.19). Specifically, the nucleobases of U32, G33, U34, and C36 flip back and forth whereas those of C31 and G35 are relatively more stable (Figure 4.19). Moreover, the PTL backbone, which is initially kinked at G33, appears to unwind, resulting in U34 and G35 switching orientation around G33 (Figure 4.19). Finally, motion in the nucleobase of U43 around the ribose of G42 is accompanied by backbone reorganization (Figure 4.19).

Figure 4.19. Structural insight into ns-μs sampling within full-length ε [281].
Representative PDB snapshots are shown at equally spaced time points (i.e., 0, 250, 500, 750, and 1,000 ns) along the first (of three) 1 μs MD trajectory and are aligned at the backbone phosphorus atoms of either the PL nucleotides (i.e., A13-A20, U49, and G50) (for PL snapshots) or the UH nucleotides (i.e., A21-U48) (for PTL and U43 snapshots). ε structural regions are colored as in Figure 4.7.

4.4.4 Full-length ε undergoes vast conformational sampling

We then performed K-means clustering analyses on all the conformations sampled in each of the three 1 μs MD trajectories. Clustering was carried out using either the global RMSD of the backbone phosphorus atoms of nucleotides G2-C61 or backbone and chi dihedral angles from PL nucleotides A13-A20. We used five clusters and identified centroid structures (designated c0, c1, c2, c3, and c4), which are the MD-sampled conformations that have the lowest cumulative distance to every other point in a given cluster. Centroid structures are therefore representative conformations of the five clusters, where c0 represents the most populated cluster and c4 the least. Interestingly, individual simulations largely sampled individual clusters, with minimal sampling across clusters (Tables A.5 and A.6). However, when clustered as an ensemble (i.e., including all three R3-based MD runs), there was a high degree of correlation between centroid structures identified by clustering on global RMSD and centroid structures identified by clustering on the PL backbone dihedral angles (Figure 4.20).

Figure 4.20. Global ε motions correlate with priming loop conformations [281]. (A) 2D-RMSD plot comparing centroid structures from K-means clustering analysis of the set of three R3-based 1 μs MD runs using global RMSD of the backbone phosphorus atoms of nucleotides G2-C61 (x-axis) and backbone and chi dihedral angles (i.e., α, β, γ, δ, ε, ζ, and χ) of the PL nucleotides A13-A20 (y-axis).
(B) Representative structural overlays with percentage of ensemble from global RMSD (first percentage) and PL dihedral (second percentage) clustering. Centroid structures of interest are shown with ε structural regions abbreviated and colored as in Figure 4.7 and all remaining structures are shown transparent. All centroid structures are aligned at the backbone phosphorus atoms of the PL (i.e., A13-A20, U49, and G50). (C) Same 2D-RMSD plot as in A except clustering by backbone and chi dihedral angles of the PTL nucleotides C31-C36 (left) or LH nucleotides G2-A13 and U49-C61 (right).

Figure 4.20A shows a 2D-RMSD plot comparing the backbone phosphorus atom RMSD of centroid structures from global RMSD (x-axis) and PL dihedral (y-axis) clustering. Comparatively low backbone RMSD along the diagonal of the 2D-RMSD plot suggests that these centroid structures (e.g., c0 from RMSD clustering and c0 from PL dihedral clustering) have the same overall conformations (Figure 4.20A and B). Importantly, this correlation is absent when clustering with PTL or LH nucleotide dihedral angles (Figure 4.20C) instead of the PL nucleotides. Taken together, these data suggest that global ε motions uniquely correlate with PL conformations.

We then carried out additional 1 μs MD trajectories on all ε NMR conformers R1-R10 [20]. As one means of quantifying the global sampling of ε, we measured the overall global bend between the helices. To this end, a simple interhelical angle convention was used, where the centers of mass of the first turn of the LH, the PL and its flanking nucleotides, and the second turn of the UH defined the three points (Figure 4.21A). Using data from all MD trajectories (i.e., three of R3 and new R1-R10), the sampling of interhelical angles from 65-180° suggests a large conformational space (Figure 4.21B). Next, we employed principal component analysis (PCA) of the covariance matrix of all MD trajectories to extract the dominant modes of ε motion sampled in these simulations.
PC1 and PC2 account for 43.6 and 18.8% of the eigenvalues in the covariance matrix and therefore together describe the majority (>61%) of motional modes. While full-length ε samples a vast conformational space (Figure 4.21C and D), no single simulation visits all the possible conformations, indicating that our simulations have not converged. Instead, our simulations explore potential conformations and demonstrate that wide global rearrangements in ε are possible and likely occur on a time scale beyond the MD sampling (i.e., 1 μs).

Figure 4.21. Vast conformational sampling of full-length ε [281]. (A) Schematic definition of the interhelical angle mapped onto an MD-sampled ε conformer. Here, the first turn of the LH is G2-A6 and U56-C60, the PL and adjacent nucleotides are A13-A20, U48, and U49, and the second turn of the UH is U25-G30 and C37-G42. (B) Normalized population data of the distribution of the interhelical angle sampled for all 13 MD trajectories. (C) PCA of the modes of motion along PC1 and PC2 for all 13 MD trajectories. In B and C, R3a, R3b, and R3c refer to the initial three independent 1 μs MD runs of ε R3. (D) Vectors of amplitudes of motion of the PC1 projection shown on full-length ε with structural regions colored as in Figure 4.7. PC vector images were made using the NMWiz plugin for Visual Molecular Dynamics [364,365].

4.4.5 Experimental details

4.4.5.1 NMR sample preparation

Full-length 61 nt ε samples were prepared by IVT as in Section 2.2.4.1. For all NMR experiments, three atom-specifically labeled ε samples were used: (1) [1',8-13C2]-ATP and [1',8-13C2]-GTP labeled, (2) [1',6-13C2, 5-2H]-CTP labeled, and (3) [1',6-13C2, 5-2H]-UTP labeled (Figure 4.13). These labeling patterns approximate isolated heteronuclear spin pairs in both the nucleobase (i.e., Pur-H8-C8 and Pyr-H6-C6) and ribose (i.e., H1'-C1') moieties to reduce spectral crowding and permit NMR measurements free of complications from 13C-13C scalar [271,272] and dipolar [81,82,100,198,227,232] couplings. All ε
NMR samples were 0.5-0.7 mM in 300 μL of NMR buffer A (calculated using an extinction coefficient of 768.3 mM-1 cm-1).

4.4.5.2 13C spin relaxation measurements

All NMR experiments were performed at 25 °C on either an Avance III Bruker Ultrashield Plus 600 MHz spectrometer with a room temperature triple resonance probe or an 800 MHz Avance III Bruker spectrometer equipped with a triple resonance cryogenic probe. TROSY-detected measurements of 13C R1 and R1ρ relaxation rates and steady-state {1H}-13C hNOE measurements were adapted from previous pulse sequences (Figure 3.5) [190,191,195]. R1 and R1ρ experiments were each collected in an interleaved manner as a pseudo-3D experiment with relaxation delays specified in Table A.7 and using 215 (1JH8C8), 185 (1JH6C6), and 170 Hz (1JH1'C1') couplings [38] for coherence transfers. In these experiments, recycle delays of 1.5 and 2.5 s were used at 600 and 800 MHz, respectively. In the R1ρ experiments, ω1 and Ω were 1.882 kHz and 6 Hz, respectively. In the saturation hNOE experiments, recycle delays of 1.5 s were used and saturation delays of 5.0 and 7.0 s were used at 600 and 800 MHz, respectively. Proton saturation was achieved using a hard 180° pulse. In the no saturation hNOE experiments, delays of 6.5 and 8.5 s were used at 600 and 800 MHz, respectively, to match delays in the saturation experiments. Additional experimental parameters are provided in Table A.8. 13C relaxation measurements were carried out at 600 and 800 MHz for nucleobase sites and only at 800 MHz for ribose sites. The latter was due to worse signal-to-noise and spectral overlap in the ribose region at lower magnetic field. Due to spectral overlap, data from certain resonances were excluded from analysis (Table A.9). NMR spectra were processed and analyzed using TopSpin 4.0, NMRFx Processor, and NMRViewJ [196]. R1 and R1ρ relaxation rates were determined by fitting peak intensities to a monoexponential decay.
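The monoexponential fit named above can be sketched as follows. This is a minimal illustration that assumes I(t) = I0·exp(-R·t) and uses a linear least-squares fit on ln(I); the fitting software used in this work may instead perform a nonlinear fit with intensity weighting. The function name, delays, and intensities are hypothetical.

```python
import math

def fit_monoexponential(delays, intensities):
    """Fit I(t) = I0 * exp(-R * t) by linear least squares on ln(I).

    delays      : relaxation delays in s
    intensities : positive peak intensities (arbitrary units)
    Returns (R, I0), with R in s^-1.
    """
    n = len(delays)
    log_i = [math.log(i) for i in intensities]
    t_mean = sum(delays) / n
    y_mean = sum(log_i) / n
    sxx = sum((t - t_mean) ** 2 for t in delays)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(delays, log_i))
    slope = sxy / sxx                      # slope of ln(I) versus t equals -R
    i0 = math.exp(y_mean - slope * t_mean)
    return -slope, i0

# Hypothetical noise-free data generated with R = 2.5 s^-1 and I0 = 100.
demo_delays = [0.02, 0.04, 0.08, 0.16, 0.32]
demo_intensities = [100.0 * math.exp(-2.5 * t) for t in demo_delays]
demo_rate, demo_i0 = fit_monoexponential(demo_delays, demo_intensities)
```

A log-linear fit overweights the noisy, low-intensity points at long delays, so it is best treated as a quick check rather than a substitute for a weighted nonlinear fit.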
Uncertainties in R1 rates were estimated by propagating the error in peak intensities from duplicated delay points (indicated by "x2" in Table A.7). Uncertainties in R1ρ rates were estimated from spectral signal-to-noise with RELAXFIT [197]. R2 rates were corrected for the off-resonance ω1 using Equations 2.1 and 2.2 [190,191]. The hNOE values were obtained using Equation 3.3 [221,222] and their uncertainties were estimated by propagating the error in peak intensities from duplicated experiments. We did not conduct a Model-Free [226] analysis of the 13C relaxation data given its many assumptions (e.g., molecular rotational anisotropy, 13C-13C dipolar couplings, and uncertainty in nucleobase CSA values, especially in different structural regions). Instead, we opted for a qualitative analysis, as previously described [228]. That is, local motions were determined from individual relaxation parameters, whereby elevated R1 and hNOE values (>1 SD above the mean) and attenuated R2 and R2/R1 values (<1 SD below the mean) indicate fast motions on the ps-ns time scale.

4.4.5.3 1H CPMG measurements

TROSY-detected 1H Carr-Purcell-Meiboom-Gill (CPMG) experiments were adapted from previous pulse sequences (Figure 3.9) [272,276,277] and collected at 25 °C on an 800 MHz Avance III Bruker spectrometer equipped with a triple resonance cryogenic probe. CPMG modules were used to obtain R2,eff as a function of νCPMG during Trelax, as described in Section 3.3.1.3 [279]. In the CPMG experiments, 215 (1JH8C8), 185 (1JH6C6), and 170 Hz (1JH1'C1') couplings [38] were used for coherence transfers, recycle delays of 1.5 s were used, Trelax was set to 40 ms, and the νCPMG values are specified in Table A.10. Additional experimental parameters are provided in Table A.11. Due to spectral overlap, data from certain resonances were excluded from analysis (Table A.9). NMR spectra were processed and analyzed with TopSpin 4.0, NMRFx Processor, and NMRViewJ [196].
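For a constant-time CPMG experiment such as the one described above, R2,eff is conventionally obtained from the peak intensity at each νCPMG relative to a reference spectrum recorded without the relaxation period. The sketch below assumes this conventional relation, R2,eff = -(1/Trelax)·ln(I(νCPMG)/I0), and the 40 ms Trelax stated above; whether this matches the equation used in Chapter 3 term for term is an assumption, and the intensities are hypothetical.

```python
import math

T_RELAX = 0.040  # constant-time relaxation period in s (40 ms)

def r2_eff(intensity, reference_intensity, t_relax=T_RELAX):
    """R2,eff = -(1 / Trelax) * ln(I(nu_CPMG) / I0), in s^-1."""
    if intensity <= 0 or reference_intensity <= 0:
        raise ValueError("peak intensities must be positive")
    return -math.log(intensity / reference_intensity) / t_relax

# Hypothetical peak intensities (arbitrary units) keyed by nu_CPMG in Hz.
reference = 1.0e6  # reference spectrum without the CPMG period
profile = {50.0: 4.2e5, 500.0: 4.8e5, 1000.0: 5.0e5}
dispersion = {nu: r2_eff(i, reference) for nu, i in profile.items()}
```

Here the intensity rises with νCPMG, so R2,eff falls, which is the non-flat profile expected for a nucleus undergoing exchange.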
The R2,eff rates were determined according to Equation 3.19 [278] and uncertainties in R2,eff rates were estimated from spectral signal-to-noise with RELAXFIT [197]. The presence of chemical exchange was determined by identifying which nuclei displayed non-flat dispersion profiles. Then, the appropriate exchange regime (i.e., fast or slow exchange) was selected based on which model minimized the chi-squared using ShereKhan [363]. CPMG curves were also fit to each of the three exchange models (i.e., fast, slow, or no exchange) using RING NMR Dynamics [278] and the appropriate model was selected on the basis of the Akaike information criterion (AIC) [280]. The CPMG data were fit a final time using rdnmr (version 1.5) [254] and the outputs were compared across the three programs. All exchange processes were determined to be in the fast exchange regime, modeled as in Equation 3.20 [243,250], and the extracted parameters from each analysis program showed strong agreement (Table A.4).

4.4.5.4 MD simulations (by Drs. Christina Bergonzo and Wojciech Kasprzak)

The Amber20 software package [366] was used to perform MD simulations with the ff99LJbb [367] force field (source file leaprc.RNA.LJbb), which combines the OL3 [368] parameter set, the Steinbrecher and Case phosphate oxygen van der Waals radii [369], and the OPC water model [370,371] to model RNA. The receptor PDB files were input into the Amber LEaP module, which combined them with OPC waters, Joung-Cheatham [372] monovalent ions (Na+/Cl-), and the RNA-specific force field parameters mentioned above to generate the topology and coordinate files. Explicit solvent particle mesh Ewald (PME) molecular dynamics simulations were utilized [373]. RNA was placed in a cuboid solvent box with OPC waters and the minimum distance between the solute and solvent box boundary was set at 12 Å. The net solute charge was neutralized with Na+ ions, and additional Na+/Cl- ion pairs were added to simulate a net 0.15 M salt concentration for the entire system.
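The number of Na+/Cl- pairs needed for a target salt concentration follows from N = c·NA·V. The sketch below is a rough estimate only: it assumes a hypothetical 100 Å cubic box and uses the full box volume, ignoring the volume excluded by the RNA and any density change during equilibration. The box dimensions actually used in this work are not stated here, and the function name is illustrative rather than part of any Amber tool.

```python
AVOGADRO = 6.02214076e23             # mol^-1
LITERS_PER_CUBIC_ANGSTROM = 1.0e-27  # 1 A^3 = 1e-30 m^3 = 1e-27 L

def ion_pairs_for_concentration(molar, box_lengths_angstrom):
    """Estimate the ion pairs needed to reach `molar` (mol/L) in a box.

    box_lengths_angstrom : (a, b, c) edge lengths of a cuboid box in Angstrom
    """
    a, b, c = box_lengths_angstrom
    volume_liters = a * b * c * LITERS_PER_CUBIC_ANGSTROM
    return round(molar * AVOGADRO * volume_liters)

# Hypothetical 100 x 100 x 100 Angstrom box at 0.15 M.
n_pairs = ion_pairs_for_concentration(0.15, (100.0, 100.0, 100.0))
```

The neutralizing Na+ ions are counted separately, since their number is set by the net charge of the RNA rather than by the bulk salt concentration.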
Simulations were run with 2 femtosecond (fs) time steps, employing the SHAKE algorithm to constrain all bonds involving hydrogen atoms in the system. The Berendsen thermostat [374] and pressure-coupling algorithm were used to maintain the simulation temperature at 26.85 °C and the pressure at 1.0 pascal (Pa) in the NPT simulations used in all phases of MD. A cut-off of 9 Å for the non-bonded interactions was used and explicit solvent periodic boundary conditions were employed. A 12-step equilibration protocol was used in all simulations that started with energy minimization of the solvent (while the RNA was restrained), followed by multiple short phases of heating to 26.85 °C, dynamics at 26.85 °C, and energy minimizations with gradually decreasing harmonic restraints applied to the solute. The last phase of the equilibration protocol was an unrestrained heating to 26.85 °C, ramped up over 0.2 ns and kept at the steady target temperature for a total time of 2 ns. Following equilibration, unrestrained (production) MD simulations were performed for 1 μs for ε R3 (PDB 6var [20]).

For the repeated simulations, all ε NMR conformers R1-R10 (PDB 6var [20]) were solvated in an octahedral TIP4Pew [375] water box with a 10 Å buffer between the solute and the edge of the box. The ff99LJbb [367] force field previously described was used, combined with Joung-Cheatham [372] monovalent ion parameters. Sodium ions were added to neutralize the RNA charge, and an effective salt concentration of 10 mM NaCl was added to match the 6var experimental conditions [20]. The protocol described by Roe and Brooks was used to prepare the system for dynamics [376]. MD simulations were run at constant volume using the pmemd.cuda.MPI implementation [377] in Amber20. The Langevin thermostat [378] was used to maintain the temperature at 26.85 °C, with a collision frequency of 2 ps-1. Random seeds were used for each restart to avoid synchronization artifacts [379]. Hydrogen mass repartitioning was used so a 4 fs timestep could be employed [380].
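The simulation settings above imply a few simple quantities that are easy to check: the number of integration steps in a 1 μs production run at each time step, and the absolute temperature corresponding to 26.85 °C. The short sketch below works these out; the helper names are illustrative and are not Amber input keywords.

```python
FS_PER_US = 10**9  # femtoseconds per microsecond

def n_steps(length_us, timestep_fs):
    """Number of integration steps for a run of `length_us` microseconds."""
    return (length_us * FS_PER_US) // timestep_fs

def celsius_to_kelvin(t_celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_celsius + 273.15

steps_2fs = n_steps(1, 2)   # 1 us production run with a 2 fs time step
steps_4fs = n_steps(1, 4)   # 1 us run with hydrogen mass repartitioning (4 fs)
target_kelvin = celsius_to_kelvin(26.85)
```

Doubling the time step halves the number of steps for the same simulated time, which is the practical benefit of hydrogen mass repartitioning; the target temperature corresponds to 300 K.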
SHAKE and default PME settings were used as described above. The Amber CPPTRAJ [381] module was used for analysis. RMSD calculations were performed using the MD trajectories (excluding the 2 ns equilibrations) with ε R3 used as the reference structure. The 1 μs trajectories were sampled in 0.1 ns steps to yield 10,000 data points and calculations were made considering all atoms within a 3 nt window (i.e., G1-U3, ..., A59-C61). K-means clustering analysis was employed either individually for each of the three ε R3-based 1 μs MD runs or on all three as an ensemble. Clustering was carried out using either the global RMSD of the backbone phosphorus atoms of nucleotides G2-C61 or backbone and chi dihedral angles (i.e., α, β, γ, δ, ε, ζ, and χ) from PL nucleotides A13-A20. From these data, five representative centroid structures were selected that minimize the distance to all other points in the cluster. CPPTRAJ [381] was used a final time for measuring interhelical angles and PCA.

4.5 Conclusion

The cis-acting regulatory stem-loop ε plays a central role in the HBV life cycle through its interaction with P [292,295,297,304,317-31]. Unfortunately, lack of structural data on P limits our understanding of this interaction and its subsequent functions. Our recent NMR structure of full-length 61 nt ε [20] provides an important starting point for an improved understanding of HBV replication. In addition to structural data, initial 13C ηxy relaxation measurements suggest that nucleotides in the PL, PTL, and U43 bulge are flexible on the ps-ns time scale (Figure 4.11B) [20]. Given that these regions are required for the ε-P interaction [323,324] and its subsequent pgRNA packaging [298,303,310,325] and DNA synthesis [301,310,323,325] (Figure 4.6 and Tables 4.1 and 4.2), our data imply the functional importance of these motions. Whether or not ε
by itself experiences conformational dynamics on additional time scales beyond ps-ns was unknown until this work, and we anticipate that such knowledge will add valuable insight into the ε-P binding mechanism.

4.5.1 ε dynamics occur in highly conserved nucleotides

To begin to fill this critical knowledge gap, we combined NMR and MD simulations to provide a robust description of full-length ε dynamics. 13C spin relaxation measurements were used to assess fast motions on the ps-ns time scale. In agreement with our recent work [20] and previous NMR studies [329], nucleotides in the PL (C14-C19 and adjacent U48 and U49), PTL (U32-C36 and adjacent G30), and U43 bulge (and adjacent G41 and G42) undergo fast ps-ns motions (Figures 4.14 and 4.15). To unambiguously determine if ε experiences motion on the slower μs-ms time scale, we employed 1H CPMG relaxation dispersion measurements. Interestingly, nucleotides in the PL (C14, U15, U17-C19, U48, and U49), PTL (U32, U34, and C36), and U43 bulge all experience conformational exchange (Figure 4.16) that is fast on the NMR time scale, albeit with various exchange rates and lifetimes (Figure 4.17 and Table A.3). These observations suggest that slower motions are superimposed onto faster ones, with the notable exception of G16 and G33 in the PL and PTL, respectively (Figures 4.14-4.17). To gain additional insight into the conformational dynamics of ε, MD simulations were carried out on ε R3 [20]. RMSD analysis reveals that nucleotides in the PL (A13-A21 and U48-G50), PTL (C31-C37), and U43 bulge (G42-G44) show increased sampling on the ns-μs time scale (Figure 4.18), consistent with our NMR data (Figure 4.15C). To provide structural insight into local dynamics in full-length ε, we took PDB snapshots along our ε R3 MD trajectories. These analyses reveal that nucleotides in the PL, PTL, and U43 bulge undergo a series of nucleobase flipping events and backbone reorganizations (Figure 4.19).
While the MD runs do not extend to the time scale that our 1H CPMG data report on, these data may nevertheless be useful in hinting at the structural rearrangements that take place on the longer μs-ms time scale.

To better understand the conformational variety in full-length ε, we clustered together all MD-sampled conformations of ε R3 (Figure 4.20). Interestingly, there was a correlation between the dominant PL orientations (cluster centroids from backbone and chi dihedral angle clustering) and the global motions of the RNA (cluster centroids from backbone phosphorus RMSD clustering) (Figure 4.20). Indeed, the representative centroid structures display a wide variety of both PL conformations and global sampling, suggesting that the high degree of PL flexibility prevents ε from adopting a single, stable conformation (Figure 4.20B). Instead, PL flexibility enables global ε flexibility.

To gain a better understanding of the conformational space of ε, we carried out μs-length MD simulations on all ε NMR conformers R1-R1020. We observed a vast amount of sampling, as measured by interhelical angles (Figure 4.21A and B) and PCA (Figure 4.21C and D). Given that there was no obvious convergence in our MD simulations, global rearrangements in ε likely occur on a time scale beyond the MD sampling (i.e., 1 μs). This is consistent with the overall interpretation of our 1H CPMG data, which suggests that ε samples lowly populated, transient states on the μs-ms time scale (Figures 4.16 and 4.17 and Table A.3).

Taken together, our combined NMR and MD data suggest that ε undergoes conformational dynamics on multiple time scales (Figure 4.22A). Moreover, these motions are localized to nucleotides that are highly conserved and functionally important (Figure 4.22B). That is, every dynamic nucleotide in the PL, PTL, and U43 bulge is completely conserved except C14, G16, U32, and G35, and all nucleotides but G42 vary in fewer than three HBV strains (Figure 4.22).
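The clustering analysis relies on picking, for each cluster, a representative structure that minimizes the distance to all other members. The sketch below shows that selection step on toy data. It is only an illustration: the real features were phosphorus RMSDs or backbone and chi dihedral angles processed with CPPTRAJ, whereas here each frame is a hypothetical two-dimensional feature vector compared by Euclidean distance, and the function name `medoid_index` is invented for this example.

```python
import math


def medoid_index(points):
    """Return the index of the point with the smallest summed distance to all others."""
    best_index, best_total = None, math.inf
    for i, p in enumerate(points):
        total = sum(math.dist(p, q) for q in points)
        if total < best_total:
            best_index, best_total = i, total
    return best_index


# Hypothetical feature vectors for the frames assigned to one cluster.
cluster_frames = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (5.0, 5.0)]
representative = medoid_index(cluster_frames)
print("representative frame:", representative, cluster_frames[representative])
```

Choosing an actual sampled frame, rather than the arithmetic mean of the features, guarantees that the representative corresponds to a physically realized conformation.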
While nucleotides in rigid structural elements (e.g., helices) may be conserved to maintain their structure, the motions in the non-helical PL, PTL, and U43 bulge nucleotides are likely conserved for their biological function.

4.5.2 Conformational dynamics of ε: implications for HBV replication

The diversity of RNA sequence- and structure-specific requirements for HBV replication (Figure 4.6 and Tables 4.1 and 4.2) suggests that each step (i.e., P binding, protein-priming, pgRNA packaging, and DNA synthesis) may require a distinct conformation of ε and P. Moreover, there are ε conformational states beyond unbound, P-bound, and priming-competent, as evidenced by the case where ε-P binding occurs without protein-priming324. Nucleotide motions may therefore enable binding to P and cellular HFs, and facilitate conformational changes required for protein-priming, pgRNA packaging, and DNA synthesis, as suggested for the related duck HBV304,312,382.

Figure 4.22. Full-length ε dynamics occur in highly conserved nucleotides281. (A) NMR and MD data mapped onto the secondary structure of full-length ε. Even though each nucleotide is denoted as experiencing ps-ns and/or μs-ms motions, there are instances where this is true only for the nucleobase (i.e., Pur-C8/H8 or Pyr-C6/H6) or ribose (C1′/H1′) moieties (Figures 4.14-4.17). Dynamic nucleotides from the 5′- and 3′-ends have been excluded given that they likely experience motions due to base pair fraying. (B) Nucleotide mutations and conserved mobile nucleotides mapped onto the secondary structure of full-length ε. Nucleotides with at least one mutation described among 1,025 HBV strains from the literature, and/or observed by Wijmenga and co-workers328 in 205 strains, are shown in gray. Mutations shown in parentheses have been seen in fewer than three strains, and (*) denotes a coincidental double base pairing mutation in one strain.
Conserved mobile nucleotides were shown to be dynamic by both NMR (13C spin relaxation or 1H CPMG) and MD.

Using this framework, we hypothesize that dynamics may be linked to function as follows. Given that the PL and U43 bulge are required for P binding (Figure 4.6 and Tables 4.1 and 4.2)323,324, nucleotides with fast ps-ns motions in these regions could facilitate this interaction. Although the PTL is dispensable for P binding (Figure 4.6 and Tables 4.1 and 4.2)324, it is proposed to interact with cellular HFs that are needed for protein-priming and pgRNA packaging337–345. Again, ps-ns motions in the PTL may promote binding to HFs. Both scenarios involve conformational capture mechanisms, whereby mobile nucleotides facilitate binding interactions. Superimposed on these fast motions, it is likely that a series of complex conformational changes can alter contact networks between ε, P, and HFs to encourage protein-priming, pgRNA packaging, and DNA synthesis. Fast exchange rates in PTL nucleotides could initiate the transition of the ε-P-HF complex to its priming-competent state. Then, other exchange processes may position ε within the P complex and cellular HFs to prepare the priming complex and present the 5′-end of the PL to the TP domain to prime and initiate reverse transcription. In the absence of additional experimental data, this must remain purely speculative.

In conclusion, our combined NMR and MD data indicate a series of complex motions on multiple time scales within full-length ε (Figure 4.22A). This work provides a useful follow-up to our recent ε NMR structure20 and underscores an important reality: rather than a single, stable structure, ε exhibits a dynamic conformational ensemble with motions that occur in conserved structural regions (i.e., PL, PTL, and U43 bulge) that are critical for function (Figure 4.22). Taken together, our work strongly implicates ε dynamics as an integral part of HBV replication.
Testing this proposed dynamic model would require NMR measurements of ε in the presence of P, or at least its RT and TP domains, which at present is not feasible. Our dynamic description of unbound ε therefore provides a necessary starting point for a detailed understanding of how RNA dynamics regulates HBV replication, which must take into account important metal ion requirements (e.g., Mg2+), structural dynamics of protein cofactors, and their interplay within the cellular context. Given the importance of the ε-P292,295,297,304,317–31 interaction for HBV replication, ε is an attractive therapeutic target. The following chapter describes our efforts to discover novel ε-targeting ligands as potential anti-HBV therapeutics.

5 Discovery of potential anti-HBV therapeutics

*This chapter is adapted from the following20.

5.1 Introduction

The previous chapter provided an in-depth look at the HBV life cycle and the structural dynamic characterization20,281 of one of its key actors, ε. This chapter describes how these biophysical data can be leveraged to discover new anti-HBV therapeutics. To start, we will discuss the global burden of HBV (Section 5.2). Then, we will summarize the existing treatments for chronic HBV (cHBV) (Sections 5.3 and 5.4). Given the role of ε in P binding323,324, pgRNA packaging298,303,310,325, and DNA synthesis301,310,323,325 (Figure 4.6 and Tables 4.1 and 4.2), ε is an attractive therapeutic target. As such, we will showcase various screening efforts (Section 5.5) to identify ε-targeting ligands as a novel means to potentially inhibit HBV. This section includes contributions from Dr. Andrew Longhini and collaborators from the NCI (Drs. Fardokht Abulwerdi, Stuart LeGrice, Wojciech Kasprzak, John Schneekloth Jr., and Bruce Shapiro). To conclude, we will summarize how our lead compounds can complement existing therapies (Section 5.6).
5.2 Global burden of HBV

Globally, approximately two billion people have been exposed to HBV and >300 million people are chronically infected, leading to ~600,000 deaths per year383,384. In fact, HBV accounts for ~30 and ~50% of all cases of cirrhosis (i.e., severe liver damage) and hepatocellular carcinoma (HCC), respectively385. HBV infection is therefore a global health burden, especially in developing countries. According to the World Health Organization (WHO), cHBV infection is highest in the Western Pacific and African Regions, where 116 and 81 million people are chronically infected, respectively (Figure 5.1). Even in developed countries, with robust vaccination regimes (i.e., four HBV vaccines) and increased availability of treatment, the burden of HBV-related disease remains high. In the United States, the prevalence of chronic HBV infection is estimated to be ~0.27% (i.e., 0.8-1.4 million)386 but can be as high as 10-15% (i.e., 3-5 million) in Asian American communities387. Moreover, approximately 65% of these chronically infected people are unaware of their infection383. The prevalence of cHBV infection in Europe, on the other hand, varies from ~0.2-7% depending on the country, infecting a total of ~14 million people (Figure 5.1). Given that up to 90% of these chronically infected people are unaware of their infection388, accurate data are lacking.

Figure 5.1. Chronic HBV infection statistics. Map of the WHO regions (left) and their total number of cHBV infections (right). Data were accessed from the WHO website (https://www.who.int/news-room/fact-sheets/detail/hepatitis-b) in March 2022.

5.3 Current treatments of chronic HBV infection

The ultimate goal of cHBV treatment is to prevent cirrhosis, liver failure, and HCC. Treatment end points are designed to correlate with clinical end points and can be classified as biochemical, virological, serological, and histological389.
The biochemical end point is a normalized level of alanine aminotransferase (ALT) whereas the virologic end point is suppression of HBV cccDNA to undetectable levels389. The serologic end point, on the other hand, refers to the loss or seroconversion of hepatitis B e and surface antigens (HBeAg and HBsAg, respectively)389. Finally, the histologic end point is a decrease in necrosis (i.e., liver tissue damage) and inflammation without increasing liver scarring389. Currently, there are eight Food and Drug Administration (FDA)-approved treatments for cHBV, which help patients reach these end points to varying degrees.

5.3.1 FDA-approved chronic HBV treatments

The eight FDA-approved treatments for cHBV infection include interferon (IFN)-α and its polyethylene glycol (PEG) containing form and six nucleos(t)ide RT inhibitors (NRTIs) (Figure 5.2). As their names imply, NRTIs prevent HBV (-)-DNA strand elongation by the RT domain of P390–392 (Figure 4.2). The exact mode of action of IFN-α, on the other hand, is less clear. In general, IFN-α is known to have antiviral, immunomodulatory, and antiproliferative effects393–395. Unfortunately, these treatments are not curative and involve life-long therapy (e.g., NRTIs) and adverse effects (e.g., NRTIs and IFN-α)389–393.

Figure 5.2. Timeline of FDA-approved chronic HBV treatment. Trade names of NRTIs and IFN-α are shown in blue and purple, respectively. Additional information can be found below.

5.3.1.1 NRTI treatment

The six FDA-approved NRTI treatments of cHBV are Lamivudine (LMV, Epivir), Adefovir dipivoxil (ADV, Hepsera), Entecavir (ETV, Baraclude), Telbivudine (TBV, Tyzeka), Tenofovir disoproxil fumarate (TDF, Viread), and Tenofovir alafenamide fumarate (TAF, Vemlidy) (Figures 5.2 and 5.3). NRTIs are prodrugs that require phosphorylation to their active 5′-triphosphate form by cellular kinases390.
Once active, NRTIs compete with natural dNTP substrates (e.g., dATP, dGTP, dCTP, and dTTP) for incorporation into the (-)-DNA strand by RT. Given that NRTIs lack a 2′-OH (Figure 5.3), they function as DNA chain terminators (Figure 4.4).

Figure 5.3. FDA-approved NRTIs for chronic HBV treatment. Structures of all prodrugs are shown with trade names in parentheses. Regions shown in blue denote structures that are removed during activation. All NRTIs require phosphorylation to their active 5′-triphosphate forms by cellular kinases390.

NRTIs are administered orally with a daily dose of 0.5-600 mg. Clinical trial data from cHBV-infected HBeAg-positive patients showed that after a year of NRTI treatment, 21-76% had an undetectable level of HBV cccDNA, 41-77% had a normalized level of ALT, but only 12-22% achieved HBeAg seroconversion and 0-3% lost HBsAg396–401. Similarly, data from cHBV-infected HBeAg-negative patients showed that a one-year NRTI therapy led to high levels of undetectable HBV cccDNA (51-93%), normalization of ALT levels (62-78%), but negligible HBsAg loss (~1%)396,397,402,403. Extending NRTI treatment to four-to-five years in both patient types led to an increase in HBeAg seroconversion (31-48%) but loss of HBsAg remained low (0-10%)396–403. While life-long NRTI therapy could, in principle, help reduce cHBV-related symptoms, HBV resistance to NRTIs limits their efficacy. That is, HBV resistance to NRTIs ranges from 0-27% and 0-80% after one and five years, respectively396,397,399,400,402–405.
To make matters worse, NRTIs are also associated with several adverse effects, some mild (e.g., headache, fatigue, and dizziness) and others more severe (e.g., increased liver toxicity, kidney tubule dysfunction, myopathy, neuropathy, and lowered bone mineral density)406.

5.3.1.2 IFN-α treatment

The two FDA-approved IFN-α treatments of cHBV are IFN-α-2b (Intron A) and PEG-IFN-α-2a (Pegasys) (Figure 5.2). IFN-α is a cytokine that is secreted by plasmacytoid dendritic cells407 as a soluble glycoprotein with potent antiviral activity408,409. IFN-α has been used to treat cHBV since 1976 and was the first FDA-approved cHBV treatment (Figure 5.2). The emergence of the PEG containing form (i.e., PEG-IFN-α-2a) in 2005, however, supplanted the standard IFN-α-2b treatment. Compared to NRTIs, much less is known regarding the exact mode of action of IFN-α treatment of cHBV. Nevertheless, the proposed mechanism of action of IFN-α treatment is two-fold. The first effect is to drive proliferation, activation, and antiviral potential of immune cells that are dysfunctional in cHBV-infected cells, which has been associated with a greater decline in HBsAg393–395. The second effect of IFN-α treatment is to inhibit early stages of HBV (-)-DNA strand synthesis, which is supported by multiple lines of evidence. First, IFN-α treatment has been shown to increase the expression of Apobec3G410, which inhibits early stages of HBV (-)-DNA strand synthesis342,343, as outlined in Section 4.2.6. Second, IFN-α treatment results in cccDNA-bound histone hypoacetylation and recruitment of cccDNA transcriptional co-repressors, reducing HBV cccDNA transcription411 (Figure 4.2).

PEG-IFN-α-2a is injected subcutaneously once a week393. Clinical trial data from cHBV-infected HBeAg-positive patients showed that PEG-IFN-α-2a therapy was more effective than LMV, and the addition of LMV to PEG-IFN-α-2a treatment had no added benefit412–414.
Specifically, a year of PEG-IFN-α-2a therapy led to modest levels of undetectable HBV cccDNA (25%), normalization of ALT levels (34-39%), HBeAg seroconversion (27%), and negligible loss of HBsAg (3%)412. Data from a similar trial in HBeAg-negative patients showed that a one-year PEG-IFN-α-2a treatment led to 63% undetectable HBV cccDNA and 4% HBsAg loss415. Extending PEG-IFN-α-2a therapy to four-to-five years led to varied results, depending on HBV genotype and patient origin416,417. As with NRTIs, PEG-IFN-α-2a treatment also comes with mild (e.g., fatigue, flu-like symptoms, and mood changes) and severe (e.g., bone marrow suppression and autoimmune illnesses) adverse effects389,393. Moreover, PEG-IFN-α-2a therapy has been reported to precipitate liver failure in patients with cirrhosis389.

NRTI and IFN-α treatments have their advantages and disadvantages. On the one hand, NRTIs are a potent anti-HBV therapeutic that can be easily administered (i.e., orally) with minimal adverse effects. On the other hand, NRTIs involve life-long treatment with resistance-related complications and are not effective enough to suppress genomic recycling of HBV cccDNA391,418,419. IFN-α therapy can lessen these burdens given its finite treatment course and ability to modestly suppress cccDNA420. However, IFN-α is not well tolerated due to many adverse effects and even its administration (i.e., subcutaneous injections) is a burden to patients. Taken together, NRTIs hold more promise for the future prevention of HBV. Initial work in this direction involves choosing NRTIs with high barriers to resistance or combining NRTI treatments in a way to reduce multidrug resistance.

5.3.2 Alternative NRTIs in clinical trials for chronic HBV treatment

In addition to the FDA-approved NRTIs described in Section 5.3.1.1 and shown in Figure 5.3, there are other promising NRTIs that are either approved or in clinical trials in the United States and elsewhere.
A few examples of such NRTIs are Besifovir dipivoxil maleate (BSV, Besivo), Tenofovir exalidex (TXL), and ATI-2173 (Figure 5.4). As a first example, BSV is a prodrug that was approved in South Korea in 2017. Clinical trial data showed that 48-week BSV therapy (150 mg) was as effective as ETV (0.5 mg)421 and TDF (300 mg)422 in treating cHBV but without eliciting renal toxicity. These findings were validated when extending the treatment to 96 weeks, with significantly reduced bone and liver toxicities and no observable drug resistance422,423. The only drawback of BSV was depletion of L-carnitine in patients, requiring carnitine supplementation423. Another example of alternative NRTIs is TXL, which shares the familiar Tenofovir (TEN) scaffold found in other NRTIs (e.g., TDF and TAF) but has been shown to be 100-fold more potent than TEN in vitro424. Initial clinical trials have administered TXL (up to 100 mg) to patients with cHBV, showing excellent tolerance and good pharmacokinetics. Given this preliminary success, TXL is currently in phase III trials.

Figure 5.4. NRTIs in clinical trials for chronic HBV treatment. Structures of all prodrugs are shown with blue regions to denote structures that are removed during activation. All NRTIs require phosphorylation to their active 5′-triphosphate forms by cellular kinases390.

The final example involves ATI-2173, an analog of Clevudine (CLV, Levovir), which was approved to treat cHBV in South Korea and the Philippines in 2006 but whose approval was later revoked due to complications with skeletal myopathy425. In its active 5′-triphosphate form, CLV has a long half-life (11 h) and is a competitive inhibitor of RT426,427, suppressing HBV replication for months after treatment and reducing cccDNA levels in animal models428.
Based on these data, CLV analog prodrugs were recently reinvestigated to avoid skeletal myopathy. ATI-2173 is the lead candidate, showing in vitro efficacy against HBV with significant reduction of cccDNA in the related woodchuck HBV model (unpublished). A phase I clinical trial for ATI-2173 was recently initiated.

5.4 Alternative anti-HBV therapies

5.4.1 Targeting RNase H activity

Much of the focus (e.g., NRTI-based therapies) of cHBV treatment is centered around targeting the RT domain of P. However, the RH activity of P is an attractive target in its own right. RH degrades the pgRNA as RT elongates the (-)-DNA strand (Figure 4.4). Blocking RH activity has therefore been shown to prematurely stall (-)-DNA strand extension, causing extensive RNA:DNA duplexes that further prevent (-)-DNA synthesis429,430. The recent production of active recombinant RH429,431 has enabled low- and mid-throughput screening efforts to identify HBV replication inhibitors. These approaches have focused on chemotypes that are known to inhibit human immunodeficiency virus (HIV-1)429. Screening over 3,000 compounds led to the identification of ~150 HBV replication inhibitors that function by blocking RH activity, as confirmed by the detection of RNA:DNA duplex accumulation429,432–436. These inhibitors can be classified into four chemotypes: N-Hydroxyisoquinolinediones (HID), N-Hydroxynaphthyridinones (HNO), N-Hydroxypyridinediones (HPD), and α-Hydroxytropolones (αHT) (Figure 5.5). Importantly, because RH inhibitors are effective against recombinant RH and clinical isolates from three HBV genotypes, RH inhibitor treatment should not be complicated by the genetic diversity of HBV437.

Figure 5.5. Examples of HBV RNase H inhibitors. Representative compounds from each of the four chemotypes are shown (i.e., HID, HNO, HPD, and αHT).
5.4.2 Targeting protein priming

An alternative target of emerging anti-HBV therapies is protein priming, which offers the ability to prevent HBV replication at a very early stage. P is presumed to undergo a conformational change to transition from protein priming to the subsequent (-)-DNA strand elongation (Figure 4.3). This feature permits the opportunity to design anti-HBV therapeutics that specifically inhibit TP-mediated protein priming functions to complement current NRTI treatments290,323,438,439. Indeed, the FDA-approved guanosine analog ETV has been shown to inhibit protein priming by competing with dGTP, the initiating substrate to synthesize the 5′-GAA-3′ DNA (Figure 4.3)440,441. In a similar manner, the adenosine analog TEN can also inhibit the elongation of the 5′-GAA-3′ DNA by competing with dATP426. Intriguingly, the thymidine analog CLV was also shown to inhibit protein priming through a non-competitive mechanism without its incorporation into the (-)-DNA strand426. Finally, using the related duck system, it has been shown that when added in trans, a catalytically dead RT can inhibit protein priming, presumably by preventing the necessary interactions and/or conformations of TP and/or ε442. Whether or not this is possible in the case of human HBV remains unknown but offers a unique targeting strategy completely orthogonal to NRTI therapy.

5.4.3 Targeting the ε-P interaction

Since the ε-P interaction initiates protein-primed reverse transcription301,309–311 and pgRNA and P packaging288,300 (Figure 4.6 and Tables 4.1 and 4.2), their complex represents an attractive therapeutic target for early intervention of HBV replication. Examples of such compounds are the antibiotic Geldanamycin (GDN), Rosmarinic acid (ROS) derivatives, and iron protoporphyrin IX (hemin, HEM) and related porphyrin compounds (Figure 5.6). In the first example, GDN disrupts the ε-P association in both human and duck systems by blocking the function of the Hsp90 complex337–339.
Given the critical functions of Hsp90, this mode of preventing HBV replication is disadvantageous, motivating the discovery of additional compounds that target the ε-P complex. A second example is ROS and its analog Quercetin (QUE) (Figure 5.6), which specifically inhibit ε-P binding443. Moreover, when combined with LMV, ROS slightly increased the anti-HBV activity of LMV, suggesting that ROS inhibition involves an HBV replication step distinct from (-)-DNA strand elongation443. Finally, HEM, Protoporphyrin IX (PPP-IX), Protoporphyrin IX disodium (PPP-IX-Na), and Biliverdin (BIL) (Figure 5.6) all disrupt the ε-P complex in both duck and human systems444. Interestingly, this effect is enacted by compound binding to the TP domain of P.

Figure 5.6. Examples of ε-P binding inhibitors. These molecules are either antibiotics (GDN), rosmarinic acid analogs (ROS and QUE), or porphyrin compounds (HEM, PPP-IX, PPP-IX-Na, and BIL).

Another approach to target the ε-P interaction employs SELEX to select strong P-binding RNA aptamers that compete with ε for P binding330. These "decoy" aptamers show a strong inhibitory effect on pgRNA packaging and DNA synthesis330. While these ε-P inhibitors can complement existing NRTI treatments of cHBV, lack of structural information for the ε-P complex prevents structure-enabled design of anti-HBV therapeutics. Nevertheless, our recent structure of full-length ε20 (Figure 4.12) presents a necessary step in this direction. Indeed, given the central role of ε in HBV replication (Figure 4.6 and Tables 4.1 and 4.2), ε is an attractive and novel therapeutic target.
5.5 Discovery of ε-targeting ligands

Small molecules offer an opportunity to target RNA motifs (such as pseudoknots, bulges, and hairpins), which are often highly conserved and mediate important biological functions445–448. Several recent in vitro449 and in silico450 high-throughput screening (HTS) approaches have identified chemotypes that selectively bind RNA motifs, with corresponding physiological effects in cell culture and animal models. Our structural analysis of full-length ε suggests that the 6 nt PL bulge forms a binding pocket that is amenable to small molecule targeting (Figure 4.12). Computational predictions451 support this notion, mapping the most probable ligand cavity directly at the PL (Figure 5.7).

Figure 5.7. Predicted ligand cavity in full-length ε. Ligand cavity mapping was carried out with RNACavityMiner451 using PDB 6var20 as the input structure. The most probable ligand cavity is shown with cyan spheres and is displayed on ε R3, which is colored as in Figure 4.7. The 6 nt PL bulge is predicted to be the most likely pocket for small molecule targeting, in agreement with our structural analysis.

5.5.1 High-throughput screen strategy

5.5.1.1 Lead compound generation

As a first means of testing our structure-informed hypothesis, we employed a small molecule microarray (SMM) approach that was previously used to identify a variety of chemotypes targeting both RNA25,452–454 and DNA455,456 motifs (Figure A.35A). Here, fluorescently tagged full-length ε and a control RNA were used to screen a ~26,000 compound library. Hit selection required a pipeline comprising statistical analysis, inspection of pharmacophore properties (i.e., amenability of samples to later medicinal chemistry), and commercial availability. For each compound, a composite Z-score (see Section 5.5.1.5.1) was calculated based on increased fluorescence at that location on the array in the presence of full-length ε. Compounds with a Z-score >3 were investigated further.
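The thresholding step can be sketched as follows. This is a simplified illustration, not the screening pipeline itself: the composite Z-score is defined in Section 5.5.1.5.1 and is not reproduced here, so the example assumes a plain per-spot Z-score computed from the mean and sample standard deviation of all spots. The intensities and the function name `z_score_hits` are hypothetical.

```python
import statistics


def z_score_hits(intensities, cutoff=3.0):
    """Return indices of array spots whose Z-score exceeds the cutoff."""
    mean = statistics.mean(intensities)
    sd = statistics.stdev(intensities)  # sample standard deviation
    return [i for i, value in enumerate(intensities) if (value - mean) / sd > cutoff]


# Hypothetical fluorescence intensities: 19 background spots and one bright spot.
intensities = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99,
               100, 102, 98, 101, 99, 100, 103, 97, 100, 400]
hits = z_score_hits(intensities)
print("spots with Z-score > 3:", hits)
```

Because a strong hit inflates both the mean and the standard deviation of the array, screens of this kind often use robust statistics or control spots to define the background; whether that applies here depends on the definition given in Section 5.5.1.5.1.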
Visual inspection of the array fluorescence signals and elimination of false positive signals yielded five candidate compounds (Figures 5.8 and A.35B). Following identification of these initial hits, NMR titration experiments were used to discriminate specific binders from non-binders, aggregators, and non-specific binders.

Figure 5.8. Initial hit compounds from SMM. Chemical structures of Pentamidine, Raloxifene, Pinafide, NSC20618, and NSC20619 are shown. All molecules had a Z-score >3 and were candidates for further testing to assess selectivity and affinity.

5.5.1.2 Raloxifene selectively targets the ε priming loop

Initial hit compounds were titrated with atom-specifically labeled (i.e., [2′,8-13C2]-Ade, [1′,6-13C2, 5-2H]-Cyt/Uri, and [1′,8-13C2]-Gua) full-length ε (Figure 4.9A) in order to reduce spectral overlap. Then, 1H-13C sofast-heteronuclear multiple-quantum correlation (HMQC) NMR experiments were used to monitor chemical shift perturbations (CSPs) of aromatic (i.e., H6-C6/H8-C8) resonances. This analysis failed to confirm binding of NSC20618 and NSC20619 (Figure A.36). Moreover, at the compound concentrations tested (i.e., 50-150 micromolar (μM)), Pinafide induced non-specific resonance line broadening indicative of ε aggregation and was therefore not considered further (Figure A.36). Titrations with Pentamidine, on the other hand, did show ε binding. Specifically, Pentamidine titrations led to CSPs for a large number of aromatic resonances in nucleotides located in both the PL and AL, suggestive of non-specific binding (Figure A.36). Finally, Raloxifene titrations provided CSPs for aromatic resonances exclusive to PL and adjacent nucleotides (e.g., A13-U15, U18, C19, and U48-G50) (Figure 5.9). To further confirm that Raloxifene targets the ε
PL, we repeated our 1H-13C sofast-HMQC NMR titration experiments with Raloxifene and Pentamidine using atom-specifically labeled (i.e., [8-13C]-Ade/Gua and [6-13C, 5-2H]-Cyt/Uri) (Figure A.22A) ε AL. This analysis revealed that only Pentamidine binds the AL (Figure A.37), in agreement with our previous titration (Figure A.36). For completeness, we then titrated Raloxifene with u-[13C/15N]-labeled PL ε and used 1H-13C HSQC experiments to monitor CSPs of aromatic (i.e., H2/5/6/8-C2/5/6/8) and ribose (i.e., H1′-C1′) resonances. These experiments permitted detection of resonances that were not probed in our atom-specifically labeled sample. In agreement with previous titrations (Figure 5.9), Raloxifene titrations led to CSPs exclusive to PL nucleotides (Figure 5.10). Notably, Raloxifene induced greater CSPs at ribose H1′-C1′ and Ade-H2-C2 resonances, suggesting binding in the minor groove of the PL (Figure 5.10). Collectively, our NMR titration data (Figures 5.9, 5.10, A.36, and A.37) unambiguously demonstrate that Raloxifene targets the ε PL, supporting our structure-informed pocket hypothesis.

Figure 5.9. CSP mapping of Raloxifene binding to full-length ε20. (A) 1H-13C sofast-HMQC NMR experiments were used to measure CSPs for aromatic (i.e., H6-C6/H8-C8) resonances of atom-specifically labeled (Figure 4.9A) full-length ε (50 μM) titrated with Raloxifene (up to 250 μM). (B) CSP map of ε with structural regions abbreviated and colored as in Figure 4.7. Dashed lines represent 1 SD above the mean. Raloxifene binding is localized to the PL nucleotides.

Raloxifene is a benzothiophene belonging to the class of selective estrogen receptor modulators (SERMs) and is in clinical use for treatment of osteoporosis by mimicking the effects of the hormone estrogen to increase bone density. Raloxifene is also proposed to lower the risk of breast cancer by blocking the effects of estrogen on breast tissue457.
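The "1 SD above the mean" cutoff used in the CSP map of Figure 5.9B can be sketched in a few lines. The example is an illustration under stated assumptions: the combined 1H/13C CSP is taken as sqrt(ΔδH² + (w·ΔδC)²) with a 13C weighting factor w = 0.25, which is a common convention but is not confirmed by this section, and the population standard deviation is used. All shift values and function names are hypothetical.

```python
import math
import statistics


def combined_csp(delta_h, delta_c, carbon_weight=0.25):
    """Combine 1H and 13C shift changes (ppm) into a single weighted CSP (ppm)."""
    return math.sqrt(delta_h ** 2 + (carbon_weight * delta_c) ** 2)


def residues_above_cutoff(csp_by_residue, n_sd=1.0):
    """Return residues whose CSP exceeds mean + n_sd * SD, sorted by name."""
    values = list(csp_by_residue.values())
    cutoff = statistics.mean(values) + n_sd * statistics.pstdev(values)
    return sorted(name for name, value in csp_by_residue.items() if value > cutoff)


# Hypothetical (delta 1H, delta 13C) shift changes in ppm for six resonances.
shifts = {
    "G2": (0.002, 0.01), "A13": (0.030, 0.12), "U15": (0.045, 0.20),
    "G25": (0.003, 0.02), "U48": (0.020, 0.08), "C55": (0.001, 0.01),
}
csps = {name: combined_csp(dh, dc) for name, (dh, dc) in shifts.items()}
print("above mean + 1 SD:", residues_above_cutoff(csps))
```

A relative cutoff of this kind highlights the most perturbed resonances within one titration; it does not by itself distinguish direct contacts from allosteric effects.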
The benzothiophene, Arzoxifene, and the phenylindole, Bazedoxifene, are closely related SERMs that have also been under clinical investigation458 (Figure 5.11A). To test whether these compounds also bind full-length ε, NMR titrations were again used. This analysis revealed that Bazedoxifene also targets ε at the PL, albeit with fewer CSPs than Raloxifene, whereas Arzoxifene showed no binding (Figure 5.11B).

Figure 5.10. CSP mapping of Raloxifene binding to PL ε20. 1H-13C HSQC NMR experiments were used to measure CSPs for aromatic (i.e., H2/5/6/8-C2/5/6/8) and ribose (i.e., H1′-C1′) resonances of u-[13C/15N]-labeled PL ε (150 μM) titrated with Raloxifene (up to 600 μM). Raloxifene selectively targets the PL, in agreement with data from full-length ε titrations.

As an orthogonal measure of SERM binding, a dye-displacement assay was used. These data corroborate the NMR titration data (Figure 5.11) and suggest that Raloxifene and Bazedoxifene have affinities (measured by IC50, see Section 5.5.1.5.3) of 69.2 ± 6.7 and 107.0 ± 32.1 μM, respectively, whereas Arzoxifene showed no binding (Figure 5.12). This is the first report of SERM binding to structured RNA motifs. It is desirable to obtain a Raloxifene-bound ε NMR structure. Unfortunately, saturation of the RNA is not feasible at the NMR concentrations needed for such experiments, due to Raloxifene insolubility. Preliminary NOESY experiments of the ε-Raloxifene complex were hampered by low signal-to-noise and the absence of cross-peaks (data not shown). Alternative computational approaches were therefore explored to provide a more detailed understanding of the ε-Raloxifene interaction.

Figure 5.11. CSP mapping of SERMs binding to full-length ε20. (A) Chemical structure of related SERM compounds. (B) 1H-13C sofast-HMQC NMR experiments were used to measure CSPs for aromatic (i.e., H6-C6/H8-C8) resonances of atom-specifically labeled (Figure 4.9A) full-length ε
(50 μM) titrated with Bazedoxifene (left) or Arzoxifene (right) (up to 100 and 250 μM, respectively).

Figure 5.12. Quantifying the affinity of SERMs binding to full-length ε20. Representative binding curves from dye-displacement experiments to determine the binding affinity of each SERM to full-length ε. Fluor (norm) refers to normalized fluorescence. Affinities (measured by IC50, see Section 5.5.1.5.3) are reported as the average ± standard error (SE) from fitting data of triplicate measurements to Equation 5.1 (see Section 5.5.1.5.3). Raloxifene and Bazedoxifene show mid-μM affinity while Arzoxifene does not bind at the concentrations used (i.e., IC50 >500 μM).

5.5.1.3 Modeling the ε-Raloxifene complex

To gain further insight into the ε-Raloxifene interaction, computational docking and MD simulations were adopted. Raloxifene docking pose predictions generated by rDock459 indicated that among the top 10 NMR ε conformers20, three could be targeted directly at the PL, with ε R3 scoring best, ε R6 a close second, and ε R5 scoring third. Interestingly, these are the three ε conformers that share the unique PL orientation shown in Figure 4.12C. The predicted docking pose reveals that the Raloxifene core is wedged deeply into the PL between nucleotides U15-C19 (with G16, U18, and C19 rotated away) and is also close to U48 and U49 (Figure 5.13A). Given the strong agreement in data from ε R3 and ε R6, we only illustrate computational docking and MD results for the ε R3 target. It is worth noting that the docking prediction search space was not restricted to the PL, and therefore our finding that the best-scored docking poses lie in the PL, which is consistent with NMR titration data, was not biased by the input parameters.

Three independent 300 ns MD simulations were carried out on ε R3 and the ε R3-Raloxifene complex. In each trajectory of the ε
R3-Raloxifene complex, Raloxifene remained stably bound to the RNA target (Figure 5.13B), suggesting a valid docking pose prediction. Representative MD trajectories of the unliganded (Figure 5.13C) and Raloxifene-bound (Figure 5.13B and D) RNA demonstrate stabilization of the PL upon ligand binding. Box plots illustrate the difference in the cumulative RMSDs measured for the PL and adjacent nucleotides (i.e., A13-A20, U48, and U49) in the MD simulations (Figure 5.13E). Indeed, Raloxifene stabilized the conformation of the PL, lowering its mean RMSD value from 5.2 ± 1.7 to 3.6 ± 0.8 Å. The difference in the median values between the two groups (5.4 Å for ε R3 and 3.5 Å for the ε R3-Raloxifene complex) is statistically significant based on the Mann-Whitney Rank Sum Test (P ≤ 0.001).

Figure 5.13. Computational docking and MD simulations of ε R3-Raloxifene. (A) Top-ranked rDock459 predicted docking pose of Raloxifene to ε R320. Raloxifene preferentially docks to the PL, in agreement with NMR titration data. (B) Representative structural overlay of PDB snapshots taken every 10 ns of the first (of three) 300 ns ε R3-Raloxifene MD trajectory, with backbone phosphorus atom alignments shown using PL (i.e., A13-A20, U48, and U49) nucleotides. (C) Same as in B but for unliganded ε R3. (D) Same as in B but with Raloxifene hidden to best illustrate the relatively stable backbone and nucleotide orientation in the Raloxifene-bound PL. In A-D, ε structural regions are abbreviated and colored as in Figure 4.7. (E) Box plots of the cumulative RMSDs measured for the PL and adjacent (i.e., A13-A20, U48, and U49) nucleotides calculated from the MD simulations in B and C. Median values correspond to the horizontal lines inside the boxes, while the lower and upper box boundaries indicate the 25th and 75th percentile values. "Whiskers" indicate the 10th and 90th percentile values. Raloxifene binding significantly lowers the RMSD of PL nucleotides.
The difference in the median values is statistically significant based on the Mann-Whitney Rank Sum Test (P ≤ 0.001).

5.5.1.4 Raloxifene is unable to inhibit HBV protein priming

Given that Raloxifene targets ε at its well-conserved PL and quenches its dynamics, Raloxifene was expected to alter the ε-P interaction and its downstream functions. That is, modulating ε dynamics may be an effective therapeutic strategy, which would benefit mid-μM binders such as Raloxifene. Indeed, considerations of RNA dynamics in small molecule targeting have shown promising results in RNA-targeted drug discovery450. Based on the preliminary results20 described in Section 5.5, LeGrice and co-workers used an in vitro assay to test whether Raloxifene could prevent HBV protein priming460. In this experiment, ε and P were transfected into cells and treated with Raloxifene and radiolabeled [α-32P]-dGTP (Figure 5.14A). Given that dGTP initiates the synthesis of the 5′-GAA-3′ DNA (Figure 4.3), successful protein priming can be detected by phosphorimaging of [α-32P]-dGTP incorporated into the (-)-DNA strand. Using this methodology, LeGrice and co-workers demonstrated that Raloxifene had no effect on HBV protein priming (Figure 5.14B)460.

Figure 5.14. Raloxifene has no effect on HBV protein priming. (A) Schematic of TP-mediated protein priming. Highlighted in red is the initiating dGTP, which in the experiments carried out by LeGrice and co-workers460 is radiolabeled (i.e., [α-32P]-dGTP). (B) Phosphorimaged gel data from LeGrice and co-workers460. Raloxifene has no effect on protein priming as evidenced by the presence of a gel band.

5.5.1.5 Experimental details

5.5.1.5.1 Small molecule microarray (by Dr. Fardokht Abulwerdi)

To identify ε-targeting ligands, a SMM was used25,452–456. In brief, γ-aminopropyl silane microscope slides were functionalized with a short fluorenylmethoxycarbonyl-protected amino PEG-spacer.
After piperidine deprotection, 1,6-diisocyanatohexane was coupled to the surface by urea bond formation to provide functionalized isocyanate-coated microarray slides that react with primary and secondary amines and primary alcohols to create immobilized small molecule libraries. Slides were then exposed to pyridine vapor to facilitate covalent attachment and incubated with a 1:20 PEG:DMF (v/v) solution to quench unreacted isocyanate surface. Fluorescently (TYE665) labeled ε was dissolved in Ultrapure water and diluted to 1 or 5 μM in SMM buffer (25 mM sodium cacodylate, 50 mM KCl, 1 mM MgCl2, pH 6.9). Then, ε was annealed to the slides by heating to 95 °C for 3 min, snap cooling on ice for 10 min, and slow equilibration to room temperature for 1 h. Following incubation, slides were gently washed twice in SMM buffer with 0.01% Tween-20 for 2 min and then a final time in SMM buffer for an additional 2 min. After all washes, the slides were allowed to dry. Fluorescence intensity (650 nanometer (nm) excitation, 670 nm emission) was measured on an Innopsys Innoscan 1100 AL Microarray Scanner. The scanned images were aligned with the corresponding GenePix Array List files to identify individual features. For statistical analysis, the signal-to-noise ratio (SNR) was defined as [(mean foreground − mean background) / (SD of background)] and the Z-score was defined as Z = (mean SNR635 compound − mean SNR635 library) / (SD of SNR635 library). Hits were selected with the following criteria: (1) SNR > 0, (2) Z-score > 3, (3) coefficient of variance of replicate spots < 200, (4) [(ZRNA incubated − Zbuffer incubated) / Zbuffer incubated] > 3, and (5) visual inspection and removal of false positives (e.g., dust particulates) (Figure A.35).

5.5.1.5.2 NMR titrations (by Drs. Regan LeBlanc and Andrew Longhini)

All ε NMR samples were prepared by IVT as in Section 2.2.4.1 in NMR buffer A.
NMR titration experiments were performed on atom-specifically labeled ([2′,8-13C2]-Ade, [1′,6-13C2, 5-2H]-Cyt/Uri, and [1′,8-13C2]-Gua) (Figure 4.9A) full-length ε, u-[13C/15N]-labeled PL ε, or atom-specifically labeled ([8-13C]-Ade/Gua and [6-13C, 5-2H]-Cyt/Uri) (Figure A.37A) AL ε. Initial hit compounds (Figure 5.8) were screened by titrating up to 200-600 μM of small molecule against 50-150 μM ε samples. Then, 1H-13C sofast-HMQC or HSQC NMR experiments were used to monitor CSPs of aromatic (i.e., H2/5/6/8-C2/5/6/8) and ribose (i.e., H1′-C1′) resonances. CSPs helped distinguish specific binders, non-specific binders, aggregators, and non-binders (Figures 5.9 and A.36). PL ε and AL ε were then used to validate specific binders and rule out allosteric effects (Figures 5.10 and A.37). All NMR data were collected at 25 °C with a recycle delay of 1.5 s, and analyzed using TopSpin 4.0, NMRFx Processor, and NMRViewJ196.

5.5.1.5.3 Dye-displacement assay (by Dr. Fardokht Abulwerdi)

In all binding assays, a fixed concentration of full-length ε (0.5 μM) and SYBRG II (4x) was used. Then, 5 μL of compound (in DMSO) and 95 μL of RNA/dye complex in assay buffer (5 mM sodium cacodylate, 50 mM KCl, 1 mM MgCl2, 0.1 mM EDTA, 0.01% Triton-X100, pH 6.5) were added to black Nunc 96-well plates and incubated at room temperature for 30 min, after which fluorescence intensity values were measured (485 ± 5 nm excitation, 525 ± 5 nm emission) using a Tecan plate reader. IC50 values were determined by normalizing the fluorescence intensity of each well to an average value for the fluorescence intensity of the RNA/dye complex and fitting to the following relation

\[ Y = F_{\min} + \frac{F_{\max} - F_{\min}}{1 + 10^{(\log \mathrm{IC}_{50} - x) \times \mathrm{Hillslope}}} \tag{5.1} \]

where Fmax and Fmin are the highest and lowest fluorescence readings, respectively, the Hillslope is the steepness (i.e., responsiveness) of the curve, x is log(ligand), and Y is normalized fluorescence. Reported IC50 values are the average ±
SE from fitting data of triplicate measurements to Equation 5.1. Given that the affinity of SYBRG II for RNA (nanomolar, nM) is much greater than that of the small molecules tested (μM), the calculated IC50 is a good approximation of the true affinity.

5.5.1.5.4 Computational docking (by Dr. Wojciech Kasprzak)

rDock459 was used to predict the Raloxifene docking pose to full-length ε. This program offers a dedicated intermolecular scoring function (e.g., van der Waals, polar, and desolvation components) that has been validated against RNA targets459. First, rbcavity generates the docking cavity for the receptor (i.e., docking surface interface). Then, rbdock docks the ligand. rDock ligand conformation predictions are based on sampling of the exocyclic dihedral angles that yield the best docking scores when fit to a rigid target (i.e., receptor). The program employs a genetic algorithm-based stochastic search and therefore has to be run multiple times. rbdock was run 50 times to generate best-scored docking poses, using full-length ε (PDB 6var20) as the target. The receptor input was converted to MOL2 format, while the ligand conformations were prepared with the quantum mechanics output and converted into the required SDF format. Raloxifene docking pose predictions generated by rDock459 indicated that three ε NMR conformers20 (i.e., ε R3, ε R5, and ε R6) could be targeted directly at the PL, with ε R3 being the best.

5.5.1.5.5 MD simulations (by Dr. Wojciech Kasprzak)

Based on rDock predictions, ε R320 was selected for unliganded and Raloxifene-bound analysis. Three independent 300 ns MD simulations were run for ε R3 and ε R3-Raloxifene (i.e., the top-ranked rDock pose) using the Amber16 software package366 and ff99LJbb367 forcefield as in Section 4.4.5.4. The Amber CPPTRAJ381 module was then used for analysis. RMSD calculations were performed using the MD trajectories (excluding the 2 ns equilibrations) with ε R3 used as the reference structure.
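The RMSD analysis described here can be illustrated with a short Python sketch. It is not the CPPTRAJ workflow used in this work: it assumes every frame has already been superposed on the reference structure, and the function names and toy coordinates are hypothetical.

```python
import math
from statistics import mean, stdev

def rmsd(coords, reference):
    """All-atom RMSD between two equal-length lists of (x, y, z) tuples.
    Assumes the frame is already superposed on the reference."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    sq = sum((a - b) ** 2
             for p, q in zip(coords, reference)
             for a, b in zip(p, q))
    return math.sqrt(sq / len(coords))

def rmsd_summary(frames, reference):
    """Return per-frame RMSDs plus their mean and sample SD."""
    values = [rmsd(frame, reference) for frame in frames]
    return values, mean(values), stdev(values)

# Toy data (hypothetical): two atoms, two frames.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the reference
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],  # every atom displaced by 2 along z
]
values, avg, sd = rmsd_summary(frames, reference)
```

In practice the atom selection would be restricted to the PL and adjacent nucleotides, and the per-frame values would feed the mean ± SD and box-plot statistics reported in the text.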
The 300 ns trajectories were sampled in 0.1 ns steps to yield 3,000 data points and calculations were made considering all atoms within the PL nucleotides (i.e., A13-A20, U48, and U49).

5.5.2 Virtual screening strategy

As a second means of testing our structure-informed hypothesis, we employed a structure-based virtual screening (VS) approach. As we described in Section 5.5.1.3, computational docking can provide complementary data and corroborating evidence to experimental binding assays. Moreover, VS dramatically reduces the time needed to generate lead compounds. However, VS is not without limitations, especially when targeting RNA. That is, docking to RNA targets is complicated by the high flexibility of RNA and its propensity to change conformations upon ligand binding. One approach to overcome this challenge is to treat the RNA target as a conformational ensemble that is then subject to VS461–463. These ensembles can either be computationally derived or experimentally informed450,464–469. Initial successes of the latter approach in an ensemble-based VS suggest a promising path forward for RNA450,464. However, the utility of this method is predicated on having robust and plentiful experimental restraints such as RDCs and NOEs. Unfortunately, these data are sparse for full-length ε due to its large size. As an alternative, we employed a rigid dock VS followed by MD simulations as a means to rapidly identify lead compounds while partially addressing the inherent dynamics of ε.

5.5.2.1 Lead compound generation

The first step in VS is receptor preparation and compound library selection. Based on our previous computational docking20, we used our full-length ε R3 structure (PDB 6var20) as the receptor. We then selected an FDA-approved library curated on the ZINC15 database470–472 to avoid additional filter selection steps such as ADMET and Lipinski's rule of five.
The former refers to absorption, distribution, metabolism, excretion, and toxicity, and is highly predictive of drug efficacy and safety473–475, whereas Lipinski's suggestions state that successful drug candidates typically violate no more than one of the following considerations (≤ 5 hydrogen bond donors, ≤ 10 hydrogen bond acceptors, molecular weight ≤ 500 Da, and a logP ≤ 5)476,477. Our assumption is that FDA-approved drugs already have good drug-like properties, which paradoxically is not always true. Nevertheless, the value of VS is that it can be easily repeated with a different compound library to identify new lead compounds, if needed.

With our receptor and library in place, we carried out VS with AutoDock Vina478 in the PyRx open-source software package479 to see if we could identify FDA-approved drugs that can be repurposed as anti-HBV therapeutics. We then applied selection criteria on the basis of affinity, commercial availability and drug-like properties, and docking site (Figure 5.15) to identify lead compounds from our 1,604-compound library. Given that Raloxifene is an FDA-approved drug that is included in our compound library, we used its predicted affinity (-9.5 kcal·mol-1) as our first filter and proceeded with the 122 compounds with a higher predicted (and hopefully experimental) affinity (i.e., a more negative predicted binding energy) (Figure 5.15). We then manually removed all compounds that were not commercially available and/or had potential negative side-effects (e.g., anti-cancer drugs) and proceeded with 66 compounds that are presumably safe cHBV treatments (Figure 5.15). As with our previous ε R3 computational docking20, we employed an unbiased dock. Therefore, as our final filter step, we opted to only select compounds that selectively targeted the ε PL in a reproducible manner (Figure A.38, see Section 5.5.2.4.1), leaving us with 12 lead compounds that could be experimentally verified (Figure 5.15).

Figure 5.15. Virtual screen lead compound selection strategy.
(A) Schematic of our lead compound filtering funnel, starting from the 1,604-compound FDA-approved ZINC15 library. (B) Histogram showing all compound hits sorted by their predicted affinities. The shaded region refers to the area containing the 122 compounds with a predicted affinity higher than Raloxifene. (C) Plot of compound docking sites (i.e., PL or non-PL) for the 66 compounds with commercial availability and good drug-like properties.

Interestingly, our 12 VS-identified lead compounds have substantial diversity in their chemotypes as well as their use (Figure 5.16). For example, Ledipasvir (LED), Elbasvir (ELB), Simeprevir (SIM), Daclatasvir (DAC), Velpatasvir (VEL), and Saquinavir (SAQ) are antivirals, all of which are anti-HCV drugs except SAQ, which targets HIV-1. Telithromycin (TEL), Ceftaroline fosamil (CEF), and Minocycline (MIN), on the other hand, are antibiotics. Lastly, Folinic acid (FOL), Natamycin (NAT), and Ivermectin (IVE) are the only vitamin, antifungal, and antiparasitic compounds, respectively. Moreover, our lead compounds have a range of predicted affinities from -12.1 (LED) to -9.6 (FOL, SAQ, CEF, and MIN) kcal·mol-1, all of which can now be verified experimentally.

[Figure 5.16 chemical structures omitted. Compound labels and predicted affinities (kcal·mol-1): 1. Ledipasvir (LED), -12.1; 2. Elbasvir (ELB), -11.3; 3. Natamycin (NAT), -11.0; 4. Simeprevir (SIM), -10.8; 5. Telithromycin (TEL), -10.7; 6. Daclatasvir (DAC), -10.6; 7. Velpatasvir (VEL), -10.6; 8. Ivermectin (IVE), -10.6; 9. Folinic acid (FOL), -9.6; 10. Saquinavir (SAQ), -9.6; 11. Ceftaroline fosamil (CEF), -9.6; 12. Minocycline (MIN), -9.6.]

Figure 5.16. Virtual screen-identified lead compounds. Structures of all lead compounds identified by our VS, rank-ordered by their predicted docking affinities to full-length ε R3.

5.5.2.2 Daclatasvir selectively targets the ε priming loop

Initial lead compounds were then experimentally validated with a dye-displacement assay. Of the 12 compounds, nine showed no evidence of binding (Figure A.39) whereas three of the anti-HCV compounds, LED, SIM, and DAC, did bind full-length ε with an affinity range of 60-300 μM (Figure 5.17). We then used our dye-displacement assay again to test whether these compounds bound additional RNA targets or are selective ε-ligands. To this end, additional RNAs with structural elements similar to ε (e.g., apical loops and internal bulges) were used: a 27 nt RNA from the A-site of the decoding center of E. coli ribosomal RNA (A-site), a 30 nt RNA from the transactive response element from HIV-1 (TAR-2), and a 34 nt RNA from the self-splicing group II intron catalytic effector domain 5 from Pylaiella littoralis (D5-PL) (Figure 5.18A). However, given that LED was extremely insoluble and would therefore complicate downstream NMR experiments, we only proceeded with SIM and DAC. This analysis revealed that SIM binds A-site, TAR-2, and D5-PL with approximate affinities of 436, 58, and 60 μM, respectively (Figure 5.18B). Given that SIM binds additional RNAs, some with greater affinity than full-length ε (e.g., TAR-2 and D5-PL), it was no longer considered as a lead compound. DAC, on the other hand, showed little to no binding to the additional RNAs (i.e., an order of magnitude weaker affinity (e.g., >500 μM) than full-length ε, if at all) (Figure 5.18C). Taken together, these results suggest that DAC is an ε-targeting ligand with mid-μM affinity.

Figure 5.17. Lead compound binding to full-length ε.
Representative binding curves from dye-displacement experiments to determine the affinity of the subset of VS-identified lead compounds that bind to full-length ε. Affinities (measured by IC50) are reported as the average ± SE from fitting data of triplicate measurements to Equation 5.1. LED, SIM, and DAC all show mid-μM affinity.

As a preliminary means of mapping the specific binding site of DAC to full-length ε, we employed our dye-displacement assay a final time using the PL ε and AL ε (Figure 4.7). These experiments should therefore suggest which ε region (i.e., LH, PL, PTL, and UH) DAC binds, which can then be further validated. Binding experiments with ε modular constructs demonstrate that DAC binds to PL ε but not AL ε, suggesting that DAC binding is localized to regions shared by the two DAC-binding constructs, full-length ε and PL ε (i.e., LH and PL) (Figure 5.19A).

Figure 5.18. Daclatasvir selectively binds full-length ε. (A) Secondary structure representation of additional RNAs used to test for the selectivity of binding of SIM and DAC. Representative binding curves from dye-displacement experiments for (B) SIM and (C) DAC and the RNAs shown in A. Affinities (measured by IC50) are reported as the average ± SE from fitting data of triplicate measurements to Equation 5.1. Only DAC shows evidence of selective binding to full-length ε.

Figure 5.19. Mapping Daclatasvir binding to full-length ε. (A) Representative binding curves from dye-displacement experiments on full-length ε (left) (replotted from Figure 5.17 for illustrative purposes), PL ε (middle), and AL ε (right) to map DAC binding to full-length ε. Affinities (measured by IC50) are reported as the average ± SE from fitting data of triplicate measurements to Equation 5.1. (B) 1H NMR spectra of full-length ε (left), PL ε (middle), and AL ε (right) titrated with DAC. NMR measurements were collected at 600 MHz and 25 °C. Dye-displacement and NMR demonstrate that DAC only binds to full-length ε
and PL ε, suggesting localization to the LH and PL.

To verify our dye-displacement results, we employed NMR measurements. Unfortunately, due to DAC insolubility, we were limited to low concentration samples and therefore 1H NMR. Nevertheless, we titrated DAC against ε modular constructs (Figure 4.7) and monitored the CSPs of imino protons (i.e., Gua-H1 and Uri-H3). This analysis demonstrated that DAC titration only led to CSPs (and an increase in resonance intensities) in full-length ε and PL ε (Figure 5.19B), suggestive of binding and in agreement with our dye-displacement data (Figure 5.19A). As a means to provide additional information on the ε-DAC complex, we again employed our computational MD simulation-based approach, as with Raloxifene (Section 5.5.1.3).

5.5.2.3 Modeling the ε-Daclatasvir complex

To gain further insight into the ε-DAC interaction, more robust computational docking and MD simulations were adopted. That is, while AutoDock Vina478 is a useful tool for the rapid screening of a compound library, its docking pose is not as robust as that of other software programs, such as rDock459. This is exactly why we employed repeated docking runs in our third filter step to generate confident PL docking poses (see Section 5.5.2.4.1). As such, we used rDock to predict more robust DAC docking poses. The predicted docking pose reveals that DAC selectively targets the ε PL, with its core wedged deeply into the PL between nucleotides U15, U18-A20, and U49 (Figure 5.20). As before, the docking prediction search space was not restricted to the PL, and therefore our finding that the best-scored docking pose of DAC is in the PL, which is consistent with dye-displacement and NMR titration data, was not biased by the input parameters.

Figure 5.20. Computational docking of ε R3-Daclatasvir. Top-ranked rDock459 predicted docking pose of DAC to ε R320 with structural regions abbreviated and colored as in Figure 4.7.
DAC preferentially docks to the PL, in agreement with dye-displacement and NMR titration data.

Three independent 500 ns MD simulations were carried out on ε R3 and the ε R3-DAC complex. In each trajectory of the ε R3-DAC complex, DAC remained stably bound to the RNA target (Figure 5.21B), suggesting a valid docking pose prediction. Representative MD trajectories of the unliganded (Figure 5.21A) and DAC-bound (Figure 5.21B) RNA demonstrate subtle differences in global ε R3 dynamics. This observation is recapitulated in RMSD analysis, whereby DAC modulates the flexibility of PL nucleotides (Figure 5.21C). Specifically, DAC increases the flexibility of nucleotide U15 but rigidifies the motions of nucleotides U17-C19 (Figure 5.21D).

Figure 5.21. MD simulations of ε R3-Daclatasvir. (A) Representative structural overlay of PDB snapshots taken every 10 ns of the third (of three) 500 ns ε R3 MD trajectory, with backbone phosphorus atom alignments shown using PL (i.e., A13-A20, U48, and U49) nucleotides. (B) Same as in A but with ε R3-DAC MD trajectories. In A and B, ε structural regions are abbreviated and colored as in Figure 4.7. (C) RMSD averaged over the 500 ns trajectory (see Section 4.4.5.4) for ε R3 and ε R3-DAC. (D) Same data as in C, but shown throughout the MD run rather than averaged. Taken together, DAC binding to ε R3 seems to modulate the dynamics of PL nucleotides U15 and U17-C19.

5.5.2.4 Experimental details

5.5.2.4.1 Virtual screening and computational docking

We carried out our VS with AutoDock Vina478 in the PyRx open-source software package479. In brief, SDF files of our 1,604-compound FDA-approved library were downloaded from the ZINC15 database470–472 and loaded into PyRx with the Open Babel chemical toolbox480. The SDF files were then energy minimized to generate the required PDBQT files. Once all ligands were prepared, ε R320 was loaded and prepared as the receptor molecule.
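The PL-localization filter used in this section to select lead compounds (a top-ranked pose in the PL in >50% of repeated runs and/or >50% of all poses in the PL) can be sketched in Python as follows. The data layout (one list of booleans per docking run, ordered by pose rank, where True means the pose lies in the PL) and the function name are assumptions for illustration; how a pose is assigned to the PL is not encoded here.

```python
def is_confident_pl_binder(runs):
    """Classify a compound as a confident PL binder.

    runs: list of docking runs; each run is a list of booleans ordered by
    pose rank (index 0 = top-ranked pose), True if the pose is in the PL.
    """
    if not runs or any(len(run) == 0 for run in runs):
        raise ValueError("each docking run needs at least one pose")
    # Criterion (i): fraction of runs whose top-ranked pose is in the PL.
    top_fraction = sum(1 for run in runs if run[0]) / len(runs)
    # Criterion (ii): fraction of all poses (pooled over runs) in the PL.
    all_poses = [pose for run in runs for pose in run]
    pose_fraction = sum(all_poses) / len(all_poses)
    # Both thresholds are strict (>50%), combined with and/or.
    return top_fraction > 0.5 or pose_fraction > 0.5

# Hypothetical compounds, four runs of three poses each.
top_heavy = [[True, False, False]] * 3 + [[False, False, False]]
borderline = [[False, True, True], [False, True, True],
              [True, False, False], [False, True, False]]
pose_heavy = [[False, True, True]] * 3 + [[False, True, False]]
results = [is_confident_pl_binder(r) for r in (top_heavy, borderline, pose_heavy)]
```

The strict inequality means a compound sitting at exactly 50% on both criteria is rejected, which matches the ">50%" wording of the criteria.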
The docking grid was prepared in a manner to ensure an unbiased dock (i.e., the grid encompasses the entire receptor molecule) and therefore a grid box size of 64.9 x 57.0 x 39.3 was used. Finally, we enabled nine possible docking poses per ligand.

The intention of our VS with AutoDock Vina478 in PyRx479 was to rapidly screen our library and rank-order our lead compounds by predicted affinity. We therefore applied selection criteria on the basis of (1) affinity, (2) commercial availability and drug-like properties, and (3) docking site (Figure 5.15) to identify lead compounds from our 1,604-compound library. For our first filter, we simply took all 122 compounds with predicted affinities higher than Raloxifene (-9.5 kcal·mol-1) (Figure 5.15). For our second filter, we manually removed all compounds that were not commercially available and/or had potential negative side-effects (e.g., anti-cancer drugs) and proceeded with 66 compounds (Figure 5.15). Given that we did not intend our VS to be used for accurate docking pose predictions, in order to apply our third filter, we repeated our AutoDock Vina478 and PyRx479 docking three additional times. Then, we classified confident PL docking on the basis of two criteria: (i) top-rank pose localized to the PL in >50% of repeated runs and/or (ii) >50% of all poses localize to the PL (Figures 5.15 and A.38). This final computational filter left us with 12 lead compounds that could then be validated experimentally.

Once we narrowed our focus to DAC, additional computational docking was carried out with rDock459, this time with the intention of accurate docking pose prediction. This was carried out as described in Section 5.5.1.5.4.

5.5.2.4.2 Dye-displacement assay

All binding assays were carried out as described in Section 5.5.1.5.3. As before, given that the affinity of SYBRG II for RNA (nM) is much greater than that of the small molecules tested (μM), the calculated IC50 is a good approximation of the true affinity.
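To make the IC50 determination concrete, the following dependency-free Python sketch fits a four-parameter logistic curve of the kind described by Equation 5.1. It is not the fitting software used for the reported values: the brute-force grid search, the candidate Hillslope values, and the synthetic data are assumptions chosen to keep the example self-contained. Fmin and Fmax are fixed at the lowest and highest readings, following the definitions given with Equation 5.1. With this form of the equation, a displacement curve (fluorescence falling as ligand increases) corresponds to a negative Hillslope.

```python
def four_param_logistic(x, f_min, f_max, log_ic50, hill):
    """Y = Fmin + (Fmax - Fmin) / (1 + 10**((logIC50 - x) * Hillslope))."""
    return f_min + (f_max - f_min) / (1.0 + 10.0 ** ((log_ic50 - x) * hill))

def fit_log_ic50(xs, ys, hill_values, n_grid=401):
    """Least-squares grid search over logIC50 and candidate Hillslopes.
    Fmin and Fmax are fixed at the lowest and highest readings."""
    f_min, f_max = min(ys), max(ys)
    lo, hi = min(xs), max(xs)
    best = None
    for i in range(n_grid):
        log_ic50 = lo + (hi - lo) * i / (n_grid - 1)
        for hill in hill_values:
            sse = sum(
                (four_param_logistic(x, f_min, f_max, log_ic50, hill) - y) ** 2
                for x, y in zip(xs, ys)
            )
            if best is None or sse < best[0]:
                best = (sse, log_ic50, hill)
    return best[1], best[2]

# Synthetic, noise-free data: x = log10(ligand concentration in M),
# generated with a true IC50 of 100 uM and a Hillslope of -1.
xs = [-6.0 + 0.25 * i for i in range(17)]  # -6.0 ... -2.0
ys = [four_param_logistic(x, 0.0, 1.0, -4.0, -1.0) for x in xs]
log_ic50, hill = fit_log_ic50(xs, ys, hill_values=[-2.0, -1.5, -1.0, -0.5])
ic50_uM = 10 ** log_ic50 * 1e6
```

A grid search is used here only to avoid external dependencies; a nonlinear least-squares routine would normally be preferred, and fitting triplicate data sets separately would give the average ± SE quoted in the text.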
IC50 values were determined by normalizing fluorescence intensity of each well to an average value for the fluorescence intensity of RNA/dye complex. Reported IC50 values are the average ± SE from fitting data of triplicate measurements to Equation 5.1.

5.5.2.4.3 NMR titrations

All ε NMR samples were prepared by IVT as in Section 2.2.4.1 in NMR buffer A. NMR titration experiments were performed on unlabeled ε modular constructs (Figure 4.7). DAC was screened by titrating 100 μM against 50 μM ε samples. 1H NMR experiments were used to monitor CSPs of imino protons (i.e., Gua-H1 and Uri-H3). All NMR data were collected on an Avance III Bruker Ultrashield 600 MHz spectrometer equipped with a triple resonance cryogenic probe. Spectra were collected at 25 °C with a recycle delay of 1.5 s and analyzed using TopSpin 4.0.

5.5.2.4.4 MD simulations (by Dr. Wojciech Kasprzak)

Three independent 500 ns MD simulations were run for ε R3 and ε R3-DAC (i.e., the top-ranked rDock pose) using the Amber20 software package366 and ff99LJbb367 forcefield as in Sections 4.4.5.4 and 5.5.1.5.5. The Amber CPPTRAJ381 module was then used for analysis. RMSD calculations were performed using the MD trajectories (excluding the 2 ns equilibrations) with ε R3 used as the reference structure. The 500 ns trajectories were sampled in 0.1 ns steps to yield 5,000 data points.

5.6 Conclusion

Approximately two billion people worldwide have been exposed to HBV, leading to >300 million cHBV infections and ~600,000 deaths per year383,384, most of which occur in the developing world (Figure 5.1). Moreover, HBV accounts for ~30 and ~50% of all cases of cirrhosis and HCC, respectively385. This global burden has motivated intense efforts to discover and develop cHBV treatments. As such, there are eight FDA-approved therapies (Figure 5.2): two IFN-α treatments and six NRTIs (Figure 5.3), and more in clinical trials (Figure 5.4). Unfortunately, both treatment classes have their disadvantages.
IFN-α treatments are rife with adverse effects and NRTIs require life-long therapy and suffer from resistance-related complications. Nevertheless, NRTIs are the most effective drugs to combat cHBV infection. Still, there is a tremendous need for additional anti-HBV therapeutics to complement existing NRTI treatments. Two such examples are compounds that inhibit the RH domain of P (Figure 5.5) or the ε-P interaction. These are both attractive targets because they inhibit early stages of HBV (-)-DNA strand synthesis. However, the lack of structural information for the ε-P complex and P itself prevents structure-enabled design of anti-HBV therapeutics. Our recent structure of full-length ε20 (Figure 4.12) presents a necessary step in this direction. Indeed, given the central role of ε in HBV replication (Figure 4.6 and Tables 4.1 and 4.2), ε is an attractive and novel therapeutic target. Our structural analysis of full-length ε suggests that the 6 nt PL bulge forms a binding pocket that is amenable to small molecule targeting (Figure 4.12). Computational predictions451 support this notion, mapping the most probable ligand cavity directly to the PL (Figure 5.7). We therefore employed two orthogonal strategies to identify ε-targeting ligands as a strategy to reduce cHBV infection.

Our first strategy was an HTS-based approach (Section 5.5.1). Here, we screened ~26,000 compounds using a SMM followed by NMR titrations to identify the SERM, Raloxifene, as an ε-targeting ligand with mid-μM affinity20. Unfortunately, in vitro assays determined that Raloxifene was unable to prevent protein priming (Figure 5.14)460. As a second approach, we employed a VS method (Section 5.5.2), which circumvented the months-to-years required to generate lead compounds using HTS-based strategies.
Here, we computationally screened a 1,604-compound FDA-approved drug library from the ZINC15 database470–472 followed by binding experiments to identify the anti-HCV drug, DAC, as another ε-targeting ligand with mid-μM affinity. As of now, we do not know whether DAC has an in vitro or in vivo effect on cHBV infection.

Interestingly, both Raloxifene and DAC were shown to regulate PL dynamics (Figures 5.13 and 5.21). As outlined in detail in Chapter 4, the structural dynamics of the PL nucleotides are critical to their function in facilitating HBV replication (Figure 4.22). As such, modulating ε dynamics may be an effective therapeutic strategy, which would benefit mid-μM binders. That is, dynamics-regulating small molecules can exert their effect by preventing ε from adopting the conformations needed to move from one functional state to the next (Section 4.5.2). Indeed, considerations of RNA dynamics in small molecule targeting have shown promising results in RNA-targeted drug discovery450. Therefore, even though Raloxifene was shown to have no anti-HBV effect, and this information is not yet known for DAC, the approaches described in Sections 5.5.1 and 5.5.2 provide useful platforms for the discovery of new compounds whose ability to alter ε dynamics may result in the inhibition of early stages of HBV replication.

6 Conclusions and future directions

6.1 Summary of work

NMR is a powerful biophysical tool to study RNA structure, dynamics, and interactions in solution and at high resolution. However, these analyses face two major obstacles: spectral overlap and broad linewidths, both of which worsen as RNAs grow in size (Figure 1.2). To overcome these challenges, new stable isotope labeling strategies have been developed, as shown in Schemes 1.1-1.20 and Figures 1.4-1.7 and summarized in Tables 1.5, 1.7, and 1.10 and Figure 1.12.
We have also presented two detailed and recent examples of new labeling strategies to facilitate solution NMR studies of RNA structure and dynamics. Our first example showcased a combined chemical and enzymatic synthesis of an atom-specifically labeled [2-13C, 7-15N]-ATP 162 (Schemes 2.1 and 2.2) [124]. Our second example featured a combined enzymatic and chemical synthesis of an atom-specific nucleobase and ribose labeled [1′,6-13C2, 5-2H]-uridine 2′-O-CEM amidite 172 (Schemes 2.3 and 2.4) [149]. We then outlined how atom-specific labeling can ensure artefact-free probing of RNA dynamics. Specifically, the removal of deleterious dipolar coupling simplifies R1 rate measurements and analysis (Figures 3.2-3.4 and 3.7) [198], whereas the removal of strong scalar coupling facilitates relaxation dispersion (i.e., CPMG and R1ρ) (Figure 3.11) and saturation transfer (i.e., CEST) measurements and analysis. Taken together, the chemical tools outlined in Chapters 1-3 can be used with great effect to study the structure, dynamics, and interactions of RNAs, as showcased by our study of the RNA element from HBV, designated ε. The cis-acting regulatory stem-loop ε plays a central role in the HBV life cycle through its interaction with P [292,295,297,304,317–31]. Unfortunately, the lack of structural data on P limits our understanding of this interaction and its subsequent functions. Our recent NMR structure of full-length 61 nt ε [20] (Figure 4.12) provides an important starting point for an improved understanding of HBV replication. In addition to structural information, our combined NMR and MD data suggest that nucleotides in the PL, PTL, and U43 bulge exhibit motions on multiple time scales (Figures 4.14-4.22) [281]. Given that these regions are required for the ε-P interaction [323,324] and its subsequent pgRNA packaging [298,303,310,325] and DNA synthesis [301,310,323,325] (Figures 4.6 and 4.22 and Tables 4.1 and 4.2), our data imply the functional importance of these conformational dynamics.
We therefore hypothesize that these ε motions facilitate its interaction with P and are further required to help direct the ε-P complex from one functional state to the next (i.e., protein priming, pgRNA packaging, and initial (-)-DNA strand synthesis), which likely requires cellular HFs. However, the lack of structural information for the ε-P complex and P itself prevents structure-enabled design of anti-HBV therapeutics. Our recent structure of full-length ε [20] (Figure 4.12) presents a necessary step in this direction. Indeed, structural analysis suggests that the 6 nt PL bulge forms a binding pocket that is amenable to small molecule targeting (Figure 4.12). Computational predictions [451] support this notion, mapping the most probable ligand cavity directly to the PL (Figure 5.7). We therefore employed two orthogonal strategies [20] to identify ε-targeting ligands as a means to reduce cHBV infection, which causes ~600,000 deaths per year [383,384]. While neither of the resulting compounds (Raloxifene and DAC) has been shown to reduce HBV replication in vitro or in vivo, both were shown to regulate PL dynamics (Figures 5.13 and 5.21) [20]. Indeed, modulating ε dynamics may be an effective therapeutic strategy, and consideration of RNA dynamics in small molecule targeting has shown initial promise [450]. As such, the approaches described herein provide useful platforms for the discovery of new compounds whose ability to alter ε dynamics may result in the inhibition of early stages of HBV replication.

6.2 Future directions

While the work described herein has showcased the development of new RNA labeling technologies and provided the necessary first steps toward an in-depth biophysical understanding of the early stages of HBV replication, there is plenty of room for future work. This final section briefly outlines specific areas of focus that would most readily allow us to expand our results.
6.2.1 RNA labeling

Our group [26] and Kreutz and co-workers [66] have shown the utility of [13C-19F]-Pyr labeling to study large RNA systems. Specifically, incorporation of the 13C-19F spin pair at aromatic nucleobase sites (e.g., Pyr-C5) leads to ~six-times wider chemical shift dispersion and ~two-times more favorable relaxation properties in TROSY experiments, compared to the 1H-13C spin pair [26,66]. However, based on the work of Arthanari and co-workers, the spectral benefits are predicted to be even greater for [2-13C, 2-19F]-Ade [481]. As such, in collaboration with Dr. Christoph Kreutz, we have begun to explore the synthesis and incorporation of [2-13C, 2-19F]-ATP into large (>200 nt) RNAs to experimentally validate these predictions (data not shown).

6.2.2 Structural modeling of full-length ε

While our current structural model of full-length ε [20] is a good starting point, our NMR restraints are sparse and we know that ε is highly dynamic [281]. As such, additional efforts to refine the ε structure as an ensemble are highly desirable. To this end, in collaboration with Dr. Christina Bergonzo, we plan to carry out refinement of full-length ε using two approaches: solution-state refinement using explicit solvent MD and sample-and-select (SAS) parsing of conformational pools. Explicit solvent refinement uses an explicit representation of water and counterions to better solvate ε, resulting in structures that are more reasonable and exhibit dynamics consistent with their known environment. This is critically important for nucleic acids, as they are usually non-spherical and formally charged, making quality descriptions of solvation necessary. It has been shown that with explicit solvent, in either the presence or absence of NMR restraints, the resulting structures better fit the experimental data [482]. Additionally, new methods using ROSETTA's FARFAR2 RNA 3D structure prediction tool [483] have yielded promising results in SAS for HIV-1 TAR [484].
Though TAR has a 3 nt internal bulge region and ε has a larger 6 nt internal bulge, the approach merits an attempt, given that thousands of FARFAR2-predicted structures can be generated in only a few days. The resulting FARFAR2-predicted ε structural ensemble will be parsed using the available NMR data [20], starting with chemical shifts, and the best-fitting structures can be compared to the refined ensemble. Taken together, these two approaches will provide a more robust model of the dynamic ε structural ensemble.

6.2.3 Probing slow motions in full-length ε

Our combined NMR and MD analysis of full-length ε demonstrates that nucleotides in the PL, PTL, and U43 bulge exhibit motions on multiple time scales (Figures 4.14-4.22) [281]. Specifically, 13C spin relaxation measurements and 1H CPMG measurements revealed ps-ns (Figures 4.14 and 4.15) and µs-ms (Figures 4.16 and 4.17) motions within full-length ε, respectively. However, the latter motions revealed by CPMG were all fast on the NMR time scale (i.e., fast exchange regime) and therefore chemical shift information for the lowly populated minor states was unavailable. To fill this critical knowledge gap, we wish to employ 1H and 13C CEST measurements. These data will reveal whether full-length ε experiences slower motions on the millisecond time scale and help uncover the underlying structural transitions they represent.

6.2.4 Ensemble-based virtual screening

Given that we have outlined the dynamic nature of full-length ε (Figures 4.14-4.22) [281], our VS-based approaches should reflect the inherent flexibility of the RNA target. However, as a first pass, we opted to perform rigid-docking VS. This was, in part, due to practical considerations. For example, the success of ensemble-based VS relies on having robust and plentiful experimental restraints such as RDCs and NOEs. Unfortunately, in the case of full-length ε, both types of data are sparse.
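The sample-and-select (SAS) parsing of a conformational pool mentioned in Section 6.2.2 can be illustrated with a minimal sketch. It assumes that one predicted observable vector (e.g., predicted chemical shifts) is already available for every conformer in the pool; the shift predictor itself is not shown. The function names `ensemble_score` and `sample_and_select`, the unweighted sum-of-squares score, and the greedy swap acceptance rule are illustrative assumptions rather than the protocol used in the cited work, which may rely on a Monte Carlo simulated-annealing search and error-weighted scoring.

```python
import random


def ensemble_score(pool_pred, members, measured):
    """Sum of squared differences between ensemble-averaged predicted
    observables (over the chosen members) and the measured observables."""
    score = 0.0
    for k in range(len(measured)):
        avg = sum(pool_pred[i][k] for i in members) / len(members)
        score += (avg - measured[k]) ** 2
    return score


def sample_and_select(pool_pred, measured, n_select, n_steps=2000, seed=0):
    """Pick n_select conformers whose averaged predictions best match the data.

    pool_pred: one list of predicted observables per conformer (hypothetical input).
    Uses greedy single-member swaps (an assumption made for brevity).
    """
    rng = random.Random(seed)
    n_pool = len(pool_pred)
    members = rng.sample(range(n_pool), n_select)  # random starting sub-ensemble
    best = ensemble_score(pool_pred, members, measured)
    for _ in range(n_steps):
        candidate = rng.randrange(n_pool)
        if candidate in members:
            continue  # keep the sub-ensemble free of duplicates
        trial = list(members)
        trial[rng.randrange(n_select)] = candidate  # swap one member
        score = ensemble_score(pool_pred, trial, measured)
        if score <= best:  # accept only swaps that do not worsen the fit
            members, best = trial, score
    return sorted(members), best


# Toy pool: four hypothetical conformers with one predicted observable each.
toy_pool = [[0.0], [10.0], [20.0], [30.0]]
toy_measured = [15.0]
selected, final_score = sample_and_select(toy_pool, toy_measured, n_select=2)
print(selected, final_score)
```

In this toy case two different pairs of conformers reproduce the measured average equally well, which reflects a general caveat of ensemble selection against sparse data: agreement with the measurements does not by itself guarantee a unique ensemble.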
However, the ensemble-based refinement techniques described above (i.e., explicit solvent MD and SAS of conformational pools) provide a computational path to circumvent our sparse RDC and NOE data. We therefore intend to utilize these ε conformational ensembles in VS-based methods with various compound libraries, following the initial success of similar approaches [450,464].

6.2.5 ε and N6-methylation

The biggest and most exciting path for future studies concerning ε revolves around the recent work of Siddiqui and co-workers [485]. Specifically, mutational analysis identified that m6A modification at the 5′-end ε motif (A59 in our full-length construct numbering) was required for efficient reverse transcription whereas m6A modification at the 3′-end ε motif destabilized all HBV transcripts, suggesting a dual regulatory role for m6A in the HBV life cycle [485]. In general, m6A is thought to exert its biological effects in two modes: recruiting protein machinery that regulates mRNA fate (e.g., splicing [486,487], export [488,489], decay [490], and translation initiation efficiency [491,492]) or modifying RNA structural dynamics and/or stability [145,146,493–496]. The former mechanism likely explains the destabilization of HBV transcripts. The increase in reverse transcription caused by m6A59, on the other hand, is likely a result of modulating ε structural dynamics and/or stability. A detailed mechanism of how this modification modulates the ε-P interaction and its downstream functions remains elusive but is critical to a full understanding of HBV replication. To fill this critical knowledge gap, we plan to perform NMR studies of full-length ε with and without m6A59. This work will also add evidence to the ongoing debate on how m6A affects RNA structural dynamics and/or stability [145,146,493–496]. In collaboration with Dr. Christoph Kreutz, we have begun to carry out such measurements. Initial results suggest that m6A59 destabilizes ε folding and has a varied effect on its dynamics (data not shown). However, additional measurements are needed to draw confident conclusions.

Appendix

Figure A.1. 1H NMR spectrum of compound 122 [124]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.2. 13C NMR spectrum of compound 122 [124]. NMR parameters: 75 MHz, DMSO-d6, 25 °C.
Figure A.3. 1H NMR spectrum of compound 151 [124]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.4. 1H NMR spectrum of compound 152 [124]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.5. 1H NMR spectrum of compound 153 [124]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.6. 13C NMR spectrum of compound 153 [124]. NMR parameters: 75 MHz, DMSO-d6, 25 °C.
Figure A.7. 1H NMR spectrum of compound 154 [124]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.8. 13C NMR spectrum of compound 154 [124]. NMR parameters: 75 MHz, DMSO-d6, 25 °C.
Figure A.9. 1H NMR spectrum of compound 155 [124]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.10. 13C NMR spectrum of compound 155 [124]. NMR parameters: 75 MHz, DMSO-d6, 25 °C.
Figure A.11. 1H NMR spectrum of compound 156 [124]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.12. 13C NMR spectrum of compound 156 [124]. NMR parameters: 75 MHz, DMSO-d6, 25 °C.
Figure A.13. ESI-TOF MS of compound 156 [124]. MS parameters: ESI positive mode.
Figure A.14. 31P NMR spectrum of compound 162 [124]. NMR parameters: 324 MHz, D2O, 25 °C.
Figure A.15. 1H NMR spectrum of compound 168 [149]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.16. 13C NMR spectrum of compound 168 [149]. NMR parameters: 75 MHz, DMSO-d6, 25 °C.
Figure A.17. 1H NMR spectrum of compound 171 [149]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.18. 13C NMR spectrum of compound 171 [149]. NMR parameters: 75 MHz, DMSO-d6, 25 °C.
Figure A.19. 1H NMR spectrum of compound 172 [149]. NMR parameters: 300 MHz, DMSO-d6, 25 °C.
Figure A.20. 13C NMR spectrum of compound 172 [149]. NMR parameters: 75 MHz, DMSO-d6, 25 °C.
Figure A.21. 31P NMR spectrum of compound 172 [149]. NMR parameters: 122 MHz, C6D6, 25 °C.
Figure A.22. ESI-TOF MS of compound 172 [149]. MS parameters: ESI negative mode.
Figure A.23. ESI-TOF MS of compound 173. MS parameters: ESI positive mode.
Figure A.24. ESI-TOF MS of compound 174. MS parameters: ESI negative mode.
Figure A.25. ESI-TOF MS of compound 175. MS parameters: ESI negative mode.
Figure A.26. ESI-TOF MS of compound 176. MS parameters: ESI positive mode.
Figure A.27. Simulated Ade-C2 relaxation in uniformly/selectively labeled RNA [198]. Simulated Ade-R2,C2(uniform) and -R2,C2(selective) rates (left) and Ade-hNOEC2(uniform) and -hNOEC2(selective) values (right). Both simulations were carried out at increasing τC and at 800 MHz. Our simulations suggest that dipolar interactions to Ade-C2 do not significantly contribute to these relaxation parameters. Additional details can be found in the original work [198].
Figure A.28. Representative TROSY spectra in Ade-C2 relaxation experiments [198]. (A) 1H-13C TROSY spectra of u-[13C/15N]-ATP and 162 labeled 61 nt RNA at 800 MHz and 25 °C. (B) Modified 1H-13C TROSY spectrum of 162 labeled 61 nt RNA at 800 MHz and 5 °C. To maximize signal-to-noise and minimize experiment time, we reduced the sweep-width and time-domain points while increasing the number of scans. Therefore, only four of eight Ade-H2-C2 resonances were well-resolved. The asterisk by A28 indicates that the signal is aliased (i.e., folded in).
Figure A.29. Measured Ade-C2 relaxation in uniformly/selectively labeled RNA [198]. Measured Ade-R2,C2(uniform) and -R2,C2(selective) rates (left) and Ade-hNOEC2(uniform) and -hNOEC2(selective) values (right) at 800 MHz and 25 °C. Mean rates are shown with dashed lines and error bars represent ± SD. Experimental Ade-R2,C2(uniform) and -R2,C2(selective) rates do not differ significantly (values are within experimental error).
Experimental Ade-hNOEC2(uniform) and -hNOEC2(selective) values also do not differ significantly, except A29 (designated *). Taken together, our simulations and experimental measurements suggest that dipolar interactions to Ade-C2 do not significantly contribute to these relaxation parameters.

Figure A.30. ε modular constructs faithfully recapitulate full-length ε [20]. (A) Secondary structure representations of ε constructs made from IVT for NMR studies, as in Figure 4.7. (B) Overlay of aromatic H6-C6/H8-C8 resonances from 1H-13C HMQC experiments on our three u-[13C/15N]-labeled ε constructs. Resonances marked (*) are from the non-native UUCG tetraloop used to close the UH of the PL ε construct. Resonances belonging to the first A:U base pair in the PL ε construct were shifted compared to full-length ε due to the difference in the preceding base pair type (i.e., G:C instead of C:G). The full-length ε aromatic region was well-resolved and therefore enabled a complete assignment of all nucleotides. Assignments were then confirmed with standard multidimensional NMR experiments. (C) Overlay of ribose H1′-C1′ resonances from 1H-13C HSQC experiments on all three ε constructs with various labeling patterns. The priming loop and apical loop constructs are shown overlaid with the full-length HBV ε spectrum. As with the aromatic region, ribose H1′-C1′ resonance assignment was straightforward and confirmed by multidimensional NMR.

Figure A.31. Comparison of full-length and apical loop ε NMR structures [20]. Overlay of top-ranked full-length 61 nt (PDB 6var [20]) and apical loop 27 nt ε (PDB 2ixy [238]) NMR conformers. Structures show strong agreement with a mean RMSD of 1.72 ± 0.12 Å.

Table A.1. Nuclei excluded from sPRE analysis due to spectral overlap.
Nuclei type | Excluded resonance
Pur-H8 | G1, A6, G8, U25-C27, G51, G52, A55, and G57
Pyr-H6 | U3-C5, U7, U25-C27, U39, C46, U56, and C61
Ribose H1′ | G1, U4-C6, G8, C10-A13, C19, A20, C23-C26, A28, A29, G45, C46, G51, G53-A55, A59 and C61
Pyr-H5 | U4, U7, U9, C11, U12, C23-U25, C27, U32, U34, C37, U39, C46, U47, C54, U56, and C61
Ade-H2 | A6 and A55

Table A.2. Nuclei excluded from 13C ?xy analysis due to spectral overlap.
Nuclei type | Excluded resonance
Pur-H8 | A6, G35, G40, G51, G52, A55, G57, and A59
Pyr-H6 | U3, U7, C26, C27, C37, U38, C46, and U56

Figure A.32. Full-length ε SAXS data [20]. Merged scattering at I(q)=0 data (top left) for the three ε concentrations (i.e., 1.0, 2.5, 5.0 mg/ml) with the Guinier plot at the top right. The characteristic profile of a well-folded RNA was observed in the Kratky plot (bottom left). Real-space PDDF analysis gave an RG of 29.8 Å and a Dmax of 97 Å (bottom right).

Figure A.33. Relaxation decay curves for full-length ε [281]. Relaxation decay curves for TROSY-detected 13C R1 and R1ρ measurements for (A) nucleobase (i.e., purine C8 and pyrimidine C6) and (B) ribose (i.e., C1′) nuclei. Representative curves are shown for each nucleus probed from the three atom-specifically labeled samples (Figure 4.13). All nuclei show monoexponential decay, indicating the absence of 13C-13C cross-relaxation and Hartmann-Hahn transfer and therefore the reliability of the data. All extracted relaxation rates and the quality of the fits are shown.

Table A.3. Individual data fitting of 1H CPMG measurements [281].
Nuclei | Model [a] | χ2 | AIC | R2,0 (s-1) [b] | Δω at pa = pb (ppm) [c] | kex (s-1) [b] | τex (µs) [b]
C14-H6 | Fast | 11.2 | 14.1 | 16.1 ± 0.2 | 0.03 ± 0.01 | 4,600 ± 1,780 | 220 ± 100
U15-H1′ | Fast | 13.1 | 5.4 | 17.3 ± 0.1 | 0.03 ± 0.00 | 1,700 ± 260 | 580 ± 80
U17-H6 | Fast | 35.4 | 17.5 | 17.5 ± 0.1 | 0.05 ± 0.01 | 3,400 ± 360 | 300 ± 30
U17-H1′ | Fast | 22.4 | 7.4 | 14.7 ± 0.1 | 0.04 ± 0.01 | 2,000 ± 240 | 490 ± 50
U18-H6 | Fast | 12.1 | -1.3 | 16.6 ± 0.1 | 0.06 ± 0.00 | 4,700 ± 490 | 210 ± 20
U18-H1′ | Fast | 8.2 | 5.4 | 17.6 ± 0.2 | 0.05 ± 0.00 | 2,800 ± 360 | 360 ± 40
C19-H6 | Fast | 10.6 | 3.0 | 18.4 ± 0.1 | 0.02 ± 0.00 | 2,100 ± 600 | 470 ± 100
U32-H6 | Fast | 4.3 | -5.9 | 14.6 ± 0.2 | 0.05 ± 0.00 | 4,400 ± 960 | 230 ± 40
U32-H1′ | Fast | 9.8 | -0.5 | 16.0 ± 0.1 | 0.05 ± 0.00 | 3,400 ± 370 | 290 ± 30
U34-H6 | Fast | 21.8 | 30.5 | 15.8 ± 0.9 | 0.12 ± 0.00 | 11,600 ± 2,770 | 90 ± 20
C36-H1′ | Fast | 7.8 | -1.3 | 16.1 ± 0.7 | 0.10 ± 0.02 | 10,600 ± 2,080 | 90 ± 20
U43-H6 | Fast | 8.0 | -12.8 | 16.5 ± 0.1 | 0.03 ± 0.00 | 3,800 ± 960 | 260 ± 50
U43-H1′ | Fast | 13.4 | 3.8 | 17.2 ± 0.2 | 0.03 ± 0.01 | 4,600 ± 2,310 | 220 ± 70
U48-H1′ | Fast | 169.1 | -1.1 | 9.0 ± 0.0 | 0.03 ± 0.00 | 2,000 ± 70 | 490 ± 20
U49-H1′ | Fast | 9.7 | 8.7 | 15.0 ± 0.2 | 0.03 ± 0.00 | 2,700 ± 620 | 370 ± 70
a. CPMG curves were fit using ShereKhan [363], RING NMR Dynamics [278], and rdnmr (version 1.5) [254], but only the former two programs have robust model selection criteria. Using ShereKhan, the appropriate exchange regime (i.e., fast exchange or slow exchange) was selected based on which model led to minimization of the chi-squared [363]. Using RING NMR Dynamics [278], the appropriate model was selected on the basis of the AIC [280]. Both selection criteria determined that all nuclei exhibit exchange that is fast on the NMR time scale.
b. The kex and R2,0 values are the direct output from Luz-Meiboom (fast exchange model) fitting with ShereKhan [363] and τex = 1/kex.
c. This quantity represents the Δω that would be present if pa and pb were equal; the values are the direct output from RING NMR Dynamics (fast exchange model) [278].

Table A.4. 1H CPMG data fitting from different analysis programs [281].
Nuclei | ShereKhan [a] R2,0 (s-1) | ShereKhan [a] kex (s-1) | RING [a] R2,0 (s-1) | RING [a] kex (s-1) | Rdnmr [a] R2,0 (s-1) | Rdnmr [a] kex (s-1)
C14-H6 | 16.1 ± 0.2 | 4,600 ± 1,780 | 16.4 ± 0.2 | 3,600 ± 2,680 | | 5,200
U15-H1′ | 17.3 ± 0.1 | 1,700 ± 260 | 17.4 ± 0.1 | 1,700 ± 210 | | 1,800
U17-H6 | 17.5 ± 0.1 | 3,400 ± 360 | 17.5 ± 0.2 | 3,400 ± 990 | | 3,600
U17-H1′ | 14.7 ± 0.1 | 2,000 ± 240 | 14.7 ± 0.3 | 2,000 ± 420 | | 2,100
U18-H6 | 16.6 ± 0.1 | 4,700 ± 490 | 16.6 ± 0.1 | 4,700 ± 620 | | 4,700
U18-H1′ | 17.6 ± 0.2 | 2,800 ± 360 | 17.6 ± 0.1 | 2,800 ± 320 | | 2,800
C19-H6 | 18.4 ± 0.1 | 2,100 ± 600 | 18.4 ± 0.1 | 2,100 ± 1,060 | | 2,300
U32-H6 | 14.6 ± 0.2 | 4,400 ± 960 | 14.6 ± 0.1 | 4,400 ± 680 | | 4,300
U32-H1′ | 16.0 ± 0.1 | 3,400 ± 370 | 16.0 ± 0.1 | 3,400 ± 490 | | 3,400
U34-H6 | 15.8 ± 0.9 | 11,600 ± 2,770 | 15.8 ± 1.6 | 11,700 ± 4,130 | | 12,200
C36-H1′ | 16.1 ± 0.7 | 10,600 ± 2,080 | 16.1 ± 0.8 | 10,600 ± 2,150 | | 10,900
U43-H6 | 16.5 ± 0.1 | 3,800 ± 960 | 16.5 ± 0.1 | 3,800 ± 1,270 | | 3,600
U43-H1′ | 17.2 ± 0.2 | 4,600 ± 2,310 | 17.2 ± 0.2 | 4,600 ± 2,310 | | 4,600
U48-H1′ | 9.0 ± 0.0 | 2,000 ± 70 | 9.1 ± 0.1 | 2,000 ± 380 | | 2,100
U49-H1′ | 15.0 ± 0.2 | 2,700 ± 620 | 15.0 ± 0.1 | 2,700 ± 530 | | 2,800
a. CPMG curves were fit using ShereKhan [363], RING NMR Dynamics [278], and rdnmr (version 1.5) [254] and the outputs are compared across the three programs. ShereKhan and RING provide fits with errors for R2,0 and kex. On the other hand, rdnmr only reports kex without error when fitting nuclei without global fit statistics. Nevertheless, the outputs from all three programs agree (i.e., are within error). For simplicity, we chose to report the values from ShereKhan throughout Section 4.4.2 and in Figures 4.16 and 4.17.

Figure A.34. Comparison of ε R3 MD trajectories [281]. Whole-atom RMSD (calculated against ε R3, PDB 6var [20]) for all three independent 1 µs MD runs for each nucleotide window (see Section 4.4.5.4), averaged over the entire 1 µs trajectory, is shown with ε structural regions abbreviated and colored as in Figure 4.7. While variations exist between the three MD runs, the overall trends (i.e., elevated RMSD for nucleotides in the PL, PTL, and U43 bulge far above the helical regions) are unambiguous.

Table A.5. Combined results for PL angle-based clustering of ε R3 [281].
Cluster | Fraction, average | Fraction, run 1 [a] | Fraction, run 2 | Fraction, run 3
C0 | 0.3093 | 0.0006 | 0.9272 | n.f.
C1 | 0.2497 | 0.0007 | 0.0235 | 0.7249
C2 | 0.1953 | 0.5858 | n.f. | n.f.
C3 | 0.1309 | 0.3927 | n.f. | n.f.
C4 | 0.1149 | 0.0202 | 0.0493 | 0.2751
a. Here, n.f. refers to not found (i.e., 0% populated).

Table A.6. Combined results for global RMSD-based clustering of ε R3 [281].
Cluster | Fraction, average | Fraction, run 1 [a] | Fraction, run 2 | Fraction, run 3
C0 | 0.3045 | 0.0025 | 0.9108 | 0.0002
C1 | 0.2001 | 0.0079 | n.f. | 0.5925
C2 | 0.1931 | 0.4726 | 0.0685 | 0.0381
C3 | 0.1817 | 0.5170 | 0.0207 | 0.0075
C4 | 0.1206 | n.f. | n.f. | 0.3617
a. Here, n.f. refers to not found (i.e., 0% populated).

Table A.7. NMR parameters for 13C R1 and R1ρ experiments [281].
Spin [a] | Experiment | Field (MHz) | Delay times (s) [b]
A/G-C8 | R1 | 800 | 0.10, 0.20 (x2), 0.40, 0.50, 0.70, 0.90, 1.10
A/G-C8 | R1 | 600 | 0.01, 0.20 (x2), 0.30, 0.50, 0.80, 0.90, 1.00
A/G-C1′ | R1 | 800 | 0.05, 0.10, 0.20 (x2), 0.40, 0.60, 0.90, 1.10, 1.30
A/G-C8 | R1ρ | 800 | 0.001, 0.003, 0.004, 0.006, 0.007, 0.009, 0.011
A/G-C8 | R1ρ | 600 | 0.001, 0.003, 0.006, 0.009, 0.012, 0.015, 0.018
A/G-C1′ | R1ρ | 800 | 0.0005, 0.002, 0.005, 0.007, 0.010, 0.012, 0.015, 0.018
C-C6 | R1 | 800 | 0.05, 0.20 (x2), 0.30, 0.40, 0.50, 0.60, 0.80, 0.90
C-C6 | R1 | 600 | 0.001, 0.10, 0.20 (x2), 0.40, 0.55, 0.70, 0.825
C-C1′ | R1 | 800 | 0.05, 0.10, 0.20 (x2), 0.40, 0.60, 0.90, 1.10, 1.30
C-C6 | R1ρ | 800 | 0.002, 0.004, 0.006, 0.008, 0.010, 0.012, 0.0135
C-C6 | R1ρ | 600 | 0.001, 0.003, 0.005, 0.007, 0.010, 0.012, 0.014
C-C1′ | R1ρ | 800 | 0.0005, 0.002, 0.005, 0.007, 0.010, 0.012, 0.015, 0.018
C-C6 | R1 | 800 | 0.05, 0.20 (x2), 0.30, 0.40, 0.50, 0.60, 0.80, 0.90
C-C6 | R1 | 600 | 0.001, 0.10, 0.20 (x2), 0.40, 0.55, 0.70, 0.825
C-C1′ | R1 | 800 | 0.05, 0.10, 0.20 (x2), 0.40, 0.60, 0.90, 1.10, 1.30
C-C6 | R1ρ | 800 | 0.002, 0.004, 0.006, 0.008, 0.010, 0.012, 0.0135
C-C6 | R1ρ | 600 | 0.001, 0.003, 0.005, 0.007, 0.010, 0.012, 0.014
C-C1′ | R1ρ | 800 | 0.0005, 0.002, 0.005, 0.007, 0.010, 0.012, 0.015, 0.018
a. A/G-C8 and A/G-C1′ refer to experiments with [1′,8-13C2]-ATP and [1′,8-13C2]-GTP labeled ε samples; C-C6 and C-C1′ refer to experiments with [1′,6-13C2]-CTP labeled ε samples; and U-C6 and U-C1′ refer to experiments with [1′,6-13C2]-UTP labeled full-length ε samples (Figure 4.13).
b. Duplicated measurements are indicated by "x2" and were used for estimating errors in R1 experiments.

Table A.8. Additional NMR parameters for 13C spin relaxation experiments [281].
Spin [a] | Experiment [b] | Field (MHz) | Scans | Increments | Carrier (ppm) | SW (ppm)
A/G-C8 | R1 | 800 | 32 | 80 | 136.0 | 6.0
A/G-C8 | R1ρ | 800 | 32 | 80 | 136.0 | 6.0
A/G-C8 | hNOE | 800 | 32 | 80 | 136.0 | 6.0
A/G-C8 | R1 | 600 | 64 | 80 | 136.35 | 6.0
A/G-C8 | R1ρ | 600 | 64 | 80 | 136.35 | 6.0
A/G-C8 | hNOE | 600 | 64 | 76 | 136.35 | 6.0
A/G-C1′ | R1 | 800 | 32 | 128 | 89.0 | 6.0
A/G-C1′ | R1ρ | 800 | 32 | 128 | 89.0 | 6.0
A/G-C1′ | hNOE | 800 | 32 | 128 | 89.0 | 6.0
C-C6 | R1 | 800 | 32 | 80 | 140.3 | 5.5
C-C6 | R1ρ | 800 | 32 | 80 | 140.3 | 5.5
C-C6 | hNOE | 800 | 32 | 80 | 140.3 | 5.5
C-C6 | R1 | 600 | 64 | 80 | 140.3 | 5.5
C-C6 | R1ρ | 600 | 64 | 80 | 140.3 | 5.5
C-C6 | hNOE | 600 | 64 | 70 | 140.3 | 5.5
C-C1′ | R1 | 800 | 32 | 100 | 90.6 | 4.0
C-C1′ | R1ρ | 800 | 32 | 100 | 90.6 | 4.0
C-C1′ | hNOE | 800 | 32 | 100 | 90.6 | 4.0
U-C6 | R1 | 800 | 32 | 80 | 140.0 | 4.3
U-C6 | R1ρ | 800 | 32 | 80 | 140.0 | 4.3
U-C6 | hNOE | 800 | 32 | 80 | 140.0 | 4.3
U-C6 | R1 | 600 | 64 | 80 | 140.0 | 4.5
U-C6 | R1ρ | 600 | 64 | 80 | 140.0 | 4.5
U-C6 | hNOE | 600 | 64 | 80 | 140.0 | 4.5
U-C1′ | R1 | 800 | 32 | 100 | 90.0 | 4.6
U-C1′ | R1ρ | 800 | 32 | 100 | 90.0 | 4.6
U-C1′ | hNOE | 800 | 32 | 98 | 90.0 | 4.6
a. A/G-C8 and A/G-C1′ refer to experiments with [1′,8-13C2]-ATP and [1′,8-13C2]-GTP labeled ε samples; C-C6 and C-C1′ refer to experiments with [1′,6-13C2]-CTP labeled ε samples; and U-C6 and U-C1′ refer to experiments with [1′,6-13C2]-UTP labeled full-length ε samples (Figure 4.13).
b. Saturation and no saturation hNOE experiments were run with identical conditions in the table above.

Table A.9. Nuclei that were excluded from dynamics analysis due to spectral overlap.
Nuclei type | Excluded resonance
Pur-H8/C8 | A6, G40, G51, A55, and G57
Pyr-H6/C6 | U7, C27, C37, and U56
Ribose H1′/C1′ | U3, C11, U12, A13, C23, A29, U38, G44, G52, C54, A55, and A59-C61

Table A.10. NMR parameters for 1H CPMG experiments.
Spin [a] | Field (MHz) | νCPMG (Hz)
A/G-C8 | 800 | 50, 100, 200, 250, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500
A/G-C1′ | 800 | 50, 100, 200, 250, 300, 400, 500, 750, 1,250, 1,750, 2,250, 2,500
C-C6 | 800 | 50, 100, 200, 250, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500
C-C1′ | 800 | 50, 100, 200, 250, 300, 400, 500, 750, 1,250, 1,750, 2,250, 2,500
U-C6 | 800 | 50, 100, 200, 250, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500
U-C1′ | 800 | 50, 100, 200, 250, 300, 400, 500, 750, 1,250, 1,750, 2,250, 2,500
a. A/G-C8 and A/G-C1′ refer to experiments with [1′,8-13C2]-ATP and [1′,8-13C2]-GTP labeled ε samples; C-C6 and C-C1′ refer to experiments with [1′,6-13C2]-CTP labeled ε samples; and U-C6 and U-C1′ refer to experiments with [1′,6-13C2]-UTP labeled full-length ε samples (Figure 4.13).

Table A.11. Additional NMR parameters for 1H CPMG experiments.
Spin [a] | Field (MHz) | Scans | Increments | Carrier (ppm) | SW (ppm)
A/G-C8 | 800 | 32 | 80 | 136.0 | 6.0
A/G-C1′ | 800 | 32 | 100 | 89.0 | 6.0
C-C6 | 800 | 32 | 80 | 140.5 | 5.5
C-C1′ | 800 | 32 | 100 | 90.6 | 4.0
U-C6 | 800 | 32 | 80 | 140.0 | 4.3
U-C1′ | 800 | 32 | 100 | 90.0 | 4.6
a. A/G-C8 and A/G-C1′ refer to experiments with [1′,8-13C2]-ATP and [1′,8-13C2]-GTP labeled ε samples; C-C6 and C-C1′ refer to experiments with [1′,6-13C2]-CTP labeled ε samples; and U-C6 and U-C1′ refer to experiments with [1′,6-13C2]-UTP labeled full-length ε samples (Figure 4.13).

Figure A.35. Small molecule microarray set-up [20]. (A) SMM screen [25,452–456] used to identify ε-targeting ligands. A library of ~26,000 chemotypes was covalently linked to slides via their hydroxyl or amine functional groups (see Section 5.5.1.5.1). Then, 5′-CGGCGG-3′-TYE665 (control) or 5′-CGGCGG-3′-TYE665-labeled ε was applied to the slides. Upon scanning, red spots were indicative of ε binding to the covalently linked small molecule. Each compound is printed in duplicate on an array. (B) Fluorescent spot analysis profiles for the five initial hit compounds.

Figure A.36. NMR titration experiments for SMM-selected compounds [20]. 1H-13C sofast-HMQC NMR experiments were used to measure CSPs for aromatic (i.e., H6/H8-C6/C8) resonances in atom-specifically labeled (Figure 4.9A) full-length ε (50 µM) titrated with SMM-selected compounds (up to 250 µM). NSC200618 (top left) and NSC200619 (top right) both showed no significant CSPs, indicating no binding. Pinafide (bottom left) showed resonance line broadening indicative of ε aggregation.
Pentamidine (bottom right), on the other hand, showed CSPs for aromatic resonances of nucleotides in both the PL and AL, suggesting non-specific binding to ε.

Figure A.37. Testing selectivity of Raloxifene and Pentamidine [20]. (A) Atom-specifically labeled (i.e., [8-13C]-Ade/Gua and [6-13C, 5-2H]-Cyt-Uri) AL ε used in NMR titration experiments in B. (B) 1H-13C sofast-HMQC NMR experiments were used to measure CSPs for aromatic (i.e., H6/H8-C6/C8) resonances in atom-specifically labeled (shown in A) AL ε (50 µM) titrated with either Raloxifene (left) or Pentamidine (right) (both up to 200 µM).

Figure A.38. Priming loop docking validation. Docked complexes of all ligand poses in all docking runs with the ε R3 (PDB 6var [20]) target. ε structural regions are abbreviated and colored as in Figure 4.7 and lead compounds are abbreviated as in Figure 5.16. Each of these 12 compounds passed the final filter step, which determined confident PL docking on the basis of two criteria: (1) top-rank pose localized to the PL in >50% of repeated runs and/or (2) >50% of all poses localize to the PL (see Section 5.5.2.4.1).

Figure A.39. Non-binding lead compounds. Representative binding curves from dye-displacement experiments to determine the binding affinity of each VS-identified lead compound to full-length ε. These compounds do not show evidence of binding at the concentrations used (i.e., IC50 >500 µM), as determined by data fitting with Equation 5.1.

Bibliography

(1) Olenginski, L. T., Taiwo, K. M., Leblanc, R. M., and Dayie, T. K. (2021) Isotope-labeled RNA building blocks for NMR structure and dynamics studies. Molecules 26, 5581.
(2) Becette, O., Olenginski, L. T., and Dayie, T. K. (2019) Solid-phase chemical synthesis of stable isotope-labeled RNA to aid structure and dynamics studies by NMR spectroscopy. Molecules 24, 3476.
(3) Dayie, T. K., Olenginski, L. T., and Taiwo, K. M.
(2022) Isotope labels combined with solution NMR spectroscopy makes visible the invisible conformations of small-to-large RNAs. Chem. Rev. (in press).
(4) Grundy, F. J., and Henkin, T. M. (1993) tRNA as a positive regulator of transcription antitermination in B. subtilis. Cell 74, 475–482.
(5) Winkler, W., Nahvi, A., and Breaker, R. R. (2002) Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419, 952–956.
(6) Mironov, A. S., Gusarov, I., Rafikov, R., Lopez, L. E., Shatalin, K., Kreneva, R. A., Perumov, D. A., and Nudler, E. (2002) Sensing small molecules by nascent RNA: a mechanism to control transcription in bacteria. Cell 111, 747–756.
(7) Lee, R. C., Feinbaum, R. L., and Ambros, V. (1993) The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843–854.
(8) Wightman, B., Ha, I., and Ruvkun, G. (1993) Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855–862.
(9) Guerrier-Takada, C., Gardiner, K., Marsh, T., Pace, N., and Altman, S. (1983) The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell 35, 849–857.
(10) Kruger, K., Grabowski, P. J., Zaug, A. J., Sands, J., Gottschling, D. E., and Cech, T. R. (1982) Self-splicing RNA: autoexcision and autocyclization of the ribosomal RNA intervening sequence of Tetrahymena. Cell 31, 147–157.
(11) Steitz, T. A., and Steitz, J. A. (1993) A general two-metal-ion mechanism for catalytic RNA. Proc. Natl. Acad. Sci. U. S. A. 90, 6498–6502.
(12) Yik, J. H. N., Chen, R., Nishimura, R., Jennings, J. L., Link, A. J., and Zhou, Q. (2003) Inhibition of P-TEFb (CDK9/Cyclin T) kinase and RNA polymerase II transcription by the coordinated actions of HEXIM1 and 7SK snRNA. Mol. Cell 12, 971–982.
(13) Zappulla, D. C., and Cech, T. R. (2006) RNA as a flexible scaffold for proteins: yeast telomerase and beyond. Cold Spring Harb. Symp.
Quant. Biol. 71, 217–224.
(14) Tycowski, K. T., Guo, Y. E., Lee, N., Moss, W. N., Vallery, T. K., Xie, M., and Steitz, J. A. (2015) Viral noncoding RNAs: more surprises. Genes Dev. 29, 567–584.
(15) D'Souza, V., and Summers, M. F. (2005) How retroviruses select their genomes. Nat. Rev. Microbiol. 3, 643–655.
(16) Ganser, L. R., Kelly, M. L., Herschlag, D., and Al-Hashimi, H. M. (2019) The roles of structural dynamics in the cellular functions of RNAs. Nat. Rev. Mol. Cell Biol. 20, 474–489.
(17) Marušič, M., Schlagnitweit, J., and Petzold, K. (2019) RNA Dynamics by NMR Spectroscopy. Chembiochem 20, 2685–2710.
(18) Zhang, K., Keane, S. C., Su, Z., Irobalieva, R. N., Chen, M., Van, V., Sciandra, C. A., Marchant, J., Heng, X., Schmid, M. F., Case, D. A., Ludtke, S. J., Summers, M. F., and Chiu, W. (2018) Structure of the 30 kDa HIV-1 RNA dimerization signal by a hybrid cryo-EM, NMR, and molecular dynamics approach. Structure 26, 490-498.e3.
(19) Wacker, A., Weigand, J. E., Akabayov, S. R., Altincekic, N., Bains, J. K., Banijamali, E., Binas, O., Castillo-Martinez, J., Cetiner, E., Ceylan, B., Chiu, L. Y., Davila-Calderon, J., Dhamotharan, K., Duchardt-Ferner, E., Ferner, J., Frydman, L., Fürtig, B., Gallego, J., Tassilo Grün, J., Hacker, C., Haddad, C., Hähnke, M., Hengesbach, M., Hiller, F., Hohmann, K. F., Hymon, D., de Jesus, V., Jonker, H., Keller, H., Knezic, B., Landgraf, T., Löhr, F., Luo, L., Mertinkus, K. R., Muhs, C., Novakovic, M., Oxenfarth, A., Palomino-Schätzlein, M., Petzold, K., Peter, S. A., Pyper, D. J., Qureshi, N. S., Riad, M., Richter, C., Saxena, K., Schamber, T., Scherf, T., Schlagnitweit, J., Schlundt, A., Schnieders, R., Schwalbe, H., Simba-Lahuasi, A., Sreeramulu, S., Stirnal, E., Sudakov, A., Tants, J. N., Tolbert, B. S., Vögele, J., Weiß, L., Wirmer-Bartoschek, J., Wirtz Martin, M. A., Wöhnert, J., and Zetzsche, H. (2020) Secondary structure determination of conserved SARS-CoV-2 RNA elements by NMR spectroscopy. Nucleic Acids Res.
48, 12415?12435. (20) LeBlanc, R. M., Kasprzak, W. K., Longhini, A. P., Olenginski, L. T., Abulwerdi, F., Ginocchio, S., Shields, B., Nyman, J., Svirydava, M., Del Vecchio, C., Ivanic, J., Schneekloth, J. S., Shapiro, B. A., Dayie, T. K., and Le Grice, S. F. J. (2021) Structural insights of the conserved ?priming loop? of hepatitis B virus pre-genomic RNA. J. Biomol. Struct. Dyn. (21) Binas, O., de Jesus, V., Landgraf, T., V?lklein, A. E., Martins, J., Hymon, D., Kaur Bains, J., Berg, H., Biedenb?nder, T., F?rtig, B., Lakshmi Gande, S., Niesteruk, A., Oxenfarth, A., Shahin Qureshi, N., Schamber, T., Schnieders, R., Tr?ster, A., Wacker, A., Wirmer-Bartoschek, J., Wirtz Martin, M. A., Stirnal, E., Azzaoui, K., Richter, C., Sreeramulu, S., Jos? Blommers, M. J., and Schwalbe, H. (2021) 19F NMR-based fragment screening for 14 different biologically active RNAs and 10 DNA and protein counter-screens. Chembiochem 22, 423?433. (22) Thompson, R. D., Baisden, J. T., and Zhang, Q. (2019) NMR characterization of RNA small molecule interactions. Methods 167, 66?77. (23) Johnson, E. C., Feher, V. A., Peng, J. W., Moore, J. M., and Williamson, J. R. (2003) Application of NMR SHAPES screening to an RNA target. J. Am. Chem. Soc. 125, 15724? 15725. (24) Lind, K. E., Du, Z., Fujinaga, K., Peterlin, B. M., and James, T. L. (2002) Structure- based computational database screening, in vitro assay, and NMR assessment of compounds that target TAR RNA. Chemistry & Biology 9, 185?193. (25) Abulwerdi, F. A., Xu, W., Ageeli, A. A., Yonkunas, M. J., Arun, G., Nam, H., Schneekloth, J. S., Dayie, T. K., Spector, D., Baird, N., and Le Grice, S. F. J. (2019) Selective small-molecule targeting of a triple helix encoded by the long noncoding RNA, MALAT1. ACS Chem. Biol. 14, 223?235. (26) Becette, O. B., Zong, G., Chen, B., Taiwo, K. M., Case, D. A., and Dayie, T. K. (2020) Solution NMR readily reveals distinct structural folds and interactions in doubly 13C- And 19F-labeled RNAs. Sci. Adv. 
6, eabc6572. (27) Kao, C., Zheng, M., and R?disser, S. (1999) A simple and efficient method to reduce nontemplated nucleotide addition at the 3? terminus of RNAs transcribed by T7 RNA polymerase. RNA 5, 1268?1272. (28) Helmling, C., Keyhani, S., Sochor, F., F?rtig, B., Hengesbach, M., and Schwalbe, H. (2015) Rapid NMR screening of RNA secondary structure and binding. J. Biomol. NMR 228 63, 67?76. (29) Milligan, J. F., Groebe, D. R., Witherell, G. W., and Uhlenbeck, O. C. (1987) Oligoribonucleotide synthesis using T7 RNA polymerase and synthetic DNA templates. Nucleic Acids Res. 15, 8783?8789. (30) Milligan, J. F., and Uhlenbeck, O. C. (1989) Synthesis of small RNAs using T7 RNA polymerase. Methods Enzymol. 180, 51?62. (31) Beaucage, S. L., and Reese, C. B. (2009) Recent advances in the chemical synthesis of RNA. Curr. Protoc. Nucleic Acid Chem. 38, 2.16.1-2.16.31. (32) Reese, C. B. (1989) The chemical synthesis of oligo- and poly-ribonucleotides. Nucleic Acids Mol. Biol. 3, 164?181. (33) Ogilvie, K. K., Sadana, K. L., Thompson, E. A., Quilliam, M. A., and Westmore, J. B. (1974) The use of silyl groups in protecting the hydroxyl functions of ribonucleosides. Tetrahedron Lett. 15, 2861?2863. (34) Ogilvie, K. K., Theriault, N., and Sadana, K. L. (2002) Synthesis of oligoribonucleotides. J. Am. Chem. Soc. 99, 7741?7743. (35) Wilkinson, D. J. (2018) Historical and contemporary stable isotope tracer approaches to studying mammalian protein metabolism. Mass Spectrom. Rev. 37, 57?80. (36) Harris, R. K., Becker, E. D., Cabral De Menezes, S. M., Goodfellow, R., and Granger, P. (2001) NMR nomenclature. Nuclear spin properties and conventions for chemical shifts (IUPAC recommendations 2001). Pure Appl. Chem. 73, 1795?1818. (37) Bernstein, M. A., King, K. F., and Zhou, X. J. (2004) Handbook of MRI Pulse Sequences. Handb. MRI Pulse Seq. 1?1017. (38) Wijmenga, S. S., and Van Buuren, B. N. M. (1998) The use of NMR methods for conformational studies of nucleic acids. Prog. Nucl. 
Magn. Reson. Spectrosc. 32, 287? 387. (39) Xue, Y., Kellogg, D., Kimsey, I. J., Sathyamoorthy, B., Stein, Z. W., McBrairty, M., and Al-Hashimi, H. M. (2015) Characterizing RNA excited states using NMR relaxation dispersion. Methods Enzymol. 558, 39?73. (40) Wang, Y., Han, G., Jiang, X., Yuwen, T., and Xue, Y. (2021) Chemical shift prediction of RNA imino groups: application toward characterizing RNA excited states. Nat. Commun. 12, 15925. (41) Schnieders, R., Keyhani, S., Schwalbe, H., and F?rtig, B. (2020) More than proton detection?new avenues for NMR spectroscopy of RNA. Chem. Eur. J. 26, 102?113. (42) Kime, M. J., and Moore, P. B. (1984) Escherichia coli ribosomal 5S RNA-protein L25 nucleoprotein complex: effects of RNA binding on the protein structure and the nature of the interaction. Biochemistry 23, 1688?1695. (43) LeMaster, D. M., and Richards, F. M. (1988) NMR sequential assignment of Escherichia coli thioredoxin utilizing random fractional deuteriation. Biochemistry 27, 142?150. (44) Torchia, D. A., Sparks, S. W., and Bax, A. (1989) Staphylococcal Nuclease: Sequential Assignments and Solution Structure. Biochemistry 28, 5509?5524. (45) Gewirth, D. T., Leontis, N. B., Abo, S. R., and Moore, P. B. (1987) Secondary structure of 5S RNA: NMR experiments on RNA molecules partially labeled with nitrogen- 15. Biochemistry 26, 5213?5220. (46) Roy, S., Papastavros, M. Z., Redfield, A. G., and Sanchez, V. (1984) Nitrogen-15- labeled yeast tRNAPhe: double and two-dimensional heteronuclear NMR of guanosine and 229 uracil ring NH groups. Biochemistry 23, 4395?4400. (47) Griffey, R. H., Davis, D., Yamaizumit, Z., Nishimurat, S., Bax, A., Hawkins, B., and Poulter, C. D. (1985) 15N-labeled Escherichia coli tRNAfMet, tRNAGlu, tRNATyr, and tRNAPhe. Double resonance and two-dimensional NMR of N1-labeled pseudouridine. J. Biol. Chem. 260, 9734?9741. (48) Davis, D. R., Yamaizumi, Z., Nishimura, S., and Poulter, C. D. (1989) 15N-labeled 5S RNA. 
Identification of uridine base pairs in Escherichia coli 5S RNA by 1H-15N multiple quantum NMR. Biochemistry 28, 4105?4108. (49) Shigeyuki, Y., Usuki, K. M. J., Ziro, Y., Susumu, N., and Tatsuo, M. (1980) Tertiary structures of Escherichia coli tRNA as studied by NMR spectroscopy with 13C-labeling method. FEBS Lett. 119, 77?80. (50) Oslen, J. I., Schweizer, M. P., Walkiw, I. J., Hamill, W. D., Horton, W. J., and Grant, D. M. (1982) Carbon-13 NMR relaxation studies of pre-melt structural dynamics in [4-13C- uracil] labeled E. coli transfer RNAIVal. Nucleic Acids Res. 10, 4449?4464. (51) Schmidt, P. G., Sierzputowska-Gracz, H., Agris, P. F., and Agris, P. F. (1987) Internal motions in yeast phenylalanine transfer RNA from 13C NMR relaxation rates of modified base methyl groups: a Model-free approach. Biochemistry 26, 8529?8534. (52) Nikonowicz, E. P., Michnicka, M., Kalurachchi, K., and DeJong, E. (1997) Preparation and characterization of a uniformly 2H/15N-labeled RNA oligonucleotide for NMR Studies. Nucleic Acids Res. 25, 1390?1396. (53) Nikonowicz, E. P., and Pardi, A. (1992) Three-dimensional heteronuclear NMR studies of RNA. Nature 355, 184?186. (54) Nikonowicz, E. P., Sirr, A., Legault, P., Jucker, F. M., Baer, L. M., and Pardi, A. (1992) Preparation of 13C and 15N labelled RNAs for heteronuclear multi-dimensional NMR studies. Nucleic Acids Res. 20, 4507?4513. (55) Batey, R. T., Inada, M., Kujawinski, E., Puglisi, J. D., and Williamson, J. R. (1992) Preparation of isotopically labeled ribonucleotides for multidimensional NMR spectroscopy of RNA. Nucleic Acids Res. 20, 4515?4523. (56) Santoro, J., and King, G. C. (1992) A constant-time 2D overbodenhausen experiment for inverse correlation of isotopically enriched species. J. Magn. Reson. 97, 202?207. (57) F?ldesi, A., Trifonova, A., Kundu, M. K., and Chattopadhyaya, J. (2000) The synthesis of deuterionucleosides. Nucleosides, Nucleotides and Nucleic Acids 19, 1615? 1656. (58) Sattler, M., and Fesik, S. W. 
(1996) Use of deuterium labeling in NMR: overcoming a sizeable problem. Structure 4, 1245?1249. (59) D?Souza, V., Dey, A., Habib, D., and Summers, M. F. (2004) NMR structure of the 101-nucleotide core encapsidation signal of the Moloney murine leukemia virus. J. Mol. Biol. 337, 427?442. (60) Keane, S. C., Heng, X., Lu, K., Kharytonchyk, S., Ramakrishnan, V., Carter, G., Barton, S., Hosic, A., Florwick, A., Santos, J., Bolden, N. C., McCowin, S., Case, D. A., Johnson, B. A., Salemi, M., Telesnitsky, A., and Summers, M. F. (2015) Structure of the HIV-1 RNA packaging signal. Science 348, 917?921. (61) Marchant, J., Bax, A., and Summers, M. F. (2018) Accurate measurement of residual dipolar couplings in large RNAs by variable flip angle NMR. J. Am. Chem. Soc. 140, 6978?6983. (62) Kotar, A., Foley, H. N., Baughman, K. M., and Keane, S. C. (2020) Advanced 230 approaches for elucidating structures of large RNAs using NMR spectroscopy and complementary methods. Methods 183, 93?107. (63) Boyd, P. S., Brown, J. B., Brown, J. D., Catazaro, J., Chaudry, I., Ding, P., Dong, X., Marchant, J., O?Hern, C. T., Singh, K., Swanson, C., Summers, M. F., and Yasin, S. (2020) NMR Studies of retroviral genome packaging. Viruses 12, 1115. (64) Kang, M., Eichhorn, C. D., and Feigon, J. (2014) Structural determinants for ligand capture by a class II preQ1 riboswitch. Proc. Natl. Acad. Sci. U. S. A. 111, E663?E671. (65) Rastinejad, F., Evilia, C., and Lu, P. (1995) Studies of nucleic acids and their protein interactions by 19F NMR. Methods Enzymol. 261, 560?575. (66) Nu?baumer, F., Plangger, R., Roeck, M., and Kreutz, C. (2020) Aromatic 19F?13C TROSY?[19F, 13C]-pyrimidine labeling for NMR spectroscopy of RNA. Angew. Chemie Int. Ed. 59, 17062?17069. (67) Scott, L. G., and Hennig, M. (2016) 19F-Site-Specific-Labeled Nucleotides for Nucleic Acid Structural Analysis by NMR. Methods Enzymol. 566, 59?87. (68) Huang, W., Varani, G., and Drobny, G. P. 
(2010) 13C/15N-19F intermolecular REDOR NMR study of the interaction of TAR RNA with Tat peptides. J. Am. Chem. Soc. 132, 17643?17645. (69) Kreutz, C., K?hlig, H., Konrat, R., and Micura, R. (2006) A general approach for the identification of site-specific RNA binders by 19F NMR spectroscopy: proof of concept. Angew. Chemie Int. Ed. 45, 3450?3453. (70) Markley, J. L., Bax, A., Arata, Y., Hilbers, C. W., Kaptein, R., Sykes, B. D., Wright, P. E., and W?thrich, K. (1998) Recommendations for the presentation of NMR structures of proteins and nucleic acids. Eur. J. Biochem. 256, 1?15. (71) (1983) Abbreviations and Symbols for the Description of Conformations of Polynucleotide Chains: Recommendations 1982. Eur. J. Biochem. 131, 9?15. (72) Weickhmann, A. K., Keller, H., Wurm, J. P., Strebitzer, E., Juen, M. A., Kremser, J., Weinberg, Z., Kreutz, C., Duchardt-Ferner, E., and W?hnert, J. (2019) The structure of the SAM/SAH-binding riboswitch. Nucleic Acids Res. 47, 2654?2665. (73) Weickhmann, A. K., Keller, H., Duchardt-Ferner, E., Strebitzer, E., Juen, M. A., Kremser, J., Wurm, J. P., Kreutz, C., and W?hnert, J. (2018) NMR resonance assignments for the SAM/SAH-binding riboswitch RNA bound to S- adenosylhomocysteine. Biomol. NMR Assign. 12, 329?334. (74) Hoard, D. E., and Ott, D. G. (2002) Conversion of mono- and oligodeoxyribonucleotides to 5?-triphosphates. J. Am. Chem. Soc. 87, 1785?1788. (75) Simon, E. S., Grabowski, S., and Whitesides, G. M. (2002) Convenient syntheses of cytidine 5?-triphosphate, guanosine 5?-triphosphate, and uridine 5?-triphosphate and their use in the preparation of UDP-glucose, UDP-glucuronic acid, and GDP-mannose. J. Org. Chem. 55, 1834?1841. (76) Michnicka, M. J., King, G. C., and Harper, J. W. (1993) Selective isotopic enrichment of synthetic RNA: application to the HIV-1 TAR element. Biochemistry 32, 395?400. (77) Hines, J. V., Landry, S. M., Varani, G., and Tinoco, I. 
(1994) Carbon-proton scalar couplings in RNA: 3D heteronuclear and 2D isotope-edited NMR of a 13C-labeled extra- stable hairpin. J. Am. Chem. Soc. 116, 5823?5831. (78) Hoffman, D. W., and Holland, J. A. (1995) Preparation of carbon-13 labeled ribonucleotides using acetate as an isotope source. Nucleic Acids Res. 23, 3361?3362. (79) LeMaster, D. M., and Kushlan, D. M. (1996) Dynamical mapping of E. coli thioredoxin 231 via 13C NMR relaxation analysis. J. Am. Chem. Soc. 118, 9255?9264. (80) Fraenkel, D. G. (1968) Selection of Escherichia coli mutants lacking glucose-6- phosphate dehydrogenase or gluconate-6-phosphate dehydrogenase. J. Bacteriol. 95, 1267?1271. (81) Johnson, J. E., Julien, K. R., and Hoogstraten, C. G. (2006) Alternate-site isotopic labeling of ribonucleotides for NMR studies of ribose conformational dynamics in RNA. J. Biomol. NMR 35, 261?274. (82) Thakur, C. S., and Dayie, T. K. (2012) Asymmetry of 13C labeled 3-pyruvate affords improved site specific labeling of RNA for NMR spectroscopy. J. Biomol. NMR 52, 65?77. (83) Schultheisz, H. L., Szymczyna, B. R., Scott, L. G., and Williamson, J. R. (2008) Pathway engineered enzymatic de novo purine nucleotide synthesis. ACS Chem. Biol. 3, 499?511. (84) Schultheisz, H. L., Szymczyna, B. R., Scott, L. G., and Williamson, J. R. (2011) Enzymatic de novo pyrimidine nucleotide synthesis. J. Am. Chem. Soc. 133, 297?304. (85) Kline, P. C., and Serianni, A. S. (1990) 13C-Enriched ribonucleosides: synthesis and application of 13C-1H and 13C-13C spin-coupling constants to assess furanose and A- glycoside bond conformations. J. Am. Chem. Soc. 112, 7373?7381. (86) Cook, G. P., and Greenberg, M. M. (1994) A General Synthesis of C2?-Deuteriated Ribonucleosides. J. Org. Chem. 59, 4704?4706. (87) Toyama, A., Takino, Y., Takeuchi, H., and Harada, I. (2002) Ultraviolet resonance Raman spectra of ribosyl C(1?)-deuterated purine nucleosides: evidence of vibrational coupling between purine and ribose rings. J. Am. Chem. Soc. 
115, 11092?11098. (88) F?ldesi, A., Nilson, F. P. R., Glemarec, C., Gioeli, C., and Chattopadhyaya, J. (1992) Synthesis of 1?#,2?,3?,4?#,5?,5?-2H6-?-D-ribonucleosides and 1?#, 2?,2?,3?,4?#,5?,5?-2H7-?-D- 2?-deoxyribonucleosides for selective suppression of proton resonances in partially- deuterated oligo-DNA, oligo-RNA and in 2,5A core (1H-NMR window). Tetrahedron 48, 9033?9072. (89) Vorbr?ggen, H., Krolikiewicz, K., and Bennua, B. (1981) Nucleoside syntheses, XXII1) Nucleoside synthesis with trimethylsilyl triflate and perchlorate as catalysts. Chem. Ber. 114, 1234?1255. (90) Lunn, F. A., MacDonnell, J. E., and Bearne, S. L. (2008) Structural requirements for the activation of Escherichia coli CTP synthase by the allosteric effector GTP are stringent, but requirements for inhibition are lax. J. Biol. Chem. 283, 2010?2020. (91) Gross, A., Abril, O., Lewis, J. M., Geresh, S., and Whitesides, G. M. (2002) Practical synthesis of 5-phospho-D-ribosyl ?-1-pyrophosphate (PRPP): enzymatic routes from ribose 5-phosphate or ribose. J. Am. Chem. Soc. 105, 7428?7435. (92) Hirschbein, B. L., Mazenod, F. P., and Whitesides, G. M. (2002) Synthesis of phosphoenolypyruvate and its use in ATP cofactor regeneration. J. Org. Chem. 47, 3765? 3766. (93) Parkin, D. W., and Schramm, V. L. (1987) Catalytic and allosteric mechanism of AMP nucleosidase from primary, beta-secondary, and multiple heavy atom kinetic isotope effects. Biochemistry 26, 913?920. (94) Rising, K. A., and Schramm, V. L. (1994) Enzymatic synthesis of NAD+ with the specific incorporation of atomic labels. J. Am. Chem. Soc. 116, 6531?6536. (95) Gilles, A. M., Cristea, I., Palibroda, N., Hilden, I., Jensen, K. F., Sarfati, R. S., Namane, A., Ughetto-Monfrin, J., and Ba?rzu, O. (1995) Chemienzymatic synthesis of 232 uridine nucleotides labeled with [15N] and [13C]. Anal. Biochem. 232, 197?203. (96) Parkin, D. W., Leung, H. B., and Schramm, V. L. (1984) Synthesis of nucleotides with specific radiolabels in ribose. J. Biol. 
Chem. 259, 9411?9417. (97) Scott, L. G., Tolbert, T. J., and Williamson, J. R. (2000) Preparation of specifically 2H- and 13C-labeled ribonucleotides. Methods Enzymol. 317, 18?38. (98) Tolbert, T. J., and Williamson, J. R. (1996) Preparation of Specifically Deuterated RNA for NMR studies using a combination of chemical and enzymatic synthesis. J. Am. Chem. Soc. 118, 7929?7940. (99) Longhini, A. P., Leblanc, R. M., Becette, O., Salguero, C., Wunderlich, C. H., Johnson, B. A., D?Souza, V. M., Kreutz, C., and Dayie, T. K. (2016) Chemo-enzymatic synthesis of site-specific isotopically labeled nucleotides for use in NMR resonance assignment, dynamics and structural characterizations. Nucleic Acids Res. 44, e52. (100) Alvarado, L. J., Leblanc, R. M., Longhini, A. P., Keane, S. C., Jain, N., Yildiz, Z. F., Tolbert, B. S., D?Souza, V. M., Summers, M. F., Kreutz, C., and Dayie, T. K. (2014) Regio- selective chemical-enzymatic synthesis of pyrimidine nucleotides facilitates RNA structure and dynamics studies. Chembiochem 15, 1573?1577. (101) Arthur, P. K., Alvarado, L. J., and Dayie, T. K. (2011) Expression, purification and analysis of the activity of enzymes from the pentose phosphate pathway. Protein Expr. Purif. 76, 229?237. (102) Roberts, J. L., and Poulter, C. D. (2002) 2?,3?,5?-Tri-O-benzoyl[4-13C]uridine. An efficient, regiospecific synthesis of the pyrimidine ring. J. Org. Chem. 43, 1547?1550. (103) Santalucia, J., Shen, L. X., Cai, Z., Lewis, H., and Tinoco, I. (1995) Synthesis and NMR of RNA with selective isotopic enrichment in the bases. Nucleic Acids Res. 23, 4913?4921. (104) Wunderlich, C. H., Spitzer, R., Santner, T., Fauster, K., Tollinger, M., and Kreutz, C. (2012) Synthesis of (6-13C)pyrimidine nucleotides as spin-labels for RNA dynamics. J. Am. Chem. Soc. 134, 7558?7569. (105) Davidson, D., and Baudisch, O. (1926) The preparation of uracil from urea. J. Am. Chem. Soc. 48, 2379?2383. (106) Niu, C. H. 
(1984) Synthesis of [4-15NH2]- and [1,3-15N2]cytidine derivatives for use in NMR-monitored binding tests. Anal. Biochem. 139, 404?407. (107) Juen, M. A., Wunderlich, C. H., Nu?baumer, F., Tollinger, M., Kontaxis, G., Konrat, R., Hansen, D. F., and Kreutz, C. (2016) Excited states of nucleic acids probed by proton relaxation dispersion NMR spectroscopy. Angew. Chemie Int. Ed. 55, 12008?12012. (108) Taiwo, K. M., Becette, O. B., Zong, G., Chen, B., Zavalij, P. Y., and Dayie, T. K. (2021) Chemo-enzymatic synthesis of 13C- and 19F-labeled uridine-5?-triphosphate for RNA NMR probing. Monatshefte f?r Chemie 152, 441?447. (109) Hennig, M., Munzarov?, M. L., Bermel, W., Scott, L. G., Sklen??, V., and Williamson, J. R. (2006) Measurement of long-range 1H-19F scalar coupling constants and their glycosidic torsion dependence in 5-fluoropyrimidine-substituted RNA. J. Am. Chem. Soc. 128, 5851?5858. (110) Cushley, R. J., Lipsky, S. R., and Fox, J. J. (1968) Reactions of 5-fluorouracil derivatives with sodium deuteroxide. Tetrahedron Lett. 9, 5393?5396. (111) Nu?baumer, F., Juen, M. A., Gasser, C., Kremser, J., M?ller, T., Tollinger, M., and Kreutz, C. (2017) Synthesis and incorporation of 13C-labeled DNA building blocks to probe structural dynamics of DNA by NMR. Nucleic Acids Res. 45, 9178. 233 (112) Kremser, J., Strebitzer, E., Plangger, R., Juen, M. A., Nu?baumer, F., Glasner, H., Breuker, K., and Kreutz, C. (2017) Chemical synthesis and NMR spectroscopy of long stable isotope labelled RNA. Chem. Commun. 53, 12938?12941. (113) Abad, J. L., Gaffney, B. L., and Jones, R. A. (1999) 15N-Multilabeled adenine and guanine nucleosides. Syntheses of [1,3,NH2-15N3]- and [2-13C-1,3,NH2-15N3]-labeled adenosine, guanosine, 2?-deoxyadenosine, and 2?-deoxyguanosine. J. Org. Chem. 64, 6575?6582. (114) Battaglia, U., Long, J. E., Searle, M. S., and Moody, C. J. (2011) 7-Deazapurine biosynthesis: NMR study of toyocamycin biosynthesis in Streptomyces rimosus using 2- 13C-7-15N-adenine. Org. 
Biomol. Chem. 9, 2227?2232. (115) Ouwerkerk, N., Van Boom, J., Lugtenburg, J., and Raap, J. (2002) Synthesis of [1?,2?,5?,2-13C4]-2?-deoxy-D-adenosine by a chemoenzymatic strategy to enable labelling of any of the 215 carbon-13 and nitrogen-15 isotopomers. European J. Org. Chem. 14, 2356?2362. (116) Sethi, S. K., Gupta, S. P., Jenkins, E. E., Whitehead, C. W., Townsend, L. B., and McCloskey, J. A. (1982) Mass spectrometry of nucleic acid constituents. Electron ionization spectra of selectively labeled adenines. J. Am. Chem. Soc. 104, 3349?3353. (117) Pagano, A. R., Lajewski, W. M., and Jones, R. A. (1995) Syntheses of [6,7-15N]- Adenosine, [6,7-15N]-2?-deoxyadenosine, and [7-15N]-hypoxanthine. J. Am. Chem. Soc. 117, 11669?11672. (118) Bendich, A., Russell, P. J., and Fox, J. J. (2002) The synthesis and properties of 6- chloropurine and Purine. J. Am. Chem. Soc. 76, 6073?6077. (119) Orji, C. C., Kelly, J., Ashburn, D. A., and Silks, L. A. (1996) First synthesis of ?-2?- deoxy[9-15N]adenosine. J. Chem. Soc. Perkin Trans. 595?597. (120) Jain, M. L., Tsao, Y. P., Ho, N. L., and Cheng, J. W. (2001) A facile synthesis of [N1,NH2-15N2]-, [N3,NH2-15N2]-, and [N1,N3,NH2-15N3]-labeled adenine. J. Org. Chem. 66, 6472?6475. (121) Zhao, H., Pagano, A. R., Wang, W., Shallop, A., Gaffney, B. L., and Jones, R. A. (1997) Use of a 13C atom to differentiate two 15N-labeled nucleosides. Syntheses of [15NH2]-adenosine,[1,NH2-15N2]- And [2-13C-1,NH2-15N2]-Guanosine, and [1,7,NH2- 15N3]- and [2-13C-1,7,NH2-15N3]-2?. J. Org. Chem. 62, 7832?7835. (122) Gaffney, B. L., Rung, P. P., and Jones, R. A. (1990) Nitrogen-15-labeled deoxynucleosides. 2. Synthesis of [7-15N]-labeled deoxyadenosine, deoxyguanosine, and related deoxynucleosides. J. Am. Chem. Soc. 112, 6748?6749. (123) Del, M., Barrio, C. G., Scopes, D. I. C., Holtwick, J. B., and Leonard, N. J. (1981) Syntheses of all singly labeled [15N]adenines: Mass spectral fragmentation of adenine. Proc. Natl. Acad. Sci. U. S. A. 78, 3986?3988. 
(124) Olenginski, L. T., and Dayie, T. K. (2020) Chemo-enzymatic synthesis of [2-13C, 7- 15N]-ATP for facile NMR analysis of RNA. Monatshefte f?r Chemie 151, 1467?1473. (125) Neuner, S., Kreutz, C., and Micura, R. (2017) The synthesis of 15N(7)-Hoogsteen face-labeled adenosine phosphoramidite for solid-phase RNA synthesis. Monatshefte f?r Chemie 148, 149?155. (126) Taiwo, K. M., Olenginski, L. T., Nu?baumer, F., Nam, H., Hilber, S., Kreutz, C., and Dayie, T. K. (2022) Synthesis of [7-15N]-GTPs for RNA structure and dynamics by NMR spectroscopy. Monatshefte f?r Chemie 153, 293?299. (127) Scott, L. G., Geierstanger, B. H., Williamson, J. R., and Hennig, M. (2004) 234 Enzymatic synthesis and 19F NMR studies of 2-fluoroadenine-substituted RNA. J. Am. Chem. Soc. 126, 11776?11777. (128) Hennig, M., Scott, L. G., Sperling, E., Bermel, W., and Williamson, J. R. (2007) Synthesis of 5-fluoropyrimidine nucleotides as sensitive NMR probes of RNA structure. J. Am. Chem. Soc. 129, 14911?14921. (129) Zhang, W., Zhao, S., and Serianni, A. S. (2015) Labeling monosaccharides with stable isotopes. Methods Enzymol. 565, 423?458. (130) Longhini, A. P., LeBlanc, R. M., Becette, O., Salguero, C., Wunderlich, C. H., Johnson, B. A., D?Souza, V. M., Kreutz, C., and Dayie, T. K. (2016) Chemo-enzymatic synthesis of site-specific isotopically labeled nucleotides for use in NMR resonance assignment, dynamics and structural characterizations. Nucleic Acids Res. 44, e52. (131) Zhang, W., Turney, T., Surjancev, I., and Serianni, A. S. (2017) Enzymatic synthesis of ribo- and 2?-deoxyribonucleosides from glycofuranosyl phosphates: an approach to facilitate isotopic labeling. Carbohydr. Res. 449, 125?133. (132) Wenter, P., and Pitsch, S. (2003) Synthesis of selectively 15N-Labeled 2?-O- {[(Triisopropylsilyl)oxy]methyl}(=tom)-Protected ribonucleoside phosphoramidites and their incorporation into a bistable 32mer RNA sequence. Helv. Chim. Acta 86, 3955?3974. (133) Gaffney, B. L., and Jones, R. A. 
(2001) Regioselective 2?-silylation of purine ribonucleosides for phosphoramidite RNA synthesis. Curr. Protoc. Nucleic Acid Chem. 6, 2.8.1-2.8.13. (134) Shallop, A. J., Gaffney, B. L., and Jones, R. A. (2004) Use of both direct and indirect 13C tags for probing nitrogen interactions in hairpin ribozyme models by 15N NMR. Nucleosides, Nucleotides and Nucleic Acids 23, 273?280. (135) Zhang, X., Gaffney, B. L., and Jones, R. A. (1998) 15N NMR of RNA fragments containing specifically labeled GU and GC pairs. J. Am. Chem. Soc. 120, 615?618. (136) Xiaohu Zhang, Barbara L. Gaffney, A., and Jones, R. A. (1997) 15N NMR of a specifically labeled RNA fragment containing intrahelical GU wobble pairs. J. Am. Chem. Soc. 119, 6432?6433. (137) Neuner, S., Santner, T., Kreutz, C., and Micura, R. (2015) The ?speedy? synthesis of atom-specific 15N imino/amido-labeled RNA. Chem. Eur. J. 21, 11634?11643. (138) Pitsch, S., Weiss, P. A., Jenny, L., Stutz, A., and Wu, X. (2001) Reliable chemical synthesis of oligoribonucleotides (RNA) with 2?-O-[(triisopropylsilyl)oxy]methyl(2?-O-tom)- protected phosphoramidites. Helv. Chim. Acta 84, 3773?3795. (139) Ohgi, T., Masutomi, Y., Ishiyama, K., Kitagawa, H., Shiba, Y., and Yano, J. (2005) A new RNA synthetic method with a 2?-O-(2-cyanoethoxymethyl) protecting group. Org. Lett. 7, 3477?3480. (140) Ohgi, T., Kitagawa, H., and Yano, J. (2008) Chemical synthesis of oligoribonucleotides with 2?-O-(2-cyanoethoxymethyl)-protected phosphoramidites. Curr. Protoc. Nucleic Acid Chem. 34, 2.15.1-2.15.19. (141) Shiba, Y., Masuda, H., Watanabe, N., Ego, T., Takagaki, K., Ishiyama, K., Ohgi, T., and Yano, J. (2007) Chemical synthesis of a very long oligoribonucleotide with 2- cyanoethoxymethyl (CEM) as the 2?-O-protecting group: structural identification and biological activity of a synthetic 110mer precursor-microRNA candidate. Nucleic Acids Res. 35, 3287?3296. (142) Beaucage, S. L., and Caruthers, M. H. 
(1981) Deoxynucleoside phosphoramidites?a new class of key intermediates for deoxypolynucleotide synthesis. 235 Tetrahedron Lett. 22, 1859?1862. (143) Liu, B., Shi, H., and Al-Hashimi, H. M. (2021) Developments in solution-state NMR yield broader and deeper views of the dynamic ensembles of nucleic acids. Curr. Opin. Struct. Biol. 70, 16?25. (144) Strebitzer, E., Rangadurai, A., Plangger, R., Kremser, J., Juen, M. A., Tollinger, M., Al-Hashimi, H. M., and Kreutz, C. (2018) 5-Oxyacetic acid modification destabilizes double helical stem structures and favors anionic Watson-Crick like cmo5 U-G Base Pairs. Chem. - A Eur. J. 24, 18903?18906. (145) Shi, H., Liu, B., Nussbaumer, F., Rangadurai, A., Kreutz, C., and Al-Hashimi, H. M. (2019) NMR chemical exchange measurements reveal that N6-methyladenosine slows RNA annealing. J. Am. Chem. Soc. 141, 19988?19993. (146) Liu, B., Shi, H., Rangadurai, A., Nussbaumer, F., Chu, C. C., Erharter, K. A., Case, D. A., Kreutz, C., and Al-Hashimi, H. M. (2021) A quantitative model predicts how m6A reshapes the kinetic landscape of nucleic acid hybridization and conformational transitions. Nat. Commun. 2021 121 12, 1?17. (147) Desaulniers, J. P., Chang, Y. C., Aduri, R., Abeysirigunawardena, S. C., Santalucia, J., and Chow, C. S. (2008) Pseudouridines in rRNA helix 69 play a role in loop stacking interactions. Org. Biomol. Chem. 6, 3892?3895. (148) Chen, B., Longhini, A. P., Nu?baumer, F., Kreutz, C., Dinman, J. D., and Dayie, T. K. (2018) CCR5 RNA Pseudoknots: Residue and Site-Specific Labeling correlate Internal Motions with microRNA Binding. Chem. Eur. J. 24, 5462?5468. (149) Olenginski, L. T., Becette, O. B., Beaucage, S. L., and Dayie, T. K. (2021) Synthesis of atom-specific nucleobase and ribose labeled uridine phosphoramidite for NMR analysis of large RNAs. Monatshefte f?r Chemie 152, 1361?1367. (150) Scaringe, S. A., Kitchen, D., Kaiser, R. J., and Marshall, W. S. 
(2004) Preparation of 5?-silyl-2?-orthoester ribonucleosides for use in oligoribonucleotide synthesis. Curr. Protoc. Nucleic Acid Chem. 16, 2.10.1-2.10.16. (151) Schwartz, M. E., Breaker, R. R., Asteriadis, G. T., deBear, J. S., and Gough, G. R. (1992) Rapid synthesis of oligoribonucleotides using 2?-O-(o-nitrobenzyloxymethyl)- protected monomers. Bioorg. Med. Chem. Lett. 2, 1019?1024. (152) Krieg, P. A., and Melton, D. A. (1987) In vitro RNA synthesis with SP6 RNA polymerase. Methods Enzymol. 155, 397?415. (153) Pokrovskaya, I. D., and Gurevich, V. V. (1994) In Vitro transcription: preparative RNA yields in analytical scale reactions. Anal. Biochem. 220, 420?423. (154) William Studier, F., Rosenberg, A. H., Dunn, J. J., and Dubendorff, J. W. (1990) Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol. 185, 60?89. (155) Coleman, T. M., Wang, G., and Huang, F. (2004) Superior 5? homogeneity of RNA from ATP-initiated transcription under the T7 phi 2.5 promoter. Nucleic Acids Res. 32, e14. (156) Helm, M., Brule, H., Giege, R., and Florentz, C. (1999) More mistakes by T7 RNA polymerase at the 5? ends of in vitro-transcribed RNAs. RNA 5, 618?621. (157) Pleiss, J. A., Derrick, M. L., and Uhlenbeck, O. C. (1998) T7 RNA polymerase produces 5? end heterogeneity during in vitro transcription from certain templates. RNA 4, 1313?1317. (158) Krupp, G. (1988) RNA synthesis: strategies for the use of bacteriophage RNA 236 polymerases. Gene 72, 75?89. (159) Grosshans, C. A., and Cech, T. R. (1991) A hammerhead ribozyme allows synthesis of a new form of the Tetrahymena ribozyme homogeneous in length with a 3? end blocked for transesterification. Nucleic Acids Res. 19, 3875?3880. (160) Ferr?-D?Amar?, A. R., and Doudna, J. A. (1996) Use of cis- and trans-ribozymes to remove 5? and 3? heterogeneities from milligrams of in vitro transcribed RNA. Nucleic Acids Res. 24, 977?978. (161) Brieba, L., and Sousa, R. 
(2000) Roles of histidine 784 and tyrosine 639 in ribose discrimination by T7 RNA polymerase. Biochemistry 39, 919?923. (162) Padilla, R., and Sousa, R. (2002) A Y639F/H784A T7 RNA polymerase double mutant displays superior properties for synthesizing RNAs with non-canonical NTPs. Nucleic Acids Res. 30, e138. (163) Sousa, R., and Padilla, R. (1995) A mutant T7 RNA polymerase as a DNA polymerase. EMBO J. 14, 4609?4621. (164) Kostyuk, D. A., Dragan, S. M., Lyakhov, D. L., Rechinsky, V. O., Tunitskaya, V. L., Chernov, B. K., and Kochetkov, S. N. (1995) Mutants of T7 RNA polymerase that are able to synthesize both RNA and DNA. FEBS Lett. 369, 165?168. (165) Wu, M. Z., Asahara, H., Tzertzinis, G., and Roy, B. (2020) Synthesis of low immunogenicity RNA with high-temperature in vitro transcription. RNA 26, 345?360. (166) Moore, M. J., and Query, C. C. (2000) Joining of RNAs by splinted ligation. Methods Enzymol. 317, 109?123. (167) Porecha, R., and Herschlag, D. (2013) RNA radiolabeling. Methods Enzymol. 530, 255?279. (168) Romaniuk, P. J., and Uhlenbeck, O. C. (1983) Joining of RNA molecules with RNA ligase. Methods Enzymol. 100, 52?59. (169) Bain, J. D., and Switzer, C. (1992) Regioselective ligation of oligoribonucleotides using DNA splints. Nucleic Acids Res. 20, 4372. (170) Stark, M. R., Pleiss, J. A., Deras, M., Scaringe, S. A., and Rader, S. D. (2006) An RNA ligase-mediated method for the efficient creation of large, synthetic RNAs. RNA 12, 2014?2019. (171) Kim, I., Lukavsky, P. J., and Puglisi, J. D. (2002) NMR study of 100 kDa HCV IRES RNA, using segmental isotope labeling. J. Am. Chem. Soc. 124, 9338?9339. (172) Tzakos, A. G., Easton, L. E., and Lukavsky, P. J. (2006) Complementary segmental labeling of large RNAs: economic preparation and simplified NMR spectra for measurement of more RDCs. J. Am. Chem. Soc. 128, 13344?13345. (173) Nelissen, F. H. T., van Gammeren, A. J., Tessari, M., Girard, F. C., Heus, H. A., and Wijmenga, S. S. 
(2008) Multiple segmental and selective isotope labeling of large RNA for NMR structural studies. Nucleic Acids Res. 36, e89. (174) Duss, O., Maris, C., Von Schroetter, C., and Allain, F. H. T. (2010) A fast, efficient and sequence-independent method for flexible multiple segmental isotope labeling of RNA using ribozyme and RNase H cleavage. Nucleic Acids Res. 38, e188. (175) Liu, Y., Holmstrom, E., Zhang, J., Yu, P., Wang, J., Dyba, M. A., Chen, D., Ying, J., Lockett, S., Nesbitt, D. J., Ferr?-D?Amar?, A. R., Sousa, R., Stagno, J. R., and Wang, Y. X. (2015) Synthesis and applications of RNAs with position-selective labelling and mosaic composition. Nature 522, 368?372. (176) Liu, Y., Yu, P., Dyba, M., Sousa, R., Stagno, J. R., and Wang, Y. X. (2016) 237 Applications of PLOR in labeling large RNAs at specific sites. Methods 103, 4?10. (177) Liu, Y., Holmstrom, E., Yu, P., Tan, K., Zuo, X., Nesbitt, D. J., Sousa, R., Stagno, J. R., and Wang, Y. X. (2018) Incorporation of isotopic, fluorescent, and heavy-atom- modified nucleotides into RNAs by position-selective labeling of RNA. Nat. Protoc. 13, 987?1005. (178) Keyhani, S., Goldau, T., Bl?mler, A., Heckel, A., and Schwalbe, H. (2018) Chemo- Enzymatic Synthesis of Position-Specifically Modified RNA for Biophysical Studies including Light Control and NMR Spectroscopy. Angew. Chemie Int. Ed. 57, 12017? 12021. (179) Bl?mler, A., Schwalbe, H., and Heckel, A. (2022) Solid-Phase-Supported Chemoenzymatic Synthesis of a Light-Activatable tRNA Derivative. Angew. Chemie Int. Ed. 61, e202111613. (180) Su, Z., Zhang, K., Kappel, K., Li, S., Palo, M. Z., Pintilie, G. D., Rangan, R., Luo, B., Wei, Y., Das, R., and Chiu, W. (2021) Cryo-EM structures of full-length Tetrahymena ribozyme at 3.1 ? resolution. Nature 596, 603?607. (181) Kappel, K., Zhang, K., Su, Z., Watkins, A. M., Kladwang, W., Li, S., Pintilie, G., Topkar, V. V., Rangan, R., Zheludev, I. N., Yesselman, J. D., Chiu, W., and Das, R. 
(2020) Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat. Methods 17, 699?707. (182) Gauto, D. F., Estrozi, L. F., Schwieters, C. D., Effantin, G., Macek, P., Sounier, R., Sivertsen, A. C., Schmidt, E., Kerfah, R., Mas, G., Colletier, J. P., G?ntert, P., Favier, A., Schoehn, G., Schanda, P., and Boisbouvier, J. (2019) Integrated NMR and cryo-EM atomic-resolution structure determination of a half-megadalton enzyme complex. Nat. Commun. 10, 2697. (183) Shimada, I., Ueda, T., Kofuku, Y., Eddy, M. T., and W?thrich, K. (2018) GPCR drug discovery: integrating solution NMR data with crystal and cryo-EM structures. Nat. Rev. Drug Discov. 18, 59?82. (184) Seffernick, J. T., and Lindert, S. (2020) Hybrid methods for combined experimental and computational determination of protein structure. J. Chem. Phys. 153, 240901. (185) Lu, K., Miyazaki, Y., and Summers, M. F. (2010) Isotope labeling strategies for NMR studies of RNA. J. Biomol. NMR 46, 113?125. (186) Barnwal, R. P., Loh, E., Godin, K. S., Yip, J., Lavender, H., Tang, C. M., and Varani, G. (2016) Structure and mechanism of a molecular rheostat, an RNA thermometer that modulates immune evasion by Neisseria meningitidis. Nucleic Acids Res. 44, 9426?9437. (187) de Jesus, V., Qureshi, N. S., Warhaut, S., Bains, J. K., Dietz, M. S., Heilemann, M., Schwalbe, H., and F?rtig, B. (2021) Switching at the ribosome: riboswitches need rProteins as modulators to regulate translation. Nat. Commun. 12, 4723. (188) Traube, W. (1904) Der Aufbau der xanthinbasen aus der cyanessigs?ure. Synthese des hypoxanthins und adenins. Justus Liebigs Ann. Chem. 331, 64?88. (189) Nardi-Schreiber, A., Sapir, G., Gamliel, A., Kakhlon, O., Sosna, J., Gomori, J. M., Meiner, V., Lossos, A., and Katz-Brull, R. (2017) Defective ATP breakdown activity related to an ENTPD1 gene mutation demonstrated using 31P NMR spectroscopy. Chem. Commun. 53, 9121?9124. (190) Lakomek, N. A., Kaufman, J. D., Stahl, S. J., Louis, J. 
M., Grishaev, A., Wingfield, P. T., and Bax, A. (2013) Internal dynamics of the homotrimeric HIV-1 viral coat protein gp41 on multiple time scales. Angew. Chemie Int. Ed. 52, 3911?3915. 238 (191) Hansen, A. L., and Al-Hashimi, H. M. (2007) Dynamics of large elongated RNA by NMR carbon relaxation. J. Am. Chem. Soc. 129, 16072?16082. (192) Dieckmann, T., and Feigon, J. (1997) Assignment methodology for larger RNA oligonucleotides: Application to an ATP-binding RNA aptamer. J. Biomol. NMR 9, 259? 272. (193) Hoffman, D. W. (2000) Resolution of the 1H-1H NOE spectrum of RNA into three dimensions using 15N-1H two-bond couplings. J. Biomol. NMR 16, 165?169. (194) Hyberts, S. G., Takeuchi, K., and Wagner, G. (2010) Poisson-gap sampling and forward maximum entropy reconstruction for enhancing the resolution and sensitivity of protein NMR data. J. Am. Chem. Soc. 132, 2145?2147. (195) Lakomek, N. A., Ying, J., and Bax, A. (2012) Measurement of 15N relaxation rates in perdeuterated proteins by TROSY-based methods. J. Biomol. NMR 53, 209?221. (196) Johnson, B. A., and Blevins, R. A. (1994) NMR View: A computer program for the visualization and analysis of NMR data. J. Biomol. NMR 4, 603?614. (197) Fushman, D., Cahill, S., and Cowburn, D. (1997) The main-chain dynamics of the dynamin pleckstrin homology (PH) domain in solution: analysis of 15N relaxation with monomer/dimer equilibration. J. Mol. Biol. 266, 173?194. (198) Olenginski, L. T., and Dayie, T. K. (2021) Quantifying the effects of long-range 13C- 13C dipolar coupling on measured relaxation rates in RNA. J. Biomol. NMR 75, 203?211. (199) Komoroski, R. A., and Allerhand, A. (1974) Observation of resonances from some minor bases in the natural-abundance carbon-13 nuclear magnetic resonance spectrum of unfractionated yeast transfer ribonucleic acid. Evidence for fast internal motion of the dihydrouracil rings. Biochemistry 13, 369?372. (200) Kay, L. E., Jue, T. L., Bangerter, B., and Demou, P. C. 
(1987) Sensitivity enhancement of 13C T1 measurements via polarization transfer. J. Magn. Reson. 73, 558? 564. (201) Sklen??, V. ?., Torchia, D., and Bax, A. (1987) Measurement of carbon-13 longitudinal relaxation using 1H detection. J. Magn. Reson. 73, 375?379. (202) Nirmala, N. R., and Wagner, G. (2002) Measurement of 13C relaxation times in proteins by two-dimensional heteronuclear 1H-13C correlation spectroscopy. J. Am. Chem. Soc. 110, 7557?7558. (203) Kay, L. E., Torchia, D. A., and Bax, A. (1989) Backbone dynamics of proteins as studied by 15N inverse detected heteronuclear NMR spectroscopy: application to staphylococcal nuclease. Biochemistry 28, 8972?8979. (204) Peng, J. W., and Wagner, G. (1992) Mapping of spectral density functions using heteronuclear NMR relaxation measurements. J. Magn. Reson. 98, 308?332. (205) Dayie, K. T., and Wagner, G. (1994) Relaxation-Rate Measurements for 15N?1H groups with pulsed-field gradients and preservation of coherence pathways. J. Magn. Reson. 111, 121?126. (206) Farrow, N. A., Muhandiram, R., Pascal, S. M., Kay, L. E., Singer, A. U., Forman- Kay, J. D., Kay, C. M., Gish, G., Pawson, T., and Shoelson, S. E. (1994) Backbone dynamics of a free and a phosphopeptide-complexed src homology 2 domain studied by 15N NMR relaxation. Biochemistry 33, 5984?6003. (207) Akke, M., Fiala, R., Jiang, F., Patel, D., and Palmer, A. G. (1997) Base dynamics in a UUCG tetraloop RNA hairpin characterized by 15N spin relaxation: correlations with structure and stability. RNA 3, 702?709. 239 (208) Hall, K. B., and Tang, C. (1998) 13C relaxation and dynamics of the purine bases in the iron responsive element RNA hairpin. Biochemistry 37, 9323?9332. (209) Boisbouvier, J., Brutscher, B., Simorre, J. P., and Marion, D. (1999) 13C spin relaxation measurements in RNA: Sensitivity and resolution improvement using spin- state selective correlation experiments. J. Biomol. NMR 14, 241?252. (210) Hoogstraten, C. G., Wank, J. R., and Pardi, A. 
(2000) Active site dynamics in the lead-dependent ribozyme. Biochemistry 39, 9951?9958. (211) Dayie, K. T., Brodsky, A. S., and Williamson, J. R. (2002) Base flexibility in HIV-2 TAR RNA mapped by solution 15N, 13C NMR relaxation. J. Mol. Biol. 317, 263?278. (212) Vallurupalli, P., and Kay, L. E. (2005) A suite of 2H NMR spin relaxation experiments for the measurement of RNA dynamics. J. Am. Chem. Soc. 127, 6893?6901. (213) Rinnenthal, J., Richter, C., Nozinovic, S., F?rtig, B., Lopez, J. J., Glaubitz, C., and Schwalbe, H. (2009) RNA phosphodiester backbone dynamics of a perdeuterated cUUCGg tetraloop RNA from phosphorus-31 NMR relaxation analysis. J. Biomol. NMR 45, 143?155. (214) Nozinovic, S., Richter, C., Rinnenthal, J., F?rtig, B., Duchardt-Ferner, E., Weigand, J. E., and Schwalbe, H. (2010) Quantitative 2D and 3D ?-HCP experiments for the determination of the angles ? and ? in the phosphodiester backbone of oligonucleotides. J. Am. Chem. Soc. 132, 10318?10329. (215) Rangadurai, A., Szymaski, E. S., Kimsey, I. J., Shi, H., and Al-Hashimi, H. M. (2019) Characterizing micro-to-millisecond chemical exchange in nucleic acids using off- resonance R1? relaxation dispersion. Prog. Nucl. Magn. Reson. Spectrosc. 112?113, 55? 102. (216) Al-Hashimi, H. M., and Walter, N. G. (2008) RNA dynamics: It is about time. Curr. Opin. Struct. Biol. 18, 321. (217) Bothe, J. R., Nikolova, E. N., Eichhorn, C. D., Chugh, J., Hansen, A. L., and Al- Hashimi, H. M. (2011) Characterizing RNA dynamics at atomic resolution using solution- state NMR spectroscopy. Nat. Methods 8, 919?931. (218) Ishima, R., and Torchia, D. A. (2000) Protein dynamics from NMR. Nat. Struct. Biol. 7, 740?743. (219) Palmer, A. G. (2004) NMR Characterization of the dynamics of biomacromolecules. Chem. Rev. 104, 3623?3640. (220) Peng, J. W. (2001) Cross-correlated 19F relaxation measurements for the study of fluorinated ligand-receptor interactions. J. Magn. Reson. 153, 32?47. (221) Dayie, K. 
T., Wagner, G., and Lef?vre, J. F. (2003) Theory and practice of nuclear spin relaxation in proteins. Annu. Rev. Phys. Chem. 47, 243?282. (222) Fushman, D., and Cowburn, D. (2001) Nuclear magnetic resonance relaxation in determination of residue-specific 15N chemical shift tensors in proteins in solution: protein dynamics, structure, and applications of transverse relaxation optimized spectroscopy. Methods Enzymol. 339, 109?122. (223) Ying, J., Grishaev, A., Bryce, D. L., and Bax, A. (2006) Chemical shift tensors of protonated base carbons in helical RNA and DNA from NMR relaxation and liquid crystal measurements. J. Am. Chem. Soc. 128, 11443?11454. (224) Fushman, D., Tjandra, N., and Cowburn, D. (1998) Direct measurement of 15N chemical shift anisotropy in solution. J. Am. Chem. Soc. 120, 10947?10952. (225) Spiess, H. W. (1978) Rotation of Molecules and Nuclear Spin Relaxation. Dyn. NMR 240 Spectrosc. 55?214. (226) Lipari, G., and Szabo, A. (2002) Model-free approach to the interpretation of nuclear magnetic resonance relaxation in macromolecules. 1. Theory and range of validity. J. Am. Chem. Soc. 104, 4546?4559. (227) Nam, H., Becette, O., LeBlanc, R. M., Oh, D., Case, D. A., and Dayie, T. K. (2020) Deleterious effects of carbon-carbon dipolar coupling on RNA NMR dynamics. J. Biomol. NMR 74, 321?331. (228) Shajani, Z., Drobny, G., and Varani, G. (2007) Binding of U1A protein changes RNA dynamics as observed by 13C NMR relaxation studies. Biochemistry 46, 5875?5883. (229) Shajani, Z., and Varani, G. (2005) 13C NMR relaxation studies of RNA base and ribose nuclei reveal a complex pattern of motions in the RNA binding site for human U1A protein. J. Mol. Biol. 349, 699?715. (230) Boisbouvier, J., Wu, Z., Ono, A., Kainosho, M., and Bax, A. (2003) Rotational diffusion tensor of nucleic acids from 13C NMR relaxation. J. Biomol. NMR 27, 133?142. (231) Zhang, Q., Sun, X., Watt, E. D., and Al-Hashimi, H. M. (2006) Resolving the motional modes that code for RNA adaptation. 
Science (80-. ). 311, 653?656. (232) Johnson, J. E., and Hoogstraten, C. G. (2008) Extensive backbone dynamics in the GCAA RNA tetraloop analyzed using 13C NMR spin relaxation and specific isotope labeling. J. Am. Chem. Soc. 130, 16757?16769. (233) Thakur, C. S., Luo, Y., Chen, B., Eldho, N. V., and Dayie, T. K. (2012) Biomass production of site selective 13C/15N nucleotides using wild type and a transketolase E. coli mutant for labeling RNA for high resolution NMR. J. Biomol. NMR 52, 103?114. (234) Geen, H., and Freeman, R. (1991) Band-selective radiofrequency pulses. J. Magn. Reson. 93, 93?141. (235) Kay, L. E., Keifer, P., and Saarinen, T. (1992) Pure absorption gradient enhanced heteronuclear single quantum correlation spectroscopy with improved sensitivity. J. Am. Chem. Soc. 114, 10663?10665. (236) Palmer, A. G., Cavanagh, J., Wright, P. E., and Rance, M. (1991) Sensitivity improvement in proton-detected two-dimensional heteronuclear correlation NMR spectroscopy. J. Magn. Reson. 93, 151?170. (237) Mandel, A. M., Akke, M., and Palmer, A. G. (1995) Backbone dynamics of Escherichia coli ribonuclease HI: correlations with structure and function in an active enzyme. J. Mol. Biol. 246, 144?163. (238) Flodell, S., Petersen, M., Girard, F., Zdunek, J., Kidd-Ljunggren, K., Schleucher, J., and Wijmenga, S. (2006) Solution structure of the apical stem-loop of the human hepatitis B virus encapsidation signal. Nucleic Acids Res. 34, 4449?4457. (239) Fiala, R., Czernek, J., and Sklen??, V. (2000) Transverse relaxation optimized triple- resonance NMR experiments for nucleic acids. J. Biomol. NMR 16, 291?302. (240) Emsley, L., and Bodenhausen, G. (1992) Optimization of shaped selective pulses for NMR using a quaternion description of their overall propagators. J. Magn. Reson. 97, 135?148. (241) Kup?e, ?., Boyd, J., and Campbell, I. D. (1995) Short selective pulses for biochemical applications. J. Magn. Reson. 106, 300?303. (242) Palmer, A. G., Kroenke, C. D., and Loria, J. P. 
(2001) Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules. Methods Enzymol. 339, 204?238. 241 (243) Palmer, A. G., and Koss, H. (2019) Chemical Exchange. Methods Enzymol. 615, 177?236. (244) Palmer, A. G., and Massi, F. (2006) Characterization of the dynamics of biomacromolecules using rotating-frame spin relaxation NMR spectroscopy. Chem. Rev. 106, 1700?1719. (245) Palmer, A. G., Grey, M. J., and Wang, C. (2005) Solution NMR spin relaxation methods for characterizing chemical exchange in high-molecular-weight systems. Methods Enzymol. 394, 430?465. (246) Anthis, N. J., and Clore, G. M. (2015) Visualizing transient dark states by NMR spectroscopy. Q. Rev. Biophys. 48, 35?116. (247) Peng, J. W., Thanabal, V., and Wagner, G. (1991) 2D Heteronuclear NMR Measurements of Spin-Lattice Relaxation Times in the Rotating Frame of X Nuclei in Heteronuclear HX Spin Systems. J. Magn. Reson. 94, 82?100. (248) Carr, H. Y., and Purcell, E. M. (1954) Effects of diffusion on free precession in nuclear magnetic resonance experiments. Phys. Rev. 94, 630?638. (249) Meiboom, S., and Gill, D. (2004) Modified spin-echo method for measuring nuclear relaxation times. Rev. Sci. Instrum. 29, 688?691. (250) Luz, Z., and Meiboom, S. (2004) Nuclear magnetic resonance study of the protolysis of trimethylammonium ion in aqueous solution?oprder of the reaction with respect to solvent. J. Chem. Phys. 39, 366?370. (251) Vallurupalli, P., Bouvignies, G., and Kay, L. E. (2012) Studying ?invisible? excited protein states in slow exchange with a major state conformation. J. Am. Chem. Soc. 134, 8148?8161. (252) Schanda, P., and Brutscher, B. (2005) Very fast two-dimensional NMR spectroscopy for real-time investigation of dynamic events in proteins on the time scale of seconds. J. Am. Chem. Soc. 127, 8014?8015. (253) Carver, J. P., and Richards, R. E. 
(1972) A general two-site solution for the chemical exchange produced dependence of T2 upon the carr-Purcell pulse separation. J. Magn. Reson. 6, 89?105. (254) Strebitzer, E., Nu?baumer, F., Kremser, J., Tollinger, M., and Kreutz, C. (2018) Studying sparsely populated conformational states in RNA combining chemical synthesis and solution NMR spectroscopy. Methods 148, 39?47. (255) Zhao, B., Hansen, A. L., and Zhang, Q. (2014) Characterizing slow chemical exchange in nucleic acids by carbon CEST and low spin-lock field R1? NMR spectroscopy. J. Am. Chem. Soc. 136, 20?23. (256) Helgstrand, M., Hard, T., and Allard, P. (2000) Simulations of NMR pulse sequences during equilibrium and non-equilibrium chemical exchange. J. Biomol. NMR 18, 49?63. (257) Bouvignies, G., and Kay, L. E. (2012) Measurement of proton chemical shifts in invisible states of slowly exchanging protein systems by chemical exchange saturation transfer. J. Phys. Chem. B 116, 14311?14317. (258) McConnell, H. M. (2004) Reaction rates by nuclear magnetic resonance. J. Chem. Phys. 28, 430. (259) Korzhnev, D. M., Orekhov, V. Y., and Kay, L. E. (2004) Off-Resonance R1? NMR studies of exchange dynamics in proteins with low spin-lock fields:? an application to a Fyn SH3 Domain. J. Am. Chem. Soc. 127, 713?721. 242 (260) Kimsey, I. J., Szymanski, E. S., Zahurancik, W. J., Shakya, A., Xue, Y., Chu, C. C., Sathyamoorthy, B., Suo, Z., and Al-Hashimi, H. M. (2018) Dynamic basis for dG?dT misincorporation via tautomerization and ionization. Nature 554, 195?201. (261) Bax, A., Mehlkopf, A. F., and Smidt, J. (1979) Homonuclear broadband-decoupled absorption spectra, with linewidths which are independent of the transverse relaxation rate. J. Magn. Reson. 35, 167?169. (262) Bax, A., and Freeman, R. (1981) Investigation of complex networks of spin-spin coupling by two-dimensional NMR. J. Magn. Reson. 44, 542?561. (263) Grzesiek, S., and Bax, A. 
(1992) Improved 3D triple-resonance NMR techniques applied to a 31 kDa protein. J. Magn. Reson. 96, 432?440. (264) van de Ven, F. J. M., and Philippens, vanielle E. P. (1992) Optimization of constant- time evolution in multidimensional NMR experiments. J. Magn. Reson. 97, 637?644. (265) Kup?e, E., and Wagner, G. (1996) Multisite band-selective decoupling in proteins. J. Magn. Reson. Ser. B 110, 309?312. (266) Brutscher, B., Boisbouvier, J., Kupe, E., Tisn?, C., Dardel, F., Marion, D., and Simorre, J. P. (2001) Base-type-selective high-resolution 13C edited NOESY for sequential assignment of large RNAs. J. Biomol. NMR 2001 192 19, 141?151. (267) Dayie, K. T. (2005) Resolution enhanced homonuclear carbon decoupled triple resonance experiments for unambiguous RNA structural characterization. J. Biomol. NMR 32, 129?139. (268) Chiarparin, E., Pelupessy, P., and Bodenhausen, G. (2012) Selective cross- polarization in solution state NMR. Mol. Phys. 95, 759?767. (269) Pelupessy, P., Chiarparin, E., and Bodenhausen, G. (1999) Excitation of selected proton signals in NMR of isotopically labeled macromolecules. J. Magn. Reson. 138, 178? 181. (270) Ferrage, F., Eykyn, T. R., and Bodenhausen, G. (2004) Frequency-switched single- transition cross-polarization: a tool for selective experiments in biomolecular NMR. Chemphyschem 5, 76?84. (271) Vallurupalli, P., Scott, L., Williamson, J. R., and Kay, L. E. (2007) Strong coupling effects during X-pulse CPMG experiments recorded on heteronuclear ABX spin systems: artifacts and a simple solution. J. Biomol. NMR 38, 41?46. (272) LeBlanc, R. M., Longhini, A. P., Tugarinov, V., and Dayie, T. K. (2018) NMR probing of invisible excited states using selectively labeled RNAs. J. Biomol. NMR 71, 165?172. (273) Zhou, Y., and Yang, D. (2014) Effects of J couplings and unobservable minor states on kinetics parameters extracted from CEST data. J. Magn. Reson. 249, 118?125. (274) Zhou, Y., and Yang, D. (2015) 13C? 
CEST experiment on uniformly 13C-labeled proteins. J. Biomol. NMR 61, 89?94. (275) Kloiber, K., Spitzer, R., Tollinger, M., Konrat, R., and Kreutz, C. (2011) Probing RNA dynamics via longitudinal exchange and CPMG relaxation dispersion NMR spectroscopy using a sensitive 13C-methyl label. Nucleic Acids Res. 39, 4340?4351. (276) Hansen, A. L., Lundstr?m, P., Velyvis, A., and Kay, L. E. (2012) Quantifying millisecond exchange dynamics in proteins by CPMG relaxation dispersion NMR using side-chain 1H probes. J. Am. Chem. Soc. 134, 3178?3189. (277) Otten, R., Villali, J., Kern, D., and Mulder, F. A. A. (2010) Probing microsecond time scale dynamics in proteins by methyl 1H Carr-Purcell-Meiboom-Gill relaxation dispersion NMR measurements. Application to activation of the signaling protein NtrCr. J. Am. 243 Chem. Soc. 132, 17004?17014. (278) Beckwith, M. A., Erazo-Colon, T., and Johnson, B. A. (2021) RING NMR dynamics: software for analysis of multiple NMR relaxation experiments. J. Biomol. NMR 75, 9?23. (279) Mulder, F. A. A., Skrynnikov, N. R., Hon, B., Dahlquist, F. W., and Kay, L. E. (2001) Measurement of slow (micros-ms) time scale dynamics in protein side chains by 15N relaxation dispersion NMR spectroscopy: application to Asn and Gln residues in a cavity mutant of T4 lysozyme. J. Am. Chem. Soc. 123, 967?975. (280) Cavanaugh, J. E., and Neath, A. A. (2019) The Akaike information criterion: Background, derivation, properties, application, interpretation, and refinements. Wiley Interdiscip. Rev. Comput. Stat. 11, e1460. (281) Olenginski, L. T., Kasprzak, W. K., Bergonzo, C., Shapiro, B. A., and Dayie, T. K. (2022) Conformational dynamics of the hepatitis B virus pre-genomic RNA on multiple time scales: implications for viral replication. J. Mol. Biol. (under review) (282) Galibert, F., Mandart, E., Fitoussi, F., Tiollais, P., and Charnay, P. (1979) Nucleotide sequence of the hepatitis B virus genome (subtype ayw) cloned in E. coli. Nature 281, 646?650. 
(283) Valenzuela, P., Gray, P., Quiroga, M., Zaldivar, J., Goodman, H. M., and Rutter, W. J. (1979) Nucleotide sequence of the gene coding for the major protein of hepatitis B virus surface antigen. Nature 280, 815?819. (284) Charnay, P., Mandart, E., Hampe, A., Fitoussi, F., Tiollais, P., and Galibert, F. (1979) Localization on the viral genome and nucleotide sequence of the gene coding for the two major polypeptides of the Hepatitis B surface antigen (HBs Ag). Nucleic Acids Res. 7, 335?346. (285) Pasek, M., Goto, T., Gilbert, W., Zink, B., Schaller, H., Mackay, P., Leadbetter, G., and Murray, K. (1979) Hepatitis B virus genes and their expression in E. coli. Nature 282, 575?579. (286) Robinson, W. S., Clayton, D. A., and Greenman, R. L. (1974) DNA of a human hepatitis B virus candidate. J. Virol. 14, 384?391. (287) Jones, S. A., and Hu, J. (2013) Hepatitis B virus reverse transcriptase: diverse functions as classical and emerging targets for antiviral intervention. Emerg. Microbes Infect. 2, e56. (288) Bartenschlager, R., Junker-niepmann, M., and Schaller, H. (1990) The P gene product of hepatitis B virus is required as a structural component for genomic RNA encapsidation. J. Virol. 64, 5324. (289) Junker-Niepmann, M., Bartenschlager, R., and Schaller, H. (1990) A short cis- acting sequence is required for hepatitis B virus pregenome encapsidation and sufficient for packaging of foreign RNA. EMBO J. 9, 3389. (290) Wang, G. H., and Seeger, C. (1992) The reverse transcriptase of hepatitis B virus acts as a protein primer for viral DNA synthesis. Cell 71, 663?670. (291) Weber, M., Bronsema, V., Bartos, H., Bosserhoff, A., Bartenschlager, R., and Shaller, H. (1994) Hepadnavirus P protein utilizes a tyrosine residue in the TP domain to prime reverse transcription. J. Virol. 68, 2994?2999. (292) Zoulim, F., and Seeger, C. (1994) Reverse transcription in hepatitis B viruses is primed by a tyrosine residue of the polymerase. J. Virol. 68, 6?13. 
(293) Toh, H., Hayashida, H., and Miyata, T. (1983) Sequence homology between retroviral reverse transcriptase and putative polymerases of hepatitis B virus and 244 cauliflower mosaic virus. Nat. 1983 3055937 305, 827?829. (294) Radziwill, G, Tucker, W., and Schaller, H. (1990) Mutational analysis of the hepatitis B virus P gene product: domain structure and RNase H activity. J. Virol. 64, 613?620. (295) Yan, H., Zhong, G., Xu, G., He, W., Jing, Z., Gao, Z., Huang, Y., Qi, Y., Peng, B., Wang, H., Fu, L., Song, M., Chen, P., Gao, W., Ren, B., Sun, Y., Cai, T., Feng, X., Sui, J., and Li, W. (2012) Sodium taurocholate cotransporting polypeptide is a functional receptor for human hepatitis B and D virus. Elife 2012. (296) Rall, L. B., Standring, D. N., Laub, O., and Rutter, W. J. (1983) Transcription of hepatitis B virus by RNA polymerase II. Mol. Cell. Biol. 3, 1766?1773. (297) Summers, J., and Mason, W. S. (1982) Replication of the genome of a hepatitis B- -like virus by reverse transcription of an RNA intermediate. Cell 29, 403?415. (298) Knaus, T., and Nassal, M. (1993) The encapsidation signal on the hepatitis B virus RNA pregenome forms a stem-loop structure that is critical for its function. Nucleic Acids Res. 21, 3967?3975. (299) Hirsch, R. C., Lavine, J. E., Chang, L. J., Varmus, H. E., and Ganem, D. (1990) Polymerase gene products of hepatitis B viruses are required for genomic RNA packaging as wel as for reverse transcription. Nature 344, 552?555. (300) Bartenschlager, R., and Schaller, H. (1992) Hepadnaviral assembly is initiated by polymerase binding to the encapsidation signal in the viral RNA genome. EMBO J. 11, 3413. (301) Nassal, M., and Rieger, A. (1996) A bulged region of the hepatitis B virus RNA encapsidation signal contains the replication origin for discontinuous first-strand DNA synthesis. J. Virol. 70, 2764?2773. (302) Wang, G. H., Zoulim, F., Leber, E. H., Kitson, J., and Seeger, C. 
(1994) Role of RNA in enzymatic activity of the reverse transcriptase of hepatitis B viruses. J. Virol. 68, 8437?8442. (303) Pollack, J. R., and Ganem, D. (1993) An RNA stem-loop structure directs hepatitis B virus genomic RNA encapsidation. J. Virol. 67, 3254. (304) Beck, J., and Nassal, M. (2007) Hepatitis B virus replication. World J. Gastroenterol. 13, 48?64. (305) Zhu, Y., Yamamoto, T., Cullen, J., Saputelli, J., Aldrich, C. E., Miller, D. S., Litwin, S., Furman, P. A., Jilbert, A. R., and Mason, W. S. (2001) Kinetics of hepadnavirus loss from the liver during inhibition of viral DNA synthesis. J. Virol. 75, 311?322. (306) Addison, W. R., Walters, K.-A., Wong, W. W. S., Wilson, J. S., Madej, D., Jewell, L. D., and Tyrrell, D. L. J. (2002) Half-life of the duck hepatitis B virus covalently closed circular DNA pool in vivo following Inhibition of viral replication. J. Virol. 76, 6356?6363. (307) Ganem, D., and Prince, A. M. (2004) Hepatitis B virus infection?natural history and clinical consequences. N. Engl. J. Med. 350, 1118?1129. (308) Sun, D., and Nassal, M. (2006) Stable HepG2- and Huh7-based human hepatoma cell lines for efficient regulated expression of infectious hepatitis B virus. J. Hepatol. 45, 636?645. (309) A, R., and M, N. (1996) Specific hepatitis B virus minus-strand DNA synthesis requires only the 5? encapsidation signal and the 3?-proximal direct repeat DR1. J. Virol. 70, 585?589. (310) Fallows, D. A., and Goff, S. P. (1995) Mutations in the epsilon sequences of human hepatitis B virus affect both RNA encapsidation and reverse transcription. J. Virol. 69, 245 3067?3073. (311) Lanford, R. E., Notvall, L., and Beames, B. (1995) Nucleotide priming and reverse transcriptase activity of hepatitis B virus polymerase expressed in insect cells. J. Virol. 69, 4431. (312) Beck, J., and Nassal, M. (1998) Formation of a Functional Hepatitis B Virus Replication Initiation Complex Involves a Major Structural Alteration in the RNA Template. Mol. Cell. Biol. 
18, 6265. (313) Tavis, J. E., Perri, S., and Ganem, D. (1994) Hepadnavirus reverse transcription initiates within the stem-loop of the RNA packaging signal and employs a novel strand transfer. J. Virol. 68, 3536?3543. (314) Jeong, J.-K., Yoon, G.-S., and Ryu, W.-S. (2000) Evidence that the 5?-end cap structure is essential for encapsidation of hepatitis B virus pregenomic RNA. J. Virol. 74, 5502?5508. (315) Mangus, D. A., Evans, M. C., and Jacobson, A. (2003) Poly(A)-binding proteins: Multifunctional scaffolds for the post-transcriptional control of gene expression. Genome Biol. 4, 1?14. (316) Tang, H., and McLachlan, A. (2002) A pregenomic RNA sequence adjacent to DR1 and complementary to epsilon influences hepatitis B virus replication efficiency. Virology 303, 199?210. (317) Shin, M.-K., Lee, J., and Ryu, W.-S. (2004) A Novel cis-acting element facilitates minus-strand DNA synthesis during reverse transcription of the hepatitis B virus genome. J. Virol. 78, 6252. (318) Liu, N., Ji, L., Maguire, M. L., and Loeb, D. D. (2004) Cis-acting sequences that contribute to the synthesis of relaxed-circular DNA of human hepatitis B virus. J. Virol. 78, 642?649. (319) Abraham, T. M., and Loeb, D. D. (2006) Base pairing between the 5? half of epsilon and a cis-acting sequence, phi, makes a contribution to the synthesis of minus-strand DNA for human hepatitis B virus. J. Virol. 80, 4380?4387. (320) Kamtekar, S., Berman, A. J., Wang, J., L?zaro, J. M., De Vega, M., Blanco, L., Salas, M., and Steitz, T. A. (2006) The ?29 DNA polymerase:protein-primer structure suggests a model for the initiation to elongation transition. EMBO J. 25, 1335. (321) Loeb, D. D., Hirsch, R. C., and Ganem, D. (1991) Sequence-independent RNA cleavages generate the primers for plus strand DNA synthesis in hepatitis B viruses: implications for other reverse transcribing elements. EMBO J. 10, 3533. (322) Lee, J., Shin, M.-K., Lee, H.-J., Yoon, G., and Ryu, W.-S. 
(2004) Three novel cis- acting elements required for efficient plus-strand DNA synthesis of the hepatitis B virus genome. J. Virol. 78, 7455?7464. (323) Jones, S. A., Boregowda, R., Spratt, T. E., and Hu, J. (2012) In vitro epsilon RNA- dependent protein priming activity of human hepatitis B virus polymerase. J. Virol. 86, 5134?5150. (324) Hu, J., and Boyer, M. (2006) Hepatitis B virus reverse transcriptase and ? RNA sequences required for specific interaction in vitro. J. Virol. 80, 2141?2150. (325) Feng, H., Chen, P., Zhao, F., Nassal, M., and Hu, K. (2013) Evidence for multiple distinct interactions between hepatitis B virus P protein and its cognate RNA encapsidation signal during initiation of reverse transcription. PLoS One 8, e72798. (326) Lok, A. S. F., Akarca, U., and Greene, S. (1994) Mutations in the pre-core region of 246 hepatitis B virus serve to enhance the stability of the secondary structure of the pre- genome encapsidation signal. Proc. Natl. Acad. Sci. U. S. A. 91, 4077. (327) Laskus, T., Rakela, J., and Persing, D. H. (1994) The stem-loop structure of the cis- encapsidation signal is highly conserved in naturally occurring hepatitis B virus variants. Virology 200, 809?812. (328) Flodell, S., Schleucher, J., Cromsigt, J., Ippel, H., Kidd-Ljunggren, K., and Wijmenga, S. (2002) The apical stem?loop of the hepatitis B virus encapsidation signal folds into a stable tri?loop with two underlying pyrimidine bulges. Nucleic Acids Res. 30, 4803?4811. (329) Petzold, K., Duchardt, E., Flodell, S., Larsson, G., Kidd-Ljunggren, K., Wijmenga, S., and Schleucher, J. (2007) Conserved nucleotides in an RNA essential for hepatitis B virus replication show distinct mobility patterns. Nucleic Acids Res. 35, 6854?6861. (330) Feng, H., Beck, J., Nassal, M., and Hu, K. hong. (2011) A SELEX-screened aptamer of human hepatitis B virus RNA encapsidation signal suppresses viral replication. PLoS One 6, e27862. (331) Das, K., Xiong, X., Yang, H., Westland, C. E., Gibbs, C. 
S., Sarafianos, S. G., and Arnold, E. (2001) Molecular modeling and biochemical characterization reveal the mechanism of hepatitis B virus polymerase resistance to lamivudine (3TC) and emtricitabine (FTC). J. Virol. 75, 4771?4779. (332) Allen, M. I., Deslauriers, M., Webster Andrews, C., Tipples, G. A., Walters, K. A., Tyrrell, D. L. J., Brown, N., and Condreay, L. D. (1998) Identification and characterization of mutations in hepatitis B virus resistant to lamivudine. Lamivudine Clinical Investigation Group. Hepatology 27, 1670?1677. (333) Beck, J., Vogel, M., and Nassal, M. (2002) dNTP versus NTP discrimination by phenylalanine 451 in duck hepatitis B virus P protein indicates a common structure of the dNTP-binding pocket with other reverse transcriptases. Nucleic Acids Res. 30, 1679? 1687. (334) Hu, J., and Anselmo, D. (2000) In vitro reconstitution of a functional duck hepatitis B virus reverse transcriptase: posttranslational activation by Hsp90. J. Virol. 74, 11447? 11455. (335) Beck, J., and Nassal, M. (2004) In vitro reconstitution of epsilon-dependent duck hepatitis B virus replication initiation. Methods Mol. Med. 95, 315?325. (336) Beck, J., and Nassal, M. (2001) Reconstitution of a functional duck hepatitis B virus replication initiation complex from separate reverse transcriptase domains expressed in Escherichia coli. J. Virol. 75, 7410?7419. (337) Hu, J., O.Toft, D., and Seeger, C. (1997) Hepadnavirus assembly and reverse transcription require a multi-component chaperone complex which is incorporated into nucleocapsids. EMBO J. 16, 59?68. (338) Hu, J., and Seeger, C. (1996) Hsp90 is required for the activity of a hepatitis B virus reverse transcriptase. Proc. Natl. Acad. Sci. 93, 1060?1064. (339) Hu, J., Flores, D., Toft, D., Wang, X., and Nguyen, D. (2004) Requirement of heat shock protein 90 for human hepatitis B virus reverse transcriptase function. J. Virol. 78, 13122?13131. (340) Kim, S., Wang, H., and Ryu, W.-S. 
(2010) Incorporation of eukaryotic translation initiation factor eIF4E into viral nucleocapsids via interaction with hepatitis B virus polymerase. J. Virol. 84, 52?58. 247 (341) Nguyen, D. H., and Hu, J. (2008) Reverse transcriptase- and RNA packaging signal-dependent incorporation of APOBEC3G into hepatitis B virus nucleocapsids. J. Virol. 82, 6852?6861. (342) Turelli, P., Mangeat, B., Jost, S., Vianin, S., and Trono, D. (2004) Inhibition of hepatitis B virus replication by APOBEC3G. Science 303, 1829. (343) Nguyen, D. H., Gummuluru, S., and Hu, J. (2007) Deamination-Independent Inhibition of Hepatitis B Virus Reverse Transcription by APOBEC3G. J. Virol. 81, 4465. (344) Feng, T., Sun, T., Li, G., Pan, W., Wang, K., and Dai, J. (2017) DEAD-box helicase DDX25 is a negative regulator of type I interferon pathway and facilitates RNA virus infection. Front. Cell. Infect. Microbiol. 7. (345) Chen, J., Wu, M., Zhang, X., Zhang, W., Zhang, Z., Chen, L., He, J., Zheng, Y., Chen, C., Wang, F., Hu, Y., Zhou, X., Wang, C., Xu, Y., Lu, M., and Yuan, Z. (2013) Hepatitis B virus polymerase impairs interferon-?-induced STAT activation through inhibition of importin-?5 and protein kinase C-?. Hepatology 57, 470?482. (346) Marino, J. P., Schwalbe, H., Anklin, C., Bermel, W., Crothers, D. M., and Griesinger, C. (1995) Sequential correlation of anomeric ribose protons and intervening phosphorus in RNA oligonucleotides by a 1H, 13C, 31P triple resonance experiment: HCP-CCH- TOCSY. J. Biomol. NMR 5, 87?92. (347) LeBlanc, R. M., Longhini, A. P., Le Grice, S. F. J., Johnson, B. A., and Dayie, T. K. (2017) Combining asymmetric 13C-labeling and isotopic filter/edit NOESY: a novel strategy for rapid and logical RNA resonance assignment. Nucleic Acids Res. 45, e146. (348) Dallmann, A., Simon, B., Duszczyk, M. M., Kooshapur, H., Pardi, A., Bermel, W., and Sattler, M. (2013) Efficient detection of hydrogen bonds in dynamic regions of RNA by sensitivity-optimized NMR pulse sequences. Angew. 
Chem. Int. Ed. Engl. 52, 10487–10490.
(349) Dingley, A. J., and Grzesiek, S. (1998) Direct observation of hydrogen bonds in nucleic acid base pairs by internucleotide 2JNN couplings. J. Am. Chem. Soc. 120, 8293–8297.
(350) Hartlmüller, C., Günther, J. C., Wolter, A. C., Wöhnert, J., Sattler, M., and Madl, T. (2017) RNA structure refinement using NMR solvent accessibility data. Sci. Rep. 7, 1–10.
(351) Goldman, M. (1984) Interference effects in the relaxation of a pair of unlike spin-1/2 nuclei. J. Magn. Reson. 60, 437–452.
(352) Brutscher, B., Skrynnikov, N. R., Bremi, T., Brüschweiler, R., and Ernst, R. R. (1998) Quantitative investigation of dipole–CSA cross-correlated relaxation by ZQ/DQ spectroscopy. J. Magn. Reson. 130, 346–351.
(353) Yang, D., and Kay, L. E. (1998) Determination of the protein backbone dihedral angle ψ from a combination of NMR-derived cross-correlation spin relaxation rates. J. Am. Chem. Soc. 120, 9880–9887.
(354) Schwieters, C. D., Bermejo, G. A., and Clore, G. M. (2018) Xplor-NIH for molecular structure determination from NMR and other data sources. Protein Sci. 27, 26–40.
(355) Bermejo, G. A., Clore, G. M., and Schwieters, C. D. (2016) Improving NMR Structures of RNA. Structure 24, 806–815.
(356) Walker, S. C., Avis, J. M., and Conn, G. L. (2003) General plasmids for producing RNA in vitro transcripts with homogeneous ends. Nucleic Acids Res. 31, e82.
(357) Hu, W., Kakalis, L. T., Jiang, L., Jiang, F., Ye, X., and Majumdar, A. (1998) 3D HCCH-COSY-TOCSY experiment for the assignment of ribose and amino acid side chains in 13C labeled RNA and protein. J. Biomol. NMR 12, 559–564.
(358) Brutscher, B., and Simorre, J. P. (2001) Transverse relaxation optimized HCN experiment for nucleic acids: combining the advantages of TROSY and MQ spin evolution. J. Biomol. NMR 21, 367–372.
(359) Peterson, R. D., Theimer, C. A., Wu, H., and Feigon, J.
(2004) New applications of 2D filtered/edited NOESY for assignment and structure elucidation of RNA and RNA-protein complexes. J. Biomol. NMR 28, 59–67.
(360) Bahrami, A., Clos, L. J., Markley, J. L., Butcher, S. E., and Eghbalnia, H. R. (2012) RNA-PAIRS: RNA probabilistic assignment of imino resonance shifts. J. Biomol. NMR 52, 289–302.
(361) Delaglio, F., Grzesiek, S., Vuister, G. W., Zhu, G., Pfeifer, J., and Bax, A. (1995) NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293.
(362) Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S., and Richardson, D. C. (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D. Biol. Crystallogr. 66, 12–21.
(363) Mazur, A., Hammesfahr, B., Griesinger, C., Lee, D., and Kollmar, M. (2013) ShereKhan–calculating exchange parameters in relaxation dispersion data from CPMG experiments. Bioinformatics 29, 1819–1820.
(364) Bakan, A., Meireles, L. M., and Bahar, I. (2011) ProDy: Protein Dynamics Inferred from Theory and Experiments. Bioinformatics 27, 1575–1577.
(365) Humphrey, W., Dalke, A., and Schulten, K. (1996) VMD: visual molecular dynamics. J. Mol. Graph. 14, 33–38.
(366) Case, D. A., Cheatham, T. E., Darden, T., Gohlke, H., Luo, R., Merz, K. M., Onufriev, A., Simmerling, C., Wang, B., and Woods, R. J. (2005) The Amber biomolecular simulation programs. J. Comput. Chem. 26, 1668–1688.
(367) Bergonzo, C., and Cheatham, T. E. (2015) Improved Force Field Parameters Lead to a Better Description of RNA Structure. J. Chem. Theory Comput. 11, 3969–3972.
(368) Zgarbová, M., Otyepka, M., Šponer, J., Mládek, A., Banáš, P., Cheatham, T. E., and Jurečka, P. (2011) Refinement of the Cornell et al. nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles. J. Chem. Theory Comput. 7, 2886–2902.
(369) Steinbrecher, T., Latzer, J., and Case, D. A. (2012) Revised AMBER parameters for bioorganic phosphates. J. Chem. Theory Comput. 8, 4405–4412.
(370) Mukhopadhyay, A., Fenley, A. T., Tolokh, I. S., and Onufriev, A. V. (2012) Charge hydration asymmetry: the basic principle and how to use it to test and improve water models. J. Phys. Chem. B 116, 9776.
(371) Izadi, S., Anandakrishnan, R., and Onufriev, A. V. (2014) Building water models: A different approach. J. Phys. Chem. Lett. 5, 3863–3871.
(372) Joung, I. S., and Cheatham, T. E. (2008) Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. J. Phys. Chem. B 112, 9020–9041.
(373) Cheatham, T. E., Miller, J. L., Fox, T., Darden, T. A., and Kollman, P. A. (1995) Molecular Dynamics Simulations on Solvated Biomolecular Systems: The Particle Mesh Ewald Method Leads to Stable Trajectories of DNA, RNA, and Proteins. J. Am. Chem. Soc. 117, 4193–4194.
(374) Berendsen, H. J. C., Postma, J. P. M., Van Gunsteren, W. F., Dinola, A., and Haak, J. R. (1984) Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684.
(375) Horn, H. W., Swope, W. C., Pitera, J. W., Madura, J. D., Dick, T. J., Hura, G. L., and Head-Gordon, T. (2004) Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J. Chem. Phys. 120, 9665.
(376) Roe, D. R., and Brooks, B. R. (2020) A protocol for preparing explicitly solvated systems for stable molecular dynamics simulations. J. Chem. Phys. 153, 054123.
(377) Salomon-Ferrer, R., Götz, A. W., Poole, D., Le Grand, S., and Walker, R. C. (2013) Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J. Chem. Theory Comput. 9, 3878–3888.
(378) Loncharich, R. J., Brooks, B. R., and Pastor, R. W. (1992) Langevin dynamics of peptides: The frictional dependence of isomerization rates of N-acetylalanyl-N′-methylamide.
Biopolymers 32, 523–535.
(379) Sindhikara, D. J., Kim, S., Voter, A. F., and Roitberg, A. E. (2009) Bad seeds sprout perilous dynamics: Stochastic thermostat induced trajectory synchronization in biomolecules. J. Chem. Theory Comput. 5, 1624–1631.
(380) Hopkins, C. W., Le Grand, S., Walker, R. C., and Roitberg, A. E. (2015) Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. J. Chem. Theory Comput. 11, 1864–1874.
(381) Roe, D. R., and Cheatham, T. E. (2013) PTRAJ and CPPTRAJ: Software for processing and analysis of molecular dynamics trajectory data. J. Chem. Theory Comput. 9, 3084–3095.
(382) Beck, J., and Nassal, M. (1997) Sequence- and structure-specific determinants in the interaction between the RNA encapsidation signal and reverse transcriptase of avian hepatitis B viruses. J. Virol. 71, 4971–4980.
(383) (2004) World Health Organization. Hepatitis B vaccines. Wkly Epidemiol Rec 79, 253–264.
(384) Stanaway, J. D., Flaxman, A. D., Naghavi, M., Fitzmaurice, C., Vos, T., Abubakar, I., Abu-Raddad, L. J., Assadi, R., Bhala, N., Cowie, B., Forouzanfour, M. H., Groeger, J., Hanafiah, K. M., Jacobsen, K. H., James, S. L., MacLachlan, J., Malekzadeh, R., Martin, N. K., Mokdad, A. A., Mokdad, A. H., Murray, C. J. L., Plass, D., Rana, S., Rein, D. B., Richardus, J. H., Sanabria, J., Saylan, M., Shahraz, S., So, S., Vlassov, V. V., Weiderpass, E., Wiersma, S. T., Younis, M., Yu, C., El Sayed Zaki, M., and Cooke, G. S. (2016) The global burden of viral hepatitis from 1990 to 2013: findings from the Global Burden of Disease Study 2013. Lancet 388, 1081–1088.
(385) Perz, J. F., Armstrong, G. L., Farrington, L. A., Hutin, Y. J. F., and Bell, B. P. (2006) The contributions of hepatitis B virus and hepatitis C virus infections to cirrhosis and primary liver cancer worldwide. J. Hepatol. 45, 529–538.
(386) Ioannou, G. N. (2011) Hepatitis B virus in the United States: infection, exposure, and immunity rates in a nationally representative survey. Ann. Intern.
Med. 154, 319–328.
(387) (2006) CDC. Screening for Chronic Hepatitis B Among Asian/Pacific Islander Populations-New York City, 2005. Morb Mortal Wkly Rep 55, 505–509.
(388) Hatzakis, A., Wait, S., Bruix, J., Buti, M., Carballo, M., Cavaleri, M., Colombo, M., Delarocque-Astagneau, E., Dusheiko, G., Esmat, G., Esteban, R., Goldberg, D., Gore, C., Lok, A. S. F., Manns, M., Marcellin, P., Papatheodoridis, G., Peterle, A., Prati, D., Piorkowsky, N., Rizzetto, M., Roudot-Thoraval, F., Soriano, V., Thomas, H. C., Thursz, M., Valla, D., Van Damme, P., Veldhuijzen, I. K., Wedemeyer, H., Wiessing, L., Zanetti, A. R., and Janssen, H. L. A. (2011) The state of hepatitis B and C in Europe: report from the hepatitis B and C summit conference. J. Viral Hepat. 18, 1–16.
(389) Scaglione, S. J., and Lok, A. S. F. (2012) Effectiveness of hepatitis B treatment in clinical practice. Gastroenterology 142, 1360–1368.
(390) Menéndez-Arias, L., Álvarez, M., and Pacheco, B. (2014) Nucleoside/nucleotide analog inhibitors of hepatitis B virus polymerase: mechanism of action and resistance. Curr. Opin. Virol. 8, 1–9.
(391) Zoulim, F., and Locarnini, S. (2009) Hepatitis B virus resistance to nucleos(t)ide analogues. Gastroenterology 137.
(392) de Clercq, E., Férir, G., Kaptein, S., and Neyts, J. (2010) Antiviral treatment of chronic hepatitis B virus (HBV) infections. Viruses 2, 1279.
(393) Woo, A. S. J., Kwok, R., and Ahmed, T. (2017) Alpha-interferon treatment in hepatitis B. Ann. Transl. Med. 5.
(394) Gill, U. S., Peppa, D., Micco, L., Singh, H. D., Carey, I., Foster, G. R., Maini, M. K., and Kennedy, P. T. F. (2016) Interferon alpha induces sustained changes in NK cell responsiveness to hepatitis B viral load suppression in vivo. PLoS Pathog. 12.
(395) Micco, L., Peppa, D., Loggi, E., Schurich, A., Jefferson, L., Cursaro, C., Panno, A. M., Bernardi, M., Brander, C., Bihl, F., Andreone, P., and Maini, M. K.
(2013) Differential boosting of innate and adaptive antiviral responses during pegylated-interferon-alpha therapy of chronic hepatitis B. J. Hepatol. 58, 225–233.
(396) Marcellin, P., Heathcote, E. J., Buti, M., Gane, E., de Man, R. A., Krastev, Z., Germanidis, G., Lee, S. S., Flisiak, R., Kaita, K., Manns, M., Kotzev, I., Tchernev, K., Buggisch, P., Weilert, F., Kurdas, O. O., Shiffman, M. L., Trinh, H., Washington, M. K., Sorbel, J., Anderson, J., Snow-Lampart, A., Mondou, E., Quinn, J., and Rousseau, F. (2008) Tenofovir disoproxil fumarate versus adefovir dipivoxil for chronic hepatitis B. N. Engl. J. Med. 359, 2442–2455.
(397) Lai, C.-L., Gane, E., Liaw, Y.-F., Hsu, C.-W., Thongsawat, S., Wang, Y., Chen, Y., Heathcote, E. J., Rasenack, J., Bzowej, N., Naoumov, N. V., Di Bisceglie, A. M., Zeuzem, S., Moon, Y. M., Goodman, Z., Chao, G., Constance, B. F., and Brown, N. A. (2007) Telbivudine versus lamivudine in patients with chronic hepatitis B. N. Engl. J. Med. 357, 2576–2588.
(398) Dienstag, J. L., Schiff, E. R., Wright, T. L., Perrillo, R. P., Hann, H.-W. L., Goodman, Z., Crowther, L., Condreay, L. D., Woessner, M., Rubin, M., and Brown, N. A. (1999) Lamivudine as initial treatment for chronic hepatitis B in the United States. N. Engl. J. Med. 341, 1256–1263.
(399) Marcellin, P., Chang, T.-T., Lim, S. G., Tong, M. J., Sievert, W., Shiffman, M. L., Jeffers, L., Goodman, Z., Wulfsohn, M. S., Xiong, S., Fry, J., and Brosgart, C. L. (2003) Adefovir dipivoxil for the treatment of hepatitis B e antigen-positive chronic hepatitis B. N. Engl. J. Med. 348, 808–816.
(400) Chang, T.-T., Gish, R. G., de Man, R., Gadano, A., Sollano, J., Chao, Y.-C., Lok, A. S., Han, K.-H., Goodman, Z., Zhu, J., Cross, A., DeHertogh, D., Wilber, R., Colonno, R., and Apelian, D. (2006) A comparison of entecavir and lamivudine for HBeAg-positive chronic hepatitis B. N. Engl. J. Med. 354, 1001–1010.
(401) Zeng, M. De, Mao, Y. M., Yao, G. B., Wang, H., Hou, J. L., Wang, Y. Z., Ji, B.
N., Chang, C. N. P., and Barker, K. F. (2006) A double-blind randomized trial of adefovir dipivoxil in Chinese subjects with HBeAg-positive chronic hepatitis B. Hepatology 44, 108–116.
(402) Lai, C.-L., Shouval, D., Lok, A. S., Chang, T.-T., Cheinquer, H., Goodman, Z., DeHertogh, D., Wilber, R., Zink, R. C., Cross, A., Colonno, R., and Fernandes, L. (2006) Entecavir versus lamivudine for patients with HBeAg-negative chronic hepatitis B. N. Engl. J. Med. 354, 1011–1020.
(403) Hadziyannis, S. J., Tassopoulos, N. C., Heathcote, E. J., Chang, T.-T., Kitis, G., Rizzetto, M., Marcellin, P., Lim, S. G., Goodman, Z., Wulfsohn, M. S., Xiong, S., Fry, J., and Brosgart, C. L. (2003) Adefovir dipivoxil for the treatment of hepatitis B e antigen-negative chronic hepatitis B. N. Engl. J. Med. 348, 800–807.
(404) Lok, A. S. F., Lai, C. L., Leung, N., Yao, G. B., Cui, Z. Y., Schiff, E. R., Dienstag, J. L., Heathcote, E. J., Little, N. R., Griffiths, D. A., Gardner, S. D., and Castiglia, M. (2003) Long-term safety of lamivudine treatment in patients with chronic hepatitis B. Gastroenterology 125, 1714–1722.
(405) Lai, C.-L., Chien, R.-N., Leung, N. W. Y., Chang, T.-T., Guan, R., Tai, D.-I., Ng, K.-Y., Wu, P.-C., Dent, J. C., Barber, J., Stephenson, S. L., and Gray, D. F. (1998) A one-year trial of lamivudine for chronic hepatitis B. Asia hepatitis lamivudine study group. N. Engl. J. Med. 339, 61–68.
(406) Kayaaslan, B., and Guner, R. (2017) Adverse effects of oral antiviral therapy in chronic hepatitis B. World J. Hepatol. 9, 227.
(407) Reizis, B., Bunin, A., Ghosh, H. S., Lewis, K. L., and Sisirak, V. (2011) Plasmacytoid dendritic cells: recent progress and open questions. Annu. Rev. Immunol. 29, 163–183.
(408) Isaacs, A., Lindenmann, J., and Valentine, R. C. (1957) Virus interference. II. Some properties of interferon. Proc. R. Soc. London. Ser. B, Biol. Sci. 147, 268–273.
(409) Isaacs, A., and Lindenmann, J. (1957) Virus interference. I. The interferon. Proc. R. Soc. London. Ser. B, Biol. Sci. 147, 258–267.
(410) Xu, F., Song, H., Li, N., and Tan, G. (2016) HBsAg blocks TYPE I IFN induced up-regulation of A3G through inhibition of STAT3. Biochem. Biophys. Res. Commun. 473, 219–223.
(411) Belloni, L., Allweiss, L., Guerrieri, F., Pediconi, N., Volz, T., Pollicino, T., Petersen, J., Raimondo, G., Dandri, M., and Levrero, M. (2012) IFN-α inhibits HBV transcription and replication in cell culture and in humanized mice by targeting the epigenetic regulation of the nuclear cccDNA minichromosome. J. Clin. Invest. 122, 529–537.
(412) Chan, H. L. Y., Leung, N. W. Y., Hui, A. Y., Wong, V. W. S., Liew, C. T., Chim, A. M. L., Chan, F. K. L., Hung, L. C. T., Lee, Y. T., Tam, J. S. L., Lam, C. W. K., and Sung, J. J. Y. (2005) A randomized, controlled trial of combination therapy for chronic hepatitis B: comparing pegylated interferon-alpha2b and lamivudine with lamivudine alone. Ann. Intern. Med. 142, 240–250.
(413) Janssen, H. L. A., Van Zonneveld, M., Senturk, H., Zeuzem, S., Akarca, U. S., Cakaloglu, Y., Simon, C., So, T. M. K., Gerken, G., De Man, R. A., Niesters, H. G. M., Zondervan, P., Hansen, B., and Schalm, S. W. (2005) Pegylated interferon alfa-2b alone or in combination with lamivudine for HBeAg-positive chronic hepatitis B: a randomised trial. Lancet 365, 123–129.
(414) Lau, G. K. K., Piratvisuth, T., Luo, K. X., Marcellin, P., Thongsawat, S., Cooksley, G., Gane, E., Fried, M. W., Chow, W. C., Paik, S. W., Chang, W. Y., Berg, T., Flisiak, R., McCloud, P., and Pluck, N. (2005) Peginterferon Alfa-2a, lamivudine, and the combination for HBeAg-positive chronic hepatitis B. N. Engl. J. Med. 352, 2682–2695.
(415) Marcellin, P., Lau, G. K. K., Bonino, F., Farci, P., Hadziyannis, S., Jin, R., Lu, Z.-M., Piratvisuth, T., Germanidis, G., Yurdaydin, C., Diago, M., Gurel, S., Lai, M.-Y., Button, P., and Pluck, N. (2004) Peginterferon alfa-2a alone, lamivudine alone, and the two in combination in patients with HBeAg-negative chronic hepatitis B. N. Engl. J.
Med. 351, 1206–1217.
(416) Wong, V. W. S., Wong, G. L. H., Yan, K. K. L., Chim, A. M. L., Chan, H. Y., Tse, C. H., Choi, P. C. L., Chan, A. W. H., Sung, J. J. Y., and Chan, H. L. Y. (2010) Durability of peginterferon alfa-2b treatment at 5 years in patients with hepatitis B e antigen-positive chronic hepatitis B. Hepatology 51, 1945–1953.
(417) Buster, E. H. C. J., Flink, H. J., Cakaloglu, Y., Simon, K., Trojan, J., Tabak, F., So, T. M. K., Feinman, S. V., Mach, T., Akarca, U. S., Schutten, M., Tielemans, W., van Vuuren, A. J., Hansen, B. E., and Janssen, H. L. A. (2008) Sustained HBeAg and HBsAg loss after long-term follow-up of HBeAg-positive patients treated with peginterferon alpha-2b. Gastroenterology 135, 459–467.
(418) Monto, A., Schooley, R. T., Lai, J. C., Sulkowski, M. S., Chung, R. T., Pawlotsky, J. M., McHutchison, J. G., and Jacobson, I. M. (2010) Lessons from HIV therapy applied to viral hepatitis therapy: Summary of a workshop. Am. J. Gastroenterol. 105, 989–1004.
(419) Ghany, M., and Liang, T. J. (2007) Drug Targets and Molecular Mechanisms of Drug Resistance in Chronic Hepatitis B. Gastroenterology 132, 1574–1585.
(420) Takkenberg, B., Terpstra, V., Zaaijer, H., Weegink, C., Dijkgraaf, M., Jansen, P., Beld, M., and Reesink, H. (2011) Intrahepatic response markers in chronic hepatitis B patients treated with peginterferon alpha-2a and adefovir. J. Gastroenterol. Hepatol. 26, 1527–1535.
(421) Lai, C. L., Ahn, S. H., Lee, K. S., Um, S. H., Cho, M., Yoon, S. K., Lee, J. W., Park, N. H., Kweon, Y. O., Sohn, J. H., Lee, J., Kim, J. A., Han, K. H., and Yuen, M. F. (2014) Phase IIb multicentred randomised trial of besifovir (LB80380) versus entecavir in Asian patients with chronic hepatitis B. Gut 63, 996–1004.
(422) Ahn, S. H., Kim, W., Jung, Y. K., Yang, J. M., Jang, J. Y., Kweon, Y. O., Cho, Y. K., Kim, Y. J., Hong, G. Y., Kim, D. J., Um, S. H., Sohn, J. H., Lee, J. W., Park, S. J., Lee, B. S., Kim, J. H., Kim, H. S., Yoon, S. K., Kim, M. Y., Yim, H.
J., Lee, K. S., Lim, Y. S., Lee, W. S., Park, N. H., Jin, S. Y., Kim, K. H., Choi, W., and Han, K. H. (2019) Efficacy and safety of besifovir dipivoxil maleate compared with tenofovir disoproxil fumarate in treatment of chronic hepatitis B virus infection. Clin. Gastroenterol. Hepatol. 17, 1850–1859.e4.
(423) Yuen, M. F., Ahn, S. H., Lee, K. S., Um, S. H., Cho, M., Yoon, S. K., Lee, J. W., Park, N. H., Kweon, Y. O., Sohn, J. H., Lee, J., Kim, J. A., Lai, C. L., and Han, K. H. (2015) Two-year treatment outcome of chronic hepatitis B infection treated with besifovir vs. entecavir: Results from a multicentre study. J. Hepatol. 62, 526–532.
(424) Painter, G. R., Almond, M. R., Trost, L. C., Lampert, B. M., Neyts, J., De Clercq, E., Korba, B. E., Aldern, K. A., Beadle, J. R., and Hostetler, K. Y. (2007) Evaluation of hexadecyloxypropyl-9-R-[2-(phosphonomethoxy)propyl]-adenine, CMX157, as a potential treatment for human immunodeficiency virus type 1 and hepatitis B virus infections. Antimicrob. Agents Chemother. 51, 3505–3509.
(425) Park, S. H., Park, K. S., Kim, N. H., Cho, J. Y., Koh, M. S., and Lee, J. H. (2017) Clevudine Induced Mitochondrial Myopathy. J. Korean Med. Sci. 32, 1857–1860.
(426) Jones, S. A., Murakami, E., Delaney, W., Furman, P., and Hu, J. (2013) Noncompetitive inhibition of hepatitis B virus reverse transcriptase protein priming and DNA synthesis by the nucleoside analog clevudine. Antimicrob. Agents Chemother. 57, 4181.
(427) Niu, C., Murakami, E., and Furman, P. A. (2008) Clevudine is efficiently phosphorylated to the active triphosphate form in primary human hepatocytes. Antivir. Ther. 13, 263–269.
(428) Anderson, D. L. (2009) Clevudine for hepatitis B. Drugs of Today 45, 331–350.
(429) Tavis, J. E., Cheng, X., Hu, Y., Totten, M., Cao, F., Michailidis, E., Aurora, R., Meyers, M. J., Jacobsen, E. J., Parniak, M. A., and Sarafianos, S. G.
(2013) The Hepatitis B virus ribonuclease H is sensitive to inhibitors of the human immunodeficiency virus ribonuclease H and integrase enzymes. PLoS Pathog. 9, e1003125.
(430) Hu, Y., Cheng, X., Cao, F., Huang, A., and Tavis, J. E. (2013) β-Thujaplicinol inhibits hepatitis B virus replication by blocking the viral ribonuclease H activity. Antiviral Res. 99, 221–229.
(431) Villa, J. A., Pike, D. P., Patel, K. B., Lomonosova, E., Lu, G., Abdulqader, R., and Tavis, J. E. (2016) Purification and enzymatic characterization of the hepatitis B virus ribonuclease H, a new target for antiviral inhibitors. Antiviral Res. 132, 186–195.
(432) Edwards, T. C., Mani, N., Dorsey, B., Kakarla, R., Rijnbrand, R., Sofia, M. J., and Tavis, J. E. (2019) Inhibition of HBV replication by N-hydroxyisoquinolinedione and N-hydroxypyridinedione ribonuclease H inhibitors. Antiviral Res. 164, 70–80.
(433) Edwards, T. C., Lomonosova, E., Patel, J. A., Li, Q., Villa, J. A., Gupta, A. K., Morrison, L. A., Bailly, F., Cotelle, P., Giannakopoulou, E., Zoidis, G., and Tavis, J. E. (2017) Inhibition of hepatitis B virus replication by N-hydroxyisoquinolinediones and related polyoxygenated heterocycles. Antiviral Res. 143, 205–217.
(434) Huber, A. D., Michailidis, E., Tang, J., Puray-Chavez, M. N., Boftsi, M., Wolf, J. J., Boschert, K. N., Sheridan, M. A., Leslie, M. D., Kirby, K. A., Singh, K., Mitsuya, H., Parniak, M. A., Wang, Z., and Sarafianos, S. G. (2017) 3-hydroxypyrimidine-2,4-diones as novel hepatitis B virus antivirals targeting the viral ribonuclease H. Antimicrob. Agents Chemother. e00245-17.
(435) Lomonosova, E., Zlotnick, A., and Tavis, J. E. (2017) Synergistic interactions between hepatitis B virus RNase H antagonists and other inhibitors. Antimicrob. Agents Chemother. e02441-16.
(436) Lu, G., Lomonosova, E., Cheng, X., Moran, E. A., Meyers, M. J., Le Grice, S. F. J., Thomas, C. J., Jiang, J. K., Meck, C., Hirsch, D. R., D'Erasmo, M. P., Suyabatmaz, D. M., Murelli, R.
P., and Tavis, J. E. (2015) Hydroxylated tropolones inhibit hepatitis B virus replication by blocking viral ribonuclease H activity. Antimicrob. Agents Chemother. 59, 1070–1079.
(437) Lu, G., Villa, J. A., Donlin, M. J., Edwards, T. C., Cheng, X., Heier, R. F., Meyers, M. J., and Tavis, J. E. (2016) Hepatitis B virus genetic diversity has minimal impact on sensitivity of the viral ribonuclease H to inhibitors. Antiviral Res. 135, 24–30.
(438) Wang, X., and Hu, J. (2002) Distinct requirement for two stages of protein-primed initiation of reverse transcription in hepadnaviruses. J. Virol. 76, 5857–5865.
(439) Jones, S. A., and Hu, J. (2013) Protein-primed terminal transferase activity of hepatitis B virus polymerase. J. Virol. 87, 2563–2576.
(440) Tchesnokov, E. P., Obikhod, A., Schinazi, R. F., and Götte, M. (2008) Delayed chain termination protects the anti-hepatitis B virus drug entecavir from excision by HIV-1 reverse transcriptase. J. Biol. Chem. 283, 34218–34228.
(441) Langley, D. R., Walsh, A. W., Baldick, C. J., Eggers, B. J., Rose, R. E., Levine, S. M., Kapur, A. J., Colonno, R. J., and Tenney, D. J. (2007) Inhibition of hepatitis B virus polymerase by entecavir. J. Virol. 81, 3992–4001.
(442) Boregowda, R. K., Adams, C., and Hu, J. (2012) TP-RT domain interactions of duck hepatitis B virus reverse transcriptase in cis and in trans during protein-primed initiation of DNA synthesis in vitro. J. Virol. 86, 6522–6536.
(443) Tsukamoto, Y., Ikeda, S., Uwai, K., Taguchi, R., Chayama, K., Sakaguchi, T., Narita, R., Yao, W. L., Takeuchi, F., Otakaki, Y., Watashi, K., Wakita, T., Kato, H., and Fujita, T. (2018) Rosmarinic acid is a novel inhibitor for Hepatitis B virus replication targeting viral epsilon RNA-polymerase interaction. PLoS One 13, e0197664.
(444) Lin, L., and Hu, J. (2008) Inhibition of hepadnavirus reverse transcriptase-ε RNA interaction by porphyrin compounds. J. Virol. 82, 2305–2312.
(445) Hargrove, A. E.
(2020) Small molecule–RNA targeting: starting with the fundamentals. Chem. Commun. 56, 14744–14756.
(446) Falese, J. P., Donlic, A., and Hargrove, A. E. (2021) Targeting RNA with small molecules: from fundamental principles towards the clinic. Chem. Soc. Rev. 50, 2224–2243.
(447) Shortridge, M. D., and Varani, G. (2015) Structure based approaches for targeting non-coding RNAs with small molecules. Curr. Opin. Struct. Biol. 30, 79–88.
(448) Disney, M. D., Yildirim, I., and Childs-Disney, J. L. (2014) Methods to enable the design of bioactive small molecules targeting RNA. Org. Biomol. Chem. 12, 1029–1039.
(449) Naryshkin, N. A., Weetall, M., Dakka, A., Narasimhan, J., Zhao, X., Feng, Z., Ling, K. K. Y., Karp, G. M., Qi, H., Woll, M. G., Chen, G., Zhang, N., Gabbeta, V., Vazirani, P., Bhattacharyya, A., Furia, B., Risher, N., Sheedy, J., Kong, R., Ma, J., Turpoff, A., Lee, C. S., Zhang, X., Moon, Y. C., Trifillis, P., Welch, E. M., Colacino, J. M., Babiak, J., Almstead, N. G., Peltz, S. W., Eng, L. A., Chen, K. S., Mull, J. L., Lynes, M. S., Rubin, L. L., Fontoura, P., Santarelli, L., Haehnke, D., McCarthy, K. D., Schmucki, R., Ebeling, M., Sivaramakrishnan, M., Ko, C. P., Paushkin, S. V., Ratni, H., Gerlach, I., Ghosh, A., and Metzger, F. (2014) Motor neuron disease. SMN2 splicing modifiers improve motor function and longevity in mice with spinal muscular atrophy. Science 345, 688–693.
(450) Stelzer, A. C., Frank, A. T., Kratz, J. D., Swanson, M. D., Gonzalez-Hernandez, M. J., Lee, J., Andricioaei, I., Markovitz, D. M., and Al-Hashimi, H. M. (2011) Discovery of selective bioactive small molecules by targeting an RNA dynamic ensemble. Nat. Chem. Biol. 7, 553–559.
(451) Xie, J., and Frank, A. T. (2021) Mining for Ligandable Cavities in RNA. ACS Med. Chem. Lett. 12, 928–934.
(452) Abulwerdi, F. A., Shortridge, M. D., Sztuba-Solinska, J., Wilson, R., Le Grice, S. F. J., Varani, G., and Schneekloth, J. S.
(2016) Development of small molecules with a noncanonical binding mode to HIV-1 trans activation response (TAR) RNA. J. Med. Chem. 59, 11148–11160.
(453) Connelly, C. M., Boer, R. E., Moon, M. H., Gareiss, P., and Schneekloth, J. S. (2017) Discovery of Inhibitors of MicroRNA-21 Processing using small molecule microarrays. ACS Chem. Biol. 12, 435–443.
(454) Connelly, C. M., Numata, T., Boer, R. E., Moon, M. H., Sinniah, R. S., Barchi, J. J., Ferré-D'Amaré, A. R., and Schneekloth, J. S. (2019) Synthetic ligands for PreQ1 riboswitches provide structural and mechanistic insights into targeting RNA tertiary structure. Nat. Commun. 10, 1501.
(455) Felsenstein, K. M., Saunders, L. B., Simmons, J. K., Leon, E., Calabrese, D. R., Zhang, S., Michalowski, A., Gareiss, P., Mock, B. A., and Schneekloth, J. S. (2016) Small molecule microarrays enable the identification of a selective, quadruplex-binding inhibitor of MYC expression. ACS Chem. Biol. 11, 138–148.
(456) Calabrese, D. R., Chen, X., Leon, E. C., Gaikwad, S. M., Phyo, Z., Hewitt, W. M., Alden, S., Hilimire, T. A., He, F., Michalowski, A. M., Simmons, J. K., Saunders, L. B., Zhang, S., Connors, D., Walters, K. J., Mock, B. A., and Schneekloth, J. S. (2018) Chemical and structural studies provide a mechanistic basis for recognition of the MYC G-quadruplex. Nat. Commun. 9, 1–15.
(457) Valentovic, M. (2007) Gemfibrozil. xPharm Compr. Pharmacol. Ref. 1–5.
(458) Maximov, P. Y., Lee, T. M., and Jordan, V. C. (2013) The discovery and development of selective estrogen receptor modulators (SERMs) for clinical practice. Curr. Clin. Pharmacol. 8, 135–155.
(459) Ruiz-Carmona, S., Alvarez-Garcia, D., Foloppe, N., Garmendia-Doval, A. B., Juhos, S., Schmidtke, P., Barril, X., Hubbard, R. E., and Morley, S. D. (2014) rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids. PLoS Comput. Biol. 10, e1003571.
(460) Bak, E., Miller, J.
T., Noronha, A., Tavis, J., Gallicchio, E., Murelli, R. P., and Le Grice, S. F. J. (2020) 3,7-Dihydroxytropolones inhibit initiation of hepatitis B virus minus-strand DNA synthesis. Molecules 25, 4434.
(461) Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D., Briggs, J. M., and McCammon, J. A. (2000) Developing a dynamic pharmacophore model for HIV-1 integrase. J. Med. Chem. 43, 2100–2114.
(462) Lin, J. H., Perryman, A. L., Schames, J. R., and McCammon, J. A. (2002) Computational drug design accommodating receptor flexibility: the relaxed complex scheme. J. Am. Chem. Soc. 124, 5632–5633.
(463) Knegtel, R. M. A., Kuntz, I. D., and Oshiro, C. M. (1997) Molecular docking to ensembles of protein structures. J. Mol. Biol. 266, 424–440.
(464) Ganser, L. R., Lee, J., Rangadurai, A., Merriman, D. K., Kelly, M. L., Kansal, A. D., Sathyamoorthy, B., and Al-Hashimi, H. M. (2018) High-performance virtual screening by targeting a high-resolution RNA dynamic ensemble. Nat. Struct. Mol. Biol. 25, 425–434.
(465) Tóth, G., Gardai, S. J., Zago, W., Bertoncini, C. W., Cremades, N., Roy, S. L., Tambe, M. A., Rochet, J. C., Galvagnion, C., Skibinski, G., Finkbeiner, S., Bova, M., Regnstrom, K., Chiou, S. S., Johnston, J., Callaway, K., Anderson, J. P., Jobling, M. F., Buell, A. K., Yednock, T. A., Knowles, T. P. J., Vendruscolo, M., Christodoulou, J., Dobson, C. M., Schenk, D., and McConlogue, L. (2014) Targeting the intrinsically disordered structural ensemble of α-synuclein by small molecules as a potential therapeutic strategy for Parkinson's disease. PLoS One 9.
(466) Fischer, M., Coleman, R. G., Fraser, J. S., and Shoichet, B. K. (2014) Incorporation of protein flexibility and conformational energy penalties in docking screens to improve ligand discovery. Nat. Chem. 6, 575–583.
(467) Salmon, L., Yang, S., and Al-Hashimi, H. M. (2014) Advances in the determination of nucleic acid conformational ensembles. Annu. Rev. Phys. Chem.
65, 293–316.
(468) Salmon, L., Bascom, G., Andricioaei, I., and Al-Hashimi, H. M. (2013) A general method for constructing atomic-resolution RNA ensembles using NMR residual dipolar couplings: the basis for interhelical motions revealed. J. Am. Chem. Soc. 135, 5457–5466.
(469) Frank, A. T., Stelzer, A. C., Al-Hashimi, H. M., and Andricioaei, I. (2009) Constructing RNA dynamical ensembles by combining MD and motionally decoupled NMR RDCs: new insights into RNA dynamics and adaptive ligand recognition. Nucleic Acids Res. 37, 3670–3679.
(470) Irwin, J. J., and Shoichet, B. K. (2005) ZINC – a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45, 177–182.
(471) Irwin, J. J., Sterling, T., Mysinger, M. M., Bolstad, E. S., and Coleman, R. G. (2012) ZINC: A free tool to discover chemistry for biology. J. Chem. Inf. Model. 52, 1757–1768.
(472) Sterling, T., and Irwin, J. J. (2015) ZINC 15 – ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337.
(473) Tetko, I. V., Bruneau, P., Mewes, H. W., Rohrer, D. C., and Poda, G. I. (2006) Can we estimate the accuracy of ADME-Tox predictions? Drug Discov. Today 11, 700–707.
(474) Singh, S. (2006) Preclinical Pharmacokinetics: an approach towards safer and efficacious drugs. Curr. Drug Metab. 7, 165–182.
(475) Balani, S., Miwa, G., Gan, L.-S., Wu, J.-T., and Lee, F. (2005) Strategy of utilizing in vitro and in vivo ADME tools for lead optimization and drug candidate selection. Curr. Top. Med. Chem. 5, 1033–1038.
(476) Lipinski, C. A. (2004) Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337–341.
(477) Lipinski, C. A., Lombardo, F., Dominy, B. W., and Feeney, P. J. (2001) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26.
(478) Trott, O., and Olson, A. J.
(2010) AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461.
(479) Dallakyan, S., and Olson, A. J. (2015) Small-molecule library screening by docking with PyRx. Methods Mol. Biol. 1263, 243–250.
(480) O'Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., and Hutchison, G. R. (2011) Open Babel: an open chemical toolbox. J. Cheminform. 3, 33.
(481) Boeszoermenyi, A., Chhabra, S., Dubey, A., Radeva, D. L., Burdzhiev, N. T., Chanev, C. D., Petrov, O. I., Gelev, V. M., Zhang, M., Anklin, C., Kovacs, H., Wagner, G., Kuprov, I., Takeuchi, K., and Arthanari, H. (2019) Aromatic 19F-13C TROSY: A background-free approach to probe biomolecular structure, function and dynamics. Nat. Methods 16, 333–340.
(482) Bergonzo, C., and Grishaev, A. (2019) Maximizing accuracy of RNA structure in refinement against residual dipolar couplings. J. Biomol. NMR 73, 117–139.
(483) Watkins, A. M., Rangan, R., and Das, R. (2020) FARFAR2: Improved de novo Rosetta prediction of complex global RNA folds. Structure 28, 963–976.e6.
(484) Shi, H., Rangadurai, A., Abou Assi, H., Roy, R., Case, D. A., Herschlag, D., Yesselman, J. D., and Al-Hashimi, H. M. (2020) Rapid and accurate determination of atomistic RNA dynamic ensemble models using NMR and structure prediction. Nat. Commun. 11, 5531.
(485) Imam, H., Khan, M., Gokhale, N. S., McIntyre, A. B. R., Kim, G. W., Jang, J. Y., Kim, S. J., Mason, C. E., Horner, S. M., and Siddiqui, A. (2018) N6-methyladenosine modification of hepatitis B virus RNA differentially regulates the viral life cycle. Proc. Natl. Acad. Sci. U. S. A. 115, 8829–8834.
(486) Xiao, W., Adhikari, S., Dahal, U., Chen, Y. S., Hao, Y. J., Sun, B. F., Sun, H. Y., Li, A., Ping, X. L., Lai, W. Y., Wang, X., Ma, H. L., Huang, C. M., Yang, Y., Huang, N., Jiang, G. Bin, Wang, H. L., Zhou, Q., Wang, X. J., Zhao, Y. L., and Yang, Y. G.
(2016) Nuclear m6A Reader YTHDC1 Regulates mRNA Splicing. Mol. Cell 61, 507?519. (487) Zhao, X., Yang, Y., Sun, B. F., Shi, Y., Yang, X., Xiao, W., Hao, Y. J., Ping, X. L., Chen, Y. S., Wang, W. J., Jin, K. X., Wang, X., Huang, C. M., Fu, Y., Ge, X. M., Song, S. H., Jeong, H. S., Yanagisawa, H., Niu, Y., Jia, G. F., Wu, W., Tong, W. M., Okamoto, A., He, C., Danielsen, J. M. R., Wang, X. J., and Yang, Y. G. (2014) FTO-dependent demethylation of N6-methyladenosine regulates mRNA splicing and is required for adipogenesis. Cell Res. 24, 1403?1419. (488) Zheng, G., Dahl, J. A., Niu, Y., Fedorcsak, P., Huang, C. M., Li, C. J., V?gb?, C. B., Shi, Y., Wang, W. L., Song, S. H., Lu, Z., Bosmans, R. P. G., Dai, Q., Hao, Y. J., Yang, X., Zhao, W. M., Tong, W. M., Wang, X. J., Bogdan, F., Furu, K., Fu, Y., Jia, G., Zhao, X., Liu, J., Krokan, H. E., Klungland, A., Yang, Y. G., and He, C. (2013) ALKBH5 is a mammalian RNA demethylase that impacts RNA metabolism and mouse fertility. Mol. Cell 49, 18?29. (489) Fustin, J. M., Doi, M., Yamaguchi, Y., Hida, H., Nishimura, S., Yoshida, M., Isagawa, T., Morioka, M. S., Kakeya, H., Manabe, I., and Okamura, H. (2013) RNA-methylation- dependent RNA processing controls the speed of the circadian clock. Cell 155, 793?806. (490) Wang, X., Lu, Z., Gomez, A., Hon, G. C., Yue, Y., Han, D., Fu, Y., Parisien, M., Dai, Q., Jia, G., Ren, B., Pan, T., and He, C. (2013) N6-methyladenosine-dependent regulation of messenger RNA stability. Nature 505, 117?120. (491) Wang, X., Zhao, B. S., Roundtree, I. A., Lu, Z., Han, D., Ma, H., Weng, X., Chen, K., Shi, H., and He, C. (2015) N6-methyladenosine modulates messenger RNA translation efficiency. Cell 161, 1388?1399. (492) Meyer, K. D., Patil, D. P., Zhou, J., Zinoviev, A., Skabkin, M. A., Elemento, O., Pestova, T. V., Qian, S. B., and Jaffrey, S. R. (2015) 5? UTR m6A promotes cap- independent translation. Cell 163, 999?1010. (493) Roost, C., Lynch, S. R., Batista, P. J., Qu, K., Chang, H. Y., and Kool, E. T. 
(2015) Structure and thermodynamics of N6-methyladenosine in RNA: a spring-loaded base modification. J. Am. Chem. Soc. 137, 2107?2115. (494) Kierzek, E., and Kierzek, R. (2003) The thermodynamic stability of RNA duplexes and hairpins containing N6-alkyladenosines and 2-methylthio-N6-alkyladenosines. Nucleic Acids Res. 31, 4472?4480. (495) Liu, B., Merriman, D. K., Choi, S. H., Schumacher, M. A., Plangger, R., Kreutz, C., Horner, S. M., Meyer, K. D., and Al-Hashimi, H. M. (2018) A potentially abundant junctional RNA motif stabilized by m6A and Mg2+. Nat. Commun. 9, 2761. (496) Chu, C. C., Liu, B., Plangger, R., Kreutz, C., and Al-Hashimi, H. M. (2019) m6A 258 minimally impacts the structure, dynamics, and Rev ARM binding properties of HIV-1 RRE stem IIB. PLoS One 14, e0224850. 259