ABSTRACT Title of Dissertation: SYSTEMS APPROACHES TO IMMUNOLOGY IN ACUTE COVID19, MONOGENIC IMMUNE DISORDERS, AND CHILDHOOD DEVELOPMENT Nicholas Rachmaninoff, Doctor of Philosophy 2023 Dissertation directed by: Professor Philip L.F. Johnson Department of Biology In recent years advances in immune profiling technologies have allowed us to generate data at an unprecedented scale, and interrogate human immune systems in ways that were not previously possible. In this dissertation, I use these approaches in three different contexts. First, I explore how the cells of the immune system respond to acute COVID-19 infection and how this depends on the severity of the disease. Using CITEseq, simultaneous profiling of surface markers and RNA in peripheral blood mononuclear cells, I identify differentially expressed gene expression programs associated with COVID-19 infection and gene expression programs associated with disease severity. In addition, I explore how phenotypes of memory T- cells including the clonal nature and exhaustion signatures are associated with severity of COVID-19 infection. Second, I address what it means to be immunologically healthy through a multi-omics study of a cohort of patients at the NIH clinical center with various monogenic Immune disorders. I identify supervised and unsupervised axes of immune health that can separate disease from healthy controls, and additionally track changes to the immune system as people age, showing the parallels between disease associated inflammation and aging associated inflammation. I verify the utility of these metrics in several contexts outside of the original cohort and show that the signatures reflect broad changes to various cells of the immune system. Last, I explore the development of the immune system in childhood and the maintenance of temporally stable gene expression patterns. In a cohort of children that was tracked longitudinally over six years in Nicaragua, I utilize whole blood transcriptomics to explore both how the immune system changes as children grow older and which aspects of the immune system show large amounts of individuality or persistent inter-subject variation in their levels. I show that persistent inter-subject variation in gene expression and cellular frequencies is quite pronounced throughout childhood and attempt to identify when certain aspects of the immune system begin to stabilize in terms of their levels for an individual. SYSTEMS APPROACHES TO IMMUNOLOGY IN ACUTE COVID19, MONOGENIC IMMUNE DISORDERS, CHILDHOOD DEVELOPMENT by Nicholas Andre Rachmaninoff Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2023 Advisory Committee: Professor Philip L.F. Johnson, Chair/Advisor Professor John S. Tsang, Co-Advisor Professor Steve Mount Professor Charles Ma Professor David Mosser ã Copyright Nicholas Rachmaninoff 2023 ii DEDICATION To my parents, Vanessa and JP Rachmaninoff. Thanks for always being there throughout my life. iii ACKNOWLEDGEMENTS I would like to thank all participants in these studies and their families for their willingness to take part and advance our understanding of biology. I would also like to thank my two mentors throughout my PhD, John Tsang and Philip Johnson for their guidance and support throughout my time here. There was never a shortage of interesting and exciting projects to work on with John, and I thank him for both involving in new projects as they appeared and letting me explore various paths along the way. I thank Philip for the many hours spent brainstorming projects, for welcoming me into his lab even midway into my PhD, and for helping me find a path that worked for me. My committee members, Steve Mount, David Mosser, and Charles Ma also provided great advice along the way. I would also like to thank the members of the Tsang lab/CHI: Neha, Dylan, Darius, John Kim, Can, Candace, William, Andrew, Matt, Laura, Rachel, Pedro, Richard, Foo, Rohit and Yuri and the Johnson lab: Thomas, Shauna, Hao, Wei, and Arvind for being great friends and making it a pleasure to go into the lab. In particular, I would like to thank Dylan Hirsch for the many hours of discussion and for teaching me a lot about math, coding, and machine learning, and always being willing to talk if I ran into a problem. In addition, I would like to thank all of the great experimentalists that I had the pleasure of working with and to whom I am greatly indebted. Without their expertise and hard work, none of this would be possible. This includes Andrew Martins, Can Liu, Neha Bansal, Galina Koroleva, and Jinguo Chen. I would also like to thank Rachel Sparks for all of her work in organizing the pan- monogenic study and for mentoring me when I first arrived to the lab. I would like to thank iv William Lau for all of the great work we did together on multiple projects, and for putting together a fantastic clinical database for the pan-monogenic project. I would like to thank Stephen Popper for being a driving force behind the Nicaragua study, envisioning the study before I even began my PhD, and for the many hours of discussion of results and future analyses. In addition, I would like to thank Eva Harris and the members of her lab for running such a great study that made my work possible, in particular, Jose Victor Zambrana, for all of his help organizing and sharing data. My time here was made much richer by a great group of friends: Vanessa Rubio, Dom Braccia, Zabdiel Alvarado-Martinez, Anshuman Swain, and Kevin Bennett. Thanks for all of the great times we had together. Lastly, I would like to thank my family: my parents, JP and Vanessa, my siblings, Christian and Victoria, and my grandfather, Guillermo, for their support and love throughout my life. v TABLE OF CONTENTS DEDICATION ............................................................................................................................... ii ACKNOWLEDGEMENTS ........................................................................................................... iii TABLE OF CONTENTS ................................................................................................................ v INTRODUCTION .......................................................................................................................... 1 How can we measure the immune system in people? .......................................................................... 1 Blood as the typical tissue of choice ....................................................................................................... 1 What can you do with blood? ................................................................................................................ 2 Transcriptomics of the blood ................................................................................................................. 2 Transcriptomics ...................................................................................................................................... 3 Single Cell transcriptomics .................................................................................................................... 4 Measuring proteins in the blood ............................................................................................................ 4 Using immune profiling to understand immunological variation between people ........................... 5 Summary of Projects .............................................................................................................................. 5 Understanding immune correlates of disease severity in COVID-19 .................................................................. 5 Using many monogenic Immune disorders to understand immune health ........................................................... 6 Human Immune Variation and the immunological baseline ................................................................................ 7 Introduction References ......................................................................................................................... 9 Chapter 1: Time-resolved Systems Immunology Reveals a Late Juncture Linked to Fatal COVID-19 .................................................................................................................................... 10 SUMMARY ........................................................................................................................................... 10 AUTHOR CONTRIBUTIONS ............................................................................................................ 12 INTRODUCTION ................................................................................................................................ 15 RESULTS .............................................................................................................................................. 18 Study design and approach ................................................................................................................................. 18 Empowering disease severity correlate analysis: development of a multi-parameter disease severity metric .. 22 A multimodal time-resolved single immune cell atlas of COVID-19 patients and matching HCs .................... 26 pDC apoptosis is associated with elevated disease severity and reduced pDC frequency ................................. 39 Conditional independence network analysis suggests IL-15-linked fatty acid metabolism and attenuated inflammation in CD56dimCD16hi NK cells as primary correlates of disease severity ........................................ 41 Extensive single cell and clonal expansion heterogeneity without signs of exhaustion in CD8+ T cells ........... 51 Analysis of timing effects suggests a late immune response juncture ................................................................ 56 Divergences at the juncture predict fatal COVID-19 ......................................................................................... 59 DISCUSSION ........................................................................................................................................ 66 ACKNOWLEDGMENTS .................................................................................................................... 75 vi STAR METHODS ................................................................................................................................ 76 EXPERIMENTAL MODEL AND SUBJECT DETAILS ................................................................................. 77 METHOD DETAILS .......................................................................................................................................... 78 QUANTIFICATION AND STATISTICAL ANALYSIS .................................................................................. 83 REFERENCES .................................................................................................................................... 107 Chapter 2: Multiomics integration of 22 immune-mediated monogenic diseases reveals an emergent axis of human immune health .................................................................................. 129 Summary .............................................................................................................................................. 129 Author Contributions ......................................................................................................................... 130 Introduction ......................................................................................................................................... 132 Results .................................................................................................................................................. 134 A multiomics compendium of 22 monogenic immune-mediated diseases reveals temporally stable individual differences tend to be the dominant source of variation ................................................................................... 134 Pan-disease and disease-specific signatures ..................................................................................................... 145 Integration of transcriptomic and serum protein personal immune profiles revealed an emergent axis of immune health .................................................................................................................................................. 154 A quantitative metric of human immune health ............................................................................................... 163 The cellular origin of the IHM transcriptional signature .................................................................................. 179 IHM captures immune variation in heathy individuals beyond age ................................................................. 183 Discussion ............................................................................................................................................ 186 Limitations of the Study ................................................................................................................................... 191 Acknowledgements ............................................................................................................................. 193 Methods ................................................................................................................................................ 194 Methods References ............................................................................................................................ 216 References ............................................................................................................................................ 219 Chapter 3: Longitudinal gene expression profiling of children in Nicaragua reveals persistent inter-subject immune variation maintained over 6 years ......................................................... 224 Summary .............................................................................................................................................. 224 Author Contributions ......................................................................................................................... 225 Introduction ......................................................................................................................................... 225 Results .................................................................................................................................................. 227 Longitudinal profiling of whole blood transcriptomes in children in Nicaragua reveals that age is associated with gene expression principal components and cell frequencies ................................................................... 227 Temporal trends of cell-frequencies and gene expression ................................................................................ 232 Persistent inter-subject variation is the most prominent source of explained gene expression variation ......... 235 Genes with high individuality tend to be more heritable and are consistent across studies ............................. 241 The time dynamics of inter-subject variation ................................................................................................... 246 Discussion ............................................................................................................................................ 252 Relation to previous work ................................................................................................................................. 254 Limitations of the study .................................................................................................................................... 255 Strengths of the study ....................................................................................................................................... 256 Directions for future study ................................................................................................................................ 256 vii Methods ................................................................................................................................................ 257 References ............................................................................................................................................ 265 1 INTRODUCTION How can we measure the immune system in people? Much has been learned about immunology through careful observation and experimentation in mice as a model system. The advent of high-throughput immune profiling technologies now allows us to observe human immunology in a way that was not previously possible. Though fraught with some difficulties and large variability that comes with looking in people, we can begin to understand immune responses in humans building off the many years of knowledge gained through the study of model organisms and careful observation of individual parameters in people. Blood as the typical tissue of choice Due to its ease of collection, blood is the predominant tissue used in the study of human immunology. Because it functions to transport both the cells and signaling molecules of the immune system, it allows us to measure what is going on in various parts of the body, though this also makes it difficult to know whether the cells in the blood are responding to distant signals or more local signals at the site of inflammation. It would be ideal to understand what is happening in various immunological tissues; for example through fine needle aspirates of draining lymph nodes (Kim et al., 2022), surgical resection of tonsils (King et al., 2021), or looking at local sites of inflammation through techniques like bronchoalveolar lavage (Arunachalam et al., 2020). However, it often is easiest and only feasible to get blood from the human subjects in these studies. Moreover, if any interesting biomarkers are discovered in the blood, it is far easier to take a relatively non- 2 invasive blood draw than it is to remove a tissue. The study of blood allows us to understand what the whole system is doing and do so in a way that is minimally invasive. What can you do with blood? Once blood is drawn and processed, cells can be enumerated by a coulter counter; cellular phenotypes can be measured by flow cytometry; RNA inside cells can be measured with PAXgene and TEMPUS tubes that make it easy to collect and stabilize RNA immediately after a blood draw. In addition, it is possible to measure the proteins and metabolites present in the blood as well. In this dissertation, I primarily focus on studying the cells of the immune system and the processes within them using RNAseq at the bulk and single cell level. Chapter 2 however, utilizes a high-throughput method for the study of proteins in blood. Transcriptomics of the blood In this dissertation, I utilize both whole blood bulk transcriptomic profiling (chapters 2 and 3), and single cell transcriptomic profiling (chapter 1) to study the immune cells and cellular processes in the blood. Whole blood transcriptomics allows us to understand both the relative predominance of certain types of cells in the blood and additionally the processes ongoing in those cells. It is limited however by losing the information about which types of cells produced which RNA molecules. Despite this, it is very useful for providing a broad overview of the processes within the blood and can allow for the identification of biomarkers of disease and other processes. 3 In contrast, single-cell transcriptomics provides information at the single cell level. It is possible to enumerate both the number of cells with a particular phenotype in a sample as well as describe the processes ongoing within that cell. It is however limited by cost, additional difficulty in sample preparation and data generation, and much reduced ability to scale to many samples compared to bulk transcriptomics. Transcriptomics The central dogma of biology tells us that DNA is transcribed into RNA which is translated into proteins. These proteins are essentially the effector molecules of the cell. However, because of its role as an intermediate messenger molecule, the levels of RNA in a cell (or tissue) tell us something about the levels of the related protein. Therefore, it is possible to utilize quantitative measures of RNA to attempt to understand some of the processes that are going on within a cell or group of cells. The RNA molecules that are eventually translated into proteins are termed mRNA. It is now possible to measure nearly all types of mRNA present in a tissue using microarrays or RNA sequencing (RNAseq). Both technologies ultimately provide a quantitative readout of the relative amount of a given RNA molecule present in a cell and it is possible to make relative comparisons in the levels between different samples. Looking at a single RNA species of interest can allow one to discern whether a gene is higher or lower in each sample, but it is also possible to use many RNA molecules at once to gain a global view of the similarities or differences between various samples. 4 Single Cell transcriptomics Recent advances in microfluidics have allowed us to now measure RNA in single cells by, for example, capturing cells in droplets in an oil emulsion. Within the droplets, RNA amplification can be performed. Within the droplet, each cell binds a bead with a unique molecular barcode where PCR is performed to amplify the RNA transcripts. After amplification occurs, the droplets are pooled, and using the molecular barcodes, we ultimately can trace back which RNAs corresponded to each cell. This allows for similar analyses as bulk RNAseq, but also for the discovery of cells with previously unknown phenotypes and the ability to identify the cellular source of gene expression signatures found in bulk studies. Measuring proteins in the blood Immune cells also communicate with each other through a variety of signaling molecules that can be measured in the blood. The molecules also can reflect non-immune cells communicating with immune cells to promote inflammation or recruit a certain type of immune response. For example there exist several chemokines and cytokines, TNFa, IL-1, and CXCL8, that promote a process called extravasation where cells adhere to the vascular endothelium and then enter a nearby tissue (Muller, 2013). The quantity of a certain type of protein can be measured through immunoassays such as ELISA. There also exists a scaled-up immunoassay called Luminex that allows for the measurement of multiple proteins simultaneously. This technique is used in chapter 1 to measure cytokines in COVID patients. In addition, there are more high throughput methods such as 5 aptamer-based arrays such as SomaLogic that allow you to measure thousands of proteins simultaneously in the blood. A SomaLogic panel of ~1300 proteins is used in chapter 2. Using immune profiling to understand immunological variation between people There is great variability between people in their ability to respond to infections, susceptibility to autoimmune disorders, or allergic complications. This variability is due to a combination of factors reflecting the current state of their immune system, which ultimately is tied to a combination of genetics, history of exposure to infectious agents and chemicals, interactions with the microbiome, diet and lifestyle. By measuring aspects of the immune system through the approaches described above we can begin linking features of the immune system to these functional outcomes. Summary of Projects Understanding immune correlates of disease severity in COVID-19 In chapter 1, I explore how the cells of the immune system respond to COVID-19 infection and additionally explore correlates of disease severity in terms of the immune response. The COVID-19 pandemic has caused 6.62 million deaths so far at the time of writing this dissertation (https://ourworldindata.org/explorers/coronavirus-data-explorer). The reason for its high mortality rate is still not fully understood despite the several correlates of disease severity that have been discovered (age, male sex, being overweight, type diabetic, or immuno- compromised) (Gallo Marin et al., 2021). Many of the deaths seem to be caused by an 6 overreaction of the immune response in the lungs that leads to permanent lung damage. There is also a general increase in caogulatory activity in the blood and other organ damage that occurs. Beyond understanding these risk factors for severe disease, it is important to understand what is happening to the immune system in severe disease. This informs not only what treatments could be tried, but also some basic immunology relating to pathological immune responses to respiratory infections. To address this question of which aspects of the immune response were associated with disease severity, we derived a metric of disease severity that allowed for continuous descriptions of disease severity. I then identified genes differentially expressed in COVID across various cell clusters and additionally genes associated with severity. I found that interferon signaling was elevated in these patients in contrast to what was previously reported in the literature. Additionally, using a network-based approach we identified several main correlates of disease severity and validated this in an external cohort. I also showed that increased clonality in memory CD8 T cells was associated with a more severe course of disease. Lastly, we explored whether there existed a critical juncture around 20 days into COVID infection that separated the more severe vs. less severe patients. Using many monogenic Immune disorders to understand immune health In Chapter 2, I explore what it means to be immunologically healthy throughout a multi-omics study of many monogenic immune disorders. Monogenic immune disorders (disease that is believed to be caused by a mutation in a single gene) provide an excellent resource for understanding the link between genetics and phenotypic immune variation. Though there exist many studies showing signatures of single 7 diseases in comparison to healthy individuals, there have been few attempts to understand the vast landscape of different human diseases in an integrated fashion (Lee et al., 2019). In the context of common diseases, it has been observed that many seemingly disparate genetic defects could converge to a shared phenotype (Goh et al., 2007). This in turn suggests that finding commonalities across seemingly disparate immunological disorders may allow for effective transfer of successful therapies across different diseases. To address this issue of whether multiple immune disorders can be considered together to understand immune health, I derive two metrics of immune health, one that arises naturally from the data in an unsupervised fashion and one that was trained to identify healthy individuals. I show that these metrics of immune health not only separate healthy controls from those with immunological disease, but additionally track age related inflammation. I validate this in additional cohorts and attempt to describe the cellular origin of these signatures using an external dataset. Human Immune Variation and the immunological baseline In chapter 3, I aim to find which aspects of the immune system show the greatest variation between people that is maintained over time, and describe how the immune system changes as children age. Several studies have shown that some aspects of the immune system (cell frequencies, cytokines in the blood, and gene expression) are remarkably stable over time within individuals (Carr et al., 2016; Tsang et al., 2014) This is likely a result of a combination of age, genetics, previous exposures and environmental factors. The contribution of genetics to frequencies of peripheral immune parameters has also been described through GWAS studies looking at SNPs 8 associated with cell frequencies (Roederer et al., 2015). Despite this, it is still not fully understood when these stable subject-subject differences develop nor whether they can be observed in childhood. I build upon this work by using an approach called variance partitioning to attempt to find genes and cell populations that show large amounts of individuality (persistent differences in these cells that are maintained over time) in a cohort of children from Nicaragua who received annual blood draws as part of the Pediatric Dengue Cohort Study. I also identify cell populations and gene expression programs that change as children age. I then explore how and when immunological baseline states are set by identifying genes and cell populations that show increasing or decreasing levels of individuality over time. 9 Introduction References Arunachalam, P.S., Wimmers, F., Mok, C.K.P., Perera, R.A.P.M., Scott, M., Hagan, T., Sigal, N., Feng, Y., Bristow, L., Tak-Yin Tsang, O., et al. (2020). Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans. Science 369, 1210-1220. doi:10.1126/science.abc6261. Carr, E.J., Dooley, J., Garcia-Perez, J.E., Lagou, V., Lee, J.C., Wouters, C., Meyts, I., Goris, A., Boeckxstaens, G., Linterman, M.A., and Liston, A. (2016). The cellular composition of the human immune system is shaped by age and cohabitation. Nature Immunology 17, 461--468. 10.1038/ni.3371. Gallo Marin, B., Aghagoli, G., Lavine, K., Yang, L., Siff, E.J., Chiang, S.S., Salazar‐Mather, T.P., Dumenco, L., Savaria, M.C., and Aung, S.N. (2021). Predictors of COVID‐19 severity: a literature review. Reviews in medical virology 31, 1-10. Goh, K.-I., Cusick, M.E., Valle, D., Childs, B., Vidal, M., and Barabási, A.-L. (2007). The human disease network. Proceedings of the National Academy of Sciences 104, 8685--8690. 10.1073/pnas.0701361104. Kim, W., Zhou, J.Q., Horvath, S.C., Schmitz, A.J., Sturtz, A.J., Lei, T., Liu, Z., Kalaidina, E., Thapa, M., and Alsoussi, W.B. (2022). Germinal centre-driven maturation of B cell response to mRNA vaccination. Nature 604, 141-145. King, H.W., Orban, N., Riches, J.C., Clear, A.J., Warnes, G., Teichmann, S.A., and James, L.K. (2021). Single-cell analysis of human B cell maturation predicts how antibody class switching shapes selection dynamics. Science Immunology 6, eabe6291. doi:10.1126/sciimmunol.abe6291. Lee, Y.-s., Krishnan, A., Oughtred, R., Rust, J., Chang, C.S., Ryu, J., Kristensen, V.N., Dolinski, K., Theesfeld, C.L., and Troyanskaya, O.G. (2019). A Computational Framework for Genome-wide Characterization of the Human Disease Landscape. Cell Systems 8, 152--162.e156. 10.1016/j.cels.2018.12.010. Muller, W. (2013). Getting leukocytes to the site of inflammation. Veterinary pathology 50, 7-22. Roederer, M., Quaye, L., Mangino, M., Beddall, M.H., Mahnke, Y., Chattopadhyay, P., Tosi, I., Napolitano, L., Barberio, M.T., Menni, C., et al. (2015). The Genetic Architecture of the Human Immune System: A Bioresource for Autoimmunity and Disease Pathogenesis. Cell 161, 387--403. 10.1016/j.cell.2015.02.046. Tsang, J.S., Schwartzberg, P.L., Kotliarov, Y., Biancotto, A., Xie, Z., Germain, R.N., Wang, E., Olnes, M.J., Narayanan, M., Golding, H., et al. (2014). Global Analyses of Human Immune Variation Reveal Baseline Predictors of Postvaccination Responses. Cell 157, 499--513. 10.1016/j.cell.2014.03.031. 10 Chapter 1: Time-resolved Systems Immunology Reveals a Late Juncture Linked to Fatal COVID-19 Adapted from previous work: Can Liu*, Andrew J. Martins*, William W. Lau*, Nicholas Rachmaninoff*, Jinguo Chen, Luisa Imberti, Darius Mostaghimi, Danielle L. Fink, Peter D. Burbelo, Kerry Dobbs, Ottavia M. Delmonte, Neha Bansal, Laura Failla, Alessandra Sottini, Eugenia Quiros-Roldan, Kyu Lee Han, Brian A. Sellers, Foo Cheung, Rachel Sparks, Tae-Wook Chun, Susan Moir, Michail S. Lionakis, NIAID COVID Consortium, COVID Clinicians, Camillo Rossi, Helen C. Su, Douglas B. Kuhns, Jeffrey I. Cohen, Luigi D. Notarangelo, John S. Tsang, “Time-resolved Systems Immunology Reveals a Late Juncture Linked to Fatal COVID-19” Cell 184.7 (2021): 1836-1857. *Contributed Equally SUMMARY COVID-19 exhibits extensive patient-to-patient heterogeneity. To link immune response variation to disease severity and outcome over time, we longitudinally assessed circulating proteins as well as 188 surface protein markers, transcriptome, and T-cell receptor sequence simultaneously in single peripheral immune cells from COVID-19 patients. Conditional- independence network analysis revealed primary correlates of disease severity, including gene expression signatures of apoptosis in plasmacytoid dendritic cells and attenuated inflammation but increased fatty acid metabolism in CD56dimCD16hi NK cells linked positively to circulating IL-15. CD8+ T cell activation was apparent without signs of exhaustion. While cellular 11 inflammation was depressed in severe patients early after hospitalization, it became elevated by days 17-23 post symptom onset, suggestive of a late wave of inflammatory responses. Furthermore, circulating protein trajectories at this time were divergent between, and predictive, of recovery-fatal outcomes. Our findings stress the importance of timing in the analysis, clinical monitoring, and therapeutic intervention of COVID-19. 12 AUTHOR CONTRIBUTIONS NR contributions CL, AJM, WWL, NR performed data analysis and generated figures with help from FC; CL, AJM, WWL, NR, and JST interpreted the data; CL, AJM, WWL, NR, and JST designed data analysis strategies and approaches; CL, AJM, WWL, NR and JST finalized the manuscript with inputs from other authors. Contributions from other authors JST and LDN conceived the study; Andrew Martins, Can Liu, JST designed CITE-seq experiments with inputs from WWL, DM, JC, and SM; CL, JC, DM generated CITE-seq data with help from NB; WWL organized sample meta-data with help from the NIAID COVID Consortium; KLH and BAS generated flow cytometry data; NIAID COVID Consortium provided logistical and infrastructure and analysis support; LI, AS, EQR and COVID Clinicians cared for patients, collected samples, and clinical data; OMD, LI, LF, AS, EQR, MSL, HCS, CR, and LDN clinically characterized patients with help from NIAID COVID Consortium and COVID Clinicians; LI, AS, KD, NB processed and handled patient samples; RS, LF, NB, TWC, SM contributed healthy control samples; DLF, DBK, MSL, LDN contributed circulating protein data; PDB and JIC contributed antibody data; JST drafted the manuscript; JST supervised the study. NR Computational analysis contributions Analysis of expanded T cell populations and T cell clonality and the association with severity/COVID were performed by NR. Analysis of T cell exhaustion using canonical surface 13 markers and gene expression signatures was performed by NR. Pseudobulk differential expression testing for differences between COVID/healthy, associations with the disease severity metric, and interaction models looking at severity timing interactions were performed by NR. All bubble plots showing enriched genesets for the differential expression results was generated by NR. Differential abundance analysis of cell clusters was performed by NR. Validation of COVID associated signatures in external single-cell RNAseq data (Schulte-Schrepping et al.) and creation of related plots was performed by NR. Computational analysis contributions from other authors The Disease severity metric was developed and tested by WWL along with related figures. Patient timelines were visualized by WL. Clustering of CITEseq data, annotation of clusters, and visualization via UMAP was performed by AJM and CL. CL analysed the correlation between cell frequencies measured by CITEseq vs. flow cytometry The disease severity network was created by WWL. Fine clustering of T cell populations was performed by AJM. CL created heatmaps of differentially expressed modules (e.g. interferon, translation, pDC apoptosis, NK cell fatty acid signaling) in the Brescia cohort and German cohort. Plots of geneset scores over time (e.g. NK cell fatty acid, Hallmark TNF-alpha, translation) were created by CL. CL created heatmap of cell frequencies in various patients The critical juncture analysis, (selection of proteins in Brescia cohort and prediction of patient morality in additional cohort) was performed by WWL. CL created plots showing major correlates of the disease severity network. CL performed the analysis related to glucocorticoid responses and the TSC22D3 expression, WWL performed Effect size comparison of Brescia deceased versus recovered and an independent US 14 cohort. CL performed comparison of effect sizes and at the period before day 17 (TSO < day 17, green) and during the TSO = days 17-23 period (purple) for inflammatory related gene sets and serum proteins. Analysis and visualization of temporal dynamics of myeloid cell populations around the critical juncture was performed by CL and WWL. Contributions to specific figures are also described in the figure legends. Unless otherwise stated in the legend, the analysis/figure was performed/created by NR 15 INTRODUCTION COVID-19 is a potentially fatal disease caused by infection with the coronavirus SARS-CoV-2 (Zhou et al., 2020b), which has been fueling a global pandemic since December 2019 with more than 98 million confirmed cases and 2.1 million deaths to date (March, 2021) (https://www.who.int/emergencies/diseases/novel-coronavirus-2019). The clinical course of COVID-19 exhibits extensive heterogeneity: the infection can result in little to no symptoms, or mild disease with complete recovery, but also critical illness, hospitalization, and progression to Acute Respiratory Distress Syndrome (ARDS), tissue damage, and death (Wu and McGoogan, 2020). Variables such as age and host genetic variations [e.g., TLR7 in men (van der Made et al., 2020) or genes involved in type I Interferon (IFN) production (Pairo-Castineira et al., 2020; Zhang et al., 2020c)], presence of autoantibodies (Bastard et al., 2020; Wang et al., 2020), and preexisting conditions, including obesity, diabetes mellitus, and cardiovascular disease have been identified as risk factors for severe disease and poor outcomes (CDC COVID-19 Response Team, 2020; Williamson et al., 2020). The research community has rapidly mobilized to investigate how human immune response variations can contribute to disease severity, progression, and outcome (reviewed in (Grigoryan and Pulendran, 2020)). Understanding the mechanisms underlying immune response dynamics and differences across patients is critical for designing effective therapeutics, prognostic tools, and prevention strategies better tailored to individual patients. The immune system protects the host but can also drive life-threatening inflammatory pathologies (Matheson and Lehner, 2020; Tay et al., 2020). Human immune responses to SARS- CoV-2 infection are highly dynamic; most of the data thus far came from studies of hospitalized 16 patients following symptom onset (Arunachalam et al., 2020; Laing et al., 2020; Lucas et al., 2020; Mathew et al., 2020; Moderbacher et al., 2020; Schulte-Schrepping et al., 2020; Su et al., 2020). COVID-19 patients are broadly characterized by elevation in neutrophils and monocytes concomitant with lymphopenia in blood, particularly with decreases in CD4+ and CD8+ T cells (Lucas et al., 2020; Mathew et al., 2020), including subsets such as gamma-delta T cells (Laing et al., 2020). However, consistent with ongoing adaptive responses, increases in the frequency of activated and cycling CD4+ and CD8+ T cells as well as activated B cells and plasmablasts have been detected (Laing et al., 2020; Mathew et al., 2020). Accordingly, robust humoral responses have been observed with marked elevation of antibodies against SARS-CoV-2 proteins, such as the spike and nucleocapsid (Burbelo et al., 2020; Ju et al., 2020; Laing et al., 2020; Long et al., 2020; Zheng et al., 2020). However, lack of coordination among and delay in antigen-specific T and B cell responses tend to mark severe disease (Moderbacher et al., 2020). The level of proinflammatory cytokines such as IL-6, IL-8, IL-18, and TNF-a are often elevated in COVID- 19 patients, and higher levels of IL-6 has been found to be predictive of poor outcomes in several studies (Liu et al., 2020; Mandel et al., 2020); TNF-a, IP-10 (CXCL-10), and IL-10 levels have also been shown to carry prognostic information (Abers et al., 2021; Del Valle et al., 2020; Laing et al., 2020; Lucas et al., 2020). Innate immunity shows signs of dysregulation in COVID-19. Early reports noted the low levels of type I Interferon (IFN) activity detected in peripheral blood or lungs of COVID-19 patients (Acharya et al., 2020; Blanco-Melo et al., 2020; Hadjadj et al., 2020), suggesting a potential defect in viral defense. However, later reports found type I IFN transcriptional signatures (IFN-I signatures) in cells from bronchoalveolar lavage (BAL) fluids or peripheral blood of patients (Arunachalam et al., 2020; Liao et al., 2020; Schulte-Schrepping et al., 2020). 17 The levels of both circulating IFN-a proteins and IFN-I signature in blood were negatively associated with the time post symptom onset and more mildly, with disease severity (Arunachalam et al., 2020; Hadjadj et al., 2020), suggesting that timing plays an important role in both the induction and experimental detection of type I IFN activities. In addition to the genetic evidence linking the type I IFN pathway to severe COVID-19 mentioned above, auto- antibodies against type I IFN have been found in some patients with life-threatening disease (Bastard et al., 2020; Wang et al., 2020), further implicating the importance of type I IFN response to protection against severe disease. Aside from infected epithelial cells, plasmacytoid dendritic cells (pDC) are another major producer of type I IFNs in response to viral infections. The frequency of pDCs in peripheral blood has been found to be lower in COVID-19 patients than healthy controls (HCs) and negatively correlated with disease severity (Arunachalam et al., 2020; Kuri-Cervantes et al., 2020; Laing et al., 2020), but substantial changes in pDC levels have not been found in BAL fluids or lung autopsies (Chua et al., 2020; Liao et al., 2020; Nienhold et al., 2020; Wauters et al., 2020). In addition, cell surface levels of HLA-DR have also been found to be lower in classical monocytes (Giamarellos-Bourboulis et al., 2020; Moratto et al., 2020; Schulte-Schrepping et al., 2020) and myeloid DCs (Arunachalam et al., 2020), suggesting that patients undergoing active inflammation and immune responses to SARS-CoV-2 have attenuated innate immune functions. Given the importance of timing (Matheson and Lehner, 2020) and the involvement of diverse immune cells with evolving states in COVID-19, here we used multimodal single cell profiling and computational approaches to longitudinally assess hospitalized COVID-19 patients. In addition to providing a time-resolved single cell atlas of COVID-19 for further integration with other single cell datasets (Schulte-Schrepping et al., 2020; Su et al., 2020; Unterman et al., 18 2020; Wilk et al., 2020; Yao et al., 2020; Zhang et al., 2020b), we revealed a network of dynamically evolving cell-type specific signatures linked to disease severity. Strikingly, our analyses uncovered a late time window during which the host immune response undergoes divergences correlated with disease severity and predictive of fatal outcomes. Our results raise the prospect of more personalized intervention strategies and stress the importance of timing in the analysis and clinical monitoring of COVID-19. RESULTS Study design and approach Peripheral blood mononuclear cells (PBMC) were obtained from 33 hospitalized patients from Brescia, Italy (Figures 1A and S1A) and 14 age- and gender-matched HCs (see Methods). The patients were classified as moderate (n=3), severe (n=5), and critical (n=25 with 4 deceased during hospitalization) based on the National Institutes of Health guidelines (https://www.covid19treatmentguidelines.nih.gov/overview/management-of-covid-19/) (Table S1). This distribution of disease severity is reflective of the early pandemic in Northern Italy when only the most severe patients were hospitalized due to capacity limitations. Longitudinal samples were obtained at up to four time-points from each patient [Time 0 (T0) to Time 3 (T3)] that covered a range of times post symptom onset, thus providing a unique opportunity for linking immune state variations at T0 (at or near the time of hospital admission) to disease status and severity and reconstructing temporal immune response trajectories by pooling information across patients (Figure 1A). 19 20 Figure 1: Study Design and a Multi-parameter Disease Severity Metric (DSM). (A) COVID-19 patient cohort overview and sample collection timeline. The distribution of sample collection timing since symptom onset (TSO) is summarized on the top while the sampling for each of the 33 donors (age and gender indicated in parentheses) is plotted below indicating individual T0 (first) up to T3 (fourth) timepoints. One patient (P098) with unknown TSO is depicted here by using his hospital admission date as the reference. Some samples were excluded from analysis after initial quality control as they did not have a sufficient number of cells (see Methods). (B) Experimental design, analysis questions, and approach. Frozen PBMCs from the collected samples indicated in (A) and age-matched healthy controls were thawed and pooled, followed by combined surface protein/mRNA single cell expression analysis (CITE- seq); data from individual samples were demultiplexed by a combination of SNP-based and ‘hashtag’ antibody staining (see Methods). After automated clustering (plus additional targeted cell populations identified by manual gating) and cell subset identification based on surface protein profiles, pooled RNA data (‘pseudobulk’) libraries were generated in silico for each sample per cell cluster/population, and then used for downstream analysis using mixed effect statistical modeling to assess the effect of disease severity and TSO, while controlling for other variables such as TSO (when characterizing the effect of disease severity), age and experimental batch (created with BioRender.com). (C) Heatmap of 18 clinical and serum protein measurements of patients after correction for days since hospital admission. Measurement values are standardized across subjects. Samples are divided into four groups based on unsupervised hierarchical clustering. Patients admitted to the Intensive Care Unit (ICU) during their hospitalization are denoted with an asterisk (*) next to their labels. None of the T0 samples were 21 collected at the ICU. (D) Parameter importance of the fitted coefficient values from partial-least- square (PLS) ordinal regression models of disease severity. Parameters are ordered by their absolute median coefficient values from independent training iterations in leave-one-out cross validation. Error bars indicate standard deviation of coefficient distribution across all iterations. (E) Distribution of patient disease severity metric (DSM) grouped based on the disease severity- outcome classification for the 30 patients with CITE-seq data (left panel, see Methods) and an independent set of 64 patients from Brescia (right panel). The DSM formula was trained using the CITE-seq cohort only. Significance of mean difference between groups is determined by two-tailed Wilcoxon test. *: p <= 0.05, **: p <= 0.01, and ***: p <= 0.001. See also Figure S1 and Table S1. Analysis from William Lau. Figure generated William Lau and Andrew Martins Peripheral single immune cells were profiled by CITE-seq (Stoeckius et al., 2017), simultaneously measuring: 1) the expression of 188 surface protein markers (plus 4 isotype control antibodies) (Table S2); 2) the mRNA transcriptome; and 3) B- and T-cell receptor (BCR/TCR) sequences of the variable/diversity/joining (V(D)J) region. In addition to total PBMCs, we sorted non-naïve B and T cells to increase their representation for T/B cell clonality and repertoire analysis. We integrated T cell clonality data here, while the corresponding B cell data will be reported separately. The data were analyzed at both the single cell and “pseudobulk” levels – the latter capturing the average gene expression profiles of cell clusters and states while mitigating single-cell mRNA measurement noise (Kotliarov et al., 2020; Soneson and Robinson, 2018). In addition to multimodal single cell profiling, we have obtained and, in our analyses 22 below, integrated blood cell count data and serum protein profiles (Abers et al., 2021) covering 63 circulating proteins including cytokines. Our analyses focused on the gene expression program within cell clusters (capturing cell types and states). We built mixed effect statistical models to link these parameters to disease status (COVID-19 patients versus HCs), disease severity (after adjusting for the effects of timing; see below), and timing [time since symptom onset (TSO)] after accounting for age and experimental batch (Figure 1B). The goal was to first uncover cell type specific gene expression correlates of disease status and severity at T0 (near admission) followed by a dissection of how immune cell states evolve over time. Given the dynamic nature of the immune response to SARS-CoV-2, accounting for and disentangling the effects of timing when analyzing disease severity correlates were particularly important. Empowering disease severity correlate analysis: development of a multi-parameter disease severity metric Given the large proportion of patients classified as “critical” [or a World Health Organization (WHO) ordinal score of ≥ 5] in our cohort, we first sought to use established clinical, inflammatory, and cell count parameters to develop a quantitative disease severity score; such a metric would allow us to delineate finer shades of disease severity (e.g., breaking ties among patients in the same severity category) and thus better power downstream analyses. Based on the literature (Del Valle et al., 2020; Lucas et al., 2020; Mandel et al., 2020; Schett et al., 2020), we selected circulating proteins associated with inflammation (TNF-a/b, IL-6, IL-18), IFN responses (IP-10, CXCL9), immune responses (IFN-g, IL-4, IL-13, IL-17), and clinical markers known to be corelated with COVID-19 severity, namely C-reactive protein (CRP), neutrophil/lymphocyte 23 ratio (NLR), lymphocyte and platelet counts, fibrinogen, D-dimer, lactate dehydrogenase (LDH), and pulse oximetry to fraction of inspired oxygen (SpO2/FiO2) ratio. Excluding three patients with very late initial sample time-points (TSO ≥ 30 days), hierarchical clustering of the T0 patient samples using these parameters revealed several distinct patient profiles (Figure 1C), including 1) a profile of the most critical patients characterized by high levels of inflammatory markers but low in TNF-b and lymphocyte counts; 2) the moderate/severe patients with higher IL-17 and lymphocyte counts with either high or low levels of several cytokines; and 3) most of the critical-alive patients with a more mixed profile, consistent with the notion that a finer quantitative delineation of disease severity could be useful to characterize these patients. Using ordinal partial least square regression and leave-one-out cross validation within the CITE- seq cohort, we performed feature selection to identify parameters most predictive of the four severity-outcome categories (Figure 1D; SpO2/FiO2 ratio was not used because it was used to define clinical disease severity). Those positively associated with severity include NLR, LDH, IP-10, D-Dimer, IL-6, and IL-13, while lymphocyte count and TNF-b were correlated with less severe disease. These top parameters were then used along with the SpO2/FiO2 ratio to derive a severity score for each patient. As expected, this disease severity metric (DSM), was significantly associated with the severity-outcome categories within our CITE-seq cohort (Figure 1E) and also with the WHO ordinal score (Figure S1B). This was validated in an independent set of 64 patients from Brescia for whom DSM was computed using the same formula derived from our CITE-seq cohort only (Figure 1E). The DSM of the critical-alive patients in both the CITE- seq and the validation cohorts spanned a wide range, from levels comparable to the moderate and severe categories, to those of the critical-deceased patients (Figure 1E). As a further indication that DSM can delineate clinically relevant disease severity, DSM was significantly associated 24 with ICU admission status in the validation cohort, even when assessed in the critical-alive patients only (Figure S1C). DSM thus provides a quantitative tool to order patients on a continuous severity scale; in our analyses below, we also used DSM to bin patients into discrete balanced groups of lower (DSM-low: DSM<=median DSM) versus higher disease severity (DSM-high: DSM>median DSM). 25 26 Supplemental Figure 1 (related to Figure 1): Timeline of Treatments Relative to Hospital Admission Date and DSM Validation in an Independent Set of Patients. (A) Timeline of treatments relative to hospital admission date. Dashed lines represent treatments with unknown start and/or end dates. Age and gender are listed next to each patient label. (B) Distribution of patient disease severity metric (DSM) grouped based on the WHO ordinal disease severity score of the CITE-seq cohort at the earliest time of PBMC sampling. The p value was computed using the Jonckheere trend test to assess the positive correlation between the WHO score and DSM. (C) Distribution of validation cohort disease severity metric (DSM) grouped by whether they were ever admitted to the ICU during the course of hospitalization for all patients (left) and only patients classified as Critical-Alive (right). Significance of mean difference between groups is determined by two-tailed Wilcoxon test. Figure generated by William Lau A multimodal time-resolved single immune cell atlas of COVID-19 patients and matching HCs Given the interpretability of immune cell subsets identified using surface protein markers, we used the 188 surface protein profile to cluster ~400,000 single cells and derived 30 coarse- resolution cell clusters (Table S3) spanning diverse innate and adaptive immune cell types and subsets [Figures 2A and 2B (small clusters not shown); Tables S2 and S3]. Additional higher resolution clustering within each coarse level subset identified finer subpopulations, e.g., distinct memory subsets within the CD4+ memory T cell cluster (Figure 2C). Manual gating was also used to obtain specific cell populations such as plasmablasts and circulating T follicular helper 27 cells (cTfh). The relative frequency of the major cell populations obtained by CITE-seq is largely concordant with that obtained independently from 27-color flow cytometry (Figure S2A). Furthermore, after accounting for TSO and age, we confirmed several previously reported associations between peripheral immune cell frequencies and disease status or severity near hospital admission (T0) (Figures S2B and S2C), including the negative correlation between severity (here captured by DSM) and the frequency of pDCs, CD8+ central memory T cells, and non-classical monocytes (Arunachalam et al., 2020; Grigoryan and Pulendran, 2020; Kuri- Cervantes et al., 2020; Laing et al., 2020; Mathew et al., 2020; Schulte-Schrepping et al., 2020). 28 29 Figure 2: A Multimodal Single Peripheral Immune Cell Atlas of COVID-19. (A) Average DSB-normalized protein expression (see Methods) in each coarse-level cell cluster. Only proteins with a DSB value > 3 in at least 1% of the cells are shown. (B) UMAP visualization of single cells based on protein expression profiles for innate and adaptive groupings of cells labelled by the name of the corresponding coarse-level cluster. (C) Average expression of selected surface protein markers in example finer-resolution CD4+ T cell clusters (CM = central memory, TM = transitional memory, EM.TE = terminal/effector memory). (D-I) Transcript-based UMAP visualization of classical monocytes defined by surface proteins. Cells are colored according to donor class (D), disease severity group (only COVID-19 samples are shown) (E), days since symptom onset (only COVID-19 samples are shown) (F), surface marker CD163 expression (G), surface HLA-DR expression (H), and donor (I). See also Figure S2, Tables S2 and S3. Figure generated by Andrew Martins. Clustering and annotation by Andrew Martins 30 Supplemental Figure 2 (related to Figure 2): Single Immune Cell Atlas of COVID-19 Reveals Cell Populations Associated with COVID-19 Disease Status and Severity. (A) Correlations of cell frequencies gated from CITE-seq and independently obtained 27-color flow cytometry data of the same samples. Shown are Pearson correlations and p-values. Note that the cTfhs shown here were gated using the same strategy as flow data, i.e., based on CXCR5hiCD45RA-CD4+ T cells. Given the less-than-ideal staining of CXCR5 in both CITE-seq 31 and flow cytometry, the cTfh being analyzed elsewhere in the paper were gated based on ICOS and PD1 in ICOShiPD1hi CD4+ T cell RNA-based cluster (see Methods). (B) Frequencies of immune cell clusters/subsets in HC, DSM-low (less severe disease; DSM at or below median of DSM) and DSM-high (more severe disease; DSM above median) groups at T0 (near hospitalization). Shown are the populations with the most significant changes in either COVID- 19 versus HCs or correlation with DSM. P-values shown are unadjusted p-values based on linear models controlling for age, experimental batch and TSO (TSO only controlled for in DSM model, see Methods). P-values are reported for assessing differences between COVID-19 versus HCs (green) and for correlation with DSM (red). Additional cell clusters meeting the p-value cutoff are shown in Figure S2B. (CD4_Mem_CM: central memory CD4+ T cell cluster as a fraction of total CD4+ T cells; CD8_Mem_CM.TM: central/transitional memory CD8+ T cell cluster as a fraction of total CD8+ T cells; the frequency of γδT, mucosal associated invariant T cells (MAIT), cDC, pDC, Non-Classical Monocytes and Platelets are expressed as fractions of total PBMCs.) 104 populations were tested in total. (C) Heatmap showing cell frequencies of major cell clusters/subsets in individual subjects (columns), grouped by HC and DSM. Plasmablast and B cell subsets are expressed as fractions of CD19+ B cells; CD4+ T cell subsets are expressed as fractions of CD4+ T cells; CD8+ T cell subsets are expressed as fractions of CD8+ T cells; the CD163hi classical monocyte cluster is expressed as a fraction of total classical monocytes; others are shown as fractions of total PBMCs. Rows in the heatmap were scaled to have mean = 0 and variance = 1. Figures A and C from Can Liu 32 Due to the relatively lower number of cells obtained per sample by CITE-seq compared to flow cytometry, the frequency of rarer cell subsets showed weaker correlation with these two assays (Figure S2A). However, the power of CITE-seq lies in its ability to assess gene expression within cell subsets defined by surface proteins. As an example, a mRNA-only UMAP visualization of classical monocytes (identified by surface proteins only) shows separation of cells based on disease status, DSM, and to a lesser extent, TSO (Figures 2D-F). Surface protein expression also differs across cells with distinct underlying RNA expression; for example, cells expressing CD163 or HLA-DR protein (Figures 2G-H) group together and are present in multiple donors (Figure 2I). Thus, these data (GEO Accession: GSE161918) constitute a rich multimodal time-resolved single cell dataset for exploring cell-type-specific transcriptomic and pathway signatures of disease severity and their dynamic trajectories. Type I interferon signatures are negatively associated with disease severity and decrease over time We systematically searched for transcriptional correlates of disease status (COVID-19 versus HCs), severity (DSM), and time (TSO) within individual cell clusters. Our goal was to first examine correlates at or close to hospital admission (T0) after accounting for TSO differences across patients (Figures 3A and 3B), followed by incorporation of longitudinal data within patients to reveal the effects of time (TSO) and the temporal evolution of cell type specific immune signatures in DSM-high and DSM-low patients separately (Figures S3A and S3B). We fit the expression of each gene within individual cell clusters (Figure 2A, plus additional manually gated populations) via a model that incorporated the primary effect (COVID-19 versus 33 HCs or DSM and TSO), age, and experimental batch, followed by gene set enrichment analysis (GSEA) (Subramanian et al., 2005) to uncover the biological processes and pathways involved. 34 35 Figure 3: Cell-type-specific Gene Expression Signatures of COVID-19 Disease Status and Severity. (A) Gene set enrichment analysis (GSEA) result of COVID-19 versus HCs at T0. Selected gene sets (rows – see Methods) are grouped into functional/pathway categories. Differential expression moderated t statistics were computed from a linear model accounting for age, and experimental batch (TSO not accounted for as TSO does not exist for healthy individuals). Dot color denotes normalized gene set enrichment score (NES) and size denotes -log10(adjusted P value). P values were adjusted using the Benjamini-Hochberg method. The sign of the NES corresponds to increase/decrease of change in COVID-19 compared to HCs. Cell populations (columns) are grouped by lineage/type. See Table S4A for detailed results of all gene sets which passed the adjusted P value cutoff of 0.2. (B) Similar to (A) but the enrichment analysis was performed based on association with DSM at T0, and the model controlled for TSO. The sign of the NES score denotes the sign of the association with DSM. See also Table S4B. (C) Heatmap of type I IFN response in classical monocytes. Heatmap showing the scaled average mRNA expression (within the indicated cell cluster/population for a given subject) of overlapping/shared leading-edge (LE) genes from the GSEA analysis of COVID-19 versus HCs (see (A)) and association with DSM (see (B)). Selected top LE genes are labeled by gene symbol and only the sample from the first timepoint for each patient (T0) is shown. For gene expression heatmaps, subjects are grouped by HC and DSM classes; also shown are the DSM value and severity- outcome classification of each patient (top of the heatmaps). Subjects with a small number of cells not included in the GSEA test are marked by a # (# cell number < 8). (D) Per-sample gene set signature scores of the GO_Response to type I IFN gene set vs. TSO (in days) in DSM-low 36 and DS-high groups. Classical monocyte is shown as an example. Individual samples are shown as dots and longitudinal samples from the same individual are connected by gray lines. Trajectories (using loess smoothing – see Methods) were fitted to the groups separately with the shaded areas representing standard error. The median score of age- and gender-matched HCs is shown as a green dotted line. Gene set scores were calculated using the union of LE genes from both the timing (TSO) and disease severity (DSM) association models using Gene Set Variation Analysis (See Methods). (E) Scatter plot comparing the effect size of association between TSO and the GO_Response to type I IFN gene set in the DSM-high (y axis) and DSM-low (x axis) patients. Each dot corresponds to a cell type/cluster. The effect size reflects the change of type I IFN signature enrichment with time (effect size < 0 corresponds to the enrichment decreasing with time). Cell types are colored by their statistical significance (adjusted p-value < 0.05) of whether the association with time is significantly different between the two DSM groups (based on a model with an DSM-TSO interaction term – see Methods). The area framed by blue borders corresponds to cell types with more precipitous drop of IFN signature over time in the DSM-low than -high groups. (F) Heatmap of apoptosis/cell death signature in pDCs. Similar to (C), but instead of the shared LE genes, all LE genes from COVID-19 versus HCs and association with DSM are shown. Shared LE genes and genes in the REACTOME_Oxidative stress induced senescence gene set are annotated on the right. (G) Heatmap showing the sample-level pairwise Pearson correlations among serum IFN-α2a level, pDC frequency, the apoptosis signature score in pDCs, as well as the IFN-I and protein translation signature scores in classical monocyte, CD56dimCD16hi NK, and CD8 memory T cell clusters (* p value < 0.05). Selected scatter plots are shown and the corresponding entry in the heatmap is indicated using bold borders. Samples were filtered to remove those with fewer than 8 cells or less than 40,000 unique molecular 37 identifiers (UMI) in the pseudobulk library. Note that not all subjects had IFN-α2a measurements. Each dot indicates a subject and is colored by the severity-outcome category. See also Figure S3 and Table S4. Annotations on A/B added by Can Liu. C,D, and F created by Can Liu. 38 39 IFN-I and early viral response gene signatures from live influenza challenge (Woods et al., 2013) or yellow fever vaccination (Querec et al., 2009) (see Methods for signature compilation) were elevated in COVID-19 relative to HCs across myeloid and lymphoid cell lineages (Figure 3A, Table S4A). These signatures were negatively associated with DSM (Figures 3B and 3C, Table S4B): the DSM-high patients tended to have weaker IFN-I signatures than DSM-low patients even after accounting for TSO, although even some of the most critical patients had elevated IFN signatures relative to HCs (Figure 3C). IFN-I signatures decreased over time and the extent of the drop was more precipitous in DSM-low patients in most cell types, partly because those patients started off at a higher level (Figures 3D and 3E). Translation and ribosome genes tended to be lower in most cell types with elevated IFN-I signatures (Figures 3A, 3G and S3C – Translation/Ribosome signatures, Table S4A). Elevated type I IFNs are known to suppress protein translation to limit virus production (Ivashkiv and Donlin, 2014). Consistently, as IFN-I signatures decreased over time, translation increased (Figure S3D). Interestingly, even though IFN-I signatures were negatively associated with DSM in most cell types, translation signatures were not positively associated with DSM in the same cell types (Figure 3B, Table S4B), indicating that additional regulation was involved to tune protein translation in COVID-19 patients. pDC apoptosis is associated with elevated disease severity and reduced pDC frequency pDCs are major producers of type I IFNs and orchestrators of T and B cell responses upon viral infection (Swiecki, 2015). It has been unclear why peripheral pDC frequencies were decreased in COVID-19 patients and negatively correlated with disease severity (Figures S2B and S2C) (Arunachalam et al., 2020; Kuri-Cervantes et al., 2020; Laing et al., 2020; Lucas et al., 2020); a 40 better understanding of this observation could help explain the attenuated IFN-I signatures in the most severe patients (Figure 3B, Table S4B). There is no evidence thus far that blood pDCs migrate into tissues after SARS-CoV-2 infection (Chua et al., 2020; Liao et al., 2020; Nienhold et al., 2020; Wauters et al., 2020); pDC infiltration was also not evident in bronchoscopy (Sánchez-Cerrillo et al., 2020). Alternatively, our analysis revealed that apoptotic gene signatures in pDCs, including BRCA2, CASP3, CASP8, BID, BAK1, and XBP1, were positively associated with disease status and severity (some of the most critical patients had few or no pDCs, so they were filtered out), as were oxidative stress- induced senescence genes such as FOS, HIST1H2AC, PHC3, MDM4, CBX6, and CDKN2D (Figures 3A, 3B and 3F; Tables S4A and S4B). We further validated that the pDC apoptosis signature was significantly increased in COVID-19 patients relative to HCs using publicly available single cell RNA-seq data from two independent COVID-19 cohorts from Germany (Figures S3E-G) (Schulte-Schrepping et al., 2020), although the positive association with disease severity was not as apparent, likely due to weak statistical power given low cell numbers. The apoptotic gene signature score was also negatively correlated with pDC frequency (Pearson r=- 0.72, p=0.045), which was positively associated with IFN-I signatures in several cell populations (Figure 3G), suggesting that pDC apoptosis might have contributed to lower peripheral pDC numbers, which then led to the depressed IFN-I signatures in cells. We also confirmed that circulating IFN-a2a levels in serum were positively correlated with IFN-I signatures in cells (Figure 3G; CD8+ memory T cell shown; similar for other cell types). These analyses further confirmed the negative association between IFN-I and translation signatures within cell types discussed above (Figures 3A and 3G). Together, these results suggest that pDC apoptosis is a potential culprit of lower pDC frequencies and IFN-I signatures in severe COVID-19. 41 Conditional independence network analysis suggests IL-15-linked fatty acid metabolism and attenuated inflammation in CD56dimCD16hi NK cells as primary correlates of disease severity Thus far, our analyses identified cell-type-specific signatures associated with DSM near hospital admission (T0) after accounting for TSO (Figure 3B and Table S4B), but many of them might not reflect primary correlates of disease severity and could have emerged because they correlated with ones that were. Thus, we next applied conditional independence network analysis to infer primary (or direct) correlates of DSM (see Methods). Nodes in the resultant network represent the gene expression state of a biological process in a cell population, except that DSM itself is one of the nodes; two nodes are connected if the correlation between them across patients is statistically independent of their correlations with other nodes in the network. We found four nodes directly connected to DSM (Figure 4A; the rest of the network is captured in Table S5). Oxidative stress-induced senescence in pDCs and fatty acid (FA) metabolism in CD56dimCD16hi NK cells were inferred as the primary positive correlates of DSM (Figure 4A and Tables S4B and S5). The senescence signal is reflective of the aforementioned pDC apoptosis signature (Figures 3B and 3G, Table S4B), e.g., the senescence and apoptosis signature scores in pDCs were significantly correlated across patients even though JUN is the only shared leading-edge gene in these two gene sets (Figure S4A). Thus, this result further supports that pDC apoptosis was a potential driver of disease severity. 42 43 Figure 4: Conditional Independence Network Analysis Points to IL-15-associated Fatty Acid Metabolism and Attenuated Inflammation in CD56dimCD16hi NK Cells as Primary Correlates of Disease Severity. (A) Disease severity network showing cell type-specific gene set signatures directly connected with DSM (see Methods). Direct connections between the nodes to each other are also shown in a lighter shade. Edge value and width indicate Spearman correlation and its statistical significance, respectively, between DSM and the gene set signature score. In the legend: % leading-edge (LE) genes denotes the proportion of genes from the gene set that are in the LE based on gene set enrichment analysis. The top three LE genes based on effect size (association with DSM; see Methods) and other selected genes representative of the biology are shown for each gene set. (B and C, G and I) Scatter plots showing the correlation of circulating IL-15 level versus REACTOME_Fatty acid metabolism signature score (B) and DSM (C); REACTOME_Fatty acid metabolism score versus HALLMARK_TNFα signaling via NF-κB score and GO_Cellular response to IL-1 score (G); REACTOME_Fatty acid metabolism score versus HALLMARK_mTORC1 signaling score and normalized IFNG mRNA expression (I). Pearson correlation (R) and associated p value were computed using pairwise complete observations (see Methods). Each dot indicates a subject and is colored by the severity-outcome category. (D) Heatmap of REACTOME_Fatty acid metabolism LE genes from the GSEA analysis of DSM association in CD56dimCD16hi NK cells at T0 (see Figure 3B). LE genes are grouped based on annotations obtained from Gene Ontology. Genes with an asterisk (*) are those with both biosynthetic and catabolic annotations. Genes in red are those in the fatty acid oxidation set. (E and F) Similar to (D), heatmaps of inflammation related gene sets in CD56dimCD16hi NK cells at T0: HALLMARK_TNFα signaling via NF-κB (E), GO_Cellular 44 response to IL1 and KEGG_Chemokine signaling pathway (see (A)) (F). Heatmaps showing the scaled average mRNA expression (within the indicated cell cluster/population for a given subject) of LE genes from the GSEA analysis of DSM association in CD56dimCD16hi NK cells (see Figure 3B). Selected top LE genes are labeled by gene symbol. For all gene expression heatmaps (D-F), subjects are grouped by HC and DSM classes; also shown are the DSM value and severity-outcome classification of each patient (top of the heatmaps in (D)). Subjects with a small number of cells not included in the GSEA test are marked by a # (cell number < 8). (H) Average IFNG mRNA expression of CD56dimCD16hi NK cells; (B-C, F, I) are aligned column wise. (J and K) Per-sample gene set signature scores of REACTOME_Fatty acid metabolism (J), HALLMARK_TNFα signaling via NFkB and GO_Cellular response to IL-1 (K) vs. TSO (in days) in DSM-high (red) and DSM-low (blue) groups of CD56dimCD16hi NK cells. Individual samples are shown as dots and longitudinal samples from the same individual are connected by gray lines. Trajectories (using loess smoothing) were fitted to the groups separately with the shaded areas representing standard error. The median score of age- and gender-matched HCs is shown as a green dotted line. Gene set scores were calculated using the union of LE genes from both TSO association and DSM association models reflecting the change over time and severity differences at T0 (See Methods). (L) Circulating IL-15 levels in DSM-low and DSM-high groups versus TSO. See also Figures S4, S5 and Table S5. Figure and analysis from Can Liu and William Lau 45 Supplemental Figure 4 (related to Figure 4): Supporting Data for Dissecting Primary Gene Expression Signature Correlates Inferred by Conditional Independence Network Analysis. (A) Scatter plot of REACTOME_Oxidative stress-induced senescence signature score and GO_Apoptotic signaling signature score in pDCs (Pearson correlation and p value were 46 computed using pairwise complete observations [samples filtered to those with at least 8 cells and 40,000 UMI detected in the pseudobulk library]). Each dot denotes a subject and is colored by the severity-outcome category. (B) Similar to (A), but between circulating IL-15 level and fatty acid metabolism signature score in CD56dimCD16hi NK cells after regressing out their associations with DSM. (C and D) Similar to Figure 4D. Heatmaps of REACTOME_Fatty acid metabolism in NK cells of two validation cohorts – Schulte-Schrepping et al, Cell 2020 cohort 1 (C) and cohort 2 (D). The LE genes from the GSEA analysis of association with DSM in each cohort are shown. Only the first timepoint (T0 sample) of each subject is shown. (E) GSEA results of Schulte-Schrepping et al cohorts for NK cell REACTOME_Fatty acid metabolism. (F) Similar to Figure 4G. Scatter plot of REACTOME_Fatty acid metabolism score and HALLMARK_TNFα signaling via NF-κB score in the validation cohorts – Schulte-Schrepping et al, 2020, Cell. (G and H) Similar to Figure 4E. Heatmaps of inflammation related gene sets in classical monocytes: HALLMARK_TNFα signaling via NF-κB (G) and HALLMARK_Inflammatory response (H). B-D and G,H from Can Liu and William Lau The role of NK cell metabolism in COVID-19 is not known. However, CD56dimCD16hi NK cells are a cytolytic subset activated rapidly within hours of infection or after stimulation by cytokines such as IL-15 (Carson et al., 1994; De Maria et al., 2011). Strikingly, circulating levels of IL-15 were indeed positively correlated with the FA signature score in these NK cells (Figure 4B), and this association was DSM dependent: IL-15 was itself correlated with DSM (Figure 4C) and IL-15 and the FA signature were no longer correlated once their associations with DSM were statistically removed (Figure S4B). Both FA biosynthetic and catabolic/oxidative genes 47 were elevated in the most severe (e.g., the critical-deceased subjects) versus the milder patients, who tended to express these genes at lower levels than HCs (Figure 4D). By using single cell RNA-seq data from two independent German COVID-19 cohorts (Schulte-Schrepping et al., 2020), we confirmed that this FA signature was indeed significantly associated with disease severity (Figures S4C-E). Despite the positive association between IL-15 (a proinflammatory cytokine) and the FA signature, the FA signature was negatively correlated with inflammation related processes, including the NF-κB (Figures 4E and 4G; r=-0.55, p=0.004) and IL-1 response signatures (see below; Figures 4F and 4G; see also Table S4A), as well as with the mTORC1 signature (HALLMARK_mTORC1_signaling) (Pearson r=-0.63; p=0.0008) and the IFNG transcript (Pearson r=-0.46; p=0.02, Figures 4H and 4I). This negative association between the FA and NF- κB related inflammation signature was also confirmed in the two independent German cohorts (Figure S4F) (Schulte-Schrepping et al., 2020). Thus, CD56dimCD16hi NK cells from the most critical patients were residing in a potentially dysfunctional state with attenuated inflammation (relative to less severe cases) and low IFNG transcription despite exposure to increased IL-15, as well as elevation in both FA biosynthesis and oxidation genes. Consistent with this notion, while CD56dimCD16hi NK cells are known to produce IFN-g early (within hours) after cytokine stimulation, IFN-g production decreases or stops by 16 hours (De Maria et al., 2011); prolonged exposure of NK cells to inflammatory cytokines like IL-15 can actually lead to hyporesponsiveness to subsequent stimulation, partly via a loss in FA oxidation (Felices et al., 2018). Thus, here the increased FA oxidation genes may reflect a compensatory mechanism to counteract this dysfunctional metabolic state, although it is unclear why FA biosynthetic genes were also increased. 48 Consistently, the two primary negative correlates of DSM, chemokine signaling and IL-1 response signatures in the same subset of NK cells (Figures 4A and 4F), shared genes with the NF-κB and inflammation signatures, such as RELA, NFKB1B, STAT1, and IL8, although they also contained additional genes such as XCL1 and XCL2, which are chemokines that can be produced and secreted early after NK cell activation following an infection (Dorner et al., 2002). This inflammatory attenuation in the most severe patients was also found in other cell types including classical monocytes (Figures S4G and S4H), as reported earlier by others (Arunachalam et al., 2020; Schulte-Schrepping et al., 2020). Thus, the increased FA together with the attenuated inflammatory and mTORC1 signatures in the most severe patients might reflect an exhaustion-like NK cell state due to persistent exposure to inflammatory signals including IL-15. Similar to the IFN-I signatures, the FA signature decreased over time, but here particularly in the DSM-high patients because those had the most elevated expression of the FA genes near T0 (Figures 4J and S3A and S3B). In contrast, the inflammation signatures increased, including the NF-κB (HALLMARK_TNFα_signaling_via_NF-κB) and IL-1 signatures (Figure 4K; Tables S4C and S4D), and the IFNG transcript (data not shown). Consistently, circulating IL-15 levels decreased particularly in the DSM-high patients (Figure 4L). Unlike the IFN-I signatures, however, both the FA and inflammation signatures deviated further from the HCs even though IL-15 dropped over time (Figures 4J-L). Thus, while exposure to IL-15 earlier in the disease course might have contributed to the dysregulated phenotypes in these NK cells, a reduction in IL-15 alone seemed insufficient to reset these cells; additional signals, such as those from other inflammatory cytokines, might have contributed as well (see below). 49 Exogenous corticosteroid use was associated with DSM (p=0.001, F test from a linear model accounting for age, sex, ICU and intubation statuses, and TSO) and thus might have played a role in driving the inflammatory attenuation signature in the CD56dimCD16hi NK cells. However, DSM was not associated with a glucocorticoid transcriptional response signature derived from human immune cells (Franco et al., 2019) (Figure S5A). Importantly, both this glucocorticoid transcriptional signature or another well-known marker of glucocorticoid or IL-10 exposure [TSC22D3/GILZ transcript (Berrebi et al., 2003; Cannarile et al., 2001)] was not negatively associated with the NF-κB, inflammatory, and mTORC1 signatures and also not positively correlated with the FA signature (Figures S5B and S5C). Although TSC22D3 mRNA level was trending higher in COVID-19 patients than HCs, it was similar between patients on or off corticosteroids (Figure S5D). Unexpectedly, the NF-κB and glucocorticoid signatures were positively correlated across patients (Figure S5B). These observations together suggest that the glucocorticoid response signature might be driven by endogenous glucocorticoids as a part of negative feedback regulation resulting from earlier inflammatory activation in COVID-19 patients (Jamieson et al., 2010; Newton et al., 2017). Thus, a “burnt-out”/exhausted, low inflammation, high FA gene expression state in CD56dimCD16hi NK cells could contribute to severe COVID-19 independent of exogenous corticosteroid use. 50 Supplemental Figure 5 (related to Figure 4): Exogeneous Corticosteroid Treatment is Not a Major Driver of Cell-type-specific Gene Expression Signatures Associated with Disease Severity. (A) GSEA results for glucocorticoid response signature (see Methods on how the signature was derived) in DSM association model. Positive enrichment corresponds to higher level in the DSM-high samples. (B and C) Scatter plot showing the correlations between the indicated signature scores (computed using GSVA) and the glucocorticoid response signature score (B) or the TSC22D3 mRNA expression level (C) in CD56dimCD16hi NK cells. Pearson correlations and p-values are reported for patients with corticosteroid use (orange; steroid use TRUE) and those without (cyan; steroid use FALSE) as well as for all samples (black). Each dot indicates a subject, shaped by steroid use status and colored by the severity-outcome category. (D) 51 TSC22D3 mRNA expression levels of CD56dimCD16hi NK cells and classical monocytes in HC, no steroid-use and steroid-use COVID-19 patient groups. P-values from testing for differences between the no steroid-use and steroid-use groups are shown. P-values were obtained from an ANOVA test accounting for DSM, TSO, age and experimental batch. B-D from Can Liu Extensive single cell and clonal expansion heterogeneity without signs of exhaustion in CD8+ T cells T cell signatures were interestingly not implicated as primary correlates of disease severity. Consistent with an acute viral infection, signs of T cell activation and proliferation such as cell cycle and related metabolic signatures appeared in both CD4 + and CD8+ T cells in COVID-19 patients compared to HCs (Figures 3A and 3B). Examining CD8+ T cells further at the single cell level including clonality information (utilizing our simultaneous TCR and gene expression measurements) revealed extensive heterogeneity (Figures S6A and S6B), including subsets of activated cells with high clonality that were more abundant in COVID-19 (i.e. cluster 14 in Figures S6A-D). Similar patterns of heterogeneity were observed in CD4+ T cells, but most CD4+ subclusters were not expanded or less expanded compared to the CD8+ T cells (data not shown). Surprisingly, although clonality in all CD8+ memory T cells was unchanged between HCs and COVID-19 patients at T0, it was significantly higher in the DSM-high than the DSM- low patients after accounting for TSO (Figure S6E; p=0.023 from linear model testing the effects of DSM while accounting for age, batch and TSO), suggesting increased diversification (or depletion of clonal cells) in less severe patients relative to those with more severe disease. 52 53 Supplemental Figure 6: Single Cell and Clonal Expansion Analysis in CD8+ T cells and Exhaustion Assessment in Clonal CD8+ T cells. (A) Heatmap showing the gene expression profile of CD8+ T cell clusters identified based on single-cell mRNA expression of the leading-edge genes of selected pathways presented in Figure 3 (see text and Methods); only COVID-19 T0 samples are shown. Average scaled mRNA expression per cluster of genes (columns) in each of the CD8+ T cell clusters (rows) is shown. Gene memberships in selected gene sets are indicated by the color bars at the bottom of the heatmap. Total number of cells in each cluster is indicated on the right side of the heatmap. Bar plot shows the fraction of cells within each cluster that are defined as expanded (>1 cell detected per CDR3 α-β pair). Two small clusters (<32 cells each) are not shown. (B) Average expression of selected surface proteins in the clusters from (A). (C) Results of linear model accounting for age, and experimental batch, comparing the frequency of CD8+ T cell clusters from (A) between COVID-19 and HC samples. Shown is the difference in mean frequency between COVID-19 and HC (x-axis) versus the -log10 (p value); horizontal line denotes an unadjusted p-value of 0.05. 18 CD8 clusters were tested in linear models. (D) Fraction of overlap between RNA based clustering (from A) and surface protein based CD8+ naive and memory T cell cluster annotations (based on high resolution clustering). (E) CD8+ T cell clonality (median rarefied Simpson index, see Methods) in HC, DSM-low and -high groups. P-values shown are from linear model adjusted for experimental batch, age, and TSO (green for assessing differences between COVID-19 versus HCs and red for correlation with DSM; TSO only adjusted for in the model assessing DSM association). (F) Coefficients from linear models of mean surface protein expression of canonical exhaustion markers fitted to COVID-19 patients and HCs. Positive coefficients (red 54 bars) correspond to higher expression in COVID-19 versus HCs (above) or higher expression in DSM-high versus -low (below) and vice versa for negative coefficients (green bars). P values are indicated with red indicating significance at the 0.05 level (unadjusted). Among them, only CTLA4 was mildly significant but the association was in the opposite direction expected for exhaustion: its expression was lower in patients, particularly in the more severe. Similarly, insignificant changes or inconsistent directions of change were detected for the mRNA of these protein markers (data not shown). (G) Association of proportion of CD39+PD1+ cells with COVID-19 versus HCs and DSM in clonal CD8+ memory T cells using different cutoffs for CD39 and PD1 surface protein expression DSB counts (0.5, 1). P values are from Wilcoxon tests. Similarly, we also tested whether the frequency of cells co-expressing multiple exhaustion markers in (F) was different. Independent of the surface marker combination or protein expression cutoff used, we saw no signs of exhaustion or association with DSM (data not shown). (H) GSEA results for assessing enrichment for known exhaustion signatures in DE genes for expanded CD8+ T cells in COVID-19 versus HCs and DSM-high versus DSM-low comparisons. Positive enrichment corresponds to higher level in the first group. Both exhaustion gene sets were not enriched in the DSM-high versus DSM-low comparison, indicating that exhaustion was not associated with disease severity. The enrichment in the COVID vs. HC comparisons could reflect cellular activation (“up” signature – see (I) below) and depression of translation genes in COVID-19 patients negatively associated with IFN-I signatures (“down” signature) – see also Figure 3A. (I) Gene set enrichment of Wherry et al. up and down genes in KEGG, GO BP, REACTOME, and Li BTM’s. Only top hits (ranked by adjusted p values from Fisher’s exact test) are shown. The “exhaustion up” genes appeared to reflect CD8+ T cell activation as they are enriched for NF-κB, JAK-STAT signaling, and IL-2 signaling genes. (J) 55 GSEA result of Schulte-Schrepping et al. cohorts for exhaustion signatures of COVID-19 versus HCs and severe versus mild comparisons at T0. Differential expression moderated t statistics were computed from a linear model accounting for age, experimental batch and TSO (TSO only accounted for in the severe versus mild comparison). Dot color denotes normalized gene set enrichment score (NES) and size denotes -log10(adjusted P value). Dot shape indicates the significance of adjusted P values (adjusted P value < 0.05). Both cohorts 1 and 2 and both CD8+ T cell clusters in the original data and memory CD8+ T cell populations using transferred cell type labels from our CITE-seq data are shown. (K) Similar to (H). GSEA results of Schulte- Schrepping et al. cohorts for Wherry et al. exhaustion signatures of COVID-19 versus HCs and severe versus mild comparisons at T0. Positive enrichment corresponds to higher level in the first group. Only memory CD8 population based on transferred labels from our CITE-seq data is shown as an example. A,B,D from Andrew Martins. Clustering in A performed by Andrew Martins. Despite signs of activation and proliferation, we did not detect a cluster of “exhausted” CD8+ T cells associated with COVID-19 as suggested earlier (Laing et al., 2020; Zhang et al., 2020b), based on surface markers such as PD-1 (CD279) (Figures S6B and S6C). Furthermore, we did not detect association of exhaustion with disease status or severity by using surface markers or surface marker combinations (e.g., CD39+PD1+) (Gupta et al., 2015) in clonally expanded CD8+ memory T cells in our cohort (Figures S6F and S6G). Assessment using transcriptional signatures of exhaustion (Wherry et al., 2007) in our and in two German cohorts (Schulte-Schrepping et al., 2020) revealed a more complex picture given that these signatures overlap with those of cellular activation and translation/ribosome (Figures S6H-K), which were 56 detected earlier as disease status or severity correlates of COVID-19 (Figures 3A and 3B). Thus, transcriptional signatures alone are not sufficiently specific to indicate whether the cells were more exhausted or merely more activated. Thus, while we uncovered an exhaustion related metabolic gene signature in CD56dimCD16hi NK cells as a primary disease severity correlate (Figure 4A), CD8+ T cells did not show signs of exhaustion beyond the transcriptional signatures expected of cellular activation or regulation by type I IFNs (Ivashkiv and Donlin, 2014), even in patients with critical disease. Analysis of timing effects suggests a late immune response juncture As shown earlier, while inflammatory signatures (e.g., HALLMARK_TNFα_signaling_via_NF- κB) were less elevated in the DSM-high patients than DSM-low patients early, they increased over time and stayed elevated compared to HCs as late as day 20 post symptom onset in cell subsets such as CD56dimCD16hi NK cells and classical monocytes (Figures 3B, 4K, S3A and S3B; Tables S4C and S4D). Gene signatures of cytokine-mediated signaling pathways (e.g., IL- 4/13 and IL-17 signaling) in classical monocytes also showed significant increases over time, particularly in the DSM-high patients (Figure S3B). Consistent with an earlier report (Lucas et al., 2020), the frequency of classical monocytes and its CD163hi subset increased in both patient groups over time and peaked around day 20 (Figure 5A). The same was true for absolute monocyte and neutrophil counts in blood (Figure 5B). Thus, we next further examine the dynamics around day 20, particularly to assess differences between the DSM-high versus -low patients. 57 Figure 5: Analyses of Timing Effects Suggest a Late Immune Response Juncture. (A) Time course of monocyte subset frequencies in DSM-low and DSM-high groups. Classical monocyte is expressed as fraction of total PBMCs; the CD163hi classical monocyte subset is expressed as a fraction of classical monocytes. The median cell frequency of age- and gender- matched HCs is shown as a green dotted line. Individual samples are shown as dots and longitudinal samples from the same individual are connected by gray lines. P-values shown are from mixed effect linear models indicating the statistical significance of the timing effect (i.e., TSO) accounting for age and experimental batch in DSM-low and DSM-high groups, 58 respectively. Trajectories (using loess smoothing) were fitted to DSM-low and DSM-high groups separately. TSO = days 17-23 period is highlighted with purple. (B) Similar to (A) but showing the absolute blood neutrophil and monocyte counts. The two green dotted lines mark the approximate reference range of cell counts in healthy adults. The shaded areas around trajectories denote standard error. (C) Effect size (normalized enrichment score from GSEA) comparison of the period before day 17 (TSO < day 17, green) and during the TSO = days 17-23 period (purple) for inflammatory related gene sets. Effect sizes correspond to differences between DSM-high vs. -low groups, e.g., effect size < 0 corresponds to the gene signature being less enriched in the DSM-high group than DSM-low group. P values shown are FDR adjusted from the test reflecting the temporal changes in the difference between DSM-high and -low groups from before the day 17 period to the days 17-23 time window (see Methods). (D) Similar to Figure 3A, but here showing GSEA results for assessing differences between the DSM-high versus DSM-low groups using only samples from days 17-23 since symptom onset. See Table S4E for detailed results of these selected gene sets and all sets that passed the adjusted P value cutoff of 0.2. (E) Time course of gene set signature scores of inflammatory related gene sets in DSM-low and -high patient groups in CD56dimCD16hi NK cells and classical monocytes. The gene set signature scores were calculated using the LE genes identified from the enrichment analysis shown in (C) above to highlight differences between the DSM-high versus -low groups during the days 17-23 period. The median score of age- and gender-matched HCs is shown as a green dotted line. (F) Time course of serum protein levels from DSM-low and -high patients respectively. Similar to (E). Top 4 serum proteins significantly different between DSM-high versus DSM-low groups during the days 17-23 period are shown. See Table S6 for the full list of differentially expressed proteins. See also Figure S7, Tables S4 and S6. 59 A-C, E,F from Can Liu and William Lau. We divided samples into three groups by TSO: before day 17 (27 samples), the week around day 20 (days 17-23 – 16 samples), and after day 23 (5 samples), then fitted mixed effect models to evaluate changes between the first two periods in cell-type specific gene expression differences between DSM-high and -low patients (Figure S7A; Table S4F; see Methods). We detected several statistically significant “flips”, for example, inflammatory signatures in CD56dimCD16hi NK cells and classical monocytes were lower in DSM-high patients before day 17 but rose to be higher than the DSM-low patients by days 17-23 (Figure 5C). We then assessed differences between the DSM-high and -low groups during days 17-23 (Figure 5D; Table S4E). In addition to CD56dimCD16hi NK cells and classical monocytes, the NF-κB gene signature was elevated in non-classical monocytes, other NK cell subsets, MAIT cells, and memory and naïve B cells in DSM-high than -low patients (Figure 5D). The