Genome sequence of the insect pathogenic fungus Cordyceps militaris, a valued traditional Chinese medicine

Zheng et al. Genome Biology 2011, 12:R116
http://genomebiology.com/2011/12/11/R116 (23 November 2011)

RESEARCH Open Access

Peng Zheng1†, Yongliang Xia1†, Guohua Xiao1†, Chenghui Xiong1, Xiao Hu1, Siwei Zhang1, Huajun Zheng2, Yin Huang2, Yan Zhou2, Shengyue Wang2, Guo-Ping Zhao1,2, Xingzhong Liu3, Raymond J St Leger4 and Chengshu Wang1*

Abstract

Background: Species in the ascomycete fungal genus Cordyceps have been proposed to be the teleomorphs of Metarhizium species. The latter have been widely used as insect biocontrol agents. Cordyceps species are highly prized for use in traditional Chinese medicines, but the genes responsible for biosynthesis of bioactive components, insect pathogenicity and the control of sexuality and fruiting have not been determined.

Results: Here, we report the genome sequence of the type species Cordyceps militaris. Phylogenomic analysis suggests that different species in the Cordyceps/Metarhizium genera have evolved into insect pathogens independently of each other, and that their similar large secretomes and gene family expansions are due to convergent evolution. However, relative to other fungi, including Metarhizium spp., many protein families are reduced in C. militaris, which suggests a more restricted ecology. Consistent with its long track record of safe usage as a medicine, the Cordyceps genome does not contain genes for known human mycotoxins. We establish that C. militaris is sexually heterothallic but, very unusually, fruiting can occur without an opposite mating-type partner.
Transcriptional profiling indicates that fruiting involves induction of the Zn2Cys6-type transcription factors and MAPK pathway; unlike other fungi, however, the PKA pathway is not activated.

Conclusions: The data offer a better understanding of Cordyceps biology and will facilitate the exploitation of medicinal compounds produced by the fungus.

* Correspondence: cswang@sibs.ac.cn
† Contributed equally
1Key Laboratory of Insect Developmental and Evolutionary Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 300 Fenglin Road, Shanghai 200032, China
Full list of author information is available at the end of the article

© 2011 Zheng et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

The Ascomycete genus Cordyceps includes over 500 species that are pathogens of arthropods. Cordyceps militaris (CCM) is the type species and occurs throughout much of the Northern Hemisphere as a pathogen of lepidopteran insect pupae [1]. C. militaris is readily characterized by the sexual fruiting bodies forming on mycosed pupae, the structures giving the fungus its common name of 'pupa grass' in China. Anamorphic Cordyceps species, such as Beauveria spp., Metarhizium spp. and Paecilomyces spp., have been developed as insect biocontrol agents [2,3]. Although C. militaris and Cordyceps sinensis (syn. Ophiocordyceps sinensis) are best known as traditional Chinese medicines, they are also increasingly being studied and used in the West [4,5]. An array of pharmacologically active components has been identified, including cordycepin, cordycepic acids, polysaccharides and macrolides [6]. Cordycepin (3′-deoxyadenosine) has so far only been reported in C. militaris and is a broad spectrum antimicrobial [5] and polyadenylation inhibitor that is currently undergoing clinical trials against cancers [7]. The biosynthetic pathway of cordycepin production has not been determined.

In spite of their market values - for example, > $10,000 per kilo for the fruiting bodies of the un-cultivatable C. sinensis [8] - very little is known about sex and developmental processes in Cordyceps species, and remedying this deficiency should help in production/cultivation of these enigmatic fungi. C. militaris is notable as it readily performs sexual reproduction on artificial media and is thus a good target for studying the molecular underpinnings of sex and development in Cordyceps spp. (Figure 1). C. militaris also has the potential to be a versatile new model for studying the evolution of sex and reproductive structures. While current fungal models have provided numerous insights into the evolution of sex [9], there is still much to be understood about the mechanisms, evolution and ecological impact of sexuality in fungi. This is, in part, because fungal mating and sexual cycles are often complicated; for example, aspergilli have both self-fertile (homothallism) and self-sterile (heterothallism) mating systems [10].

There is also much to be learnt about the nature and evolution of interactions of Cordyceps spp. with their hosts and with the wider environment. As entomopathogenicity appears to have evolved independently in Cordyceps and two Metarhizium species [11], comparative genomics will provide independent assessments of what is required to be entomopathogenic, identify the degree to which evolution between these fungi has been convergent, and identify the genomic basis of their differing physiologies and host-specificity. Last but not least, genomic sequencing of C. militaris will enable a systematic exploration of the biology and pharmaceuticals underlying the widespread medical impact of Cordyceps spp., and identify potential safety hazards, including genes for known human mycotoxins.

Results

Genome sequencing and general features

The C. militaris genome was shotgun sequenced to 147× coverage and assembled into 33 scaffolds with an N50 of 4.6 Mb and a total genome size of 32.2 Mb. The genome is smaller than either the broad host range Metarhizium anisopliae (MAA) or the locust-specific pathogen Metarhizium acridum (MAC) that we sequenced previously (Table 1). The characteristic telomeric repeats (TTAGGG/CCCTAA)n were found at either 5′ or 3′ terminal of 13 scaffolds, including the terminal anchoring of two scaffolds, that is, the complete chromosomes. From mapping > 5,000 expressed sequence tags [12], the C. militaris genome was estimated to be > 99% complete. The genome was predicted to encode 9,684 protein genes, which is slightly fewer than M. anisopliae and M. acridum (Table 1). Consequently, many protein functional categories are smaller in Cordyceps than in Metarhizium spp. (Figure 2a). However, like M. anisopliae (17.6%) and M. acridum (15.1%), C. militaris has a higher proportion of its genes encoding putatively secreted proteins (15.9%) than other sequenced ascomycetes (5 to 10%) [10,13,14].

Figure 1 Life cycle and phenotypic polymorphism of C. militaris. The round conidia (from a solid culture) or the bar shaped blastospores (from a liquid culture) were inoculated onto caterpillar pupa or rice medium and incubated for up to 60 days.
The resulting fertile fruiting bodies have protruded perithecia that contain asci. The ejected linear ascospores fragment and germinate to produce secondary pear-shaped conidia under nutrient poor conditions, that is, micro-cycle conidiation. Both the ascospores and secondary conidia can infect caterpillars. Scale bar: 5 μm.

Table 1 Comparison of genome features among three insect pathogens
Features | C. militaris | M. anisopliae | M. acridum
Size (Mb) | 32.2 | 39.0 | 38.1
Coverage (fold) | 147× | 100× | 107×
Percentage G+C content | 51.4 | 51.5 | 50.0
Percentage repeat rate | 3.04 | 0.98 | 1.52
Protein-coding genes | 9,684 | 10,582 | 9,849
Gene density (genes per Mb) | 257 | 271 | 259
Exons per gene | 3.0 | 2.8 | 2.7
Percentage secreted proteins | 16.2 | 17.6 | 15.1
tRNA | 136 | 141 | 122
Pseudogenes | 102 | 363 | 440
NCBI accession | AEVU00000000 | ADNJ00000000 | ADNI00000000
Mb, mega base pairs.

An InterproScan analysis identified 2,736 conserved protein families in C. militaris (containing 6,725 proteins), fewer than those in M. anisopliae (7,556 proteins in 2,796 families) or M. acridum (6,948 proteins in 2,746 families) [11]. In particular, the number of transposases is much lower in C. militaris (4) than in Metarhizium spp. (148 in M. anisopliae and 20 in M. acridum) or other sequenced ascomycetes (15 to 426) (Table S1 in Additional file 1). The C. militaris genome lacks retrotransposase (Table S2 in Additional file 1), and has more than three-fold fewer pseudogenes than Metarhizium spp. (Table S3 in Additional file 1). About 16% of the predicted C. militaris genes (1,547) are putatively involved in pathogen-host interactions; this proportion is slightly lower than for Metarhizium spp. (17.3% in MAA and 16.5% in MAC) but is higher than in four plant pathogens (10.8 to 15.5%; P = 0.0476; false discovery rate (FDR) = 0.0152) (Table S4 in Additional file 1).

More than 50% of M. anisopliae and M. acridum proteins have > 90% identity [11]. Although the rarely observed sexual stages of Metarhizium spp. have been identified as a Cordyceps species [1], the analysis revealed that < 2% of C. militaris genes were highly conserved in comparison with those from Metarhizium spp., that is, had Blast score ratio (BSR) values close to 1 (Figure 3a). A similar pattern was observed when comparing C. militaris, M. anisopliae and the plant pathogen Fusarium graminearum (Figure 3b). Comparative genomic analysis of the three insect pathogens found that the percentage of species-specific genes is much higher in C. militaris (13.7%) compared to M. anisopliae (4.8%) and M. acridum (3.5%) (Figure 2b). Based on the identities between orthologous proteins, C. militaris displays an average of approximately 63% amino acid identity with either M. anisopliae or M. acridum, slightly higher than with the plant pathogens F. graminearum (61.6%) and Magnaporthe oryzae (56.0%) (Table 2). Thus, the three insect pathogenic fungi are more highly diverged than F. graminearum, Fusarium oxysporum and Fusarium verticillioides, which share an average of 85% nucleotide sequence identity [14], and Aspergillus nidulans, Aspergillus fumigatus and Aspergillus oryzae, which share an average of 68% amino acid sequence identity [13], and Trichoderma reesei, Trichoderma virens and Trichoderma atroviride, which share an average of > 70% amino acid sequence identity [15].

The regions containing at least three contiguous open reading frames that are not present in the reference genome are designated as genomic islands (GIs) [16]. Whole genome reciprocal analysis of three insect pathogens demonstrated that, in comparison to Metarhizium spp., C. militaris has 52 GIs (2% coverage of its genome, harboring 21% of its species-specific genes), which is many more than M. anisopliae (8 GIs, 0.3%) or M. acridum (5 GIs, 0.2%) when referenced to C. militaris. As in aspergilli [17], many proteins encoded by C. militaris species-specific genes do not have conserved domains and the genes are clustered together to form GIs (Table 2). A phylogenomic analysis established that the Cordyceps lineage is more closely related to the wheat pathogen F. graminearum (divergence time of 200 to 260 million years ago (MYA)) than it is to Metarhizium spp. (26 to 34 MYA) (Figure 3c). Thus, the lineage leading to C. militaris appears to have diverged from plant pathogens around the Triassic-Jurassic boundary (200 MYA), while M. anisopliae and M. acridum diverged after the Cretaceous Extinction Event (65 MYA) [18]. Analysis of paralogous genes found only one pair of C. militaris genes with > 90% nucleotide sequence similarities (Figure 3d), which is similar to Neurospora crassa (one pair) [13] and F. graminearum (two pairs) [19]. Analysis of 24 paired C. militaris genes showing > 70% nucleotide identities found a strong overall C:G to T:A mutation bias (Figure S1 in Additional file 2), consistent with repeat-induced point mutations, the DNA methylation-linked processes that cause mutations of repeated fungal sequences [15,20].

Figure 2 Comparative genomics analysis of three insect pathogens. (a) Functional classification and comparison of C. militaris (CCM), M. anisopliae (MAA) and M. acridum (MAC) proteins, showing that C. militaris has fewer genes in each category. Each circle represents the relative fraction of genes represented in each of the categories for each genome. (b) Reciprocal blast analysis of the predicted proteins among three insect pathogens. The cut-off E value is at ≤ 1e-5.

Figure 3 Comparative genomics and evolutionary analysis of C. militaris. Scatter plots of Blast score ratio (BSR) analysis of (a) C. militaris (CCM), M. anisopliae (MAA) and M. acridum (MAC) genomes, and (b) CCM, MAA and F. graminearum (FG) genomes. The numbers in red at the lower left corners are the percentages of C. militaris species-specific sequences and the numbers at the upper left or lower right are the percentages of lineage-specific genes between pairs of genomes. (c) A maximum likelihood phylogenomic tree constructed using the Dayhoff amino acid substitution model showing the evolutionary relationship of C. militaris with different fungal species. Three insect pathogens are highlighted by the green shading. (d) Distribution of paralogous gene numbers with different levels of nucleotide similarity in C. militaris and other fungi. MY, million years.

Table 2 Genome-wide analysis of C. militaris gene sets
Characteristics | C. militaris | CMM core(a) | CMA restricted(b) | CMC restricted(c) | CCM specific
Number of genes | 9,684 | 7,981 | 217 | 158 | 1,328
Mean gene length (bp) | 1,742 | 1,885 | 1,445 | 1,440 | 967
Mean number of introns per gene | 1.99 | 2.05 | 1.77 | 1.69 | 1.71
Percentage genes without introns | 21.3 | 20.1 | 22.6 | 28.5 | 27.5
Percentage GC content (excluding introns) | 58.6 | 58.6 | 70.7 | 58.7 | 58.3
Number of InterproScan protein families | 2,644 | 2,552 | 69 | 52 | 112
Number of secreted proteins | 1,572 | 1,250 | 45 | 13 | 264
Number of PHI genes(d) | 1,547 | 1,539 | 4 | 2 | 2
Number of TSA proteases | 68 | 65 | 3 | 0 | 0
Number of MFS genes | 245 | 242 | 0 | 2 | 1
Number of cytochrome P450s | 57 | 56 | 0 | 1 | 0
Number of Pth11-like GPCRs | 18 | 18 | 0 | 0 | 0
Number of protein kinases | 167 | 167 | 0 | 0 | 0
Number of transcription factors | 123 | 120 | 2 | 1 | 0
Number of glycoside hydrolases | 105 | 103 | 2 | 0 | 0
Number of SM backbone genes | 28 | 28 | 0 | 0 | 0
Number of horizontally transferred genes | 49 | 30 | 5 | 1 | 12
Number of orthologs in M. anisopliae | 6,863 | 6,705 | 158 | NA | NA
Number of orthologs in M. acridum | 6,762 | 6,644 | NA | 118 | NA
Number of orthologs in F. graminearum | 6,740 | 6,376 | 106 | 89 | 169
Number of orthologs in M. oryzae | 6,219 | 5,937 | 90 | 80 | 112
Percentage identity to M. anisopliae orthologs | 63.4 | 63.7 | 51.3 | NA | NA
Percentage identity to M. acridum orthologs | 63.4 | 63.6 | NA | 51.2 | NA
Percentage identity to F. graminearum orthologs | 61.6 | 62.3 | 51.9 | 52.8 | 46.7
Percentage identity to M. oryzae orthologs | 56.0 | 56.4 | 48.3 | 48.0 | 45.3
(a) CMM core: C. militaris (CCM), M. anisopliae and M. acridum genes grouped with a cutoff E value of 1e-5 during reciprocal Blast analysis. (b) CMA restricted: C. militaris and M. anisopliae restricted genes grouped with a cutoff E value of 1e-5. (c) CMC restricted: C. militaris and M. acridum restricted genes grouped at a cutoff E value of 1e-5. (d) PHI genes, pathogen-host interaction genes predicted by blast analysis against the PHI database [72]. Identity was estimated at the amino acid level. GPCR, G-protein coupled receptor; MFS, major facilitator superfamily; NA, not available; SM, secondary metabolite; TSA, trypsin, subtilisin and aspartyl protease.

Protein family analysis

We identified gene family expansions for proteases, chitinases, lipases and protein kinases in C. militaris when compared with phytopathogenic fungi, whereas gene family contractions occurred for glycoside hydrolases (GHs; P = 0.0144; FDR = 0.02), cutinases (P = 0.0065; FDR = 0.0226) and pectin lyases (P = 0.0245; FDR = 0.0284) (Table S1 in Additional file 1). The largest family expansions were for proteases. The C. militaris genome contains 61 families of proteases but most of them were included in families of serine proteases (180/381) and metallopeptidases (108/381) (Table S5 in Additional file 1). Gene expansions within the subtilisin (P = 0.0109; FDR = 0.0189) and trypsin (P = 0.0077; FDR = 0.0178) families are consistent with their being virulence factors in insect pathogens [11]. However, different families of proteases are expanded in Metarhizium spp. and C. militaris, consistent with each lineage 'reinventing the wheel' during the evolution of entomopathogenicity. Thus, relative to Metarhizium spp., the S01 trypsin and S08 subtilisin subfamilies are smaller and the S53 subfamily is larger (Table S6 in Additional file 1). The C. militaris genome has 12 trypsin genes compared to 4 or fewer in plant pathogens. It lacks four subfamilies of trypsins present in M. anisopliae. Interestingly, the bacterial-like chymotrypsin identified in M. anisopliae [21] is absent in M. acridum [11] but present as two copies in C. militaris. The A01 aspartyl proteases are virulence factors of both mammalian and plant pathogens because of their ability to cleave an array of host proteins [22]. Compared to phytopathogenic fungi (average 17), their number is significantly (P = 0.0059; FDR = 0.0057) expanded in the three insect pathogens (average 24) (Table S4 in Additional file 1).

Compared to many plant pathogens, Metarhizium spp. and C. militaris have fewer cutinases for degrading plant cell walls (Table S1 in Additional file 1). They also have fewer (average 137, P < 0.05) GHs than plant pathogens (average 199), including the lack of 20 GH families used by most plant pathogens and saprobes to target plant cell walls - for example, GH6, GH7 and GH61 cellulases, GH10 and GH11 xylanases, GH28 pectinases and GH78 rhamnosidases (Table S7 in Additional file 1). There are also significant differences in the spectrum of enzymes produced by the entomopathogens. For example, compared to M. anisopliae, C. militaris has few xyloglucosyl transferases (GH16) for xyloglucan catabolism and lacks α-glucuronidases (GH115) active on xylan oligomers or polymeric xylan [23]. Consistent with this, C. militaris grows very poorly on xylose when compared with M. anisopliae (Figure S2 in Additional file 2). A phosphoketolase MPK1 involved in pentose metabolism is required for full virulence of M. anisopliae [24], but the homolog is absent in C. militaris. However, GH18 chitinases similar to those used by Metarhizium to degrade insect cuticles [11] are well represented in the C. militaris genome (20 in CCM versus 30 in MAA and 19 in MAC) relative to plant pathogens (average 11) (Table S7 in Additional file 1).

Cytochrome P450s (CYPs) play essential roles in fungal physiologies, including detoxification, degradation of xenobiotics and the biosynthesis of secondary metabolites [25]. C. militaris has only about half as many CYPs as Metarhizium spp., and most other fungi (Table S8 in Additional file 1). Seventy CYP subfamilies present in M. anisopliae and/or M. acridum are absent in C. militaris. Of particular interest, C. militaris lacks CYP55, CYP58 and CYP65. CYP55 is a nitric oxide reductase required for denitrification [25]. Thus, unlike most filamentous fungi, C. militaris may not respond to hypoxia through the bacterial ammonia fermentation mechanism. The absence of CYP58 (trichodiene oxygenase) and CYP65 (trichothecene C-15 hydroxylase) suggests that C. militaris will not produce the mycotoxin trichothecene [26]. M. anisopliae can efficiently metabolize insect epicuticle alkanes [27]. The CYP52 subfamily for alkane hydroxylation [25] is well represented in Cordyceps.

The major facilitator superfamily (MFS) and ATP-binding cassette (ABC) transporters are the two biggest families of fungal transporters. Members of the former typically function as nutrient symporters and drug antiporters, whereas the latter are more often implicated in defense against toxic metabolites [28]. C. militaris has approximately half (123) the number of these transporters as Metarhizium (269 in MAA and 236 in MAC) (Table S9 in Additional file 1). The MFS transporters that are underrepresented in Cordyceps include the carbohydrate symporters (37 in CCM versus 48 in MAA, 51 in MAC and an average of 58 in plant pathogens), vitamin B2 (riboflavin) transporters (2 in CCM versus 17 each in Metarhizium species and an average of 4 in plant pathogens) and multidrug antiporters (23 in CCM versus 110 in MAA, 77 in MAC and an average of 10 in plant pathogens). Consistent with their having many multidrug transporters, Metarhizium spp. are resistant to diverse antibiotics and fungicides [29]. Cordyceps has more ABC-type drug and metal resistant proteins than Metarhizium and plant pathogens (63 in CCM, 56 in MAA, 51 in MAC and an average of 54 in plant pathogens). The amino acid and dipeptide transporters are similarly represented in the three insect pathogens and other fungi (46 in CCM versus 53 in MAA, 49 in MAC and an average of 45 in plant pathogens).

Fungal G-protein coupled receptors (GPCRs) are required for pheromone/nutrient sensing and host recognition [11]. Thus, the Pth11-like GPCR of Magnaporthe mediates cell differentiation in response to plant inductive cues [30]. C. militaris has fewer GPCRs than Metarhizium spp. and is particularly impoverished in Pth11-like GPCRs (Table S10 in Additional file 1). C. militaris has a similar number (167) of protein kinases as M. anisopliae (161) but fewer than M. acridum (192) (Table S11 in Additional file 1). Like other fungi, fungal specific transcription factors (TFs) and zinc finger TFs represent the two largest classes of TFs in C. militaris and their numbers are similar to those of other fungi (Table S1 in Additional file 1).

Mating-type and sexuality analysis

The fruiting bodies of Cordyceps spp. are the most commonly sold traditional Chinese medicine products [5]. However, the sexual cycle and fruiting of C. militaris are poorly understood. We only identified a MAT1-1 mating-type locus, including MAT1-1-1 and MAT1-1-2 genes, in the sequenced Cm01 strain, suggesting that C. militaris is heterothallic (Figure 4a). A single mating-type locus was also found in M. anisopliae (MAT1-1) and M. acridum (MAT1-2). Like aspergilli [10], the idiomorphic regions of the three insect pathogens are highly divergent (Figure 4a). The MAT1-1 locus of M. anisopliae contains a MAT1-1-3 gene but lacks the MAT1-1-2 gene present in C. militaris. Except for the mating-type locus region, most A. nidulans and N. crassa genes involved in mating, fruiting, karyogamy and meiosis are also present in insect pathogens (Table S12 in Additional file 1).

Strain Cm01 forms fruiting bodies on caterpillar pupae that lack perithecia and ascospores (Figure 5a-e). Thus, it is the first ascomycete species reported to fruit without an opposite mating-type partner. Other C. militaris isolates could also fruit sterilely with a single mating-type locus (Figure 6a, b). However, a hybrid strain, Cm06, with both MAT1-1 and MAT1-2 loci produced sexual perithecia and ascospores (Figure 5f, g). In addition, the sexual structures could be similarly re-formed after inoculation of the caterpillar pupae with different ratios of MAT1-1 and MAT1-2 isolate conidia (Figure 6c-e), confirming that C. militaris is heterothallic. PCR examination of 18 field-collected strains identified three containing both MAT1-1 and MAT1-2 loci (Figure 5h). However, 28 out of 30 single spore isolates of the Cm06 strain belonged to the MAT1-1 mating-type (Figure 5i). A similar unequal prevalence of mating types occurs in the dermatophyte fungus [31].

Figure 4 Comparative analysis of the C. militaris mating-type (MAT) locus. (a) Comparative analysis of the C. militaris MAT locus with those of sexually heterothallic and homothallic fungal species. Genes labeled in the same color have orthologous relationships. (b) Syntenic relationship of the MAT loci and their flanking regions between the three insect pathogens C. militaris (CCM), M. anisopliae (MAA) and M. acridum (MAC).

Figure 5 Fruiting body development, sexuality and mating-type analysis. (a-c) Chinese Tussah silkmoth pupae were inoculated with conidia from the C. militaris Cm01 strain and incubated for 14 days (a), 29 days (b) and 59 days (c) to produce nascent, mid-term and developmentally mature fruiting bodies. (d-g) The mature fruiting bodies of the Cm01 strain do not produce perithecia (d, e) but those of strain Cm06 are completely covered with protruded perithecia (f, g). (h) PCR examination of different strains (numbers labeled on the top) showed that strains Cm06, Pm36 and 80399 contain the MAT1-1-1, MAT1-1-2 and MAT1-2-1 genes while Cm01 and other strains lack the MAT1-2-1 gene. (i) PCR examination of 30 randomly selected single spore isolates from the hybrid strain Cm06 showed that only 2 out of 30 isolates contain the MAT1-2-1 gene.

Figure 6 Fruiting structures of different mating-type isolates. (a, b) Sterile fruiting bodies formed on caterpillar pupae after inoculation of MAT1-1 (a) and MAT1-2 (b) isolates acquired by single conidial spore isolation from a MAT1-1/MAT1-2 hybrid strain, Cm06. (c-e) Fertile fruiting structures formed on caterpillar pupae after inoculation of the mixed conidia of MAT1-1 (Cm01) and MAT1-2 (Cm06) at ratios of 1:9 (c), 1:1 (d) and 9:1 (e), respectively. The right panels represent close-up views of corresponding sterile (without protruded perithecia) or fertile (with protruded perithecia) fruiting bodies. After inoculation, the pupae were incubated at 22°C with a 12:12 hour light:dark cycle for 60 days.

Metabolism of medically active components and mycotoxins

One of the main pharmaceutically active components of C. militaris is cordycepin [5,6], which is structurally similar to 2′-deoxyadenosine (Figure 7a). C. militaris possesses most of the genes required for metabolism of adenine and adenosine except for lacking a ribonucleotide triphosphate reductase (RNR; converts ATP to dATP) and a deoxyadenosine kinase (converts deoxyadenosine to dAMP) (Figure 7b; Table S13 in Additional file 1). It has been suggested that the biosynthesis of cordycepin proceeds through a reductive mechanism as described for the formation of 2′-deoxyadenosine [32]. However, C. militaris resembles Metarhizium and other cordycepin non-producing fungi in having only two highly conserved subunits of class I RNRs (Figure S3 in Additional file 2). The substrates for class I RNRs are ADP, GDP, CDP and UDP but not TDP or nucleosides, and as the reductive reaction proceeds via a free radical mechanism [33], C. militaris RNRs will not be involved in cordycepin production.

Figure 7 Cordycepin analogues and the C. militaris adenine metabolic pathway. (a) The structures of cordycepin analogues. (b) The C. militaris adenine metabolic pathway. Abbreviations for different enzymes: ADA, adenosine deaminase; ADE, adenine deaminase; ADEK, adenylate kinase; ADK, adenosine kinase; ADN, adenosine nucleosidase; AMPD, AMP deaminase; APRT, adenine phosphoribosyltransferase; DADK, deoxyadenylate kinase; DAK, deoxyadenosine kinase; NDK, nucleoside-diphosphate kinase; NT5E, 5′-nucleotidase; PK, pyruvate kinase; PNP, purine nucleoside phosphorylase; 3′-RNR, ribonucleotide triphosphate reductase. The red dashed lines show metabolic pathways present in other organisms but absent in C. militaris.

Contamination of food and feed by mycotoxins is a longstanding threat to the health of humans and animals [26]. C. militaris has been consumed for hundreds of years, implying safety, but the genome data allowed us to make the first comprehensive inventory of Cordyceps genes involved in biosynthesis of secondary metabolites for comparison with known mycotoxins. There are fewer secondary metabolite core genes in C. militaris relative to Metarhizium spp. or plant pathogens (Table 3). In comparison to Metarhizium spp., Cordyceps has fewer terpenoid synthases, polyketide synthases (PKSs) and non-ribosomal peptide synthetases (NRPSs). Phylogenetic analysis of Cordyceps PKS and PKS-like genes using the ketoacyl CoA synthase (KS) domain sequences found that the C. militaris proteins grouped into different clusters compared to PKSs for known mycotoxins (Figure 8a). In addition, modular analysis indicated that, except for CCM_00603, which has a similar domain organization to the Aspergillus clavatus PatK gene for patulin biosynthesis, C. militaris PKSs are structurally different from mycotoxin PKSs (Figure 8b). A further survey showed that the CCM_00603 protein has only 27% identity with PatK and the gene cluster for patulin biosynthesis is absent in C. militaris (Table S14 in Additional file 1). This suggests that C. militaris PKSs do not produce patulin or other known human mycotoxins. Similarly, phylogenetic and modular analyses indicated that Cordyceps NRPSs had different protein structures than any NRPSs involved in production of known mycotoxins like enniatin, HC-toxin and gliotoxin (Figure S4 in Additional file 2).

Table 3 Numbers of core genes involved in the biosynthesis of secondary metabolites in different fungi
Core gene | CCM | MAA | MAC | FG | MO | BC | SS | NC | AN
DMAT | 1 | 5 | 3 | 0 | 3 | 1 | 1 | 1 | 6
TC | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 5
TS | 2 | 8 | 6 | 11 | 8 | 7 | 1 | 2 | 5
FAS | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1
GGPS | 3 | 4 | 4 | 3 | 3 | 0 | 0 | 1 | 0
NRPS | 5 | 14 | 13 | 10 | 5 | 6 | 5 | 3 | 11
NRPS-like | 8 | 9 | 8 | 11 | 6 | 8 | 5 | 3 | 12
PKS | 9 | 24 | 13 | 14 | 12 | 16 | 16 | 6 | 24
PKS-like | 2 | 3 | 4 | 1 | 3 | 6 | 2 | 2 | 4
HYBRID | 3 | 5 | 1 | 1 | 3 | 0 | 0 | 0 | 1
Total | 37 | 77 | 57 | 55 | 47 | 48 | 34 | 22 | 69
Core genes encoding: DMAT, dimethylallyl tryptophan synthase; TC, terpenoid cyclase; TS, terpenoid synthase; FAS, fatty-acid synthase; GGPS, geranylgeranyl diphosphate synthase; NRPS, non-ribosomal peptide synthetase; PKS, polyketide synthase; HYBRID, hybrid PKS-NRPS enzyme. Fungal species: CCM, C. militaris; MAA, M. anisopliae; MAC, M. acridum; FG, F. graminearum; MO, M. oryzae; BC, Botrytis cinerea; SS, Sclerotinia sclerotiorum; NC, N. crassa; AN, A. nidulans.

Figure 8 Phylogenetic and modular analysis of C. militaris polyketide synthases compared with those involved in the production of human mycotoxins. (a) A neighbor-joining tree showing the relationships of ketoacyl CoA synthase (KS) domain sequences. (b) Modulation and comparison of C. militaris PKSs with those involved in production of mycotoxins. The PKS-NRPS hybrid proteins CCM_04722, CCM_08261 and CCM_08018 are not included in the analysis. Domain definitions: ACP, acyl carrier protein domain; AT, acyltransferase domain; CYC, cyclase domain; DH, dehydratase domain; ER, enoyl reductase domain; KR, ketoreductase domain; MT, methyltransferase domain; TE, thioesterase domain. The accessions and references for different mycotoxins are provided in the Materials and methods.

The mycotoxin ergot alkaloids have a wide range of biological activities and are important in pharmaceuticals and agriculture [26]. Dimethylallyl tryptophan synthase (DMAT) catalyzes the alkylation of L-tryptophan, the first committed step in the ergot alkaloid biosynthetic pathway [34]. C. militaris has one putative DMAT gene (CCM_04410), in contrast to five in M. anisopliae and three in M. acridum (Table 3). A phylogenetic analysis showed that CCM_04410 is not clustered with the Claviceps DMAT clade involved in ergot alkaloid production (Figure S5 in Additional file 2). The trichothecenes T-2 toxin and deoxynivalenol (type B trichothecene) are natural fungal products that are toxic to both animals and plants [35]. Consistent with lacking CYP58 and CYP65, the C. militaris genome also lacks trichodiene synthase (Table S15 in Additional file 1). Thus, unlike Fusarium [26], C. militaris is not predicted to produce trichothecene mycotoxins. The presence of terpenoid cyclase, terpenoid synthase, fatty-acid synthase and geranylgeranyl diphosphate synthase genes in the C. militaris genome suggests that the fungus is capable of producing an array of metabolites, but the identity of these and their biological activities remain to be determined.

Transcriptional regulation of fruiting body development

To identify genes associated with C. militaris fruiting body development, we compared the expression profiles of undifferentiated mycelia from Sabouraud dextrose broth (SDB) culture with developmental stages on caterpillar pupae defined as nascent (14 days, termed as sample FB1), stalk formation (29 days, FB2) and mature fruiting bodies (59 days, FB3) (Figure 5a-c). Of the 9,684 genes, more than 63% were expressed during both undifferentiated hyphal growth and formation of fruiting bodies (Table S16 in Additional file 1). Relative to the growth in SDB, more than 900 genes were significantly (P < 0.05; FDR < 0.001) up-regulated while around 2,000 genes were down-regulated during fungal fruiting (Figure 9a). A Pearson correlation analysis indicated that transcriptional profiles at the different stages of fruiting body formation more closely resembled each other than they resembled the transcriptomes of undifferentiated mycelia (Figure S6a in Additional file 2). This is consistent with a Venn diagram analysis of the commonest co-expressed genes between different samples (Figure S6b in Additional file 2). Of the 100 most highly expressed genes in developing C. militaris fruiting bodies, 26 (FB1), 31 (FB2) and 37 (FB3) are functionally uncharacterized (Table S17 in Additional file 1). This suggests that the genes with unknown function are more likely to be stringently regulated and involved in developmental processes than orthologs of genes with known function. These genes are thereby the targets for future functional studies. In general, the genes involved in cell wall structure and biogenesis, detoxification, protein degradation and amino acid transportation were significantly up-regulated during formation of fruiting structures. In contrast, most of the genes specifically up-regulated by undifferentiated SDB cultures were involved in rapid growth and carbohydrate metabolism. Concomitant with fruiting structure maturation, the genes for cytoskeletal organization, cell cycle and secondary metabolism were up-regulated.

Unlike other fungi, C. militaris can fruit sterilely in the absence of a sexual partner (Figure 5a-c). Perhaps because of this, 31 of the 42 C. militaris orthologs of sex-related genes identified in other ascomycetes were not expressed or transcribed at low levels (< 10 transcripts per million tags (TPM)) in sterile fruiting bodies (Table S18 in Additional file 1). However, in some cases, C. militaris expresses paralogous genes to those employed by other fungi, suggesting it has co-opted different components of the same signal transduction pathways to fulfill similar functions. For example, GATA-type TFs are important for fruiting in both A. nidulans and N. crassa [36], but C. militaris fruiting structures expressed orthologs of these genes at very low levels or not at all (Table S19 in Additional file 1). In contrast, the Zn2Cys6-type TFs were highly transcribed during fruiting but not in undifferentiated fungal mycelia - for example, CCM_01809 and CCM_09644 (Figure 9b) - indicating that Zn2Cys6 type TFs are predominately involved in the major developmental switch of production of fruiting structures.

Pheromone receptors, that is, GPCRs, control fungal fruiting body formation and sexual cycle but not vegetative growth [36]. The pheromone receptor of C. militaris has not been identified. In comparison to undifferentiated mycelial growth, a putative pheromone receptor (CCM_01499) and a Pth11-like GPCR (CCM_03015) were significantly up-regulated (P < 0.05, FDR < 0.001) during initiation of fruiting body formation. Mitogen-activated protein kinase

orthologous genes were not transcribed (CCM_04200 versus AN1017) or transcribed at low levels (CCM_01235 versus NCU02393) by C. militaris (Table S19 in Additional file 1). However, Cordyceps sharply up-regulated (P < 0.05, FDR < 0.001) a MAPK paralog (CCM_09637) as well as a calcium regulated kinase (CaMK, CCM_06085) (Figure 9c). These data, taken in conjunction with the single adenylate cyclase (CCM_06928) not being transcribed and the low level expression of both protein kinase A (PKA; CCM_03352) and Rap GTPase (CCM_01391), indicate that fruiting by C. 
militaris in the absence of a partner is more depen- (MAPK) genes are required for fruiting in Aspergillus dent on the MAPK pathway than the cAMP-dependent (AN1017) and Neurospora (NC02393) [36], but PKA pathway (Figure 10). Zheng et al. Genome Biology 2011, 12:R116 Page 14 of 21 http://genomebiology.com/2011/12/11/R116 Figure 9 Differential gene expression by C. militaris in association with fruiting structure formation or growth in a liquid medium. (a) Estimation of significantly up- and down-regulated genes between different samples. (b) Heat map of protein kinases associated with the mitogen-activated and cAMP-dependent protein kinase pathways at different developmental stages. (c) Heat map of the highly expressed transcription factors at different developmental stages. Genes with expression values > 100 transcripts per million tags (TPM) are also indicated in red. Annotation information for the genes is provided in Table S19 in Additional file 1. DEG, differentially expressed gene. FB1, FB2 and FB3 are associated with nascent, stalk formation and mature developmental stages shown in Figure 5a-c, respectively. The transcriptome of undifferentiated mycelia harvested from SDB was included as a reference for gene expression analysis. Discussion involved in interactions with insect hosts. Cordyceps We report here the first genome analysis of a Cordyceps resembles Metarhizium spp. in having a very high per- species, the medicinal lepidopteran pathogen C. mili- centage of secreted proteins relative to plant pathogens taris, and show that the fungus is capable of fruiting and saprophytes and expanded families of proteases and without an opposite mating-type partner. We also show chitinases with targets in insect hosts. However, insect- that it lacks genes known to be involved in production killing strategies may differ between Cordyceps and of human mycotoxins. Being an insect pathogen, the C. Metarhizium due to differences in gene content. 
Mat- militaris genome contains thousands of genes putatively ing-type analysis indicated that sexual reproduction in Zheng et al. Genome Biology 2011, 12:R116 Page 15 of 21 http://genomebiology.com/2011/12/11/R116 Figure 10 Putative signal transduction pathways regulating fruiting body development in C. militaris. The dashed lines show the cAMP- dependent PKA pathway, which might not be involved in control of fruiting in C. militaris. The transcription data for different components are provided in Table S19 in Additional file 1. AC, adenylate cyclase; CaMK, calmodulin-dependent protein kinase; CDK, cyclin-dependent kinase; PLC, phospholiapse C; RGS, regulator of G protein signaling. Zheng et al. Genome Biology 2011, 12:R116 Page 16 of 21 http://genomebiology.com/2011/12/11/R116 C. militaris is heterothallic. Transcriptional profiling lipid storage and appressorium penetration [41], and an indicated that fruiting of the MAT1-1 C. militaris strain osmosensor (CCM_04885 versus MAA_01551) to med- involves induction of the MAPK pathway, but unlike iate adaptation to the insect hemocoel [42]. Homologs other homothallic or heterothallic fungi, the PKA path- of these genes are broadly distributed among ascomy- way was not up-regulated. It remains to be determined cetes, indicative of an ancient origin. However, Cordy- whether this reflects the very unusual ability of C. mili- ceps lacks other Metarhizium pathogenicity-related taris to produce fruiting bodies without an opposite genes, including a collagen-like protein to evade the mating-type partner. host immune system [43], a phosphoketolase for pentose Aside from knowing that C. militaris infects lepidop- metabolism [24] and the adhesins to mediate spore teran pupae [37], the life cycle of C. militaris in nature adhesions to insect and plant surfaces [44]. The absence is poorly understood [8]. 
Following disease, survival in of key components of the Metarhizium entomopatho- soil may depend on the sexual stage of Cordyceps pro- genicity ?toolkit? from C. militaris indicates that it has viding resilient long-lived ascospores as described in evolved different determinants to mediate its interac- other fungi [38]. Micro-cycle conidiation from germi- tions with insects. nated ascospores could adapt the fungus to nutrient Like N. crassa and F. graminearum, C. miltaris lacks poor niches (Figure 1). Metarhizium does not produce highly similar paralogs, a hallmark of the repeat-induced ascospores but flourishes in plant rhizospheres, which point mutation (RIP) mechanism [20]. C. militaris has thus provide an alternative habitat in the absence of an ortholog (CCM_03609) of the N. crassa RIP defective insect hosts. C. militaris can grow on germinated soy- gene (NCU02034), a cytosine methyltranserase essential beans [39], suggesting a potential for an association with for RIP [45]. The high C?T and G?A mutation bias in plants. However, relative to Metarhizium and most the C. militaris genome and the readiness of C. miltaris other ascomycetes, many protein families are smaller in to undergo the sexual cycle suggests that RIP is com- the C. militaris genome, especially serine proteases, monplace in C. militaris like many ascomycetes GHs, CYPs, MFS transporters and signal transduction [10,15,19]. Since RIP can function effectively against factors. These families would be involved in scavenging selfish DNAs [46], it likely contributes, at least in part, for nutrients, avoidance of host defenses and toxins and to C. militaris having few DNA type transposon other processes related to pathogenicity and a saprobic encoded genes, that is, transposases [15]. lifestyle. 
Around two-thirds of these protein families There are many more orphan genes in Cordyceps than include pathogen-host interaction genes in plant-asso- in Metarhizium spp., underscoring that much about the ciated fungi (Table S4 in Additional file 1). Further stu- proteome of Cordyceps spp. remains unknown. It is dies on the ecology of Cordyceps spp. will shed more speculated that orphan genes arise from gene duplica- light on the relevance of the C. militaris genome to the tion, shuffling of gene fragments, mobile element evolution of gene families in relation to acquisition/loss effects, mutation of existing sequences, horizontal gene of capability for dual plant/insect colonization and host transfer and de novo origination from non-coding range specialization. DNAs [47]. De novo creation of new genes is probably The phylogenomic analysis demonstrated that the rare [48]. A role for mobile element effects is also unli- lineage leading to Cordyceps spp. diverged after most kely given how few putative transposase genes are pre- well known plant pathogens, including F. graminearum, sent in the C. militaris genome. Putative horizontal but 130 MYA before Metarhizium diverged from the gene transfer genes are even fewer in Cordyceps than in grass endophyte Epichlo? festucae. The estimate of a Metarhizium. Thus, the numerous orphans in the C. Triassic-Jurassic boundary origin for the Cordyceps line- militaris genome most likely arose from frequent muta- age and the post-Cretaceous origin of Metarhizium spp. tions caused by RIP in existing (duplicated) sequences. is consistent with the hypocrealean fungi of Cordycipita- Just as the Metarhizium-specific collagen-like protein is ceae (includes Cordyceps spp.), Clavicipitaceae (includes essentially required to camouflage cells from host Metarhizium spp.) and Ophiocordycipitaceae splitting immune recognition [43]. 
The transcriptome data about the same time as insects and angiosperms were showed that 428 of the 1,329 orphan genes were tran- diversifying [40]. Families of proteases and chitinases are scribed during fruiting. Of the 100 most highly not expanded or lost in the E. festucae genome as they expressed genes in developing C. militaris fruiting are in Cordyceps and Metarhizium, exemplifying conver- bodies, about one-third are orphans (Table S17 in Addi- gent evolution to insect pathogenicity. Besides proteases tional file 1), underscoring the potential of orphans to and chitinases, experimentally verified Metarhizium have specific functions. Likewise, genes that are appar- virulence-associated genes with homologs in the C. mili- ently unique to the mushroom Schizophyllum commune taris genome include a perilipin-like protein are more likely to be expressed during mushroom for- (CCM_06103 versus MAA_08819) to control cellular mation [49]. Zheng et al. Genome Biology 2011, 12:R116 Page 17 of 21 http://genomebiology.com/2011/12/11/R116 Concern has been raised about the possibility of produces medicinal compounds and so further their harmful side effects of traditional Chinese medicines, exploitation. including Cordyceps [50]. Consistent with genotoxicity and cytotoxicity assays that show Cordyceps products to Materials and methods be safe for consumption [51], there is no evidence in Fungal strains the C. militaris genome for genes involved in the pro- C. militaris strain Cm01 (CGMCC 3.14242) was selected duction of known mycotoxins. However, safety could for genome sequencing as it is culturally stable and only be completely verified by meticulous profiling of commercialized in China. The culture was maintained the metabolites produced by the fungus under diverse either on artificial medium or silkworm pupae as pre- growth conditions. The C. militaris genome data will viously described [12]. Several different C. 
militaris facilitate these processes as well as help with elucidation strains were included in this study for PCR genotyping of the biosynthetic pathways of different metabolites. of mating-type genes (Figure 5h). The analysis of C. militaris genome indicates that it is sexually heterothallic, but strikingly, both the MAT1-1 Genome sequencing and assembly single mating-type and MAT1-1/MAT1-2 hybrid strains The genome of C. militaris strain Cm01 was shotgun can form fruiting bodies, which means that C. militaris sequenced using a Roche 454 GS FLX system for mas- is capable of fruiting without a partner. Single mating- sively parallel pyrosequencing for 2.25 runs at the Chi- type (haploid) fruiting has also been observed in the nese National Human Genome Center (Shanghai, human pathogens Cryptococcus neoformans and Can- China). This resulted in 951 Mb of sequence data (29.6 dida albicans [52]. Given that perithecia and ascospores ? coverage) with an average read length of 385 bp. are not produced by MAT1-1 fruiting bodies, C. mili- Assembly was performed using the Newbler software taris haploid fruiting is different from the same-sex mat- (v2.3) within the Roche 454 suite package [56], which ing and fruiting of C. neoformans, in which produced 597 contigs with a total size of 32.2 Mb. For diploidization and meiosis can occur. In budding yeast, sequence scaffolding, a DNA library of 2- to 5-kb inserts meiotic recombination is initiated by the formation of was generated and sequenced with an ABI SOLiD sys- double-strand breaks catalyzed by SPO11, a meiosis-spe- tem (Carlsbad, California, USA). This resulted in 3.8 Gb cific endonuclease [53]. The meiosis-specific recombi- of mate-pair reads (117.4 ? coverage) to improve nase DMC1 and the DNA repair enzyme RAD51 then sequence quality and construct scaffolds. By mapping co-localize to double-strand breaks and function the reads to contigs, 578 contigs were assembled into 13 together for meiotic recombination [54]. The C. 
mili- scaffolds and 19 contigs less than 2 kb left outside. The taris homologue of yeast SPO11 (CCM_09527) was up- raw data of 454 and SOLiD reads have been deposited regulated more than five-fold during fruiting body at NCBI?s Sequence Read Archive under accession num- maturation (the TPM ratio of FB2/FB1 = 8.1; FB3/FB2 = ber SRA047932 and the whole project has been depos- 5.7). Intriguingly, the C. militaris genome lacks a yeast ited at DDBJ/EMBL/GenBank under accession number RAD51 ortholog, but its DMC1 ortholog (CCM_06822) AEVU00000000. contains a RAD51 domain. CCM_06822 was not expressed by C. militaris during fruiting, which may Annotation explain why the C. militaris MAT1-1 strain forms fruit- To maximize gene prediction accuracy, the gene struc- ing bodies without meiosis. Consistent with this, a puta- tures of Cordyceps were predicted with a combination of tive cyclin dependent kinase 7 (CDK7; CCM_03900) was different algorithms plus manual inspections [11,57]. up-regulated during fungal fruiting (Table S19 in Addi- The inconsistent open reading frames were individually tional file 1). Orthologs of CDK7 initiate DNA synthesis subject to Blast searches against the NCBI curated and facilitate mitosis instead of meiosis [55]. refseq_protein database. The prediction with the best hit was selected. Pseudogene identification was conducted Conclusions with the pipeline of PseudoPipe with default settings In conclusion, we report on the genome sequencing, [58]. The potential secreted proteins of C. militaris and comparative genome analysis and transcriptional regula- other fungal species included for comparison were pre- tion of fruiting body development in the medicinal fun- dicted by SignaIP 3.0 analysis using a hidden Markov gus C. militaris. The sequence data should markedly model [59]. 
Genome repetitive elements were analyzed enhance the pace of molecular research on Cordyceps by Blast against the RepeatMasker library (Open 3.2.9) biology, fungal sex and pathogenicity, and will have [60] and with the Tandem Repeats Finder [61]. The impacts on the commercial production of fruiting struc- transposases/retrotransposases were classified by Blastp tures. The genomic sequence will also be an essential analysis against the Repbase [62] plus manual tool to unravel the mechanisms by which C. militaris inspections. Zheng et al. Genome Biology 2011, 12:R116 Page 18 of 21 http://genomebiology.com/2011/12/11/R116 Blast score ratio test compare the differences in protein family sizes between BSR tests [14] were conducted to compare the differ- insect and plant pathogens. Estimation of FDR of P- ences between C. militaris and the sequenced Metarhi- values was conducted using the program mafdr (Matlab zum genomes and the plant pathogen F. graminearum, 7.8.0.347(R2009a)). respectively. The BSR index for each reference protein is calculated by dividing the query bit score by the refer- Analysis of genes involved in purine synthesis and ence score and normalized from 0 to 1. A score of 1 secondary metabolism indicates a perfect match while a score of 0 indicates no To model the biosynthesis of cordycepin, the purine Blast match of a query protein in the reference pro- metabolic pathway in C. militaris was constructed based teome. The normalized pairs of BSR indices were then on the KEGG (Kyoto Encyclopedia of Genes and Gen- plotted using the Matlab (v7.0) program (Natick, Massa- omes) annotations [73]. To identify NRPS, PKS or chusetts, USA). The same analysis was conducted for NRPS-PKS hybrid genes and gene clusters, the whole the three genomes of C. militaris, M. anisopliae and F. genome data set was subjected to analysis with the pro- graminearum. gram SMURF with default settings [74]. 
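The BSR index described under "Blast score ratio test" can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper reports plotting in Matlab). It assumes that the "reference score" is the bit score of each reference protein aligned against itself, and that the bit scores have already been parsed from Blast output into dictionaries. The function names and the protein identifiers ref_1 to ref_3 are hypothetical.

```python
from typing import Dict, Optional, Tuple


def blast_score_ratio(query_bits: Optional[float], self_bits: float) -> float:
    """Return the BSR index of one reference protein, clamped to [0, 1].

    query_bits: bit score of the best hit in the query proteome,
                or None when Blast reported no hit (assumed convention).
    self_bits:  bit score of the reference protein aligned to itself.
    """
    if self_bits <= 0:
        raise ValueError("self-hit bit score must be positive")
    if query_bits is None:
        return 0.0  # no Blast match in the query proteome
    return max(0.0, min(1.0, query_bits / self_bits))


def bsr_pairs(
    self_scores: Dict[str, float],
    hits_x: Dict[str, float],
    hits_y: Dict[str, float],
) -> Dict[str, Tuple[float, float]]:
    """Pair the BSR of every reference protein against two query proteomes."""
    return {
        protein: (
            blast_score_ratio(hits_x.get(protein), bits),
            blast_score_ratio(hits_y.get(protein), bits),
        )
        for protein, bits in self_scores.items()
    }


if __name__ == "__main__":
    # Hypothetical bit scores, for illustration only.
    self_scores = {"ref_1": 500.0, "ref_2": 320.0, "ref_3": 210.0}
    hits_x = {"ref_1": 450.0, "ref_2": 320.0}  # best hits in query proteome X
    hits_y = {"ref_1": 100.0}                  # best hits in query proteome Y
    for protein, (bsr_x, bsr_y) in bsr_pairs(self_scores, hits_x, hits_y).items():
        print(f"{protein}\t{bsr_x:.2f}\t{bsr_y:.2f}")
```

Each (x, y) pair corresponds to one point in a BSR scatter plot: points near (1, 1) are proteins conserved in both query proteomes, and points near (0, 0) have no close match in either.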
Modulation analysis and domain extraction of different NRPS or PKS proteins were conducted by Blast searching against the SBSPKS database [75]. For phylogenetic analysis, the domain sequences were aligned with Clustal X 2.0 and the tree was generated using a Poisson model with 1,000 bootstrap replications and pair-wise deletions for gaps or missing data. The mycotoxin-encoding PKSs used in the analysis include Gibberella zeae PKS4 (ABB90283) and PKS13 (ABB90282) for the biosynthesis of zearalenones [76], Aspergillus ochraceus PKS (AAT92023) for ochratoxin [77], A. clavatus PatK (ACLA_093660) for patulin [78], Gibberella moniliformis Fum1p (AAD43562) for fumonisin [79], Monascus purpureus PKS (BAD44749) for citrinin [80], Aspergillus flavus PksA (AAS90093) for aflatoxin [81] and A. nidulans StcA (Q12397) for sterigmatocystin [82]. The mycotoxin-encoding NRPSs included in the analysis are A. fumigatus Glip (AAW03307) for gliotoxin [83], Fusarium equiseti NRPS (CAA79245) for enniatin [84], Cochliobolus carbonum NRPS (AAA33023) for HC-toxin [85] and Tolypocladium inflatum NRPS (CAA82227) for cyclosporin [86].

Orthology and phylogenomic analysis

In total, 2,106 orthologous proteins were acquired by a reciprocal Blast method with a cutoff E value of 1e-20 and a Blast alignment length greater than 60% of the query sequence. The corresponding orthologous protein sequences were aligned with Clustal X 2.0 and the concatenated amino acid sequences were used to generate a maximum likelihood phylogenomic tree with the program TREE-PUZZLE [63] using a Dayhoff model. The divergence time between species was estimated with the program r8s using a Langley-Fitch model [64], calibrated with the origin of the Ascomycota at 500 to 650 MYA [65].

Protein family classifications

Whole genome protein families were classified by InterProScan [66] and Pfam [67] analysis. The families of proteases were identified by Blastp searching against the MEROPS peptidase database release 9.4 with a cutoff E value of 1e-20 [68]. The CYPs were named according to the classifications collected at the P450 database [69]. Transporters were classified based on the Transport Classification Database [70]. Kinases were classified by Blastp analysis against the KinBase database with a cutoff E value of 1e-10 [68]. Carbohydrate-active enzymes were classified by local Blastp searching against a library of catalytic and carbohydrate-binding module enzymes [68]. G-protein-coupled receptors were selected from the best hits to GPCRDB sequences [71] and by confirmation that they contained seven transmembrane helices with the amino terminus outside and the carboxyl terminus inside the plasma membrane. Homologs of the Magnaporthe Pth11-like GPCRs [30] were identified by local Blastp analysis with a cutoff E value of 1e-10. Putative Cordyceps virulence factors were identified by searching against the pathogen-host interaction database [72] with a cutoff E value of 1e-5, plus additional searches of known virulence genes reported in entomopathogenic fungi.

Transcriptome analysis

Conidia of C. militaris from day 14 potato dextrose agar were inoculated into SDB and the undifferentiated mycelia were harvested after a 72-hour incubation at 25°C, 180 rpm. The transcriptome of the mycelia provided a control for comparison with the transcriptomes of fruiting bodies. Chinese Tussah silkmoth (Antheraea pernyi) pupae were injected with 50 μl of a conidial suspension (5 × 10^6 conidia/ml) and incubated at 22°C in a 12:12 hour light:dark cycle for up to 14 days to allow emergence of fruiting bodies (a stage designated as FB1), 29 days for half-grown fruiting bodies (FB2) and 59 days, by which time fruiting bodies were mature (FB3) [11,12]. RNA was extracted with a Qiagen RNeasy kit plus on-column treatment with RNase-free DNase I (Germantown, Maryland, USA). Messenger RNA was purified and, after reverse transcription into cDNA, the libraries were constructed for tag preparation according to the massively parallel signature sequencing protocol [87]. The tags were sequenced with an Illumina technique. We omitted a tag from further analysis if only one copy was detected or if it could be mapped to more than one transcript. Other tags were mapped to the genome or annotated genes if they possessed no more than one nucleotide mismatch [11,88]. The abundance of each tag was converted to the value of transcripts per million (TPM) for each mapped gene for comparison of expression between samples. The significance of differential gene expression between samples and the FDR of P-values were estimated for each individual gene with a cutoff of P ≤ 0.05 and FDR ≤ 0.001 [11,89]. The RNA_seq expression dataset is available at the NCBI's Gene Expression Omnibus under the accession code GSE28001.

Additional material

Additional file 1: Comparative genomics analysis of C. militaris. The file contains additional information on genomic properties and comparative gene family analysis of C. militaris with other fungi, comprising 19 tables provided in separate Excel sheets. Table S1 summarizes major protein family sizes of different fungal species. Table S2 provides a comparison of transposase genes among three insect pathogens. Table S3 lists the pseudogenes present in the genomes of three insect pathogens. Table S4 summarizes the protein families putatively involved in pathogen-host interactions. Table S5 compares the proteases in different fungal genomes. Table S6 lists the serine and aspartyl proteases in three insect pathogens. Table S7 lists the glycoside hydrolase families in different fungal genomes. Table S8 compares the cytochrome P450 genes in three insect pathogens. Table S9 summarizes the membrane transporters in different fungal genomes. Table S10 compares the G-protein-coupled receptors in three insect pathogens. Table S11 lists the protein kinases in three insect pathogens. Table S12 provides information on mating- and sexuality-related genes. Table S13 lists the genes putatively involved in purine metabolism in three insect pathogens. Table S14 summarizes the presence/absence of patulin biosynthesis homologous genes in C. militaris. Table S15 summarizes the presence/absence of T-2 toxin biosynthesis homologous genes in C. militaris. Table S16 summarizes the information from RNAseq analysis. Table S17 lists the 100 most highly expressed genes in C. militaris at different growth stages. Table S18 lists the transcriptional data of sexuality- and fruiting-related genes. Table S19 compares the expression data of genes putatively involved in signaling and transcription controls.

Additional file 2: Figures that provide supporting information for the main text. Figure S1 provides support for RIP occurring in C. militaris. Figure S2 provides support for the lack of the pentose metabolic pathway in C. militaris. Figure S3 provides a phylogeny analysis of fungal ribonucleotide reductases. Figure S4 provides the phylogeny and modular analysis of C. militaris NRPSs. Figure S5 provides a phylogeny analysis of fungal dimethylallyl tryptophan synthases. Figure S6 provides the gene transcription profiles between different samples.

Abbreviations

ABC: ATP-binding cassette; bp: base pair; BSR: Blast score ratio; CCM: Cordyceps militaris; CYP: cytochrome P450; DMAT: dimethylallyl tryptophan synthase; FDR: false discovery rate; GH: glycoside hydrolase; GI: genomic island; GPCR: G-protein coupled receptor; MAA: Metarhizium anisopliae; MAC: Metarhizium acridum; MAPK: mitogen-activated protein kinase; MAT: mating-type; MFS: major facilitator superfamily; MYA: million years ago; NRPS: non-ribosomal peptide synthetase; PKA: protein kinase A; PKS: polyketide synthase; RIP: repeat-induced point mutation; RNR: ribonucleotide triphosphate reductase; SDB: Sabouraud dextrose broth; TF: transcription factor; TPM: transcripts per million tags.

Acknowledgements

CSW was supported by the Ministry of Science and Technology of China (grant number 2009CB118904), the Ministry of Agriculture of China (grant number 2009ZX08009-035B), the Science and Technology Commission of Shanghai Municipality (grant number 08DZ1970200) and the Chinese Academy of Sciences (KSCX2-EW-N-06 and KSCX2-EW-G-16).

Author details

1 Key Laboratory of Insect Developmental and Evolutionary Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 300 Fenglin Road, Shanghai 200032, China.
2 Chinese National Human Genome Center at Shanghai, 250 Bibo Road, Shanghai 201203, China.
3 Institute of Microbiology, Chinese Academy of Sciences, 1 West Beichen Road, Beijing 100101, China.
4 Department of Entomology, University of Maryland, 4112 Plant Sciences Building, College Park, Maryland 20742, USA.

Authors' contributions

CSW initiated and designed the study. PZ, YLX, GHX, CHX, YH and YZ annotated genes and performed protein family analysis; GHX and HX performed phylogenetic and transcriptome analysis; PZ and SWZ conducted fruiting body induction, and DNA and RNA extraction; HJZ, SYW and GPZ performed genome sequencing and assembly; PZ and GHX performed transcriptome analysis; CSW and RJSL wrote the paper. All authors read and approved the final manuscript.

Received: 4 July 2011 Revised: 10 November 2011 Accepted: 23 November 2011 Published: 23 November 2011

References

1. Sung GH, Hywel-Jones NL, Sung JM, Luangsa-Ard JJ, Shrestha B, Spatafora JW: Phylogenetic classification of Cordyceps and the clavicipitaceous fungi. Stud Mycol 2007, 57:5-59.
2. de Faria MR, Wraight SP: Mycoinsecticides and Mycoacaricides: a comprehensive list with worldwide coverage and international classification of formulation types. Biol Control 2007, 43:237-256.
3. St Leger RJ, Wang C: Genetic engineering of fungal biocontrol agents to achieve greater efficacy against insect pests. Appl Microbiol Biotechnol 2010, 85:901-907.
4. Paterson RR: Cordyceps: a traditional Chinese medicine and another fungal therapeutic biofactory? Phytochemistry 2008, 69:1469-1495.
5. Zhou X, Gong Z, Su Y, Lin J, Tang K: Cordyceps fungi: natural products, pharmacological functions and developmental products. J Pharm Pharmacol 2009, 61:279-291.
6. Xiao JH, Zhong JJ: Secondary metabolites from Cordyceps species and their antitumor activity studies. Recent Pat Biotechnol 2007, 1:123-137.
7. ClinicalTrials.gov. [http://clinicaltrials.gov/show/NCT00709215].
8. Stone R: Last stand for the body snatcher of the Himalayas? Science 2008, 322:1182.
9. Lee SC, Ni M, Li W, Shertz C, Heitman J: The evolution of sex: a perspective from the fungal kingdom. Microbiol Mol Biol Rev 2010, 74:298-340.
10. Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, Lee SI, Baştürkmen M, Spevak CC, Clutterbuck J, Kapitonov V, Jurka J, Scazzocchio C, Farman M, Butler J, Purcell S, Harris S, Braus GH, Draht O, Busch S, D'Enfert C, Bouchier C, Goldman GH, Bell-Pedersen D, Griffiths-Jones S, Doonan JH, Yu J, Vienken K, Pain A, Freitag M, et al: Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature 2005, 438:1105-1115.
11. Gao Q, Jin K, Ying SH, Zhang Y, Xiao G, Shang Y, Duan Z, Hu X, Xie XQ, Zhou G, Peng G, Luo Z, Huang W, Wang B, Fang W, Wang S, Zhong Y, Ma LJ, St Leger RJ, Zhao GP, Pei Y, Feng MG, Xia Y, Wang CS: Genome sequencing and comparative transcriptomics of the model entomopathogenic fungi Metarhizium anisopliae and M. acridum. PLoS Genet 2011, 7:e1001264.
12. Xiong CH, Xia YL, Zheng P, Shi SH, Wang CS: Developmental stage-specific gene expression profiling for a medicinal fungus Cordyceps militaris. Mycology 2010, 1:25-66.
13. Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, Rehman B, Elkins T, Engels R, Wang S, Nielsen CB, Butler J, Endrizzi M, Qui D, Ianakiev P, Bell-Pedersen D, Nelson MA, Werner-Washburne M, Selitrennikoff CP, Kinsey JA, Braun EL, Zelter A, Schulte U, Kothe GO, Jedd G, Mewes W, et al: The genome sequence of the filamentous fungus Neurospora crassa. Nature 2003, 422:859-868.
14. Ma LJ, van der Does HC, Borkovich KA, Coleman JJ, Daboussi MJ, Di Pietro A, Dufresne M, Freitag M, Grabherr M, Henrissat B, Houterman PM, Kang S, Shim WB, Woloshuk C, Xie X, Xu JR, Antoniw J, Baker SE, Bluhm BH, Breakspear A, Brown DW, Butchko RA, Chapman S, Coulson R, Coutinho PM, Danchin EG, Diener A, Gale LR, Gardiner DM, Goff S, et al: Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 2010, 464:367-373.
15. Kubicek CP, Herrera-Estrella A, Seidl-Seiboth V, Martinez DA, Druzhinina IS, Thon M, Zeilinger S, Casas-Flores S, Horwitz BA, Mukherjee PK, Mukherjee M, Kredics L, Alcaraz LD, Aerts A, Antal Z, Atanasova L, Cervantes-Badillo MG, Challacombe J, Chertkov O, McCluskey K, Coulpier F, Deshpande N, von Döhren H, Ebbole DJ, Esquivel-Naranjo EU, Fekete E, Flipphi M, Glaser F, Gómez-Rodríguez EY, Gruber S, et al: Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma. Genome Biol 2011, 12:R40.
16. Rajashekara G, Glasner JD, Glover DA, Splitter GA: Comparative whole-genome hybridization reveals genomic islands in Brucella species. J Bacteriol 2004, 186:5040-5051.
17. Fedorova ND, Khaldi N, Joardar VS, Maiti R, Amedeo P, Anderson MJ, Crabtree J, Silva JC, Badger JH, Albarraq A, Angiuoli S, Bussey H, Bowyer P, Cotty PJ, Dyer PS, Egan A, Galens K, Fraser-Liggett CM, Haas BJ, Inman JM, Kent R, Lemieux S, Malavazi I, Orvis J, Roemer T, Ronning CM, Sundaram JP, Sutton G, Turner G, Venter JC, et al: Genomic islands in the pathogenic filamentous fungus Aspergillus fumigatus. PLoS Genet 2008, 4:e1000046.
18. McElwain JC, Punyasena SW: Mass extinction events and the plant fossil record. Trends Ecol Evol 2007, 22:548-557.
19. Cuomo CA, Güldener U, Xu JR, Trail F, Turgeon BG, Di Pietro A, Walton JD, Ma LJ, Baker SE, Rep M, Adam G, Antoniw J, Baldwin T, Calvo S, Chang YL, Decaprio D, Gale LR, Gnerre S, Goswami RS, Hammond-Kosack K, Harris LJ, Hilburn K, Kennell JC, Kroken S, Magnuson JK, Mannhaupt G, Mauceli E, Mewes HW, Mitterbauer R, Muehlbauer G, et al: The Fusarium graminearum genome reveals a link between localized polymorphism and pathogen specialization. Science 2007, 317:1400-1402.
20. Galagan JE, Selker EU: RIP: The evolutionary cost of genome defense. Trends Genet 2004, 20:417-423.
21. Screen SE, St Leger RJ: Cloning, expression, and substrate specificity of a fungal chymotrypsin. Evidence for lateral gene transfer from an actinomycete bacterium. J Biol Chem 2000, 275:6689-6694.
30. Kulkarni RD, Thon MR, Pan H, Dean RA: Novel G-protein-coupled receptor-like proteins in the plant pathogenic fungus Magnaporthe grisea. Genome Biol 2005, 6:R24.
31. Hejtmanek M, Hejtmankova N: Teleomorphs and mating types in Trichophyton mentagrophytes complex. Acta Univ Palacki Olomuc Fac Med 1989, 123:11-33.
32. Lennon MB, Suhadolnik RJ: Biosynthesis of 3'-deoxyadenosine by Cordyceps militaris. Mechanism of reduction. Biochim Biophys Acta 1976, 425:532-536.
33. Reichard P: Ribonucleotide reductases: substrate specificity by allostery. Biochem Biophys Res Commun 2010, 396:19-23.
34. Liu M, Panaccione DG, Schardl CL: Phylogenetic analyses reveal monophyletic origin of the ergot alkaloid gene dmaW in fungi. Evol Bioinform 2009, 5:15-30.
35. Brown DW, McCormick SP, Alexander NJ, Proctor RH, Desjardins AE: A genetic and biochemical approach to study trichothecene diversity in Fusarium sporotrichioides and Fusarium graminearum. Fungal Genet Biol 2001, 32:121-133.
36. Pöggeler S, Nowrousian M, Ringelberg C, Loros JJ, Dunlap JC, Kuck U: Microarray and real-time PCR analyses reveal mating type-dependent gene expression in a homothallic fungus. Mol Genet Genomics 2006, 275:492-503.
37. Kamata N, Sato H, Shimazu M: Seasonal changes in the infection of pupae of the beech caterpillar, Quadricalcarifera punctatella (Motsch.) (Lep., Notodontidae), by Cordyceps militaris Link (Clavicipitales, Clavicipitaceae) in the soil of the Japanese beech forest. J Appl Entomol 1997, 121:17-21.
38. Clarkson JP, Staveley J, Phelps K, Young CS, Whipps JM: Ascospore release and survival in Sclerotinia sclerotiorum. Mycol Res 2003, 107:213-222.
39. Ohta Y, Lee JB, Hayashi K, Fujita A, Park DK, Hayashi T: In vivo anti-influenza virus activity of an immunomodulatory acidic polysaccharide isolated from Cordyceps militaris grown on germinated soybeans. J Agric Food Chem 2007, 55:10194-10199.
40. Sung GH, Poinar GO Jr, Spatafora JW: The oldest fossil evidence of animal parasitism by fungi supports a Cretaceous diversification of fungal-arthropod symbioses. Mol Phylogenet Evol 2008, 49:495-502.
41. Wang CS, St Leger RJ: The Metarhizium anisopliae perilipin homolog MPL1 regulates lipid metabolism, appressorial turgor pressure, and virulence. J Biol Chem 2007, 282:21110-21115.
42. Wang CS, Duan ZB, St Leger RJ: MOS1 osmosensor of Metarhizium anisopliae is required for adaptation to insect host hemolymph. Eukaryot Cell 2008, 7:302-309.
43. Wang CS, St Leger RJ: A collagenous protective coat enables Metarhizium anisopliae to evade insect immune responses. Proc Natl Acad Sci USA 2006, 103:6647-6652.
44. Wang CS, St Leger RJ: The MAD1 adhesin of Metarhizium anisopliae links adhesion with blastospore production and virulence to insects, and the MAD2 adhesin enables attachment to plants.
Eukaryot Cell 2007, 22. Coetzer TH, Goldring JP, Huson LE: Oligopeptidase B: a processing 6:808-816. peptidase involved in pathogenesis. Biochimie 2008, 90:336-344. 45. Freitag M, Williams RL, Kothe GO, Selker EU: A cytosine methyltransferase 23. Chong SL, Battaglia E, Coutinho PM, Henrissat B, Tenkanen M, de Vries RP: homologue is essential for repeat-induced point mutation in Neurospora The ?-glucuronidase Agu1 from Schizophyllum commune is a member of crassa. Proc Natl Acad Sci USA 2002, 99:8802-8807. a novel glycoside hydrolase family (GH115). Appl Microbiol Biotechnol 46. Montiel MD, Lee HA, Archer DB: Evidence of RIP (repeat-induced point 2011, 90:1323-1332. mutation) in transposase sequences of Aspergillus oryzae. Fungal Genet 24. Duan ZB, Shang Y, Gao Q, Zheng P, Wang CS: A phosphoketolase Mpk1 Biol 2006, 43:439-445. of bacterial origin is adaptively required for full virulence in the insect- 47. Long M, Betran E, Thornton K, Wang W: The origin of new genes: pathogenic fungus Metarhizium anisopliae. Environ Microbiol 2009, glimpses from the young and old. Nat Rev Genet 2003, 4:865-875. 11:2351-2360. 48. Ekman D, Elofsson A: Identifying and quantifying orphan protein 25. Cre?nar B, Petri? S: Cytochrome P450 enzymes in the fungal kingdom. sequences in fungi. J Mol Biol 2010, 396:396-405. Biochim Biophys Acta 2011, 1814:29-35. 49. Ohm RA, de Jong JF, Lugones LG, Aerts A, Kothe E, Stajich JE, de Vries RP, 26. Keller NP, Turner G, Bennett JW: Fungal secondary metabolism - from Record E, Levasseur A, Baker SE, Bartholomew KA, Coutinho PM, Erdmann S, biochemistry to genomics. Nat Rev Microbiol 2005, 3:937-947. Fowler TJ, Gathman AC, Lombard V, Henrissat B, Knabe N, K?es U, Lilly WW, 27. 
Jarrold SL, Moore D, Potter U, Charnley AK: The contribution of surface Lindquist E, Lucas S, Magnuson JK, Piumi F, Raudaskoski M, Salamov A, waxes to pre-penetration growth of an entomopathogenic fungus on Schmutz J, Schwarze FW, vanKuyk PA, Horton JS, Grigoriev IV, et al: host cuticle. Mycol Res 2007, 111:240-249. Genome sequence of the model mushroom Schizophyllum commune. 28. Morschhauser J: Regulation of multidrug resistance in pathogenic fungi. Nat Biotechnol 2010, 28:957-963. Fungal Genet Biol 2010, 47:94-106. 50. Corson TW, Crews CM: Molecular understanding and modern application 29. Luz C, Netto MC, Rocha LF: In vitro susceptibility to fungicides by of traditional medicines: triumphs and trials. Cell 2007, 130:769-774. invertebrate-pathogenic and saprobic fungi. Mycopathologia 2007, 51. Yan WJ, Li TH, Lin QY, Song B, Jiang ZD: Safety assessment of Cordyceps 164:39-47. guangdongensis. Food Chem Toxicol 2010, 48:3080-3084. Zheng et al. Genome Biology 2011, 12:R116 Page 21 of 21 http://genomebiology.com/2011/12/11/R116 52. Heitman J: Evolution of eukaryotic microbial pathogens via covert sexual 83. Gardiner DM, Howlett BJ: Bioinformatic and expression analysis of the reproduction. Cell Host Microbe 2010, 8:86-99. putative gliotoxin biosynthetic gene cluster of Aspergillus fumigatus. 53. Keeney S: Mechanism and control of meiotic recombination initiation. FEMS Microbiol Lett 2005, 248:241-248. Curr Top Dev Biol 2001, 52:1-53. 84. Haese A, Schubert M, Herrmann M, Zocher R: Molecular characterization 54. Masson JY, West SC: The Rad51 and Dmc1 recombinases: a non-identical of the enniatin synthetase gene encoding a multifunctional enzyme twin relationship. Trends Biochem Sci 2001, 26:131-136. catalyzing N-methyldepsipeptide formation in Fusarium scirpi. Mol 55. Fisher RP: Secrets of a double agent: CDK7 in cell-cycle control and Microbiol 1993, 7:905-914. transcription. J Cell Sci 2005, 118:5171-5180. 85. 
Scott-Craig JS, Panaccione DG, Pocard JA, Walton JD: The cyclic peptide 56. Schatz MC, Delcher AL, Salzberg SL: Assembly of large genomes using synthetase catalyzing HC-toxin production in the filamentous fungus second-generation sequencing. Genome Res 2010, 20:1165-1173. Cochliobolus carbonum is encoded by a 15.7-kilobase open reading 57. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M: Gene frame. J Biol Chem 1992, 267:26044-26049. prediction in novel fungal genomes using an ab initio algorithm with 86. Weber G, Sch?rgendorfer K, Schneider-Scherzer E, Leitner E: The peptide unsupervised training. Genome Res 2008, 18:1979-1990. synthetase catalyzing cyclosporine production in Tolypocladium niveum 58. Zhang Z, Carriero N, Zheng D, Karro J, Harrison PM, Gerstein M: is encoded by a giant 45.8-kilobase open reading frame. Curr Genet 1994, PseudoPipe: an automated pseudogene identification pipeline. 26:120-125. Bioinformatics 2006, 22:1437-1439. 87. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, 59. SignalP 3.0 Server.. [http://www.cbs.dtu.dk/services/SignalP/]. McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, 60. RepeastMasker Server.. [http://www.repeatmasker.org/]. Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, 61. Benson G: Tandem repeats finder: a program to analyze DNA sequences. Kirchner J, Fearon K, Mao J, Corcoran K: Gene expression analysis by Nucleic Acids Res 1999, 27:573-580. massively parallel signature sequencing (MPSS) on microbead arrays. 62. Kohany O, Gentles AJ, Hankus L, Jurka J: Annotation, submission and Nat Biotechnol 2000, 18:630-634. screening of repetitive elements in Repbase: RepbaseSubmitter and 88. Bruno VM, Wang Z, Marjani SL, Euskirchen GM, Martin J, Sherlock G, Censor. BMC Bioinformatics 2006, 7:474. Snyder M: Comprehensive annotation of the transcriptome of the 63. 
Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: human fungal pathogen Candida albicans using RNA-seq. Genome Res maximum likelihood phylogenetic analysis using quartets and parallel 2010, 20:1451-1458. computing. Bioinformatics 2002, 18:502-504. 89. Benjamini Y, Yekutieli D: The control of the false discovery rate in 64. Taylor JW, Berbee M: Dating divergences in the Fungal Tree of Life: multiple testing under dependency. Ann Stat 2001, 29:1165-1188. review and new analyses. Mycologia 2006, 98:838-849. 65. Lucking R, Huhndorf S, Pfister DH, Plata ER, Lumbsch HT: Fungi evolved doi:10.1186/gb-2011-12-11-r116 right on track. Mycologia 2009, 101:810-822. Cite this article as: Zheng et al.: Genome sequence of the insect 66. InterPro Protein Sequence Analysis & Classification.. [http://www.ebi.ac. pathogenic fungus Cordyceps militaris, a valued traditional chinese uk/interpro/]. medicine. Genome Biology 2011 12:R116. 67. Pfam Database.. [http://pfam.sanger.ac.uk/]. 68. Schomburg D, Schomburg IL: Enzyme databases. Methods Mol Biol 2010, 609:113-128. 69. Cytochrome P450 Server.. [http://drnelson.uthsc.edu/CytochromeP450. html]. 70. Saier MH Jr, Tran CV, Barabote RD: TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res 2006, 34:D181-186. 71. GPCR Database.. [http://www.gpcr.org/7tm/]. 72. PHI-base: Pathogen-Host Interactions.. [http://www.phi-base.org]. 73. Kanehisa M, Goto S, Kawashima S, Nakaya A: The KEGG databases at GenomeNet. Nucleic Acids Res 2002, 30:42-46. 74. SMURF.. [http://www.jcvi.org/smurf/index.php]. 75. Anand S, Prasad MV, Yadav G, Kumar N, Shehara J, Ansari MZ, Mohanty D: SBSPKS: structure based sequence analysis of polyketide synthases. Nucleic Acids Res 2010, 38:W487-496. 76. Kim YT, Lee YR, Jin J, Han KH, Kim H, Kim JC, Lee T, Yun SH, Lee YW: Two different polyketide synthase genes are required for synthesis of zearalenone in Gibberella zeae. 
Mol Microbiol 2005, 58:1102-1113. 77. Dao HP, Mathieu F, Lebrihi A: Two primer pairs to detect OTA producers by PCR method. Int J Food Microbiol 2005, 104:61-67. 78. Artigot MP, Loiseau N, Laffitte J, Mas-Reguieg L, Tadrist S, Oswald IP, Puel O: Molecular cloning and functional characterization of two CYP619 cytochrome P450s involved in biosynthesis of patulin in Aspergillus clavatus. Microbiology 2009, 155:1738-1747. 79. Proctor RH, Desjardins AE, Plattner RD, Hohn TM: A polyketide synthase gene required for biosynthesis of fumonisin mycotoxins in Gibberella Submit your next manuscript to BioMed Central fujikuroi mating population A. Fungal Genet Biol 1999, 27:100-112. and take full advantage of: 80. Shimizu T, Kinoshita H, Ishihara S, Sakai K, Nagai S, Nihira T: Polyketide synthase gene responsible for citrinin biosynthesis in Monascus purpureus. Appl Environ Microbiol 2005, 71:3453-3457. ? Convenient online submission 81. Ehrlich KC, Chang PK, Yu J, Cotty PJ: Aflatoxin biosynthesis cluster gene ? Thorough peer review cypA is required for G aflatoxin formation. Appl Environ Microbiol 2004, ? No space constraints or color figure charges 70:6518-6524. 82. Yu JH, Leonard TJ: Sterigmatocystin biosynthesis in Aspergillus nidulans ? Immediate publication on acceptance requires a novel type I polyketide synthase. J Bacteriol 1995, ? Inclusion in PubMed, CAS, Scopus and Google Scholar 177:4792-4800. ? Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit