High-quality Assembly of Dermatophagoides pteronyssinus Genome and Transcriptome Reveals a Wide Range of Novel Allergens.
Xiao-Yu Liu, Kevin Yi Yang, Ming-Qiang Wang, Jamie Sui-Lam Kwok, Xi Zeng, Zhiyuan Yang, Xiao-Jun Xiao, Carol Po-Ying Lau, Ying Li, Zhi-Ming Huang, Jin-Ge Ba, Aldrin Kay-Yuen Yim, Chun-Yan Ouyang, Sai-Ming Ngai, Ting-Fung Chan, Elaine Lai-Han Leung, Liang Liu, Zhi-Gang Liu, Stephen Kwok-Wing Tsui
DOI: https://doi.org/10.1016/j.jaci.2017.11.038
2018-01-01
Abstract: House dust mite (HDM) exposure is a strong risk factor for childhood asthma in various parts of the world [1: Huss K, Adkinson NF Jr, Eggleston PA, Dawson C, Van Natta ML, Hamilton RG. House dust mite and cockroach exposure are strong risk factors for positive allergy skin test responses in the Childhood Asthma Management Program. J Allergy Clin Immunol. 2001;107:48-54]. The 2 predominant species of HDM are Dermatophagoides pteronyssinus (also known as European HDM) and Dermatophagoides farinae (also known as American HDM). In our previous study, the draft genome and transcriptome of D farinae revealed a spectrum of previously unknown allergens [2: Chan TF, Ji KM, Yim AK, Liu XY, Zhou JW, Li RQ, et al. The draft genome, transcriptome, and microbiome of Dermatophagoides farinae reveal a broad spectrum of dust mite allergens. J Allergy Clin Immunol. 2015;135:539-548]. In this study, we aimed to generate a high-quality genome and transcriptome of D pteronyssinus. We further combined genomic and proteomic approaches to uncover previously unrecognized D pteronyssinus allergens and to identify a number of potential mite allergens on the basis of homology searches against various nonmite sources. With these 2 closely related transcriptomes available, we also compared the expression of their allergen genes.

We used a hybrid assembly approach combining 3 sequencing technologies (PacBio Sequel, Illumina HiSeq 2000, and Thermo Fisher Ion Torrent) to generate 62 Gb of DNA sequencing data. After building a de novo draft genome assembly, we performed scaffolding, gap filling, and polishing (see this article's Methods section in the Online Repository at www.jacionline.org) to obtain a 66.8-Mb final assembly with 1390 contigs and 634 scaffolds, a scaffold N50 of 194 kb, and a contig N50 of 80 kb. On the basis of the genome size estimated by Canu [3: Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722-736], the D pteronyssinus genome is 68.0 to 72.5 Mb, so this assembly represents 92.1% to at most 98.2% of the estimated genome size. Summary statistics of the genome assembly are presented in Table E1 in this article's Online Repository at www.jacionline.org, and the assembled genome was submitted to the National Center for Biotechnology Information under BioProject ID PRJNA388362.

Genome completeness was assessed with Benchmarking Universal Single-Copy Orthologs (BUSCO) [4: Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210-3212]: our hybrid-assembled D pteronyssinus genome contained 90.4% of the examined BUSCOs, and 949 (89.1%) were complete. By combining 16,300 ab initio predicted genes with 13,699 genes predicted with RNA-Seq support, we annotated a total of 16,805 protein-coding genes in D pteronyssinus.

Although we obtained the complete sequences of most known D pteronyssinus allergen genes, Der p 12, 16, 17, 19, and 22 are absent from the World Health Organization and International Union of Immunological Societies Allergen Nomenclature database (Table I). Thus far, no studies have reported group 12 or group 19 allergens in D pteronyssinus or D farinae. Using Blo t 12 and Blo t 19 as reference sequences, we performed BLAST searches against both the transcriptome and the genome of D pteronyssinus and identified no significant hits. This may be because group 12 and group 19 allergens are restricted to Blomia and related mites, or because our assembled D pteronyssinus genome is incomplete.
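Whether a reference allergen has a "significant hit" depends on how the tabular BLAST output is filtered. The Methods give the tblastn command (-evalue 1e-10, -outfmt 6) but not the in-house Perl scripts, so the following is only a Python sketch of one common approach: parse the default 12-column outfmt 6 layout, drop hits above an E-value cutoff, and keep the highest-bitscore hit per query. The helper names and example rows are hypothetical, not data from this study.

```python
"""Best-hit filtering of BLAST+ tabular output (-outfmt 6).

Sketch under stated assumptions: the default 12-column outfmt 6 layout,
hypothetical helper names, and invented example rows (not study data).
"""
from dataclasses import dataclass

# Default column order of BLAST+ -outfmt 6.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines):
    """Yield Hit records from tab-separated lines, skipping blanks and comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        yield Hit(fields["qseqid"], fields["sseqid"], float(fields["pident"]),
                  int(fields["length"]), float(fields["evalue"]),
                  float(fields["bitscore"]))


def best_hits(hits, max_evalue=1e-10):
    """Return {query: best Hit}, keeping only hits with E-value <= max_evalue.

    Queries with no hit under the cutoff are absent from the result,
    which is how "no significant hit" shows up here.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    rows = [
        "query_A\tscaffold_1\t88.0\t300\t36\t0\t1\t300\t101\t1000\t1e-150\t520",
        "query_A\tscaffold_9\t41.0\t120\t70\t2\t5\t124\t10\t370\t1e-12\t80",
        "query_B\tscaffold_3\t29.0\t55\t39\t1\t3\t57\t200\t365\t0.2\t31",
    ]
    for query, hit in best_hits(parse_outfmt6(rows)).items():
        print(query, hit.subject, hit.pident, hit.evalue)
```

In this toy input, query_B has no hit under 1e-10 and therefore drops out, which is the pattern a "no significant hit" result corresponds to.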
We did, however, identify putative novel Der p 16 and Der p 22 genes in our high-quality genome by inferring homology from Der f 16 and Der f 22, which are known D farinae allergens. Using the same ortholog-searching approach, we also obtained the sequences and gene structures of putative novel Der p 25 to Der p 33 (Table I). We located the Der p 34 gene on the genome by using Der f 34, an enamine/imine deaminase, but only 76 amino acid residues of the translated Der p 34 aligned to Der f 34, with a low sequence identity of 27%. We also identified the Der p 35 gene on the basis of the recently published protein sequence of Der f 35, a protein with an MD-2–related lipid recognition domain [5: Fujimura T, Aki T, Isobe T, Matsuoka A, Hayashi T, Ono K, et al. Der f 35: an MD-2-like house dust mite allergen that cross-reacts with Der f 2 and Pso o 2. Allergy. 2017;72:1728-1736]. However, neither Der p 34 nor Der p 35 had RNA-Seq support. The gene encoding the recently identified profilin-like allergen Der p 36 [6: Bordas-Le Floch V, Le Mignon M, Bussieres L, Jain K, Martelet A, Baron-Bodo V, et al. A combined transcriptome and proteome analysis extends the allergome of house dust mite Dermatophagoides species. PLoS One. 2017;12:e0185830] was present in both the D farinae and D pteronyssinus genomes. To extend the search for allergen orthologs, we also screened the genome with all allergens in the World Health Organization and International Union of Immunological Societies (WHO/IUIS) and AllergenOnline databases. These analyses indicated that 53 D pteronyssinus protein sequences were similar to allergens previously described in species other than mites (see Table E2 in this article's Online Repository at www.jacionline.org).

Table I. Summary of the allergens Der p 1 to Der p 36 found in D pteronyssinus

Expected D pteronyssinus gene | Transcript tag | Biochemical name | No. of exons | Protein length (aa) | Identity (%)* | Reference homologue
1 | Derp.2723 | Cysteine protease | 5 | 320 | 82.9 | —
2 | Derp.7375 | NPC2 family | 2 | 146 | 87.6 | —
3 | Derp.5920 | Trypsin | 2 | 261 | 80.1 | —
4 | Derp.1782 | Alpha amylase | 3 | 523 | 86.5 | —
5 | Derp.518 | IgE-binding protein | 2 | 97 | 75.2 | —
6 | Derp.10755 | Chymotrypsin | 3 | 281 | 75.1 | —
7 | Derp.9162 | Lipopeptide-binding protein | 2 | 272 | 85.9 | —
8 | Derp.11458 | Glutathione S-transferase | 3 | 185 | 70.5 | —
9 | Derp.7210 | Collagenolytic serine protease | 3 | 249 | 87.3 | —
10 | Derp.7036 | Tropomyosin | 6 | 144 | 97.9 | —
11 | Derp.3414 | Paramyosin | 11 | 875 | 98.1 | —
12 | Not identified | IgE-binding protein | — | — | — | Blo t 12
13 | Derp.1993 | Cytosolic fatty acid–binding protein | 2 | 131 | 95.4 | —
14 | Derp.1332 | Apolipophorin | 6 | 1,033 | 77.4 | —
15 | Derp.2289 | Chitinase-like protein | 3 | 156 | 85.1 | —
16 | Derp.8966 | Gelsolin/villin | 7 | 480 | 90.4 | Der f 16†
17 | Not identified | — | — | — | — | No available sequence data
18 | Derp.14002 | Chitin-binding protein | 3 | 462 | 89.2 | —
19 | Not identified | Antimicrobial peptide homologue | — | — | — | Blo t 19
20 | Derp.6960 | Arginine kinase | 5 | 356 | 95.2 | —
21 | Derp.593 | Coiled coil structural protein | 2 | 133 | 71.2 | —
22 | Derp.6007 | Lipid-binding protein | 7 | 338 | 75.5 | Der f 22
23 | Derp.7561 | Peritrophin-like protein domain | 2 | 845 | — | —
24 | Derp.7254 | Ubiquinol-cytochrome c reductase binding protein | 4 | 118 | 96.6 | —
25 | Derp.4090 | Triosephosphate isomerase | 4 | 247 | 78.0 | Der f 25
26 | Derp.10154 | Myosin alkali light chain | 6 | 121 | 76.9 | Der f 26
27 | Derp.5029 | Serpin | 1 | 247 | 72.6 | Der f 27
28 | Derp.10281 | Heat shock protein | 4 | 655 | 67.9 | Der f 28
29 | Derp.10531 | Peptidyl-prolyl cis-trans isomerase (cyclophilin) | 8 | 247 | 75.8 | Der f 29
30 | Derp.5413 | Ferritin | 6 | 107 | 81.0 | Der f 30
31 | Derp.5921 | Cofilin | 4 | 148 | 83.2 | Der f 31
32 | Derp.8965 | Secreted inorganic pyrophosphatase | 4 | 391 | 82.8 | Der f 32
33 | Derp.546 | Alpha-tubulin | 1 | 396 | 93.6 | Der f 33
34 | — | Enamine/imine deaminase | — | 76 | 27.0 | Der f 34
35 | — | MD-2–related lipid recognition protein | 2 | 143 | 82.1 | Der f 35
36 | Derp.4698 | Profilin | 4 | 217 | 77.2 | Der f 36

If the D pteronyssinus allergen sequence was not available from WHO/IUIS, the allergen was identified by searching with the homologous allergen protein sequence from D farinae.
* Identity relative to the corresponding D farinae allergen. The identity of Der f 5 and Der f 9 was calculated using data curated in Allergome.
† Reference homologue allergen gene names beginning with "Der f" are from D farinae.

Using 2-dimensional PAGE with Coomassie Blue staining, we separated HDM proteins with molecular weights ranging from 15 to 100 kDa. Immunoblotting with sera from patients with HDM allergy showed that 64 protein spots bound specific IgE; these spots are numbered on the 2-dimensional gel image (see Fig E1 in this article's Online Repository at www.jacionline.org). All 64 proteins were analyzed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and the sequenced peptides were searched against our predicted genes.
Of these 64 proteins, 50 were identified as allergens whose sequences were supported by RNA-Seq data (Table II).

Table II. MALDI-TOF MS analysis of 50 circled spots from 2-dimensional PAGE of D pteronyssinus total protein extract, determined by IgE immunoblotting with sera from patients with HDM allergy

Transcript tag | Protein name | Spot ID*
Previously identified allergens
Derp.2723 | Der p 1 allergen | 23, 24, 25
Derp.6761 | Der p 29 allergen-like | 43
Derp.10154 | Der p 30 allergen-like | 33, 34
Derp.8965 | Der p 32 allergen | 44
Derp.10737 | DFP2-like proteins† | 8
Derp.1548 | DFP2-like protein | 45
Newly identified allergens
Derp.7585 | Actin | 46
Derp.4204 | Cytochrome c oxidase subunit 5B, mitochondrial-like protein | 38
Derp.3629 | EF-hand domain–containing protein | 32
Derp.8299 | Glutathione transferase delta-like Dp7018E11 | 47
Derp.11987 | Heat shock protein 20-like protein 2 | 48
Derp.5129 | Muscle LIM protein Mlp84B-like protein | 49
Derp.6894 | Heat shock protein 20-like protein | 50
Derp.3626 | Hypothetical protein BLA29_004325 | 27
Derp.316 | Hypothetical protein BLA29_008157 | 41, 42
Derp.12160 | Muscle-specific protein 20-like protein | 7, 36
Derp.6652 | Myosin regulatory light chain 2-like | 28
Derp.10809 | PREDICTED: histone H2A.V | 31
Derp.13554 | Sodium-dependent glucose transporter-like protein 1 | 9, 10, 11, 19, 21
Derp.7036 | Tropomyosin | 1, 13, 14, 15, 16, 17
Derp.723 | Troponin T-like protein | 2, 3, 5
Derp.7254 | Ubiquinol-cytochrome c reductase binding protein-like protein | 35

MALDI-TOF MS, Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry.
* Multiple spots associated with a particular protein may represent isoallergens, an allergen with varying carbohydrate content, or degradation components.
† DFP2: Dermatophagoides farinae most abundant protein 2, UniProt accession A1KXC2.
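Table II rests on matching MALDI-TOF peptide masses to the predicted D pteronyssinus proteins. The source does not name the search engine or its settings, so the following is only a minimal peptide-mass-fingerprint sketch under simple assumptions: trypsin cleaves after K or R except before P, there are no missed cleavages or modifications, ions are singly protonated monoisotopic [M+H]+, and the tolerance is a fixed value in Da. The function names and toy sequences are hypothetical.

```python
"""Peptide mass fingerprint matching: a minimal sketch.

Assumptions (not from the source): trypsin cleaves after K/R unless the
next residue is P, no missed cleavages or modifications, monoisotopic
[M+H]+ masses, a fixed tolerance in Da, and count-of-matches scoring.
Helper names and sequences are hypothetical.
"""
import re

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (N- and C-termini)
PROTON = 1.00728   # singly protonated ion, [M+H]+


def tryptic_digest(sequence):
    """Split a protein into tryptic peptides (zero-width split, Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, protein, tol=0.2, min_len=4):
    """Number of observed peak masses within tol Da of a theoretical peptide."""
    theoretical = [mh_mass(p) for p in tryptic_digest(protein) if len(p) >= min_len]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_proteins(observed, proteins, tol=0.2):
    """Rank {protein_id: sequence} by number of matched peaks, best first."""
    scores = {pid: count_matches(observed, seq, tol) for pid, seq in proteins.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    candidates = {"toy_1": "MASTKLLVEGRAAFK", "toy_2": "GGGGGGGGGGGG"}
    peaks = [mh_mass(p) for p in tryptic_digest(candidates["toy_1"])]
    print(rank_proteins(peaks, candidates))  # [('toy_1', 3), ('toy_2', 0)]
```

Production search engines also model missed cleavages, fixed and variable modifications, and use probabilistic scores; counting matched peaks is simply the most transparent way to show the idea.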
To characterize Der p 25 to 33, the recombinant proteins were expressed in Escherichia coli Origami (DE3) and purified by Ni2+ affinity chromatography on a fast protein liquid chromatography system. The allergenicity of Der p 25, 26, 28, 32, and 33 was assessed by ELISA using sera from 26 patients with HDM allergy, with 4 nonallergic human sera as negative controls. The IgE-binding rates were 53.8%, 61.5%, 38.5%, 46.2%, and 65.3%, respectively (see Fig E2 in this article's Online Repository at www.jacionline.org). Therefore, among the novel D pteronyssinus allergens, Der p 25, Der p 26, and Der p 33 should be considered major allergens in the population studied on the basis of frequency of reactivity (IgE binding in more than 50% of the patients tested).

We also compared the gene expression levels of D farinae and D pteronyssinus allergens. The expression profiles of allergens in the 2 species were quite distinct: the 5 most highly expressed allergens were Der p 1, 2, 5, 21, and 23 in D pteronyssinus, whereas they were Der f 10, 13, 21, 26, and 31 in D farinae (see Fig E3 in this article's Online Repository at www.jacionline.org).

We also studied transcriptome-wide differential expression between D pteronyssinus and D farinae. Gene Ontology (GO) enrichment analysis of the differentially expressed genes revealed 22 GO terms significantly enriched among genes downregulated in D pteronyssinus and 1 GO term significantly enriched among upregulated genes.
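GO enrichment of a differentially expressed gene list is usually framed as a one-sided over-representation test: given how many background genes carry a term, is the term seen more often in the list than chance predicts? The Methods state that DAVID 6.8 was used; DAVID reports a modified Fisher exact p-value (the EASE score), which is more conservative than the plain hypergeometric test below. This Python sketch, with hypothetical function names and toy counts, shows the basic calculation and a Benjamini-Hochberg adjustment across terms.

```python
"""Over-representation test for one GO term, plus Benjamini-Hochberg.

A plain one-sided hypergeometric test; DAVID's EASE score is a more
conservative modification and is not reproduced here. Counts are toy values.
"""
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / comb(population, draws)


def go_enrichment(list_hits, list_size, background_hits, background_size):
    """Return (fold enrichment, one-sided p-value) for a single GO term.

    list_hits: genes in the DE list annotated with the term
    list_size: annotated genes in the DE list
    background_hits, background_size: the same counts for the background
    """
    fold = (list_hits / list_size) / (background_hits / background_size)
    p = hypergeom_sf(list_hits, background_size, background_hits, list_size)
    return fold, p


def benjamini_hochberg(pvalues):
    """Step-up BH adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy counts: 12 of 300 DE genes vs 80 of 8000 background genes carry a term.
    fold, p = go_enrichment(12, 300, 80, 8000)
    print(round(fold, 2), p)
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Exact integer arithmetic via math.comb avoids floating-point underflow in the binomial coefficients; for very large backgrounds a library routine such as a log-space hypergeometric survival function would be faster.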
The top 3 GO terms enriched among the downregulated genes were "plasma membrane," "open tracheal system development," and "calcium ion binding," whereas the GO term enriched among the upregulated genes was "proteolysis."

Because the genomic DNA used in this study was extracted from a mixed culture of D pteronyssinus individuals, the highly heterozygous DNA sample made assembling a complete genome very difficult [7: Kelley DR, Salzberg SL. Detection and correction of false segmental duplications caused by genome mis-assembly. Genome Biol. 2010;11:R28]. Therefore, the scaffolds could not yet be arranged in chromosome order. Our previously published D farinae genome comprised only 53.5 Mb (∼76% of the genome) spread across 11,600 contigs, with a contig N50 of 8,538 bp and 11,085 gaps [2]. Because single-molecule real-time sequencing was adopted in this study, the final assembly represents an approximately 10-fold improvement in contig N50 and an approximately 15-fold improvement in continuity, reflecting far fewer unresolved gaps. The high error rate of the PacBio long reads was compensated for by the lower error rate of the short reads, which were used to correct the PacBio long reads before assembly. The unprecedented contiguity of this D pteronyssinus genome provides a strong foundation for identifying novel allergens.

A previous study [8: Heymann PW, Chapman MD, Aalberse RC, Fox JW, Platts-Mills TA. Antigenic and structural analysis of group II allergens (Der f II and Der p II) from house dust mites (Dermatophagoides spp). J Allergy Clin Immunol. 1989;83:1055-1067] reported that the protein levels of Der p 1 and Der p 2 were higher than those of Der f 1 and Der f 2, respectively.
Furthermore, the protein level of Der f 1 was higher than that of Der f 2. Another report also showed that the amount of Der p 1 protein in HDM extracts was higher than that of Der p 2 [9: Meyer CH, Bond JF, Chen MS, Kasaian MT. Comparison of the levels of the major allergens Der p I and Der p II in standardized extracts of the house dust mite, Dermatophagoides pteronyssinus. Clin Exp Allergy. 1994;24:1041-1048], whereas we found that the expression of Der p 1 was only slightly higher than that of Der p 2 in living D pteronyssinus.

In summary, we constructed a high-quality genome of D pteronyssinus, which (1) provided full gene structures of dust mite allergens and allergen homologues and (2) enabled, for the first time, a comprehensive transcriptome analysis of D pteronyssinus and D farinae that revealed distinctively expressed allergen genes between the 2 dust mites. These results are important resources for the future development of diagnostics and immunotherapeutic vaccines.

D pteronyssinus mites were cultured according to our previous method with some modifications [E1: Chan TF, Ji KM, Yim AK, Liu XY, Zhou JW, Li RQ, et al. The draft genome, transcriptome, and microbiome of Dermatophagoides farinae reveal a broad spectrum of dust mite allergens. J Allergy Clin Immunol. 2015;135:539-548]. In brief, the culture medium was prepared from a mixture of 220 g of soybean powder and a ground-up vitamin tablet. D pteronyssinus mites were cultured at 25°C in a small air-filtered room at 70% to 75% relative humidity for 50 days. The cultured mites were enriched with a custom setup consisting of an incandescent lamp, a modified funnel, and a small container. The cultured mites were placed in the modified funnel, which was topped with a sieve (80 mesh) and connected to the small container.
Under illumination by the incandescent lamp, the mites, which avoid heat, actively moved through the sieve into the funnel and eventually into the small container. This setup enabled the collection of a larger quantity of mites. After washing with PBS and centrifugation to remove the culture medium, the precipitated dust mites were checked for purity under a light microscope.

Approximately 400 mg of D pteronyssinus body mass was washed with 1× PBS and homogenized to a fine powder with an Ultra-Turrax T25 homogenizer (IKA Works GmbH & Co. KG, Staufen, Germany) at 13,500 rpm. Genomic DNA was extracted from the homogenate with a Genomic DNA Extraction Kit (Qiagen GmbH, Hilden, Germany) using the tissue lysis protocol according to the manufacturer's instructions. The integrity and quantity of genomic DNA were determined by 0.5% agarose gel electrophoresis and with a Qubit Fluorometer (Thermo Fisher Scientific Inc, Waltham, Mass), respectively. Next-generation sequencing was performed on an Illumina HiSeq 2000 to produce 90-bp paired-end reads from libraries with 4 insert sizes (200 bp, 500 bp, 2 kbp, and 5 kbp; see Table E1), and on Ion Torrent PGM and PacBio Sequel systems to produce single-end reads with mean read lengths of 188 bp and 3,576 bp, respectively.

As in the first step of DNA extraction, approximately 400 mg of D pteronyssinus body mass was washed with 1× PBS and homogenized in TRIzol reagent (Thermo Fisher Scientific) with the Ultra-Turrax T25 homogenizer (IKA) at 13,500 rpm. Total RNA was isolated from the homogenate with TRIzol followed by a PureLink RNA Mini Kit (Thermo Fisher Scientific). The quantity and quality of the total RNA sample were measured with a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific). First-strand cDNA was synthesized with a SMARTer PCR cDNA Synthesis Kit (Takara Bio USA Inc, Madison, Wis) following the manufacturer's protocol.
In brief, 1 μg of total RNA was mixed with 3′ SMART CDS Primer II A and incubated in a hot-lid thermal cycler at 72°C for 3 minutes and then at 42°C for 2 minutes. The reaction was then combined with 5.5 μL of a master mix containing 5× first-strand buffer, 1,4-dithiothreitol (100 mM), dNTP mix (10 mM), SMARTer II A oligonucleotide (12 μM), RNase inhibitor, and SMARTScribe reverse transcriptase (100 U/μL). The reverse transcription reaction was incubated at 42°C for 1 hour and then heated at 70°C for 10 minutes. The cDNA product was amplified with an Advantage 2 PCR Kit (Takara Bio USA Inc) by mixing it with 90 μL of PCR master mix containing 10× Advantage 2 PCR buffer, 50× dNTP mix (10 mM), 5′ PCR Primer II A (12 μM), and 50× Advantage 2 polymerase mix. PCR conditions were as follows: 95°C for 1 minute, followed by 20 cycles of 95°C for 15 seconds, 65°C for 30 seconds, and 68°C for 3 minutes. The quality of the double-stranded cDNA was assessed with an Agilent 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany). Next-generation sequencing was performed on an Illumina HiSeq 2500 to produce 150-bp paired-end reads.

Ion Torrent reads were assembled with the Newbler assembler version 2.8, whereas Illumina reads from the different insert-size libraries were assembled with SOAPdenovo version 2.04-r240 [E2: Luo RB, Liu BH, Xie YL, Li ZY, Huang WH, Yuan JY, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1:18], Velvet version 1.2.10 [E3: Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821-829], and Platanus version 1.2.4 [E4: Kajitani R, Toshimoto K, Noguchi H, Toyoda A, Ogura Y, Okuno M, et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 2014;24:1384-1395].
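The Methods do not say how the candidate assemblies from these tools were compared, but contiguity statistics like those reported in the main text (contig and scaffold N50, number of gaps) are a common basis. This Python sketch, with hypothetical function names and a toy input, computes them from scaffold sequences, assuming each maximal run of Ns inside a scaffold is one gap. It then reproduces the arithmetic behind figures quoted in the text: 66.8 Mb against the 68.0 to 72.5 Mb estimate, and the fold changes relative to the earlier D farinae assembly.

```python
"""Contiguity statistics from scaffold sequences: a minimal sketch.

Assumptions: each maximal run of N/n inside a scaffold is one gap and
separates two contigs; helper names and the toy input are hypothetical.
"""
import re

GAP = re.compile(r"[Nn]+")


def n50(lengths):
    """Smallest length L such that sequences of length >= L cover >= 50% of the total."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0


def assembly_stats(scaffolds):
    """Scaffold/contig counts, gap count, N50 values, and total span in bp."""
    contigs = [c for s in scaffolds for c in GAP.split(s) if c]
    # Leading/trailing Ns are not gaps between contigs, so strip them first.
    gaps = sum(len(GAP.findall(s.strip("Nn"))) for s in scaffolds)
    return {
        "scaffolds": len(scaffolds),
        "contigs": len(contigs),
        "gaps": gaps,
        "scaffold_N50": n50([len(s) for s in scaffolds]),
        "contig_N50": n50([len(c) for c in contigs]),
        "span_bp": sum(len(s) for s in scaffolds),
    }


if __name__ == "__main__":
    toy = ["ACGTACGTAC" + "N" * 5 + "ACGT", "ACGTACG", "ACNNACGTA"]
    print(assembly_stats(toy))

    # Arithmetic on figures quoted in the text.
    print(round(100 * 66.8 / 72.5, 1), round(100 * 66.8 / 68.0, 1))  # 92.1 98.2
    gaps_now = 1390 - 634  # assumes every gap separates 2 contigs
    print(gaps_now, round(11085 / gaps_now, 1), round(80_000 / 8_538, 1))  # 756 14.7 9.4
```

Under the stated assumption, 1390 contigs in 634 scaffolds imply 756 gaps; 11,085 / 756 ≈ 14.7 and 80 kb / 8,538 bp ≈ 9.4, consistent with the approximately 15-fold and 10-fold improvements described in the discussion.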
The PacBio long reads and Illumina short reads were also hybrid assembled together with SPAdes version 3.10.0 [E5: Antipov D, Korobeynikov A, McLean JS, Pevzner PA. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics. 2016;32:1009-1015], MaSuRCA version 3.2.1 [E6: Zimin AV, Marcais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. The MaSuRCA genome assembler. Bioinformatics. 2013;29:2669-2677], and DBG2OLC [E7: Ye CX, Hill CM, Wu SG, Ruan J, Ma ZS. DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies. Sci Rep. 2016;6:31900]. For the PacBio long reads, proovread version 2.13.13 [E8: Hackl T, Hedrich R, Schultz J, Forster F. proovread: large-scale high-accuracy PacBio correction through iterative short read consensus. Bioinformatics. 2014;30:3004-3011] used the high-accuracy Illumina short reads from the 500-bp insert library to correct the long reads, and Canu version 1.5 [E9: Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722-736] was then used to assemble the corrected long reads. Contigs were scaffolded with mate-pair short reads using SSPACE [E10: Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27:578-579]. SSPACE-LongRead version 1-1 [E11: Boetzer M, Pirovano W. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics. 2014;15:211] was run for several rounds to link contigs into scaffolds, progressing from the higher-accuracy circular consensus reads to the subreads. After scaffolding with PacBio long reads and mate-pair short reads, 2 further steps were performed to fill the gaps.
First, PBJelly from PBSuite v15.8.24 [E12: English AC, Richards S, Han Y, Wang M, Vee V, Qu J, et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One. 2012;7:e47768] was used to fill gaps with PacBio long reads. Second, GapFiller version 1-10 [E13: Boetzer M, Pirovano W. Toward almost closed genomes with GapFiller. Genome Biol. 2012;13:R56] was run with Illumina mate-pair short reads.

Benchmarking Universal Single-Copy Orthologs (BUSCO) version 3 is widely used to assess genome assemblies and gene sets on the basis of evolutionarily informed expectations of gene content [E14: Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210-3212]. To evaluate the completeness of the D pteronyssinus genome, BUSCO was run in genome mode.

We used 2 approaches to annotate the genome. The first was based on RNA-Seq data, which were run through the TopHat and Cufflinks pipeline (versions 2.1.1 and 2.2.1, respectively) in gene discovery mode to generate a gene annotation file in GTF format [E15: Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562-578]. The second was ab initio prediction with GlimmerHMM, which is based on a generalized hidden Markov model [E16: Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23:673-679]. We also used a second ab initio prediction program, AUGUSTUS, whose protein profile extension uses protein family–specific conservation, given as a block profile, to identify the members and exon-intron structure of a protein family [E17: Stanke M. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435-W439]. AUGUSTUS prediction was conducted using the 400 single-gene sequences from FlyBase that are disjoint from the adh122 set. tRNAscan-SE was used to identify transfer RNA genes in the D pteronyssinus nuclear genome [E18: Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955-964], and RNAmmer 1.2 was used to predict rRNA genes [E19: Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 2007;35:3100-3108]. RepeatMasker was run with default parameters to screen the D pteronyssinus genome sequences for interspersed repeats and low-complexity DNA sequences [E20: Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009;Chapter 4:Unit 4.10].

The reference allergen protein sequences were downloaded from the World Health Organization and International Union of Immunological Societies (WHO/IUIS) Allergen Nomenclature database and the AllergenOnline database [E21: Brusic V, Millot M, Petrovsky N, Gendel SM, Gigonzac O, Stelman SJ. Allergen databases. Allergy. 2003;58:1093-1100]. To identify the allergen genes in our assembled genome, we used tblastn and in-house Perl scripts. The command was as follows:

$ tblastn -query HDM_allergen.fasta -db jelly22.fa -out tblastn_out01 -evalue 1e-10 -num_threads 4 -outfmt 6 -max_hsps 1 -num_alignments 1

To quantify expression, we first extracted the coding sequence regions from the D pteronyssinus gene annotation GTF file. These sequences, together with the RNA-Seq reads, were used by Salmon to quantify expression levels [E22: Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14:417-419], yielding a transcripts per million (TPM) value for each transcript. The TPM values were normalized with Trinity, and the trimmed mean of M-values (TMM)–normalized values were taken as the expression level of each gene [E23: Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494-1512]. Differential gene expression analysis was then conducted with the R package edgeR; because there were no replicates, the dispersion was set to 0.1 [E24: Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139-140].

Drosophila melanogaster was selected as the background for functional enrichment analysis. To find homologous protein pairs between Drosophila melanogaster and D pteronyssinus, the predicted D pteronyssinus proteins were aligned with blastp against the National Center for Biotechnology Information protein database of Drosophila melanogaster [E25: McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 2004;32:W20-W25], using the parameters "-max_target_seqs 1 -culling_limit 1 -evalue 1e-10 -qcov_hsp_perc 50 -num_threads 39." The differentially expressed genes of D pteronyssinus and D farinae were then mapped to their Drosophila melanogaster homologues, and these labeled genes were subjected to functional enrichment analysis using DAVID version 6.8 [E26: Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:44-57].

One gram of D pteronyssinus bodies was weighed and pulverized with liquid nitrogen in 5.0 mL of precooled crude buffer (50 mM Tris-HCl, pH 8.5, 5 mM EDTA, 100 mM KCl, 1% w/v 1,4-dithiothreitol, 30% w/v sucrose, 1% v/v Triton X-100). An equal volume of Tris-HCl–saturated phenol (pH 8.0) was then added, followed by stirring at 4°C for 25 minutes and centrifugation at 10,000 rpm and 4°C for 25 minutes. The phenol phase (upper layer) was collected, mixed with 5 volumes of methanol containing 0.1 M ammonium acetate, and allowed to stand overnight at −20°C. After centrifugation (4°C, 8000 rpm, 5 minutes), the supernatant was removed, and the pellet was washed 3 times with acetone containing 0.2% w/v 1,4-dithiothreitol (precooled at −80°C), dried in a fume hood at room temperature, and placed in an appropriate volume (1 mL) at 4°C overnight for complete dissolution. Finally, the supernatant was collected by centrifugation (4°C, 10,000 rpm, 20 minutes), aliquoted, and stored at −80°C.

The protein extracts were separated by 2-dimensional gel electrophoresis in duplicate (300 μg of total protein per gel). One gel was used for IgE-blot analysis to locate D pteronyssinus antigens, and the second gel was stained with Coomassie Blue. For first-dimension isoelectric focusing, a total of 125 μL of protein extract was loaded into a focusing tray, using pH ranges of 4 to 7 and 3 to 10, respectively. The sample was absorbed along the entire strip, which was covered with an appropriate amount of mineral oil. The focusing tray was then placed in the isoelectric focusing instrument for the run. After isoelectric focusing, the strips were washed with distilled water, placed in equilibration buffer I for 15 minutes, and rinsed with distilled water. Th