FishCODE: a web-based information platform for comprehensive omics data exploration in fish research

Heng Li,Wanting Zhang,Keyi Ren,Hong Yang,Lei Zhang,Waqar Younas,Yingyin Cheng,Yaping Wang,Mijuan Shi,Xiao-Qin Xia
DOI: https://doi.org/10.1101/2024.09.25.614839
2024-09-27
Abstract: In terms of omics data utilization, the analysis functions of current fish databases are mostly simple transcription-level tools for obtaining the co-expression levels of specified genes or visualizing data for multiple genes; they do not let users perform comprehensive omics data analysis. Furthermore, the gene-level information provided by existing multispecies fish genomics databases is incomplete, and no comprehensive portal offers multidimensional genetic information. To address these challenges, we collected extensive multi-omics information on 35 fishes and established the first comprehensive multi-omics data information platform for fish, FishCODE (http://bioinfo.ihb.ac.cn/fishcode). We collected the experimental background of datasets pertaining to the target fishes, selected datasets that cover a broad spectrum of research areas, and downloaded the corresponding raw omics data from public repositories such as the Sequence Read Archive (SRA). After analysis through a unified pipeline, FishCODE contains 11,216 samples from 540 genomic, transcriptomic, and methylomic datasets. These data encompass transcript structure and expression, gene methylation levels, protein domains, protein subcellular localization, protein interactions, best-matched proteins (Swiss-Prot), associated SNP site information (47,111,018), orthologous genes, phylogenetic trees, and GO/KEGG annotations. To facilitate comparison, we annotated the experimental backgrounds of the datasets in the FishCODE, FishGET, PhyloFish, FishSED, and FishSCT databases using the Fish Experimental Condition Ontology (FECO). Currently, the FishCODE omics datasets include 146 unique experimental condition words, 654 cumulative experimental condition words, and 13 species with rich experimental background (more than 20 unique FECO words).
These figures are 3.5 times (42), 8.3 times (74), and 6.5 times (2) those of the second-ranked database in each category, respectively. We generated word clouds for the experimental condition vocabularies of FishCODE and FishGET, which illustrate the greater richness of FishCODE's experimental background.
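As a quick arithmetic check on the comparison above, the following sketch recomputes the fold differences from the counts quoted in the abstract. It assumes that the parenthesised numbers (42, 74, 2) are the second-ranked database's values in each category; the dictionary keys and the helper name `fold_ratio` are illustrative and not part of FishCODE. Under that reading, the first and third ratios match the quoted 3.5 and 6.5, while the second computes to about 8.8 rather than the quoted 8.3, so one of those figures may be stated differently in the full paper.

```python
# Counts quoted in the abstract, as (FishCODE value, second-ranked value).
# Assumption: the parenthesised numbers are the runner-up database's counts.
counts = {
    "unique FECO words": (146, 42),
    "cumulative FECO words": (654, 74),
    "species with >20 unique FECO words": (13, 2),
}


def fold_ratio(top: int, runner_up: int) -> float:
    """Return how many times larger `top` is than `runner_up`, to one decimal."""
    return round(top / runner_up, 1)


ratios = {name: fold_ratio(top, second) for name, (top, second) in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```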
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the limited analysis functions of current fish databases, which are relatively simple and focus mainly on the transcriptional level, such as obtaining the co-expression level of specified genes or visualizing data for multiple genes, and which cannot support comprehensive multi-omics data analysis. In addition, the gene-level information provided by existing multi-species fish genome databases is incomplete, and there is no comprehensive portal that provides multi-dimensional genetic information. To meet these challenges, the researchers collected extensive multi-omics information on 35 fish species and established the first comprehensive fish multi-omics data information platform, FishCODE (<http://bioinfo.ihb.ac.cn/fishcode>). The FishCODE platform not only contains rich genomic, transcriptomic, and methylomic data but also provides a variety of tools: single-nucleotide polymorphism annotation, phylogenetic tree construction, a comparative genome browser, conventional, cross-species, and time-series transcriptome analysis, and epigenomic analysis. These tools are closely linked to the diverse experimental background data in the database, so users can conduct exploratory analysis without preliminary research or data download, cleaning, or upload. Through this platform, users can obtain valuable information such as details about candidate genes, enriched biological pathways, and metabolic pathway dynamics.