FishCODE: a web-based information platform for comprehensive omics data exploration in fish research
Heng Li,Wanting Zhang,Keyi Ren,Hong Yang,Lei Zhang,Waqar Younas,Yingyin Cheng,Yaping Wang,Mijuan Shi,Xiao-Qin Xia
DOI: https://doi.org/10.1101/2024.09.25.614839
2024-09-27
Abstract: Existing fish databases offer only relatively simple tools for using omics data, mostly at the transcriptional level: they report the co-expression of specified genes or visualize data for multiple genes, but do not let users perform comprehensive omics data analysis. Furthermore, the gene-level information provided by current multispecies fish genomics databases is incomplete, and no comprehensive portal offers multidimensional genetic information. To address these challenges, we collected extensive multi-omics information on 35 fish species and established FishCODE (http://bioinfo.ihb.ac.cn/fishcode), the first comprehensive multi-omics information platform for fish. We collected the experimental background of datasets pertaining to the target fishes, selected datasets that span a broad spectrum of research areas, and downloaded the corresponding raw omics data from public repositories such as the Sequence Read Archive (SRA). Processed through a unified analysis pipeline, FishCODE contains 11,216 samples from 540 genomic, transcriptomic, and methylomic datasets. These data encompass transcript structure and expression, gene methylation levels, protein domains, protein subcellular localization, protein interactions, best-matched proteins (Swiss-Prot), associated SNP sites (47,111,018), orthologous genes, phylogenetic trees, and GO/KEGG annotations. To facilitate comparison, we annotated the experimental backgrounds of the datasets in the FishCODE, FishGET, PhyloFish, FishSED, and FishSCT databases using the Fish Experimental Condition Ontology (FECO). Currently, the omics datasets in FishCODE cover 146 unique experimental condition words, 654 cumulative experimental condition words, and 13 species with a rich experimental background (more than 20 unique FECO words).
These values are 3.5, 8.3, and 6.5 times those of the second-ranked database in each category (42, 74, and 2, respectively). We generated word clouds for the experimental condition vocabularies of FishCODE and FishGET, which illustrate the richer experimental background of FishCODE.
Bioinformatics