Addressing Global Biodiversity Challenges: Ensuring Long-Term Sustainability of Morphological Data Collection and Reuse through MorphoBank
Brooke Long-Fox, Ana Andruchow-Colombo, Shreya Jariwala, Maureen O’Leary, Tanya Berardini
DOI: https://doi.org/10.3897/biss.8.135124
2024-08-20
Biodiversity Information Science and Standards
Abstract: Phenotypic, especially morphological, data are highly useful in systematics, taxonomy, and phylogenetics. Despite the increased use of genetic information, phenotypic data are necessary when researching the fossil record and remain useful for living taxa by providing independent evidence for testing molecular clades. MorphoBank is a FAIR (Findable, Accessible, Interoperable, and Reusable) database providing open biodiversity data in the form of morphological characters (O’Leary and Kaufman 2011, O’Leary and Kaufman 2012), a similar concept to GenBank for open access sequence data. MorphoBank enables scientists to share morphological character data associated with their peer-reviewed publications in the form of phylogenetic matrices as Tree analysis using New Technology (TNT) or NEXUS files. As of July 2024, MorphoBank hosts 1,738 publicly accessible projects with 173,559 images and 1,138 matrices; each project is issued a unique identifier (ID) beginning with the letter P followed by a number. These data can be downloaded by the public, researchers, and students in the scientific community, who can use them for educational purposes or reuse them in additional phylogenetic analyses. MorphoBank encourages scientists to add content at any point in the research process, whether while actively working on a morphological matrix or in conjunction with a forthcoming paper that includes one. For example, P773, one of the larger projects, represents collaborative research and contains a matrix with 4,541 characters and over 12,000 annotated images. Researchers looking to replicate or reuse the data from this study, a task that would normally be extremely time- and labor-intensive, can quickly and easily download the data and work with them in their own analyses. MorphoBank has a team of part-time curators and interns who also add content post-publication.
Between 2018 and 2023, MorphoBank staff accounted for 25% of project creation and 41% of project publication. MorphoBank community members created more projects but published fewer of them in the same time frame. The MorphoBank curation team strives to add matrices in order to make the data FAIR. A majority of the data are associated with publications in journals that require a subscription; MorphoBank makes the matrix data available with complete metadata and without a financial access barrier. Data standards for morphological character matrices include scored taxa, full taxonomic names, and complete character names with character state descriptions. Since NEXUS files vary in standardization and syntax (Maddison et al. 1997, Vos et al. 2012), importing a matrix can lead to data errors, which MorphoBank does not accept due to its mission to provide complete and reproducible datasets. Hence, users often add incomplete data as file attachments. To help ensure that complete data are uploaded, MorphoBank has partnered with journals so that instructions to authors, or emails to authors of accepted manuscripts, make clear the need to upload data matrices to MorphoBank. MorphoBank has been cited over 1,500 times, with increasing citations each year (Fig. 1). We examined the use and impact of MorphoBank data on systematic and phylogenetic research and found that most data are used for phylogenetic analyses, descriptions of new species, and examinations of the diversification of taxonomic groups. These groups span a wide range of organisms, from vertebrates such as dinosaurs, reptiles, and mammals (including studies of human evolution) to plants, invertebrates, and micro-organisms. MorphoBank has developed and implemented an internship program for undergraduate biology students focused on training in phylogenetic data, curation, research writing, and conference presenting.
Part of this internship program involves using Artificial Intelligence (AI) to increase efficiency by automating the extraction of character name and state data from published articles and their integration into NEXUS files. Three additional activities help raise awareness and increase community contributions to MorphoBank:
A partnership with the American Museum of Natural History (AMNH) was established in Summer 2024 to train volunteer curators.
MorphoBank workshops have been developed for in-person (i.e., 12th North American Paleontological Convention in Ann Arbor, Michigan) and virtual (i.e., 3rd Joint Congress on Evolutionary Biology supported by the Society of Systematic Biologists) conferences.
Virtual workshops will be offered quarterly to educate the scientific community on ways to add their own phylogenetic data to MorphoBank.
The long-term sustainability of MorphoBank depends on success in three areas:
Financial: MorphoBank is currently supported by membership fees from academic institutions and museums; institutional support from the non-profit organization Phoenix Bioinformatics; and grants from the United States National Science Foundation. Its future depends on continued growth in membership.
Technical: The over 20-year-old MorphoBank codebase is being completely overhauled to provide better performance, add longer-term software stability, and enable easier addition of new features.
Scientific: The outreach efforts to increase community awareness and contributions aim to ensure the continued relevance and utility of the resource. Growth in data depth and breadth feeds into making MorphoBank indispensable for research in this scientific domain.
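The data standards named in the abstract (scored taxa, full taxonomic names, and complete character names with character state descriptions) lend themselves to a simple completeness check before upload. The Python sketch below is illustrative only and is not MorphoBank's import or validation code: the Character class, the find_gaps function, the in-memory matrix layout, the "?" missing-data symbol, and the placeholder taxon names are all assumptions made for this example, and it deliberately avoids parsing NEXUS or TNT syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Character:
    """One morphological character and its state descriptions (hypothetical model)."""
    name: str
    states: List[str] = field(default_factory=list)


def find_gaps(characters: List[Character],
              matrix: Dict[str, List[str]],
              missing: str = "?") -> List[str]:
    """Return human-readable problems found in a character matrix.

    characters: characters in column order.
    matrix: taxon name -> list of state symbols, one per character.
    missing: symbol assumed to mark an unscored cell.
    """
    problems = []
    # Every character needs a name and at least two described states.
    for index, character in enumerate(characters, start=1):
        if not character.name.strip():
            problems.append(f"character {index}: missing name")
        if len(character.states) < 2:
            problems.append(f"character {index}: fewer than two state descriptions")
    # Every taxon needs a name, one score per character, and at least one real score.
    for taxon, scores in matrix.items():
        if not taxon.strip():
            problems.append("taxon with empty name")
        if len(scores) != len(characters):
            problems.append(f"{taxon}: {len(scores)} scores for {len(characters)} characters")
        elif all(score == missing for score in scores):
            problems.append(f"{taxon}: no scored characters")
    return problems


# Placeholder data, invented for illustration only.
characters = [
    Character("Tail", ["absent", "present"]),
    Character("", ["smooth"]),
]
matrix = {
    "Taxon alpha": ["0", "1"],
    "Taxon beta": ["?", "?"],
    "Taxon gamma": ["1"],
}
problems = find_gaps(characters, matrix)
for problem in problems:
    print(problem)
```

On this placeholder data the check reports four problems: an unnamed character, a character with only one described state, a taxon with no scored cells, and a taxon whose row length does not match the number of characters. A real workflow would first have to read the matrix from a NEXUS or TNT file, which is where the syntax variation mentioned in the abstract comes into play.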