Misspellings or “miscellings”-non-verifiable cell lines in cancer research publications

Danielle J. Oste, Pranujan Pathmendra, Reese A. K. Richardson, Gracen Johnson, Yida Ao, Maya D. Arya, Naomi R. Enochs, Muhammed Hussein, Jinghan Kang, Aaron Lee, Jonathan J. Danon, Guillaume Cabanac, Cyril Labbé, Amanda Capes-Davis, Thomas Stoeger, Jennifer A. Byrne
DOI: https://doi.org/10.1101/2024.02.29.582220
2024-03-06
Abstract: Reproducible laboratory research relies on correctly identified reagents. We have previously described human gene research papers with wrongly identified nucleotide sequence reagent(s), including papers studying miR-145. Manually verifying reagent identities in more recent papers found 20/36 (56%) and 6/36 (17%) papers with misidentified nucleotide sequence reagent(s) and human cell line(s), respectively. We also found 5 cell line identifiers in two papers with wrongly identified nucleotide sequences and cell lines, and 18 identifiers published elsewhere that did not correspond to indexed cell lines. These cell line identifiers were described as non-verifiable, as their identities appeared uncertain. Studying 420 papers that mentioned 8 different non-verifiable cell line identifier(s) found 235 papers (56%) that appeared to refer to BGC-803, BSG-803, BSG-823, GSE-1, HGC-7901, HGC-803 and/or MGC-823 as independent cell lines. We could not find publications describing how these cell lines were established, and they were not indexed in claimed externally accessible cell line repositories. While some papers stated that STR profiles had been generated for BGC-803, GSE-1 and/or MGC-823 cells, no STR profiles were identified. In summary, non-verifiable human cell lines represent new challenges to research reproducibility and require further investigation to clarify their identities.
Cancer Biology
What problem does this paper attempt to address?
The paper addresses the presence of non-verifiable cell lines in the cancer research literature: cell lines whose identifiers cannot be found or verified in known cell line databases. By analyzing recently published research papers on miR-145, the authors found misidentified nucleotide sequence reagents and non-verifiable human cell lines. Specifically, the paper focuses on the following aspects:

1. **Verification of nucleotide sequence reagents**: Verifying the nucleotide sequence reagents in recent research papers on miR-145 found that approximately 24% of the sequences were misidentified.
2. **Contamination and misclassification of cell lines**: Of 21 papers containing experiments on miR-145, nearly 30% used at least one misidentified cell line.
3. **Identification of non-verifiable cell lines**: The paper identified several identifiers that are not included in cell line databases and that may represent misspellings of known contaminated cell lines. For example, BGC-803 and BSG-823 are considered possible misspellings of contaminated cell lines such as MGC-803 and BGC-823.

The core question of the paper is how these non-verifiable cell lines are described in the literature and how they affect the reliability of research results. It also introduces a new concept, "miscelling", which refers to published cell lines for which no description of their establishment can be found, which are not indexed in the external repositories claimed as their source, and which lack STR profiles. These issues pose new challenges to the reproducibility of research.
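The misspelling hypothesis described above can be illustrated with a small lookup: an identifier is first checked against an index of known cell lines, and if it is absent, near-matching known names are listed as possible intended targets. This is only a minimal sketch and not the authors' method. The two-entry `KNOWN_CELL_LINES` list, the `normalize` and `classify` helpers, and the 0.7 similarity cutoff are all assumptions made for illustration; a real check would query a curated cell line database, and string similarity alone cannot establish what a cell line actually is.

```python
from difflib import get_close_matches

# Illustrative index only (assumption): the two known cell lines named in the text.
# A real check would query a curated cell line database instead.
KNOWN_CELL_LINES = ["MGC-803", "BGC-823"]


def normalize(identifier):
    """Uppercase and drop separators so 'bgc 823' and 'BGC-823' compare equal."""
    return "".join(ch for ch in identifier.upper() if ch.isalnum())


def classify(identifier, known=KNOWN_CELL_LINES, cutoff=0.7):
    """Return (status, candidates) for a cell line identifier.

    status is 'indexed' when the identifier is in the index, otherwise
    'non-verifiable'. candidates lists indexed names that are close enough
    to be the intended spelling (hypothetical cutoff, chosen for the demo).
    """
    index = {normalize(name): name for name in known}
    key = normalize(identifier)
    if key in index:
        return "indexed", [index[key]]
    close = get_close_matches(key, list(index), n=3, cutoff=cutoff)
    return "non-verifiable", [index[match] for match in close]


if __name__ == "__main__":
    for name in ["MGC-803", "BGC-803", "BSG-823", "GSE-1"]:
        status, candidates = classify(name)
        print(f"{name}: {status}; possible intended names: {candidates}")
```

In this toy index, BGC-803 is one character away from both MGC-803 and BGC-823, so both are returned as candidates, while GSE-1 resembles neither. A flagged identifier would still need manual follow-up of the kind the paper describes: searching for a publication describing how the line was established, checking the claimed repository, and looking for an STR profile.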