Optimal SNP filtering strategies for pedigree reconstruction: A case study with wild red‐spotted masu salmon population

Shohei Noda, Tetsuya Akita, Rui Ueda, Takafumi Katsumura, Yasuyuki Hashiguchi, Hirohiko Takeshima, Takuya Sato
DOI: https://doi.org/10.1002/1438-390x.12192
2024-05-29
Population Ecology
Abstract: In this article, we tested optimal SNP filtering strategies for accurate parentage assignment and pedigree reconstruction for a wild population of red‐spotted masu salmon, Oncorhynchus masou ishikawae. We found that mid‐point filtering in terms of call rate and minor allele frequency performs well for pedigree reconstruction. We provided an effective bioinformatic pipeline for determining suitable SNP filtering based on call rate and minor allele frequency for pedigree analysis in a wild population, but the optimal balance depends on the study system. Pedigree data have provided indispensable information for the study of ecology and evolution. Improvement of bioinformatics guidelines for discovering informative single nucleotide polymorphisms (SNPs) from genomic data is essential for pedigree reconstruction because of the trade‐off among the quantity (number of SNPs), quality (minor allele frequency [MAF]), and call rate (CR). However, there are few practical reports assessing the optimal balance of SNP filtering parameter combinations while maintaining a sufficient number of SNPs required for accurate pedigree analysis. In this study, we tested several bioinformatic pipelines for accurate SNP‐based parentage assignment and pedigree reconstruction in a wild population of red‐spotted masu salmon, Oncorhynchus masou ishikawae. Parentage assignments were nearly complete with every SNP set, regardless of the MAF and CR values used for filtering. For full sibling and half‐sibling assignments, mid‐point filtered SNP sets performed well. This indicates the significant effects of SNP filtering parameter combinations on pedigree reconstruction in a multi‐generational population. Considering the balance between the quantity and quality of SNP data is essential for accurately inferring pedigrees.
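The trade‐off described in the abstract can be illustrated with a minimal sketch that computes call rate and minor allele frequency per SNP and counts how many SNPs survive each threshold combination. This is not the authors' pipeline: the genotype coding (0/1/2 copies of the alternate allele, `None` for a missing call), the toy data, the function names, and the threshold values are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype at one SNP."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Frequency of the rarer allele among the called genotypes at one SNP."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per call
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(snps: Dict[str, List[Genotype]],
                min_cr: float, min_maf: float) -> List[str]:
    """Return the names of SNPs that pass both the CR and MAF thresholds."""
    return [
        name for name, calls in snps.items()
        if call_rate(calls) >= min_cr
        and minor_allele_frequency(calls) >= min_maf
    ]


# Toy data (hypothetical): four SNPs genotyped in eight individuals.
snps = {
    "snp_a": [0, 1, 2, 1, 0, 1, 1, 2],                   # fully called, common
    "snp_b": [0, 0, 0, 1, 0, 0, 0, 0],                   # fully called, rare
    "snp_c": [1, None, None, None, 2, None, 1, None],    # mostly missing
    "snp_d": [0, 1, 1, None, 2, 0, 1, 1],                # one missing call
}

# Scan a small grid of illustrative thresholds: stricter filters keep fewer,
# but cleaner and more informative, SNPs.
for min_cr in (0.3, 0.8):
    for min_maf in (0.05, 0.3):
        kept = filter_snps(snps, min_cr, min_maf)
        print(f"CR >= {min_cr}, MAF >= {min_maf}: {len(kept)} SNPs kept {kept}")
```

In a real analysis, each filtered SNP set would then be passed to a parentage or sibship assignment program, and the accuracy of the resulting pedigrees compared across the grid.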
ecology