Data Mining of Online Genealogy Datasets for Revealing Lifespan Patterns in Human Population

Michael Fire, Yuval Elovici
DOI: https://doi.org/10.1145/2700464
IF: 5
2015-05-04
ACM Transactions on Intelligent Systems and Technology
Abstract: Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can help identify various patterns in the human population. In this study, we present methods and algorithms that can assist in identifying variations in lifespan distributions of the human population over the past centuries, in detecting social and genetic features that correlate with the human lifespan, and in constructing predictive models of human lifespan based on various features that can easily be extracted from genealogy datasets. We evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 9 million connections, all of which were collected from the WikiTree website. Our findings indicate that significant but small positive correlations exist between the parents’ lifespans and their children’s lifespans. Additionally, we found slightly higher and significant correlations between the lifespans of spouses. We also discovered a very small positive and significant correlation between longevity and reproductive success in males, and a small and significant negative correlation between longevity and reproductive success in females. Moreover, our predictive models achieved a Mean Absolute Error as low as 13.18 in predicting the lifespans of individuals who outlived the age of 10, and our classification models performed better than random in predicting which people who outlive the age of 50 will also outlive the age of 80. We believe that this study will be the first of many studies to utilize the wealth of data on human populations available in online genealogy datasets to better understand the factors that influence the human lifespan. Understanding these factors can assist scientists in providing solutions for successful aging.
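The two quantitative measures named in the abstract, a correlation between parent and child lifespans and the Mean Absolute Error of a lifespan predictor, can be illustrated with a minimal sketch. It assumes Pearson's r as the correlation measure (one common choice; the abstract does not say which coefficient was used) and uses small hypothetical lifespan lists in place of real WikiTree profiles. The function names `pearson_r` and `mean_absolute_error` are illustrative, not taken from the paper.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: Pearson's r stands in for whatever correlation measure
    the paper actually used.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted lifespans."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical toy data: lifespans (in years) of five parent/child pairs.
    parent_lifespans = [60, 70, 80, 75, 65]
    child_lifespans = [62, 74, 85, 70, 68]
    print("Pearson r:", round(pearson_r(parent_lifespans, child_lifespans), 3))

    # A naive baseline that predicts the mean child lifespan for everyone.
    baseline = [mean(child_lifespans)] * len(child_lifespans)
    print("Baseline MAE:", mean_absolute_error(child_lifespans, baseline))
```

On the toy pairs above, r comes out close to 0.89, far stronger than the small correlations the abstract reports for real data. The constant mean-prediction baseline is included only to show how an MAE figure such as 13.18 is read: it is the average error in the same units as the lifespans, and a useful model should beat such a trivial predictor.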
computer science, information systems, artificial intelligence
What problem does this paper attempt to address?