A lexicographic optimisation approach to promote more recent features on longitudinal decision-tree-based classifiers: applications to the English Longitudinal Study of Ageing

Caio Ribeiro, Alex A. Freitas
DOI: https://doi.org/10.1007/s10462-024-10718-1
IF: 9.588
2024-03-10
Artificial Intelligence Review
Abstract: Supervised machine learning algorithms rarely cope directly with the temporal information inherent to longitudinal datasets, which contain multiple measurements of the same feature across several time points and are often generated by large health studies. In this paper we report on experiments that adapt the feature-selection function of decision-tree-based classifiers to consider the temporal information in longitudinal datasets, using a lexicographic optimisation approach. This approach gives higher priority to the usual objective of maximising the information gain ratio and, as a lower-priority objective, favours the selection of more recently measured features. Hence, when choosing between features with equivalent information gain ratios, priority is given to more recent measurements of the biomedical features in our datasets. To evaluate the proposed approach, we performed experiments with 20 longitudinal datasets created from a human ageing study. The results show that, in addition to improving the predictive accuracy of random forests, the modified feature-selection function promotes models based on more recent information, which is more directly related to the subject's current biomedical situation and is thus intuitively more interpretable and actionable.
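The lexicographic split criterion described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes categorical features, a C4.5-style information gain ratio, an absolute tolerance (the hypothetical parameter `tolerance`) that decides when two gain ratios count as "equivalent", and that each feature is tagged with an integer time-point (wave) index. The feature names such as `bmi_w1` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[str], labels: Sequence[str]) -> float:
    """C4.5-style information gain ratio of one categorical feature."""
    n = len(labels)
    groups: Dict[str, List[str]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it.
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    # Split information penalises features with many distinct values.
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def lexicographic_select(
    features: Dict[str, Tuple[int, Sequence[str]]],
    labels: Sequence[str],
    tolerance: float = 0.05,  # hypothetical threshold for "equivalent" gain ratios
) -> str:
    """Pick a split feature: gain ratio first, recency (time point) second.

    `features` maps a feature name to (time_point, values); a larger
    time_point is assumed to mean a more recent measurement.
    """
    scores = {name: gain_ratio(vals, labels) for name, (_, vals) in features.items()}
    best = max(scores.values())
    # Primary objective: keep only candidates whose gain ratio is near the best.
    candidates = [name for name, s in scores.items() if s >= best - tolerance]
    # Secondary objective: prefer the most recent time point, breaking any
    # remaining tie by the higher gain ratio.
    return max(candidates, key=lambda name: (features[name][0], scores[name]))


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    features = {
        "bmi_w1": (1, ["h", "h", "l", "l"]),  # perfect split, older wave
        "bmi_w2": (2, ["h", "h", "l", "l"]),  # perfect split, newer wave
        "bmi_w3": (3, ["h", "h", "l", "h"]),  # newest wave, much weaker split
    }
    print(lexicographic_select(features, labels))  # → bmi_w2
```

In this toy example `bmi_w1` and `bmi_w2` tie on gain ratio, so the more recent wave is chosen; `bmi_w3` is more recent still but falls outside the tolerance, so recency only acts as a tie-breaker and never overrides a clearly better split.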
computer science, artificial intelligence
What problem does this paper attempt to address?