Component Based Pre-filtering of Noisy Data for Improved Tsetlin Machine Modelling

Anna Jenul, Bimal Bhattarai, Kristian Hovde Liland, Lei Jiao, Stefan Schrunner, Cecilia Futsaether, Ole-Christoffer Granmo, Oliver Tomic
DOI: https://doi.org/10.1109/istm54910.2022.00019
2022-01-01
Abstract: Tabular data with few observations and many features are common in the healthcare domain. With its rule-based approach to data modelling, the Tsetlin Machine has considerable potential to be a valuable tool in healthcare data analysis by providing interpretability to medical personnel. However, the performance of Tsetlin Machine models may be hampered by the presence of noise, which may often be the case with healthcare data due to individual differences across patients. This study shows that intelligent pre-filtering of healthcare measurement data using so-called component-based methods, such as Principal Component Analysis or Partial Least Squares Regression, can be beneficial for the performance of Tsetlin Machines. Modelling four healthcare data sets shows that the Tsetlin Machine achieved better predictive performance on pre-filtered data for data sets with high features-to-observations ratios.
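As a rough illustration of what component-based pre-filtering can look like, the sketch below reconstructs a noisy data matrix from its leading principal components. The abstract does not spell out the authors' exact procedure, so this is an assumption: the function name `pca_prefilter`, the choice of three components, and the synthetic data are all hypothetical and serve only to show the general idea of low-rank filtering with Principal Component Analysis.

```python
import numpy as np


def pca_prefilter(X, n_components):
    """Reconstruct X from its leading principal components.

    Hypothetical helper: a generic low-rank PCA reconstruction,
    not necessarily the procedure used in the paper.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each feature
    # Thin SVD of the centred data: Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_components
    # Keep only the first k components and add the mean back
    return (U[:, :k] * s[:k]) @ Vt[:k] + mean


# Synthetic example (assumed, for illustration only):
# few observations, many features, low-rank signal plus noise.
rng = np.random.default_rng(0)
n_obs, n_feat, rank = 30, 200, 3
scores = rng.normal(size=(n_obs, rank))
loadings = rng.normal(size=(rank, n_feat))
X_clean = scores @ loadings
X_noisy = X_clean + 0.5 * rng.normal(size=X_clean.shape)

X_filtered = pca_prefilter(X_noisy, n_components=rank)

# Distance to the noise-free data before and after filtering
err_noisy = np.linalg.norm(X_noisy - X_clean)
err_filtered = np.linalg.norm(X_filtered - X_clean)
```

In this toy setting the filtered matrix lies closer to the noise-free data than the noisy one does. Because a Tsetlin Machine operates on Boolean inputs, the filtered features would still need to be binarized before modelling, a step not shown here.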