On Entity Embeddings for Ordinal Features as Representation Learning in Recurrence Prediction of Urothelial Bladder Cancer

Louisa Schwarz, Franz Rothlauf
DOI: https://doi.org/10.3233/SHTI240508
2024-08-22
Abstract: Background: Urothelial Bladder Cancer (UBC) is a common cancer with a high risk of recurrence, which is influenced by the TNM classification, grading, age, and other factors. Recent studies demonstrate that Machine Learning (ML) algorithms predict recurrence reliably and accurately and even outperform traditional approaches. However, most ML algorithms cannot process categorical input features directly; these must first be encoded as numerical values. The choice of encoding strategy has a significant impact on prediction quality. Objective: We investigate the impact of encoding strategies for ordinal features on the prediction quality of ML algorithms. Method: We compare three encoding strategies, namely one-hot, ordinal, and entity embedding, in predicting 2-year recurrence in UBC patients using an artificial neural network. We use ordered categorical and numerical data of UBC patients provided by the Cancer Registry Rhineland-Palatinate. Results: Entity embedding encoding achieves superior prediction quality compared to one-hot and ordinal encoding, with 84.6% precision, an overall accuracy of 73.8%, and 68.9% AUC on testing data over 100 epochs after 30 runs. Conclusion: We confirm the superiority of entity embedding encoding, as it could provide a more detailed and accurate representation of ordinal features on numerical scales. This can lead to enhanced generalizability, resulting in significantly improved prediction quality.
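To make the three encoding strategies named in the Method concrete, the following minimal sketch encodes one ordinal feature in each of the three ways. It is an illustration under stated assumptions, not the authors' implementation: the category list `STAGES`, the embedding dimension, and all function names are hypothetical, and the embedding table is only randomly initialized here, whereas in an entity embedding setup its values would be learned together with the network weights.

```python
import random

# Hypothetical ordered categories for a tumor-stage feature (illustrative only,
# not the schema of the registry data used in the paper).
STAGES = ["Ta", "T1", "T2", "T3", "T4"]
RANK = {stage: i for i, stage in enumerate(STAGES)}


def ordinal_encode(stage):
    """Ordinal encoding: one integer per category, equal to its rank."""
    return RANK[stage]


def one_hot_encode(stage):
    """One-hot encoding: a binary indicator vector; the ordering is not represented."""
    return [1 if i == RANK[stage] else 0 for i in range(len(STAGES))]


def make_embedding_table(dim, seed=0):
    """Entity embedding: one dense vector per category.

    The vectors are random here; in training they would be updated by
    backpropagation like any other network parameter (assumption of this sketch).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in STAGES]


def embed(stage, table):
    """Look up the dense vector for a category by its rank."""
    return table[RANK[stage]]


table = make_embedding_table(dim=2)
for stage in ["Ta", "T2", "T4"]:
    print(stage, ordinal_encode(stage), one_hot_encode(stage), embed(stage, table))
```

Ordinal encoding fixes the spacing between categories to equal steps, one-hot encoding discards the order, and the embedding lookup leaves the position of each category in a continuous space open to be fitted from data, which is the property the Conclusion refers to.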