Titanic Survival Prediction Based on Machine Learning Algorithms

Yuming Zhang
DOI: https://doi.org/10.54097/mwkr1a24
2024-08-15
Abstract: This report demonstrates the application of machine learning techniques to predicting the survival of passengers aboard the Titanic. The Titanic dataset, which includes the variables Pclass, Sex, Age, SibSp, Parch, Fare, Ticket, and Cabin, is analyzed first, and two machine learning algorithms, Logistic Regression and Random Forest, are then used to predict survival. The two models are compared in terms of accuracy, and the magnitude of each factor's effect on survival is also identified. Data preprocessing is the essential technique used to prepare the dataset. Before preprocessing, correlations between variables are analyzed to guide feature engineering. Feature engineering begins with data conversion and the filling of missing values; features are then selected and new features are derived for use in the models. In interpreting the final model outputs, new features such as name length offer insight into more implicit survival factors that had previously been overlooked.
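The preprocessing steps named in the abstract (data conversion, filling of missing values, and a derived name-length feature) can be illustrated with a minimal sketch. The rows below are toy values shaped like the Titanic columns, not records from the dataset, and the `preprocess` helper, the median fill for Age, and the 0/1 encoding of Sex are assumptions made for illustration; the abstract does not state which fill or encoding strategy the report uses.

```python
from statistics import median

# Toy rows shaped like the Titanic columns (illustrative values, not real records).
rows = [
    {"Name": "Doe, Mr. John", "Sex": "male", "Age": 22.0},
    {"Name": "Roe, Mrs. Jane Alice", "Sex": "female", "Age": 38.0},
    {"Name": "Poe, Miss. Ann", "Sex": "female", "Age": None},  # missing Age
]


def preprocess(rows):
    """Hypothetical helper: convert, fill, and derive features for each row."""
    # Vacancy filling (assumed strategy): median of the known ages.
    known_ages = [r["Age"] for r in rows if r["Age"] is not None]
    age_fill = median(known_ages)

    features = []
    for r in rows:
        features.append({
            # Data conversion (assumed encoding): female -> 1, male -> 0.
            "Sex": 1 if r["Sex"] == "female" else 0,
            "Age": r["Age"] if r["Age"] is not None else age_fill,
            # New feature mentioned in the abstract: length of the name string.
            "NameLength": len(r["Name"]),
        })
    return features


features = preprocess(rows)
for f in features:
    print(f)
```

The resulting numeric feature dictionaries are in a form that a classifier such as logistic regression or a random forest could consume after being arranged into a feature matrix.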