A Decision Tree Approach to Predicting Recidivism in Domestic Violence

Senuri Wijenayake, Timothy Graham, Peter Christen
DOI: https://doi.org/10.48550/arXiv.1803.09862
2018-03-27
Abstract: Domestic violence (DV) is a global social and public health issue that is highly gendered. Being able to accurately predict DV recidivism, i.e., re-offending of a previously convicted offender, can speed up and improve risk assessment procedures for police and front-line agencies, better protect victims of DV, and potentially prevent future re-occurrences of DV. Previous work in DV recidivism has employed different classification techniques, including decision tree (DT) induction and logistic regression, where the main focus was on achieving high prediction accuracy. As a result, even the diagrams of trained DTs were often too difficult to interpret due to their size and complexity, making decision-making challenging. Given there is often a trade-off between model accuracy and interpretability, in this work our aim is to employ DT induction to obtain both interpretable trees as well as high prediction accuracy. Specifically, we implement and evaluate different approaches to deal with class imbalance as well as feature selection. Compared to previous work in DV recidivism prediction that employed logistic regression, our approach can achieve comparable area under the ROC curve results by using only 3 of 11 available features and generating understandable decision trees that contain only 4 leaf nodes.
Machine Learning
What problem does this paper attempt to address?
The paper addresses how to predict domestic violence (DV) recidivism accurately with decision tree (DT) induction while keeping the resulting model interpretable. Specifically:

1. **Maintain high prediction accuracy**: Using administrative data, the authors aim to predict the recidivism risk of DV offenders with accuracy comparable to existing methods such as logistic regression.
2. **Enhance model interpretability**: Existing prediction models are often too large and complex to interpret and apply. The authors aim to simplify the decision tree structure so that its output is understandable and actionable, helping police and front-line agencies with risk assessment and decision-making.

### Research background

Domestic violence is a global social and public health problem that is highly gendered. Accurately predicting DV recidivism can speed up and improve risk assessment, better protect victims, and potentially prevent future incidents. Earlier work applied classification techniques such as decision trees and logistic regression with a focus on high prediction accuracy. As a result, the trained decision trees were often too large and complex to interpret, which hampers decision-making in practice.

### Research objectives

The paper aims to improve the prediction of DV recidivism in the following ways:

- **Use decision tree induction**: Build a model that is both interpretable and highly accurate.
- **Handle class imbalance**: Because recidivists are far fewer than non-recidivists, the data set is class-imbalanced. The paper explores under-sampling and over-sampling to balance it.
- **Feature selection**: To simplify the model and improve interpretability, the study also performs feature selection and uses only the most important features for prediction.

### Experimental results

The experiments show that limiting the size of the decision tree and selecting key features yields a tree that is easy to understand and use while maintaining high prediction accuracy. For example, a decision tree that uses only 3 of the 11 available features and has 4 leaf nodes achieves an area under the ROC curve (AUC-ROC) comparable to that of logistic regression.

### Conclusion

The research shows that DV recidivism can be predicted with accuracy comparable to logistic regression while the model becomes easier to interpret through a simpler structure, providing better support for practical applications.
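
The class-balancing and evaluation steps mentioned above can be illustrated with a minimal sketch. It assumes plain random under-sampling and over-sampling to a 1:1 class ratio and a pairwise (rank-based) AUC-ROC; the paper's exact sampling procedure and tooling are not described in this summary, and the function names and the `(features, label)` row layout are hypothetical.

```python
import random


def split_by_class(rows):
    """Split (features, label) rows into (minority, majority) lists.

    Assumes binary labels 0/1 and that both classes are present.
    """
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def undersample(rows, seed=0):
    """Randomly drop majority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority, majority = split_by_class(rows)
    return minority + rng.sample(majority, len(minority))


def oversample(rows, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority, majority = split_by_class(rows)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outscores a random
    negative, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up data: 3 recidivists (label 1) and 9 non-recidivists (label 0).
    data = [((i,), 1) for i in range(3)] + [((i,), 0) for i in range(9)]
    print(len(undersample(data)), len(oversample(data)))
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Under-sampling discards majority-class information while over-sampling repeats minority-class rows, so in practice both are usually applied to the training split only and the AUC is computed on untouched held-out data.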