ITree: a user-driven tool for interactive decision-making with classification trees

Hubert Sokołowski, Marcin Czajkowski, Anna Czajkowska, Krzysztof Jurczuk, Marek Kretowski
DOI: https://doi.org/10.1093/bioinformatics/btae273
IF: 5.8
2024-04-18
Bioinformatics
Abstract:
Motivation: ITree is an intuitive web tool for the manual, semi-automatic, and automatic induction of decision trees. It enables interactive modifications of tree structures and incorporates Relative Expression Analysis for detecting complex patterns in high-throughput molecular data. This makes ITree a versatile tool for both research and education in biomedical data analysis.
Results: The tool allows users to instantly see the effects of modifications on decision trees, with updates to predictions and statistics displayed in real time, facilitating a deeper understanding of data classification processes.
Availability and Implementation: Available online at https://itree.wi.pb.edu.pl. Source code and documentation are hosted on GitHub at https://github.com/hsokolowski/iTree.
Supplementary Information: Additional resources are provided to enhance user experience and support.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?
The paper introduces ITree, a user-driven tool for constructing and applying decision trees (DTs) on high-throughput biomedical data. Specifically, ITree focuses on the following aspects:

1. **Addressing the limitations of traditional decision trees in high-dimensional biomedical data**: Traditional decision trees tend to overfit or underfit such data, so ensemble methods like random forests or gradient-boosted trees are often used instead. These methods improve predictive performance but sacrifice model interpretability.
2. **Providing an intuitive and interactive decision tree construction platform**: ITree lets users construct decision trees manually, semi-automatically, or automatically, and modify the tree structure in real time, instantly viewing the impact of each modification on predictions and statistics. This interactive approach helps users gain a deeper understanding of the classification process.
3. **Introducing the concept of Relative Expression Analysis (RXA)**: To capture complex patterns in biomedical data, ITree employs RXA, which compares the relative expression levels of features within each sample to identify meaningful rules. This makes ITree more robust on high-dimensional data, less susceptible to variability between platforms, and less affected by complex data preprocessing methods.
4. **Supporting multiple node splitting tests**: ITree supports three types of node splitting tests: C4.5-style splits based on information gain, Top Scoring Pair (TSP) tests, and more advanced Weighted TSP (WTSP) tests. These tests not only make the model more flexible but can also reveal key changes in gene expression or important regulatory networks in biological processes.
5. **Supporting education and research**: Besides its practical value, ITree is designed as an educational tool, particularly suited to medical professionals and researchers without a technical background. Its user-friendly interface helps bridge the gap between complex data analysis and practical application.

In summary, by providing a highly interactive and easy-to-use platform, ITree aims to improve the application of decision trees to biomedical data while maintaining model interpretability, thereby promoting research and development in related fields.
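
To make the difference between the node tests in items 3 and 4 concrete, here is a minimal Python sketch. It is not ITree's implementation: the function names, the toy data, and the two-class restriction are assumptions made for illustration. The pair score follows the commonly used Top Scoring Pair definition, |P(x_i < x_j | class A) - P(x_i < x_j | class B)|, and the univariate test uses plain information gain on a thresholded feature. WTSP is not covered.

```python
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(samples, labels, feature, threshold):
    """Information gain of the univariate test sample[feature] <= threshold."""
    left = [y for s, y in zip(samples, labels) if s[feature] <= threshold]
    right = [y for s, y in zip(samples, labels) if s[feature] > threshold]
    if not left or not right:
        return 0.0  # a test that separates nothing gains nothing
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def tsp_score(samples, labels, i, j):
    """Pair score |P(x_i < x_j | class A) - P(x_i < x_j | class B)|."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    probs = []
    for c in classes:
        rows = [s for s, y in zip(samples, labels) if y == c]
        probs.append(sum(s[i] < s[j] for s in rows) / len(rows))
    return abs(probs[0] - probs[1])


def best_pair(samples, labels):
    """Exhaustively pick the feature pair with the highest pair score."""
    pairs = combinations(range(len(samples[0])), 2)
    return max(pairs, key=lambda p: tsp_score(samples, labels, p[0], p[1]))


# Hypothetical toy expression values: 4 samples x 3 features.
samples = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 8.0],
    [7.0, 3.0, 9.0],
    [8.0, 2.0, 1.0],
]
labels = ["tumor", "tumor", "normal", "normal"]

# Simulate platform variability: each sample is rescaled by its own positive factor.
factors = [1.0, 10.0, 0.5, 3.0]
scaled = [[v * f for v in s] for s, f in zip(samples, factors)]

print("gain, raw data:   ", information_gain(samples, labels, 0, 4.5))
print("gain, scaled data:", information_gain(scaled, labels, 0, 4.5))
print("best pair:        ", best_pair(samples, labels))
print("pair score, raw:   ", tsp_score(samples, labels, 0, 1))
print("pair score, scaled:", tsp_score(scaled, labels, 0, 1))
```

In this toy setup the fixed threshold stops separating the classes once each sample is rescaled, while the pair test is unchanged because it depends only on the ordering of two features within a sample. This is the property that the summary attributes to RXA.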