Abstract: Context: Previous studies demonstrate that Machine Learning or Deep Learning (ML/DL) models can detect Technical Debt (TD) admitted in source code comments, known as Self-Admitted Technical Debt (SATD). Despite the importance of ML/DL in software development, few studies focus on automated detection of a new SATD type: Algorithm Debt (AD). AD detection is important because it helps identify TD early, facilitates research and learning, and prevents the accumulation of issues related to model degradation and lack of scalability. Aim: Our goal is to improve the AD detection performance of various ML/DL models. Method: We will perform empirical studies using TF-IDF, Count Vectorizer, Hash Vectorizer, and TD-indicative words to identify features that improve AD detection, using ML/DL classifiers with different data featurisations. We will use an existing dataset curated from seven DL frameworks, in which comments were manually classified as AD, Compatibility, Defect, Design, Documentation, Requirement, or Test Debt. We will explore various word embedding methods to further enrich the features for ML models. These embeddings will come from DL-based models such as ROBERTA and ALBERTv2, and from large language models (LLMs): INSTRUCTOR and VOYAGE AI. We will enrich the dataset by incorporating AD-related terms, then train various ML/DL classifiers: Support Vector Machine, Logistic Regression, Random Forest, ROBERTA, and ALBERTv2.
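The featurisations named in the abstract differ mainly in how a comment is turned into a vector. The sketch below re-implements the core idea of each in plain Python on three hypothetical comments: raw term counts, TF-IDF weighting, and the hashing trick. It is an illustration under stated assumptions, not the study's code: the tokenizer is simplified, the idf formula is the smoothed variant that scikit-learn's `TfidfVectorizer` uses by default, and SHA-256 bucketing stands in for the signed MurmurHash3 that scikit-learn's `HashingVectorizer` uses.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def count_vectors(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Bag-of-words counts over a shared, sorted vocabulary (the Count Vectorizer idea)."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """TF-IDF with smoothed idf = ln((1 + n) / (1 + df)) + 1 and L2-normalised rows."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    # Document frequency: in how many comments each term occurs at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


def hashed_vector(doc: str, n_features: int = 16) -> list[int]:
    """Hashing trick: each token is counted in a fixed bucket; no vocabulary is stored."""
    vec = [0] * n_features
    for tok in tokenize(doc):
        # SHA-256 gives stable buckets across runs (Python's built-in hash() is salted per process).
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % n_features] += 1
    return vec


# Hypothetical SATD-style comments, not taken from the study's dataset.
comments = [
    "TODO: this sort is O(n^2), replace with a faster algorithm",
    "FIXME: loss diverges for large batch sizes",
    "TODO: add documentation for this parameter",
]
vocab, counts = count_vectors(comments)
_, tfidf = tfidf_vectors(comments)
hashed = [hashed_vector(c) for c in comments]
```

The trade-off this shows: count and TF-IDF vectors need a vocabulary built from the training comments, while the hashed vector has a fixed length chosen up front, at the cost of occasional collisions between unrelated words.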

What problem does this paper attempt to address?

The problem this paper addresses is improving the automatic detection of Algorithm Debt (AD) in deep learning frameworks. Specifically, the paper aims to enhance the performance of various machine learning (ML) and deep learning (DL) models in detecting AD through empirical research. The paper points out that although existing research has shown that ML/DL models can automatically detect technical debt (TD) in source code comments, research on the automated detection of a new type of TD, algorithm debt (AD), is still limited. This study therefore explores different feature extraction methods and ML/DL models to improve AD detection performance.

### Main Issues:
1. **Improving AD Detection Performance**: How can different feature extraction methods and ML/DL models improve AD detection performance?
2. **Performance Comparison of Different Models**: Which ML/DL models perform best at detecting AD?

### Research Background:
- **Technical Debt (TD)**: TD refers to compromises made during software development for short-term benefit, which may increase future maintenance costs.
- **Algorithm Debt (AD)**: AD specifically refers to suboptimal implementations of algorithm logic, which may lead to system performance degradation, model deterioration, and lack of scalability.
- **Existing Research**: Although existing research has shown that ML/DL models can automatically detect TD in traditional software, there is relatively little research on the automated detection of AD, especially in deep learning.

### Research Objectives:
- **Improving AD Detection Performance**: Through empirical research, explore different feature extraction methods and ML/DL models to improve the accuracy of AD detection.
- **Evaluating the Performance of Different Models**: Compare the performance of different ML/DL models in detecting AD to identify the most effective one.
### Methods:
- **Dataset**: Use the dataset compiled by Liu et al., which contains manually classified SATD (self-admitted technical debt) comments from seven deep learning frameworks.
- **Feature Extraction Methods**: TF-IDF, Count Vectorizer, Hash Vectorizer, and AD indicator words.
- **ML/DL Models**: SVM, Logistic Regression, Random Forest, ROBERTA, and ALBERTv2.
- **Data Augmentation**: Enrich the dataset by adding AD-related terms and definitions to provide more contextual information.
- **Model Training and Tuning**: Use Grid Search CV for hyperparameter optimization and 10-fold cross-validation for model training and validation.
- **Performance Evaluation**: Evaluate model performance with metrics such as accuracy, recall, and F1 score, and conduct statistical significance tests.

### Expected Contributions:
- **Improving AD Detection Performance**: Improve the accuracy of AD detection by exploring different feature extraction methods and ML/DL models.
- **Promoting AD Research**: Provide a foundation for understanding the characteristics and impacts of AD in deep learning systems, encouraging further research.
- **Practical Tools**: Provide developers with an automated tool for the early identification and management of AD, thereby improving system performance and maintainability.
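The Methods section lists AD indicator words as one feature source. A minimal sketch, assuming a small hypothetical term list (not the study's list), is to count how many indicative terms a comment contains and append those counts to whatever vector the comment already has, such as a TF-IDF row:

```python
import re

# Hypothetical AD-indicative terms chosen for illustration; this is NOT the
# study's word list.
AD_TERMS = frozenset(
    {"algorithm", "inefficient", "slow", "optimize", "complexity", "heuristic", "approximate"}
)


def tokenize(text: str) -> list[str]:
    # Simple lowercase tokenizer applied to comments before term matching.
    return re.findall(r"[a-z0-9]+", text.lower())


def indicator_features(comment: str, terms: frozenset[str] = AD_TERMS) -> list[float]:
    """Return [number of term hits, 1.0 if any hit else 0.0, hits / token count]."""
    tokens = tokenize(comment)
    hits = sum(1 for tok in tokens if tok in terms)
    ratio = hits / len(tokens) if tokens else 0.0
    return [float(hits), 1.0 if hits else 0.0, ratio]


def with_indicators(base_vector: list[float], comment: str) -> list[float]:
    """Append the indicator features to an existing feature vector."""
    return list(base_vector) + indicator_features(comment)


print(indicator_features("TODO: slow algorithm, optimize later"))  # → [3.0, 1.0, 0.6]
```

Exact token matching misses variants such as "optimizes"; stemming or lemmatising both the terms and the comments would be a natural refinement.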
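The Methods section mentions 10-fold cross-validation and F1-based evaluation. The standard-library sketch below shows one way to build stratified folds over the debt labels and to score AD as the positive class in a one-vs-rest framing; the round-robin fold assignment, the seed, and the class sizes in the example are assumptions, and in practice a library splitter such as scikit-learn's `StratifiedKFold` would normally be used. Precision is computed as well because F1 depends on it.

```python
import random
from collections import defaultdict


def stratified_kfold(labels: list[str], k: int = 10, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds that keep each class's share roughly equal."""
    rng = random.Random(seed)
    by_class: dict[str, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: list[list[int]] = [[] for _ in range(k)]
    start = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class round-robin, continuing where the previous class stopped,
        # so per-class and total fold sizes each differ by at most one.
        for i, idx in enumerate(members):
            folds[(start + i) % k].append(idx)
        start = (start + len(members)) % k
    return folds


def precision_recall_f1(y_true, y_pred, positive: str = "AD") -> tuple[float, float, float]:
    """Precision, recall and F1 for one class treated as positive (e.g. AD vs. the rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


labels = ["AD"] * 23 + ["Design"] * 40 + ["Defect"] * 17  # hypothetical class sizes
folds = stratified_kfold(labels, k=10)
print([len(f) for f in folds])  # → [8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
```

Each fold serves once as the test set while a classifier is trained on the other nine; the Grid Search CV step would typically run only on that training portion so the held-out fold stays unseen.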

Automated Detection of Algorithm Debt in Deep Learning Frameworks: An Empirical Study

Neural Network-based Detection of Self-Admitted Technical Debt

An Exploratory Study on the Introduction and Removal of Different Types of Technical Debt in Deep Learning Frameworks

An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software

A Taxonomy of Self-Admitted Technical Debt in Deep Learning Systems

SATD Detector

Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt

Self-Admitted Technical Debt Detection Approaches: A Decade Systematic Review

Self-admitted technical debt classification using natural language processing word embeddings

Detecting and Explaining Self-Admitted Technical Debts with Attention-Based Neural Networks

Improving the detection of technical debt in Java source code with an enriched dataset

Automatic Detection and Analysis of Technical Debts in Peer-Review Documentation of R Packages

Measuring Improvement of F1-Scores in Detection of Self-Admitted Technical Debt

Exploring the Advances in Using Machine Learning to Identify Technical Debt and Self-Admitted Technical Debt

DebtFree: Minimizing Labeling Cost in Self-Admitted Technical Debt Identification using Semi-Supervised Learning

An Exploratory Study on the Introduction and Removal of Different Types of Technical Debt

Detecting Multi-Type Self-Admitted Technical Debt with Generative Adversarial Network-Based Neural Networks

Automating Change-Level Self-Admitted Technical Debt Determination

Identifying self-admitted technical debt in open source projects using text mining

Large Language Model ChatGPT Versus Small Deep Learning Models for Self‐admitted Technical Debt Detection: Why Not Together?

An empirical study on the effectiveness of large language models for SATD identification and classification