Abstract:

Context: Design smell detection has proven to be a significant activity that aims not only to enhance software quality but also to extend the software life cycle.

Objective: This work investigates whether machine learning approaches can be leveraged effectively for software design smell detection. It also provides a comparative study, focused on the use of balanced datasets, that checks whether avoiding dataset balancing has any influence on accuracy and classifier behaviour during design smell detection.

Method: A set of experiments was conducted using 28 machine learning classifiers aimed at detecting God classes. The experiments used a dataset formed from 12,587 classes of 24 software systems, in which 1,958 classes were manually validated.

Results: Most classifiers obtained high performance, with CatBoost showing higher performance. The experiments also show that data balancing does not have any significant influence on detection accuracy. This reinforces the application of machine learning in real scenarios, where the data is usually imbalanced by the inherent nature of design smells.

Conclusions: Machine learning approaches can be leveraged effectively for God class detection. This paper employed the SMOTE technique for data balancing, but other balancing methods exist, and other design smells remain to be studied. Applying those other methods may improve the results; in our experiments, SMOTE did not improve God class detection. The results are not fully generalizable because only one design smell is studied, with projects developed in a single programming language, and only one balancing technique is compared against the imbalanced case. Nevertheless, these results are promising for application in real design smell detection scenarios, as mentioned above, and other measures, such as Kappa, ROC, and MCC, were used in assessing classifier behaviour.
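The abstract names Kappa and MCC among the measures used alongside accuracy. The minimal sketch below illustrates why such measures are useful when God classes are a small minority. It is not the authors' code: the function names and the confusion-matrix counts are hypothetical, chosen only for illustration, and ROC analysis is not covered.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Fraction of all classes labelled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return 0.0 if expected == 1 else (observed - expected) / (1 - expected)


# Hypothetical imbalanced sample: 100 classes, 10 of them God classes.
# Detector A never flags anything; detector B finds 8 of the 10 God
# classes and raises 2 false alarms. Counts are (tp, fp, fn, tn).
detectors = {
    "A (always negative)": (0, 0, 10, 90),
    "B (8 hits, 2 false alarms)": (8, 2, 2, 88),
}

for name, counts in detectors.items():
    print(
        f"{name}: accuracy={accuracy(*counts):.3f} "
        f"MCC={mcc(*counts):.3f} kappa={cohen_kappa(*counts):.3f}"
    )
```

Under these made-up counts, detector A reaches high accuracy while both MCC and Kappa are zero, so accuracy alone would hide the fact that it detects no God classes at all.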

Odor Classification: Exploring Feature Performance and Imbalanced Data Learning Techniques

Predicting odor from vibrational spectra: a data-driven approach

Boost AI Power: Data Augmentation Strategies with Unlabeled Data and Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination with Electronic Nose

Design of Biomimetic Olfactory Sensing System by Implanted Microelectrode Array and Its Application in Odor Recognition

Predicting individual perceptual scent impression from imbalanced dataset using mass spectrum of odorant molecules

Insight into the Structure–Odor Relationship of Molecules: A Computational Study Based on Deep Learning

Advancing Odor Classification Models Enhanced by Scientific Machine Learning and Mechanistic Model: Probabilistic Weight Assignment for Odor Intensity Prediction and Uncertainty Analysis for Robust Fragrance Classification

Data Science In Olfaction

Understanding the Odour Spaces: A Step towards Solving Olfactory Stimulus-Percept Problem

A Novel Semi-Supervised Learning Approach in Artificial Olfaction for E-Nose Application

Processing and classification of chemical data inspired by insect olfaction

Olfactory Label Prediction on Aroma-Chemical Pairs

A machine learning based analysis to probe the relationship between odorant structure and olfactory behaviour in C. elegans

An Odor Labeling Convolutional Encoder-Decoder for Odor Sensing in Machine Olfaction

A Binary Classification Approach Specifically for Green Odor

A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique

A comparison of machine learning algorithms on design smell detection using balanced and imbalanced dataset: A study of God class

Mapping the combinatorial coding between olfactory receptors and perception with deep learning

Multi-label Classification Performance using Deep Learning

A deep position-encoding model for predicting olfactory perception from molecular structures and electrostatics

Code Smell Detection using Multilabel Classification Approach