Nasraldeen Alnor Adam Khleel and Károly Nehéz
Optimizing LSTM for Code Smell Detection: The Role of Data Balancing
Code smells are specific patterns or characteristics in software code that indicate potential design or implementation problems. Identifying code smells has gained significant attention in software engineering, because addressing them is essential to maintaining high-quality software systems. Machine learning (ML) models, such as Long Short-Term Memory (LSTM), have been used to detect code smells automatically based on source code features. However, the imbalanced distribution of code smells within software projects poses a challenge to the accuracy of these models. This study explores the role of data balancing methods in optimizing the accuracy of the LSTM model for code smell detection. We investigate different techniques for addressing the class imbalance problem, including random oversampling and the synthetic minority oversampling technique (SMOTE). We evaluate the performance of the LSTM model with and without data balancing methods using accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUC). Our experiments, conducted on four code smell datasets (God class, data class, feature envy, and long method) extracted from 74 open-source systems, indicate that data balancing methods had a positive effect on the predictive accuracy of the LSTM model for code smell detection. In addition, we compared our proposed method with state-of-the-art code smell detection approaches. The comparison indicates that our proposed method performs notably better than existing state-of-the-art approaches on the majority of datasets.
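To make the balancing step concrete, the following is a minimal sketch of random oversampling together with an MCC computation for binary labels. It is an illustration under stated assumptions, not the authors' implementation: the function names `random_oversample` and `mcc`, the toy feature rows, and the fixed seed are all hypothetical, and only the Python standard library is used.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (assumed): 8 non-smelly rows (label 0), 2 smelly rows (label 1).
X = [[i, i % 3] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y), "->", Counter(y_bal))
```

Random oversampling only repeats existing minority rows, whereas SMOTE synthesizes new minority rows by interpolating between a minority sample and one of its nearest minority neighbors. In either case, balancing is usually applied to the training split only, so that the evaluation data keep their original class distribution.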
DOI: 10.36244/ICJ.2024.3.5
Please cite this paper as follows:
Nasraldeen Alnor Adam Khleel and Károly Nehéz, "Optimizing LSTM for Code Smell Detection: The Role of Data Balancing", Infocommunications Journal, Vol. XVI, No. 3, September 2024, pp. 57-63, https://doi.org/10.36244/ICJ.2024.3.5