Unleashing the Power of Data Mining in Software Engineering: Applications and Case Studies

Authors

  • Vamsi Krishna Thatikonda, Senior Software Engineer, Computer Science and Technology, Chewy, Washington, USA
  • Hemavantha Rajesh Varma Mudunuri, Sr. System Architect, Restaurant Point of Sale & Management System, Toast, Georgia, Washington, USA

Keywords:

Data Mining, Software Engineering, Bug Prediction, Software Quality Assurance

Abstract

 

The intersection of data mining and software engineering has recently grown into a significant area of practice that promises to reshape both fields. This paper highlights the critical role and varied uses of data mining within software engineering, drawing on recent academic literature. It surveys key data mining methodologies, their roles across different sectors, and their specific implementations within software engineering. Organizations including Facebook and Uber have demonstrated applications of data mining in software engineering such as bug prediction, software quality assurance, user behaviour analysis, code smell detection, and project management. Although data quality and privacy remain challenges, the paper emphasizes the prospects of data mining in this field, driven largely by advances in deep learning and improved data privacy and security measures. It also considers the potential effects of these advances on the future of software engineering. The paper concludes with an optimistic outlook on the continued integration of data mining and software engineering.

Published

11/29/2023

How to Cite

Thatikonda, V. K., & Varma Mudunuri, H. R. (2023). Unleashing the Power of Data Mining in Software Engineering: Applications and Case Studies. Journal of Web Engineering & Technology, 10(3), 1–6. Retrieved from https://stmcomputers.stmjournals.com/index.php/JoWET/article/view/637