Comparison of Machine Learning Classification Algorithms for Prediction of Early-stage Diabetes

Authors

  • Shirish Mohan Dubey, Assistant Professor, Department of Computer Science & Engineering, Poornima College of Engineering, Jaipur, Rajasthan, India
  • Vani Chaturvedi, Student, Department of Computer Science & Engineering, Poornima College of Engineering, Jaipur, Rajasthan, India
  • Samidha Bafna, Student, Department of Computer Science & Engineering, Poornima College of Engineering, Jaipur, Rajasthan, India
  • Sumit Sing Rathore, Student, Department of Computer Science & Engineering, Poornima College of Engineering, Jaipur, Rajasthan, India
  • Rajat Tank, Student, Department of Computer Science & Engineering, Poornima College of Engineering, Jaipur, Rajasthan, India

Keywords:

Diabetes, prediction, classification, machine learning, model evaluation, logistic regression, KNN method, random forest, support vector machine

Abstract

Diabetes is one of the most widespread and serious diseases worldwide. It is a chronic metabolic disease whose mild symptoms are hard to spot at the initial stage. Diabetes is characterized by a high blood sugar level, which arises because insulin is unable to carry sugar from the blood into the cells to be stored or used for energy, as it should. Neglecting the identification and treatment of this disease can damage a person’s nerves, eyes, kidneys, skin, and other organs. Timely diagnosis and treatment can therefore be effective in managing the disease. The purpose of this study is to review a number of frameworks that have been proposed to predict and classify diabetes using Machine Learning (ML) algorithms. The dataset is a summarized set of data collected from various medical research centres and hospitals across India. The onset of diabetes is predicted by applying four different classification algorithms: K-Nearest Neighbours (KNN), Logistic Regression, Random Forest, and Support Vector Machine. The aim of this framework is to save patients money and time in diagnosis by using various ML approaches.
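
The kind of classifier comparison described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The data are synthetic (two made-up features and a binary label), and the helper names (make_synthetic_data, knn_predict, fit_logistic, logistic_predict, accuracy) are hypothetical. Only KNN and Logistic Regression are shown, to keep the example short. Random Forest and Support Vector Machine would normally come from a library such as scikit-learn.

```python
import math
import random


def make_synthetic_data(n=200, seed=0):
    """Hypothetical stand-in data: two numeric features and a binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Assumption: positive cases cluster around a higher mean on both features.
        center = 2.0 if label else -2.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


def knn_predict(train, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def sigmoid(z):
    """Logistic function, with z clamped to avoid overflow in math.exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(train, lr=0.1, epochs=200):
    """Fit logistic regression weights with plain batch gradient descent."""
    n_features = len(train[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in train:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(train)
    return w, b


def logistic_predict(w, b, x):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return 1 if sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0


def accuracy(predict, test):
    """Fraction of held-out examples that the classifier labels correctly."""
    return sum(1 for x, y in test if predict(x) == y) / len(test)


# Hold out the last 20% of the synthetic rows for evaluation.
data = make_synthetic_data()
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

w, b = fit_logistic(train)
results = {
    "KNN": accuracy(lambda x: knn_predict(train, x), test),
    "Logistic Regression": accuracy(lambda x: logistic_predict(w, b, x), test),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Evaluating every model on the same held-out split keeps the accuracies comparable. On real clinical data, further metrics such as precision and recall would usually be reported as well.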

Published

2023-06-12

How to Cite

[1]
S. M. Dubey, V. Chaturvedi, S. Bafna, S. S. Rathore, and R. Tank, “Comparison of Machine Learning Classification Algorithms for Prediction of Early-stage Diabetes”, JoSETTT, vol. 10, no. 1, pp. 1–5, Jun. 2023.