Diabetes Prediction Using Machine Learning Algorithms

Diabetes is a chronic illness that threatens healthcare systems worldwide. According to the International Diabetes Federation, 382 million people worldwide live with diabetes, a number that is predicted to reach 592 million by 2035. Diabetes is characterized by high levels of blood glucose, which produce symptoms such as frequent urination, increased thirst, and heightened hunger. It is a major cause of blindness, kidney failure, amputations, heart failure, and stroke. When we consume food, the body converts it into glucose, and the pancreas releases insulin, which unlocks our cells so that glucose can enter and be used as energy. In this work, several algorithms (K-Nearest Neighbour, Logistic Regression, Random Forest, Support Vector Machine, and Decision Tree) are used to predict diabetes, and the accuracy of each model is calculated. The algorithm that produces the best accuracy is selected as the model for diabetes prediction.

Keywords: Machine Learning, Diabetes, Decision Tree, K-Nearest Neighbour, Logistic Regression, Support Vector Machine, Accuracy.

Introduction:

Diabetes is an ailment that is becoming rapidly more common in both old and young populations. To understand diabetes, we must first understand what happens in the body when diabetes is absent. Glucose comes from the carbohydrate foods that make up much of our diet. Carbohydrate foods, such as bread, cereal, pasta, rice, fruit, dairy products, and starchy vegetables, provide the body with its primary source of energy, including in people with diabetes. Once we consume these foods, the body breaks them down into glucose.

Types of Diabetes:

Type 1 Diabetes: The immune system attacks the insulin-producing cells of the pancreas, so the body fails to produce enough insulin. The exact cause of type 1 diabetes has not been established, and there are no known methods of prevention.

Type 2 Diabetes: The pancreas produces an insufficient amount of insulin, or the body cannot use insulin correctly. This is the most common type of diabetes and affects approximately 90% of people diagnosed with diabetes. Both genetic factors and lifestyle contribute to it.

Literature Survey:

Yasodha et al. [1] classified diverse datasets to determine whether or not a person is diabetic. The data set of diabetic patients was obtained from a hospital warehouse and contained two hundred instances with nine attributes. The data was classified using WEKA and evaluated with 10-fold cross-validation. They concluded that J48 produced the best results, with an accuracy of 60.2%.

Aiswarya et al. [2] aimed to detect diabetes by examining patterns in the data through classification analysis with the Decision Tree and Naïve Bayes algorithms. On the PIMA dataset, the study found that the J48 algorithm reached an accuracy of 74.8% under cross-validation, while the Naïve Bayes algorithm reached an accuracy of 79.5% with a 70:30 split.
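
To make the two evaluation schemes mentioned above concrete, the following sketch contrasts 10-fold cross-validation with a single 70:30 hold-out split. It is an illustration only, not the code used in the cited studies: the function names `k_fold_indices` and `holdout_split` are hypothetical, and the sample count of 200 is borrowed from the first study purely as an example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first n % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def holdout_split(n_samples, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) for one shuffled hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


# Hypothetical usage with 200 instances (an assumption for illustration).
folds = list(k_fold_indices(200, 10))
train_idx, test_idx = holdout_split(200, 0.7)
print(len(folds), len(folds[0][0]), len(folds[0][1]))
print(len(train_idx), len(test_idx))
```

In cross-validation every instance is used for testing exactly once, and the reported accuracy is usually the average over the folds. A hold-out split tests on a single fixed subset, so accuracies obtained under the two schemes are not directly comparable.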

Methodology:

This paper covers various machine learning classifiers used to predict diabetes. The methods used are defined below, and the proposed methodology for improving accuracy is discussed. Five algorithms were trained and the accuracy of each was measured. The resulting models can then be used to predict diabetes.
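
The selection step described above, where each model is scored by accuracy and the best one is kept, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the helper names `accuracy` and `select_best_model` are hypothetical, and the labels and predictions are made-up values, not results from any experiment.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def select_best_model(predictions_by_model, y_true):
    """Score each model's test-set predictions and return (best_name, scores)."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical test labels (1 = diabetic, 0 = non-diabetic) and
# hypothetical predictions from three already-trained models.
y_test = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predictions = {
    "knn": [1, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    "decision_tree": [1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    "logistic_regression": [0, 0, 0, 1, 0, 1, 1, 0, 1, 1],
}

best_name, scores = select_best_model(predictions, y_test)
print(best_name, scores)
```

The same pattern extends to all five classifiers: each one is trained on the same training subset, asked to predict the same test subset, and compared on the resulting accuracy.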

K-Nearest Neighbour Algorithm:

K-Nearest Neighbour is among the simplest machine learning algorithms and is based on supervised learning. It assumes that a new case is similar to the available cases and assigns it to the category whose members it most closely resembles.

The K-NN algorithm categorizes new data points based on their similarity to existing data stored in memory. It is commonly used for classification problems, although it can also be applied to regression. K-NN is considered a non-parametric algorithm because it makes no assumptions about the underlying data distribution. It is sometimes referred to as a lazy learner because it does not build a model during training; it simply stores the dataset and defers all computation until a new point has to be classified. During classification, K-NN compares the features of the new data point with those of the stored data and assigns it the category of its closest neighbours. For example, an image can be recognized as a cat or a dog according to which stored images have the most similar features.
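
The distance-and-vote idea behind K-NN can be sketched in a few lines. This is a minimal illustration under stated assumptions: Euclidean distance is one common choice of similarity measure (the text does not name one), the function name `knn_predict` is hypothetical, and the two-feature points are invented values rather than patient data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: nothing is fitted in advance; the stored data is
    # scanned only when a prediction is requested.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training points (values are made up).
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
train_labels = ["non-diabetic", "non-diabetic", "non-diabetic",
                "diabetic", "diabetic", "diabetic"]

prediction = knn_predict(train_points, train_labels, (6.2, 6.1), k=3)
print(prediction)
```

Because distances drive the vote, features measured on very different scales are usually rescaled first so that no single feature dominates.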

To apply K-NN to a classification problem, the following steps are recommended: loading the data, splitting the dataset into training and testing subsets, creating the K-NN classifier, training the classifier on the training subset, testing it on the testing subset, and calculating performance metrics. The results can be reported as outcome class bar plots, classification accuracy graphs, confusion matrices, and classification tables.
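
The steps above can be sketched end to end as follows. This is an assumption-laden illustration, not the study's code: the data is synthetic (two well-separated Gaussian clusters standing in for the two outcome classes), the 70:30 split and k = 5 are arbitrary choices, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.3, seed=42):
    """Shuffle the data and split it into training and testing subsets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def knn_classify(X_train, y_train, x, k):
    """Majority vote among the k training rows closest to x."""
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Step 1: data input (synthetic stand-in data, not a real diabetes dataset).
rng = random.Random(0)
rows = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
        + [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(50)])
labels = [0] * 50 + [1] * 50

# Step 2: split into training and testing subsets.
X_train, X_test, y_train, y_test = train_test_split(rows, labels)

# Steps 3-5: K-NN "training" is just keeping X_train and y_train;
# testing classifies every held-out row.
y_pred = [knn_classify(X_train, y_train, x, k=5) for x in X_test]

# Step 6: performance metrics.
matrix = confusion_matrix(y_test, y_pred)
accuracy = (matrix[0][0] + matrix[1][1]) / len(y_test)
print(matrix, accuracy)
```

The diagonal of the confusion matrix counts correct predictions for each class, so accuracy is the diagonal sum divided by the number of test rows; the off-diagonal cells show which class is being mistaken for which.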

One such application of K-NN is the detection of diabetes at an early stage. In this study, a system was designed to predict diabetes using several machine learning classification algorithms. Five algorithms were evaluated and compared on the John Diabetes Database, and the Decision Tree algorithm achieved 99% prediction accuracy. The system and the algorithms used could be extended to other health conditions for future automated analysis.

References:

1. Aljumah, A.A., Ahamad, M.G., Siddiqui, M.K. (2013). Application of data mining: Diabetes health care in young and old patients. Journal of King Saud University - Computer and Information Sciences, 25, 127-136.

2. Arora, R., Suman (2012). Comparative Analysis of Classification Algorithms on Different Datasets using WEKA. International Journal of Computer Applications, 54, 21-25.

3. Bamnote, M.P., G.R. (2014). Design of Classifier for Detection of Diabetes Mellitus Using Genetic Programming. Advances in Intelligent Systems and Computing, 1, 763-770.

4. Choubey, D.K., Paul, S., Kumar, S., Kumar, S. (2017). Classification of Pima Indian diabetes dataset using Naive Bayes with genetic algorithm as an attribute selection. In Communication and Computing Systems: Proceedings of the International Conference on Communication and Computing System (ICCCS 2016) (pp. 451-455).
