HEART DISEASE PREDICTION BASED ON PHYSIOLOGICAL PARAMETERS USING ENSEMBLE CLASSIFIER AND PARAMETER OPTIMIZATION

This study describes the prediction of heart disease using ensemble classifiers with parameter optimization. The input is a public heart disease dataset from the UCI Machine Learning Repository, consisting of 13 variables that are considered to influence heart disease. Particle swarm optimization (PSO) was used for feature selection and principal component analysis (PCA) for feature extraction to reduce the feature dimensions. Parameter optimization was applied to several machine learning methods, namely SVM (Radial Basis Function kernel), Deep Learning, and Ensemble Classifiers (bagging and boosting), and their accuracies were compared. On the public heart disease dataset, dimensionality reduction with PSO resulted in lower accuracy than PCA. The highest accuracy was obtained by optimizing Deep Learning parameters (84.47%), followed by optimizing SVM RBF parameters (83.56%). Among the ensemble classifiers, the highest accuracy was 83.51%, obtained by bagging on SVM, a difference of 0.5% from SVM without bagging.


Introduction
Heart disease is one of the diseases with the highest risk of death. According to World Health Organization (WHO) data, 17.5 million people died from heart disease in 2012, accounting for 31% of all deaths worldwide (Al-Mawali, 2015). According to 2018 data, the age-adjusted cardiovascular disease (CVD) death rate in the United States was 217.1 per 100,000, and a CVD-related death occurs every 36 seconds. In the US, a stroke victim dies every three minutes and 33 seconds; based on 2018 data, 405 people die from stroke daily (Virani et al., 2021). It is further anticipated that, in the coming years, this burden will combine with other important risk factors, including hypertension, high cholesterol, and diabetes mellitus, to become an even more significant health problem (Farkouh et al., 2013). Efforts are therefore needed to improve the prevention of heart disease, such as consulting a cardiologist for medical treatment and maintaining a healthy body (Winnige et al., 2021). In reality, however, people are reluctant to undergo regular heart health checks because of a lack of public awareness of the importance of health, which is an obstacle to the early detection of heart disease in the community (Nardin et al., 2020).
The rapid development of machine learning (ML) technology to solve society's problems has become very important, especially in medical informatics or biomedical computing, including heart disease prediction (Ahmad & Polat, 2023; Nagavelli et al., 2022). Research related to heart disease can generally be divided into two categories. The first is biological signal processing to detect heart disease, and the second is data mining of the variables that trigger heart disease. Research in the first category mainly involves processing electrocardiogram signals (Cheng et al., 2020; Hadiyoso & Rizal, 2017; Pestana et al., 2020) or heart sounds (Rizal & Suratman, 2020; Li et al., 2022; Zeinali & Niaki, 2022) to detect heart disease. Heart imaging through echocardiography is also an alternative for detecting heart disease (Liastuti et al., 2022; Mabrouk et al., 2016). Another imaging technique for analyzing blood vessels related to heart disease is angiography (Khan Mamun & Elfouly, 2023). Photoplethysmogram (PPG) analysis also has the potential to detect heart failure (Ave et al., 2015; Fahoum et al., 2023). The second category predicts heart disease based on specific parameters such as complaints, lifestyle, or the results of a person's physical examination (Sharma & Parmar, 2020; Hossain et al., 2023; Bharti et al., 2021). Cardiovascular disease can even be predicted by analyzing mental illness (Cunningham et al., 2019). The advantage of this second approach is that it can predict heart disease before it occurs.
Methods for predicting or detecting heart disease using physiological examination variables, blood tests, age, and gender have been reported in several studies. However, there is still room to improve accuracy and to reduce the number of variables used in prediction. Therefore, this research proposes feature dimensionality reduction using principal component analysis (PCA) and particle swarm optimization (PSO), combined with parameter optimization of machine learning methods (SVM, deep learning, and bagging and boosting ensemble classifiers). It is hypothesized that this combination of methods can improve detection accuracy even though the number of feature attributes is reduced. This study aims to find the machine learning model with the highest accuracy while testing dimensionality reduction algorithms.

Research Methods
This study used RapidMiner to carry out the data mining process. RapidMiner was chosen because it offers many data mining modeling and visualization features whose output is easy to read. In this research, RapidMiner supports the normalization process, parameter optimization, and ensemble classifiers. The RapidMiner process is divided into two main branches: applying the machine learning methods without dimensionality reduction and applying them with dimensionality reduction. Testing dimensionality reduction is needed to determine whether features that are not required or that have negligible weight can be removed (Thrun et al., 2023; Vachharajani & Pandya, 2022). The first step is to input the dataset, which is then normalized. The normalized data is then either passed directly to the classifiers or first reduced with PSO or PCA, so that the accuracy of each option can be compared. Each resulting dataset was then tested for accuracy using SVM RBF parameter optimization, deep learning, and ensemble classifiers (bagging and boosting). Figure 1 shows the flowchart of the proposed model, and Figure 2 shows its implementation.
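The RapidMiner process itself is not reproduced here, but its first two stages can be sketched in plain Python. The section does not say which normalization method was used, so this sketch assumes z-score standardization (mean 0, standard deviation 1 per column); the lists `REDUCTIONS` and `CLASSIFIERS` are illustrative names for the experimental branches described above, not identifiers from the original workflow.

```python
import statistics

def zscore_normalize(rows):
    """Standardize each column to mean 0 and population standard deviation 1.

    rows: list of equal-length lists of numbers, one list per instance.
    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical labels for the experimental branches: every reduction option
# (including none) is paired with every classifier setting.
REDUCTIONS = ["none", "PSO", "PCA"]
CLASSIFIERS = ["SVM RBF (optimized)", "Deep Learning (optimized)", "Bagging", "Boosting"]
configs = [(r, c) for r in REDUCTIONS for c in CLASSIFIERS]
```

Enumerating the branches explicitly makes it clear that each reduction setting is evaluated against every classifier, which is what allows the accuracy comparisons reported later.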

Dataset
The research data were taken from a public dataset available on Kaggle that originates from the UCI Machine Learning Repository. The data are open access and have been used by other researchers (Moturi et al., 2020; Sharma & Parmar, 2020). Specifically, this work uses the Cleveland Heart Disease dataset extracted from the UCI repository, with a total of 14 attributes used in the diagnosis process (13 input variables and the target) and 303 data instances (details are shown in Table 1). Dataset processing starts by taking the raw data from the UCI Machine Learning Repository in CSV format and converting it to Excel to simplify importing into RapidMiner. The "target" attribute, a nominal 0-1 value, is converted into "yes" and "no". After the data has been imported (Read Excel), the "target" attribute is assigned the label role.
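Outside RapidMiner, the same preparation (reading the CSV, mapping the 0-1 target to "no"/"yes", and separating the label from the input variables) can be sketched with Python's standard `csv` module. The mapping 1 → "yes" and 0 → "no" is an assumption, as is the helper name `load_heart_csv`; the two sample rows in the usage example are hypothetical values, not records quoted from the dataset.

```python
import csv
import io

# Assumed encoding: 1 = heart disease present ("yes"), 0 = absent ("no").
TARGET_MAP = {0: "no", 1: "yes"}

def load_heart_csv(text, label_column="target"):
    """Parse CSV text into (feature_names, numeric feature rows, yes/no labels).

    Every non-label column is assumed to be numeric. An unexpected target
    value raises KeyError instead of being silently mislabeled.
    """
    reader = csv.DictReader(io.StringIO(text))
    names, features, labels = None, [], []
    for record in reader:
        raw = record.pop(label_column)  # remove the label from the inputs
        if names is None:
            names = list(record.keys())
        features.append([float(record[name]) for name in names])
        labels.append(TARGET_MAP[int(float(raw))])
    return names, features, labels

# Hypothetical two-row sample with a subset of the columns.
sample = "age,sex,chol,target\n63,1,233,1\n41,0,204,0\n"
names, X, y = load_heart_csv(sample)
```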

Dimensionality Reduction
This study compares two dimensionality reduction methods, PCA and PSO. Principal Component Analysis (PCA) converts the data to an orthogonal basis and projects it onto a lower-dimensional subspace (Yao et al., 2012). This reduces the number of features needed for an efficient representation of the data. By modelling the data distribution in the transformed space, the method can represent the data with a smaller, finite set of variables (Pimentel et al., 2014).
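The projection described above can be sketched with NumPy by eigendecomposing the covariance matrix of the centered data. This is a generic textbook PCA, not RapidMiner's PCA operator; the number of retained components is a free parameter here because the section does not state how many components were kept.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns the projected data and the fraction of total variance explained
    by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each feature
    cov = np.cov(Xc, rowvar=False)            # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]    # orthonormal basis of the subspace
    explained = eigvals[:n_components] / eigvals.sum()
    return Xc @ components, explained
```

For points lying on a line, such as `[[1, 2], [2, 4], [3, 6]]`, a single component explains essentially all of the variance, which illustrates how PCA can drop dimensions without losing information.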
The Particle Swarm Optimization (PSO) algorithm, proposed by Kennedy and Eberhart in 1995, was inspired by the foraging behaviour of bird flocks and fish schools, whose members are modelled as particles (Miraswan & Maulidevi, 2016; Zhenyu Meng et al., 2022). The particles move through the search space with individual velocities, continually updating their positions until they reach the target optimum (Jamian et al., 2014). PSO places simple agents known as particles in the search space of a given problem or function, and the fitness function is evaluated at each particle's current location. Each particle then decides its movement through the search space by combining its own best position so far, the best position found by one or more swarm members, and some random behaviour. Once every particle has moved, the next iteration starts (Miraswan & Maulidevi, 2016). Each member of the swarm is characterized by three D-dimensional vectors, where D is the dimensionality of the search space:
1. Current position, xi
2. Previous best position, pi
3. Particle velocity, vi
The current position xi can be seen as a point in the search space and is regarded as a candidate solution to the problem in each iteration of the algorithm. If that position achieves the best fitness score found so far by the particle, its coordinates are stored in the vector pi. The best position found by the whole swarm is stored in a variable called Pg for comparison with the outcomes of the following iterations; the objective is to keep track of the best location found (Jamian et al., 2014). In this study, PSO uses the three quantities described above. The general steps of PSO in data mining are as follows:
1. Particle Population Initialization: The first step is to initialize the particle population. Each particle represents a potential solution in the search space, which may be a set of model parameters or a set of relevant features.
2. Particle Quality Assessment: Each particle is evaluated for its quality in achieving the optimization goal.
3. Determination of the Personal Best (Pbest): Each particle stores its own best position (Pbest) based on its quality assessment.
4. Determination of the Global Best (Gbest): In addition to storing Pbest, PSO tracks the best solution found by all particles in the population.
5. Particle Position Update: Each particle updates its position based on its personal experience (Pbest) and the collective experience (Gbest), which describes how particles move in search of a better solution.
6. Evaluation and Stopping Criteria: PSO iterations continue with position updates and evaluation until a stopping criterion is met. The stopping criterion can be the number of iterations performed, the accuracy reaching a certain threshold, or an insignificant improvement in quality.
7. Optimal Solution: After the iterations are complete, the best solution represented by Gbest (or Pbest) is considered the optimal or best solution found by the PSO algorithm.
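The seven steps above can be sketched as a global-best PSO in plain Python. This sketch uses the common inertia-weight velocity update, v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), which is a later refinement of the 1995 formulation; the values of `w`, `c1`, `c2`, the swarm size, and the bounds are illustrative assumptions, not the settings used in this study. Fitness is treated as a cost to minimize, so maximizing accuracy corresponds to minimizing 1 − accuracy.

```python
import random

def pso_minimize(fitness, dim, n_particles=20, iterations=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Global-best PSO minimizing `fitness` over a box-bounded search space."""
    rng = random.Random(seed)
    lo, hi = bounds
    # 1. Initialize positions x_i uniformly in the box and velocities v_i at zero.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    # 2-3. Assess each particle and record its personal best p_i (Pbest).
    p = [xi[:] for xi in x]
    p_fit = [fitness(xi) for xi in x]
    # 4. Global best Pg (Gbest) across the swarm.
    g_idx = min(range(n_particles), key=lambda i: p_fit[i])
    g, g_fit = p[g_idx][:], p_fit[g_idx]
    for _ in range(iterations):              # 6. stopping criterion: iteration budget
        for i in range(n_particles):
            for d in range(dim):             # 5. velocity and position update
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p[i][d] - x[i][d])
                           + c2 * r2 * (g[d] - x[i][d]))
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)  # stay inside the box
            f = fitness(x[i])
            if f < p_fit[i]:                 # update Pbest, and Gbest if improved
                p[i], p_fit[i] = x[i][:], f
                if f < g_fit:
                    g, g_fit = x[i][:], f
    return g, g_fit                          # 7. best solution found

def sphere(position):
    """Toy cost with its minimum (0) at the origin."""
    return sum(value ** 2 for value in position)

best, best_cost = pso_minimize(sphere, dim=3)
```

For feature selection, one common option (assumed here, not confirmed by the source) is to let each coordinate in [0, 1] act as a feature weight, keep features whose weight exceeds a threshold, and use the classifier's cross-validated error as the fitness.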

Classifiers
Support Vector Machine
SVM is a learning system that uses a hypothesis space of linear functions in a high-dimensional feature space, trained with an algorithm based on optimization theory and statistical learning theory (Srivastava & Bhambhu, 2010). SVM can handle non-linear data by applying a kernel to the original features of the dataset (Awad & Khanna, 2015). Kernel functions map the data from a lower-dimensional space to a higher-dimensional one (Abbaszadeh et al., 2019). In this study, the Radial Basis Function (RBF) kernel is used in the classification process to obtain better accuracy:
$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$
where $x_i$ and $x_j$ are a pair of training samples and $\sigma^2 > 0$ is a constant. The kernel function replaces the dot product in the feature space, so the new features produced depend strongly on the choice of kernel function.
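The RBF kernel formula can be checked numerically with a short sketch. The function names are illustrative; note that many SVM implementations (LIBSVM-style) parameterize the same kernel with gamma = 1/(2σ²) instead of σ.

```python
import math

def rbf_kernel(xi, xj, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2)), with sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(X, sigma=1.0):
    """Pairwise kernel values the SVM uses in place of feature-space dot products."""
    return [[rbf_kernel(a, b, sigma) for b in X] for a in X]

# Two points at squared distance 2 with sigma = 1 give exp(-1).
k = rbf_kernel([0.0, 0.0], [1.0, 1.0])
```

Every sample has kernel value 1 with itself, and the value decays towards 0 as points move apart, with σ controlling how quickly; this is why σ (or gamma) is the parameter tuned in the SVM RBF optimization.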

Deep Learning
Deep learning is an artificial neural network approach that takes data as input and processes it through multiple hidden layers, each applying a non-linear transformation to its input, to calculate the output value (Li et al., 2019).
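The layer-by-layer non-linear transformation can be illustrated with a minimal NumPy forward pass. The layer sizes, ReLU hidden activation, sigmoid output, and random weights are all assumptions for illustration; the section does not specify the network architecture, and training (backpropagation) is omitted.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU hidden layers followed by a single sigmoid output unit."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)      # non-linear transformation per hidden layer
    logits = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))    # probability of the positive class

rng = np.random.default_rng(0)
sizes = [13, 16, 8, 1]                      # hypothetical: 13 inputs, two hidden layers
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
probs = mlp_forward(rng.normal(size=(5, 13)), weights, biases)  # 5 hypothetical instances
```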
This study uses parameter optimization of the learning rate, with a minimum value of 0.01, a maximum of 1.0, and 10 steps, and uses the GPU for processing so that computation runs faster than with the CPU and RAM alone.
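A grid search over the learning rate can be sketched as follows. Two assumptions are made here: that 10 steps produce 11 grid points including both endpoints (as in many grid-search tools), and that the scale is either linear or logarithmic; the section does not state which scale was used. The `evaluate` function in the usage example is a hypothetical stand-in for training the network and measuring its accuracy.

```python
import math

def linear_grid(lo, hi, steps):
    """steps + 1 evenly spaced values from lo to hi inclusive."""
    return [lo + i * (hi - lo) / steps for i in range(steps + 1)]

def log_grid(lo, hi, steps):
    """steps + 1 values evenly spaced on a logarithmic scale from lo to hi inclusive."""
    ratio = math.log(hi / lo)
    return [lo * math.exp(i * ratio / steps) for i in range(steps + 1)]

def grid_search(evaluate, candidates):
    """Return (best_value, best_score) for a score to maximize, e.g. accuracy."""
    best = max(candidates, key=evaluate)
    return best, evaluate(best)

lin = linear_grid(0.01, 1.0, 10)   # 0.01, 0.109, 0.208, ..., 1.0
log = log_grid(0.01, 1.0, 10)      # 0.01, ~0.0158, ~0.0251, ..., 1.0

# Hypothetical evaluation whose score peaks at a learning rate of 0.1.
best_lr, _ = grid_search(lambda lr: -(lr - 0.1) ** 2, log)
```

Under these assumptions, a learning rate of exactly 0.1 lies on the logarithmic grid (the middle point) but not on the linear one, whose points are spaced 0.099 apart.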

Ensemble Classifier
Ensemble classifiers are also used to address class imbalance (Liu et al., 2022). Approaches to this problem follow two possible trajectories: the data-level approach and the algorithm-level approach (Murugananthan & Durairaj, 2019). The ensemble strategy aims to enhance the algorithm without altering the data. Boosting and bagging are two common ensemble algorithms (Jafarzadeh et al., 2021). AdaBoost is a boosting algorithm with good classification performance, while bagging is a straightforward but effective ensemble method that has been used in numerous real-world applications to improve the accuracy of classification algorithms. The general steps of an ensemble classifier are as follows:
1. Creating Base Learner Models: The first step is to build a number of different base models (base learners). These models can come from various machine learning algorithms such as Decision Trees, Random Forests, Logistic Regression, Support Vector Machines, Neural Networks, and others.
2. Base Learner Model Training: Each base model is trained on the training data (in bagging, on bootstrap resamples of it; in boosting, on versions reweighted toward previously misclassified samples). The models generate individual predictions based on the characteristics of the data and the rules they learn during training.
3. Prediction Combination: The outputs of the base models are combined to produce an ensemble prediction. Combining methods vary, but the most common are majority voting (bagging), weighted voting (boosting), or learning how to combine the base predictions with a meta-model (stacking).
4. Evaluation and Accuracy: Ensemble predictions are evaluated on separate test data to measure their performance.
The working principle of the ensemble classifier is based on the assumption that different base models have different strengths and weaknesses. By combining predictions from these models, the weaknesses of one model can be offset by the strengths of others, resulting overall in more accurate and stable predictions (Rokach, 2010).
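The training and combination steps above can be sketched for both ensemble types in plain Python. For brevity this sketch uses decision stumps as base learners instead of the SVM used in the experiments, and assumes labels encoded as +1 / -1; it implements bootstrap bagging with majority voting and discrete AdaBoost with weighted voting. Function names are illustrative.

```python
import math
import random

def fit_stump(X, y, w):
    """Weighted decision stump: the (feature, threshold, polarity) with least weighted error.

    Labels y are +1 / -1. Returns (weighted_error, feature, threshold, polarity).
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] <= t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best

def stump_predict(stump, row):
    _, f, t, polarity = stump
    return polarity if row[f] <= t else -polarity

def bagging(X, y, n_models=15, seed=0):
    """Bagging: train each stump on a bootstrap resample, predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], [1.0] * n))
    return lambda row: 1 if sum(stump_predict(m, row) for m in models) >= 0 else -1

def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost: reweight misclassified rows, combine stumps by weighted vote."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)   # weighted error rate (weights sum to 1); avoid log(x/0)
        if err >= 0.5:               # no better than chance: stop adding learners
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise weights of misclassified rows, lower weights of correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    # Weighted vote; an empty ensemble defaults to +1.
    return lambda row: 1 if sum(a * stump_predict(s, row) for a, s in ensemble) >= 0 else -1
```

The design difference is visible in the code: bagging varies the training data through resampling and gives every model an equal vote, while boosting keeps the data fixed, shifts weight toward hard samples, and gives more accurate learners a larger vote (alpha).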

Results and Discussions
In this study, the data were first reduced in dimension using PSO or PCA. In the next stage, classification was carried out using SVM and DL, which were then combined with bagging and boosting ensemble methods. These results were compared with the accuracy obtained without dimensionality reduction. The table above shows that feature extraction can increase classification accuracy quite significantly. In this setting, the highest accuracy, 84.20%, was obtained using SVM (RBF) parameter optimization. For the ensemble classifiers, accuracy increases considerably from PSO to PCA. Next, classification is performed without dimensionality reduction, using all of the existing input variables listed in Table 4. Even with a small dataset (13 input variables and 303 instances), the accuracy achieved is considerable; with more input variables and instances, the accuracy might be higher. Notably, the highest accuracy, 84.47%, was obtained from deep learning parameter optimization with a learning rate of 0.1, beating the accuracy of SVM (RBF) parameter optimization.
The results show that reducing the feature dimension does not necessarily increase classification accuracy. With PSO feature reduction, the accuracy is much lower than with PCA. This is because optimization in general does not guarantee reaching the maximum value; it only improves performance. Meanwhile, classification without dimensionality reduction increases accuracy in some cases, which is understandable because the original features provide complete information about the data, whereas feature reduction may eliminate information that is important for classification.
The ensemble classifier process generally does not produce higher accuracy than the parameter optimization process. This is because the learning algorithms in the ensemble are used without tuning their kernel values, which results in lower accuracy than with parameter optimization. In the parameter optimization process, trials are conducted to find the kernel value with the smallest Root Mean Square Error (RMSE), which is then used as the reference kernel value. This matters particularly for SVM, whose optimal parameter values are difficult to determine, so parameter optimization tests were carried out on the SVM kernels (dot, RBF, polynomial). Jung et al. (Hojin Nam, Minseo Rhee, Jeung-Sun Lee, 2021) used the same dataset as this study.
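The kernel selection by smallest RMSE described above can be sketched as follows, using the standard definition RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²). The predicted confidences in the usage example are hypothetical numbers chosen for illustration, not results from this study, and the helper names are assumptions.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pick_kernel(y_true, predictions_by_kernel):
    """Return the kernel name whose predictions have the smallest RMSE."""
    return min(predictions_by_kernel,
               key=lambda name: rmse(y_true, predictions_by_kernel[name]))

# Hypothetical predicted confidences for the positive class (label 1).
y_true = [1, 0, 1, 1]
preds = {
    "dot": [0.6, 0.4, 0.5, 0.7],
    "RBF": [0.9, 0.1, 0.8, 0.9],
    "polynomial": [0.7, 0.5, 0.6, 0.6],
}
chosen = pick_kernel(y_true, preds)
```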
Another study on feature analysis in chronic heart disease using the same dataset was conducted by El-Bialy et al. (2015). That study used the C4.5 decision tree algorithm and grouped the data by comparing it with other datasets, so that the data with the highest similarity were grouped together. It achieved a classification accuracy of 78.06%, greater than the 75.48% average classification accuracy obtained using a separate dataset. The results of the present study were compared with El-Bialy et al. (2015) on the same dataset. Parameter optimization increased accuracy: the SVM ensemble classifier (bagging and boosting) and Deep Learning reached 83.34% and 84.47%, respectively, compared with 76.30% and 75.48% without parameter optimization. Optimizing ensemble classifier parameters is a promising direction for further research. The performance tests of the proposed method show that parameter optimization with DL increases accuracy, although not significantly. Meanwhile, combining feature dimensionality reduction with parameter optimization does not significantly increase accuracy. The highest accuracy is achieved using DL with parameter optimization and without feature reduction. The ensemble classifiers cannot produce higher accuracy because their parameters are not optimized. Opportunities remain to achieve higher accuracy with better parameter optimization in the proposed method.

Conclusion
In this study, a combination of dimensionality reduction and classifier parameter optimization is proposed to predict heart disease. PCA and PSO are used for dimensionality reduction, while the classifiers used are SVM (Radial Basis Function), Deep Learning, and Ensemble Classifiers (bagging and boosting). The test results show that the highest accuracy is obtained with deep learning parameter optimization, at 84.47%, a difference of 0.29% from the PCA setting. The ensemble classifiers remain below the highest accuracy, reaching 83.17% with boosting SVM. This occurs because ensemble classifiers (bagging and boosting) are suited to data with certain characteristics, such as datasets large enough for them to produce a significant gain in accuracy.
For clinical health applications, this research can support doctors in early detection using the observed parameters. It is hoped that this research can serve as a reference for further studies on heart disease prediction, enabling the early detection of heart disease risk. Further analysis could increase the number of input variables and instances so that accuracy can improve. Other algorithms, such as KNN, Naïve Bayes, and Decision Tree, can also be explored in further research, and parameter optimization is also needed when testing the accuracy of each machine learning method.

Table 1 - Variables and Type of Data