Optimization of tree-based machine learning algorithms for improving the predictive accuracy of hepatitis C disease

dc.authorscopusid: Femilda Josephin Joseph Shobana Bai / 57810685700
dc.authorwosid: Femilda Josephin Joseph Shobana Bai / AGG-4255-2022
dc.contributor.author: Bai, Femilda Josephin Joseph Shobana
dc.contributor.author: Jasmine, R. Anita
dc.date.accessioned: 2025-04-18T10:08:20Z
dc.date.available: 2025-04-18T10:08:20Z
dc.date.issued: 2024
dc.department: İstinye Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü
dc.description.abstract: Hepatitis C is a globally prevalent viral infection that has the potential to cause significant liver-related complications if not appropriately managed. The timely and precise identification of the medical condition is imperative for the efficient administration of patient care and therapy. One of the precise and potential diagnosis methods in the identification of hepatitis C is the utilization of machine learning (ML) algorithms. The present investigation focuses on the optimization of four ML algorithms which are tree-based algorithms, namely, random forest (RF), gradient boosting machines (GBMs), light gradient boosting machines (LGBMs), and extreme gradient boosting (XGBoost) with the aim of enhancing the predictive accuracy of hepatitis C disease. The investigation utilized a reliable dataset from the University of California, Irvine (UCI) Machine Learning Repository. The research methodology encompasses various stages, including data preprocessing, feature selection, hyperparameter tuning, and model evaluation. Optimization techniques, including the synthetic minority oversampling technique (SMOTE) for data balancing and grid search optimization for hyperparameter tuning, were utilized to improve the models’ performance. The optimized models were assessed through the utilization of stratified k-fold cross-validation and performance metrics, which comprise accuracy, precision, recall, F1-score, and area under the receiver operating characteristic (ROC) curve. The findings of our study indicate that the optimized tree-based algorithms exhibit superior performance compared to their nonoptimized counterparts. Specifically, LGBM demonstrated the highest level of predictive accuracy at 98.91%, followed by XGBoost at 98.70%, GBM at 97.83%, and RF at 97.29%. The LGBM learning approach has the potential to be broadly applied and extended to diverse medical datasets and use cases, thus advancing ML in the healthcare domain. The study highlights the importance of optimizing tree-based algorithms to improve the accuracy of early prediction of the prevalence of hepatitis C disease and promote patient health. This underscores the capacity of ML to improve healthcare outcomes. © 2024 Elsevier Inc. All rights reserved.
dc.identifier.citation: Bai, F. J. J. S., & Jasmine, R. A. (2024). Optimization of tree-based machine learning algorithms for improving the predictive accuracy of hepatitis C disease. In Decision-Making Models (pp. 523-545). Academic Press.
dc.identifier.doi: 10.1016/B978-0-443-16147-6.00015-3
dc.identifier.endpage: 545
dc.identifier.isbn: 978-044316147-6, 978-044316148-3
dc.identifier.scopus: 2-s2.0-85202870219
dc.identifier.scopusquality: N/A
dc.identifier.startpage: 523
dc.identifier.uri: https://hdl.handle.net/20.500.12713/6956
dc.indekslendigikaynak: Scopus
dc.institutionauthor: Bai, Femilda Josephin Joseph Shobana
dc.institutionauthorid: Femilda Josephin Joseph Shobana Bai / 0000-0003-0249-9506
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.ispartof: Decision-Making Models: A Perspective of Fuzzy Logic and Machine Learning
dc.relation.publicationcategory: Book Chapter - International
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Extreme Gradient Boosting Machines
dc.subject: Gradient Boosting Machines
dc.subject: Hepatitis C Disease Prediction
dc.subject: Hyperparameter Optimization
dc.subject: Light Gradient Boosting Machines
dc.subject: Machine Learning
dc.subject: Random Forest
dc.subject: SMOTE
dc.title: Optimization of tree-based machine learning algorithms for improving the predictive accuracy of hepatitis C disease
dc.type: Book Chapter
