Abstract
Cervical cancer is one of the most prevalent gynecological cancers worldwide, and early screening plays a crucial role in reducing its global burden. The disease is largely preventable, yet disadvantaged groups often lack access to regular screening because of limited knowledge, restricted access to medical facilities, and high treatment costs, particularly in developing countries. To address this challenge, our study introduces an ensemble machine learning approach for predicting cervical cancer risk. The method integrates multiple algorithms, namely decision tree, random forest, support vector machine, and k-nearest neighbor, and thereby offers a more comprehensive analysis than previous single-model approaches. We apply these techniques to a dataset of 858 patients from the University of California, Irvine (UCI) Machine Learning Repository, collected at the “Hospital Universitario de Caracas” in Venezuela. The dataset comprises 36 features covering demographic information, habits, and medical records. A key preprocessing step was replacing missing values with mean values to preserve data integrity. The random forest model outperformed the others, achieving an accuracy of 97%. This level of accuracy in predicting cervical cancer risk holds substantial promise for healthcare professionals. We evaluated each model’s performance using a confusion matrix. This research not only demonstrates the effectiveness of machine learning in improving healthcare but also highlights its potential to improve patients’ quality of life through early detection and targeted care of those at elevated risk of cervical cancer.
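
Two of the steps named in the abstract, mean imputation of missing values and evaluation with a confusion matrix, can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the toy records, and the binary 0/1 labels are assumptions made purely for illustration, and the sketch uses only the standard library rather than any particular machine learning toolkit.

```python
from statistics import mean


def impute_with_mean(rows):
    """Replace None entries in each column with the mean of that column's observed values.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 marks the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Fraction of predictions that match the true label."""
    return (tp + tn) / (tn + fp + fn + tp)


# Hypothetical toy records (two numeric features); None marks a missing value.
rows = [[18, 4.0], [None, 1.0], [30, None]]
filled = impute_with_mean(rows)

# Hypothetical true labels and model predictions for five patients.
y_true = [0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
acc = accuracy(*cm)
```

For screening, the individual cells of the confusion matrix matter as much as overall accuracy: a false negative is a missed at-risk patient, so a high accuracy figure is best read alongside the matrix it was computed from.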