Predictive Modeling For Early Lung Cancer Detection Using Ensemble Machine Learning

Authors

  • Nashad Noor Yussuf Dinni, Affan Bin Hassan, Asim Bin Awad Mahfooz, B.E. Students, Department of Artificial Intelligence & Data Science, ISL Engineering College, Hyderabad, India
  • Mr. Mohammed Rahmat Ali, Assistant Professor, Department of Computer Science & Artificial Intelligence & Data Science, ISL Engineering College, Hyderabad, India

Keywords:

Lung Cancer Detection, Ensemble Machine Learning, SMOTE, Stacked Classifier, CatBoost, XGBoost, LightGBM, Flask Web Application, Predictive Modeling, Healthcare AI.

Abstract

Lung cancer remains one of the leading causes of cancer-related death worldwide, accounting for approximately 1.8 million deaths annually. Early detection is critical for improving patient survival and enabling timely therapeutic intervention. This paper presents a machine learning-based system for early lung cancer prediction from survey-based clinical and lifestyle data. The proposed system employs a stacked ensemble model that integrates five base classifiers (CatBoost, XGBoost, LightGBM, AdaBoost, and Random Forest), with Logistic Regression as the meta-learner that combines their outputs into the final binary prediction.

To address the class imbalance in the dataset (270 cancer vs. 39 non-cancer cases), the Synthetic Minority Over-sampling Technique (SMOTE) was applied to produce a balanced training distribution, which improves the model's ability to classify minority (non-cancer) cases correctly and reduces bias toward the majority class. The proposed stacked ensemble achieved an accuracy of 96.9%, a precision of 96.98%, a recall of 96.82%, an F1-score of 96.90%, and an ROC-AUC of 0.99, outperforming every individual base classifier and demonstrating state-of-the-art performance. A Flask-based web application was also implemented, providing a user-friendly interface for real-time prediction, data visualization, and result interpretation.

Combining several learners in an ensemble improves prediction reliability and reduces model variance compared with a single classifier. The system achieves high sensitivity (recall), which is critical in medical diagnosis because it minimizes false negatives and therefore missed cancer cases. Feature importance analysis identifies key contributing factors such as smoking habits, anxiety, fatigue, and respiratory symptoms, improving interpretability. Cross-validation confirms that performance remains consistent on unseen data, indicating robust generalization. Finally, the modular, scalable architecture is designed for easy integration with other healthcare systems or datasets in the future, keeping the system clinically accessible.
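A from-scratch, standard-library sketch of the SMOTE interpolation step, for illustration only; the abstract does not say which implementation the authors used (the `SMOTE` class in the `imbalanced-learn` package is a common choice). The helper names `smote` and `balance` are hypothetical. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its `k` nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point cannot have more neighbours than peers

    # Precompute the indices of each point's k nearest neighbours (excluding itself).
    neighbours = []
    for i, p in enumerate(minority):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # uniform in [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Binary task: oversample the minority class until both classes are equal."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)    # every other row is the majority class
    n_needed = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_needed, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)
```

Consistent with the abstract's "balanced training distribution", oversampling is normally applied to the training split only, so that synthetic points derived from test rows cannot leak into evaluation. Applied to the full 270/39 counts, balancing would require 270 - 39 = 231 synthetic non-cancer rows.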

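As a quick consistency check on the reported figures: precision is P = TP / (TP + FP), recall is R = TP / (TP + FN) (so a higher recall means fewer false negatives, i.e. fewer missed cancer cases), and the F1-score is their harmonic mean. Plugging in the reported precision (96.98%) and recall (96.82%) reproduces the reported F1-score of 96.90%:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9698 \times 0.9682}{0.9698 + 0.9682}
    = \frac{1.8779}{1.9380}
    \approx 0.9690
```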
Published

2026-04-27

How to Cite

Predictive Modeling For Early Lung Cancer Detection Using Ensemble Machine Learning. (2026). International Journal of Engineering and Science Research, 16(2s1), 18-26. https://ijesr.org/index.php/ijesr/article/view/1696
