Machine Learning Approaches for Real-Time Carbon Emission Prediction: A Comparative Study of Ensemble and Regression Algorithms
Keywords:
Carbon Emission Prediction, Machine Learning, Random Forest, XGBoost, Regression Analysis, Ensemble Methods, Environmental Sustainability, Flask Framework, Feature Engineering, Emission Rating.
Abstract
Climate change driven by anthropogenic greenhouse
gas emissions represents a critical global challenge,
with the transportation sector contributing approximately
24% of global CO₂ emissions. Traditional
dynamometer-based emission testing fails to capture
real-world driving variability and vehicle-specific
parameters. This paper presents a comprehensive
machine learning framework for real-time prediction
of vehicular carbon dioxide emissions through
comparative evaluation of six regression algorithms:
Linear Regression, Random Forest, Decision Tree,
XGBoost, AdaBoost, and Lasso. The system employs a
synthetic dataset of 7,000 vehicle records with nine
features including engine size, cylinder count, fuel
type, transmission configuration, and fuel
consumption metrics. Feature engineering
incorporates power-to-weight ratio, efficiency index,
and polynomial transformations capturing non-linear
emission relationships. Ensemble methods
demonstrate superior performance, with Random
Forest achieving R² = 99.34%, MAE = 4.54 g/km,
and RMSE = 7.83 g/km, outperforming individual learners
by 15-35%. XGBoost attains R² = 98.76% with
gradient boosting optimization. A Flask web
application provides an interactive prediction interface
with a Bootstrap 5 dark theme, user authentication via
Werkzeug password hashing, a SQLite database for
prediction history tracking, and a Chart.js analytics
dashboard visualizing emission trends. The system
implements a 10-point environmental rating algorithm
that classifies vehicles from 'Excellent' (≤100 g/km) to
'Very Poor' (>300 g/km), promoting environmental
awareness. Evaluation on 1,400 test samples shows
predictions within ±10 g/km of the reference value for
96.3% of cases. Docker containerization
enables scalable deployment on cloud platforms.
Comparative analysis indicates that Random Forest is
robust to outliers and captures non-linear patterns,
while XGBoost aids interpretability through feature
importance metrics. The platform successfully
democratizes carbon footprint assessment, enabling
consumers, fleet managers, and policymakers to make
data-driven decisions toward sustainable
transportation. Integration with real-time GPS and
OBD-II systems represents a promising future
enhancement for dynamic emission monitoring.
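
To make the reported evaluation measures concrete, the following is a minimal sketch of how MAE, RMSE, R², and the share of predictions within ±10 g/km could be computed from paired reference and predicted emission values. The function name `regression_metrics`, the `tolerance` parameter, and the four sample values are illustrative assumptions and are not taken from the study's code or dataset.

```python
import math


def regression_metrics(y_true, y_pred, tolerance=10.0):
    """Return MAE, RMSE, R^2, and the fraction of predictions within +/- tolerance.

    Hypothetical helper for illustration; all values are assumed to be in g/km.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Mean absolute error and root mean squared error.
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Share of predictions whose absolute error does not exceed the tolerance.
    within = sum(1 for e in errors if abs(e) <= tolerance) / n

    return {"mae": mae, "rmse": rmse, "r2": r2, "within_tolerance": within}


# Made-up values for illustration only (not from the paper's dataset).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 300.0, 420.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Under these definitions, the percentages quoted in the abstract correspond to `r2` and `within_tolerance` multiplied by 100.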










