Sentiment Analysis of YouTube Comments Using Machine Learning
Keywords:
Sentiment Analysis · YouTube Comments · Machine Learning · TF-IDF · Logistic Regression · Naive Bayes · SVM · NLP · Flask · scikit-learn · Opinion Mining
Abstract
YouTube generates billions of user comments daily, representing an enormous corpus of public opinion, audience
feedback, and social discourse. Automated sentiment analysis of this data is essential for content creators, brand
managers, and researchers who need to gauge audience reception at scale. This paper presents a machine
learning-based platform for three-class sentiment classification (Positive, Negative, Neutral) of YouTube
comments, achieving 95.0% accuracy with both Logistic Regression and Naive Bayes classifiers.

The system employs TF-IDF vectorization with 5,000 features and bigram support on a 15,000-comment training
dataset. Six machine learning algorithms are systematically trained, evaluated, and compared: Logistic Regression
(95.0%), Naive Bayes (95.0%), SVM (94.97%), Gradient Boosting (94.8%), KNN (94.8%), and Random Forest (94.53%).
The NLP preprocessing pipeline performs lowercasing, URL removal, mention/hashtag stripping, non-alphabetic
character removal, and NLTK stopword filtering.

The system is deployed as a Flask web application (port 5014) with a Bootstrap 5 dark purple theme, interactive
Chart.js visualizations (pie charts, bar charts), dual-mode YouTube comment fetching (live retrieval through the
YouTube Data API v3 and mock generation from 200+ templates), per-user analysis history, a nine-chart EDA gallery,
secure Werkzeug PBKDF2 authentication, and Docker containerization. All six models achieve above 94.5% accuracy,
which indicates that TF-IDF with bigrams is a highly effective feature representation for social media sentiment
classification.
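
The preprocessing steps named in the abstract (lowercasing, URL removal, mention/hashtag stripping, non-alphabetic character removal, stopword filtering) can be sketched as below. This is a minimal illustration, not the authors' implementation. The function name `clean_comment`, the regular expressions, and the order of the steps are assumptions. The small `STOPWORDS` set is a hypothetical stand-in for the NLTK English stopword list used in the paper, so the example runs without downloading any corpus. Mentions and hashtags are assumed to be dropped as whole tokens rather than only losing the leading symbol.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list (assumption:
# the paper filters with nltk.corpus.stopwords, which needs a download).
STOPWORDS = {"a", "an", "the", "is", "this", "and", "to", "of", "it", "so", "was", "i"}

# Assumed patterns; the paper does not specify its regular expressions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_comment(text: str) -> str:
    """Normalize one raw comment into a space-separated token string."""
    text = text.lower()                       # lowercasing
    text = URL_RE.sub(" ", text)              # URL removal (before @/# handling)
    text = MENTION_HASHTAG_RE.sub(" ", text)  # drop @mentions and #hashtags
    text = NON_ALPHA_RE.sub(" ", text)        # remove digits, punctuation, emoji
    tokens = [t for t in text.split() if t not in STOPWORDS]  # stopword filter
    return " ".join(tokens)


if __name__ == "__main__":
    raw = "LOVED this video!!! @creator check https://youtu.be/abc #awesome 10/10"
    print(clean_comment(raw))
```

The cleaned strings would then feed the TF-IDF vectorizer. URLs are removed first so that any `#` or `@` characters inside a link are not treated as hashtags or mentions.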
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.