I used Python libraries (Pandas, NumPy, SciPy, Matplotlib, and Seaborn) to clean, prepare, transform, and visualize five large datasets covering stress, alcohol consumption, smoking, unhealthy eating behaviors, and stroke. I ran statistical tests, including Chi-square tests and ANOVA, to identify statistically significant associations within each dataset. I then used Scikit-learn to fit regression and classification models to the Stroke dataset, including Linear Regression, Logistic Regression, K-Nearest Neighbors, Decision Trees, and Random Forest, to explore how stress, alcohol consumption, smoking, and unhealthy eating behaviors relate to the rising incidence of strokes in young adults.
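To illustrate the kind of workflow described above, here is a minimal sketch on synthetic data. It is not the project's actual code: the column names (`smoker`, `high_stress`, `heavy_drinker`, `stroke`), the generated values, and the model settings are all assumptions made for the example. It runs a Chi-square test of independence with SciPy and then compares two Scikit-learn classifiers on a held-out split.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "smoker": rng.integers(0, 2, n),
    "high_stress": rng.integers(0, 2, n),
    "heavy_drinker": rng.integers(0, 2, n),
})

# Assumed relationship: each risk factor raises the chance of the outcome.
logit = -2.0 + 1.0 * df["smoker"] + 0.8 * df["high_stress"] + 0.5 * df["heavy_drinker"]
prob = 1.0 / (1.0 + np.exp(-logit))
df["stroke"] = (rng.random(n) < prob).astype(int)

# Chi-square test of independence between smoking status and the outcome.
table = pd.crosstab(df["smoker"], df["stroke"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4g}")

# Hold out a stratified test set, then compare two classifiers on it.
X = df[["smoker", "high_stress", "heavy_drinker"]]
y = df["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, accuracy in scores.items():
    print(f"{name}: test accuracy = {accuracy:.3f}")
```

Accuracy is used here only because it is the default `score` for Scikit-learn classifiers. Stroke outcomes are typically imbalanced, so metrics such as recall or ROC AUC would likely be more informative on real data.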
Explore the project: https://thinguyen3008.github.io/