Projects

IMDb Review Sentiment Prediction

Stats 101C final project using R. In this project, we used several predictive models (logistic regression, k-nearest neighbors, LDA, QDA, and random forests) to predict IMDb review sentiment.

Yelp Review True Rating Prediction with Neural Networks and BERT Model

PIC 16B final project using Python. We developed a model that predicts the star rating of a Yelp review from its text. By transforming review text into sequences and applying a trained machine-learning model, we were able to predict star ratings with a reasonable degree of accuracy. The model gives both businesses and customers a tool for understanding and analyzing feedback through automated star rating predictions.
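The "text into sequences" step can be illustrated with a minimal sketch. It is not the project's actual preprocessing code: the function names, the vocabulary size, the sequence length, and the reserved ids (0 for padding, 1 for unknown words) are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(texts, max_words=1000):
    """Map the most frequent words to integer ids starting at 2.

    Ids 0 and 1 are reserved (assumption): 0 = padding, 1 = unknown word.
    """
    counts = Counter(word for text in texts for word in tokenize(text))
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}

def to_sequence(text, vocab, max_len=8):
    """Convert one review into a fixed-length list of integer ids."""
    ids = [vocab.get(word, 1) for word in tokenize(text)][:max_len]
    return ids + [0] * (max_len - len(ids))  # pad at the end with zeros

# Made-up example reviews, not taken from the Yelp dataset.
reviews = ["Great food, great service", "Terrible food"]
vocab = build_vocab(reviews)
print(to_sequence("Great pizza", vocab, max_len=4))
```

Fixed-length integer sequences like these are the usual input to an embedding layer in a neural network. A BERT model would instead use its own subword tokenizer.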

2023 UCLA DataFest: ABA Pro Bono Lawyer Data

Analyzed ABA pro bono lawyer data for the 2023 UCLA DataFest using R, Tableau, data scraping, and data mining.

Our analysis showed that states with higher poverty rates are underutilizing the pro bono service, so more resources, pro bono education, and outreach should be devoted to those states. Our final recommendation is to invest in outreach to women and high-poverty groups, through resources and education efforts, during the second quarter in anticipation of peak pro bono demand during the third quarter.

Housing Price Prediction

Stats 101A final project using R. In this project, I researched the relationship between housing prices and housing features. To explore this relationship, I chose a multiple linear regression model and used various methods to find the model that best described the relationship between the explanatory and response variables. Throughout the project, I justified why the chosen regression model is appropriate for describing the relationship among the variables.

Stats 112 Final Project

In our study, we analyzed reflection papers from each group on the guest speaker, Ms. Susan Philips. We were interested in identifying the common themes underlying each group's understanding of the chapters about individuals immigrating to the U.S. and of Ms. Philips's talk. To do this, we used text mining techniques such as frequency graphs and word clouds. For our data model, we used text networks, a cluster dendrogram, and LDA topic modeling to find common and significant words in the reflection papers.
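The word counts behind a frequency graph or word cloud can be sketched as follows. This is an illustrative sketch, not the analysis code used in the project: the stop-word list, the function name, and the sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "her", "about"}

def word_frequencies(documents, top_n=5):
    """Count non-stop-words across all documents and return the top_n most common."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up placeholder sentences, not text from the actual reflection papers.
papers = [
    "The speaker described a long journey",
    "The journey shaped her view of family",
    "Family and journey",
]
for word, count in word_frequencies(papers, top_n=3):
    print(word, count)
```

The resulting (word, count) pairs are what a bar chart or word cloud would then visualize.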

Yelp Dataset Analysis

Stats 141 final project using R, Python, and Tableau to analyze Yelp data.

Yelp is a platform that crowdsources reviews about businesses, the majority of which are restaurants. In this study, our objective was to gain insight into the following questions: What factors contribute to the overall star rating of Chinese restaurants? Which factor has the greatest effect? Do a restaurant’s Yelp reviews accurately reflect its star rating?

Film Production Gentrification in Atlanta, GA

Used data to explore how the rising film industry has brought about gentrification in Atlanta, disrupting and displacing the historic culture of the community. Presented our findings through a WordPress website. Used tools such as R, Python, Tableau, web scraping, Canva, and timeline.js.

Drug Effect on Attention Study

Stats 101B final project using R. A study of drug effects that tested the benefits of caffeine and nicotine on attention using a repeated-measures design.