The last two weeks at the Metis Data Science Bootcamp were quite intense. We learned the basics of web scraping with two interesting and powerful Python packages, BeautifulSoup and Selenium; how to analyze and manipulate dataframes with Pandas; and the theory and applications of linear regression models.
At the end of the two weeks, each of us developed and presented to the class an individual project based on “movies data”. Everyone tried to answer a different question, from predicting the success of movies based on TV series to understanding the key factors that determine the success of trilogies. My project was about predicting a film's box office income over its opening weekend and estimating how much additional opening revenue a film of a given genre could earn by being released in a different season.
A couple of lessons learnt: production budget and the number of opening theaters play an important role in predicting opening income, and for comedy or action films the holiday season is not always the “best season”.
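The post doesn't show the model itself, so here is a minimal sketch of one way a regression can answer “how much would changing the release season add, given the genre?”. It assumes an ordinary least squares model on log opening income with log budget, log opening theaters, a holiday-season indicator, a comedy indicator, and a holiday × comedy interaction term. All feature names, the single season/genre indicators, and the data are hypothetical: the data are synthetic, generated from made-up coefficients purely to show that the fit recovers them, and are not results from the project.

```python
import numpy as np

# Hypothetical feature layout (an assumption, not the author's actual model):
# intercept, log budget, log opening theaters, holiday-season flag,
# comedy flag, and a holiday x comedy interaction so the season effect
# is allowed to differ by genre.
def design_matrix(log_budget, log_theaters, holiday, comedy):
    return np.column_stack([
        np.ones_like(log_budget),
        log_budget,
        log_theaters,
        holiday,
        comedy,
        holiday * comedy,
    ])

def fit_ols(X, y):
    # Ordinary least squares via numpy's least-squares solver.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def season_uplift(coef, comedy):
    # Change in predicted log opening income from moving a film into the
    # holiday season, holding budget and theater count fixed.
    # For comedies the interaction term is added to the main holiday effect.
    return coef[3] + coef[5] * comedy

# Synthetic data generated from made-up "true" coefficients (illustration only).
rng = np.random.default_rng(0)
n = 200
log_budget = rng.normal(17.0, 1.0, n)
log_theaters = rng.normal(8.0, 0.5, n)
holiday = rng.integers(0, 2, n).astype(float)
comedy = rng.integers(0, 2, n).astype(float)

true_coef = np.array([1.0, 0.6, 0.9, 0.3, 0.1, -0.5])
X = design_matrix(log_budget, log_theaters, holiday, comedy)
y = X @ true_coef + rng.normal(0.0, 0.01, n)

coef = fit_ols(X, y)
print("holiday uplift, non-comedy:", season_uplift(coef, 0.0))
print("holiday uplift, comedy:", season_uplift(coef, 1.0))
```

With these made-up coefficients, the holiday effect is positive for non-comedies (about 0.3 in log units) but negative for comedies (about 0.3 − 0.5 = −0.2), which is the kind of genre-dependent season effect an interaction term lets a linear model express. Because the target is in logs, an uplift of u corresponds to roughly a (e^u − 1) × 100% change in opening income.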
More insights are in this presentation, and the web scraping scripts and linear regression analysis are in this GitHub repository.