During these weeks at the Metis Data Science Bootcamp we covered SQL on cloud servers (week 4), supervised learning algorithms with sklearn and statsmodels (weeks 4-5), classification errors (week 5), and interactive visualization with d3.js (week 6).
As a common thread, we also worked on an individual project. I used a public dataset from the UCI Machine Learning Repository (University of California, Irvine) containing records of bank marketing calls promoting a short-term deposit subscription.
My initial goal was to build a classifier that estimates the probability of subscription given a set of features about the client (age, job, marital status, etc.), the economic context (Euribor rate, unemployment, etc.), and the outcomes of previous marketing campaigns (days since the last contact, result of the previous campaign, etc.).
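Before a classifier can use these features, categorical fields such as job and marital status have to be turned into numbers. Below is a minimal plain-Python sketch of one-hot encoding a single client record. The field names (`age`, `job`, `marital`, `euribor3m`, `pdays`) and the short category lists are illustrative assumptions, not the project's actual preprocessing; libraries such as scikit-learn's `OneHotEncoder` do the same job at scale.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, float]

# Hypothetical category vocabularies; a real pipeline would collect them from the training data.
CATEGORIES: Dict[str, List[str]] = {
    "job": ["admin.", "blue-collar", "technician"],
    "marital": ["married", "single", "divorced"],
}
# Hypothetical numeric fields: client age, an economic indicator, days since last contact.
NUMERIC: Sequence[str] = ("age", "euribor3m", "pdays")


def encode(record: Dict[str, Value]) -> List[float]:
    """Turn one client record into a flat numeric feature vector."""
    features: List[float] = [float(record[name]) for name in NUMERIC]
    for field, values in CATEGORIES.items():
        # One-hot: 1.0 in the slot matching the record's category, 0.0 elsewhere.
        # A category not in the vocabulary becomes an all-zero block.
        features.extend(1.0 if record[field] == v else 0.0 for v in values)
    return features


if __name__ == "__main__":
    client = {"age": 41, "job": "technician", "marital": "single", "euribor3m": 4.86, "pdays": 6}
    print(encode(client))
```

Numeric fields come first, followed by one block of indicator columns per categorical field, so every client maps to a vector of the same length.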
In the end, I chose a logistic regression model for its performance in terms of precision and recall.
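Precision and recall depend on the probability threshold used to turn a model's scores into yes/no predictions. As a hedged illustration (not the project's actual evaluation code), this sketch computes both metrics for the "subscribed" class from predicted probabilities and true labels; the 0.5 default threshold and the toy numbers are assumptions.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall of the positive class at a given probability threshold."""
    tp = fp = fn = 0
    for label, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # called and subscribed
        elif predicted == 1 and label == 0:
            fp += 1  # called but did not subscribe: a wasted call
        elif predicted == 0 and label == 1:
            fn += 1  # not called but would have subscribed: a missed client
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
    print(precision_recall(labels, scores))       # default threshold 0.5
    print(precision_recall(labels, scores, 0.3))  # lower threshold, more clients called
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises recall from 2/3 to 1.0, while precision moves from 2/3 to 0.75. Sweeping the threshold like this is one way to choose how aggressive the call list should be.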
Then, for my first d3 project, I built a day-to-day dashboard to help the bank's employees direct their marketing calls to the right clients on the right day.
The interface of this decision support system is simple: given some inputs about the current day and the economic indicators, it lists the customers sorted by their probability of subscribing to the short-term deposit, and lets the user drill down into the details of a specific client.
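The core of such a dashboard is a score-and-sort step: combine each client's attributes with the day's economic indicators, compute a logistic regression probability, and list clients from most to least likely to subscribe. The sketch below assumes a model already fitted elsewhere; the coefficient values, feature names (`age`, `prev_success`, `euribor3m`) and client IDs are hypothetical placeholders, not the project's fitted model.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical fitted coefficients: an intercept plus one weight per feature.
INTERCEPT = -2.0
WEIGHTS: Dict[str, float] = {
    "age": 0.01,
    "prev_success": 1.5,  # 1.0 if the previous campaign succeeded with this client
    "euribor3m": -0.4,    # daily economic indicator entered by the user
}


def probability(features: Dict[str, float]) -> float:
    """Logistic regression: the sigmoid of the linear score."""
    z = INTERCEPT + sum(w * features[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_clients(
    clients: Dict[str, Dict[str, float]], daily: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Score every client under today's indicators, highest probability first."""
    scored = [
        # Merge the client's own attributes with the day's inputs before scoring.
        (client_id, probability({**attrs, **daily}))
        for client_id, attrs in clients.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    clients = {
        "c1": {"age": 35, "prev_success": 0.0},
        "c2": {"age": 60, "prev_success": 1.0},
        "c3": {"age": 25, "prev_success": 0.0},
    }
    for client_id, p in rank_clients(clients, {"euribor3m": 1.3}):
        print(f"{client_id}: {p:.3f}")
```

Changing the daily inputs re-scores every client, so the order of the call list can shift from one day to the next. Sorting the full list is fine for a daily call sheet; for a very large client base, `heapq.nlargest` could return only the top k.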
Some lessons learned during this project: feature selection and model evaluation require a deep understanding of both the underlying math and the tools used, and visualization can take as much time as cleaning and preparing the data, or even more.
GitHub repository of this project
Update (02/24/2015): I moved the logistic regression model to Python using the Flask framework, instead of embedding it in the web page in JavaScript: live demo here. The server can be a bit slower to respond, but this setup will make it easier in the future to build a model trained online from user inputs.
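With the model on the server, the page sends the day's inputs to a Python endpoint and receives scores back. Here is a minimal Flask sketch of that setup, including a single stochastic gradient descent step on the log-loss as one possible route to online training. The routes (`/score`, `/feedback`), the JSON field names and the two-feature model are hypothetical, not the project's actual API.

```python
import math
from typing import List

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory model: intercept followed by one weight per feature
# (here: age, euribor3m).
weights: List[float] = [-2.0, 0.01, -0.4]
LEARNING_RATE = 0.01


def predict(x: List[float]) -> float:
    """Sigmoid of intercept + dot(weights, x)."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


@app.route("/score", methods=["POST"])
def score():
    # Expects JSON like {"clients": [[age, euribor3m], ...]}.
    rows = request.get_json()["clients"]
    return jsonify(probabilities=[predict(row) for row in rows])


@app.route("/feedback", methods=["POST"])
def feedback():
    # Expects {"features": [...], "subscribed": 0 or 1}; applies one SGD step.
    data = request.get_json()
    x, y = data["features"], data["subscribed"]
    error = predict(x) - y  # derivative of the log-loss w.r.t. the linear score
    weights[0] -= LEARNING_RATE * error
    for i, xi in enumerate(x, start=1):
        weights[i] -= LEARNING_RATE * error * xi
    return jsonify(weights=weights)
```

Start it with Flask's development server (for example by calling `app.run()`) and POST JSON to `/score` from the page. Keeping the weights in a module-level list is only for illustration; a real deployment would persist the model and protect updates from concurrent requests.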