How does a data scientist go about understanding the data and determining how to build a model? What are the problems, and how can a model be misleading? What are the tools available at scale, and how easily can existing knowledge be leveraged?
We’ll explore a data scientist’s methodology for predicting restaurant failures using Chicago’s Open Data Portal and DataStax Enterprise (DSE). Which data sources make sense, and how does intuition come into play? What are we trying to achieve, and how do we measure accuracy?
While this session is well suited to those new to data science, we’ll also dive deeper into the thought process, exploration, and analysis, using code and visualization tools (Jupyter, ML, DSE, Cassandra, Spark).
This project was part of graduate studies at Harvard University; the topic itself was explored and implemented by the City of Chicago.
Participants will come away with an understanding of how and why a platform for exploration makes a data scientist’s work easier, and of how the various tools work. Feel free to clone the GitHub repo and maybe even come up with a better model!