Statistical Learning and Big Data
Instructor: Alice Paul
Textbook: [1] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd Edition). Springer New York, NY. ISBN 978-0-387-84858-7. [2] James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer New York, NY. ISBN 978-1-4614-7138-7.
Description: This course introduced modern statistical learning tools, with a focus on those developed for big data. It covered three interconnected components: (i) statistical machine learning methods, (ii) the underlying algorithms, and (iii) computational tools. The course focused on the principal techniques for analyzing data from start to finish: managing large data, exploring patterns, framing statistical problems, building efficient computational algorithms, and writing reports. Topics included data management, feature engineering, clustering, convex optimization algorithms, tree/ensemble methods, and predictive modeling.
One key aspect of statistical learning in the context of big data is its emphasis on predictive modeling. Rather than focusing solely on understanding the underlying mechanisms of a phenomenon, statistical learning approaches prioritize developing machine learning models that accurately predict outcomes or support informed decisions based on the available data.
Assignments:
Spring '22