First semester

Machine Learning for Data Science

Objectives

Upon completing this course, students should be able to:

– select appropriate learning methods for a given problem;
– implement these statistical methods;
– compare leading procedures based on statistical arguments;
– assess the prediction performance of a learning algorithm;
– apply these insights in class activities using statistical software.

Course outline

This course focuses on supervised learning methods for regression and classification. Starting from elementary algorithms such as ordinary least squares, we will cover regularization methods (crucial in large-scale learning) and nonparametric decision rules such as support vector machines, the nearest neighbor algorithm, and CART.

Finally, bagging and boosting techniques will be discussed through the random forest and XGBoost algorithms.

We shall focus on methodological and algorithmic aspects, while trying to give an idea of the underlying theoretical foundations. Practical sessions will give students the opportunity to apply the methods to real data sets using either R or Python. The course will alternate between lectures and practical lab sessions.

Prerequisites

Linear algebra, probability, optimization