
Pre-Program Preparation

This page is dedicated to future students of the Master for Smart Data Science. 

The Smart Data Science program is a unique and innovative program that combines advanced notions in Statistics, Applied Mathematics, and Computer Science. This one-year program is quite intense because of:

  • the high number of class hours
  • a large number of topics, covering a wide range of models and algorithms

For these reasons, the teaching team provides support before courses begin so that students are well prepared when the program starts.

On this page, students will find lecture notes and exercises, along with solutions to some of the exercises. Each student must read and understand these documents before the beginning of the program.

DOCUMENTS

The first document covers the basics of Probability Theory, which is a prerequisite for the Master: PDF PROBA

Understanding the basics of Probability Theory is an important and necessary step toward completing the program. Indeed, most of the courses, for instance Machine Learning or Models for Dependent Data, rely heavily on these notions. Exercises are provided so that students can check their understanding. Some personal research and complementary reading are still recommended for a complete understanding.

The second document (shorter) is about Linear Algebra. For most students entering the program it is a reminder, but it is still important to refresh these notions because of their usefulness, for instance in principal component analysis (PCA) and linear models. In particular, the notions of vector space, matrix inversion, eigenvalues, and basis decomposition in Hilbert spaces are crucial. PDF LINALG
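
As an illustration of why eigenvalues matter for PCA, here is a minimal Python sketch. It is not taken from the Linear Algebra document: the function name `pca_eig` and the simulated data are assumptions made for this example only. It computes principal components from the eigendecomposition of the sample covariance matrix.

```python
import numpy as np


def pca_eig(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix.

    Hypothetical helper written for illustration only.
    """
    # Center the columns so that the covariance formula below is valid.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)

    # eigh is meant for symmetric matrices; eigenvalues come out ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep the directions with the largest eigenvalues (largest variance).
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]

    # Coordinates of the centered data in the new orthonormal basis.
    scores = Xc @ components
    return eigvals[order], components, scores


# Simulated data (an assumption for this sketch): 200 points in dimension 3.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing

variances, components, scores = pca_eig(X, n_components=2)
```

Each eigenvalue equals the sample variance of the data projected onto the corresponding eigenvector, and the eigenvectors form an orthonormal basis, which is the basis decomposition idea mentioned above.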

After reading the two previous documents, students may be interested in Linear Regression, which illustrates perfectly the notions of Probability (first document) as well as Linear Algebra (second document). PDF OLS
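
To make the link between the two topics concrete, here is a short Python sketch of ordinary least squares. It is an illustration under stated assumptions, not an excerpt from the OLS document: the data are simulated with Gaussian noise, and the names `ols_fit`, `true_beta`, and `beta_hat` are chosen for this example only.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients, intercept first.

    Hypothetical helper written for illustration only.
    """
    # Add a column of ones so that the first coefficient is the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes ||X1 @ beta - y||^2 without forming an explicit inverse.
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


# Simulated data (an assumption): y = 1 + 2*x1 - 0.5*x2 + Gaussian noise.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_beta = np.array([1.0, 2.0, -0.5])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

beta_hat = ols_fit(X, y)

# Same estimate from the normal equations (X1' X1) beta = X1' y.
X1 = np.column_stack([np.ones(n), X])
beta_normal = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Residuals are orthogonal to every column of the design matrix.
residuals = y - X1 @ beta_hat
```

The probabilistic part is the noise model, which makes `beta_hat` a random estimator of `true_beta`. The linear algebra part is the projection of `y` onto the column space of the design matrix, which is why the residuals are orthogonal to its columns.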

Here is a small TEST to check your preparation.

At the beginning of the first semester, some class hours will be dedicated to checking the students’ understanding, answering questions, and clarifying some points.

OTHER RESOURCES

Several good books are accessible online for free (links below were valid in March 2022).

Reading these is not a requirement, but they provide interesting complementary information. The first and second books are about mathematics and may be used as a complement to the previous documents.

https://mml-book.github.io/book/mml-book.pdf

https://probml.github.io/pml-book/book1.html

The third book is more advanced, as it deals with machine learning (ML). A full course on this topic will be taught during the program.

https://hastie.su.domains/ISLR2/ISLRv2_website.pdf

 

 

TRAIN YOUR CODING SKILLS

Plenty of relevant training material on Python and R is available online. Each student should be familiar with both before the beginning of the program. We recommend COURSERA, DATACAMP, and Fun-MOOC (on which many courses are available in French).

For instance:

https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/

https://www.coursera.org/learn/probability-intro

https://app.datacamp.com/learn/courses/intro-to-python-for-data-science

https://app.datacamp.com/learn/courses/free-introduction-to-r

NB: Most online courses provide free access to the first chapter, which should be enough for practicing before the program starts.

Contact

François PORTIER
Associate professor - Head of the Master for Smart Data Science
Email
francois.portier@ensai.fr
Phone
02 99 05 32 41