First semester

Dimension Reduction and Matrix Completion

Objectives

In modern datasets, many variables are collected and, to ensure good statistical performance, one needs to circumvent the so-called "curse of dimensionality" by applying dimension reduction techniques. The key notion for understanding the performance of dimension reduction is sparsity, understood in a broad sense: the phenomenon under investigation has a low-dimensional intrinsic structure. Sparsity is also at the core of compressive sensing for data acquisition. The simplest notion of sparsity is defined for vectors, where it opens the way to high-dimensional linear regression (the Lasso) and to non-linear regression, for instance high-dimensional generalized linear models, via regularization techniques. When the low-dimensional structure is not aligned with the chosen basis, however, such methods fail, and we instead turn to embedding algorithms such as SNE and its variants to obtain a lower-dimensional representation of the dataset.

While clearly stating the mathematical foundations of dimension reduction, this course will focus on the methodological and algorithmic aspects of these techniques.
- Understand the curse of dimensionality and the notion of sparsity.
- Know the definition of the Lasso and its main variants, as well as its main algorithmic implementations.
- Understand the tuning of the Lasso and know the main techniques (a minimal illustration follows this list).
- Know how to regularize a high-dimensional generalized linear model.
- Understand the basics of neighborhood embeddings and the main algorithms that employ this technique.
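To make the regularization idea concrete, here is a minimal sketch, assuming Python with NumPy and scikit-learn (tools not prescribed by the course itself): a Lasso is fitted to synthetic high-dimensional data and its penalty level is tuned by cross-validation.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV

    # Synthetic problem: 100 observations, 500 variables, only 10 informative.
    X, y = make_regression(n_samples=100, n_features=500, n_informative=10,
                           noise=1.0, random_state=0)

    # LassoCV picks the regularization strength alpha by 5-fold cross-validation.
    model = LassoCV(cv=5, random_state=0).fit(X, y)

    print("selected alpha:", model.alpha_)
    print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))

Thanks to the l1 penalty, most estimated coefficients are exactly zero, which is the sparsity property exploited throughout the course.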

Course outline

- High-dimensional linear regression.
- High-dimensional generalized linear models.
- Embedding algorithms: SNE, t-SNE, UMAP (see the sketch below).
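As a companion to the outline, a minimal sketch of a neighborhood embedding, again assuming scikit-learn: the 64-dimensional digits dataset is mapped to the plane with t-SNE.

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, _ = load_digits(return_X_y=True)

    # Perplexity roughly controls the effective neighborhood size used by t-SNE.
    embedding = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)

    print(embedding.shape)  # (1797, 2): one planar point per digit image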

Prerequisites

Basic statistics, linear algebra and probability.