Journées de la statistique I2E (I2E Statistics Days)
Insee-ENSAE-ENSAI
ENSAI, Rennes - 9 & 10 June 2026
Programme
Each talk lasts 40 minutes, including questions.
Tuesday 9 June
- 13:30: registration
- 14:00: welcome address by Corinne Prost, Director of Methodology and Statistical and International Coordination at Insee
- 14:10 - 15:30: Sébastien Da Veiga (ENSAI), Julien Jamme (Insee)
- 15:30 - 16:40: poster session
- 16:40 - 18:00: Johann Faouzi (ENSAI), Matthieu Lerasle (ENSAE)
- 19:00: dinner
Wednesday 10 June
- 9:15: registration
- 9:30 - 10:50: Arnak Dalalyan (ENSAE), Jean Rubin (Insee)
- 10:50 - 11:10: coffee break
- 11:10 - 12:30: Sébastien Herbreteau (ENSAI), Jaouad Mourtada (ENSAE)
- 12:30 - 13:30: lunch
- 13:30 - 14:50: Meilame Tayebjee (Insee), Emmanuel Pilliat (ENSAI)
- 14:50 - 15:10: coffee break
- 15:10 - 16:30: Austin Stromme (ENSAE), Lionel Truquet (ENSAI)
Organisation
- François Portier (ENSAI)
- Valérie Ledonné (ENSAI)
Talk titles and abstracts
Sébastien Da Veiga (ENSAI)
Brief introduction to conformal prediction, with a discussion on recent research challenges
Conformal prediction has recently emerged as a promising and popular framework for building prediction intervals around point predictions, with no assumption on the data distribution beyond exchangeability and without relying on asymptotics in the number of observations. In this talk we will start by introducing the basics of conformal prediction, and discuss the numerous extensions that have been proposed to widen its practical applicability and improve its computational tractability (cross-validation, adaptivity, asymmetry, …). We will also discuss open research questions in this field.
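As an illustration of the basic recipe (split conformal prediction with absolute-residual scores), here is a minimal Python/NumPy sketch. The least-squares line used as the point predictor, the function name `split_conformal_interval` and the data split are assumptions for illustration only: any predictor can be plugged in, and the coverage guarantee is marginal and relies on exchangeability of calibration and test points.

```python
import math

import numpy as np


def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction interval with absolute-residual scores (illustrative sketch)."""
    # 1) Fit a point predictor on the training split (a least-squares line, chosen arbitrarily).
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return slope * np.asarray(x, dtype=float) + intercept

    # 2) Nonconformity scores on the held-out calibration split.
    scores = np.sort(np.abs(np.asarray(y_cal) - predict(x_cal)))
    n = scores.size
    # 3) Conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    q = np.inf if k > n else scores[k - 1]  # too few calibration points: unbounded interval
    y_hat = predict(x_new)
    return y_hat - q, y_hat + q


# Usage: heavy-tailed (Student t) noise, no Gaussian assumption needed.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=3000)
y = 1.5 * x + rng.standard_t(df=3, size=3000)
lo, hi = split_conformal_interval(x[:1000], y[:1000], x[1000:2000], y[1000:2000], x[2000:])
coverage = np.mean((y[2000:] >= lo) & (y[2000:] <= hi))
print(f"empirical coverage: {coverage:.3f}")  # typically around 0.9 or slightly above
```

Splitting the data costs statistical efficiency (the model never sees the calibration points), which is one motivation for the cross-validation variants mentioned above.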
Julien Jamme (Insee)
Title and abstract to be announced
Johann Faouzi (ENSAI)
Time series clustering with CLUES-WEASEL
Time series data is very common in many real-world applications and domains, and there is increasing interest in automated information extraction using machine learning. One of these subfields is time series clustering, which consists of identifying clusters among a set of time series in an unsupervised fashion. Most time series clustering algorithms face the same trade-off: they sacrifice clustering performance for faster runtimes or vice versa. We present a novel time series clustering algorithm called CLUES-WEASEL, which stands for CLustering with the UnsupervisEd Second version of Word ExtrAction for time SEries cLassification. CLUES-WEASEL extracts features using the unsupervised version of the transformation step of WEASEL 2.0, a time series classification algorithm, then reduces these features using principal component analysis, and finally clusters the reduced features with the k-means algorithm. Through extensive experiments, we provide evidence that CLUES-WEASEL is significantly better than any other existing time series clustering algorithm while being (much) faster than any state-of-the-art one. We also show that the architecture of CLUES-WEASEL can work well with other time series feature extraction algorithms. Our findings highlight the relevance of CLUES-WEASEL for time series clustering.
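The three-stage architecture described above (feature extraction, then PCA, then k-means) can be sketched in NumPy as follows. WEASEL 2.0 itself is not reimplemented: the extractor below is a deliberately simple, hypothetical stand-in (a histogram of SAX-like words over sliding windows), and all function names and parameter values are assumptions for illustration.

```python
import numpy as np

# N(0, 1) quartiles: breakpoints for a 4-letter alphabet.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])


def bag_of_words(X, window=16, word_len=4):
    """Hypothetical stand-in for WEASEL 2.0: histogram of SAX-like words over sliding windows."""
    feats = np.zeros((len(X), 4 ** word_len))
    powers = 4 ** np.arange(word_len)
    for s, series in enumerate(X):
        for start in range(len(series) - window + 1):
            w = series[start:start + window]
            w = (w - w.mean()) / (w.std() + 1e-8)        # z-normalise the window
            paa = w.reshape(word_len, -1).mean(axis=1)   # piecewise aggregate approximation
            letters = np.searchsorted(BREAKPOINTS, paa)  # symbols in {0, 1, 2, 3}
            feats[s, int(letters @ powers)] += 1         # count the word
    return feats


def pca(F, n_components):
    """Project centred features onto their leading principal components (via SVD)."""
    Fc = F - F.mean(axis=0)
    _, _, vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ vt[:n_components].T


def kmeans(Z, k, rng, n_iter=100, n_init=10):
    """Lloyd's algorithm with random restarts; returns the labels of the lowest-inertia run."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iter):
            labels = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((Z - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def cluster_series(X, n_clusters, n_components=10, seed=0):
    """Features -> PCA -> k-means, mirroring the pipeline structure described in the abstract."""
    rng = np.random.default_rng(seed)
    return kmeans(pca(bag_of_words(X), n_components), n_clusters, rng)


# Usage: 20 noisy sine waves with random phases and 20 white-noise series.
rng = np.random.default_rng(1)
t = np.arange(128)
sines = [np.sin(2 * np.pi * t / 16 + rng.uniform(0, 2 * np.pi)) + 0.1 * rng.standard_normal(128)
         for _ in range(20)]
noise = [rng.standard_normal(128) for _ in range(20)]
labels = cluster_series(np.array(sines + noise), n_clusters=2)
print(labels)
```

Because the feature extractor is a separate function, swapping in another extractor leaves the PCA and k-means stages unchanged, which is the modularity the abstract points to.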
Matthieu Lerasle (ENSAE)
Classification bounds for the MLE in logistic regression
Logistic regression is an elementary model for classification. The asymptotic behaviour of the maximum likelihood estimator is described by Wilks' theorem, which ensures that its excess risk is of order d/n, where d is the number of covariates and n the number of observations. Zhang's lemma is a result that converts excess-risk bounds into classification bounds. Applying this standard recipe shows that the classification risk of the MLE in logistic regression is at worst of order \sqrt{d/n}. When the design is Gaussian, the model satisfies a margin condition that can be exploited to show that this risk is in fact at worst of order (d/n)^{2/3}.
In this talk, I will show that Zhang's lemma can be sharpened further in this problem thanks to a 2D margin condition.
Combining this result with recent sharp non-asymptotic excess-risk bounds for the MLE in logistic regression, we deduce that the classification risk of this estimator is in fact of the optimal order d/n.
This work is joint with H. Chardon and J. Mourtada.
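For reference, the "standard recipe" mentioned above can be written out as follows, using the ψ-transform form of Zhang's comparison inequality for the logistic loss (labels in {-1, +1}, natural logarithm, R^* denoting Bayes risks). This is only a sketch of the textbook \sqrt{d/n} argument, assuming a well-specified model so that the Bayes classifier is linear and taking the d/n excess logistic risk of the MLE as given; it is not the sharper argument of the talk, and constants are not tracked in the last line.

```latex
\begin{aligned}
&\ell(y, f) = \log\bigl(1 + e^{-y f}\bigr), \qquad y \in \{-1, +1\},\\
&\psi(\theta) = \tfrac{1+\theta}{2}\log(1+\theta) + \tfrac{1-\theta}{2}\log(1-\theta)
  = \sum_{m \ge 1} \frac{\theta^{2m}}{2m(2m-1)} \;\ge\; \frac{\theta^2}{2},\\
&\psi\bigl(R_{0/1}(f) - R_{0/1}^*\bigr) \;\le\; R_\ell(f) - R_\ell^*
  \quad\Longrightarrow\quad
  R_{0/1}(f) - R_{0/1}^* \;\le\; \sqrt{2\,\bigl(R_\ell(f) - R_\ell^*\bigr)},\\
&R_\ell(\hat f_{\mathrm{MLE}}) - R_\ell^* \lesssim \frac{d}{n}
  \quad\Longrightarrow\quad
  R_{0/1}(\hat f_{\mathrm{MLE}}) - R_{0/1}^* \lesssim \sqrt{\frac{d}{n}}.
\end{aligned}
```

The square root is the price of the quadratic lower bound on ψ near zero; margin conditions improve the rate by controlling how much classification error a given logistic excess risk can actually produce.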
Arnak Dalalyan (ENSAE)
A Simple Proof of Improved Wasserstein Bounds for Langevin Monte Carlo
I will present a simple and sharp analysis of the Langevin Monte Carlo algorithm. The main theorem provides a non-asymptotic upper bound on the Wasserstein-2 error under strong convexity and smoothness assumptions. The proof is shorter than existing ones and reveals that the discretization error is controlled by an average of coordinate-wise smoothness constants, rather than by the worst-case smoothness parameter. I will discuss the resulting improvement in the mixing-time bound, compare it with prior work, and show how the argument extends to variable step-size schemes.
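To make the object of study concrete, here is a minimal NumPy sketch of (unadjusted) Langevin Monte Carlo, x_{k+1} = x_k - h ∇f(x_k) + √(2h) ξ_k. It does not reproduce the talk's analysis: the target is an assumed toy case, a centred Gaussian with covariance diag(σ_i²), for which f is strongly convex and smooth with coordinate-wise constants L_i = 1/σ_i², and the stationary law of the chain is Gaussian with variance σ_i² / (1 - hL_i/2) per coordinate, so its W2 distance to the target can be computed exactly. The helper names are hypothetical.

```python
import numpy as np


def lmc(grad_f, x0, step, n_iter, rng):
    """Run unadjusted Langevin Monte Carlo from x0 (rows = independent chains); return final states."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = x - step * grad_f(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x


def stationary_w2_gaussian(sigma2, step):
    """Exact W2 between N(0, diag(sigma2)) and the LMC stationary law (needs step < 2 * min(sigma2))."""
    sigma2 = np.asarray(sigma2, dtype=float)
    v = sigma2 / (1.0 - step * (1.0 / sigma2) / 2.0)  # stationary variance, with L_i = 1 / sigma2_i
    return float(np.sqrt(np.sum((np.sqrt(v) - np.sqrt(sigma2)) ** 2)))


# Usage: f(x) = sum_i x_i^2 / (2 sigma2_i), so grad f(x) = x / sigma2.
sigma2 = np.array([1.0, 1.0, 0.25])
step = 0.05
rng = np.random.default_rng(0)
samples = lmc(lambda x: x / sigma2, np.zeros((20000, 3)), step, 400, rng)
print("empirical variances:", samples.var(axis=0))
print("W2 bias of the stationary law:", stationary_w2_gaussian(sigma2, step))
```

In this toy case each coordinate's bias depends only on its own constant L_i, which gives some intuition for why per-coordinate smoothness can matter; the talk's bound covers general strongly log-concave targets.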
Jean Rubin (Insee)
Title and abstract to be announced
Sébastien Herbreteau (ENSAI)
Divergence-Free Neural Networks with Application to Image Denoising
We introduce a resource-efficient neural network architecture with zero divergence by design, adapted for high-dimensional problems. Our method is directly applicable to image denoising, for which divergence-free estimators are particularly well-suited for self-supervised learning, in accordance with Stein's unbiased risk estimation theory. Comparisons of our parameterization on popular denoising datasets demonstrate that it retains sufficient expressivity to remain competitive with other divergence-based approaches, while outperforming its counterparts when the noise level is unknown and varies across the training data.
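The link with Stein's unbiased risk estimate (SURE) can be made explicit: for y = x + N(0, σ²I) and a denoiser f, SURE = ||y - f(y)||² - nσ² + 2σ² div f(y) is an unbiased estimate of ||f(y) - x||², so when div f = 0 the clean image x is not needed and the loss reduces to ||y - f(y)||² - nσ². The NumPy sketch below checks this numerically. The "blind-spot" neighbour average used as a divergence-free denoiser is a simple hypothetical stand-in, not the architecture of the talk, and the divergence is estimated with a standard one-probe Monte Carlo finite difference.

```python
import numpy as np


def blind_spot_mean(y):
    """Hypothetical stand-in denoiser: mean of the 4 neighbours, centre pixel excluded.

    Output pixel i never depends on input pixel i, so d f_i / d y_i = 0 and div f = 0.
    """
    return 0.25 * (np.roll(y, 1, axis=0) + np.roll(y, -1, axis=0)
                   + np.roll(y, 1, axis=1) + np.roll(y, -1, axis=1))


def five_point_mean(y):
    """Same filter with the centre pixel included: div f = 0.2 * y.size (for comparison)."""
    return 0.2 * y + 0.8 * blind_spot_mean(y)


def mc_divergence(f, y, rng, eps=1e-3):
    """One-probe Monte Carlo finite-difference estimate of div f(y) = sum_i d f_i / d y_i."""
    b = rng.standard_normal(y.shape)
    return float(np.sum(b * (f(y + eps * b) - f(y))) / eps)


def sure(f, y, sigma, rng):
    """Stein's unbiased risk estimate of ||f(y) - x||^2 when y = x + N(0, sigma^2 I)."""
    residual = float(np.sum((y - f(y)) ** 2))
    return residual - y.size * sigma**2 + 2 * sigma**2 * mc_divergence(f, y, rng)


# Usage: smooth periodic synthetic image (matching np.roll borders) plus Gaussian noise.
rng = np.random.default_rng(0)
i, j = np.meshgrid(np.arange(256), np.arange(256), indexing="ij")
x = np.sin(2 * np.pi * i / 64) * np.cos(2 * np.pi * j / 64)
sigma = 0.5
y = x + sigma * rng.standard_normal(x.shape)
mse = float(np.sum((blind_spot_mean(y) - x) ** 2))
print("true squared error:", mse, " SURE:", sure(blind_spot_mean, y, sigma, rng))
```

A neural network built to have zero divergence plays the role of `blind_spot_mean` here, while being far more expressive than a fixed linear filter.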
Jaouad Mourtada (ENSAE)
Estimation of discrete distributions in relative entropy, and the deviations of the missing mass
We consider the problem of estimating a distribution over a finite alphabet from an i.i.d. sample, with accuracy measured in relative entropy (Kullback-Leibler divergence). While optimal bounds on the expected risk are known, high-probability guarantees remain less well-understood. First, we characterize the performance of the classical Laplace (add-one) estimator, obtaining matching upper and lower bounds on its performance and establishing its optimality among confidence-independent estimators. We then characterize the minimax-optimal high-probability risk and show that it is achieved by a simple confidence-dependent smoothing technique. Notably, the optimal non-asymptotic risk incurs an additional logarithmic factor compared to the ideal asymptotic rate. Next, motivated by modern regimes in which the alphabet size exceeds the sample size, we discuss methods that adapt to the sparsity of the underlying distribution. We introduce an estimator using data-dependent smoothing, for which we establish a high-probability risk bound depending on two effective sparsity parameters. As part of our analysis, we also derive a sharp high-probability upper bound on the "missing mass", namely the total probability of symbols that do not appear in the sample.
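The objects in the first part of the abstract are easy to write down. Below is a minimal pure-Python sketch of the Laplace (add-one) estimator over an alphabet of size k, the relative entropy KL(p || p̂) used as the loss, and the missing mass of a sample. The function names are hypothetical, and the confidence-dependent and data-dependent smoothing schemes of the talk are not reproduced.

```python
import math
from collections import Counter


def laplace_estimator(sample, k):
    """Add-one (Laplace) estimate of a distribution on {0, ..., k-1}."""
    counts = Counter(sample)
    n = len(sample)
    return [(counts[j] + 1) / (n + k) for j in range(k)]


def kl_divergence(p, q):
    """Relative entropy KL(p || q) in nats; assumes q_j > 0 wherever p_j > 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def missing_mass(p, sample):
    """Total probability under p of the symbols that do not appear in the sample."""
    seen = set(sample)
    return sum(pj for j, pj in enumerate(p) if j not in seen)


p = [0.5, 0.3, 0.2]
sample = [0, 0, 0, 1]
p_hat = laplace_estimator(sample, k=3)  # [4/7, 2/7, 1/7]
print(p_hat, kl_divergence(p, p_hat), missing_mass(p, sample))
```

Smoothing is what keeps the loss finite: with raw empirical frequencies, any unseen symbol gets estimate 0, and KL(p || p̂) is infinite as soon as the missing mass is positive.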
Meilame Tayebjee (Insee)
Title and abstract to be announced
Emmanuel Pilliat (ENSAI)
A Unified Framework for Infinitely Many-Armed Bandits
We study bandit problems where the sampling budget is far smaller than the number of arms, possibly infinite. Instead of minimizing simple regret, which requires the arm means to be bounded, we maximize the expected reward of the recommended arm, with guarantees that hold even for unbounded distributions. The analysis relies on a single quantity that captures the difficulty of recommending a good arm. The resulting upper bounds recover known rates, uncover new transition phenomena tied to the noise level, and give the first guarantees for unbounded distributions. The talk also offers algorithmic insights, including a practical refinement with strong empirical performance and an efficient implementation.
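To fix ideas on the setting (a budget much smaller than the number of arms, with performance measured by the expected mean of the recommended arm), here is a naive NumPy baseline: draw m arms from the reservoir, split the budget evenly, and recommend the best empirical mean. This is a hypothetical illustration, not the algorithm of the talk; the uniform reservoir, the Gaussian (hence unbounded) rewards and the choice m = 100 are assumptions.

```python
import numpy as np


def sample_then_recommend(budget, n_arms, draw_means, pull, rng):
    """Naive baseline: explore n_arms fresh arms uniformly, recommend the best empirical mean.

    Returns the latent mean of the recommended arm, i.e. the quantity whose expectation is maximized.
    """
    means = draw_means(n_arms, rng)  # arm means drawn from the reservoir (unknown to the learner)
    pulls_per_arm = budget // n_arms
    empirical = np.array([pull(mu, pulls_per_arm, rng).mean() for mu in means])
    return float(means[np.argmax(empirical)])


def draw_means(m, rng):
    return rng.uniform(0.0, 1.0, size=m)  # assumed reservoir: Uniform(0, 1)


def pull(mu, t, rng):
    return mu + rng.standard_normal(t)  # assumed Gaussian rewards: unbounded support


rng = np.random.default_rng(0)
rewards = [sample_then_recommend(10_000, 100, draw_means, pull, rng) for _ in range(20)]
print("average mean of the recommended arm:", np.mean(rewards))
```

The number of explored arms m controls a trade-off: more arms make it likelier that a good one is sampled, but leave fewer pulls per arm to identify it.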
Austin Stromme (ENSAE)
On the implicit regularization of Langevin dynamics with projected noise
We study Langevin dynamics with noise projected onto the directions orthogonal to an isometric group action. This mathematical model is introduced to shed new light on the effects of symmetry on stochastic gradient descent for overparameterized models. Our main result identifies a novel form of implicit regularization: when the initial and target densities are both invariant under the group action, Langevin dynamics with projected noise is equivalent in law to Langevin dynamics with isotropic diffusion but with an additional drift term proportional to the gradient of the negative log volume of the group orbit. We prove this result by constructing a coupling of the two processes via a third process on the group itself, and identify the additional drift as the mean curvature of the orbits.
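The mechanism can be checked by hand in the simplest example, which is an assumption here and not taken from the talk: the rotation group SO(2) acting on R² and a rotation-invariant potential f(x) = g(|x|). Projecting the noise orthogonally to the circular orbits keeps only its radial part, P(x) = x x^⊤/|x|², and Itô's formula for r_t = |X_t| gives the following (B_t is a standard one-dimensional Brownian motion). Only the radial component is compared, which sidesteps the coupling argument of the talk.

```latex
\begin{aligned}
&\nabla r = \tfrac{x}{r},\qquad \nabla^2 r = \tfrac{I - P(x)}{r},\qquad
  \operatorname{tr}\bigl(P\,\nabla^2 r\bigr) = 0,\qquad \operatorname{tr}\bigl(\nabla^2 r\bigr) = \tfrac{1}{r},\\[4pt]
&\text{projected noise: } dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,P(X_t)\,dW_t
  \;\Longrightarrow\; dr_t = -g'(r_t)\,dt + \sqrt{2}\,dB_t,\\[4pt]
&\text{isotropic noise: } dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dW_t
  \;\Longrightarrow\; dr_t = \Bigl(-g'(r_t) + \tfrac{1}{r_t}\Bigr)dt + \sqrt{2}\,dB_t,\\[4pt]
&\tfrac{1}{r} = \tfrac{d}{dr}\log(2\pi r) = \tfrac{d}{dr}\log\operatorname{vol}(\text{orbit of radius } r).
\end{aligned}
```

In this toy case the projected-noise radial dynamics are therefore the isotropic ones plus the drift -∂_r log vol(orbit), and 1/r is also the curvature of the circular orbit, consistent with the mean-curvature interpretation stated in the abstract.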
Lionel Truquet (ENSAI)
When Taylor Meets Taylor: Asymptotic Bias Corrections for Fluctuation Scaling
Taylor's power law of fluctuation scaling states that the variance of a stochastic quantity scales as a power of its mean, a relationship widely observed in ecology, finance, physics, and even in the distribution of prime numbers. In ecology, empirical estimation of this law typically proceeds through a log-log regression between empirical variances and empirical means computed from collections of time series observed across many units or locations. This naturally leads to a two-dimensional asymptotic framework in which both the number of units n and the time horizon T grow jointly large.
We show that estimating a growing number of empirical moments may induce non-negligible asymptotic biases closely related to the incidental parameter problem of Neyman and Scott. Similar effects are well known in the asymptotic theory of panel data models, where the estimation of many nuisance parameters affects the limiting behaviour of structural estimators.
Using higher-order Taylor expansions of the estimating equations, we derive explicit analytical bias corrections and establish valid asymptotic inference for the resulting debiased estimators under general heterogeneity conditions. Several open questions and possible extensions are also discussed.
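The naive estimator the talk starts from, an OLS regression of log empirical variances on log empirical means across n units each observed over T time points, fits in a few lines of NumPy. This sketch does not implement the bias corrections; the Poisson data (for which the true exponent is 1) and the function name are assumptions for illustration.

```python
import numpy as np


def taylor_exponent(series):
    """Naive log-log OLS estimate of b in Var = a * Mean**b (rows: units, columns: time points)."""
    means = series.mean(axis=1)
    variances = series.var(axis=1, ddof=1)
    slope, _intercept = np.polyfit(np.log(means), np.log(variances), deg=1)
    return float(slope)


# Usage: n = 200 Poisson units observed over T = 2000 periods; Var = Mean, so b = 1.
rng = np.random.default_rng(0)
lam = np.exp(rng.uniform(np.log(5.0), np.log(100.0), size=200))
counts = rng.poisson(lam[:, None], size=(200, 2000))
b_hat = taylor_exponent(counts)
print("estimated exponent:", b_hat)
```

With T this large relative to n the estimate should be close to 1; the regime where n and T grow jointly, with T not dominant, is where the biases discussed above can become non-negligible.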



