Second semester

Missing Survey Data

Objectives

Missing data arise in surveys when some units refuse to respond or cannot be contacted. Partial nonresponse (also called item nonresponse) occurs when a sampled individual answers only some of the survey questions; total nonresponse (unit nonresponse) occurs when no response at all is observed for an individual.
Nonresponse increases the variance of estimators, because the sample actually observed is smaller than the one selected, and above all it introduces bias: estimators that are not adjusted for nonresponse can be highly biased if respondents differ from nonrespondents with regard to the variables studied.
The aim of this course is to present the different types of nonresponse, the factors that can help limit it, and the classical methods for dealing with it in surveys.
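
The bias described above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the study variable, the response probabilities (0.8 above 50, 0.4 otherwise) and the function name `simulate_nonresponse_bias` are all hypothetical and do not come from the course material.

```python
import random


def simulate_nonresponse_bias(n=20000, seed=0):
    """Compare the full-sample mean with the unadjusted respondent mean."""
    rng = random.Random(seed)
    # Hypothetical study variable y, centered at 50.
    y = [rng.gauss(50.0, 10.0) for _ in range(n)]
    # Assumed response mechanism: the response probability depends on y,
    # so respondents differ from nonrespondents on the variable studied.
    responded = [rng.random() < (0.8 if yi > 50.0 else 0.4) for yi in y]
    resp_values = [yi for yi, r in zip(y, responded) if r]
    full_mean = sum(y) / n
    resp_mean = sum(resp_values) / len(resp_values)
    return full_mean, resp_mean, len(resp_values)


if __name__ == "__main__":
    full_mean, resp_mean, n_resp = simulate_nonresponse_bias()
    print(f"full-sample mean: {full_mean:.2f}")
    print(f"respondent mean:  {resp_mean:.2f} ({n_resp} respondents)")
```

Under these assumptions the respondent mean overshoots the full-sample mean, and the number of observed units drops, which is the variance effect mentioned above.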

At the end of this course, students should be able to:
-> master the steps involved in handling total nonresponse: separating nonrespondent units from out-of-scope units, fitting a model to estimate response probabilities, forming homogeneous response groups, and computing weights adjusted for nonresponse.
-> master the steps involved in handling partial nonresponse: fitting an imputation model for the variable under study, finding explanatory covariates, choosing an imputation mechanism suited to the intended statistical analysis, and implementing the imputation.
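
The total nonresponse steps in the first objective can be sketched as follows. This is a minimal illustration, not the course's own procedure: it assumes the homogeneous response groups have already been formed (for example from a fitted response model), it estimates the response probability in each group by the weighted response rate, and the record layout (`weight`, `group`, `status`) and function name are hypothetical.

```python
from collections import defaultdict


def adjust_weights(units):
    """Return respondents with weights adjusted for total nonresponse.

    Each unit is a dict with keys 'weight' (sampling weight), 'group'
    (response group label) and 'status', one of 'respondent',
    'nonrespondent' or 'out_of_scope'. All names are illustrative.
    """
    # Step 1: set aside out-of-scope units; they are not nonrespondents.
    in_scope = [u for u in units if u["status"] != "out_of_scope"]

    # Step 2: weighted response rate within each response group.
    total = defaultdict(float)
    resp = defaultdict(float)
    for u in in_scope:
        total[u["group"]] += u["weight"]
        if u["status"] == "respondent":
            resp[u["group"]] += u["weight"]

    # Step 3: divide each respondent's weight by the estimated
    # response probability of its group.
    adjusted = []
    for u in in_scope:
        if u["status"] == "respondent":
            rate = resp[u["group"]] / total[u["group"]]
            adjusted.append({**u, "adj_weight": u["weight"] / rate})
    return adjusted


# Made-up sample: two response groups, one out-of-scope unit.
sample = [
    {"weight": 10.0, "group": "A", "status": "respondent"},
    {"weight": 10.0, "group": "A", "status": "respondent"},
    {"weight": 10.0, "group": "A", "status": "nonrespondent"},
    {"weight": 10.0, "group": "A", "status": "nonrespondent"},
    {"weight": 10.0, "group": "B", "status": "respondent"},
    {"weight": 10.0, "group": "B", "status": "respondent"},
    {"weight": 10.0, "group": "B", "status": "nonrespondent"},
    {"weight": 10.0, "group": "B", "status": "out_of_scope"},
]
adjusted_sample = adjust_weights(sample)
```

In this sketch the adjusted weights of the respondents sum to the total sampling weight of the in-scope units, so respondents stand in for the nonrespondents of their own group. Using an unweighted response rate instead is another common choice.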

Course outline

Part 1: Introduction
Review of finite population sampling
Review of calibration methods
Types of nonresponse: total nonresponse, partial nonresponse

Part 2: Handling total nonresponse
Two-phase sampling
Reweighting adjustment
Homogeneous response groups
Applications

Part 3: Treatment of partial nonresponse
The imputation model
Simple imputation methods
Applications
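
As an illustration of the kind of simple imputation method listed in Part 3, here is a minimal sketch of deterministic regression imputation with a single covariate. The choice of this particular method, the function name and the toy data are assumptions made for the example, not taken from the course.

```python
def regression_impute(x, y):
    """Fill missing values of y (given as None) with predictions from a
    simple linear regression of y on x fitted on the observed pairs."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Observed values are kept; missing ones get the fitted value.
    return [
        yi if yi is not None else intercept + slope * xi
        for xi, yi in zip(x, y)
    ]


# Toy data: the covariate x is fully observed, y has two missing items.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.0, 4.0, None, 8.0, None]
completed = regression_impute(x_values, y_values)
```

Deterministic imputation of this kind is suited to estimating means and totals but tends to understate the variability of the imputed variable; adding a random residual to each prediction is one option when the distribution itself is of interest.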

Prerequisites

Survey sampling theory, linear regression, generalized linear models