Mixture of hidden Markov models for accelerometer data

« Studying physical activity and its consequences for health is a major challenge for many health researchers.

Many studies use accelerometers to measure physical activity. An accelerometer is a device that measures the acceleration of the part of the body to which it is attached. A recent large health study that used accelerometers was the Physical Activity and Transit (PAT) Survey, conducted between 2010 and 2011. The objective of the PAT survey was to assess the health problems that physical inactivity can cause among New York and United States citizens.

The first part of this study was a telephone survey (2010-2011) about physical activity at work, at home, or during active transportation (biking or walking, for instance). The data from this first part are self-reported activity levels. The second part of the survey assessed a sub-sample of the interviewees, who agreed to wear an accelerometer so that their activity levels could be measured. These are the data we use in our presentation. This sub-sample comprises 133 people, aged 65 and over, who chose to participate in the survey.

The statistical literature offers many approaches to this type of data. One of them is discussed in the research paper by Du Roy De Chaumaray et al. [2020]. Their objective was to describe the activity levels measured in the PAT survey sub-sample and to compare the self-reported results with the accelerometer data.

Our purpose here is to reproduce the results of this research paper (Du Roy De Chaumaray et al. [2020]), which applies a Mixture of Hidden Markov Models (MHMM) to the data from the PAT survey.

The first part introduces the data from the PAT study and the notation used throughout the article. The second part presents the different models we applied to our data. In the last part, we use the R package MHMM, available on CRAN (Du Roy de Chaumaray et al.), to show how the model handles missing values and how the probability of misclassification decreases on simulated data. We then apply the MHMM to a sub-sample of the data from the PAT study. »