Weighted Normal Spatial Scan Statistic

« More than ever in view of the current pandemic, locating clusters of disease, understanding why some diseases are concentrated in certain areas and finding solutions to them are matters of utmost importance.

In their article entitled Weighted Normal Spatial Scan Statistic for Heterogeneous Population Data, published in 2009 in the Journal of the American Statistical Association, Lan Huang, Ram C. Tiwari, Zhaohui Zou, Martin Kulldorff and Eric J. Feuer offer a statistical solution of the same name to the first step: detecting clusters.

That solution is based on continuous measures of the disease. A zone is flagged as a potential cluster when those measures are unusually high or unusually low. Unlike other spatial scan statistics, this one focuses on the geographical distribution of aggregated data: the detected cluster is a collection of geographic units with high or low regional measures that directly reflect the behavior of the cells, rather than that of the individuals inside the cells.

Working with aggregated data makes the problem harder, because the variance is no longer the same for all observations. To simplify, the authors consider a circular area whose center is assumed known, so that the only parameter is its radius. They also introduce weights to represent the uncertainty of the regional measures or the sample size (number of observed cases). For example, hospitals do not all admit the same number of patients, which affects the variances. To test their results, they ran several simulations and applied the method to real data, namely:

  • 1988-2002 stage I and II lung cancer survival data in LA county (diagnosis of survival rate)
  • 1999-2003 breast cancer age-adjusted mortality rate data in the U.S.
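To make the role of the weights and of the circular window concrete, here is a minimal sketch rather than the authors' implementation. It assumes each regional measure follows x_i ~ N(mu, sigma^2 / w_i), so that a larger weight means a smaller variance, and it computes the log-likelihood ratio that follows from that assumption for a single candidate zone. The function names `weighted_normal_llr` and `circular_zone` are hypothetical and chosen for illustration.

```python
import numpy as np

def weighted_normal_llr(x, w, in_zone):
    """Log-likelihood ratio for one candidate zone.

    Assumption (for this sketch): x_i ~ N(mu, sigma^2 / w_i), with a
    different mean inside and outside the zone under the alternative.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    in_zone = np.asarray(in_zone, dtype=bool)
    n = x.size
    # A zone covering everything or nothing cannot be a cluster.
    if in_zone.all() or not in_zone.any():
        return 0.0
    # Null model: one weighted mean for all units.
    mu0 = np.average(x, weights=w)
    var0 = np.sum(w * (x - mu0) ** 2) / n
    if var0 == 0:
        return 0.0
    # Alternative model: separate weighted means inside and outside.
    mu_in = np.average(x[in_zone], weights=w[in_zone])
    mu_out = np.average(x[~in_zone], weights=w[~in_zone])
    fitted = np.where(in_zone, mu_in, mu_out)
    var_z = np.sum(w * (x - fitted) ** 2) / n
    if var_z == 0:
        return np.inf
    return 0.5 * n * np.log(var0 / var_z)

def circular_zone(coords, center, radius):
    """Boolean mask of the units whose coordinates fall inside the circle."""
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)
    return np.linalg.norm(coords - center, axis=1) <= radius
```

With the center fixed, only `radius` varies, which matches the simplification described above. The ratio is never negative, because splitting the mean in two can only reduce the weighted residual variance.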

Several issues are faced here:

• a statistical test is used because the exhaustive search method, which consists of testing each zone one by one to find the one with the most extreme measures, would take far too long and be too expensive to implement

• using aggregated data makes the problem even more complex, as there is no known probability distribution for such data
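The two issues above can be illustrated with a short sketch. It scans a restricted family of roughly circular zones (each unit together with its nearest neighbours) instead of every possible subset, and, since no closed-form null distribution is available, it estimates a p-value by Monte Carlo permutation. Permuting the (measure, weight) pairs jointly across locations is an assumption of this sketch, as are the names `candidate_zones`, `scan` and `monte_carlo_p_value` and the model x_i ~ N(mu, sigma^2 / w_i).

```python
import numpy as np

def llr(x, w, in_zone):
    """Weighted normal log-likelihood ratio, assuming x_i ~ N(mu, sigma^2 / w_i)."""
    n = x.size
    if in_zone.all() or not in_zone.any():
        return 0.0
    mu0 = np.average(x, weights=w)
    var0 = np.sum(w * (x - mu0) ** 2) / n
    if var0 == 0:
        return 0.0
    fitted = np.where(
        in_zone,
        np.average(x[in_zone], weights=w[in_zone]),
        np.average(x[~in_zone], weights=w[~in_zone]),
    )
    var_z = np.sum(w * (x - fitted) ** 2) / n
    return np.inf if var_z == 0 else 0.5 * n * np.log(var0 / var_z)

def candidate_zones(coords, max_fraction=0.5):
    """Zones made of each unit and its k nearest neighbours (ties broken arbitrarily)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    max_size = max(1, int(max_fraction * n))
    zones = []
    for center in range(n):
        order = np.argsort(dist[center])
        for k in range(1, max_size + 1):
            mask = np.zeros(n, dtype=bool)
            mask[order[:k]] = True
            zones.append(mask)
    return zones

def scan(x, w, zones):
    """Return the largest log-likelihood ratio and the zone that attains it."""
    scores = [llr(x, w, z) for z in zones]
    best = int(np.argmax(scores))
    return scores[best], zones[best]

def monte_carlo_p_value(x, w, zones, n_sim=999, seed=0):
    """Permutation p-value for the most likely cluster."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    observed, best_zone = scan(x, w, zones)
    exceed = 0
    for _ in range(n_sim):
        # Shuffle (measure, weight) pairs together over the locations.
        perm = rng.permutation(x.size)
        simulated, _ = scan(x[perm], w[perm], zones)
        exceed += int(simulated >= observed)
    return observed, best_zone, (exceed + 1) / (n_sim + 1)
```

The number of candidate zones grows as the number of units times the maximum zone size, whereas the number of arbitrary subsets grows exponentially, which is why the fully exhaustive search is impractical.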

Throughout our own article, we tried to explain most of their results as well as the issues involved. First, we explained how the study would work with individual data and with the exhaustive search method. We then detailed their method and tried to apply it to a real case: the rate of premature births in 94 departments of metropolitan France in 2018. From the spatial discrepancies in premature births, we aim to detect the environmental factors that could cause these high-risk births. »