Predicting COVID-19 severity early

« It is an understatement to say that the COVID-19 pandemic has changed the world since early 2020. Humanity has been facing a virus known as « SARS-CoV-2 », which has already killed hundreds of thousands of people around the world. Even today, several million COVID-19 patients are in the care of health care personnel.

Governments are working out how to deal with this pandemic, as the stakes are huge in terms of health, the economy, society, and the environment. For the last few months, researchers have been racing to find an effective vaccine against the virus so that everyone can get back to a « normal » life, whether or not they are directly affected by the resulting crisis.

While a vaccine was still being sought, some scientists wanted to help doctors and infectious disease specialists understand this virus. They wished to give them simple tools for treating the large number of patients arriving at the hospital. This was the case for Haochen Yao et al., who tried to identify the COVID-19 patients likely to develop severe symptoms later on.

This paper is based on the article « Severity Detection for the Coronavirus Disease 2019 (COVID-19) Patients Using a Machine Learning Model Based on the Blood and Urine Tests ». It was published on July 31st, 2020 in the journal Frontiers in Cell and Developmental Biology. The authors of this article studied the severity of the symptoms developed by COVID-19 patients, since the virus does not affect everyone in the same way. Some patients are asymptomatic, while others suffer from severe symptoms or even die. The patients involved in the study were kept under observation at the Tongji Hospital (affiliated to Huazhong University of Science and Technology, China) from January 18th, 2020, to February 13th, 2020. They gave their consent for the authors to carry out this work.

The goal was to detect COVID-19 severity as early as possible from clinical, blood, and urine tests. A total of 137 COVID-19 patients were divided into two groups. In the first group, 62 patients developed mild or moderate symptoms. In the second group, 75 patients suffered from severe symptoms, and 21 of these 75 died. Given a new patient's blood and urine tests, the purpose was to determine which of the two groups that patient belonged to.
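To put these group sizes in perspective, the short sketch below computes the class proportions and the accuracy of a trivial classifier that labels every patient as severe. The counts come from the text above. The variable names are illustrative, and the baseline is a common reference point rather than something reported by the authors.

```python
# Patient counts as stated in the text above.
n_mild_or_moderate = 62
n_severe = 75
n_deaths_among_severe = 21

n_total = n_mild_or_moderate + n_severe

# Share of patients in the severe group.
severe_share = n_severe / n_total

# Accuracy of a trivial classifier that always predicts the larger group.
# This is an illustrative reference point, not a figure from the article.
majority_baseline_accuracy = max(n_severe, n_mild_or_moderate) / n_total

# Share of the severe group that died.
fatality_among_severe = n_deaths_among_severe / n_severe

print(f"total patients: {n_total}")
print(f"severe share: {severe_share:.4f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.4f}")
print(f"fatality within severe group: {fatality_among_severe:.4f}")
```

Because the two groups are of similar size, any useful classifier has to do clearly better than this always-severe baseline.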

We compared the methods used and the algorithms implemented by the researchers (logistic regression, support vector machines, random forests, k-nearest neighbors (KNN), and AdaBoost).
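As a rough illustration of how candidate classifiers can be compared on equal terms, here is a minimal sketch of k-fold cross-validation in plain Python. It assumes invented toy data and two stand-in models: a majority-class baseline and a simple k-nearest neighbors classifier. The function names, the fold scheme, and the data are all hypothetical and do not reproduce the authors' protocol.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label)
Model = Callable[[List[Sample], Sequence[float]], int]  # (training set, x) -> label


def majority_model(train: List[Sample], x: Sequence[float]) -> int:
    """Baseline: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def knn_model(train: List[Sample], x: Sequence[float], k: int = 3) -> int:
    """k-nearest neighbors: squared Euclidean distance, then majority vote."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(model: Model, data: List[Sample], n_folds: int = 5) -> float:
    """Mean accuracy over n_folds; fold i holds out samples i, i + n_folds, ..."""
    scores = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [s for i, s in enumerate(data) if i % n_folds != fold]
        correct = sum(model(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / n_folds


# Invented toy data: two hypothetical test values per patient, label 1 = severe.
toy_data: List[Sample] = [
    ((1.0, 1.2), 0), ((5.0, 5.1), 1), ((0.8, 1.0), 0), ((5.2, 4.9), 1),
    ((1.1, 0.9), 0), ((4.8, 5.3), 1), ((0.9, 1.3), 0), ((5.1, 5.0), 1),
    ((1.2, 1.1), 0), ((4.9, 4.8), 1),
]

results = {
    "majority": cross_val_accuracy(majority_model, toy_data),
    "knn": cross_val_accuracy(knn_model, toy_data),
}
print(results)
```

Every model is scored on the same held-out folds, so the differences in mean accuracy reflect the models and not the choice of split.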

First, we discussed the debatable choices they made for data pre-processing: the treatment of missing values and the selection of features. We then studied the methods they used for model comparison, before explaining in more detail their SVM algorithm and how they reached an overall accuracy of 0.8148. We concluded that other methods could likely improve on their results, which are nevertheless good given the limited data available to them. »
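The two pre-processing steps mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: features missing in more than half of the patients are dropped, and the remaining gaps are filled with the feature's median. The threshold, the use of the median, the function name `select_and_impute`, and the feature names are all hypothetical. The text above does not specify which strategy the authors used.

```python
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # one patient: feature name -> value or None


def select_and_impute(
    rows: List[Row], max_missing_ratio: float = 0.5
) -> List[Dict[str, float]]:
    """Drop features missing too often, then fill remaining gaps with the median."""
    features = sorted({name for row in rows for name in row})
    fill_values: Dict[str, float] = {}
    for name in features:
        observed = [row[name] for row in rows if row.get(name) is not None]
        missing_ratio = 1 - len(observed) / len(rows)
        # Keep the feature only if enough patients have a measured value.
        if observed and missing_ratio <= max_missing_ratio:
            fill_values[name] = median(observed)
    return [
        {
            name: (row[name] if row.get(name) is not None else fill)
            for name, fill in fill_values.items()
        }
        for row in rows
    ]


# Invented example with hypothetical feature names; None marks a missing test.
raw_rows: List[Row] = [
    {"marker_a": 1.0, "marker_b": None, "marker_c": 10.0},
    {"marker_a": None, "marker_b": None, "marker_c": 12.0},
    {"marker_a": 3.0, "marker_b": 7.0, "marker_c": None},
    {"marker_a": 5.0, "marker_b": None, "marker_c": 14.0},
]

cleaned = select_and_impute(raw_rows)
for row in cleaned:
    print(row)
```

In this toy example `marker_b` is dropped because it is missing for three of four patients, while the single gaps in `marker_a` and `marker_c` are filled. With only 137 patients, choices like this threshold can noticeably change which features reach the classifier.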