A Data Army to Fight our Current Battle

  • António Crespo de Carvalho
  • 5 April 2020

Prediction, protection and containment: three use-cases that, supported by reliable data sources, can be decisive in helping governments fight the pandemic.

On the 6th of March 2020, I was welcoming my family in Madrid to attend my graduation ceremony after completing a master’s in Data Science. In the middle of the traditional celebrations following an intense and exciting year, the unavoidable question popped up: what exactly is it that you have been studying? What is this mysterious science which seems to be transforming the world in front of our eyes? How can we use it? Finally, someone had stepped up and asked this interesting question!

If you are a student, colleague, professor or anyone else who already knows the answer to that question, you can probably skip this article. It is written in an educational spirit and targeted at people who either don’t know, or think they know but want to understand better. I’m going to try to explain here what I learned in the last year, and how it can be used for the overall benefit of society.

Drew Conway, an American data scientist, defined the science as a “mixture of information hacking abilities, domain knowledge and maths and statistical skills”. In a later and much clearer definition, David Lazer, a professor of political science and computer science in the USA, argued that it combines “quantitative methods, computer science and social sciences”. I can assure you that even after a whole year studying this topic, summarizing it in words is not simple. My own attempt: Data Science is a process that uses qualitative and quantitative data to solve human problems, by applying techniques powered by mathematical and statistical knowledge.

One year ago, at a conference with some of the most valuable companies in the world, Satya Nadella, Microsoft’s current CEO, said: “The most important resource that we all share in this room is data”. At that time, Microsoft was worth something like 905 billion dollars. Amazing! But is the benefit of this new science exclusively to serve companies’ balance sheets? Does it make sense to see algorithms merely as hidden strategies to manipulate clients? Could we, on the other hand, take a more holistic view of the potential of this new reality? Please read on to find out!

On the 31st of December 2019, the Wuhan Municipal Health Commission reported 27 cases of an unknown pneumonia. At the time of writing, in the last week of March, there are more than 350,000 cases worldwide, with an overall mortality rate of around 4.4%. So what happened between December 31st 2019 and March 26th 2020?

Let's start by looking at city X, which has a population of 17 citizens, represented here by white balls:

Figure 1 – Population of city X on day N


After one day, three people were found to be infected by the virus, represented by black balls:

Figure 2 – Population of city X on day N+1

Out of these three balls, one felt regular flu symptoms and stayed isolated at home. The other two, feeling tired and with a slight fever, took public transport, went to work and eventually returned home. One of them decided to stop by the supermarket and do some shopping. After 5 days, this was the state of city X’s population:

Figure 3 – Population of city X on day N+5

Both balls in the middle carrying the virus contaminated two more (with whom they were living), and one of them contaminated a third one who was waiting next in line at the supermarket queue. Multiply this by a practically unlimited number of balls (meaning people) and add other dimensions not previously mentioned: international trips (white balls entering city X and vanishing among black balls), black balls entering hospitals and sharing these surroundings with white balls, or even black balls in manufacturing facilities or corporate meetings surrounded by white balls. What I have just described is an almost infinite process of data point creation.
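To make the spreading process described above more concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions: the contact network (each citizen meets two neighbours), the transmission probability and the function name `simulate_spread` are all hypothetical and not taken from any real epidemiological model.

```python
import random


def simulate_spread(contacts, initially_infected, days, transmission_prob, seed=0):
    """Simulate a day-by-day spread over a contact network.

    contacts: dict mapping each person to the people they meet daily (assumed).
    Returns the final set of infected people and the daily infected counts.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    infected = set(initially_infected)
    history = [len(infected)]
    for _ in range(days):
        newly_infected = set()
        # Every infected person may contaminate each healthy contact.
        for person in sorted(infected):
            for neighbour in contacts.get(person, ()):
                if neighbour not in infected and rng.random() < transmission_prob:
                    newly_infected.add(neighbour)
        infected |= newly_infected
        history.append(len(infected))
    return infected, history


# Hypothetical city X: 17 citizens, each in daily contact with two neighbours.
POPULATION = 17
contacts = {
    i: [(i - 1) % POPULATION, (i + 1) % POPULATION] for i in range(POPULATION)
}

# Three black balls on day N+1, followed for four more days (up to day N+5).
infected, history = simulate_spread(
    contacts, {0, 1, 2}, days=4, transmission_prob=0.5
)
print("Infected per day:", history)
```

Each run of the loop is one day in city X. Changing the contact network or the transmission probability changes how quickly white balls turn black, which is exactly why data about real contacts matters.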

Looking at the last diagram (Figure 3), we can already perform some analysis: we can measure the distance between different balls, analyse the movement of both types of balls inside and outside city X, estimate the time it takes for a white ball to become a black ball and vice versa, or even analyse the active channels through which black balls spread the virus. This is called network analysis, an area that could be tremendously valuable in tracking and resolving the current pandemic. Individually, each person can be described by a set of variables: the number of people who share their household or workplace, the locations between which they commute, physical address, transport used, health records, and many others. Feeding this individual data into advanced prediction techniques, such as machine learning, would make it possible to estimate each person's probability of contamination, helping health authorities to halt the spread of the virus.
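One of the simplest measurements mentioned above, the distance between balls, can be sketched as follows. This is a hedged illustration, not a risk model: the contact list, the people named "A" to "F" and the function name `distance_to_nearest_infected` are made up for the example. It counts how many contacts separate each person from the closest infected person, using a breadth-first search.

```python
from collections import deque


def distance_to_nearest_infected(contacts, infected):
    """Return the number of contact 'hops' from each person to the
    closest infected person. People with no path to an infected
    person are left out of the result."""
    distances = {person: 0 for person in infected}
    queue = deque(sorted(infected))
    while queue:
        current = queue.popleft()
        for neighbour in contacts.get(current, ()):
            if neighbour not in distances:
                # First visit is the shortest route in a breadth-first search.
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


# Hypothetical contact network: who shares a household, workplace or queue.
example_contacts = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
    "F": [],  # no recorded contacts
}

hops = distance_to_nearest_infected(example_contacts, {"A"})
for person, hop_count in sorted(hops.items()):
    print(person, hop_count)
```

A small distance does not prove contamination; it is just one possible input variable that a prediction technique could combine with the others listed above.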

As a result, I want to highlight three use-cases that, supported by reliable data sources, can be decisive in helping governments in their fight against the pandemic:

  1. Prediction: data on the number of infected people in the pandemic's country of origin, the number of available tests, country demographics and mobility indexes allows us to estimate the number of people likely to be infected. This, in turn, supports efficient healthcare logistics and projections of the requirements for beds, ventilators, masks and other items;
  2. Protection: figures on the infected population in the country of origin can be extrapolated to create health profiles of the afflicted members of the population. These profiles would eventually lead to clusters of people with similar inherent characteristics, which can be placed along a normalized scale of risk. This will shift responsibility over to citizens, who should be accountable for their exposure to the existing risks, and will therefore require fewer communication and alerting efforts from the government;
  3. Containment: collecting data on the journeys of infected people via geolocation (as China did) will clearly identify the virus's propagation channels. This allows the responsible health units to be notified automatically, so that they can enforce coercive measures against antisocial behaviour.
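As a rough illustration of the first use-case, the sketch below projects case numbers and hospital beds. It rests on a deliberately naive assumption of steady exponential growth, and every number in it (current cases, doubling time, hospitalisation rate) is a hypothetical placeholder, not real data. The function names `project_cases` and `beds_needed` are also invented for this example.

```python
import math


def project_cases(current_cases, doubling_time_days, horizon_days):
    """Project cases assuming they double every `doubling_time_days`.
    Real outbreaks do not grow at a constant rate; this is a simplification."""
    return current_cases * 2 ** (horizon_days / doubling_time_days)


def beds_needed(projected_cases, hospitalisation_rate):
    """Estimate beds as a fixed share of projected cases, rounded up."""
    return math.ceil(projected_cases * hospitalisation_rate)


# Hypothetical inputs, for illustration only.
cases_today = 100
doubling_time = 5      # days, assumed
horizon = 10           # days ahead
hospital_share = 0.1   # assumed share of cases needing a bed

projected = project_cases(cases_today, doubling_time, horizon)
print(f"Projected cases in {horizon} days: {projected:.0f}")
print(f"Estimated beds needed: {beds_needed(projected, hospital_share)}")
```

The value of reliable data sources shows up directly here: the projection is only as good as the doubling time and hospitalisation rate fed into it.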

All of this looks beautifully simple in theory. In practice, it is not. It is hard to admit, but the challenges in this area largely outweigh the opportunities, mainly because many of the assumptions behind these suggestions are, at this moment, somewhat utopian. Above all, I would highlight three needs: perfect coordination between healthcare units, governments, citizens and other involved agents; the availability of resources such as human knowledge, infrastructure and technologies for collecting and processing data; and idealistic cooperation between countries and continents. If we are capable of leveraging all these tools, we will certainly win this battle in a war that looks like it will be anything but short!

  • António Crespo de Carvalho
  • Guest columnist