You Knew Quite A Bit, John Snow

The spread of disease is not a new phenomenon. Diseases have been spreading for as long as humans have been alive and will continue to spread long after you and I are dead. We don’t have the power to stop them completely, but years of study help us at least prevent widespread, uncontrollable illness and death. Let’s take a look at how data became such an important part of that effort.

Our journey begins in the middle of the 19th century. John Snow was a sceptic of the popular theory that diseases were caused by pollution, or what people called ‘bad air’, and he observed that this theory could not explain the cholera outbreaks of his time. He is widely credited as the first person to conduct an epidemiological study on the mode of transmission of cholera.

During the 1854 epidemic, he gathered reports from homes that were supplied with water by two different companies: the Lambeth Water Company and the Southwark-Vauxhall Water Company. Lambeth drew water from the upper reaches of the Thames, which was not contaminated by fecal matter from London. Southwark-Vauxhall drew water from the lower Thames, into which most of London’s sewage was dumped. He found that the houses supplied by Southwark-Vauxhall had far more cholera cases than the houses supplied by Lambeth.

Based on this wonderful data, he was able to say confidently that the cause of the epidemic was water contaminated with fecal matter. Many people did not want to accept it, mainly because they could not stomach the idea that they were swallowing fecal matter. At the end of August 1854, a severe outbreak of cholera struck Broad Street, now named Broadwick Street. Snow talked to the residents and traced the source of the outbreak to the street’s water pump. Even with microscopic and chemical analysis of the water, he could not prove conclusively that it was the cause of the outbreak. Using the data he had collected and the testimonies of the residents, he nevertheless convinced the local parish authorities to take the pump out of use by removing its handle. By this time cases were already on the decline, and with the pump shut down the outbreak came to an end.

John Snow, just by correlating and analysing the data available, was able to identify the source of a deadly outbreak and help bring it to an end. This is what kick-started modern epidemiological practices and helped control the spread of many diseases in the modern era.

John Snow is considered the first person to do such a data-gathering study on disease spread and transmission. Hold on to your head, because we’re going to jump forward more than a century and a half to 2020.

We land in a quarantined society. Everyone in the world knows about the spread of COVID-19. Epidemiological studies on this modern pandemic have been increasing day by day across the Earth. A great deal has changed since John Snow collected data from homes. Nowadays, it’s not enough to just collect data and expect to come to the correct conclusions. The data must be synthesised, cleaned, and analysed to be of any use.

One of the companies that did a lot of research on COVID-19 was nFerence, a software company that takes biomedical knowledge and makes it computable so that the world’s healthcare problems can be solved. They studied whether vaccines against other diseases, such as polio and varicella, were associated with fewer COVID-19 cases, and published their preliminary analysis. Even this preliminary analysis draws on data gathered from more than 100,000 people, compared with John Snow’s few homes. While gathering this data, they had to take into consideration the race, geography, and pre-existing medical conditions of the people involved. Such factors weren’t considered during John Snow’s time. With this data, they state that varicella, polio, geriatric flu, Hib, MMR, PCV13, and HepA-HepB vaccines received within the past 1, 2, or 5 years show associations with lower rates of COVID-19 infection.

This is only one example of how data and data analysis have helped in epidemiological studies. Another study by the same company analysed how long it took for a COVID-19 patient to return a negative SARS-CoV-2 PCR test. Data was collected from several groups of people. A total of 874 infected people were tested, of whom 53 still tested positive four weeks after their initial positive test! One of the curiosities of this group was that a majority of them (40 out of 53) were not hospitalised during this timeframe. In another set of 370 infected individuals, the average time to test negative was 21.2 days, with a variation of 9.3 days. This study helps us understand how infections spread from individuals, and why tests that show how long an individual remains infectious are required to mitigate community transmission.
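To put those counts in proportion, here is a minimal Python sketch that turns them into percentages. It uses only the figures quoted above; the helper name `share` and the variable names are made up for this illustration and are not taken from the study.

```python
def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


# Counts quoted in the paragraph above (variable names are illustrative).
tested = 874            # infected people who were re-tested
still_positive = 53     # still positive four weeks after the first positive test
not_hospitalised = 40   # of those 53, never hospitalised in that period

pct_still_positive = share(still_positive, tested)
pct_not_hospitalised = share(not_hospitalised, still_positive)

print(f"Still positive after four weeks: {pct_still_positive}% of those tested")
print(f"Not hospitalised: {pct_not_hospitalised}% of the long-term positives")
```

Read this way, roughly one in sixteen of the people tested was still positive after four weeks, and about three quarters of that group had never been in hospital.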

John Snow could never have known that his research would grow into such a big area of study. As our world develops, data gets more and more complicated. That is why data collection and analysis are essential for any type of research, and even more so in epidemiology. In our complicated world, data collection matters even more than it did in the 19th century. Data analysis is what made all these studies on diseases possible. Without it, we would still be stuck in an age where moving away or dying were the only options when an epidemic struck.

So, is data analysis wonderful?
How wonderful is it?
So wonderful that it can help prevent widespread death and slow down a global pandemic at the same time.


1854 Broad Street cholera outbreak
wikipedia:John Snow
John Snow’s legacy: epidemiology without borders
Quantifying the prevalence of SARS-CoV-2 long-term shedding among non-hospitalized COVID-19 patients

Exploratory analysis of immunization records highlights decreased SARS-CoV-2 rates in individuals with recent non-COVID-19 vaccinations

John Snow, MD: anaesthetist to the Queen of England and pioneer epidemiologist

Written by Srinidhi Sridhar

Integrated M.Sc Life Sciences

Central University of Tamil Nadu



  1. Good case study, and interpretation to what is happening in this pandemic situation and how important is the data collection and data analysis. Way to go..

  2. A good read,
    data science is no wonder the most significant tech with applications in all areas of modern living ..bringing out hidden solutions which wud be impossible to fathom otherwise….
    Nice to know it might even bring an end to this pandemic someday.
    Very well written.
