# Plotting COVID-19 cases

The Center for Systems Science and Engineering (CSSE) at Johns Hopkins University publishes daily statistics on the number of confirmed COVID-19 cases by country on its GitHub page. The short script below pulls data from this page and plots, for a given country, a bar chart of cumulative confirmed cases together with a line plot of the day-on-day change in cases, both as a function of time.

To plot a different country, change the value of the variable country to one of the values in the "Country/Region" column of the CSV file time_series_covid19_confirmed_global.csv.
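
To see which names are valid, the "Country/Region" column can be listed before choosing one. The following is only a minimal sketch: the helper list_countries and the tiny SAMPLE_CSV are hypothetical stand-ins (the sample rows are made up and merely mimic the column layout of the real file), so that the example runs without a network connection.

```python
import io
import pandas as pd

# A tiny made-up sample with the same column layout as the CSSE file;
# the coordinates and counts are illustrative only, not real data.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,France,46.2,2.2,0,0
Martinique,France,14.6,-61.0,0,0
,United Kingdom,55.4,-3.4,0,0
Bermuda,United Kingdom,32.3,-64.8,0,0
,Italy,41.9,12.6,0,0
"""

def list_countries(csv_source):
    """Return the sorted, de-duplicated 'Country/Region' values."""
    df = pd.read_csv(csv_source)
    return sorted(df['Country/Region'].unique())

# A country appears once per state/region row, hence the de-duplication.
countries = list_countries(io.StringIO(SAMPLE_CSV))
print(countries)
```

Passing the GitHub URL or a local file name instead of the StringIO object should work the same way, since pd.read_csv accepts all three.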

import sys
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# If you have saved a local copy of the CSV file as LOCAL_CSV_FILE,
# set READ_FROM_URL to False.
READ_FROM_URL = True
LOCAL_CSV_FILE = 'covid-19-cases.csv'

# Start the plot on the day when the number of confirmed cases reaches MIN_CASES.
MIN_CASES = 100

# The country to plot the data for.
country = 'United Kingdom'

# This is the GitHub URL for the Johns Hopkins data in CSV format.
data_loc = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
            'csse_covid_19_data/csse_covid_19_time_series'
            '/time_series_covid19_confirmed_global.csv')

# Read in the data to a pandas DataFrame.
if not READ_FROM_URL:
    data_loc = LOCAL_CSV_FILE
df = pd.read_csv(data_loc)

# Group by country and sum over the different states/regions of each country.
# numeric_only=True skips the text column 'Province/State'.
grouped = df.groupby('Country/Region')
df2 = grouped.sum(numeric_only=True)

def make_plot(country):
    """Make the bar plot of case numbers and change in numbers line plot."""

    # Extract the Series of case numbers for country, dropping any
    # non-date columns that survived the grouping.
    c_df = df2.loc[country].drop(['Province/State', 'Lat', 'Long'],
                                 errors='ignore')
    # Discard any dates with fewer than MIN_CASES.
    c_df = c_df[c_df >= MIN_CASES].astype(int)
    # Convert the index to proper datetime objects.
    c_df.index = pd.to_datetime(c_df.index)
    n = len(c_df)
    if n == 0:
        print('Too few data to plot: minimum number of cases is {}'
              .format(MIN_CASES))
        sys.exit(1)

    fig = plt.figure()

    # Arrange the subplots on a grid: the top plot (case number change) is
    # one third the height of the bar chart (total confirmed case numbers).
    ax2 = plt.subplot2grid((4,1), (0,0))
    ax1 = plt.subplot2grid((4,1), (1,0), rowspan=3)
    ax1.bar(range(n), c_df.values)
    # Force the x-axis ticks to be integers (whole numbers of days) in case
    # Matplotlib chooses some non-integral number of days to label.
    ax1.xaxis.set_major_locator(MaxNLocator(integer=True))

    c_df_change = c_df.diff()
    ax2.plot(range(n), c_df_change.values)
    ax2.set_xticks([])

    ax1.set_xlabel('Days since {} cases'.format(MIN_CASES))
    ax1.set_ylabel('Confirmed cases, $N$')
    ax2.set_ylabel(r'$\Delta N$')

    # Add a title reporting the latest number of cases available.
    title = '{}\n{} cases on {}'.format(country, c_df.iloc[-1],
                c_df.index[-1].strftime('%d %B %Y'))
    plt.suptitle(title)

make_plot(country)
plt.show()
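
The MIN_CASES filter relies on the data being cumulative totals rather than daily counts. The sketch below illustrates this on a short, hypothetical series of daily counts (the numbers and dates are made up for illustration): a running total of non-negative daily counts never decreases, so the filter removes only the leading run of days below the threshold, and diff() then recovers the daily changes.

```python
import pandas as pd

MIN_CASES = 100

# Hypothetical daily new-case counts (illustrative, not real data).
daily = pd.Series([0, 0, 10, 100, 99, 200],
                  index=pd.date_range('2020-03-01', periods=6))

# The CSSE file stores running totals, so build the cumulative series.
cumulative = daily.cumsum()

# Same filter as in make_plot: keep entries at or above MIN_CASES.
filtered = cumulative[cumulative >= MIN_CASES]

# Day-on-day change; the first entry has no predecessor, so it is NaN.
change = filtered.diff()

print(filtered.tolist())
print(change.tolist())
```

Note that the day with 99 new cases is kept: its cumulative total (209) is still above the threshold even though its daily count is not.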

#### Sean Rommel 9 months, 4 weeks ago

This is an excellent routine. I'm making a few edits I'd be happy to share with you. Specifically, I'm recasting the plots as semilogy to show the beginning of roll-over/flattening. I'm hoping to also build a routine that will extract/plot USA data by state/county. I'd be happy to collaborate as my job permits. Still learning Python, but can help here if you are interested.
Sincerely,
Sean Rommel (Rochester, NY USA)

#### christian 9 months, 3 weeks ago

I would be very interested to see your improvements: the code in this post was updated in the one following it, and lives on GitHub at https://github.com/xnx/covid-19 – feel free to fork and send me a PR.
Cheers,
Christian

#### Joseph Levine 6 months, 1 week ago

Hello,

I think there is an error in how you remove min cases.

If a cases count time series was
[0, 0, 10, 100, 99, 200]
we would get a graph that said 200 cases occurred on day 2 since 100 cases.

In a more extreme example, China had 0 cases for several weeks. All that time will just be dropped using this method. If I come up with something better I'll PR

Cheers,
Joseph

#### christian 6 months, 1 week ago

Hi Joseph,
Please do correct me if I've got this wrong, but I think the data being read in are cumulative numbers not daily numbers, so only the first MIN_CASES are dropped from the data set.
Cheers, Christian

#### Joseph Levine 6 months, 1 week ago

Hi Christian,

You are correct! I converted to daily cases and mixed myself up. Thanks for the prompt reply and the slick solution.

Joseph