How to Build a Simple COVID Data Graph in Python
How to Make an Efficient COVID Data Visualizer
This tutorial assumes you have basic Python knowledge and at least a little bash knowledge.
Step 1: Create Base Directory
First, you need a directory. In my opinion, a bash repl is the coolest, so let's make one :D. In the
main.sh file, insert the following code:
#!/bin/bash
clear
python main.py
Let's break this down line by line.
#!/bin/bash - The shebang line tells the system to run this script with bash instead of falling back to plain sh
clear - Clears the terminal which can be helpful if the terminal is full
python main.py - Runs our future program!
Now, create a file named main.py. You probably expected this ;).
The last file you need is from here: https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv - copy all the data and put it in a file named data.csv. Remember to remove the first line of that file, which is the header: "date,cases,deaths".
Your directory should now contain three files: main.sh, main.py, and data.csv.
Note: You could also just do this in a Python repl without any bash, I just like bash :D
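If you would rather not copy and trim the file by hand, the download step could be scripted. This is only a rough sketch using the standard library: the function names drop_header and save_data are made up for this example, and it assumes the file at that URL still starts with a single header line.

```python
from urllib.request import urlopen

# URL taken from the tutorial text above.
DATA_URL = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv"


def drop_header(text):
    """Return the CSV text without its first (header) line."""
    lines = text.splitlines()
    # No trailing newline, so splitting on '\n' later gives no empty entry.
    return "\n".join(lines[1:])


def save_data(url=DATA_URL, path="data.csv"):
    """Download the CSV, strip the header row, and write it to `path`."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    with open(path, "w") as out:
        out.write(drop_header(text))
```

Calling save_data() once (it needs an internet connection) should leave a header-free data.csv next to main.py.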
Step 2: Fetch Data
In the main.py file, we need to get the CSV data. How do we do that? Well, we have to open the file, split it into a list of lines, split each line into a smaller list, and boom, it works! Here is the code:
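# Open the data file for reading
csv = open('data.csv', 'r')
# strip() drops a trailing newline so we don't get an empty last line
data_raw = csv.read().strip().split('\n')
csv.close()
# Each entry of data will be a list: [date, cases, deaths]
data = []
for line in data_raw:
    data.append(line.split(','))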
Let's break this segment down:
First we open the file using the open function. Then we read the file and split it into lines, which are separated by the \n character. Next, we close the file to free the file handle :D. We define the data list, which will hold the final split data. Next, we iterate through the list of lines. We split each line at the commas, which is the whole point of a CSV. We append that smaller list to data, so a sample entry would look like this:
['2021-1-1', '0', '1']
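As an alternative to splitting by hand, Python's built-in csv module can do the same job. The sketch below is just one way it could look: parse_rows and load_rows are hypothetical names, and it assumes the header line was already removed from data.csv. Note that the tutorial code uses csv as a variable name, so this would replace that segment rather than sit next to it.

```python
import csv


def parse_rows(lines):
    """Turn an iterable of CSV lines into a list of string lists."""
    # Blank lines come back as empty lists, so filter them out.
    return [row for row in csv.reader(lines) if row]


def load_rows(path):
    """Read a CSV file into a list of [date, cases, deaths] lists."""
    # The with block closes the file automatically.
    with open(path, newline="") as handle:
        return parse_rows(handle)
```

With that in place, data = load_rows('data.csv') would give the same kind of list as the loop above.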
Step 3: Figure out Time Difference
The problem is, we have a lot of dates in our values, and we need integer x values. To figure this out, we need to use the datetime module to find the difference between dates. We need the first and last dates as our separation values, so let's set those with the following code:
# The date is the first item of each row; split it into [year, month, day]
start_date = data[0][0].split('-')
end_date = data[-1][0].split('-')
Now let's find the difference between the two dates:
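from datetime import date
# Build date objects from the [year, month, day] string parts
start_date = date(int(start_date[0]), int(start_date[1]), int(start_date[2]))
end_date = date(int(end_date[0]), int(end_date[1]), int(end_date[2]))
# Subtracting two dates gives a timedelta; .days is the number of whole days
difference = end_date - start_date
difference = difference.days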
This pretty much turns the dates into objects the module understands, which we can then subtract to get the difference in days!
To break it down, first we import date from the datetime module. Then, as stated, we convert each date into a date object. Subtracting one date from the other gives a timedelta, and its days attribute is the number of days between them!
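To make the date math concrete, here is a small worked sketch. The helper name days_between is made up for this example, and it assumes dates are written as year-month-day separated by dashes, like the ones in the data.

```python
from datetime import date


def days_between(first, last):
    """Number of days from `first` to `last`, both 'YYYY-MM-DD' strings."""
    # Split on '-' and convert the three parts to integers.
    start = date(*map(int, first.split("-")))
    end = date(*map(int, last.split("-")))
    # Subtraction returns a timedelta; .days is the whole-day count.
    return (end - start).days


print(days_between("2020-01-21", "2020-02-01"))  # 11
```

January has 31 days, so going from January 21 to February 1 is 10 days to reach January 31 plus 1 more.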
Next, we split the data. To do this, we make two smaller lists and assign values to them:
cases = []
deaths = []
for value in data:
    # value is [date, cases, deaths]; convert the counts to integers
    cases.append(int(value[1]))
    deaths.append(int(value[2]))
This just splits the data list into two smaller, easier-to-use data sources.
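Here is the same split tried on a tiny made-up list, so you can see what comes out. The rows below are illustrative values only, not real figures, and split_columns is a hypothetical helper name.

```python
def split_columns(rows):
    """Return (cases, deaths) integer lists from [date, cases, deaths] rows."""
    cases = [int(row[1]) for row in rows]
    deaths = [int(row[2]) for row in rows]
    return cases, deaths


# Made-up sample rows in the same shape as the data list.
sample = [
    ["2020-01-21", "1", "0"],
    ["2020-01-22", "1", "0"],
    ["2020-03-01", "32", "1"],
]

cases, deaths = split_columns(sample)
print(cases)   # [1, 1, 32]
print(deaths)  # [0, 0, 1]
```

The list comprehensions do the same work as the for loop, just in one line per list.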
Step 4: Plot Data
Now we need to use matplotlib!
Let's acquire the module with the following line:
import matplotlib.pyplot as plot
After the import, there is a glitch with replit.com, so we need to clear the terminal once again:
import os
os.system('clear')
To plot both lines on the same axes, we just use the following code:
plot.plot(cases, label="Cases")
plot.plot(deaths, label="Deaths")
plot.title("Cases and Deaths for COVID-19 in day intervals")
plot.xlabel('Days')
plot.ylabel('Units in 10 Millions')
You can change the title, but this one is logical and covers what the graph shows. Now, let's show the graph by calling plot.show()!
That was simple... :D
Hope you liked the tutorial, I spent 2 months building one of these and then built one in 10 minutes :/
What else could you add?
1. Counties using more GitHub data
2. Scrape the data
3. Better graphing
4. Make it into a webpage!