Diving Into the Deluge of Data :: Lab 9 :: Regular Expressions and Recursion

Lab 9: Regular Expressions and Recursion

This lab uses regular expressions to grab semi-structured data from a website and save it as CSV. The data is historical population data for US counties. We will use this data to create choropleth maps using Vincent and Vega. We will then create an animated GIF from the maps so that trends over time are visible. Finally, we will use recursion to create a Sierpinski Triangle.

Here is a picture of population densities over time, by decade, from 1790 to 2010:

Step 0: Lab Preparation

Step 1: Source Code

Step 2: Population Data

The Census Bureau has a pretty visualization of coastline and interior population densities from 1790 to 2010. We will make a similar visualization using an animated GIF created from a sequence of choropleth images. Below the Census Bureau visualization is a link to the data table. Your primary exercise this week will be to extract this data from the web page source using regular expressions.
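
As a rough illustration of the extraction step, here is a minimal sketch that pulls table rows out of HTML with regular expressions and writes them as CSV. The markup in SAMPLE_HTML, the column names, and the values are all hypothetical placeholders; the real page's source may be structured differently, so inspect it before adapting the patterns. The helper names (extract_rows, rows_to_csv) are also made up for this example.

```python
import csv
import io
import re

# Hypothetical markup and placeholder values; the real page may differ.
SAMPLE_HTML = """
<table>
<tr><th>FIPS</th><th>County</th><th>1790</th><th>1800</th></tr>
<tr><td>00001</td><td><a href="#">Example County</a></td><td>1,234</td><td>5,678</td></tr>
<tr><td>00003</td><td>Sample County</td><td>910</td><td>1,112</td></tr>
</table>
"""

# Non-greedy groups capture the contents of each row and each cell.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.DOTALL | re.IGNORECASE)
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def clean(cell):
    """Strip nested tags and remove thousands separators from numbers."""
    text = TAG_RE.sub("", cell).strip()
    if re.fullmatch(r"\d{1,3}(?:,\d{3})+", text):
        text = text.replace(",", "")
    return text


def extract_rows(html):
    """Return a list of rows, each a list of cleaned cell strings."""
    rows = []
    for row_html in ROW_RE.findall(html):
        cells = [clean(c) for c in CELL_RE.findall(row_html)]
        if cells:
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; codes stay strings so leading zeros survive."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Calling rows_to_csv(extract_rows(SAMPLE_HTML)) yields CSV text that can be written to a file such as data.csv. Keeping the county codes as strings rather than converting them to integers preserves any leading zeros.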

Step 3: Creating Maps

The file map.py is provided for you. Take a look at the code. We use the Vincent library along with a counties datafile us_counties.topo.json to bind county population data to map counties based on their FIPS code. Vincent outputs a Vega graph visualization in JSON format. The syntax is as follows:

  $ python3 map.py data.csv us_counties.topo.json pop 1790 2010
  

where 1790 is the start decade and 2010 is the finish decade. This will output 23 maps, one per decade, named popYYYY.json. You can use the Vega command-line tools to convert these JSON files into PNG files. Here is the syntax.

  $ vega/bin/vg2png pop.2000.json pop.2000.png
  

To convert all the files using Bash's built-in for loop, use:

  $ for file in pop*.json; do vega/bin/vg2png "$file" "${file%.*}.png"; done
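
If you would rather drive the conversion from Python, the same loop can be sketched with the standard library. This is only an illustration: the function names are made up here, and it assumes vg2png lives at vega/bin/vg2png relative to the working directory, as in the command above.

```python
import subprocess
from pathlib import Path

# Path to the Vega converter, taken from the shell command above.
VG2PNG = "vega/bin/vg2png"


def png_name(json_path):
    """Swap the final .json extension for .png (like "${file%.*}".png)."""
    return Path(json_path).with_suffix(".png")


def build_commands(directory="."):
    """Build one vg2png command per pop*.json file, in sorted order."""
    json_files = sorted(Path(directory).glob("pop*.json"))
    return [[VG2PNG, str(path), str(png_name(path))] for path in json_files]


def convert_all(directory="."):
    """Run each command in turn, stopping at the first failure."""
    for command in build_commands(directory):
        subprocess.run(command, check=True)
```

Calling convert_all() from the directory that holds the JSON files converts each one. Passing the command as a list avoids shell quoting problems with unusual file names.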
  

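To create an animated GIF called pop.gif from these PNG files, use ImageMagick's convert command. The -delay 20 option shows each frame for 20 hundredths of a second, and -loop 0 repeats the animation indefinitely.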

  $ convert -delay 20 -loop 0 pop*.png pop.gif
  

You can view your animated GIF in a web browser. On macOS, for example, use

  $ open -a safari pop.gif
  

Step 4: Recursion

Below are five Sierpinski Triangles. You should write a program called triangle.py that, when called from the command line with argument N, produces a single Sierpinski Triangle of depth N. You should also pass it an output file name and an image dimension. Here is some example syntax for creating a triangle of depth 8.

  $ python triangle.py 8 tri8.png 1000
  

Here are some implementation details:
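
As one possible way to structure the recursion (a sketch, not the required solution), the function below computes only the vertex coordinates of the filled triangles and leaves drawing and command-line handling to you. It assumes that depth 0 means a single filled triangle and that each additional level splits every triangle into three half-size copies; the lab may define depth differently. The function names are made up for this example.

```python
import math


def midpoint(p, q):
    """Return the point halfway between p and q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def sierpinski(a, b, c, depth):
    """Return the filled triangles of a Sierpinski Triangle as vertex triples.

    Assumption: depth 0 is one filled triangle. Each recursive step keeps
    the three corner triangles and leaves the middle one empty.
    """
    if depth == 0:
        return [(a, b, c)]
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return (sierpinski(a, ab, ca, depth - 1)
            + sierpinski(ab, b, bc, depth - 1)
            + sierpinski(ca, bc, c, depth - 1))


def outer_triangle(size):
    """Vertices of an equilateral triangle that fits a size x size image.

    Uses image coordinates, where y grows downward.
    """
    height = size * math.sqrt(3) / 2
    return (0.0, height), (float(size), height), (size / 2, 0.0)
```

Under these assumptions, depth N produces 3**N filled triangles, so sierpinski(*outer_triangle(1000), 8) returns 6561 vertex triples, each of which can be handed to whatever polygon-drawing routine you choose.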

Step 5: Submission