Diving Into the Deluge of Data :: Lab 4 :: Stock Viz

Lab 4: Stock Viz

This lab explores visualizing stock data by market cap and percentage change over short spans of time. The data comes from Yahoo Finance and is downloaded in CSV format. We will use a dynamic programming algorithm by [BBRR] to partition a square into p rectangles so that the sum of the perimeters of the rectangles is minimized when those rectangles appear in contiguous columns.


Beautiful visualizations of stock market data are available around the web. One such visualization (available at Market Watch) shows stocks as rectangles where the size of the rectangle is proportional to the market cap and the color of the rectangle is proportional to its performance.

This visualization is based on a tree. It produces a nice tiling, but stocks with similar market caps may end up far apart. A related visualization that groups stocks with close market caps together might convey information more efficiently.

Step 0: Lab Preparation

Step 1: Source Code

Step 2: Grabbing Data

The file fetch.py contains skeleton code to download a CSV file containing stock symbols, market capitalizations, and price change percentages over a 50-day moving average. Here is how it works.

Step 3: Scrubbing Data

      def stock_info_from(file):
          """
          Takes a CSV file of the form

          STOCK_SYMBOL, MARKET_CAP, PERCENT_CHANGE_50_DAYS

          where

          STOCK_SYMBOL is a string
          MARKET_CAP is a string of the form "XX.XXXB" where B = BILLION
          PERCENT_CHANGE_50_DAYS is a string of the form "[+,-]XXX.X%"

          and returns a list of 3-tuples of the form

          (STOCK_SYMBOL, X, Y)

          where

          X is an integer (the actual dollar amount, so "5.658B" is 5658000000) and
          Y is a float where -20.5%  is -0.205

          sorted by market cap lowest-to-highest"""

Make sure to test your function out from the Python REPL. Your data should look similar to the following.

  >>> import stocks
  >>> stocks.stock_info_from("data.csv")
  [('LVNTA', 5658000000, 0.0516), ..., ('AAPL', 752200000000, 0.0784)]
  >>>
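The string conversions described in the docstring can be sketched as follows. This is only one possible approach, not the lab's reference solution: the helper names parse_market_cap, parse_percent, and parse_row are hypothetical, and the sketch assumes every market cap ends in "B" as the docstring states. Real downloads may contain other suffixes or missing values, which are not handled here.

```python
from decimal import Decimal


def parse_market_cap(text):
    """Convert a string such as "5.658B" into an integer dollar amount."""
    text = text.strip()
    if not text.endswith("B"):
        raise ValueError("expected a value in billions, got %r" % text)
    # Decimal avoids the rounding error of float("5.658") * 1e9.
    return int(Decimal(text[:-1]) * 1_000_000_000)


def parse_percent(text):
    """Convert a string such as "-20.5%" into the fraction -0.205."""
    return float(text.strip().rstrip("%")) / 100


def parse_row(row):
    """Turn one CSV row of three strings into a (symbol, cap, change) tuple."""
    symbol, cap, change = row
    return (symbol.strip(), parse_market_cap(cap), parse_percent(change))


# Hypothetical rows, already split into fields (for example by csv.reader).
rows = [["AAPL", "752.2B", "+7.84%"], ["LVNTA", "5.658B", "+5.16%"]]
info = sorted((parse_row(r) for r in rows), key=lambda t: t[1])
```

Sorting on the second tuple element gives the lowest-to-highest market cap order that the docstring asks for.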
  

Step 4: Making Rectangles

Recall from Lecture 11 that given a list of areas A = [A_0, ..., A_{n-1}] that sum to 1, we can produce, through dynamic programming, a partition P = [p_0, p_1, ..., p_q], with p_0 = 0 and p_q = n, of A into q columns of rectangles that tiles the unit square and minimizes the sum of the rectangle perimeters. Each consecutive pair of numbers in P should be viewed as slicing A into a column of rectangles where the column width is equal to the sum of the areas in that column. That is, for any consecutive p_i, p_{i+1} in P, the areas A_{p_i}, ..., A_{p_{i+1}-1} all appear in column i+1. The width of that column is A_{p_i} + ... + A_{p_{i+1}-1}.

This functionality is available in the table module through the function min_partition.
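For intuition, here is a minimal sketch of how such a dynamic program might look. It is not the code in the table module, and the name min_partition_sketch is hypothetical. It assumes each column spans the full height of the unit square and that the areas sum to 1. Under those assumptions a column of k rectangles with width w has total perimeter 2(kw + 1), because each rectangle has width w and the heights in the column add up to 1.

```python
def min_partition_sketch(areas):
    """Return column boundaries [0, ..., n] minimizing total perimeter."""
    n = len(areas)
    # prefix[j] is the sum of areas[0:j], so a column's width is a difference.
    prefix = [0.0]
    for a in areas:
        prefix.append(prefix[-1] + a)

    # best[j] is the cheapest cost of tiling with areas[0:j];
    # back[j] remembers where the last column of that tiling starts.
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            width = prefix[j] - prefix[i]
            # A column of k rectangles with width w and height 1 has
            # total perimeter 2 * (k * w + 1); the factor 2 is dropped.
            cost = best[i] + (j - i) * width + 1
            if cost < best[j]:
                best[j] = cost
                back[j] = i

    # Walk the back-pointers from n to 0 to recover the boundaries.
    part = [n]
    while part[-1] > 0:
        part.append(back[part[-1]])
    return part[::-1]
```

The double loop considers every possible start i for the last column ending at j, so the sketch runs in O(n^2) time.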

Here is an example. Suppose that I have a list of 8 areas A, which yields a partition part = [0, 4, 6, 8] when using the table.min_partition function.

    >>> A = [0.02,0.04,0.06,0.08,0.2,0.2,0.2,0.2]
    >>> part = table.min_partition(A)
    >>> part
    [0, 4, 6, 8]
    

This means that the first column contains rectangles with areas 0.02, 0.04, 0.06, and 0.08, the second column contains rectangles with areas 0.2 and 0.2, and the third column also contains rectangles with areas 0.2 and 0.2. The first column has width 0.02+0.04+0.06+0.08=0.2 while the remaining columns each have width 0.2+0.2=0.4. The first rectangle in the first column has top-left and bottom-right coordinates (0, 0) and (0.2, 0.1), respectively, because 0.02 is 1/10 of the total area of the four rectangles in the first column, so it takes up 1/10 of the column's height.

Here's a picture of the tiling of the unit square.

Implement the function partition_to_rects.

    def partition_to_rects(part, areas):
        """
        return a list of rectangles, one per area, that partition
        the unit square in correspondence to 'part'

        :param part: a partition of areas of the form [0, p1, p2, ..., N]
        :param areas: a list of N areas where the sum of the areas is 1.0"""

You can test your code with the following:

    >>> partition_to_rects(part, [0.02,0.04,0.06,0.08,0.2,0.2,0.2,0.2])
    [((0, 0), (0.2, 0.09999999999999999)),
     ((0, 0.09999999999999999), (0.2, 0.3)),
     ((0, 0.3), (0.2, 0.6)),
     ((0, 0.6), (0.2, 1.0)),
     ((0.2, 0), (0.6000000000000001, 0.5)),
     ((0.2, 0.5), (0.6000000000000001, 1.0)),
     ((0.6000000000000001, 0), (1.0, 0.5)),
     ((0.6000000000000001, 0.5), (1.0, 1.0))]
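One way to arrive at output of this shape is sketched below. It is an illustration rather than the expected solution, and the name partition_to_rects_sketch is hypothetical. It relies on the fact that a column's width equals the sum of its areas, so each rectangle's height is its area divided by that width. The exact floating-point digits may differ from the listing above depending on the order of the additions.

```python
def partition_to_rects_sketch(part, areas):
    """Return ((x0, y0), (x1, y1)) rectangles, one per area, column by column."""
    rects = []
    x0 = 0.0
    # Each consecutive pair (lo, hi) in part selects one column of areas.
    for lo, hi in zip(part, part[1:]):
        column = areas[lo:hi]
        width = sum(column)  # column height is 1, so width equals its area
        y0 = 0.0
        for area in column:
            height = area / width
            rects.append(((x0, y0), (x0 + width, y0 + height)))
            y0 += height  # the next rectangle starts where this one ends
        x0 += width  # the next column starts at this column's right edge
    return rects


A = [0.02, 0.04, 0.06, 0.08, 0.2, 0.2, 0.2, 0.2]
rects = partition_to_rects_sketch([0, 4, 6, 8], A)
```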
    

In general, consider the following strategy when implementing your function:

Step 5: Visualizing Rectangles

The draw_rects function takes a square image, a list of N rectangles that collectively tile the unit square, and a list of N colors and draws a projection of each rectangle, filled with the appropriate color, onto the image.

Some notes:

    def draw_rects(im, rects, colors):
        """
        Map and draw rectangles from the unit square onto the image

        :param im: an Image
        :param rects: a list of N rectangles where a rectangle is a pair of points
        :param colors:  a list of N colors corresponding to the N rectangles"""
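The projection from the unit square onto the image is just a scaling by the image's side length. The helper below is a hypothetical sketch of that step only. The name to_pixel_box is not part of the lab code, and the actual drawing call depends on the imaging library you use, including whether it treats the bottom-right corner as inclusive.

```python
def to_pixel_box(rect, size):
    """Map a unit-square rectangle onto a size x size pixel grid."""
    (x0, y0), (x1, y1) = rect
    # Rounding both corners the same way makes neighbouring rectangles
    # share an edge instead of leaving gaps or overlapping by a pixel.
    return (round(x0 * size), round(y0 * size),
            round(x1 * size), round(y1 * size))


# The first rectangle from the Step 4 example, projected onto 1024 pixels.
box = to_pixel_box(((0, 0), (0.2, 0.09999999999999999)), 1024)
```

The result is a 4-tuple (left, top, right, bottom) in pixel coordinates.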
        

Step 6: Putting it all Together

The function draw should perform the following:

 

To run your code from the command line use

      $ python3 stocks.py data.csv stocks.png 1024 1024
      
Your visualization should look like this.

Step 7: Submission