Many Eyes put the county map option back, after a brief disappearance from Monday afternoon through Tuesday morning.
In this project we will do a real data analysis and visualization, producing the map in the picture above. The map shows the number of home foreclosures in the third quarter of 2008, per 1000 people, in some of California's counties. We can see that the problem is much worse in some places than in others.
The data comes from two Web sites. Data Quick sells data to various businesses in the real estate market. They put up this list comparing the number of foreclosures in some California counties this year and last year. Their table tells us where the problem is getting worse (almost everywhere, unfortunately), but not which counties are hardest hit. For instance, there were 3,482 foreclosures in Alameda County. But there are a heck of a lot of people in Alameda County. Is this a lot, or a little, considering how many people there are?
To get a better sense of how different counties are affected, we'll also look at the populations of the counties. For this, we of course go to the US Census. This table tells us the total population in every county, and also gives us a lot of other information we don't really need.
Finally, we'll actually make the map using IBM's Many Eyes visualization service. This cool Web site lets us upload a table of data, and then choose different ways of visualizing it. We will use the 'US County Map' visualization.
I've 'scraped' the data off the Web pages, and put it into two files. The foreclosure data is in foreclosures.txt, and the Census data is in census.txt. You do not need to go to the Web pages to get the data, but you should look at them to see what the data means. You do not need to hand in these two files; your TA will get them from here.
Your program should read the data from the two files, and write out a third file, rate.txt, which you should upload to Many Eyes to make the visualization.
If you work with a partner, you should get together, sit down, and work at the same computer. That way you'll both learn the things you'll need to know to pass the tests. If you let your partner do all the work, you will end up failing the midterm (and, probably, your partner will be very annoyed).
The basic structure of the program is the two-loop "build a data structure and do something with it" structure we have been considering in the last two lectures. In this case, we have foreclosure information for some, but not all, counties in California. For each of these counties, we need population information. So first, we build a data structure (a dictionary) that can provide population information for a county, using data that we read from census.txt. Then, we read foreclosures.txt, look up the population for each county in our dictionary, do the calculation, and write an output file.
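The two-loop structure described above can be sketched roughly as follows. This is only an illustration, not the required solution: the function names are made up, and it assumes each line of both files is tab-separated with the county name first and the number second, which you should check against the real census.txt and foreclosures.txt. To stay self-contained it works on lists of lines rather than opening the files.

```python
def build_population_table(census_lines):
    """First loop: build a dictionary mapping county name -> population."""
    populations = {}
    for line in census_lines:
        # ASSUMPTION: name<TAB>population<TAB>other columns we ignore.
        fields = line.rstrip("\n").split("\t")
        name, pop_text = fields[0], fields[1]
        # Population figures contain commas, e.g. '78,930'.
        populations[name] = float(pop_text.replace(",", ""))
    return populations


def foreclosure_rates(foreclosure_lines, populations):
    """Second loop: look up each county and compute foreclosures per 1000 people."""
    rows = []
    for line in foreclosure_lines:
        # ASSUMPTION: name<TAB>number of foreclosures.
        fields = line.rstrip("\n").split("\t")
        name = fields[0]
        foreclosures = float(fields[1].replace(",", ""))
        # The dictionary keys end in ' County'; the foreclosure names do not.
        population = populations[name + " County"]
        rate = foreclosures / population * 1000
        rows.append((name, foreclosures, population, rate))
    return rows


# Stand-in data for two counties, using numbers from the sample output.
census_lines = ["Colusa County\t18,804\n", "Sutter County\t78,930\n"]
foreclosure_lines = ["Colusa\t68\n", "Sutter\t269\n"]

populations = build_population_table(census_lines)
rows = foreclosure_rates(foreclosure_lines, populations)
for name, foreclosures, population, rate in rows:
    print(name, foreclosures, population, rate)
```

In the real program the two lists would be replaced by the lines read from the two input files, and the loop at the end would write to rate.txt instead of printing.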
{'Humboldt County': '126,518', 'Sonoma County': '458,614', 'San Luis
Obispo County': '246,681', 'Glenn County': '26,453', 'Nevada County':
'92,033', 'Yolo County': '168,660', 'Sacramento County': '1,223,499',
...
'124,279', 'Mendocino County': '86,265', 'Plumas County': '20,824'}
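Notice that the populations in the dictionary above are strings containing commas, so they cannot be used in arithmetic directly. One possible way to turn them into numbers is sketched here; the helper name parse_count is made up for this example, and the three entries are copied from the dictionary shown above.

```python
def parse_count(text):
    """Convert a string like '1,223,499' into the float 1223499.0."""
    # float('1,223,499') would raise ValueError, so remove the commas first.
    return float(text.replace(",", ""))


# A few entries copied from the dictionary printed above.
sample = {
    "Humboldt County": "126,518",
    "Glenn County": "26,453",
    "Sacramento County": "1,223,499",
}

# Build a second dictionary with the same keys but numeric values.
numeric = {}
for name in sample:
    numeric[name] = parse_count(sample[name])

print(numeric["Sacramento County"])
```

You could do the conversion either when you build the dictionary or later, when you look a county up; doing it once while building the dictionary keeps the second loop simpler.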
Los Angeles 17073.0
Orange 5692.0
San Diego 7062.0
...
Colusa 68.0
Sutter 269.0
Hand in this program on Nov. 5.
Los Angeles 17073.0 9519338.0 1.7935070695
Orange 5692.0 2846289.0 1.99979692856
San Diego 7062.0 2813833.0 2.5097438263
...
Yuba 307.0 60219.0 5.09805875222
Colusa 68.0 18804.0 3.61625186131
Sutter 269.0 78930.0 3.40808311162
You do this by looking up the name of the county in the dictionary to get the population, and then doing the calculation.
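As a check on the calculation, the sketch below recomputes the rate for three rows of the sample output above and compares it with the printed value. The function names are invented for this example, and the small dictionary is built from the sample rows themselves, standing in for the one your program builds from census.txt.

```python
# (county, foreclosures, population, rate) copied from the sample output.
SAMPLE_ROWS = [
    ("Los Angeles", 17073.0, 9519338.0, 1.7935070695),
    ("Orange", 5692.0, 2846289.0, 1.99979692856),
    ("San Diego", 7062.0, 2813833.0, 2.5097438263),
]


def rate_per_thousand(foreclosures, population):
    """Foreclosures per 1000 people."""
    return foreclosures / population * 1000


def lookup_rate(county, foreclosures, populations):
    """Look up the county's population, then do the calculation."""
    # Dictionary keys end in ' County'; names in the foreclosure data do not.
    population = populations[county + " County"]
    return rate_per_thousand(foreclosures, population)


# Stand-in for the population dictionary (values already converted to floats).
populations = {}
for county, foreclosures, population, expected in SAMPLE_ROWS:
    populations[county + " County"] = population

all_match = True
for county, foreclosures, population, expected in SAMPLE_ROWS:
    rate = lookup_rate(county, foreclosures, populations)
    if abs(rate - expected) > 1e-8:
        all_match = False
    print(county, foreclosures, population, rate)
```

If your own numbers come out 1000 times too small, you probably forgot to multiply by 1000 to get the rate per 1000 people.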
Hand in just the program on myUCDavis. We do not need the input files, or rate.txt. Your TA will run the program and verify that it produces rate.txt properly, and he will follow the link in your comment to see the visualization.