Drinking Water Filtered by Population

At the consulting company that hires you with your double major in Spanish and Economics, you are helping to develop a report for a client on how to improve access to safe drinking water in developing countries. To get started, you decide to figure out which countries are doing a good job of improving access to safe drinking water.

Fortunately, you find this file drinkingWater.csv. (Mac users, try opening the file with open("drinkingWater.csv","r",encoding="latin-1"); apparently the character set the Mac assumes by default is even more restricted than the one on Windows. This is related to Friday's lecture.) It's part of the dataset downloaded from the bottom of this page from the Guardian, a newspaper in the UK that publishes a lot of data files. It contains the percentage of people in each country who had access to an "improved water source" in the years 1990-2010. They say: "Access to an improved water source refers to the percentage of the population with reasonable access to an adequate amount of water from an improved source, such as a household connection, public standpipe, borehole, protected well or spring, and rainwater collection. Unimproved sources include vendors, tanker trucks, and unprotected wells and springs. Reasonable access is defined as the availability of at least 20 liters a person a day from a source within one kilometer of the dwelling."
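If you are curious why the encoding argument matters, here is a small sketch. The bytes below are made up for illustration and are not taken from drinkingWater.csv, the helper name tryDecode is hypothetical, and the sketch assumes the default that causes trouble is UTF-8. A name with an accented letter saved as Latin-1 decodes fine as Latin-1, but the same bytes are not valid UTF-8.

```python
# Illustrative bytes only (not from drinkingWater.csv): a country name whose
# second letter is an o-circumflex, stored as the single Latin-1 byte 0xF4.
raw = b"C\xf4te d'Ivoire"


def tryDecode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Latin-1 maps every byte to a character, so this always succeeds.
asLatin1 = tryDecode(raw, "latin-1")

# In UTF-8, byte 0xF4 must be followed by continuation bytes, not by "t",
# so decoding fails and the helper returns None.
asUtf8 = tryDecode(raw, "utf-8")
```

Passing encoding="latin-1" to open() makes Python apply the first kind of decoding to every line it reads.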

But some of these countries are pretty small, and they are probably not relevant to the client who wants the report. So you decide to consider only countries that have a population greater than 500,000 (Davis is already 50,000, and San Jose is a million; so we are talking about excluding really small countries).

To get the populations, we'll use this population.tsv file from the CIA World Factbook.

To get you started, here is a file drinkingWater.py with just the main function, and stubs for the two functions that do the real work. Your code should all go into the two functions makeDictionary and readDWdata.

The makeDictionary function should read in population.tsv, and build a dictionary in which the keys are countries and the values are populations.
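Here is a minimal sketch of one way makeDictionary could look. It rests on assumptions you should check against the real file: that the country name is in the first tab-separated column and the population in the second, that populations may contain commas, and that the function takes the file name as a parameter (the stub in drinkingWater.py may be declared differently). The name_col and pop_col parameters are hypothetical, added only so the column guesses are easy to change.

```python
import csv


def makeDictionary(filename="population.tsv", name_col=0, pop_col=1):
    """Return a dictionary mapping country name to population.

    ASSUMPTION: the column positions are guesses; look at the real file
    and adjust name_col and pop_col to match it.
    """
    populations = {}
    # latin-1 never fails to decode; whether it is the right encoding for
    # this file is also an assumption.
    with open(filename, "r", encoding="latin-1", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) <= max(name_col, pop_col):
                continue  # blank or short line
            digits = row[pop_col].replace(",", "").strip()
            if not digits.isdigit():
                continue  # header line, or no usable population
            populations[row[name_col].strip()] = int(digits)
    return populations
```

Skipping any row whose population cell is not all digits handles a header line without needing to know how many header lines the file has.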

The readDWdata function should read in the file drinkingWater.csv, and print out the difference for any large enough country where the percentage of people with access to safe water changed between 1990 and 2010.

Specifically, for every line of the file, first check to see if it belongs to a country with a population over 500,000, using the dictionary. You do not have to worry about country names that don't match exactly, such as United States of America vs United States; it is fine if your program mistakenly treats these as different countries. Then, see if it has a percentage for 1990 and a percentage for 2010; if not, we can ignore it. Finally, if we do get both, compute how access changed by subtracting the percentage in 1990 from the percentage in 2010. If the difference is not zero, print out the name of the country, the two percentages, and the difference.
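The steps above can be sketched as follows. This is only one possible shape, under assumptions you need to verify: that drinkingWater.csv has a header row with columns labeled exactly "1990" and "2010", that the country name is in the first column, and that a missing percentage shows up as a blank or non-numeric cell. The helpers parsePercent and findChanges and the constant MIN_POPULATION are hypothetical names, and the stub for readDWdata may take different parameters.

```python
import csv

MIN_POPULATION = 500000  # only countries with more people than this count


def parsePercent(text):
    """Return the cell as a float, or None if it is blank or not a number."""
    try:
        return float(text.strip().rstrip("%"))
    except ValueError:
        return None


def findChanges(rows, populations):
    """Yield (country, pct1990, pct2010, difference) for rows that qualify.

    ASSUMPTION: the first row is a header containing "1990" and "2010",
    and the country name is in column 0.
    """
    rows = iter(rows)
    header = [cell.strip() for cell in next(rows)]
    col1990 = header.index("1990")
    col2010 = header.index("2010")
    for row in rows:
        if len(row) <= max(col1990, col2010):
            continue  # blank or short line
        country = row[0].strip()
        # Names missing from the dictionary are treated as too small.
        if populations.get(country, 0) <= MIN_POPULATION:
            continue
        old = parsePercent(row[col1990])
        new = parsePercent(row[col2010])
        if old is None or new is None:
            continue  # need both years
        difference = new - old
        if difference != 0:
            yield country, old, new, difference


def readDWdata(populations, filename="drinkingWater.csv"):
    """Print each qualifying country, its two percentages, and the change."""
    with open(filename, "r", encoding="latin-1", newline="") as f:
        for country, old, new, difference in findChanges(csv.reader(f), populations):
            print(country, old, new, difference)
```

Keeping the filtering in findChanges, separate from the printing in readDWdata, lets you try the logic on a short hand-made list of rows before running it on the whole file.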

You must write this program as a collection of functions. If you want to add more functions than the two in the stub file, feel free.