This is day 3 of my countdown to Open Data Day. That sneaky Open Data Day is slithering up on us like a greased snake on an ice rink. So far we’ve looked at data from the provincial government and the city. That leaves us just one level of government: the federal government. I actually found that the feds had the best collection of data. Their site, data.gc.ca, has a huge number of data sets. What’s more, the government has announced that it will be adopting the same open government platform that has been developed jointly by India and the US: http://www.opengovtplatform.org/.

One of the really interesting things you can do with open data is to merge multiple data sets from different sources and pull out conclusions nobody has ever looked at before. That’s what I’ll attempt to do here.

Data.gc.ca has an amazing number of data sets available, so if you’re like me and you’re just browsing for something fun to play with, then you’re in for a bit of a challenge. I eventually found a couple of data sets related to farming in Canada which looked like they could be fun. The first was a set of data about farm incomes and net worths between 2001 and 2010. The second was a collection of data about yields of various crops in the same time frame.

I started off in Excel, summarizing and linking these data sets. I was interested to see if there was a correlation between high grain yields per hectare and an increase in farm revenue. This would be a reasonable assumption, as getting more grain per hectare should allow you to sell more and earn more money. Using the power of Excel I merged and cut up the data sets to get this table:

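| Year | Farm Revenue | Yield per Hectare | Production in Tonnes |
|------|--------------|-------------------|----------------------|
| 2001 | 183267 | 2200 | 5864900 |
| 2002 | 211191 | 1900 | 3522400 |
| 2003 | 194331 | 2600 | 6429600 |
| 2004 | 238055 | 3100 | 7571400 |
| 2005 | 218350 | 3200 | 8371400 |
| 2006 | 262838 | 2900 | 7503400 |
| 2007 | 300918 | 2600 | 6076100 |
| 2008 | 381597 | 3200 | 8736200 |
| 2009 | 381250 | 2800 | 7440700 |
| 2010 | 356636 | 3200 | 8201300 |
| 2011 | 480056 | 3300 | 8839600 |

Looking at this it isn’t apparent if there is a link. We need a graph!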
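
If you would rather skip Excel, the merge step could be sketched in JavaScript along these lines. This is only an illustration, not what I actually ran: the field names (`revenue`, `yieldPerHectare`, `productionTonnes`) and the `mergeByYear` helper are made up for the example, and the real downloads from data.gc.ca will need their own parsing first. The sample rows are the first three years of the table.

```javascript
// Hypothetical shapes for the two data sets. The real files from
// data.gc.ca use different column names and need parsing first.
const farmRevenue = [
  { year: 2001, revenue: 183267 },
  { year: 2002, revenue: 211191 },
  { year: 2003, revenue: 194331 },
];

const grainYields = [
  { year: 2001, yieldPerHectare: 2200, productionTonnes: 5864900 },
  { year: 2002, yieldPerHectare: 1900, productionTonnes: 3522400 },
  { year: 2003, yieldPerHectare: 2600, productionTonnes: 6429600 },
];

// Join the two arrays on year, keeping only years present in both.
function mergeByYear(revenues, yields) {
  const yieldsByYear = new Map(yields.map((row) => [row.year, row]));
  return revenues
    .filter((row) => yieldsByYear.has(row.year))
    .map((row) => ({ ...row, ...yieldsByYear.get(row.year) }));
}

const merged = mergeByYear(farmRevenue, grainYields);
console.log(merged);
```

Indexing one data set in a `Map` keyed by year keeps the join to a single pass over each array, and years missing from either source simply drop out of the result.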

I threw it up against d3.js and produced some code which was very similar to my previous bar chart example in HTML 5 Data Visualizations – Part 5 – D3.js.

Grain yields in blue, farm revenues in orange

I didn’t bother with any scales because it is immediately apparent from the chart that there does not seem to be any correlation. Huh. I would have thought the opposite.
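
Eyeballing a bar chart is a rough test, so one way to double-check the impression would be to put a number on it. The sketch below computes a Pearson correlation coefficient between farm revenue and yield per hectare using the figures from the table. Pearson's r is my choice for illustration here, not something the demo code calculates, and with only eleven data points any result should be read with caution.

```javascript
// Values copied from the table, 2001 through 2011.
const revenue = [
  183267, 211191, 194331, 238055, 218350, 262838,
  300918, 381597, 381250, 356636, 480056,
];
const yieldPerHectare = [
  2200, 1900, 2600, 3100, 3200, 2900,
  2600, 3200, 2800, 3200, 3300,
];

// Pearson correlation coefficient for two equal-length arrays.
// Returns a value between -1 and 1; values near 0 suggest no linear link.
function pearson(xs, ys) {
  const n = xs.length;
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / n;
  const meanX = mean(xs);
  const meanY = mean(ys);

  let covariance = 0;
  let varianceX = 0;
  let varianceY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    covariance += dx * dy;
    varianceX += dx * dx;
    varianceY += dy * dy;
  }
  return covariance / Math.sqrt(varianceX * varianceY);
}

const r = pearson(revenue, yieldPerHectare);
console.log(r.toFixed(2));
```

Both series also climb over the decade, so even a sizeable coefficient could reflect a shared trend over time rather than yields driving revenue.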

You can see a live demo and the code over at http://bl.ocks.org/stimms/5008627