data visualization

Open Data: Countdown to Open Data Day 2

Open Data Day draws ever closer, like Alpha Centauri would if we lived in a closed universe during its contraction phase. Today we will be looking at some of the data the City of Calgary produces, in particular the geographic data about the city.

I should pause here and say that I really don’t know what I’m doing with geographic data. I am not a GIS developer, so there are very likely better ways to process this data and awesome refinements that I don’t know about. I can say that the process I followed here does work, so that’s a good chunk of the battle.

A common topic of conversation in Calgary is “Where do you live?”. The answer is typically the name of a community, to which I nod knowingly even though I have no idea which community is which. One of the data sets from the city is a map of the city divided into community boundaries. I wanted a quick way to look up where communities are. To start I downloaded the shape files, which came as a zip. Unzipping it got me:

  • CALGIS.ADM_COMMUNITY_DISTRICT.dbf
  • CALGIS.ADM_COMMUNITY_DISTRICT.prj
  • CALGIS.ADM_COMMUNITY_DISTRICT.shp
  • CALGIS.ADM_COMMUNITY_DISTRICT.shx

It is my understanding that these are ESRI files. I was most interested in the shp file because I read that it could be transformed into a format known as GeoJSON, which can be read by D3.js. To do this I followed the instructions on Jim Vallandingham’s site. I used a tool called ogr2ogr:

ogr2ogr -f geoJSON output.json CALGIS.ADM_COMMUNITY_DISTRICT.shp

However, this didn’t work properly and, when put into the web page, produced a giant mess which looked a lot like this:

Random Mess

I know a lot of people don’t like the layout of roads in Calgary but this seemed ridiculous.

I eventually found out that the shp file I had was in a different coordinate system from what D3.js was expecting. I should really go into more detail about that but, not being a GIS guy, I don’t understand it very well. Fortunately some nice people on Stack Overflow came to my rescue and suggested that I instead use:

ogr2ogr -f geoJSON output.json -t_srs "WGS84" CALGIS.ADM_COMMUNITY_DISTRICT.shp

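This instructs ogr2ogr to reproject the output into World Geodetic System 1984 (WGS84), which uses plain longitude and latitude. The -t_srs flag sets the target coordinate system; the source coordinate system is read from the .prj file that sits next to the shp file.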
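A quick way to tell whether a GeoJSON file is still in a projected coordinate system is to check that every coordinate fits inside the valid longitude and latitude ranges. The following is only a rough sketch: the function names and both sample features are made up for illustration, and it assumes every feature has a geometry with a coordinates array (so no null geometries and no GeometryCollection).

```javascript
// Visit every [x, y] position inside a nested GeoJSON coordinates array.
function eachPosition(coords, visit) {
  if (typeof coords[0] === "number") {
    visit(coords);
  } else {
    coords.forEach(function (inner) { eachPosition(inner, visit); });
  }
}

// Returns true when all positions fit inside the longitude/latitude ranges.
// Projected data (metres or feet) usually blows well past these limits.
function looksLikeLonLat(featureCollection) {
  let ok = true;
  featureCollection.features.forEach(function (feature) {
    eachPosition(feature.geometry.coordinates, function (pos) {
      if (Math.abs(pos[0]) > 180 || Math.abs(pos[1]) > 90) {
        ok = false;
      }
    });
  });
  return ok;
}

// Wrap one polygon ring in a FeatureCollection (hypothetical helper).
function sampleCollection(ring) {
  return {
    type: "FeatureCollection",
    features: [{
      type: "Feature",
      properties: {},
      geometry: { type: "Polygon", coordinates: [ring] }
    }]
  };
}

// Made-up coordinates: one ring in projected units, one in degrees.
const projectedSample = sampleCollection(
  [[-5000, 5650000], [-4000, 5650000], [-4000, 5651000], [-5000, 5650000]]);
const lonLatSample = sampleCollection(
  [[-114.1, 51.0], [-114.0, 51.0], [-114.0, 51.1], [-114.1, 51.0]]);

console.log(looksLikeLonLat(projectedSample)); // false
console.log(looksLikeLonLat(lonLatSample));    // true
```

A file that fails this check is a good candidate for the -t_srs treatment above.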

Again leaning on work by Jim Vallandingham I used d3.js to build the map in an SVG.

The most confusing line in there is the section with scaling, rotating and translating the map. If these values seem random it is because they are. I spent at least an hour twiddling with them to get them more or less correct. If you look at the final product you’ll notice it isn’t quite straight. I don’t care. Everything else is fairly easy to understand and should look a lot like the D3.js we’ve done before.
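An alternative to twiddling the numbers by hand is to derive the scale and offsets from the bounding box of the data. The sketch below is not the code from the map and does not use any D3.js projection; it is a plain linear fit with a cosine correction for longitude, and the function names, the corner coordinates and the 400 by 400 drawing area are all assumptions made for illustration.

```javascript
// Find the bounding box of a list of [lon, lat] positions.
function boundsOf(positions) {
  const b = { minX: Infinity, minY: Infinity, maxX: -Infinity, maxY: -Infinity };
  positions.forEach(function (p) {
    b.minX = Math.min(b.minX, p[0]);
    b.maxX = Math.max(b.maxX, p[0]);
    b.minY = Math.min(b.minY, p[1]);
    b.maxY = Math.max(b.maxY, p[1]);
  });
  return b;
}

// Build a function mapping [lon, lat] to [x, y] pixels inside width x height.
function makeFit(bounds, width, height) {
  // A degree of longitude is shorter than a degree of latitude away from the
  // equator, so shrink x by cos(latitude) to avoid stretching the map.
  const midLat = (bounds.minY + bounds.maxY) / 2;
  const kx = Math.cos(midLat * Math.PI / 180);
  const dataWidth = (bounds.maxX - bounds.minX) * kx;
  const dataHeight = bounds.maxY - bounds.minY;
  // Use the smaller ratio so the whole shape fits in both directions.
  const scale = Math.min(width / dataWidth, height / dataHeight);
  return function (pos) {
    return [
      (pos[0] - bounds.minX) * kx * scale,
      (bounds.maxY - pos[1]) * scale // SVG y grows downward
    ];
  };
}

// Made-up corners roughly around Calgary, for illustration only.
const corners = [[-114.3, 50.8], [-113.8, 51.2]];
const project = makeFit(boundsOf(corners), 400, 400);

console.log(project([-114.3, 51.2])); // north-west corner lands at the origin
console.log(project([-113.8, 50.8])); // south-east corner stays inside the box
```

This ignores rotation entirely and is only reasonable for an area as small as a city, but it removes the guesswork from the scale and translate values.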

Coupled with a little bit of jQuery for selecting matching elements we can build this very simple map. It will take some time to load as the GeoJSON is 3 MB in size. This can probably be reduced by simplifying the shapes and reducing the number of properties in the JSON. I also think this JSON is probably very compressible, so serving it with gzip compression should be more efficient.
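Trimming the properties and rounding the coordinates could look something like the sketch below. The property names (NAME, CLASS, SECTOR) and the sample feature are hypothetical, not taken from the city's file, and five decimal places is an arbitrary choice that works out to roughly a metre of precision.

```javascript
// Round every number in a nested coordinates array to a fixed precision.
function roundCoords(coords, decimals) {
  if (typeof coords === "number") {
    const factor = Math.pow(10, decimals);
    return Math.round(coords * factor) / factor;
  }
  return coords.map(function (c) { return roundCoords(c, decimals); });
}

// Copy a FeatureCollection, keeping only the listed properties and
// rounding every coordinate. The input is not modified.
function slimGeoJson(featureCollection, keepProps, decimals) {
  return {
    type: "FeatureCollection",
    features: featureCollection.features.map(function (feature) {
      const props = {};
      keepProps.forEach(function (key) {
        if (feature.properties && key in feature.properties) {
          props[key] = feature.properties[key];
        }
      });
      return {
        type: "Feature",
        properties: props,
        geometry: {
          type: feature.geometry.type,
          coordinates: roundCoords(feature.geometry.coordinates, decimals)
        }
      };
    })
  };
}

// Made-up feature with hypothetical property names, for illustration only.
const raw = {
  type: "FeatureCollection",
  features: [{
    type: "Feature",
    properties: { NAME: "EXAMPLE COMMUNITY", CLASS: "Residential", SECTOR: "NORTH" },
    geometry: { type: "Point", coordinates: [-114.0712345678, 51.0453987654] }
  }]
};

const slim = slimGeoJson(raw, ["NAME"], 5);
console.log(JSON.stringify(slim).length < JSON.stringify(raw).length); // true
```

Simplifying the shapes themselves is a separate job that this sketch does not attempt.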

The full code is available on GitHub at https://github.com/stimms/VectorMapOfCalgary

 

Open Data: Countdown to Open Data Day

Only a few more days to go before Open Data Day is upon us. For each of the next few days I’m going to look at a set of data from one of the levels of government and try to get some utility out of it. Some of the data sets which come out of the government have no conceivable use to me. But that’s the glory of open data: somebody else will find these data sets to be more useful than a turbo-charged bread slicer.

Today I’m looking at some of the data the provincial government is making available through The Office of Statistics and Information. This office seems to be about the equivalent of StatsCan for Alberta. They have published a large number of data sets, which have been divided into categories of interest such as “Science and Technology”, “Agriculture” and “Construction”. Drilling into the data sets typically gets you a graph of the data and the data used to generate the graph. For instance looking into Alberta Health statistics about infant mortality gets you to this page.

The Office of Statistics and Information, which I’ll call OSI for the sake of my fingers, seems to have totally missed the point of OpenData. They have presented data as well as interpretation of the data as OpenData. This is a cardinal sin, in my mind. OpenData is not about giving people data you’ve already massaged in a CSV file. It is about giving people the raw, source data so that they can draw their own conclusions. Basically give people the tools to think by themselves, don’t do the thinking for them.

The source data they give doesn’t provide any advantage over the graph; in fact it is probably worse. What should have been given here is an anonymized list of all the births and deaths of infants in Alberta broken down by date and by hospital. From that I can gather all sorts of other interesting data such as:

  • Percentage of deaths at each hospital
  • Month of the year when there are the most births (always fun for making jokes about February 14th + 9 months)
  • The relative frequency of deaths in the winter compared with those in the summer
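To make the point concrete, here is a rough sketch of how little code it takes to get from raw records to a couple of those numbers. Everything in it is an assumption: the record shape (hospital, birthMonth, died), the function names and the four records are invented for illustration and are not real statistics. It also reads "percentage of deaths at each hospital" as deaths divided by births at that hospital, which is only one possible interpretation.

```javascript
// Invented records, purely to show the shape raw data could take.
const sampleRecords = [
  { hospital: "Hospital A", birthMonth: 11, died: false },
  { hospital: "Hospital A", birthMonth: 11, died: true },
  { hospital: "Hospital B", birthMonth: 3, died: false },
  { hospital: "Hospital B", birthMonth: 11, died: false }
];

// Count births per calendar month (1 = January).
function birthsByMonth(records) {
  const counts = {};
  records.forEach(function (r) {
    counts[r.birthMonth] = (counts[r.birthMonth] || 0) + 1;
  });
  return counts;
}

// Deaths as a fraction of births at each hospital.
function deathRateByHospital(records) {
  const totals = {};
  records.forEach(function (r) {
    if (!totals[r.hospital]) {
      totals[r.hospital] = { births: 0, deaths: 0 };
    }
    totals[r.hospital].births += 1;
    if (r.died) {
      totals[r.hospital].deaths += 1;
    }
  });
  const rates = {};
  Object.keys(totals).forEach(function (hospital) {
    rates[hospital] = totals[hospital].deaths / totals[hospital].births;
  });
  return rates;
}

console.log(birthsByMonth(sampleRecords));
console.log(deathRateByHospital(sampleRecords));
```

None of this is possible with pre-aggregated numbers, which is the whole complaint.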

For this particular data set we see reference to zones. What delineates these zones? I went on a quest to find out and eventually came across a map at the Alberta Health page. The map is, of course, a PDF. Without this map I would never have known that Tofield isn’t in the Edmonton zone while the much more distant Kapasiwin is. The reason for this is likely lost in the mists of government bureaucracy. So this brings me to complaint number two: don’t lock data into artificial containers. I should not have to go hunting around to find the definition of zones; they should either be linked from the data page or, better, just not used. Cities are a pretty good container for data of this sort; if the original data had been set up for Calgary, Edmonton, Banff, … then its meaning would have been far more apparent.

Anyway, I promised I would do something with the data. I’m so annoyed by the OSI that this is just going to be a small demonstration. I took the numbers from the data set above and put them into the map, from which I painstakingly removed all the city names.

Infant mortality in Alberta

Obviously there are a million factors which determine infant mortality but all things being equal you should have your babies in Calgary. You should have them here anyway because Calgary has the highest concentration of awesome in the province. Proof? I live here.

Data Visualization - A Misleading Visualization

There is a saying which goes something like “you can make up statistics to prove anything, 84% of people know that”. The assertion is that nobody checks the sources of statistics, which is more or less accurate. The lack of fact checking goes double for the recent surge of infographics on the web. I saw one show up on Twitter today which I thought was particularly damning in its misrepresentation of statistics.

A poor visualization


What’s wrong with this? Look at the size of those two circles. The one on the left is shockingly larger than the one on the right. This is done very much on purpose to shock people into thinking that the government is burning through money, that government workers have received a huge salary increase in comparison with the private sector. However the difference isn’t that huge. The ratio between the two should be about 2.38 but if we look at the size of the circles the ratio looks to be closer to 7 or 8.

Small circles inside the large one

A common mistake made with circles is to double the diameter to represent a doubling in size. Unfortunately, this increases the area by a factor of 4 and not 2. Had that mistake been made here, scaling the diameter by 2.38 would give an area ratio of about 5.7. The circles differ by more than that, so this isn’t the common mistake but a purposeful misrepresentation.
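The arithmetic is easy to check. For the area of a circle to be proportional to a value, the radius has to grow with the square root of that value. The sketch below uses a hypothetical radiusFor helper and an arbitrary base radius of 10 pixels; only the 2.38 ratio comes from the graphic.

```javascript
// Radius that makes a circle's AREA proportional to the value it represents.
function radiusFor(value, baseValue, baseRadius) {
  return baseRadius * Math.sqrt(value / baseValue);
}

function areaOf(radius) {
  return Math.PI * radius * radius;
}

const smallRadius = 10; // arbitrary base size in pixels
const largeRadius = radiusFor(2.38, 1, smallRadius);

// Correct scaling: the radius grows by sqrt(2.38), the area by 2.38.
console.log((largeRadius / smallRadius).toFixed(2));                 // 1.54
console.log((areaOf(largeRadius) / areaOf(smallRadius)).toFixed(2)); // 2.38

// The common mistake: scaling the diameter by 2.38 squares the area ratio.
console.log((2.38 * 2.38).toFixed(2));                               // 5.66
```

So an honest version of the graphic would have a large circle only about one and a half times as wide as the small one.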

More than a 2.38 ratio

The moral of the story? While data visualizations can tell a story about data, you, as the consumer of the visualization, need to pay attention to the underlying data and not just a pretty picture.

HTML 5 Visualizations - Talk Notes

If you came to my talk today then thanks! If you didn’t then you should know that I’m writing down your name. What am I going to do with the list of names I build? Probably I’ll sell it to telemarketers or something.

PowerPoint slides: HTML5 data visualizations (don’t bother, there are like 3 slides)

Code: https://github.com/stimms/HTML5Visualizations

The presentation is based on a number of blog entries written earlier this year:

HTML5 Data Visualizations – Part 1 – SVG vs. Canvas

HTML5 Data Visualizations – Part 2 – An Introduction to SVG

HTML5 Data Visualizations – Part 3 – Getting Started with Raphaël

HTML5 Data Visualizations – Part 4 – Creating a component with Raphaël and TypeScript

HTML 5 Data Visualizations – Part 5 – D3.js

HTML 5 Data Visualizations – Part 6 – Visual Jazz

Presentation Today!

Today I’m doing a talk at the Calgary .NET group about HTML 5 data visualizations. If you’re interested in learning a bit about some of the cool, interactive graphics which HTML 5 enables in the browser then I encourage you to come out. You will go away knowing how to build a simple bar chart in a handful of lines of JavaScript and you may even learn something about TypeScript.

The event starts at high noon in downtown Calgary, 800 6th Ave SW (+15 Level across from Spice Cafe).

Bring your lunch and come out; you’ll be done in plenty of time to make it back to work for 1. The slides and demos will be posted here once my presentation has started.