A while back I presented a d3.js based solution for showing a map of Calgary based on the open data provided by the city. One of the issues with the map was that the data was huge, over 3 meg. Even with goodcompressionthis was a lot of data for a simple map. Today I went hunting to find a way to shrink that data.
I started in QGIS which is a great mapping tool. In there I loaded the same shape file I downloaded from the City of Calgary’s open data website some months back. In that tool I went to the Vector menu then Geometry Tools and selected Simplify geometries. That presents you with a tolerance to use to remove some of the lines. See the map is a collection of polygons, but the ones the city gives us are super in detail. For a zoomed out map such as the one we’re creating there is no need to have that much detail. The lines can be smoothed
if we repeat thisprocessmany thousands of times the geometry becomes simpler. I played around with various values for this number and finally settled on 20 as a reasonabletolerance.
QGIS then shows map with the verticies removed highlighted with a red X. I was alarmed by this at first but don’t worry, the export won’t contain these.
I then right clicked on the layer and saved it as a shape file. Next I dropped back to my old friend ogr2ogr to transform the shape file into geojson.
ogr2ogr -f geoJSON data.geojson -t_srs “WGS84” simplified.shp
The result was a JSON file which clocked in at 379K but looked indistinguishable from the original. Not too bad, about an 80% reduction.
I opened up the file in my favorite text editor and found that there was a lot of extra data in the file which wasn’t needed. For instance the record for the community of Forest Lawn contained
“type”: “Feature”, “properties”: { “GEODB_OID”: 1069.0, “NAME0”: “Industrial”, “OBJECTID”: 1069.0, “CLASS”: “Industrial”, “CLASS_CODE”: 2.0, “COMM_CODE”: “FLI”, “NAME”: “FOREST LAWN INDUSTRIAL”, “SECTOR”: “EAST”, “SRG”: “N/A”, “STRUCTURE”: “EMPLOYMENT”, “GLOBALID”: “{0869B38C-E600-11DE-8601-0014C258E143}”, “GUID”: null, “SHAPE_AREA”: 1538900.276872559916228, “SHAPE_LEN”: 7472.282043282280029 }, “geometry”: { “type”: “Polygon”, “coordinates”: [ [ [ -113.947910445225986, 51.03988003011284, 0.0 ], [ -113.947754144125611, 51.03256157080034, 0.0 ], [ -113.956919846879231, 51.032555219649311, 0.0 ], [ -113.956927379966558, 51.018183034486498, 0.0 ], [ -113.957357020247059, 51.018182170100317, 0.0 ], [ -113.959372692834563, 51.018816832914304, 0.0 ], [ -113.959355756999315, 51.026144523792645, 0.0 ], [ -113.961457605689915, 51.026244276425345, 0.0 ], [ -113.964358232775155, 51.027492581858972, 0.0 ], [ -113.964626177178133, 51.029176808875143, 0.0 ], [ -113.968142377996443, 51.029177287922145, 0.0 ], [ -113.964018694707519, 51.031779244814821, 0.0 ], [ -113.965325119316319, 51.032649604496967, 0.0 ], [ -113.965329328092139, 51.039853734199696, 0.0 ], [ -113.947910445225986, 51.03988003011284, 0.0 ] ] ] } }
Most of the properties are surplus to our requirements. I ran a series of regex replaces on the file
s/, “SEC.“ge/}, “ge/g s/“GEODB.“NA/“NA/g s/,s//g s/]s//g s/[s//g s/{s//g s/}s//g
The first two strip the extra properties and the rest strip extra spaces.
This striped a record down to looking like(I added back some new lines for formatting sake)
{“type”:”Feature”,”properties”:{“NAME”:”FOREST LAWN INDUSTRIAL”},”geometry”: {“type”:”Polygon”,”coordinates”:[[[-113.947910445225986,51.03988003011284,0.0], [-113.947754144125611,51.03256157080034,0.0],[-113.956919846879231, 51.032555219649311,0.0],[-113.956927379966558,51.018183034486498,0.0], [-113.957357020247059,51.018182170100317,0.0],[-113.959372692834563, 51.018816832914304,0.0],[-113.959355756999315,51.026144523792645,0.0], [-113.961457605689915,51.026244276425345,0.0],[-113.964358232775155, 51.027492581858972,0.0],[-113.964626177178133,51.029176808875143,0.0], [-113.968142377996443,51.029177287922145,0.0],[-113.964018694707519, 51.031779244814821,0.0],[-113.965325119316319,51.032649604496967,0.0], [-113.965329328092139,51.039853734199696,0.0],[-113.947910445225986, 51.03988003011284,0.0]]]}}
The whole file now clocks in at 253K. With gzip compression this file is now only 75K which is very reasonable, and something like 2% of the size of the file we had originally. Result! You can see the new, faster loading, map here