2014-05-09

Let's build a map!

If you spend any time working in oil and gas in this province then you're going to run into a situation where you need to put some stuff on a map. If you're like me, that involves complaining about it on Twitter:

I don’t know how I get myself into working with GIS data. I seriously have no idea.

Simon Timms (@stimms), April 28, 2014

The problem is that GIS stuff is way harder than it looks on the surface. Maybe not timezone hard, but still really hard. Most of it comes from the fact that we live on some sort of roughly spherical thing. If we lived in flatland, mapping would be trivial. As it stands we need to use crazy projections to map a 3D world onto a 2D piece of paper.

https://www.youtube.com/watch?v=n8zBC2dvERM

There are literally hundreds of projections out there which stress different things. Add to that a variety of coordinate systems which can be layered on top. There is the latitude/longitude system with which we're all familiar, but there are also a bunch of others. In Western Canada the important one is the Dominion Land Survey (well, most of Western Canada, we'll get to that). The Dominion Land Survey was actually a series of surveys starting as far back as the 1870s. Bands of bearded men (there may have been bearded women too, everybody back then had beards) traveled around Canada plunking down lines to divide the land into one square mile sections called, well, sections. Why miles? Because a couple of guys called J. S. Dennis and William McDougall figured that a lot of people would be coming up from the States and would better understand miles. Thanks for screwing us over, again, with your stupid outdated measurement system, United States.

Anyway, you can read a ton more about the system over at Wikipedia. The important thing to know is that Western Canada is divided into 6 mile by 6 mile blocks known as townships, that each township is divided into 36 sections, and that each section is divided into 4 quarter sections. Sections can also be divided into 16 legal subdivisions, commonly known as LSDs. The LSDs are numbered in the stupidest way possible: starting in the bottom right corner, counting up going left, then moving up a row and pretending we're a snake, flipping direction back and forth:

13 14 15 16
12 11 10  9
 5  6  7  8
 4  3  2  1
You might as well have numbered these things completely randomly as far as I'm concerned:

17   2    7     cranberry
12   6    also  69
 5   6    7     coke
 4   33   null  1
I’m told that this all makes sense if you have some background in cartography.

How I picture the average cartographer

The point of all of this is that LSDs are super important in the oil and gas industry because that is how you lease land. The result is that people want to see LSDs on their maps and, as seems to frequently happen, I got the task of building a map. This post is about how to get LSDs onto a map and show it to your clients who aren’t asses. Mostly.

The company for which I'm working has most of its interests in Saskatchewan, Alberta and BC. I started with Saskatchewan as I figured that I might as well get into thinking like a cartographer and working right to left straight off the bat (I hear that cartographers from Arabic countries work from left to right to maximize the confusion).

The first step was to find some LSD data. Saskatchewan has an open data portal at http://opendatask.ca/data/ which contains a link for LSD data. What you want in particular is the SaskGrid2012 dataset. This file contains a lot of stuff but the four things you want are the high level map structures: township, section, quarter section and legal subdivision. We probably don't need quarter sections, as once we get to that level most people are interested in LSDs.

Inside each of these zip files are a number of files which, as it turns out, are Esri shape files. Esri is a maker of GIS software. It is expensive. However there is a free alternative which has all the functionality we need along with 9000 pieces of functionality we don't: QGIS. If you download and install this software it will let you take a look at the shape files. You can add one by clicking on "Add Vector Layer" and then pointing it at the .shp file. If you load the township file it will get you something which looks very much like Saskatchewan. What's more, if you click on the little identify feature tool and then on the map it will tell you the name of that township. Awesome!

To give you an idea of how many of these features there are: the township grid alone blankets the entire province.

For each township there are 36 sections (6×6) and for each section there are 16 (4×4) LSDs. So for Saskatchewan there are something like 7,000 townships, 250,000 sections and a mind-blowing 4 million LSDs. Quite a bit of data.

So now we have a map. But it only works inside of QGIS and I’m sure not going to go around supporting that. It would be really nice if this layer was available on a Google maps like thing.

Leaflet

Leaflet.js is a nifty library for manipulating maps. It can use any number of map backends but I used OpenStreetMap because it is awesome. Like really awesome. Start a new ASP.net project and then go and grab the latest Leaflet from their site. Leaflet is in nuget but it is an older version which hasn't been updated for 6 months or so. Add the files to the site bundles in BundleConfig.cs. I also included my site files in this bundle for convenience.
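The registration in BundleConfig.cs ends up looking something like this (the bundle names and file paths here are just placeholders for wherever the Leaflet files land in your project):

```csharp
using System.Web.Optimization;

public class BundleConfig
{
    public static void RegisterBundles(BundleCollection bundles)
    {
        // Leaflet's JavaScript plus the site's own scripts in one bundle for convenience
        bundles.Add(new ScriptBundle("~/bundles/map").Include(
            "~/Scripts/leaflet.js",
            "~/Scripts/Home/index.js"));

        // Leaflet needs its stylesheet too or the tiles render strangely
        bundles.Add(new StyleBundle("~/Content/map").Include(
            "~/Content/leaflet.css"));
    }
}
```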

I created a TypeScript file for the home page based on the example on the Leaflet home page.

Once I'd added a div in my Index.cshtml under the home controller and constructed the map I ended up with a map centered on, roughly, the border between Alberta and Saskatchewan.

Note: We're using the OpenStreetMap tiles directly from their tile server in this example. This is frowned upon as it costs the project money. There are a number of proxy services you can use instead, or you can write your own. Be a good citizen and cache the tiles so that the project can spend money on something else.
Now to get our grid lines onto the map. To start I added a simple polygon in the TypeScript file.

This builds a map which includes a nifty red box.

[![map2](http://stimms.files.wordpress.com/2014/05/map2.png)](http://stimms.files.wordpress.com/2014/05/map2.png)

This basically proves that we can draw out LSDs as needed. So the next thing is going to be combining the shape file data we had above and the map we have from OpenStreetMap.

Exporting KML Files

I went down a few blind alleys with this one before coming up with what I think is the best option. I decided to exploit the power of the geometry types in SQL Server to find any LSDs inside a bounding box. Displaying all the LSDs, or even all the townships, at a low zoom level is messy: it covers up the map and is too detailed for that level. As such only showing a few at a time is a good idea. To figure out which ones to show requires a filter which returns only the features within a bounding box, the bounding box being the one described by the map at a high zoom level.

Getting the data into SQL Server is a two step process: the first is to create KML files and the second is to load them into SQL Server. For each of township, section and LSD I loaded the layer into QGIS, right clicked on the layer and hit Save As. In that dialog I selected KML as the format and a file into which to save the layer.

[![save](http://stimms.files.wordpress.com/2014/05/save.png)](http://stimms.files.wordpress.com/2014/05/save.png)

This generates a pretty sizable export file, well over 4GB for the LSDs. Don't worry too much though: these things are XML so most of that bulk will disappear when loaded into SQL Server. If you want you can attempt to simplify the geometry in QGIS, which will reduce the export size at the cost of fidelity. The file size does, however, pose a bit of a problem for our import tools as they use a DOM parser for reading KML instead of a SAX parser. If I were going to make a living at manipulating maps this would be a place I expended some effort to correct.

Importing KML Files into SQL Server

I hunted around and found a tool called [KML2SQL](https://github.com/Pharylon/KML2SQL) to import KML files into SQL Server. It had a few issues so I [forked it](https://github.com/Pacesetter/KML2SQL) and made some updates. If you're going to make use of the tool then you're going to be better off using my fork at the current time. However if my pull requests are merged then the master repo may be the healthier choice.

As I mentioned, the KML files are too large to be consumed by the import tool. To solve this I wrote a quick KML splitter application which splits the KML files into 500 feature blocks. I've included the source on [github](https://github.com/stimms/LSDMap/tree/master/KMLSplitter). It isn't pretty but it gets the job done; there is a stripped-down sketch of the approach at the end of this section.

[![kml2sql](http://stimms.files.wordpress.com/2014/05/kml2sql.png)](http://stimms.files.wordpress.com/2014/05/kml2sql.png)

The KML2SQL tool will dynamically build tables with the correct columns in them, which is great. All I did was plop in my SQL Server credentials and point the tool at the directory containing the output files from the splitter, then leave it to churn for a few minutes; a few hours for the LSDs. I did see some import errors related to open polygons which I generally ignored. It is something I'll have to come back to in a while but they were perhaps 1% of the imports. SQL Server wants polygons to have the same start coordinate as end coordinate, which is only reasonable. The data from the government doesn't quite have that, but for free data you can't complain too much.
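For the curious, a minimal version of that splitter idea, streaming the XML so the multi-gigabyte files never have to fit in memory, could look something like this (the chunk size of 500 matches what I used; file names and error handling are illustrative only):

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Streams through a huge KML file and writes it back out in chunks of 500
// placemarks so a DOM-based import tool can cope with the result.
class KmlSplitter
{
    const int ChunkSize = 500;

    static void Main(string[] args)
    {
        XNamespace kml = "http://www.opengis.net/kml/2.2";
        var placemarks = new List<XElement>();
        var fileNumber = 0;

        using (var reader = XmlReader.Create(args[0]))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Placemark")
                {
                    // ReadFrom consumes the whole <Placemark> element and advances the reader past it
                    placemarks.Add((XElement)XNode.ReadFrom(reader));
                    if (placemarks.Count == ChunkSize)
                        WriteChunk(kml, placemarks, fileNumber++);
                }
                else
                {
                    reader.Read();
                }
            }
        }

        if (placemarks.Count > 0)
            WriteChunk(kml, placemarks, fileNumber);
    }

    static void WriteChunk(XNamespace kml, List<XElement> placemarks, int fileNumber)
    {
        // Wrap the placemarks in a minimal KML document that the import tool will accept
        var document = new XDocument(
            new XElement(kml + "kml",
                new XElement(kml + "Document", placemarks)));
        document.Save("chunk" + fileNumber + ".kml");
        placemarks.Clear();
    }
}
```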
Querying the Data

Now that we have the data in the database the next step is to get it out onto the map. To start we need to have the map ask for some data when it is resized or panned; I hooked into the zoomend and dragend events in Leaflet. I found that it made sense to display the townships starting at zoom level 10 and at higher zoom levels show more detailed data such as sections or LSDs. I threw together a Web API controller to do the work of querying the database. Doing geospatial queries in SQL Server is a bit more complicated than I would like, but complex types in relational databases always are. I don't know, relational databases, man. I'm a big fan of the lightweight ORM Dapper so I installed that and stole a bit of code for doing spatial queries from a posting by Sam Saffron on StackOverflow, which I had to modify a bit. The heart of it is a small class which hands the search area to SQL Server as a spatial parameter; a sketch of that shape (the class and parameter names here are incidental) looks like:
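```csharp
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Dapper;
using Microsoft.SqlServer.Types;

// Hands a geometry, expressed as well-known text (WKT), to SQL Server as a UDT
// parameter so it can be used in spatial predicates like STIntersects.
public class GeometryParameters : SqlMapper.IDynamicParameters
{
    private readonly string _wellKnownText;

    public GeometryParameters(string wellKnownText)
    {
        _wellKnownText = wellKnownText;
    }

    public void AddParameters(IDbCommand command, SqlMapper.Identity identity)
    {
        var sqlCommand = (SqlCommand)command;
        var parameter = sqlCommand.Parameters.Add("@searchArea", SqlDbType.Udt);
        parameter.UdtTypeName = "geometry";
        // 4326 is WGS84, the coordinate system KML uses
        parameter.Value = SqlGeometry.STGeomFromText(new SqlChars(new SqlString(_wellKnownText)), 4326);
    }
}
```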

This can be used to select the intersecting polygons with a query along these lines (the table and column names are whatever KML2SQL generated for you):
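```csharp
// Inside the Web API controller; connection is an open SqlConnection and
// LegalSubDivision is a plain class with Id, Label and WellKnownText properties.
var subdivisions = connection.Query<LegalSubDivision>(
    @"SELECT Id, Label, Geom.STAsText() AS WellKnownText
      FROM LegalSubDivisions
      WHERE Geom.STIntersects(@searchArea) = 1",
    new GeometryParameters(searchAreaWkt));
```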

STIntersects checks whether the given polygon intersects each of the ones in the database. If there is an intersection then we have a match and we add that row to the result set. Actually building the search area can be a pain as you have to hand-build the geometry, in this case from the bounding box the browser sends up, like so (roughly; the variable names are incidental):
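```csharp
// west/south/east/north are the edges of the visible map, sent up by the Leaflet client.
// WKT polygons must form a closed ring, hence repeating the first corner at the end.
var searchAreaWkt = string.Format(
    System.Globalization.CultureInfo.InvariantCulture,
    "POLYGON(({0} {1}, {2} {1}, {2} {3}, {0} {3}, {0} {1}))",
    west, south, east, north);
```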

This provides a set of data to return to the client.

Plotting the Data

We're almost there, folks, thanks for staying with it. The last step is to get the returned data plotted on the map. This is simply done by adding the polygons to a multipolygon layer on the Leaflet side.

That's it, folks! You now have a map with the grid drawn over the tiles. The labels are a bit jaggedy right now as I'm just using the envelope center to calculate the position of each label, and it doesn't take a whole lot to screw that up. Putting them in the top left corner helps with that a lot.

P.S. I promised I would get back to BC's system. They don't use the DLS; they use the National Topographic System.

2014-04-22

The permission model in Android is totally broken

When installing a new application on a cell phone I typically agree to whatever the stupid app wants. My approach is “just do it and stop asking me questions”. There have been numerous reports about how apps are stealing data. I had to rebuild my phone this week after getting a replacement from Google due to some rather nasty screen issues. I thought I would be a bit more circumspect in installing applications this time. I took a close look at the permissions applications were requesting as I installed them.

It is absolutely amazing the permissions applications are requesting. Of the 10 or 11 clock applications I looked at, every last one of them wanted some permission which I deemed unnecessary. Reading caller IDs, access to the network, access to contacts, the ability to send e-mails without me knowing… Outrageous! I'm sure an argument could be made for many of these but I cannot imagine how the argument for being able to read my text messages or read my contacts would go. "If you're not paying for something then you're the product" has never been more true.

Asking to read my text messages? That's a paddlin'

What’s the solution?

I think it is actually a pretty easy solution: grant permissions in the same way as HTML5 or OpenID. HTML5 will request permission when a page performs some activity such as capturing images from your web camera. If the script isn’t granted permission to access the camera then it should degrade or cancel based on this. Equally when you’re logging into an OpenID site and it requests additional fields from the login provider then you can click cancel and the application should accept this and compensate.

Sorry, you need what permissions?

As it stands I either accept that my alarm clock needs to read my text messages or I don't install it. Usually I just don't install it. If I were able to pick and choose the permissions the application could have then it could degrade and still give me some functionality. Developers would have a much harder time sneaking malware onto phones if this could be done. As an added bonus I would like to see developers have to enter a reason why each permission was needed and have that show up during the install.

The correct set of permissions for an alarm clock

I can't believe that Google is just letting this stuff go. Say what you will about Apple but they're pretty willing to crack down on stuff like this.

2014-04-21

Glimpse for raw ADO

If you haven’t used Glimpse and you’re developing a web application in .net then you’re missing out on a great deal of fun. Glimpse is sort of like the F12 developer tools but running on your server and offering insight into how the server side of the page is doing. Installing Glimpse is also very easy and can be done entirely from nuget.

install-package Glimpse.AspNet

Because there are all sorts of different configurations for ASP.net applications Glimpse is divided up into a main assembly and then a bunch of helper assemblies which can be put together like Lego. For instance if you’re using a brand new ASP.net MVC site with Entity Framework then you can install Glimpse and then the modules for that specific configuration

install-package Glimpse.EF6
install-package Glimpse.MVC5

This is a very well supported configuration. If you’re less fortunate and you need to hook Glimpse into an application which makes use of a lot of legacy data layer code then EF profiling isn’t going to be available for you. You can, however, make use of the ADO Glimpse package. To do this you’ll first need to install the package

install-package Glimpse.ADO

If your application makes use of DbProviderFactory then you're done. If not then you'll need to transition the site to make use of this method of building connections. DbProviderFactory is a way of abstracting out the connection provider so that you can easily swap different database strategies into place. If you were originally setting up connections roughly like this (the query and table here are just stand-ins):
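```csharp
// Plain ADO: the connection type is hard-coded to System.Data.SqlClient.SqlConnection
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var command = new SqlCommand("SELECT Id, Name FROM Widgets", connection))
    using (var reader = command.ExecuteReader())
    {
        // read the rows...
    }
}
```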

Then all you need to do is convert it to look something like this:
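```csharp
// DbProviderFactories lives in System.Data.Common; Glimpse.ADO wraps the factory
// it hands back and records the timings for every command that goes through it.
var factory = DbProviderFactories.GetFactory("System.Data.SqlClient");
using (var connection = factory.CreateConnection())
{
    connection.ConnectionString = connectionString;
    connection.Open();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = "SELECT Id, Name FROM Widgets";
        using (var reader = command.ExecuteReader())
        {
            // read the rows exactly as before
        }
    }
}
```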

The issue is that the ADO tooling for .net provides very few extension points. Glimpse.ADO works by hooking itself in as a DbProvider which wraps the SqlProvider and intercepts all the calls. You don’t need to modify your entire site at once but if you don’t you’ll get a funny mixture of pages which work and pages which don’t work. A few well crafted regular expressions got me 90% of the way there on a medium sized application and I did a full transition in about an hour so it isn’t a huge time investment.

With the site using DbProviderFactory and Glimpse up and running I was able to get some really good hints about why pages were somewhat slow.

Having this sort of information exposed to developers makes it much easier to debug and solve performance issues before the site hits users.

2014-04-18

Limitations of WebForms

I'm spending a lot of time working with WebForms at the moment. I haven't written WebForms since… 2003, maybe 2004. When WebForms was created it was done as a way to transition developers from the drag and drop world of Windows development to the exciting world of the Internet. Of course the Internet is not a Windows form. The result is that WebForms is a leaky abstraction, and the abstraction has been getting leakier and leakier as web technology has progressed.

One of the key features of WebForms is that it keeps track of transient data in ViewState. By using ViewState, WebForms is able to provide a web experience which is more similar to a desktop application: when you press a button on a Windows form you expect the rest of the form not to be wiped out, and that is exactly what would happen without ViewState storing the form state. Depending on how you configure your application the ViewState is either kept in a hidden field which is sent to the client or in some sort of server side storage mechanism. You can hook ViewState persistence up to a database like SQL Server or to a distributed cache like memcache or Azure cache.

However the vast majority of sites keep ViewState in that hidden field. As your ViewState grows, so does each page load. There is very little room to optimize ViewState because, instead of being sensibly stored in a key-value fashion, it is persisted as a single blob: the entire thing is persisted and reloaded each time. The result is that most interactions with the server need to include this ViewState, which makes lightweight AJAX calls difficult. When AJAX started to become popular, update panels were introduced. These were chunks of the page which could be refreshed independently.

Again, these were just plugging up a leaky abstraction.

As web applications became more JavaScript based it became apparent that the HTML produced by WebForms was brittle. Controls were named with a near indecipherable id which changed based on the rest of the page. Later versions of ASP.net brought more predictable control names but, again, this was just patching the abstraction. If you're interested in building a modern web application it should not be done using WebForms. There are just too many places where the abstraction leaks and makes your job much more difficult.

Modern web applications make much greater use of client side frameworks, the likes of Angular, Ember and Backbone. With this class of application the server side framework starts to matter less and less. Eventually it is reduced to a tool for sending views to the client and providing data endpoints. I won't miss WebForms on new applications but we're not all lucky enough to work on new applications. For legacy applications which are written using WebForms there are upgrade paths available to you.

I’m going to start blogging a bit over the next few weeks about how to start taking steps towards more modern WebForms applications without jeopardizing existing functionality. Stay tuned!

2014-04-14

A Quick tip on adding dependency injection

I ran into the need today to move quite a number of classes into dependency injection. This can be a bit of a pain as you have to go through a ton of places to find where the class is used and get it out of the DI container instead of simply newing it up. See, the constructor remains valid, so you can still create an instance using just var b = new blah();

One trick I used which really helped speed up finding the places where the class was being manually created was to, temporarily, make the constructor private. This will cause all the places where the class is being instantiated to be highlighted as compiler errors. Once you’re done fixing it then you can return the constructor to its previous state and go about your business. This is really just an application of the Lean on the Compiler pattern I first learned about in Michael Feathers’ excellent book Working Effectively with Legacy Code. Well worth a read if you have any untested code to maintain.
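As a tiny illustration (the class names here are made up, and the snippet deliberately won't compile; the compiler errors are the whole point):

```csharp
public class InvoiceBuilder
{
    // Temporarily private while hunting down call sites; flip it back to public
    // once everything resolves the class from the container instead.
    private InvoiceBuilder() { }
}

public class SomewhereElse
{
    public void DoWork()
    {
        var builder = new InvoiceBuilder(); // compiler error, pointing straight at the offending spot
    }
}
```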

2014-04-04

Roslyn Changes Everything

Yesterday at Microsoft’s build conference there was a huge announcement: Microsoft were open sourcing their new C#/VB.net compiler. On the surface this seems like a pretty minor thing. I mean who looks at how compilers work? “This is probably going to be interesting to academics who study compilers and nobody else.”

Well I disagree. I think it is going to be a huge turning point in how programmers work with code.

There are other open source compilers: GCC and LLVM both come to mind as great examples. The differences between these and Roslyn are huge. First, Roslyn is a much more modern compiler than almost anything else out there. I still think of clang, which is based on LLVM, as the new kid on the block, yet LLVM was started in 2000: 14 years ago. Roslyn was written from the ground up over the last 4 years. I haven't looked but I would bet that it makes much better use of things like parallel processing than other compilers. There is a pretty vague post on the C# blog about how they're treating performance of the compiler as a feature. I don't know what progress they've made on that front but we'll certainly be seeing some benchmarks come out in the next few weeks as people dig into Roslyn.

Next is that Roslyn is written in a much more accessible language: C#. It is going to be far easier for the average developer to jump into modifying the compiler than it would be to add some functionality to LLVM. Roslyn was designed to be an extensible compiler. It has a well defined API and some phenomenal extension points into which people can plug. I think that we're going to see a huge number of pluggable modules which mutate the language.

The build pipeline for Roslyn, taken from the overview on CodePlex: http://roslyn.codeplex.com/wikipage?title=Overview&referringTitle=Home

Finally I’m excited that Roslyn will enable smaller, more incremental changes to the languages it compiles. Already we’re seeing some hints as to this. In the Tour of Roslyn post there was an example of inline declarations:

    public static void Main(string[] args)
    {
        if (int.TryParse(args[0], out var n1))
        {
            Console.WriteLine(n1);
        }
    }

There is support for these in Roslyn but not in the classic C# compiler. Little things like this are going to add up and make the language much better. If shipping for Roslyn can be decoupled from Visual Studio, a given for open source projects, then we can see awesome new features enabled rapidly instead of waiting for the full releases of Visual Studio.

What can we do with the compiler?

Here are some quick ideas I had about what we could plug into Roslyn. Some of them are mad dreams but some of them are almost certain to get made.

Aspect Weaver

There is already an AOP weaver available for the .net platform in Aspect Sharp. It has a bit of a reputation for being slow. It works by rewriting the IL instructions, which is kind of hacky and presents some problems. With Roslyn there should be no need to hook into the build that late: I think you could manipulate the syntax tree to inject calls to the aspects wherever needed.


AOP should be vastly easier and may even be more powerful with this syntax tree rewriting.

Custom Compiler Errors

Is there some practice you're trying to avoid in your team? Perhaps long methods are really a huge deal for you and you want to fail the build when some mouth-breather writes a method which is over 50 lines long. No problem! Just plug into the syntax tree API and fail the compile when long methods are detected. Perhaps you want to check for, and fail on, concatenating strings and then running them against a database (SQL injection). Again this could be plugged in without a great deal of trouble.
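To give a flavour of what that kind of check could look like with the Roslyn bits that are public today, here is a rough sketch written as a stand-alone syntax walker rather than a real compiler plug-in (entirely illustrative, file name and threshold included):

```csharp
using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Walks a syntax tree and complains about any method longer than 50 lines.
class LongMethodWalker : CSharpSyntaxWalker
{
    public override void VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        var span = node.GetLocation().GetLineSpan();
        var lines = span.EndLinePosition.Line - span.StartLinePosition.Line + 1;
        if (lines > 50)
            Console.WriteLine("{0} is {1} lines long - this is where you would fail the build",
                node.Identifier.Text, lines);
        base.VisitMethodDeclaration(node);
    }
}

class Program
{
    static void Main()
    {
        // "Example.cs" is just a stand-in source file to analyse
        var tree = CSharpSyntaxTree.ParseText(File.ReadAllText("Example.cs"));
        new LongMethodWalker().Visit(tree.GetRoot());
    }
}
```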

Domain Specific Languages

There are plenty of nifty places where it would be fun to be able to define a custom syntax for certain projects. Perhaps you're writing a message based system and you want to make it easier to write message handlers. With some Roslyn work a new syntax could be added so that instead of writing out the usual handler boilerplate you could just write the interesting part of the handler, and all the wireup would be dealt with by the compiler.

Random other Syntax Improvements

You know what syntax I really like? Post-if statements (where the condition trails the statement, Ruby style). I think they're nifty and read more like human language. This is the sort of thing which could just be added by rewriting the syntax tree. Oh, or how about cleaning up the accessors for collections? That's probably a terrible idea now I think about it… whatever, it is still possible.

It is going to be awesome!

I envision a future where any project of appreciable size will include a collection of syntax and compiler modules. These will be compiled first, plugged into Roslyn and then used to build the rest of the project. Coding standards will be easier to enforce, compilations will be more powerful. There is a risk that the language proliferation will get out of hand but I’m betting it will settle down after 2 or 3 years and we’ll get a handful of new dialects out of this. There will need to be new tooling developed to make changing compilers in VS easier. Package managers like nuget will need to be updated to support compiler modules but that seems trivial.

It is an exciting time to be a .net developer. I’m so glad that when I had the option I decided to go down the .net path and not the Java path. Those suckers just got lambdas and we’re working with the most modern, flexible, extensible compiler in the world? No contest.

2014-04-03

Checking Form Re-submissions in CasperJS

ASP.net WebForms has a nasty habit of making developers comfortable with using POST for pretty much everything. When you add a button to the page it is typically managed via a postback. This is okay for the most part but it becomes an issue when using the back button. See, HTTP suggests that things which are POSTed are actual data changes whereas GETs are side effect free. Most browsers save you from messing up with the back button by simply throwing up a warning.

This warning is not something we want our users to have to see. Without some understanding of how browsers work it is confusing to understand why users are even seeing this error. On my agenda today is fixing a place where this is occurring.

The first step was to get a test in place which demonstrated the behaviour. Because it is a browser issue I turned to our trusty CasperJS integration tests and wrote a test where I simply navigated to a page and then tried to go back. The test should fail because of the form resubmission.

It didn’t.

Turns out that CasperJS (or perhaps PhantomJS, on which it is built) is smart enough to simply agree to the form submission. Bummer.

To test this you need to intercept the navigation event and make sure it isn't a form re-submission. This can be done using casper.on to listen for the navigation.requested event and throw whenever the reported navigation type is a form re-submission.

If you add this listener before the test and then navigate around, an exception will be thrown any time a form is resubmitted automatically. Once your testing is done you can remove the listener with casper.removeAllListeners.

Now on to actually fixing the code…

2014-04-01

Hacking Unicoin for Really no Reason

It is April 1st today which means that all manner of tom-foolery is afoot. Apart from WestJet's brilliant "metric time" joke, the best one I've seen today is Stack Overflow's introduction of Unicoin, a form of digital currency which can be used to purchase special effects on their site.


To get Unicoin you have two options: buy it or mine it. I have no idea if buying it actually works and at $9.99 for 100 coins I'm not going to experiment to see if you can actually purchase it. Mining it involves playing a fun little game where you have to click on rocks to uncover what they have under them (could be coins, could be nothing).


I played for a few minutes but got quickly tired of clicking. I’m old and clicking takes a toll. To unlock all the prizes you need to have about 800 coins (799 to be exact). So I fired up the F12 developer tools to see if I could figure out how the thing was working.

As it turns out there are two phases to showing and scoring a rock. The first one is rock retrieval, which is accomplished by a GET to http://stackoverflow.com/unicoin/rock?_=1396372372225 or similar. That parameter looked familiar to me and, indeed, it is a timestamp. This will return a new "rock" which is just JSON:

{"rock":"DAUezpi1zrfxHRxdi3yp9JUCZ9vwABJbDA"}

The value appears to be some sort of randomly generated token. It doesn't really matter for our purposes. The response, once a rock is mined, is to POST to

http://stackoverflow.com/unicoin/mine?rock=DAUezpi1zrfxHRxdi3yp9JUCZ9vwABJbDA

in the body of that POST you’ll need an fkey which can be found by looking at the value in StackExchange.options.user.fkey.

Once you know that, stealing coin is as easy as firing that POST, with the fkey in the body, over and over from the browser console.

There appears to be some throttling behaviour built in so I ran my requests every 15 seconds. Hilariously, if you don't include the fkey the server will respond with HTTP 418, an April Fools inside an April Fools. Now you can buy whatever powerups you want.


Update: The rate at which I'm discovering new Unicoins has dropped off rapidly. I was discovering coins on almost every hit originally; now it is perhaps 1 in 20. Either I'm being throttled or the rate of discovery of new coins drops as more of the keyspace is explored, like Bitcoin. I really hope it is the second one, that would be super nifty.

2014-03-29

Octokit.net - Quickstart

I'm working on a really nifty piece of code at the moment which interacts with a lot of external services and aggregates the data into a dashboard. One of the services with which I'm working is GitHub. My specific need was: given a commit, what was the commit message?

GitHub has a great RESTful API for just this sort of thing and they even have a .net wrapper library for the API called Octokit.net. It seems to bind most of the API, which is great. It also seems to have no real documentation, which is not.

The repositories against which I wanted to fire the API were part of an organization and were private so I needed to authenticate. You have two options for authenticating against the API: basic or OAuth. As my service was going to be used by people who don’t have github credentials the OAuth route was out. Instead I created a new user account and invited it into the organization. It is always smart to give as few permissions as possible to a user so I created a new team called API in the organization and made the API user its only member. The API team got only read permission and only to the one repository in which I was interested.

Next I dropped into my web project and added app settings for the user name and password. I use a great little tool called T4AppSettings which is available in nuget. It is a T4 template which reads the configuration sections in your web.config or app.config and makes them into static strings so you don't need to worry about missing one in a renaming. The next step was to add a reference to Octokit:

install-package octokit

in the package manager console did that. Then we new up some credentials based on our app settings
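Assuming the T4 template generated settings named GitHubUser and GitHubPassword, that looks something like:

```csharp
// AppSettings is whatever static class the T4 template generated for your config
var credentials = new Credentials(AppSettings.GitHubUser, AppSettings.GitHubPassword);
```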

Next create a connection
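With a recent Octokit that is roughly:

```csharp
// The ProductHeaderValue is the name your code reports to GitHub; "dashboard" is just a placeholder
var connection = new Connection(new ProductHeaderValue("dashboard"))
{
    Credentials = credentials
};
```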

The product header value seems to just be any value you want. I'm sure there is some reason behind it but who knows what… Now we need to get the Octokit client based on this connection.
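That is a one-liner:

```csharp
var client = new GitHubClient(connection);
```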

That is all the boring stuff out of the way and you can start playing around with it. In my case I had a list of objects which contained the commit versions and I wanted to decorate them with the descriptions
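Something along these lines does the trick (the deployment objects and their properties are stand-ins from my dashboard; with a current Octokit the call is Repository.Commit.Get):

```csharp
// Inside an async method. Owner (user or organization) first, then repository name, then the commit SHA.
foreach (var deployment in deployments)
{
    var commit = await client.Repository.Commit.Get("alexwolfe", "Buttons", deployment.CommitSha);
    deployment.Description = commit.Commit.Message;
}
```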

This was actually what took me the longest. The parameters to the Get were not well named so I wasn't sure what should go in them. Turns out the first one is the name of the owner, where the owner is either the organization or the user. The second one is the name of the repository. So for a repository like alexwolfe/Buttons the owner is alexwolfe and the repository name is Buttons.

The GitHub API is rich and powerful. There is a ton to explore and many ways to get into trouble. Take chances, make mistakes, get messy.

2014-03-24

Where is my tax software?

It is tax season again here in Canada which always makes me angry. A little bit because I have to pay my taxes (who likes that?) but mostly because of tax software. Doing taxes by hand isn’t all that bad but we live in the 21st century and doing taxes like that is old school; we use computers these days.

There are a lot of options out there for software to help with doing taxes. QuickTax, Cantax and, I kid you not, Taxtron are all good options. But there is one piece of tax software which I never see and I should: whatever tax software the Canada Revenue Agency uses internally. Let me walk through this:

Every year almost everybody in the country fills in some form of tax filing and sends it to the government. Let's use a Fermi estimation to figure out how much paperwork the government has to do. There are about 35 million people in Canada. Perhaps 25% of them aren't filing taxes because they're too young, and a few more don't file for a variety of other reasons, so let's say that 25 million people file taxes. Each person is likely to have at least five forms plus the actual tax return itself, let's say 20 pages all told. That means the government can expect to get something on the order of 500 million pieces of paper.

That’s a lot of paper! Even with netfile, the electronic filing system, it is a mountain of data to process. There is no way that this amount of paperwork is being done by hand. There must be some software which is processing this data. What’s more every year I receive a notice that they’ve assessed my taxes and found them to be correct.

This means that not only is their software handling filing the taxes it is also performing a cross checking function. My argument is that this software should be made public, what’s more this software should be open sourced.

By making the software available to everybody, we taxpayers can perform our own cross check to ensure that our filings are correct. If we wanted we could use the software to actually fill out our taxes. This has the potential to save us millions of dollars spent on buying tax software. For those for whom buying tax software is a burden this could be a great boon. I don't worry very much that giving away the government's software would put the traditional tax software companies out of business, either: their selling feature would be ease of use. Goodness knows that the software the government uses internally isn't likely to be very user friendly. What's worse is that the software may not be very good.

This software handles billions of dollars every year. Billions of dollars which fund our schools, roads, military and everything else in between. Allowing the population to test and audit this software is quintessentially democratic. For many of us paying taxes is the only time we interact directly with the government all year. If we cannot be sure that the government is getting this most basic interaction right how can we trust them to deal with more pressing issues?

Opening this software up and providing it for general consumption should be a priority. Our taxes paid for the software to be developed in the first place and any government which values transparency should be delighted to open it up.

Free our tax software.