2013

2013-02-18

Data Visualization - A Misleading Visualization

There is a saying which goes something like “you can make up statistics to prove anything, 84% of people know that”. The assertion is that nobody checks the sources of statistics which is more or less accurate. The lack of fact checking goes double for the recent surge of infographics on the web. I saw one show up on twitter today which I thought was particularly damning in its misrepresentation of statistics.

A poor visualization

What’s wrong with this? Look at the size of those two circles. The one on the left is shockingly larger than the one on the right. This is done very much on purpose to shock people into thinking that the government is burning through money, that government workers have received a huge salary increase in comparison with the private sector. However the difference isn’t that huge. The ratio between the two should be about 2.38 but if we look at the size of the circles the ratio looks to be closer to 7 or 8.

Small circles inside the large one

A common mistake made with circles is to double the diameter to represent a doubling in size. Unfortunately, this increases the area by a factor of 4 and not 2. In this case the distortion is bigger than even that mistake would produce (scaling the diameter by 2.38 would give an area ratio of about 5.7, not 7 or 8), so this isn’t the common mistake but a purposeful misrepresentation.
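To put some numbers on that, here is a small sketch of the arithmetic. The function names are my own and the 2.38 is the ratio quoted above; nothing here comes from the infographic's source.

```javascript
// Hypothetical helper names, used only for this illustration.

// Area ratio a reader sees when the diameter (or radius) is scaled
// linearly with the value: area grows with the square of the radius.
function perceivedAreaRatio(valueRatio) {
  return valueRatio * valueRatio;
}

// Radius ratio that makes the area ratio match the value ratio.
function honestRadiusRatio(valueRatio) {
  return Math.sqrt(valueRatio);
}

const ratio = 2.38;
console.log(perceivedAreaRatio(ratio).toFixed(2)); // prints "5.66"
console.log(honestRadiusRatio(ratio).toFixed(2)); // prints "1.54"
```

So an honest drawing would make the big circle only about one and a half times as wide as the small one.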

More than a 2.38 ratio

The moral of the story? While data visualizations can tell a story about data, you, as the consumer of the visualization, need to pay attention to the underlying data and not just a pretty picture.

2013-02-15

Open Data 101

One of the things I’m really enthusiastic about is open data. If you haven’t heard of open data, the idea is that governments have a lot of data at their disposal. They gather this data as a normal part of doing business. If you think about your city they gather data about traffic patterns so they can set up traffic light patterns and decide which interchanges to expand first. They gather data about property values so they know what tax rates should be. They gather demographics about neighbourhoods to decide where to put recreation centers. Cities also have at their disposal lists of all the street names and all the companies registered in the city. The list of data goes on and on.

Other levels of governments are equally well set up with data. You simply cannot run a government, or really any large company, without a lot of data. Typically governments sit on this data, they hoard it and gloat over it in the high towers of their data castles.

The City of Calgary’s data castle. Huge waste of taxpayers’ money if you ask me.

Fortunately this is changing. Governments are starting to open this data up and make it available to the general public. This is fantastic because it allows those of us with some data analysis chops to dig in and find all sorts of correlations which governments might not have noticed. Many eyes can find things which government workers might have missed. Governments are also amazingly slow to react to new technologies so it isn’t very likely that your government is going to even think about producing a mobile application much less creating one. However private individuals or companies may well see profit or use in creating applications. With open data they can go ahead and do it.

The key to open data is that governments give it away for free and without strings. Some governments are reluctant to give the data away without strings attached. They consider that they have spent a lot of money to create the data so they should be compensated for it. What they’re missing is that the money that they’re spending is our money. We paid for the data so we should be entitled to use it in whatever way we see fit. The applications and tools which the data savvy are creating are not being created by governments. Everybody is likely to benefit from these applications so that is another way in which governments benefit from open data.

The city in which I live, the City of Calgary, has a small collection of open data available on their website. Unfortunately, they’re still behind the times. ~~The data is protected by a rather draconian user agreement~~ (Edit: Walter was kind enough to point out that the license has been vastly improved, I no longer have any real complaints about it) and the city provides no API access to the data. I am hopeful that the city will catch on soon and update the data they provide.

I’m talking today about open data because February the 23rd is Open Data Day. I am going to spend some time writing something using either Calgary’s open data or something from the provincial or federal governments. I’m not sure what I’ll make so please feel free to make some suggestions. Calgary has a few transit applications already so it won’t be one of those. If you’re interested in joining me to create something then drop me a line. Open data is only a success if people like you and me use it. So let’s get using it!

2013-02-14

HTML 5 Visualizations - Talk Notes

If you came to my talk today then thanks! If you didn’t then you should know that I’m writing down your name. What am I going to do with the list of names I build? Probably I’ll sell it to telemarketers or something.

PowerPoint slides: HTML5 data visualizations (don’t bother, there are like 3 slides)

Code: https://github.com/stimms/HTML5Visualizations

The presentation is based on a number of blog entries written earlier this year:

HTML5 Data Visualizations - Part 1 - SVG vs. Canvas

HTML5 Data Visualizations - Part 2 - An Introduction to SVG

HTML5 Data Visualizations - Part 3 - Getting Started with Raphaël

HTML5 Data Visualizations - Part 4 - Creating a component with Raphaël and TypeScript

HTML 5 Data Visualizations - Part 5 - D3.js

HTML 5 Data Visualizations - Part 6 - Visual Jazz

2013-02-14

Presentation Today!

Today I’m doing a talk at the Calgary .net group about HTML 5 data visualizations. If you’re interested in learning a bit about some of the cool, interactive graphics which HTML 5 enables in the browser then I encourage you to come out. You will go away knowing how to build a simple bar chart in a handful of lines of JavaScript and you may even learn something about TypeScript.

The event starts at high noon in downtown Calgary, 800 6th Ave SW (+15 Level across from Spice Cafe).

Bring your lunch and come out, you’ll be done in plenty of time to make it back to work for 1. The slides and demos will be posted here once my presentation has started.

2013-02-13

Debugging Android App from OSX

I just got a Nexus 7 for my birthday and thought I would try deploying to it. I really have no idea about building for Android so there were a couple of stumbling blocks for me I thought I would write down for future reference.

The Nexus 7 and, I understand, Android devices in general after 4.2 don’t have a development option on the menu. You have to go to settings > about and then tap on the build number 7 times to enable it. Hilarious. I’m delighted we basically have a Konami code for getting to options.
By default USB debugging is turned off. You need to turn that on from the developer menu for the device to actually be found by the Android SDK. You can check your devices by running

adb devices

adb is found in the platform-tools directory of the SDK.

2013-02-12

Stop With the Error Codes, Already

I don’t like error codes. I suppose that back in the day when computers were young and had very limited memory there was a purpose to error codes. But last I checked everybody and their mother had enough storage available to show people a message in the language of their choosing. I’m put in mind of this today by there being some random alphanumeric error on Windows Phone. I didn’t see it myself and, honestly, I’m so done with Windows Phone that I wouldn’t care if I did.

Why are we showing errors like this to users? We try to abstract the inner functioning of our applications from users so they don’t know what technologies or tools were used to create it. Users don’t care that you used a really novel way of building a dependency injection container or that your development workflow was centered around branches and not feature flags. Nope, they care if things work and if they are easy to use. So look deep into your developer brains and think: which of these is better?

805a0193

An error has occurred: the server is currently unresponsive - we’re working on a fix and will be back shortly

Gosh, I sure like that second one. Even more than that error I would like to see something which will tell me how I can fix the problem, if it is indeed something I can fix. As an example I have an application which reads Excel files and loads them into a database. I would say that 30% of the time when I run it there is a failure because I’ve still got the Excel file open. Instead of crashing or showing an unintelligible error message I show

Unable to read Excel file. Typically this is because the file is already open. Close Excel and click to retry.

This error is precise, it gives a tip on how to fix it and a shortcut to the action the user should perform after they have solved the problem. We should be writing more errors like this. It only takes an extra 5 minutes to code up but can save people a lot in the long run.
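As a rough sketch of what that extra 5 minutes might look like, here is one way to translate low-level errors into messages a person can act on. EBUSY and ENOENT are standard Node.js error codes, but the function name, the wording and the idea that a locked Excel file surfaces as EBUSY are all assumptions made for this illustration, not code from my application.

```javascript
// Hypothetical mapping from low-level error codes to actionable messages.
const FRIENDLY_MESSAGES = new Map([
  ["EBUSY", "Unable to read Excel file. Typically this is because the file is already open. Close Excel and click to retry."],
  ["ENOENT", "Unable to find the Excel file. Check that it has not been moved or renamed and click to retry."],
]);

// Returns a message for the user and whether offering a retry makes sense.
function describeError(error) {
  const message = FRIENDLY_MESSAGES.get(error.code);
  if (message !== undefined) {
    return { message, canRetry: true };
  }
  // Nothing the user can fix: say so plainly instead of showing a code.
  return {
    message: "Something went wrong on our side. We're working on a fix and will be back shortly.",
    canRetry: false,
  };
}

console.log(describeError({ code: "EBUSY" }).message);
```

The raw code can still go to a log file where a developer will find it; it just doesn't need to be the thing the user reads.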

2013-02-11

There is No Feature Flag Conspiracy

I know what they say, that if there were a conspiracy we would deny its existence but honestly those of us who support feature flags didn’t even know that we were supposed to be conspiring. In either case one of those smart guys I’ve talked about in the past, Amir Barylko, just wrote a blog post in which he derides the idea of feature flags and supports branches instead.

The definition which Amir gives of feature flags doesn’t sit entirely well with me. A feature flag is a branch in the flow of the application which disables a chunk of functionality for some or all users. These flow branches can be either compile time switches using ifdefs, if statements in the code or even a strategy pattern where different strategies are plugged in.
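To make that definition concrete, here is a minimal sketch of the if statement and strategy flavours. The flag name, the job object and the notifier functions are all made up for the example.

```javascript
// Hypothetical flag store; in a real application this might be loaded
// from configuration.
const flags = { emailOnCompletion: false };

// Flavour 1: a plain if statement around the new functionality.
function finishJob(job, outbox) {
  job.done = true;
  if (flags.emailOnCompletion) {
    outbox.push(`email to ${job.owner}`);
  }
  return job;
}

// Flavour 2: a strategy is plugged in depending on the flag, so the
// calling code never mentions the flag at all.
const notifiers = {
  none: () => null,
  email: (job) => `email to ${job.owner}`,
};

function pickNotifier() {
  return flags.emailOnCompletion ? notifiers.email : notifiers.none;
}

const outbox = [];
finishJob({ owner: "simon" }, outbox);
console.log(outbox.length); // prints 0 while the flag is off
```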

Let’s take a step back for a moment and talk about why we branch our code. I think there are a number of reasons for and applications of branching.

You want to develop a feature which will take some time and will not be immediately visible to users. So perhaps you want to add a function which e-mails users when some long running action they have started completes. This will take more than the 3 hours you have left in the day so you branch and, when the feature is complete, you merge it back into trunk and all is well. These feature branches may live for hours, days, weeks or even months and may be worked upon by several people.
You’re supporting multiple versions of the application. I know this is hard to remember in the reality of software as a service but there was once a time when you would deliver a product and then supply updates and patches for a time while having already started developing the next version. You don’t want users of the previous version to get the full update to the next version without paying but at the same time you don’t want them to suffer a bug which is fixed in the next version. In this case you maintain a branch for each release in which you can add fixes.
You’re working on a feature and something more important comes along so you shove your work in a branch and come back to it later.
You’re using quality branches, where you promote features from dev to test to production. This ensures that only features which are ready for production make it there and they make it through test.
To support multiple parallel developments against different technologies. Perhaps your product has a version for Windows and a version for UNIX. Between the two of them there likely exists a shared kernel but you keep different branches for each platform and merge into them from the shared kernel.

Huh, I was sort of expecting that to be a longer list. I may have missed something in here but I don’t think so. I don’t believe that either branching or feature flags is the magic pill which solves each one of these use cases optimally.

Amir’s argument boils down to: git is really good at merging and feature flags add a lot of overhead without offering anything that branching doesn’t. Well let’s look at each of the cases above and see if that holds true.

Feature Branches

Feature branches keep new functionality isolated from users until it is ready. I don’t like the idea of branching for features right off the bat. It gives the idea that the feature is a big monolithic piece of code which needs to be completed before it is ready for users. Why is that dangerous? Because developing functionality in isolation leads to functionality which users don’t really want or which doesn’t work in the way the users envisioned. Feature flags are a clear winner here in my mind. Feature flags allow developers to turn features on at runtime to test out new functionality on small sets of users or try them out at certain times of day.
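Turning a feature on for a small set of users can be as simple as the sketch below. This is just one possible approach, a percentage rollout keyed on the user id; the hashing scheme and the names are assumptions for illustration and not something taken from Amir's post or any particular library.

```javascript
// Deterministic bucket in the range 0..99 derived from the flag name and
// user id, so the same user always gets the same answer for a given flag.
function bucketFor(userId, flagName) {
  const text = `${flagName}:${userId}`;
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    // Simple multiplicative string hash, kept as an unsigned 32-bit integer.
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

// True when the user falls inside the percentage the flag is rolled out to.
function isEnabledFor(userId, flagName, rolloutPercent) {
  return bucketFor(userId, flagName) < rolloutPercent;
}

// Roll a hypothetical flag out to roughly 10% of users.
console.log(isEnabledFor("user-42", "emailOnCompletion", 10));
```

Raising the percentage only ever adds users, so nobody who already has the feature loses it as the rollout widens.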

In many large systems the testing environment is different enough from production that it is almost impossible to know what will happen when a feature is turned on. The production environment can be too expensive to reproduce or too diverse. In these cases you turn the features on for a few people and see what happens. It sounds crazy but there are legitimately some things you can’t test outside of production. There are also some things you want to test in production. If you’re scientific about making changes to your application then you want to do A/B testing against a real user base.

Branching for features delays integration with mainline. It means that you’re purposefully adding uncertainty about what will happen when the feature is merged back in. In the comments to another post on this subject, Dylan Smith’s, Amir suggests that the delayed integration problem is a myth. Each feature branch is responsible for merging from mainline, he claims, so in fact integration is happening all the time. That’s not true. Integration is happening between mainline and each feature branch but not between different feature branches.

If features are very small and short lived then branching remains an option. In my mind if the development of a feature is expected to take more than 24 hours then a branch is not the right approach.

Branching to Support Multiple Releases

I think this is a totally legitimate use of branching and a terrible place to use feature flags. The reason here is that the changes to the maintenance branch are always enabled. You would never want to put in a change to a maintenance branch which wouldn’t be enabled right away. Source control systems like git are magnificent at merging between code bases. I remember merging between branches in ClearCase and in Subversion. These legacy source control systems were not good at this use case - it was almost always easier to just manually make the changes in two places. It is really difficult to scale beyond two releases. When I was working with ClearCase we supported about 10 simultaneous releases. It was not fun.

Branching to Shelve Work

Again legitimate. When something more important comes along you may not have time to finish the line on which you’re working so the code base might not compile. Committing this back into mainline would be disastrous (for a pretty benign definition of disaster, I admit). These branches are fine in my mind because they’re really short lived. Remember that: branches should be short lived.

Quality Branches

I don’t really think these differ much from feature branches. It is just a mechanism for promoting code between environments. Feature flags are a better choice here as they give better granularity for promotion and avoid the nightmare that is trying to isolate, at merge time, what needs to be promoted. Merging upstream is a difficult problem. I’ve seen it used before in environments which strongly regulate what goes into production. We had terrible issues with merging and breaking upstream code. It was especially problematic as this was many years ago before automated testing had really caught on. We did a lot of “emergency” fixes.

Supporting Multi Environment Development

Yuck. This is a totally illegitimate place to use branching. Different code for different environments should be isolated into different projects in mainline. If you create a branch for each one you’re buying a huge number of branches. Remember when I said I worked at a place where we supported 10 releases? We also supported 31 platforms. 31. Heck we supported 4 different builds on Solaris: x86, x86_64, sparc and sparc64. Then we brought on a new compiler for Solaris which created binaries which were not backwards compatible. That added 3 more platforms, just for Solaris. I cannot even imagine how much time would have been wasted merging between branches if we had a different branch for each. It would have taken 90% of our development cycle just to push stuff to the various branches.

Arguments Against Feature Flags

A common argument I hear is that feature flags need to be toggled in a lot of places. If a change happens in the UI and the back end then that’s at least 2 places you need to put in the flag. This is a reasonable concern. The beauty of feature flags is that once the feature is fully live then the flags can be removed. Centralizing feature flag implementations is also key to implementing feature flags. Testing feature flags should be isolated to a single piece of code. I would also recommend leaning on the compiler a bit for feature flags. Instead of using strings to denote the name of a flag use an enum. In this way it is easy to find all references to the value and quickly jump to the feature flags used in code.
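Here is a minimal sketch of what I mean by centralizing. JavaScript has no enum, so a frozen object stands in for one here; in TypeScript or C# a real enum would let the compiler catch a mistyped flag name for you. The flag names and the FeatureFlags class are invented for the example.

```javascript
// Stand-in for an enum: every flag the application knows about lives here,
// so searching for references to Feature.X finds every use of that flag.
const Feature = Object.freeze({
  EmailOnCompletion: "EmailOnCompletion",
  NewInvoiceScreen: "NewInvoiceScreen",
});

// The single piece of code that decides whether a flag is on.
class FeatureFlags {
  constructor(enabledFeatures) {
    this.enabled = new Set(enabledFeatures);
  }

  isOn(feature) {
    // Fail loudly on names which are not in the "enum" rather than
    // silently treating a typo as "off".
    if (!Object.values(Feature).includes(feature)) {
      throw new Error(`Unknown feature flag: ${feature}`);
    }
    return this.enabled.has(feature);
  }
}

const featureFlags = new FeatureFlags([Feature.EmailOnCompletion]);
console.log(featureFlags.isOn(Feature.EmailOnCompletion)); // prints true
console.log(featureFlags.isOn(Feature.NewInvoiceScreen)); // prints false
```

Once a feature is fully live, deleting its entry from Feature makes every leftover check fail, which is a handy way to find the flags that need removing.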

I really like feature flags for the flexibility they offer me for when I deploy features and for whom they are turned on. They’re not good in every situation and, as with all things, you have to think before you use them. There is a very real possibility that feature flags will grow out of control and turn your code into spaghetti. These concerns should be addressed the same way you would address any code quality concern: with thought and refactoring.

2013-02-08

Recreating Visualizations - CodeEval

I come across great visualizations every day. Every time I see one I now start thinking about how I could recreate it using SVG or even how I could improve it. The Recreating Visualizations series of blog articles is going to explore some of these.

There was an interesting little article on reddit a few days back about some of the most popular programming languages of 2012. The results are a bit questionable but I did like the visualization they used. I say that the results are questionable because CodeEval are only looking at their own site’s results. They failed to state that clearly. To me the gold standard of programming language usage statistics is TIOBE. They publish their methodology openly and I cannot find fault with it. However their visualizations are not very attractive. Let’s see what happens when we combine the great visualizations of CodeEval and the statistics of TIOBE using d3.js.

A common mistake with bubble graphs of this sort is the scale of the bubbles. When drawing the graph we specify each circle by its radius, but if we make the radius directly proportional to the value then the circles end up misleading, because the area of a circle is pi * r^2 and so grows with the square of the value. We need to adjust for this when building the graph by taking the square root of the value as the radius.
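A square root scale of this sort is only a line or two even without d3. The sketch below is a hand-rolled stand-in rather than the d3 scale itself, and the values 10 and 20 are made-up numbers, not TIOBE figures.

```javascript
// Builds a scale mapping a value in [0, maxValue] to a radius in
// [0, maxRadius] so that circle AREA, not radius, is proportional to value.
function sqrtRadiusScale(maxValue, maxRadius) {
  return (value) => maxRadius * Math.sqrt(value / maxValue);
}

const radius = sqrtRadiusScale(20, 100);
const area = (r) => Math.PI * r * r;

// A value half as big should get a circle with half the area.
console.log(area(radius(10)) / area(radius(20))); // very close to 0.5
```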

To start we build the data by copying and pasting from TIOBE

If you’ve read my multi-part series on introducing data visualizations for HTML5 then this should all look pretty familiar. If you haven’t read it then you should! I worked hard on it.

Initial Bubbles

Here we are setting up a new entry in our Graph module. You’ll notice on line 15 I set up the square root scale and set it to fill the entire width of the SVG. This is a bit of a naive approach and we’ll refine it later. We also placed each bubble in a row so none of them would overlap. It is going to look like

Bubbles!

Heck, already that’s kind of nifty. If we compare it with the CodeEval one, though, it doesn’t have the same sorts of cool overlaps and we’re missing labels. The labels are easy so let’s start with them.

Labels

In a previous post I mentioned that it was difficult to center strings in an SVG. This, as it turns out, isn’t true! You can make use of an attribute called text-anchor which sets where the anchor point is for a block of text. In our case we want to set its value to “middle” which means that whatever x value we gave should be treated as the center of the string.
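To show the attribute outside of d3, here is a tiny sketch which builds the SVG markup for a centered label as a string. The helper names are my own; text-anchor and its "middle" value are standard SVG.

```javascript
// Escape the characters which are special inside XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Returns an SVG text element whose horizontal center sits at x.
function centeredLabel(text, x, y) {
  return `<text x="${x}" y="${y}" text-anchor="middle">${escapeXml(text)}</text>`;
}

console.log(centeredLabel("C++", 50, 40));
// prints <text x="50" y="40" text-anchor="middle">C++</text>
```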

This will add the name of the language to all our bubbles. You’ll notice that for the x and y values I’ve taken the values from the data array. In a step you’re about to see we calculate the values for the radius and coordinates for each bubble. Saving it back into the data array means we only ever have to calculate it once.

Calculating Bubbles

When CodeEval made their graphic they probably did so using Photoshop or some other piece of graphical design software. This is typical of a lot of the visualizations we see on the web. As with most computery things if you have to do something once then you can do it by hand but if you do it more than once script it. Besides, we don’t have the benefit of graphical design software when we’re drawing an SVG so we need to do a little math to try to get our bubbles to fit in a nice way.

I started by thinking that I would like the majority of the image to be taken up by bubbles. If our bubbles were non-overlapping rectangles then we would end up with an algorithm which looked a lot like the NP-hard optimization version of the knapsack problem. Fortunately, we don’t need to do perfect packing so we can use linear approximations to come up with reasonable values.

I figure out the total area of the SVG and then use it to manipulate the scale. The magic number there on line 2 was found by doing a bit of guess and check. Anything between 1.5 and 2 seemed to create a reasonable fill. Now we need to figure out the location of the bubbles; this is a bit harder. There are a couple of strategies which can be used; I’ve opted to go with the simplest here in the interests of having something more to blog about later.

We start by pretending we have a bubble in the middle of the canvas. This is done to stop us from having a boring bubble at the center of the canvas every time. From this we can figure out the placement of the next bubble. We would like the next bubble to have a little bit of overlap with the current bubble but not too much. If we add some padding to the current bubble then we get a new circle of possible locations for the center of the next bubble.

Next we pick a random X value somewhere within the radius of the padding circle

Now we can calculate the Y value as we know the radius of the outer circle. You will need to use TRIANGLES to do this. Well one triangle, but that doesn’t invalidate the point that your high school math is actually useful.

That’s the center of your new bubble

I randomly moved this point to different quadrants so we didn’t always have a bubble attached to another in the top right.

The drawback with this strategy is that you may have bubbles which end up off screen. I solved this by invalidating positions which were off screen and calculating a new random point for them.
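The placement steps described above can be sketched roughly as follows. This is a reconstruction from the description rather than the exact code in the repository, so the function name, the options and the choice to treat "off screen" as "any part of the bubble outside the canvas" are all assumptions.

```javascript
// previous: { x, y, r } of the bubble we are attaching to.
// Returns { x, y, r } for the next bubble, or null if no on-screen
// position was found within maxAttempts tries.
function placeNextBubble(previous, nextRadius, options) {
  const { padding, width, height, random = Math.random, maxAttempts = 100 } = options;
  // Circle of possible centers: the previous bubble plus some padding.
  // The bubbles overlap as long as padding is smaller than nextRadius.
  const ring = previous.r + padding;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Random X offset somewhere within the radius of the padding circle.
    const dx = random() * ring;
    // The one triangle: dx^2 + dy^2 = ring^2.
    const dy = Math.sqrt(ring * ring - dx * dx);
    // Flip into a random quadrant so we don't always attach top right.
    const x = previous.x + (random() < 0.5 ? -dx : dx);
    const y = previous.y + (random() < 0.5 ? -dy : dy);

    const onScreen =
      x - nextRadius >= 0 && x + nextRadius <= width &&
      y - nextRadius >= 0 && y + nextRadius <= height;
    if (onScreen) {
      return { x, y, r: nextRadius };
    }
    // Off screen: throw this position away and try another random point.
  }
  return null;
}

const first = { x: 200, y: 200, r: 50 };
console.log(placeNextBubble(first, 30, { padding: 10, width: 400, height: 400 }));
```

Capping the number of attempts is my own addition; it stops the loop from spinning forever when a bubble simply cannot fit.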

This fits into our bubble calculations like so

Putting it Together

Now we have a way to build the locations of each bubble we can go ahead and combine it with the actual bubble drawing we did earlier.

This gives us something which looks like

I like this a lot! However because our bubbles only care if they overlap the previous we do sometimes end up with a mess like

We’ll look at some ways to deal with this in an upcoming post. You can get the full code for this over at GitHub.

2013-02-07

Starting with Windows Workflow Foundation

I happened to be talking to a friend of mine who was looking for some advice about how to deal with a generalization problem he was having. His problem was around invoices, as so many problems seem to be. His software supports a single workflow for the processing of invoices. It is a tried and tested workflow but he wanted to be able to offer his clients some configurability around the workflow.

I suggested that he take a look at a workflow engine. Workflow engines are systems which allow for a rules based approach to the processing of actions. Basically it is a state machine with actions which can occur in each state. They can usually be modeled as a flow diagram complete with branches and loops. I don’t usually suggest using workflow engines; in my experience they’re over complicated and far more trouble than they’re worth. However a lot of my reservations are due to the ways in which people make use of them.
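If the state machine idea is unfamiliar, the toy sketch below shows the core of it: a table of states, the actions allowed in each, and where each action leads. It is a plain JavaScript illustration with a made-up invoice workflow, and has nothing to do with how any real workflow engine is implemented.

```javascript
// Hypothetical invoice workflow: each state lists the actions which can
// occur in it and the state each action leads to.
const invoiceWorkflow = {
  received: { approve: "approved", reject: "rejected" },
  approved: { pay: "paid" },
  rejected: {},
  paid: {},
};

const hasOwn = (object, key) => Object.prototype.hasOwnProperty.call(object, key);

// Applies an action in the current state and returns the next state.
function applyAction(workflow, state, action) {
  if (!hasOwn(workflow, state)) {
    throw new Error(`Unknown state: ${state}`);
  }
  if (!hasOwn(workflow[state], action)) {
    throw new Error(`Action "${action}" is not allowed in state "${state}"`);
  }
  return workflow[state][action];
}

let state = "received";
state = applyAction(invoiceWorkflow, state, "approve");
state = applyAction(invoiceWorkflow, state, "pay");
console.log(state); // prints "paid"
```

Offering clients some configurability then amounts to letting the table vary per client while the code which walks it stays the same.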

Frequently developers include them in their applications to allow users to define their work processes. The idea is that in an ever changing business you shouldn’t need to involve those developer guys to change your workflow. Besides, those developers are very smart: they just sit around and debate the relative merits of Wheel of Time and Song of Ice and Fire. A workflow engine empowers the business to make their own changes. There is a whole class of workflow engines designed to manage business processes. Heck there is even a language called Business Process Execution Language or BPEL which can be fed into these engines by the end users as they figure out their workflow. At least that is the theory.

In my experience workflows become complicated very quickly, growing beyond the ability of business people to manage. I’m not saying that business people are stupid, just that their expertise lies elsewhere. Simon’s rule about workflows is much the same as my rule about reporting: any workflow sufficiently complicated to be useful is too complicated for users to manage themselves. If the users aren’t going to be building the workflow then you’re likely going to recompile and redeploy your application anyway so one of the big advantages of using a workflow engine is gone. Workflows violate the idea of keeping programming as simple as possible.

So why am I recommending them? In this case my friend wasn’t going to let users write their own workflows and his workflows were very simple. His current flow is captured in a C# file of no more than 100 lines.

There are a ton of workflow engines out there but I suggested using Windows Workflow Foundation which is, confusingly, known as WF. I’ve used it before to describe builds in TFS but never in one of my own applications. The information out there on WF is a bit confusing as there are tutorials about version 3.5 and some about version 4, the two of which are as different as pasta and electronegativity. This getting started guide is all about version 4.

Start with a blank console application. Add the WF libraries to the project. These come with .NET 4 so you don’t have to hunt them down.

Workflow Libraries

Next you create an Activity. Activities are the core elements of a workflow. An activity can contain other activities. You can build your own custom activities or use one of the built in activities. The built in activities are not particularly useful other than the tasks related to the flow through the workflow such as if and while.

You should now have a blank canvas in front of you to which you can add activities. I added two code activities. The first was a simple console activity to write a string to the console

It is a pretty basic class. The only interesting piece is

public InArgument<string> Message { get; set; }

This property allows you to pass information in and out of the activity from the rest of the workflow. Obviously this is an input argument. I used an output argument in my other task:

This simply returns a random number. From these two tasks and the built in If activity I was able to make this simple activity for the all important task of cheese counting

Simple workflow

The workflow itself contains a variable called NumberOfCheeses which is scoped to the sequence. The output of GetARandomNumber is assigned to this variable

Activity properties

I also set the input properties of the two WriteToConsole activities

What will be printed

This simple workflow should be sufficient to test that we’ve grasped the concepts. How do we run it? Workflows can be run in a number of ways, they can be handed off to standalone workflow engines or you can use a self hosted workflow runtime. I chose the second one as it was far easier.

Here a new workflow invoker is created and given the main activity shown above. It is then started. If we execute the application and check out the console, the output from the WriteToConsole activity is shown. Easy peasy!

There is a lot more to workflow such as the ability to load and persist workflows from databases and hook into events in the workflow. I may do another blog once I’ve figured some of those out.

2013-02-06

Great Customer Service Pays

I have a membership at TekPub which is a provider of programming related videos. I really like the content, it is well thought out and they manage to get some really big names to do videos. Of course being a famous programmer is sort of like being the best tiddly-wink player in the world - not all that impressive to the larger world. The title sequences for the videos are one of the best parts, I don’t know if Rob Conery makes them himself but to me they’re super impressive.

What I don’t like about TekPub is actually getting to my videos. I find it to be awkward and it always takes me longer than I would like. My irritation with the site was compounded by the fact I had ribs for lunch and needed something to watch while eating them. Every second spent fiddling with the website was a second not eating ribs. Leveraging the relative anonymity of twitter I complained about it:

I love tekpub videos but the website is so hard to navigate around

- Simon Timms (@stimms) February 5, 2013

I returned to my ribs, satisfied that the world now knew what was going down.

I can never understand why my keyboard sticks

Within a couple of minutes Rob was on the twitter asking for details of my complaints. Now I’ve talked with Rob a couple of times in the past when I was using SubSonic a bunch. He can be a little... umm... scary. He is a smart dude but can get a bit angry. I was a little afraid. He asked me to send him an e-mail with some details about what I didn’t like. Was it a trap so he could sign me up for mailing lists in revenge? Was he using the e-mail as a way to track me to my house? I sent him a list of things which could be improved. Then I hid.

There was no need to hide, though. Rob e-mailed back in a couple of minutes, he was super friendly and is going to implement a bunch of the features I wanted! For those he isn’t going to implement he either had a good technical reason not to or he had thought the improvement through way more than me and showed it to be unwise. I was delighted. Also relieved. I called my wife and told her she could unload the shotgun: Rob wasn’t coming.

I really appreciated the time that Rob took to answer my inquiries. A personal response takes a bit of time but it is so worth it. If you have a small company then it is this sort of user interaction that sets you apart from the big boys. I don’t know at what size of a company this gets lost but it seems to be somewhere between TekPub size and FogCreek size. There are a few other companies out there which have sold me using good customer service. Perforce and DigiCert jump to mind as prime examples. I recommend their products in a second now just because of their good service.

I bet I’ll be able to see the things Rob agreed to implement soon. It is kind of exciting that in some way I’ve had a positive impact on the site. Now I’ve got to find some dental floss, stupid ribs.
