Recreating Visualizations: CodeEval

I come across great visualizations every day. Every time I see one I now start thinking about how I could recreate it using SVG or even how I could improve it. The Recreating Visualizations series of blog articles is going to explore some of these.


There was an interesting little article on Reddit a few days back about some of the most popular programming languages of 2012. The results are a bit questionable but I did like the visualization they used. I say that the results are questionable because CodeEval only looks at its own site’s results, and it failed to state that clearly. To me the gold standard of programming language usage statistics is TIOBE. They publish their methodology openly and I cannot find fault with it. However, their visualizations are not very attractive. Let’s see what happens when we combine the great visualizations of CodeEval and the statistics of TIOBE using d3.js.

A common mistake with bubble graphs of this sort is the scale of the bubbles. We draw each circle by giving its radius, but readers judge a bubble by its area, and the area of a circle is pi * r^2. If we scale the radius directly with the value, a language with twice the share gets a bubble with four times the area. We need to adjust for this when building the graph by using the square root of the value as the radius.
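To make the square root adjustment concrete, here is a minimal sketch in plain JavaScript. The helper names and the numbers are made up for illustration; they are not TIOBE figures and not the code from the original gist.

```javascript
// Hypothetical helper: map a data value to a radius so that the circle's
// AREA (pi * r^2), not its radius, is proportional to the value.
function radiusForValue(value, maxValue, maxRadius) {
  if (maxValue <= 0 || value < 0) {
    throw new RangeError("expected value >= 0 and maxValue > 0");
  }
  return maxRadius * Math.sqrt(value / maxValue);
}

function circleArea(radius) {
  return Math.PI * radius * radius;
}

// Illustrative shares only: one language at 20%, another at 5%.
const big = radiusForValue(20, 20, 100);
const small = radiusForValue(5, 20, 100);

// The value ratio is 4, and the area ratio comes out at (about) 4 as well.
// Scaling the radius linearly would have given an area ratio of 16.
const areaRatio = circleArea(big) / circleArea(small);
console.log(big, small, areaRatio);
```

A d3 square root scale does the same job; the sketch just shows the arithmetic behind it.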

To start, we build the data by copying and pasting from TIOBE.

If you’ve read my multi-part series on introducing data visualizations for HTML5 then this should all look pretty familiar. If you haven’t read it then you should! I worked hard on it.

Initial Bubbles

Here we are setting up a new entry in our Graph module. You’ll notice on line 15 I set up the square root scale and set it to fill the entire width of the SVG. This is a bit of a naive approach and we’ll refine it later. We also placed each bubble in a row so none of them would overlap. It is going to look like

Bubbles!

Heck, already that’s kind of nifty. Compared with the CodeEval one, though, it doesn’t have the same sort of cool overlaps, and we’re missing labels. The labels are easy so let’s start with them.

Labels

In a previous post I mentioned that it was difficult to center strings in an SVG. This, as it turns out, isn’t true! You can make use of an attribute called text-anchor which sets where the anchor point is for a block of text. In our case we want to set its value to “middle”, which means that whatever x value we give is treated as the horizontal center of the string.
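Here is a small sketch of what a centered label looks like as raw SVG markup. The `labelMarkup` helper and the bubble object shape (`name`, `x`, `y`) are assumptions made for this example; with d3 you would set the same attribute using `.attr("text-anchor", "middle")` on the text selection.

```javascript
// Escape the few characters that are unsafe inside SVG/XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build the markup for one label centered on a bubble.
// text-anchor="middle" makes x the horizontal center of the string;
// y is still the text baseline, so it is not vertically centered.
function labelMarkup(bubble) {
  return '<text x="' + bubble.x + '" y="' + bubble.y +
    '" text-anchor="middle">' + escapeXml(bubble.name) + "</text>";
}

console.log(labelMarkup({ name: "Java", x: 120, y: 80 }));
```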

This will add the name of the language to all our bubbles. You’ll notice that for the x and y values I’ve taken the values from the data array. In a step you’re about to see, we calculate the radius and coordinates for each bubble. Saving them back into the data array means we only ever have to calculate them once.

Calculating Bubbles

When CodeEval made their graphic they probably did so using Photoshop or some other piece of graphical design software. This is typical of a lot of the visualizations we see on the web. As with most computery things, if you have to do something once then you can do it by hand, but if you do it more than once, script it. Besides, we don’t have the benefit of graphical design software when we’re drawing an SVG so we need to do a little math to try to get our bubbles to fit in a nice way.

I started by thinking that I would like the majority of the image to be taken up by bubbles. If our bubbles were non-overlapping rectangles then we would end up with an algorithm which looked a lot like the NP-hard optimization version of the knapsack problem. Fortunately, we don’t need to do perfect packing so we can use linear approximations to come up with reasonable values.

I figure out the total area of the SVG and then use it to manipulate the scale. The magic number there on line 2 was found by doing a bit of guess and check. Anything between 1.5 and 2 seemed to create a reasonable fill. Now we need to figure out the location of the bubbles, which is a bit harder. There are a couple of strategies which can be used; I’ve opted to go with the simplest here in the interests of having something more to blog about later.
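The original gist is not reproduced here, so the following is only one possible reading of the area-based scaling. It assumes the magic number divides the canvas area to give a target for the summed bubble areas; the function name and parameters are hypothetical.

```javascript
// Hypothetical sketch: choose radii so the bubble areas are proportional to
// the values and add up to (canvas area / fillFactor).
// ASSUMPTION: this is a guess at how the "magic number" is applied.
function radiiToFill(values, width, height, fillFactor) {
  const total = values.reduce((sum, v) => sum + v, 0);
  if (total <= 0 || fillFactor <= 0) {
    throw new RangeError("values must sum to > 0 and fillFactor must be > 0");
  }
  const targetArea = (width * height) / fillFactor;
  const areaPerUnit = targetArea / total; // area granted per unit of value
  // area = pi * r^2, so r = sqrt(area / pi)
  return values.map(v => Math.sqrt((areaPerUnit * v) / Math.PI));
}

// Illustrative values only, on a 400 x 300 canvas.
const radii = radiiToFill([4, 1], 400, 300, 2);
console.log(radii);
```

With a fill factor of 2 the bubbles would cover about half of the canvas if none of them overlapped; smaller factors give bigger bubbles.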

We start by pretending we have a bubble in the middle of the canvas. This is done to stop us from having a boring bubble at the center of the canvas every time. From this we can figure out the placement of the next bubble.  We would like the next bubble to have a little bit of overlap with the current bubble but not too much. If we add some padding to the current bubble then we get a new circle of possible locations for the center of the next bubble.

padding

Next we pick a random X value somewhere within the radius of the padding circle.

calculation1

Now we can calculate the Y value, as we know the radius of the outer circle. You will need to use TRIANGLES to do this. Well, one triangle, but that doesn’t invalidate the point that your high school math is actually useful.

calculation2

That’s the center of your new bubble.

Bubbles!

I randomly moved this point to different quadrants so we didn’t always have a bubble attached to another in the top right.

The drawback with this strategy is that you may have bubbles which end up off screen. I solved this by invalidating positions which were off screen and calculating a new random point for them.
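The placement steps above can be sketched in plain JavaScript as follows. The function names, the `rng` parameter, the retry limit and the reading of "off screen" as "any part of the bubble outside the canvas" are all assumptions made for this example, not the original code.

```javascript
// Pick a point on the circle of radius (current.r + padding) around the
// current bubble. rng is any function returning a number in [0, 1).
function nextCenter(current, padding, rng = Math.random) {
  const ring = current.r + padding;            // radius of the padding circle
  const dx = rng() * ring;                     // random X offset within that radius
  const dy = Math.sqrt(ring * ring - dx * dx); // one triangle: dx^2 + dy^2 = ring^2
  const signX = rng() < 0.5 ? -1 : 1;          // flip into a random quadrant
  const signY = rng() < 0.5 ? -1 : 1;
  return { x: current.x + signX * dx, y: current.y + signY * dy };
}

// ASSUMPTION: a bubble counts as on screen only if it fits entirely.
function isOnScreen(point, r, width, height) {
  return point.x - r >= 0 && point.x + r <= width &&
         point.y - r >= 0 && point.y + r <= height;
}

// Keep drawing random points until one is on screen; give up after maxTries.
function placeBubble(current, r, padding, width, height, rng = Math.random, maxTries = 100) {
  for (let i = 0; i < maxTries; i++) {
    const p = nextCenter(current, padding, rng);
    if (isOnScreen(p, r, width, height)) {
      return { x: p.x, y: p.y, r: r };
    }
  }
  return null; // no valid position found
}

// Start from a pretend bubble in the middle of a 400 x 300 canvas.
const nextBubble = placeBubble({ x: 200, y: 150, r: 40 }, 30, 10, 400, 300);
console.log(nextBubble);
```

Passing the random source in as a parameter is only there to make the sketch easy to test; `Math.random` is the default.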

This fits into our bubble calculations like so

Putting it Together

Now that we have a way to build the locations of each bubble, we can go ahead and combine it with the actual bubble drawing we did earlier.

This gives us something which looks like

Screen Shot 2013-02-08 at 9.42.31 AM

I like this a lot! However, because each bubble only cares about overlapping the previous one, we do sometimes end up with a mess like

Screen Shot 2013-02-08 at 10.06.17 AM

We’ll look at some ways to deal with this in an upcoming post. You can get the full code for this over at GitHub.

HTML 5 Data Visualizations – Part 6 – Visual Jazz

Note: I will be presenting a talk on data visualization in HTML5 on February the 14th at the Calgary .net user group. Keep an eye on http://www.dotnetcalgary.com/ for details. This blog is one in a series which will make up the talk I will be giving. I’m planning for this to be the final installment of this series. However, I’ve enjoyed playing with d3.js so much that I will very likely make visualization using it an ongoing theme on this blog.

I’ve never considered myself much of an artist, as my poor school teachers can attest, but I do like this visualization design. In the last part of the series we figured out how to make a simple bar chart using d3.js. But this isn’t going to impress your boss because your boss read an article last week about HTML5 and how it is better than Excel (I swear to you there are articles like this in “Boss Magazine” and “Pointy Hair Weekly”). The graph we made could have been created in Excel so let’s jazz it up a bit.

Animation

To start with, let’s add animation, which is super simple with transitions. You can animate multiple properties and even add effects like bounce. Here is an example of loading the graph using transitions. I refreshed it a couple of times in the video because the effect is so cool.

[wpvideo ZbF9usve]

In this case all that was added was a couple of lines describing what to animate (the x attribute) and what effect to use (bounce). The added commands are there on lines 9-11. Transition tells d3 to animate from the previous value of an attribute to the new value. In this example we haven’t given any initial x value, so the rectangles start off at the default x value of 0. Ease instructs d3 to use an effect, in this case the bounce effect. Finally, duration tells d3 to make the animation take 750ms. Most properties can be animated. Here we have dropping and bouncing:

[wpvideo gjBv23aE]

And this is my favorite: growing. In it you’ll notice that I had to set up a default value for y and transition both y and the height. That’s because 0,0 is in the top left, so otherwise the bars would grow downwards.

[wpvideo R6FAvBoa]
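To see why the growing effect needs both y and height, here is a small sketch of the start and end values in plain JavaScript. The helper names are hypothetical and the interpolation is linear for simplicity. With d3 the `from` values would be set with `.attr()` before calling `.transition()` and the `to` values after it.

```javascript
// Hypothetical helper: start and end attributes for a bar that grows upwards
// in SVG coordinates, where (0, 0) is the top left corner.
function growFrames(barHeight, chartHeight) {
  return {
    from: { y: chartHeight, height: 0 },                     // collapsed on the baseline
    to: { y: chartHeight - barHeight, height: barHeight }    // top edge moved up
  };
}

// Linear interpolation between the two frames, t in [0, 1].
function frameAt(frames, t) {
  return {
    y: frames.from.y + (frames.to.y - frames.from.y) * t,
    height: frames.from.height + (frames.to.height - frames.from.height) * t
  };
}

const frames = growFrames(120, 300);
const halfway = frameAt(frames, 0.5);

// y + height stays equal to the chart height, so the bottom of the bar
// stays pinned to the baseline while the top rises.
console.log(halfway, halfway.y + halfway.height);
```

If only the height were animated, y would stay fixed at the top of the bar and the rectangle would extend downwards from there.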

Interaction

Animations are all very well and are great for leveraging the halo effect to ensure that people are enthusiastic about your application, but they aren’t all that useful overall. Fortunately, d3.js defines the ability to add event listeners to your visualization, permitting interaction. When I first played with them I used them to change the colours of the bars of the graph as I hovered over them. In his D3 book “Interactive Data Visualization” Scott Murray points out that this effect can be better created using only CSS’s :hover pseudo-class. That’s unfortunate because up until I read that section it was going to be my example. Instead let’s try adding extra information to the bar.

This ended up being way more complex than I had originally planned so let’s build it up nice and slow. The first thing is that we add some additional information to each of the month bars.  Here we’ve added weekly percentages to each month.

We would like to divide the existing bars into bands when somebody mouses over them. To do this we can make use of the on() method. on() takes two arguments. The first is the name of the event to bind; in most cases this will be mouseover, mouseout or click. The second argument is a function to call when the event occurs.
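As a minimal sketch of that binding, the helper below attaches two handlers to a selection. `bindHover`, `showBands` and `hideBands` are hypothetical names for this example; `bars` is assumed to be a d3 selection, though anything with a chainable `.on(eventName, handler)` method will do.

```javascript
// Hypothetical helper: attach hover handlers to a selection of bars.
// .on() returns the selection, so the calls can be chained.
function bindHover(bars, showBands, hideBands) {
  return bars
    .on("mouseover", showBands)  // first argument: event name
    .on("mouseout", hideBands);  // second argument: function to call
}
```

What d3 passes to the handler depends on the version: older releases call it with the bound datum and index, while more recent ones pass the DOM event first, so check the documentation for the version you are using.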

That’s the easy part; the harder part is to come. We add a number of additional bars to the current bar.

On line 2 in this code we set up a new scale which generates a different colour for each entry. D3.js comes with a couple of built-in colour scales and here we’re using one with 10 colours. If this wasn’t a demo script I would derive my scale from the original bar colour. Line 3 is just a shortcut to the currently hovered bar. Line 4 gets the top of the currently selected bar; this will be where we start adding new bars. Line 5 is where things get interesting, and you may notice it looks somewhat familiar. In fact we’re using the same construct as earlier to define the bars. You’ll notice this select-data-enter pattern quite frequently in d3. The only complex attribute is the y attribute, which changes with each element because each element must start further down the bar.
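The y calculation is the fiddly part, so here is a sketch of it on its own in plain JavaScript. The `bandLayout` name is hypothetical, and the sketch assumes the weekly percentages are stored as fractions of the month that add up to 1.

```javascript
// Hypothetical helper: split one bar into stacked bands.
// Each band starts where the previous one ended, working down from the
// top of the bar (SVG y grows downwards).
function bandLayout(barTop, barHeight, weeklyShares) {
  let offset = 0;
  return weeklyShares.map(share => {
    const band = { y: barTop + offset, height: share * barHeight };
    offset += band.height;
    return band;
  });
}

// Illustrative numbers: a bar whose top is at y=100 and which is 200 tall.
const bands = bandLayout(100, 200, [0.25, 0.25, 0.5]);
console.log(bands);
```

Each entry could then feed the y and height attributes of one of the rectangles added by the select-data-enter step.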

All of this gets us something which looks like

[wpvideo mtzMgPaN]

There is an obvious flaw in this: moving the mouse off the chart doesn’t remove the bars. To fix this we add a transparent rectangle over the top of the whole bar to detect when the mouse moves out. The original bar can’t be used as it will be covered, which will cause the mouseout event to fire erratically.

Now it looks like

[wpvideo LobFZMpn]

Conclusion

We’ve only scratched the surface of the cool visualizations which can be created with d3.js. HTML5 visualizations are a great way to help people understand data. There is so much information available in the world today that it is almost impossible to understand it without some sort of visual aid. I’m going to continue blogging about data visualizations as I learn more about d3. You should learn along with me!