Simon Online

2013

2013-01-29

HTML 5 Data Visualizations - Part 6 - Visual Jazz

Note: I will be presenting a talk on data visualization in HTML5 on February the 14th at the Calgary .NET user group. Keep an eye on http://www.dotnetcalgary.com/ for details. This blog is one in a series which will make up the talk I will be giving. I’m planning for this to be the final installment of this series. However, I’ve enjoyed playing with d3.js so much that I will very likely make visualization using it an ongoing theme on this blog.

I’ve never considered myself much of an artist, as my poor school teachers can attest, but I do like this visualization design. In the last part of the series we figured out how to make a simple bar chart using d3.js. But this isn’t going to impress your boss because your boss read an article last week about HTML5 and how it is better than Excel (I swear to you there are articles like this in “Boss Magazine” and “Pointy Hair Weekly”). The graph we made could have been created in Excel so let’s jazz it up a bit.

Animation

To start with let’s add animation, which is super simple with transitions. You can animate multiple properties and even add effects like bounce. Here is an example of loading the graph using transitions. I refreshed it a couple of times in the video because the effect is so cool.

[wpvideo ZbF9usve]

In this case all that was added was a couple of lines describing what to animate (the x attribute) and what effect to use (bounce). The added commands are there on lines 9-11. Transition tells d3 to animate from the previous value of an attribute to the new value. In this example we haven’t given any initial x value so the rectangles start off at the default x value of 0. Ease instructs d3 to use an effect, in this case the bounce effect. Finally duration tells d3 to make the animation take 750ms. Most properties can be animated. Here we have dropping and bouncing:

[wpvideo gjBv23aE]

And this is my favorite: growing. In it you’ll notice that I had to set up a default value for y and transition both y and the height. That’s because 0,0 is in the top left and the bars would otherwise grow down.

[wpvideo R6FAvBoa]
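The reason both y and height have to move together in the growing effect is easier to see as arithmetic. What follows is only a rough sketch of the geometry, not d3’s transition code: growFrame and its parameters are made-up names, and it assumes plain linear progress with no easing.

```typescript
// Hypothetical helper: the geometry of one bar at progress t (0 = start, 1 = end)
// when it grows upward from the chart's baseline. SVG's origin is the top-left
// corner, so the top edge (y) has to move up as the height increases.
interface BarFrame {
  y: number;
  height: number;
}

function growFrame(chartHeight: number, barHeight: number, t: number): BarFrame {
  const progress = Math.min(1, Math.max(0, t)); // keep t inside [0, 1]
  const height = barHeight * progress;
  return { y: chartHeight - height, height };
}

// A 120px bar in a 200px chart: y + height always equals 200, the baseline.
const growStart = growFrame(200, 120, 0);
const growMiddle = growFrame(200, 120, 0.5);
const growEnd = growFrame(200, 120, 1);
console.log(growStart, growMiddle, growEnd);
```

If only the height were animated, y would stay put and the bottom edge would be the one that moves, which is the growing down effect mentioned above.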

Interaction

Animations are all very well and are great for leveraging the halo effect to ensure that people are enthusiastic about your application, but they aren’t all that useful overall. Fortunately, d3.js defines the ability to add event listeners to your visualization, permitting interaction. When I first played with them I used them to change the colours of the bars of the graph as I hovered over them. In his D3 book “Interactive Data Visualization” Scott Murray points out that this effect can be better created using only the CSS :hover pseudo-class. That’s unfortunate because up until I read that section it was going to be my example. Instead let’s try adding extra information to the bar.

This ended up being way more complex than I had originally planned so let’s build it up nice and slow. The first thing is that we add some additional information to each of the month bars. Here we’ve added weekly percentages to each month.

We would like to divide the existing bars into bands when somebody mouses over them. To do this we can make use of the on() command. on takes two arguments. The first is the name of the event to bind; in most cases this will be mouseover, mouseout or click. The second argument is a function to call when the event occurs.

That's the easy part; the harder part is to come. We add a number of additional bars to the current bar.

On line 2 in this code we set up a new scale which generates a different colour for each entry. D3.js comes with a couple of built in colour scales and here we’re using one with 10 colours. If this wasn’t a demo script I would derive my scale from the original bar colour. Line 3 is just a shortcut to the currently hovered bar. Line 4 gets the top of the currently selected bar; this will be where we start adding new bars. Line 5 is where things get interesting, and you may notice it looks somewhat familiar. In fact we’re using the same construct as earlier to define the bars. You’ll notice this select-data-enter pattern quite frequently in d3. The only complex attribute is the y attribute, which changes with each element as each element must start further down the bar.
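The y calculation is the fiddly part, so here is a rough sketch of just that arithmetic in isolation. It is not the code from the post: splitBar and the sample numbers are invented for illustration, and it assumes the weekly percentages add up to 100.

```typescript
interface Segment {
  y: number;
  height: number;
}

// Hypothetical helper: split one bar into bands, one per weekly percentage.
// barTop is the y coordinate of the top of the bar and barHeight its full height.
function splitBar(barTop: number, barHeight: number, percentages: number[]): Segment[] {
  let offset = 0; // total height of the bands already placed
  return percentages.map((percent) => {
    const height = (barHeight * percent) / 100;
    const segment = { y: barTop + offset, height };
    offset += height;
    return segment;
  });
}

// The first band starts at the top of the bar; each later band starts where
// the previous one ended.
const weekBands = splitBar(50, 200, [10, 40, 25, 25]);
console.log(weekBands);
```

In the d3 version the same running offset would feed the y attribute of each new rect.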

All of this gets us something which looks like

[wpvideo mtzMgPaN]

There is an obvious flaw in this in that moving the mouse off the chart doesn’t remove the bars. To fix this we add a transparent rectangle over the top of the whole bar to detect when the mouse moves out. The original bar can’t be used as it will be covered, which will cause the mouseout event to fire erratically.

Now it looks like

[wpvideo LobFZMpn]

Conclusion

We’ve only scratched the surface of the cool visualizations which can be created with d3.js. HTML5 visualizations are a great way to help people understand data. There is so much information available in the world today that it is almost impossible to understand it without some sort of a visual aid. I’m going to continue blogging about data visualizations as I learn more about d3. You should learn along with me!

2013-01-28

this vs. _this in TypeScript

One of the really difficult things to deal with in JavaScript is understanding exactly to what the variable “this” currently refers. “this” is a scope variable which means that it can change from line to line. In most languages this wouldn’t be a big deal because the number of scopes is small but with JavaScript so much is done with anonymous functions that things become confusing quickly.

In TypeScript many of these internal functions can be replaced with what I would call lambdas but I believe might also be known as “Fat Arrow Functions”. These are taken directly from ECMAScript 6. However there is a key difference between the new lambda functions and the current functions declared with the function keyword: the value of “this”. In a fat arrow function the value of “this” is bound to the outer scope, the lexical scope.

If you’re at all familiar with d3.js, which I’ve been using a lot as of late, you’ll know that the “on” function requires that “this” be permitted to be set by d3.

TypeScript forces this to be bound to the outer context by replacing our call to this with one to _this which is a new variable that TypeScript creates. Obviously this doesn’t work for our case as we expect this to be bound to whatever d3 has found during selection.

There are a number of possible fixes on StackOverflow but they seem overcomplicated to me and some of them are jQuery specific. Instead I recommend simply using a traditional function instead of the fat arrow function.
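To make the difference concrete without dragging d3 into it, here is a small sketch. fakeOn is a made-up stand-in for an “on” style API that sets “this” when it calls the handler; FakeElement and Chart are likewise invented for the example.

```typescript
// A stand-in for the element d3 would bind as `this`; not a real DOM node.
interface FakeElement {
  id: string;
}

// Hypothetical, much simplified `on`: it invokes the handler with `this`
// set to the element.
function fakeOn(element: FakeElement, handler: (this: FakeElement) => string): string {
  return handler.call(element);
}

class Chart {
  id = "chart";

  viaFunction(bar: FakeElement): string {
    // Traditional function: `this` is whatever the caller supplies.
    return fakeOn(bar, function (this: FakeElement) {
      return this.id;
    });
  }

  viaArrow(bar: FakeElement): string {
    // Fat arrow function: `this` stays bound to the enclosing Chart instance.
    return fakeOn(bar, () => this.id);
  }
}

const demoChart = new Chart();
console.log(demoChart.viaFunction({ id: "bar-3" })); // the bar's id
console.log(demoChart.viaArrow({ id: "bar-3" })); // the chart's id
```

The traditional function sees the element it was called on, which is what a d3 event handler wants. The fat arrow version quietly reports the chart instead.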

2013-01-25

HTML 5 Data Visualizations - Part 5 - D3.js

Thus far we’ve made use of either pure SVG or the Raphaël library. Both were pretty simple but using a library certainly made things a bit easier and gave us access to more powerful programming tools. Now if you happen to have gone over to the Raphaël web site you might have seen some really impressive demos of drawing a tiger in SVG, which just blows my mind. There are also a number of demos of graphs which are pretty impressive. However Raphaël is a general purpose SVG library and isn’t designed specifically for making data visualizations. There is another library called d3.js which has been created for exactly our purpose. Cool.

Okay well let’s do the same thing we did earlier and rebuild our original graphing demo making use of d3.js.

Yay, giant block of code! To start with it looks like we’ve managed to get rid of a lot of the declarations which cluttered our function last time. d3.js has a bunch of utility functions which allow our code to be more terse. For instance we no longer need our own function to find the maximum element in an array; d3.max will do that for us (line 21).

d3.js places a lot of emphasis on method chaining. If you want to set a number of attributes then you can just make multiple calls to attr. It is a clean way to programmatically build up properties of the graph objects.

Lines 11 through 13 create an SVG element in the given container. You’ll notice that as soon as we create the element with append we can start adding attributes to it.

Next we set up the scaling factors.

d3.js has some great tools for setting up graph scales. Here we see two different examples. The first uses an ordinal scale, which means that we have a discrete set of input, or domain, elements. Our data contains a number of months and we map each one of those to something in a range. We map our domain to a rangeBand in this example. A range band is a continuous interval and the function will find a number of evenly spaced discrete values within that band to mark as output points. We also give it a padding of 10% to allow for spacing between our columns.
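If the idea of a banded ordinal scale feels a bit abstract, the following hand-rolled sketch shows roughly what one computes. It is a simplification and not d3’s exact algorithm: bandScale is an invented name, and I’m assuming the padding is split evenly on either side of each band.

```typescript
interface Band {
  key: string;
  x: number;
  width: number;
}

// Simplified sketch of a banded ordinal scale: split [0, totalWidth] into
// equal steps, one per key, and leave `padding` (a fraction of each step)
// empty, half on either side of the band.
function bandScale(keys: string[], totalWidth: number, padding: number): Band[] {
  const step = totalWidth / keys.length;
  const width = step * (1 - padding);
  return keys.map((key, i) => ({ key, x: i * step + (step - width) / 2, width }));
}

// Four months across 400px with 10% padding: each band is about 90px wide
// and the bands start roughly 100px apart.
const monthBands = bandScale(["Jan", "Feb", "Mar", "Apr"], 400, 0.1);
console.log(monthBands);
```

Each month gets an x position and a width, which is exactly what a bar needs.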

For the vertical, or y scale we use a simple linear scale taken from the domain of 0 through to the maximum value in the data set. For the range we use 0 through the maximum height of the bar which we set up earlier.
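A linear scale is nothing more than proportional mapping. Here is a hand-written sketch of the idea; linearScale is my own stand-in rather than the d3 function, and the data and pixel height are made up.

```typescript
// Hand-rolled sketch of a linear scale: map a value in the domain
// proportionally onto the range and return the result.
function linearScale(domain: [number, number], range: [number, number]): (value: number) => number {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const monthlyTotals = [12, 30, 45, 60]; // made-up data
const maxBarHeight = 300; // made-up pixel height
const yScale = linearScale([0, Math.max(...monthlyTotals)], [0, maxBarHeight]);

// The largest value fills the full height; half of it fills half the height.
console.log(yScale(60), yScale(30), yScale(0));
```

Setting the top of the domain to the maximum of the data is what guarantees the tallest bar exactly fits the space available.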

Here we are setting up the x axis labels by appending a new element to the graph and setting the properties.

Finally the actual bars of the graph are set up. Using the data directive we set up the data used to drive our graph. Enter acts much like a map directive which calls the code that follows for each element of the data. This is where we use the various scaling functions we set up earlier.

The result?

Graphtastic!

I really like the declarative syntax of d3 and I’m going to tie my horse to it for future data visualization projects.

2013-01-24

Your Data isn't an Avocado

At my local supermarket you can buy avocados two different ways: either you get the individual avocados or you buy a bag of five avocados. The individual avocados are never ripe; they are as hard as a governess in a 19th century English novel. The bags of five are usually borderline overripe. This means that if you want to make anything with avocado the day you’re shopping you have to be prepared to eat five avocados in a single sitting. Now I like guacamoles as much as the next guy but five avocados is a hell of a lot.

I will have the enchilada platter with two tacos and no guacamoles

It is difficult to get just the right number and ripeness of avocados for my liking. Usually I have to buy a bunch of bulk avocados and rotate them out of the fridge one or two at a time so they’re not all ripe at the same time. A lot of application development is trying to get people data which is ripe. But you know what? Sometimes you can buy the data earlier and wait for it to ripen.

I don’t get it.

Yeah it isn’t the best analogy but I just ate 5 avocados worth of guacamole so I’m not all that coherent. What I’m getting at is that even though people tell you they absolutely need the latest data to make their decisions they don’t. I work in the oil and gas market and the majority of the people with whom I deal are the tree hating, printing reports type. If they are printing reports then they’re never going to have the latest data. The state of the system can change radically even from the time they hit print to the time they pick the paper up on the printer.

It is a bit of a change of mindset for developers to appreciate the fact that they don’t always have to query the database for information. Caching isn’t a new concept but typically it has only been used for expensive operations. What I suggest is that you should flip your mindset around caching from “cache when expensive” to “query when stale”. Cache everything.

There are some great tools and techniques out there for doing this at an application level. One of my favorites is to make use of an aspect weaver like PostSharp to wrap data queries. This allows you to write your code as normal but simply annotate the repository methods with an attribute which will cause the weaver to intercept the method invocation and pull the answer from the cache instead of from source.

The only obstacle to caching is knowing when to invalidate the cache and cause a requery. That is where I would suggest you spend your business analysis time. How frequently does data change? How much importance should be placed on having fresh data? If you happen to have an event log from an active system then it is pretty easy to calculate how frequently data changes.
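To show what “query when stale” might look like in its simplest form, here is a sketch. It has nothing to do with PostSharp: StaleCache, loader and maxAgeMs are all names I’ve invented, the loader stands in for a repository method, and the clock is passed in so the behaviour can be checked without waiting around.

```typescript
interface CacheEntry<T> {
  value: T;
  fetchedAt: number;
}

// Hypothetical "query when stale" cache: answers come from memory until they
// are older than maxAgeMs, and only then does the loader (the query) run again.
class StaleCache<T> {
  private entries: { [key: string]: CacheEntry<T> } = {};

  constructor(
    private loader: (key: string) => T,
    private maxAgeMs: number,
    private now: () => number = () => Date.now()
  ) {}

  get(key: string): T {
    const hit = this.entries[key];
    const time = this.now();
    if (hit !== undefined && time - hit.fetchedAt < this.maxAgeMs) {
      return hit.value; // still fresh: no query
    }
    const value = this.loader(key); // missing or stale: go back to the source
    this.entries[key] = { value, fetchedAt: time };
    return value;
  }
}

// Usage with a fake clock and a loader that counts how often it is called.
let fakeClock = 0;
let queryCount = 0;
const wellCache = new StaleCache((key: string) => `${key} v${++queryCount}`, 1000, () => fakeClock);
wellCache.get("wells"); // first call: queries the source
wellCache.get("wells"); // second call: served from the cache
fakeClock = 1500;
wellCache.get("wells"); // older than 1000ms: queries again
console.log(queryCount);
```

The interesting number is maxAgeMs, and that is exactly the thing the business analysis questions above are meant to answer.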

Caching has a significant speed advantage and will allow your application to scale further with the same database. Databases are generally the key scalability bottlenecks in most systems and being able to delay difficult and expensive database rearchitecting or replacement is almost certain to pay off.

2013-01-23

Does var compile to the same code?

In a code review the other day the topic of when to use var and when to use a non-inferred data type came up. That’s a religious argument and probably a post for another day, but it did raise a question:

Is there any difference between the code produced using var and using, say, string?

I confidently answered “No, it is identical. The type is inferred by the compiler and replaced”. But I was thinking about it later and I wondered if I was right.

I created a really simple test program

This code compiled down to

Changing the string to var produced exactly the same IL. So, oddly, I was right about something.

2013-01-22

The Skytech Security Fiasco

There was a story making the rounds today on the twitter about a Montreal university student who had been expelled for, ostensibly, testing the security of a web site. If you missed it there are a number of articles out there about it as it has become a bit of a media darling.

The story goes that this young fellow was working on an app for letting students access their data. In order to test their app they were given access to a test server at Skytech, the company behind the student information software. While playing around he discovered an exploit which allowed him to gain access to information on any student. It is a pretty common exploit: not cleaning your inputs. Al-Khabaz did the right thing in reporting the vulnerability and, to their credit, Skytech had a fix deployed in about a day. This is a bit slow in my mind for such a serious exploit but many companies aren’t quite there yet on being able to deploy at the drop of a hat.

A few days later Al-Khabaz ran a security testing tool against the test server he had been given to ensure that there were no other vulnerabilities. This is where things start to go off the rails. Skytech noticed an increased load and claim that the attack was damaging their ability to serve their customers. The president of Skytech, Edouard Taza, called up Al-Khabaz and demanded that he come into the Skytech office and sign a non-disclosure agreement or they would press charges. It seems that Dawson College got wind of all this activity and started their own investigation. They convened a panel of 15 computer science professors who voted to expel Al-Khabaz.

That brings us to today. I see a number of things here which could have been done better both from a technical and from a human relations point of view:

There is no denying it, Al-Khabaz should have checked with Skytech before running vulnerability tests. I can see where he is coming from and it is unlikely that he knew how much traffic the tool, Acunetix, would generate on Skytech’s site.
There is no way that Acunetix, running on a single developer workstation, should be able to take out a website designed to serve such a large body as all the students in Quebec. There is a lack of preparedness for attacks on Skytech’s part. This is a site which is likely to attract attacks as it contains a lot of student data including SIN numbers, grades, addresses and the like. One thing is for sure: now that Skytech’s ineptitude has been revealed they’re going to bear the brunt of some actually serious attacks. If you’re a student in Quebec you should be worried.
An attack on a testing server should not have had an effect on the production site. It is a test server for a reason: you test things against it and, from time to time, that testing is going to be destructive. Separate your servers! With the low prices of cloud servers there is no excuse to have your test site on your production hardware.
Skytech reacted well to the first vulnerability but they reacted terribly to the subsequent attacks. As a company you have to know that threatening students with legal action is basically blackmail. If you want people to keep quiet about how crummy your security is then you’re pretty much going about it the right way. If you want to actually be secure then you’re screwing up. Believe me, having a whitehat test your site and report problems is going to save you some big trouble in the future. That’s why Google runs competitions to find exploits in Chrome.

Now I understand that Skytech have made some moves to fix their screw-ups here including giving Al-Khabaz a scholarship and offering him a job. Good for them. I don’t believe he took the job but I wouldn’t either, who wants to work for bullies?

From what I can tell Skytech were getting a free app created here by students of Dawson. So there was probably some sort of an agreement between Dawson and Skytech to allow students access to a real world system in return for an app. Sounds a lot like slave labour to me. I’m not a fan of unpaid internships or free collaborations. Companies should pay for apps to be developed for them. Programmers should not be giving services away for free to companies; it devalues the profession. If you’re a programmer and you want to hack on something to help people there is a whole lot of open government data out there which has a greater potential than Skytech’s data.
Dawson College are so far into the wrong that they can’t be saved. The fact that 15 researchers chose to slam the research of a student, and in fact expel him, is crazy to me. They claim that it is against professional conduct. Okay fine, point me to the accredited document which outlines the professional conduct for a computer scientist. No, no I’ll wait.

Exactly.

Even if such a document existed testing the security of a test server is unlikely to be a serious violation. The CBC checked with some lawyers and they could find no charge under the criminal code so it is radically presumptive of the university to suppose that the activities were illegal.

The kangaroo courts that universities set up in this country need to be stopped. These professors, locked in their ivory towers, have no idea about real world consequences. Where are the police charges if Al-Khabaz actually did something seriously wrong?

I know that if I were a student I wouldn’t want to go to Dawson College and if I were an employer I would be suspicious of graduates of Dawson. If their professors can’t understand the difference between criminal hacking and harmless testing they shouldn’t be teaching and their students might need remedial training.

Dawson saw this as an optics problem and did what they could to get rid of it. Well that worked out pretty well didn’t it, Dawson?

Idiots.

2013-01-21

Content Security Policy for ASP.net MVC

In the last article we talked a bit about Content Security Policy. Now let’s see how to quickly apply it to an ASP.net MVC project.

The ASP.net MVC project has provided some extension points in the lifecycle of a request which allow you to hook in almost as if you’re using AOP. The one we’re interested in today is the global action filter. This is fired for every request and is an ideal place to put in a hook for adding HTTP headers.

First we create an action filter attribute which extends ActionFilterAttribute

As you can see here I’ve put in all the different headers we talked about yesterday. You could make this more efficient by checking the browser and only sending the response which suits. That is kind of a pain to do as CSP is still in flux on most browsers. In a couple of years you will probably be able to only send one header.

Next we tie it into ASP.net MVC. You can throw it into the FilterConfig.cs file like so:

(line 6 is the relevant one)

And you’re done! I tested it by throwing in an inline alert('hi') and found it to be effective. Well, effective in Chrome and Firefox. IE10 still merrily threw up an alert. IE10 support is not there yet, perhaps in IE11.

There is one other good way to add CSP to an ASP.net MVC project and we’ll cover that in a future post.

2013-01-18

Content Security Policy

A few weeks ago I was doing some research into web application security to placate some security concerns a security audit raised. For the most part what I found was the typical advice

Avoid SQL injection attacks by using parameterized queries
Use low privilege accounts to run the web server and the database
Don’t connect to the SQL server with an account with permissions other than db_datareader and db_datawriter (or your database’s equivalent)
Validate user input
HTML encode any untrusted string (so pretty much everything)
Avoid using dynamic SQL
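Of those, HTML encoding is the one most closely related to what follows, so here is a bare-bones sketch of what it means. htmlEncode is my own illustrative function; in a real project I would lean on whatever encoder your framework or templating engine already provides.

```typescript
// Minimal HTML encoder for text placed in element content or quoted attributes.
// The ampersand is replaced first so the entities added afterwards are not
// themselves re-encoded.
function htmlEncode(untrusted: string): string {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A comment containing a script tag comes out as harmless text.
console.log(htmlEncode('<script>alert("hi")</script>'));
```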

One relatively new development I hadn’t heard of was using a technique called Content Security Policy. To understand the purpose of CSP you first need to know a little bit about what comes down the line to you from the web server and how the browser handles it.

For many years, until Chrome came along and messed it all up, the first 7 characters of every URL you saw were http://. This was a protocol identifier just like ftp:// or smb://. Chrome dropped showing it, something I agree with entirely. The protocol HTTP is the language which web servers speak. One of the things which HTTP defines is a set of fields called headers. These headers provide meta-information related to the request or the response. For the most part you can be a web developer and never look at HTTP headers. It is the headers which carry cookies as well as fields such as Accept and Content-Type (POST parameters, by contrast, travel in the request body). The body of the HTTP response contains the HTML, CSS and JavaScript which define the page. Within that markup you can define external resources to load, be they images, scripts or style sheets.

CSP is another header which describes the behaviour of the browser when it comes to loading the external resources and processing internal scripts. There is a wide variety of things which can be done using CSP but the most useful is to block the execution of inline JavaScript.

What?

How is my JavaScript going to get run if I can’t have it inlined? Well, pretty simply, you include all of your JavaScript from external files.

Why?

Because one of the most common attacks against sites is to inject some nefarious JavaScript in a way that it is rendered out when other users are logged in. By doing so the attacker can grab their cookie information or information on the page. Think about a site with a comment system: if a bad guy injects some JavaScript code which can run in the context of other users then they can perform any action that the logged in user can. If all your JavaScript comes from static JavaScript files, and the browser refuses to run anything inline, then that attack vector is closed off.

To get this working you’re going to need to add 3 different headers to your site. This is because the various browsers have differing levels of support for CSP.

Content-Security-Policy: script-src 'self' 
X-WebKit-CSP: script-src 'self'
X-Content-Security-Policy: script-src 'self'

The first line is the policy as defined by the W3C standard; it is supported only by Chrome and even then only by version 25+. The second version works in older WebKit based browsers. The third is supported by Firefox and IE10+. The support for CSP across browsers is not fantastic. At the time of writing there is support for about 55% of users.

There are quite a few rules you can use in the CSP. The rules above require that all your scripts come from your domain. If you make use of a CDN then you can add it to the end of the rules

Content-Security-Policy: script-src 'self' https://yourcdn.com
X-WebKit-CSP: script-src 'self' https://yourcdn.com
X-Content-Security-Policy: script-src 'self' https://yourcdn.com

You can also set the default so that all resources (scripts, style, images, flash, frames, fonts) are restricted to your server.

Content-Security-Policy: default-src 'self' https://yourcdn.com
X-WebKit-CSP: default-src 'self' https://yourcdn.com
X-Content-Security-Policy: default-src 'self' https://yourcdn.com
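Since the same policy has to be repeated under three names, it is tempting to generate the headers rather than type them out. This is only a sketch of that idea: cspHeaders is a made-up helper, and the list of header names simply mirrors the three above as they stood when this was written.

```typescript
// The three header names listed above; browser support for each is as
// described in the post and will change over time.
const CSP_HEADER_NAMES = ["Content-Security-Policy", "X-WebKit-CSP", "X-Content-Security-Policy"];

// Hypothetical helper: build one policy string and repeat it under every name.
function cspHeaders(directive: string, sources: string[]): { [header: string]: string } {
  const policy = [directive, ...sources].join(" ");
  const result: { [header: string]: string } = {};
  for (const headerName of CSP_HEADER_NAMES) {
    result[headerName] = policy;
  }
  return result;
}

// Restrict everything to our own server plus a CDN.
const cspExample = cspHeaders("default-src", ["'self'", "https://yourcdn.com"]);
console.log(cspExample);
```

Building the policy once also avoids the three copies drifting apart.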

Once you get your head around not being able to use inline JavaScript then CSP is a clear win and should probably be the default when you create a new project.

2013-01-17

Who is Afraid of Change?

If you’ve ever worked for a large company then you’ve probably run into a change management policy. If you’ve been particularly bad in a previous life and I’m talking tremendously bad, maybe you caused the extinction of a species or you failed to equip the Titanic with enough life boats or you composed country music, then you have had to deal with a change control board. Just the words strike fear into my heart. The only way it could be worse is if you slap the word “global” or “enterprise” into the phrase:

Global Enterprise Change Management Board

When I heard the phrase I used to picture a mystical circle of elders, filled with the knowledge of decades of experience and probably a dozen degrees apiece. They would have such integral knowledge of the systems that they would just be able to look at a requested change and intuit what would break.

The Change Management Board

I don’t think that anymore. The truth is that change management boards tend to be populated with management types who really have no idea how things actually work. Their technical knowledge is out of date and they have to rely on the people requesting the change for background knowledge. If you’re looking to the person requesting the change to describe and analyse the change then why even bother having a change management board? No reasonable checks and balances are provided.

The purpose of a change management process is to try to restrict unexpected side effects from changes to the computing environment in a company. What it actually does is slow down change and make IT less responsive to the changing business environment. If the effects of a change aren’t known then why are you making the change? You’re basically saying “we don’t understand our environment sufficiently well to know what we’re doing”.

Many would argue that this is understandable; after all, the computing environment at large companies is stunningly complicated. I don’t buy it. If you go to your doctor they understand the body well enough that they know what the side effects could be of giving medicines. Through experimentation and analysis medical science has cataloged the possible side effects of using medicines. We would not accept doctors giving us medicines which aren’t understood so why do we accept the same thing from our IT groups? The human body is many, many orders of magnitude more complicated than even the most complex enterprise networks.

We are fortunate to live in a time of virtual machines and provisioning tools like Chef and Puppet. Using these tools it is possible to easily establish a full test environment in a matter of minutes. The test environment can be a scaled down version of the production environment. This gives an amazing tool for testing any change. Even if your company doesn’t have the spare computers to provision an environment, you can still build a test environment on EC2.

If you’re afraid of change and feel the need to try to control it using a change management board or a rigid change management process then you would be far better off spending your time developing a reproducible testing environment and a suite of tests. Frankly not having a solid testing environment is irresponsible. No amount of change management is better than a realistic testing regime. This isn’t a problem of lack of regulation, it is a problem of lack of understanding and professionalism. There is a cognitive dissonance in restricting change when the business is based on change. It has got to be fixed.

2013-01-16

HTML5 Data Visualizations - Part 4 - Creating a component with Raphaël and TypeScript

In the last article we created a simple little hard coded graph with Raphaël. I also insulted the Germans. That wasn’t entirely fair because Raphaël is really an Italian name and we never thrashed them in a war, probably because they were too busy being defeated by the French. The French. Oh, and we did thrash the Italians too. That’s why you come to this blog: friendly racism, code and an inability to forget wars which happened 40 years before the author was born. So we’re still going to call the library Ralph.

Building JavaScript graphs is pretty easy with Ralph but we still don’t want to recreate the graph on every page where we need a graph. The solution is to turn it into a component. Honestly JavaScript is pretty crummy at being put into a component. Creating classes and namespaces in JavaScript is possible but it is a lot of boilerplate code and looks like a mess. Keeping code in namespaces or modules is key to building a scalable application where there are no collisions between names.

Fortunately we live in an age where there are multiple languages which transcompile down to JavaScript. For this effort we’re going to make use of TypeScript. TypeScript is an open source project which was created by Microsoft. It makes use of some conventions which are looking like they are going to be a part of the next release of JavaScript. In many ways it is like JavaScript vNext right now.

I’m going to dive right in and rewrite this puppy as a component in TypeScript

Okay that was fun but what is all this doing? Well on line 1 we have started by creating a module. A module is basically a namespace. Then on line 2 we declare a public, or exported, class inside that module. This is just like you would expect a class to behave in Java or C#. Within that class we define a constructor. One of the cool things is that prefacing each parameter in the constructor with public makes it a public property on the class automatically and assigns the value from the constructor to it. C# could use a shortcut like that! We also set the data types. This doesn’t really do anything to the generated JavaScript but it does throw compiler errors if we use a string when a number is expected. TypeScript is great in that it allows you to tune how much strong typing you want.
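The constructor shortcut is worth seeing on its own. This is a stripped down sketch rather than the component from the post: the BarGraph class, its parameter names and the area method are all invented for illustration.

```typescript
class BarGraph {
  // `public` on each constructor parameter declares a property of the same
  // name and assigns the argument to it; no field declarations are needed.
  constructor(public containerId: string, public width: number, public height: number) {}

  area(): number {
    return this.width * this.height;
  }
}

const demoGraph = new BarGraph("graph", 400, 300);
console.log(demoGraph.containerId, demoGraph.width, demoGraph.area());

// Passing "400" (a string) as the width would be rejected by the compiler,
// even though the generated JavaScript carries no type information.
```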

Another great feature in the component is the use of ES5 style loops on line 29. This is basically an iterator syntax.

The final thing to which I wanted to draw your attention was the lambda style anonymous functions. You can see this inside the same loop on line 29. I’ve always found this syntax cleaner than a full function. The => is sort of the official syntax now for lambda functions: C# uses it and Java 8 is adopting the very similar ->.
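Here is roughly what the loop and the lambda look like together, using a made-up array rather than the data from the component.

```typescript
// forEach visits each element in turn; the fat arrow function is the callback.
const sampleValues = [3, 9, 4];
let largestValue = -Infinity;
sampleValues.forEach((value) => {
  if (value > largestValue) {
    largestValue = value;
  }
});

// The same callback written as a full function, for comparison.
let smallestValue = Infinity;
sampleValues.forEach(function (value) {
  if (value < smallestValue) {
    smallestValue = value;
  }
});

console.log(largestValue, smallestValue);
```

Both callbacks do the same job; the arrow version just has less ceremony around it.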

To make use of our new graph component is easy. We just call

Now, of course, this all compiles down to JavaScript, so you could have got exactly the same functionality in pure JavaScript. What does that look like? Well it is a bit more verbose:

I find the TypeScript to be way more readable.

In the next part we’re going to look at an alternative library which might be better suited for creating data visualizations.
