2013-04-15

What I'm Excited About at Prairie DevCon

Every year I go to Prairie DevCon which is put on by the stylish and dapper fellow who is D’Arcy Lussier. Every year he schedules me in the worst speaker’s spot and laughs and laughs. Once he put me in the last spot of the conference and gave away free beer in the room next to me. 3 people came to my talk. 1 left when I mentioned the free beer. Come to think of it, he really isn’t so much dapper as an ass.

Nonetheless he finds interesting topics and great speakers. I wanted to highlight the talks which I think are going to be really interesting. Probably after reading this D’Arcy will change my schedule so I can’t attend any of them but I’m going to take the risk:

1. Machine Learning for Developers – In university I tried some machine learning and it was terrible. That being said it is a really interesting field and if it really is possible, with modern tools, to get something built in an hour then I’m adding machine learning to everything.

2. A Primer on LESS and Sass – The last time I tried to design a website 3 large men broke down my door and broke my fingers. They said that I had killed all the beauty in the world and that if I ever designed another site they would be back and they would bring pizza. I don’t know what that means nor am I anxious to find out. Eden, on the other hand, can design things and she is a computer scientist. So when you can learn about CSS from a designer who knows how to program you should jump at it.

3. Building Cross-Platform Mobile Apps with Sencha Touch and PhoneGap – I’m starting to really think that PhoneGap is the way to go for developing the vast majority of mobile applications. It is the only real way to get cross-platform compatibility and it’s also quick to develop. I’ve played a bit with it but I really am excited to see how a skilled person can use it. I don’t really know anything about Sencha Touch, so that should be fun to learn too.

4. Git Magic – A couple of conferences back David Alpert taught me a bunch about git. I haven’t used any of it. Frankly I felt like the advanced stuff he was teaching me would only be used if you had somehow failed to use git properly and had messed up. I’m starting to change my mind about that so I’m going to give James Kovacs a chance to solidly change my mind.

5. Hacking Hardware – I talked a couple of months back with Donald Belcham about his experiments with hardware hacking and was inspired. Not inspired enough to actually do anything but still inspired that a software guy could pick up hardware. I’m super curious to see what he has built.

There are many other interesting looking talks out there but these were my top 5. If you’re near Winnipeg or if you can run there at the speed of light you should pick up your conference tickets now and come out. It will be a riot.

2013-04-12

So You Want to Version Your Access Database

First off let me say that I’m sorry that you’re in that position. It happens to the best of us. There is still a terrifying amount of business logic written in MS Access and MS Excel. One of the things I’ve found is that working with Access is greatly improved if you use source control. This is because Access has a couple of serious flaws which can be alleviated by using source control.

The first is that Access is monolithic: it is a single file which contains forms, queries, logic and, sometimes, data. This makes shipping the database easy and doesn’t confuse users with a bunch of DLLs and stuff. It also means that exactly one person can work on designing the database at any time. Hello, scalability nightmare.

Next up is that Access has a tendency to change things you didn’t change. As soon as you open a form in design mode Access makes some sort of a change. Who knows why, but it worries me. If I’m not changing anything then why is Access changing something?

Finally Access files grow totally out of control. Every time you open the database its size increases seemingly at random. This is probably an extension of the previous point.

Access is a nightmare to work with, an absolute nightmare. I have no secret inside knowledge about what Microsoft is doing with Access and Office in general but I suspect that desktop versions of Office have a limited future. There have been no real updates to the programming model in… well, ever, as far as I can tell.

Okay well let’s put the project under source control and then I’ll talk a bit about how this improves our life. I’ll be using TFS for the source control because we might as well give ourselves a challenge and have two nightmares to deal with.

The first thing you’ll need is the Access MSSCCI extensions, followed by the MS Access Developer Tools. Now when you open up Access there should be a new tab available to you in the menu strip: Source Control. Yay!

Menu bar additions

Open up your current database and click the button marked Add Database to Team Foundation. You’ll be prompted for your TFS information. Once that’s been entered Access will spool up and create a zillion files in source control for you. This confused us a lot when we first did it because none of the files created were mdb or accdb files: the actual database. Turns out the way it works is that the files in source control are mapped, one to one, with objects in the database. To create a “build” of the database you have to click on the “Create from Team Foundation” button. This pulls down all the files and recombines them into the database you love.

Selecting the TFS source (identifying information removed)

You’ll now see that the object browser window has hints on it telling you what’s checked out. Unfortunately you need to go and check out objects explicitly when you work on them. At first it is a pain but it becomes just part of your process in short order. One really important caveat is that you have to do the source control operations through the Access integrations; you can’t just use TFS from Visual Studio. This is because the individual source files are not updated until you instruct Access to check them in. Before that, changes remain part of the mdb file and are not reflected in the individual files.

Right, so what does this do for us? First, having the code and objects split over many files improves the ability to work on a database collaboratively. While the individual objects are a total disaster of serialization, individuals can still work on different parts of the same database at once. That’s a huge win. Second, we’re protected from Access’ weird changes to unrelated files. If we didn’t change something then we just revert the file and shake our heads at Access. Finally, because the mdb file is recreated each time we open it there is no longer unexpected growth.

This doesn’t make working with Access painless but it sure helps.

2013-04-11

A Day of Azure

Have you guys heard about this Azure thing but aren’t sure what it is? Did your boss read an article about “The Cloud” in a 3-year-old copy of CIO Magazine he found while waiting for Tip Top Tailors to fashion a suit of sufficient size to contain his gigantic, non-technical girth? Know all about Azure but haven’t had a chance yet to try it out?

Make me a cloud and waffles, mostly waffles. (Source: http://www.itstrulyrandom.com/2008/02/07/obese-man-kills-wife-by-sitting-on-her/)

Then boy do we have the event for you! The Calgary .NET Users Group is putting on an Azure camp to coincide with the global Azure Bootcamp on April the 27th.

Been through the Azure training before? Don’t write us off; we’ve created a special Calgary-themed project which showcases the part of Azure most applicable to the Calgary IT market. It might even have a bit of a Calgary theme to it.

I’ll be speaking, as will the daring, dynamic and dutiful David Paquette.

Where: ICT 122 at the University of Calgary

When: April the 27th starting at 9 and going on until we run out of stuff to talk about. Before 4pm if nobody asks me what’s wrong with Java, 4 am if somebody does.

Cost: Free

Tell me more: Go to the official site: http://dotnetcalgary.com/

2013-04-10

Getting Started With Table Storage

In the last post we started looking at table storage and denormalization, but we didn’t actually use table storage at all. The first thing to know is that your data objects need to extend

Microsoft.WindowsAzure.Storage.Table.TableEntity

This base class is required to provide the two key properties: the partition key and the row key. These two keys are important for identifying each record in the table. The row key provides a unique value within the partition. The partition key provides a hint to table storage about where the table data should be stored. All entries which share a partition key will be stored on the same node, whereas different partition keys within the table may be stored on different nodes. As you might expect this provides for some interesting query optimizations.

If you have a large amount of data or frequent queries you may wish to use multiple partition keys to allow for load balancing queries. If you’re used to just slapping a monotonically increasing number in for your ID you might want to rethink that strategy. The combination of knowing the partition key and the row key allows for very fast lookups. If you don’t know the row key then a scan of the entire partition may need to be performed, which is relatively slow. This plays well into our idea of denormalizing. If you commonly need to look up user information using e-mail addresses then you might want to set your partition key to be the e-mail domain and the row key to be the user name. In this way queries are fast and distributed. Picking keys is a non-trivial task.
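To make that concrete, here is a rough sketch of such an entity. The class and property names are my own inventions, not from this post, and it assumes the 2013-era Azure Storage SDK:

using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical user entity: partition by e-mail domain, row key is the user name.
// A lookup by full e-mail address then hits exactly one partition and one row.
public class UserEntity : TableEntity
{
    public UserEntity() { }   // parameterless constructor required by the table storage SDK

    public UserEntity(string email)
    {
        var parts = email.Split('@');
        PartitionKey = parts[1];   // e.g. "example.com"
        RowKey = parts[0];         // e.g. "jsmith"
    }

    public string DisplayName { get; set; }
    public string City { get; set; }
}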

A key does not need to be constructed from a single field either; you might want to build your keys by concatenating several fields together. A great example might be if you had a set of geocoded points of interest: you could build a row key by joining the latitude and longitude into a single field.
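Purely as an illustration (none of these names come from the post), a point-of-interest entity might build its composite row key like so:

using System.Globalization;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity with a row key concatenated from several fields.
public class PointOfInterestEntity : TableEntity
{
    public PointOfInterestEntity() { }

    public PointOfInterestEntity(string city, double latitude, double longitude)
    {
        PartitionKey = city;
        // Join latitude and longitude into a single row key, e.g. "51.0447_-114.0719".
        RowKey = string.Format(CultureInfo.InvariantCulture, "{0}_{1}", latitude, longitude);
    }

    public string Name { get; set; }
}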

To actually get access to the table storage you just need to use the table client API provided in the latest Azure SDK.
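Something along these lines works against the storage SDK of the time; the table name and connection string are placeholders of mine:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public static class TableStorage
{
    // Parse the storage account from a connection string and get a reference to a table.
    public static CloudTable GetTable(string connectionString, string tableName)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudTableClient client = account.CreateCloudTableClient();
        CloudTable table = client.GetTableReference(tableName);
        table.CreateIfNotExists();   // creates the table on first use
        return table;
    }
}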

Inserting records is simple. I use AutoMapper to map between my domain objects and the table entities.
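The original code sample isn’t shown here, so this is only a sketch of what that insert might look like; the User class and the mapping configuration are my own stand-ins:

using AutoMapper;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical domain object; the post's real domain types are not shown.
public class User
{
    public string Email { get; set; }
    public string DisplayName { get; set; }
    public string City { get; set; }
}

public static class UserWriter
{
    public static void Insert(CloudTable table, User user)
    {
        Mapper.CreateMap<User, UserEntity>();   // normally configured once at start-up

        // Keys come from the e-mail address; AutoMapper copies the matching properties across.
        var entity = new UserEntity(user.Email);
        Mapper.Map(user, entity);

        table.Execute(TableOperation.Insert(entity));
    }
}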

It is actually the same process for updating an existing record.
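Again as a sketch with my own naming, an update can reuse the same mapping; I’m using InsertOrReplace here, which overwrites the row if it exists and creates it if it doesn’t:

using AutoMapper;
using Microsoft.WindowsAzure.Storage.Table;

public static class UserUpdater
{
    public static void Update(CloudTable table, User user)
    {
        var entity = new UserEntity(user.Email);
        Mapper.Map(user, entity);

        // InsertOrReplace behaves as an upsert, so inserts and updates share one code path.
        table.Execute(TableOperation.InsertOrReplace(entity));
    }
}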

A simple retrieve operation looks like this:
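(The snippet below is my reconstruction rather than the post’s original code; it assumes the same e-mail-based keys as above.)

using Microsoft.WindowsAzure.Storage.Table;

public static class UserReader
{
    public static UserEntity Find(CloudTable table, string email)
    {
        var parts = email.Split('@');

        // Point lookup by partition key and row key: the fastest query table storage offers.
        TableOperation retrieve = TableOperation.Retrieve<UserEntity>(parts[1], parts[0]);
        TableResult result = table.Execute(retrieve);

        return (UserEntity)result.Result;   // null if no matching entity exists
    }
}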

Those are pretty much the basics of table storage. There is also support for a limited set of LINQ operations against the table but for the most part they are ill-advised because they fail to take full advantage of the key-based architecture.
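If you do need more than a point lookup, the filter-based query API (rather than LINQ) looks roughly like this; it is a sketch of mine, and it at least keeps the filter anchored on the partition key so the scan stays within one partition:

using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

public static class UserQueries
{
    public static List<UserEntity> AllInDomain(CloudTable table, string domain)
    {
        // Filter on the partition key so the query is confined to a single partition.
        TableQuery<UserEntity> query = new TableQuery<UserEntity>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, domain));

        return table.ExecuteQuery(query).ToList();
    }
}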

2013-04-09

Azure Table Storage

Azure table storage is another data persistence option when building applications on Azure. If you’re used to SQL Server or another pure SQL storage solution then table storage isn’t all that different. There is still a concept of a table made up of columns and rows. What you lose is the ability to join tables. This makes for some interesting architectural patterns but it actually ends up being not that big of a leap.

We have been conditioned to believe that databases should be designed such that data is well partitioned into its own tiny corner. We build a whole raft of tables, each one representing a bit of data which we don’t want to duplicate. Need an address for your users? Well that goes into the address table. As we continue down that line we typically end up with something like this:

Typical Relational Database Structure

Here each entity is given its own table. This example might be a bit contrived, I hope. But I’m sure we’ve seen databases which are overly normalized like this. Gosh knows that I’ve created them in the past. How to arrange data is still one of the great unsolved problems of software engineering as far as I’m concerned (hey, there is a blog topic, right there).

With table storage you would have two options:

  1. Store the IDs just as you would in a fully relational database and retrieve the data in a series of operations.
  2. Denormalize the database

I like number 2. Storing IDs is fine but really it adds complexity to your retrieval and storage operations. If you use table storage like that then you’re not really using it properly, in my mind. It is better to denormalize your data. I know this is a scary concept to many because of the chances that data updates will be missed. However, if you maintain a rigorous approach to updating and creating data this concern is largely minimized.

Denormalized

To do so it is best to centralize your data persistence logic in a single area of code. In CQRS parlance this might be called a denormalizer, or a disruptor in LMAX. If you have confidence that you can capture all the sources of change then you should have confidence that your denormalized views have correct data in them. By directing all change through a central location you can build that confidence.
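A very rough sketch of that idea follows; the event shape and class names are entirely my invention, not something from this post:

using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical event raised whenever a user's address changes.
public class AddressChanged
{
    public string Email { get; set; }
    public string City { get; set; }
}

// Hypothetical denormalizer: every address change flows through Handle(), which rewrites
// the denormalized row that embeds the address, so the views cannot silently drift.
public class UserAddressDenormalizer
{
    private readonly CloudTable _usersByDomain;

    public UserAddressDenormalizer(CloudTable usersByDomain)
    {
        _usersByDomain = usersByDomain;
    }

    public void Handle(AddressChanged e)
    {
        var parts = e.Email.Split('@');

        var entity = new DynamicTableEntity(parts[1], parts[0]);
        entity.Properties["City"] = EntityProperty.GeneratePropertyForString(e.City);

        // Merge just the changed column into the existing denormalized row.
        _usersByDomain.Execute(TableOperation.InsertOrMerge(entity));
    }
}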

In tomorrow’s post I’ll show how to make use of table storage in your web application.

2013-04-08

Hollywood Teaches Distributed Systems

I’m sitting here watching the movie Battle: LA and thinking about how it relates to distributed systems. The movie is available on Netflix if you haven’t seen it. You should stop reading now if you don’t want me spoiling it.

The movie takes place, unsurprisingly, in Los Angeles where an alien attack force is invading the city. As Aaron Eckhart and his band of marines struggle to take back the city they discover that the ships the aliens are using are actually unmanned drones.

War Machine: unmanned, except for the man in front, obviously

This is not news to the member of the signal corps they have hooked up with. She believes that all the drones are being controlled from a central location, a command and control center. The rest of the movie follows the marines as they attempt to find this structure which, as it turns out, is buried in the ground. It is, seemingly, very secret and very secure, hidden in the ruined city. Fortunately, as seems to happen frequently in these sorts of movies, the US military prevails and blows up this central structure.

The destruction of the command ship causes all the drones, which were previously holding human forces at bay, to crash and explode. It is a total failure as a distributed system. Destroying one central node had the effect of taking out a whole fleet of automated ships. The invasion failed because some tentacled alien failed to read a book on distributed systems.

See, the key is that you never know what is going to fail. Having a single point of failure is a huge weakness. Most people, when they’re designing systems, don’t even realize that what they’ve got is a distributed system. I’ve seen this cost a lot of people a lot of time. I’ve seen a lone SAN failure take out an entire company’s ability to work. I’ve seen failures in data centers on the other side of the planet take out Citrix here. If there is one truth to the information systems in large companies it is that they are complicated. However the people working on them frequently fail to realize that what they have on their hands is a single, large, distributed system.

For sure some services are distributed by default (Active Directory, DNS, …) but many are not. Think about file systems: most companies have files shared from a single point, a single server. Certainly that server might have multiple disks in a RAID but the server itself is still a single point of failure. This is why it’s important to invest in technologies like Microsoft’s Distributed File System which uses replication to ensure availability. Storage is generally cheap, much cheaper than dealing with downtime from a failed node in Austin.

Everything is a distributed system, let’s start treating it that way.

2013-04-05

Shrink geoJSON

A while back I presented a d3.js based solution for showing a map of Calgary based on the open data provided by the city. One of the issues with the map was that the data was huge, over 3 meg. Even with good compression this was a lot of data for a simple map. Today I went hunting to find a way to shrink that data.

I started in QGIS which is a great mapping tool. In there I loaded the same shape file I downloaded from the City of Calgary’s open data website some months back. In that tool I went to the Vector menu, then Geometry Tools and selected Simplify geometries. That presents you with a tolerance to use to remove some of the lines. See, the map is a collection of polygons, but the ones the city gives us are super detailed. For a zoomed-out map such as the one we’re creating there is no need to have that much detail. The lines can be smoothed.

Smoothing a line

If we repeat this process many thousands of times the geometry becomes simpler. I played around with various values for this number and finally settled on 20 as a reasonable tolerance.

QGIS then shows the map with the removed vertices highlighted with a red X. I was alarmed by this at first but don’t worry, the export won’t contain these.

I then right clicked on the layer and saved it as a shape file. Next I dropped back to my old friend ogr2ogr to transform the shape file into geojson.

ogr2ogr -f geoJSON data.geojson -t_srs "WGS84" simplified.shp

The result was a JSON file which clocked in at 379K but looked indistinguishable from the original. Not too bad, about an 80% reduction.

I opened up the file in my favorite text editor and found that there was a lot of extra data in the file which wasn’t needed. For instance the record for the community of Forest Lawn contained

"type": "Feature", "properties": { "GEODB_OID": 1069.0, "NAME0": "Industrial", "OBJECTID": 1069.0, "CLASS": "Industrial", "CLASS_CODE": 2.0, "COMM_CODE": "FLI", "NAME": "FOREST LAWN INDUSTRIAL", "SECTOR": "EAST", "SRG": "N/A", "STRUCTURE": "EMPLOYMENT", "GLOBALID": "{0869B38C-E600-11DE-8601-0014C258E143}", "GUID": null, "SHAPE_AREA": 1538900.276872559916228, "SHAPE_LEN": 7472.282043282280029 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -113.947910445225986, 51.03988003011284, 0.0 ], [ -113.947754144125611, 51.03256157080034, 0.0 ], [ -113.956919846879231, 51.032555219649311, 0.0 ], [ -113.956927379966558, 51.018183034486498, 0.0 ], [ -113.957357020247059, 51.018182170100317, 0.0 ], [ -113.959372692834563, 51.018816832914304, 0.0 ], [ -113.959355756999315, 51.026144523792645, 0.0 ], [ -113.961457605689915, 51.026244276425345, 0.0 ], [ -113.964358232775155, 51.027492581858972, 0.0 ], [ -113.964626177178133, 51.029176808875143, 0.0 ], [ -113.968142377996443, 51.029177287922145, 0.0 ], [ -113.964018694707519, 51.031779244814821, 0.0 ], [ -113.965325119316319, 51.032649604496967, 0.0 ], [ -113.965329328092139, 51.039853734199696, 0.0 ], [ -113.947910445225986, 51.03988003011284, 0.0 ] ] ] } }

Most of the properties are surplus to our requirements. I ran a series of regex replaces on the file

s/, "SEC.*"ge/}, "ge/g
s/"GEODB.*"NA/"NA/g
s/,\s/,/g
s/]\s/]/g
s/\[\s/[/g
s/{\s/{/g
s/}\s/}/g

The first two strip the extra properties and the rest strip extra spaces.
This stripped a record down to looking like this (I added back some new lines for formatting’s sake):

{"type":"Feature","properties":{"NAME":"FOREST LAWN INDUSTRIAL"},"geometry":
{"type":"Polygon","coordinates":[[[-113.947910445225986,51.03988003011284,0.0],
[-113.947754144125611,51.03256157080034,0.0],[-113.956919846879231,
51.032555219649311,0.0],[-113.956927379966558,51.018183034486498,0.0],
[-113.957357020247059,51.018182170100317,0.0],[-113.959372692834563,
51.018816832914304,0.0],[-113.959355756999315,51.026144523792645,0.0],
[-113.961457605689915,51.026244276425345,0.0],[-113.964358232775155,
51.027492581858972,0.0],[-113.964626177178133,51.029176808875143,0.0],
[-113.968142377996443,51.029177287922145,0.0],[-113.964018694707519,
51.031779244814821,0.0],[-113.965325119316319,51.032649604496967,0.0],
[-113.965329328092139,51.039853734199696,0.0],[-113.947910445225986,
51.03988003011284,0.0]]]}}

The whole file now clocks in at 253K. With gzip compression this file is now only 75K, which is very reasonable, and something like 2% of the size of the file we had originally. Result! You can see the new, faster-loading map here.

2013-04-04

pointer-events - Wha?

Every once in a while I come across something in a technology I’ve been using for ages which blows my mind. Today’s is thanks to Interactive Data Visualization for the Web, about which I spoke yesterday. In the discussion about how to set up tool tips there was mention of a CSS property called pointer-events. I had never seen this property before so I ran out to look it up. As presented in the book the property can be used to prevent mouse events from firing on a div. It is useful for avoiding mouse out events should a tool tip show up directly under the user’s mouse.

The truth is that it is actually a far more complex property. See, an SVG element is constructed from two parts: the fill is the stuff inside the element; in the picture here it is the purple portion. There is also a stroke, which is the outer edge of the shape, shown here in black.

Screen Shot 2013-01-09 at 9.22.22 PM

By setting pointer-events to fill, mouse events will only fire when the cursor is over the filled portion of an element. Conversely, setting it to stroke will only fire events when the mouse pointer is over the outer border. There are also settings to only fire events for visible elements (visibleStroke, visibleFill) and to allow the mouse events to pass through to the element underneath (none). The Mozilla documents go into its behaviour in some depth.

The ability to hide elements from the mouse pointer is powerful and can be used to improve the user’s interaction with mouse events. I had originally recommended using an invisible layer for this but pointer-events is much cleaner.

2013-04-03

Review - Interactive Data Visualization for the Web

I’ve been reading Scott Murray’s excellent book Interactive Data Visualization for the Web this week. Actually, I’ve been reading a lot of O’Reilly books as of late because they keep offering to sell them to me at a huge discount. Ostensibly the book is about d3.js, which is my personal favorite data visualization library for JavaScript. It is a pretty well thought out book and I would recommend it for those looking to explore d3.js. I don’t know that it is really about interactive data visualizations so much as it is about one specific technology for creating visualizations. However if we ignore the title then the contents stand by themselves.

I discovered the book because I came across Scott’s blog when I was doing a spike on d3.js some weeks back. His blog was expanded into the book. The book starts with a look at why we build data visualizations, offers some alternative toolkits, then jumps right into d3.js. I would have liked to see more of a discussion around the technologies available in HTML5 for visualizing data. In addition to SVG, which d3.js leverages, there is canvas and you can also build some pretty interesting things in pure CSS3. There are also many tools for doing static image generation on the server side.

A significant section of the book is dedicated to teaching JavaScript and presenting web fundamentals. I wasn’t impressed that so much effort had gone into a topic which is covered so well by many other books. I’m sure it was just that I’m not the target audience of this section. By chapter 5, though, things are getting interesting.

Scott introduces d3 in more detail and talks about method chaining (a huge part of d3) and getting data. The rest of the book builds on the basic d3.js knowledge by creating more and more complicated graphs. The book moves through bar charts and scatter plots before adding talk of using scales and leveraging animation. I had been a bit confused about how to make use of dynamic data sets in d3 but the section on how to add data cleared that up nicely.

I think the real key to this book is the chapter on interactions. Anybody can draw a graph server side; the story for creating it in JavaScript becomes much more compelling when users can click on items in the graph and have things happen. There is a pretty extensive discussion about how to add tooltips to your visualizations. I have to admit I was a bit miffed by that because I was going to do a blog series along the same lines and now I’ll just look like an idea-stealing baboon instead of an insightful orangutan.

Finally a couple of more advanced topics are covered, including some (not all, mind you) of the built-in layouts in d3.js. Lastly, mapping is covered. Thank goodness there is some discussion of projections because that is what got me when I worked with maps in d3.js.

There is very little discussion about what makes a good visualization and there is no attempt to come up with any unique visualizations. If you’re interested in that aspect then pretty much anything which comes out of Stephen Few is insightful and super interesting.

For the price that O’Reilly charged me for this book it is 100% worth it. Plus I hear that for every time you look at an O’Reilly book and don’t buy it they kill one of the animals pictured on the cover.

Book Cover

2013-04-02

Importing a git Repository into TFS

From time to time there is need to replace a good technology with a not so good technology. The typical reason is a business one. I don’t claim to have a deep understanding of how businesses work but if you find yourself in a situation where you need to replace your best friend, your amigo, your confidant Git with the womanizing, drunken lout that is TFS then this post is for you! This post describes how to import a git repo into TFS and preserve most of the history.

The first thing you’ll need is a tool called Git-TF which can be found on CodePlex. This comes as a zip file and you can unzip it anywhere. Next you’ll need to add the unzipped directory to your path. If you’re just doing this as a one-time operation then you can use the PowerShell command:

$env:Path += ";C:\temp\git-tf"

to add to your path.

Now that you have that, git should be able to find a whole set of new subcommands. You can check if it is working by running

git tf

You should get a list of subcommands you can run

Git-TF subcommands

Now drop into the git repository you want to push to TFS and enter

git tf configure http://tfsserver:8080/tfs $/Scanner/Main

Where http://tfsserver:8080/tfs is the collection path for your TFS server and $/Scanner/Main is the server path to which you’re pushing. This will modify your .git/config file and add the following:

[git-tf "server"]
    collection = http://tfsserver:8080/tfs
    serverpath = $/Scanner/Main

Your git repository now knows a bit about TFS. All you need to do now is push your git code up and that can be done using

git tf checkin --deep

This will push all the commits on the mainline of your git repo up into TFS. Without the --deep flag only the latest commit will be submitted.

There are a couple of gotchas around branching. You may get this error:

git-tf: cannot check in - commit 70350fb has multiple parents, please rebase to form a linear history or use --shallow or --autosquash

You can flatten your branches either by rebasing or by passing git-tf the --autosquash flag, which will attempt to flatten the branching structure automatically. I’m told that autosquashing can consume a lot of memory if there are a lot of commits in the repository. I have not had any issues but my repositories are small and my machine has 16GB of memory.

Now you have moved all your source code over to TFS. Yay.

I’m not going to point out that if you keep git-tf around you can continue to work as if you have git and just push commits to TFS. That would likely be against your company’s policies.