2013-07-19

Southern Alberta Flooding

In the past couple of weeks there have been two big rain storms in Canada which have caused a great deal of flooding. The first was the Southern Alberta floods and the second was the flood in Toronto. I was curious about how the amount of rain we have had stacks up against some other storms. I was always struck by the floods in India during the monsoon season so I looked up some numbers on that and also on the world record for most rain in 24 hours.

Of course I wanted to create a visualization of it because that’s what I do. Click on the picture to get through to the full visualization

Click for details

Now I know that the amount of rain is just one part of the flood story but the numbers are still interesting. Can you imagine being around to see 1.8m of rain fall in 24 hours? I guess it was the result of a major tropical cyclone. Incidentally, Foc-Foc is on the island of Réunion near Madagascar. I’d never heard of it, despite 800,000 people living there.

I used these as the data sources:

Toronto 126mm - http://www.cbc.ca/news/canada/toronto/story/2013/07/09/toronto-rain-flooding-power-ttc.html

Calgary 45mm - http://www.cbc.ca/news/canada/calgary/story/2013/06/21/f-alberta-floods.html

Mumbai 181.1mm - http://www.dnaindia.com/mumbai/1845996/report-mumbai-gets-its-third-highest-rainfall-for-june-in-a-decade-at-181-1-mm

Foc-Foc, Réunion 1,825mm - http://wmo.asu.edu/world-greatest-twenty-four-hour-1-day-rainfall

2013-07-16

Storage Costs

Earlier this week I got into a discussion at work about how much storage an application was using up. It was an amount I considered to be trivial. 20 gigs, I think it was. I could have trimmed it down but it would have taken me an hour or two, which didn’t seem worth it. My argument was that with the cost of storage these days it would cost the company more to pay me to reduce the file storage than to just pay for the storage. The problem seemed to go away and I claimed victory.

I’m just saying, don’t start a “storage is expensive” argument with me. You’re not going to win for numbers under 50TB

– Simon Timms (@stimms) July 8, 2013

My victory glow did not last long. @HOVERBED, my good sysadmin friend, jumped on my argument.

“Disk is cheap, storage isn’t”

Then he said some nasty stuff about developers which I won’t repeat here. I may have said some things about system admins prior to that which started the spiral.

Truth is that he’s right. When I talk about storage being cheap I am talking about disk being cheap. There is a lot more to storage than putting a bunch of disks in a server or hooking up to the cloud. Those things are cheap but managing the disk isn’t. There is a cost associated with backing up data, restoring the data and generally managing disk space. There is also the argument that server disk isn’t the same as workstation disk. Server space is far more expensive because it has to be reliable and it has to be larger than typical disk. When I did the math I figured disk might cost something like $5 a gigabyte to provide. @HOVERBED quoted me numbers closer to $90 a gig. That’s crazy. I’m going to go out on a limb here, but if you’re paying that sort of money for managing your storage you’re doing it wrong. The expensive things are:

  1. Backing up your storage to tape and shipping that tape to somewhere safe

  2. Paying people to run the backups and restores

  3. Paying for SANs or NAS which is much more expensive than just disk

So let’s break this thing down. First off, why are we still backing up to tape? I’ve seen a couple of arguments. The first is that tape is less costly than backing up to online storage. I had to look up some information on tapes because when I was last involved with them they were 250GB native. Turns out that they’re up to 5TB native now (StorageTek T10000 T2). That’s a lot of storage! Tapes have two listed capacities: a native capacity and a compressed capacity. The compressed capacity is 2x the native capacity, the theory being that you can gzip files on the tape and get extra capacity for free. I don’t know if people still get 2x compression with newer file formats as many of them integrate compression already.

These tapes go for something like $150, so that’s pretty frigging cheap! To get the same capacity on cloud services will cost you

Service                 Per GB per month    Per 5TB per month
Azure Geo Redundant     $0.095              $415
Azure Local             $0.07               $330
Azure Geo Redundant     $0.10               $425
Amazon Glacier          $0.01               $50
(Some of the math may seem wonky here but there are discounts for storing a bunch of data: 5TB is roughly 5,120GB, which at the headline $0.095/GB would be about $486 before those discounts kick in.) Tapes are looking like a pretty good deal! Of course there are a lot of additional costs around that $150. We’re not really comparing the same thing here: you need to keep multiple tapes so that you can cycle them offsite and even from day to day. I don’t know what a normal tape cycling strategy is but I don’t really see how you could get away with fewer than 3 tapes per 5TB.

There is also the cost of buying a tape drive, and you have to pay some person to take tapes out of the drive, label them (please label them), put them in a box and ship them offsite. This takes us over to the second point: people. No matter what you do you’re going to have to have people involved in running a tape drive. These people add cost and errors to the system. Any time you have people involved there is a huge risk of making a mistake. You can’t tell me that you’ve never accidentally written over a tape which wasn’t meant to be reused yet.

Doing your backup to a cloud provider can be completely automated. There are no tapes to change, no tapes to ship to Iron Mountain (which I discovered isn’t actually located in a mountain). There is a bandwidth cost and the risk of failure seems to be higher when backing up offsite, but bandwidth is quickly dropping in price, and since most of a backup’s bandwidth is used overnight your users still get their fast Internet during the day.

Not what Iron Mountain looks like. Jerks.

I’m in favour of cloud storage over local tape storage because I think it is more reliable once the data is up there, easier to recover (no messy bringing tapes back on site), more durable (are you really going to have a tape drive around in 10 years’ time to read these tapes? One in working condition?) and generally easier. There is also a ton of fun stuff you can do with your online backup that you can’t do locally. Consider building a mirror of your entire environment online. Having all your data online also lets you run analysis of how your data changes and who is making changes.

@HOVERBED suggested that having your backups available at all times is a security risk. What if people sneak in and corrupt them? I believe the same risk exists on tape. Physical security is largely more difficult than digital security, and most attacks on data are socially engineered or user error rather than software bugs allowing access. The risk can be mitigated by keeping checksums of everything and validating the data as you restore it.

Okay, so now we’re onto SANs and NASs (is that how you pluralize NAS? Well it is now). More expensive than the disk in my workstation? Actually not anymore. The storage on my workstation is SSD which, despite price cuts, is more expensive than buying the same amount of storage on a SAN. But why are we still buying SANs? The beauty of a SAN is that it is a heavily redundant storage device which can easily be attached to by a wide variety of devices.

Enter ZFS. ZFS is a file system created by Sun back when they were still a good technology company. I’m not going to go too far into the features but ZFS allows you to offer many of the features of a SAN on more commodity hardware. If that isn’t good enough then you can make use of a distributed file system to spread your data over a number of nodes. Distributed file systems can handle node failures by keeping multiple copies of files on different machines. You can think of it as RAID in the large.
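
As a rough sketch of what that can look like in practice (assuming a box with the standard zpool/zfs tools; the pool, dataset and device names are illustrative):

# a double-parity pool across six commodity disks
zpool create tank raidz2 sdb sdc sdd sde sdf sdg

# a dataset for backups with compression turned on, plus a snapshot
zfs create tank/backups
zfs set compression=on tank/backups
zfs snapshot tank/backups@nightly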

To paraphrase networking expert Tony Stark: That’s how Google did it, that’s how Azure does it, and it’s worked out pretty well so far.

Storage in this fashion is much cheaper than a SAN and still avoids much of the human factor. To expand a disk pool you just plug in new machines. Need to take machines offline? Just unplug them.

So is storage expensive? No. Is it more expensive than I think? Slightly. Am I going to spend the time trimming down my application? Nope, I spent the time writing this blog post instead. Efficiency!

2013-07-09

Invalid Installation Options in Starcraft Installer

I play next to no video games but my brother was over tonight and was bugging me to play Starcraft Heart of the Swarm. So I broke out the awesome collector’s edition which contains more disks and mousepads than… I don’t know, something which contains a lot of mousepads and disks, perhaps the year 2001. I put the disk in, hit install, agreed to god knows what in the licence agreement and was promptly told: Invalid Installation Options. Well, that’s odd because the only option I selected was the directory, which had 1.5TB free. I rebooted and futzed around a bit to no avail.

There was no additional information available as to what the error might be. Frankly this sort of thing irritates the shit out of me. I don’t mind there being errors. Installing or running software on as diverse a set of machines as run Starcraft has got to be non-trivial. What I mind is that there is no way for me to solve the problem. Throw me a fricking bone so I know what to fix.

No help on the official support forums, of course; they couldn’t be bothered helping out with only such a generic error message to point the way. Eventually I found some post about Diablo III which seemed related. I had to delete the Battle.net folder from my program data folder. This caused a redownload of the update installer, and that got it working. So that’s: delete C:\ProgramData\Battle.net.

Now I’m going to open negotiations with some terran generals. I’ve always felt there is too much Doom about Starcraft and not enough The West Wing. I’m going to write such a speech that galactic peace will have no choice but to show up.

2013-07-08

Quick Custom Colour Scales in d3js

The built-in colour scales in d3 are wonderful tools for those of us who aren’t so good at coming up with colour schemes. I would love to have the skills to be a designer but by now it should be pretty clear that’s never going to happen. However it occurred to me the other day that the built-in scales in d3 are designed for high contrast rather than for colour scheme consistency.

Woah, consistent

In order to make prettier graphs and visualizations I thought I would build a colour scheme scale. I started with a very simple two colour scale.
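
Something like this is all it takes (a minimal sketch modelled on the d3 v3 scale API; the function name and colour values are just illustrative):

// a tiny ordinal-style scale that cycles through a fixed set of colours
function alternatingColourScale() {
  var domain = [],
      range = ["#2c7fb8", "#a6bddb"]; // two shades of the same hue

  function scale(value) {
    var index = domain.indexOf(value);
    if (index === -1) {
      domain.push(value);               // grow the domain as new values show up
      index = domain.length - 1;
    }
    return range[index % range.length]; // cycle through the colours
  }

  scale.domain = function (values) {
    if (!arguments.length) return domain;
    domain = values.slice();
    return scale;
  };

  scale.range = function (colours) {
    if (!arguments.length) return range;
    range = colours.slice();
    return scale;
  };

  return scale;
}

// usage, just like a normal d3 scale:
// var colour = alternatingColourScale().domain([0, 1, 2, 3]);
// colour(0); // "#2c7fb8"
// colour(1); // "#a6bddb"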

This can be used as a drop-in replacement for the d3 colour scale. You can specify a domain and a range, then call it as you would a normal d3 scale.

I used it to alternate colors in my graph to look like this

Prettier

It would be trivial to change the scale to account for more colours
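
With the sketch above that is just a matter of handing it a longer range of colours (the values here are illustrative):

var colour = alternatingColourScale()
    .range(["#08519c", "#3182bd", "#6baed6", "#bdd7e7"]);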


Now you can make graphs which look like

More columns for more column fun!

Of course colour scales can be used for all sorts of fun applications. Consider:

  • highlighting bars which are above or below a certain threshold
  • displaying another dimension of data
  • switching to a palette that works for colour-blind readers

Anyway this was just a fun exercise to see what I could come up with in a couple of minutes.

2013-07-01

Blogging so far this year

In January of this year I decided that I was going to try to blog every single day for a whole year. It’s July 1st now which means that I’m 6 months into the project. I had a number of reasons at the beginning for embarking on such an adventure:

  1. Get better known in the technical community around Calgary and Canada in general. I have been doing more talks and more open source projects in the last year in the hopes it raises my profile. I feel that I’m having some success in this area. I’ve joined the Calgary .net users group and I’ve given a number of talks at various conferences and .net workshops.
  2. Encourage me to learn about new technologies and to step outside of my normal box of understanding. I feel that I’m having some success in that area. I would never have looked at such technologies as Dart or CoffeeScript nor would I have delved so deeply into databases and a dozen other topics.
  3. Improve my writing by doing more of it. There is no way to improve at something which doesn’t involve at least some degree of practice. To that end I’ve written many thousands of words. Several of the words had more than one syllable which is always impressive.
  4. Finally there was a selfish reason: I wasn’t really challenged at work and I wasn’t enjoying my job. I use the blog as a lifeline to keep me sane when working with ancient technologies like Windows XP and Microsoft Access. I figured having a lot of blogs would give me something to point at during job interviews.

I certainly feel that my goals have been largely accomplished but there has been a cost too: quality. Producing a post every weekday (even holidays, like today) means that I’m working to some pretty tight deadlines. Many of my posts have only scratched the surface of a technology or are just regurgitations of a collection of other blog posts on the same topic without providing my own take.

Sacrificing quality for quantity is a common trade off both in software and in the real world. As noted agile brain-box Steve Rogalsky would likely claim: deadlines are the enemy of quality. Having tried half a year of quantity over quality I’m going to switch it up for the next 6 months and try quality over quantity. I’m still going to try imposing a week-long time box because I’ve got to publish something.

2013-06-28

Turning on a dime

I don’t think it is any secret that Windows 8 is not doing well. The ModernUI or MetroUI was a bold move but not one which was well received by consumers. Big changes of that sort are seldom well received. It isn’t necessarily because the new UI is wrong or broken (I think it is broken but that’s not the point), it is just that people have invested a lot of time into learning the old way of doing things and change is scary. Remember when Office 2007 came out with the ribbon? Total disaster. At the time I watched a little documentary that Microsoft put out about the research they put into the productivity of the ribbon vs the old toolbars. It was amazing: they spent hours and hours on the thing doing A/B testing in a user interface laboratory. I don’t remember the exact stats but they found the ribbon to be far more productive than the toolbars and only take a few hours to learn. I think the stat was that within 3 days users were more productive on the ribbon than the toolbars. Still the outcry was palpable and to this day my dad complains about not being able to find things on the ribbon (Office 97 was really the high water mark for him).

I imagine similar testing went on with ModernUI and we’re seeing the same sort of backlash. Only this time users have alternatives: tablets and Macs. In 2007 there was no alternative to MS Office; I think you can argue that remains true. The Microsoft of today is a different beast from that of 2007: they are more responsive to user complaints. So this summer they are launching Windows 8.1 which is designed to fix many of the problems in Windows 8. Well, fix the perceived problems with Windows 8; I never felt there was a big problem which needed fixing in the first place. Already I’m seeing complaints that the new Start button is junk and that Windows 8.1 is no better than Windows 8. However the point is that Microsoft, a huge multi-zillion dollar company with more momentum than the ship in Speed 2, changed their whole Windows strategy.

I remember this being a terrible movie. I bet it is better than Numbers Station, though.

Good on them! Now it cost them a few executives to make the change, but if Microsoft can do this then what is stopping you and your company from making big changes? See, change isn’t that hard, it just requires that you value the end user more than the process.

2013-06-27

What happened to thin clients?

Somewhere at home I have this awesome little thin client computer called a Sun Ray. It doesn’t have any disk in it and has limited memory; basically it is a front end for a server. A visual terminal if you like. I think I got it for about $20 on eBay 7 or 8 years ago.

I love the bracket, it is from space which is odd as in space you don't need supports.

When you boot it up it makes a DHCP request and, in addition to the normal data returned by DHCP, the server gives it the location of an image to boot. All the actual computing a user does is performed on a server and the Sun Ray simply shows images. The idea was that instead of buying a bunch of desktop computers which quickly become outdated you buy these thin clients and just upgrade the server. You might not even need to upgrade the server so much as add more servers as the old server could be upgraded easily. IT management costs would be reduced as there was no reason to send out techs to people’s offices except to replace a keyboard or mouse. The Sun Ray had no moving parts so the chances of failure were pretty damned small. Also it was built by Sun which had a reputation for making the most bullet proof hardware imaginable.

It was a well deserved reputation. Years ago we had some E450s and used them as go-carts around the office. We never broke one. Walls? Yes. A door. Yep. But the machine? Never.

Not even a purple shell could break one.

Without the need for onsite techs you could outsource all your desktop support to SE Asia or a country with more Stans in its name than Syd Hoff’s famous book.

More popular than the sequel “Stanley is Devoured”

For me to spend $20 on this thing I must have been convinced at the time that thin clients were the future. But look around today and do you see any thin clients in your offices or your house? I don’t know what happened. They are a great idea and one which seems even more sensible in a cloud computing world.

While I don’t see any thin clients on my desk I do see a Windows XP workstation. It hasn’t been upgraded by the IT group yet because they are too busy doing their normal jobs without running around upgrading thousands of workstations. Any ever-greening programme which existed has been crippled by the financial crisis and even those getting new machines are getting Windows XP workstations. Upgrading an entire company is a huge undertaking both in terms of time and money. If the business is persisting on Windows XP then what’s the advantage in upgrading? It is a good question and the only answers are:

  1. Older OSes are not being supported any more and if you find an issue you’re on your own

  2. Newer technology is more efficient from a power perspective and from a workflow perspective

  3. People like having new hardware/software. It is a low cost way of keeping people happy (at least people like me)

Instead of upgrading all the workstations, upgrading the servers is much easier. You can even do it remotely using virtual machines and have an easy downgrade path should it be needed. Why thin client computing hasn’t taken off in business I don’t know. Looking around the office here it seems that almost everybody just uses Excel, which can easily be put on a thin client without worrying about network latency. I suppose you could argue that Google’s Chromebook is a thin client but I don’t think it is the sort of product which is going to be on corporate desks any time soon.

So I’m asking: What happened to thin clients?

2013-06-26

3D printing is here!

A few years back I read the Cory Doctorow book Makers. It is a very interesting look at the future of technology, and one of the things which made a big difference in that future was 3D printing. People no longer needed to go to Wal-Mart to buy small things like kids’ toys or even tools. Instead they just printed whatever they needed on commercially available 3D printers. Need a new wrench at 3am to replace some car part I’ve never heard of? Print it. Heck, need the car part? Print it. Even if you need a cup holder for your car you can print it.

Printers are available which print with all sorts of different materials. I’m particularly excited by the ones which print in food. Yummy. However more practical are the ones which can print in metal or ceramic. Perhaps soon you’ll no longer go to Canadian Tire to get that 5/16th socket you’re missing. Even if you can’t afford your own 3D printer a local print store will soon crop up.

When the book came out it really was a futuristic technology. I think that has changed today. Microsoft announced that Windows 8.1 will have support for 3D printing out of the box.

Microsoft is not really a very exciting company for the most part, so if they’re adding support for 3D printing that means 3D printing has moved from an experiment to a consumer-grade technology. Once the domain of rapid prototyping, 3D printing is starting to become actually useful.

Printers are still pretty expensive. This MakerBot Replicator 2 is over $2000

However that isn’t going to last very long. Every week on Kickstarter I see new printers popping up which are far cheaper than MakerBot’s offerings. I was particularly drawn to this Buccaneer printer.


It is smaller than the MakerBot printer but the rest of the specifications seem to be very similar.

Local manufacturing using 3D printers and milling machines is going to change the world. Mass production reduced the cost of goods but also removed the ability to customize the output. 3D printing brings back the customization while keeping the low price of mass production.

Exciting. 58 hours left on the Buccaneer Kickstarter at the time of writing. Do you think my wife will mind if I buy one, or four?

2013-06-25

Retrieving Documents from Blob Storage with Shared Access Signatures

Azure blob storage provides a place where you can put large binary objects and then retrieve them relatively quickly. You can think of it as a key-value store, if you like. I don’t know how many people use it for storing serialized objects from their applications but I bet hardly any. SQL storage is a far more common approach and even table storage with its built-in serialization seems better suited for most storage of serializable data. For the most part people use it as a file system, at least I do. When there is a need to upload a document in my program, the server side application takes the multi-part form data and shoves it into blob storage. When it needs to be retrieved, a far more common activity, I like to let blob storage itself do the heavy lifting.
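
The upload side looks roughly like this (a sketch against the 2.0 .NET storage client; blobClient and uploadedFile are stand-ins for whatever your app already has in hand):

// take the uploaded file from the multi-part form post and push it into blob storage
var container = blobClient.GetContainerReference("documents");
container.CreateIfNotExists();

var blobId = Guid.NewGuid().ToString();          // a GUID keeps blob names from colliding
var blob = container.GetBlockBlobReference(blobId);
blob.Properties.ContentType = uploadedFile.ContentType;
blob.UploadFromStream(uploadedFile.InputStream); // stream the form data straight into the blob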

Because blob storage is accessed over HTTP you can actually give out the document name from blob storage and let clients access blob storage directly. This saves you on bandwidth and server load since you don’t have to transfer the document to your server first, and it is almost certainly faster for your clients.

From blob storage, to cloud server to end user

You can set up access control on blob storage to allow for anonymous read of the blob contents. However that isn’t necessarily what you want because then everybody can read the contents of that blob. What we need is a way to instruct blob storage to let only specific groups of people read a file. The common solution to this problem is to give people a one time key to access the blob storage. In Azure world this is called a Shared Access Signature token. I wouldn’t normally blog about this because there are a million other sources, but I found that the terminology has changed since the 2.0 release of the Azure tools and now all the other tutorials are out of date. It took me a while to figure it all out so I thought I would save you some time.

The first step is to generate a token.

Here I set up the blob security to remove public access, then I generate a SharedAccessBlobPolicy whose window starts 1 minute in the past and expires 2 minutes in the future. The minute in the past is to account for minor deviations between the clocks on the server and on blob storage. This policy is then assigned to the container.
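
The code for that looks roughly like this (a sketch against the 2.0 storage client; the container and policy names match the URL below, while blobClient and documentId are stand-ins for what your app already has):

// lock the container down so there is no anonymous access
var container = blobClient.GetContainerReference("documents");
var permissions = new BlobContainerPermissions { PublicAccess = BlobContainerPublicAccessType.Off };

// a policy that starts a minute in the past (clock skew) and expires two minutes from now
var policy = new SharedAccessBlobPolicy
{
    SharedAccessStartTime = DateTime.UtcNow.AddMinutes(-1),
    SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(2),
    Permissions = SharedAccessBlobPermissions.Read
};

// attach the policy to the container under a name, then build a URL for the blob using it
permissions.SharedAccessPolicies.Add("temporaryaccesspolicy", policy);
container.SetPermissions(permissions);

var sasToken = container.GetSharedAccessSignature(null, "temporaryaccesspolicy");
var blob = container.GetBlockBlobReference(documentId);
var url = blob.Uri + sasToken;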

The URL returned looks something like http://127.0.0.1:10000/devstoreaccount1/documents/cdb8335d-f001-46ca-84da-de12ac57157b?sv=2012-02-12&sr=c&si=temporaryaccesspolicy&sig=BC4YjrFtVfpkZry9VHCB9qDMKqGS4%2B46rcNt30kjH4o%3D

Fun! Let’s break it down

This part is the blob storage URL (this one is against local dev storage and my blob id is a GUID to prevent conflicts)

http://127.0.0.1:10000/devstoreaccount1/documents/cdb8335d-f001-46ca-84da-de12ac57157b

The second part here is the SAS Token which is valid for the next 2 minutes. Hurry up!

?sv=2012-02-12&sr=c&si=temporaryaccesspolicy&sig=BC4YjrFtVfpkZry9VHCB9qDMKqGS4%2B46rcNt30kjH4o%3D

If you attempt to retrieve the document after the time window then you’ll get a 403 authentication error back from blob storage.

While this method of preventing people from getting the protected documents is effective it isn’t flawless. It relies on providing a narrow window during which an attack can function. If you have several classes of documents, say one for each customer of your site, then it would be advisable to isolate each customer in their own blob container.

2013-06-24

Elasticsearch

I have previously written about how I think that search is the killer interface of the future. We’re producing more and more data every day and search is the best way to get at it. In the past I’ve implemented search both in a naïve fashion and using Lucene, a search engine. In the naïve approach I’ve just done wildcard searches inside a SQL database using a like clause

select * from table_to_search where search_column like '%' + @searchTerm + '%'

This is a very inefficient way to search but when we benchmarked it the performance was well within our requirements, even on data sets 5x the size of our current DB. A problem with this sort of search is that it requires that people be very precise with their search. So if the column being searched contains “I like to shop at the duty free shop” and they search for “duty-free” then they’re not going to get the results for which they’re looking. It is also very difficult to scale this over a large number of search columns; you have to keep updating the query.

Lucene is an Apache project to provide a real search engine. It has support for everything you would expect in a search engine: stemming, synonyms, ranking… It is, however, a bit of a pain to integrate with your existing application. That’s why I like Elasticsearch so much. It is an HTTP front end for Lucene.

I like having it as a search appliance on projects because it is just somewhere I can dump documents to be indexed for future search even if I don’t plan on searching the data right away.

Setting up a basic Elasticsearch instance couldn’t be simpler. You just need to download the search engine and start it with the binary in the bin directory.

bin/elasticsearch

This will start an HTTP server on port 9200 (this can be configured, of course). Add documents to the collection using HTTP PUT like so

curl -XPUT 'http://localhost:9200/documents/tag/1' -d '{ "user" : "simon", "post_date" : "2013-06-24T11:46:21", "number" : "PT-0093-01A", "description" : "Pressure transmitter" }'

The URL contains the index (documents) and the type (tag) as well as the id (1). To that we can put an arbitrary json document. If the documents index doesn’t already exist it will be created automatically with a set of defaults. These are fine defaults but lack any sort of support for stemming or lemmatization. You can set these up using the index creation API. I found a good example which demonstrates some more advanced index options:
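
Something along these lines, for example (a sketch only; the analyzer name is made up and the description field matches the document indexed above):

curl -XPUT 'http://localhost:9200/documents' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "stemming_analyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["lowercase", "snowball"]
        }
      }
    }
  },
  "mappings" : {
    "tag" : {
      "properties" : {
        "description" : { "type" : "string", "analyzer" : "stemming_analyzer" }
      }
    }
  }
}'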

In this example a snowball stemming filter is used to find word stems. This means that searching for walking in an index which contains only documents with walk will actually return results. However stemming does not look words up in a dictionary, unlike lemmatization, so it won’t find good if you search for better. Stemming simply modifies words according to an algorithm.

There are a number of ways to retrieve this document. The one we’re interested in is, of course, search. A very simple search looks like

curl -XGET 'http://localhost:9200/documents/_search?pretty=true' -d '{ "query": { "fuzzy" : { "_all": "pressure" } } }'

This will perform a fuzzy search of all the fields in our document index. You can specify an individual field in the fuzzy object by specifying a field instead of _all. You’ll also notice that I append pretty=true to the query; this just produces more readable JSON in the result.
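
For example, to run the fuzzy match against just the description field of the document indexed earlier:

curl -XGET 'http://localhost:9200/documents/_search?pretty=true' -d '{ "query": { "fuzzy" : { "description": "pressure" } } }'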

Because everything is HTTP driven you might be tempted to have clients query directly against the Elasticsearch endpoint. However that isn’t recommended; instead it is suggested that the queries be run by a server against an Elasticsearch instance behind a firewall.

Adding search to an existing application is as easy as setting up an Elasticsearch instance. Elasticsearch can scale up over multiple nodes if needed or you can use multiple indices for different applications. So whether you’re big or small, Elasticsearch can be a solution for searching.