2013-09-18

Document control and DDD/CQRS - solving similar problems

I had the good fortune to have a two hour introduction to the world of document control the other day. It was refreshing to see that we programmers aren’t the only ones who don’t have things figured out yet. The entire document control process is an exercise in managing the flow and ownership of data. I spent a lot of time thinking about how similar the document control problem and the data flow problem mirror each other.

Document control is really interested in documents and doesn’t care at all about the contents of these documents. Their concerns are largely around

  • who owns this document?
  • what is the latest version of this document?
  • how is this document identified?
  • how long do I have to keep this document?
  • is this document superseded by some other document?

These sound a lot like issues with which we deal when using DDD. Document ownership is a simply a problem of knowing in which aggregate root a document belongs. Document versioning is similar to maintaining an event stream. Document identification is typically done through numbering ““ however the flow of documents is slow enough that sequential numbering isn’t a problem ““ no need for a randomly generated GUID.

Document retention isn’t one with which we typically spend much time in CQRS land. Storage is cheap so we just keep every version around or at least we’re able to generate every version through event sourcing. Perhaps the most congruent concept is taking snapshots of aggregates, but we’re typically only interested in the most recent version of the aggregate. With document control there is always some degree of manual intervention with documents so there is a significant cost to retaining all documents indefinitely. I’m only talking about digital copies of document here, Zuul protect you if you need to track paper copies of things too. I can’t even keep track of my keys let alone tens of thousands of documents. My strategy for paper documents would be to burn them as soon as I got them and refer people to the digital version.

Superseding documents also doesn’t seem like a problem we typically have in CQRS. In document control one or more documents may be supersededby one or more documents. For instance we may have a lot of temporary documents which are created by the business things like requests to move offices. They have value but only in a transitory way. Every week the new office seating chart is built from these office move documents and the documents discarded. Their purpose is complete and we no longer care about them as we have a summary document.

Many documents become one. I call it a Voltron operation. Many documents become one. I call it a Voltron operation.

In the opposite operation a document can be replaced by a series of documents. This activity is prevalent when adding detail to documents. A single data sheet may become several documents when examined in more detail

Reverse Voltron? Fan-out? The name may need some work. Reverse Voltron? Fan-out? The name may need some work.

This was originally going to be a post about how much we in the DDD/CQRS community have to learn from document control. I imagined that document control was a pretty old and well defined problem. There would surely be well defined solutions. I did not get that impression.

The problem of canonical source of truth or “who owns the data” is a very difficult one in document control. We’re spoiled in DDD because it is rare indeed that the owner of a piece of data can change during its lifespan. Typically the data would remain within an AR and never updated without the involvement of the AR. With document control it is probable that responsibility could jump from your AR to some other, possibly unknown, AR. It could then jump back. At any point in time it would be impossible, without querying every AR, who had control of the data. Of course with a distributed system like many people working on a document it is possible that there will be disagreement about which AR has responsibility at any one time. Yikes!

What we can learn from document control

I think that looking at document control gives us a window into what can happen when you relax some of the constraints around DDD. Data life-cycle is well defined in DDD and we know who owns data. If you don’t then you end up in trouble with knowing who is the source of truth. Document control must solve this problem constantly and it can only be done by going out and asking stakeholders a lot of questions ““ a time consuming exercise.

The introduction of splitting and combining documents, or in our case aggregates, over their lifetime is disastrous. You lose out on the history of information and knowing where to apply events becomes difficult. Instead we should retain aggregates as unchanged as possible (in terms of what fields they have, obviously the data can change) and rely on projections of the data to create different views of information. This is basically impossible to apply to formatted documents as you would have in document control.

What I think would help out document control

The first thing which comes to mind as being directly applicable to document control is removing meaning from the document identifiers. The documents document control manages tend to be numbered and the temptation to add meaning to a document number is too tempting to turn down. For instance you might get a number like

P334E-TT-6554

In our imaginary scheme all documents which start with P are piping diagrams. The 334 denotes the system to which it belong, E the operating pressure and TT the substance inside the pipe. The final digits are just incrementally assigned. The problem is just what you would expect: things change. When they do a decision must be made to either leave the number intact and damage its reliability or to renumber the document and lose the history. instead document control would do well to maintain an identifier whose sole purpose is to identify the document. The number can be retained but only as a field.

A more controversial assertion is that document control should retain all documentation. We retain a full history of messages used to build an entity, even if it is offline and used in favor of a snapshot. I believe that document control should do the same thing. Merging and splitting document is problematic and complicated. It is easier to just create a new document and reference the source documents. Ideally the generation of these new documents can be treated as a projection and the original documents retained.

In the end it is interesting to see how similar problem domains are solved by different people. That’s the beauty in learning a new development language; every language has different features and practices. I’m not, however, prepared to be the guy who learns document control in depth to bring their knowledge back to the community.

2013-09-11

The city of calgary doesn't get open data

It seems that the City of Calgary has updated its open data portal. I was alerted to it not by some sort of announcement but by a tweet from Grant Neufeld who isn’t a city employee any shouldn’t be my source of information on open data in Calgary.

Nice new City of Calgary Open Data website was quietly deployed a couple weeks ago! https://t.co/y1PnIuJg0o #yycdata #yyccc

“” Grant Neufeld (@grant) September 10, 2013

The new site is better than the old one. They have done away with the concept of having to add data to a shopping card and then check out with it. They have also made the data sets more obvious by putting them all in one table. They have also opened up an app showcasewhich is a fantastic feature. It can’t help to cross promote apps which make use of your data. There are also a few links to Google and Bing maps which do an integration with the city’s provided KML files. As I’ve said before I’m not a GIS guy so most of that is way over my head.

It is a big step forward”¦ well it is a step forward. I know the city is busy with more important things than open data but the improvements to the site are a couple of day’s worth of work at best. What frustrates me about the process is that despite having several years on lead time on this stuff the city is still not sure about what open data is. I draw your attention to the FoIP requests CSV&VariantId=1(CITYonlineDefault)). First thing you’ll notice is that despite being listed as a CSV it isn’t, it is an Excel document. Second is that the format is totally not machine readable, at least not without some painful parsing of different rows. Third the data is a summary and not the far more useful raw data. I bet there is some supposed reason that they can’t release detailed information. However if FoIP requests aren’t public knowledge then I don’t know what would be.

Open data is not that difficult. I’ve reproduced here the 8 principles of open data fromhttp://www.opengovdata.org/home/8principles

  1. Data Must Be Complete

All public data are made available. Data are electronically stored information or recordings, including but not limited to documents, databases, transcripts, and audio/visual recordings. Public data are data that are not subject to valid privacy, security or privilege limitations, as governed by other statutes.
2. Data Must Be Primary

Data are published as collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.
3. Data Must Be Timely

Data are made available as quickly as necessary to preserve the value of the data.
4. Data Must Be Accessible

Data are available to the widest range of users for the widest range of purposes.
5. Data Must Be Machine processable

Data are reasonably structured to allow automated processing of it.
6. Access Must Be Non-Discriminatory

Data are available to anyone, with no requirement of registration.
7. Data Formats Must Be Non-Proprietary

Data are available in a format over which no entity has exclusive control.
8. Data Must Be License-free

Data are not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed as governed by other statutes.
The city is failing to meet a number of these. They are so simple, I just don’t get what they’re missing. The city employees aren’t stupid so all I can conclude is that there is either a great deal of resistance to open data somewhere in the government or nobody is really convinced of the value of it yet. In either case we need a good push from the top to get going.

2013-09-10

So I wrote a book

I’ve been pretty quiet on the old blog front as of late. This is largely attributable to me being busy with other things. The most interesting of which, in my mind, is that I wrote a book. It isn’t a very long book and it isn’t a very exciting book but I’m still proud of having written the little guy. This post is less about the book itself and more about what it was like to write a book.

6542OS_mockupcover_normal

First off it is a lot of work. Far more work than I was originally expecting. I’ve written lengthy things before, most notably about 100 pages during my masters. This was different because I didn’t feel like I knew the content as well as I did for the masters paper. Having restrictions on the length of the chapters was the most difficult part. Due to some confusion about the margins for a page I started by writing the equivalent of 15 pages for a 10 page chapter. I did this for 4 chapters before my editor caught it. I agreed to cut the content down and get back on track in accordance with the outline.

This was a mistake. It was really hard to cut content to that degree. A few words here or there was easy enough but what amounted to a third of the chapter content? Tough. Later in the project I realized that keeping within the outline pages was not nearly as important as I had been lead to believe. After throwing the limits out the window the writing process became much easier.

In order to treat writing a book with the same agile approach one might use for developing software it seems crucial to not involve page counts at all. A page count is a poor metric and I have no idea why one would optimize for it. Obviously there should be some rough guidelines for the whole thing you don’t want to end up with a 1000 page book when you only set out to write 200 nor do you want 200 pages when you set out to write 1000. But writing to within 50% of the target length is reasonable.

To put too much emphasis on length is to lose sight of the goals of the book. These are much more along the lines of education or entertainment or something like that. The goal isn’t to kill X trees.

Are books still relevant?

Umm, . I don’t know to be honest. I don’t read many programming books these days, I spend my time reading blogs and tutorials instead. I think there is still a space for paper form technical books even in a fast moving world like computer programming. There is certainly a place for books about techniques or styles or about the craft of programming in general. I have some well thumbed copies of Code Complete and The Pragmatic Programmer and even Clean Code. I do not, however, think there is a place for technology specific paper books. That target moves too quickly.

The long form technical document is not dead it just needs to remain spry. If you’re going to publish a longer book style document then publishing it in a form which can be changed and updated easily is key. This is where wikis and services like leanpub come into their own. As an author you need to keep updating the book or open it to a community which will do updates for you.

Would I do it again?

Not at the moment. Not through a traditional publisher. Not on my own.

I’ve had enough of writing books for now. I’m going to take a break from that, likely a long break. I might come back to it in a year or two but no sooner. I think I can understand why authors frequently have long breaks between their books. It is an exhausting slog, a death march really.

There was nothing wrong, per say, with Packt publishing. They did pretty good work and I liked my editor or editors or many many editors”¦ I’m not sure how many edits we had on some chapters. Frequently it would be edited by person X and then those edits reversed by person Y. There didn’t seem to be an overall guiding hand which was responsible for ensuring a quality product. Good editing has to be the selling feature for publishers, the way they attract both authors and purchasers. It is the only thing which sets them apart from self publishers.

Self publishing and micro-printing is coming into its own now. By micro-printing I mean being able to produce small runs of books economically rather than printing in very small text. If I were to do it again I would take this route. I would also hire a top notch editor who would stay with the project the whole time, somebody like @SocksOnBackward(she would tell me that top notch should be top-notch).

I would also like to work with somebody. Writing alone is difficult because there is nobody off of whom you can bounce ideas. I certainly could have reached out to random people I know in the community but it is a lot to ask of them. I would be much happier having somebody who could share the whole endeavour with me.

I guess watch this spot to see if I end up writing another book.

2013-08-28

Configuration Settings in an Azure Worker Role

I have found that developing an Azure worker role is somewhat poorly documented. Perhaps I am just not good at googling for what I need but that has not been my experience in the past. Anyway I have a few worker roles in a project I’ve been struggling with how to get connection strings into them. I need two strings: a database connection and an azure storage connection string. Typically I would set this up by having an app.config file but that doesn’t scale particularly well out to the cloud. You have to redeploy to change of the settings. Instead I thought I would make use of the settings mechanism provided by Azure.

The first step is to set up the settings in your Azure project. This is done by opening up the properties for the role and going to the settings tab

Good try, Harriet the Spy, you can't see anything secret in this screenshot.Good try, Harriet the Spy, you can’t see anything secret in this screenshot.

I added two settings: StorageConnectionString and DefaultConnection. While writing this I decided that I hate both those names, drat. You can pick the environment in the service configuration drop down. Cloud is used in the cloud and local is used during a locally emulated cloud.

In my code I created a static helper class to access these settings

You can see that I’m checking two sources for the information. If the role environment provider works then I use that connection string otherwise I use the fallback to the configuration file. For some reason the role environment bails with a full on exception if you try to get configuration information out of it without it running in the cloud or emulator. That seems like overkill, especially because the exception thrown is super general and provides no helpful information.

The config file settings are used when I’m running tests on the service locally outside of the emulator. Frequently the emulator is overkill for the simple debugging I’m doing, so I have a unit test which can be enabled that just launches the service. It is quick and close enough to production for most purposes.

In azure proper you can configure overrides for the cloud settings in the configure tab of your cloud service.

In Azure you can update the configuration settings.In Azure you can update the configuration settings.

All this seems to work pretty well.

2013-08-01

C# Contracts

A few weeks ago I stumbled on an excellent video of Greg Young talking at Ordev back in 2010. The topic was object oriented programming and, basically, how I’m an idiot. Not me in particular, it would be somewhat upsetting if Greg had taken the time to do an hour talk on how Simon Timms is an idiot. Upsetting or flattering, I’m not sure which. It is a very worthwhile video and you should make time to watch it. One of the takeaways was about code contracts.

contract

I’ve never given much thought to code contracts before. I was never too impressed by what I considered to be a bunch of noise which tools like Resharper add to your code.

http://gist.github.com/stimms/6133393

“Asserting stuff is all nice and good but it should be caught by unit tests anyway” was my though.I have a lot of respect for Greg so I though I would look into code contracts. I look on them as a sort of extension to interfaces. Interfaces are a programmatic way of describing how an implementation should look. For instance a common interface in the projects I build is ILog which is an interface for logging. It is typically modeled after the ILog interface from Log4Net although it now includes some practices I picked up from my preferred logging framework, NLog.

The compiler guarantees that anything which implements that interface has at least some sort of implementation in place for each one of the defined methods. The compiler doesn’t care what the implementation is so long as there is one there. This allows me to create a “valid” implementation which looks like

This implementation doesn’t actually do what I had intended when I specified the interface. Unfortunately, there is no way, through, interfaces to require that functions actually do what they claim to do. Code contracts add another layer of requirements to implementations and allow for the enforcing of some additional conditions. Having contracts in place allows you to replace many of your unit tests with static checking. Want to ensure that null isn’t passed in? Build a contract. I decided to dig a bit more into how code contracts were working for C#.

As it turns out finding information on code contracts for C# is really difficult. There have been a couple of efforts over the years to bring code contracts into the .net world. The latest and, seemingly, most successful is as part of the PEX project. There was a burst of videos and activity on that project in 2010 but since then activity seem to have fallen off rather dramatically. Most everything in the code contracts works but it is somewhat flaky on visual studio 2012.

To get started you need to install two visual studio extensions: Code Contracts Tools and Code Contracts Editor Extensions VS2012. You can also install the code digger which displays a table of inputs which are checked for your methods. It is useful but is crippleware compared to how it is shown to work in videos like this one. The tool use to have the ability to generate unit tests but as I understand it this functionality is limited to Visual Studio Ultimate. I’m not fabricated from money so I don’t have that. Boo. (well not “Boo” for not being made from money rather “Boo” for the restriction. I’m glad I’m not made from money. Money is filthy)

Code contract extensionsCode contract extensions

Once you have these extensions installed you can start playing around with code contracts. When you come across a method which has contracts attached to it they will be shown in the intellisensehint. Some parts of the .net BCLs have received code contracts treatment. However it is wildly inconsistent which parts have contracts associated with them. Some places where I think they would be useful have been missed and other places are oddly over specified. For instance System.Math:

Missing contractsMissing contracts

Overly complicated contractOverly complicated contracts

The contracts on Math.Ceiling are pretty obvious yet they don’t seem to have been implemented. Irritating!

If you would like to specify contracts on your own code then, as far as I’m concerned, you should do it at the interface level. Always. You can put contracts on your concrete classes but then you’re all coupled to implementation and that sucks.

Because code contracts are implemented as a library instead of being part of the language syntax like Eiffel you need to set them up in buddy classes next to your interfaces. It is a real shame that they went this way and perhaps, once Rosslyn gets going, there will be a way to modify the language with new key words to deal with contracts.

Let’s say you have a class which does some math, specifically it takes a square root of a number.

This class is an implementation of the IMath interface

Here I’ve added an annotation which points to another class as containing the contracts. I actually really like that the contracts are split out into another class. It keeps the code short and still allows communicating the information about the contracts viaintellisense. The buddy class looks like:

For some reason I don’t really understand you need to specify the class for which it is a contract in an annotation. I think that pollutes the idea of a contract. The implementer should know about what contract it implements but the contract shouldn’t care at all. Each method on which you want a contract is specified and you can put in requires (pre conditions) and ensures (post-conditions). We’ll ignore the existence of i to make a point. The method is never executed so the remainder of the body is not important.

You can try the contract out by attempting to pass in an illegal value.

This will result in errors like

A failing contractA failing contract

This isn’t very exciting because, of course, -9 is a negative number. Where things get interesting is when you start coupling together contracts.

This will also fail because the contract checker will actually go out and build up a representation of how data moves around the application. It is able to spot the conflicting contracts and warning about them.

The checking won’t actually be run unless you enable it in the properties of your project. I couldn’t find any setting which showedintellisense for the contracts I had created. I believe that is just suppose to work but it didn’t on the machine I used.

Settings for contract checking Settings for contract checking

If you run into a contract which is failing and you can’t quite figure out what’s going on then the PEX Code Digger can come in handy. You can right click on the method with the contract and it will show you the paths through the method which caused a contract failure. By default it only works on portable class libraries, I understand you can reconfigure that but I don’t know what the repercussions are of that. So I created a portable class library.

Portable Class Library

The System.Diagnostics.Contracts namespace in which the contracts code lives is not part of any of the 4.0 portable subsets. You’ll need to get one of the .net 4.5 portable subsets. That’s not an obvious task. To do it you need to add a brand new library to your project and it needs to use the portable class library template.

New portable libraryNew portable library

You’re then given a choice of platforms. Many of these platforms are not natively .net 4.5 and will result in a 4.0 library. It took some playing around but I found that this combination worked:

Only contract killers on the xbox, no code contractsOnly contract killers on the xbox, no code contracts

Conclusions

I don’t know about contracts. They have the potential to speed up unit testing by creating your tests for you. Well some of your tests. The simple boiler plate tests that everybody skips doing because they’re mind numbing are largely eliminated. Anything which removes a barrier to the adoption of TDD is a good thing in my mind.

However I don’t think the implementation for C# is ready yet. Maybe they’ll never be ready. I asked around a bit but nobody seems to know what happened to code contracts. Are they still being developed? If so where is the activity? How come the editor stuff doesn’t work for my code contracts? Contract checking is also super slow. Even on this small application running the checks took a minute. I cannot imagine what it must do on a large project. Contract checking seems like it might be the sort of thing you run on that build which runs over the weekend. That sort of long feedback cycle is terrible. The better solution is to run the contracts, generate unit tests from them and run the unit tests. However, like I said, that feature seems to have been moved to the elite SKUs.

I won’t be using contracts but I will be keeping an eye out for news of continued work on them.

2013-07-30

How I broke the Linux

Years ago I was big into the Linux. Heck I was big into all Unix stuff. I had a 3 node cluster of Solaris 9 servers in my basement once which, having been built from old hardware, was probably slower than any other single machine on my network. But then I got tired of screwing around with Linux and FreeBSD and OpenBSD and (I was young, I swear) OpenVMS. I got old and I just wanted things to work. If I buy a new video card I don’t want to recompile my fricking kernel from sources I downloaded using and FTP client I wrote myself based on an argument I had with RMS in which I accused him of being a Microsoft shrill. Just work, damn it.

That being said I keep a few Linux boxes around to do things like serve files and do DHCP and the such. It was one of these boxes I rebooted after some updates last week. Now this box is amazingly stable and I have it on a UPS so its uptime was over 500 days. When it came back up a drive was missing. “That’s weird” I thought and dug into it. This is my primary drive which contains terabytes of completely legally obtained videos. In a fit of anti-police sentiment I had encrypted the snot out of the drive with TrueCrypt. I had no idea what the passphrase was. I just remembered it was long. Like mindbogglinly long. So long that, were he still alive, Robert Jordan would be impressed. This drive was never going to be cracked and I didn’t have the passphrase.

Well shoot.

So I decided I would throw the whole thing out and start over. All my important files were backed up to CrashPlan using a key I actually remembered. I would reformat and start over. A fresh start! I could get higher res versions of the stuff I had lost. Name them sequentially. It would be gloriously well ordered. Then I made my second mistake. I decided to upgrade the OS to the latest while I was in there.

Turns out that when I set up that machine I had used a software RAID1. It had, of course, never really worked properly. During boot md (the software RAID) would complain about being degraded. It never really seemed to be a big deal and I had run out of time when setting it up so I had left it. Turns out that one of the changes in the new version of the OS was to make this warning a fatal error. Now the system won’t boot.

I get dropped into a recovery console and I sigh. Fortunatly I still had some memory of how Linux works so I started to debug

dmesg | less less: command not found >god damn it you stupid recovery shell god: command not found >tail dmesg

The final couple of lines of the output pointed to md as the culprit. I was going to rebuild the array anyway so I moved mdadm.conf out of /etc/mdadm to a backup so they system wouldn’t try to mount any md drives. However, as it turns out that does nothing now in Ubuntu. Since I stopped knowing about Linux they seem to have created an init ram disk into which a subset of files is loaded. I have no recollection of this existing so it may be new or I may have just never run into it before. Anyway it holds a protected, secret copy of your mdadm.conf file so you can change the one in /etc forever and your system still won’t boot. I call this the “you stupid newb” ram disk.

By this point I’d discovered that you could append

bootdegraded=true

to the kernel line to at least get a system up with a degraded array. I did that and managed to get into the system long enough to delete the array

mdadm –stop /dev/md0 mdadm –zero-superblock /dev/sd[bc]1

create a new array(RAID 0 this time)

mdadm –create /dev/md0 –level=0 –raid-devices=2 /dev/sdb1 /dev/sdc1

Update the mdadm.conf file

head -n -1 /etc/mdadm/mdadm.conf > /etc/mdadm/mdadm.conf.new mdadm –detail –scan >> /etc/mdadm/mdadm.conf.new mv/etc/mdadm/mdadm.conf.new/etc/mdadm/mdadm.conf

and set the init ram disk back up

update-initramfs -u

I rebooted and found everything to be in working order. Thank goodness.

So the lesson here is don’t touch anything which is working. Don’t touch it ever or you will break it and have to spend all evening fixing it instead of churning butter. Which would have been more fun.

Fun riot!Fun riot!

2013-07-28

Exciting Year for Calgary .net

For ages I’ve been meaning to get more involved in the local .net community and really the whole tech community in Calgary. This last year was my ear of effort and I’ve been out to a couple of activities and a couple of groups which made me feel old and stupid(I’m looking at you YYC.js). As it turned out the Entity Framework Demi-God David Paquette picked this year to move to hotter climates leaving the presidency of the .net group here in Calgary open. Upon discovering this I immediately calledBradley Whitford and we launched an exploratory committee. I knew that Alan Alda was gunning for the same position but Bradley and I dug up some dirt on him and I coasted the rest of the way.

It wasn’t a clean campaign but I won. We’ve also had a couple of other people join the .net user group executive and we’ve managed to retain most of the old team to boot. It is a perfect mixture of the seasoned and the new.

I am really excited for our talks this year. Already we’ve got two talks set up and I’m sure we’ll have a bunch more in no time. Typically the theme or our talks has been “What’s new and awesome”. That’s pretty much going to continue this year but with a bit more emphasis on “awesome” than on “new”. We’re looking to do talks on the likes of TypeScript, F# and NoSQL databases. We’re also partnering with a couple of other groups in town to do a showdown between Ruby on Rails and ASP.net MVC and something with the JavaScript group which hasn’t yet been fleshed out.

We’re actively looking to increase our membership and our sponsors as well as our relationships with people looking to bring specialized training into town.

This season is going to rock be a jolly good time.

If you want to have a say in our topics or come out to any events be sure to join our brand new meetup site over athttp://www.meetup.com/Calgary-net-User-Group/.

2013-07-19

Southern Alberta Flooding

In the past couple of weeks there have been two big rain storms in Canada which have caused a great deal of flooding. The first was the Southern Alberta floods and the second was the flood in Toronto. I was curious about how the amount of rain we have had stacks up against some other storms. I was always struck by the floods in India during the monsoon season so I looked up some number on that and also on the world record for most rain in 24 hours.

Of course I wanted to create a visualization of it because that’s what I do. Click on the picture to get through to the full visualization

Click for detailsClick for details

Now I know that the amount of rain is just one part of the flood story but the numbers are still interesting. Can you imagine being around to see 1.8m of rain fall in 24 hours? I guess it was the result of a major hurricane. Incidentally Foc-Foc is on an islandRéunion near Madagascar. I’d never heard of it, despite 800 000 people living there.

I used these as the data sources:

Toronto 126mm -http://www.cbc.ca/news/canada/toronto/story/2013/07/09/toronto-rain-flooding-power-ttc.html

Calgary45mm ““ http://www.cbc.ca/news/canada/calgary/story/2013/06/21/f-alberta-floods.html

Mumbai 181.1mm -http://www.dnaindia.com/mumbai/1845996/report-mumbai-gets-its-third-highest-rainfall-for-june-in-a-decade-at-181-1-mm

Foc-Foc, Réunion1,825mm -http://wmo.asu.edu/world-greatest-twenty-four-hour-1-day-rainfall

2013-07-16

Storage Costs

Earlier this week I got into a discussion at work about how much storage an application was using up. It was an amount which I considered to be trivial. 20gig, I think it was. I could have trimmed down by but it would have taken me an hour or two and with the cost of storage it didn’t seem worth it. My argument was that with the cost of storage these days it would cost the company more to pay for me to reduce the file storage than to just pay for storage. The problem seemed to go away and I claimed victory.

I’m just saying, don’t start a “storage is expensive” argument with me. You’re not going to win for numbers under 50TB

“” Simon Timms (@stimms) July 8, 2013

My victory glow did not last long. @HOVERBED, my good sysadmin friend, jumped on my argument.

“Disk is cheap, storage isn’t”

Then he said some nasty stuff about developers which I won’t repeat here. I may have said some things about system admins prior to that which started the spiral.

Truth is that he’s right. When I talk about storage being cheap I am talking about disk being cheap. There is a lot more to storage than putting a bunch of disks in a server or hooking up to the cloud. These things are cheap but managing the disk isn’t. There is a cost associated with backing up data, restoring the data and generally managing disk space. There is also the argument that server disk isn’t the same as workstation disk. Server space is far more expensive because it has to be reliable and it has to be larger than typical disk. When I did the math I figured disk might cost something like $5 a gigabyte to provide. @HOVERBED quoted me numbers closer to $90 a gig. That’s crazy. I’m going into go out on a limb here but if you’re paying that sort of money for managing your storage you’re doing it wrong. The expensive things are

  1. Backing up your storage to tape andshippingthat tape to somwhere safe

  2. Paying people to run the backups and restores

  3. Paying for SANs or NAS which is much more expensive than just disk

So let’s break this thing down. First off why are we backing up to tape still? I’ve seen a couple of arguments. The first is that tape is less costly than backing up to online storage. I had to look up some information on tapes because when I was last involved with them they were 250GB native. Turns out that they’re up to 5TB native now(StorageTek T10000 T2). That’s a lot of storage! Tapes have two listed capacity: a native capacity and a compressed capacity. The compressed capacity is 2x the native capacity ““ the theory being that you can gzip files on the tape and get extra capacity for free. I don’t know if people still get 2x compression with newer file formats as many of them integrate compression already.

These tapes go for something like $150, so that’s pretty frigging cheap! To get the same capacity on cloud services will cost you

ServicePer GB per monthPer 5TB per month
Azure Geo Redundant$0.095$415
Azure Local$0.07$330
Azure Geo Redundant$0.10$425
Amazon Glacier$0.01$50
(Some of the math may seem wonky here but there are discounts for storing a bunch of data) Tapes are looking like a pretty good deal! Of course there are a lots of additional costs around that $150. We’re not really comparing the same thing here: you need to keep multiple tapes so that you can cycle them offsite and even from day to day. I don’t know what a normal tape cycling strategy is but I don’t really see how you could get away with fewer than 3 tapes per 5TB.

There is also the cost of buying a tape drive and you have to pay some person to take tapes out of the drive, label them(please label them) put them in a box and ship them offsite. This takes us over to the second point: people. No matter what you do you’re going to have to have people involved in running a tape drive. These people add cost and errors to the system. Any time you have people involved there is a huge risk of making a mistake. You can’t tell me that you’ve never accidentally written over a tape which wasn’t meant to be reused yet.

Doing your backup to a cloud provider can be completely automated. There are no tapes to change, no tapes to ship to Iron Mountain(which I discovered isn’t actually located in a mountain). There is a bandwidth cost and the risk of failure seems to be higher when backing up offsite. Bandwidth is quickly dropping in price and as most of your bandwidth will be used overnight that means that during the day your users can benefit from way faster Internet.

Not what Iron Mountain looks like. Jerks.Not what Iron Mountain looks like. Jerks.

I’m in favour of cloud storage over local tape storage because I think it is more reliable once the data is up there, easier to recover(no messy bringing tapes back on site), more durable (are you really going to have a tape drive around in 10 years time to read these tapes? One in working condition?) and generally easier. There is also a ton of fun stuff you can do with your online backup that you can’t do locally. Consider building a mirror of your entire environment online. Having all your data online also lets you run analysis of how your data changes and who is making changes.

@HOVERBED suggested that having your backups available at all time is a security risk. What if people sneak in and corrupt them? I believe the same risk exists on tape. Physical security is largely more difficult than digital security. Most of the attacks on data are socially engineered or user error rather than software bugs allowing access. The risk can be mitigated by keeping checksums of everything an validating the data as you restore it.

Okay so now we’re onto SANs and NASs(Is that how you pluralize NAS? Well it is now.). More expensive than disk in my workstation? Actually not anymore. The storage on my workstation is SSD which, despite price cuts, is more expensive than buying the same amount of storage on a SAN. But why are we still buying SANs? The beauty of a SAN is that it is a heavily redundant storage device which can easily be attached to by a wide variety of devices.

Enter ZFS. ZFS is a file system created by Sun back when they were still a good technology company. I’m not going to go too far into the features but ZFS allows you to offer many of the features of a SAN on more commodity hardware. If that isn’t good enough then you can make use of a distributed file system to spread your data over a number of nodes. Distributed file systems can handle node failures by keeping multiple copies of files on different machines. You can think of it as RAID in the large.

To paraphrase networking expert Tony Stark:That’s how Google did it, that’s how Azure does it, and it’s worked out pretty well so far.

Storage in this fashion is much cheaper than a SAN and still avoids much of the human factor. To expand a disk pool you just plug in new machines. Need to take machines offline? Just unplug them.

So is storage expensive? No. Is it more expensive than I think? Slightly. Am I going to spend the time trimming down my application? Nope, I spent the time writing this blog post instead. Efficiency!

2013-07-09

Invalid Installation Options in Starcraft Installer

I play next to no video games but my brother was over tonight and was bugging me to play Starcraft Heart of the Swarm. So I broke out the awesome collector’s edition which contains more disks and mousepads than”¦ I don’t know something which contains a lot of mousepads and disks, perhaps the year 2001. I put the disk in, hit install, agreed to god knows what in the licence agreement and was promptly told: Invalid Installation Options. We’ll that’s odd because the only option I selected was the directory which had 1.5TB free. I rebooted and futzed around a bit to no avail.

There was no additional information available as to what the error might be. Frankly this sort of thing irritates the shit out of me. I don’t mind there being errors. Installing software or running software on such a diverse set of machines as run Starcraft has got to be non trivial. What I mind is that there is no way for me to solve the problem. Throw me a fricking bone so I know what to fix.

No help on the official support forums, of course they couldn’t be bothered helping you out with only such a generic error message to point the direction. Eventually I found some post about Diablo III which seemed related. I had to delete the Battle.net folder from my program data folder. This caused a redownload of the update installer. That resulted in it working. So that’s deletec:ProgramDataBattle.net.

Now I’m going to open negotiations with some terran generals. I’ve always felt there it too much Doom about Starcraft and not enough The West Wing. I’m going to write such a speech that galactic peace will have no choice but to show up.