2014-02-06

Postmark Incoming Mail Parser

If you’re deploying a website or tool to Azure and need to send e-mail, you may already have discovered that it doesn’t just work out of the box. To send e-mail there are a ton of third party services into which you can hook. I’ve been making use of Postmark for my sites. Honestly I can’t remember why I picked them. They have a nice API and offer 10 000 free credits. Subsequent credits are pretty reasonably priced too.

One of the cool features they have is the ability to accept incoming e-mail. This opens up a whole new world of interactivity with your application. We have a really smart use for it in Secret Project Number 1, about which I can’t tell you. Sorry :(

When I checked there was no .NET library to parse the incoming mail. Oops, well, there’s an opportunity. So I wrote one and shoved it into NuGet.

https://github.com/stimms/PostmarkIncomingMailParser

Using it

First you should set up a URL in the Postmark settings under the incoming hook. This should be the final URL of your published endpoint. Obviously there can’t be any password protection on the endpoint as there is nowhere in Postmark to set one. That might be something Postmark would like to change in the future.

The simplest way to build the endpoint is to set up an ASP.net Web API controller. It can contain just a single method

// Returning Task (rather than async void) lets Web API await the work
public async Task Post()
{
    var parser = new PostmarkIncomingMailParser.Parser();
    var mailMessage = parser.Parse(await Request.Content.ReadAsStringAsync());
    //do something here with your mail message
}

The mail message which is returned is an extension of the standard System.Net.Mail.MailMessage. It adds a couple of extra fields which are Postmark specific

- MessageId: the ID of the message from Postmark
- Date: the date of the message (see the issues section)
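Once parsed you can read the Postmark extras right alongside the normal MailMessage properties. A quick sketch (the json variable here stands in for the raw request body Postmark POSTs to your endpoint):

var parser = new PostmarkIncomingMailParser.Parser();
var message = parser.Parse(json); // json is the raw body of the incoming POST

Console.WriteLine(message.Subject);   // standard System.Net.Mail.MailMessage property
Console.WriteLine(message.MessageId); // Postmark specific
Console.WriteLine(message.Date);      // Postmark specific (see the issues section)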

There is a sample project in the git repository which makes use of Web API. And that’s all I have to say about that.

2014-02-03

The Craziest Technology You'll Read about today

Sorry for the alarmist title, I swear that I’m not attempting to bait your clicks… well, not much. This is awesome. I was talking with a couple of guys who work at a directional drilling company. They drill conventional oil recovery wells in order to extract a substance made from bits of old plants and animals. Drilling is way more complicated at that scale than the drilling-a-hole-in-your-wall scale. The holes are thousands of feet down and, in many cases, follow a weird crooked line (hence the directional drilling part). You have to do that because it may be more efficient to drill through a certain substrate at 500 feet and then move over a bit at 1000 feet. The level of accuracy these guys can achieve now is out of this world.

I was really intrigued about how they got information back from the drill head. There are tons of interesting readings you can gather from the head but the problem is that it is way underground. Getting that data back is difficult. I have trouble getting wi-fi in my bedroom when the router is in the living room so clearly wi-fi is out as a technology.

Running a cable down there is also difficult. Unlike the putting-a-picture-on-the-wall sort of drill, these drills have lubricants and rock cuttings to deal with. Mud is pumped down the inside of the shaft and then recovered out the sides.

Thanks to the Energy Institute for this picture - http://www.energyinst.org/home

This means that any cable path is already full of highly abrasive mud. The guys told me that cabling does work, but it is flaky and you end up spending a lot of time replacing cables and the signal boosters needed every 200 or 300 feet.

Another way is to use low frequency radio waves, which are a bit better than wi-fi at getting through rock. However there are some rocks through which radio waves just won’t travel.

The final method, and the coolest, is to use mud pulses. The mud being pumped down the well is at huge pressure (as you would expect from pumping mud). On the drill head there is a valve which can be opened and closed by the sensors on the drill head. When closed the pressure in the mud pumps increases and when opened it decreases. They literally send pulse waves through the mud they’re pumping into the well and measure them at the surface. It is outrageously low bandwidth but that it actually works at all is amazing to me. It is effectively a digital signal in mud. What a cool application of technology!

It gets better, though. If you have a valve with various open and closed settings you can vary the pressure to create a waveform varying both amplitude and frequency.

At shallow depths you can get as much as 40 bit/s through mud pulsing, although at deeper depths that drops off significantly, to perhaps 1.5 bit/s at 12 000 m. 40 bit/s means that we’re running at 40 Hz, which is fricking huge for a mechanical valve under that much pressure to open and close.

I would not have invented this.

2014-01-28

Illegal file name characters

If your code is interacting with the file system at all on Windows then you might want to think about what sort of protection you have in place for illegal characters in file names. Previously my code was checking only a handful of characters, a handful I got from looking at the error message Windows gives you when you attempt to rename a file using one of these characters.

[Screenshot: the Windows rename error dialog listing the characters a file name can’t contain]

This is but the tip of the iceberg. Instead use something like this
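A sketch of the idea, leaning on Path.GetInvalidFileNameChars() so the framework supplies the full list (the class and method names here are just illustrative):

using System.IO;
using System.Linq;

public static class FileNameCleaner
{
    // Strip every character Windows considers invalid in a file name
    public static string RemoveIllegalCharacters(string fileName)
    {
        var illegal = Path.GetInvalidFileNameChars();
        return new string(fileName.Where(c => !illegal.Contains(c)).ToArray());
    }
}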

*Efficiency of this code not tested and probably terrible

2014-01-27

Building Redis keys

I was having a discussion with somebody the other day and wanted to link them to an article on how you can use compound names as effective keys in key value stores like Redis. Weirdly I couldn’t find a good article on it. Perhaps my google-fu (actually duck-duck-go-fu now) isn’t as strong as I thought or perhaps there isn’t an article. In either case this is now going to be an article on how to build keys for KV stores.

KV-stores are one of the famed NOSQL databases. There are quite a few options for key value stores these days; I like Redis and I’ve also made use of Azure’s blob storage, which is basically a key value store. KV-stores are interesting in that they are highly optimized for looking things up by a known key. Unlike relational databases you can’t easily query against the values. This means that there is no doing

select * from users where name like 'John%'

to find all the users with a first name of John. This isn’t the sort of operation for which key value stores are good. You can list keys in Redis by using the KEYS command but it is generally not encouraged as it isn’t performant. You really need to know the key for which you’re looking.

The KV key can be stored in another storage system like a relational database. Perhaps you might store several indexes of keys in the relational database, filter and retrieve sets of them and then pull the records from the KV-store. So say you have a list of trees you want to store. In the relational database you can create a table of tree types (coniferous, deciduous, carnivorous) and the Redis keys. You also store the maximum heights and keys in the relational database. The details of the trees go into Redis for rapid retrieval. First query against the relational database then against the KV store.

Sounds unnecessarily complicated? Yeah, I think so too. The better approach is to use a key which can be constructed without needing to query another data store explicitly. If we had a database of historical temperature values for all of North America then we could build our keys to match our data access pattern. The users of this database are most interested in looking up values for their hometown on specific days. Thus we’ve constructed our keys to look like

<country>:<state/province>:<city>:<year>:<month>:<day>

This allows us to build up a key programmatically from the user’s query

GET US:NewYork:Albany:1986:01:15 -> -8
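In code that key construction is just string building. A minimal sketch (the method name and parameters are illustrative):

// Compose the Redis key from the user's query so the lookup is a single O(1) GET
public static string TemperatureKey(string country, string region, string city, DateTime date)
{
    return string.Format("{0}:{1}:{2}:{3:yyyy}:{3:MM}:{3:dd}", country, region, city, date);
}

// TemperatureKey("US", "NewYork", "Albany", new DateTime(1986, 1, 15))
//   -> "US:NewYork:Albany:1986:01:15"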

We can be even smarter and build keys like

<country>:<state/province>:<city>:<year>:<month>:<day>:Max
<country>:<state/province>:<city>:<year>:<month>:<day>:Min

for more detailed information. The take away here is that you can build large and complex keys based on hierarchical data to make your lookups run in O(1) time, which is also known as O(jolly quick), which is a good place to be. Creating effective keys requires that you know about your access patterns ahead of time but that’s true of most databases; even in a relational database you have to pick your indexes based on access patterns.

2014-01-20

Quick A/B Testing in ASP.net MVC - Part 3 Storing clicks

This blog is the third in a series about how to implement A/B testing easily in ASP.net MVC by using Redis

  1. Having 2 pages
  2. Choosing your groups
  3. Storing user clicks
  4. Seeing your data

So far we’ve seen how to select pages semi-randomly and how to have two different views. In this post we’ll figure out a way of storing the click-through statistics on the server. For the server I’ve chosen Redis. My reasoning was largely around how simple programming against Redis is and also that it has a built-in primitive which allows for incrementing counters safely. At the time of writing Redis doesn’t have good support for clustering (wait until version 3!) but there is support for master-slave configurations. Weirdly our data access pattern here is more writes than reads, so having a multi-master setup might be better. For most purposes a single master should be enough to support A/B record keeping as Redis is very fast and 99% of websites aren’t going to be taxing for it. If you really are getting more hits than a Redis server can handle then perhaps reduce the percentage of people who end up doing the testing.

Redis is likely better run on a Linux box than on Windows. Although there is a Windows version, it was developed largely by Microsoft and is not officially supported. At the current time there are no plans to merge the Windows support back into the mainline. Still, the Windows version will work just fine for development purposes. I installed Redis using Chocolatey NuGet

cinst redis

You can do that or download the package and do it manually.

There are a couple of .NET clients available for Redis but I went with one called Booksleeve which was developed by the Stack Overflow team. This can be installed from NuGet

install-package booksleeve

Now we need to write to Redis as needed. I created a simple interface to handle the logging.
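A minimal sketch of that interface (the exact signatures here are approximate):

public interface IABLogger
{
    void LogInitialVisit(string page, string view);
    void LogSuccess(string page, string view);
}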

LogInitialVisit is called whenever somebody visits an A/B page and LogSuccess is used when the user has performed the activity we want. The initial visit call can be hooked into the BaseController we created in part 1.

The actual Redis implementation of IABLogger looks like:
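In rough outline, anyway; the sketch below is approximate (particularly the Booksleeve calls, which you should check against the version you install) and uses the key naming that shows up in the redis-cli output further down:

using BookSleeve;

public class RedisABLogger : IABLogger
{
    private readonly RedisConnection connection;

    public RedisABLogger()
    {
        // In a real implementation the server name would come from configuration
        connection = new RedisConnection("localhost");
        connection.Wait(connection.Open());
    }

    public void LogInitialVisit(string page, string view)
    {
        // INCR is atomic so concurrent visitors can't lose counts
        connection.Strings.Increment(0, page + "." + view);
    }

    public void LogSuccess(string page, string view)
    {
        connection.Strings.Increment(0, page + "." + view + ".success");
    }
}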

In a real implementation just using localhost for the Redis server is likely not what you want. What isn’t apparent here is that the call to Redis is performed asynchronously inside Booksleeve so this should have very little impact on the speed of page rendering.

The initial visit counter should be correctly updated. Now we need a way to update the success counter. There are a couple of approaches to this but I think the best is simply to provide an endpoint for the views to call when “success” is achieved. Success can be visiting a link or just pausing on the page for a period of time; an endpoint which accepts POSTs is sufficiently flexible to handle all eventualities.

The controller looks like
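Roughly like this, anyway (the action and parameter names are illustrative; the logger is the Redis-backed one sketched above):

using System.Web.Mvc;

public class ABTestingController : Controller
{
    private readonly IABLogger logger = new RedisABLogger();

    [HttpPost]
    public ActionResult Success(string page, string view)
    {
        // Record the conversion and return an empty 200 so the client script can carry on
        logger.LogSuccess(page, view);
        return new HttpStatusCodeResult(200);
    }
}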

I’m a big fan of TypeScript so I wrote my client side code in TypeScript

The ready function at the end hooks up links which have appropriate data- attributes. This does mean that a post is sent to our server before sending people on to their final destination. Google does more or less the same thing with search results.

If we check the results in Redis we can see that appropriate keys have been created and values set.

redis 127.0.0.1:6379> keys About*
1) "About.AboutA.success"
2) "About.AboutB.success"
3) "About.AboutA"
4) "About.AboutB"
redis 127.0.0.1:6379> get About.AboutA
"8"
redis 127.0.0.1:6379> get About.AboutA.success
"1"

In the next part we’ll look at how to retrieve and analyse the statistics we’ve gathered.

2014-01-14

Quick A/B Testing in ASP.net MVC - Part 2 Choosing Your Groups

This blog is the second in a series about how to implement A/B testing easily in ASP.net MVC by using Redis

  1. Having 2 pages
  2. Choosing your groups
  3. Storing user clicks
  4. Seeing your data

In part one of this series we looked at a simple way to selectively render pages. However we did not take a look at how to select which users end up in each of the test groups. There is almost certainly some literature on how to select test groups. Unfortunately my attempts at investigation met with a wall of spam from marketing sites. Well we’ll just jolly well make up our own approach.

Time for an adventure!

My initial approach was to randomly assign requests to be either in A or B. However this approach isn’t ideal as it means that the same user visiting the page twice in a row may encounter different content. That seems like the sort of thing which would confuse many a user and it would irritate me as a user. It is best to not irritate our users so we need a better approach.

We could easily write a cookie to the user’s browser so they get the same page on each load, but that is rather grim. We like nothing more than avoiding cookies when we can. They are inefficient and, depending on how they are implemented, a pain in a cluster. A better solution is to pick a piece of information which is readily available and use it as a hash seed. I think the two best pieces are the username and, failing that (for unauthenticated users), the IP address. So long as we end up with an approximately even distribution into the A and B buckets we’re set.

We’ve been talking about having only two buckets: A and B. There is no actual dependence on there being but two buckets. Any number of buckets is fine but the complexity does tend to tick up a bit with more buckets. I have also read some suggestions that you might not wish to weight visits to your pages evenly. If you have a page which already works quite well you can direct, say, 90% of your users to that page. The remaining 10% of users become the testing group. In this way the majority of users get a page which has already been deemed to be good. The math to see if your new page is better is pretty simple; a little bit of Bayes and you’re set.

We’ll take a pretty basic approach.
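Something like the following captures the idea (the class name and method shape are illustrative, matching the ABSelector called out in part 1):

using System.Web;

public class HashABSelector
{
    public string SelectView(HttpContextBase context, string viewNameA, string viewNameB)
    {
        // Authenticated users get a stable bucket from their user name,
        // everyone else from their IP address
        string seed = context.User != null && context.User.Identity.IsAuthenticated
            ? context.User.Identity.Name
            : context.Request.UserHostAddress;

        return (seed.GetHashCode() & 1) == 0 ? viewNameA : viewNameB;
    }
}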

Here we differentiate between authenticated and unauthenticated users; each has a different strategy. Authenticated users use their user name to select a group. User names are pretty static so we should be fine hashing them and using the result to pick a group.

For unauthenticated users we use the IP address as a hash value. In some cases this will fall down due to people being behind a router but it is sufficient for our purposes.

In the next post we’ll log the clicks into Redis.

2014-01-13

Resizing an XFS disk on amazon

I have a Linux server I keep around on Amazon EC2 for when I need to do linuxy things and don’t want to do them on one of my local Linux boxes. I ran out of disk space for my home directory on this server. I could have mounted a new disk and moved /home to it but instead I thought I might give resizing the disk a try. After all, it is 2014 and I’m sure resizing disks has long since been solved.

To start with I took the instance down by clicking on stop in the amazon console. Next I took a snapshot of the current disk and named it something I could easily remember.

This snapshot works simply as a backup of the current state of the disk. The next step was to build a new volume and restore the snapshot onto it.

I made the new volume much larger and selected the snapshot I had just created as the base for it. The creation process took quite some time, a good 10 minutes, which was surprising as the disk wasn’t all that big. Now I had two identical volumes except one was bigger. As I only have one instance on EC2 I attached the volume back to my current instance as /dev/sdb1. I started the instance back up and sshed in.

Actually resizing the file system turned out to be stunningly easy, although I had a few false starts because I thought I was running ext3 and not XFS. You can check this by catting out /etc/mtab; the third column is the file system type.

The existing volume and the snapshot volume will have the same UUID so this must be updated by running

sudo xfs_admin -U generate /dev/sdb1

Now you can mount the new volume without error

sudo mount /dev/sdb1 /mnt

Then simply run

sudo xfs_growfs /dev/sdb1

This will grow the XFS file system to fill the whole volume. I then shut the machine down, switched the two disks around and rebooted. I made sure the system was working fine and had sufficient storage; as it did, I deleted the old volume and the snapshot.

2014-01-06

Quick A/B Testing in ASP.net MVC - Part 1 Using 2 Pages

A/B testing is a great tool for having your users test your site. It is used by some of the largest sites on the internet to optimize their pages to get people to do exactly what they want. Usually what they want is to drive higher sales or improve click-through rates. The key is in showing users two versions of a page on your site and then monitoring the outcome of that page view. With enough traffic you should be able to derive which page is doing better. It seems somewhat intimidating to run two versions of your website and monitor what people are doing, but it isn’t. First it isn’t your entire site which is participating in a single giant A/B test, rather it is a small portion of it, really the smallest portion possible. In ASP.net MVC terms it is just a single view.

This blog is the first in a series about how to implement A/B testing easily in ASP.net MVC by using Redis

  1. Having 2 pages
  2. Choosing your groups
  3. Storing user clicks
  4. Seeing your data

The first thing we need is a website on which to try this out. Let’s just use the default ASP.net MVC website. Next we will need two versions of a page. I’m going to pick the about page which looks like

Next we need to decide what we’re trying to get people to do on the about page. I think I’m most interested in getting them to go look at the source code for the website. After all we like open source and maybe if we get enough people on our github site they’ll start to contribute back and grow the community. So let’s build an A version of the page.

This is a very simple version of the about page, good enough for a version 1.0. But we really want to drive people to contribute to the project so let’s set up another page.

Nothing drives contributions like a giant octo-cat. To do this I set up two pages in my views folder, aboutA.cshtml and aboutB.cshtml

The first contains the markup from the first page and the second the markup for the second page. Now we need a way to select which one of these pages we show. I’d like to make this as transparent as possible for developers on the team. The controller code for the view currently looks like
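From the default template that is simply

public ActionResult About()
{
    ViewBag.Message = "Your application description page.";
    return View();
}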

This is, in fact, the stock page code for ASP.net MVC 5. I’m going to have this controller, instead, extend a new base class called BaseController which, in turn, extends Controller. Pretty much every one of my ASP.net MVC applications follows this pattern of having my controllers extend a project specific base class. I’m generally not a huge fan of inheritance but in this case it is the easiest solution. Altering the base class from Controller to BaseController is all that is needed to activate A/B testing at the controller level.

In the base class I have overridden one of the View() signatures.

In the override I intercept the view name and, if the original view doesn’t exist, I test for A and B versions of the view.
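A sketch of the idea, not the exact code, might look like this; the IABSelector interface and the ViewExists helper are illustrative, and how the selector gets wired up is part II’s problem:

using System.Web;
using System.Web.Mvc;

public interface IABSelector
{
    string SelectView(HttpContextBase context, string viewNameA, string viewNameB);
}

public class BaseController : Controller
{
    // How the selector is created and injected is covered in part II
    protected IABSelector ABSelector { get; set; }

    protected internal override ViewResult View(string viewName, string masterName, object model)
    {
        // Mirror the stock behaviour: no explicit view name means use the action name from the route
        var name = viewName ?? ControllerContext.RouteData.GetRequiredString("action");

        // If that view doesn't exist but A and B variants do, let the selector pick one
        if (!ViewExists(name) && ViewExists(name + "A") && ViewExists(name + "B"))
        {
            name = ABSelector.SelectView(HttpContext, name + "A", name + "B");
        }

        return base.View(name, masterName, model);
    }

    private bool ViewExists(string name)
    {
        var result = ViewEngines.Engines.FindView(ControllerContext, name, string.Empty);
        return result.View != null;
    }
}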

Really the only trick here is to note how we find the viewName if it isn’t supplied: we take it from the action name on the route. Checking the source for ASP.net MVC (helpfully open sourced, thanks Microsoft!) we find that this is exactly the approach they take, so behaviour should not change. You’ll also notice that we call out to an ABSelector to choose which view to show. I’ll cover some approaches to building the ABSelector in part II of this series. For now the code is available on github at https://github.com/stimms/RedisAB.

I hope to have part II out in the next week.

2014-01-02

Should you store dev tools in source control?

The utility of storing build tools and 3rd party libraries in source control is an age-old question, one that I have flip-flopped on for a number of years. There are arguments on each side of checking things in.

  • Size of repository. Check in: increases rapidly, and the time required for a fresh checkout gets long. Install separately: remains the same regardless of how much is installed.
  • Reproducibility of builds. Check in: good. Install separately: poor.
  • Ease of spinning up new developers or build machines. Check in: easy. Install separately: difficult, possibly impossible.
  • Ease of setting up initial builds. Check in: difficult; many build tools do not like being checked in, paths may not work and dependent libraries are difficult to find. Install separately: no increased difficulty.

There is no easy answer one way or another.

If I had to set up repositories and builds on a new project today using one of these approaches I think I would look at a number of factors to decide what to do

  • Lifetime of the project. Projects which have a shorter life would not see as much churn in build tools and libraries so would not benefit as much from checking in tooling. Shorter projects are also less likely to see a number of new developers join the team mid-project. New developers would have to spend time setting up their workstations to match other workstations and build servers.
  • Distribution of the team. Developers working in remote locations are going to have a harder time setting up builds without somebody standing over their shoulder.
  • Number of other projects on which the developers are working. If developers are working on more than one project then they may very well run into conflicts between the current project and other projects.
  • Lifespan of the code. Projects which have a long projected life will, of course, have changes in the future. If the project isn’t under continual development during that time then it is advisable to check in the development tools so that future developers can easily build and deploy the code.
  • The project is using unique tools or environments. For instance if the development shop is typically a .net shop but this project called for the development of an iPhone application then check in the tools.

The major disadvantages to checking in binaries are really that it slows down the checkout on the CI server and that checking in some binaries is difficult. For instance I remember a build of the Sun Studio compiler on a now very outdated version of Solaris was very particular about having been installed in /usr/local/bin. I don’t remember the solution now but it involved much fist shaking and questioning the sanity of the compiler developers.

There are now a few new options for creating build environments which can vastly improve these issues: virtual machines, configuration management tools and package managers. Virtual machines are glorious in that they can be set up once and then distributed to the entire team. If the project is dormant then the virtual machines can be archived and restored in the future. Virtual machine hypervisors tend to be pretty good at maintaining a consistent interface, allowing older virtual machine images to be used in newer environments. There are also a lot of tools in place to convert one image format into another so you need not fret greatly about which hypervisor you use.

Configuration management tools such as puppet, chef, chocolatey and vagrant have made creating reproducible environments trivial. Instead of checking in the development tools you can simply check in the configuration and apply it over a machine image. Of course this requires that the components remain available somewhere so it might be advisable to set up a local repository of important packages. However this repository can exist outside of source control. It is even possible that, if your file system supports it, you can put symbolic links back to the libraries instead of copying them locally. It is worth experimenting to see if this optimization actually improves the speed of builds.

Package managers such as nuget, npm and gem fulfill the same role as configuration management tools but for libraries. In the past I’ve found this to fall down a bit when I have to build my own versions of certain libraries (I’m looking at you, SharpTestsEx). I’ve previously checked these into a lib folder. However it is a far better idea to run your own internal repository of these specially modified packages, assuming they remain fairly static. If they change frequently then including their source in your project could be a better approach.

Being able to reproduce builds in a reliable fashion and having the ability to pull out a long-mothballed project are more important than we like to admit. Unfortunately not all of us can work on greenfield projects at all times so it remains important to make the experience of building old projects as simple as possible. In today’s fast moving world a project which is only two or three years old could well be written with techniques or tools which we would now consider comically outdated.

Edit: Donald Belcham pointed out to me that I hadn’t talked about pre-built VMs

@stimms you forgot pre-built development (or project) specific VMs

- Donald Belcham (@dbelcham) January 2, 2014

It is a valid approach but it isn’t one I really like. The idea here is that you build a VM for development or production and everything uses that. When changes are needed you update the template and push it out. I think this approach encourages being sloppy. The VMs will be configured by a lot of different people and will get a lot of different changes over the lifetime of a project. Instead of having a self-documenting process, as chef or puppet would provide, you have a mystery box. If there is a change to something basic like the operating system then there is no clear path to get back to an operating image using the new OS. With a configuration management tool you can typically just apply the same script to the new operating system and get close to a working environment. In the long run it is little better than just building snowflake servers on top of bare metal.