Simon Online

2014

2014-01-20

Quick A/B Testing in ASP.net MVC - Part 3 Storing clicks

This blog is the third in a series about how to implement A/B testing easily in ASP.net MVC by using Redis

Having 2 pages
Choosing your groups
Storing user clicks
Seeing your data

So far we’ve seen how to select pages semi randomly and how to have two different views. In this post we’ll figure out a way of storing the click through statistics on the server. For the server I’ve chosen Redis. My reasoning was largely around how simple programming against Redis is and also that it has a built in primitive which allows for incrementing safely. At the time of writing Redis doesn’t have good support for clustering (wait until version 3!) but there is support for master-slave configurations. Weirdly our data access pattern here is more writes than reads, so having a multi-master setup might be better.For most purposes a single master should be enough to support A/B record keeping as Redis is very fast and 99% of websites aren’t going to be taxing for it. If you really are getting more hits than a Redis server can handle then perhaps reduce the percentage of people who end up doing the testing.

Redis is likely better run on a Linux box than Windows. Although there is a Windows version it was developed largely by Microsoft and it not officially supported. At the current time there are no plans to merge the Windows support back into mainline. Still the Windows version will work just fine for development purposes. I installed Redis using ChocolateyNuget

cinst redis

You can do that or download the package and do it manually.

There are a couple of .net clients available for Redis but I went with one called Booksleeve which was developed by the Stackoverflow team. This can be installed from nuget

install-package booksleeve

Now we need to write to Redis as needed. I created a simple interface to handle the logging.

LogInitialVisit is called whenever somebody visits and A/B page and LogSuccess is used when the user has performed the activity we want. The initial visit call can be hooked into the BaseController we created in part 1.

The actual Redis implementation of IABLogger looks like:

In a real implementation just using localhost for the Redis server is likely not what you want. What isn’t apparent here is that the call to Redis is performed asynchronously inside Booksleeve so this should have very little impact on the speed of page rendering.

The initial visit counter should be correctly updated. Now we need a way to update success counter. There are a couple of approaches to this but I think the best is simply to provide an end point for the views to call out for when “success” is achieved. Success can be visiting a link or just pausing on the page for a period of time; an endpoint which accepts POSTs is sufficiently flexible to handle all eventualities.

The controller looks like

I’m a big fan to TypeScript so I wrote my client side code in TypeScript

The ready function at the end hooks up links which have appropriate data- attributes. This does mean that a post is sent to our server before sending people on to their final destination. Google does more or less the same thing with search results.

If we check the results in Redis we can see that appropriate keys have been created and values set.

redis 127.0.0.1:6379> keys About* 1) “About.AboutA.success” 2) “About.AboutB.success” 3) “About.AboutA” 4) “About.AboutB” redis 127.0.0.1:6379> get About.AboutA “8” redis 127.0.0.1:6379> get About.AboutA.success “1”

In the next part we’ll look at how to retrieve and analyse the statistics we’ve gathered.

2014-01-14

Quick A/B Testing in ASP.net MVC - Part 2 Choosing Your Groups

This blog is the second in a series about how to implement A/B testing easily in ASP.net MVC by using Redis1. Having 2 pages

Choosing your groups
Storing user clicks
Seeing your data

In part one of this series we looked at a simple way to selectively render pages. However we did not take a look at how to select which users end up in each of the test groups. There is almost certainly some literature on how to select test groups. Unfortunately my attempts at investigation met with a wall of spam from marketing sites. Well we’ll just jolly well make up our own approach.

Time for an adventure!

My initial approach was to randomly assign requests to be either in A or B. However this approach isn’t ideal as it means that the same user visiting the page twice in a row may encounter different content. That seems like the sort of thing which would confuse many a user and it would irritate me as a user. It is best to not irritate our users so we need a better approach.

We can easily write a cookie to the user’s browser so they get the same page on each load. That is rather grim. We like nothing more than avoiding cookies when we can. They are inefficient and, depending on how they are implemented, a pain in a cluster. A better solution is to pick a piece of information which is readily available and use it as a hash seed. I think the two best pieces are the username and failing that (for unauthenticated users) the IP address. So long as we end up having an approximately even distribution into the A and B buckets we’re set.

We’ve been talking about having only two buckets: A and B. There is no actual dependence on there being but two buckets. Any number of buckets is fine but the complexity does tend to tick up a bit with more buckets. I have also read some suggestions that you might not wish to set up your testing to unevenly weight visits to your page. If you have a page which already works quite well you can direct, say, 90% of your users to that page. The remaining 10% of users become the testing group. In this way the majority of users get a page which has already been deemed to be good. The math to see if your new page is better is pretty simple, a little bit of Bayes and you’re set.

We’ll take a pretty basic approach.

Here we differentiate between authenticated and unauthenticated user, each one has a different strategy. The authenticated users use their user name to select a group. User names are pretty static so we should be fine hashing them and using them as a hash seed.

For unauthenticated users we use the IP address as a hash value. In some cases this will fall down due to people being behind a router but it is sufficient for our purposes.

In the next post we’ll log the clicks into Redis.

2014-01-13

Resizing an XFS disk on amazon

I have a linux server I keep around on amazon EC2 for when I need to do linuxy thing and don’t want to do them on one of my local linux boxes. I ran out of disk space for my home directory on this server. I could have mounted a new disk and moved /home to it but instead I thought I might give resizing the disk a try. Afterall it is 2014 and I’m sure resizing disks has long since been solved.

To start with I took the instance down by clicking on stop in the amazon console. Next I took a snapshot of the current disk and named it something I could easily remember.

This snapshot works simply as a backup of the current state of the disk. The next step was to build a new volume and restore the snapshot onto it.

I made the new volume much larger and selected the snapshot I had just created as the base for it. The creation process took quite some time, a good 10 minutes, which was surprising as the disk wasn’t all that big. Now I had two identical volumes except one was bigger. As I only have one instance on EC2 I attached the volume back to my current instance as /dev/sdb1. I started the instance back up and sshed in.

To actually resize the file system turned out to be stunningly easy, although I had a few false starts because I thought I was running ext3 and not xfs. You can check this by catting out /etc/mtab, the third column will be the file system type.

The existing volume and the snapshot volume will have the same UUID so this must be updated by running.

sudo xfs_admin -U generate /dev/sdb1

Now you can mount the new volume without error

sudo mount /dev/sdb1 /mnt

Then simply run

sudo xfs_growfs /dev/sdb1

This will grow the xfs volume to fill the whole volume. I then shut the machine down, switched the two disks around and rebooted. I made sure the system was working fine and had sufficient storage. As it did I deleted the old volume and the snapshot.

2014-01-06

Quick A/B Testing in ASP.net MVC - Part 1 Using 2 Pages

A/B testing is a great tool for having your users test your site. It is used by some of the largest sites on the internet to optimize their pages to get people to do exactly what they want. Usually what they want is to drive higher sales or improve click-through rates. The key is in showing users two versions of a page on your site and then monitoring the outcome of that page view. With enough traffic you should be able to derive which page is doing better. It seems somewhat intimidating to run two versions of your website and monitor what people are doing, but it isn’t. First it isn’t your entire site which is participating in a single giant A/B test, rather it is a small portion of it ““ really the smallest portion possible. In ASP.net MVC terms it is just a single view.

This blog is the first in a series about how to implement A/B testing easily in ASP.net MVC by using Redis

Having 2 pages
Choosing your groups
Storing user clicks
Seeing your data

The first thing we need is a website on which to try this out. Let’s just use the default ASP.net MVC website. Next we will need two versions of a page. I’m going to pick the about page which looks like

Next we need to decide what we’re trying to get people to do on the about page. I think I’m most interested in getting them to go look at the source code for the website. After all we like open source and maybe if we get enough people on our github site they’ll start to contribute back and grow the community. So let’s build an A version of the page.

This is a very simple version of the about page, good enough for a version 1.0. But we really want to drive people to contribute to the project so let’s set up another page.

Nothing drives contributions like a giant octo-cat. To do this I set up two pages in my views folder, aboutA.cshtml and aboutB.cshtml

The first contains the markup from the first page and the second the markup for the second page. Now we need a way to select which one of these pages we show. I’d like to make this as transparent as possible for developers on the team. The controller code for the view currently looks like

This is, in fact, the stock page code for ASP.net MVC 5. I’m going to have this page, instead, extend a new base class called BaseController which, in turn, extends Controller. Pretty much every one of my ASP.net MVC applications follows this pattern of having my controllers extend a project specific base class. I’m generally not a huge fan of inheritance but in this case it is the easiest solution. Altering the base class from Controller to BaseController is all that is needed to activate A/B testing at the controller level.

In the base class I have overridden one of the View() signatures.

In the override I intercept the view name and, if the original view name doesn’t exist I test for A and B versions of the view.

Really the only trick here is to note how we find the viewName if it doesn’t exist: we take it from the action name on the route. Checking the source for ASP.net MVC(helpfully open sourced, thanks Microsoft!) we find that this is exactly approach they take so behaviour should not change. You’ll also notice that we call out to an ABSelector to chose which view to show. I’ll cover some approaches to building the ABSelector in part II of this series. For now the code is available on github athttps://github.com/stimms/RedisAB.

I hope to have part II out in the next week.

2014-01-02

Should you store dev tools in source control?

The utility of storing build tools and 3rd party libraries in source control is an age old question. One that I have flipflopped on for a number of years. There are arguments on each side of checking things in.

	Check in	Install separately
Size of repository	Increases rapidly, time required for a fresh checkout gets long	Remains the same regardless of how much is installed
Reproducibility of builds	Good	Poor
Ease of spinning up new developers or build machines	Easy	Difficult, possibly impossible
Ease of setting up initial builds	Difficult, many build tools do not like being checked in. Paths may not work and dependent libraries are difficult to find.	No increased difficulty

There is no easy answer one way or another.

If I had to set up repositories and builds on a new project today using one of these approaches I think I would look at a number of factors to decide what to do

Lifetime of the project. Projects which have a shorter life would not see as much churn in build tools and libraries so would not benefit as much from checking in tooling. Shorter projects are also less likely to see a number of new developers join the team mid-project. New developers would have to spend time setting up their workstations to match other workstations and build servers.
Distribution of the team. Developers working in remote locations are going to have a harder time setting up builds without somebody standing over their shoulder.
Number of other projects on which the developers are working. If developers are working on more than one project then they may very well run into conflicts between the current project and other projects.
Lifespan of the code. Projects which have a long projected life will, of course, have changes in the future. If the project isn’t under continual development during that time then it is advisable to check in the development tools so that future developers can easily build and deploy the code.
The project is using unique tools or environments. For instance if the development shop is typically a .net shop but this project called for the development of an iPhone application then check in the tools.

The major disadvantages to checking in binaries are really that it slows down the checkout on the CI server and that checking in some binaries is difficult. For instance I remember a build of the Sun Studio compiler on a now very outdated version of Solaris was very particular about having been installed in /usr/local/bin. I don’t remember the solution now but it involved much fist shaking and questioning the sanity of the compiler developers.

There are now a few of new options for creating build environments which can vastly improve these issues: virtual machines, configuration management tools and package managers. Virtual machines are glorious in that they can be set up once and then distributed to the entire team. If the project is dormant then the virtual machines can be archived and restored in the future. Virtual machine hypervisors tend to be pretty good at maintaining a consistent interface allowing for older virtual machines images to be used in newer environments. There are also a lot of tools in place to convert one image format into another so you need not fret greatly about which hypervisor you use.

Configuration management tools such as puppet, chef, chocolatey and vagrant have made creating reproducible environments trivial. Instead of checking in the development tools you can simply check in the configuration and apply it over a machine image. Of course this requires that the components remain available somewhere so it might be advisable to set up a local repository of important packages. However this repository can exist outside of source control. It is even possible that, if your file system supports it, you can put symbolic links back to the libraries instead of copying them locally. It is worth experimenting to see if this optimization actually improves the speed of builds.

Package managers such as nuget, npm and gem fulfill the same role as configuration management tools but for libraries. In the past I’ve found this to fall down a bit when I have to build my own versions of certain libraries (I’m looking at you SharpTestsEx). I’ve previously checked these into a lib folder. However it is a far better idea to run your own internal repository of these specially modified packages ““ assuming they remain fairly static. If they change frequently then including their source in your project could be a better approach.

Being able to reproduce builds in a reliable fashion and have the ability to pull out a long mothballed project are more important than we like to admit. Unfortunately not all of us can work on greenfield projects at all time so it remains important to make the experience of building old projects as simple as possible. In today’s fast moving world a project which is only two or three years old could well be written with techniques or tools which we would now consider comically outdated.

Edit: Donald Belcham pointed out to me that I hadn’t talked about pre-built VMs

@stimms you forgot pre-built development (or project) specific VMs

“” Donald Belcham (@dbelcham) January 2, 2014

It is a valid approach but it isn’t one I really like. The idea here is that you build a VM for development or production and everything uses that. When changes are needed you update the template and push it out. I think this approach encourages being sloppy. The VMs will be configured but a lot of different people and will get a lot of different changes to them over the lifetime of a project. Instead of having a self-documenting process as chef or puppet would provide you have a mystery box. If there is a change to something basic like the operating system then there is no clear path to get back to an operating image using the new OS. With a configuration management tool you can typically just apply the same script to the new operating system and get close to a working environment. In the long run it is little better than just building snowflake servers on top of bare metal.

2013

2013-12-30

Grunt your ASP.net builds

Visual Studio 2013 has made great strides at being a great tool for web development. When combined with the webessentials package it is actually a very good tool, maybe the best tool out there. However it has one shortcoming in my mind and that is the poor support for third party tools. It is difficult to track the latest trends in web development when you’re working on a yearly release cycle, as Visual Studio does. This is somewhat ameliorated by the rapid release of plugins and extensions. Still I think think that the real innovative development is happening in the nodejs community.

One of the things to come out of the nodejs community is the really amazing build tool grunt. If you haven’t seen grunt then it is worth watching my mentor, Dave Mosher’s video on automating front end workflows with grunt. In that video he takes us through a number of optimizations of CSS and JavaScript using grunt. Many of these optimizations are not things which exist yet in any Visual Studio tooling that I’ve seen. Even in cases where the tooling does exist I find that it is being incorrectly executed.

Ideally your development environment mirrors your production environment exactly. However time and financial constraints make that largely impossible. If production is a hundred servers there is just no reasonable way to get a hundred servers for each developer to work on all the time. Unless your company has pots of money

In which case, give me a call, I would be happy to spend it on insane projects for you. Perhaps you need a website which runs on a full stack of poutine instead of a full stack of JavaScript”¦

Visual Studio falls down because when you F5 a project the environment into which you’re placed is not the same as the package which is generated by an upload to your hosting provider. It is close but there are some differences such as

files which are not included in the project but are on disk will be missing in the package
JavaScript combination and minification are typically turned off on dev
CSS files are similarly not minified or combined

These actions can be breaking changes which will not be apparent in the development environment. For instance changing the names of function arguments, as is common in minification, tends to break AngularJS’ injection.

Thus I actually suggest that you use Grunt instead of the built in minification in ASP.net. Let’s see how to do exactly that.

The first thing you’ll need is a copy of nodejs. This can simply be the copy which is installed on all your developer workstations and build server or it can be a version checked into source control(check in vs. install is a bit of a holy war and I won’t get into it in this post). If you’re going to check a copy of node in then you might want to look at minimizing the footprint of what you check in. It is very easy with node to install a lot of packages you don’t actually need. To isolate your packages to your current build then you can simply create a “node_modules” directory. npm, the package management system used by node, will recurse upwards to find a directory called node_modules and install to that directory.

Let’s assume that you have node installed in a tools directory at the root of your project, next to the solution file, for the purposes of this post. In there create an empty node_modules directory

Node and an empty node_modules directory

Now that node is in place you can install grunt and any specific grunt tasks you need. I have a separate install of node on this machine which includes npm so I don’t need to put a copy in node_modules. I would like to have grunt in the modules directory, though.

npm install grunt

This will install a copy of grunt into the node_modules directory. There are a huge number of grunt tasks which might be of use to use in our project. I’m going to stick to a small handful for minifying JavaScript and base64 encoding images into data-uris.

npm install grunt-contrib-uglify npm install grunt-contrib-concat npm install grunt-data-uri

Now we will need a gruntfile to actually run the tasks. I played around a bit with where to put the gruntfile and I actually found that the most reliable location was next to the node executable. Because grunt and really node in general are more focused around convention over configuration we sometimes need to do things which seem like hacks. The basic gruntfile I created looked like

This is pretty basic but it does take all the javascript files in the project and combines them into one then minifies that file. It will also insert dataURIs into your css files for embedded images. A real project will need a more complete grunt file.

Now we need to run this grunt process as part of the build. As I mentioned this is a bit convoluted because of convention and also because we’re choosing to use a locally installed grunt and node instead of the global versions.

First we create a runGrunt.bat file. This is what visual studio will call as part of the post build.

Then tie this into the msbuild process. This can be done by editing the .csproj/.vsproj file or by adding it using Visual Studio

The final step is to create a wrapper for grunt as we don’t have the globally registered grunt

Now when the build runs it will call out to grunt via node.One final thing to keep in mind is the output files from the grunt process are what is needed to be included in your html files instead of the source files. So you’ll want to include all.min.js instead of the variety of JavaScript files.

This opens up a world of advanced build tools which we wouldn’t have at our disposal otherwise. At publication there are 2 047 grunt plugins which do everything from zip files, to check JavaScript styles, to running JavaScript unit tests”¦ Writing your own tasks is also very easy(much easier than writing your own msbuild tasks). Setting up grunt in this fashion should work for automated builds as well.

Best of luck using node to improve your ASP.net builds!

2013-12-27

Specific Types and Generic Collections

Generics are pretty nifty tools in statically typed languages. They allow for one container to contain specific types, in effect allowing the container to become an infinite number of specifically typed collections. You still get the strong type checking when manipulating the contents of a collection but don’t have to bother creating specific collections for each type.

If you’re working in a strongly typed language then having access to generic collections can be a huge time saver. Before generics were introduced in a lot of timewas spent casting the contents of collections to their correct type.

Casting like this is error-prone and is the sort of thing which makes Hungarian notation look like a good idea. However generics are like most other concepts in programming: dangerous when overused.

C# provides a bunch of generic collections in System.Collections.Generic which are fantastic. However there are also some types which are ill advised. Tuple is one of the worst offenders

Does anyone have a valid use case for a Tuple<T1,T2> where an explicit strongly-typed value object (even if generic) would not be better?

“” David Alpert (@davidalpert) December 19, 2013

There are actually 8 overloads of Tuple allowing for near endless combinations of types. The more arguments the worse the abuse of generics. The final overload of tuple hilariously has a final argument called “Rest” which allows for passing in an arbitrary number of values. As David Alpert suggested it is far better to build proper types in place of these generic tuples. Having proper classes better communicates the meaning of the code and gives you more maneuverability in the case of code changes. Tuples don’t support naming each attribute so you have to guess what they are or go look up where the tuple was created. This is not maintainable at all and is of no help to other programmers who will be maintaining the code.

The C# team actually implemented Tuples as a convenience to help support F#’s use of tuples which is more reasonable. Eric Lippert suggests that you might want to group values together using a tuple when they are weakly related and have no business meaning. I think that’s garbage. The idea that OO programming objects should somehow all map to real world objects is faulty in my mind. Certainly some should map to real world items but if you’re building a system for a library then it pedantry to have a librarian class. The librarian is a construct of the actual business process which is checking out and returning books.

So that brings me to the first part of my rule on generics: use specific types for types instead of generic objects avoiding use of Tuple and similar generic classes. Building a class takes but a moment and even if it is just a container object it will at least have some context around it.

On the flip side generics are ideal for collections. I frequently run into libraries which have defined their own strongly typed collections. These collections are difficult to work with as they rarely implement IEnummerable or IQueryable. If new features are added to these interfaces such as with LINQ there is no automatic support for it in the legacy collections. It is also difficult to build the collections initially. For collections of arbitrary length make use of the generic collections, for collections of finite length use a custom class.

Generics are powerful but attention must be paid to proper development practices when using them.

2013-12-23

Speeding up page loading "“ Part 4

In the first three parts of this series we looked at JavaScript and CSS optimizations, image reduction and query reduction. In this part we’ll look at some options around optimizing the actual queries. I should warn you straight up that I am not a DBA. I’m only going to suggest the simplest of things, if you need to do some actual query optimizations then please consult with somebody who really understands this stuff like Mike DeFehror Markus Winand.

The number of different queries run on a site is probably smaller than you think. Certainly most pages on your site are, after the optimizations from part 3, going to be running only a couple of queries with any frequency. Glimpse will let you know what the queries are but, more importantly, it will show you how long each query takes to execute.

Without knowing an awful lot about the structure of your database and network it is hard to come up with a number for how long a query should take. Ideally you want queries which take well under 100ms as keeping page load times is important. People hate waiting for stuff to load, which is, I guess, the whole point of this series.

Optimizing queries is a tricky business but one trick is to take the query and paste it into SQL Management Studio. Once you have it in there click on the actual execution plan button. This will show you the execution plan which is the strategy the database believes is the optimal route to run the query.

If you’re very lucky the query analyser will suggest a new index to add to your database. If not then you’ll have to drill down into the actual query plan. In there you want to see a lot of

Index seek

Index seeks and index scans are far more efficient than table seeks and scans. Frequently table scans can be avoided by adding an index to the table. I blogged a while back about how to optimize queries with indexes. That article has some suggestions about specific steps you can take. One which I don’t think I talk about in that article is to reduce the number of columns returned. Unfortunately EF doesn’t have support for selectively returning columns or lazily loading columns. If that level of query tuning appeals to you then you may wish to swap out EF for something like Dapper or Massivewhich are much lower level.

If you happen to be fortunate enough to have a copy of Visual Studio ULTIMATE (or, as the local Ruby user group lead calls it: Ultra-Professional-Premium-Service-Pack-Two-Release-2013) then there is another option I forgot to mention in part 3. IntelliTrace will record the query text run by EF. I still think that Glimpse and EFProf are better options as they are more focused tools. However Glimpse does sometimes get a bit confused by single page applications and EFProf costs money so IntelliTrace will work.

2013-12-17

Learning Queue

There is so much stuff out there to learn it is crazy. It is said that the last person who knew everything was Thomas Goethe, or Immanuel Kant or perhaps John Stuart Mill. These men all lived some time between 1724 and 1873. Personally I think the assertion that human knowledge was ever so small as to have been knowable by a single person is bunkum. Even a millennium ago there were certainly those elders who spent their entire lives learning the patterns of flow of a single river or how to tell which plants to plant based on the weather. A person who knew everything is a romantic idea so I can certainly see the appeal of believing such a notion.

These days the breadth of knowledge is so expansive that knowing everything about even the smallest topic is impossible. I once heard a story (sorry, I don’t remember the source) of a famous wrist surgeon who operated only on right wrists. He was that specialized. The field of computing is a particularly difficult one to explore as the rate of new ideas is breakneck. 4 years ago there was no nodejs, no coffeescript, no less)- web development was totally different. Chef hadn’t been released and nuget was a twinkle in Phil Haack’s eye.

Somebody was talking about how the pace of new development ideas in computing has accelerated in the past two or three years and that since the .com crash of 2000 to 2009ish innovation had really been slow. If that is true it means that for almost my entire technical life innovation has been slow. I’ve been hanging on by my fingernails and innovation has been slow? Oh boy.

In an e-mail the other day I mentioned to somebody that I had to add Erlang to my learning queue. The learning queue is just an abstract idea, I don’t have a queue. Well, I didn’t have a queue. That changes today. I have so many things to learn that I can’t even remember them.

So with the most pressing things at the top of my queue the list currently looks like

Azure Scheduler
AngularJS
Grunt
NodeJS
Async in .net (still don’t fully understand this)
E-mail server signatures
Erlang
GoLang
Kinect gestures
Raspberry Pi
Some hardware sensors about which I know so little I can’t even put a name on this list

I’ll keep this list up to date as I learn things and as things move up and down in order. What is in your learning queue?

2013-12-16

Speeding up page loading "“ Part 3

Welcome to part 3 of my series on speeding up pages loading. The first twoparts were pretty generic in nature but this one is going to focus in a little bit more on ASP.net. I think the ideas are still communicable to other platforms but the tooling I’m going to use is ASP.net only.

Most of the applications you’re going to write will be backed by a relational database. As websites become more complicated the complexity of queries tends to increase but also the number of queries increases. In .net a lot of the complexity of queries is hidden by abstraction layers like Entity framework and NHibernate. Being able to visualize the queries behind a piece of code which operates on an abstract collection is very difficult. Lazy loading of objects and the nefarious n+1 problem can cause even simple looking pages to become query rich nightmares.

Specialized tooling is typically needed to examine the queries running behind the scenes. If you’re working with the default Microsoft stack(ASP.net + Entity Framework + SQL Server) there are a couple of tools at which you should look. The first is Hibernating Rhino’s EFProf. This tool hooks into Entity Framework and provides profiling information about the queries being run. What’s better is that it is able to examine the patterns of queries and highlight anti-patterns.

The second tool is the absolutely stunning looking Glimpse. Glimpse is basically a server side version of the F12/FireBug/Developer Tools which exists in modern browsers. It profiles not just the SQL but also which methods on the server are taking a long time. It is an absolute revolution in web development as far as I’m concerned. Its SQL analysis is not as powerful as EFProf but you can still gain a great deal of insight from looking at the generated SQL.

We’ll be making use of Glimpse in this post and the next. There is already a fine installation guide available so I won’t go into details about how to do that. In the application I was profiling I found that the dashboard page, a rather complex page, was performing over 300 queries. It was a total disaster of query optimization and was riddled with lazy loading issues and repeated queries. Now a sensible developer would have had Glimpse installed right from the beginning and would have been watching it as they developed the site to ensure nothing was getting out of hand. Obviously I don’t fall into that category.

A lot of queries

So the first step is to reduce the number of queries which are run. I started by looking for places where there were a lot of similar queries returning a single record. This pattern is indicative of some property of items in a collection being lazily loaded. There were certainly some of those and they’re an easy fix. All that is needed is to add an include to the query.

Thus the query which looked like

became

In another part of the page load for this page the users collection was iterated over as was the permission collection. This caused a large number of follow on queries. After eliminating these queries I moved onto the next problem.

If you notice in the screenshot above the number of returned records is listed. In a couple of places I was pulling in a large collection of data just to get its count. It is much more efficient to have this work done by the SQL server. This saves on transferring a lot of unneeded data to the web serving tier. I rewrote the queries to do a better job of using the SQL server.

These optimizations helped in removing a great number of the queries on the dashboard. The number fell from over 300 to about 30 and the queries which were run were far more efficient. Now this is still a pretty large number of queries so I took a closer look at the data being displayed on the dashboard.

Much of the data was very slow changing data. For instance the name of the company and the project. These are unlikely to change after creation and if they do probably only once. Even some of the summary information which summarized by week would not see significant change over the course of an hour. Slow changing data is a prime candidate for caching.

ASP.net MVC offers a very easy method of doing caching of entire pages or just parts of them. I split the single large monolithic page into a number of components each one of which was a partial view. For instance this section calls to 3 partial views

As each graph changes at a different rate the caching properties of the graph are different. The methods for the partial is annotate with an output cache directive. In this case we cache the output for 3600 seconds or an hour.

Sometimes caching the entire output from a view is more than you want. For instance one of the queries which was being commonly run was to get the name of the company from the Id. This is called in a number of places not of them in a view. Fortunately the caching mechanisms for .net are available outside of the views

One thing to remember is that the default caching mechanism in ASP.net is a local cache. So if you have more than one server running your site(and who doesn’t these days?) then the cache value will not be shared. If shared cache is needed then you’ll have to look to an external service such as the Azure cache or ElastiCacheor perhaps a memcache or Redis server in your own data center.

On my website these optimizations were quite significant. They reduced page generation time from about 3 seconds to 200ms.An optimization I didn’t try as it isn’t available in EF5 is to use asynchronous queries. When I get around to move the application to EF6 I may give that a shot.

Archives

A blog about computer programming and technology.

My Books