2. Choosing your groups
3. Storing user clicks
4. Seeing your data
Time for an adventure!
My initial approach was to randomly assign each request to either group A or group B. However, this approach isn't ideal: it means that the same user visiting the page twice in a row may see different content. That seems like the sort of thing that would confuse many users, and it would certainly irritate me as a user. It is best not to irritate our users, so we need a better approach.
We could easily write a cookie to the user's browser so that they get the same page on each load, but that is rather grim. We like nothing more than avoiding cookies when we can: they are inefficient and, depending on how they are implemented, a pain in a cluster. A better solution is to pick a piece of information that is readily available and use it as a hash seed. I think the two best candidates are the username and, failing that (for unauthenticated users), the IP address. So long as we end up with an approximately even distribution into the A and B buckets, we're set.
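As a rough sketch of the hashing idea (Python; the `choose_bucket` helper is hypothetical, and SHA-256 is my assumption, since any hash that is stable across processes would do), mapping an identifier such as a username or IP address to a bucket could look like this:

```python
import hashlib
from typing import Sequence


def choose_bucket(identifier: str, buckets: Sequence[str] = ("A", "B")) -> str:
    """Deterministically map an identifier (e.g. a username or IP) to a bucket."""
    # Use a cryptographic hash rather than Python's built-in hash(): hash() of a
    # str is randomized per process, so it would not give the same answer after
    # a restart or on another server in the cluster.
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer and reduce it modulo the
    # number of buckets; SHA-256 output is close enough to uniform that the
    # split comes out approximately even.
    value = int.from_bytes(digest[:8], "big")
    return buckets[value % len(buckets)]
```

Because the bucket is a pure function of the identifier, the same user lands in the same bucket on every request without anything being stored on the client.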
We've been talking about having only two buckets, A and B, but nothing actually depends on there being just two. Any number of buckets is fine, though the complexity does tend to tick up a bit with more of them. I have also read suggestions that you might not want to split visits to your page evenly. If you have a page that already works quite well, you can direct, say, 90% of your users to that page, and the remaining 10% become the testing group. This way the majority of users get a page that has already been deemed good. The math to see whether your new page is better is pretty simple: a little bit of Bayes and you're set.
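Supporting more than two buckets, or an uneven split such as 90/10, mostly means turning the hash into a number in [0, 1) and walking a cumulative list of weights. Here is a sketch under stated assumptions: it is Python, uses SHA-256 as the stable hash, and `hash_fraction` and `choose_weighted_bucket` are hypothetical names.

```python
import hashlib
from typing import Sequence, Tuple


def hash_fraction(identifier: str) -> float:
    """Map an identifier to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Keep the top 53 bits so that dividing by 2**53 is exact in a float
    # and the result is strictly less than 1.
    bits = int.from_bytes(digest[:8], "big") >> 11
    return bits / 2**53


def choose_weighted_bucket(identifier: str,
                           weights: Sequence[Tuple[str, float]]) -> str:
    """Pick a bucket with probability proportional to its weight.

    weights is a sequence of (bucket_name, weight) pairs, e.g.
    [("control", 90), ("experiment", 10)] for a 90/10 split.
    """
    total = sum(weight for _, weight in weights)
    point = hash_fraction(identifier) * total
    cumulative = 0.0
    for name, weight in weights:
        cumulative += weight
        if point < cumulative:
            return name
    # Only reachable through floating-point rounding; use the last bucket.
    return weights[-1][0]
```

With `[("control", 90), ("experiment", 10)]`, roughly one identifier in ten falls into `"experiment"`, and any given identifier always gets the same answer.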
We'll take a pretty basic approach.
Here we differentiate between authenticated and unauthenticated users; each has a different strategy. Authenticated users are assigned a group based on their username. Usernames are pretty static, so we should be fine using them as the hash seed.
For unauthenticated users we use the IP address as the hash seed. In some cases this will fall down, because many people behind the same router share a single address, but it is sufficient for our purposes.
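The two strategies could be wired together roughly as below. This is a Python sketch, not code from this series: `choose_group` and `bucket_for` are hypothetical helpers, the assumption is that the web framework gives us an optional `username` plus a `remote_ip` string, and SHA-256 stands in for whatever stable hash you prefer.

```python
import hashlib
from typing import Optional, Sequence


def bucket_for(key: str, buckets: Sequence[str] = ("A", "B")) -> str:
    """Stable SHA-256 hash of key, reduced to one of the buckets."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return buckets[int.from_bytes(digest[:8], "big") % len(buckets)]


def choose_group(username: Optional[str], remote_ip: str) -> str:
    """Pick the A/B group for a request.

    Authenticated users (non-empty username) are bucketed by username, so
    they keep the same group whatever IP address they connect from.
    Anonymous users fall back to their IP address, so everyone behind the
    same router shares a group.
    """
    if username:
        return bucket_for(username)
    return bucket_for(remote_ip)
```

Where `remote_ip` comes from depends on your stack; behind a load balancer or reverse proxy you would typically need the forwarded client address rather than the proxy's own address, or every anonymous visitor would land in the same group.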
In the next post we'll log the clicks into Redis.