2011-08-18

Configuring the Solr Virtual Appliance

I ran into a situation today where I needed a search engine in an application I was writing. The data I was searching was pretty basic and I could easily have written an SQL query to do the lookups I wanted. I suspected, however, that if I implemented this properly there would be a desire to apply the search in other places in the application. I knew a little bit about Lucene so I did some research on it, reading a number of blogs and the project documentation. It quickly became obvious that keeping the index in sync while writing new documents from various nodes in our web cluster would be difficult. Being a good service bus and message queue zealot, I saw an obvious solution: throw up some message queues and distribute the updates. Done.

I then came across Solr, which wraps Lucene in a web service. Having one central search server would certainly help me out. There might be scalability issues in some imaginary future world where users make heavy use of search, but in that future world I am rich and don’t care about such things, being far too busy plotting to release dinosaurs onto the floor of the New York Stock Exchange.

I was delighted to find that there exists a virtual appliance with Solr already installed. If you haven’t seen a virtual appliance before, it is an image of an entire computer which is dedicated to one task and comes preconfigured for it. I am pretty sure this is where a lot of server hosting is going to end up in the next few years.

Once I had the image downloaded and running in VirtualBox I had to configure it to match my document schema. The Solr instance which is installed points at the example configuration files. This can be changed in /usr/share/tomcat6/conf/Catalina/localhost/solr.xml; I pointed it at /usr/local/etc/solr. Then I copied the example configuration files into that directory for hackification.

cp -r /usr/share/apache-solr-1.4.1/example/solr/* /usr/local/etc/solr/
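For reference, the context file ended up looking roughly like this. The docBase path is whatever the appliance ships with, so treat this as a sketch rather than gospel; the important part is the solr/home environment entry pointing at the new directory.

<Context docBase="/usr/share/apache-solr-1.4.1/example/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/usr/local/etc/solr" override="true"/>
</Context>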

Once you’ve got these files in place you can crack open schema.xml and remove all the fields which are extraneous to your needs. You’ll also want to remove them from the copyField section. That section builds up a super field containing a bunch of other fields to make searching across multiple fields easier. I prefer using the DisMax request handler to list the fields I want to search explicitly.
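A rough sketch of what I mean, with made-up field names (title and body stand in for whatever your documents actually need). The trimmed fields live in schema.xml and the DisMax handler, with its qf list of fields to search, lives in solrconfig.xml.

<!-- schema.xml: keep only the fields the application actually uses -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text" indexed="true" stored="true"/>
<field name="body" type="text" indexed="true" stored="true"/>

<!-- solrconfig.xml: a DisMax handler that searches the listed fields explicitly -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 body</str>
  </lst>
</requestHandler>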

2011-01-25

The ONE API

My boss was pretty excited when he came back from some sort of mobile technology symposium offered in Calgary last week. I like it when he goes to these things because he comes back full of wild ideas and I get to implement them, or at least think about implementing them, which is almost as good. In this case it was THE ONE API. I kid you not, this is actually the name of the thing. It is an API for sending text messages to mobile phone users, getting their location and charging them money. We were interested in sending SMS messages, not because we’re a vertical marketing firm or have some fantastic news for you about your recent win of 2000 travel dollars; we have legitimate reasons. Mostly.

Anyway I signed up for an account over at https://canada.oneapi.gsmworld.com/ and waited for them to authorize it. I guess the company is out of the UK so it took them until office hours in GMT to get me the account. No big deal. In I logged and headed over to the API documentation. They offer a SOAP and a REST version of the API, so obviously I fired up curl and headed over to the sandbox URL, documentation in hand. It didn’t work. Not at all.

curl https://canada.oneapi.gsmworld.com/SendSmsService/rest/sandbox/ -d version=0.91 -d address=tel:+14034111111 -d message=APITEST
In theory this command should have sent me a message (I changed the phone number so you internet jerks can’t actually call me) or at worst returned a helpful error message, or so said the API documentation.

What actually happened was that it failed with a general 501 error, wrapped in XML. Not good: the error should be in JSON, so say the API docs. It also shouldn’t fail at all, and if it does the error should be specific enough for me to fix. I traced the request and I was sending exactly what I should have been.

No big deal, I’ll try the SOAP API and then take my dinosaur for a walk. The WSDL they provided contained links to other WSDLs, a pretty common practice. However, the URLs in the WSDL pointed at some machine which was clearly behind their firewall, making it impossible for me to use the service.

I gave up at that point. These guys are not the only people competing in the SMS space and if they can’t get the simplest part of their service, the API, right then I think we’re done here. Add to this that they only support the three major telcos in Canada (Telus, Bell and Rogers) and there are much better options available. Twilio supports all carriers in Canada and the US and they charge, at current exchange rates, 1.5 cents a message less than these guys. Sorry, THE ONE API, you’ve been replaced by a better API, cheaper messaging and better compatibility.
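For comparison, sending a message through Twilio is a single authenticated POST. This is a sketch from memory against their 2010-04-01 REST API; the account SID, auth token and phone numbers are all placeholders, so check their documentation before copying it.

curl -X POST https://api.twilio.com/2010-04-01/Accounts/ACXXXXXXXXXXXXXXXX/SMS/Messages \
  -u 'ACXXXXXXXXXXXXXXXX:your_auth_token' \
  -d 'From=+15551234567' \
  -d 'To=+14034111111' \
  -d 'Body=APITEST'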

2011-01-12

Test Categories for MSTest

The version of MSTest which comes with Visual Studio 2010 has a new feature in it: test categories. These allow you to put your tests into different groups which can be configured to run or not run depending on your settings. In my case this was very handy for one specific test. Most of my database layer is mocked out and I run the tests against an in-memory instance of SQLite. In the majority of cases this gives the correct results, however I had one test which required checking that values were persisted properly across database connections. This is problematic as the in-memory SQLite database is destroyed when the connection closes. There were other possible workarounds but I chose to just have that one test run against the actual MSSQL database. Normally you wouldn’t want to do this but it is just one very small test and I’m prepared to hit the disk for it. I don’t want this particular test to run on the CI server as it doesn’t have the correct database configured.

In order to make use of a test category, start by assigning one with the TestCategory attribute.

[TestMethod]
[TestCategory("MSSQLTests")]
public void ShouldPersistSagaDataOverReinit()
{
    FluentNhibernateDBSagaPersister sagaPersister = new FluentNhibernateDBSagaPersister(
        new Assembly[] { this.GetType().Assembly },
        MsSqlConfiguration.MsSql2008.ConnectionString(ConfigurationManager.ConnectionStrings["sagaData"].ConnectionString),
        true);
    // ...buncha stuff...
    Assert.AreEqual(data, newData);
}

Next, in your TFS build definition, add a rule in the Category Filter box to exclude this category of tests.

The Category Filter field has a few options and supports some simple logic: categories can be combined with & (and), | (or) and ! (not).
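A couple of example filters, using the MSSQLTests category from above plus a made-up SlowTests category:

!MSSQLTests
MSSQLTests|SlowTests
!MSSQLTests&!SlowTests

The same categories work locally too, via MSTest.exe’s /category switch, e.g. mstest /testcontainer:MyTests.dll /category:"!MSSQLTests".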

That is pretty much it, good work!

2011-01-07

Updating TCPIPListener

I still have a post on encryption for S3 backups coming, but I ran into this little problem today and couldn’t find a solution listed on the net so into the blog it goes. I have some code which is using an obsolete constructor on System.Net.Sockets.TcpListener. This constructor allows you to have the underlying system figure out the address for you. It became obsolete in .NET 1.1 so this is way out of date. In order to use one of the new constructors and still keep the same behavior, just use IPAddress.Any.

Old:

new System.Net.Sockets.TcpListener(port); // warning: obsolete

New:

new System.Net.Sockets.TcpListener(IPAddress.Any, port);
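In context, a minimal sketch; the class, Main method and port number are just scaffolding for illustration:

using System.Net;
using System.Net.Sockets;

class ListenerDemo
{
    static void Main()
    {
        // IPAddress.Any binds to every local interface, which is what the
        // obsolete single-argument constructor used to do on your behalf.
        TcpListener listener = new TcpListener(IPAddress.Any, 8080);
        listener.Start();
    }
}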

2011-01-04

S3 backup - Part II - Bucket Policy

This wasn’t going to become a series of posts but it is kind of looking like it is going to be that way. I was a bit concerned about access to the S3 bucket in which I was backing up my files. By default only I have access to the bucket, but I do tend to be an idiot and might later change the permissions on it inadvertently. Policy to the rescue! You can set some pretty complex access policies for S3 buckets but really all I wanted was to add a layer of IP address protection. You can set policies by right clicking on the bucket in the AWS Management Console and selecting Properties. In the panel that shows up at the bottom of your screen select “Edit bucket policy”. I set up this policy:

{
  "Version": "2008-10-17",
  "Id": "S3PolicyId1",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::bucket/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "255.256.256.256"
        }
      }
    }
  ]
}

Yep, policies are specified in JSON; it is the new XML to be sure. Replace the obviously fake IP address with your own and swap bucket in the Resource ARN for your actual bucket name.
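If your address isn’t perfectly stable, aws:SourceIp also accepts CIDR ranges, so a condition like the following (using a documentation-range address, not my real one) would cover a whole subnet:

"aws:SourceIp": "198.51.100.0/24"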

This will keep anybody other than me, or somebody at my IP, from getting at the bucket. I am keeping a watch on the cat just in case he is trying to get into my bucket from my IP.

Next part is going to be about SSL and encryption.

2011-01-02

On Backing up Files

I’ve been meaning to set up some backups for ages and at last today I got around to it. I have a bunch of computers in the house and they kind of back each other up, but if my house goes up in smoke or if some sort of subterranean mole creatures invade Alberta in revenge for the oil sands I’m out of luck.

I really like this company BackBlaze who offer infinite backups for $5 a month. I have some issues with them which I’ve outlined in another blog post which is, perhaps, a bit too vitriolic to post. Anyway, they have a little client which runs on OS X or Windows and streams files up to their data center. What they lack is support for UNIX operating systems, and I wanted a consistent backup solution for all my computers so something platform-portable was required. They also had some pretty serious downtime a while ago which I felt highlighted some architectural immaturity.

My next stop was this rather nifty S3 BackupSystem. Again this was Windows-only and couldn’t back up network drives so it was out.

Finally I came across s3sync.rb, a Ruby script designed to work like rsync. It is simple to use and can be shoved into cron with nary a thought; there is a sketch of a cron entry at the end of this post. At its most basic you need to do the following:

Set up Keys

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=

or set the variables in $HOME/.s3conf/s3config.yml.

Back up /etc to S3

s3sync.rb -r /etc :

Restore /etc from S3

s3sync.rb -r : /etc
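As promised, a sketch of the sort of cron entry I mean. The path to s3sync.rb and the bucket name are placeholders, and the key variables need to be available to cron (set them in the crontab or in the s3config.yml above):

# nightly at 2am, recursively sync /etc up to the backup bucket
0 2 * * * /usr/local/bin/s3sync.rb -r /etc my-backup-bucket:etc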
This assumes you have set up S3 already. If not head over to Amazon and set one up. When you sign up take note of the keys they give out during the sign up. If you fail to do so they can still be accessed once the account is set up from Security Credentials under your account settings. S3 is pretty cheap and I think you even get 15gig for free now. If that ins’t enough then it is still something like 15 cents a gig/month in storage and 10 cents a gig in transfer. Trivial to say the least.