acts_as_xapian

One of the special pieces of magic in TheyWorkForYou is its email alerts, sending you mail whenever an MP says a word you care about in Parliament. Lots of sites these days have RSS, and lots have search, but surprisingly few offer search based email alerts. My Mum trades shares on the Internet, setting it to automatically buy and sell at threshold values. But she doesn’t have an RSS reader. So, it’s important to have email alerts.

So naturally, when we made WhatDoTheyKnow, search and search based email alerts were pretty high up the list, to help people find new, interesting Freedom of Information requests. To implement this, I started out using acts_as_solr, which is a Ruby on Rails plugin for Solr, which is a REST based layer on top of the search engine Lucene.

I found acts_as_solr all just that bit too complicated. Particularly, when a feature (such as spelling correction) was missing, there were too many layers and too much XML for me to work out how to fix it. And I had lots of nasty code to make indexing offline – something I needed, as I want to safely store emails when they arrive, but then do the risky indexing of PDFs and Word documents later.

The last straw was when I found that acts_as_solr didn’t have collapsing (analogous to GROUP BY in SQL). So I decided to bite the bullet and implement my own acts_as_xapian. Luckily there were already Xapian Ruby bindings, and also the fabulous Xapian email list to help me out, and it only took a day or two to write it and deploy it on the live site.

If you’re using Rails and need full text search, I recommend you have a look at acts_as_xapian. It’s easy to use, and has a diverse set of features. You can watch a video of me talking about WhatDoTheyKnow and acts_as_xapian at the London Ruby User Group, last Monday.

1 Comment