Monday, 27 September 2010

RSS feed keyword analysis for the fun of it

    What do you do when the recession hits and you are made redundant?

    When it happened to me last year, I wrote an RSS feed keyword trend analyser in my new-found free time. Over a year and several million keywords and phrases later, I can find associated keywords and phrases and plot graphs for almost anything that's been in the UK mainstream news. Like this one, showing the fortunes of three Labour party leaders over the past few weeks.

    You can clearly see Tony Blair's book launch as the blue hump in the middle, and Ed Miliband's election as party leader in green on the right. Meanwhile Gordon Brown bumps along in the obscurity of his Scottish constituency as the red line. Funny that: the colours were allocated at random by my graphing library, yet Blair got the Tory blue.

    As a search engine marketeer's tool it's of limited use unless you really are looking at up-to-the-minute trends for very fast-moving content. But as a toy, or for finding collocated words and phrases for newsworthy themes, it's shaping up pretty well.

    I'll be dipping back into this particular well of words again on here from time to time, both from the tech side and just for the joy of playing with some words.


  1. This looks brilliant, and so clever.

    I'm a student working on my own RSS feed keyword analyser, but I'm not having much luck.

    Could you perhaps explain a little more of how it works?

    Many thanks, Marie

  2. My apologies, Blogger didn't tell me this comment was queued, so I've only just noticed it.

    Thanks, glad you like the tool. It has a list of a hundred or so RSS feeds and every day it polls them all, harvests their stories and does a keyword analysis on them, storing each incidence of each keyword phrase in a MySQL database. Thus I can pull up a list of all the occurrences of a given phrase over time, or I can pull up a list of the phrases that most often appear round a given phrase, all with a bit of SQL.
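
    A minimal sketch of that store-and-query idea follows. It is not the tool's actual code. It uses Python with an in-memory SQLite database standing in for MySQL so that it runs on its own, and it leaves out the feed polling entirely. The table layout, the function names, the tiny stop-word list and the three sample headlines are all assumptions made up for illustration.

```python
import re
import sqlite3

# Hypothetical stop-word list; a real one would be far longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "as", "for", "his", "her"}


def phrases(text, max_words=2):
    """Return every phrase of 1..max_words words, skipping any that start or end with a stop word."""
    words = re.findall(r"[a-z']+", text.lower())
    found = []
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            found.append(" ".join(gram))
    return found


def store_story(db, story_id, day, text):
    """Store one row per incidence of each phrase found in a story."""
    db.executemany(
        "INSERT INTO occurrence (story_id, day, phrase) VALUES (?, ?, ?)",
        [(story_id, day, p) for p in phrases(text)],
    )


def trend(db, phrase):
    """Count the occurrences of a phrase per day, oldest day first."""
    return db.execute(
        "SELECT day, COUNT(*) FROM occurrence WHERE phrase = ? GROUP BY day ORDER BY day",
        (phrase,),
    ).fetchall()


def neighbours(db, phrase, limit=5):
    """List the phrases that most often share a story with the given phrase."""
    return db.execute(
        """SELECT o2.phrase, COUNT(DISTINCT o2.story_id) AS stories
           FROM occurrence o1 JOIN occurrence o2 ON o1.story_id = o2.story_id
           WHERE o1.phrase = ? AND o2.phrase <> ?
           GROUP BY o2.phrase
           ORDER BY stories DESC, o2.phrase
           LIMIT ?""",
        (phrase, phrase, limit),
    ).fetchall()


# Invented sample data, only to exercise the queries.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE occurrence (story_id INTEGER, day TEXT, phrase TEXT)")
store_story(db, 1, "2010-09-01", "Tony Blair launches his book")
store_story(db, 2, "2010-09-01", "Protests greet Tony Blair at book signing")
store_story(db, 3, "2010-09-25", "Ed Miliband elected Labour leader")

print(trend(db, "tony blair"))
print(neighbours(db, "tony blair", limit=3))
```

    The per-day counts from trend are the sort of numbers a graph could be plotted from, and neighbours is one simple way to get "phrases that appear round a given phrase" by treating a shared story as the unit of closeness. Even on three headlines the neighbour list fills up with fragments of the phrase itself.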

    So far so good. But in reality it's far from perfect. It produces a fair bit of noise, so for each query like the Labour leader one I have a selection stage in which I tick the keywords or phrases I'm interested in. It's also very, very slow, as you might expect for such a large database on a fairly puny machine.

    Still, it did me no harm in my job interview to be able to demonstrate a working language analysis tool, because it's a less sophisticated attempt to do what my computational linguistics colleagues at OUP are doing.

    See and

    OTSO your Yahoo Pipes work, have you seen