Monday 14 May 2012

What I'm going to do with my Raspberry Pi

    That magic email from Farnell came on Saturday: my Raspberry Pi is in the post!
    So, what am I going to do with it?
    Like loads of other geeks, I expect I'll plug it into my telly, connect it to my router and use it as a web terminal and media centre with geek bragging rights. There it'll sit, unseen and uncomplaining, for however many years it takes until I get a new telly or a Raspberry Pi 2. But that's rather a waste, don't you think?

    Somewhere, the Flying Spaghetti Monster has just killed a kitten.

    So what do I really want it to do? I will have a very small and moderately powerful computer (insanely powerful by the standards of a few years ago) that uses negligible electrical power and can be left on all the time. I'm still going to plug it into my router and telly, but to make it earn its keep I'm going to have it run my keyword analysis tool.
    Events have moved on a little since my blog post describing the tool, but the principle is still the same. Every day I take new posts from a big list of RSS feeds and extract keyword phrases from them, which I store in a database. I can then calculate frequencies and collocates over time, which give me a picture of how the language and terms in the news interrelate over any given period. The tool not only fulfils my original aim of generating keywords and phrases for previously unseen search terms, but also lets me examine any newsworthy subject in a way that is not possible by any other means.
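    The daily cycle described above could be sketched in Python roughly as follows. This is not the original PHP code: the stopword-delimited phrase extraction, the tiny STOPWORDS list, the collocation window and the function names (keyword_phrases, daily_frequencies, collocates) are all hypothetical stand-ins, and the posts are assumed to have already been fetched from the RSS feeds as (date, text) pairs.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical, deliberately tiny stopword list; a real tool would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_phrases(text, max_len=3):
    """Yield every 1..max_len word phrase inside each run of non-stopwords."""
    run = []
    for word in tokenize(text) + [None]:  # None marks the end and flushes the last run
        if word is not None and word not in STOPWORDS:
            run.append(word)
            continue
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                yield " ".join(run[i:i + n])
        run = []


def daily_frequencies(posts):
    """posts: iterable of (date, text) pairs. Returns {date: Counter(phrase -> count)}."""
    by_day = defaultdict(Counter)
    for day, text in posts:
        by_day[day].update(keyword_phrases(text))
    return by_day


def collocates(posts, term, window=4):
    """Count non-stopwords occurring within `window` tokens of the single word `term`."""
    counts = Counter()
    for _, text in posts:
        words = tokenize(text)
        for i, word in enumerate(words):
            if word == term:
                nearby = words[max(0, i - window):i + window + 1]
                counts.update(w for w in nearby if w != term and w not in STOPWORDS)
    return counts


posts = [
    (date(2012, 5, 12), "Raspberry Pi orders open at Farnell"),
    (date(2012, 5, 13), "The Raspberry Pi ships to the first buyers"),
]
freqs = daily_frequencies(posts)
print([(d.isoformat(), freqs[d]["raspberry pi"]) for d in sorted(freqs)])
# [('2012-05-12', 1), ('2012-05-13', 1)]
print(collocates(posts, "pi", window=2).most_common(1))
# [('raspberry', 2)]
```

    Splitting on stopwords keeps candidate phrases like "raspberry pi" intact without any grammatical analysis. It is a crude heuristic, and a proper language toolkit would offer better tokenization and more principled collocation statistics.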
    The original tool runs in PHP on my Windows laptop. Its MySQL database has been pushed well beyond its limits, so I have been working on a version that uses a large directory tree of precomputed JSON files instead. It's an approach I've since used in my work too, relying on the principle that disk space is cheap and file reads are quick, while complex joins on monster MySQL databases are expensive and slow.
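    A precomputed JSON tree of this kind might look something like the sketch below. The directory layout (one folder per phrase, one file per day), the record format and the function names are assumptions for illustration, not the structure the original tool uses. What it shows is that answering "how often did this phrase appear each day?" becomes one directory listing and a handful of small file reads, rather than a join across large tables.

```python
import json
import tempfile
from collections import Counter
from datetime import date
from pathlib import Path


def phrase_dir(root, phrase):
    """Hypothetical layout: one directory per phrase, with spaces turned into underscores."""
    return Path(root) / "_".join(phrase.split())


def write_daily_counts(root, day, counts):
    """Precompute: write root/<phrase>/<YYYY-MM-DD>.json for every phrase seen on `day`."""
    for phrase, n in counts.items():
        folder = phrase_dir(root, phrase)
        folder.mkdir(parents=True, exist_ok=True)
        record = {"phrase": phrase, "date": day.isoformat(), "count": n}
        (folder / f"{day.isoformat()}.json").write_text(json.dumps(record))


def read_history(root, phrase):
    """Return [(date, count), ...] for one phrase by listing one directory; no joins needed."""
    folder = phrase_dir(root, phrase)
    if not folder.is_dir():
        return []
    history = []
    for path in sorted(folder.glob("*.json")):  # ISO-dated file names sort chronologically
        record = json.loads(path.read_text())
        history.append((record["date"], record["count"]))
    return history


with tempfile.TemporaryDirectory() as root:
    write_daily_counts(root, date(2012, 5, 12), Counter({"raspberry pi": 3, "farnell": 1}))
    write_daily_counts(root, date(2012, 5, 13), Counter({"raspberry pi": 5}))
    print(read_history(root, "raspberry pi"))
    # [('2012-05-12', 3), ('2012-05-13', 5)]
```

    The trade-off is a very large number of small files, so on a Pi the filesystem and the size of its SD card would become the practical limits rather than the database engine.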
    I could of course compile PHP for my Pi; it's probably already available precompiled anyway. But the Pi's a Python platform (try saying that after five pints of real cider!) and that offers me a unique opportunity. My PHP code does the job, but it relies on language processing libraries I built myself as a search engine specialist. I'm not a computational linguist, so I'd be the first to say that they aren't as good as they could be.
    Python has the incredibly useful Natural Language Toolkit (NLTK) libraries, which let me do far more with my source texts, and far more quickly, than my PHP code can. So my first project on the Pi will be to port my keyword tool to Python, using the NLTK instead of my own libraries. The Pi will still sit behind my telly and be used for the occasional bit of web surfing, but the rest of the time it'll be crunching keywords and giving me lots of lovely language data that I can work with in real time, rather than waiting long enough to make a cup of tea every time I run a MySQL query.
    In a way I'm not taking advantage of everything the Pi can do. Almost any internet-connected computer could do this job; I'm only using a Pi because it's cheap and low-power, and I've lusted after one ever since I read the first press releases about it. Other people will use the Pi's hardware capabilities to do much more eye-catching things. But my Pi, quietly crunching words day and night behind my telly, will still be earning its keep. It will let me learn new things, and since its data is likely to end up in some of my work, it may even, in its own small way, contribute to the wider understanding of language.
    So that's what I'll be doing with my Pi. What'll you be doing with yours?
