
About five years ago, I wrote a script to scrape the website of Rotate This and republish their concert ticket listing as an RSS feed. Every six hours for five years I’ve hit their site and updated my database. The RSS feed has quite a few users, using FeedBurner, Google Reader, Bloglines and a few other miscellaneous RSS viewers. It’s one of the most frequently hit pages on my site.
I wanted to try out the new Google AppEngine service, and was looking for a simple project, so I figured I could port my little scraper. It seemed like a perfect fit.
Google AppEngine is a unique application hosting service. It’s not really classical web hosting, and it’s not really a virtual server. You generate a Java web application, bundle it as a .war file, and deploy it to AppEngine. Google’s magic voodoo makes sure that your app is hosted on at least two servers, that it will scale automatically for load, that it will start up quickly, and that it will always have access to its data. There are some restrictions, however: no request to your app can take longer than a few seconds to process, and there are strict size limits for data storage access and request/response size. Your app can’t start threads, can’t access the disk, and can’t use too much memory.
There is no real “backend” access to your app either; all access must happen through a URL. The Cron service simply calls a specified URL within your application at regular intervals.
A big advantage to AppEngine is that their basic service is free. The quotas for requests, data transfer and CPU time under the free service level are very high. Anything that exceeded them I would call a very successful app — it is unlikely I will ever have to buy more capacity. The free service includes over a million requests and a gigabyte of data transfer per day.
I spent a couple of days working on it. AppEngine was easy to learn but had a few quirks. My older code was in Perl (the scraper) and PHP (the RSS generator) using MySQL as a backend. The AppEngine code is all Java, using Google’s datastore through Java Data Objects. I had never used JDO before, so there was a bit of a learning curve there. Everything else was straightforward: the URL Fetch service gets the Rotate This ticket page, and the Cron service schedules the scrape every six hours.
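To make the scrape step concrete, here is a minimal sketch of fetching a page and pulling listings out of it. It is not the actual scraper: the class name, the `Listing` type, and especially the HTML markup (`<li class="show">` with a `<span class="date">`) are assumptions made up for illustration, since the real structure of the Rotate This page isn't shown here. The fetch uses plain `java.net`, which is my understanding of how URL Fetch is normally reached from Java on AppEngine.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TicketScraperSketch {
    /** One scraped listing; both fields are kept as raw text. */
    static final class Listing {
        final String date;
        final String title;

        Listing(String date, String title) {
            this.date = date;
            this.title = title;
        }
    }

    // Hypothetical markup, assumed for this example only:
    //   <li class="show"><span class="date">May 3</span> Some Band @ Some Venue</li>
    private static final Pattern ROW = Pattern.compile(
        "<li class=\"show\">\\s*<span class=\"date\">([^<]+)</span>\\s*([^<]+?)\\s*</li>");

    /** Downloads a page as UTF-8 text using the standard java.net classes. */
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Extracts every listing that matches the assumed markup. */
    static List<Listing> parse(String html) {
        List<Listing> listings = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            listings.add(new Listing(m.group(1).trim(), m.group(2).trim()));
        }
        return listings;
    }

    public static void main(String[] args) {
        // Parse an inline sample instead of hitting the network.
        String sample = "<ul><li class=\"show\"><span class=\"date\">May 3</span> "
            + "Some Band @ Some Venue</li></ul>";
        for (Listing l : parse(sample)) {
            System.out.println(l.date + ": " + l.title);
        }
    }
}
```

Keeping `parse` separate from `fetch` means the parsing can be checked against a saved copy of the page, which helps when the site's layout changes.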
On the output side I was able to take advantage of the rich Java ecosystem, using ROME to generate the RSS feed, and using ICal4J to generate the new iCal feed that I was adding on. Both worked fine on AppEngine’s restricted Java runtime.
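For a sense of what the feed output involves, here is a rough sketch that builds a bare-bones RSS 2.0 document by hand. It deliberately does not use ROME; it only shows the shape of the XML that a library like ROME takes care of. The class name, method names, and sample feed values are all made up for this example.

```java
import java.util.List;

class RssSketch {
    /** Escapes the characters that are unsafe inside XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders a minimal RSS 2.0 feed with one title-only item per entry. */
    static String render(String feedTitle, String feedLink, List<String> itemTitles) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<rss version=\"2.0\"><channel>\n");
        // RSS 2.0 requires title, link and description on the channel.
        sb.append("<title>").append(escape(feedTitle)).append("</title>\n");
        sb.append("<link>").append(escape(feedLink)).append("</link>\n");
        sb.append("<description>").append(escape(feedTitle)).append("</description>\n");
        for (String title : itemTitles) {
            sb.append("<item><title>").append(escape(title)).append("</title></item>\n");
        }
        sb.append("</channel></rss>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not real listings.
        System.out.print(render("Example Tickets", "http://example.com/",
            List.of("May 3: Some Band @ Some Venue")));
    }
}
```

A real feed would also want per-item links, dates, and GUIDs so that readers can tell new listings from old ones, which is a good reason to lean on a library rather than string building.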
So, here are the links. Give it a try!
Rotate This Tickets RSS Feed: http://rotatethisrss.appspot.com
Rotate This Tickets iCal Feed: http://rotatethisrss.appspot.com/ical
Let me know how it goes!
p.s. I redirected the old link to AppEngine — if you were using this before, you shouldn’t have to change anything.








Roomba and I have been getting along pretty well over the last few months. It has been cleaning my floor regularly, and doing a good job at it. It gets dust bunnies from places that I never did with the vacuum, and it usually makes it home to its charging base.
