Saturday, April 23, 2011

refreshed trendingtopics data

I was able to successfully refresh the TrendingTopics website data today. I used a sample of an updated Wikipedia dataset that I set up on Amazon S3. The updated data covers 1/1/2011 through 3/31/2011. Crunching through three months of data would take too long or cost too much on EC2, so I didn't use the whole dataset; I sampled only one hour of weblogs from each day. You can check out my local version of TrendingTopics here. (CTRL-click or wheel-click to open in a new browser window.)
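For anyone curious what "one hour per day" looks like in practice, here is a rough Python sketch that builds the list of hourly log files to pull for the date range. This is not the script I actually ran. The `pagecounts-YYYYMMDD-HH0000.gz` file name pattern, the `hourly_sample_names` helper, and the choice of hour 12 are all assumptions for illustration.

```python
from datetime import date, timedelta

def hourly_sample_names(start, end, hour=12):
    """Return one hourly log file name per day in [start, end].

    The file name pattern and the default hour are assumptions,
    not necessarily what the real dataset uses.
    """
    names = []
    day = start
    while day <= end:
        # e.g. pagecounts-20110101-120000.gz (assumed naming scheme)
        names.append(f"pagecounts-{day:%Y%m%d}-{hour:02d}0000.gz")
        day += timedelta(days=1)
    return names

# One file per day for 1/1/2011 through 3/31/2011
sample = hourly_sample_names(date(2011, 1, 1), date(2011, 3, 31))
print(len(sample), sample[0], sample[-1])
```

That works out to 90 files instead of roughly 24 times as many for the full three months, which is what keeps the processing time and cost manageable.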

In case the site is offline, you can see the more recent dates in the screenshot below:


The caveat with the refresh is that I can only load 100 records of the sample data. For some reason, the Rails app bombs when I try to load the full dataset.
* must figure out why

--more to come--
