Peter Cooper : UK Web 2.0 and Ruby on Rails consultant

Putzing around with FeedDigest


Had a long day of putzing around with FeedDigest. The database problems continued to mount up. While I've spent a lot of time optimizing it and keeping it running, the situation still wasn't ideal, and the main bottleneck was a combination of disk I/O and simple math (that is, making 100 radically different temporary tables of up to 10,000 rows of 300 bytes each, every minute, isn't viable).
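To put the "simple math" into numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and treats them as a worst case (every table at the full 10,000 rows); the names are illustrative and not taken from FeedDigest's code.

```python
# Worst-case volume of temporary-table data built each minute,
# using the figures quoted in the post.
TABLES_PER_MINUTE = 100
MAX_ROWS_PER_TABLE = 10_000
BYTES_PER_ROW = 300


def worst_case_bytes_per_minute(tables: int, rows: int, row_bytes: int) -> int:
    """Upper bound on the bytes written into temporary tables per minute."""
    return tables * rows * row_bytes


per_minute = worst_case_bytes_per_minute(
    TABLES_PER_MINUTE, MAX_ROWS_PER_TABLE, BYTES_PER_ROW
)
per_second = per_minute / 60

print(per_minute)  # 300000000 bytes, i.e. about 300 MB per minute
print(per_second)  # 5000000.0 bytes, i.e. about 5 MB per second, sustained
```

That is an upper bound rather than a measurement, but it gives a feel for why the disks were struggling.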

I decided to attack the problem on two fronts. First, I created an extremely lean 'posts_meta' table to store only the barest information FeedDigest needs to sort items into the order the user/digest requires. That includes the id of the post (to link to the 'posts' table), the aggregated time, a hash of the URL, a hash of the title, and suchlike. Now I have the metadata table down to 26 bytes per row on average, which is roughly a ten-fold improvement on the 300-byte rows above, and a massive sigh of relief for the database's dizzy brain.
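As an illustration of how small a sort-only row can get, here is a sketch of a fixed-width layout for the four fields named above. The layout is an assumption made for the example (two 32-bit integers and two 64-bit hashes, with MD5 truncated to 8 bytes); it is not the actual 'posts_meta' schema, which holds a few more fields and averages 26 bytes.

```python
import hashlib
import struct

# Hypothetical layout: post id (uint32), aggregated time (uint32),
# URL hash (uint64), title hash (uint64). Little-endian, no padding.
META_ROW = struct.Struct("<IIQQ")


def short_hash(text: str) -> int:
    """First 8 bytes of an MD5 digest as an integer (an assumed hash choice)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pack_meta(post_id: int, aggregated_at: int, url: str, title: str) -> bytes:
    """Pack one sort-only metadata row into a fixed-width byte string."""
    return META_ROW.pack(post_id, aggregated_at, short_hash(url), short_hash(title))


row = pack_meta(1, 1137024000, "http://example.com/post", "Example title")
print(len(row))                  # 24
print(round(300 / len(row), 1))  # 12.5 times smaller than a 300-byte row
```

Storing hashes instead of the URL and title themselves is what keeps the row fixed-width; the full text stays in 'posts' and is fetched by id only for the items that make it into a digest.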

Secondly, FeedDigest is extremely heavy on both INSERTs and SELECTs. Optimizing a database is easy if it's one way or the other, but FeedDigest's database has a ton of stuff going in (all the posts, feed items, etc.) and a ton of stuff going out (to digests). All of the INSERTs were locking out SELECTs and slowing down the process of building a digest, so I decided to implement a queue.

Due to the database structure, this was easier said than done and has resulted in a whole lot of frowning and scrunched-up forehead tonight, but I think it's finally there. Now all of the crawlers throw data into posts_queue (schematically a duplicate of 'posts'), and a queue maintenance program runs every minute and syncs the data up with the active tables.
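The queue pattern described above can be sketched roughly as follows. This is a minimal illustration, not FeedDigest's code: it uses Python's built-in sqlite3 module in place of MySQL so it runs anywhere, and the column names and the `enqueue` and `sync_queue` helpers are made up for the example.

```python
import sqlite3

# Assumed, simplified schema: posts_queue mirrors posts column for column.
SCHEMA = """
CREATE TABLE posts (
    id INTEGER PRIMARY KEY, url TEXT, title TEXT, aggregated_at INTEGER
);
CREATE TABLE posts_queue (
    id INTEGER PRIMARY KEY, url TEXT, title TEXT, aggregated_at INTEGER
);
"""


def enqueue(conn, url, title, aggregated_at):
    """What a crawler does: write to the queue, never to the active table."""
    conn.execute(
        "INSERT INTO posts_queue (url, title, aggregated_at) VALUES (?, ?, ?)",
        (url, title, aggregated_at),
    )
    conn.commit()


def sync_queue(conn):
    """What the once-a-minute job does: move queued rows in a single batch."""
    with conn:  # one transaction: commit on success, roll back on error
        high_water = conn.execute("SELECT MAX(id) FROM posts_queue").fetchone()[0]
        if high_water is None:
            return 0  # nothing queued
        conn.execute(
            "INSERT INTO posts (url, title, aggregated_at) "
            "SELECT url, title, aggregated_at FROM posts_queue "
            "WHERE id <= ? ORDER BY id",
            (high_water,),
        )
        deleted = conn.execute(
            "DELETE FROM posts_queue WHERE id <= ?", (high_water,)
        )
        return deleted.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
enqueue(conn, "http://example.com/a", "First post", 1137024000)
enqueue(conn, "http://example.com/b", "Second post", 1137024060)
moved = sync_queue(conn)
print(moved)  # 2
```

The point of the pattern is that the active table sees one batched write per minute instead of a constant trickle of INSERTs, so digest-building SELECTs spend far less time waiting. The high-water mark means rows that arrive in the queue while a sync is running are simply picked up on the next pass.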

It all works too, although now it's not I/O causing problems, it's CPU! Due to some of the structure of FeedDigest, we're now doing about 8 times as many raw requests per second to pull off the above, so it's still way faster than before, but now the CPU is maxing out a lot. Luckily there are a lot of query optimizations I can pull now that it's finally working, but it's constant work. Still, it's serving almost three million requests a day, and you can't complain about a single machine that can do that while pulling and managing data in a gargantuan MySQL database!
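For a sense of scale, three million requests a day works out as below. This is an average only; it assumes the load is spread evenly across the day, which real traffic never is, so peaks will be higher.

```python
REQUESTS_PER_DAY = 3_000_000  # "almost three million", rounded up
SECONDS_PER_DAY = 24 * 60 * 60

average_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY
print(round(average_per_second, 1))  # 34.7
```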




January 12, 2006 | Posted by peter | Comments (3)
Comments

So 3 million requests a day, and this machine is doing the feed updates, the serving of pages and the SQL? Not bad.

Can I ask what the HW spec is?

Also, I noticed in Alexa that you had a peak in traffic at the beginning of January. What was this about? It seems traffic to feeddigest has grown 2x since then.

Posted by: oron at January 13, 2006 05:19 PM

The main box has dual P4 2.8GHz processors and dual 80GB hard drives (only IDE, non RAIDed). Note that it doesn't host the FeedDigest.com site or do much feed crawling. We have other machines doing crawling, although the main machine does the initial crawl when a feed is added for the first time. All the other boxes (of minimal or unimportant spec - one is even a VPS) talk through MySQL and memcached, primarily.

Site traffic and digest traffic are generally unconnected, we find. The boost in early January is, surprisingly, beyond us. It doesn't appear to be Web driven, so I think we must have had a writeup somewhere. I know we were mentioned in an e-mail newsletter, which may explain some of it. I also believe someone wrote an article about us and then distributed it as a "free" article to lots of different e-mail newsletters.

We are expecting another big burst when we "relaunch" FeedDigest sometime in the next few weeks. :)

Posted by: Peter Cooper at January 13, 2006 05:37 PM
