Archive for the 'FeedDigest' Category

Google Reader ignores robots.txt, so are feed readers ‘bots’?

Wednesday, April 25th, 2007

Today, a Feed Digest user reported that his digests using IceRocket were no longer working. I looked into it, and it seems IceRocket had banned our proxy. I rigged up an alternative proxy and it worked for about 50 requests, and then that was banned too. Clearly the ban was automated, and probably reflects a new rule / policy from IceRocket.

I took a look at their robots.txt to see what the deal was, and it turns out they block ALL useragents from their /search directory, which means most of their RSS feeds can’t be used by, er.. anything. A feed reader is an automated client, much like Feed Digest is, so we’re not technically allowed to retrieve their feeds except manually with our browsers ;-) Of course, this all depends on the definition of a ‘bot’.. more on that later.

I decided to put Google Reader to the test to see if it respects robots.txt rules, and.. no! I could subscribe successfully to an IceRocket feed ( http://www.icerocket.com/search?tab=blog&q=robots&rss=1 ) from Google Reader, despite IceRocket’s robots.txt file denying it. So at least Feed Digest isn’t alone in mostly ignoring robots.txt policy: Google Reader doesn’t follow the rules either. (Barely any feeds are covered by robots.txt rules anyway, since blocking automated clients would make them useless.) The difference is that Google is a big guy and doesn’t get banned, while Feed Digest is small and does. Perhaps we’ll work it out with IceRocket in a nice fashion, but the point remains, and this could easily become an issue with 101 other feed providers in the future.
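For anyone wanting to check how a robots.txt rule like this applies to a feed URL, here is a minimal sketch using Python's standard urllib.robotparser. The rule set below is my assumed reconstruction of "all user agents blocked from /search", not a copy of IceRocket's actual file, and the user agent string and the may_fetch helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the rule described above: every user agent is
# barred from /search. This is NOT a copy of IceRocket's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

# Hypothetical user agent string, used only for this example.
USER_AGENT = "ExampleFeedReader/1.0"


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


FEED_URL = "http://www.icerocket.com/search?tab=blog&q=robots&rss=1"
HOME_URL = "http://www.icerocket.com/"

feed_allowed = may_fetch(ROBOTS_TXT, USER_AGENT, FEED_URL)
home_allowed = may_fetch(ROBOTS_TXT, USER_AGENT, HOME_URL)

print("feed allowed:", feed_allowed)
print("home page allowed:", home_allowed)
```

Under this assumed rule set the search feed is off limits to any client that honours robots.txt, while the home page is not. Whether a feed reader should be honouring it at all is a separate question.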

However, the remaining point is.. is a feed reader a ‘bot’? Finding a definitive answer to this isn’t easy. The original “robot exclusion” standard says:

WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages.

In theory this means almost no feed reader is actually a “robot”, yet Feed Digest appears to be treated as one. Then again, this definition of “robot” seems riddled with potential loopholes.

What’s the actual policy here? Are proxies, feed readers, feed “crawlers” (but not recursive ones) and so forth “robots”, “spiders”, or not? Furthermore, would an application that trawled through linked OPML files be a “robot” because it recursively retrieves OPML files? It’s a toughie, but I’m thinking there needs to be some policy set on this by the higher-ups :)

New FeedDigest.com Design & Logo

Friday, March 2nd, 2007

I’m proud to announce that Feed Digest is now up and running (just) with a new design, coupled with a new logo. The back-end system is still the same, but this new site is in anticipation of the roll-out of “Feed Digest 2007” in the coming months. This is another pile of boxes ticked off my to-do list, allowing me to focus more on the technical hokery-pokery.

The new Feed Digest site has been a bit of a wrestle technology-wise. It’s powered by some ad-hoc PHP, WordPress 2.1, and bbPress, among other things. I initially redeveloped it all in Drupal, but a few weeks ago I decided it was too generic and I wanted something lighter and easier to tweak, and.. et voilà.

Microsoft attempts to patent feed processing technology

Saturday, December 23rd, 2006

I can’t believe it. Dave Winer reports that Microsoft are attempting to patent a system that sounds, quite specifically, like what Feed Digest does. The excerpts from the patent application below are almost word-for-word descriptions of significant aspects of what Feed Digest does or how it operates. The application also covers significant aspects of applications such as FeedBurner.

The ability of a central system to receive feeds and allow others to retrieve data related to those feeds:

[…] the platform can acquire and organize web content, and make such content available for consumption by many different types of applications. These applications may or may not necessarily understand the particular syndication format. Thus, in the implementation example, applications that do not understand the RSS format can nonetheless, through the platform, acquire and consume content, such as enclosures, acquired by the platform through an RSS feed […]

There are cases, however, when an application that uses the platform does not wish to be subscribed to a particular feed. Rather, the application just wants to use the functionality of the platform to access data from a feed. In this case, in this particular embodiment, subscriptions object 202 supports a method that allows a feed to be downloaded without subscribing to the feed. In this particular example, the application calls the method and provides it with a URL associated with the feed. The platform then utilizes the URL to fetch the data of interest to the application. In this manner, the application can acquire data associated with a feed in an adhoc fashion without ever having to subscribe to the feed.

The ability to tailor data within the system for each feed:

On the other hand, there is data that is treated as read/write data, such as the name of a particular feed. That is, the user may wish to personalize a particular feed for their particular user interface. In this case, the object model has properties that are read/write. For example, a user may wish to change the name of a feed from “New York Times” to “NYT”. In this situation, the name property may be readable and writable.

Centralized synchronization:

In the illustrated and described embodiment, feed synchronization engine 108 (FIG. 1) is responsible for downloading RSS feeds from a source. A source can comprise any suitable source for a feed, such as a web site, a feed publishing site and the like. In at least one embodiment, any suitable valid URL or resource identifier can comprise the source of a feed. The synchronization engine receives feeds and processes the various feed formats, takes care of scheduling, handles content and enclosure downloads, as well as organizes archiving activities.

Feed normalization:

In the illustrated and described embodiment, feeds are capable of being received in a number of different feed formats. By way of example and not limitation, these feed formats can include RSS 1.0, 1.1, 0.9x, 2.0, Atom 0.3, and so on. The synchronization engine, via the feed format module, receives these feeds in the various formats, parses the format and transforms the format into a normalized format referred to as the common format.
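To show how ordinary this kind of normalization is, here is a minimal sketch in Python of turning two feed formats into one common shape. It is not Feed Digest's code and not what the patent application describes in detail. The normalize_feed function, the title/link dictionary layout and the sample feeds are all assumptions for illustration, and it handles only RSS 2.0 and Atom 1.0 (not the Atom 0.3 named in the excerpt).

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace. Atom 0.3 used a different namespace and is not handled.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Made-up sample feeds carrying the same single item in two formats.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Hello</title><link>http://example.com/a</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>Hello</title><link href="http://example.com/a"/></entry>
</feed>"""


def normalize_feed(xml_text: str) -> list[dict]:
    """Parse an RSS 2.0 or Atom 1.0 document into a list of title/link dicts."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":
        # RSS 2.0: items are un-namespaced <item> elements with a text <link>.
        for item in root.iter("item"):
            items.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    elif root.tag == ATOM_NS + "feed":
        # Atom 1.0: entries are namespaced and the link lives in an href attribute.
        for entry in root.iter(ATOM_NS + "entry"):
            link_el = entry.find(ATOM_NS + "link")
            items.append({
                "title": entry.findtext(ATOM_NS + "title", default=""),
                "link": link_el.get("href", "") if link_el is not None else "",
            })
    else:
        raise ValueError(f"unsupported feed format: {root.tag}")
    return items


rss_items = normalize_feed(RSS_SAMPLE)
atom_items = normalize_feed(ATOM_SAMPLE)
print(rss_items == atom_items)
```

Both samples come out as the same list of dictionaries, which is the whole point of a "common format": everything downstream only has to understand one shape.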

To Amar S. Ghandi, Edward J. Praitis, Jane T. Kim, Sean O. Lyndersay, Walter V. von Kock, William Gould, Bruce A. Morgan, and Cindy Kwan.. did you really collectively invent all of this stuff? Shame on those backing this pathetic attempt to trample over technology that has, so far, not necessitated the use of software patents.

First Angry FeedDigest Customer

Thursday, December 7th, 2006

After 18 months and 20,000 users, I’ve finally got my first angry e-mail from a FeedDigest customer. I figure the occasion is worth marking.

The customer (whom I will not identify) was sent an e-mail one week before their account expired (which they definitely received, as I got an automated response) saying their account was due for renewal. They did not renew, so one week after the expiration, I deactivated it. Within an hour I got an e-mail containing this paragraph:

I am totally outraged at this course of action as I received absolutely no notification whatsoever. I would like to understand what has happened and why immediately. You can be sure that I am looking for alternatives to your service and without an adequate explanation, will use our networks to make it broadly known how poorly we were treated – as a paying customer no less!

I am not a schmoozer who will bend over for customers who make threats (and nor should you be). I politely told him what happened with his expiration, provided him with an OPML dump of his account to make his life easier, and told him to go find another solution. I just couldn’t believe he was threatening me over this.. especially since all support incidents with him had gone well and I’d had no contact with him at all for the past 6 months!

Good riddance to him. Losing customers like that helps me provide the excellent service (not my words!) that my other customers continue to enjoy.