Archive for March, 2007

Build your own feed aggregator with symfony

With the help of the sfFeed2 and sfWebBrowser plugins, symfony makes creating a feed aggregator a breeze. Let's see what it takes to build the core of a Google Reader-like application.

Fetching feeds

First of all, you'll have to fetch feeds from the Internet. It is strongly recommended to fetch them asynchronously, i.e. not when the user requests the page showing the aggregated feeds. There are two obvious reasons why you wouldn't want a synchronous process:

  • Remote servers providing the feeds would receive one request for every request made to your server. That's a nasty trick to play on other service providers, and it can skew the remote server's statistics.

  • If you have to fetch a dozen URLs per request, then the response time might exceed the server timeout.

So you have to fetch feeds, store them somewhere (in your filesystem or in a database), and keep them for later. I choose to store them on disk, which gives me a chance to use the sfFileCache class. Here is the code that I write in a batch process:

<?php

define('SF_ROOT_DIR',    realpath(dirname(__FILE__).'/..'));
define('SF_APP',         'frontend');
define('SF_ENVIRONMENT', 'dev');
define('SF_DEBUG',       true);

require_once(SF_ROOT_DIR.DIRECTORY_SEPARATOR.'apps'.DIRECTORY_SEPARATOR.SF_APP.DIRECTORY_SEPARATOR.'config'.DIRECTORY_SEPARATOR.'config.php');

// Put the URLs of the feeds you want to fetch in an array
$urls = array(
  'http://api.flickr.com/services/feeds/photos_public.gne?format=rss',
  'http://del.icio.us/rss/popular',
  'http://feeds.feedburner.com/TechCrunch',
  'http://www.symfony-project.com/weblog/rss'
);

// Fetch the feeds
$feeds = array();
foreach($urls as $url)
{
  try
  {
    $feeds[] = sfFeedPeer::createFromWeb($url);
    echo "fetched feed ".$url."\n";
  }
  catch(Exception $e)
  {
    // Print only the error message rather than the full exception dump
    echo "error fetching feed ".$url.": ".$e->getMessage()."\n";
  }
}

// Aggregate the feeds
$aggregated_feeds = sfFeedPeer::aggregate($feeds, array('limit' => 10));

// Cache the results
$f = new sfFileCache(sfConfig::get('sf_data_dir').'/feed');
$f->set('feeds', '', serialize($aggregated_feeds));


The interesting part of the batch is the use of the sfFeed2 plugin classes, made simple by the sfFeedPeer utility methods:

  • sfFeedPeer::createFromWeb() takes a URL as parameter, makes a request to this URL, decodes the response and populates an sfFeed object accordingly. It relies on the sfWebBrowser plugin for the HTTP request. It can recognize feeds of various formats (Atom 1.0, RSS 0.92, RSS 1.0, RSS 2.0).

  • sfFeedPeer::aggregate() takes an array of sfFeed objects and returns a single feed, in which all feed items are aggregated and ordered chronologically. The second parameter is an array of options, which I use here to limit the number of items present in the resulting feed.
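To make the aggregation step more concrete, here is a minimal sketch of the idea in plain PHP. It is not the plugin's implementation: aggregate_items() and compare_by_date_desc() are hypothetical helpers, feeds are represented as plain arrays rather than sfFeed objects, and the newest-first ordering is an assumption of the sketch.

```php
<?php

// Hypothetical comparison callback: newest items first (assumed ordering)
function compare_by_date_desc($a, $b)
{
  return $b['pubDate'] - $a['pubDate'];
}

// Hypothetical stand-in for the aggregation step: each feed is a plain array
// of items, and each item is an array with 'title' and 'pubDate' keys.
function aggregate_items(array $feeds, array $options = array())
{
  // Flatten the items of every feed into a single list
  $items = array();
  foreach ($feeds as $feed)
  {
    foreach ($feed as $item)
    {
      $items[] = $item;
    }
  }

  // Order the merged list by publication date
  usort($items, 'compare_by_date_desc');

  // Apply the optional 'limit' option
  if (isset($options['limit']))
  {
    $items = array_slice($items, 0, $options['limit']);
  }

  return $items;
}

// Made-up sample data standing in for two fetched feeds
$flickr = array(
  array('title' => 'Photo A', 'pubDate' => strtotime('2007-03-01 10:00:00 UTC')),
  array('title' => 'Photo B', 'pubDate' => strtotime('2007-03-03 10:00:00 UTC')),
);
$weblog = array(
  array('title' => 'Post C', 'pubDate' => strtotime('2007-03-02 10:00:00 UTC')),
);

$latest = aggregate_items(array($flickr, $weblog), array('limit' => 2));
foreach ($latest as $item)
{
  echo $item['title']."\n";
}
```

With the sample data above, the two most recent items ("Photo B", then "Post C") are kept and the oldest one is dropped by the limit.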

Then I serialize the sfFeed object containing the aggregated items and store it on disk (under the data/ directory, to make it environment-independent) using the sfFileCache class.
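The store-and-reload cycle can be sketched with nothing but PHP's built-in functions. This is only an illustration under stated assumptions: the temporary directory, the file name and the array standing in for the aggregated feed are made up for the example, and the sketch does not reproduce the file layout that sfFileCache actually writes.

```php
<?php

// Hypothetical cache location for this sketch (the post uses data/feed/)
$cacheDir  = sys_get_temp_dir().DIRECTORY_SEPARATOR.'feed_cache_sketch';
$cacheFile = $cacheDir.DIRECTORY_SEPARATOR.'feeds.cache';

if (!is_dir($cacheDir))
{
  mkdir($cacheDir, 0777, true);
}

// Made-up stand-in for the aggregated feed
$aggregated = array(
  'title' => 'Aggregated feed',
  'items' => array(
    array('title' => 'Photo B', 'link' => 'http://example.com/b'),
    array('title' => 'Post C',  'link' => 'http://example.com/c'),
  ),
);

// Batch side: serialize the data and write it in one locked operation
file_put_contents($cacheFile, serialize($aggregated), LOCK_EX);

// Page side: read the file back and rebuild the original structure
$restored = unserialize(file_get_contents($cacheFile));

echo count($restored['items'])." items restored\n";
```

Because the batch writes the file and the page only reads it, a page request never has to wait for a remote server.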

I execute the batch once to test it and to generate the first version of the data/feed/feeds.cache file; as it needs to run periodically (here, every day at 1:30 AM), I also add the following command to my crontab:

30 1 * * * cd /path/to/my/project && php batch/fetch_feeds.php

Displaying a feed

That's it for the first part. Now, what happens when a user makes a request to my application for the page showing the aggregated feeds? If this action is called feed/show, it can look like this:

public function executeShow()
{
  $f = new sfFileCache(sfConfig::get('sf_data_dir').'/feed');
  $this->feed = unserialize($f->get('feeds', '', true));
}


The last thing I'll do is to display the details of each item, in feed/templates/showSuccess.php:

<?php use_helper('Text', 'Date') ?>
<?php foreach($feed->getItems() as $item): ?>
<div class="post">
  <h2><?php echo link_to(truncate_text(strip_tags($item->getTitle()), 40), $item->getLink()) ?></h2>
    Posted on <?php echo format_date($item->getPubDate(), "EEEE d MMMM 'at' h:mm a") ?>
    by <?php echo link_to($item->getFeed()->getTitle(), $item->getFeed()->getLink()) ?>
  <div class="summary"><?php echo truncate_text($item->getDescription(), 300) ?></div>
</div>
<?php endforeach; ?>


That's where I'm glad that the sfFeed and sfFeedItem classes provided by the sfFeed2 plugin have the same accessors whatever the format of the feed (Atom, RSS, etc.). It makes displaying the details of a feed item very simple.
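To see why uniform accessors matter, consider what a template would have to cope with otherwise. The following sketch uses PHP's SimpleXML extension rather than the plugin, so it says nothing about how sfFeed2 works internally: parse_feed_items() is a hypothetical helper, and the two tiny feeds and their example.com links are made up. It shows that RSS 2.0 and Atom 1.0 store the same information in different places, and that normalizing it once lets the display code ignore the format.

```php
<?php

// Hypothetical helper, not part of sfFeed2: turn RSS 2.0 or Atom 1.0 markup
// into a list of items that all expose the same 'title' and 'link' keys.
function parse_feed_items($xml)
{
  $doc   = simplexml_load_string($xml);
  $items = array();

  if ($doc->getName() == 'rss')
  {
    // RSS 2.0: items live under <channel>, the link is the element's text
    foreach ($doc->channel->item as $item)
    {
      $items[] = array(
        'title' => (string) $item->title,
        'link'  => (string) $item->link,
      );
    }
  }
  else if ($doc->getName() == 'feed')
  {
    // Atom 1.0: entries live under <feed>, the link is an href attribute
    foreach ($doc->entry as $entry)
    {
      $items[] = array(
        'title' => (string) $entry->title,
        'link'  => (string) $entry->link['href'],
      );
    }
  }

  return $items;
}

// Made-up sample feeds
$rss = '<rss version="2.0"><channel><title>Weblog</title>'
      .'<item><title>Hello RSS</title><link>http://example.com/rss-item</link></item>'
      .'</channel></rss>';

$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Photos</title>'
       .'<entry><title>Hello Atom</title><link href="http://example.com/atom-entry"/></entry>'
       .'</feed>';

// The display loop is identical for both formats
foreach (array($rss, $atom) as $xml)
{
  foreach (parse_feed_items($xml) as $item)
  {
    echo $item['title'].' -> '.$item['link']."\n";
  }
}
```

The format-specific branching happens once, at parsing time, which is the same division of labor that keeps the template above free of any RSS or Atom special cases.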

If you want to see the result, check the "outside" columns of the symfony community page.

Let the old ones die and attend their funeral

The web is overburdened with old sites, visited by nobody and victims of the pride of their creators, who don't want to let them go. "It costs nothing", they say, "and someone may want to read my opinion on carrot soup someday".

What about the old ones in real life

In real life, the old ones are visited regularly by the members of their family, so that they don't get forgotten. It's like a child's duty to pay a visit every once in a while to grandparents, old uncles and sick elderly aunt Tatiana. Until they die, and then you go to the funeral, gather with the nearest and dearest, cry a little, drink a lot, and start something else.

The same applies to the web

To avoid overpopulation, the web should follow the example given by family traditions. Any website forgotten for more than, say, a year, should be declared sick, and its creators/members/users/trackbacks should be told about the situation. They could all meet in the website's backyard on Sundays, to talk about the good old times when the website was still active, and about the Superbowl. That way, the website's visit figures, although low, would stay at an acceptable level.

Then the web doctor would visit the old website and check its health. He would advise against useless attempts at rejuvenation, give a few coins to the host so that the website doesn't get kicked out, check that it still looks acceptable on modern browsers... Web doctor could be a nice profession, and if it's like in real life, it would pay well.

Web funerals

Anyway. After numerous years of brave resistance by the patient, the web doctor could declare it dead. He would call the relatives and ask them to organize the ceremony. A chat would be organized in a #funeral IRC channel, people would overdress and exchange memories of the website, only to say that it was better in the old times, and that nowadays everything gets corrupted. The creator would choose a picture of the website, and put it on a virtual grave at graveyard.com. Then the host would be told to wipe out all data, the search indexes would be told to do the same, and the website would only live on in our memories, for the best.

How would that change the web?

I see numerous advantages to the old websites dying.

  • First of all, it leaves room for the youth. Yeah, altavista.com was a kick-ass site, but it's time to start using a real search engine.

  • Also, it forces us to keep the knowledge of the past, and is a good way to avoid repeating the same mistakes. The numerous web 2.0 sites coming out every day have a strange aftertaste of the 00s Internet bubble (unreasonable cash burn rate, ridiculous business potential).

  • New sites respect the older ones, and don't try to show off too much, even if they do better.

  • The growth rate of the web... well, this will probably not get better.

  • Your name can't be found related to some old story anymore. You know, when you wrote in a webmaster forum that JavaScript is crap.

  • Has-been website creators move on to something else (at last).

  • Information gathered from websites isn't outdated in fifty percent of the cases.

  • Hard disk manufacturers don't get rich so fast.

And most of all, my searches in Google would return relevant results.