Is PicLens Malware?

I recently installed the PicLens Firefox extension. It is an incredibly useful way to browse image collections: the interface is responsive and well thought out, and its integration into existing websites is unobtrusive enough to win me over.

Then, while monitoring requests to an application I am developing on my local server, I noticed that each page request produced two requests on the web server (in addition to the requests for web assets such as JavaScript, CSS and image files). After investigating, I realized that the PicLens extension had detected a <link> tag in the page content and automatically fetched the RSS feed that the tag pointed to. It does so every time it detects an application/rss+xml link.

I ran the same test on pages that link to more than one RSS feed (try php.net, for instance) and saw the same behavior, only on a larger scale. So PicLens basically does what Google Web Accelerator does: it prefetches web resources (in this case, RSS feeds) to speed up navigation.
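To make the trigger concrete, here is a minimal sketch in plain PHP that lists the feed URLs a page advertises through <link> tags of type application/rss+xml. It is only an illustration of which tags match, not PicLens code; the function name findRssLinks() and the sample markup are hypothetical, and the sketch assumes PHP 7 or later with the DOM extension enabled.

```php
<?php
// Hypothetical helper: return the href of every <link> tag whose type is
// application/rss+xml. An illustration only, not how PicLens is implemented.
function findRssLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('link') as $link) {
        $type = strtolower(trim($link->getAttribute('type')));
        $href = $link->getAttribute('href');
        if ($type === 'application/rss+xml' && $href !== '') {
            $urls[] = $href;
        }
    }

    return $urls;
}

// Sample page (made up for this example) with two RSS links among other tags.
$page = '<html><head><title>Demo</title>'
      . '<link rel="alternate" type="application/rss+xml" href="/news.rss">'
      . '<link rel="alternate" type="application/atom+xml" href="/news.atom">'
      . '<link rel="stylesheet" type="text/css" href="/main.css">'
      . '<link rel="alternate" type="application/rss+xml" href="/comments.rss">'
      . '</head><body></body></html>';

print_r(findRssLinks($page));
```

Every URL in the returned array is one that an extension behaving as described above would request on each page view, which is why a page with several feeds multiplies the extra traffic.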

I emailed PicLens support about the issue, and here is their response:

Hi Francois,

Thank you so much for your kind words and for using PicLens! We really appreciate you taking the time to send us your thoughts.

I’m sorry to hear you are worried about PicLens’s prefetching behavior. We prefetch all tags that have a content type of “application/rss+xml” because we use that to match up mediarss feeds with items on the page. It’s not a bug at all, nor have we heard of it causing any problems for anyone. Is there a specific reason you feel that it jeopardizes websites?

Hope to hear back from you soon.

All the best, Meg & The PicLens Team

I can think of many reasons why link prefetching is bad, among them skewed statistics, wasted bandwidth and extra server load. But maybe I’m being too extreme on this one. What do you think? Is prefetching an acceptable practice nowadays? Or is the PicLens extension something that should not be installed?

Build your own feed aggregator with symfony

With the help of the sfFeed2 plugin and the sfWebBrowser plugin, symfony makes the creation of a feed aggregator a breeze. Let's see what it takes to create the core of a Google Reader-like application.

Fetching feeds

First of all, you have to fetch feeds from the Internet. It is strongly recommended to fetch them asynchronously, i.e. not at the moment the user requests the page showing the aggregated feeds. There are two obvious reasons to avoid a synchronous process:

  • The remote servers providing the feeds would receive one request for every request made to your server. That's a nasty trick to play on other service providers, and it can skew the remote server's statistics.

  • If you have to fetch a dozen URLs per request, the response time might exceed the server timeout.

So you have to fetch the feeds, store them somewhere (in the filesystem or in a database), and keep them for later. I choose to store them on disk, which gives me an opportunity to use the sfFileCache class. Here is the code that I write in a batch script:

<?php

define('SF_ROOT_DIR',    realpath(dirname(__FILE__).'/..'));
define('SF_APP',         'frontend');
define('SF_ENVIRONMENT', 'dev');
define('SF_DEBUG',       true);

require_once(SF_ROOT_DIR.DIRECTORY_SEPARATOR.'apps'.DIRECTORY_SEPARATOR.SF_APP.DIRECTORY_SEPARATOR.'config'.DIRECTORY_SEPARATOR.'config.php');

// Put the URLs of the feeds you want to fetch in an array
$urls = array(
  'http://api.flickr.com/services/feeds/photos_public.gne?format=rss',
  'http://del.icio.us/rss/popular',
  'http://feeds.feedburner.com/TechCrunch',
  'http://www.symfony-project.com/weblog/rss'
);

// Fetch the feeds
$feeds = array();
foreach($urls as $url)
{
  try
  {
    $feeds[] = sfFeedPeer::createFromWeb($url);
    echo "fetched feed ".$url."\n";
  }
  catch(Exception $e)
  {
    // Print only the error message rather than the full exception dump
    echo "error fetching feed ".$url.": ".$e->getMessage()."\n";
  }
}

// Aggregate the feeds
$aggregated_feeds = sfFeedPeer::aggregate($feeds, array('limit' => 10));

// Cache the results
$f = new sfFileCache(sfConfig::get('sf_data_dir').'/feed');
$f->set('feeds', '', serialize($aggregated_feeds));


The interesting part of the batch is the use of the sfFeed2 plugin classes, made simple by the sfFeedPeer utility methods:

  • sfFeedPeer::createFromWeb() takes a URL as a parameter, makes a request to this URL, decodes the response and populates an sfFeed object accordingly. It relies on the sfWebBrowser plugin for the HTTP request. It can recognize feeds in various formats (Atom 1.0, RSS 0.92, RSS 1.0, RSS 2.0).

  • sfFeedPeer::aggregate() takes an array of sfFeed objects and returns a single feed, in which all feed items are aggregated and ordered chronologically. The second parameter is an array of options, which I use here to limit the number of items in the resulting feed.
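To illustrate what aggregation amounts to, here is a minimal sketch in plain PHP that merges the items of several feeds, sorts them by date and keeps a limited number. It is not the plugin's implementation: the aggregateItems() function and the array-based item structure ('title' and 'pubDate' keys) are hypothetical, and the newest-first ordering is an assumption made for the example. It assumes PHP 7 or later.

```php
<?php
// Hypothetical stand-in for feed aggregation. Each feed is an array of items,
// and each item is an array with a 'title' and a 'pubDate' (Unix timestamp).
function aggregateItems(array $feeds, int $limit): array
{
    // Flatten the items of every feed into a single list.
    $items = array();
    foreach ($feeds as $feedItems) {
        foreach ($feedItems as $item) {
            $items[] = $item;
        }
    }

    // Assumption: sort newest first, so the limit keeps the latest items.
    usort($items, function (array $a, array $b) {
        return $b['pubDate'] <=> $a['pubDate'];
    });

    // Keep at most $limit items.
    return array_slice($items, 0, $limit);
}

// Made-up sample data: two feeds with interleaved publication dates.
$feedA = array(
    array('title' => 'A1', 'pubDate' => 300),
    array('title' => 'A2', 'pubDate' => 100),
);
$feedB = array(
    array('title' => 'B1', 'pubDate' => 200),
);

foreach (aggregateItems(array($feedA, $feedB), 2) as $item) {
    echo $item['title'], "\n";
}
```

With a limit of 2, only the two most recent items across both feeds survive, which is the effect the 'limit' option has in the batch script.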

Then I serialize the sfFeed object containing the aggregated items and store it on disk (under the data/ directory, to make it environment-independent) using the sfFileCache class.
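For readers without symfony at hand, the serialize-and-store step can be sketched in plain PHP. This is not how sfFileCache works internally: writeCache() and readCache() are hypothetical helpers, the temporary directory is chosen only for the demonstration, and the write-then-rename step is my own addition so that a reader never sees a half-written file. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: serialize $data and store it in $file.
function writeCache(string $file, $data): void
{
    $dir = dirname($file);
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    // Write to a temporary file first, then rename it into place.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, serialize($data));
    rename($tmp, $file);
}

// Hypothetical helper: return the cached data, or null if there is none.
// Caveat: a cached boolean false cannot be told apart from a read failure.
function readCache(string $file)
{
    if (!is_file($file)) {
        return null;
    }
    $data = unserialize(file_get_contents($file));

    return $data === false ? null : $data;
}

// Demonstration with made-up data in the system temporary directory.
$file = sys_get_temp_dir() . '/feed_cache_demo/feeds.cache';
$aggregated = array('title' => 'Aggregated feed', 'items' => array('first', 'second'));

writeCache($file, $aggregated);
$feed = readCache($file);
echo $feed['title'], "\n";
```

Returning null when the file is missing gives the reading side a simple way to show an empty page instead of failing when the batch has not run yet.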

I execute the batch once to test it and to generate the first version of the data/feed/feeds.cache file; as it needs to run periodically, I also add the following command to my crontab:

30 1 * * * cd /path/to/my/project && php batch/fetch_feeds.php

Displaying a feed

That's it for the first part. Now, what happens when a user requests the page showing the aggregated feeds? If this action is called feed/show, it can look like this:

public function executeShow()
{
  $f = new sfFileCache(sfConfig::get('sf_data_dir').'/feed');
  $this->feed = unserialize($f->get('feeds', '', true));
}


The last thing I'll do is to display the details of each item, in feed/templates/showSuccess.php:

<?php use_helper('Text', 'Date') ?>
<?php foreach($feed->getItems() as $item): ?>
<div class="post">
  <h2><?php echo link_to(truncate_text(strip_tags($item->getTitle()), 40), $item->getLink()) ?></h2>
    Posted on <?php echo format_date($item->getPubDate(), "EEEE d MMMM 'at' h:ma ") ?>
    by <?php echo link_to($item->getFeed()->getTitle(), $item->getFeed()->getLink()) ?>
  <div class="summary"><?php echo truncate_text($item->getDescription(), 300) ?></div>
</div>
<?php endforeach; ?>


That's where I'm glad that the sfFeed and sfFeedItem classes provided by the sfFeed2 plugin have the same accessors whatever the format of the feed (Atom, RSS, etc.). It makes displaying a feed item's details very simple.

If you want to see the result, check the "outside" columns of the symfony community page.