Archive for September, 2008

Chapter 10 - Forms

Dealing with the display of form inputs, the validation of a form submission, and all the particular cases of forms is one of the most complex tasks in web development. Luckily, symfony provides a simple interface to a very powerful form sub-framework, and helps you to design and handle forms of any level of complexity in just a few lines of code.

NOTICE: This document is the first draft of a methodology experiment explained earlier in this blog. It documents the sfForm framework found in symfony 1.1, but with some changes in the API and usage. As such, it describes a library that is not yet written (like that) and cannot be used to learn the usage of the current sfForm implementation. It is quite long, so you might prefer to download the Markdown version and read it offline. Being a first draft, this document is a call for comments, both about its structure and its content. And if you are interested in implementing the differences between what this document describes and what is currently implemented in the symfony framework, please contact me.


Sorting By Custom Column in the Symfony Admin Generator

Did you ever wish you could sort by a partial column in the admin generator? Using DbFinder and a few lines of code, it is now possible.

The symfony admin generator allows you to select which properties of a model you want to display. You can include foreign key fields, or even a partial field to display pretty much everything you want in the list view. The following example uses this ability to display the name of blog authors, based on the fact that the Blog model has a many-to-one relationship to the User model:

# in mymodule/config/generator.yml
generator:
  class:          sfPropelAdminGenerator
  param:
    model_class:  Blog
    theme:        default

    list:
      display:    [=title, user, category, _nb_posts, created_at]
      fields:
        user:     { name: Author }

This generator configuration includes a partial field that counts the number of blog posts for each blog:

// in mymodule/templates/_nb_posts.php
<?php echo $blog->countBlogPosts() ?>


The problem is that only the "true" fields (that is, the ones that correspond to a column in the main table) are sortable. The result is that, with the previous example, only the title column is sortable.

With symfony alone, there is no way to make the other columns sortable except by overriding the whole _list_th_tabular.php partial in your module, overriding the addSortCriteria() method in the action, and losing the ability to add or remove columns in the future.

Enter DbFinderPlugin. You probably know from this blog that DbFinder offers a very powerful and yet simple way to replace Propel Criteria queries. What you might not know is that the DbFinder plugin bundles a full admin generator theme. It has the exact same features and syntax as the standard symfony admin generator, but it is entirely written with DbFinder queries. And to make this generator theme very usable, it includes the batch_actions extension from symfony 1.1 (that's what displays the checkboxes on the left side of the list, so that you can perform an action on several records at a time), and the ability to sort by any type of column.

To use the DbFinder admin generator, no need to switch your entire project to DbFinder. Just install the plugin, edit the generator.yml of one of your generated modules, and change the class property from sfPropelAdminGenerator (or sfDoctrineAdminGenerator, if you use Doctrine) to DbFinderAdminGenerator. Refresh the page in your browser, and you should normally see no change. That's good news: despite the fact that all the generator code has been rewritten to work with DbFinder instead of Propel, it is completely backwards compatible.

And once a generated module uses DbFinder, you gain access to the new sort_method option for custom fields:

# in mymodule/config/generator.yml
generator:
  class:          DbFinderAdminGenerator
  param:
    model_class:  Blog
    theme:        default

    list:
      display:      [=title, user, category, _nb_posts, created_at]
      fields:
        user:       { name: Author, sort_method: orderByUsername }
        category:   { sort_method: orderByCategory }
        nb_posts:   { sort_method: orderByNbPosts }

Refresh the list view, and voila, the column headers are now clickable.

Don't click the new links yet: you've defined three methods for custom ordering, and you still have to write them. To do so, you need to create a BlogFinder, which is a finder class specific to the Blog model class. So create a lib/model/BlogFinder.php class with the following content:

// in lib/model/BlogFinder.php
class BlogFinder extends DbFinder
{
  protected $class = 'Blog';
 
  public function orderByUsername($order = 'asc')
  {
    return $this->orderBy('User.Name', $order);
  }

  public function orderByCategory($order = 'asc')
  {
    return $this->orderBy('Category.Name', $order);
  }
 
  public function orderByNbPosts($order = 'asc')
  {
    return $this->
      leftJoin('BlogPost')->
      groupBy('Blog.Id')->
      withColumn('COUNT(BlogPost.Id)', 'nbPosts')->
      orderBy('nbPosts', $order);
  }
}


The finder is smart enough to guess the relationship between the Blog and the User model, as well as the relationship with the Category model, because the YAML schema defines foreign keys between the related tables.

Clear the cache (to allow the autoloading to find the new finder class), refresh your list, and enjoy fully sortable columns.

To finish, here is a small trick to drastically improve your backend performance. Every time the _nb_posts partial is called (and that's once per row in the list), symfony issues a COUNT query. That means that the current configuration will run n+1 queries, n being the number of results per page (typically 20). That's pretty bad for performance. What if you could hydrate an additional column in the main query and use this column in the _nb_posts partial? With DbFinder, that's very easy. Just add a finder_methods setting to your list configuration, as follows:

# in mymodule/config/generator.yml
list:
  display:        [=title, user, category, _nb_posts, created_at]
  fields: 
    user:         { name: Author, sort_method: orderByUsername }
    category:     { sort_method: orderByCategory }
    nb_posts:     { sort_method: orderByNbPosts }
  finder_methods: [withNbPosts]

Symfony executes all the methods defined in finder_methods before displaying the list. It allows you to define a default ordering, to filter out some records, or, like here, to add a custom column to the main query.
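To picture what "executing the finder methods" means, here is a minimal sketch in plain PHP. It does not reproduce the actual generator code: ToyBlogFinder and applyFinderMethods() are hypothetical names made up for the illustration, and the toy finder only records the calls it receives.

```php
<?php
// Hypothetical stand-in for a finder: it only records the calls it receives.
class ToyBlogFinder
{
  public $calls = array();

  public function withNbPosts()
  {
    $this->calls[] = 'withNbPosts';
    return $this; // fluent interface: each method returns the finder
  }

  public function orderByNbPosts($order = 'asc')
  {
    $this->calls[] = 'orderByNbPosts ' . $order;
    return $this;
  }
}

// Hypothetical helper: apply each configured method name to the finder, in order.
function applyFinderMethods($finder, array $methodNames)
{
  foreach ($methodNames as $methodName)
  {
    if (!method_exists($finder, $methodName))
    {
      throw new InvalidArgumentException("Unknown finder method: $methodName");
    }
    $finder = $finder->$methodName();
  }

  return $finder;
}

// The configured methods run first, the sort method comes last.
$finder = applyFinderMethods(new ToyBlogFinder(), array('withNbPosts'));
$finder->orderByNbPosts('desc');
print_r($finder->calls);
```

The point of the sketch is only the ordering: whatever is listed in finder_methods is applied to the query before the sort method chosen by the user.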

Now it's time to create this BlogFinder::withNbPosts() method. Since it contains part of the code of orderByNbPosts(), and since the generator executes sort methods at the end of the action, you can reduce the orderByNbPosts() code accordingly:

// in lib/model/BlogFinder.php
public function withNbPosts()
{
  return $this->
    leftJoin('BlogPost')->
    groupBy('Blog.Id')->
    withColumn('COUNT(BlogPost.Id)', 'nbPosts');
}

public function orderByNbPosts($order = 'asc')
{
  return $this->orderBy('nbPosts', $order);
}


Now the main list query includes the calculated nbPosts column, and you can change the _nb_posts partial to use it:

// in mymodule/templates/_nb_posts.php
<?php echo $blog->getColumn('nbPosts') ?>


Refresh the list view: Ta-da, the result is the same, but using a single query instead of n+1.
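If you want to see the difference outside of symfony, the same idea can be reproduced in plain SQL. The following sketch uses PDO with an in-memory SQLite database; the table and column names are assumptions made for the example, not the real Propel schema, and the SQL is not necessarily what DbFinder generates.

```php
<?php
// Sketch only: table and column names are assumptions made for the example.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT)');
$db->exec('CREATE TABLE blog_post (id INTEGER PRIMARY KEY, blog_id INTEGER)');
$db->exec("INSERT INTO blog (id, title) VALUES (1, 'PHP'), (2, 'Ruby'), (3, 'Empty')");
$db->exec('INSERT INTO blog_post (blog_id) VALUES (1), (1), (2)');

// n+1 approach: one query for the list, then one COUNT query per row.
$queries = 1;
$blogs = $db->query('SELECT id, title FROM blog ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
$countStatement = $db->prepare('SELECT COUNT(*) FROM blog_post WHERE blog_id = ?');
$slow = array();
foreach ($blogs as $blog)
{
  $countStatement->execute(array($blog['id']));
  $slow[$blog['title']] = (int) $countStatement->fetchColumn();
  $countStatement->closeCursor();
  $queries++;
}

// Single query: the count is computed with a LEFT JOIN and a GROUP BY.
$rows = $db->query(
  'SELECT blog.title AS title, COUNT(blog_post.id) AS nb_posts
     FROM blog
     LEFT JOIN blog_post ON blog_post.blog_id = blog.id
    GROUP BY blog.id
    ORDER BY blog.id'
)->fetchAll(PDO::FETCH_ASSOC);
$fast = array();
foreach ($rows as $row)
{
  $fast[$row['title']] = (int) $row['nb_posts'];
}

printf("%d queries the slow way, 1 query the fast way\n", $queries);
```

The LEFT JOIN matters: with an inner join, blogs without any post would disappear from the list instead of showing a count of zero.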

So the DbFinder generator offers the same features as the current symfony 1.1 generator, and more. Don't wait until you upgrade your project to symfony 1.2 to enhance your generated modules. Read the DbFinder admin generator documentation, and download the plugin right away.

Document-Driven Development in Practice: Rethinking sfForms

If you've watched or read my presentation on Documentation-Driven Development, you may wonder how to put that new methodology into action. A practical example is often better than a long explanation, so let's see how to apply it to the new Forms sub-framework introduced by symfony 1.1.

Not DDD

In order to use the new sfForm library, you must either read a book (not yet completely written) or dive into the source code and guess how to use it. To my mind, this is pretty much the opposite of what leads to wide adoption.

The Form framework was designed with power in mind, and reaches this goal very well: you can use it to create forms of any level of complexity, including forms embedding other forms, forms with a variable number of fields, forms split into several steps ("wizards"), etc. It is very much object oriented, so everything can be reused or overridden.

But unfortunately, in order to create a simple form, you need to learn a lot more and write a lot more code than what you used to do in symfony 1.0. The current Forms documentation describes the API and justifies its implementation. It goes very much into the details of each part of the sub-framework, and quite early in the learning process. The result - for me, at least - is that the reader feels overwhelmed by the huge number of classes, features and options, and dismisses the whole sub-framework for being too complex.

"Let's use that new Form stuff for complex forms and keep the current form helpers and YAML validation for everyday forms", I hear. That's a pity, because once you understand how the new Forms sub-framework works and accept its verbosity, there is no good reason to stick with the old system.

An Ideal sfForm Documentation

I think that a piece of documentation is missing. This piece is probably an introduction to the Form sub-framework.

In symfony 1.0, a single chapter of the book was enough to master forms for most use cases. Even if the new form sub-framework is more powerful than the 1.0 one, it should not be more complicated to learn and use in similar cases. So the sfForms introduction should be short, requiring at most one hour to read it.

After reading this documentation, an average developer should be able to use sfForms in 80% of the cases. That includes at least all the features described in the original Forms chapter of the symfony book:

  • Displaying a form
  • Available form helpers
  • Displaying a model-based form
  • Dealing with Foreign keys
  • Handling a form submission
  • Validating a form
  • Available validators
  • Repopulating a form
  • Complex use cases

The target audience would be people who know some symfony concepts, but not yet everything. In fact, they should know what Chapters 1 to 9 of the symfony guide cover, not more. So some advanced concepts should probably be skipped, or explained only after the fundamental usage is clear.

This introduction should not require additional lookup in the Forms book. That means that it should be self-sufficient. It probably also means not including the justifications of the Forms implementation that you can find in the current Forms book. The reasons why the API was designed the way it is should become obvious at the end of the introduction. Expert customization and rare use cases should probably also be left aside.

The symfony 1.0 documentation introduces concepts and features in a certain order, with a precise purpose: not loading too much information into the reader's mind at a time. In a similar fashion, the forms introduction should be a linear piece of documentation, not a set of articles that you can read in any order with hyperlinks everywhere to break the reading flow.

The forms framework is powerful, but the current form book somehow translates that into length, and verbosity. On the contrary, I think the reader should feel exalted: the documentation should put him in a rush to start using the new forms. So the forms introduction should "tell a story", and gently lead the reader to a point where he feels he can grab the steering wheel and drive the car by himself.

API enhancements

The problem is that explaining the current API takes much longer than a single piece of documentation. That's because of the many options available, because of the many objects to learn, and because even the simplest things (like a list of form controls) look complicated (sfWidgetFormSchema).

There is not much choice to overcome this problem. In order to write a short and readable guide to the forms sub-framework, its API must be adapted. That's right, the API must be changed so that the documentation can be made shorter, and more usable. This is one of the principles of the Documentation-Driven Development methodology.

These API enhancements should be completely backward compatible, so that any existing application using the current sfForms implementation can continue to work seamlessly with the modified implementation. In a way, that qualifies the API enhancements as a simplicity layer on top of the existing code. As a side note, the current Forms book still remains indispensable for advanced usage.

Note that the API enhancements don't need to be implemented before the new documentation is published. The implementation comes second, after the documentation. That's another of the DDD principles: explain first, make it work afterwards. After all, project managers write requirements for web applications before they exist, all the time.

Do As I Do, Not As I Say

Some people are getting sick of reading me criticizing parts of the symfony framework. Well, I'm not criticizing: I'm actively improving.

Rethinking sfForms is a good example of Documentation-Driven Development. To illustrate this methodology, I'm going to rewrite Chapter 10 of the symfony book for symfony 1.1. That's right, the current Chapter 10, which describes the "old way" of doing forms, can be rewritten in a similar fashion and serve for symfony 1.1.

But since the current API requires too much explanation to be used, I'm going to introduce the necessary API changes to the sfForms library. I'll create and manage forms in a way slightly different from what the current API allows, to make it simpler to use - and to explain.

When the new Chapter 10 is published here in this very blog, this piece of documentation will be of no use since the features it describes won't be implemented yet. But I know that writing documentation is not enough to convince people (yet), so I will implement the API changes as a second step to the exercise. As I'm not a very good developer, any help will be welcome during that phase (contact me if you want to give me a hand after the documentation is published).

If everything goes well, the implementation of the API changes will be released as a symfony plugin - maybe called sfSimpleForms. I hope it can lead more developers to adopt the greatest open-source Forms framework around.

Designing a CMS Architecture

When faced with the choice between an off-the-shelf CMS and a custom development, many companies pick solutions like ezPublish or Drupal. In addition to being free, these CMS seem to fulfill all possible requirements. But while choosing an open-source solution is a great idea, going for a full-featured CMS may prove more expensive than designing and developing your own Content Management System.

Hidden Costs

What does it cost to integrate and deploy a website based on an open-source CMS? At first sight, not much. As for every CMS, you have to design your own templates and fill your website with initial data. But there are additional costs that pop up as soon as you need a little more than just plain content management.

Think about adding a blog or a forum to a website managed by a CMS. There are modules or plugins for that, but they never provide the same flexibility as plain blogging engines such as WordPress, or plain forum engines like phpBB. So even if the basic requirement is fulfilled by a module, you will always need - always - to adapt its code.

And this is where it gets ugly. The code base of open source CMS engines and their plugins is nowhere near as good as what you can see in RAD frameworks these days. Most of them are based on a very old architecture (PHP4, no object orientation, no proper error handling, direct access to the database, etc.). That means that changing something will be very painful, and very expensive. You will encounter numerous bugs. You will change the blogging plugin three times because none of the ones you tested is capable of doing what you need. You will upgrade your CMS to the latest version to benefit from that single bug fix that should save your life, but then you will need to change all your existing configuration...

This is as bad as it sounds. Start changing one single line of code in an application built on top of Drupal or ezPublish, to name only the two major ones, and you are in trouble. The moment you need something that is not natively supported, you enter the Dark Zone of CMS hell. You are going to spend a lot of money on development. You will never see the end of the tunnel. That is, until someone says, a few years from now, "Do we need all that crap? Let's build something that fits our needs and that actually works".

Making Your Own CMS

Given the number of available open-source CMS solutions, building one on your own sounds like a stupid idea. But if your website is 50% content management and 50% something else, you probably need to start with a web application framework like symfony or Django, rather than a CMS. These frameworks provide plugins that do part of the Content Management job already, so creating a CMS today is like assembling Lego bricks to build something that exactly fits your needs.

Take symfony, for instance. It provides native support, or support through plugins, for:

Symfony doesn't yet provide an Access Control List or a Workflow plugin, but you can already put all of the above together and have a pretty powerful CMS engine.

A tailor-made CMS will always have less code and show better performance than any of the existing full-featured solutions. Also, you will be able to tweak it completely, since all the components are decoupled, and built with extensibility in mind.

Your custom CMS will cost you more during the first year, but if you expect your website(s) to live longer than that, then the benefit will become obvious after a year and a half. Plugging the CMS features into other parts of the website, adding features unrelated to content management, scaling to a larger audience, replacing the database engine or the caching backend, all that will be painless.

That is, if you design your custom CMS carefully, and with the future in mind.

Environments

When you add features to an application, you need a testing environment - a place where you can check that the additions work and don't kill the rest of the application. That means that developers have a version of the website on their desktop computer, where they change stuff. Then, they upload the application to a test server, check that everything is OK, and only then can they deploy the application to the production server. This is a very common practice, often backed up by source version control and continuous integration tools.

But what happens when a new feature is not made of code, but of data? In ezPublish, for instance, in order to define a new type of content (they call it a "Class"), you have to use the backend web interface and fill in a few forms. The properties of the new type of content are stored in the database. In order to deploy this new type of content from the testing environment to the production environment, the developers need to transfer data from one database to another - without wiping out unrelated information in the production database, such as user comments, statistics, etc.

Deploying new features in this context means executing some SQL code on each server. This is much more dangerous than just pushing a new version of the codebase, especially when the data model is made of many tables glued together in complex joins. That's why, in many websites based on ezPublish, developers add features directly on the production environment, or repeat the configuration using the backend interface on every environment. This is either a high risk or a large waste of time.

Data, or Code?

This environment drawback tends to be a major influence over the choice of features a CMS should provide. For almost every CMS feature, you should wonder: Can the user do that through the backend interface, or do we need a programmer to add a new element? In other words, is the feature made of data, or code?

Off-the-shelf CMS engines will almost always answer 'Data'. My personal opinion is that it is wrong in many cases. Content types are just one example, but think about workflows or page layouts for instance. They define a complex logic that always translates to code, and giving the user the ability to change them via a backend interface means storing code in the database and evaluating it at runtime. Then you can't use op-code cache engines like APC to increase your website performance. And deploying that to production is a nightmare.

Some companies think that most of the CMS features should be accessible via a backend interface in order to be able to enhance the application without additional developments. But this is an illusion. For one, the configuration of content classes in ezPublish is so complex that it does indeed require a PHP developer, and an expensive one, since experience with ezPublish is one of the most demanded skills in the IT market (at least in France). More features mean more development, and there is no CMS out there that replaces the power of a programming language with a web interface.

So that leads to one good rule of thumb: Design your features so that they can be made of code rather than data. That applies to elements that can be modified by a graphical user interface, or programmatically:

  • Content classes
  • "Widgets" or "Components" for pages
  • Page layouts or "templates"
  • Content validation workflow
  • Tasks
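As an illustration of the "code rather than data" rule, here is a minimal sketch of a content class described in PHP instead of database rows. ContentType and RecipeContentType are hypothetical names invented for the example, not part of symfony or of any existing CMS.

```php
<?php
// Hypothetical sketch: a content class described in code, so that it is versioned
// and deployed with the rest of the application instead of living in the database.
abstract class ContentType
{
  // Returns the fields of the content type as name => type pairs.
  abstract public function getFields();

  // Returns the names of the declared fields that are absent from the given values.
  public function getMissingFields(array $values)
  {
    return array_values(array_diff(array_keys($this->getFields()), array_keys($values)));
  }
}

class RecipeContentType extends ContentType
{
  public function getFields()
  {
    return array(
      'title'            => 'string',
      'ingredients'      => 'text',
      'preparation_time' => 'integer',
    );
  }
}

$recipe = new RecipeContentType();
$missing = $recipe->getMissingFields(array('title' => 'Ratatouille'));
print_r($missing);
```

Adding a field to recipes then becomes a code change that goes through version control, the test server and the usual deployment, with no SQL to replay on each environment.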

Fundamental questions

The complexity of a CMS engine depends greatly on the answer you give to a few fundamental questions:

  • Can contents exist independently of a page?
  • Can contents exist at more than one place in the website?
  • Are there several views for a single piece of content?
  • Can contents have different versions simultaneously?
  • Can contents be modified in the backend and remain unchanged in the frontend?
  • Can users compose a page with "widgets" or "components" in a WYSIWYG interface?
  • Can predefined zones in a template contain more than one "widget" or "component"?
  • Can section pages have different templates?
  • Can section pages have different versions simultaneously?
  • Can users program the publishing of a section page, or of contents, in advance?
  • Can the CMS remember previous URLs for a piece of content whose title changed?

If the answer to the first question is no, then the concepts of "page" and "content" coincide. You probably don't need to develop anything, since your CMS will be quite simple.

If you answer yes to all these questions, then the CMS might take three times longer to develop than what it would be otherwise.

That's why the idea of a tailor-made CMS is not that stupid. No existing CMS will be able to answer these questions in every possible way. But designing your own relational schema based on the answer to these questions makes sense, economically speaking. Don't make it complex if you don't need to, or, to put it otherwise, Keep It Simple, Stupid.

Bootstrapping the reflection

Now that you're trying to imagine what you actually need for your own CMS, here is a glimpse of the kind of technical challenge you will face all the time.

The question revolves around the concept of content types. In a CMS, you mostly deal with "articles". This type of content has a title, an author, a summary, a body, and a few other attributes. But you probably also need to deal with some other content types, like movies, slide shows, quiz games, polls, or recipes. These content types are defined by properties distinct from those of an article. Some of them can fit in a single structure, others require several structures related to each other. For instance, quiz games require a structure for the quiz itself, one for the questions, one for the answers to each question, and one for the quiz results.

The question is: Do you store the data for all these content types in a single table, or do you create a table for each content type? The most "normalized" choice is probably to create one data structure for each. You could have an "article" table, a "recipe" table, and even a "quiz" table with foreign keys to a "quiz_question" and a "quiz_result" table. That would allow you to make queries on some specific attributes of a specific content type. You could build a custom search engine for your recipes and look for ingredients, foreign cuisine and preparation time.

But then, if each content type has its own table(s), what do you do when you have to list all the contents of a section, or worse (that happens in the backend) all the contents of the website? Does that mean that, in order to display a list of contents, you must query several tables and aggregate the results together? This solution simply doesn't scale, and a CMS built like that will become slower and slower as you add new content types.

So that probably means that you should store a reference to each content in a separate table, with a copy of the data that is generic to all content types (like title, publication date, section, etc.). Pages displaying a list of contents would use this aggregate table, while pages displaying content details would use the specific tables.

And that means that you must find a way to synchronize the specific tables and the generic tables whenever data changes in content. That's not a big deal, but it gives you an idea of the kind of complexity you will encounter in a large scale CMS.
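A minimal sketch of that synchronization, using PDO with an in-memory SQLite database, could look like the following. The table names, the column names and the saveRecipe() helper are assumptions made for the example, and INSERT OR REPLACE is SQLite syntax; another database engine would need its own "upsert" statement.

```php
<?php
// Sketch only: the table and column names are assumptions made for the example.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Specific table: everything there is to know about a recipe.
$db->exec('CREATE TABLE recipe (id INTEGER PRIMARY KEY, title TEXT, preparation_time INTEGER)');
// Generic table: one row per content, whatever its type, used by the list pages.
$db->exec('CREATE TABLE content (type TEXT, object_id INTEGER, title TEXT, PRIMARY KEY (type, object_id))');

// Hypothetical helper: write the specific row and its generic copy together.
function saveRecipe(PDO $db, $id, $title, $preparationTime)
{
  $db->beginTransaction();
  try
  {
    $db->prepare('INSERT OR REPLACE INTO recipe (id, title, preparation_time) VALUES (?, ?, ?)')
       ->execute(array($id, $title, $preparationTime));
    $db->prepare('INSERT OR REPLACE INTO content (type, object_id, title) VALUES (?, ?, ?)')
       ->execute(array('recipe', $id, $title));
    $db->commit();
  }
  catch (Exception $e)
  {
    // Never leave the two tables out of sync.
    $db->rollBack();
    throw $e;
  }
}

saveRecipe($db, 1, 'Ratatouille', 45);
saveRecipe($db, 1, 'Ratatouille nicoise', 50); // a title change reaches both tables

$listTitle = $db->query("SELECT title FROM content WHERE type = 'recipe' AND object_id = 1")->fetchColumn();
$contentCount = (int) $db->query('SELECT COUNT(*) FROM content')->fetchColumn();
echo $listTitle, "\n";
```

Wrapping both writes in a single transaction is the design choice that matters here: a list page should never show a title that the detail page no longer has.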

A Challenging Exercise

Designing a CMS is difficult and fun, and you'll probably do it more than once. Every CMS is different, because every content management need is different, and mostly because every customer wants more than just plain content management.

If you are a developer, whenever you meet a client that asks you for a Drupal integration, try to sell your knowledge of CMS architectures rather than a few hours of developer time. Raise the important questions, talk about the possible problems of using off-the-shelf solutions. If you ever used one of those before, you will have plenty of issues to talk about. Then, try to convince your customer to trust you with a custom development. Make it small at the beginning, so that the customer can start using it right away and refine their requirements incrementally.

This will be a very satisfying experience, and the client will thank you later for leading him on the right path. And this will give you a lot to talk about for the next CMS you build...

Developing for Developers: my SymfonyCamp08 Presentation

Did you attend this year's Symfony Camp? It was a great event, and a unique occasion to meet the core team of the symfony framework. I had the opportunity to give a talk there, and you can now watch it online:

Developing for Developers

Don't hesitate to comment on this presentation on SlideShare.

Thanks to all the great people I met there who gave me feedback on my work, encouragement or advice. Thanks to Dutch Open Projects for the organization - and for inviting me. It was a great pleasure to exchange ideas about symfony, its past, present, and future, with so many enthusiastic people.

Update: It seems that my slideshow has been featured on the SlideShare homepage by the SlideShare editorial team.

Validating a YAML file against a schema in PHP

As of today, there is no simple way to validate the syntax of a YAML file in PHP. But with two simple tricks, it takes only a few dozen lines of code to build a robust validator capable of checking the syntax of any YAML file against a given schema.

The problem

YAML is much easier to write and read than XML, but YAML has no schema validation capabilities. With DTD and XSD, you can check that an XML file is correctly formatted before actually using it, and it helps debugging a great deal. Modern web application frameworks like symfony encourage the use of YAML for configuration files, but the lack of a validation tool sometimes makes YAML a poor choice in a professional environment.

Such a validation tool exists in Ruby; it's called kwalify. But unless you want to spend a huge amount of time translating the 6,000+ lines of code of the library from Ruby into PHP, or to run Ruby code inside your PHP application, you're basically stuck.

First Idea

Did you just read that XML allows validation by way of XSD? Well, why not use this mechanism to validate a YAML file? After all, PHP has a great XML manipulation library, installed by default, and capable of validating any XML file against a DTD or an XSD. Actually, this mechanism is already in use in symfony, since the Propel schema.yml is transformed into an XML counterpart that has an XSD.
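As a reminder of what XSD validation looks like in PHP, here is a minimal sketch based on the DOM extension. The tiny schema and the isValidXml() helper are assumptions made for the example; they are not the schema of view.yml.

```php
<?php
// Sketch only: this tiny schema is an assumption made for the example.
$xsd = <<<'XSD'
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="metas">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="language" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical helper: returns true when the XML string is well-formed
// and valid against the given XSD string.
function isValidXml($xmlString, $xsdString)
{
  libxml_use_internal_errors(true); // collect libxml errors instead of emitting warnings
  $document = new DOMDocument();
  $isValid = $document->loadXML($xmlString) && $document->schemaValidateSource($xsdString);
  libxml_clear_errors();

  return $isValid;
}

$good = isValidXml('<metas><title>My symfony project</title></metas>', $xsd);
$bad  = isValidXml('<metas><keywords>symfony</keywords></metas>', $xsd);
var_dump($good, $bad);
```

In a real validator you would probably read the collected messages with libxml_get_errors() before clearing them, so that the user knows which line of the file is wrong.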

It is trivial to transform a YAML file into a PHP associative array. Symfony 1.1 provides a class that does exactly that, and it's called sfYaml. With a little bit of recursion and a few lines of PHP code, it is also quite easy to transform an associative array into a simple XML file.

Let's use the view.yml configuration file in symfony as an example. In a typical module, it looks like the following:

# view.yml
default:
  http_metas:
    content-type:  text/html

  metas:
    title:         My symfony project
    robots:        index, follow
    description:   This is my first symfony project
    keywords:      symfony
    language:      en

  stylesheets:     [main.css, top.css]

  javascripts:     [jquery-1.2.6.js, main.js]

  has_layout:      on
  layout:          layout

indexSuccess:
  metas:
    title:        Welcome to my site

Now what does it take to transform this YAML into a simple XML equivalent? Not much. A bit of googling shows that someone already worked on transforming an associative array into XML, and as it is not a good idea to reinvent the wheel, let's reuse this work.

// Transform YAML into XML
include 'sfYaml.class.php';
$yamlString = file_get_contents('view.yml');
$yamlArray =  sfYaml::load($yamlString);
$xmlString = ArrayToXml($yamlArray);

function ArrayToXml($data, $rootNodeName = 'root', $xml = null)
{
  // strict check: an empty SimpleXMLElement child can compare loosely equal to null
  if ($xml === null)
  {
    $xml = simplexml_load_string("<?xml version='1.0' encoding='utf-8'?><$rootNodeName />");
  }

  // loop through the data passed in.
  foreach($data as $key => $value)
  {
    // no numeric keys in our xml please!
    if (is_numeric($key))
    {
      // make string key...
      $key = "unknownNode_". (string) $key;
    }

    // strip anything that is not a letter (this also removes "_", "-" and digits)
    $key = preg_replace('/[^a-z]/i', '', $key);

    // if the value is itself an array, call this function recursively
    if (is_array($value))
    {
      $node = $xml->addChild($key);
      // recursive call.
      ArrayToXml($value, $rootNodeName, $node);
    }
    else
    {
      // add single node.
      $xml->addChild($key, $value);
    }
  }

  // pass back as string. or simple xml object if you want!
  return $xml->asXML();
}


Second idea

The result of the simple YAML to XML transformation looks like this:

<?xml version="1.0" encoding="utf-8"?>
<!-- view.yml.xml -->
<root>
  <default>
    <httpmetas>
      <contenttype>text/html</contenttype>
    </httpmetas>
    <metas>
      <title>My symfony project</title>
      <robots>index, follow</robots>
      <description>This is my first symfony project</description>
      <keywords>symfony</keywords>
      <language>en</language>
    </metas>
    <stylesheets>
      <unknownNode>main.css</unknownNode>
      <unknownNode>top.css</unknownNode>
    </stylesheets>
    <javascripts>
      <unknownNode>jquery-1.2.6.js</unknownNode>
      <unknownNode>main.js</unknownNode>
    </javascripts>
    <haslayout>1</haslayout>
    <layout>layout</layout>
  </default>
  <indexSuccess>
    <metas>
      <title>Welcome to my site</title>
    </metas>
  </indexSuccess>
</root>


The trouble here is that the <default> and <indexSuccess> tags are not real tags: they do not define a class of content but carry a value (a template name). The same goes for the <unknownNode> nodes. To make sense, a real equivalent to the view.yml in XML should look like this:

<?xml version="1.0"?>
<!-- view.yml.xml, semantically correct -->
<templates>
  <template name="default">
    <httpmetas>
      <contenttype>text/html</contenttype>
    </httpmetas>
    <metas>
      <title>My symfony project</title>
      <robots>index, follow</robots>
      <description>This is my first symfony project</description>
      <keywords>symfony</keywords>
      <language>en</language>
    </metas>
    <stylesheets>
      <stylesheet>main.css</stylesheet>
      <stylesheet>top.css</stylesheet>
    </stylesheets>
    <javascripts>
      <javascript>jquery-1.2.6.js</javascript>
      <javascript>main.js</javascript>
    </javascripts>
    <haslayout>1</haslayout>
    <layout>layout</layout>
  </template>
  <template name="indexSuccess">
    <metas>
      <title>Welcome to my site</title>
    </metas>
  </template>
</templates>


The difference is that main entries are <template> tags with a name attribute, and that children of the <javascripts> element are simple <javascript> elements. This second XML file is semantically correct, because it follows a simple grammar - and can be validated.

But how can you turn the first XML file into the second? The right tool for this job is called XSLT, or Extensible Stylesheet Language Transformations. An XSLT file is a set of transformation rules described in XML. Applying these rules to an XML file transforms it into another XML file. That's exactly what you need here.

The XSLT file to turn the first view.yml.xml into the second one is quite simple:

<?xml version='1.0'?>
<!-- view.yml.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/root">
    <templates>
      <xsl:for-each select="child::*">
        <template>
          <xsl:attribute name="name">
            <xsl:value-of select="local-name()" />
          </xsl:attribute>
          <xsl:apply-templates select="child::*"/>
        </template>
      </xsl:for-each>
    </templates>
  </xsl:template>
  <xsl:template match="//stylesheets/unknownNode">
    <stylesheet>
      <xsl:value-of select="text()" />
    </stylesheet>
  </xsl:template>
  <xsl:template match="//javascripts/unknownNode">
    <javascript>
      <xsl:value-of select="text()" />
    </javascript>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>


Basically, this XSL stylesheet copies most of the original tags (<xsl:copy>), but does special operations for elements that should be attributes (like <default>), or that should be renamed. This stylesheet defines a "semantical correction" for the automatically created XML translation of the YAML file, and is the first step of the validation. Of course, you need to define one XSLT file for each type of YAML file you want to validate.

How do you apply this XSLT to the XML version of the YAML file in PHP? Thanks to the powerful XML capabilities of PHP, it is extremely simple:

// Transform the XML using XSLT
// Load the simple XML transformation into a DOMDocument object
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xmlString);
// Load the XSL stylesheet into another DOMDocument object
$xslDoc = new DOMDocument();
$xslDoc->load('view.yml.xsl');
// Proceed with transformation using an XSLTProcessor object
$xsltp = new XSLTProcessor();
$xsltp->importStylesheet($xslDoc);
if (!$xmlTransformed = $xsltp->transformToDoc($xmlDoc))
{
  throw new Exception('XSL transformation failed.');
}


Validating

Validating the semantically correct XML file is quite basic: write an XML Schema, or XSD, describing the syntax expected in a view.yml.xml. You could do it with a DTD instead of an XSD, but XSD is more powerful. Here is a simple schema defining a grammar to validate view.yml.xml files:

<?xml version="1.0"?>
<!-- view.yml.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="templates">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="template" maxOccurs="unbounded">
          <xs:complexType mixed="true">
            <xs:all>
              <xs:element name="httpmetas" minOccurs="0">
                <xs:complexType>
                  <xs:all>
                    <xs:element name="contenttype" type="xs:string"/>
                  </xs:all>
                </xs:complexType>
              </xs:element>
              <xs:element name="metas" minOccurs="0">
                <xs:complexType>
                  <xs:all>
                    <xs:element name="title" type="xs:string" minOccurs="0"/>
                    <xs:element name="robots" type="xs:string" minOccurs="0"/>
                    <xs:element name="description" type="xs:string" minOccurs="0"/>
                    <xs:element name="keywords" type="xs:string" minOccurs="0"/>
                    <xs:element name="language" type="xs:string" minOccurs="0"/>
                  </xs:all>
                </xs:complexType>
              </xs:element>
              <xs:element name="stylesheets" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="stylesheet" type="xs:string" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="javascripts" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="javascript" type="xs:string" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="haslayout" type="xs:integer" minOccurs="0"/>
              <xs:element name="layout" type="xs:string" minOccurs="0"/>
            </xs:all>
            <xs:attribute name="name" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>


Note: I know this doesn't cover all cases; it is mostly a proof of concept.

Now, you need to check the XML file against that schema. Once again, the powerful XML manipulation library of PHP makes it a piece of cake:

// validate the new XML against an XSD
// $xmlTransformed is the semantically correct XML translation of the YAML file defined earlier
if($xmlTransformed->schemaValidate('view.yml.xsd'))
{
  return true;
}
else
{
  // display errors
}
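
To experiment with this step without any file on disk, DOMDocument::schemaValidateSource() takes the schema as a string instead of a path. The following is a minimal sketch: the schema is a reduced stand-in for view.yml.xsd, and isValidAgainst() is a hypothetical helper name.

```php
<?php
// Reduced, made-up schema: <templates> holding <template name="..."> with an optional <layout>.
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="templates">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="template" maxOccurs="unbounded">
          <xs:complexType>
            <xs:all>
              <xs:element name="layout" type="xs:string" minOccurs="0"/>
            </xs:all>
            <xs:attribute name="name" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical helper: true if the XML string validates against the schema string.
function isValidAgainst($xmlString, $xsd)
{
  // keep libxml warnings out of the output, and restore the previous setting afterwards
  $previous = libxml_use_internal_errors(true);
  $doc = new DOMDocument();
  $doc->loadXML($xmlString);
  $isValid = $doc->schemaValidateSource($xsd);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  return $isValid;
}

$good = '<templates><template name="default"><layout>layout</layout></template></templates>';
$bad  = '<templates><template name="default"><foo>bar</foo></template></templates>';

var_dump(isValidAgainst($good, $xsd)); // bool(true)
var_dump(isValidAgainst($bad, $xsd));  // bool(false)
```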


Dealing with libxml errors

By default, DOMDocument::schemaValidate() will only return true if the XML file is valid, and false otherwise. But a good validation utility needs to be more verbose than that, and display errors when a file doesn't validate. In order to do that, you need to manually fetch the libxml errors when the validation fails, as explained in the PHP Manual.

libxml_use_internal_errors(true);
// validate the new XML against an XSD
// $xmlTransformed is the semantically correct XML translation of the YAML file defined earlier
if ($xmlTransformed->schemaValidate('view.yml.xsd'))
{
  return true;
}
else
{
  // collect the libxml errors into a single message
  $errors = libxml_get_errors();
  $message = "\n";
  foreach ($errors as $error)
  {
    $message .= trim($error->message) . ' (';
    switch ($error->level)
    {
      case LIBXML_ERR_WARNING:
        $message .= "Warning $error->code";
        break;
      case LIBXML_ERR_ERROR:
        $message .= "Error $error->code";
        break;
      case LIBXML_ERR_FATAL:
        $message .= "Fatal Error $error->code";
        break;
    }
    if ($error->file)
    {
      $message .= " in $error->file";
    }
    $message .= " on line $error->line)\n";
  }
  libxml_clear_errors();

  throw new Exception($message);
}


That's all. Now all it takes to validate any view.yml file are the XSLT and the XSD grammars. If a view.yml ever contains an incorrect setting, say:

default:
  foo: bar

Then an exception will be raised with a meaningful error message:

Element 'foo': This element is not expected. (Error 1871 on line 2)
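
The switch on the error level can also be written as a lookup table. Here is a small sketch along those lines, assuming a hypothetical describeLibxmlErrors() helper and a one-element schema made up for the demonstration. The exact wording of the message depends on the libxml version installed, so no output is shown.

```php
<?php
// Hypothetical helper: one readable line per libxml error.
function describeLibxmlErrors(array $errors)
{
  $labels = array(
    LIBXML_ERR_WARNING => 'Warning',
    LIBXML_ERR_ERROR   => 'Error',
    LIBXML_ERR_FATAL   => 'Fatal Error',
  );

  $lines = array();
  foreach ($errors as $error)
  {
    $label = isset($labels[$error->level]) ? $labels[$error->level] : 'Unknown';
    $lines[] = sprintf('%s (%s %d on line %d)', trim($error->message), $label, $error->code, $error->line);
  }

  return $lines;
}

// Made-up schema for the demo: a <template> that may only contain <layout>.
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="template">
    <xs:complexType>
      <xs:all>
        <xs:element name="layout" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadXML('<template><foo>bar</foo></template>');
$isValid = $doc->schemaValidateSource($xsd);
$lines = describeLibxmlErrors(libxml_get_errors());
libxml_clear_errors();

// Print either a confirmation or one line per validation error.
echo $isValid ? "valid\n" : implode("\n", $lines) . "\n";
```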

Wrapping it up

The idea can be easily transposed to any YAML file. A YAML validator should:

  1. Turn a YAML file into a PHP associative array using sfYaml
  2. Turn this array into an XML structure, in a brute and blind way
  3. Turn the XML structure into a second XML structure using a set of XSLT rules to make the structure semantically correct
  4. Validate the second XML structure using an XML Schema
  5. If errors appear, return them wrapped up in an exception
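
The five steps above can be sketched as a single function. This is only an outline under assumptions: validateConfig() and arrayToNode() are hypothetical names, the input is the array already returned by sfYaml (step 1), and the xsl extension must be enabled for XSLTProcessor to exist.

```php
<?php
// Step 2: blind conversion of the array into XML nodes (same idea as ArrayToXml()).
function arrayToNode(array $data, SimpleXMLElement $node)
{
  foreach ($data as $key => $value)
  {
    // numeric keys become unknownNode; other keys keep their letters only
    $key = preg_replace('/[^a-z]/i', '', is_numeric($key) ? 'unknownNode' : $key);
    if (is_array($value))
    {
      arrayToNode($value, $node->addChild($key));
    }
    else
    {
      $node->addChild($key, (string) $value);
    }
  }
}

// Steps 2 to 5: returns true, or throws an Exception listing the schema errors.
function validateConfig(array $config, $xslFile, $xsdFile)
{
  // Step 2: array to raw XML
  $root = new SimpleXMLElement('<root/>');
  arrayToNode($config, $root);
  $xmlDoc = new DOMDocument();
  $xmlDoc->loadXML($root->asXML());

  // Step 3: raw XML to semantically correct XML
  $xslDoc = new DOMDocument();
  $xslDoc->load($xslFile);
  $xsltp = new XSLTProcessor();
  $xsltp->importStylesheet($xslDoc);
  $transformed = $xsltp->transformToDoc($xmlDoc);
  if (!$transformed)
  {
    throw new Exception('XSL transformation failed.');
  }

  // Step 4: validate against the schema, collecting libxml errors quietly
  $previous = libxml_use_internal_errors(true);
  libxml_clear_errors();
  $isValid = $transformed->schemaValidate($xsdFile);
  $messages = array();
  foreach (libxml_get_errors() as $error)
  {
    $messages[] = trim($error->message);
  }
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  // Step 5: report errors through an exception
  if (!$isValid)
  {
    throw new Exception(implode("\n", $messages));
  }

  return true;
}
```

With the files from this article, a call would look like validateConfig(sfYaml::load('view.yml'), 'view.yml.xsl', 'view.yml.xsd').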

To validate, say, the generator.yml in symfony, all it takes is a generator.yml.xsl and a generator.yml.xsd to define the expected grammar of this file.

Ironic, isn't it?

You could say that the idea behind YAML is to avoid writing XML files. So using XML, XSD and XSLT in order to validate a YAML file may look a bit counter-intuitive, if not ironic.

But when you put it all together, the code necessary to validate any YAML file (not including, of course, the XSLT and XSD grammars, which depend on the file you validate) takes only a few dozen lines. Besides, PHP is very good at handling XML, so it's better to use it for its strong points, instead of trying to mimic another language and end up writing thousands of lines of code. Actually, the 'K.I.S.S.' principle that encourages the use of YAML for configuration files should also apply here: XML manipulation is the simplest way to validate a YAML file, so it's the right tool for the job.

Last but not least, revolutions sometimes look backwards - think about the Renaissance. So using XML to validate YAML is probably not as dumb as it sounds.

The full YAML validator code is attached below, together with the example YAML file for your testing pleasure. Once again, I'm not a developer, so the code is just there to prove that the idea works. It could probably be much improved.

Source code + example YAML file and validator schemas

Including the YAML validation system in a web application framework that uses YAML is a must. Validation should only be done in the development environment, of course, and only when the YAML files change. Symfony uses a configuration cache system with a set of configuration handlers that would make validation very easy and efficient. Other frameworks, in PHP or in other languages, could also take advantage of a similar approach.
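
One simple way to validate only when a file changes is to compare modification times against a marker file. The sketch below is an assumption made for illustration, not how the symfony configuration cache is implemented. validateIfChanged() and the marker file are hypothetical.

```php
<?php
// Hypothetical helper: call $validator only if $yamlFile changed since the last successful run.
// Returns true when the validator ran, false when the file was skipped.
function validateIfChanged($yamlFile, $markerFile, $validator)
{
  clearstatcache();
  if (file_exists($markerFile) && filemtime($markerFile) >= filemtime($yamlFile))
  {
    return false;
  }

  // the validator is expected to throw an exception on invalid content
  call_user_func($validator, $yamlFile);
  touch($markerFile);

  return true;
}

// Usage with a throwaway file and a validator that only counts its calls.
$yamlFile = tempnam(sys_get_temp_dir(), 'yml');
file_put_contents($yamlFile, "default:\n  layout: layout\n");
$markerFile = $yamlFile . '.validated';
$calls = 0;
$validator = function ($file) use (&$calls) { $calls++; };

var_dump(validateIfChanged($yamlFile, $markerFile, $validator)); // bool(true)
var_dump(validateIfChanged($yamlFile, $markerFile, $validator)); // bool(false)

unlink($yamlFile);
unlink($markerFile);
```

Modification times have a one-second resolution, so an edit made in the same second as the last validation would be missed. A checksum of the file content would be stricter.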

Oh, and there is one more thing: The semantically correct XML file and its XSD syntax define a perfect XML equivalent to YAML files in symfony. If you want to use XML instead of YAML, and write your own configuration handlers, you should probably follow this kind of syntax.