Validating a YAML file against a schema in PHP

As of today, there is no simple way to validate the syntax of a YAML file in PHP. But with two simple tricks, it takes only a few dozen lines of code to build a robust validator capable of checking the syntax of any YAML file against a given schema.

The problem

YAML is much easier to write and read than XML, but YAML has no schema validation capabilities. With DTD and XSD, you can check that an XML file is correctly formatted before actually using it, which helps a great deal with debugging. Modern web application frameworks like symfony encourage the use of YAML for configuration files, but the lack of a validation tool sometimes makes YAML a poor choice in a professional environment.

Such a validation tool exists in Ruby; it's called kwalify. But unless you want to spend a huge amount of time translating the 6,000+ lines of code of the library from Ruby into PHP, or run Ruby code inside your PHP application, you're basically stuck.

First idea

Did you just read that XML allows validation by way of XSD? Well, why not use this mechanism to validate a YAML file? After all, PHP has a great XML manipulation library, installed by default, and capable of validating any XML file against a DTD or an XSD. Actually, this mechanism is already in use in symfony, since the Propel schema.yml is transformed into an XML counterpart that has an XSD.

It is trivial to transform a YAML file into a PHP associative array. Symfony 1.1 provides a class that does exactly that, and it's called sfYaml. With a little bit of recursion and a few lines of PHP code, it is also quite easy to transform an associative array into a simple XML file.

Let's use the view.yml configuration file in symfony for example. In a typical module, it looks like the following:

# view.yml
default:
  http_metas:
    content-type:  text/html

  metas:
    title:         My symfony project
    robots:        index, follow
    description:   This is my first symfony project
    keywords:      symfony
    language:      en

  stylesheets:     [main.css, top.css]

  javascripts:     [jquery-1.2.6.js, main.js]

  has_layout:      on
  layout:          layout

indexSuccess:
  metas:
    title:        Welcome to my site
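
For reference, here is roughly what this file looks like once parsed into a PHP array. This is a hand-written sketch, not actual sfYaml output: it assumes that the parser turns `on` into boolean true and inline sequences into numerically indexed arrays, which is what the XML shown later in this post suggests. The `$viewConfig` name is only chosen for illustration.

```php
<?php
// Hand-written equivalent of the view.yml above (not actual parser output).
$viewConfig = array(
  'default' => array(
    'http_metas' => array('content-type' => 'text/html'),
    'metas' => array(
      'title'       => 'My symfony project',
      'robots'      => 'index, follow',
      'description' => 'This is my first symfony project',
      'keywords'    => 'symfony',
      'language'    => 'en',
    ),
    'stylesheets' => array('main.css', 'top.css'),
    'javascripts' => array('jquery-1.2.6.js', 'main.js'),
    'has_layout'  => true,
    'layout'      => 'layout',
  ),
  'indexSuccess' => array(
    'metas' => array('title' => 'Welcome to my site'),
  ),
);

// Sequences carry numeric keys, which cannot be used as XML element names.
var_dump(array_keys($viewConfig['default']['stylesheets']));
```

The numeric keys of the sequences and the underscores and hyphens in some key names are the two things an array-to-XML conversion has to deal with.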

Now what does it take to transform this YAML into a simple XML equivalent? Not much. A bit of googling shows that someone already worked on transforming an associative array into XML, and as it is not a good idea to reinvent the wheel, let's reuse this work.

// Transform YAML into XML
include 'sfYaml.class.php';
$yamlString = file_get_contents('view.yml');
$yamlArray =  sfYaml::load($yamlString);
$xmlString = ArrayToXml($yamlArray);

function ArrayToXml($data, $rootNodeName = 'root', $xml = null)
{
  // on the first (non-recursive) call, create the root element
  if ($xml === null)
  {
    $xml = simplexml_load_string("<?xml version='1.0' encoding='utf-8'?><$rootNodeName />");
  }

  // loop through the data passed in.
  foreach ($data as $key => $value)
  {
    // no numeric keys in our xml please!
    if (is_numeric($key))
    {
      // make string key... (the suffix is stripped just below, leaving "unknownNode")
      $key = "unknownNode_". (string) $key;
    }

    // remove anything that is not a letter (so "http_metas" becomes "httpmetas")
    $key = preg_replace('/[^a-z]/i', '', $key);

    // if there is another array found, recursively call this function
    if (is_array($value))
    {
      $node = $xml->addChild($key);
      // recursive call.
      ArrayToXml($value, $rootNodeName, $node);
    }
    else
    {
      // add single node; addChild() does not escape "&", so escape the value first.
      $xml->addChild($key, htmlspecialchars((string) $value));
    }
  }

  // pass back as string. or simple xml object if you want!
  return $xml->asXML();
}
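
As a variant, the same conversion can be sketched on top of DOMDocument, which gives back a document object that the later steps can use directly. This is only an illustration under the same naming rules as above; `arrayToDom()` and `appendArray()` are names I made up for the example, and the input array is hand-written rather than parsed from YAML.

```php
<?php
// Sketch: array-to-XML conversion built on DOMDocument instead of SimpleXML.
// arrayToDom() and appendArray() are illustrative names, not library functions.
function appendArray(DOMDocument $doc, DOMElement $parent, array $data)
{
  foreach ($data as $key => $value)
  {
    // Same naming rule as ArrayToXml(): numeric keys become "unknownNode",
    // and every character that is not a letter is dropped from other keys.
    $name = is_numeric($key) ? 'unknownNode' : preg_replace('/[^a-z]/i', '', $key);
    $node = $parent->appendChild($doc->createElement($name));
    if (is_array($value))
    {
      appendArray($doc, $node, $value);
    }
    else
    {
      // Text nodes are escaped when the document is serialized.
      $node->appendChild($doc->createTextNode((string) $value));
    }
  }
}

function arrayToDom(array $data, $rootNodeName = 'root')
{
  $doc = new DOMDocument('1.0', 'utf-8');
  $doc->appendChild($doc->createElement($rootNodeName));
  appendArray($doc, $doc->documentElement, $data);

  return $doc;
}

// Small hand-written input standing in for a parsed YAML file.
$doc = arrayToDom(array(
  'metas'       => array('title' => 'Tips & tricks'),
  'stylesheets' => array('main.css', 'top.css'),
  'has_layout'  => true,
));
echo $doc->saveXML();
```

Like the SimpleXML version, this sketch does not handle a key that contains no letters at all, since that would produce an empty element name.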


Second idea

The result of the simple YAML to XML transformation looks like this:

<?xml version="1.0" encoding="utf-8"?>
<!-- view.yml.xml -->
<root>
  <default>
    <httpmetas>
      <contenttype>text/html</contenttype>
    </httpmetas>
    <metas>
      <title>My symfony project</title>
      <robots>index, follow</robots>
      <description>This is my first symfony project</description>
      <keywords>symfony</keywords>
      <language>en</language>
    </metas>
    <stylesheets>
      <unknownNode>main.css</unknownNode>
      <unknownNode>top.css</unknownNode>
    </stylesheets>
    <javascripts>
      <unknownNode>jquery-1.2.6.js</unknownNode>
      <unknownNode>main.js</unknownNode>
    </javascripts>
    <haslayout>1</haslayout>
    <layout>layout</layout>
  </default>
  <indexSuccess>
    <metas>
      <title>Welcome to my site</title>
    </metas>
  </indexSuccess>
</root>


The trouble here is that the <default> and <indexSuccess> tags are not real tags: they do not define a class of content but a value. The same goes for the <unknownNode> nodes. To make sense, a real equivalent to the view.yml in XML should look like this:

<?xml version="1.0"?>
<!-- view.yml.xml, semantically correct -->
<templates>
  <template name="default">
    <httpmetas>
      <contenttype>text/html</contenttype>
    </httpmetas>
    <metas>
      <title>My symfony project</title>
      <robots>index, follow</robots>
      <description>This is my first symfony project</description>
      <keywords>symfony</keywords>
      <language>en</language>
    </metas>
    <stylesheets>
      <stylesheet>main.css</stylesheet>
      <stylesheet>top.css</stylesheet>
    </stylesheets>
    <javascripts>
      <javascript>jquery-1.2.6.js</javascript>
      <javascript>main.js</javascript>
    </javascripts>
    <haslayout>1</haslayout>
    <layout>layout</layout>
  </template>
  <template name="indexSuccess">
    <metas>
      <title>Welcome to my site</title>
    </metas>
  </template>
</templates>


The difference is that main entries are <template> tags with a name attribute, and that children of the <javascripts> element are simple <javascript> elements. This second XML file is semantically correct, because it follows a simple grammar - and can be validated.

But how can you turn the first XML file into the second? The right tool for this job is called XSLT, or Extensible Stylesheet Language Transformations. An XSLT file is a set of transformation rules described in XML. Applying these rules to an XML file transforms it into another XML file. That's exactly what you need here.

The XSLT file to turn the first view.yml.xml into the second one is quite simple:

<?xml version='1.0'?>
<!-- view.yml.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/root">
    <templates>
      <xsl:for-each select="child::*">
        <template>
          <xsl:attribute name="name">
            <xsl:value-of select="local-name()" />
          </xsl:attribute>
          <xsl:apply-templates select="child::*"/>
        </template>
      </xsl:for-each>
    </templates>
  </xsl:template>
  <xsl:template match="//stylesheets/unknownNode">
    <stylesheet>
      <xsl:value-of select="text()" />
    </stylesheet>
  </xsl:template>
  <xsl:template match="//javascripts/unknownNode">
    <javascript>
      <xsl:value-of select="text()" />
    </javascript>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>


Basically, this XSL stylesheet copies most of the original tags (<xsl:copy>), but does special operations for elements that should be attributes (like <default>), or that should be renamed. This stylesheet defines a "semantical correction" for the automatically created XML translation of the YAML file, and is the first step of the validation. Of course, you need to define one XSLT file for each type of YAML file you want to validate.

How do you apply this XSLT to the XML version of the YAML file in PHP? Thanks to the powerful XML capabilities of PHP, it is extremely simple:

// Transform the XML using XSLT
// Load the simple XML transformation into a DOMDocument object
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xmlString);
// Load the XSL stylesheet into another DOMDocument object
$xslDoc = new DOMDocument();
$xslDoc->load('view.yml.xsl');
// Proceed with transformation using an XSLTProcessor object
$xsltp = new XSLTProcessor();
$xsltp->importStylesheet($xslDoc);
if (!$xmlTransformed = $xsltp->transformToDoc($xmlDoc))
{
  throw new Exception('XSL transformation failed.');
}
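
To try the transformation without any file on disk, here is a small self-contained sketch. It assumes the xsl extension is enabled, and it uses reduced versions of the XML and of the stylesheet shown above, inlined as strings; the variable names are only illustrative.

```php
<?php
// Reduced versions of the generated XML and of view.yml.xsl, inlined as
// nowdoc strings so that the sketch runs on its own (requires ext-xsl).
$xml = <<<'XML'
<root>
  <default>
    <stylesheets>
      <unknownNode>main.css</unknownNode>
      <unknownNode>top.css</unknownNode>
    </stylesheets>
  </default>
</root>
XML;

$xsl = <<<'XSL'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/root">
    <templates>
      <xsl:for-each select="child::*">
        <template>
          <xsl:attribute name="name">
            <xsl:value-of select="local-name()" />
          </xsl:attribute>
          <xsl:apply-templates select="child::*"/>
        </template>
      </xsl:for-each>
    </templates>
  </xsl:template>
  <xsl:template match="stylesheets/unknownNode">
    <stylesheet><xsl:value-of select="text()" /></stylesheet>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);
$xslDoc = new DOMDocument();
$xslDoc->loadXML($xsl);

$processor = new XSLTProcessor();
$processor->importStylesheet($xslDoc);
$result = $processor->transformToDoc($xmlDoc);
if ($result === false)
{
  throw new Exception('XSL transformation failed.');
}

// The <default> element is now a <template> with a name attribute,
// and each <unknownNode> has been renamed to <stylesheet>.
echo $result->saveXML();
```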


Validating

Validating the semantically correct XML file is quite basic: write an XML Schema, or XSD, describing the syntax expected in a view.yml.xml. You could do it with a DTD instead of an XSD, but XSD is more powerful. Here is a simple schema defining a grammar to validate view.yml.xml files:

<?xml version="1.0"?>
<!-- view.yml.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="templates">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="template" maxOccurs="unbounded">
          <xs:complexType mixed="true">
            <xs:all>
              <xs:element name="httpmetas" minOccurs="0">
                <xs:complexType>
                  <xs:all>
                    <xs:element name="contenttype" type="xs:string"/>
                  </xs:all>
                </xs:complexType>
              </xs:element>
              <xs:element name="metas" minOccurs="0">
                <xs:complexType>
                  <xs:all>
                    <xs:element name="title" type="xs:string" minOccurs="0"/>
                    <xs:element name="robots" type="xs:string" minOccurs="0"/>
                    <xs:element name="description" type="xs:string" minOccurs="0"/>
                    <xs:element name="keywords" type="xs:string" minOccurs="0"/>
                    <xs:element name="language" type="xs:string" minOccurs="0"/>
                  </xs:all>
                </xs:complexType>
              </xs:element>
              <xs:element name="stylesheets" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="stylesheet" type="xs:string" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="javascripts" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="javascript" type="xs:string" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="haslayout" type="xs:integer" minOccurs="0"/>
              <xs:element name="layout" type="xs:string" minOccurs="0"/>
            </xs:all>
            <xs:attribute name="name" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>


Note: I know this doesn't cover all cases; it is mostly a proof of concept.

Now, you need to check the XML file against that schema. Once again, the powerful XML manipulation library of PHP makes it a piece of cake:

// validate the new XML against an XSD
// $xmlTransformed is the semantically correct XML translation of the YAML file defined earlier
if($xmlTransformed->schemaValidate('view.yml.xsd'))
{
  return true;
}
else
{
  // display errors
}


Dealing with libxml errors

By default, DOMDocument::schemaValidate() will only return true if the XML file is valid, and false otherwise. But a good validation utility needs to be more verbose than that, and display errors when a file doesn't validate. In order to do that, you need to manually fetch the libxml errors when the validation fails, as explained in the PHP Manual.

// collect libxml errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);
// validate the new XML against an XSD
// $xmlTransformed is the semantically correct XML translation of the YAML file defined earlier
if ($xmlTransformed->schemaValidate('view.yml.xsd'))
{
  return true;
}
else
{
  // build one message line per libxml error
  $errors = libxml_get_errors();
  $message = "\n";
  foreach ($errors as $error)
  {
    $message .= trim($error->message) . ' (';
    switch ($error->level)
    {
      case LIBXML_ERR_WARNING:
        $message .= "Warning $error->code";
        break;
      case LIBXML_ERR_ERROR:
        $message .= "Error $error->code";
        break;
      case LIBXML_ERR_FATAL:
        $message .= "Fatal Error $error->code";
        break;
    }
    if ($error->file)
    {
      $message .= " in $error->file";
    }
    $message .= " on line $error->line)\n";
  }
  // empty the error buffer so the next validation starts clean
  libxml_clear_errors();

  throw new Exception($message);
}


That's all. Now all it takes to validate any view.yml file is the XSLT stylesheet and the XSD grammar. If a view.yml ever contains an incorrect setting, say:

default:
  foo: bar

Then an exception will be raised with a meaningful error message:

Element 'foo': This element is not expected. (Error 1871 on line 2)
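
The failure can be reproduced in isolation with the following sketch. It uses a reduced, inlined schema (only the layout element is declared) and DOMDocument::schemaValidateSource(), which validates against a schema given as a string; the exact wording of the libxml message may differ from the one above depending on the libxml version.

```php
<?php
// Reduced, inlined versions of the transformed XML and of the schema,
// so that the failure can be reproduced without any file on disk.
$xml = '<templates><template name="default"><foo>bar</foo></template></templates>';

$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="templates">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="template" maxOccurs="unbounded">
          <xs:complexType>
            <xs:all>
              <xs:element name="layout" type="xs:string" minOccurs="0"/>
            </xs:all>
            <xs:attribute name="name" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Collect libxml errors instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadXML($xml);

// Validate against the schema string; <foo> is not declared, so this fails.
$isValid = $doc->schemaValidateSource($xsd);

$messages = array();
foreach (libxml_get_errors() as $error)
{
  $messages[] = sprintf('%s (code %d, line %d)', trim($error->message), $error->code, $error->line);
}
libxml_clear_errors();

var_dump($isValid);
print_r($messages);
```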

Wrapping it up

The idea can be easily transposed to any YAML file. A YAML validator should:

  1. Turn a YAML file into a PHP associative array using sfYaml
  2. Turn this array into an XML structure, in a brute and blind way
  3. Turn the XML structure into a second XML structure using a set of XSLT rules to make the structure semantically correct
  4. Validate the second XML structure using an XML Schema
  5. If errors appear, return them wrapped up in an exception

To validate, say, the generator.yml in symfony, all it takes is a generator.yml.xsl and a generator.yml.xsd to define the expected grammar in this file.

Ironic, isn't it?

You could say that the idea behind YAML is to avoid writing XML files. So using XML, XSD and XSLT in order to validate a YAML file may look a bit counter-intuitive, if not ironic.

But when you put it all together, the code necessary to validate any YAML file (not including, of course, the XSLT and XSD grammars, which depend on the file you validate) takes only a few dozen lines. Besides, PHP is very good at handling XML, so it's better to use it for its strong points, instead of trying to mimic another language and end up writing thousands of lines of code. Actually, the 'K.I.S.S.' principle that encourages the use of YAML for configuration files should also apply here: XML manipulation is the simplest way to validate a YAML file, so it's the right tool for the job.

Last but not least, revolutions sometimes look backwards - think about the Renaissance. So using XML to validate YAML is probably not as dumb as it sounds.

The full YAML validator code is attached below, together with the example YAML file for your testing pleasure. Once again, I'm not a developer, so the code is just there to prove that the idea works. It could probably be much improved.

Source code + example YAML file and validator schemas

Including the YAML validation system in a web application framework that uses YAML is a must. Validation should only be done in the development environment, of course, and only when the YAML files change. Symfony uses a configuration cache system with a set of configuration handlers that would make validation very easy and efficient. Other frameworks, in PHP or in other languages, could also take advantage of a similar approach.
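
As a rough illustration of "only when the YAML files change", here is a minimal sketch based on file modification times. It is not symfony's configuration cache: `validateIfChanged()`, the marker file convention and the `$validator` callback are all assumptions made for the example.

```php
<?php
// Sketch: run a validator only when the YAML file is newer than a marker file.
// validateIfChanged() and the marker file are hypothetical, not a symfony API.
function validateIfChanged($yamlFile, $markerFile, $validator)
{
  clearstatcache(); // make sure modification times are read fresh
  if (file_exists($markerFile) && filemtime($markerFile) >= filemtime($yamlFile))
  {
    // Not modified since the last successful validation: skip.
    // (mtime has one-second resolution, so a change made within the same
    // second as the previous validation would be missed.)
    return false;
  }

  call_user_func($validator, $yamlFile); // expected to throw on invalid content
  touch($markerFile);                    // remember the successful validation

  return true;
}

// Demo with a temporary file and a dummy validator that only counts its calls.
$yamlFile = tempnam(sys_get_temp_dir(), 'yml');
file_put_contents($yamlFile, "default:\n  layout: layout\n");
$markerFile = $yamlFile . '.validated';

$calls = 0;
$validator = function ($file) use (&$calls) { $calls++; };

$first  = validateIfChanged($yamlFile, $markerFile, $validator); // validates
$second = validateIfChanged($yamlFile, $markerFile, $validator); // skipped

echo "validator calls: $calls\n";

unlink($yamlFile);
unlink($markerFile);
```

In a real setup, the callback would be the validator described in this post, and the marker could just as well be the cached PHP version of the configuration file.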

Oh, and there is one more thing: The semantically correct XML file and its XSD syntax define a perfect XML equivalent to YAML files in symfony. If you want to use XML instead of YAML, and write your own configuration handlers, you should probably follow this kind of syntax.

22 Comments so far

  1. Markus.Staab on September 10th, 2008

    sounds great and would be a nice addition for the sf-framework..

  2. Ryan Weaver on September 11th, 2008

    Wow - I'm going to need to read this a few times to let it sink in. Very much outside of my expertise - I love it!

  3. jason rowe on September 11th, 2008

    interesting idea. yes it works and your on track but something tells me there must be a simpler way to do this. the double transformation might be a bit over the top.

    nice work though. jae.

  4. Romain Dorgueil (hartym) on September 11th, 2008

    hehe great idea.

    YAML big problem is indeed grammar validation, this can be a great solution to the problem. But then I wonder if your final view.xml isn't as easy to write than the view.yml?

  5. Francois Zaninotto on September 11th, 2008

    @jason: If you find a simpler way, please tell me. I'd love to hear about it ;)

    @Romain: My last paragraph introduces the idea that this new XML can be used as an alternative if you want to use XML instead of YAML. But I still prefer YAML for readability and speed. Besides, the post suggests to introduce a supplementary validation, not to break BC by changing the configuration file format...

  6. jason rowe on September 11th, 2008

    Hi Francois, i guess what I'm thinking is that it seems odd to have data in YAML and have to convert it to XML to test it, when we still have to write XML definitions which imo are harder to write than YAML.

    Maybe it would be better to have a YAML definition file, which could be passed to a generic processor along with the file to be tested. The processor could handle any symfony file that a had a YAML definition of whats expected/allowed.

  7. slantedview on September 11th, 2008

    I like this. It's a great proof of concept. I think Jason is onto something with the idea of writing a YAML schema definition file though.

  8. Fabian Spillner on September 12th, 2008

    The idea with yaml definition file is better way, its conform, like you use xml and its xml validator and not xml with another file type.

  9. Markus.Staab on September 12th, 2008

    In my opinion its a nice idea to have the validation based on DTD and don't reinvent the wheel and define a new language..

    this provide the possibility to also validate the *.xml files (which are used by some people instead of yml).. [francois already mentioned this point]

    And also the DTD definitions are reusable in other tools which may want to validate the xml/yml files, because this validation concepts (XML/DTD) are implemented in a lot of tools/languages

  10. Francois Zaninotto on September 12th, 2008

    A yaml schema language written in yaml is what kwalify does. Take a look at it, and you will see that the implementation is very complex. Until someone comes with a php equivalent, my proposal is pretty much the only way to validate a yaml file...

  11. Jason Rowe on September 12th, 2008

    Hi Francois, I've thought about this further and think your right. The thing i didn't like was having to write a XML/DTD for each YAML file in symfony, which are many.

    I wonder if you could use a similar transformation to generate the DTD.

  12. Fabian Spillner on September 12th, 2008

    No, I dont think, its required to reinvent the wheel. I didnt mention that you write the xsd schema in yaml way and you can transform it to xsd like you transform your yaml to xml.

    ex.

    schema:
      element:
        name:  templates
        complexType:
          sequence:
            element:
              name: template
              maxOccurs: unbounded
              complexType:
                mixed: true
                  all:
                    element:
                      name: httpmetas
                      minOccurs: 0
                      complexType:
                        all:
                          element:
                            name: contenttype
                            type: string
                    element:
                      name: metas
                      minOccurs: 0
                      complexType:
                        all:
                          element:
                            name: title
                            type: string
                            minOccurs: 0
                          element:
                            name: robots
                            type: string
                            minOccurs: 0
                          element:
                            name: description
                            type: string
                            minOccurs: 0
                          element:
                            name: keywords
                            type: string
                            minOccurs: 0
                          element:
                            name: language
                            type: string
                            minOccurs: 0
                    element:
                      name: javascripts
                      minOccurs: 0
                      complexType:
                        sequence:
                          element:
                            name: javascript
                            type: string
                            maxOccurs: unbounded
                    element:
                      name: haslayout
                      type: integer
                      minOccurs: 0
                    element:
                      name: layout
                      type: string
                      minOccurs: 0
                attribute:
                  name: name
                  type: string
                  use: required

    It should not be difficult to let transforming it into xsd.

    Do I miss something important thing which makes my suggestion fails?

  13. Fabian Spillner on September 12th, 2008

    Oh, no! After posting my comment it deleted the spaces and destroyed my example...

  14. Francois Zaninotto on September 12th, 2008

    @Fabian: I fixed the formatting. I see what you mean, but your example is not valid YAML, since you have several keys with the same name at the same level.

  15. Fabian Spillner on September 13th, 2008

    Thank you, Francois! Yes, you are right, that my example is not valid yaml! I wrote my example fast without taking around of yaml rules. Sorry!

    For the problem with same name you can fix it with sequence of mappings like:

    - element:
        name: xxx

  16. Markus.Staab on September 15th, 2008

    In my opinion the better solution is to use the DTD/XSD language for this purpose.. It can be used with nearly every language and you don't need the additional intermediate step with the transformation (error prone)..

  17. Fabian Spillner on September 16th, 2008

    Here a little demo project with yaml schema file: http://www.dreamcocoa.com/download/yaml_validation.zip

    @Markus.Staab: you should use XML configuration file instead of yaml configuration use.

  18. Francois Zaninotto on September 16th, 2008

    @Fabian: I see what you mean, but in this case your YAML file is longer and more complex than the XSD file. Besides, you use the _attributes: key, which is kind of a trick and forces your parser to be a little more than a simple transformation. Last but not least, that's a new language to teach to developers, while XSD is broadly learnt...

    So in the long run, I think that XSD would still be my best choice. Not that we shouldn't add your method to a generic Yaml validator to allow both XSD and YAML schema.

    Did you take a look at the kwalify YAML schema syntax?

  19. Fabian Spillner on September 17th, 2008

    @Francois: I never refuse your idea, but from my point of view your solution (and my solution, too) is not a solution for professional environment. I took short look at the kwalify YAML schema syntax and am impressed what the guys did! It rocks!

    Hopefully, symfony core team will port this good idea in PHP with great symfony style.

  20. Francois Zaninotto on September 18th, 2008

    @Fabian: To my mind, a kwalify in PHP is science fiction for now. Maybe you want to give it a try? But instead of doing no validation at all until it's done, my solution could help many users in the next months (or years...)

  21. Fabian Spillner on September 22nd, 2008

    I dont think that I will get time to give it a try.

  22. [...] Zaninotto submitted a tutorial he’s written up about creating a YAML validation script with [...]