Documented XML

In his article "Eine Lanze für XML brechen" (translated by leo.org as "Taking up the Cudgels for XML"), Philip Ghadir reminds us of the many benefits of XML:

easy for humans to read thanks to the tag names (provided the semantics of the tags are published; otherwise, whether the meaning is easy to guess depends on the document at hand)
defined way to specify the encoding within the document
defined way to reference a document type or schema
defined way to add links to other documents
defined way to add processing instructions
defined way to add comments
defined way to mix elements/attributes from different sources in one document with the help of namespaces
defined way to select parts of a document with XPath
defined way to run transformations with XSLT
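To make the list more concrete, here is a small sketch of a document that uses several of these mechanisms at once: the encoding declaration, a processing instruction (the xml-stylesheet instruction that lets a browser render the document with CSS), comments, namespaces, a schema reference, and a link. The namespace URI http://example.com/my-schema, the file names my-style.css, my-schema.xsd and other-document.xml, and the element names are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Processing instruction: render this document with a CSS stylesheet -->
<?xml-stylesheet type="text/css" href="my-style.css"?>
<!-- A comment: ignored by applications, useful for human readers -->
<my:document
    xmlns:my="http://example.com/my-schema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.com/my-schema my-schema.xsd">
  <!-- Link to another document via XLink (hypothetical target) -->
  <my:see-also xlink:type="simple" xlink:href="other-document.xml"/>
</my:document>
```

Note that xsi:schemaLocation is only a hint; a validator may also be configured with the schema by other means.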

While XML has a reputation for verbosity and is considered in every way less cool than, e.g., JSON (please note that neither Ghadir's article nor this blog post aims to compare the two formats, since there are simply use cases for both), Philip Ghadir shows in his article how XML's features can be used to increase the acceptance of XML documents. In one of his examples he adds CSS to provide a pretty view of the data; in another he transforms a document using XSLT. Have a look at Ghadir's article and the example code given there.

Although I like DSLs that provide a concise and elegant way to write documents, XML is still the multipurpose markup language I often use in my projects. Whenever I need a new type of XML document, I add an XML Schema (XSD) to define its valid structure. Annotations allow adding documentation so that human readers can understand the semantics of the document's elements. But element declarations in an XSD document tend to be very verbose. When I have to write one (which I do not do on a daily or weekly basis), I always have to look up even basic constructs because I cannot remember how to spell them out correctly.

The following examples form my personal cheat sheet, which I hope may be of use to you, too.
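The snippets are fragments. They assume they sit inside a schema document like the following minimal sketch, where the prefix xsd is bound to the XML Schema namespace and tns to the target namespace. The namespace URI http://example.com/my-schema is a hypothetical placeholder, and elementFormDefault="qualified" is a common choice assumed here, not a requirement.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="http://example.com/my-schema"
    targetNamespace="http://example.com/my-schema"
    elementFormDefault="qualified">
  <!-- attributeFormDefault is left at its default value "unqualified" -->
  <!-- Global declarations (elements, types, attribute groups) go here -->
</xsd:schema>
```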

Documentation

Documentation explains the intended use of an element or attribute. It is part of the annotation of the element or attribute declaration.

Element

<xsd:element name="my-element">
  <xsd:annotation>
    <xsd:documentation>
       Documentation for the element...
    </xsd:documentation>
  </xsd:annotation>
  ...
</xsd:element>

Attribute

<xsd:attribute name="my-attribute" type="xsd:string">
  <xsd:annotation>
    <xsd:documentation>
      Documentation for the attribute...
    </xsd:documentation>
  </xsd:annotation>
</xsd:attribute>

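So the pattern for adding documentation is always the same: an xsd:annotation containing an xsd:documentation element, placed as the first child of the declaration.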
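The same pattern works on other schema components that accept an annotation, for example a named type. As a sketch, xsd:documentation also accepts an xml:lang attribute, so one annotation can carry the text in several languages. The type name my-type is hypothetical.

```xml
<xsd:complexType name="my-type">
  <xsd:annotation>
    <!-- One documentation element per language -->
    <xsd:documentation xml:lang="en">
      Documentation for the type...
    </xsd:documentation>
    <xsd:documentation xml:lang="de">
      Dokumentation für den Typ...
    </xsd:documentation>
  </xsd:annotation>
  ...
</xsd:complexType>
```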

Elements with Attributes

Adding attributes to elements is a frequent task. For elements with child elements the declaration is quite intuitive; it is less so for elements with simple content or no content.

Complex Element

Okay, the easy one first!

This is what I want to specify in my XML document:

<my-element my-attribute="true">
  ...
</my-element>

This is what the XSD snippet looks like:

<xsd:complexType name="my-element">
  <xsd:sequence>
    ...
  </xsd:sequence>
  <xsd:attribute name="my-attribute" type="xsd:boolean"/>
</xsd:complexType>
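Because the snippet above declares a named type rather than an element, an element declaration still has to bind the name my-element to it. A minimal sketch, assuming the tns prefix is bound to the target namespace; element and type may share a name because type definitions and element declarations live in different symbol spaces. The second declaration shows the alternative of an anonymous type inside the element.

```xml
<!-- Global element declaration that uses the named complex type -->
<xsd:element name="my-element" type="tns:my-element"/>

<!-- Alternative: declare the type anonymously inside the element -->
<xsd:element name="my-element">
  <xsd:complexType>
    <xsd:sequence>
      ...
    </xsd:sequence>
    <xsd:attribute name="my-attribute" type="xsd:boolean"/>
  </xsd:complexType>
</xsd:element>
```

Only one of the two declarations may exist in a schema, since both declare the global element my-element.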

Element with Simple Content

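When I want an element with simple content and an attribute, I often forget the extension part: the attribute has to be added by extending the simple base type inside xsd:simpleContent.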

This is what I want to specify in my XML document:

<my-element my-attribute="true">Text content</my-element>

This is what the XSD snippet looks like:

<xsd:element name="my-element">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="my-attribute" type="xsd:boolean"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
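If several elements share this shape, the anonymous type can be turned into a named one and reused. A sketch with the hypothetical type name flagged-text and the hypothetical second element other-element:

```xml
<xsd:complexType name="flagged-text">
  <xsd:simpleContent>
    <!-- The base type defines the text content; the extension adds the attribute -->
    <xsd:extension base="xsd:string">
      <xsd:attribute name="my-attribute" type="xsd:boolean"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

<!-- Both elements accept text content plus the optional boolean attribute -->
<xsd:element name="my-element" type="tns:flagged-text"/>
<xsd:element name="other-element" type="tns:flagged-text"/>
```

Replacing xsd:string with another simple type, such as xsd:decimal, constrains the text content accordingly.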

Element without Content

An element without content omits the xsd:simpleContent and extension parts; the attribute is declared directly inside xsd:complexType.

This is what I want to specify in my XML document:

<my-element my-attribute="value"/>

This is what the XSD snippet looks like:

<xsd:element name="my-element">
  <xsd:complexType>
    <xsd:attribute name="my-attribute" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
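As far as I understand the XML Schema rules for empty content, the declaration above accepts the element written either as an empty-element tag or as a start and end tag pair, but not with any character data in between, not even whitespace. Since the attribute has no use="required", it may also be omitted. A sketch of instance fragments (only one of them could be the root of an actual document):

```xml
<!-- valid: empty-element tag -->
<my-element my-attribute="value"/>

<!-- valid: start and end tag with nothing in between -->
<my-element my-attribute="value"></my-element>

<!-- valid: the attribute is optional by default -->
<my-element/>

<!-- invalid: the space is character content, which empty content forbids -->
<my-element my-attribute="value"> </my-element>
```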

Global Attributes

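If we declare an attribute globally and simply reference it, the attribute has to be namespace-qualified in the XML document (for example tns:gav="..."), because globally declared attributes always belong to the target namespace. This is very inconvenient for the author and the (human) readers of an XML document. Therefore use attribute groups, even if the group contains only one attribute: attributes declared inside a group are local and, unless attributeFormDefault says otherwise, unqualified.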

<xsd:attributeGroup name="my-attribute">
  <xsd:attribute name="gav" type="xsd:string" use="required">
    <xsd:annotation>
      <xsd:documentation>
        Documentation for the attribute...
      </xsd:documentation>
    </xsd:annotation>
  </xsd:attribute>
</xsd:attributeGroup>

Reference the attribute group like this (tns is the prefix bound to the target namespace):

<xsd:complexType name="MyType">
  <xsd:attributeGroup ref="tns:my-attribute"/>
</xsd:complexType>
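For comparison, here is a sketch of the alternative advised against above: a global attribute declaration that is referenced directly, next to the instance syntax each approach requires. The type MyRefType, the element names my-ref-element and my-element (the latter assumed to be of type MyType), and the namespace URI are hypothetical.

```xml
<!-- Schema: a global attribute declaration, referenced from a type -->
<xsd:attribute name="gav" type="xsd:string"/>

<xsd:complexType name="MyRefType">
  <xsd:attribute ref="tns:gav" use="required"/>
</xsd:complexType>

<!-- Instance, element of type MyRefType: the attribute needs the prefix -->
<tns:my-ref-element
    xmlns:tns="http://example.com/my-schema"
    tns:gav="value"/>

<!-- Instance, element of type MyType: the attribute from the group is unqualified -->
<tns:my-element
    xmlns:tns="http://example.com/my-schema"
    gav="value"/>
```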

Restriction to Enumerations

If authors of an XML document should only be allowed to select a value from a finite list, an enumeration is the construct to choose. Note that you can add documentation both to the enumeration type and to each individual enumeration value.

<xsd:simpleType name="my-enum">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="story"/>
    <xsd:enumeration value="epic"/>
    <xsd:enumeration value="constraint"/>
    <xsd:enumeration value="spike"/>
  </xsd:restriction>
</xsd:simpleType>
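A sketch of the same type with documentation on both the type and the individual values; the documentation texts are placeholders:

```xml
<xsd:simpleType name="my-enum">
  <xsd:annotation>
    <xsd:documentation>Documentation for the enumeration type...</xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:string">
    <!-- Each value may carry its own annotation -->
    <xsd:enumeration value="story">
      <xsd:annotation>
        <xsd:documentation>Documentation for the value "story"...</xsd:documentation>
      </xsd:annotation>
    </xsd:enumeration>
    <xsd:enumeration value="epic">
      <xsd:annotation>
        <xsd:documentation>Documentation for the value "epic"...</xsd:documentation>
      </xsd:annotation>
    </xsd:enumeration>
    <xsd:enumeration value="constraint">
      <xsd:annotation>
        <xsd:documentation>Documentation for the value "constraint"...</xsd:documentation>
      </xsd:annotation>
    </xsd:enumeration>
    <xsd:enumeration value="spike">
      <xsd:annotation>
        <xsd:documentation>Documentation for the value "spike"...</xsd:documentation>
      </xsd:annotation>
    </xsd:enumeration>
  </xsd:restriction>
</xsd:simpleType>
```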

Element Content

And now some recurring patterns of element content.

Any Content

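The following declaration allows any number of arbitrary child elements, including none. With processContents="lax", the validator checks those elements for which it finds a declaration and skips the others.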

<xsd:sequence>
   <xsd:any
     processContents="lax"
     minOccurs="0"
     maxOccurs="unbounded" />
</xsd:sequence>
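Note that xsd:any matches elements only. To also permit text between those elements, a sketch would place the sequence in a complex type with mixed="true". The element name my-container is hypothetical.

```xml
<xsd:element name="my-container">
  <!-- mixed="true" allows character data between the child elements -->
  <xsd:complexType mixed="true">
    <xsd:sequence>
      <!-- lax: validate children that have a declaration, skip the rest -->
      <xsd:any
        processContents="lax"
        minOccurs="0"
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

With this declaration, content such as some text, a <b>bold</b> element and more text would be accepted inside my-container.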

Elements from a Set

To allow elements from a given set in arbitrary order and with any number of occurrences, use this:

<xsd:sequence maxOccurs="unbounded">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element
      name="elementA"
      type="tns:A"
      minOccurs="0"
      maxOccurs="unbounded"/>
    <xsd:element
      name="elementB"
      type="tns:B"
      minOccurs="0"
      maxOccurs="unbounded"/>
  </xsd:choice>
</xsd:sequence>
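The nested occurrence constraints above are redundant. As far as I can tell, a single repeatable choice accepts exactly the same documents (any mix of elementA and elementB, in any order, including none) and is easier to read. Like the original, this sketch assumes the types tns:A and tns:B are defined elsewhere; it goes directly into the xsd:complexType in place of the outer sequence.

```xml
<!-- Each iteration of the choice picks one element; the choice itself repeats -->
<xsd:choice minOccurs="0" maxOccurs="unbounded">
  <xsd:element name="elementA" type="tns:A"/>
  <xsd:element name="elementB" type="tns:B"/>
</xsd:choice>
```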

CDATA Content

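A CDATA section is merely a lexical way to avoid escaping characters such as < and &; the parser reports its content as ordinary character data. XML Schema therefore cannot distinguish between plain text content and content written as CDATA, so the declaration for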

<my-element>value</my-element>

and

<my-element><![CDATA[ ... text ... ]]></my-element>

is the same:

<xsd:element name="my-element" type="xsd:string"/>
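As a small illustration, the following two elements carry the identical text a < b && c, and a schema validator sees no difference between them:

```xml
<!-- Special characters escaped with entity references -->
<my-element>a &lt; b &amp;&amp; c</my-element>

<!-- The same character data inside a CDATA section -->
<my-element><![CDATA[a < b && c]]></my-element>
```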

Resources

The resources I typically consult are:

Link
