CDATA, PCDATA, and Entity References




Textual data contained in an XML element can be expressed as Character Data (CDATA), Parsed
Character Data (PCDATA), or a combination of the two. Data that appears between the <![CDATA[ and ]]>
delimiters is CDATA; any other data is PCDATA. The following element contains PCDATA:

<title>XSLT Programmers Reference</title>

The next element contains CDATA:

<author><![CDATA[Michael Kay]]></author>

And the following contains both:

<title>XSLT Programmers Reference <![CDATA[Author – Michael Kay]]></title>

As you can see, CDATA is useful when you want some parts of your XML document to be passed through
by the parser as literal text rather than interpreted as markup. This means you can put almost anything
between <![CDATA[ and ]]> and an XML parser won’t care; the one exception is the sequence ]]>, which
ends the section. Data not enclosed in a CDATA section, however, must conform to the rules of XML.
Often, CDATA sections are used to enclose code for scripting languages like VBScript or JavaScript.
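To see how a parser reports the three sample elements shown earlier, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of Python and the helper name element_text are assumptions made for illustration; the text itself does not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# The three sample elements from the text, as Python strings.
samples = {
    "pcdata": "<title>XSLT Programmers Reference</title>",
    "cdata": "<author><![CDATA[Michael Kay]]></author>",
    "mixed": "<title>XSLT Programmers Reference <![CDATA[Author – Michael Kay]]></title>",
}


def element_text(xml_source: str) -> str:
    """Parse a single element and return its character content (hypothetical helper)."""
    return ET.fromstring(xml_source).text or ""


texts = {name: element_text(source) for name, source in samples.items()}

for name, text in texts.items():
    print(f"{name}: {text!r}")
```

With this parser, the CDATA delimiters do not appear in the reported text: the application simply receives the characters, whether they were written as PCDATA, CDATA, or a mixture of the two.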

XML parsers pass CDATA through untouched but parse PCDATA; that is, they interpret it as markup. You
might wonder why an XML parser distinguishes between CDATA and PCDATA. Certain characters, notably
< and & (and, in some contexts, >), have special meaning in XML and must be either enclosed in CDATA
sections or written as entity references such as &lt; if they’re to appear as literal text. For example,
suppose you wanted to define an element named range whose value is ‘0 < counter < 1000’. Because < is
a reserved character, you can’t define the element this way:

<range>0 < counter < 1000</range>

You can, however, define it this way:

<range><![CDATA[0 < counter < 1000]]></range>
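The difference between the two range definitions can be checked with a parser. The following is a rough sketch, again assuming Python's standard xml.etree.ElementTree module; try_parse is a hypothetical helper, and the entity-reference variant is included only for comparison.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_source):
    """Return the element's text, or None if the source is not well-formed."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError:
        return None


# A bare < inside PCDATA is rejected by the parser.
raw = try_parse("<range>0 < counter < 1000</range>")

# Wrapped in a CDATA section, the same characters are accepted as literal text.
cdata = try_parse("<range><![CDATA[0 < counter < 1000]]></range>")

# Writing each < as the entity reference &lt; is an alternative to CDATA.
escaped = try_parse("<range>0 &lt; counter &lt; 1000</range>")

print(raw, cdata, escaped, sep="\n")
```

In this sketch the first form fails to parse, while the CDATA and entity-reference forms both yield the same element text.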

As you can see, CDATA sections are useful for including mathematical equations, code listings, and even
other XML documents in XML documents.
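As an illustration of embedding one XML document inside another, the sketch below wraps a small range document in a CDATA section and then parses the extracted text separately. It assumes Python's standard xml.etree.ElementTree module; the example element name is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# An outer document that carries a complete inner document as CDATA.
outer_source = (
    "<example><![CDATA[<range>0 &lt; counter &lt; 1000</range>]]></example>"
)

outer = ET.fromstring(outer_source)

# The inner markup is reported as plain text, not as a child element,
# and the entity references inside the CDATA section are left unexpanded.
embedded_text = outer.text
child_count = len(outer)

# The extracted text can be handed to a parser as a document in its own right.
inner = ET.fromstring(embedded_text)

print(embedded_text)
print(child_count, inner.tag, inner.text)
```

One limitation to keep in mind: because ]]> ends a CDATA section, an embedded document that itself contains a CDATA section cannot be wrapped this way without splitting or escaping that sequence.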