Components of an XML Document





As mentioned earlier in this chapter, XML is a language for describing data and the structure of data. XML
data is contained in a document, which can be a file, a stream, or any other storage medium, real or virtual,
that’s capable of holding text. An XML document should begin with an XML declaration such as the following,
which identifies the document as an XML document and specifies the version of XML that the document’s
contents conform to:


<?xml version="1.0"?>
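
A parser reports the declaration's contents to the application separately from the document's elements. The following minimal sketch assumes Python's standard-library expat bindings (not anything used elsewhere in this chapter); the function name read_xml_declaration is hypothetical. It reads the version from a declaration like the one above:

```python
import xml.parsers.expat


def read_xml_declaration(data: bytes):
    """Return (version, encoding, standalone) from a document's XML declaration.

    Hypothetical helper. expat reports the declaration through XmlDeclHandler:
    encoding is None when the declaration omits it, and standalone is -1
    when the standalone clause is omitted.
    """
    result = {}

    def on_decl(version, encoding, standalone):
        result["decl"] = (version, encoding, standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return result.get("decl")  # None if the document has no declaration


print(read_xml_declaration(b'<?xml version="1.0"?><Books/>'))  # ('1.0', None, -1)
```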


The XML declaration can also include an encoding attribute that identifies the character encoding used in
the document. For example, the following declaration specifies that the document is encoded in ISO-8859-1,
also known as Latin-1, which is closely related to the Windows-1252 character set used by Windows 95, 98, and Me:


<?xml version="1.0" encoding="ISO-8859-1"?>
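
To see why the declared encoding matters, consider a document whose bytes really are Latin-1. In this sketch, which again assumes Python's expat bindings and a hypothetical title_text helper, the character é is stored as the single byte 0xE9. The parser decodes it correctly only when the declaration names ISO-8859-1:

```python
import xml.parsers.expat


def title_text(data: bytes) -> str:
    """Parse a document and return its character data (raises ExpatError on failure)."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# Encode the element in Latin-1: 'é' becomes the single byte 0xE9.
body = '<Title>Café</Title>'.encode('iso-8859-1')

declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
undeclared = b'<?xml version="1.0"?>' + body

print(title_text(declared))  # Café

try:
    title_text(undeclared)
except xml.parsers.expat.ExpatError as err:
    # With no encoding attribute the parser assumes UTF-8, and the lone
    # byte 0xE9 is not valid UTF-8, so the document is rejected.
    print("rejected:", err)
```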


The next example identifies the encoding as UTF-16, which represents each Unicode character as one or two 16-bit code units:


<?xml version="1.0" encoding="UTF-16"?>
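
The sketch below, assuming Python's standard library, makes the "one or two 16-bit code units" point concrete and then parses a small UTF-16 document. The Books/Book element names are illustrative only. Python's utf-16 codec writes a byte order mark (BOM) before the text, which is what lets the parser recognize the encoding:

```python
import xml.parsers.expat

# Characters up to U+FFFF take one 16-bit code unit (2 bytes); characters
# beyond U+FFFF take a surrogate pair of two code units (4 bytes).
print(len('A'.encode('utf-16-le')))           # 2
print(len('\U0001F4D6'.encode('utf-16-le')))  # 4

doc = '<?xml version="1.0" encoding="UTF-16"?><Books><Book>XML</Book></Books>'
data = doc.encode('utf-16')  # BOM first, in the machine's native byte order
print(data[:2] in (b'\xff\xfe', b'\xfe\xff'))  # True

# Record element names as the parser encounters them.
names = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: names.append(name)
parser.Parse(data, True)
print(names)  # ['Books', 'Book']
```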


The encoding attribute is optional if the document is encoded in UTF-8 or UTF-16, because an XML parser can
infer the encoding from the document’s first few bytes: UTF-16 documents must begin with a byte order mark,
and a document that has neither a byte order mark nor an encoding attribute is assumed to be UTF-8. Documents
that use other encodings must declare them so that an XML parser can read them.
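
The detection described above can be sketched in a few lines. This is a simplified version of the non-normative heuristics in Appendix F of the XML 1.0 specification, limited to UTF-8 and UTF-16 (it ignores UTF-32 and EBCDIC); guess_encoding is a hypothetical name, and a real parser would go on to read the encoding attribute when one is present:

```python
def guess_encoding(data: bytes) -> str:
    """Guess a document's encoding from its first bytes (simplified sketch)."""
    if data.startswith(b'\xef\xbb\xbf'):
        return 'UTF-8'       # UTF-8 byte order mark
    if data.startswith(b'\xfe\xff'):
        return 'UTF-16BE'    # big-endian byte order mark
    if data.startswith(b'\xff\xfe'):
        return 'UTF-16LE'    # little-endian byte order mark
    if data.startswith(b'\x00<\x00?'):
        return 'UTF-16BE'    # '<?' as big-endian 16-bit units, no BOM
    if data.startswith(b'<\x00?\x00'):
        return 'UTF-16LE'    # '<?' as little-endian 16-bit units, no BOM
    # '<?xml' in an ASCII-compatible encoding, or no declaration at all:
    # assume UTF-8 unless the declaration's encoding attribute says otherwise.
    return 'UTF-8'


print(guess_encoding(b'<?xml version="1.0"?><Books/>'))           # UTF-8
print(guess_encoding('\ufeff<Books/>'.encode('utf-16-le')))       # UTF-16LE
print(guess_encoding('<?xml version="1.0"?>'.encode('utf-16-be')))  # UTF-16BE
```

Note that a Latin-1 document also falls through to the last branch, which is exactly why such documents must name their encoding in the declaration.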

XML declarations look like XML processing instructions, which contain commands for XML processors, although
the XML specification defines the declaration as a separate construct. Processing instructions are always
enclosed in <? and ?> delimiters. Some browsers, such as Internet Explorer, interpret the following processing
instruction to mean that the XML document should be formatted using an XSL style sheet named Books.xsl before
it’s displayed:


<?xml-stylesheet type="text/xsl" href="Books.xsl"?>
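
A parser hands each processing instruction to the application as a target name (here, xml-stylesheet) plus the remaining text, leaving it to the application, such as a browser, to act on it. This sketch assumes Python's expat bindings; note that the XML declaration at the top is not reported as a processing instruction, and the parser does not fetch Books.xsl:

```python
import xml.parsers.expat

doc = b'''<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Books.xsl"?>
<Books/>'''

pis = []
parser = xml.parsers.expat.ParserCreate()
# Called once per processing instruction with its target and the rest of its text.
parser.ProcessingInstructionHandler = lambda target, data: pis.append((target, data))
parser.Parse(doc, True)
print(pis)  # [('xml-stylesheet', 'type="text/xsl" href="Books.xsl"')]
```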