Working with XmlDocument Class




To be fully accessible, an XML document must be entirely loaded in memory, and its nodes and
attributes must be mapped to corresponding objects derived from the XmlNode class. The process that builds the XML
DOM is triggered when you call the Load() method. You can use a variety of sources to indicate the
XML document to work on, including disk files, URLs, streams, and text readers. But before
you load an XML document, you first need to create an XmlDocument object, which is the focus of the
next section.

Creating an XmlDocument

To load an XML document into memory for full-access processing, you create a new instance of the
XmlDocument class. The class features three constructors: the default parameterless constructor,
a public constructor that takes an XmlNameTable, and a protected internal constructor that takes
an XmlImplementation, as shown here:

public XmlDocument();
public XmlDocument(XmlNameTable nt);
protected internal XmlDocument(XmlImplementation imp);

The second overloaded constructor takes an XmlNameTable object as an argument, which allows the
class to work faster with attribute and node names and to optimize memory management. Just as the
XmlReader class does, XmlDocument builds its own name table incrementally while processing the
document. Passing a prepopulated name table, however, can speed up the overall execution. The third
constructor, which is protected internal rather than public, initializes an XmlDocument with the
specified XmlImplementation object. The XmlImplementation class is a special class that allows you to
define the context for a set of XmlDocument objects. This class provides methods for performing
operations that are independent of any particular instance of the DOM.
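To make the name-table discussion concrete, here is a minimal console sketch (not one of the book's listings) that passes a single prepopulated NameTable to two XmlDocument constructors. The class and method names are hypothetical; the sketch only checks that both documents report the same NameTable instance and that an element name such as book comes back as the same atomized string.

```csharp
using System.Xml;

// Hypothetical helper class, not part of the book's listings.
public static class NameTableDemo
{
    // Builds two documents over one prepopulated NameTable and checks that
    // they share it and that element names come back as the same atoms.
    public static bool DocumentsShareNameTable()
    {
        NameTable names = new NameTable();
        // Prepopulate (atomize) names we expect the documents to contain.
        string bookAtom = names.Add("book");
        names.Add("bookstore");

        XmlDocument doc1 = new XmlDocument(names);
        XmlDocument doc2 = new XmlDocument(names);
        doc1.LoadXml("<bookstore><book/></bookstore>");
        doc2.LoadXml("<bookstore><book/><book/></bookstore>");

        bool sameTable = object.ReferenceEquals(doc1.NameTable, names)
                      && object.ReferenceEquals(doc2.NameTable, names);

        // Atomized names can be compared by reference rather than by value.
        bool sameAtom =
            object.ReferenceEquals(doc1.DocumentElement.FirstChild.LocalName, bookAtom)
            && object.ReferenceEquals(doc2.DocumentElement.LastChild.LocalName, bookAtom);

        return sameTable && sameAtom;
    }
}
```

Calling NameTableDemo.DocumentsShareNameTable() from a console application should return true.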

In the base implementation of the XmlImplementation class, the list of operations
that various instances of the XmlDocument class can share is relatively short. These
operations include creating new documents (through the CreateDocument() method),
testing for supported features (through the HasFeature() method), and, more important,
sharing the same name table.

The following code snippet shows how to create two documents from the same implementation:

XmlImplementation xmlImpl = new XmlImplementation();
XmlDocument doc1 = xmlImpl.CreateDocument();
XmlDocument doc2 = xmlImpl.CreateDocument();

After you have an empty XmlDocument, you need to load it with XML data. The next section discusses
how to do this.

Loading XML Documents

You load an XML document by calling the Load() method, which reads XML data and populates
the document tree structure. There are four versions of the Load() method, each of which
reads the data from a different kind of source:

• Load(Stream): Loads the document from a Stream data source

• Load(string): Loads the document from the file path or URL given in the string

• Load(TextReader): Loads the document using a TextReader as the data source

• Load(XmlReader): Loads the document using the given XmlReader as the data source

The Load(string) overload accepts either a local file name or a URL, so you can load an XML
document from a Web location as easily as from disk. In addition to the overloaded Load()
methods, there is also a LoadXml() method that loads the XML document from a string containing
the XML markup itself.

Note that when you load a document into an XmlDocument object, the current contents of that
instance are cleared. This means that if you reuse the same instance of the XmlDocument class
to load a second document, the existing contents are entirely removed and replaced with the
contents of the second document.

Listing 6-2 shows two ways to load an XmlDocument: first from a disk file and then by using a string
variable that you have created in your application code.






09_596772 ch06.qxd  12/13/05  11:13 PM  Page 141

XML DOM Object Model

Listing 6-2: Loading XML Documents

<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">

void Page_Load(object sender, EventArgs e)
{
    string xmlPath = Request.PhysicalApplicationPath +
        @"\App_Data\Books.xml";
    XmlDocument booksDoc = new XmlDocument();
    XmlDocument empDoc = new XmlDocument();
    Response.ContentType = "text/xml";
    try
    {
        //Load the XML from the file
        booksDoc.PreserveWhitespace = true;
        booksDoc.Load(xmlPath);
        //Write the XML onto the browser
        Response.Write(booksDoc.InnerXml);
        //Load the XML from a String
        empDoc.LoadXml("<employees>" +
            "<employee id='1'>" +
            "<name><firstName>Nancy</firstName>" +
            "<lastName>Davolio</lastName>" +
            "</name><city>Seattle</city>" +
            "<state>WA</state><zipCode>98122</zipCode>" +
            "</employee></employees>");
        //Save the XML data onto a file
        empDoc.Save(@"C:\Data\Employees.xml");
    }
    catch (XmlException xmlEx)
    {
        Response.Write("XmlException: " + xmlEx.Message);
    }
    catch (Exception ex)
    {
        Response.Write("Exception: " + ex.Message);
    }
}

</script>

In Listing 6-2, the Page_Load event handler starts by declaring a string variable that holds the path to
the XML file. Then it creates two XmlDocument objects: one for loading an XML document from the
file system and the other for loading an XML document from a string. The ContentType property of
the Response object is then set to text/xml to indicate to the browser that the rendered content is
an XML document.

Response.ContentType = "text/xml";

Before loading the XML file, you also set the PreserveWhitespace property of the XmlDocument object
to true so that white space in the file is kept and the fidelity of the document is retained.

booksDoc.PreserveWhitespace = true;









The code then loads the XML file by invoking the Load() method of the XmlDocument passing in the
path to the XML file as an argument.

booksDoc.Load(xmlPath);

After that, the loaded XML content is written to the browser through the InnerXml property of the
XmlDocument object.

Response.Write(booksDoc.InnerXml);

The XML DOM programming interface also provides you with a LoadXml() method to build a DOM
from a well-formed XML string. That XML is then persisted to a file named Employees.xml by calling
the Save() method of the XmlDocument object. You see more on the Save() method in the “Creating
XML Documents” section later in this chapter.

empDoc.Save(@"C:\Data\Employees.xml");

When you load XML through the LoadXml() method, keep in mind that the string is parsed
without validation and that, as with Load(), insignificant white space is discarded unless
you set PreserveWhitespace to true before the call. Any context-specific information you
might need (such as a DTD, entities, or namespace declarations) must be embedded in the
string to be taken into account.
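The following console sketch, which is hypothetical and not part of Listing 6-2, exercises two loading behaviors: PreserveWhitespace has to be set before LoadXml() is called if you want white-space nodes kept, and loading into an instance that already holds a document (here through the Load(TextReader) overload) replaces the previous contents.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper comparing LoadXml() with and without PreserveWhitespace.
public static class LoadDemo
{
    // The newlines and indentation between elements are white-space nodes.
    const string Xml = "<employees>\n  <employee id='1'/>\n</employees>";

    // Counts the child nodes of the root element after LoadXml().
    public static int RootChildCount(bool preserveWhitespace)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = preserveWhitespace; // must be set before loading
        doc.LoadXml(Xml);
        return doc.DocumentElement.ChildNodes.Count;
    }

    // Reusing an instance: the second load discards the first document.
    public static string ReloadRootName()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        using (StringReader reader = new StringReader("<books><book/></books>"))
        {
            doc.Load(reader); // Load(TextReader) overload
        }
        return doc.DocumentElement.Name;
    }
}
```

With PreserveWhitespace set to true the root element has three children (white space, employee, white space); with it set to false only the employee element remains.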

All these lines of code that load and save the XML are enclosed in a try...catch block so that
any exceptions are caught and handled gracefully. In this case, the exception message is written
to the browser. If everything goes well, navigating to the page using the browser results in the
output shown in Figure 6-3.

Figure 6-3







Parsing an XML Document Using XmlDocument Class

After the XmlDocument is loaded with data, you need to be able to traverse the DOM tree. For this
purpose, the XmlDocument exposes a number of members, such as the DocumentElement and ChildNodes
properties. A natural way to traverse a tree data structure is by recursion. Listing 6-3 shows how
you can use recursion to traverse the XML DOM tree. As the code traverses the tree, it outputs each
node, including its text and attributes, to the browser.

Listing 6-3: Traversing DOM Tree Using XmlDocument Class

<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">

void Page_Load(object sender, EventArgs e)
{
    string xmlPath = Request.PhysicalApplicationPath +
        @"\App_Data\Books.xml";
    XmlDocument doc = new XmlDocument();
    doc.Load(xmlPath);
    XmlNode rootNode = doc.DocumentElement;
    DisplayNodes(rootNode);
}

void DisplayNodes(XmlNode node)
{
    //Print the node type, node name and node value of the node
    if (node.NodeType == XmlNodeType.Text)
    {
        Response.Write("Type= [" + node.NodeType + "] Value=" +
            node.Value + "<br>");
    }
    else
    {
        Response.Write("Type= [" + node.NodeType + "] Name=" +
            node.Name + "<br>");
    }
    //Print attributes of the node
    if (node.Attributes != null)
    {
        XmlAttributeCollection attrs = node.Attributes;
        foreach (XmlAttribute attr in attrs)
        {
            Response.Write("Attribute Name =" + attr.Name +
                "Attribute Value =" + attr.Value);
        }
    }
    //Print individual children of the node
    XmlNodeList children = node.ChildNodes;
    foreach (XmlNode child in children)
    {
        DisplayNodes(child);
    }
}

</script>
<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
    <title>Traversing the DOM Tree</title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
    </div>
    </form>
</body>
</html>

As you can see from Listing 6-3, the core class that forms the root of this tree is the XmlDocument class.
This code loads the XmlDocument with data from the Books.xml file and uses that as the basis to
traverse the document.

The XmlDocument is first instantiated, and the path to the file is then passed to its Load() method.
The document loads the XML from the file and automatically generates the DOM tree.

XmlDocument doc = new XmlDocument();
doc.Load(xmlPath);

Next, you get a handle to the root node of the document tree:

XmlNode rootNode = doc.DocumentElement;       

After the root node is obtained, the DisplayNodes() method is then invoked to recursively traverse
through all the children of that node. DisplayNodes() is generic enough to print details of any node
type. Remember that the DOM tree consists of nodes of different types (elements, attributes, processing
instructions, comments, text nodes, and so on). This example just prints the generic information about
the node (name, type); if it’s a text node, it prints the value of the node as well.

if (node.NodeType == XmlNodeType.Text)
{
    Response.Write("Type= [" + node.NodeType + "] Value=" +
        node.Value + "<br>");
}
else
{
    Response.Write("Type= [" + node.NodeType + "] Name=" +
        node.Name + "<br>");
}

Next, the code prints any attributes associated with the node.

if (node.Attributes != null)
{
    XmlAttributeCollection attrs = node.Attributes;
    foreach (XmlAttribute attr in attrs)
    {
        Response.Write("Attribute Name =" + attr.Name +
            "Attribute Value =" + attr.Value);
    }
}


The last step gets all the children of the current node and calls DisplayNodes() on each of them.
Note that the ChildNodes property returns only the direct children of the node. To reach all
descendants of a node, you must use recursive code such as the following:

XmlNodeList children = node.ChildNodes;
foreach (XmlNode child in children)
{

 DisplayNodes(child);

}

Navigate to the page from a browser and you should see something similar to Figure 6-4.

Figure 6-4
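If you want to experiment with the recursive traversal outside ASP.NET, the following sketch mirrors DisplayNodes() but collects one indented line per node in a List<string> instead of calling Response.Write(). TreeWalker and its output format are hypothetical; the indentation simply reflects the recursion depth.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical console counterpart of DisplayNodes().
public static class TreeWalker
{
    public static List<string> Describe(XmlNode root)
    {
        List<string> lines = new List<string>();
        Visit(root, 0, lines);
        return lines;
    }

    static void Visit(XmlNode node, int depth, List<string> lines)
    {
        string indent = new string(' ', depth * 2);
        // Text nodes have no meaningful name, so record their value instead.
        if (node.NodeType == XmlNodeType.Text)
            lines.Add(indent + "Text: " + node.Value);
        else
            lines.Add(indent + node.NodeType + ": " + node.Name);

        // Attributes are not children; they live in the Attributes collection.
        if (node.Attributes != null)
            foreach (XmlAttribute attr in node.Attributes)
                lines.Add(indent + "  @" + attr.Name + "=" + attr.Value);

        // ChildNodes holds only direct children; recursion reaches the rest.
        foreach (XmlNode child in node.ChildNodes)
            Visit(child, depth + 1, lines);
    }
}
```

For the fragment <book genre='novel'><title>Emma</title></book>, Describe() on the document element yields four lines: the book element, its genre attribute, the title element, and the text Emma.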

Finding Nodes

Using the ChildNodes, FirstChild, LastChild, NextSibling, PreviousSibling, ParentNode, and
OwnerDocument properties of the DOM, a program can navigate through a document hierarchy. You could
use these properties to build a function that searches the hierarchy for specific nodes. Fortunately,
the DOM provides several methods out of the box that make it unnecessary to write your own subroutines
that search for nodes within an XML document. To this end, the DOM objects provide methods such as
GetElementsByTagName(), GetElementById(), SelectNodes(), and SelectSingleNode() for finding
specific nodes. The following sections describe these methods in detail.
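As an illustration of the kind of search function you could write yourself, the following hypothetical helper walks the tree depth first using only the FirstChild and NextSibling properties and returns the first element with a given name, or null if there is none.

```csharp
using System.Xml;

// Hypothetical helper: a hand-written search built on navigation properties.
public static class NodeFinder
{
    // Depth-first, document-order search for the first element named 'name'.
    public static XmlElement FindFirst(XmlNode start, string name)
    {
        for (XmlNode child = start.FirstChild; child != null; child = child.NextSibling)
        {
            if (child is XmlElement element && element.Name == name)
                return element;

            // Search this child's subtree before moving on to its next sibling.
            XmlElement found = FindFirst(child, name);
            if (found != null)
                return found;
        }
        return null;
    }
}
```

Passing the XmlDocument itself as start searches the whole document, including the document element.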








GetElementsByTagName

The GetElementsByTagName() method returns an XmlNodeList containing references to nodes that
have a given name. Note that GetElementsByTagName() may return nodes at different levels of the
subtree, and some nodes may be descendants of others. For example, suppose an XML document
defines a book node that contains two child nodes that are also named book. If a program searched this
document for nodes named book, GetElementsByTagName() would return all three nodes. Depending
on what you want to do with the nodes, you may need to be careful not to process a node more than once.

Both the XmlDocument and XmlElement classes provide the GetElementsByTagName()

method. The XmlDocument version searches the entire document for nodes with the
given name. The XmlElement version of GetElementsByTagName() searches the
document subtree rooted at the element.
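The nested-book scenario, and the difference between the document-level and element-level versions, can be checked with a small sketch. The XML and the TagNameDemo class are made up for illustration.

```csharp
using System.Xml;

// Hypothetical helper; the nested <book> data is invented.
public static class TagNameDemo
{
    const string Xml =
        "<library><book id='outer'><book id='inner1'/><book id='inner2'/></book></library>";

    // Document-level search: every <book>, at any depth.
    public static int CountInDocument()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.GetElementsByTagName("book").Count;
    }

    // Element-level search: descendants of the outer <book> only, not the element itself.
    public static int CountUnderOuterBook()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlElement outer = (XmlElement)doc.DocumentElement.FirstChild;
        return outer.GetElementsByTagName("book").Count;
    }
}
```

CountInDocument() returns 3 and CountUnderOuterBook() returns 2, because the element-level search does not include the element on which it is called.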

The GetElementsByTagName() method has two overloads:

public XmlNodeList GetElementsByTagName(string name);
public XmlNodeList GetElementsByTagName(string localName, string namespaceURI);

The first overload returns the list of all descendant elements whose qualified name matches the
specified name. The second overload returns the descendant elements that match both the specified
local name and the specified namespace URI.
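The following hypothetical sketch contrasts the two overloads on a document in which one book element is in a namespace and one is not. The urn:example:wrox namespace URI is invented for the example.

```csharp
using System.Xml;

// Hypothetical helper contrasting the two GetElementsByTagName() overloads.
public static class NamespaceTagDemo
{
    const string Xml =
        "<catalog xmlns:w='urn:example:wrox'>" +
        "<w:book>Namespaced</w:book><book>Plain</book></catalog>";

    // Matches local name "book" in the urn:example:wrox namespace only.
    public static string ByLocalNameAndNamespace()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNodeList list = doc.GetElementsByTagName("book", "urn:example:wrox");
        return list.Count + ":" + list[0].InnerText;
    }

    // The one-argument overload compares the qualified Name, prefix included,
    // so "book" does not match <w:book>.
    public static string ByQualifiedName()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNodeList list = doc.GetElementsByTagName("book");
        return list.Count + ":" + list[0].InnerText;
    }
}
```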

GetElementById

The GetElementById() method returns the first node it finds with a specified ID attribute. Similar
to GetElementsByTagName(), this method searches for nodes that have a certain property. Unlike

GetElementsByTagName(), however, this method only returns the first match it finds rather than an

XmlNodeList containing all of the nodes that have the correct ID.

Because GetElementById() returns only the first element with an ID attribute that has the value you
specified, you can use SelectNodes() to do something similar when you want to examine all matching
nodes. For example, SelectNodes("//*[@Index='3']") returns an XmlNodeList containing all nodes that
have Index attributes with value 3. Note that this statement does not verify that the attributes are
declared with the ID type, so it's not exactly the same as GetElementById(). Although attributes of
type ID can be defined in either XSD schemas or DTDs, the current implementation of GetElementById()
only supports those defined in DTDs.

GetElementById() examines each node's attributes, looking for one that is declared as an ID. Simply
naming the attribute ID is not enough: the XML document must identify the attribute as having type ID.
After you find a node with a matching ID, you can use local navigation properties such as NextSibling,
PreviousSibling, and ParentNode to move to different parts of the document.
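Because GetElementById() depends on the attribute being declared as type ID, a DTD is required. The following sketch, with made-up data, declares isbn as an ID in an internal DTD subset and looks elements up by that value; it also shows that an attribute merely named id is ignored when no DTD declares it. IdLookupDemo is a hypothetical name.

```csharp
using System.Xml;

// Hypothetical helper; the book data and isbn values are invented.
public static class IdLookupDemo
{
    // The internal DTD subset declares isbn as an attribute of type ID.
    const string WithDtd =
        "<!DOCTYPE books [" +
        "<!ELEMENT books (book*)>" +
        "<!ELEMENT book (#PCDATA)>" +
        "<!ATTLIST book isbn ID #REQUIRED>" +
        "]>" +
        "<books><book isbn='b1'>First</book><book isbn='b2'>Second</book></books>";

    // Returns the text of the element whose ID matches, or null.
    public static string FindWithDtd(string id)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(WithDtd);
        XmlElement match = doc.GetElementById(id);
        return match == null ? null : match.InnerText;
    }

    // No DTD: an attribute merely named "id" is not treated as an ID.
    public static bool FoundWithoutDtd()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<books><book id='b1'>First</book></books>");
        return doc.GetElementById("b1") != null;
    }
}
```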

SelectNodes

The SelectNodes() method returns an XmlNodeList containing references to nodes that match a spec-
ified XPath expression. An XPath expression gives a node’s location within an XML document much as a
file path describes a file’s location on a disk. Although file paths are relatively simple, XPath allows you
to specify a very complex set of node criteria to select nodes. You will see more on XPath and the use of

SelectNodes() and SelectSingleNode() methods in a later section of this chapter.
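A minimal sketch of SelectNodes(), using an invented bookstore fragment: the first query is the //*[@Index='3'] expression from the GetElementById discussion, and the second shows that an XPath predicate can filter on element content as well as on attributes. The SelectNodesDemo class is hypothetical.

```csharp
using System.Xml;

// Hypothetical helper; the bookstore data is invented.
public static class SelectNodesDemo
{
    const string Xml =
        "<bookstore>" +
        "<book genre='novel' Index='3'><title>A</title><price>8.99</price></book>" +
        "<book genre='autobiography'><title>B</title><price>11.99</price></book>" +
        "<magazine Index='3'><title>C</title></magazine>" +
        "</bookstore>";

    // Every element, at any depth, whose Index attribute equals '3'.
    public static int CountIndexThree()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.SelectNodes("//*[@Index='3']").Count;
    }

    // Titles of books priced under 10; XPath converts the price text to a number.
    public static string CheapTitles()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        string result = "";
        foreach (XmlNode title in doc.SelectNodes("/bookstore/book[price < 10]/title"))
            result += title.InnerText;
        return result;
    }
}
```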








SelectSingleNode

The SelectSingleNode() method is similar to SelectNodes() except that it returns only the first node
that matches an XPath expression instead of all of the nodes that match. After a program has located the
matching node, it can use other document navigation properties such as NextSibling, PreviousSibling,
and ParentNode to move through the document. In some cases this can be more efficient than using
SelectNodes().
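The following hypothetical sketch shows SelectSingleNode() returning only the first match, the use of NextSibling to move on from it, and the null return when nothing matches.

```csharp
using System.Xml;

// Hypothetical helper; the bookstore data is invented.
public static class SelectSingleNodeDemo
{
    const string Xml =
        "<bookstore><book><title>A</title></book>" +
        "<book><title>B</title></book></bookstore>";

    // Gets the first <book> only, then steps to the next one via NextSibling.
    public static string FirstThenNext()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode first = doc.SelectSingleNode("/bookstore/book");
        XmlNode next = first.NextSibling;
        // The string indexer returns the first child element with that name.
        return first["title"].InnerText + next["title"].InnerText;
    }

    // SelectSingleNode() returns null, rather than throwing, when nothing matches.
    public static bool MissingIsNull()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.SelectSingleNode("/bookstore/magazine") == null;
    }
}
```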

An Example of Finding Nodes

Listing 6-4 provides an example of how to find nodes in an XML document using the

GetElementsByTagName() method.

Listing 6-4: Querying Nodes

<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">

void Page_Load(object sender, EventArgs e)
{
    string xmlPath = Request.PhysicalApplicationPath +
        @"\App_Data\Books.xml";
    XmlDocument doc = new XmlDocument();
    doc.Load(xmlPath);
    //Get all book titles in the XML file
    XmlNodeList titleList = doc.GetElementsByTagName("title");
    Response.Write("Titles: " + "<br>");
    foreach (XmlNode node in titleList)
    {
        Response.Write("Title : " + node.FirstChild.Value + "<br>");
    }
    //Get reference to the first author node in the XML file
    XmlNode authorNode = doc.GetElementsByTagName("author")[0];
    foreach (XmlNode child in authorNode.ChildNodes)
    {
        if ((child.Name == "first-name") &&
            (child.NodeType == XmlNodeType.Element))
        {
            Response.Write("First Name : " + child.FirstChild.Value + "<br>");
        }
        if ((child.Name == "last-name") &&
            (child.NodeType == XmlNodeType.Element))
        {
            Response.Write("Last Name : " + child.FirstChild.Value + "<br>");
        }
    }
}

</script>
<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
    <title>Querying for specific nodes</title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
    </div>
    </form>
</body>
</html>

In the code, you first get a reference to all the title nodes in the form of an XmlNodeList object.

XmlNodeList titleList = doc.GetElementsByTagName("title");

You then loop through the XmlNodeList collection and display the value of each title. Similarly, you
get a reference to the first author node in the XML document using the same GetElementsByTagName() method.

XmlNode authorNode = doc.GetElementsByTagName("author")[0];

As you can see from the code, the first element in the XmlNodeList collection is retrieved through
the [0] indexer. You then loop through the child nodes of the author node and display their contents
using Response.Write() statements.

Selecting a DOM Subtree Using the XmlNodeReader Class

Suppose you have selected a node about which you need more information. To scan all the nodes that
form its subtree using the XML DOM alone, you would typically use a recursive algorithm such as the
one in Listing 6-3. The XmlNodeReader class gives you an effective, ready-to-use alternative by
providing a reader over a given DOM node subtree. The following lines of code demonstrate this.

XmlDocument doc = new XmlDocument();
doc.Load(xmlPath);
//Get reference to the book node with the right genre attribute
XmlNode bookNode =
    doc.SelectSingleNode("/bookstore/book[@genre='autobiography']");
XmlNodeReader reader = new XmlNodeReader(bookNode);
while (reader.Read())
{
    //Display only the element names and values
    //(lstOutput is a ListBox control on the page)
    if (reader.NodeType == XmlNodeType.Element)
        lstOutput.Items.Add("Node Name:" + reader.Name);
    if (reader.NodeType == XmlNodeType.Text)
        lstOutput.Items.Add("Node Value:" + reader.Value);
}
reader.Close();

The while loop visits all the nodes belonging to the specified XML DOM subtree. The node reader is
initialized with the book node returned by SelectSingleNode(), so it reads only that node and its
descendants. After you have the node subtree in the form of an XmlNodeReader object, you can easily
loop through it using the Read() method.
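For a version of the XmlNodeReader snippet that runs outside a Web Form, the following sketch replaces the lstOutput ListBox with a List<string>. NodeReaderDemo is hypothetical, and the sample bookstore XML is invented for the example.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical console version of the XmlNodeReader snippet.
public static class NodeReaderDemo
{
    const string Xml =
        "<bookstore>" +
        "<book genre='novel'><title>Emma</title></book>" +
        "<book genre='autobiography'><title>The Autobiography of Benjamin Franklin</title></book>" +
        "</bookstore>";

    public static List<string> ReadSubtree()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode bookNode = doc.SelectSingleNode("/bookstore/book[@genre='autobiography']");

        List<string> output = new List<string>();
        // The reader sees only bookNode and its descendants.
        using (XmlNodeReader reader = new XmlNodeReader(bookNode))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    output.Add("Node Name:" + reader.Name);
                else if (reader.NodeType == XmlNodeType.Text)
                    output.Add("Node Value:" + reader.Value);
            }
        }
        return output;
    }
}
```

The novel's title never appears in the output, because it lies outside the subtree the reader was given.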







The XmlNodeReader reads and returns nodes from the subtree, including entity
reference nodes. The XmlNodeReader not only enforces the XML well-formedness
rules, but also expands default attributes and entities, if DTD information is present
in the XmlDocument.

Programmatically Creating XML Documents

If your primary goal is analyzing the contents of an XML document, you will probably find the XML
DOM parsing model much more effective than readers, in spite of the larger memory footprint and
set-up time it requires. A document loaded through the XML DOM can be modified, extended, shrunk,
and, more important, searched. The same can't be done with XML readers; XML readers solve a
different type of problem.

To create an XML document using the XML DOM API, you must first create the document in memory,
create its nodes, and then call the Save() method or one of its overloads. This approach gives you
great flexibility because you can work with the in-memory document efficiently until you finally
decide to save it.

In terms of the internal implementation, it is worth noting that the XML DOM's
Save() method makes use of an XML text writer to create the document. So unless
the content to be generated is complex and subject to a lot of conditions, using an
XML writer directly to create XML documents is generally faster.

The XmlDocument class provides a number of methods to create new nodes. These methods are named
consistently with the writing methods of the XmlWriter class you encountered in Chapter 4. The next
section reviews these methods in detail.

Creating and Appending Nodes

To add new nodes to the document, you must first use the XmlDocument class's factory methods to
create a new node and then add it somewhere in the document. They are called factory methods
because they are responsible for creating a new node of a given type. These methods start with
"Create" and end with the node type to create. For example, the method to create a new Text node is
named CreateTextNode() and the method to create a new Element node is called CreateElement().
Using the Create methods also ensures that the new node belongs to the document (through its
OwnerDocument property); a node created by a different XmlDocument must first be imported with
ImportNode() before it can be inserted. The following list reviews the most commonly used Create
methods and briefly describes what each one does:

• CreateAttribute(): Creates an Attribute node with the given name

• CreateCDataSection(): Creates a CDATA section with the specified content

• CreateComment(): Creates a Comment node with the specified content

• CreateDocumentFragment(): Creates an empty DocumentFragment node

• CreateElement(): Creates an Element node with the given tag name

• CreateEntityReference(): Creates an EntityReference node

• CreateProcessingInstruction(): Creates a ProcessingInstruction node with the given content

• CreateTextNode(): Creates a new Text node with the specified content

For example, suppose you wanted to add another <book> element to the bookstore document. To do so,
you would need to create several new nodes to hold the information: the <book> element itself, an
<author> element to group the name, and an element node for each of the four data tags (<title>,
<first-name>, <last-name>, and <price>). The text that goes inside those four tags consists of nodes
too, and so does the genre attribute on the <book> tag.

Note that the XML DOM API in the .NET Framework also provides the InsertAfter() method, which
inserts a node after another node, but this method is not part of the standard W3C DOM API.

Now that you have a general understanding of the methods required for creating nodes, it is time to
look at the basic steps to create an XML document on the fly. They are as follows:

• Create any necessary nodes

• Link the nodes to create a tree

• Append the tree to the in-memory XML document

• Optionally save the document
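The four steps can be sketched in a console application as follows. This is one possible way to do it, not the book's code: BuildBookDemo is hypothetical, the sample values (the philosophy genre, Sample Title, Jane Doe, 9.99) are made up, and a StringWriter stands in for a file so the saved output can be inspected.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper; the book data below is invented sample content.
public static class BuildBookDemo
{
    public static XmlDocument Build()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<bookstore/>");

        // 1. Create the nodes through the owning document's factory methods.
        XmlElement book = doc.CreateElement("book");
        XmlAttribute genre = doc.CreateAttribute("genre");
        genre.Value = "philosophy";
        XmlElement title = doc.CreateElement("title");
        XmlElement author = doc.CreateElement("author");
        XmlElement firstName = doc.CreateElement("first-name");
        XmlElement lastName = doc.CreateElement("last-name");
        XmlElement price = doc.CreateElement("price");

        // 2. Link the nodes into a tree; text content is a node of its own.
        book.Attributes.Append(genre);
        title.AppendChild(doc.CreateTextNode("Sample Title"));
        firstName.AppendChild(doc.CreateTextNode("Jane"));
        lastName.AppendChild(doc.CreateTextNode("Doe"));
        price.AppendChild(doc.CreateTextNode("9.99"));
        author.AppendChild(firstName);
        author.AppendChild(lastName);
        book.AppendChild(title);
        book.AppendChild(author);
        book.AppendChild(price);

        // 3. Append the finished tree to the in-memory document.
        doc.DocumentElement.AppendChild(book);
        return doc;
    }

    // 4. Optionally save; a StringWriter stands in for a file here.
    public static string SaveToString(XmlDocument doc)
    {
        using (StringWriter writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

Building the whole subtree before appending it keeps the document unchanged until the new <book> is complete.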

Before you create the other nodes, you should first create the standard XML declaration, which must
be the first node in the document. The following code creates the XML prolog by appending the XML
declaration and a comment node to the XmlDocument instance:

XmlDocument doc = new XmlDocument();
// Write and append the XML heading
XmlNode declarationNode = doc.CreateXmlDeclaration("1.0", "", "");
doc.AppendChild(declarationNode);
// Write and append some comment
XmlNode comment = doc.CreateComment("This file represents " +
    "a fragment of a book store inventory database");
doc.AppendChild(comment);

The CreateXmlDeclaration() method takes three arguments: the XML version, the encoding, and a value
denoting whether the document can be considered stand-alone or has dependencies on other documents.
All three arguments are strings, including the stand-alone flag, which must be "yes", "no", or an
empty string (or null) to omit the attribute. For example, passing "1.0", "utf-8", and "yes"
produces the following declaration:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

If specified, the encoding is written in the XML declaration and used by Save() to create the actual out-
put stream. If the encoding is null or empty, no encoding attribute is set, and the default Unicode
Universal Character Set Transformation Format, 8-bit form (UTF-8) encoding is used.

CreateXmlDeclaration() returns an XmlDeclaration node that you add as a child to the XmlDocument
object. CreateComment(), on the other hand, creates an XmlComment node that represents an XML
comment, as shown here:

<!--This file represents a fragment of a book store inventory database-->
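Here is a minimal sketch of the prolog code with an explicit encoding and stand-alone flag. PrologDemo is hypothetical; its second method shows that a stand-alone value other than "yes", "no", or an empty string is rejected with an ArgumentException.

```csharp
using System;
using System.Xml;

// Hypothetical helper illustrating CreateXmlDeclaration() and CreateComment().
public static class PrologDemo
{
    public static XmlDocument BuildProlog()
    {
        XmlDocument doc = new XmlDocument();
        // Arguments: version ("1.0"), encoding, standalone ("yes", "no", or empty/null).
        XmlDeclaration declaration = doc.CreateXmlDeclaration("1.0", "utf-8", "yes");
        doc.AppendChild(declaration);
        // The spaces inside the comment text are written out verbatim.
        doc.AppendChild(doc.CreateComment(" book store inventory fragment "));
        // A document element is still needed for a well-formed document.
        doc.AppendChild(doc.CreateElement("bookstore"));
        return doc;
    }

    // Any standalone value other than "yes", "no", or empty is rejected.
    public static bool RejectsInvalidStandalone()
    {
        XmlDocument doc = new XmlDocument();
        try
        {
            doc.CreateXmlDeclaration("1.0", null, "maybe");
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```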




