Learn Xml: Working with XmlDocument Class

To be fully accessible, an XML document must be entirely loaded in memory, with its nodes and attributes mapped to corresponding objects derived from the XmlNode class. The process that builds the XML DOM is triggered when you call the Load() method. You can use a variety of sources to indicate the XML document to work on, including disk files, URLs, streams, and text readers. Before you can load a document, however, you need to create an XmlDocument instance, which is the focus of the next section.

Creating an XmlDocument

To load an XML document into memory for full-access processing, you create a new instance of the XmlDocument class. The class features three constructors: the default parameterless constructor, one that takes a name table, and one that takes an implementation object:

public XmlDocument();
public XmlDocument(XmlNameTable nt);
protected internal XmlDocument(XmlImplementation imp);

The second overloaded constructor takes an XmlNameTable object as an argument, which allows the class to work faster with attribute and node names and to optimize memory management. Just as the XmlReader class does, XmlDocument builds its own name table incrementally while processing the document. Passing a name table that has already been populated, however, can speed up the overall execution. The third constructor initializes an XmlDocument with the specified XmlImplementation object. Because it is protected internal rather than public, application code normally obtains such documents through the XmlImplementation.CreateDocument() method instead. The XmlImplementation class defines the context for a set of XmlDocument objects and provides methods for performing operations that are independent of any particular instance of the DOM.

In the base implementation of the XmlImplementation class, the list of operations that various XmlDocument instances can share is relatively short. These operations include creating new documents (through the CreateDocument() method), testing for supported features (through the HasFeature() method), and, more important, sharing the same name table.

The following code snippet shows how to create two documents from the same implementation:

XmlImplementation xmlImpl = new XmlImplementation();
XmlDocument doc1 = xmlImpl.CreateDocument();
XmlDocument doc2 = xmlImpl.CreateDocument();
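The following console sketch shows what sharing an implementation buys you. It assumes a .NET project that supports C# top-level statements and is not part of the chapter's ASP.NET listings. It compares the name tables of documents created in different ways and calls HasFeature() on the implementation.

```csharp
using System;
using System.Xml;

// One XmlImplementation acts as the shared context for several documents.
XmlImplementation impl = new XmlImplementation();
XmlDocument doc1 = impl.CreateDocument();
XmlDocument doc2 = impl.CreateDocument();

// Documents created from the same implementation share a single name table.
bool sameNameTable = object.ReferenceEquals(doc1.NameTable, doc2.NameTable);

// A default-constructed document gets its own implementation and name table.
XmlDocument other = new XmlDocument();
bool sharedWithOther = object.ReferenceEquals(doc1.NameTable, other.NameTable);

// The XmlNameTable constructor is the other way to share a name table explicitly.
NameTable names = new NameTable();
XmlDocument doc3 = new XmlDocument(names);
bool usesGivenTable = object.ReferenceEquals(doc3.NameTable, names);

// HasFeature() reports DOM support; the base implementation recognizes "XML".
bool supportsXml = impl.HasFeature("XML", "1.0");

Console.WriteLine($"doc1/doc2 share a name table: {sameNameTable}");     // True
Console.WriteLine($"doc1/other share a name table: {sharedWithOther}");  // False
Console.WriteLine($"doc3 uses the supplied table: {usesGivenTable}");    // True
Console.WriteLine($"HasFeature(\"XML\", \"1.0\"): {supportsXml}");        // True
```

The reference comparisons are the point here. Documents that share a name table hold the same atomized name strings, which is where the speed and memory benefits described above come from.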

After you have an empty XmlDocument, you need to load it with XML data. The next section discusses how to do this.

Loading XML Documents

An XML document is loaded by calling the Load() method, which reads XML data and populates the document tree structure. There are four overloads of the Load() method, each of which reads the data from a different kind of source:

Load(Stream): Loads the document from a Stream data source

Load(string): Loads the document using the given file name string

Load(TextReader): Loads the document using a TextReader as the data source

Load(XmlReader): Loads the document using the given XmlReader as the data source

In addition to Stream, TextReader, and XmlReader objects, the Load() method accepts a string argument. That string can be either a local file path or a URL, so you can load an XML document from disk or from a Web location. Apart from the overloaded Load() methods, there is also a method named LoadXml(), which loads the document from a string that contains the XML markup itself.
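The following console sketch loads the same small, made-up XML fragment through LoadXml() and three of the Load() overloads. It then checks that every resulting tree serializes identically. It uses C# top-level statements, and it leaves out Load(string) only because that overload needs a real file path or URL.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

const string xml = "<books><book genre=\"novel\"><title>Sample</title></book></books>";

// LoadXml(string): the argument is the markup itself, not a file name or URL.
XmlDocument fromString = new XmlDocument();
fromString.LoadXml(xml);

// Load(TextReader): StringReader wraps an in-memory string.
XmlDocument fromTextReader = new XmlDocument();
using (StringReader sr = new StringReader(xml))
{
    fromTextReader.Load(sr);
}

// Load(Stream): raw bytes; with no byte order mark or encoding declaration,
// the parser treats them as UTF-8.
XmlDocument fromStream = new XmlDocument();
using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
{
    fromStream.Load(ms);
}

// Load(XmlReader): useful when you want to control parsing through settings.
XmlDocument fromXmlReader = new XmlDocument();
XmlReaderSettings settings = new XmlReaderSettings { IgnoreComments = true, CloseInput = true };
using (XmlReader xr = XmlReader.Create(new StringReader(xml), settings))
{
    fromXmlReader.Load(xr);
}

// All four documents end up with the same tree.
Console.WriteLine(fromString.OuterXml);
Console.WriteLine(fromString.OuterXml == fromTextReader.OuterXml &&
                  fromString.OuterXml == fromStream.OuterXml &&
                  fromString.OuterXml == fromXmlReader.OuterXml);
```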

Note that loading a document into an existing XmlDocument object clears the current contents of that instance. If you reuse the same XmlDocument instance to load a second document, the existing contents are entirely removed and replaced with the contents of the second document.
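A quick way to confirm this behavior is to load two unrelated documents into one instance. This console sketch does that with illustrative XML strings:

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();

doc.LoadXml("<books><book>First</book></books>");
XmlElement firstRoot = doc.DocumentElement;
Console.WriteLine(firstRoot.Name);                              // books

// Loading again into the same instance discards the previous tree entirely.
doc.LoadXml("<employees><employee>Nancy</employee></employees>");
Console.WriteLine(doc.DocumentElement.Name);                    // employees
Console.WriteLine(doc.GetElementsByTagName("book").Count);      // 0
Console.WriteLine(object.ReferenceEquals(firstRoot, doc.DocumentElement)); // False
```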


Listing 6-2 shows two ways to load an XmlDocument: first from a disk file and then from a string variable that you create in your application code.

09_596772 ch06.qxd 12/13/05 11:13 PM Page 141

XML DOM Object Model

Listing 6-2: Loading XML Documents

<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
    void Page_Load(object sender, EventArgs e)
    {
        string xmlPath = Request.PhysicalApplicationPath +
            @"\App_Data\Books.xml";
        XmlDocument booksDoc = new XmlDocument();
        XmlDocument empDoc = new XmlDocument();
        Response.ContentType = "text/xml";
        try
        {
            //Load the XML from the file
            booksDoc.PreserveWhitespace = true;
            booksDoc.Load(xmlPath);
            //Write the XML onto the browser
            Response.Write(booksDoc.InnerXml);
            //Load the XML from a String
            empDoc.LoadXml("<employees>" +
                "<employee id='1'>" +
                "<name><firstName>Nancy</firstName>" +
                "<lastName>Davolio</lastName>" +
                "</name><city>Seattle</city>" +
                "<state>WA</state><zipCode>98122</zipCode>" +
                "</employee></employees>");
            //Save the XML data onto a file
            empDoc.Save(@"C:\Data\Employees.xml");
        }
        catch (XmlException xmlEx)
        {
            Response.Write("XmlException: " + xmlEx.Message);
        }
        catch (Exception ex)
        {
            Response.Write("Exception: " + ex.Message);
        }
    }
</script>

In Listing 6-2, the Page_Load event handler starts by declaring a string variable that holds the path to the XML file. It then creates two XmlDocument instances: one for loading an XML document from the file system and one for loading an XML document from a string. Next, the ContentType property of the Response object is set to text/xml to tell the browser that the rendered content is an XML document.

Response.ContentType = "text/xml";

Before loading the XML file, you also set the PreserveWhitespace property of the XmlDocument object to true so that whitespace is preserved and the document's fidelity is retained.

booksDoc.PreserveWhitespace = true;


The code then loads the XML file by invoking the Load() method of the XmlDocument, passing in the path to the XML file as an argument.

booksDoc.Load(xmlPath);

After that, the loaded XML content is written to the browser through the InnerXml property of the XmlDocument object.

Response.Write(booksDoc.InnerXml);

The XML DOM programming interface also provides a LoadXml() method that builds a DOM from a well-formed XML string. In the listing, that XML is then persisted to a file named Employees.xml by calling the Save() method of the XmlDocument object. You see more on the Save() method in the “Programmatically Creating XML Documents” section later in this chapter.

empDoc.Save(@"C:\Data\Employees.xml");

When you load XML through the LoadXml() method, keep in mind that this method does not perform DTD or schema validation. By default, it also does not preserve whitespace; set PreserveWhitespace to true before the call if you need it. Any context-specific information you might need (such as a DTD, entities, or namespace declarations) must be embedded in the string to be taken into account.
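The following console sketch contrasts the default behavior with PreserveWhitespace set to true before LoadXml() is called. The small XML string is a made-up example.

```csharp
using System;
using System.Xml;

const string xml = "<book>\n  <title>Sample</title>\n</book>";

// Default (PreserveWhitespace = false): whitespace-only text between elements is dropped.
XmlDocument compact = new XmlDocument();
compact.LoadXml(xml);

// Setting PreserveWhitespace before calling LoadXml() keeps those nodes.
XmlDocument faithful = new XmlDocument { PreserveWhitespace = true };
faithful.LoadXml(xml);

Console.WriteLine(compact.DocumentElement.ChildNodes.Count);     // 1: just <title>
Console.WriteLine(faithful.DocumentElement.ChildNodes.Count);    // 3: whitespace, <title>, whitespace
Console.WriteLine(faithful.DocumentElement.FirstChild.NodeType); // Whitespace
```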

All the lines of code that load and save the XML are enclosed in a try...catch block so that any exceptions are caught and handled gracefully. In this case, the exception message is displayed in the browser. If everything goes well, navigating to the page with a browser produces the output shown in Figure 6-3.

Figure 6-3


Parsing an XML Document Using XmlDocument Class

After the XmlDocument is loaded with data, you need to be able to traverse the DOM tree. For this purpose, the XmlDocument class and its base class, XmlNode, expose a number of properties and methods. A natural way to traverse a tree data structure is by recursion. Listing 6-3 shows how you can use recursion to traverse the XML DOM tree. As the code traverses the tree, it writes each node to the browser, including text nodes and attributes.

Listing 6-3: Traversing DOM Tree Using XmlDocument Class

<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
    void Page_Load(object sender, EventArgs e)
    {
        string xmlPath = Request.PhysicalApplicationPath +
            @"\App_Data\Books.xml";
        XmlDocument doc = new XmlDocument();
        doc.Load(xmlPath);
        XmlNode rootNode = doc.DocumentElement;
        DisplayNodes(rootNode);
    }

    void DisplayNodes(XmlNode node)
    {
        //Print the node type, node name and node value of the node
        if (node.NodeType == XmlNodeType.Text)
        {
            Response.Write("Type= [" + node.NodeType + "] Value=" +
                node.Value + "<br>");
        }
        else
        {
            Response.Write("Type= [" + node.NodeType + "] Name=" +
                node.Name + "<br>");
        }
        //Print attributes of the node
        if (node.Attributes != null)
        {
            XmlAttributeCollection attrs = node.Attributes;
            foreach (XmlAttribute attr in attrs)
            {
                Response.Write("Attribute Name = " + attr.Name +
                    " Attribute Value = " + attr.Value + "<br>");
            }
        }
        //Print individual children of the node
        XmlNodeList children = node.ChildNodes;
        foreach (XmlNode child in children)
        {
            DisplayNodes(child);
        }
    }
</script>
<html>
<head>
    <title>Traversing the DOM Tree</title>
</head>
<body>
    <form runat="server">
    <div>
    </div>
    </form>
</body>
</html>

As you can see from Listing 6-3, the class that forms the root of this tree is the XmlDocument class. The code loads the XmlDocument with data from the Books.xml file and uses that as the basis for traversing the document.

The XmlDocument is first instantiated, and then the path to the XML file is passed to its Load() method. The document loads the XML from the file and automatically generates the DOM tree.

XmlDocument doc = new XmlDocument();
doc.Load(xmlPath);

Next, you get a handle to the root node of the document tree:

XmlNode rootNode = doc.DocumentElement;

After the root node is obtained, the DisplayNodes() method is invoked to recursively traverse all the children of that node. DisplayNodes() is generic enough to print details of any node type. Remember that the DOM tree consists of nodes of different types (elements, attributes, processing instructions, comments, text nodes, and so on). This example prints generic information about each node: its type and its name. For a text node, it prints the node's value instead of its name.

if (node.NodeType == XmlNodeType.Text)
{
    Response.Write("Type= [" + node.NodeType + "] Value=" +
        node.Value + "<br>");
}
else
{
    Response.Write("Type= [" + node.NodeType + "] Name=" +
        node.Name + "<br>");
}

Next, the code prints any attributes associated with the node.

if (node.Attributes != null)
{
    XmlAttributeCollection attrs = node.Attributes;
    foreach (XmlAttribute attr in attrs)
    {
        Response.Write("Attribute Name = " + attr.Name +
            " Attribute Value = " + attr.Value + "<br>");
    }
}

The last step gets all the children of the current node and calls DisplayNodes() on each of them. Note that the ChildNodes property returns only the direct children of the node. To reach all descendants of a node, you must use recursive code such as the following:

XmlNodeList children = node.ChildNodes;
foreach (XmlNode child in children)
{
    DisplayNodes(child);
}

Navigate to the page from a browser and you should see something similar to Figure 6-4.

Figure 6-4
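Outside ASP.NET, the same recursion can be tried as a console program. In this sketch, CollectNodes() is a hypothetical stand-in for DisplayNodes() that adds each line to a List<string> instead of calling Response.Write(). An inline XML string replaces Books.xml.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Console version of DisplayNodes(): collects lines instead of writing to a response.
void CollectNodes(XmlNode node, List<string> output)
{
    if (node.NodeType == XmlNodeType.Text)
        output.Add($"Type=[{node.NodeType}] Value={node.Value}");
    else
        output.Add($"Type=[{node.NodeType}] Name={node.Name}");

    // Attributes is null for node types that cannot carry attributes (e.g. text).
    if (node.Attributes != null)
    {
        foreach (XmlAttribute attr in node.Attributes)
            output.Add($"Attribute {attr.Name}={attr.Value}");
    }

    // ChildNodes holds only direct children; the recursive call reaches deeper levels.
    foreach (XmlNode child in node.ChildNodes)
        CollectNodes(child, output);
}

XmlDocument doc = new XmlDocument();
doc.LoadXml("<book genre=\"autobiography\"><title>Sample Title</title></book>");

List<string> lines = new List<string>();
CollectNodes(doc.DocumentElement, lines);
foreach (string line in lines)
    Console.WriteLine(line);
// Type=[Element] Name=book
// Attribute genre=autobiography
// Type=[Element] Name=title
// Type=[Text] Value=Sample Title
```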

Finding Nodes

Using the ChildNodes, FirstChild, LastChild, NextSibling, PreviousSibling, ParentNode, and OwnerDocument properties of the DOM, a program can navigate through a document hierarchy. You could use these properties to build a function that searches the hierarchy for specific nodes. Fortunately, the DOM provides several methods out of the box that remove the need to write your own subroutines for searching nodes within an XML document. To this end, the DOM objects provide methods such as GetElementsByTagName(), GetElementById(), SelectNodes(), and SelectSingleNode() for finding specific nodes. The following sections describe these methods in detail.
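As a point of comparison for the built-in methods, this console sketch shows the kind of hand-written search routine the paragraph mentions. FindElements() is a hypothetical helper that uses only FirstChild and NextSibling to walk the tree depth-first. The XML is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: depth-first search built only from FirstChild and
// NextSibling, collecting every element with the given name.
void FindElements(XmlNode node, string name, List<XmlElement> found)
{
    for (XmlNode child = node.FirstChild; child != null; child = child.NextSibling)
    {
        if (child is XmlElement element && element.Name == name)
            found.Add(element);
        FindElements(child, name, found);
    }
}

XmlDocument doc = new XmlDocument();
doc.LoadXml("<bookstore><book><title>A</title></book><book><title>B</title></book></bookstore>");

List<XmlElement> titles = new List<XmlElement>();
FindElements(doc, "title", titles);

foreach (XmlElement t in titles)
{
    // ParentNode and OwnerDocument lead back up the hierarchy.
    Console.WriteLine($"{t.InnerText} (parent: {t.ParentNode.Name}, same document: {t.OwnerDocument == doc})");
}
```

The built-in GetElementsByTagName("title") returns the same two elements, without the helper.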

GetElementsByTagName

The GetElementsByTagName() method returns an XmlNodeList containing references to nodes that have a given name. Note that GetElementsByTagName() may return nodes at different levels of the subtree, and some of those nodes may be descendants of others. For example, suppose an XML document defines a book node that contains two child nodes that are also named book. If a program searched this document for nodes named book, GetElementsByTagName() would return all three nodes. Depending on what you want to do with the nodes, you may need to be careful not to process a node more than once.

Both the XmlDocument and XmlElement classes provide the GetElementsByTagName() method. The XmlDocument version searches the entire document for nodes with the given name. The XmlElement version searches only the subtree rooted at that element, that is, its descendants.

The GetElementsByTagName() method has two overloads:

public XmlNodeList GetElementsByTagName(string name);
public XmlNodeList GetElementsByTagName(string localName, string namespaceURI);

The first overload returns the list of all descendant elements whose qualified name matches the specified name. The second returns all descendant elements that match both the specified local name and the specified namespace URI.
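The following console sketch reproduces the nested-book case with a made-up document. It contrasts the XmlDocument and XmlElement versions of the method:

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(
    "<library>" +
    "<book id=\"outer\">" +
    "<book id=\"inner1\"/>" +
    "<book id=\"inner2\"/>" +
    "</book>" +
    "</library>");

// XmlDocument version: every <book> in the document, at any depth, in document order.
XmlNodeList allBooks = doc.GetElementsByTagName("book");

// XmlElement version: only descendants of the outer <book>, not the element itself.
XmlElement outer = (XmlElement)allBooks[0];
XmlNodeList innerBooks = outer.GetElementsByTagName("book");

Console.WriteLine(allBooks.Count);   // 3
Console.WriteLine(innerBooks.Count); // 2
```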

GetElementById

The GetElementById() method returns the first node it finds with a specified ID attribute value. Like GetElementsByTagName(), this method searches for nodes that have a certain property. Unlike GetElementsByTagName(), however, this method returns only the first match it finds rather than an XmlNodeList containing all of the nodes that have the correct ID.

GetElementById() returns only the first element with an ID attribute that has the value you specified. If you want to examine all matching nodes, you can use SelectNodes() to do something similar. For example, SelectNodes("//*[@Index='3']") returns an XmlNodeList containing all nodes that have an Index attribute with the value 3. Note that this statement does not verify that the attributes are declared with the ID type, so it is not exactly equivalent to GetElementById(). Although attributes of type ID can be defined in either XSD schemas or DTDs, the current implementation of GetElementById() supports only those defined in DTDs.

GetElementById() examines each node's attributes, looking for one that is declared as an ID. Simply naming the attribute ID is not enough; the document must declare the attribute as having type ID, for example in its DTD. After you find a node with a matching ID, you can use local navigation properties such as NextSibling, PreviousSibling, and ParentNode to move to different parts of the document.
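This console sketch illustrates the DTD requirement using two made-up documents. The first declares isbn as an ID attribute in an internal DTD subset. The second only has an attribute named id, so GetElementById() finds nothing there, while an XPath query on the attribute value still works.

```csharp
using System;
using System.Xml;

// The internal DTD subset declares the isbn attribute as type ID.
const string withDtd =
    "<!DOCTYPE bookstore [" +
    "<!ELEMENT bookstore (book*)>" +
    "<!ELEMENT book (#PCDATA)>" +
    "<!ATTLIST book isbn ID #REQUIRED>" +
    "]>" +
    "<bookstore><book isbn=\"b1\">First</book><book isbn=\"b2\">Second</book></bookstore>";

XmlDocument doc = new XmlDocument();
doc.LoadXml(withDtd);
XmlElement match = doc.GetElementById("b2");
Console.WriteLine(match?.InnerText ?? "(not found)");        // Second

// No DTD: an attribute that is merely named "id" is not an ID attribute.
XmlDocument noDtd = new XmlDocument();
noDtd.LoadXml("<bookstore><book id=\"b2\">Second</book></bookstore>");
Console.WriteLine(noDtd.GetElementById("b2") == null);       // True

// SelectNodes() matches on the attribute value alone, with no ID typing involved.
Console.WriteLine(noDtd.SelectNodes("//*[@id='b2']").Count); // 1
```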

SelectNodes

The SelectNodes() method returns an XmlNodeList containing references to nodes that match a specified XPath expression. An XPath expression gives a node's location within an XML document much as a file path describes a file's location on a disk. Although file paths are relatively simple, XPath allows you to specify a very complex set of criteria for selecting nodes. You will see more on XPath and the use of the SelectNodes() and SelectSingleNode() methods in a later section of this chapter.
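This console sketch shows what XPath adds over a plain path. Using an illustrative bookstore fragment, it selects the titles of novels priced under 10:

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(
    "<bookstore>" +
    "<book genre=\"novel\"><title>A</title><price>8.99</price></book>" +
    "<book genre=\"autobiography\"><title>B</title><price>9.99</price></book>" +
    "<book genre=\"novel\"><title>C</title><price>11.99</price></book>" +
    "</bookstore>");

// Like a file path, the expression walks down from the root; the predicate
// in [...] adds conditions (attribute value, numeric comparison) a path cannot express.
XmlNodeList cheapNovels = doc.SelectNodes("/bookstore/book[@genre='novel' and price < 10]/title");

foreach (XmlNode title in cheapNovels)
    Console.WriteLine(title.InnerText);   // A
```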


SelectSingleNode

The SelectSingleNode() method is similar to SelectNodes() except that it returns only the first node that matches an XPath expression instead of all of the matching nodes. After a program has located the matching node, it can use other document navigation properties such as NextSibling, PreviousSibling, and ParentNode to move through the document. In some cases this can be more efficient than using SelectNodes().
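The sketch below is again a console program with made-up data. It shows SelectSingleNode() returning only the first of two matches, then moving from that node with NextSibling and ParentNode:

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(
    "<bookstore>" +
    "<book genre=\"novel\"><title>A</title></book>" +
    "<book genre=\"autobiography\"><title>B</title></book>" +
    "<book genre=\"novel\"><title>C</title></book>" +
    "</bookstore>");

// Only the first match is returned, even though two books are novels.
XmlNode firstNovel = doc.SelectSingleNode("/bookstore/book[@genre='novel']");
Console.WriteLine(firstNovel["title"].InnerText);     // A

// From there, local navigation moves through the document without another query.
XmlNode next = firstNovel.NextSibling;
Console.WriteLine(next.Attributes["genre"].Value);    // autobiography
Console.WriteLine(next.ParentNode.Name);              // bookstore

// A query with no match returns null rather than an empty list.
Console.WriteLine(doc.SelectSingleNode("/bookstore/magazine") == null); // True
```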

An Example of Finding Nodes

Listing 6-4 provides an example of how to find nodes in an XML document using the GetElementsByTagName() method.

Listing 6-4: Querying Nodes

<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
    void Page_Load(object sender, EventArgs e)
    {
        string xmlPath = Request.PhysicalApplicationPath +
            @"\App_Data\Books.xml";
        XmlDocument doc = new XmlDocument();
        doc.Load(xmlPath);
        //Get all book titles in the XML file
        XmlNodeList titleList = doc.GetElementsByTagName("title");
        Response.Write("Titles: " + "<br>");
        foreach (XmlNode node in titleList)
        {
            Response.Write("Title : " + node.FirstChild.Value + "<br>");
        }
        //Get reference to the first author node in the XML file
        XmlNode authorNode = doc.GetElementsByTagName("author")[0];
        foreach (XmlNode child in authorNode.ChildNodes)
        {
            if ((child.Name == "first-name") &&
                (child.NodeType == XmlNodeType.Element))
            {
                Response.Write("First Name : " + child.FirstChild.Value + "<br>");
            }
            if ((child.Name == "last-name") &&
                (child.NodeType == XmlNodeType.Element))
            {
                Response.Write("Last Name : " + child.FirstChild.Value + "<br>");
            }
        }
    }
</script>
<html>
<head>
    <title>Querying for specific nodes</title>
</head>
<body>
    <form runat="server">
    <div>
    </div>
    </form>
</body>
</html>

In the code, you first get a reference to all the title nodes in the form of an XmlNodeList object.

XmlNodeList titleList = doc.GetElementsByTagName("title");

You then loop through the XmlNodeList collection and display each title's value. Similarly, you get a reference to the first author node in the XML document using the same GetElementsByTagName() method.

XmlNode authorNode = doc.GetElementsByTagName("author")[0];

As you can see from the code, the first element in the XmlNodeList collection is retrieved through the [0] indexer. You then loop through the child nodes of the author and display their contents using Response.Write() statements.

Selecting a DOM Subtree Using the XmlNodeReader Class

Suppose you have selected a node about which you need more information. To scan all the nodes that form its subtree using the XML DOM alone, your only option is a recursive algorithm such as the one discussed in the previous example. The XmlNodeReader class gives you an effective, ready-to-use alternative by providing a reader over a given DOM node subtree. The following lines of code demonstrate this.

XmlDocument doc = new XmlDocument();
doc.Load(xmlPath);
//Get reference to the book node with the right genre attribute
XmlNode bookNode =
    doc.SelectSingleNode("/bookstore/book[@genre='autobiography']");
XmlNodeReader reader = new XmlNodeReader(bookNode);
while (reader.Read())
{
    //Display only the element names and values
    //(lstOutput is the list control on the page that collects the output)
    if (reader.NodeType == XmlNodeType.Element)
        lstOutput.Items.Add("Node Name:" + reader.Name);
    if (reader.NodeType == XmlNodeType.Text)
        lstOutput.Items.Add("Node Value:" + reader.Value);
}
reader.Close();

The while loop visits all the nodes in the specified XML DOM subtree. The node reader is initialized with the XmlNode object that represents the selected book node, so it reads only that node and its descendants. After you have the node subtree in the form of an XmlNodeReader object, you can easily loop through it using the Read() method.
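To try the node reader without a Web page, the following console sketch replaces the lstOutput control with a List<string>. It uses an inline document shaped like the bookstore example; the data is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(
    "<bookstore>" +
    "<book genre=\"novel\"><title>A</title></book>" +
    "<book genre=\"autobiography\"><title>B</title><price>9.99</price></book>" +
    "</bookstore>");

XmlNode bookNode = doc.SelectSingleNode("/bookstore/book[@genre='autobiography']");

// The reader walks only the subtree rooted at bookNode; the other book is never visited.
List<string> output = new List<string>();
using (XmlNodeReader reader = new XmlNodeReader(bookNode))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
            output.Add("Node Name:" + reader.Name);
        else if (reader.NodeType == XmlNodeType.Text)
            output.Add("Node Value:" + reader.Value);
    }
}

foreach (string line in output)
    Console.WriteLine(line);
// Node Name:book
// Node Name:title
// Node Value:B
// Node Name:price
// Node Value:9.99
```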


Programmatically Creating XML Documents

If your primary goal is analyzing the contents of an XML document, you will probably find the XML DOM parsing model much more effective than readers, in spite of the larger memory footprint and set-up time it requires. A document loaded through the XML DOM can be modified, extended, shrunk, and, more important, searched. The same can't be done with XML readers, which solve a different type of problem.

To create an XML document using the XML DOM API, you must first create the document in memory, create and append the nodes, and then call the Save() method or one of its overloads. This approach gives you great flexibility, because you can work with the in-memory document efficiently until you finally decide to save it.

In terms of the internal implementation, it is worth noting that the XML DOM's Save() method uses an XML text writer to create the document. So unless the content to be generated is complex and subject to a lot of conditions, using an XML writer directly to create XML documents is much faster.

Returning briefly to the XmlNodeReader class discussed earlier: it reads and returns nodes from the subtree, including entity reference nodes. The XmlNodeReader not only enforces the XML well-formedness rules but also expands default attributes and entities if DTD information is present in the XmlDocument.

The XmlDocument class provides a set of methods to create new nodes. These methods are named consistently with the writing methods of the XmlWriter class you encountered in Chapter 4. The next section reviews these methods in detail.

Creating and Appending Nodes

To add new nodes to the document, you first use one of the XmlDocument class's factory methods to create a new node and then add it somewhere in the document. They are called factory methods because each is responsible for creating a new node of a given type. These methods start with "Create" and end with the node type to create. For example, the method to create a new Text node is named CreateTextNode(), and the method to create a new Element node is called CreateElement(). Using the Create methods also ensures that the new node's OwnerDocument property points to the document that created it. A node created by one XmlDocument cannot be inserted into another document without first being imported through ImportNode(). The following list reviews the main Create methods and provides a brief description of each:

CreateAttribute(): Creates an Attribute node with the given name

CreateCDataSection(): Creates a CDATA section with the specified content

CreateComment(): Creates a Comment node with the specified content

CreateDocumentFragment(): Creates an empty DocumentFragment node

CreateElement(): Creates an Element node with the given tag name

CreateEntityReference(): Creates an EntityReference node with the given name

CreateProcessingInstruction(): Creates a ProcessingInstruction node with the given target and content

CreateTextNode(): Creates a new Text node with the specified content

For example, suppose you wanted to add another <book> element to the bookstore document. To do so, you would need to create several new nodes to hold the information:

- the <book> element itself
- an element node for each of its child tags, such as <title>, <first-name>, <last-name>, and <price>
- a text node for the content of each of those tags
- an attribute node for the genre attribute on the <book> tag

In addition to the W3C DOM methods AppendChild() and InsertBefore(), the XML DOM API in the .NET Framework provides an InsertAfter() method, which inserts a node after another node. This method is not part of the standard W3C DOM API.
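The difference between the insertion methods is easiest to see side by side. In this console sketch, the element names series, author, and notes are made up for the example:

```csharp
using System;
using System.Linq;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml("<book><title>Sample</title><price>9.99</price></book>");

XmlElement root = doc.DocumentElement;
XmlNode title = root["title"];

// W3C DOM method: InsertBefore(newChild, refChild) puts the node ahead of refChild.
root.InsertBefore(doc.CreateElement("series"), title);

// .NET extension: InsertAfter(newChild, refChild) puts the node right after refChild.
root.InsertAfter(doc.CreateElement("author"), title);

// AppendChild() always adds at the end of the child list.
root.AppendChild(doc.CreateElement("notes"));

string order = string.Join(", ", root.ChildNodes.Cast<XmlNode>().Select(n => n.Name));
Console.WriteLine(order);   // series, title, author, price, notes
```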

Now that you have a general understanding of the methods required for creating nodes, it is time to look at the basic steps to create an XML document on the fly. They are as follows:

Create any necessary nodes

Link the nodes to create a tree

Append the tree to the in-memory XML document

Optionally save the document
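The following console sketch walks through those four steps to add a <book> element to a minimal, made-up bookstore document. It saves to a StringWriter only to keep the example self-contained; passing a file name to Save() would write to disk instead.

```csharp
using System;
using System.IO;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml("<bookstore></bookstore>");

// Step 1: create the nodes with the document's factory methods.
XmlElement book = doc.CreateElement("book");
XmlAttribute genre = doc.CreateAttribute("genre");
genre.Value = "novel";
XmlElement title = doc.CreateElement("title");
XmlText titleText = doc.CreateTextNode("Sample Title");
XmlElement price = doc.CreateElement("price");
XmlText priceText = doc.CreateTextNode("9.99");

// Step 2: link the nodes into a small tree.
book.Attributes.Append(genre);
title.AppendChild(titleText);
price.AppendChild(priceText);
book.AppendChild(title);
book.AppendChild(price);

// Step 3: append the tree to the in-memory document.
doc.DocumentElement.AppendChild(book);

// Step 4 (optional): save the document.
string savedXml;
using (StringWriter writer = new StringWriter())
{
    doc.Save(writer);
    savedXml = writer.ToString();
}
Console.WriteLine(savedXml);
```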

Before you create the necessary nodes, you should first create the standard XML declaration. The following code creates the XML prolog by appending the standard XML declaration and a comment node to the XmlDocument instance:

XmlDocument doc = new XmlDocument();
// Write and append the XML heading
XmlNode declarationNode = doc.CreateXmlDeclaration("1.0", "", "");
doc.AppendChild(declarationNode);
// Write and append some comment
XmlNode comment = doc.CreateComment("This file represents " +
    "a fragment of a book store inventory database");
doc.AppendChild(comment);

The CreateXmlDeclaration() method takes three arguments: the XML version, the encoding, and a value denoting whether the document can be considered stand-alone or has dependencies on other documents. All arguments are strings, including the stand-alone argument. That argument is "yes", "no", or an empty string, in which case the attribute is omitted. In the resulting declaration, the attributes always appear in the order version, encoding, standalone:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

If specified, the encoding is written in the XML declaration and used by Save() to create the actual output stream. If the encoding is null or empty, no encoding attribute is written, and the default UTF-8 encoding is used.
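The following console sketch reads back the properties of a declaration created with and without the optional arguments. It assumes C# top-level statements, and the bookstore element is just a placeholder root.

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();

// All three arguments are strings: version, encoding and standalone ("yes" or "no").
XmlDeclaration decl = doc.CreateXmlDeclaration("1.0", "utf-8", "yes");
doc.AppendChild(decl);
doc.AppendChild(doc.CreateElement("bookstore"));

Console.WriteLine(decl.Version);    // 1.0
Console.WriteLine(decl.Encoding);   // utf-8
Console.WriteLine(decl.Standalone); // yes
// The attributes come out in the order XML requires: version, encoding, standalone.
Console.WriteLine(decl.Value);      // version="1.0" encoding="utf-8" standalone="yes"

// Empty encoding and standalone values are simply left out of the declaration.
XmlDeclaration minimal = new XmlDocument().CreateXmlDeclaration("1.0", "", "");
Console.WriteLine(minimal.Value);   // version="1.0"
```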

CreateXmlDeclaration() returns an XmlDeclaration node that you add as a child of the XmlDocument. CreateComment(), on the other hand, creates an XmlComment node that represents an XML comment, as shown here:
