To be fully accessible, an XML document must be entirely loaded in memory and its nodes and attributes mapped to corresponding objects derived from the XmlNode class. The process that builds the XML DOM is triggered when you call the Load() method. You can use a variety of sources to supply the XML document to work on, including disk files, URLs, streams, and text readers. Before you can load XML data, however, you first need to create an XmlDocument instance, which is the topic of the next section.
Creating an XmlDocument |
To load an XML document into memory for full-access processing, you create a new instance of the XmlDocument class. The class features three public constructors, one of which is the default parameterless constructor, as shown here:
public XmlDocument();
public XmlDocument(XmlNameTable);
public XmlDocument(XmlImplementation);
The second overloaded constructor takes an XmlNameTable object as an argument, which allows the class to work faster with attribute and node names and to optimize memory management. Just as the XmlReader class does, XmlDocument builds its own name table incrementally while processing the document. Passing a precompiled name table, however, can substantially speed up the overall execution. The third overloaded constructor allows you to initialize an XmlDocument with the specified XmlImplementation class. The XmlImplementation class is a special class that allows you to define the context for a set of XmlDocument objects. This class provides methods for performing operations that are independent of any particular instance of the DOM.
In the base implementation of the XmlImplementation class, the list of operations that various instances of XmlDocument can share is relatively short. These operations include creating new documents (through the CreateDocument() method), testing for supported features (through the HasFeature() method), and, more important, sharing the same name table.
The following code snippet shows how to create two documents from the same implementation:
XmlImplementation xmlImpl = new XmlImplementation();
XmlDocument doc1 = xmlImpl.CreateDocument();
XmlDocument doc2 = xmlImpl.CreateDocument();
After you have an empty XmlDocument, you need to load it with XML data. The next section discusses how to do this.
Loading XML Documents
An XML document is loaded by calling the Load() method, which reads XML data and populates the document tree structure. There are four overloads of the Load() method, each of which reads the data from a different source:
Load(Stream): Loads the document from a Stream data source
Load(string): Loads the document from the given file name or URL
Load(TextReader): Loads the document using a TextReader as the data source
Load(XmlReader): Loads the document using the given XmlReader as the data source
Because the string overload accepts a URL as well as a local file name, you can also use it to load an XML document from a remote location. Apart from the overloaded Load() methods, there is also a method named LoadXml() that loads the XML document from a string that contains the XML data itself.
Note that when you load an XmlDocument object, the current contents of that instance are cleared. This means that if you reuse the same instance of the XmlDocument class to load a second document, the existing contents are entirely removed and replaced with the contents of the second document.
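As a rough, console-style sketch of these points (outside ASP.NET), the following code creates two documents from one XmlImplementation, checks that they share a name table, and then reuses a single document across several loads. The class name LoadingSketch, its method names, and the inline XML strings are illustrative assumptions and are not part of the chapter's listings.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class; names and sample XML are made up for illustration.
public static class LoadingSketch
{
    // Documents created from one XmlImplementation share its name table.
    public static bool SharedNameTable()
    {
        XmlImplementation xmlImpl = new XmlImplementation();
        XmlDocument doc1 = xmlImpl.CreateDocument();
        XmlDocument doc2 = xmlImpl.CreateDocument();
        return ReferenceEquals(doc1.NameTable, doc2.NameTable);
    }

    // Loads from a TextReader, then a Stream, then a string.
    // Each load discards whatever the document held before.
    public static string ReloadReplacesContent()
    {
        XmlDocument doc = new XmlDocument();

        using (TextReader reader = new StringReader("<books><book/></books>"))
        {
            doc.Load(reader);                 // Load(TextReader)
        }

        byte[] bytes = Encoding.UTF8.GetBytes("<employees><employee/></employees>");
        using (Stream stream = new MemoryStream(bytes))
        {
            doc.Load(stream);                 // Load(Stream)
        }

        doc.LoadXml("<orders/>");             // LoadXml(string)

        // Only the most recently loaded content remains.
        return doc.DocumentElement.Name;
    }
}
```

Calling LoadingSketch.ReloadReplacesContent() returns the name of the root element of the last document loaded, which is one way to confirm that earlier contents do not linger.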
Listing 6-2 shows two ways to load an XmlDocument: first from a disk file and then by using a string variable that you have created in your application code. |
09_596772 ch06.qxd 12/13/05 11:13 PM Page 141 |
XML DOM Object Model |
Listing 6-2: Loading XML Documents
<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
    void Page_Load(object sender, EventArgs e)
    {
        string xmlPath = Request.PhysicalApplicationPath +
            @"\App_Data\Books.xml";
        XmlDocument booksDoc = new XmlDocument();
        XmlDocument empDoc = new XmlDocument();
        Response.ContentType = "text/xml";
        try
        {
            //Load the XML from the file
            booksDoc.PreserveWhitespace = true;
            booksDoc.Load(xmlPath);
            //Write the XML onto the browser
            Response.Write(booksDoc.InnerXml);
            //Load the XML from a String
            empDoc.LoadXml("<employees>" +
                "<employee id='1'>" +
                "<name><firstName>Nancy</firstName>" +
                "<lastName>Davolio</lastName>" +
                "</name><city>Seattle</city>" +
                "<state>WA</state><zipCode>98122</zipCode>" +
                "</employee></employees>");
            //Save the XML data onto a file
            empDoc.Save(@"C:\Data\Employees.xml");
        }
        catch (XmlException xmlEx)
        {
            Response.Write("XmlException: " + xmlEx.Message);
        }
        catch (Exception ex)
        {
            Response.Write("Exception: " + ex.Message);
        }
    }
</script>
In Listing 6-2, the Page_Load event handler starts by declaring a string variable that holds the path to the XML file. It then creates two instances of the XmlDocument class: one for loading an XML document from the file system and the other for loading an XML document from a string variable. The ContentType property of the Response object is then set to text/xml to indicate to the browser that the rendered content is an XML document.
Response.ContentType = "text/xml";
Before loading the XML file, you also set the PreserveWhitespace property of the XmlDocument object to true so that white space is preserved and the fidelity of the document is retained.
booksDoc.PreserveWhitespace = true; |
The code then loads the XML file by invoking the Load() method of the XmlDocument, passing in the path to the XML file as an argument.
booksDoc.Load(xmlPath); |
After that, the loaded XML content is written to the browser through the InnerXml property of the XmlDocument object.
Response.Write(booksDoc.InnerXml); |
The XML DOM programming interface also provides you with a LoadXml() method to build a DOM from a well-formed XML string. That XML is then persisted to a file named Employees.xml by calling the Save() method of the XmlDocument object. You see more on the Save() method in the “Creating XML Documents” section later in this chapter. |
empDoc.Save(@"C:\Data\Employees.xml");
When you load XML through the LoadXml() method, keep in mind that this method does not perform DTD or schema validation and, by default, does not preserve white space. Any context-specific information you might need (such as a DTD, entities, or namespaces) must be embedded in the string to be taken into account.
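If you want to check the white-space behavior of LoadXml() for yourself, a small experiment along the following lines may help. It is only a sketch: the class name WhitespaceSketch, the method name, and the sample string are assumptions made for illustration. The idea is to load the same indented string twice and compare how many child nodes end up under the root element when the document's PreserveWhitespace property is left at its default of false and when it is set to true before loading.

```csharp
using System.Xml;

// Hypothetical helper class used only for this illustration.
public static class WhitespaceSketch
{
    // Returns the number of direct child nodes under the root element
    // after loading the same string with the given PreserveWhitespace value.
    public static int CountRootChildren(bool preserveWhitespace)
    {
        string xml = "<employees>\n  <employee id='1'/>\n</employees>";

        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = preserveWhitespace; // set before loading
        doc.LoadXml(xml);

        // With white space preserved, the line breaks and indentation
        // around <employee> show up as additional white-space nodes.
        return doc.DocumentElement.ChildNodes.Count;
    }
}
```

With the default setting, only the employee element is counted; with PreserveWhitespace set to true, the white-space nodes on either side of it are counted as well.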
All the lines of code that load and save the XML are enclosed in a try...catch block to ensure that any exceptions are caught and handled gracefully. In this case, the exception message is written to the browser. If everything goes well, navigating to the page in a browser results in the output shown in Figure 6-3.
Figure 6-3 |
Parsing an XML Document Using XmlDocument Class |
After the XmlDocument is loaded with data, you need to be able to traverse the DOM tree. For this purpose, the XmlDocument exposes a number of properties and methods. A natural way to traverse a tree data structure is recursion. Listing 6-3 shows how you can use recursion to traverse the XML DOM tree. As the code traverses the tree, it writes each node, including text nodes and attributes, to the browser.
Listing 6-3: Traversing DOM Tree Using XmlDocument Class
<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
    void Page_Load(object sender, EventArgs e)
    {
        string xmlPath = Request.PhysicalApplicationPath +
            @"\App_Data\Books.xml";
        XmlDocument doc = new XmlDocument();
        doc.Load(xmlPath);
        XmlNode rootNode = doc.DocumentElement;
        DisplayNodes(rootNode);
    }

    void DisplayNodes(XmlNode node)
    {
        //Print the node type, node name and node value of the node
        if (node.NodeType == XmlNodeType.Text)
        {
            Response.Write("Type= [" + node.NodeType + "] Value=" +
                node.Value + "<br>");
        }
        else
        {
            Response.Write("Type= [" + node.NodeType + "] Name=" +
                node.Name + "<br>");
        }
        //Print attributes of the node
        if (node.Attributes != null)
        {
            XmlAttributeCollection attrs = node.Attributes;
            foreach (XmlAttribute attr in attrs)
            {
                Response.Write("Attribute Name =" + attr.Name +
                    "Attribute Value =" + attr.Value);
            }
        }
        //Print individual children of the node
        XmlNodeList children = node.ChildNodes;
        foreach (XmlNode child in children)
        {
            DisplayNodes(child);
        }
    }
</script>
<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
    <title>Traversing the DOM Tree</title>
</head>
<body>
    <form id="form1" runat="server">
        <div> </div>
    </form>
</body>
</html>
As you can see from Listing 6-3, the class that forms the root of this tree is the XmlDocument class. The code loads the XmlDocument with data from the Books.xml file and uses that as the basis to traverse the document.
The XmlDocument is first instantiated, and the path to the file is passed to its Load() method. The document loads the XML from the file and automatically generates the DOM tree.
XmlDocument doc = new XmlDocument();
doc.Load(xmlPath);
Next, you get a handle to the root node of the document tree: |
XmlNode rootNode = doc.DocumentElement; |
After the root node is obtained, the DisplayNodes() method is then invoked to recursively traverse through all the children of that node. DisplayNodes() is generic enough to print details of any node type. Remember that the DOM tree consists of nodes of different types (elements, attributes, processing instructions, comments, text nodes, and so on). This example just prints the generic information about the node (name, type); if it’s a text node, it prints the value of the node as well. |
if (node.NodeType == XmlNodeType.Text)
{
    Response.Write("Type= [" + node.NodeType + "] Value=" +
        node.Value + "<br>");
}
else
{
    Response.Write("Type= [" + node.NodeType + "] Name=" +
        node.Name + "<br>");
}
Next, the code prints any attributes associated with the node. |
if (node.Attributes != null)
{
    XmlAttributeCollection attrs = node.Attributes;
    foreach (XmlAttribute attr in attrs)
    {
        Response.Write("Attribute Name =" + attr.Name +
            "Attribute Value =" + attr.Value);
    }
}
The last step gets all the children of the current node and calls DisplayNodes() on each of them. Note that the ChildNodes property returns only the direct children of the node. To reach every descendant of a node, the code recurses into each child as follows.
XmlNodeList children = node.ChildNodes;
foreach (XmlNode child in children)
{
    DisplayNodes(child);
}
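To make the difference between direct children and all descendants concrete, here is a minimal sketch that can be run outside a web page. The class name TraversalSketch, its methods, and the tiny inline bookstore document are assumptions chosen for illustration; the recursion has the same shape as DisplayNodes(), but it counts nodes instead of writing them out.

```csharp
using System.Xml;

// Hypothetical helper class used only for this illustration.
public static class TraversalSketch
{
    // Builds a small in-memory document (made-up sample data).
    public static XmlDocument Sample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<bookstore>" +
              "<book><title>A</title></book>" +
              "<book><title>B</title></book>" +
            "</bookstore>");
        return doc;
    }

    // Counts every node below the given node by recursing through ChildNodes.
    // Attributes are not part of ChildNodes, so they are not counted here.
    public static int CountDescendants(XmlNode node)
    {
        int count = 0;
        foreach (XmlNode child in node.ChildNodes)
        {
            count += 1 + CountDescendants(child);
        }
        return count;
    }
}
```

For this sample, the ChildNodes collection of the root element holds only the two book elements, whereas CountDescendants() also reaches the title elements and their text nodes.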
Navigate to the page from a browser and you should see something similar to Figure 6-4. |
Figure 6-4 |
Finding Nodes |
Using the ChildNodes, FirstChild, LastChild, NextSibling, PreviousSibling, ParentNode, and OwnerDocument properties of the DOM, a program can navigate through a document hierarchy. You could use these properties to build a function that searches the hierarchy for specific nodes. Fortunately, the DOM provides several methods out of the box that obviate the need to write your own routines to search for nodes within an XML document. To this end, the DOM objects provide methods such as GetElementsByTagName(), GetElementById(), SelectNodes(), and SelectSingleNode() for finding specific nodes. The following sections describe these methods in detail.
GetElementsByTagName |
The GetElementsByTagName() method returns an XmlNodeList containing references to nodes that have a given name. Note that GetElementsByTagName() may return nodes at different levels of the subtree, and some nodes may be descendants of others. For example, suppose you have an XML document that defines a book node that contains two child nodes that are also named book. If a program searched this document for nodes named book, GetElementsByTagName() would return all three nodes. Depending on what you want to do with the nodes, you may need to be careful not to process a node more than once.
Both the XmlDocument and XmlElement classes provide the GetElementsByTagName() method. The XmlDocument version searches the entire document for nodes with the given name. The XmlElement version of GetElementsByTagName() searches the document subtree rooted at the element.
The GetElementsByTagName() method has two overloads.
public XmlNodeList GetElementsByTagName(string);
public XmlNodeList GetElementsByTagName(string, string);
The first overload returns the list of all descendant elements that match the specified name. The second overload returns the descendant elements that match both the specified local name and the specified namespace URI.
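The nested-book scenario and the two overloads can be tried out with a sketch such as the following. The class name TagNameSketch, the catalog document, and the namespace URI urn:example:extra are all made up for illustration.

```csharp
using System.Xml;

// Hypothetical helper class used only for this illustration.
public static class TagNameSketch
{
    // A book element that contains two more book elements, plus one
    // book element in a made-up namespace.
    public static XmlDocument Sample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<catalog xmlns:x='urn:example:extra'>" +
              "<book><book/><book/></book>" +
              "<x:book/>" +
            "</catalog>");
        return doc;
    }

    // Document-wide search by qualified name.
    public static int CountInDocument(XmlDocument doc)
    {
        return doc.GetElementsByTagName("book").Count;
    }

    // Search limited to the subtree below the outer book element.
    public static int CountBelowOuterBook(XmlDocument doc)
    {
        XmlElement outerBook = (XmlElement)doc.DocumentElement.FirstChild;
        return outerBook.GetElementsByTagName("book").Count;
    }

    // Search by local name and namespace URI.
    public static int CountInNamespace(XmlDocument doc)
    {
        return doc.GetElementsByTagName("book", "urn:example:extra").Count;
    }
}
```

The document-wide search returns the outer book together with its two nested book elements, which is exactly the situation in which you need to take care not to process a node twice. The element-level search returns only the two nested elements, because it looks at descendants of the element on which it is called.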
GetElementById
The GetElementById() method returns the first node it finds with a specified ID attribute. Similar to GetElementsByTagName(), this method searches for nodes that have a certain property. Unlike GetElementsByTagName(), however, this method returns only the first match it finds rather than an XmlNodeList containing all of the nodes that have the correct ID.
GetElementById() returns only the first element with an ID attribute that has the value you specified. If you want to examine all matching nodes, you can use SelectNodes() to do something similar. For example, SelectNodes("//*[@Index='3']") returns an XmlNodeList containing all nodes that have an Index attribute with the value 3. Note that this expression does not verify that the attributes are marked with the ID type, so it’s not exactly the same as GetElementById(). Although attributes of type ID can be defined in either XSD schemas or DTDs, the current implementation of GetElementById() supports only those defined in DTDs.
GetElementById() examines each node’s attributes, looking for one that is marked as an ID. Simply naming the attribute ID is not enough; the XML document must identify the attribute as having type ID. After you find a node with a matching ID, you can use local navigation properties such as NextSibling, PreviousSibling, and ParentNode to move to different parts of the document.
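The following sketch illustrates the point about ID-typed attributes. It assumes an internal DTD subset embedded in the XML string, which is one way to declare an attribute as type ID; the class name ElementIdSketch, the items document, and the attribute values are hypothetical.

```csharp
using System.Xml;

// Hypothetical helper class used only for this illustration.
public static class ElementIdSketch
{
    // The internal DTD subset declares the id attribute as type ID.
    public static XmlElement FindWithDtd(string id)
    {
        string xml =
            "<!DOCTYPE items [" +
            "<!ELEMENT items (item*)>" +
            "<!ELEMENT item (#PCDATA)>" +
            "<!ATTLIST item id ID #REQUIRED>" +
            "]>" +
            "<items><item id='a1'>first</item><item id='a2'>second</item></items>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementById(id);
    }

    // Same data without a DTD: the attribute is merely named "id".
    public static XmlElement FindWithoutDtd(string id)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<items><item id='a1'>first</item><item id='a2'>second</item></items>");
        return doc.GetElementById(id);
    }
}
```

Without the DTD declaration, GetElementById() has no way to know that id is an identifier, so the second method is expected to return null even though the attribute value matches.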
SelectNodes |
The SelectNodes() method returns an XmlNodeList containing references to nodes that match a specified XPath expression. An XPath expression gives a node’s location within an XML document much as a file path describes a file’s location on a disk. Although file paths are relatively simple, XPath allows you to specify a very complex set of criteria to select nodes. You will see more on XPath and the use of the SelectNodes() and SelectSingleNode() methods in a later section of this chapter.
SelectSingleNode |
The SelectSingleNode() method is similar to SelectNodes() except that it returns only the first node that matches an XPath expression instead of all of the nodes that match. After a program has located the matching node, it can use other document navigation properties such as NextSibling, PreviousSibling, and ParentNode to move through the document. In some cases this can be more efficient than using SelectNodes().
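A short sketch may help contrast the two methods. The class name XPathSketch, the sample bookstore document, and the genre values are illustrative assumptions; the XPath expression is built by simple string concatenation, which is acceptable here only because the genre value is assumed to be a trusted literal that contains no quotes.

```csharp
using System.Xml;

// Hypothetical helper class used only for this illustration.
public static class XPathSketch
{
    // Made-up sample data with two novels and one autobiography.
    public static XmlDocument Sample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<bookstore>" +
              "<book genre='novel'><title>A</title></book>" +
              "<book genre='autobiography'><title>B</title></book>" +
              "<book genre='novel'><title>C</title></book>" +
            "</bookstore>");
        return doc;
    }

    // SelectNodes() returns every match as an XmlNodeList.
    public static int CountByGenre(XmlDocument doc, string genre)
    {
        XmlNodeList books = doc.SelectNodes("/bookstore/book[@genre='" + genre + "']");
        return books.Count;
    }

    // SelectSingleNode() returns only the first match, or null if none.
    public static string FirstTitleByGenre(XmlDocument doc, string genre)
    {
        XmlNode book = doc.SelectSingleNode("/bookstore/book[@genre='" + genre + "']");
        if (book == null)
        {
            return null;
        }
        XmlNode title = book.SelectSingleNode("title");
        return title == null ? null : title.InnerText;
    }
}
```

Note that SelectSingleNode() returns null when nothing matches, so the result should be checked before it is used.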
An Example on Finding Nodes |
Listing 6-4 provides an example of how to find nodes in an XML document using the GetElementsByTagName() method.
Listing 6-4: Querying Nodes
<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
    void Page_Load(object sender, EventArgs e)
    {
        string xmlPath = Request.PhysicalApplicationPath +
            @"\App_Data\Books.xml";
        XmlDocument doc = new XmlDocument();
        doc.Load(xmlPath);
        //Get all book titles in the XML file
        XmlNodeList titleList = doc.GetElementsByTagName("title");
        Response.Write("Titles: " + "<br>");
        foreach (XmlNode node in titleList)
        {
            Response.Write("Title : " + node.FirstChild.Value + "<br>");
        }
        //Get reference to the first author node in the XML file
        XmlNode authorNode = doc.GetElementsByTagName("author")[0];
        foreach (XmlNode child in authorNode.ChildNodes)
        {
            if ((child.Name == "first-name") &&
                (child.NodeType == XmlNodeType.Element))
            {
                Response.Write("First Name : " + child.FirstChild.Value + "<br>");
            }
            if ((child.Name == "last-name") &&
                (child.NodeType == XmlNodeType.Element))
            {
                Response.Write("Last Name : " + child.FirstChild.Value + "<br>");
            }
        }
    }
</script>
<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
    <title>Querying for specific nodes</title>
</head>
<body>
    <form id="form1" runat="server">
        <div> </div>
    </form>
</body>
</html>
In the code, you first get a reference to all the title nodes in the form of an XmlNodeList object.
XmlNodeList titleList = doc.GetElementsByTagName("title");
You then loop through the XmlNodeList collection and display the value of each node. Similarly, you get a reference to the first author node in the XML document using the same GetElementsByTagName() method.
XmlNode authorNode = doc.GetElementsByTagName("author")[0];
As you can see from the code, the first element in the XmlNodeList collection is returned through the [0] indexer. You then loop through the child nodes of the author node and display their contents using Response.Write() statements.
Selecting a DOM Subtree Using the XmlNodeReader Class |
Suppose you have selected a node about which you need more information. To scan all the nodes that form the subtree using XML DOM, your only option is to use a recursive algorithm such as the one shown in Listing 6-3. The XmlNodeReader class gives you an effective, ready-to-use alternative by providing a reader over a given DOM node subtree. The following lines of code demonstrate this.
XmlDocument doc = new XmlDocument();
doc.Load(xmlPath);
//Get reference to the book node with the right genre attribute
XmlNode bookNode =
    doc.SelectSingleNode("/bookstore/book[@genre='autobiography']");
XmlNodeReader reader = new XmlNodeReader(bookNode);
while (reader.Read())
{
    //Display only the element names and values
    if (reader.NodeType == XmlNodeType.Element)
        lstOutput.Items.Add("Node Name:" + reader.Name);
    if (reader.NodeType == XmlNodeType.Text)
        lstOutput.Items.Add("Node Value:" + reader.Value);
}
The while loop visits all the nodes belonging to the specified XML DOM subtree. The node reader is initialized with an XmlNode object, in this case one of the book nodes in the XML DOM tree. After you have the node subtree in the form of an XmlNodeReader object, you can easily loop through it using the Read() method.
The XmlNodeReader reads and returns nodes from the subtree, including entity reference nodes. The XmlNodeReader not only enforces the XML well-formedness rules, but also expands default attributes and entities, if DTD information is present in the XmlDocument.
Programmatically Creating XML Documents
If your primary goal is analyzing the contents of an XML document, you will probably find the XML DOM parsing model much more effective than readers in spite of the larger memory footprint and set-up time it requires. A document loaded through XML DOM can be modified, extended, shrunk, and, more important, searched. The same can’t be done with XML readers; XML readers solve a different type of problem.
To create an XML document using the XML DOM API, you must first create the document in memory, create nodes, and then call the Save() method or one of its overloads. This approach gives you great flexibility because you can work with the in-memory document efficiently until you finally decide to save the document.
In terms of the internal implementation, it is worth noting that the XML DOM’s Save() method makes use of an XML text writer to create the document. So unless the content to be generated is complex and subject to a lot of conditions, using an XML writer directly to create XML documents is typically faster.
The XmlDocument class provides a number of methods to create new nodes. These methods are named consistently with the writing methods of the XmlWriter class you encountered in Chapter 4. The next section reviews these methods in detail.
Creating and Appending Nodes |
To add new nodes to the document, you first use one of the XmlDocument class’s factory methods to create a new node and then add that node somewhere in the document. They are called factory methods because they are responsible for creating a new node of a given type. The names of these methods start with “Create” and end with the type of node to create. For example, the method to create a new Text node is named CreateTextNode(), and the method to create a new Element node is called CreateElement(). Using the Create methods also ensures that the created node is owned by the document, which is required before the node can be inserted into that document’s tree. The following list reviews the most commonly used Create methods and briefly describes each:
CreateAttribute(): Creates an Attribute node with the given name
CreateCDataSection(): Creates a CDATA section with the specified content
CreateComment(): Creates a Comment node with the specified content
CreateDocumentFragment(): Creates an empty DocumentFragment node
CreateElement(): Creates an Element node with the given tag name
CreateEntityReference(): Creates an EntityReference node
CreateProcessingInstruction(): Creates a ProcessingInstruction node with the given content
CreateTextNode(): Creates a new Text node with the specified content
For example, suppose you wanted to add another <book> element to the bookstore document. To do so, you would need to create nine new nodes to hold the information. Each of the four tags (<title>, <first-name>, <last-name>, and <price>) is a new node, and the text that goes inside each of those nodes is also a node. Finally, the genre attribute on the <book> tag is a new node.
Note that the XML DOM API in the .NET Framework also provides the InsertAfter() method, which inserts a node after another node, but this method is not part of the standard W3C DOM API.
Now that you have a general understanding of the methods required for creating nodes, it is time to look at the basic steps to create an XML document on the fly. They are as follows: |
Create any necessary nodes |
Link the nodes to create a tree |
Append the tree to the in-memory XML document |
Optionally save the document |
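The four steps above can be sketched in a few lines of code. This is a minimal illustration rather than a complete inventory document: the class name CreateNodesSketch, the element values, and the reduced set of child elements are assumptions made to keep the example short.

```csharp
using System.Xml;

// Hypothetical helper class used only for this illustration.
public static class CreateNodesSketch
{
    public static XmlDocument BuildBookstore()
    {
        XmlDocument doc = new XmlDocument();

        // 1. Create the necessary nodes through the document's factory methods.
        XmlElement bookstore = doc.CreateElement("bookstore");
        XmlElement book = doc.CreateElement("book");
        XmlAttribute genre = doc.CreateAttribute("genre");
        genre.Value = "novel";
        XmlElement title = doc.CreateElement("title");
        XmlText titleText = doc.CreateTextNode("Sample Title");

        // 2. Link the nodes to create a tree.
        title.AppendChild(titleText);
        book.Attributes.Append(genre);
        book.AppendChild(title);
        bookstore.AppendChild(book);

        // 3. Append the tree to the in-memory XML document.
        doc.AppendChild(bookstore);

        // 4. Optionally save the document, for example with
        //    doc.Save(path), using a path of your own choosing.
        return doc;
    }
}
```

Until a node is appended in step 3, it exists only in memory and is not part of the document tree, which is why the order of the steps matters.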
Before you create the necessary nodes, you should first create the standard XML declaration. The following code creates the XML prolog and appends to the XmlDocument instance the standard XML declaration and a comment node: |
XmlDocument doc = new XmlDocument();
// Write and append the XML heading
XmlNode declarationNode = doc.CreateXmlDeclaration("1.0", "", "");
doc.AppendChild(declarationNode);
// Write and append some comment
XmlNode comment = doc.CreateComment("This file represents " +
    "a fragment of a book store inventory database");
doc.AppendChild(comment);
The CreateXmlDeclaration() method takes three arguments: the XML version, the required encoding, and a value (yes or no) denoting whether the document can be considered stand-alone or has dependencies on other documents. All arguments are strings, including the encoding and stand-alone arguments, and they appear in the declaration as shown here:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
If specified, the encoding is written in the XML declaration and used by Save() to create the actual output stream. If the encoding is null or empty, no encoding attribute is set, and the default Unicode Universal Character Set Transformation Format, 8-bit form (UTF-8) encoding is used.
CreateXmlDeclaration() returns an XmlDeclaration node that you add as a child to the XmlDocument class. CreateComment(), on the other hand, creates an XmlComment node that represents an XML comment, as shown here:
<!-- This file represents a fragment of a book store inventory database --> |
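As a small, hedged illustration of how the three string arguments end up in the declaration, the following sketch passes an explicit encoding and stand-alone value and then appends a comment and a root element. The class name DeclarationSketch, the comment text, and the root element name are assumptions made for the example.

```csharp
using System.Xml;

// Hypothetical helper class used only for this illustration.
public static class DeclarationSketch
{
    public static XmlDocument Build()
    {
        XmlDocument doc = new XmlDocument();

        // All three arguments are strings: version, encoding, standalone.
        XmlDeclaration declaration = doc.CreateXmlDeclaration("1.0", "utf-8", "yes");
        doc.AppendChild(declaration);

        // A comment and a root element follow the declaration.
        doc.AppendChild(doc.CreateComment(" inventory fragment "));
        doc.AppendChild(doc.CreateElement("bookstore"));
        return doc;
    }
}
```

You can then inspect the Encoding and Standalone properties of the XmlDeclaration node, or the OuterXml of the document, to see what was recorded.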