XmlNodeReader: If you want to apply the pull model to a DOM tree that is already in memory, consider the XmlNodeReader class. It is best suited to that specialized scenario: it lets you read data from specific nodes of the tree and gives you a double benefit, the speed associated with the XmlReader class and the ease of use of the DOM. You see usage of this class in Chapter 6.
Typically, you create objects of these classes and use their methods and properties. If warranted, you can also extend these classes to provide more specific functionality. The XmlWriter class has only one derived class: XmlTextWriter. You use XmlWriter to write XML documents on a forward-only basis. The classes used for writing XML data are as follows:
XmlWriter: An abstract class that provides a forward-only, write-only, non-cached way of generating XML streams. By creating the XmlWriter object through the static Create() method, you can take advantage of the new XmlWriter features in .NET Framework 2.0.
XmlTextWriter: A concrete writer that provides the same forward-only, write-only, non-cached way of generating XML streams. Note that .NET Framework 2.0 supersedes this class with writers created through XmlWriter.Create(); use it only where you require backward compatibility with an application created using .NET 1.x versions.
Now that you have an overview of the different classes available for reading and writing, the following section focuses on reading XML data with the XmlReader class.
Reading XML with XmlReader
XmlReader provides you with a way to parse XML data that minimizes resource usage by reading forward through the document, recognizing elements as it reads. This approach results in very little data being cached in memory, but the forward-only style has two main consequences. The first is that it isn’t possible to go back to an earlier point in the file without starting to read from the top again. The second consequence is slightly more subtle: elements are read and presented to you one by one, with no context. If you need to keep track of where an element occurs within the document structure, you’ll need to do it yourself. If either of these limitations is a problem for your application, you might need to use the DOM-style XmlDocument class, which is discussed in Chapter 6.
Overview of XmlReader
The XmlReader class allows you to access XML data from a stream or XML document. This class provides fast, non-cached, read-only, and forward-only access to XML data. In .NET Framework 1.x, XmlReader is purely an abstract class whose methods are implemented by derived classes to provide access to the elements and attributes of XML data. In .NET Framework 2.0 the class is still abstract, but its static Create() method returns a full-featured reader, similar to XmlTextReader, that provides standards-based support for reading XML data. You use an XmlReader to determine factors such as the depth of a node in an XML document, whether the node has attributes, the number of attributes in a node, and the value of an attribute.
Although you can use the XmlTextReader class to read XML data, the preferred approach is to use the XmlReader object returned by the static Create() method of the XmlReader class, because that reader is more standards compliant than the XmlTextReader implementation. For example, the XmlTextReader class does not expand entities by default and does not add default attributes.
The XmlTextReader class is one of the derived classes of the XmlReader class and implements the methods defined by the XmlReader class. The XmlValidatingReader is another class in .NET Framework 1.x that is derived from the XmlReader class; it allows you not only to read XML data but also to validate it against a DTD or schema. Note that in .NET Framework 2.0, both the XmlTextReader and XmlValidatingReader classes are superseded: their functionality is now provided by the XmlReader and XmlReaderSettings classes, respectively.
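As a rough illustration of how an XmlReaderSettings object shapes the reader that Create() returns, the following sketch asks the reader to skip comments and insignificant white space and then counts the nodes that are reported. It is not one of the chapter’s listings: the class name SettingsSketch, the method CountNodes, and the inline XML string are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SettingsSketch
{
    // Counts the nodes the reader reports for the given XML text,
    // optionally skipping comments and insignificant white space.
    public static int CountNodes(string xml, bool skipNoise)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = skipNoise;
        settings.IgnoreWhitespace = skipNoise;

        int count = 0;
        // Create() accepts a TextReader plus the settings object.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical sample document, not the chapter's Employees.xml.
        string xml = "<employees>\n  <!-- sample -->\n  <employee id=\"1\" />\n</employees>";
        Console.WriteLine("All nodes:      " + CountNodes(xml, false));
        Console.WriteLine("Filtered nodes: " + CountNodes(xml, true));
    }
}
```

With the filtering switched on, the reader reports only the element-related nodes, which is usually what a processing loop wants to see.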
Steps Involved in Using XmlReader to Read XML Data
The XmlReader class is designed for fast, forward-only access to the contents of an XML file and is not suited to modifying the file’s contents or structure (for that, you use the XmlDocument class). The XmlReader class works by starting at the beginning of the file and reading one node at a time. As each node is read, you can either ignore the node or access its information, as the needs of the application dictate.
The steps for using the XmlReader class are as follows:
1. Create an instance of the class using the Create() method of the XmlReader class, passing to the method the name of the XML file to be read.
2. Set up a loop that calls the Read() method repeatedly. The first call positions the reader on the first node in the file, and each later call advances to the next node. Read() returns true if there is a node to read and false when the end of the file has been reached.
3. In the loop, examine the properties and methods of the XmlReader object to obtain information about the current node (its type, name, data, and so on). Loop back until Read() returns false.
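The three steps above can be sketched as follows. This is a minimal illustration rather than one of the chapter’s listings: the class name ReadLoopSketch, the method ListElementNames, and the inline XML are assumptions made for the example, and the reader is created over a StringReader instead of a file name so that the snippet runs without Employees.xml on disk.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadLoopSketch
{
    // Walks every node and returns the names of the Element nodes,
    // comma-separated and in document order.
    public static string ListElementNames(TextReader input)
    {
        string names = "";

        // Step 1: create the reader through the static Create() method.
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Step 2: call Read() until it returns false.
            while (reader.Read())
            {
                // Step 3: inspect the current node and decide what to do with it.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (names.Length > 0) names += ",";
                    names += reader.Name;
                }
            }
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample document.
        string xml = "<employees><employee><city>Seattle</city></employee></employees>";
        Console.WriteLine(ListElementNames(new StringReader(xml)));
    }
}
```

To read from a file instead, you would pass the file name to XmlReader.Create(), as step 1 describes.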
The XmlReader class has a large number of properties and methods. The ones that you will need most often are explained in Table 4-1 and Table 4-2.
Table 4-1. Important Properties of the XmlReader Class
Property          Description
AttributeCount    Returns the number of attributes in the current node
Depth             Returns the depth of the current node; used to determine if a specific node has child nodes
EOF               Indicates if the reader is positioned at the end of the stream
HasAttributes     Returns a boolean value indicating if the current node has attributes
HasValue          Returns a boolean value indicating if the current node can have a value
IsEmptyElement    Indicates if the current node is an empty element
LocalName         Returns the local name of the current node
Name              Returns the qualified name of the current node
Reading and Writing XML Data Using XmlReader and XmlWriter
Property          Description
NamespaceURI      Returns the namespace URI of the current node
NodeType          Returns the type of the current node in the form of an XmlNodeType enumeration
Prefix            Returns the namespace prefix associated with the current node
ReadState         Returns the current state of the reader in the form of a ReadState enumeration
Settings          Returns the XmlReaderSettings object used to create the XmlReader instance
Value             Gets the value of the current node
ValueType         Gets the CLR type of the current node
Now that you have an understanding of the important properties of the XmlReader class, Table 4-2 outlines the important methods of the XmlReader class.
Table 4-2. Important Methods of the XmlReader Class
Method                  Description
Close                   Closes the XmlReader object and sets its ReadState property to Closed
Create                  Factory method that creates an instance of the XmlReader object and returns it to the caller; the preferred mechanism for obtaining XmlReader instances
GetAttribute            Gets the value of an attribute
IsStartElement          Moves to the next content node and indicates if that node is a start tag or an empty element tag
MoveToAttribute         Moves the reader to the specified attribute
MoveToContent           Moves the reader to the next content node if the current node is not a content node
MoveToElement           Moves the reader to the element that contains the current attribute; used when you are enumerating through the attributes and you want to switch back to the element that contains all these attributes
MoveToFirstAttribute    Moves the reader to the first attribute of the current node
MoveToNextAttribute     Moves the reader to the next attribute; used especially when you are enumerating through the attributes in a node
Read                    Reads the next node from the stream
ReadContentAs           Reads the content as an object of the supplied type
ReadElementContentAs    Reads the current element and returns its contents as an object of the type specified
ReadEndElement          Moves the reader past the current end tag and on to the next node
ReadInnerXml            Reads all of the node’s content, including the markup, as a string
ReadOuterXml            Reads the node’s content, including the markup of the current node and all its children, as a string
ReadToDescendant        Moves the reader to the next matching descendant element
ReadToFollowing         Reads until the named element is found
ReadToNextSibling       Advances the reader to the next matching sibling element
ReadValueChunk          Allows you to read large streams of text embedded in an XML document
In addition to the methods described in Table 4-2, XmlReader also exposes a variety of ReadContentAsXXX() methods such as:
ReadContentAsBase64()
ReadContentAsBinHex()
ReadContentAsBoolean()
ReadContentAsDateTime()
ReadContentAsDouble()
ReadContentAsInt()
ReadContentAsLong()
ReadContentAsObject()
ReadContentAsString()
As the name suggests, these methods return the node value as an object of the type specified in the method name. For instance, the ReadContentAsString() method returns the node value as an object of type string. Similar to the ReadContentAsXXX() methods, there are also a number of variations of the ReadElementContentAsXXX() method. These methods are:
ReadElementContentAsBase64()
ReadElementContentAsBinHex()
ReadElementContentAsBoolean()
ReadElementContentAsDateTime()
ReadElementContentAsDouble()
ReadElementContentAsInt()
ReadElementContentAsLong()
ReadElementContentAsObject()
ReadElementContentAsString()
The most important of all these methods is Read(), which tells the XmlReader to fetch the next node from the document. After you’ve got the node, you can use the NodeType property to find out what you have. The NodeType property returns one of the members of the XmlNodeType enumeration, which are listed in Table 4-3.
Table 4-3. Members of the XmlNodeType Enumeration
Member                   Description
Attribute                An attribute, for example id="1"
CDATA                    A CDATA section, for example <![CDATA[Some text]]>
Comment                  An XML comment, for example <!-- Some comment -->
Document                 The document object, representing the root of the XML tree
DocumentFragment         A fragment of XML that isn’t a document in itself
DocumentType             A document type declaration
Element, EndElement      The start and end of an element
Entity, EndEntity        An entity declaration, and the end of an entity replacement (reported after a call to ResolveEntity)
EntityReference          An entity reference (for example, &lt;)
None                     Used if the node type is queried when no node has been read
Notation                 A notation entry in a DTD
ProcessingInstruction    An XML processing instruction
SignificantWhitespace    White space in a mixed content model document, or when xml:space="preserve" has been set
Text                     The text content of an element
Whitespace               White space between markup
XmlDeclaration           The XML declaration at the top of a document
Now that you understand the important properties and methods, the next few sections look at the different ways of reading documents, elements, attributes, and other data.
Chapter 4
Start Reading a Document
To begin reading an XML document, you can call any of the Read methods to extract data from the document. For example, this code snippet uses ReadStartElement() to find the first element in the document and read past its start tag:
XmlReader reader = XmlReader.Create("Employees.xml");
//Skip the XML declaration and read the start tag of the first element
reader.ReadStartElement();
Alternatively, you can jump straight to the document content by calling MoveToContent(), which skips to the next content node if the current node is not a content node. (Content nodes are non-white-space Text, CDATA, Element, EndElement, EntityReference, and EndEntity nodes.) If positioned on an attribute, the reader moves back to the element that contains the attribute.
XmlReader reader = XmlReader.Create("Employees.xml");
reader.MoveToContent();
In the examples shown, if Employees.xml looks as follows
<?xml version="1.0"?>
<!--Employee Details -->
<firstName>
    Nancy
</firstName>
the previous code would skip everything before the <firstName> element in the prolog. MoveToContent() leaves the reader positioned on <firstName>, whereas ReadStartElement() also reads past that start tag.
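To make the effect of MoveToContent() on this small document concrete, the following sketch feeds the same XML to a reader and reports where the reader ends up. The class name MoveToContentSketch and the method FirstContentNode are assumptions made for the example, and the document is supplied as an in-memory string rather than read from Employees.xml.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MoveToContentSketch
{
    // Returns "<node type>:<name>" for the first content node in the XML text.
    public static string FirstContentNode(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Skips the XML declaration, the comment, and white space.
            XmlNodeType type = reader.MoveToContent();
            return type + ":" + reader.Name;
        }
    }

    public static void Main()
    {
        string xml =
            "<?xml version=\"1.0\"?>\n" +
            "<!--Employee Details -->\n" +
            "<firstName>\n  Nancy\n</firstName>";
        Console.WriteLine(FirstContentNode(xml));
    }
}
```

MoveToContent() returns the type of the node it stops on, so the caller can check that it really found an element before reading further.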
Reading Elements
The Read(), ReadString(), ReadStartElement(), and ReadEndElement() methods can all be used to read Element nodes from the XML source. After reading the element, each method advances to the next node in the document. In comparison, the MoveToElement() method does not advance through the document; it only moves the reader from an attribute back to the element that contains it.
The Read() method is the simplest: It reads the next node in the source whether or not it is an Element node. When using this method, you should check the node’s name and type to make sure you are processing an appropriate node. For example, the following code uses the Read() method and the NodeType property of the XmlReader to process only Comment nodes:
XmlReader reader = XmlReader.Create("Employees.xml");
//Read the nodes in a loop
while (reader.Read())
{
    if (reader.NodeType == XmlNodeType.Comment)
    {
        //Code to process Comments
    }
}
As you read through the XML document using the XmlReader object, the ReadState property of the XmlReader object provides different values depending on the state of the XmlReader. Table 4-4 summarizes the states of the XmlReader as it reads through the various portions of an XML document.
Table 4-4. Members of the ReadState Enumeration
State          Description
Closed         The reader enters this state when the Close method is called
EndOfFile      Signals the end of the XML document
Error          Specifies that an error has occurred and the error prevents the reader from continuing the read operation
Initial        The reader is in this state before the invocation of the Read method
Interactive    The reader is in this state after the Read method has been called and can respond to the additional methods
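The state transitions in Table 4-4 can be observed with a short sketch that records the ReadState property at four points in the life of a reader. The class name ReadStateSketch, the method TraceStates, and the inline XML are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadStateSketch
{
    // Records the ReadState at four points in a reader's life cycle
    // and returns the values as a comma-separated string.
    public static string TraceStates(string xml)
    {
        XmlReader reader = XmlReader.Create(new StringReader(xml));

        string trace = reader.ReadState.ToString();   // before the first Read()

        reader.Read();
        trace += "," + reader.ReadState;               // after the first Read()

        while (reader.Read()) { }
        trace += "," + reader.ReadState;               // after Read() has returned false

        reader.Close();
        trace += "," + reader.ReadState;               // after Close()

        return trace;
    }

    public static void Main()
    {
        // Hypothetical sample document.
        Console.WriteLine(TraceStates("<employees><employee id=\"1\" /></employees>"));
    }
}
```

The Error state is not exercised here; it applies when a parsing problem prevents the reader from continuing.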
Reading Attributes
Before you attempt to read attributes in an element node, you should first use the HasAttributes property to make sure that the element node contains attributes. Attributes in an element node can be accessed directly by their name or index. They can also be accessed by the MoveToAttribute(), MoveToFirstAttribute(), and MoveToNextAttribute() methods.
For example, to process an attribute by name, you can call MoveToAttribute() with the name of the attribute.
XmlReader reader = XmlReader.Create("Employees.xml");
//Move to the first element
reader.MoveToContent();
if (reader.HasAttributes)
{
    reader.MoveToAttribute("id");
    //Code to do something with the value stored in the id attribute
}
You see a complete example on the use of attributes in a later section of this chapter.
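As a small, self-contained illustration of the two access styles just described, the following sketch reads one attribute by name with GetAttribute() and then enumerates all attributes with MoveToFirstAttribute() and MoveToNextAttribute(). The class name AttributeSketch, its two methods, and the status attribute in the sample XML are assumptions made for the example; the chapter’s Employees.xml only has an id attribute.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeSketch
{
    // Returns the id attribute of the first <employee> element, or null if there is none.
    public static string GetEmployeeId(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Access by name: GetAttribute() does not move the reader.
            return reader.ReadToFollowing("employee") ? reader.GetAttribute("id") : null;
        }
    }

    // Returns "name=value" pairs for all attributes of the first <employee> element.
    public static string ListEmployeeAttributes(string xml)
    {
        string result = "";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            if (reader.ReadToFollowing("employee") && reader.MoveToFirstAttribute())
            {
                do
                {
                    if (result.Length > 0) result += ";";
                    result += reader.Name + "=" + reader.Value;
                }
                while (reader.MoveToNextAttribute());

                // Step back from the last attribute to the element that owns it.
                reader.MoveToElement();
            }
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample; the status attribute is invented for illustration.
        string xml = "<employees><employee id=\"1\" status=\"active\"><city>Seattle</city></employee></employees>";
        Console.WriteLine(GetEmployeeId(xml));
        Console.WriteLine(ListEmployeeAttributes(xml));
    }
}
```

GetAttribute() is convenient when you know the attribute name in advance; the MoveTo methods are the better fit when you need to visit every attribute.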
Reading Content and Other Data
Your application can use the ReadString() method to read the content of the current node as a string. You can also read the content of an element using the various forms of the ReadElementContentAsXXX() methods. In addition, the ReadContentAsXXX() methods allow you to read the text content at the current position. For example, using the ReadContentAsDouble() method, you can read the text content at the current position as a Double value. The ReadString() method behaves differently depending on the type of node on which the reader is currently positioned:
If the current node is an Element node, ReadString() concatenates all text, significant white space, white space, and CDATA section nodes within the Element node and returns the concatenated data as the Element node’s content.
If the current node is a Text node, ReadString() performs the same concatenation, starting at the Text node and continuing to the element’s end tag.
If the current node is an Attribute node, ReadString() behaves as though the reader were currently positioned on the starting tag of the Element node and returns data as described for Element nodes.
For all other node types, ReadString() returns an empty string.
Microsoft has greatly enhanced XML support in the .NET Framework 2.0 by adding strong type support to all the XML processing classes. An example of this is the introduction of methods like ReadElementContentAsInt() to the XmlReader class that allow you to read the contents of an XML node in a strongly typed manner. Accomplishing this in .NET 1.x meant reading the XML node as a string and then converting it to the appropriate data type using a helper class such as XmlConvert. This is no longer required in .NET Framework 2.0 because of the native support that is available in almost all of the XML processing classes. In addition to the strongly typed support, Microsoft has also greatly enhanced the performance of the XmlReader and XmlWriter classes.
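The difference between the strongly typed methods and the older read-then-convert approach can be sketched as follows. The class name TypedReadSketch and its two methods are assumptions made for the example, and the sample XML is a cut-down fragment in the style of Employees.xml.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TypedReadSketch
{
    // Strongly typed: the reader converts the element content to an int.
    public static int ReadZipCode(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            if (!reader.ReadToFollowing("zipCode"))
                throw new InvalidOperationException("zipCode element not found");
            return reader.ReadElementContentAsInt();
        }
    }

    // Read-then-convert: fetch the content as a string, then convert it
    // with the XmlConvert helper class.
    public static int ReadZipCodeWithXmlConvert(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            if (!reader.ReadToFollowing("zipCode"))
                throw new InvalidOperationException("zipCode element not found");
            string text = reader.ReadElementContentAsString();
            return XmlConvert.ToInt32(text);
        }
    }

    public static void Main()
    {
        string xml = "<employee id=\"1\"><city>Seattle</city><zipCode>98122</zipCode></employee>";
        Console.WriteLine(ReadZipCode(xml));
        Console.WriteLine(ReadZipCodeWithXmlConvert(xml));
    }
}
```

Both methods produce the same integer; the typed version simply removes the explicit conversion step from your code.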
Now that you have a complete understanding of the various methods and properties of the XmlReader class, it is time to look at examples that exercise all of these concepts.
Reading an XML File Using XmlReader
Now that you know the theory, this section begins with an example to demonstrate how to read an XML document using an XmlReader object. This simple example leverages the functionality of the XmlReader class to parse a static XML file named Employees.xml. Here’s the XML file, a list of employees in an organization, shown in Listing 4-1.
Listing 4-1: Employees.xml File
<?xml version='1.0'?>
<employees>
    <employee id="1">
        <name>
            <firstName>Nancy</firstName>
            <lastName>Davolio</lastName>
        </name>
        <city>Seattle</city>
        <state>WA</state>
        <zipCode>98122</zipCode>
    </employee>
    <employee id="2">
        <name>
            <firstName>Andrew</firstName>
            <lastName>Fuller</lastName>
        </name>
        <city>Tacoma</city>
        <state>WA</state>
        <zipCode>98401</zipCode>
    </employee>
</employees>
Now that you have seen the contents of the Employees.xml file, Listing 4-2 shows the ASP.NET code that allows you to parse the Employees.xml file.
Listing 4-2: Processing the Elements of the Employees XML File Using XmlReader Class
<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
    void Page_Load(object sender, EventArgs e)
    {
        //Location of XML file
        string xmlFilePath = @"C:\Data\Employees.xml";
        try
        {
            //Get reference to the XmlReader object
            using (XmlReader reader = XmlReader.Create(xmlFilePath))
            {
                string result;
                while (reader.Read())
                {
                    //Process only the elements
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        //Reset the variable for a new element
                        result = "";
                        for (int count = 1; count <= reader.Depth; count++)
                        {
                            result += "===";
                        }
                        result += "=> " + reader.Name + "<br/>";
                        lblResult.Text += result;
                    }
                }
            }
        }
        catch (Exception ex)
        {
            lblResult.Text = "An Exception occurred: " + ex.Message;
        }
    }
</script>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title>Reading an XML File using XmlReader</title>
</head>
<body>
    <form id="form1" runat="server">
        <div>
            <asp:label id="lblResult" runat="server" />
        </div>
    </form>
</body>
</html>
Before examining the code, look at the output produced by Listing 4-2, which is shown in Figure 4-2.
Figure 4-2
The first step is to import the namespaces required to execute the page. The .NET classes for XML parsing are contained primarily in the System.Xml namespace.
<%@ Import Namespace="System.Xml" %>
Next, within the Page_Load function, a variable containing the location of the XML file is defined. The code then declares an XmlReader object within the scope of a using block by invoking the Create() method of the XmlReader class.
using (XmlReader reader = XmlReader.Create(xmlFilePath))
Among the many enhancements made to the XmlReader class in .NET Framework 2.0, an important feature is the ability to dispose of the resources used by the XmlReader by invoking the Dispose method. This is possible because the XmlReader class now implements the IDisposable interface. As a result, you can enclose the creation of the XmlReader object within a using block, and the resources used by the XmlReader are released automatically at the end of the block.
Note that the XmlReader object isn’t limited to reading from files. Various overloads of the Create() method enable you to take XML input from URLs, streams, TextReader objects (such as a StringReader over an in-memory string), and other XmlReader objects.
The next step is to read the XML file, a simple matter because the XmlReader object provides a Read() method for just this purpose. This method returns true if it encounters a node in the XML file. After it is finished with the file, it returns false. This makes it easy to process an entire file simply by wrapping the method call in a while loop. Inside the while loop, there is code to process element nodes and format them for display.
The NodeType property of the current node can be used to pick out the elements for further processing.
if (reader.NodeType == XmlNodeType.Element)
The rest of the code in the while loop ensures that the output is formatted properly for display in the browser. Pay special attention to the use of the Depth property, which holds an integer value specifying the depth of the current node in the tree hierarchy. Simply put, the element <employees> is at depth 0, the element <employee> is at depth 1, and so on.
It is important to realize that a node read by the Read() method does not correspond to an entire XML element. For example, look at this XML element:
<city>Seattle</city>
From the perspective of the XmlReader, three nodes are read in the following order:
1. A node corresponding to the opening tag. This node has type Element and local name city.
2. A node corresponding to the data. This node has type Text and value Seattle.
3. A node corresponding to the closing tag. This node has type EndElement and local name city.
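The node sequence for the <city> element can be checked with a short sketch that prints the type of every node the reader reports, along with its local name or, for Text nodes, its value. The class name NodeSequenceSketch and the method Describe are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NodeSequenceSketch
{
    // Returns one "NodeType(detail)" entry per node, separated by spaces.
    // The detail is the value for Text nodes and the local name otherwise.
    public static string Describe(string xml)
    {
        string result = "";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                string detail = reader.NodeType == XmlNodeType.Text
                    ? reader.Value
                    : reader.LocalName;
                if (result.Length > 0) result += " ";
                result += reader.NodeType + "(" + detail + ")";
            }
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Describe("<city>Seattle</city>"));
    }
}
```

Running the same method over a larger document is a quick way to see which node types a processing loop has to expect.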
That takes care of handling elements. But what about the attributes contained within each element? In a later section, you see the steps involved in processing attributes using the XmlReader class.
Dealing with Exceptions
When the XmlReader class processes an XML file, it checks the file for well-formedness and also resolves external references (if any). Problems can crop up in many places, aside from the obvious one where the specified file is not found or cannot be opened. Any XML syntax error raises an exception of type System.Xml.XmlException. The Message property of this class returns a descriptive message about the error (as is the case with all Exception classes). This message also includes the line number and position where the error was found. The XmlException class has two additional properties, LineNumber and LinePosition, that return the line number and character position of the error, respectively. You can use this information as needed. For example, your program could open and display the offending XML file with a pointer indicating where the error occurred.
Exception handling in programs that use the XmlReader class (and other XML-related classes) follows this general scheme:
1. Catch exceptions of type XmlException to deal with XML parsing errors.
2. Catch other exceptions to deal with other types of errors.
For reasons of brevity, Listing 4-2 handled all exceptions, including XmlException, in a single catch block instead of two.
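The two-catch scheme described above might look like the following sketch, which reports the line and position of a parsing error separately from any other failure. The class name ExceptionSketch, the method Check, and the deliberately malformed sample XML are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ExceptionSketch
{
    // Returns "OK" for well-formed XML; otherwise describes the failure.
    public static string Check(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                // Reading to the end forces the whole document to be parsed.
                while (reader.Read()) { }
            }
            return "OK";
        }
        catch (XmlException ex)
        {
            // 1. XML parsing errors: line and position are available.
            return "XML error at line " + ex.LineNumber + ", position " + ex.LinePosition;
        }
        catch (Exception ex)
        {
            // 2. Any other kind of error (for example, I/O problems).
            return "Other error: " + ex.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Check("<employees><employee /></employees>"));
        // The <employee> start tag is never closed, so parsing fails.
        Console.WriteLine(Check("<employees><employee></employees>"));
    }
}
```

The more specific XmlException handler must come first; otherwise the general Exception handler would intercept parsing errors as well.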
Handling Attributes in an XML File
XML elements can include attributes, which consist of name/value pairs and are always string data. In the sample XML file, the employee element has an id attribute. As you play with the sample code in Listing 4-2, you may notice that when the nodes are read in, you don’t see any attributes. This is because