Reading and Writing XML Data Using XmlReader and XmlWriter




  XmlNodeReader: If you want to apply the pull model to a DOM tree that is already present in
memory, consider using the XmlNodeReader class. Best suited only for the very specialized
application previously mentioned, this class allows you to read the data from specific nodes of
the tree and enjoy a double benefit: the speed associated with the XmlReader class and the ease
of use of the DOM. You see usage of this class in Chapter 6.

Typically, you would create objects of these classes and use their methods and properties. If warranted,
you may also extend these classes to provide further specific functionality. The XmlWriter class has
only one derived class: XmlTextWriter. The XmlWriter can be used to write XML documents on a
forward-only basis. The classes utilized for writing XML data are as follows:

  XmlWriter: An abstract class that provides a fast, forward-only, non-cached way of
generating XML streams. By creating the XmlWriter object using the static Create() method,
you can take advantage of the new features of the XmlWriter class in .NET Framework 2.0.

  XmlTextWriter: A concrete writer that provides a fast, forward-only, non-cached way
of generating XML streams. Note that in .NET Framework 2.0 this class is superseded by writers
created through XmlWriter.Create() and should only be used in situations where you require
backward compatibility with an application created using .NET 1.x versions.

Now that you have an overview of the different classes available for reading and writing, the following
section focuses on reading XML data with the XmlReader class.

Reading XML with XmlReader

XmlReader provides you with a way to parse XML data that minimizes resource usage by reading
forward through the document, recognizing elements as it reads. This approach results in very little data
being cached in memory, but the forward-only style has two main consequences. The first is that it isn’t
possible to go back to an earlier point in the file without starting to read from the top again. The second
consequence is slightly more subtle: elements are read and presented to you one by one, with no context.
If you need to keep track of where an element occurs within the document structure, you’ll need to do it
yourself. If either of these consequences is a limitation for your application, you might need to use the
DOM-style XmlDocument class, which is discussed in Chapter 6 of this book.

Overview of XmlReader

The XmlReader class allows you to access XML data from a stream or XML document. This class
provides fast, non-cached, read-only, and forward-only access to XML data. In .NET Framework 1.x,
XmlReader is an abstract class that provides methods that are implemented by the derived classes to
provide access to the elements and attributes of XML data. With the release of .NET Framework 2.0,
XmlReader remains abstract, but its static Create() method returns a full-featured reader that is
similar to the XmlTextReader class and provides standards-based support for reading XML data. You use
an XmlReader to determine various factors such as the depth of a node in an XML document, whether
the node has attributes, the number of attributes in a node, and the value of an attribute.

Although you can use the XmlTextReader class to read XML data, the preferred approach to reading
XML data is to use the XmlReader object that is created through the static Create() method of the
XmlReader class. The reason is that the XmlReader object obtained through the Create() method is
much more standards compliant than the XmlTextReader implementation. For example, the
XmlTextReader class does not expand entities by default and does not add default attributes.








The XmlTextReader class is one of the derived classes of the XmlReader class and implements the
methods defined by the XmlReader class. The XmlValidatingReader is another class in .NET
Framework 1.x that is derived from the XmlReader class, allowing you to not only read XML data but
also support DTD and schema validation. Note that in .NET Framework 2.0, both classes are
superseded: the functionality of XmlTextReader is provided by readers obtained from
XmlReader.Create(), and the validation performed by XmlValidatingReader is configured through the
XmlReaderSettings class.

Steps Involved in Using XmlReader to Read XML Data

The XmlReader class is designed for fast, forward-only access to the contents of an XML file, and is not
suited for making modifications to the file’s contents or structure (for that you will use the XmlDocument
class). The XmlReader class works by starting at the beginning of the file and reading one node at a
time. As each node is read, you can either ignore the node or access the node information as dictated by
the needs of the application.

The steps for using the XmlReader class are as follows:

1. Create an instance of the class using the Create() method of the XmlReader class, passing to
the method the name of the XML file to be read.

2. Set up a loop that calls the Read() method repeatedly. The first call positions the reader on the
first node in the file, and each subsequent call advances to the next node. The method returns true
if there is a node to read and false when the end of the file has been reached.

3. In the loop, examine the properties and methods of the XmlReader object to obtain information
about the current node (its type, name, data, and so on). Keep looping until Read() returns false.
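The three steps can be sketched as follows. This is a minimal illustration rather than code from the chapter: the class name ReadLoopSketch and the method CountElements are hypothetical, and a StringReader stands in for a file name so that the sketch runs without an external file.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper used only to illustrate the three steps.
public static class ReadLoopSketch
{
    // Counts the Element nodes in an XML string.
    public static int CountElements(string xml)
    {
        int elementCount = 0;

        // Step 1: create the reader (a StringReader replaces the file name here).
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Step 2: Read() returns false once the end of the input is reached.
            while (reader.Read())
            {
                // Step 3: inspect the current node and act on it.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    elementCount++;
                }
            }
        }

        return elementCount;
    }
}
```

Calling ReadLoopSketch.CountElements with a small document such as "<a><b/><c>x</c></a>" counts one Element node for each of a, b, and c.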

The XmlReader class has a large number of properties and methods. The ones that you will need most
often are explained in Table 4-1 and Table 4-2.

Table 4-1. Important Properties of the XmlReader Class

Property         Description

AttributeCount   Returns the number of attributes in the current node
Depth            Returns the depth of the current node; used to determine if
                 a specific node has child nodes
EOF              Indicates if the reader is positioned at the end of the stream
HasAttributes    Returns a boolean value indicating if the current node has
                 attributes
HasValue         Returns a boolean value indicating if the current node can
                 have a value
IsEmptyElement   Indicates if the current node is an empty element
LocalName        Returns the local name of the current node
Name             Returns the qualified name of the current node
NamespaceURI     Returns the namespace URI of the current node
NodeType         Returns the type of the current node in the form of an
                 XmlNodeType enumeration
Prefix           Returns the namespace prefix associated with the current node
ReadState        Returns the current state of the reader in the form of a
                 ReadState enumeration
Settings         Returns the XmlReaderSettings object used to create the
                 XmlReader instance
Value            Gets the value of the current node
ValueType        Gets the CLR type of the current node

Now that you have an understanding of the important properties of the XmlReader class, Table 4-2
outlines the important methods of the XmlReader class.

Table 4-2. Important Methods of the XmlReader Class

Method                 Description

Close                  Closes the XmlReader object by setting the ReadState
                       enumeration to Closed
Create                 Factory method that creates an instance of the XmlReader
                       object and returns it to the caller; the preferred mechanism
                       for obtaining XmlReader instances
GetAttribute           Gets the value of an attribute
IsStartElement         Indicates if the current node is a start tag
MoveToAttribute        Moves the reader to the specified attribute
MoveToContent          Moves the reader to the next content node if the current
                       node is not a content node
MoveToElement          Moves the reader to the element that contains the current
                       attribute; used when you are enumerating through the
                       attributes and you want to switch back to the element that
                       contains all these attributes
MoveToFirstAttribute   Moves the reader to the first attribute of the current node
MoveToNextAttribute    Moves the reader to the next attribute; used especially when
                       you are enumerating through the attributes in a node
Read                   Reads the next node from the stream
ReadContentAs          Reads the content as an object of the supplied type
ReadElementContentAs   Reads the current element and returns its contents as an
                       object of the type specified
ReadEndElement         Moves the reader past the current end tag and moves on to
                       the next node
ReadInnerXml           Reads all of the node’s content including the markup as a
                       string
ReadOuterXml           Reads the node’s content including the current node
                       markup and all its children
ReadToDescendant       Moves the reader to the next matching descendant element
ReadToFollowing        Reads until the named element is found
ReadToNextSibling      Advances the reader to the next matching sibling element
ReadValueChunk         Allows you to read large streams of text embedded in an
                       XML document

In addition to the methods described in Table 4-2, XmlReader also exposes a variety of
ReadContentAsXXX() methods such as:

  ReadContentAsBase64()

  ReadContentAsBinHex()

  ReadContentAsBoolean()

  ReadContentAsDateTime()

  ReadContentAsDouble()

  ReadContentAsInt()

  ReadContentAsLong()

  ReadContentAsObject()

  ReadContentAsString()

As the name suggests, these methods return the node value as an object of the type specified in the
method name. For instance, the ReadContentAsString() method returns the node value as an object
of type string. Similar to the ReadContentAsXXX() methods, there are also a number of variations of
the ReadElementContentAsXXX() method. These methods are:

  ReadElementContentAsBase64()

  ReadElementContentAsBinHex()

  ReadElementContentAsBoolean()

  ReadElementContentAsDateTime()

  ReadElementContentAsDouble()








  ReadElementContentAsInt()

  ReadElementContentAsLong()

  ReadElementContentAsObject()

  ReadElementContentAsString()

Of all these methods, the most important is Read(), which tells the XmlReader to fetch the
next node from the document. After you’ve got the node, you can use the NodeType property to find out
what you have. The NodeType property returns one of the members of the XmlNodeType enumeration,
whose members are listed in Table 4-3.

Table 4-3. Members of the XmlNodeType Enumeration

Member                  Description

Attribute               An attribute, for example id=1
CDATA                   A CDATA section, for example <![CDATA[Some text]]>
Comment                 An XML comment, for example <!-- Some comment -->
Document                The document object, representing the root of the XML tree
DocumentFragment        A fragment of XML that isn’t a document in itself
DocumentType            A document type declaration
Element, EndElement     The start and end of an element
Entity, EndEntity       The start and end of an entity declaration
EntityReference         An entity reference (for example, &lt;)
None                    Used if the node type is queried when no node has been read
Notation                A notation entry in a DTD
ProcessingInstruction   An XML processing instruction
SignificantWhitespace   White space in a mixed content model document, or when
                        xml:space=preserve has been set
Text                    The text content of an element
Whitespace              White space between markup
XmlDeclaration          The XML declaration at the top of a document

Now that you understand the important properties and methods, take a look at the different ways
of reading documents, elements, attributes, and other data in the next few sections.






Chapter 4

Start Reading a Document

To begin reading an XML document, you can call any of the Read() methods to extract data from the
document. For example, this code snippet uses ReadStartElement() to skip the prolog and read the
start tag of the first element in the document:

XmlReader reader = XmlReader.Create("Employees.xml");
//Skip the XML declaration and read past the start tag of the first element
reader.ReadStartElement();

Alternatively, you can just jump straight to the document content by calling MoveToContent(), which
skips to the next content node if the current node is not a content node. (Content nodes are
non-white-space Text, CDATA, Element, EndElement, EntityReference, and EndEntity nodes.) If positioned
on an attribute, the reader will move back to the element that contains the attribute.

XmlReader reader = XmlReader.Create("Employees.xml");
reader.MoveToContent();

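In the examples shown, if Employees.xml looks as follows

<?xml version="1.0"?>
<!--Employee Details -->
<firstName>
  Nancy
</firstName>

both snippets skip everything before the <firstName> element in the prolog. MoveToContent() leaves the
reader positioned on the <firstName> element, whereas ReadStartElement() checks that the element is a
start tag and then advances to the node that follows it.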
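To see where each call leaves the reader, the following sketch runs both against the same document. It is an illustration only: the class StartReadingSketch and its method names are hypothetical, and the XML is supplied as a string through a StringReader instead of the Employees.xml file.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper; not part of the chapter's sample code.
public static class StartReadingSketch
{
    // Reports the node type and name the reader is on after MoveToContent().
    public static string AfterMoveToContent(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.NodeType + ":" + reader.Name;
        }
    }

    // Reports the node type and value the reader is on after ReadStartElement().
    public static string AfterReadStartElement(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadStartElement();
            return reader.NodeType + ":" + reader.Value;
        }
    }
}
```

For the input "<?xml version=\"1.0\"?><!--Employee Details --><firstName>Nancy</firstName>", the first method reports the firstName Element node, while the second reports the Text node that holds Nancy, because ReadStartElement() has already moved past the start tag.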

Reading Elements

The Read(), ReadString(), ReadStartElement(), and ReadEndElement() methods can all be used
to read Element nodes from the XML source. After reading the element, each method advances to the
next node in the document. In comparison, the MoveToElement() method does not advance through the
document at all: when the reader is positioned on an attribute, it moves the reader back to the
element that contains that attribute.

The Read() method is the simplest: It reads the next node in the source whether or not it is an Element
node. When using this method, you should check the node’s name and type to make sure you are
processing an appropriate node. For example, the following code uses the Read() method and the
NodeType property of the XmlReader to read only Comment nodes:

XmlReader reader = XmlReader.Create("Employees.xml");
//Read the nodes in a loop
while (reader.Read())
{
    if (reader.NodeType == XmlNodeType.Comment)
    {
        //Code to process Comments
    }
}

As you read through the XML document using the XmlReader object, if you examine the ReadState
property of the XmlReader object, you will find that it provides different values depending on the state of
the XmlReader. Table 4-4 summarizes the states of the XmlReader as it reads through the various portions
of an XML document.








Table 4-4. Members of the ReadState Enumeration

State         Description

Closed        The reader enters this state when the Close method is called
EndOfFile     Signals the end of the XML document
Error         Specifies that an error has occurred and the error prevents
              the reader from continuing the read operation
Initial       The reader is in this state before the invocation of the Read
              method
Interactive   The reader is in this state after the Read method has been
              called and can respond to the additional methods
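The states in Table 4-4 can be observed directly. The sketch below is an assumption-laden illustration (the class ReadStateSketch and the method ObserveStates are hypothetical names) that records the ReadState property before the first Read(), after it, at the end of the input, and after Close(). The Error state is not exercised here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper used only to illustrate ReadState transitions.
public static class ReadStateSketch
{
    // Returns the reader's states, comma-separated, at four points in its lifetime.
    public static string ObserveStates(string xml)
    {
        List<string> states = new List<string>();
        XmlReader reader = XmlReader.Create(new StringReader(xml));

        states.Add(reader.ReadState.ToString());   // before the first Read()
        reader.Read();
        states.Add(reader.ReadState.ToString());   // positioned on a node
        while (reader.Read()) { }                  // consume the rest of the input
        states.Add(reader.ReadState.ToString());   // past the last node
        reader.Close();
        states.Add(reader.ReadState.ToString());   // after Close()

        return string.Join(",", states.ToArray());
    }
}
```

For a well-formed document, the recorded sequence is Initial, Interactive, EndOfFile, and Closed.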

Reading Attributes

Before you attempt to read attributes in an element node, you should first use the HasAttributes property
to make sure that the element node contains attributes. Attributes in an element node can be accessed
directly by their name or index. They can also be accessed by the MoveToAttribute(),
MoveToFirstAttribute(), and MoveToNextAttribute() methods.

For example, to process an attribute by name, you can call MoveToAttribute() with the name of the
attribute.

XmlReader reader = XmlReader.Create("Employees.xml");
//Move to the first element (MoveToContent skips the prolog)
reader.MoveToContent();
if (reader.HasAttributes)
{
    reader.MoveToAttribute("id");
    //Code to do something with the attribute value stored in id attribute
}

You see a complete example on the use of attributes in a later section of this chapter.

Reading Content and Other Data

Your application can use the ReadString() method to read the content of the current node as a string.
You can also read the content of the element using the various forms of the ReadElementContentAsXXX
methods. In addition to those methods, you also have the ReadContentAsXXX methods that allow you
to read the text content at the current position. For example, using the ReadContentAsDouble()
method, you can read the text content at the current position as a Double value. The ReadString()
method behaves differently depending on the type of node the reader is currently positioned on.

  If the current node is an Element node, ReadString() concatenates all text, significant white
space, white space, and CDATA section node types within the Element node and returns the
concatenated data as the Element node’s content.

  If the current node is a Text node, ReadString() performs the same concatenation, starting
from the Text node and continuing up to the element’s end tag.

  If the current node is an Attribute node, ReadString() behaves as though the reader were
currently positioned on the starting tag of the Element node and returns data as described for
Element nodes.

  For all other node types, ReadString() returns an empty string.
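The first and last cases above can be illustrated with a short sketch. The class ReadStringSketch and its methods are hypothetical names introduced for this illustration, and the XML is passed in as a string.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper used only to illustrate ReadString().
public static class ReadStringSketch
{
    // Positions the reader on the root element and returns its concatenated text.
    public static string ElementText(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();       // now on the root Element node
            return reader.ReadString();   // joins Text and CDATA children
        }
    }

    // Calls ReadString() on the very first node, whatever its type.
    public static string FirstNodeText(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read();
            return reader.ReadString();
        }
    }
}
```

With the input "<note>Hello <![CDATA[big]]> world</note>", ElementText joins the two Text nodes and the CDATA section into a single string. With "<!--c--><note>x</note>", FirstNodeText is called while the reader sits on a Comment node, so it returns an empty string.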

Microsoft has greatly enhanced XML support in the .NET Framework 2.0 by adding
strong type support to all the XML processing classes. An example of this is the
introduction of methods like ReadElementContentAsInt() to the XmlReader class
that allow you to read the contents of an XML node in a strongly typed manner.
Accomplishing this in .NET 1.x would mean that you read the XML node as a string
and then convert it to the appropriate data type using a helper class such as XmlConvert.
This is no longer required in .NET Framework 2.0 because of the native support that is
available for almost all of the XML processing classes. In addition to the strongly typed
support, Microsoft also has greatly enhanced the performance of the XmlReader and
XmlWriter classes.
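The difference between the two approaches can be sketched side by side. The class TypedReadSketch and its method names are hypothetical, and the element is supplied as a string; the two methods are meant to show the .NET 2.0 typed call next to the read-then-convert pattern described for .NET 1.x.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper contrasting typed and string-based reads.
public static class TypedReadSketch
{
    // .NET 2.0 style: read the element content directly as an int.
    public static int ReadTyped(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsInt();
        }
    }

    // .NET 1.x style: read the content as a string, then convert with XmlConvert.
    public static int ReadViaXmlConvert(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            string text = reader.ReadString();
            return XmlConvert.ToInt32(text);
        }
    }
}
```

For an element such as "<zipCode>98122</zipCode>", both methods yield the same integer; the typed call simply removes the separate conversion step.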

Now that you have a complete understanding of the various methods and properties of the XmlReader
class, it is time to look at examples that exercise all of these concepts.

Reading an XML File Using XmlReader

Now that you know the theory, this section begins with an example to demonstrate how to read an XML
document using an XmlReader object. This simple example leverages the functionalities of the XmlReader
class to parse a static XML file named Employees.xml. Here’s the XML file, a list of employees in an
organization, shown in Listing 4-1.

Listing 4-1: Employees.xml File

<?xml version='1.0'?>
<employees>
  <employee id="1">
    <name>
      <firstName>Nancy</firstName>
      <lastName>Davolio</lastName>
    </name>
    <city>Seattle</city>
    <state>WA</state>
    <zipCode>98122</zipCode>
  </employee>
  <employee id="2">
    <name>
      <firstName>Andrew</firstName>
      <lastName>Fuller</lastName>
    </name>
    <city>Tacoma</city>
    <state>WA</state>
    <zipCode>98401</zipCode>
  </employee>
</employees>








Now that you have seen the contents of the Employees.xml file, Listing 4-2 shows the ASP.NET code
that allows you to parse the Employees.xml file.

Listing 4-2: Processing the Elements of the Employees XML File Using XmlReader Class

<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
    void Page_Load(object sender, EventArgs e)
    {
        //Location of XML file
        string xmlFilePath = @"C:\Data\Employees.xml";
        try
        {
            //Get reference to the XmlReader object
            using (XmlReader reader = XmlReader.Create(xmlFilePath))
            {
                string result;
                while (reader.Read())
                {
                    //Process only the elements
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        //Reset the variable for a new element
                        result = "";
                        for (int count = 1; count <= reader.Depth; count++)
                        {
                            result += "===";
                        }
                        result += "=> " + reader.Name + "<br/>";
                        lblResult.Text += result;
                    }
                }
            }
        }
        catch (Exception ex)
        {
            lblResult.Text = "An Exception occurred: " + ex.Message;
        }
    }
</script>

<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
    <title>Reading an XML File using XmlReader</title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        <asp:label id="lblResult" runat="server" />
    </div>
    </form>
</body>
</html>







Before examining the code, here is the output produced by Listing 4-2.

Figure 4-2

The first step is to import the namespaces required to execute the page. The .NET classes for
parsing XML are contained primarily in the System.Xml namespace.

<%@ Import Namespace="System.Xml" %>

Next, within the Page_Load function, a variable containing the location of the XML file is defined. The
code then declares an XmlReader object within the scope of a using block by invoking the static
Create() method of the XmlReader class.

using (XmlReader reader = XmlReader.Create(xmlFilePath))

Among the many enhancements made to the XmlReader class in .NET Framework
2.0, an important feature is the ability to dispose of the resources used by the
XmlReader by invoking the Dispose method. This is made possible by the fact that
the XmlReader class now implements the IDisposable interface. Because of this, you
can now enclose the creation of the XmlReader object within the scope of a using
block and the resources utilized by the XmlReader will be automatically released at
the end of the using block.

Note that an XmlReader object isn’t limited to reading from files. Various overloads of the Create()
method enable you to take XML input from URLs, streams, TextReader objects (such as a StringReader
wrapped around a string), and other XmlReader objects.

The next step is to read the XML file, which is a simple matter because the XmlReader object provides
a Read() method for just this purpose. This method returns true if it encounters a node in the XML
file. After it is finished with the file, it returns false. This makes it easy to process an entire
file simply by wrapping the method call in a while loop. Inside the while loop, there is code to
process element nodes and format them for display.
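As an illustration of the Create() overloads mentioned above, the following sketch reads the same kind of input from a string (through a StringReader) and from a Stream. The class CreateOverloadsSketch and its method names are hypothetical and are not part of Listing 4-2.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper showing two non-file input sources.
public static class CreateOverloadsSketch
{
    // Uses the Create(TextReader) overload to parse XML held in a string.
    public static string RootNameFromString(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.Name;
        }
    }

    // Uses the Create(Stream) overload; the stream could come from a file or the network.
    public static string RootNameFromStream(Stream stream)
    {
        using (XmlReader reader = XmlReader.Create(stream))
        {
            reader.MoveToContent();
            return reader.Name;
        }
    }
}
```

Note that passing a plain string to Create() is interpreted as a URI or file path, which is why in-memory XML text is wrapped in a StringReader here.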








The NodeType property of the current node can be used to filter out the elements for further processing.

if (reader.NodeType == XmlNodeType.Element)

The rest of the code in the “while” loop ensures that the output is formatted properly for display in the
browser. Pay special attention to the use of the Depth property, which holds an integer value specifying
the depth of the current node in the tree hierarchy. Simply put, the element <employees> is at depth 0;
the element <employee> is at depth 1, and so on.

It is important to realize that a node read by the Read method does not correspond to an entire XML
element. For example, look at this XML element:

<city>Seattle</city>

From the perspective of the XmlReader, three nodes are read, in the following order:
1. A node corresponding to the opening tag. This node has type Element and local name ‘city’.
2. A node corresponding to the data. This node has type Text and value ‘Seattle’.
3. A node corresponding to the closing tag. This node has type EndElement and local name ‘city’.
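The node sequence described above can be checked with a short sketch. The class NodeSequenceSketch and the method DescribeNodes are hypothetical names; the method records each node's type together with its value (for Text nodes) or its local name (for everything else).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper that lists the nodes the reader reports.
public static class NodeSequenceSketch
{
    // Returns one "NodeType:detail" entry for every node returned by Read().
    public static List<string> DescribeNodes(string xml)
    {
        List<string> nodes = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Text nodes carry a value; element nodes carry a name.
                string detail = reader.NodeType == XmlNodeType.Text
                    ? reader.Value
                    : reader.LocalName;
                nodes.Add(reader.NodeType + ":" + detail);
            }
        }
        return nodes;
    }
}
```

Passing "<city>Seattle</city>" produces three entries: an Element named city, a Text node with the value Seattle, and an EndElement named city.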

That takes care of handling elements. But what about the attributes contained within each element? In a
later section, you see the steps involved in processing attributes using the XmlReader class.

Dealing with Exceptions

When the XmlReader class processes an XML file, it checks the XML file for well-formedness and also
resolves external references (if any). Problems can crop up in many places, aside from the obvious one
where the specified file is not found or cannot be opened. Any XML syntax error will raise an exception
of type System.Xml.XmlException. The Message property of this class returns a descriptive message
about the error (as is the case with all Exception classes). This message also includes the line number
and position where the error was found. The XmlException class has two additional properties,
LineNumber and LinePosition, that return the line number and character position of the error,
respectively. You can use this information as needed. For example, your program could open and
display the offending XML file with a pointer indicating where the error occurred.

Exception handling in programs that use the XmlReader class (and other XML-related classes) follows
this general scheme:

1. Catch exceptions of type XmlException to deal with XML parsing errors.

2. Catch other exceptions to deal with other types of errors.

For reasons of brevity, the previous example shown in Listing 4-2 handled all the exceptions including
the XmlException in a single catch block as opposed to creating two catch blocks.
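The two-block scheme might look like the following sketch. It is an illustration under stated assumptions, not a rewrite of Listing 4-2: the class ExceptionSketch and the method TryParse are hypothetical, and the method returns a status string instead of writing to a label control.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper showing separate handling of XML and non-XML errors.
public static class ExceptionSketch
{
    // Reads the whole document and reports whether parsing succeeded.
    public static string TryParse(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return "OK";
        }
        catch (XmlException ex)
        {
            // XML parsing errors: report where the problem was found.
            return "XML error at line " + ex.LineNumber
                + ", position " + ex.LinePosition;
        }
        catch (Exception ex)
        {
            // Anything else, such as I/O failures.
            return "Other error: " + ex.Message;
        }
    }
}
```

The XmlException block must come first because XmlException derives from Exception; placed the other way around, the general block would catch everything and the compiler would reject the unreachable handler.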

Handling Attributes in an XML File

XML elements can include attributes, which consist of name/value pairs and are always string data. In
the sample XML file, the employee element has an id attribute. As you play with the sample code in
Listing 4-2, you may notice that when the nodes are read in, you don’t see any attributes. This is because