Learn Xml: Reading and Writing XML Data Using XmlReader and XmlWriter

XmlNodeReader— In case you are looking to implement the pull model on a DOM tree that’s

already present in memory, you can consider using the XmlNodeReader class. Best-suited only

for the very specialized application previously mentioned, this class allows you to read the data

from specific nodes of the tree and enjoy a double benefit — the speed associated with the

XmlReader class and the ease of use of the DOM. You see usage of this class in Chapter 6.

Typically, you would create objects of these classes and use their methods and properties. If warranted,

you may also extend these classes to provide further specific functionalities. The XmlWriter class has

only one derived class: XmlTextWriter. The XmlWriter can be used to write XML documents on a

forward-only basis. The classes utilized for writing XML data are as follows:

XmlWriter: An abstract class that provides a forward-only, write-only, non-cached way of

generating XML streams. By creating the XmlWriter object using the static Create() method,

you can take advantage of the new features of the XmlWriter object in .NET Framework 2.0.

XmlTextWriter: A concrete writer that provides a forward-only, write-only, non-cached way

of generating XML streams. Note that in .NET Framework 2.0 this class is superseded by the writers

returned from XmlWriter.Create() and should only be used in situations where you require

backward compatibility with an application created using .NET 1.x versions.

Now that you have an overview of the different classes available for reading and writing, the following

section focuses on reading XML data with the XmlReader class.

Reading XML with XmlReader

XmlReader provides you with a way to parse XML data that minimizes resource usage by reading

forward through the document, recognizing elements as it reads. This approach results in very little data

being cached in memory, but the forward-only style has two main consequences. The first is that it isn’t

possible to go back to an earlier point in the file without starting to read from the top again. The second

consequence is slightly more subtle: elements are read and presented to you one by one, with no context.

If you need to keep track of where an element occurs within the document structure, you’ll need to do it

yourself. If either of these limitations is a problem for your application, you might need to use the

DOM-style XmlDocument class, which is discussed in Chapter 6 of this book.

Overview of XmlReader

The XmlReader class allows you to access XML data from a stream or XML document. This class

provides fast, non-cached, read-only, and forward-only access to XML data. In .NET Framework 1.x, the

XmlReader is an abstract class that provides methods that are implemented by the derived classes to

provide access to the elements and attributes of XML data. With the release of .NET Framework 2.0,

however, the static Create() method of the XmlReader class returns a full-featured reader, similar to the

XmlTextReader class, that provides standards-based support to read XML data. You use XmlReader classes to determine various factors

such as the depth of a node in an XML document, whether the node has attributes, the number of

attributes in a node, and the value of an attribute.

Although you can use the XmlTextReader class to read XML data, the preferred approach to reading

XML data is to use the XmlReader object that is created through the static Create() method of the

XmlReader class. This is because the XmlReader object obtained through the

Create() method is much more standards compliant than the XmlTextReader implementation. For

example, the XmlTextReader class does not expand entities by default and does not add default attributes.

The XmlTextReader class is one of the derived classes of the XmlReader class and implements the

methods defined by the XmlReader class. The XmlValidatingReader is another class in .NET

Framework 1.x that is derived from the XmlReader class, allowing you to not only read XML data but

also support DTD and schema validation. Note that in .NET Framework 2.0, both the XmlTextReader and

XmlValidatingReader classes are superseded: their functionality is now provided by the XmlReader

and XmlReaderSettings classes, respectively.

Steps Involved in Using XmlReader to Read XML Data

The XmlReader class is designed for fast, forward-only access to the contents of an XML file, and is not

suited for making modifications to the file’s contents or structure (for that you will use the XmlDocument

class). The XmlReader class works by starting at the beginning of the file and reading one node at a

time. As each node is read, you can either ignore the node or access the node information as dictated by

the needs of the application.

The steps for using the XmlReader class are as follows:

1. Create an instance of the class using the Create() method of the XmlReader class, passing to

the method the name of the XML file to be read.

2. Set up a loop that calls the Read() method repeatedly. This method starts with the first node in

the file and then reads all remaining nodes, one at a time, as it is called. It returns true if there is

a node to read, false when the end of the file has been reached.

3. In the loop, examine the properties and methods of the XmlReader object to obtain information

about the current node (its type, name, data, and so on). Loop back until Read() returns false.
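The three steps can be sketched as follows. This is a minimal illustration, not code from the chapter's listings: the class name ReadLoopSketch, the method CountNodeTypes, and the inline XML string are assumptions made for the example, and the reader is created from a StringReader rather than a file name so that the sketch is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, used only for this illustration.
public static class ReadLoopSketch
{
    // Counts how many nodes of each type the reader reports.
    public static Dictionary<XmlNodeType, int> CountNodeTypes(string xml)
    {
        Dictionary<XmlNodeType, int> counts = new Dictionary<XmlNodeType, int>();

        // Step 1: create the reader (from a string here; a file name also works).
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Step 2: Read() returns false once the input is exhausted.
            while (reader.Read())
            {
                // Step 3: examine the current node, here only its type.
                int seen;
                counts.TryGetValue(reader.NodeType, out seen);
                counts[reader.NodeType] = seen + 1;
            }
        }
        return counts;
    }

    public static void Main()
    {
        // Assumed sample input, not the chapter's Employees.xml.
        string xml = "<employees><employee>Nancy</employee><employee>Andrew</employee></employees>";
        foreach (KeyValuePair<XmlNodeType, int> entry in CountNodeTypes(xml))
        {
            Console.WriteLine(entry.Key + ": " + entry.Value);
        }
    }
}
```

For the sample string, the loop reports three Element nodes, two Text nodes, and three EndElement nodes.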

The XmlReader class has a large number of properties and methods. The ones that you will need most

often are explained in Table 4-1 and Table 4-2.

Table 4-1. Important Properties of the XmlReader Class

Property          Description
AttributeCount    Returns the number of attributes in the current node
Depth             Returns the depth of the current node; used to determine if a specific node has child nodes
EOF               Indicates if the reader is positioned at the end of the stream
HasAttributes     Returns a boolean value indicating if the current node has attributes
HasValue          Returns a boolean value indicating if the current node can have a value
IsEmptyElement    Indicates if the current node is an empty element
LocalName         Returns the local name of the current node
Name              Returns the qualified name of the current node
NamespaceURI      Returns the namespace URI of the current node
NodeType          Returns the type of the current node in the form of an XmlNodeType enumeration
Prefix            Returns the namespace prefix associated with the current node
ReadState         Returns the current state of the reader in the form of a ReadState enumeration
Settings          Returns the XmlReaderSettings object used to create the XmlReader instance
Value             Gets the value of the current node
ValueType         Gets the CLR type of the current node

Now that you have an understanding of the important properties of the XmlReader class, Table 4-2

outlines the important methods of the XmlReader class.

Table 4-2. Important Methods of the XmlReader Class

Method                  Description
Close                   Closes the XmlReader object by setting the ReadState enumeration to Closed
Create                  Factory method that creates an instance of the XmlReader object and returns it to the caller; the preferred mechanism for obtaining XmlReader instances
GetAttribute            Gets the value of an attribute
IsStartElement          Indicates if the current node is a start tag
MoveToAttribute         Moves the reader to the specified attribute
MoveToContent           Moves the reader to the next content node if the current node is not a content node
MoveToElement           Moves the reader to the element that contains the current attribute; used when you are enumerating through the attributes and you want to switch back to the element that contains all these attributes
MoveToFirstAttribute    Moves the reader to the first attribute of the current node
MoveToNextAttribute     Moves the reader to the next attribute; used especially when you are enumerating through the attributes in a node
Read                    Reads the next node from the stream
ReadContentAs           Reads the content as an object of the supplied type
ReadElementContentAs    Reads the current element and returns its contents as an object of the type specified
ReadEndElement          Moves the reader past the current end tag and moves on to the next node
ReadInnerXml            Reads all of the node's content, including the markup, as a string
ReadOuterXml            Reads the node's content, including the current node markup and all its children
ReadToDescendant        Moves the reader to the next matching descendant element
ReadToFollowing         Reads until the named element is found
ReadToNextSibling       Advances the reader to the next matching sibling element
ReadValueChunk          Allows you to read large streams of text embedded in an XML document

In addition to the methods described in Table 4-2, XmlReader also exposes a variety of

ReadContentAsXXX() methods such as:

ReadContentAsBase64()

ReadContentAsBinHex()

ReadContentAsBoolean()

ReadContentAsDateTime()

ReadContentAsDouble()

ReadContentAsInt()

ReadContentAsLong()

ReadContentAsObject()

ReadContentAsString()

As the name suggests, these methods return the node value as an object of the type specified in the

method name. For instance, the ReadContentAsString() method returns the node value as an object

of type string. Similar to the ReadContentAsXXX() methods, there are also a number of variations of

the ReadElementContentAsXXX() method. These methods are:

ReadElementContentAsBase64()

ReadElementContentAsBinHex()

ReadElementContentAsBoolean()

ReadElementContentAsDateTime()

ReadElementContentAsDouble()

ReadElementContentAsInt()

ReadElementContentAsLong()

ReadElementContentAsObject()

ReadElementContentAsString()

Of all these methods, the most important is Read(), which tells the XmlReader to fetch the

next node from the document. After you've got the node, you can use the NodeType property to find out

what you have. The NodeType property returns one of the members of the XmlNodeType enumeration,

whose members are listed in Table 4-3.

Table 4-3. Members of the XmlNodeType Enumeration

Member                   Description
Attribute                An attribute, for example id=1
CDATA                    A CDATA section, for example <![CDATA[Some text]]>
Comment                  An XML comment, for example <!-- Some comment -->
Document                 The document object, representing the root of the XML tree
DocumentFragment         A fragment of XML that isn't a document in itself
DocumentType             A document type declaration
Element, EndElement      The start and end of an element
Entity, EndEntity        The start and end of an entity declaration
EntityReference          An entity reference (for example, &lt;)
None                     Used if the node type is queried when no node has been read
Notation                 A notation entry in a DTD
ProcessingInstruction    An XML processing instruction
SignificantWhitespace    White space in a mixed content model document, or when xml:space=preserve has been set
Text                     The text content of an element
Whitespace               White space between markup
XmlDeclaration           The XML declaration at the top of a document

Now that you have understood the important properties and methods, take a look at the different ways

of reading documents, elements, attributes, and other data in the next few sections.

Chapter 4

Start Reading a Document

To begin reading an XML document, you can call any of the Read() methods to extract data from the

document. For example, this code snippet uses the ReadStartElement() to move to the first element in

the document:

XmlReader reader = XmlReader.Create("Employees.xml");
//Skip the prolog, check that the next content node is an element,
//and read past its start tag
reader.ReadStartElement();

Alternatively, you can just jump straight to the document content by calling MoveToContent(), which

skips to the next content node if the current node is not a content node. (Content nodes are non-white-space

text, CDATA, Element, EndElement, EntityReference, and EndEntity nodes.) If positioned on an attribute,

the reader will move back to the element that contains the attribute.

XmlReader reader = XmlReader.Create("Employees.xml");
reader.MoveToContent();

In the examples shown, if Employees.xml looks as follows

<?xml version="1.0"?>
<firstName>
    Nancy
</firstName>

the previous code would advance to the <firstName> element and skip everything before it in the prolog.
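To see MoveToContent() skip the prolog, consider the following sketch. The class name MoveToContentSketch, the method FirstContentName, and the comment placed in the sample prolog are assumptions added for illustration; the XML is supplied as a string so that the example runs without a file.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, used only for this illustration.
public static class MoveToContentSketch
{
    // Returns the name of the first content node in the document.
    public static string FirstContentName(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Skips the XML declaration, comments, and white space in the prolog.
            XmlNodeType type = reader.MoveToContent();
            return type == XmlNodeType.Element ? reader.Name : null;
        }
    }

    public static void Main()
    {
        // Assumed sample input: a declaration and a comment before the first element.
        string xml = "<?xml version=\"1.0\"?><!-- staff --><firstName>Nancy</firstName>";
        Console.WriteLine(FirstContentName(xml)); // prints "firstName"
    }
}
```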

Reading Elements

The Read(), ReadString(), ReadStartElement(), and ReadEndElement() methods can all be used

to read Element nodes from the XML source. After reading the element, each method advances to the

next node in the document. In comparison, the MoveToElement() method does not read anything: it only

moves the reader from an attribute back to the element that contains it.

The Read() method is the simplest: It reads the next node in the source whether or not it is an Element

node. When using this method, you should check the node’s name and type to make sure you are

processing an appropriate node. For example, the following code uses the Read() method and the

NodeType property of the XmlReader to read only Comment nodes:

XmlReader reader = XmlReader.Create("Employees.xml");
//Read the nodes in a loop
while (reader.Read())
{
    if (reader.NodeType == XmlNodeType.Comment)
    {
        //Code to process Comments
    }
}

As you read through the XML document using the XmlReader object, if you examine the ReadState

property of the XmlReader object, you will find that it provides different values depending on the state of

the XmlReader. Table 4-4 summarizes the states of the XmlReader as it reads through the various portions

of an XML document.

Table 4-4. Members of the ReadState Enumeration

State          Description
Closed         The reader enters this state when the Close method is called
EndOfFile      Signals the end of the XML document
Error          Specifies that an error has occurred and the error prevents the reader from continuing the read operation
Initial        The reader is in this state before the invocation of the Read method
Interactive    The reader is in this state after the Read method has been called and can respond to the additional methods
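The state transitions can be observed with a short sketch such as the following. The class name ReadStateSketch and the method TraceStates are assumptions made for this example, and the input is an arbitrary one-element document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, used only for this illustration.
public static class ReadStateSketch
{
    // Records the ReadState at four points in the reader's lifetime.
    public static List<ReadState> TraceStates(string xml)
    {
        List<ReadState> states = new List<ReadState>();
        XmlReader reader = XmlReader.Create(new StringReader(xml));

        states.Add(reader.ReadState);   // before the first Read()
        reader.Read();
        states.Add(reader.ReadState);   // after the first successful Read()
        while (reader.Read()) { }       // consume the rest of the document
        states.Add(reader.ReadState);   // Read() has returned false
        reader.Close();
        states.Add(reader.ReadState);   // after Close()

        return states;
    }

    public static void Main()
    {
        foreach (ReadState state in TraceStates("<city>Seattle</city>"))
        {
            Console.WriteLine(state);
        }
    }
}
```

For a well-formed document, the recorded sequence is Initial, Interactive, EndOfFile, Closed.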

Reading Attributes

Before you attempt to read attributes in an element node, you should first use the HasAttributes property

to make sure that the element node contains attributes. Attributes in an element node can be accessed

directly by their name or index. They can also be accessed by the MoveToAttribute(),

MoveToFirstAttribute(), and MoveToNextAttribute() methods.

For example, to process an attribute by name, you can call MoveToAttribute() with the name of the

attribute.

XmlReader reader = XmlReader.Create("Employees.xml");
//Move to the first element
reader.MoveToContent();
if (reader.HasAttributes)
{
    //MoveToAttribute() returns false if there is no id attribute
    if (reader.MoveToAttribute("id"))
    {
        //Code to do something with the attribute value stored in id attribute
    }
}

You see a complete example on the use of attributes in a later section of this chapter.
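In the meantime, here is a minimal sketch of enumerating every attribute of an element with MoveToNextAttribute(). The class name AttributeSketch, the method ReadAttributes, and the dept attribute in the sample input are hypothetical; only the id attribute is mentioned in this chapter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, used only for this illustration.
public static class AttributeSketch
{
    // Returns the attributes of the first element as name/value pairs.
    public static Dictionary<string, string> ReadAttributes(string xml)
    {
        Dictionary<string, string> attributes = new Dictionary<string, string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            if (reader.HasAttributes)
            {
                // From an element, MoveToNextAttribute() starts at the first attribute.
                while (reader.MoveToNextAttribute())
                {
                    attributes[reader.Name] = reader.Value;
                }
                // Return to the element that owns the attributes.
                reader.MoveToElement();
            }
        }
        return attributes;
    }

    public static void Main()
    {
        // Assumed sample input; the dept attribute is invented for the example.
        string xml = "<employee id=\"1\" dept=\"sales\" />";
        foreach (KeyValuePair<string, string> pair in ReadAttributes(xml))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

When only one known attribute is needed, calling GetAttribute("id") on the element is simpler, because it returns the value without moving the reader.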

Reading Content and Other Data

Your application can use the ReadString() method to read the content of the current node as a string.

You can also read the content of the element using the various forms of the ReadElementContentAsXXX

methods. In addition to those methods, you also have the ReadContentAsXXX methods that allow you

to read the text content at the current position. For example, using the ReadContentAsDouble()

method, you can read the text content at the current position as a Double value. The ReadString()

method behaves differently depending on the type of node the reader is currently positioned on:

If the current node is an Element node, ReadString() concatenates all text, significant white

space, white space, and CDATA section nodes within the Element node and returns the

concatenated data as the Element node's content.

If the current node is a Text node, ReadString() performs the same concatenation, starting at

the Text node and continuing up to the element's end tag.

If the current node is an Attribute node, ReadString() behaves as though the reader were

currently positioned on the starting tag of the Element node and returns data as described for

Element nodes.

For all other node types, ReadString() returns an empty string.

Microsoft has greatly enhanced XML support in the .NET Framework 2.0 by adding

strong type support to all the XML processing classes. An example of this is the

introduction of methods like ReadElementContentAsInt() to the XmlReader class

that allow you to read the contents of an XML node in a strongly typed manner.

Accomplishing this in .NET 1.x would mean that you read the XML node as a string

and then convert that to the appropriate data type using a helper class such as XmlConvert.

This is no longer required in .NET Framework 2.0 because of the native support that is

available for almost all of the XML processing classes. In addition to the strongly typed

support, Microsoft also has greatly enhanced the performance of the XmlReader and

XmlWriter classes.
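The difference between the two styles can be sketched as follows. The class name TypedReadSketch, both method names, and the age element are assumptions made for this illustration; the second method only approximates the .NET 1.x approach by reading a string and converting it with XmlConvert.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, used only for this illustration.
public static class TypedReadSketch
{
    // .NET 2.0 style: the reader returns a typed value directly.
    public static int ReadTyped(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsInt();
        }
    }

    // .NET 1.x style: read the text as a string, then convert it.
    public static int ReadAndConvert(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return XmlConvert.ToInt32(reader.ReadElementString());
        }
    }

    public static void Main()
    {
        // Assumed sample input.
        string xml = "<age>42</age>";
        Console.WriteLine(ReadTyped(xml));       // prints 42
        Console.WriteLine(ReadAndConvert(xml));  // prints 42
    }
}
```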

Now that you have a complete understanding of the various methods and properties of the XmlReader

class, it is time to look at examples that exercise all of these concepts.

Reading an XML File Using XmlReader

Now that you know the theory, this section begins with an example to demonstrate how to read an XML

document using an XmlReader object. This simple example leverages the functionalities of the XmlReader

class to parse a static XML file named Employees.xml. Here’s the XML file, a list of employees in an

organization, shown in Listing 4-1.

Listing 4-1: Employees.xml File

<?xml version='1.0'?>
<employees>
    <employee id="1">
        <name>
            <firstName>Nancy</firstName>
            <lastName>Davolio</lastName>
        </name>
        <city>Seattle</city>
    </employee>
    <employee id="2">
        <name>
            <firstName>Andrew</firstName>
            <lastName>Fuller</lastName>
        </name>
        <city>Tacoma</city>
    </employee>
</employees>

Now that you have seen the contents of the Employees.xml file, Listing 4-2 shows the ASP.NET code

that allows you to parse the Employees.xml file.

Listing 4-2: Processing the Elements of the Employees XML File Using XmlReader Class

<%@ Page Language="C#" %>
<%@ Import Namespace="System.Xml" %>
<script runat="server">
void Page_Load(object sender, EventArgs e)
{
    //Location of XML file
    string xmlFilePath = @"C:\Data\Employees.xml";
    try
    {
        //Get reference to the XmlReader object
        using (XmlReader reader = XmlReader.Create(xmlFilePath))
        {
            string result;
            while (reader.Read())
            {
                //Process only the elements
                if (reader.NodeType == XmlNodeType.Element)
                {
                    //Reset the variable for a new element
                    result = "";
                    //Indent the element name according to its depth
                    for (int count = 1; count <= reader.Depth; count++)
                    {
                        result += "===";
                    }
                    result += "=> " + reader.Name + "<br/>";
                    lblResult.Text += result;
                }
            }
        }
    }
    catch (Exception ex)
    {
        lblResult.Text = "An Exception occurred: " + ex.Message;
    }
}
</script>
<html>
<head>
<title>Reading an XML File using XmlReader</title>
</head>
<body>
<form runat="server">
<div>
<asp:label id="lblResult" runat="server" />
</div>
</form>
</body>
</html>

Before examining the code, here is the output produced by Listing 4-2.

Figure 4-2

The first step is to import the namespaces required to execute the page. The .NET classes for XML

parsing are primarily contained in the System.Xml namespace.

<%@ Import Namespace="System.Xml" %>

Next, within the Page_Load function, a variable containing the location of the XML file is defined. The

code then declares an XmlReader object within the scope of a using block by invoking the static Create

method of the XmlReader class.

using (XmlReader reader = XmlReader.Create(xmlFilePath))

Among the many enhancements made to the XmlReader class in .NET Framework

2.0, an important feature is the ability to dispose of the resources used by the

XmlReader by invoking the Dispose method. This is made possible by the fact that

the XmlReader class now implements the IDisposable interface. Because of this, you

can now enclose the creation of the XmlReader object within the scope of a using

block and the resources utilized by the XmlReader will be automatically released at

the end of the using block.

Note that the XmlReader object isn't limited to reading from files. Various overloads of the Create()

method enable you to take XML input from URLs, streams, strings, and other Reader objects. The next

step is to read the XML file — a simple matter because the XmlReader object provides a Read() method

for just this purpose. This method returns true if it encounters a node in the XML file. After it is finished

with the file, it returns false. This makes it easy to process an entire file simply by wrapping the method

call in a “while” loop. Inside the while loop, there is code to process element nodes and format them for

display.

The NodeType property of the current node can be used to filter out the elements for further processing.

if (reader.NodeType == XmlNodeType.Element)

The rest of the code in the “while” loop ensures that the output is formatted properly for display in the

browser. Pay special attention to the use of the Depth property, which holds an integer value specifying

the depth of the current node in the tree hierarchy. Simply put, the element <employees> is at depth 0;

the element <employee> is at depth 1, and so on.

It is important to realize that a node read by the Read method does not correspond to an entire XML

element. For example, look at this XML element:

<city>Seattle</city>

From the perspective of the XmlReader, three nodes are read in the following order:

1. A node corresponding to the opening tag. This node has type Element and local name ‘city’.

2. A node corresponding to the data. This node has type Text and value ‘Seattle’.

3. A node corresponding to the closing tag. This node has type EndElement and local name ‘city’.
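This node sequence can be checked with a small sketch. The class name NodeSequenceSketch, the method DescribeNodes, and the output format are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, used only for this illustration.
public static class NodeSequenceSketch
{
    // Describes every node the reader reports: type, local name, and value.
    public static List<string> DescribeNodes(string xml)
    {
        List<string> nodes = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                nodes.Add(reader.NodeType + " name='" + reader.LocalName
                    + "' value='" + reader.Value + "'");
            }
        }
        return nodes;
    }

    public static void Main()
    {
        foreach (string node in DescribeNodes("<city>Seattle</city>"))
        {
            Console.WriteLine(node);
        }
        // Prints:
        // Element name='city' value=''
        // Text name='' value='Seattle'
        // EndElement name='city' value=''
    }
}
```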

That takes care of handling elements. But what about the attributes contained within each element? In a

later section, you see the steps involved in processing attributes using the XmlReader class.

Dealing with Exceptions

When the XmlReader class processes an XML file, it checks the XML file for well-formedness and also

resolves external references (if any). Problems can crop up in many places, aside from the obvious one

where the specified file is not found or cannot be opened. Any XML syntax error will raise an exception

of type System.Xml.XmlException. The Message property of this class returns a descriptive message

about the error (as is the case with all Exception classes). This message also includes the line number

and position where the error was found. The XmlException class has two additional properties —

LineNumber and LinePosition — that return the line number and character position of the error,

respectively. You can use this information as needed. For example, your program could open and

display the offending XML file with a pointer indicating where the error occurred.

Exception handling in programs that use the XmlReader class (and other XML-related classes) follows

this general scheme:

1. Catch exceptions of type XmlException to deal with XML parsing errors.

2. Catch other exceptions to deal with other types of errors.

For reasons of brevity, the previous example shown in Listing 4-2 handled all the exceptions including

the XmlException in a single catch block as opposed to creating two catch blocks.
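A two-block version of that scheme might look like the following sketch. The class name ExceptionSketch, the method Check, and the message formats are assumptions made for illustration; the input is passed as a string rather than read from a file.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, used only for this illustration.
public static class ExceptionSketch
{
    // Parses the whole document and reports the outcome as a message.
    public static string Check(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                // Reading every node forces a full well-formedness check.
                while (reader.Read()) { }
            }
            return "ok";
        }
        catch (XmlException ex)
        {
            // XML parsing errors carry the location of the problem.
            return "XML error at line " + ex.LineNumber
                + ", position " + ex.LinePosition;
        }
        catch (Exception ex)
        {
            // Anything else, for example a null input or an unreadable source.
            return "Other error: " + ex.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Check("<city>Seattle</city>"));  // prints "ok"
        Console.WriteLine(Check("<city>Seattle</town>"));  // mismatched end tag
    }
}
```

The XmlException block must come first, because XmlException derives from Exception and a leading general catch block would intercept it.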

Handling Attributes in an XML File

XML elements can include attributes, which consist of name/value pairs and are always string data. In

the sample XML file, the employee element has an id attribute. As you play with the sample code in

Listing 4-2, you may notice that when the nodes are read in, you don’t see any attributes. This is because
