How to prevent XmlReader from skipping to end of file, or how to reset the reader
Here is a portion of the XML file I'm reading:
<?xml version="1.0"?>
<movie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" ThumbGen="1">
<hasrighttoleftdirection>false</hasrighttoleftdirection>
<title>A Nightmare on Elm Street</title>
<originaltitle>A Nightmare on Elm Street</originaltitle>
<year>1984</year>
<plot>Years after being burned alive by a mob of angry parents, child murderer Freddy Krueger returns to haunt the dreams -- and reality -- of local teenagers. As the town's teens begin dropping like flies, Nancy and her boyfriend, Glen, devise a plan to lure the monster out of the realm of nightmares and into the real world.</plot>
<tagline>A scream that wakes you up, might be your own...</tagline>
<metascore>78</metascore>
<trailer>http://www.youtube.com/watch?v=996</trailer>
<rating>8.6</rating>
<episodes />
<episodesnames />
<writers />
<gueststars />
<id>tt0087800</id>
<releasedate>11.09.1984</releasedate>
<actor>
<name>Robert Englund</name>
<name>Heather Langenkamp</name>
<name>Johnny Depp</name>
<name>Ronee Blakley</name>
<name>John Saxon</name>
<name>Amanda Wyss</name>
<name>Jsu Garcia</name>
<name>Charles Fleischer</name>
<name>Joseph Whipp</name>
<name>Lin Shaye</name>
<name>Joe Unger</name>
<name>Mimi Craven</name>
<name>David Andrews</name>
</actor>
<genre>
<name>Horror</name>
<name>Comedy</name>
</genre>
<director>
<name>Wes Craven</name>
</director>
<runtime>91</runtime>
<certification>R</certification>
<studio>
<name>New Line Cinema</name>
</studio>
<country>
<name>United States of America</name>
</country>
...
...
...
</movie>
The problem I'm running into: when I check for MPAA and the element doesn't exist, the reader runs to the end of the file and I get stuck there. Not all the movies will have the MPAA, and for some reason, the XML doesn't include an empty element in that case.
I need to figure out how I can test for it, and not lose the position, or how I can reset the position of the reader back to the top.
I tried reader.ResetState(), but I get "Root element is missing" error.
THEN, when I'm done with the file, I can't figure out how to dispose of it so I can move to the next file in the list.
Yes, I'm a mess.
I'll admit I'm new to XML. Hopefully you can get the idea of what's going on from the code below. I would greatly appreciate suggestions on better/alternative ways to process these XML files. I have about 2000 of them, and they average 360 lines (15KB), but a few are 500 lines (50KB).
public static void ProcessMovies(string wPath, string cPath, string iPath)
{
int lineID = 0;
string strMovie = null;
string strTitle = null;
string strYear = null;
string strPlot = null;
string strRating = null;
string strMPAA = null;
string strCertification = null;
string strGenre = null;
// initiates streamwriter for output file
FileInfo fi = new FileInfo(cPath + Path.DirectorySeparatorChar + "catalog.html");
StreamWriter catalog = fi.AppendText();
// pulls list of file and sorts them alphabetically
// TODO: do "library sort" that ignores The, A at beginning of title
string[] fns = Directory.GetFiles(wPath, "*.nfo");
var sort = from fn in fns
orderby new FileInfo(fn).Name ascending
select fn;
foreach (string n in sort)
{
if (lineID == 0)
catalog.WriteLine(" <tr id=\"odd\">");
else
catalog.WriteLine(" <tr id=\"even\">");
Console.WriteLine("Processing: " + n);
XmlTextReader reader = new XmlTextReader(n);
reader.ReadToFollowing("title");
strTitle = reader.ReadElementContentAsString();
reader.ReadToFollowing("year");
strYear = reader.ReadElementContentAsString();
reader.ReadToFollowing("plot");
strPlot = reader.ReadElementContentAsString();
reader.ReadToFollowing("rating");
strRating = reader.ReadElementContentAsString();
if (reader.ReadToFollowing("mpaa"))
strMPAA = reader.ReadElementContentAsString();
else
strMPAA="UNKNOWN";
// ugly code to try to read multiple embedded <name> elements within <genre>
// NOTE: Possible only 1 genre
reader.ResetState();
reader.ReadToFollowing("genre");
reader.Read();
while ((reader.Name != "genre"))
{
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
strGenre += reader.Value + ", ";
}
strGenre = strGenre.Substring(0, strGenre.Length - 2);
reader.ReadToFollowing("certification");
strCertification = reader.ReadElementContentAsString();
reader.Close();
strMovie = " <td>\r\n" + " <img src=\"" + JPG_FILE_NAME + "\" width=\"75\" height=\"110\">\r\n" + " </td>\r\n" + " <td>\r\n" + " <div id=\"title\">" + strTitle + "</div>" + " <div id=\"mpaa\">" + strMPAA + "</div>" + " <div id=\"genre\">" + strGenre + "</div>" + " <div id=\"plot\">" + strPlot + "</div>" + " </td>" + " </tr>";
catalog.WriteLine(strMovie);
}
catalog.Close();
}
******************** EDIT ********************
Okay, I edited the code that processes the XML to the following per Henk's suggestion:
var doc = XDocument.Load(n); // takes care of all Open/Close issues
strTitle = doc.Root.Element("title") == null ? "" : doc.Root.Element("title").Value;
strYear = doc.Root.Element("year") == null ? "" : doc.Root.Element("year").Value;
strPlot = doc.Root.Element("plot") == null ? "" : doc.Root.Element("plot").Value;
strRating = doc.Root.Element("rating") == null ? "" : doc.Root.Element("rating").Value;
strMPAA = doc.Root.Element("mpaa") == null ? "" : doc.Root.Element("mpaa").Value;
strCertification = doc.Root.Element("certification") == null ? "" : doc.Root.Element("certification").Value;
It works great, thank you very much!!!
Now for the last bit: how can I get the genres from the name elements inside genre using this method? I can't search for the name element directly, since it is used inside several different elements. I wasn't sure if I could work with:
doc.Root.Element("genre").ElementsAfterSelf("name");
Wasn't clear on what that returns, or how it would handle multiple "names".
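Unless your data is much larger than 100 MB, read it into an XDocument or XmlDocument.
XmlTextReader is forward-only, so you cannot search for optional elements: when ReadToFollowing doesn't find the element, it returns false and leaves the reader at the end of the file. You can only retrieve and store each element as it comes along, and then 'search' for your elements in your own data structures.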
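If you did want to stay with a reader, the "store each element as it comes along" idea could look roughly like the sketch below. It is only an illustration: SinglePassReader, ReadTopLevelFields and ReadFile are made-up names, and it assumes the fields you care about are direct children of the root with plain text content. Container elements such as genre just end up with an empty string here.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class SinglePassReader
{
    // Hypothetical helper (not from the original post): collects the text of
    // every direct child of the root element in a single forward pass.
    public static Dictionary<string, string> ReadTopLevelFields(XmlReader reader)
    {
        var fields = new Dictionary<string, string>();
        string current = null;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
            {
                // Direct child of the root (the root itself is at depth 0).
                // Empty elements such as <writers /> keep the empty string.
                current = reader.Name;
                fields[current] = "";
            }
            else if (reader.NodeType == XmlNodeType.Text && reader.Depth == 2)
            {
                // Text directly inside a top-level child; deeper text
                // (e.g. genre/name) sits at depth 3 and is skipped.
                fields[current] += reader.Value;
            }
        }
        return fields;
    }

    // The using block disposes the reader, which releases the file
    // so the next one in the list can be opened.
    public static Dictionary<string, string> ReadFile(string path)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            return ReadTopLevelFields(reader);
        }
    }
}
```

After that, an optional element is just a dictionary lookup, for example fields.ContainsKey("mpaa") ? fields["mpaa"] : "UNKNOWN", and nothing depends on the order of the elements in the file.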
Roughly, using System.Xml.Linq:
var doc = XDocument.Load(fileName); // takes care of all Open/Close issues
string title = doc.Root.Element("title").Value;
string mpaa = doc.Root.Element("mpaa") == null ? "" : doc.Root.Element("mpaa").Value;
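For the genre names, note that ElementsAfterSelf returns the siblings that follow an element, not its children, so on genre it would look at director, runtime and so on. Elements("name") on the genre element is probably what you want. A minimal sketch, where MovieFields, Field and Genres are names invented for this example and the layout is assumed to match the sample file:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class MovieFields
{
    // Hypothetical helper: text of a direct child of the root, or "" if absent.
    public static string Field(XDocument doc, string name)
    {
        // The explicit XElement-to-string conversion yields null for a
        // missing element, so no separate null check is needed.
        return (string)doc.Root.Element(name) ?? "";
    }

    // Joins the <name> children of <genre> only; <name> elements under
    // <actor>, <director>, etc. are never touched.
    public static string Genres(XDocument doc)
    {
        XElement genre = doc.Root.Element("genre");
        if (genre == null)
            return "";
        return string.Join(", ", genre.Elements("name").Select(e => e.Value).ToArray());
    }
}
```

With the sample file, Genres(doc) gives "Horror, Comedy", and Field(doc, "mpaa") gives an empty string because the element is missing.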