How to handle a LINQ expression that fails if an element is missing

I'm a rookie with LINQ to XML and I've got this code that works (most of the time):

private long processFile(StreamWriter oWriter, string inFileName)
    {
        XDocument xmlDoc = XDocument.Load(inFileName);
        List<DocMetaData> docList =
            (from d in xmlDoc.Descendants("DOCUMENT")
             select new DocMetaData
             {
                 Folder = d.Element("FOLDER").Attribute("name").Value
                 ,
                 File = d.Element("FILE").Attribute("filename").Value
                 ,
                 Comment = d.Elements("INDEX")
                    .Where(i => i.Attribute("name").Value == "Comment(idmComment)")
                    .First()
                    .Attribute("value").Value
                 ,
                 Title = d.Elements("INDEX")
                    .Where(i => i.Attribute("name").Value == "Title(idmName)")
                    .First()
                    .Attribute("value").Value
                 ,
                 DocClass = d.Elements("INDEX")
                    .Where(i => i.Attribute("name").Value == "Document Class(idmDocType)")
                    .First()
                    .Attribute("value").Value
             }
            ).ToList<DocMetaData>();
        OutputListToFile(oWriter, docList);
        return docList.LongCount();
    }

This fails on line 117 (the select expression) with:

    System.NullReferenceException: Object reference not set to an instance of an object.
   at CBMI.WinFormsUI.GridForm.<processFile>b__3(XElement d) in C:\ProjectsVS2010\CBMI.LatitudePostConverter\CBMI.LatitudePostConverter\CBMI.WinFormsUI\GridForm.cs:line 117
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at CBMI.WinFormsUI.GridForm.processFile(StreamWriter oWriter, String inFileName) in C:\ProjectsVS2010\CBMI.LatitudePostConverter\CBMI.LatitudePostConverter\CBMI.WinFormsUI\GridForm.cs:line 115
   at CBMI.WinFormsUI.GridForm.btnProcess_Click(Object sender, EventArgs e) in C:\ProjectsVS2010\CBMI.LatitudePostConverter\CBMI.LatitudePostConverter\CBMI.WinFormsUI\GridForm.cs:line 85

The data is well-formed XML. There are many <DOCUMENT> nodes in this XML file, and most, but not all, of them contain a <FOLDER> node. I discovered this by brute force: opening the XML file in Visual Studio 2010 and using the Find command, which gives counts of matching lines.

Is there a way I can improve the LINQ so that it does not fail when the data is not perfect? And is there a way to see which part of the LINQ expression is actually failing? (I'm guessing that it is due to the missing <FOLDER> nodes, but that could be wrong, and brute force is an ugly way to troubleshoot.)

Here is one <DOCUMENT> that DOES contain the proper <FOLDER> node (at the very bottom):

    <?xml version="1.0" ?>
<DOCUMENTCOLLECTION>
<DOCUMENT>
<FILE filename="P:\LatitudeConsulting\LatConConverter-1.8.2\ConverterOutput\B0000002\3rd Party CON\D003694452.0001.tif" 
      outputpath="P:\LatitudeConsulting\LatConConverter-1.8.2\ConverterOutput\B0000002\3rd Party CON"/>
<ANNOTATION filename=""/>
<INDEX name="Access Level(idmAccessLevel)" value="Admin"/>
<INDEX name="Added By Group(idmAddedByGroup)" value="General Users"/>
<INDEX name="Added By User(idmDocOwner)" value="Import"/>
<INDEX name="Allow Secondary Version Lines?(idmDocVariants)" value="Yes"/>
<INDEX name="Application(idmVerApplication)" value=""/>
<INDEX name="Archive Category(idmDocDispCategory)" value="Archive"/>
<INDEX name="Archive Date(idmVerDispDate)" value=""/>
<INDEX name="Archive Repository(idmVerDispId)" value=""/>
<INDEX name="ArchivedDocument" value="NO"/>
<INDEX name="Availability Status(idmVerAvailStat)" value="Online"/>
<INDEX name="CAN(idmDocCustom4)" value=""/>
<INDEX name="Checked In By Group(idmVerCheckinGroup)" value="General Users"/>
<INDEX name="Checked In By User(idmVerCheckinUser)" value="Import"/>
<INDEX name="Checked Out?(idmVerCheckoutPending)" value="No"/>
<INDEX name="Checkin Date(idmVerCreateDate)" value="3/9/2001 9:20:38 AM"/>
<INDEX name="Child Count(idmVerCD)" value="0"/>
<INDEX name="Comment(idmComment)" value="1983\06_June_Meeting"/>
<INDEX name="Comment(idmVerComment)" value=""/>
<INDEX name="Content Search Repository(idmVerCsiId)" value=""/>
<INDEX name="Current Content Srch Repository(idmDocCurVerCsiId)" value=""/>
<INDEX name="Current Version Author(idmAddedByUser)" value="Import"/>
<INDEX name="Current Version Checked Out?(idmDocCurVerCheckedOut)" value="No"/>
<INDEX name="Current Version Date(idmDocCurVerDate)" value="3/9/2001 9:20:38 AM"/>
<INDEX name="Current Version ID(idmDocCurVerNum)" value="1"/>
<INDEX name="Current Version Index ID(idmDocCurVerCsiCid)" value=""/>
<INDEX name="Date Added(idmDateAdded)" value="3/9/2001 9:20:37 AM"/>
<INDEX name="Default Index Versions?(idmDocCsiDefault)" value="No"/>
<INDEX name="DiagnosticID(idmDocCustom5)" value="2-16.MDB-00015"/>
<INDEX name="Document Class(idmDocType)" value="3rd Party CON"/>
<INDEX name="Encrypted File Name(idmVerShelfFileId)" value="_276no__.__1"/>
<INDEX name="ExternalDocument" value="NO"/>
<INDEX name="File Name" value="51099.TIF"/>
<INDEX name="File Name(idmVerFileName)" value="51099.TIF"/>
<INDEX name="File Size(idmVerFileSize)" value="1166770"/>
<INDEX name="Has Annotations?(idmAnnotation)" value=""/>
<INDEX name="Index ID(idmVerCsiCid)" value=""/>
<INDEX name="Indexed Version Limit(idmDocCsiLimit)" value="1"/>
<INDEX name="Indexing Status(idmVerCsiStatus)" value="Not Indexed"/>
<INDEX name="Item ID(idmId)" value="003694452"/>
<INDEX name="Item ID(idmVerDocId)" value="003694452"/>
<INDEX name="Keyword(idmDocKeywords)" value=""/>
<INDEX name="Last Access Date(idmDateAccessed)" value="11/28/2003 3:05:30 PM"/>
<INDEX name="Last Access Date(idmDateModified)" value="8/24/2011 5:52:34 PM"/>
<INDEX name="Last Access Group(idmVerLastGroup)" value="Administrators"/>
<INDEX name="Last Access User(idmModifiedByUser)" value="Admin"/>
<INDEX name="Last Accessed Version(idmDocLastVerId)" value="1"/>
<INDEX name="Latest Version?(idmVerBranchCurVer)" value="Yes"/>
<INDEX name="Merge-Destination Version ID(idmVerMergeDst)" value="0"/>
<INDEX name="Merge-Source Version ID(idmVerMergeSrc)" value="0"/>
<INDEX name="MimeType" value="image/tiff"/>
<INDEX name="Min Item Delete Access Level(idmDocDeleteAccess)" value=""/>
<INDEX name="Modification Date(idmVerFileDate)" value="12/19/2000 11:12:30 AM"/>
<INDEX name="Number of Indexed Versions(idmDocCsiCount)" value="0"/>
<INDEX name="Offline Location(idmVerOfflineLocation)" value=""/>
<INDEX name="Online Disk Space(idmDocOnlineSize)" value="1166770"/>
<INDEX name="Online Limit(idmDocOnlineLimit)" value="5"/>
<INDEX name="Online Version Count(idmDocOnlineCount)" value="1"/>
<INDEX name="Origin ID(idmDocOriginID)" value=""/>
<INDEX name="Origin Library(idmDocOriginLibrary)" value=""/>
<INDEX name="Original File Name(idmDocOriginalFile)" value="51099.TIF"/>
<INDEX name="Permanent Index?(idmVerCsiPermanent)" value="No"/>
<INDEX name="Permanent Version?(idmVerPermanent)" value="No"/>
<INDEX name="Property ID(idmDocDynPropertyId)" value=""/>
<INDEX name="Protected?(idmDocProtected)" value="Yes"/>
<INDEX name="Publishing Status(idmPublish)" value=""/>
<INDEX name="Reclaim Pending?(idmVerReclaimPending)" value=""/>
<INDEX name="Reclaim Submitted Date(idmVerReclaimDate)" value=""/>
<INDEX name="Replica?(idmDocIsReplica)" value="No"/>
<INDEX name="ReplicatedDocument" value="NO"/>
<INDEX name="Secondary Version Line Count(idmVerBranchCount)" value="0"/>
<INDEX name="Source Version Checkout Date(idmVerPrevCheckoutDate)" value=""/>
<INDEX name="Storage Category(idmDocFileCategory)" value="Documents"/>
<INDEX name="Storage Repository(idmVerShelfId)" value="2"/>
<INDEX name="Title(idmName)" value="3rd Party CON Comments"/>
<INDEX name="Version ID(idmVerId)" value="1"/>
<FOLDER name="/NACAIE/1983/06_June_Meeting/NAPNSC"/>
</DOCUMENT>

EDIT: solution follows (contains LINQ that fixed this problem when FOLDER node might be missing; use of First() might be dangerous practice as others note but in this case missing FOLDER nodes had to be handled):

namespace CBMI.Common
{
    public static class Extensions
    {
        public static string SafeGetAttributeValue(this XElement element, string attribute)
        {
            return (element != null) ?
              (element.Attribute(attribute) != null) ?
                  element.Attribute(attribute).Value : null : null;
        }
    }
}

private long processFile(StreamWriter oWriter, string inFileName)
    {
        XDocument xmlDoc = XDocument.Load(inFileName);
        List<DocMetaData> docList =
            (from d in xmlDoc.Descendants("DOCUMENT")
             select new DocMetaData
             {
                 File = d.Element("FILE").Attribute("filename").Value
                 ,
                 ItemID = d.Elements("INDEX")
                    .Where(i => i.Attribute("name").Value == "Item ID(idmId)")
                    .First()
                    .Attribute("value").Value
                 ,
                 Comment = d.Elements("INDEX")
                    .Where(i => i.Attribute("name").Value == "Comment(idmComment)")
                    .First()
                    .Attribute("value").Value
                 ,
                 Title = d.Elements("INDEX")
                    .Where(i => i.Attribute("name").Value == "Title(idmName)")
                    .First()
                    .Attribute("value").Value
                 ,
                 DocClass = d.Elements("INDEX")
                    .Where(i => i.Attribute("name").Value == "Document Class(idmDocType)")
                    .First()
                    .Attribute("value").Value
                 ,
                 Folder = d.Element("FOLDER").SafeGetAttributeValue("name")
             }
            ).ToList<DocMetaData>();
        OutputListToFile(oWriter, docList);
        return docList.LongCount();
    }


You can always check the given node before trying to select it:

Folder = (d.Element("FOLDER") != null) ? (d.Element("FOLDER").Attribute("name") != null)
                                          ? d.Element("FOLDER").Attribute("name").Value : null
                                       : null

But I admit that can get kind of ugly. In that case you can create an XElement extension method that does that:

public static class Extensions
{
   public static string SafeGetAttributeValue(this XElement element, string attribute)
   {
      return (element != null) ? (element.Attribute(attribute) != null)
                                              ? element.Attribute(attribute).Value : null
                                           : null;
   }
}

Which you could use like:

select new DocMetaData
{
   Folder = d.Element("FOLDER").SafeGetAttributeValue("name"),
   //the rest of your object creation
}


I can't test to see if this is the exact problem for you, but First() throws an InvalidOperationException if the sequence it is called on is empty. You can avoid that by using FirstOrDefault() instead and then checking whether the result is null.
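
As a minimal sketch of that suggestion (the class name FirstOrDefaultDemo and the helper GetIndexValue are made up for illustration, and the XML is a trimmed-down stand-in for the real file), the lookup could be wrapped like this:

using System;
using System.Linq;
using System.Xml.Linq;

public static class FirstOrDefaultDemo
{
    // Hypothetical helper: returns the "value" attribute of the first INDEX
    // element whose "name" attribute matches, or null if there is no match.
    public static string GetIndexValue(XElement document, string indexName)
    {
        // The (string) conversion yields null when the attribute is absent,
        // so the predicate itself cannot throw.
        XElement match = document.Elements("INDEX")
            .FirstOrDefault(i => (string)i.Attribute("name") == indexName);

        if (match == null)
            return null;

        return (string)match.Attribute("value");
    }

    public static void Main()
    {
        XElement doc = XElement.Parse(
            "<DOCUMENT>" +
            "<INDEX name=\"Title(idmName)\" value=\"3rd Party CON Comments\"/>" +
            "</DOCUMENT>");

        Console.WriteLine(GetIndexValue(doc, "Title(idmName)"));
        Console.WriteLine(GetIndexValue(doc, "Comment(idmComment)") ?? "(missing)");
    }
}

With this shape a missing INDEX entry simply produces null, and the caller decides what to substitute for it.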


My rule of thumb is to never use the First(), FirstOrDefault(), Single() or SingleOrDefault() methods in the middle of a query. They should always be the last method called. If you need to select a property from the selected item, project first, then call the method. That leads to cleaner-looking code and makes situations like this a whole lot easier to deal with.

e.g.,

// Change this:
d.Elements("INDEX")
 .Where(i => i.Attribute("name").Value == "Comment(idmComment)")
 .First()
 .Attribute("value").Value

// to this:
d.Elements("INDEX")
 .Where(i => i.Attribute("name").Value == "Comment(idmComment)")
 .Select(i => i.Attribute("value").Value)
 .First()

I'd rewrite it as this:

private long processFile(StreamWriter oWriter, string inFileName)
{
    XDocument xmlDoc = XDocument.Load(inFileName);
    List<DocMetaData> docList = xmlDoc.Descendants("DOCUMENT")
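        .Select(e => new DocMetaData
        {
            // Elements(...) yields an empty sequence when FOLDER is missing,
            // so FirstOrDefault() returns null instead of throwing.
            Folder = e.Elements("FOLDER")
                .Select(f => (string)f.Attribute("name"))
                .FirstOrDefault(),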
            File = (string)e.Element("FILE").Attribute("filename"),
            Comment = (string)e.Elements("INDEX")
                .Where(i => (string)i.Attribute("name") == "Comment(idmComment)")
                .Select(i => (string)i.Attribute("value"))
                .FirstOrDefault(),
            Title = (string)e.Elements("INDEX")
                .Where(i => (string)i.Attribute("name") == "Title(idmName)")
                .Select(i => (string)i.Attribute("value"))
                .FirstOrDefault(),
            DocClass = (string)e.Elements("INDEX")
                .Where(i => (string)i.Attribute("name") == "Document Class(idmDocType)")
                .Select(i => (string)i.Attribute("value"))
                .FirstOrDefault(),
        })
        .ToList();
    OutputListToFile(oWriter, docList);
    return docList.LongCount();
}
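
A small self-contained sketch of why the casts help, assuming a hypothetical helper named GetFolder and toy XML fragments: the explicit (string) conversion on an XAttribute returns null for a null reference rather than throwing. Note that the cast only protects the last step. Calling .Attribute(...) on a missing element still throws, so the element lookup itself should go through Elements(...) and FirstOrDefault() (or a null check).

using System;
using System.Linq;
using System.Xml.Linq;

public static class CastDemo
{
    // Hypothetical helper: returns the FOLDER name, or null when either the
    // FOLDER element or its "name" attribute is missing.
    public static string GetFolder(XElement document)
    {
        return document.Elements("FOLDER")
            .Select(f => (string)f.Attribute("name"))
            .FirstOrDefault();
    }

    public static void Main()
    {
        XElement withFolder = XElement.Parse(
            "<DOCUMENT><FOLDER name=\"/NACAIE/1983\"/></DOCUMENT>");
        XElement withoutFolder = XElement.Parse(
            "<DOCUMENT><FILE filename=\"a.tif\"/></DOCUMENT>");
        XElement folderWithoutName = XElement.Parse(
            "<DOCUMENT><FOLDER/></DOCUMENT>");

        Console.WriteLine(GetFolder(withFolder) ?? "(null)");
        Console.WriteLine(GetFolder(withoutFolder) ?? "(null)");
        Console.WriteLine(GetFolder(folderWithoutName) ?? "(null)");
    }
}

This sticks to syntax available in Visual Studio 2010; on C# 6 or later, the null-conditional operator (e.Element("FOLDER")?.Attribute("name")) is another option.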


In your LINQ query, for every field that might be null (and would therefore throw an exception if you call .Value on it), you could write:

MyDone = PossibleNullNode != null ? PossibleNullNode.Value : WhateverFloatsYourBoat