How to dynamically filter out XML nodes using PowerShell?

I would really appreciate any help with the following problem:

I'm processing large amounts of XML data using PowerShell. The XML is stored in .txt files; my PowerShell script reads each file and writes its content into a database.

I would like to filter out XML nodes that do not have a proper "signatureNumber" (verified either by length or, preferably, with a regular expression).

Below is the XML structure:

<Objs xmlns="http://schemas.microsoft.com/powershell/2004/04" Version="1.1.0.1">
  <Obj RefId="0">
    <TN RefId="0">
      <T>WebServiceProxy.TestOutputElement</T>
      <T>System.Object</T>
    </TN>
    <ToString>WebServiceProxy.TestOutputElement</ToString>
    <Props>
      <DT N="declarationDate">2011-08-29T10:28:17</DT>
      <B N="declarationDateSpecified">true</B>
      <Nil N="testDate" />
      <B N="testDateSpecified">true</B>
      <S N="XMLdocument">&lt;?xml S>
      <I32 N="id">1359569</I32>
      <B N="idSpecified">true</B>
      <I32 N="decisionCode">5</I32>
      <B N="decisionCodeSpecified">true</B>
      <S N="documentStatus">issued</S>
      <S N="incidentSignature">Nc-e 491993/11</S>
      <S N="signatureNumber">11111111111/222222/33</S> <----- signature length (21) is OK! We want the whole <Obj> 
    </Props>
  </Obj>
  <Obj RefId="1">
    <TNRef RefId="0" />
    <ToString>WebServiceProxy.TestOutputElement</ToString>
    <Props>
      <DT N="declarationDate">2011-08-29T10:28:18</DT>
      <B N="declarationDateSpecified">true</B>
      <Nil N="testDate" />
      <B N="testDateSpecified">true</B>
      <S N="XMLdocument">&lt;?xml D__x000A_</S>
      <I32 N="id">1359570</I32>
      <B N="idSpecified">true</B>
      <I32 N="decisionCode">5</I32>
      <B N="decisionCodeSpecified">true</B>
      <S N="documentStatus">issued</S>
      <S N="incidentSignature">Nc-e 491923/11</S>
      <S N="signatureNumber">test</S> <----- wrong signature! <Obj> should be filtered out!
    </Props>
  </Obj>

The content is read in a loop using code similar to this:

$filedata = Get-Content ("C:\EXPORT\MyData"+$pageNumber+".txt")

Right after reading each file, the XML is written into the database:

$Command.CommandText = "INSERT INTO dbo.ImportXml (MethodName,XmlData) VALUES ('"+$methodName+"','"+ $filedata+ "')"
$Command.ExecuteNonQuery() >> $log_message

The goal is to remove from the $filedata variable every <Obj> element whose "signatureNumber" is not exactly 21 characters long. Everything must be done before the INSERT.

I would really appreciate any advice!

UPDATE: Just to clarify everything. In my example <Obj RefId="0"> is OK and should be inserted, and <Obj RefId="1"> should be completely removed from the XML.


Since you are loading the XML into the database, you will have to resort to some ugly regex I think:

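# Read the whole file as one string so the regex can span multiple lines.
$filedata = [System.IO.File]::ReadAllText("C:\EXPORT\MyData" + $pageNumber + ".txt")
# <Obj\b prevents the pattern from also matching the <Objs> root element.
$re = [regex]'(?s)<Obj\b.*?<S N="signatureNumber">(.*?)</S>.*?</Obj>'
$m = $re.Matches($filedata)
# Remove every <Obj> block whose signatureNumber is not exactly 21 characters long.
$m | Where-Object { $_.Groups[1].Value.Length -ne 21 } | ForEach-Object { $filedata = $filedata.Replace($_.Value, "") }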

$filedata
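
Since the question prefers validating with a regular expression rather than by length, here is a minimal sketch of that variant. The signature pattern (11 digits, a slash, 6 digits, a slash, 2 digits) is only an assumption inferred from the single valid sample value, and $sample, $validSignature, $objPattern and $evaluator are hypothetical names used for illustration.

```powershell
# Hypothetical sample data standing in for one export file:
# the first <Obj> has a valid signature, the second does not.
$sample = @'
<Objs xmlns="http://schemas.microsoft.com/powershell/2004/04" Version="1.1.0.1">
  <Obj RefId="0">
    <Props>
      <S N="signatureNumber">11111111111/222222/33</S>
    </Props>
  </Obj>
  <Obj RefId="1">
    <Props>
      <S N="signatureNumber">test</S>
    </Props>
  </Obj>
</Objs>
'@

# Assumed signature format, guessed from the one valid value in the question.
$validSignature = '^\d{11}/\d{6}/\d{2}$'

# <Obj\b skips the <Objs> root; the (?!</Obj>) guard keeps each match inside a single <Obj>.
$objPattern = [regex]'(?s)<Obj\b(?:(?!</Obj>).)*?<S N="signatureNumber">(.*?)</S>.*?</Obj>'

# Keep a matched <Obj> block only when its signature matches the assumed format.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator]{
    param($m)
    if ($m.Groups[1].Value -match $validSignature) { $m.Value } else { '' }
}

$filtered = $objPattern.Replace($sample, $evaluator)
$filtered
```

Doing the replacement through a match evaluator avoids the separate Matches/Replace passes, and the removed blocks leave only whitespace behind. Note that an <Obj> without any signatureNumber is not matched by this pattern and would therefore be kept.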

If you were using the XML in Powershell, I would have suggested something like this:

$fileXml = [xml]$filedata

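# Keep only the <Obj> elements whose signatureNumber is exactly 21 characters long.
$filedata = foreach ($obj in $fileXml.Objs.Obj) {
    $obj.Props.S | Where-Object { $_.N -eq "signatureNumber" } | ForEach-Object { if ($_."#text".Length -eq 21) { $obj } }
}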

$filedata
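
The loop above yields XML element objects rather than text. If a string is still needed for the INSERT, one possible variation (not part of the original answer) is to remove the unwanted nodes from the parsed document and serialize it again. This is only a sketch: $sample is hypothetical stand-in data, the "ps" prefix is an arbitrary choice, and treating an <Obj> without a signatureNumber as invalid is an assumption.

```powershell
# Hypothetical sample standing in for the contents of one export file.
$sample = @'
<Objs xmlns="http://schemas.microsoft.com/powershell/2004/04" Version="1.1.0.1">
  <Obj RefId="0">
    <Props>
      <S N="signatureNumber">11111111111/222222/33</S>
    </Props>
  </Obj>
  <Obj RefId="1">
    <Props>
      <S N="signatureNumber">test</S>
    </Props>
  </Obj>
</Objs>
'@

$doc = New-Object System.Xml.XmlDocument
$doc.LoadXml($sample)

# The file declares a default namespace, so XPath queries need a prefix bound to it.
$ns = New-Object System.Xml.XmlNamespaceManager($doc.NameTable)
$ns.AddNamespace('ps', 'http://schemas.microsoft.com/powershell/2004/04')

# Copy the matches into an array first, then remove nodes from the document.
$objs = @($doc.SelectNodes('/ps:Objs/ps:Obj', $ns))
foreach ($obj in $objs) {
    $sig = $obj.SelectSingleNode('ps:Props/ps:S[@N="signatureNumber"]', $ns)
    # Assumption: a missing signatureNumber is treated the same as an invalid one.
    if ($null -eq $sig -or $sig.InnerText.Length -ne 21) {
        [void]$obj.ParentNode.RemoveChild($obj)
    }
}

# Serialize the filtered document back to a string for the database.
$filedata = $doc.OuterXml
$filedata
```

The length check could be swapped for a -match test against a signature pattern. This approach only works if each file is well-formed XML, which the annotated excerpt in the question is not guaranteed to be.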