
Selecting text spanning multiple lines using grep and regular expressions

I'm trying to match only those xs:element tags that contain a minOccurs attribute. As shown below, some of them have both search criteria on one line, while others span multiple lines. Is there a way to select them using grep and regular expressions?

<xs:element name="shipto">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="address" type="xs:string"/>
      <xs:element name="city" minOccurs="1" type="xs:string"/>
      <xs:element name="country" 
               minOccurs="1" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

The correct output should be as follows:

<xs:element name="city" minOccurs="1" type="xs:string"/>
<xs:element name="country" 
               minOccurs="1" type="xs:string"/>


I advise against parsing XML using regex. It is too complicated to match tags with end-tags in a robust way.

There is a command line tool "xpath" using XML::XPath in Perl (Ubuntu package libxml-xpath-perl). Example:

xpath -e '//*[@minOccurs=1]' file.xml

Output

-- NODE --
<xs:element name="city" minOccurs="1" type="xs:string" />
-- NODE --
<xs:element name="country" minOccurs="1" type="xs:string" />
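The same parser-based idea can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration under some assumptions: the fragment from the question is wrapped in an xs:schema root that declares the xs prefix (the fragment as posted has no namespace declaration, so a parser would reject it), and the helper name elements_with_min_occurs is made up for this example. Unlike the XPath expression above, which tests for the value 1, it selects any xs:element that has a minOccurs attribute at all.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema namespace; assumed to be what the xs prefix is bound to.
XS = "http://www.w3.org/2001/XMLSchema"


def elements_with_min_occurs(xml_text):
    """Return all xs:element nodes that carry a minOccurs attribute (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    return [el for el in root.iter(f"{{{XS}}}element") if "minOccurs" in el.attrib]


# The fragment from the question, wrapped in an assumed xs:schema root.
sample = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="shipto">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="address" type="xs:string"/>
      <xs:element name="city" minOccurs="1" type="xs:string"/>
      <xs:element name="country"
               minOccurs="1" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
</xs:schema>"""

names = [el.get("name") for el in elements_with_min_occurs(sample)]
print(names)  # ['city', 'country']
```

Because the parser works on the element tree rather than on lines, it makes no difference whether the attributes sit on one line or several.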


Assuming no attribute value contains a literal > (XML allows an unescaped > inside attribute values, so well-formedness alone does not guarantee this), you can probably do this:

<xs:element[^>]+?\sminOccurs\s*=[^>]+>

However, I'm not sure this will work with grep as-is, since grep matches one line at a time and the tag may span several. You may need to write a Perl script or something similar that reads the whole file before applying the pattern.

(Note: if you somehow have attribute values that contain whitespace followed by minOccurs=, you'd need to get cleverer. Since this appears to be address data, I'm assuming that's unlikely, and manually removing any false matches that do occur shouldn't be a problem.)
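As a rough illustration of the "script" route, here is a minimal Python sketch that applies the same pattern to the whole text at once instead of line by line. The function name find_min_occurs_tags and the inline sample are assumptions made for this example. Since [^>] also matches newlines, no special multi-line flag is needed.

```python
import re

# Same pattern as above. [^>] matches any character except '>',
# newlines included, so a tag may span several lines.
TAG_WITH_MIN_OCCURS = re.compile(r'<xs:element[^>]+?\sminOccurs\s*=[^>]+>')


def find_min_occurs_tags(text):
    """Return every xs:element tag in text that has a minOccurs attribute (hypothetical helper)."""
    return TAG_WITH_MIN_OCCURS.findall(text)


# The fragment from the question, held in a string for demonstration.
sample = '''<xs:element name="shipto">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="address" type="xs:string"/>
      <xs:element name="city" minOccurs="1" type="xs:string"/>
      <xs:element name="country"
               minOccurs="1" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
'''

for tag in find_min_occurs_tags(sample):
    print(tag)
```

For a real file, the text could be read in one go, for example with pathlib.Path("file.xml").read_text(), and passed to the same function. The caveats above about > and stray minOccurs= inside attribute values still apply.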
