Selecting text spanning multiple lines using grep and regular expressions
I'm trying to match xs:element tags that contain a minOccurs attribute. As seen below, some of them have both search criteria on one line and some of them span multiple lines. Is there a way of selecting them using grep and regular expressions?
<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" minOccurs="1" type="xs:string"/>
<xs:element name="country"
minOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The correct output should be as follows:
<xs:element name="city" minOccurs="1" type="xs:string"/>
<xs:element name="country"
minOccurs="1" type="xs:string"/>
I advise against parsing XML using regex. It is too complicated to match tags with end-tags in a robust way.
There is a command line tool "xpath" using XML::XPath in Perl (Ubuntu package libxml-xpath-perl). Example:
xpath -e '//*[@minOccurs=1]' file.xml
Output
-- NODE --
<xs:element name="city" minOccurs="1" type="xs:string" />
-- NODE --
<xs:element name="country" minOccurs="1" type="xs:string" />
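If Perl's xpath tool is not available, the same parser-based idea can be sketched with Python's standard library. This is only an illustration, not part of the original answer: the helper name elements_with_min_occurs is made up, and the fragment from the question is wrapped in an xs:schema root that declares the xs prefix, on the assumption that the real file does so as well (a parser rejects an undeclared prefix).

```python
import xml.etree.ElementTree as ET

# Assumption: the real file declares the xs prefix. The fragment in the
# question does not, so it is wrapped in an xs:schema root here.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" minOccurs="1" type="xs:string"/>
<xs:element name="country"
minOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>"""

# ElementTree spells qualified tag names as {namespace-uri}localname.
XS = "{http://www.w3.org/2001/XMLSchema}"


def elements_with_min_occurs(xml_text):
    """Return the name of every xs:element that carries a minOccurs attribute."""
    root = ET.fromstring(xml_text)
    return [
        el.get("name")
        for el in root.iter(XS + "element")
        if "minOccurs" in el.attrib
    ]


print(elements_with_min_occurs(XSD))  # ['city', 'country']
```

Because the parser works on elements rather than lines, it makes no difference that the country element is split across two lines.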
Assuming no attribute value contains an un-escaped > (which well-formed XML does technically allow), you can probably do this:
<xs:element[^>]+?\sminOccurs\s*=[^>]+>
However, I'm not sure this will work with grep, since grep matches line by line, so you may need to write a Perl script or something similar to do it.
(Note, if you somehow have attribute values which contain whitespace followed by minOccurs= then you'd need to get cleverer, but since this appears to be address data, I'm assuming that's unlikely, and manually removing any that happen to occur isn't going to be a problem.)
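As a rough sketch of the "script" route, here is the same pattern applied to the whole text at once in Python rather than Perl. The names PATTERN, SAMPLE and find_min_occurs_tags are invented for the example, and it inherits the assumption above that no attribute value contains a >. Since [^>] also matches newline characters, a tag that spans several lines is matched without any extra flags.

```python
import re

# The pattern from the answer, unchanged. [^>] matches any character
# except '>', including newlines, so multi-line tags are covered.
PATTERN = re.compile(r"<xs:element[^>]+?\sminOccurs\s*=[^>]+>")

# The fragment from the question, kept as one string instead of lines.
SAMPLE = '''<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" minOccurs="1" type="xs:string"/>
<xs:element name="country"
minOccurs="1" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>'''


def find_min_occurs_tags(text):
    """Return every xs:element start tag that has a minOccurs attribute."""
    return PATTERN.findall(text)


for tag in find_min_occurs_tags(SAMPLE):
    print(tag)
```

For a real file, the text would be read in one go, for example find_min_occurs_tags(open("file.xml").read()), so that line boundaries never reach the regex.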