
How do I extract whole XML elements of a particular namespace, including their tags?

I cannot find a specific question like this so I'm posting. Hopefully, this will be of general use.

I have a file that contains XML elements such as "<w:t> data data.....</w:t>", along with a lot of other content. I need to capture everything inside each <w:t></w:t> pair, including the tags themselves.

I'd appreciate hearing suggestions on how to proceed.

Thanks in advance.

David


You should really use an XML parser like SimpleXML:

$string = '<?xml version="1.0"?>
<root xmlns:w="http://example.com/">
    <w:t>some data...</w:t>
    <not-captured>data data</not-captured>
    <w:t>more data...</w:t>
</root>';
$doc = simplexml_load_string($string);
foreach ($doc->xpath('//w:t') as $elem) {
    var_dump($elem->asXML());
}

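If the w prefix is not already bound for the XPath query (for example, because the namespace is declared on a nested element rather than on the root), register it first with SimpleXMLElement::registerXPathNamespace: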

$doc->registerXPathNamespace('w', 'http://example.com/');
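
To put that call in context, here is a minimal sketch. It assumes a made-up document in which the w namespace is declared on a nested element (the body element and the http://example.com/ URI are placeholders, not something from the question), so the prefix has to be registered before the XPath query can resolve it:

```php
<?php
// Hypothetical sample: xmlns:w is declared on <body>, not on <root>,
// so the prefix is not in scope when we query from the root element.
$xml = '<?xml version="1.0"?>
<root>
    <body xmlns:w="http://example.com/">
        <w:t>some data...</w:t>
        <not-captured>data data</not-captured>
        <w:t>more data...</w:t>
    </body>
</root>';

$doc = simplexml_load_string($xml);

// Bind the prefix used in the XPath expression to the namespace URI.
$doc->registerXPathNamespace('w', 'http://example.com/');

// Collect each matching element serialized together with its tags.
$found = [];
foreach ($doc->xpath('//w:t') as $elem) {
    $found[] = $elem->asXML();
}

print_r($found);
```

The prefix you register only has to match the one in your XPath expression; what must agree with the document is the namespace URI.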


Adding to the regular-expression answer: I would put a lowercase 's' after the 'i' at the end of the pattern, so that '.' also matches line breaks.

Good point by Gumbo. Yes, also add an uppercase 'U' after the 's' to make the quantifiers ungreedy; otherwise .* runs from the first <w:t> all the way to the last </w:t> and the result is not what you expect.

e.g.

preg_match_all('/<w\:t>.*<\/w\:t>/isU', $string, $matches);
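
To see what the 'U' changes, here is a small sketch comparing the greedy and ungreedy forms. The sample string is invented for illustration (the second element deliberately contains a line break), and the lazy '.*?' variant is shown only as an equivalent spelling, not something the answer above prescribes:

```php
<?php
// Made-up input; the second <w:t> element spans two lines.
$string = "<root>\n"
        . "    <w:t>some data...</w:t>\n"
        . "    <not-captured>data data</not-captured>\n"
        . "    <w:t>more\ndata...</w:t>\n"
        . "</root>";

// Greedy with /s: .* swallows everything up to the LAST </w:t>.
preg_match_all('/<w:t>.*<\/w:t>/is', $string, $greedy);

// Ungreedy via the U modifier: each element is matched on its own.
preg_match_all('/<w:t>.*<\/w:t>/isU', $string, $ungreedy);

// Same effect with a lazy quantifier instead of the U modifier.
preg_match_all('/<w:t>.*?<\/w:t>/is', $string, $lazy);

echo count($greedy[0]), "\n";   // 1
echo count($ungreedy[0]), "\n"; // 2
echo count($lazy[0]), "\n";     // 2
```

Note that these patterns only match a bare <w:t> opening tag; if your elements carry attributes, the pattern would need adjusting, which is one more reason to prefer a parser.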


Using a DOM/XML parser is the preferred option, since it does not restrict you if you later need to search for other tags or data.

But a regular expression takes far less code, so I would go for preg_match_all if those tags are the only thing you need.

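$string = '<?xml version="1.0"?>
<root>
    <w:t>some data...</w:t>
    <not-captured>data data</not-captured>
    <w:t>more data...</w:t>
</root>';

// .*? is lazy, so each <w:t>...</w:t> pair is matched separately;
// /s lets . span line breaks, /i makes the match case-insensitive.
preg_match_all('/<w\:t>.*?<\/w\:t>/is', $string, $matches);
var_dump($matches);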

Output:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(23) "<w:t>some data...</w:t>"
    [1]=>
    string(23) "<w:t>more data...</w:t>"
  }
}

Edit: /is modifier added to regex
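
For comparison, the parser-based route mentioned at the top of this answer could look roughly like the following with PHP's DOM extension. This is only a sketch: it assumes the w prefix is declared in the document, and the http://example.com/ URI is a placeholder you would replace with the real namespace URI of your file.

```php
<?php
// Same sample data as above, but with the w namespace declared
// (placeholder URI) so that the XML is namespace-well-formed.
$string = '<?xml version="1.0"?>
<root xmlns:w="http://example.com/">
    <w:t>some data...</w:t>
    <not-captured>data data</not-captured>
    <w:t>more data...</w:t>
</root>';

$dom = new DOMDocument();
$dom->loadXML($string);

// DOMXPath needs the prefix bound explicitly before it can be used.
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('w', 'http://example.com/');

// saveXML($node) serializes a single node, tags included.
$matches = [];
foreach ($xpath->query('//w:t') as $node) {
    $matches[] = $dom->saveXML($node);
}

var_dump($matches);
```

It is a few more lines than the regex, but it keeps working if the elements gain attributes or the formatting of the file changes.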
