Regexp Replace (as3) - Use text to find but not to replace

I'm having some trouble with a regexp. When my XML file is loaded into ActionScript, all the spaces are removed (the text is trimmed automatically). So I want to replace every space with a placeholder word that I can convert back later in my own parsing.

Here are examples of the tags I want to adjust.

<w:t> </w:t>
<w:t> Test</w:t>
<w:t>Test </w:t>

This is the result I want.

<w:t>%SPACE%</w:t>
<w:t>%SPACE%Test</w:t>
<w:t>Test%SPACE%</w:t>

The closest I have got is the pattern <w:t>\s|\s</w:t>

The biggest problem is that it changes every space in the XML file, which corrupts everything. I only want to change spaces inside w:t nodes, without destroying the text.


When parsing XML with the standard XML class in ActionScript, you can tell the parser not to ignore whitespace by setting the static ignoreWhitespace property to false. It is set to true by default. This ensures that whitespace in XML text nodes is preserved. You can then do whatever you want with it.

XML.ignoreWhitespace = false;
/* parse your XML here */

That way you don't have to muck around with regular expressions and can use the standard XML ActionScript parsing.


var reg1 : RegExp = /((?:<w:t>|\G)[^<\s]*+)\s/g;
data = data.replace(reg1, "$1%SPACE%");

(?:<w:t>|\G) means every match starts at a <w:t> tag, or immediately after the previous match. Since [^<\s] can't match the closing </w:t> tag (or any other tag), every match is guaranteed to be inside a <w:t> element.

To do this properly, you would need to deal with some more questions, like:

  • \s matches several other kinds of whitespace, not just ' '. Do you want to replace any whitespace character with %SPACE%? Or do you know that ' ' will be the only kind of whitespace in those elements?

  • Will there be other elements inside the <w:t> elements (for example, <w:t> test <xyz> test </xyz> </w:t>)? If so, the regex becomes more complicated, but it's still doable.
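Both questions above can also be handled with a two-step replace: first match each whole <w:t> element, then replace whitespace only in the text pieces of its body. Below is a minimal sketch in TypeScript rather than AS3. AS3's String.replace also accepts a function as the replacement, but this sketch has not been tested there. The names markSpaces, SPACE_TOKEN and onlyAsciiSpace are made up for illustration, and the code assumes <w:t> tags carry no attributes and are never nested inside each other.

```typescript
// Hypothetical helper for illustration; not part of the original answers.
const SPACE_TOKEN = "%SPACE%";

function markSpaces(xml: string, onlyAsciiSpace: boolean = true): string {
  // One whole <w:t>...</w:t> element; the lazy [\s\S]*? stops at the first closing tag.
  const element = /(<w:t>)([\s\S]*?)(<\/w:t>)/g;
  // Either just ' ' or any whitespace character, depending on the flag.
  const whitespace = onlyAsciiSpace ? / /g : /\s/g;

  return xml.replace(element, (_match: string, open: string, body: string, close: string) => {
    // Split the body into tags and text runs; only text runs are rewritten,
    // so whitespace inside nested tags (e.g. attributes) is left alone.
    const marked = body.replace(/<[^>]*>|[^<]+/g, (piece: string) =>
      piece.charAt(0) === "<" ? piece : piece.replace(whitespace, SPACE_TOKEN)
    );
    return open + marked + close;
  });
}
```

Like the regex above, this replaces every matching whitespace character inside the element, not only the leading and trailing ones. Pass false as the second argument if tabs and newlines should become %SPACE% too.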

I'm not set up to test ActionScript, but here's a demo in PHP, which uses the PCRE library under the hood, like AS3:
test it on ideone.com

EDIT: In addition to matching where the last match left off, \G matches the beginning of the input, just like \A. That's not a problem with the regex given here, but in the ideone demo it is. That regex should be

((?:<w:t>|\G(?!\A))(?:[^<\s]++|<(?!/w:t>))*+)\s


I made a workaround that isn't so nice, but that's what happens when you work against the clock.

I run the replace 3 times instead.

var reg1 : RegExp = /<w:t>\s/gm;
data = data.replace(reg1, "<w:t>%DEADSPACE%");

var reg2 :RegExp = /\s<\/w:t>/gm;
data = data.replace(reg2, "%DEADSPACE%</w:t>");

var reg3 :RegExp = /<w:t>\s<\/w:t>/gm;
data = data.replace(reg3, "<w:t>%DEADSPACE%</w:t>");
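For what it's worth, the three passes can probably be folded into one with an alternation. Once the first pass has run, no <w:t> is followed by whitespace any more, so the third pattern has nothing left to match. Here is a minimal sketch in TypeScript rather than AS3, untested in ActionScript. The names markEdgeSpaces and DEAD_SPACE are made up for illustration.

```typescript
// Hypothetical helper for illustration; not part of the original answer.
const DEAD_SPACE = "%DEADSPACE%";

function markEdgeSpaces(data: string): string {
  // Either whitespace right after <w:t>, or whitespace right before </w:t>.
  const edges = /(<w:t>)\s|\s(<\/w:t>)/g;
  return data.replace(
    edges,
    // Only one of the two groups is set per match; the other is undefined.
    (_match: string, open: string | undefined, close: string | undefined) =>
      (open || "") + DEAD_SPACE + (close || "")
  );
}
```

As with the three-pass version, only a single whitespace character next to each tag is replaced. Spaces in the middle of the text and spaces outside w:t elements are left untouched.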

RegExp, what is it good for? Absolutely nothing (singing) ;)


there's also another way
