Regexp Replace (as3) - Use text to find but not to replace
I'm having some trouble with a regexp. When my XML file is loaded into ActionScript, all the spaces are removed (the text is trimmed automatically). So I want to replace every space with a placeholder word that I can convert back later in my own parsing.
Here are examples of the tags I want to adjust.
<w:t> </w:t>
<w:t> Test</w:t>
<w:t>Test </w:t>
This is the result I want.
<w:t>%SPACE%</w:t>
<w:t>%SPACE%Test</w:t>
<w:t>Test%SPACE%</w:t>
The closest I have got is the pattern <w:t>\s|\s</w:t>
The biggest problem is that it changes every space in the XML file, which corrupts everything. I want to change spaces only inside w:t nodes, without destroying the text.
When parsing XML using the standard XML class in ActionScript, you can tell the parser not to ignore whitespace by setting the static ignoreWhitespace property to false. It is set to true by default. Setting it to false ensures that whitespace in XML text nodes is preserved. You can then do whatever you want with it.
XML.ignoreWhitespace = false;
/* parse your XML here */
That way you don't have to muck around with regular expressions and can use the standard XML ActionScript parsing.
var reg1 : RegExp = /((?:<w:t>|\G)[^<\s]*+)\s/g;
data = data.replace(reg1, "$1%SPACE%");
(?:<w:t>|\G) means every match starts at a <w:t> tag, or immediately after the previous match. Since [^<\s] can't match the closing </w:t> tag (or any other tag), every match is guaranteed to be inside a <w:t> element.
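Where \G or possessive quantifiers are not available, the same guarantee (only text inside <w:t> is touched) can be sketched with a replace callback instead of a single pattern. This is a different technique from the regex above, written in TypeScript rather than ActionScript, and I have not run it in AS3. The helper name markSpaces is made up for illustration, and the sketch assumes <w:t> has no attributes and contains only text.

```typescript
// Hypothetical helper: replace whitespace with %SPACE%, but only in the
// text of <w:t> elements. Assumes <w:t> has no attributes and no child elements.
function markSpaces(data: string): string {
  // Group 1 captures the element text; [^<]* stops at the first tag.
  const element: RegExp = /<w:t>([^<]*)<\/w:t>/g;
  return data.replace(element, (_match: string, text: string): string => {
    // The inner replace only ever sees the element's text,
    // so whitespace elsewhere in the XML is left alone.
    return "<w:t>" + text.replace(/\s/g, "%SPACE%") + "</w:t>";
  });
}

const markSpacesSample: string =
  '<w:p a="1"> <w:t> </w:t> <w:t> Test</w:t> <w:t>Test </w:t> </w:p>';
console.log(markSpaces(markSpacesSample));
// <w:p a="1"> <w:t>%SPACE%</w:t> <w:t>%SPACE%Test</w:t> <w:t>Test%SPACE%</w:t> </w:p>
```

AS3's String.replace is documented to accept a function as its second argument too, so a port should look much the same, though that is untested here.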
To do this properly, you would need to deal with some more questions, like:
\s matches several other kinds of whitespace, not just ' '. Do you want to replace any whitespace character with %SPACE%? Or do you know that ' ' will be the only kind of whitespace in those elements?
Will there be other elements inside the <w:t> elements (for example, <w:t> test <xyz> test </xyz> </w:t>)? If so, the regex becomes more complicated, but it's still doable.
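To make those two questions concrete, here is a rough TypeScript sketch (not ActionScript, and not the regex from this answer) of one possible set of choices: replace only the literal ' ' character, and tolerate child elements inside <w:t>. The name markLiteralSpaces is hypothetical. It assumes <w:t> elements are never nested inside each other and that attribute values never contain '>'.

```typescript
// Hypothetical helper: marks literal spaces in <w:t> content, skipping
// over any child tags. Tabs and newlines are deliberately left untouched.
function markLiteralSpaces(data: string): string {
  // [\s\S]*? lazily captures everything up to the nearest </w:t>.
  const element: RegExp = /<w:t>([\s\S]*?)<\/w:t>/g;
  return data.replace(element, (_match: string, inner: string): string => {
    // Each token is either a whole tag (kept verbatim, including the
    // spaces between its attributes) or a single space in the text.
    const marked: string = inner.replace(
      /<[^>]*>| /g,
      (token: string): string => (token === " " ? "%SPACE%" : token)
    );
    return "<w:t>" + marked + "</w:t>";
  });
}

console.log(markLiteralSpaces('<w:t> test <xyz a="1"> test </xyz> </w:t>'));
// <w:t>%SPACE%test%SPACE%<xyz a="1">%SPACE%test%SPACE%</xyz>%SPACE%</w:t>
```

Swapping the ' ' in the inner pattern for \s (and adjusting the comparison) would give the "any whitespace" behaviour instead.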
I'm not set up to test ActionScript, but here's a demo in PHP, which uses the PCRE library under the hood, like AS3:
test it on ideone.com
EDIT: In addition to matching where the last match left off, \G matches the beginning of the input, just like \A. That's not a problem with the regex given here, but in the ideone demo it is. That regex should be:
((?:<w:t>|\G(?!\A))(?:[^<\s]++|<(?!/w:t>))*+)\s
I made a workaround that isn't so nice, but that's what happens when you work against the clock.
I run the replace three times instead.
var reg1 : RegExp = /<w:t>\s/gm;
data = data.replace(reg1, "<w:t>%DEADSPACE%");
var reg2 : RegExp = /\s<\/w:t>/gm;
data = data.replace(reg2, "%DEADSPACE%</w:t>");
var reg3 : RegExp = /<w:t>\s<\/w:t>/gm;
data = data.replace(reg3, "<w:t>%DEADSPACE%</w:t>");
RegExp, what is it good for? Absolutely nothing (singing) ;)
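For what it's worth, the three passes can probably be folded into one with an alternation and a replace callback. Below is a TypeScript sketch of that idea, not tested in ActionScript; markEdgeSpaces is a made-up name. Note that once the first pass has run, no <w:t> is directly followed by whitespace any more, so the third pattern has nothing left to match, and the sketch only needs two alternatives.

```typescript
// Hypothetical single-pass version of the three replaces above.
function markEdgeSpaces(data: string): string {
  // Alternative 1: one whitespace character right after <w:t> (group 1 set).
  // Alternative 2: one whitespace character right before </w:t> (group 2 set).
  const edge: RegExp = /(<w:t>)\s|\s(<\/w:t>)/g;
  return data.replace(
    edge,
    (_match: string, open: string | undefined, close: string | undefined): string => {
      if (open !== undefined) {
        return open + "%DEADSPACE%";
      }
      return "%DEADSPACE%" + (close ?? "");
    }
  );
}

console.log(markEdgeSpaces("<w:t> </w:t><w:t> Test</w:t><w:t>Test </w:t>"));
// <w:t>%DEADSPACE%</w:t><w:t>%DEADSPACE%Test</w:t><w:t>Test%DEADSPACE%</w:t>
```

Like the original workaround, this only marks one whitespace character at each edge of the element and leaves spaces between words alone.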
there's also another way