Removing newlines and tabs after Regex

I am performing preg_match() on the following HTML code:

HTML Code:

<div class="phone"> 
        (123) 123-1234
    </div> 

Regex Pattern:

/<div class="phone">(?<phone>.*?)<\/div>/s

Result:

[phone] => '
                    (617) 547-6670
                '

The extra line and spaces are what I am trying to get rid of. Using the /sm modifiers does not affect the result. Using str_replace("\n",'',$string) got rid of the newline, and the spaces in front appear to be \t tabs. I got rid of the annoying stuff with str_replace("\n\t\t\t\t",'',$string), but I need a more general solution.

How can I remove the \n and \t regardless of how many there are?


Not sure if this is what you would like, but trim() will take care of spaces, tabs, and newlines on each side of the string (but not within the string).

http://php.net/manual/en/function.trim.php

string trim ( string $str [, string $charlist ] )

This function returns a string with whitespace stripped from the beginning and end of str. Without the second parameter, trim() will strip these characters:

" " (ASCII 32 (0x20)), an ordinary space.
"\t" (ASCII 9 (0x09)), a tab.
"\n" (ASCII 10 (0x0A)), a new line (line feed).
"\r" (ASCII 13 (0x0D)), a carriage return.
"\0" (ASCII 0 (0x00)), the NUL-byte.
"\x0B" (ASCII 11 (0x0B)), a vertical tab.

I do realize that this will not handle something like Hello<space><space><space>World, but it may be what you're after (outside of the regex).
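As a rough sketch of the above: the captured value below is a made-up stand-in for the [phone] result in the question, and the second half shows one possible way to also collapse whitespace inside a string with preg_replace(), which trim() alone will not do.

```php
<?php
// Hypothetical captured value, shaped like the [phone] result in the question.
$phone = "\n\t\t\t\t(617) 547-6670\n\t\t";

// trim() strips spaces, tabs, newlines, etc. from both ends only.
$clean = trim($phone);
var_dump($clean); // string(14) "(617) 547-6670"

// trim() leaves inner whitespace alone; preg_replace() can collapse each run
// of whitespace into a single space.
$text = "  Hello   \n\t World  ";
$collapsed = preg_replace('/\s+/', ' ', trim($text));
var_dump($collapsed); // string(11) "Hello World"
```

If the goal is to delete newlines and tabs outright rather than collapse them, a character class such as '/[\n\t]+/' with an empty replacement should work in the same way.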


The simplest way is to pad the "content" part of the regex with \s*, like so:

/<div class="phone">\s*(?<phone>.*?)\s*<\/div>/s

The first \s* consumes as many whitespace characters as it can, stopping when it sees the first character in the phone number. Then the .*? starts consuming characters reluctantly, stopping at the first position where the next part of the regex (\s*<\/div>) can match, which is just after the last character in the phone number.

Be aware that the first \s* must be greedy and the .*? in the named group must be non-greedy for this to work. So if you start feeling the urge to make all quantifiers non-greedy with the /U option, resist it. I mention this because some people use it in all their regexes, which I consider a poor practice. Also, the /s (single-line) modifier is needed whenever the content itself can span more than one line, but the /m (multiline) modifier isn't, since the pattern contains no ^ or $ anchors for it to affect.
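For illustration, here is a minimal sketch of that pattern applied to the markup from the question. The variable names ($html, $pattern, $m) are just placeholders chosen for this example.

```php
<?php
// Sample markup copied from the question, including its newlines and indentation.
$html = "<div class=\"phone\"> \n        (123) 123-1234\n    </div> ";

// Greedy \s* on both sides of the lazy named group keeps the padding
// out of the capture.
$pattern = '/<div class="phone">\s*(?<phone>.*?)\s*<\/div>/s';

$m = [];
if (preg_match($pattern, $html, $m)) {
    var_dump($m['phone']); // string(14) "(123) 123-1234"
}
```

Whitespace inside the captured text is left untouched; only the leading and trailing padding is kept out of the group.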


Use \s*.

\s matches a whitespace character, and * means any number of them, including zero.

But I think you should look at an HTML parser; it is probably the better solution here.
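One way the parser route might look, assuming PHP's bundled DOM extension (DOMDocument and DOMXPath) is available. This is only a sketch, not the one right way to do it, and the XPath query is an assumption that matches a class attribute equal to exactly "phone".

```php
<?php
// Sample markup copied from the question.
$html = "<div class=\"phone\"> \n        (123) 123-1234\n    </div>";

// Collect libxml warnings internally instead of emitting them,
// since real-world HTML is often not well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Find the first <div> whose class attribute is exactly "phone".
$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@class="phone"]')->item(0);

// textContent still contains the surrounding whitespace, so trim() it.
$phone = $node !== null ? trim($node->textContent) : null;
var_dump($phone); // string(14) "(123) 123-1234"
```

If the element can carry several classes, the query would need something looser, such as contains(@class, "phone").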
