Regular expression preg_quote symbols are not detected
I have a dictionary of swear words in the database, and the following works great:
preg_match_all("/\b".$f."(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
Here $t is the input text and $f = preg_quote("punk"); the word "punk" comes from the database dictionary, so at this point in the loop the expression is as follows:
preg_match_all("/\bpunk(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
preg_quote escapes symbols, e.g. # becomes \#, so that the expression stays valid. But when the dictionary word is e.g. "F@CK" or "A$$", these symbols are not detected in the input string with the above expression. I have both a$$ and f@ck in the dictionary, but they do not work. If I remove preg_quote() on the word, the regular expression is invalid because these symbols are not escaped.
Any suggestions on how I can detect "a$$"?
Edit:
So I guess the expression that is not working as intended would be, e.g.:
preg_match_all("/\bf\@ck(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
which should find f@ck in $t.
UPDATE:
This is my usage, simply put: if there are matches in $m, replace them with "****". This whole block is inside a loop through each word in the dictionary; $f is the dictionary word and $t is the input.
$f = preg_quote($f);
preg_match_all("/\b$f(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
if (count($m) > 0) {
    $t = preg_replace("/(\b$f(?:ing|er|es|s)?\b)/si","*****",$t);
}
UPDATE:
Behold, the var_dump:
preg_quote($f) = string(5) "a\$\$"
$t = string(18) "You're such an a$$"
expression = string(29) "/\ba\$\$(?:ing|er|es|s)?\b/si"
UPDATE:
This is only happening when words end with a symbol. I tested "a$$hole" and it's fine, but "a$$" doesn't work.
ANOTHER UPDATE:
Try this simplified version, with $words being a makeshift dictionary:
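// Single quotes keep "$" literal; in double quotes "a$$hole" would interpolate $hole.
$words = array('a$$','asshole','a$$hole','f@ck','f#ck','f*ck');
$text = "Input whatever you feel like here eg. a$$";
foreach ($words as $f) {
    // Keep the raw word in $f so strlen() does not count the escape backslashes.
    $q = preg_quote($f,"/");
    // The \b word boundaries are the part under question and are left as-is.
    $text = preg_replace("/\b".$q."(?:ing|er|es|s)?\b/si",
                         str_repeat("*",strlen($f)),
                         $text);
}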
I would expect to see "Input whatever you feel like here eg. ***" as a result.
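One likely explanation for the symptom described above: \b only matches between a word character and a non-word character, so it cannot match after a trailing "$" that is followed by a space or the end of the string. A minimal sketch of a workaround, assuming PHP 7.4+ and a hypothetical censor() helper that is not part of the original code, replaces \b with lookarounds:

```php
<?php
// Hypothetical helper, not from the original post: censor() and its
// lookaround-based boundaries are an assumption about one possible fix.
function censor(string $text, array $words): string
{
    foreach ($words as $word) {
        // Escape regex metacharacters and the "/" delimiter.
        $quoted = preg_quote($word, '/');
        // (?<!\w) and (?!\w) only require that no word character is adjacent,
        // so they still hold when the word starts or ends with a symbol.
        $pattern = '/(?<!\w)' . $quoted . '(?:ing|er|es|s)?(?!\w)/i';
        $text = preg_replace_callback(
            $pattern,
            // Mask each match with one asterisk per matched byte.
            fn (array $m): string => str_repeat('*', strlen($m[0])),
            $text
        );
    }
    return $text;
}

// \b needs a word character on exactly one side; after the final "$" there is none.
var_dump(preg_match('/\ba\$\$\b/i', 'such an a$$')); // int(0)

// Single quotes keep "$" literal in the dictionary entries.
$words = ['a$$', 'asshole', 'a$$hole', 'f@ck', 'f#ck', 'f*ck'];
echo censor('Input whatever you feel like here eg. a$$', $words), "\n";
// Input whatever you feel like here eg. ***
```

The lookarounds behave like \b for ordinary words such as "punk", so the same pattern can serve the whole dictionary. Note that strlen() counts bytes, so multibyte input would need mb_strlen() instead.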
Cannot Be Done
I'm sorry, but this “problem” is truly impossible to solve. Consider these:
- ꜰᴜᴄᴋ is U+A730.1D1C.1D04.1D0B, "\N{LATIN LETTER SMALL CAPITAL F}\N{LATIN LETTER SMALL CAPITAL U}\N{LATIN LETTER SMALL CAPITAL C}\N{LATIN LETTER SMALL CAPITAL K}"
- ᶠᵘᶜᵏ is U+1DA0.1D58.1D9C.1D4F, "\N{MODIFIER LETTER SMALL F}\N{MODIFIER LETTER SMALL U}\N{MODIFIER LETTER SMALL C}\N{MODIFIER LETTER SMALL K}"
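
To make the two examples above concrete, here is a small sketch that runs an ASCII-oriented pattern against them. The pattern and the $variants list are illustrative assumptions, not a real filter, and the file is assumed to be saved as UTF-8:

```php
<?php
// Illustrative pattern only: an ASCII-based rule with a few symbol substitutions.
// The "u" flag makes PCRE treat the pattern and subject as UTF-8.
$pattern = '/f[u*@#]ck/iu';

$variants = [
    'fuck',   // plain ASCII
    'ꜰᴜᴄᴋ',   // LATIN LETTER SMALL CAPITAL F, U, C, K
    'ᶠᵘᶜᵏ',   // MODIFIER LETTER SMALL F, U, C, K
];

foreach ($variants as $v) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    printf("%s => %s\n", $v, preg_match($pattern, $v) === 1 ? 'caught' : 'missed');
}
```

Only the plain ASCII spelling should be reported as caught. Case-insensitive matching does not help, because these code points are distinct letters rather than case variants of f, u, c and k.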