Autodetect punctuation in an HTML string, and split the string there
I have a set of punctuation characters:
$punctuation = array('.', '!', ';', '?');
A character limit variable:
$max_char = 55;
And a string with HTML:
$string = 'This is a test string. With <a href="http://google.com">HTML</a>.';
How can I split this string to a maximum of $max_char characters, using one of the characters in the $punctuation array as the split point?
So basically the string should split at the nearest punctuation character, but not inside an HTML tag definition/attribute. (It doesn't matter if the split occurs inside a tag's contents and the tag remains unclosed, because I'm checking for unclosed tags later.)
If you want to know whether or not you're inside a tag, you will probably need a small state machine that you update while looping over the string. You can index a string much like an array, so you can do something like:
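$punctuation = array('.', '!', ';', '?');
$in_tag = false;
$max_char = 55;
$string = 'This is a test string. With <a href="http://google.com">HTML</a>.';
// Fallback if no usable punctuation is found: no split.
$string1 = $string;
$string2 = '';
// Only scan the first $max_char characters.
$str_length = min(strlen($string), $max_char);
for ($i = 0; $i < $str_length; $i++)
{
    $tempChar = $string[$i]; // Get the character at position $i
    if ((!$in_tag) && (in_array($tempChar, $punctuation)))
    {
        // Later matches overwrite earlier ones, so the last punctuation
        // character within the limit wins. The +1 keeps it in $string1.
        $string1 = substr($string, 0, $i + 1);
        $string2 = substr($string, $i + 1);
    }
    elseif ((!$in_tag) && ($tempChar == "<"))
    {
        $in_tag = true; // Entering a tag definition
    }
    elseif (($in_tag) && ($tempChar == ">"))
    {
        $in_tag = false; // Leaving the tag definition
    }
}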
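If you need this in more than one place, the loop above can be wrapped in a function. The following is only a sketch: the name split_at_punctuation and its [head, tail] return shape are made up for illustration. It assumes single-byte characters, assumes that no ">" appears inside an attribute value, and returns the string unsplit when no punctuation is found outside a tag within the limit.

```php
<?php
// Hypothetical helper (not from the original answer): returns [head, tail].
function split_at_punctuation(string $string, array $punctuation, int $max_char): array
{
    $limit = min(strlen($string), $max_char);
    $in_tag = false;
    $split_at = -1; // index of the last usable punctuation character seen

    for ($i = 0; $i < $limit; $i++) {
        $char = $string[$i];
        if ($in_tag) {
            // Inside a tag definition: ignore everything until it closes.
            if ($char === '>') {
                $in_tag = false;
            }
        } elseif ($char === '<') {
            $in_tag = true;
        } elseif (in_array($char, $punctuation, true)) {
            $split_at = $i;
        }
    }

    if ($split_at === -1) {
        // Assumption: with no punctuation outside a tag, leave the string whole.
        return [$string, ''];
    }

    // Keep the punctuation with the head and drop the whitespace after it.
    return [
        substr($string, 0, $split_at + 1),
        ltrim(substr($string, $split_at + 1)),
    ];
}

$string = 'This is a test string. With <a href="http://google.com">HTML</a>.';
list($head, $tail) = split_at_punctuation($string, array('.', '!', ';', '?'), 55);
echo $head, "\n", $tail, "\n";
```

With the example string, the "." inside the href attribute is skipped because the scanner is in the tag state at that point, so the split lands after "string.". For multibyte text you would probably want mb_strlen() and mb_substr() instead.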