
Autodetect punctuation in an HTML string, and split the string there

I have a set of punctuation characters:

$punctuation = array('.', '!', ';', '?');

A character limit variable:

$max_char = 55;

And a string with HTML:

$string = 'This is a test string. With <a href="http://google.com">HTML</a>.';

How can I split this string at a maximum of $max_char characters, using one of the characters in the $punctuation array as the split point?

So basically the string should split at the nearest punctuation character, but not inside an HTML tag definition or attribute. (It doesn't matter if the split occurs inside a tag's contents and the tag remains unclosed, because I'm checking for unclosed tags later.)


If you want to know whether or not you're inside a tag, you will need some kind of state machine and a loop over the string. You can index a string much like an array, so you can do something like:

 $punctuation = array('.', '!', ';', '?');
 $in_tag = false;
 $split_at = null; // Offset just past the last punctuation found outside a tag
 $max_char = 55;
 $string = 'This is a test string. With <a href="http://google.com">HTML</a>.';
 $str_length = min(strlen($string), $max_char);
 for ($i = 0; $i < $str_length; $i++)
 {
     $tempChar = $string[$i]; // Get the character at position $i
     if ((!$in_tag) && (in_array($tempChar, $punctuation)))
     {
         $split_at = $i + 1; // Remember it; a later match within the limit overrides it
     }
     elseif ((!$in_tag) && ($tempChar == "<"))
     {
         $in_tag = true;
     }
     elseif (($in_tag) && ($tempChar == ">"))
     {
         $in_tag = false;
     }
 }
 if ($split_at !== null) // Only split if punctuation was found within the limit
 {
     $string1 = substr($string, 0, $split_at); // Includes the punctuation character
     $string2 = substr($string, $split_at);
 }
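
For reuse, the same state machine could be wrapped in a function. The following is only a sketch: the name split_at_punctuation and its signature are hypothetical, not part of PHP. It also adds one assumption beyond the loop above: a ">" inside a quoted attribute value does not end the tag. When no punctuation is found within the limit, it returns the whole string unsplit.

```php
<?php
/**
 * Hypothetical helper: split $string just after the last punctuation
 * character found within the first $maxChar bytes, ignoring anything
 * that appears inside an HTML tag (including quoted attribute values).
 * Returns array(head, tail); tail is '' when no split point was found.
 */
function split_at_punctuation(string $string, int $maxChar, array $punctuation): array
{
    $limit   = min(strlen($string), $maxChar);
    $inTag   = false;
    $quote   = null; // Quote character of the attribute value we are inside, if any
    $splitAt = null; // Offset just past the last usable punctuation character

    for ($i = 0; $i < $limit; $i++) {
        $char = $string[$i];

        if ($inTag) {
            if ($quote !== null) {
                // Inside a quoted attribute value: only the matching quote ends it
                if ($char === $quote) {
                    $quote = null;
                }
            } elseif ($char === '"' || $char === "'") {
                $quote = $char;
            } elseif ($char === '>') {
                $inTag = false;
            }
        } elseif ($char === '<') {
            $inTag = true;
        } elseif (in_array($char, $punctuation, true)) {
            $splitAt = $i + 1;
        }
    }

    if ($splitAt === null) {
        return array($string, '');
    }
    return array(substr($string, 0, $splitAt), substr($string, $splitAt));
}

$parts = split_at_punctuation(
    'This is a test string. With <a href="http://google.com">HTML</a>.',
    55,
    array('.', '!', ';', '?')
);
echo $parts[0], "\n"; // This is a test string.
```

Note that strlen and substr count bytes, so with multibyte text the limit is a byte limit; and because ';' is in the punctuation list, an HTML entity such as &amp; would also be treated as a split point.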