
Parsing the string in PHP

How can I split this line:

我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him

into three lines like this:

我 [wǒ] - (pronoun) I or me

你 [nǐ] - (pronoun) you (second person singular); yourself

他 [tā] - (pronoun) he or him

For example, by inserting a <br /> tag after each line?

Thank you!

Update: Sorry, my first version of the question had periods between the entries, but that was a mistake; the real string has none.


Since you removed the dots, the only clear pattern left is "a foreign character, a space, and an opening bracket".

Let's focus on that:

<?php

$string = "我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him";

$result = preg_replace('/(. \[)/u', // "any character, a space, then [" ; the 'u' flag makes '.' match a whole UTF-8 character
                       '<br/>$1',   // prepend a <br/> tag, keeping the matched text via the $1 back-reference
                       $string);

echo $result;

Note that with this approach the line breaks are placed at the beginning of each entry, including the first one. Don't forget the 'u' (UTF-8) flag, and use UTF-8 consistently throughout your application, or string processing will be a mess.
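To see why the 'u' flag matters, here is a small sketch (core PCRE functions only; the variable names $withU and $withoutU are just for illustration) that runs the same pattern with and without it. Without 'u', '.' matches a single byte, so the <br/> ends up inside the 3-byte character and the result is no longer valid UTF-8:

```php
<?php
$string = "我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him";

// With 'u', '.' matches one whole character (我, 你, 他).
$withU = preg_replace('/(. \[)/u', '<br/>$1', $string);

// Without 'u', '.' matches one byte: only the last byte of each CJK
// character is captured, so '<br/>' is inserted in the middle of it.
$withoutU = preg_replace('/(. \[)/', '<br/>$1', $string);

// preg_match() with the 'u' modifier returns false when the subject is not valid UTF-8.
var_dump(preg_match('//u', $withU) === 1);        // bool(true)
var_dump(preg_match('//u', $withoutU) === false); // bool(true)
```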

EDIT: if you want a line break only at the beginning of the second and third entries (not at the very start of the string), you can use a negative lookbehind:

$string = "我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him";

// the same pattern, but skipping the match at the very start of the string (where "^" matches)
$result = preg_replace('/(?<!^)(. \[)/u',
                       '<br/>$1',
                       $string);

echo $result;
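If you would rather get the entries as an array (to process them further, or to put <br /> after each line as the question asks), here is a sketch with preg_split() and a lookahead. Like the patterns above, it assumes every entry starts with one headword character followed by " [":

```php
<?php
$string = "我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him";

// Split on whitespace that is followed by "one character, a space, [".
// The lookahead (?=...) checks for the next entry without consuming it,
// so each headword stays attached to its own entry.
$entries = preg_split('/\s+(?=\S \[)/u', $string);

// Join the entries with a <br /> tag (plus a newline for readable HTML source).
echo implode("<br />\n", $entries);
```

preg_split() drops the whitespace it splits on, so the entries carry no trailing spaces.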


If you are sure about the format, you can try something like this; but without a proper delimiter it's all guesswork, and some entries may be split incorrectly.

$str = preg_replace("/\s+(\S+\s+\[\S+\])/u", "<br />$1", $str); // 'u' so \s and \S work on whole UTF-8 characters
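A quick way to see both the intended behaviour and the "guessing" risk mentioned above: a sketch that wraps the pattern (with the 'u' flag) in a hypothetical helper addBreaks() and runs it on the question's string and on a made-up entry whose definition itself contains "word [...]":

```php
<?php
// addBreaks() is a hypothetical helper name, not a PHP built-in.
function addBreaks(string $str): string
{
    // Replace "whitespace + word + whitespace + [something]" with "<br />word [something]".
    return preg_replace('/\s+(\S+\s+\[\S+\])/u', '<br />$1', $str);
}

$sample = "我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him";
echo addBreaks($sample), "\n";
// 我 [wǒ] - (pronoun) I or me<br />你 [nǐ] - (pronoun) you (second person singular); yourself<br />他 [tā] - (pronoun) he or him

// Made-up entry: the definition contains its own "word [...]", so it is split in the wrong place.
$tricky = "好 [hǎo] - (adjective) good, see also [hào]";
echo addBreaks($tricky), "\n";
// 好 [hǎo] - (adjective) good, see<br />also [hào]
```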


If I understand correctly, you want to break just before each Chinese/Japanese character?

In the PHP manual, the user comments on the ord() function contain several suggested implementations of a UTF-8-aware ord. With such a function you can walk through your string one UTF-8 code point at a time, and whenever you encounter a code point (character) whose value is at or above the start of the Chinese/Japanese range, first insert a <br /> or whatever separator you need.
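As a swapped-in alternative to the hand-written ord loop (not this answer's method), the same rule can be expressed as a regular expression over code-point ranges. The ranges below, hiragana/katakana (U+3040 to U+30FF) and CJK Unified Ideographs (U+4E00 to U+9FFF), are an assumption about which characters start an entry, and breakBeforeCjk() is a hypothetical helper name:

```php
<?php
// breakBeforeCjk() is a hypothetical helper name, not a PHP built-in.
// Assumed "entry start" characters: U+3040 to U+30FF and U+4E00 to U+9FFF.
function breakBeforeCjk(string $str): string
{
    // (?<!^)  : never insert a break at the very start of the string
    // \s*     : also consume the space before the character, so lines don't end in a space
    // ([...]) : capture the character itself and put it back via $1
    // \x{...} code-point escapes require the 'u' modifier.
    return preg_replace('/(?<!^)\s*([\x{3040}-\x{30FF}\x{4E00}-\x{9FFF}])/u', '<br />$1', $str);
}

$str = "我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him";
echo breakBeforeCjk($str);
// 我 [wǒ] - (pronoun) I or me<br />你 [nǐ] - (pronoun) you (second person singular); yourself<br />他 [tā] - (pronoun) he or him
```

Like the code-point loop, this breaks before every such character, so a headword of two or more characters would also be split in the middle.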

Edit: see the documentation page for ord() in the PHP manual.

And here is the code I think may suit your problem, quoted from the comment by kerry at shetline dot com:

Here's my take on an earlier-posted UTF-8 version of ord, suitable for iterating through a string by Unicode value. The function can optionally take an index into a string, and optionally return the number of bytes consumed by a character so that you know how much to increment the index to get to the next character.

<?php

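function ordUTF8($c, $index = 0, &$bytes = null)
{
  $len = strlen($c);
  $bytes = 0;

  if ($index >= $len)
    return false;

  // Lead byte of the character starting at $index.
  // $c[$index], not $c{$index}: curly-brace string offsets were removed in PHP 8.
  $h = ord($c[$index]);

  if ($h <= 0x7F) {                              // 1-byte sequence (ASCII)
    $bytes = 1;
    return $h;
  }
  else if ($h < 0xC2)                            // continuation byte or overlong lead: invalid here
    return false;
  else if ($h <= 0xDF && $index < $len - 1) {    // 2-byte sequence
    $bytes = 2;
    return ($h & 0x1F) <<  6 | (ord($c[$index + 1]) & 0x3F);
  }
  else if ($h <= 0xEF && $index < $len - 2) {    // 3-byte sequence (includes common CJK)
    $bytes = 3;
    return ($h & 0x0F) << 12 | (ord($c[$index + 1]) & 0x3F) << 6
                             | (ord($c[$index + 2]) & 0x3F);
  }
  else if ($h <= 0xF4 && $index < $len - 3) {    // 4-byte sequence: lead byte carries 3 payload bits
    $bytes = 4;
    return ($h & 0x07) << 18 | (ord($c[$index + 1]) & 0x3F) << 12
                             | (ord($c[$index + 2]) & 0x3F) << 6
                             | (ord($c[$index + 3]) & 0x3F);
  }
  else
    return false;                                // truncated sequence or lead byte above 0xF4
}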

?>


<?php
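$str = "我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him";

// Byte offsets where each entry starts. This assumes every headword is a single
// 3-byte UTF-8 character followed by one space, so an entry starts 4 bytes before its '['.
$splitPoints = [];
$indis = 0;

for ($i = 0; $i < strlen($str); $i++) {
    if ($str[$i] == '[') {
        $splitPoints[$indis] = $i - 4;
        $indis++;
    }
}

// Cut the string between consecutive split points.
$strArray = [];
for ($i = 0; $i < $indis - 1; $i++) {
    $strArray[$i] = substr($str, $splitPoints[$i], $splitPoints[$i + 1] - $splitPoints[$i]);
}

// The last entry runs to the end of the string.
$strArray[$i] = substr($str, $splitPoints[$indis - 1]);

for ($i = 0; $i < $indis; $i++) {
    echo $strArray[$i] . "<br>";
}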

?>
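The same split-point idea can be sketched without the fixed "- 4" byte offset: preg_match_all() with PREG_OFFSET_CAPTURE reports the byte offset of every "headword, space, [" match, so headwords of any byte length (or of several characters) work. This still assumes, like the code above, that " [" only ever follows a headword; rtrim() additionally drops the trailing space from each entry:

```php
<?php
$str = "我 [wǒ] - (pronoun) I or me 你 [nǐ] - (pronoun) you (second person singular); yourself 他 [tā] - (pronoun) he or him";

// Find every "headword, space, [" and record where it starts.
// PREG_OFFSET_CAPTURE reports byte offsets, which is what substr() expects.
preg_match_all('/\S+ \[/u', $str, $matches, PREG_OFFSET_CAPTURE);
$splitPoints = array_column($matches[0], 1);

$entries = [];
foreach ($splitPoints as $k => $start) {
    // Each entry runs up to the next split point, or to the end of the string.
    $end = $splitPoints[$k + 1] ?? strlen($str);
    $entries[] = rtrim(substr($str, $start, $end - $start));
}

echo implode("<br>", $entries);
```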