Is there a way to parse these strings?

2023-01-15 23:42

If there is, I certainly don't see it. We are doing magnetic stripe reads off of driver's licenses. The data does not seem to be consistent. The standard that the driver's licenses should follow sets limits on the length that any one field can have. The part that I can't wrap my head around is how to parse this data.

For example, a field may allow 13 characters in total while only 8 are used. In that case, a caret delimiter always ends that portion of the string. However, and here is the tricky part, if a field uses exactly 13 of the 13 allowable characters, there is no closing caret and no right padding. All of the data just runs together.

Here are two sample strings.

%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?
%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?

Using PHP, how might I do this? I'd truly appreciate a hand on this. I'm really stumped.

Okay, here we go. I used the x flag to make the regex more readable and be able to comment it.

From the spec @EboMike posted, each field has a maximum length and is terminated by ^ if it is shorter than that length. The name is a composite field using $ as a separator between family name, first name, middle name, and suffix. Same goes for the address, which uses $ if the address has multiple lines.

$licenses = array(
    '%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?',
    '%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?'
);

foreach ($licenses as $license) {
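    // Skip any string that does not fit the expected layout,
    // so $fields is never read after a failed match.
    if (!preg_match(
        '@
            ^%
            (.{2})          # State, 2 chars
            ([^^]{0,12}.)   # City, 13 chars, delimited by ^
            ([^^]{0,34}.)   # Name, 35 chars, delimited by ^
            ([^^]{0,28}.)   # Address, 29 chars, delimited by ^
            \?$
        @x',
        $license,
        $fields
    )) {
        echo "No match: $license\n";
        continue;
    }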

    $state   = $fields[1];
    $city    = rtrim($fields[2], '^');
    $name    = explode('$', rtrim($fields[3], '^'));
    $address = explode('$', rtrim($fields[4], '^'));

    echo "$license\n";
    echo "STATE:   "; print_r($state);   echo "\n";
    echo "CITY:    "; print_r($city);    echo "\n";
    echo "NAME:    "; print_r($name);
    echo "ADDRESS: "; print_r($address);
    echo "\n";
}

Output:

%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?
STATE:   CA
CITY:    MISSION HILLS
NAME:    Array
(
    [0] => SMITH
    [1] => JOHN
    [2] => JIM
    [3] => JR
)
ADDRESS: Array
(
    [0] => 1147 SOMESTREET
)

%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?
STATE:   CA
CITY:    LOS ANGELES
NAME:    Array
(
    [0] => DOE
    [1] => JOHN
    [2] => CARL
)
ADDRESS: Array
(
    [0] => 14324 MAIN ST APT 5
)

Didn't you ask this question a few hours ago? Someone posted a regex that handles the case where you separate strings that are either delimited or run exactly 13 characters here: Help with a delimited string

Did that not work?

EDIT: The format is explained here: http://en.wikipedia.org/wiki/Magnetic_stripe_card#United_States_driver.27s_licenses

For the city, it says "Field Separator - one character (generally '^') (absent if city reaches max length)". So again, a simple regex can do wonders here. Refer to the example; you can adjust it to match the format detailed in that entry.

EDIT: Okay, I'll give it a shot.

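// Single quotes keep PHP from interpolating $JOHN, $JIM and $JR as variables.
$str = '%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?';
preg_match('/%(..)' .               // state: 2 characters
           '([^\^]{1,13})\^?' .     // city: up to 13 non-caret characters, optional ^
           '([^$]+)\$' .            // last name, up to the next $
           '([^$]+)\$/',            // first name, up to the next $
           $str, $m);
$State = $m[1];
$City = $m[2];
$LastName = $m[3];
$FirstName = $m[4];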

Just as an example of how you could go at it. Basically, ([^\^]{1,13}) means it'll try to get up to 13 characters that are not the '^' character. Once that's done, it'll consume the '^' character itself IF it's there via \^?.
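The same "up to N non-caret characters, then an optional caret" idea can be stretched over every field. This is only a sketch: parseStripe is a made-up name, the 13/35/29 lengths are borrowed from the other answer rather than checked against the spec, and it assumes no field is ever empty.

```php
<?php
// Hypothetical helper: returns the parsed fields, or null if the track does not fit.
function parseStripe(string $track): ?array
{
    // Each field: 1..max characters that are not ^, followed by an optional ^.
    // Max lengths (13, 35, 29) are assumptions taken from the other answer.
    $pattern = '/^%(..)([^^]{1,13})\^?([^^]{1,35})\^?([^^]{1,29})\^?\?$/';
    if (!preg_match($pattern, $track, $m)) {
        return null;
    }
    return [
        'state'   => $m[1],
        'city'    => $m[2],
        'name'    => explode('$', $m[3]),   // family, first, middle, suffix
        'address' => explode('$', $m[4]),   // one entry per address line
    ];
}

print_r(parseStripe('%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?'));
print_r(parseStripe('%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?'));
```

Because the quantifier is greedy, a 13-character city swallows all 13 characters and the optional caret simply matches nothing.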

Work from left to right, dealing with one field at a time.

Strip off the leading %:

CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?

Take the first 15 chars (first field is max. 15 chars, right?):

CAMISSION HILLS

Doesn't contain a caret - great that's our first field - the next field starts on 16th char:

SMITH$JOHN$JIM$JR^1147 SOMESTREET^? (R1)

I don't know the max len. of this field - let's assume it's 20. Take the first 20 chars:

SMITH$JOHN$JIM$JR^11

Contains a caret - so we've > 1 field here. Take the chars up to the caret:

SMITH$JOHN$JIM$JR

...that's our next field. Now grab the string from (R1) above starting on the (length of prev field + 2)th character (+2 to skip over the ^)

1147 SOMESTREET^?

etc.
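The walk-through above could be sketched in PHP roughly like this, without any regex. takeField is a hypothetical helper name, and the field widths (2 for the state, then 13, 35 and 29) are assumptions taken from the regex answer, not something I've verified against the standard.

```php
<?php
// Hypothetical helper: consume one field from the front of $rest.
// If a caret shows up within the first $maxLen characters, the field ends there
// and the caret is skipped; otherwise the field fills all $maxLen characters.
function takeField(string &$rest, int $maxLen): string
{
    $chunk = (string) substr($rest, 0, $maxLen);
    $caret = strpos($chunk, '^');
    if ($caret === false) {
        // Full-width field: no delimiter to skip.
        $rest = (string) substr($rest, $maxLen);
        return $chunk;
    }
    // Shorter field: keep the text before the caret, then step over the caret.
    $rest = (string) substr($rest, $caret + 1);
    return substr($chunk, 0, $caret);
}

// Strip the leading %, then work left to right, one field at a time.
$rest    = ltrim('%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?', '%');
$state   = takeField($rest, 2);                  // assumed width: 2
$city    = takeField($rest, 13);                 // assumed width: 13
$name    = explode('$', takeField($rest, 35));   // assumed width: 35
$address = explode('$', takeField($rest, 29));   // assumed width: 29

print_r(compact('state', 'city', 'name', 'address'));
```

Passing $rest by reference keeps the "where does the next field start" bookkeeping inside the helper, so the calling code reads like the list of steps.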

If this were Java, I'd solve this with regular expressions. I know PHP must have them too.

All the constraints you mentioned can be translated into REGEX.

For example:

X{n,m}       X, at least n but not more than m times

can be used with something like:

[^%\$\^]{1,13}[%\$\^]

Which reads as, "1-13 instances of any character not equal to %, $, or ^ followed by one of those very same delimiters"
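In PHP the same character class drops straight into preg_match. One caveat, going by the question: a field that fills its maximum length has no trailing delimiter, so the delimiter probably needs to be optional. A small sketch on the city field only (matchCity is a hypothetical helper, and the two-character state prefix is an assumption from the other answers):

```php
<?php
// Hypothetical helper: return the city captured by $pattern, or null on no match.
function matchCity(string $pattern, string $track): ?string
{
    return preg_match($pattern, $track, $m) ? $m[1] : null;
}

$required = '/^%..([^%$^]{1,13})[%$^]/';    // a delimiter must follow the city
$optional = '/^%..([^%$^]{1,13})[%$^]?/';   // a delimiter may be absent

$full  = '%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?';   // 13-char city
$short = '%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?';    // 11-char city

var_dump(matchCity($required, $short));   // string(11) "LOS ANGELES"
var_dump(matchCity($required, $full));    // NULL
var_dump(matchCity($optional, $full));    // string(13) "MISSION HILLS"
```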

When I write regex, I often refer back to Java's great doc page. You can also do neat tricks like extracting particular matching portions and pulling out particular words. Again, I'm more familiar with Java, but PHP is too mature a language not to have the same kinds of features.

I hope that helps in some way. If no one else answers, I can try to create the regex you need.

gMale
