
Selecting the text before and after a character in regex

Basically I'm trying to get a bit of regex to do the following... I have some data I need to split, the sample data looks like this:

Brand Name - Product Name
Another Brand - Shoe Laces
Heinz - Bakes Beans

I want to be able to select the brand name or the product name, but I can't seem to do it without catching the " - " part in the regex. Can anyone tell me what I'm missing? My regex is pretty basic.

EDIT: I'm exporting a database to a spreadsheet, formatting it and importing it into a new system through a CSV. The old system used a "brand name - product name" format as above, whereas the new one uses two separate fields. Ideally I wanted to try and sneak some regex into the spreadsheet formula, but now I think it's going to be easier to just handle this with a script. Likely PHP, although JavaScript isn't ruled out.


You won't need a regex for that - a simple split would be sufficient.

Example in Python:

#!/usr/bin/env python3

s = """
Brand Name - Product Name
Another Brand - Shoe Laces
Heinz - Bakes Beans
"""

for line in s.splitlines():
    # Skip blank lines and lines without the separator
    if '-' not in line:
        continue
    # Split on the first hyphen only, then trim whitespace from both parts
    brand, product = map(str.strip, line.split('-', 1))
    print('Brand:', brand, '| Product:', product)

Yields:

Brand: Brand Name | Product: Product Name
Brand: Another Brand | Product: Shoe Laces
Brand: Heinz | Product: Bakes Beans

PHP version:

<?php

$s = <<<EOM
Brand Name - Product Name
Another Brand - Shoe Laces 
Heinz - Bakes Beans
EOM;

// split() was removed in PHP 7; explode() does a plain string split
foreach (explode("\n", $s) as $line) {
    list($brand, $product) = explode("-", $line, 2);
    echo "Brand: " . trim($brand) . " | Product: " . trim($product) . "\n";
}

?>

Ruby version:

#!/usr/bin/env ruby

s = "
Brand Name - Product Name
Another Brand - Shoe Laces 
Heinz - Bakes Beans
"

s.split("\n").each { |line| 
  brand, product = line.split("-").map{ |item| item.strip }
  puts "Brand: #{brand} | Product: #{product}" if brand and product
}


Assuming that there won't be any stray hyphens (-) in the string, and that the brand and product names contain only alphanumeric characters, underscores and spaces (to allow other symbols, add them to the character classes []), you can use the following regex:

^([\w\s]+?)\s*-\s*([\w\s]+)$

The capture groups will then contain:

$1 Brand Name
$2 Product Name
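
As a rough sketch of how that pattern could be used from a script, here it is in Python. The function name split_with_regex is just something made up for illustration, and the extra strip() calls are my own addition to drop any whitespace the groups pick up:

```python
import re

# Pattern from the answer above: lazy brand group, a hyphen with optional
# surrounding whitespace, then the product group.
PATTERN = re.compile(r'^([\w\s]+?)\s*-\s*([\w\s]+)$')

def split_with_regex(line):
    """Return (brand, product), or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    # Strip to drop leading/trailing whitespace captured by the groups
    return match.group(1).strip(), match.group(2).strip()

for line in ["Brand Name - Product Name",
             "Another Brand - Shoe Laces",
             "Heinz - Bakes Beans"]:
    print(split_with_regex(line))
```

Note that a name containing its own hyphen (say "Coca-Cola - Zero") will not match at all, since the character classes exclude hyphens, so the function returns None for it.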


If your data is structured like that, the simplest way is to use whatever split method your language has and split on "-". For example, in Python:

"Heinz - Bakes Beans".split("-")

There's no need for a complicated regex.

So if your data is in a file:

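with open("file") as f:
    for line in f:
        # Split on the first hyphen only and trim whitespace from each part
        brand, product = (part.strip() for part in line.split("-", 1))
        print(brand, product)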

If you work with PHP, you can use explode:

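$f = fopen("file", "r");
if ($f) {
    // fgets() returns false at end of file, which ends the loop
    while (($line = fgets($f)) !== false) {
        // Split on the first hyphen only, then trim each part
        list($brand, $product) = array_map('trim', explode("-", $line, 2));
        echo "$brand - $product\n";
    }
    // Only close the handle if it was actually opened
    fclose($f);
}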


You don't need regex for this task. Just find the index of the substring "-". Everything before it is the brand name, and everything after it is the product name.
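
A minimal sketch of that index-based approach in Python might look like the following. The function name split_at_hyphen is hypothetical, and trimming the whitespace around each half is an assumption about what you want:

```python
def split_at_hyphen(line):
    """Split a line at the first hyphen; return None if there is none."""
    index = line.find("-")
    if index == -1:
        return None
    # Everything before the hyphen is the brand, everything after it
    # is the product; strip the spaces that surrounded the hyphen.
    brand = line[:index].strip()
    product = line[index + 1:].strip()
    return brand, product

print(split_at_hyphen("Another Brand - Shoe Laces"))
```

Python's built-in str.partition("-") does much the same thing in one call, if you prefer not to slice by hand.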


If you know the data to be well-formatted, and in particular that the string " - " (one space, one hyphen, one space) will only occur as the separator in the middle, you can use (.*) - (.*) to retrieve the brand name in the first group and the product name in the second.
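
For what it's worth, here is how that pattern might be applied in Python. This is only a sketch: split_on_spaced_hyphen is an invented name, and using fullmatch on the stripped line is my own choice, not something the pattern requires:

```python
import re

# One space, one hyphen, one space between two greedy groups
SEPARATOR_PATTERN = re.compile(r'(.*) - (.*)')

def split_on_spaced_hyphen(line):
    """Return (brand, product), or None if " - " does not occur."""
    match = SEPARATOR_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)

# A hyphen without surrounding spaces stays inside the brand name
print(split_on_spaced_hyphen("Coca-Cola - Diet Coke"))
```

Because the first group is greedy, a line with more than one " - " is split at the last occurrence, so the assumption that the separator appears only once does matter.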
