Selecting text before or after a character in regex
I'm trying to get a bit of regex to do the following. I have some data I need to split; the sample data looks like this:
Brand Name - Product Name
Another Brand - Shoe Laces
Heinz - Bakes Beans
I want to be able to select the brand name or the product name, but I can't seem to do it without catching the " - " part in the regex. Can anyone tell me what I'm missing? My regex is pretty basic.
EDIT: I'm exporting a database to a spreadsheet, formatting it and importing it into a new system through a CSV. The old system used a "brand name - product name" format as above, whereas the new one uses two separate fields. Ideally I wanted to sneak some regex into the spreadsheet formula, but now I think it's going to be easier to just handle this with a script. Likely PHP, although JavaScript isn't ruled out.
You won't need a regex for that - a simple split would be sufficient.
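Example in Python:
#!/usr/bin/env python3
s = """
Brand Name - Product Name
Another Brand - Shoe Laces
Heinz - Bakes Beans
"""
for line in s.splitlines():
    # Split on the first hyphen only, then trim whitespace from each part.
    parts = [part.strip() for part in line.split('-', 1)]
    # Blank lines and lines without a hyphen yield a single part; skip them.
    if len(parts) == 2:
        brand, product = parts
        print('Brand:', brand, '| Product:', product)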
Yields:
Brand: Brand Name | Product: Product Name
Brand: Another Brand | Product: Shoe Laces
Brand: Heinz | Product: Bakes Beans
PHP version:
<?php
$s = <<<EOM
Brand Name - Product Name
Another Brand - Shoe Laces
Heinz - Bakes Beans
EOM;
foreach (explode("\n", $s) as $line) {
    // split() was removed in PHP 7; explode() with a limit of 2 splits on the first hyphen only.
    list($brand, $product) = explode("-", $line, 2);
    echo "Brand: " . trim($brand) . " | Product: " . trim($product) . "\n";
}
?>
Ruby version:
#!/usr/bin/env ruby
s = "
Brand Name - Product Name
Another Brand - Shoe Laces
Heinz - Bakes Beans
"
s.split("\n").each { |line|
  brand, product = line.split("-").map { |item| item.strip }
  puts "Brand: #{brand} | Product: #{product}" if brand and product
}
Assuming that there won't be any stray hyphens (-) in the string, and that the brand names etc. contain only alphanumeric characters and spaces (to allow other symbols, add them to the character classes []), you can use the following regex:
^([\w\s]+?)\s*-\s*([\w\s]+)$
The result object will look like:
$1: Brand Name
$2: Product Name
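As a rough sketch of how that pattern behaves, here it is run through Python's re module. The helper name split_brand_product is made up for illustration and is not part of the original answer; the sample lines come from the question.

```python
import re

# Pattern from the answer above: lazy brand, optional spaces around the hyphen, then product.
PATTERN = re.compile(r'^([\w\s]+?)\s*-\s*([\w\s]+)$')

def split_brand_product(line):
    """Hypothetical helper: return (brand, product), or None when the line does not match."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)

for sample in ["Brand Name - Product Name", "Heinz - Bakes Beans", "No separator here"]:
    print(sample, "->", split_brand_product(sample))
```

Because the character classes exclude the hyphen, a line with a second hyphen anywhere will not match at all, which is the "no stray hyphens" assumption stated above.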
If your data is structured like that, the simplest way is to use whatever split method your language has and split on "-". For example, in Python:
"Heinz - Bakes Beans".split("-")
There is no need for a complicated regex.
So if your data is in a file:
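with open("file") as f:
    for line in f:
        # Split on the first hyphen and trim the surrounding whitespace.
        brand, product = [part.strip() for part in line.split("-", 1)]
        print(brand, product)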
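If you work with PHP, you can use explode:
$f = fopen("file", "r");
if ($f) {
    // fgets() returns false at end of file, so no empty final read is processed.
    while (($line = fgets($f)) !== false) {
        // Limit of 2: split on the first hyphen only.
        list($brand, $product) = explode("-", trim($line), 2);
        echo trim($brand) . " - " . trim($product) . "\n";
    }
    // Close the handle only if it was opened successfully.
    fclose($f);
}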
You don't need regex for this task. Just find the index of the substring "-". Everything before it is the brand name, and everything after it is the product name (trim the surrounding spaces from both).
If you know the data to be well-formatted, and in particular that the string " - " (one space, one hyphen, one space) will only occur as the separator in the middle, you can use (.*) - (.*) to retrieve the brand name in the first group and the product name in the second.
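For illustration, a small sketch of that pattern using Python's re module. The function name split_with_groups and the "A - B - C" input are made up; the second call shows why the "separator occurs only once" assumption matters, since the first group is greedy.

```python
import re

# Pattern from the answer above: anything, then " - ", then anything.
SEPARATOR_PATTERN = re.compile(r'(.*) - (.*)')

def split_with_groups(line):
    """Hypothetical helper: return (brand, product) from the two groups, or None."""
    match = SEPARATOR_PATTERN.fullmatch(line)
    return match.groups() if match else None

print(split_with_groups("Heinz - Bakes Beans"))
# With more than one " - ", the greedy first group runs up to the LAST separator.
print(split_with_groups("A - B - C"))
```

If the first separator should win instead, making the first group lazy, (.*?) - (.*), is one option.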