Extract words containing only alphanumeric characters from a given string
Can someone please tell me how I can extract the "model name" from the product names below? For example, all I need is to extract "SGS45A08GB" from "Bosch SGS45A08GB Silver Dishwasher". It seems I have to create a regex that identifies the words in a given string that contain both letters and digits. Can someone give me a C# example to get this done?
Some example strings with model names:
Bosch SGS45A08GB Silver Dishwasher
Bosch Avantixx SGS45A02GB Dishwasher, White
Bosch SMS53E12GB White Dishwasher
Bosch SGS45A08GB Dishwashers
BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher
Bosch SKS60E02GB Compact Dishwasher, White
BOSCH SRV43M03GB Slimline Integrated Dishwasher
BOSCH Classixx SGS45C12GB Full-size Dishwasher - White
BOSCH SGS45A02GB Dishwashers
Bosch 18V Cordless Drill Driver
Bosch PSB 18V Li-Ion Hammer Drill
Bosch SGS45A08GB Dishwasher
Bosch SGS45A08 12Place Full Size Dishwasher in Silver
EDIT: Adding more product names
Hitachi DH24DVC 4kg Cordless SDS Plus Hammer Drill 24V
DeWalt DW965K 12V Angled Drill Driver
Grove Modern Bathroom Suite with Acrylic Bath
Bosch GBH24V 3.2kg SDS Plus Drill 24V
Makita LS0714/1 190mm Sliding Compound Mitre Saw 110V
Grove Modern Bathroom Suite with Steel Bath
Swann All-in-One Monitoring & Recording Kit with LCD
Makita BHR202RFE LXT 3.2kg SDS+ Rotary Hammer Drill 18V
DeWalt DW625EK-GB 2000W Router 240V
Trade Triple-Extension Ladder ELT340
Makita 6391DWPE3 18V Drill Driver
Erbauer ERF298MSW 165mm Sliding Compound Mitre Saw 24V
If you define "alphanumeric" as a string that contains both ASCII uppercase letters and numbers, and if you assume a minimum length for a model name (let's say 8 characters), then you can match all the names from your example using
Regex regexObj = new Regex(
@"\b # word boundary
(?=[A-Z]*[0-9]) # assert presence of at least one ASCII digit
(?=[0-9]*[A-Z]) # assert presence of at least one ASCII letter
[0-9A-Z]{8,} # match at least 8 characters
\b # until a word boundary",
RegexOptions.IgnorePatternWhitespace);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
// matched text: matchResults.Value
// match start: matchResults.Index
// match length: matchResults.Length
matchResults = matchResults.NextMatch();
}
I think that uppercase ASCII letters and digits is a reasonable assumption for model names, but if that's not correct, you need to show us more examples.
EDIT: With your new examples, the following regex works, but the constraints are getting looser and looser, and you'll probably never find a regex that reliably matches all possible model names.
Regex regexObj = new Regex(
@"\b # word boundary
(?=\S*[0-9]) # assert presence of at least one ASCII digit
(?=\S*[A-Z]) # assert presence of at least one ASCII letter
[0-9A-Z/-]{6,} # match at least 6 characters
\b # until a word boundary",
RegexOptions.IgnorePatternWhitespace);
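To show how the looser pattern might be used on one product name at a time, here is a minimal sketch. The class name `ModelNameExtractor` and the method `Extract` are hypothetical names introduced for illustration; the regex itself is the one above, unchanged. It assumes the first matching token in a product name is the model name, which holds for the examples in the question but may not hold in general.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class ModelNameExtractor
{
    // Same pattern as the looser regex above.
    private static readonly Regex ModelRegex = new Regex(
        @"\b               # word boundary
          (?=\S*[0-9])     # assert presence of at least one ASCII digit
          (?=\S*[A-Z])     # assert presence of at least one uppercase ASCII letter
          [0-9A-Z/-]{6,}   # match at least 6 characters
          \b               # until a word boundary",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the first candidate model name, or null when nothing matches.
    // Assumption: the first match in the string is the model name.
    public static string Extract(string productName)
    {
        Match match = ModelRegex.Match(productName);
        return match.Success ? match.Value : null;
    }
}
```

Calling `ModelNameExtractor.Extract("DeWalt DW625EK-GB 2000W Router 240V")` should give "DW625EK-GB", while a name without a model number, such as "Bosch 18V Cordless Drill Driver", should give null, so the caller has to handle that case.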
Well dude this is the best that I could do. Note some of the items don't have any model number:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication3 {
class Program {
static void Main(string[] args) {
string _data = @"Bosch SGS45A08GB Silver Dishwasher
Bosch Avantixx SGS45A02GB Dishwasher, White
Bosch SMS53E12GB White Dishwasher
Bosch SGS45A08GB Dishwashers
BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher
Bosch SKS60E02GB Compact Dishwasher, White
BOSCH SRV43M03GB Slimline Integrated Dishwasher
BOSCH Classixx SGS45C12GB Full-size Dishwasher - White
BOSCH SGS45A02GB Dishwashers
Bosch 18V Cordless Drill Driver
Bosch PSB 18V Li-Ion Hammer Drill
Bosch SGS45A08GB Dishwasher
Bosch SGS45A08 12Place Full Size Dishwasher in Silver";
Regex _expression = new Regex(@"\p{Lu}{3}\d+\w+\s+");
foreach (Match _match in _expression.Matches(_data)) {
// The pattern ends in \s+, so the match includes trailing whitespace; trim it.
Console.WriteLine(_match.Value.Trim());
}
Console.ReadKey();
}
}
}