Extract words containing both letters and digits from a given string

Can someone please tell me how I can extract the model name from the product names below? For example, I need to extract "SGS45A08GB" from "Bosch SGS45A08GB Silver Dishwasher". It seems I need a regex that identifies the words in a given string that contain both letters and digits. Can someone give me a C# example of how to do this?

Some example strings with model names:

    Bosch SGS45A08GB Silver Dishwasher
    Bosch Avantixx SGS45A02GB Dishwasher, White
    Bosch SMS53E12GB White Dishwasher
    Bosch SGS45A08GB Dishwashers
    BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher
    Bosch SKS60E02GB Compact Dishwasher, White
    BOSCH SRV43M03GB Slimline Integrated Dishwasher
    BOSCH Classixx SGS45C12GB Full-size Dishwasher - White
    BOSCH SGS45A02GB Dishwashers
    Bosch 18V Cordless Drill Driver
    Bosch PSB 18V Li-Ion Hammer Drill
    Bosch SGS45A08GB Dishwasher
    Bosch SGS45A08 12Place Full Size Dishwasher in Silver

EDIT: Adding more product names

    Hitachi DH24DVC 4kg Cordless SDS Plus Hammer Drill 24V
    DeWalt DW965K 12V Angled Drill Driver
    Grove Modern Bathroom Suite with Acrylic Bath
    Bosch GBH24V 3.2kg SDS Plus Drill 24V
    Makita LS0714/1 190mm Sliding Compound Mitre Saw 110V
    Grove Modern Bathroom Suite with Steel Bath
    Swann All-in-One Monitoring & Recording Kit with LCD
    Makita BHR202RFE LXT 3.2kg SDS+ Rotary Hammer Drill 18V
    DeWalt DW625EK-GB 2000W Router 240V
    Trade Triple-Extension Ladder ELT340
    Makita 6391DWPE3 18V Drill Driver
    Erbauer ERF298MSW 165mm Sliding Compound Mitre Saw 24V


If you define "alphanumeric" as a string that contains both ASCII uppercase letters and numbers, and if you assume a minimum length for a model name (let's say 8 characters), then you can match all the names from your example using

Regex regexObj = new Regex(
    @"\b             # word boundary
    (?=[A-Z]*[0-9])  # assert presence of at least one ASCII digit
    (?=[0-9]*[A-Z])  # assert presence of at least one ASCII letter
    [0-9A-Z]{8,}     # match at least 8 characters
    \b               # until a word boundary", 
    RegexOptions.IgnorePatternWhitespace);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // matched text: matchResults.Value
    // match start: matchResults.Index
    // match length: matchResults.Length
    matchResults = matchResults.NextMatch();
} 

I think that uppercase ASCII letters and digits are a reasonable assumption for model names, but if that's not correct, you need to show us more examples.
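To see why both lookaheads and the length check matter, here is a minimal sketch that runs the same pattern against single tokens. The names `LookaheadDemo` and `LooksLikeModel` are made up for illustration, and the tokens DISHWASHER, 12345678 and SGS45 are hypothetical inputs, not taken from your list.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LookaheadDemo
{
    // Same pattern as above: a word of 8+ uppercase letters/digits
    // that contains at least one digit and at least one letter.
    private static readonly Regex Pattern = new Regex(
        @"\b
        (?=[A-Z]*[0-9])  # at least one ASCII digit
        (?=[0-9]*[A-Z])  # at least one uppercase ASCII letter
        [0-9A-Z]{8,}     # at least 8 characters
        \b",
        RegexOptions.IgnorePatternWhitespace);

    // True when the token contains a match for the pattern.
    public static bool LooksLikeModel(string token)
    {
        return Pattern.IsMatch(token);
    }
}
```

With this sketch, `LooksLikeModel("SGS45A08GB")` is true, while "DISHWASHER" (no digit), "12345678" (no letter) and "SGS45" (shorter than 8 characters) are all rejected.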

EDIT: With your new examples, the following regex works, but the constraints are getting looser and looser, and you'll probably never find a regex that reliably matches all possible model names.

Regex regexObj = new Regex(
    @"\b               # word boundary
    (?=\S*[0-9])     # assert presence of at least one ASCII digit
    (?=\S*[A-Z])     # assert presence of at least one uppercase ASCII letter
    [0-9A-Z/-]{6,}   # match at least 6 characters (digits, uppercase letters, / or -)
    \b               # until a word boundary", 
    RegexOptions.IgnorePatternWhitespace);
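
If you want one model name (or nothing) per product line, the looser regex could be wrapped roughly like this. This is only a sketch: `ModelNameExtractor` and `ExtractModel` are hypothetical names, and returning null for lines without a model is my assumption about what you want.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only.
public static class ModelNameExtractor
{
    // The looser pattern from above, built once and reused.
    private static readonly Regex ModelRegex = new Regex(
        @"\b
        (?=\S*[0-9])     # at least one ASCII digit
        (?=\S*[A-Z])     # at least one uppercase ASCII letter
        [0-9A-Z/-]{6,}   # at least 6 characters
        \b",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the first model-like token in the product name,
    // or null when the line has no such token (assumed behavior).
    public static string ExtractModel(string productName)
    {
        Match m = ModelRegex.Match(productName);
        return m.Success ? m.Value : null;
    }
}
```

For example, `ExtractModel("Makita LS0714/1 190mm Sliding Compound Mitre Saw 110V")` gives "LS0714/1", and `ExtractModel("Bosch 18V Cordless Drill Driver")` gives null because no token is long enough.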


Well, this is the best that I could do. Note that some of the items don't have a model number:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication3 {
    class Program {
        static void Main(string[] args) {
            string _data = @"Bosch SGS45A08GB Silver Dishwasher
            Bosch Avantixx SGS45A02GB Dishwasher, White
            Bosch SMS53E12GB White Dishwasher
            Bosch SGS45A08GB Dishwashers
            BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher
            Bosch SKS60E02GB Compact Dishwasher, White
            BOSCH SRV43M03GB Slimline Integrated Dishwasher
            BOSCH Classixx SGS45C12GB Full-size Dishwasher - White
            BOSCH SGS45A02GB Dishwashers
            Bosch 18V Cordless Drill Driver
            Bosch PSB 18V Li-Ion Hammer Drill
            Bosch SGS45A08GB Dishwasher
            Bosch SGS45A08 12Place Full Size Dishwasher in Silver";

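            // Three uppercase letters, then digits, then the rest of the word.
            // No trailing \s+, so the match carries no whitespace and a model
            // at the very end of the string is still found.
            Regex _expression = new Regex(@"\b\p{Lu}{3}\d+\w+");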
            foreach (Match _match in _expression.Matches(_data)) {
                Console.WriteLine(_match.Value);
            }
            Console.ReadKey();
        }
    }
}