
Remove specific words from a string

I am trying to parse a file of street names for a project, and need to remove modifiers (Upper / Lower / Old / New / North / East / South / West ...) and endings (street / road / way / lane ...), but I am having no luck with a regular expression.

The way it is set up at the moment is that the program will parse the file one line (ie. street) at a time, and check it

I think the problem is word boundaries. What I need, for example, are the following transformations:

Old Harrow Way -> Harrow (ie. remove 'Old' prefix and 'Way' ending)

Chittock Mead -> Chittock (Remove the ending 'Mead')

- But to leave these alone when in a word:

Gold Lane -> Gold (just remove ending)

Eastley Avenue -> Eastley (just remove ending)

Upper Western Avenue -> Western (remove prefix and ending)

Obviously, things like "South Street" would remove both - This is ok, because I can discard an empty string.

Can anyone give me an idea of how to do this - I've been reading up on regular expressions and trying things for hours!


I would use a List or an array to store those values, and then a foreach loop to check the address against each entry. You would then use .remove to remove each instance of a list or array item from the address. There is more to this, but that is the general idea.


I'd use string.split(" ") to split the address into an array of words. Then take the first word and see if it exists in a list of prefixes (i.e. a List or an array). Do the same for the last word and the endings.

Running through two lists of regular expressions for each input address will be time-consuming. Using my logic should be a good deal faster, especially if the lists are sorted and binary-searched.
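The first-word/last-word check described above could be sketched roughly as follows. This is a Python illustration, not the answerer's own code: the word lists and the function name strip_modifiers are hypothetical, and a set lookup is swapped in for the sorted list plus binary search mentioned in the answer.

```python
# Hypothetical word lists; extend them to match the real data.
PREFIXES = {"upper", "lower", "old", "new", "north", "east", "south", "west"}
ENDINGS = {"street", "road", "way", "lane", "avenue", "mead"}


def strip_modifiers(street: str) -> str:
    """Drop one leading modifier and one trailing ending, comparing whole words only."""
    # split() with no argument splits on any whitespace run, so stray
    # double spaces do not produce empty "words".
    words = street.split()
    if words and words[0].lower() in PREFIXES:
        words = words[1:]
    if words and words[-1].lower() in ENDINGS:
        words = words[:-1]
    return " ".join(words)


for name in ["Old Harrow Way", "Gold Lane", "Eastley Avenue", "South Street"]:
    print(repr(name), "->", repr(strip_modifiers(name)))
```

Because whole words are compared, "Gold" and "Eastley" are left alone even though they contain "old" and "East", and "South Street" comes back as an empty string that the caller can discard.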

If the address data is a bit dirty (i.e. punctuation, double spaces, etc.), you may want to do some cleanup first, as an input string like "  Main  St" will split into more "words" than are really there (hint: Trim() and Regex.Replace() to collapse double spaces into single ones).
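A cleanup pass along those lines might look like the sketch below, again in Python. The function name normalize is hypothetical, and the choice to keep apostrophes while turning other punctuation into spaces is an assumption about the data, not something the answer specifies.

```python
import re


def normalize(line: str) -> str:
    """Replace punctuation with spaces, collapse whitespace runs, and trim the ends."""
    # Assumption: apostrophes are kept (e.g. "King's"); other punctuation becomes a space.
    no_punct = re.sub(r"[^\w\s']", " ", line)
    # Collapse any run of whitespace to a single space, then trim.
    return re.sub(r"\s+", " ", no_punct).strip()


dirty = "  Main  St"
print(dirty.split(" "))             # naive split keeps empty strings as "words"
print(normalize(dirty).split(" "))  # after cleanup only the real words remain
```

Running the cleanup before the split means the first and last array elements are real words, which the prefix and ending checks rely on.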


This question or this question will help you. Ensure that you use the Regex.Replace() method to do the pattern matching and replacement.
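For the regular-expression route, the word-boundary problem from the question can be handled with \b plus anchors. The following is only a sketch in Python (the \b, ^ and $ tokens behave the same way in a .NET pattern passed to Regex.Replace()); the word lists and the name strip_with_regex are hypothetical.

```python
import re

# Hypothetical word lists; extend them to match the real data.
PREFIXES = ["Upper", "Lower", "Old", "New", "North", "East", "South", "West"]
ENDINGS = ["Street", "Road", "Way", "Lane", "Avenue", "Mead"]

# ^ anchors the modifier to the start of the line; the trailing \b stops
# "East" from matching inside "Eastley".
PREFIX_RE = re.compile(
    r"^\s*(?:" + "|".join(map(re.escape, PREFIXES)) + r")\b\s*", re.IGNORECASE
)
# $ anchors the ending to the end of the line; the leading \b stops
# "Way" from matching inside "Broadway".
ENDING_RE = re.compile(
    r"\s*\b(?:" + "|".join(map(re.escape, ENDINGS)) + r")\s*$", re.IGNORECASE
)


def strip_with_regex(street: str) -> str:
    """Remove one leading modifier and one trailing ending using anchored patterns."""
    street = PREFIX_RE.sub("", street)
    return ENDING_RE.sub("", street).strip()


for name in ["Old Harrow Way", "Gold Lane", "Upper Western Avenue", "Broadway"]:
    print(repr(name), "->", repr(strip_with_regex(name)))
```

Each list is compiled into a single alternation, so every address is scanned by two patterns rather than by one pattern per word.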
