regex remove digits and - in beginning
I'm processing a list of strings, but I want to alter the strings so they don't look ugly to the user. An example list would be:
2736162 Magazines
23-2311 Numbers
1-38122 Faces
5-231123 Newspapers
31-31235 Armynews
33-12331 Celebrities 1
33-22113 Celebrities 2
Cars
Glasses
What I want is to trim the beginning so that the ugly sequence of digits and "-" is left out, and the user only sees the data that makes sense, like:
Magazines
Numbers
Faces
Newspapers
Armynews
Celebrities 1
Celebrities 2
Cars
Glasses
How would I trim out the leading digits and dashes with a regex?
EDIT: Would it be possible to design the same regex to also strip these values from:
FFKKA9101U- Aquatic Environmental Chemistry
FLVKB0381U- Clinical Drug Development
4761-F-Filosofisk kulturkritik
B22-1U-Dynamic biochemistry
to:
Aquatic Environmental Chemistry
Clinical Drug Development
Filosofisk kulturkritik
Dynamic biochemistry
The rule I would think of is that if there are only capital letters, digits and - or + signs before a -, the prefix only makes sense to the machine and is not an actual word, and therefore should be stripped out. I don't know how to formulate this in regex, though.
It looks like you can match ^[\d-]*\s* and replace it with the empty string.
The […] is a character class. Something like [aeiou] matches any one of the lowercase vowels. \d is the shorthand for the digit character class, so [\d-] matches either a digit or a dash. The \s is the shorthand for the whitespace character class.
The ^ is the beginning-of-line anchor. The * is "zero-or-more" repetition.
Thus the pattern matches, at the beginning of a line, a sequence of digits or dashes, followed by a sequence of whitespace characters.
It's not clear from the question, but if the input is a multiline text (instead of applying the regex one line at a time), then you'd want to enable the multiline mode as well.
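As a rough illustration of the one-line-at-a-time case, here is a minimal Python sketch. The question doesn't name a language, so Python is just a stand-in here, and the names LEADING_DIGITS and strip_prefix are made up for this example.

```python
import re

# Start of string, then any run of digits/dashes, then any whitespace.
LEADING_DIGITS = re.compile(r"^[\d-]*\s*")

def strip_prefix(line: str) -> str:
    """Remove a leading run of digits and dashes plus the whitespace after it."""
    return LEADING_DIGITS.sub("", line)

items = [
    "2736162 Magazines",
    "23-2311 Numbers",
    "33-12331 Celebrities 1",
    "Cars",
]
cleaned = [strip_prefix(item) for item in items]
print(cleaned)
```

Because both parts of the pattern use *, a line such as "Cars" still matches (with an empty match) and comes back unchanged.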
C# snippet
Here's an example snippet in C#:
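using System;
using System.Text.RegularExpressions;

// The verbatim string spans several lines, so each item sits on its own line.
var text = @"
2736162 Magazines
23-2311 Numbers
1-38122 Faces
5-231123 Newspapers
31-31235 Armynews
33-12331 Celebrities 1
33-22113 Celebrities 2
Cars
Glasses
";
// Multiline mode makes ^ match at the start of every line, not just the start of the input.
Console.WriteLine(
    Regex.Replace(
        text,
        @"^[\d-]*\s*",
        "",
        RegexOptions.Multiline
    )
);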
The output is (as seen on ideone.com):
Magazines
Numbers
Faces
Newspapers
Armynews
Celebrities 1
Celebrities 2
Cars
Glasses
Depending on flavor, you may have to specify the multiline mode as a /m flag (or (?m) embedded). You may also have to double the backslash if you're representing the pattern as a string literal, e.g. in Java you can use text.replaceAll("(?m)^[\\d-]*\\s*", "").
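For comparison, here is a small Python sketch of the two ways of turning on multiline mode mentioned above: as a flag argument and as an embedded (?m). The sample text is a shortened copy of the list from the question, and Python's raw strings (r"...") are used so the backslashes do not need doubling.

```python
import re

text = "\n".join([
    "2736162 Magazines",
    "31-31235 Armynews",
    "Cars",
    "Glasses",
])

# Multiline mode passed as a flag argument.
with_flag = re.sub(r"^[\d-]*\s*", "", text, flags=re.MULTILINE)

# The same mode embedded at the start of the pattern.
with_inline = re.sub(r"(?m)^[\d-]*\s*", "", text)

print(with_flag)
```

One thing to watch for: \s also matches line breaks, so in multiline mode a blank line, or a line holding only digits and dashes, would be swallowed together with its newline.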
Special note on including dash in a character class
Do be careful when including the - inside a […] character class, since it can signify a range instead of a literal - character. Something like [a-z] matches a lowercase letter. Something like [az-] matches either 'a', 'z', or '-'.
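A quick way to see the difference, sketched in Python with a made-up sample string:

```python
import re

sample = "a-m-z"

# Dash between two characters: a range, so every lowercase letter matches.
as_range = re.findall(r"[a-z]", sample)

# Dash at the end of the class: a literal dash, so only 'a', 'z' and '-' match.
as_literal = re.findall(r"[az-]", sample)

print(as_range)    # ['a', 'm', 'z']
print(as_literal)  # ['a', '-', '-', 'z']
```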
Related questions
- Regex: why doesn't [01-12] range work as expected?
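The answer above does not cover the EDIT in the question. One possible way to express the rule described there (capital letters, digits, - or + followed by a -) is to put an optional group in front of the original pattern. This is only a sketch: the pattern and the names CODE_PREFIX and strip_code are assumptions for illustration, checked against nothing but the sample lines from the question.

```python
import re

# Optional "code" prefix: capitals, digits, + or -, ending in a dash.
# After that, the original pattern: digits/dashes, then whitespace.
CODE_PREFIX = re.compile(r"^(?:[A-Z\d+-]*-)?[\d-]*\s*")

def strip_code(line: str) -> str:
    """Strip a machine-style prefix from the start of a line."""
    return CODE_PREFIX.sub("", line)

samples = [
    "FFKKA9101U- Aquatic Environmental Chemistry",
    "4761-F-Filosofisk kulturkritik",
    "B22-1U-Dynamic biochemistry",
    "23-2311 Numbers",
    "2736162 Magazines",
    "Cars",
]
stripped = [strip_code(s) for s in samples]
print(stripped)
```

The rule is a heuristic, so it will also eat legitimate titles that happen to start with capitals and a dash: "X-Men" would come out as "Men".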
If there are digits (with or without dashes) at the start of every line, you can just split the line on spaces, drop the first piece, and then join the rest again.
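A minimal Python sketch of that split-and-join idea (the function name drop_first_token is made up for this example):

```python
def drop_first_token(line: str) -> str:
    """Split on spaces, discard the first piece, and join the rest back together."""
    parts = line.split(" ")
    return " ".join(parts[1:])

with_prefix = drop_first_token("33-12331 Celebrities 1")
without_prefix = drop_first_token("Cars")

print(repr(with_prefix))
print(repr(without_prefix))
```

As the "every line" condition suggests, this only works when each line really has a prefix: an entry like "Cars" from the question's list loses its only word and comes back empty.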