Split the string with different conditions using Linq in C#
I need to extract and remove a word from a string. The word should be upper-case, and following one of the delimiters /, ;, (, - or a space.
Some Examples:
Expected output:"this is test A/ABC""this is test A"and"ABC"
Expected output:"this is a test; ABC/XYZ""this is a test; ABC"and"XYZ"
Expected output:"This TASK is assigned to ANIL/SHAM in our project""This TASK is assigned to ANIL in our project"and"SHAM"
Expected output:"This TASK is assigned to ANIL/SHAM in OUR project""This TASK is assigned to ANIL/SHAM in project"and"OUR"
Expected output:"this is test AWN.A""this is test"and"AWN.A""XETRA-DAX"Expected output:"XETRA"and"DAX""FTSE-100"Expected output:"-100"and"FTSE""A开发者_运维百科THEX"Expected output:""and"ATHEX""Euro-Stoxx-50"Expected output:"Euro-Stoxx-50"and""
How can I achieve that?
An "intelligent" version:
string strValue = "this is test A/ABC";
int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
var str1 = strValue.Substring(0, ix);
var str2 = strValue.Substring(ix + 1);
A "stupid LINQ" version:
var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());
both cases are WITHOUT checks. The OP can add checks if he wants them.
For the second question, using LINQ is REALLY too much difficult. With a Regex it's "easily doable".
var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");
var strValueWithout = regex.Replace(strValue, "$1$4");
var extractedPart = regex.Replace(strValue, "$3");
For the third question
var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);
var strValueWithout = regex.Replace(strValue, "$1$2$5");
var extractedPart = regex.Replace(strValue, "$4");
With code sample: http://ideone.com/5OSs0
Another update (it's becoming BORING)
Regex Regex = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
Regex Regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
var str1 = Regex.Replace(str, "$1$4");
var str2 = Regex.Replace(str, "$3");
The difference between the two is that the first will use A-Z as upper case characters, the second one will use other "upper case" characters, for example ÀÈÉÌÒÙ
With code sample: http://ideone.com/FqcmY
This should work according to the new requirements: it should find the last separator that is wrapped with uppercase words:
Match lastSeparator = Regex.Match(strExample,
@"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
RegexOptions.RightToLeft); // last match
string main = lastSeparator.Result("$`$'"); // before and after the match
string word = lastSeparator.Groups[1].Value; // word after the separator
This regex is a little tricky. Main tricks:
- Use
RegexOptions.RightToLeftto find the last match. - Use of Match.Result for a replace.
$`$'as replacement string: http://www.regular-expressions.info/refreplace.html\p{Lu}for upper-case letters, you can change that to[A-Z]if your more comfortable with that.
If the word shouldn't follow an upper case word, you can simplify the regex to:
@"[-/ ;(](\p{Lu}+)\b"If you want other characters as well, you can use a character class (and maybe remove
\b). For example:@"[-/ ;(]([\p{Lu}.,]+)"
Working example: http://ideone.com/U9AdK
use a List of strings, set all the words to it
find the index of the / then use ElementAt() to determine the word to split which is "SHAM" in your question.
in the below sentence of yours your index of / will be 6.
string strSentence ="This TASK is assigned to ANIL/SHAM in our project";
then use ElementAt(6) at the end of
index is the index of the / in your List<string>
str = str.Select(s => strSentence.ElementAt(index+1)).ToList();
this will return you the SHAM
str = str.Delete(s => strSentence.ElementAt(index+1));
this will delete the SHAM then just print the strSentence without SHAM
if you dont want to use a list of strings you can use the " " to determinate the words in your sentence i think, but that would be a long way to go.
the idea of mine is right i think but the code may not be that flawless.
You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.
As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile
string strValue = "this is test A/ABC";
var s1=new string(
strValue
.TakeWhile(c => c!= '/')
.ToArray());
var s2=new string(
strValue
.SkipWhile(c => c!= '/')
.Skip(1)
.ToArray());
I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use linq
加载中,请稍侯......
精彩评论