How to search for a repeated unit in a long string using C#
How to search for a repeated unit in a long string?
string foo = "atccuahhqtccuahh";
With a repeated substring of ccuahh, how can I determine the position where the repeat happens using regex?
Thank you guys. But the code posted is not working. I am searching for any type of repeat in a string. Can anyone post tested code to help me out? Thanks a lot.
Use the string.IndexOf(string, int) overload. Start with the startIndex argument at 0 and you'll get the index of the first match. Then loop, passing the found index + 1 as startIndex each time, until IndexOf returns -1.
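The loop described above could be sketched roughly like this. The names RepeatFinder and FindAllIndices are made up for illustration, and the sketch uses the IndexOf overload that also takes a StringComparison (ordinal) so the result does not depend on the current culture; that choice is an assumption, not part of the answer.

```csharp
using System;
using System.Collections.Generic;

static class RepeatFinder
{
    // Hypothetical helper: returns the start index of every occurrence of
    // `unit` in `text`, including overlapping occurrences.
    public static List<int> FindAllIndices(string text, string unit)
    {
        var indices = new List<int>();
        if (string.IsNullOrEmpty(unit)) return indices;

        // First search starts at 0.
        int index = text.IndexOf(unit, 0, StringComparison.Ordinal);
        while (index >= 0)
        {
            indices.Add(index);
            // Resume one character past the previous hit; IndexOf returns -1
            // once there are no more occurrences.
            index = text.IndexOf(unit, index + 1, StringComparison.Ordinal);
        }
        return indices;
    }
}
```

For the string from the question, RepeatFinder.FindAllIndices("atccuahhqtccuahh", "ccuahh") yields the indices 2 and 10; the second one is where the repeat starts.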
If you want to stick with Regex then use the Match.Index property.
var matches = Regex.Matches("atccuahhqtccuahh", "ccuahh");
var indices = matches.OfType<Match>().Select((m) => m.Index);
I guess you're looking for an actual regular expression. Here's one that should work:
Regex re = new Regex(@"(.+).+?\1");
However, it behaves a bit weirdly. To match the long string (the one you used as an example), I had to write it this way:
Regex re = new Regex(@"(.{3,}).+?\1");
Without an explicit lower bound, it matched only 'a' and 'hh'.
I'm probably missing something about the way Regex works in .NET ...
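A small sketch to compare the two patterns side by side. The regex engine reports the leftmost match, and scanning starts at index 0, where the only unit that repeats later is the single character 'a'; that is why the short repeat wins unless a minimum length is required. RepeatRegexDemo and DescribeFirstRepeat are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RepeatRegexDemo
{
    // Hypothetical helper: returns "unit@firstIndex,secondIndex" for the
    // first match of `pattern`, or an empty string when nothing matches.
    // Assumes the pattern ends with the backreference \1 to group 1.
    public static string DescribeFirstRepeat(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return "";

        Group unit = m.Groups[1];   // the captured repeated unit
        // \1 is the last thing in the pattern, so the second copy
        // occupies the final unit.Length characters of the match.
        int second = m.Index + m.Length - unit.Length;
        return $"{unit.Value}@{unit.Index},{second}";
    }
}
```

On "atccuahhqtccuahh", the pattern (.+).+?\1 gives "a@0,5", while (.{3,}).+?\1 gives "tccuahh@1,9". Note that the longest repeated unit in the sample string is actually tccuahh, one character longer than the ccuahh mentioned in the question.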
You could use a Regex with grouping.
Regex r = new Regex( @"(.+).*\1" );
"(.+)" will create a match group for one or more characters and represents the repeated unit. This will need to be adjusted depending on the minimum number of characters you want the repeated unit to have. E.g. replace the '+' in the match group with '{x,}' where x is the minimum number of characters.
The "\1" is a backreference: it matches the same characters that were captured by "(.+)".
Test code:
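// Requires: using System; using System.Text.RegularExpressions;
string input = "atccuahhqtccuahh";
Regex r = new Regex(@"(.+).*\1");
foreach (Match match in r.Matches(input))
{
    Console.WriteLine(match.Index);
    Console.WriteLine(match);
    // Groups[0] is the whole match; Groups[1] is the captured repeated unit.
    Group unit = match.Groups[1];
    // The backreference \1 sits at the very end of the match.
    int secondIndex = match.Index + match.Length - unit.Length;
    Console.WriteLine("'{0}' repeated at positions {1} and {2}",
        unit.Value,
        unit.Index,
        secondIndex);
}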
What about using LINQ?
string text = "aafffuaffuaffuafffua";
string search = "fff";
var byLinq = from i in Enumerable.Range(0, text.Length)
             // >= 0 (not > 0) so that a match at the very end of text is included
             where text.Length - i - search.Length >= 0
             where text.Substring(i, search.Length) == search
             select i;
Why do you want to use regex? At a quick glance it seems you could do this easily with just the normal string methods:
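int GetIndexOfFirstRepetition(string text, string substring)
{
    var firstOccurrenceIndex = text.IndexOf(substring);
    // No occurrence at all, so there is no repetition either.
    if (firstOccurrenceIndex < 0) return -1;
    // Start looking again just past the end of the first occurrence.
    var indexToSearchAfter = firstOccurrenceIndex + substring.Length;
    return text.IndexOf(substring, indexToSearchAfter);
}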
I'm assuming that the substring is actually repeated, and that by "find position where the repeat happened" you want the second occurrence of the substring, not the first.