Multifunction RegEx for parsing JCL variables - out of working solutions
I'm a bit lost creating a regex in C#/.NET.
I'm writing something like a parser, so I use Regex.Replace to search text for certain "variables" and replace them with their "values". Each variable starts with an ampersand ("&") and ends with either an ampersand (the beginning of another variable) or a dot. Each variable (as well as the text surrounding variables) can only consist of alphanumeric characters and certain "special" characters, namely "$", "@", "#" and "-". Neither the variables nor the rest of the text can contain space characters (" "). Now, the problem is that I'm trying to figure out a regex that replaces one possible ending character (".") while not replacing the other possible ending character ("&"). This happens to be quite an issue:
- "&"+variable+"[^A-Za-z0-9#@$]" does what I want, except that it also replaces the "&" - not acceptable.
- "&"+variable+"(.)?\b" replaces the dot, but only if it's followed by a letter or digit - not if it's followed by one of & @ # $ -, which can occur, so this doesn't work either.
- "&"+variable+"(.)?(?!A-Za-z0-9)" does exactly what I want for the ending characters, except that it doesn't recognize the true end of the variable - this way, a search-and-replace for "&DEN" also replaces that part of another variable called "&DENV", of which "&DEN" is a substring. This would create false/misleading results - totally unacceptable.
Just to illustrate the desired behavior:
string variable="DEN";
string replaceWith="28";
string replText;
string regex = "<desired regex>";
replText = Regex.Replace(replText, "&"+variable+regex, replaceWith);
replText="&DEN"
=> replaced => replText=="28"
replText="&DENV"
=> not replaced => replText=="&DENV"
replText="&DEN&DEN"
=> replaced => replText=="2828"
replText="&DEN&DENV"
=> replaced, not replaced => replText=="28&DENV"
replText="&DEN.anything"
=> replaced and dot removed => replText=="28anything"
replText="&DEN..anything"
=> replaced and first dot removed => replText=="28.anything"
The variable could also be something like "#DE@N-$".
The following works correctly on all of your examples. I assumed that a variable &FOO should only be replaced if it's followed by ., &, or end-of-string ($). If it's followed by anything else, it's not replaced.
In order to match but not consume a terminating &, I used a lookahead assertion, (?=&). Assertions require the text at that position to match, but they don't consume any characters, so those characters aren't replaced. A trailing . is still consumed and replaced as part of the match, however.
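To make the difference between consuming the terminator and only asserting it concrete, here is a minimal sketch. The class and method names (LookaheadDemo, ConsumeTerminator, PeekTerminator) are hypothetical and exist only for illustration; the variable name DEN is hard-coded for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // Hypothetical helper: the trailing '&' is part of the match, so it is replaced too.
    public static string ConsumeTerminator(string input)
    {
        return Regex.Replace(input, @"&DEN&", "28");
    }

    // Hypothetical helper: the trailing '&' must be present, but the lookahead
    // does not consume it, so it survives the replacement.
    public static string PeekTerminator(string input)
    {
        return Regex.Replace(input, @"&DEN(?=&)", "28");
    }

    static void Main()
    {
        Console.WriteLine(ConsumeTerminator("&DEN&DENV")); // 28DENV
        Console.WriteLine(PeekTerminator("&DEN&DENV"));    // 28&DENV
    }
}
```

In the first call the second variable loses its leading &, which is exactly the problem described in the question; the lookahead version leaves &DENV intact.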
Finally, a MatchEvaluator is specified that uses the captured variable name to look up the replacement value in the replacements dictionary. If the variable name is not found, the text is effectively untouched (the full original match is returned).
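using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static string ReplaceVariables(Dictionary<string, string> replacements, string input)
    {
        // Group 1 is the variable name. It must be followed by a dot (consumed),
        // an ampersand (lookahead only, not consumed), or the end of the string.
        // Note: \d is redundant here because \w already covers digits; \w also admits '_'.
        return Regex.Replace(input, @"&([\w\d$@#-]+)(\.|(?=&)|$)", m =>
        {
            string replacement = null;
            // Unknown variable names fall through to the original matched text.
            return replacements.TryGetValue(m.Groups[1].Value, out replacement)
                ? replacement
                : m.Groups[0].Value;
        });
    }

    static void Main(string[] args)
    {
        string[] tests = new[]
        {
            "&DEN", "&DENV", "&DEN&DEN",
            "&DEN&DENV", "&DEN.anything",
            "&DEN..anything", "&DEN Foo",
            "&DEN&FOO&DEN"
        };
        var replace = new Dictionary<string, string>
        {
            { "DEN", "28" },
            { "FOO", "42" }
        };
        foreach (var test in tests)
        {
            Console.WriteLine("{0} -> {1}", test, ReplaceVariables(replace, test));
        }
    }
}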
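If the one-variable-at-a-time call from the question has to stay, the same terminator logic can probably be applied per variable. The following is only a sketch under assumptions: VariableReplacer and ReplaceVariable are hypothetical names, and it assumes the variable name should be matched literally, which is why it goes through Regex.Escape (names such as "#DE@N-$" contain regex metacharacters).

```csharp
using System;
using System.Text.RegularExpressions;

static class VariableReplacer
{
    // Hypothetical helper (not from the original posts): replaces one named variable.
    public static string ReplaceVariable(string input, string variable, string replaceWith)
    {
        // Regex.Escape keeps characters such as '$' and '#' in the name literal.
        // The name must be followed by a dot (consumed), '&' (not consumed), or end of string.
        string pattern = "&" + Regex.Escape(variable) + @"(?:\.|(?=&)|$)";

        // A MatchEvaluator returns the value verbatim, so a '$' in replaceWith
        // is not interpreted as a substitution token.
        return Regex.Replace(input, pattern, m => replaceWith);
    }

    static void Main()
    {
        Console.WriteLine(ReplaceVariable("&DEN&DENV", "DEN", "28"));      // 28&DENV
        Console.WriteLine(ReplaceVariable("&DEN..anything", "DEN", "28")); // 28.anything
        Console.WriteLine(ReplaceVariable("&#DE@N-$.X", "#DE@N-$", "28")); // 28X
    }
}
```

Compared with the dictionary version, this runs one pass over the text per variable, so it is likely the slower choice when there are many variables.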
OK, I think I finally found it, using ORs. The regex
(.)?([^A-Za-z0-9#\@\$\&\,\;\:-\<>()\ ]|(?=\&)|\b)
seems to work fine. I'm just posting this in case anyone finds it helpful.
EDIT: Sorry, I hadn't refreshed the page, so I posted this without knowing that a better answer had already been provided by Chris Schmich.