Meta-regular expressions?

I wrote a file routing utility (.NET) some time ago to examine a file's location and name pattern and move it to some other preconfigured place based on the match. Fairly simple, straightforward kinda stuff. I had included the possibility of minor transformations through a series of regular expression search-and-replace actions that could be assigned to the file "route", with the intent of adding header rows, replacing commas with pipes, that sort of thing.

So now I have a new text feed that consists of a file header, a batch header, and a multitude of detail records under the batches. The file header contains a count of all detail records in the file, and I have been asked to "split" the file in the assigned transformations, essentially producing a file for each batch record. This is fairly straightforward, as well, but the kicker is, there is an expectation to update the file header for each file to reflect the detail count.

I do not even know if this is possible with pure regular expressions. Can I count the number of matches of a group in a given text document and replace the count value in the original text, or am I going to have to write a custom transformer for this one file?

If I have to write another transformer, are there suggestions on how to make it generic enough to be reusable? I'm considering adding an XSLT transformer option, but my understanding of XSLT is not so great.

I've been asked for an example. Say I have a file like so:

FILE001DETAILCOUNT002
BATCH01
DETAIL001FOO
BATCH02
DETAIL001BAR

This file will be split and stored in two locations. The resulting files will look like this:

FILE001DETAILCOUNT001
BATCH01
DETAIL001FOO

and

FILE001DETAILCOUNT001
BATCH01
DETAIL001BAR

So the sticking point for me is the file header's DETAILCOUNT value.


Regular expressions by themselves can't count the number of matches they've made (or, better put, they don't expose that to the regex user), so you do need additional program code to keep track of this.
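
As a rough illustration of that extra program code, here is a minimal sketch that counts the detail lines and writes the count back into the header. The class and method names (HeaderCounter, UpdateDetailCount) are made up for this example, and it assumes the header is the first line and the count field is always three digits, as in the sample file.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderCounter
{
    // Hypothetical helper: counts the lines that start with DETAIL and writes
    // that count into the DETAILCOUNT field of the header on the first line.
    public static string UpdateDetailCount(string text)
    {
        // The counting happens in program code, via the match collection.
        int count = Regex.Matches(text, @"^DETAIL\d+", RegexOptions.Multiline).Count;

        // Without RegexOptions.Multiline, ^ only matches at the start of the
        // text, so only the header on the first line is rewritten.
        return Regex.Replace(
            text,
            @"(?<=^FILE\d+DETAILCOUNT)\d+",
            count.ToString("000"));
    }
}
```

The regex only locates the field; the number itself comes from the Count property, which is the part a pure search-and-replace pattern can't supply.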

A regex can only capture text that exists somewhere in the source material; it can't generate new text. So unless the number you need appears explicitly at some point in the source, you're out of luck. Sorry.


My program first breaks the text into batches.

I think you'll agree that resequencing the detail number is the trickiest part. You can do it with a MatchEvaluator delegate.

text = Regex.Replace (
   text, // the input text
   @"(?<=^DETAIL)\d+", // the pattern to find: the digits after DETAIL at the start of a line
   m => (detailNum++).ToString ("000"), // replacement (evaluated for each match)
   RegexOptions.Multiline);

Note how the following code resets detailNum to 1 at the beginning of each batch, while the delegate increments it for each match.

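  // Requires: using System; using System.IO; using System.Text;
  //           using System.Text.RegularExpressions;
  var contents = 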
@"FILE001DETAILCOUNT002
BATCH01
DETAIL001FOO
BATCH02
DETAIL001BAR";

  // For each batch: a BATCH line plus every following line up to the next BATCH line
  foreach (Match match in Regex.Matches (contents, @"BATCH\d+\s+(?:(?!BATCH\d+).*\s*)+"))
  {
     Console.WriteLine ("==============\r\nFile\r\n================");
     int batchNum = 1;
     int detailNum = 1;
     StringBuilder temp = new StringBuilder ();
     TextWriter file = new StringWriter (temp);
     // Your file here instead of my stringBuilder/StringWriter

     string batchText = match.Value;
     int count = Regex.Matches (batchText, @"^DETAIL\d+", RegexOptions.Multiline).Count;
     file.WriteLine ("FILE001DETAILCOUNT{0:000}", count);
     string newText = Regex.Replace (batchText, @"(?<=^BATCH)\d+", batchNum.ToString ("000"), RegexOptions.Multiline);
     newText = Regex.Replace (
        newText, 
        @"(?<=^DETAIL)\d+", 
        m => (detailNum++).ToString ("000"), // replacement (evaluated for each match)
        RegexOptions.Multiline);
     file.Write (newText);

     Console.WriteLine (temp.ToString ());
  }

prints

==============
File
================
FILE001DETAILCOUNT001
BATCH001
DETAIL001FOO

==============
File
================
FILE001DETAILCOUNT001
BATCH001
DETAIL001BAR
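
If the splitting needs to be reusable from the routing utility, the same idea could be wrapped in a method that returns one string per output file. This is only a sketch under a few assumptions: BatchSplitter and Split are hypothetical names, the header is the first line, every batch starts with a BATCH line followed by its DETAIL lines, and (unlike the program above) batch and detail numbers are left as they are.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BatchSplitter
{
    // Hypothetical helper: returns the text of one output file per BATCH record,
    // each with a header whose DETAILCOUNT reflects that batch only.
    public static List<string> Split(string contents)
    {
        var files = new List<string>();

        // Header text up to (but not including) the count, taken from the first line.
        string headerPrefix = Regex.Match(contents, @"^FILE\d+DETAILCOUNT").Value;

        // A batch is a BATCH line followed by zero or more DETAIL lines.
        foreach (Match batch in Regex.Matches(
            contents,
            @"^BATCH\d+.*(?:\n|$)(?:DETAIL.*(?:\n|$))*",
            RegexOptions.Multiline))
        {
            string body = batch.Value;
            int count = Regex.Matches(body, @"^DETAIL", RegexOptions.Multiline).Count;

            // Assumption: the count is always formatted as three digits.
            files.Add(headerPrefix + count.ToString("000") + "\n" + body);
        }
        return files;
    }
}
```

Each returned string can then be written to its destination by whatever the route already uses for output, which keeps the file handling out of the transformation itself.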