Regex: replace inner string

2022-12-24 07:21

I'm working with X12 EDI files (specifically 835s, for those of you in health care), and I have a particular vendor who's using a non-HIPAA-compliant version (3090, I think). The problem is that in a particular segment (PLB, again for those who care) they're sending a code which is no longer supported by the HIPAA standard. I need to locate the specific code and replace it with a corrected code.

I think a regex would be best for this, but I'm still very new to regex, and I'm not sure where to begin. My current methodology is to turn the file into an array of strings, find the string that starts with "PLB", break that into an array of strings, find the code, and change it. As you can guess, that's very verbose code for something which should be (I'd think) fairly simple.

Here's a sample of what I'm looking for:

~PLB|1902841224|20100228|49>KC15X078001104|.08~

And here's what I want to change it to:

~PLB|1902841224|20100228|CS>KC15X078001104|.08~

Any suggestions?

UPDATE: After review, I found I hadn't defined my question well enough. The record above is an example, but it is not necessarily an exact format to match. Three things could differ between this record and another (in another file) that I'd have to fix:

The pipe (|) could be any non-alphanumeric character. The file itself defines which character is used (normally a pipe or an asterisk).
The > could also be any other non-alphanumeric character (most often : or >).
The set of numbers immediately following PLB is an identifier, and could change in format and length. I've only ever seen numeric IDs there, but technically it could be alphanumeric, and it won't necessarily be 10 characters.

My Plan is to use String.Format() with my Regex match string so that | and > can be replaced with the correct characters.

And for the record: yes, I hate ANSI X12.

Assuming that the "offending" code is always 49, you can use the following:

resultString = Regex.Replace(subjectString, @"(?<=~PLB\|\d{10}\|\d{8}\|)49(?=>\w+\|)", "CS");

This looks for 49 only if it is the first component after a | delimiter, preceded (reading backwards) by a group of 8 digits, another |, a group of 10 digits, yet another |, and ~PLB. It also checks that the 49 is followed by >, then any number of alphanumeric characters, and one more |. Note that each literal | has to be escaped as \|; an unescaped | is the alternation operator.

With the new requirements (and the lucky coincidence that .NET is one of the few regex flavors that allow variable repetition inside lookbehind), you can change that to:

resultString = Regex.Replace(subjectString, @"(?<=~PLB\1\w+\1\d{8}(\W))49(?=\W\w+\1)", "CS");

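Now any non-alphanumeric character is allowed as a separator instead of | or > (although the element separator has to be the same character throughout the segment), and the restrictions on the length of the first field have been loosened. The capturing group (\W) sits at the end of the lookbehind because .NET matches lookbehinds from right to left, so the group is captured before the \1 backreferences to its left are evaluated.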
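To see how the backreference version behaves with different separators, here is a small sketch that runs the same pattern over three inputs. The class name and the second and third sample strings are made up for illustration; only the first sample comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlbBackreferenceDemo
{
    // Same pattern as in the answer above. Group 1 is the separator found
    // directly in front of the code; every \1 must be that same character.
    private const string Pattern = @"(?<=~PLB\1\w+\1\d{8}(\W))49(?=\W\w+\1)";

    public static string Fix(string text)
    {
        return Regex.Replace(text, Pattern, "CS");
    }

    public static void Main()
    {
        // Sample from the question: | between elements, > between components.
        Console.WriteLine(Fix("~PLB|1902841224|20100228|49>KC15X078001104|.08~"));

        // Hypothetical variant: * and : as separators, short alphanumeric identifier.
        Console.WriteLine(Fix("~PLB*AB123*20100228*49:KC15X078001104*.08~"));

        // Hypothetical non-PLB segment: the 49s here lack the ~PLB prefix,
        // so the lookbehind should reject them and leave the text unchanged.
        Console.WriteLine(Fix("~CLP|49>X|20100228|49>KC15|.08~"));
    }
}
```

The pattern never names the separators, so the same call should handle files that use | and > as well as files that use * and :.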

Another, similar approach that works on any valid X12 file to replace a single data value with another on a matching segment:

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public void ReplaceData(string filePath, string segmentName,
    int elementPosition, int componentPosition,
    string oldData, string newData)
{
    string text = File.ReadAllText(filePath);

    // Validate the ISA...IEA envelope and capture the three delimiters
    // from their fixed positions in the ISA segment.
    Match match = Regex.Match(text,
        @"^ISA(?<e>.).{100}(?<c>.)(?<s>.)(\w+.*?\k<s>)*IEA\k<e>\d*\k<e>\d*\k<s>$");

    if (!match.Success)
        throw new InvalidOperationException("Not an X12 file");

    char elementSeparator = match.Groups["e"].Value[0];
    char componentSeparator = match.Groups["c"].Value[0];
    char segmentTerminator = match.Groups["s"].Value[0];

    // Split the text into segments, each segment into elements,
    // and each element into components.
    var segments = text
        .Split(segmentTerminator)
        .Select(s => s.Split(elementSeparator)
            .Select(e => e.Split(componentSeparator)).ToArray())
        .ToArray();

    // Replace the value in every matching segment that is long enough
    // to contain the requested element and component.
    foreach (var segment in segments.Where(s => s[0][0] == segmentName &&
                              s.Count() > elementPosition &&
                              s[elementPosition].Count() > componentPosition &&
                              s[elementPosition][componentPosition] == oldData))
    {
        segment[elementPosition][componentPosition] = newData;
    }

    // Reassemble components, elements, and segments with the original
    // delimiters and write the result back to the same file.
    File.WriteAllText(filePath,
        string.Join(segmentTerminator.ToString(), segments
        .Select(e => string.Join(elementSeparator.ToString(),
            e.Select(c => string.Join(componentSeparator.ToString(), c))
             .ToArray()))
        .ToArray()));
}

The regular expression used validates a proper X12 interchange envelope and assures that all segments within the file contain at least a one character name element. It also parses out the element and component separators as well as the segment terminator.
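The pattern above takes the delimiters from fixed positions: the element separator directly after "ISA", then 100 characters, then the component separator and the segment terminator. As a rough sketch of that same idea without a regex, the delimiters can be read by index. The class and method names are hypothetical, the ISA string in Main is a synthetic stand-in rather than a real header, and the tuple return type assumes C# 7 or later.

```csharp
using System;

public static class X12Delimiters
{
    // Hypothetical helper: reads the delimiters from the same offsets
    // that the regular expression above relies on (3, 104 and 105).
    public static (char Element, char Component, char Segment) Read(string interchange)
    {
        if (interchange == null || interchange.Length < 106 ||
            !interchange.StartsWith("ISA", StringComparison.Ordinal))
        {
            throw new ArgumentException("Not an X12 interchange", nameof(interchange));
        }

        return (interchange[3], interchange[104], interchange[105]);
    }

    public static void Main()
    {
        // Synthetic header: only its length and the delimiter positions
        // are realistic; the 100 filler characters are not real ISA data.
        string isa = "ISA" + "*" + new string('0', 100) + ":" + "~";
        var d = Read(isa + "PLB*1902841224*20100228*49:KC15X078001104*.08~");

        Console.WriteLine("element='" + d.Element + "' component='" +
                          d.Component + "' segment='" + d.Segment + "'");
    }
}
```

Unlike the regular expression, this sketch does not validate the rest of the envelope; it only checks the "ISA" prefix and the minimum length.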

Assuming that your code is always a two digit number that comes after a pipe character | and before the greater than sign > you can do it like this:

var result = Regex.Replace(yourString, @"(\|)(\d{2})(>)", @"$1CS$3");

Yes, you can break it down with a regex. If I understand your example correctly, the 2 characters between the | and the > need to be letters and not digits.

~PLB\|\d{10}\|\d{8}\|(\d{2})>\w{14}\|\.\d{2}~

This pattern will match the old segment and capture the characters between the | and the >, which you can then use to look up the replacement (in a database or similar). Then do the replace with the following pattern:

(?<=\|)\d{2}(?=>)

This will look for the ~PLB|#|#| at the start and replace the 2 digits before the > with CS:

Regex.Replace(testString, @"(?<=~PLB\|[0-9]{10}\|[0-9]{8})(\|)([0-9]{2})(>)", @"$1CS$3")

The X12 protocol standard allows the specification of element and component separators in the header, so anything that hard-codes the "|" and ">" characters could eventually break. Since the standard mandates that the characters used as separators (and segment terminators, e.g., "~") cannot appear within the data (there is no escape sequence to allow them to be embedded), parsing the syntax is very simple. Maybe you're already doing something similar to this, but for readability...

// The original segment string (without segment terminator):

string segment = "PLB|1902841224|20100228|49>KC15X078001104|.08";

// Parse the segment into elements, then the fourth element
// into components (bounds checking is omitted for brevity):

var elements = segment.Split('|');
var components = elements[3].Split('>');

// If the first component is the bad value, replace it with
// the correct value (again, not checking bounds):

if (components[0] == "49")
    components[0] = "CS";

// Reassemble the segment by joining the components into
// the fourth element, then the elements back into the
// segment string:

elements[3] = string.Join(">", components);
segment = string.Join("|", elements);

Obviously more verbose than a single regular expression but parsing X12 files is as easy as splitting strings on a single character. Except for the fixed length header (which defines the delimiters), an entire transaction set can be parsed with Split:

// Starting with a string that contains the entire 835 transaction set:

var segments = transactionSet.Split('~');
var segmentElements = segments.Select(s => s.Split('|')).ToArray();

// segmentElements contains an array of element arrays,
// each composite element can be split further into components as shown earlier

What I found is working is the following:

parts = original.Split(record);

for (int i = parts.Length - 1; i >= 0; i--)
{
    string s = parts[i];
    string nString = String.Empty;
    if (s.StartsWith("PLB"))
    {
        string[] elems = s.Split(elem);
        if (elems[3].Contains("49" + subelem.ToString()))
        {
            // Escape both delimiters so characters such as * or | are matched literally.
            string regex = string.Format(@"({0})49({1})",
                Regex.Escape(elem.ToString()), Regex.Escape(subelem.ToString()));
            nString = Regex.Replace(s, regex, @"$1CS$2");
        }
    }
    // ...
}

I'm still having to split my original file into a set of strings and then evaluate each string, but that seems to be working now.

If anyone knows how to get around that string.Split up at the top, I'd love to see a sample.
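One possible way around the Split, sketched under a few assumptions: the three delimiters are already known and passed in, the identifier after PLB matches \w+, the date is 8 digits (as in the patterns earlier in this thread), and a line break may follow each segment terminator. The class and method names are hypothetical, and the second PLB segment in the sample is invented to show that other codes are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlbFixer
{
    // Hypothetical helper: replaces oldCode with newCode in the first
    // component of the third data element of every PLB segment, using a
    // single Regex.Replace over the whole text.
    public static string ReplacePlbCode(string text, char segmentTerminator,
        char elementSeparator, char componentSeparator,
        string oldCode, string newCode)
    {
        // Regex.Escape keeps delimiters such as | or * from acting as operators.
        string t = Regex.Escape(segmentTerminator.ToString());
        string e = Regex.Escape(elementSeparator.ToString());
        string c = Regex.Escape(componentSeparator.ToString());

        // Concatenation is used instead of String.Format so that the braces
        // in \d{8} do not have to be doubled.
        string pattern =
            "(?<=" + t + @"\s*PLB" + e + @"\w+" + e + @"\d{8}" + e + ")" +
            Regex.Escape(oldCode) +
            "(?=" + c + ")";

        // A MatchEvaluator inserts newCode literally, even if it contains '$'.
        return Regex.Replace(text, pattern, m => newCode);
    }

    public static void Main()
    {
        string text =
            "~PLB|1902841224|20100228|49>KC15X078001104|.08" +
            "~PLB|1902841224|20100228|72>KC15X078001105|.10~";

        Console.WriteLine(ReplacePlbCode(text, '~', '|', '>', "49", "CS"));
    }
}
```

Because the lookbehind requires a segment terminator in front of PLB, a PLB segment at the very start of the text would not match; in a full interchange the ISA header comes first, so that should not matter in practice.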
