Regular Expression to identify placeholders
I am trying to replace placeholders in a text file with HTML elements built from the placeholder content.
For example, I have a placeholder such as {Image, picture.jpg, Centre, Picture Info}
I want to convert this into:
<img src="urltopicture\picture.jpg" alt="Picture Info" class="quipImgCentre"></img>
I'm looking to use a regex to identify all placeholders, then, working backwards through the document, convert and replace each one in turn.
The regex {.*} works when there is only one placeholder on a line, but not when there are several: in the text below, it returns one long placeholder spanning everything from the first opening "{" to the last "}".
Aenean non felis at est gravida tincidunt. {Link, news.bbc.co.uk, popup, 500, 800} Donec non diam a mauris vestibulum condimentum eu vitae mi! Aenean sed elit libero, id mollis felis! {Image, ServiceTile.jpg, Left}
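A small C# sketch reproducing this on the sample line above (assuming .NET's System.Text.RegularExpressions, as in the answers, and C# 9 top-level statements): the greedy pattern yields a single match, while switching to a lazy .*? yields one match per placeholder.

```csharp
using System;
using System.Text.RegularExpressions;

// The sample line from the question, containing two placeholders.
string sample = "Aenean non felis at est gravida tincidunt. {Link, news.bbc.co.uk, popup, 500, 800} " +
    "Donec non diam a mauris vestibulum condimentum eu vitae mi! Aenean sed elit libero, id mollis felis! " +
    "{Image, ServiceTile.jpg, Left}";

// Greedy: .* runs to the end of the line, then backtracks only as far as the LAST '}'.
MatchCollection greedy = Regex.Matches(sample, @"\{.*\}");

// Lazy: .*? stops at the FIRST '}' after each '{'.
MatchCollection lazy = Regex.Matches(sample, @"\{.*?\}");

Console.WriteLine($"greedy matches: {greedy.Count}");
Console.WriteLine($"lazy matches:   {lazy.Count}");
foreach (Match m in lazy)
    Console.WriteLine(m.Value);
```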
Also, if anyone has a neater way of performing this placeholder replacement, I'd love to hear it.
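Repeat this part for each type of placeholder:
// Handles {Image, file, alignment} and {Image, file, alignment, alt text}
Regex PlaceholderExpander = new Regex(@"\{Image, ([^,]+), ([^,]+)(?:, ([^}]+))?\}");
// $1 = file, $2 = alignment, $3 = alt text (empty when omitted)
string Expanded = PlaceholderExpander.Replace(YourHtmlStringWithPlaceholders, "<img src='$1' alt='$3' class='quipImg$2'></img>");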
The [^,] class means "any character but a comma", so it stops before the next comma in spite of the greedy + quantifier. It's a trick for processing speed. A more obvious alternative would be a lazy (a.k.a. ungreedy, reluctant) quantifier.
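To compare the two approaches, here is a minimal sketch (C# 9 top-level statements; the shortened input is made up for illustration) that runs the character-class version and a lazy-quantifier version of the same expression over both placeholder forms and checks that they give the same output. The lazy pattern is an illustrative rewrite, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened input modelled on the tested code in this answer.
string input = "Aenean {Image, picture.jpg, Centre, Picture Info} non felis. {Image, ServiceTile.jpg, Left}";
const string replacement = "<img src='$1' alt='$3' class='quipImg$2'></img>";

// Negated character classes: [^,]+ cannot step over a comma, however greedy + is.
var classVersion = new Regex(@"\{Image, ([^,]+), ([^,]+)(?:, ([^}]+))?\}");

// Lazy quantifiers: .+? grows one character at a time until the rest of the pattern fits.
var lazyVersion = new Regex(@"\{Image, (.+?), (.+?)(?:, (.+?))?\}");

string a = classVersion.Replace(input, replacement);
string b = lazyVersion.Replace(input, replacement);
Console.WriteLine(a);
Console.WriteLine(a == b ? "same output" : "outputs differ");
```

Both forms avoid running on to the last closing brace; which one is faster depends on the engine and the input, so it is worth measuring before relying on either.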
The (?:…) is a non-capturing group: it cannot be back-referenced with something like $3. I used it to wrap the part belonging to the optional last parameter, which is made optional by the final ?.
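A short sketch of how the group numbering works out (C# 9 top-level statements; the Show helper is hypothetical): the non-capturing group takes no number, so the optional alt text is group 3, and when that part is absent, group 3 reports Success = false with an empty Value, which is why $3 substitutes an empty string.

```csharp
using System;
using System.Text.RegularExpressions;

var expander = new Regex(@"\{Image, ([^,]+), ([^,]+)(?:, ([^}]+))?\}");

// Print every numbered group; (?:...) itself never gets a number.
void Show(string placeholder)
{
    Match m = expander.Match(placeholder);
    Console.WriteLine(placeholder);
    for (int i = 1; i < m.Groups.Count; i++)
        Console.WriteLine($"  ${i}: success={m.Groups[i].Success}, value='{m.Groups[i].Value}'");
}

Show("{Image, picture.jpg, Centre, Picture Info}");
Show("{Image, ServiceTile.jpg, Left}");

// Includes group 0 (the whole match), so 1 + 3 capturing groups.
int groupCount = expander.GetGroupNumbers().Length;
Console.WriteLine($"numbered groups including $0: {groupCount}");
```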
I have now made the last parameter optional, so it supports both
{Image, picture.jpg, Centre, Picture Info}
and
{Image, ServiceTile.jpg, Left}
The latter results in
<img src='ServiceTile.jpg' alt='' class='quipImgLeft'></img>
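One caveat worth checking, sketched under the assumption that placeholders can sit mid-line with ordinary prose after them: [^,]+ also matches a }, so when a two-parameter placeholder is followed by text containing a comma and another placeholder, group 2 can run past its own closing brace and the match swallows both. Excluding } from the first two classes as well ([^,}]+) keeps each match inside its own braces. The input string is invented for illustration, and the tightened pattern is an adjustment, not the answer's original.

```csharp
using System;
using System.Text.RegularExpressions;

// Invented input: a two-parameter placeholder followed by prose with a comma and a second placeholder.
string input = "{Image, a.jpg, Left} see, {Image, b.jpg, Right, Info}";

// Pattern from this answer: [^,]+ may also swallow a '}'.
var original = new Regex(@"\{Image, ([^,]+), ([^,]+)(?:, ([^}]+))?\}");

// Tightened variant (an adjustment): '}' is excluded from groups 1 and 2.
var tightened = new Regex(@"\{Image, ([^,}]+), ([^,}]+)(?:, ([^}]+))?\}");

foreach (Match m in original.Matches(input))
    Console.WriteLine($"original:  {m.Value}");   // one match spanning both placeholders
foreach (Match m in tightened.Matches(input))
    Console.WriteLine($"tightened: {m.Value}");
```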
I have tested this in http://rextester.com/rundotnet with this code:
using System;
using System.Text.RegularExpressions;
string YourHtmlStringWithPlaceholders = "Aenean {Image, picture.jpg, Centre, Picture Info} non felis at est gravida tincidunt. {Link, news.bbc.co.uk, popup, 500, 800} Donec non diam a mauris vestibulum condimentum eu vitae mi! Aenean sed elit libero, id mollis felis! {Image, ServiceTile.jpg, Left}";
// Same expression as above: $1 = file, $2 = alignment, $3 = optional alt text
Regex PlaceholderExpander = new Regex(@"\{Image, ([^,]+), ([^,]+)(?:, ([^}]+))?\}");
string Expanded = PlaceholderExpander.Replace(YourHtmlStringWithPlaceholders, "<img src='$1' alt='$3' class='quipImg$2'></img>");
Console.WriteLine(Expanded);
Basically, you're looking for an "ungreedy" match (note the ?). The following:
/\{(.*?)\}/
will match as few characters as possible within the braces. From there, you will need to grab the contents and parse them according to whatever format you've chosen for your placeholders.
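Following that route, one possible answer to the question's request for a neater approach is a single lazy pattern plus a MatchEvaluator: Regex.Replace calls the callback once per match and builds the result in one pass, so there is no need to work backwards through the document. This is a sketch under assumptions (C# 9 top-level statements): ImageBaseUrl and Expand are hypothetical names, the image markup copies the shape requested in the question, values are not HTML-encoded, and {Link, ...} placeholders are left untouched because the question doesn't say what HTML they should produce.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical prefix standing in for the question's "urltopicture" location.
const string ImageBaseUrl = "urltopicture/";

// Turn one placeholder into HTML; unknown kinds are returned unchanged.
string Expand(Match m)
{
    // "Image, picture.jpg, Centre, Picture Info" -> ["Image", "picture.jpg", "Centre", "Picture Info"]
    string[] parts = m.Groups[1].Value.Split(',').Select(p => p.Trim()).ToArray();

    switch (parts[0])
    {
        case "Image" when parts.Length >= 3:
            string alt = parts.Length >= 4 ? parts[3] : "";   // alt text is optional
            return $"<img src=\"{ImageBaseUrl}{parts[1]}\" alt=\"{alt}\" class=\"quipImg{parts[2]}\"></img>";
        default:
            // e.g. {Link, ...}: target HTML unspecified, so leave the placeholder as-is.
            return m.Value;
    }
}

string text = "Aenean {Image, picture.jpg, Centre, Picture Info} non felis. {Link, news.bbc.co.uk, popup, 500, 800} " +
              "Aenean sed elit libero! {Image, ServiceTile.jpg, Left}";

// One pass over all placeholders; the local function converts to a MatchEvaluator.
string html = Regex.Replace(text, @"\{(.*?)\}", Expand);
Console.WriteLine(html);
```

Supporting another placeholder type then means adding one more case to the switch rather than another pass over the document.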
If you're looking for images only, you could, of course, specify that as well:
/\{Image, (.*?)\}/
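A quick check of the images-only idea on text modelled on the question's sample (C# 9 top-level statements). The placeholders in the question put a comma straight after Image, so the sketch includes it; a pattern with only a space after Image finds nothing here.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "{Link, news.bbc.co.uk, popup, 500, 800} Donec non diam! {Image, ServiceTile.jpg, Left} " +
              "Aenean {Image, picture.jpg, Centre, Picture Info}";

// Only placeholders starting with "Image, " match; the lazy group captures the argument list.
foreach (Match m in Regex.Matches(text, @"\{Image, (.*?)\}"))
    Console.WriteLine(m.Groups[1].Value);

// Every placeholder has "Image," rather than "Image ", so this variant never matches.
Console.WriteLine(Regex.Matches(text, @"\{Image (.*?)\}").Count);
```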
I guess you just want something like \{[^{}\n\r]+}. I added \n\r to the character class so that a random { can't make the match run away across lines.
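A sketch of the stray-brace case (C# 9 top-level statements; the input is invented for illustration). The pattern above cannot contain another brace or a line break, so an unmatched { simply fails to match and the real placeholder is still found; a class that excludes only }, like the {[^}]+} suggested in the next answer, lets the match starting at the stray { run on into the next line.

```csharp
using System;
using System.Text.RegularExpressions;

// Invented input: an unmatched '{' on one line, a real placeholder on the next.
string text = "Price list { draft\r\n{Image, ServiceTile.jpg, Left}";

// No braces or line breaks allowed inside: the stray '{' fails, the placeholder matches.
var bounded = new Regex(@"\{[^{}\n\r]+}");

// Only '}' excluded: the match from the stray '{' runs on to the next line's '}'.
var unbounded = new Regex(@"\{[^}]+\}");

Console.WriteLine(bounded.Match(text).Value);   // {Image, ServiceTile.jpg, Left}
Console.WriteLine(unbounded.Match(text).Value.Replace("\r\n", "\\r\\n"));
```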
You can make the match stop at the first closing brace by excluding } from the repeated characters: {[^}]+}