Parsing Dreamweaver templates with Regular Expressions

2022-12-08 19:14

I have a requirement to parse the content out of Dreamweaver templates. I'm using C#.

Here is some example content that I will need to parse.

<div id="myDiv">
    <h1><!-- InstanceBeginEditable name="PageHeading" -->
    The Heading<!-- InstanceEndEditable --></h1>
    <!-- InstanceBeginEditable name="PageContent" -->
    <p>
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed nibh turpis, 
    sagittis vitae convallis at, fringilla nec augue.</p>
    <p>
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
    Sed nibh turpis, sagittis vitae convallis at, fringilla nec augue.</p>
    <!-- InstanceEndEditable -->
</div><!-- END #myDiv-->

Dreamweaver templates are built around HTML comments that contain specific strings denoting their purpose. The key ones for me are the following, which mark the start and end of editable regions in the page.

<!-- InstanceBeginEditable name="xxxxxx" -->
<!-- InstanceEndEditable -->

As you can see from my example HTML, there may be other comments in the source code.

So starting simple, I have the following, which matches all the opening Editable region tags.

<!-- InstanceBeginEditable(.*)?-->

So next I want to get everything between there and the next "<!-- InstanceEnd".

<!-- InstanceBeginEditable(.*)?-->(?<content>(.*)?)<!-- InstanceEnd

Can you tell me why this is so? I would have thought that a non-greedy capture, (.*)?, between my already-working pattern and the literal

<!—InstanceEnd

would have matched what I need...

You don't want to put parentheses around .*.

This means to grab everything greedily, or to grab nothing at all (the trailing ? makes the whole group optional; it does not make .* lazy):

(.*)?

This means to grab everything lazily:

.*?
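To see the difference concretely, here is a minimal C# sketch. The input string and the class name LazyVsGreedyDemo are made up for illustration; only the two quantifier forms come from the discussion above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedyDemo
{
    // Hypothetical sample input: two short "regions" on a single line.
    public const string Input = "<a>one</a><a>two</a>";

    // (.*)? is a greedy .* inside an optional group,
    // so the capture still runs up to the LAST </a>.
    public static string Greedy() =>
        Regex.Match(Input, "<a>(.*)?</a>").Groups[1].Value;

    // .*? is lazy, so the capture stops at the FIRST </a>.
    public static string Lazy() =>
        Regex.Match(Input, "<a>(.*?)</a>").Groups[1].Value;

    public static void Main()
    {
        Console.WriteLine(Greedy()); // one</a><a>two
        Console.WriteLine(Lazy());   // one
    }
}
```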

Also, in your regex, you have only one - in the ending token. Change it to this:

<!-- InstanceBeginEditable.*?-->(?<content>.*?)<!-- InstanceEnd

By the way, it's dangerous to have two .*s in a regex without an atomic group. On unexpected data, you can get catastrophic backtracking. I'd recommend changing the first .*? to [^-]*. And, while I'm at it, I'll suggest you handle whitespace more forgivingly:

<!--\s*InstanceBeginEditable[^-]*-->(?<content>.*?)<!--\s*InstanceEnd

You probably already know this, but let me add that with .NET, you'll need to use RegexOptions.Singleline.
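Putting the pieces together, the following is a rough sketch of how the extraction might look in C#, not a definitive implementation. The class EditableRegionParser, the method Parse, and the extra "name" group are assumptions added for illustration; the rest of the pattern follows the one above. Note that [^-]* assumes there is no hyphen between the region name and the closing -->.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EditableRegionParser
{
    // The pattern from the answer, plus a "name" group (an addition for this sketch).
    // Singleline lets "." match newlines, so a region can span several lines.
    private static readonly Regex RegionPattern = new Regex(
        @"<!--\s*InstanceBeginEditable\s+name=""(?<name>[^""]*)""[^-]*-->(?<content>.*?)<!--\s*InstanceEnd",
        RegexOptions.Singleline);

    // Returns a map from region name to the trimmed content of that region.
    public static Dictionary<string, string> Parse(string html)
    {
        var regions = new Dictionary<string, string>();
        foreach (Match m in RegionPattern.Matches(html))
        {
            regions[m.Groups["name"].Value] = m.Groups["content"].Value.Trim();
        }
        return regions;
    }

    public static void Main()
    {
        // Shortened version of the sample markup from the question.
        string html =
            "<h1><!-- InstanceBeginEditable name=\"PageHeading\" -->\n" +
            "The Heading<!-- InstanceEndEditable --></h1>\n" +
            "<!-- InstanceBeginEditable name=\"PageContent\" -->\n" +
            "<p>Lorem ipsum</p>\n" +
            "<!-- InstanceEndEditable -->";

        foreach (var region in Parse(html))
        {
            Console.WriteLine($"{region.Key}: {region.Value}");
        }
    }
}
```

Without RegexOptions.Singleline, the PageContent region in this sample would not match at all, because its content contains line breaks.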

Use the HTML Agility Pack; see my answer to "How do I parse HTML using regular expressions in C#?"
