Regular expression to convert ul to textformat and back, with a different attribute value for first tag (VB.NET)

This is related to a previous question I asked here; see the link below for a brief description of why I am trying to do this.

Regular expression from font to span (size and colour) and back (VB.NET)

Basically I need a regex replace function (or, if this can be done in pure VB, that's fine too) to convert all ul tags in a string to textformat tags, with a different attribute value for the outermost textformat tag of each list.

For example:

<ul>
   <li>This is some text</li>
   <li>This is some more text</li>
   <li>
      <ul>
         <li>This is some indented text</li>
         <li>This is some more text</li>
      </ul>
   </li>
   <li>More text!</li>
   <li>
      <ul>
         <li>This is some indented text</li>
         <li>This is some more text</li>
      </ul>
   </li>
   <li>More text!</li>
</ul>

<ul>
   <li>Another list item</li>
   <li>
      <ul>
         <li>Another nested list item</li>
      </ul>
   </li>
</ul>

Will become:

<textformat indent="0">
   <li>This is some text</li>
   <li>This is some more text</li>
   <li>
      <textformat indent="20">
         <li>This is some indented text</li>
         <li>This is some more text</li>
      </textformat>
   </li>
   <li>More text!</li>
   <li>
      <textformat indent="20">
         <li>This is some indented text</li>
         <li>This is some more text</li>
      </textformat>
   </li>
   <li>More text!</li>
</textformat>

<textformat indent="0">
   <li>Another list item</li>
   <li>
      <textformat indent="20">
         <li>Another nested list item</li>
      </textformat>
   </li>
</textformat>

Basically I want the outermost ul tag of each list to have no indent, and all nested ul tags to have an indent of 20.

I appreciate this is a strange request, but hopefully it makes sense. Please let me know if you have any questions.

Thanks in advance.


It's possible with a regex, but LINQ to XML is simpler. I've included both a LINQ to XML solution and a regex solution, although I would favor the former.

Here's the LINQ to XML approach. Since ul is the top element, its Name can be changed directly, and Descendants will grab all the nested ul elements. The only caveat is that this approach works only if the input is well-formed; if it isn't, LINQ to XML will fail to parse it. Also, if the input is well-formed but the ul isn't the top element (for example, it is part of a larger block of HTML, or there are several sibling lists as in the example above, which XElement.Parse rejects because it requires a single root element), you'll need to parse the containing block (wrapping the text in a root element if necessary), loop over its Elements("ul"), and do the same thing for each of them.

If the HTML is malformed, you may want to look at the HTML Agility Pack.

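' Requires Imports System.Linq and Imports System.Xml.Linq
Dim xml = XElement.Parse(input)

' The root ul is the outermost list, so it gets no indent
xml.Name = "textformat"
xml.SetAttributeValue("indent", "0")

' Every ul below the root is nested, so it gets an indent of 20.
' ToList() takes a snapshot of the query before the loop modifies the tree.
For Each item In xml.Descendants("ul").ToList()
    item.Name = "textformat"
    item.SetAttributeValue("indent", "20")
Next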
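A minimal sketch of the loop described above, assuming the input is a well-formed fragment that may contain several sibling ul lists, as in the question's example. The wrapper element `root` and the names `UlToTextFormat` and `ConvertLists` are hypothetical. Instead of assuming the root is the only top-level list, it decides each list's indent by checking whether the list has a ul ancestor.

```vbnet
Imports System.Linq
Imports System.Xml.Linq

Module UlToTextFormat
    ' Hypothetical helper: converts every <ul> in a well-formed fragment to <textformat>.
    ' Lists with no <ul> ancestor get indent="0"; nested lists get indent="20".
    Function ConvertLists(input As String) As String
        ' XElement.Parse needs a single root element, so wrap the fragment
        ' in a hypothetical <root> in case it holds several sibling lists.
        Dim root = XElement.Parse("<root>" & input & "</root>")

        ' Snapshot the lists (in document order) before the tree is modified.
        Dim lists = root.Descendants("ul").ToList()

        ' Work out every indent first: once an outer <ul> is renamed,
        ' Ancestors("ul") would no longer find it.
        Dim indents = lists.Select(Function(l) If(l.Ancestors("ul").Any(), "20", "0")).ToList()

        For i = 0 To lists.Count - 1
            lists(i).Name = "textformat"
            lists(i).SetAttributeValue("indent", indents(i))
        Next

        ' Serialize the children of the wrapper without re-indenting them.
        Return String.Concat(root.Nodes().Select(Function(n) n.ToString(SaveOptions.DisableFormatting)))
    End Function
End Module
```

Malformed input still makes XElement.Parse throw an XmlException, and because the text is parsed without preserving whitespace, the result comes back as compact markup rather than with the original indentation.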

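And here's the regex approach. It's not easy to detect the first ul item and distinguish it from the others, so this approach changes all of them to an indent of 20, then takes an extra step to find the first textformat and change its indent to zero. Because that second pattern is anchored with ^, it only fixes a textformat at the very start of the string, so any later top-level lists (like the second list in the example) keep an indent of 20.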

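' Requires Imports System.Text.RegularExpressions
' Step 1: turn every <ul> into <textformat indent="20"> and every </ul> into </textformat>
Dim pattern As String = "<ul>|</ul>"
Dim result As String = Regex.Replace(input, pattern, Function(m) If(m.Value.StartsWith("</"), "</textformat>", "<textformat indent=""20"">"))
' Step 2: reset the indent of the textformat at the very start of the string to 0
Dim firstTextFormatPattern As String = "^(?<Start><textformat\s+indent="")\d+?(?<End>"">)"
result = Regex.Replace(result, firstTextFormatPattern, "${Start}0${End}")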


Thanks for your help with this; I managed to work out a solution myself based on your reply.

Basically, I am using a counter to keep track of which level of ul tag the regex has found, and then replacing each tag with the relevant markup:

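' Class-level field, so that Convert_UL can read and update it
Private ulCounter As Integer = 0

' Inside the method that does the conversion:
Dim rxUL As New Regex("<ul>|</ul>")
ulCounter = 0   ' reset the depth in case the method is called more than once
xmlValue = rxUL.Replace(xmlValue, AddressOf Convert_UL)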


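' Called once per match, in order, so ulCounter tracks the current nesting depth
Protected Function Convert_UL(ByVal m As Match) As String

    Dim HTML As String = ""

    If m.Value = "</ul>" Then
        ' Leaving a list: step back out one level
        ulCounter -= 1

        HTML = "</textformat>"
    Else
        ' Entering a list: depth 1 is a top-level list, anything deeper is nested
        ulCounter += 1

        If ulCounter > 1 Then
            HTML = "<textformat indent=""20"">"
        Else
            HTML = "<textformat indent=""0"">"
        End If
    End If

    Return HTML

End Function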
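The counter above has to live outside Convert_UL (as a class-level field) and be reset before each conversion. A minimal self-contained sketch of the same counter idea follows, with the depth kept in a local variable captured by a lambda so every call starts from zero. The module name `UlCounterConversion` and function name `ConvertUl` are hypothetical, and like the original it assumes the ul tags carry no attributes.

```vbnet
Imports System.Text.RegularExpressions

Module UlCounterConversion
    ' Matches opening and closing ul tags (assumes they carry no attributes).
    Private ReadOnly UlTag As New Regex("<ul>|</ul>")

    ' Same idea as Convert_UL: track nesting depth with a counter,
    ' but keep the counter local so each call starts from zero.
    Function ConvertUl(html As String) As String
        Dim depth As Integer = 0
        Return UlTag.Replace(html,
            Function(m As Match) As String
                If m.Value = "</ul>" Then
                    ' Leaving a list: step back out one level.
                    depth -= 1
                    Return "</textformat>"
                End If
                ' Entering a list: depth 1 is top-level (indent 0), deeper is nested (indent 20).
                depth += 1
                Return If(depth > 1, "<textformat indent=""20"">", "<textformat indent=""0"">")
            End Function)
    End Function
End Module
```

Because Regex.Replace calls the evaluator for each match from left to right, the depth returns to zero after each top-level list closes, so several sibling lists each get indent="0".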

This was a pretty unusual request, so I'm not sure how much help this will be to anyone else, but just in case, that's how I got round it!
