
Lua pattern matching to "fix" HTML code

I have a lot of badly formatted HTML which I am trying to fix using Lua. For example:

<p class='heading'>my useful information</p>
<p class='body'>lots more text</p>

which I want to replace with

<h2>my useful information</h2>
<p class='body'>lots more text</p>

I am trying to use the following Lua function, which is passed the whole HTML page. However, I have two problems. First, I want gsub to pass the replace function the whole match, including the top and tail; I will then replace the top and tail and return the string. Second, my inner replace function can't see the top and tail fields.

Sorry if this is an obvious one, but I am still learning Lua.

function topandtailreplace(str,top,tail,newtop,newtail)
    local strsearch = top..'(.*)'..tail
    function replace(str)
        str = string.gsub(str,top,newtop)
        str = string.gsub(str,tail,newtail)
        return str
    end
    local newstr = str:gsub(strsearch,replace())
    return newstr
end


This seems to work. The non-greedy capture (.-) stops at the first closing </p>, and %1 reinserts the captured text:

s=[[
<p class='heading'>my useful information</p>
<p class='body'>lots more text</p>
]]

s=s:gsub("<p class='heading'>(.-)</p>","<h2>%1</h2>")
print(s)
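
If you still want a general helper with the same signature as the function in the question, the following is a minimal sketch rather than a definitive fix. It assumes top and tail are meant as literal strings, so it escapes them before building the pattern; escape_pattern is a hypothetical helper name, not part of the Lua standard library. The replacement is passed to gsub as a function value (not called), and because it is a closure it can see newtop and newtail directly.

```lua
-- Hypothetical helper: prefix every punctuation character with '%'
-- so that a literal string can be embedded safely in a Lua pattern.
local function escape_pattern(s)
  return (s:gsub("%p", "%%%0"))
end

-- Replace each top...tail pair with newtop...newtail, keeping the text between.
local function topandtailreplace(str, top, tail, newtop, newtail)
  -- (.-) is the non-greedy capture; (.*) would run on to the last tail in the page.
  local pattern = escape_pattern(top) .. "(.-)" .. escape_pattern(tail)
  -- Pass the function itself; gsub calls it with the captured inner text.
  -- A string returned from a replacement function is used verbatim.
  local result = str:gsub(pattern, function(inner)
    return newtop .. inner .. newtail
  end)
  return result
end

local page = [[
<p class='heading'>my useful information</p>
<p class='body'>lots more text</p>
]]

print(topandtailreplace(page, "<p class='heading'>", "</p>", "<h2>", "</h2>"))
```

Since gsub hands the callback only the capture, there is no need to receive the whole match and strip the top and tail again inside the callback.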


You could use an HTML parsing library with a DOM tree, for example lua-gumbo:

luarocks install gumbo

The following example would do what you want:

local gumbo = require "gumbo"

local input = [[
    <p class='heading'>my useful information</p>
    <p class='body'>lots more text</p>
]]

local document = assert(gumbo.parse(input))
local headings = assert(document:getElementsByClassName("heading"))
local heading1 = assert(headings[1])
local textnode = assert(heading1.childNodes[1])
local new_h2 = assert(document:createElement("h2"))

heading1.parentNode:insertBefore(new_h2, heading1)
new_h2:appendChild(textnode)
heading1:remove()

io.write(document:serialize(), "\n")