Convert plain text list to html
I have a plain text list like this:
I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one
And I would like to turn that into:
<ul>
<li>I am the first top-level list-item
<ul>
<li>I am his son</li>
<li>Me too</li>
</ul>
</li>
<li>Second one here
<ul>
<li>His son</li>
<li>His daughter
<ul>
<li>I am the son of the one above</li>
<li>Me too because of the indentation</li>
</ul>
</li>
<li>Another one</li>
</ul>
</li>
</ul>
How would one go about doing that?
I have never used Ruby, but the usual algorithm stays the same:
1. Create a data structure like this: Node => (Text => string, Children => array of Nodes)
2. Read a line.
3. Check whether its indent is higher than the current indent. If so, append the line to the Children of the current Node and call the method recursively with that node as the active node. Continue from 2.
4. Check whether its indent is equal to the current indent. If so, append the line to the active node. Continue from 2.
5. Check whether its indent is lower than the current indent. If so, return from the method.
6. Repeat until EOF.
For output:
1. print <ul>
2. Take the first node, print <li>node.Text
3. If there are child nodes (count of node.Children > 0) recurse to 1.
4. print </li>
5. take next node, continue from 2.
6. print </ul>
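The steps above can be sketched in Ruby roughly as follows. This is only a minimal sketch, not a definitive implementation: the names Node, indent_of, parse_items and render are made up for illustration, the input is assumed to be consistently indented with blank lines already removed, and the HTML escaping via ERB::Util is an addition that the description above does not mention.

```ruby
require "erb"

# Hypothetical Node: one list item's text plus its nested items.
Node = Struct.new(:text, :children)

# Number of leading whitespace characters on a line.
def indent_of(line)
  line[/\A\s*/].size
end

# Consumes (destructively) every line indented deeper than parent_indent
# and returns those lines as an array of sibling Nodes.
def parse_items(lines, parent_indent = -1)
  nodes = []
  until lines.empty?
    indent = indent_of(lines.first)
    break if indent <= parent_indent # this line belongs to an ancestor
    text = lines.shift.strip
    # Lines indented deeper than this one become its children.
    nodes << Node.new(text, parse_items(lines, indent))
  end
  nodes
end

# Renders sibling Nodes as a <ul>, recursing into children inside each <li>.
def render(nodes, depth = 0)
  return "" if nodes.empty?
  pad = "  " * depth
  html = "#{pad}<ul>\n"
  nodes.each do |node|
    text = ERB::Util.html_escape(node.text) # assumption: items are plain text
    if node.children.empty?
      html << "#{pad}  <li>#{text}</li>\n"
    else
      html << "#{pad}  <li>#{text}\n"
      html << render(node.children, depth + 2)
      html << "#{pad}  </li>\n"
    end
  end
  html << "#{pad}</ul>\n"
end

sample = <<~TEXT
  First item
    His son
    Me too
  Second item
TEXT

puts render(parse_items(sample.lines.reject { |l| l.strip.empty? }))
```

Because each nested <ul> is written before the closing </li> of its parent, the output has the shape asked for in the question, with sublists inside list items rather than next to them.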
This code does work as expected, but the titles are printed on a new line.
require "rubygems"
require "builder"

def get_indent(line)
  line.to_s =~ /(\s*)(.*)/
  $1.size
end

def create_list(lines, list_indent = -1,
                b = Builder::XmlMarkup.new(:indent => 2, :target => $stdout))
  while not lines.empty?
    line_indent = get_indent lines.first
    if line_indent == list_indent
      b.li {
        b.text! lines.shift.strip + $/
        if get_indent(lines.first) > line_indent
          create_list(lines, line_indent, b)
        end
      }
    elsif line_indent < list_indent
      break
    else
      b.ul {
        create_list(lines, line_indent, b)
      }
    end
  end
end
Transform the input into Haml, then render that as HTML:
require 'haml'

def text_to_html(input)
  indent = -1
  haml = input.gsub(/^( *)/) do |match|
    line_indent = $1.length
    repl = line_indent > indent ? "#{$1}%ul\n" : ''
    indent = line_indent
    repl << "  #{$1}%li "
  end
  Haml::Engine.new(haml).render
end

puts text_to_html(<<END)
I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one
END
results in
<ul>
<li>I am the first top-level list item</li>
<ul>
<li>I am his son</li>
<li>Me too</li>
</ul>
<li>Second one here</li>
<ul>
<li>His son</li>
<li>His daughter</li>
<ul>
<li>I am the son of the one above</li>
<li>Me too because of the indentation</li>
</ul>
<li>Another one</li>
</ul>
</ul>
Old topic, but...
It looks like I found a way to make Glenn Jackman's code produce valid HTML (avoiding a <ul> with a direct child <ul>).
I'm using strings with tab indentation.
require 'haml'

class String
  def text2htmllist
    tabs = -1
    topUL=true
    addme=''
    haml = self.gsub(/^([\t]*)/) do |match|
      line_tabs = match.length
      if ( line_tabs > tabs )
        if topUL
          repl = "#{match}#{addme}%ul\n"
          topUL=false
        else
          repl = "#{match}#{addme}%li\n"
          addme += "\t"
          repl += "#{match}#{addme}%ul\n"
        end
      else
        repl = ''
        addme = addme.gsub(/^[\t]/,'') if ( line_tabs < tabs ) #remove one \t
      end
      tabs = line_tabs
      repl << "\t#{match}#{addme}%li "
    end
    puts haml
    Haml::Engine.new(haml).render
  end
end #String class
str = <<FIM
I am the first top-level list item
	I am his son
	Me too
Second one here
	His son
	His daughter
		I am the son of the one above
		Me too because of the indentation
	Another one
FIM
puts str.text2htmllist
Produces:
%ul
%li I am the first top-level list item
%li
%ul
%li I am his son
%li Me too
%li Second one here
%li
%ul
%li His son
%li His daughter
%li
%ul
%li I am the son of the one above
%li Me too because of the indentation
%li Another one
<ul>
<li>I am the first top-level list item</li>
<li>
<ul>
<li>I am his son</li>
<li>Me too</li>
</ul>
</li>
<li>Second one here</li>
<li>
<ul>
<li>His son</li>
<li>His daughter</li>
<li>
<ul>
<li>I am the son of the one above</li>
<li>Me too because of the indentation</li>
</ul>
</li>
<li>Another one</li>
</ul>
</li>
</ul>
You could probably do this with some simple find-and-replace operations. Editors such as TextWrangler on the Mac, Notepad++ on Windows, and possibly gedit on Linux (I'm not sure how well its search handles complicated patterns) can search for newlines and replace them with other text. Start with the top-level items (the lines with no leading spaces) and work your way inward. You will likely have to experiment a bit to get the patterns right. If this is something you want to do on a regular basis, you could write a small script, but I doubt that is the case.