
Convert plain text list to HTML

I have a plain text list like this:

I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one

And I would like to turn that into:

<ul>
  <li>I am the first top-level list item
    <ul>
      <li>I am his son</li>
      <li>Me too</li>
    </ul>
  </li>
  <li>Second one here
    <ul>
      <li>His son</li>
      <li>His daughter
        <ul>
          <li>I am the son of the one above</li>
          <li>Me too because of the indentation</li>
        </ul>
      </li>
      <li>Another one</li>
    </ul>
  </li>
</ul>

How would one go about doing that?


I have never used Ruby, but the usual algorithm stays the same:

  1. Create a data structure like this:
    Node => (Text => string, Children => array of Nodes)
  2. Read a line.
  3. Check whether its indent is greater than the current indent.
  4. If yes, append the line to the Children of the current node and call the method recursively with that node as the active node. Continue from 2.
  5. Check whether its indent is equal to the current indent.
  6. If yes, append the line to the Children of the active node. Continue from 2.
  7. Check whether its indent is lower than the current indent.
  8. If yes, return from the method.
  9. Repeat until EOF.

For output:

1. Print <ul>.
2. Take the first node and print <li> followed by node.Text.
3. If there are child nodes (count of node.Children > 0), recurse to 1 with those children.
4. Print </li>.
5. Take the next node and continue from 2.
6. Print </ul>.
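
The steps above can be sketched in Ruby roughly as follows. This is only an illustration, not the answerer's own code: the names Node, indent_of, parse_nodes, render and text_list_to_html are made up for this example, and it assumes the list is indented with spaces and that blank lines can be ignored.

```ruby
require "cgi"

# One list item: its text plus any nested items (hypothetical structure).
Node = Struct.new(:text, :children)

# Number of leading spaces on a line.
def indent_of(line)
  line[/\A */].size
end

# Consumes lines from the front of the array and returns the Nodes found
# at the given indent; deeper lines become children of the preceding Node.
def parse_nodes(lines, indent = indent_of(lines.first.to_s))
  nodes = []
  until lines.empty?
    current = indent_of(lines.first)
    break if current < indent               # dedent: hand control back to the caller
    if current > indent && !nodes.empty?
      nodes.last.children.concat(parse_nodes(lines, current))
    else
      nodes << Node.new(lines.shift.strip, [])
    end
  end
  nodes
end

# Renders the Nodes as a <ul>, recursing into children inside their <li>.
def render(nodes, depth = 0)
  pad = "  " * depth
  html = ["#{pad}<ul>"]
  nodes.each do |node|
    text = CGI.escapeHTML(node.text)
    if node.children.empty?
      html << "#{pad}  <li>#{text}</li>"
    else
      html << "#{pad}  <li>#{text}"
      html << render(node.children, depth + 2)
      html << "#{pad}  </li>"
    end
  end
  html << "#{pad}</ul>"
  html.join("\n")
end

# Convenience wrapper: split the text into non-blank lines, parse, render.
def text_list_to_html(text)
  lines = text.lines.map(&:chomp).reject { |l| l.strip.empty? }
  render(parse_nodes(lines))
end
```

Calling puts text_list_to_html(File.read("list.txt")) (the file name is just a placeholder) prints the nested markup, with each child <ul> placed inside its parent <li>.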


This code works as expected, except that the titles are printed on a new line.

require "rubygems"
require "builder"

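# Width of the leading whitespace; nil (no more lines) counts as 0.
def get_indent(line)
  line.to_s =~ /(\s*)(.*)/
  $1.size
end

# Consumes lines (an array of strings) and writes nested <ul>/<li> markup to b.
def create_list(lines, list_indent = -1, 
       b = Builder::XmlMarkup.new(:indent => 2, :target => $stdout))
  until lines.empty?
    line_indent = get_indent lines.first

    if line_indent == list_indent
      # Same level: emit the item, then any deeper lines as a nested list.
      b.li {
        b.text! lines.shift.strip + $/
        child_indent = get_indent(lines.first)
        if child_indent > line_indent
          # Open the nested <ul> here so that later siblings of this item
          # are handled by the enclosing loop, not inside this <li>.
          b.ul {
            create_list(lines, child_indent, b)
          }
        end
      }
    elsif line_indent < list_indent
      # Dedent: this list is finished, let the caller continue.
      break
    else
      # Deeper than the current list (e.g. the very first line): open a <ul>.
      b.ul {
        create_list(lines, line_indent, b)
      }
    end
  end
end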


Transform the input into Haml, then render that as HTML:

require 'haml'

def text_to_html(input)
  indent = -1
  haml = input.gsub(/^( *)/) do |match|
    line_indent = $1.length
    repl = line_indent > indent ? "#{$1}%ul\n" : ''
    indent = line_indent
    repl << "  #{$1}%li "
  end
  Haml::Engine.new(haml).render
end

puts text_to_html(<<END)
I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one
END

results in

<ul>
  <li>I am the first top-level list item</li>
  <ul>
    <li>I am his son</li>
    <li>Me too</li>
  </ul>
  <li>Second one here</li>
  <ul>
    <li>His son</li>
    <li>His daughter</li>
    <ul>
      <li>I am the son of the one above</li>
      <li>Me too because of the indentation</li>
    </ul>
    <li>Another one</li>
  </ul>
</ul>


Old topic, but... it looks like I found a way to make Glenn Jackman's code produce valid HTML (avoiding a <ul> as a direct child of a <ul>).
I'm using strings with tab indentation.

    require 'haml'
    class String
       def text2htmllist
         tabs = -1
         topUL=true
         addme=''

         haml = self.gsub(/^([\t]*)/) do |match|
           line_tabs = match.length

           if ( line_tabs > tabs )
                if topUL
                    repl = "#{match}#{addme}%ul\n"
                    topUL=false
                else
                    repl = "#{match}#{addme}%li\n"
                    addme += "\t"
                    repl += "#{match}#{addme}%ul\n"
                end
           else
              repl = ''
              addme = addme.gsub(/^[\t]/,'') if ( line_tabs < tabs ) #remove one \t 
           end
           tabs = line_tabs
           repl << "\t#{match}#{addme}%li "

         end
         puts haml
         Haml::Engine.new(haml).render
       end
    end #String class

    str = <<FIM
    I am the first top-level list item
        I am his son
        Me too
    Second one here
        His son
        His daughter
            I am the son of the one above
            Me too because of the indentation
        Another one
    FIM

    puts str.text2htmllist

Produces (first the intermediate Haml printed by puts haml, then the rendered HTML):

%ul
    %li I am the first top-level list item
    %li
        %ul
            %li I am his son
            %li Me too
    %li Second one here
    %li
        %ul
            %li His son
            %li His daughter
            %li
                %ul
                    %li I am the son of the one above
                    %li Me too because of the indentation
            %li Another one
<ul>
  <li>I am the first top-level list item</li>
  <li>
    <ul>
      <li>I am his son</li>
      <li>Me too</li>
    </ul>
  </li>
  <li>Second one here</li>
  <li>
    <ul>
      <li>His son</li>
      <li>His daughter</li>
      <li>
        <ul>
          <li>I am the son of the one above</li>
          <li>Me too because of the indentation</li>
        </ul>
      </li>
      <li>Another one</li>
    </ul>
  </li>
</ul>


You could probably do this with some simple find-and-replace. Editors such as TextWrangler on the Mac, Notepad++ on Windows, and possibly gedit on Linux (I am not sure how well its search copes with complicated patterns) can search for newlines and replace them with other text. Start with the top-level items (the lines with no leading spaces) and work your way inward. You will likely have to experiment a bit to get the replacements right. If this is something you want to do on a regular basis you could write a small script, but I doubt that is the case.
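
If it does turn out to be a recurring task, the small script could look something like the sketch below. It is a single pass over the lines with a stack of open indent levels, rather than editor find-and-replace. The function name indented_list_to_html is made up for this example, and it assumes space indentation and consistent indent steps.

```ruby
require "cgi"

# Converts an indented plain-text list to nested <ul> markup in one pass.
def indented_list_to_html(text)
  out = []
  stack = []   # indent widths of the lists that are currently open

  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?               # skip blank lines
    indent = line[/\A */].size

    if stack.empty? || indent > stack.last
      # Deeper than the open list: start a nested list inside the open <li>.
      stack.push(indent)
      out << "<ul>"
    else
      # Same level or shallower: close the previous item, then close every
      # list that is deeper than this line (never the outermost one).
      out << "</li>"
      while stack.size > 1 && indent < stack.last
        stack.pop
        out << "</ul>" << "</li>"
      end
    end
    out << "<li>#{CGI.escapeHTML(line.strip)}"
  end

  # Close whatever is still open at end of input.
  until stack.empty?
    stack.pop
    out << "</li>" << "</ul>"
  end
  out.join("\n")
end
```

The output is not pretty-printed (one tag per line, no indentation), but every nested <ul> ends up inside its parent <li>.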
