
In Ruby: search a PDF, highlight the text found, and export a JPG of the page

I wanted to see if anyone has done this.

In Ruby, I'd like to open a PDF and search it for text. I would like to highlight any text I find in yellow, then return the page(s) where I found the text as a JPG. Has anyone done this before?

Thanks, Craig


Is JRuby, or calling a jar from the command line, an option? In that case you can use the Java iText library and something along the lines of these answers:

iText Search, Highlight, image of result

Is it possible to find text position with iText
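
If you go the command-line route, shelling out from plain Ruby could look roughly like the sketch below. The jar itself, its name and its argument order (input PDF, search term, output directory) are all hypothetical; you would have to write that wrapper around iText yourself. Only `Open3.capture3` is standard Ruby here.

```ruby
require 'open3'

# Build the argument list for a HYPOTHETICAL highlighter jar.
# The argument order (pdf, term, out_dir) is an assumption for illustration,
# not something iText provides out of the box.
def highlighter_command(jar, pdf, term, out_dir)
  ['java', '-jar', jar, pdf, term, out_dir]
end

# Run the jar and return its standard output, raising if it exits non-zero.
# Passing the arguments as separate strings avoids shell quoting problems
# when the search term contains spaces or quotes.
def run_highlighter(jar, pdf, term, out_dir)
  stdout, stderr, status = Open3.capture3(*highlighter_command(jar, pdf, term, out_dir))
  raise "highlighter failed: #{stderr}" unless status.success?
  stdout
end
```

With a jar like that in place, a call such as `run_highlighter("highlighter.jar", "source.pdf", "the", "out")` would be the whole Ruby side.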


If you're happy to use a C extension, you can achieve this with the ruby-gnome2 bindings. You'll need the poppler and gdk_pixbuf2 gems.

The API docs for these gems are a little skimpy, but you can find what there is at http://ruby-gnome2.sourceforge.jp/

require 'poppler'
require 'gdk_pixbuf2'

SCALE = 2

filename = "source.pdf"
doc = Poppler::Document.new(filename)
page = doc.get_page(0)

# render the page to an in-memory buffer
width, height = *page.size
buf = Gdk::Pixbuf.new(Gdk::Pixbuf::COLORSPACE_RGB, true, 8, width*SCALE, height*SCALE)
page.render(0, 0, width*SCALE, height*SCALE, SCALE, 0, buf)

# copy the rendered buffer into a pixmap for further editing
map = Gdk::Pixmap.new(nil, width*SCALE, height*SCALE, 24)
map.draw_pixbuf(nil, buf, 0, 0, 0, 0, -1, -1, Gdk::RGB::DITHER_NONE, 0, 0)

# setup highlight color and blend function
gc  = Gdk::GC.new(map) # graphics context
gc.rgb_fg_color = Gdk::Color.new(65535, 65535, 0)
gc.function = Gdk::GC::AND

# find each match and highlight it. The co-ordinate maths is ugly but
# necessary to convert from PDF co-ords to Pixmap co-ords
page.find_text("the").each do |match|
  matchx = match.x1 * SCALE
  matchy = (height - match.y2) * SCALE
  matchw = (match.x2-match.x1) * SCALE
  matchh = (match.y2-match.y1) * SCALE
  map.draw_rectangle(gc, true, matchx, matchy, matchw, matchh)
end

# save the buffer to a JPG
newbuf = Gdk::Pixbuf.from_drawable(nil, map, 0, 0, width*SCALE, height*SCALE)
newbuf.save("foo.jpg", "jpeg")
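
The script above only looks at page 0, and the co-ordinate conversion is buried in the loop. As a rough sketch, the two pieces can be pulled out into plain Ruby helpers that don't need the gems at all. `PdfRect`, `pdf_rect_to_pixels` and `pages_with_matches` are names made up for this illustration. The arithmetic is the same as in the loop above and assumes the match rectangle has a bottom-left origin with y1 <= y2.

```ruby
# Stand-in for a match rectangle: PDF points, origin assumed at the
# bottom-left of the page, with y1 <= y2 (hypothetical helper type).
PdfRect = Struct.new(:x1, :y1, :x2, :y2)

# Convert a PDF-space rectangle to [x, y, width, height] in pixels with a
# top-left origin, using the same arithmetic as the loop in the script.
# Values are rounded to whole pixels.
def pdf_rect_to_pixels(rect, page_height, scale)
  x = rect.x1 * scale
  y = (page_height - rect.y2) * scale   # flip the vertical axis
  w = (rect.x2 - rect.x1) * scale
  h = (rect.y2 - rect.y1) * scale
  [x, y, w, h].map(&:round)
end

# Return the indices of the pages that contain the term. `pages` is any
# enumerable of objects responding to find_text(term), as the page object
# does in the script; a nil result is treated as "no matches".
def pages_with_matches(pages, term)
  pages.each_with_index
       .reject { |page, _index| Array(page.find_text(term)).empty? }
       .map { |_page, index| index }
end
```

For a US Letter page (792 points high) at a scale of 2, `pdf_rect_to_pixels(PdfRect.new(10, 700, 60, 712), 792, 2)` gives `[20, 160, 100, 24]`. To cover the whole document you would collect the pages with `doc.get_page(i)` for each page index and render only those that `pages_with_matches` returns.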