In Ruby: search a PDF, highlight the text found, and export a JPG of the page
I wanted to see if anyone has done this.
In Ruby, I'd like to open a PDF and search it for text. Any text that I find I would like to highlight in yellow, then return the page(s) where I found the text as a JPG. Has anyone done this before?
Thanks, Craig
Is JRuby, or calling a jar from the command line, an option? If so, you can use the Java iText library and something along the lines of these answers:
iText Search, Highlight, image of result
Is it possible to find text position with iText
If you're happy to use a C extension, you can achieve this with the ruby-gnome2 bindings. You'll need the poppler and gdk_pixbuf2 gems.
The API docs for these gems are a little skimpy, but you can find what there is at http://ruby-gnome2.sourceforge.jp/
require 'poppler'
require 'gdk_pixbuf2'
SCALE = 2
filename = "source.pdf"
doc = Poppler::Document.new(filename)
page = doc.get_page(0)
# render the page to an in-memory buffer
width, height = *page.size
buf = Gdk::Pixbuf.new(Gdk::Pixbuf::COLORSPACE_RGB, true, 8, width*SCALE, height*SCALE)
page.render(0, 0, width*SCALE, height*SCALE, SCALE, 0, buf)
# copy the rendered buffer into a pixmap for further editing
map = Gdk::Pixmap.new(nil, width*SCALE, height*SCALE, 24)
map.draw_pixbuf(nil, buf, 0, 0, 0, 0, -1, -1, Gdk::RGB::DITHER_NONE, 0, 0)
# set up the highlight color and blend function
gc = Gdk::GC.new(map) # graphics context
gc.rgb_fg_color = Gdk::Color.new(65535, 65535, 0)
gc.function = Gdk::GC::AND
# find each match and highlight it. The co-ordinate maths is ugly but
# necessary to convert from PDF co-ords (origin at the bottom left)
# to Pixmap co-ords (origin at the top left)
page.find_text("the").each do |match|
  matchx = match.x1 * SCALE
  matchy = (height - match.y2) * SCALE
  matchw = (match.x2 - match.x1) * SCALE
  matchh = (match.y2 - match.y1) * SCALE
  map.draw_rectangle(gc, true, matchx, matchy, matchw, matchh)
end
# save the buffer to a JPG
newbuf = Gdk::Pixbuf.from_drawable(nil, map, 0, 0, width*SCALE, height*SCALE)
newbuf.save("foo.jpg", "jpeg")
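The co-ordinate conversion in the loop above is the part most likely to go wrong, so here it is pulled out as a plain Ruby sketch you can check without the gems installed. The helper name pdf_rect_to_pixels is made up for illustration and is not part of poppler or gdk_pixbuf2. It assumes the match rectangle is in PDF space (origin at the bottom left, y increasing upwards) and the image is in pixel space (origin at the top left, y increasing downwards), as the snippet above does.

```ruby
# Hypothetical helper (not part of any gem): converts a rectangle given in
# PDF co-ordinates into a pixel rectangle for an image rendered at `scale`.
# Assumes PDF space has its origin at the bottom left and the image has its
# origin at the top left.
def pdf_rect_to_pixels(x1, y1, x2, y2, page_height, scale)
  # Normalise the corners so the result does not depend on corner order.
  left, right = [x1, x2].minmax
  bottom, top = [y1, y2].minmax

  {
    x:      (left * scale).round,
    # Flip the y axis: the top edge in PDF space becomes the y offset
    # measured down from the top of the image.
    y:      ((page_height - top) * scale).round,
    width:  ((right - left) * scale).round,
    height: ((top - bottom) * scale).round
  }
end

# Example: a match on a 792pt-high (US Letter) page, rendered at 2x.
rect = pdf_rect_to_pixels(72.0, 700.0, 90.0, 712.0, 792.0, 2)
puts rect.inspect
```

Rounding to integers here is a design choice for drawing calls that expect whole pixels; if your matches come back with y measured from the top already, drop the flip.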