
Finding a word's frame (position and size) on the screen using Cocoa or Carbon

Here's a tough one:

I need to be able to find a word's position and size (its frame) on the screen. Finding its first occurrence is enough; from there I should be able to get the next ones.

For example, I would like to be able to detect word positions in (but not limited to) Word, Excel and PowerPoint for Mac, as well as Safari and others.

The solution should be as fast as possible; I should be able to find at least 5-6 words per second and use as little CPU time as possible.

Here's what I thought of so far:

  • OCR in a window's screenshot / graphics context (any good Open Source framework that works on Mac OS X 10.4 and that can be used in a commercial product?). Evernote is very good at spotting words in images. I don't know if it uses a custom in-house engine or an Open Source / commercial one but that would be the kind of engine I would like to use if this is a "valid" solution. Ideally I would detect the word's frame in the active application's window (how to get the frame of another application?).
  • Getting some kind of "hook" on Quartz drawing of text and intercepting the location of the word when it's drawn (does not seem very feasible at first glance!).
  • AppleScript, but it depends a lot on what API the application offers (I don't think you can get a word's coordinates in a Word document from what I've seen) and it's slow.
  • ... out of ideas ...

My goal is to get the frames of all the words in a paragraph, in the right order, based on a string containing the text of the paragraph.

Thanks in advance for any hints!


As a starting place, you may want to take a look at QuickCursor's code. It retrieves text from many different applications through the Accessibility (AX) APIs. It won't give you the pixel placement of the word, but it will at least return the NSString associated with the text in that UI element. Of course, this means that the app in question has to support these APIs; I don't know whether the MS Office suite does. In addition, it only supports editable elements, so a non-editable web page in Safari won't work either. But it may give you a starting point for some ideas.

Take a look at QCUIElement.{m,h}, and then at how it is used in QCAppDelegate.m (beginQuickCursorEdit:). Using his QCUIElement abstraction seems to be as simple as:

QCUIElement *focusedElement = [QCUIElement focusedElement];
id value = focusedElement.value;

Edit: Aha! Check out the Accessibility Inspector sample code: UIElementInspector. It can actually get the AXPosition of elements on a page. It's not word-by-word, but we're getting closer: it'll tell you the x,y placement of a text block, as well as the words contained in that text block.
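To go from a text block's string to individual words, you would first need each word's character range within that string. The Accessibility API does define a parameterized attribute for text elements, kAXBoundsForRangeParameterizedAttribute, that maps a range to screen bounds, but whether a given application implements it is something you'd have to verify. Below is a rough sketch, in plain C, of only the range-finding step. The names WordRange and find_word_ranges are made up for illustration, and the "word" rule (runs of ASCII letters, digits and apostrophes) is a simplifying assumption:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper type: one word's character range within the text. */
typedef struct {
    size_t location; /* offset of the word's first character */
    size_t length;   /* number of characters in the word */
} WordRange;

/* Assumption: a word is a run of ASCII letters, digits, or apostrophes.
   Real-world text would need locale-aware tokenization. */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '\'';
}

/* Scans `text` and stores up to `max` word ranges in `out`, in reading order.
   Returns the number of ranges stored. */
size_t find_word_ranges(const char *text, WordRange *out, size_t max)
{
    size_t count = 0;
    size_t i = 0;

    while (text[i] != '\0' && count < max) {
        /* Skip separators (spaces, punctuation, ...). */
        while (text[i] != '\0' && !is_word_char(text[i])) {
            i++;
        }
        if (text[i] == '\0') {
            break;
        }

        /* Consume one word and record where it started and how long it is. */
        size_t start = i;
        while (text[i] != '\0' && is_word_char(text[i])) {
            i++;
        }
        out[count].location = start;
        out[count].length = i - start;
        count++;
    }
    return count;
}
```

Each resulting (location, length) pair could then be wrapped in a CFRange and passed to the range-to-bounds query, if the target element supports it. Note that these are byte offsets, which only coincide with the string offsets the Accessibility API expects when the text is plain ASCII.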


This is possible, but very hard to get working reliably. You can play with Spell Catcher's Direct Connect feature to see an example.
