Algorithm for Rendering Long Text in a Text Editor
I've been thinking about writing a text editor control that can edit text that can have any a开发者_如何学运维rbitrary length (say, hundreds of megabytes), similar in some ways to the Scintilla editor. The goal is to lazy-read the file, so the user doesn't have to read five hundred megabytes of data just to view a small portion of it. I'm having two problems with this:
It seems to me to be impossible to implement any sensible scrolling feature for such an editor, unless I pre-read the entire file once, in order to figure out line breaks. Is this really true? Or is there a way to approximate things, that I'm not thinking of?
Because of various issues with Unicode (e.g. it allows many bytes to represent just one character, not just because of variable-length encoding but also because of accents and such), it seems nearly impossible to determine exactly how much text will fit on the screen -- I'd have to use TextOut() or something to draw one character, measure how big it was, and then draw the next character. And even then, that still doesn't say how I'd map the user's clicks back to the correct text position.
Is there anything I could read on the web regarding algorithms for handling these issues? I've searched, but I haven't found anything.
Thank you!
You can set a "coarse" position based on data size instead of lines. The "fine" position of your text window can be based on a local scan around an arbitrary entry point.
This means you will need to write functions that can scan locally (backwards and forwards) to find line starts, count Unicode characters, and so forth. This should not be too difficult; UTF8 is designed to be easy to parse in this way.
You may want to give special consideration of what to do about extremely long lines. Since there is no upper limit on how long a line can be, this makes finding the beginning (or end) of a line an unbounded task; I believe everything else you need for a screen editor display should be local.
Finally, if you want a general text editor, you need to figure out what you're going to do when you want to save a file in which you've inserted/deleted things. The straightforward thing is to rewrite the file; however, this is obviously going to take longer with a huge file. You can expect the user to run into problems if there is not enough room for a modified copy, so at the very least, you will want to check to make sure there is enough room on the filesystem.
@comingstorm is basically right. For display, you start at the cursor and scan backwards until you're sure you're past the top of the screen. Then you scan backwards to a line end, assuming you can identify a line end scanning backwards. Now you scan forwards, calculating and saving screen line start positions until you've gone far enough. Finally, you pick the line you want to start displaying on and off you go.
For simple text this can be done on an archaic processor fast enough to redraw a memory mapped video display every keystroke. [I invented this technology 30 years ago]. The right way to do this is to fix the cursor in the middle line of the screen.
For actually modifying files, you might look at using Gnu's ropes. A rope is basically a linked list of buffers. The idea is that all local edits can be done in just one small buffer, occasionally adding a new buffer, and occasionally merging adjacent buffers.
I would consider combining this technology with differential storage: the kind of thing all modern source control systems do. You basically have to use this kind of transaction based editing if you want to implement the undo function.
The key to this is invertible transactions, i.e. one which contains enough information to be applied backwards to undo what it did when applied forwards. The core editor transaction is:
at pos p replace old with new
which has inverse
at pos p replace new with old
This handles insert (old is empty) and delete (new is empty) as well as replace. Given a transaction list, you can undo inplace modifications to a string by applying the reverse of the list of inverse transactions.
Now you use the old checkpointing concept: you store a fairly recent in-place modified image of the file together with some recent transactions that haven't been applied yet. To display, you apply the transactions on the fly. To undo, you just throw away some transactions. Occasionally, you actually apply the transactions, making a "checkpoint" image. This speeds up the display, at the cost of making the undo slower.
Finally: to rewrite a huge sequential text file, you would normally rewrite the whole text, which is horrible. If you can cheat a bit, and allow arbitrary 0 characters in the text and you have access to virtual memory system page manager and low level disk access, you can do much better by keeping all the unchanged pages of text and just reorganising them: in other words, the ropes idea on disk.
精彩评论