Algorithm for Rendering Long Text in a Text Editor

2023-01-30 19:29 问答作者：

I've been thinking about writing a text editor control that can edit text that can have any a开发者_如何学运维rbitrary length (say, hundreds of megabytes), similar in some ways to the Scintilla editor. The goal is to lazy-read the file, so the user doesn't have to read five hundred megabytes of data just to view a small portion of it. I'm having two problems with this:

It seems to me to be impossible to implement any sensible scrolling feature for such an editor, unless I pre-read the entire file once, in order to figure out line breaks. Is this really true? Or is there a way to approximate things, that I'm not thinking of?
Because of various issues with Unicode (e.g. it allows many bytes to represent just one character, not just because of variable-length encoding but also because of accents and such), it seems nearly impossible to determine exactly how much text will fit on the screen -- I'd have to use TextOut() or something to draw one character, measure how big it was, and then draw the next character. And even then, that still doesn't say how I'd map the user's clicks back to the correct text position.

Is there anything I could read on the web regarding algorithms for handling these issues? I've searched, but I haven't found anything.

Thank you!

You can set a "coarse" position based on data size instead of lines. The "fine" position of your text window can be based on a local scan around an arbitrary entry point.

This means you will need to write functions that can scan locally (backwards and forwards) to find line starts, count Unicode characters, and so forth. This should not be too difficult; UTF8 is designed to be easy to parse in this way.

You may want to give special consideration of what to do about extremely long lines. Since there is no upper limit on how long a line can be, this makes finding the beginning (or end) of a line an unbounded task; I believe everything else you need for a screen editor display should be local.

Finally, if you want a general text editor, you need to figure out what you're going to do when you want to save a file in which you've inserted/deleted things. The straightforward thing is to rewrite the file; however, this is obviously going to take longer with a huge file. You can expect the user to run into problems if there is not enough room for a modified copy, so at the very least, you will want to check to make sure there is enough room on the filesystem.

@comingstorm is basically right. For display, you start at the cursor and scan backwards until you're sure you're past the top of the screen. Then you scan backwards to a line end, assuming you can identify a line end scanning backwards. Now you scan forwards, calculating and saving screen line start positions until you've gone far enough. Finally, you pick the line you want to start displaying on and off you go.

For simple text this can be done on an archaic processor fast enough to redraw a memory mapped video display every keystroke. [I invented this technology 30 years ago]. The right way to do this is to fix the cursor in the middle line of the screen.

For actually modifying files, you might look at using Gnu's ropes. A rope is basically a linked list of buffers. The idea is that all local edits can be done in just one small buffer, occasionally adding a new buffer, and occasionally merging adjacent buffers.

I would consider combining this technology with differential storage: the kind of thing all modern source control systems do. You basically have to use this kind of transaction based editing if you want to implement the undo function.

The key to this is invertible transactions, i.e. one which contains enough information to be applied backwards to undo what it did when applied forwards. The core editor transaction is:

at pos p replace old with new

which has inverse

at pos p replace new with old

This handles insert (old is empty) and delete (new is empty) as well as replace. Given a transaction list, you can undo inplace modifications to a string by applying the reverse of the list of inverse transactions.

Now you use the old checkpointing concept: you store a fairly recent in-place modified image of the file together with some recent transactions that haven't been applied yet. To display, you apply the transactions on the fly. To undo, you just throw away some transactions. Occasionally, you actually apply the transactions, making a "checkpoint" image. This speeds up the display, at the cost of making the undo slower.

Finally: to rewrite a huge sequential text file, you would normally rewrite the whole text, which is horrible. If you can cheat a bit, and allow arbitrary 0 characters in the text and you have access to virtual memory system page manager and low level disk access, you can do much better by keeping all the unchanged pages of text and just reorganising them: in other words, the ropes idea on disk.

继续阅读：algorithm text text-editor

Algorithm for Rendering Long Text in a Text Editor

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？