I'm looking for any optimizations I can make in a graphics editing program
Heyo, this is my first time asking a question here so do forgive me if I mess somethin up >~<
I'm working on a program similar to openCanvas, the earlier ones that allowed multiple people to draw on the same canvas in real time over the internet. OC's really buggy and has a lot of limitations, which is why I wanted to write this.
I have it set up so the canvas extends "indefinitely" in all directions and is made up of 512x512 blocks of pixels that only become active once they're drawn on, which should be really easy to implement. I was thinking about using Direct3D for hardware acceleration, hence the 512-square blocks.
My problem comes when I want to use layers. I'm not quite sure how I can compose layers quickly and without using a ton of memory, since my target is DirectX9-compatible video cards with 128 MB of memory, a CPU around 3.2 GHz, and between 2 and 8 GB of RAM. I had a few different approaches I was thinking of using and was wondering which would probably be the best, and if there was anything I could look into to make it run better.
My first idea was to make the gfx hardware do as much work as possible by having all the layers on all the blocks serve as textures, and they'd be updated by locking the changed area, updating it on the CPU, and unlocking it. Blocks that aren't currently being changed are flattened into one texture and the individual layers themselves are kept in system memory, which would reduce the gfx memory used but could significantly increase bandwidth usage between system and gfx memory. I can see the constant locking and unlocking potentially slowing things down pretty badly as well. Another possible issue is that I've heard of people using up to 200 layers, and I can't think of any good ways to optimize that given the above.
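Just to make it concrete, this is roughly the lock/update/unlock cycle I have in mind - a minimal D3D9 sketch assuming 32-bit ARGB block textures in the managed pool (the function and parameter names are just mine for illustration):

    #include <d3d9.h>
    #include <cstring>

    // Copy a dirty sub-rectangle of CPU-side pixels into a 32-bit ARGB
    // block texture (error handling kept minimal for brevity).
    void UpdateBlockRegion(IDirect3DTexture9* tex, const RECT& dirty,
                           const DWORD* srcPixels, int srcPitchInPixels)
    {
        D3DLOCKED_RECT lr;
        if (FAILED(tex->LockRect(0, &lr, &dirty, 0)))
            return;

        const int w = dirty.right - dirty.left;
        const int h = dirty.bottom - dirty.top;
        BYTE* dst = static_cast<BYTE*>(lr.pBits);

        for (int y = 0; y < h; ++y)             // copy only the changed rows
            std::memcpy(dst + y * lr.Pitch,
                        srcPixels + (size_t)y * srcPitchInPixels,
                        (size_t)w * sizeof(DWORD));

        tex->UnlockRect(0);
    }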
My other idea was to compose the layers -completely- in system memory, write the result into a texture, and copy that texture to gfx memory to be rendered for each block. This seems to eliminate a lot of the issues with the other method, but at the same time I'm moving all the work onto the CPU instead of balancing it. That isn't a big deal as long as it still runs quickly, though. Again, however, there's the issue of having a couple hundred layers. In that case, though, I could probably update only the final pixels that are actually changing, which is what I think the bigger-name programs like SAI and Photoshop do.
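Something like this is what I mean by only updating changed pixels - a rough sketch assuming straight-alpha layers composited over an opaque background; Layer and Color32 are made-up types:

    #include <cstdint>
    #include <vector>

    struct Color32 { uint8_t b, g, r, a; };   // made-up 32-bit BGRA pixel

    struct Layer {
        std::vector<Color32> pixels;           // 512*512, straight alpha
    };

    // Composite `layers` bottom-to-top over an opaque white "paper"
    // background, touching only the pixels inside [x0,x1) x [y0,y1).
    void CompositeDirty(const std::vector<Layer>& layers, Color32* out,
                        int stride, int x0, int y0, int x1, int y1)
    {
        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                Color32 acc = {255, 255, 255, 255};    // opaque background
                for (const Layer& l : layers) {
                    const Color32 s = l.pixels[y * 512 + x];
                    const int a = s.a, ia = 255 - a;   // "over" operator
                    acc.r = (uint8_t)((s.r * a + acc.r * ia) / 255);
                    acc.g = (uint8_t)((s.g * a + acc.g * ia) / 255);
                    acc.b = (uint8_t)((s.b * a + acc.b * ia) / 255);
                }
                out[y * stride + x] = acc;             // stays opaque
            }
        }
    }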
I'm mostly looking for recommendations, suggestions that might improve the above, better methods, or links to articles that might be related to such a project. While I'm writing it in C++, I have no trouble translating from other languages. Thanks for your time~
Data Structure
You should definitely use a quadtree (or another hierarchical data structure) to store your canvas and its nodes should contain much smaller blocks than 512x512 pixels. Maybe not as small as 1x1 pixels, because then the hierarchical overhead would kill you - you'll find a good balance through testing.
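A minimal sketch of what such a node could look like (the 32x32 tile size is just an assumption - tune it as described above):

    #include <cstdint>
    #include <memory>

    constexpr int kTileSize = 32;   // pixels per leaf tile edge (assumption)

    struct QuadNode {
        // Children in the order NW, NE, SW, SE; null until that quadrant
        // has actually been drawn on.
        std::unique_ptr<QuadNode> child[4];
        // Pixel data lives only at leaf level (an inner node could instead
        // hold a downsampled tile for LOD - see "Rendering" below).
        std::unique_ptr<uint32_t[]> tile;   // kTileSize * kTileSize, ARGB
    };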
Drawing
Let your users draw only at one (the highest) resolution. Imagine an infinitely large uniform grid (a two-dimensional array). Since you know the mouse position and how far the user has scrolled from the origin, you can derive absolute coordinates. Traverse the quadtree into that region (creating new nodes as needed) and insert the blocks (for example 32x32) into the quadtree as the user draws them. I would buffer what the user draws in a 2D array (for example, as big as their screen resolution) and use a separate thread to traverse/alter the quadtree and copy the data from the buffer, to avoid any delays.
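Building on the QuadNode sketch above, deriving absolute coordinates and descending to the right tile could look roughly like this (all names are illustrative; the sketch assumes the point lies inside the region covered by the root):

    struct Point { long long x, y; };   // absolute canvas coordinates

    Point ToCanvas(int mouseX, int mouseY, Point scrollOrigin)
    {
        return { scrollOrigin.x + mouseX, scrollOrigin.y + mouseY };
    }

    // Descend from a node covering `size` pixels starting at (ox, oy)
    // to the leaf tile containing `p`, creating nodes along the way.
    QuadNode* DescendTo(QuadNode* node, long long size,
                        long long ox, long long oy, Point p)
    {
        while (size > kTileSize) {
            const long long half = size / 2;
            const int ix = p.x >= ox + half;  // 0 = west half, 1 = east half
            const int iy = p.y >= oy + half;  // 0 = north half, 1 = south half
            const int idx = iy * 2 + ix;      // NW, NE, SW, SE
            if (!node->child[idx])
                node->child[idx] = std::make_unique<QuadNode>();
            ox += ix ? half : 0;
            oy += iy ? half : 0;
            size = half;
            node = node->child[idx].get();
        }
        return node;   // leaf whose `tile` receives the drawn pixels
    }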
Rendering
Traverse the quadtree, copy all tiles into one texture, and send that to the GPU? No! You see, sending over one texture that is as big as the screen resolution is not the problem (bandwidth-wise), but traversing the quadtree and assembling the final image is (at least if you want many fps). The answer is to store the quadtree in system memory and stream it to the GPU. That means: asynchronously, another thread does the traversal and copies the currently viewed data to the GPU in chunks, as fast as it can. If the user doesn't view the canvas at full resolution, you don't have to traverse the tree to leaf level, which gives you automatic level of detail (LOD).
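The render-thread side of that streaming could look roughly like this - a bare-bones sketch where the traversal thread fills a shared queue and the render thread uploads at most a fixed number of tiles per frame so it never stalls (synchronization is the bare minimum for illustration):

    #include <cstdint>
    #include <memory>
    #include <mutex>
    #include <queue>

    struct TileUpload {
        long long x, y;                        // tile position on the canvas
        std::unique_ptr<uint32_t[]> pixels;    // tile data from the traversal
    };

    std::mutex g_uploadMutex;
    std::queue<TileUpload> g_uploads;          // filled by the traversal thread

    // Called once per frame on the render thread: upload at most `budget`
    // tiles so that a burst of freshly traversed tiles can't stall a frame.
    template <typename UploadFn>
    void DrainUploads(UploadFn uploadToGpu, int budget)
    {
        for (int i = 0; i < budget; ++i) {
            TileUpload t{};
            {
                std::lock_guard<std::mutex> lock(g_uploadMutex);
                if (g_uploads.empty())
                    return;
                t = std::move(g_uploads.front());
                g_uploads.pop();
            }
            uploadToGpu(t);   // e.g. a LockRect/UnlockRect copy into a texture
        }
    }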
Some random thoughts regarding the proposed strategy
- The quadtree approach is great because it is very memory efficient.
- The streaming idea can be extended to the HDD - have a look at Microsoft's SeaDragon (Deep Zoom) for an example.
- A sophisticated implementation would require something like CUDA.
- If your GPU doesn't offer the necessary performance/programmability, just implement the traversal on the CPU - a little longer delay until the image is fully displayed, but that should be acceptable. Don't forget to program asynchronously, using multiple threads, so that the screen doesn't freeze while waiting on the CPU. You can play with different effects: showing the whole image at once, blurry at first and slowly increasing in detail (breadth-first search, BFS), or rendering it tile by tile (depth-first search, DFS) - maybe mixed with some cool effects.
- A software implementation should be pretty easy when you only allow viewing the canvas at full resolution. If the user can zoom out in steps, that's a minor change to the traversal. If they can zoom seamlessly, that will require linear interpolation between neighboring quadtree nodes' tiles - not trivial anymore, but doable.
- Layers: The quadtree should give you memory consumption low enough to simply store one quadtree per layer. But when you have many layers you'll need some optimizations to stay realtime: you can't assemble 200 textures per frame and send them to the GPU. Maybe (I'm not totally sure whether it's the best solution) you could, for each layer, delete all nodes of the quadtrees beneath it whose tiles' pixels are completely covered by the layer above. That would need to be done at runtime while drawing, and a depth buffer is required. If you offer an eraser tool, you can't delete nodes but have to mark them as "invisible" so that they can be omitted during traversal - see the opacity-test sketch after this list.
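For the layers point, the opacity test could be as simple as this sketch (assuming 32-bit ARGB tiles; "invisible" would be a flag you add to each node):

    #include <cstdint>

    // Returns true if every pixel of a tile is fully opaque - then the
    // matching node in every layer underneath can be skipped (marked
    // invisible rather than deleted, so erasing can reveal it again).
    bool TileFullyOpaque(const uint32_t* tile, int pixelCount)
    {
        for (int i = 0; i < pixelCount; ++i)
            if ((tile[i] >> 24) != 0xFF)   // ARGB: alpha in the top byte
                return false;
        return true;
    }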
..off the top of my head. If you have further questions, let me know!