
Memory usage of file versus database for simple data storage

I'm writing the server for a Javascript app that has a syncing feature. Files and directories being created and modified by the client need to be synced to the server (the same changes made on the client need to be made on the server, including deletes).

Since every file is on the server, I'm debating the need for a MySQL database entry corresponding to each file. The following information needs to be kept on each file/directory for every user:

  1. Whether it was deleted or not (since deletes need to be synced to other clients)
  2. The timestamp of when every file was last modified (so I know whether the file needs updating by the client or not)

I could keep both of those pieces of information in files (e.g. .deleted file and .modified file in every user's directory containing file paths + timestamps in the latter) or in the database.

However, I also have to fit under an 80 MB memory constraint. Between file storage and database storage, which would be more memory-efficient for this purpose?

Edit: Files have to be stored on the filesystem (not in a database), and users have a quota for the storage space they can use.


The filesystem variant will probably be more memory-efficient as long as the number of files is low, but it probably won't scale. Databases are optimized for exactly this kind of lookup. Searching the filesystem, opening the file, and scanning its contents will become expensive as the number of files and requests increases.

But nobody says you have to use MySQL. A NoSQL database like Redis, or something like CouchDB (where you could keep the file itself and include versioning), might be a more attractive solution.

Here is a quick comparison of NoSQL databases, and a longer comparison.

Edit: From your comments, I would build it as follows: create an API that abstracts the backend for all the operations you want to perform. Then implement the two or three operations that happen most often, or are likely to be the most expensive, once for the filesystem and once for a database (or two). Test and benchmark.
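A minimal sketch of what such an abstraction could look like, assuming Python on the server. The `MetadataStore` interface and its method names are hypothetical, and SQLite (from the standard library) stands in for MySQL here only to keep the example self-contained:

```python
import json
import os
import sqlite3
from abc import ABC, abstractmethod


class MetadataStore(ABC):
    """Hypothetical interface: the two operations the sync server needs most."""

    @abstractmethod
    def record_change(self, user, path, mtime, deleted=False):
        """Remember when `path` was last modified and whether it was deleted."""

    @abstractmethod
    def changes_since(self, user, since):
        """Return (path, mtime, deleted) tuples changed after `since`, sorted by path."""


class FileStore(MetadataStore):
    """Filesystem backend: one JSON file per user, rewritten on every change."""

    def __init__(self, root):
        self.root = root

    def _file(self, user):
        return os.path.join(self.root, user + ".json")

    def _load(self, user):
        try:
            with open(self._file(user)) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def record_change(self, user, path, mtime, deleted=False):
        data = self._load(user)
        data[path] = [mtime, deleted]
        with open(self._file(user), "w") as f:
            json.dump(data, f)

    def changes_since(self, user, since):
        data = self._load(user)
        return sorted((p, m, bool(d)) for p, (m, d) in data.items() if m > since)


class SqliteStore(MetadataStore):
    """Database backend (SQLite as a stand-in for MySQL)."""

    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS meta ("
            "user TEXT, path TEXT, mtime REAL, deleted INTEGER, "
            "PRIMARY KEY (user, path))"
        )

    def record_change(self, user, path, mtime, deleted=False):
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO meta VALUES (?, ?, ?, ?)",
                (user, path, mtime, int(deleted)),
            )

    def changes_since(self, user, since):
        rows = self.db.execute(
            "SELECT path, mtime, deleted FROM meta "
            "WHERE user = ? AND mtime > ? ORDER BY path",
            (user, since),
        )
        return [(p, m, bool(d)) for p, m, d in rows]
```

Because both backends satisfy the same interface, the same benchmark loop can be run against `FileStore(some_dir)` and `SqliteStore(some_path)` while watching the server process's memory use.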


I'd go for one of the NoSQL databases. You can store file contents and provide some key function based on user IDs in order to retrieve those contents when you need them. Redis or Cassandra can be good choices for this case. There are many libraries for using these databases in Python as well as in many other languages.


In my opinion, the only real way to be sure is to build a test system and compare the space requirements. It shouldn't take long to generate some random data programmatically. One might think the filesystem would be more efficient, but a database may compress or deduplicate the data. Don't forget that a database would also make it easier to implement new features later, such as access control.
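A rough sketch of such a test, assuming Python and using SQLite as a stand-in database. All function names and the fake data are made up for illustration. It compares on-disk size only; RAM usage of the server process would have to be measured separately (for example with the operating system's process tools):

```python
import os
import random
import sqlite3
import tempfile


def random_entries(n, seed=0):
    """Generate n fake (path, mtime, deleted) records with unique paths."""
    rng = random.Random(seed)
    return [
        (
            f"dir{rng.randrange(50)}/file{i}.txt",
            rng.randrange(1_600_000_000, 1_700_000_000),
            rng.random() < 0.1,
        )
        for i in range(n)
    ]


def flat_file_size(entries, path):
    """Write one tab-separated line per entry and return the file size in bytes."""
    with open(path, "w") as f:
        for p, mtime, deleted in entries:
            f.write(f"{p}\t{mtime}\t{int(deleted)}\n")
    return os.path.getsize(path)


def sqlite_size(entries, path):
    """Store the same entries in a SQLite table and return the file size in bytes."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE meta (path TEXT PRIMARY KEY, mtime INTEGER, deleted INTEGER)"
    )
    with db:
        db.executemany(
            "INSERT INTO meta VALUES (?, ?, ?)",
            [(p, m, int(d)) for p, m, d in entries],
        )
    db.close()
    return os.path.getsize(path)


def compare(n):
    """Return the on-disk size of both representations for n random entries."""
    entries = random_entries(n)
    with tempfile.TemporaryDirectory() as tmp:
        return {
            "flat_file_bytes": flat_file_size(entries, os.path.join(tmp, "meta.tsv")),
            "sqlite_bytes": sqlite_size(entries, os.path.join(tmp, "meta.db")),
        }


if __name__ == "__main__":
    print(compare(10_000))
```

The numbers depend heavily on the database engine, its indexes, and its page size, so the same comparison should be repeated with whichever database is actually being considered.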
