Custom flat-file database - how do I design one?
Normally I would just use SQL/SQLite/Mongo for anything database-y but I thought it would be interesting to try and create my own flat file database structure (it's just a learning project). The application itself is a music streamer with a centralised library on the server.
The database:
My application is a client/server one whereby any changes made by the server sync to all clients. Servers do insert, edit & delete operations.
Clients are only able to change a client modifier boolean field in a record (the value of which is specific to that client). No other operations are available to the clients, so there are NO client-side changes to sync back.
Write operations on the server are rare after the initial database construction but do happen. The priority here is definitely read operations for the clients.
Needs to be scalable up to 500k+ tracks or a 2GB (2^31 bytes) database file size (whichever comes first).
What is stored:
- Few tables, with some relations. It's only a mockup, but you get the idea:
+--------+      +--------+      +-------------------+
| id*    |      | id*    |      | id*               |
| ARTIST | ---> | ARTIST |      | track name        |
+--------+      | ALBUM  | ---> | ALBUM             |
                | year   |      | length            |
                +--------+      | filename**        |
                                | {client modifier} |
                                +-------------------+

* unique identifier
** not stored in client version of database
{client modifier} is only on the client version of the database
One problem that would have to be overcome is how to deal with the relations and searching to minimise I/O operations.
- All fields are variable length apart from the id, year & length.
Required features:
- Server able to sync the database to all clients with minimal operations.
One way to approach this would be to store the date/time each record was last modified and have the client store the date of its last sync. When a client comes online, all changes past that date sync back to the client. Another way would be to keep a separate table on the server listing every operation that has happened and when it happened, and sync in a similar fashion.
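The first approach can be sketched in a few lines. This is a minimal illustration, assuming records carry a `modified` timestamp (the field name and the record shape are my own, not from the question); note that pure timestamp comparison cannot express deletions unless you also keep tombstone records or an operation log.

```python
# Sketch of timestamp-based sync: the client remembers the time of its last
# sync, and the server returns everything modified after that point.

def changes_since(records, last_sync):
    """Return the records modified after the client's last sync time."""
    return [r for r in records if r["modified"] > last_sync]

server_records = [
    {"id": 1, "name": "Track A", "modified": 100},
    {"id": 2, "name": "Track B", "modified": 250},
    {"id": 3, "name": "Track C", "modified": 400},
]

client_last_sync = 200
delta = changes_since(server_records, client_last_sync)
# delta holds the records with ids 2 and 3
```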
- Fast read operations for clients
Because the artists and albums tables are smaller, a client could keep them in memory, but I am going to assume they won't do this.
What I was thinking of doing is having a separate file for each table, with the client keeping each file open all the time to ensure it can read as quickly as possible... is this a bad idea?
Some sort of index will have to be stored for each table, recording where each record starts. This could be small enough to load into memory and could be stored in files separate from the actual tables to avoid issues.
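Building such an index is cheap if each record knows its own length. Here is a minimal sketch, assuming records are stored as a 4-byte big-endian length prefix followed by the record bytes (an assumed on-disk encoding; the question doesn't fix one):

```python
# Sketch: scan a table file once and build an in-memory map of
# record ordinal -> byte offset, without decoding any payloads.
import io
import struct

def build_index(f):
    """Map record ordinal to the byte offset of its length prefix."""
    index = {}
    ordinal = 0
    while True:
        offset = f.tell()
        header = f.read(4)
        if len(header) < 4:
            break                      # end of file
        (length,) = struct.unpack(">I", header)
        index[ordinal] = offset
        f.seek(length, io.SEEK_CUR)    # skip the payload
        ordinal += 1
    return index

# Demo with an in-memory "table file" holding two records
buf = io.BytesIO()
for payload in (b"first record", b"second, longer record"):
    buf.write(struct.pack(">I", len(payload)) + payload)
buf.seek(0)
idx = build_index(buf)
# idx == {0: 0, 1: 16}  (4-byte header + 12-byte payload before record 1)
```

Once loaded, a lookup costs one `seek` plus one read, regardless of how variable the record lengths are.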
- Minimise I/O operations
The server will store an "index" of the tracks database in memory with the id and file name so read operations are kept to a minimum.
The server will also buffer database write operations, so that if it detects that a lot of write operations are about to happen in a short space of time it will wait and then do a batch write. This is possible because the changes to the file system will still be there if the database crashes, so it could just reload all changes on restart.
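The buffering idea can be sketched as a small write-coalescing queue. This is an illustrative design, not the question's actual implementation; the class name and thresholds are invented, and a real version would also flush on shutdown:

```python
# Sketch of write coalescing: queue operations and flush them to disk as one
# batch once the queue is large enough or the oldest entry is old enough.
import time

class WriteBuffer:
    def __init__(self, flush_fn, max_pending=8, max_age=2.0):
        self.flush_fn = flush_fn        # called with a list of pending ops
        self.max_pending = max_pending
        self.max_age = max_age          # seconds before a forced flush
        self.pending = []
        self.first_write = None

    def write(self, op):
        if not self.pending:
            self.first_write = time.monotonic()
        self.pending.append(op)
        if (len(self.pending) >= self.max_pending or
                time.monotonic() - self.first_write >= self.max_age):
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)  # one batch write to disk
            self.pending = []

batches = []
buf = WriteBuffer(batches.append, max_pending=3)
for op in ("insert 1", "insert 2", "insert 3", "insert 4"):
    buf.write(op)
buf.flush()
# batches == [["insert 1", "insert 2", "insert 3"], ["insert 4"]]
```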
- NOT sparse file to keep file size to a minimum.
I will be working at a byte level to reduce the file size. The main problem will be fragmentation when a record is deleted: because of the variable-length fields, you can't simply drop a new record into that space.
I could de-fragment the file when it reaches a certain fragmentation level (ratio of deleted records to live records), but I'd rather avoid this if I can, as it would be an expensive operation for the clients.
I'd rather not use fixed-length fields either (as the filename could be huge, for instance), but these seem to be my only options?
Comments:
So how do I go about this and maximise performance?
Yes, I am looking at reinventing the wheel, and yes, I know I probably won't come anywhere close to the performance of other databases.
My suggestion would be to design and build your database. Don't worry about performance. Worry about reliability first and foremost.
Going through your features one at a time:
Server able to sync the database to all clients with minimal operations.
This requires a log of database changes. You have the right idea.
Fast read operations for clients
On a modern PC, you can read flat files fast enough. Separate flat files for each table is a good design. If the flat file is small enough (domain tables) you could read them once and keep the table in memory. You'd write the table once, on database shutdown.
Minimise I/O operations
Databases minimize I/O operations by reading and writing blocks of data. I wouldn't get too concerned about this right away. Your database needs to be reliable.
NOT sparse file to keep file size to a minimum
Most modern PCs have plenty of disk space, so this is another feature that can be put off until later. Database reorganizations are usually under DBA control because it's such an expensive process.
This is an incomplete project of mine from a few years ago. I can't explain it in more depth, and it's already obsolete, but it's worth experimenting with (which I didn't actually do). At first impression it's a totally disorganized (flat-file) database, but in my theory it's not the worst-case scenario. Anyone can add concepts or improvements to this, such as encryption, speed enhancement, data binding, data formatting, etc.
Structurally, data are sorted into folders, files, etc. I also think that closing the file connection every time a query is executed will save memory.
Please bear with me: this was proposed a few years ago, when I still used ASP. Now I am using Ruby on Rails and NodeJS (and some PHP).
I call this concept folder-file data delegation, in which data is structured into hierarchies of folders and files, and the smallest structures in the database are called atoms.
The 5 Main Objectives of Folder-File Data Delegation
Structure
Database/
Table1/
Fieldsinfo/
username.mapper (file to provide info about the username field)
password.mapper (file to provide info about the password field)
Records/
username/
recordid1.tabledata (file that contains the data/value)
recordid2.tabledata (file that contains the data/value)
password/
recordid1.tabledata (file that contains the data/value)
recordid2.tabledata (file that contains the data/value)
…
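The "query a record by its physical path" idea above comes down to mapping (table, field, record id) onto a file path. Here is a minimal sketch following the layout shown; the helper names are mine, not part of the original design:

```python
# Sketch: a field value lives at
#   Database/<table>/Records/<field>/<recordid>.tabledata
# so reading or writing a field is a single file operation.
import os
import tempfile

def field_path(root, table, field, record_id):
    return os.path.join(root, table, "Records", field,
                        record_id + ".tabledata")

def write_field(root, table, field, record_id, value):
    path = field_path(root, table, field, record_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(value)

def read_field(root, table, field, record_id):
    with open(field_path(root, table, field, record_id),
              encoding="utf-8") as f:
        return f.read()

root = tempfile.mkdtemp()
write_field(root, "Table1", "username", "recordid1", "alice")
value = read_field(root, "Table1", "username", "recordid1")
# value == "alice"
```

Updating one field rewrites only that one small file, which is the speed argument made below; the trade-off is one filesystem entry per field per record.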
- Speed
  a. By offering a draft structured database system and avoiding rewriting/searching a huge database every time an operation is carried out, Folder-File Data Delegation aims to improve the speed reputation of flat-file database systems.
    i. Folder-File Data Delegation stores data using folders and files, so there is no need to structure a single-file database when you can just query a record by its physical path. The process of querying (searching/updating/deleting/sorting, etc.) a collection of records (a table) is further simplified and extended by the use of query operatives, which are arguably more secure than SQL. Speed is improved by rewriting a small file containing the individual record's information instead of rewriting the entire database. Folder-File Data Delegation also stores key maps, which further simplify the validation of new input data. Plus, you can create unlimited access portals for your application: when one portal is in use, a new portal file is cloned and operations are carried out simultaneously, not sequentially (warning: this can result in RAM overload; use at your own risk and limit the number of portals to a number suited to your CPU's power).
    ii. Disclaimer: we do not claim that Folder-File Data Delegation is the fastest database ever; rather, we state that speed is effectively improved by this algorithm.
- Coherence
  a. By offering a draft structured database system with defined components such as collections, records and fields, Folder-File Data Delegation aims to improve the coherence of flat-file database systems.
    i. Folder-File Data Delegation organizes data in a way that is distributed almost like a relational database's:
- Databases – composed of interrelated collections (folder)
- Collections – composed of interrelated unique records (folder)
- Records – composed of interrelated basic units of data (folder)
- Fields – composed of a single unit of data (text, character, integer, array, object) and has a specified length for query speeding and security. (file, which is mapped (defined) by the collection’s field mapper)
- Security
  a. By offering a draft structured database system with NoSQL querying and strict filtration of possible attacks on the database, Folder-File Data Delegation aims to improve the security of flat-file database systems.
    i. Folder-File Data Delegation offers NoSQL data access: all data in a query is interpreted as data, not as a command. Querying is just accessing a physical path plus transmitting operations, and must be done in the programming language being used. It's like a service company coming to your house to provide services, where your house is the particular record being manipulated.
- Simplicity
  a. By offering a draft structured system with documented and planned presentations of concepts, Folder-File Data Delegation aims to improve the simplicity of flat-file database systems.
    i. Folder-File Data Delegation offers a GUI, so you don't need to manually carry out operations and commands when you want to change things. Operations are also simplified with user-interface interactions.
- Flexibility
  a. By offering a draft structured system and planned dynamic updates to the database structure, Folder-File Data Delegation aims to improve the flexibility of flat-file database systems.
    i. Folder-File Data Delegation is regularly updated and maintained in case your database is ever attacked by a hacker, which can be threatening. Don't worry; your database folder is stored in a hidden folder (.folder).