开发者

How to write a simple database engine [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

Closed 9 years ago.

The community reviewed whether to reopen this question last year and left it closed:

Original close reason(s) were not resolved

Improve this question

I am interested in learning how a database engine works (i.e. the internals of it). I know most of the basic data structures taught in CS (trees, hash tables, lists, etc.) as well as a pretty good understanding of compiler theory (and have implemented a very simple interpreter) but I don't understand how to go about writing a database engine. I have searched for tutorials on the subject and I couldn't find any, so I am hoping someone else can point me in the right direction. Basically, I would like information on the following:

  • How the data is stored internally (i.e. how tables are represented, etc.)
  • How the engine finds data that it needs (e.g. run a SELECT query)
  • How data is inserted in a way that is fast and efficient

And any other topics that may be relevant to this. It doesn't have to b开发者_运维技巧e an on-disk database - even an in-memory database is fine (if it is easier) because I just want to learn the principals behind it.

Many thanks for your help.


If you're good at reading code, studying SQLite will teach you a whole boatload about database design. It's small, so it's easier to wrap your head around. But it's also professionally written.

SQLite 2.5.0 for Code Reading

http://sqlite.org/


The answer to this question is a huge one. expect a PHD thesis to have it answered 100% ;) but we can think of the problems one by one:

  • How to store the data internally: you should have a data file containing your database objects and a caching mechanism to load the data in focus and some data around it into RAM assume you have a table, with some data, we would create a data format to convert this table into a binary file, by agreeing on the definition of a column delimiter and a row delimiter and make sure such pattern of delimiter is never used in your data itself. i.e. if you have selected <*> for example to separate columns, you should validate the data you are placing in this table not to contain this pattern. you could also use a row header and a column header by specifying size of row and some internal indexing number to speed up your search, and at the start of each column to have the length of this column like "Adam", 1, 11.1, "123 ABC Street POBox 456" you can have it like <&RowHeader, 1><&Col1,CHR, 4>Adam<&Col2, num,1,0>1<&Col3, Num,2,1>111<&Col4, CHR, 24>123 ABC Street POBox 456<&RowTrailer>

  • How to find items quickly try using hashing and indexing to point at data stored and cached based on different criteria taking same example above, you could sort the value of the first column and store it in a separate object pointing at row id of items sorted alphabetically, and so on

  • How to speed insert data I know from Oracle is that they insert data in a temporary place both in RAM and on disk and do housekeeping on periodic basis, the database engine is busy all the time optimizing its structure but in the same time we do not want to lose data in case of power failure of something like that. so try to keep data in this temporary place with no sorting, append your original storage, and later on when system is free resort your indexes and clear the temp area when done

good luck, great project.


There are books on the topic a good place to start would be Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom


SQLite was mentioned before, but I want to add some thing.

I personally learned a lot by studying SQlite. The interesting thing is, that I did not go to the source code (though I just had a short look). I learned much by reading the technical material and specially looking at the internal commands it generates. It has an own stack based interpreter inside and you can read the P-Code it generates internally just by using explain. Thus you can see how various constructs are translated to the low-level engine (that is surprisingly simple -- but that is also the secret of its stability and efficiency).


I would suggest focusing on www.sqlite.org

It's recent, small (source code 1MB), open source (so you can figure it out for yourself)...

Books have been written about how it is implemented:

http://www.sqlite.org/books.html

It runs on a variety of operating systems for both desktop computers and mobile phones so experimenting is easy and learning about it will be useful right now and in the future.

It even has a decent community here: https://stackoverflow.com/questions/tagged/sqlite


Okay, I have found a site which has some information on SQL and implementation - it is a bit hard to link to the page which lists all the tutorials, so I will link them one by one:

  • http://c2.com/cgi/wiki?CategoryPattern
  • http://c2.com/cgi/wiki?SliceResultVertically
  • http://c2.com/cgi/wiki?SqlMyopia
  • http://c2.com/cgi/wiki?SqlPattern
  • http://c2.com/cgi/wiki?StructuredQueryLanguage
  • http://c2.com/cgi/wiki?TemplateTables
  • http://c2.com/cgi/wiki?ThinkSqlAsConstraintSatisfaction


may be you can learn from HSQLDB. I think they offers small and simple database for learning. you can look at the codes since it is open source.


If MySQL interests you, I would also suggest this wiki page, which has got some information about how MySQL works. Also, you might want to take a look at Understanding MySQL Internals.

You might also consider looking at a non-SQL interface for your Database engine. Please take a look at Apache CouchDB. Its what you would call, a document oriented database system.

Good Luck!


I am not sure whether it would fit to your requirements but I had implemented a simple file oriented database with support for simple (SELECT, INSERT , UPDATE ) using perl.
What I did was I stored each table as a file on disk and entries with a well defined pattern and manipulated the data using in built linux tools like awk and sed. for improving efficiency, frequently accessed data were cached.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜