Saturday, April 10, 2010

Roadmap

I have been thinking about how SimpleDBM should evolve. When I started the project my intention was to eventually add support for SQL, but now I can't see this happening in the near term. SimpleDBM is not aimed at competing with other SQL databases; SQL is nice because of the ease with which tables can be queries, joined etc., but implementing an SQL layer is quite a lot of work, which I am not able to put in right now.

Another subject that has interested me right from the beginning is multi-version concurrency. Unfortunately, I have not really found a way of implementing this which is satisfactory. The two main approaches are those taken by Oracle and PostgreSQL - which I have previously compared in a short paper. The Oracle approach is problematic because it requires page level redo/undo in the transaction system; SimpleDBM's BTree implementation uses logical undo, allowing for undo to be applied to a different page from the original. I do not like the PostgreSQL approach to MVCC either, as it does not support versioning in indexes.

Instead of adding large features such as above, I shall perhaps focus on the many smaller changes that I have been mulling over for some time now:
  • Refactor the code so that the modularity of SimpleDBM can be exploited better. This involves separately packaging the API from the implementation, and allowing implementations of individual modules to be easily swapped in.
  • Add statistics gathering so that useful metrics can be captured. Some work has been done in this area.
  • Improve performance and scalability of the lock manager.
  • Add support for sequences.
  • Add support for reverse indexes.
  • Add JMX monitoring capability.
  • Refactor the type system - and make the type system a first class component of the core engine. This needs a bit of explanation. When I first started SimpleDBM, my strategy was that the core database engine should be typeless, and should allow a type system to be plugged in. The core engine should treat records and keys as blobs of data, and not worry about their internal structure. This strategy allowed me to develop the core engine without first having to define a type system. However, it has meant that some things are less efficient - for instance, row updates cause the entire before and after images of the row to be logged. Another area of concern is the ability to compress data within pages, which is hard to do without some knowledge of the structure of the data inside the records.
  • Carry forward the work of making most types immutable, include row types. A row builder class can be provided to create rows, but once constructed the row should be immutable. 
  • Carry on improving the documentation.
  • Improve the test cases, and the test coverage.
  • Create a single threaded version which can run on small devices.
  • Add support for nested readonly transactions; these are useful for carrying out foreign key checks, should they be added in future.
  • Ensure the embedded and network API are interchangeable, and that clients can swap between the two without having to change any code. At present, the network API is completely separate from the embedded API.
  • Create a full blown sample application - work is ongoing to create this.
  • Try to raise awareness about SimpleDBM and build a community of users and developers.

    No comments: