Friday, September 30, 2005

Design choices - flexibility versus performance

One of the design choices I made in SimpleDBM is to build the system as a collection of modules, and keep the modules loosely coupled via interfaces. There is also a hierarchy within the modules; some modules come before others, and provide services used by other modules. It is very important to enforce this hierarchy, because otherwise, the dependencies between the modules would become cyclical, making it much harder to replace a module.

A benefit of choosing this design is flexibility. For example, although there are different types of Log records in the system, the Log Manager has no knowledge of them. Similarly, the Transaction Manager interacts with disk pages without knowing much about the structure of those pages. Even the Buffer Manager, which is responsible for caching disk pages in memory, actually knows nothing about disk pages. Therefore, it is possible to implement new page types without impacting the Buffer Manager. As a final example, the module I am working on currently, the BTree manager, has no knowledge about the types of index keys and rowids it has to handle. It relies upon a couple of externally provided factory classes to generate, and manage these objects. I am therefore able to develop the BTree module in advance of defining the table row structure, the field types or even the field operations. From the BTree module's perspective, all that is needed is that the keys should be sortable and support the form of serialization that is defined by SimpleDBM.

The downside to this flexibility is increased overhead. There is extra memory and processing overhead, as various object types must be created and managed dynamically using reflection. There is extra overhead in maintaining type information and looking up type information. For instance, when the log records are retrieved during restart recovery, and we need to replay the changes that have occured within a BTree page, we need to first obtain instances of the index key generation factory and the rowid generation factory. The log records must save extra type information to allow this.

No comments: