Wednesday, September 12, 2007

HYBRID RDBMS ARCHITECTURE FROM INFOTWINS

Integrating OODBMS & ORDBMS concepts into RDBMS

Imagine using any RDBMS without implementing concepts like primary keys, foreign keys or indexes, and still accomplishing all that OODBMS and ORDBMS can do along with everything that an RDBMS usually does! More importantly, with faster data processing speeds! This is precisely what the HRDBMS architecture achieves. It is an architecture that can be deployed on any of the popular RDBMS software packages.

Like so many others, this innovation was born out of necessity. The idea was hatched in 1997, when an RDBMS package bought with great expectations seemed to slip into a coma once the number of records crossed 500,000. With no money to buy more powerful hardware, the only way out was to tweak the DB architecture. It worked. The RDBMS stopped flailing.

The real challenge that triggered the full-scale development of the HRDBMS architecture emerged in 2004, while developing the Indian Mine Safety Information System (IMSIS), India’s first regulatory compliance monitoring system, for the Directorate General of Mines Safety (DGMS), Government of India. The first edition of the application, developed using the traditional concepts of ID keys and indexes, could do no more than crawl, given the extremely complex nature of the multitude of entity structures within the application.

While the main challenge in IMSIS was the need to cope with the long hierarchy of entities inside a mine, it was further compounded by the need to relate events inside mining processes to an even more complex data tree, namely the regulatory framework. The safety regulations, which broadly fell into three categories (mining, mechanical and electrical), were further divided into Acts, Rules or Notifications, each having Chapters, Sections, Clauses and Sub-clauses, or even sub-sub-clauses and more.

While flattening is a known DBA technique aimed at minimizing the impact of hierarchical entity relationships inside data structures, it is of little help when the entity relationships are dynamic and cannot be predetermined. In fact, the oft-repeated limitation of RDBMS in replicating real-life situations becomes a major resource guzzler when it is forced to do what it cannot do easily.

Of course, ORDBMS represents an attempt to overcome this limitation. However, as Esther Dyson, a leading technology futurist, points out, "using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car." In addition, ORDBMS suffers from a large impedance mismatch between objects and tables, which leads to significant performance drops.

The Object Oriented Database Management System (OODBMS), on the other hand, is based on a single-level store of objects. However, OODBMS are limited to small applications; they don't support a lot of concurrent users; and it takes too much time to deliver an OODBMS application. A major limitation of OODBMS is the absence of well-developed common standards and of a standard query language.

Integrating ORDBMS and OODBMS concepts within RDBMS, with improved performance, therefore became a tantalizing option. Its potential to revolutionize search and retrieval and eventually replace string-based engines like Google made it irresistible, in spite of the daunting tasks involved.

In June 2006, the idea became a reality with the commissioning of a new version of IMSIS at the Central Zone office of DGMS, India. The innovation is here for everyone to see. No one would really believe its power and versatility without watching it at work. A 300-page coal mine inspection report culled from 6,915 fields spread across 403 forms appears on the screen in under 10 seconds and gets completed in a few minutes. Of these forms, 61 are at the top of the hierarchy, and the remaining 342 are attached to them in a linked hierarchy of 2 to 8 levels. In addition, each field in the forms also has related fields where multi-line text or links to files in other formats are stored.

Each form represents an object. The linked forms represent the objects linked to the parent objects, and they are reused as many times as there are children at any level, much as in a family genealogy. However, unlike in a genealogical database, the hierarchy need not consist only of families, nor must the information collected about each family member be the same. More importantly, at any level only as many records are created as there are siblings that actually exist. The architecture leaves no empty spaces inside tables.

Each node represents a relationship in a hierarchical entity structure, and the data entry form at each level can also be different. For example, the first-level forms can capture information about each of the colleges in a university, the next level about each of the courses, the next about the classes, the next about each student and so on. It is not necessary that only colleges have courses. The university itself could also have courses and students.
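The university example can be pictured with a small in-memory sketch. This is only an illustration of the idea of a hierarchy whose levels carry different forms; the class name `FormNode`, its attributes and the sample names are all assumptions made for this example and say nothing about how HRDBMS actually stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class FormNode:
    """One filled-in form (an object) at some level of the hierarchy.

    Hypothetical structure for illustration only.
    """
    form_type: str                                  # e.g. "university", "college"
    name: str                                       # identifier supplied by the user
    data: dict = field(default_factory=dict)        # fields differ per form type
    children: list = field(default_factory=list)    # next-level objects

    def add(self, child: "FormNode") -> "FormNode":
        """Attach a child object and return it so calls can be chained."""
        self.children.append(child)
        return child

    def count(self) -> int:
        """Number of records in this subtree: one per object that exists."""
        return 1 + sum(child.count() for child in self.children)


university = FormNode("university", "Example University")
college = university.add(FormNode("college", "Engineering", {"dean": "A. Rao"}))
course = college.add(FormNode("course", "Mining Engineering", {"years": 4}))
course.add(FormNode("student", "S-001", {"enrolled": 2006}))

# A course attached directly to the university, skipping the college level.
university.add(FormNode("course", "Evening Lectures", {"years": 1}))

print(university.count())
```

Because a node is created only when an object is actually entered, the record count always equals the number of objects that exist, and a university can hold colleges and courses side by side.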

Each field in a form can hold either data pertaining to the object at that level or a reference to the next-level objects. Neither the number nor the names of the objects at any level need to be known in advance. It is the users who enter the object identifiers, just before entering data into the form pertaining to an object. Another very important feature of these data entry forms is the facility to enter data that interrelates a particular object or its component with one or more entities within the entire system.
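A minimal sketch of a field that holds either plain data or a reference to next-level objects might look as follows. The names `Ref`, `FormObject`, `enter` and `resolve`, as well as the mine, seam and rule names, are made up for this illustration; the article does not describe the actual mechanism, so this should be read as one possible interpretation rather than the HRDBMS design.

```python
from dataclasses import dataclass, field


@dataclass
class Ref:
    """Field value that points at next-level objects by user-entered name."""
    targets: list


@dataclass
class FormObject:
    """Hypothetical object behind one data entry form."""
    name: str
    fields: dict = field(default_factory=dict)   # field name -> data or Ref
    links: list = field(default_factory=list)    # cross-links to other entities


registry = {}  # every object entered so far, keyed by its identifier


def enter(name, **fields):
    """Create an object under an identifier chosen at data entry time."""
    obj = FormObject(name, dict(fields))
    registry[name] = obj
    return obj


def resolve(obj, field_name):
    """Return plain data as-is, or the referenced objects for a Ref field."""
    value = obj.fields[field_name]
    if isinstance(value, Ref):
        return [registry[target] for target in value.targets]
    return value


# Identifiers are not declared in advance; they exist once the user types them.
mine = enter("Mine 1", manager="R. Kumar", seams=Ref(["Seam A", "Seam B"]))
enter("Seam A", thickness_m=2.4)
enter("Seam B", thickness_m=1.8)

# Cross-link an object to an entity elsewhere in the system (placeholder rule).
enter("Rule X, Clause 1", text="Placeholder regulation text")
registry["Seam A"].links.append("Rule X, Clause 1")

print(resolve(mine, "manager"))
print([seam.name for seam in resolve(mine, "seams")])
```

References are resolved only at lookup time, so the number and names of child objects can grow as users enter them.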

Apart from the object relationships that the above features provide within the ambit of RDBMS, this architecture also includes facilities to store information in multiple formats, without making use of either BLOB or CLOB concepts. While it stores an unlimited amount of text within the database itself, it incorporates a way to generate folders automatically and store all other types of data in their native file formats, with the necessary folder links stored inside the database. The argument is that there is no point in storing inside the database any kind of information that cannot be processed by the database engine.
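The general pattern of keeping text in the database while keeping other files on disk with only a link in the database can be sketched as below. The table names, column names, folder layout and the helper `attach_file` are assumptions chosen for this example, and SQLite is used only because it ships with Python; none of this is taken from the HRDBMS implementation.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def attach_file(conn, store_root, object_name, field_name, source):
    """Copy a file into an auto-created folder and record only its path."""
    folder = Path(store_root) / object_name / field_name
    folder.mkdir(parents=True, exist_ok=True)      # folder generated on demand
    target = folder / Path(source).name
    shutil.copy2(source, target)                   # file stays in native format
    conn.execute(
        "INSERT INTO attachments VALUES (?, ?, ?)",
        (object_name, field_name, str(target)),
    )
    return target


conn = sqlite3.connect(":memory:")
# Plain text columns only: no BLOB or CLOB columns, no keys, no indexes.
conn.execute("CREATE TABLE notes (object_name TEXT, field_name TEXT, body TEXT)")
conn.execute(
    "CREATE TABLE attachments (object_name TEXT, field_name TEXT, file_path TEXT)"
)

workdir = Path(tempfile.mkdtemp())
photo = workdir / "roof_support.jpg"
photo.write_bytes(b"placeholder image bytes")      # stand-in for a real file

# Multi-line text goes into the database itself.
conn.execute(
    "INSERT INTO notes VALUES (?, ?, ?)",
    ("Seam A", "remarks", "Roof support checked.\nNo defects observed."),
)
# Any other file type goes to disk; the database keeps only the link.
stored = attach_file(conn, workdir / "store", "Seam A", "photo", photo)
print(stored)
```

The database engine can search and process the text it holds, while files it cannot interpret remain usable by their own applications.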

The decision not to use basic RDBMS features such as primary or foreign keys, indexes, BLOB and CLOB was made primarily to prove the basic robustness of the HRDBMS architecture. It is no doubt possible to use all of these on top of the HRDBMS architecture and achieve even better performance. This is the very purpose of integrating ORDBMS and OODBMS concepts within RDBMS. There is much more to be gained
