Friday, October 3, 2008

Why the Database Masters Fail Us

by Jesper Larsson

There is hardly any field in computing more plagued by religious wars than database systems. For nearly 40 years, the battle has raged between various architectures, occasionally with some combatants replaced – or at least renamed. Still, it seems that we are further away than ever from a sort of database that we can be satisfied with. (I am not going into the details of the problems right now; that will be a subject of future posts. But if you are in the business I am sure you are familiar with some of them.)

Let us take a couple of steps back and take a look at the situation. Let us forget, for the moment, our personal stance in the fight on data models, platforms etc., and ask ourselves: who do we depend on to design database systems? What is driving them? Could they do better? Could we do better, with a different kind of effort?

I have come up with three groups of operators that influence the design of database platforms. I call them the paper writers, the evangelists, and the merchants. Anyone who creates the actual code that makes up the database systems is a servant of one or more of these masters. Let us go over them one by one.

The Paper Writers

What would be a better source of knowledge to base our system design on than science? According to popular view, scientists, or more specifically academic researchers, have the task of advancing human knowledge. Using objective observation and rational reasoning, they find the ultimate truth, unaffected by fads or short-sighted economics.

Unfortunately, it does not quite work that way.

The primary concern of most academic researchers is to produce papers – articles published in conference proceedings or scientific journals. Published papers are what they are judged by, the most important merit for their Ph.D. degrees and research grants.

Consequently, academic researchers learn to become experts on getting papers published. They have an ambition to advance human knowledge too, but that is a far-fetched goal that only established stars can afford to have as their first priority. In daily work, the immediate focus of most researchers is to impress peer reviewers – people in their own field who decide whether papers get accepted or not.

For several reasons, this makes scientific work less useful for practitioners than you might expect.

First, it influences which subjects get explored. Researchers tend to pick subjects that are currently in fashion, for which papers are in demand. The result is that current trends have a large impact on what subjects people choose to work with.

Second, it has an effect on the language of published research. Since writers and reviewers are actually the same people – the researchers involved with the field – writers use language meant to be understood and judged as appropriate by other people like themselves. This has a self-amplifying effect, with the result that publications often seem impenetrable or irrelevant to people outside the field.

Third, once the papers are published, the work is finished as far as the researcher is concerned. Few researchers bother to take their findings any further.

The consequence is that academic research rarely gives us comprehensible knowledge of how to design a system. What we mostly get are thousands of fragments of potentially useful knowledge, clumped together in bursts around subjects that are popular over a few years, and usually presented in a language that is difficult to penetrate.

Many research projects include implementation of actual software systems to test or demonstrate research findings. Sometimes they develop into industrially useful ones, but only rarely. Research systems are typically not fully functional or efficient enough for general practical use, and it would hardly be fair to expect them to be, especially for software as large and complex as modern database systems are. After all, the point of academic research is not to produce ready-made systems.

The Evangelists

There are a number of people out there who claim that they have the correct view of how a database should be constructed, and that essentially everyone else is wrong. If people would just listen to them, everything would turn out fine. Of course, oddballs, fanatics, and charlatans exist in every field, but none of these labels quite captures the database evangelists – at least not all of them.

In particular, there is a group of people who persistently promote the relational model. But the relational model is already dominant in the database world, isn't it? Is not everything fine, then? No, this true relational model lobby, with esteemed relational database pioneer C.J. Date as their figurehead, claims that the version of relational databases that dominates the industry is distorted; that the relational model is misunderstood or mistreated by practically everyone, even to some extent its inventor E.F. Codd!

I may have made this sound a little more eccentric than it deserves. The fact is, I agree in principle with most of what Date and his allies say. However, even if they are correct about the data model, they do not have all the solutions needed to create a full database system. In their fervor to promote the true relational model, there are a number of problems that they de-emphasize or do not address at all. Hardware utilization and efficiency, for instance, they write off as somebody else's problem.

On the other hand, an extreme pragmatic lobby has recently emerged, centered around Michael Stonebraker, another veteran of the database field. Although Stonebraker's thesis that it's time for a complete rewrite of database products could plausibly be supported by C.J. Date, Stonebraker could not care less about purifying the relational model. He currently endorses abandoning the very idea of general-purpose database systems for specialized, application-specific solutions.

The database evangelists have an influence through their writings as well as through their contacts with implementation projects in academia or in the industry. Most of the time, their direct impact is minor, but they may have important roles as architects of future systems.

The Merchants

The vendors that make their living producing database systems obviously want to sell their products or services to as many customers as possible. This makes them keenly monitor what the market seems to want, and declare that this is just what they have – sometimes adjusting their products accordingly.

On the one hand, merchants tend to be conservative, at least on the main issues. They are frightened by radical new ideas that threaten to be costly both to them and to their customers. Database management platforms are heavy components in most IT infrastructures, coupled with large investments and legacy issues. Rather than improving their systems at the core, merchants prefer to add peripheral components, covering more and more of customers' software needs.

On the other hand, the merchants must be extremely sensitive to trends. They have to keep up with the latest buzzwords, not to appear to be falling behind the competition. Also, they are always on the lookout for new features – minor extensions that they can use as selling points.

The result, when merchants get their way, is that once their product has established a decent market share, it more or less stops developing, and starts growing instead.

Who Will Create the Perfect Database System?

As you can see, if you roughly agree with my outline of the operators, nobody really has both the will and the capability to create a good database management system. How is this different from other areas of the software industry? Simply because a database platform is such a monumental piece of software, not only to produce, but also to use. Changing your database system can be more demanding than replacing your operating system. To both create and sell a major product is a gargantuan task.

It is unlikely for any company or academic institute to successfully take on the task of designing and producing from scratch, and then selling to the market, a database system that is different enough to take a major leap forward. It can happen – it has happened before – but the way the industry has developed, it has become a lot more difficult since the last major shift with the relational database breakthrough in the early 1980s.

More likely, new systems will evolve slowly. People will abandon the monolithic one-size-fits-all database systems, as Stonebraker predicts, and create specialized lightweight solutions for various applications. This has already happened in some areas.

However, I am convinced that there is a place for general data management platforms in the future. It is simply too much work for everyone to roll their own.

So What am I Selling?

Who am I to say all this then, and what is my interest in the business? I am head of research at a medium-sized Scandinavian company named Apptus Technologies, which started out offering services around existing major database systems. We suffered from the sluggishness of the platforms, adapted to it, and ultimately made our business from it.

For a number of years we have produced search systems for large and complex data sets, to be accessed over the internet by millions of users. We achieved efficiency and flexibility by gradually moving away from major database platforms. Ultimately, we created our own full-fledged database management system. We have never sold it as a standalone system (nor do we intend to in the foreseeable future), only used it as a base for more specialized systems. Therefore, we have enjoyed a freedom in choosing how to develop our platform that database system vendors do not have – including the freedom to change our minds.

We have learned a lot during the last eight years, gained a lot of insights, and developed some strong opinions in the process. Now, we have decided to stick our heads out of the laboratory a bit, and try to start a conversation with the world outside. Not just about the advantages that our technology can produce for our customers (we have a sales department for that), but about the technology and ideas behind it, as well as our visions for the future of information systems. It is going to be a lot about databases, but also about programming and computer science in general.

I hope you will join us. Stay tuned to this blog. Subjects coming up: why you can benefit from rewriting your system rather than patching it; how we started out misunderstanding relational databases, and why most people still do.


paul c said...

"...number of problems that they de-emphasize or do not address at all. Hardware utilization and efficiency, for instance, they write off as somebody elses problem."

Having read large parts of everything Date has ever written, I find the above comment rather incredible. Rather I'd say it's getting close to thirty years that he has been railing against the people who muddle up implementation and model.

Jesper Larsson said...

I perfectly agree with that interpretation of Date, and I don't see why you would think I didn't.

Not muddling up implementation and model is crucial, but it is not the answer to efficiency. You are not done when you have gotten the model right. Rather, that is where the work to address efficiency begins.

Håkan (hakke) said...

Recommended this blog by hakank, I was not disappointed. Working as a merchant, developing and selling not databases but other software, this is really food for thought. For example your excellent comment on our possible lack of innovation:

"once their product has established a decent market share, it more or less stops developing, and starts growing instead."

As I don't have a background in research and don't really know the development of databases I would be curious to know more. Have there been efforts to build databases on principles like, for example, set theory, recursive algorithms, logic constraints or linguistic theories?

Jesper Larsson said...

Database research is surprisingly thin on theory. There is some logic and set theory – which the database community usually files under relational model. But there is also some interesting database-related research that identifies itself as artificial intelligence. I hope to get to it in future posts.