by Jesper Larsson
There is hardly any field in computing more plagued by religious wars
than database systems. For nearly 40 years, the battle has raged
between various architectures, occasionally with some combatants
replaced – or at least renamed. Still, it seems that we are
further away than ever from a sort of database that we can be
satisfied with. (I am not going into the details of the problems right
now; that will be a subject of future posts. But if you are in the
business I am sure you are familiar with some of them.)
Let us take a couple of steps back and take a look at the
situation. Let us forget, for the moment, our personal stance in the
fight on data models, platforms etc., and ask ourselves: who do we
depend on to design database systems? What is driving them? Could they
do better? Could we do better, with a different kind of
effort?
I have come up with three groups of operators that influence the
design of database platforms. I call them the paper writers,
the evangelists, and the merchants. Anyone who
creates the actual code that makes up the database systems is a
servant of one or more of these masters. Let us go over them one by
one.
The Paper Writers
What would be a better source of knowledge to base our system design
on than science? According to popular view, scientists, or more
specifically academic researchers, have the task of advancing human
knowledge. Using objective observation and rational reasoning, they
find the ultimate truth, unaffected by fads or short-sighted
economics.
Unfortunately, it does not quite work that way.
The primary concern of most academic researchers is to produce
papers – articles published in conference proceedings
or scientific journals. Published papers are what they are judged by,
the most important merit for their Ph.D. degrees and research grants.
Consequently, academic researchers learn to become experts on getting
papers published. They have an ambition to advance human knowledge
too, but that is a far-fetched goal that only established stars can
afford to have as their first priority. In daily work, the immediate
focus of most researchers is to impress peer reviewers
– people in their own field who decide whether papers get
accepted or not.
For several reasons, this makes scientific work less useful for
practitioners than you might expect.
First, it influences which subjects get explored. Researchers tend to
pick subjects that are currently in fashion, for which papers are in
demand. The result is that current trends have a large impact on what
subjects people choose to work with.
Second, it has an effect on the language of published research. Since
writers and reviewers are actually the same people – the
researchers involved with the field – writers use language
meant to be understood and judged as appropriate by other people like
themselves. This has a self-amplifying effect, with the result that
publications often seem impenetrable or irrelevant to people outside
the field.
Third, once the papers are published, the work is finished as far as
the researcher is concerned. Few researchers bother to take their
findings any further.
The consequence is that academic research rarely gives us
comprehensible knowledge of how to design a system. What we mostly get
are thousands of fragments of potentially useful knowledge, clumped
together in bursts around subjects that are popular over a few years,
and usually presented in a language that is difficult to penetrate.
Many research projects include implementation of actual software
systems to test or demonstrate research findings. Sometimes they
develop into industrially useful ones, but only rarely. Research
systems are typically not fully functional or efficient enough for
general practical use, and it would hardly be fair to expect them to
be, especially for software as large and complex as modern database
systems are. After all, the point of academic research is not to
produce ready-made systems.
The Evangelists
There are a number of people out there who claim that they have the
correct view of how a database should be constructed, and that
essentially everyone else is wrong. If people would just listen to
them, everything would turn out fine. Of course, oddballs, fanatics,
and charlatans exist in every field, but none of these labels quite
captures the database evangelists – at least not all of
them.
In particular there is a group of people who persistently promote the
relational model. But the relational model is already dominant in the
database world, isn't it? Is not everything fine, then? No, this
true relational model lobby, with esteemed relational
database pioneer C.J. Date
as their figurehead, claims that the version of relational databases
that dominates the industry is distorted; that the
relational model is misunderstood or mistreated by practically
everyone, even to some extent its inventor E.F. Codd!
I may have made this sound a little more eccentric than it deserves.
The fact is, I agree in principle with most of what Date and his allies
say. However, even if they are correct about the data model, they do
not have all the solutions needed to create a full database system. In
their fervor to promote the true relational model, there are
a number of problems that they de-emphasize or do not address at all.
Hardware utilization and efficiency, for instance, they write off as
somebody else's problem.
On the other hand, an extreme pragmatic lobby has recently emerged,
centered around Michael
Stonebraker, another veteran of the database field. Although
Stonebraker's thesis that it's
time for a complete rewrite of database products could plausibly
be supported by C.J. Date, Stonebraker could not care less about
purifying the relational model. He currently endorses abandoning the
very idea of general-purpose database systems for specialized,
application-specific solutions.
The database evangelists have an influence through their writings as
well as through their contacts with implementation projects in
academia or in the industry. Most of the time, their direct impact is
minor, but they may have important roles as architects of future
systems.
The Merchants
The vendors that make their living producing database systems
obviously want to sell their products or services to as many customers
as possible. This makes them keenly monitor what the market seems to
want, and declare that this is just what they have – sometimes
adjusting their products accordingly.
On the one hand, merchants tend to be conservative, at least on the
main issues. They are frightened by radical new ideas that threaten to
be costly both to them and to their customers. Database management
platforms are heavy components in most IT infrastructures, coupled
with large investments and legacy issues. Rather than improving their
systems at the core, merchants prefer to add peripheral components,
covering more and more of customers' software needs.
On the other hand, the merchants must be extremely sensitive to
trends. They have to keep up with the latest buzzwords, so as not to appear
to be falling behind the competition. Also, they are always on the
lookout for new features – minor extensions that they
can use as selling points.
The result, when merchants get their way, is that once their product
has established a decent market share, it more or less stops
developing, and starts growing instead.
Who Will Create the Perfect Database System?
As you can see, if you roughly agree with my outline of the operators,
nobody really has both the will and the capability to create a good
database management system. How is this different from other areas of
the software industry? Simply because a database platform is such a
monumental piece of software, not only to produce, but also to use.
Changing your database system can be more demanding than replacing
your operating system. To both create and sell a major product is a
gargantuan task.
It is unlikely that any company or academic institute will successfully
take on the task of designing and producing from scratch, and then
selling to the market, a database system that is different enough to
take a major leap forward. It can happen – it has
happened before – but the way the industry has developed, it
has become a lot more difficult since the last major shift with the
relational database breakthrough in the early 1980s.
More likely, new systems will evolve slowly. People will abandon the
monolithic one-size-fits-all database systems, as Stonebraker
predicts, and create specialized lightweight solutions for various
applications. This has already happened in some areas.
However, I am convinced that there is a place for general data
management platforms in the future. It is simply too much work for
everyone to roll their own.
So What am I Selling?
Who am I to say all this then, and what is my interest in the
business? I am head of research at a medium-sized Scandinavian company
named Apptus Technologies, which started out offering services around
existing major database systems. We suffered from the sluggishness of
the platforms, adapted to it, and ultimately made our business from
it.
For a number of years we have produced search systems for large and
complex data sets, to be accessed over the internet by millions of
users. We achieved efficiency and flexibility by gradually moving away
from major database platforms. Ultimately, we created our own
full-fledged database management system. We have never sold it as a
standalone system (nor do we intend to in the foreseeable future),
only used it as a base for more specialized systems. Therefore, we
have enjoyed a freedom in choosing how to develop our platform that
database system vendors do not have – including the freedom to
change our minds.
We have learned a lot during the last eight years, gained a lot of
insights, and developed some strong opinions in the process. Now, we
have decided to stick our heads out of the laboratory a bit, and try
to start a conversation with the world outside. Not just about the
advantages that our technology can produce for our customers (we have
a sales department for that), but about the technology and ideas
behind it, as well as our visions for the future of information
systems. It is going to be a lot about databases, but also about
programming and computer science in general.
I hope you will join us. Stay tuned to this blog. Subjects coming up:
why you can benefit from rewriting your system rather than patching
it; how we started out misunderstanding relational databases, and why
most people still do.
4 comments:
"...number of problems that they de-emphasize or do not address at all. Hardware utilization and efficiency, for instance, they write off as somebody else's problem."
Having read large parts of everything Date has ever written, I find the above comment rather incredible. Rather I'd say it's getting close to thirty years that he has been railing against the people who muddle up implementation and model.
I perfectly agree with that interpretation of Date, and I don't see why you would think I didn't.
Not muddling up implementation and model is crucial, but it is not the answer to efficiency. You are not done when you have gotten the model right. Rather, that is where the work to address efficiency begins.
I was recommended this blog by hakank, and I was not disappointed. As I work as a merchant, developing and selling not databases but other software, this is really food for thought. Take, for example, your excellent comment on our possible lack of innovation:
"once their product has established a decent market share, it more or less stops developing, and starts growing instead."
As I don't have a background in research and don't really know the development of databases, I would be curious to know more. Have there been efforts to build databases on principles such as set theory, recursive algorithms, logic constraints, or linguistic theories?
Database research is surprisingly thin on theory. There is some logic and set theory – which the database community usually files under relational model. But there is also some interesting database-related research that identifies itself as artificial intelligence. I hope to get to it in future posts.