Wednesday, September 1, 2010

Moving On

As you can see, the flow of blog posts here stopped short quite a while ago. Not too early for some interesting content to appear, I hope, but the intended story arc of insights relating to database systems was never fulfilled.

I am going to round off this blog now (this will probably be the last post) without fulfilling any of my old “to be covered in future posts” promises. Sorry. Things have moved on, and so have I. Instead, I am going to finish by taking a step back and reflecting on the role of database systems in people’s minds and in the industry.

But first, I want to explain a few things about this blog, about Apptus Technologies (the company I was working for while writing the previous posts, and which I have now left), and about what I am doing now and plan to do in the future.

Why This Is the End

This blog declined because of changes at Apptus Technologies that shifted the focus for me and my colleagues. The company took a new direction after the international financial crisis of late 2008. We decided to narrow our scope and build something that would lie as close as possible to the final use our customers typically found for our software, rather than producing a generic platform. Hence, spreading the word of our insights in generic data management, which was the primary mission of this blog, became less of a priority.

The new direction, with the Apptus Esales platform, became quite a success, both in terms of creating a technologically strong product and selling it, and Apptus is again thriving, and hiring new software developers.

From now on, however, the development of Apptus will happen mostly without my involvement. After about ten years in the software development business, most of them as Apptus’ head of research, I have decided to give academic research and teaching another shot. In early August, I started as an assistant professor at the IT University of Copenhagen. I will certainly make use of what I have learned at Apptus in my new position, and most likely my research will involve developing ideas that have come up at Apptus. Plausibly, my former Apptus colleagues will also be involved in some manner.

If you are interested in following what I do, or would like to contact me about research ideas for instance, there are several entry points for finding and contacting me on the Internet.

If you are interested in what happens at Apptus, keep track of the fairly recently reworked Apptus website. There is some blogging going on there now, and I left some texts behind that you might find there (uncredited) sometime.

Reconsidering Databases

The transition I have been going through in the last few months inspired me to make some general reflections on database systems in a wider perspective, which I thought I would share with you. I have asked myself: why do we have database systems, what do people really expect of them, why are people often so dissatisfied with them, and what directions are available for improvement?

I figure the things most database systems are intended for are the following:
  • Persistent storage
  • Secure storage
  • Organizing and structuring data
  • Complex query answering
  • Efficient (quick) query answering
  • Concurrent access
  • Concurrent update
Now, who needs all of that? Hardly anybody. That raises a few questions which I will formulate and give my answers to.

1 What are database systems mostly used for?

I haven’t researched what people use database management systems for, but I think I have enough knowledge and experience to base some thoughts on my personal impression of what most people consider to be the main task of a DBMS.

I think it’s persistent storage. Period. People who don’t want data to disappear when they shut down the program (or the computer) set up a database system to hold the data. Anyhow, I think that’s what most programmers (which is what I identify as) consider the DBMS to be mostly about.

Not everyone uses something that deserves to be called a DBMS for this, but many do. Why, if they need only a single aspect of what the DBMS was designed for? Probably because it’s easy and safe. They know how to interact with a DBMS, and they trust that it won’t lose any data.

This is the reason for the object model for databases that is still out there, puzzling or even annoying advocates of the relational model. People who write object-oriented software want the objects to still be there when a program comes back up after being shut down. I might not want to call that a DBMS, but I understand their position. I am not even going to say that it is always a bad idea to base an object persistence library on a general DBMS, but it does seem a bit excessive. It should be possible to create a slimmer and more efficient system designed specifically for object persistence, and I am sure there are such systems out there too.
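To make the "persistence only" use case concrete, here is a minimal sketch using Python's standard-library shelve module, which stores pickled objects in a small key-value file. The Customer class, the record_visit helper, and the key names are hypothetical, made up for illustration. This is not something Apptus used; it is just one example of the slimmer kind of tool the paragraph above has in mind.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical example class, not taken from any real system.
    name: str
    visits: int = 0


def record_visit(store_path: str, key: str, name: str) -> Customer:
    """Load the object stored under `key` (or create it), update it, save it back."""
    with shelve.open(store_path) as store:
        customer = store.get(key, Customer(name))
        customer.visits += 1
        store[key] = customer  # reassign so the updated object is written out
        return customer


# Each call opens and closes the store, standing in for separate program runs.
store_path = os.path.join(tempfile.mkdtemp(), "objects")
record_visit(store_path, "c1", "Ada")
survivor = record_visit(store_path, "c1", "Ada")
print(survivor)
```

Note what is missing: no query language, no schema, no access control, and no safe concurrent updates. That is the trade: one item from the list above (persistent storage) and little else.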

2 What is the main point where database systems don’t deliver?

Maybe I am skewed towards things that have mattered in my own work, but I would say that at least one major point of dissatisfaction with general database systems is their lack of computational efficiency in delivering query results. We at Apptus found, like many others have, that a standard DBMS is about a factor of 100 slower in query performance than a well-designed search engine, and that there is no reason in principle why it has to be.

This is the reason why regular DBMS were thrown out of all large-scale search engines on the Internet in the early 2000s. (Well, almost all. Amazon are still using Oracle, I think, but I would presume that the Amazon search engine has evolved into something that is practically hand-written by Oracle technicians by now.)

3 Do we really need a single system to do all that a DBMS does?

Since most people are apparently primarily concerned about one, or at least just a few, of the capabilities of a DBMS, can’t we abandon the idea altogether and split it into a number of systems, or at least components, that are specialized for those few things?

My answer to that is that I think we still need the single-system DBMS. It is true that in many cases where it used to be routine to bring in the DBMS, people have started using other things. And rightly so. That development will probably continue. But when you get into concurrency and security, everything gets entangled. I don’t see how that can be lifted out to work by itself. A secure system with concurrency capability just has to have control of everything that is done with the data.

So I think there are still businesses where you need a DBMS approximately as we know it. What one could wish for, however, is that DBMSs in the future may gain greater elasticity to satisfy different needs. For instance, it should be easier and more transparent to sacrifice a bit of update concurrency for query speed, or to improve update capabilities by reducing the capability for complex data organization. Today’s SQL systems, at the core still optimized for the needs and hardware resources of the 1980s, are not good at giving us that flexibility.

4 For what aspects does the relational model help?

I myself have frequently tried to sell the importance of a strict data model with a firm basis in logic, and been frustrated by the lack of understanding of the connection between logic and databases that dominates the industry, and actually much of academia too. (Although I am less convinced now than I was a few years ago that the relational model is the perfect solution.)

But looking at the points above, which of them does a good and logical data model really solve? Actually not many. It is essential only to two of them: organizing and querying complex data. Those who can do without these two can do without thinking very much about data models. Some of the other things certainly get a little simpler if you have a good grasp of logic and data modelling, but that grasp is not strictly necessary, and not a top priority.

This, I think, is the reason why most programmers are so indifferent to logical data modelling, even though most of them should have the ability to both understand and appreciate the elegance of it.

That was all I had to say at this point. Not a finished theory to get oriented in the database world, but maybe a few insights that at least I personally am going to keep in mind in choosing the direction of my future research.
