Lies your database is telling you
March 10, 2016
A wise person once said time is a device invented to keep everything from happening at once. Jonas Bonér explains how the database world has abused time from the beginning...
[Note: this article by Andrew C. Oliver appeared originally on Infoworld.com]
He began with double-entry accounting, which was born as early as the seventh century and still pervades every major business in the world. Simpler alternatives are rarely considered because the disadvantages are too great. The basic premise of double-entry accounting is you can’t change the past, only correct the present.
The database developers who had those great notions must have missed some of the best/worst episodes of "Star Trek," where you find out that time travel is generally a bad idea. With updates, you get concurrency control, mutexes, transactions, and other constructs that try to mitigate the negative effects of attempting to modify the same state while dealing with more than one thing happening at a time.
Now, there is an alternative: “insert only” structures. The trouble with those -- besides generating more instances, rows, attributes, or documents (like double-entry accounting) -- is that you never have a “consistent view” of the data. Bonér asserts that this is OK because the consistent view is nothing more than a convenient fiction you have inconveniently created at the expense of adding more latency to your operational system.
According to Bonér, not only is time an illusion, so is the present. It seems absurd, right? Now is the present. However, by the time you got to the end of that sentence and processed what you read, it was no longer true. If you try to mentally hold on to the present in more than a general sense, you find that you can’t because the present is no more than a pointer that is always moving.
When we get to the level of larger data sets, however, determining totals “right now” is at the very least laborious in an insert-only structure. The “local present” is a set of “facts derived from multiple concurrent pasts.” That is, if you look at all of the states that “were” generated in the system up until “now,” you can arrive at a conclusion as to the state or value of now.
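As a rough illustration of deriving a “local present” from recorded facts, here is a minimal Python sketch. The `Fact` type, its logical `recorded_at` timestamp, and the `balance_as_of` helper are hypothetical names invented for this example, not anything from Bonér's talk; the point is only that “now” is computed by folding over whatever facts have arrived so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One immutable record in an insert-only structure (hypothetical shape)."""
    recorded_at: int  # logical timestamp; an assumption made for this sketch
    account: str
    amount: int       # cents; positive is a credit, negative is a debit


def balance_as_of(facts, account, as_of):
    """Derive a balance by summing every fact recorded at or before `as_of`."""
    return sum(
        f.amount
        for f in facts
        if f.account == account and f.recorded_at <= as_of
    )


facts = [
    Fact(1, "alice", 10_000),
    Fact(2, "alice", -2_500),
    Fact(3, "bob", 2_500),
    Fact(5, "alice", -1_000),
]

# The same facts answer different questions depending on where "now" points.
balance_now = balance_as_of(facts, "alice", as_of=5)
balance_earlier = balance_as_of(facts, "alice", as_of=2)
```

Scanning every fact on each read is exactly the laborious part on larger data sets; a real system would more likely keep running totals or snapshots, which this sketch leaves out.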
Meanwhile, when you try to discover this “right now” state, you may find you don’t have all of the information. In fact, you find that Donald Rumsfeld might have insight for you. Not only do you have known unknowns, but you have unknown unknowns. Why? Information has latency. There are facts you don’t have yet. Even when we try to force a consistent view of the world, we add latency somewhere else, and our operational system becomes less concurrent and less scalable.
How do we deal with information inconsistency and even information loss in the “real world”? We infer from context (fill in the blanks), and we attempt to confirm, wait, and repeat operations as new facts come in. As with double-entry accounting, we try to take a “compensating action” to account for the times we are wrong.
According to Bonér, the path forward is to treat time as a first-class construct instead of an unmodeled, implied item. To do that, you can’t go around “changing things”; you go insert-only, doing away with CRUD in favor of only CR (create and read). In other words, we make records or “facts” immutable. This obviously goes all the way from the front end of the system to the storage.
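To make “only CR” concrete, here is a small Python sketch of a store that can create and read but never update or delete. `FactLog`, `Fact`, and the sequence-number field are assumptions made up for illustration; a production event store would look quite different.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Fact:
    """An immutable record; frozen dataclasses reject attribute assignment."""
    seq: int    # position in the log, standing in for time in this sketch
    key: str
    value: str


class FactLog:
    """Append-only store exposing create and read, with no update or delete."""

    def __init__(self):
        self._facts = []

    def create(self, key, value):
        fact = Fact(seq=len(self._facts), key=key, value=value)
        self._facts.append(fact)
        return fact

    def read(self, key):
        """Return the full history for `key`, oldest first."""
        return tuple(f for f in self._facts if f.key == key)

    def latest(self, key):
        """Return the most recent fact for `key`, or None if there is none."""
        history = self.read(key)
        return history[-1] if history else None


log = FactLog()
first = log.create("address", "12 Elm St")
log.create("address", "90 Oak Ave")  # a "change" is just a newer fact

try:
    first.value = "edited"  # attempting an in-place update
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The old address is still readable after the move, which is the property an audit trail needs.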
A popular explanation of transactions uses the bank account. Assuming I have a bank account and you have a bank account and I want to transfer money to you, we open up a nice transaction (which locks both accounts) and subtract money from my account and add money to yours. If I don’t have enough money the transaction rolls back. If your account can’t receive the money it rolls back. This allows us to know exactly how much money we have in our bank accounts at any given time.
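For reference, the textbook version looks roughly like the Python sketch below. It uses in-memory locks rather than a real database transaction, and `Account`, `InsufficientFunds`, and `transfer` are hypothetical names for this example. “Rolling back” here simply means the check fails before either balance is touched.

```python
import threading


class InsufficientFunds(Exception):
    pass


class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance  # cents
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` from src to dst while holding both account locks."""
    if src is dst:
        raise ValueError("source and destination must differ")
    # Always lock in name order so two opposite transfers cannot deadlock.
    first, second = sorted((src, dst), key=lambda a: a.name)
    with first.lock, second.lock:
        if src.balance < amount:
            # Nothing has been modified yet, so there is nothing to undo.
            raise InsufficientFunds(f"{src.name} has only {src.balance}")
        src.balance -= amount
        dst.balance += amount


mine = Account("mine", 10_000)
yours = Account("yours", 0)
transfer(mine, yours, 6_000)
```

While the locks are held, nothing else can touch either account, which is where the lost concurrency comes from.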
The only problem with this analogy is that no banking system has ever worked or will ever work this way. What do banks use? They use credits, debits, and compensating transactions. There are financial exchanges “in flight,” which are in various states of completion. If something goes wrong, the bank takes compensating action. Even with a bank, the answer to the question “How much money do I have in my account right now?” is a type of fiction told to make customers feel better.
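A compensating transaction can be sketched in a few lines of Python. This is a toy model, not how any particular bank's ledger works; `Ledger`, `Entry`, and the `reverses` field are hypothetical names chosen for the example. The key property is that a mistake is corrected by posting an opposite entry, never by editing or deleting the original.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    entry_id: int
    account: str
    amount: int                     # cents; negative is a debit
    reverses: Optional[int] = None  # id of the entry this one compensates for


class Ledger:
    """Append-only ledger: entries are posted, never changed or removed."""

    def __init__(self):
        self.entries = []

    def post(self, account, amount, reverses=None):
        entry = Entry(len(self.entries), account, amount, reverses)
        self.entries.append(entry)
        return entry

    def compensate(self, entry):
        """Post the opposite amount; the original entry stays on the record."""
        return self.post(entry.account, -entry.amount, reverses=entry.entry_id)

    def balance(self, account):
        return sum(e.amount for e in self.entries if e.account == account)


ledger = Ledger()
ledger.post("mine", 10_000)
debit = ledger.post("mine", -4_000)    # transfer in flight: my side is debited
in_flight_balance = ledger.balance("mine")

# Suppose the receiving side rejects the transfer: post a reversal.
reversal = ledger.compensate(debit)
settled_balance = ledger.balance("mine")
```

Between the debit and the reversal, the balance reflects a transfer that never completed, which is the sense in which “right now” is a convenient fiction.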
Bonér’s idea is to define “consistency boundaries” that describe the time, place, and circumstances in which the answer we give is correct. Outside of those boundaries is chaos. This starts to look a lot more like physics than computing, but that's a more honest approach.
That said, we have a long way to go before business and developers come to terms with how much lying they do to achieve a false sense of simplicity. The idea of “strong consistency” is ingrained in the minds of many. I mean, I recently had a client design an audit log that required updates.
If you haven’t caught Bonér’s “Life Beyond the Illusion of Present,” I highly recommend it. I’d take some of the prescription (obviously this is a long pitch for Lightbend/Akka) with a grain of salt, but the problem is well stated. As a person who has developed both strongly consistent and highly concurrent systems, I can say Bonér's talk makes me even less enamored of crusty old Oracle DBAs and their pre-seventh-century ways.