Time and Space Efficiency, An Applied Example: Part 2

Reading Time: 6 minutes


In the previous post of this series, we started implementing an in-memory database to demonstrate engineering decisions for time and space efficiency.

We created methods to get values, set values, and delete values.

Today we continue that example by implementing transaction support.

This is the second post in a three-part series:

  1. Implementing Get, Set, Count, and Delete
  2. Adding Transaction Support (as long as our computer has tons of memory) (this post)
  3. Improving Space Efficiency of our Transaction Support

Transactions allow us to stage a series of changes to the database that we can either commit (save) or roll back (discard) all at once. We begin the transaction by calling .begin, save the changes of the transaction by calling .commit, or discard those changes by calling .rollback.

First, we’re going to do this in a space-efficient way that’s not completely functional.

Then we’ll do it in a way that’s functional but extremely space inefficient. Finally, in the next post, we’ll alight on something that works without devouring all our memory. I’d love your feedback on whether this step-by-step approach clarifies our performance decisions!

These tests demonstrate the behavior that we would expect upon creating and committing a transaction:
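The original test code isn’t reproduced here, so here is a rough sketch of what such tests might assert, built on a minimal Database with the get and set methods from part 1 (the class body is an assumed reconstruction, not the original):

```ruby
# Hypothetical sketch: a minimal Database with get/set from part 1,
# plus placeholder begin/commit methods so the calls don't raise.
class Database
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end

  # Empty placeholders for now.
  def begin; end

  def commit; end
end

db = Database.new
db.set("foo", "bar")
db.begin
db.set("baz", "qux")
db.commit
db.get("foo") # => "bar"
db.get("baz") # => "qux", the committed change is saved
```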

You’ll notice that we continue the relatively forgiving input validation policy that we discussed in the previous post.

So far, implementation is a piece of cake! We could add empty methods for .begin and .commit to our Database class, and these tests would pass. Is this because we’re superhero developers with future vision?

It is not.

The commit tests illustrate the same behavior that we would expect from the database without a transaction. The idea of having a prospective change set to either save or discard becomes more concrete when we look at the tests for rolling back:
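The rollback tests aren’t reproduced here either, so the sketch below captures the behavior they would assert. To make the example runnable, begin and rollback crudely snapshot and restore the whole hash; that is a stand-in for illustration, not a settled design.

```ruby
# Hypothetical sketch of the rollback expectations. begin saves a
# snapshot of the data; rollback restores it, discarding staged changes.
class Database
  def initialize
    @data = {}
    @snapshot = nil
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end

  def delete(key)
    @data.delete(key)
  end

  def begin
    @snapshot = @data.dup # crude placeholder so rollback has something to restore
  end

  def rollback
    @data = @snapshot
  end
end

db = Database.new
db.set("foo", "bar")
db.begin
db.set("foo", "change me")
db.delete("foo")
db.rollback
db.get("foo") # => "bar", the staged set and delete were both discarded
```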

So how will we know which changes to undo in the case of a rollback? We’ll need to know which changes are part of the current transaction.

Maybe we could keep each change set in separate hashes and hang onto an array of those hashes. For now, let’s assume we only have one level of transactions, so our array will have two items in it: one for the “normal” database contents, and one for the change set in our transaction. It might look a little like this:

I have added puts statements to make this demo, so you can see how the state of the database changes as we call the methods:
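Here is a hypothetical reconstruction of that idea (the original code isn’t shown, so names and details are assumed): an array of hashes where the last hash holds the current change set, with puts calls exposing the state after each operation.

```ruby
# Hypothetical sketch: an array of hashes, where the last hash is the
# current change set. Assumes a single level of transactions.
class Database
  def initialize
    @stack = [{}] # index 0: database contents; index 1 (if any): change set
  end

  def set(key, value)
    @stack.last[key] = value
    puts "set:      #{@stack.inspect}"
  end

  def get(key)
    @stack.last[key] # NOTE: only consults the current change set!
  end

  def begin
    @stack.push({}) # a fresh, empty change set
    puts "begin:    #{@stack.inspect}"
  end

  def commit
    changes = @stack.pop
    @stack.last.merge!(changes) # fold the change set into the contents
    puts "commit:   #{@stack.inspect}"
  end

  def rollback
    @stack.pop # throw the change set away
    puts "rollback: #{@stack.inspect}"
  end
end

db = Database.new
db.set("foo", "bar") # one frame: the database contents
db.begin             # two frames: contents plus an empty change set
db.set("baz", "qux") # the write lands in the change set
db.commit            # back to one frame, with the change set merged in
```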

This looks OK, but it has some problems.

First of all, we cannot access the existing state of our database from inside a transaction: we can see only what exists in this transaction’s change set. This means that, when we look for a key that is in the database but that we haven’t changed in this transaction, we cannot find it:
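A compact sketch of the problem (same assumed structure as above, trimmed to the relevant methods):

```ruby
# Hypothetical sketch: get only consults the current change set, so
# pre-existing keys are invisible inside a transaction.
class Database
  def initialize
    @stack = [{}]
  end

  def set(key, value)
    @stack.last[key] = value
  end

  def get(key)
    @stack.last[key] # only the current change set
  end

  def begin
    @stack.push({})
  end
end

db = Database.new
db.set("foo", "bar")
db.begin
db.get("foo") # => nil, even though "foo" is in the database
```

The same blindness applies to deletion: a .delete that only touches the change set has no way to see, let alone remove, a key that lives in the underlying contents.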

Second of all, given that we cannot access the state of the database, how could we delete keys inside a transaction without knowing which keys are available to delete? (Sure, we could set their values to nil, but doing so would declare that, for our database, a key pointing to nil means the same thing as the absence of that key. Often, that’s not the case.)

We need a way to access the version of the whole database that lives in this transaction.

And this is a good place to have a conversation about space performance.

Because one way we could keep track of state is to make a fresh copy of the entire database each time we begin a transaction and modify that copy. If the change set is saved, the copy becomes the true version. If the change set is discarded, so is the copy. We might implement it something like this:

And in practice, here’s what it would do:
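Here is a hypothetical reconstruction of that copy-on-begin approach, with a short demo (the original code isn’t shown; method names follow the series):

```ruby
# Hypothetical sketch: each begin pushes a full copy of the database,
# and all reads and writes go to the top copy.
class Database
  def initialize
    @stack = [{}]
  end

  def set(key, value)
    @stack.last[key] = value
  end

  def get(key)
    @stack.last[key]
  end

  def delete(key)
    @stack.last.delete(key)
  end

  def begin
    @stack.push(@stack.last.dup) # duplicate the entire database!
  end

  def commit
    committed = @stack.pop
    @stack[-1] = committed # the copy becomes the true version
  end

  def rollback
    @stack.pop # the copy is discarded
  end
end

db = Database.new
db.set("foo", "bar")
db.begin          # we now hold two full copies of the data
db.get("foo")     # => "bar", pre-existing keys are visible again
db.delete("foo")  # and deletion works, too
db.rollback
db.get("foo")     # => "bar", the delete was discarded with the copy
```

Note that Hash#dup is a shallow copy; if the values were themselves nested structures, a safe copy would cost even more than this sketch suggests.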

So when we open our first transaction, we immediately double our database footprint. When our database only has three things in it, maybe this isn’t a big deal. If our database had 100,000 things in it, this would become a much bigger deal.

It’s common, in manipulating databases, to open a transaction each time we make a change so we can roll back if something goes wrong. If we do that with our database implemented this way and 100,000 items in it, we’d have to duplicate all those records every time we wanted to do one little thing.

Also, imagine that we do this with nested transactions. To open a second transaction, we’d need to copy the copy of the database, tripling our space footprint relative to the original data.

This solution is not space efficient. Ideally, instead, we’d separate the information attached to each transaction into only the changes made by that transaction, and we’d have a way to see what the whole database would look like in the event that that transaction were applied.

So what can we do? Take some time to puzzle on that, then head over to the next post in the series, where we’ll take a look at a solution, its benefits, and its drawbacks.

If you liked this post, you might also like:

This talk about the technology and psychology of refactoring

This thing about graceful processes (perhaps some of my best writing on this blog, in my humble opinion)

Hiring for Fit – unrelated, but something I wish more hiring managers would think about
