First Impressions of Redis

Sunday October 20 2013
python redis nosql

Redis is a key-value datastore. However, it is more accurately described by the term data structure server. Redis does not implement the typical concepts associated with relational databases such as tables, primary keys, and foreign keys. Instead, you get the same primitives common to high-level programming languages: strings, lists, hash maps, sets, and sorted sets. This is why the term data structure server is more accurate than key-value datastore. Redis stores everything in a giant key-value store. The keys are always strings, but the values can be real data structures. You can perform atomic operations on those data structures. In this way it is far more capable than something like memcached. While some "NoSQL" software describes itself as "Not Only SQL", Redis is truly "No SQL".

I was interested in Redis because my BitTorrent tracker was storing all of its peers in memory. In fact, it was storing them in a regular Python dictionary. Python is not well known for being efficient in terms of memory usage. Additionally, any time I updated the software I lost all the peers. This is not a big deal as the peers continually announce their presence to the tracker, but it was slightly annoying. By storing all the peer lists in Redis, I can move that program state outside of the Python interpreter.

Documentation

One of my criteria for evaluating any software before using it is documentation. The documentation for Redis is phenomenally good. This is partially due to the restricted feature set. It is much easier to produce comprehensive documentation if the goal is not to be the Swiss Army knife of databases. Redis has documentation for every single command the server supports. The time complexity of each command is indicated as well. All of this is available online. The example server configuration is well commented.

Data Types

Strings

Strings are the simplest type of data in Redis. Strings are not restricted to ASCII or Unicode data, but can hold any series of octets. The maximum size of a string is 512 megabytes.

Integers, or maybe not

If you take a look at the data types page, Redis does not actually have a type directly corresponding to an integer. I'm not sure what led to this design decision, but I know what the consequences are. Redis has some commands which only make sense in a numerical context. The INCR command's documentation includes this interesting note.

Note: this is a string operation because Redis does not have a dedicated integer type. The string stored at the key is interpreted as a base-10 64 bit signed integer to execute the operation.

Using redis-cli to communicate with a running Redis instance reveals some interesting behavior.

redis 127.0.0.1:6379> set bar 100
OK
redis 127.0.0.1:6379> incr bar
(integer) 101

First bar is set to 100. Then bar is incremented and the new value is returned. All is well and good. But it turns out Redis doesn't really support integers, so even if bar exists, the INCR command can fail.

redis 127.0.0.1:6379> set bar notAnInteger
OK
redis 127.0.0.1:6379> incr bar
(error) ERR value is not an integer or out of range

As you can see, INCR fails in this example. Admittedly, this is a contrived example. But take this example:

redis 127.0.0.1:6379> set bar "\x35"
OK
redis 127.0.0.1:6379> incr bar
(integer) 6
redis 127.0.0.1:6379> set bar "\x00"
OK
redis 127.0.0.1:6379> incr bar
(integer) 1
redis 127.0.0.1:6379> set bar "\x01"
OK
redis 127.0.0.1:6379> incr bar
(error) ERR value is not an integer or out of range

In each case, bar is first set to a string of length 1. The byte 0x35 (decimal 53) just happens to be ASCII for "5", and calling INCR increments it to "6". Then bar is set to the byte 0x00, which is the ASCII NULL character. For whatever reason, calling INCR on this results in it having a value of ASCII "1". The ASCII character code for the character "1" is actually 49, so I'm not really sure how 0 + 1 = 49. There is no consistency in this aspect either, as setting bar to the byte 0x01 simply causes INCR to fail.

But it actually gets much more interesting.

redis 127.0.0.1:6379> set bar "\x00abcdefg"
OK
redis 127.0.0.1:6379> incr bar
(integer) 1
redis 127.0.0.1:6379> get bar
"1"
redis 127.0.0.1:6379> set bar "a\x00abcdefg"
OK
redis 127.0.0.1:6379> incr bar
(error) ERR value is not an integer or out of range
redis 127.0.0.1:6379> set bar "99\x00abcdefg"
OK
redis 127.0.0.1:6379> incr bar
(integer) 100
redis 127.0.0.1:6379> get bar
"100"

In each case, bar is set to a string containing the NULL character. Calling INCR with the value "\x00abcdefg" somehow results in the value "1". Where did the data after the null byte go? At least with a value of "a\x00abcdefg" it just blows up completely. But with a value of "99\x00abcdefg", somehow "100" is the result. While the documentation states "The string stored at the key is interpreted as a base-10 64 bit signed integer", what it really means is that the value is first interpreted as a NULL-terminated ASCII string. If that string is a base-10 integer, Redis increments everything that came before the null byte. Obviously, this means all the data after the null byte is lost. This does not seem in any way sensible to me.
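The transcripts above are consistent with a parser that stops at the first null byte. The following is a minimal Python model of that hypothesis, not Redis's actual implementation. The function name incr_model is made up for illustration, and treating an empty prefix as zero is an assumption chosen only because it matches the "\x00" case.

```python
import re

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def incr_model(value: bytes) -> bytes:
    """Model INCR as observed above: parse up to the first NUL, add one."""
    # Everything after the first NUL byte is ignored, as a C string would be.
    prefix = value.split(b"\x00", 1)[0]
    if prefix == b"":
        # Assumption: an empty prefix counts as zero, matching the "\x00" case.
        number = 0
    elif re.fullmatch(rb"-?[0-9]+", prefix):
        number = int(prefix.decode("ascii"))
    else:
        raise ValueError("value is not an integer or out of range")
    result = number + 1
    if not INT64_MIN <= result <= INT64_MAX:
        raise ValueError("value is not an integer or out of range")
    # The stored value is replaced by the decimal text of the result,
    # so any bytes that followed the NUL are gone.
    return str(result).encode("ascii")


for raw in (b"\x35", b"\x00", b"\x00abcdefg", b"99\x00abcdefg"):
    print(repr(raw), "->", incr_model(raw))
```

Under this model, every case in the redis-cli sessions above behaves the same way, including the ones that fail.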

The Kitchen Sink

Redis also supports Lists, Hashes, and Sets. These data structures should not be at all surprising, but the values and keys are always strings. The great thing about doing this with Redis is that you can have many processes simultaneously working away on the same data structures, and it is all done atomically.

Vocabulary madness

The hash type in Redis does what you would expect: it maps from some set of values to another set of values. Of course, it always maps from strings to strings. But that's not that big of a deal. What is confusing is the choice of vocabulary. To add something to a hash, use the HSET command. The HSET command takes three arguments: the key, the field, and the value. The key is the actual key in Redis that is to be treated as a hash. The field is the name within that hash that is assigned the value. This is demonstrated using redis-cli below.

redis 127.0.0.1:6379> HSET foo meow kitty
(integer) 1

Redis assigns the field meow to have a value of kitty in the key foo. The key foo is now a hash if it did not exist before. But to get all the fields in the key foo, you have to do the following.

redis 127.0.0.1:6379> HKEYS foo
1) "meow"

Despite referring to meow as a field in the context of the HSET command, you use the command HKEYS to get all of the fields in a hash. This sort of vocabulary madness makes the documentation less than clear. It also makes for great meeting room discussions where after 20 minutes of arguing, you realize what you think everyone else calls fields are really keys. Or maybe it is the other way around. Why not just name the command HFIELDS?
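The vocabulary maps onto nested Python dictionaries fairly directly. The sketch below is a toy in-memory model, not Redis or redis-py code; the class name HashModel and its methods are invented for illustration. The outer dictionary is indexed by what Redis calls the key, and the inner one by what HSET calls the field.

```python
class HashModel:
    """Toy model of Redis hashes as a dict of dicts (illustration only)."""

    def __init__(self):
        # Redis key -> {field -> value}
        self.keys = {}

    def hset(self, key, field, value):
        """Assign value to field inside the hash stored at key.

        Returns 1 if the field was newly created and 0 if it was overwritten.
        """
        fields = self.keys.setdefault(key, {})
        created = field not in fields
        fields[field] = value
        return 1 if created else 0

    def hkeys(self, key):
        """Return the field names of the hash at key, despite the name."""
        return list(self.keys.get(key, {}))


model = HashModel()
print(model.hset("foo", "meow", "kitty"))
print(model.hkeys("foo"))
```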

Sorted Sets

Redis also supports sorted sets. Each member of the set has a floating point number associated with it that is referred to as its score. Members are sorted according to score. Initially, I thought I would have a great number of uses for sorted sets. It turns out I didn't use them at all, so I won't be discussing them any further.

Storage and Persistence

Internally, Redis keeps all data in memory. This allows it to achieve relatively high performance. Persistence is achieved by periodically calling fork() and having the child process write the entirety of the dataset to a file. Alternatively, Redis can be configured to log each write to an append-only file, recording just what has changed. In my case, I opted to disable these mechanisms since durability of the data is irrelevant to me.

Storage limits

Since Redis stores all data in memory, the size of your dataset is naturally constrained by the available memory. You can run Redis with or without an explicit memory limit. If you run Redis without an explicit limit, a large enough dataset will spill over into swap space. Naturally, the performance of Redis plummets when this occurs.

With explicit limits set, Redis allows a couple of policies on how to handle hitting the memory limit.

The first policy is to do nothing. The write which would have caused Redis to exceed the memory limit fails. Any subsequent write fails. Removing keys from the data store frees up space and writes will succeed again. This is the policy I am using presently.
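As a rough illustration of this policy, here is a toy Python model. It is not how Redis accounts for memory: the class BoundedStore, its byte-counting rule, and the MemoryError it raises are all assumptions made for the sketch.

```python
class BoundedStore:
    """Toy key-value store that rejects writes past a fixed byte budget."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.data = {}

    def used_bytes(self):
        # Assumption: usage is just the summed length of keys and values.
        return sum(len(k) + len(v) for k, v in self.data.items())

    def set(self, key, value):
        old = self.data.get(key)
        old_size = len(key) + len(old) if old is not None else 0
        new_total = self.used_bytes() - old_size + len(key) + len(value)
        if new_total > self.max_bytes:
            # Nothing is evicted; the write is simply refused.
            raise MemoryError("write would exceed the memory limit")
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)


store = BoundedStore(max_bytes=16)
store.set(b"a", b"1234567")   # 8 bytes used
store.set(b"b", b"1234567")   # 16 bytes used, exactly at the limit
try:
    store.set(b"c", b"x")     # refused: would need 18 bytes
except MemoryError as exc:
    print("write failed:", exc)
store.delete(b"a")            # frees 8 bytes
store.set(b"c", b"x")         # succeeds again
```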

Python

For Redis support in Python, the clients page describes redis-py as "Mature and supported. Currently the way to go for Python." This package installs the redis module. The first thing that jumped out at me is that you'd think redis.Redis is the client. It turns out it is, but it actually implements an older version of the API that does not correspond to the commands in the official documentation. If you want to take advantage of the official documentation, you need to use redis.StrictRedis.

In an interactive session, getting started is easy:

>>> import redis
>>> r = redis.StrictRedis()
>>> r.ping()
True

By default, a TCP connection is established to localhost. The ping command checks to see if the server is alive. If Redis is running on the local machine, you can use a Unix domain socket to connect.

>>> import redis
>>> r = redis.StrictRedis(unix_socket_path='/tmp/redis.sock')
>>> r.ping()
True

This was the point at which I discovered a pretty subtle bug. If you pass a value for unix_socket_path that has a truth value of false, it still connects using TCP.

>>> import redis
>>> redis.StrictRedis(unix_socket_path=0).ping()
True
>>>

The patch to change this was pretty simple and stems from the usage of None as a sentinel value for a keyword argument to the __init__ function of the object.
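The general shape of this kind of bug can be sketched as follows. This is not the redis-py source; both function names are hypothetical, and the returned strings stand in for whatever connection object a real client would build.

```python
def choose_transport_buggy(unix_socket_path=None):
    # Truthiness test: 0, "" and None all fall through to TCP silently.
    if unix_socket_path:
        return "unix"
    return "tcp"


def choose_transport_fixed(unix_socket_path=None):
    # Identity test: only the sentinel None selects TCP. Any other value,
    # even a falsy one, is treated as a socket path.
    if unix_socket_path is not None:
        return "unix"
    return "tcp"


print(choose_transport_buggy(0))
print(choose_transport_fixed(0))
```

With the identity test, a nonsensical path such as 0 is at least handed to the Unix socket code, where it can fail visibly instead of quietly becoming a TCP connection.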

Overall, the API is quite Pythonic. Any command described in the Redis documentation is a method on the StrictRedis object, but in all lowercase. One very important thing to note is that the API coerces data types when possible.

>>> import redis
>>> r = redis.StrictRedis()
>>> r.set('testVal',5)
True
>>> r.get('testVal')
'5'
>>>

In this example, the number 5 is used as an argument to set. Calling get for the same key returns the string '5'. This isn't a bug, but it is very important to consider. Failure to do so leads to seemingly impossible conditions.

>>> import redis
>>> r = redis.StrictRedis()
>>> import math
>>> r.set('pi',math.pi)
True
>>> if r.get('pi')!=math.pi:
...     print 'What'
... 
What

As before, math.pi becomes a string in the call to the set method. So of course when the key is retrieved, it does not compare as equal. If you're familiar with JavaScript, this sort of behavior is quite commonplace. But it is not typical in the Python world.
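One way to guard against this is to convert explicitly on both sides of the call. The sketch below assumes only that the client has set and get methods shaped like the ones shown above. The helper names and the DictClient stand-in are hypothetical; the stand-in exists only so the example runs without a server.

```python
import math


class DictClient:
    """Stand-in for a Redis client: stores everything as byte strings."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value).encode("utf-8")
        return True

    def get(self, key):
        return self._data.get(key)


def set_float(client, key, value):
    # repr() of a float round-trips exactly through float().
    return client.set(key, repr(float(value)))


def get_float(client, key):
    raw = client.get(key)
    if raw is None:
        return None
    if isinstance(raw, bytes):
        raw = raw.decode("ascii")
    return float(raw)


r = DictClient()
set_float(r, "pi", math.pi)
print(r.get("pi") == math.pi)         # raw reply is not a float
print(get_float(r, "pi") == math.pi)  # converted reply is compared as a float
```

Since only set and get are used, passing a redis.StrictRedis instance in place of DictClient should behave the same way, though that is untested here.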

Conclusion

Overall, I am pretty pleased with Redis. I think it is important to fully define the behavior of how strings get treated as integers. The Python API is quite pleasant to use. Redis seems to be going through a great deal of growth in its user base currently, so it is likely such details will be hammered out in the near future.

It took me about a weekend to get my application using Redis. I wound up changing my choice of data structures midway through, which probably accounted for most of that time.