Redis is a key-value datastore. However, it is more accurately described by the term data structure server. Redis does not implement the typical concepts associated with relational databases such as tables, primary keys, and foreign keys. Instead, you get the same primitives common to high level programming languages: strings, lists, hash maps, sets, and sorted sets. This is why the term data structure server is more accurate than key-value datastore. Redis stores everything in a giant key-value store. The keys are always strings, but the values can be a real data structure. You can perform atomic operations on those data structures. In this way it is much more featured than something like memcached. While some "NoSQL" software describes itself as "Not Only SQL", Redis is truly "No SQL".
I was interested in Redis because my BitTorrent tracker was storing all peers in memory. In fact, it was storing all peers in a regular Python dictionary. Python is not well known for being efficient in terms of memory usage. Additionally, any time I updated the software I lost all the peers. This is not a big deal, as the peers continually announce their presence to the tracker, but it was slightly annoying. By storing all the peer lists in Redis, I can move that program state outside of the Python interpreter.
One of my criteria for evaluating any software before using it is documentation. The documentation for Redis is phenomenally awesome. This is partially due to the restricted feature set. It is much easier to produce comprehensive documentation if the goal is not to be the Swiss Army knife of databases. Redis has documentation for every single command the server supports. The time complexity of each command is indicated as well. All of this is available online. The example server configuration is well commented.
Strings are the simplest type of data in Redis. Strings are not restricted to ASCII or Unicode data, but can hold any series of octets. The maximum size of a string is 512 megabytes.
Integers, or maybe not
If you take a look at the data types page, Redis does not actually have a type directly corresponding to an integer. I'm not sure what led to this design decision, but I know what the consequences are. Redis has some commands which only make sense in a numerical context. The INCR command's documentation includes this interesting note.
Note: this is a string operation because Redis does not have a dedicated integer type. The string stored at the key is interpreted as a base-10 64 bit signed integer to execute the operation.
Using redis-cli to communicate with a running Redis instance reveals some interesting behavior.
    redis 127.0.0.1:6379> set bar 100
    OK
    redis 127.0.0.1:6379> incr bar
    (integer) 101
First bar is set to 100. Then bar is incremented and the new value is returned. All is well and good. But it turns out Redis doesn't really support integers, so even if bar exists, the INCR command can fail.
    redis 127.0.0.1:6379> set bar notAnInteger
    OK
    redis 127.0.0.1:6379> incr bar
    (error) ERR value is not an integer or out of range
As you can see, INCR fails in this example. Admittedly, this is a contrived example. But take this example:
    redis 127.0.0.1:6379> set bar "\x35"
    OK
    redis 127.0.0.1:6379> incr bar
    (integer) 6
    redis 127.0.0.1:6379> set bar "\x00"
    OK
    redis 127.0.0.1:6379> incr bar
    (integer) 1
    redis 127.0.0.1:6379> set bar "\x01"
    OK
    redis 127.0.0.1:6379> incr bar
    (error) ERR value is not an integer or out of range
In each case, bar is first set to a string of length 1. Setting bar to the byte 53 (hex 0x35) just happens to be ASCII for "5", so INCR increments it to 6. Next, bar is set to 0, which is ASCII for the NULL character. For whatever reason, calling increment on this results in it having a value of ASCII "1". The ASCII character code for the character "1" is actually 49, so I'm not really sure how 0 + 1 = 49. There is no consistency in this aspect, as setting bar to 1 simply causes the INCR to fail.
But it actually gets much more interesting.
    redis 127.0.0.1:6379> set bar "\x00abcdefg"
    OK
    redis 127.0.0.1:6379> incr bar
    (integer) 1
    redis 127.0.0.1:6379> get bar
    "1"
    redis 127.0.0.1:6379> set bar "a\x00abcdefg"
    OK
    redis 127.0.0.1:6379> incr bar
    (error) ERR value is not an integer or out of range
    redis 127.0.0.1:6379> set bar "99\x00abcdefg"
    OK
    redis 127.0.0.1:6379> incr bar
    (integer) 100
    redis 127.0.0.1:6379> get bar
    "100"
In each case, bar is set to a string containing the NULL character. Calling INCR with the value of "\x00abcdefg" somehow results in a value of "1". Where did the data after the null byte go? At least with a value of "a\x00abcdefg" it just blows up completely. But with a value of "99\x00abcdefg", "100" is the result. While the documentation states "The string stored at the key is interpreted as a base-10 64 bit signed integer", what it really means is that the value stored at the key is first interpreted as a NULL-terminated ASCII string. If that string is a base-10 integer, Redis increments everything that came before the null byte. Obviously, this means all the data after the null byte is lost. This does not seem in any way sensible to me.
The Kitchen Sink
Redis also supports Lists, Hashes, and Sets. These data structures should not be at all surprising, but the values and keys are always strings. The great thing about doing this with Redis is that you can have many processes simultaneously working away on the same data structures and it is all done atomically.
The hash type in Redis does what you would expect: it maps from some set of values to another set of values. Of course, it always maps from strings to strings. But that's not that big of a deal. What is confusing is the choice of vocabulary. To add something to a hash, use the HSET command. The HSET command takes 3 arguments: the key, the field, and the value. Key is the actual key in Redis that is to be treated as a hash. The field name is the field within the key that is assigned. This is demonstrated below:
    redis 127.0.0.1:6379> HSET foo meow kitty
    (integer) 1
Redis assigns the field meow to have a value of kitty in the key foo. The key foo is now a hash if it did not exist before. But to get all the fields in the key foo, you have to do the following.
    redis 127.0.0.1:6379> HKEYS foo
    1) "meow"
Despite referring to meow as a field in the context of the HSET command, you use the command HKEYS to get all of the fields in a hash. This sort of vocabulary madness makes the documentation less than clear. It also makes for great meeting room discussions where, after 20 minutes of arguing, you realize that what you think everyone else calls fields are really keys. Or maybe it is the other way around. Why not just name the command after fields instead?
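To keep the two levels straight, it can help to picture a Redis hash as a dictionary nested inside a dictionary. The sketch below is a plain Python stand-in, not a Redis client: store, hset, and hkeys are hypothetical names chosen to mirror the commands.

```python
# Outer dict: Redis keys. Inner dict: the fields of one hash.
store = {}


def hset(key, field, value):
    """Mimic HSET: return 1 if the field is new, 0 if it was overwritten."""
    fields = store.setdefault(key, {})
    is_new = field not in fields
    fields[field] = str(value)  # hash values are always strings
    return int(is_new)


def hkeys(key):
    """Mimic HKEYS: list the fields (inner keys) of the hash at key."""
    return list(store.get(key, {}))


print(hset("foo", "meow", "kitty"))  # 1
print(hkeys("foo"))                  # ['meow']
```

In this picture "key" is the outer dictionary key and "field" is the inner one, so HKEYS returns what HSET calls fields.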
Redis also supports sorted sets. Each member of the set has a floating point number associated with it that is referred to as its score. Members are sorted according to score. Initially, I thought I would have a great number of uses for sorted sets. It turns out I didn't use them at all, so I won't be discussing them any further.
Storage and Persistence
Internally, Redis keeps all data in memory. This allows it to achieve relatively high performance. Persistence is achieved by periodically calling fork() and having the child process write the entirety of the dataset to a file. Alternatively, Redis can be configured to write to an append-only file, just recording what has changed. In my case, I opted to disable these mechanisms since durability of the data is irrelevant to me.
Since Redis stores all data in memory, the size of your dataset is naturally constrained by the available memory. You can run Redis with or without an explicit memory limit. If you run Redis without an explicit limit, a large enough dataset will spill over into swap space. Naturally, the performance of Redis plummets when this occurs.
With explicit limits set, Redis allows a couple of policies on how to handle hitting the memory limit.
The first policy is to do nothing. The write which would have caused Redis to exceed the memory limit fails. Any subsequent write fails. Removing keys from the data store frees up space and writes will succeed again. This is the policy I am using presently.
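The "do nothing" policy can be illustrated with a toy store that has a byte budget. This is a sketch under heavy simplification: BoundedStore is a hypothetical class, and its accounting counts only key and value lengths, which is not how Redis measures memory.

```python
class BoundedStore:
    """Toy key-value store that refuses writes once a byte budget is hit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.data = {}

    def used_bytes(self):
        # Crude accounting: key and value lengths only, no overhead.
        return sum(len(k) + len(v) for k, v in self.data.items())

    def set(self, key: bytes, value: bytes) -> bool:
        old = self.data.get(key)
        freed = len(key) + len(old) if old is not None else 0
        needed = self.used_bytes() - freed + len(key) + len(value)
        if needed > self.max_bytes:
            return False  # the write fails; nothing is evicted
        self.data[key] = value
        return True

    def delete(self, key: bytes) -> bool:
        # Removing a key frees space so later writes can succeed.
        return self.data.pop(key, None) is not None
```

With a budget of 10 bytes, a first 6-byte entry fits, a second one is rejected, and it is accepted again once the first key has been deleted.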
For Redis support in Python, the clients page describes redis-py as "Mature and supported. Currently the way to go for Python." This package installs the redis module. The first thing that jumped out at me is you'd think redis.Redis is a client. It turns out it is, but it actually implements an old version of the API that does not correspond to the commands in the official documentation. If you want to take advantage of the official documentation, you need to use redis.StrictRedis.
In an interactive session, getting started is easy:
    >>> import redis
    >>> r = redis.StrictRedis()
    >>> r.ping()
    True
By default, a TCP connection is established to localhost. The ping command checks to see if the server is alive. If Redis is running on the local machine, you can use Unix sockets to connect.
    >>> import redis
    >>> r = redis.StrictRedis(unix_socket_path='/tmp/redis.sock')
    >>> r.ping()
    True
This was the point in time I discovered a pretty subtle bug. If you pass a value for unix_socket_path that has a truth value of false, it still connects using TCP.
    >>> import redis
    >>> redis.StrictRedis(unix_socket_path=0).ping()
    True
The patch to fix this was pretty simple. The bug stems from the usage of None as a sentinel value for a keyword argument to the __init__ function of the object.
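The general shape of this kind of bug is easy to show in isolation. The sketch below is not the redis-py source or the actual patch; both function names are hypothetical and only illustrate the difference between a truthiness test and an explicit comparison against the None sentinel.

```python
def choose_transport_buggy(host="localhost", port=6379, unix_socket_path=None):
    # Truthiness test: 0, "" and None all count as "not provided".
    if unix_socket_path:
        return ("unix", unix_socket_path)
    return ("tcp", host, port)


def choose_transport_fixed(host="localhost", port=6379, unix_socket_path=None):
    # Identity test: only the None sentinel means "not provided".
    if unix_socket_path is not None:
        return ("unix", unix_socket_path)
    return ("tcp", host, port)


print(choose_transport_buggy(unix_socket_path=0))  # ('tcp', 'localhost', 6379)
print(choose_transport_fixed(unix_socket_path=0))  # ('unix', 0)
```

With the explicit check, a bogus value such as 0 is at least passed along to the Unix socket code path, where it can fail loudly instead of silently falling back to TCP.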
Overall, the API is quite pythonic. Any command described in the Redis documentation is a method on the StrictRedis object, but in all lowercase. One very important thing to note is that the API coerces datatypes when possible.
    >>> import redis
    >>> r = redis.StrictRedis()
    >>> r.set('testVal', 5)
    True
    >>> r.get('testVal')
    '5'
In this example, the number 5 is used as an argument to set. Calling get for the same key returns the string '5'. This isn't a bug, but is very important to consider. Failure to do so leads to seemingly impossible conditions.
    >>> import redis
    >>> r = redis.StrictRedis()
    >>> import math
    >>> r.set('pi', math.pi)
    True
    >>> if r.get('pi') != math.pi:
    ...     print 'What'
    ...
    What
The same as before, math.pi becomes a string in the call to set, so the value that get returns never compares equal to the float.
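The way out is to convert the reply back to the type you stored before comparing. The sketch below needs no server: round_trip is a hypothetical stand-in for set followed by get, and it assumes the reply arrives as text (a client that returns bytes would need a decode step first). It is written for Python 3.

```python
import math


def round_trip(value):
    """Stand-in for set followed by get: the value comes back as text."""
    return str(value)


raw = round_trip(math.pi)

print(raw == math.pi)         # False: a string never equals a float
print(float(raw) == math.pi)  # True on Python 3

# A tolerance is safer if the textual form may have been truncated.
print(math.isclose(float(raw), math.pi))  # True
```

The same applies to integers: int(round_trip(5)) gives back 5, while the raw reply is the string '5'.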
Overall, I am pretty pleased with Redis. I think it is important to fully define the behavior as to how strings get treated as integers. The Python API is quite pleasant to use. Redis seems to be going through a great deal of growth in its userbase currently, so it is likely such details will be hammered out in the near future.
It took me about a weekend to get my application using Redis. I wound up changing the choice of the data structures I was using midway through, which probably accounted for most of the time.