What is Redis?

This post is the beginning of a series that I have been planning for some time. These will mostly be notes that I made while going down one rabbit hole or another. Some of them have accumulated over a few years, while others are leisure reading turned interesting.

Inspired by “How to Use Redis With Python” – Real Python

So, here goes…

Redis stands for REmote DIctionary Server

In a crude way, you can think of Redis as an advanced version of a Python dictionary that you can store remotely, in memory. The in-memory part is important, because that's why Redis is so fast. It stays in memory, and it's "remote" because you can host it somewhere else (say, in the cloud) and query it repeatedly. It follows a client-server architecture.

You can install redis on ubuntu by doing:

sudo apt install redis-server

This also installs the redis-cli (command line interface) which you can use as a client.

To start the server, just run redis-server, optionally followed by the path of a configuration file.
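
For example, on Ubuntu the package typically drops its config at /etc/redis/redis.conf (adjust the path for your setup):

$ redis-server /etc/redis/redis.conf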

Basics

In a new window, now do

redis-cli ping

You'll get PONG as the response. The server's up and running (port 6379 by default).

Typing only redis-cli will lead us into an interactive REPL (Read Eval Print Loop), where you can type a command and see the result instantly. But this is mostly just for playing around; most of the time, you'll be using some kind of wrapper around Redis. In our case that's redis-py, a Python client for Redis. But before that, let's play in the shell (redis-cli) itself.

In the shell:

127.0.0.1:6379> PING
PONG

The shell prompt is host:port. And ping works here too.

127.0.0.1:6379> SET dummy_key some_value
OK

127.0.0.1:6379> KEYS *
1) "dummy_key"

127.0.0.1:6379> GET dummy_key
"some_value"

This created a key-value pair, like:

{
    "dummy_key": "some_value"
}

If the key doesn't exist, we get (nil)
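
For instance, GETting a key that was never set:

127.0.0.1:6379> GET nonexistent_key
(nil)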

WARNING: Never run KEYS * on any DB whose magnitude you are not aware of. Since it fetches every single key in the DB, it will block the server (and likely your application) if there are enough keys.
Instead, do something controlled like a SCAN:

127.0.0.1:6379> scan 0 match * count 5 
1) "0"
2) 1) "key1"
   2) "key2"
   3) "dummy_key"


This scan starts from the beginning (0 is the cursor), matches all (*) keys, and examines about 5 keys per iteration (COUNT is a hint, not a strict limit). Its result has two parts: part 1 is the cursor you pass to the next SCAN call to fetch the next batch; a returned cursor of "0" means the iteration is complete. Part 2 is the result (the matched keys) we wanted.

Setting multiple key value pairs in a single command:

127.0.0.1:6379> MSET key1 value1 key2 value2
OK
127.0.0.1:6379> GET key1
"value1"
127.0.0.1:6379> MGET key1 key2
1) "value1"
2) "value2"

Hitherto, we've just created key-value pairs, both halves of which were strings. Redis supports multiple types of values, BUT **only string keys**

To create a hash (a dictionary):

127.0.0.1:6379> HSET vehicles Bike "Harley Davidson"
(integer) 1
127.0.0.1:6379> HSET vehicles Car "Mercedes"
(integer) 1
127.0.0.1:6379> HGETALL vehicles
1) "Bike"
2) "Harley Davidson"
3) "Car"
4) "Mercedes"

Or, we could do it in a single line: HSET vehicles Bike "Harley Davidson" Car "Mercedes" (HMSET also works, but it has been deprecated since Redis 4.0 in favor of the variadic HSET)

This is equivalent to:

{
   "vehicles": {
        "Bike": "Harley Davidson",
        "Car": "Mercedes",
   }
}

Remember that Redis itself is already a dictionary. The vehicles hash we created is an entry in that dictionary. So there are 2 dictionaries now,
1. Redis
2. vehicles

But we can't go beyond this. Redis only supports dictionaries (hashes) where both keys and values are strings. To store a nested dictionary, we have to serialize it into some format from which the nested structure can be recovered, like a JSON string of the dictionary.
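
For example, we could store the whole vehicles structure as one JSON string (user:1 is just a made-up key name):

127.0.0.1:6379> SET user:1 "{\"Bike\": \"Harley Davidson\", \"Car\": \"Mercedes\"}"
OK
127.0.0.1:6379> GET user:1
"{\"Bike\": \"Harley Davidson\", \"Car\": \"Mercedes\"}"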

But our sample dictionary is flawed: if we have to store multiple cars, we would need to append to the string or maybe use a CSV format. It'd be better to use a List in this case.

127.0.0.1:6379> LPUSH Car Mercedes Tesla BMW
(integer) 3
127.0.0.1:6379> LRANGE Car 0 2
1) "BMW"
2) "Tesla"
3) "Mercedes"

LPUSH pushes items onto the head (left end) of the list Car, which is why they come back in reverse order

LRANGE shows the contents of list Car from index 0 to 2 (indices start at 0 and go to n-1; LRANGE Car 0 -1 would show the whole list)

A list of all data types Redis supports as values can be found in the Redis documentation

Now let’s continue from here in Python!

pip install redis

Fire up python shell and do:

>>> import redis

>>> r = redis.Redis()

>>> r.lrange("Car", 0, 2)
[b'BMW', b'Tesla', b'Mercedes']

See! This is a fresh shell, and the data created in another window using redis-cli is still present, because the Redis server is still running (we didn't stop it after entering data from the CLI) and we are accessing the same database.

Also, the commands stayed the same (except setting up our client), like lrange here.

Even r.ping() can be used (In python, it returns True)

The same Redis server supports up to 16 different databases out of the box (the number is configurable in the config file)!

By default we use db 0, we can specify the db number using:

redis-cli -n 1

or

r = redis.Redis(db=1)
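
The databases are isolated from each other; our Car list lives only in db 0:

>>> r1 = redis.Redis(db=1)
>>> r1.lrange("Car", 0, 2)
[]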

In Python, we can use primitive data types for values, like:

>>> r.lpush("dummy", b"bytes", "string", 1, 3.14)

>>> r.lrange("dummy", 0, 4)
[b'3.14', b'1', b'string', b'bytes']

Only these 4 types (bytes, str, int, float) are allowed; redis-py converts them to their string/bytes representations before sending them over.
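
Anything else is rejected by the client before it even reaches the server (the exact message varies across redis-py versions):

>>> r.lpush("dummy", {"a": 1})
Traceback (most recent call last):
  ...
redis.exceptions.DataError: Invalid input of type: 'dict'. ...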

Interesting concepts

The article became really gripping when it got into the depths of redis-py usage. In short, the highlights are:

1. Pipelining

If we have multiple entries to make, there will be at least n round trips to the DB for n insertions. Pipelining allows us to perform all n of those operations in a single trip!

In python, we do:

>>> dummy_dict = {
...     "a": 1,
...     "b": 2,
...     "c": 3,
...     "d": 4,
... }

>>> with r.pipeline() as pipe:
...     for key, value in dummy_dict.items():
...         pipe.set(key, value)
...     pipe.execute()
... 

>>> r.get('a')
b'1'

The single trip was performed at pipe.execute().

On shell, we see why it is called pipelining:

$ echo -e "set a 1 \nset b 2 \nset c 3\n set d 4" | nc localhost 6379
+OK
+OK
+OK
+OK

^C

All 4 commands, separated by \n, were sent to the Redis server running at localhost port 6379 via the Netcat (nc) command, and all 4 operations were performed in one go.

Although this is just a sample, we could actually do this specific operation in a single line:
mset a 1 b 2 c 3 d 4

2. Transaction Lock

Using this, you can make a set of commands atomic: a single unit that is executed sequentially, without being interrupted by any other request.
This ensures that either all of the commands in a transaction are executed or none.

Syntax for this is simple and similar for both cli and python:

MULTI
INCR foo
INCR bar
EXEC
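
In the cli, the queued commands only run when EXEC is issued (outputs below assume foo and bar don't exist yet):

127.0.0.1:6379> MULTI
OK
127.0.0.1:6379> INCR foo
QUEUED
127.0.0.1:6379> INCR bar
QUEUED
127.0.0.1:6379> EXEC
1) (integer) 1
2) (integer) 1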

In python:

with r.pipeline() as pipe:
    pipe.multi()
    pipe.incr("foo")
    pipe.incr("bar")
    pipe.execute()

Since foo and bar were not defined, INCR treats a missing key as 0, so the first increment sets each of them to 1

3. Optimistic Locking

This was a first for me. I had read about locking, which essentially means that while one request is modifying certain data, nothing else can modify that data during that time (other transactions can, at most, read it meanwhile).

What optimistic locking does is stand guard over certain data. It keeps watching that data while letting you prepare your transaction on the side. If the watched data is changed by someone else while you are performing the transaction, your transaction fails and you have to redo it.

Let’s try it in python:

with r.pipeline() as pipe:
    pipe.watch("mykey")
    pipe.multi()
    pipe.incr("mykey")
    pipe.execute()

This will execute perfectly, since there is no race condition: no other client is modifying mykey at the same time.
Let's try to emulate a race, though:

import time

with r.pipeline() as pipe:
    pipe.watch("mykey")

    time.sleep(5)

    pipe.multi()
    pipe.incr("mykey")
    pipe.execute()

And during the sleep, do set mykey 100 in your shell.

Result:

WatchError: Watched variable changed.

The key thing to note here is that the client who put the WATCH on mykey is still allowed to modify it. But when another client does so, the MULTI-EXEC block fails for the original client.
After an EXEC, all WATCHes are removed, irrespective of the result of the EXEC.

Yay! To use this in practice, one runs an infinite loop that executes the pipeline in a try block and breaks out on success, as is wonderfully explained in the realpython article.
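
A minimal sketch of that retry pattern (assuming mykey holds an integer):

import redis

r = redis.Redis()

with r.pipeline() as pipe:
    while True:
        try:
            pipe.watch("mykey")            # from here on, a change by anyone else voids us
            current = int(pipe.get("mykey") or 0)
            pipe.multi()                   # start queueing commands
            pipe.set("mykey", current + 1)
            pipe.execute()                 # raises WatchError if mykey was changed
            break                          # success
        except redis.WatchError:
            continue                       # lost the race; reload and retry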

In the shell, we do it like this:

127.0.0.1:6379> watch mykey
OK
127.0.0.1:6379> multi
OK
127.0.0.1:6379> incr mykey
QUEUED
127.0.0.1:6379> exec
1) (integer) 6

If another client changed the value of mykey during the watch, then EXEC would fail and return (nil)

We can replicate this.

127.0.0.1:6379> watch mykey
OK

---------------------- Another shell -----------------------
127.0.0.1:6379> incr mykey
(integer) 7
-------------------- Another shell ends --------------------

127.0.0.1:6379> multi
OK
127.0.0.1:6379> set mykey 100
QUEUED
127.0.0.1:6379> exec
(nil)

Here, after our watch started, we switched to another shell and incremented mykey; this resulted in the failure of EXEC!

Other interesting features, applications

Auto expire keys: This can be used to auto-expire a key in Redis after a given number of seconds or milliseconds. After that time, neither the key nor its value will exist in the DB.
But if you put a WATCH on a volatile key (one which has its expiry set) and it expires during the watch, it will not cause EXEC to fail
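
For example, via redis-py (session and abc123 are made-up names):

>>> r.set("session", "abc123", ex=10)   # expires after 10 seconds
True
>>> r.ttl("session")                    # seconds left to live
10
>>> # ...wait 10+ seconds...
>>> r.get("session")                    # the key is gone, so this returns None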

Saving DB to disk: You can back up your Redis DB to disk too. In the config file, you specify how many changes within a certain time window should trigger a save. For example, the directive save 300 10 tells Redis to save the DB if at least 10 keys changed within 300 seconds. You can also trigger a save manually using SAVE (synchronous, blocking) or BGSAVE (async, non-blocking).
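
Triggering a save manually from the cli:

127.0.0.1:6379> BGSAVE
Background saving started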

Amazon ElastiCache for Redis: A scalable, hosted Redis server. Just point your client at the ElastiCache endpoint (pass it as the host arg in the CLI, specify it in the config file, or, in Python, pass it when creating an instance of the redis.Redis class) and all functionality stays the same!
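
In Python, that would look something like this (the endpoint below is a made-up placeholder):

>>> r = redis.Redis(host="my-cluster.abc123.use1.cache.amazonaws.com", port=6379)
>>> r.ping()
True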

Instagram's dilemma: They needed a service to map 300M photos to the user IDs that created them. A SQL DB was out of the question since no updates to the data were required, there were no relations with other tables, etc. Setting a top-level key-value pair for each record resulted in too big a database to be stored on Amazon EC2. So they used hashes, each of size 1000. Each such hash was called a bucket. The bucket number could be calculated as mediaID // 1000 (integer division); the key-value pair was then set inside that hash (the 1000-key-long bucket). This led to a 3/4th decrease in size compared to a key-value pair for every entry!
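
A rough sketch of the bucketing idea in Python (the key prefix and function names are mine, not Instagram's):

def store_owner(r, media_id, user_id):
    bucket = media_id // 1000                  # integer division picks the bucket
    r.hset(f"mediabucket:{bucket}", media_id, user_id)

def get_owner(r, media_id):
    bucket = media_id // 1000
    return r.hget(f"mediabucket:{bucket}", media_id)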

Bonus!

1. Seeing how celery uses redis

I first got to know about Redis when I was setting up celery for a backend. We used Redis as a message broker (something that lets different applications communicate) for celery.

So I was curious how celery stores data in Redis. This is how it looks:

127.0.0.1:6379> get celery-task-meta-55bae3ea-531f-4597-9ce5-cf32f22f4235

{
    "status": "SUCCESS",
    "result": null,
    "traceback": null,
    "children": [],
    "date_done": "2021-04-20T02:08:01.330722",
    "task_id": "55bae3ea-531f-4597-9ce5-cf32f22f4235"
}

I formatted the result so it would be a bit more palatable. First I listed a few keys present in the DB; then, for one of them, I simply did a get, and this is the result.

If you are familiar with celery, you may recognize some things. task.delay() results in an AsyncResult object with a UUID; this UUID is the task_id above.

On this object, you can do task.status, task.result ... and so on. This is where it fetches them from!

2. Play with redis online

https://try.redis.io/ is a great playground and an interactive tutorial too. No setup needed.

You can even open multiple instances in different tabs all linked to the same database. But it doesn’t support WATCH 😦

Phew, quite a ride. See you next time, till then,

Keep queuing your tasks! 😀

storymode7
