Import CSV Data into Redis with a Single CLI Command

A few weeks back at talpasolutions I needed, purely for performance-testing purposes, to import a .tsv file (roughly 600,000 lines of time-stamped sensor data in 15 columns) to see how Redis handles it, especially memory-wise. As it turned out, the import is feasible with a single, one-(long)-line CLI command.

Using the pipe mode (redis-cli --pipe), this is what I came up with (full command and corresponding .tsv below, also for Redis data types other than streams):

awk -F '\t' 'FNR > 1 {print "XADD some_stream "($1=="" ? "NA" : $1)" column_1 "($2=="" ? "NA" : $2)" column_2 "($3=="" ? "NA" : $3)" ... \n"}' /path/to/data.tsv | redis-cli --pipe 

Import Choices

Field Delimiter

If your field delimiter is a tab, then use:

awk -F '\t' ...
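The same approach works for comma-separated files; a quick sketch (hypothetical inline data, no Redis needed) to confirm the delimiter splits the columns as expected:

```shell
# Feed one hypothetical comma-separated row through awk with -F ','
printf '1549878790,foo,bar\n' | awk -F ',' '{print $2}'
# → foo
```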

Header Row

If you have a header row to ignore on import, add this to your awk statement:

... FNR > 1  ...
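A minimal sketch (hypothetical two-line file) showing that FNR > 1, awk's per-file record counter, drops the header row:

```shell
# Hypothetical sample: one header row plus one data row
printf 'ts\tcolumn_1\n1549878790\tfoo\n' > /tmp/fnr_demo.tsv

# FNR is awk's per-file line counter; FNR > 1 skips line 1 (the header)
awk -F '\t' 'FNR > 1 {print "XADD some_stream " $1 " column_1 " $2}' /tmp/fnr_demo.tsv
# → XADD some_stream 1549878790 column_1 foo
```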

String Qualifiers and Whitespace in Cell Values

This was one of the trickiest parts, as in the beginning I often got the error message ERR Protocol error: unbalanced quotes in request.
If your .csv cell value does not already contain double quotes, you can use either variant and experiment a little:

... column_1 "$2" ...
... column_1 \""$2"\" ...

If the value already has quotes around the string, you must use the first option; otherwise you will also run into the above error message.
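To see the difference without a running Redis, here is a sketch with a hypothetical cell containing spaces; the escaped-quotes variant wraps the bare value so redis-cli --pipe treats it as one argument:

```shell
# Hypothetical row: timestamp plus a bare cell containing spaces
printf '1549878790\tSome Long String\n' > /tmp/quote_demo.tsv

# Wrap the bare cell in escaped double quotes from within awk
awk -F '\t' '{print "XADD some_stream " $1 " column_1 \"" $2 "\""}' /tmp/quote_demo.tsv
# → XADD some_stream 1549878790 column_1 "Some Long String"
```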

Empty Cells

To avoid misinterpretation by redis-cli due to empty cells (and thus missing parameters), use a shorthand if (ternary operator) to check for empty cells and fill them with e.g. "NA":

column_2 "($3=="" ? "NA" : $3)"
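A sketch with a hypothetical row whose middle cell is empty (two adjacent tabs) shows the ternary substituting "NA":

```shell
# Hypothetical row with an empty middle cell (note the two adjacent tabs)
printf '1549878790\t\t42\n' > /tmp/empty_demo.tsv

# ($2=="" ? "NA" : $2) fills the empty cell so no XADD parameter goes missing
awk -F '\t' '{print "ts " $1 " column_1 " ($2=="" ? "NA" : $2) " column_2 " $3}' /tmp/empty_demo.tsv
# → ts 1549878790 column_1 NA column_2 42
```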

(Valid) JSON strings

If your .csv happens to have JSON-like strings in some cells ... it depends ;-> on:

  • whether these are already valid, i.e. especially have double quotes around the keys
  • the string qualifier in the .csv
  • whether you want to store them (immediately) as valid JSON strings (and thus directly parsable in the code which fetches them from Redis)
  • or whether you can afford to store them in a "relaxed"/JSON5 style and deal with them later in your code

In the original file I had, there were no quotes around the JSON keys. But for the worst case, i.e. your string qualifier is double quotes and you need to store the strings in a valid way, you will need to escape the quotes around the JSON keys in your .csv:

... "{\"key_1\": [5.1234567,15.1234567], \"key_2\": 5.000000, \"key_3\": 1, \"key_4\":-1}" ...

At least I did not come up with anything else in the CLI command per se (thus circumventing the need to clean up your raw data ... but maybe this thread or this may enlighten you more than me).
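A sketch of what the escaped variant produces (hypothetical one-key JSON cell, no Redis needed): the \" escapes survive awk untouched, so the quotes in the generated command stay balanced:

```shell
# Hypothetical cell whose JSON key quotes are escaped as \" in the raw .tsv
printf '1549878790\t"{\\"key_1\\": 1}"\n' > /tmp/json_demo.tsv

# The cell is passed through verbatim; the escapes keep the quotes balanced
awk -F '\t' '{print "XADD some_stream " $1 " column_1 " $2}' /tmp/json_demo.tsv
# → XADD some_stream 1549878790 column_1 "{\"key_1\": 1}"
```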

Other Gotchas

New Line character

Don't forget to print a \n at the end of the awk command:

... \n"}' /path/to/data.tsv | redis-cli --pipe

Extra (Redis Streams only): CSV Timestamp as Redis Stream ID

If the timestamps in your .csv column are not unique but you want to use them as the Redis stream IDs, you will need to come up with some kind of index variable or similar for the import (the timestamps in the .tsv I had were unique and sorted in ascending order).
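One possible workaround, a sketch only (stream name and columns are hypothetical): use awk's FNR as the sequence part of an explicit <ms>-<seq> stream ID, which keeps duplicate timestamps distinct as long as the rows are sorted ascending by timestamp:

```shell
# Hypothetical sample with a duplicated timestamp in the first column
printf 'ts\tval\n1549878790\ta\n1549878790\tb\n' > /tmp/dup_demo.tsv

# XADD accepts explicit IDs of the form <ms>-<seq>; FNR makes the
# sequence part strictly increasing within equal timestamps
awk -F '\t' 'FNR > 1 {print "XADD some_stream " $1 "-" FNR " val " $2}' /tmp/dup_demo.tsv
# → XADD some_stream 1549878790-2 val a
# → XADD some_stream 1549878790-3 val b
```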

Actual Import

Sample Data

Here is a simplified version:

ts	column_1	column_2	column_3	column_4	column_5
1549878790	"{\"key_1\": [0.1234567,10.1234567], \"key_2\": 0.000000, \"key_3\": 1, \"key_4\":-1}"	true	0	-5	"Some Other Long String 1"
1549878791	"{\"key_1\": [1.1234567,11.1234567], \"key_2\": 1.000000, \"key_3\": 1, \"key_4\":-1}"	false	0		"Some Other Long String 2"
1549878792	"{\"key_1\": [2.1234567,12.1234567], \"key_2\": 2.000000, \"key_3\": 1, \"key_4\":-1}"	false	0	-44	"Some Other Long String 3"
1549878793	"{\"key_1\": [3.1234567,13.1234567], \"key_2\": 3.000000, \"key_3\": 1, \"key_4\":-1}"	false	0	-1234	"Some Other Long String 4"
1549878794	"{\"key_1\": [4.1234567,14.1234567], \"key_2\": 4.000000, \"key_3\": 1, \"key_4\":-1}"	false	0	0	"Some Other Long String 5"
1549878795	"{\"key_1\": [5.1234567,15.1234567], \"key_2\": 5.000000, \"key_3\": 1, \"key_4\":-1}"		123	-166	"Some Other Long String 6"
1549878796	"{\"key_1\": [6.1234567,16.1234567], \"key_2\": 6.000000, \"key_3\": 1, \"key_4\":-1}"	true	125	-125	"Some Other Long String 7"
1549878797	"{\"key_1\": [7.1234567,17.1234567], \"key_2\": 7.000000, \"key_3\": 1, \"key_4\":-1}"	false	0	571
1549878798	"{\"key_1\": [8.1234567,18.1234567], \"key_2\": 8.000000, \"key_3\": 1}"	false	0	43375	"Some Other Long String 9"
1549878799	"{\"key_1\": [9.1234567,19.1234567], \"key_2\": 9.000000, \"key_3\": 1}"	false	1	10125	"Some Other Long String 10"

Sample Import

If you have (finally) sorted out everything from above, the actual import (here with XADD) is straightforward and, as always with Redis, very fast:

docker volume create redis_csv_import

docker run -v redis_csv_import:/data --name some-redis -d redis:5.0.3 redis-server --appendonly yes

docker cp redis_import_fake.tsv some-redis:/tmp

docker exec -it some-redis /bin/bash
# All commands now within the container

# to see if file is in the container
# ls -al /tmp/

# to check if awk recognizes your columns correctly
# awk -F'\t' '{print NF; exit}' /tmp/redis_import_fake.tsv

# for troubleshooting the format of your raw data
# cat -A /tmp/redis_import_fake.tsv

# do import
awk -F '\t' 'FNR > 1 {print "XADD some_stream \""($1=="" ? "NA" : $1)"\" column_1 "($2=="" ? "NA" : $2)" column_2 \""($3=="" ? "NA" : $3)"\" column_3 \""($4=="" ? "NA" : $4)"\" column_4 \""($5=="" ? "NA" : $5)"\" column_5 "($6=="" ? "NA" : $6)" \n"}' /tmp/redis_import_fake.tsv | redis-cli --pipe

# verify import
redis-cli XRANGE some_stream - +
 1) 1) "1549878790-0"
    2)  1) "column_1"
        2) "{\"key_1\": [0.1234567,10.1234567], \"key_2\": 0.000000, \"key_3\": 1, \"key_4\":-1}"
        3) "column_2"
        4) "true"
        5) "column_3"
        6) "0"
        7) "column_4"
        8) "-5"
        9) "column_5"
       10) "Some Other Long String 1"
 2) 1) "1549878791-0"
    2)  1) "column_1"
        2) "{\"key_1\": [1.1234567,11.1234567], \"key_2\": 1.000000, \"key_3\": 1, \"key_4\":-1}"
        3) "column_2"
        4) "false"
        5) "column_3"
        6) "0"
        7) "column_4"
        8) "NA"
        9) "column_5"
       10) "Some Other Long String 2"
 ...

The CLI statement also works for other Redis data types, e.g. HMSET:

# do import
awk -F '\t' 'FNR > 1 {print " HMSET some_hash:ts:"($1=="" ? "NA" : $1)" column_1 "($2=="" ? "NA" : $2)" column_2 "($3=="" ? "NA" : $3)" column_3 "($4=="" ? "NA" : $4)" column_4 "($5=="" ? "NA" : $5)" column_5 "($6=="" ? "NA" : $6)" \n"}' /tmp/redis_import_fake.tsv | redis-cli --pipe
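To inspect the generated HMSET commands without a running Redis, you can drop the pipe; a sketch with a hypothetical one-row file:

```shell
# Hypothetical sample: header plus one data row
printf 'ts\tval\n1549878790\tfoo\n' > /tmp/hash_demo.tsv

# The timestamp becomes part of the hash key instead of the stream entry ID
awk -F '\t' 'FNR > 1 {print "HMSET some_hash:ts:" $1 " column_1 " $2}' /tmp/hash_demo.tsv
# → HMSET some_hash:ts:1549878790 column_1 foo
```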

# verify import
redis-cli keys "*"
 1) "some_hash:ts:1549878791"
 2) "some_hash:ts:1549878797"
 3) "some_hash:ts:1549878798"
 4) "some_hash:ts:1549878790"
 5) "some_hash:ts:1549878792"
 6) "some_hash:ts:1549878796"
 7) "some_hash:ts:1549878799"
 8) "some_hash:ts:1549878793"
 9) "some_hash:ts:1549878794"
10) "some_hash:ts:1549878795"

Further Information

Redis Docs:
https://redis.io/topics/mass-insert
https://redis.io/topics/streams-intro
https://redis.io/commands/xadd
https://redis.io/commands/hmset

Gist which inspired my import approach:
https://gist.github.com/arsperger/6f246f21279edf3cd03ba2bee19daaef

JSON strings:
https://stackoverflow.com/questions/21065225/redis-command-line-set-value-containing-double-quotes
https://stackoverflow.com/questions/9637517/parsing-relaxed-json-without-eval