Practical session with JSON in Clickhouse

Table of Contents

Working with JSON in Clickhouse
Storing JSON data
JSON Validating and checking keys
Extracting values
Using JSON key values for indexing
JSON keys or separate columns?
Experimental JSON object type
Summary

Working with JSON in Clickhouse
Storing JSON data
JSON Validating and checking keys
Extracting values
Using JSON key values for indexing
JSON keys or separate columns?
Experimental JSON object type
Summary

Working with JSON in Clickhouse

Dealing with dynamic data structures can be challenging, especially when the set of features varies over time or under different circumstances. JSON provides a flexible solution for handling dynamic data, and ClickHouse offers convenient tools for working with it.

Storing JSON data

The simplest (and single as of now) way to store JSON object is to create String typed column and store JSON text representation inthere:

CREATE TABLE test\_string ( \`t\` DateTime, **\`v\` String** )  
ENGINE = MergeTree ORDER BY t

We’re going to save JSON data under v column:

INSERT INTO test\_string VALUES(now(), **'{"name":"Joe","age":95}'**)

Clickhouse provides several functions to work with JSON data, including validating JSON data itself.

JSON Validating and checking keys

We can simply check if JSON is valid using isValidJSON function:

SELECT v, isValidJSON(v) FROM test\_string

Which returns 1 (or 0) if JSON is valid (or not):

We can also check if object contains certain key (or any of given keys) which is useful during data cleansing procedures:

SELECT  
**JSONHas(v, 'name') AND JSONHas(v, 'rating')** AS is\_valid,  
count(\*)  
FROM test\_string GROUP BY **is\_valid**

Here we require our JSON field to have name and rating keys defined to be marked as valid:

As we can see, our table contains 2 invalid values.

Extracting values

In most cases we want to operate with JSON attributes (keys) values, which can be done in multiple ways. First of all there is a function to extract typed values based on key names:

SELECT  
**JSONExtract**(v, 'name', 'String'),  
**JSONExtract**(v, 'rating', 'UInt32')  
FROM test\_string LIMIT 5

Here we extract name key value as string and rating key value as [unsigned 4 bytes] integer from v column value (which stores JSON):

We can extract scalar types (like String, Int/UInt* or Float*) or complex structures, like Array or Tuple (or combinations of any):

SELECT JSONExtract('{"val": \[1,2,3,4\]}', 'val', 'Array(UInt8)')**\[2\]**

Which will return second integer value from val array (which is a JSON key):

Another option to extract value (without having to think about types) is to use JSON_VALUE function:

SELECT **JSON\_VALUE(v, '$.name')** FROM test\_string LIMIT 5

Which, again, will simply list value of name key of v JSON column:

Using JSON key values for indexing

As we know, Clickhouse allows using function to build sorting keys based on. That’s why we can use JSON extraction functions in indexes to optimize certain queries:

CREATE TABLE test\_index ( \`t\` Int64, \`v\` String )  
ENGINE = MergeTree **ORDER BY JSONExtractUInt(v, 'rating')**

Here, we’ve used extracted rating key value as a sorting key, so the relevant queries will be quite efficient:

SELECT count(\*) FROM test\_index  
WHERE **JSONExtractUInt(v, 'rating') = 140970729**

Clickhouse will answer in the very efficient way in this case since it uses index for that:

On the contrary, if we filter on the JSON key which is not under index, full table scan follows. For example, the same query on the same table/data, but with a different sort index results in the following:

250x rows scanned, 250x RAM used and 35x time spent. Not good.

JSON keys or separate columns?

If you know your JSON field is going to have exactly certain keys of exactly certain types, it is better to move them out of JSON value into separate columns before writing to Clickhouse. Doing so will save space (because of saving data using relevant types instead of strings) and improve performance (since no JSON extraction overhead will happen).

Experimental JSON object type

Clickhouse has experimental JSON type which is too early to consider for production, but let’s take a look at that. Since the feature is experimental we have to:

SET allow\_experimental\_object\_type = 1

Now we can use JSON type while creating tables:

CREATE TABLE test\_json ( \`t\` DateTime, **\`v\` JSON** )  
ENGINE = MergeTree ORDER BY t

First cool thing about this type, is that we can directly use object notation in queries:

SELECT **v.name, v.rating** FROM test\_json LIMIT 5

And Clickhouse knows what to do:

Second cool thing about this type is that it takes less space (~30% saved in our case) than String field usage:

But one downside here is the speed of inserting data. If we compare insertion speed of the same data into String and JSON columns we can see almost 10 fold difference:

Anyway, this is still experimental, so hopefully Clickhouse team will document (in details) and release this soon.

Summary

Working with JSON in Clickhouse is as simple as using String field for data and then using set of JSON* functions to check and extract key values from JSON fields:

SELECT JSONExtract(json_col, 'key', 'String') FROM table

We can use extraction functions as indexes (or part of indexes) to improve query performance, but consider moving those to separate columns instead and leave only dynamic set of keys for storing in JSON columns.

Practical session with JSON in Clickhouse

Working with JSON in Clickhouse

Storing JSON data

JSON Validating and checking keys

Extracting values

Using JSON key values for indexing

JSON keys or separate columns?

Experimental JSON object type

Summary

::Share it on::

Comments (0)

Related articles

Setting up Single Node Clickhouse on AWS EC2 and S3 for Backup

Connecting ClickHouse with AWS RDS MySQL and Migrating Data

Scaling ClickHouse Using Amazon S3 as Storage

ClickHouse Partitions

Scaling ClickHouse Using Amazon S3 as Storage

Practical session with JSON in Clickhouse

#Working with JSON in Clickhouse

#Storing JSON data

#JSON Validating and checking keys

#Extracting values

#Using JSON key values for indexing

#JSON keys or separate columns?

#Experimental JSON object type

#Summary

::Share it on::

Comments (0)

Related articles

Setting up Single Node Clickhouse on AWS EC2 and S3 for Backup

Connecting ClickHouse with AWS RDS MySQL and Migrating Data

Scaling ClickHouse Using Amazon S3 as Storage

ClickHouse Partitions

Scaling ClickHouse Using Amazon S3 as Storage

Working with JSON in Clickhouse

Storing JSON data

JSON Validating and checking keys

Extracting values

Using JSON key values for indexing

JSON keys or separate columns?

Experimental JSON object type

Summary