Palzin Track
Get 15% off with code PTRACKSIGNUP15 

Laravel Diary Logo

Practical session with JSON in Clickhouse

clickhouse
Table of Contents

Dealing with dynamic data structures can be challenging, especially when the set of features varies over time or under different circumstances. JSON provides a flexible solution for handling dynamic data, and ClickHouse offers convenient tools for working with it.

The simplest (and single as of now) way to store JSON object is to create String typed column and store JSON text representation inthere:

CREATE TABLE test\_string ( \`t\` DateTime, **\`v\` String** )  
ENGINE = MergeTree ORDER BY t

We’re going to save JSON data under v column:

INSERT INTO test\_string VALUES(now(), **'{"name":"Joe","age":95}'**)

Clickhouse provides several functions to work with JSON data, including validating JSON data itself.

We can simply check if JSON is valid using isValidJSON function:

SELECT v, isValidJSON(v) FROM test\_string

Which returns 1 (or 0) if JSON is valid (or not):

We can also check if object contains certain key (or any of given keys) which is useful during data cleansing procedures:

SELECT  
**JSONHas(v, 'name') AND JSONHas(v, 'rating')** AS is\_valid,  
count(\*)  
FROM test\_string GROUP BY **is\_valid**

Here we require our JSON field to have name and rating keys defined to be marked as valid:

As we can see, our table contains 2 invalid values.

In most cases we want to operate with JSON attributes (keys) values, which can be done in multiple ways. First of all there is a function to extract typed values based on key names:

SELECT  
**JSONExtract**(v, 'name', 'String'),  
**JSONExtract**(v, 'rating', 'UInt32')  
FROM test\_string LIMIT 5

Here we extract name key value as string and rating key value as [unsigned 4 bytes] integer from v column value (which stores JSON):

We can extract scalar types (like String, Int/UInt* or Float*) or complex structures, like Array or Tuple (or combinations of any):

SELECT JSONExtract('{"val": \[1,2,3,4\]}', 'val', 'Array(UInt8)')**\[2\]**

Which will return second integer value from val array (which is a JSON key):

Another option to extract value (without having to think about types) is to use JSON_VALUE function:

SELECT **JSON\_VALUE(v, '$.name')** FROM test\_string LIMIT 5

Which, again, will simply list value of name key of v JSON column:

As we know, Clickhouse allows using function to build sorting keys based on. That’s why we can use JSON extraction functions in indexes to optimize certain queries:

CREATE TABLE test\_index ( \`t\` Int64, \`v\` String )  
ENGINE = MergeTree **ORDER BY JSONExtractUInt(v, 'rating')**

Here, we’ve used extracted rating key value as a sorting key, so the relevant queries will be quite efficient:

SELECT count(\*) FROM test\_index  
WHERE **JSONExtractUInt(v, 'rating') = 140970729**

Clickhouse will answer in the very efficient way in this case since it uses index for that:

On the contrary, if we filter on the JSON key which is not under index, full table scan follows. For example, the same query on the same table/data, but with a different sort index results in the following:

250x rows scanned, 250x RAM used and 35x time spent. Not good.

If you know your JSON field is going to have exactly certain keys of exactly certain types, it is better to move them out of JSON value into separate columns before writing to Clickhouse. Doing so will save space (because of saving data using relevant types instead of strings) and improve performance (since no JSON extraction overhead will happen).

Clickhouse has experimental JSON type which is too early to consider for production, but let’s take a look at that. Since the feature is experimental we have to:

SET allow\_experimental\_object\_type = 1

Now we can use JSON type while creating tables:

CREATE TABLE test\_json ( \`t\` DateTime, **\`v\` JSON** )  
ENGINE = MergeTree ORDER BY t

First cool thing about this type, is that we can directly use object notation in queries:

SELECT **v.name, v.rating** FROM test\_json LIMIT 5

And Clickhouse knows what to do:

Second cool thing about this type is that it takes less space (~30% saved in our case) than String field usage:

But one downside here is the speed of inserting data. If we compare insertion speed of the same data into String and JSON columns we can see almost 10 fold difference:

Anyway, this is still experimental, so hopefully Clickhouse team will document (in details) and release this soon.

Working with JSON in Clickhouse is as simple as using String field for data and then using set of JSON* functions to check and extract key values from JSON fields:

SELECT JSONExtract(json_col, 'key', 'String') FROM table

We can use extraction functions as indexes (or part of indexes) to improve query performance, but consider moving those to separate columns instead and leave only dynamic set of keys for storing in JSON columns.

::Share it on::

Comments (0)

What are your thoughts on "Practical session with JSON in Clickhouse"?

You need to create an account to comment on this post.

Related articles