Recipes & Examples

Example queries

You can run the following example queries in the terminal: spyql "the_query" < a_data_file

In most cases, example data files are not provided.
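For instance, to run the first query below against a hypothetical CSV file named scores.csv (both the file name and its columns are illustrative), you would type:

spyql "SELECT a_col_name, 'positive' if int(col2) >= 0 else 'negative' AS sign FROM csv TO pretty" < scores.csv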

Query a CSV (and print a pretty table)

SELECT a_col_name, 'positive' if int(col2) >= 0 else 'negative' AS sign
FROM csv
TO pretty

Convert CSV to a flat JSON

SELECT * FROM csv TO json

Convert from CSV to a hierarchical JSON

SELECT {'client': {'id': col1, 'name': col2}, 'price': 120.40} AS json
FROM csv TO json

or

SELECT {'id': col1, 'name': col2} AS client, 120.40 AS price
FROM csv TO json

JSON to CSV, filtering out NULLs

SELECT .client.id AS id, .client.name AS name, .price
FROM json
WHERE .client.name is not NULL
TO csv
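For illustration, given this hypothetical input:

{"client": {"id": 1, "name": "Alice"}, "price": 12.3}
{"client": {"id": 2, "name": null}, "price": 7.0}

the query would output the following (the record with a null name is filtered out):

id,name,price
1,Alice,12.3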

Explode JSON to CSV

SELECT .invoice_num AS id, .items.name AS name, .items.price AS price
FROM json
EXPLODE .items
TO csv

Sample input:

{"invoice_num" : 1028, "items": [{"name": "tomatoes", "price": 1.5}, {"name": "bananas", "price": 2.0}]}
{"invoice_num" : 1029, "items": [{"name": "peaches", "price": 3.12}]}

Output:

id,name,price
1028,tomatoes,1.5
1028,bananas,2.0
1029,peaches,3.12

Python iterator/list/comprehension to JSON

SELECT 10 * cos(col1 * ((pi * 4) / 90))
FROM range(80)
TO json

or

SELECT col1
FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
TO json

Importing python modules

Here we import hashlib to calculate an md5 hash for each input line. hashlib is part of the Python standard library, so no extra installation is required.

IMPORT hashlib as hl
SELECT hl.md5(col1.encode('utf-8')).hexdigest()
FROM text
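A complete invocation reads lines from standard input; for example (the echoed text is just an illustration):

echo "hello" | spyql "IMPORT hashlib as hl SELECT hl.md5(col1.encode('utf-8')).hexdigest() FROM text"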

Getting the top 5 records

SELECT int(score) AS score, player_name
FROM csv
ORDER BY 1 DESC NULLS LAST, score_date
LIMIT 5

Aggregations

Totals by player, alphabetically ordered.

SELECT .player_name, sum_agg(.score) AS total_score
FROM json
GROUP BY 1
ORDER BY 1

Partial aggregations

Calculating the cumulative sum of a variable using the PARTIALS modifier. Also demoing the lag aggregator.

SELECT PARTIALS
    .new_entries,
    sum_agg(.new_entries) AS cum_new_entries,
    lag(.new_entries) AS prev_entries
FROM json
TO json

Sample input:

{"new_entries" : 10}
{"new_entries" : 5}
{"new_entries" : 25}
{"new_entries" : null}
{}
{"new_entries" : 100}

Output:

{"new_entries" : 10,   "cum_new_entries" : 10,  "prev_entries": null}
{"new_entries" : 5,    "cum_new_entries" : 15,  "prev_entries": 10}
{"new_entries" : 25,   "cum_new_entries" : 40,  "prev_entries": 5}
{"new_entries" : null, "cum_new_entries" : 40,  "prev_entries": 25}
{"new_entries" : null, "cum_new_entries" : 40,  "prev_entries": null}
{"new_entries" : 100,  "cum_new_entries" : 140, "prev_entries": null}

If PARTIALS were omitted, the result would be a single row, equivalent to the last output row above.
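For comparison, here is the same query without the PARTIALS modifier; on the sample input above it would produce only the final aggregate values:

SELECT
    .new_entries,
    sum_agg(.new_entries) AS cum_new_entries,
    lag(.new_entries) AS prev_entries
FROM json
TO json

Output:

{"new_entries": 100, "cum_new_entries": 140, "prev_entries": null}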

Distinct rows

SELECT DISTINCT *
FROM csv

Command line examples

To run the following examples, press Ctrl-x Ctrl-e in your terminal. This opens your default editor (e.g. vim or emacs). Paste the code of one of the examples, then save and exit.

Queries on Parquet with directories

Here, find traverses a directory and executes parquet-tools for each Parquet file, dumping it to JSON. jq -c makes sure the output has one JSON object per line before handing it over to spyql. This is far from an efficient way to query Parquet files, but it can be a handy option if you need to do a quick inspection.

find /the/directory -name "*.parquet" -exec parquet-tools cat --json {} \; |
jq -c |
spyql "
    SELECT .a_field, .a_num_field * 2 + 1
    FROM json
"

Querying multiple json.gz files

gzcat *.json.gz |
jq -c |
spyql "
    SELECT .a_field, .a_num_field * 2 + 1
    FROM json
"

Querying YAML / XML / TOML files

yq converts YAML, XML and TOML files to JSON, making it easy to query any of these formats with spyql.

cat file.yaml | yq -c | spyql "SELECT .a_field FROM json"
cat file.xml | xq -c | spyql "SELECT .a_field FROM json"
cat file.toml | tomlq -c | spyql "SELECT .a_field FROM json"
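Note that the xq and tomlq commands ship with the Python implementation of yq (assuming that is the yq variant you are using), which can be installed with pip:

pip install yq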

Kafka to PostgreSQL pipeline

Read data from a Kafka topic and write it to the PostgreSQL table named customer.

kafkacat -b the.broker.com -t the.topic |
spyql -Otable=customer -Ochunk_size=1 --unbuffered "
    SELECT
        .customer.id AS id,
        .customer.name AS name
    FROM json
    TO sql
" |
psql -U a_user_name -h a.host.com a_database_name
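Here, -Otable=customer and -Ochunk_size=1 are options passed to the SQL writer. The output handed to psql consists of INSERT statements into the customer table, roughly of this form (illustrative only; the exact formatting and quoting may differ):

INSERT INTO "customer"("id","name") VALUES (1,'Alice');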

Monitoring statistics in Kafka

Read data from a Kafka topic, continuously calculating statistics.

kafkacat -b the.broker.com -t the.topic |
spyql --unbuffered "
    SELECT PARTIALS
        count_agg(*) AS running_count,
        sum_agg(value) AS running_sum,
        min_agg(value) AS min_so_far,
        value AS current_value
    FROM json
    TO csv
"

Sub-queries (piping)

A special file format (spy) is used to efficiently pipe data between queries.

cat a_file.json |
spyql "
    SELECT ' '.join([.first_name, .middle_name, .last_name]) AS full_name
    FROM json
    TO spy" |
spyql "SELECT full_name, full_name.upper() FROM spy"

(Equi) Joins

It is possible to perform simple (LEFT) JOIN operations based on dictionary lookups.

Given numbers.json:

{
   "1": "One",
   "2": "Two",
   "3": "Three"
}

Query:

spyql -Jnums=numbers.json "
        SELECT nums[col1] as res
        FROM [3,4,1,1]
        TO json"

Output:

{"res": "Three"}
{"res": null}
{"res": "One"}
{"res": "One"}

If you want an INNER JOIN instead of a LEFT JOIN, you can add a condition to the WHERE clause, e.g.:

SELECT nums[col1] as res
FROM [3,4,1,1]
WHERE col1 in nums
TO json

Output:

{"res": "Three"}
{"res": "One"}
{"res": "One"}

Queries over APIs

curl "https://reqres.in/api/users?page=2" |
spyql "
    SELECT
        .data.email AS email,
        'Dear {}, thank you for being a great customer!'.format(.data.first_name) AS msg
    FROM json
    EXPLODE .data
    TO json
"

Plotting to the terminal

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO plot
"

Plotting with matplotcli

spyql "
   SELECT col1 AS y
   FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
   TO json
" | plt "plot(y)"
[image: matplotcli demo]