About SPyQL

SPyQL is a query language that combines:

  • the simplicity and structure of SQL;

  • with the power and readability of Python.

SELECT
    date.fromtimestamp(.purchase_ts) AS purchase_date,
    .price * .quantity AS total
FROM json
WHERE .department.upper() == 'IT'
ORDER BY 2 DESC
TO csv

SQL provides the structure of the query, while Python is used to define expressions, bringing along a vast ecosystem of packages.
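To make the split between SQL structure and Python expressions concrete, here is a rough plain-Python approximation of what the example query computes. It is not how SPyQL is implemented. It assumes `FROM json` reads JSON Lines (one object per line), and the sample records, the `run_example_query` name and the CSV header row are hypothetical choices for this sketch.

```python
import csv
import io
import json
from datetime import date

# Hypothetical sample input: one JSON object per line (assumed JSON Lines format).
JSON_LINES = """\
{"department": "it", "purchase_ts": 1700000000, "price": 10.0, "quantity": 3}
{"department": "HR", "purchase_ts": 1700100000, "price": 99.0, "quantity": 1}
{"department": "It", "purchase_ts": 1700200000, "price": 25.0, "quantity": 2}
"""


def run_example_query(text: str) -> str:
    """Plain-Python approximation of the example query (not SPyQL's code)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)  # FROM json: each line is one record
        if record["department"].upper() == "IT":  # WHERE .department.upper() == 'IT'
            rows.append((
                date.fromtimestamp(record["purchase_ts"]),  # AS purchase_date (local time)
                record["price"] * record["quantity"],       # AS total
            ))
    rows.sort(key=lambda row: row[1], reverse=True)  # ORDER BY 2 DESC: 2nd column, descending
    out = io.StringIO()
    writer = csv.writer(out)  # TO csv
    writer.writerow(["purchase_date", "total"])  # header row: an assumption of this sketch
    writer.writerows(rows)
    return out.getvalue()


print(run_example_query(JSON_LINES))
```

Each Python fragment of the query (`date.fromtimestamp(...)`, `.price * .quantity`, `.department.upper()`) maps to an ordinary Python expression on the current record, while the SQL clauses decide filtering, ordering and output format.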

SPyQL is fast and memory efficient. Take a look at the benchmarks with GB-sized JSON data.

SPyQL CLI

SPyQL offers a command-line interface for running SPyQL queries on text data (e.g. CSV, JSON). Data can come from files, but also from data streams such as Kafka, or from databases such as PostgreSQL. Basically, data can come from any command that outputs text :-). Data can even be generated by a Python expression! And since SPyQL also writes to different formats, it makes it easy to convert between data formats.
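As a rough illustration of the kind of format conversion described above (reading CSV text and writing JSON), the following plain-Python sketch streams CSV rows into JSON Lines one row at a time. It is not SPyQL's implementation, and `csv_to_json_lines` is a hypothetical name. In practice the source could be any text stream, such as `sys.stdin` fed by another command; here an in-memory string stands in for it. This sketch keeps every CSV value as a string and does no type inference.

```python
import csv
import io
import json
from typing import Iterator, TextIO


def csv_to_json_lines(src: TextIO) -> Iterator[str]:
    """Hypothetical helper: turn CSV text with a header row into JSON Lines."""
    # DictReader yields one row at a time, so the whole input is never held in memory.
    for row in csv.DictReader(src):
        yield json.dumps(row)  # values stay strings, e.g. "31" rather than 31


# An in-memory stream stands in for a file or a pipe from another command.
sample = io.StringIO("name,age\nAna,31\nRui,27\n")
for json_line in csv_to_json_lines(sample):
    print(json_line)
```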

Take a look at the Command line examples to see how to query Parquet, process API calls, traverse directories of zipped JSONs, convert CSV to JSON, and import JSON/CSV data into SQL databases, among many other things.

SPyQL Module

SPyQL is also available as a Python module. In addition to the CLI features, the module lets you:

  • query variables (e.g. lists of dicts);

  • get results into in-memory data structures.
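The idea of querying a variable and getting the result back as an in-memory structure can be pictured with the plain-Python sketch below. It does not use or reproduce the SPyQL module's API: `query_rows` and its `select`/`where` parameters are hypothetical, chosen only to mirror the SELECT and WHERE clauses over a list of dicts.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def query_rows(
    rows: Iterable[Row],
    select: Dict[str, Callable[[Row], Any]],
    where: Callable[[Row], bool] = lambda row: True,
) -> List[Row]:
    """Hypothetical helper: keep rows matching `where`, then build output
    rows from named Python expressions, like a SELECT list."""
    return [
        {name: expr(row) for name, expr in select.items()}
        for row in rows
        if where(row)
    ]


# The input is an ordinary Python variable (a list of dicts).
people = [
    {"name": "Ana", "age": 31},
    {"name": "Rui", "age": 17},
    {"name": "Eva", "age": 45},
]

# The result is again an in-memory list of dicts.
adults = query_rows(
    people,
    select={"name": lambda r: r["name"].upper(), "age": lambda r: r["age"]},
    where=lambda r: r["age"] >= 18,
)
print(adults)  # [{'name': 'ANA', 'age': 31}, {'name': 'EVA', 'age': 45}]
```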

Principles

We aim for SPyQL to be:

  • Simple: simple to use with a straightforward implementation;

  • Familiar: you should feel at home if you are acquainted with SQL and Python;

  • Light: a small memory footprint that lets you process large datasets on your own machine;

  • Useful: it should make your life easier, filling a gap in the ecosystem.