Reference

The Null data type

A NULL value means that data is missing, just like in SQL. An operation with NULL returns NULL without throwing exceptions or errors. Here are some examples:

NULL + 1
NULL['a']
int('Hello')
float('')

To test if a value is NULL or not, you should use the is and is not operators.

SELECT * FROM csv WHERE col1 is not NULL

You can use one of the following alternative casing:

NULL
Null
null

General SQL functions

coalesce(val, default)[source]

returns default if val is NULL otherwise returns val

ifnull(val, default)

returns default if val is NULL otherwise returns val

nullif(a, b)[source]

returns NULL if a == b otherwise returns a

Aggregate functions

any_agg(val)[source]

Returns True when there is at least one True value, ignoring NULLs

avg_agg(val)[source]

Average all non-null input values

count_agg(val)[source]

Count all non-null input values

count_distinct_agg(val)[source]

Count the number of unique (non-null) input values.

dict_agg(key, val)[source]

Collects key-value pairs into a dict. Key must be unique and not null (null keys are discarded). In case of duplicated keys, the value returned is the last seen.

every_agg(val)[source]

Returns True when all non-null values are True

first_agg(val, respect_nulls=True)[source]

Returns the first value. Returns the first non-null value when respect_nulls is False.

lag_agg(val, offset=1, default=NULL)[source]

Returns the value at offset rows before the last row. Returns default if there is no such row. Especially useful with SELECT PARTIAL to return the value at offset rows before the current row.

last_agg(val, respect_nulls=True)[source]

Returns the last value. Returns the last non-null value when respect_nulls is False.

list_agg(val, respect_nulls=True)[source]

Collects all input values into a list. Filters out NULLs when respect_nulls is False.

max_agg(val)[source]

Maximum value across all non-null input values

min_agg(val)[source]

Minimum value across all non-null input values

prod_agg(val)[source]

Product across all non-null input values

set_agg(val, respect_nulls=True)[source]

Collects all distinct input values into a set. Filters out NULLs when respect_nulls is False.

string_agg(val, sep, respect_nulls=False)[source]

Concatenates all input values into a string. Uses sep to separate values in the string. Filters out NULLs when respect_nulls is False (default).

sum_agg(val)[source]

Sum of all non-null input values

Dictionaries

class qdict(adic, dirty=True, **kwargs)[source]

A dictionary that supports NULL where items can be accessed like attributes:

mydict = qdict({
    "a": 1,
    "b": {
        "c": 2
    },
    "d": None
})
mydict.a    # returns 1, same as mydict["a"]
mydict.z    # returns NULL whenever a key is not found
mydict.b.c  # returns 2, neested dicts also support attribute access
mydict.b.x  # returns NULL, neested dicts are null-safe too
mydict.d    # returns NULL, Nones are converted to NULLs)
items()[source]

Returns a zip of keys and values. Attention: does not return a view like dict!

values()[source]

Returns a tuple of the dict values. Attention: does not return a view like dict!

Querying API

class Query(query: str, input_options: Optional[dict] = None, output_options: Optional[dict] = None, json_obj_files: Optional[dict] = None, unbuffered=False, warning_flag='default', verbose=0, default_to_clause='MEMORY')[source]

A SPyQL query than can be executed on top of a file or variables producing a file or a QueryResult. Example:

query = Query("""
    SELECT row.name as first_name, row.age
    FROM data
    WHERE row.age > 30
""")

result = query(data=[
    {"name": "Alice", "age": 20, "salary": 30.0},
    {"name": "Bob", "age": 30, "salary": 12.0},
    {"name": "Charles", "age": 40, "salary": 6.0},
    {"name": "Daniel", "age": 43, "salary": 0.40},
])

## result:
# (
#    {"first_name": "Charles", "age": 40},
#    {"first_name": "Daniel", "age": 43},
# )
stats()[source]

Returns a dictionary with statistics about the query execution, namely the number of rows in the input and output.

Return type

dict

class QueryResult(_QueryResult__values, _QueryResult__colnames)[source]

Result of a query that writes outputs to memory. Tuple of dictionaries with easy access of columns by name as attributes.

Accessing the value of the age column in the first row:

result[0].age
result[0]["age"]

Collecting the age for all rows as a tuple:

result.age
result.col("age")

Collecting the age for a subset of rows as a tuple:

result[1:3].age
result[1:3].col("age")

Collecting the value of the first column for all rows as a tuple:

result.col(0)

Iterating over rows:

for row in result:
    print(row.age, row.another_column)
col(idx)[source]

Collects and returns a tuple with all values of the col defined by idx.

Parameters

idx – if idx is an integer it refers to the nth column (0-based indexing). If idx is a string it refers to the name of the column.

colnames()[source]

Returns a tuple with the name of each column.

Return type

tuple[str]