Reference¶
The Null data type¶
A NULL
value means that data is missing, just like in SQL.
An operation with NULL
returns NULL
without throwing exceptions or
errors. Here are some examples:
NULL + 1
NULL['a']
int('Hello')
float('')
To test if a value is NULL
or not, you should use the is
and is not
operators.
SELECT * FROM csv WHERE col1 is not NULL
You can use one of the following alternative casing:
NULL
Null
null
General SQL functions¶
- ifnull(val, default)¶
returns
default
ifval is NULL
otherwise returnsval
Aggregate functions¶
- dict_agg(key, val)[source]¶
Collects key-value pairs into a dict. Key must be unique and not null (null keys are discarded). In case of duplicated keys, the value returned is the last seen.
- first_agg(val, respect_nulls=True)[source]¶
Returns the first value. Returns the first non-null value when respect_nulls is False.
- lag_agg(val, offset=1, default=NULL)[source]¶
Returns the value at offset rows before the last row. Returns default if there is no such row. Especially useful with SELECT PARTIAL to return the value at offset rows before the current row.
- last_agg(val, respect_nulls=True)[source]¶
Returns the last value. Returns the last non-null value when respect_nulls is False.
- list_agg(val, respect_nulls=True)[source]¶
Collects all input values into a list. Filters out NULLs when respect_nulls is False.
- set_agg(val, respect_nulls=True)[source]¶
Collects all distinct input values into a set. Filters out NULLs when respect_nulls is False.
Dictionaries¶
- class qdict(adic, dirty=True, **kwargs)[source]¶
A dictionary that supports
NULL
where items can be accessed like attributes:mydict = qdict({ "a": 1, "b": { "c": 2 }, "d": None }) mydict.a # returns 1, same as mydict["a"] mydict.z # returns NULL whenever a key is not found mydict.b.c # returns 2, neested dicts also support attribute access mydict.b.x # returns NULL, neested dicts are null-safe too mydict.d # returns NULL, Nones are converted to NULLs)
Querying API¶
- class Query(query: str, input_options: Optional[dict] = None, output_options: Optional[dict] = None, json_obj_files: Optional[dict] = None, unbuffered=False, warning_flag='default', verbose=0, default_to_clause='MEMORY')[source]¶
A SPyQL query than can be executed on top of a file or variables producing a file or a
QueryResult
. Example:query = Query(""" SELECT row.name as first_name, row.age FROM data WHERE row.age > 30 """) result = query(data=[ {"name": "Alice", "age": 20, "salary": 30.0}, {"name": "Bob", "age": 30, "salary": 12.0}, {"name": "Charles", "age": 40, "salary": 6.0}, {"name": "Daniel", "age": 43, "salary": 0.40}, ]) ## result: # ( # {"first_name": "Charles", "age": 40}, # {"first_name": "Daniel", "age": 43}, # )
- class QueryResult(_QueryResult__values, _QueryResult__colnames)[source]¶
Result of a query that writes outputs to memory. Tuple of dictionaries with easy access of columns by name as attributes.
Accessing the value of the age column in the first row:
result[0].age result[0]["age"]
Collecting the age for all rows as a tuple:
result.age result.col("age")
Collecting the age for a subset of rows as a tuple:
result[1:3].age result[1:3].col("age")
Collecting the value of the first column for all rows as a tuple:
result.col(0)
Iterating over rows:
for row in result: print(row.age, row.another_column)