spyql package
Submodules
spyql.agg module
- dict_agg(key, val)
Collects key-value pairs into a dict. NULL keys are discarded. Keys are expected to be unique; if a key is repeated, the value kept is the last one seen.
- first_agg(val, respect_nulls=True)
Returns the first value. When respect_nulls is False, returns the first non-NULL value instead.
- lag_agg(val, offset=1, default=NULL)
Returns the value offset rows before the last row, or default if there is no such row. Especially useful with SELECT PARTIAL, where it returns the value offset rows before the current row.
- last_agg(val, respect_nulls=True)
Returns the last value. When respect_nulls is False, returns the last non-NULL value instead.
- list_agg(val, respect_nulls=True)
Collects all input values into a list. NULLs are filtered out when respect_nulls is False.
- set_agg(val, respect_nulls=True)
Collects all distinct input values into a set. NULLs are filtered out when respect_nulls is False.
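The following plain-Python sketch makes the documented semantics concrete by applying each aggregate to a whole list of values at once. It is not spyql's implementation. spyql aggregates row by row inside a query, and its NULL is a dedicated type rather than None. The list-based signatures below, including a list of pairs for dict_agg, are assumptions made for illustration.

```python
from typing import Any, List, Tuple

NULL = None  # stand-in for spyql's NULL in this sketch (assumption)


def first_agg(values: List[Any], respect_nulls: bool = True) -> Any:
    """First value, or first non-NULL value when respect_nulls is False."""
    for v in values:
        if respect_nulls or v is not NULL:
            return v
    return NULL


def last_agg(values: List[Any], respect_nulls: bool = True) -> Any:
    """Last value, or last non-NULL value when respect_nulls is False."""
    return first_agg(values[::-1], respect_nulls)


def list_agg(values: List[Any], respect_nulls: bool = True) -> List[Any]:
    """All values in input order, minus NULLs when respect_nulls is False."""
    return [v for v in values if respect_nulls or v is not NULL]


def set_agg(values: List[Any], respect_nulls: bool = True) -> set:
    """Distinct values, minus NULLs when respect_nulls is False."""
    return set(list_agg(values, respect_nulls))


def dict_agg(pairs: List[Tuple[Any, Any]]) -> dict:
    """Key-value pairs into a dict: NULL keys dropped, last value wins."""
    result = {}
    for key, val in pairs:
        if key is not NULL:
            result[key] = val
    return result


def lag_agg(values: List[Any], offset: int = 1, default: Any = NULL) -> Any:
    """Value `offset` rows before the last row, or `default` if there is none."""
    idx = len(values) - 1 - offset
    return values[idx] if 0 <= idx < len(values) else default
```

For example, with vals = [None, 3, 1, 3, None], first_agg(vals) gives None while first_agg(vals, respect_nulls=False) gives 3. If SELECT PARTIAL evaluates aggregates over the rows seen so far (an assumption of this sketch), calling lag_agg on each growing prefix of the input approximates its per-row behaviour.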
spyql.cli module
spyql.log module
- user_debug_dict(message, adict)
Reports debug information, printing a dict as pretty-printed JSON.
- user_error(message, exception, code=None, vars=None)
Reports an error, throwing the original exception. Prints a custom message and, if available, the data that originated the exception.
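Below is a minimal sketch of the same reporting pattern. It assumes that output goes to stderr. The names report_debug_dict and report_error, the stream parameter, the "ERROR:" prefix and the indent width are all hypothetical and not part of spyql's API. The data parameter plays the role of vars. The code parameter is left out because its meaning is not documented here.

```python
import io
import json
import sys
from typing import Optional, TextIO


def report_debug_dict(message: str, adict: dict, stream: Optional[TextIO] = None) -> None:
    """Print a message followed by the dict as indented (pretty) JSON."""
    out = stream if stream is not None else sys.stderr
    print(message, file=out)
    # default=str keeps non-JSON values (e.g. datetimes) printable.
    print(json.dumps(adict, indent=4, default=str), file=out)


def report_error(message: str, exception: BaseException,
                 data: Optional[dict] = None, stream: Optional[TextIO] = None) -> None:
    """Print a custom message and any offending data, then re-raise."""
    out = stream if stream is not None else sys.stderr
    print(f"ERROR: {message}", file=out)
    if data:
        report_debug_dict("Data that originated the exception:", data, out)
    raise exception


if __name__ == "__main__":
    buf = io.StringIO()
    report_debug_dict("Options:", {"delimiter": ",", "header": True}, stream=buf)
    print(buf.getvalue(), end="")
```

The error helper is meant to be called from inside an except block. Re-raising the caught exception object keeps its original traceback.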
spyql.nulltype module
A NULL value means that data is missing, just like in SQL.
An operation with NULL returns NULL without throwing exceptions or errors. Here are some examples that evaluate to NULL (the last two are conversions that would normally raise an error in Python):
NULL + 1
NULL['a']
int('Hello')
float('')
To test whether a value is NULL, use the is and is not operators:
SELECT * FROM csv WHERE col1 is not NULL
You can use any of the following alternative casings:
NULL
Null
null
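You can approximate the behaviour above with a singleton sentinel whose operators return the sentinel itself. This is a minimal sketch and not spyql's nulltype implementation. The class name NullType, the null_safe, safe_int and safe_float helpers, and the particular set of overloaded operators are assumptions for illustration.

```python
from typing import Any, Callable


class NullType:
    """Hypothetical NULL sentinel: supported operations return NULL itself."""

    _instance = None

    def __new__(cls):
        # Singleton, so `value is NULL` is a reliable test.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "NULL"

    def _propagate(self, *args: Any) -> "NullType":
        return self

    # Arithmetic (both operand orders) and indexing propagate NULL.
    __add__ = __radd__ = __sub__ = __rsub__ = _propagate
    __mul__ = __rmul__ = __truediv__ = __rtruediv__ = _propagate
    __getitem__ = _propagate


NULL = NullType()


def null_safe(cast: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a conversion so that bad input yields NULL instead of raising."""
    def wrapper(value: Any) -> Any:
        try:
            return cast(value)
        except (ValueError, TypeError):
            return NULL
    return wrapper


safe_int = null_safe(int)
safe_float = null_safe(float)
```

The singleton is what makes the is / is not test work. Comparisons such as NULL > 1 are not overloaded here and would still raise TypeError in this sketch.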
spyql.output_handler module
- class DelayedOutSortAtEnd(orderby, limit, offset)
Bases: spyql.output_handler.OutputHandler
Only writes after collecting and sorting all data. Temporary implementation that reads every processed row into memory.
- class DistinctDelayedOutSortAtEnd(orderby, limit, offset)
Bases: spyql.output_handler.DelayedOutSortAtEnd
Alters DelayedOutSortAtEnd to store only distinct results instead of keeping all rows in memory.
- class GroupByDelayedOutSortAtEnd(orderby, limit, offset)
Bases: spyql.output_handler.DelayedOutSortAtEnd
Extends DelayedOutSortAtEnd to store only intermediate GROUP BY results instead of keeping all rows in memory.
- class LineInDistinctLineOut(limit, offset)
Bases: spyql.output_handler.OutputHandler
In-memory distinct handler that immediately writes every non-duplicated row.
- class LineInLineOut(limit, offset)
Bases: spyql.output_handler.OutputHandler
Simple handler that immediately writes every processed row.
- class OutputHandler(limit, offset)
Bases: object
Mediates between data processing and data writing.
- handle_result(result, group_key, sort_keys)
To be implemented by child classes to handle a new output row (aka result). All inputs should be tuples.
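The class descriptions above suggest a common pattern. Every handler receives rows through handle_result, applies OFFSET and LIMIT, and then either writes each row at once or buffers rows until the end. This is a minimal sketch of that pattern, not spyql's code. An in-memory list (the out attribute) stands in for the writer. The helper names write, can_write and finish are hypothetical. This DelayedOutSortAtEnd ignores orderby and always sorts ascending on sort_keys.

```python
from typing import List, Optional, Tuple


class OutputHandler:
    """Sketch of a handler sitting between query processing and a writer."""

    def __init__(self, limit: Optional[int], offset: int = 0):
        self.limit = limit
        self.offset = offset
        self.rows_written = 0
        self.rows_skipped = 0
        self.out: List[Tuple] = []  # stands in for a real writer

    def can_write(self) -> bool:
        return self.limit is None or self.rows_written < self.limit

    def write(self, result: Tuple) -> None:
        # Apply OFFSET first, then LIMIT.
        if self.rows_skipped < self.offset:
            self.rows_skipped += 1
        elif self.can_write():
            self.out.append(result)
            self.rows_written += 1

    def handle_result(self, result: Tuple, group_key: Tuple, sort_keys: Tuple) -> None:
        raise NotImplementedError

    def finish(self) -> None:
        """Called once all input is processed; streaming handlers do nothing."""


class LineInLineOut(OutputHandler):
    """Writes every processed row immediately."""

    def handle_result(self, result, group_key, sort_keys):
        self.write(result)


class LineInDistinctLineOut(OutputHandler):
    """Writes a row immediately unless an identical row was seen before."""

    def __init__(self, limit, offset=0):
        super().__init__(limit, offset)
        self.seen = set()

    def handle_result(self, result, group_key, sort_keys):
        if result not in self.seen:
            self.seen.add(result)
            self.write(result)


class DelayedOutSortAtEnd(OutputHandler):
    """Buffers all rows in memory, sorts them, and writes only at the end."""

    def __init__(self, orderby, limit, offset=0):
        super().__init__(limit, offset)
        self.orderby = orderby  # kept for parity; unused in this sketch
        self.buffer: List[Tuple[Tuple, Tuple]] = []

    def handle_result(self, result, group_key, sort_keys):
        self.buffer.append((sort_keys, result))

    def finish(self):
        # Sort on the keys only, so result tuples never need to be comparable.
        for _, result in sorted(self.buffer, key=lambda item: item[0]):
            self.write(result)
```

In the distinct handler, duplicates are removed before OFFSET and LIMIT are applied, which matches SQL's order of DISTINCT, OFFSET and LIMIT.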
spyql.parser module
spyql.processor module
- class CSVProcessor(prs, strings, path=None, sample_size=10, header=None, infer_dtypes=True, **options)
Bases: spyql.processor.Processor
- class JSONProcessor(prs, strings, path=None, **options)
Bases: spyql.processor.Processor
- class ORJSONProcessor(prs, strings, path=None, **options)
Bases: spyql.processor.Processor
- class Processor(prs, strings, path=None)
Bases: object
- eval_clause(clause, clause_exprs, mode='eval')
Evaluates (or executes) a previously compiled clause.
- get_input_iterator()
Returns an iterator over the input (e.g. a list of rows). Each row is a list with one value per column, e.g.:
[[1], [2], [3]]  # 3 rows with a single column
[[1, 'a'], [2, 'b'], [3, 'c']]  # 3 rows with 2 columns
- go(output_options, user_query_vars={}) → Tuple[spyql.query_result.QueryResult, Dict[str, int]]
- static make_processor(prs: dict, strings: spyql.quotes_handler.QuotesHandler, input_options: Optional[dict] = None)
Factory for making an input processor based on the parsed query.
- class PythonExprProcessor(prs, strings)
Bases: spyql.processor.Processor
- class SpyProcessor(prs, strings, path=None)
Bases: spyql.processor.Processor
- class TextProcessor(prs, strings, path=None)
Bases: spyql.processor.Processor
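Two small generators can illustrate the get_input_iterator contract: an iterator of rows, where each row is a list with one value per column. This is a sketch, not spyql's TextProcessor or CSVProcessor. The function names text_rows and csv_rows and the boolean header flag are hypothetical. No dtype inference is done, so every CSV value stays a string. spyql's CSVProcessor offers an infer_dtypes option for that.

```python
import csv
import io
from typing import Iterator, List, TextIO


def text_rows(stream: TextIO) -> Iterator[List[str]]:
    """One row per line; the line (minus its newline) is the only column."""
    for line in stream:
        yield [line.rstrip("\n")]


def csv_rows(stream: TextIO, header: bool = True) -> Iterator[List[str]]:
    """One row per CSV record, one list element per column."""
    reader = csv.reader(stream)
    if header:
        next(reader, None)  # skip the header line, if any
    for record in reader:
        yield record


if __name__ == "__main__":
    print(list(text_rows(io.StringIO("a\nb\n"))))
    print(list(csv_rows(io.StringIO("n,c\n1,a\n2,b\n"))))
```

Generators keep memory use flat, because rows are produced lazily as the query consumes them.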
spyql.prof module
spyql.qdict module
- class qdict(adic, dirty=True, **kwargs)
Bases: dict
A dictionary that supports NULL and whose items can be accessed like attributes:
mydict = qdict({"a": 1, "b": {"c": 2}, "d": None})
mydict.a    # returns 1, same as mydict["a"]
mydict.z    # returns NULL whenever a key is not found
mydict.b.c  # returns 2, nested dicts also support attribute access
mydict.b.x  # returns NULL, nested dicts are null-safe too
mydict.d    # returns NULL, Nones are converted to NULLs
- class str_qdict(adic, dirty=True, **kwargs)
Bases: spyql.qdict.qdict
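Here is a minimal sketch of a dict with null-safe attribute access that reproduces the qdict example. AttrDict and the _Null stand-in are hypothetical names, and the sketch leaves out the dirty parameter, whose meaning is not documented here. Conversion of None values and nested dicts happens only at construction time in this sketch.

```python
from typing import Any, Optional


class _Null:
    """Minimal stand-in for spyql's NULL (an assumption of this sketch)."""

    def __repr__(self) -> str:
        return "NULL"


NULL = _Null()


class AttrDict(dict):
    """Dict with null-safe attribute access; nested dicts are wrapped too."""

    def __init__(self, adic: Optional[dict] = None, **kwargs: Any):
        super().__init__()
        for key, val in {**(adic or {}), **kwargs}.items():
            self[key] = self._convert(val)

    @staticmethod
    def _convert(val: Any) -> Any:
        if val is None:
            return NULL  # Nones become NULL
        if isinstance(val, dict) and not isinstance(val, AttrDict):
            return AttrDict(val)  # nested dicts get attribute access
        return val

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)  # keep copy/pickle protocols working
        return self.get(name, NULL)
```

Because __getattr__ runs only after normal lookup fails, keys that share a name with dict methods (such as items or keys) are shadowed by those methods and must be read with square brackets.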
spyql.query module
- class Query(query: str, input_options: Optional[dict] = None, output_options: Optional[dict] = None, json_obj_files: Optional[dict] = None, unbuffered=False, warning_flag='default', verbose=0, default_to_clause='MEMORY')
Bases: object
A SPyQL query that can be executed on top of a file or variables, producing a file or a QueryResult. Example:
from spyql.query import Query

query = Query("""
    SELECT row.name as first_name, row.age
    FROM data
    WHERE row.age > 30
""")
result = query(data=[
    {"name": "Alice", "age": 20, "salary": 30.0},
    {"name": "Bob", "age": 30, "salary": 12.0},
    {"name": "Charles", "age": 40, "salary": 6.0},
    {"name": "Daniel", "age": 43, "salary": 0.40},
])
## result:
# (
#     {"first_name": "Charles", "age": 40},
#     {"first_name": "Daniel", "age": 43},
# )
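For readers new to SPyQL, the query above does roughly the same work as the plain-Python filter-and-project below over the same list of dicts. This comparison is only an illustration. It does not use spyql, and it returns a tuple of ordinary dicts rather than a QueryResult.

```python
data = [
    {"name": "Alice", "age": 20, "salary": 30.0},
    {"name": "Bob", "age": 30, "salary": 12.0},
    {"name": "Charles", "age": 40, "salary": 6.0},
    {"name": "Daniel", "age": 43, "salary": 0.40},
]

# WHERE row.age > 30, then SELECT row.name as first_name, row.age
result = tuple(
    {"first_name": row["name"], "age": row["age"]}
    for row in data
    if row["age"] > 30
)
```

Bob is excluded because the comparison is strict (30 is not greater than 30).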
spyql.query_result module
- class QueryResult(_QueryResult__values, _QueryResult__colnames)
Bases: tuple
Result of a query that writes its output to memory: a tuple of dictionaries whose columns can easily be accessed by name, as attributes.
Accessing the value of the age column in the first row:
result[0].age
result[0]["age"]
Collecting the age for all rows as a tuple:
result.age
result.col("age")
Collecting the age for a subset of rows as a tuple:
result[1:3].age
result[1:3].col("age")
Collecting the value of the first column for all rows as a tuple:
result.col(0)
Iterating over rows:
for row in result:
    print(row.age, row.another_column)
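Here is a minimal sketch of how such a column-aware tuple could be built. Row and Rows are hypothetical names. spyql's actual QueryResult takes name-mangled __values and __colnames arguments and may be implemented differently.

```python
from typing import Any, Sequence, Tuple, Union


class Row(dict):
    """Dict whose keys can also be read as attributes."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


class Rows(tuple):
    """Tuple of Row dicts with column access by name or position."""

    def __new__(cls, values: Sequence[dict], colnames: Sequence[str]):
        obj = super().__new__(cls, (Row(v) for v in values))
        obj.colnames = tuple(colnames)
        return obj

    def col(self, column: Union[str, int]) -> Tuple[Any, ...]:
        # An int selects a column by position, a str by name.
        name = self.colnames[column] if isinstance(column, int) else column
        return tuple(row[name] for row in self)

    def __getattr__(self, name: str) -> Tuple[Any, ...]:
        # Called only when normal lookup fails; read colnames without recursion.
        if name in self.__dict__.get("colnames", ()):
            return self.col(name)
        raise AttributeError(name)

    def __getitem__(self, index):
        item = super().__getitem__(index)
        # Slices keep the column-aware type; single indexes return a Row.
        if isinstance(index, slice):
            return Rows(item, self.colnames)
        return item
```

Column names that clash with tuple methods, such as count or index, would be shadowed by those methods in this sketch. col() still reaches them.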
spyql.quotes_handler module
spyql.sqlfuncs module
- ifnull(val, default)
Returns default if val is NULL; otherwise returns val.
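A short sketch of the documented behaviour follows. It uses a hypothetical stand-in for NULL, since spyql's own NULL object is not reproduced here.

```python
from typing import Any


class _Null:
    def __repr__(self) -> str:
        return "NULL"


NULL = _Null()  # hypothetical stand-in for spyql's NULL


def ifnull(val: Any, default: Any) -> Any:
    """Return default when val is NULL, otherwise val itself."""
    return default if val is NULL else val
```

The identity check matters. Falsy values such as 0 or "" are returned unchanged, whereas a shortcut like `val or default` would wrongly replace them.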
spyql.utils module
spyql.writer module
- class CSVWriter(path=None, unbuffered=False, header=True, **options)
Bases: spyql.writer.Writer
- class CollectWriter(path=None, unbuffered=False)
Bases: spyql.writer.Writer
Abstract writer that collects all records into an in-memory list and dumps all output records at the end. Child classes must implement the dumprows method.
- class JSONWriter(path=None, unbuffered=False, default=<function json_default>, **options)
Bases: spyql.writer.Writer
- class MemoryWriter(path=None, unbuffered=False)
Bases: spyql.writer.CollectWriter
- class ORJSONWriter(path=None, unbuffered=False, default=<function json_default>, option=0)
Bases: spyql.writer.Writer
- class PlotWriter(path=None, unbuffered=False, header=True, height=20)
Bases: spyql.writer.CollectWriter
- class PrettyWriter(path=None, unbuffered=False, header=True, **options)
Bases: spyql.writer.CollectWriter
- class SQLWriter(path=None, unbuffered=False, chunk_size=1000, table='table_name')
Bases: spyql.writer.Writer
- class SpyWriter(path=None, unbuffered=False)
Bases: spyql.writer.Writer
- class Writer(path=None, unbuffered=False)
Bases: object
- static make_writer(to_clause: dict, output_options: Optional[dict] = None)
Factory for making an output writer based on the parsed query.
- result() → spyql.query_result.QueryResult
Gets the query result, in case of writing to memory.
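CollectWriter suits output formats that can only be produced once every row is known, such as aligned tables or plots. Below is a minimal sketch of that design, with a factory on top. The method names writerow and flush, the out stream parameter, the TabWriter class and the WRITERS format table are assumptions for illustration and not spyql's API. Only the MEMORY format name comes from Query's default_to_clause.

```python
import io
import sys
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Sequence, TextIO, Tuple


class Writer:
    """Base writer: receives output rows one at a time."""

    def __init__(self, out: Optional[TextIO] = None):
        self.out = out

    def writerow(self, row: Sequence[Any]) -> None:
        raise NotImplementedError

    def flush(self) -> None:
        pass


class CollectWriter(Writer, ABC):
    """Buffers every row in memory and dumps them all when flushed."""

    def __init__(self, out: Optional[TextIO] = None):
        super().__init__(out)
        self.rows: List[Tuple[Any, ...]] = []

    def writerow(self, row):
        self.rows.append(tuple(row))

    def flush(self):
        self.dumprows()

    @abstractmethod
    def dumprows(self) -> None:
        """Child classes decide how the collected rows are emitted."""


class MemoryWriter(CollectWriter):
    """Keeps rows in memory so the caller can fetch them as a result."""

    def dumprows(self):
        pass  # nothing to write; rows are returned by result()

    def result(self) -> Tuple[Tuple[Any, ...], ...]:
        return tuple(self.rows)


class TabWriter(CollectWriter):
    """Hypothetical aligned-text writer; widths depend on every row."""

    def dumprows(self):
        out = self.out if self.out is not None else sys.stdout
        widths = [max(len(str(v)) for v in col) for col in zip(*self.rows)]
        for row in self.rows:
            line = "  ".join(str(v).ljust(w) for v, w in zip(row, widths))
            out.write(line.rstrip() + "\n")


WRITERS: Dict[str, type] = {"MEMORY": MemoryWriter, "TAB": TabWriter}


def make_writer(fmt: str, out: Optional[TextIO] = None) -> Writer:
    """Sketch of a factory picking a writer class by output format name."""
    return WRITERS[fmt.upper()](out)


if __name__ == "__main__":
    buf = io.StringIO()
    w = make_writer("tab", buf)
    for r in [("a", 1), ("bbb", 22)]:
        w.writerow(r)
    w.flush()
    print(buf.getvalue(), end="")
```

Marking dumprows as abstract means Python refuses to instantiate a CollectWriter subclass that forgets to implement it. This mirrors the documented requirement that child classes must provide dumprows.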