spyql package

Submodules

spyql.agg module

any_agg(val)[source]

Returns True when there is at least one True value, ignoring NULLs

avg_agg(val)[source]

Average all non-null input values

count_agg(val)[source]

Count all non-null input values

count_distinct_agg(val)[source]

Count the number of unique (non-null) input values.

dict_agg(key, val)[source]

Collects key-value pairs into a dict. Key must be unique and not null (null keys are discarded). In case of duplicated keys, the value returned is the last seen.

every_agg(val)[source]

Returns True when all non-null values are True

first_agg(val, respect_nulls=True)[source]

Returns the first value. Returns the first non-null value when respect_nulls is False.

lag_agg(val, offset=1, default=NULL)[source]

Returns the value at offset rows before the last row. Returns default if there is no such row. Especially useful with SELECT PARTIAL to return the value at offset rows before the current row.

last_agg(val, respect_nulls=True)[source]

Returns the last value. Returns the last non-null value when respect_nulls is False.

list_agg(val, respect_nulls=True)[source]

Collects all input values into a list. Filters out NULLs when respect_nulls is False.

max_agg(val)[source]

Maximum value across all non-null input values

min_agg(val)[source]

Minimum value across all non-null input values

prod_agg(val)[source]

Product across all non-null input values

set_agg(val, respect_nulls=True)[source]

Collects all distinct input values into a set. Filters out NULLs when respect_nulls is False.

string_agg(val, sep, respect_nulls=False)[source]

Concatenates all input values into a string. Uses sep to separate values in the string. Filters out NULLs when respect_nulls is False (default).

sum_agg(val)[source]

Sum of all non-null input values

spyql.cli module

spyql.log module

colored(val, color)[source]

Creates string representation of val for color printing

conversion_warning(atype, exception, *args, **kwargs)[source]
mk_user_msg(level, level_color, message, code=None, code_color=asciichartpy.lightyellow)[source]
user_debug(message, code=None)[source]

Reports (verbose) information.

user_debug_dict(message, adict)[source]

Reports (debug) information, printing a dict as a pretty json.

user_error(message, exception, code=None, vars=None)[source]

Reports an error, throwing the original exception Prints a custom message. Prints the data that originated the exception (if available).

user_info(message, code=None)[source]

Reports (verbose) information.

user_warning(message, exception=None, code=None)[source]

Reports a warning. Prints a custom message. If error_on_warning is True, raises an error instead.

user_warning4func(message, exception, *args, **kwargs)[source]

spyql.nulltype module

A NULL value means that data is missing, just like in SQL. An operation with NULL returns NULL without throwing exceptions or errors. Here are some examples:

NULL + 1
NULL['a']
int('Hello')
float('')

To test if a value is NULL or not, you should use the is and is not operators.

SELECT * FROM csv WHERE col1 is not NULL

You can use one of the following alternative casing:

NULL
Null
null
class NullType[source]

Bases: object

get(*args, **kwargs)[source]

spyql.output_handler module

class DelayedOutSortAtEnd(orderby, limit, offset)[source]

Bases: spyql.output_handler.OutputHandler

Only writes after collecting and sorting all data. Temporary implementation that reads every processed row into memory.

finish()[source]
handle_result(result, sort_keys, *_)[source]

To be implemented by child classes to handle a new output row (aka result). All inputs should be tuples.

class DistinctDelayedOutSortAtEnd(orderby, limit, offset)[source]

Bases: spyql.output_handler.DelayedOutSortAtEnd

Alters DelayedOutSortAtEnd to only store distinct results instead of keeping all rows in memory

finish()[source]
handle_result(result, sort_keys, *_)[source]

To be implemented by child classes to handle a new output row (aka result). All inputs should be tuples.

class GroupByDelayedOutSortAtEnd(orderby, limit, offset)[source]

Bases: spyql.output_handler.DelayedOutSortAtEnd

Extends DelayedOutSortAtEnd to only store intermediate group by results instead of keeping all rows in memory

finish()[source]
handle_result(result, sort_keys, group_key)[source]

To be implemented by child classes to handle a new output row (aka result). All inputs should be tuples.

class LineInDistinctLineOut(limit, offset)[source]

Bases: spyql.output_handler.OutputHandler

In-memory distinct handler that immediately writes every non-duplicated row

finish()[source]
handle_result(result, *_)[source]

To be implemented by child classes to handle a new output row (aka result). All inputs should be tuples.

class LineInLineOut(limit, offset)[source]

Bases: spyql.output_handler.OutputHandler

Simple handler that immediately writes every processed row

finish()[source]
handle_result(result, *_)[source]

To be implemented by child classes to handle a new output row (aka result). All inputs should be tuples.

class OutputHandler(limit, offset)[source]

Bases: object

Mediates data processing with data writting

finish()[source]
handle_result(result, group_key, sort_keys)[source]

To be implemented by child classes to handle a new output row (aka result). All inputs should be tuples.

is_done()[source]
static make_handler(prs)[source]

Chooses the right handler depending on the kind of query and eventual optimization opportunities

set_writer(writer)[source]
write(row)[source]

spyql.parser module

class KeywordOrderValidator(keywords)[source]

Bases: object

Check if the keyword position is valid

run(keyword: Optional[str])[source]
extract_args(*args, **kwargs)[source]
extract_funcs(expr)[source]
get_agg_funcs()[source]
has_agg_func(expr)[source]
has_reference2row(expr)[source]
make_expr_ready(expr)[source]
parse(query, default_to_clause='MEMORY')[source]

parses the spyql query

parse_fromto(clause, strings, formats_list)[source]
parse_groupby(clause, select, strings)[source]

splits the GROUP BY clause

parse_orderby(clause, strings)[source]

splits the ORDER BY clause and handles modifiers

parse_select(sel, strings)[source]

splits the SELECT clause into columns and find their names

parse_structure(query: str)[source]

parse the supported keywords, which must follow a given order

pythonize(s)[source]

replaces sql/custom syntax by python syntax

split_multi_expr_clause(s)[source]

Transforms “abc, (123 + 1) * 2, f(a,b)” into [“abc”, “(123 + 1) * 2”, “f(a,b)”]

throw_error_if_has_agg_func(expr, clause_name)[source]

spyql.processor module

class CSVProcessor(prs, strings, path=None, sample_size=10, header=None, infer_dtypes=True, **options)[source]

Bases: spyql.processor.Processor

get_input_iterator()[source]

Returns iterator over input (e.g. list if rows) Each row is list with one value per column e.g.:

[[1] ,[2], [3]]               # 3 rows with a single col
[[1,'a'], [2,'b'], [3,'c']]   # 3 rows with 2 cols
handle_header_row(row)[source]

Action for header row (e.g. column name definition)

reading_data()[source]

Returns True after reading header, metadata, etc in input file

class JSONProcessor(prs, strings, path=None, **options)[source]

Bases: spyql.processor.Processor

get_input_iterator()[source]

Returns iterator over input (e.g. list if rows) Each row is list with one value per column e.g.:

[[1] ,[2], [3]]               # 3 rows with a single col
[[1,'a'], [2,'b'], [3,'c']]   # 3 rows with 2 cols
class ORJSONProcessor(prs, strings, path=None, **options)[source]

Bases: spyql.processor.Processor

get_input_iterator()[source]

Returns iterator over input (e.g. list if rows) Each row is list with one value per column e.g.:

[[1] ,[2], [3]]               # 3 rows with a single col
[[1,'a'], [2,'b'], [3,'c']]   # 3 rows with 2 cols
class Processor(prs, strings, path=None)[source]

Bases: object

close()[source]
compile_clause(clause, clause_modifier=None, mode='eval')[source]

Compiles a clause of the query

default_col_name(idx)[source]

Default column names, e.g. col1 for the first column

eval_clause(clause, clause_exprs, mode='eval')[source]

Evaluates/executes a previously compiled clause

get_input_iterator()[source]

Returns iterator over input (e.g. list if rows) Each row is list with one value per column e.g.:

[[1] ,[2], [3]]               # 3 rows with a single col
[[1,'a'], [2,'b'], [3,'c']]   # 3 rows with 2 cols
go(output_options, user_query_vars={}) Tuple[spyql.query_result.QueryResult, Dict[str, int]][source]
handle_1st_data_row(row)[source]

Action for handling the first row of data

handle_header_row(row)[source]

Action for header row (e.g. column name definition)

static input_processors()[source]
is_clause_single(clause)[source]

True if clause can only have a single expression

make_out_cols_names(out_cols_names)[source]

Creates list of output column names

static make_processor(prs: dict, strings: spyql.quotes_handler.QuotesHandler, input_options: Optional[dict] = None)[source]

Factory for making an input processor based on the parsed query

make_row_meta()[source]
prepare_expression(expr)[source]

Replaces identifiers (column names) in sql expressions by references to _values and put (quoted) strings back

reading_data()[source]

Returns True after reading header, metadata, etc in input file

class PythonExprProcessor(prs, strings)[source]

Bases: spyql.processor.Processor

get_input_iterator()[source]

Returns iterator over input (e.g. list if rows) Each row is list with one value per column e.g.:

[[1] ,[2], [3]]               # 3 rows with a single col
[[1,'a'], [2,'b'], [3,'c']]   # 3 rows with 2 cols
class SpyProcessor(prs, strings, path=None)[source]

Bases: spyql.processor.Processor

get_input_iterator()[source]

Returns iterator over input (e.g. list if rows) Each row is list with one value per column e.g.:

[[1] ,[2], [3]]               # 3 rows with a single col
[[1,'a'], [2,'b'], [3,'c']]   # 3 rows with 2 cols
handle_header_row(row)[source]

Action for header row (e.g. column name definition)

reading_data()[source]

Returns True after reading header, metadata, etc in input file

static unpack_line(line)[source]
class TextProcessor(prs, strings, path=None)[source]

Bases: spyql.processor.Processor

get_input_iterator()[source]

Returns iterator over input (e.g. list if rows) Each row is list with one value per column e.g.:

[[1] ,[2], [3]]               # 3 rows with a single col
[[1,'a'], [2,'b'], [3,'c']]   # 3 rows with 2 cols
init_vars(user_query_vars={})[source]

Initializes dict of variables for user queries

spyql.prof module

spyql.qdict module

class qdict(adic, dirty=True, **kwargs)[source]

Bases: dict

A dictionary that supports NULL where items can be accessed like attributes:

mydict = qdict({
    "a": 1,
    "b": {
        "c": 2
    },
    "d": None
})
mydict.a    # returns 1, same as mydict["a"]
mydict.z    # returns NULL whenever a key is not found
mydict.b.c  # returns 2, neested dicts also support attribute access
mydict.b.x  # returns NULL, neested dicts are null-safe too
mydict.d    # returns NULL, Nones are converted to NULLs)
items()[source]

Returns a zip of keys and values. Attention: does not return a view like dict!

updatef(another_dict)[source]
values()[source]

Returns a tuple of the dict values. Attention: does not return a view like dict!

class str_qdict(adic, dirty=True, **kwargs)[source]

Bases: spyql.qdict.qdict

spyql.query module

class Query(query: str, input_options: Optional[dict] = None, output_options: Optional[dict] = None, json_obj_files: Optional[dict] = None, unbuffered=False, warning_flag='default', verbose=0, default_to_clause='MEMORY')[source]

Bases: object

A SPyQL query than can be executed on top of a file or variables producing a file or a QueryResult. Example:

query = Query("""
    SELECT row.name as first_name, row.age
    FROM data
    WHERE row.age > 30
""")

result = query(data=[
    {"name": "Alice", "age": 20, "salary": 30.0},
    {"name": "Bob", "age": 30, "salary": 12.0},
    {"name": "Charles", "age": 40, "salary": 6.0},
    {"name": "Daniel", "age": 43, "salary": 0.40},
])

## result:
# (
#    {"first_name": "Charles", "age": 40},
#    {"first_name": "Daniel", "age": 43},
# )
stats()[source]

Returns a dictionary with statistics about the query execution, namely the number of rows in the input and output.

Return type

dict

spyql.query_result module

class QueryResult(_QueryResult__values, _QueryResult__colnames)[source]

Bases: tuple

Result of a query that writes outputs to memory. Tuple of dictionaries with easy access of columns by name as attributes.

Accessing the value of the age column in the first row:

result[0].age
result[0]["age"]

Collecting the age for all rows as a tuple:

result.age
result.col("age")

Collecting the age for a subset of rows as a tuple:

result[1:3].age
result[1:3].col("age")

Collecting the value of the first column for all rows as a tuple:

result.col(0)

Iterating over rows:

for row in result:
    print(row.age, row.another_column)
col(idx)[source]

Collects and returns a tuple with all values of the col defined by idx.

Parameters

idx – if idx is an integer it refers to the nth column (0-based indexing). If idx is a string it refers to the name of the column.

colnames()[source]

Returns a tuple with the name of each column.

Return type

tuple[str]

spyql.quotes_handler module

class QuotesHandler[source]

Bases: object

extract_strings(query)[source]
put_strings_back(text, quote=True)[source]
static string_placeholder_re()[source]

spyql.sqlfuncs module

coalesce(val, default)[source]

returns default if val is NULL otherwise returns val

complex_(*args)[source]
float_(a)[source]
ifnull(val, default)

returns default if val is NULL otherwise returns val

int_(a, *args, **kwargs)[source]
null_safe_call(fun, *args, **kwargs)[source]
nullif(a, b)[source]

returns NULL if a == b otherwise returns a

str_(*args, **kwargs)[source]

spyql.utils module

is_row_collapsable(row, colnames)[source]

Returns True if row only has a single column of type dict with a default name. In this case, a row (of type dict) can take the value of the first column.

isiterable(x)[source]

Returns True if x is iterable that is not a string or dict and is not null

join_paths(x, *args)[source]

convienience function for os.path.join

make_str_valid_varname(s)[source]
quote_ifstr(s)[source]
try2eval(val, globals={}, locals={})[source]

spyql.writer module

class CSVWriter(path=None, unbuffered=False, header=True, **options)[source]

Bases: spyql.writer.Writer

writeheader(header)[source]
writerow(row)[source]
writerows(rows)[source]
class CollectWriter(path=None, unbuffered=False)[source]

Bases: spyql.writer.Writer

Abstract writer that collects all records into a (in-memory) list and dumps all the output records at the end. Child classes must implement the dumprows method.

dumprows(rows)[source]
flush()[source]
transformrow(row)[source]
transformvalue(value)[source]
writerow(row)[source]
class JSONWriter(path=None, unbuffered=False, default=<function json_default>, **options)[source]

Bases: spyql.writer.Writer

writerow(row)[source]
class MemoryWriter(path=None, unbuffered=False)[source]

Bases: spyql.writer.CollectWriter

dumprows(rows)[source]
result()[source]

Gets query result, in case of writing to memory

transformrow(row)[source]
class ORJSONWriter(path=None, unbuffered=False, default=<function json_default>, option=0)[source]

Bases: spyql.writer.Writer

writerow(row)[source]
class PlotWriter(path=None, unbuffered=False, header=True, height=20)[source]

Bases: spyql.writer.CollectWriter

dumprows(rows)[source]
transformvalue(value)[source]
class PrettyWriter(path=None, unbuffered=False, header=True, **options)[source]

Bases: spyql.writer.CollectWriter

dumprows(rows)[source]
class SQLWriter(path=None, unbuffered=False, chunk_size=1000, table='table_name')[source]

Bases: spyql.writer.Writer

flush()[source]
writeheader(header)[source]
writerow(row)[source]
writestatement()[source]
class SpyWriter(path=None, unbuffered=False)[source]

Bases: spyql.writer.Writer

static pack(row)[source]
writeheader(header)[source]
writerow(row)[source]
class Writer(path=None, unbuffered=False)[source]

Bases: object

close()[source]
flush()[source]
static make_writer(to_clause: dict, output_options: Optional[dict] = None)[source]

Factory for making an output writer based on the parsed query

static output_writers()[source]
result() spyql.query_result.QueryResult[source]

Gets query result, in case of writing to memory

writeheader(header)[source]
writerow(row)[source]
writerows(rows)[source]
json_default(x)[source]

Module contents