Leopards is a way to query list of dictionaries or objects as if you are filtering in DBMS. You can get dicts/objects that are matched by OR, AND or NOT or all of them. As you can see in the comparison they are much faster than Pandas.
pip install leopards
from leopards import Q
l = [{"name":"John","age":"16"}, {"name":"Mike","age":"19"},{"name":"Sarah","age":"21"}]
filtered= Q(l,{'name__contains':"k", "age__lt":20})
print(list(filtered))
output
[{'name': 'Mike', 'age': '19'}]
The above filtration can be written as
from leopards import Q
l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, name__contains="k", age__lt=20)
Notes:
Q
returns an iterator which can be converted to a list by callinglist
.- Even though, age was
str
in the dict, as the value of in the query dict wasint
, Leopards converted the value in dict automatically to match the query data type. This behaviour can be stopped by passingFalse
toconvert_types
parameter.
eq
: equals and this default filtergt
: greater than.gte
: greater than or equal.lt
: less thanlte
: less than or equalin
: the value in a list of a tuple.- e.g. age__in=[10,20,30]
contains
: contains a substring as in the example.icontains
: case-insensitivecontains
.startswith
: checks if a value starts with a query strings.istartswith
: case-insensitivestartswith
.endswith
: checks if a value ends with a query strings.iendswith
: case-insensitiveendswith
.isnull
: checks if the value matches any of NULL_VALUES which are('', '.', None, "None", "null", "NULL")
- e.g.
filter__isnull=True
orfilter__isnull=False
- e.g.
For eq
,gt
,gte
,lt
,lte
, in
, contains
, icontains
, startswith
,istartswith
, endswith
and iendswith
, you can add a n
to negate the results. e.g nin
which is equivalent to not in
This section will cover the use of OR
, AND
and NOT
OR
or __or__
takes a list of dictionaries to evaluate and returns with the first True
.
from leopards import Q
l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"OR": [{"name__contains": "k"}, {"age__gte": 21}]})
print(list(filtered))
output
[{'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]
NOT
or __not__
takes a dict for query run.
from leopards import Q
l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"age__gt": 15, "NOT": {"age__eq": 19}})
print(list(filtered))
output
[{'name': 'John', 'age': '16'}, {'name': 'Sarah', 'age': '21'}]
AND
or __and__
takes a list of dict for query run, returns with the first False
.
from leopards import Q
l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"__and__": [{"age__gte": 15}, {"age__lt": 21}]})
print(list(filtered))
output
[{'name': 'John', 'age': '16'}, {'name': 'Mike', 'age': '19'}]
You can run the following aggregations
- Count
- Max
- Min
- Sum
- Avg
Find the count of certain aggregated column
l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"},{"name":"John","age":"19"}]
from leopards import Count
count = Count(l,['age'])
output
[{"age":"16","count":1},{"age":"19","count":2}, {"age":"21","count":1}]
Find the Max value for a certain column in certain aggregated columns
l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"},{"name":"John","age":"19"}]
from leopards import Max
count = Max(l,"age",['name'],dtype=int)
output
[{'name': 'John', 'age': '19'}, {'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]
Notes:
- If you don't pass the aggregation columns, the maximum will be found across dataset.
- You can pass the datatype of the column to convert it on the fly while evaluating
l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"},{"name":"John","age":"19"}]
from leopards import Max
m = Max(l,"age",dtype=int)
output
[{'age': 21}]
Find the Max value for a certain column in certain aggregated columns
l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"},{"name":"John","age":"19"}]
from leopards import Min
m = Min(l,"age",['name'])
output
[{'name': 'John', 'age': '16'}, {'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]
Note:
- If you don't pass the aggregation columns, the min will be found across dataset.
- You can pass the datatype of the column to convert it on the fly while evaluating
Like Min and Max but only works with integers and floats.
This is done on Python 3.8 running on Ubuntu 22.04 on i7 11th generation and 32 GB of RAM.
Comparison | Pandas | Leopards |
---|---|---|
Package Size (Lower is better) |
29.8 MB | 7.5 KB |
import Time (Worst) (Lower is better) |
146 ms | 1.05 ms |
load 10k CSV lines (Lower is better) [1] |
0.295s | 0.138s |
get first matched record (Lower is better) |
0.310s | 0.017s |
print all filtered records (10/10k) (Lower is better) |
0.310s | 0.137s |
filter by integers (Lower is better) |
0.316s | 0.138s |
[1] This was loading the whole csv in memory which was for sake of fair comparison. Nevertheless, Leopards can work with DictReader as an iterable which executes in 0.014s, then it handles line by line.
Thanks for Asma Tahir for Pandas stats.