Merge pull request activeloopai#1896 from activeloopai/fy_query_api_ref
Tensor Query Language documentation
FayazRahman authored Sep 26, 2022
2 parents c8e45de + 50fc765 commit b14d4c7
Showing 6 changed files with 128 additions and 55 deletions.
107 changes: 107 additions & 0 deletions docs/source/Tensor-Query-Language.rst
@@ -0,0 +1,107 @@
.. _tql:

Tensor Query Language
=====================

.. role:: sql(code)
   :language: sql

This page describes Tensor Query Language (TQL): the SQL expressions it supports and the new expressions it adds on top of SQL.

Language
~~~~~~~~

SELECT
------

TQL supports only the :sql:`SELECT` statement. Every TQL query starts with :sql:`SELECT *`; the only selection TQL supports is :sql:`*`, which selects all tensors.
The general syntax of the select statement is the following:

.. code-block:: sql

   SELECT * [FROM string] [WHERE expression] [LIMIT number [OFFSET number]] [ORDER BY expression [ASC/DESC]]

Each bracketed part of the :sql:`SELECT` statement is optional and can be omitted.

A :sql:`FROM` expression is allowed, but it has no effect on the query: for now, TQL queries run on a specific dataset,
so the :sql:`FROM` source is already known from the context.
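
As a rough illustration of the clause order in the syntax above, the following sketch assembles a query string from optional parts. The helper ``build_tql`` and its parameters are hypothetical and are not part of the Hub API; only the clause order is taken from this page.

```python
from typing import Optional


def build_tql(
    where: Optional[str] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    order_by: Optional[str] = None,
    descending: bool = False,
) -> str:
    """Hypothetical helper: join the optional clauses in the documented order."""
    parts = ["SELECT *"]  # every TQL query starts with SELECT *
    if where is not None:
        parts.append(f"WHERE {where}")
    if limit is not None:
        parts.append(f"LIMIT {limit}")
        if offset is not None:  # OFFSET only appears together with LIMIT
            parts.append(f"OFFSET {offset}")
    if order_by is not None:
        parts.append(f"ORDER BY {order_by}")
        if descending:  # ascending is the default, so ASC is left out
            parts.append("DESC")
    return " ".join(parts)


query_string = build_tql(
    where="labels == 0 OR labels == 1", limit=10, order_by="labels", descending=True
)
print(query_string)
```

The resulting string is the kind of value that the ``query(dataset, query_string)`` function touched by this commit accepts as its second argument.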

WHERE
-----

The :sql:`WHERE` expression filters the samples in the dataset by conditions. The conditions must be convertible to boolean.
Any expression that outputs a number is converted to boolean, with non-zero values taken as ``True``. If the expression is not convertible to boolean,
as is the case for **strings**, **json** objects and **arrays**, the query reports the corresponding error.
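
The conversion rule can be modeled in plain Python as below. This is only a sketch of the rule as described here: the function name ``where_to_bool`` and the choice of ``TypeError`` are assumptions, and the actual error raised by TQL may differ.

```python
from numbers import Number


def where_to_bool(value):
    """Hypothetical model of how a WHERE result is turned into a boolean."""
    if isinstance(value, Number):  # covers int, float and bool
        return value != 0  # non-zero numbers count as True
    # Strings, json objects (dicts) and arrays (lists) are rejected.
    raise TypeError(f"cannot convert {type(value).__name__} to boolean")


print(where_to_bool(3))
print(where_to_bool(0.0))
try:
    where_to_bool("cat")
except TypeError as err:
    print("error:", err)
```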

ORDER BY
--------

The :sql:`ORDER BY` expression orders the output of the query by the given criteria. The criteria can be any expression whose output can be ordered.
Orderable outputs are scalar numbers or strings, or a json value that contains a number or a string.

The :sql:`ORDER BY` statement optionally accepts the :sql:`ASC/DESC` keywords, which specify whether the ordering is ascending or descending.
It is ascending by default.
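
In Python terms, the ordering behaves roughly like ``sorted`` with a key function. The sample records and the helper ``order_by`` below are made up for illustration and do not reflect how TQL is implemented.

```python
# Hypothetical samples: each record stands in for one dataset sample.
samples = [
    {"index": 0, "label": 3},
    {"index": 1, "label": 1},
    {"index": 2, "label": 2},
]


def order_by(rows, key, descending=False):
    """Sketch of ORDER BY: ascending unless DESC is requested."""
    return sorted(rows, key=key, reverse=descending)


ascending = [row["index"] for row in order_by(samples, key=lambda r: r["label"])]
descending = [
    row["index"] for row in order_by(samples, key=lambda r: r["label"], descending=True)
]
print(ascending)
print(descending)
```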

LIMIT OFFSET
------------

:sql:`LIMIT` and :sql:`OFFSET` expressions are used to limit the output of the query by index, as in SQL.
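
Assuming the usual SQL meaning (skip ``OFFSET`` results, then return at most ``LIMIT`` results), the effect matches a Python slice. The helper name ``limit_offset`` is hypothetical.

```python
def limit_offset(rows, limit, offset=0):
    """Sketch of LIMIT/OFFSET: skip `offset` rows, then keep at most `limit` rows."""
    return rows[offset:offset + limit]


result_indices = list(range(10))  # stand-in for the indices a query selected
page = limit_offset(result_indices, limit=3, offset=4)
print(page)
```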

Expressions
-----------

TQL supports any comparison operator (``==, !=, <, <=, >, >=``) where the left side is a tensor and the right side is a known value.

The value can be a numeric scalar, an array, or a string.

String literals must be enclosed in single quotes (``'``) and can be used with ``class_label``, ``json`` and ``text`` tensors.

For class labels, TQL looks up the corresponding numeric value in the **class_names** list and does a numeric comparison.
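
The class label lookup can be pictured as follows. The ``class_names`` list and the helper ``class_label_equals`` are invented for this sketch; how TQL reacts to a name that is not in the list is not covered here.

```python
class_names = ["car", "person", "tree"]  # hypothetical class_names of a tensor


def class_label_equals(stored_value, literal, names):
    """Sketch: translate the string literal to its index, then compare numbers."""
    numeric_value = names.index(literal)  # 'person' -> 1
    return stored_value == numeric_value


print(class_label_equals(1, "person", class_names))
print(class_label_equals(2, "person", class_names))
```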

For json and text tensors, TQL does a string comparison. The left side of the expression
can be indexed (subscripted) if the tensor is a multidimensional array or json. Json values support indexing by string, e.g. ``index_meta['id'] == 'some_id'``.
Json values can also be indexed by number if the underlying data is an array.

Numeric multidimensional tensors can be indexed by numbers, e.g. ``categories[0] == 1``, and also support Python-style slicing and
multidimensional indexing, such as ``boxes[:, 2]``. This last expression returns an array containing the third element of each row of the
two-dimensional array ``boxes``.
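
To make the difference between slicing and multidimensional indexing concrete, here is a small sketch on nested Python lists, assuming TQL subscripts follow NumPy conventions. The ``boxes`` values are made up.

```python
# Hypothetical two-dimensional boxes tensor: one row per box.
boxes = [
    [10, 20, 30, 40],
    [11, 21, 31, 41],
    [12, 22, 32, 42],
]

# Slicing along the first axis, like boxes[:2]: the first two rows.
first_two_rows = boxes[:2]

# Indexing along the second axis, like NumPy-style boxes[:, 2]:
# the third element of every row.
third_elements = [row[2] for row in boxes]

print(first_two_rows)
print(third_elements)
```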

TQL supports logical operators - :sql:`AND`, :sql:`OR` and :sql:`NOT`. These operators can be used to combine boolean expressions.
For example,

.. code-block:: sql

   labels == 0 OR labels == 1

From SQL we also support the following two keywords:

- :sql:`BETWEEN`

  .. code-block:: sql

     labels BETWEEN 0 AND 5

- :sql:`IN`

  .. code-block:: sql

     labels IN ARRAY[0, 2, 4, 6, 8]

Functions
---------

The following predefined functions can be used in :sql:`WHERE` expressions as well as in :sql:`ORDER BY` expressions:

- ``CONTAINS`` - checks whether the given tensor contains the given value - :sql:`CONTAINS(categories, 'person')`
- ``RANDOM`` - returns a random number. Can be used in :sql:`ORDER BY` to shuffle the output - :sql:`ORDER BY RANDOM()`
- ``SHAPE`` - returns the shape array of the given tensor - ``SHAPE(boxes)``
- ``ALL`` - takes an array of booleans and returns a single boolean, ``True`` if all elements of the input array are ``True``
- ``ALL_STRICT`` - same as :sql:`ALL`, with one difference: :sql:`ALL` returns ``True`` on an empty array, while :sql:`ALL_STRICT` returns ``False``
- ``ANY`` - takes an array of booleans and returns a single boolean, ``True`` if any of the elements in the input array is ``True``
- ``LOGICAL_AND`` - takes two boolean arrays, does an element-wise **logical and**, and returns the result array. Returns ``False`` if the input arrays have different sizes.
- ``LOGICAL_OR`` - takes two boolean arrays, does an element-wise **logical or**, and returns the result array. Returns ``False`` if the input arrays have different sizes.
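
The boolean helpers in this list can be approximated in plain Python as shown below. The ``tql_*`` names are hypothetical stand-ins that only mirror the behavior described above, in particular the empty-array and size-mismatch cases.

```python
def tql_all(values):
    """ALL: True if every element is True (also True for an empty array)."""
    return all(values)


def tql_all_strict(values):
    """ALL_STRICT: like ALL, but an empty array gives False."""
    return len(values) > 0 and all(values)


def tql_any(values):
    """ANY: True if at least one element is True."""
    return any(values)


def tql_logical_and(left, right):
    """LOGICAL_AND: element-wise and; False when the sizes differ."""
    if len(left) != len(right):
        return False
    return [a and b for a, b in zip(left, right)]


def tql_logical_or(left, right):
    """LOGICAL_OR: element-wise or; False when the sizes differ."""
    if len(left) != len(right):
        return False
    return [a or b for a, b in zip(left, right)]


print(tql_all([]), tql_all_strict([]))
print(tql_logical_and([True, False], [True, True]))
print(tql_logical_or([True, False], [False]))
```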

UNION, INTERSECT, EXCEPT
------------------------

A query can contain multiple :sql:`SELECT` statements combined by one of the set operations: :sql:`UNION`, :sql:`INTERSECT` or :sql:`EXCEPT`.
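
Assuming each :sql:`SELECT` yields a set of sample indices and the set operations keep their standard SQL meaning, the combinations can be sketched with Python sets. The index values are made up, and the order of results in TQL output is not specified here.

```python
# Hypothetical results of two SELECT statements, modeled as sets of sample indices.
labels_zero = {0, 3, 5, 8}
first_five = {0, 1, 2, 3, 4}

union = sorted(labels_zero | first_five)      # UNION: samples in either result
intersect = sorted(labels_zero & first_five)  # INTERSECT: samples in both results
except_ = sorted(labels_zero - first_five)    # EXCEPT: in the first result only

print(union)
print(intersect)
print(except_)
```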
2 changes: 1 addition & 1 deletion docs/source/hub.experimental.dataloader.rst
@@ -3,5 +3,5 @@ hub.experimental.dataloader

.. currentmodule:: hub.experimental.dataloader

.. autoclass:: Hub3DataLoader
.. autoclass:: Hub3DataLoader()
:members:
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -37,6 +37,7 @@ Hub is an open-source database for AI.
:caption: Experimental API

Dataloader & Query <Dataloader-and-Query>
Tensor Query Language <Tensor-Query-Language>

.. toctree::
:maxdepth: 1
2 changes: 1 addition & 1 deletion hub/experimental/__init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
from hub.experimental.dataloader import dataloader
from hub.experimental.dataloader import dataloader, Hub3DataLoader
from hub.experimental.hub3_query import query
from hub.experimental.convert_to_hub3 import dataset_to_hub3
50 changes: 16 additions & 34 deletions hub/experimental/dataloader.py
@@ -52,7 +52,7 @@ def __init__(
self._return_index = _return_index

def batch(self, batch_size: int, drop_last: bool = False):
"""Returns a batched hub.experimental.Hub3DataLoader object.
"""Returns a batched :class:`Hub3DataLoader` object.
Args:
@@ -61,7 +61,7 @@ def batch(self, batch_size: int, drop_last: bool = False):
Returns:
Hub3DataLoader: A hub.experimental.Hub3DataLoader object.
Hub3DataLoader: A :class:`Hub3DataLoader` object.
Raises:
@@ -76,11 +76,11 @@ def batch(self, batch_size: int, drop_last: bool = False):
return self.__class__(**all_vars)

def shuffle(self):
"""Returns a shuffled hub.experimental.Hub3DataLoader object.
"""Returns a shuffled :class:`Hub3DataLoader` object.
Returns:
Hub3DataLoader: A hub.experimental.Hub3DataLoader object.
Hub3DataLoader: A :class:`Hub3DataLoader` object.
Raises:
@@ -93,15 +93,15 @@ def shuffle(self):
return self.__class__(**all_vars)

def transform(self, transform: Union[Callable, Dict[str, Optional[Callable]]]):
"""Returns a transformed hub.experimental.Hub3DataLoader object.
"""Returns a transformed :class:`Hub3DataLoader` object.
Args:
transform (Callable or Dict[Callable]): A function or dictionary of functions to apply to the data.
Returns:
Hub3DataLoader: A hub.experimental.Hub3DataLoader object.
Hub3DataLoader: A :class:`Hub3DataLoader` object.
Raises:
@@ -125,33 +125,15 @@ def transform(self, transform: Union[Callable, Dict[str, Optional[Callable]]]):
return self.__class__(**all_vars)

def query(self, query_string: str):
"""Returns a sliced hub.experimental.Hub3DataLoader object with given query results.
It allows to run SQL like queries on dataset and extract results. Currently supported keywords are the following:
+-------------------------------------------+
| SELECT |
+-------------------------------------------+
| FROM |
+-------------------------------------------+
| CONTAINS |
+-------------------------------------------+
| ORDER BY |
+-------------------------------------------+
| GROUP BY |
+-------------------------------------------+
| LIMIT |
+-------------------------------------------+
| OFFSET |
+-------------------------------------------+
| RANDOM() -> for shuffling query results |
+-------------------------------------------+
"""Returns a sliced :class:`Hub3DataLoader` object with given query results.
It allows to run SQL like queries on dataset and extract results. See supported keywords and the Tensor Query Language documentation
:ref:`here <tql>`.
Args:
query_string (str): An SQL string adjusted with new functionalities to run on the dataset object
Returns:
Hub3DataLoader: A hub.experimental.Hub3DataLoader object.
Hub3DataLoader: A :class:`Hub3DataLoader` object.
Examples:
>>> import hub
@@ -179,7 +161,7 @@ def pytorch(
distributed: bool = False,
return_index: bool = True,
):
"""Returns a hub.experimental.Hub3DataLoader object.
"""Returns a :class:`Hub3DataLoader` object.
Args:
@@ -193,7 +175,7 @@ def pytorch(
Returns:
Hub3DataLoader: A hub.experimental.Hub3DataLoader object.
Hub3DataLoader: A :class:`Hub3DataLoader` object.
Raises:
@@ -232,7 +214,7 @@ def numpy(
num_threads: Optional[int] = None,
prefetch_factor: int = 10,
):
"""Returns a hub.experimental.Hub3DataLoader object.
"""Returns a :class:`Hub3DataLoader` object.
Args:
num_workers (int): Number of workers to use for transforming and processing the data. Defaults to 0.
@@ -242,7 +224,7 @@ def numpy(
Returns:
Hub3DataLoader: A hub.experimental.Hub3DataLoader object.
Hub3DataLoader: A :class:`Hub3DataLoader` object.
Raises:
@@ -314,14 +296,14 @@ def __iter__(self):


def dataloader(dataset) -> Hub3DataLoader:
"""Returns a hub.experimental.Hub3DataLoader object which can be transformed to either pytorch dataloader or numpy.
"""Returns a :class:`Hub3DataLoader` object which can be transformed to either pytorch dataloader or numpy.
Args:
dataset: hub.Dataset object on which dataloader needs to be built
Returns:
Hub3DataLoader: A hub.experimental.Hub3DataLoader object.
Hub3DataLoader: A :class:`Hub3DataLoader` object.
Examples:
21 changes: 2 additions & 19 deletions hub/experimental/hub3_query.py
@@ -6,25 +6,8 @@
def query(dataset, query_string: str):
"""Returns a sliced hub.Dataset with given query results.
It allows to run SQL like queries on dataset and extract results. Currently supported keywords are the following:
+-------------------------------------------+
| SELECT |
+-------------------------------------------+
| FROM |
+-------------------------------------------+
| CONTAINS |
+-------------------------------------------+
| ORDER BY |
+-------------------------------------------+
| GROUP BY |
+-------------------------------------------+
| LIMIT |
+-------------------------------------------+
| OFFSET |
+-------------------------------------------+
| RANDOM() -> for shuffling query results |
+-------------------------------------------+
It allows to run SQL like queries on dataset and extract results. See supported keywords and the Tensor Query Language documentation
:ref:`here <tql>`.
Args: