Skip to content

Commit

Permalink
update the post
Browse files Browse the repository at this point in the history
  • Loading branch information
Sung-Soo Kim committed Jan 27, 2014
1 parent 0f95c83 commit 267858c
Show file tree
Hide file tree
Showing 4 changed files with 9 additions and 10 deletions.
19 changes: 9 additions & 10 deletions _posts/2014-01-27-operators.md → _posts/2014-01-28-operators.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
layout: post
title: Continuous Query Semantics and Operators (Part II)
date: 2014-01-27
date: 2014-01-28
categories: [computer science]
tags: [data management, stream computing]

Expand All @@ -10,17 +10,16 @@ tags: [data management, stream computing]
Operators
---
In this post, we now introduce common *continuous query operators*. First, we need to define two important concepts: *monotonicity* and *non-blocking* execution.
**Definition 2.2** A continuous query or continuous query operator *Q* is monotonic if *Q*(*τ*) ⊆ *Q*(*τ′*) for all *τ ≤ τ′*.
For example, simple selection over a single stream or a join of several streams are monotonic (recall our assumption that base streams are append-only). To see this, note that when a new tuple arrives, it either satisfies the (selection or join) predicate or it does not, and the satisfaction condition does not change over time. Thus, at any point in time, all the previously returned results remain in *Q*(*τ*). On the other hand, continuous queries with negation or set difference are non-monotonic, even on append-only streams.
**Definition 2.3** A continuous query or continuous query operator *Q* is non-blocking if it does not need to wait until it has seen the entire input before producing results.
Some operators have blocking and non-blocking implementations; of course, only the latter are used in DSMSs since we do not know when we will have seen the entire input. For instance, traditional SQL queries with aggregation are blocking since they scan the whole relation and then return the answer. However, on-line aggregation [Hellerstein et al., 1997; Law et al., 2004], where partial answers are incrementally returned as they are computed over the data seen so far, is non- blocking. Note that Definitions 2.2 and 2.3 are related: Q is monotonic if and only if it is non-blocking [Law et al., 2004].
The simplest continuous query operators are *stateless*; examples include duplicate-preserving projection, selection, and union. These operators process new tuples on-the-fly without storing any temporary results, either by discarding unwanted attributes (projection) or dropping tuples that do not satisfy the selection condition (technically, the union operator temporarily buffers the inputs to ensure that its output stream is ordered). Figure 2.1(a) shows a simple example of selection (of all the “a” tuples) over the character stream S1.
A non-blocking, pipelined join (of two character streams, S1 and S2) [Wilschut and Apers, 1991] is illustrated in Figure 2.1(b). A hash-based implementation maintains hash tables on both inputs. When a new tuple arrives on one of the inputs, it is inserted into its hash table and probed against the other stream’s hash table to generate results involving the new tuple, if any. Joins of more than two streams and joins of streams with a static relation are straightforward extensions. In the former, for each arrival on one input, the states of all the other inputs are probed in some order
[Golab and Özsu, 2003b; Viglas et al., 2003]. In the latter, new arrivals on the stream trigger the probing of the relation [Balazinska et al., 2007].
Since maintaining hash tables on unbounded streams is not practical, most DSMSs only support window joins. Query Q2 below is an example of a tumbling window join (on the attribute attr) of two streams, S1 and S2, where the result tuples must satisfy the join predicate and belong to the same one-minute tumbling window. Similar to Q1, tumbling windows are created by grouping on the timestamp attribute. At the end of each window, the join operator can clear its hash tables and start producing results for the next window.
**Definition 2.2** A continuous query or continuous query operator *Q* is *monotonic* if *Q*(*τ*) ⊆ *Q*(*τ′*) for all *τ ≤ τ′*.
For example, simple *selection* over a single stream or a *join* of several streams are *monotonic* (recall our assumption that base streams are append-only). To see this, note that when a new tuple arrives, it either satisfies the (selection or join) predicate or it does not, and the *satisfaction condition* does not change over time. Thus, at any point in time, all the previously returned results remain in *Q*(*τ*). On the other hand, continuous queries with *negation* or *set difference* are *non-monotonic*, even on *append-only streams*.
**Definition 2.3** A continuous query or continuous query operator *Q* is *non-blocking* if it does not need to wait until it has seen the entire input before producing results.
Some operators have *blocking* and *non-blocking* implementations; of course, only the latter are used in DSMSs since we do not know when we will have seen the entire input. For instance, *traditional SQL* queries with aggregation are *blocking* since they scan the whole relation and then return the answer. However, on-line aggregation, where *partial* answers are incrementally returned as they are computed over the data seen so far, is non- blocking. Note that **Definitions 2.2** and **2.3** are related: *Q* is *monotonic* if and only if it is *non-blocking*.
The simplest continuous query operators are *stateless*; examples include *duplicate-preserving projection*, *selection*, and *union*. These operators process new tuples *on-the-fly* without storing any temporary results, either by discarding unwanted attributes (projection) or dropping tuples that do not satisfy the selection condition (technically, the union operator temporarily buffers the inputs to ensure that its output stream is ordered). Figure 2.1(a) shows a simple example of selection (of all the “a” tuples) over the character stream S1.
A *non-blocking*, *pipelined join* (of two character streams, *S1* and *S2*) is illustrated in Figure 2.1(b). A *hash-based implementation* maintains hash tables on both inputs. When a new tuple arrives on one of the inputs, it is inserted into its hash table and probed against the other stream’s hash table to generate results involving the new tuple, if any. Joins of more than two streams and joins of streams with a static relation are straightforward extensions. In the former, for each arrival on one input, the states of all the other inputs are probed in some order. In the latter, new arrivals on the stream trigger the probing of the relation.
Since maintaining hash tables on *unbounded* streams is not practical, most DSMSs only support *window joins*. Query *Q2* below is an example of a *tumbling window join* (on the attribute attr) of two streams, *S1* and *S2*, where the result tuples must satisfy the join predicate and belong to the same one-minute tumbling window. Similar to *Q1*, tumbling windows are created by grouping on the timestamp attribute. At the end of each window, the *join operator* can clear its hash tables and start producing results for the next window.
```Q2: SELECT *
FROM S1, S2 WHERE S1.attr = S2.attr GROUP BY S1.timestamp/60 AS minute
```One disadvantage of Q2 is that matching tuples, whose timestamps are only a few seconds apart but happen to fall into different tumbling windows, will not be reported. Another option is a sliding window join [Golab and Özsu, 2003b; Kang et al., 2003], where two tuples join if they satisfy the join predicate and if their timestamps are at most one window length, call it w, apart. A sliding window join may be expressed in a similar way to Q3 below:
```One disadvantage of *Q2* is that *matching tuples*, whose timestamps are only a few seconds apart but happen to fall into different tumbling windows, will not be reported. Another option is a *sliding window join*, where two tuples join if they satisfy the join predicate and if their timestamps are at most one window length, call it *w*, apart. A sliding window join may be expressed in a similar way to *Q3* below:
```Q3: SELECT *
FROM S1, S2 WHERE S1.attr = S2.attr GROUP BY |S1.timestamp - S2.timestamp| <= w```
Alternatively, if the query language allows explicit specification of sliding windows (e.g., using the SQL:1999 RANGE keyword), Q3 may be expressed as follows:```Q4: SELECT * FROM S1 [RANGE w], S2 [RANGE w] WHERE S1.attr = S2.attr
Expand Down
Binary file added images/materialized-view.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/query-languages.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/query-operators.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 267858c

Please sign in to comment.