update the post

sungsoo · Jan 27, 2014 · 267858c · 267858c
1 parent 0f95c83
commit 267858c
Show file tree

Hide file tree

Showing 4 changed files with 9 additions and 10 deletions.
diff --git a/_posts/2014-01-27-operators.md → _posts/2014-01-28-operators.md b/_posts/2014-01-27-operators.md → _posts/2014-01-28-operators.md
@@ -1,7 +1,7 @@
 ---
 layout: post
 title: Continuous Query Semantics and Operators (Part II)
-date: 2014-01-27
+date: 2014-01-28
 categories: [computer science]
 tags: [data management, stream computing]
 
@@ -10,17 +10,16 @@ tags: [data management, stream computing]
 Operators
 ---
 In this post, we now introduce common *continuous query operators*. First, we need to define two important concepts: *monotonicity* and *non-blocking* execution.
-**Definition 2.2** A continuous query or continuous query operator *Q* is monotonic if *Q*(*τ*) ⊆ *Q*(*τ′*) for all *τ ≤ τ′*.  
-For example, simple selection over a single stream or a join of several streams are monotonic (recall our assumption that base streams are append-only). To see this, note that when a new tuple arrives, it either satisfies the (selection or join) predicate or it does not, and the satisfaction condition does not change over time. Thus, at any point in time, all the previously returned results remain in *Q*(*τ*). On the other hand, continuous queries with negation or set difference are non-monotonic, even on append-only streams.  
-**Definition 2.3** A continuous query or continuous query operator *Q* is non-blocking if it does not need to wait until it has seen the entire input before producing results.  
-Some operators have blocking and non-blocking implementations; of course, only the latter are used in DSMSs since we do not know when we will have seen the entire input. For instance, traditional SQL queries with aggregation are blocking since they scan the whole relation and then return the answer. However, on-line aggregation [Hellerstein et al., 1997; Law et al., 2004], where partial answers are incrementally returned as they are computed over the data seen so far, is non- blocking. Note that Definitions 2.2 and 2.3 are related: Q is monotonic if and only if it is non-blocking [Law et al., 2004].  
-The simplest continuous query operators are *stateless*; examples include duplicate-preserving projection, selection, and union. These operators process new tuples on-the-fly without storing any temporary results, either by discarding unwanted attributes (projection) or dropping tuples that do not satisfy the selection condition (technically, the union operator temporarily buffers the inputs to ensure that its output stream is ordered). Figure 2.1(a) shows a simple example of selection (of all the “a” tuples) over the character stream S1.
-A non-blocking, pipelined join (of two character streams, S1 and S2) [Wilschut and Apers, 1991] is illustrated in Figure 2.1(b). A hash-based implementation maintains hash tables on both inputs. When a new tuple arrives on one of the inputs, it is inserted into its hash table and probed against the other stream’s hash table to generate results involving the new tuple, if any. Joins of more than two streams and joins of streams with a static relation are straightforward extensions. In the former, for each arrival on one input, the states of all the other inputs are probed in some order
-[Golab and Özsu, 2003b; Viglas et al., 2003]. In the latter, new arrivals on the stream trigger the probing of the relation [Balazinska et al., 2007].
-Since maintaining hash tables on unbounded streams is not practical, most DSMSs only support window joins. Query Q2 below is an example of a tumbling window join (on the attribute attr) of two streams, S1 and S2, where the result tuples must satisfy the join predicate and belong to the same one-minute tumbling window. Similar to Q1, tumbling windows are created by grouping on the timestamp attribute. At the end of each window, the join operator can clear its hash tables and start producing results for the next window.
+**Definition 2.2** A continuous query or continuous query operator *Q* is *monotonic* if *Q*(*τ*) ⊆ *Q*(*τ′*) for all *τ ≤ τ′*.  
+For example, simple *selection* over a single stream or a *join* of several streams are *monotonic* (recall our assumption that base streams are append-only). To see this, note that when a new tuple arrives, it either satisfies the (selection or join) predicate or it does not, and the *satisfaction condition* does not change over time. Thus, at any point in time, all the previously returned results remain in *Q*(*τ*). On the other hand, continuous queries with *negation* or *set difference* are *non-monotonic*, even on *append-only streams*.  
+**Definition 2.3** A continuous query or continuous query operator *Q* is *non-blocking* if it does not need to wait until it has seen the entire input before producing results.  
+Some operators have *blocking* and *non-blocking* implementations; of course, only the latter are used in DSMSs since we do not know when we will have seen the entire input. For instance, *traditional SQL* queries with aggregation are *blocking* since they scan the whole relation and then return the answer. However, on-line aggregation, where *partial* answers are incrementally returned as they are computed over the data seen so far, is non- blocking. Note that **Definitions 2.2** and **2.3** are related: *Q* is *monotonic* if and only if it is *non-blocking*.  
+The simplest continuous query operators are *stateless*; examples include *duplicate-preserving projection*, *selection*, and *union*. These operators process new tuples *on-the-fly* without storing any temporary results, either by discarding unwanted attributes (projection) or dropping tuples that do not satisfy the selection condition (technically, the union operator temporarily buffers the inputs to ensure that its output stream is ordered). Figure 2.1(a) shows a simple example of selection (of all the “a” tuples) over the character stream S1.
+A *non-blocking*, *pipelined join* (of two character streams, *S1* and *S2*)  is illustrated in Figure 2.1(b). A *hash-based implementation* maintains hash tables on both inputs. When a new tuple arrives on one of the inputs, it is inserted into its hash table and probed against the other stream’s hash table to generate results involving the new tuple, if any. Joins of more than two streams and joins of streams with a static relation are straightforward extensions. In the former, for each arrival on one input, the states of all the other inputs are probed in some order. In the latter, new arrivals on the stream trigger the probing of the relation.
+Since maintaining hash tables on *unbounded* streams is not practical, most DSMSs only support *window joins*. Query *Q2* below is an example of a *tumbling window join* (on the attribute attr) of two streams, *S1* and *S2*, where the result tuples must satisfy the join predicate and belong to the same one-minute tumbling window. Similar to *Q1*, tumbling windows are created by grouping on the timestamp attribute. At the end of each window, the *join operator* can clear its hash tables and start producing results for the next window.
 ```Q2: SELECT	* 
 	FROM	S1, S2    WHERE	S1.attr = S2.attr    GROUP BY S1.timestamp/60 AS minute
-```One disadvantage of Q2 is that matching tuples, whose timestamps are only a few seconds apart but happen to fall into different tumbling windows, will not be reported. Another option is a sliding window join [Golab and Özsu, 2003b; Kang et al., 2003], where two tuples join if they satisfy the join predicate and if their timestamps are at most one window length, call it w, apart. A sliding window join may be expressed in a similar way to Q3 below:
+```One disadvantage of *Q2* is that *matching tuples*, whose timestamps are only a few seconds apart but happen to fall into different tumbling windows, will not be reported. Another option is a *sliding window join*, where two tuples join if they satisfy the join predicate and if their timestamps are at most one window length, call it *w*, apart. A sliding window join may be expressed in a similar way to *Q3* below:
 ```Q3: SELECT *
 	FROM	S1, S2    WHERE	S1.attr = S2.attr    GROUP BY |S1.timestamp - S2.timestamp| <= w```
 Alternatively, if the query language allows explicit specification of sliding windows (e.g., using the SQL:1999 RANGE keyword), Q3 may be expressed as follows:```Q4: SELECT *	FROM S1 [RANGE w], S2 [RANGE w] 	WHERE S1.attr = S2.attr

diff --git a/images/materialized-view.png b/images/materialized-view.png
diff --git a/images/query-languages.png b/images/query-languages.png
diff --git a/images/query-operators.png b/images/query-operators.png