Tags: TIBCOSoftware/snappydata

v1.3.1-HF-1

Removed "enterprise" from docs and comments

also removed commented-out portions for enterprise handling in build.gradle

v1.3.0-HF-1

Update version to 1.3.0-HF-1 for hotfix release

- update submodule links to get the fix for GEODE-8029
- other relevant fixes ported from Geode: GEODE-1969, GEODE-5302, GEODE-8667,
                                          GEODE-9881, GEODE-9854
- fixed some test failures

v1.3.1

Updated release notes for the latest changes

- also fixed version string in jar manifests
- updated spark sub-module link

v1.3.0

Improved quickstart

- improved Quickstart.scala
- minor fix in IntervalExpression
- added docs/.gitignore to remove generated apidocs and copied reference/sql_functions

v1.3.0-rc2

1.3.0 RC2 release with the following major additions over RC1:

- Column batch auto-compaction: Deletes in column tables cause the relevant rows in the column
  storage batches to be marked as deleted. If there is a large number of deletions in a batch,
  this change automatically compacts the batch in storage; the previous code removed a batch
  only after all of its rows had been deleted. The compaction happens in the foreground, in the
  thread performing the delete. Likewise, if there is a large number of updates to a batch,
  it is compacted after a limit. Two spark-level properties (which have to be specified in
  conf/leads and conf/servers with a -snappydata... prefix, or as a system property with -D...)
  control the ratio at which compaction is triggered; a configuration sketch follows this list:
  - snappydata.column.compactionRatio: The ratio of deletes after which the batch will be
    compacted (default is 0.1)
  - snappydata.column.updateCompactionRatio: The ratio of updates to a column after which
    the batch will be compacted (default is 0.2)

- Overflow of large results to disk on the executors: To handle OOM and related problems with
  queries whose multi-GB results accumulate in the memory of leads/servers, the new
  implementation dumps large results to local disk files on the executors themselves. Instead
  of sending back the results, it sends back the Spark block IDs of those disk files to the
  lead and on to the server that is servicing the client. When the iterator on the server
  finds a block ID instead of a row, it fetches the disk block from the executor at that point
  and then removes the disk block. So large results are no longer stored in memory anywhere;
  only the memory for one disk block is required temporarily on the server (32MB by default).
  Note that this works only when connecting to the cluster using JDBC/ODBC (both the SnappyData
  and Spark hive servers work), or, when using SnappySession, by calling
  SnappySession.sql().toLocalIterator() (a usage sketch follows this list). Any intervening
  Dataset API usage will cause the regular Spark DataFrame to be used instead of the optimized
  DataFrame implementation that has this support. There are two spark-level properties that
  can alter the defaults:
  - spark.sql.maxMemoryResultSize: The maximum size of results in a partition on an executor
    that will be held in memory, after which they are dumped to disk. It is expressed as a
    string with units like 4m or 4mb for megabytes (default is 4mb), while the maximum size
    of a single disk file is limited to 8X this value (so 32mb by default).
  - spark.sql.resultPersistenceTimeout: The maximum time in seconds after completion of a query
    for which the disk blocks are retained (default is 21600, i.e. 6 hours). The timeout is
    required because the client can skip consuming the entire result set and stop the iteration
    midway, so some disk blocks would remain forever if not cleared out.
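
  A hedged Scala sketch of the SnappySession route that keeps this optimized path, assuming a
  running cluster and an existing SparkSession; the table name and the adjusted property values
  are illustrative, and setting these properties per-session via SQL SET is an assumption (they
  can also be placed in the conf files as described for the compaction properties above):

    import scala.collection.JavaConverters._
    import org.apache.spark.sql.{SnappySession, SparkSession}

    val spark = SparkSession.builder().appName("large-results").getOrCreate()
    val snappy = new SnappySession(spark.sparkContext)

    // illustrative tuning: spill to disk after 8MB per partition, keep blocks for 1 hour
    snappy.sql("set spark.sql.maxMemoryResultSize=8m")
    snappy.sql("set spark.sql.resultPersistenceTimeout=3600")

    // calling toLocalIterator() directly on the result of sql() keeps the optimized
    // implementation: rows are streamed one disk block at a time instead of the whole
    // result being materialized in lead/server memory
    val rows = snappy.sql("select * from big_table").toLocalIterator().asScala
    rows.foreach { row =>
      // process each row incrementally here
    }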

- Caching of meta-data of external tables: This fixes the slow query problem for external
  tables having a large number of partitions with disk files. The underlying issue is that,
  with a large number of partitions and each partition having a significant number of files,
  the total number of files is huge. Every query first tries to build the meta-data for all
  of the existing parquet/orc/csv files (which may have changed since the last time, due to
  new inserts for example), and that takes nearly all of the time reported for such queries.
  This change caches the meta-data in the current session, so the first query on a connection
  or SnappySession will be slow but subsequent ones will be fast. The cache is automatically
  invalidated if there are new inserts or alterations to the external table from the same
  cluster. This brings an important caveat: if the external parquet/orc/csv data is altered
  in any way from outside the cluster, then you must invalidate the cache in the cluster
  explicitly using the "REFRESH TABLE <name>" SQL or the SnappySession.catalog.refreshTable
  API (a short example follows below), else queries will keep returning results as per the
  old meta-data. This caveat applied even before this change, since the file/partition count
  is cached globally in the driver's ExternalCatalog.
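
  For example, after the files of an external table have been modified from outside the
  cluster, the cached meta-data can be refreshed in either of the two ways mentioned above
  (the snappy session and the table name ext_sales are illustrative):

    // SQL route, e.g. from a JDBC/ODBC connection or a SnappySession
    snappy.sql("REFRESH TABLE ext_sales")

    // equivalent catalog API on SnappySession (inherited from SparkSession)
    snappy.catalog.refreshTable("ext_sales")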

- Clear broadcast blocks eagerly: This change clears all the memory/disk blocks used in
  broadcast joins eagerly at the end of the query/DML operation. The previous Spark code
  cleared the blocks on the executors only when a GC ran on the driver and the broadcast
  object's weak reference was collected, so the blocks could be held unnecessarily on the
  executors for quite a while, increasing GC pressure.

Apart from the above, a few Spark patches have been merged to address occasional problems
reported against them: SPARK-26352, SPARK-13747, SPARK-27065. Also, in accordance with
SPARK-13747, all instances of Scala Await.ready/result calls in the code have been changed to
use Spark's ThreadUtils.await methods instead; a sketch of that pattern follows.
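
A minimal sketch of the replacement pattern, assuming the calling code lives under an
org.apache.spark package (ThreadUtils is private[spark], so it is visible only to such code);
the future and timeout are illustrative:

  import scala.concurrent.{ExecutionContext, Future}
  import scala.concurrent.duration._
  import org.apache.spark.util.ThreadUtils

  implicit val ec: ExecutionContext = ExecutionContext.global
  val someFuture: Future[Int] = Future { 42 }

  // Previously: Await.result(someFuture, 30.seconds), which goes through Scala's
  // BlockContext and is problematic with ForkJoinPool-based execution (SPARK-13747).
  // ThreadUtils.awaitResult avoids that and wraps failures with a clearer stack trace.
  val value: Int = ThreadUtils.awaitResult(someFuture, 30.seconds)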

v1.2.0

* Version changes in docs files.

v1.1.1

Link with latest store

v1.1.0

* Updated docs with upcoming release version 1.1.0

* Select the product name to print for the 'snappy version' command depending upon the enterprise flag.
* Discussed with @kneeraj
* Link to the latest store commit.

v1.0.2.1

* Update the Zeppelin interpreter version.

v1.0.2

* Corrected Zeppelin interpreter version.