Release 71 Stork-Billed Kingfisher

@alexanderdean released this 02 Oct 18:48

This release significantly overhauls Snowplow's handling of time and introduces event fingerprinting to support deduplication efforts. It also moves our validation of unstructured event and custom context JSONs "upstream", from our Hadoop Shred process into our Hadoop Enrich process.

Enrich

  • Added example event fingerprint enrichment configuration JSON (#1990)
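
To illustrate what an event fingerprint is for, here is a minimal Python sketch of the general idea: hash an event's fields while skipping the ones that are expected to differ between duplicate copies. It is not the enrichment's actual implementation. The function name, the excluded field names and the choice of MD5 are all assumptions made for illustration.

```python
import hashlib
from urllib.parse import parse_qsl

# Hypothetical exclusion list: fields assumed to differ between duplicate
# copies of the same event (not taken from the release notes).
EXCLUDED_PARAMETERS = frozenset({"eid", "stm"})


def event_fingerprint(querystring, excluded=EXCLUDED_PARAMETERS):
    """Return a hex digest over the event's fields, ignoring excluded ones."""
    pairs = parse_qsl(querystring, keep_blank_values=True)
    # Sort so that the order of fields in the payload does not matter.
    kept = sorted((k, v) for k, v in pairs if k not in excluded)
    # Control characters as separators keep field boundaries explicit.
    canonical = "\x1f".join(f"{k}\x1e{v}" for k, v in kept)
    # MD5 is an assumed choice, used here only as a checksum.
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Two payloads that differ only in field order or in the excluded fields produce the same fingerprint, which is the property a downstream deduplication step would rely on.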

EmrEtlRunner

  • Bumped to 0.18.0
  • Updated AMI version in config.yml.sample to 3.7.0 (#1959)
  • Updated combine_configurations.rb to add ssl_mode: disable (#1996)

Scala Common Enrich

  • Bumped to 0.16.0
  • Added derived_tstamp enrichment (#1550)
  • Added validation that v_collector is set (#1600)
  • Added validation that collector_tstamp is set and valid (#1611)
  • Added event_vendor/name/format/version to enriched event, thanks @danisola! (#1800)
  • Ported JSON schema from Scala Hadoop Shred, thanks @danisola! (#1637)
  • Bumped referer-parser to 0.3.0 (#1839)
  • Changed etl_tstamp in EnrichmentManager from String to Joda DateTime (#1841)
  • Added support for four new fields in CloudFront access logs (#1865)
  • Bumped user-agent-utils to 1.16 (#1905)
  • Changed BadRow class to use ProcessingMessages (#1936)
  • Ensured that all timestamp fields are nonnegative (#1938)
  • Started catching all exceptions in EtlPipeline (#1954)
  • Added event_fingerprint enrichment (#1965)
  • Bumped Iglu Scala Client to 0.3.0 (#1989)
  • Renamed dvce_tstamp to dvce_created_tstamp (#1995)
  • Started extracting true_tstamp from querystring (#1968)
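
The timestamp changes above can be sketched as a single calculation. The Python below is a hedged illustration, not the Scala Common Enrich code: it assumes that derived_tstamp corrects the device's creation time for clock drift using a device "sent" timestamp (called dvce_sent_tstamp here, a name that does not appear in these notes), and that true_tstamp takes precedence when present. The fallback to collector_tstamp is likewise an assumption.

```python
from datetime import datetime, timezone


def derived_tstamp(collector_tstamp, dvce_created_tstamp=None,
                   dvce_sent_tstamp=None, true_tstamp=None):
    """Estimate when an event really happened (assumed logic, see lead-in)."""
    # Assumption: a trusted timestamp supplied by the sender wins outright.
    if true_tstamp is not None:
        return true_tstamp
    # Assumption: without both device timestamps, fall back to the collector's.
    if dvce_created_tstamp is None or dvce_sent_tstamp is None:
        return collector_tstamp
    # Time the event waited on the device before being sent; the device clock's
    # offset cancels out because both values come from the same clock.
    delay_on_device = dvce_sent_tstamp - dvce_created_tstamp
    return collector_tstamp - delay_on_device


# Example: the event waited five minutes on the device before being sent.
collector = datetime(2015, 10, 2, 18, 48, 0, tzinfo=timezone.utc)
created = datetime(2015, 10, 2, 18, 40, 0, tzinfo=timezone.utc)
sent = datetime(2015, 10, 2, 18, 45, 0, tzinfo=timezone.utc)
example = derived_tstamp(collector, created, sent)
```

Subtracting the on-device delay from the collector's timestamp means the result depends only on the collector's clock, which is the reason for preferring it over the raw device timestamp.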

Scala Hadoop Enrich

  • Bumped to 1.1.0
  • Bumped Scala Common Enrich to 0.16.0 (#1807)
  • Updated tests to expect bad row JSONs with timestamps and processing messages (#1751)
  • Updated to use new EtlPipeline (#1931)
  • Bad rows for Thrift payloads now contain the original Thrift record (#1950)
  • Simplified validation projection code (#1986)

Scala Hadoop Shred

  • Bumped to 0.5.0
  • Updated tests to expect bad row JSONs with timestamps and processing messages (#1953)
  • Added clojars.org as a resolver (#1952)
  • Bumped Scala Common Enrich to 0.16.0 (#1935)
  • Started using BadRow case class from Scala Common Enrich (#1914)
  • Upgraded to Hadoop 2.4 (#1720)
  • Bumped Iglu Scala Client to 0.3.0 (#1221)

Redshift

  • Added event_vendor/name/format/version to atomic.events (#1801)
  • Updated wd_access_log_1.sql with 4 new fields and renamed "x_edge_request_type" to "x_edge_request_id" (#1940)
  • Added event_fingerprint to atomic.events (#1971)
  • Added true_tstamp to atomic.events (#1984)
  • Added migration script for 0.6.0 to 0.7.0 (#1988)
  • Added migration script for 0.5.0 to 0.7.0 (#2058)
  • Renamed dvce_tstamp to dvce_created_tstamp (#1993)
  • Added comment containing table version to atomic.events (#2020)
  • Added migration script for wd_access_log_1.sql 1-0-3 to 1-0-4 (#2029)

Postgres

  • Added event_vendor/name/format/version to atomic.events (#1802)
  • Added event_fingerprint to atomic.events (#1970)
  • Added true_tstamp to atomic.events (#1985)
  • Added migration script for 0.5.0 to 0.6.0 (#1987)
  • Renamed dvce_tstamp to dvce_created_tstamp (#1994)
  • Added comment containing table version to atomic.events (#2021)

StorageLoader

  • Bumped to 0.5.0
  • Exposed sslmode connection option for loading Postgres and Redshift, thanks @dennisatspaceape! (#1980)
  • Updated wd_access_log_1.json with 4 new fields (#1941)

Data Modeling

  • Updated web-incremental so failure is recoverable (#1974)
  • Renamed dvce_tstamp to dvce_created_tstamp (#2024)