
Clarify use of SAMPLE #165

Open
@Tpt

Description

Section 11.4, "Aggregate Projection Restrictions", states that only expressions consisting of aggregates and constants may be projected, with one exception, and then gives an example of a disallowed expression, introduced with "Note that it would not be legal to project".
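Read literally, the restriction is a purely syntactic check. Here is a minimal Python sketch of that reading. It assumes that the one exception is the projection of GROUP BY variables themselves. The `Var`, `Const`, `Agg` and `Call` classes and the `ungrouped_vars` helper are a hypothetical toy model, not an API from the specification or from any engine.

```python
from dataclasses import dataclass


# Toy expression model, hypothetical and for illustration only.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Agg:
    func: str    # aggregate name, e.g. "COUNT" or "SAMPLE"
    arg: object  # the aggregated expression


@dataclass(frozen=True)
class Call:
    func: str    # non-aggregate function, e.g. "STR" or "CONCAT"
    args: tuple  # argument expressions


def ungrouped_vars(expr, group_vars):
    """Variables used outside any aggregate that are not GROUP BY variables."""
    if isinstance(expr, Var):
        return set() if expr.name in group_vars else {expr.name}
    if isinstance(expr, Call):
        found = set()
        for arg in expr.args:
            found |= ungrouped_vars(arg, group_vars)
        return found
    return set()  # constants, and aggregates (anything inside them is allowed)


def is_legal_projection(expr, group_vars):
    return not ungrouped_vars(expr, group_vars)


# SELECT ?s ... GROUP BY ?o projects ?s while grouping on ?o:
print(is_legal_projection(Var("s"), {"o"}))                 # False
print(is_legal_projection(Agg("SAMPLE", Var("s")), {"o"}))  # True
```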

However, just after that, it states that "Other expressions, not using GROUP BY variables, or aggregates may have non-deterministic values projected from their groups using the SAMPLE aggregate." This seems contradictory to me.
Moreover, the algorithm in section 18.2.4.1, "Grouping and Aggregation", describes this automatic insertion of SAMPLE (the line "Replace V with Sample(V)"), which makes it read as though inserting SAMPLE is mandatory.
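For comparison, the "Replace V with Sample(V)" step can be sketched as a rewrite over the same kind of toy expression tree. The classes and the `insert_sample` function are hypothetical. This sketch only wraps variables that are neither grouped on nor already inside an aggregate, which is the case the example query in this issue exercises. It makes no claim about how the specification's algorithm treats other positions such as HAVING or ORDER BY.

```python
from dataclasses import dataclass


# Toy expression model, hypothetical and for illustration only.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Agg:
    func: str    # aggregate name, e.g. "COUNT" or "SAMPLE"
    arg: object


@dataclass(frozen=True)
class Call:
    func: str    # non-aggregate function, e.g. "STR"
    args: tuple


def insert_sample(expr, group_vars):
    """Wrap every ungrouped variable found outside an aggregate in SAMPLE(...)."""
    if isinstance(expr, Var):
        return expr if expr.name in group_vars else Agg("SAMPLE", expr)
    if isinstance(expr, Call):
        return Call(expr.func, tuple(insert_sample(a, group_vars) for a in expr.args))
    return expr  # aggregates and constants are left untouched


# SELECT ?s ... GROUP BY ?o is silently treated as if it projected SAMPLE(?s):
print(insert_sample(Var("s"), {"o"}))  # Agg(func='SAMPLE', arg=Var(name='s'))
```

Under this reading, a bare ungrouped variable never fails the check from 11.4, because it is always rewritten first. That is the tension described above.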

Example of affected query:

SELECT ?s WHERE {
  VALUES (?s ?o) { (0 0) (1 0) }
} GROUP BY ?o

Jena, Blazegraph and Oxigraph all return an error for this query, whereas Virtuoso returns results using SAMPLE.
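To make the divergence concrete, the following Python sketch evaluates the example query under both behaviors. It hard-codes the two solutions produced by the VALUES block. `project`, `group_by` and `UngroupedProjectionError` are hypothetical names. `random.choice` only stands in for the arbitrary choice SAMPLE is allowed to make; real engines need not pick randomly.

```python
import random
from collections import defaultdict

# The two solutions produced by VALUES (?s ?o) { (0 0) (1 0) }.
SOLUTIONS = [{"s": 0, "o": 0}, {"s": 1, "o": 0}]


class UngroupedProjectionError(Exception):
    """Hypothetical error for the 'reject the query' behavior."""


def group_by(solutions, key):
    """Partition solutions by the value bound to `key` (None if unbound)."""
    groups = defaultdict(list)
    for sol in solutions:
        groups[sol.get(key)].append(sol)
    return groups


def project(solutions, var, group_key, behavior, rng=random):
    """Evaluate SELECT ?var ... GROUP BY ?group_key under one behavior."""
    groups = group_by(solutions, group_key)
    if var == group_key:
        return [{var: key} for key in groups]
    if behavior == "error":   # reported by Jena, Blazegraph and Oxigraph
        raise UngroupedProjectionError(f"?{var} is neither grouped nor aggregated")
    if behavior == "sample":  # Virtuoso: behaves like an implicit SAMPLE(?var)
        rows = []
        for members in groups.values():
            values = [sol[var] for sol in members if var in sol]
            rows.append({var: rng.choice(values)} if values else {})
        return rows
    raise ValueError(f"unknown behavior {behavior!r}")


print(project(SOLUTIONS, "s", "o", "sample"))  # one row; ?s is 0 or 1
```

Both `[{'s': 0}]` and `[{'s': 1}]` are valid outputs of the sample behavior. This is the non-determinism the specification text mentions.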

I see multiple ways forward:

  1. State that the behavior (error or SAMPLE) is implementation-defined
  2. Mandate the error behavior
  3. Mandate the SAMPLE behavior
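One way to compare the three options is to ask what a conformance test for the example query would have to accept under each. The sketch below encodes that as a Python predicate. `Resolution`, `acceptable` and the outcome encoding (the string "error" or a list of result rows) are hypothetical and are not part of any existing test suite.

```python
from enum import Enum


class Resolution(Enum):
    IMPLEMENTATION_DEFINED = 1  # option 1
    MANDATE_ERROR = 2           # option 2
    MANDATE_SAMPLE = 3          # option 3


def acceptable(outcome, resolution):
    """Would a test of SELECT ?s ... GROUP BY ?o pass under `resolution`?

    `outcome` is either the string "error" or the list of result rows."""
    is_error = outcome == "error"
    # The only valid SAMPLE result: one group (?o = 0) whose ?s is 0 or 1.
    is_sample = (isinstance(outcome, list) and len(outcome) == 1
                 and outcome[0].get("s") in (0, 1))
    if resolution is Resolution.MANDATE_ERROR:
        return is_error
    if resolution is Resolution.MANDATE_SAMPLE:
        return is_sample
    return is_error or is_sample


print(acceptable("error", Resolution.IMPLEMENTATION_DEFINED))     # True
print(acceptable([{"s": 1}], Resolution.IMPLEMENTATION_DEFINED))  # True
```

Under option 1, both the current Jena/Blazegraph/Oxigraph behavior and the Virtuoso behavior pass. Options 2 and 3 each make one of the two groups non-conformant.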

A proposal: take the "implementation-defined" way to keep backward compatibility, but strongly nudge in favor of returning an error:

  • in the algorithm, replace "Replace V with Sample(V)" with "Raise an error or Replace V with Sample(V)"
  • rephrase "Other expressions, not using GROUP BY variables, or aggregates may have non-deterministic values projected from their groups using the SAMPLE aggregate." as "Note that some implementations do not raise an error for expressions that use neither GROUP BY variables nor aggregates, but instead project non-deterministic values from their groups using the SAMPLE aggregate."
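The proposed algorithm line could look like this in an engine that exposes the choice as a setting. This is a minimal Python sketch. `resolve_projection`, its `on_ungrouped` parameter and the `?<var>_sample` alias naming are hypothetical. The default of `"error"` mirrors the suggested nudge.

```python
def resolve_projection(select_vars, group_vars, on_ungrouped="error"):
    """Proposed step: 'Raise an error or Replace V with Sample(V)'.

    Returns the projection as SPARQL text fragments. Defaults to raising,
    following the suggested nudge towards an error."""
    projection = []
    for v in select_vars:
        if v in group_vars:
            projection.append(f"?{v}")
        elif on_ungrouped == "error":
            raise ValueError(f"?{v} is neither grouped nor aggregated")
        elif on_ungrouped == "sample":
            # Fresh alias: the name ?<var>_sample is a hypothetical choice.
            projection.append(f"(SAMPLE(?{v}) AS ?{v}_sample)")
        else:
            raise ValueError(f"unknown policy {on_ungrouped!r}")
    return projection


print(resolve_projection(["s"], {"o"}, on_ungrouped="sample"))
# ['(SAMPLE(?s) AS ?s_sample)']
```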

Metadata


Labels: ErratumRaised (Errata management: proposed erratum)
