Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(geo): add geometry data type #14470

Merged
merged 1 commit into from
Feb 2, 2024
Merged

Conversation

ariesdevil
Copy link
Contributor

@ariesdevil ariesdevil commented Jan 25, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Initial support geometry type. We use EWKB for serialize geo data and use EWKT for display geo data.

part of #14543

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@cdmikechen
Copy link

cdmikechen commented Jan 29, 2024

Each type of Geometry has a different definition, and there seems to be a difference in the type definitions used.
We have recently in learning related content, geo-arrow has given more specifically the type of description: https://github.com/geoarrow/geoarrow/blob/main/format.md

In addition, postgis, for example, also has a good guide to the use and definition of each type declaration:
https://postgis.net/workshops/postgis-intro/geometries.html

geom geometry(multilinestring, 3857) 

Hope these will be helpful to the improvement of this PR. And if there's anything I can do to help, let me know.

@kkk25641463
Copy link
Contributor

kkk25641463 commented Jan 29, 2024

Geometry have many types. Each type has own memory layout which is described in geoarrow spec . I think we will import geoarrow-r craft and define GeometryType as below:

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GeometryType<T: Geometry>;

pub type GeometryPointType = GeometryType<GeoDataType::Point>;
pub type GeometryLineStringType = GeometryType<GeoDataType::LineString>;
pub type GeometryLargeLineStringType = GeometryType<GeoDataType::LargeLineString>;
pub type GeometryPolygonType = GeometryType<GeoDataType::Polygon>;
pub type GeometryLargePolygonType = GeometryType<GeoDataType::LargePolygon>;
pub type GeometryMultiPointType = GeometryType<GeoDataType::MultiPoint>;
pub type GeometryLargeMultiPointType = GeometryType<GeoDataType::LargeMultiPoint>;
pub type GeometryMultiLineStringType = GeometryType<GeoDataType::MultiLineString>;
pub type GeometryLargeMultiLineStringType = GeometryType<GeoDataType::LargeMultiLineString>;
pub type GeometryMultiPolygonType = GeometryType<GeoDataType::MultiPolygon>;
....

@ariesdevil
Copy link
Contributor Author

ariesdevil commented Jan 30, 2024

Hi @cdmikechen @kkk25641463 , thanks for your interest in this feature. We will follow the snowflake approach to deal with GEO types.
https://docs.snowflake.com/en/sql-reference/data-types-geospatial

Support geoparquet not planed. cc @cdmikechen

Support sub-types for geo type is not our approach now, since databend is a snowflake alternative we will follow it's way, related PR is closed. cc @kkk25641463

BTW: I'll plan to use geozero to read/store wkb binary data, if you have better options, please feel free to make suggestions.

@ariesdevil ariesdevil changed the title feat(WIP): add geometry data type feat(geo): add geometry data type Jan 30, 2024
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Jan 30, 2024
@ariesdevil ariesdevil force-pushed the geometry branch 5 times, most recently from 09a9450 to 4558ebf Compare February 1, 2024 04:50
@ariesdevil ariesdevil mentioned this pull request Feb 1, 2024
7 tasks
@ariesdevil ariesdevil marked this pull request as ready for review February 1, 2024 04:54
@ariesdevil ariesdevil requested review from b41sh and sundy-li February 1, 2024 04:56
src/query/datavalues/src/data_value.rs Outdated Show resolved Hide resolved
src/query/expression/src/types/variant.rs Outdated Show resolved Hide resolved
src/query/expression/src/utils/display.rs Outdated Show resolved Hide resolved
src/query/expression/src/values.rs Outdated Show resolved Hide resolved
src/query/expression/src/values.rs Outdated Show resolved Hide resolved
src/query/formats/src/field_decoder/separated_text.rs Outdated Show resolved Hide resolved
src/query/formats/src/output_format/json.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/geometry.rs Show resolved Hide resolved
src/query/functions/src/scalars/geometry.rs Show resolved Hide resolved
src/query/functions/src/scalars/geometry.rs Show resolved Hide resolved
@drmingdrmer
Copy link
Member

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 50 files at r2, 5 of 26 files at r3.
Reviewable status: 10 of 61 files reviewed, 13 unresolved discussions (waiting on @ariesdevil, @b41sh, @kkk25641463, and @sundy-li)

@ariesdevil ariesdevil force-pushed the geometry branch 3 times, most recently from 031b107 to 09df85f Compare February 1, 2024 13:22
@sundy-li sundy-li added this pull request to the merge queue Feb 2, 2024
Merged via the queue into databendlabs:main with commit 97e7a31 Feb 2, 2024
71 of 72 checks passed
@ariesdevil ariesdevil deleted the geometry branch February 2, 2024 03:59
@cdmikechen
Copy link

cdmikechen commented Feb 5, 2024

@ariesdevil
Hi~ I'm glad this PR has been merged into the latest databend. So far I have found that there is still one problem that has not been solved:
Should we need to rename st_makepoint to st_makegeompoint or st_geom_point ?

@ariesdevil
Copy link
Contributor Author

@cdmikechen
We follow snowflake’s naming rules

@cdmikechen
Copy link

cdmikechen commented Feb 5, 2024

@cdmikechen We follow snowflake’s naming rules

Here are snowflake documents:
https://docs.snowflake.com/en/sql-reference/functions/st_makegeompoint
https://docs.snowflake.com/en/sql-reference/functions/st_makepoint

If we want to return a geometry point, should we use st_makegeompoint ?

@ariesdevil
Copy link
Contributor Author

@cdmikechen We follow snowflake’s naming rules↳

Here are snowflake documents: https://docs.snowflake.com/en/sql-reference/functions/st_makegeompoint https://docs.snowflake.com/en/sql-reference/functions/st_makepoint↳

If we want to return a geometry point, should we use st_makegeompoint ?↳

Yep, I'll fix it in next PR, thanks figure it out。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants