
->schema and create-dataframe should support fields of struct array #333

Closed
@gavinkflam

Description

  • I have read through the quick start and installation sections of the README.

Info

Geni Version: 0.0.38

Problem / Steps to reproduce

user=> (require '[zero-one.geni.core.dataset-creation :as g] :reload)
nil
user=> (g/->schema {:coords [{:x :int :y :int}]})
Execution error (IllegalArgumentException) at org.apache.spark.sql.types.DataTypes/createArrayType (DataTypes.java:114).
elementType should not be null.

Expected results

user=> (g/->schema {:coords [{:x :int :y :int}]})
#object[org.apache.spark.sql.types.StructType 0x5cb6297e "StructType(StructField(coords,ArrayType(StructType(StructField(x,IntegerType,true), StructField(y,IntegerType,true)),true),true))"]

Proposed solution

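At the moment, array-type only supports the simple val-types listed in data-type->spark-type, e.g. :bool and :string. Any other element type, such as a map describing a struct, has no entry in that lookup and resolves to nil, which is why DataTypes/createArrayType fails with "elementType should not be null".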

We can extend array-type to accept any Spark SQL DataType, including structs and nested arrays, in the same way struct-field already does.
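A minimal sketch of that direction, assuming spark-sql is on the classpath. The names simple-types and ->spark-type are hypothetical and are not Geni's actual API, and the lookup table covers only a few keywords. The idea is to resolve the element type recursively (vectors become arrays of their single element, maps become structs, existing DataType instances pass through) so that DataTypes/createArrayType always receives a DataType instead of nil.

```clojure
(ns schema-sketch
  (:import (org.apache.spark.sql.types DataType DataTypes)))

;; Hypothetical stand-in for Geni's data-type->spark-type (subset only).
(def simple-types
  {:bool   DataTypes/BooleanType
   :int    DataTypes/IntegerType
   :long   DataTypes/LongType
   :double DataTypes/DoubleType
   :string DataTypes/StringType})

(defn ->spark-type
  "Hypothetical recursive converter from a schema spec to a Spark DataType."
  [spec]
  (cond
    ;; Already a Spark DataType: pass it through unchanged.
    (instance? DataType spec) spec
    ;; Simple keyword such as :int or :string; fail loudly instead of returning nil.
    (keyword? spec) (or (get simple-types spec)
                        (throw (ex-info "Unknown data type" {:spec spec})))
    ;; [elem-spec] -> ArrayType (containsNull = true) of the converted element type.
    (vector? spec) (DataTypes/createArrayType (->spark-type (first spec)) true)
    ;; {field-name field-spec ...} -> StructType whose fields are all nullable.
    (map? spec) (DataTypes/createStructType
                 (mapv (fn [[k v]]
                         (DataTypes/createStructField (name k) (->spark-type v) true))
                       spec))
    ;; Anything else (including an empty vector's nil element) is rejected.
    :else (throw (ex-info "Unsupported schema spec" {:spec spec}))))
```

With this shape, the spec from the report resolves to an array of structs rather than throwing: (.simpleString (->spark-type {:coords [{:x :int :y :int}]})) should return "struct<coords:array<struct<x:int,y:int>>>". Passing DataType instances through unchanged mirrors what struct-field already allows, so hand-built Spark types and keyword shorthands can be mixed.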
