Issue writing in synapse spark 3.2 #43

Open
@siege089

Description

I'm using Azure Synapse, and nothing I try lets me write models. I've explicitly included spark-avro in my pom file and loaded the spark-avro package into the Spark pool workspace.

    <properties>
        <spark.version>3.2.0</spark.version>
        <scala.version.major>2.12</scala.version.major>
        <scala.version.minor>15</scala.version.minor>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.linkedin.isolation-forest</groupId>
            <artifactId>isolation-forest_${spark.version}_${scala.version.major}</artifactId>
            <version>3.0.3</version>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>com.microsoft.azure.synapse</groupId>
            <artifactId>synapseutils_${scala.version.major}</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.jmockit</groupId>
            <artifactId>jmockit</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_${scala.version.major}</artifactId>
        </dependency>
    </dependencies>

2024-01-30 01:31:47,163 INFO ApplicationMaster [shutdown-hook-0]: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException: Failed to find data source: com.databricks.spark.avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
	at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1028)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
	at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:876)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:275)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
	at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImplHelper(IsolationForestModelReadWrite.scala:262)
	at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImpl(IsolationForestModelReadWrite.scala:241)
	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)

Activity

jverbus (Contributor) commented on Feb 12, 2024

I just created a fix.

#44

jverbus (Contributor) commented on Feb 12, 2024

Try this:

<dependency>
  <groupId>com.linkedin.isolation-forest</groupId>
  <artifactId>isolation-forest_3.2.4_2.12</artifactId>
  <version>3.0.4</version>
</dependency>

siege089 (Author) commented on Mar 8, 2024

Still getting the same error with this new version.

self-assigned this on May 30, 2024

jverbus (Contributor) commented on Dec 17, 2024

I haven't been able to reproduce this error. Are you still running into the issue?

Labels: bug (Something isn't working)

    Issue writing in synapse spark 3.2 · Issue #43 · linkedin/isolation-forest