Description
I'm using Azure Synapse, and nothing I try allows me to save trained models. I've explicitly included spark-avro as a dependency in my pom file and loaded the spark-avro package into the Spark pool workspace.
<properties>
  <spark.version>3.2.0</spark.version>
  <scala.version.major>2.12</scala.version.major>
  <scala.version.minor>15</scala.version.minor>
</properties>
<dependencies>
  <dependency>
    <groupId>com.linkedin.isolation-forest</groupId>
    <artifactId>isolation-forest_${spark.version}_${scala.version.major}</artifactId>
    <version>3.0.3</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version.major}.${scala.version.minor}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.version.major}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.version.major}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_${scala.version.major}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_${scala.version.major}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>com.microsoft.azure.synapse</groupId>
    <artifactId>synapseutils_${scala.version.major}</artifactId>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.jmockit</groupId>
    <artifactId>jmockit</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_${scala.version.major}</artifactId>
  </dependency>
</dependencies>
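Declaring spark-avro in the pom only covers compile time; the external Avro module also has to be on the driver and executor classpath when the job runs. A sketch of one way to do that at submit time (the main class and jar name below are hypothetical; the package coordinates match the Spark 3.2.0 / Scala 2.12 versions from the pom above):

```shell
# Pull the external spark-avro module onto the runtime classpath.
# com.example.TrainIsolationForest and the jar path are placeholders.
spark-submit \
  --packages org.apache.spark:spark-avro_2.12:3.2.0 \
  --class com.example.TrainIsolationForest \
  target/my-app-assembly.jar
```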
2024-01-30 01:31:47,163 INFO ApplicationMaster [shutdown-hook-0]: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException: Failed to find data source: com.databricks.spark.avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1028)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:876)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:275)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImplHelper(IsolationForestModelReadWrite.scala:262)
at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImpl(IsolationForestModelReadWrite.scala:241)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
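The stack trace shows the model writer resolving the legacy source name `com.databricks.spark.avro`, which Spark stopped bundling in 2.4. Spark documents a compatibility flag that maps that legacy name onto the built-in (but external) Avro module. A sketch of the workaround, set before saving; the output path here is a placeholder, and spark-avro must still be on the classpath:

```scala
// Map the legacy "com.databricks.spark.avro" source name onto Spark's
// built-in external Avro data source (documented Spark SQL legacy flag).
spark.conf.set("spark.sql.legacy.replaceDatabricksSparkAvro.enabled", "true")

// Then save the fitted model as usual (path is a placeholder).
model.write.overwrite().save("abfss://models@<storage-account>.dfs.core.windows.net/isolation-forest")
```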
Activity
jverbus commented on Feb 12, 2024
I just created a fix.
#44
jverbus commented on Feb 12, 2024
Try this
siege089 commented on Mar 8, 2024
I'm still getting the same error with the new version.
jverbus commented on Dec 17, 2024
I haven't been able to reproduce this error. Are you still running into the issue?