Add support for S3 storage #146

Closed
awsazuser opened this issue Jul 28, 2019 · 4 comments
Labels: accepted, enhancement, help wanted

Comments


awsazuser commented Jul 28, 2019

Does Cobrix support S3 file systems?
I am getting a "java.lang.IllegalArgumentException: Wrong FS" error when loading the copybook and data file from an AWS S3 bucket.

Code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Spark-Cobol").getOrCreate()
import spark.implicits._
import za.co.absa.cobrix.spark.cobol.source

// Load the copybook and the data file directly from the S3 bucket
val df = spark.read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("copybooks", "s3://xxxx/tesfile.cbl")
  .load("s3://xxxx/sourcedata/DATAFILE0100")

df.printSchema
df.show()

Error:

java.lang.IllegalArgumentException: Wrong FS: s3://xxxx/tesfile.cbl, expected: hdfs://ip-xxx-xx-xx-85.ec2.internal:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:653)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1430)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$.za$co$absa$cobrix$spark$cobol$source$parameters$CobolParametersValidator$$validatePath$1(CobolParametersValidator.scala:71)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$$anonfun$validateOrThrow$2.apply(CobolParametersValidator.scala:94)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$$anonfun$validateOrThrow$2.apply(CobolParametersValidator.scala:93)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$.validateOrThrow(CobolParametersValidator.scala:93)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:52)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 160 elided
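
The trace shows `FileSystem.exists` being called on a `DistributedFileSystem` (HDFS) with an `s3://` path, and `checkPath` rejecting it. One common way to hit this in Hadoop client code is to obtain the filesystem from the cluster default (`fs.defaultFS`) instead of from the path itself. Whether that is what `CobolParametersValidator` does is an assumption here, not something confirmed in this thread. The sketch below contrasts the two lookups; `PathCheck` and its method names are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PathCheck {
  // FileSystem.get(conf) returns the filesystem for fs.defaultFS.
  // If the path carries a different scheme (for example s3:// on an
  // HDFS cluster), exists() fails with "Wrong FS".
  def existsOnDefaultFs(pathStr: String, conf: Configuration): Boolean =
    FileSystem.get(conf).exists(new Path(pathStr))

  // Path.getFileSystem(conf) resolves the filesystem from the path's own
  // URI, so s3://, gs://, hdfs:// and file:// paths each go to their own
  // implementation (provided the connector is on the classpath).
  def existsOnOwnFs(pathStr: String, conf: Configuration): Boolean = {
    val path = new Path(pathStr)
    path.getFileSystem(conf).exists(path)
  }
}
```

With `fs.defaultFS` pointing at HDFS, `existsOnDefaultFs("s3://xxxx/tesfile.cbl", conf)` would be expected to raise the exception above, while `existsOnOwnFs` delegates to whichever filesystem is registered for the `s3` scheme.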

yruslan (Collaborator) commented Jul 29, 2019

Unfortunately, S3 is not supported right now. But we might add S3 support in the future.

yruslan added the accepted and enhancement labels Jul 29, 2019
yruslan changed the title from "java.lang.IllegalArgumentException: Wrong FS: s3://xxxx/tesfile.cbl, expected: hdfs://ip-XXX-XX-XXX-85.ec2.internal:8020" to "Add support for S3 storage" Jul 31, 2019
yruslan added the help wanted label Jul 31, 2019
yruslan self-assigned this Dec 28, 2020
yruslan added a commit that referenced this issue Dec 29, 2020
yruslan added a commit that referenced this issue Dec 30, 2020
yruslan added a commit that referenced this issue Dec 30, 2020
yruslan added a commit that referenced this issue Dec 30, 2020
yruslan (Collaborator) commented Dec 30, 2020

S3 storage should be supported in spark-cobol version 2.2.0.

Please let me know if it works for you.

yruslan closed this as completed Jan 18, 2021
RamanandJaiswal commented

Does Cobrix support the gs:// file system?
I'm getting the same error:
Caused by: java.lang.IllegalArgumentException: Wrong FS: gs://

yruslan (Collaborator) commented Mar 19, 2021

From the filesystem support perspective, spark-cobol is the same as any other Spark data source. If you can use gs:// to read CSV or Parquet, then it should be possible to read mainframe files as well.
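
One way to act on this advice is to first confirm that a built-in Spark source can open the location, and only then point spark-cobol at it. The following is a rough sketch under assumptions: `FsProbe` and its method names are hypothetical, the probe reads the copybook because it is plain text, and the data source name is the one used earlier in this issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.util.Try

object FsProbe {
  // Returns true if Spark's built-in text source can open the path.
  // A failure here points at the filesystem connector or credentials,
  // not at spark-cobol.
  def canReadWithBuiltInSource(spark: SparkSession, path: String): Boolean =
    Try(spark.read.text(path).limit(1).collect()).isSuccess

  // Reads a mainframe file through spark-cobol from the same storage.
  def readMainframeFile(spark: SparkSession,
                        copybookPath: String,
                        dataPath: String): DataFrame =
    spark.read
      .format("za.co.absa.cobrix.spark.cobol.source")
      .option("copybook", copybookPath)
      .load(dataPath)
}
```

For example, if `FsProbe.canReadWithBuiltInSource(spark, copybookPath)` returns `false` for a `gs://` copybook path, the connector setup is the first thing to check. If it returns `true` and `readMainframeFile` still fails with "Wrong FS", the spark-cobol version in use is the more likely culprit.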
