spark input_file_name() not working in cobrix #221

kriswijnants · 2019-12-06T12:07:52Z

Hi,

Thank you for creating and maintaining Cobrix. It's a tool we discovered recently, and plan to implement it in our cloud data platform for our Mainframe project.

Just a small question. We noticed that Spark's input_file_name() function always returns blanks when used with Cobrix, in combination with option("is_record_sequence", "true").

spark.read.format("cobol")
  .option("copybook", "/mnt/inputMDP/BIWA_GUTEX/Copybooks/" + dbutils.widgets.get("version") + "/GAGUSECO_20070115.txt")
  .option("is_record_sequence", "true")
  .load("/mnt/inputMDP/BIWA_GUTEX/Datafiles/" + dbutils.widgets.get("version") + "/GA-GA324001*")
  .withColumn("ISN_Source", input_file_name)
  .createOrReplaceTempView("vw_gutex_GA")

Do you notice the same behaviour? Is there any chance to get this working?

Keep up the good work!

Regards,

Kris

yruslan · 2019-12-10T07:33:37Z

Thanks for reporting the issue!

Looks interesting. Will take a look.

yruslan · 2019-12-10T14:03:22Z

I can confirm the issue. Indeed, for variable-record-size files input_file_name() returns an empty string. That is due to the way we create sparse indexes to parallelize the reading of such files.

It will take a while to fix this properly (probably need to create a custom RDD). But we can add a workaround to generate a column with the input file name for each record. That's what we are going to do first. It would look like this:

.option("with_input_file_name_col", "ISN_Source")

yruslan · 2019-12-10T14:05:04Z

Just a double check. Which Spark version are you using?

We are planning to release Cobrix 2.0.0 first and all further changes will be made there. But it will support Spark 2.4 or above.

kriswijnants · 2019-12-10T14:07:00Z

Hi Ruslan,

Thanks for your intervention. Really appreciate this! We are running on the Databricks runtime 6.2. So we use Spark version 2.4.4.

Regards,
Kris

yruslan · 2019-12-10T14:45:51Z

Great! Cobrix 2.0.0 is planned to be released this week. And the workaround for this issue can be expected sometime next week.

kriswijnants · 2019-12-10T14:46:49Z

Great news. Thanks Ruslan!

Kris

yruslan · 2019-12-17T14:11:31Z

This should be fixed in the latest snapshot.
Please, try:

        <dependency>
            <groupId>za.co.absa.cobrix</groupId>
            <artifactId>spark-cobol_2.11</artifactId>
            <version>2.0.1-SNAPSHOT</version>
        </dependency>

and let me know if the issue is fixed.

kriswijnants · 2019-12-17T14:27:01Z

Thanks Ruslan, I’ll do so.

Regards,
Kris

yruslan · 2019-12-18T07:42:46Z

Forgot to mention: to get the input file name for each record of a variable-record-length file, a workaround option is used. In your case the option looks like this:

.option("with_input_file_name_col", "ISN_Source")

I'd also recommend using

.option("pedantic", "true")

So that unrecognized options cause errors.
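
Putting the two recommendations together, the reader options for a variable-record-length file might be collected as below. This is only a sketch: `CobolReadOptions` and `forVariableLength` are hypothetical helper names, not part of Cobrix, and the paths in the usage comment are placeholders. The option keys are the ones quoted in this thread.

```scala
// Sketch only: gathers the options discussed above into one Map.
// `CobolReadOptions.forVariableLength` is a hypothetical helper, not a Cobrix API.
object CobolReadOptions {
  def forVariableLength(copybookPath: String, fileNameColumn: String): Map[String, String] =
    Map(
      "copybook" -> copybookPath,
      "is_record_sequence" -> "true",                // variable-record-length input
      "with_input_file_name_col" -> fileNameColumn,  // column that receives the source file name
      "pedantic" -> "true"                           // unrecognized options cause errors
    )
}

// Usage, assuming a SparkSession named `spark` and spark-cobol 2.0.1 on the classpath:
//   val df = spark.read
//     .format("cobol")
//     .options(CobolReadOptions.forVariableLength("/path/to/copybook.txt", "ISN_Source"))
//     .load("/path/to/datafiles/GA-GA324001*")
```

Keeping the options in one place makes it easier to spot a misspelled key, which `pedantic` would otherwise only report at load time.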

kriswijnants · 2019-12-20T12:48:31Z

Hi Ruslan,

Apologies for replying late. I get an error when I try to install the new version via Maven, so for the moment we are still using version 1.0.2.

Once I get the Maven package running I’ll try it. But I take your word for it that it’s fixed. Thanks for having a look into this!

Regards,
Kris

yruslan · 2019-12-20T12:53:11Z

Hi Kris,

Snapshot version linking requires additional configuration in .m2/settings.xml. It might be even harder for managed clusters.

Try setting the version to 2.0.1 which was released today.

And please let me know if it worked for you.

Thank you,
Ruslan

kriswijnants · 2019-12-20T13:06:49Z

Hi Ruslan,

I just tried, and it works perfectly! It’s now showing the file name of EBCDIC files when using the option is_record_sequence = true. Thanks a lot for your efforts!

Regards,
Kris

bart-at-qqdatafruits · 2020-02-20T14:54:39Z

H2. environment: docker: jupyter/all-spark-notebook:latest + Apache Toree - Scala

H2. Issue

when using

.option("file_start_offset", "600")
.option("file_end_offset", "600")

input_file_name() no longer works

H3. Anonymized extract

%AddDeps za.co.absa.cobrix spark-cobol_2.11 2.0.3 --transitive

val sparkBuilder = SparkSession.builder().appName("Example")

val spark = sparkBuilder .getOrCreate()

`
import org.apache.spark.sql.functions._

import org.apache.spark.sql.SparkSession

spark.udf.register("get_file_name", (path: String) => path.split("/").last)
val cobolDataframe = spark
.read
.format("za.co.absa.cobrix.spark.cobol.source")
.option("pedantic", "true")
.option("copybook", "file:///home/jovyan/data/BRAND/COPYBOOK.txt")
.option("file_start_offset", "600")
.option("file_end_offset", "600")
.load("file:///home/jovyan/data/BRAND/initial_transformed/FILEPATTERN*")
.withColumn("DPSource", callUDF("get_file_name", input_file_name()))
`

cobolDataframe
//.filter("RECORD.ID % 2 = 0") // filter the even values of the nested field 'RECORD_LENGTH'
.take(20)
.foreach(v => println(v))
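
For reference, the `get_file_name` UDF in the extract only keeps the last path segment. A plain-function version of the same expression is sketched here; `FileNameOnly` and `fileNameOf` are illustrative names and the sample path is made up. If input_file_name() yields an empty string, as reported earlier in this thread for variable-length files, the expression returns an empty string as well, so the column ends up blank rather than the read failing.

```scala
// Sketch: the same expression the "get_file_name" UDF applies, as a plain function.
object FileNameOnly {
  // Returns the part of the path after the last "/".
  def fileNameOf(path: String): String = path.split("/").last
}

// FileNameOnly.fileNameOf("file:///home/jovyan/data/BRAND/initial_transformed/FILE_001")  // "FILE_001"
// FileNameOnly.fileNameOf("")                                                             // ""
```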

kriswijnants · 2020-02-20T16:24:07Z

Hi Ruslan,

Hope you are doing well. I’m also involved in the project Bart Debersaque is working on, so you can reach out to either Bart or me for testing, screenshots, etc.

With best regards,
Kris

yruslan · 2020-02-20T20:32:00Z

Hi Kris,
When you use file offset a different reader is used. Use the workaround for this case instead of input_file_name():

.option("with_input_file_name_col", "DPSource")

kriswijnants · 2020-02-21T07:46:10Z

Hi Ruslan,

Thanks for your quick reply!

Regards,
Kris

bart-at-qqdatafruits · 2020-02-21T09:12:23Z

Hi Ruslan,

"with_input_file_name_col" seems to be intended for "is_record_sequence = true" only.

In this case I have a fixed-length copybook that does not describe the header and footer.

Possible actions I could take are:

get rid of the header and footer in a pre-processing step (a less clean solution, to be avoided)
try to rewrite the copybook to accommodate the header and footer (the ideal solution, maybe as it should be), consisting of several record types. I will look into this next.
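
For illustration, the first option (pre-processing) could look roughly like the sketch below for a local file that fits in memory. The names `TrimHeaderFooter`, `trim` and `trimFile` are hypothetical, and the 600-byte sizes in the usage comment are simply the file_start_offset and file_end_offset values from the extract above. It is not meant for large files or distributed storage.

```scala
import java.nio.file.{Files, Paths}

// Sketch only: strips a fixed-size header and footer before the file is handed to Cobrix.
object TrimHeaderFooter {
  // Drops `headerBytes` from the start and `footerBytes` from the end of `data`.
  def trim(data: Array[Byte], headerBytes: Int, footerBytes: Int): Array[Byte] = {
    require(headerBytes >= 0 && footerBytes >= 0, "offsets must be non-negative")
    require(data.length >= headerBytes + footerBytes, "file is shorter than header + footer")
    data.slice(headerBytes, data.length - footerBytes)
  }

  // Reads `in` fully into memory, trims it, and writes the result to `out`.
  def trimFile(in: String, out: String, headerBytes: Int, footerBytes: Int): Unit = {
    val bytes = Files.readAllBytes(Paths.get(in))
    Files.write(Paths.get(out), trim(bytes, headerBytes, footerBytes))
  }
}

// Usage (paths are placeholders):
//   TrimHeaderFooter.trimFile("/data/in/FILE_001", "/data/out/FILE_001", 600, 600)
```

Writing the trimmed copy under the same file name in a separate directory keeps the original file name available to input_file_name() when the copy is read without offsets.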

I value your opinion. Mainframe code can be messy. It is a trade-off between handling source peculiarities out of the box and keeping the Cobrix code maintainable.

Thanks in advance,

Regards, Bart,

A test of your suggestion:

`
import org.apache.spark.sql.functions._

import org.apache.spark.sql.SparkSession

spark.udf.register("get_file_name", (path: String) => path.split("/").last)

val cobolDataframe = spark
.read
.format("za.co.absa.cobrix.spark.cobol.source")
//.option("is_record_sequence", "true")
//.option("generate_record_id", "true") // for comparison with unconverted (windows) file only
.option("pedantic", "true")
//.option("with_input_file_name_col", "DPSourceTemp")
.option("copybook", "file:///home/jovyan/data/BRAND/COPYBOOK.txt")
.option("file_start_offset", "600")
.option("file_end_offset", "600")
.option("with_input_file_name_col", "DPSourceTemp")
.load("file:///home/jovyan/data/BRAND/initial_transformed")
`

The result:

Name: java.lang.IllegalArgumentException
Message: Option 'with_input_file_name_col' is supported only when 'is_record_sequence' = true.
StackTrace:
  at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersParser$.validateSparkCobolOptions(CobolParametersParser.scala:467)
  at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersParser$.parse(CobolParametersParser.scala:209)
  at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:56)
  at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)

yruslan · 2020-02-21T15:12:08Z

Interesting. I will take a look. I think this can be easily fixed so that with_input_file_name_col would work in your case.

yruslan · 2020-02-21T15:15:01Z

Opened #252 to continue the discussion there, since the incompatibility between with_input_file_name_col and file_start_offset is a separate issue.

kriswijnants · 2020-02-21T15:16:43Z

Thanks!

Kris

yruslan added the accepted Accepted for implementation label Dec 10, 2019

yruslan self-assigned this Dec 10, 2019

yruslan added a commit that referenced this issue Dec 10, 2019

#221 Add a failing test.

31b60cb

yruslan added a commit that referenced this issue Dec 10, 2019

#221 Add a failing test.

dd7d94d

yruslan added the enhancement New feature or request label Dec 10, 2019

yruslan added the help wanted Extra attention is needed label Dec 12, 2019

yruslan added a commit that referenced this issue Dec 16, 2019

#221 Extend SimpleStream with inputFileName() so that file names be known at record level.

2c15d58

yruslan added a commit that referenced this issue Dec 16, 2019

#221 Implement 'with_input_file_name_col' option for variable record length files.

faae2bd

yruslan added a commit that referenced this issue Dec 17, 2019

#221 Add a failing test.

09eceb9

yruslan added a commit that referenced this issue Dec 17, 2019

#221 Extend SimpleStream with inputFileName() so that file names be known at record level.

c94b29b

yruslan added a commit that referenced this issue Dec 17, 2019

#221 Implement 'with_input_file_name_col' option for variable record length files.

6de8330

yruslan added a commit that referenced this issue Dec 17, 2019

#221 Add documentation to the new feature.

7c97598

yruslan added this to the 2.0.1 milestone Dec 17, 2019

yruslan closed this as completed Dec 20, 2019

yruslan reopened this Feb 21, 2020

yruslan closed this as completed Feb 21, 2020

yruslan mentioned this issue Feb 21, 2020

The 'with_input_file_name_col' option doesn't work with File offsets #252

Closed
