Implement variable length arrays (OCCURS) #156
Thanks for providing so much context. It is still hard to tell exactly what went wrong, but here are some ideas:
yruslan added the labels accepted (Accepted for implementation) and enhancement (New feature or request) on Aug 29, 2019
yruslan changed the title from "Seemingly spurious records and missing variable length data" to "Implement variable length arrays (OCCURS)" on Aug 29, 2019
Implementation details are discussed in #172.
yruslan added three commits that referenced this issue on Sep 2, 2019
Please, try this snapshot and let me know if it worked for you:
    <dependency>
        <groupId>za.co.absa.cobrix</groupId>
        <artifactId>spark-cobol</artifactId>
        <version>1.0.1-SNAPSHOT</version>
    </dependency>
You also need to use this option:
Closed
@yruslan - Thanks so much for your help on #147. We were able to re-export the data to include the RDW. However, we're still facing some issues.
Background
I'm reading a single file with two records that use the same copybook. However, when I try to save the dataframe to JSON, I see four records, and the sections of the JSON that should include repeating values (i.e. sections of the copybook that use OCCURS...DEPENDING ON) are empty.
The relevant sections of the copybook are here.
Cobrix logs the following.
Here's my code:
This produces JSON that looks like this.
First Question
So, the first question is: why is it creating four records and not two? If I omit the .option("is_rdw_big_endian", "true"), then I see this error.
Now, the REC_LNGTH_CNT field should contain the actual record length. Its value for the two records is 16,387 and 13,950, respectively. I tried to use that rather than the RDW, as follows.
But I got this error.
Is that because this field is defined as PIC S9(5) COMP-3. in the copybook?
I'm guessing there is a mismatch between what the RDW is indicating and the actual data. Do you have some pointers for troubleshooting that and working around it?
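One way to check for such a mismatch outside of Spark is to decode both length sources by hand and compare them. The following is only a rough sketch, not Cobrix's parser: it assumes the record length sits in the first two bytes of a 4-byte RDW, and the helper names `rdw_length` and `unpack_comp3` as well as the sample bytes are made up for illustration.

```python
import struct


def rdw_length(header: bytes, big_endian: bool) -> int:
    """Read the first two bytes of a 4-byte RDW as an unsigned 16-bit length.

    Assumption: the length is stored in the first two bytes and the other two
    are reserved. Real files, and Cobrix's own parser, may differ.
    """
    if len(header) != 4:
        raise ValueError("an RDW header is expected to be 4 bytes long")
    fmt = ">H" if big_endian else "<H"
    return struct.unpack(fmt, header[:2])[0]


def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) integer.

    Each byte holds two 4-bit digits; the last 4 bits hold the sign
    (0xB or 0xD means negative, anything else is treated as positive).
    """
    nibbles = []
    for b in raw:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    sign = nibbles.pop()
    if any(d > 9 for d in nibbles):
        raise ValueError("not a valid packed decimal")
    value = 0
    for d in nibbles:
        value = value * 10 + d
    return -value if sign in (0x0B, 0x0D) else value


# Hypothetical sample bytes chosen to match the first record length above.
header = bytes([0x40, 0x03, 0x00, 0x00])
print(rdw_length(header, big_endian=True))       # 16387
print(rdw_length(header, big_endian=False))      # 832
print(unpack_comp3(bytes([0x16, 0x38, 0x7C])))   # 16387
```

The same two header bytes give very different lengths depending on byte order, which is one way a reader could end up splitting two records into four. A PIC S9(5) COMP-3 field occupies three bytes (five digits plus a sign), so it cannot be read as a plain binary integer.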
Second Question
The second question is, how come the nested JSON array isn't populated for the variable length field values?
The value of the IP-REV-CNTR-CD-I-CNT field in the JSON for the first record looks like this:
So, I expect 23 records to be populated. The value of the "CLM_REV_CNTR_GRP" key is indeed an array of 23 elements, but they are all empty. The first 20 elements are objects where each key has an empty value. The last three are just empty objects.
Any ideas?
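For reference, this is roughly what I expect OCCURS...DEPENDING ON to mean at the byte level: the count field decides how many elements follow, and everything after the array shifts accordingly. The sketch below is a simplified illustration with a hypothetical layout (a one-byte count directly followed by fixed-size elements); it is neither the real copybook nor Cobrix's implementation, and `split_occurs` is a made-up helper.

```python
def split_occurs(record: bytes, count_offset: int, element_size: int, max_occurs: int):
    """Slice a variable-length array whose element count is stored in the record.

    Hypothetical layout: [ fixed part | count (1 byte) | element * count | rest ].
    Returns the list of element slices and the offset where the rest begins.
    """
    count = record[count_offset]
    if count > max_occurs:
        raise ValueError("count exceeds the OCCURS upper bound")
    start = count_offset + 1
    end = start + count * element_size
    if end > len(record):
        raise ValueError("record is shorter than the count field implies")
    elements = [record[i:i + element_size] for i in range(start, end, element_size)]
    return elements, end


# Three 4-byte elements, although the array may hold up to 45.
record = b"HDR" + bytes([3]) + b"AAAABBBBCCCC" + b"TAIL"
elements, rest_offset = split_occurs(record, count_offset=3, element_size=4, max_occurs=45)
print(elements)              # [b'AAAA', b'BBBB', b'CCCC']
print(record[rest_offset:])  # b'TAIL'
```

If a decoder instead reserved space for the maximum number of elements, the fields after the array would be read from the wrong offsets.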
Thanks so much for your help!!!