Parse comma-delimited ~ASCII sections #265

kinverarity1 · 2019-02-09T03:36:58Z

This may be causing an issue for some: https://stackoverflow.com/questions/53874152/how-to-perform-the-same-edit-on-a-folder-of-csv-files

dcslagel · 2021-04-12T22:41:16Z

Is there a real-world example LAS file with comma-delimited ~ASCII section?

kinverarity1 · 2021-04-12T22:45:52Z

Try las_sample_3.0.las in the repo, I think all data sections in there are comma delimited.

dcslagel · 2021-07-13T04:23:52Z

@kinverarity1, how should lasio decide when to parse the ~ASCII section with a comma delimiter? I think these are the options. Is one of these (or something else) preferred?

1. Check for the delimiter in the first few lines and have some logic to make the decision. Then parse the data section with this 'found' delimiter.
2. A command-line parameter is passed in. Does this already exist?
3. Check if the file is LAS-3 and, if it is, parse the ~ASCII section assuming the delimiter is a comma.

Thanks,
DC
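For reference, the first option (checking the first few data lines) could look roughly like the sketch below. `sniff_delimiter` is a hypothetical helper, not an existing lasio function, and the heuristic is an assumption: a candidate character is accepted only if it appears the same non-zero number of times on every sampled line.

```python
# Hypothetical helper, not part of lasio: guess the delimiter of a data
# section by inspecting its first few non-empty lines.
CANDIDATES = (",", "\t")


def sniff_delimiter(lines, sample_size=5):
    """Return ',' or '\\t' if the sampled lines agree on it, else None.

    None means "fall back to splitting on whitespace".
    """
    sample = [ln.strip() for ln in lines if ln.strip()][:sample_size]
    if not sample:
        return None
    for cand in CANDIDATES:
        # Accept a candidate only when every sampled line contains it
        # the same (non-zero) number of times.
        counts = {ln.count(cand) for ln in sample}
        if len(counts) == 1 and counts != {0}:
            return cand
    return None
```

One known weakness of this heuristic: a whitespace-delimited file that uses decimal commas (e.g. `1670,5 123,4`) would be misread as comma-delimited, which is one reason to prefer an explicit setting over sniffing.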

kinverarity1 · 2021-07-13T11:04:40Z

I'd prefer the second approach mixed with the "provisional" approach we use for WRAP and NULL (see the DLM item in the version section):

lasio/tests/examples/3.0/sample_3.0.las

Line 4 in 85bf606

DLM . COMMA : DELIMITING CHARACTER BETWEEN DATA COLUMNS

Then we could just provide a keyword argument to force a choice? Not sure what to default to - probably whitespace.

My preference is mainly driven by laziness 🙃 First option would be great and we can do it if you want!

dcslagel · 2021-07-18T04:04:43Z

Reading through the 1.2/2.0 and 3.0 specs, only 3.0 has DLM. The 3.0 spec says that if DLM is not stated, the default delimiter is space. So we could set the default to space and, if DLM is stated, adopt its value. That would roughly be the first option, and I think it is pretty doable. I'll attempt a draft branch of it.
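A minimal sketch of that rule, assuming the ~Version section is available as a list of raw header lines. The function name, the regex, and the mapping are illustrative assumptions, not lasio's actual implementation; SPACE, COMMA and TAB are the DLM values named in the LAS 3.0 spec.

```python
import re

# DLM keywords from the LAS 3.0 spec, mapped to split characters.
# None means "split on any run of whitespace" (the spec's default, SPACE).
DLM_VALUES = {"SPACE": None, "COMMA": ",", "TAB": "\t"}

# Matches a header line such as
# "DLM . COMMA : DELIMITING CHARACTER BETWEEN DATA COLUMNS"
DLM_LINE = re.compile(r"^\s*DLM\s*\.\s*(\w+)\s*:", re.IGNORECASE)


def delimiter_from_version_section(version_lines):
    """Hypothetical helper: return the delimiter declared by DLM, if any."""
    for line in version_lines:
        match = DLM_LINE.match(line)
        if match:
            # Unrecognised keywords fall back to whitespace in this sketch.
            return DLM_VALUES.get(match.group(1).upper())
    # No DLM item: use the default (whitespace).
    return None
```

With the DLM line from sample_3.0.las quoted earlier in this thread, this returns `","`; for a LAS 1.2/2.0 file with no DLM item it returns `None`.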

dcslagel · 2021-07-18T15:48:28Z

Dev Notes (may change over time...) Edited 2021-08-10:

- Reread the 3.0 spec.
- Review Kent's manual method to read LAS3 data: https://gist.github.com/kinverarity1/92f00b781472512349a9312d75fd4c33
- Create a central las.delimiter property and make it available to both parsing engines: normal and NumPy.
- In the ~Version parsing section, check for the DLM keyword; if it exists, override the default delimiter with the DLM setting. The default delimiter will be a space.
- Focus on parsing just the LAS3 '~ASCII' data section. Skip the other data sections for now because the code and internal data structures aren't there yet to handle multiple data sections. In LAS3 the '~ASCII' section can also be called ~Log_Data, so add the ability to find either one. Optionally, implement comma parsing with the file standards/examples/3.0/sample_las3.0_spec.las, which uses the LAS2 standard section names. Using it as the test file would allow the addition of alternate 'Log' section names to be split into a separate pull request or a subsequent commit.
- The curve names may be in ~Log_Definition instead of ~Curves. Add the ability to check both, and to add the data to las.curves.
- Add a default.py::READ_POLICIES key-value for when the delimiter is a comma (this will exclude the regex_sub for converting commas to decimal points). Add provisional_delimiter. Move the call in las.py to reader.py::get_substitutions() until after provisional_delimiter has been processed, so we can decide which default.py::READ_POLICIES entry to use.
- The items() subroutine of reader.py::read_data_section_iterative_normal_engine() needs to be changed to parse lines with different delimiters: not just SPACE but also COMMA and TAB. The same goes for reader.py::inspect_data_section().
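To illustrate the last note, here is a rough sketch of delimiter-aware line splitting. `split_data_line` is a hypothetical stand-in, not the actual items() routine in reader.py, and it assumes the delimiter has already been resolved to `None` (whitespace), `","` or `"\t"`.

```python
def split_data_line(line, delimiter=None):
    """Hypothetical sketch: split one data-section line into string fields.

    delimiter=None splits on runs of whitespace (the SPACE behaviour);
    "," or "\t" split on that exact character.
    """
    if delimiter is None:
        # Consecutive spaces collapse, so no empty fields are produced.
        return line.split()
    # With an explicit delimiter, consecutive delimiters yield empty
    # fields, which a caller could treat as missing values.
    return [field.strip() for field in line.rstrip("\r\n").split(delimiter)]
```

For example, `split_data_line("1670.0, 123.45,,2550.0", ",")` keeps the empty third field, whereas whitespace splitting never produces empty fields. This sketch does not handle quoted strings containing the delimiter, and it assumes any comma-to-decimal substitution has been skipped for comma-delimited files, as the READ_POLICIES note above suggests.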


Charles-HL · 2024-01-18T20:22:37Z

What is the current status of this issue? I see a pull request has been merged. Is it done?

kinverarity1 · 2024-01-18T21:27:06Z

I believe the PR only implemented it for the normal (slow) parser, not the numpy (fast) parser. It also didn't allow for user specification of the delimiter.

kinverarity1 added enhancement infected-LAS labels Feb 9, 2019

kinverarity1 added this to the v1 milestone Jul 4, 2019

kinverarity1 added las3 stuff relating to LAS 3.0 and removed enhancement infected-LAS labels May 3, 2020

kinverarity1 added bug data-section-parser A bug or enhancement relating to the data section parser labels Apr 9, 2021

kinverarity1 mentioned this issue Apr 14, 2021

Support reading and writing all LAS 3.0 features #5

Open

9 tasks

dcslagel self-assigned this Jul 18, 2021

dcslagel mentioned this issue Aug 10, 2021

Parse comma-delimited ~ASCII sections #265 #485

Merged

dcslagel added a commit that referenced this issue Sep 20, 2021

Merge pull request #485 from dcslagel/issue-265-parse-comma-delimited…

e32f885

…-data Parse comma-delimited ~ASCII sections #265

dcslagel removed their assignment Feb 18, 2022
