Support reading and writing all LAS 3.0 features #5
Comments
Current error is not very graceful:
|
For me, the two most interesting parts of LAS 3 are:
I would be very interested to see this functionality available. |
👍 I've been looking at the overall content of the package, specifically the parser, and I would like to know what you think of the following suggestions:
HeaderItem = namedlist("HeaderItem", ["mnemonic", "unit", "value", "descr"])
What I'm proposing is to include two extra items:
HeaderItem = namedlist("HeaderItem", ["mnemonic", "unit", "value", "description", "format", "association"])
Those items would be empty in case
It would be some sort of always treat as Anyway, just my two cents to align my thoughts before doing some actual work. |
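A minimal sketch of what the extended item could look like, assuming the LAS 3 header line layout `MNEM.UNIT VALUE : DESCRIPTION {Format} | Association`. It uses a standard-library dataclass instead of lasio's namedlist, and `parse_header_line` is a hypothetical helper for illustration, not part of lasio. Lines without a `{Format}` or `| Association` part (as in LAS 1.2/2.0) simply leave those fields empty:

```python
import re
from dataclasses import dataclass


@dataclass
class HeaderItem:
    """Illustrative stand-in for the proposed six-field header item."""
    mnemonic: str
    unit: str = ""
    value: str = ""
    descr: str = ""
    format: str = ""        # LAS 3 only, e.g. "F"; empty for LAS 1.2/2.0
    association: str = ""   # LAS 3 only, e.g. "Run[1]"; empty otherwise


def parse_header_line(line: str) -> HeaderItem:
    """Split one header line into its parts (hypothetical helper).

    Assumes the description follows the LAST colon on the line and that
    the unit is separated from the value by a space.
    """
    head, _, tail = line.rpartition(":")
    mnemonic, _, after_dot = head.partition(".")
    unit, _, value = after_dot.partition(" ")
    descr, _, association = tail.partition("|")

    # Pull an optional {format} marker out of the description.
    fmt = ""
    match = re.search(r"\{(.*?)\}", descr)
    if match:
        fmt = match.group(1)
        descr = descr[: match.start()] + descr[match.end():]

    return HeaderItem(
        mnemonic.strip(), unit.strip(), value.strip(),
        descr.strip(), fmt.strip(), association.strip(),
    )
```

Because the two new fields default to empty strings, code that only reads the first four fields would keep working on older files.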
Hey, thanks for the help!
1 - Extend HeaderItem
Yes, sounds excellent.
2 - Drop the "curves" in favor of "definition"
Yep! Sounds good. I had another read of the v3 spec. How does the below sound?
e.g. for the example file in the LAS v3 spec:
and the
One problem I can foresee with the above is the "Core" dataset. Maybe they should be retained as "Core[1]" and "Core[2]". I don't know...?
This would be backwards-compatible with LAS v2 because the ~Parameter, ~Curve, and ~ASCII sections could always be considered part of the "ASCII" data set, e.g. in v2 pseudocode:
LASFile.sections = {
    'Version': SectionItems(),                 # 1
    'Well': SectionItems(),                    # 2
    'Parameter': SectionItems(),               # 3
    'Curves': SectionItems(),                  # 4
    'ASCII': np.ndarray / pd.DataFrame,        # 5
}
LASFile.data_sets = {
    'ASCII': {
        'Parameter': SectionItems(),           # 3
        'Definition': SectionItems(),          # 4
        'Data': np.ndarray / pd.DataFrame,     # 5
    }
}
I think we should avoid calling the ~Parameter, ~Curve, ~ASCII set of sections the "Log" data set as the v3 spec sometimes does, because we might encounter a file with both the ~Parameter, ~Curve, ~ASCII AND the ~Log_Parameter, ~Log_Definition, and ~Log_Data sections. (SectionItems() is a new class defined in PR #106.)
At least two other major changes needed are:
3 - Add data definition types
Not all data is a CurveItem - we would also need ArrayItem and a "StringCurveItem". The latter could perhaps be achieved best with a wrapper around a striplog
4 - Datetime reference curves
This could be best managed by requiring pandas, which has excellent support for using date/time stamps as indexes. |
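The grouping described above could be sketched roughly as follows. This is only an illustration: `group_data_sets` is a hypothetical name, the section-title patterns (`~Name_Parameter`, `~Name_Definition`, `~Name_Data | ...`, optionally with an index such as `[1]`) are assumed from the LAS 3 spec example file, and the values stored are just the raw section titles instead of parsed SectionItems or arrays:

```python
import re

# Roles for the un-prefixed LAS 1.2/2.0 sections, keyed by the first word
# of the section title (lower-cased). These always go in the "ASCII" set.
_V2_ROLES = {
    "parameter": "Parameter",
    "curve": "Definition",
    "curves": "Definition",
    "ascii": "Data",
    "a": "Data",
}

# e.g. "Log_Data", "Core_Parameter[1]"
_V3_TITLE = re.compile(
    r"^(?P<set>.+)_(?P<role>parameter|definition|data)(?P<idx>\[\d+\])?$",
    re.IGNORECASE,
)


def group_data_sets(section_titles):
    """Group section titles into {data set name: {role: title}}."""
    data_sets = {}
    for title in section_titles:
        # Drop the leading "~" and anything after "|" (the association).
        name = title.lstrip("~").split("|")[0].strip()
        if not name:
            continue
        first_word = name.split()[0]
        if first_word.lower() in _V2_ROLES:
            set_name, role = "ASCII", _V2_ROLES[first_word.lower()]
        else:
            match = _V3_TITLE.match(first_word)
            if match is None:
                continue  # e.g. ~Version, ~Well: not part of a data set
            set_name = match.group("set") + (match.group("idx") or "")
            role = match.group("role").capitalize()
        data_sets.setdefault(set_name, {})[role] = title
    return data_sets
```

With this layout a LAS 2.0 file yields a single "ASCII" entry, while indexed LAS 3 sections end up under keys such as "Core[1]".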
No problem, I'm glad you found it useful! Continuing the discussion...
2 - Drop the "curves" in favor of "definition"
I agree with what you said. Actually at first I idealised a I don't expect to find a lot of You actually grouped the
LASFile.data_sets = {
    #...
    'Core': [
        {
            'Parameter': SectionItems(),           # 8
            'Definition': SectionItems(),          # 9
            'Data': np.ndarray / pd.DataFrame,     # 10
        },
        {
            'Parameter': SectionItems(),           # 11
            'Definition': SectionItems(),          # 12
            'Data': np.ndarray / pd.DataFrame,     # 13
        },
    ],
    #...
}
But if we keep the
LASFile.data_sets = {
    #...
    'Core[1]': {
        'Parameter': SectionItems(),           # 8
        'Definition': SectionItems(),          # 9
        'Data': np.ndarray / pd.DataFrame,     # 10
    },
    'Core[2]': {
        'Parameter': SectionItems(),           # 11
        'Definition': SectionItems(),          # 12
        'Data': np.ndarray / pd.DataFrame,     # 13
    },
    #...
}
If I understood correctly, that could also happen with any other section. How are you planning to approach the different runs that could be present in the files, as in the sample_las3.0_spec.las file?
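The two layouts carry the same information, so one can be derived from the other. As a rough sketch (the helper name `nest_indexed_sets` is hypothetical, and plain dicts stand in for the real section objects), flat keys such as 'Core[1]' and 'Core[2]' can be folded into a single 'Core' list ordered by index:

```python
import re


def nest_indexed_sets(flat):
    """Turn {'Core[1]': a, 'Core[2]': b, 'Log': c} into {'Core': [a, b], 'Log': c}."""
    plain = {}
    indexed = {}
    for key, value in flat.items():
        match = re.fullmatch(r"(.+)\[(\d+)\]", key)
        if match:
            indexed.setdefault(match.group(1), {})[int(match.group(2))] = value
        else:
            plain[key] = value
    # Note: an un-indexed key with the same base name would be overwritten here.
    for name, by_index in indexed.items():
        plain[name] = [by_index[i] for i in sorted(by_index)]
    return plain
```

Keeping the flat keys internally and offering the nested view as a convenience would be one way to support both styles.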
3 - Add data definition types
You are right. However, I'm not sure we need a particular type for it. For example the The
4 - Datetime reference curves
Yes! If pandas eventually becomes a requirement, it's worth trying to leverage the other packages that come along with it, such as the support for spreadsheets and so forth. |
Now all header sections are parsed fully before returning to read data sections. Broken - this is a work in progress.
FYI I'm happy to bring pandas in as a lasio requirement as part of this work. It's probably time. |
Remove python 2.7 from Travis-CI
Rearrange-Reader: Enable Unknown section tests to pass
Use TextIOWrapper.tell() to get section start pos
Add initial LAS 3.0 test infrastructure
- Add tests/examples/3.0 dir.
- Add the CWLS's 3.0 example las file.
- Copy the example file to sample_3.0.las to standardize with 1.2 and 2.0 sample las files.
- Create a tests/test_read_30.py with basic read test. However, the test is set to SKIP because it currently fails on the rearrange-reader branch.
First draft at isolated data section reader (kinverarity1#5)
Now all header sections are parsed fully before returning to read data sections.
Add find_sections_in_file()
Rebase to master
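To illustrate the idea of recording section start positions with tell(), here is a small sketch. It is not the find_sections_in_file() from the commit above; `find_section_offsets` is a hypothetical name. It reads with readline() because calling tell() on a text file while iterating over it with a for loop raises an OSError:

```python
import io


def find_section_offsets(file_obj):
    """Return [(section title line, start offset)] for every '~' section.

    The offsets come from file_obj.tell(), so they can be passed back to
    file_obj.seek() later to read one section in isolation.
    """
    sections = []
    while True:
        pos = file_obj.tell()       # position of the line about to be read
        line = file_obj.readline()
        if not line:                # end of file
            break
        if line.lstrip().startswith("~"):
            sections.append((line.strip(), pos))
    return sections


text = "~Version\nVERS. 3.0 : LAS version\n~Well\nSTRT.M 0.0 : start\n"
handle = io.StringIO(text)
offsets = find_section_offsets(handle)

# Jump straight back to the start of the second section.
handle.seek(offsets[1][1])
second_title = handle.readline().strip()
```

Collecting the offsets first is what makes it possible to parse all header sections before returning to the data sections.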
Hello, I am wondering how this is progressing? I just got hold of lasio and am eager to use it to read some LAS v3.0 files from optical televiewer data. I attach part of one file here, with only 2 lines of data to avoid overloading you with data! When trying to read it with the current version of lasio I get the following error:
data_in = ls.read(r'example_header.las', ignore_header_errors=True)
The data section (as can be seen in the file) starts with Should I try this out with some other branch (e.g. las3.0)? Thanks for your help and for setting this up! |
Hi @shakasaki! Thanks for trying out lasio - sorry that it doesn't work yet for LAS 3.0. We are progressing, slowly, having today merged #327 which was the first step in the list above. No, the las3.0 branch does not have any improvements for you yet - we will continue to merge improvements to I expect your file may work prior to full LAS 3 support. Thank you very much for providing an example file; I'll test it out shortly and see what stands in the way of reading in the data section at least. |
Hello @kinverarity1 and thanks for the response. Please do let me know if you can extract at least the data - that would be great for now. Or perhaps, do you know any resource where I can convert las3.0 to older versions (las 2.0 for example). Thank you for the help! |
@shakasaki using lasio It is very hack-y and will certainly break in the future as we add proper support for this kind of functionality, but it should get you by in the meantime. I wasn't sure how to parse values like |
@kinverarity1 Thank you so much and apologies for the late response. I have been doing fieldwork for the past few days. I really appreciate the effort you put into helping me out. I hope I can give back to this community somehow! The values (e.g. 73.56.5) are RGB triples, that is, they denote a colored pixel. These data are from an optical televiewer. Actually the first value is a depth, followed by 360 RGB triples (one for each degree). I will try out the hack today and see how it works out. Thanks again |
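For reference, the row layout described here (a depth followed by R.G.B triples) can be parsed with a few lines of plain Python. This is only a sketch: `parse_televiewer_row` is a hypothetical helper, and the comma delimiter is an assumption that may not match every file:

```python
def parse_televiewer_row(line, delimiter=","):
    """Parse 'depth,R.G.B,R.G.B,...' into (depth, [(r, g, b), ...])."""
    fields = [field.strip() for field in line.strip().split(delimiter)]
    depth = float(fields[0])
    pixels = []
    for field in fields[1:]:
        channels = tuple(int(part) for part in field.split("."))
        if len(channels) != 3:
            raise ValueError(f"expected an R.G.B triple, got {field!r}")
        pixels.append(channels)
    return depth, pixels


depth, pixels = parse_televiewer_row("12.5,73.56.5,0.0.0,255.255.255")
```

Note that a value such as 73.56.5 cannot be read as a float, which is why a generic numeric reader fails on these columns.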
No problem at all. You already have helped out the community by posting your example! 😁 I have adapted the notebook to parse the data as RGB triples - see comments on the gist. |
I have been using the code snippet you wrote to read in LAS 3.0 files and it works. However, when the file is too large (I have files with [198441 rows x 361 columns]) the approach crashes. I coded another hacky example using pandas and it can handle the large files, so I was wondering why this is, since I thought lasio also uses pandas internally. Here is my approach:
import numpy as np
import pandas as pd  # to read in the dataframe
from subprocess import check_output  # to find where the data starts in the file

file_name = 'input las file that is too large for lasio hack-y example'
# the following line uses grep to find out where the data starts in the file
first_dataline = int(str(check_output(["grep", "-n", "~LOG_DATA", file_name]), 'utf-8').split(':')[0])
# read in only the data, otherwise pandas crashes due to unicode characters
file_in = pd.read_csv(file_name, skiprows=first_dataline, header=None)
# create a depth array and an array to store RGB values as unsigned integers
depth = file_in[0].to_numpy(dtype='float')
store_array = np.empty((file_in.shape[0], file_in.shape[1] - 1, 3), dtype='uint8')
# loop over each televiewer column; column 0 is depth, so file column col + 1 goes to index col
for col in range(store_array.shape[1]):
    temp = file_in[col + 1].str.split('.', expand=True)
    store_array[:, col, :] = temp.apply(lambda x: pd.to_numeric(x, downcast='unsigned')).to_numpy(dtype='uint8')
The approach works well with the large files, but does not read in the header of course. Right now, I'm fine with it. Hopefully this insight can help in making lasio able to handle larger files (if that is actually a problem!) |
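The grep call ties the snippet above to systems where grep is installed. A pure-Python sketch of the same lookup is shown below; `find_data_start` is a hypothetical helper, and unlike grep it only matches the marker at the start of a line and ignores case:

```python
def find_data_start(path, marker="~LOG_DATA", encoding="utf-8"):
    """Return the 1-based line number of the first line starting with marker.

    The result matches what `grep -n` reports, so it can be used as the
    `skiprows` value to skip everything up to and including that line.
    """
    marker = marker.upper()
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in the header.
    with open(path, encoding=encoding, errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.lstrip().upper().startswith(marker):
                return line_number
    raise ValueError(f"no line starting with {marker!r} found in {path!r}")
```

The file is streamed line by line, so the lookup itself should not be a memory concern even for very large files.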
Thanks for the code! Lasio doesn't use pandas as a reader yet (see #1 and this thread for a discussion of the reasons) but we plan to switch to it soon. I'm glad you found a solution! |
I'm actually writing a LAS reader for R right now and have been struggling through LAS 3.0. Will be interested to see how you handle it. |
I completely forgot about this wiki page from ages ago: Additions in LAS 3 to look out for:
|
@dcslagel Regarding the Transform 21 hackathon - do you think perhaps we should convert (perhaps manually) this issue to a GitHub 'discussion'? I am getting lost in the discussion of the many different elements of what adds up to "LAS 3" support spread across multiple issues. It might work better as a discussion, with different threads for each element? I'm happy to do this tomorrow if you think it best. |
Hi @kinverarity1, I thought about it a bit...
|
👍 I don't want to break links so I'll leave it here but clean up the body of the issue and make sure we have at least 'stub' issues for each individual part that needs doing. Also, I'll change the title, since some version 3.0 files technically do now "work" in lasio, just not well. |
Hoping progress is still being made for handling the LAS 3.0 files. Is the 2 year gap in comments a bad sign? |
Sorry, GH account issues. Still interested in the LAS 3.0 progress. |
I haven't done anything. I remember looking at it and thinking it would require some pretty big changes and pose some backward compatibility problems. Also, I'm an R user for the most part. I wrote my own package that will read LAS 3.0: https://github.com/donald-keighley/lasr The core of it is written in C++, so maybe it could be ported over? I don't know how to do that. |
Also how many las 3 files do you have to read? I've seen very few of these in the wild. |
Yep it's a bad sign. Although I'd love to get this working, I have no spare time at all for this. I also never (like zero times) see LAS 3 files outside of lasio development, as I don't work in oil & gas. My archive of files I work with daily is mostly LAS 1.2 😄 Happy to look at PRs which tackle specific parts of the problem here. But lasio has some poor design at its core, I suspect. I wrote it when I was learning Python. It would probably make more sense for someone to take lasio and significantly re-work it, or re-write a better approach from scratch. Or use R instead to use @donald-keighley's package (which is really good by the look of it!) |
Hey, thanks for the insight. Yeah, the only ones I've seen are relative to my particular use case. The North Carolina Groundwater Section provides LAS files for download from groundwater monitoring wells & they (at least some) are LAS 3.0. |
Ah, I see. Thanks for the background. Lasio actually works very well so far for the LAS viewer I built around it. The LAS 3.0 files are the only exception. Unfortunately, LAS 3.0 files apparently exist to some extent in my world (see response to Donald Keighley). Anyway, thanks for the work you have done! Good luck with all. |
LAS 3 specification: https://github.com/kinverarity1/lasio/blob/main/standards/LAS_3_File_Structure.pdf
Tasks:
pd.read_csv) - Update 26th April: resolved by "Allow different data types per curve in data section reader" (#461)
np.genfromtxt) #446: Use an accelerated pandas reader, e.g. pd.read_csv / pd.read_fwf, where it is not needed for substitutions etc. (refer to the discussion in "Parse dates to datetime objects", #1) - Update 26th April: being worked on in "Add a numpy engine for reading using numpy.genfromtxt()" (#452)
Update May 2020: I will start to sketch out a roadmap for how to achieve this. I think once this is reasonably well tested we can do a version 1 release.
Goals:
Aim to improve reading performance; it's really bad at the moment.
Because I expect this work might require a broken branch for a while, let's merge into the las3-develop branch if we need to.