-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error handling #296
Error handling #296
Conversation
achaikou
commented
Sep 24, 2020
- Performance seems to be mostly fine, checked one correct 3.4GB file once, process time increased from 1m 38s to 1m 40s, so within error margin.
- Real files with representation code issue seem to also behave as expected (I get the error about attributes in the logs, all other parameters in the set are processed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work. I think this is the pr that will close the most issues at once this year!
076c389
to
6b40733
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough review! 👍
Majority is fixed.
I am not arguing with the docs deity 😄
92ac8d2
to
c141862
Compare
Let parse_set_component, parse_template and parse_objects be methods on object_set thus allowing them to operate directly on set attributes.
There were several places where truncated files could fail, but as truncated files could be detected on markers level, we should inform user about that early.
dlis_error and error_severity are two beings that we would need later when creating our error-mechanism. dlis_error represents error one way or another related to protocol processing. error_severity is a field in dlis error which shows how critical for further and correct processing of data is this error according to developers.
As we plan to finally give users all the warnings we see during operations in C++, we need to create an error handler class familiar both to C++ and to Python. We'll do it similar way to the PyMatcher. While logger conceptually could be a global property (like encodings), decision was made to avoid global settings and make logger a dlis attribute. It's likely to stay the same for various files, but now user is in control.
Introducing system that will send debug/info/warning information that appears during attributes parsing to the user. Co-authored-by: Erlend Hårstad <erha@equinor.com>
We have several files where - reprcode is invalid in template - attribute descriptor in object attribute is "20" We already allowed invalid representation code in templates (ee1680b) in hope that they will get overwritten in future. Sometimes they are and sometimes they are not. But we can deal with situation described above fairly well. Attribute is clearly spoiled, but that doesn't prevent us from parsing further. Hence we can just log this error and continue.
With current approach if error happens during object set processing, we lose all the processed template attributes and objects. Storing them the moment we processed them seems like a better approach. Thus if error happens afterwards we are in better situation with regards to debugging, error recovery and feeding users as much data as possible.
When something unexpected happens in one of the attributes, we want to let users know about it when they access other attributes of the same object. Due to small amount of restrictions in rp66 it can happen that we correctly process already broken attributes and do not notice that they are spoiled. Thus user will trust these attributes when they shouldn't. In theory same can happen for objects (users will believe that object is ok, while in reality it happened to be be processed by accident). This situation is however very unlikely. So in order to not introduce too many log messages, that situation is skipped. As long as errors of at least MINOR level on attributes exist, MINOR message will be logged for corresponding object.
We check for discrepancy with specification the earliest possible. While we can't decide right away if 'attribute' flags are correct as it depends on whether attribute belongs to object or template, we can do so for 'object's and 'set's. By doing so we "fixed" the flags early, thus partly obscuring information. Now all the checks are postponed until before the parsing affected items themselves. No flags are overwritten. Commit doesn't affect existing behavior.
While we do not support neither redundant nor replacement sets, we need to inform users if they ever appear. As of now we haven't seen them used in real files. Both can cause certain problems for the users, but redundant data might be handled fairly well, hence we inform users through minor error. Replacement sets might be a bigger problem for the user, hence major error is reported.
We detect various specification violation. Some of them we consider insignificant enough and recover. Out users can, however, disagree with us about importance of certain specification violations. Hence we give users a little bit more control over it. For example, users want to receive an error every time something is spoiled even a little bit. Or on the opposite - users may want to get as much data as possible, accepting the danger that data might be spoiled.
here are many truncated (or zeroed out) files out there. Right now we do not give users even a chance to read such files as they fail right on the load. However problem might appear only after hundreds of megabytes of correct data. Thus this commit aims to allow users to read as much as possible data from such files. By using error escape setting it's up to users to decide if they want to process broken files or not. While dlisio doesn't attempt to recover and find out if there is anything useful beyond error offset, already processed data will be returned. Note that we want to catch and handle any wrong situation that can happen in dlisio files (even though we do not know about its existence yet). Current implementation (here and in further commits) is a result of fierce debate about the best approach on how to tackle the issue.
Not many files can fail inside extract method because structure is processed in findoffsets and data is postponed until parse, but it's still possible. Thus making sure that failed records won't be processed, but records after bad one will be.
Divide function into two parts to make code for now less difficult to deal with. This is by no means enough and sufficient, and proper refactoring of this code is required.
Note: same problems with implementation as for findoffsets and others. Agreement regarding try-catch size, use of throw and methods extraction wasn't unanimous.
We already escape recoverable errors by storing them inside objects and attributes. Now we also can escape unparsable errors by returning everything we were able to process before.
Invalid value was printed when origin or copynumber where 0.
Issue wasn't implemented, but now we can just log an error on the attribute and continue processing given that we believe that value is not present. If that doesn't hold and exception is thrown, it will be later handled, reported and processed objects returned.
We now can inform user about a situation where padbytes number was the same as LRS length. Previously that information was skipped, we barely assumed dlisio deals with this correctly. Similar way we can deal with other non-implemented warnings in io.cpp.
Now as we added error handling to all the main methods, we can test how changes to error handler affect results of various operations combined with each other.
c141862
to
4630e53
Compare