Error handling #296

achaikou · 2020-09-24T17:18:09Z

Performance seems to be mostly fine, checked one correct 3.4GB file once, process time increased from 1m 38s to 1m 40s, so within error margin.
Real files with representation code issue seem to also behave as expected (I get the error about attributes in the logs, all other parameters in the set are processed).

ErlendHaa

Good work. I think this is the pr that will close the most issues at once this year!

lib/extension/dlisio/ext/types.hpp

lib/src/parse.cpp

python/data/chap2/README.rst

lib/src/io.cpp

python/dlisio/ext/core.cpp

lib/src/parse.cpp

ErlendHaa

Great work!

lib/src/parse.cpp

lib/src/io.cpp

python/dlisio/plumbing/errorhandler.py

python/dlisio/ext/core.cpp

achaikou

Thanks for the thorough review! 👍
Majority is fixed.
I am not arguing with the docs deity 😄

lib/src/io.cpp

python/dlisio/plumbing/errorhandler.py

lib/src/parse.cpp

python/dlisio/ext/core.cpp

python/tests/test_logical_record.py

python/dlisio/plumbing/errorhandler.py

lib/src/parse.cpp

python/dlisio/plumbing/errorhandler.py

lib/src/io.cpp

python/dlisio/errors/errorhandler.py

Let parse_set_component, parse_template and parse_objects be methods on object_set thus allowing them to operate directly on set attributes.

There were several places where truncated files could fail, but as truncated files could be detected on markers level, we should inform user about that early.

dlis_error and error_severity are two beings that we would need later when creating our error-mechanism. dlis_error represents error one way or another related to protocol processing. error_severity is a field in dlis error which shows how critical for further and correct processing of data is this error according to developers.

As we plan to finally give users all the warnings we see during operations in C++, we need to create an error handler class familiar both to C++ and to Python. We'll do it similar way to the PyMatcher. While logger conceptually could be a global property (like encodings), decision was made to avoid global settings and make logger a dlis attribute. It's likely to stay the same for various files, but now user is in control.

Introducing system that will send debug/info/warning information that appears during attributes parsing to the user. Co-authored-by: Erlend Hårstad <erha@equinor.com>

We have several files where - reprcode is invalid in template - attribute descriptor in object attribute is "20" We already allowed invalid representation code in templates (ee1680b) in hope that they will get overwritten in future. Sometimes they are and sometimes they are not. But we can deal with situation described above fairly well. Attribute is clearly spoiled, but that doesn't prevent us from parsing further. Hence we can just log this error and continue.

With current approach if error happens during object set processing, we lose all the processed template attributes and objects. Storing them the moment we processed them seems like a better approach. Thus if error happens afterwards we are in better situation with regards to debugging, error recovery and feeding users as much data as possible.

When something unexpected happens in one of the attributes, we want to let users know about it when they access other attributes of the same object. Due to small amount of restrictions in rp66 it can happen that we correctly process already broken attributes and do not notice that they are spoiled. Thus user will trust these attributes when they shouldn't. In theory same can happen for objects (users will believe that object is ok, while in reality it happened to be be processed by accident). This situation is however very unlikely. So in order to not introduce too many log messages, that situation is skipped. As long as errors of at least MINOR level on attributes exist, MINOR message will be logged for corresponding object.

We check for discrepancy with specification the earliest possible. While we can't decide right away if 'attribute' flags are correct as it depends on whether attribute belongs to object or template, we can do so for 'object's and 'set's. By doing so we "fixed" the flags early, thus partly obscuring information. Now all the checks are postponed until before the parsing affected items themselves. No flags are overwritten. Commit doesn't affect existing behavior.

While we do not support neither redundant nor replacement sets, we need to inform users if they ever appear. As of now we haven't seen them used in real files. Both can cause certain problems for the users, but redundant data might be handled fairly well, hence we inform users through minor error. Replacement sets might be a bigger problem for the user, hence major error is reported.

We detect various specification violation. Some of them we consider insignificant enough and recover. Out users can, however, disagree with us about importance of certain specification violations. Hence we give users a little bit more control over it. For example, users want to receive an error every time something is spoiled even a little bit. Or on the opposite - users may want to get as much data as possible, accepting the danger that data might be spoiled.

here are many truncated (or zeroed out) files out there. Right now we do not give users even a chance to read such files as they fail right on the load. However problem might appear only after hundreds of megabytes of correct data. Thus this commit aims to allow users to read as much as possible data from such files. By using error escape setting it's up to users to decide if they want to process broken files or not. While dlisio doesn't attempt to recover and find out if there is anything useful beyond error offset, already processed data will be returned. Note that we want to catch and handle any wrong situation that can happen in dlisio files (even though we do not know about its existence yet). Current implementation (here and in further commits) is a result of fierce debate about the best approach on how to tackle the issue.

Not many files can fail inside extract method because structure is processed in findoffsets and data is postponed until parse, but it's still possible. Thus making sure that failed records won't be processed, but records after bad one will be.

Divide function into two parts to make code for now less difficult to deal with. This is by no means enough and sufficient, and proper refactoring of this code is required.

Note: same problems with implementation as for findoffsets and others. Agreement regarding try-catch size, use of throw and methods extraction wasn't unanimous.

We already escape recoverable errors by storing them inside objects and attributes. Now we also can escape unparsable errors by returning everything we were able to process before.

Invalid value was printed when origin or copynumber where 0.

Issue wasn't implemented, but now we can just log an error on the attribute and continue processing given that we believe that value is not present. If that doesn't hold and exception is thrown, it will be later handled, reported and processed objects returned.

We now can inform user about a situation where padbytes number was the same as LRS length. Previously that information was skipped, we barely assumed dlisio deals with this correctly. Similar way we can deal with other non-implemented warnings in io.cpp.

python/dlisio/errors/errorhandler.py

lib/src/parse.cpp

Now as we added error handling to all the main methods, we can test how changes to error handler affect results of various operations combined with each other.

ErlendHaa reviewed Oct 5, 2020

View reviewed changes

achaikou force-pushed the everything_about_errors branch from 076c389 to 6b40733 Compare November 6, 2020 17:29

ErlendHaa reviewed Nov 11, 2020

View reviewed changes

achaikou commented Nov 12, 2020

View reviewed changes

ErlendHaa reviewed Nov 16, 2020

View reviewed changes

lib/src/parse.cpp Outdated Show resolved Hide resolved

ErlendHaa reviewed Nov 17, 2020

View reviewed changes

lib/src/io.cpp Outdated Show resolved Hide resolved

ErlendHaa reviewed Nov 18, 2020

View reviewed changes

python/dlisio/errors/errorhandler.py Outdated Show resolved Hide resolved

achaikou force-pushed the everything_about_errors branch from 92ac8d2 to c141862 Compare November 18, 2020 16:41

achaikou and others added 21 commits November 19, 2020 10:53

Transform parse functions into object_set methods

5c65b69

Let parse_set_component, parse_template and parse_objects be methods on object_set thus allowing them to operate directly on set attributes.

Add missing physical layout tests

2e03f62

Fail truncated files in findoffsets

66cb3ee

There were several places where truncated files could fail, but as truncated files could be detected on markers level, we should inform user about that early.

Propagate attribute warnings to Python

4da7baa

Introducing system that will send debug/info/warning information that appears during attributes parsing to the user. Co-authored-by: Erlend Hårstad <erha@equinor.com>

Propagate object set warnings to Python

c29f604

Propagate object warnings to Python

0bdc3ca

Add error handler to extract method

8ccb9d0

Not many files can fail inside extract method because structure is processed in findoffsets and data is postponed until parse, but it's still possible. Thus making sure that failed records won't be processed, but records after bad one will be.

Add error handler to findfdata method

c4ba9c2

Divide read_fdata function

97c316e

Divide function into two parts to make code for now less difficult to deal with. This is by no means enough and sufficient, and proper refactoring of this code is required.

Add error handler to readfdata

dd927f6

Note: same problems with implementation as for findoffsets and others. Agreement regarding try-catch size, use of throw and methods extraction wasn't unanimous.

Add error handler to parse

2de34f1

We already escape recoverable errors by storing them inside objects and attributes. Now we also can escape unparsable errors by returning everything we were able to process before.

Add error handler to parse_objects

99d0127

achaikou added 3 commits November 19, 2020 11:48

Fix object not found error message

db83c95

Invalid value was printed when origin or copynumber where 0.

ErlendHaa reviewed Nov 19, 2020

View reviewed changes

python/dlisio/errors/errorhandler.py Show resolved Hide resolved

lib/src/parse.cpp Show resolved Hide resolved

Add combined error handler tests

4630e53

Now as we added error handling to all the main methods, we can test how changes to error handler affect results of various operations combined with each other.

achaikou force-pushed the everything_about_errors branch from c141862 to 4630e53 Compare November 19, 2020 11:28

achaikou merged commit fdb0163 into equinor:master Nov 19, 2020

This was referenced Nov 24, 2020

Return processed frame fdata on error #196

Closed

Read as much of data as possible #197

Closed

Propagate warnings from C #207

Closed

Support wrong reprcodes #219

Closed

Handle inconsistent in LRSH #224

Open

achaikou deleted the everything_about_errors branch December 15, 2020 11:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error handling #296

Error handling #296

achaikou commented Sep 24, 2020

ErlendHaa left a comment

ErlendHaa left a comment

achaikou left a comment

Error handling #296

Error handling #296

Conversation

achaikou commented Sep 24, 2020

ErlendHaa left a comment

Choose a reason for hiding this comment

ErlendHaa left a comment

Choose a reason for hiding this comment

achaikou left a comment

Choose a reason for hiding this comment