Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling #296

Merged
merged 25 commits into from
Nov 19, 2020
Merged

Error handling #296

merged 25 commits into from
Nov 19, 2020

Conversation

achaikou
Copy link
Contributor

  • Performance seems to be mostly fine, checked one correct 3.4GB file once, process time increased from 1m 38s to 1m 40s, so within error margin.
  • Real files with representation code issue seem to also behave as expected (I get the error about attributes in the logs, all other parameters in the set are processed).

Copy link
Contributor

@ErlendHaa ErlendHaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. I think this is the pr that will close the most issues at once this year!

lib/extension/dlisio/ext/types.hpp Show resolved Hide resolved
lib/extension/dlisio/ext/types.hpp Outdated Show resolved Hide resolved
lib/src/parse.cpp Outdated Show resolved Hide resolved
python/data/chap2/README.rst Outdated Show resolved Hide resolved
python/data/chap2/README.rst Outdated Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
python/dlisio/ext/core.cpp Outdated Show resolved Hide resolved
python/dlisio/ext/core.cpp Show resolved Hide resolved
lib/src/parse.cpp Outdated Show resolved Hide resolved
lib/src/parse.cpp Outdated Show resolved Hide resolved
@achaikou achaikou force-pushed the everything_about_errors branch from 076c389 to 6b40733 Compare November 6, 2020 17:29
Copy link
Contributor

@ErlendHaa ErlendHaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

lib/src/parse.cpp Outdated Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/ext/core.cpp Outdated Show resolved Hide resolved
python/dlisio/ext/core.cpp Show resolved Hide resolved
Copy link
Contributor Author

@achaikou achaikou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review! 👍
Majority is fixed.
I am not arguing with the docs deity 😄

lib/src/io.cpp Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
lib/src/parse.cpp Outdated Show resolved Hide resolved
python/dlisio/ext/core.cpp Outdated Show resolved Hide resolved
python/tests/test_logical_record.py Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
lib/src/parse.cpp Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
python/dlisio/plumbing/errorhandler.py Outdated Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
@achaikou achaikou force-pushed the everything_about_errors branch from 92ac8d2 to c141862 Compare November 18, 2020 16:41
achaikou and others added 21 commits November 19, 2020 10:53
Let parse_set_component, parse_template and parse_objects be methods on
object_set thus allowing them to operate directly on set attributes.
There were several places where truncated files could fail, but as
truncated files could be detected on markers level, we should inform
user about that early.
dlis_error and error_severity are two beings that we would need later
when creating our error-mechanism.

dlis_error represents error one way or another related to protocol
processing.
error_severity is a field in dlis error which shows how critical for
further and correct processing of data is this error according to
developers.
As we plan to finally give users all the warnings we see during
operations in C++, we need to create an error handler class familiar
both to C++ and to Python.
We'll do it similar way to the PyMatcher.

While logger conceptually could be a global property (like encodings),
decision was made to avoid global settings and make logger a dlis
attribute.
It's likely to stay the same for various files, but now user is in
control.
Introducing system that will send debug/info/warning information that
appears during attributes parsing to the user.

Co-authored-by: Erlend Hårstad <erha@equinor.com>
We have several files where
- reprcode is invalid in template
- attribute descriptor in object attribute is "20"

We already allowed invalid representation code in templates
(ee1680b) in hope that they will get
overwritten in future. Sometimes they are and sometimes they are not.
But we can deal with situation described above fairly well.

Attribute is clearly spoiled, but that doesn't prevent us from parsing
further. Hence we can just log this error and continue.
With current approach if error happens during object set processing, we
lose all the processed template attributes and objects.

Storing them the moment we processed them seems like a better approach.
Thus if error happens afterwards we are in better situation with regards
to debugging, error recovery and feeding users as much data as possible.
When something unexpected happens in one of the attributes, we
want to let users know about it when they access other attributes of the
same object.

Due to small amount of restrictions in rp66 it can happen that we
correctly process already broken attributes and do not notice that they
are spoiled. Thus user will trust these attributes when they shouldn't.

In theory same can happen for objects (users will believe that object is
ok, while in reality it happened to be be processed by accident). This
situation is however very unlikely. So in order to not introduce too
many log messages, that situation is skipped.

As long as errors of at least MINOR level on attributes exist,
MINOR message will be logged for corresponding object.
We check for discrepancy with specification the earliest possible. While
we can't decide right away if 'attribute' flags are correct as it
depends on whether attribute belongs to object or template, we can do so
for 'object's and 'set's.
By doing so we "fixed" the flags early, thus partly obscuring
information.

Now all the checks are postponed until before the parsing affected items
themselves. No flags are overwritten.

Commit doesn't affect existing behavior.
While we do not support neither redundant nor replacement sets, we need
to inform users if they ever appear. As of now we haven't seen them used
in real files.

Both can cause certain problems for the users, but redundant data might
be handled fairly well, hence we inform users through minor error.
Replacement sets might be a bigger problem for the user, hence major
error is reported.
We detect various specification violation. Some of them we consider
insignificant enough and recover.
Out users can, however, disagree with us about importance of certain
specification violations.
Hence we give users a little bit more control over it.

For example, users want to receive an error every time something is
spoiled even a little bit.
Or on the opposite - users may want to get as much data as possible,
accepting the danger that data might be spoiled.
here are many truncated (or zeroed out) files out there. Right now we
do not give users even a chance to read such files as they fail right on
the load. However problem might appear only after hundreds of megabytes
of correct data.

Thus this commit aims to allow users to read as much as possible data
from such files.
By using error escape setting it's up to users to decide if they want to
process broken files or not.

While dlisio doesn't attempt to recover and find out if there is
anything useful beyond error offset, already processed data will be
returned.

Note that we want to catch and handle any wrong situation that can
happen in dlisio files (even though we do not know about its existence
yet). Current implementation (here and in further commits) is a result
of fierce debate about the best approach on how to tackle the issue.
Not many files can fail inside extract method because structure is
processed in findoffsets and data is postponed until parse, but it's
still possible.

Thus making sure that failed records won't be processed, but records
after bad one will be.
Divide function into two parts to make code for now less difficult to
deal with.
This is by no means enough and sufficient, and proper refactoring of
this code is required.
Note: same problems with implementation as for findoffsets and others.
Agreement regarding try-catch size, use of throw and methods extraction
wasn't unanimous.
We already escape recoverable errors by storing them inside objects and
attributes.
Now we also can escape unparsable errors by returning everything we were
able to process before.
Invalid value was printed when origin or copynumber where 0.
Issue wasn't implemented, but now we can just log an error on the
attribute and continue processing given that we believe that value is
not present.
If that doesn't hold and exception is thrown, it will be later handled,
reported and processed objects returned.
We now can inform user about a situation where padbytes number was the
same as LRS length.
Previously that information was skipped, we barely assumed dlisio deals
with this correctly.

Similar way we can deal with other non-implemented warnings in io.cpp.
python/dlisio/errors/errorhandler.py Show resolved Hide resolved
lib/src/parse.cpp Show resolved Hide resolved
Now as we added error handling to all the main methods, we can test how
changes to error handler affect results of various operations combined
with each other.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants