Read as much of data as possible #197

Closed
achaikou opened this issue Nov 26, 2019 · 2 comments

Comments

@achaikou
Contributor

achaikou commented Nov 26, 2019

I have seen many files where something breaks late in the file, sometimes after gigabytes of data.

  • Truncated files (around 60)
  • Files filled with zeros from a certain point onward (around 60)
  • Files containing wrong bytes in the middle (around 5)
  • Files with fdata that is missing bytes (around 25)
  • Files where the successor of the last record is missing (probably around 5)
  • Files with many logical files inside, where the error might be only in, say, file 31

There is no guarantee that the data present in such files is correct, but we can return it to the user and let them decide.
This primarily affects dlis_index_records, but also the parsing methods, since data may be corrupted inside otherwise valid records.
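
To illustrate the idea of returning everything that was indexed before the failure, here is a minimal Python sketch. It is not the actual dlis_index_records implementation; `index_visible_records` is a hypothetical helper. It assumes each VR starts with a 2-byte big-endian length (covering the 4-byte header itself) followed by FF 01, and that the caller passes the offset of the first VR in `start`.

```python
import struct

# Assumed VR header layout: 2-byte big-endian length, then 0xFF, then 0x01.
VR_HEADER = struct.Struct(">HBB")


def index_visible_records(buf, start=0):
    """Hypothetical helper: index VRs until the first problem.

    Returns (offsets, error). `offsets` lists every VR that was fully
    present; `error` is None on a clean end of data, otherwise a message
    saying where indexing stopped.
    """
    offsets = []
    pos = start
    while pos < len(buf):
        if pos + VR_HEADER.size > len(buf):
            return offsets, f"truncated VR header at offset {pos}"
        length, marker, version = VR_HEADER.unpack_from(buf, pos)
        if (marker, version) != (0xFF, 0x01):
            return offsets, f"bad VR marker at offset {pos}"
        if length < VR_HEADER.size:
            return offsets, f"implausible VR length at offset {pos}"
        if pos + length > len(buf):
            return offsets, f"VR at offset {pos} runs past end of data"
        offsets.append(pos)
        pos += length
    return offsets, None
```

The point of the design is that the error is returned next to the partial result instead of replacing it, so the caller can decide whether to trust what was read.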

@achaikou
Contributor Author

Consider the possibility of reading not only the data before the failure, but also the data after it.
There is no guarantee that the data inside such broken files makes any sense (in many cases it seems it does not), but we can let the user decide.

We can always try to find the next FF 01, then read forward to check whether it is the start of a new VR or just a random FF 01. From manual investigation, the following observations usually hold:

  • About 90% of all VRs have an almost constant length of 20 00.
  • Some use 40 00.
  • In some cases the length is slightly less than 20 00, for example 1F FE or 1F F4.
  • The last VR can be of any length.
  • Some files wrap every LR in its own VR, so the VR lengths vary accordingly.
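
The search for the next FF 01 could look roughly like the sketch below. `find_next_vr` is a hypothetical name, not an existing function. It assumes the two bytes before FF 01 hold a big-endian VR length that includes the 4-byte header, and it accepts a candidate only if another FF 01 sits right where the following VR header should be, or if the candidate ends exactly at the end of the data (the last VR can be of any length). A random FF 01 can still pass this check by chance, so the result is only a guess.

```python
import struct

VR_MARKER = b"\xff\x01"


def find_next_vr(buf, start):
    """Hypothetical helper: offset of the next plausible VR at or after
    `start`, or None if no candidate passes the check."""
    # The marker sits 2 bytes into the VR header, so begin at start + 2.
    i = buf.find(VR_MARKER, start + 2)
    while i != -1:
        cand = i - 2
        (length,) = struct.unpack_from(">H", buf, cand)
        nxt = cand + length
        if length >= 4:
            # Either this is the last VR, or the next header follows it.
            if nxt == len(buf) or buf[nxt + 2:nxt + 4] == VR_MARKER:
                return cand
        # Just a random FF 01, keep scanning.
        i = buf.find(VR_MARKER, i + 1)
    return None
```

A stricter variant could also prefer the commonly seen lengths (20 00, 40 00) when ranking candidates, at the cost of missing files that wrap every LR in its own VR.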

Notes:

  • This approach might require a lot of backtracking and messy code.
  • Sometimes the trash data seems to stop after a reasonable number of bytes (somewhere up to 0xC000), and sometimes it goes on for gigabytes.
  • It might be possible to read complete fdata parts inside broken VRs (manually recognizable by frame names).
  • The amount of useful data obtained this way would be quite small (so far I have seen probably around 100 files that might benefit from this approach).

ErlendHaa removed the blocked label Oct 14, 2020
@ErlendHaa
Contributor

Resolved by #296
