Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Tape Image Format (TIF) files #250

Merged
merged 18 commits into from
Jun 4, 2020

Conversation

ErlendHaa
Copy link
Contributor

@ErlendHaa ErlendHaa commented Mar 27, 2020

resolves #66
resolves #179
fixes #188
fixes #189
resolves #218
resolves #227

@ErlendHaa ErlendHaa force-pushed the use-lfp-as-io-device branch 3 times, most recently from 10c29ec to a275c92 Compare March 30, 2020 12:09
Copy link
Contributor

@achaikou achaikou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! 👍 👍 👍 This whole dlis processing looks so much simpler now :)

If you are asking why I have so few linguistic comments:
If I could comment on commit messages (and only on parts which are supposed to stay) you would have had (7+5+11+5=) around 28 additional comments :D

CMakeLists.txt Outdated Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
lib/extension/dlisio/ext/io.hpp Outdated Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
python/dlisio/ext/core.cpp Outdated Show resolved Hide resolved
lib/src/io.cpp Show resolved Hide resolved
lib/src/io.cpp Show resolved Hide resolved
lib/extension/dlisio/ext/io.hpp Show resolved Hide resolved
lib/test/index-records.cpp Show resolved Hide resolved
lib/CMakeLists.txt Outdated Show resolved Hide resolved
lib/extension/dlisio/ext/io.hpp Show resolved Hide resolved
lib/extension/dlisio/ext/io.hpp Show resolved Hide resolved
@ErlendHaa ErlendHaa force-pushed the use-lfp-as-io-device branch 3 times, most recently from dbc467d to badc3b0 Compare April 20, 2020 11:02
Copy link
Contributor

@achaikou achaikou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like clean stream very much now! 👍 👍

Resolved all the previous comments, if something was not clear to me repeated the comment again.
Now mostly complain about trim method.

lib/extension/dlisio/ext/io.hpp Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
lib/CMakeLists.txt Show resolved Hide resolved
python/dlisio/ext/core.cpp Show resolved Hide resolved
lib/include/dlisio/dlisio.h Outdated Show resolved Hide resolved
python/dlisio/__init__.py Outdated Show resolved Hide resolved
python/tests/test_physical_layout.py Show resolved Hide resolved
lib/extension/dlisio/ext/io.hpp Outdated Show resolved Hide resolved
lib/extension/dlisio/ext/io.hpp Show resolved Hide resolved
python/dlisio/ext/core.cpp Show resolved Hide resolved
Copy link
Contributor

@achaikou achaikou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am running out of things to complain about 😄

README.md Show resolved Hide resolved
lib/extension/dlisio/ext/io.hpp Show resolved Hide resolved
python/dlisio/__init__.py Outdated Show resolved Hide resolved
lib/src/dlisio.cpp Outdated Show resolved Hide resolved
lib/src/dlisio.cpp Show resolved Hide resolved
lib/src/dlisio.cpp Outdated Show resolved Hide resolved
lib/src/io.cpp Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
lib/extension/dlisio/ext/io.hpp Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
lib/src/io.cpp Outdated Show resolved Hide resolved
@ErlendHaa ErlendHaa force-pushed the use-lfp-as-io-device branch from fe3a0d4 to bdf9b82 Compare May 15, 2020 09:45
ErlendHaa added 6 commits May 27, 2020 01:14
The Format Version (FV) is changed from 0x02 to 0x01 which is the
correct value for a DLISv1 file. Currently dlisio does not care about
the value of FV, but in the foreseeable future that might change. As the
purpose of this testfile is unrelated to the FV, we correct it in order
to avoid failure on FV:

  printf '\x01' | dd of=3lr-in-vr-one-encrypted.dlis bs=1 seek=83 count=1
  conv=notrunc
The Visible Record Length in this file was longer than the actuall VR.
I.e. the VR was truncated. It's an intresting find, as dlisio apparently
does _not_ notice the truncation when it aligns perfectly with the end
of Logical Record. This is subject to change when layered file protocols
is introduced - so for now just update the VRL in the testfile:

printf '\x16' | dd of=multidimensions-multifdata.dlis bs=1 seek=81
count=1 conv=notrunc
Before the knowledge of Tape Image Format, we considered tapemarks as
random trash bytes. Although not supporting TIF directly, dlisio is able
to deal with such unknown bytes at the start of the file. The two
test files for this behaviour have trash bytes that are valid tapemarks.
This is not a problem now, but when TIF support is added we would still
like to be able to test support for actual trash bytes. Hence the
tif-like trash in these test files is changed to actual trash:

data/chap2/pre-sul-garbage.dlis

  for i in {0..11}; do printf '\xaa'; done | dd of=pre-sul-garbage.dlis
  bs=12 count=1 conv=notrunc

data/chap2/pre-sul-pre-vrl-garbage.dlis

  for in {0..11}; do printf '\xaa'; done |
  dd of=pre-sul-pre-vrl-garbage.dlis bs=12 count=1 conv=notrunc
  &&
  for i in {0..23}; do printf '\xaa'; done |
  dd of=pre-sul-pre-vrl-garbage.dlis bs=1 count=24 seek=92 conv=notrunc

Additionally, the VRL of the last Visible Record was to long. Currently
dlisio does care as long as it where able to read what seems to be all
the data. However, this is considered a bug and will change in the
future. The VRL is updated to match the actual size:

  printf '\x0a\x84' | dd of=pre-sul-garbage.dlis bs=1
  seek=4592 count=2 conv=notrunc

  printf '\x0a\x84' | dd of=pre-sul-pre-vrl-garbage.dlis bs=1
  seek=4616 count=2 conv=notrunc
There is a single tapemark at the start of this testfile, which violated
the Tape Image Format spec. As we are not testing any tapemark related
behavior with this file, the tapemark is removed:

dd if=7K-file.dlis ibs=12 skip=1 | dd of=7K-file.dlis bs=1
This testfile is designed to test the failiure of files with LRS-lenght
< 16 bytes. Add otherwise valid content to the too-short-lrs in case the
lenght restriction is lifted in the future.
The expectation that this file should not load due to the SUL might
change in the future. Hence the rest of the data should be valid.
@ErlendHaa ErlendHaa force-pushed the use-lfp-as-io-device branch 10 times, most recently from 69dba12 to cf7e623 Compare May 28, 2020 10:23
This patch does not add lfp to dlisio in any way, it only installs lfp
in the test containers to prime them for when lfp is included into
dlisio.

lfp has no pre-built binaries yeti, so build from source. lfp assumes
fmtlib is installed on the system as well, so install that too [1].
Building from source (especially fmtlib) will increase the total runtime
of each node a bit. But that will improve in the future - when pre-built
binaries for lfp is made available.

[1] the debian package libfmt-dev does not export the target
fmt::fmt-header-only which is the one we want, so install from source.
@ErlendHaa ErlendHaa force-pushed the use-lfp-as-io-device branch 2 times, most recently from 43895d1 to f60dfac Compare May 29, 2020 10:42
Copy link
Contributor

@achaikou achaikou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Took a look at the changes you've mentioned, do not see any new important issues 👍 👍 👍
From my local tests files everything works ok, speed is comparable to previous results and the only file that doesn't work fails on FF01 (this is additional case of this failure to the ones we already discussed, but only almost near 4GB files are affected).
I would also like to add a couple of truncated TIF files in tests, but it's not important for this PR to wait for it.

python/dlisio/__init__.py Outdated Show resolved Hide resolved
.circleci/config.yml Show resolved Hide resolved
.travis.yml Show resolved Hide resolved
python/docs/tif.rst Outdated Show resolved Hide resolved
python/docs/tif.rst Outdated Show resolved Hide resolved
python/docs/tif.rst Outdated Show resolved Hide resolved
appveyor.yml Show resolved Hide resolved
lib/src/dlisio.cpp Show resolved Hide resolved
README.md Show resolved Hide resolved
This patch does not add lfp to dlisio in any way, it only installs lfp
in the test containers to prime them for when lfp is included into
dlisio.

lfp has no pre-built binaries yet, so build from source. lfp assumes
fmtlib is installed on the system as well, so install that too [1].
Building from source (especially fmtlib) will increase the total runtime
of each node a bit. But that will improve in the future - when pre-built
binaries for lfp is made available.

[1] the debian package libfmt-dev does not export the target
fmt::fmt-header-only which is the one we want, so install from source.
@ErlendHaa ErlendHaa force-pushed the use-lfp-as-io-device branch from f60dfac to 5be98bb Compare June 3, 2020 06:29
@ErlendHaa ErlendHaa changed the title [WIP] Add support for Tape Image Format (TIF) files Add support for Tape Image Format (TIF) files Jun 3, 2020
appveyor.yml Show resolved Hide resolved
ErlendHaa added 10 commits June 3, 2020 23:07
Install layered-file-protocols on the images on Appveyor. This patch
does not add lfp to dlisio in any way, that comes later.
In an effort to support Tape Image Format (TIF) files, the realisation
that the amount of bookkeeping needed to keep track of _two_ access
mechanic format, Visible Envelope (VE) and TIF, is just to large for
dlisio to tackle on its own. Luckily the library Layered File Protocols
[1] is designed for this kind of bookkeeping, and more precisely;
present the illusion that these access mechanic formats are not there.
From lfp's documentation:

 > Layered file protocols is a high-level interface for byte-oriented
   file handles with layering. File descriptors are composed and stacked
   at run-time, adding features such as pre-fetching, indices, cloud
   access, and file format support at each level

lfp have out of the box support for both Visible Envelope and Tape
Image Format. These protocols give the user of the io-device the
illusion that the file is without Visible Envelope and tapemarks, respectably.
For dlisio, that means that our core library can act as if the only
thing present in a dlis-file is the actual dlis-data.

This commit only adapts dlisio's main io-device, dl::stream, to use
lfp's cfile is favor of std::fstream. The real golden nugget comes later
when lfp's rp66 and tapeimage protocols are introduced.

[1] https://github.com/equinor/layered-file-protocols

fixup! Use lfp as the underlying io-device for dl:stream
Although dl::stream is already powered by lfp, the full potential of lfp comes
later when we add support for the rp66 and tapeimage protocols. But before
that can happen, all io must be powered by lfp.
When stream take full advantage of lfp and use it's rp66 protocol to deal
with Visible Envelopes, the sul will be invisible for the stream. As the
sul is only 80 bytes we read it at load and pass the raw bytes to the
dlis constructor.
As lfp and its protocol is gradually making its way into dlisio, it's
time to say goodbye to the Visible Envelopes. Traditionally, dlisio's
own io have been tasked with bookkeeping of VE in order to correctly
skip them.  lfp's rp66 protocol is designed for exactly that purpose. In
short, the protocol hides all the bytes related to VE's by doing the
same kind of bookkeeping that dlisio have done itself:

What the files look like on disk:

                 -----------------------------------
                | SUL | VRL | LRS | LRS | VRL | LRS |
                 -----------------------------------
byteoffset      0    80    84   184    284   288   388

What dlisio sees when using the rp66 protocol:

                 -----------------
                | LRS | LRS | LRS |
                 -----------------
byteoffset      0    100   200   300

This means dlisio can focus on what its here for, reading the dlis-data.
All functions using io now safely assumes that the file is a series of
LRS.
The introduction of lfp as io device means the dependency on the
external library mio has come to an end.
With lfp as the only io-device, adding support for TapeImageFormat-files
is trivial. In the same way as the rp66-protocol hides all bytes that
are part of the Visible Envelope, lfp's tapeimage protocol hides all
bytes that are apart of the Tape Image Format:

What the physical file look like on disk:

               ------------------------------------------
              | TM | SUL | TM | VR | LRS | .. | LRS | TM |
               ------------------------------------------
byteoffsets:  0   12    92   104  108   208  308  408  420

What dlisio see after applying both the tapeimage- and rp66-protocol
from lfp:

               -----------------
              | LRS |  .. | LRS |
               -----------------
byteoffsets:  0   100    200   300

dlisio investigates the first bytes of a file to figure out
whether or not there are tapemarks present. Protocols are added
accordingly. In effect, dlisio does not put any responsibility on the
user to supply information about the presence of tapemarks.

There is currently no support for files with missing storage units or
missing tapemarks at the end. Both are clear standard violations.
However, missing suls seams to be fairly common (~10% of TIF files), and
the logic for such support should be fairly simple, but for now this is
considered a low hanging fruit for the near future.
We are yet to come over regular files with missing suls. However, for
tif-files this seams to happen quite frequent.

This patch lifts that restrictions and allows files with missing Storage
Unit Label, both for regular- and tif-files.
Give a short explanation the relations between RP66 and Tape Image
Format and how dlisio handles that.
@ErlendHaa ErlendHaa force-pushed the use-lfp-as-io-device branch from 5be98bb to 7db44ba Compare June 4, 2020 06:09
@ErlendHaa ErlendHaa merged commit 35fae89 into equinor:master Jun 4, 2020
@ErlendHaa ErlendHaa deleted the use-lfp-as-io-device branch June 4, 2020 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants