-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Tape Image Format (TIF) files #250
Conversation
10c29ec
to
a275c92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! 👍 👍 👍 This whole dlis processing looks so much simpler now :)
If you are asking why I have so few linguistic comments:
If I could comment on commit messages (and only on parts which are supposed to stay) you would have had (7+5+11+5=) around 28 additional comments :D
dbc467d
to
badc3b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like clean stream very much now! 👍 👍
Resolved all the previous comments, if something was not clear to me repeated the comment again.
Now mostly complain about trim method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am running out of things to complain about 😄
fe3a0d4
to
bdf9b82
Compare
The Format Version (FV) is changed from 0x02 to 0x01 which is the correct value for a DLISv1 file. Currently dlisio does not care about the value of FV, but in the foreseeable future that might change. As the purpose of this testfile is unrelated to the FV, we correct it in order to avoid failure on FV: printf '\x01' | dd of=3lr-in-vr-one-encrypted.dlis bs=1 seek=83 count=1 conv=notrunc
The Visible Record Length in this file was longer than the actuall VR. I.e. the VR was truncated. It's an intresting find, as dlisio apparently does _not_ notice the truncation when it aligns perfectly with the end of Logical Record. This is subject to change when layered file protocols is introduced - so for now just update the VRL in the testfile: printf '\x16' | dd of=multidimensions-multifdata.dlis bs=1 seek=81 count=1 conv=notrunc
Before the knowledge of Tape Image Format, we considered tapemarks as random trash bytes. Although not supporting TIF directly, dlisio is able to deal with such unknown bytes at the start of the file. The two test files for this behaviour have trash bytes that are valid tapemarks. This is not a problem now, but when TIF support is added we would still like to be able to test support for actual trash bytes. Hence the tif-like trash in these test files is changed to actual trash: data/chap2/pre-sul-garbage.dlis for i in {0..11}; do printf '\xaa'; done | dd of=pre-sul-garbage.dlis bs=12 count=1 conv=notrunc data/chap2/pre-sul-pre-vrl-garbage.dlis for in {0..11}; do printf '\xaa'; done | dd of=pre-sul-pre-vrl-garbage.dlis bs=12 count=1 conv=notrunc && for i in {0..23}; do printf '\xaa'; done | dd of=pre-sul-pre-vrl-garbage.dlis bs=1 count=24 seek=92 conv=notrunc Additionally, the VRL of the last Visible Record was to long. Currently dlisio does care as long as it where able to read what seems to be all the data. However, this is considered a bug and will change in the future. The VRL is updated to match the actual size: printf '\x0a\x84' | dd of=pre-sul-garbage.dlis bs=1 seek=4592 count=2 conv=notrunc printf '\x0a\x84' | dd of=pre-sul-pre-vrl-garbage.dlis bs=1 seek=4616 count=2 conv=notrunc
There is a single tapemark at the start of this testfile, which violated the Tape Image Format spec. As we are not testing any tapemark related behavior with this file, the tapemark is removed: dd if=7K-file.dlis ibs=12 skip=1 | dd of=7K-file.dlis bs=1
This testfile is designed to test the failiure of files with LRS-lenght < 16 bytes. Add otherwise valid content to the too-short-lrs in case the lenght restriction is lifted in the future.
The expectation that this file should not load due to the SUL might change in the future. Hence the rest of the data should be valid.
69dba12
to
cf7e623
Compare
This patch does not add lfp to dlisio in any way, it only installs lfp in the test containers to prime them for when lfp is included into dlisio. lfp has no pre-built binaries yeti, so build from source. lfp assumes fmtlib is installed on the system as well, so install that too [1]. Building from source (especially fmtlib) will increase the total runtime of each node a bit. But that will improve in the future - when pre-built binaries for lfp is made available. [1] the debian package libfmt-dev does not export the target fmt::fmt-header-only which is the one we want, so install from source.
43895d1
to
f60dfac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Took a look at the changes you've mentioned, do not see any new important issues 👍 👍 👍
From my local tests files everything works ok, speed is comparable to previous results and the only file that doesn't work fails on FF01 (this is additional case of this failure to the ones we already discussed, but only almost near 4GB files are affected).
I would also like to add a couple of truncated TIF files in tests, but it's not important for this PR to wait for it.
This patch does not add lfp to dlisio in any way, it only installs lfp in the test containers to prime them for when lfp is included into dlisio. lfp has no pre-built binaries yet, so build from source. lfp assumes fmtlib is installed on the system as well, so install that too [1]. Building from source (especially fmtlib) will increase the total runtime of each node a bit. But that will improve in the future - when pre-built binaries for lfp is made available. [1] the debian package libfmt-dev does not export the target fmt::fmt-header-only which is the one we want, so install from source.
f60dfac
to
5be98bb
Compare
Install layered-file-protocols on the images on Appveyor. This patch does not add lfp to dlisio in any way, that comes later.
In an effort to support Tape Image Format (TIF) files, the realisation that the amount of bookkeeping needed to keep track of _two_ access mechanic format, Visible Envelope (VE) and TIF, is just to large for dlisio to tackle on its own. Luckily the library Layered File Protocols [1] is designed for this kind of bookkeeping, and more precisely; present the illusion that these access mechanic formats are not there. From lfp's documentation: > Layered file protocols is a high-level interface for byte-oriented file handles with layering. File descriptors are composed and stacked at run-time, adding features such as pre-fetching, indices, cloud access, and file format support at each level lfp have out of the box support for both Visible Envelope and Tape Image Format. These protocols give the user of the io-device the illusion that the file is without Visible Envelope and tapemarks, respectably. For dlisio, that means that our core library can act as if the only thing present in a dlis-file is the actual dlis-data. This commit only adapts dlisio's main io-device, dl::stream, to use lfp's cfile is favor of std::fstream. The real golden nugget comes later when lfp's rp66 and tapeimage protocols are introduced. [1] https://github.com/equinor/layered-file-protocols fixup! Use lfp as the underlying io-device for dl:stream
Although dl::stream is already powered by lfp, the full potential of lfp comes later when we add support for the rp66 and tapeimage protocols. But before that can happen, all io must be powered by lfp.
When stream take full advantage of lfp and use it's rp66 protocol to deal with Visible Envelopes, the sul will be invisible for the stream. As the sul is only 80 bytes we read it at load and pass the raw bytes to the dlis constructor.
As lfp and its protocol is gradually making its way into dlisio, it's time to say goodbye to the Visible Envelopes. Traditionally, dlisio's own io have been tasked with bookkeeping of VE in order to correctly skip them. lfp's rp66 protocol is designed for exactly that purpose. In short, the protocol hides all the bytes related to VE's by doing the same kind of bookkeeping that dlisio have done itself: What the files look like on disk: ----------------------------------- | SUL | VRL | LRS | LRS | VRL | LRS | ----------------------------------- byteoffset 0 80 84 184 284 288 388 What dlisio sees when using the rp66 protocol: ----------------- | LRS | LRS | LRS | ----------------- byteoffset 0 100 200 300 This means dlisio can focus on what its here for, reading the dlis-data. All functions using io now safely assumes that the file is a series of LRS.
The introduction of lfp as io device means the dependency on the external library mio has come to an end.
With lfp as the only io-device, adding support for TapeImageFormat-files is trivial. In the same way as the rp66-protocol hides all bytes that are part of the Visible Envelope, lfp's tapeimage protocol hides all bytes that are apart of the Tape Image Format: What the physical file look like on disk: ------------------------------------------ | TM | SUL | TM | VR | LRS | .. | LRS | TM | ------------------------------------------ byteoffsets: 0 12 92 104 108 208 308 408 420 What dlisio see after applying both the tapeimage- and rp66-protocol from lfp: ----------------- | LRS | .. | LRS | ----------------- byteoffsets: 0 100 200 300 dlisio investigates the first bytes of a file to figure out whether or not there are tapemarks present. Protocols are added accordingly. In effect, dlisio does not put any responsibility on the user to supply information about the presence of tapemarks. There is currently no support for files with missing storage units or missing tapemarks at the end. Both are clear standard violations. However, missing suls seams to be fairly common (~10% of TIF files), and the logic for such support should be fairly simple, but for now this is considered a low hanging fruit for the near future.
We are yet to come over regular files with missing suls. However, for tif-files this seams to happen quite frequent. This patch lifts that restrictions and allows files with missing Storage Unit Label, both for regular- and tif-files.
Give a short explanation the relations between RP66 and Tape Image Format and how dlisio handles that.
5be98bb
to
7db44ba
Compare
resolves #66
resolves #179
fixes #188
fixes #189
resolves #218
resolves #227