Reading shimadzu .lcd files #29
Comments
Hi Rüdiger, Thanks for your feedback. I would definitely be interested in hearing more about your innovations and potentially incorporating them into the package. It would be nice to add support for the fluorescence data! Re: Issue 1, This was just a workaround because I couldn't figure out where the length of the stream was encoded. It worked with all the data files I had access to, but I'm not surprised that it didn't generalize well to every file. I would definitely be interested in fixing this if you can tell me how the length of the stream is encoded. It sounds like maybe this discrepancy is due to a difference in the way the fluorometer stream is encoded compared to the PDA stream. Re: Issue 2. Regarding your comment about the M1 processor, I'm not sure what issue you're running into, but I can tell you that the package is definitely functional on M1 macs, because I am actually doing most of the development of the package on an M1 mac. I'm guessing there is some other issue with your miniconda installation that is causing the installation to fail. To be honest, the python dependencies have been quite a headache and it really makes me wish that reticulate worked more smoothly in the context of a package. Unfortunately, for the Shimadzu LCD parser, the python bindings are pretty necessary since as far as I know there is no equivalent to Best, |
Hi Ethan, the file is read in with olefile.OleFileIO !
In the first case, I just take the first 4 bytes of the PDA Raw Data stream, which are repeated before each data block, and simply count how often they appear. In my case it was 3564 times. Then I screened my data file (mainly by eye) and found the number '3564' in the stream "PDA 3D Raw Data/3D Data Item", which makes perfect sense. This is an XML-type stream similar to the one from which your code extracts the start and end times. Unfortunately, my XML parser cannot read it without an error, although it easily reads many other XML streams in the same data file. I simply made a workaround and use a string operation to get the number, but this will fail if the number has a different length. But from this you know where to find it.
For the fluorometer data: these instruments are connected in an analog way to the main Shimadzu instrument. The first thing to know is which channel it is connected to. Most likely it's an early one; in my case it's Channel 1. When screening the streams of the data file, there are streams for a large number of channels, but by looking at the size of each stream (many are empty) I found the data in "LSS Raw Data/Chromatogram Ch1"! I could not find any additional information in other Channel 1 streams, e.g. the length of the data set, which differs from the PDA data set because the instrument works at a different frequency. Luckily, the data format of the stream and its decoding are the same as for the PDA data, so I can use your block decoding scheme. The differences from the PDA are logical: it's only a single data set (the time series of the fluorescence of one excitation/emission channel). While each data set of the PDA data (one per PDA spectrum) consists of two data blocks, the fluorescence data have many more data blocks (in my case 18). But here we do not need a fixed order when reading the data in. I simply loop over all the blocks until the end of the data stream.
For the time axis I am assuming that the start and end times are the same as for the PDA. I have not found out how the scale of the values needs to be adjusted. I am getting very large values of up to 10^6 in the peaks, so I divided by 10^6. I can directly compare the results with the data in the Shimadzu software: I am in the same range but about a factor of 4 too low, while the instrument is set to Gain 4, but it's not exactly a factor of 4. However here are my python functions for this:
For the cutting of the binary string at position 5 when reading the data: Hope this is helpful. Rüdiger |
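The functions originally posted with this comment did not survive in this copy of the thread. As a rough illustration of the steps described above (and not a reconstruction of Rüdiger's code), the sketch below counts how often the leading 4 bytes of a stream recur, pulls an integer out of XML-like text with a regular expression so that the number's length does not matter, and builds an evenly spaced time axis. The function names and the `Count` key in the test are hypothetical; the real tag name in "PDA 3D Raw Data/3D Data Item" is not given in the thread.

```python
import re


def count_block_headers(stream: bytes, marker_len: int = 4) -> int:
    """Count how often the first `marker_len` bytes recur in `stream`."""
    if len(stream) < marker_len:
        return 0
    marker = stream[:marker_len]
    # bytes.count() counts non-overlapping matches. If the marker also occurs
    # by chance inside a data block, the count will be too high.
    return stream.count(marker)


def extract_first_integer_after(text: str, key: str):
    """Return the first run of digits that follows `key`, or None.

    `key` is whatever label precedes the number in the XML-like stream;
    the caller has to supply it (it is an assumption, not a known tag).
    """
    match = re.search(re.escape(key) + r"\D*?(\d+)", text)
    return int(match.group(1)) if match else None


def time_axis(start: float, end: float, n_points: int) -> list:
    """Evenly spaced times from `start` to `end`, both included."""
    if n_points < 1:
        return []
    if n_points == 1:
        return [start]
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]
```

The header count is only a cross-check. If the number of data sets can be read from the XML-like stream, that value is probably the more reliable of the two, because the 4-byte pattern could also appear by coincidence inside the data.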
Thanks Rüdiger -- this looks great! I wonder if you'd be willing to share one or two test files from your instrument? I'm not sure I have an analog stream in any of the files I currently have access to. I'd definitely be interested to hear about what you find if you make any headway with the spectral libraries. |
Hi Ethan, Rüdiger |
Thanks! |
We are also working on parsing the .lcd file from Shimadzu LC-40. Using the above python code, we have extracted LSS Raw Data - Chromatogram ch1 data. Thanks for it! -Charu |
Personally I haven't really looked into these streams too much -- I was mostly interested in being able to extract the data from the DAD detector -- but I would be curious to hear what you figure out. |
Here’s the file with all streams that I got from the OleFile module in Python. In case anyone has any idea how to decode it (the peak table stream - ['LSS Data Processing', 'PT-LC.1.1.AD.2.CH#1']), I’d appreciate the help. |
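For readers who want to produce a similar stream listing, here is a minimal sketch using the third-party `olefile` package mentioned earlier in the thread. It lists every stream with its size and keeps only the non-empty ones, which is how the fluorescence data were located above. The function names are illustrative, and `olefile` is imported lazily so that the pure filtering helper can be used without it.

```python
def nonempty_streams(entries):
    """Keep (name, size) pairs with size > 0, largest stream first."""
    kept = [(name, size) for name, size in entries if size > 0]
    return sorted(kept, key=lambda entry: entry[1], reverse=True)


def list_stream_sizes(path):
    """Return (stream_name, size_in_bytes) for every stream in an OLE file."""
    import olefile  # third-party: pip install olefile

    ole = olefile.OleFileIO(path)
    try:
        return [
            ("/".join(parts), ole.get_size(parts))
            for parts in ole.listdir(streams=True, storages=False)
        ]
    finally:
        ole.close()
```

A call such as `nonempty_streams(list_stream_sizes("run.lcd"))` (the file name is a placeholder) would then show which channels actually carry data.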
Dear Ethan, data = read_shimadzu_lcd(path, format_out = "data.frame", data_format = "long", read_metadata = TRUE) However, I get this error: Could you please advise? Thanks, Andy |
Hi Andy, |
Dear Ethan,
Thanks so much for your message. I am delighted that you were able to
reproduce the error, and my fingers are crossed that there is a
straightforward solution. Please let me know what I can do to help. I am a
decent R programmer, but understanding the complexities of chromConverter
would be challenging for me. That is quite an R package that you wrote!
Best,
Andy
…On Tue, Jul 30, 2024 at 11:14 AM Ethan Bass ***@***.***> wrote:
Hi Andy,
Thanks for reporting this. I was able to reproduce the error. I should
have time to look into this more later in the week and hopefully track down
where the problem is. Will keep you posted.
Best,
Ethan
|
Hi Andy, |
Hi Ethan, Thanks so much for looking into the .lcd file conversion issue.
We have a Shimadzu HPLC with refractive index (RID-10A) and UV/VIS
(SPD-20A) detectors. Below is an image and description of our machine. Is
it unexpected for the .lcd files to have empty PDA streams?
Our HPLC setup:
https://github.com/actolonen/Analysis_Lab/tree/main/HPLC
thanks! andy
…On Sat, Aug 3, 2024 at 7:04 PM Ethan Bass ***@***.***> wrote:
Hi Andy,
I had a look at your file and the PDA stream seems to be empty? What kind
of detector does your instrument have?
Ethan
|
Ahh ok. that makes sense. It's not unexpected if you don't have a PDA detector, it's just that the only parser I've written so far is for the PDA stream. Luckily I think those streams use the same encoding. Does the shape of this chromatogram look right to you? I think there is a scaling factor encoded somewhere in the file -- I'm not yet sure where. Do you perhaps have a screenshot of how the two streams (the refractive index and UV) look for the file you shared with me? Or are you expecting two streams? So far I've only been able to find one stream in your file. |
Hi Ethan,
Yes, that chromatogram profile looks exactly like I would expect! I could
get you an .lcd file and the PDF showing the chromatogram and peaks as
calculated by LabSolutions. Would that help?
best,
andy
|
Yes, that would be great if it's not too much trouble. Also are you expecting there to be more than one data stream in this file? |
Hi Andy , |
Hi Ethan,
I grabbed the new version of chromConverter from github and ran it on a
batch of .lcd files from our HPLC. Amazing! The chromatograms produced by
read_shimadzu_lcd() now match those from the Shimadzu software:
https://github.com/actolonen/Analysis_Lab/tree/main/HPLC/ChromConverter
Two things:
1. As you noted, the peak heights of the chromConverter chromatograms had
to be scaled to match those from Lab Solutions. I multiplied all the values
in the chromConverter chromatograms by the max peak ratio (max
peak in Lab Solutions chromatogram / max peak in chromConverter chromatogram)
peak.ratio = max(data.ls$Intensity) / max(data.cc$Intensity)
In my samples, the max peak height in the Lab Solutions chromatograms
was always 0.3% of that of the chromConverter peak.
2. Our HPLC has two detectors: refractive index RID-10A (Detector B)
and UV/VIS SPD-20A (Detector A). I just learned that the .lcd files I
sent you only included data from detector B. I am generating .lcd
files that contain data from both detectors and am eager to see if
read_shimadzu_lcd() can also parse these data correctly.
best,
andy
…On Tue, Aug 6, 2024 at 5:59 PM Ethan Bass ***@***.***> wrote:
Hi Andy ,
I pushed an update to the main branch that should be able to read the 2D
chromatograms from your files. Please let me know if you find any issues. I
believe there is a scaling factor which I have not yet been able to locate
in the files, so the scale of the chromatograms may not be correct.
Ethan
|
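The empirical rescaling Andy describes can be sketched in Python as follows. This is only an illustration of the max-peak ratio approach from his message, not the scaling that chromConverter applies internally; the function names are made up for the example.

```python
def peak_ratio(reference, raw):
    """Ratio of the tallest reference peak to the tallest raw peak."""
    max_raw = max(raw)
    if max_raw == 0:
        raise ValueError("raw chromatogram has no signal to scale")
    return max(reference) / max_raw


def rescale(raw, factor):
    """Multiply every intensity by the given factor."""
    return [value * factor for value in raw]
```

Because the ratio is derived from a single peak, it is sensitive to noise or clipping at that peak. A factor read from the file itself, once its location is known, would be preferable.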
Wonderful! The scaling factor is encoded somewhere in the file, but I haven't yet been able to figure out where it is. I hope with some more digging I can find where this value is encoded and scale the chromatograms accordingly. In another file I have from another instrument it is 0.1%, so the 0.3% scaling factor is not consistent between instruments. Regarding the two detectors, I suspect that the function should be able to provide the data from both streams, but it would be great if you can update me on that. Also, if you could provide me another example file with both data streams that would be great! Ethan |
@actolonen |
Hi Andy, |
Hi Ethan,
Fantastic! I just grabbed chromConverter 0.6.3 from github and can't wait
to try it out. I am eager to test it on .lcd files that include data from
both our UV and RI detectors, but the person running samples is on
vacation. I will let you know how it works. Also, the 0.001 scaling factor
makes sense to me. We normalize all the peak areas using standards, so the
scaling factor shouldn't be critical so long as it is consistent across
samples.
best,
andy
…On Wed, Aug 14, 2024 at 7:20 PM Ethan Bass ***@***.***> wrote:
Hi Andy,
I just pushed a version with support for reading more of the metadata from
LCD files and it also scales chromatograms by what I think is the scaling
factor (.001 in your case). You should be able to toggle the scaling off by
specifying scale = FALSE.
Ethan
|
Hi Ethan, Here are the .lcd files: I ran read_shimadzu_lcd() as follows: data = read_shimadzu_lcd( This gives the following error. Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : This looks like a simple error that nrows doesn't equal ncols, but I don't know how to troubleshoot this with .lcd files. Could you please advise? best, |
Hi Andy, I'm having trouble reproducing this error with the files you provided. Can you double check what version of chromConverter you're currently running? Also maybe you can run Thanks! By the way, the new version I'm working on in the |
Also, would it be alright with you if I include one of your multi-channel Shimadzu files as a test file in my chromConverterExtraTests repository? |
Hi Ethan, I confirm that chromConverter works great on our multi-detector .lcd files: My error was just due to the chromatograms from the different detectors having different numbers of lines. I would be delighted if you include one of our multi-channel .lcd files in your chromConverterExtraTests repo. thanks! |
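When detectors record different numbers of points, their chromatograms cannot be placed side by side as columns of equal length. One general way around this is to stack them in long format, with one row per (detector, time, intensity). The sketch below shows that idea on plain Python data; it is an illustration only and makes no claim about how chromConverter handles the case.

```python
def to_long(chromatograms):
    """Stack per-detector (time, intensity) series into long-format rows.

    `chromatograms` maps a detector name to a list of (time, intensity)
    pairs; the series may have different lengths.
    """
    rows = []
    for detector, points in chromatograms.items():
        for time, intensity in points:
            rows.append((detector, time, intensity))
    return rows
```

Each detector keeps its own time axis in this layout, so no padding or interpolation is needed before plotting or integrating.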
Excellent. Thanks Andy! |
I still don't understand why the intensities are off. I think the values exported in Shimadzu are being rounded or smoothed somehow but I can't figure out how. It's strange, because the other Shimadzu files I have access to are exact. |
Hi Ethan, Just as a quick update: we are routinely using chromConverter to extract chromatograms from .lcd files using our three detectors (RID, UV-210 nm, UV-260 nm). Thanks so much for your great work! The issue of the Lab Solutions peak scaling factor is still obscure. However, we include a set of standard solutions at different concentrations in each plate that we use to quantify compound concentrations. So, my impression is that the scaling factor doesn't matter. Do you agree? best, andy |
Hi Andy,
Thanks for the update. I'm very glad to hear that you're finding the package useful. And yes, I agree that the scaling factor doesn't really matter for practical purposes. It is still nagging at me a little, but I am pretty stumped for the time being.
all best,
Ethan
|
Dear Ethan,
I tried to use your code for reading the raw data of our Shimadzu HPLC, thanks for that code!
I am not a programmer, and I work mainly in Python, not in R. Here are some results from the last few days that my colleague (who uses R) and I spent working on this. I wonder whether you would like to incorporate fixes for the issues we found into your R code.
I needed to change mainly two things:
your line 147 in read_shimadzu_lcd.R, mat <- matrix(NA, nrow = fsize/(n_lambdas*1.5), ncol = n_lambdas)
This is about the size of the data stream, which depends on the number of wavelengths from the PDA and the total time of the HPLC run. A simple factor of 1.5 does not work for my data. Instead, I first scanned the PDA raw data stream for the start bits of each data set header and counted them. Second, I have now found the entry in a stream that contains the number of data sets and can simply be read out.
your line 249 in function decode_shimadzu_block: buffer[[2]] <- twos_complement(substr(bin, 5, nchar(bin))),
This line cuts off the first 4 bits of the bit string that finally contains the difference from the previous value. It worked this way for my PDA data, but it could not reproduce the fluorometer results at some positions and distorted the signal. I needed some time to understand this, but in the end the function simply failed when the difference is a large number and more bytes are needed to decode it. In the end I simply reduced the cut and now use the bits from position 3. This seemed to work!
My question here is: did you find the number '5' simply by trial and error, or was there a reason?
If there is interest from your side, I can spend some time to describe more details, e.g. where to find the fluorescence data and how to read it or the file size in the .lcd file.
Best
Rüdiger
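To make the effect of the cut position concrete, the sketch below decodes a bit string as a signed integer after dropping a configurable number of leading bits. It assumes the standard two's-complement interpretation, which may differ in detail from the package's `twos_complement` helper, and the `start` argument is 1-based to mirror R's `substr(bin, 5, nchar(bin))`. The function `decode_delta` is a hypothetical name used only for this example.

```python
def twos_complement(bits: str) -> int:
    """Interpret a string of '0'/'1' characters as a signed integer."""
    if not bits:
        raise ValueError("empty bit string")
    value = int(bits, 2)
    if bits[0] == "1":
        # A leading 1 marks a negative number: subtract 2**width.
        value -= 1 << len(bits)
    return value


def decode_delta(bits: str, start: int = 5) -> int:
    """Decode the bits from 1-based position `start` to the end."""
    return twos_complement(bits[start - 1:])
```

With the input `"00001111"`, starting at position 5 leaves `"1111"`, which reads as -1, while starting at position 3 leaves `"001111"`, which reads as 15. The same stored bits therefore give very different differences depending on the cut, which is consistent with the distortion described above for large values.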