From af0d00c1c88c616affdd167d1d0c28e224374580 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jochen=20Nie=C3=9Fer?=
Date: Fri, 4 Oct 2024 18:46:58 +0200
Subject: [PATCH 1/4] change landing page

---
 README.md | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/README.md b/README.md
index 109cb47..28468b3 100644
--- a/README.md
+++ b/README.md
@@ -4,33 +4,11 @@
 [![documentation](https://readthedocs.org/projects/peak-performance/badge/?version=latest)](https://peak-performance.readthedocs.io/en/latest)
 [![DOI](https://zenodo.org/badge/713469041.svg)](https://zenodo.org/doi/10.5281/zenodo.10255543)

-# How to use PeakPerformance
-For installation instructions, see `Installation.md`.
-For instructions regarding the use of PeakPerformance, check out the example notebook(s) under `notebooks`, the complementary example data under `example`, and the following introductory explanations.
+# About PeakPerformance

-## Preparing raw data
-This step is crucial when using PeakPerformance. Raw data has to be supplied as time series meaning for each signal you want to analyze, save a NumPy array consisting of time in the first dimension and intensity in the second dimension (compare example data). Both time and intensity should also be NumPy arrays. If you e.g. have time and intensity of a singal as lists, you can use the following code to convert, format, and save them in the correct manner:
-
-```python
-import numpy as np
-from pathlib import Path
-
-time_series = np.array([np.array(time), np.array(intensity)])
-np.save(Path(r"example_path/time_series.npy"), time_series)
-```
-
-The naming convention of raw data files is `___.npy`. There should be no underscores within the named sections such as `acquisition name`. Essentially, the raw data names include the acquisition and mass trace, thus yielding a recognizable and unique name for each isotopomer/fragment/metabolite/sample.
-
-## Model selection
-When it comes to selecting models, PeakPerformance has a function performing an automated selection process by analyzing one acquisiton per mass trace with all implemented models. Subsequently, all models are ranked based on an information criterion (either pareto-smoothed importance sampling leave-one-out cross-validation or widely applicable information criterion). For this process to work as intended, you need to specify acquisitions with representative peaks for each mass trace (see example notebook 1). If e.g. most peaks of an analyte show a skewed shape, then select an acquisition where this is the case. For double peaks, select an acquision where the peaks are as distinct and comparable in height as possible.
-Since model selection is a computationally demanding and time consuming process, it is suggested to state the model type as the user (see example notebook 1) if possible.
-
-## Troubleshooting
-### A batch run broke and I want to restart it.
-If an error occured in the middle of a batch run, then you can use the `pipeline_restart` function in the `pipeline` module to create a new batch which will analyze only those samples, which have not been analyzed previously.
-
-### The model parameters don't converge and/or the fit does not describe the raw data well.
-Check the separate file `How to adapt PeakPerformance to your data`.
+# First steps
+Be sure to check out our thorough [documentation](https://peak-performance.readthedocs.io/en/latest). It contains not only information on how to install PeakPerformance and prepare raw data for its application but also detailed treatises about the implemented model structures, validation with both synthetic and experimental data against a commercially available vendor software, exemplary usage of diagnostic plots and investigation of various effects.
+Furthermore, you will find example notebooks and data sets showcasing different aspects of PeakPerformance.

 # How to contribute
 If you encounter bugs while using PeakPerformance, please bring them to our attention by opening an issue. When doing so, describe the problem in detail and add screenshots/code snippets and whatever other helpful material you can provide.

From 2bdbe5674a1254bb2bc414c1ab8e3f00a816baf2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jochen=20Nie=C3=9Fer?=
Date: Fri, 4 Oct 2024 18:47:10 +0200
Subject: [PATCH 2/4] add raw data preparation to documentation

---
 docs/source/index.rst                      |  1 +
 docs/source/markdown/Preparing_raw_data.md | 13 +++++++++++++
 2 files changed, 14 insertions(+)
 create mode 100644 docs/source/markdown/Preparing_raw_data.md

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 5091b78..c3d98d2 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -35,6 +35,7 @@ The documentation features various notebooks that demonstrate the usage and inve
 :maxdepth: 1

 markdown/Installation
+markdown/Preparing_raw_data
 markdown/Peak_model_composition
 markdown/PeakPerformance_validation
 markdown/PeakPerformance_workflow
diff --git a/docs/source/markdown/Preparing_raw_data.md b/docs/source/markdown/Preparing_raw_data.md
new file mode 100644
index 0000000..6b3cc10
--- /dev/null
+++ b/docs/source/markdown/Preparing_raw_data.md
@@ -0,0 +1,13 @@
+# Preparing raw data
+
+This step is crucial when using PeakPerformance. Raw data has to be supplied as time series meaning for each signal you want to analyze, save a NumPy array consisting of time in the first dimension and intensity in the second dimension (compare example data in the repository). Both time and intensity should also be NumPy arrays. If you e.g. have time and intensity of a signal as lists, you can use the following code to convert, format, and save them in the correct manner:
+
+```python
+import numpy as np
+from pathlib import Path
+
+time_series = np.array([np.array(time), np.array(intensity)])
+np.save(Path(r"example_path/time_series.npy"), time_series)
+```
+
+The naming convention of raw data files is `___.npy`. There should be no underscores within the named sections such as `acquisition name`. Essentially, the raw data names include the acquisition and mass trace, thus yielding a recognizable and unique name for each isotopomer/fragment/metabolite/sample. This is of course only relevant when using the pre-manufactured data pipeline and does not apply to user-generated custom data pipelines.
\ No newline at end of file

From 3ccaf4aadac745794028af62b24d96c921fe7184 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jochen=20Nie=C3=9Fer?=
Date: Fri, 4 Oct 2024 19:02:28 +0200
Subject: [PATCH 3/4] update readme

---
 README.md                                  | 1 +
 docs/source/markdown/Preparing_raw_data.md | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 28468b3..cdd4347 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,7 @@
 [![DOI](https://zenodo.org/badge/713469041.svg)](https://zenodo.org/doi/10.5281/zenodo.10255543)

 # About PeakPerformance
+PeakPerformance employs Bayesian modelling for chromatographic peak data fitting. This has the innate advantage of providing uncertainty quantification while jointly estimating all peak parameters united in a single peak model. As Markoc Chain Monte Carlo (MCMC) methods are utilized to infer the posterior probability distribution, convergence checks and the aformentioned uncertainty quantification are applied as novel quality metrics for a robust peak recognition.

 # First steps
 Be sure to check out our thorough [documentation](https://peak-performance.readthedocs.io/en/latest). It contains not only information on how to install PeakPerformance and prepare raw data for its application but also detailed treatises about the implemented model structures, validation with both synthetic and experimental data against a commercially available vendor software, exemplary usage of diagnostic plots and investigation of various effects.
diff --git a/docs/source/markdown/Preparing_raw_data.md b/docs/source/markdown/Preparing_raw_data.md
index 6b3cc10..fa76d08 100644
--- a/docs/source/markdown/Preparing_raw_data.md
+++ b/docs/source/markdown/Preparing_raw_data.md
@@ -10,4 +10,4 @@ time_series = np.array([np.array(time), np.array(intensity)])
 np.save(Path(r"example_path/time_series.npy"), time_series)
 ```

-The naming convention of raw data files is `___.npy`. There should be no underscores within the named sections such as `acquisition name`. Essentially, the raw data names include the acquisition and mass trace, thus yielding a recognizable and unique name for each isotopomer/fragment/metabolite/sample. This is of course only relevant when using the pre-manufactured data pipeline and does not apply to user-generated custom data pipelines.
\ No newline at end of file
+The naming convention of raw data files is `___.npy`. There should be no underscores within the named sections such as `acquisition name`. Essentially, the raw data names include the acquisition and mass trace, thus yielding a recognizable and unique name for each isotopomer/fragment/metabolite/sample. This is of course only relevant when using the pre-manufactured data pipeline and does not apply to user-generated custom data pipelines.
From 15e2db23f3b1eebac7b1be8cba930583cc071358 Mon Sep 17 00:00:00 2001
From: Michael Osthege
Date: Sat, 5 Oct 2024 00:42:08 +0200
Subject: [PATCH 4/4] Fix typos and simplify code example

---
 README.md                                  | 4 +++-
 docs/source/markdown/Preparing_raw_data.md | 8 +++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index cdd4347..4d8e093 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,9 @@
 [![DOI](https://zenodo.org/badge/713469041.svg)](https://zenodo.org/doi/10.5281/zenodo.10255543)

 # About PeakPerformance
-PeakPerformance employs Bayesian modelling for chromatographic peak data fitting. This has the innate advantage of providing uncertainty quantification while jointly estimating all peak parameters united in a single peak model. As Markoc Chain Monte Carlo (MCMC) methods are utilized to infer the posterior probability distribution, convergence checks and the aformentioned uncertainty quantification are applied as novel quality metrics for a robust peak recognition.
+PeakPerformance employs Bayesian modeling for chromatographic peak data fitting.
+This has the innate advantage of providing uncertainty quantification while jointly estimating all peak parameters united in a single peak model.
+As Markov Chain Monte Carlo (MCMC) methods are utilized to infer the posterior probability distribution, convergence checks and the aformentioned uncertainty quantification are applied as novel quality metrics for a robust peak recognition.

 # First steps
 Be sure to check out our thorough [documentation](https://peak-performance.readthedocs.io/en/latest). It contains not only information on how to install PeakPerformance and prepare raw data for its application but also detailed treatises about the implemented model structures, validation with both synthetic and experimental data against a commercially available vendor software, exemplary usage of diagnostic plots and investigation of various effects.
diff --git a/docs/source/markdown/Preparing_raw_data.md b/docs/source/markdown/Preparing_raw_data.md
index fa76d08..088f109 100644
--- a/docs/source/markdown/Preparing_raw_data.md
+++ b/docs/source/markdown/Preparing_raw_data.md
@@ -1,13 +1,15 @@
 # Preparing raw data

-This step is crucial when using PeakPerformance. Raw data has to be supplied as time series meaning for each signal you want to analyze, save a NumPy array consisting of time in the first dimension and intensity in the second dimension (compare example data in the repository). Both time and intensity should also be NumPy arrays. If you e.g. have time and intensity of a signal as lists, you can use the following code to convert, format, and save them in the correct manner:
+This step is crucial when using PeakPerformance.
+Raw data has to be supplied as time series meaning for each signal you want to analyze, save a shape `(2, ?)` NumPy array consisting of time in the first, and intensity in the second entry in the first dimension (compare example data in the repository).
+Both time and intensity should also be NumPy arrays.
+If you e.g. have time and intensity of a signal as lists, you can use the following code to convert, format, and save them in the correct manner:

 ```python
 import numpy as np
-from pathlib import Path

 time_series = np.array([np.array(time), np.array(intensity)])
-np.save(Path(r"example_path/time_series.npy"), time_series)
+np.save("time_series.npy", time_series)
 ```

 The naming convention of raw data files is `___.npy`. There should be no underscores within the named sections such as `acquisition name`. Essentially, the raw data names include the acquisition and mass trace, thus yielding a recognizable and unique name for each isotopomer/fragment/metabolite/sample. This is of course only relevant when using the pre-manufactured data pipeline and does not apply to user-generated custom data pipelines.
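
Note on the shape `(2, ?)` layout introduced in PATCH 4/4: it can be sanity-checked before a file is handed to the pipeline. The following is a minimal sketch and not part of the patch. It assumes time and intensity start out as plain Python lists; `to_time_series` is a hypothetical helper name that does not exist in PeakPerformance, and the numbers are made up for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def to_time_series(time, intensity):
    """Stack time and intensity into a shape (2, N) float array.

    Hypothetical helper for illustration; not part of PeakPerformance.
    """
    time = np.asarray(time, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    if time.ndim != 1 or time.shape != intensity.shape:
        raise ValueError("time and intensity must be 1-D and of equal length")
    # Row 0 holds time, row 1 holds intensity.
    return np.stack([time, intensity])


# Made-up example values, not real measurements.
time_series = to_time_series([0.0, 0.5, 1.0, 1.5], [10.0, 250.0, 240.0, 12.0])

# Round-trip through a temporary directory to confirm the layout survives saving.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "time_series.npy"
    np.save(path, time_series)
    loaded = np.load(path)

print(loaded.shape)  # (2, 4)
```

Checking the shape up front turns a length mismatch between the two lists into an explicit `ValueError` at conversion time, rather than leaving it to surface when the saved file is read back.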