We use a segmentation model to extract traits from minnows (Family: Cyprinidae).
This repository serves as a case study of an automated workflow for extracting morphological traits from image data using machine learning.
We expand upon work already done by BGNN, including metadata collection by the Tulane Team and the Drexel Team (see Leipzig et al. 2021, Pepper et al. 2021, and Narnani et al. 2022), and a segmentation model developed by the Virginia Tech Team. We developed morphology extraction tools (Morphology-analysis) with the help of the Tulane Team. We incorporate these tools into BGNN_Core_Workflow.
Finally, with the help of the Duke Team, we create an automated workflow.
This repository aims to:
- Create a use case for an automated workflow
- Show best practices for interacting with other repositories
- Show the utility of a machine learning segmentation model in accelerating trait extraction from images of specimens
Scripts
- Data_Manipulation.R: code for manipulating and merging data files
- Minnow_Selection_Image_Quality_Metadata.R: code for image selection
- Presence_Absence_Analysis.R: code for analyzing machine learning outputs
- init.R: code to load functions in Functions
Files
- Previous_Measurements: a file of previously published measurements of minnow traits, taken from supplemental information. See Burress.md for more details.
Results
- a folder for the outputs from the workflow
- tables of results from analyses
- /Figures contains all figures created from analyses
Config
- contains the config.yaml file
- the user can change the file inputs or the number of images under limit_images
The Previous_Measurements file is included in this repository.
The Fish-AIR input files will be downloaded from the Fish-AIR API.
This requires a Fish-AIR API key to be added to Fish_AIR_API_Key in config/config.yaml.
Alternatively, you can download the Fish-AIR input files from Zenodo and place them in the Files/Fish-AIR/Tulane directory.
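For illustration, the API key entry in config/config.yaml might look like the excerpt below (the exact layout of the file may differ; only the Fish_AIR_API_Key name comes from this repository):

```yaml
# Illustrative excerpt only; the actual config/config.yaml contains additional
# keys (e.g. limit_images).
Fish_AIR_API_Key: "<your Fish-AIR API key>"
```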
The total size of the components is 5.6 GB (as of 5 May 2023).
All weights and dependencies for all components of the workflow are uploaded to Hugging Face or Zenodo.
- Metadata by Drexel Team
  - Object detection of fish and ruler from fish images
  - Repository
  - Model Archive
- Reformatting of metadata
  - Trims the metadata output from the Metadata step to only the values necessary for this project
  - Repository
  - Code Archive
- Crop Image
  - Extracts bounding box information from the metadata file
  - Resizes and crops the fish from the image
  - Repository
  - Code Archive
- Segmentation Model by Virginia Tech Team
  - Segments fish traits from fish images
  - Repository
  - Model Archive
- Morphology analysis by Tulane Team and Battelle Team
  - Tool to calculate the presence of traits
  - Repository
  - Code Archive
- Machine Learning Workflow by Battelle Team and Duke Team
  - Calls all of the above containers
  - Repository
  - Code Archive
The fish images are from the Great Lakes Invasives Network (GLIN) and stored on Fish-AIR. We are using images specifically from the Illinois Natural History Survey (INHS images).
R code (Minnow_Selection_Image_Quality_Metadata.R) was used to select high-quality minnow images using the IQM and IM metadata files.
All image metadata files are downloaded from Fish-AIR, and the version used is stored on the OSC data commons under the Fish Traits dataverse. The metadata files were generated using the Tulane workflow.
Criteria for the selection of an image were based on findings from Pepper et al. 2021; a sketch of how these criteria might be applied in R follows the list below.
Criteria chosen:
- family == "Cyprinidae"
- specimenView == "left"
- specimenCurved == "straight"
- allPartsVisible == "True"
- partsOverlapping == "True"
- partsFolded == "False"
- uniformBackground == "True"
- partsMissing == "False"
- brightness == "normal"
- onFocus == "True"
- colorIssues == "none"
- containsScaleBar == "True"
- from either INHS or UWZM institutions
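The actual selection code lives in Minnow_Selection_Image_Quality_Metadata.R; the sketch below only illustrates how criteria like these might be applied with dplyr. The file path and the institution column name are assumptions, not part of the repository.

```r
# Illustrative sketch only; see Minnow_Selection_Image_Quality_Metadata.R for
# the real selection code. The file path and the "institution" column are assumptions.
library(dplyr)

metadata <- read.csv("Files/Fish-AIR/Tulane/image_quality_metadata.csv")  # hypothetical path

selected <- metadata %>%
  filter(family == "Cyprinidae",
         specimenView == "left",
         specimenCurved == "straight",
         allPartsVisible == "True",
         partsOverlapping == "True",
         partsFolded == "False",
         uniformBackground == "True",
         partsMissing == "False",
         brightness == "normal",
         onFocus == "True",
         colorIssues == "none",
         containsScaleBar == "True",
         institution %in% c("INHS", "UWZM"))
```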
See more details in Morphology-analysis.
Each segmented image has the following traits: trunk, head, eye, dorsal fin, caudal fin, anal fin, pelvic fin, and pectoral fin. For each segmented trait, there may be more than one "blob", or group of pixels identified as that trait. We recorded these results in a presence/absence matrix (presence.absence.matrix.csv).
For each trait, we counted the number of "blobs" and calculated the size of the largest blob as a percentage of all blobs for that trait.
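As a minimal sketch of this calculation (the real code is in Presence_Absence_Analysis.R; the table layout and column names here are assumptions):

```r
# Sketch only; assumes one row per detected blob with illustrative column names.
library(dplyr)

# Example input: one row per blob found for a trait in an image.
blobs <- data.frame(
  file_name = c("example_image_1.jpg", "example_image_1.jpg", "example_image_1.jpg"),
  trait     = c("dorsal_fin", "dorsal_fin", "eye"),
  area      = c(1200, 300, 450)
)

blob_summary <- blobs %>%
  group_by(file_name, trait) %>%
  summarise(blob_count = n(),
            percent_largest = 100 * max(area) / sum(area),
            .groups = "drop")
```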
All intermediate tables will be saved in the folder "Results".
We created a heat map to show the success of the segmentation model in detecting traits from the images.
Figures are in the folder "Results".
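A minimal ggplot2 sketch of such a heat map, reusing the hypothetical blob_summary table from the sketch above (the actual figure code is in Presence_Absence_Analysis.R):

```r
# Sketch only; plots whether each trait was detected (at least one blob) per image.
library(ggplot2)

ggplot(blob_summary, aes(x = trait, y = file_name, fill = blob_count > 0)) +
  geom_tile() +
  labs(x = "Trait", y = "Image", fill = "Trait detected")
```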
Instructions are provided for running the workflow on a single computer or a SLURM cluster.
The run time for 20 images is about 45 minutes and the run time for all the images is about 2 hours.
To run the workflow, conda and Singularity (now Apptainer) must be installed.
This workflow automatically downloads and sets up the software dependencies required by the workflow components. These dependencies are provided using either Singularity containers or Conda environments. Singularity containers are used to provide the machine learning components essential to this workflow because they are highly reproducible and portable. However, Singularity containers can pose challenges for script development by domain scientists, so we use Conda environments for the domain scientist scripts included in this workflow.
At minimum, the workflow requires 1 CPU, 5 GB of memory, and 30 GB of disk space. A Linux machine is required to provide Singularity containerization.
To run the workflow, Snakemake v7 with mamba must be installed. (The workflow definition is not compatible with Snakemake v8+.) To handle this, we create a new conda environment named "snakemake".
If you are running the workflow on a cluster that provides a conda environment module, you should load that module (e.g. module load miniconda3).
Run the following command to create a conda environment named "snakemake" with the required workflow dependencies.
conda create -c conda-forge -c bioconda -n snakemake snakemake=7 mamba
Enter "Y" when prompted to install snakemake and mamba.
If you loaded an environment module, you should unload it (e.g. module purge).
See the official instructions for installing snakemake for more options.
In the config/config.yaml file, the user can limit the number of images for a test run by changing the integer under limit_images, or process them all by entering "". Be aware that downloading all the images and running the workflow takes a couple of hours.
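For example, a test run on 20 images versus a full run could be configured as follows (illustrative excerpt; the rest of the file is omitted):

```yaml
# Test run on 20 images:
limit_images: 20
# Full run over all images:
# limit_images: ""
```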
Run the following commands to activate the conda environment and run the workflow:
source activate snakemake
snakemake --jobs 1 --use-singularity --use-conda
The --jobs argument specifies how many processes snakemake can run at a time.
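For example, to allow snakemake to run up to four processes at a time:
snakemake --jobs 4 --use-singularity --use-conda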
Running the workflow on a SLURM cluster enables scaling beyond a single machine. The run-workflow.sh sbatch script is provided to run the workflow using sbatch and will process up to 20 jobs simultaneously.
If your SLURM cluster provides a conda environment module, you should load that module before running the next step (e.g. module load miniconda3).
Run the following command to activate the snakemake conda environment:
source activate snakemake
Run the workflow in the background:
sbatch run-workflow.sh
Then you can monitor the job progress as you would with any SLURM background job.
Some SLURM clusters require providing sbatch a SLURM account name via the --account command line argument.
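In that case, submit the workflow with, for example:
sbatch --account=<account_name> run-workflow.sh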
See the Run-on-OSC wiki article for the commands used to run the workflow on OSC.
In some cases it is possible to run the workflow using Docker. See the experimental Docker Instructions for more details.