
Low CPU usage for nanopore data #771

Closed
@rlorigro

Description

Because of how minimap2 creates batches, each batch takes as long to finish as its longest read takes to map. This is particularly poorly suited to high-molecular-weight (HMW) nanopore sequencing, whose read length distribution has a long right tail. The result is effectively the worst case for CPU efficiency: most reads in each batch are mapped long before the longest read finishes, and the other threads sit idle until it does. My latest (Winnowmap) run took 38 hours to map.
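To make the batch-barrier effect concrete, here is a minimal toy model, assuming mapping time is proportional to read length and that batches run strictly one after another. A batch can finish no sooner than its longest read, and no sooner than its total work spread perfectly over all threads, so the larger of the two gives a best-case wall time and therefore an upper bound on utilization. The function names, the log-normal length distribution, and the fixed 500-read batch size are all hypothetical illustrations, not minimap2's actual batching rule (which is not described here).

```python
import random


def batch_wall_time_lower_bound(read_times, n_threads):
    """A batch cannot finish before its longest read, nor before its total
    work is spread perfectly across all threads."""
    return max(max(read_times), sum(read_times) / n_threads)


def best_case_utilization(batches, n_threads):
    """Upper bound on CPU utilization when consecutive batches are separated
    by a barrier (the next batch starts only after the previous one ends)."""
    busy = sum(sum(b) for b in batches)
    wall = sum(batch_wall_time_lower_bound(b, n_threads) for b in batches)
    return busy / (n_threads * wall)


if __name__ == "__main__":
    random.seed(0)
    n_threads = 96
    # Hypothetical workload: mapping time proportional to read length, with
    # lengths drawn from a log-normal distribution to mimic a long right tail.
    reads = [random.lognormvariate(10, 1.2) for _ in range(50_000)]
    batch_size = 500  # hypothetical batch size, chosen only for illustration
    batches = [reads[i:i + batch_size] for i in range(0, len(reads), batch_size)]
    print(f"best-case utilization: {best_case_utilization(batches, n_threads):.0%}")
```

With a uniform workload the bound is 100%, but as soon as one read in a batch takes longer than the batch's total work divided by the thread count, the longest read sets the wall time and the bound drops below 100%.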

Average CPU usage was 25% (up to the start of sorting), and the run cost $138 on the lowest-memory AWS instance that supports 96 threads.

Can we have an option to sort batches by length, or use dynamic load balancing?
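The two options requested above can be compared in a small simulation. This is a sketch under stated assumptions: each read's mapping time is known in advance and proportional to its length, and each read is handed to whichever thread becomes free first (greedy list scheduling). `list_schedule`, `barrier_makespan` and `pooled_makespan` are hypothetical names for this illustration, not functions in minimap2 or Winnowmap.

```python
import heapq


def list_schedule(read_times, n_threads):
    """Makespan when each read goes to the first thread that becomes free."""
    finish = [0.0] * n_threads  # a list of equal values is already a valid heap
    for t in read_times:
        # The least-loaded thread takes the next read.
        heapq.heappush(finish, heapq.heappop(finish) + t)
    return max(finish)


def barrier_makespan(batches, n_threads, longest_first=False):
    """Batches run one after another; each batch ends when its slowest thread
    ends. Optionally sort reads longest-first within each batch."""
    total = 0.0
    for batch in batches:
        order = sorted(batch, reverse=True) if longest_first else batch
        total += list_schedule(order, n_threads)
    return total


def pooled_makespan(batches, n_threads, longest_first=False):
    """No barrier: all reads share one queue, so idle threads keep picking up
    new reads while a long read is still being mapped."""
    reads = [t for batch in batches for t in batch]
    if longest_first:
        reads.sort(reverse=True)
    return list_schedule(reads, n_threads)


if __name__ == "__main__":
    batches = [[10, 1], [10, 1]]  # two toy batches, each with one long read
    print("barrier:", barrier_makespan(batches, 2))
    print("pooled: ", pooled_makespan(batches, 2))
```

Sorting longest-first within a batch improves how the short reads are packed around the long one, but a batch still cannot finish sooner than its longest read. Removing the barrier lets other threads keep working while a tail read finishes. Note that a global longest-first sort needs every read length up front, which a streaming mapper does not have, so in practice it would likely be limited to sorting within a batch or a buffered window.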
