Closed
Description
Because of how minimap2 creates batches, each batch takes as long as its longest read takes to map. This is particularly ill-suited to HMW nanopore sequencing, whose read length distribution has a long right tail. It effectively creates the worst-case scenario for CPU efficiency: the vast majority of each batch is mapped long before the longest read finishes, and the remaining threads sit idle until it does. My latest (winnowmap) run took 38 hours to map.
The average CPU usage was 25% (up to the beginning of sorting), and the cost was $138 on the lowest-memory AWS instance that supports 96 threads.
Can we have an option to sort batches by length, or use dynamic load balancing?
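To make the request concrete, here is a toy simulation in Python. It is only a sketch and not minimap2's actual scheduler: it assumes mapping time is proportional to read length, that idle threads pick up the next read within a batch, and that no thread starts the next batch until the current one is finished. The function names and the `longest_first` parameter are made up for illustration.

```python
import heapq


def greedy_makespan(costs, n_threads):
    """Time to finish all jobs when each job, in order, goes to the
    thread that becomes free first (a simple dynamic queue)."""
    finish = [0.0] * n_threads  # a list of equal values is already a valid heap
    for c in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + c)
    return max(finish)


def batch_barrier_makespan(costs, n_threads, batch_size, longest_first=False):
    """Same queue, but with a barrier after every batch: the next batch
    starts only when the slowest read of the current batch is done.
    With longest_first=True each batch is sorted by descending cost."""
    total = 0.0
    for i in range(0, len(costs), batch_size):
        batch = costs[i:i + batch_size]
        if longest_first:
            batch = sorted(batch, reverse=True)
        total += greedy_makespan(batch, n_threads)
    return total


def efficiency(costs, n_threads, makespan):
    """Fraction of available thread time spent doing useful work."""
    return sum(costs) / (n_threads * makespan)


if __name__ == "__main__":
    # Hypothetical read costs: mostly short reads plus two very long ones.
    costs = [10, 1, 1, 1, 1, 1, 1, 10]
    n_threads, batch_size = 2, 4

    as_is = batch_barrier_makespan(costs, n_threads, batch_size)
    by_length = batch_barrier_makespan(costs, n_threads, batch_size,
                                       longest_first=True)
    no_barrier = greedy_makespan(costs, n_threads)

    for name, span in [("batches in input order", as_is),
                       ("batches sorted by length", by_length),
                       ("no batch barrier", no_barrier)]:
        print(f"{name}: makespan={span}, "
              f"efficiency={efficiency(costs, n_threads, span):.2f}")
```

In this model, sorting within a batch only helps when a long read would otherwise start late in the batch, while removing the barrier lets short reads from later batches fill the time that is otherwise spent waiting on the longest read.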