@ckandoth
Last active November 12, 2024 13:49
Test Illumina Dragen software on an Azure NP10 VM

Purpose

Test an NP-series VM on Azure with Dragen's pay-as-you-go (PAYG) license.

Prerequisites

  1. Sign up for an Azure subscription if you don't already have one.

  2. Visit Quotas in Azure Portal, log in if needed, and increase the NP series quota to 40 vCPUs, so we can operate up to 4 NP10 VMs at a time, or 2 NP20 VMs. Depending on demand for these SKUs in your region, you may also need to submit a service request and justify your use case to a person before that quota gets approved.

  3. Visit this page, log in if needed, and ensure that Status is set to Enable for the Azure subscription you intend to use. This allows programmatic deployment of the PAYG VMs, which we test below.

  4. Install Azure CLI and use the az login command to log in with the same Microsoft credentials you used above.

  5. Make sure you have an RSA or ED25519 private/public key pair in your ~/.ssh folder (ssh-keygen -t ed25519 creates one if you don't).
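
The quota arithmetic in step 2 can be sketched in shell. This is a minimal sketch, assuming the NP-series quota is counted in vCPUs and that the NP10s and NP20s sizes have 10 and 20 vCPUs respectively; max_vms is a hypothetical helper, not part of Azure CLI.

```shell
# Hypothetical helper: how many VMs of a given size fit within a vCPU quota.
# Assumes the quota and the per-VM size are both counted in vCPUs.
max_vms() {
    quota=$1
    vcpus_per_vm=$2
    echo $(( quota / vcpus_per_vm ))
}

max_vms 40 10   # NP10s: prints 4
max_vms 40 20   # NP20s: prints 2
```

Current usage and limits for a region can be listed with az vm list-usage --location eastus --output table.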

Test

Create a resource group with a basic virtual network that allows SSH:

az group create --name dgn-rg --location eastus
az network nsg create --resource-group dgn-rg --name dgn-nsg
az network nsg rule create --resource-group dgn-rg --nsg-name dgn-nsg --name SSH --priority 300 --protocol TCP --access Allow --direction Inbound --source-address-prefixes "*" --source-port-ranges "*" --destination-address-prefixes "*" --destination-port-ranges 22
az network vnet create --resource-group dgn-rg --network-security-group dgn-nsg --name dgn-vnet --address-prefixes "10.0.0.0/16" --subnet-name default --subnet-prefixes "10.0.0.0/24"
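
The SSH rule above accepts connections from any source address. A narrower variant is sketched below, assuming you connect from a single public IPv4 address. restrict_ssh is a hypothetical wrapper, and 203.0.113.7 is a placeholder from the documentation address range, not a real client IP.

```shell
# Hypothetical wrapper: limit the existing SSH rule to one source IPv4 address.
# Assumes the resource group, NSG and rule names used in the commands above.
restrict_ssh() {
    az network nsg rule update --resource-group dgn-rg --nsg-name dgn-nsg \
        --name SSH --source-address-prefixes "$1/32"
}

# Assumed usage (substitute your own public IP for the placeholder):
# restrict_ssh 203.0.113.7
```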

Create a VM (includes creation of NIC and public IP):

az vm create --resource-group dgn-rg --nsg dgn-nsg --vnet-name dgn-vnet --subnet default --name dgn-vm --size Standard_NP10s --ephemeral-os-disk true --ephemeral-placement CacheDisk --security-type Standard --public-ip-sku Standard --image illuminainc1586452220102:dragen-vm-payg:dragen-4-3-6-payg:latest --plan-publisher illuminainc1586452220102 --plan-product dragen-vm-payg --plan-name dragen-4-3-6-payg --accept-term --admin-username ckandoth --ssh-key-values ~/.ssh/id_ed25519.pub

The public IP is printed in the output of the command above, or we can list it as follows:

az network public-ip list
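
The list command prints the full JSON for every public IP in the subscription. To get only the address of one VM, a query along these lines may help. This is a sketch, with vm_public_ip as a hypothetical helper name; it relies on az vm show --show-details exposing a publicIps field.

```shell
# Hypothetical helper: print only the public IP of one VM.
# $1 = resource group, $2 = VM name
vm_public_ip() {
    az vm show --show-details --resource-group "$1" --name "$2" \
        --query publicIps --output tsv
}

# Assumed usage:
# ssh "ckandoth@$(vm_public_ip dgn-rg dgn-vm)"
```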

SSH into the VM (substitute your VM's IP) and take ownership of the large 778GB disk mounted at /mnt:

ssh ckandoth@172.190.49.30
sudo chown "$(id -u):$(id -g)" /mnt

Download a test dataset into the large disk /mnt and build a hash table for chr21:

mkdir /mnt/tmp
wget -P /mnt/tmp https://data.cyri.ac/test_tum_nrm_wgs.tar
tar -xf /mnt/tmp/test_tum_nrm_wgs.tar -C /mnt
dragen --intermediate-results-dir /mnt/tmp --build-hash-table true --ht-reference /mnt/ref/GRCh38_chr21.fa --output-directory /mnt/ref --ht-num-threads=8

Run alignment of the test FASTQs against the chr21 hash table:

mkdir /mnt/out
dragen --intermediate-results-dir /mnt/tmp -r /mnt/ref -1 /mnt/nrm/nrm_C0JD1ACXX_L001_R1_001.fastq.gz -2 /mnt/nrm/nrm_C0JD1ACXX_L001_R2_001.fastq.gz --RGID C0JD1ACXX.1 --RGSM nrm --output-directory /mnt/out --output-file-prefix nrm --enable-duplicate-marking true --enable-map-align-output true --output-format cram
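
The --RGID and --RGSM values above follow from the FASTQ file name. A minimal sketch of deriving them, assuming names of the form <sample>_<flowcell>_L<lane>_R<read>_001.fastq.gz with no underscore in the sample name. Both helpers are hypothetical and not part of Dragen.

```shell
# Hypothetical helper: sample name is everything before the first underscore.
rgsm_from_fastq() {
    base=$(basename "$1")
    printf '%s\n' "${base%%_*}"
}

# Hypothetical helper: build <flowcell>.<lane> from the second and third fields.
rgid_from_fastq() {
    base=$(basename "$1")
    rest=${base#*_}             # drop the sample field
    flowcell=${rest%%_*}        # e.g. C0JD1ACXX
    rest=${rest#*_}             # drop the flowcell field
    lane=${rest%%_*}            # e.g. L001
    lane=${lane#L}              # e.g. 001
    lane=$(expr "$lane" + 0)    # strip leading zeros, e.g. 1
    printf '%s.%s\n' "$flowcell" "$lane"
}

fq=/mnt/nrm/nrm_C0JD1ACXX_L001_R1_001.fastq.gz
rgsm_from_fastq "$fq"   # prints nrm
rgid_from_fastq "$fq"   # prints C0JD1ACXX.1
```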

To avoid the cost of transferring data in and out of Azure, use Networked Streaming with Azure Blob storage.

Notes

Delete the VM to save money because it's the most expensive resource:

az vm delete --yes --resource-group dgn-rg --name dgn-vm

Delete the whole resource group to stop spending money altogether:

az group delete --yes --name dgn-rg
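
Deleting only the VM leaves the network resources created earlier in the group, and possibly the NIC and public IP too, so it can be worth checking what remains. The sketch below uses two hypothetical helpers; it assumes az group exists prints "true" or "false".

```shell
# Hypothetical helper: list what is still in a resource group.
leftover_resources() {
    az resource list --resource-group "$1" \
        --query "[].{name:name, type:type}" --output table
}

# Hypothetical helper: succeed only if the resource group no longer exists.
rg_is_gone() {
    [ "$(az group exists --name "$1")" = "false" ]
}

# Assumed usage, after the delete commands above:
# leftover_resources dgn-rg
# rg_is_gone dgn-rg && echo "dgn-rg deleted"
```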