-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WDL generation. #6504
WDL generation. #6504
Conversation
@bshifaw This is the PR we discussed in slack. Since we won't actually check the generated files into the repo, I added a temporary commit with some example WDL/JSON files so you can see what currently gets generated, and am looking for comments on the WDL structure, etc. Thanks. |
bfccfa3
to
e36a243
Compare
@bshifaw Just a reminder that this branch is awaiting your review. To clarify, we don't need you to review the code, we just need you to review the generated WDLs/JSONs themselves, looking for problems / missing elements / etc., and suggesting any tweaks you think are necessary. Thanks! |
@bshifaw Minor tweak to David's comment: you won't be able to run the WDL gen directly from this branch because it requires a barclay upgrade, but I did check a couple of output WDLs in as a temporary commit so you can look at those. If you'd like, I can generate WDL for all the tools and check those in, but I didn't do that because the tools will require some annotation in addition to this branch in order to get WDLs with proper output sections out. |
Sorry, just seeing this now.
|
Thanks for the review @bshifaw.
There is limited support for runtime blocks now (you can see one here) but it requires annotating the tool with the proper values. It currently only supports the
I chose to make each WDL have both a workflow and a task, and generated an input JSON that contains the default values for each optional arg so that only the required args have to be provided (they're initialized with the type of value required similar to the skeleton that womtool generates). But maybe theres an alternative structure thats better. Can you point me to a WDL that uses the structure you're proposing? I'm not sure I understand how it would be used.
I used the full names wherever possible, but many of the tools have args with names that are WDL reserved works like |
|
e36a243
to
c46d362
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks pretty great. I still need to look at the Barclay branch.
I had a couple of comments / questions. I primarily commented on the PrintReads
wdl with a few other comments interspersed.
@cmnbroad and @bshifaw - In terms of the WDL workflow / task inputs - I think the way it's written up now is good. It would save space to only have task level inputs, but this gives more of the sense that each tool can stand on its own as an entity. It's also auto-generated so it's no more work to do it this way.
.travis.yml
Outdated
- CROMWELL_JAR=$HOME/cromwell-36.jar | ||
- WOMTOOL_JAR=$HOME/womtool-36.jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to update to a newer cromwell / womtool version? Looks like v47
is used in the gatkWDLGenValidation
build task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I made a separate PR to upgrade to v51, which is merged now, and this is rebased on that.
build.gradle
Outdated
def wdlFiles = fileTree(dir: wdlGenFolder) | ||
wdlFiles = wdlFiles.filter { f -> !f.getAbsolutePath().endsWith(".html") && !f.getAbsolutePath().endsWith(".json")} | ||
return wdlFiles.any() { wdlFile -> | ||
//TODO: fix this before submitting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tempCommit/PrintReads.wdl
Outdated
@@ -0,0 +1,297 @@ | |||
version 1.0 | |||
|
|||
# Run PrintReads (GATK Version 4.1.7.0-7-gcd21b9c-SNAPSHOT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Run PrintReads (GATK Version 4.1.7.0-7-gcd21b9c-SNAPSHOT) | |
# Run PrintReads (WDL Autogenerated from: GATK Version 4.1.7.0-7-gcd21b9c-SNAPSHOT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, good idea.
tempCommit/PrintReads.wdl
Outdated
|
||
input { | ||
#Docker to use | ||
String dockerImage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about versioning and how it applies to the docker image and GATK executables here.
I think we should have a default value here for the gatk executable so this can run out of the box on a GATK docker image.
Should we have a default for a docker image here? Or should we make a check in the command
block below to make sure the GATK is the right version? I'm a little concerned about version mismatches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now defaults to broadinstitue/gatk:version.
tempCommit/PrintReads.wdl
Outdated
>>> | ||
|
||
runtime { | ||
docker: dockerImage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to give some default parameters for memory and disk?
I typically start with some defaults. Here's something like what I usually start with:
input {
Int? mem_gb
Int? preemptible_attempts
Int? disk_space_gb
Int? cpu
Int? boot_disk_size_gb
}
Boolean use_ssd = false
Float input_files_size_gb = size(reads_file, "GiB") + size(segments_fasta, "GiB") + size(boundaries_file, "GiB")
# You may have to change the following two parameter values depending on the task requirements
Int default_ram_mb = 2048
Int default_disk_space_gb = ceil((input_files_size_gb * 2) + 1024)
Int default_boot_disk_size_gb = 15
# Mem is in units of GB but our command and memory runtime values are in MB
Int machine_mem = if defined(mem_gb) then mem_gb * 1024 else default_ram_mb
runtime {
docker: docker_image
memory: machine_mem + " MB"
disks: "local-disk " + select_first([disk_space_gb, default_disk_space_gb]) + if use_ssd then " SSD" else " HDD"
bootDiskSizeGb: select_first([boot_disk_size_gb, default_boot_disk_size_gb])
preemptible: select_first([preemptible_attempts, 1])
cpu: select_first([cpu, 1])
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The defaults are set in the JSON input files (based on either the default or override values taken from the RuntimeProperties
annotations).
tempCommit/PrintReads.wdl
Outdated
~{true='--tmp_dir ' false='' defined(tmp_dir)}~{sep=' --tmp_dir ' tmp_dir} \ | ||
~{true='--use_jdk_deflater ' false='' defined(use_jdk_deflater)}~{sep=' --use_jdk_deflater ' use_jdk_deflater} \ | ||
~{true='--use_jdk_inflater ' false='' defined(use_jdk_inflater)}~{sep=' --use_jdk_inflater ' use_jdk_inflater} \ | ||
~{true='--verbosity ' false='' defined(verbosity)}~{sep=' --verbosity ' verbosity} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need another newline here because of the trailing backslash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, done.
@@ -83,17 +85,20 @@ | |||
programGroup = ReadDataManipulationProgramGroup.class | |||
) | |||
@DocumentedFeature | |||
@RuntimeProperties(memoryRequirements = "1GB") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for something like this. This is great!
@@ -248,29 +247,29 @@ void addToFeatureSources(final int featureQueryLookahead, final FeatureInput<? e | |||
} | |||
|
|||
/** | |||
* Given a Field known to be of type FeatureInput (or a Collection thereof), retrieves the type | |||
* Given a ArgumentDefinition for an argument known to be of type FeatureInput (or a Collection thereof), retrieves the type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Given a ArgumentDefinition for an argument known to be of type FeatureInput (or a Collection thereof), retrieves the type | |
* Given an ArgumentDefinition for an argument known to be of type FeatureInput (or a Collection thereof), retrieves the type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit is from another branch that was required for the new Barclay version, but thats been merged into master now and this is rebased on that, so this commit is gone.
@@ -58,4 +64,27 @@ protected void addCustomBindings(final DocWorkUnit currentWorkUnit) { | |||
} | |||
} | |||
|
|||
// @Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be removed since it's commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, done.
<#if group.supercat == supercat> | ||
<li><a data-toggle="collapse" data-parent="#sidenav" href="#${group.id}">${group.name}</a> | ||
<div id="${group.id}" | ||
<?php echo ($group == '${group.name}')? 'class="accordion-body collapse in"'.chr(62) : 'class="accordion-body collapse"'.chr(62);?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hoo boy. Are we really still using PHP? Not really a question for this code review… but I'm surprised.
Cool stuff!
|
Bug: it should be --output in the command block, not --output_arg |
de42cb2
to
42e2493
Compare
45b362c
to
70e1bbd
Compare
@jonn-smith Back to you. It took a fair amount of machinery to get a cromwell execution test running on travis with dummy inputs, etc., but its finally working. For the moment, the only tool that is annotated for WDLGen is PrintReads, but as soon as this goes in I'll start annotating others in a separate PR. |
Are we morally opposed to tracking the generated WDLs with git? That would be the easiest way to get up-to-date single-tool WDLs into Terra, i.e. add each to Dockstore tracking the GATK repo and import into Terra from Dockstore. |
@ldgauthier Yes, we plan to put the generated WDLs in Dockstore (Comms team has also requested this), and update them with each release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One or two very minor questions / suggestions.
Feel free to merge after looking at / addressing them.
/** | ||
* Adapter shim for use within GATK to run tools in command line validation mode. Note that this | ||
* class does not have it's own CommandLineProgramProperties annotation, and isn't intended to be | ||
* run directly from the command line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth adding in some documentation / naming the class to indicate that it really only validates the command line arguments for a given tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/** | ||
* Adds a super-category so that we can custom-order the categories in the doc index | ||
* | ||
* @param docWorkUnit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc needs updating here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
/** | ||
* Add the named argument {@code argDed}to the property map if applicable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Add the named argument {@code argDed}to the property map if applicable. | |
* Add the named argument {@code argDed} to the property map if applicable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
ce2be87
to
6550649
Compare
…ons on a separate branch.
This requires a Barclay upgrade (that is not yet published) since most of the basic WDL generation code is in Barclay. WDL is only generated for tools that are annotated with @RuntimeProperties, and that have input/output files arguments that are annotated with@WorkflowResource.
For each WDL generated, an accompanying JSON input file is generated that contains all the tool's arguments, with the optional arguments initialized to the tool's default values, and the required args initialized to a string that describes the required type. A temporary commit that contains a sample WDL/JSON generated by the WDL gen task in included to make it easier to see the WDL that results.
The only commits in this branch that are directly WDL-gen related are the WDL Gen commit itself, and the sample output commit. The other commits are either related to GATKPathSpecifier migration (not required for WDL gen) or Barclay upgrade migration (required for WDL gen).