'Sequence.gc' Method for IUPAC bases #205

Merged
merged 9 commits into from
Feb 14, 2023

Conversation

Bardia-Masudy
Contributor

Hello!
This is meant to address #128. I've added two additional methods to the Sequence object:

  • gc_strict: works like gc, but filters out non-ATGC bases before computing the fraction.
  • gc_iupac: works like gc, but also counts each IUPAC ambiguity code as contributing fractional G/C content, and filters out non-IUPAC characters.
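To make the two behaviours concrete, here is a minimal standalone sketch. It is not the code merged in this PR: the function bodies, the `IUPAC_GC_WEIGHTS` table, and the choice to return 0.0 for sequences with no countable bases are all assumptions made for illustration. Each weight is the share of the bases behind an ambiguity code that are G or C. Filtering here falls out of only summing over known keys, rather than using a regular expression.

```python
from collections import Counter

# Hypothetical weight table: fraction of the bases represented by each
# IUPAC code that are G or C (for example R = A or G, so 0.5).
IUPAC_GC_WEIGHTS = {
    "A": 0.0, "T": 0.0, "W": 0.0,
    "G": 1.0, "C": 1.0, "S": 1.0,
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5, "N": 0.5,
    "B": 2 / 3, "V": 2 / 3,
    "D": 1 / 3, "H": 1 / 3,
}


def gc_strict(seq):
    """GC fraction over A/C/G/T only; every other character is ignored."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return 0.0  # assumption: report 0.0 when nothing is countable
    return (counts["G"] + counts["C"]) / total


def gc_iupac(seq):
    """GC fraction where ambiguity codes contribute fractional G/C content."""
    counts = Counter(seq.upper())
    total = sum(counts[code] for code in IUPAC_GC_WEIGHTS)
    if total == 0:
        return 0.0  # assumption: report 0.0 when nothing is countable
    weighted = sum(counts[code] * w for code, w in IUPAC_GC_WEIGHTS.items())
    return weighted / total
```

A single `Counter` pass tallies every character once, so both functions avoid repeated scans of the sequence.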

Thoughts and Considerations:

  • I used a Counter because it should (and, in my testing, did) perform better than 12 calls to count().
  • I didn't touch the original gc method because it's still faster and remains accurate for most situations.
  • I'm wondering if there's a more efficient way to filter out bases than regular expressions, or a better option within re than re.sub().
  • I'm also wondering if there's a way to compile the expression once when many sequences are being analyzed, rather than compiling it on every individual call.
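On the last two questions, one possible approach is sketched below. The names `NON_ACGT`, `strip_non_acgt_regex`, `strip_non_acgt_translate`, and `_DELETE_TABLE` are hypothetical and not part of pyfaidx. The first function compiles the pattern once at module import so every call reuses it; the second avoids `re` entirely by using `str.translate` with a deletion table. Which one is faster would need to be measured on real data.

```python
import re

# Compiled once when the module is imported; each call reuses this object.
NON_ACGT = re.compile(r"[^ACGT]")


def strip_non_acgt_regex(seq):
    """Remove everything except A/C/G/T using the precompiled pattern."""
    return NON_ACGT.sub("", seq.upper())


# Regex-free alternative: map every ASCII code point that is not A/C/G/T
# to None, which str.translate treats as "delete this character".
# Assumption: input is ASCII; non-ASCII characters would pass through.
_DELETE_TABLE = {cp: None for cp in range(128) if chr(cp) not in "ACGT"}


def strip_non_acgt_translate(seq):
    """Remove everything except A/C/G/T without using the re module."""
    return seq.upper().translate(_DELETE_TABLE)
```

Note that CPython's `re` module already caches recently compiled patterns, so passing a pattern string to `re.sub()` on every call generally does not recompile it each time. The module-level constant mainly makes the reuse explicit and skips the cache lookup.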

I'm new to this process, so please tell me if there's anything I've misunderstood about the issue or the process of contributing. I'm more than happy to modify anything that should be changed.

@Bardia-Masudy changed the title from "Iupac gc" to "'Sequence.gc' Method for IUPAC bases" on Jan 23, 2023
Owner

mdshw5 commented Jan 31, 2023

Thanks for submitting this! I'll take a look when I get some time this week and if everything looks good will merge.

Owner

mdshw5 commented Feb 14, 2023

Now that I've fixed the CI and tests are passing, this looks good to merge. Thanks @Bardia-Masudy!

@mdshw5 merged commit 0b71915 into mdshw5:master on Feb 14, 2023
@Bardia-Masudy deleted the IUPAC_gc branch on February 17, 2023 00:50
2 participants