Why does relaxing the constraint (RepeatModification) lead to less successful augmentation? #789
Open
Description
To Reproduce
Run the following code:
from textattack.augmentation import Augmenter
from textattack.transformations import WordSwapEmbedding
from textattack.constraints.semantics import WordEmbeddingDistance
from textattack.constraints.grammaticality import PartOfSpeech
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.shared import AttackedText
text_sample = "woody , what happened ?"
num_words_to_swap = len(AttackedText(text_sample).words) - 1  # minus 1 because "what" is a stop word
max_candidates = 50
num_samples = max_candidates**num_words_to_swap
print('max num_samples:', num_samples)
# Define constraints to ensure quality of perturbations
constraints = [StopwordModification(), RepeatModification()]
constraints.append(WordEmbeddingDistance(min_cos_sim=0.5))
constraints.append(PartOfSpeech(allow_verb_noun_swap=True))
# Define the transformation method
transformation = WordSwapEmbedding(
    max_candidates=max_candidates  # Number of candidates to generate per word
)
# Combine transformation and constraints in an Augmenter
augmenter = Augmenter(
    transformation=transformation,
    constraints=constraints,
    pct_words_to_swap=1,  # Fraction of words to swap per perturbation (1 = all words)
    transformations_per_example=num_samples  # Number of perturbations to generate per input
)
perturbations = augmenter.augment(text_sample)
actural_num_samples = len(perturbations)
print('actural_num_samples: ',actural_num_samples)
This gives me the output:
max num_samples: 2500
actural_num_samples: 532
But when I remove the RepeatModification constraint and leave the other constraints and the rest of the code unchanged:
constraints = [StopwordModification()]
I get this output:
max num_samples: 2500
actural_num_samples: 277
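One way to compare the two runs is to tally how many word positions each perturbation actually changes relative to the input. The sketch below does that with plain string splitting. `changed_positions` and `tally_changes` are hypothetical helper names, not part of TextAttack, and the sketch assumes the augmenter returns whitespace-tokenized strings with the same number of tokens as the input. The sample perturbations in the usage lines are made up for illustration and are not real augmenter output.

```python
from collections import Counter


def changed_positions(original: str, perturbed: str) -> int:
    """Count token positions that differ between two whitespace-split strings.

    Returns -1 when the token counts differ, since a position-wise
    comparison is not meaningful in that case.
    """
    orig_tokens = original.split()
    pert_tokens = perturbed.split()
    if len(orig_tokens) != len(pert_tokens):
        return -1
    return sum(a != b for a, b in zip(orig_tokens, pert_tokens))


def tally_changes(original: str, perturbations: list[str]) -> Counter:
    """Map 'number of changed positions' to how many perturbations have it."""
    return Counter(changed_positions(original, p) for p in perturbations)


if __name__ == "__main__":
    # Made-up perturbations, only to show the shape of the result.
    sample = "woody , what happened ?"
    fake_perturbations = ["woody , what occurred ?", "buzz , what occurred ?"]
    print(tally_changes(sample, fake_perturbations))
```

Calling `tally_changes(text_sample, perturbations)` once per constraint list would show whether the run without RepeatModification returns more perturbations in which only one word differs from the input.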
Expected behavior
I expected that relaxing the constraint would increase the number of samples, but the opposite happens.
Have I misunderstood something, or is this a bug?
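As a sanity check on the 2500 upper bound: it is only reachable if every draw produces a distinct result. The sketch below assumes that each of the `transformations_per_example` draws is an independent, uniform pick from the candidate space. That is an assumption about the Augmenter's behavior, not something confirmed in this report. Under that assumption it estimates how many distinct results repeated sampling would return. `expected_unique` and `simulate_unique` are illustrative names.

```python
import random


def expected_unique(n_outcomes: int, n_draws: int) -> float:
    """Expected number of distinct outcomes after n_draws uniform draws
    with replacement from n_outcomes equally likely outcomes."""
    return n_outcomes * (1.0 - (1.0 - 1.0 / n_outcomes) ** n_draws)


def simulate_unique(n_outcomes: int, n_draws: int, seed: int = 0) -> int:
    """Empirical count of distinct outcomes for one seeded simulation."""
    rng = random.Random(seed)
    return len({rng.randrange(n_outcomes) for _ in range(n_draws)})


if __name__ == "__main__":
    max_candidates = 50
    num_words_to_swap = 2  # "woody" and "happened"; "what" is a stop word
    n = max_candidates ** num_words_to_swap  # 2500 possible combinations
    print("expected distinct:", expected_unique(n, n))
    print("simulated distinct:", simulate_unique(n, n))
```

If the observed counts are far below this estimate, the gap is more likely to come from the constraints filtering out candidates than from sampling collisions alone.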
System Information:
- OS: Linux
- Library versions: torch==2.3.0, transformers==4.40.1
- TextAttack version: 0.3.10