Why does relaxing the constraint (RepeatModification) lead to less successful augmentation? #789

Open
@YanghaoZYH

Description

To Reproduce
Run the following code:

from textattack.augmentation import Augmenter
from textattack.transformations import WordSwapEmbedding
from textattack.constraints.semantics import WordEmbeddingDistance
from textattack.constraints.grammaticality import PartOfSpeech
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.shared import AttackedText

text_sample = "woody , what happened ?"
num_words_to_swap = len(AttackedText(text_sample).words) - 1  # minus 1 because "what" is a stopword
max_candidates = 50

num_samples = max_candidates**num_words_to_swap
print('max num_samples:', num_samples)

# Define constraints to ensure quality of perturbations
constraints = [StopwordModification(), RepeatModification()]
constraints.append(WordEmbeddingDistance(min_cos_sim=0.5))
constraints.append(PartOfSpeech(allow_verb_noun_swap=True))

# Define the transformation method
transformation = WordSwapEmbedding(
    max_candidates=max_candidates  # Number of candidates to generate per word
)

# Combine transformation and constraints in an Augmenter
augmenter = Augmenter(
    transformation=transformation,
    constraints=constraints,
    pct_words_to_swap=1,  # Fraction of words to swap per perturbation (1 means all words)
    transformations_per_example=num_samples  # Number of perturbations to generate per input
)

perturbations = augmenter.augment(text_sample)
actual_num_samples = len(perturbations)
print('actual_num_samples:', actual_num_samples)

Which gives me the output:

max num_samples: 2500
actual_num_samples: 532

But when I remove the RepeatModification constraint and leave the other constraints and the rest of the code unchanged:

constraints = [StopwordModification()]

it gives me the output:

max num_samples: 2500
actual_num_samples: 277

Expected behavior
I expected that relaxing the constraints would increase the number of augmented samples, but the opposite happens.
Did I misunderstand something, or is this a bug?
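
For context, the 2500 printed as max num_samples is only an upper bound: it assumes each of the two words the script counts as swappable keeps all 50 embedding candidates after the constraints are applied. A minimal sketch of that arithmetic in plain Python, with no TextAttack calls; the helper name `max_distinct_augmentations` and the per-word counts in the second call are hypothetical and chosen only for illustration, not measured values:

```python
from math import prod


def max_distinct_augmentations(candidates_per_word):
    """Upper bound on distinct outputs when every listed word is swapped exactly once."""
    return prod(candidates_per_word)


# Two swappable words with 50 candidates each: the bound used in the script above.
upper_bound = max_distinct_augmentations([50, 50])
print("upper bound:", upper_bound)

# Hypothetical counts after constraints discard some candidates (illustrative only).
filtered_bound = max_distinct_augmentations([30, 20])
print("bound after filtering:", filtered_bound)
```

Both observed counts (532 and 277) are well below this bound, so the question is about why the two configurations differ from each other, not about either one failing to reach 2500.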

System Information:

  • OS: Linux
  • Library versions: torch==2.3.0, transformers==4.40.1
  • TextAttack version: 0.3.10
