Where to specify the instruction style during generation #1

Open
afalf opened this issue Oct 24, 2024 · 1 comment

afalf commented Oct 24, 2024

Great work! I have a small question, though. The paper states that, to maintain data diversity, an instruction style is specified during generation. However, the prompt provided in Appendix C gives no indication of how a particular style is assigned to an instruction. How exactly is the style determined? Does it have to be defined explicitly?

orionw (Owner) commented Oct 24, 2024

Hi @afalf, and thanks for pointing this out!

I replaced the "LENGTH_FORMAT_FILL_ME" placeholder in Figure 3 with one of the combinations from INSTRUCTION_FORMAT_DETAILS below:

format_details_length = [
    "a short instruction, only one or two sentences long.",
    "a medium length instruction, ranging from three to six sentences long.",
    "a long instruction, about one paragraph long.",
    "a very long instruction, about two paragraphs long."
]

format_details_features = [
    "Your instruction should also be written from the POV of a persona, where you describe why you are searching for this information.",
    "Start the instruction with some factual sentences that are background information, merely stating facts (but that do not answer the question).",
    "Your instruction should also contain negation, e.g. describing what types of documents are not relevant to the query.",
    ""
]

# take all combinations of the format details with the length
INSTRUCTION_FORMAT_DETAILS = []
for length in format_details_length:
    for feature in format_details_features:
        INSTRUCTION_FORMAT_DETAILS.append(f"{length} {feature}")

I see I forgot to include these in the paper; I will add them in the next version. Thanks again for pointing it out, and please let me know if you have other questions!
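For readers who want to try the substitution described above, here is a minimal sketch. It rebuilds the 16 combinations exactly as in the reply and fills the "LENGTH_FORMAT_FILL_ME" placeholder in a template. The template text (`PROMPT_TEMPLATE`), the helper `fill_prompt`, and the uniform random choice are assumptions made for illustration; the thread does not say how a combination was picked, and the real prompt is the one in the paper's Figure 3.

```python
import random

format_details_length = [
    "a short instruction, only one or two sentences long.",
    "a medium length instruction, ranging from three to six sentences long.",
    "a long instruction, about one paragraph long.",
    "a very long instruction, about two paragraphs long.",
]

format_details_features = [
    "Your instruction should also be written from the POV of a persona, where you describe why you are searching for this information.",
    "Start the instruction with some factual sentences that are background information, merely stating facts (but that do not answer the question).",
    "Your instruction should also contain negation, e.g. describing what types of documents are not relevant to the query.",
    "",
]

# Same construction as in the reply: every length paired with every feature.
INSTRUCTION_FORMAT_DETAILS = []
for length in format_details_length:
    for feature in format_details_features:
        INSTRUCTION_FORMAT_DETAILS.append(f"{length} {feature}")

# Hypothetical stand-in for the real prompt; only the placeholder name
# comes from the thread.
PROMPT_TEMPLATE = "Please write LENGTH_FORMAT_FILL_ME"


def fill_prompt(template: str, rng: random.Random) -> str:
    """Replace the placeholder with one combination (assumed: uniform sampling)."""
    detail = rng.choice(INSTRUCTION_FORMAT_DETAILS)
    return template.replace("LENGTH_FORMAT_FILL_ME", detail)


rng = random.Random(0)  # seeded only so the sketch is reproducible
prompt = fill_prompt(PROMPT_TEMPLATE, rng)
print(prompt)
```

Note that when the empty feature is chosen, the combined string ends with a trailing space, since the f-string always inserts one between the two parts.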
