This is the replication package for the paper "ClarifyGPT: A Framework for Enhancing LLM-based Code Generation via Requirement Clarification".
Large Language Models (LLMs) have demonstrated impressive capabilities in automatically generating code from natural language requirements. In real-world practice, however, the requirements written by users are often ambiguous or insufficient. Current LLMs generate programs directly from such unclear requirements without any interactive clarification, so the resulting code is likely to deviate from the original user intent. To bridge this gap, we introduce ClarifyGPT, a novel framework that enhances code generation by empowering LLMs to identify ambiguous requirements and ask targeted clarifying questions. Specifically, ClarifyGPT first detects whether a given requirement is ambiguous by performing a code consistency check. If it is, ClarifyGPT prompts an LLM to generate targeted clarifying questions. After receiving the responses, ClarifyGPT refines the ambiguous requirement and feeds it back into the same LLM to generate the final code solution. To evaluate ClarifyGPT, we first conduct a human evaluation involving ten participants who use ClarifyGPT for code generation on two publicly available benchmarks: MBPP-sanitized and MBPP-ET. The results show that ClarifyGPT raises the Pass@1 of GPT-4 from 70.96% to 80.80% on MBPP-sanitized. Furthermore, to perform large-scale automated evaluations of ClarifyGPT across different LLMs and benchmarks without requiring user participation, we introduce a high-fidelity simulation method to simulate user responses. The automated evaluation results also demonstrate that ClarifyGPT significantly enhances code generation performance compared to the baselines.
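As a rough illustration of this workflow, the sketch below outlines the clarify-then-generate loop in Python. All of the callables it takes (generate_code, generate_test_inputs, generate_questions, answer_questions, run_on_inputs) are hypothetical placeholders standing in for LLM calls, user (or simulated-user) responses, and sandboxed code execution; they are not the actual interfaces used in src.

```python
from typing import Callable, List

def clarify_gpt_pipeline(
    requirement: str,
    generate_code: Callable[[str], str],                 # LLM call: requirement -> code solution
    generate_test_inputs: Callable[[str], List[str]],    # LLM call: requirement -> test inputs
    generate_questions: Callable[[str, List[str]], str], # LLM call: requirement + samples -> questions
    answer_questions: Callable[[str], str],              # real user or simulated user
    run_on_inputs: Callable[[str, List[str]], List[str]],# executes a solution on the test inputs
    n_samples: int = 5,
) -> str:
    """Sketch of the ClarifyGPT loop: detect ambiguity, clarify, then regenerate."""
    # 1. Code consistency check: sample several solutions and compare their
    #    outputs on LLM-generated test inputs.
    solutions = [generate_code(requirement) for _ in range(n_samples)]
    test_inputs = generate_test_inputs(requirement)
    behaviours = {tuple(run_on_inputs(sol, test_inputs)) for sol in solutions}

    # 2. Consistent behaviour across samples -> requirement treated as unambiguous.
    if len(behaviours) == 1:
        return solutions[0]

    # 3. Inconsistent behaviour -> prompt the LLM for targeted clarifying questions
    #    and collect answers from the user (or a simulated user).
    questions = generate_questions(requirement, solutions)
    answers = answer_questions(questions)

    # 4. Refine the requirement with the answers and generate the final solution.
    refined_requirement = f"{requirement}\n\nClarifications:\n{answers}"
    return generate_code(refined_requirement)
```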
- baseline ## the baseline method GPT-Engineer
- evaluation ## the evaluation dataset and scripts
- human-eval ## HumanEval benchmark openai_humaneval
- MBPP ## MBPP benchmark google_mbpp
- src ## the source code of our method ClarifyGPT
- prompt ## the designed prompts for HumanEval and MBPP
- clarify ## the main scripts to run ClarifyGPT
Run ClarifyGPT on HumanEval or MBPP
python src/run_clarify_{model_name}_{benchmark}.py
For example:
python src/run_clarify_chatgpt_mbpp.py
Calculate the Pass@1 metric
python evaluation/MBPP/main.py
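evaluation/MBPP/main.py reports the Pass@1 score. For reference, a minimal standalone version of the standard unbiased pass@k estimator (Chen et al., 2021) is sketched below; it assumes n generated samples per problem, c of which pass the unit tests, and is not necessarily how this repository computes the metric.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per problem, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generated samples for a problem, 7 of which pass the tests.
print(pass_at_k(n=10, c=7, k=1))  # 0.7
```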