This project explores generative LLMs such as ChatGPT and GPT-4 for relevance ranking in Information Retrieval (IR).
We aim to answer the following two questions:
- How does ChatGPT perform on passage re-ranking tasks?
- Can the ranking capabilities of ChatGPT be distilled into a smaller, specialized model?
To answer the first question, we introduce an instructional permutation generation approach that instructs LLMs to directly output a permutation of a group of passages.
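As a rough illustration of what permutation generation involves, the sketch below builds a prompt that lists passages with numeric identifiers and asks the model to output a ranking like `[2] > [1] > [3]`, then parses that response back into an ordering. The exact prompt wording and function names here are hypothetical, not the project's actual implementation; the LLM call itself is omitted.

```python
import re

def build_permutation_prompt(query, passages):
    # Number each passage and instruct the model to output a permutation.
    # (Hypothetical prompt wording, for illustration only.)
    lines = [f"I will provide you with {len(passages)} passages, each "
             f"indicated by a numerical identifier []. Rank them by "
             f"relevance to the query: {query}"]
    for i, passage in enumerate(passages, 1):
        lines.append(f"[{i}] {passage}")
    lines.append("Output the identifiers in descending order of relevance, "
                 "e.g., [2] > [1] > [3]. Only respond with the ranking.")
    return "\n".join(lines)

def parse_permutation(response, num_passages):
    # Extract identifiers from the model's response; drop duplicates and
    # out-of-range ids, and append any missing passages in original order
    # so the result is always a full permutation.
    ids = [int(m) for m in re.findall(r"\[(\d+)\]", response)]
    seen, order = set(), []
    for i in ids:
        if 1 <= i <= num_passages and i not in seen:
            seen.add(i)
            order.append(i)
    order += [i for i in range(1, num_passages + 1) if i not in seen]
    return order
```

The parsing step needs to be defensive because the model may repeat or omit identifiers; falling back to the original retrieval order for missing passages keeps the output a valid permutation.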
To answer the second question, we train a cross-encoder using 10K ChatGPT-predicted permutations on MS MARCO.
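One way to distill a teacher permutation into a student ranker is a pairwise ranking loss over the teacher's ordering. The sketch below is a minimal, pure-Python illustration of such a loss, assuming a RankNet-style pairwise objective; in practice the scores would come from the cross-encoder applied to (query, passage) pairs, and this is not necessarily the exact loss used in the project.

```python
import math

def rank_distill_loss(scores, permutation):
    # scores[i]: student model's score for passage i (0-indexed).
    # permutation: teacher's ranking, best first, e.g. [2, 0, 1].
    # For every pair the teacher ranks (better, worse), penalize the
    # student when score[better] does not exceed score[worse]
    # (logistic loss on the pairwise margin, as in RankNet).
    loss, pairs = 0.0, 0
    for a in range(len(permutation)):
        for b in range(a + 1, len(permutation)):
            better, worse = permutation[a], permutation[b]
            margin = scores[better] - scores[worse]
            loss += math.log(1.0 + math.exp(-margin))
            pairs += 1
    return loss / pairs
```

Because the supervision is a full permutation rather than binary labels, every ordered pair in the teacher's ranking contributes a training signal, which is what lets a relatively small set of permutations (e.g. 10K) train the student.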
Below are the results (average nDCG@10) of our preliminary experiments on TREC, BEIR, and Mr. TyDi.