PaperQA is a Python-based question-answering application for research papers and documents. It uses OpenAI-powered embeddings and FAISS for vector storage to answer questions based on the content of uploaded papers in .pdf, .docx, and .txt formats.
- File Support: Reads and processes .pdf, .docx, and .txt files.
- Embeddings & Vector Search: Converts document text to embeddings with OpenAI for similarity-based search.
- Interactive Q&A: Provides an interactive prompt for querying the content.
- Extensible Design: Modular code structure for ease of expansion and customization.
- Clone the Repository
    git clone https://github.com/yourusername/PaperQA.git
    cd PaperQA
- Install Dependencies
    pip install -r requirements.txt
- Environment Setup
  Create a .env file in the root directory and add your OpenAI API key:
    OPENAI_API_KEY=your_openai_api_key_here
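
The snippet below is a minimal sketch of how the key could be picked up at startup, assuming the application reads .env into the process environment before it talks to OpenAI. How PaperQA actually does this is not stated here (the python-dotenv package is a common choice), so the sketch sticks to the standard library. The helper names load_env_file and get_openai_api_key are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper, not part of PaperQA. Returns False if the file is missing.
    """
    env_path = Path(path)
    if not env_path.exists():
        return False
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Keep variables that are already set in the environment.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
    return True


def get_openai_api_key():
    """Return the API key, or raise a clear error if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it.")
    return key
```

Calling load_env_file() once and then get_openai_api_key() fails early with a readable message when the key is missing, rather than at the first API request.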
- Place your documents in the data folder.
- Run the application:
    python main.py
- Enter questions related to the content of the papers in the prompt. Type end to exit.
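
The interactive session described in the steps above can be sketched as a simple read-answer loop. This is an illustration only, not the code in main.py: run_prompt and its answer_question argument are hypothetical names, and matching end exactly (after trimming whitespace) is an assumption.

```python
def run_prompt(answer_question, read_line=input, write_line=print):
    """Keep asking for questions until the user types 'end'.

    answer_question is a hypothetical callable that maps a question
    string to an answer string. Returns the number of questions answered.
    """
    answered = 0
    while True:
        try:
            question = read_line("Ask a question about the paper: ").strip()
        except EOFError:
            # Treat end of input (for example Ctrl-D) like typing 'end'.
            break
        if question == "end":
            break
        if not question:
            # Ignore empty input and prompt again.
            continue
        write_line(answer_question(question))
        answered += 1
    return answered
```

Passing read_line and write_line as parameters keeps the loop easy to test without a terminal.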
- document_reader.py: Functions to read .pdf, .docx, and .txt files.
- text_processing.py: Splits large texts into manageable chunks.
- embeddings.py: Generates and stores embeddings for document similarity search.
- main.py: The main application logic, including the interactive prompt.
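
How text_processing.py splits text is not described here. One common approach, shown below as a sketch under that assumption, is fixed-size character windows with some overlap so that a sentence cut at a boundary still appears whole in one chunk. The function name split_text and the default sizes are hypothetical.

```python
def split_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping character windows (illustrative defaults)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Smaller chunks tend to make retrieval more precise but give the model less context per chunk, so the right sizes depend on the documents.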
After running main.py, you can input questions like:
    Ask a question about the paper: What are the main conclusions?
The model will return a response based on the document content.
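
To illustrate what "based on the document content" can mean in practice, here is a sketch of the retrieval step: the question is compared against stored chunk vectors and the closest chunks are selected as context. PaperQA uses OpenAI embeddings and FAISS for this; the sketch swaps both for hand-written toy vectors and a brute-force cosine similarity in pure Python, and the names cosine_similarity and top_k_chunks are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunk_vectors, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), chunk)
        for vector, chunk in zip(chunk_vectors, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Toy 3-dimensional vectors standing in for real embeddings (assumed values).
chunks = ["Methods section", "Conclusions section", "Acknowledgements"]
chunk_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.2], [0.0, 0.0, 1.0]]
query_vector = [0.2, 0.9, 0.1]

best = top_k_chunks(query_vector, chunk_vectors, chunks, k=2)
print(best)
```

A vector index such as FAISS serves the same purpose as the brute-force loop here, but scales to many more chunks.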
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.