PaperQA

PaperQA is a Python question-answering application for research papers and other documents. It converts document text to OpenAI embeddings, stores the vectors in FAISS, and answers questions from the content of the papers you provide in .pdf, .docx, and .txt formats.

Features

File Support: Reads and processes .pdf, .docx, and .txt files.
Embeddings & Vector Search: Converts document text to embeddings with OpenAI for similarity-based search.
Interactive Q&A: Provides an interactive prompt for querying the content.
Extensible Design: Modular code structure for ease of expansion and customization.
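
To illustrate what similarity-based search means here, the following is a minimal pure-Python sketch that ranks stored vectors against a query vector. It is not the project's code: the function names `cosine_similarity` and `top_k`, the toy 3-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions made for illustration. In the application itself the vectors come from OpenAI embeddings and the lookup is handled by FAISS.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy vectors standing in for real embeddings (hypothetical data).
toy_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
best = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(best)
```

A linear scan like this is only practical for small collections; a vector index such as FAISS serves the same purpose at larger scale.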

Installation

Prerequisites

Python 3.7+
OpenAI API Key
FAISS
LangChain and other dependencies in requirements.txt

Clone the Repository

```
git clone https://github.com/yourusername/PaperQA.git
cd PaperQA
```

Install Dependencies

```
pip install -r requirements.txt
```

Environment Setup

Create a .env file in the root directory and add your OpenAI API key:
```
OPENAI_API_KEY=your_openai_api_key_here
```
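
As a rough illustration of how a KEY=value file like this can end up in the process environment, here is a standard-library sketch. The helper names `parse_env_lines` and `load_env_file` are hypothetical; the project may well rely on a dedicated package for this instead, which the README does not state.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Copy variables from the file into os.environ without overwriting existing ones."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        values = parse_env_lines(fh)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


parsed = parse_env_lines(["# local secrets", "", "OPENAI_API_KEY=your_openai_api_key_here"])
print(sorted(parsed))
```

Using `setdefault` means a key already exported in the shell takes precedence over the file, which is a design choice of this sketch rather than documented behavior of the project.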

Usage

Place your documents in the data folder.
Run the application:
```
python main.py
```
Enter questions related to the content of the papers in the prompt. Type end to exit.
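
The prompt loop described above can be sketched as follows. This is an assumption about the general shape, not the contents of main.py: `run_prompt` and `answer_fn` are hypothetical names, and matching the exit word case-insensitively is a choice made for this sketch. The input and output functions are injectable so the loop can be exercised without a terminal.

```python
def run_prompt(answer_fn, input_fn=input, output_fn=print):
    """Read questions until the user types 'end'; return how many were answered."""
    answered = 0
    while True:
        question = input_fn("Ask a question about the paper: ").strip()
        if question.lower() == "end":
            break
        if not question:
            continue  # ignore empty input and ask again
        output_fn(answer_fn(question))
        answered += 1
    return answered


# Scripted demonstration: canned input instead of a live terminal.
scripted = iter(["What are the main conclusions?", "end"])
outputs = []
count = run_prompt(
    lambda q: f"(answer to: {q})",  # placeholder for the real retrieval and model call
    input_fn=lambda _prompt: next(scripted),
    output_fn=outputs.append,
)
```

In interactive use, calling `run_prompt(answer_fn)` with the defaults reads from the keyboard and prints each answer.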

Project Structure

document_reader.py: Functions to read .pdf, .docx, and .txt files.
text_processing.py: Splits large texts into manageable chunks.
embeddings.py: Generates and stores embeddings for document similarity search.
main.py: The main application logic, including the interactive prompt.
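
To make the chunking step concrete, here is a minimal sketch of splitting text into fixed-size, overlapping chunks. The function name `split_text`, the character-based sizes, and the default values are assumptions for illustration; text_processing.py may split differently, for example on sentence or paragraph boundaries.

```python
def split_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

Each chunk would then be embedded separately, so the chunk size trades retrieval precision against the amount of context each vector represents.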

Example

After running main.py, you can input questions like:

Ask a question about the paper: What are the main conclusions?

The model will return a response based on the document content.
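
One common way to ground a response in document content is to place the retrieved chunks in the prompt ahead of the question. The sketch below shows that idea only; `build_prompt`, the instruction wording, and the sample chunks are hypothetical and are not taken from the project.

```python
def build_prompt(question, context_chunks):
    """Assemble a prompt that asks the model to answer from the given chunks."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved chunks for the example question above.
prompt = build_prompt(
    "What are the main conclusions?",
    ["The study finds X.", "Limitations include Y."],
)
print(prompt)
```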

Contributing

Contributions are welcome! Please fork the repository and submit a pull request.

License

This project is licensed under the MIT License.

alireza-nasirian/PaperQA

Folders and files

Latest commit

History

Repository files navigation

PaperQA

Features

Installation

Prerequisites

Usage

Project Structure

Example

Contributing

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages