- Introduction
- Features
- Architecture
- Installation
- Preparing Book Data
- Usage
- Current Issues
- Contributing
- License
Story Sage is a tool that helps users interact with their books through natural conversation. It uses AI to provide relevant answers about book content while avoiding spoilers.
- Interactive Q&A: Question-and-answer system that preserves plot surprises.
- Semantic Search: Uses advanced embedding models to understand and retrieve relevant information across book content.
- Customizable Filters: Filter responses based on book, chapter, or specific entities like characters and places.
- Persistent Storage: Stores and retrieves embeddings efficiently using ChromaDB.
- Extensible Architecture: Easily extendable components for additional functionalities.
Story Sage uses a modular architecture with Retrieval-Augmented Generation (RAG) and chain-of-thought logic to deliver accurate and context-aware responses.
```
+------------------+        +------------------+
|  User Interface  | <----> |    Story Sage    |
+------------------+        +------------------+
                                     |
                                     v
                       +-------------------------+
                       |    Retrieval Module     |
                       |  - StorySageRetriever   |
                       |  - ChromaDB Integration |
                       +-------------------------+
                                     |
                                     v
                       +-------------------------+
                       |    Generation Module    |
                       |  - StorySageChain       |
                       |  - Language Model (LLM) |
                       +-------------------------+
                                     |
                                     v
                       +-------------------------+
                       |    State Management     |
                       |  - StorySageState       |
                       +-------------------------+
```
- StorySageRetriever: Handles the retrieval of relevant text chunks from the book based on user queries using ChromaDB.
- StorySageChain: Manages the generation of responses by processing retrieved information through a language model.
- StorySageState: Maintains the state of user interactions, including context and extracted entities.
- ChromaDB: Serves as the vector store for efficient storage and retrieval of text embeddings.
- Language Model (LLM): Generates human-like responses based on the provided context.
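The end-to-end flow these components implement can be sketched in plain Python. All class and method names below are illustrative stand-ins, not the actual Story Sage API, and the "retrieval" here is naive word overlap rather than embedding similarity:

```python
# Minimal sketch of the retrieve -> generate -> update-state flow.
# Class names are illustrative; the real ones live in the story_sage package.

class Retriever:
    """Stands in for StorySageRetriever + ChromaDB."""
    def __init__(self, chunks):
        self.chunks = chunks

    def retrieve(self, question, n_chunks=2):
        # Real retrieval ranks chunks by embedding similarity;
        # here we rank by naive word overlap for illustration.
        words = set(question.lower().split())
        scored = sorted(self.chunks,
                        key=lambda c: len(words & set(c.lower().split())),
                        reverse=True)
        return scored[:n_chunks]

class Chain:
    """Stands in for StorySageChain + the LLM call."""
    def generate(self, question, context):
        return f"Based on {len(context)} passage(s): answer to {question!r}"

class State:
    """Stands in for StorySageState."""
    def __init__(self):
        self.history = []

retriever = Retriever(["Arthur pulls the sword.", "Merlin advises the king."])
chain = Chain()
state = State()

question = "Who advises the king?"
context = retriever.retrieve(question, n_chunks=1)
answer = chain.generate(question, context)
state.history.append((question, answer))
```

The key design point is the separation of concerns: the retriever only finds context, the chain only turns context into an answer, and the state only records the interaction.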
- Python 3.11.4
- pyenv
- Redis
- ChromaDB
- Sentence Transformers
- LangChain
- **Clone the Repository**

  ```bash
  git clone https://github.com/chrispatten/story_sage.git
  cd story_sage
  ```

- **Run Setup**

  ```bash
  make setup
  ```

  This will:
  - Install pyenv if needed
  - Set up Python 3.11.4
  - Create a virtual environment
  - Install Redis if needed
  - Create default configuration files

- **Configure the Application**

  Update the following configuration files:
  - `config.yml`: set your OpenAI API key and other settings
  - `redis_config.conf`: configure Redis settings if needed

  Example `config.yml`:

  ```yaml
  OPENAI_API_KEY: "your-api-key"
  CHROMA_PATH: './chroma_data'
  CHROMA_COLLECTION: 'story_sage'
  ENTITIES_PATH: './entities/entities.json'
  SERIES_PATH: './series_prod.yml'
  N_CHUNKS: 15
  REDIS_URL: 'redis://localhost:6379/0'
  REDIS_EXPIRE: 86400
  ```

- **Start Redis**

  ```bash
  make redis
  ```

- **Run the Application**

  ```bash
  make app
  ```
Create a `series_prod.yml` file to configure your book series. Example structure:
```yaml
- series_id: 2
  series_name: 'Series Name'
  series_metadata_name: 'series_name'
  entity_settings:
    names_to_skip:
      - 'common_word'
    person_titles:
      - 'title1'
      - 'title2'
  base_characters:
    - name: 'Character Name'
      other_names:
        - 'alias1'
        - 'alias2'
  books:
    - number_in_series: 1
      title: 'Book Title'
      book_metadata_name: '01_book_name'
      number_of_chapters: 17
```
Each book in the series should be stored as a separate text file, named using the `book_metadata_name` from the series YAML file. For example, the text file for the first book in the Harry Potter series would be named `01_the_sorcerers_stone.txt`.

Place books in a subdirectory named with the `series_metadata_name` from the series YAML file. For example, the Harry Potter books would be stored in the directory `./books/harry_potter`.
Strip out any non-essential content from the text files, such as table of contents, author notes, etc. The text should only contain the main content of the book.
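As a rough illustration of this cleanup step, front matter can often be trimmed by locating the first chapter heading. The helper below is a hypothetical sketch, not part of Story Sage, and assumes chapter headings look like "Chapter One" or "Chapter 1":

```python
import re

def strip_front_matter(text, chapter_pattern=r'(?mi)^chapter\s+(one|1)\b'):
    """Drop everything before the first chapter heading, if one is found."""
    match = re.search(chapter_pattern, text)
    return text[match.start():] if match else text

raw = "Table of Contents\nAuthor's Note\nCHAPTER ONE\nIt was a dark night."
clean = strip_front_matter(raw)
# clean now begins at the "CHAPTER ONE" heading
```

Back matter (acknowledgments, previews of the next book) usually needs a manual pass, since its markers vary between editions.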
Chunk the book data into semantic chunks for efficient processing. Use the `create_chunks.py` script to split book text files into semantically coherent chunks:
- Confirm your series and book files are organized in `./books/<series_name>/*.txt`.
- Update the `SERIES_NAME` variable in `create_chunks.py` to match your directory.
- Run the script:

  ```bash
  python create_chunks.py
  ```

- Confirm JSON files are generated in `./chunks/<series_name>/semantic_chunks/` for each chapter.
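The chunking idea can be approximated with a short stdlib sketch. The real `create_chunks.py` groups text by semantic similarity; this toy version only groups paragraphs by size, so treat it as an illustration of the output shape rather than the actual algorithm:

```python
def chunk_paragraphs(text, max_chars=200):
    """Greedily group paragraphs into chunks no longer than max_chars."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        if current and len(current) + len(para) + 2 > max_chars:
            # Adding this paragraph would overflow the chunk; start a new one.
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

text = ("First paragraph. " * 5).strip() + "\n\n" + ("Second paragraph. " * 5).strip()
chunks = chunk_paragraphs(text, max_chars=100)
```

Keeping chunks roughly paragraph-aligned matters because each chunk is later embedded and retrieved as a unit; chunks that split mid-thought retrieve poorly.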
Use this script to extract named entities from your semantic chunks:
- Ensure the required text chunks are already generated in `./chunks/<series_name>/semantic_chunks/`.
- Open `extract_entities.py` and set `TARGET_SERIES_ID` to match the correct `series_id` in `series.yml`.
- Run the script:

  ```bash
  python extract_entities.py
  ```

- Check the `./entities/<series_name>/` directory for generated JSON files containing the extracted entities.
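Conceptually, entity extraction scans the chunks for recurring proper names. The toy frequency-based sketch below is not the actual extractor; it only shows why a `names_to_skip` list (as in the series YAML) is useful, since ordinary capitalized words would otherwise be picked up:

```python
import re
from collections import Counter

def find_candidate_names(chunks, min_count=2, names_to_skip=("The", "A", "It")):
    """Count capitalized tokens across chunks; frequent ones are name candidates."""
    counts = Counter()
    for chunk in chunks:
        # Skip the first word of each sentence so ordinary
        # sentence-initial capitalization is not counted as a name.
        for sentence in re.split(r'[.!?]\s+', chunk):
            tokens = sentence.split()
            for token in tokens[1:]:
                word = token.strip('.,!?;:\'"')
                if word.istitle() and word not in names_to_skip:
                    counts[word] += 1
    return [name for name, n in counts.most_common() if n >= min_count]

chunks = ["Harry met Ron. Later Harry laughed.", "Ron saw Harry near the lake."]
names = find_candidate_names(chunks)
```

The extracted names are what the customizable filters later match against, which is why aliases (`other_names` in the series YAML) need to be mapped back to a single base character.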
Use this script to embed your semantic chunks into the ChromaDB vector store:

- Ensure you have successfully run `create_chunks.py` and `extract_entities.py`.
- Open `embed_chunks.py` and set the `series_metadata_name` variable to match your series.
- Verify that `entities.json` and `series_prod.yml` are correctly configured.
- Run the script:

  ```bash
  python embed_chunks.py
  ```

- Confirm that the embedded documents are stored in the `./chroma_data` directory by checking the ChromaDB collection.
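The embedding step essentially turns each chunk into an (id, embedding, metadata, document) record for ChromaDB. The stdlib sketch below shows a plausible record layout; the id scheme and the hash-based "embedding" are illustrative stand-ins, since the real script uses a sentence-transformer model and its own id convention:

```python
import hashlib

def toy_embedding(text, dims=4):
    """Deterministic stand-in for a sentence-transformer embedding."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dims]]

def build_records(series_metadata_name, book_number, chapter, chunks):
    """Build the records a vector store insert would take, one per chunk."""
    records = []
    for i, chunk in enumerate(chunks):
        records.append({
            # Zero-padded ids keep records sortable by book/chapter/chunk.
            "id": f"{series_metadata_name}|{book_number:02d}|{chapter:03d}|{i:04d}",
            "embedding": toy_embedding(chunk),
            "metadata": {"book_number": book_number, "chapter": chapter},
            "document": chunk,
        })
    return records

records = build_records("harry_potter", 1, 1, ["Mr. Dursley was perfectly normal."])
```

Storing `book_number` and `chapter` as metadata is what makes the spoiler-avoiding filters possible: at query time, retrieval can be restricted to chunks at or before the reader's current position.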
```python
from story_sage import StorySage

# Initialize Story Sage
story_sage = StorySage(
    api_key='your-openai-api-key',
    chroma_path='./chroma_db',
    chroma_collection_name='books_collection',
    entities_dict={'series': {...}},  # Your entities data
    series_list=[{'title': 'Series Title'}],  # Your series data
    n_chunks=5
)

# Ask a question
question = "What motivates the main character in Book 1?"
answer, context = story_sage.invoke(question)

print("Answer:", answer)
print("Context:", context)
```
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Commit your changes with clear messages.
- Open a pull request detailing your changes.
For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License.
This project was created with the help of GitHub Copilot and Connor Tyrell.