- Introduction
- Features
- Architecture
- Installation
- Preparing Book Data
- Usage
- Current Issues
- Contributing
- License
Story Sage is a tool that helps users interact with their books through natural conversation. It uses AI to provide relevant answers about book content while avoiding spoilers.
- Interactive Q&A: Question-and-answer system that preserves plot surprises.
- Semantic Search: Uses advanced embedding models to understand and retrieve relevant information across book content.
- Customizable Filters: Filter responses based on book, chapter, or specific entities like characters and places.
- Persistent Storage: Stores and retrieves embeddings efficiently using ChromaDB.
- Extensible Architecture: Easily extendable components for additional functionalities.
Story Sage uses a modular architecture with Retrieval-Augmented Generation (RAG) and chain-of-thought logic to deliver accurate and context-aware responses.
```
+------------------+        +------------------+
|  User Interface  | <----> |    Story Sage    |
+------------------+        +------------------+
                                     |
                                     v
                       +-------------------------+
                       |    Retrieval Module     |
                       |  - StorySageRetriever   |
                       |  - ChromaDB Integration |
                       +-------------------------+
                                     |
                                     v
                       +-------------------------+
                       |    Generation Module    |
                       |  - StorySageChain       |
                       |  - Language Model (LLM) |
                       +-------------------------+
                                     |
                                     v
                       +-------------------------+
                       |    State Management     |
                       |  - StorySageState       |
                       +-------------------------+
```
- StorySageRetriever: Handles the retrieval of relevant text chunks from the book based on user queries using ChromaDB.
- StorySageChain: Manages the generation of responses by processing retrieved information through a language model.
- StorySageState: Maintains the state of user interactions, including context and extracted entities.
- ChromaDB: Serves as the vector store for efficient storage and retrieval of text embeddings.
- Language Model (LLM): Generates human-like responses based on the provided context.
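The end-to-end flow these components implement can be sketched in plain Python. All class and method names below are illustrative stand-ins, not the actual Story Sage API, and the "retrieval" here is naive word overlap rather than embedding similarity:

```python
# Minimal sketch of the retrieve -> generate -> update-state flow.
# Class names are illustrative; the real ones live in the story_sage package.

class Retriever:
    """Stands in for StorySageRetriever + ChromaDB."""
    def __init__(self, chunks):
        self.chunks = chunks

    def retrieve(self, question, n_chunks=2):
        # Real retrieval ranks chunks by embedding similarity;
        # here we rank by naive word overlap for illustration.
        words = set(question.lower().split())
        scored = sorted(self.chunks,
                        key=lambda c: len(words & set(c.lower().split())),
                        reverse=True)
        return scored[:n_chunks]

class Chain:
    """Stands in for StorySageChain + the LLM call."""
    def generate(self, question, context):
        return f"Based on {len(context)} passage(s): answer to {question!r}"

class State:
    """Stands in for StorySageState."""
    def __init__(self):
        self.history = []

retriever = Retriever(["Arthur pulls the sword.", "Merlin advises the king."])
chain = Chain()
state = State()

question = "Who advises the king?"
context = retriever.retrieve(question, n_chunks=1)
answer = chain.generate(question, context)
state.history.append((question, answer))
```

The key design point is the separation of concerns: the retriever only finds context, the chain only turns context into an answer, and the state only records the interaction.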
- Python 3.11.4
- pyenv
- Redis
- ChromaDB
- Sentence Transformers
- LangChain
- **Clone the Repository**

  ```bash
  git clone https://github.com/chrispatten/story_sage.git
  cd story_sage
  ```

- **Run Setup**

  ```bash
  make setup
  ```

  This will:
  - Install pyenv if needed
  - Set up Python 3.11.4
  - Create a virtual environment
  - Install Redis if needed
  - Create default configuration files

- **Configure the Application**

  Update the following configuration files:
  - `config.yml`: set your OpenAI API key and other settings
  - `redis_config.conf`: configure Redis settings if needed

  Example `config.yml`:

  ```yaml
  OPENAI_API_KEY: "your-api-key"
  CHROMA_PATH: './chroma_data'
  CHROMA_COLLECTION: 'story_sage'
  ENTITIES_PATH: './entities/entities.json'
  SERIES_PATH: './series_prod.yml'
  N_CHUNKS: 15
  REDIS_URL: 'redis://localhost:6379/0'
  REDIS_EXPIRE: 86400
  ```

- **Start Redis**

  ```bash
  make redis
  ```

- **Run the Application**

  ```bash
  make app
  ```
Create a `series_prod.yml` file to configure your book series. Example structure:
```yaml
- series_id: 2
  series_name: 'Series Name'
  series_metadata_name: 'series_name'
  entity_settings:
    names_to_skip:
      - 'common_word'
    person_titles:
      - 'title1'
      - 'title2'
  base_characters:
    - name: 'Character Name'
      other_names:
        - 'alias1'
        - 'alias2'
  books:
    - number_in_series: 1
      title: 'Book Title'
      book_metadata_name: '01_book_name'
      number_of_chapters: 17
```
Each book in the series should be stored as a separate text file, named using the `book_metadata_name` from the series YAML file. For example, the text file for the first book in the Harry Potter series would be named `01_the_sorcerers_stone.txt`.

Place books in a subdirectory named with the `series_metadata_name` from the series YAML file. For example, the Harry Potter books would be stored in the directory `./books/harry_potter`.
Strip out any non-essential content from the text files, such as table of contents, author notes, etc. The text should only contain the main content of the book.
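As a rough illustration of this cleanup step, front matter can often be trimmed by locating the first chapter heading. The helper below is a hypothetical sketch, not part of Story Sage, and assumes chapter headings look like "Chapter One" or "Chapter 1":

```python
import re

def strip_front_matter(text, chapter_pattern=r'(?mi)^chapter\s+(one|1)\b'):
    """Drop everything before the first chapter heading, if one is found."""
    match = re.search(chapter_pattern, text)
    return text[match.start():] if match else text

raw = "Table of Contents\nAuthor's Note\nCHAPTER ONE\nIt was a dark night."
clean = strip_front_matter(raw)
# clean now begins at the "CHAPTER ONE" heading
```

Back matter (acknowledgments, previews of the next book) usually needs a manual pass, since its markers vary between editions.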
Chunk the book data into semantic chunks for efficient processing. Use the `create_chunks.py` script to split book text files into semantically coherent chunks:
- Confirm your series and book files are organized in `./books/<series_name>/*.txt`.
- Update the `SERIES_NAME` variable in `create_chunks.py` to match your directory.
- Run the script:

  ```bash
  python create_chunks.py
  ```

- Confirm JSON files are generated in `./chunks/<series_name>/semantic_chunks/` for each chapter.
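The chunking idea can be approximated with a short stdlib sketch. The real `create_chunks.py` groups text by semantic similarity; this toy version only groups paragraphs by size, so treat it as an illustration of the output shape rather than the actual algorithm:

```python
def chunk_paragraphs(text, max_chars=200):
    """Greedily group paragraphs into chunks no longer than max_chars."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        if current and len(current) + len(para) + 2 > max_chars:
            # Adding this paragraph would overflow the chunk; start a new one.
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

text = ("First paragraph. " * 5).strip() + "\n\n" + ("Second paragraph. " * 5).strip()
chunks = chunk_paragraphs(text, max_chars=100)
```

Keeping chunks roughly paragraph-aligned matters because each chunk is later embedded and retrieved as a unit; chunks that split mid-thought retrieve poorly.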
Use this script to extract named entities from your semantic chunks:
- Ensure the required text chunks are already generated in `./chunks/<series_name>/semantic_chunks/`.
- Open `extract_entities.py` and set `TARGET_SERIES_ID` to match the correct `series_id` in `series.yml`.
- Run the script:

  ```bash
  python extract_entities.py
  ```

- Check the `./entities/<series_name>/` directory for generated JSON files containing the extracted entities.
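Conceptually, entity extraction scans the chunks for recurring proper names. The toy frequency-based sketch below is not the actual extractor; it only shows why a `names_to_skip` list (as in the series YAML) is useful, since ordinary capitalized words would otherwise be picked up:

```python
import re
from collections import Counter

def find_candidate_names(chunks, min_count=2, names_to_skip=("The", "A", "It")):
    """Count capitalized tokens across chunks; frequent ones are name candidates."""
    counts = Counter()
    for chunk in chunks:
        # Skip the first word of each sentence so ordinary
        # sentence-initial capitalization is not counted as a name.
        for sentence in re.split(r'[.!?]\s+', chunk):
            tokens = sentence.split()
            for token in tokens[1:]:
                word = token.strip('.,!?;:\'"')
                if word.istitle() and word not in names_to_skip:
                    counts[word] += 1
    return [name for name, n in counts.most_common() if n >= min_count]

chunks = ["Harry met Ron. Later Harry laughed.", "Ron saw Harry near the lake."]
names = find_candidate_names(chunks)
```

The extracted names are what the customizable filters later match against, which is why aliases (`other_names` in the series YAML) need to be mapped back to a single base character.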
Use this script to embed your semantic chunks into the ChromaDB vector store:

- Ensure you have successfully run `create_chunks.py` and `extract_entities.py`.
- Open `embed_chunks.py` and set the `series_metadata_name` variable to match your series.
- Verify that `entities.json` and `series_prod.yml` are correctly configured.
- Run the script:

  ```bash
  python embed_chunks.py
  ```

- Confirm that the embedded documents are stored in the `./chroma_data` directory by checking the ChromaDB collection.
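The embedding step essentially turns each chunk into an (id, embedding, metadata, document) record for ChromaDB. The stdlib sketch below shows a plausible record layout; the id scheme and the hash-based "embedding" are illustrative stand-ins, since the real script uses a sentence-transformer model and its own id convention:

```python
import hashlib

def toy_embedding(text, dims=4):
    """Deterministic stand-in for a sentence-transformer embedding."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dims]]

def build_records(series_metadata_name, book_number, chapter, chunks):
    """Build the records a vector store insert would take, one per chunk."""
    records = []
    for i, chunk in enumerate(chunks):
        records.append({
            # Zero-padded ids keep records sortable by book/chapter/chunk.
            "id": f"{series_metadata_name}|{book_number:02d}|{chapter:03d}|{i:04d}",
            "embedding": toy_embedding(chunk),
            "metadata": {"book_number": book_number, "chapter": chapter},
            "document": chunk,
        })
    return records

records = build_records("harry_potter", 1, 1, ["Mr. Dursley was perfectly normal."])
```

Storing `book_number` and `chapter` as metadata is what makes the spoiler-avoiding filters possible: at query time, retrieval can be restricted to chunks at or before the reader's current position.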
```python
from story_sage import StorySage

# Initialize Story Sage
story_sage = StorySage(
    api_key='your-openai-api-key',
    chroma_path='./chroma_db',
    chroma_collection_name='books_collection',
    entities_dict={'series': {...}},  # Your entities data
    series_list=[{'title': 'Series Title'}],  # Your series data
    n_chunks=5
)

# Ask a question
question = "What motivates the main character in Book 1?"
answer, context = story_sage.invoke(question)

print("Answer:", answer)
print("Context:", context)
```
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Commit your changes with clear messages.
- Open a pull request detailing your changes.
For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License.
This project was created with the help of GitHub Copilot and Connor Tyrell.