How to read the knowledge base in an MD document and use OpenAPI Q&A #5

Closed
ltc17681102655 opened this issue Jun 14, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@ltc17681102655

How can I load a knowledge base from a Markdown (MD) document and run Q&A over it with the OpenAI API?

@HamaWhiteGG
Owner

Can you describe the requirements in more detail? It's not clear to me what needs to be done.

@HamaWhiteGG
Owner

@ltc17681102655 This is supported. See the example code in RetrievalMarkdownExample:

public class RetrievalMarkdownExample {

    public static final String NAMESPACE = "markdown";

    public static void main(String[] args) {
        // Load Notion page as a markdown file
        String path = "docs/extras/use_cases/question_answering/notion_db/";
        var loader = new NotionDirectoryLoader(path);
        var docs = loader.load();
        var mdFile = docs.get(0).getPageContent();

        // Let's create groups based on the section headers in our page
        List<Pair<String, String>> headersToSplitOn = List.of(Pair.of("###", "Section"));
        MarkdownHeaderTextSplitter markdownSplitter = new MarkdownHeaderTextSplitter(headersToSplitOn);
        List<Document> mdHeaderSplits = markdownSplitter.splitText(mdFile);

        // Define our text splitter
        var textSplitter = RecursiveCharacterTextSplitter.builder()
                .chunkSize(500)
                .chunkOverlap(0)
                .keepSeparator(true)
                .build();
        var allSplits = textSplitter.splitDocuments(mdHeaderSplits);

        // Build the Pinecone index and keep the metadata
        var vectorStore = initializePineconeIndex(NAMESPACE, allSplits);

        // Define our metadata
        var metadataFieldInfo = List.of(
                new AttributeInfo("Section", "Part of the document that the text comes from",
                        "string or list[string]"));
        var documentContentDescription = "Major sections of the document";

        // Define self query retriever
        var llm = OpenAI.builder().temperature(0).requestTimeout(30).build().init();
        var retriever = SelfQueryRetriever.fromLLM(llm, vectorStore, documentContentDescription, metadataFieldInfo);

        // Create a chat or Q&A app that is aware of the explicit document structure
        var chat = ChatOpenAI.builder().temperature(0).build().init();
        var qaChain = RetrievalQa.fromChainType(chat, retriever);
        var result = qaChain.run("Summarize the Testing section of the document");
        println(result);
    }
}
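
For readers unfamiliar with header-based splitting, the idea behind the `MarkdownHeaderTextSplitter` step above can be sketched in plain Java. This is only an illustration of the concept, not the library's implementation: the class `MarkdownSectionSketch`, the record `Section`, and the method `splitByHeader` are hypothetical names introduced here, and the sketch assumes headers are written as the marker followed by a space (for example `### Testing`). It also ignores fenced code inside the Markdown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of langchain-java.
class MarkdownSectionSketch {

    /** A block of text plus the header it appeared under (empty string if none). */
    record Section(String header, String content) {}

    /** Splits Markdown text into sections at lines that start with marker + " ". */
    static List<Section> splitByHeader(String markdown, String marker) {
        List<Section> sections = new ArrayList<>();
        String prefix = marker + " ";
        String currentHeader = "";
        StringBuilder body = new StringBuilder();

        for (String line : markdown.split("\n")) {
            if (line.startsWith(prefix)) {
                // A new header closes the section collected so far.
                addIfNotBlank(sections, currentHeader, body);
                currentHeader = line.substring(prefix.length()).trim();
                body.setLength(0);
            } else {
                body.append(line).append('\n');
            }
        }
        // Flush the final section.
        addIfNotBlank(sections, currentHeader, body);
        return sections;
    }

    private static void addIfNotBlank(List<Section> out, String header, StringBuilder body) {
        String text = body.toString().strip();
        if (!text.isEmpty()) {
            out.add(new Section(header, text));
        }
    }

    public static void main(String[] args) {
        String md = "Intro text\n### Setup\nInstall it.\n### Testing\nRun the tests.\n";
        for (Section s : splitByHeader(md, "###")) {
            System.out.println("[" + s.header() + "] " + s.content());
        }
    }
}
```

Keeping the header next to each chunk is what makes a question such as "Summarize the Testing section" answerable by filtering on the `Section` metadata rather than relying on text similarity alone.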

@HamaWhiteGG added the enhancement (New feature or request) label on Jul 13, 2023
@HamaWhiteGG self-assigned this on Jul 13, 2023