The issue you're experiencing, where text before the first markdown header (#) is not ingested in Dify, may be related to how markdown processing is implemented. I couldn't find specific details in the codebase about how text before the first header is handled during ingestion.
However, the MarkdownExtractor class is responsible for loading markdown files and converting them into Document objects, which involves parsing the markdown into tuples of headers and their corresponding text content [1]. This might be where the issue arises, as the text before the first header might not be included in these tuples.
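To make the suspected failure mode concrete, here is a minimal sketch of header-based splitting that keeps the text before the first header instead of dropping it. This is not Dify's actual code: `split_by_headers` is a hypothetical helper written for illustration, and storing the preamble under a `None` header is an assumption about how it could be represented.

```python
import re
from typing import List, Optional, Tuple

# ATX-style header: one to six '#' characters, whitespace, then some text.
HEADER_RE = re.compile(r"^#{1,6}\s+\S")


def split_by_headers(markdown_text: str) -> List[Tuple[Optional[str], str]]:
    """Hypothetical helper (not part of Dify): split markdown into
    (header, text) tuples. Text before the first header is kept and
    stored under a header of None."""
    sections: List[Tuple[Optional[str], str]] = []
    current_header: Optional[str] = None
    current_lines: List[str] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(current_lines).strip()
        # Keep a section if it has a header or any non-blank text.
        if current_header is not None or body:
            sections.append((current_header, body))

    for line in markdown_text.split("\n"):
        # Track fenced code blocks so '# comment' lines inside them
        # are not mistaken for headers.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if not in_fence and HEADER_RE.match(line):
            flush()
            current_header = line.lstrip("#").strip()
            current_lines = []
        else:
            current_lines.append(line)

    flush()
    return sections


if __name__ == "__main__":
    sample = "Any text above the first # is gone!\n\n# Title\nBody"
    for header, text in split_by_headers(sample):
        print(repr(header), "->", repr(text))
```

The important detail is the first call to `flush()`: it runs when the first header is reached, while `current_header` is still `None`, so any preamble text becomes its own section rather than being discarded.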
To address this, you might need to look into the MarkdownExtractor class and see if there's a way to modify it to include text before the first header. Alternatively, you could consider adding a header at the very beginning of your markdown content as a temporary workaround.
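If you want to apply the header workaround automatically before uploading files, a small preprocessing step could look like the sketch below. `ensure_leading_header` and the default title `"Introduction"` are hypothetical names chosen for illustration, not anything provided by Dify.

```python
import re

# ATX-style header: one to six '#' characters, whitespace, then some text.
HEADER_RE = re.compile(r"^#{1,6}\s+\S")


def ensure_leading_header(markdown_text: str, title: str = "Introduction") -> str:
    """Hypothetical preprocessing helper (not part of Dify): if the first
    non-blank line is not a markdown header, prepend one so the leading
    text belongs to a section."""
    for line in markdown_text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if HEADER_RE.match(line):
            return markdown_text  # already starts with a header
        # First non-blank line is ordinary text: add a header above it.
        return f"# {title}\n\n{markdown_text}"
    return markdown_text  # empty or whitespace-only input is left unchanged


if __name__ == "__main__":
    print(ensure_leading_header("Any text above the first # is gone!\n\n# Title\nBody"))
```

This only changes files whose first non-blank line is ordinary text, so documents that already start with a header pass through untouched.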
Self Checks
Dify version
0.14.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Ingest the following markdown into "Knowledge"
The rendered example looks like this
✔️ Expected Behavior
The "Any text above the first # is gone!" text should be ingested.
❌ Actual Behavior
It is gone. As a consequence, there is no chunk for any text ahead of the first #.