This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
There is a problem with URLs being removed from responses during segmentation #11861
Comments
I found a similar unsolved discussion regarding URLs being removed from responses during segmentation. You can view it here [1] (#11856).

To address the issue of URLs being removed during segmentation, you can adjust the configuration settings in Dify. Specifically, you can modify the process_rule configuration to retain URLs by setting the remove_urls_emails rule to false. Here is an example configuration:

    "process_rule": {
      "mode": "custom",
      "rules": {
        "pre_processing_rules": [
          {
            "id": "remove_extra_spaces",
            "enabled": true
          },
          {
            "id": "remove_urls_emails",
            "enabled": false
          }
        ],
        "segmentation": {
          "separator": "\n",
          "max_tokens": 1000
        }
      }
    }

This configuration will prevent URLs from being removed during the segmentation process [2] (https://github.com/langgenius/dify/blob/main/web/app/(commonLayout)/datasets/template/template.en.mdx).
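For context, the template cited as [2] documents process_rule as a field of the Knowledge (dataset) API request body, so one way to apply the rule may be to send it when creating a document through that API. The sketch below only builds such a request with the Python standard library and does not send it. The base URL, the endpoint path `/datasets/{dataset_id}/document/create_by_text`, the `indexing_technique` value, the API key, and the dataset ID are assumptions for illustration; check them against the API reference for your Dify version.

```python
import json
import urllib.request

# Processing rules copied from the example above: keep URLs and emails,
# still collapse extra whitespace.
PROCESS_RULE = {
    "mode": "custom",
    "rules": {
        "pre_processing_rules": [
            {"id": "remove_extra_spaces", "enabled": True},
            {"id": "remove_urls_emails", "enabled": False},
        ],
        "segmentation": {"separator": "\n", "max_tokens": 1000},
    },
}


def build_create_document_request(base_url, api_key, dataset_id, name, text):
    """Build (but do not send) a POST request that carries process_rule.

    The endpoint path and the fields other than process_rule are
    assumptions; verify them for your Dify version.
    """
    url = f"{base_url.rstrip('/')}/datasets/{dataset_id}/document/create_by_text"
    payload = {
        "name": name,
        "text": text,
        "indexing_technique": "high_quality",
        "process_rule": PROCESS_RULE,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_document_request(
        base_url="http://localhost/v1",  # hypothetical self-hosted address
        api_key="dataset-xxxxxxxx",      # placeholder, not a real key
        dataset_id="your-dataset-id",    # placeholder
        name="links.txt",
        text="See https://example.com/docs for details.",
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the JSON body before anything reaches the server.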
How do we change the process_rule configuration? Could you be more specific?
To change the process_rule configuration in Dify, you need to modify the configuration file where these settings are defined. Here's a step-by-step guide:
1. Locate the configuration file: The process_rule settings are typically part of a configuration file in your Dify setup. If you are using a self-hosted Docker deployment, these settings might be in a JSON or YAML file within your project directory.
2. Edit the configuration: Open the configuration file and locate the process_rule section. Modify the remove_urls_emails rule by setting its enabled field to false. Here's an example configuration:

    {
      "process_rule": {
        "mode": "custom",
        "rules": {
          "pre_processing_rules": [
            {
              "id": "remove_extra_spaces",
              "enabled": true
            },
            {
              "id": "remove_urls_emails",
              "enabled": false
            }
          ],
          "segmentation": {
            "separator": "\n",
            "max_tokens": 1000
          }
        }
      }
    }

3. Apply the changes: If you are using Docker, you will need to restart your Docker containers to apply the changes. Use the following commands:

    docker compose down
    docker compose up -d

This process will ensure that URLs are retained during the segmentation process [1] (https://github.com/langgenius/dify/blob/main/web/app/(commonLayout)/datasets/template/template.en.mdx).

The process_rule section appears to be in JSON format. How would it look if I wrote it in docker-compose.yaml?
To convert the process_rule configuration for docker-compose.yaml:

    version: '3.8'
    services:
      your_service_name:
        image: your_image_name
        environment:
          PROCESS_RULE_MODE: "custom"
          PRE_PROCESSING_RULES: |
            - id: remove_extra_spaces
              enabled: true
            - id: remove_urls_emails
              enabled: false
          SEGMENTATION_SEPARATOR: "\n"
          SEGMENTATION_MAX_TOKENS: 1000

Replace your_service_name and your_image_name with your own values.
Self Checks
Dify version
"server_version": "0.11.0"
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
URLs are being removed from responses during segmentation. What is the recommended format for retaining URLs? Is Markdown formatting or special-character escaping required?
✔️ Expected Behavior
URLs remain in the responses even after segmentation.
❌ Actual Behavior
No response
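To make the reported behavior concrete, here is a minimal sketch of how a URL/email-stripping pre-processing step followed by separator-based segmentation could behave. It is not Dify's actual implementation: the regular expressions and the `preprocess` and `segment` helpers are hypothetical, and `max_chars` stands in for the token limit. In this sketch, a plain URL passes through segmentation untouched once the rule is disabled, with no Markdown formatting or escaping involved.

```python
import re

# Hypothetical patterns for illustration; a real implementation may differ.
URL_PATTERN = re.compile(r"https?://\S+")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9.-]+")


def preprocess(text, remove_urls_emails):
    """Optionally strip URLs and email addresses from the text."""
    if remove_urls_emails:
        text = EMAIL_PATTERN.sub("", text)
        text = URL_PATTERN.sub("", text)
    return text


def segment(text, separator="\n", max_chars=1000):
    """Split on the separator, then cut over-long pieces to max_chars."""
    chunks = []
    for piece in text.split(separator):
        piece = piece.strip()
        # Empty pieces produce no chunks because the range is empty.
        for start in range(0, len(piece), max_chars):
            chunks.append(piece[start:start + max_chars])
    return chunks


if __name__ == "__main__":
    doc = "Docs: https://example.com/guide\nContact: admin@example.com"
    # Rule enabled: the URL and the email address are gone from the chunks.
    print(segment(preprocess(doc, remove_urls_emails=True)))
    # Rule disabled: both survive segmentation unchanged.
    print(segment(preprocess(doc, remove_urls_emails=False)))
```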