
This issue was moved to a discussion.


There is a problem with URLs being removed from responses during segmentation #11861

Closed
5 tasks done
doboncho opened this issue Dec 20, 2024 · 6 comments
Labels
🐞 bug Something isn't working

@doboncho

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

"server_version": "0.11.0"

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

URLs are being removed from responses during segmentation. What is the recommended format for retaining URLs? Is Markdown formatting or special character escaping required?

✔️ Expected Behavior

URLs remain in the responses even after segmentation.

❌ Actual Behavior

No response

dosubot bot commented Dec 20, 2024

I found a similar unsolved discussion regarding URLs being removed from responses during segmentation. You can view it here [1].

To address the issue of URLs being removed during segmentation, you can adjust the configuration settings in Dify. Specifically, you can modify the process_rule configuration to retain URLs by setting the remove_urls_emails rule to false. Here is an example configuration:

"process_rule": {
    "mode": "custom",
    "rules": {
        "pre_processing_rules": [
            {
                "id": "remove_extra_spaces",
                "enabled": true
            },
            {
                "id": "remove_urls_emails",
                "enabled": false
            }
        ],
        "segmentation": {
            "separator": "\n",
            "max_tokens": 1000
        }
    }
}

This configuration will prevent URLs from being removed during the segmentation process [2].
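
To make the effect of that flag concrete, here is a minimal Python sketch. It builds the same process_rule structure and runs a toy cleaning step that honors remove_urls_emails and remove_extra_spaces. The clean_text and rule_enabled helpers and both regular expressions are assumptions made for illustration; they are not Dify's actual implementation.

```python
import json
import re

# Same structure as the JSON example above.
process_rule = {
    "mode": "custom",
    "rules": {
        "pre_processing_rules": [
            {"id": "remove_extra_spaces", "enabled": True},
            {"id": "remove_urls_emails", "enabled": False},
        ],
        "segmentation": {"separator": "\n", "max_tokens": 1000},
    },
}

# Hypothetical patterns, for illustration only (not taken from Dify).
URL_PATTERN = re.compile(r"https?://\S+")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def rule_enabled(rule: dict, rule_id: str) -> bool:
    """Return True if the pre-processing rule with this id is switched on."""
    for entry in rule["rules"]["pre_processing_rules"]:
        if entry["id"] == rule_id:
            return bool(entry["enabled"])
    return False


def clean_text(text: str, rule: dict) -> str:
    """Toy pre-processing step (hypothetical, not Dify's code)."""
    if rule_enabled(rule, "remove_urls_emails"):
        text = EMAIL_PATTERN.sub("", text)
        text = URL_PATTERN.sub("", text)
    if rule_enabled(rule, "remove_extra_spaces"):
        # Collapse runs of spaces or tabs into a single space.
        text = re.sub(r"[ \t]{2,}", " ", text)
    return text


sample = "See  https://example.com/docs for details."
print(clean_text(sample, process_rule))
print(json.dumps({"process_rule": process_rule}, indent=4))
```

With remove_urls_emails disabled, the URL in the sample survives the cleaning step; flipping the flag to true in this sketch strips it.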

@dosubot dosubot bot added the 🐞 bug Something isn't working label Dec 20, 2024
@doboncho
Author

doboncho commented Dec 20, 2024 via email

dosubot bot commented Dec 20, 2024

To change the process_rule configuration in Dify, you need to modify the configuration file where these settings are defined. Here’s a step-by-step guide:

  1. Locate the Configuration File: The process_rule settings are typically part of a configuration file in your Dify setup. If you are using a self-hosted Docker deployment, these settings might be in a JSON or YAML file within your project directory.

  2. Edit the Configuration: Open the configuration file and locate the process_rule section. Modify the remove_urls_emails rule by setting its enabled field to false. Here’s an example configuration:

    {
      "process_rule": {
        "mode": "custom",
        "rules": {
          "pre_processing_rules": [
            {
              "id": "remove_extra_spaces",
              "enabled": true
            },
            {
              "id": "remove_urls_emails",
              "enabled": false
            }
          ],
          "segmentation": {
            "separator": "\n",
            "max_tokens": 1000
          }
        }
      }
    }

  3. Apply the Changes: If you are using Docker, you will need to restart your Docker containers to apply the changes. Use the following commands:

    docker compose down
    docker compose up -d

This process will ensure that URLs are retained during the segmentation process [1].
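
Step 2 above can also be done programmatically. The following is a minimal Python sketch that takes JSON text containing a process_rule section and returns it with one pre-processing rule disabled. The thread does not name the file that holds these settings, so the sketch works on a JSON string; the disable_rule helper is a hypothetical name, and whether Dify reads process_rule from such a file is not confirmed here.

```python
import json


def disable_rule(config_text: str, rule_id: str) -> str:
    """Return config_text with the given pre-processing rule disabled."""
    config = json.loads(config_text)
    rules = config["process_rule"]["rules"]["pre_processing_rules"]
    for entry in rules:
        if entry["id"] == rule_id:
            entry["enabled"] = False
            break
    else:
        # Rule not listed yet: add it explicitly in the disabled state.
        rules.append({"id": rule_id, "enabled": False})
    return json.dumps(config, indent=2)


# Example input in which URL removal is still switched on.
original = """
{
  "process_rule": {
    "mode": "custom",
    "rules": {
      "pre_processing_rules": [
        {"id": "remove_extra_spaces", "enabled": true},
        {"id": "remove_urls_emails", "enabled": true}
      ],
      "segmentation": {"separator": "\\n", "max_tokens": 1000}
    }
  }
}
"""

print(disable_rule(original, "remove_urls_emails"))
```

Only the enabled field of the matching rule changes; the segmentation settings pass through untouched.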

@doboncho
Author

The process_rule section seems to be in JSON format. If I were to write it in docker-compose.yaml, what would it look like?

@doboncho
Author

doboncho commented Dec 20, 2024 via email

dosubot bot commented Dec 20, 2024

To convert the process_rule JSON configuration into a docker-compose.yaml format, you can define it as environment variables within the Docker Compose file. Here's how you can structure it:

version: '3.8'
services:
  your_service_name:
    image: your_image_name
    environment:
      PROCESS_RULE_MODE: "custom"
      PRE_PROCESSING_RULES: |
        - id: remove_extra_spaces
          enabled: true
        - id: remove_urls_emails
          enabled: false
      SEGMENTATION_SEPARATOR: "\n"
      SEGMENTATION_MAX_TOKENS: 1000

Replace your_service_name and your_image_name with the appropriate values for your Docker setup. This YAML format sets environment variables that correspond to the JSON structure you provided. After making these changes, remember to restart your Docker containers using docker compose down followed by docker compose up -d to apply the new configuration.
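
Environment variables carry flat strings, so a nested structure like process_rule needs some encoding before it can travel that way. A minimal Python sketch of one option follows: packing the whole rule into a single JSON string and decoding it on the other side. PROCESS_RULE_JSON and both helper functions are hypothetical names used only for illustration, and nothing in this thread confirms that Dify reads process_rule settings from the environment, including the variables in the YAML above.

```python
import json


def encode_process_rule(rule: dict) -> str:
    """Serialize a nested rule into one compact string for an env value."""
    return json.dumps(rule, separators=(",", ":"))


def decode_process_rule(env: dict) -> dict:
    """Read the rule back from an environment-like mapping.

    PROCESS_RULE_JSON is a hypothetical variable name.
    """
    return json.loads(env["PROCESS_RULE_JSON"])


rule = {
    "mode": "custom",
    "rules": {
        "pre_processing_rules": [
            {"id": "remove_extra_spaces", "enabled": True},
            {"id": "remove_urls_emails", "enabled": False},
        ],
        "segmentation": {"separator": "\n", "max_tokens": 1000},
    },
}

# A plain dict stands in for os.environ so the sketch has no side effects.
env = {"PROCESS_RULE_JSON": encode_process_rule(rule)}
restored = decode_process_rule(env)

print(env["PROCESS_RULE_JSON"])
print(restored == rule)
```

A single encoded value keeps the booleans, the integer max_tokens, and the newline separator intact, whereas separate string variables would each need their own type conversion.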

@langgenius langgenius locked and limited conversation to collaborators Dec 20, 2024
@crazywoola crazywoola converted this issue into discussion #11865 Dec 20, 2024
