add new data sources to aws_bedrockagent_data_source #40711

Open: wants to merge 7 commits into base: main

@hjoshi123 (Contributor) commented Dec 28, 2024

Description

This PR adds the data source configurations recently introduced in the Bedrock API, such as Confluence, SharePoint, and Salesforce, to the aws_bedrockagent_data_source resource. The schema and data models have been updated accordingly.

Relations

Closes #40577.
Closes #39770.

References

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DataSourceConfiguration.html#bedrock-Type-agent_DataSourceConfiguration-webConfiguration

Output from Acceptance Testing

% make testacc TESTS=TestAccXXX PKG=ec2

...

@hjoshi123 requested a review from a team as a code owner December 28, 2024 02:18

Community Note

Voting for Prioritization

  • Please vote on this pull request by adding a 👍 reaction to the original post to help the community and maintainers prioritize this pull request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

For Submitters

  • Review the contribution guide relating to the type of change you are making to ensure all of the necessary steps have been taken.
  • For new resources and data sources, use skaff to generate scaffolding with comments detailing common expectations.
  • Whether or not the branch has been rebased will not impact prioritization, but doing so is always a welcome surprise.

github-actions bot added the service/bedrockagent (Issues and PRs that pertain to the bedrockagent service.) and needs-triage (Waiting for first response or review from a maintainer.) labels Dec 28, 2024
@hjoshi123 (Contributor, Author)

I need help writing acceptance tests, since I'm not sure what values the data source configurations should have. It would be great if someone could help with this.

@FireballDWF

Would an AWS CLI example of a "webConfiguration" data source help? I could put one together sometime this week.

@hjoshi123 (Contributor, Author)

Yes, that would certainly help. If possible, examples of the other configurations, such as Salesforce and SharePoint, would be great too.

@FireballDWF

I've started testing my type=Web example. Once that's done, I could work on a Salesforce example. However, I won't share my credentialsSecretArn values and I can't create a Salesforce account for you, so you won't be able to test end to end. It might still be enough to get the CreateDataSource API call to succeed, since I suspect the credentials aren't checked until a sync (ingestion) job is started.

@hjoshi123 (Contributor, Author)

Yup, I totally understand. I don't think we can implement tests for proprietary services (we may need input from the maintainers on how to handle that), but we should be able to test the openly accessible ones.

@FireballDWF commented Dec 31, 2024

AWS CLI example (type=Web):

resource "null_resource" "web_configuration_data_source" {
  provisioner "local-exec" {
    command = "aws bedrock-agent create-data-source --name outposts-web-source --knowledge-base-id ${aws_bedrockagent_knowledge_base.kb.id} --description 'web-configuration for outposts' --data-source-configuration '${local.web_configuration_data_source}'"
  }
}

locals {
  web_configuration_data_source = <<-EOT
 {
  "type": "WEB",
  "webConfiguration": {
    "crawlerConfiguration": {
      "crawlerLimits": {
        "maxPages": 25000,
        "rateLimit": 300
      },
      "exclusionFilters": [
        ".*\\.(txt|csv|md|pdf|doc|docx|xls|xlsx).*",
        ".*/(users|topics|products|contact\\-us|about\\-aws|pricing|privacy)/.*",
        ".*/(terms|getting\\-started)$",
        ".*\\.(github|pages\\.awscloud|awsstatic|oracle)\\.com.*",
        ".*\\.(gov|edu).*",
        ".*/author/.*",
        ".*/tag/.*",
        ".*week\\-in\\-review.*",
        ".*top\\-announcements\\-of\\-aws\\-reinvent.*",
        ".*/category/compute/$",
        ".*/category/(compute/(amazon\\-.*|aws\\-[a-np-z].*|[b-z].*|auto\\-scaling)|[a-bd-z].*|containers|contact\\-center|customer\\-enablement|case\\-study|customer\\-enablement)/.*",
        ".*/blogs/[^/]*/$",
        ".*/\\?.*"
      ],
      "inclusionFilters": [
        ".*/blogs/(compute|containers|networking\\-and\\-content\\-delivery|storage|publicsector|media|awsmarketplace|apn|machine\\-learning|industries|mt|aws|architecture|database)/.*"
      ],
      "scope": "HOST_ONLY"
    },
    "sourceConfiguration": {
      "urlConfiguration": {
        "seedUrls": [
          {
            "url": "https://aws.amazon.com/blogs/compute/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/containers/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/networking-and-content-delivery/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/storage/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/publicsector/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/media/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/awsmarketplace/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/apn/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/machine-learning/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/industries/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/mt/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/aws/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/architecture/category/compute/aws-outposts/"
          },
          {
            "url": "https://aws.amazon.com/blogs/database/category/compute/aws-outposts/"
          }
        ]
      }
    }
  }
}
EOT
}
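Before reusing the JSON above as an acceptance-test fixture, a quick decode can confirm the field names and that the filter patterns are at least well-formed. The Go sketch below decodes a trimmed copy (two exclusion filters, one inclusion filter, one seed URL); the struct and function names are hypothetical. HCL heredocs do not interpret backslash escapes, so the \\- sequences reach the API as JSON escapes that decode to \-, and a Go raw string reproduces that. Compiling the patterns with Go's RE2 engine is only an approximation, since the regex flavour Bedrock applies is not stated in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// webDataSource mirrors only the fields of the example that a test might
// assert on (hypothetical type, not the provider's data model).
type webDataSource struct {
	Type             string `json:"type"`
	WebConfiguration struct {
		CrawlerConfiguration struct {
			CrawlerLimits struct {
				MaxPages  int `json:"maxPages"`
				RateLimit int `json:"rateLimit"`
			} `json:"crawlerLimits"`
			ExclusionFilters []string `json:"exclusionFilters"`
			InclusionFilters []string `json:"inclusionFilters"`
			Scope            string   `json:"scope"`
		} `json:"crawlerConfiguration"`
		SourceConfiguration struct {
			URLConfiguration struct {
				SeedURLs []struct {
					URL string `json:"url"`
				} `json:"seedUrls"`
			} `json:"urlConfiguration"`
		} `json:"sourceConfiguration"`
	} `json:"webConfiguration"`
}

// fixture is a trimmed copy of the heredoc; the raw string keeps "\\-"
// literally, just as the HCL heredoc does.
const fixture = `{
  "type": "WEB",
  "webConfiguration": {
    "crawlerConfiguration": {
      "crawlerLimits": {"maxPages": 25000, "rateLimit": 300},
      "exclusionFilters": [".*/author/.*", ".*/(terms|getting\\-started)$"],
      "inclusionFilters": [".*/blogs/(compute|storage)/.*"],
      "scope": "HOST_ONLY"
    },
    "sourceConfiguration": {
      "urlConfiguration": {
        "seedUrls": [{"url": "https://aws.amazon.com/blogs/compute/category/compute/aws-outposts/"}]
      }
    }
  }
}`

// parseWebDataSource decodes the JSON and rejects filters that RE2 cannot compile.
func parseWebDataSource(raw string) (webDataSource, error) {
	var ds webDataSource
	if err := json.Unmarshal([]byte(raw), &ds); err != nil {
		return ds, err
	}
	cc := ds.WebConfiguration.CrawlerConfiguration
	patterns := append(append([]string{}, cc.ExclusionFilters...), cc.InclusionFilters...)
	for _, p := range patterns {
		if _, err := regexp.Compile(p); err != nil {
			return ds, fmt.Errorf("filter %q: %w", p, err)
		}
	}
	return ds, nil
}

func main() {
	ds, err := parseWebDataSource(fixture)
	if err != nil {
		fmt.Println("bad fixture:", err)
		return
	}
	fmt.Println(ds.Type, ds.WebConfiguration.CrawlerConfiguration.Scope,
		len(ds.WebConfiguration.SourceConfiguration.URLConfiguration.SeedURLs))
}
```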

@ewbankkit added the enhancement (Requests to existing resources that expand the functionality or scope.) label and removed the needs-triage (Waiting for first response or review from a maintainer.) label Jan 5, 2025
@ewbankkit self-assigned this Jan 5, 2025
github-actions bot added the prioritized (Part of the maintainer teams immediate focus. To be addressed within the current quarter.) label Jan 5, 2025
@hjoshi123 (Contributor, Author)

@ewbankkit I'm writing tests for the web configuration, but I'm not sure how we should handle Salesforce, SharePoint, and Confluence. Any guidance?

@FireballDWF

@hjoshi123 I updated my earlier comment with a revised example that provides valid values for a few more attributes, so consider adding them to your test case.

@hjoshi123 (Contributor, Author)

Thank you, @FireballDWF!

@ewbankkit removed their assignment Jan 7, 2025
github-actions bot added the documentation (Introduces or discusses updates to documentation.) label Jan 8, 2025
…to f-aws_bedrockagent_data_source-newdatasources
@hjoshi123 (Contributor, Author)

@FireballDWF In the web configuration, user_agent is optional, but what value would it have for the example you sent?

@FireballDWF

I didn't use one. You can supply any string that meets the length constraints ("Minimum length of 15. Maximum length of 40.") per the API documentation at https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_WebCrawlerConfiguration.html

It's really just passed through as the User-Agent string used by the HTTP/S client.
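The length constraint quoted above is easy to enforce client-side. Below is a minimal Go sketch, assuming the 15 to 40 character bounds from the WebCrawlerConfiguration reference and treating an empty string as "not set" because user_agent is optional; validateUserAgent is a hypothetical helper, not the provider's code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Bounds taken from the WebCrawlerConfiguration API reference quoted above.
const (
	userAgentMinLen = 15
	userAgentMaxLen = 40
)

// validateUserAgent only checks length, counting characters (runes) rather
// than bytes. An empty string is accepted because the attribute is optional.
func validateUserAgent(ua string) error {
	if ua == "" {
		return nil
	}
	n := utf8.RuneCountInString(ua)
	if n < userAgentMinLen || n > userAgentMaxLen {
		return fmt.Errorf("user_agent must be %d-%d characters, got %d", userAgentMinLen, userAgentMaxLen, n)
	}
	return nil
}

func main() {
	for _, ua := range []string{"", "tf-acc-test", "terraform-acc-test-agent"} {
		fmt.Printf("%q -> %v\n", ua, validateUserAgent(ua))
	}
}
```

If the resource is built on the Terraform Plugin Framework, the same rule would more naturally be a declarative schema validator (for example stringvalidator.LengthBetween from terraform-plugin-framework-validators) rather than a hand-written function.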

@hjoshi123 (Contributor, Author)

@FireballDWF I wrote a test based on your input; let me know if it works. Also, I'm not sure how to run it, since I saw that some of the tests involve PostgreSQL-related commands.

Labels: documentation, enhancement, prioritized, service/bedrockagent
Projects: None yet
Participants: 3