Merge pull request #2825 from shansv/shansv-feature-eventbridge-scheduled-stepfunction-bedrock-kb-sync
New serverless pattern - eventbridge-scheduled-stepfunction-bedrock-kb-sync
2 parents 7ae7e8c + ac401e1 commit 2e1cec9

53 files changed, +3943 -0 lines changed

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
# Amazon Bedrock Knowledge Base Synchronization Flow with Amazon EventBridge Scheduler

This pattern demonstrates an automated synchronization process for Amazon Bedrock Knowledge Bases using Amazon EventBridge Scheduler and AWS Step Functions. The solution enables periodic synchronization of data sources, ensuring your Knowledge Base stays up-to-date with the latest content.

Learn more about this pattern at Serverless Land Patterns: https://serverlessland.com/patterns/eventbridge-scheduled-stepfunction-bedrock-kb-sync

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

## Architecture

![Architecture diagram](docs/images/KBSyncPipeline.jpg)

## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed
* [AWS Cloud Development Kit](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) (AWS CDK) installed

## Deployment Instructions

1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
```
git clone https://github.com/aws-samples/serverless-patterns
```
2. Change directory to the pattern directory:
```
cd serverless-patterns/eventbridge-scheduled-stepfunction-bedrock-kb-sync/cdk
```
3. Set up the local developer environment and dependencies:
```
make bootstrap-venv
source .venv/bin/activate
```
4. From the command line, configure AWS CDK:
```bash
cdk bootstrap
```
5. From the command line, use AWS CDK to deploy the AWS resources for the pattern:
```bash
cdk deploy --all
```
6. This command takes some time to run. After it completes successfully, the following stacks are deployed:
```
KbRoleStack
CommonLambdaLayerStack
OSSStack
KbSyncPipelineStack
KbInfraStack
```

## How it works

Pattern Overview: This is a scheduled, serverless workflow that automates the synchronization of Amazon Bedrock Knowledge Bases using Amazon EventBridge Scheduler, AWS Step Functions, and Amazon Bedrock.

Key Components:

a) EventBridge Scheduler
- Runs every 15 minutes
- Triggers the Step Functions workflow
- Passes the Amazon Bedrock Knowledge Base ID as an input parameter
- Enables consistent and automated synchronization

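The scheduler settings listed above can be sketched as the request that EventBridge Scheduler's `create_schedule` API expects. This is an illustration only, not the pattern's CDK code: the schedule name, the placeholder ARNs, and the `knowledgeBaseId` input key are assumptions made for the example.

```python
import json


def build_schedule_request(state_machine_arn: str, role_arn: str, knowledge_base_id: str) -> dict:
    """Build keyword arguments for EventBridge Scheduler's create_schedule call."""
    return {
        "Name": "kb-sync-every-15-minutes",  # hypothetical schedule name
        "ScheduleExpression": "rate(15 minutes)",  # run every 15 minutes
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the scheduled time
        "Target": {
            "Arn": state_machine_arn,  # state machine to start
            "RoleArn": role_arn,  # role the scheduler assumes to start it
            # The input key name is an assumption; the workflow defines the real one.
            "Input": json.dumps({"knowledgeBaseId": knowledge_base_id}),
        },
    }


if __name__ == "__main__":
    request = build_schedule_request(
        "arn:aws:states:us-east-1:123456789012:stateMachine:Example",  # placeholder
        "arn:aws:iam::123456789012:role/ExampleSchedulerRole",  # placeholder
        "EXAMPLEKBID",  # placeholder
    )
    print(json.dumps(request, indent=2))
    # To create the schedule (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("scheduler").create_schedule(**request)
```
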
b) Step Functions Workflow

Main Flow:
- Receives Knowledge Base ID from EventBridge
- Orchestrates the entire synchronization process
- Handles error scenarios and retries
- Manages parallel processing of multiple data sources

Step 1: Data Source Retrieval
- Queries all associated data sources for the given Knowledge Base ID
- Prepares the list for processing
- Validates data source configurations

Step 2: Map State for Parallel Processing
- Iterates through each data source
- Processes multiple data sources concurrently
- Manages state for each sync operation

Step 3: Synchronization Process (For each data source)
- Initiates the sync operation
- Monitors sync status
- Handles completion and failures
- Reports sync results

Step 4: Status Reporting
- Aggregates sync results
- Records success/failure metrics
- Generates summary reports

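The four steps above can be approximated in plain Python against the Amazon Bedrock Agent API (`list_data_sources`, `start_ingestion_job`, `get_ingestion_job`). This is a sketch under stated assumptions, not the deployed state machine: it processes data sources one after another rather than concurrently as a Map state would, it omits retries, and the function names and polling interval are chosen for illustration.

```python
import time


def list_data_source_ids(client, knowledge_base_id):
    """Step 1: collect every data source ID attached to the Knowledge Base."""
    ids, token = [], None
    while True:
        kwargs = {"knowledgeBaseId": knowledge_base_id}
        if token:
            kwargs["nextToken"] = token
        page = client.list_data_sources(**kwargs)
        ids.extend(s["dataSourceId"] for s in page.get("dataSourceSummaries", []))
        token = page.get("nextToken")
        if not token:
            return ids


def sync_data_source(client, knowledge_base_id, data_source_id, poll_seconds=30):
    """Step 3: start an ingestion job and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
    )["ingestionJob"]
    while job["status"] not in ("COMPLETE", "FAILED", "STOPPED"):
        time.sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return {"dataSourceId": data_source_id, "status": job["status"]}


def sync_knowledge_base(client, knowledge_base_id, poll_seconds=30):
    """Steps 2 and 4: sync each data source (sequentially here) and aggregate results."""
    results = [
        sync_data_source(client, knowledge_base_id, ds_id, poll_seconds)
        for ds_id in list_data_source_ids(client, knowledge_base_id)
    ]
    return {
        "succeeded": [r["dataSourceId"] for r in results if r["status"] == "COMPLETE"],
        "failed": [r["dataSourceId"] for r in results if r["status"] != "COMPLETE"],
    }
```

With boto3 installed and credentials configured, `sync_knowledge_base(boto3.client("bedrock-agent"), "<your-knowledge-base-id>")` would run the same sequence against a real Knowledge Base.
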
## Testing

Step 1: Upload Sample Documents to Amazon S3
- Navigate to Amazon S3 in the AWS Console
- Locate the bucket named `kb-data-source-{account-id}`
- Upload your sample documents to this bucket

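As an alternative to the console, the upload in Step 1 can be scripted. The sketch below assumes the bucket follows the `kb-data-source-{account-id}` naming shown above and that each file is stored under its own file name as the object key; the helper names are hypothetical.

```python
from pathlib import Path


def data_source_bucket(account_id: str) -> str:
    """Return the data source bucket name for the given AWS account ID."""
    return f"kb-data-source-{account_id}"


def upload_documents(s3_client, account_id: str, folder: str) -> list:
    """Upload every file in `folder` to the data source bucket; return the keys used."""
    bucket = data_source_bucket(account_id)
    uploaded = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # The object key is the bare file name (an assumption for this sketch).
            s3_client.upload_file(str(path), bucket, path.name)
            uploaded.append(path.name)
    return uploaded
```

With boto3, `s3_client` would be `boto3.client("s3")`, and the account ID can be read from `boto3.client("sts").get_caller_identity()["Account"]`.
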
Step 2: Wait for Scheduler Execution
- The EventBridge scheduler is configured to run every 15 minutes
- You can monitor the scheduler in the EventBridge console
- Note: The next execution will occur at the next 15-minute interval

Step 3: Monitor Step Function Execution
- Navigate to the AWS Step Functions console
- Locate the state machine named `KnowledgeBaseSyncStateMachine` and open its most recent execution
- Monitor the workflow progress through different states
- Verify successful completion of all steps

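The check in Step 3 can also be done programmatically with the Step Functions `list_executions` API, which returns the most recent execution first. This is a minimal sketch; the state machine ARN must be looked up for your deployment, and the helper name is hypothetical.

```python
def latest_execution_status(sfn_client, state_machine_arn: str):
    """Return the status of the most recent execution, or None if none exist."""
    response = sfn_client.list_executions(
        stateMachineArn=state_machine_arn,
        maxResults=1,  # results are ordered newest first
    )
    executions = response.get("executions", [])
    return executions[0]["status"] if executions else None
```

With boto3, `sfn_client` would be `boto3.client("stepfunctions")`; a status of `SUCCEEDED` indicates that the workflow finished without an unhandled error.
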
Step 4: Verify Sync Status in Amazon Bedrock
- Go to the Amazon Bedrock console
- Navigate to Knowledge Bases
- Select your Knowledge Base
- Click on Data Sources
- Check the Sync History tab
- Verify the sync status shows as "Completed"
- Review sync details including:
  - Timestamp of sync
  - Number of documents processed
  - Any errors or warnings

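The same sync details can be read through the Amazon Bedrock Agent `list_ingestion_jobs` API. The sketch below fetches the most recent ingestion job for one data source and extracts a few fields; the helper names and the selection of fields are assumptions for illustration, and the API reports a finished job as `COMPLETE`, which may be worded differently in the console.

```python
def latest_ingestion_job(client, knowledge_base_id, data_source_id):
    """Return the most recently started ingestion job summary, or None."""
    response = client.list_ingestion_jobs(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        sortBy={"attribute": "STARTED_AT", "order": "DESCENDING"},
        maxResults=1,
    )
    jobs = response.get("ingestionJobSummaries", [])
    return jobs[0] if jobs else None


def summarize(job):
    """Extract the status, start time, and document counts from a job summary."""
    stats = job.get("statistics", {})
    return {
        "status": job["status"],
        "startedAt": job.get("startedAt"),
        "documentsScanned": stats.get("numberOfDocumentsScanned"),
        "documentsFailed": stats.get("numberOfDocumentsFailed"),
    }
```

With boto3, `client` would be `boto3.client("bedrock-agent")`.
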
Step 5: Validation Points
- Confirm documents are indexed
- Check sync completion status
- Verify no errors in sync history
- Ensure all uploaded documents are processed

Troubleshooting

If sync fails or documents aren't appearing:
- Check S3 bucket permissions
- Review Step Function execution logs
- Verify document format compatibility
- Check Knowledge Base configuration

![KB Pipeline Architecture](docs/images/KBSyncPipeline.png)

## Delete stack

```bash
cdk destroy --all
```
----
Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# CDK asset staging directory
.cdk.staging
cdk.out

# Misc
unittests.xml

.coverage
cov.xml
