generated from oracle/template-repo
Enhancement/vector store oci refresh #324
Open
ViliTajnic wants to merge 41 commits into main from enhancement/vector-store-oci-refresh
+4,565
−402
Add comprehensive functionality to automatically refresh vector stores when documents are added or modified in OCI Object Storage buckets while preserving original embedding parameters.
Key features:
- Change detection using object metadata (etag, time_modified)
- Parameter preservation from existing vector stores
- Incremental processing of only new/modified files
- New REST API endpoint for refresh operations
- Comprehensive status reporting
Files modified:
- common/schema.py: Add VectorStoreRefreshRequest and VectorStoreRefreshStatus schemas
- server/api/utils/oci.py: Add get_bucket_objects_with_metadata() and detect_changed_objects()
- server/api/utils/embed.py: Add refresh functionality with get_vector_store_by_alias(), get_processed_objects_metadata(), and refresh_vector_store_from_bucket()
- server/api/v1/embed.py: Add POST /v1/embed/refresh endpoint
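
The change detection described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from server/api/utils/oci.py: the function name `detect_changed_objects_sketch`, its signature, and the dictionary keys are assumptions made for the example. It compares each bucket object against previously recorded metadata, preferring the etag and falling back to time_modified.

```python
from typing import Dict, List, Tuple


def detect_changed_objects_sketch(
    bucket_objects: List[dict],
    processed: Dict[str, dict],
) -> Tuple[List[dict], List[dict]]:
    """Split bucket objects into (new, modified) relative to recorded metadata.

    Hypothetical helper: key names ("name", "etag", "time_modified") are assumed.
    """
    new_objects: List[dict] = []
    modified_objects: List[dict] = []
    for obj in bucket_objects:
        previous = processed.get(obj["name"])
        if previous is None:
            # Never embedded before
            new_objects.append(obj)
            continue
        if previous.get("etag") and obj.get("etag"):
            # Prefer the etag when both sides have one
            changed = previous["etag"] != obj["etag"]
        else:
            # Otherwise fall back to the modification timestamp
            changed = previous.get("time_modified") != obj.get("time_modified")
        if changed:
            modified_objects.append(obj)
    return new_objects, modified_objects
```

Only the objects returned in the two lists would need to be re-embedded; everything else is skipped.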
to use last minimal SpringBoot version and the sys prompt defined for vector search.
Signed-off-by: Christopher Jones <christopher.jones@oracle.com>
* Added Unit Tests
* Updated Documents
* Updated Images
* Fix Release Action
* Shift pyproject.toml
* Expose FastMCP endpoints
Add a reference to the new feature for auto-refreshing vector stores from OCI Object Storage buckets in the AI Optimizer Features section.
- Implemented get_processed_objects_metadata() to retrieve metadata from vector stores
- Added ETag-based change detection for OCI Object Storage files
- Support for both new metadata format (filename/etag) and legacy format (source)
- Added get_total_chunks_count() helper function
- Updated refresh_vector_store endpoint to skip already-processed files

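
As a rough sketch of how the two metadata formats might be reconciled, the hypothetical helper below groups chunk metadata by file. The function name, the returned structure, and the choice to take the basename of the legacy `source` value are all assumptions for illustration; the PR's get_processed_objects_metadata() may differ.

```python
import os
from typing import Dict, Iterable, Optional


def collect_processed_objects(chunk_metadata: Iterable[dict]) -> Dict[str, dict]:
    """Group chunk metadata by file, accepting both new and legacy formats."""
    processed: Dict[str, dict] = {}
    for meta in chunk_metadata:
        etag: Optional[str] = None
        if "filename" in meta:
            # New format: explicit filename plus etag
            name = meta["filename"]
            etag = meta.get("etag")
        elif "source" in meta:
            # Legacy format: only a source path, no etag (assumed to be path-like)
            name = os.path.basename(meta["source"])
        else:
            # Chunk carries no file information; skip it
            continue
        entry = processed.setdefault(name, {"etag": etag, "chunks": 0})
        entry["chunks"] += 1
        if entry["etag"] is None and etag is not None:
            entry["etag"] = etag
    return processed
```

Files known only through the legacy format end up with no etag, so a caller would have to decide whether to treat them as unchanged or re-process them.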

- Fix vector store refresh endpoint: replace core_oci.get_oci() with utils_oci.get()
  - core_oci module was removed in main branch (PR #312)
  - Updated refresh_vector_store in src/server/api/v1/embed.py
- Fix ValueError in client when vector store no longer exists
  - Added validation check in src/client/utils/st_common.py
  - Handles case where previously selected vector store is filtered out
  - Resets to empty selection instead of crashing
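
The client-side fix can be illustrated with a small sketch. It assumes the ValueError came from looking up a stale selection in the current option list (as `list.index` does); the helper names below are hypothetical and not taken from st_common.py.

```python
from typing import List


def resolve_selection(previous: str, options: List[str]) -> str:
    """Return the previous selection if it is still offered, else an empty selection."""
    return previous if previous in options else ""


def selection_index(previous: str, options: List[str]) -> int:
    """Index into ["", *options]; never raises for a stale selection."""
    choices = [""] + options
    return choices.index(resolve_selection(previous, options))
```

Without the membership check, `choices.index(previous)` would raise ValueError whenever the previously selected vector store has been dropped or filtered out.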
This commit merges the latest changes from main branch and adds extensive documentation for IDE integration to address issue #299.
Changes from main merge:
- Updated embed.py utility functions
- Added webscrape.py for web scraping functionality
- Updated embed API endpoints
- Resolved .gitignore conflict
New IDE Integration Documentation:
- Created comprehensive guide: docs/content/advanced/ide_integration.md
- Covers OpenAI-compatible REST API integration
- Includes MCP (Model Context Protocol) integration details
- Provides setup guides for:
  * Continue.dev
  * Cline
  * Cursor
  * Aider
  * Custom integrations
- Includes API reference, examples, and troubleshooting
- Documents RAG-powered development workflow
- Covers SelectAI integration for IDEs
- Provides cURL, Python, and Node.js examples
Addresses: #299 (IDE Integration)
The IDE integration documentation has been removed from this branch. Addresses: #299 (IDE Integration)
- Add validation to prevent empty vector store names
- Implement file listing endpoint to view embedded files
- Display file metadata (name, chunks, size, modified date)
- Enhance metadata capture for all file sources (local, SQL, web, OCI)
- Add orphaned chunk detection and reporting
- Auto-hide empty columns in file list display
Fixed issue where files refreshed from OCI Object Storage buckets were not showing size and modified date metadata in the file listing.
Root cause: The OCI list_objects API call was not requesting metadata fields. By default, it only returns object names without size, etag, or modification time.
Changes:
- Added 'fields' parameter to list_objects API call in oci.py to explicitly request name, size, etag, timeModified, and md5 fields
- Enhanced refresh_vector_store_from_bucket() in embed.py to build file_metadata dict from bucket objects and pass to document loader
- Updated process_metadata() to add etag field to chunk metadata
- Fixed Decimal to int conversion for size field in get_vector_store_files()
- Added summary logging for OCI metadata retrieval
Testing: Verified that files refreshed from OCI buckets now display correct size and modification date in the file listing UI.
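
Two of the changes above lend themselves to a short sketch: the field list passed to list_objects and the Decimal-to-int conversion for sizes. The constant and helper names are hypothetical, and the OCI call is shown only as a comment because it needs live credentials; the snippet illustrates why a Decimal size has to be converted before it can be serialized as JSON.

```python
import json
from decimal import Decimal
from typing import Any, Optional

# Fields named in the commit message; with the OCI Python SDK these would be
# passed roughly as: client.list_objects(namespace, bucket, fields=LIST_FIELDS)
# (not executed here).
LIST_FIELDS = "name,size,etag,timeModified,md5"


def size_to_int(value: Any) -> Optional[int]:
    """Convert a size read from the database (possibly a Decimal) to a plain int."""
    if value is None:
        return None
    return int(value)


# A Decimal is not JSON serializable as-is, but the converted int is.
row = {"filename": "report.pdf", "size": Decimal("2048")}
payload = json.dumps({"filename": row["filename"], "size": size_to_int(row["size"])})
```

Calling `json.dumps` on the unconverted row would raise a TypeError, which is the kind of failure the conversion avoids.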
Changed unused col1 variable to underscore to indicate it's intentionally unused (only col2 is used for the refresh button).
Tests for new functionality:
- OCI bucket object metadata retrieval with fields parameter
- Change detection for new and modified files
- File listing from vector stores with metadata
- Oracle Decimal to int conversion
- Orphaned chunk detection
- Old metadata format fallback
Test coverage:
- 10 tests for OCI refresh functions (get_bucket_objects_with_metadata, detect_changed_objects)
- 6 tests for file listing (get_vector_store_files)
- 3 integration tests for new API endpoints
All 21 unit tests passing with comprehensive edge case coverage.
Improvements to the Split/Embed tool UI:
- Add toggle control to switch between "Create New Vector Store" (default) and "Use Existing Vector Store" modes
- When creating new VS: show simple text input for vector store name, display all configuration options (chunk size, overlap, distance metric, index type)
- When using existing VS: hide configuration options (already defined by VS), filter dropdown to show only vector stores created with the same embedding model to prevent mixing embeddings
- Show full vector store table name in both modes
- Improved validation messages and help text
- Prevents potential issues with mixing embeddings from different models in the same vector store
This simplifies the UI and makes the distinction between creating new vs using existing vector stores much clearer.
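
The dropdown filtering for "Use Existing Vector Store" mode could look something like the sketch below. The record layout (`vector_store` and `model` keys) and the function name are assumptions for illustration only; the actual client code may store this information differently.

```python
from typing import Dict, List


def stores_for_model(vector_stores: List[Dict[str, str]], embedding_model: str) -> List[str]:
    """Return names of vector stores created with the given embedding model."""
    return [
        vs["vector_store"]
        for vs in vector_stores
        if vs.get("model") == embedding_model
    ]
```

Offering only matching stores keeps embeddings from different models out of the same table, since vectors produced by different models are not comparable.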
Addresses pylint warnings (R0912: too-many-branches, R0915: too-many-statements) by extracting helper functions to improve code organization and maintainability.
Changes:
- Extract _render_create_new_vs_input() for create new mode UI
- Extract _render_use_existing_vs_input() for use existing mode UI
- Extract _validate_vector_store_alias() for validation logic
- Extract _display_vector_store_info() for VS display and file list
- Refactor main _render_vector_store_section() to use helpers
Result: Main function reduced from 129 lines to 45 lines, with only 2 branches instead of 14, making it more maintainable and passing pylint complexity checks.
Summary
Adds OCI Object Storage refresh capability for vector stores with intelligent duplicate detection and metadata tracking.
Feature
OCI Vector Store Refresh synchronizes vector stores with documents in OCI Object Storage buckets. It automatically detects new and modified files, processes only the changes, and avoids re-embedding unchanged documents.
How It Works
Key Benefits
Usage