Enhancement/vector store oci refresh #324

ViliTajnic · 2025-11-04T13:26:24Z

Summary

Adds an OCI Object Storage refresh capability for vector stores, with ETag-based duplicate detection and file metadata tracking.

Feature

OCI Vector Store Refresh synchronizes vector stores with documents in OCI Object Storage buckets. It automatically detects new and modified files, processes only the changes, and avoids re-embedding unchanged documents.

How It Works

Compares OCI bucket contents with existing vector store metadata
Identifies new and modified files using ETags and timestamps
Downloads and embeds only changed files
Updates vector store incrementally while preserving existing content
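The comparison step above can be sketched roughly as follows. This is a minimal illustration, not the PR's `detect_changed_objects()` implementation: the function name `split_by_change`, the dict shapes, and the rule "compare ETags when both sides have one, otherwise fall back to the modification timestamp" are all assumptions made for the example.

```python
from typing import Dict, List


def _has_changed(obj: dict, known: dict) -> bool:
    """Assumed rule: prefer ETag comparison, fall back to the timestamp."""
    if obj.get("etag") and known.get("etag"):
        return obj["etag"] != known["etag"]
    return obj.get("time_modified") != known.get("time_modified")


def split_by_change(bucket_objects: List[dict], processed: Dict[str, dict]) -> Dict[str, List[dict]]:
    """Group bucket objects into new, modified, and unchanged.

    bucket_objects: hypothetical listing of the bucket, one dict per object.
    processed: hypothetical map of file name -> metadata already stored
    with the embedded chunks.
    """
    groups: Dict[str, List[dict]] = {"new": [], "modified": [], "unchanged": []}
    for obj in bucket_objects:
        known = processed.get(obj["name"])
        if known is None:
            groups["new"].append(obj)        # never embedded before
        elif _has_changed(obj, known):
            groups["modified"].append(obj)   # embedded, but content differs
        else:
            groups["unchanged"].append(obj)  # skip: no re-embedding needed
    return groups
```

Only the `new` and `modified` groups would then be downloaded and embedded; the `unchanged` group is what makes the refresh incremental.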

Key Benefits

Efficiency: Only processes changed files, not the entire bucket
Duplicate Prevention: Skips already-embedded files using ETag comparison
Metadata Tracking: Stores file size, modified date, and ETag with chunks
File Listing: View all embedded files with statistics

Usage

Select vector store and OCI bucket
Click "Refresh from OCI"
View results showing new/updated files and chunks processed

Add comprehensive functionality to automatically refresh vector stores when documents are added or modified in OCI Object Storage buckets while preserving original embedding parameters.

Key features:
- Change detection using object metadata (etag, time_modified)
- Parameter preservation from existing vector stores
- Incremental processing of only new/modified files
- New REST API endpoint for refresh operations
- Comprehensive status reporting

Files modified:
- common/schema.py: Add VectorStoreRefreshRequest and VectorStoreRefreshStatus schemas
- server/api/utils/oci.py: Add get_bucket_objects_with_metadata() and detect_changed_objects()
- server/api/utils/embed.py: Add refresh functionality with get_vector_store_by_alias(), get_processed_objects_metadata(), and refresh_vector_store_from_bucket()
- server/api/v1/embed.py: Add POST /v1/embed/refresh endpoint

- Implemented get_processed_objects_metadata() to retrieve metadata from vector stores
- Added ETag-based change detection for OCI Object Storage files
- Support for both new metadata format (filename/etag) and legacy format (source)
- Added get_total_chunks_count() helper function
- Updated refresh_vector_store endpoint to skip already-processed files
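As a rough sketch of what supporting both metadata formats could look like: the helper below reads a file name from `filename` when present and otherwise derives it from the legacy `source` field. The function name `index_processed`, the use of the path basename for legacy entries, and the input shape (an iterable of per-chunk metadata dicts) are assumptions for illustration, not the PR's actual `get_processed_objects_metadata()`.

```python
import os
from typing import Dict, Iterable, Optional


def index_processed(chunk_metadata: Iterable[dict]) -> Dict[str, Optional[str]]:
    """Map each embedded file name to its stored ETag (None if unknown)."""
    processed: Dict[str, Optional[str]] = {}
    for meta in chunk_metadata:
        # New format: explicit filename. Legacy format: only a source path.
        name = meta.get("filename")
        if not name and meta.get("source"):
            name = os.path.basename(meta["source"])  # assumed mapping
        if not name:
            continue  # chunk cannot be attributed to a file
        etag = meta.get("etag")
        # Keep a real ETag if any chunk of the file carries one.
        if etag or name not in processed:
            processed[name] = etag
    return processed
```

Files that resolve to `None` have no stored ETag, so a refresh would need another signal (or a re-embed) to decide whether they changed.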

- Fix vector store refresh endpoint: replace core_oci.get_oci() with utils_oci.get()
  - core_oci module was removed in main branch (PR #312)
  - Updated refresh_vector_store in src/server/api/v1/embed.py
- Fix ValueError in client when vector store no longer exists
  - Added validation check in src/client/utils/st_common.py
  - Handles case where previously selected vector store is filtered out
  - Resets to empty selection instead of crashing

This commit merges the latest changes from main branch and adds extensive documentation for IDE integration to address issue #299.

Changes from main merge:
- Updated embed.py utility functions
- Added webscrape.py for web scraping functionality
- Updated embed API endpoints
- Resolved .gitignore conflict

New IDE Integration Documentation:
- Created comprehensive guide: docs/content/advanced/ide_integration.md
- Covers OpenAI-compatible REST API integration
- Includes MCP (Model Context Protocol) integration details
- Provides setup guides for:
  * Continue.dev
  * Cline
  * Cursor
  * Aider
  * Custom integrations
- Includes API reference, examples, and troubleshooting
- Documents RAG-powered development workflow
- Covers SelectAI integration for IDEs
- Provides cURL, Python, and Node.js examples

Addresses: #299 (IDE Integration)

- Add validation to prevent empty vector store names
- Implement file listing endpoint to view embedded files
- Display file metadata (name, chunks, size, modified date)
- Enhance metadata capture for all file sources (local, SQL, web, OCI)
- Add orphaned chunk detection and reporting
- Auto-hide empty columns in file list display
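A file listing of this kind can be pictured as an aggregation over per-chunk metadata. The sketch below is only an illustration: `summarize_files` is a hypothetical name, the metadata keys are assumed, and "orphaned" is taken here to mean a chunk whose metadata carries no file name, which may differ from the definition used in the PR.

```python
from typing import Dict, Iterable


def summarize_files(chunk_metadata: Iterable[dict]) -> dict:
    """Aggregate chunk metadata into one row per file plus an orphan count."""
    files: Dict[str, dict] = {}
    orphaned = 0
    for meta in chunk_metadata:
        name = meta.get("filename")
        if not name:
            orphaned += 1  # assumed definition of an orphaned chunk
            continue
        row = files.setdefault(
            name,
            {"filename": name, "chunks": 0, "size": None, "time_modified": None},
        )
        row["chunks"] += 1
        # Take size and modified date from the first chunk that has them.
        for key in ("size", "time_modified"):
            if row[key] is None and meta.get(key) is not None:
                row[key] = meta[key]
    rows = sorted(files.values(), key=lambda r: r["filename"])
    return {"files": rows, "orphaned_chunks": orphaned}
```

A display layer could then drop any column whose value is `None` in every row, which is one way to get the auto-hide behaviour mentioned above.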

Fixed issue where files refreshed from OCI Object Storage buckets were not showing size and modified date metadata in the file listing.

Root cause: The OCI list_objects API call was not requesting metadata fields. By default, it only returns object names without size, etag, or modification time.

Changes:
- Added 'fields' parameter to list_objects API call in oci.py to explicitly request name, size, etag, timeModified, and md5 fields
- Enhanced refresh_vector_store_from_bucket() in embed.py to build file_metadata dict from bucket objects and pass to document loader
- Updated process_metadata() to add etag field to chunk metadata
- Fixed Decimal to int conversion for size field in get_vector_store_files()
- Added summary logging for OCI metadata retrieval

Testing: Verified that files refreshed from OCI buckets now display correct size and modification date in the file listing UI.
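To illustrate the root cause, here is a small sketch of a listing call that asks for the metadata fields explicitly. It takes any client exposing the OCI Python SDK's `list_objects(namespace_name, bucket_name, **kwargs)` method (for example an `oci.object_storage.ObjectStorageClient`). The wrapper name `list_objects_with_metadata`, the returned dict shape, and the pagination loop are assumptions for the example; the PR's `get_bucket_objects_with_metadata()` may be structured differently.

```python
from typing import Any, Dict, List

# Without "fields", object summaries carry only the name.
FIELDS = "name,size,etag,timeModified,md5"


def list_objects_with_metadata(client: Any, namespace: str, bucket: str) -> List[Dict[str, Any]]:
    """List every object in a bucket together with its metadata."""
    objects: List[Dict[str, Any]] = []
    start = None
    while True:
        kwargs = {"fields": FIELDS}
        if start:
            kwargs["start"] = start  # resume from the previous page
        response = client.list_objects(namespace, bucket, **kwargs)
        for obj in response.data.objects:
            objects.append(
                {
                    "name": obj.name,
                    "size": obj.size,
                    "etag": obj.etag,
                    "time_modified": obj.time_modified,
                    "md5": obj.md5,
                }
            )
        start = response.data.next_start_with
        if not start:
            break
    return objects
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, without OCI credentials.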

Tests for new functionality:
- OCI bucket object metadata retrieval with fields parameter
- Change detection for new and modified files
- File listing from vector stores with metadata
- Oracle Decimal to int conversion
- Orphaned chunk detection
- Old metadata format fallback

Test coverage:
- 10 tests for OCI refresh functions (get_bucket_objects_with_metadata, detect_changed_objects)
- 6 tests for file listing (get_vector_store_files)
- 3 integration tests for new API endpoints

All 21 unit tests passing with comprehensive edge case coverage.

Improvements to the Split/Embed tool UI:
- Add toggle control to switch between "Create New Vector Store" (default) and "Use Existing Vector Store" modes
- When creating new VS: show simple text input for vector store name, display all configuration options (chunk size, overlap, distance metric, index type)
- When using existing VS: hide configuration options (already defined by VS), filter dropdown to show only vector stores created with the same embedding model to prevent mixing embeddings
- Show full vector store table name in both modes
- Improved validation messages and help text
- Prevents potential issues with mixing embeddings from different models in the same vector store

This simplifies the UI and makes the distinction between creating new vs using existing vector stores much clearer.
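The "same embedding model" filter could be sketched along these lines. The dict keys (`alias`, `model`) and both helper names are hypothetical; the real client code may represent vector stores differently. The second helper shows one way to handle a previously selected store that is no longer in the filtered list, since looking up a stale value with `list.index()` raises `ValueError`.

```python
from typing import List, Optional


def compatible_stores(stores: List[dict], embed_model: str) -> List[dict]:
    """Keep only vector stores that were built with the given embedding model."""
    return [vs for vs in stores if vs.get("model") == embed_model]


def safe_selection(options: List[str], previous: Optional[str]) -> Optional[str]:
    """Return the previous choice if it is still offered, otherwise None."""
    return previous if previous in options else None
```

For example, with stores built from two different models, only the aliases matching the active model would be offered, and a stale selection would reset to empty.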

Addresses pylint warnings (R0912: too-many-branches, R0915: too-many-statements) by extracting helper functions to improve code organization and maintainability.

Changes:
- Extract _render_create_new_vs_input() for create new mode UI
- Extract _render_use_existing_vs_input() for use existing mode UI
- Extract _validate_vector_store_alias() for validation logic
- Extract _display_vector_store_info() for VS display and file list
- Refactor main _render_vector_store_section() to use helpers

Result: Main function reduced from 129 lines to 45 lines, with only 2 branches instead of 14, making it more maintainable and passing pylint complexity checks.

ViliTajnic and others added 30 commits September 18, 2025 16:05

update to SpringAI export

47f7454

to use last minimal SpringBoot version and the sys prompt defined for vector search.

tutorial-springai

a1b5368

remove tutorial

c2731cd

update to fix OCI model issue

c27cd81

#266 - #264

Do not log the password; fix whitespace (#254)

bb3af77

Signed-off-by: Christopher Jones <christopher.jones@oracle.com>

changed ctx_prompt in sys_prompt

e21d8bf

Pre-1.2.0 (#268)

cd2fcfb

* Added Unit Tests * Updated Documents * Updated Images

Fix IaC Version filename (#269)

75bc682

* Fix Release Action

Project compliance (#270)

1c7529e

* Shift pyproject.toml

Closes #265

e55b082

remove debug

768514f

Trickle in MCP Support (#272)

df054ea

* Expose FastMCP endpoints

Update README with auto-refresh vector store feature documentation

7e4fbfd

Add reference to the new auto-refresh vector store functionality from OCI Object Storage buckets feature in the AI Optimizer Features section.

Merge from main

ce2cd5a

sync tests

3742d64

Revert to new embed image

3be0b1c

Merge remote-tracking branch 'origin/main' into enhancement/vector-store-oci-refresh

ee48c4f

Merge remote-tracking branch 'origin/main' into enhancement/vector-store-oci-refresh

cbff0c9

Add DEVELOPMENT_NOTES.md to gitignore

55a6d96

Merge remote-tracking branch 'origin/main' into enhancement/vector-store-oci-refresh

c0a34a5

Merge latest main into enhancement/vector-store-oci-refresh

fe69272

Merge remote-tracking branch 'origin/main' into enhancement/vector-store-oci-refresh

066fef0

Add vector store OCI refresh with duplicate detection

04966a0

Merge remote-tracking branch 'origin/main' into enhancement/vector-store-oci-refresh

d0d1be6

Merge remote-tracking branch 'origin/main' into enhancement/vector-store-oci-refresh

b011d06

Remove ignored files from tracking

323d25f

ViliTajnic added 3 commits October 30, 2025 15:46

Remove IDE integration documentation file

9acd893

The IDE integration documentation has been removed from this branch. Addresses: #299 (IDE Integration)

oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) on Nov 4, 2025

ViliTajnic and others added 8 commits November 4, 2025 14:30

Fix pylint unused variable warning in split_embed.py

af82e10

Changed unused col1 variable to underscore to indicate it's intentionally unused (only col2 is used for the refresh button).

security update

0c6433c

Fix lint, but needs re-work of flow

9ed0f53

reorg

f0a9b44

Merge latest changes from main branch

cdd9c4a

4 participants
