From a4dd3b56176ea7df15992f64feef894835ededb0 Mon Sep 17 00:00:00 2001 From: Paula Date: Fri, 17 Oct 2025 13:50:08 +0200 Subject: [PATCH 01/13] update startup parameters --- site/content/ai-suite/reference/importer.md | 113 ++++++++++------- site/content/ai-suite/reference/retriever.md | 116 +++++++++++------- .../reference/triton-inference-server.md | 4 +- 3 files changed, 138 insertions(+), 95 deletions(-) diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index e4cce5d200..834f2a83c1 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -98,45 +98,20 @@ To start the service, use the AI service endpoint `/v1/graphragimporter`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using Triton Inference Server (Private LLM) - -The first step is to install the LLM Host service with the LLM and -embedding models of your choice. The setup will the use the -Triton Inference Server and MLflow at the backend. -For more details, please refer to the [Triton Inference Server](triton-inference-server.md) -and [Mlflow](mlflow.md) documentation. - -Once the `llmhost` service is up-and-running, then you can start the Importer -service using the below configuration: - -```json -{ - "env": { - "username": "your_username", - "db_name": "your_database_name", - "api_provider": "triton", - "triton_url": "your-arangodb-llm-host-url", - "triton_model": "mistral-nemo-instruct" - }, -} -``` - -Where: -- `username`: ArangoDB database user with permissions to create and modify collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored. -- `api_provider`: Specifies which LLM provider to use. -- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running. -- `triton_model`: Name of the LLM model to use for text processing. - -### Using OpenAI (Public LLM) +### Using OpenAI for chat and embedding ```json { "env": { - "openai_api_key": "your_openai_api_key", "username": "your_username", "db_name": "your_database_name", - "api_provider": "openai" + "chat_api_provider": "openai", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` @@ -144,8 +119,12 @@ Where: Where: - `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored -- `api_provider`: Specifies which LLM provider to use -- `openai_api_key`: Your OpenAI API key +- `chat_api_provider`: API provider for language model services +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} By default, for OpenAI API, the service is using @@ -153,7 +132,7 @@ By default, for OpenAI API, the service is using embedding model respectively. {{< /info >}} -### Using OpenRouter (Gemini, Anthropic, etc.) 
+### Using OpenRouter for chat and OpenAI for embedding
 
 OpenRouter makes it possible to connect to a huge array of LLM API
 providers, including non-OpenAI LLMs like Gemini Flash, Anthropic Claude
@@ -167,27 +146,69 @@ while OpenAI is used for the embedding model.
     "env": {
       "db_name": "your_database_name",
       "username": "your_username",
-      "api_provider": "openrouter",
-      "openai_api_key": "your_openai_api_key",
-      "openrouter_api_key": "your_openrouter_api_key",
-      "openrouter_model": "mistralai/mistral-nemo" // Specify a model here
+      "chat_api_provider": "openai",
+      "embedding_api_provider": "openai",
+      "chat_api_url": "https://openrouter.ai/api/v1",
+      "embedding_api_url": "https://api.openai.com/v1",
+      "chat_model": "mistral-nemo",
+      "embedding_model": "text-embedding-3-small",
+      "chat_api_key": "your_openrouter_api_key",
+      "embedding_api_key": "your_openai_api_key"
   },
 }
 ```
 
 Where:
-- `username`: ArangoDB database user with permissions to access collections
-- `db_name`: Name of the ArangoDB database where the knowledge graph is stored
-- `api_provider`: Specifies which LLM provider to use
-- `openai_api_key`: Your OpenAI API key (for the embedding model)
-- `openrouter_api_key`: Your OpenRouter API key (for the LLM)
-- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`)
+- `username`: ArangoDB database user with permissions to access collections
+- `db_name`: Name of the ArangoDB database where the knowledge graph is stored
+- `chat_api_provider`: API provider for language model services
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+- `chat_api_key`: API key for authenticating with the chat/language model service
+- `embedding_api_key`: API key for authenticating with the embedding model service
 
 {{< info >}}
 When using OpenRouter, the service defaults to `mistral-nemo` for generation
 (via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI).
 {{< /info >}}
 
+### Using Triton Inference Server for chat and embedding
+
+The first step is to install the LLM Host service with the LLM and
+embedding models of your choice. The setup will then use the
+Triton Inference Server and MLflow at the backend.
+For more details, please refer to the [Triton Inference Server](triton-inference-server.md)
+and [MLflow](mlflow.md) documentation.
+
+Once the `llmhost` service is up and running, you can start the Importer
+service using the below configuration:
+
+```json
+{
+  "env": {
+    "username": "your_username",
+    "db_name": "your_database_name",
+    "chat_api_provider": "triton",
+    "embedding_api_provider": "triton",
+    "chat_api_url": "your-arangodb-llm-host-url",
+    "embedding_api_url": "your-arangodb-llm-host-url",
+    "chat_model": "mistral-nemo-instruct",
+    "embedding_model": "nomic-embed-text-v1"
+  },
+}
+```
+
+Where:
+- `username`: ArangoDB database user with permissions to create and modify collections
+- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored
+- `chat_api_provider`: Specifies which LLM provider to use for language model services
+- `embedding_api_provider`: API provider for embedding model services (e.g., "triton")
+- `chat_api_url`: API endpoint URL for the chat/language model service
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+
 ## Building Knowledge Graphs
 
 Once the service is installed successfully, you can follow these steps
diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md
index 5949d8a369..c43f7931c4 100644
--- a/site/content/ai-suite/reference/retriever.md
+++ b/site/content/ai-suite/reference/retriever.md
@@ -88,54 +88,34 @@ To start the service, use the AI service endpoint `/v1/graphragretriever`.
 Please refer to the documentation of [AI service](gen-ai.md) for more
 information on how to use it.
 
-### Using Triton Inference Server (Private LLM)
+### Using OpenAI for chat and embedding
 
-The first step is to install the LLM Host service with the LLM and
-embedding models of your choice. The setup will the use the
-Triton Inference Server and MLflow at the backend.
-For more details, please refer to the [Triton Inference Server](triton-inference-server.md)
-and [Mlflow](mlflow.md) documentation.
-
-Once the `llmhost` service is up-and-running, then you can start the Importer
-service using the below configuration:
 
 ```json
 {
   "env": {
     "username": "your_username",
     "db_name": "your_database_name",
-    "api_provider": "triton",
-    "triton_url": "your-arangodb-llm-host-url",
-    "triton_model": "mistral-nemo-instruct"
+    "chat_api_provider": "openai",
+    "chat_api_url": "https://api.openai.com/v1",
+    "embedding_api_url": "https://api.openai.com/v1",
+    "chat_model": "gpt-4o",
+    "embedding_model": "text-embedding-3-small",
+    "chat_api_key": "your_openai_api_key",
+    "embedding_api_key": "your_openai_api_key"
   },
 }
 ```
 
 Where:
-- `username`: ArangoDB database user with permissions to access collections.
-- `db_name`: Name of the ArangoDB database where the knowledge graph is stored.
-- `api_provider`: Specifies which LLM provider to use.
-- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running.
-- `triton_model`: Name of the LLM model to use for text processing.
-
-### Using OpenAI (Public LLM)
-
-```json
-{
-  "env": {
-    "openai_api_key": "your_openai_api_key",
-    "username": "your_username",
-    "db_name": "your_database_name",
-    "api_provider": "openai"
-  },
-}
-```
-
-Where:
-- `username`: ArangoDB database user with permissions to access collections.
-- `db_name`: Name of the ArangoDB database where the knowledge graph is stored.
-- `api_provider`: Specifies which LLM provider to use.
-- `openai_api_key`: Your OpenAI API key.
+- `username`: ArangoDB database user with permissions to create and modify collections
+- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored
+- `chat_api_provider`: API provider for language model services
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+- `chat_api_key`: API key for authenticating with the chat/language model service
+- `embedding_api_key`: API key for authenticating with the embedding model service
 
 {{< info >}}
 By default, for OpenAI API, the service is using
@@ -143,7 +123,7 @@ By default, for OpenAI API, the service is using
 embedding model respectively.
 {{< /info >}}
 
-### Using OpenRouter (Gemini, Anthropic, etc.)
+### Using OpenRouter for chat and OpenAI for embedding
 
 OpenRouter makes it possible to connect to a huge array of LLM API providers,
 including non-OpenAI LLMs like Gemini Flash, Anthropic Claude and publicly hosted
@@ -157,27 +137,69 @@ OpenAI is used for the embedding model.
     "env": {
       "db_name": "your_database_name",
       "username": "your_username",
-      "api_provider": "openrouter",
-      "openai_api_key": "your_openai_api_key",
-      "openrouter_api_key": "your_openrouter_api_key",
-      "openrouter_model": "mistralai/mistral-nemo" // Specify a model here
+      "chat_api_provider": "openai",
+      "embedding_api_provider": "openai",
+      "chat_api_url": "https://openrouter.ai/api/v1",
+      "embedding_api_url": "https://api.openai.com/v1",
+      "chat_model": "mistral-nemo",
+      "embedding_model": "text-embedding-3-small",
+      "chat_api_key": "your_openrouter_api_key",
+      "embedding_api_key": "your_openai_api_key"
   },
 }
 ```
 
 Where:
-- `username`: ArangoDB database user with permissions to access collections.
-- `db_name`: Name of the ArangoDB database where the knowledge graph is stored.
-- `api_provider`: Specifies which LLM provider to use.
-- `openai_api_key`: Your OpenAI API key (for the embedding model).
-- `openrouter_api_key`: Your OpenRouter API key (for the LLM).
-- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`).
+- `username`: ArangoDB database user with permissions to access collections
+- `db_name`: Name of the ArangoDB database where the knowledge graph is stored
+- `chat_api_provider`: API provider for language model services
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+- `chat_api_key`: API key for authenticating with the chat/language model service
+- `embedding_api_key`: API key for authenticating with the embedding model service
 
 {{< info >}}
 When using OpenRouter, the service defaults to `mistral-nemo` for generation
 (via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI).
 {{< /info >}}
 
+### Using Triton Inference Server for chat and embedding
+
+The first step is to install the LLM Host service with the LLM and
+embedding models of your choice. The setup will then use the
+Triton Inference Server and MLflow at the backend.
+For more details, please refer to the [Triton Inference Server](triton-inference-server.md)
+and [MLflow](mlflow.md) documentation.
+
+Once the `llmhost` service is up and running, you can start the Retriever
+service using the below configuration:
+
+```json
+{
+  "env": {
+    "username": "your_username",
+    "db_name": "your_database_name",
+    "chat_api_provider": "triton",
+    "embedding_api_provider": "triton",
+    "chat_api_url": "your-arangodb-llm-host-url",
+    "embedding_api_url": "your-arangodb-llm-host-url",
+    "chat_model": "mistral-nemo-instruct",
+    "embedding_model": "nomic-embed-text-v1"
+  },
+}
+```
+
+Where:
+- `username`: ArangoDB database user with permissions to create and modify collections
+- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored
+- `chat_api_provider`: Specifies which LLM provider to use for language model services
+- `embedding_api_provider`: API provider for embedding model services (e.g., "triton")
+- `chat_api_url`: API endpoint URL for the chat/language model service
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+
 ## Executing queries
 
 After the Retriever service is installed successfully, you can interact with
diff --git a/site/content/ai-suite/reference/triton-inference-server.md b/site/content/ai-suite/reference/triton-inference-server.md
index 458226743e..1e1b982932 100644
--- a/site/content/ai-suite/reference/triton-inference-server.md
+++ b/site/content/ai-suite/reference/triton-inference-server.md
@@ -26,8 +26,8 @@ following steps:
 
 1. Install the Triton LLM Host service.
 2. Register your LLM model to MLflow by uploading the required files.
-3. Configure the [Importer](importer.md#using-triton-inference-server-private-llm) service to use your LLM model.
-4. Configure the [Retriever](retriever.md#using-triton-inference-server-private-llm) service to use your LLM model.
+3. Configure the [Importer](importer.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model.
+4. Configure the [Retriever](retriever.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model.
 
 {{< tip >}}
 Check out the dedicated [ArangoDB MLflow](mlflow.md) documentation page to learn
From a32d84104fdf3788112d78d2dbebb891e87e767e Mon Sep 17 00:00:00 2001
From: Paula Mihu <97217318+nerpaula@users.noreply.github.com>
Date: Wed, 22 Oct 2025 12:36:10 +0200
Subject: [PATCH 02/13] Apply suggestions from code review
 
Co-authored-by: Anthony Mahanna <43019056+aMahanna@users.noreply.github.com>
---
 site/content/ai-suite/reference/importer.md  | 3 +++
 site/content/ai-suite/reference/retriever.md | 3 +++
 2 files changed, 6 insertions(+)
 
diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md
index 834f2a83c1..e19b04d235 100644
--- a/site/content/ai-suite/reference/importer.md
+++ b/site/content/ai-suite/reference/importer.md
@@ -107,6 +107,7 @@ information on how to use it.
"db_name": "your_database_name", "chat_api_provider": "openai", "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", "embedding_api_url": "https://api.openai.com/v1", "chat_model": "gpt-4o", "embedding_model": "text-embedding-3-small", @@ -120,6 +121,7 @@ Where: - `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: API provider for language model services +- `embeddinga_api_provider`: API provider for embedding model services - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings @@ -162,6 +164,7 @@ Where: - `username`: ArangoDB database user with permissions to access collections - `db_name`: Name of the ArangoDB database where the knowledge graph is stored - `chat_api_provider`: API provider for language model services +- `embedding_api_provider`: API provider for embedding model services - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index c43f7931c4..6a688dad94 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -98,6 +98,7 @@ information on how to use it. "db_name": "your_database_name", "chat_api_provider": "openai", "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", "embedding_api_url": "https://api.openai.com/v1", "chat_model": "gpt-4o", "embedding_model": "text-embedding-3-small", @@ -111,6 +112,7 @@ Where: - `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: API provider for language model services +- `emebdding_api_provider`: API provider for embedding model services - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings @@ -153,6 +155,7 @@ Where: - `username`: ArangoDB database user with permissions to access collections - `db_name`: Name of the ArangoDB database where the knowledge graph is stored - `chat_api_provider`: API provider for language model services +- `embedding_api_provider`: API provider for embedding model services - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings From 02adb5068c9605e77fcfe61e8b363f107f9c2343 Mon Sep 17 00:00:00 2001 From: Paula Date: Wed, 22 Oct 2025 15:56:00 +0200 Subject: [PATCH 03/13] apply changes to all versions, fix typo --- site/content/ai-suite/reference/importer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index e19b04d235..1e534320df 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ 
-121,7 +121,7 @@ Where: - `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: API provider for language model services -- `embeddinga_api_provider`: API provider for embedding model services +- `embedding_api_provider`: API provider for embedding model services - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings From 3504267b99cf7d9ab9304b5da13e19d6ba52168d Mon Sep 17 00:00:00 2001 From: Paula Date: Fri, 24 Oct 2025 12:34:58 +0200 Subject: [PATCH 04/13] instant and deep search --- site/content/ai-suite/reference/retriever.md | 59 +++++++++----------- 1 file changed, 27 insertions(+), 32 deletions(-) diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 6a688dad94..d0e3aba1e1 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -15,9 +15,9 @@ the Arango team. ## Overview The Retriever service offers two distinct search methods: -- **Global search**: Analyzes entire document to identify themes and patterns, +- **Instant search**: Analyzes entire document to identify themes and patterns, perfect for high-level insights and comprehensive summaries. -- **Local search**: Focuses on specific entities and their relationships, ideal +- **Deep search**: Focuses on specific entities and their relationships, ideal for detailed queries about particular concepts. The service supports both private (Triton Inference Server) and public (OpenAI) @@ -33,19 +33,19 @@ graph and get contextually relevant responses. - Configurable community hierarchy levels {{< tip >}} -You can also use the GraphRAG Retriever service via the ArangoDB [web interface](../graphrag/web-interface.md). +You can also use the GraphRAG Retriever service via the [web interface](../graphrag/web-interface.md). {{< /tip >}} ## Search methods The Retriever service enables intelligent search and retrieval of information -from your knowledge graph. It provides two powerful search methods, global Search -and local Search, that leverage the structured knowledge graph created by the Importer +from your knowledge graph. It provides two powerful search methods, instant search +and deep search, that leverage the structured knowledge graph created by the Importer to deliver accurate and contextually relevant responses to your natural language queries. -### Global search +### Deep search -Global search is designed for queries that require understanding and aggregation +Deep search is designed for queries that require understanding and aggregation of information across your entire document. It's particularly effective for questions about overall themes, patterns, or high-level insights in your data. @@ -60,9 +60,9 @@ about overall themes, patterns, or high-level insights in your data. - "Summarize the key findings across all documents" - "What are the most important concepts discussed?" -### Local search +### Instant search -Local search focuses on specific entities and their relationships within your +Instant search focuses on specific entities and their relationships within your knowledge graph. It is ideal for detailed queries about particular concepts, entities, or relationships. 
@@ -210,28 +210,32 @@ it using the following HTTP endpoints, based on the selected search method. {{< tabs "executing-queries" >}} -{{< tab "Local search" >}} +{{< tab "Instant search" >}} ```bash -curl -X POST /v1/graphrag-query \ +curl -X POST /v1/graphrag-query-stream \ -H "Content-Type: application/json" \ -d '{ "query": "What is the AR3 Drone?", - "query_type": 2, - "provider": 0 + "query_type": "UNIFIED", + "provider": 0, + "include_metadata": true, + "use_llm_planner": false }' ``` {{< /tab >}} -{{< tab "Global search" >}} +{{< tab "Deep search" >}} ```bash curl -X POST /v1/graphrag-query \ -H "Content-Type: application/json" \ -d '{ - "query": "What is the AR3 Drone?", + "query": "What are the main themes and topics discussed in the documents?", "level": 1, - "query_type": 1, - "provider": 0 + "query_type": "LOCAL", + "provider": 0, + "include_metadata": true, + "use_llm_planner": true }' ``` {{< /tab >}} @@ -240,13 +244,15 @@ curl -X POST /v1/graphrag-query \ The request parameters are the following: - `query`: Your search query text. -- `level`: The community hierarchy level to use for the search (`1` for top-level communities). +- `level`: The community hierarchy level to use for the search (`1` for top-level communities). Defaults to `2` if not provided. - `query_type`: The type of search to perform. - - `1`: Global search. - - `2`: Local search. -- `provider`: The LLM provider to use + - `UNIFIED`: Instant search. + - `LOCAL`: Deep search. +- `provider`: The LLM provider to use: - `0`: OpenAI (or OpenRouter) - `1`: Triton +- `include_metadata`: Whether to include metadata in the response. If not specified, defaults to `true`. +- `use_llm_planner`: Whether to use the LLM planner for intelligent query processing. If not specified, defaults to `true`. ## Health check @@ -274,17 +280,6 @@ properties: } ``` -## Best Practices - -- **Choose the right search method**: - - Use global search for broad, thematic queries. - - Use local search for specific entity or relationship queries. - - -- **Performance considerations**: - - Global search may take longer due to its map-reduce process. - - Local search is typically faster for concrete queries. - ## API Reference For detailed API documentation, see the From bec2f30e9d40e15d134808102808dafe2509c922 Mon Sep 17 00:00:00 2001 From: Paula Date: Fri, 24 Oct 2025 13:55:47 +0200 Subject: [PATCH 05/13] update description of instant and deep search; fix some anchor links --- .../ai-suite/graphrag/web-interface.md | 8 ++--- site/content/ai-suite/reference/retriever.md | 30 +++++++++++-------- 2 files changed, 21 insertions(+), 17 deletions(-) diff --git a/site/content/ai-suite/graphrag/web-interface.md b/site/content/ai-suite/graphrag/web-interface.md index 7438127d6f..6caa04b177 100644 --- a/site/content/ai-suite/graphrag/web-interface.md +++ b/site/content/ai-suite/graphrag/web-interface.md @@ -159,10 +159,10 @@ See also the [GraphRAG Retriever](../reference/retriever.md) documentation. ## Chat with your Knowledge Graph The Retriever service provides two search methods: -- [Local search](../reference/retriever.md#local-search): Local queries let you - explore specific nodes and their direct connections. -- [Global search](../reference/retriever.md#global-search): Global queries uncover - broader patters and relationships across the entire Knowledge Graph. +- [Instant search](../reference/retriever.md#instant-search): Instant + queries provide fast responses. 
+- [Deep search](../reference/retriever.md#deep-search): This option will take
+  longer to return a response.
 
 ![Chat with your Knowledge Graph](../../images/graphrag-ui-chat.png)
 
diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md
index d0e3aba1e1..1a6e324346 100644
--- a/site/content/ai-suite/reference/retriever.md
+++ b/site/content/ai-suite/reference/retriever.md
@@ -15,10 +15,10 @@ the Arango team.
 
 ## Overview
 
 The Retriever service offers two distinct search methods:
-- **Instant search**: Analyzes entire document to identify themes and patterns,
-  perfect for high-level insights and comprehensive summaries.
-- **Deep search**: Focuses on specific entities and their relationships, ideal
-  for detailed queries about particular concepts.
+- **Instant search**: Focuses on specific entities and their relationships, ideal
+  for fast queries about particular concepts.
+- **Deep search**: Analyzes the knowledge graph structure to identify themes and patterns,
+  perfect for comprehensive insights and detailed summaries.
 
 The service supports both private (Triton Inference Server) and public (OpenAI)
 LLM deployments, making it flexible for various security and infrastructure
@@ -43,14 +43,17 @@ from your knowledge graph. It provides two powerful search methods, instant sear
 and deep search, that leverage the structured knowledge graph created by the Importer
 to deliver accurate and contextually relevant responses to your natural language queries.
 
-### Deep search
+### Deep Search
 
-Deep search is designed for queries that require understanding and aggregation
-of information across your entire document. It's particularly effective for questions
-about overall themes, patterns, or high-level insights in your data.
+Deep Search is designed for highly detailed, accurate responses that require understanding
+what kind of information is available in different parts of the knowledge graph and
+sequentially retrieving information in an LLM-guided research process. Use it whenever
+detail and accuracy are required (e.g., aggregation of highly technical details) and
+very short latency is not critical, for example when responses to frequently asked
+questions can be cached, or in agent-driven and research use cases.
 
 - **Community-Based Analysis**: Uses pre-generated community reports from your
-  knowledge graph to understand the overall structure and themes of your data,
+  knowledge graph to understand the overall structure and themes of your data.
 - **Map-Reduce Processing**:
   - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points.
   - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response.
@@ -60,11 +63,12 @@ about overall themes, patterns, or high-level insights in your data.
 - "Summarize the key findings across all documents"
 - "What are the most important concepts discussed?"
 
-### Instant search
+### Instant Search
 
-Instant search focuses on specific entities and their relationships within your
-knowledge graph. It is ideal for detailed queries about particular concepts,
-entities, or relationships.
+Instant Search is designed for responses with very short latency. It triggers
+fast unified retrieval over relevant parts of the knowledge graph via hybrid
+(semantic and lexical) search and graph expansion algorithms, producing a fast,
+streamed natural-language response with clickable references to the relevant documents.
- **Entity Identification**: Identifies relevant entities from the knowledge graph based on the query. - **Context Gathering**: Collects: From b33024b54b6ac460a28390819b6b2dc386bb4ae0 Mon Sep 17 00:00:00 2001 From: Paula Date: Fri, 31 Oct 2025 15:53:23 +0100 Subject: [PATCH 06/13] add missing description for chat_api_url --- site/content/ai-suite/reference/importer.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index 1e534320df..e0eaa718ab 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -121,6 +121,7 @@ Where: - `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: API provider for language model services +- `chat_api_url`: API endpoint URL for the chat/language model service - `embedding_api_provider`: API provider for embedding model services - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis @@ -164,6 +165,7 @@ Where: - `username`: ArangoDB database user with permissions to access collections - `db_name`: Name of the ArangoDB database where the knowledge graph is stored - `chat_api_provider`: API provider for language model services +- `chat_api_url`: API endpoint URL for the chat/language model service - `embedding_api_provider`: API provider for embedding model services - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis From e48a6d062ce079824af92b9b77a53433f723a633 Mon Sep 17 00:00:00 2001 From: Paula Date: Fri, 31 Oct 2025 15:56:14 +0100 Subject: [PATCH 07/13] fix typo --- site/content/ai-suite/reference/retriever.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 1a6e324346..947fc1bb98 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -116,7 +116,8 @@ Where: - `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: API provider for language model services -- `emebdding_api_provider`: API provider for embedding model services +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: API provider for embedding model services - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings From 81d49fb368dcbbfd27e5ded9e133cd44f2159d89 Mon Sep 17 00:00:00 2001 From: Paula Date: Mon, 3 Nov 2025 17:01:14 +0100 Subject: [PATCH 08/13] reference project creation in retriever file; add project name validation rules --- site/content/ai-suite/reference/importer.md | 14 ++++++++++++++ site/content/ai-suite/reference/retriever.md | 9 +++++++++ 2 files changed, 23 insertions(+) diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index e0eaa718ab..4a36403cdb 100644 --- a/site/content/ai-suite/reference/importer.md +++ 
b/site/content/ai-suite/reference/importer.md @@ -35,6 +35,10 @@ To create a new GraphRAG project, use the `CreateProject` method by sending a `project_name` and a `project_type` in the request body. Optionally, you can provide a `project_description`. +The `project_name` must follow these validation rules: +- Length: 1–63 characters +- Allowed characters: letters, numbers, underscores (`_`), and hyphens (`-`) + ```curl curl -X POST "https://:8529/ai/v1/project" \ -H "Content-Type: application/json" \ @@ -44,6 +48,7 @@ curl -X POST "https://:8529/ai/v1/project" \ "project_description": "A documentation project for GraphRAG." }' ``` + All the relevant ArangoDB collections (such as documents, chunks, entities, relationships, and communities) created during the import process will have the project name as a prefix. For example, the Documents collection will @@ -51,6 +56,15 @@ become `_Documents`. The Knowledge Graph will also use the project name as a prefix. If no project name is specified, then all collections are prefixed with `default_project`, e.g., `default_project_Documents`. +Once created, you can reference your project in other services (such as the +Importer or Retriever) using the `genai_project_name` field: + +```json +{ + "genai_project_name": "docs" +} +``` + ### Project metadata Additional project metadata is accessible via the following endpoint, replacing diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 947fc1bb98..2571520000 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -36,6 +36,15 @@ graph and get contextually relevant responses. You can also use the GraphRAG Retriever service via the [web interface](../graphrag/web-interface.md). {{< /tip >}} +## Prerequisites + +Before using the Retriever service, you need to create a GraphRAG project and +import data using the Importer service. + +For detailed instructions on creating a project, see +[Creating a new project](importer.md#creating-a-new-project) in the Importer +documentation. + ## Search methods The Retriever service enables intelligent search and retrieval of information From e657d19b4bfd4868d0854ee366bee9bed5fce5b9 Mon Sep 17 00:00:00 2001 From: Paula Date: Tue, 4 Nov 2025 16:04:43 +0100 Subject: [PATCH 09/13] remove username from startup parameters --- site/content/ai-suite/reference/importer.md | 6 ------ site/content/ai-suite/reference/retriever.md | 6 ------ 2 files changed, 12 deletions(-) diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index 4a36403cdb..1e83f8a475 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -117,7 +117,6 @@ information on how to use it. ```json { "env": { - "username": "your_username", "db_name": "your_database_name", "chat_api_provider": "openai", "chat_api_url": "https://api.openai.com/v1", @@ -132,7 +131,6 @@ information on how to use it. ``` Where: -- `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: API provider for language model services - `chat_api_url`: API endpoint URL for the chat/language model service @@ -162,7 +160,6 @@ while OpenAI is used for the embedding model. 
{ "env": { "db_name": "your_database_name", - "username": "your_username", "chat_api_provider": "openai", "embedding_api_provider": "openai", "chat_api_url": "https://openrouter.ai/api/v1", @@ -176,7 +173,6 @@ while OpenAI is used for the embedding model. ``` Where: -- `username`: ArangoDB database user with permissions to access collections - `db_name`: Name of the ArangoDB database where the knowledge graph is stored - `chat_api_provider`: API provider for language model services - `chat_api_url`: API endpoint URL for the chat/language model service @@ -206,7 +202,6 @@ service using the below configuration: ```json { "env": { - "username": "your_username", "db_name": "your_database_name", "chat_api_provider": "triton", "embedding_api_provider": "triton", @@ -219,7 +214,6 @@ service using the below configuration: ``` Where: -- `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: Specifies which LLM provider to use for language model services - `embedding_api_provider`: API provider for embedding model services (e.g., "triton") diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 2571520000..5ce96d128d 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -107,7 +107,6 @@ information on how to use it. ```json { "env": { - "username": "your_username", "db_name": "your_database_name", "chat_api_provider": "openai", "chat_api_url": "https://api.openai.com/v1", @@ -122,7 +121,6 @@ information on how to use it. ``` Where: -- `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: API provider for language model services - `chat_api_url`: API endpoint URL for the chat/language model service @@ -152,7 +150,6 @@ OpenAI is used for the embedding model. { "env": { "db_name": "your_database_name", - "username": "your_username", "chat_api_provider": "openai", "embedding_api_provider": "openai", "chat_api_url": "https://openrouter.ai/api/v1", @@ -166,7 +163,6 @@ OpenAI is used for the embedding model. 
``` Where: -- `username`: ArangoDB database user with permissions to access collections - `db_name`: Name of the ArangoDB database where the knowledge graph is stored - `chat_api_provider`: API provider for language model services - `embedding_api_provider`: API provider for embedding model services @@ -195,7 +191,6 @@ service using the below configuration: ```json { "env": { - "username": "your_username", "db_name": "your_database_name", "chat_api_provider": "triton", "embedding_api_provider": "triton", @@ -208,7 +203,6 @@ service using the below configuration: ``` Where: -- `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored - `chat_api_provider`: Specifies which LLM provider to use for language model services - `embedding_api_provider`: API provider for embedding model services (e.g., "triton") From 52176cea0cee6bee77f62b45aa4bbc0d3f7c371e Mon Sep 17 00:00:00 2001 From: Paula Date: Tue, 4 Nov 2025 17:35:27 +0100 Subject: [PATCH 10/13] update startup parameters in genai service; add project_db_name in project creation; move and extend Projects --- site/content/ai-suite/reference/gen-ai.md | 154 +++++++++++++------ site/content/ai-suite/reference/importer.md | 53 +------ site/content/ai-suite/reference/retriever.md | 12 +- 3 files changed, 125 insertions(+), 94 deletions(-) diff --git a/site/content/ai-suite/reference/gen-ai.md b/site/content/ai-suite/reference/gen-ai.md index f545a7e255..078b3037db 100644 --- a/site/content/ai-suite/reference/gen-ai.md +++ b/site/content/ai-suite/reference/gen-ai.md @@ -33,22 +33,15 @@ in the platform. All services support the `profiles` field, which you can use to define the profile to use for the service. For example, you can define a GPU profile that enables the service to run an LLM on GPU resources. -## LLM Host Service Creation Request Body +## Service Creation Request Body -```json -{ - "env": { - "model_name": "" - } -} -``` - -## Using Labels in Creation Request Body +The following example shows a complete request body with all available options: ```json { "env": { - "model_name": "" + "model_name": "", + "profiles": "gpu,internal" }, "labels": { "key1": "value1", @@ -57,32 +50,116 @@ GPU profile that enables the service to run an LLM on GPU resources. } ``` -{{< info >}} -Labels are optional. Labels can be used to filter and identify services in -the Platform. If you want to use labels, define them as a key-value pair in `labels` -within the `env` field. -{{< /info >}} +**Optional fields:** + +- **labels**: Key-value pairs used to filter and identify services in the platform. +- **profiles**: A comma-separated string defining which profiles to use for the + service (e.g., `"gpu,internal"`). If not set, the service is created with the + default profile. Profiles must be present and created in the platform before + they can be used. + +The parameters required for the deployment of each service are defined in the +corresponding service documentation. See [Importer](importer.md) +and [Retriever](retriever.md). + +## Projects + +Projects help you organize your GraphRAG work by grouping related services and +keeping your data separate. When the Importer service creates ArangoDB collections +(such as documents, chunks, entities, relationships, and communities), it uses +your project name as a prefix. For example, a project named `docs` will have +collections like `docs_Documents`, `docs_Chunks`, and so on. 
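+
+For example, assuming a project named `docs` and the default GraphRAG collection
+types mentioned above, the resulting collections would be named along these lines
+(illustrative only):
+
+```
+docs_Documents
+docs_Chunks
+docs_Entities
+docs_Relationships
+docs_Communities
+```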
-## Using Profiles in Creation Request Body +### Creating a project + +To create a new GraphRAG project, send a POST request to the project endpoint: + +```bash +curl -X POST "https://:8529/gen-ai/v1/project" \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{ + "project_name": "docs", + "project_type": "graphrag", + "project_db_name": "documentation", + "project_description": "A documentation project for GraphRAG." + }' +``` + +Where: +- **project_name** (required): Unique identifier for your project. Must be 1-63 + characters and contain only letters, numbers, underscores (`_`), and hyphens (`-`). +- **project_type** (required): Type of project (e.g., `"graphrag"`). +- **project_db_name** (required): The ArangoDB database name where the project + will be created. +- **project_description** (optional): A description of your project. + +Once created, you can reference your project in service deployments using the +`genai_project_name` field: ```json { - "env": { - "model_name": "", - "profiles": "gpu,internal" - } + "env": { + "genai_project_name": "docs" + } } ``` -{{< info >}} -The `profiles` field is optional. If it is not set, the service is created with -the default profile. Profiles must be present and created in the Platform before -they can be used. If you want to use profiles, define them as a comma-separated -string in `profiles` within the `env` field. -{{< /info >}} +### Listing projects -The parameters required for the deployment of each service are defined in the -corresponding service documentation. +**List all project names in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_project_names/" \ + -H "Authorization: Bearer " +``` + +This returns only the project names for quick reference. + +**List all projects with full metadata in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_projects/" \ + -H "Authorization: Bearer " +``` + +This returns complete project objects including metadata, associated services, +and knowledge graph information. + +### Getting project details + +Retrieve comprehensive metadata for a specific project: + +```bash +curl -X GET "https://:8529/gen-ai/v1/project_by_name//" \ + -H "Authorization: Bearer " +``` + +The response includes: +- Project configuration +- Associated Importer and Retriever services +- Knowledge graph metadata +- Service status information +- Last modification timestamp + +### Deleting a project + +Remove a project's metadata from the GenAI service: + +```bash +curl -X DELETE "https://:8529/gen-ai/v1/project//" \ + -H "Authorization: Bearer " +``` + +{{< warning >}} +Deleting a project only removes the project metadata from the GenAI service. +It does **not** delete: +- Services associated with the project (must be deleted separately) +- ArangoDB collections and data +- Knowledge graphs + +You must manually delete services and collections if needed. +{{< /warning >}} ## Obtaining a Bearer Token @@ -101,7 +178,7 @@ documentation. ## Complete Service lifecycle example -The example below shows how to install, monitor, and uninstall the Importer service. +The example below shows how to install, monitor, and uninstall the [Importer](importer.md) service. 
### Step 1: Installing the service @@ -111,11 +188,10 @@ curl -X POST https://:8529/ai/v1/graphragimporter \ -H "Content-Type: application/json" \ -d '{ "env": { - "username": "", "db_name": "", - "api_provider": "", - "triton_url": "", - "triton_model": "" + "chat_api_provider": "", + "chat_api_key": "", + "chat_model": "" } }' ``` @@ -176,16 +252,6 @@ curl -X DELETE https://:8529/ai/v1/service/arangodb-graphrag-i - **Authentication**: All requests use the same Bearer token in the `Authorization` header {{< /info >}} -### Customizing the example - -Replace the following values with your actual configuration: -- `` - Your database username. -- `` - Target database name. -- `` - Your API provider (e.g., `triton`) -- `` - Your LLM host service URL. -- `` - Your Triton model name (e.g., `mistral-nemo-instruct`). -- `` - Your authentication token. - ## Service configuration The AI orchestrator service is **started by default**. diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index 1e83f8a475..5f66ecbe3e 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -28,54 +28,17 @@ different concepts in your document with the Retriever service. You can also use the GraphRAG Importer service via the [Data Platform web interface](../graphrag/web-interface.md). {{< /tip >}} -## Creating a new project - -To create a new GraphRAG project, use the `CreateProject` method by sending a -`POST` request to the `/ai/v1/project` endpoint. You must provide a unique -`project_name` and a `project_type` in the request body. Optionally, you can -provide a `project_description`. - -The `project_name` must follow these validation rules: -- Length: 1–63 characters -- Allowed characters: letters, numbers, underscores (`_`), and hyphens (`-`) - -```curl -curl -X POST "https://:8529/ai/v1/project" \ --H "Content-Type: application/json" \ --d '{ - "project_name": "docs", - "project_type": "graphrag", - "project_description": "A documentation project for GraphRAG." -}' -``` - -All the relevant ArangoDB collections (such as documents, chunks, entities, -relationships, and communities) created during the import process will -have the project name as a prefix. For example, the Documents collection will -become `_Documents`. The Knowledge Graph will also use the project -name as a prefix. If no project name is specified, then all collections -are prefixed with `default_project`, e.g., `default_project_Documents`. - -Once created, you can reference your project in other services (such as the -Importer or Retriever) using the `genai_project_name` field: - -```json -{ - "genai_project_name": "docs" -} -``` - -### Project metadata +## Prerequisites -Additional project metadata is accessible via the following endpoint, replacing -`` with the actual name of your project: +Before importing data, you need to create a GraphRAG project. Projects help you +organize your work and keep your data separate from other projects. -``` -GET /ai/v1/project_by_name/ -``` +For detailed instructions on creating and managing projects, see the +[Projects](gen-ai.md#projects) section in the GenAI Orchestration Service +documentation. -The endpoint provides comprehensive metadata about your project's components, -including its importer and retriever services and their status. +Once you have created a project, you can reference it when deploying the Importer +service using the `genai_project_name` field in the service configuration. 
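+
+As a sketch, an Importer deployment that references this project could include the
+field alongside the usual environment settings (all other values below are
+placeholders and depend on your provider setup):
+
+```json
+{
+  "env": {
+    "db_name": "your_database_name",
+    "genai_project_name": "docs",
+    "chat_api_provider": "openai",
+    "chat_api_url": "https://api.openai.com/v1",
+    "chat_model": "gpt-4o",
+    "chat_api_key": "your_api_key"
+  }
+}
+```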
## Deployment options diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 5ce96d128d..8eb9d64711 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -38,12 +38,14 @@ You can also use the GraphRAG Retriever service via the [web interface](../graph ## Prerequisites -Before using the Retriever service, you need to create a GraphRAG project and -import data using the Importer service. +Before using the Retriever service, you need to: -For detailed instructions on creating a project, see -[Creating a new project](importer.md#creating-a-new-project) in the Importer -documentation. +1. **Create a GraphRAG project** - For detailed instructions on creating and + managing projects, see the [Projects](gen-ai.md#projects) section in the + GenAI Orchestration Service documentation. + +2. **Import data** - Use the [Importer](importer.md) service to transform your + text documents into a knowledge graph stored in ArangoDB. ## Search methods From dcc8be1ddac23508fe7aa571808e95628a2b66ee Mon Sep 17 00:00:00 2001 From: Paula Date: Wed, 5 Nov 2025 20:50:56 +0100 Subject: [PATCH 11/13] clarify project requirements; clarify OpenAI-compatible API usage --- site/content/ai-suite/reference/gen-ai.md | 4 ++ site/content/ai-suite/reference/importer.md | 52 +++++++++++------- site/content/ai-suite/reference/retriever.md | 55 ++++++++++++-------- 3 files changed, 70 insertions(+), 41 deletions(-) diff --git a/site/content/ai-suite/reference/gen-ai.md b/site/content/ai-suite/reference/gen-ai.md index 078b3037db..0745965f54 100644 --- a/site/content/ai-suite/reference/gen-ai.md +++ b/site/content/ai-suite/reference/gen-ai.md @@ -70,6 +70,10 @@ keeping your data separate. When the Importer service creates ArangoDB collectio your project name as a prefix. For example, a project named `docs` will have collections like `docs_Documents`, `docs_Chunks`, and so on. +Projects are required for the following services: +- Importer +- Retriever + ### Creating a project To create a new GraphRAG project, send a POST request to the project endpoint: diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index 5f66ecbe3e..daf130c262 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -68,14 +68,24 @@ services like OpenAI's models via the OpenAI API or a large array of models The Importer service can be configured to use either: - Triton Inference Server (for private LLM deployments) -- OpenAI (for public LLM deployments) -- OpenRouter (for public LLM deployments) +- Any OpenAI-compatible API (for public LLM deployments), including OpenAI, OpenRouter, Gemini, Anthropic, and more To start the service, use the AI service endpoint `/v1/graphragimporter`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using OpenAI for chat and embedding +### Using OpenAI-compatible APIs + +The `openai` provider works with any OpenAI-compatible API, including: +- OpenAI (official API) +- OpenRouter +- Google Gemini +- Anthropic Claude +- Corporate or self-hosted LLMs with OpenAI-compatible endpoints + +set the `chat_api_url` and `embedding_api_url` to point to your provider's endpoint. + +**Example using OpenAI:** ```json { @@ -95,9 +105,9 @@ information on how to use it. 
Where: - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored -- `chat_api_provider`: API provider for language model services +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API - `chat_api_url`: API endpoint URL for the chat/language model service -- `embedding_api_provider`: API provider for embedding model services +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings @@ -105,19 +115,20 @@ Where: - `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -By default, for OpenAI API, the service is using -`gpt-4o-mini` and `text-embedding-3-small` models as LLM and -embedding model respectively. +When using the official OpenAI API, the service defaults to `gpt-4o-mini` and +`text-embedding-3-small` models. {{< /info >}} -### Using OpenRouter for chat and OpenAI for embedding +### Using different providers for chat and embedding -OpenRouter makes it possible to connect to a huge array of LLM API -providers, including non-OpenAI LLMs like Gemini Flash, Anthropic Claude -and publicly hosted open-source models. +You can mix and match any OpenAI-compatible APIs for chat and embedding. For example, +you might use one provider for text generation and another for embeddings, depending +on your needs for performance, cost, or model availability. -When using the OpenRouter option, the LLM responses are served via OpenRouter -while OpenAI is used for the embedding model. +Since both providers use `"openai"` as the provider value, you differentiate them by +setting different URLs in `chat_api_url` and `embedding_api_url`. + +**Example using OpenRouter for chat and OpenAI for embedding:** ```json { @@ -137,18 +148,19 @@ while OpenAI is used for the embedding model. Where: - `db_name`: Name of the ArangoDB database where the knowledge graph is stored -- `chat_api_provider`: API provider for language model services -- `chat_api_url`: API endpoint URL for the chat/language model service -- `embedding_api_provider`: API provider for embedding model services -- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service (in this example, OpenRouter) +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service (in this example, OpenAI) - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings - `chat_api_key`: API key for authenticating with the chat/language model service - `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -When using OpenRouter, the service defaults to `mistral-nemo` for generation -(via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI). +You can use any combination of OpenAI-compatible providers. This example shows +OpenRouter (for chat) and OpenAI (for embeddings), but you could use Gemini, +Anthropic, or any other compatible service. 
{{< /info >}} ### Using Triton Inference Server for chat and embedding diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 8eb9d64711..62586273d0 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -20,8 +20,7 @@ The Retriever service offers two distinct search methods: - **Deep search**: Analyzes the knowledge graph structure to identify themes and patterns, perfect for comprehensive insights and detailed summaries. -The service supports both private (Triton Inference Server) and public (OpenAI) -LLM deployments, making it flexible for various security and infrastructure +The service supports both private (Triton Inference Server) and public (any OpenAI-compatible API) LLM deployments, making it flexible for various security and infrastructure requirements. With simple HTTP endpoints, you can easily query your knowledge graph and get contextually relevant responses. @@ -97,14 +96,25 @@ streamed natural-language response with clickable references to the relevant doc ## Installation The Retriever service can be configured to use either the Triton Inference Server -(for private LLM deployments) or OpenAI/OpenRouter (for public LLM deployments). +(for private LLM deployments) or any OpenAI-compatible API (for public LLM deployments), +including OpenAI, OpenRouter, Gemini, Anthropic, and more. To start the service, use the AI service endpoint `/v1/graphragretriever`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using OpenAI for chat and embedding +### Using OpenAI-compatible APIs +The `openai` provider works with any OpenAI-compatible API, including: +- OpenAI (official API) +- OpenRouter +- Google Gemini +- Anthropic Claude +- Corporate or self-hosted LLMs with OpenAI-compatible endpoints + +Set the `chat_api_url` and `embedding_api_url` to point to your provider's endpoint. + +**Example using OpenAI:** ```json { @@ -124,9 +134,9 @@ information on how to use it. Where: - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored -- `chat_api_provider`: API provider for language model services +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API - `chat_api_url`: API endpoint URL for the chat/language model service -- `embedding_api_provider`: API provider for embedding model services +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API - `embedding_api_url`: API endpoint URL for the embedding model service - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings @@ -134,19 +144,20 @@ Where: - `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -By default, for OpenAI API, the service is using -`gpt-4o-mini` and `text-embedding-3-small` models as LLM and -embedding model respectively. +When using the official OpenAI API, the service defaults to `gpt-4o-mini` and +`text-embedding-3-small` models. {{< /info >}} -### Using OpenRouter for chat and OpenAI for embedding +### Using different providers for chat and embedding -OpenRouter makes it possible to connect to a huge array of LLM API providers, -including non-OpenAI LLMs like Gemini Flash, Anthropic Claude and publicly hosted -open-source models. +You can mix and match any OpenAI-compatible APIs for chat and embedding. 
For example, +you might use one provider for text generation and another for embeddings, depending +on your needs for performance, cost, or model availability. -When using the OpenRouter option, the LLM responses are served via OpenRouter while -OpenAI is used for the embedding model. +Since both providers use `"openai"` as the provider value, you differentiate them by +setting different URLs in `chat_api_url` and `embedding_api_url`. + +**Example using OpenRouter for chat and OpenAI for embedding:** ```json { @@ -166,17 +177,19 @@ OpenAI is used for the embedding model. Where: - `db_name`: Name of the ArangoDB database where the knowledge graph is stored -- `chat_api_provider`: API provider for language model services -- `embedding_api_provider`: API provider for embedding model services -- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service (in this example, OpenRouter) +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service (in this example, OpenAI) - `chat_model`: Specific language model to use for text generation and analysis - `embedding_model`: Specific model to use for generating text embeddings - `chat_api_key`: API key for authenticating with the chat/language model service - `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -When using OpenRouter, the service defaults to `mistral-nemo` for generation -(via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI). +You can use any combination of OpenAI-compatible providers. This example shows +OpenRouter (for chat) and OpenAI (for embeddings), but you could use Gemini, +Anthropic, or any other compatible service. {{< /info >}} ### Using Triton Inference Server for chat and embedding @@ -259,7 +272,7 @@ The request parameters are the following: - `UNIFIED`: Instant search. - `LOCAL`: Deep search. - `provider`: The LLM provider to use: - - `0`: OpenAI (or OpenRouter) + - `0`: Any OpenAI-compatible API (OpenAI, OpenRouter, Gemini, Anthropic, etc.) - `1`: Triton - `include_metadata`: Whether to include metadata in the response. If not specified, defaults to `true`. - `use_llm_planner`: Whether to use the LLM planner for intelligent query processing. If not specified, defaults to `true`. From 59cdb870e575b928ca0eeed44637312e20529a94 Mon Sep 17 00:00:00 2001 From: Paula Date: Thu, 6 Nov 2025 13:05:29 +0100 Subject: [PATCH 12/13] restructure and clarify all available search methods and executing queries --- .../ai-suite/graphrag/web-interface.md | 2 - site/content/ai-suite/reference/retriever.md | 193 ++++++++++++------ 2 files changed, 136 insertions(+), 59 deletions(-) diff --git a/site/content/ai-suite/graphrag/web-interface.md b/site/content/ai-suite/graphrag/web-interface.md index ee2297a4c1..01d0d19f2c 100644 --- a/site/content/ai-suite/graphrag/web-interface.md +++ b/site/content/ai-suite/graphrag/web-interface.md @@ -184,8 +184,6 @@ The Retriever service provides two search methods: - [Deep search](../reference/retriever.md#deep-search): This option will take longer to return a response. 
-![Chat with your Knowledge Graph](../../images/graphrag-ui-chat.png) - In addition to querying the Knowledge Graph, the chat service allows you to do the following: - Switch the search method from **Instant search** to **Deep search** and vice-versa directly in the chat diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 62586273d0..0e524fb867 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -14,25 +14,23 @@ the Arango team. ## Overview -The Retriever service offers two distinct search methods: -- **Instant search**: Focuses on specific entities and their relationships, ideal - for fast queries about particular concepts. -- **Deep search**: Analyzes the knowledge graph structure to identify themes and patterns, - perfect for comprehensive insights and detailed summaries. - -The service supports both private (Triton Inference Server) and public (any OpenAI-compatible API) LLM deployments, making it flexible for various security and infrastructure -requirements. With simple HTTP endpoints, you can easily query your knowledge -graph and get contextually relevant responses. +The Retriever service provides intelligent search and retrieval from knowledge graphs, +with multiple search methods optimized for different query types. The service supports +both private (Triton Inference Server) and public (any OpenAI-compatible API) LLM +deployments, making it flexible for various security and infrastructure requirements. **Key features:** -- Dual search methods for different query types +- Multiple search methods optimized for different use cases +- Streaming support for real-time responses for `UNIFIED` queries +- Optional LLM orchestration for `LOCAL` queries +- Configurable community hierarchy levels for `GLOBAL` queries - Support for both private and public LLM deployments - Simple REST API interface - Integration with ArangoDB knowledge graphs -- Configurable community hierarchy levels {{< tip >}} -You can also use the GraphRAG Retriever service via the [web interface](../graphrag/web-interface.md). +You can use the Retriever service via the [web interface](../graphrag/web-interface.md) +for Instant and Deep Search, or through the API for full control over all query types. {{< /tip >}} ## Prerequisites @@ -46,12 +44,29 @@ Before using the Retriever service, you need to: 2. **Import data** - Use the [Importer](importer.md) service to transform your text documents into a knowledge graph stored in ArangoDB. -## Search methods +## Search Methods The Retriever service enables intelligent search and retrieval of information -from your knowledge graph. It provides two powerful search methods, instant search -and deep search, that leverage the structured knowledge graph created by the Importer -to deliver accurate and contextually relevant responses to your natural language queries. +from your knowledge graph. It provides multiple search methods that leverage +the structured knowledge graph created by the Importer to deliver accurate and +contextually relevant responses to your natural language queries. + +### Instant Search + +Instant Search is designed for responses with very short latency. It triggers +fast unified retrieval over relevant parts of the knowledge graph via hybrid +(semantic and lexical) search and graph expansion algorithms, producing a fast, +streamed natural-language response with clickable references to the relevant documents. 
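Because the response is streamed, you can watch the answer arrive chunk by chunk
from the command line. The following is a minimal sketch rather than an official
example: the host placeholder `<your-service-url>` is hypothetical, and curl's
`-N` flag simply disables output buffering so each streamed chunk prints as soon
as it is generated:

```bash
# Minimal sketch: stream an Instant Search (UNIFIED) response as it is generated.
# <your-service-url> is a hypothetical placeholder; substitute your deployment's URL.
curl -N -X POST <your-service-url>/v1/graphrag-query-stream \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What topics do the imported documents cover?",
    "query_type": "UNIFIED",
    "provider": 0
  }'
```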
+ +{{< info >}} +The Instant Search method is also available via the [Web interface](../graphrag/web-interface.md). +{{< /info >}} + +```json +{ + "query_type": "UNIFIED" +} +``` ### Deep Search @@ -62,36 +77,50 @@ detail and accuracy are required (e.g. aggregation of highly technical details) very short latency is not (i.e. caching responses for frequently asked questions, or use case with agents or research use cases). -- **Community-Based Analysis**: Uses pre-generated community reports from your - knowledge graph to understand the overall structure and themes of your data. +{{< info >}} +The Deep Search method is also available via the [Web interface](../graphrag/web-interface.md). +{{< /info >}} + +```json +{ + "query_type": "LOCAL", + "use_llm_planner": true +} +``` + +### Global Search + +Global search is designed for queries that require understanding and aggregation of information across your entire document. It’s particularly effective for questions about overall themes, patterns, or high-level insights in your data. + +- **Community-Based Analysis**: Uses pre-generated community reports from your knowledge graph to understand the overall structure and themes of your data. - **Map-Reduce Processing**: - - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points. - - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response. + - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points. + - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response. -**Best use cases**: -- "What are the main themes in the dataset?" -- "Summarize the key findings across all documents" -- "What are the most important concepts discussed?" +```json +{ + "query_type": "GLOBAL" +} +``` -### Instant Search +### Local Search -Instant Search is designed for responses with very short latency. It triggers -fast unified retrieval over relevant parts of the knowledge graph via hybrid -(semantic and lexical) search and graph expansion algorithms, producing a fast, -streamed natural-language response with clickable references to the relevant documents. +Local search focuses on specific entities and their relationships within your knowledge graph. It is ideal for detailed queries about particular concepts, entities, or relationships. - **Entity Identification**: Identifies relevant entities from the knowledge graph based on the query. - **Context Gathering**: Collects: - - Related text chunks from original documents. - - Connected entities and their strongest relationships. - - Entity descriptions and attributes. - - Context from the community each entity belongs to. + - Related text chunks from original documents. + - Connected entities and their strongest relationships. + - Entity descriptions and attributes. + - Context from the community each entity belongs to. - **Prioritized Response**: Generates a response using the most relevant gathered information. -**Best use cases**: -- "What are the properties of [specific entity]?" -- "How is [entity A] related to [entity B]?" -- "What are the key details about [specific concept]?" +```json +{ + "query_type": "LOCAL", + "use_llm_planner": false +} +``` ## Installation @@ -195,12 +224,12 @@ Anthropic, or any other compatible service. ### Using Triton Inference Server for chat and embedding The first step is to install the LLM Host service with the LLM and -embedding models of your choice. 
The setup will use the
+Triton Inference Server and MLflow at the backend.
For more details, please refer to the [Triton Inference Server](triton-inference-server.md)
and [Mlflow](mlflow.md) documentation.

-Once the `llmhost` service is up-and-running, then you can start the Importer
+Once the `llmhost` service is up-and-running, then you can start the Retriever
service using the below configuration:

```json
@@ -229,53 +258,103 @@ Where:

## Executing queries

After the Retriever service is installed successfully, you can interact with
-it using the following HTTP endpoints, based on the selected search method.
+it using the following HTTP endpoints.

{{< tabs "executing-queries" >}}
-{{< tab "Instant search" >}}
+{{< tab "Instant Search" >}}
+
```bash
curl -X POST /v1/graphrag-query-stream \
  -H "Content-Type: application/json" \
  -d '{
-    "query": "What is the AR3 Drone?",
+    "query": "How are X and Y related?",
    "query_type": "UNIFIED",
    "provider": 0,
-    "include_metadata": true,
-    "use_llm_planner": false
+    "include_metadata": true
  }'
```
+
{{< /tab >}}
-{{< tab "Deep search" >}}
+{{< tab "Deep Search" >}}

```bash
curl -X POST /v1/graphrag-query \
  -H "Content-Type: application/json" \
  -d '{
-    "query": "What are the main themes and topics discussed in the documents?",
+    "query": "What are the properties of a specific entity?",
+    "query_type": "LOCAL",
+    "use_llm_planner": true,
+    "provider": 0,
+    "include_metadata": true
+  }'
+```
+
+{{< /tab >}}
+
+{{< tab "Global Search" >}}
+
+```bash
+curl -X POST /v1/graphrag-query \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "What are the main themes discussed in the document?",
+    "query_type": "GLOBAL",
    "level": 1,
+    "provider": 0,
+    "include_metadata": true
+  }'
+```
+
+{{< /tab >}}
+
+{{< tab "Local Search" >}}
+
+```bash
+curl -X POST /v1/graphrag-query \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "What is the AR3 Drone?",
    "query_type": "LOCAL",
+    "use_llm_planner": false,
    "provider": 0,
-    "include_metadata": true,
-    "use_llm_planner": true
+    "include_metadata": true
  }'
```
+
{{< /tab >}}
{{< /tabs >}}

-The request parameters are the following:
-- `query`: Your search query text.
-- `level`: The community hierarchy level to use for the search (`1` for top-level communities). Defaults to `2` if not provided.
+### Request Parameters
+
+- `query`: Your search query text (required).
+
- `query_type`: The type of search to perform.
-  - `UNIFIED`: Instant search.
-  - `LOCAL`: Deep search.
-- `provider`: The LLM provider to use:
+  - `GLOBAL` or `1`: Global Search (default if not specified).
+  - `LOCAL` or `2`: Deep Search when used with the LLM planner, or standard Local Search without the planner.
+  - `UNIFIED` or `3`: Instant Search.
+
+- `use_llm_planner`: Whether to use the LLM planner for intelligent query orchestration (optional)
+  - When enabled, orchestrates retrieval using both local and global strategies (powers Deep Search)
+  - Set to `false` for standard Local Search without orchestration
+
+- `level`: Community hierarchy level for analysis (only applicable for `GLOBAL` queries)
+  - `1` for top-level communities (broader themes)
+  - `2` for more granular communities (default)
+
+- `provider`: The LLM provider to use
  - `0`: Any OpenAI-compatible API (OpenAI, OpenRouter, Gemini, Anthropic, etc.)
-  - `1`: Triton
+  - `1`: Triton Inference Server
+
+- `include_metadata`: Whether to include metadata in the response (optional, defaults to `false`)
- `use_llm_planner`: Whether to use the LLM planner for intelligent query processing. If not specified, defaults to `true`.
+
+- `response_instruction`: Custom instructions for response generation style (optional)
+
+- `use_cache`: Whether to use caching for this query (optional, defaults to `false`)
+
+- `show_citations`: Whether to show inline citations in the response (optional, defaults to `false`)

## Health check

From 037006de343bfc6af904b74d80dd4da7db072668 Mon Sep 17 00:00:00 2001
From: Paula
Date: Fri, 7 Nov 2025 11:28:16 +0100
Subject: [PATCH 13/13] address review comments

---
 site/content/ai-suite/reference/gen-ai.md    | 20 +++++++++-
 site/content/ai-suite/reference/importer.md  | 41 ++++++++++----------
 site/content/ai-suite/reference/retriever.md | 40 +++++++++----------
 3 files changed, 59 insertions(+), 42 deletions(-)

diff --git a/site/content/ai-suite/reference/gen-ai.md b/site/content/ai-suite/reference/gen-ai.md
index 0745965f54..af436ce37d 100644
--- a/site/content/ai-suite/reference/gen-ai.md
+++ b/site/content/ai-suite/reference/gen-ai.md
@@ -194,12 +194,28 @@ curl -X POST https://<your-arangodb-endpoint>:8529/ai/v1/graphragimporter \
     "env": {
       "db_name": "<your-database-name>",
       "chat_api_provider": "<your-api-provider>",
-      "chat_api_key": "<your-api-key>",
-      "chat_model": "<your-model>"
+      "chat_api_url": "https://api.openai.com/v1",
+      "embedding_api_provider": "openai",
+      "embedding_api_url": "https://api.openai.com/v1",
+      "chat_model": "gpt-4o",
+      "embedding_model": "text-embedding-3-small",
+      "chat_api_key": "your_openai_api_key",
+      "embedding_api_key": "your_openai_api_key"
     }
   }'
```

+Where:
+- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored
+- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API
+- `chat_api_url`: API endpoint URL for the chat/language model service
+- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+- `chat_api_key`: API key for authenticating with the chat/language model service
+- `embedding_api_key`: API key for authenticating with the embedding model service
+
 **Response:**

```json

diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md
index daf130c262..7edb4ea50a 100644
--- a/site/content/ai-suite/reference/importer.md
+++ b/site/content/ai-suite/reference/importer.md
@@ -44,31 +44,33 @@ service using the `genai_project_name` field in the service configuration.

You can choose between two deployment options based on your needs.

-### Private LLM
+### Triton Inference Server

If you're working in an air-gapped environment or need to keep your data
-private, you can use the private LLM mode with Triton Inference Server.
+private, you can use Triton Inference Server.
This option allows you to run the service completely within your own
infrastructure. The Triton Inference Server is a crucial component when
-running in private LLM mode. It serves as the backbone for running your
+running with self-hosted models. It serves as the backbone for running your
language (LLM) and embedding models on your own machines, ensuring your
data never leaves your infrastructure.
The server handles all the complex model operations, from processing text to generating embeddings, and provides both HTTP and gRPC interfaces for communication. -### Public LLM +### OpenAI-compatible APIs Alternatively, if you prefer a simpler setup and don't have specific privacy -requirements, you can use the public LLM mode. This option connects to cloud-based +requirements, you can use OpenAI-compatible APIs. This option connects to cloud-based services like OpenAI's models via the OpenAI API or a large array of models (Gemini, Anthropic, publicly hosted open-source models, etc.) via the OpenRouter option. +It also works with private corporate LLMs that expose an OpenAI-compatible endpoint. ## Installation and configuration -The Importer service can be configured to use either: -- Triton Inference Server (for private LLM deployments) -- Any OpenAI-compatible API (for public LLM deployments), including OpenAI, OpenRouter, Gemini, Anthropic, and more +The Importer service can be configured to use either Triton Inference Server or any +OpenAI-compatible API. OpenAI-compatible APIs work with public providers (OpenAI, +OpenRouter, Gemini, Anthropic) as well as private corporate LLMs that expose an +OpenAI-compatible endpoint. To start the service, use the AI service endpoint `/v1/graphragimporter`. Please refer to the documentation of [AI service](gen-ai.md) for more @@ -115,18 +117,23 @@ Where: - `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -When using the official OpenAI API, the service defaults to `gpt-4o-mini` and +When using the official OpenAI API, the service defaults to `gpt-4o` and `text-embedding-3-small` models. {{< /info >}} -### Using different providers for chat and embedding +### Using different OpenAI-compatible services for chat and embedding -You can mix and match any OpenAI-compatible APIs for chat and embedding. For example, -you might use one provider for text generation and another for embeddings, depending +You can use different OpenAI-compatible services for chat and embedding. For example, +you might use OpenRouter for chat and OpenAI for embeddings, depending on your needs for performance, cost, or model availability. -Since both providers use `"openai"` as the provider value, you differentiate them by -setting different URLs in `chat_api_url` and `embedding_api_url`. +{{< info >}} +Both `chat_api_provider` and `embedding_api_provider` must be set to the same value +(either both `"openai"` or both `"triton"`). You cannot mix Triton and OpenAI-compatible +APIs. However, you can use different OpenAI-compatible services (like OpenRouter, OpenAI, +Gemini, etc.) by setting both providers to `"openai"` and differentiating them with +different URLs in `chat_api_url` and `embedding_api_url`. +{{< /info >}} **Example using OpenRouter for chat and OpenAI for embedding:** @@ -157,12 +164,6 @@ Where: - `chat_api_key`: API key for authenticating with the chat/language model service - `embedding_api_key`: API key for authenticating with the embedding model service -{{< info >}} -You can use any combination of OpenAI-compatible providers. This example shows -OpenRouter (for chat) and OpenAI (for embeddings), but you could use Gemini, -Anthropic, or any other compatible service. 
-{{< /info >}} - ### Using Triton Inference Server for chat and embedding The first step is to install the LLM Host service with the LLM and diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 0e524fb867..47b4c0a92e 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -16,15 +16,15 @@ the Arango team. The Retriever service provides intelligent search and retrieval from knowledge graphs, with multiple search methods optimized for different query types. The service supports -both private (Triton Inference Server) and public (any OpenAI-compatible API) LLM -deployments, making it flexible for various security and infrastructure requirements. +LLMs through Triton Inference Server or any OpenAI-compatible API (including private +corporate LLMs), making it flexible for various deployment and infrastructure requirements. **Key features:** - Multiple search methods optimized for different use cases - Streaming support for real-time responses for `UNIFIED` queries - Optional LLM orchestration for `LOCAL` queries - Configurable community hierarchy levels for `GLOBAL` queries -- Support for both private and public LLM deployments +- Support for Triton Inference Server and OpenAI-compatible APIs - Simple REST API interface - Integration with ArangoDB knowledge graphs @@ -124,9 +124,10 @@ Local search focuses on specific entities and their relationships within your kn ## Installation -The Retriever service can be configured to use either the Triton Inference Server -(for private LLM deployments) or any OpenAI-compatible API (for public LLM deployments), -including OpenAI, OpenRouter, Gemini, Anthropic, and more. +The Retriever service can be configured to use either Triton Inference Server or any +OpenAI-compatible API. OpenAI-compatible APIs work with public providers (OpenAI, +OpenRouter, Gemini, Anthropic) as well as private corporate LLMs that expose an +OpenAI-compatible endpoint. To start the service, use the AI service endpoint `/v1/graphragretriever`. Please refer to the documentation of [AI service](gen-ai.md) for more @@ -173,18 +174,23 @@ Where: - `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -When using the official OpenAI API, the service defaults to `gpt-4o-mini` and +When using the official OpenAI API, the service defaults to `gpt-4o` and `text-embedding-3-small` models. {{< /info >}} -### Using different providers for chat and embedding +### Using different OpenAI-compatible services for chat and embedding -You can mix and match any OpenAI-compatible APIs for chat and embedding. For example, -you might use one provider for text generation and another for embeddings, depending +You can use different OpenAI-compatible services for chat and embedding. For example, +you might use OpenRouter for chat and OpenAI for embeddings, depending on your needs for performance, cost, or model availability. -Since both providers use `"openai"` as the provider value, you differentiate them by -setting different URLs in `chat_api_url` and `embedding_api_url`. +{{< info >}} +Both `chat_api_provider` and `embedding_api_provider` must be set to the same value +(either both `"openai"` or both `"triton"`). You cannot mix Triton and OpenAI-compatible +APIs. However, you can use different OpenAI-compatible services (like OpenRouter, OpenAI, +Gemini, etc.) 
by setting both providers to `"openai"` and differentiating them with
+different URLs in `chat_api_url` and `embedding_api_url`.
+{{< /info >}}

 **Example using OpenRouter for chat and OpenAI for embedding:**

@@ -215,12 +221,6 @@ Where:
 - `chat_api_key`: API key for authenticating with the chat/language model service
 - `embedding_api_key`: API key for authenticating with the embedding model service

-{{< info >}}
-You can use any combination of OpenAI-compatible providers. This example shows
-OpenRouter (for chat) and OpenAI (for embeddings), but you could use Gemini,
-Anthropic, or any other compatible service.
-{{< /info >}}
-
 ### Using Triton Inference Server for chat and embedding

 The first step is to install the LLM Host service with the LLM and
@@ -333,11 +333,11 @@ curl -X POST /v1/graphrag-query \

 - `query_type`: The type of search to perform.
   - `GLOBAL` or `1`: Global Search (default if not specified).
-  - `LOCAL` or `2`: Deep Search when used with the LLM planner, or standard Local Search without the planner.
+  - `LOCAL` or `2`: Deep Search when used with the LLM planner (default), or standard Local Search when `use_llm_planner` is explicitly set to `false`.
   - `UNIFIED` or `3`: Instant Search.

 - `use_llm_planner`: Whether to use the LLM planner for intelligent query orchestration (optional)
-  - When enabled, orchestrates retrieval using both local and global strategies (powers Deep Search)
+  - When enabled (default), orchestrates retrieval using both local and global strategies (powers Deep Search)
   - Set to `false` for standard Local Search without orchestration

 - `level`: Community hierarchy level for analysis (only applicable for `GLOBAL` queries)