diff --git a/site/content/ai-suite/graphrag/web-interface.md b/site/content/ai-suite/graphrag/web-interface.md index 927da744a2..01d0d19f2c 100644 --- a/site/content/ai-suite/graphrag/web-interface.md +++ b/site/content/ai-suite/graphrag/web-interface.md @@ -178,9 +178,11 @@ See also the [Retriever](../reference/retriever.md) documentation. ## Chat with your Knowledge Graph -The chat interface provides two search methods: -- **Instant search**: Instant queries provide fast responses. -- **Deep search**: This option will take longer to return a response. +The Retriever service provides two search methods: +- [Instant search](../reference/retriever.md#instant-search): Instant + queries provide fast responses. +- [Deep search](../reference/retriever.md#deep-search): This option will take + longer to return a response. In addition to querying the Knowledge Graph, the chat service allows you to do the following: - Switch the search method from **Instant search** to **Deep search** and vice-versa diff --git a/site/content/ai-suite/reference/gen-ai.md b/site/content/ai-suite/reference/gen-ai.md index f545a7e255..0745965f54 100644 --- a/site/content/ai-suite/reference/gen-ai.md +++ b/site/content/ai-suite/reference/gen-ai.md @@ -33,22 +33,15 @@ in the platform. All services support the `profiles` field, which you can use to define the profile to use for the service. For example, you can define a GPU profile that enables the service to run an LLM on GPU resources. -## LLM Host Service Creation Request Body +## Service Creation Request Body -```json -{ - "env": { - "model_name": "" - } -} -``` - -## Using Labels in Creation Request Body +The following example shows a complete request body with all available options: ```json { "env": { - "model_name": "" + "model_name": "", + "profiles": "gpu,internal" }, "labels": { "key1": "value1", @@ -57,32 +50,120 @@ GPU profile that enables the service to run an LLM on GPU resources. } ``` -{{< info >}} -Labels are optional. Labels can be used to filter and identify services in -the Platform. If you want to use labels, define them as a key-value pair in `labels` -within the `env` field. -{{< /info >}} +**Optional fields:** + +- **labels**: Key-value pairs used to filter and identify services in the platform. +- **profiles**: A comma-separated string defining which profiles to use for the + service (e.g., `"gpu,internal"`). If not set, the service is created with the + default profile. Profiles must be present and created in the platform before + they can be used. + +The parameters required for the deployment of each service are defined in the +corresponding service documentation. See [Importer](importer.md) +and [Retriever](retriever.md). + +## Projects + +Projects help you organize your GraphRAG work by grouping related services and +keeping your data separate. When the Importer service creates ArangoDB collections +(such as documents, chunks, entities, relationships, and communities), it uses +your project name as a prefix. For example, a project named `docs` will have +collections like `docs_Documents`, `docs_Chunks`, and so on. 
-## Using Profiles in Creation Request Body +Projects are required for the following services: +- Importer +- Retriever + +### Creating a project + +To create a new GraphRAG project, send a POST request to the project endpoint: + +```bash +curl -X POST "https://:8529/gen-ai/v1/project" \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{ + "project_name": "docs", + "project_type": "graphrag", + "project_db_name": "documentation", + "project_description": "A documentation project for GraphRAG." + }' +``` + +Where: +- **project_name** (required): Unique identifier for your project. Must be 1-63 + characters and contain only letters, numbers, underscores (`_`), and hyphens (`-`). +- **project_type** (required): Type of project (e.g., `"graphrag"`). +- **project_db_name** (required): The ArangoDB database name where the project + will be created. +- **project_description** (optional): A description of your project. + +Once created, you can reference your project in service deployments using the +`genai_project_name` field: ```json { - "env": { - "model_name": "", - "profiles": "gpu,internal" - } + "env": { + "genai_project_name": "docs" + } } ``` -{{< info >}} -The `profiles` field is optional. If it is not set, the service is created with -the default profile. Profiles must be present and created in the Platform before -they can be used. If you want to use profiles, define them as a comma-separated -string in `profiles` within the `env` field. -{{< /info >}} +### Listing projects -The parameters required for the deployment of each service are defined in the -corresponding service documentation. +**List all project names in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_project_names/" \ + -H "Authorization: Bearer " +``` + +This returns only the project names for quick reference. + +**List all projects with full metadata in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_projects/" \ + -H "Authorization: Bearer " +``` + +This returns complete project objects including metadata, associated services, +and knowledge graph information. + +### Getting project details + +Retrieve comprehensive metadata for a specific project: + +```bash +curl -X GET "https://:8529/gen-ai/v1/project_by_name//" \ + -H "Authorization: Bearer " +``` + +The response includes: +- Project configuration +- Associated Importer and Retriever services +- Knowledge graph metadata +- Service status information +- Last modification timestamp + +### Deleting a project + +Remove a project's metadata from the GenAI service: + +```bash +curl -X DELETE "https://:8529/gen-ai/v1/project//" \ + -H "Authorization: Bearer " +``` + +{{< warning >}} +Deleting a project only removes the project metadata from the GenAI service. +It does **not** delete: +- Services associated with the project (must be deleted separately) +- ArangoDB collections and data +- Knowledge graphs + +You must manually delete services and collections if needed. +{{< /warning >}} ## Obtaining a Bearer Token @@ -101,7 +182,7 @@ documentation. ## Complete Service lifecycle example -The example below shows how to install, monitor, and uninstall the Importer service. +The example below shows how to install, monitor, and uninstall the [Importer](importer.md) service. 
### Step 1: Installing the service @@ -111,11 +192,10 @@ curl -X POST https://:8529/ai/v1/graphragimporter \ -H "Content-Type: application/json" \ -d '{ "env": { - "username": "", "db_name": "", - "api_provider": "", - "triton_url": "", - "triton_model": "" + "chat_api_provider": "", + "chat_api_key": "", + "chat_model": "" } }' ``` @@ -176,16 +256,6 @@ curl -X DELETE https://:8529/ai/v1/service/arangodb-graphrag-i - **Authentication**: All requests use the same Bearer token in the `Authorization` header {{< /info >}} -### Customizing the example - -Replace the following values with your actual configuration: -- `` - Your database username. -- `` - Target database name. -- `` - Your API provider (e.g., `triton`) -- `` - Your LLM host service URL. -- `` - Your Triton model name (e.g., `mistral-nemo-instruct`). -- `` - Your authentication token. - ## Service configuration The AI orchestrator service is **started by default**. diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index e4cce5d200..daf130c262 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -28,40 +28,17 @@ different concepts in your document with the Retriever service. You can also use the GraphRAG Importer service via the [Data Platform web interface](../graphrag/web-interface.md). {{< /tip >}} -## Creating a new project - -To create a new GraphRAG project, use the `CreateProject` method by sending a -`POST` request to the `/ai/v1/project` endpoint. You must provide a unique -`project_name` and a `project_type` in the request body. Optionally, you can -provide a `project_description`. - -```curl -curl -X POST "https://:8529/ai/v1/project" \ --H "Content-Type: application/json" \ --d '{ - "project_name": "docs", - "project_type": "graphrag", - "project_description": "A documentation project for GraphRAG." -}' -``` -All the relevant ArangoDB collections (such as documents, chunks, entities, -relationships, and communities) created during the import process will -have the project name as a prefix. For example, the Documents collection will -become `_Documents`. The Knowledge Graph will also use the project -name as a prefix. If no project name is specified, then all collections -are prefixed with `default_project`, e.g., `default_project_Documents`. - -### Project metadata +## Prerequisites -Additional project metadata is accessible via the following endpoint, replacing -`` with the actual name of your project: +Before importing data, you need to create a GraphRAG project. Projects help you +organize your work and keep your data separate from other projects. -``` -GET /ai/v1/project_by_name/ -``` +For detailed instructions on creating and managing projects, see the +[Projects](gen-ai.md#projects) section in the GenAI Orchestration Service +documentation. -The endpoint provides comprehensive metadata about your project's components, -including its importer and retriever services and their status. +Once you have created a project, you can reference it when deploying the Importer +service using the `genai_project_name` field in the service configuration. 
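+
+For illustration, a minimal Importer deployment that references the `docs`
+project from the [Projects](gen-ai.md#projects) example might look like the
+following sketch. The database and project names are placeholders, and the
+LLM-related fields described in the configuration sections below still apply:
+
+```json
+{
+  "env": {
+    "db_name": "documentation",
+    "genai_project_name": "docs"
+  }
+}
+```
+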
## Deployment options @@ -91,103 +68,135 @@ services like OpenAI's models via the OpenAI API or a large array of models The Importer service can be configured to use either: - Triton Inference Server (for private LLM deployments) -- OpenAI (for public LLM deployments) -- OpenRouter (for public LLM deployments) +- Any OpenAI-compatible API (for public LLM deployments), including OpenAI, OpenRouter, Gemini, Anthropic, and more To start the service, use the AI service endpoint `/v1/graphragimporter`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using Triton Inference Server (Private LLM) - -The first step is to install the LLM Host service with the LLM and -embedding models of your choice. The setup will the use the -Triton Inference Server and MLflow at the backend. -For more details, please refer to the [Triton Inference Server](triton-inference-server.md) -and [Mlflow](mlflow.md) documentation. - -Once the `llmhost` service is up-and-running, then you can start the Importer -service using the below configuration: +### Using OpenAI-compatible APIs -```json -{ - "env": { - "username": "your_username", - "db_name": "your_database_name", - "api_provider": "triton", - "triton_url": "your-arangodb-llm-host-url", - "triton_model": "mistral-nemo-instruct" - }, -} -``` +The `openai` provider works with any OpenAI-compatible API, including: +- OpenAI (official API) +- OpenRouter +- Google Gemini +- Anthropic Claude +- Corporate or self-hosted LLMs with OpenAI-compatible endpoints -Where: -- `username`: ArangoDB database user with permissions to create and modify collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored. -- `api_provider`: Specifies which LLM provider to use. -- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running. -- `triton_model`: Name of the LLM model to use for text processing. +set the `chat_api_url` and `embedding_api_url` to point to your provider's endpoint. 
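+
+The `env` blocks in the examples below form the body of the service creation
+request. As a rough sketch of such a deployment call (host, token, and field
+values are placeholders; see the [AI service](gen-ai.md) lifecycle example for
+the full install, monitor, and uninstall flow):
+
+```bash
+curl -X POST https://:8529/ai/v1/graphragimporter \
+  -H "Authorization: Bearer " \
+  -H "Content-Type: application/json" \
+  -d '{
+    "env": {
+      "db_name": "",
+      "chat_api_provider": "",
+      "chat_api_key": "",
+      "chat_model": ""
+    }
+  }'
+```
+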
-### Using OpenAI (Public LLM) +**Example using OpenAI:** ```json { "env": { - "openai_api_key": "your_openai_api_key", - "username": "your_username", "db_name": "your_database_name", - "api_provider": "openai" + "chat_api_provider": "openai", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored -- `api_provider`: Specifies which LLM provider to use -- `openai_api_key`: Your OpenAI API key +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -By default, for OpenAI API, the service is using -`gpt-4o-mini` and `text-embedding-3-small` models as LLM and -embedding model respectively. +When using the official OpenAI API, the service defaults to `gpt-4o-mini` and +`text-embedding-3-small` models. {{< /info >}} -### Using OpenRouter (Gemini, Anthropic, etc.) +### Using different providers for chat and embedding -OpenRouter makes it possible to connect to a huge array of LLM API -providers, including non-OpenAI LLMs like Gemini Flash, Anthropic Claude -and publicly hosted open-source models. +You can mix and match any OpenAI-compatible APIs for chat and embedding. For example, +you might use one provider for text generation and another for embeddings, depending +on your needs for performance, cost, or model availability. -When using the OpenRouter option, the LLM responses are served via OpenRouter -while OpenAI is used for the embedding model. +Since both providers use `"openai"` as the provider value, you differentiate them by +setting different URLs in `chat_api_url` and `embedding_api_url`. 
+ +**Example using OpenRouter for chat and OpenAI for embedding:** ```json { "env": { "db_name": "your_database_name", - "username": "your_username", - "api_provider": "openrouter", - "openai_api_key": "your_openai_api_key", - "openrouter_api_key": "your_openrouter_api_key", - "openrouter_model": "mistralai/mistral-nemo" // Specify a model here + "chat_api_provider": "openai", + "embedding_api_provider": "openai", + "chat_api_url": "https://openrouter.ai/api/v1", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "mistral-nemo", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openrouter_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored -- `api_provider`: Specifies which LLM provider to use -- `openai_api_key`: Your OpenAI API key (for the embedding model) -- `openrouter_api_key`: Your OpenRouter API key (for the LLM) -- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`) +- `db_name`: Name of the ArangoDB database where the knowledge graph is stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service (in this example, OpenRouter) +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service (in this example, OpenAI) +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -When using OpenRouter, the service defaults to `mistral-nemo` for generation -(via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI). +You can use any combination of OpenAI-compatible providers. This example shows +OpenRouter (for chat) and OpenAI (for embeddings), but you could use Gemini, +Anthropic, or any other compatible service. {{< /info >}} +### Using Triton Inference Server for chat and embedding + +The first step is to install the LLM Host service with the LLM and +embedding models of your choice. The setup will the use the +Triton Inference Server and MLflow at the backend. +For more details, please refer to the [Triton Inference Server](triton-inference-server.md) +and [Mlflow](mlflow.md) documentation. 
+ +Once the `llmhost` service is up-and-running, then you can start the Importer +service using the below configuration: + +```json +{ + "env": { + "db_name": "your_database_name", + "chat_api_provider": "triton", + "embedding_api_provider": "triton", + "chat_api_url": "your-arangodb-llm-host-url", + "embedding_api_url": "your-arangodb-llm-host-url", + "chat_model": "mistral-nemo-instruct", + "embedding_model": "nomic-embed-text-v1" + }, +} +``` + +Where: +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Specifies which LLM provider to use for language model services +- `embedding_api_provider`: API provider for embedding model services (e.g., "triton") +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings + ## Building Knowledge Graphs Once the service is installed successfully, you can follow these steps diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 5949d8a369..0e524fb867 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -14,214 +14,347 @@ the Arango team. ## Overview -The Retriever service offers two distinct search methods: -- **Global search**: Analyzes entire document to identify themes and patterns, - perfect for high-level insights and comprehensive summaries. -- **Local search**: Focuses on specific entities and their relationships, ideal - for detailed queries about particular concepts. - -The service supports both private (Triton Inference Server) and public (OpenAI) -LLM deployments, making it flexible for various security and infrastructure -requirements. With simple HTTP endpoints, you can easily query your knowledge -graph and get contextually relevant responses. +The Retriever service provides intelligent search and retrieval from knowledge graphs, +with multiple search methods optimized for different query types. The service supports +both private (Triton Inference Server) and public (any OpenAI-compatible API) LLM +deployments, making it flexible for various security and infrastructure requirements. **Key features:** -- Dual search methods for different query types +- Multiple search methods optimized for different use cases +- Streaming support for real-time responses for `UNIFIED` queries +- Optional LLM orchestration for `LOCAL` queries +- Configurable community hierarchy levels for `GLOBAL` queries - Support for both private and public LLM deployments - Simple REST API interface - Integration with ArangoDB knowledge graphs -- Configurable community hierarchy levels {{< tip >}} -You can also use the GraphRAG Retriever service via the ArangoDB [web interface](../graphrag/web-interface.md). +You can use the Retriever service via the [web interface](../graphrag/web-interface.md) +for Instant and Deep Search, or through the API for full control over all query types. {{< /tip >}} -## Search methods +## Prerequisites + +Before using the Retriever service, you need to: + +1. **Create a GraphRAG project** - For detailed instructions on creating and + managing projects, see the [Projects](gen-ai.md#projects) section in the + GenAI Orchestration Service documentation. + +2. 
**Import data** - Use the [Importer](importer.md) service to transform your + text documents into a knowledge graph stored in ArangoDB. + +## Search Methods The Retriever service enables intelligent search and retrieval of information -from your knowledge graph. It provides two powerful search methods, global Search -and local Search, that leverage the structured knowledge graph created by the Importer -to deliver accurate and contextually relevant responses to your natural language queries. +from your knowledge graph. It provides multiple search methods that leverage +the structured knowledge graph created by the Importer to deliver accurate and +contextually relevant responses to your natural language queries. + +### Instant Search + +Instant Search is designed for responses with very short latency. It triggers +fast unified retrieval over relevant parts of the knowledge graph via hybrid +(semantic and lexical) search and graph expansion algorithms, producing a fast, +streamed natural-language response with clickable references to the relevant documents. -### Global search +{{< info >}} +The Instant Search method is also available via the [Web interface](../graphrag/web-interface.md). +{{< /info >}} + +```json +{ + "query_type": "UNIFIED" +} +``` -Global search is designed for queries that require understanding and aggregation -of information across your entire document. It's particularly effective for questions -about overall themes, patterns, or high-level insights in your data. +### Deep Search -- **Community-Based Analysis**: Uses pre-generated community reports from your - knowledge graph to understand the overall structure and themes of your data, +Deep Search is designed for highly detailed, accurate responses that require understanding +what kind of information is available in different parts of the knowledge graph and +sequentially retrieving information in an LLM-guided research process. Use whenever +detail and accuracy are required (e.g. aggregation of highly technical details) and +very short latency is not (i.e. caching responses for frequently asked questions, +or use case with agents or research use cases). + +{{< info >}} +The Deep Search method is also available via the [Web interface](../graphrag/web-interface.md). +{{< /info >}} + +```json +{ + "query_type": "LOCAL", + "use_llm_planner": true +} +``` + +### Global Search + +Global search is designed for queries that require understanding and aggregation of information across your entire document. It’s particularly effective for questions about overall themes, patterns, or high-level insights in your data. + +- **Community-Based Analysis**: Uses pre-generated community reports from your knowledge graph to understand the overall structure and themes of your data. - **Map-Reduce Processing**: - - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points. - - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response. + - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points. + - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response. -**Best use cases**: -- "What are the main themes in the dataset?" -- "Summarize the key findings across all documents" -- "What are the most important concepts discussed?" 
+```json +{ + "query_type": "GLOBAL" +} +``` -### Local search +### Local Search -Local search focuses on specific entities and their relationships within your -knowledge graph. It is ideal for detailed queries about particular concepts, -entities, or relationships. +Local search focuses on specific entities and their relationships within your knowledge graph. It is ideal for detailed queries about particular concepts, entities, or relationships. - **Entity Identification**: Identifies relevant entities from the knowledge graph based on the query. - **Context Gathering**: Collects: - - Related text chunks from original documents. - - Connected entities and their strongest relationships. - - Entity descriptions and attributes. - - Context from the community each entity belongs to. + - Related text chunks from original documents. + - Connected entities and their strongest relationships. + - Entity descriptions and attributes. + - Context from the community each entity belongs to. - **Prioritized Response**: Generates a response using the most relevant gathered information. -**Best use cases**: -- "What are the properties of [specific entity]?" -- "How is [entity A] related to [entity B]?" -- "What are the key details about [specific concept]?" +```json +{ + "query_type": "LOCAL", + "use_llm_planner": false +} +``` ## Installation The Retriever service can be configured to use either the Triton Inference Server -(for private LLM deployments) or OpenAI/OpenRouter (for public LLM deployments). +(for private LLM deployments) or any OpenAI-compatible API (for public LLM deployments), +including OpenAI, OpenRouter, Gemini, Anthropic, and more. To start the service, use the AI service endpoint `/v1/graphragretriever`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using Triton Inference Server (Private LLM) - -The first step is to install the LLM Host service with the LLM and -embedding models of your choice. The setup will the use the -Triton Inference Server and MLflow at the backend. -For more details, please refer to the [Triton Inference Server](triton-inference-server.md) -and [Mlflow](mlflow.md) documentation. - -Once the `llmhost` service is up-and-running, then you can start the Importer -service using the below configuration: +### Using OpenAI-compatible APIs -```json -{ - "env": { - "username": "your_username", - "db_name": "your_database_name", - "api_provider": "triton", - "triton_url": "your-arangodb-llm-host-url", - "triton_model": "mistral-nemo-instruct" - }, -} -``` +The `openai` provider works with any OpenAI-compatible API, including: +- OpenAI (official API) +- OpenRouter +- Google Gemini +- Anthropic Claude +- Corporate or self-hosted LLMs with OpenAI-compatible endpoints -Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. -- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running. -- `triton_model`: Name of the LLM model to use for text processing. +Set the `chat_api_url` and `embedding_api_url` to point to your provider's endpoint. 
-### Using OpenAI (Public LLM) +**Example using OpenAI:** ```json { "env": { - "openai_api_key": "your_openai_api_key", - "username": "your_username", "db_name": "your_database_name", - "api_provider": "openai" + "chat_api_provider": "openai", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. -- `openai_api_key`: Your OpenAI API key. +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -By default, for OpenAI API, the service is using -`gpt-4o-mini` and `text-embedding-3-small` models as LLM and -embedding model respectively. +When using the official OpenAI API, the service defaults to `gpt-4o-mini` and +`text-embedding-3-small` models. {{< /info >}} -### Using OpenRouter (Gemini, Anthropic, etc.) +### Using different providers for chat and embedding + +You can mix and match any OpenAI-compatible APIs for chat and embedding. For example, +you might use one provider for text generation and another for embeddings, depending +on your needs for performance, cost, or model availability. -OpenRouter makes it possible to connect to a huge array of LLM API providers, -including non-OpenAI LLMs like Gemini Flash, Anthropic Claude and publicly hosted -open-source models. +Since both providers use `"openai"` as the provider value, you differentiate them by +setting different URLs in `chat_api_url` and `embedding_api_url`. -When using the OpenRouter option, the LLM responses are served via OpenRouter while -OpenAI is used for the embedding model. +**Example using OpenRouter for chat and OpenAI for embedding:** ```json { "env": { "db_name": "your_database_name", - "username": "your_username", - "api_provider": "openrouter", - "openai_api_key": "your_openai_api_key", - "openrouter_api_key": "your_openrouter_api_key", - "openrouter_model": "mistralai/mistral-nemo" // Specify a model here + "chat_api_provider": "openai", + "embedding_api_provider": "openai", + "chat_api_url": "https://openrouter.ai/api/v1", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "mistral-nemo", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openrouter_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. 
-- `openai_api_key`: Your OpenAI API key (for the embedding model). -- `openrouter_api_key`: Your OpenRouter API key (for the LLM). -- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`). +- `db_name`: Name of the ArangoDB database where the knowledge graph is stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service (in this example, OpenRouter) +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service (in this example, OpenAI) +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -When using OpenRouter, the service defaults to `mistral-nemo` for generation -(via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI). +You can use any combination of OpenAI-compatible providers. This example shows +OpenRouter (for chat) and OpenAI (for embeddings), but you could use Gemini, +Anthropic, or any other compatible service. {{< /info >}} +### Using Triton Inference Server for chat and embedding + +The first step is to install the LLM Host service with the LLM and +embedding models of your choice. The setup will use the +Triton Inference Server and MLflow at the backend. +For more details, please refer to the [Triton Inference Server](triton-inference-server.md) +and [Mlflow](mlflow.md) documentation. + +Once the `llmhost` service is up-and-running, then you can start the Retriever +service using the below configuration: + +```json +{ + "env": { + "db_name": "your_database_name", + "chat_api_provider": "triton", + "embedding_api_provider": "triton", + "chat_api_url": "your-arangodb-llm-host-url", + "embedding_api_url": "your-arangodb-llm-host-url", + "chat_model": "mistral-nemo-instruct", + "embedding_model": "nomic-embed-text-v1" + }, +} +``` + +Where: +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Specifies which LLM provider to use for language model services +- `embedding_api_provider`: API provider for embedding model services (e.g., "triton") +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings + ## Executing queries After the Retriever service is installed successfully, you can interact with -it using the following HTTP endpoints, based on the selected search method. +it using the following HTTP endpoints. 
{{< tabs "executing-queries" >}} -{{< tab "Local search" >}} +{{< tab "Instant Search" >}} + +```bash +curl -X POST /v1/graphrag-query-stream \ + -H "Content-Type: application/json" \ + -d '{ + "query": "How are X and Y related?", + "query_type": "UNIFIED", + "provider": 0, + "include_metadata": true + }' +``` + +{{< /tab >}} + +{{< tab "Deep Search" >}} + ```bash curl -X POST /v1/graphrag-query \ -H "Content-Type: application/json" \ -d '{ - "query": "What is the AR3 Drone?", - "query_type": 2, - "provider": 0 + "query": "What are the properties of a specific entity?", + "query_type": "LOCAL", + "use_llm_planner": true, + "provider": 0, + "include_metadata": true + }' +``` + +{{< /tab >}} + +{{< tab "Global Search" >}} + +```bash +curl -X POST /v1/graphrag-query \ + -H "Content-Type: application/json" \ + -d '{ + "query": "What are the main themes discussed in the document?", + "query_type": "GLOBAL", + "level": 1, + "provider": 0, + "include_metadata": true }' ``` + {{< /tab >}} -{{< tab "Global search" >}} +{{< tab "Local Search" >}} ```bash curl -X POST /v1/graphrag-query \ -H "Content-Type: application/json" \ -d '{ "query": "What is the AR3 Drone?", - "level": 1, - "query_type": 1, - "provider": 0 + "query_type": "LOCAL", + "use_llm_planner": false, + "provider": 0, + "include_metadata": true }' ``` + {{< /tab >}} {{< /tabs >}} -The request parameters are the following: -- `query`: Your search query text. -- `level`: The community hierarchy level to use for the search (`1` for top-level communities). +### Request Parameters + +- `query`: Your search query text (required). + - `query_type`: The type of search to perform. - - `1`: Global search. - - `2`: Local search. + - `GLOBAL` or `1`: Global Search (default if not specified). + - `LOCAL` or `2`: Deep Search when used with LLM planner, or standard Local Search without the planner. + - `UNIFIED` or `3`: Instant Search. + +- `use_llm_planner`: Whether to use LLM planner for intelligent query orchestration (optional) + - When enabled, orchestrates retrieval using both local and global strategies (powers Deep Search) + - Set to `false` for standard Local Search without orchestration + +- `level`: Community hierarchy level for analysis (only applicable for `GLOBAL` queries) + - `1` for top-level communities (broader themes) + - `2` for more granular communities (default) + - `provider`: The LLM provider to use - - `0`: OpenAI (or OpenRouter) - - `1`: Triton + - `0`: Any OpenAI-compatible API (OpenAI, OpenRouter, Gemini, Anthropic, etc.) + - `1`: Triton Inference Server + +- `include_metadata`: Whether to include metadata in the response (optional, defaults to `false`) + +- `response_instruction`: Custom instructions for response generation style (optional) + +- `use_cache`: Whether to use caching for this query (optional, defaults to `false`) + +- `show_citations`: Whether to show inline citations in the response (optional, defaults to `false`) ## Health check @@ -249,17 +382,6 @@ properties: } ``` -## Best Practices - -- **Choose the right search method**: - - Use global search for broad, thematic queries. - - Use local search for specific entity or relationship queries. - - -- **Performance considerations**: - - Global search may take longer due to its map-reduce process. - - Local search is typically faster for concrete queries. 
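+
+**Combining optional request parameters:**
+
+The optional parameters listed under Request Parameters above can be combined
+in a single call. The following sketch is illustrative only; the query text and
+parameter values are placeholders:
+
+```bash
+curl -X POST /v1/graphrag-query \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "Summarize the key technical details about a specific entity",
+    "query_type": "LOCAL",
+    "use_llm_planner": true,
+    "provider": 0,
+    "include_metadata": true,
+    "response_instruction": "Answer concisely in bullet points.",
+    "use_cache": true,
+    "show_citations": true
+  }'
+```
+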
- ## API Reference For detailed API documentation, see the diff --git a/site/content/ai-suite/reference/triton-inference-server.md b/site/content/ai-suite/reference/triton-inference-server.md index 458226743e..1e1b982932 100644 --- a/site/content/ai-suite/reference/triton-inference-server.md +++ b/site/content/ai-suite/reference/triton-inference-server.md @@ -26,8 +26,8 @@ following steps: 1. Install the Triton LLM Host service. 2. Register your LLM model to MLflow by uploading the required files. -3. Configure the [Importer](importer.md#using-triton-inference-server-private-llm) service to use your LLM model. -4. Configure the [Retriever](retriever.md#using-triton-inference-server-private-llm) service to use your LLM model. +3. Configure the [Importer](importer.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model. +4. Configure the [Retriever](retriever.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model. {{< tip >}} Check out the dedicated [ArangoDB MLflow](mlflow.md) documentation page to learn