diff --git a/site/content/ai-suite/graphrag/web-interface.md b/site/content/ai-suite/graphrag/web-interface.md index 927da744a2..01d0d19f2c 100644 --- a/site/content/ai-suite/graphrag/web-interface.md +++ b/site/content/ai-suite/graphrag/web-interface.md @@ -178,9 +178,11 @@ See also the [Retriever](../reference/retriever.md) documentation. ## Chat with your Knowledge Graph -The chat interface provides two search methods: -- **Instant search**: Instant queries provide fast responses. -- **Deep search**: This option will take longer to return a response. +The Retriever service provides two search methods: +- [Instant search](../reference/retriever.md#instant-search): Instant + queries provide fast responses. +- [Deep search](../reference/retriever.md#deep-search): This option will take + longer to return a response. In addition to querying the Knowledge Graph, the chat service allows you to do the following: - Switch the search method from **Instant search** to **Deep search** and vice-versa diff --git a/site/content/ai-suite/reference/gen-ai.md b/site/content/ai-suite/reference/gen-ai.md index f545a7e255..0745965f54 100644 --- a/site/content/ai-suite/reference/gen-ai.md +++ b/site/content/ai-suite/reference/gen-ai.md @@ -33,22 +33,15 @@ in the platform. All services support the `profiles` field, which you can use to define the profile to use for the service. For example, you can define a GPU profile that enables the service to run an LLM on GPU resources. -## LLM Host Service Creation Request Body +## Service Creation Request Body -```json -{ - "env": { - "model_name": "" - } -} -``` - -## Using Labels in Creation Request Body +The following example shows a complete request body with all available options: ```json { "env": { - "model_name": "" + "model_name": "", + "profiles": "gpu,internal" }, "labels": { "key1": "value1", @@ -57,32 +50,120 @@ GPU profile that enables the service to run an LLM on GPU resources. } ``` -{{< info >}} -Labels are optional. Labels can be used to filter and identify services in -the Platform. If you want to use labels, define them as a key-value pair in `labels` -within the `env` field. -{{< /info >}} +**Optional fields:** + +- **labels**: Key-value pairs used to filter and identify services in the platform. +- **profiles**: A comma-separated string defining which profiles to use for the + service (e.g., `"gpu,internal"`). If not set, the service is created with the + default profile. Profiles must be present and created in the platform before + they can be used. + +The parameters required for the deployment of each service are defined in the +corresponding service documentation. See [Importer](importer.md) +and [Retriever](retriever.md). + +## Projects + +Projects help you organize your GraphRAG work by grouping related services and +keeping your data separate. When the Importer service creates ArangoDB collections +(such as documents, chunks, entities, relationships, and communities), it uses +your project name as a prefix. For example, a project named `docs` will have +collections like `docs_Documents`, `docs_Chunks`, and so on. 
-## Using Profiles in Creation Request Body +Projects are required for the following services: +- Importer +- Retriever + +### Creating a project + +To create a new GraphRAG project, send a POST request to the project endpoint: + +```bash +curl -X POST "https://:8529/gen-ai/v1/project" \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{ + "project_name": "docs", + "project_type": "graphrag", + "project_db_name": "documentation", + "project_description": "A documentation project for GraphRAG." + }' +``` + +Where: +- **project_name** (required): Unique identifier for your project. Must be 1-63 + characters and contain only letters, numbers, underscores (`_`), and hyphens (`-`). +- **project_type** (required): Type of project (e.g., `"graphrag"`). +- **project_db_name** (required): The ArangoDB database name where the project + will be created. +- **project_description** (optional): A description of your project. + +Once created, you can reference your project in service deployments using the +`genai_project_name` field: ```json { - "env": { - "model_name": "", - "profiles": "gpu,internal" - } + "env": { + "genai_project_name": "docs" + } } ``` -{{< info >}} -The `profiles` field is optional. If it is not set, the service is created with -the default profile. Profiles must be present and created in the Platform before -they can be used. If you want to use profiles, define them as a comma-separated -string in `profiles` within the `env` field. -{{< /info >}} +### Listing projects -The parameters required for the deployment of each service are defined in the -corresponding service documentation. +**List all project names in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_project_names/" \ + -H "Authorization: Bearer " +``` + +This returns only the project names for quick reference. + +**List all projects with full metadata in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_projects/" \ + -H "Authorization: Bearer " +``` + +This returns complete project objects including metadata, associated services, +and knowledge graph information. + +### Getting project details + +Retrieve comprehensive metadata for a specific project: + +```bash +curl -X GET "https://:8529/gen-ai/v1/project_by_name//" \ + -H "Authorization: Bearer " +``` + +The response includes: +- Project configuration +- Associated Importer and Retriever services +- Knowledge graph metadata +- Service status information +- Last modification timestamp + +### Deleting a project + +Remove a project's metadata from the GenAI service: + +```bash +curl -X DELETE "https://:8529/gen-ai/v1/project//" \ + -H "Authorization: Bearer " +``` + +{{< warning >}} +Deleting a project only removes the project metadata from the GenAI service. +It does **not** delete: +- Services associated with the project (must be deleted separately) +- ArangoDB collections and data +- Knowledge graphs + +You must manually delete services and collections if needed. +{{< /warning >}} ## Obtaining a Bearer Token @@ -101,7 +182,7 @@ documentation. ## Complete Service lifecycle example -The example below shows how to install, monitor, and uninstall the Importer service. +The example below shows how to install, monitor, and uninstall the [Importer](importer.md) service. 
### Step 1: Installing the service @@ -111,11 +192,10 @@ curl -X POST https://:8529/ai/v1/graphragimporter \ -H "Content-Type: application/json" \ -d '{ "env": { - "username": "", "db_name": "", - "api_provider": "", - "triton_url": "", - "triton_model": "" + "chat_api_provider": "", + "chat_api_key": "", + "chat_model": "" } }' ``` @@ -176,16 +256,6 @@ curl -X DELETE https://:8529/ai/v1/service/arangodb-graphrag-i - **Authentication**: All requests use the same Bearer token in the `Authorization` header {{< /info >}} -### Customizing the example - -Replace the following values with your actual configuration: -- `` - Your database username. -- `` - Target database name. -- `` - Your API provider (e.g., `triton`) -- `` - Your LLM host service URL. -- `` - Your Triton model name (e.g., `mistral-nemo-instruct`). -- `` - Your authentication token. - ## Service configuration The AI orchestrator service is **started by default**. diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index e4cce5d200..daf130c262 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -28,40 +28,17 @@ different concepts in your document with the Retriever service. You can also use the GraphRAG Importer service via the [Data Platform web interface](../graphrag/web-interface.md). {{< /tip >}} -## Creating a new project - -To create a new GraphRAG project, use the `CreateProject` method by sending a -`POST` request to the `/ai/v1/project` endpoint. You must provide a unique -`project_name` and a `project_type` in the request body. Optionally, you can -provide a `project_description`. - -```curl -curl -X POST "https://:8529/ai/v1/project" \ --H "Content-Type: application/json" \ --d '{ - "project_name": "docs", - "project_type": "graphrag", - "project_description": "A documentation project for GraphRAG." -}' -``` -All the relevant ArangoDB collections (such as documents, chunks, entities, -relationships, and communities) created during the import process will -have the project name as a prefix. For example, the Documents collection will -become `_Documents`. The Knowledge Graph will also use the project -name as a prefix. If no project name is specified, then all collections -are prefixed with `default_project`, e.g., `default_project_Documents`. - -### Project metadata +## Prerequisites -Additional project metadata is accessible via the following endpoint, replacing -`` with the actual name of your project: +Before importing data, you need to create a GraphRAG project. Projects help you +organize your work and keep your data separate from other projects. -``` -GET /ai/v1/project_by_name/ -``` +For detailed instructions on creating and managing projects, see the +[Projects](gen-ai.md#projects) section in the GenAI Orchestration Service +documentation. -The endpoint provides comprehensive metadata about your project's components, -including its importer and retriever services and their status. +Once you have created a project, you can reference it when deploying the Importer +service using the `genai_project_name` field in the service configuration. 
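+
+For illustration, a minimal Importer deployment that references the `docs`
+project from the [Projects](gen-ai.md#projects) example might look like the
+following sketch. The database and project names are placeholders, and the
+LLM-related fields described in the configuration sections below still apply:
+
+```json
+{
+  "env": {
+    "db_name": "documentation",
+    "genai_project_name": "docs"
+  }
+}
+```
+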
## Deployment options @@ -91,103 +68,135 @@ services like OpenAI's models via the OpenAI API or a large array of models The Importer service can be configured to use either: - Triton Inference Server (for private LLM deployments) -- OpenAI (for public LLM deployments) -- OpenRouter (for public LLM deployments) +- Any OpenAI-compatible API (for public LLM deployments), including OpenAI, OpenRouter, Gemini, Anthropic, and more To start the service, use the AI service endpoint `/v1/graphragimporter`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using Triton Inference Server (Private LLM) - -The first step is to install the LLM Host service with the LLM and -embedding models of your choice. The setup will the use the -Triton Inference Server and MLflow at the backend. -For more details, please refer to the [Triton Inference Server](triton-inference-server.md) -and [Mlflow](mlflow.md) documentation. - -Once the `llmhost` service is up-and-running, then you can start the Importer -service using the below configuration: +### Using OpenAI-compatible APIs -```json -{ - "env": { - "username": "your_username", - "db_name": "your_database_name", - "api_provider": "triton", - "triton_url": "your-arangodb-llm-host-url", - "triton_model": "mistral-nemo-instruct" - }, -} -``` +The `openai` provider works with any OpenAI-compatible API, including: +- OpenAI (official API) +- OpenRouter +- Google Gemini +- Anthropic Claude +- Corporate or self-hosted LLMs with OpenAI-compatible endpoints -Where: -- `username`: ArangoDB database user with permissions to create and modify collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored. -- `api_provider`: Specifies which LLM provider to use. -- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running. -- `triton_model`: Name of the LLM model to use for text processing. +set the `chat_api_url` and `embedding_api_url` to point to your provider's endpoint. 
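+
+The `env` blocks in the examples below form the body of the service creation
+request. As a rough sketch of such a deployment call (host, token, and field
+values are placeholders; see the [AI service](gen-ai.md) lifecycle example for
+the full install, monitor, and uninstall flow):
+
+```bash
+curl -X POST https://:8529/ai/v1/graphragimporter \
+  -H "Authorization: Bearer " \
+  -H "Content-Type: application/json" \
+  -d '{
+    "env": {
+      "db_name": "",
+      "chat_api_provider": "",
+      "chat_api_key": "",
+      "chat_model": ""
+    }
+  }'
+```
+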
-### Using OpenAI (Public LLM) +**Example using OpenAI:** ```json { "env": { - "openai_api_key": "your_openai_api_key", - "username": "your_username", "db_name": "your_database_name", - "api_provider": "openai" + "chat_api_provider": "openai", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored -- `api_provider`: Specifies which LLM provider to use -- `openai_api_key`: Your OpenAI API key +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -By default, for OpenAI API, the service is using -`gpt-4o-mini` and `text-embedding-3-small` models as LLM and -embedding model respectively. +When using the official OpenAI API, the service defaults to `gpt-4o-mini` and +`text-embedding-3-small` models. {{< /info >}} -### Using OpenRouter (Gemini, Anthropic, etc.) +### Using different providers for chat and embedding -OpenRouter makes it possible to connect to a huge array of LLM API -providers, including non-OpenAI LLMs like Gemini Flash, Anthropic Claude -and publicly hosted open-source models. +You can mix and match any OpenAI-compatible APIs for chat and embedding. For example, +you might use one provider for text generation and another for embeddings, depending +on your needs for performance, cost, or model availability. -When using the OpenRouter option, the LLM responses are served via OpenRouter -while OpenAI is used for the embedding model. +Since both providers use `"openai"` as the provider value, you differentiate them by +setting different URLs in `chat_api_url` and `embedding_api_url`. 
+ +**Example using OpenRouter for chat and OpenAI for embedding:** ```json { "env": { "db_name": "your_database_name", - "username": "your_username", - "api_provider": "openrouter", - "openai_api_key": "your_openai_api_key", - "openrouter_api_key": "your_openrouter_api_key", - "openrouter_model": "mistralai/mistral-nemo" // Specify a model here + "chat_api_provider": "openai", + "embedding_api_provider": "openai", + "chat_api_url": "https://openrouter.ai/api/v1", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "mistral-nemo", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openrouter_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored -- `api_provider`: Specifies which LLM provider to use -- `openai_api_key`: Your OpenAI API key (for the embedding model) -- `openrouter_api_key`: Your OpenRouter API key (for the LLM) -- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`) +- `db_name`: Name of the ArangoDB database where the knowledge graph is stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service (in this example, OpenRouter) +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service (in this example, OpenAI) +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -When using OpenRouter, the service defaults to `mistral-nemo` for generation -(via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI). +You can use any combination of OpenAI-compatible providers. This example shows +OpenRouter (for chat) and OpenAI (for embeddings), but you could use Gemini, +Anthropic, or any other compatible service. {{< /info >}} +### Using Triton Inference Server for chat and embedding + +The first step is to install the LLM Host service with the LLM and +embedding models of your choice. The setup will the use the +Triton Inference Server and MLflow at the backend. +For more details, please refer to the [Triton Inference Server](triton-inference-server.md) +and [Mlflow](mlflow.md) documentation. 
+ +Once the `llmhost` service is up-and-running, then you can start the Importer +service using the below configuration: + +```json +{ + "env": { + "db_name": "your_database_name", + "chat_api_provider": "triton", + "embedding_api_provider": "triton", + "chat_api_url": "your-arangodb-llm-host-url", + "embedding_api_url": "your-arangodb-llm-host-url", + "chat_model": "mistral-nemo-instruct", + "embedding_model": "nomic-embed-text-v1" + }, +} +``` + +Where: +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Specifies which LLM provider to use for language model services +- `embedding_api_provider`: API provider for embedding model services (e.g., "triton") +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings + ## Building Knowledge Graphs Once the service is installed successfully, you can follow these steps diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 5949d8a369..0e524fb867 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -14,214 +14,347 @@ the Arango team. ## Overview -The Retriever service offers two distinct search methods: -- **Global search**: Analyzes entire document to identify themes and patterns, - perfect for high-level insights and comprehensive summaries. -- **Local search**: Focuses on specific entities and their relationships, ideal - for detailed queries about particular concepts. - -The service supports both private (Triton Inference Server) and public (OpenAI) -LLM deployments, making it flexible for various security and infrastructure -requirements. With simple HTTP endpoints, you can easily query your knowledge -graph and get contextually relevant responses. +The Retriever service provides intelligent search and retrieval from knowledge graphs, +with multiple search methods optimized for different query types. The service supports +both private (Triton Inference Server) and public (any OpenAI-compatible API) LLM +deployments, making it flexible for various security and infrastructure requirements. **Key features:** -- Dual search methods for different query types +- Multiple search methods optimized for different use cases +- Streaming support for real-time responses for `UNIFIED` queries +- Optional LLM orchestration for `LOCAL` queries +- Configurable community hierarchy levels for `GLOBAL` queries - Support for both private and public LLM deployments - Simple REST API interface - Integration with ArangoDB knowledge graphs -- Configurable community hierarchy levels {{< tip >}} -You can also use the GraphRAG Retriever service via the ArangoDB [web interface](../graphrag/web-interface.md). +You can use the Retriever service via the [web interface](../graphrag/web-interface.md) +for Instant and Deep Search, or through the API for full control over all query types. {{< /tip >}} -## Search methods +## Prerequisites + +Before using the Retriever service, you need to: + +1. **Create a GraphRAG project** - For detailed instructions on creating and + managing projects, see the [Projects](gen-ai.md#projects) section in the + GenAI Orchestration Service documentation. + +2. 
**Import data** - Use the [Importer](importer.md) service to transform your + text documents into a knowledge graph stored in ArangoDB. + +## Search Methods The Retriever service enables intelligent search and retrieval of information -from your knowledge graph. It provides two powerful search methods, global Search -and local Search, that leverage the structured knowledge graph created by the Importer -to deliver accurate and contextually relevant responses to your natural language queries. +from your knowledge graph. It provides multiple search methods that leverage +the structured knowledge graph created by the Importer to deliver accurate and +contextually relevant responses to your natural language queries. + +### Instant Search + +Instant Search is designed for responses with very short latency. It triggers +fast unified retrieval over relevant parts of the knowledge graph via hybrid +(semantic and lexical) search and graph expansion algorithms, producing a fast, +streamed natural-language response with clickable references to the relevant documents. -### Global search +{{< info >}} +The Instant Search method is also available via the [Web interface](../graphrag/web-interface.md). +{{< /info >}} + +```json +{ + "query_type": "UNIFIED" +} +``` -Global search is designed for queries that require understanding and aggregation -of information across your entire document. It's particularly effective for questions -about overall themes, patterns, or high-level insights in your data. +### Deep Search -- **Community-Based Analysis**: Uses pre-generated community reports from your - knowledge graph to understand the overall structure and themes of your data, +Deep Search is designed for highly detailed, accurate responses that require understanding +what kind of information is available in different parts of the knowledge graph and +sequentially retrieving information in an LLM-guided research process. Use whenever +detail and accuracy are required (e.g. aggregation of highly technical details) and +very short latency is not (i.e. caching responses for frequently asked questions, +or use case with agents or research use cases). + +{{< info >}} +The Deep Search method is also available via the [Web interface](../graphrag/web-interface.md). +{{< /info >}} + +```json +{ + "query_type": "LOCAL", + "use_llm_planner": true +} +``` + +### Global Search + +Global search is designed for queries that require understanding and aggregation of information across your entire document. It’s particularly effective for questions about overall themes, patterns, or high-level insights in your data. + +- **Community-Based Analysis**: Uses pre-generated community reports from your knowledge graph to understand the overall structure and themes of your data. - **Map-Reduce Processing**: - - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points. - - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response. + - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points. + - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response. -**Best use cases**: -- "What are the main themes in the dataset?" -- "Summarize the key findings across all documents" -- "What are the most important concepts discussed?" 
+```json +{ + "query_type": "GLOBAL" +} +``` -### Local search +### Local Search -Local search focuses on specific entities and their relationships within your -knowledge graph. It is ideal for detailed queries about particular concepts, -entities, or relationships. +Local search focuses on specific entities and their relationships within your knowledge graph. It is ideal for detailed queries about particular concepts, entities, or relationships. - **Entity Identification**: Identifies relevant entities from the knowledge graph based on the query. - **Context Gathering**: Collects: - - Related text chunks from original documents. - - Connected entities and their strongest relationships. - - Entity descriptions and attributes. - - Context from the community each entity belongs to. + - Related text chunks from original documents. + - Connected entities and their strongest relationships. + - Entity descriptions and attributes. + - Context from the community each entity belongs to. - **Prioritized Response**: Generates a response using the most relevant gathered information. -**Best use cases**: -- "What are the properties of [specific entity]?" -- "How is [entity A] related to [entity B]?" -- "What are the key details about [specific concept]?" +```json +{ + "query_type": "LOCAL", + "use_llm_planner": false +} +``` ## Installation The Retriever service can be configured to use either the Triton Inference Server -(for private LLM deployments) or OpenAI/OpenRouter (for public LLM deployments). +(for private LLM deployments) or any OpenAI-compatible API (for public LLM deployments), +including OpenAI, OpenRouter, Gemini, Anthropic, and more. To start the service, use the AI service endpoint `/v1/graphragretriever`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using Triton Inference Server (Private LLM) - -The first step is to install the LLM Host service with the LLM and -embedding models of your choice. The setup will the use the -Triton Inference Server and MLflow at the backend. -For more details, please refer to the [Triton Inference Server](triton-inference-server.md) -and [Mlflow](mlflow.md) documentation. - -Once the `llmhost` service is up-and-running, then you can start the Importer -service using the below configuration: +### Using OpenAI-compatible APIs -```json -{ - "env": { - "username": "your_username", - "db_name": "your_database_name", - "api_provider": "triton", - "triton_url": "your-arangodb-llm-host-url", - "triton_model": "mistral-nemo-instruct" - }, -} -``` +The `openai` provider works with any OpenAI-compatible API, including: +- OpenAI (official API) +- OpenRouter +- Google Gemini +- Anthropic Claude +- Corporate or self-hosted LLMs with OpenAI-compatible endpoints -Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. -- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running. -- `triton_model`: Name of the LLM model to use for text processing. +Set the `chat_api_url` and `embedding_api_url` to point to your provider's endpoint. 
-### Using OpenAI (Public LLM) +**Example using OpenAI:** ```json { "env": { - "openai_api_key": "your_openai_api_key", - "username": "your_username", "db_name": "your_database_name", - "api_provider": "openai" + "chat_api_provider": "openai", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. -- `openai_api_key`: Your OpenAI API key. +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -By default, for OpenAI API, the service is using -`gpt-4o-mini` and `text-embedding-3-small` models as LLM and -embedding model respectively. +When using the official OpenAI API, the service defaults to `gpt-4o-mini` and +`text-embedding-3-small` models. {{< /info >}} -### Using OpenRouter (Gemini, Anthropic, etc.) +### Using different providers for chat and embedding + +You can mix and match any OpenAI-compatible APIs for chat and embedding. For example, +you might use one provider for text generation and another for embeddings, depending +on your needs for performance, cost, or model availability. -OpenRouter makes it possible to connect to a huge array of LLM API providers, -including non-OpenAI LLMs like Gemini Flash, Anthropic Claude and publicly hosted -open-source models. +Since both providers use `"openai"` as the provider value, you differentiate them by +setting different URLs in `chat_api_url` and `embedding_api_url`. -When using the OpenRouter option, the LLM responses are served via OpenRouter while -OpenAI is used for the embedding model. +**Example using OpenRouter for chat and OpenAI for embedding:** ```json { "env": { "db_name": "your_database_name", - "username": "your_username", - "api_provider": "openrouter", - "openai_api_key": "your_openai_api_key", - "openrouter_api_key": "your_openrouter_api_key", - "openrouter_model": "mistralai/mistral-nemo" // Specify a model here + "chat_api_provider": "openai", + "embedding_api_provider": "openai", + "chat_api_url": "https://openrouter.ai/api/v1", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "mistral-nemo", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openrouter_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. 
-- `openai_api_key`: Your OpenAI API key (for the embedding model). -- `openrouter_api_key`: Your OpenRouter API key (for the LLM). -- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`). +- `db_name`: Name of the ArangoDB database where the knowledge graph is stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service (in this example, OpenRouter) +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service (in this example, OpenAI) +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -When using OpenRouter, the service defaults to `mistral-nemo` for generation -(via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI). +You can use any combination of OpenAI-compatible providers. This example shows +OpenRouter (for chat) and OpenAI (for embeddings), but you could use Gemini, +Anthropic, or any other compatible service. {{< /info >}} +### Using Triton Inference Server for chat and embedding + +The first step is to install the LLM Host service with the LLM and +embedding models of your choice. The setup will use the +Triton Inference Server and MLflow at the backend. +For more details, please refer to the [Triton Inference Server](triton-inference-server.md) +and [Mlflow](mlflow.md) documentation. + +Once the `llmhost` service is up-and-running, then you can start the Retriever +service using the below configuration: + +```json +{ + "env": { + "db_name": "your_database_name", + "chat_api_provider": "triton", + "embedding_api_provider": "triton", + "chat_api_url": "your-arangodb-llm-host-url", + "embedding_api_url": "your-arangodb-llm-host-url", + "chat_model": "mistral-nemo-instruct", + "embedding_model": "nomic-embed-text-v1" + }, +} +``` + +Where: +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Specifies which LLM provider to use for language model services +- `embedding_api_provider`: API provider for embedding model services (e.g., "triton") +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings + ## Executing queries After the Retriever service is installed successfully, you can interact with -it using the following HTTP endpoints, based on the selected search method. +it using the following HTTP endpoints. 
{{< tabs "executing-queries" >}} -{{< tab "Local search" >}} +{{< tab "Instant Search" >}} + +```bash +curl -X POST /v1/graphrag-query-stream \ + -H "Content-Type: application/json" \ + -d '{ + "query": "How are X and Y related?", + "query_type": "UNIFIED", + "provider": 0, + "include_metadata": true + }' +``` + +{{< /tab >}} + +{{< tab "Deep Search" >}} + ```bash curl -X POST /v1/graphrag-query \ -H "Content-Type: application/json" \ -d '{ - "query": "What is the AR3 Drone?", - "query_type": 2, - "provider": 0 + "query": "What are the properties of a specific entity?", + "query_type": "LOCAL", + "use_llm_planner": true, + "provider": 0, + "include_metadata": true + }' +``` + +{{< /tab >}} + +{{< tab "Global Search" >}} + +```bash +curl -X POST /v1/graphrag-query \ + -H "Content-Type: application/json" \ + -d '{ + "query": "What are the main themes discussed in the document?", + "query_type": "GLOBAL", + "level": 1, + "provider": 0, + "include_metadata": true }' ``` + {{< /tab >}} -{{< tab "Global search" >}} +{{< tab "Local Search" >}} ```bash curl -X POST /v1/graphrag-query \ -H "Content-Type: application/json" \ -d '{ "query": "What is the AR3 Drone?", - "level": 1, - "query_type": 1, - "provider": 0 + "query_type": "LOCAL", + "use_llm_planner": false, + "provider": 0, + "include_metadata": true }' ``` + {{< /tab >}} {{< /tabs >}} -The request parameters are the following: -- `query`: Your search query text. -- `level`: The community hierarchy level to use for the search (`1` for top-level communities). +### Request Parameters + +- `query`: Your search query text (required). + - `query_type`: The type of search to perform. - - `1`: Global search. - - `2`: Local search. + - `GLOBAL` or `1`: Global Search (default if not specified). + - `LOCAL` or `2`: Deep Search when used with LLM planner, or standard Local Search without the planner. + - `UNIFIED` or `3`: Instant Search. + +- `use_llm_planner`: Whether to use LLM planner for intelligent query orchestration (optional) + - When enabled, orchestrates retrieval using both local and global strategies (powers Deep Search) + - Set to `false` for standard Local Search without orchestration + +- `level`: Community hierarchy level for analysis (only applicable for `GLOBAL` queries) + - `1` for top-level communities (broader themes) + - `2` for more granular communities (default) + - `provider`: The LLM provider to use - - `0`: OpenAI (or OpenRouter) - - `1`: Triton + - `0`: Any OpenAI-compatible API (OpenAI, OpenRouter, Gemini, Anthropic, etc.) + - `1`: Triton Inference Server + +- `include_metadata`: Whether to include metadata in the response (optional, defaults to `false`) + +- `response_instruction`: Custom instructions for response generation style (optional) + +- `use_cache`: Whether to use caching for this query (optional, defaults to `false`) + +- `show_citations`: Whether to show inline citations in the response (optional, defaults to `false`) ## Health check @@ -249,17 +382,6 @@ properties: } ``` -## Best Practices - -- **Choose the right search method**: - - Use global search for broad, thematic queries. - - Use local search for specific entity or relationship queries. - - -- **Performance considerations**: - - Global search may take longer due to its map-reduce process. - - Local search is typically faster for concrete queries. 
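+
+**Combining optional request parameters:**
+
+The optional parameters listed under Request Parameters above can be combined
+in a single call. The following sketch is illustrative only; the query text and
+parameter values are placeholders:
+
+```bash
+curl -X POST /v1/graphrag-query \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "Summarize the key technical details about a specific entity",
+    "query_type": "LOCAL",
+    "use_llm_planner": true,
+    "provider": 0,
+    "include_metadata": true,
+    "response_instruction": "Answer concisely in bullet points.",
+    "use_cache": true,
+    "show_citations": true
+  }'
+```
+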
- ## API Reference For detailed API documentation, see the diff --git a/site/content/ai-suite/reference/triton-inference-server.md b/site/content/ai-suite/reference/triton-inference-server.md index 458226743e..1e1b982932 100644 --- a/site/content/ai-suite/reference/triton-inference-server.md +++ b/site/content/ai-suite/reference/triton-inference-server.md @@ -26,8 +26,8 @@ following steps: 1. Install the Triton LLM Host service. 2. Register your LLM model to MLflow by uploading the required files. -3. Configure the [Importer](importer.md#using-triton-inference-server-private-llm) service to use your LLM model. -4. Configure the [Retriever](retriever.md#using-triton-inference-server-private-llm) service to use your LLM model. +3. Configure the [Importer](importer.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model. +4. Configure the [Retriever](retriever.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model. {{< tip >}} Check out the dedicated [ArangoDB MLflow](mlflow.md) documentation page to learn