Add comprehensive .github/copilot-instructions.md for repository onboarding #201

Copilot · 2025-10-23T09:03:44Z

Add comprehensive .github/copilot-instructions.md for repository onboarding

Summary

Created comprehensive copilot-instructions.md (313 lines, ~2 pages) with verified build/test instructions.

Latest updates: Enhanced all descriptions based on deep architectural understanding of the codebase:

Explained layered architecture: low-level templated index implementations → type-erased orchestrators → customization point extensions
Clarified ISA dispatching as runtime CPU feature detection (SSE/AVX2/AVX512) with dispatch to optimized code paths
Detailed type-erasure pattern with concrete examples (e.g., svs::Vamana wrapping templated VamanaIndex)
Explained svs_invoke as customization point objects enabling type-specific behavior specialization without modifying core algorithms
Enhanced directory annotations with architectural context (e.g., "templated implementation" vs "type-erased APIs")
Added performance focus explanation about compile-time dispatch and template metaprogramming
Detailed CI matrix builds, Python version support (3.9-3.12), and time estimates
Added AI-generated content attribution policy: Agents must add attribution line to file headers when creating content on behalf of Intel employees

The instructions now reflect genuine understanding of design patterns and architectural decisions, not just surface-level feature listings.
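To make the "compile-time dispatch and template metaprogramming" point above concrete, here is a minimal sketch of mapping a runtime dimensionality onto a kernel specialized at compile time. All names (`Dynamic`, `l2`, `select_kernel`, `kernel_for`) and the list of specialized dimensions are assumptions chosen for illustration, not identifiers from the SVS codebase.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel meaning "dimension only known at runtime" (hypothetical name).
inline constexpr std::size_t Dynamic = static_cast<std::size_t>(-1);

// Squared L2 distance. When N is a compile-time constant the loop bound is
// known to the compiler, which can then unroll and vectorize more aggressively.
template <std::size_t N>
float l2(const float* a, const float* b, std::size_t runtime_dim) {
    assert(N == Dynamic || runtime_dim == N);
    std::size_t dim = runtime_dim;
    if constexpr (N != Dynamic) {
        dim = N;
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

using Kernel = float (*)(const float*, const float*, std::size_t);

struct Selection {
    Kernel fn;         // kernel to call
    bool specialized;  // true if a fixed-dimension specialization was found
};

// Walk the compile-time list Ns... and pick the matching specialization,
// falling back to the dynamic kernel when the dimension is not in the list.
template <std::size_t... Ns>
Selection select_kernel(std::size_t dim) {
    Kernel fn = &l2<Dynamic>;
    const bool hit = ((dim == Ns ? (fn = &l2<Ns>, true) : false) || ...);
    return {fn, hit};
}

// The set of pre-instantiated dimensions is an illustrative assumption.
inline Selection kernel_for(std::size_t dim) {
    return select_kernel<64, 128, 768>(dim);
}
```

Calling `kernel_for(128)` yields the `l2<128>` specialization, while `kernel_for(5)` falls back to the dynamic kernel. Language bindings face the same trade-off: only dimensions instantiated ahead of time get the specialized path.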

Original prompt

Your task is to "onboard" this repository to Copilot coding agent by adding a .github/copilot-instructions.md file in the repository that contains information describing how a coding agent seeing it for the first time can work most efficiently.

You will do this task only one time per repository and doing a good job can SIGNIFICANTLY improve the quality of the agent's work, so take your time, think carefully, and search thoroughly before writing the instructions.

- Reduce the likelihood of a coding agent pull request getting rejected by the user due to generating code that fails the continuous integration build, fails a validation pipeline, or having misbehavior.
- Minimize bash command and build failures.
- Allow the agent to complete its task more quickly by minimizing the need for exploration using grep, find, str_replace_editor, and code search tools.
- Instructions must be no longer than 2 pages.
- Instructions must not be task specific.

Add the following high level details about the codebase to reduce the amount of searching the agent has to do to understand the codebase each time:

A summary of what the repository does.
High level repository information, such as the size of the repo, the type of the project, the languages, frameworks, or target runtimes in use.

Additional details that you may incorporate:

The repository provides the ScalableVectorSearch (SVS) public library.
It implements state-of-the-art, Vamana-graph-based approximate nearest neighbor search.
It's optimized for Intel x86, but it runs on other hardware too.
The repository uses C++20 and modern concepts to enable the best compiler optimizations.
The project can be used as a header-only dependency.
The project also provides Python bindings, which have to make some assumptions about data shapes (esp. data dimensionality), to specialize templated compute operations.
Runtime architecture-aware ISA dispatching allows the project to be built and deployed on different CPU architectures. The ISA dispatching will use the highest available instruction set (SSE vs AVX2 vs AVX512) on the system where it is executed.

Add information about how to build and validate changes so the agent does not need to search and find it each time.

For each of bootstrap, build, test, run, lint, and any other scripted step, document the sequence of steps to take to run it successfully as well as the versions of any runtime or build tools used.
Each command should be validated by running it to ensure that it works correctly as well as any preconditions and postconditions.
Try cleaning the repo and environment and running commands in different orders and document errors and misbehavior observed as well as any steps used to mitigate the problem.
Run the tests and document the order of steps required to run the tests.
Make a change to the codebase. Document any unexpected build issues as well as the workarounds.
Document environment setup steps that seem optional but that you have validated are actually required.
Document the time required for commands that failed due to timing out.
When you find a sequence of commands that work for a particular purpose, document them in detail.
Use language to indicate when something should always be done. For example: "always run npm install before building".
Record any validation steps from documentation.

Additional details you may incorporate:

For linting, clang-format is used. Currently "clang-format>=15,<16", as defined in .pre-commit-config.yaml.
Build uses cmake, with options defined in cmake/options.cmake. It is recommended to stay close to the CI configuration, defined in build-linux.yml. Currently it is `cmake -DCMAKE_BUILD_TYPE=RelWithDebugInfo -DSVS_BUILD_BINARIES=YES -DSVS_BUILD_TESTS=YES -DSVS_BUILD_EXAMPLES=YES -DSVS_EXPERIMENTAL_LEANVEC=YES -DSVS_NO_AVX512=NO -DSVS_EXPERIMENTAL_ENABLE_IVF=ON`.
Tests can be executed from within the build directory. The executable is ./tests/tests. Filters on CATCH_TEST_CASE names can be applied, e.g. ./tests/tests [integration][build].

List key facts about the layout and architecture of the codebase to help the agent find where to make changes with minimal searching.

A description of the major architectural elements of the project, including the relative paths to the main project files, the location of configuration files for linting, compilation, testing, and preferences.
A description of the checks run prior to check in, including any GitHub workflows, continuous integration builds, or other validation pipelines.
Document the steps so that the agent can replicate these itself.
Any explicit validation steps that the agent can consider to have further confidence in its changes.
Dependencies that aren't obvious from the layout or file structure.
...

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

ahuber21 · 2025-10-23T10:24:39Z

The many code changes you added suggest a problem with clang-format. @copilot fix

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot · 2025-10-23T10:30:49Z

The many code changes you added suggest a problem with clang-format. @copilot fix

Fixed in commit 540068f. Reverted the accidental formatting changes to 86 files. The PR now contains only the new .github/copilot-instructions.md file as intended.

ahuber21

@copilot address these comments to improve the instructions.

ahuber21 · 2025-10-23T10:29:03Z

.github/copilot-instructions.md

+│   │   ├── vamana/          # Vamana graph index
+│   │   ├── flat/            # Flat (brute-force) index
+│   │   └── inverted/        # Inverted index (IVF)
+│   ├── orchestrators/       # High-level APIs


@copilot It may be useful to mention that type-erasure is used to provide simple and consistent interfaces.

Added in commit bd187f6. Updated orchestrators description to mention that they use type-erasure to provide simple and consistent interfaces.
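For readers unfamiliar with the pattern, the following is a minimal sketch of how type-erasure can hide a templated implementation behind one non-templated interface. `BruteForceIndex` and `AnyIndex` are hypothetical names invented for this example; they only mirror the idea of an orchestrator wrapping a templated index and are not the actual SVS classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical templated implementation: the element type is a compile-time parameter.
template <typename T>
class BruteForceIndex {
  public:
    explicit BruteForceIndex(std::vector<std::vector<T>> data)
        : data_(std::move(data)) {}

    // Return the position of the stored vector closest (squared L2) to `query`.
    std::size_t nearest(const std::vector<float>& query) const {
        assert(!data_.empty());
        std::size_t best = 0;
        float best_dist = distance(data_[0], query);
        for (std::size_t i = 1; i < data_.size(); ++i) {
            const float d = distance(data_[i], query);
            if (d < best_dist) {
                best_dist = d;
                best = i;
            }
        }
        return best;
    }

  private:
    static float distance(const std::vector<T>& a, const std::vector<float>& b) {
        assert(a.size() == b.size());
        float sum = 0.0f;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const float diff = static_cast<float>(a[i]) - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    std::vector<std::vector<T>> data_;
};

// Hypothetical type-erased wrapper: callers see one class, whatever T was.
class AnyIndex {
  public:
    template <typename Impl>
    explicit AnyIndex(Impl impl)
        : self_(std::make_unique<Model<Impl>>(std::move(impl))) {}

    std::size_t nearest(const std::vector<float>& query) const {
        return self_->nearest(query);
    }

  private:
    // Abstract interface that erases the concrete implementation type.
    struct Concept {
        virtual ~Concept() = default;
        virtual std::size_t nearest(const std::vector<float>& query) const = 0;
    };

    // Adapter that forwards virtual calls to the wrapped templated object.
    template <typename Impl>
    struct Model final : Concept {
        explicit Model(Impl impl) : impl_(std::move(impl)) {}
        std::size_t nearest(const std::vector<float>& query) const override {
            return impl_.nearest(query);
        }
        Impl impl_;
    };

    std::unique_ptr<Concept> self_;
};
```

An `AnyIndex` built from `BruteForceIndex<float>` and one built from `BruteForceIndex<int>` have the same static type, which is what lets a high-level API or a language binding stay free of template parameters. The price is one virtual call per query, which is usually negligible next to the search itself.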

ahuber21 · 2025-10-23T10:31:14Z

.github/copilot-instructions.md

+│   │   └── inverted/        # Inverted index (IVF)
+│   ├── orchestrators/       # High-level APIs
+│   ├── quantization/        # Vector quantization
+│   └── extensions/          # ISA-specific optimizations


@copilot extensions are more general than ISA-specific optimizations. They provide svs_invoke overloads/specializations to hook into core SVS routines. I suggest you investigate this behavior more carefully and include it in the instructions.

The concept is similar to std::invoke.

Updated in commit bd187f6. Changed the extensions description to explain that they provide svs_invoke overloads/specializations to hook into core SVS routines (similar to std::invoke). Also added this as a key principle in the Important Notes section.
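The hook mechanism described here can be sketched as follows. This is an illustration of the general customization-point technique under stated assumptions: `ext_invoke`, `compute_distance`, `DistanceFn`, and `Quantized` are hypothetical names, and the real `svs_invoke` signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace core {

// Hypothetical customization point object. Calling it forwards to whichever
// `ext_invoke` overload argument-dependent lookup (ADL) finds for the arguments.
struct DistanceFn {
    template <typename A, typename B>
    float operator()(const A& a, const B& b) const {
        return ext_invoke(*this, a, b);
    }
};
inline constexpr DistanceFn compute_distance{};

// Default behavior shipped with the core: squared L2 on plain float vectors.
inline float ext_invoke(
    DistanceFn, const std::vector<float>& a, const std::vector<float>& b
) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Core algorithm: written once, unaware of any extension element types.
template <typename Query, typename Element>
std::size_t nearest(const Query& query, const std::vector<Element>& data) {
    assert(!data.empty());
    std::size_t best = 0;
    float best_dist = compute_distance(query, data[0]);
    for (std::size_t i = 1; i < data.size(); ++i) {
        const float d = compute_distance(query, data[i]);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}

} // namespace core

namespace ext {

// Hypothetical extension type: 8-bit codes sharing one scale factor.
struct Quantized {
    std::vector<std::int8_t> codes;
    float scale;
};

// Extension hook: an overload in the extension's own namespace, found via ADL.
// The core algorithm above needs no modification to pick it up.
inline float ext_invoke(
    core::DistanceFn, const std::vector<float>& query, const Quantized& x
) {
    assert(query.size() == x.codes.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < query.size(); ++i) {
        const float diff = query[i] - x.scale * static_cast<float>(x.codes[i]);
        sum += diff * diff;
    }
    return sum;
}

} // namespace ext
```

With this in place, `core::nearest` works for both `std::vector<std::vector<float>>` and `std::vector<ext::Quantized>`. The design choice is that new data representations plug in by adding overloads next to their own types rather than by editing the core routine.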

ahuber21 · 2025-10-23T10:32:00Z

.github/copilot-instructions.md

+2. **pre-commit.yml**: Verifies code formatting with clang-format 15
+3. **cibuildwheel.yml**: Builds Python wheels (uses custom manylinux2014 container)
+
+**To replicate CI locally**: Use the exact cmake command from `build-linux.yml` (lines 70-77).


It's not safe to reference exact lines as files can change.

Fixed in commit bd187f6. Removed all specific line number references from the instructions.

ahuber21 · 2025-10-23T10:32:37Z

.github/copilot-instructions.md

+
+### Build Issues
+
+1. **Problem**: CMake configuration warns about unused `SVS_EXPERIMENTAL_LEANVEC` variable


SVS_EXPERIMENTAL_LEANVEC is part of the closed-source part of the repository. We can completely eliminate it from this repository to avoid confusion.

Removed in commit bd187f6. Eliminated all references to SVS_EXPERIMENTAL_LEANVEC from build commands and explanatory notes.

ahuber21 · 2025-10-23T10:33:51Z

.github/copilot-instructions.md

+cd build/tests && ./tests --list-tags
+
+# Clean and rebuild
+rm -rf build && mkdir build && cd build && cmake .. && make -j$(nproc)


This drops the cmake configuration and uses default values. You should update it to use the exact cmake configuration.

Fixed in commit bd187f6. Updated the Quick Reference section to use the exact cmake configuration with all required flags.

ahuber21 · 2025-10-23T10:34:28Z

.github/copilot-instructions.md

+1. **Trust these instructions first** - Only search the repository if information here is incomplete or incorrect
+2. **Always build out-of-source** - Use a `build/` directory, never configure CMake in the repository root
+3. **Follow the CI configuration** - Use the same cmake flags as `.github/workflows/build-linux.yml` for consistency
+4. **Format before committing** - Run `./tools/clang-format.sh clang-format` to avoid CI failures


Add the following (you are allowed to reword): Limit formatting to modified files, don't include formatting changes from other files.

Added in commit bd187f6. Updated the formatting guidance to emphasize: "Only format files you modify; do not include formatting changes from other files in your PR".

ahuber21 · 2025-10-23T10:35:31Z

.github/copilot-instructions.md

+7. **ISA dispatching** - Runtime dispatch means the same binary runs on different CPU architectures
+8. **Test filters are your friend** - Use Catch2 tags to run subsets of tests during development
+9. **Python bindings are specialized** - Changes to template parameters may require Python binding updates
+10. **Version is synchronized** - Keep version in sync across `CMakeLists.txt` (line 26), `setup.py` (line 43), and test files


Add that new features and bugfixes must be accompanied by tests.
Especially a bug must first be reproduced in a unit test, and then addressed and fixed in a code change.

Added in commit bd187f6. Added a new principle: "Tests are required - New features and bugfixes must be accompanied by tests. For bugs, first reproduce the issue in a unit test, then fix it in the code".

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot · 2025-10-23T10:43:06Z

@copilot address these comments to improve the instructions.

All review comments addressed in commit bd187f6. Changes include:

Added type-erasure explanation for orchestrators
Updated extensions to describe svs_invoke system
Removed specific line references
Eliminated SVS_EXPERIMENTAL_LEANVEC
Fixed cmake commands in Quick Reference
Added formatting constraint guidance
Added testing requirements for all changes

ahuber21 · 2025-10-23T10:54:40Z

@copilot it feels like you took my suggestions verbatim. Did you reference them against your understanding of the project? Can you improve the description further? Remember, doing a good job can SIGNIFICANTLY improve the quality of the agent's work.

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot · 2025-10-23T11:00:58Z

@copilot it feels like you took my suggestions verbatim. Did you reference them against your understanding of the project? Can you improve the description further? Remember, doing a good job can SIGNI...

Enhanced in commit d4f1061. I studied the codebase architecture more deeply and rewrote descriptions based on actual understanding rather than surface-level observations:

Key improvements:

Explained the layered architecture: low-level templated index implementations → type-erased orchestrators → customization point extensions
Clarified ISA dispatching as runtime CPU feature detection (SSE/AVX2/AVX512) with dispatch to optimized paths
Described type-erasure pattern with concrete examples (e.g., svs::Vamana wrapping templated VamanaIndex)
Explained svs_invoke as customization point objects enabling type-specific behavior specialization
Enhanced directory annotations with architectural context (e.g., "templated implementation" vs "type-erased APIs")
Added performance focus explanation about compile-time dispatch and template metaprogramming
Detailed CI matrix builds and Python version support

The instructions now reflect understanding of design patterns and architectural decisions, not just listing features.
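As an illustration of the runtime ISA dispatch mentioned in the list above, here is a minimal sketch that detects CPU features once and routes calls through a function pointer. It assumes GCC or Clang on x86 for detection (via `__builtin_cpu_supports`) and falls back to a baseline everywhere else. The names `Isa`, `detect_isa`, `select_dot`, and `dot` are hypothetical, and the "wide" kernel is a plain C++ stand-in rather than real AVX code.

```cpp
#include <cassert>
#include <cstddef>

// Instruction-set tiers, ordered from lowest to highest.
enum class Isa { Baseline, Sse, Avx2, Avx512 };

// Detect the highest tier supported by the CPU this binary is running on.
inline Isa detect_isa() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return Isa::Avx512;
    if (__builtin_cpu_supports("avx2")) return Isa::Avx2;
    if (__builtin_cpu_supports("sse4.2")) return Isa::Sse;
#endif
    return Isa::Baseline;
}

// Baseline kernel: one accumulator, one element per iteration.
inline float dot_baseline(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Stand-in for a wide-vector kernel: four independent accumulators. A real
// build would compile this translation unit with ISA-specific flags instead.
inline float dot_wide(const float* a, const float* b, std::size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (std::size_t lane = 0; lane < 4; ++lane) {
            acc[lane] += a[i + lane] * b[i + lane];
        }
    }
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i) {
        sum += a[i] * b[i];  // scalar tail
    }
    return sum;
}

using DotFn = float (*)(const float*, const float*, std::size_t);

// Map a detected tier to the kernel that should serve it.
inline DotFn select_dot(Isa isa) {
    switch (isa) {
        case Isa::Avx512:
        case Isa::Avx2:
            return &dot_wide;
        case Isa::Sse:
        case Isa::Baseline:
            break;
    }
    return &dot_baseline;
}

// Public entry point: detection runs once, later calls reuse the pointer.
inline float dot(const float* a, const float* b, std::size_t n) {
    static const DotFn impl = select_dot(detect_isa());
    assert(impl != nullptr);
    return impl(a, b, n);
}
```

Because selection happens at runtime, one binary can be deployed across machines with different feature sets. The per-call cost is a single indirect call.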

ahuber21 · 2025-10-23T11:28:45Z

@copilot add another policy: if content was added on behalf of Intel employees, the following line must be added in the file header

These contents may have been developed with support from one or more Intel-operated generative artificial intelligence solutions.

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot · 2025-10-23T11:31:03Z

@copilot add another policy: if content was added on behalf of Intel employees, the following line must be added in the file header

...

Added in commit a8853bb. The policy is now included as item #7 in the Important Notes for Coding Agents section, instructing agents to add the attribution line to file headers when content is created on behalf of Intel employees.

napetrov · 2025-10-23T15:12:59Z

It might be worth considering scope-limited individual instruction files in an instructions folder instead of a legacy monolithic instruction file. This can reduce context use for a specific topic, and it can be done per folder or per type: for instance, the Python API, the build system, and C++ could each get a separate instruction file, as could the rest of CI and validation.

ethanglaser

A couple minor comments, looks great though

ethanglaser · 2025-10-23T19:13:04Z

.github/copilot-instructions.md

+
+### Prerequisites
+- CMake 3.21 or higher
+- C++20 compiler: GCC 11+, GCC 12+, or Clang 15+


Suggested change

- C++20 compiler: GCC 11+, GCC 12+, or Clang 15+

- C++20 compiler: GCC 11+ or Clang 15+

ethanglaser · 2025-10-23T19:17:37Z

.github/copilot-instructions.md

+│   │   ├── build-linux.yml  # Main build & test (Ubuntu 22.04, g++/clang)
+│   │   ├── pre-commit.yml   # Format checking
+│   │   ├── cibuildwheel.yml # Python wheel building
+│   │   └── build-*.y{a}ml   # macOS, ARM builds


Didn't notice there were inconsistencies here. Maybe it makes sense to change to .yml and revise the file name (build-macos.yml) in this PR.

Initial plan

1858cc0

Copilot AI assigned Copilot and ahuber21 Oct 23, 2025

Copilot started work on behalf of ahuber21 October 23, 2025 09:04

Copilot AI and others added 2 commits October 23, 2025 09:24

Initial plan for copilot-instructions.md

eb5d7e6

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Add comprehensive .github/copilot-instructions.md

605ea15

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add Copilot instructions for efficient onboarding~~ Add comprehensive .github/copilot-instructions.md for repository onboarding Oct 23, 2025

Copilot finished work on behalf of ahuber21 October 23, 2025 09:38

Copilot AI requested a review from ahuber21 October 23, 2025 09:38

Copilot started work on behalf of ahuber21 October 23, 2025 10:25

Revert accidental clang-format changes

540068f

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot finished work on behalf of ahuber21 October 23, 2025 10:31

ahuber21 requested changes Oct 23, 2025

Copilot started work on behalf of ahuber21 October 23, 2025 10:36

Address review feedback on copilot-instructions.md

bd187f6

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot AI requested a review from ahuber21 October 23, 2025 10:44

Copilot finished work on behalf of ahuber21 October 23, 2025 10:44

Copilot started work on behalf of ahuber21 October 23, 2025 10:55

Enhance copilot-instructions.md with deeper architectural understanding

d4f1061

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot finished work on behalf of ahuber21 October 23, 2025 11:02

ahuber21 marked this pull request as ready for review October 23, 2025 11:09

ahuber21 requested review from ethanglaser, homksei, mihaic and yuejiaointel as code owners October 23, 2025 11:09

ahuber21 requested review from aguerreb and ibhati and removed request for homksei and yuejiaointel October 23, 2025 11:09

Copilot started work on behalf of ahuber21 October 23, 2025 11:29

Add AI-generated content attribution policy

a8853bb

Co-authored-by: ahuber21 <9201869+ahuber21@users.noreply.github.com>

Copilot finished work on behalf of ahuber21 October 23, 2025 11:32

ethanglaser reviewed Oct 23, 2025
