Commit d339f6d

FEAT: Refine Perf Benchmarking, test across different OSs and add it to CI (#315)
### Work Item / Issue Reference

> [AB#40134](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/40134)

-------------------------------------------------------------------

### Summary

This pull request introduces a comprehensive performance benchmarking workflow for comparing `mssql-python` and `pyodbc` on real-world SQL Server workloads. The changes include a new benchmark script targeting the AdventureWorks2022 database, enhancements to the documentation, and updates to CI pipelines to automate database setup and benchmark execution on both Windows and Ubuntu environments.

**Benchmarking infrastructure and documentation:**

* Added a new script `benchmarks/perf-benchmarking.py` for real-world query performance comparisons between `mssql-python` and `pyodbc`, including statistical analysis, speedup calculations, and detailed reporting.
* Expanded the `benchmarks/README.md` with instructions and key features for running both the new and existing benchmark scripts, clarifying usage, output, and requirements.

**CI/CD pipeline automation:**

* Updated `eng/pipelines/pr-validation-pipeline.yml` to automatically download and restore the AdventureWorks2022 database for both Windows and Ubuntu environments, ensuring benchmarks run against a realistic dataset. [[1]](diffhunk://#diff-296c8f902bbd70f34ee1c8c32383c8c99165fe4c8e5b0f234f8f22246e56a621R183-R230) [[2]](diffhunk://#diff-296c8f902bbd70f34ee1c8c32383c8c99165fe4c8e5b0f234f8f22246e56a621R550-R606)
* Integrated steps in the pipeline to install required dependencies and execute the new benchmarking script, with conditional logic for platform and SQL Server availability. [[1]](diffhunk://#diff-296c8f902bbd70f34ee1c8c32383c8c99165fe4c8e5b0f234f8f22246e56a621R183-R230) [[2]](diffhunk://#diff-296c8f902bbd70f34ee1c8c32383c8c99165fe4c8e5b0f234f8f22246e56a621R550-R606)

These changes provide a robust framework for evaluating and monitoring database driver performance in CI, helping guide future optimizations and ensuring reliability across platforms.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 9d385d3 commit d339f6d
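
The commit message mentions statistical analysis and speedup calculations but does not show how they are computed. The following is a minimal sketch of that idea, not the code in `benchmarks/perf-benchmarking.py`. The helper names `time_runs`, `summarize`, and `speedup` are hypothetical, and the stand-in workloads replace real driver calls so the block runs without a database.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], iterations: int = 5) -> List[float]:
    """Call fn `iterations` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce raw timings to average, min, max, and standard deviation."""
    return {
        "avg": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
        # The sample standard deviation needs at least two data points.
        "stddev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


def speedup(baseline_avg: float, candidate_avg: float) -> float:
    """Ratio of average times; a value above 1 means the candidate is faster."""
    return baseline_avg / candidate_avg


if __name__ == "__main__":
    # Stand-in workloads (assumption): a real run would execute a query
    # and fetch all rows with each driver instead of summing a range.
    baseline = summarize(time_runs(lambda: sum(range(200_000))))
    candidate = summarize(time_runs(lambda: sum(range(100_000))))
    print(f"speedup: {speedup(baseline['avg'], candidate['avg']):.2f}x")
```

Dividing the baseline average by the candidate average keeps the ratio easy to read: 2.0 means the candidate took half the time. Reporting the standard deviation next to it shows whether a small speedup is within run-to-run noise.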

3 files changed, +638 -11 lines changed

benchmarks/README.md

Lines changed: 61 additions & 11 deletions
@@ -2,39 +2,89 @@
 
 This directory contains benchmark scripts for testing the performance of various database operations using `pyodbc` and `mssql_python`. The goal is to evaluate and compare the performance of these libraries for common database operations.
 
+## Benchmark Scripts
+
+### 1. `bench_mssql.py` - Richbench Framework Benchmarks
+Comprehensive benchmarks using the richbench framework for detailed performance analysis.
+
+### 2. `perf-benchmarking.py` - Real-World Query Benchmarks
+Standalone script that tests real-world queries against AdventureWorks2022 database with statistical analysis.
+
 ## Why Benchmarks?
 - To measure the efficiency of `pyodbc` and `mssql_python` in handling database operations.
 - To identify performance bottlenecks and optimize database interactions.
 - To ensure the reliability and scalability of the libraries under different workloads.
 
 ## How to Run Benchmarks
+
+### Running bench_mssql.py (Richbench Framework)
+
 1. **Set Up the Environment Variable**:
 - Ensure you have a running SQL Server instance.
 - Set the `DB_CONNECTION_STRING` environment variable with the connection string to your database. For example:
-```cmd
-set DB_CONNECTION_STRING=Server=your_server;Database=your_database;UID=your_user;PWD=your_password;
+```bash
+export DB_CONNECTION_STRING="Server=your_server;Database=AdventureWorks2022;UID=your_user;PWD=your_password;"
 ```
 
 2. **Install Richbench - Benchmarking Tool**:
-- Install richbench :
-```cmd
-pip install richbench
-```
+```bash
+pip install richbench
+```
 
 3. **Run the Benchmarks**:
-- Execute richbench from the parent folder (mssql-python) :
-```cmd
+- Execute richbench from the parent folder (mssql-python):
+```bash
 richbench benchmarks
 ```
-Results will be displayed in the terminal with detailed performance metrics.
+- Results will be displayed in the terminal with detailed performance metrics.
+
+### Running perf-benchmarking.py (Real-World Queries)
+
+This script tests performance with real-world queries from the AdventureWorks2022 database.
+
+1. **Prerequisites**:
+- AdventureWorks2022 database must be available
+- Both `pyodbc` and `mssql-python` must be installed
+- Update the connection string in the script if needed
+
+2. **Run from project root**:
+```bash
+python benchmarks/perf-benchmarking.py
+```
+
+3. **Features**:
+- Runs each query multiple times (default: 5 iterations)
+- Calculates average, min, max, and standard deviation
+- Provides speedup comparisons between libraries
+- Tests various query patterns:
+- Complex joins with aggregations
+- Large dataset retrieval (10K+ rows)
+- Very large dataset (1.2M rows)
+- CTEs and subqueries
+- Detailed summary tables and conclusions
+
+4. **Output**:
+The script provides:
+- Progress indicators during execution
+- Detailed results for each benchmark
+- Summary comparison table
+- Overall performance conclusion with speedup factors
 
 ## Key Features of `bench_mssql.py`
 - **Comprehensive Benchmarks**: Includes SELECT, INSERT, UPDATE, DELETE, complex queries, stored procedures, and transaction handling.
 - **Error Handling**: Each benchmark function is wrapped with error handling to ensure smooth execution.
 - **Progress Messages**: Clear progress messages are printed during execution for better visibility.
 - **Automated Setup and Cleanup**: The script automatically sets up and cleans up the database environment before and after the benchmarks.
 
+## Key Features of `perf-benchmarking.py`
+- **Statistical Analysis**: Multiple iterations with avg/min/max/stddev calculations
+- **Real-World Queries**: Tests against AdventureWorks2022 with production-like queries
+- **Automatic Import Resolution**: Correctly imports local `mssql_python` package
+- **Comprehensive Reporting**: Detailed comparison tables and performance summaries
+- **Speedup Calculations**: Clear indication of performance differences
+
 ## Notes
 - Ensure the database user has the necessary permissions to create and drop tables and stored procedures.
-- The script uses permanent tables prefixed with `perfbenchmark_` for benchmarking purposes.
-- A stored procedure named `perfbenchmark_stored_procedure` is created and used during the benchmarks.
+- The `bench_mssql.py` script uses permanent tables prefixed with `perfbenchmark_` for benchmarking purposes.
+- A stored procedure named `perfbenchmark_stored_procedure` is created and used during the benchmarks.
+- The `perf-benchmarking.py` script connects to AdventureWorks2022 and requires read permissions only.
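
The README tells users of `bench_mssql.py` to set `DB_CONNECTION_STRING` before running. The sketch below shows one way a script could read that variable and fail early with a clear message when it is missing. It is an illustration only, not the repository's code: `load_connection_string` and `parse_connection_string` are hypothetical helpers, and the parser is deliberately naive (it does not handle brace-quoted values that contain semicolons).

```python
import os
from typing import Dict, Mapping


def load_connection_string(env: Mapping[str, str] = os.environ) -> str:
    """Return DB_CONNECTION_STRING from the environment, or raise if unset or empty."""
    value = env.get("DB_CONNECTION_STRING", "").strip()
    if not value:
        raise RuntimeError(
            "DB_CONNECTION_STRING is not set; export it before running the benchmarks."
        )
    return value


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value;' into a dict with lower-cased keys (naive parser)."""
    parts: Dict[str, str] = {}
    for item in conn_str.split(";"):
        if "=" in item:
            key, _, val = item.partition("=")
            parts[key.strip().lower()] = val.strip()
    return parts


if __name__ == "__main__":
    # Example values mirror the placeholders used in the README.
    example_env = {
        "DB_CONNECTION_STRING": (
            "Server=your_server;Database=AdventureWorks2022;"
            "UID=your_user;PWD=your_password;"
        )
    }
    parts = parse_connection_string(load_connection_string(example_env))
    print(f"Benchmarking against database: {parts.get('database')}")
```

Checking the `database` key before any timing starts could catch a connection string that still points at a database other than AdventureWorks2022, which the real-world query benchmarks require.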
