FEAT: Perf Improvement #304

gargsaumya · 2025-10-28T10:43:43Z

Work Item / Issue Reference

AB#39793

GitHub Issue: #<ISSUE_NUMBER>

Summary

Large result sets (100K+ rows) improved by up to ~2×
Very large result sets (~1.2M rows) improved by 1.4× to 1.7×
Complex joins and aggregation workloads improved by ~40–45%
General query workloads now perform on par with or better than pyodbc

This pull request introduces several performance and correctness improvements to the MSSQL Python driver, focusing on efficient row conversion, cursor behavior, and pybind object handling. The most significant changes include caching output converters and column maps for rows, enforcing forward-only cursor semantics, and optimizing Python type/class lookups in the C++ extension layer.

Row conversion and cursor improvements:

Added caching of column name maps and output converter maps in the Cursor class, so that row and column conversions are computed only once per result set, improving performance for large queries. This affects fetchone, fetchmany, and fetchall methods, as well as execution logic. [1] [2] [3] [4] [5] [6]
Changed the cursor's scroll behavior to explicitly reject absolute positioning for forward-only cursors, and implemented relative scrolling using repeated fetches to match pyodbc's behavior.
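A minimal sketch of the per-result-set caching idea in plain Python follows. The attribute names `_cached_column_map` and `_cached_converter_map` are taken from the review comments further down this thread. Everything else (`CachingCursor`, `load_result_set`, `Row.get`, `maps_built`) is hypothetical and is not the driver's actual implementation. The maps are built on the first fetch after a new result set and then shared by every `Row`.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Converter = Optional[Callable[[Any], Any]]


class Row:
    """Hypothetical row that receives pre-built maps instead of rebuilding them."""

    __slots__ = ("_values", "_column_map")

    def __init__(self, values: Sequence[Any], column_map: Dict[str, int],
                 converters: List[Converter]) -> None:
        # Apply converters positionally; no converter or SQL NULL passes through unchanged.
        self._values = [
            conv(v) if conv is not None and v is not None else v
            for v, conv in zip(values, converters)
        ]
        self._column_map = column_map

    def __getitem__(self, index: int) -> Any:
        return self._values[index]

    def get(self, name: str) -> Any:
        return self._values[self._column_map[name]]


class CachingCursor:
    """Toy cursor that builds its column/converter maps once per result set."""

    def __init__(self, converters_by_type: Dict[str, Callable[[Any], Any]]) -> None:
        self._converters_by_type = converters_by_type
        self.description: List[Tuple[str, str]] = []
        self._rows: List[Sequence[Any]] = []
        self._cached_column_map: Optional[Dict[str, int]] = None
        self._cached_converter_map: Optional[List[Converter]] = None
        self.maps_built = 0  # illustration only: how often the maps were (re)built

    def load_result_set(self, description: List[Tuple[str, str]],
                        rows: List[Sequence[Any]]) -> None:
        # Stands in for execute(): a new result set invalidates both caches.
        self.description = description
        self._rows = list(rows)
        self._cached_column_map = None
        self._cached_converter_map = None

    def _ensure_maps(self) -> None:
        if self._cached_column_map is None:
            self._cached_column_map = {name: i for i, (name, _) in enumerate(self.description)}
            self._cached_converter_map = [self._converters_by_type.get(t) for _, t in self.description]
            self.maps_built += 1

    def fetchone(self) -> Optional[Row]:
        if not self._rows:
            return None
        self._ensure_maps()
        return Row(self._rows.pop(0), self._cached_column_map, self._cached_converter_map)

    def fetchall(self) -> List[Row]:
        self._ensure_maps()
        rows = [Row(r, self._cached_column_map, self._cached_converter_map) for r in self._rows]
        self._rows = []
        return rows


cur = CachingCursor({"decimal": Decimal})
cur.load_result_set([("id", "int"), ("price", "decimal")], [[1, "9.99"], [2, None]])
rows = cur.fetchall()  # both rows share one column map and one converter list
```

The design point is that the cost of building the maps is paid per result set rather than per row, so it stays constant as the row count grows.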

C++ extension (pybind) optimizations:

Introduced a PythonObjectCache namespace to cache commonly used Python classes (datetime, date, time, decimal, uuid), reducing repeated module imports and attribute lookups throughout the C++ codebase. All parameter binding and result conversion logic now uses this cache. [1] [2] [3] [4] [5] [6] [7]
Added caching of the decimal separator in the result set conversion logic to avoid repeated system calls during data retrieval.
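The C++ change keeps Python class objects such as `datetime` and `decimal.Decimal` in a cache instead of importing and resolving them on every use. The sketch below expresses the same "look up once" idea in Python with `functools.lru_cache` around `importlib.import_module`. `get_cached_class`, `convert_numeric_batch`, and the `DECIMAL_SEPARATOR` setting are hypothetical stand-ins. Treating the separator as something read once per batch and normalized to `.` before parsing is an assumption about how the cached value is used.

```python
import importlib
from functools import lru_cache
from typing import Any, List, Optional

# Hypothetical stand-in for a driver-level decimal separator setting.
DECIMAL_SEPARATOR = {"value": "."}


@lru_cache(maxsize=None)
def get_cached_class(module_name: str, attr: str) -> Any:
    """Import the module and resolve the attribute once; later calls hit the cache."""
    return getattr(importlib.import_module(module_name), attr)


def convert_numeric_batch(values: List[Optional[str]]) -> List[Any]:
    """Convert textual NUMERIC values, reading per-batch state only once."""
    decimal_cls = get_cached_class("decimal", "Decimal")  # cached class lookup
    sep = DECIMAL_SEPARATOR["value"]                      # read once per batch, not per value
    out: List[Any] = []
    for text in values:
        if text is None:
            out.append(None)
        elif sep != ".":
            out.append(decimal_cls(text.replace(sep, ".")))
        else:
            out.append(decimal_cls(text))
    return out
```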

Cursor semantics and configuration:

Updated statement execution logic in the C++ layer to always configure cursors as forward-only, ensuring consistent behavior and compatibility with the Python cursor implementation. [1] [2]
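With the statement always opened as forward-only, the scroll rules described above reduce to this: absolute positioning is refused, and a relative scroll is just "fetch and discard N rows". The toy cursor below sketches that behavior. `ForwardOnlyCursor` and the local `NotSupportedError` are hypothetical. Raising `IndexError` when a scroll runs past the end follows the PEP 249 recommendation and may not match the driver's exact exception.

```python
from typing import Any, Iterable, Optional, Tuple


class NotSupportedError(Exception):
    """Local stand-in for the DB-API (PEP 249) NotSupportedError."""


class ForwardOnlyCursor:
    """Toy forward-only cursor: rows can only be consumed in order."""

    _END = object()  # sentinel so a falsy row is never mistaken for end-of-data

    def __init__(self, rows: Iterable[Tuple[Any, ...]]) -> None:
        self._it = iter(rows)

    def fetchone(self) -> Optional[Tuple[Any, ...]]:
        row = next(self._it, self._END)
        return None if row is self._END else row

    def scroll(self, value: int, mode: str = "relative") -> None:
        if mode == "absolute":
            raise NotSupportedError("absolute positioning requires a scrollable cursor")
        if mode != "relative":
            raise ValueError(f"unknown scroll mode: {mode!r}")
        if value < 0:
            raise NotSupportedError("a forward-only cursor cannot move backwards")
        # A relative scroll is implemented as repeated fetches whose rows are discarded.
        for _ in range(value):
            if next(self._it, self._END) is self._END:
                # PEP 249 suggests IndexError when a scroll leaves the result set.
                raise IndexError("scroll moved past the end of the result set")
```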

These changes collectively improve performance and reliability when fetching large result sets, ensure correct cursor behavior, and make the C++ extension more efficient in its interaction with Python objects.

Copilot

Pull Request Overview

This PR implements performance optimizations for database row fetching operations with a focus on reducing repeated operations and memory allocations. The key changes optimize the Row object initialization, introduce Python object caching in C++ bindings, and modify the cursor type from static to forward-only.

Refactored Row initialization to accept pre-built column and converter maps, reducing redundant computation
Added PythonObjectCache namespace in C++ to cache frequently used Python class imports (datetime, decimal, UUID)
Changed cursor type from SQL_CURSOR_STATIC to SQL_CURSOR_FORWARD_ONLY for better performance
Optimized batch fetching by pre-allocating rows and using direct list indexing instead of append operations
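The pre-allocation change is made in C++, where it corresponds roughly to the C-API pattern of `PyList_New(n)` followed by `PyList_SET_ITEM`, instead of repeated `PyList_Append`. The Python sketch below only illustrates the shape of the two approaches. Both function names are hypothetical, and in pure Python the speed difference is usually small. Pre-allocation works because the batch size is known before any row is built.

```python
from typing import Any, Callable, List, Sequence


def build_rows_append(raw: Sequence[Sequence[Any]],
                      make_row: Callable[[Sequence[Any]], Any]) -> List[Any]:
    """Baseline: grow the list one append at a time."""
    rows: List[Any] = []
    for r in raw:
        rows.append(make_row(r))
    return rows


def build_rows_preallocated(raw: Sequence[Sequence[Any]],
                            make_row: Callable[[Sequence[Any]], Any]) -> List[Any]:
    """Size the list once for the known batch length, then fill slots by index."""
    rows: List[Any] = [None] * len(raw)
    for i, r in enumerate(raw):
        rows[i] = make_row(r)
    return rows
```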

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File	Description
mssql_python/row.py	Refactored Row initialization to accept pre-built column/converter maps and added optimized converter application method
mssql_python/pybind/ddbc_bindings.cpp	Added PythonObjectCache for frequently imported classes, optimized decimal separator handling, changed cursor type to forward-only, and optimized batch fetch operations with pre-allocation
mssql_python/cursor.py	Added caching of column maps and converter maps, updated Row instantiation calls, and added scrolling validation check

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


> [AB#29184](https://sqlclientdrivers.visualstudio.com/mssql-python/_workitems/edit/39184)
> GitHub Issue: #213

-------------------------------------------------------------------

This pull request updates how `datetimeoffset` values are handled when reading from SQL Server in the Python bindings. The main change is to preserve the original timezone information in returned Python `datetime` objects, instead of always converting them to UTC. Correspondingly, the test suite has been updated to compare datetimes with their original timezone rather than converting to UTC for assertions.

**Datetimeoffset handling improvements:**

* Removed forced conversion of `datetimeoffset` values to UTC in `SQLGetData_wrap` and `FetchBatchData`, so Python datetime objects retain their original timezone info. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2808) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3321)

**Test suite updates:**

* Updated all relevant tests in `tests/test_004_cursor.py` to compare datetimes directly, preserving timezone information, instead of converting to UTC for equality checks. This affects tests for read/write, max/min offsets, DST transitions, executemany, and extreme offsets. [[1]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L7890-R7890) [[2]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L7929-R7924) [[3]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L7989-R7979) [[4]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L8071-R8056) [[5]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L8147-R8122)
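The sketch below shows the difference between keeping the stored offset and normalizing to UTC. `datetimeoffset_to_python` and `old_behaviour` are hypothetical helpers. The field layout (fraction in nanoseconds, hour and minute offsets carrying the same sign) is assumed from ODBC's `SQL_SS_TIMESTAMPOFFSET_STRUCT`.

```python
from datetime import datetime, timedelta, timezone


def datetimeoffset_to_python(year: int, month: int, day: int,
                             hour: int, minute: int, second: int, fraction_ns: int,
                             tz_hour: int, tz_minute: int) -> datetime:
    """Build an aware datetime that keeps the offset stored in SQL Server.

    Assumes SQL_SS_TIMESTAMPOFFSET_STRUCT-style fields: fraction in nanoseconds,
    tz_hour and tz_minute with the same sign (e.g. -5 and -30 for -05:30).
    """
    tz = timezone(timedelta(hours=tz_hour, minutes=tz_minute))
    # Python datetimes have microsecond resolution, so the last 3 digits are dropped.
    return datetime(year, month, day, hour, minute, second,
                    fraction_ns // 1000, tzinfo=tz)


def old_behaviour(dt: datetime) -> datetime:
    """What the removed code path effectively did: normalize to UTC."""
    return dt.astimezone(timezone.utc)
```

Aware datetimes compare by instant, so the UTC-normalized value still compares equal to the original. Only `tzinfo`/`utcoffset()` reveal the lost offset, which is why the tests can now compare values directly.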

### Work Item / Issue Reference

> AB#<WORK_ITEM_ID>
> GitHub Issue: #286

-------------------------------------------------------------------

### Summary

Reintroduces the static buffer as a temporary hotfix; a separate task will track removing the static tokens.

### Work Item / Issue Reference

> [AB#39534](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/39534)

-------------------------------------------------------------------

### Summary

This pull request updates the package to version 0.13.1 and introduces several reliability and stability improvements, particularly around authentication, timezone handling, connection pooling, and type processing. These changes are reflected in both the documentation and the package configuration.

Release and documentation updates:

* Updated the package version to `0.13.1` in `setup.py` to reflect the new release.
* Revised the "What's new" section in `PyPI_Description.md` to highlight the main changes and improvements in v0.13.1.

Reliability and stability improvements:

* Fixed token handling for Microsoft Entra ID authentication to ensure stable and reliable connections.
* Enhanced connection pool shutdown mechanism to prevent resource leaks and ensure reliable cleanup.

Data handling improvements:

* Removed forced UTC conversion for `datetimeoffset` values, preserving original timezone information in Python `datetime` objects.
* Refined UUID string parameter handling to prevent automatic type coercion, ensuring predictable string processing.

> [AB#38821](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38821)
> GitHub Issue: #<ISSUE_NUMBER>

-------------------------------------------------------------------

This pull request adds comprehensive support for the SQL Server `XML` data type to the Python MSSQL driver, ensuring proper handling for insertion, retrieval, batching, and streaming of XML data. It also introduces a suite of tests to verify correct XML behavior, including edge cases like empty, large, and malformed XML values.

* Added support for the `SQL_SS_XML` data type throughout the driver, including binding, fetching, and row size calculations, so that XML columns are handled correctly during data operations. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R24) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2529-R2534) [[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2992) [[4]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R3195-R3198) [[5]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R3412)
* Updated logic in `FetchMany_wrap` and `FetchAll_wrap` to treat XML columns as LOBs, enabling efficient streaming for large XML values. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3504-R3517) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3626-R3639)
* Added multiple tests in `tests/test_004_cursor.py` to verify XML handling, including basic insert/fetch, empty/null values, large XML streaming, batch inserts, and error handling for malformed XML input.
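Treating XML as a LOB means the fetch path streams the value in chunks instead of binding it to a fixed-size buffer. The sketch below shows one way that decision and the chunk loop could look. The type codes are standard ODBC values, and `SQL_SS_XML` (-152) is the SQL Server-specific code. `is_lob_column`, its 8000-byte `inline_limit`, and `read_lob` are hypothetical and do not reflect the driver's exact thresholds.

```python
from typing import Callable, List, Optional

# Standard ODBC SQL type codes; SQL_SS_XML is the SQL Server-specific XML type.
SQL_VARCHAR = 12
SQL_WVARCHAR = -9
SQL_VARBINARY = -3
SQL_LONGVARCHAR = -1
SQL_WLONGVARCHAR = -10
SQL_LONGVARBINARY = -4
SQL_SS_XML = -152

ALWAYS_LOB = {SQL_LONGVARCHAR, SQL_WLONGVARCHAR, SQL_LONGVARBINARY, SQL_SS_XML}
VARIABLE_LENGTH = {SQL_VARCHAR, SQL_WVARCHAR, SQL_VARBINARY}


def is_lob_column(sql_type: int, column_size: int, inline_limit: int = 8000) -> bool:
    """Decide whether a column should be streamed instead of bound to a fixed buffer."""
    if sql_type in ALWAYS_LOB:
        return True
    if sql_type in VARIABLE_LENGTH:
        # (max) columns are typically described with a column size of 0.
        return column_size == 0 or column_size > inline_limit
    return False


def read_lob(get_chunk: Callable[[], Optional[str]]) -> str:
    """Concatenate chunks until the source reports no more data (mimics a SQLGetData loop)."""
    parts: List[str] = []
    while True:
        chunk = get_chunk()
        if chunk is None:
            return "".join(parts)
        parts.append(chunk)
```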

…RIC_STRUCT (#287)

> [AB#38111](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38111)
> GitHub Issue: #<ISSUE_NUMBER>

-------------------------------------------------------------------

This pull request significantly improves the handling of SQL Server NUMERIC/DECIMAL values in both the Python and C++ layers, addressing precision, scale, and binary representation for high-precision decimals. It also introduces a comprehensive suite of tests to validate numeric roundtrip, edge cases, and boundary conditions. The changes ensure compliance with SQL Server's maximum precision (38 digits), robust conversion between Python decimals and SQL binary formats, and better test coverage for numeric types.

* The `_get_numeric_data` method in `cursor.py` now correctly calculates the binary representation of decimal values, supporting up to 38 digits of precision, and constructs the byte array for SQL Server compatibility. The restriction on precision is raised from 15 to 38 digits. [[1]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280L198-R199) [[2]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280L218-R223) [[3]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280L232-R251)
* The C++ `NumericData` struct now stores the value as a binary string (16 bytes) instead of a 64-bit integer, allowing support for high-precision numerics. Related memory handling is updated for parameter binding. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R24) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L59-R65) [[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L560-R564) [[4]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2055-R2065) [[5]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3797-R3801)
* Old numeric tests were removed and replaced with a new, thorough set of tests covering roundtrip for basic, high-precision, negative, zero, small, boundary, NULL, fetchmany, and executemany scenarios for numeric values. This ensures that all critical cases are validated. [[1]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L1643-L1658) [[2]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L1724-L1765) [[3]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69R11348-R11564)

---

These changes collectively make the library more robust and compliant with SQL Server's numeric type requirements, and the expanded tests will help catch future regressions.
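The 16-byte representation follows ODBC's `SQL_NUMERIC_STRUCT`. That struct stores precision, scale, a sign byte (1 for positive, 0 for negative), and the unscaled magnitude as a little-endian integer. Because 10^38 is less than 2^128, any 38-digit value fits. The sketch below shows the conversion in both directions. `NumericParts`, `decimal_to_numeric`, and `numeric_to_decimal` are illustrative names, not the driver's `_get_numeric_data`.

```python
from decimal import Decimal
from typing import NamedTuple

MAX_PRECISION = 38  # SQL Server's maximum NUMERIC/DECIMAL precision


class NumericParts(NamedTuple):
    precision: int
    scale: int
    sign: int    # ODBC convention: 1 = positive (or zero), 0 = negative
    val: bytes   # 16-byte little-endian unsigned integer (the unscaled value)


def decimal_to_numeric(value: Decimal) -> NumericParts:
    sign_bit, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and Infinity cannot be sent as NUMERIC")
    unscaled = int("".join(map(str, digits)) or "0")
    if exponent >= 0:
        unscaled *= 10 ** exponent
        scale = 0
    else:
        scale = -exponent
    # Precision must cover every digit and be at least the scale (e.g. 0.001 -> 3).
    precision = max(len(str(unscaled)), scale, 1)
    if precision > MAX_PRECISION:
        raise ValueError(f"precision {precision} exceeds {MAX_PRECISION} digits")
    sign = 0 if sign_bit and unscaled else 1  # treat -0 as positive zero
    return NumericParts(precision, scale, sign, unscaled.to_bytes(16, "little"))


def numeric_to_decimal(parts: NumericParts) -> Decimal:
    unscaled = int.from_bytes(parts.val, "little")
    digits = tuple(int(c) for c in str(unscaled))
    # Build from a tuple so no context rounding is applied to 38-digit values.
    return Decimal((0 if parts.sign == 1 else 1, digits, -parts.scale))
```

Constructing the result from a `(sign, digits, exponent)` tuple avoids the default 28-digit decimal context, which would otherwise round 38-digit values during arithmetic such as negation or `scaleb`.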

### Work Item / Issue Reference

> [AB#39058](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/39058)

-------------------------------------------------------------------

### Summary

This pull request introduces comprehensive support for setting ODBC connection attributes in the `mssql_python` library, aligning its functionality with pyodbc's `set_attr` API. The changes include new constants for connection attributes, transaction isolation levels, and related options, as well as robust error handling and input validation in both Python and C++ layers. This enables users to configure connection behavior (e.g., autocommit, isolation level, timeouts) in a standardized and secure manner.

### Connection Attribute Support

* Added a wide set of ODBC connection attribute constants, transaction isolation level constants, access mode constants, and related enums to `mssql_python/__init__.py` and `mssql_python/constants.py`, making them available for use in Python code.
* Implemented the `set_attr` method in the `Connection` Python class, providing pyodbc-compatible functionality for setting connection attributes with detailed input validation and error handling.

### C++ Backend Enhancements

* Exposed `setAttribute` as a public method in the C++ `Connection` class, and added a new `setAttr` method in `ConnectionHandle`, with improved error reporting and range validation for SQLUINTEGER values.
* Registered the new `set_attr` method with the Python bindings, making it accessible from Python code.

### Code Cleanup and Refactoring

* Moved and consolidated connection attribute constants in `ConstantsDDBC` to improve maintainability, and removed legacy/unused constants.

These changes provide a robust interface for configuring ODBC connection attributes, improve compatibility with pyodbc, and enhance error handling for attribute operations.
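The kind of input validation described above (integer values checked against the SQLUINTEGER range; string and binary values passed as buffers) can be sketched as a pure function. `prepare_attr_value` and its `("uint" | "wchar" | "binary", payload)` return shape are hypothetical and are not the driver's `set_attr`. The two attribute ids are the standard ODBC values from `sqlext.h`. Encoding strings as UTF-16-LE assumes the Microsoft ODBC Driver's 2-byte SQLWCHAR.

```python
from typing import Tuple, Union

SQLUINTEGER_MAX = 2**32 - 1  # unsigned 32-bit

# Two standard ODBC connection attribute ids (sqlext.h), used in the examples.
SQL_ATTR_AUTOCOMMIT = 102
SQL_ATTR_CURRENT_CATALOG = 109

AttrValue = Union[int, str, bytes, bytearray]


def prepare_attr_value(attribute: int, value: AttrValue) -> Tuple[str, Union[int, bytes]]:
    """Validate an attribute/value pair and describe how it would be passed to ODBC."""
    if isinstance(attribute, bool) or not isinstance(attribute, int):
        raise TypeError("attribute must be an integer ODBC attribute id")
    if isinstance(value, bool):
        value = int(value)  # bool is an int subclass; normalize True/False to 1/0
    if isinstance(value, int):
        if not 0 <= value <= SQLUINTEGER_MAX:
            raise ValueError(f"{value} is outside the SQLUINTEGER range 0..{SQLUINTEGER_MAX}")
        return ("uint", value)
    if isinstance(value, str):
        # Assumes 2-byte SQLWCHAR (UTF-16), as used by the Microsoft ODBC Driver.
        return ("wchar", value.encode("utf-16-le"))
    if isinstance(value, (bytes, bytearray)):
        return ("binary", bytes(value))
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```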

> [AB#36303](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/36303)
> [AB#38478](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38478)
> GitHub Issue: #22

-------------------------------------------------------------------

This pull request refactors the `mssql_python` package to improve type safety and code clarity by adding explicit type annotations throughout the codebase. The changes mainly focus on the `__init__.py` and `auth.py` modules, updating function signatures, global variables, and constants to use Python type hints. This will help with static analysis, improve IDE support, and make the code easier to understand and maintain.

* Added type annotations to global variables, constants, and function signatures in `mssql_python/__init__.py`, including SQL constants and configuration settings.
* Updated function signatures in `mssql_python/auth.py` to use type hints for parameters and return types, such as changing raw `list`/`dict` usage to `List[str]`, `Dict[int, bytes]`, and `Optional[...]`.
* Improved formatting for multi-line statements and error messages, and standardized quote usage for strings.
* Updated class and method definitions to use explicit type annotations for attributes and properties, especially in the `Settings` and custom module classes.
* Improved parameter exclusion logic and connection string validation for authentication handling in `auth.py`.
* Ensured that sensitive parameters are more robustly excluded from connection strings.

These changes collectively enhance the maintainability and robustness of the codebase by leveraging Python's type system and improving code readability.
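The example below shows annotations in the style the PR describes (`Tuple[...]`, `List[str]`, `Optional[...]`) applied to a hypothetical helper that strips sensitive keys from a connection string. `SENSITIVE_KEYS`, `get_auth_type`, and `remove_sensitive_params` are illustrative names, and the excluded keys are assumptions rather than the exact list in `auth.py`. The naive `split(";")` does not handle brace-quoted values such as `PWD={a;b}`: it would split that value and keep the `b}` fragment. A real implementation needs a proper connection-string parser.

```python
from typing import List, Optional, Tuple

# Hypothetical exclusion list; keys are compared case-insensitively.
SENSITIVE_KEYS: Tuple[str, ...] = ("uid", "pwd", "trusted_connection", "authentication")


def get_auth_type(conn_str: str) -> Optional[str]:
    """Return the Authentication= value, or None when the key is absent."""
    for part in conn_str.split(";"):
        key, _, value = part.partition("=")
        if key.strip().lower() == "authentication":
            return value.strip()
    return None


def remove_sensitive_params(conn_str: str,
                            excluded: Tuple[str, ...] = SENSITIVE_KEYS) -> str:
    """Drop excluded keys and return a normalized 'key=value;' string."""
    kept: List[str] = []
    for part in conn_str.split(";"):
        part = part.strip()
        if not part:
            continue
        key = part.partition("=")[0].strip().lower()
        if key in excluded:
            continue
        kept.append(part)
    return "".join(f"{p};" for p in kept)
```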

### Work Item / Issue Reference

> [AB#38478](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38478)

-------------------------------------------------------------------

### Summary

This pull request refactors the `connection.cpp` and `connection.h` files to improve code readability, maintainability, and consistency, while also making minor corrections and clarifications to comments. The changes mainly involve formatting, type usage, and error handling improvements, as well as updating include paths and constructor signatures.

**Code Formatting and Readability Improvements**

* Reformatted function calls and argument lists for better readability, including breaking up long lines and grouping parameters logically in methods such as `getEnvHandle`, `allocateDbcHandle`, `commit`, `rollback`, and others in `connection.cpp`.
* Improved comment formatting and clarity, including updating TODOs and explanatory comments to be more precise and easier to understand.

**Type and Variable Usage Updates**

* Updated integer types in `setAttribute` from `long long` to `int64_t` for clarity and platform consistency.
* Improved buffer management for string and binary attributes by clarifying buffer lifetime logic and using more explicit type casts.

**Error Handling Enhancements**

* Enhanced error handling in attribute setting and connection attribute application, including more detailed error messages and fallback logic.

**Include Path and Constructor Signature Updates**

* Updated include paths in both `connection.cpp` and `connection.h` for consistency and to support future platform-agnostic changes.
* Modified the `ConnectionHandle` constructor signature to improve clarity and maintainability.


sumitmsft · 2025-11-05T09:03:36Z

I've identified some tests:

Execute queries with different schemas on same cursor; verify no stale column mappings between executions.
Close cursor and verify _cached_column_map and _cached_converter_map are cleared to prevent memory leaks.
Execute same query 10,000 times; verify memory doesn't grow unboundedly (should stay < 50MB).
Execute stored procedure with multiple result sets; verify each set gets correct column mapping, not cached from previous.
Mock missing datetime module during cache init; verify graceful error (not segfault) with meaningful message.
Query datetime/decimal/uuid types; verify no NULL pointer dereference if cache initialization failed.
Simulate datetime succeeds but decimal fails during cache init; verify fallback or clear error, not undefined behavior.
Query NULL datetime, date, time, decimal, uuid; verify converters handle NULL without crashes.
Test decimals: 0.00, -0.00, 999999999.99, 0.0000000001; verify correct conversion with cached converters.
Test SQL Server min/max datetime (1753-01-01, 9999-12-31); verify cached datetime converter handles boundaries.
Execute SELECT on table, ALTER table add column, SELECT again on same cursor; verify cache invalidation.
Call cursor.tables() then execute SELECT; verify column/converter cache doesn't mix metadata with data.
Execute 3 queries with integers, strings, then datetime on same cursor; verify converter cache updates correctly.
Execute invalid query causing error, then valid query; verify cache state recovered, not corrupted.
Start large fetch, kill connection mid-fetch; verify graceful error and cache cleanup (no memory leak).

Let's see if they can be incorporated.
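Two of the scenarios above can be written as driver-agnostic DB-API checks: different schemas on the same cursor, and an invalid query followed by a valid one. The sketch below runs against the standard-library `sqlite3` module only so that it executes anywhere. Pointing it at an `mssql_python` connection, and passing that driver's exception class as `error_type`, is an assumption. The helper names are hypothetical.

```python
import sqlite3
from typing import Any


def check_no_stale_column_map(conn: Any) -> None:
    """Two queries with different shapes on one cursor must not share metadata."""
    cur = conn.cursor()
    cur.execute("SELECT 1 AS a, 'x' AS b")
    assert [d[0] for d in cur.description] == ["a", "b"]
    assert tuple(cur.fetchone()) == (1, "x")
    cur.execute("SELECT 2.5 AS c")
    assert [d[0] for d in cur.description] == ["c"]
    assert tuple(cur.fetchone()) == (2.5,)


def check_recovers_after_error(conn: Any, error_type: type) -> None:
    """A failed execute must not leave stale cached state on the cursor."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM table_that_does_not_exist")
    except error_type:
        pass
    else:
        raise AssertionError("expected the invalid query to fail")
    cur.execute("SELECT 3 AS d")
    assert [d[0] for d in cur.description] == ["d"]
    assert tuple(cur.fetchone()) == (3,)


conn = sqlite3.connect(":memory:")
check_no_stale_column_map(conn)
check_recovers_after_error(conn, sqlite3.Error)
```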

…l-python into saumya/pref-setup

gargsaumya added 9 commits October 14, 2025 09:56

test main.py

e3e3083

better-8

7a05ed6

latest

1a68058

latest 2

0811744

working

d9e2056

cleanup

2eb1a51

cleaning up

cbbe9d6

cleanup

cb5ae54

cleanup

766ea85

gargsaumya changed the title ~~FEAT: Perf~~ FEAT: Perf Improvement Oct 28, 2025

gargsaumya marked this pull request as ready for review October 29, 2025 12:24

Copilot AI review requested due to automatic review settings October 29, 2025 12:24

Copilot AI reviewed Oct 29, 2025


gargsaumya and others added 13 commits October 30, 2025 16:56

fixing test failure

fa7d718

fixed test crash

909a653

improved perf

47e1f8f

fixed pipeline tests

0db3e27

Merge branch 'main' into saumya/pref-setup

e1a45c1

github-actions bot added the pr-size: large Substantial code update label Nov 3, 2025

github-advanced-security bot found potential problems Nov 3, 2025


mssql_python/pybind/ddbc_bindings.cpp (fixed)

gargsaumya added 2 commits November 3, 2025 20:28

Remove redundant error assertions in connection tests

377db41

Removed assertions for InterfaceError and DatabaseError instances in connection tests.

Merge branch 'main' into saumya/pref-setup

d956d85

Merge branch 'bewithgaurav/perf-check-CI' into saumya/pref-setup

f4cca91

bewithgaurav dismissed their stale review via f4cca91 November 5, 2025 05:15

bewithgaurav added 6 commits November 5, 2025 10:49

installation uninstallation nuances

f1f22bd

Merge branch 'bewithgaurav/perf-check-CI' into saumya/pref-setup

6becc44

fix odbc driver

baa6f40

odbc installation in windows

1992cb3

odbc installation in windows

5308b1f

Merge branch 'bewithgaurav/perf-check-CI' into saumya/pref-setup

e000808

sumitmsft reviewed Nov 5, 2025


mssql_python/cursor.py (outdated, resolved)

sumitmsft reviewed Nov 5, 2025


mssql_python/pybind/ddbc_bindings.cpp (resolved)

sumitmsft reviewed Nov 5, 2025


mssql_python/pybind/ddbc_bindings.cpp (resolved)

sumitmsft reviewed Nov 5, 2025


mssql_python/pybind/ddbc_bindings.cpp (resolved)

gargsaumya added 5 commits November 5, 2025 16:37

addressed review comments

dea7946

Merge branch 'saumya/pref-setup' of https://github.com/microsoft/mssq…

01855f8

…l-python into saumya/pref-setup

Merge branch 'main' into saumya/pref-setup

1b38ba4

addressed review comments

0a995d4

Merge branch 'saumya/pref-setup' of https://github.com/microsoft/mssq…

0e290a1

…l-python into saumya/pref-setup

gargsaumya force-pushed the saumya/pref-setup branch from 2303a8e to 0e290a1 on November 5, 2025 17:44

gargsaumya added 6 commits November 5, 2025 23:18

debug hang in arm

0e24554

debug hang in arm

723ab4b

Merge branch 'saumya/pref-setup' of https://github.com/microsoft/mssq…

e297104

…l-python into saumya/pref-setup

skipping stress tests in pipeline as locally all stress tests pass

4ae3861

skipping stress tests in pipeline as locally all stress tests pass

8fc8546

Merge branch 'saumya/pref-setup' of https://github.com/microsoft/mssq…

4acc3d7

…l-python into saumya/pref-setup

bewithgaurav previously approved these changes Nov 6, 2025


sumitmsft previously approved these changes Nov 6, 2025


Merge branch 'main' into saumya/pref-setup

618c511

gargsaumya dismissed stale reviews from sumitmsft and bewithgaurav via 618c511 November 6, 2025 06:06


5 participants
