-
Notifications
You must be signed in to change notification settings - Fork 27
FEAT: Perf Improvement #304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements performance optimizations for database row fetching operations with a focus on reducing repeated operations and memory allocations. The key changes optimize the Row object initialization, introduce Python object caching in C++ bindings, and modify the cursor type from static to forward-only.
- Refactored Row initialization to accept pre-built column and converter maps, reducing redundant computation
- Added PythonObjectCache namespace in C++ to cache frequently used Python class imports (datetime, decimal, UUID)
- Changed cursor type from SQL_CURSOR_STATIC to SQL_CURSOR_FORWARD_ONLY for better performance
- Optimized batch fetching by pre-allocating rows and using direct list indexing instead of append operations
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| mssql_python/row.py | Refactored Row initialization to accept pre-built column/converter maps and added optimized converter application method |
| mssql_python/pybind/ddbc_bindings.cpp | Added PythonObjectCache for frequently imported classes, optimized decimal separator handling, changed cursor type to forward-only, and optimized batch fetch operations with pre-allocation |
| mssql_python/cursor.py | Added caching of column maps and converter maps, updated Row instantiation calls, and added scrolling validation check |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
<!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below (e.g. AB#37452) For external contributors: Insert Github Issue number below (e.g. #149) Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > [AB#29184](https://sqlclientdrivers.visualstudio.com/mssql-python/_workitems/edit/39184) <!-- External contributors: GitHub Issue --> > GitHub Issue: #213 ------------------------------------------------------------------- <!-- Insert your summary of changes below. Minimum 10 characters required. --> This pull request updates how `datetimeoffset` values are handled when reading from SQL Server in the Python bindings. The main change is to preserve the original timezone information in returned Python `datetime` objects, instead of always converting them to UTC. Correspondingly, the test suite has been updated to compare datetimes with their original timezone rather than converting to UTC for assertions. **Datetimeoffset handling improvements:** * Removed forced conversion of `datetimeoffset` values to UTC in `SQLGetData_wrap` and `FetchBatchData`, so Python datetime objects retain their original timezone info. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2808) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3321) **Test suite updates:** * Updated all relevant tests in `tests/test_004_cursor.py` to compare datetimes directly, preserving timezone information, instead of converting to UTC for equality checks. This affects tests for read/write, max/min offsets, DST transitions, executemany, and extreme offsets. [[1]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L7890-R7890) [[2]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L7929-R7924) [[3]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L7989-R7979) [[4]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L8071-R8056) [[5]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L8147-R8122) <!-- > For feature requests FEAT: (short-description) > For non-feature requests like test case updates, config updates , dependency updates etc CHORE: (short-description) > For Fix requests FIX: (short-description) > For doc update requests DOC: (short-description) > For Formatting, indentation, or styling update STYLE: (short-description) > For Refactor, without any feature changes REFACTOR: (short-description) > For release related changes, without any feature changes RELEASE: #<RELEASE_VERSION> (short-description) External contributors: - Create a GitHub issue first: https://github.com/microsoft/mssql-python/issues/new - Link the GitHub issue in the "GitHub Issue" section above - Follow the PR title format and provide a meaningful summary mssql-python maintainers: - Create an ADO Work Item following internal processes - Link the ADO Work Item in the "ADO Work Item" section above - Follow the PR title format and provide a meaningful summary -->
### Work Item / Issue Reference <!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below (e.g. AB#37452) For external contributors: Insert Github Issue number below (e.g. #149) Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > AB#<WORK_ITEM_ID> <!-- External contributors: GitHub Issue --> > GitHub Issue: #286 ------------------------------------------------------------------- ### Summary <!-- Insert your summary of changes below. Minimum 10 characters required. --> Reintroduce Static Buffer as a temporary hotfix, will keep a new task to remove static tokens. <!-- ### PR Title Guide > For feature requests FEAT: (short-description) > For non-feature requests like test case updates, config updates , dependency updates etc CHORE: (short-description) > For Fix requests FIX: (short-description) > For doc update requests DOC: (short-description) > For Formatting, indentation, or styling update STYLE: (short-description) > For Refactor, without any feature changes REFACTOR: (short-description) > For release related changes, without any feature changes RELEASE: #<RELEASE_VERSION> (short-description) ### Contribution Guidelines External contributors: - Create a GitHub issue first: https://github.com/microsoft/mssql-python/issues/new - Link the GitHub issue in the "GitHub Issue" section above - Follow the PR title format and provide a meaningful summary mssql-python maintainers: - Create an ADO Work Item following internal processes - Link the ADO Work Item in the "ADO Work Item" section above - Follow the PR title format and provide a meaningful summary -->
### Work Item / Issue Reference <!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below (e.g. AB#37452) For external contributors: Insert Github Issue number below (e.g. #149) Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > [AB#39534](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/39534) ------------------------------------------------------------------- ### Summary <!-- Insert your summary of changes below. Minimum 10 characters required. --> This pull request updates the package to version 0.13.1 and introduces several reliability and stability improvements, particularly around authentication, timezone handling, connection pooling, and type processing. These changes are reflected in both the documentation and the package configuration. Release and documentation updates: * Updated the package version to `0.13.1` in `setup.py` to reflect the new release. * Revised the "What's new" section in `PyPI_Description.md` to highlight the main changes and improvements in v0.13.1. Reliability and stability improvements: * Fixed token handling for Microsoft Entra ID authentication to ensure stable and reliable connections. * Enhanced connection pool shutdown mechanism to prevent resource leaks and ensure reliable cleanup. Data handling improvements: * Removed forced UTC conversion for `datetimeoffset` values, preserving original timezone information in Python `datetime` objects. * Refined UUID string parameter handling to prevent automatic type coercion, ensuring predictable string processing. <!-- ### PR Title Guide > For feature requests FEAT: (short-description) > For non-feature requests like test case updates, config updates , dependency updates etc CHORE: (short-description) > For Fix requests FIX: (short-description) > For doc update requests DOC: (short-description) > For Formatting, indentation, or styling update STYLE: (short-description) > For Refactor, without any feature changes REFACTOR: (short-description) > For release related changes, without any feature changes RELEASE: #<RELEASE_VERSION> (short-description) ### Contribution Guidelines External contributors: - Create a GitHub issue first: https://github.com/microsoft/mssql-python/issues/new - Link the GitHub issue in the "GitHub Issue" section above - Follow the PR title format and provide a meaningful summary mssql-python maintainers: - Create an ADO Work Item following internal processes - Link the ADO Work Item in the "ADO Work Item" section above - Follow the PR title format and provide a meaningful summary -->
<!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below (e.g. AB#37452) For external contributors: Insert Github Issue number below (e.g. #149) Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > [AB#38821](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38821) <!-- External contributors: GitHub Issue --> > GitHub Issue: #<ISSUE_NUMBER> ------------------------------------------------------------------- <!-- Insert your summary of changes below. Minimum 10 characters required. --> This pull request adds comprehensive support for the SQL Server `XML` data type to the Python MSSQL driver, ensuring proper handling for insertion, retrieval, batching, and streaming of XML data. It also introduces a suite of tests to verify correct XML behavior, including edge cases like empty, large, and malformed XML values. * Added support for the `SQL_SS_XML` data type throughout the driver, including binding, fetching, and row size calculations, so that XML columns are handled correctly during data operations. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R24) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2529-R2534) [[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2992) [[4]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R3195-R3198) [[5]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R3412) * Updated logic in `FetchMany_wrap` and `FetchAll_wrap` to treat XML columns as LOBs, enabling efficient streaming for large XML values. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3504-R3517) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3626-R3639) * Added multiple tests in `tests/test_004_cursor.py` to verify XML handling, including basic insert/fetch, empty/null values, large XML streaming, batch inserts, and error handling for malformed XML input. <!-- > For feature requests FEAT: (short-description) > For non-feature requests like test case updates, config updates , dependency updates etc CHORE: (short-description) > For Fix requests FIX: (short-description) > For doc update requests DOC: (short-description) > For Formatting, indentation, or styling update STYLE: (short-description) > For Refactor, without any feature changes REFACTOR: (short-description) > For release related changes, without any feature changes RELEASE: #<RELEASE_VERSION> (short-description) External contributors: - Create a GitHub issue first: https://github.com/microsoft/mssql-python/issues/new - Link the GitHub issue in the "GitHub Issue" section above - Follow the PR title format and provide a meaningful summary mssql-python maintainers: - Create an ADO Work Item following internal processes - Link the ADO Work Item in the "ADO Work Item" section above - Follow the PR title format and provide a meaningful summary -->
…RIC_STRUCT (#287) <!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below (e.g. AB#37452) For external contributors: Insert Github Issue number below (e.g. #149) Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > [AB#38111](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38111) <!-- External contributors: GitHub Issue --> > GitHub Issue: #<ISSUE_NUMBER> ------------------------------------------------------------------- <!-- Insert your summary of changes below. Minimum 10 characters required. --> This pull request significantly improves the handling of SQL Server NUMERIC/DECIMAL values in both the Python and C++ layers, addressing precision, scale, and binary representation for high-precision decimals. It also introduces a comprehensive suite of tests to validate numeric roundtrip, edge cases, and boundary conditions. The changes ensure compliance with SQL Server's maximum precision (38 digits), robust conversion between Python decimals and SQL binary formats, and better test coverage for numeric types. * The `_get_numeric_data` method in `cursor.py` now correctly calculates the binary representation of decimal values, supporting up to 38 digits of precision, and constructs the byte array for SQL Server compatibility. The restriction on precision is raised from 15 to 38 digits. [[1]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280L198-R199) [[2]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280L218-R223) [[3]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280L232-R251) * The C++ `NumericData` struct now stores the value as a binary string (16 bytes) instead of a 64-bit integer, allowing support for high-precision numerics. Related memory handling is updated for parameter binding. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R24) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L59-R65) [[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L560-R564) [[4]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2055-R2065) [[5]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3797-R3801) * Old numeric tests were removed and replaced with a new, thorough set of tests covering roundtrip for basic, high-precision, negative, zero, small, boundary, NULL, fetchmany, and executemany scenarios for numeric values. This ensures that all critical cases are validated. [[1]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L1643-L1658) [[2]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69L1724-L1765) [[3]](diffhunk://#diff-82594712308ff34afa8b067af67db231e9a1372ef474da3db121e14e4d418f69R11348-R11564) --- These changes collectively make the library more robust and compliant with SQL Server's numeric type requirements, and the expanded tests will help catch future regressions. <!-- > For feature requests FEAT: (short-description) > For non-feature requests like test case updates, config updates , dependency updates etc CHORE: (short-description) > For Fix requests FIX: (short-description) > For doc update requests DOC: (short-description) > For Formatting, indentation, or styling update STYLE: (short-description) > For Refactor, without any feature changes REFACTOR: (short-description) > For release related changes, without any feature changes RELEASE: #<RELEASE_VERSION> (short-description) External contributors: - Create a GitHub issue first: https://github.com/microsoft/mssql-python/issues/new - Link the GitHub issue in the "GitHub Issue" section above - Follow the PR title format and provide a meaningful summary mssql-python maintainers: - Create an ADO Work Item following internal processes - Link the ADO Work Item in the "ADO Work Item" section above - Follow the PR title format and provide a meaningful summary -->
### Work Item / Issue Reference <!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below (e.g. AB#37452) For external contributors: Insert Github Issue number below (e.g. #149) Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > [AB#39058](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/39058) ------------------------------------------------------------------- ### Summary This pull request introduces comprehensive support for setting ODBC connection attributes in the `mssql_python` library, aligning its functionality with pyodbc's `set_attr` API. The changes include new constants for connection attributes, transaction isolation levels, and related options, as well as robust error handling and input validation in both Python and C++ layers. This enables users to configure connection behavior (e.g., autocommit, isolation level, timeouts) in a standardized and secure manner. ### Connection Attribute Support * Added a wide set of ODBC connection attribute constants, transaction isolation level constants, access mode constants, and related enums to `mssql_python/__init__.py` and `mssql_python/constants.py`, making them available for use in Python code. * Implemented the `set_attr` method in the `Connection` Python class, providing pyodbc-compatible functionality for setting connection attributes with detailed input validation and error handling. ### C++ Backend Enhancements * Exposed `setAttribute` as a public method in the C++ `Connection` class, and added a new `setAttr` method in `ConnectionHandle`, with improved error reporting and range validation for SQLUINTEGER values. * Registered the new `set_attr` method with the Python bindings, making it accessible from Python code. ### Code Cleanup and Refactoring * Moved and consolidated connection attribute constants in `ConstantsDDBC` to improve maintainability, and removed legacy/unused constants. These changes provide a robust interface for configuring ODBC connection attributes, improve compatibility with pyodbc, and enhance error handling for attribute operations.
<!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below (e.g. AB#37452) For external contributors: Insert Github Issue number below (e.g. #149) Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > [AB#36303](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/36303) > [AB#38478](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38478) <!-- External contributors: GitHub Issue --> > GitHub Issue: #22 ------------------------------------------------------------------- This pull request refactors the `mssql_python` package to improve type safety and code clarity by adding explicit type annotations throughout the codebase. The changes mainly focus on the `__init__.py` and `auth.py` modules, updating function signatures, global variables, and constants to use Python type hints. This will help with static analysis, improve IDE support, and make the code easier to understand and maintain. * Added type annotations to global variables, constants, and function signatures in `mssql_python/__init__.py`, including SQL constants and configuration settings. * Updated function signatures in `mssql_python/auth.py` to use type hints for parameters and return types, such as changing raw `list`/`dict` usage to `List[str]`, `Dict[int, bytes]`, and `Optional[...]`. * Improved formatting for multi-line statements and error messages, and standardized quote usage for strings. * Updated class and method definitions to use explicit type annotations for attributes and properties, especially in the `Settings` and custom module classes. * Improved parameter exclusion logic and connection string validation for authentication handling in `auth.py`. * Ensured that sensitive parameters are more robustly excluded from connection strings. These changes collectively enhance the maintainability and robustness of the codebase by leveraging Python's type system and improving code readability.
### Work Item / Issue Reference <!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below (e.g. AB#37452) For external contributors: Insert Github Issue number below (e.g. #149) Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > [AB#38478](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38478) ------------------------------------------------------------------- ### Summary This pull request refactors the `connection.cpp` and `connection.h` files to improve code readability, maintainability, and consistency, while also making minor corrections and clarifications to comments. The changes mainly involve formatting, type usage, and error handling improvements, as well as updating include paths and constructor signatures. **Code Formatting and Readability Improvements** * Reformatted function calls and argument lists for better readability, including breaking up long lines and grouping parameters logically in methods such as `getEnvHandle`, `allocateDbcHandle`, `commit`, `rollback`, and others in `connection.cpp`. * Improved comment formatting and clarity, including updating TODOs and explanatory comments to be more precise and easier to understand. **Type and Variable Usage Updates** * Updated integer types in `setAttribute` from `long long` to `int64_t` for clarity and platform consistency. * Improved buffer management for string and binary attributes by clarifying buffer lifetime logic and using more explicit type casts. **Error Handling Enhancements** * Enhanced error handling in attribute setting and connection attribute application, including more detailed error messages and fallback logic. **Include Path and Constructor Signature Updates** * Updated include paths in both `connection.cpp` and `connection.h` for consistency and to support future platform agnostic changes. * Modified the `ConnectionHandle` constructor signature to improve clarity and maintainability.
Removed assertions for InterfaceError and DatabaseError instances in connection tests.
|
I've identified some tests:
Let's see if they can be incorporated. |
2303a8e to
0e290a1
Compare
618c511
Work Item / Issue Reference
Summary
This pull request introduces several performance and correctness improvements to the MSSQL Python driver, focusing on efficient row conversion, cursor behavior, and pybind object handling. The most significant changes include caching output converters and column maps for rows, enforcing forward-only cursor semantics, and optimizing Python type/class lookups in the C++ extension layer.
Row conversion and cursor improvements:
Cursorclass, so that row and column conversions are computed only once per result set, improving performance for large queries. This affectsfetchone,fetchmany, andfetchallmethods, as well as execution logic. [1] [2] [3] [4] [5] [6]C++ extension (pybind) optimizations:
PythonObjectCachenamespace to cache commonly used Python classes (datetime, date, time, decimal, uuid), reducing repeated module imports and attribute lookups throughout the C++ codebase. All parameter binding and result conversion logic now uses this cache. [1] [2] [3] [4] [5] [6] [7]Cursor semantics and configuration:
These changes collectively improve performance and reliability when fetching large result sets, ensure correct cursor behavior, and make the C++ extension more efficient in its interaction with Python objects.