
Commit 0a5d1f2

gargsaumya, bewithgaurav, jahnvi480
authored
FEAT: Perf Improvement (#304)
### Work Item / Issue Reference

> [AB#39793](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/39793)

> GitHub Issue: #<ISSUE_NUMBER>

-------------------------------------------------------------------

### Summary

1. Large result sets (100K+ rows) improved by up to ~2×
2. Very large result sets (~1.2M rows) improved by 1.4× to 1.7×
3. Complex joins and aggregation workloads improved by ~40–45%
4. General query workloads now operate at parity or better compared to pyodbc

This pull request introduces several performance and correctness improvements to the MSSQL Python driver, focusing on efficient row conversion, cursor behavior, and pybind object handling. The most significant changes include caching output converters and column maps for rows, enforcing forward-only cursor semantics, and optimizing Python type/class lookups in the C++ extension layer.

**Row conversion and cursor improvements:**

* Added caching of column name maps and output converter maps in the `Cursor` class, so that row and column conversions are computed only once per result set, improving performance for large queries. This affects `fetchone`, `fetchmany`, and `fetchall` methods, as well as execution logic.
* Changed the cursor's scroll behavior to explicitly reject absolute positioning for forward-only cursors, and implemented relative scrolling using repeated fetches to match pyodbc's behavior.

**C++ extension (pybind) optimizations:**

* Introduced a `PythonObjectCache` namespace to cache commonly used Python classes (datetime, date, time, decimal, uuid), reducing repeated module imports and attribute lookups throughout the C++ codebase. All parameter binding and result conversion logic now uses this cache.
* Added caching of the decimal separator in the result set conversion logic to avoid repeated system calls during data retrieval.

**Cursor semantics and configuration:**

* Updated statement execution logic in the C++ layer to always configure cursors as forward-only, ensuring consistent behavior and compatibility with the Python cursor implementation.

These changes collectively improve performance and reliability when fetching large result sets, ensure correct cursor behavior, and make the C++ extension more efficient in its interaction with Python objects.

---------

Co-authored-by: Gaurav Sharma <sharmag@microsoft.com>
Co-authored-by: Jahnvi Thakkar <61936179+jahnvi480@users.noreply.github.com>
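The class-caching idea described for `PythonObjectCache` can be illustrated in Python. This is only an analogy under stated assumptions: the cache in the commit lives in the C++ extension, and `get_class` below is a hypothetical helper, not part of the driver.

```python
import datetime
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def get_class(module_name: str, attr: str):
    """Import the module and resolve the attribute once; later calls hit the cache."""
    return getattr(importlib.import_module(module_name), attr)


# First calls populate the cache (hypothetical usage, mirroring the types named above).
decimal_cls = get_class("decimal", "Decimal")
date_cls = get_class("datetime", "date")

# Per-value conversion code can now reuse the cached class objects.
value = decimal_cls("12.50")
```

Repeated calls with the same arguments return the same class object without another import or attribute lookup, which is the saving the commit message attributes to the cache.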
1 parent d339f6d commit 0a5d1f2

10 files changed

+973
-751
lines changed

benchmarks/perf-benchmarking.py

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@
 if not CONN_STR:
     print("Error: The environment variable DB_CONNECTION_STRING is not set. Please set it to a valid SQL Server connection string and try again.")
     sys.exit(1)
+
 # Ensure pyodbc connection string has ODBC driver specified
 if CONN_STR and 'Driver=' not in CONN_STR:
     CONN_STR = f"Driver={{ODBC Driver 18 for SQL Server}};{CONN_STR}"

mssql_python/__init__.py

Lines changed: 16 additions & 1 deletion
@@ -12,6 +12,8 @@

 # Exceptions
 # https://www.python.org/dev/peps/pep-0249/#exceptions
+
+# Import necessary modules
 from .exceptions import (
     Warning,
     Error,
@@ -175,6 +177,19 @@ def pooling(max_size: int = 100, idle_timeout: int = 600, enabled: bool = True)

 _original_module_setattr = sys.modules[__name__].__setattr__

+def _custom_setattr(name, value):
+    if name == 'lowercase':
+        with _settings_lock:
+            _settings.lowercase = bool(value)
+            # Update the module's lowercase variable
+            _original_module_setattr(name, _settings.lowercase)
+    else:
+        _original_module_setattr(name, value)
+
+# Replace the module's __setattr__ with our custom version
+sys.modules[__name__].__setattr__ = _custom_setattr
+
+
 # Export SQL constants at module level
 SQL_VARCHAR: int = ConstantsDDBC.SQL_VARCHAR.value
 SQL_LONGVARCHAR: int = ConstantsDDBC.SQL_LONGVARCHAR.value
@@ -281,4 +296,4 @@ def lowercase(self, value: bool) -> None:
 sys.modules[__name__] = new_module

 # Initialize property values
-lowercase: bool = _settings.lowercase
+lowercase: bool = _settings.lowercase

mssql_python/cursor.py

Lines changed: 119 additions & 74 deletions
@@ -121,18 +121,14 @@ def __init__(self, connection: "Connection", timeout: int = 0) -> None:
         # Therefore, it must be a list with exactly one bool element.

         # rownumber attribute
-        self._rownumber: int = (
-            -1
-        ) # DB-API extension: last returned row index, -1 before first
-        self._next_row_index: int = (
-            0 # internal: index of the next row the driver will return (0-based)
-        )
-        self._has_result_set: bool = False # Track if we have an active result set
-        self._skip_increment_for_next_fetch: bool = (
-            False # Track if we need to skip incrementing the row index
-        )
-
-        self.messages: List[str] = [] # Store diagnostic messages
+        self._rownumber = -1 # DB-API extension: last returned row index, -1 before first
+
+        self._cached_column_map = None
+        self._cached_converter_map = None
+        self._next_row_index = 0 # internal: index of the next row the driver will return (0-based)
+        self._has_result_set = False # Track if we have an active result set
+        self._skip_increment_for_next_fetch = False # Track if we need to skip incrementing the row index
+        self.messages = [] # Store diagnostic messages

     def _is_unicode_string(self, param: str) -> bool:
         """
@@ -823,7 +819,57 @@ def _initialize_description(self, column_metadata: Optional[Any] = None) -> None
             )
         self.description = description

-    def _map_data_type(self, sql_type: int) -> type:
+    def _build_converter_map(self):
+        """
+        Build a pre-computed converter map for output converters.
+        Returns a list where each element is either a converter function or None.
+        This eliminates the need to look up converters for every row.
+        """
+        if not self.description or not hasattr(self.connection, '_output_converters') or not self.connection._output_converters:
+            return None
+
+        converter_map = []
+
+        for desc in self.description:
+            if desc is None:
+                converter_map.append(None)
+                continue
+            sql_type = desc[1]
+            converter = self.connection.get_output_converter(sql_type)
+            # If no converter found for the SQL type, try the WVARCHAR converter as a fallback
+            if converter is None:
+                from mssql_python.constants import ConstantsDDBC
+                converter = self.connection.get_output_converter(ConstantsDDBC.SQL_WVARCHAR.value)
+
+            converter_map.append(converter)
+
+        return converter_map
+
+    def _get_column_and_converter_maps(self):
+        """
+        Get column map and converter map for Row construction (thread-safe).
+        This centralizes the column map building logic to eliminate duplication
+        and ensure thread-safe lazy initialization.
+
+        Returns:
+            tuple: (column_map, converter_map)
+        """
+        # Thread-safe lazy initialization of column map
+        column_map = self._cached_column_map
+        if column_map is None and self.description:
+            # Build column map locally first, then assign to cache
+            column_map = {col_desc[0]: i for i, col_desc in enumerate(self.description)}
+            self._cached_column_map = column_map
+
+        # Fallback to legacy column name map if no cached map
+        column_map = column_map or getattr(self, '_column_name_map', None)
+
+        # Get cached converter map
+        converter_map = getattr(self, '_cached_converter_map', None)
+
+        return column_map, converter_map
+
+    def _map_data_type(self, sql_type):
         """
         Map SQL data type to Python data type.

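The effect of `_build_converter_map` can be sketched outside the driver: resolve one converter per column when the result set is described, then apply the list positionally to every row. The names `build_maps`, `convert_rows`, and the `converters` dict are hypothetical and only illustrate the technique; the type codes 4 and -9 are the ODBC values for `SQL_INTEGER` and `SQL_WVARCHAR`. The WVARCHAR fallback from the hunk above is not modelled here.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Converter = Optional[Callable[[Any], Any]]


def build_maps(
    description: Sequence[Tuple[str, int]],
    converters: Dict[int, Callable[[Any], Any]],
) -> Tuple[Dict[str, int], List[Converter]]:
    """Compute the name -> index map and the per-column converter list once."""
    column_map = {col[0]: i for i, col in enumerate(description)}
    converter_map = [converters.get(col[1]) for col in description]
    return column_map, converter_map


def convert_rows(rows: Sequence[Sequence[Any]], converter_map: List[Converter]) -> List[List[Any]]:
    """Apply the precomputed converters; no per-row dictionary lookups remain."""
    return [
        [
            conv(value) if conv is not None and value is not None else value
            for value, conv in zip(row, converter_map)
        ]
        for row in rows
    ]


# Illustrative description: (column name, SQL type code).
description = [("id", 4), ("name", -9)]
converters = {-9: str.upper}  # hypothetical converter registered for SQL_WVARCHAR

column_map, converter_map = build_maps(description, converters)
rows = convert_rows([[1, "ada"], [2, None]], converter_map)
```

With 100K+ rows, the lookup cost moves from once per value to once per column, which is the saving the commit message describes for `fetchone`, `fetchmany`, and `fetchall`.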
@@ -1135,9 +1181,14 @@ def execute( # pylint: disable=too-many-locals,too-many-branches,too-many-state
         if self.description: # If we have column descriptions, it's likely a SELECT
             self.rowcount = -1
             self._reset_rownumber()
+            # Pre-build column map and converter map
+            self._cached_column_map = {col_desc[0]: i for i, col_desc in enumerate(self.description)}
+            self._cached_converter_map = self._build_converter_map()
         else:
             self.rowcount = ddbc_bindings.DDBCSQLRowCount(self.hstmt)
             self._clear_rownumber()
+            self._cached_column_map = None
+            self._cached_converter_map = None

         # After successful execution, initialize description if there are results
         column_metadata = []
@@ -1957,11 +2008,11 @@ def fetchone(self) -> Union[None, Row]:
                 self._increment_rownumber()

             self.rowcount = self._next_row_index
-
-            # Create and return a Row object, passing column name map if available
-            column_map = getattr(self, "_column_name_map", None)
-            return Row(self, self.description, row_data, column_map)
-        except Exception as e: # pylint: disable=broad-exception-caught
+
+            # Get column and converter maps
+            column_map, converter_map = self._get_column_and_converter_maps()
+            return Row(row_data, column_map, cursor=self, converter_map=converter_map)
+        except Exception as e:
             # On error, don't increment rownumber - rethrow the error
             raise e

@@ -2004,14 +2055,13 @@ def fetchmany(self, size: Optional[int] = None) -> List[Row]:
                 self.rowcount = 0
             else:
                 self.rowcount = self._next_row_index
-
+
+            # Get column and converter maps
+            column_map, converter_map = self._get_column_and_converter_maps()
+
             # Convert raw data to Row objects
-            column_map = getattr(self, "_column_name_map", None)
-            return [
-                Row(self, self.description, row_data, column_map)
-                for row_data in rows_data
-            ]
-        except Exception as e: # pylint: disable=broad-exception-caught
+            return [Row(row_data, column_map, cursor=self, converter_map=converter_map) for row_data in rows_data]
+        except Exception as e:
             # On error, don't increment rownumber - rethrow the error
             raise e

@@ -2044,14 +2094,13 @@ def fetchall(self) -> List[Row]:
                 self.rowcount = 0
             else:
                 self.rowcount = self._next_row_index
-
+
+            # Get column and converter maps
+            column_map, converter_map = self._get_column_and_converter_maps()
+
             # Convert raw data to Row objects
-            column_map = getattr(self, "_column_name_map", None)
-            return [
-                Row(self, self.description, row_data, column_map)
-                for row_data in rows_data
-            ]
-        except Exception as e: # pylint: disable=broad-exception-caught
+            return [Row(row_data, column_map, cursor=self, converter_map=converter_map) for row_data in rows_data]
+        except Exception as e:
             # On error, don't increment rownumber - rethrow the error
             raise e

@@ -2070,16 +2119,35 @@ def nextset(self) -> Union[bool, None]:
         # Clear messages per DBAPI
         self.messages = []

+        # Clear cached column and converter maps for the new result set
+        self._cached_column_map = None
+        self._cached_converter_map = None
+
         # Skip to the next result set
         ret = ddbc_bindings.DDBCSQLMoreResults(self.hstmt)
         check_error(ddbc_sql_const.SQL_HANDLE_STMT.value, self.hstmt, ret)

         if ret == ddbc_sql_const.SQL_NO_DATA.value:
             self._clear_rownumber()
+            self.description = None
             return False

         self._reset_rownumber()

+        # Initialize description for the new result set
+        column_metadata = []
+        try:
+            ddbc_bindings.DDBCSQLDescribeCol(self.hstmt, column_metadata)
+            self._initialize_description(column_metadata)
+
+            # Pre-build column map and converter map for the new result set
+            if self.description:
+                self._cached_column_map = {col_desc[0]: i for i, col_desc in enumerate(self.description)}
+                self._cached_converter_map = self._build_converter_map()
+        except Exception as e: # pylint: disable=broad-exception-caught
+            # If describe fails, there might be no results in this result set
+            self.description = None
+
         return True

     def __enter__(self):
@@ -2252,58 +2320,34 @@ def scroll(self, value: int, mode: str = "relative") -> None: # pylint: disable

         row_data: list = []

-        # Absolute special cases
+        # Absolute positioning not supported with forward-only cursors
         if mode == "absolute":
-            if value == -1:
-                # Before first
-                ddbc_bindings.DDBCSQLFetchScroll(
-                    self.hstmt, ddbc_sql_const.SQL_FETCH_ABSOLUTE.value, 0, row_data
-                )
-                self._rownumber = -1
-                self._next_row_index = 0
-                return
-            if value == 0:
-                # Before first, but tests want rownumber==0 pre and post the next fetch
-                ddbc_bindings.DDBCSQLFetchScroll(
-                    self.hstmt, ddbc_sql_const.SQL_FETCH_ABSOLUTE.value, 0, row_data
-                )
-                self._rownumber = 0
-                self._next_row_index = 0
-                self._skip_increment_for_next_fetch = True
-                return
+            raise NotSupportedError(
+                "Absolute positioning not supported",
+                "Forward-only cursors do not support absolute positioning"
+            )

         try:
             if mode == "relative":
                 if value == 0:
                     return
-                ret = ddbc_bindings.DDBCSQLFetchScroll(
-                    self.hstmt, ddbc_sql_const.SQL_FETCH_RELATIVE.value, value, row_data
-                )
-                if ret == ddbc_sql_const.SQL_NO_DATA.value:
-                    raise IndexError(
-                        "Cannot scroll to specified position: end of result set reached"
+
+                # For forward-only cursors, use multiple SQL_FETCH_NEXT calls
+                # This matches pyodbc's approach for skip operations
+                for i in range(value):
+                    ret = ddbc_bindings.DDBCSQLFetchScroll(
+                        self.hstmt, ddbc_sql_const.SQL_FETCH_NEXT.value, 0, row_data
                     )
-                # Consume N rows; last-returned index advances by N
+                    if ret == ddbc_sql_const.SQL_NO_DATA.value:
+                        raise IndexError(
+                            "Cannot scroll to specified position: end of result set reached"
+                        )
+
+                # Update position tracking
                 self._rownumber = self._rownumber + value
                 self._next_row_index = self._rownumber + 1
                 return

-            # absolute(k>0): map Python k (0-based next row) to ODBC ABSOLUTE k (1-based),
-            # intentionally passing k so ODBC fetches row #k (1-based), i.e., 0-based (k-1),
-            # leaving the NEXT fetch to return 0-based index k.
-            ret = ddbc_bindings.DDBCSQLFetchScroll(
-                self.hstmt, ddbc_sql_const.SQL_FETCH_ABSOLUTE.value, value, row_data
-            )
-            if ret == ddbc_sql_const.SQL_NO_DATA.value:
-                raise IndexError(
-                    f"Cannot scroll to position {value}: end of result set reached"
-                )
-
-            # Tests expect rownumber == value after absolute(value)
-            # Next fetch should return row index 'value'
-            self._rownumber = value
-            self._next_row_index = value
-
         except Exception as e: # pylint: disable=broad-exception-caught
             if isinstance(e, (IndexError, NotSupportedError)):
                 raise
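The forward-only scroll semantics in this hunk can be modelled with a toy cursor: a relative scroll of N consumes N rows with repeated next-fetches, and absolute positioning is refused. `ForwardOnlyCursor` is a hypothetical stand-in, not the driver's `Cursor`. One assumption is made explicit here: the sketch rejects negative offsets, whereas this hunk does not show how the driver handles them.

```python
from typing import Iterable, Optional, Tuple


class ForwardOnlyCursor:
    """Toy stand-in for a forward-only cursor (hypothetical, not the driver class)."""

    def __init__(self, rows: Iterable[Tuple]) -> None:
        self._rows = iter(rows)
        self.rownumber = -1  # index of the last row returned, -1 before the first

    def _fetch_next(self) -> Optional[Tuple]:
        # Plays the role of a single SQL_FETCH_NEXT call; None means no more data.
        return next(self._rows, None)

    def fetchone(self) -> Optional[Tuple]:
        row = self._fetch_next()
        if row is not None:
            self.rownumber += 1
        return row

    def scroll(self, value: int, mode: str = "relative") -> None:
        if mode == "absolute":
            raise NotImplementedError("forward-only cursors cannot jump to an absolute row")
        if value < 0:
            # Assumption of this sketch: backwards movement is rejected outright.
            raise NotImplementedError("forward-only cursors cannot move backwards")
        for _ in range(value):
            if self._fetch_next() is None:
                raise IndexError("end of result set reached")
        # Position is updated only after all skipped rows were consumed.
        self.rownumber += value


cur = ForwardOnlyCursor([("a",), ("b",), ("c",), ("d",)])
cur.scroll(2)          # skips ("a",) and ("b",)
row = cur.fetchone()   # next row after the skipped ones
```

Because every skipped row is actually fetched, the cost of `scroll(n)` grows linearly with `n`; that is the trade-off for keeping the statement forward-only.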
@@ -2457,4 +2501,5 @@ def setoutputsize(self, size: int, column: Optional[int] = None) -> None:
         This method is a no-op in this implementation as buffer sizes
         are managed automatically by the underlying driver.
         """
-        # This is a no-op - buffer sizes are managed automatically
+        # This is a no-op - buffer sizes are managed automatically
+

mssql_python/helpers.py

Lines changed: 1 addition & 1 deletion
@@ -343,4 +343,4 @@ def __init__(self) -> None:
 def get_settings() -> Settings:
     """Return the global settings object"""
     with _settings_lock:
-        return _settings
+        return _settings
