Skip to content

Conversation

@saurabh500
Copy link

@saurabh500 saurabh500 commented Oct 29, 2025

Work Item / Issue Reference

AB#39896

GitHub Issue: #306


Summary

Fixes #306 by introducing a connection string parser for mssql-python.

PR Title Guide

FEAT: Improve connection string parsing, honor the allow list, and improve error messaging.

Connection String Allowlist Feature - Review Guide

Overview

This PR introduces a comprehensive connection string validation and allowlist system that validates all ODBC connection parameters before passing them to the driver, improving usability and providing better error messages.


What This Changes

Before: Connection strings were passed directly to the ODBC driver with minimal or no validation. Unknown parameters were silently ignored by ODBC, malformed strings caused cryptic ODBC errors.

After: All connection strings are parsed, validated against an allowlist of ODBC Driver 18 parameters, and reconstructed with proper escaping. Clear error messages are provided for any issues.


Data Flow

User Input (connection string + kwargs)
    ↓
┌─────────────────────────────────────────────┐
│ Connection.__init__()                        │
│ _construct_connection_string()              │
└─────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────┐
│ Step 1: Parse Connection String             │
│ _ConnectionStringParser.parse()             │
│   - Tokenizes key=value pairs               │
│   - Handles braced values: {val}            │
│   - Processes escape sequences: }}-> }      │
│   - Detects syntax errors                   │
│   Output: Dict[str, str]                    │
└─────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────┐
│ Step 2: Validate Against Allowlist          │
│ ConnectionStringAllowList.normalize_keys()   │
│   - Checks unknown parameters               │
│   - Normalizes synonyms (host→Server)       │
│   - Blocks reserved params (Driver, APP)    │
│   - Warns about rejected params             │
│   Output: Dict[str, str] (filtered)         │
└─────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────┐
│ Step 3: Process kwargs                      │
│   - Normalize each kwarg key                │
│   - Block reserved parameters               │
│   - Merge into filtered params              │
│   - kwargs override connection string       │
└─────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────┐
│ Step 4: Build Connection String             │
│ _ConnectionStringBuilder.build()            │
│   - Add Driver (hardcoded)                  │
│   - Add APP=MSSQL-Python                    │
│   - Escape special characters               │
│   - Format: key1=value1;key2=value2;        │
│   Output: str (final connection string)     │
└─────────────────────────────────────────────┘
    ↓
ODBC Driver (validated, safe connection string)

File Structure & Review Order

Recommended Review Order

Review in this sequence to understand the architecture:

1. Core Components (understand the building blocks)

  1. mssql_python/connection_string_parser.py Start here

    • Purpose: Parses ODBC connection strings per MS-ODBCSTR spec
    • Key Classes:
      • ConnectionStringParseError - Custom exception
      • _ConnectionStringParser - Main parser class
    • Key Methods:
      • parse(connection_str) - Main entry point
      • _parse_value(str, pos) - Handles braced values & escaping
    • What to Look For:
      • Proper handling of escape sequences ({{, }})
      • Error collection and batch reporting
      • Edge cases: empty strings, whitespace, special chars
  2. mssql_python/connection_string_allowlist.py Critical

    • Purpose: Validates parameters against ODBC Driver 18 keywords
    • Key Classes:
      • ConnectionStringAllowList - Singleton allowlist manager
    • Key Methods:
      • normalize_key(key) - Maps synonyms to canonical names
      • filter_params(params) - Validates and filters parameters
    • What to Look For:
      • Completeness of allowlist (compare with ODBC docs)
      • Synonym mappings (host→Server, user→UID, etc.)
      • Reserved parameter handling (Driver, APP)
  3. mssql_python/connection_string_builder.py

    • Purpose: Safely constructs connection strings with escaping
    • Key Classes:
      • _ConnectionStringBuilder - Builder with escape logic
    • Key Methods:
      • add_param(key, value) - Adds a parameter
      • _needs_braces(value) - Determines if braces needed
      • _escape_value(value) - Escapes special characters
      • build() - Constructs final string
    • What to Look For:
      • Correct escaping logic (;, {, }, =)
      • Proper brace placement
      • Semicolon formatting

2. Integration (see how it fits together)

  1. mssql_python/connection.py Integration point

    • Modified Method: _construct_connection_string()
    • What Changed:
      • Lines 241-303: New implementation replacing old concat logic
      • Now uses parser-> filter-> builder pipeline
      • Handles kwargs with allowlist validation
      • Raises ValueError for reserved parameters
    • What to Look For:
      • Error handling for parse failures
      • kwargs override behavior
      • Reserved parameter rejection
      • Logging of warnings
  2. mssql_python/__init__.py

    • What Changed:
      • Added export: ConnectionStringParseError
    • What to Look For:
      • Proper exception exposure for users

3. Tests (validate functionality)

  1. tests/test_010_connection_string_parser.py Parser tests

    • Coverage: ~40 test cases
    • Test Categories:
      • Valid parsing scenarios
      • Braced value handling
      • Escape sequence processing
      • Error detection and reporting
      • Edge cases (empty, whitespace, unicode)
    • What to Look For:
      • Test coverage of MS-ODBCSTR spec
      • Error message clarity
      • Edge case handling
  2. tests/test_011_connection_string_allowlist.py Allowlist tests

    • Coverage: ~25 test cases
    • Test Categories:
      • Key normalization (synonyms)
      • Parameter filtering
      • Reserved parameter blocking
      • Unknown parameter detection
    • What to Look For:
      • All ODBC parameters tested
      • Synonym mappings validated
      • Security (Driver/APP blocking)
  3. tests/test_012_connection_string_integration.py Integration tests

    • Coverage: ~20 test cases
    • Test Categories:
      • End-to-end parsing-> filtering-> building
      • Real connection string scenarios
      • Error propagation from connect() API
      • kwargs override behavior
    • What to Look For:
      • Real-world usage patterns
      • Error messages match user expectations
      • Backward compatibility where possible
  4. tests/test_003_connection.py (Updated)

    • What Changed:
      • Updated assertions in test_construct_connection_string()
      • Updated assertions in test_connection_string_with_attrs_before()
      • Updated assertions in test_connection_string_with_odbc_param()
    • What to Look For:
      • Assertions match new builder output format
      • No semicolons in middle of assertions (builder handles formatting)
  5. tests/test_006_exceptions.py (Updated)

    • What Changed:
      • test_connection_error() now expects ConnectionStringParseError
      • Updated error message assertions
    • What to Look For:
      • Proper exception type changes
      • Error messages are helpful

4. Documentation (understand design decisions)

  1. docs/connection_string_allow_list_design.md Read this

    • Content:
      • Design rationale and motivation
      • Architecture decisions
      • Security considerations
      • Future enhancements
    • What to Look For:
      • Justification for approach
      • Trade-offs discussed
      • Security implications understood
  2. docs/parser_state_machine.md

    • Content:
      • Detailed state machine for parser
      • Character-by-character processing flow
      • Error handling states
      • Escape sequence examples
    • What to Look For:
      • Parser logic is well-documented
      • Edge cases covered in examples

Areas to Focus On

1. Security

  • Reserved Parameters: Verify Driver and APP cannot be set by users
  • Allowlist Completeness: Check all ODBC Driver 18 params are included
  • Escape Handling: Ensure no injection via special characters

2. Error Handling

  • Error Messages: Are they clear and actionable?
  • Error Collection: Multiple errors reported together?
  • Backward Compatibility: Do meaningful errors replace cryptic ODBC errors?

3. Performance

  • Parsing Overhead: There is no string split being used during parsing. Hence the cost of multiple allocation of strings is avoided.
  • No Regex in Hot Path: Parser uses character-by-character processing. Regex have been known to cause problems and its advisable to stay away from them.
  • Allowlist Lookup: O(1) dict lookups, minimal overhead

4. Correctness

  • Synonym Handling: All common aliases map correctly
  • Case Insensitivity: Keys normalized consistently

Testing Strategy

Test Coverage Map

Component Test File Key Scenarios
Parser test_010_connection_string_parser.py Syntax, escaping, errors
Allowlist test_011_connection_string_allowlist.py Validation, normalization
Builder test_012_connection_string_integration.py Escaping, formatting
Integration test_012_connection_string_integration.py End-to-end flows
Connection test_003_connection.py Connection string construction
Exceptions test_006_exceptions.py Error propagation

Behavior Changes

Should be aware of these behavioral changes:

1. Unknown Parameters Now Raise Errors

Before:

connect("Server=localhost;FakeParam=value")  # Silently ignored

After:

connect("Server=localhost;FakeParam=value")
# Raises: ConnectionStringParseError: Unknown keyword 'FakeParam' is not recognized

2. Malformed Strings Caught Early

Before:

connect("ServerLocalhost")  # ODBC error later

After:

connect("ServerLocalhost")
# Raises: ConnectionStringParseError: Incomplete specification: keyword 'ServerLocalhost' has no value

3. Reserved Parameters Blocked

Before:

connect("Server=localhost;Driver={SQL Server}")  # Maybe ignored

After:

connect("Server=localhost;Driver={SQL Server}")
# Raises: ValueError: Connection parameter 'Driver' is reserved and controlled by the driver

Key Concepts to Understand

ODBC Connection String Format

key1=value1;key2=value2;key3={val;ue}

Braced Values

Used when value contains semicolons or special characters:

PWD={my;pass;word} -> Password is "my;pass;word"

Escape Sequences

  • {{-> { (escaped left brace)
  • }}-> } (escaped right brace)

Example:

PWD={a}}b{{c} -> Password is "a}b{c"

@github-actions
Copy link

github-actions bot commented Oct 29, 2025

📊 Code Coverage Report

🔥 Diff Coverage

97%


🎯 Overall Coverage

77%


📈 Total Lines Covered: 4688 out of 6077
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/init.py (100%)
  • mssql_python/connection.py (100%)
  • mssql_python/connection_string_builder.py (100%)
  • mssql_python/connection_string_parser.py (98.1%): Missing lines 123,214
  • mssql_python/constants.py (90.9%): Missing lines 572-574
  • mssql_python/exceptions.py (100%)

Summary

  • Total: 193 lines
  • Missing: 5 lines
  • Coverage: 97%

mssql_python/connection_string_parser.py

Lines 119-127

  119                 if incomplete_text:
  120                     errors.append(f"Incomplete specification: keyword '{incomplete_text}' has no value (missing '=')")
  121                 # Skip to next semicolon
  122                 while current_pos < str_len and connection_str[current_pos] != ';':
! 123                     current_pos += 1
  124                 continue
  125             
  126             # Extract and normalize the key
  127             key = connection_str[key_start:current_pos].strip().lower()

Lines 210-218

  210         str_len = len(connection_str)
  211         
  212         # Skip leading whitespace before the value
  213         while start_pos < str_len and connection_str[start_pos] in ' \t':
! 214             start_pos += 1
  215         
  216         # If we've consumed the entire string or reached a semicolon, return empty value
  217         if start_pos >= str_len:
  218             return '', start_pos

mssql_python/constants.py

Lines 568-578

  568         try:
  569             from mssql_python.logging_config import get_logger
  570             from mssql_python.helpers import sanitize_user_input
  571             logger = get_logger()
! 572         except ImportError:
! 573             logger = None
! 574             sanitize_user_input = lambda x: str(x)[:50]  # Simple fallback
  575         
  576         filtered = {}
  577 
  578         # The rejected list should ideally be empty when used in the normal connection


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.helpers.py: 66.9%
mssql_python.pybind.ddbc_bindings.cpp: 70.7%
mssql_python.pybind.connection.connection.cpp: 74.4%
mssql_python.pybind.connection.connection_pool.cpp: 78.9%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.cursor.py: 83.4%
mssql_python.auth.py: 87.1%
mssql_python.pooling.py: 87.7%
mssql_python.row.py: 90.9%
mssql_python.exceptions.py: 92.8%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

@saurabh500 saurabh500 changed the title Test PR FEAT: Improved Connection String handling in Python Oct 29, 2025
@github-actions github-actions bot added the pr-size: large Substantial code update label Oct 29, 2025
Copy link
Contributor

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devskim found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Saurabh Singh (SQL Drivers) added 4 commits October 29, 2025 14:16
… to _ConnectionStringAllowList

- Updated references in connection.py, connection_string_allowlist.py, connection_string_parser.py, and related test files to use the new class name.
- Changed method names from `parse` to `_parse` in _ConnectionStringParser for internal consistency.
- Adjusted unit tests to reflect the new class and method names, ensuring all tests pass with the updated structure.
- Improved code clarity and maintainability by adhering to naming conventions for internal classes.
…nd improve error handling for empty values
…ate test cases for parameter normalization
@saurabh500 saurabh500 marked this pull request as ready for review October 30, 2025 23:16
Copilot AI review requested due to automatic review settings October 30, 2025 23:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a comprehensive connection string allow-list feature for mssql-python, adding strict validation and parsing for ODBC connection strings. The implementation includes a three-stage pipeline (parse → filter → build) that validates connection string parameters against an allow-list, normalizes synonyms, and properly handles ODBC escape sequences.

Key Changes:

  • Implemented ODBC-compliant connection string parser with strict validation
  • Added connection string parameter allow-list with synonym normalization
  • Created connection string builder with proper escaping
  • Updated connection construction to use the new validation pipeline

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
mssql_python/exceptions.py Added ConnectionStringParseError exception for validation failures
mssql_python/connection_string_parser.py New ODBC-compliant parser with strict validation and error batching
mssql_python/connection_string_allowlist.py New allow-list manager for parameter validation and normalization
mssql_python/connection_string_builder.py New builder for reconstructing connection strings with proper escaping
mssql_python/connection.py Updated _construct_connection_string to use new validation pipeline
tests/test_010_connection_string_parser.py Comprehensive parser unit tests (448 lines)
tests/test_011_connection_string_allowlist.py Allow-list unit tests (245 lines)
tests/test_012_connection_string_integration.py Integration tests for complete flow (664 lines)
tests/test_003_connection.py Updated tests for new parameter name normalization
tests/test_006_exceptions.py Updated to expect ConnectionStringParseError for invalid strings
eng/pipelines/*.yml Removed Driver parameter from test connection strings
Comments suppressed due to low confidence (1)

mssql_python/connection_string_allowlist.py:1

  • [nitpick] The comment formatting is inconsistent. The comment on line 82 should be on a separate line above the code, matching the style used elsewhere in the file (e.g., lines 35-40).
"""

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

saurabh500 and others added 6 commits November 4, 2025 05:05
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Removed unused 'List' and 'logging' imports from connection_string_parser.py
- Removed unused 'sys', 'pooling', and 'threading' imports from test_003_connection.py
- Removed unused variables and unnecessary pass statements in tests
- Cleaned up test_012_connection_string_integration.py imports

All tests passing: 226 passed, 4 skipped
- Moved _ConnectionStringAllowList class from connection_string_allowlist.py to constants.py
- Updated all imports in connection.py and test files
Copy link
Collaborator

@bewithgaurav bewithgaurav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some restructuring comments which might streamline circular imports etc.
Logic lgtm and test coverage for all scenarios is super nice!

return RESERVED_PARAMETERS


class _ConnectionStringAllowList:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the ConnectionStringAllowList is now a part of constants.py, the below comment is in continuation of the same in lines of re-structuring:
Currently, _ConnectionStringAllowList is implemented as a class with only class methods and static methods, and no instance state. In Python, when a class doesn't need instance state or inheritance, it's generally more idiomatic to use module-level functions and constants instead.
Also, it'll be better to move RESERVED_PARAMETERS in this file to remove the lazy import circular dependency block, making constants.py the ultimate source of truth for all type constants

Current structure:

class _ConnectionStringAllowList:
    ALLOWED_PARAMS = {...}
    
    @classmethod
    def normalize_key(cls, key: str) -> Optional[str]:
        return cls.ALLOWED_PARAMS.get(key_lower)
    
    @staticmethod
    def _normalize_params(params: Dict[str, str], ...):
        # ...

Suggested refactoring:

Consider separating data (constants) from logic (validation functions) for better separation of concerns:

Step 1: Keep only constants inside constants.py (data only):

# mssql_python/constants.py
# Pure data/configuration - no logic

# Internal connection string constants (underscore prefix = internal use only)
_RESERVED_CONNECTION_STRING_PARAMS = ('Driver', 'APP')

_ALLOWED_CONNECTION_STRING_PARAMS = {
    'server': 'Server',
    'address': 'Server',
    'uid': 'UID',
    'pwd': 'PWD',
    # ... etc (same dictionary content, just not inside a class)
}

Step 2: Move validation logic to connection_string_parser.py (where it's used):

# mssql_python/connection_string_parser.py
# All connection string parsing and validation logic together

from mssql_python.constants import (
    _RESERVED_CONNECTION_STRING_PARAMS,
    _ALLOWED_CONNECTION_STRING_PARAMS
)

def _normalize_connection_string_key(key: str) -> Optional[str]:
    """Normalize a parameter key to its canonical form. (Internal use only)"""
    key_lower = key.lower().strip()
    return _ALLOWED_CONNECTION_STRING_PARAMS.get(key_lower)

def _validate_and_normalize_params(params: Dict[str, str], warn_rejected: bool = True) -> Dict[str, str]:
    """Validate and normalize parameters against the allow-list. (Internal use only)"""
    # ... implementation (moved from _ConnectionStringAllowList._normalize_params)
    filtered = {}
    for key, value in params.items():
        normalized_key = _normalize_connection_string_key(key)
        if normalized_key and normalized_key not in _RESERVED_CONNECTION_STRING_PARAMS:
            filtered[normalized_key] = value
    return filtered

class _ConnectionStringParser:
    """Parser for ODBC connection strings."""
    # ... existing parser implementation

why I made this suggestion:

  • all connection string validation lives in one place (connection_string_parser.py)
  • constants.py contains just data, no logic
  • no more lazy import

return ATTRIBUTE_SET_TIMING.get(attribute, AttributeSetTime.AFTER_ONLY)


# Import RESERVED_PARAMETERS from parser module to maintain single source of truth
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested in the other comment, keeping RESERVED_PARAMETERS in this file itself would be a better choice, since parser module is technically not a single source of truth as of now. Whenever lazy imports are needed to avoid circular dependencies, it often indicates that dependents should be co-located.

assert any("Empty value for keyword 'server'" in err for err in errors)
assert any("Empty value for keyword 'pwd'" in err for err in errors)


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checked for OCD purposes, adding these tests should cover up the remaining lines reported in codecoverage comment:

  • "Server= localhost;Database= mydb" (excessive whitespace)
  • "Server=\t\tlocalhost;PWD=\t{pass}"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha. Nice!!

from mssql_python.helpers import sanitize_user_input
logger = get_logger()
except ImportError:
logger = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested in the restructuring comment, this whole method would be moved - and hence we wouldn't be needing logger at all in constants.py - and logger can be used readily in connection_string_parser.py

If not, this might be a problem since we should not set logger as none in case of circular dependencies. We should remove the try: except: block in imports and fix circular dependency imports, if any - by streamlining modules.
defensive block in this case would render the logger of no use, which might be a problem in future - siliently skipping failures.

NotSupportedError,
)

# Connection string parser exceptions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding this new line can you please append in the above list on line 15

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please add the new files and classes that you have added in this folder

conn_str = add_driver_to_connection_str(connection_str)

# Add additional key-value pairs to the connection string
from mssql_python.connection_string_parser import _ConnectionStringParser, RESERVED_PARAMETERS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add all the import statements at the start of the file

Args:
initial_params: Dictionary of initial connection parameters
"""
self._params: Dict[str, str] = initial_params.copy() if initial_params else {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we creating a copy instead of directly assigning the value of initial_params?

parts = []

# Build in specific order: Driver first, then others
if 'Driver' in self._params:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we find 'driver' or 'DRIVER' or 'dRiver', this if statement will become false.
Can we make all keys to smallercase or uppercase before comparing?

Reference: https://learn.microsoft.com/en-us/openspecs/sql_server_protocols/ms-odbcstr/55953f0e-2d30-4ad4-8e56-b4207e491409
"""

def __init__(self, allowlist=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure that we have fully typed methods

# Main parsing loop
while current_pos < str_len:
# Skip leading whitespace and semicolons
while current_pos < str_len and connection_str[current_pos] in ' \t;':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use .strip() to remove leading whitespaces

Copy link
Contributor

@jahnvi480 jahnvi480 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few comments, overall the logic looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: large Substantial code update

Projects

None yet

Development

Successfully merging this pull request may close these issues.

More usable handling of connection string for mssql-python

6 participants