-
Notifications
You must be signed in to change notification settings - Fork 9
Add Python import tracking to code graph #97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedBot user detected. To trigger a single review, invoke the You can disable this status message by setting the WalkthroughA framework for import tracking is introduced with abstract methods in the base analyzer, concrete implementations in language-specific analyzers, integration into the multi-pass analysis workflow, and data model extensions to track import statements and their resolved entities in files. Changes
Sequence Diagram(s)sequenceDiagram
participant sa as SourceAnalyzer
participant aa as Analyzer<br/>(Python)
participant file as File
participant graph as Graph
Note over sa,graph: First Pass
sa->>aa: first_pass(file)
aa->>aa: extract_entity_symbols()
aa->>graph: add File node
aa->>aa: add_file_imports(file)
aa->>file: add_import(import_node)
Note over sa,graph: Second Pass
sa->>aa: second_pass(file)
aa->>aa: resolve_entity_symbols()
aa->>aa: resolve_import(files,lsp,...)
aa->>graph: returns list[Entity]
aa->>file: add_resolved_import(entity)
aa->>graph: add IMPORTS edge<br/>(file → entity)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Comment |
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
api/analyzers/analyzer.py(1 hunks)api/analyzers/java/analyzer.py(1 hunks)api/analyzers/python/analyzer.py(1 hunks)api/analyzers/source_analyzer.py(3 hunks)api/entities/file.py(2 hunks)tests/source_files/py_imports/module_a.py(1 hunks)tests/source_files/py_imports/module_b.py(1 hunks)tests/test_py_imports.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.2)
api/analyzers/java/analyzer.py
136-136: File may be undefined, or defined from star imports
(F405)
144-144: Unused method argument: files
(ARG002)
144-144: File may be undefined, or defined from star imports
(F405)
144-144: Unused method argument: lsp
(ARG002)
144-144: Unused method argument: file_path
(ARG002)
144-144: Unused method argument: path
(ARG002)
144-144: Unused method argument: import_node
(ARG002)
144-144: Entity may be undefined, or defined from star imports
(F405)
api/analyzers/python/analyzer.py
125-125: File may be undefined, or defined from star imports
(F405)
156-156: Do not catch blind exception: Exception
(BLE001)
159-159: File may be undefined, or defined from star imports
(F405)
159-159: Entity may be undefined, or defined from star imports
(F405)
212-212: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (10)
tests/source_files/py_imports/module_a.py (1)
1-12: LGTM! Well-structured test fixture.The test module is clean and well-documented, providing appropriate entities (ClassA and function_a) for validating import tracking functionality.
api/analyzers/source_analyzer.py (2)
115-118: LGTM! Import extraction properly integrated into first pass.The import extraction is correctly placed after hierarchy creation and properly guards against dependency files.
148-172: LGTM! Import resolution well-integrated into second pass.The import resolution logic is properly sequenced after entity symbol resolution and correctly creates IMPORTS edges in the graph.
tests/source_files/py_imports/module_b.py (1)
1-11: LGTM! Comprehensive test fixture.This module effectively exercises multiple relationships (IMPORTS and EXTENDS) and validates both class and function imports, providing good coverage for the import tracking feature.
api/analyzers/analyzer.py (1)
146-173: LGTM! Clear abstract interface for import tracking.The new abstract methods are well-documented with clear contracts, establishing a consistent pattern for language-specific import handling.
api/analyzers/python/analyzer.py (2)
125-157: LGTM! Import extraction logic is sound.The implementation correctly uses tree-sitter queries to capture both import types and properly handles errors with debug logging.
186-210: Verify that tree-sitter produces only simple identifier nodes after theimportkeyword in from-import statements.The extraction of
child.children[0]at lines 198 and 206 assumes dotted_name nodes may have multiple children requiring extraction of just the first component. However, in valid Python syntax, the names imported infrom X import Ystatements are always simple identifiers—qualified names likefrom X import Y.Zare syntax errors. Tree-sitter should parse imported names as singleidentifiernodes rather thandotted_namenodes after theimportkeyword.The current test cases (simple imports like
from module_a import ClassA) pass, but there's no verification that this logic handles edge cases or malformed input correctly. Confirm that:
- Tree-sitter produces only
identifiernodes (notdotted_name) for imported names in from-import statements- The
children[0]extraction doesn't inadvertently mask resolution issues if tree-sitter's grammar differs from expectationsapi/entities/file.py (1)
32-48: LGTM! Import tracking methods are well-implemented.The
add_importandadd_resolved_importmethods provide a clean API for managing import tracking, with clear documentation.api/analyzers/java/analyzer.py (1)
136-150: LGTM! Appropriate stub implementation for future work.The stub methods clearly document that Java import tracking is not yet implemented, which is appropriate for this Python-focused PR. The TODOs provide clear guidance for future implementation.
tests/test_py_imports.py (1)
9-63: Test lacks coverage for alternative Python import patterns that the analyzer already supports.The test only exercises
from module import entityimports (module_b.py). However, the PythonAnalyzer explicitly supports three patterns (add_file_imports docstring, lines 129-135):
import moduleimport module as aliasfrom module import namevariantsThe test source files don't include the first two patterns, creating a gap. Either add test cases with these patterns to module_a.py/module_b.py, or document this as intentional MVP scope.
| if import_node.type == 'import_statement': | ||
| # Handle "import module" or "import module as alias" | ||
| # Find all dotted_name and aliased_import nodes | ||
| for child in import_node.children: | ||
| if child.type == 'dotted_name': | ||
| # Try to resolve the module/name | ||
| identifier = child.children[0] if child.child_count > 0 else child | ||
| resolved = self.resolve_type(files, lsp, file_path, path, identifier) | ||
| res.extend(resolved) | ||
| elif child.type == 'aliased_import': | ||
| # Get the actual name from aliased import (before 'as') | ||
| if child.child_count > 0: | ||
| actual_name = child.children[0] | ||
| if actual_name.type == 'dotted_name' and actual_name.child_count > 0: | ||
| identifier = actual_name.children[0] | ||
| else: | ||
| identifier = actual_name | ||
| resolved = self.resolve_type(files, lsp, file_path, path, identifier) | ||
| res.extend(resolved) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify handling of dotted module imports and bare import statements.
The implementation may not correctly handle certain import patterns:
-
Dotted imports: For
import a.b.c, line 172 extractschild.children[0]which only resolvesa, not the full module patha.b.c. -
Bare module imports: For
import module(notfrom module import ...), the code attempts to resolve the module itself as a type/entity. This differs semantically fromfrom module import entity, where a specific entity is imported. Consider whether bare module imports should create IMPORTS edges or be handled differently.
🏁 Script executed:
#!/bin/bash
# Check for test cases or usage of dotted imports and bare import statements
rg -n "^import [a-zA-Z_][a-zA-Z0-9_]*\.[a-zA-Z0-9_.]+" tests/ --type py -C 2
rg -n "^import [a-zA-Z_][a-zA-Z0-9_]*$" tests/ --type py -C 2Length of output: 2908
I need to examine the actual implementation to verify whether the concerns about dotted imports and bare imports are valid.
🏁 Script executed:
#!/bin/bash
# Examine the resolve_type method and surrounding context
rg -n "def resolve_type" api/analyzers/python/analyzer.py -A 20Length of output: 1221
🏁 Script executed:
#!/bin/bash
# Check how dotted_name is handled elsewhere in the file
rg -n "dotted_name" api/analyzers/python/analyzer.py -B 2 -A 2Length of output: 1529
🏁 Script executed:
#!/bin/bash
# Examine the full import handling section
cat api/analyzers/python/analyzer.py | sed -n '160,200p'Length of output: 2271
🏁 Script executed:
#!/bin/bash
# Check if there's any handling of multi-part module names (a.b.c pattern)
rg -n "dotted" api/analyzers/python/analyzer.pyLength of output: 499
🏁 Script executed:
#!/bin/bash
# Look for test cases that might exercise dotted imports or the resolve logic
rg -n "resolve_type\|dotted_name\|aliased_import" tests/test_py_analyzer.py -B 2 -A 2Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check structure of tree-sitter Python AST for dotted names
python3 << 'EOF'
try:
from tree_sitter import Language, Parser
import tree_sitter_python as tspython
# Create parser
PY_LANGUAGE = Language(tspython.language(), "python")
parser = Parser()
parser.set_language(PY_LANGUAGE)
# Parse sample dotted import
code = b"import a.b.c"
tree = parser.parse(code)
def print_tree(node, indent=0):
print(" " * indent + f"{node.type}: {repr(node.text)}")
for child in node.children:
print_tree(child, indent + 1)
print("=== import a.b.c ===")
print_tree(tree.root_node)
# Parse bare import
code2 = b"import os"
tree2 = parser.parse(code2)
print("\n=== import os ===")
print_tree(tree2.root_node)
except Exception as e:
print(f"Error: {e}")
import traceback
traceback.print_exc()
EOFLength of output: 160
🏁 Script executed:
#!/bin/bash
# Check if resolve_type is called elsewhere and how results are used
rg -n "resolve_type\|\.extend\(resolved\)" api/analyzers/python/analyzer.py -B 1 -A 1Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check how the results (res) are used after being extended
rg -n "def get_imports\|return res" api/analyzers/python/analyzer.py -A 3 -B 3Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look for actual usage of import edges or IMPORTS in the codebase
rg -n "IMPORTS\|ImportEdge" api/analyzers/ --type py | head -20Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check what resolve_type returns
rg -n "def resolve_type" api/analyzers/ --type py -A 30 | grep -A 30 "analyzer.py"Length of output: 8301
🏁 Script executed:
#!/bin/bash
# Check if there are actual dotted imports in the codebase (not just tests)
find api -name "*.py" -type f -exec grep -l "^import [a-zA-Z_][a-zA-Z0-9_]*\.[a-zA-Z0-9_]" {} \;Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look at what specific nodes we're dealing with
rg -n "import_statement\|aliased_import" api/analyzers/python/analyzer.py -B 1 -A 1Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if child_count and children indexing are correct
cat api/analyzers/python/analyzer.py | sed -n '160,195p'Length of output: 1905
🏁 Script executed:
#!/bin/bash
# Understand the import handling more thoroughly - get larger context
cat api/analyzers/python/analyzer.py | sed -n '150,210p'Length of output: 3457
🏁 Script executed:
#!/bin/bash
# Check for any node processing utilities that handle dotted names
rg -n "def.*dotted\|def.*name\|children\[0\]" api/analyzers/python/analyzer.py | head -15Length of output: 0
🏁 Script executed:
#!/bin/bash
# Find and examine how nodes are extracted for identifiers
ast-grep --pattern 'identifier = $_ if $_ else $_'Length of output: 255
🏁 Script executed:
#!/bin/bash
# Look at the exact implementation more carefully
sed -n '166,184p' api/analyzers/python/analyzer.pyLength of output: 1229
🏁 Script executed:
#!/bin/bash
# Check how the full dotted path should be reconstructed
rg -n "join\|\.split\|text\|string" api/analyzers/python/analyzer.py | grep -i dot | head -10Length of output: 0
🏁 Script executed:
#!/bin/bash
# Get node.text or equivalent for dotted_name nodes
python3 << 'EOF'
# Check if we can get full text from tree-sitter node
import inspect
try:
from tree_sitter import Node
if hasattr(Node, 'text'):
print("Node has 'text' attribute")
print("Node methods/attributes:", [m for m in dir(Node) if not m.startswith('_')][:20])
except Exception as e:
print(f"Error: {e}")
EOFLength of output: 36
🏁 Script executed:
#!/bin/bash
# Let's check the actual file structure and logic once more
cat api/analyzers/python/analyzer.py | sed -n '155,200p'Length of output: 2573
🏁 Script executed:
#!/bin/bash
# Check if there's a method to get the full text from a dotted_name node
rg -n "\.text\|get_text" api/analyzers/python/analyzer.py | head -10Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look at the identifier resolution and what resolve_type expects
rg -n "def resolve_type\|identifier" api/analyzers/python/analyzer.py | head -20Length of output: 0
🏁 Script executed:
#!/bin/bash
# Search entire codebase for resolve_type implementation
rg -n "def resolve_type" api/ --type py -A 25Length of output: 7132
🏁 Script executed:
#!/bin/bash
# Check what happens with bare imports - understand the semantic concern
rg -n "import_statement\|from.*import" tests/test_py_analyzer.py -B 1 -A 3 | head -30Length of output: 0
Need to check where resolve_import is called and whether this code is actually used:Now let me search for where resolve_import is actually being called:
🏁 Script executed:
#!/bin/bash
# Find all calls to resolve_import
rg -n "resolve_import" api/ --type pyLength of output: 719
🏁 Script executed:
#!/bin/bash
# Check the add_imports method to see if resolve_import is called
rg -n "def add_imports\|resolve_import" api/analyzers/python/analyzer.py -B 2 -A 10Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check imports section context more carefully - find what calls resolve
rg -n "def.*import\|resolve" api/analyzers/python/analyzer.py | head -25Length of output: 0
🏁 Script executed:
#!/bin/bash
# Find where resolve_import method is referenced/called
rg -n "\.resolve_import" api/ --type pyLength of output: 185
🏁 Script executed:
#!/bin/bash
# Check context around the resolve_import call
sed -n '160,180p' api/analyzers/source_analyzer.pyLength of output: 1270
🏁 Script executed:
#!/bin/bash
# See broader context of how imports are resolved
sed -n '150,190p' api/analyzers/source_analyzer.pyLength of output: 2358
🏁 Script executed:
#!/bin/bash
# Check what resolved_entities is used for
rg -n "resolved_entities" api/analyzers/source_analyzer.py -B 2 -A 5Length of output: 618
🏁 Script executed:
#!/bin/bash
# Understand the import node structure - what does add_import do with import nodes
rg -n "def add_import\|import_node" api/models/file.py -B 2 -A 5Length of output: 58
🏁 Script executed:
#!/bin/bash
# See how imports are stored/used in File class
rg -n "class File\|add_import\|imports" api/models/file.py | head -20Length of output: 58
🏁 Script executed:
#!/bin/bash
# Look at tree-sitter Python grammar to understand dotted_name structure
cat api/analyzers/python/analyzer.py | sed -n '1,50p'Length of output: 2406
🏁 Script executed:
#!/bin/bash
# Check if there's any reconstruction of full dotted paths
rg -n "get_text\|\.text\|join" api/analyzers/python/analyzer.py | head -15Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look for how other analyzers handle dotted imports
sed -n '144,200p' api/analyzers/java/analyzer.pyLength of output: 344
🏁 Script executed:
#!/bin/bash
# Check the base analyzer to see if there's a pattern for handling this
sed -n '158,220p' api/analyzers/analyzer.pyLength of output: 628
Fix dotted import resolution to capture full module paths, not just first segment.
The code at line 172 (and 180, 198, 206) uses child.children[0] to extract identifiers from dotted_name nodes. This only captures the first segment. For import a.b.c, this extracts only a instead of the full path a.b.c.
# Current behavior extracts only first child:
identifier = child.children[0] if child.child_count > 0 else child
# Should use node.text to capture full dotted path:
identifier = child # use the entire dotted_name node, or reconstruct from text
Bare imports may not resolve as intended. Since resolve_type searches for a class_definition parent, bare module imports (e.g., import os) will likely return empty results. Verify whether this is intentional (only track entity imports, not module imports) or if module-level resolution needs different handling.
Add test coverage for dotted and bare import patterns to prevent regression, since no test cases currently exercise resolve_import logic.
🤖 Prompt for AI Agents
In api/analyzers/python/analyzer.py around lines 166 to 184, change the
identifier extraction for dotted_name nodes so you pass the full dotted path to
resolve_type instead of only the first segment: use the entire dotted_name node
(or reconstruct the full path from its children text segments joined by '.') for
both plain dotted_name and aliased_import branches before calling resolve_type;
ensure resolve_type accepts/handles a dotted_name node or a reconstructed
string, add handling/clarity for bare module imports if you intend to resolve
modules (or leave behavior documented if only entity imports are desired), and
add unit tests covering multi-segment imports like "import a.b.c" and simple
imports like "import os" to prevent regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
api/entities/file.py
Outdated
| self.imports: list[Node] = [] | ||
| self.resolved_imports: set[Self] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect type annotation for resolved_imports.
Line 26 declares resolved_imports: set[Self], but this should be set[Entity]. The resolved imports are Entity objects (classes, functions, etc.), not File objects. This is confirmed by the usage in source_analyzer.py line 171, where resolved_entity (from resolve_import returning list[Entity]) is added to resolved_imports.
Apply this diff to fix the type annotation:
self.entities: dict[Node, Entity] = {}
self.imports: list[Node] = []
- self.resolved_imports: set[Self] = set()
+ self.resolved_imports: set[Entity] = set()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| self.imports: list[Node] = [] | |
| self.resolved_imports: set[Self] = set() | |
| self.imports: list[Node] = [] | |
| self.resolved_imports: set[Entity] = set() |
🤖 Prompt for AI Agents
In api/entities/file.py around lines 25 to 26, the type annotation for
resolved_imports is incorrect (set[Self]) — change it to set[Entity] so it
reflects that resolved imports are Entity objects; update or add the necessary
import or forward reference for Entity in this module if not already present,
and run a type-check to ensure no remaining references expect Self.
|
@copilot |
…tity] Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Fixed in commit 109c94a. Changed |
Add Python import tracking to code graph
Plan:
Implementation Summary
Recent Fix:
resolved_importstype annotation fromset[Self]toset[Entity]add_resolved_import()parameter type fromSelftoEntitytyping.SelfimportCore Changes:
Extended File Entity (
api/entities/file.py)importslist to track import statement nodesresolved_imports: set[Entity]to store resolved entities (fixed type)add_import(),add_resolved_import()Extended Abstract Analyzer (
api/analyzers/analyzer.py)add_file_imports(file)abstract methodresolve_import(...)abstract methodImplemented Python Import Support (
api/analyzers/python/analyzer.py)Integrated into Analysis Pipeline (
api/analyzers/source_analyzer.py)Added Tests
module_a.py,module_b.pytest_py_imports.pyNew Graph Relationship: IMPORTS
File(module_b.py) -[IMPORTS]-> Class(ClassA)Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.
Summary by CodeRabbit
New Features
Tests