Skip to content

Conversation

@BewareMyPower
Copy link
Contributor

Fixes #513

Motivation

When a ClientImpl object is destructed, the LookupService object pointed by the lookupServicePtr_ field will be destructed. However, if there is a pending lookup operation in RetryableOperation, it could access the destructed RetryableLookupService::lookupService_ field, which is null.

Modifications

  • Call RetryableLookupService::close in ClientImpl::shutdown. Improve RetryableOperationCache::close to fail any new operation with ResultAlreadyClosed after close is called.
  • Add ResultAlreadyClosed to the fatal result list so that RetryableOperation won't retry for this error code.
  • Add testAfterClientShutdown to verify the pending lookup operation will fail with ResultDisconnected after shutdown is called because the timer is cancelled.
  • Add testRetryAfterDestroyed to verify any further operation will fail with ResultAlreadyClosed after RetryableLookupService::close.

@BewareMyPower BewareMyPower self-assigned this Nov 5, 2025
@BewareMyPower BewareMyPower added the bug Something isn't working label Nov 5, 2025
@BewareMyPower BewareMyPower added this to the 3.8.0 milestone Nov 5, 2025
@BewareMyPower BewareMyPower requested a review from Copilot November 6, 2025 02:34
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the RetryableOperationCache to properly handle operations after the cache has been closed by introducing a closed_ flag and returning ResultAlreadyClosed for new operations. The method clear() is renamed to close() for consistency with other components.

Key changes:

  • Renamed clear() to close() in RetryableOperationCache and added a closed_ flag
  • Added logic to return ResultAlreadyClosed when run() is called after close()
  • Updated ResultUtils.h to treat ResultAlreadyClosed as a non-retryable (fatal) error
  • Added test coverage for the race condition where operations are attempted after client shutdown

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
lib/RetryableOperationCache.h Renamed clear() to close(), added closed_ flag, and early return with ResultAlreadyClosed when operations are attempted after close
lib/RetryableLookupService.h Updated calls from clear() to close(), applied std::move optimization to constructor parameter
lib/ClientImpl.h Added LookupServiceFactory type alias and test-only constructor that accepts a custom lookup service factory
lib/ClientImpl.cc Refactored lookup service creation to use factory pattern, ensuring lookupServicePtr_->close() is called in shutdown()
lib/ResultUtils.h Added ResultAlreadyClosed to the set of fatal (non-retryable) results
tests/RetryableOperationCacheTest.cc Updated test to use close() instead of clear()
tests/LookupServiceTest.cc Added tests for race conditions when operations occur after client shutdown
Comments suppressed due to low confidence (1)

tests/RetryableOperationCacheTest.cc:121

  • The test name testClear should be updated to testClose to match the renamed method being tested.
TEST_F(RetryableOperationCacheTest, testClear) {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@BewareMyPower BewareMyPower requested a review from Copilot November 6, 2025 02:53
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

tests/RetryableOperationCacheTest.cc:121

  • The test name testClear should be updated to testClose to match the renamed method being tested.
TEST_F(RetryableOperationCacheTest, testClear) {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@BewareMyPower BewareMyPower merged commit a03eb92 into apache:main Nov 6, 2025
14 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/get-partition-metadata-segfault branch November 6, 2025 03:49
BewareMyPower added a commit that referenced this pull request Nov 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] testNegativeAckPrecisionBitCnt could cause segmentation fault

2 participants