@ghareeb-falazi ghareeb-falazi commented Oct 23, 2025

[Figure: design]

This PR aims to create library instrumentation for Apache Iceberg scan metrics. The figure above summarizes the planned solution (the green classes constitute the instrumentation).

The Apache Iceberg API currently emits two types of metrics: ScanMetrics, which are emitted when a table scan is executed, and CommitMetrics, which are emitted when modifications to the data or metadata are executed, e.g., insert, update, delete, drop column, time travel, etc.
ScanMetrics are straightforward to report using a custom reporter. We can programmatically inject such a reporter into an existing Scan object as described here. In contrast, CommitMetrics are difficult to report using a custom reporter because there is no programmatic interface for injecting one. A configuration-based approach exists for this purpose as described here. This PR focuses on ScanMetrics only and creates library-based instrumentation for them.
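
To make the reporter-injection idea concrete, here is a minimal, self-contained sketch of the pattern. All types in it (`Report`, `Reporter`, `ScanStandIn`, `ScanReportStandIn`, `CommitReportStandIn`, `RecordingReporter`) are hypothetical stand-ins written for this illustration; they are not the Iceberg classes and not the code in this PR. As far as I understand, the real counterparts are the `org.apache.iceberg.metrics.MetricsReporter` interface and the `metricsReporter(...)` method on a scan, but please check the Iceberg documentation for the exact signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ScanReporterSketch {

  /** Hypothetical stand-in for a generic metrics report. */
  public interface Report {}

  /** Hypothetical stand-in for a scan report: a table name plus one counter. */
  public static final class ScanReportStandIn implements Report {
    private final String tableName;
    private final long resultDataFiles;

    public ScanReportStandIn(String tableName, long resultDataFiles) {
      this.tableName = tableName;
      this.resultDataFiles = resultDataFiles;
    }

    public String tableName() { return tableName; }
    public long resultDataFiles() { return resultDataFiles; }
  }

  /** Hypothetical stand-in for a commit report; the reporter below ignores it. */
  public static final class CommitReportStandIn implements Report {}

  /** Hypothetical stand-in for the reporter callback interface. */
  public interface Reporter {
    void report(Report report);
  }

  /** Hypothetical stand-in for a scan: attaching a reporter returns a new scan. */
  public static final class ScanStandIn {
    private final String tableName;
    private final Reporter reporter; // null when no reporter is attached

    public ScanStandIn(String tableName) { this(tableName, null); }

    private ScanStandIn(String tableName, Reporter reporter) {
      this.tableName = tableName;
      this.reporter = reporter;
    }

    public ScanStandIn withReporter(Reporter newReporter) {
      return new ScanStandIn(tableName, newReporter);
    }

    /** Simulates planning the scan and emitting one report to the reporter. */
    public void plan(long matchedDataFiles) {
      if (reporter != null) {
        reporter.report(new ScanReportStandIn(tableName, matchedDataFiles));
      }
    }
  }

  /** Instrumentation side: keeps scan reports and drops everything else. */
  public static final class RecordingReporter implements Reporter {
    private final List<ScanReportStandIn> seen = new ArrayList<>();

    @Override
    public void report(Report report) {
      if (report instanceof ScanReportStandIn) {
        seen.add((ScanReportStandIn) report);
      }
    }

    /** Returns a read-only snapshot of the scan reports received so far. */
    public List<ScanReportStandIn> seen() {
      return Collections.unmodifiableList(new ArrayList<>(seen));
    }
  }

  public static void main(String[] args) {
    RecordingReporter reporter = new RecordingReporter();
    ScanStandIn scan = new ScanStandIn("db.events").withReporter(reporter);
    scan.plan(3);
    System.out.println("scan reports received: " + reporter.seen().size());
  }
}
```

In an actual instrumentation, the reporter would presumably translate each scan report into telemetry instead of storing it in a list; the list is used here only to keep the sketch easy to verify.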

CommitMetrics and agent-based instrumentation will be the subject of future PRs. Therefore, this PR only partially resolves #15113.

linux-foundation-easycla bot commented Oct 23, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ghareeb-falazi / name: Ghareeb Falazi (93d9380)

@ghareeb-falazi ghareeb-falazi marked this pull request as ready for review November 4, 2025 13:57
@ghareeb-falazi ghareeb-falazi requested a review from a team as a code owner November 4, 2025 13:57
Comment on lines +7 to +11
implementation("org.apache.iceberg:iceberg-core:1.8.1") {
  artifact {
    classifier = "tests"
  }
}
Member

why is this second implementation needed?

Author

In the abstract unit test class AbstractIcebergTest, I use the public classes TestTables and TestTable, which are defined in the src/test/java directory of the Iceberg core project. The second implementation pulls in the "tests" artifact so that these classes are available to the unit tests in the testing project.
As a side note: I am not a Gradle expert, so a better approach may well exist.

ghareeb-falazi and others added 2 commits November 4, 2025 16:18