diff --git a/docs/evidence-transformation.md b/docs/evidence-transformation.md new file mode 100644 index 000000000..d63bc7d53 --- /dev/null +++ b/docs/evidence-transformation.md @@ -0,0 +1,510 @@ +# Evidence Transformation Pipeline Documentation + +## Overview + +This document explains how Chainloop transforms non-JSON evidence types (JUnit and Jacoco) before injecting them into the policy engine for evaluation. This addresses [Issue #2183](https://github.com/chainloop-dev/chainloop/issues/2183). + +## Supported Evidence Types + +Chainloop currently supports transformation for the following non-JSON evidence types: + +1. **JUnit XML** - Test results +2. **Jacoco XML** - Code coverage reports +3. **Custom Evidence** - JSON-based custom evidence with metadata + +## Transformation Pipeline Architecture + +``` +┌─────────────────┐ +│ CI/CD Pipeline │ +│ (Input File) │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ Material Type │ +│ Detection │ (Based on schema contract) +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ Parsing & │ +│ Validation │ (Format-specific parser) +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ CAS Upload │ +│ (Store file) │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ Metadata │ +│ Enrichment │ (Add chainloop_metadata) +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ XML → JSON │ +│ Transformation │ (GetEvaluableContent) +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ Policy Engine │ +│ (Rego Policies) │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ Violations │ +│ Collection │ +└─────────────────┘ +``` + +## JUnit XML Transformation + +### Input Format +JUnit XML files containing test suite results. 
Supported formats: +- Raw XML file (`junit.xml`) +- ZIP archive containing XML files +- TAR.GZ archive containing XML files + +### Key Files +- **Crafter**: `/pkg/attestation/crafter/materials/junit_xml.go` +- **Parser**: `/pkg/attestation/crafter/materials/junit/junit.go` +- **Parser Library**: `github.com/joshdk/go-junit` + +### Transformation Process + +#### 1. Upload Phase +```go +// File is validated and uploaded to Content Addressable Storage (CAS) +// Supports: .xml, .zip, .tar.gz formats +``` + +#### 2. Parsing Phase +```go +// XML is parsed into structured test suite objects +// Using github.com/joshdk/go-junit parser +type TestSuite struct { + Name string + Tests int + Failures int + Errors int + Skipped int + Time float64 + TestCases []TestCase +} +``` + +#### 3. Transformation to JSON +Located in: `/pkg/attestation/crafter/api/attestation/v1/crafting_state.go` + +**Key Function**: `GetEvaluableContent()` + +```go +// JUnit test suites are marshaled to JSON array format +func (m *CraftingState_Material) GetEvaluableContent() ([]byte, error) { + // For JUnit materials: + // 1. Retrieve parsed test suites from material + // 2. Marshal test suites array to JSON + // 3. Return JSON bytes for policy evaluation +} +``` + +#### 4. 
JSON Structure for Policy Engine +```json +[ + { + "name": "TestSuiteName", + "tests": 10, + "failures": 1, + "errors": 0, + "skipped": 2, + "time": 1.234, + "testCases": [ + { + "name": "TestCaseName", + "classname": "com.example.TestClass", + "time": 0.123, + "status": "failed", + "error": { + "message": "Expected true but was false", + "type": "AssertionError" + } + } + ] + } +] +``` + +### Policy Evaluation +Because the evaluable content is a JSON array of test suites, Rego policies iterate over `input` to reach the test data: +- `input[_].testCases` - Array of test cases in a suite +- `input[_].failures` - Number of failed tests in a suite +- `input[_].errors` - Number of test errors in a suite +- Individual test case details for granular rules + +## Jacoco XML Transformation + +### Input Format +Jacoco XML coverage reports generated by the Jacoco code coverage tool. + +### Key Files +- **Crafter**: `/pkg/attestation/crafter/materials/jacoco.go` +- **Structures**: `/pkg/attestation/crafter/materials/jacoco/jacoco.go` + +### Transformation Process + +#### 1. Validation Phase +```go +// Jacoco XML must contain INSTRUCTION counter type +// This is validated during the Validate() phase +type Report struct { + Name string + Package []Package + Counter []Counter +} + +// Mandatory: Must have Counter with Type="INSTRUCTION" +``` + +#### 2. Coverage Metrics Tracked +```xml + + + + + + +``` + +#### 3. Transformation to JSON +Located in: `/pkg/attestation/crafter/api/attestation/v1/crafting_state.go` + +**Key Function**: `GetEvaluableContent()` + +```go +// Jacoco XML is unmarshaled to Report struct, then marshaled to JSON +func (m *CraftingState_Material) GetEvaluableContent() ([]byte, error) { + // For Jacoco materials: + // 1. Unmarshal XML to jacoco.Report struct + // 2. Marshal Report struct to JSON + // 3. Return JSON bytes for policy evaluation +} +``` + +#### 4.
JSON Structure for Policy Engine +```json +{ + "name": "ProjectName", + "package": [ + { + "name": "com/example/package", + "class": [ + { + "name": "com/example/package/ClassName", + "sourcefilename": "ClassName.java", + "counter": [ + { + "type": "INSTRUCTION", + "missed": 10, + "covered": 90 + }, + { + "type": "BRANCH", + "missed": 2, + "covered": 18 + }, + { + "type": "LINE", + "missed": 5, + "covered": 45 + } + ] + } + ], + "counter": [...] + } + ], + "counter": [ + { + "type": "INSTRUCTION", + "missed": 100, + "covered": 500 + } + ] +} +``` + +### Policy Evaluation +Rego policies can calculate coverage metrics: +- Instruction coverage: `covered / (missed + covered) * 100` +- Branch coverage +- Line coverage +- Enforce minimum coverage thresholds +- Check specific package/class coverage + +## Metadata Enrichment + +All evidence materials are enriched with Chainloop metadata before transformation: + +```go +type ChainloopMetadata struct { + // In-toto subject descriptor + Subject *intoto.Statement_Subject + + // Annotations from material schema + Annotations map[string]string + + // Material type information + MaterialType string +} +``` + +### In-toto Integration +Chainloop creates in-toto subject descriptors for all materials: + +```json +{ + "name": "junit-results.xml", + "digest": { + "sha256": "abc123..." + } +} +``` + +## Policy Engine Integration + +### File: `/pkg/policies/policies.go` + +The policy verification process: + +1. **Material Retrieval**: Get all materials from attestation +2. **Transformation**: Call `GetEvaluableContent()` for each material +3. **Policy Evaluation**: Execute Rego policies against JSON content +4. 
**Violation Collection**: Gather all policy violations + +```go +// Simplified policy evaluation flow (pseudocode: parameter and return types are omitted) +func Verify(attestation, policies) (violations, error) { + for _, material := range attestation.Materials { + // Transform material to evaluable JSON; stop if the transformation fails + jsonContent, err := material.GetEvaluableContent() + if err != nil { + return nil, err + } + + // Evaluate all policies + for _, policy := range policies { + results := evaluateRego(policy, jsonContent) + violations = append(violations, results.Violations...) + } + } + return violations, nil +} +``` + +## Rego Policy Examples + +### JUnit Policy Example +```rego +package junit + +# The input is an array of test suites, so each rule iterates over input[_] + +# Deny if any test suite reports failures +deny[msg] { + suite := input[_] + suite.failures > 0 + msg := sprintf("Test suite '%s' has %d failures", [suite.name, suite.failures]) +} + +# Deny if a suite's error rate exceeds the threshold +deny[msg] { + suite := input[_] + suite.tests > 0 + failed_tests := suite.failures + suite.errors + error_rate := (failed_tests / suite.tests) * 100 + error_rate > 5 + msg := sprintf("Error rate %.2f%% exceeds 5%% threshold", [error_rate]) +} + +# Warn about skipped tests +warn[msg] { + test := input[_].testCases[_] + test.status == "skipped" + msg := sprintf("Test '%s' was skipped", [test.name]) +} +``` + +### Jacoco Policy Example +```rego +package coverage + +# Deny if instruction coverage below threshold +deny[msg] { + counter := input.counter[_] + counter.type == "INSTRUCTION" + + coverage := (counter.covered / (counter.missed + counter.covered)) * 100 + coverage < 80 + + msg := sprintf("Instruction coverage %.2f%% is below 80%% threshold", [coverage]) +} + +# Deny if branch coverage below threshold +deny[msg] { + counter := input.counter[_] + counter.type == "BRANCH" + + coverage := (counter.covered / (counter.missed + counter.covered)) * 100 + coverage < 70 + + msg := sprintf("Branch coverage %.2f%% is below 70%% threshold", [coverage]) +} + +# Check specific package coverage +# ("package" is a Rego keyword, so the field is read with bracket notation) +deny[msg] { + pkg := input["package"][_] + pkg.name == "com/example/critical" + + counter := 
pkg.counter[_] + counter.type == "INSTRUCTION" + + coverage := (counter.covered / (counter.missed + counter.covered)) * 100 + coverage < 95 + + msg := sprintf("Critical package coverage %.2f%% below 95%%", [coverage]) +} +``` + +## Adding New Evidence Types + +To add support for a new non-JSON evidence type: + +### 1. Create Material Crafter +```go +// /pkg/attestation/crafter/materials/newtype.go +type NewTypeMaterial struct { + *MaterialCommon +} + +func (m *NewTypeMaterial) Validate() error { + // Validate file format +} + +func (m *NewTypeMaterial) Upload() error { + // Upload to CAS +} +``` + +### 2. Implement Parser +```go +// /pkg/attestation/crafter/materials/newtype/parser.go +type ParsedData struct { + // Define structure +} + +func Parse(xmlContent []byte) (*ParsedData, error) { + // Parse XML/other format +} +``` + +### 3. Add Transformation +```go +// In crafting_state.go +func (m *CraftingState_Material) GetEvaluableContent() ([]byte, error) { + switch m.MaterialType { + case MaterialType_NEWTYPE: + // 1. Parse the file; Parse returns (*ParsedData, error), so check the error + parsed, err := newtype.Parse(m.Content) + if err != nil { + return nil, err + } + + // 2. Marshal to JSON and return the bytes for policy evaluation + return json.Marshal(parsed) + } + // Existing material types are handled in their own cases +} +``` + +### 4. Define Schema Contract +```protobuf +// In crafting_schema.proto +enum CraftingSchema_Material_MaterialType { + NEWTYPE = N; +} +``` + +### 5.
Write Tests +```go +func TestNewTypeMaterial(t *testing.T) { + // Test parsing + // Test transformation + // Test policy evaluation +} +``` + +## Key Implementation Files + +### Core Transformation +- `/pkg/attestation/crafter/api/attestation/v1/crafting_state.go` - Main transformation logic +- `/pkg/attestation/crafter/materials/materials.go` - Material factory +- `/pkg/attestation/crafter/crafter.go` - Orchestrator + +### Evidence Type Handlers +- `/pkg/attestation/crafter/materials/junit_xml.go` - JUnit handler +- `/pkg/attestation/crafter/materials/junit/junit.go` - JUnit parser +- `/pkg/attestation/crafter/materials/jacoco.go` - Jacoco handler +- `/pkg/attestation/crafter/materials/jacoco/jacoco.go` - Jacoco structures +- `/pkg/attestation/crafter/materials/evidence.go` - Custom evidence handler + +### Policy Engine +- `/pkg/policies/policies.go` - Policy verification orchestrator +- `/pkg/policies/engine/rego/rego.go` - Rego policy execution + +### Schema Definitions +- `/app/controlplane/api/workflowcontract/v1/crafting_schema.proto` - Material type definitions + +### Test Data +- `/pkg/attestation/crafter/materials/testdata/junit.xml` +- `/pkg/attestation/crafter/materials/testdata/jacoco.xml` +- `/pkg/attestation/crafter/materials/testdata/evidence-*.json` + +## Performance Considerations + +1. **CAS Upload**: Files are uploaded once and referenced by digest +2. **Lazy Transformation**: JSON transformation happens only when policies are evaluated +3. **Cached Results**: Transformed content can be cached per attestation +4. **Streaming**: Large files are streamed during upload to avoid memory issues + +## Error Handling + +The transformation pipeline includes comprehensive error handling: + +1. **Parse Errors**: Invalid XML/format returns descriptive error +2. **Validation Errors**: Missing required fields fail validation phase +3. **Transform Errors**: JSON marshaling errors are caught and reported +4. 
**Policy Errors**: Rego evaluation errors don't crash the pipeline + +## Security Considerations + +1. **Content Addressing**: Files are stored by cryptographic hash (SHA-256) +2. **Immutability**: Once uploaded, evidence cannot be modified +3. **Attestation Signing**: Transformations are part of signed attestations +4. **Policy Isolation**: Each policy evaluation runs in isolated environment + +## References + +- [Issue #2183](https://github.com/chainloop-dev/chainloop/issues/2183) +- [JUnit XML Format](https://llg.cubic.org/docs/junit/) +- [Jacoco XML Report](https://www.jacoco.org/jacoco/trunk/doc/xml-report.html) +- [Open Policy Agent (Rego)](https://www.openpolicyagent.org/docs/latest/policy-language/) +- [in-toto Specification](https://in-toto.io/) + +## Contributing + +To improve this documentation or add support for new evidence types, please submit a PR to the Chainloop repository. + +--- + +**Document Version**: 1.0 +**Last Updated**: 2025-11-03 +**Created for**: ENG-177
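The Jacoco transformation described in this document (unmarshal the XML report, require an `INSTRUCTION` counter, marshal the result to JSON) can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the Chainloop implementation: the `Report` and `Counter` types, their struct tags, and the `jacocoToJSON` helper are hypothetical stand-ins for the structures in `jacoco/jacoco.go`, and only top-level counters are modeled.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"errors"
	"fmt"
)

// Counter is a hypothetical mirror of a Jacoco <counter> element.
type Counter struct {
	Type    string `xml:"type,attr" json:"type"`
	Missed  int    `xml:"missed,attr" json:"missed"`
	Covered int    `xml:"covered,attr" json:"covered"`
}

// Report is a trimmed-down, hypothetical stand-in for jacoco.Report.
// Only the report name and its top-level counters are kept.
type Report struct {
	Name     string    `xml:"name,attr" json:"name"`
	Counters []Counter `xml:"counter" json:"counter"`
}

// errNoInstruction signals that the mandatory counter type is missing.
var errNoInstruction = errors.New("jacoco report has no INSTRUCTION counter")

// jacocoToJSON parses a Jacoco XML report, checks that a top-level
// INSTRUCTION counter exists, and returns the report as JSON bytes
// suitable for handing to a policy engine.
func jacocoToJSON(raw []byte) ([]byte, error) {
	var report Report
	if err := xml.Unmarshal(raw, &report); err != nil {
		return nil, fmt.Errorf("parsing jacoco XML: %w", err)
	}
	for _, c := range report.Counters {
		if c.Type == "INSTRUCTION" {
			return json.Marshal(report)
		}
	}
	return nil, errNoInstruction
}
```

Calling `jacocoToJSON` with the bytes of a report yields a JSON object with `name` and `counter` keys, which is the shape the coverage policies above read through `input.counter`. Returning an error for a missing `INSTRUCTION` counter keeps validation and transformation in one place for the sake of the sketch; the document describes them as separate phases.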