jserv commented on Nov 6, 2025

Migrated the Docker base image from Ubuntu (eclipse-temurin:11-jre-noble) to Alpine Linux 3.22, achieving a 57.9% image size reduction (1.305GB → 549.8MB) while maintaining full functionality.

Infrastructure

  • Base image: eclipse-temurin:11-jre-noble → alpine:3.22
  • Multi-architecture: Added support for linux/amd64 and linux/arm64 via BuildKit
  • Multi-stage build: Optimized 3-stage build (base → intermediate-builder → final)

Size Optimization

  • Python cleanup: Removed __pycache__, *.pyc, *.pyo files
  • Babel locales: 31.4MB → 640KB (kept only en_* and root.dat)
  • Test directories: Removed tests/ and testing/ from site-packages
  • Build tools: Removed pip/setuptools from final image
  • Layer optimization: Moved curl to build-only stage, used --chown on COPY
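
The cleanup items above can be sketched as a single pruning pass. This is only an illustration, not the commands the Dockerfile runs: the function name `prune_site_packages` is hypothetical, and it assumes Babel keeps its locale files as `*.dat` under `babel/locale-data/`. The keep rule (`root.dat` plus `en_*`) follows the description literally.

```python
import shutil
from pathlib import Path


def prune_site_packages(site_packages: Path) -> int:
    """Delete bytecode caches, test directories, and non-English Babel
    locale files under site_packages. Returns the number of bytes freed."""

    def size_of(path: Path) -> int:
        if path.is_file():
            return path.stat().st_size
        return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())

    # Collect candidates first so the tree is not modified while walking it.
    doomed = []
    for path in site_packages.rglob("*"):
        if path.is_dir() and path.name in {"__pycache__", "tests", "testing"}:
            doomed.append(path)
        elif path.is_file() and path.suffix in {".pyc", ".pyo"}:
            doomed.append(path)

    # Assumed Babel layout: one .dat file per locale in babel/locale-data/.
    locale_dir = site_packages / "babel" / "locale-data"
    if locale_dir.is_dir():
        for dat in locale_dir.glob("*.dat"):
            keep = dat.name == "root.dat" or dat.name.startswith("en_")
            if not keep:
                doomed.append(dat)

    freed = 0
    for path in doomed:
        if not path.exists():  # already removed together with a parent
            continue
        freed += size_of(path)
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return freed
```

Running a pass like this in the build stage, before files are copied into the final stage, is what keeps the deleted bytes out of the final image layers.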

Multi-Architecture Support

  • amd64: Official coursier musl static build (v2.1.24)
  • arm64: VirtusLab coursier glibc build (v2.1.24) with gcompat layer
  • Conditional glibc: ARM64 requires glibc for coursier compatibility
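
The per-architecture choice above amounts to a small lookup keyed on BuildKit's TARGETARCH value. The sketch below only restates that decision in code. `CoursierPlan` and `plan_for` are hypothetical names, and the package list for arm64 is taken from the "gcompat layer" wording above rather than from the Dockerfile itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoursierPlan:
    source: str            # who publishes the launcher binary
    libc: str              # what the binary is linked against
    extra_packages: tuple  # compatibility packages assumed to be needed


# Keys are the values BuildKit passes as TARGETARCH.
PLANS = {
    "amd64": CoursierPlan("official", "musl (static)", ()),
    "arm64": CoursierPlan("VirtusLab", "glibc", ("gcompat",)),
}


def plan_for(targetarch: str) -> CoursierPlan:
    """Return the coursier plan for a TARGETARCH value, or raise ValueError."""
    try:
        return PLANS[targetarch]
    except KeyError:
        raise ValueError(f"unsupported architecture: {targetarch}") from None


if __name__ == "__main__":
    for arch in ("amd64", "arm64"):
        print(arch, plan_for(arch))
```

Failing loudly on an unknown architecture is a deliberate choice in this sketch: a silent fallback would risk installing a binary for the wrong libc.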

Build Improvements

  • Virtual build dependencies pattern (gcc, g++, musl-dev cleaned after use)
  • Enhanced .dockerignore to exclude dev artifacts
  • Environment variable for coursier version (COURSIER_VERSION=2.1.24)

Results

| Metric     | Before  | After   | Improvement       |
|------------|---------|---------|-------------------|
| Image size | 1.305GB | 549.8MB | -755.2MB (57.9%)  |
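
The improvement figures can be re-derived from the two sizes. This is a quick check, assuming decimal units (1 GB = 1000 MB), which is what the 755.2MB difference implies. The helper name `size_reduction` is hypothetical.

```python
def size_reduction(before_mb: float, after_mb: float) -> tuple:
    """Return (megabytes saved, percent saved relative to the original size)."""
    saved = before_mb - after_mb
    return saved, 100.0 * saved / before_mb


saved, pct = size_reduction(1305.0, 549.8)
print(f"{saved:.1f} MB saved ({pct:.1f}%)")  # 755.2 MB saved (57.9%)
```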

jserv changed the title from "Ci tweak" to "Migrate to Alpine Linux with 57.9% image size reduction" on Nov 6, 2025
The Ubuntu-based image, built on eclipse-temurin:11-jre-noble, was 1.3 GB,
unnecessarily large for a Jupyter notebook environment. The Ubuntu base
carries many unneeded packages and apt cache bloat, and the build did no
aggressive cleanup of Python artifacts.

Fix:
- Migrate from Ubuntu to Alpine Linux 3.22 (musl-based, 5.4MB base)
- Multi-architecture support (linux/amd64, linux/arm64) via BuildKit
  TARGETARCH
- Aggressive Python optimization:
  * Remove __pycache__, *.pyc, *.pyo files
  * Strip Babel locale data (31.4MB → 640KB, keeping only en_*)
  * Remove test directories and pip/setuptools from site-packages
- Virtual build dependencies pattern (gcc/g++/musl-dev cleaned after
  Jupyter install)
- Move curl to intermediate-builder stage only (not in final image)
- Optimize COPY with --chown to eliminate extra chown layer
- Architecture-specific coursier binaries:
  * amd64: Official musl static build (v2.1.24)
  * arm64: VirtusLab glibc build (v2.1.24) with gcompat layer
- Enhanced .dockerignore to exclude dev artifacts

Results:
- Image size: 1.305GB → 549.8MB (57.9% reduction)
- Architecture: linux/amd64, linux/arm64 validated
- Functionality: All notebooks work, GraphViz rendering verified

Known limitations:
- Jupyter authentication disabled (intentional for local dev, see
  source/jupyter_server_config.py)
- Python 3.12 paths hardcoded (will need update on Alpine Python updates)
- Scala 2.12.10 + Almond 0.9.1 (newer versions require source changes per
  patch)
CI failed with "libz.so.1: cannot open shared object file" because the
x86_64 coursier binary (cs-x86_64-pc-linux.gz) is linked against glibc. It
uses relocation types (R_X86_64_GOTPCREL, R_X86_64_GOTOFF64) and symbols
(__strtok_r, __strdup) that Alpine's gcompat cannot handle.

Fix:
- x86_64: Use cs-x86_64-pc-linux-static.gz (static musl binary, zero deps)
- ARM64: Keep glibc installation (VirtusLab binary still requires it)
- Simplify x86_64 path by removing glibc/gcompat/zlib dependencies
- Move curl to coursier download RUN (needed by both architectures)
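
One way to tell, before running a launcher, whether it expects a dynamic loader is to look for a PT_INTERP program header in the ELF image: a fully static binary has none, while a glibc-linked x86_64 binary typically names /lib64/ld-linux-x86-64.so.2. The sketch below is not part of this change. `elf_interpreter` is a hypothetical helper, and it handles only 64-bit little-endian ELF, which covers both x86_64 and aarch64.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path of a 64-bit little-endian ELF image,
    or None when the image has no interpreter (statically linked)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS=ELFCLASS64, EI_DATA=LSB
        raise ValueError("only 64-bit little-endian ELF is handled here")

    # ELF64 header fields: e_phoff at byte 32, e_phentsize/e_phnum at byte 54.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type == PT_INTERP:
            # ELF64 program header: p_offset at +8, p_filesz at +32.
            (p_offset,) = struct.unpack_from("<Q", data, base + 8)
            (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

Because the coursier downloads are gzip-compressed, the bytes would first need to go through `gzip.decompress` before being passed to a check like this.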
jserv merged commit e205f12 into dev on Nov 6, 2025
1 check passed
jserv deleted the ci-tweak branch on November 6, 2025 07:54