🐸 Evaluate the Russian-language quality of LLMs: benchmark tests over diverse datasets measure typical errors, with the aim of improving model responses.
open-source machine-learning data-analysis reproducibility user-experience software-testing data-quality quality-assessment model-evaluation performance-measurement algorithm-comparison benchmark-testing system-optimization ru-qual-bench code-portfolio
Updated Nov 8, 2025 - Python