aquif-3.5

The aquif-3.5 series is the successor to aquif-3, featuring a simplified naming scheme, expanded Mixture of Experts (MoE) options, and across-the-board performance improvements. This release streamlines model selection while delivering enhanced capabilities across reasoning, multilingual support, and general intelligence tasks.

Model Repository Links

| Model | HuggingFace Repository |
|---|---|
| aquif-3.5-A0.6B-Preview | aquiffoo/aquif-3.5-A0.6B-Preview |
| aquif-3.5-3B | aquiffoo/aquif-3.5-3B |
| aquif-3.5-7B | aquiffoo/aquif-3.5-7B |
| aquif-3.5-8B-Think | aquiffoo/aquif-3.5-8B-Think |
| aquif-3.5-A4B-Think | aquiffoo/aquif-3.5-A4B-Think |
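
For quick orientation, below is a minimal sketch of loading one of the checkpoints above with the Hugging Face transformers library. It assumes the repositories ship standard causal-LM weights and tokenizers, and that accelerate is installed for `device_map="auto"`; nothing in it is aquif-specific.

```python
# Minimal sketch: load any repository from the table above as a standard
# causal LM. Assumes `transformers` and `accelerate` are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquiffoo/aquif-3.5-3B"  # swap in any repo from the table

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The aquif-3.5 series is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```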

Model Overview

| Model | Size (B) | Active Params (B) | Reasoning | MoE | Multilingual | Context Window |
|---|---|---|---|---|---|---|
| aquif-3.5-A0.6B | 2.61 | 0.6 | ✗ | ✓ | ✓ | 4k |
| aquif-3.5-3B | 2.67 | 2.67 | ✗ | ✗ | ✓ | 32k |
| aquif-3.5-7B | 7.3 | 7.3 | ✗ | ✗ | ✓ | 16k |
| aquif-3.5-8B-Think | 8.2 | 8.2 | ✓ | ✗ | ✓ | 40k |
| aquif-3.5-A4B-Think | 12 | 4 | ✓ | ✓ | ✓ | 128k |
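
To verify the advertised context windows programmatically, a sketch like the one below works for Llama- and Qwen-style configurations (the bases credited under Acknowledgements). The `max_position_embeddings` field name is an assumption about these checkpoints, not a documented guarantee.

```python
# Sketch: read the configured context length without downloading weights.
# Assumes a Llama/Qwen-style config where the limit is stored under
# `max_position_embeddings`; other architectures may use a different field.
from transformers import AutoConfig

for repo in ["aquiffoo/aquif-3.5-3B", "aquiffoo/aquif-3.5-7B"]:
    config = AutoConfig.from_pretrained(repo)
    print(repo, getattr(config, "max_position_embeddings", "n/a"))
```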

Model Details

aquif-3.5-A0.6B (Experimental MoE)

An experimental small-scale Mixture of Experts model designed for multilingual applications with minimal computational overhead. Despite its compact active parameter count, it demonstrates competitive performance against larger dense models.

Performance Comparison:
| Metric | aquif-3.5 (2.6B A0.6B) | Qwen3 (0.8B) | LFM2 (0.7B) | aquif-3 (0.4B) |
|---|---|---|---|---|
| MMLU | 60.5 | 44.9 | 49.9 | 55.6 |
| GPQA | 30.2 | 22.1 | 28.5 | 28.5 |
| GSM8K | 50.7 | 36.5 | 46.4 | 52.1 |
| HumanEval | 45.2 | 36.0 | 40.0 | 37.4 |
| Average | 46.7 | 34.9 | 41.2 | 43.4 |

aquif-3.5-3B (State-of-the-Art Dense)

Sets the new standard for small dense models, offering strong performance-per-parameter efficiency for general-purpose applications.

Performance Comparison:
| Metric | aquif-3.5 (2.7B) | EXAONE 3.5 (2.4B) | Qwen3 (4B) | Gemma 3 (4B) | Phi-4-mini (3.8B) | Apriel-5B-Instruct (4.8B) | aquif-3 (3.2B) |
|---|---|---|---|---|---|---|---|
| MMLU (General Knowledge) | 70.2 | 60.4 | 70.4 | 59.6 | 67.3 | 64.6 | 67.5 |
| GPQA Diamond (Science) | 35.8 | 28.4 | 39.3 | 30.9 | 25.2 | 28.4 | 36.1 |
| LiveCodeBench (Coding) | 23.1 | 12.5 | 21.3 | 11.2 | 10.4 | 11.6 | 15.4 |
| IFEval (Instruction Following) | 78.9 | 73.6 | 71.2 | 80.2 | 68.6 | 80.8 | 78.9 |
| AIME 2025 (Competition Math) | 13.4 | 4.5 | 9.8 | 12.7 | 5.3 | 4.3 | 9.6 |
| Average | 44.3 | 35.9 | 42.4 | 38.9 | 35.4 | 37.9 | 41.5 |

aquif-3.5-7B (Multilingual Long Context)

Built on a Qwen-based architecture and optimized for multilingual applications with extended context, delivering state-of-the-art performance in its size class.

Performance Comparison:
| Metric | aquif-3.5 (7.3B) | EXAONE 3.5 (7.8B) | Qwen3 (8.2B) | Gemma 3 (12B) | Llama 3.1 (8B) | Kanana 1.5 (8B) | aquif-3 (3.2B) |
|---|---|---|---|---|---|---|---|
| MMLU (General Knowledge) | 78.5 | 72.2 | 82.9 | 74.5 | 69.2 | 68.8 | 67.5 |
| GPQA Diamond (Science) | 42.3 | 39.4 | 39.3 | 40.9 | 32.8 | 37.5 | 36.1 |
| LiveCodeBench (Coding) | 21.3 | 18.0 | 23.9 | 13.7 | 10.8 | 16.5 | 15.4 |
| IFEval (Instruction Following) | 85.6 | 82.6 | 85.4 | 80.2 | 75.0 | 80.1 | 78.9 |
| AIME 2025 (Competition Math) | 23.4 | 18.3 | 20.9 | 18.8 | 2.7 | 13.4 | 9.6 |
| Average | 50.2 | 46.1 | 50.4 | 45.6 | 38.1 | 43.3 | 41.5 |

aquif-3.5-8B-Think & aquif-3.5-A4B-Think (Reasoning Models)

Advanced reasoning-capable models designed for complex problem-solving tasks. The A4B variant uses an MoE architecture, activating roughly 4B of its 12B parameters per token, for better efficiency without giving up reasoning performance.

Performance Comparison:
| Metric | aquif-3.5 (12B A4B) | aquif-3.5 (8B) | Qwen3 Thinking 2507 (31B A3B) | gpt-oss-20b (21B A4B) | Nemotron Nano v2 (9B) | Solar Pro 2 |
|---|---|---|---|---|---|---|
| MMLU-Pro | 78.5 | 78.1 | 80.5 | 73.6 | 74.2 | 80.5 |
| GPQA Diamond | 70.8 | 66.8 | 70.7 | 61.7 | 64.0 | 68.7 |
| AIME 2025 | 84.4 | 81.4 | 56.3 | 61.7 | 69.7 | 61.3 |
| LiveCodeBench | 66.1 | 61.5 | 70.7 | 72.1 | 71.1 | 61.6 |
| Humanity's Last Exam | 8.9 | 8.2 | 9.8 | 8.5 | 6.5 | 7.0 |
| TAU-Bench v2 (avg) | 43.7 | 36.8 | 35.7 | 43.2 | 34.9 | 38.7 |
| Average | 58.7 | 55.5 | 54.0 | 53.5 | 53.4 | 53.0 |
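
For the Think variants above, here is a hedged prompting sketch. It relies on the standard transformers chat-template API and assumes each repository ships a chat template that emits the model's reasoning trace; the exact trace format (e.g. `<think>` tags) is model-defined and not specified here.

```python
# Sketch: prompt a Think variant via its chat template. Assumes the repo
# ships a chat template; the reasoning-trace format is model-defined.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquiffoo/aquif-3.5-8B-Think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "If 3x + 7 = 25, what is x?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Slice off the prompt tokens so only the model's reply (including any
# reasoning trace) is printed.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```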

Key Improvements Over aquif-3

  • Simplified Naming: Clear size-based nomenclature for easier model selection
  • Enhanced MoE Support: Multiple MoE configurations across different model sizes
  • Reasoning Capabilities: Dedicated thinking models for complex problem-solving
  • Extended Context: Up to 128k context window for long-form applications
  • Multilingual by Default: Native multilingual support across all variants
  • Performance Gains: 5-15% improvement across benchmarks compared to aquif-3

Usage Recommendations

  • aquif-3.5-A0.6B: Experimental applications, resource-constrained environments
  • aquif-3.5-3B: General-purpose applications, balanced performance/efficiency
  • aquif-3.5-7B: Multilingual applications, long-context tasks
  • aquif-3.5-8B-Think: Complex reasoning, scientific analysis
  • aquif-3.5-A4B-Think: Advanced reasoning with efficiency optimization

Technical Specifications

All models support:

  • BF16 and FP16 precision (see the loading sketch after this list)
  • Standard transformer architecture optimizations
  • Efficient attention mechanisms
  • Multi-head attention with optimized KV caching
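
Below is a minimal sketch of the precision support listed above: pick BF16 when the GPU supports it, otherwise fall back to FP16. The repository name is just one of the models from this card; the dtype-selection logic is generic PyTorch.

```python
# Sketch: load in BF16 where the hardware supports it, else FP16.
# Assumes a CUDA device is available.
import torch
from transformers import AutoModelForCausalLM

dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

model = AutoModelForCausalLM.from_pretrained(
    "aquiffoo/aquif-3.5-3B",
    torch_dtype=dtype,
    device_map="auto",
)
print(model.dtype)  # torch.bfloat16 or torch.float16
```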

Acknowledgements

  • Qwen Team: Base architecture for 7B, 8B, and 12B-A4B models
  • Meta Llama Team: Base architecture for 3B and 2.6B-A0.6B models
  • Hugging Face: Model hosting infrastructure and training libraries

License

This project is released under the Apache 2.0 License. See LICENSE file for details.