
moonshotai/Moonlight-16B-A3B · Hugging Face
Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Experts (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto …
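For readers who want to try the base checkpoint, the sketch below shows one plausible way to load it with Hugging Face transformers. It assumes the standard AutoModelForCausalLM/AutoTokenizer interface and that the repository ships custom modeling code behind trust_remote_code; the prompt and generation settings are illustrative, so check the model card for the recommended usage.

```python
# Minimal sketch (assumptions noted below), not the official usage snippet.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Moonlight-16B-A3B"

# trust_remote_code=True assumes the repo provides custom MoE modeling code;
# verify this on the model card before enabling it.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread the 16B-total / 3B-active MoE across available GPUs
    trust_remote_code=True,
)

prompt = "The Muon optimizer differs from AdamW in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The Instruct variant referenced in the entries below would typically be driven through the tokenizer's chat template rather than a raw completion prompt.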
Moonlight — Megatron Bridge - NVIDIA Documentation Hub
Moonlight is a 16B-parameter Mixture-of-Experts (MoE) model from Moonshot AI trained with 5.7T tokens using the innovative Muon optimizer.
Moonlight-16B-A3B-Instruct | AI Model Details
Jun 14, 2025 · Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Experts (MoE) model trained with 5.7T tokens using Muon. Our model improves the …
Megatron-Bridge/src/megatron/bridge/recipes/moonlight/moonlight_16b…
Training library for Megatron-based models; the NVIDIA-NeMo/Megatron-Bridge repository on GitHub includes a training recipe for Moonlight 16B.
Moonlight 16B – Efficient MoE Model for English, Code, Math
This model strikes an impressive performance-per-FLOP balance, outperforming similarly sized models such as Llama 3.2 3B and DeepSeek-V2-Lite across benchmarks in English, code …
moonshotai/Moonlight-16B-A3B-Instruct · Model output is …
Using vLLM, model output …
Moonlight-16B-A3B
What is Moonlight-16B-A3B? Moonlight-16B-A3B is a state-of-the-art Mixture-of-Experts (MoE) language model that represents a significant advancement in efficient AI model training.
Moonlight 16B A3B By moonshotai: Benchmarks and Detailed …
Features: 16B LLM, VRAM: 32.6 GB, context: 8K, license: MIT, LLM Explorer score: 0.29. Find out how Moonlight 16B A3B can be utilized in your business workflows, problem-solving, and …
Moonlight : Moonlight is a 16B parameter Mixture of Experts (MoE) model ...
Moonlight is a 16B parameter Mixture of Experts (MoE) model trained using the Muon optimizer, demonstrating outstanding performance in large-scale training. By incorporating weight decay …
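To make the optimizer reference above concrete, here is a toy, single-matrix sketch of a Muon-style update: heavy-ball momentum, approximate orthogonalization via a Newton-Schulz iteration, an RMS-style rescale, and decoupled weight decay (the addition the snippet alludes to). The quintic coefficients, the 0.2 scale factor, and the function names are illustrative assumptions drawn from public descriptions of Muon, not Moonshot's exact implementation.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D momentum matrix with a Newton-Schulz iteration.
    Coefficients follow the public Muon reference implementation (assumed, not Moonshot's exact values)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)          # normalize so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, buf, lr=2e-2, momentum=0.95, weight_decay=0.1):
    """One toy Muon-style update: momentum -> orthogonalize -> rescale,
    plus decoupled (AdamW-style) weight decay."""
    buf.mul_(momentum).add_(grad)                  # momentum buffer
    update = newton_schulz(buf)                    # orthogonalized update direction
    # Rescale so the update magnitude is roughly comparable to AdamW's; 0.2 is an assumed factor.
    update = update * 0.2 * max(param.size(0), param.size(1)) ** 0.5
    param.mul_(1 - lr * weight_decay)              # decoupled weight decay
    param.add_(update, alpha=-lr)

# Toy usage on a random weight matrix.
W = torch.randn(64, 128)
g = torch.randn_like(W)
m = torch.zeros_like(W)
muon_step(W, g, m)
```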
slowfastai/Moonlight-16B-A3B-bnb-4bit · Hugging Face
These model weights are for personal testing purposes only. The goal is to find a quantization method that achieves high compression while preserving as much of the model's original …
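As a rough illustration of that direction, the sketch below loads the model in 4-bit NF4 through transformers' bitsandbytes integration, quantizing the original moonshotai weights on the fly. Whether the slowfastai repository ships pre-quantized weights with these exact settings is not stated here; the NF4 and double-quantization choices are common defaults rather than confirmed values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Common bitsandbytes defaults; treat these as assumptions, not the repo's exact recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "moonshotai/Moonlight-16B-A3B"  # quantize the original checkpoint on the fly
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```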