  1. moonshotai/Moonlight-16B-A3B · Hugging Face

    Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Expert (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto …
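
    The result above points to the Hugging Face model card. As a rough sketch, loading the instruct variant with the transformers library might look like the following; `trust_remote_code=True`, the dtype/device settings, and the generation parameters are assumptions for illustration, not details taken from the card.

    ```python
    # Hedged sketch: load Moonlight-16B-A3B-Instruct with transformers and run one chat turn.
    # trust_remote_code=True assumes the repo ships custom modeling code; settings are illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "moonshotai/Moonlight-16B-A3B-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype="auto",     # keep the checkpoint's native precision
        device_map="auto",      # spread layers across available devices
        trust_remote_code=True,
    )

    messages = [{"role": "user", "content": "Give a one-sentence summary of Mixture-of-Experts models."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
    ```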

  2. Moonlight — Megatron Bridge - NVIDIA Documentation Hub

    Moonlight is a 16B-parameter Mixture-of-Experts (MoE) model from Moonshot AI trained with 5.7T tokens using the innovative Muon optimizer.

  3. Moonlight-16B-A3B-Instruct · Models

    Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Expert (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto …

  4. Moonlight-16B-A3B-Instruct | AI Model Details

    Jun 14, 2025 · Based on these improvements, we introduce **Moonlight**, a 3B/16B-parameter Mixture-of-Expert (MoE) model trained with 5.7T tokens using Muon. Our model improves the …

  5. Megatron-Bridge/src/megatron/bridge/recipes/moonlight/moonlight_16b

    Training library for Megatron-based models. Contribute to NVIDIA-NeMo/Megatron-Bridge development by creating an account on GitHub.

  6. Moonlight 16B – Efficient MoE Model for English, Code, Math

    This model strikes an impressive performance-per-FLOP balance, outperforming similar-sized models like Llama 3 3B and DeepSeek-v2-Lite across benchmarks in English, code …

  7. moonshotai/Moonlight-16B-A3B-Instruct · Model output is …

    Hugging Face discussion thread on the model page: "using VLLM, Model output …"
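
    The discussion above concerns serving the model with vLLM. A minimal, hedged sketch of offline inference is below; `max_model_len=8192` is an assumption based on the 8K context figure in result 9, and the sampling settings are purely illustrative.

    ```python
    # Hedged sketch: offline inference with vLLM; model name from the listing, other values assumed.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="moonshotai/Moonlight-16B-A3B-Instruct",
        trust_remote_code=True,   # assumed, matching the transformers sketch above
        max_model_len=8192,       # assumption based on the 8K context figure in result 9
    )
    params = SamplingParams(temperature=0.7, max_tokens=256)

    outputs = llm.chat(
        [{"role": "user", "content": "What is a Mixture-of-Experts model?"}],
        params,
    )
    print(outputs[0].outputs[0].text)
    ```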

  8. Moonlight-16B-A3B

    What is Moonlight-16B-A3B? Moonlight-16B-A3B is a state-of-the-art Mixture-of-Expert (MoE) language model that represents a significant advancement in efficient AI model training.

  9. Moonlight 16B A3B By moonshotai: Benchmarks and Detailed …

    Features: 16B LLM, VRAM: 32.6 GB, Context: 8K, License: MIT, LLM Explorer Score: 0.29. Find out how Moonlight 16B A3B can be utilized in your business workflows, problem-solving, and …

  10. Moonlight-16B-A3B · Models

    Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Expert (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto …

  11. Moonlight : Moonlight is a 16B parameter Mixture of Experts (MoE) model ...

    Moonlight is a 16B parameter Mixture of Experts (MoE) model trained using the Muon optimizer, demonstrating outstanding performance in large-scale training. By incorporating weight decay …
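
    The result above summarizes the Muon optimizer work behind Moonlight. A rough single-matrix sketch of the core update (momentum buffer, Newton-Schulz orthogonalization, decoupled weight decay, and an AdamW-matched update scale) is below. The iteration coefficients follow the public Muon reference implementation and the 0.2·sqrt(max(n, m)) scale follows the Moonlight report; the function names and hyperparameter defaults are illustrative, not Moonshot AI's training code.

    ```python
    # Hedged sketch of a Muon-style update for one 2-D weight matrix; not the Moonlight training code.
    import torch


    def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
        """Approximately orthogonalize a 2-D update matrix via Newton-Schulz iteration."""
        a, b, c = 3.4445, -4.7750, 2.0315     # coefficients from the public Muon reference
        x = g / (g.norm() + 1e-7)             # normalize so the iteration stays in range
        transposed = x.shape[0] > x.shape[1]
        if transposed:                        # iterate on the "wide" orientation
            x = x.T
        for _ in range(steps):
            s = x @ x.T
            x = a * x + (b * s + c * (s @ s)) @ x
        if transposed:
            x = x.T
        return x


    def muon_step(param, grad, momentum_buf, lr=2e-2, momentum=0.95, weight_decay=0.1):
        """One illustrative Muon update for a single 2-D weight matrix (modified in place)."""
        momentum_buf.mul_(momentum).add_(grad)        # SGD-style momentum accumulation
        update = newton_schulz_orthogonalize(momentum_buf)
        scale = 0.2 * max(param.shape) ** 0.5         # match AdamW update RMS, per the Moonlight report
        param.mul_(1 - lr * weight_decay)             # decoupled weight decay
        param.add_(update, alpha=-lr * scale)
    ```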

  12. slowfastai/Moonlight-16B-A3B-bnb-4bit · Hugging Face

    These model weights are for personal testing purposes only. The goal is to find a quantization method that achieves high compression while preserving as much of the model's original …
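
    The result above is a 4-bit bitsandbytes quantization of the base model. A minimal sketch of producing a comparable 4-bit load on the fly with transformers and bitsandbytes is below; the NF4 and double-quantization settings are assumptions about how such a checkpoint might be built, not a description of the slowfastai repo.

    ```python
    # Hedged sketch: 4-bit on-the-fly quantization of the base model with bitsandbytes.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # normal-float 4-bit (assumed choice)
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "moonshotai/Moonlight-16B-A3B",
        quantization_config=quant_config,
        device_map="auto",
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "moonshotai/Moonlight-16B-A3B", trust_remote_code=True
    )
    ```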