
moonshotai/Moonlight-16B-A3B · Hugging Face
Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Experts (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto …
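For readers who want to try the base checkpoint, the sketch below shows one plausible way to load it with Hugging Face transformers. It assumes the standard AutoModelForCausalLM/AutoTokenizer interface and that the repository ships custom modeling code behind trust_remote_code; the prompt and generation settings are illustrative, so check the model card for the recommended usage.

```python
# Minimal sketch (assumptions noted below), not the official usage snippet.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Moonlight-16B-A3B"

# trust_remote_code=True assumes the repo provides custom MoE modeling code;
# verify this on the model card before enabling it.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread the 16B-total / 3B-active MoE across available GPUs
    trust_remote_code=True,
)

prompt = "The Muon optimizer differs from AdamW in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The Instruct variant referenced in the entries below would typically be driven through the tokenizer's chat template rather than a raw completion prompt.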
Moonlight — Megatron Bridge - NVIDIA Documentation Hub
Moonlight is a 16B-parameter Mixture-of-Experts (MoE) model from Moonshot AI trained with 5.7T tokens using the innovative Muon optimizer.
Moonlight-16B-A3B-Instruct | AI Model Details
Jun 14, 2025 · Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Experts (MoE) model trained with 5.7T tokens using Muon. Our model improves the …
Megatron-Bridge/src/megatron/bridge/recipes/moonlight/moonlight_16b…
Training library for Megatron-based models; the NVIDIA-NeMo/Megatron-Bridge repository on GitHub includes a training recipe for Moonlight 16B.
Moonlight 16B – Efficient MoE Model for English, Code, Math
This model strikes an impressive performance-per-FLOP balance, outperforming similarly sized models such as Llama 3.2 3B and DeepSeek-V2-Lite across benchmarks in English, code …
moonshotai/Moonlight-16B-A3B-Instruct · Model output is …
Using vLLM, model output …
Moonlight-16B-A3B
What is Moonlight-16B-A3B? Moonlight-16B-A3B is a state-of-the-art Mixture-of-Experts (MoE) language model that represents a significant advancement in efficient AI model training.
Moonlight 16B A3B By moonshotai: Benchmarks and Detailed …
Features: 16B LLM, VRAM: 32.6 GB, context: 8K, license: MIT, LLM Explorer score: 0.29. Find out how Moonlight 16B A3B can be utilized in your business workflows, problem-solving, and …
Moonlight : Moonlight is a 16B parameter Mixture of Experts (MoE) model ...
Moonlight is a 16B parameter Mixture of Experts (MoE) model trained using the Muon optimizer, demonstrating outstanding performance in large-scale training. By incorporating weight decay …
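To make the optimizer reference above concrete, here is a toy, single-matrix sketch of a Muon-style update: heavy-ball momentum, approximate orthogonalization via a Newton-Schulz iteration, an RMS-style rescale, and decoupled weight decay (the addition the snippet alludes to). The quintic coefficients, the 0.2 scale factor, and the function names are illustrative assumptions drawn from public descriptions of Muon, not Moonshot's exact implementation.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D momentum matrix with a Newton-Schulz iteration.
    Coefficients follow the public Muon reference implementation (assumed, not Moonshot's exact values)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)          # normalize so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, buf, lr=2e-2, momentum=0.95, weight_decay=0.1):
    """One toy Muon-style update: momentum -> orthogonalize -> rescale,
    plus decoupled (AdamW-style) weight decay."""
    buf.mul_(momentum).add_(grad)                  # momentum buffer
    update = newton_schulz(buf)                    # orthogonalized update direction
    # Rescale so the update magnitude is roughly comparable to AdamW's; 0.2 is an assumed factor.
    update = update * 0.2 * max(param.size(0), param.size(1)) ** 0.5
    param.mul_(1 - lr * weight_decay)              # decoupled weight decay
    param.add_(update, alpha=-lr)

# Toy usage on a random weight matrix.
W = torch.randn(64, 128)
g = torch.randn_like(W)
m = torch.zeros_like(W)
muon_step(W, g, m)
```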
slowfastai/Moonlight-16B-A3B-bnb-4bit · Hugging Face
These model weights are for personal testing purposes only. The goal is to find a quantization method that achieves high compression while preserving as much of the model's original …
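As a rough illustration of that direction, the sketch below loads the model in 4-bit NF4 through transformers' bitsandbytes integration, quantizing the original moonshotai weights on the fly. Whether the slowfastai repository ships pre-quantized weights with these exact settings is not stated here; the NF4 and double-quantization choices are common defaults rather than confirmed values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Common bitsandbytes defaults; treat these as assumptions, not the repo's exact recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "moonshotai/Moonlight-16B-A3B"  # quantize the original checkpoint on the fly
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```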