MosaicML presents MPT-7B, the newest addition to our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1 trillion tokens of text and code, matching the quality of LLaMA-7B while being open-source and commercially usable. The model was trained on the MosaicML platform in just 9.5 days without human intervention, costing approximately $200,000. Today, you can train, fine-tune, and deploy your own private MPT models, starting from one of our checkpoints or from scratch. We're also releasing three fine-tuned models alongside MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which supports a context length of 65k tokens.
MPT Model Series
The MPT (MosaicML Pretrained Transformer) model series aims to address the limitations of existing open-source LLMs, such as LLaMA, Pythia, StableLM, and OpenLLaMA. Our MPT series is commercially usable, trained on 1 trillion tokens, capable of handling very long inputs, optimized for fast training and inference, and equipped with efficient open-source training code.
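As a quick illustration of starting from one of these checkpoints, here is a minimal sketch of loading MPT-7B with Hugging Face Transformers and sampling a completion. The Hub ID mosaicml/mpt-7b, the trust_remote_code flag, and the generation settings are assumptions made for the example, not details stated in this post.

```python
# Minimal sketch: load the MPT-7B base checkpoint and generate a short completion.
# The Hub ID and trust_remote_code usage are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b"  # assumed Hugging Face Hub ID for the base model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half precision so the 6.7B parameters fit on a single GPU
    trust_remote_code=True,      # MPT ships its custom modeling code alongside the weights
)
model.eval()

inputs = tokenizer("MosaicML is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```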
Model Evaluation
MPT-7B has been rigorously evaluated on a variety of benchmarks and consistently meets the high-quality bar set by LLaMA-7B.
New Models Released
We are releasing four models today:
MPT-7B Base: A decoder-style transformer with 6.7 billion parameters, trained on 1 trillion tokens of text and code.
MPT-7B-StoryWriter-65k+: A model designed to read and write stories with extremely long context lengths, fine-tuned on a filtered fiction subset of the books3 dataset.
MPT-7B-Instruct: A model for short-form instruction following, fine-tuned on a dataset derived from Databricks Dolly-15k and Anthropic’s Helpful and Harmless datasets (a usage sketch follows this list).
MPT-7B-Chat: A chatbot-like model for dialogue generation, fine-tuned on various datasets including ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct.
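To make the difference between the base and fine-tuned checkpoints concrete, the sketch below prompts MPT-7B-Instruct with an Alpaca/Dolly-style instruction template. The Hub ID mosaicml/mpt-7b-instruct and the exact template wording are assumptions for illustration, not details taken from this announcement.

```python
# Sketch of prompting MPT-7B-Instruct; the Hub ID and the instruction template
# are assumptions about how the fine-tuning data was formatted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b-instruct"  # assumed Hub ID for the instruct checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model.eval()

# Assumed Alpaca/Dolly-style template for short-form instruction following.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nExplain in two sentences what a tokenizer does.\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Strip the prompt tokens so only the model's response is printed.
response = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```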
MosaicML LLM Foundry
In addition to the model checkpoints, we have open-sourced the entire codebase for pretraining, fine-tuning, and evaluating MPT via our new MosaicML LLM Foundry, emphasizing efficiency, ease-of-use, and rigorous attention to detail.
Training and Deploying Custom MPT
To start building and deploying your own custom MPT models on the MosaicML platform, sign up here.
MPT-7B: Matching LLaMA-7B Quality
MPT-7B matches the quality of LLaMA-7B and outperforms other open-source models in the 7B-20B range on standard academic tasks. We compiled 11 open-source benchmarks commonly used for in-context learning (ICL) and evaluated them in an industry-standard manner. Our evaluation suite is open for the community to use and contribute to, ensuring the most rigorous evaluation possible.
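For intuition about what an in-context learning benchmark measures, here is a toy sketch that scores a single multiple-choice question by comparing the log-likelihood the model assigns to each candidate answer. It is illustrative only and is not the open-source evaluation suite itself; the Hub ID and the example question are assumptions.

```python
# Toy sketch of multiple-choice ICL scoring: pick the answer whose tokens the
# model assigns the highest total log-probability. Not the actual eval harness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model.eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `context`."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    token_logprobs = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Score only the continuation tokens, not the shared context.
    return token_logprobs[:, ctx_len - 1:].sum().item()

question = "Question: What is the capital of France?\nAnswer:"
choices = [" Paris", " Berlin", " Madrid"]
scores = [continuation_logprob(question, c) for c in choices]
print(choices[scores.index(max(scores))])  # expected: " Paris"
```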
Conclusion
With the introduction of MPT-7B, MosaicML has set a new standard for open-source, commercially usable LLMs. We invite businesses and the open-source community to build on this effort, utilizing the efficient and powerful MosaicML LLM Foundry to develop custom models and applications.