Musk Open-Sources the Grok-1 Large Model: 314 Billion Parameters, the Largest to Date!

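On the morning of March 18, 2024 (Beijing time), xAI, Elon Musk's AI startup, announced that it is open-sourcing its large model Grok-1. Users can download the base model weights and network architecture directly via a magnet link.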

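Musk has long been an active promoter of AI technology. By open-sourcing Grok-1, xAI will further advance research on and applications of large models and help make AI technology more widely accessible.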

About Grok-1:

Parameters: 314 billion, the largest parameter count of any open-source large language model at the time of release

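Training objective: a general-purpose base model; the released checkpoint is not fine-tuned for any specific application such as dialogue

Training data: a large amount of text data; feedback from humans and earlier Grok models applies to the fine-tuned Grok assistant, not to this base checkpoint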

Model architecture: Mixture-of-Experts (MoE)

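Why open-sourcing Grok-1 matters:

It advances research on and applications of large models

It helps make AI technology more widely accessible

It gives researchers and developers a new research tool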

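Some details worth noting:

Grok-1 is the largest open-source large language model by parameter count at the time of writing, so its performance will be worth watching.

Open-sourcing Grok-1 gives researchers and developers a new research tool and should encourage innovation in AI.

Musk has said that xAI will open-source more AI models and tools in the future, which is worth following.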

The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.

We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.

This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.

We are releasing the weights and the architecture under the Apache 2.0 license.

To get started with using the model, follow the instructions at github.com/xai-org/grok.

Model Details

Base model trained on a large amount of text data, not fine-tuned for any particular task.

314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.

Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
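
The "25% of the weights active on a given token" figure can be made concrete with a small routing sketch. The expert count (8) and the number of experts chosen per token (2) are assumptions picked for illustration; the text above does not state them, and in a real model shared weights such as attention layers mean the 25% figure is only approximated by the expert ratio. The helper names `route` and `moe_layer` and the toy experts are hypothetical.

```python
import math

NUM_EXPERTS = 8        # assumption: not stated in the text above
EXPERTS_PER_TOKEN = 2  # assumption: 2 of 8 experts gives a 25% expert ratio


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=EXPERTS_PER_TOKEN):
    """Return {expert_index: gate_weight} for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_layer(x, experts, router_logits):
    """Combine the outputs of the selected experts only."""
    gates = route(router_logits)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Toy experts: expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

gates = route(logits)                          # experts 1 and 4 are selected
active_fraction = len(gates) / NUM_EXPERTS     # 0.25
active_params = active_fraction * 314e9        # about 78.5 billion
output = moe_layer(1.0, experts, logits)
```

Only the two selected experts run for this token, so under these assumptions roughly 78.5 billion of the 314 billion parameters take part in each forward step.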

Grok-1

This repository contains JAX example code for loading and running the Grok-1 open-weights model.

Make sure to download the checkpoint and place the ckpt-0 directory in checkpoint. Then run the repository's example script to test the code.

The script loads the checkpoint and samples from the model on a test input.
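
What "samples from the model" means can be sketched without the real checkpoint. The text does not describe the sampling strategy the script uses, so the following is only a generic temperature-sampling loop; `sample_token`, `generate`, and the toy model `toy_next_logits` are hypothetical names that stand in for the actual model call.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=None):
    """Draw one token index from softmax(logits / temperature)."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(next_logits, prompt, max_new_tokens, temperature=1.0, seed=0):
    """Append sampled tokens to the prompt one at a time."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(sample_token(next_logits(tokens), temperature, rng))
    return tokens


def toy_next_logits(tokens, vocab=5):
    """Hypothetical stand-in for the model: prefers (last token + 1) mod vocab."""
    target = (tokens[-1] + 1) % vocab
    return [10.0 if i == target else 0.0 for i in range(vocab)]


greedy = generate(toy_next_logits, [0], 4, temperature=0)
sampled = generate(toy_next_logits, [0], 4, temperature=1.0, seed=1)
```

A base model that is not fine-tuned for dialogue simply continues the test input in this way, token by token.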

Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code. The implementation of the MoE layer in this repository is not efficient. The implementation was chosen to avoid the need for custom kernels to validate the correctness of the model.
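
A rough back-of-the-envelope estimate shows why "enough GPU memory" is a serious requirement. The storage formats and the 80 GB per-device figure below are assumptions for illustration, not details from the release, and the estimate covers the weights alone (no activations or caches).

```python
import math

PARAMS = 314e9  # parameter count from the text

# Assumed storage formats, in bytes per parameter.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(params, bytes_per_param):
    """Memory needed to hold the weights alone, in gigabytes (1e9 bytes)."""
    return params * bytes_per_param / 1e9


def min_devices(params, bytes_per_param, device_memory_gb):
    """Smallest device count whose combined memory holds the weights alone."""
    return math.ceil(weight_memory_gb(params, bytes_per_param) / device_memory_gb)


estimates = {name: weight_memory_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
devices_80gb = {name: min_devices(PARAMS, b, 80) for name, b in BYTES_PER_PARAM.items()}

for name in BYTES_PER_PARAM:
    print(f"{name}: {estimates[name]:.0f} GB, at least {devices_80gb[name]} x 80 GB devices")
```

Even at one byte per parameter the weights alone come to about 314 GB, which is well beyond a single accelerator under these assumptions.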

Downloading the weights

You can download the weights using a torrent client and the magnet link published in the repository.
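
A magnet link is a URI whose query string carries the torrent's info hash and, optionally, a display name and tracker addresses. The sketch below shows one way to inspect such a link before handing it to a torrent client. The link used here is an obvious placeholder (an all-zero hash and an example.org tracker), not the real Grok-1 link, and `parse_magnet` is a hypothetical helper.

```python
from urllib.parse import urlparse, parse_qs


def parse_magnet(uri):
    """Extract the info hash, display name, and trackers from a magnet URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parsed.query)
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("missing BitTorrent info hash")
    return {
        "info_hash": xt[len(prefix):].lower(),
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


# Placeholder link for illustration only; substitute the published one.
PLACEHOLDER = (
    "magnet:?xt=urn:btih:" + "0" * 40
    + "&dn=example"
    + "&tr=udp%3A%2F%2Ftracker.example.org%3A1337"
)

info = parse_magnet(PLACEHOLDER)
print(info["info_hash"], info["name"], info["trackers"])
```

Checking that the info hash matches the one published by the project is a simple safeguard against downloading the wrong torrent.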

License

The code and associated Grok-1 weights in this release are licensed under the Apache 2.0 license. The license only applies to the source files in this repository and the model weights of Grok-1.
