Models and hardware requirements

Tier: Premium, Ultimate
Offering: GitLab Self-Managed

You can integrate with industry-leading models from Mistral, Meta, Anthropic, and OpenAI through your preferred serving platform.

You can use:

Supported models to match your specific performance needs and use cases.
In GitLab 18.3 and later, your own compatible model to experiment with models beyond the officially supported options.
GitLab-managed models to connect to AI models without the need to host your own infrastructure. These models are managed entirely by GitLab.

Supported models

GitLab-supported models offer different levels of functionality for GitLab Duo features, depending on the specific model and feature combination.

Full functionality: The model can likely handle the feature without any loss of quality.
Partial functionality: The model supports the feature, but there might be compromises or limitations.
Limited functionality: The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues. Models that have limited functionality for a feature will not receive GitLab support for that specific feature.

Model family	Model	Code completion	Code generation	GitLab Duo Chat (non-agentic)	GitLab Duo Agent Platform
Claude 4	Claude 4 Sonnet	Full functionality	Full functionality	Full functionality	Full functionality
Claude 4	Claude 4.5 Sonnet	Full functionality	Full functionality	Full functionality	Full functionality
Claude 4	Claude 4.5 Haiku	Full functionality	Full functionality	Full functionality	Full functionality
Claude 4	Claude 4.5 Opus	Full functionality	Full functionality	Full functionality	Full functionality
GPT	GPT-4 Turbo	Full functionality	Full functionality	Partial functionality	Limited functionality
GPT	GPT-4o	Full functionality	Full functionality	Full functionality	Limited functionality
GPT	GPT-4o-mini	Full functionality	Full functionality	Partial functionality	Limited functionality
GPT	GPT-5	Full functionality	Full functionality	Full functionality	Full functionality
GPT	GPT-5 Mini	Full functionality	Full functionality	Full functionality	Partial functionality
GPT	GPT-5 Codex	Full functionality	Full functionality	Full functionality	Full functionality
GPT	GPT-5.1	Full functionality	Full functionality	Full functionality	Full functionality
GPT	GPT-5.2	Full functionality	Full functionality	Full functionality	Full functionality
GPT	GPT-oss-120B	Full functionality	Full functionality	Full functionality	Limited functionality
GPT	GPT-oss-20B	Partial functionality	Partial functionality	Partial functionality	Limited functionality
Mistral Codestral	Codestral 22B v0.1	Full functionality	Full functionality	Partial functionality	Limited functionality
Mistral	Mistral Small 24B Instruct 2506	Full functionality	Full functionality	Full functionality	Limited functionality
Llama	Llama 3 8B	Partial functionality	Full functionality	Limited functionality	Limited functionality
Llama	Llama 3.1 8B	Partial functionality	Full functionality	Partial functionality	Limited functionality
Llama	Llama 3 70B	Partial functionality	Full functionality	Limited functionality	Limited functionality
Llama	Llama 3.1 70B	Full functionality	Full functionality	Full functionality	Limited functionality
Llama	Llama 3.3 70B	Full functionality	Full functionality	Full functionality	Limited functionality

Compatible models

Status: Beta

You can use your own compatible models and platform with GitLab Duo features. For compatible models not included in supported model families, use the general model family.

Compatible models are excluded from the definition of Customer Integrated Models in the AI Functionality Terms. Compatible models and platforms must adhere to the OpenAI API specification. Models and platforms that have previously been marked as experimental or beta are now considered compatible models.

This feature is in beta and is therefore subject to change as we gather feedback and improve the integration:

GitLab does not provide technical support for issues specific to your chosen model or platform.
Not all GitLab Duo features are guaranteed to work optimally with every compatible model.
Response quality, speed, and performance overall might vary significantly based on your model choice.

Model family	Model
General	Any model compatible with the OpenAI API specification
CodeGemma	CodeGemma 2b
CodeGemma	CodeGemma 7b-it
CodeGemma	CodeGemma 7b-code
Code Llama	Code-Llama 13b
DeepSeek Coder	DeepSeek Coder 33b Instruct
DeepSeek Coder	DeepSeek Coder 33b Base

GitLab-managed models

GitLab-managed models integrate with GitLab-hosted AI Gateway infrastructure to provide access to AI models curated and made available by GitLab. Instead of using your own self-hosted models, you can choose to use GitLab-managed models for specific GitLab Duo features.

To choose which features use GitLab-managed models, see select a GitLab-managed model for a feature.

When enabled for a specific feature:

All calls to those features configured with a GitLab-managed model use the GitLab-hosted AI Gateway, not the self-hosted AI Gateway.
No detailed logs are generated in the GitLab-hosted AI Gateway, even when AI logs are enabled. This prevents unintended leaks of sensitive information.

Hardware requirements

The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premise. Requirements vary significantly based on the model size and intended usage:

Base system requirements

CPU:
- Minimum: 8 cores (16 threads)
- Recommended: 16+ cores for production environments
RAM:
- Minimum: 32 GB
- Recommended: 64 GB for most models
Storage:
- SSD with sufficient space for model weights and data.

GPU requirements by model size

Model size	Minimum GPU configuration	Minimum VRAM required
7B models (for example, Mistral 7B)	1x NVIDIA A100 (40 GB)	35 GB
22B models (for example, Codestral 22B)	2x NVIDIA A100 (80 GB)	110 GB
Mixtral 8x7B	2x NVIDIA A100 (80 GB)	220 GB
Mixtral 8x22B	8x NVIDIA A100 (80 GB)	526 GB

Use Hugging Face’s memory utility to verify memory requirements.

Response time by model size and GPU

Small machine

With a a2-highgpu-2g (2x NVIDIA A100 40 GB - 150 GB vRAM) or equivalent:

Model name	Number of requests	Average time per request (sec)	Average tokens in response	Average tokens per second per request	Total time for requests	Total TPS
Mistral-7B-Instruct-v0.3	1	7.09	717.0	101.19	7.09	101.17
Mistral-7B-Instruct-v0.3	10	8.41	764.2	90.35	13.70	557.80
Mistral-7B-Instruct-v0.3	100	13.97	693.23	49.17	20.81	3331.59

Medium machine

With a a2-ultragpu-4g (4x NVIDIA A100 40 GB - 340 GB vRAM) machine on GCP or equivalent:

Model name	Number of requests	Average time per request (sec)	Average tokens in response	Average tokens per second per request	Total time for requests	Total TPS
Mistral-7B-Instruct-v0.3	1	3.80	499.0	131.25	3.80	131.23
Mistral-7B-Instruct-v0.3	10	6.00	740.6	122.85	8.19	904.22
Mistral-7B-Instruct-v0.3	100	11.71	695.71	59.06	15.54	4477.34
Mixtral-8x7B-Instruct-v0.1	1	6.50	400.0	61.55	6.50	61.53
Mixtral-8x7B-Instruct-v0.1	10	16.58	768.9	40.33	32.56	236.13
Mixtral-8x7B-Instruct-v0.1	100	25.90	767.38	26.87	55.57	1380.68

Large machine

With a a2-ultragpu-8g (8 x NVIDIA A100 80 GB - 1360 GB vRAM) machine on GCP or equivalent:

Model name	Number of requests	Average time per request (sec)	Average tokens in response	Average tokens per second per request	Total time for requests (sec)	Total TPS
Mistral-7B-Instruct-v0.3	1	3.23	479.0	148.41	3.22	148.36
Mistral-7B-Instruct-v0.3	10	4.95	678.3	135.98	6.85	989.11
Mistral-7B-Instruct-v0.3	100	10.14	713.27	69.63	13.96	5108.75
Mixtral-8x7B-Instruct-v0.1	1	6.08	709.0	116.69	6.07	116.64
Mixtral-8x7B-Instruct-v0.1	10	9.95	645.0	63.68	13.40	481.06
Mixtral-8x7B-Instruct-v0.1	100	13.83	585.01	41.80	20.38	2869.12
Mixtral-8x22B-Instruct-v0.1	1	14.39	828.0	57.56	14.38	57.55
Mixtral-8x22B-Instruct-v0.1	10	20.57	629.7	30.24	28.02	224.71
Mixtral-8x22B-Instruct-v0.1	100	27.58	592.49	21.34	36.80	1609.85

AI Gateway Hardware Requirements

For recommendations on AI Gateway hardware, see the AI Gateway scaling recommendations.