Configure LLM platforms
- Tier: Premium, Ultimate
- Offering: GitLab Self-Managed
The AI Gateway supports multiple LLM providers through LiteLLM. Each platform has unique features and benefits that can cater to different needs. The following documentation summarizes the providers we have validated and tested. If the platform you want to use is not in this documentation, provide feedback in the platform request issue (issue 526144).
Use multiple models and platforms
You can use multiple models and platforms in the same GitLab instance.
For example, you can configure one feature to use Azure OpenAI, and another feature to use AWS Bedrock, or self-hosted models served with vLLM.
This setup gives you flexibility to choose the best model and platform for each use case. Models must be supported and served through a compatible platform.
Self-hosted model deployments
vLLM
vLLM is a high-performance inference server optimized for serving LLMs with memory efficiency. It supports model parallelism and integrates easily with existing workflows.
To install vLLM, see the vLLM Installation Guide. You should install version v0.6.4.post1 or later.
Configure the endpoint URL
When configuring the endpoint URL for any OpenAI API-compatible platform (such as vLLM) in GitLab:

- The URL must be suffixed with `/v1`.
- If you use the default vLLM configuration, the endpoint URL is `https://<hostname>:8000/v1`.
- If your server is configured behind a proxy or load balancer, you might not need to specify the port, in which case the URL is `https://<hostname>/v1`.
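The suffix rule above can be captured in a small helper. The following is an illustrative sketch only; the `normalize_endpoint` function is hypothetical and not part of GitLab:

```python
from urllib.parse import urlparse


def normalize_endpoint(url: str) -> str:
    """Return the endpoint URL with the required /v1 suffix.

    Hypothetical helper illustrating the rule above; GitLab does not
    ship this function.
    """
    base = url.rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    # Sanity-check that the value is an absolute http(s) URL.
    if urlparse(base).scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got: {url}")
    return base


print(normalize_endpoint("https://vllm.example.com:8000"))
# https://vllm.example.com:8000/v1
```

A check like this can catch a missing `/v1` suffix before you save the endpoint in GitLab.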
Find the model name
After the model has been deployed, to get the model name for the model identifier field in GitLab, query the vLLM server’s /v1/models endpoint:
```shell
curl \
  --header "Authorization: Bearer API_KEY" \
  --header "Content-Type: application/json" \
  http://your-vllm-server:8000/v1/models
```

The model name is the value of the `data.id` field in the response.
Example response:
```json
{
  "object": "list",
  "data": [
    {
      "id": "Mixtral-8x22B-Instruct-v0.1",
      "object": "model",
      "created": 1739421415,
      "owned_by": "vllm",
      "root": "mistralai/Mixtral-8x22B-Instruct-v0.1"
      // Additional fields removed for readability
    }
  ]
}
```

In this example, if the model's `id` is `Mixtral-8x22B-Instruct-v0.1`, you would set the model identifier in GitLab as `custom_openai/Mixtral-8x22B-Instruct-v0.1`.
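The lookup above can also be scripted. This is an illustrative sketch, not part of GitLab; the `model_identifier` helper is hypothetical and assumes the response shape shown in the example:

```python
import json
from urllib.request import Request, urlopen


def model_identifier(models_response: dict, prefix: str = "custom_openai") -> str:
    """Build the GitLab model identifier from a /v1/models response.

    Hypothetical helper; assumes the model name is in data[0].id,
    as in the example response above.
    """
    model_id = models_response["data"][0]["id"]
    return f"{prefix}/{model_id}"


# Fetching the response from a live server might look like this:
# req = Request("http://your-vllm-server:8000/v1/models",
#               headers={"Authorization": "Bearer API_KEY"})
# response = json.load(urlopen(req))

response = {"object": "list", "data": [{"id": "Mixtral-8x22B-Instruct-v0.1"}]}
print(model_identifier(response))
# custom_openai/Mixtral-8x22B-Instruct-v0.1
```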
For more information, see the following documentation:
- For vLLM supported models, see the vLLM Supported Models documentation.
- For available options when using vLLM to run a model, see the vLLM documentation on engine arguments.
Mistral-7B-Instruct-v0.3
1. Download the model from HuggingFace:

   ```shell
   git clone https://<your-hugging-face-username>:<your-hugging-face-token>@huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
   ```

1. Run the server:

   ```shell
   vllm serve <path-to-model>/Mistral-7B-Instruct-v0.3 \
     --served_model_name <choose-a-name-for-the-model> \
     --tokenizer_mode mistral \
     --tensor_parallel_size <number-of-gpus> \
     --load_format mistral \
     --config_format mistral \
     --tokenizer <path-to-model>/Mistral-7B-Instruct-v0.3
   ```
Mixtral-8x7B-Instruct-v0.1
1. Download the model from HuggingFace:

   ```shell
   git clone https://<your-hugging-face-username>:<your-hugging-face-token>@huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
   ```

1. Rename the token config:

   ```shell
   cd <path-to-model>/Mixtral-8x7B-Instruct-v0.1
   cp tokenizer.model tokenizer.model.v3
   ```

1. Run the model:

   ```shell
   vllm serve <path-to-model>/Mixtral-8x7B-Instruct-v0.1 \
     --tensor_parallel_size 4 \
     --served_model_name <choose-a-name-for-the-model> \
     --tokenizer_mode mistral \
     --load_format safetensors \
     --tokenizer <path-to-model>/Mixtral-8x7B-Instruct-v0.1
   ```
Disable request logging to reduce latency
When running vLLM in production, you can significantly reduce latency by using the --disable-log-requests flag.
Use this flag only when you do not need detailed request logging.
Disabling request logging minimizes the overhead introduced by verbose logs, especially under high load, and can improve performance.
```shell
vllm serve <path-to-model>/<model-version> \
  --served_model_name <choose-a-name-for-the-model> \
  --disable-log-requests
```

This change has been observed to notably improve response times in internal benchmarks.
Cloud-hosted model deployments
GitLab has validated and tested the following providers. The AI Gateway supports LLM providers that are compatible with LiteLLM.
Configure authentication with AWS Bedrock
You can use several methods to authenticate your AI Gateway with AWS Bedrock.
Prerequisites:
- Models are automatically enabled in Bedrock when first invoked. For more information, see Bedrock model access.
- Have AWS credentials configured with appropriate IAM permissions.
Amazon EKS with Helm Chart (Recommended)
Use IRSA (IAM Roles for Service Accounts) for your AI Gateway pods to authenticate to AWS Bedrock, without storing static credentials.
After you authenticate Amazon EKS with IRSA, the AI Gateway automatically obtains temporary credentials from the IRSA role.
To use IRSA to authenticate Amazon EKS:
1. Create an IAM policy that grants access to Bedrock models. You can scope this to specific models if you require more security:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "bedrock:InvokeModel",
           "bedrock:InvokeModelWithResponseStream"
         ],
         "Resource": "arn:aws:bedrock:*::foundation-model/*"
       }
     ]
   }
   ```

   ```shell
   aws iam create-policy \
     --policy-name bedrock-ai-gateway-access \
     --policy-document file://bedrock-policy.json \
     --description "Bedrock access for AI Gateway"
   ```

1. Optional. For stricter access control, replace the wildcard resource with specific model Amazon Resource Names (ARNs). This ensures that only approved models can be accessed, even if the GitLab configuration changes. For available model ARNs, see Amazon Bedrock model IDs.

   ```json
   "Resource": [
     "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
     "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
   ]
   ```

   Some models might use different ARN formats. For example, newer models might require inference profile ARNs in addition to foundation model ARNs. To check the ARN format for your specific model, see Amazon Bedrock model IDs.
1. Create an IAM role with a trust policy for your Amazon EKS service account to use. Replace the following values:

   - `YOUR_ACCOUNT_ID`: Your AWS account ID.
   - `REGION`: Your Amazon EKS cluster region (for example, `us-east-1`).
   - `YOUR_OIDC_ID`: Your Amazon EKS cluster's OIDC provider ID.
   - `NAMESPACE`: The Kubernetes namespace where the AI Gateway is deployed.

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/YOUR_OIDC_ID"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "oidc.eks.REGION.amazonaws.com/id/YOUR_OIDC_ID:sub": "system:serviceaccount:NAMESPACE:ai-gateway",
             "oidc.eks.REGION.amazonaws.com/id/YOUR_OIDC_ID:aud": "sts.amazonaws.com"
           }
         }
       }
     ]
   }
   ```

   ```shell
   # Create the role
   aws iam create-role \
     --role-name eks-ai-gateway-bedrock \
     --assume-role-policy-document file://trust-policy.json \
     --description "EKS IRSA role for AI Gateway to access Bedrock"
   ```

1. Attach the Bedrock IAM policy to this role:

   ```shell
   # Attach the policy to the role
   aws iam attach-role-policy \
     --role-name eks-ai-gateway-bedrock \
     --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/bedrock-ai-gateway-access
   ```

1. To configure the Helm chart, install the AI Gateway with the IAM role annotation:

   ```yaml
   serviceAccount:
     create: true
     name: ai-gateway
     annotations:
       eks.amazonaws.com/role-arn: arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_ROLE_NAME

   extraEnvironmentVariables:
     - name: AWS_REGION
       value: us-east-1
   ```
For more information, see IAM roles for service accounts.
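The trust policy in this section is a template with four placeholders. As an illustrative sketch (the `irsa_trust_policy` helper is hypothetical and not part of GitLab or AWS tooling), you could render it programmatically to avoid copy-paste mistakes:

```python
import json


def irsa_trust_policy(account_id: str, region: str, oidc_id: str,
                      namespace: str, service_account: str = "ai-gateway") -> str:
    """Render the IRSA trust policy template from this section.

    Hypothetical helper; parameter names match the placeholders used
    in the documentation above.
    """
    issuer = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{issuer}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }
    return json.dumps(policy, indent=2)


# Write the rendered policy to the file passed to `aws iam create-role`.
with open("trust-policy.json", "w") as f:
    f.write(irsa_trust_policy("123456789012", "us-east-1", "EXAMPLE0123", "gitlab"))
```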
Docker deployments
Configure IAM credentials through environment variables when starting the AI Gateway container:
```shell
docker run -d \
  -e AWS_ACCESS_KEY_ID=your-access-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret-key \
  -e AWS_REGION=us-east-1 \
  -p 5052:5052 \
  registry.gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/model-gateway:self-hosted-vX.Y.Z-ee
```

The IAM user or role must have a policy similar to the one you would set in Amazon EKS with Helm Chart.
Kubernetes deployments
For Kubernetes clusters other than Amazon EKS, you can use Kubernetes secrets to store AWS credentials:
1. Create a Kubernetes secret:

   ```shell
   kubectl create secret generic aws-credentials \
     --from-literal=access-key-id=YOUR_ACCESS_KEY_ID \
     --from-literal=secret-access-key=YOUR_SECRET_ACCESS_KEY \
     -n YOUR_NAMESPACE
   ```

1. Configure the Helm chart to reference the secret:

   ```yaml
   extraEnvironmentVariables:
     - name: AWS_ACCESS_KEY_ID
       valueFrom:
         secretKeyRef:
           name: aws-credentials
           key: access-key-id
     - name: AWS_SECRET_ACCESS_KEY
       valueFrom:
         secretKeyRef:
           name: aws-credentials
           key: secret-access-key
     - name: AWS_REGION
       value: us-east-1
   ```
AWS Bedrock API keys
To use AWS Bedrock API keys as an alternative to IAM credentials:
1. Create a Kubernetes secret with the API key:

   ```shell
   kubectl create secret generic bedrock-api-key \
     --from-literal=token=YOUR_BEDROCK_API_KEY \
     -n YOUR_NAMESPACE
   ```

1. Configure the AI Gateway (add to your `values.yaml`):

   ```yaml
   extraEnvironmentVariables:
     - name: AWS_BEARER_TOKEN_BEDROCK
       valueFrom:
         secretKeyRef:
           name: bedrock-api-key
           key: token
     - name: AWS_REGION
       value: us-east-1
   ```
Private VPC endpoints
To use a private Bedrock endpoint in a VPC, set the AWS_BEDROCK_RUNTIME_ENDPOINT environment variable.
For Helm deployments:

```yaml
extraEnvironmentVariables:
  - name: AWS_BEDROCK_RUNTIME_ENDPOINT
    value: https://bedrock-runtime.us-east-1.amazonaws.com
```

For Docker deployments:

```shell
docker run -d \
  -e AWS_BEDROCK_RUNTIME_ENDPOINT=https://bedrock-runtime.us-east-1.amazonaws.com \
  -e AWS_REGION=us-east-1 \
  # ... other configuration
```

For VPC endpoints, use the format: https://vpce-{vpc-endpoint-id}-{service-name}.{region}.vpce.amazonaws.com
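Assembling the VPC endpoint URL from its parts can be sketched as follows. This is an illustrative helper (the `vpce_endpoint` name is ours), and it assumes the format string stated above; always verify the exact DNS name shown for your endpoint in the AWS console:

```python
def vpce_endpoint(vpc_endpoint_id: str, service_name: str, region: str) -> str:
    """Build a VPC endpoint URL following the format described above.

    Hypothetical helper; the actual DNS name for your endpoint is
    authoritative and may differ.
    """
    return f"https://vpce-{vpc_endpoint_id}-{service_name}.{region}.vpce.amazonaws.com"


print(vpce_endpoint("0abc123", "bedrock-runtime", "us-east-1"))
# https://vpce-0abc123-bedrock-runtime.us-east-1.vpce.amazonaws.com
```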
Configure authentication with Google Vertex AI
To use models from Google Vertex AI, you must authenticate your AI Gateway instance. You can use any of the following mechanisms:
- Set the following environment variables when running the AI Gateway container:

  ```shell
  GOOGLE_APPLICATION_CREDENTIALS=/path/to/application_default_credentials.json
  VERTEXAI_PROJECT=<gcp-project-id>
  VERTEXAI_LOCATION=global
  ```

- Run the AI Gateway container on Google Cloud Run and use the Cloud Run service account for Vertex AI access.
Related topics
- Supported models and hardware requirements documentation.
- Amazon Bedrock supported foundation models
- AWS IAM best practices
- Amazon Bedrock Security