Troubleshooting the GitLab chart
UPGRADE FAILED: “$name” has no deployed releases
This error occurs on your second install/upgrade if your initial install failed.
If your initial install completely failed, and GitLab was never operational, you should first purge the failed install before installing again.
helm uninstall <release-name>

If instead the initial install command timed out, but GitLab still came up successfully,
you can add the --force flag to the helm upgrade command to ignore the error
and attempt to update the release.
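For example, assuming a release named gitlab installed from the gitlab/gitlab chart, the forced upgrade might look like:

helm upgrade --force gitlab gitlab/gitlab --timeout 600s -f gitlab.yaml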
Otherwise, if you received this error after having previously had successful deploys of the GitLab chart, then you are encountering a bug. Please open an issue on our issue tracker, and also check out issue #630 where we recovered our CI server from this problem.
Error: this command needs 2 arguments: release name, chart path
An error like this could occur when you run helm upgrade
and there are some spaces in the parameters. In the following
example, Test Username is the culprit:
helm upgrade gitlab gitlab/gitlab --timeout 600s --set global.email.display_name=Test Username ...

To fix it, pass the parameters in single quotes:

helm upgrade gitlab gitlab/gitlab --timeout 600s --set global.email.display_name='Test Username' ...

Application containers constantly initializing
If you experience Sidekiq, Webservice, or other Rails based containers in a constant
state of Initializing, you’re likely waiting on the dependencies container to
pass.
If you check the logs of a given Pod specifically for the dependencies container,
you may see the following repeated:
Checking database connection and schema version
WARNING: This version of GitLab depends on gitlab-shell 8.7.1, ...
Database Schema
Current version: 0
Codebase version: 20190301182457

This is an indication that the migrations Job has not yet completed. The purpose
of this Job is to ensure that the database is seeded and all
relevant migrations are in place. The application containers are attempting to
wait for the database to be at or above their expected database version. This is
to ensure that the application does not malfunction due to the schema not matching the
expectations of the codebase.
- Find the migrations Job.

  kubectl get job -lapp=migrations

- Find the Pod being run by the Job.

  kubectl get pod -lbatch.kubernetes.io/job-name=<job-name>

- Examine the output, checking the STATUS column.

  If the STATUS is Running, continue. If the STATUS is Completed, the application containers should start shortly after the next check passes.

- Examine the logs from this pod.

  kubectl logs <pod-name>
Any failures during the run of this job should be addressed. These will block the use of the application until resolved. Possible problems are:
- Unreachable or failed authentication to the configured PostgreSQL database
- Unreachable or failed authentication to the configured Redis services
- Failure to reach a Gitaly instance
Applying configuration changes
The following command will perform the necessary operations to apply any updates made to gitlab.yaml:
helm upgrade <release name> <chart path> -f gitlab.yaml

Included GitLab Runner failing to register
This can happen when the runner registration token has been changed in GitLab. (This often happens after you have restored a backup)
- Find the new shared runner token located on the admin/runners webpage of your GitLab installation.

- Find the name of the existing runner token Secret stored in Kubernetes:

  kubectl get secrets | grep gitlab-runner-secret

- Delete the existing secret:

  kubectl delete secret <runner-secret-name>

- Create the new secret with two keys (runner-registration-token with your shared token, and an empty runner-token):

  kubectl create secret generic <runner-secret-name> --from-literal=runner-registration-token=<new-shared-runner-token> --from-literal=runner-token=""
Too many redirects
This can happen when you have TLS termination before the NGINX Ingress, and the tls-secrets are specified in the configuration.
- Update your values to set global.ingress.annotations."nginx.ingress.kubernetes.io/ssl-redirect": "false"

  Via a values file:

  # values.yaml
  global:
    ingress:
      annotations:
        "nginx.ingress.kubernetes.io/ssl-redirect": "false"

  Via the Helm CLI:

  helm ... --set-string global.ingress.annotations."nginx.ingress.kubernetes.io/ssl-redirect"=false

- Apply the change.
When using an external service for SSL termination, that service is responsible for redirecting to https (if so desired).
Upgrades fail with Immutable Field Error
spec.clusterIP
Prior to the 3.0.0 release of these charts, the spec.clusterIP property
had been populated into several Services
despite having no actual value (""). This was a bug, and causes problems with Helm 3’s three-way
merge of properties.
Once the chart was deployed with Helm 3, there would be no possible upgrade path unless you either
collected the clusterIP properties from the various Services and populated those into the values
provided to Helm, or removed the affected Services from Kubernetes.
The 3.0.0 release of this chart corrected this error, but it requires manual correction.
This can be solved by simply removing all of the affected services.
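Before deleting anything, you can check which Services are affected and what clusterIP values they currently hold. One illustrative check is:

kubectl get services -lrelease=RELEASE_NAME \
  -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP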
- Remove all affected services:

  kubectl delete services -lrelease=RELEASE_NAME

- Perform an upgrade via Helm.

- Future upgrades will not face this error.
This will change any dynamic value for the LoadBalancer for NGINX Ingress from this chart, if in use.
See global Ingress settings documentation for more
details regarding externalIP. You may be required to update DNS records!
spec.selector
Sidekiq pods did not receive a unique selector prior to chart release
3.0.0. The problems with this were documented in.
Upgrades to 3.0.0 using Helm will automatically delete the old Sidekiq deployments and create new ones by appending -v1 to the
name of the Sidekiq Deployments, HPAs, and Pods.
If you continue to run into this error on the Sidekiq deployment when installing 3.0.0, resolve these with the following
steps:
- Remove the old Sidekiq Deployments:

  kubectl delete deployment --cascade -lrelease=RELEASE_NAME,app=sidekiq

- Perform an upgrade via Helm.
cannot patch “RELEASE-NAME-cert-manager” with kind Deployment
Upgrading from CertManager version 0.10 introduced a number of
breaking changes. The old Custom Resource Definitions must be uninstalled
and removed from Helm’s tracking and then re-installed.
The Helm chart attempts to do this by default but if you encounter this error you may need to take manual action.
If this error message was encountered, then upgrading requires one more step than normal in order to ensure the new Custom Resource Definitions are actually applied to the deployment.
- Remove the old CertManager Deployment:

  kubectl delete deployments -l app=cert-manager --cascade

- Run the upgrade again. This time install the new Custom Resource Definitions:

  helm upgrade --install --values - YOUR-RELEASE-NAME gitlab/gitlab < <(helm get values YOUR-RELEASE-NAME)
cannot patch gitlab-kube-state-metrics with kind Deployment
Upgrading from Prometheus version 11.16.9 to 15.0.4 changes the selector labels
used on the kube-state-metrics Deployment,
which is disabled by default (prometheus.kubeStateMetrics.enabled=false).
If this error message is encountered, meaning prometheus.kubeStateMetrics.enabled=true, then upgrading
requires an additional step:
- Remove the old kube-state-metrics Deployment:

  kubectl delete deployments.apps -l app.kubernetes.io/instance=RELEASE_NAME,app.kubernetes.io/name=kube-state-metrics --cascade=orphan

- Perform an upgrade via Helm.
ImagePullBackOff, Failed to pull image and manifest unknown errors
If you are using global.gitlabVersion,
start by removing that property.
Check the version mappings between the chart and GitLab
and specify a compatible version of the gitlab/gitlab chart in your helm command.
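For example, to pin the chart to a known-compatible version (fill in the placeholder from the version mappings):

helm upgrade gitlab gitlab/gitlab --version <compatible-chart-version> -f gitlab.yaml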
UPGRADE FAILED: “cannot patch …” after helm 2to3 convert
This is a known issue. After migrating a Helm 2 release to Helm 3, the subsequent upgrades may fail. You can find the full explanation and workaround in Migrating from Helm v2 to Helm v3.
UPGRADE FAILED: type mismatch on mailroom: %!t(<nil>)
An error like this can happen if you do not provide a valid map for a key that expects a map.
For example, the configuration below will cause this error:
gitlab:
  mailroom:

To fix this, either:

- Provide a valid map for gitlab.mailroom.
- Remove the mailroom key entirely.
Note that for optional keys, an empty map ({}) is a valid value.
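For example, keeping the key with an empty map is valid:

gitlab:
  mailroom: {}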
Error: cannot drop view pg_stat_statements because extension pg_stat_statements requires it
You may face this error when restoring a backup on your Helm chart instance. Use the following steps as a workaround:
- Inside your toolbox pod, open the DB console:

  /srv/gitlab/bin/rails dbconsole -p

- Drop the extension:

  DROP EXTENSION pg_stat_statements;

- Perform the restoration process.

- After the restoration is complete, re-create the extension in the DB console:

  CREATE EXTENSION pg_stat_statements;
If you encounter the same issue with the pg_buffercache extension,
follow the same steps above to drop and re-create it.
You can find more details about this error in issue #2469.
Bundled PostgreSQL pod fails to start: database files are incompatible with server
The following error message may appear in the bundled PostgreSQL pod after upgrading to a new version of the GitLab Helm chart:
gitlab-postgresql FATAL: database files are incompatible with server
gitlab-postgresql DETAIL: The data directory was initialized by PostgreSQL version 11, which is not compatible with this version 12.7.

To address this, perform a Helm rollback to the previous version of the chart and then follow the steps in the upgrade guide to upgrade the bundled PostgreSQL version. Once PostgreSQL is properly upgraded, try the GitLab Helm chart upgrade again.
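The rollback can be performed with standard Helm commands, for example:

# Find the revision of the previously deployed chart version
helm history <release-name>

# Roll back to that revision
helm rollback <release-name> <revision>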
Bundled NGINX Ingress pod fails to start: Failed to watch *v1beta1.Ingress
The following error message may appear in the bundled NGINX Ingress controller pod if running Kubernetes version 1.22 or later:
Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource

To address this, ensure the Kubernetes version is 1.21 or older. See #2852 for more information regarding NGINX Ingress support for Kubernetes 1.22 or later.
Increased load on /api/v4/jobs/request endpoint
You may face this issue if the option workhorse.keywatcher was set to false for the deployment servicing /api/*.
Use the following steps to verify:
- Access the gitlab-workhorse container in the pod serving /api/*:

  kubectl exec -it --container=gitlab-workhorse <gitlab_api_pod> -- /bin/bash

- Inspect the file /srv/gitlab/config/workhorse-config.toml. The [redis] configuration might be missing:

  grep '\[redis\]' /srv/gitlab/config/workhorse-config.toml
If the [redis] configuration is not present, the workhorse.keywatcher flag was set to false during deployment,
which causes the extra load on the /api/v4/jobs/request endpoint. To fix this, enable the keywatcher in the
webservice chart:
workhorse:
  keywatcher: true

Git over SSH: the remote end hung up unexpectedly
Git operations over SSH might fail intermittently with the following error:
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

There are a number of potential causes for this error:
- Network timeouts:

  Git clients sometimes open a connection and leave it idling, like when compressing objects. Settings like
  timeout client in HAProxy might cause these idle connections to be terminated.

  You can set a keepalive in sshd:

  gitlab:
    gitlab-shell:
      config:
        clientAliveInterval: 15

- gitlab-shell memory:

  By default, the chart does not set a limit on GitLab Shell memory. If
  gitlab.gitlab-shell.resources.limits.memory is set too low, Git operations over SSH may fail with these errors.

  Run kubectl describe nodes to confirm that this is caused by memory limits rather than timeouts over the network.

  System OOM encountered, victim process: gitlab-shell
  Memory cgroup out of memory: Killed process 3141592 (gitlab-shell)
Error: kex_exchange_identification: Connection closed by remote host
The following error can appear in the GitLab Shell logs:
subcomponent":"ssh","time":"2025-02-21T19:07:52Z","message":"kex_exchange_identification: Connection closed by remote host\r"}This error is caused by OpenSSH sshd being unable to handle readiness and liveness probes. To resolve this error, use
gitlab-sshd instead by changing sshDaemon: openssh to sshDaemon: gitlab-ssd in configuration:
gitlab:
gitlab-shell:
sshDaemon: gitlab-sshdYAML configuration: mapping values are not allowed in this context
The following error message may appear when YAML configuration contains leading spaces:
template: /var/opt/gitlab/templates/workhorse-config.toml.tpl:16:98:
executing \"/var/opt/gitlab/templates/workhorse-config.toml.tpl\" at <data.YAML>:
error calling YAML:
yaml: line 2: mapping values are not allowed in this context

To address this, ensure that there are no leading spaces in the configuration.
For example, change this:

 key1: value1
 key2: value2

… to this:

key1: value1
key2: value2

TLS and certificates
If your GitLab instance needs to trust a private TLS certificate authority, GitLab might fail to handshake with other services like object storage, Elasticsearch, Jira, or Jenkins:
error: certificate verify failed (unable to get local issuer certificate)

Partial trust of certificates signed by private certificate authorities can occur if:
- The supplied certificates are not in separate files.
- The certificates init container doesn’t perform all the required steps.
Also, GitLab is mostly written in Ruby on Rails and Go, and each language’s TLS libraries work differently. This difference can result in issues like job logs failing to render in the GitLab UI but raw job logs downloading without issue.
Additionally, depending on the proxy_download configuration, your browser is
redirected to the object storage with no issues if the trust store is correctly configured.
At the same time, TLS handshakes by one or more GitLab components could still fail.
Certificate trust setup and troubleshooting
As part of troubleshooting certificate issues, be sure to:
- Create secrets for each certificate you need to trust.

- Provide only one certificate per file:

  kubectl create secret generic custom-ca --from-file=unique_name=/path/to/cert

  In this example, the certificate is stored using the key name unique_name.

If you supply a bundle or a chain, some GitLab components won’t work.
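If your CA material is currently in a single bundle file, a minimal sketch for splitting it into one certificate per file (using GNU csplit; the filenames shown are illustrative) is:

# Split bundle.pem into files custom-ca-NN, each holding one certificate
csplit -z -f custom-ca- bundle.pem '/-----BEGIN CERTIFICATE-----/' '{*}'

# Then create one secret per resulting file, for example:
kubectl create secret generic custom-ca-0 --from-file=custom_ca_0=custom-ca-00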
Query secrets with kubectl get secrets and kubectl describe secrets/secretname,
which shows the key name for the certificate under Data.
Supply additional certificates to trust using global.certificates.customCAs
in the chart globals.
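A minimal sketch of those values, assuming a secret named custom-ca as in the example above:

global:
  certificates:
    customCAs:
      - secret: custom-ca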
When a pod is deployed, an init container mounts the certificates and sets them up so the GitLab
components can use them. The init container is registry.gitlab.com/gitlab-org/build/cng/alpine-certificates.
Additional certificates are mounted into the container at /usr/local/share/ca-certificates,
using the secret key name as the certificate filename.
The init container runs /scripts/bundle-certificates (source).
In that script, update-ca-certificates:
- Copies custom certificates from /usr/local/share/ca-certificates to /etc/ssl/certs.

- Compiles a bundle ca-certificates.crt.

- Generates hashes for each certificate and creates a symlink using the hash, which is required for Rails. Certificate bundles are skipped with a warning:

  WARNING: unique_name does not contain exactly one certificate or CRL: skipping
Troubleshoot the init container’s status and logs. For example, to view the logs for the certificates init container and check for warnings:
kubectl logs gitlab-webservice-default-pod -c certificates

Check on the Rails console
Use the toolbox pod to verify if Rails trusts the certificates you supplied.
- Start a Rails console (replace <namespace> with the namespace where GitLab is installed):

  kubectl exec -ti $(kubectl get pod -n <namespace> -lapp=toolbox -o jsonpath='{.items[0].metadata.name}') -n <namespace> -- bash
  /srv/gitlab/bin/rails console

- Verify the location Rails checks for certificate authorities:

  OpenSSL::X509::DEFAULT_CERT_DIR

- Execute an HTTPS query in the Rails console:

  ## Configure a web server to connect to:
  uri = URI.parse("https://myservice.example.com")

  require 'openssl'
  require 'net/http'
  Rails.logger.level = 0
  OpenSSL.debug=1
  http = Net::HTTP.new(uri.host, uri.port)
  http.set_debug_output($stdout)
  http.use_ssl = true

  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  # http.verify_mode = OpenSSL::SSL::VERIFY_NONE # TLS verification disabled

  response = http.request(Net::HTTP::Get.new(uri.request_uri))
Troubleshoot the init container
Run the certificates container using Docker.
- Set up a directory structure and populate it with your certificates:

  mkdir -p etc/ssl/certs usr/local/share/ca-certificates

  # The secret name is: my-root-ca
  # The key name is: corporate_root
  kubectl get secret my-root-ca -ojsonpath='{.data.corporate_root}' | \
    base64 --decode > usr/local/share/ca-certificates/corporate_root

  # Check the certificate is correct:
  openssl x509 -in usr/local/share/ca-certificates/corporate_root -text -noout

- Determine the correct container version:

  kubectl get deployment -lapp=webservice -ojsonpath='{.items[0].spec.template.spec.initContainers[0].image}'

- Run the container, which performs the preparation of the etc/ssl/certs content:

  docker run -ti --rm \
    -v $(pwd)/etc/ssl/certs:/etc/ssl/certs \
    -v $(pwd)/usr/local/share/ca-certificates:/usr/local/share/ca-certificates \
    registry.gitlab.com/gitlab-org/build/cng/gitlab-base:v15.10.3

- Check your certificates have been correctly built:

  - etc/ssl/certs/corporate_root.pem should have been created.

  - There should be a hashed filename, which is a symlink to the certificate itself (such as etc/ssl/certs/1234abcd.0).

  - The file and the symbolic link should display with:

    ls -l etc/ssl/certs/ | grep corporate_root

    For example:

    lrwxrwxrwx 1 root root   20 Oct  7 11:34 28746b42.0 -> corporate_root.pem
    -rw-r--r-- 1 root root 1948 Oct  7 11:34 corporate_root.pem
308: Permanent Redirect causing a redirect loop
308: Permanent Redirect can happen if your Load Balancer is configured to send unencrypted traffic (HTTP) to NGINX.
Because NGINX defaults to redirecting HTTP to HTTPS, you may end up in a “redirect loop”.
To fix this, enable NGINX’s use-forwarded-headers setting.
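A sketch of that change for the bundled NGINX Ingress controller (use-forwarded-headers is an upstream ingress-nginx ConfigMap option):

nginx-ingress:
  controller:
    config:
      use-forwarded-headers: "true"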
“Invalid Word” errors in the nginx-controller logs and 404 errors
After upgrading to Helm chart 6.6 or later, you might experience 404 return
codes when visiting your GitLab or third-party domains for applications installed
in your cluster, and also see “invalid word” errors in the
gitlab-nginx-ingress-controller logs:
gitlab-nginx-ingress-controller-899b7d6bf-688hr controller W1116 19:03:13.162001 7 store.go:846] skipping ingress gitlab/gitlab-minio: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-688hr controller W1116 19:03:13.465487 7 store.go:846] skipping ingress gitlab/gitlab-registry: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:12.233577 6 store.go:846] skipping ingress gitlab/gitlab-kas: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:12.536534 6 store.go:846] skipping ingress gitlab/gitlab-webservice-default: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:12.848844 6 store.go:846] skipping ingress gitlab/gitlab-webservice-default-smartcard: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:13.161640 6 store.go:846] skipping ingress gitlab/gitlab-minio: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:13.465425 6 store.go:846] skipping ingress gitlab/gitlab-registry: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
In that case, review your GitLab values and any third-party Ingress objects for the use
of configuration snippets.
You may need to adjust or modify the nginx-ingress.controller.config.annotation-value-word-blocklist
setting.
See Annotation value word blocklist for additional details.
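For example, one way to adjust the setting is to override the controller's blocklist so it no longer includes proxy_pass (the value shown is illustrative, based on the controller's default blocklist; tailor it to your own security requirements):

nginx-ingress:
  controller:
    config:
      annotation-value-word-blocklist: "load_module,lua_package,_by_lua,location,root,serviceaccount,{,},',\""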
Volume mount takes a long time
Mounting large volumes, such as the gitaly or toolbox chart volumes, can take a long time because Kubernetes
recursively changes the permissions of the volume’s contents to match the Pod’s securityContext.
Starting with Kubernetes 1.23 you can set the securityContext.fsGroupChangePolicy to OnRootMismatch to mitigate
this issue. This flag is supported by all GitLab subcharts.
For example for the Gitaly subchart:
gitlab:
  gitaly:
    securityContext:
      fsGroupChangePolicy: "OnRootMismatch"

See the Kubernetes documentation for more details.
For Kubernetes versions not supporting fsGroupChangePolicy you can mitigate the
issue by changing or fully deleting the settings for the securityContext.
gitlab:
  gitaly:
    securityContext:
      fsGroup: ""
      runAsUser: ""

The example syntax eliminates the securityContext setting entirely.
Setting securityContext: {} or securityContext: does not work due
to the way Helm merges default values with user provided configuration.
Intermittent 502 errors
When a Puma worker handling a request crosses the memory limit threshold, the worker is killed by the node’s OOMKiller.
However, killing the worker does not necessarily kill or restart the webservice pod itself, so the in-flight request returns a 502 error.
In the logs, this appears as a Puma worker being booted shortly after the 502 error is logged.
2024-01-19T14:12:08.949263522Z {"correlation_id":"XXXXXXXXXXXX","duration_ms":1261,"error":"badgateway: failed to receive response: context canceled"....
2024-01-19T14:12:24.214148186Z {"component": "gitlab","subcomponent":"puma.stdout","timestamp":"2024-01-19T14:12:24.213Z","pid":1,"message":"- Worker 2 (PID: 7414) booted in 0.84s, phase: 0"}

To solve this problem, raise memory limits for the webservice pods.
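A sketch of raising the limit through the webservice chart values (the value shown is illustrative; size it for your workload):

gitlab:
  webservice:
    resources:
      limits:
        memory: 3G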
Upgrade failed - cannot patch "gitlab-prometheus-server" with kind Deployment
With chart 9.0, we updated the Prometheus subchart to a new major version. The selector labels and version of Prometheus were changed and require manual intervention.
Please follow the migration guide to upgrade the Prometheus chart.
Toolbox backup failing on upload
A backup may fail when trying to upload to the object storage with an error like:
An error occurred (XAmzContentSHA256Mismatch) when calling the UploadPart operation: The Content-SHA256 you specified did not match what we received

This might be caused by an incompatibility of the awscli tool and your object
storage service. This issue has been reported when using Dell ECS S3 Storage.
To avoid this issue you can disable data integrity protection.
Webservice readiness probe fails
Beginning with GitLab chart version 9.2 (GitLab 18.2), dual stack support for both IPv4 and IPv6 is enabled by default. If you’re running a GitLab version prior to 18.2 with a custom monitoring IP allowlist, this may cause the Kubernetes probes for the webservice Pods to fail.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
[snip]
Warning Unhealthy 43m (x15 over 44m) kubelet Startup probe failed: HTTP probe failed with statuscode: 404

To fix the Webservice probes, either:

- Upgrade the Webservice image to match the chart version.
- Extend your monitoring allow list with the IPv6-mapped equivalent addresses
  (for example ::ffff:10.0.0.0 for 10.0.0.0).
- Explicitly configure the monitoring endpoint to listen on IPv4 only
  (gitlab.webservice.monitoring.listenAddr=0.0.0.0), as shown in the example after this list.
- Disable IP mapping on a node/kernel level.
invalid: spec.progressDeadlineSeconds
If using Helm v3.18.0, you’ll get this error when upgrading your chart:
Error: UPGRADE FAILED: cannot patch "gitlab-nginx-ingress-controller" with kind Deployment: Deployment.apps "gitlab-nginx-ingress-controller" is invalid: spec.progressDeadlineSeconds: Invalid value: 0: must be greater than minReadySeconds

To fix it, upgrade your Helm client to v3.18.1 or later. Alternatively, you can downgrade it to v3.17.x.
This is due to Helm issue 30878.