Troubleshooting the GitLab chart
UPGRADE FAILED: “$name” has no deployed releases
This error occurs on your second install/upgrade if your initial install failed.
If your initial install completely failed, and GitLab was never operational, you should first purge the failed install before installing again.
helm uninstall <release-name>

If instead the initial install command timed out, but GitLab still came up successfully,
you can add the --force flag to the helm upgrade command to ignore the error
and attempt to update the release.
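For example, assuming a release named gitlab installed from the gitlab/gitlab chart, the forced upgrade might look like:

helm upgrade --force gitlab gitlab/gitlab --timeout 600s -f gitlab.yaml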
Otherwise, if you received this error after having previously had successful deploys of the GitLab chart, then you are encountering a bug. Please open an issue on our issue tracker, and also check out issue #630 where we recovered our CI server from this problem.
Error: this command needs 2 arguments: release name, chart path
An error like this could occur when you run helm upgrade
and there are some spaces in the parameters. In the following
example, Test Username is the culprit:
helm upgrade gitlab gitlab/gitlab --timeout 600s --set global.email.display_name=Test Username ...

To fix it, pass the parameters in single quotes:

helm upgrade gitlab gitlab/gitlab --timeout 600s --set global.email.display_name='Test Username' ...

Application containers constantly initializing
If you experience Sidekiq, Webservice, or other Rails based containers in a constant
state of Initializing, you’re likely waiting on the dependencies container to
pass.
If you check the logs of a given Pod specifically for the dependencies container,
you may see the following repeated:
Checking database connection and schema version
WARNING: This version of GitLab depends on gitlab-shell 8.7.1, ...
Database Schema
Current version: 0
Codebase version: 20190301182457

This is an indication that the migrations Job has not yet completed. The purpose
of this Job is to ensure that the database is seeded and all
relevant migrations are in place. The application containers are attempting to
wait for the database to be at or above their expected database version. This is
to ensure that the application does not malfunction due to the schema not matching the
expectations of the codebase.
- Find the migrations Job.

  kubectl get job -lapp=migrations

- Find the Pod being run by the Job.

  kubectl get pod -lbatch.kubernetes.io/job-name=<job-name>

- Examine the output, checking the STATUS column.

  If the STATUS is Running, continue. If the STATUS is Completed, the application containers should start shortly after the next check passes.

- Examine the logs from this pod.

  kubectl logs <pod-name>
Any failures during the run of this job should be addressed. These will block the use of the application until resolved. Possible problems are:
- Unreachable or failed authentication to the configured PostgreSQL database
- Unreachable or failed authentication to the configured Redis services
- Failure to reach a Gitaly instance
Applying configuration changes
The following command will perform the necessary operations to apply any updates made to gitlab.yaml:
helm upgrade <release name> <chart path> -f gitlab.yaml

Included GitLab Runner failing to register
This can happen when the runner registration token has been changed in GitLab. (This often happens after you have restored a backup)
- Find the new shared runner token located on the admin/runners webpage of your GitLab installation.

- Find the name of the existing runner token Secret stored in Kubernetes:

  kubectl get secrets | grep gitlab-runner-secret

- Delete the existing secret:

  kubectl delete secret <runner-secret-name>

- Create the new secret with two keys (runner-registration-token with your shared token, and an empty runner-token):

  kubectl create secret generic <runner-secret-name> --from-literal=runner-registration-token=<new-shared-runner-token> --from-literal=runner-token=""
Too many redirects
This can happen when you have TLS termination before the NGINX Ingress, and the tls-secrets are specified in the configuration.
- Update your values to set global.ingress.annotations."nginx.ingress.kubernetes.io/ssl-redirect": "false"

  Via a values file:

  # values.yaml
  global:
    ingress:
      annotations:
        "nginx.ingress.kubernetes.io/ssl-redirect": "false"

  Via the Helm CLI:

  helm ... --set-string global.ingress.annotations."nginx.ingress.kubernetes.io/ssl-redirect"=false

- Apply the change.
When using an external service for SSL termination, that service is responsible for redirecting to https (if so desired).
Upgrades fail with Immutable Field Error
spec.clusterIP
Prior to the 3.0.0 release of these charts, the spec.clusterIP property
had been populated into several Services
despite having no actual value (""). This was a bug, and causes problems with Helm 3’s three-way
merge of properties.
Once the chart was deployed with Helm 3, there would be no possible upgrade path unless you either
collected the clusterIP properties from the various Services and populated those into the values
provided to Helm, or removed the affected Services from Kubernetes.
The 3.0.0 release of this chart corrected this error, but it requires manual correction.
This can be solved by simply removing all of the affected services.
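Before deleting anything, you can check which Services are affected and what clusterIP values they currently hold. One illustrative check is:

kubectl get services -lrelease=RELEASE_NAME \
  -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP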
- Remove all affected services:

  kubectl delete services -lrelease=RELEASE_NAME

- Perform an upgrade via Helm.

- Future upgrades will not face this error.
This will change any dynamic value for the LoadBalancer for NGINX Ingress from this chart, if in use.
See global Ingress settings documentation for more
details regarding externalIP. You may be required to update DNS records!
spec.selector
Sidekiq pods did not receive a unique selector prior to chart release
3.0.0. The problems with this were documented in.
Upgrades to 3.0.0 using Helm will automatically delete the old Sidekiq deployments and create new ones by appending -v1 to the
name of the Sidekiq Deployments, HPAs, and Pods.
If you continue to run into this error on the Sidekiq deployment when installing 3.0.0, resolve these with the following
steps:
- Remove the old Sidekiq Deployments:

  kubectl delete deployment --cascade -lrelease=RELEASE_NAME,app=sidekiq

- Perform an upgrade via Helm.
cannot patch “RELEASE-NAME-cert-manager” with kind Deployment
Upgrading from CertManager version 0.10 introduced a number of
breaking changes. The old Custom Resource Definitions must be uninstalled
and removed from Helm’s tracking and then re-installed.
The Helm chart attempts to do this by default but if you encounter this error you may need to take manual action.
If this error message was encountered, then upgrading requires one more step than normal in order to ensure the new Custom Resource Definitions are actually applied to the deployment.
- Remove the old CertManager Deployment:

  kubectl delete deployments -l app=cert-manager --cascade

- Run the upgrade again. This time install the new Custom Resource Definitions:

  helm upgrade --install --values - YOUR-RELEASE-NAME gitlab/gitlab < <(helm get values YOUR-RELEASE-NAME)
cannot patch gitlab-kube-state-metrics with kind Deployment
Upgrading from Prometheus version 11.16.9 to 15.0.4 changes the selector labels
used on the kube-state-metrics Deployment,
which is disabled by default (prometheus.kubeStateMetrics.enabled=false).
If this error message is encountered, meaning prometheus.kubeStateMetrics.enabled=true, then upgrading
requires an additional step:
- Remove the old kube-state-metrics Deployment:

  kubectl delete deployments.apps -l app.kubernetes.io/instance=RELEASE_NAME,app.kubernetes.io/name=kube-state-metrics --cascade=orphan

- Perform an upgrade via Helm.
ImagePullBackOff, Failed to pull image and manifest unknown errors
If you are using global.gitlabVersion,
start by removing that property.
Check the version mappings between the chart and GitLab
and specify a compatible version of the gitlab/gitlab chart in your helm command.
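For example, to pin the chart to a known-compatible version (fill in the placeholder from the version mappings):

helm upgrade gitlab gitlab/gitlab --version <compatible-chart-version> -f gitlab.yaml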
UPGRADE FAILED: “cannot patch …” after helm 2to3 convert
This is a known issue. After migrating a Helm 2 release to Helm 3, the subsequent upgrades may fail. You can find the full explanation and workaround in Migrating from Helm v2 to Helm v3.
UPGRADE FAILED: type mismatch on mailroom: %!t(<nil>)
An error like this can happen if you do not provide a valid map for a key that expects a map.
For example, the configuration below will cause this error:
gitlab:
  mailroom:

To fix this, either:

- Provide a valid map for gitlab.mailroom.
- Remove the mailroom key entirely.
Note that for optional keys, an empty map ({}) is a valid value.
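For example, keeping the key with an empty map is valid:

gitlab:
  mailroom: {}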
Error: cannot drop view pg_stat_statements because extension pg_stat_statements requires it
You may face this error when restoring a backup on your Helm chart instance. Use the following steps as a workaround:
- Inside your toolbox pod, open the DB console:

  /srv/gitlab/bin/rails dbconsole -p

- Drop the extension:

  DROP EXTENSION pg_stat_statements;

- Perform the restoration process.

- After the restoration is complete, re-create the extension in the DB console:

  CREATE EXTENSION pg_stat_statements;
If you encounter the same issue with the pg_buffercache extension,
follow the same steps above to drop and re-create it.
You can find more details about this error in issue #2469.
Bundled PostgreSQL pod fails to start: database files are incompatible with server
The following error message may appear in the bundled PostgreSQL pod after upgrading to a new version of the GitLab Helm chart:
gitlab-postgresql FATAL: database files are incompatible with server
gitlab-postgresql DETAIL: The data directory was initialized by PostgreSQL version 11, which is not compatible with this version 12.7.

To address this, perform a Helm rollback to the previous version of the chart and then follow the steps in the upgrade guide to upgrade the bundled PostgreSQL version. Once PostgreSQL is properly upgraded, try the GitLab Helm chart upgrade again.
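The rollback can be performed with standard Helm commands, for example:

# Find the revision of the previously deployed chart version
helm history <release-name>

# Roll back to that revision
helm rollback <release-name> <revision>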
Bundled NGINX Ingress pod fails to start: Failed to watch *v1beta1.Ingress
The following error message may appear in the bundled NGINX Ingress controller pod if running Kubernetes version 1.22 or later:
Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource

To address this, ensure the Kubernetes version is 1.21 or older. See #2852 for more information regarding NGINX Ingress support for Kubernetes 1.22 or later.
Increased load on /api/v4/jobs/request endpoint
You may face this issue if the option workhorse.keywatcher was set to false for the deployment servicing /api/*.
Use the following steps to verify:
- Access the gitlab-workhorse container in the pod serving /api/*:

  kubectl exec -it --container=gitlab-workhorse <gitlab_api_pod> -- /bin/bash

- Inspect the file /srv/gitlab/config/workhorse-config.toml. The [redis] configuration might be missing:

  grep '\[redis\]' /srv/gitlab/config/workhorse-config.toml
If the [redis] configuration is not present, the workhorse.keywatcher flag was set to false during deployment,
which causes the extra load on the /api/v4/jobs/request endpoint. To fix this, enable the keywatcher in the
webservice chart:
workhorse:
  keywatcher: true

Git over SSH: the remote end hung up unexpectedly
Git operations over SSH might fail intermittently with the following error:
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

There are a number of potential causes for this error:
- Network timeouts:

  Git clients sometimes open a connection and leave it idling, like when compressing objects. Settings like
  timeout client in HAProxy might cause these idle connections to be terminated.

  You can set a keepalive in sshd:

  gitlab:
    gitlab-shell:
      config:
        clientAliveInterval: 15

- gitlab-shell memory:

  By default, the chart does not set a limit on GitLab Shell memory. If
  gitlab.gitlab-shell.resources.limits.memory is set too low, Git operations over SSH may fail with these errors.

  Run kubectl describe nodes to confirm that this is caused by memory limits rather than timeouts over the network.

  System OOM encountered, victim process: gitlab-shell
  Memory cgroup out of memory: Killed process 3141592 (gitlab-shell)
Error: kex_exchange_identification: Connection closed by remote host
The following error can appear in the GitLab Shell logs:
subcomponent":"ssh","time":"2025-02-21T19:07:52Z","message":"kex_exchange_identification: Connection closed by remote host\r"}This error is caused by OpenSSH sshd being unable to handle readiness and liveness probes. To resolve this error, use
gitlab-sshd instead by changing sshDaemon: openssh to sshDaemon: gitlab-ssd in configuration:
gitlab:
gitlab-shell:
sshDaemon: gitlab-sshdYAML configuration: mapping values are not allowed in this context
The following error message may appear when YAML configuration contains leading spaces:
template: /var/opt/gitlab/templates/workhorse-config.toml.tpl:16:98:
executing \"/var/opt/gitlab/templates/workhorse-config.toml.tpl\" at <data.YAML>:
error calling YAML:
yaml: line 2: mapping values are not allowed in this context

To address this, ensure that there are no leading spaces in the configuration.
For example, change this:

 key1: value1
 key2: value2

… to this:

key1: value1
key2: value2

TLS and certificates
If your GitLab instance needs to trust a private TLS certificate authority, GitLab might fail to handshake with other services like object storage, Elasticsearch, Jira, or Jenkins:
error: certificate verify failed (unable to get local issuer certificate)

Partial trust of certificates signed by private certificate authorities can occur if:
- The supplied certificates are not in separate files.
- The certificates init container doesn’t perform all the required steps.
Also, GitLab is mostly written in Ruby on Rails and Go, and each language’s TLS libraries work differently. This difference can result in issues like job logs failing to render in the GitLab UI but raw job logs downloading without issue.
Additionally, depending on the proxy_download configuration, your browser is
redirected to the object storage with no issues if the trust store is correctly configured.
At the same time, TLS handshakes by one or more GitLab components could still fail.
Certificate trust setup and troubleshooting
As part of troubleshooting certificate issues, be sure to:
- Create secrets for each certificate you need to trust.

- Provide only one certificate per file:

  kubectl create secret generic custom-ca --from-file=unique_name=/path/to/cert

  In this example, the certificate is stored using the key name unique_name.

If you supply a bundle or a chain, some GitLab components won’t work.
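If your CA material is currently in a single bundle file, a minimal sketch for splitting it into one certificate per file (using GNU csplit; the filenames shown are illustrative) is:

# Split bundle.pem into files custom-ca-NN, each holding one certificate
csplit -z -f custom-ca- bundle.pem '/-----BEGIN CERTIFICATE-----/' '{*}'

# Then create one secret per resulting file, for example:
kubectl create secret generic custom-ca-0 --from-file=custom_ca_0=custom-ca-00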
Query secrets with kubectl get secrets and kubectl describe secrets/secretname,
which shows the key name for the certificate under Data.
Supply additional certificates to trust using global.certificates.customCAs
in the chart globals.
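A minimal sketch of those values, assuming a secret named custom-ca as in the example above:

global:
  certificates:
    customCAs:
      - secret: custom-ca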
When a pod is deployed, an init container mounts the certificates and sets them up so the GitLab
components can use them. The init container is registry.gitlab.com/gitlab-org/build/cng/alpine-certificates.
Additional certificates are mounted into the container at /usr/local/share/ca-certificates,
using the secret key name as the certificate filename.
The init container runs /scripts/bundle-certificates (source).
In that script, update-ca-certificates:
- Copies custom certificates from /usr/local/share/ca-certificates to /etc/ssl/certs.

- Compiles a bundle ca-certificates.crt.

- Generates hashes for each certificate and creates a symlink using the hash, which is required for Rails. Certificate bundles are skipped with a warning:

  WARNING: unique_name does not contain exactly one certificate or CRL: skipping
Troubleshoot the init container’s status and logs. For example, to view the logs for the certificates init container and check for warnings:
kubectl logs gitlab-webservice-default-pod -c certificates

Check on the Rails console
Use the toolbox pod to verify if Rails trusts the certificates you supplied.
- Start a Rails console (replace <namespace> with the namespace where GitLab is installed):

  kubectl exec -ti $(kubectl get pod -n <namespace> -lapp=toolbox -o jsonpath='{.items[0].metadata.name}') -n <namespace> -- bash
  /srv/gitlab/bin/rails console

- Verify the location Rails checks for certificate authorities:

  OpenSSL::X509::DEFAULT_CERT_DIR

- Execute an HTTPS query in the Rails console:

  ## Configure a web server to connect to:
  uri = URI.parse("https://myservice.example.com")

  require 'openssl'
  require 'net/http'
  Rails.logger.level = 0
  OpenSSL.debug=1
  http = Net::HTTP.new(uri.host, uri.port)
  http.set_debug_output($stdout)
  http.use_ssl = true

  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  # http.verify_mode = OpenSSL::SSL::VERIFY_NONE # TLS verification disabled

  response = http.request(Net::HTTP::Get.new(uri.request_uri))
Troubleshoot the init container
Run the certificates container using Docker.
- Set up a directory structure and populate it with your certificates:

  mkdir -p etc/ssl/certs usr/local/share/ca-certificates

  # The secret name is: my-root-ca
  # The key name is: corporate_root
  kubectl get secret my-root-ca -ojsonpath='{.data.corporate_root}' | \
    base64 --decode > usr/local/share/ca-certificates/corporate_root

  # Check the certificate is correct:
  openssl x509 -in usr/local/share/ca-certificates/corporate_root -text -noout

- Determine the correct container version:

  kubectl get deployment -lapp=webservice -ojsonpath='{.items[0].spec.template.spec.initContainers[0].image}'

- Run the container, which performs the preparation of the etc/ssl/certs content:

  docker run -ti --rm \
    -v $(pwd)/etc/ssl/certs:/etc/ssl/certs \
    -v $(pwd)/usr/local/share/ca-certificates:/usr/local/share/ca-certificates \
    registry.gitlab.com/gitlab-org/build/cng/gitlab-base:v15.10.3

- Check your certificates have been correctly built:

  - etc/ssl/certs/corporate_root.pem should have been created.

  - There should be a hashed filename, which is a symlink to the certificate itself (such as etc/ssl/certs/1234abcd.0).

  - The file and the symbolic link should display with:

    ls -l etc/ssl/certs/ | grep corporate_root

    For example:

    lrwxrwxrwx 1 root root   20 Oct  7 11:34 28746b42.0 -> corporate_root.pem
    -rw-r--r-- 1 root root 1948 Oct  7 11:34 corporate_root.pem
308: Permanent Redirect causing a redirect loop
308: Permanent Redirect can happen if your Load Balancer is configured to send unencrypted traffic (HTTP) to NGINX.
Because NGINX defaults to redirecting HTTP to HTTPS, you may end up in a “redirect loop”.
To fix this, enable NGINX’s use-forwarded-headers setting.
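A sketch of that change for the bundled NGINX Ingress controller (use-forwarded-headers is an upstream ingress-nginx ConfigMap option):

nginx-ingress:
  controller:
    config:
      use-forwarded-headers: "true"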
“Invalid Word” errors in the nginx-controller logs and 404 errors
After upgrading to Helm chart 6.6 or later, you might experience 404 return
codes when visiting your GitLab or third-party domains for applications installed
in your cluster, and also see “invalid word” errors in the
gitlab-nginx-ingress-controller logs:
gitlab-nginx-ingress-controller-899b7d6bf-688hr controller W1116 19:03:13.162001 7 store.go:846] skipping ingress gitlab/gitlab-minio: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-688hr controller W1116 19:03:13.465487 7 store.go:846] skipping ingress gitlab/gitlab-registry: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:12.233577 6 store.go:846] skipping ingress gitlab/gitlab-kas: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:12.536534 6 store.go:846] skipping ingress gitlab/gitlab-webservice-default: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:12.848844 6 store.go:846] skipping ingress gitlab/gitlab-webservice-default-smartcard: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:13.161640 6 store.go:846] skipping ingress gitlab/gitlab-minio: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
gitlab-nginx-ingress-controller-899b7d6bf-lqcks controller W1116 19:03:13.465425 6 store.go:846] skipping ingress gitlab/gitlab-registry: nginx.ingress.kubernetes.io/configuration-snippet annotation contains invalid word proxy_pass
In that case, review your GitLab values and any third-party Ingress objects for the use
of configuration snippets.
You may need to adjust or modify the nginx-ingress.controller.config.annotation-value-word-blocklist
setting.
See Annotation value word blocklist for additional details.
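For example, one way to adjust the setting is to override the controller's blocklist so it no longer includes proxy_pass (the value shown is illustrative, based on the controller's default blocklist; tailor it to your own security requirements):

nginx-ingress:
  controller:
    config:
      annotation-value-word-blocklist: "load_module,lua_package,_by_lua,location,root,serviceaccount,{,},',\""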
Volume mount takes a long time
Mounting large volumes, such as the gitaly or toolbox chart volumes, can take a long time because Kubernetes
recursively changes the permissions of the volume’s contents to match the Pod’s securityContext.
Starting with Kubernetes 1.23 you can set the securityContext.fsGroupChangePolicy to OnRootMismatch to mitigate
this issue. This flag is supported by all GitLab subcharts.
For example for the Gitaly subchart:
gitlab:
  gitaly:
    securityContext:
      fsGroupChangePolicy: "OnRootMismatch"

See the Kubernetes documentation for more details.
For Kubernetes versions not supporting fsGroupChangePolicy you can mitigate the
issue by changing or fully deleting the settings for the securityContext.
gitlab:
  gitaly:
    securityContext:
      fsGroup: ""
      runAsUser: ""

The example syntax eliminates the securityContext setting entirely.
Setting securityContext: {} or securityContext: does not work due
to the way Helm merges default values with user provided configuration.
Intermittent 502 errors
When a Puma worker handling a request crosses the memory limit threshold, the worker is killed by the node’s OOMKiller.
However, killing the worker does not necessarily kill or restart the webservice pod itself, so the in-flight request returns a 502 error.
In the logs, this appears as a Puma worker being booted shortly after the 502 error is logged.
2024-01-19T14:12:08.949263522Z {"correlation_id":"XXXXXXXXXXXX","duration_ms":1261,"error":"badgateway: failed to receive response: context canceled"....
2024-01-19T14:12:24.214148186Z {"component": "gitlab","subcomponent":"puma.stdout","timestamp":"2024-01-19T14:12:24.213Z","pid":1,"message":"- Worker 2 (PID: 7414) booted in 0.84s, phase: 0"}

To solve this problem, raise memory limits for the webservice pods.
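A sketch of raising the limit through the webservice chart values (the value shown is illustrative; size it for your workload):

gitlab:
  webservice:
    resources:
      limits:
        memory: 3G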
Upgrade failed - cannot patch "gitlab-prometheus-server" with kind Deployment
With chart 9.0, we updated the Prometheus subchart to a new major version. The selector labels and version of Prometheus were changed and require manual intervention.
Please follow the migration guide to upgrade the Prometheus chart.
Toolbox backup failing on upload
A backup may fail when trying to upload to the object storage with an error like:
An error occurred (XAmzContentSHA256Mismatch) when calling the UploadPart operation: The Content-SHA256 you specified did not match what we received

This might be caused by an incompatibility of the awscli tool and your object
storage service. This issue has been reported when using Dell ECS S3 Storage.
To avoid this issue you can disable data integrity protection.
Webservice readiness probe fails
Beginning with GitLab chart version 9.2 (GitLab 18.2), dual stack support for both IPv4 and IPv6 is enabled by default. If you’re running a GitLab version prior to 18.2 with a custom monitoring IP allowlist, this may cause the Kubernetes probes for the webservice Pods to fail.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
[snip]
Warning Unhealthy 43m (x15 over 44m) kubelet Startup probe failed: HTTP probe failed with statuscode: 404

To fix the Webservice probes, either:

- Upgrade the Webservice image to match the chart version.
- Extend your monitoring allow list with the IPv6-mapped equivalent addresses
  (for example ::ffff:10.0.0.0 for 10.0.0.0).
- Explicitly configure the monitoring endpoint to listen on IPv4 only
  (gitlab.webservice.monitoring.listenAddr=0.0.0.0), as shown in the example after this list.
- Disable IP mapping on a node/kernel level.
invalid: spec.progressDeadlineSeconds
If using Helm v3.18.0, you’ll get this error when upgrading your chart:
Error: UPGRADE FAILED: cannot patch "gitlab-nginx-ingress-controller" with kind Deployment: Deployment.apps "gitlab-nginx-ingress-controller" is invalid: spec.progressDeadlineSeconds: Invalid value: 0: must be greater than minReadySeconds

To fix it, upgrade your Helm client to v3.18.1 or later. Alternatively, you can downgrade it to v3.17.x.
This is due to Helm issue 30878.