Monitoring Gitaly Cluster (Praefect)
To monitor Gitaly Cluster (Praefect), you can use Prometheus metrics. Two separate metrics endpoints are available from which metrics can be scraped:
- The default
/metrics
endpoint. /db_metrics
, which contains metrics that require database queries.
Default Prometheus /metrics
endpoint
The following metrics are available from the /metrics
endpoint:
-
gitaly_praefect_read_distribution
, a counter to track distribution of reads. It has two labels:virtual_storage
.storage
.
They reflect configuration defined for this instance of Praefect.
-
gitaly_praefect_replication_latency_bucket
, a histogram measuring the amount of time it takes for replication to complete after the replication job starts. -
gitaly_praefect_replication_delay_bucket
, a histogram measuring how much time passes between when the replication job is created and when it starts. -
gitaly_praefect_connections_total
, the total number of connections to Praefect. -
gitaly_praefect_method_types
, a count of accessor and mutator RPCs per node.
To monitor strong consistency, you can use the following Prometheus metrics:
gitaly_praefect_transactions_total
, the number of transactions created and voted on.gitaly_praefect_subtransactions_per_transaction_total
, the number of times nodes cast a vote for a single transaction. This can happen multiple times if multiple references are getting updated in a single transaction.gitaly_praefect_voters_per_transaction_total
: the number of Gitaly nodes taking part in a transaction.gitaly_praefect_transactions_delay_seconds
, the server-side delay introduced by waiting for the transaction to be committed.gitaly_hook_transaction_voting_delay_seconds
, the client-side delay introduced by waiting for the transaction to be committed.
To monitor repository verification, use the following Prometheus metrics:
gitaly_praefect_verification_jobs_dequeued_total
, the number of verification jobs picked up by the worker.gitaly_praefect_verification_jobs_completed_total
, the number of verification jobs completed by the worker. Theresult
label indicates the end result of the jobs:valid
indicates the expected replica existed on the storage.invalid
indicates the replica expected to exist did not exist on the storage.error
indicates the job failed and has to be retried.
gitaly_praefect_stale_verification_leases_released_total
, the number of stale verification leases released.
You can also monitor the Praefect logs.
Database metrics /db_metrics
endpoint
The following metrics are available from the /db_metrics
endpoint:
gitaly_praefect_unavailable_repositories
, the number of repositories that have no healthy, up to date replicas.gitaly_praefect_replication_queue_depth
, the number of jobs in the replication queue.gitaly_praefect_verification_queue_depth
, the total number of replicas pending verification.