Zoekt tasks SLIs (service level indicators)
The Zoekt tasks SLIs track the performance and reliability of asynchronous indexing tasks for Zoekt code search.
Metrics
The following metrics are emitted for Zoekt task processing:
Request rate
- gitlab_sli_search_zoekt_tasks_requests_total: Counter of tasks added to the Zoekt indexing queue; the request rate is derived from it. This metric increments when tasks are created via Search::Zoekt::Repository#create_bulk_tasks.
Error rate
- gitlab_sli_search_zoekt_tasks_total: Counter for total number of task completion attempts.
- gitlab_sli_search_zoekt_tasks_error_total: Counter for tasks that reached final failure state (after retries exhausted). This metric only increments when a task transitions to the :failed state, not during intermediate retries.
Apdex (Application Performance Index)
The Apdex SLI measures task completion performance with a 30-minute (1800 second) threshold. This threshold aligns with the indexing timeout to ensure consistency with task execution limits.
The following metrics track Apdex:
- gitlab_sli_search_zoekt_tasks_apdex_total: Counter for total number of completed tasks.
- gitlab_sli_search_zoekt_tasks_apdex_success_total: Counter for tasks that completed within the 30-minute threshold.
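As a rough illustration of how the two Apdex counters relate, the following Python sketch classifies a handful of completed-task durations against the 1800-second threshold. It is a standalone model, not GitLab code: the function name and the sample durations are hypothetical, and treating a task that takes exactly 1800 seconds as within the threshold is an assumption.

```python
APDEX_THRESHOLD_SECONDS = 1800  # 30-minute threshold described above


def apdex_counters(durations_seconds):
    """Return (apdex_total, apdex_success_total) for completed-task durations."""
    total = len(durations_seconds)
    # Assumption: a duration equal to the threshold still counts as a success.
    success = sum(1 for d in durations_seconds if d <= APDEX_THRESHOLD_SECONDS)
    return total, success


# Hypothetical durations, in seconds, for four completed tasks.
durations = [12.5, 640.0, 1800.0, 2400.0]
total, success = apdex_counters(durations)

# The Apdex ratio is the share of completed tasks that met the threshold.
apdex_ratio = success / total if total else None
print(total, success, apdex_ratio)
```

Only the 2400-second task misses the threshold in this sample, so three of the four tasks count toward the success counter.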
Labels
All metrics include the following labels for detailed analysis:
- zoekt_node: The Zoekt node identifier handling the task (from zoekt_node_id)
- task_type: The operation type, such as:
  - index_repo: Full repository indexing
  - delete_repo: Repository deletion from index
  - Other task-specific operations
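Because every metric carries both labels, each counter is split into one series per (zoekt_node, task_type) pair. The Python sketch below models that with plain dictionaries and then collapses the task_type dimension, which is roughly what a `sum by (zoekt_node)` aggregation does in PromQL. The node names and counter values are made up for illustration, and the sketch works on raw counter values rather than per-second rates.

```python
from collections import defaultdict

# Hypothetical counter values keyed by (zoekt_node, task_type).
error_total = {
    ("node-1", "index_repo"): 3,
    ("node-1", "delete_repo"): 1,
    ("node-2", "index_repo"): 0,
}
tasks_total = {
    ("node-1", "index_repo"): 120,
    ("node-1", "delete_repo"): 20,
    ("node-2", "index_repo"): 80,
}


def sum_by_node(samples):
    """Aggregate samples over task_type, keeping only the zoekt_node label."""
    aggregated = defaultdict(int)
    for (node, _task_type), value in samples.items():
        aggregated[node] += value
    return dict(aggregated)


errors = sum_by_node(error_total)
totals = sum_by_node(tasks_total)

# Error ratio per node; nodes with no recorded attempts are skipped.
error_ratio = {
    node: errors.get(node, 0) / count for node, count in totals.items() if count
}
print(error_ratio)
```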
Example Prometheus queries
Overall task success rate
sum(rate(gitlab_sli_search_zoekt_tasks_apdex_success_total[5m]))
/
sum(rate(gitlab_sli_search_zoekt_tasks_apdex_total[5m]))
Error rate by node
sum by (zoekt_node) (
rate(gitlab_sli_search_zoekt_tasks_error_total[5m])
)
/
sum by (zoekt_node) (
rate(gitlab_sli_search_zoekt_tasks_total[5m])
)
Task throughput by type
sum by (task_type) (
rate(gitlab_sli_search_zoekt_tasks_requests_total[5m])
)
Slow tasks (exceeding Apdex threshold)
rate(gitlab_sli_search_zoekt_tasks_apdex_total[5m])
-
rate(gitlab_sli_search_zoekt_tasks_apdex_success_total[5m])
Per-node, per-type error rate
sum by (zoekt_node, task_type) (
rate(gitlab_sli_search_zoekt_tasks_error_total[5m])
)
Implementation details
The SLI is defined in Gitlab::Metrics::ZoektTasksSlis and instrumented at key
points in the task lifecycle:
- Request rate: Incremented when tasks are created in Search::Zoekt::Repository#create_bulk_tasks
- Error rate: Incremented when tasks reach final failure in Search::Zoekt::CallbackService#process_failure
- Apdex: Recorded when tasks complete successfully in Search::Zoekt::CallbackService#process_zoekt_success
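The three instrumentation points above can be sketched as a small in-memory model. This is a hypothetical Python illustration, not the Ruby code in Gitlab::Metrics::ZoektTasksSlis: the class and method names are invented, and it assumes that both a final failure and a success increment the attempt total, which this page does not state explicitly.

```python
from collections import Counter

APDEX_THRESHOLD_SECONDS = 1800  # 30-minute Apdex threshold from this page


class ZoektTaskSliModel:
    """Toy in-memory model of the three SLIs (hypothetical, not GitLab code)."""

    def __init__(self):
        # Keys are (metric suffix, zoekt_node, task_type) tuples.
        self.counters = Counter()

    def _increment(self, metric, zoekt_node, task_type):
        self.counters[(metric, zoekt_node, task_type)] += 1

    def task_created(self, zoekt_node, task_type):
        # Request rate: one increment per task added to the queue.
        self._increment("requests_total", zoekt_node, task_type)

    def task_retried(self, zoekt_node, task_type):
        # Intermediate retries leave every counter unchanged.
        pass

    def task_failed(self, zoekt_node, task_type):
        # Error rate: counted once, when the task reaches the final failed state.
        # Assumption: the attempt total is incremented at the same point.
        self._increment("total", zoekt_node, task_type)
        self._increment("error_total", zoekt_node, task_type)

    def task_succeeded(self, zoekt_node, task_type, duration_seconds):
        # Assumption: a success also counts toward the attempt total.
        self._increment("total", zoekt_node, task_type)
        self._increment("apdex_total", zoekt_node, task_type)
        if duration_seconds <= APDEX_THRESHOLD_SECONDS:
            self._increment("apdex_success_total", zoekt_node, task_type)


sli = ZoektTaskSliModel()
key = ("node-1", "index_repo")  # hypothetical label values

for _ in range(3):
    sli.task_created(*key)
sli.task_succeeded(*key, duration_seconds=95.0)    # within the threshold
sli.task_succeeded(*key, duration_seconds=2100.0)  # slower than the threshold
sli.task_retried(*key)                             # not counted as an error
sli.task_failed(*key)                              # final failure

for metric in ("requests_total", "total", "error_total",
               "apdex_total", "apdex_success_total"):
    print(metric, sli.counters[(metric, *key)])
```

In this model a failed task never reaches the Apdex counters, which matches the statement that Apdex is recorded only on successful completion.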
For more information about Application SLIs, see the Application SLI framework documentation.