Testing AI features
This document highlights AI-specific testing considerations that complement GitLab's standard testing guidelines. It focuses on the challenges AI features bring to testing, such as non-deterministic responses from third-party providers. Examples are included for each testing level.
AI-powered features depend on system components outside the GitLab monolith, such as the AI Gateway and IDE extensions. In addition to these guidelines, consult any testing guidelines documented in each component project.
Unit testing
Follow standard unit testing guidelines. For AI features, always mock third-party AI provider calls to ensure fast, reliable tests.
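For example, here is a minimal sketch of mocking the provider at the client boundary. The `MyFeature::Suggestion` and `Ai::ProviderClient` names are illustrative placeholders, not real GitLab classes:

```ruby
# Sketch only: mock the AI provider client so the unit test never makes a network call.
# `MyFeature::Suggestion` and `Ai::ProviderClient` are hypothetical names for illustration.
RSpec.describe MyFeature::Suggestion do
  it 'builds a suggestion from the mocked provider response' do
    client = instance_double('Ai::ProviderClient', complete: 'def hello; end')
    allow(Ai::ProviderClient).to receive(:new).and_return(client)

    suggestion = described_class.new(prefix: 'def hel').generate

    expect(suggestion).to eq('def hello; end')
  end
end
```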
Unit test examples
- GitLab: `ee/spec/lib/code_suggestions/tasks/code_completion_spec.rb`
- VS Code extension: `code_suggestions/code_suggestions.test.ts`
Integration tests
Use integration tests to verify request construction and response handling for AI providers. Mock AI provider responses to ensure predictable, fast tests that handle various responses, errors, and status codes.
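As a sketch, assuming WebMock is available for stubbing outbound HTTP: the provider URL, API path, payload shape, and expected status below are illustrative placeholders, not the real provider contract.

```ruby
# Sketch only: stub the provider's HTTP endpoint to exercise error handling.
# The provider URL, API path, and expected status are hypothetical placeholders.
RSpec.describe 'AI completion endpoint', type: :request do
  let_it_be(:user) { create(:user) }

  it 'translates a provider rate-limit error into a client-facing error' do
    stub_request(:post, 'https://ai-provider.example.com/v1/completions')
      .to_return(
        status: 429,
        body: { error: 'rate limited' }.to_json,
        headers: { 'Content-Type' => 'application/json' }
      )

    post api('/my_ai_feature/completions', user), params: { prefix: 'def hel' }

    expect(response).to have_gitlab_http_status(:too_many_requests)
  end
end
```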
Integration test examples
- GitLab: `ee/spec/requests/api/code_suggestions_spec.rb`
- VS Code extension: `main/test/integration/chat.test.js`
Frontend feature tests
Use frontend feature tests to validate AI features from an end-user perspective. Mock AI providers to maintain speed and reliability. Focus on happy paths with selective negative path testing for high-risk scenarios.
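A sketch of the shape such a spec can take, with the provider mocked at the service layer. The `Ai::AnswerService` class, the testid, and the button label are hypothetical placeholders:

```ruby
# Sketch only: drive the UI as a user would, with the AI call mocked at the service layer.
# `Ai::AnswerService`, the testid, and the button label are hypothetical placeholders.
RSpec.describe 'AI chat panel', :js, feature_category: :duo_chat do
  let_it_be(:user) { create(:user) }

  before do
    allow_next_instance_of(Ai::AnswerService) do |service|
      allow(service).to receive(:execute).and_return('mocked answer')
    end
    sign_in(user)
  end

  it 'shows the mocked answer on the happy path' do
    visit root_path

    find_by_testid('chat-prompt-input').set('What is GitLab?')
    click_button 'Send'

    expect(page).to have_content('mocked answer')
  end
end
```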
Frontend feature test example
- GitLab Duo Chat: `ee/spec/features/duo_chat_spec.rb`
DAP feature tests in core feature pages
To test that DAP features are functional in a core feature page and core features are functional with DAP components, use the following shared context and examples in a feature spec:
- Include the shared context `include_context 'with duo features enabled and agentic chat available for group on SaaS'` to load DAP components in a feature page by default.
- Include the shared examples `it_behaves_like 'user can use agentic chat'` to test DAP features in a feature page. A minimal wiring sketch follows this list.
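The following sketch shows how a core feature spec can wire these in. The surrounding example spec is illustrative; only the shared context and shared example names are taken from the guidance above.

```ruby
# Sketch only: the surrounding spec is illustrative; the shared context and
# shared example names come from the guidance above.
RSpec.describe 'epic boards', :js do
  include_context 'with duo features enabled and agentic chat available for group on SaaS'

  it 'keeps the core feature working while DAP components are loaded' do
    # ... existing core feature assertions ...
  end

  it_behaves_like 'user can use agentic chat'
end
```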
For instance, `ee/spec/features/epic_boards/epic_boards_spec.rb` asserts the following scenario:
- Epic board is functional on a page that loads DAP components in the sidebar.
- DAP feature is functional in a page where the epic board is rendered.
- User visits a core feature page and opens Duo Agentic Chat from the sidebar.
- User asks a question in the chat.
- Frontend JS/Vue initiates a WebSocket connection with Workhorse (this Workhorse instance runs locally in the test environment).
- Frontend JS/Vue sends a gRPC request to DWS through Workhorse (this DWS instance runs locally in the test environment). LLM responses are mocked so that assertions are explicit and test failures are reproducible.
Run DAP feature tests when making a change in AI Gateway
These feature tests also run when we make a change to the AI Gateway repository, to verify that an MR does not accidentally break DAP features. For example:
- A developer opens an MR in the AI Gateway project.
- A pipeline runs for the MR, which triggers a downstream pipeline in the GitLab project against the `aigw/test-branch` test branch. This branch points to the same SHA as `master`.
- If the pipeline fails, the developer should investigate whether the proposed change accidentally introduces regressions.
NOTE:
The `aigw/test-branch` branch is unprotected by default to allow AIGW and DWS maintainers to trigger downstream pipelines in the GitLab project.
Run a feature spec locally with your DWS/AIGW change
- Run `gdk start` to start services including DWS.
- Open a terminal at `<gdk-root>/gitlab` and use one of the following options:
  - Run `export TEST_AI_GATEWAY_REPO_BRANCH=<your-remote-feature-branch>` and delete the `<gitlab-rails-root>/tmp/tests/gitlab-ai-gateway/cache` directory, OR
  - Run `export TEST_DUO_WORKFLOW_SERVICE_ENABLED="false" && export TEST_DUO_WORKFLOW_SERVICE_PORT=<your-local-dws-port>`. This allows the feature tests to send requests to your local DWS instance. Make sure your local DWS is running with the following configuration:
    - Set `AIGW_MOCK_MODEL_RESPONSES` to `true`.
    - Set `AIGW_USE_AGENTIC_MOCK` to `true`.
- Run a feature spec, for example `bundle exec rspec ee/spec/features/epic_boards/epic_boards_spec.rb`.
See logs of a test case
DAP consists of multiple services and API calls. To debug a test case failure, you may need to examine service logs to identify the root cause. Here are a couple of pointers:
- GitLab-Rails REST API: `log/api_json.log`
- GitLab-Rails GraphQL API: `log/graphql_json.log`
- GitLab-Workhorse: `log/workhorse-test.log`
- DWS: either stdout or `DUO_WORKFLOW_LOGGING__TO_FILE` in the `gitlab-ai-gateway` repo.
- You can also examine the state of the Vue.js app by printing JS console log output:
```ruby
it 'runs a test' do
  ...
  # This prints the browser logs. Combine with `console.log()` in JavaScript.
  browser_logs.each do |log|
    puts "#{log.level}: #{log.message}"
  end
  ...
end
```
End-to-End testing
Use end-to-end tests sparingly to verify AI features work with real provider responses. Key considerations:
- Keep tests minimal due to slower execution and potential provider outages.
- Account for non-deterministic AI responses in test design. For example, use deterministic assertions on controlled elements like chatbot names, not AI-generated content, as sketched after this list.
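Here is a sketch of keeping E2E assertions deterministic; the `duo_chat` page object, its methods, and the `have_response_from` matcher are hypothetical placeholders.

```ruby
# Sketch only: assert on controlled elements, not on model output.
# `duo_chat`, its methods, and `have_response_from` are hypothetical placeholders.
it 'responds to a prompt in chat' do
  duo_chat.send_prompt('Summarize this issue')

  # Deterministic: the chatbot name and the presence of *a* reply are controlled by us.
  expect(duo_chat).to have_response_from('GitLab Duo Chat')

  # Avoid asserting on the reply text itself; the model output is non-deterministic.
end
```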
E2E test examples
- GitLab: `specs/features/ee/browser_ui/3_create/web_ide/code_suggestions_in_web_ide_spec.rb`
- JetBrains: `test/kotlin/com/gitlab/plugin/e2eTest/tests/CodeSuggestionTest.kt`
Live environment testing
- GitLab.com: We run minimal E2E tests continuously against staging and production environments. For example, Code Suggestions smoke tests.
- GitLab Self-Managed: We use the `gitlab-qa` orchestrator with AI Gateway scenarios to test AI features on GitLab Self-Managed instances.
Exploratory testing
Perform exploratory testing before significant milestones to uncover bugs outside expected workflows and UX issues. This is especially important for AI features as they progress through experiment, beta, and GA phases.
Dogfooding
We dogfood everything. This is especially important for AI features given the rapidly changing nature of the field. See the dogfooding process for details.