Features
Everything you need to test and prove your AI agents are production-ready
Open source and cloud share the same features. Both include full mutation categories, safety checks, and all testing capabilities. The difference is execution: open source runs locally with local LLMs, while cloud provides zero-setup, scaling, team collaboration, and CI/CD workflows.
Mutation Capabilities
24 Mutation Types
Comprehensive mutation coverage across prompt and system layers. Available in both open source and cloud with no feature gating.
Core Prompt-Level Attacks (8)
Paraphrase, noise/typos, tone shift, prompt injection, encoding attacks, context manipulation, length extremes, custom.
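As an illustration of the noise/typos category, here is a minimal sketch of one way such a mutation could work: swapping adjacent letters at a configurable rate. The function name `add_typos` and its parameters are hypothetical and chosen for this example; they are not taken from the tool's actual API.

```python
import random


def add_typos(prompt: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent letter pairs with probability `rate` (illustrative only)."""
    rng = random.Random(seed)  # seeded so a failing mutation can be replayed
    chars = list(prompt)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    print(add_typos("What is the refund policy for annual plans?", rate=0.2))
```

Because only adjacent letters are swapped, the mutated prompt keeps its length and its characters, which makes it a mild perturbation that a robust agent should tolerate.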
Advanced Prompt-Level Attacks (7)
Multi-turn attacks, advanced jailbreaks, semantic similarity attacks, format poisoning, language mixing, token manipulation, temporal attacks.
System/Network-Level Attacks (9)
HTTP header injection, payload size attacks, content-type confusion, query parameter poisoning, request method attacks, protocol-level attacks, resource exhaustion, concurrent patterns, timeout manipulation.
Semantic Perturbation
Uses LLMs to rephrase inputs while preserving user intent. The resulting variations are meaningful rewrites that test agent robustness, rather than random noise.
Invariant Assertions
Deterministic Checks
Substring ("contains") matching, regex matching, latency limits, and JSON validity checks
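The four deterministic checks can be sketched with the Python standard library alone. The function names below are assumptions made for this example and do not reflect the tool's real interface.

```python
import json
import re


def check_contains(response: str, needle: str) -> bool:
    """Pass if the response includes the expected substring."""
    return needle in response


def check_regex(response: str, pattern: str) -> bool:
    """Pass if the pattern matches anywhere in the response."""
    return re.search(pattern, response) is not None


def check_latency(elapsed_ms: float, max_ms: float) -> bool:
    """Pass if the agent answered within the latency budget."""
    return elapsed_ms <= max_ms


def check_valid_json(response: str) -> bool:
    """Pass if the response parses as JSON."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True
```

Each check returns a plain boolean for the same input every time, which is what makes it deterministic: no model call is involved in judging the response.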
Semantic Similarity
Vector-based similarity checks using local embeddings to confirm that responses preserve the expected meaning
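A common way to compare embedding vectors is cosine similarity, sketched below. This assumes the embeddings have already been computed; the helper names and the 0.8 threshold are illustrative assumptions, not documented defaults.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_similar(
    baseline_vec: Sequence[float],
    response_vec: Sequence[float],
    threshold: float = 0.8,  # hypothetical threshold for this sketch
) -> bool:
    """Pass if the response embedding stays close to the baseline embedding."""
    return cosine_similarity(baseline_vec, response_vec) >= threshold
```

In this sketch, a response to a mutated prompt passes when its embedding points in nearly the same direction as the embedding of the baseline response.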
Safety Checks
PII detection and refusal checks using regex patterns. Available in both open source and cloud.
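Regex-based safety checks might look like the following sketch. The patterns are deliberately simplified assumptions (one email pattern, one US Social Security number pattern, a few refusal phrases) and are not the tool's actual rule set.

```python
import re
from typing import List

# Simplified, illustrative patterns; a real rule set would be much broader.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

REFUSAL_PATTERN = re.compile(
    r"\b(I can(?:'|no)t|I(?: am|'m) (?:unable|not able) to|I won't)\b",
    re.IGNORECASE,
)


def find_pii(text: str) -> List[str]:
    """Return the sorted names of PII patterns that appear in the text."""
    return sorted(name for name, rx in PII_PATTERNS.items() if rx.search(text))


def is_refusal(text: str) -> bool:
    """Return True if the response contains a common refusal phrase."""
    return REFUSAL_PATTERN.search(text) is not None
```

A safety check built this way would fail a test case when `find_pii` returns anything for an agent response, or when a prompt injection does not produce a refusal.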
Advanced AI/ML Safety
Advanced AI/ML-based PII detection, contextual analysis, and safety scoring. Jailbreaking attacks are covered in mutation capabilities. Same features available in both open source and cloud.
Execution & Infrastructure
Open Source Execution
Run locally using Ollama with Qwen 3 8B. No API costs. Perfect for validation, experimentation, and CI integration. All features available locally.
Cloud Execution
Zero-setup execution with real LLM APIs. Fast, parallel test runs at scale. Same features as open source, with added speed and infrastructure.
Team Collaboration
Cloud plans include shared dashboards, team workflows, and collaboration features. Open source runs locally with full feature parity.
CI/CD Integration
Both open source and cloud support CI/CD integration. Cloud adds advanced workflows, gating, and team-wide enforcement.
Reporting
Interactive HTML Reports
Beautiful pass/fail matrices with mutation details and failure analysis
JSON Export
Export results as JSON for CI/CD integration and programmatic analysis
Terminal Output
Rich terminal UI with progress bars and real-time updates
Robustness Score
Numeric score from 0.0 to 1.0 that quantifies agent reliability
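The exact scoring formula is not stated here. As one plausible reading, a score in the 0.0 to 1.0 range can be computed as the fraction of mutated test cases that pass every check. The sketch below assumes that definition; the real score may weight mutation types differently.

```python
from typing import Sequence


def robustness_score(results: Sequence[bool]) -> float:
    """Fraction of mutation runs that passed (assumed definition)."""
    if not results:
        raise ValueError("at least one mutation result is required")
    return sum(results) / len(results)


if __name__ == "__main__":
    # Three of four mutated prompts passed every invariant.
    print(robustness_score([True, True, False, True]))
```

Under this assumption, a score of 1.0 means every mutation passed and 0.0 means none did.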
Test History
Historical test runs with trend analysis and commit-by-commit comparison. Available locally in open source. Cloud plans provide 6-12 months of centralized history with team access.
Integrations
HTTP Agents
Test any HTTP-based agent endpoint
Python Callables
Directly test Python functions and callables
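Testing a Python callable amounts to feeding it mutated prompts and applying a check to each response. The harness below is a hypothetical sketch of that loop; `run_against_callable` and `toy_agent` are names invented for this example.

```python
from typing import Callable, Dict, Iterable


def run_against_callable(
    agent: Callable[[str], str],
    mutated_prompts: Iterable[str],
    invariant: Callable[[str], bool],
) -> Dict[str, bool]:
    """Call the agent on each prompt and record whether the invariant held."""
    results: Dict[str, bool] = {}
    for prompt in mutated_prompts:
        try:
            results[prompt] = invariant(agent(prompt))
        except Exception:
            results[prompt] = False  # a crash counts as a failed mutation
    return results


def toy_agent(prompt: str) -> str:
    """Stand-in agent that echoes a normalized version of the prompt."""
    return f"You said: {prompt.strip().lower()}"


if __name__ == "__main__":
    outcome = run_against_callable(
        toy_agent, ["Hello", "  HELLO  "], lambda r: "hello" in r
    )
    print(outcome)
```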
LangChain
Native LangChain chain integration
CI/CD Integration
GitHub Actions, GitLab CI, Jenkins, CircleCI. Block merges when the trust score drops and post results as PR comments. Available in both open source and cloud. Cloud adds team-wide enforcement and advanced workflows.
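A merge gate of this kind can be approximated in any CI system by comparing the score in an exported JSON report against a stored baseline. In the sketch below, the `robustness_score` field name, the report layout, and the 0.05 tolerance are assumptions for illustration, not the documented export schema.

```python
import json


def should_block_merge(
    report_json: str,
    baseline_score: float,
    max_drop: float = 0.05,  # hypothetical tolerance
) -> bool:
    """Return True if the score fell by more than `max_drop` from the baseline."""
    report = json.loads(report_json)
    current = float(report["robustness_score"])  # assumed field name
    return (baseline_score - current) > max_drop


if __name__ == "__main__":
    sample_report = '{"robustness_score": 0.70}'
    if should_block_merge(sample_report, baseline_score=0.90):
        print("Score dropped beyond tolerance; a CI step would exit non-zero here.")
```

A CI job would typically turn the `True` result into a non-zero exit code, which is what causes the pipeline to fail and the merge to be blocked.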
Notifications
Slack, email, and webhook support for test-completion alerts. Available in cloud plans for team coordination.
Developer Experience
Simple CLI
Install with pip, configure with YAML, run with one command
YAML Configuration
Human-readable configuration format. Same config works for local and cloud
Rich Terminal UI
Beautiful progress bars and real-time feedback using the Rich library
Type Safety
Full type hints and Pydantic validation for configuration
Rust Performance
Performance-critical operations (scoring, similarity) use Rust bindings