Features

Everything you need to test and prove your AI agents are production-ready

Open source and cloud share the same features. Both include every mutation category, every safety check, and all testing capabilities. The difference is where tests run: open source runs locally with local LLMs, while cloud adds zero-setup execution, scaling, team collaboration, and advanced CI/CD workflows.

Mutation Capabilities

24 Mutation Types

Comprehensive mutation coverage across prompt and system layers. Available in both open source and cloud with no feature gating.

Core Prompt-Level Attacks (8)

Paraphrase, noise/typos, tone shift, prompt injection, encoding attacks, context manipulation, length extremes, custom.

Advanced Prompt-Level Attacks (7)

Multi-turn attacks, advanced jailbreaks, semantic similarity attacks, format poisoning, language mixing, token manipulation, temporal attacks.

System/Network-Level Attacks (9)

HTTP header injection, payload size attacks, content-type confusion, query parameter poisoning, request method attacks, protocol-level attacks, resource exhaustion, concurrent patterns, timeout manipulation.

Semantic Perturbation

Uses LLMs to rewrite inputs semantically without changing user intent. Generates meaningful variations that test agent robustness, not just random noise.

Invariant Assertions

Deterministic Checks

Substring (contains) checks, regex matching, latency limits, and JSON validity
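As a rough illustration of what these four checks amount to, here is a minimal standard-library sketch. The function names and the sample response are hypothetical, chosen for this example only; they are not the tool's actual API.

```python
import json
import re


def check_contains(response: str, needle: str) -> bool:
    """Pass if the response contains the expected substring."""
    return needle in response


def check_regex(response: str, pattern: str) -> bool:
    """Pass if the pattern matches anywhere in the response."""
    return re.search(pattern, response) is not None


def check_latency(elapsed_ms: float, max_ms: float) -> bool:
    """Pass if the call finished within the latency budget."""
    return elapsed_ms <= max_ms


def check_valid_json(response: str) -> bool:
    """Pass if the response parses as JSON."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical agent response, used only to exercise the checks.
response = '{"status": "ok", "order_id": "A-1042"}'
results = {
    "contains": check_contains(response, "order_id"),
    "regex": check_regex(response, r"A-\d{4}"),
    "latency": check_latency(elapsed_ms=850.0, max_ms=2000.0),
    "valid_json": check_valid_json(response),
}
print(results)
```

Because each check is a pure function of the response (plus a timing measurement), the same input always produces the same verdict, which is what makes this class of assertion deterministic.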

Semantic Similarity

Vector-based similarity checking using local embeddings to ensure responses maintain semantic meaning
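Vector-based similarity checks are commonly built on cosine similarity between embedding vectors. The sketch below assumes that approach; the three-dimensional vectors and the 0.8 threshold are made up for illustration, and a real run would use vectors produced by an embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_semantically_similar(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.8
) -> bool:
    """Pass if the two embeddings are at least `threshold` similar."""
    return cosine_similarity(a, b) >= threshold


# Toy embeddings (hypothetical values, not real model output).
baseline = [0.9, 0.1, 0.3]    # response to the original prompt
mutated = [0.8, 0.2, 0.35]    # response to a mutated prompt
unrelated = [-0.2, 0.9, -0.4]  # an off-topic response

print(is_semantically_similar(baseline, mutated))
print(is_semantically_similar(baseline, unrelated))
```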

Safety Checks

PII detection and refusal checks using regex patterns. Available in both open source and cloud.
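A regex-based safety check can be as small as the sketch below. The two PII patterns (email and US Social Security number) and the refusal phrases are assumptions picked for this example; the patterns the tool actually ships are not listed here.

```python
import re
from typing import List

# Illustrative patterns only; a production set would be broader.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

REFUSAL_PATTERN = re.compile(
    r"\b(?:I can't|I cannot|I'm unable to|I won't)\b", re.IGNORECASE
)


def find_pii(text: str) -> List[str]:
    """Return the sorted names of all PII patterns found in the text."""
    return sorted(name for name, p in PII_PATTERNS.items() if p.search(text))


def is_refusal(text: str) -> bool:
    """Pass if the response contains a recognizable refusal phrase."""
    return REFUSAL_PATTERN.search(text) is not None


print(find_pii("Reach me at jane.doe@example.com, SSN 123-45-6789"))
print(is_refusal("I can't share that information."))
```

Regex checks like these are fast and predictable but miss PII that does not follow a fixed shape, which is the gap the AI/ML-based detection is meant to cover.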

Advanced AI/ML Safety

AI/ML-based PII detection, contextual analysis, and safety scoring. Jailbreak attacks are covered under Mutation Capabilities. Available in both open source and cloud.

Execution & Infrastructure

Open Source Execution

Run locally using Ollama with Qwen 3 8B. No API costs. Perfect for validation, experimentation, and CI integration. All features available locally.

Cloud Execution

Zero-setup execution with real LLM APIs. Fast, parallel test runs at scale. Same features as open source, with added speed and infrastructure.

Team Collaboration

Cloud plans include shared dashboards, team workflows, and collaboration features. Open source runs locally with the same testing features, without the shared team tooling.

CI/CD Integration

Both open source and cloud support CI/CD integration. Cloud adds advanced workflows, gating, and team-wide enforcement.

Reporting

Interactive HTML Reports

Beautiful pass/fail matrices with mutation details and failure analysis

JSON Export

Export results as JSON for CI/CD integration and programmatic analysis
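For programmatic use, a CI step could read the exported file and fail the build when the score is too low. The sketch below assumes a report with a top-level `robustness_score` field; that field name, the other keys, and the thresholds are hypothetical, so check the real export for its actual schema.

```python
import json


def should_block_merge(report_json: str, min_score: float) -> bool:
    """Return True when the reported score falls below the threshold."""
    report = json.loads(report_json)
    # "robustness_score" is an assumed key, not a documented one.
    return report["robustness_score"] < min_score


# Stand-in for the contents of an exported report file.
exported = json.dumps({"robustness_score": 0.82, "total": 50, "passed": 41})

print(should_block_merge(exported, min_score=0.8))
print(should_block_merge(exported, min_score=0.9))
```

In a pipeline, the boolean would typically be turned into a non-zero exit code so the CI system marks the job as failed.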

Terminal Output

Rich terminal UI with progress bars and real-time updates

Robustness Score

Numeric score from 0.0 to 1.0 that quantifies agent reliability
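The exact formula is not given here. One plausible reading, sketched below, is a weighted pass rate: the total weight of mutations that passed divided by the total weight of all mutations run. The `MutationResult` type and the weights are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MutationResult:
    mutation_type: str
    passed: bool
    weight: float = 1.0  # assumed: heavier weight for higher-risk mutations


def robustness_score(results: List[MutationResult]) -> float:
    """Weighted fraction of passed mutations, in the range 0.0 to 1.0."""
    total = sum(r.weight for r in results)
    if total == 0:
        return 0.0  # no mutations run, so nothing has been demonstrated
    passed = sum(r.weight for r in results if r.passed)
    return passed / total


results = [
    MutationResult("paraphrase", passed=True),
    MutationResult("noise", passed=True),
    MutationResult("prompt_injection", passed=False, weight=2.0),
]
print(robustness_score(results))
```

With these inputs the passed weight is 2.0 out of 4.0, so a single failed high-weight mutation pulls the score down more than a failed low-weight one would.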

Test History

Historical test runs with trend analysis and commit-by-commit comparison. Available locally in open source. Cloud plans provide 6-12 months of centralized history with team access.

Integrations

HTTP Agents

Test any HTTP-based agent endpoint

Python Callables

Directly test Python functions and callables
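To show the general idea of testing a callable, the sketch below feeds mutated prompts to a plain Python function and applies an invariant to each response. Everything in it is hypothetical: `add_typos` is a simple stand-in for a noise mutation, and `run_against_callable` is not the tool's real entry point.

```python
import random
from typing import Callable, List


def add_typos(prompt: str, rate: float, rng: random.Random) -> str:
    """Swap adjacent letter pairs with the given probability."""
    chars = list(prompt)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def run_against_callable(
    agent: Callable[[str], str],
    prompt: str,
    invariant: Callable[[str], bool],
    n_mutations: int = 5,
    seed: int = 0,
) -> List[bool]:
    """Call the agent on mutated prompts and record invariant outcomes."""
    rng = random.Random(seed)
    return [invariant(agent(add_typos(prompt, 0.1, rng))) for _ in range(n_mutations)]


def echo_agent(prompt: str) -> str:
    """Toy agent standing in for a real model call."""
    return f"You asked: {prompt}"


outcomes = run_against_callable(
    echo_agent,
    "What is my account balance?",
    invariant=lambda response: response.startswith("You asked:"),
)
print(outcomes)
```

Seeding the random generator keeps the mutated prompts reproducible between runs, which makes failures easier to investigate.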

LangChain

Native LangChain chain integration

CI/CD Integration

Works with GitHub Actions, GitLab CI, Jenkins, and CircleCI. Block merges when the robustness score drops and post results as PR comments. Available in both open source and cloud. Cloud adds team-wide enforcement and advanced workflows.

Notifications

Slack, email, and webhook alerts when test runs complete. Available in cloud plans for team coordination.

Developer Experience

Simple CLI

Install with pip, configure with YAML, run with one command

YAML Configuration

Human-readable configuration format. The same config works locally and in the cloud

Rich Terminal UI

Beautiful progress bars and real-time feedback using the Rich library

Type Safety

Full type hints and Pydantic validation for configuration

Rust Performance

Performance-critical operations (scoring, similarity) use Rust bindings
