Atlassian Security Testing Team Implements AI-Driven Rovo Dev Skills to Automate and Enhance White-Box Penetration Testing Workflows

The Atlassian Security Testing team, led by security researchers Giuliana and Sam, has announced a significant shift in their approach to application security by integrating AI-powered capabilities through Rovo Dev Skills. The initiative is aimed at white-box and code-assisted security engagements: it is meant to help internal testers find more high-quality vulnerabilities and give deeper, context-driven remediation recommendations. By leveraging agentic AI, the team is automating the foundational steps of a security assessment (static analysis, route extraction, and lead generation) so that human testers can focus on complex logic flaws and high-value exploitation.

In modern application security, a white-box assessment is widely regarded as the gold standard for depth and accuracy. Unlike black-box testing, which approaches a system from the outside with no prior knowledge, white-box testing involves a comprehensive analysis of the internal source code, architecture, and configurations. Traditionally, this process begins with security professionals running a variety of static application security testing (SAST) tools to build context about the target system. These tools help identify "hotspots" (areas of the code that are more likely to contain vulnerabilities) and extract the API routes that define the attack surface. However, manually orchestrating these tools and triaging their results is time-consuming, and the output is often noisy.

To address these inefficiencies, Atlassian has turned to Rovo Dev Skills. Skills are defined as an open format that provides AI agents with specific instructions, scripts, and resources, enabling them to perform tasks with higher accuracy and efficiency than a general-purpose LLM. Technically, these skills are plain-text Markdown files containing instructions that guide the agent on how to handle specific workflows. According to the Atlassian team, these skills represent one of the most reliable methods for customizing an AI agent, as they provide the necessary domain expertise and tool-specific understanding required for rigorous security auditing.

The core of Atlassian’s new security workflow is an orchestrator prompt designed to call multiple specialized skills in an optimized sequence. This modular approach allows the security team to build a fully fledged agentic testing workflow. The suite of skills developed by the team integrates several industry-standard open-source tools, each serving a distinct purpose in the vulnerability discovery lifecycle.

The initial phase of the assessment is handled by the language-analyzer skill, which uses the cloc (Count Lines of Code) tool. This skill allows the agent to detect the programming languages used in a repository and calculate the volume of code, giving the tester an immediate understanding of the project’s scale and technical stack. Following this, the semgrep-scanner skill employs the semgrep tool to perform SAST scanning. This skill is specifically configured to map findings to Common Weakness Enumeration (CWE) identifiers and OWASP categories, so that the AI-driven results are expressed in widely used vulnerability taxonomies.
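To make the CWE and OWASP mapping concrete, the following is a minimal Python sketch of how semgrep findings could be grouped by CWE. It is not the team's skill: it assumes a report shaped like the output of `semgrep --json`, where each result carries rule metadata under `extra.metadata`, and the sample rule IDs, paths, and messages are invented for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample shaped like `semgrep --json` output, trimmed to the
# fields used here. Rule IDs, paths, and messages are made up.
SAMPLE_REPORT = json.dumps({
    "results": [
        {
            "check_id": "python.lang.security.sqli-example",
            "path": "app/db.py",
            "start": {"line": 42},
            "extra": {
                "message": "User input flows into a SQL query",
                "severity": "ERROR",
                "metadata": {
                    "cwe": ["CWE-89: SQL Injection"],
                    "owasp": ["A03:2021 - Injection"],
                },
            },
        },
        {
            "check_id": "python.lang.security.weak-hash-example",
            "path": "app/auth.py",
            "start": {"line": 7},
            "extra": {
                "message": "MD5 used for hashing",
                "severity": "WARNING",
                # Some rules store a single string instead of a list.
                "metadata": {"cwe": "CWE-328: Use of Weak Hash"},
            },
        },
    ]
})


def _as_list(value):
    """Normalize a metadata field that may be missing, a string, or a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def group_by_cwe(report_json):
    """Return {cwe_id: [finding, ...]} keyed by short IDs such as 'CWE-89'."""
    grouped = defaultdict(list)
    for result in json.loads(report_json).get("results", []):
        extra = result.get("extra", {})
        metadata = extra.get("metadata", {})
        line = result.get("start", {}).get("line")
        finding = {
            "rule": result.get("check_id"),
            "location": f"{result.get('path')}:{line}",
            "severity": extra.get("severity"),
            "owasp": _as_list(metadata.get("owasp")),
        }
        # Findings without CWE metadata are kept under a placeholder key.
        for cwe in _as_list(metadata.get("cwe")) or ["CWE-UNKNOWN"]:
            grouped[cwe.split(":", 1)[0].strip()].append(finding)
    return dict(grouped)


if __name__ == "__main__":
    for cwe_id, findings in sorted(group_by_cwe(SAMPLE_REPORT).items()):
        print(cwe_id, [f["location"] for f in findings])
```

In a skill-based workflow, a summary like this is the kind of compact, structured view an agent could reason over instead of the raw scanner output.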

To address the risk of credential leakage, the secret-scanner skill utilizes trufflehog to detect hardcoded secrets, such as API keys and passwords, within the codebase. Simultaneously, the endpoint-enumerator skill uses ripgrep (rg) to find API route definitions, effectively mapping the application’s external-facing interface. For deeper analysis, the security-reviewer skill performs AI-powered code auditing by combining traditional Unix tools like find and grep with the reasoning capabilities of the large language model.
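The endpoint enumeration step is essentially a pattern search over source files. The sketch below illustrates the idea in Python rather than with ripgrep itself; the two regular expressions (for Express-style and Flask-style route definitions) and the sample files are assumptions chosen for illustration, not the patterns used in Atlassian's skill.

```python
import re

# Assumed patterns for two common frameworks; a real skill would carry
# patterns for whatever frameworks the target codebase uses.
ROUTE_PATTERNS = [
    # Express-style: app.get('/users', ...) or router.post("/login", ...)
    ("express", re.compile(
        r"""\b(?:app|router)\.(get|post|put|patch|delete)\(\s*['"]([^'"]+)['"]""")),
    # Flask-style decorator: @app.route('/health')
    ("flask", re.compile(r"""@\w+\.route\(\s*['"]([^'"]+)['"]""")),
]

# Hypothetical repository contents, keyed by file path.
SAMPLE_FILES = {
    "server/routes.js": "\n".join([
        "const router = express.Router();",
        "router.get('/users/:id', getUser);",
        'router.post("/login", login);',
    ]),
    "api/views.py": "\n".join([
        "@app.route('/health')",
        "def health():",
        "    return 'ok'",
    ]),
}


def enumerate_endpoints(files):
    """Scan {path: source_text} and return one dict per route definition."""
    endpoints = []
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for framework, pattern in ROUTE_PATTERNS:
                match = pattern.search(line)
                if not match:
                    continue
                if framework == "express":
                    method, route = match.group(1).upper(), match.group(2)
                else:
                    # The HTTP method is not visible in this pattern.
                    method, route = "UNKNOWN", match.group(1)
                endpoints.append({
                    "file": path,
                    "line": line_no,
                    "framework": framework,
                    "method": method,
                    "route": route,
                })
    return endpoints


if __name__ == "__main__":
    for ep in enumerate_endpoints(SAMPLE_FILES):
        print(f"{ep['method']:8} {ep['route']:15} {ep['file']}:{ep['line']}")
```

Line-based matching like this misses routes defined across multiple lines or built dynamically, which is one reason the resulting list is best treated as a starting map of the attack surface rather than a complete inventory.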

One of the more advanced components of the workflow is the cvss-scorer. This skill uses a Python CVSS library to generate accurate Common Vulnerability Scoring System (CVSS) vectors and scores based on the agent’s determination of the vulnerability’s impact. This ensures that findings are prioritized based on standardized risk metrics. To manage the inevitable noise produced by automated tools, the sast-investigator skill triages findings to distinguish between true and false positives. Finally, the generate-test-leads skill uses Confluence Model Context Protocol (MCP) tools to produce a prioritized list of test leads for human validation, bridging the gap between automated discovery and manual verification.
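The article does not name the Python CVSS library the skill relies on. To show what such a library computes, here is a self-contained sketch of the CVSS v3.1 base score equations as published in the FIRST specification, written from scratch for illustration; the function names and the sample test leads are hypothetical.

```python
# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are handled in this sketch")
    metrics = dict(part.split(":", 1) for part in parts)
    scope = metrics["S"]
    if scope not in PR_WEIGHTS:
        raise ValueError(f"invalid scope: {scope}")

    conf, integ, avail = (WEIGHTS[m][metrics[m]] for m in ("C", "I", "A"))
    iss = 1 - (1 - conf) * (1 - integ) * (1 - avail)
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (8.22 * WEIGHTS["AV"][metrics["AV"]]
                      * WEIGHTS["AC"][metrics["AC"]]
                      * PR_WEIGHTS[scope][metrics["PR"]]
                      * WEIGHTS["UI"][metrics["UI"]])
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


def severity(score):
    """Map a score to the qualitative rating scale from the specification."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


if __name__ == "__main__":
    # Hypothetical test leads, sorted so the highest-risk item comes first.
    leads = {
        "Reflected XSS in search": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N",
        "Unauthenticated SQL injection": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
    }
    for title, vec in sorted(leads.items(), key=lambda kv: -base_score(kv[1])):
        score = base_score(vec)
        print(f"{score:4.1f} {severity(score):8} {title}")
```

Delegating the arithmetic to deterministic code, while the agent only chooses the metric values, keeps the final score reproducible: the same vector always yields the same number.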

The process of creating a Rovo Dev Skill is designed to be accessible to security engineers. Each skill is defined by a SKILL.md file, which includes a YAML frontmatter section containing the skill’s name and description. This is followed by detailed instructions in Markdown format. Users can create these skills using the Rovo Dev Command Line Interface (CLI) or by manually creating Markdown files at the project level (.rovodev/skills) or user-scoped level (~/.rovodev/skills). The Atlassian team emphasizes that effective skills should be specific, providing clear instructions on how to handle tool outputs and including examples of what "good" results look like.
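As an illustration of the file layout described above, the following Python sketch renders a SKILL.md with `name` and `description` frontmatter and reads it back. The helper names and the sample skill text are hypothetical and are not Atlassian's actual language-analyzer skill. The parser handles only flat `key: value` pairs, which is an assumption made to keep the example dependency-free; a full YAML parser would be needed for anything richer.

```python
def render_skill(name, description, instructions):
    """Build SKILL.md text: YAML frontmatter followed by Markdown instructions."""
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{instructions.strip()}\n"
    )


def parse_skill(text):
    """Split SKILL.md text into (frontmatter dict, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("frontmatter is missing its closing '---'") from None

    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip()

    missing = {"name", "description"} - meta.keys()
    if missing:
        raise ValueError(f"frontmatter is missing: {sorted(missing)}")
    return meta, "\n".join(lines[end + 1:]).strip()


# Hypothetical skill content, written to follow the advice in the text:
# be specific, say how to handle tool output, and show a good result.
SAMPLE_INSTRUCTIONS = """
# Language analyzer

1. Run `cloc --json .` from the repository root.
2. Report each language with its file count and lines of code.
3. If cloc is not installed, stop and say so instead of guessing.

A good result looks like: "Java: 412 files, 58,300 lines of code".
"""

if __name__ == "__main__":
    skill_text = render_skill(
        "language-analyzer",
        "Detects languages and code volume in a repository using cloc",
        SAMPLE_INSTRUCTIONS,
    )
    meta, body = parse_skill(skill_text)
    print(meta["name"], "-", meta["description"])
    print(body.splitlines()[0])
```

A check like `parse_skill` could run before a skill is committed under `.rovodev/skills`, catching a missing name or description early.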

Beyond the initial set of tools, Atlassian has integrated more specialized skills into their internal workflow. These include skills for identifying sensitive or "interesting" files that warrant closer inspection, identifying risky code patterns specific to the company’s internal libraries, and even automating the generation of Jira tickets for discovered vulnerabilities. By codifying procedural knowledge and team-specific context into these skills, the security team ensures that the AI agent acts as a highly capable pentesting assistant.

A critical consideration in the development of these AI agents is the distinction between using "Skills" and building integrations via the Model Context Protocol (MCP). MCP is an open protocol that gives agents a structured way to interact with external tools and platforms, such as Jira or Confluence. While MCP is powerful for managing complex interactions and secrets, the Atlassian team argues that it is often "overkill" for simple command-line interface (CLI) tools. For tools like grep, semgrep, or cloc, the agent can reliably follow Markdown-based instructions to execute the tool and interpret its results without the overhead of a full MCP server.

However, Atlassian still utilizes MCP in scenarios where LLM reasoning might be compromised by "hallucinations" or where high-stakes data management is required. This includes instances where the tool output is too large for the LLM’s context window, where complex state management is necessary across multiple steps, or when interacting with external APIs that require secure credential handling. The team notes that the choice between Skills and MCP is a fluid decision, as the field of agentic AI is rapidly evolving.

The implementation of Rovo Dev Skills represents a broader trend in the cybersecurity industry toward "AI-augmented" security. By automating the repetitive and data-heavy portions of the penetration testing process, Atlassian aims to increase its security coverage and find vulnerabilities faster than traditional manual methods would allow. The goal is not to replace the human element of security testing but to empower researchers to cut through the noise and focus on the most critical threats.

Ultimately, the Atlassian Security Testing team views Rovo Dev Skills as a way to "supercharge" the security workflow. By providing agents with access to procedural knowledge and company-specific context, they are creating a more resilient and efficient security posture. As these tools continue to mature, the team expects to expand the library of available skills, further integrating AI into the lifecycle of secure software development. The resources and orchestrator prompts developed by Giuliana and Sam have been made available via GitHub, encouraging the wider security community to explore and adopt agentic AI workflows in their own testing environments.