Atlassian Unveils the Engineering Architecture Behind Forge’s Scalable Usage-Based Billing System

Atlassian has detailed the complex engineering framework powering Forge Billing, the usage-based pricing infrastructure for its serverless cloud app development platform. As Forge has matured into a robust ecosystem powering thousands of applications across Jira and Confluence, Atlassian transitioned from traditional pricing models to a consumption-based approach. This shift necessitated the creation of a high-scale metering and attribution system capable of processing hundreds of millions of events daily without the risks of data loss or double-counting. The resulting architecture represents a sophisticated integration of event streaming, data lake technology, and automated commerce systems, ensuring that developers are billed accurately for the specific resources their applications consume above a designated free tier.

The move to usage-based pricing presented a significant engineering hurdle: reliably measuring diverse resources, such as function invocations, data storage, and log volume, across a distributed serverless environment. To solve this, Atlassian’s engineering team developed an end-to-end flow that normalizes telemetry from various Forge services and translates it into financial line items. The core challenge was not merely "metering and billing," but keeping every system in the chain synchronized and enforcing budget limits without compromising the platform’s reliability. The architecture is a linear progression: service-level event emission, a centralized usage pipeline, internal billing systems, and finally the Developer Console.

Engineering the Forge Billing Platform for Reliability and Scale - Work Life by Atlassian

At the beginning of this pipeline, individual Forge services are responsible for measuring resource consumption. Each owning service tracks usage in a consistent manner—counting discrete events like invocations or measuring cumulative metrics like "GB-seconds" for storage. These services emit events that adhere to a shared "telemetry contract," which includes critical metadata such as application IDs, installation contexts, and specific units of measure. By keeping the measurement logic close to the source, Atlassian ensures that the raw data is as accurate as possible before it enters the broader processing ecosystem. This decentralized emission model allows individual teams to maintain their specific resource logic while providing a standardized format for downstream consumption.
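
To make the idea of a shared telemetry contract concrete, here is a minimal Python sketch of what such an event might look like. The class name, field names, and helper function are all hypothetical; the article only states that events carry application IDs, installation contexts, and units of measure, so the rest is an assumption for illustration.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical shape of a usage event; not Atlassian's actual contract."""
    event_id: str         # unique ID, useful for downstream deduplication
    app_id: str           # which app consumed the resource
    installation_id: str  # installation context of the app
    metric: str           # e.g. "function_invocations"
    unit: str             # unit of measure, e.g. "invocations"
    quantity: float       # amount consumed, expressed in `unit`
    emitted_at: datetime  # when the owning service measured the usage


def make_invocation_event(app_id: str, installation_id: str) -> UsageEvent:
    """Build an event for a single function invocation (a discrete count)."""
    return UsageEvent(
        event_id=str(uuid.uuid4()),
        app_id=app_id,
        installation_id=installation_id,
        metric="function_invocations",
        unit="invocations",
        quantity=1,
        emitted_at=datetime.now(timezone.utc),
    )


event = make_invocation_event("app-123", "install-456")
print(event.metric, event.quantity, event.unit)
```

Freezing the dataclass reflects the idea that an emitted event is a record of something that already happened and should not be mutated afterwards.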

Once emitted, these usage events are routed through StreamHub, Atlassian’s managed event bus designed for asynchronous service-to-service communication. StreamHub acts as a gatekeeper, validating each event against registered JSON schemas before persisting the data to Kafka. This layer is crucial for maintaining data integrity; it prevents malformed telemetry from polluting the billing pipeline. Rather than forcing downstream consumers to read directly from Kafka, StreamHub’s distribution layer fans out events to managed AWS resources like SQS, SNS, or Kinesis. This abstraction allows the system to scale horizontally and ensures that the delivery of usage events is both resilient and decoupled from the internal workings of the individual Forge services.
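
The validate-before-persist step can be sketched in a few lines of Python. This is a simplified stand-in, not StreamHub's implementation: it uses a hand-rolled field check instead of real JSON Schema validation, a plain list instead of Kafka, and the field names and function names are assumptions.

```python
from typing import Any, Dict, List

# Hypothetical required fields and their accepted Python types.
REQUIRED_FIELDS = {
    "event_id": str,
    "app_id": str,
    "metric": str,
    "unit": str,
    "quantity": (int, float),
}


def validate_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the event is valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            errors.append(f"missing field: {name}")
        elif isinstance(event[name], bool) or not isinstance(event[name], expected):
            # bool is rejected explicitly because it is a subclass of int
            errors.append(f"wrong type for field: {name}")
    if not errors and event["quantity"] < 0:
        errors.append("quantity must be non-negative")
    return errors


def publish(event: Dict[str, Any], log: List[Dict[str, Any]]) -> bool:
    """Append the event to the log only if it passes validation."""
    if validate_event(event):
        return False  # malformed telemetry never reaches the pipeline
    log.append(event)
    return True


log: List[Dict[str, Any]] = []
good = {"event_id": "e1", "app_id": "a1", "metric": "logs", "unit": "bytes", "quantity": 512}
bad = {"event_id": "e2", "app_id": "a1", "metric": "logs", "quantity": "512"}
print(publish(good, log), publish(bad, log), len(log))
```

Rejecting events at the entry point means downstream consumers can assume every record they read already matches the contract.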

The heart of the billing architecture is the Usage Tracking Service (UTS), which Atlassian describes as the "nervous system" of Forge Billing. UTS ingests events from multiple producers in near real-time, performing normalization, verification, and enrichment. It resolves complex identifiers, such as environment contexts and app IDs, and performs the vital task of deduplication. In a high-scale event-streaming environment, ensuring that a single usage event is not counted twice is essential for maintaining developer trust. UTS ensures that by the time usage data leaves the pipeline, it is in a "billing-ready" state that can be safely consumed by Atlassian’s broader financial and commerce engines.

To handle the massive volume of data, which currently exceeds 300 million usage events per day, Atlassian utilizes an internal data lake named Plato, built on Databricks and Amazon S3. Plato is divided into two distinct tiers: a "cold tier" for long-term storage and a "hot tier" for low-latency queries. The cold tier stores the immutable, raw telemetry that serves as the "source of truth" for the entire system. This layer is used for large-scale batch processing, historical analysis, and re-processing data in the event of an outage or logic error. The hot tier, conversely, is optimized for performance, allowing developers to see their current usage and charges in the Developer Console with minimal delay. This dual-tier approach balances the need for absolute financial correctness with the requirement for a responsive user experience.

One of the most technically demanding aspects of the UTS and Plato integration is the handling of different metric types, specifically counter-style and gauge-style metrics. Counter-style metrics, such as the number of function invocations, are strictly increasing. The system uses unique event IDs to recognize and discard duplicates while summing legitimate events within a specific window. Gauge-style metrics, such as bytes of data stored over time, are more complex because they represent capacity rather than discrete actions. To handle this, the pipeline converts gauges into counter equivalents—for example, calculating "GB-hours." The system employs a "last-write-wins" deterministic logic for storage metrics, where the latest value for a specific resource in a given time window is treated as the correct state. This ensures that even if events arrive out of order or are delayed, the final billing calculation remains stable and defensible.
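
The two aggregation rules described above can be illustrated with a small Python sketch. It is written under stated assumptions rather than taken from Atlassian's code: the function names and tuple layouts are hypothetical, the time window for gauges is assumed to be one hour, and ties between equal timestamps are broken by value so the result does not depend on arrival order.

```python
from typing import Dict, Iterable, Tuple


def sum_counter(events: Iterable[Tuple[str, int]]) -> int:
    """Sum counter events given as (event_id, quantity), counting each ID once."""
    seen = set()
    total = 0
    for event_id, quantity in events:
        if event_id in seen:
            continue  # duplicate delivery: discard
        seen.add(event_id)
        total += quantity
    return total


def gauge_to_gb_hours(samples: Iterable[Tuple[str, int, int, float]]) -> float:
    """Convert gauge samples to GB-hours using last-write-wins.

    Each sample is (resource_id, hour_window, written_at, gigabytes).
    For every (resource, hour) pair only the latest write is kept, and
    each kept value contributes gigabytes * 1 hour to the total.
    """
    latest: Dict[Tuple[str, int], Tuple[int, float]] = {}
    for resource_id, hour, written_at, gigabytes in samples:
        key = (resource_id, hour)
        candidate = (written_at, gigabytes)
        if key not in latest or candidate > latest[key]:
            latest[key] = candidate
    return sum(gigabytes for _, gigabytes in latest.values())


invocations = [("e1", 1), ("e2", 1), ("e1", 1)]  # "e1" delivered twice
print(sum_counter(invocations))

storage = [
    ("store-1", 0, 10, 2.0),
    ("store-1", 0, 20, 3.0),  # latest write for hour 0
    ("store-1", 1, 5, 3.0),
    ("store-1", 0, 15, 9.0),  # late arrival, but older than the write at 20
]
print(gauge_to_gb_hours(storage))
```

Because the winner in each window is chosen by timestamp rather than by arrival order, replaying the same samples in any order produces the same total.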

Once the usage data is normalized and verified, it is handed off to the Atlassian Commerce systems. This stage involves mapping usage to the "Developer Space," which serves as the primary billing account for Forge. A Developer Space can contain multiple apps, allowing teams to receive a single consolidated invoice while still maintaining visibility into the costs associated with each individual application. In this layer, the system applies Forge’s consumption-based pricing logic, which includes applying free-tier allowances and calculating any applicable discounts. If a payment is missed, the commerce system also manages dunning flows and potential subscription cancellations, ensuring the platform remains financially sustainable.

A critical component of the Forge pricing model is the application of free quotas. Unlike some platforms that apply quotas at the account level, Forge applies free allowances on a per-app basis. At the start of every monthly billing cycle, the Commerce system configures these allowances within UTS. As usage events flow through the month, UTS tracks consumption against these per-app limits. Only usage that exceeds the free threshold—referred to as "overage"—is flagged for billing. At the end of the cycle, the system aggregates these overages, generates a monthly invoice for the Developer Space, and resets the allowances for the next period. This granular approach allows developers to experiment with multiple small-scale applications without incurring costs across their entire portfolio.
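
As a rough illustration of per-app allowances, the following Python sketch computes overage for each app in a Developer Space. The function name, the allowance value, and the usage figures are made up for the example; real allowances and prices are not given in the article.

```python
from typing import Dict


def monthly_overage(usage_by_app: Dict[str, int], free_allowance: int) -> Dict[str, int]:
    """Return billable units per app: usage above the per-app free allowance."""
    return {
        app_id: max(0, used - free_allowance)
        for app_id, used in usage_by_app.items()
    }


# Hypothetical month: one busy app and one small experiment.
usage = {"reporting-app": 150, "prototype-app": 20}
overage = monthly_overage(usage, free_allowance=100)

print(overage)
print(sum(overage.values()))  # billable units on the consolidated invoice
```

Each app is compared against its own allowance, so the small prototype stays free regardless of how much the busy app consumes.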

To provide developers with predictability and control over their spending, Atlassian integrated a budgeting and alerting system. Forge provides fixed alerts when an application’s consumption crosses 50%, 75%, 90%, and 100% of its free allowance or its user-configured budget. These alerts are triggered by the same underlying telemetry that powers the billing engine, ensuring consistency between the alerts a developer receives and the final invoice they see. This transparency is vital for developers operating in a serverless environment, where unexpected spikes in traffic could otherwise lead to unforeseen financial liabilities.
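
One way to fire each alert only once is to compare usage before and after every update against the fixed thresholds. The Python sketch below shows that idea; the function name and the comparison strategy are assumptions, and only the 50%, 75%, 90%, and 100% levels come from the article.

```python
from typing import List

THRESHOLDS_PERCENT = (50, 75, 90, 100)


def newly_crossed(previous_usage: int, current_usage: int, limit: int) -> List[int]:
    """Return the thresholds crossed when usage moves from previous to current.

    `limit` is either the free allowance or a user-configured budget.
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    crossed = []
    for percent in THRESHOLDS_PERCENT:
        mark = limit * percent
        if previous_usage * 100 < mark <= current_usage * 100:
            crossed.append(percent)
    return crossed


print(newly_crossed(40, 80, limit=100))   # a jump that passes two thresholds
print(newly_crossed(80, 85, limit=100))   # no new threshold reached
print(newly_crossed(95, 120, limit=100))  # the limit itself is exceeded
```

Driving this check from the same usage totals that feed invoicing is what keeps alerts and invoices consistent with each other.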

In summary, the Forge Billing platform is a deterministic, governed pipeline that successfully bridges the gap between raw cloud telemetry and financial accounting. By leveraging StreamHub for reliable event delivery, UTS for normalization, and Plato for high-performance data processing, Atlassian has created a system that prioritizes correctness, auditability, and scale. The platform’s ability to handle 300 million events daily with "exactly-once" processing guarantees demonstrates the maturity of Atlassian’s cloud infrastructure. For the developer community, this system provides the necessary clarity and predictability to build and scale applications on Forge, backed by a billing engine that is as reliable as the core product services it measures.
