How Purse uses load testing to validate every critical payment flow before it goes live

About the company

Purse is a payment orchestrator. It gives merchants connectivity to multiple payment methods and solutions, along with tokenization, authorization, capture, and bank reconciliation.

Purse serves as the shared payment infrastructure for one of Europe's largest retail ecosystems, processing transactions where reliability is the product itself.

Statistics

Industry: Finance
Location: France
Revenue: $38.7 million
Employees: 60+

Key metrics

Load-testing-validated infrastructure migration

Gatling Enterprise users: Dev and QA across product teams
2 test types per campaign
Java simulations as code
1 migration supported with load testing

When payment reliability became the test target

For Purse, performance sits at the core of what the platform delivers.

Every transaction flowing through Purse's APIs depends on low latency and high availability. A slow authorization, a degraded capture flow, or instability during a commercial peak does not just create a poor experience; it breaks a payment.

Nicolas Zangari leads Architecture and Platform at Purse, owning infrastructure decisions, developer experience, and the technical foundations the platform relies on. Thomas leads QA, responsible for testing practices and test automation across product teams. Together, they represent the two sides of how Purse approaches load testing: the infrastructure layer and the test execution layer.

Their shared goal was clear: move load testing from an ad hoc, infrastructure-heavy exercise to a disciplined, accessible practice embedded in how Purse ships software.

Nicolas describes what that looked like before: "Our testing was a bit hit-or-miss, and we really needed to scale and standardize our process. The method itself was reliable, but it was a massive time sink."

That context shaped two concrete objectives:

Validate that critical payment flows sustain target transaction volumes before production
Detect performance regressions before they reach merchants or end users

From JMeter to Gatling: choosing tests-as-code

Before adopting Gatling, the team had worked with JMeter. The transition away from it was driven by a fundamental mismatch: JMeter's GUI-centric approach made it difficult to maintain tests with the same discipline as application code.

For Nicolas, the case for Gatling was straightforward. "The main reason I've been doing Gatling for a long time is tests-as-code," he explains. "Being able to maintain tests the way you maintain code. JMeter's GUI is not for me."

Purse adopted Gatling Community Edition and has written all simulations in Java from the start, a deliberate choice aligned with the team's existing stack. Scala was never considered. Tests live in Git: some as a dedicated module within the application repository, others in a standalone repo. Either way, they follow the same version lifecycle as the code they test.

Moving to Enterprise: removing infrastructure as a blocker

The move from Gatling Community Edition, the open-source version, to Gatling Enterprise Edition was driven by friction. Running load tests at meaningful scale required provisioning and managing virtual machines, and that logistical overhead made load testing feel like an event requiring coordination rather than a routine practice.

"The goal was to make testing a commodity, not an event where you have to mobilize people," says Nicolas.

Gatling Enterprise's managed load generators eliminated the infrastructure concern entirely. Teams can launch a test session without pre-configuring machines, coordinating VPN access, or worrying about whether the environment will hold. The historical record of runs is always accessible: no manual archiving, no lost results from a terminated session.

A second driver was real-time visibility. In Open Source, results were only available after the run completed and the report was built. Enterprise provides live metrics during execution — latency, throughput, error rates — which changes how the team monitors and reacts during a test session.

"The live visualization is what we spend the most time on," says Nicolas. "You can see latency data almost immediately, instead of waiting for the report to be generated at the end."

Test design: payment flows, not pages

Purse's load testing is API-only. The platform has no consumer-facing website; what matters is the behavior of payment APIs under load. Scenarios are designed accordingly.

Each product team owns its own simulations, structured around the critical flows their APIs support. For the core payment product, that means:

Payment session creation (two variants, reflecting different integration modes offered to merchants)
Simulated buyer interactions within the session: card selection, payment method choices
Session validation, which triggers authorization
Combined payment flows, simulating multi-instrument transactions
The full order lifecycle: capture, cancellation, and refund

This year, Purse added a new scenario category around the vault. Purse built an in-house card storage system to replace an outsourced solution, allowing merchants to register and reuse bank card data. Before going live, the vault needed to be validated under realistic load: it would progressively absorb the card storage volume previously handled externally.

The path mix is straightforward. Unlike a consumer website with unpredictable navigation patterns, payment API traffic follows predictable flows. Scenarios mirror what production traffic looks like, with limited variation in user paths.

Two test types, two questions

Purse runs two distinct injection profiles, each answering a different question.

Spike testing: can we sustain a peak?

The first profile simulates sharp, high-volume traffic bursts — the kind triggered by commercial events like Black Friday or seasonal sales. The goal is to validate that the platform holds under the maximum transaction rates Purse needs to support.

The metric here is business-native: transactions per hour, translated to a threshold that has direct meaning to the payment business. Once that threshold is met cleanly, the test passes.
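
As a rough illustration of how a transactions-per-hour target can become a pass/fail verdict, the Java sketch below converts an hourly target into a per-second rate and checks a measured window against it. The class name, the method names, and the figures are all hypothetical, since Purse's real thresholds are not given in this story. Treating "met cleanly" as "zero failed transactions" is also an assumption.

```java
import java.time.Duration;

/** Hypothetical helper: expresses a spike-test verdict in transactions per hour. */
final class SpikeTarget {

    /** Per-second rate needed to reach a transactions-per-hour target. */
    static double requiredPerSecond(long targetPerHour) {
        return targetPerHour / 3600.0;
    }

    /** Extrapolates the successful transactions seen in a window to an hourly rate. */
    static double observedPerHour(long successfulTransactions, Duration window) {
        double seconds = window.toMillis() / 1000.0;
        if (seconds <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return successfulTransactions / seconds * 3600.0;
    }

    /** Passes when the observed hourly rate meets the target and nothing failed (assumed rule). */
    static boolean passes(long successful, long failed, Duration window, long targetPerHour) {
        return failed == 0 && observedPerHour(successful, window) >= targetPerHour;
    }

    public static void main(String[] args) {
        long targetPerHour = 360_000; // illustrative figure, not Purse's real target
        System.out.println("required tx/s: " + requiredPerSecond(targetPerHour));
        System.out.println("passes: " + passes(62_000, 0, Duration.ofMinutes(10), targetPerHour));
    }
}
```

Keeping the threshold in hourly terms means the verdict can be read directly by the payment business, while the per-second figure is what an injection profile would typically be configured with.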

Long-duration testing: does the system degrade over time?

The second profile applies sustained load over an extended period at lower volume. The purpose is different: detecting gradual degradation (memory pressure, connection pool exhaustion, cache behavior changes) that only manifests over time. This is where latency drift and throughput drops become visible.
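
One simple way to put a number on latency drift within a single long run is to compare the start of the run with its end. The sketch below is only an illustration under stated assumptions: the per-minute p95 values are invented, and the window size, the ratio threshold, and all names are hypothetical rather than anything Purse or Gatling Enterprise prescribes.

```java
import java.util.Arrays;

/** Hypothetical check for gradual degradation inside one long-duration run. */
final class SoakDrift {

    /** Mean of values in the half-open index range [from, to). */
    static double mean(double[] values, int from, int to) {
        return Arrays.stream(values, from, to).average().getAsDouble();
    }

    /**
     * Ratio of the mean latency over the last `window` samples to the mean over
     * the first `window` samples. Near 1.0 means stable; well above 1.0 suggests
     * the system slowed down as the run progressed.
     */
    static double driftRatio(double[] latenciesMs, int window) {
        if (window <= 0 || latenciesMs.length < 2 * window) {
            throw new IllegalArgumentException("need at least two full windows");
        }
        double start = mean(latenciesMs, 0, window);
        double end = mean(latenciesMs, latenciesMs.length - window, latenciesMs.length);
        return end / start;
    }

    /** Flags the run when the end of the run is slower than the start by more than maxRatio. */
    static boolean degraded(double[] latenciesMs, int window, double maxRatio) {
        return driftRatio(latenciesMs, window) > maxRatio;
    }

    public static void main(String[] args) {
        // Per-minute p95 latencies in milliseconds; made-up values.
        double[] p95 = {100, 102, 98, 100, 110, 120, 130, 140, 150, 160};
        System.out.println("drift ratio: " + driftRatio(p95, 4));
        System.out.println("degraded: " + degraded(p95, 4, 1.2));
    }
}
```

A spike test would not surface this pattern, because the slowdown only appears once the run has lasted long enough for the first and last windows to differ.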

Purse also uses load testing for a more tactical purpose: fine-tuning deployment behavior. When rolling out services, the team observed small perturbations during deployment windows. Load tests helped dial in the right configuration parameters to achieve clean, error-free rollouts.

How Purse runs and analyzes tests

Tests are launched manually through the Gatling Enterprise UI rather than triggered by an automated pipeline — though pipeline integration was part of the original motivation for moving to Enterprise and remains a future direction.

The current cadence is once per year on critical components before major releases, with the goal of moving toward more frequent, systematic execution to catch regressions earlier. Some campaigns now run without Nicolas's direct involvement, a marker of growing autonomy within product teams.

What the team looks at

Analysis focuses on three signals: error count, latency, and throughput. These are always translated into business terms — transactions per hour — to give results meaning beyond raw HTTP metrics.

For latency, the team compares runs against each other rather than against a fixed reference. Prior runs serve as the baseline, and drift from previous results is what triggers investigation. Closed injection is the default model, which means throughput reflects what the system actually delivers under a given concurrency level.
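
The link between closed injection and delivered throughput can be sketched with Little's Law. In a closed model the number of concurrent users is fixed, so at steady state throughput is roughly users divided by the time each user spends per cycle (response time plus any pause). The Java example below is an idealized illustration with invented numbers and hypothetical names, not a description of Purse's simulations.

```java
/** Hypothetical illustration: in a closed model, throughput is an outcome, not an input. */
final class ClosedModelThroughput {

    /**
     * Little's Law for a closed workload at steady state: each concurrent user
     * repeats "send request, wait for response, pause", so
     * throughput = users / (responseTime + pause).
     */
    static double perSecond(int concurrentUsers, double responseTimeSec, double pauseSec) {
        if (concurrentUsers < 0 || responseTimeSec + pauseSec <= 0) {
            throw new IllegalArgumentException("users must be >= 0 and cycle time > 0");
        }
        return concurrentUsers / (responseTimeSec + pauseSec);
    }

    /** Same figure expressed in transactions per hour. */
    static double perHour(int concurrentUsers, double responseTimeSec, double pauseSec) {
        return perSecond(concurrentUsers, responseTimeSec, pauseSec) * 3600.0;
    }

    public static void main(String[] args) {
        // Made-up numbers: 50 concurrent users, 0.5 s pause between requests.
        System.out.println("healthy:  " + perHour(50, 0.5, 0.5) + " tx/h");
        System.out.println("degraded: " + perHour(50, 1.5, 0.5) + " tx/h");
    }
}
```

With concurrency held constant, a rise in response time shows up directly as a drop in transactions per hour, which is why throughput under a closed profile reflects what the system actually delivers.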

Observability during tests combines Gatling Enterprise's live UI with parallel monitoring in Datadog. Today, that analysis is manual: two windows, two data streams. A native Gatling Enterprise integration with Datadog is now available, which would let the team correlate request-level load test metrics with infrastructure resource consumption in a single dashboard.

A major validation: migrating to a new infrastructure

The most significant load testing exercise Purse has run to date came in the second half of 2024, when the team completed a full migration to a new, independently operated infrastructure.

The platform's core system had previously run in a shared hosting context. The migration meant deploying the same software on a completely new stack — with high availability and scalability built from scratch by Purse's own team.

The software was known and trusted. But validating that a completely new infrastructure could sustain production load was a different question, and one that load testing helped answer before any traffic was shifted.

The tests confirmed that the new platform met known performance benchmarks, and the migration was completed with that confidence.

Key Gatling Enterprise capabilities Purse relies on

Two capabilities dominate the team's day-to-day use.

Real-time run visualization. The live dashboard during a test session is where the team spends most of its attention. Seeing latency and throughput develop in real time, rather than waiting for a post-run report, changes the nature of the monitoring exercise. Issues are visible while the test is still running.

Run comparison. Comparing the current run against previous ones is the primary method for detecting regression. Without a formal reference run in all cases, the history of results provides the benchmark. The comparison view makes drift visible without requiring manual data extraction.
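
The idea of using run history as the benchmark can be illustrated outside any tool. The sketch below takes the median p95 latency of earlier runs as a baseline and flags the current run when it drifts beyond a tolerance. The choice of median, the 10% tolerance, and every name and number here are assumptions for illustration; this is not how the Gatling Enterprise comparison view is implemented.

```java
import java.util.Arrays;

/** Hypothetical regression check that uses earlier runs as the baseline. */
final class RunComparison {

    /** Median p95 latency of previous runs, standing in for a formal reference run. */
    static double baseline(double[] previousP95Ms) {
        if (previousP95Ms.length == 0) {
            throw new IllegalArgumentException("at least one previous run is required");
        }
        double[] sorted = previousP95Ms.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Relative drift from the baseline: 0.10 means the current run is 10% slower. */
    static double relativeDrift(double currentP95Ms, double[] previousP95Ms) {
        double base = baseline(previousP95Ms);
        return (currentP95Ms - base) / base;
    }

    /** True when the current run is slower than the baseline by more than the tolerance. */
    static boolean needsInvestigation(double currentP95Ms, double[] previousP95Ms, double tolerance) {
        return relativeDrift(currentP95Ms, previousP95Ms) > tolerance;
    }

    public static void main(String[] args) {
        double[] history = {180, 200, 190, 210}; // p95 in ms from earlier runs; made-up values
        System.out.println("baseline: " + baseline(history));
        System.out.println("investigate 234 ms: " + needsInvestigation(234, history, 0.10));
        System.out.println("investigate 200 ms: " + needsInvestigation(200, history, 0.10));
    }
}
```

A median keeps one unusually slow or fast historical run from shifting the baseline, which matters when there are only a handful of prior campaigns to compare against.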

Benefits: access, confidence, and shared ownership

For Nicolas and Thomas, the most tangible change since adopting Gatling Enterprise is the removal of friction that used to make load testing feel like an exceptional operation.

"There's no longer an infrastructure topic in our load testing," says Nicolas. "It's clearly become much more accessible and professional."

That shift has had a secondary effect: knowledge about load testing is spreading across teams. Scenarios are owned by the teams that build the features, not centralized with a single expert. Campaigns run without the coordination overhead they once required. Historical results are always one click away.

Collaboration has changed too. Thomas points to a shift in how teams relate to testing: ownership is less concentrated, scenarios are more widely understood, and the practice is no longer dependent on a single person being in the room.

Next steps

Purse's immediate priority is regularity. Moving from annual testing on critical components to a more systematic, recurring cadence is the next step: one that would let the team catch regressions as they are introduced rather than only before major releases.

Beyond that, two directions are on the horizon.

The first is team autonomy: continuing to push ownership of load testing into product teams, so campaigns run as part of the normal delivery rhythm rather than as separate initiatives.

The second is infrastructure optimization. As the platform's usage grows, load testing data could inform decisions about right-sizing infrastructure: identifying components that carry unnecessary overhead and tuning capacity to improve economic efficiency without sacrificing reliability.