Load testing for Black Friday: what actually matters (from a QA perspective)

Marijana Rukavina
QA Practice Lead
CYBER64

Performance testing feels like a broad topic when viewed from a QA perspective. I learned that especially through load testing work with Gatling, where the task is not only to write scripts and press run.

The difficult part is deciding what kind of user behavior you are trying to simulate, which load pattern makes sense, which data can be trusted, and how to explain the results when something starts to slow down.

If you're preparing for a high-traffic event like Black Friday ($11.8 billion in U.S. online sales in 2025 alone), the goal isn't to "run some performance tests." The goal is to understand where your application can break, how it behaves under realistic traffic, and what needs to be fixed before customers are affected.

Over the years, working in QA, especially on e-commerce projects, I've seen how performance issues rarely appear as a single obvious failure. Sometimes checkout slows down because of a third-party dependency. Sometimes search looks fine until realistic catalog data is used. Sometimes the test itself fails because the test data has become obsolete.

That is why I wanted to write this from a QA perspective: not as a perfect performance engineering playbook, but as a practical guide to what to think about when preparing for an event like Black Friday.

Why QA belongs in load testing

Performance testing is often seen as a DevOps or performance engineering task. But from my experience, QA plays an important role in modeling relevant user journeys for performance tests and brings business context to test data, because a load test is only useful if it tests something that matters.

QA teams understand:

  • Which journeys matter most to customers
  • Where users are most likely to abandon a flow
  • Which defects have caused production issues before
  • How edge cases appear in real workflows

For example, testing the homepage with 10,000 users may look impressive. But if Black Friday revenue depends on product search, add to cart, checkout, payment authorization, and order confirmation, then those flows deserve more attention than generic homepage tests because they are the journeys that directly affect revenue.

That's where QA adds real value. We help connect the technical test to the real customer journey.

Start with critical paths

Before choosing test types, identify your critical paths.

A critical path is a user journey that must work under pressure. If it slows down or fails, the business feels it fast. Best Buy's site crashed on Black Friday 2025, for example.

For an e-commerce platform, critical paths usually include:

  • Homepage load
  • Category and product listing pages
  • Search
  • Product detail page
  • Add to cart
  • Cart update
  • Checkout
  • Payment authorization
  • Order confirmation
  • Login and account access
  • Promo code validation
  • Inventory and availability checks

QA can help rank these paths by business risk.

A simple way to start is with this table:

Test priority by user flow

Flow                    | Why it matters                | Risk level | Test priority
Checkout                | Direct revenue impact         | High       | Critical to test
Search                  | Helps users find products     | High       | Critical to test
Account profile update  | Useful but not event-critical | Low        | Test later
Order history           | Important after purchase      | Medium     | Test if time allows

This keeps the performance strategy practical. You're not trying to test everything with equal effort. You're testing what can hurt the customer, the team, or the business the most.

From my experience, this is also where you need to know the system well enough to pick the right battles. It is not enough to say "let's load test checkout." You need to ask what you actually want to validate: cache behavior, CDN impact, payment latency, inventory calls, search, or the full end-to-end journey.

The main performance tests QAs should understand

Different tests answer different questions. Running only one type of test gives you only part of the puzzle.

Load test

A load test checks how your system performs under expected traffic.

This is your baseline test.

It answers:

  • Can the system handle normal peak traffic?
  • Are response times acceptable?
  • Do key flows stay stable?
  • Does infrastructure scale as expected?
  • Do errors stay below the agreed threshold?

For Black Friday, this might mean testing your expected peak traffic based on last year's data, marketing forecasts, and current growth.

Example:

  • 5,000 concurrent users browsing products
  • 1,000 users adding items to cart
  • 300 users checking out
  • 50 payment requests per minute
  • 95% of product pages loading under 1 second
  • Checkout error rate below 1%

The goal isn't to break the system. The goal is to confirm that it behaves well under realistic traffic.
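
The response-time and error-rate targets in the example can be written down as explicit pass/fail checks. The sketch below is a minimal Python illustration, not Gatling code: the function names and the sample numbers are assumptions made for illustration, and only the two limits (95% under 1 second, errors below 1%) come from the example list.

```python
def share_under(times_ms, limit_ms):
    """Fraction of samples that finished faster than the limit."""
    if not times_ms:
        raise ValueError("no samples to evaluate")
    return sum(1 for t in times_ms if t < limit_ms) / len(times_ms)


def check_targets(product_page_ms, checkout_failed, checkout_total):
    """Evaluate the two example targets: 95% under 1 second, errors below 1%."""
    error_rate = checkout_failed / checkout_total if checkout_total else 0.0
    return {
        "product_pages_95pct_under_1s": share_under(product_page_ms, 1000) >= 0.95,
        "checkout_errors_below_1pct": error_rate < 0.01,
    }


# Hypothetical sample: 95 fast page loads, 5 slow ones, 2 failed checkouts out of 300
sample_pages = [400] * 95 + [2500] * 5
print(check_targets(sample_pages, checkout_failed=2, checkout_total=300))
# {'product_pages_95pct_under_1s': True, 'checkout_errors_below_1pct': True}
```

Agreeing on checks like these before the run makes the result a yes or no answer instead of a discussion about impressions.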

This is also one of the harder things to do in load testing. Designing the injection profile sounds simple until you need to decide how users arrive, how long the ramp-up should be, how long to sustain the load, what arrival rate to use, and how to ramp down.

One way to make injection profile design less subjective is to start from analytics data instead of guessing.

For Black Friday, I would look at last year's traffic, current growth forecasts, campaign plans, peak sessions per minute, conversion rate, checkout start rate, add-to-cart rate, guest vs. logged-in users, and the distribution between browsing, search, cart, and checkout journeys.

The goal is not to copy analytics exactly, but to turn real user behavior into a realistic load model.

For example, if analytics shows that the expected Black Friday peak is around 150k sessions per hour, that is roughly 42 new sessions per second. But I would also check the busiest 5-minute window, because hourly averages can hide sharp traffic spikes. If campaign traffic can push that number to 70 new sessions per second, that should be reflected in the Gatling profile.
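
The conversion from hourly sessions to an arrival rate is simple arithmetic, and a small Python sketch makes it repeatable. The function names are my own, and the 21,000 sessions in the busiest 5-minute window is a hypothetical figure chosen only to reproduce the 70 sessions per second mentioned above.

```python
def sessions_per_second(sessions_per_hour):
    """Average arrival rate per second for an hourly session count."""
    return sessions_per_hour / 3600


def peak_window_rate(sessions_in_window, window_minutes):
    """Arrival rate inside a short window, such as the busiest 5 minutes."""
    return sessions_in_window / (window_minutes * 60)


hourly_peak = 150_000  # expected peak from the example above
print(round(sessions_per_second(hourly_peak), 1))  # 41.7, roughly 42 per second

# Hypothetical busiest 5-minute window (not a measured value)
print(peak_window_rate(21_000, 5))  # 70.0
```

The gap between the two numbers is the reason to look at short windows: a profile built only from the hourly average would undershoot the real peak.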

I would then split that arrival rate across journeys. Not every user should run the full checkout scenario. Real users behave differently: many browse, some search, fewer add to cart, and even fewer complete checkout. A more realistic test might send 50% of users through browsing, 20% through search, 15% through add to cart activity, 8% through checkout, and the rest through login or account actions.
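
As a rough sketch, the split can be calculated like this in Python. The journey names are illustrative, and the 7% for login or account actions is simply "the rest" after the four shares named above.

```python
JOURNEY_MIX = {
    "browse": 0.50,
    "search": 0.20,
    "add_to_cart": 0.15,
    "checkout": 0.08,
    "login_or_account": 0.07,  # the remainder after the four shares above
}


def split_arrival_rate(total_per_second, mix):
    """Distribute a total arrival rate across journeys according to their share."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("journey shares must add up to 100%")
    return {journey: round(total_per_second * share, 2) for journey, share in mix.items()}


# Using the campaign peak of 70 new sessions per second from the example
print(split_arrival_rate(70, JOURNEY_MIX))
# {'browse': 35.0, 'search': 14.0, 'add_to_cart': 10.5, 'checkout': 5.6, 'login_or_account': 4.9}
```

Each value then becomes the arrival rate for one scenario, so the checkout scenario in this example would receive fewer than 6 new users per second even at peak.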

This helps avoid one of the most common mistakes in load testing: creating a technically clean scenario that does not represent real customer behavior.

For ramp-up, I would again use analytics. If traffic usually builds gradually, I would reflect that with a longer ramp-up. For sustained load, I would hold the expected peak long enough to observe system behavior, usually 30 to 60 minutes for baseline validation, and 2 to 4 hours for stronger Black Friday confidence.

Ramp-down also matters. I would not stop the test immediately after peak load. I would ramp down gradually and monitor the system to see whether response times recover or error rates normalize.
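
To make the shape of such a profile concrete, here is a small Python sketch that describes ramp-up, sustained load, and ramp-down as phases and computes the target arrival rate at any minute of the run. It is a planning aid under assumed numbers, not Gatling code: the 10-minute ramps and the 45-minute hold are hypothetical, and the 42 sessions per second comes from the earlier example. In Gatling itself, an open-model profile like this would typically be expressed with injection steps such as rampUsersPerSec and constantUsersPerSec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    start_rate: float  # new sessions per second at the start of the phase
    end_rate: float    # new sessions per second at the end of the phase
    minutes: float


def rate_at(profile, minute):
    """Linearly interpolated target arrival rate at a given minute of the run."""
    elapsed = 0.0
    for phase in profile:
        if minute <= elapsed + phase.minutes:
            progress = (minute - elapsed) / phase.minutes
            return phase.start_rate + (phase.end_rate - phase.start_rate) * progress
        elapsed += phase.minutes
    return 0.0  # after the last phase no new users are injected


# Hypothetical durations; 42 sessions per second is the peak from the example
PROFILE = [
    Phase("ramp-up", 0, 42, 10),
    Phase("sustain", 42, 42, 45),
    Phase("ramp-down", 42, 0, 10),
]

print(sum(phase.minutes for phase in PROFILE))  # 65 minutes in total
print(rate_at(PROFILE, 5))   # 21.0, halfway through the ramp-up
print(rate_at(PROFILE, 60))  # 21.0, halfway through the ramp-down
```

Writing the profile down as data makes it easier to review with the team and to keep identical between runs.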

In practice, not every load test needs to run for hours. Many useful Gatling tests I worked with were around 30 minutes total, including ramp-up, sustained load, and ramp-down. This can be enough for baseline checks or comparing performance after a change.

This is where QA can help connect analytics, business risk, and technical behavior.

Gatling can generate the traffic, but the quality of the test depends on whether the injection profile reflects how users actually arrive and what they actually do.

Stress test

A stress test pushes the system beyond expected traffic.

It answers:

  • Where does the system start to fail?
  • Which service becomes the bottleneck first?
  • How does the system recover?
  • Do failures happen safely?
  • Does one slow dependency affect everything else?

This is useful because traffic forecasts are never perfect.

Maybe a campaign performs better than expected. Maybe a product goes viral.

A stress test helps you find the limit before real users do.

Example:

  • The expected peak is 5,000 concurrent users
  • Stress test increases to 7,500, then 10,000
  • The team watches response times, error rates, CPU, memory, database load, and queue depth
  • The test stops when the system hits agreed failure criteria
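
The stepped approach in this example can be sketched as a simple check over the results of each load level. This is an illustrative Python sketch: the failure criteria (p95 above 3 seconds or more than 5% errors) and the observed numbers are hypothetical, and the real criteria are whatever the team agreed on before the run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    users: int
    p95_ms: float
    error_rate: float  # 0.02 means 2%


# Hypothetical failure criteria, agreed before the test starts
MAX_P95_MS = 3000
MAX_ERROR_RATE = 0.05


def first_failing_step(results):
    """Return the lowest load level that breaches a failure criterion, or None."""
    for step in sorted(results, key=lambda r: r.users):
        if step.p95_ms > MAX_P95_MS or step.error_rate > MAX_ERROR_RATE:
            return step
    return None


# Hypothetical observations for the three load levels in the example
observed = [
    StepResult(5_000, 800, 0.002),
    StepResult(7_500, 1_900, 0.010),
    StepResult(10_000, 5_000, 0.080),
]

limit = first_failing_step(observed)
print(limit.users if limit else "no limit found")  # 10000
```

In this made-up run the system holds at 7,500 users and fails at 10,000, which tells the team the limit sits somewhere between the two.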

Failure is not bad in a stress test. It gives your team useful data.

The real question is: did you learn where the limit is, and can you improve it before launch?

One thing I learned is that Gatling usually shows the symptom first. Gatling might tell me that the checkout p95 response time jumped from 800 ms to 5 seconds, or that payment authorization started returning more errors after traffic reached a certain level.

That tells me the user journey is affected, but to understand whether the cause is the payment provider, inventory service, database, cache, or application code, I need to combine load testing with observability. This is where APM tools like Dynatrace are useful, because they help trace the issue through services, dependencies, database calls, and infrastructure metrics.

Soak test

A soak test runs under sustained load for a longer period.

It answers:

  • Does performance degrade over time?
  • Do memory leaks appear?
  • Do queues keep growing?
  • Do scheduled jobs interfere with traffic?
  • Do logs, caches, or database connections cause problems after hours of use?

This matters because real traffic events don't always happen in short bursts.

Black Friday can mean many hours of elevated traffic. A system that performs well for 20 minutes might still degrade after six hours.

Example:

  • Run 60% to 80% of the expected peak load for 6 to 12 hours
  • Include browsing, search, cart, and checkout flows
  • Monitor memory, garbage collection, database connections, cache hit rates, and background jobs
  • Check whether response times slowly increase
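
One hedged way to check the last point is to fit a straight line through the response times collected during the run and look at the slope. The Python sketch below uses a plain least-squares fit. The hourly p95 values are hypothetical, and what counts as an acceptable slope is a decision for the team, not something the code can know.

```python
def slope_per_hour(samples):
    """Least-squares slope of response time (ms) against elapsed hours.

    samples is a list of (elapsed_hours, response_time_ms) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(hours for hours, _ in samples) / n
    mean_y = sum(ms for _, ms in samples) / n
    variance = sum((hours - mean_x) ** 2 for hours, _ in samples)
    if variance == 0:
        raise ValueError("samples must cover more than one point in time")
    covariance = sum((hours - mean_x) * (ms - mean_y) for hours, ms in samples)
    return covariance / variance


# Hypothetical hourly p95 values for checkout during a 6-hour soak test
hourly_p95 = [(0, 800), (1, 820), (2, 840), (3, 860), (4, 880), (5, 900), (6, 920)]
print(slope_per_hour(hourly_p95))  # 20.0 ms slower per hour
```

A drift of 20 ms per hour is invisible in a 30-minute run, but over 12 hours it adds up to roughly a quarter of a second.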

A soak test is especially useful for finding slow problems. These are the issues that don't show up in short tests but can cause problems in production later.

My experience with soak tests is that the challenge is not only technical. It is also practical. They take time, they occupy an environment, and they slow down the feedback loop.

With a 30-minute Gatling run, I can usually adjust the script, rerun it, and compare results quickly. With a 6-hour soak test, every mistake is expensive. If the data is wrong, monitoring is missing, or the environment changes during the run, you may lose hours and still not have a trustworthy result.

What to test for Black Friday

For Black Friday preparation, I would not start by trying to test every possible scenario or by creating too many different load profiles at once.

In my experience, that can make the whole process harder to control. If every run has a different traffic mix, different injection profile, different data set, and different journey split, it becomes difficult to compare results and understand whether a change actually improved performance.

I would rather start with a simple, solid baseline: the most important customer journeys, realistic data, and one traffic model that the team understands.

For me, this is where the KISS principle really applies. Keep the first version simple enough that the team can trust it, repeat it, and compare it.

The baseline should follow the customer journey. I would ask: what needs to work for a customer to find a product, trust the price, complete the purchase, and receive confirmation that the order went through?

Browsing and discovery

A large part of Black Friday traffic will come from users who are searching, browsing categories, opening product listing pages, applying filters, viewing product detail pages, and comparing products. Many of them may never reach checkout, but these "ghost loads" still create significant pressure on the system. This is where search, filters, product images, catalog data, cache behavior, and CDN performance can have a big impact. If this part of the journey is slow, users may leave before they ever add something to the cart: 63% of visitors bounce from pages that take over four seconds to load.

Cart activity

Once users find products, the next important area is cart activity. Adding to cart, changing quantities, removing products, checking availability, and keeping cart state consistent are all important under load. From a QA perspective, I would pay close attention to correctness here. The cart needs to be fast, but it also needs to show the right product, quantity, price, stock status, and promotion information. A cart that loads quickly but shows incorrect data is still a serious problem.

Promotions and pricing

I would also give special attention to promotions and pricing, because Black Friday often depends on them. Discount codes, automatic promotions, bundle offers, loyalty pricing, tax calculation, shipping rules, and regional pricing can all create complexity. These rules are also often changed close to launch, which increases risk. In performance testing, I would not only check whether the pricing services respond quickly. I would also check whether the totals are correct and consistent across the product page, cart, and checkout. A fast wrong price is still a failed customer experience.
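
A consistency check like this does not need to be complicated. The following Python sketch assumes the test has already captured the price shown at each step for a sample of products. The SKU values, step names, and prices are hypothetical.

```python
from decimal import Decimal


def inconsistent_skus(observed):
    """Return SKUs whose price is not identical at every captured step."""
    return sorted(
        sku for sku, prices in observed.items() if len(set(prices.values())) > 1
    )


# Hypothetical prices captured during a run; Decimal avoids float rounding noise
observed = {
    "SKU-1001": {
        "product_page": Decimal("15.99"),
        "cart": Decimal("15.99"),
        "checkout": Decimal("15.99"),
    },
    "SKU-1002": {
        "product_page": Decimal("39.00"),
        "cart": Decimal("39.00"),
        "checkout": Decimal("49.00"),  # promotion lost at checkout
    },
}

print(inconsistent_skus(observed))  # ['SKU-1002']
```

Running the same check at low load and at peak load helps show whether a mismatch is a plain functional bug or something that only appears under pressure.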

Login and guest checkout

Another journey I would include is login and guest checkout. Black Friday traffic is usually a mix of returning customers and new visitors. Some users will log in, some will create accounts, some may need a password reset, and some will continue as guests. If login becomes slow, guest checkout can become even more important. This part of the test can also reveal pressure on authentication services, session handling, token refresh, and rate limits.

Checkout

The highest-risk area is still checkout and payment. This is where performance issues directly affect revenue. I would test shipping selection, tax calculation, payment authorization, order placement, and order confirmation. For checkout, I would never stop at "the page responded." I would want to know whether the order was actually created correctly, whether the payment was authorized, whether the confirmation appeared, and whether the expected order events were triggered. A fast checkout response does not mean much if the order is missing, duplicated, or incorrect.
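
One way to go beyond "the page responded" is to reconcile what the load test submitted with what the system actually stored. The Python sketch below assumes each checkout attempt carries an identifier that can be found again on the created order. The field names checkout_id and payment_status are hypothetical, as is the way the order list is obtained.

```python
from collections import Counter


def reconcile_orders(submitted_ids, created_orders):
    """Compare checkout attempts the test saw succeed with orders the system stored."""
    created_counts = Counter(order["checkout_id"] for order in created_orders)
    return {
        "missing": sorted(set(submitted_ids) - set(created_counts)),
        "duplicated": sorted(cid for cid, count in created_counts.items() if count > 1),
        "unauthorized": sorted(
            order["checkout_id"]
            for order in created_orders
            if order["payment_status"] != "authorized"
        ),
    }


# Hypothetical data: four checkouts returned a success response during the test
submitted = ["c1", "c2", "c3", "c4"]
created = [
    {"checkout_id": "c1", "payment_status": "authorized"},
    {"checkout_id": "c2", "payment_status": "authorized"},
    {"checkout_id": "c2", "payment_status": "authorized"},  # stored twice
    {"checkout_id": "c4", "payment_status": "declined"},
]

print(reconcile_orders(submitted, created))
# {'missing': ['c3'], 'duplicated': ['c2'], 'unauthorized': ['c4']}
```

All four requests in this example looked successful from the load generator's side, yet only one order is fully correct.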

Third-party dependencies

I would also look beyond the main application and include third-party dependencies in the test strategy. Payment providers, tax services, inventory systems, search providers, email providers, analytics scripts, and ERP integrations can all become part of the Black Friday risk. If one dependency slows down, the important question is whether the application degrades gracefully or whether the whole customer journey starts to collapse.

Once that baseline is trusted, I would add more focused test runs with different traffic patterns, such as increasing load beyond the expected peak to understand limits, or sustaining load longer to check stability over time.

For me, this is the main point: Black Friday testing should follow the real path of the customer. The journey starts with discovery, continues through cart and pricing, moves into login or guest checkout, and ends with payment and order confirmation. Each part can fail differently, and QA can help make sure the test checks not only speed, but also whether the customer can actually complete what they came to do.

Test data is not a small detail

One of the biggest practical challenges in load testing is test data.

Gatling can read a CSV easily, but it will not validate whether the data makes business sense. This part is on us.

For e-commerce testing, the data needs to be representative:

  • SKUs should not all be too similar
  • Test users should represent realistic account behavior
  • Products should be published and available
  • Stock, pricing, and promotions should be realistic
  • Different journeys should have different data needs

One generic CSV is usually not enough. Browsing, search, cart, checkout, promotions, and account flows all need different data.

I have not used the Gatling AI Assistant for this myself, but this is an area where an assistant could help: reviewing the simulation, explaining how the feeder is used, suggesting validations, or helping generate checks that catch obvious data problems earlier.

For example, it could help you think through whether each CSV row has the fields the journey needs, whether different scenarios need different feeders, or whether the script should validate that a product can actually be added to the cart before continuing to checkout.

Still, it would not replace business knowledge. The assistant can help improve the script and suggest technical checks, but QA still needs to know whether the data represents real customer behavior and valid business rules.

Data freshness is also a real problem. SKUs change, stock changes, products can become unpublished, and test accounts can stop being useful. When test data is outdated, Gatling may fail because the data is no longer valid, not because the system has a real performance problem. This problem looks small before the test, but it can waste a lot of time once the run starts.
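
A small pre-flight check on the feeder file can catch some of these problems before the run starts. The Python sketch below is only an illustration: the column names (sku, published, stock, price) are hypothetical, and it validates the file itself, so it cannot tell whether the environment still agrees with what the file says.

```python
import csv
import io

REQUIRED_FIELDS = ("sku", "published", "stock", "price")  # hypothetical column names


def find_bad_rows(csv_text):
    """Return (line_number, reason) for rows that would break a cart or checkout journey."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            problems.append((line_number, "missing " + ", ".join(missing)))
        elif row["published"].strip().lower() != "true":
            problems.append((line_number, "product is not published"))
        elif int(row["stock"]) <= 0:
            problems.append((line_number, "out of stock"))
    return problems


SAMPLE = """sku,published,stock,price
SKU-1001,true,25,19.99
SKU-1002,false,10,49.00
SKU-1003,true,0,5.50
SKU-1004,true,,12.00
"""

print(find_bad_rows(SAMPLE))
# [(3, 'product is not published'), (4, 'out of stock'), (5, 'missing stock')]
```

To cover freshness as well, the same idea can be extended to compare each row against the catalog or inventory of the test environment shortly before the run.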

This is one of those QA battles that is not always visible in the final report, but it affects the quality of the whole test.

Common mistakes to avoid during Black Friday performance testing

In my experience, Black Friday performance testing usually goes wrong when the test is too generic, too late, or too disconnected from real customer behavior.

Avoid these mistakes:

  • Testing only the homepage
  • Using perfect or unrealistic test data
  • Ignoring checkout correctness
  • Ignoring third-party dependencies
  • Running tests only one week before launch
  • Using unrealistic user behavior
  • Measuring averages instead of percentiles
  • Treating a stress test failure as a surprise instead of useful data
  • Forgetting to retest after fixes
  • Testing in an environment that is too different from production
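
The point about averages and percentiles is easy to show with numbers. In the Python sketch below, the response times are hypothetical and the p95 uses the nearest-rank method, which is one of several common percentile definitions.

```python
import math
import statistics


def p95(values):
    """Nearest-rank 95th percentile: the value at or below which 95% of samples fall."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical checkout response times: 90 fast requests and 10 very slow ones
times_ms = [300] * 90 + [6000] * 10

print(f"mean: {statistics.mean(times_ms):.0f} ms")  # mean: 870 ms
print(f"p95:  {p95(times_ms)} ms")                  # p95:  6000 ms
```

The average suggests a checkout that responds in under a second, while one in ten customers in this sample waits six seconds.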

For me, the "too late" part is especially important. It is much better to shift testing left and make performance testing part of the SDLC in smaller, regular steps, instead of letting it become a last-minute activity before Black Friday.

The best performance tests are practical. They model real behavior, use real constraints, and give the team clear next steps.

What "ready" looks like before Black Friday

You don't need perfect results. You need confidence.

Before Black Friday, you should know:

  • Which journeys were tested
  • What traffic model was used
  • Where the system starts to fail
  • Which bottlenecks were fixed
  • Which risks are still accepted
  • How the system behaves during spikes
  • How it recovers after a load drop
  • Whether alerts and dashboards work
  • Who owns each production response action

That last point matters. Performance testing is not only about finding issues. It's about preparing the team to respond.

It also matters to compare runs after fixes. This is where Gatling Enterprise can be really useful, because the run comparison makes progress visible. It is easier to show whether a change helped when you can compare response times and trends across runs, rather than relying only on separate reports or impressions.
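
The underlying comparison is plain arithmetic, whichever tool displays it. The Python sketch below is not a Gatling Enterprise feature or API: it only shows the calculation, and the journey names and p95 values are hypothetical.

```python
def compare_runs(baseline_p95, candidate_p95):
    """Percentage change in p95 per journey; negative means the candidate run was faster."""
    return {
        journey: round(
            (candidate_p95[journey] - baseline_p95[journey]) / baseline_p95[journey] * 100, 1
        )
        for journey in baseline_p95
        if journey in candidate_p95
    }


# Hypothetical p95 values in milliseconds, before and after a fix
baseline = {"search": 620, "checkout": 1800}
after_fix = {"search": 640, "checkout": 1200}

print(compare_runs(baseline, after_fix))
# {'search': 3.2, 'checkout': -33.3}
```

A comparison like this is only meaningful when both runs used the same injection profile, journey split, and data set.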

One thing I would not skip

If I had to give one piece of advice before a Black Friday load test, it would be this: do not start with the tool. Start with the risk.

Before opening Gatling, I would ask:

  • Which customer journeys would hurt the business most if they failed?
  • What traffic pattern are we actually expecting?
  • Is our test data realistic enough to trust the result?
  • Which third-party dependencies could slow down checkout?
  • What will we do if the test exposes a bottleneck?

The tool can generate traffic, but it cannot decide what matters. That is where QA adds value.

For me, a successful load test is not just one that produces a nice report. It is one that helps the team make a better decision before real customers are affected.

That is the mindset I would bring into Black Friday preparation: test the journeys that matter, question the assumptions behind the test, and make sure the results lead to action.

FAQ

What are the 4 main types of performance testing for Black Friday?

Load testing validates system behavior under expected peak traffic, stress testing pushes beyond expected limits to find breaking points, soak testing runs sustained load for hours to catch degradation over time, and spike testing simulates sudden traffic surges from campaigns or viral products.

Is Black Friday testing only about homepage performance?

No, Black Friday testing focuses on revenue-critical journeys like search, add to cart, checkout, and payment authorization rather than just homepage load, because these flows directly affect whether customers can complete purchases during high traffic.

Why does test data quality matter in load testing?

Gatling executes scripts but cannot validate whether SKUs are published, stock levels are realistic, or pricing rules match production behavior. Outdated or generic test data causes false failures that waste time during critical preparation periods.

When should Black Friday load testing start?

Performance testing works best as part of regular development cycles rather than one week before the event, because late testing turns every finding into an urgent fix with no time to validate improvements or retest after changes.
