Author: Anirudh Jaiswal (Senior QA Engineer)

In today's insurance landscape, real-time data is no longer a luxury — it's a necessity.

The Guidewire Cloud Data Platform empowers Property and Casualty insurers to efficiently manage and process vast volumes of data. It integrates seamlessly with the core Guidewire applications (PolicyCenter, ClaimCenter, and BillingCenter), offering scalable storage, real-time analytics, and actionable insights.

One of its key strengths is its real-time access to core system data, which enables swift and informed decisions across operations such as claims adjustment, risk assessment, and customer engagement.

To support this, the platform features resilient big data pipelines that collect, process, and analyze billions of records and manage petabytes of data (a petabyte is roughly equivalent to 500 billion pages of standard typed text). It leverages Amazon Web Services (AWS) — a suite of powerful cloud tools and services — to meet insurers' demanding needs for storage, processing, and analytics.

End-to-End Architecture of the Guidewire Data Platform

Efficient ingestion is the cornerstone of real-time analytics, especially as insurers increasingly rely on predictive models and automated workflows. To facilitate this, Guidewire employs Debezium, a change data capture (CDC) tool that streams database changes from transactional systems, such as claims and policy databases, to the Guidewire Cloud Data Platform. (Debezium acts like a digital "watchdog," capturing and transmitting only what's changed — like sending updates instead of the whole document every time you edit.)
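
Debezium connectors typically run on Kafka Connect and are registered through its REST API. The sketch below registers a hypothetical PostgreSQL connector; the host names, credentials, and table list are illustrative placeholders, not Guidewire's actual configuration.

```python
import requests

# Kafka Connect REST endpoint (illustrative host and port).
CONNECT_URL = "http://connect.example.internal:8083/connectors"

# Minimal Debezium PostgreSQL connector config; database details and
# table names are placeholders, not Guidewire's real settings.
connector = {
    "name": "claims-db-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "claims-db.example.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "********",
        "database.dbname": "claims",
        "topic.prefix": "claims",                # prefix for emitted Kafka topics (Debezium 2.x naming)
        "table.include.list": "public.cc_claim", # stream changes from this table only
        "plugin.name": "pgoutput",               # Postgres built-in logical decoding plugin
    },
}

# Registering the connector starts change data capture immediately.
resp = requests.post(CONNECT_URL, json=connector, timeout=10)
resp.raise_for_status()
print(resp.json()["name"], "registered")
```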

This architecture equips insurers to remain agile, responsive, and consistent, even across complex, distributed systems. But what does this mean for day-to-day operations — and where are the real gains?

Why Performance Testing Matters for Data Ingestion

Data ingestion performance testing, focused on Debezium, is a critical practice for ensuring the smooth operation of the Guidewire Data Platform. It validates key metrics such as throughput, latency, and resource utilization, helping insurers maintain real-time data processing without sacrificing reliability or accuracy.

Ultimately, performance testing safeguards the speed and integrity of data pipelines, ensuring that the information powering decision-making is processed efficiently and without disruption. These benefits directly support business continuity in modern insurance operations.

The following points illustrate why performance testing is essential:

  • Real-Time Decision-Making: Timely claims processing, underwriting, and policy updates depend on uninterrupted, low-latency data flow. Testing confirms that ingestion pipelines meet high-throughput demands without delay.
  • Scalability Under Load: As data volumes grow, especially in the insurance industry, testing ensures the platform scales without degrading performance or introducing delays.
  • Business Continuity: Delays in ingestion can ripple through to downstream systems, affecting decisions and customer experiences. Testing ensures resilience under peak loads and failure scenarios.
  • Efficient Resource Use: Testing highlights inefficiencies — such as excessive CPU, memory, or bandwidth consumption — so teams can optimize for both performance and cost.

Performance testing protects both data reliability and operational agility, key pillars of a resilient data platform. To make that possible, we rely on a core set of metrics that measure and validate ingestion performance under real-world conditions.

Core Metrics for Testing Data Ingestion Performance

To evaluate the performance and reliability of the data ingestion pipeline, several key metrics should be monitored during testing (a short sketch after the list shows how the first two can be computed):

  • Throughput: Measures how much data the system ingests over time (e.g., records per second or megabytes per second). High throughput is critical when processing large data volumes, especially in real-time scenarios.
  • Latency: Refers to the delay between when data is generated and when it becomes available in the target system. For real-time platforms like the Guidewire Cloud, low latency is crucial for enabling timely decisions and actions.
  • Scalability: Assesses how well the ingestion pipeline handles increasing data loads. Performance testing simulates different traffic patterns to verify that the system scales without degradation.
  • Resource Utilization: Tracks the utilization of system resources, including CPU, memory, and network bandwidth. Excessive usage can indicate inefficiencies or bottlenecks in the pipeline that may affect performance and cost.
  • Error Rate and Data Consistency: Validates that data is ingested accurately and consistently. Testing should confirm a low error rate, no data loss, and complete data integrity throughout the pipeline.
  • Impact on Source Databases: Evaluates the load placed on transactional source systems. The CDC process, especially under high-throughput conditions, can increase disk I/O and CPU usage.
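
As a concrete illustration of the first two metrics, the following minimal sketch derives throughput and per-record latency from hypothetical source-commit and ingestion timestamps; the event records here are made up for demonstration.

```python
from datetime import datetime, timedelta

# Hypothetical ingestion log: when each record was committed at the
# source and when the connector finished processing it.
events = [
    {"committed_at": datetime(2024, 1, 1, 12, 0, 0, 0),
     "ingested_at": datetime(2024, 1, 1, 12, 0, 0, 250_000)},
    # ... one entry per ingested record
]

def throughput(events, window: timedelta) -> float:
    """Records ingested per second over the observed window."""
    return len(events) / window.total_seconds()

def latency_ms(event) -> float:
    """Delay between source commit and ingestion, in milliseconds."""
    return (event["ingested_at"] - event["committed_at"]).total_seconds() * 1000

print(f"throughput: {throughput(events, timedelta(seconds=1)):.0f} records/s")
print(f"latency: {latency_ms(events[0]):.1f} ms")
```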

Monitoring these metrics, particularly on the source side, helps ensure that ingestion doesn't degrade the transactional system's performance. But measuring performance is only half the story: visualizing metrics in real time is what makes the data actionable.

Real-Time Visibility: Kibana for Data Ingestion Monitoring

To monitor the performance of data ingestion in the Guidewire Data Platform, we use Elasticsearch to store metrics and Kibana to visualize them. Together, they form an intuitive, real-time dashboard that supports every phase of performance testing.

Real-Time Connector Performance Dashboard in Kibana

Elasticsearch efficiently aggregates, searches, and analyzes high-throughput performance data. Its scalability and powerful query capabilities allow us to explore detailed metrics and uncover meaningful insights about ingestion behavior.
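
For example, a typical dashboard panel is backed by an aggregation query. The sketch below runs such a query directly against Elasticsearch; the index name (ingestion-metrics) and field name (records_per_sec) are hypothetical.

```python
import requests

ES_URL = "http://elasticsearch.example.internal:9200"

# Average records-per-second in one-minute buckets over the last hour;
# index and field names are placeholders for illustration.
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
    "aggs": {
        "per_minute": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1m"},
            "aggs": {"avg_throughput": {"avg": {"field": "records_per_sec"}}},
        }
    },
}

resp = requests.post(f"{ES_URL}/ingestion-metrics/_search", json=query, timeout=10)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["per_minute"]["buckets"]:
    print(bucket["key_as_string"], bucket["avg_throughput"]["value"])
```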

Kibana, tightly integrated with Elasticsearch, transforms this raw data into clear, interactive dashboards. These visualizations make it easy to track key metrics such as throughput, latency, and resource usage in real time, ensuring complete visibility into system performance.

To further enhance the process, we've developed an automated performance testing tool that connects directly with Kibana APIs. It fetches real-time metrics and automatically downloads dashboards, allowing users to monitor system behavior, identify bottlenecks, and make informed, data-driven decisions. By automating dashboard access, we reduce manual effort and maintain consistent, accurate monitoring across every test run.
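
Our testing tool itself is internal, but Kibana's public saved objects API gives a sense of how dashboard retrieval can be scripted. A minimal sketch, with a placeholder dashboard ID:

```python
import requests

KIBANA_URL = "http://kibana.example.internal:5601"

# Export a dashboard definition as NDJSON via Kibana's saved objects
# export API; "my-dashboard-id" is a placeholder for a real dashboard ID.
resp = requests.post(
    f"{KIBANA_URL}/api/saved_objects/_export",
    headers={"kbn-xsrf": "true"},  # header Kibana requires on API writes
    json={"objects": [{"type": "dashboard", "id": "my-dashboard-id"}]},
    timeout=30,
)
resp.raise_for_status()

with open("connector-dashboard.ndjson", "wb") as f:
    f.write(resp.content)
```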

Automated Testing Suite: Validating Ingestion at Scale

To validate the scalability and reliability of data ingestion pipelines in the Guidewire Data Platform, we developed an Automated Performance Testing Suite powered by GT-Load, Guidewire's in-house load testing tool.

This suite enables us to simulate high transaction volumes across core applications, including PolicyCenter, ClaimCenter, BillingCenter, and ContactManager, ensuring that ingestion processes perform under pressure and support real-time decision-making at scale.

Central to this suite is GT-Load, our high-volume load generator, which enables rigorous, real-world testing of data ingestion performance across Guidewire Cloud applications.

GT-Load: Stress Testing Guidewire Cloud at Scale

GT-Load is a core component of the GT Framework, a set of test automation tools designed to facilitate the testing of Guidewire applications running on the Guidewire Cloud Platform. It simulates real-world load scenarios and tests the performance of Guidewire Cloud applications, particularly under high API traffic. It combines the Karate framework for scenario definition with the Gatling tool for orchestrating large-scale load generation. (Karate defines the specific actions to simulate — like creating a new policy — while Gatling executes those actions at scale, generating the heavy workload needed for testing.)

With GT-Load, we can generate thousands of claims and policies within PolicyCenter and ClaimCenter, pushing the system to its limits. In typical test runs, we simulate loads of 10,000, 50,000, or even 100,000 transactions in under two hours.
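
GT-Load and its Karate scenarios are internal to Guidewire, but the underlying idea of firing a large number of concurrent API transactions can be sketched generically. The endpoint and payload below are purely illustrative, not real ClaimCenter API details:

```python
import concurrent.futures
import requests

# Placeholder endpoint and payload; a real run would target
# PolicyCenter/ClaimCenter APIs with realistic business data.
ENDPOINT = "https://claimcenter.example.internal/rest/claims"
PAYLOAD = {"lossType": "AUTO", "description": "load-test claim"}

def create_claim(_: int) -> int:
    resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=30)
    return resp.status_code

# Fire 10,000 claim-creation requests through a bounded worker pool.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(create_claim, range(10_000)))

print("succeeded:", sum(s == 201 for s in statuses), "of", len(statuses))
```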

This volume of activity enables us to rigorously assess the scalability and resilience of the data ingestion layer, mirroring real-world conditions where high throughput and rapid data processing are crucial.

To maximize the value of GT-Load, we've integrated it into a fully automated testing suite — one that orchestrates everything from environment setup to load execution and teardown.

End-to-End Automation: From Setup to Teardown

The Automated Performance Testing Tool orchestrates the entire testing process, from environment setup to teardown, with minimal manual input. It begins by configuring Guidewire's core insurance applications and provisioning a serverless Amazon Aurora PostgreSQL instance on RDS, a high-performance managed database service.

Once the environment is running, the tool targets the data ingestion layer, referred to as the connector in Guidewire terminology. (The connector is the component that links core insurance systems to the data platform, streaming data changes for processing.) Here, it initiates high-load scenarios (10K, 50K, or 100K claims and policies) using GT-Load to simulate activity in PolicyCenter and ClaimCenter. At the same time, the connector streams this data to the Guidewire Data Platform for ingestion and processing.

A key feature of this tool is its integration with TeamCity, a continuous integration and delivery (CI/CD) platform that automates build and test workflows and provides a user-friendly interface for triggering performance tests. Users simply select a load scenario and apply a few configuration settings; the tool handles the rest automatically.
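
Behind the interface, triggering a run amounts to queuing a build. A minimal sketch against TeamCity's REST API, where the build configuration ID and the load.scenario parameter are placeholders:

```python
import requests

TEAMCITY_URL = "https://teamcity.example.internal"

# Queue a performance-test build; the buildType ID and the
# load.scenario property are placeholders for a real configuration.
build_xml = """
<build>
  <buildType id="DataPlatform_ConnectorPerfTest"/>
  <properties>
    <property name="load.scenario" value="50K"/>
  </properties>
</build>
"""

resp = requests.post(
    f"{TEAMCITY_URL}/app/rest/buildQueue",
    data=build_xml,
    headers={
        "Content-Type": "application/xml",
        "Authorization": "Bearer <token>",  # TeamCity access token (placeholder)
    },
    timeout=30,
)
resp.raise_for_status()
```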

After test execution, the tool performs a complete automated teardown, shutting down the InsuranceSuite applications, stopping the RDS instance, and deleting the connector. This resource cleanup eliminates unnecessary infrastructure costs and maintains efficient operations.
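
The teardown steps map to simple API calls. A minimal sketch, assuming the connector is managed through a Kafka Connect-style REST API and the database is a stoppable Aurora cluster (all identifiers are placeholders):

```python
import boto3
import requests

# Delete the CDC connector via the Kafka Connect REST API
# (connector name is a placeholder).
resp = requests.delete(
    "http://connect.example.internal:8083/connectors/claims-db-connector",
    timeout=10,
)
resp.raise_for_status()

# Stop the Aurora PostgreSQL cluster to avoid idle-compute costs.
# (Provisioned Aurora clusters support stop/start; Aurora Serverless v1
# instead pauses automatically when idle.)
rds = boto3.client("rds", region_name="us-east-1")
rds.stop_db_cluster(DBClusterIdentifier="perf-test-aurora-cluster")
```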

By automating both testing and infrastructure management, the tool delivers a streamlined, scalable, and cost-effective performance testing cycle.

With the performance testing cycle fully automated, the next critical step is capturing the right metrics to evaluate how the system behaves under load.

What We Measure During Performance Testing

As performance testing progresses, our automated tool continuously collects a comprehensive set of metrics from both the connector and the database. These insights enable us to assess the ingestion pipeline's performance under load and confirm that the system can handle high throughput without compromising stability.

Here are the core metrics we monitor:

  • Average Records Processing Rate (Throughput): Measures the speed at which the connector processes records and publishes them to Kafka, a system that moves data between applications in real time.
  • Heap Memory Usage: Tracks memory consumption to detect potential leaks or inefficiencies that could impact long-term stability.
  • CPU Utilization: Monitors the percentage of CPU usage by the ingestion process, helping identify performance bottlenecks.
  • Network In/Out Rate: Measures incoming and outgoing traffic during ingestion, helping to spot bandwidth constraints or network-related slowdowns.
  • Replication Lag: Captures the delay (in milliseconds) between a change in the source database and its processing by the connector. This metric is critical for maintaining near-real-time data availability.
  • RDS Metrics: Includes key database performance indicators, such as CPU utilization, database connections, and Read/Write IOPS (Input/Output Operations Per Second). These metrics help assess the database's ability to support sustained ingestion under load (see the sketch after the figure below).
Ingestion performance metrics visualized during load testing to assess system stability and efficiency.
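
The RDS-level indicators above are exposed through Amazon CloudWatch. A minimal sketch that pulls average CPU utilization for a one-hour test window, with a placeholder instance identifier:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.now(timezone.utc)

# Average RDS CPU utilization in one-minute buckets over the last hour;
# the DB instance identifier is a placeholder.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "perf-test-aurora-instance"}],
    StartTime=end - timedelta(hours=1),
    EndTime=end,
    Period=60,
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'{point["Average"]:.1f}%')
```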

Together, these metrics not only help identify bottlenecks and resource constraints but also provide the foundation for continuous improvement by powering the post-test analysis and reporting process.

Automated Reporting: Insights for Continuous Improvement

After each performance test, the Automated Performance Testing Suite generates a detailed PDF report that features graphs and charts, visually capturing system behavior under load. Alongside this, it produces a structured JSON file containing all raw test data, which is securely stored in Amazon Simple Storage Service (S3) for future analysis and reference.
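
Persisting the raw results comes down to a single S3 call. A minimal sketch, with placeholder bucket and key names and made-up result values:

```python
import json

import boto3

s3 = boto3.client("s3")

# Hypothetical test results; bucket and key naming are placeholders.
results = {"scenario": "50K", "avg_throughput_rps": 1250, "p95_lag_ms": 430}

s3.put_object(
    Bucket="perf-test-reports",
    Key="connector/2024-01-01/results.json",
    Body=json.dumps(results).encode("utf-8"),
    ContentType="application/json",
)
```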

This automated reporting process enables us to track performance trends over time, compare results across different load scenarios, and maintain a reliable historical record. These insights support continuous optimization and ensure data-driven improvements to system performance.

Conclusion & Takeaways

Data ingestion performance testing is critical to ensuring the Guidewire Cloud Data Platform can scale to meet the demands of real-time, high-volume data processing. Through tools like Debezium, GT-Load, and Kibana, Guidewire delivers high throughput, low latency, and efficient resource utilization.

The automated performance testing suite, integrated with TeamCity, simplifies the testing workflow, minimizing manual effort while optimizing infrastructure usage. Key metrics, including throughput, memory usage, and replication lag, are continuously monitored, enabling teams to make informed decisions and drive ongoing improvements.

And to answer the question posed at the start — what does this mean for day-to-day operations and real-world gains? It means faster, more reliable decision-making, scalable systems, reduced operational risk, and data-driven agility for modern insurers. That's the power of performance testing done right.

Next Read: Introducing the Guidewire Data Platform for Real-Time Insurance Data

Want to learn more about the platform's architecture? Check out our previous post: Introducing the Guidewire Data Platform — an enterprise-grade, cloud-native foundation for real-time, internet-scale data.

If you are interested in joining our Engineering teams to develop innovative cloud-distributed systems and large-scale data platforms that enable a wide range of AI/ML SaaS applications, apply at Guidewire Careers.