openIMIS Performance Testing Results

In this case study, we delve into the journey of conducting a performance audit for a well-established project. This blog post serves as an overview of what we achieved and picks up where we left off in the previous post, which dealt with planning and creating performance tests for openIMIS.

Key Highlights

  1. Introduction to the Project: A brief overview of the project scope and its technical environment. This section sets the stage by describing the project’s architecture, the technology stack, and the performance expectations.
  2. Performance Testing Strategy: We discuss our approach to performance testing, including the selection of tools, the design of test scripts, and the rationale behind our testing methodology. This section emphasizes how our strategy aligns with the project’s objectives and requirements.
  3. Execution of Performance Scripts: A deep dive into the execution phase, highlighting how we conducted the tests, managed resources, and handled challenges. Key aspects like environment setup, script customization, and handling test data are covered in detail.
  4. Analysis of the Results: Here, we analyze the results obtained from different tools and how they offer varying perspectives on the application’s performance. We discuss the interpretation of cProfile and JMeter outputs, and how they complement each other in understanding the application’s behavior.
  5. Reporting and Insights: Focus on the reporting phase, explaining how we synthesized data into actionable insights. This part includes the creation of comprehensive reports, highlighting key performance metrics, bottlenecks identified, and the overall impact on user experience.
  6. Lessons Learned and Best Practices: Reflection on the lessons learned from this performance audit. We share best practices and tips for effectively conducting performance audits, from planning to execution and reporting.
  7. Conclusion and Future Steps: We conclude with our takeaways from this exercise and how it shapes our future approach to performance testing and optimization. We also discuss how these insights can be applied to similar projects in diverse environments.

This blog post aims to provide a real-world example of conducting a performance audit, offering insights and practical knowledge for anyone looking to enhance their application's performance and reliability.

Introduction to the Project: openIMIS

This case study focuses on the performance audit of openIMIS, a system designed for social protection and health insurance management. The project's central aim, managing health insurance data, is crucial for effective healthcare delivery.

Technical Environment: React, Django, Microkernel Architecture

The core of openIMIS is built using React for the frontend and Django for the backend. A key challenge during development was managing the microkernel architecture. While this did not directly impact performance testing, it was relevant for understanding development bottlenecks and the complex interactions between modules.

Performance Expectations: Testing Under Various Loads

Our goal was to test openIMIS under different load conditions. With limited production instances and varied usage, the focus was on seeing how the system performs with multiple users on the latest release. This pro-bono initiative by our organization aimed to ensure that openIMIS not only meets but surpasses industry standards, contributing back to the community using the software.

Performance Testing Strategy

In assessing the performance of openIMIS, our strategy was grounded in a mix of established industry tools and innovative testing approaches. This section outlines the tools we chose, the design of our test scripts, and the rationale behind our testing methodology.

Choice of Tools: JMeter and cProfile

For load testing, JMeter was our tool of choice. Despite a somewhat clunky UI, its status as an industry standard guarantees reliability and a wide range of plugins, making it a versatile choice. The browser recording feature of JMeter, although not unique to it, proved useful in capturing realistic user interactions.

For Python profiling, we turned to cProfile, primarily for its integration with Django through the django-cprofile-middleware. This allowed us to leverage SnakeViz for visualizing performance bottlenecks. We initially considered Django Silk for this purpose, but it fell short in our tests, failing to capture request data accurately.
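For reference, the middleware setup we relied on looks roughly like the sketch below. It is a minimal configuration based on the django-cprofile-middleware documentation, so double-check option names against the version you install. With it in place, appending ?prof to a request URL returns cProfile statistics, and adding &download saves a profile file that can be opened with SnakeViz.

# settings.py -- enabling per-request cProfile output via django-cprofile-middleware.
# The profiler middleware should stay last so it wraps the full request/response cycle.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of the openIMIS middleware stack ...
    "django_cprofile_middleware.middleware.ProfilerMiddleware",
]

# Allow profiling without a staff session (acceptable only on a disposable test instance).
DJANGO_CPROFILE_MIDDLEWARE_REQUIRE_STAFF = False

Profiles downloaded this way were then opened with SnakeViz, which produces visualizations like the one in Screenshot 1.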

Screenshot 1. High-Level Overview of profiler visualization of claim request fetch.

Screenshot 1 presents an overview of how time-consuming the individual operations under the GraphQL resolve_claims query are. This view makes it easy to detect potential bottlenecks in the code.

Looking at the table presented in Screenshot 2, the most time-consuming function was the posix.stat system call; the second most time-consuming operation was the database call.

Screenshot 2. Profiler table with the most time-consuming operations for fetching 100 claims.

Designing Test Scripts: From API to Frontend Interactions

Our testing approach involved a mix of simple API tests and more complex scenarios recorded on the frontend. The latter helped us uncover performance issues that might not be evident from API interactions alone. 

Screenshot 3. Insuree creation using only essential API calls.
Screenshot 4. Insuree creation using recorded API calls.

Screenshots 3 and 4 show the difference between the API calls considered necessary for Insuree creation and the API calls actually triggered while the user's interaction with the frontend was recorded.

A critical aspect of our testing was the generation of test data. Instead of relying solely on scrambled production data, we developed scripts to generate large volumes of realistic test data. This process was initially slow with the default data generation functionality in place, but we significantly improved performance by implementing bulk updates, parallel execution, and more efficient randomization (related entities fetched once and randomized per chunk of data). For example, we achieved a 100x improvement in the creation time for claims data, a key metric for our system.
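The sketch below illustrates the general pattern; the model and field names are simplified placeholders rather than the actual openIMIS models, and the real scripts populate many more fields and relations.

import random
from django.db import transaction

CHUNK_SIZE = 1000  # insert rows in chunks to keep memory and transaction size bounded

def generate_claims(claim_model, insuree_model, facility_model, total=100_000):
    # Fetch related entities once instead of issuing a query per generated row.
    insuree_ids = list(insuree_model.objects.values_list("id", flat=True))
    facility_ids = list(facility_model.objects.values_list("id", flat=True))

    for start in range(0, total, CHUNK_SIZE):
        size = min(CHUNK_SIZE, total - start)
        batch = [
            claim_model(
                insuree_id=random.choice(insuree_ids),
                health_facility_id=random.choice(facility_ids),
                code=f"GEN-{start + i}",
            )
            for i in range(size)
        ]
        # One bulk INSERT per chunk instead of one INSERT per claim.
        with transaction.atomic():
            claim_model.objects.bulk_create(batch, batch_size=CHUNK_SIZE)

Because chunks are independent, several generator processes can run over disjoint ranges in parallel, which together with the bulk inserts is what made the speed-up described above possible.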

Methodology Rationale: Load, Spike, and Stress Tests

Our methodology encompassed a range of tests, including load, spike, and stress tests, each designed to assess different aspects of openIMIS performance. Load tests helped us understand the system’s behavior under predefined user traffic, while spike tests evaluated its response to sudden surges in usage. Stress tests were particularly revealing, pushing the system to its limits to identify its breaking points. 

This comprehensive approach is aligned with our goal to ensure that openIMIS not only performs optimally under normal conditions but also remains robust and reliable in extreme scenarios.

Execution of Performance Scripts

Executing our performance tests required a careful setup to ensure accuracy and reproducibility. This section outlines our execution environment, the management of test execution, the challenges we faced, and how we monitored and collected data.

1. Execution Environment: Dedicated EC2 Instance

We conducted our tests on a dedicated EC2 m5.4xlarge instance. This choice was driven by the need to isolate test execution from other processes and avoid any potential interference. The use of a separate machine also ensured that the tests did not impact the server’s performance. 

To further mirror real-world scenarios, we ran the application in Docker containers and executed tests in both debug and production modes. The database used was PostgreSQL, with the application also supporting legacy MSSQL.

2. Managing Test Execution: Custom Scripts and Reproducibility

Custom shell scripts were essential for automating our test scenarios, including spike, ramp-up, and stress tests. Each type of test had specific configurations:

  • Spike Tests: Executed separate test suites with different thread counts.
  • Ramp-Up Tests: Ran tests with varying thread numbers and ramp-up times.
  • Stress Tests: Simultaneously ran all test suites with changing thread counts.

We used dstat to monitor the local environment, ensuring the testing machine itself did not reach its limits, which could skew the results. This monitoring was crucial for maintaining the consistency and reproducibility of our tests.

#!/bin/bash

# Thread counts to test with
thread_counts=(1 5 10 30 50 100 150 300)

# Names of the JMeter thread groups to exercise (add all of your thread group names here)
thread_groups=("TG1" "TG2" "TG3" "TG4" "TG5" "TG6" "TG7")

for tg in "${thread_groups[@]}"
do
    # Create an output directory for each thread group
    mkdir -p Outputs/thread_group_${tg}

    # Loop through each thread count and run the test for the current thread group
    for count in "${thread_counts[@]}"
    do
        # Start system monitoring in the background (dstat, 5-second interval)
        dstat --output Outputs/thread_group_${tg}/system_stats_${tg}_${count}u.csv 5 > /dev/null &

        # Record the process ID of the monitoring tool
        MONITOR_PID=$!

        echo "Running test for $tg with $count threads"
        jmeter -n -t alltests.jmx -J${tg}threads=$count -l Outputs/thread_group_${tg}/thread_${tg}_${count}u.jtl

        # Stop the monitoring process
        kill $MONITOR_PID
    done
done

echo "All tests completed."

Code Snippet 1. Script used for the execution of spike tests.

Code Snippet 1 shows how the spike tests are executed: first, all thread groups and the thread counts to be tested are listed; then a test run is executed for every combination of thread group and thread count, and a separate dstat output is recorded for each run.

Screenshot 5 shows how to configure a thread group to use variables passed through the CLI or environment variables.

Screenshot 5. Utilization of env variables in thread group configuration.

3. Challenges and Solutions: Handling System Limits

One notable challenge was handling 504 Gateway Timeouts when the application reached its limits. This occasionally required rebooting the EC2 instance. We noted this in our results, suggesting the implementation of automatic recovery features. Additionally, the initial JMeter test plan required adaptation to fit our script-driven approach, particularly for managing different thread counts and test suites.
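One simple form of the automatic recovery we suggested is a watchdog that polls a health endpoint and restarts the application container when it stops responding. The sketch below only illustrates the idea: the endpoint URL and container name are hypothetical, and a production setup would more likely rely on Docker healthchecks or an orchestrator's restart policy.

import subprocess
import time

import requests

HEALTH_URL = "http://localhost:8000/api/graphql"  # hypothetical health endpoint
CONTAINER = "openimis-backend"                    # hypothetical container name

def watchdog(interval=30, failures_before_restart=3):
    failures = 0
    while True:
        try:
            response = requests.get(HEALTH_URL, timeout=10)
            failures = 0 if response.status_code < 500 else failures + 1
        except requests.RequestException:
            failures += 1
        if failures >= failures_before_restart:
            # Restart the unresponsive container instead of rebooting the whole instance.
            subprocess.run(["docker", "restart", CONTAINER], check=False)
            failures = 0
        time.sleep(interval)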

4. Monitoring and Data Collection: dstat and JMeter Reports

For system monitoring, we used dstat, complemented by a custom Python script that visualizes its output using matplotlib. The test results were primarily visualized through HTML reports generated by JMeter and tailored to our needs. We cross-referenced the JMeter data with profiler outputs on smaller data sets to verify the accuracy and relevance of our findings. During test execution we monitored CPU, disk, and network usage as well as system load; an example of the resulting system usage charts is presented in Chart 1.
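The visualization script was essentially a thin wrapper around pandas and matplotlib, along the lines of the sketch below. The number of metadata rows and the column names depend on the dstat version and the selected stats, so treat them as assumptions to adjust for your own files.

import pandas as pd
import matplotlib.pyplot as plt

METADATA_ROWS = 5  # dstat writes a few informational lines before its two header rows

def plot_dstat(csv_path, fields=("usr", "sys", "idl"), out_path="system_usage.png"):
    # dstat CSV output has a category header row ("total cpu usage", ...) and a field row.
    df = pd.read_csv(csv_path, skiprows=METADATA_ROWS, header=[0, 1])
    # Keep only the field-level names so columns can be selected as "usr", "sys", ...
    df.columns = [field for _category, field in df.columns]
    ax = df[list(fields)].plot(figsize=(10, 4), title=csv_path)
    ax.set_xlabel("sample (5 s interval)")
    ax.set_ylabel("usage")
    plt.tight_layout()
    plt.savefig(out_path)

if __name__ == "__main__":
    plot_dstat("Outputs/thread_group_TG1/system_stats_TG1_50u.csv")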

Chart 1. Example of system usage charts.

Analysis of the Results

This section delves into how we analyzed the performance test data and the critical insights we garnered. We focused on a predefined set of metrics, correlating JMeter results with cProfile outputs where necessary, to understand the application’s performance in various scenarios.

1. Approach to Data Analysis

Our analysis was anchored by a predetermined set of criteria, outlined in a table with specific metrics, target values, and threshold values. This table included metrics like user authentication requests, search actions, online transactions, HTTP errors, transaction failures, peak load errors, and database deadlocks. 

Each performance test run generated JMeter output in CSV format, which was later used to generate HTML reports offering aggregated data and visual charts. These reports were then compared against our table to assess whether the system met the defined criteria under different loads. When a test suite failed to meet the criteria, we delved deeper into cProfile outputs to pinpoint potential causes.

Chart 2 presents the response times over time for the Insuree Test Suite run with 150 threads and a ramp-up time of 75 seconds.

Chart 2. Response times over time for the recorded user Test Suite.

2. Key Performance Metrics

Metric | Description | Target Value | Threshold Value
User Authentication Requests | Users logging into the system | 2s | 5s
User Search Actions | E.g. Insuree search based on name or ID | 5s | 7s
User Online Transactions | Creating an Insuree or creating a transaction | 3s | 5s
HTTP Errors | Issues with the API for users | 4xx: 0.5%, 5xx: 0.1% | 4xx: 1%, 5xx: 0.2%
Transaction Failures | Issues with user interactions | Login: 0.2%, Search: 0.5% | Login: 0.3%, Search: 0.7%
Peak Load Errors | Failure rates during peak hours | 2x the transaction failure values | 4x the transaction failure values
Database Deadlocks | Transactions waiting indefinitely for one another | 0% | 0%
Table 1. Target and Threshold Values for Performance Testing Assertions.

The predetermined table served as our guideline for evaluating key performance metrics, such as response times for different user actions, error rates, and failure rates during peak loads. These metrics were critical in assessing the application’s responsiveness, reliability, and overall user experience under stress.
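To make the comparison against Table 1 repeatable, the aggregated results can also be checked programmatically. The sketch below is a simplified illustration rather than our exact tooling: the sampler labels are hypothetical, the target and threshold values come from Table 1, the column names follow JMeter's default CSV output, and checking the 90th percentile is a choice rather than something the table prescribes.

import pandas as pd

# (target seconds, threshold seconds) per sampler label prefix -- values from Table 1;
# the label names themselves depend on how the test plan names its samplers.
CRITERIA = {
    "login": (2.0, 5.0),            # user authentication requests
    "insuree search": (5.0, 7.0),   # user search actions
    "create insuree": (3.0, 5.0),   # user online transactions
}

def evaluate(jtl_path):
    df = pd.read_csv(jtl_path)
    for label, (target, threshold) in CRITERIA.items():
        subset = df[df["label"].str.contains(label, case=False, na=False)]
        if subset.empty:
            continue
        p90 = subset["elapsed"].quantile(0.9) / 1000.0   # elapsed is in milliseconds
        errors = (subset["success"].astype(str).str.lower() != "true").mean()
        verdict = "OK" if p90 <= target else ("WARN" if p90 <= threshold else "FAIL")
        print(f"{label}: p90={p90:.2f}s errors={errors:.1%} -> {verdict}")

if __name__ == "__main__":
    evaluate("Outputs/thread_group_TG1/thread_TG1_150u.jtl")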

3. Identifying Bottlenecks

The most significant bottlenecks we identified were related to claims querying and processing. Given the complexity and dependency-heavy nature of these operations in health insurance management, these bottlenecks were particularly challenging. Under certain loads, these operations could trigger 504 Gateway Timeouts, highlighting a critical area for optimization.

4. Insights and Conclusions

The performance testing revealed vital insights into openIMIS scalability and reliability. Key takeaways included:

  • The necessity to optimize complex operations, particularly claims processing, to prevent system timeouts.
  • The importance of setting realistic performance benchmarks based on real-world scenarios, as seen in our criteria table.
  • A confirmation that while the system performs well under normal conditions, there is a need for improvement under peak loads to ensure continuous reliability.

In conclusion, the performance analysis not only identified specific areas for optimization but also underscored the importance of rigorous testing in uncovering hidden inefficiencies. It provided a clearer understanding of how openIMIS behaves under various stress conditions, setting the stage for targeted improvements to enhance both its scalability and user experience.

Reporting and Insights

This section outlines how we consolidated the performance testing data into a comprehensive report and the critical insights that were derived, shaping our future approach to openIMIS development and optimization.

1. Report Structure and Content

The report was structured to provide a clear and complete overview of the entire testing process:

  • System Overview: Covered the testing objectives, the test environment, and outlined any risks and assumptions.
  • Testing Methodology: Detailed the test scripting, test cases, and execution methods.
  • Results Presentation: Focused on performance metrics and key findings from the tests.
  • Implications: Analyzed how the results aligned with our objectives and compliance with performance benchmarks.
  • Recommendations for Improvements: Offered targeted suggestions for code, database, and infrastructure improvements.
  • Summary and Conclusion: Provided a succinct wrap-up of the entire testing process and its outcomes.

This report, crafted in a standardized document format, served as a foundational tool for understanding and communicating the performance of the system.

2. Key Insights and Improvement Proposals

Several significant insights and improvement recommendations emerged from the analysis:

  • Advanced Caching: Implementing more sophisticated caching mechanisms, such as Redis, for the master data to enhance performance (see the sketch after this list).
  • Asynchronous Handling of Complex Requests: Offloading heavy requests to a message queue such as RabbitMQ to reduce bottlenecks; this is about managing asynchronous tasks and message queuing rather than directly optimizing the queries themselves.
  • Database Pooling with PgBouncer: Introducing PgBouncer to manage database connections more efficiently, addressing the slow response times during high concurrency.
  • Optimizing Heavy Computations: Moving complex operations, such as claims processing, to more optimized execution paths than plain Python code, possibly using libraries like Pandas or SciPy for better performance.
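As an example of the caching recommendation, a Django cache backed by Redis could look like the sketch below. It assumes Django 4+ with its built-in Redis backend and a local Redis instance; the helper function and cache key are illustrative, not actual openIMIS code.

# settings.py -- point Django's cache framework at a Redis instance.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
    }
}

# Wherever master data is read, e.g. in a GraphQL resolver:
from django.core.cache import cache

def get_master_data(model):
    # Serve rarely changing master data from Redis and refresh it hourly.
    return cache.get_or_set(
        f"master-data:{model.__name__}",
        lambda: list(model.objects.all()),
        timeout=60 * 60,
    )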

3. Actionable Outcomes and Future Steps

The next steps involve creating a dedicated backlog in the ‘Performance and Quality’ space on Jira to systematically address the identified issues. These recommendations are not just fixes but also strategic directions for future development and optimization efforts. They are expected to significantly influence how openIMIS evolves in terms of scalability, efficiency, and overall performance.

Lessons Learned and Best Practices

In the course of our performance testing for openIMIS, we gained several key insights and learned important lessons that not only enhanced our understanding of performance testing but also shaped our approach to future projects.

1. Unexpected System Behavior Under Different Loads

One of the primary lessons learned was the unpredictability of system behavior under varying loads. Simple code-level profiling and single-threaded testing are often insufficient to uncover issues that manifest only under specific conditions or heavy loads. 

Surprisingly, we found that recorded user interactions resulted in far more requests than anticipated, highlighting the importance of replicating actual user behavior as closely as possible in tests.

2. Importance of a Well-Defined Plan and Collaboration

A well-defined plan is crucial for effective performance testing. Collaborating with QA experts to define test objectives and develop test cases is essential. Our experience underscored the value of having actionable outcomes from these tests. 

Additionally, the investment in creating robust test data pays off in the long run by enabling more efficient retesting and accurate results. While we found JMeter to be a solid tool for our needs, it’s important to choose tools based on the specific requirements of the project, keeping in mind that different tools might better suit different scenarios.

3. Integrating Performance Testing into the Development Life Cycle

The experience has led us to believe that performance testing should be an integral part of the Software Development Life Cycle. Incorporating these tests regularly and early in the development process can help identify and address performance issues more proactively, leading to more robust and efficient applications.
