Connect for Life/Vxnaid Performance Testing

SolDevelo conducted performance testing on the Vxnaid system with the purpose of optimizing it for seamless integration into various vaccination campaigns that require an effective management of large workloads.

The Client

Johnson & Johnson is a multinational corporation known for its diversified healthcare products. Founded in 1886, it operates in three main segments: pharmaceuticals, medical devices, and consumer health products. The company is renowned for its commitment to innovation, with a wide range of products including prescription drugs, medical devices, over-the-counter medications, and personal care items. Johnson & Johnson is also recognized for its corporate social responsibility initiatives and efforts in global health.

Connect for Life/Vxnaid

The CfL/Vxnaid system represents an innovative digital platform designed by Johnson & Johnson Global Public Health and its collaborators, such as SolDevelo, to facilitate effective vaccine administration and optimize vaccination campaigns. 

This system is poised to introduce Vxnaid, a robust digital solution comprising three core pillars: 

  1. An identification component that ensures vaccinees are correctly identified at each clinic visit. 
  2. A reporting dashboard that provides near-real-time insight into the progress of the vaccination campaign in the field. 
  3. An engagement component that uses basic phone technology to reach targeted vaccinees with SMS, WhatsApp, or voice messages in their preferred local language and remind them of their appointments. 

Connect for Life leverages basic mobile phone technology to transmit concise, actionable messages aimed at influencing health behaviors through the dissemination of health tips and adherence reminders. This approach is especially crucial in mitigating scenarios where patients may encounter stigma and overcoming obstacles associated with remote access, allowing healthcare providers to proactively monitor patients regardless of location or time constraints. 

The Challenge

Our primary objective was to ensure the successful implementation of Vxnaid as the successor to the existing SHIP framework by validating its ability to manage significantly larger workloads while maintaining optimal performance and reliability. 

To achieve this, our team conducted comprehensive performance testing on the Vxnaid system, assessing its responsiveness, scalability, and robustness under increased operational demands. Through rigorous testing protocols, we identified critical bottlenecks and potential issues within the system, providing detailed reports that outlined performance metrics and actionable insights. 

Our efforts aimed to enhance the overall efficiency and effectiveness of the Vxnaid platform, ensuring its seamless integration into broader vaccination campaigns while addressing key areas of improvement to optimize its functionality and user experience.

Methodology

The performance testing centered on four types of tests: load tests, stress tests, spike tests, and endurance tests:

  • Load tests assessed the system’s response under varying levels of simulated user activity, evaluating its performance as the workload increased. 
  • Stress tests gauged the system’s stability and identified its breaking points under extreme conditions. 
  • Spike tests evaluated the system’s resilience and responsiveness to sudden and significant increases in user activity within a short timeframe. 
  • Endurance tests evaluated the system’s ability to maintain consistent performance over an extended period. 

By undertaking these comprehensive tests, the goal was to provide a thorough understanding of the CfL system’s performance across different scenarios and workloads.
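
The difference between these test types comes down to the shape of the load over time. The following is a minimal sketch of that idea, not the JMeter configuration used in the project: the function name `load_profile`, the step-based model, and the choice of ramping a stress test to twice the expected peak are all illustrative assumptions.

```python
from typing import List


def load_profile(kind: str, steps: int, base_users: int, peak_users: int) -> List[int]:
    """Return the number of simulated users at each time step for one test type.

    Hypothetical helper for illustration only; the shapes are assumptions.
    """
    if steps < 3:
        raise ValueError("steps must be at least 3")
    if kind == "load":
        # Ramp linearly from base_users up to peak_users.
        span = peak_users - base_users
        return [base_users + span * i // (steps - 1) for i in range(steps)]
    if kind == "stress":
        # Keep ramping past the expected peak (assumed here: twice the peak)
        # so that a breaking point can show up.
        span = 2 * peak_users - base_users
        return [base_users + span * i // (steps - 1) for i in range(steps)]
    if kind == "spike":
        # Hold the base level, jump to the peak for one step, then drop back.
        profile = [base_users] * steps
        profile[steps // 2] = peak_users
        return profile
    if kind == "endurance":
        # Hold a constant load for the whole run.
        return [base_users] * steps
    raise ValueError(f"unknown test type: {kind}")


for kind in ("load", "stress", "spike", "endurance"):
    print(kind, load_profile(kind, steps=5, base_users=60, peak_users=200))
```

In a real run, each list entry would map to a number of concurrent virtual users held for a fixed interval.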

Property | Basic Case | Extended Case
Patients | 7000 | 50 000
Operators | 60 | 200
Locations | 8 | 100
Messages / Year | 50 000 | 1 000 000
Weekly Distribution | 300 | 2 222
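
To get a feel for what the yearly message volumes mean in day-to-day terms, the short calculation below converts them to an average daily rate. It is only a rough sketch: it assumes messages are spread evenly over a 365-day year, which the source does not state.

```python
DAYS_PER_YEAR = 365  # assumption: even spread over a calendar year

# Yearly message volumes for the two cases in the table above.
messages_per_year = {"basic": 50_000, "extended": 1_000_000}


def messages_per_day(yearly: int) -> float:
    """Average number of messages per day for a given yearly volume."""
    return yearly / DAYS_PER_YEAR


for case, yearly in messages_per_year.items():
    print(f"{case}: about {messages_per_day(yearly):.0f} messages per day")
```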

Another crucial objective was to identify potential bottlenecks within the CfL system. The focus was on pinpointing any areas where the system’s performance might be constrained or compromised under specific conditions. By systematically examining the system’s behavior, the testing aimed to uncover any bottlenecks that could impede optimal performance, allowing for targeted improvements and optimizations to enhance overall system efficiency. 

Tools and technologies used

Hardware Infrastructure: EC2 m5.4xlarge instance

Test Data: Users, Patients, Visits

Test Tools: JMeter, Nexmock

Configuration: CfL/Vxnaid Distribution Default Configuration

Isolation from Production: True

Monitoring and Logging: JMeter Report Tool, JProfiler

For mocking text messages, we implemented Nexmock within this project.

Metrics measured

Each metric has a target value and a threshold value. If the threshold value is exceeded, the test is considered a failure. 

If the result is below the target value, the test is considered passed. 

Metric | Description | Target Value | Threshold Value
User Authentication Requests | Operators log in to the system | 2 s | 3 s
User Search Actions | E.g. patient search based on name or ID | 5 s | 7 s
User Online Transactions | Creating a patient or creating a transaction | 3 s | 5 s
Media Operations | Requesting and uploading patient biometric data | 5 s | 7 s
Scheduled Tasks | Communication with external servers for messaging | 30 s | 40 s
HTTP Errors | Issues with API for operators | 4xx: 0.5%; 5xx: 0.1% | 4xx: 1%; 5xx: 0.2%
Transaction Failure | Issues with user interactions | Login: 0.2%; Search: 0.5%; Patient Registration: 0.5%; Visit Status Update: 0.5% | Login: 0.3%; Search: 0.7%; Patient Registration: 0.7%; Visit Status Update: 0.7%
Peak Load Errors | Failure rates for peak hours | 2x values of transaction failures | 4x values of transaction failures
Database Deadlock | Transactions are waiting indefinitely for one another | 0% | 0%
CPU & Memory | Utilization of server resources | 70% | 90%
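
The pass/fail rule can be expressed as a small check against the response-time limits listed above. This is a sketch under stated assumptions rather than the project's reporting logic: the names `Limit` and `classify` are hypothetical, a result exactly at the target is treated as passed, and results that fall between target and threshold are reported as their own category because the source does not say how they were scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    target: float     # results up to this value count as passed (assumption: inclusive)
    threshold: float  # results above this value count as failed


# Response-time limits in seconds, taken from the metrics table.
RESPONSE_TIME_LIMITS = {
    "authentication": Limit(target=2, threshold=3),
    "search": Limit(target=5, threshold=7),
    "online_transaction": Limit(target=3, threshold=5),
    "media": Limit(target=5, threshold=7),
    "scheduled_task": Limit(target=30, threshold=40),
}


def classify(value: float, limit: Limit) -> str:
    """Classify one measured value against its target and threshold."""
    if value > limit.threshold:
        return "failed"
    if value <= limit.target:
        return "passed"
    # The source does not define this band; it is kept separate on purpose.
    return "between target and threshold"


# Made-up measurements, used only to show the three outcomes.
for seconds in (1.8, 2.5, 3.1):
    print(seconds, classify(seconds, RESPONSE_TIME_LIMITS["authentication"]))
```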

Test scripting

For our performance testing case study, our approach was centered on simulating a realistic user experience and validating critical functionalities under various conditions. To achieve this, we designed test cases that mirrored the natural flow of page usage by multiple users. This involved scenarios such as querying data from a patient table, a fundamental and frequently accessed feature, to ensure optimized response times and system stability. 

Additionally, we tested the timeliness of scheduled reminders by verifying that there was no delay between the scheduled trigger and the actual message delivery. Another crucial aspect was assessing the impact of database modifications, specifically adding new data, on system performance. By incorporating these test cases into our performance testing strategy, we were able to identify and address potential bottlenecks and optimize the application’s responsiveness and scalability.
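
The reminder timeliness check amounts to comparing each scheduled trigger time with the time the message was actually delivered. A minimal sketch follows; the function names, the sample timestamps, and the 40-second limit passed in the example are illustrative assumptions, not values or code from the project.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Each event is a (scheduled_time, delivered_time) pair.
Event = Tuple[datetime, datetime]


def delivery_delays(events: List[Event]) -> List[float]:
    """Return the delay in seconds between each scheduled trigger and its delivery."""
    return [(delivered - scheduled).total_seconds() for scheduled, delivered in events]


def late_deliveries(events: List[Event], max_delay_s: float) -> List[float]:
    """Return only the delays that exceed the allowed maximum."""
    return [delay for delay in delivery_delays(events) if delay > max_delay_s]


# Hypothetical sample data: one reminder delivered after 5 s, one after 45 s.
scheduled = datetime(2024, 1, 1, 8, 0, 0)
sample_events = [
    (scheduled, scheduled + timedelta(seconds=5)),
    (scheduled, scheduled + timedelta(seconds=45)),
]
print(late_deliveries(sample_events, max_delay_s=40))
```

In practice the timestamp pairs would come from server logs or from the mock messaging service, depending on where delivery is recorded.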

Test Cases

TC01: Multiple login requests

TC02: Searching and viewing patients for large data sets

TC04: Test messages functionality

TC05: Test job scheduling reminders for the next day

TC06: Add patient

TC01, TC02, TC04 and TC06 were recorded, capturing all communication between the server and the client. These recorded interactions were subsequently utilized in testing, replicating actual user behaviors as outlined in the test plan. TC05 was initiated through the execution of an SQL script, generating patients with associated visits, followed by the monitoring of server logs. 

Load Generation

The MySQL routines used for generating users and patients, both with and without visits, also create all necessary accompanying data. Message traffic was simulated by sending requests from JMeter.

Users: 60/200/500

Patients: 7000/50 000/100 000

Visits: 20 000 visits (for 100 000 patients)

Messages: 10 000 messages per 12 hours
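
The message volume translates into a fairly modest steady request rate. The calculation below is a sketch that assumes the 10 000 messages are spread evenly across the 12-hour window; the source does not describe how the requests were paced. The per-minute figure is included because JMeter's Constant Throughput Timer expresses its target in samples per minute, should such a pacing element be used.

```python
MESSAGES = 10_000
WINDOW_HOURS = 12

# Assumption: requests are spread evenly over the whole window.
per_second = MESSAGES / (WINDOW_HOURS * 3600)
per_minute = MESSAGES / (WINDOW_HOURS * 60)
seconds_between = 1 / per_second

print(f"{per_second:.3f} messages per second")
print(f"{per_minute:.1f} messages per minute")
print(f"one message every {seconds_between:.2f} seconds")
```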

System Overview

Technologies

  • OpenMRS Framework
  • MySQL 5.7
  • Spring
  • Front end: a mix of JSP, Groovy Server Pages (some of it custom or non-standard), React, and plain JavaScript
  • REST APIs available

Key components and functionalities

The platform registers patients across a network of locations, which involves handling data for thousands of registered individuals. In addition to patient registration, the system serves hundreds of concurrent users, giving them real-time access to patient records and other operational functionality. 

A central feature of the system is its ability to manage a high volume of messaging traffic, averaging a couple of thousand messages per day. These messages are scheduled primarily by a nightly batch job that queues and prepares messages for delivery over the subsequent 24 hours. This automated scheduling ensures timely delivery of critical communications without impacting real-time operations.
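
The nightly batch idea can be illustrated with a short sketch: at each run, select every reminder that falls due within the following 24 hours and queue it in delivery order. This is not the actual CfL job; the `Reminder` type, the `queue_next_24h` function, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Reminder(NamedTuple):
    patient_id: int   # hypothetical identifier
    due_at: datetime  # when the message should be delivered


def queue_next_24h(reminders: List[Reminder], run_at: datetime) -> List[Reminder]:
    """Select reminders due within 24 hours of the batch run, ordered by due time."""
    window_end = run_at + timedelta(hours=24)
    due = [r for r in reminders if run_at <= r.due_at < window_end]
    return sorted(due, key=lambda r: r.due_at)


# Hypothetical batch run at 02:00 with four candidate reminders.
run_at = datetime(2024, 1, 1, 2, 0)
candidates = [
    Reminder(1, run_at + timedelta(hours=23)),
    Reminder(2, run_at + timedelta(hours=25)),  # outside the window
    Reminder(3, run_at - timedelta(hours=1)),   # already in the past
    Reminder(4, run_at + timedelta(hours=1)),
]
print([r.patient_id for r in queue_next_24h(candidates, run_at)])
```

Because the window is half-open, a reminder due exactly 24 hours after the run is left for the next night's batch, so no message is queued twice.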

Performance Testing Results

We compiled the performance test results into an extensive report organized into distinct sections. It opened with an executive summary of the main performance insights and goals, followed by a detailed system overview covering the testing goals, test conditions, and related assumptions and risks. The methodology section described the test cases, scripting, and execution details in full. The results section presented the key findings and detailed performance measurements, supported by profiler insights and visualized data. 

The report examined how the findings affected the goals and offered detailed recommendations for improvement, including database improvements and code-level optimizations. Acknowledgments, contact details, and a final summary were also included. Additional appendices offered in-depth test execution details and supplementary materials for a holistic understanding of the findings and remedies derived from the performance testing process.

Example of visualized data

Below you can see one of the summaries of test cases for a spike test with a load of 100 000 patients. This summary indicates which tests either failed or passed based on pre-defined thresholds.

Spike 100 000 patients

[Table: pass/fail status of TC01, TC02, and TC06 at 60, 200, and 500 users]

We also provided detailed data in a table to enhance understanding of each request’s behavior. Additionally, we visualized some of this data to observe how it evolves over the duration of the test.

In addition to measuring the communication between a server and a client, we employed a profiler to analyze which parts of the code consume the most execution time. This data was included in the report, providing valuable insights into potential performance bottlenecks within the codebase.

Performance Highlights

The system showed mostly commendable performance in handling 60,000 patients, 200 to 500 users and ~2.5k reminders per day, maintaining an average response time well within acceptable thresholds. In endurance scenarios, the system demonstrated stability in delivering messages. Our analysis assured the client of the system’s reliability and overall stability for most operations. 

However, the patient registration functionality, particularly when accessed through the UI, was shown to cause performance issues that affect overall system efficiency. This discovery indicated the need for further investigation into patient registration processes by the development team to enhance the system’s capabilities.

Also, our report highlighted issues concerning the efficiency of the scheduling functionality, particularly noting potential problems that could arise on larger instances and delays in message delivery exceeding the set threshold. We specified the operational limits within which the functionality performs adequately and emphasized the necessity for further investigation by the development team for scenarios that surpass these limits.

Key Performance Issues

Breaking Points and Stress Tests: The system shows breaking points under stress conditions for each of the tested data sets of 7000, 50,000, and 100,000 patients. The server breaks at 800 users for the first two data sets and at 400 users for the last. A bottleneck has been identified in the “patientRegistration” process, where error rates exceed the acceptable threshold, impacting overall system reliability.
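
Conceptually, finding a breaking point means stepping up the number of users until the error rate crosses an acceptable limit. The sketch below shows that loop in isolation. It is not how the stress tests were driven in the project: `find_breaking_point` is a hypothetical helper, and `measure_error_rate` stands in for whatever actually runs a test at a given user count and returns the observed error percentage.

```python
from typing import Callable, Iterable, Optional


def find_breaking_point(
    user_levels: Iterable[int],
    measure_error_rate: Callable[[int], float],
    max_error_rate: float,
) -> Optional[int]:
    """Return the first user level whose error rate exceeds max_error_rate.

    Returns None if no tested level exceeds the limit.
    """
    for users in user_levels:
        if measure_error_rate(users) > max_error_rate:
            return users
    return None
```

A caller would pass increasing user levels, for example `range(100, 1001, 100)`, together with a function that executes one test run per level. Stepping in coarse increments only locates the breaking point to within one step, so a second pass with finer steps around the result may be worthwhile.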

Root Causes of Issues

“patientRegistration” Request: This operation is a critical point of vulnerability, leading to a cascade of errors and server failures.

Message Delivery System: The tests revealed performance bottlenecks and elevated CPU utilization caused by concurrency issues; these should be investigated further.

Recommendations

Recommendations for Improvement

Code-Level Optimizations: Implementing batch processing, optimizing queries, and considering a more efficient database schema or a NoSQL solution to improve task retrieval and data handling.
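
As an illustration of the batch-processing recommendation, the sketch below inserts rows in fixed-size batches and commits once per batch instead of once per row. It is a generic example, not code from the CfL/Vxnaid modules: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, and the table name `scheduled_message`, its columns, and the batch size of 500 are hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def chunks(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of rows, each with at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_in_batches(conn: sqlite3.Connection, rows: List[Row], batch_size: int = 500) -> int:
    """Insert rows batch by batch, committing once per batch rather than per row."""
    inserted = 0
    for batch in chunks(rows, batch_size):
        conn.executemany(
            "INSERT INTO scheduled_message (patient_id, body) VALUES (?, ?)",
            batch,
        )
        conn.commit()
        inserted += len(batch)
    return inserted


# In-memory stand-in database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scheduled_message (patient_id INTEGER, body TEXT)")
rows = [(i, "reminder") for i in range(1200)]
print(insert_in_batches(conn, rows), "rows inserted")
```

The right batch size depends on row size and on how the database handles indexing and locking, so it would need to be tuned against the real schema.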

Database Enhancements: Balancing the heavy indexing of the database to improve the efficiency of insert operations and considering tools like ProxySQL for improved data retrieval.

Update and Testing of New Modules: During the tests we did a second iteration with updated modules. It is recommended to update the CfL/Vxnaid system with the new versions of modules available in the GitHub repository. Before deployment, these modules must undergo thorough testing to ensure they are production-ready, as they are not yet verified for widespread use. This testing should be comprehensive, covering all functional aspects to ensure stability and reliability.

Implications and Alignment with Objectives

Stability Assessment: The system is generally stable but struggles under loads heavier than those anticipated for the implementation. 

Identification of Bottlenecks: Bottlenecks in the patient registration process and message delivery system have been identified, and need to be addressed in order to improve overall performance. While a CfL version with updated modules is usable, bigger instances may have issues in the future.

Conclusion

In conclusion, the comprehensive evaluation of our system reveals a commendable overall state of health, marked by stable performance and acceptable error rates in most operations. However, a critical area that requires immediate attention revolves around the process of adding patients, particularly the “patientRegistration” requests, which exhibit a notable deviation from acceptable error thresholds. 

Addressing and resolving the issues associated with patient registration is paramount to ensuring the continued excellence of the system. With the right focus on optimizing the patient registration process, we can fortify the system’s robustness and maintain the high standards of reliability and performance observed in other facets. By strategically addressing these challenges, we can uphold the system’s positive trajectory and deliver an enhanced experience for our users.
