Elevated API Errors - user access login error

Incident Report for Northpass

Postmortem

Service Disruption on September 9–10, 2025

Summary

On September 9, 2025, Northpass experienced a service disruption affecting all AWS-hosted customers. Users reported difficulties logging in, accessing courses, and completing activities. Azure-hosted customers were not affected.

Impact

  • Duration: ~2 hours of major disruption, followed by slower performance for several more hours
  • Affected customers: AWS-hosted environments only
  • Symptoms: login errors, slow page loads, delayed certificates, and issues with integrations (e.g., Zoom sessions)

Root Cause

The disruption was caused by our primary database exhausting its available input/output operations per second (IOPS) in AWS. This slowed critical operations and caused delays across the platform.
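
As a rough illustration of what exhausting IOPS capacity means, the sketch below compares observed read and write operations per second against a provisioned limit and labels the resulting pressure level. Every name, threshold, and number here is a hypothetical assumption chosen for illustration; none of it reflects Northpass's actual configuration, tooling, or measurements.

```python
from dataclasses import dataclass


@dataclass
class IopsSample:
    """One monitoring sample (hypothetical shape, not an AWS API type)."""
    read_ops_per_sec: float
    write_ops_per_sec: float


def iops_utilization(sample: IopsSample, provisioned_iops: float) -> float:
    """Return the fraction of the provisioned IOPS limit used by this sample."""
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    total_ops = sample.read_ops_per_sec + sample.write_ops_per_sec
    return total_ops / provisioned_iops


def classify_pressure(utilization: float,
                      warn_at: float = 0.7,
                      critical_at: float = 0.9) -> str:
    """Map a utilization fraction to a label; thresholds are assumed values."""
    if utilization >= critical_at:
        return "critical"
    if utilization >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Assumed limit of 3000 IOPS, with made-up samples of rising load.
    limit = 3000.0
    for sample in (IopsSample(500, 400), IopsSample(1500, 700), IopsSample(1800, 1000)):
        used = iops_utilization(sample, limit)
        print(f"{used:.0%} of provisioned IOPS: {classify_pressure(used)}")
```

Once utilization approaches the limit, additional requests queue at the storage layer, which is consistent with the slow logins and delayed operations described in this report.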

Resolution

Our engineering team took immediate action to stabilize the system, including expanding capacity and reducing system load. Once traffic normalized, performance returned to expected levels.

Next Steps (Preventing Recurrence)

We are implementing the following permanent improvements:

  • Upgrading our AWS database storage to a higher-performance type with more IOPS capacity
  • Improving monitoring and alerting to detect database pressure earlier
  • Optimizing how we process background tasks to reduce load during peak usage
  • Optimizing database queries to reduce their performance impact and improve reliability
Posted Sep 10, 2025 - 09:38 EDT

Resolved

This incident has been resolved.
Posted Sep 09, 2025 - 23:00 EDT

Update

We are continuing to monitor for issues. Customers and Learners may experience slower page loads as services recover.
Posted Sep 09, 2025 - 21:43 EDT

Update

We are continuing to monitor for issues. Customers and Learners may experience slower page loads as services recover.
Posted Sep 09, 2025 - 20:44 EDT

Update

We are continuing to monitor for any further issues.
Posted Sep 09, 2025 - 19:43 EDT

Update

Applications are stable but we are still monitoring and checking for slowness.
Posted Sep 09, 2025 - 18:45 EDT

Update

We are continuing to monitor for any further issues.
Posted Sep 09, 2025 - 17:43 EDT

Update

We are continuing to monitor for any further issues.
Posted Sep 09, 2025 - 16:17 EDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Sep 09, 2025 - 15:13 EDT

Identified

The issue has been identified and a fix is being implemented.
Posted Sep 09, 2025 - 15:04 EDT

Update

We are continuing to investigate this issue.
Posted Sep 09, 2025 - 14:32 EDT

Update

We are continuing to investigate this issue.
Posted Sep 09, 2025 - 14:08 EDT

Update

We are continuing to investigate this issue.
Posted Sep 09, 2025 - 13:52 EDT

Update

We are continuing to investigate this issue. Admins and end users are experiencing errors when logging in.
Posted Sep 09, 2025 - 13:48 EDT

Investigating

We're experiencing an elevated level of API errors and are currently looking into the issue.
Posted Sep 09, 2025 - 13:42 EDT
This incident affected: Northpass App - AWS.