US: Media (like video/documents) in learning activities not showing

Incident Report for FeedbackFruits

Postmortem

Context
The FeedbackFruits product consists of multiple modules that perform various tasks, such as storing activities, synchronizing grades, and handling file uploads by students and teachers. The last of these, the Media Service, also serves uploaded files, which enables activities such as Interactive Video and Interactive Document to display their respective file types. Like all other FeedbackFruits services, the Media Service runs as a separate instance in each region.

Files in the Media Service can be stored in various ways. The two relevant here are the FeedbackFruits file storage system (Azure Blob Storage) and a so-called URL store, where the file is located on a third-party server (e.g. a PDF on Wikipedia, or a video on Panopto) and our record points to it.
When a file is served to a user, it may require conversion from one file type to another. In this incident, the relevant conversion was from an SRT subtitle file to a VTT subtitle file, the format suitable for use in a browser.
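
For readers unfamiliar with the two subtitle formats, the conversion can be illustrated with a minimal sketch. This is not the Media Service's actual converter; the names srt_to_vtt and MalformedSubtitleError are hypothetical. It shows the core differences (a WEBVTT header, and a period instead of a comma before the milliseconds) and rejects input that does not look like SRT with an ordinary exception.

```python
import re

# SRT timing lines use a comma before the milliseconds:
#   00:00:01,000 --> 00:00:04,000
_SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$"
)


class MalformedSubtitleError(ValueError):
    """Raised when the input cannot be parsed as SRT (hypothetical name)."""


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT subtitle text to WebVTT, rejecting malformed input."""
    # Strip a possible byte-order mark and normalise line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = [b for b in re.split(r"\n{2,}", text.strip()) if b.strip()]
    if not blocks:
        raise MalformedSubtitleError("no subtitle cues found")

    cues = []
    for block in blocks:
        lines = block.split("\n")
        # Drop the numeric cue counter if present; WebVTT does not need it.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            raise MalformedSubtitleError("cue has no timing line")
        match = _SRT_TIMING.match(lines[0].strip())
        if match is None:
            raise MalformedSubtitleError(f"bad timing line: {lines[0]!r}")
        start = f"{match.group(1)}.{match.group(2)}"
        end = f"{match.group(3)}.{match.group(4)}"
        cues.append("\n".join([f"{start} --> {end}", *lines[1:]]))

    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

In this sketch, input that is not SRT at all (for example, an error page returned in place of the file) raises MalformedSubtitleError, which the caller can handle like any other failed request.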

Incident
On 2026-02-02 at 15:11 UTC we received an automated notification that the Media Service was intermittently unavailable in the US region. At 16:39 UTC on the same day, our support personnel raised the same issue through our internal escalation channels. Following an initial investigation, an incident was created on our status page at 17:03 UTC.

Inspection of the Media Service log files in the US region revealed that requests for one specific SRT file caused the Media Service to crash every time the file was requested. The direct cause was that the file was persisted as a malformed URL store entry. Such files were created by an earlier issue, in which a very limited number of files (144 in the US region) were physically stored on our Azure Blob Storage instance, but our records, instead of pointing to the stored file directly, pointed to a URL that in turn pointed to the file. As part of our security best practices, this URL expired after 24 hours, after which the Media Service could no longer convert or serve the file.

This would normally not be problematic, as the Media Service is built with such scenarios in mind. However, the conversion path from SRT to VTT was susceptible to crashing when the input file was malformed, as was the case here. Every request for this particular file crashed the pods hosting the Media Service. Automatic retries for this file from our web application and worker queue therefore effectively prevented the Media Service from serving files to other clients.

Impact
Users in the US region experienced failures when loading media assets for activities relying on the Media Service, including Interactive Video and Interactive Document. In practice, this resulted in media files failing to load or timing out for end users. Other regions were unaffected.

Resolution
At 17:09 UTC, engineers identified the issue. Following automated conversion of the 144 malformed files, the incident was resolved at 17:21 UTC and traffic to the Media Service resumed as normal.

Mitigations
The direct cause of this incident, the existence of malformed files, has been addressed as part of the resolution. A fix had already been introduced earlier to prevent new malformed files from being created, but due to human error the existing malformed files were not removed in the US region. Going forward, we will use automated migrations for the Media Service, so that fixes are applied consistently to every region rather than depending on manually run scripts.
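
As an illustration of the kind of check that automated migrations make possible, the sketch below compares a list of known migrations against what each region has recorded as applied. It is a simplified example under stated assumptions, not our migration tooling: the function name, the migration names, and the "eu" region label are all hypothetical.

```python
from typing import Dict, List, Set


def pending_migrations(
    all_migrations: List[str],
    applied_by_region: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, per region, the migrations that have not been applied yet."""
    pending = {}
    for region, applied in applied_by_region.items():
        # Preserve the declared order so migrations run in sequence.
        missing = [m for m in all_migrations if m not in applied]
        if missing:
            pending[region] = missing
    return pending


# Hypothetical migration and region names, for illustration only.
migrations = ["001_stop_creating_url_records", "002_repair_existing_url_records"]
applied = {
    "eu": {"001_stop_creating_url_records", "002_repair_existing_url_records"},
    "us": {"001_stop_creating_url_records"},
}
print(pending_migrations(migrations, applied))
```

A check like this, run automatically on deployment, would surface a region that missed a cleanup step without relying on someone remembering to run a script there.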

Furthermore, the automated downtime notification for the US region did not accurately report the nature of the incident: the outage was sustained, not intermittent as the alerting system suggested. We will adjust our monitoring to better account for momentary re-availability of services that are experiencing prolonged downtime, so as not to misrepresent issues that are affecting real users.
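
One possible way to account for momentary re-availability is to merge outage windows that are separated only by very short recoveries. The sketch below illustrates the idea and makes no claim about how our alerting is implemented: the function name, the (start, end) interval representation in seconds, and the 60-second threshold are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of an outage, in seconds


def merge_outages(outages: List[Interval], min_recovery_seconds: float) -> List[Interval]:
    """Merge outages separated by recoveries shorter than the threshold.

    A service that comes back for a few seconds between crashes is treated
    as continuously down, rather than as a series of short blips.
    """
    merged: List[Interval] = []
    for start, end in sorted(outages):
        if merged and start - merged[-1][1] < min_recovery_seconds:
            # The recovery was too brief to count: extend the previous outage.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Three crashes with 5-second recoveries in between, then an unrelated blip.
observed = [(0, 50), (55, 120), (125, 300), (1000, 1010)]
print(merge_outages(observed, min_recovery_seconds=60))
```

With a 60-second threshold, the first three windows are reported as one outage of 300 seconds, which matches what users would have experienced more closely than three separate short alerts.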

Finally, an error in the SRT to VTT conversion allowed the application to crash continuously. This should not be possible for any conversion path, and we will shortly roll out changes to prevent a single malformed file that fails to convert from bringing down the whole application. This effort will include regression testing on all existing conversion paths to prevent similar scenarios in the future.
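
The general pattern of containing a failed conversion can be sketched as follows. This is a hedged illustration, not the change being rolled out: GuardedConverter, ConversionResult, and the in-memory record of failed files are hypothetical. A failure is turned into an error result for that one file, and repeated retries for a file that is known to be bad are answered without running the converter again.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ConversionResult:
    ok: bool
    output: Optional[str] = None
    error: Optional[str] = None


@dataclass
class GuardedConverter:
    """Wraps a converter so one bad file cannot fail the whole service."""

    convert: Callable[[str], str]
    # Remembers files that already failed, so retries do not redo the work.
    known_failures: Dict[str, str] = field(default_factory=dict)

    def run(self, file_id: str, payload: str) -> ConversionResult:
        if file_id in self.known_failures:
            return ConversionResult(ok=False, error=self.known_failures[file_id])
        try:
            return ConversionResult(ok=True, output=self.convert(payload))
        except Exception as exc:
            # Deliberately broad: any converter error becomes a per-file failure.
            message = f"{type(exc).__name__}: {exc}"
            self.known_failures[file_id] = message
            return ConversionResult(ok=False, error=message)
```

Note that this only contains Python-level exceptions. A crash in native code or memory exhaustion would need process-level isolation, for example running the conversion in a separate worker process.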

Posted Feb 03, 2026 - 17:45 UTC

Resolved

The issue causing reduced availability for media in the FeedbackFruits US region has been fully resolved. We will continue to monitor system performance to ensure stability.
Posted Feb 02, 2026 - 17:45 UTC

Monitoring

A fix has been implemented and we're monitoring the rollout. Traffic is returning to normal. No regions other than the US were affected.
Posted Feb 02, 2026 - 17:22 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Feb 02, 2026 - 17:10 UTC

Investigating

We are currently investigating this issue. Based on our current assessment, this issue is only affecting customers in our US hosting region.
Posted Feb 02, 2026 - 17:03 UTC
This incident affected: United States (Media).