What is it about?

This paper focuses on improving the reliability of live video streaming, especially for large-scale events where even small failures can disrupt thousands or millions of viewers. Today, live streaming platforms often use redundant cloud pipelines that encode and package the same live content in parallel. If one pipeline fails, another can take over. These systems rely on precisely synchronized video segments so that content from different pipelines can be swapped seamlessly.

However, this approach has several practical challenges. Small timing differences between pipelines can cause segment mismatches. Defective or corrupted segments may still get distributed. Slight content misalignment can result in noticeable audio or video glitches during playback. In addition, scaling this redundancy model across multiple regions often requires publishing the same content multiple times, which increases operational complexity and infrastructure costs.

To address these issues, the paper proposes a smarter, more scalable solution. It introduces segment template-aware origin servers that understand predictable segment timing and have visibility across multiple pipelines. This allows the origin to select only valid segments, avoid distributing defective content, and prioritize the healthiest pipeline to reduce playback issues.

The paper also proposes scaling redundancy without duplicating publishing workflows. Instead, it uses multi-region distributed storage with asynchronous cross-region synchronization, creating a more decoupled, efficient, and scalable architecture for resilient live streaming.
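The origin-side selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names SegmentTemplate, Segment, is_valid, and select_segment, the health scores, and the validity rules (non-empty payload, start time and duration matching the template) are all assumptions made for illustration. The idea is that a fixed segment duration makes each segment's expected start time computable, so the origin can reject segments that drift and then prefer the healthiest pipeline among those that remain.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SegmentTemplate:
    """Hypothetical description of predictable segment timing."""
    start_number: int  # number of the first segment
    duration: int      # nominal segment duration, in timescale ticks
    timescale: int     # ticks per second

    def expected_start_ticks(self, number: int) -> int:
        # With a fixed duration, the start time follows from the number alone.
        return (number - self.start_number) * self.duration


@dataclass(frozen=True)
class Segment:
    pipeline: str       # which redundant pipeline produced the segment
    number: int
    start_ticks: int
    duration_ticks: int
    size_bytes: int


def is_valid(seg: Segment, template: SegmentTemplate, tolerance_ticks: int = 0) -> bool:
    """Assumed validity rules: non-empty and aligned with the template."""
    if seg.size_bytes <= 0:
        return False
    expected = template.expected_start_ticks(seg.number)
    if abs(seg.start_ticks - expected) > tolerance_ticks:
        return False
    return abs(seg.duration_ticks - template.duration) <= tolerance_ticks


def select_segment(
    number: int,
    candidates: List[Segment],
    template: SegmentTemplate,
    health: Dict[str, float],
) -> Optional[Segment]:
    """Pick the valid candidate from the healthiest pipeline, if any."""
    valid = [s for s in candidates if s.number == number and is_valid(s, template)]
    if not valid:
        return None
    # Pipelines missing from the health map are treated as least healthy.
    return max(valid, key=lambda s: health.get(s.pipeline, 0.0))


# Usage: 2-second segments at a 90 kHz timescale, two redundant pipelines.
template = SegmentTemplate(start_number=1, duration=180_000, timescale=90_000)
health = {"pipeline-a": 0.9, "pipeline-b": 1.0}
candidates = [
    Segment("pipeline-a", 5, 720_000, 180_000, 1_000),
    Segment("pipeline-b", 5, 720_300, 180_000, 1_000),  # drifted start time
]
chosen = select_segment(5, candidates, template, health)
```

In this example pipeline-b has the higher health score, but its segment starts 300 ticks late, so the sketch falls back to the aligned segment from pipeline-a. A real origin would likely apply richer checks, such as container parsing or checksum verification, than the two assumed here.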

Why is it important?

Live streaming failures are highly visible and disruptive, especially for sports, news, concerts, and other major events. Even small glitches can damage user experience and brand reputation. Improving resiliency reduces outages and playback errors, minimizes audio and video glitches, and makes large-scale live events more stable. It also lowers infrastructure costs by avoiding unnecessary duplication and simplifies system design while improving reliability. In short, this work helps make live streaming systems more reliable, more scalable, and more cost-efficient, which is critical as global demand for live content continues to grow.

Perspectives

I had the privilege of collaborating with incredibly talented co-authors on this exciting and complex area of live streaming. We hope our work contributes to advancing the field of resilient and scalable live streaming systems.

Xiaomei Liu
Netflix Inc

Read the Original

This page is a summary of: Intelligent Live Origin and Resilient Streaming, February 2026, ACM (Association for Computing Machinery),
DOI: 10.1145/3789239.3793275.
