From GC Jitter to Stable Low-Latency: High-Performance Optimization with Span and Memory in ShenDesk
Background
ShenDesk is an enterprise-grade, real-time customer service and visitor behavior tracking system, supporting both SaaS and On-Premises deployments. Built on the .NET ecosystem, its core architecture features high-concurrency WebSocket channels, real-time message processing, visitor tracking data stream analysis, and extensible Open APIs.
As the creator of ShenDesk, I have overseen the system’s evolution from its initial "proof-of-concept" phase to its current state: a robust solution capable of supporting demanding commercial environments. Along the way, the platform has hit multiple performance bottlenecks and been through several architectural refactorings.
Driven by these real-world engineering challenges, I began a deep dive into the .NET memory model and high-performance programming techniques.
Introduction
In an era where high concurrency and low latency have become the default expectations, modern .NET applications can no longer afford "unconscious memory allocations." Especially in big data processing, file parsing, network communication, and real-time systems, frequent array copying and string allocations are often the true culprits behind performance bottlenecks.
While developing the ShenDesk server application, we encountered a textbook challenge:
High-frequency WebSocket message processing and visitor tracking data stream parsing generated a massive volume of short-lived objects during peak hours. This led to spiked GC pressure, noticeable latency jitter, and capped throughput.
Had we stuck to traditional array copies and string-concatenation patterns, we would have had very little headroom left for optimization.
This is precisely why Span<T> and Memory<T> were introduced to C#.
These types allow us to slice and manipulate memory without triggering additional heap allocations. By leveraging stack allocation and controlled memory reference mechanisms, we can:
- Eliminate unnecessary array copying.
- Reduce GC invocation frequency.
- Mitigate latency jitter.
- Enhance throughput stability.
By integrating Span<T> into ShenDesk’s real-time messaging channels and protocol parsing layers, we significantly reduced transient allocations, allowing the system to remain steady under heavy concurrent loads.
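To make that concrete, here is a minimal sketch of the pattern: a hypothetical length-prefixed frame format (not ShenDesk's actual wire protocol), parsed entirely through views over the receive buffer so that no intermediate arrays are allocated.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical frame layout for illustration (not ShenDesk's actual wire
// format): a 4-byte big-endian payload length, followed by the payload.
// The header read and the payload slice are both views over the receive
// buffer, so parsing a frame allocates nothing on the heap.
static ReadOnlySpan<byte> ReadFrame(ReadOnlySpan<byte> buffer, out int consumed)
{
    int length = BinaryPrimitives.ReadInt32BigEndian(buffer);
    consumed = 4 + length;
    return buffer.Slice(4, length);
}

// Simulate a receive buffer holding one frame with payload { 1, 2, 3 }.
byte[] received = { 0, 0, 0, 3, 1, 2, 3 };
ReadOnlySpan<byte> payload = ReadFrame(received, out int consumed);
Console.WriteLine($"payload length: {payload.Length}, consumed: {consumed}");
```

The same buffer can then be advanced by `consumed` to parse the next frame, still without a single copy.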
Drawing from real-world engineering scenarios, this article will provide a deep dive into:
- The underlying mechanics of Span<T> and Memory<T>.
- How they facilitate allocation-free memory management.
- Best practices for Web API, WebSocket, and I/O processing.
- Tangible performance gains observed in production systems.
If you are building high-performance .NET systems—or if you are currently battling GC stutters and latency spikes—understanding these two types is no longer an "advanced trick"; it is an engineering essential.
What is a Span?
A Span is a lightweight value type that represents a contiguous region of memory. Unlike traditional array operations, Span allows developers to directly access and manipulate memory data without the need for data copying.
A Span can point to various memory sources, including:
- Arrays
- Stack memory
- Native memory (unmanaged memory)
- Strings
The key advantage of Span is its ability to avoid unnecessary memory allocations. While traditional array or string operations often create new objects and copy data, Span operates directly on existing memory. This architectural shift significantly improves application performance and efficiency.
// Example: Operating on an array using Span
int[] array = new int[100];
Span<int> span = array.AsSpan();
// Directly modify the original array
span[0] = 10;
span.Slice(10, 20).Fill(1); // Fill a specific segment of data
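The "stack memory" source listed above deserves its own small illustration: a Span can wrap a stackalloc buffer, which is a common trick for short-lived scratch space.

```csharp
using System;

// A Span can also wrap stack memory via stackalloc: a small scratch
// buffer with no heap allocation and therefore no GC involvement.
// Keep stackalloc sizes small; stack space is limited.
Span<byte> scratch = stackalloc byte[16];
scratch.Fill(0xFF);          // set every byte to 0xFF
scratch.Slice(0, 4).Clear(); // zero out the first four bytes
Console.WriteLine($"{scratch[0]} {scratch[4]}");
```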
The Importance of Span
Traditional operations, such as extracting substrings or creating array slices, typically result in the creation of new objects in memory. These additional memory allocations not only increase the workload of the Garbage Collector (GC) but also degrade overall application performance.
Span addresses this issue by creating a "view" of existing memory rather than copying the data. This approach offers several key advantages:
- Faster execution speeds
- Reduced memory footprint
- Enhanced overall system throughput
- Lower Garbage Collection frequency
Span is particularly indispensable in high-performance scenarios, such as:
- Web Servers (e.g., handling high-concurrency requests)
- Data Parsers (e.g., JSON or binary protocol parsing)
- Real-time Systems (where latency is critical)
- Large-scale Data Processing
Core Features of Span
Span provides several critical features that make it the ideal choice for high-performance memory management:
- Stack-only Allocation: As a ref struct, Span is never allocated on the managed heap, which keeps access fast and allocation-free.
- Slicing Support: It allows for direct manipulation of specific memory segments without the overhead of data duplication.
- Type and Memory Safety: Span provides indexed access with bounds checking, preventing common pitfalls like buffer overflows while maintaining performance.
- Reduced GC Pressure: By minimizing transient object creation, it drastically reduces the frequency and duration of Garbage Collection cycles.
- Optimized for Big Data: It excels in scenarios involving massive datasets where traditional object-oriented overhead becomes a bottleneck.
// Example: Slicing operations with Span
byte[] buffer = new byte[1024];
Span<byte> bufferSpan = buffer.AsSpan();
// Process the first 512 bytes
ProcessData(bufferSpan.Slice(0, 512));
// Process the remaining 512 bytes
ProcessData(bufferSpan.Slice(512));
What is Memory?
While Memory<T> shares many similarities with Span<T>, it is specifically designed for a broader range of scenarios—most notably asynchronous programming. The key distinctions are:
- Stack vs. Heap: While Span<T> is a stack-only type restricted to synchronous methods, Memory<T> can be stored on the heap.
- Asynchronous Compatibility: Memory<T> can be used within async methods and stored as a class field, allowing it to persist across await boundaries.
- Lifecycle Management: It represents a memory region that can be safely passed between asynchronous operations.
- Performance Trade-off: Although Memory<T> is slightly slower than Span<T> due to its extra layer of abstraction (you convert it to a Span<T> for actual element access), it offers significantly more flexibility while still performing far better than defensive array copies.
// Example: Using Memory in an asynchronous method
async Task ProcessDataAsync(Memory<byte> dataMemory)
{
// Processing data asynchronously
// Notice we convert to .Span at the point of synchronous execution
await Task.Run(() => ProcessData(dataMemory.Span));
// Unlike Span, Memory can be stored in a field for later use.
// (_storedMemory is assumed to be a Memory<byte> field on the containing class.)
_storedMemory = dataMemory;
}
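The async compatibility is not just theoretical: the BCL itself exposes Memory-based overloads. Here is a small self-contained sketch using the `Stream.ReadAsync(Memory<byte>)` overload, with a `MemoryStream` standing in for a network or file stream.

```csharp
using System;
using System.IO;

// Memory<byte> survives await boundaries, so it plugs directly into the
// Memory-based Stream.ReadAsync overload. A MemoryStream stands in for a
// network or file stream here.
var stream = new MemoryStream(new byte[] { 10, 20, 30, 40 });
var buffer = new byte[4];
int read = await stream.ReadAsync(buffer.AsMemory());
Console.WriteLine($"read {read} bytes, first byte = {buffer[0]}");
```

A Span<byte> could not be used this way: it cannot be captured by the async state machine, which is exactly the gap Memory<T> fills.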
When to Use Span
Span<T> is the optimal choice in the following scenarios:
- Synchronous Workflows: Ideal for processing arrays, buffers, or strings within a single execution thread.
- Data Parsing: High-performance parsing of network protocols, custom file formats, or structured logs.
- File I/O Operations: Efficiently reading from or writing to large files without intermediate allocations.
- Massive Datasets: When your application requires low-latency processing of large-scale data.
- Performance-Critical Paths: Any "hot path" in your code where memory optimization and GC pressure are significant concerns.
// Example: Parsing a string using Span
string s = "127.0.0.1:8080";
ReadOnlySpan<char> span = s.AsSpan();
int colonPos = span.IndexOf(':');
if (colonPos > 0)
{
// These slices do not create new string objects
var ipSpan = span.Slice(0, colonPos);
var portSpan = span.Slice(colonPos + 1);
// Process the IP and Port as views of the original string
}
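The same idea carries through to the numeric parsers: `int.TryParse` has a ReadOnlySpan<char> overload, so the port slice from the example above can be converted to a number without ever materializing a substring.

```csharp
using System;

// Continuing the endpoint example: int.TryParse accepts ReadOnlySpan<char>,
// so the port can be parsed straight from the slice with zero string
// allocations along the way.
string endpoint = "127.0.0.1:8080";
ReadOnlySpan<char> span = endpoint.AsSpan();
int colon = span.IndexOf(':');
int port = 0;
bool ok = colon > 0 && int.TryParse(span.Slice(colon + 1), out port);
Console.WriteLine($"parsed={ok}, port={port}");
```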
Real-World Application Scenarios
Span<T> and Memory<T> have been widely adopted across various high-performance domains, including:
- High-Performance Web APIs: ASP.NET Core leverages Span internally to maximize throughput.
- File Processing Systems: Efficiently handling large-scale file I/O with minimal overhead.
- Networking Applications: Protocol parsing and low-level packet processing.
- Real-Time Systems: Low-latency data manipulation where every millisecond counts.
- Game Development: High-efficiency memory management for rendering and physics engines.
- Parsers and Serializers: Rapid data transformation for formats like JSON, XML, or Protobuf.
For instance, ASP.NET Core utilizes Span extensively within its internal pipeline to optimize request handling, particularly in the following areas:
- Request Header Parsing: Splitting and validating headers without string allocations.
- URL Decoding: In-place transformation of encoded characters.
- JSON Serialization/Deserialization: Using System.Text.Json for high-speed data binding.
- Response Buffering: Directly writing to the output stream buffer.
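To give a flavor of the header-parsing case, here is a simplified sketch in the spirit of what a framework does at this layer (this is illustrative only, not ASP.NET Core's actual implementation): a "Name: Value" line is split into two trimmed views of the original text, with no substring allocations.

```csharp
using System;

// Illustrative only (not ASP.NET Core's actual code): split a
// "Name: Value" header line into two trimmed, allocation-free views.
static bool TrySplitHeader(ReadOnlySpan<char> line,
                           out ReadOnlySpan<char> name,
                           out ReadOnlySpan<char> value)
{
    int colon = line.IndexOf(':');
    if (colon <= 0)
    {
        name = default;
        value = default;
        return false;
    }
    name = line.Slice(0, colon).Trim();
    value = line.Slice(colon + 1).Trim();
    return true;
}

bool parsed = TrySplitHeader("Content-Type: application/json".AsSpan(),
                             out var name, out var value);
Console.WriteLine($"{parsed}: '{name.ToString()}' = '{value.ToString()}'");
```

The `TrySplitHeader` name and shape are hypothetical; the point is that both outputs remain views over the original line.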
Performance Benefits
Implementing Span<T> and Memory<T> yields significant performance gains across several key metrics:
- Minimal Memory Allocation: Eliminates redundant allocations and data duplication.
- Accelerated Execution: Direct memory access removes intermediate processing steps, shortening the execution path.
- Reduced GC Pressure: Drastically lowers both the frequency of garbage collection and the duration of "stop-the-world" pauses.
- Optimized Resource Utilization: Ideally suited for high-throughput, high-concurrency applications.
Benchmark data indicates that transitioning to Span can achieve:
- Over 90% reduction in memory allocations.
- 2x to 5x faster execution speeds in intensive processing tasks.
- Significant reduction in Garbage Collection (GC) pause times.
// Performance Comparison: String Processing
// Traditional Approach - Allocates a new string object
string substring = bigString.Substring(start, length);
// Using Span - Zero-allocation "view" of the original string
ReadOnlySpan<char> span = bigString.AsSpan().Slice(start, length);
Conclusion
High performance is not about "showing off" technical tricks; it is a fundamental engineering mindset.
At their core, Span<T> and Memory<T> solve more than just memory allocation issues. They challenge us to rethink how data flows through our systems: Does this data really need to be copied? Is a heap allocation truly necessary? Can we eliminate Garbage Collection (GC) pressure at the source?
When you begin to scrutinize your code through the lenses of "memory lifecycle" and "allocation cost," you quickly realize that many so-called performance bottlenecks are simply the result of unconscious programming habits.
Throughout the evolution of ShenDesk, the benefits of this structural shift in thinking have far outweighed any single micro-optimization. Our systems have become more resilient under high concurrency, our latency curves have smoothed out, and our resource utilization has become far more predictable.
If you are building .NET systems for real-world users, my advice is straightforward:
Don't wait for a performance crisis to start firefighting. Understand these low-level capabilities early on, and make "avoiding unnecessary allocations" your default principle.
Engineering quality is never decided at the moment of deployment; it is forged by the choices we make every time we write a line of code.
Wrapping up
ShenDesk is still evolving.
If you’ve ever built or deployed a real-time chat system, I’d genuinely love to hear your experience—
how you handled live updates, load balancing, or flexible deployment models in production.
Let’s compare notes.
If you’re curious
I’ve been building ShenDesk, a customer support chat system designed to run reliably
both online and on your own infrastructure.
- Website: https://shendesk.com
- Documentation: https://docs.shendesk.com
You can try it for free, whether you prefer a hosted setup or self-hosting.
Feedback from developers interested in self-hosted systems, real-time communication,
and customer experience engineering is always welcome.
UI snapshots
Visitor side
Fast loading, no message loss
Agent side
Reliable, feature-rich, built for real-world support work
Web admin panel