Challenges - JSON Processing - C#
JSON Data Transformation: From LINQ to High-Performance with Span<T>

⚙️ The Parsing Challenge
Receive a JSON array containing names and scores as strings. Clean the data and return only valid users (non-empty name, score between 0 and 100).
[
  {"name": " Alice ", "score": "295"},
  {"name": "Bob", "score": "58"},
  {"name": "Charlie", "score": "72"},
  {"name": "Daisy", "score": "88 "},
  {"name": "Eve", "score": "null"},
  {"name": "Frank", "score": "30"},
  {"name": "Grace", "score": "-81"},
  {"name": "Hank", "score": "a90"},
  {"name": "Jack", "score": "0"},
  {"name": "", "score": "1"}
]
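The challenge seems straightforward: parse a "dirty" JSON payload from a legacy system, trim whitespace, convert values, and filter results. For the sample above, only Bob (58), Charlie (72), Daisy (88), Frank (30), and Jack (0) survive.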
But when we scale to 10 million records, the gap between "elegant code" and "efficient code" can cost gigabytes of memory and precious CPU cycles...
🏗️ The Setup: Class vs. Record
Before we dive into the logic, we need to define our data structures. We use a standard class to represent the raw, "dirty" data coming from the JSON (where everything is a string), and a record for our immutable, cleaned-up final object.
public record UserRecord(string Name, int Score);

public class User
{
    // Lowercase names match the JSON keys; System.Text.Json is case-sensitive by default.
    public string name { get; set; } = string.Empty;
    public string score { get; set; } = string.Empty;
}
By using a record for the output, we gain built-in immutability and value-based equality, which is ideal for data transformation pipelines.
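As a minimal sketch of what "value-based equality" and immutability mean in practice, the snippet below redeclares the same positional record so it can run on its own. The class name RecordDemo is only illustrative.

```csharp
using System;

public record UserRecord(string Name, int Score);

public static class RecordDemo
{
    public static void Main()
    {
        var a = new UserRecord("Bob", 58);
        var b = new UserRecord("Bob", 58);

        // Value-based equality: two distinct instances holding the same data compare equal.
        Console.WriteLine(a == b);                 // True
        Console.WriteLine(ReferenceEquals(a, b));  // False

        // Positional record properties are init-only, so "changing" a value
        // means creating a copy with a 'with' expression.
        var c = a with { Score = 60 };
        Console.WriteLine(a.Score);  // 58
        Console.WriteLine(c);        // UserRecord { Name = Bob, Score = 60 }
    }
}
```

Because equality is structural, cleaned records can be compared directly in tests or de-duplicated with Distinct() without writing Equals and GetHashCode by hand.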
📜 Solution 1: The Declarative Standard (LINQ)
The classic approach using LINQ. It’s highly readable and expressive, and uses modern C# features. However, beneath this clean syntax lies a performance trap: Memory Pressure.
public static IEnumerable<UserRecord> StandardProcess(string json)
{
    var users = JsonSerializer.Deserialize<List<User>>(json) ?? [];
    return users
        .Where(u =>
            !string.IsNullOrEmpty(u.name?.Trim())
            && !string.IsNullOrEmpty(u.score?.Trim()))
        .Select(u => new {
            Name = u.name.Trim(),
            IsValid = int.TryParse(u.score.Trim(), out int score),
            Score = score
        })
        .Where(x => x.IsValid && (x.Score >= 0 && x.Score <= 100))
        .Select(x => new UserRecord(x.Name, x.Score));
}
⚠️ The "String Churn" Problem
Whenever .Trim() actually removes whitespace, .NET allocates a brand new string object on the Managed Heap (if there is nothing to trim, it returns the same instance). In this pipeline, we trim the same strings multiple times just to check whether they are valid.
If you have 10 million records, you aren't just processing 10 million strings: you are potentially creating 30 to 40 million temporary objects that the Garbage Collector (GC) will have to clean up. This cycle of rapid allocation and destruction creates Memory Pressure, leading to frequent GC Pauses that freeze your application's execution while it tries to "clean the house".
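To make the churn visible, here is a rough sketch that counts the bytes allocated by repeated string.Trim() calls. The helper name MeasureTrimAllocations is hypothetical, and the exact byte counts depend on the runtime, so the demo only checks whether anything was allocated at all.

```csharp
using System;

public static class TrimChurnDemo
{
    // Returns the managed bytes allocated on the current thread
    // while calling string.Trim() on `input` `count` times.
    public static long MeasureTrimAllocations(string input, int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        int totalLength = 0;
        for (int i = 0; i < count; i++)
        {
            // Use the result so the call cannot be treated as dead code.
            totalLength += input.Trim().Length;
        }
        GC.KeepAlive(totalLength);
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        // A padded value forces a new string on every call.
        Console.WriteLine(MeasureTrimAllocations(" Alice ", 1_000_000) > 0);  // True

        // A clean value has nothing to trim, so the same instance comes back.
        string clean = "Bob";
        Console.WriteLine(ReferenceEquals(clean.Trim(), clean));              // True
    }
}
```

The second check matters for estimating real-world cost: only the dirty fields pay the allocation, so the total churn depends on how messy the input actually is.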
🔄 Turning Point: The Power of Span
Instead of creating a new string on the Heap every time we strip whitespace, we leverage ReadOnlySpan<char>. Think of a Span as a "window" into existing memory. Performing a .Trim() on a Span doesn't copy data: it returns a new span (just a reference and a length, living on the Stack) that covers a narrower range of the same characters.
The Result: Zero extra garbage allocated on the Heap during the validation phase.
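The idea can be sketched in isolation before looking at the full pipeline. The helper below (TryParseScore is an illustrative name, not part of the original code) trims and parses a raw score entirely through a span, so no intermediate string is created.

```csharp
using System;

public static class SpanTrimDemo
{
    // Trims and parses a raw score without allocating a trimmed copy.
    // Returns true only for integers in the range 0..100.
    public static bool TryParseScore(string raw, out int score)
    {
        // AsSpan() wraps the existing characters; Trim() only narrows the window.
        ReadOnlySpan<char> span = raw.AsSpan().Trim();

        if (!span.IsEmpty
            && int.TryParse(span, out score)
            && score is >= 0 and <= 100)
        {
            return true;
        }

        score = 0;
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(TryParseScore("88 ", out int a) + " " + a);  // True 88
        Console.WriteLine(TryParseScore("295", out int b) + " " + b);  // False 0
        Console.WriteLine(TryParseScore("a90", out int c) + " " + c);  // False 0
    }
}
```

A string is only needed once a value has to outlive the span, for example when a trimmed name is stored in the final record.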
public static IEnumerable<UserRecord> OptimizedProcess(string json)
{
    var users = JsonSerializer.Deserialize<List<User>>(json) ?? [];

    // Validation works on spans only, so rejected records allocate nothing.
    static bool IsUserValid(User u)
    {
        var nameSpan = u.name.AsSpan().Trim();
        var scoreSpan = u.score.AsSpan().Trim();
        if (nameSpan.IsEmpty || scoreSpan.IsEmpty) return false;
        return int.TryParse(scoreSpan, out int score)
            && (score >= 0 && score <= 100);
    }

    return users
        .Where(IsUserValid)
        // int.Parse tolerates surrounding whitespace, so "88 " still parses here.
        .Select(u => new UserRecord(u.name.Trim(), int.Parse(u.score)));
}
🚀 Why this is a Game Changer
By using a static local function, we let the compiler enforce that the logic is isolated and captures no external variables, so no closure object has to be allocated. We only perform the actual string allocation (.Trim()) inside the .Select(), for records that we already know are valid.
However, notice a small catch: we are still calling int.Parse() after int.TryParse(). We are doing the work twice. To reach the peak of engineering, we need to solve this redundancy.
💎 The Elite Solution: Yield Return & Streaming
Why create intermediate anonymous objects if you can process items one by one? The yield return statement creates a State Machine behind the scenes that delivers data on demand. Combined with Span, we achieve an optimized, low-allocation pipeline.
public static IEnumerable<UserRecord> HighPerformanceProcess(string json)
{
    var users = JsonSerializer.Deserialize<List<User>>(json) ?? [];
    foreach (var u in users)
    {
        var nameSpan = u.name.AsSpan().Trim();
        var scoreSpan = u.score.AsSpan().Trim();
        if (nameSpan.IsEmpty || scoreSpan.IsEmpty) continue;
        if (int.TryParse(scoreSpan, out int score)
            && score >= 0 && score <= 100)
        {
            yield return new UserRecord(nameSpan.ToString(), score);
        }
    }
}
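In this final version, we avoid the "double parse" (validating with int.TryParse and then converting again with int.Parse) and stream the results to the caller one record at a time. Note that the input is still fully deserialized into a List<User> first, so the streaming applies to the output side of the pipeline.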
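The "on demand" behavior of yield return is easy to miss, so here is a small sketch that makes it observable. It works on a plain array of raw scores rather than on JSON, and the Inspected counter exists only for the demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class YieldDemo
{
    // Counts how many source items the iterator has looked at so far (demo only).
    public static int Inspected;

    public static IEnumerable<int> ValidScores(IEnumerable<string> rawScores)
    {
        foreach (var raw in rawScores)
        {
            Inspected++;
            if (int.TryParse(raw.AsSpan().Trim(), out int score) && score is >= 0 and <= 100)
                yield return score;
        }
    }

    public static void Main()
    {
        string[] raw = { "295", "58", "72", "88 ", "null", "30" };

        // Calling the method only builds the state machine; no item is inspected yet.
        var query = ValidScores(raw);
        Console.WriteLine(Inspected);  // 0

        // Stop after two valid scores; the remaining input is never touched.
        int taken = 0;
        foreach (var score in query)
        {
            Console.WriteLine(score);  // 58, then 72
            if (++taken == 2) break;
        }
        Console.WriteLine(Inspected);  // 3
    }
}
```

The same laziness has a practical consequence for callers: the iterator body does not run until the result is enumerated, and enumerating it twice repeats the work.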
📉 The 10M Record Shock: A Reality Check
This is where pure "syntactic sugar" fails under pressure. When testing with 10,000,000 records, the results are brutal. We aren't just comparing execution time: we are observing how the Garbage Collector (GC) struggles to keep up with massive string allocations.
📊 The Brutal Numbers (Actual Console Output)
| Strategy | Execution Time | Allocated Memory | Verdict |
|---|---|---|---|
| Standard (LINQ) | 6,779.36 ms | 1,048,975.98 KB | High GC Thrashing |
| Optimized (Local Function) | 6,465.75 ms | 1,371,074.11 KB | Balanced |
| High Performance (Yield) | 6,004.58 ms | 1,438,536.04 KB | Elite Performance |
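The measurement code itself is not shown here, so the following is only a sketch of how figures like these could be collected, assuming a Stopwatch for timing and GC.GetTotalAllocatedBytes for memory. The Measure helper is hypothetical and is not taken from the linked repository.

```csharp
using System;
using System.Diagnostics;

public static class MeasureDemo
{
    // Runs `action` once and returns elapsed milliseconds and managed memory allocated (KB).
    public static (double ElapsedMs, double AllocatedKb) Measure(Action action)
    {
        // Start from a clean heap so garbage from earlier runs does not skew the timing.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long bytesBefore = GC.GetTotalAllocatedBytes(precise: true);
        var stopwatch = Stopwatch.StartNew();

        action();

        stopwatch.Stop();
        long bytesAfter = GC.GetTotalAllocatedBytes(precise: true);

        return (stopwatch.Elapsed.TotalMilliseconds, (bytesAfter - bytesBefore) / 1024.0);
    }

    public static void Main()
    {
        var (ms, kb) = Measure(() =>
        {
            // Placeholder workload: allocate roughly 1 MB.
            var buffer = new byte[1_000_000];
            GC.KeepAlive(buffer);
        });
        Console.WriteLine($"{ms:N2} ms | {kb:N2} KB");
    }
}
```

When measuring the three strategies this way, the returned sequence has to be fully enumerated inside the action (for example with .Count()), because a lazy pipeline does no work until it is consumed. A single timed run is also noisy; a dedicated tool such as BenchmarkDotNet with its memory diagnoser would give more reliable numbers.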
🧠 The Analysis: CPU vs. Memory
Looking at the Standard (LINQ) approach, it allocates the least memory in this run (about 1.0 GB), yet it is the slowest: roughly 775 ms (about 13%) behind the High Performance version.
Why? The likely explanation is "string churn": the millions of short-lived strings created by repeated .Trim() calls, plus the intermediate anonymous objects, keep the Garbage Collector busy, and each collection can pause your application. Those extra 775 ms are the "tax" paid for inefficient memory management.
The High Performance version is the king of the hill. By using ReadOnlySpan and a single-pass yield return logic, it avoids redundant parsing and unnecessary allocations. It doesn't just run faster: it runs smarter, allowing the CPU to focus on processing data instead of managing trash.
🧩 Additional Notes
Modern C#: Utilizing Raw String Literals, Collection Expressions, Positional Records and Local Functions for clean, modern syntax.
Immutability: Using record types ensures transformed data is thread-safe and lightweight.
Scalability: For massive volumes, combining Span and yield transforms C# into a low-latency tool that competes directly with lower-level languages.
Forget simple mapping. When scale knocks on your door, you need to master the Stack.
🔗 C# Console App full code on GitHub:
https://github.com/andreecirillo/csharp-high-performance-json





