Amazon SDE-2 / Intermediate

Amazon SDE-2 .NET Interview 2026

Level: L5 | Bar Raiser: Yes | 80% of these were asked Aug-Oct 2025. Master all 25 = phone screen ready.

⚠️ Amazon SDE-2 Bar: Code must be production-ready + System Design for 100K TPS + Bar Raiser on 3-4 LPs. If you can't explain trade-offs with numbers, instant reject.

Amazon SDE-2 Loop: What Amazon Actually Tests

Round | Time | Fail Rate | Key Focus
Coding 1 | 45 min | 62% | LeetCode Medium + idiomatic C#: async, LINQ, edge cases. Must compile.
Coding 2 | 45 min | 55% | OOP + concurrency: thread-safe cache, Producer-Consumer, Channels.
System Design | 60 min | 71% | "Design Prime Video API". Scale, DynamoDB, SQS, C# microservices. Numbers or fail.
Bar Raiser | 60 min | 49% | 3-4 LPs deep. STAR + metrics. "Disagree and Commit" = 90% ask rate.

Part 1: Coding + Data Structures - 10 Questions

Question: Implement thread-safe LRU Cache. Get/Put O(1). Handle 10K concurrent requests.

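Expected Answer: Dictionary + doubly linked list for O(1) Get/Put, guarded by a lock (a ConcurrentDictionary alone can't keep the recency list consistent). Or use MemoryCache with SizeLimit (approximate eviction, not strict LRU).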
Amazon follow-up: How to shard for 1M keys? Answer: Consistent hashing on key, 100 shards. Each shard = LRU. IMemoryCache per shard.
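Bar Raiser Trap: One global lock serializes all callers, and ReaderWriterLockSlim doesn't fix it: an LRU Get moves the node to the front, so it needs the write lock too. Reduce contention by sharding (one lock per shard), or use an approximate concurrent LRU such as ConcurrentLru from the BitFaster.Caching library (it is not part of Microsoft.Extensions.Caching).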
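A minimal sketch of the Dictionary + doubly linked list approach, assuming a single lock per cache instance (`LruCache`, `TryGet` and `Put` are hypothetical names). Because a hit moves the entry to the front, Get mutates shared state and takes the same lock as Put; to cut contention, run N of these as shards, as in the follow-up above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical LRU cache: Dictionary gives O(1) lookup, LinkedList gives O(1) reordering.
public sealed class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _map;
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new(); // front = most recently used
    private readonly object _gate = new();                                 // private lock object, never lock(this)

    public LruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        _map = new Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>>(capacity);
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_gate) // a hit reorders the list, so Get needs the same exclusive lock as Put
        {
            if (_map.TryGetValue(key, out var node))
            {
                _order.Remove(node);   // O(1): we already hold the node
                _order.AddFirst(node);
                value = node.Value.Value;
                return true;
            }
            value = default!;
            return false;
        }
    }

    public void Put(TKey key, TValue value)
    {
        lock (_gate)
        {
            if (_map.TryGetValue(key, out var existing))
            {
                _order.Remove(existing);        // overwrite: drop the old entry first
                _map.Remove(key);
            }
            else if (_map.Count == _capacity)
            {
                var lru = _order.Last!;         // least recently used entry
                _order.RemoveLast();
                _map.Remove(lru.Value.Key);
            }
            _map[key] = _order.AddFirst((key, value));
        }
    }
}
```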

Question: Infinite stream of numbers. Return top 10 frequent at any time. O(log K).

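Answer: HashMap (Dictionary) for freq + min-heap of size 10. PriorityQueue<int,int> in .NET 6+ has no update-priority operation, so for in-place updates use SortedSet<(freq, num)> (remove + add = O(log K)).
Code: On new num: increment freq. If already in the top 10, update its entry. If not and freq > current min, replace the min.
Follow-up: What if K=1M, or the distinct values don't fit in memory? Answer: The frequency map, not the heap, is what blows up. Use Count-Min Sketch (approximate counts, fixed memory) + heap, or MapReduce. Show you know limits.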
Why they ask: Tests heap + streaming. Amazon CloudWatch Logs does this.
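.NET's `PriorityQueue<TElement, TPriority>` can't update an element's priority in place, so this sketch swaps in a `SortedSet<(Freq, Value)>` as the size-K min-structure: updating an entry is remove + add, O(log K). `TopKFrequent` is a hypothetical name, and the frequency dictionary still grows with the number of distinct values.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical streaming top-K tracker. Counts only grow, so a value outside the
// top K can enter only by beating the current minimum.
public sealed class TopKFrequent
{
    private readonly int _k;
    private readonly Dictionary<int, int> _freq = new();            // value -> count (O(distinct) memory)
    private readonly SortedSet<(int Freq, int Value)> _top = new();  // size <= K; Min = weakest entry

    public TopKFrequent(int k) => _k = k;

    public void Add(int value)
    {
        _freq.TryGetValue(value, out int old);   // old = 0 for a new value
        int now = old + 1;
        _freq[value] = now;

        if (_top.Remove((old, value)))           // already in top K: refresh its count
        {
            _top.Add((now, value));
        }
        else if (_top.Count < _k)                // still filling up
        {
            _top.Add((now, value));
        }
        else if (now > _top.Min.Freq)            // beats the weakest entry: replace it
        {
            var weakest = _top.Min;
            _top.Remove(weakest);
            _top.Add((now, value));
        }
    }

    // Most frequent first.
    public IReadOnlyList<int> Top() => _top.Reverse().Select(e => e.Value).ToList();
}
```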

Question: SQS queue receives 50K msg/s. Process with 100 workers. Don't OOM.

Answer: System.Threading.Channels with BoundedChannelOptions(1000). FullMode = Wait = backpressure.
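var ch = Channel.CreateBounded<Msg>(new BoundedChannelOptions(1000) { FullMode = BoundedChannelFullMode.Wait });
var producer = Task.Run(async () => { // Producer: WriteAsync waits while the channel is full = backpressure
    try { await foreach (var msg in SqsPoller()) await ch.Writer.WriteAsync(msg); }
    finally { ch.Writer.Complete(); } // without Complete(), ReadAllAsync never ends
});
await Parallel.ForEachAsync(ch.Reader.ReadAllAsync(), new ParallelOptions { MaxDegreeOfParallelism = 100 },
    async (msg, ct) => await Process(msg)); // at most 100 messages in flight
await producer; // surface producer exceptions instead of swallowing them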
Trap: BlockingCollection blocks threads while waiting; Channels wait asynchronously. Task.Run per message = unbounded concurrency: memory blowup and thread-pool starvation.

Question: Serialize tree to string, deserialize. Handle 1M nodes.

Answer: Preorder with "null" marker. Use Queue for deser. For 1M nodes, recursion = StackOverflow. Use explicit stack or BFS.
Follow-up: Compress it? Answer: Protobuf or write ints as bytes, not strings. BinaryWriter = 4 bytes vs "12345," = 6 bytes.
SDE-2 bar: Must mention stack overflow risk and fix.
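A sketch of the BFS option, assuming a simple `TreeNode` class and a comma-separated format with "null" markers (both hypothetical). Both directions use a `Queue`, so a 1M-node degenerate tree never touches the call stack; a preorder version would use an explicit `Stack` instead.

```csharp
using System.Collections.Generic;
using System.Text;

public sealed class TreeNode
{
    public int Val;
    public TreeNode? Left, Right;
    public TreeNode(int val) => Val = val;
}

// Hypothetical codec: level-order (BFS) with "null" markers, no recursion anywhere.
public static class TreeCodec
{
    public static string Serialize(TreeNode? root)
    {
        var sb = new StringBuilder();
        var queue = new Queue<TreeNode?>();
        queue.Enqueue(root);
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (sb.Length > 0) sb.Append(',');
            if (node is null) { sb.Append("null"); continue; }
            sb.Append(node.Val);
            queue.Enqueue(node.Left);   // children are written one level later
            queue.Enqueue(node.Right);
        }
        return sb.ToString();
    }

    public static TreeNode? Deserialize(string data)
    {
        var tokens = data.Split(',');
        if (tokens[0] == "null") return null;
        var root = new TreeNode(int.Parse(tokens[0]));
        var parents = new Queue<TreeNode>();
        parents.Enqueue(root);
        int i = 1;
        while (parents.Count > 0)
        {
            var parent = parents.Dequeue();
            // Each non-null node consumes exactly two tokens: left child, then right child.
            parent.Left = Read(tokens[i++]);
            parent.Right = Read(tokens[i++]);
            if (parent.Left is not null) parents.Enqueue(parent.Left);
            if (parent.Right is not null) parents.Enqueue(parent.Right);
        }
        return root;
    }

    private static TreeNode? Read(string token) => token == "null" ? null : new TreeNode(int.Parse(token));
}
```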

Question: Merge K sorted List<int>. K=1000, total N=1M. O(N log K).

Answer: MinHeap of K elements. Each element = {val, listIndex, elementIndex}.
Code: PriorityQueue<(int val, int li, int ei), int> pq. Push first from each list. Pop min, push next from same list.
Trap: Naive merge = O(N*K). Or List.Sort() = O(N log N). Heap is optimal.
Follow-up: What if lists on disk? Answer: External sort. Read chunk from each file, merge.
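A sketch of the heap merge with the .NET 6+ `PriorityQueue`, keyed by value and carrying (list index, element index) as in the answer above; `KWayMerge` is a hypothetical name. The heap never holds more than one entry per list, which is where the log K comes from.

```csharp
using System.Collections.Generic;

public static class KWayMerge
{
    // Each of the N pops/pushes costs O(log K), so the whole merge is O(N log K).
    public static List<int> Merge(IReadOnlyList<List<int>> lists)
    {
        var pq = new PriorityQueue<(int Val, int ListIdx, int ElemIdx), int>();
        int total = 0;
        for (int li = 0; li < lists.Count; li++)
        {
            total += lists[li].Count;
            if (lists[li].Count > 0) pq.Enqueue((lists[li][0], li, 0), lists[li][0]); // head of each list
        }

        var result = new List<int>(total);
        while (pq.TryDequeue(out var top, out _))
        {
            result.Add(top.Val);
            int next = top.ElemIdx + 1;
            if (next < lists[top.ListIdx].Count)   // push the successor from the same list
                pq.Enqueue((lists[top.ListIdx][next], top.ListIdx, next), lists[top.ListIdx][next]);
        }
        return result;
    }
}
```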

Question: Increment counter from 100 threads. No locks. Fast.

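Answer: Interlocked.Increment(ref _count). A single atomic CPU instruction (lock xadd on x64), typically several times faster than lock.
Follow-up: Need to increment 3 counters atomically? Answer: Can't with one Interlocked call. Use lock, or go lock-free: keep the 3 counters in an immutable object and swap the reference with an Interlocked.CompareExchange retry loop.
Trap: volatile doesn't make i++ atomic. i++ = read, add, write: two threads can read the same value and one update is lost. Race condition.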
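A sketch of both answers (`Counters` and its members are hypothetical): one counter uses `Interlocked.Increment`; three counters that must move together live in an immutable snapshot whose reference is swapped with an `Interlocked.CompareExchange` retry loop. Every update allocates a new snapshot, so under heavy contention a plain `lock` may be just as fast.

```csharp
using System.Threading;

public sealed class Counters
{
    private int _hits;
    public int Hits => Volatile.Read(ref _hits);
    public void Hit() => Interlocked.Increment(ref _hits);   // one counter: a single atomic op

    // Three counters that must change together are kept in one immutable object.
    private sealed class Snapshot
    {
        public readonly long Requests, Bytes, Errors;
        public Snapshot(long requests, long bytes, long errors) { Requests = requests; Bytes = bytes; Errors = errors; }
    }
    private Snapshot _stats = new(0, 0, 0);

    public void Record(long bytes, bool error)
    {
        while (true)
        {
            var current = Volatile.Read(ref _stats);
            var updated = new Snapshot(current.Requests + 1, current.Bytes + bytes, current.Errors + (error ? 1 : 0));
            // Publish only if no other thread swapped the reference since we read it; otherwise retry.
            if (ReferenceEquals(Interlocked.CompareExchange(ref _stats, updated, current), current)) return;
        }
    }

    public (long Requests, long Bytes, long Errors) Stats
    {
        get { var s = Volatile.Read(ref _stats); return (s.Requests, s.Bytes, s.Errors); }
    }
}
```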

Question: Detect cycle in dependency graph. 100K nodes.

Answer: DFS with 3 states: 0=unvisited, 1=visiting, 2=visited. If hit state 1, cycle.
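Code: bool Dfs(int node, List<int>[] adj, int[] state) { if (state[node] == 1) return true; if (state[node] == 2) return false; state[node] = 1; foreach (var next in adj[node]) if (Dfs(next, adj, state)) return true; state[node] = 2; return false; } (recursive: a 100K-node chain can overflow the stack, so prefer an explicit stack)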
Follow-up: Print cycle nodes? Answer: Track path stack. On cycle, unwind.
Amazon use: Circular dependencies in microservices. Build system.
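An iterative version of the 3-state DFS, assuming an adjacency-list graph (`List<int>[]`) with nodes numbered 0..n-1 (`CycleFinder` is hypothetical). A stack of (node, next edge index) replaces recursion, so a 100K-node chain can't overflow the call stack, and a `parent` array unwinds the path for the print-cycle follow-up.

```csharp
using System.Collections.Generic;

public static class CycleFinder
{
    // 0 = unvisited, 1 = visiting (on the current path), 2 = done.
    // Returns one cycle in edge order, or null if the graph is acyclic.
    public static List<int>? FindCycle(List<int>[] adj)
    {
        int n = adj.Length;
        var state = new int[n];
        var parent = new int[n];                        // DFS-tree parent, used to unwind the path
        var stack = new Stack<(int Node, int NextEdge)>();

        for (int start = 0; start < n; start++)
        {
            if (state[start] != 0) continue;
            state[start] = 1; parent[start] = -1;
            stack.Push((start, 0));
            while (stack.Count > 0)
            {
                var (node, edge) = stack.Pop();
                if (edge < adj[node].Count)
                {
                    stack.Push((node, edge + 1));        // resume with the next edge later
                    int next = adj[node][edge];
                    if (state[next] == 1)               // back edge: next is an ancestor on the path
                    {
                        var cycle = new List<int>();
                        for (int v = node; v != next; v = parent[v]) cycle.Add(v);
                        cycle.Add(next);
                        cycle.Reverse();                // next -> ... -> node, and node -> next closes it
                        return cycle;
                    }
                    if (state[next] == 0) { state[next] = 1; parent[next] = node; stack.Push((next, 0)); }
                }
                else
                {
                    state[node] = 2;                    // all edges explored: leave the path
                }
            }
        }
        return null;
    }
}
```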

Question: Code Token Bucket: 100 req/s, burst 200. Thread-safe.

Answer: Track tokens + lastRefill. On request: refill based on time, check tokens.
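using System;
using System.Diagnostics;
class TokenBucket {
    readonly object _gate = new();               // private lock: lock(this) lets outside code contend on it
    readonly double _rate, _cap;                 // e.g. rate = 100 tokens/s, cap = 200 (burst)
    double _tokens; long _last;
    public TokenBucket(double rate, double cap) {
        _rate = rate; _cap = cap; _tokens = cap; // start full so the burst is available immediately
        _last = Stopwatch.GetTimestamp();        // monotonic; DateTime.UtcNow can jump with clock changes
    }
    public bool Allow() {
        lock (_gate) {
            long now = Stopwatch.GetTimestamp();
            _tokens = Math.Min(_cap, _tokens + (now - _last) / (double)Stopwatch.Frequency * _rate);
            _last = now;
            if (_tokens >= 1) { _tokens--; return true; }
            return false;
        }
    }
}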
Follow-up: Distributed? Answer: Redis + Lua script for atomic. This is local only.

Question: Flatten [1,[2,[3]],4] → [1,2,3,4]. Iterator, not recursion.

Answer: Use Stack. Push in reverse. If item is list, push contents. IEnumerable<int> with yield return.
Why iterator: Recursion = stack overflow for depth 10K. Stack = O(depth) memory.
SDE-2 bar: Must handle object or dynamic. Type check: if(item is IList<object>)
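A sketch of the iterator, assuming nested lists are `List<object>` holding boxed `int`s or further lists (`Flattener` is hypothetical). Instead of pushing items in reverse, it keeps a stack of enumerators, which preserves order and keeps memory at O(depth).

```csharp
using System;
using System.Collections.Generic;

public static class Flattener
{
    // An explicit stack of enumerators replaces the call stack.
    public static IEnumerable<int> Flatten(IEnumerable<object> root)
    {
        var stack = new Stack<IEnumerator<object>>();
        stack.Push(root.GetEnumerator());
        while (stack.Count > 0)
        {
            var it = stack.Peek();
            if (!it.MoveNext()) { it.Dispose(); stack.Pop(); continue; }   // this level is exhausted
            switch (it.Current)
            {
                case int value:
                    yield return value;
                    break;
                case IEnumerable<object> nested:
                    stack.Push(nested.GetEnumerator());                    // descend one level
                    break;
                default:
                    throw new ArgumentException($"Unsupported item: {it.Current}");
            }
        }
    }
}
```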

Question: OOP design: mkdir, ls, addFile, readFile. Like Linux.

Answer: Composite pattern. abstract class Node { string Name; } class File : Node { byte[] Content; } class Dir : Node { Dictionary<string, Node> children; }
Follow-up: Add symlinks? Answer: class Symlink : Node { string Target; } Resolve with cycle check.
Follow-up: 1B files? Answer: Don't store in memory. Use Trie or DB. This is in-memory design.
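A minimal in-memory sketch of the Composite design, assuming absolute '/'-separated paths, mkdir -p semantics and append-on-addFile (all assumptions; class and method names are hypothetical, and `FileNode` avoids clashing with `System.IO.File`). Symlinks and persistence are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public abstract class FsNode
{
    public string Name { get; }
    protected FsNode(string name) => Name = name;
}

public sealed class FileNode : FsNode
{
    public StringBuilder Content { get; } = new();
    public FileNode(string name) : base(name) { }
}

public sealed class DirNode : FsNode
{
    public SortedDictionary<string, FsNode> Children { get; } = new(StringComparer.Ordinal); // sorted for ls
    public DirNode(string name) : base(name) { }
}

public sealed class InMemoryFileSystem
{
    private readonly DirNode _root = new("");

    // Walks "/a/b/c"; with create = true, missing directories are created (like mkdir -p).
    private FsNode? Walk(string path, bool create)
    {
        FsNode current = _root;
        foreach (var part in path.Split('/', StringSplitOptions.RemoveEmptyEntries))
        {
            if (current is not DirNode dir) throw new InvalidOperationException($"{current.Name} is a file");
            if (!dir.Children.TryGetValue(part, out var child))
            {
                if (!create) return null;
                child = new DirNode(part);
                dir.Children[part] = child;
            }
            current = child;
        }
        return current;
    }

    public void Mkdir(string path) => Walk(path, create: true);

    public IReadOnlyList<string> Ls(string path) => Walk(path, create: false) switch
    {
        FileNode f => new List<string> { f.Name },   // ls on a file lists the file itself
        DirNode d => d.Children.Keys.ToList(),
        _ => throw new ArgumentException($"No such path: {path}")
    };

    public void AddFile(string path, string content)
    {
        int slash = path.LastIndexOf('/');
        var parent = (DirNode)Walk(path[..Math.Max(slash, 0)], create: true)!;
        string name = path[(slash + 1)..];
        if (!parent.Children.TryGetValue(name, out var node))
            parent.Children[name] = node = new FileNode(name);
        ((FileNode)node).Content.Append(content);     // append semantics, like >>
    }

    public string ReadFile(string path) =>
        Walk(path, create: false) is FileNode f ? f.Content.ToString() : throw new ArgumentException($"Not a file: {path}");
}
```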

Part 2: .NET Core + Concurrency - 8 Questions

Question: When to use ValueTask? Why not always? Thread vs Task?

Answer: Thread: OS thread, 1MB stack. For CPU-bound, blocking work. Avoid.
Task: Heap alloc, ThreadPool. For I/O-bound. await releases thread.
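ValueTask: Struct. No alloc when the result is already available. Use when most calls (>90%) complete synchronously, e.g. a lookup that returns from MemoryCache on a hit and only awaits I/O on a miss. Consume it once: never await twice, and don't read .Result before it completes.
Amazon math: 50K RPS * ~70 bytes per Task<T> (x64, approx.) ≈ 3.5MB/s of garbage. ValueTask cuts ~80% of that if the cache hit rate is 80%.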
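A sketch of where `ValueTask` pays off, assuming a hypothetical `PriceCache` whose loader is a slow async call: a hit returns a `ValueTask` wrapping the value (no allocation), and only a miss allocates a real `Task`. Two concurrent misses for the same key may both call the loader; that is acceptable for this illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class PriceCache
{
    private readonly ConcurrentDictionary<string, decimal> _cache = new();
    private readonly Func<string, Task<decimal>> _load;   // the slow I/O call (hypothetical)

    public PriceCache(Func<string, Task<decimal>> load) => _load = load;

    public ValueTask<decimal> GetAsync(string sku) =>
        _cache.TryGetValue(sku, out var price)
            ? new ValueTask<decimal>(price)                   // hit: completes synchronously, no Task
            : new ValueTask<decimal>(LoadAndStoreAsync(sku)); // miss: wraps a real Task

    private async Task<decimal> LoadAndStoreAsync(string sku)
    {
        var price = await _load(sku);
        _cache[sku] = price;
        return price;
    }
}
```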

Question: Parse 1GB log without GC. Use Span?

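Answer: Span<T> = ref struct: the Span itself lives only on the stack, but it can point at arrays, stackalloc or native memory. Slicing is zero-copy. Memory<T> = heap-safe, can be stored in a field or held across await.
ReadOnlySpan<char> line = logLine.AsSpan();
int sp = line.IndexOf(' ');
var level = sp >= 0 ? line[..sp] : line; // No string alloc; also handles lines without a space
Trap: A Span can't be a class field or live across an await (before C# 13, not in async methods at all). Store ReadOnlyMemory<char> and call .Span inside a synchronous method.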
SDE-2 bar: Kestrel uses Span. High-perf = must know.
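A sketch of allocation-free line scanning, assuming '\n'-separated lines whose first space-delimited token is the level (`LogScan.CountLevel` is hypothetical). It works over one in-memory span; for an actual 1GB file you would read chunks into a pooled buffer and carry partial lines across chunk boundaries.

```csharp
using System;

public static class LogScan
{
    // Counts lines whose first token equals `level` without allocating a string per line.
    public static int CountLevel(ReadOnlySpan<char> text, ReadOnlySpan<char> level)
    {
        int count = 0;
        while (!text.IsEmpty)
        {
            int nl = text.IndexOf('\n');
            ReadOnlySpan<char> line = nl >= 0 ? text[..nl] : text;       // slice: no copy
            text = nl >= 0 ? text[(nl + 1)..] : ReadOnlySpan<char>.Empty;

            int sp = line.IndexOf(' ');
            ReadOnlySpan<char> first = sp >= 0 ? line[..sp] : line;
            if (first.SequenceEqual(level)) count++;                      // ordinal char comparison
        }
        return count;
    }
}
```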

Question: Inject Scoped DbContext into Singleton service. What happens?

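Answer: Captive Dependency. The singleton lives for the app lifetime and keeps the DbContext it received at construction. In Development, scope validation catches this and throws InvalidOperationException (a scoped service consumed from a singleton). In Production (validation off), that one DbContext lives until shutdown and is effectively a singleton: shared across concurrent requests (DbContext isn't thread-safe: "A second operation was started on this context instance..."), stale tracked entities, growing memory.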
Fix: Inject IServiceScopeFactory. Create scope per operation.
using var scope = _scopeFactory.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<AppDb>();
Interview ends if: You say "make DbContext Singleton". DbContext isn't thread-safe, so a shared instance fails under concurrent requests.

Question: Why not new HttpClient() or static?

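Answer: new HttpClient() per request = socket exhaustion: each disposed client leaves sockets in TIME_WAIT (240s by default on Windows). Static HttpClient with default settings = stale DNS (pooled connections never re-resolve). IHttpClientFactory pools handlers and recycles them (default handler lifetime 2 min) so DNS updates are picked up.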
Code: services.AddHttpClient("api").SetHandlerLifetime(TimeSpan.FromMinutes(5));
Amazon incident: This bug took down prod. Bar raiser asks. DNS failover won't work with static.
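Outside of DI, the usual alternative to `IHttpClientFactory` is one long-lived `HttpClient` over a `SocketsHttpHandler` with `PooledConnectionLifetime` set: no socket exhaustion, and DNS is re-resolved as connections are recycled. A sketch (the `Http` holder class and the 10s timeout are illustrative choices):

```csharp
using System;
using System.Net.Http;

public static class Http
{
    // One shared client for the process, but pooled connections are recycled every
    // 2 minutes, so a DNS change (e.g., failover) is picked up on the next connection.
    public static readonly HttpClient Client = new(new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(2),
    })
    {
        Timeout = TimeSpan.FromSeconds(10),   // illustrative value
    };
}
```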

Question: What does compiler generate for await?

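Answer: State machine struct in Release (class in Debug) implementing IAsyncStateMachine. Stores locals + resume state. On an await that isn't already complete: box the state machine to the heap, register the continuation, return the Task to the caller. On completion, MoveNext() resumes.
Why SDE-2: Debug perf. The box + Task are allocated once per async call that suspends, not once per await. A loop that calls an async method N times = N allocations. Use ValueTask (free when it completes synchronously) or batch.
Follow-up: How to avoid alloc when it completes asynchronously? Answer: [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))] on an async ValueTask method (.NET 6+), or a custom IValueTaskSource built on ManualResetValueTaskSourceCore<T> for pooling.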

Question: Gen2 GC every 5s. API freezes. Root cause?

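Answer: Gen2 = full, expensive collection. Every 5s = LOH pressure or memory leak. The LOH holds objects ≥85,000 bytes; it is collected only with Gen2 and not compacted by default, so frequent large allocations keep triggering Gen2 GCs.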
Debug: dotnet-counters monitor → Gen2 count. dotnet-gcdump → see who holds refs.
Common: byte[] buffer = new byte[100_000]; in loop. Use ArrayPool<byte>.Shared.Rent().
SDE-2 bar: Must know LOH, POH. "Add RAM" = reject.
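A sketch of the `ArrayPool<byte>.Shared` fix for the 100_000-byte loop above, assuming the work is summing a stream's bytes (`Chunked.SumBytes` is hypothetical). `Rent` can return a larger array than requested, so the code only trusts the count from each `Read`, and `Return` sits in `finally`.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class Chunked
{
    // Rents from the shared pool instead of allocating a new 100,000-byte (LOH-sized) array each time.
    public static long SumBytes(Stream input)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(100_000);   // may be larger than requested
        try
        {
            long sum = 0;
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                for (int i = 0; i < read; i++) sum += buffer[i];   // only the first `read` bytes are valid
            return sum;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);   // always return, even on exceptions
        }
    }
}
```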

Question: Include(u => u.Orders).Include(u => u.Addresses) slow. 10K users * 10 orders * 2 addr = 200K rows.

Answer: Cartesian explosion. Join = 10K*10*2 rows. Use .AsSplitQuery(). Runs 3 queries: users, orders, addresses. O(users+orders+addr).
When not: If child collections small, join is faster. Measure.
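Trap: Split query + Skip/Take without a unique OrderBy can return inconsistent data, because each of the split queries re-applies the paging. Always OrderBy a unique key (e.g., Id) when paging. Split queries also aren't a consistent snapshot unless they run in a transaction.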

Question: app.UseExceptionHandler() after app.UseRouting(). Bug?

Answer: Yes. Exception in Routing won't be caught. Middleware = pipeline. Order: Exception → HTTPS → Static → Routing → Auth → Endpoints.
SDE-2 must know: UseRouting + UseEndpoints split. Auth must be between. UseStaticFiles before Auth = CSS public.
Debug: 500 errors not hitting ExceptionHandler. Check order.

Part 3: System Design + LP - 7 Questions

Requirements: Upload 5GB file. Resumable. Fast.

Answer:
1. Init: Client → API: InitiateMultipartUpload. API → S3. Return UploadId. Store in DynamoDB.
2. Upload: Client splits 5GB/100MB=50 parts. Upload parallel to S3 via presigned URLs. UploadPart. Store ETag per part in DDB.
3. Complete: Client sends ETag list. API → CompleteMultipartUpload. S3 concatenates.
4. Resume: Client asks API for parts. API → ListParts. Skip uploaded.
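Scale: 50 parts * 1000 concurrent uploads = 50K part PUTs in flight, not per second (a 100MB part takes seconds). S3 supports at least 3,500 PUT/s per prefix and scales per prefix, so spread object keys across prefixes (e.g., a hashed prefix) rather than relying on one hot prefix.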
.NET: TransferUtility.UploadAsync() does this. Or manual with HttpClient.
LP: "Insist on Highest Standards" - Checksum each part. "Dive Deep" - What if Complete fails?
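A sketch of the client-side part planning from steps 2 and 4, assuming byte-range parts (`PartPlanner` is hypothetical; the AWS calls are left out). It enforces S3's limits of at most 10,000 parts and a 5 MiB minimum for every part but the last. With binary units, 5 GiB in 100 MiB parts is 52 parts (the last one 20 MiB); the 50 above is the decimal round number. On resume, drop the part numbers that ListParts already reports.

```csharp
using System;
using System.Collections.Generic;

public static class PartPlanner
{
    public const long MinPartSize = 5L * 1024 * 1024;   // 5 MiB (last part may be smaller)
    public const int MaxParts = 10_000;

    public static List<(int PartNumber, long Offset, long Length)> Plan(long fileSize, long partSize)
    {
        if (partSize < MinPartSize) throw new ArgumentOutOfRangeException(nameof(partSize));
        long count = (fileSize + partSize - 1) / partSize;   // ceiling division
        if (count > MaxParts) throw new ArgumentException("Too many parts; increase partSize.");

        var parts = new List<(int PartNumber, long Offset, long Length)>((int)count);
        for (int i = 0; i < count; i++)
        {
            long offset = i * partSize;
            parts.Add((i + 1, offset, Math.Min(partSize, fileSize - offset)));   // S3 part numbers start at 1
        }
        return parts;
    }
}
```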

Requirements: 1000 req/s per user. 100K users. <10ms p99.

Answer:
Algorithm: Token Bucket. O(1) state per user, steady refill, allows bursts up to cap. Sliding Window log = more accurate but stores a timestamp per request (more memory).
Storage: Redis Cluster. Key=rl:{userId}. Value={tokens, lastTs}. Lua script for atomic:
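local key = KEYS[1]
local rate = tonumber(ARGV[1])                 -- tokens per second
local cap = tonumber(ARGV[2])                  -- bucket capacity (burst)
local data = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or cap
local ts = tonumber(data[2]) or 0
local t = redis.call('TIME')                   -- {seconds, microseconds} as strings
local now = tonumber(t[1]) + tonumber(t[2]) / 1000000
tokens = math.min(cap, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then tokens = tokens - 1; allowed = 1 end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(cap / rate) + 1) -- idle keys expire once the bucket would be full
return allowed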
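Scale: 100K users * 100 bytes = 10MB. A single Redis node handles roughly 100K+ simple ops/s; Redis Cluster scales out by sharding on userId (the script touches one key, so it is cluster-safe).
Follow-up: Redis down? Answer: Fail open with local cache, or fail closed. Circuit breaker. Polly in .NET.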

Question: OrderCreated needs to go to Email, SMS, Analytics, Fraud. Design.

Answer: SNS Topic: OrderCreated. Fan-out.
Subscribers: 4 SQS queues. EmailService polls EmailQueue. SMSService polls SMSQueue.
Why not SQS direct: OrderService would need to know all consumers. Tight coupling. SNS = pub/sub.
EventBridge: If need routing: if total > 1000 then FraudQueue. Has schema registry.
.NET: IAmazonSQS + IHostedService long poll. MessageAttribute for filtering.
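Follow-up: Exactly-once? Answer: SNS FIFO topic → SQS FIFO queues + MessageDeduplicationId (5-minute dedup window). Or an idempotent consumer that records processed message IDs in DynamoDB with a conditional write.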
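A sketch of the idempotent-consumer option, with a `ConcurrentDictionary` standing in for the DynamoDB table (in DynamoDB this would be a conditional put with `attribute_not_exists` on the message ID). `IdempotentConsumer` is hypothetical. It marks the ID before handling and unmarks it on failure, so a crash between the two could still drop a message; production code often stores a status and marks completion only after success.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class IdempotentConsumer
{
    // Stand-in for a DynamoDB "processed messages" table keyed by message ID.
    private readonly ConcurrentDictionary<string, byte> _processed = new();
    private readonly Action<string> _handle;

    public IdempotentConsumer(Action<string> handle) => _handle = handle;

    public bool Consume(string messageId, string body)
    {
        if (!_processed.TryAdd(messageId, 0)) return false;   // duplicate delivery: skip
        try
        {
            _handle(body);
            return true;
        }
        catch
        {
            _processed.TryRemove(messageId, out _);           // allow a redelivery to retry
            throw;
        }
    }
}
```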

Question: PK=userId. 90% traffic is userId="guest". Throttling. Fix?

Answer:
1. Write Sharding: PK=guest#1 to guest#10. Random on write. On read, Query all 10 + merge.
2. Cache: DAX or Redis for guest. Write-through.
3. GSI: If query pattern allows, GSI with different PK.
Numbers: 1 partition = 3000 RCU, 1000 WCU. Hot key blows past. Shard to 10 = 30K RCU.
SDE-2 bar: Must mention 3 solutions + trade-offs. "Add RCU" = reject.
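A sketch of option 1's key scheme, assuming only the literal "guest" key is hot and 10 shards as above (`HotKeySharding` is hypothetical; the DynamoDB calls are left out). Writes pick one random suffix; reads must query every suffix and merge.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HotKeySharding
{
    public const int ShardCount = 10;   // guest#1 .. guest#10

    // Write path: spread each write over one random shard.
    public static string WriteKey(string userId) =>
        userId == "guest" ? $"guest#{Random.Shared.Next(1, ShardCount + 1)}" : userId;

    // Read path: query every shard, then merge the results.
    public static IEnumerable<string> ReadKeys(string userId) =>
        userId == "guest" ? Enumerable.Range(1, ShardCount).Select(i => $"guest#{i}") : new[] { userId };
}
```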

Bad SDE-1 answer: "I added cache." No data = reject.

SDE-2 Bar Raiser answer:
S: Checkout API p99 2.1s, SLA 300ms. 5% customers dropping.
T: Find root cause + fix in 3 days before Prime Day.
A: 1. X-Ray trace: 70% time in DynamoDB. 2. CloudWatch Metrics: Hot partition on userId="guest". 3. Changed PK to userId#timestamp for guests. 4. Added DAX cache. 5. Load test with k6 to 150K RPS.
R: p99 2.1s → 180ms. Cart abandonment -3%. Prime Day 0 incidents. Promoted to SDE-3.
Key: Metrics before/after. Tool names. "I" not "we". 3-day timeline = "Bias for Action".

Answer Framework:
S: Principal wanted DynamoDB Streams + Lambda for audit. I thought Kinesis was cheaper.
T: Convince without ego, but commit if I lose.
A: 1. Data: Streams $0.02/M vs Kinesis $0.015/M, plus 20% lower latency. 2. POC: Kinesis handled 50K RPS. 3. Doc with pros/cons. 4. Principal still chose Streams for integration. 5. I committed: Built it, added alerting, shipped.
R: Shipped on time. 6 months later we hit Streams limit, migrated to Kinesis using my POC. Earned trust.
Reject if: "I was right" = no commit. "I just did it" = no disagree. Need both.

Answer:
S: Deploy pipeline had 15 microservices, 2-hour deploy, 30% fail rate.
T: Cut deploy time to <15min, improve reliability.
A: 1. Invented: Built a custom .NET tool to diff K8s manifests and deploy only changed services. 2. Simplified: Merged 15 services into 3 domain services. Used Backstage.io for 1-click deploy. Added canary + auto-rollback.
R: Deploy time 2hr → 8min. Fail rate 30% → 2%. Team velocity +40%. Got promoted.
Key: "Invent" = new tool. "Simplify" = removed complexity. Metrics mandatory.


🔥 Top 5 SDE-2 Reject Reasons from 2025
  1. Can't scale past 10K TPS: No Redis, no async, says "add servers"
  2. Bar Raiser fail: LP stories with no metrics or "we" not "I"
  3. System Design: "Use Kafka" without requirements. No capacity numbers
  4. Concurrency bugs: lock(this), race conditions, don't know Interlocked
  5. Defensive: Argues with interviewer. No "Dive Deep" on feedback

Mastered SDE-2? L6 Bar Is Different

SDE-3 tests distributed systems + Principal-level LPs. No coding warmups. Only hard trade-offs.
