Amazon SDE-2 .NET Interview 2026
Level: L5 | Bar Raiser: Yes | 80% of these were asked Aug-Oct 2025. Master all 25 = phone screen ready.
⚠️ Amazon SDE-2 Bar: Code must be production-ready + System Design for 100K TPS + Bar Raiser on 3-4 LPs. If you can't explain trade-offs with numbers, instant reject.
Amazon SDE-2 Loop: What Amazon Actually Tests
| Round | Time | Fail Rate | Key Focus |
|---|---|---|---|
| Coding 1 | 45 min | 62% | Leetcode Med + C# idiomatic. async, LINQ, edge cases. Must compile. |
| Coding 2 | 45 min | 55% | OOP + Concurrency. Thread-safe cache, Producer-Consumer, Channels |
| System Design | 60 min | 71% | "Design Prime Video API". Scale, DynamoDB, SQS, C# microservices. Numbers or fail. |
| Bar Raiser | 60 min | 49% | 3-4 LPs deep. STAR + metrics. "Disagree and Commit" = 90% ask rate. |
Part 1: Coding + Data Structures - 10 Questions
Question: Implement thread-safe LRU Cache. Get/Put O(1). Handle 10K concurrent requests.
Expected Answer: ConcurrentDictionary + ReaderWriterLockSlim + Doubly Linked List. Or use MemoryCache with a size limit.
Amazon follow-up: How to shard for 1M keys? Answer: Consistent hashing on key, 100 shards. Each shard = LRU. IMemoryCache per shard.
Bar Raiser Trap: Using lock on the entire method = blocks readers. Use ReaderWriterLockSlim, or ConcurrentLru from the BitFaster.Caching library (it is not part of Microsoft.Extensions.Caching). Note that an LRU Get also reorders the recency list, so a read lock alone does not cover it.
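As a baseline to reason from, here is a minimal sketch of the Dictionary + doubly linked list design guarded by one private lock. The class name `LruCache` and its members are illustrative, not from the original answer. It deliberately uses the coarse lock so the O(1) mechanics stay visible; a finer-grained or sharded design would start from this.

```csharp
using System;
using System.Collections.Generic;

// Minimal thread-safe LRU cache: Dictionary for O(1) lookup,
// LinkedList for O(1) recency updates, one private lock guarding both.
public sealed class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly object _gate = new();
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _map = new();
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new(); // head = most recently used

    public LruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_gate)
        {
            if (_map.TryGetValue(key, out var node))
            {
                _order.Remove(node);   // O(1) because we hold the node itself
                _order.AddFirst(node); // a read still mutates the recency order
                value = node.Value.Value;
                return true;
            }
            value = default!;
            return false;
        }
    }

    public void Put(TKey key, TValue value)
    {
        lock (_gate)
        {
            if (_map.TryGetValue(key, out var existing))
            {
                _order.Remove(existing);          // overwrite: drop the old node
                _map.Remove(key);
            }
            else if (_map.Count >= _capacity)
            {
                var lru = _order.Last!;           // tail = least recently used
                _order.RemoveLast();
                _map.Remove(lru.Value.Key);
            }
            _map[key] = _order.AddFirst((key, value));
        }
    }
}
```

For the sharding follow-up, one such cache per shard with the shard picked by a hash of the key keeps each lock's contention low.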
Question: Infinite stream of numbers. Return top 10 frequent at any time. O(log K).
Answer: HashMap for freq + MinHeap of size 10. PriorityQueue<int,int> in .NET 6+.
Code: On new num: increment freq. If in heap, update. If not and freq > heap.Min, replace min. PriorityQueue has no update-priority operation, so either rebuild the heap (cheap at K=10) or use a SortedSet<(int freq, int num)> and remove + re-add.
Follow-up: What if K=1M? Answer: Heap won't fit. Use Count-Min Sketch for approx, or MapReduce. Show you know limits.
Why they ask: Tests heap + streaming. Amazon CloudWatch Logs does this.
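The update rule can be sketched as below. This version swaps the min-heap for a `SortedSet` keyed by (count, value), because .NET's `PriorityQueue` cannot re-prioritize an element that is already queued. `TopKFrequent` and its members are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class TopKFrequent
{
    private readonly int _k;
    private readonly Dictionary<int, int> _freq = new();
    // Ordered by (count, value); Min is the weakest member of the current top K.
    private readonly SortedSet<(int Count, int Value)> _top = new();

    public TopKFrequent(int k)
    {
        if (k <= 0) throw new ArgumentOutOfRangeException(nameof(k));
        _k = k;
    }

    public void Add(int num)
    {
        _freq.TryGetValue(num, out int old);
        int now = old + 1;
        _freq[num] = now;

        if (_top.Remove((old, num)))       // already in the top K: re-key it, O(log K)
        {
            _top.Add((now, num));
        }
        else if (_top.Count < _k)          // still filling up
        {
            _top.Add((now, num));
        }
        else if (now > _top.Min.Count)     // beats the weakest entry: replace it
        {
            _top.Remove(_top.Min);
            _top.Add((now, num));
        }
    }

    // Most frequent first.
    public List<int> Top() => _top.Reverse().Select(e => e.Value).ToList();
}
```

Each `Add` costs one dictionary update plus at most two O(log K) set operations, which matches the bound in the question.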
Question: SQS pushes 50K msg/s. Process with 100 workers. Don't OOM.
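Answer: System.Threading.Channels with BoundedChannelOptions(1000). FullMode = Wait = backpressure.
var ch = Channel.CreateBounded<Msg>(new BoundedChannelOptions(1000) { FullMode = BoundedChannelFullMode.Wait });
_ = Task.Run(async () => { // Producer: WriteAsync waits while the channel is full
    try { await foreach (var msg in SqsPoller()) await ch.Writer.WriteAsync(msg); }
    finally { ch.Writer.Complete(); } // lets ReadAllAsync finish once the poller stops
});
// Consumers: at most 100 messages in flight at once
await Parallel.ForEachAsync(ch.Reader.ReadAllAsync(), new ParallelOptions { MaxDegreeOfParallelism = 100 },
    async (msg, ct) => await Process(msg));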
Trap: BlockingCollection blocks threads. Channels = async. Task.Run per message = thread explosion.
Question: Serialize tree to string, deserialize. Handle 1M nodes.
Answer: Preorder with "null" marker. Use Queue for deser. For a skewed (list-like) tree with 1M nodes, recursion depth = 1M = StackOverflow. Use explicit stack or BFS.
Follow-up: Compress it? Answer: Protobuf or write ints as bytes, not strings. BinaryWriter = 4 bytes vs "12345," = 6 bytes.
SDE-2 bar: Must mention stack overflow risk and fix.
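A minimal sketch of the non-recursive option, using the BFS (level-order) variant rather than preorder, since it needs only a queue in both directions. `TreeNode` and `TreeCodec` are illustrative names, and the comma-separated text format is an assumption made for readability.

```csharp
using System.Collections.Generic;
using System.Text;

public sealed class TreeNode
{
    public int Val;
    public TreeNode? Left, Right;
    public TreeNode(int val) => Val = val;
}

public static class TreeCodec
{
    // Level-order encoding with "null" markers; no recursion, so depth is not a problem.
    public static string Serialize(TreeNode? root)
    {
        var sb = new StringBuilder();
        var queue = new Queue<TreeNode?>();
        queue.Enqueue(root);
        while (queue.Count > 0)
        {
            TreeNode? node = queue.Dequeue();
            if (sb.Length > 0) sb.Append(',');
            if (node is null) { sb.Append("null"); continue; }
            sb.Append(node.Val);
            queue.Enqueue(node.Left);   // children are enqueued even when null
            queue.Enqueue(node.Right);  // so every real node emits exactly two child tokens
        }
        return sb.ToString();
    }

    public static TreeNode? Deserialize(string data)
    {
        string[] tokens = data.Split(',');
        if (tokens[0] == "null") return null;
        var root = new TreeNode(int.Parse(tokens[0]));
        var queue = new Queue<TreeNode>();
        queue.Enqueue(root);
        int i = 1;
        while (queue.Count > 0)
        {
            TreeNode parent = queue.Dequeue();
            // Each dequeued node consumes two tokens: left child, then right child.
            if (tokens[i] != "null") { parent.Left = new TreeNode(int.Parse(tokens[i])); queue.Enqueue(parent.Left); }
            i++;
            if (tokens[i] != "null") { parent.Right = new TreeNode(int.Parse(tokens[i])); queue.Enqueue(parent.Right); }
            i++;
        }
        return root;
    }
}
```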
Question: Merge K sorted List<int>. K=1000, total N=1M. O(N log K).
Answer: MinHeap of K elements. Each element = {val, listIndex, elementIndex}.
Code: PriorityQueue<(int val, int li, int ei), int> pq. Push first from each list. Pop min, push next from same list.
Trap: Naive merge = O(N*K). Or List.Sort() = O(N log N). Heap is optimal.
Follow-up: What if lists on disk? Answer: External sort. Read chunk from each file, merge.
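The heap-based merge described above can be sketched as follows. The class name `KWayMerge` is illustrative. The value is used as the priority, so the element tuple only needs the list index and element index.

```csharp
using System.Collections.Generic;

public static class KWayMerge
{
    public static List<int> Merge(IReadOnlyList<IReadOnlyList<int>> lists)
    {
        var result = new List<int>();
        // Element = (list index, element index); priority = the value at that position.
        var pq = new PriorityQueue<(int Li, int Ei), int>();
        for (int li = 0; li < lists.Count; li++)
            if (lists[li].Count > 0) pq.Enqueue((li, 0), lists[li][0]);

        // The heap never holds more than K entries, so each step is O(log K).
        while (pq.TryDequeue(out var pos, out int val))
        {
            result.Add(val);
            int next = pos.Ei + 1;
            if (next < lists[pos.Li].Count)
                pq.Enqueue((pos.Li, next), lists[pos.Li][next]);
        }
        return result;
    }
}
```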
Question: Increment counter from 100 threads. No locks. Fast.
Answer: Interlocked.Increment(ref _count). Atomic CPU instruction. Typically several times faster than lock under contention; measure before quoting a number.
Follow-up: Need to increment 3 counters atomically? Answer: Can't with a single Interlocked call. Use lock, or keep the three values in one immutable object and swap the reference in an Interlocked.CompareExchange loop.
Trap: volatile int is not atomic. i++ = read,add,write. Race condition.
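A small sketch of both answers: a single counter with `Interlocked.Increment`, and the three-counter follow-up done as a compare-and-swap loop over an immutable snapshot. `Counters` and `Snapshot` are hypothetical names for the example.

```csharp
using System.Threading;

public sealed class Counters
{
    private int _count;
    public int Count => Volatile.Read(ref _count);
    public void Increment() => Interlocked.Increment(ref _count);

    // Three values that must change together: publish them as one immutable
    // snapshot and swap the reference with a compare-and-swap retry loop.
    public sealed record Snapshot(long A, long B, long C);
    private Snapshot _snap = new(0, 0, 0);
    public Snapshot Current => Volatile.Read(ref _snap);

    public void IncrementAll()
    {
        while (true)
        {
            Snapshot seen = Volatile.Read(ref _snap);
            var next = new Snapshot(seen.A + 1, seen.B + 1, seen.C + 1);
            // The swap succeeds only if nobody replaced _snap since we read it;
            // otherwise loop and retry against the newer snapshot.
            if (ReferenceEquals(Interlocked.CompareExchange(ref _snap, next, seen), seen)) return;
        }
    }
}
```

The CAS loop allocates one snapshot per attempt, so under heavy contention a plain lock may well be faster; it is worth measuring both.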
Question: Detect cycle in dependency graph. 100K nodes.
Answer: DFS with 3 states: 0=unvisited, 1=visiting, 2=visited. If hit state 1, cycle.
Code: bool Dfs(int node, int[] state) { if(state[node]==1) return true;... }
Follow-up: Print cycle nodes? Answer: Track path stack. On cycle, unwind.
Amazon use: Circular dependencies in microservices. Build system.
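With 100K nodes a recursive DFS can overflow the call stack on a long dependency chain, so this sketch runs the same three-state algorithm with an explicit stack. The adjacency-list input shape and the name `CycleDetector` are assumptions for the example.

```csharp
using System.Collections.Generic;

public static class CycleDetector
{
    // adj[u] lists the nodes u depends on. States: 0 = unvisited, 1 = visiting, 2 = visited.
    public static bool HasCycle(IReadOnlyList<IReadOnlyList<int>> adj)
    {
        int n = adj.Count;
        var state = new byte[n];
        var nextEdge = new int[n];      // per-node cursor into adj[u]
        var stack = new Stack<int>();

        for (int start = 0; start < n; start++)
        {
            if (state[start] != 0) continue;
            state[start] = 1;
            stack.Push(start);
            while (stack.Count > 0)
            {
                int u = stack.Peek();
                if (nextEdge[u] < adj[u].Count)
                {
                    int v = adj[u][nextEdge[u]++];
                    if (state[v] == 1) return true;   // back edge to a node still on the current path
                    if (state[v] == 0) { state[v] = 1; stack.Push(v); }
                }
                else
                {
                    state[u] = 2;                     // all outgoing edges explored
                    stack.Pop();
                }
            }
        }
        return false;
    }
}
```

At the moment a cycle is found, the explicit stack holds the current path, which is what the "print cycle nodes" follow-up needs.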
Question: Code Token Bucket: 100 req/s, burst 200. Thread-safe.
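Answer: Track tokens + lastRefill. On request: refill based on time, check tokens.
class TokenBucket {
    private readonly object _gate = new(); // private lock object: lock(this) lets outside code take the same lock
    private readonly double _rate;         // tokens added per second
    private readonly double _cap;          // burst size
    private double _tokens;
    private DateTime _last = DateTime.UtcNow;
    public TokenBucket(double rate, int cap) { _rate = rate; _cap = cap; _tokens = cap; } // start full
    public bool Allow() {
        lock (_gate) {
            var now = DateTime.UtcNow;
            _tokens = Math.Min(_cap, _tokens + (now - _last).TotalSeconds * _rate); // refill for elapsed time
            _last = now;
            if (_tokens >= 1) { _tokens--; return true; }
            return false;
        }
    }
}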
Follow-up: Distributed? Answer: Redis + Lua script for atomic. This is local only.
Question: [1,[2,[3]],4] → [1,2,3,4]. Iterator, not recursion.
Answer: Use Stack. Push in reverse. If item is list, push contents. IEnumerable<int> with yield return.
Why iterator: Recursion = stack overflow for depth 10K. Stack = O(depth) memory.
SDE-2 bar: Must handle object or dynamic. Type check: if(item is IList<object>)
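A minimal sketch of the stack-based iterator. It assumes every item is either an `int` or an `IList<object>`; the class name `NestedList` is illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class NestedList
{
    // Items are either int or IList<object> (which may nest further).
    public static IEnumerable<int> Flatten(IList<object> root)
    {
        var stack = new Stack<object>();
        for (int i = root.Count - 1; i >= 0; i--) stack.Push(root[i]);

        while (stack.Count > 0)
        {
            object item = stack.Pop();
            if (item is int value)
            {
                yield return value;                 // lazily produced, one element at a time
            }
            else if (item is IList<object> list)
            {
                // Push in reverse so the first child is popped first.
                for (int i = list.Count - 1; i >= 0; i--) stack.Push(list[i]);
            }
            else
            {
                throw new ArgumentException($"Unsupported item type: {item?.GetType()}");
            }
        }
    }
}
```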
Question: OOP design: mkdir, ls, addFile, readFile. Like Linux.
Answer: Composite pattern. abstract class Node { string Name; } class File : Node { byte[] Content; } class Dir : Node { Dictionary<string, Node> children; }
Follow-up: Add symlinks? Answer: class Symlink : Node { string Target; } Resolve with cycle check.
Follow-up: 1B files? Answer: Don't store in memory. Use Trie or DB. This is in-memory design.
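One way the Composite design could look with the four operations filled in. This is a sketch under assumptions: paths are absolute and '/'-separated, file content is a `string` rather than `byte[]` to keep it short, and `FileNode`, `DirNode`, and `InMemoryFileSystem` are names chosen for the example. Symlinks are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class Node
{
    public string Name { get; }
    protected Node(string name) => Name = name;
}

public sealed class FileNode : Node
{
    public string Content { get; set; } = "";
    public FileNode(string name) : base(name) { }
}

public sealed class DirNode : Node
{
    public Dictionary<string, Node> Children { get; } = new();
    public DirNode(string name) : base(name) { }
}

public sealed class InMemoryFileSystem
{
    private readonly DirNode _root = new("");

    // Creates every missing directory along the path (like mkdir -p).
    public DirNode Mkdir(string path)
    {
        DirNode current = _root;
        foreach (string part in path.Split('/', StringSplitOptions.RemoveEmptyEntries))
        {
            if (!current.Children.TryGetValue(part, out Node? next))
                current.Children[part] = next = new DirNode(part);
            current = next as DirNode ?? throw new InvalidOperationException($"{part} is a file");
        }
        return current;
    }

    public void AddFile(string path, string content)
    {
        int slash = path.LastIndexOf('/');              // assumes an absolute path
        DirNode dir = Mkdir(path[..slash]);
        string name = path[(slash + 1)..];
        if (!dir.Children.TryGetValue(name, out Node? node))
            dir.Children[name] = node = new FileNode(name);
        if (node is not FileNode file) throw new InvalidOperationException($"{path} is a directory");
        file.Content = content;
    }

    public string ReadFile(string path) =>
        Find(path) is FileNode file ? file.Content : throw new InvalidOperationException($"{path} is a directory");

    // A file lists itself; a directory lists its children sorted by name.
    public List<string> Ls(string path) => Find(path) switch
    {
        FileNode f => new List<string> { f.Name },
        DirNode d => d.Children.Keys.OrderBy(k => k, StringComparer.Ordinal).ToList(),
        _ => throw new InvalidOperationException("Unknown node type")
    };

    private Node Find(string path)
    {
        Node current = _root;
        foreach (string part in path.Split('/', StringSplitOptions.RemoveEmptyEntries))
        {
            if (current is not DirNode dir || !dir.Children.TryGetValue(part, out Node? next))
                throw new KeyNotFoundException($"No such path: {path}");
            current = next;
        }
        return current;
    }
}
```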
Part 2: .NET Core + Concurrency - 8 Questions
Question: When to use ValueTask? Why not always? Thread vs Task?
Answer: Thread: OS thread, 1MB stack. Only for long-running, dedicated blocking work. Avoid for request handling.
Task: Heap alloc, ThreadPool. For I/O-bound. await releases thread.
ValueTask: Struct. No alloc if result ready. Use when >90% sync: MemoryCache.TryGetValue. Can't await twice.
Amazon math: 50K RPS * 24 bytes Task = 1.2MB/s GC. Treat 24 bytes as a floor: a real Task<T> object is larger on 64-bit, so the true number is higher. ValueTask cuts 80% if cache hit 80%.
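The cache-hit case can be sketched like this: the hot path returns a `ValueTask<decimal>` wrapping the value directly, and only a miss falls through to a real async method. `PriceService`, `GetPriceAsync`, and the fake lookup are hypothetical stand-ins.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class PriceService
{
    private readonly ConcurrentDictionary<string, decimal> _cache = new();

    public ValueTask<decimal> GetPriceAsync(string sku)
    {
        // Hot path: synchronous hit, the ValueTask wraps the value with no Task allocation.
        if (_cache.TryGetValue(sku, out decimal price))
            return new ValueTask<decimal>(price);

        // Cold path: fall back to a real async method, which allocates a Task.
        return new ValueTask<decimal>(LoadAndCacheAsync(sku));
    }

    private async Task<decimal> LoadAndCacheAsync(string sku)
    {
        await Task.Delay(10);               // stand-in for a database or HTTP call
        decimal price = sku.Length * 1.5m;  // hypothetical lookup result
        _cache[sku] = price;
        return price;
    }
}
```

Callers should await the returned `ValueTask` exactly once; if the result is needed in several places, convert it with `AsTask()` first.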
Question: Parse 1GB log without GC. Use Span?
Answer: Span<T> = ref struct, stack-only: the span itself never lives on the heap, though it can point at heap memory. Slice arrays zero-copy. Memory<T> = heap-safe, can store in field.
ReadOnlySpan<char> line = logLine.AsSpan();
int space = line.IndexOf(' ');
var level = space < 0 ? line : line.Slice(0, space); // No string alloc; guards lines with no space
Trap: Can't hold a Span across an await or in a class field. Use ReadOnlyMemory<char> then .Span in method.
SDE-2 bar: Kestrel uses Span. High-perf = must know.
Question: Inject Scoped DbContext into Singleton service. What happens?
Answer: Captive Dependency. Singleton lives app lifetime and keeps the first DbContext it was given. With scope validation on (the default in Development), the container throws InvalidOperationException when the singleton is resolved. Without it, every request shares one DbContext: not thread-safe, stale change tracker, and ObjectDisposedException once that context is disposed.
Fix: Inject IServiceScopeFactory. Create scope per operation.
using var scope = _scopeFactory.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<AppDb>();
Interview ends if: You say "make DbContext Singleton". DbContext is not thread-safe, so that crashes under concurrent requests.
Question: Why not new HttpClient() or static?
Answer: new per request = socket exhaustion (closed sockets sit in TIME_WAIT, up to 240s on Windows). Static = stale DNS. IHttpClientFactory pools handlers and rotates them (every 2 min by default) so DNS updates are picked up. A static client with SocketsHttpHandler.PooledConnectionLifetime set is the other accepted fix.
Code: services.AddHttpClient("api").SetHandlerLifetime(TimeSpan.FromMinutes(5)); // overrides the 2-min default
Amazon incident: This bug took down prod. Bar raiser asks. DNS failover won't work with static.
Question: What does compiler generate for await?
Answer: State machine struct. IAsyncStateMachine. Stores locals, resume label. On await, return Task, register continuation. On complete, MoveNext() resumes.
Why SDE-2: Debug perf. The struct is boxed to the heap only when an await actually suspends, once per call rather than once per await, plus the returned Task. A for loop calling an async method that suspends = N allocs. Use ValueTask or batch.
Follow-up: How to avoid alloc? Answer: async ValueTask, or ManualResetValueTaskSourceCore<T> behind an IValueTaskSource for pooling.
Question: Gen2 GC every 5s. API freezes. Root cause?
Answer: Gen2 = expensive full heap scan. Every 5s = LOH pressure or memory leak. LOH holds objects of 85,000 bytes or more and is not compacted by default.
Debug: dotnet-counters monitor → Gen2 count. dotnet-gcdump → see who holds refs.
Common: byte[] buffer = new byte[100_000]; in loop. Use ArrayPool<byte>.Shared.Rent().
SDE-2 bar: Must know LOH, POH. "Add RAM" = reject.
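A short sketch of the `ArrayPool` fix: rent one buffer, reuse it for every read, and return it in a `finally`. The helper name `PooledCopy.Copy` and the stream-copy scenario are assumptions for illustration.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledCopy
{
    // Copies source to destination through one rented buffer; returns bytes copied.
    public static long Copy(Stream source, Stream destination, int bufferSize = 100_000)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize); // may be larger than requested
        try
        {
            long total = 0;
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read); // write only the bytes actually read
                total += read;
            }
            return total;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);   // always hand the buffer back
        }
    }
}
```

A rented array can contain data from a previous user, which is why the loop relies on the `read` count rather than the buffer length.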
Question: Include(u => u.Orders).Include(u => u.Addresses) slow. 10K users * 10 orders * 2 addr = 200K rows.
Answer: Cartesian explosion. Join = 10K*10*2 rows. Use .AsSplitQuery(). Runs 3 queries: users, orders, addresses. O(users+orders+addr).
When not: If child collections small, join is faster. Measure.
Trap: Split query + Take(10) without a fully unique OrderBy = wrong. Each split query re-runs the paging subquery, so the child rows can belong to different users than the page returned. Add a unique ordering or use .AsSingleQuery() for paging.
Question: app.UseExceptionHandler() after app.UseRouting(). Bug?
Answer: Yes. The handler only catches exceptions from middleware registered after it, so an exception in Routing won't be caught. Middleware = pipeline. Order: Exception → HTTPS → Static → Routing → Auth → Endpoints.
SDE-2 must know: UseRouting + UseEndpoints split. Auth must be between. UseStaticFiles before Auth = CSS public.
Debug: 500 errors not hitting ExceptionHandler. Check order.
Part 3: System Design + LP - 7 Questions
Requirements: Upload 5GB file. Resumable. Fast.
Answer:
1. Init: Client → API: InitiateMultipartUpload. API → S3. Return UploadId. Store in DynamoDB.
2. Upload: Client splits 5GB/100MB=50 parts. Upload parallel to S3 via presigned URLs. UploadPart. Store ETag per part in DDB.
3. Complete: Client sends ETag list. API → CompleteMultipartUpload. S3 concatenates.
4. Resume: Client asks API for parts. API → ListParts. Skip uploaded.
Scale: 50 parts * 1000 users = 50K PUTs; worst case, all in the same second = 50K PUT/s. S3 limit 3500 PUT/s per prefix. Use random prefixes so the load spreads across partitions.
.NET: TransferUtility.UploadAsync() does this. Or manual with HttpClient.
LP: "Insist on Highest Standards" - Checksum each part. "Dive Deep" - What if Complete fails?
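The split and resume arithmetic from steps 2 and 4 can be sketched without any AWS calls. `PartRange` and `MultipartPlanner` are hypothetical names, and the uploaded part numbers are assumed to come from a ListParts response. S3 itself requires part numbers from 1 to 10,000 and at least 5 MiB per part except the last.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct PartRange(int PartNumber, long Offset, long Length);

public static class MultipartPlanner
{
    // Splits a file into consecutive byte ranges; the last part may be shorter.
    public static List<PartRange> Plan(long fileSize, long partSize)
    {
        if (fileSize <= 0) throw new ArgumentOutOfRangeException(nameof(fileSize));
        if (partSize <= 0) throw new ArgumentOutOfRangeException(nameof(partSize));
        var parts = new List<PartRange>();
        int number = 1; // S3 part numbers start at 1
        for (long offset = 0; offset < fileSize; offset += partSize)
            parts.Add(new PartRange(number++, offset, Math.Min(partSize, fileSize - offset)));
        return parts;
    }

    // Resume: keep only the parts the server has not confirmed yet.
    public static List<PartRange> Remaining(List<PartRange> plan, ISet<int> uploadedPartNumbers)
        => plan.FindAll(p => !uploadedPartNumbers.Contains(p.PartNumber));
}
```

With decimal units (5,000,000,000 bytes in 100,000,000-byte parts) this gives the 50 parts quoted above; with binary units (5 GiB in 100 MiB parts) it gives 52, the last one 20 MiB.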
Requirements: 1000 req/s per user. 100K users. <10ms p99.
Answer:
Algorithm: Token Bucket. Smooth. Sliding Window = accurate but more memory.
Storage: Redis Cluster. Key=rl:{userId}. Value={tokens, lastTs}. Lua script for atomic:
local key = KEYS[1]
local rate = tonumber(ARGV[1]) -- ARGV values arrive as strings
local cap = tonumber(ARGV[2])
local data = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or cap
local ts = tonumber(data[2]) or 0
local now = tonumber(redis.call('TIME')[1]) -- whole seconds
tokens = math.min(cap, tokens + (now-ts)*rate)
if tokens >= 1 then tokens=tokens-1; redis.call('HMSET',key,'tokens',tokens,'ts',now) return 1 end
return 0
Scale: 100K users * 100 bytes = 10MB. A single Redis node handles on the order of 100K to 1M ops/s, so shard by userId across the cluster.
Follow-up: Redis down? Answer: Fail open with local cache, or fail closed. Circuit breaker. Polly in .NET.
Question: OrderCreated needs to go to Email, SMS, Analytics, Fraud. Design.
Answer: SNS Topic: OrderCreated. Fan-out.
Subscribers: 4 SQS queues. EmailService polls EmailQueue. SMSService polls SMSQueue.
Why not SQS direct: OrderService would need to know all consumers. Tight coupling. SNS = pub/sub.
EventBridge: If need routing: if total > 1000 then FraudQueue. Has schema registry.
.NET: IAmazonSQS + IHostedService long poll. MessageAttribute for filtering.
Follow-up: Exactly-once? Answer: SQS FIFO + dedup ID. Or idempotent consumer with DynamoDB.
Question: PK=userId. 90% traffic is userId="guest". Throttling. Fix?
Answer:
1. Write Sharding: PK=guest#1 to guest#10. Random on write. On read, Query all 10 + merge.
2. Cache: DAX or Redis for guest. Write-through.
3. GSI: If query pattern allows, GSI with different PK.
Numbers: 1 partition = 3000 RCU, 1000 WCU. Hot key blows past. Shard to 10 = 30K RCU.
SDE-2 bar: Must mention 3 solutions + trade-offs. "Add RCU" = reject.
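The write-sharding option, sketched as plain key generation with no DynamoDB calls. `HotKeySharding`, `WriteKey`, and `ReadKeys` are hypothetical helpers; the suffix range 1 to 10 follows the guest#1 to guest#10 scheme above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HotKeySharding
{
    public const int ShardCount = 10;

    // Write path: spread one hot key over userId#1 .. userId#10 at random.
    public static string WriteKey(string userId, Random rng)
        => $"{userId}#{rng.Next(1, ShardCount + 1)}"; // upper bound is exclusive

    // Read path: the caller must query every shard key and merge the results.
    public static IReadOnlyList<string> ReadKeys(string userId)
        => Enumerable.Range(1, ShardCount).Select(i => $"{userId}#{i}").ToList();
}
```

The trade-off is on the read side: one logical read becomes ten queries, so this fits write-heavy hot keys better than read-heavy ones, where the cache option is the more natural fix.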
Bad SDE-1 answer: "I added cache." No data = reject.
SDE-2 Bar Raiser answer:
S: Checkout API p99 2.1s, SLA 300ms. 5% customers dropping.
T: Find root cause + fix in 3 days before Prime Day.
A: 1. X-Ray trace: 70% time in DynamoDB. 2. CloudWatch Metrics: Hot partition on userId="guest". 3. Changed PK to userId#timestamp for guests. 4. Added DAX cache. 5. Load test with k6 to 150K RPS.
R: p99 2.1s → 180ms. Cart abandonment -3%. Prime Day 0 incidents. Promoted to SDE-3.
Key: Metrics before/after. Tool names. "I" not "we". 3-day timeline = "Bias for Action".
Answer Framework:
S: Principal wanted DynamoDB Streams + Lambda for audit. I thought Kinesis cheaper.
T: Convince without ego, but commit if I lose.
A: 1. Data: Streams $0.02/M vs Kinesis $0.015/M + 20% less latency. 2. POC: Kinesis handled 50K RPS. 3. Doc with pros/cons. 4. Principal still chose Streams for integration. 5. I committed: Built it, added alerting, shipped.
R: Shipped on time. 6 months later we hit Streams limit, migrated to Kinesis using my POC. Earned trust.
Reject if: "I was right" = no commit. "I just did it" = no disagree. Need both.
Answer:
S: Deploy pipeline had 15 microservices, 2-hour deploy, 30% fail rate.
T: Cut deploy time to <15min, improve reliability.
A: 1. Invented: Built custom .NET tool to diff K8s manifests, deploy only changed services. 2. Simplified: Merged 15 services to 3 domain services. Used Backstage.io for 1-click deploy. Added canary + auto-rollback.
R: Deploy time 2hr → 8min. Fail rate 30% → 2%. Team velocity +40%. Got promoted.
Key: "Invent" = new tool. "Simplify" = removed complexity. Metrics mandatory.
🔥 Top 5 SDE-2 Reject Reasons from 2025
- Can't scale past 10K TPS: No Redis, no async, says "add servers"
- Bar Raiser fail: LP stories with no metrics or "we" not "I"
- System Design: "Use Kafka" without requirements. No capacity numbers
- Concurrency bugs: lock(this), race conditions, don't know Interlocked
- Defensive: Argues with interviewer. No "Dive Deep" on feedback