
Java Concurrency — All-Inclusive Study Guide

πŸ“ Quiz Β· πŸƒ Flashcards

Target: Senior Java backend engineer interview (5+ yrs experience)
Java baseline: Java 21 LTS — with call-outs for 8/11/17 differences, and notes on 24/25 evolution
Format: Concept explanation → runnable code → interviewer-style Q&A per section

This is the deep-dive companion to ../INTERVIEW_PREP.md §3 (Concurrency & Multithreading). Every question in that section has a richer answer here.


Table of Contents

Part 1 — Foundations

  1. Why concurrency matters & the mental model
  2. The Java Memory Model (JMM) & happens-before
  3. Threads — lifecycle, creation, control

Part 2 — Synchronization primitives

  4. synchronized and intrinsic locks
  5. volatile and atomicity
  6. java.util.concurrent locks
  7. Atomics and CAS

Part 3 — Thread-safe collections

  8. Concurrent collections
  9. BlockingQueue family

Part 4 — Executors, async, and parallelism

  10. Executor framework
  11. Future and CompletableFuture
  12. Fork/Join and parallel streams
  13. Synchronizers / coordination primitives

Part 5 — Modern concurrency (Java 21+)

  14. Virtual threads (Project Loom)
  15. Structured concurrency
  16. Scoped values

Part 6 — Patterns, problems, pitfalls

  17. Common concurrency problems
  18. Concurrency patterns
  19. Performance and tuning

Part 7 — Testing, debugging, and Spring

  20. Testing concurrent code
  21. Debugging & troubleshooting
  22. Spring and concurrency

Part 8 — Quick reference

  23. Glossary
  24. Decision tables
  25. Top 30 rapid-fire interview questions
  26. Further reading

Part 1 — Foundations

1. Why concurrency matters & the mental model

Concurrency vs parallelism

  • Concurrency is about dealing with many things at once — structuring a program so progress can be made on multiple tasks logically simultaneously. Single-core machines can be concurrent (via time-slicing).
  • Parallelism is about doing many things at once — physically executing multiple computations simultaneously. Requires multiple cores.

You can have concurrency without parallelism (event loop on one CPU) and parallelism without concurrency (pure data-parallel numeric computation). Modern Java programs almost always want both.

CPU-bound vs I/O-bound — the single most important distinction

Every concurrency decision downstream (thread pool sizing, virtual threads vs platform threads, reactive vs imperative, sync vs async) depends on which side of this line your workload sits.

|                       | CPU-bound                                                      | I/O-bound                                         |
|-----------------------|----------------------------------------------------------------|---------------------------------------------------|
| Example               | Image encoding, cryptography, Avro serialization in a hot path | REST call, DB query, reading from Kafka, file I/O |
| Optimal threads       | ≈ # CPU cores                                                  | Can be 100s–1000s (most blocked on I/O)           |
| Virtual threads help? | No — adds overhead, no benefit                                 | Yes — huge win, their main use case               |
| Scaling bottleneck    | Cores                                                          | Context-switch cost, memory per thread            |

Most backend services (REST APIs, Kafka consumers, DB-heavy apps) are overwhelmingly I/O-bound. This is why virtual threads are a bigger deal than they look.
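The "optimal threads" row can be estimated with a standard back-of-the-envelope rule (Brian Goetz's sizing formula from Java Concurrency in Practice, not from this guide; the class and method names here are mine):

```java
// Back-of-the-envelope pool sizing for blocking workloads:
// threads ≈ cores * (1 + waitTime / computeTime)
public class PoolSizing {
    static int ioBoundPoolSize(int cores, double waitMs, double computeMs) {
        return (int) (cores * (1 + waitMs / computeMs));
    }

    public static void main(String[] args) {
        // 8 cores, each task waits 99 ms on I/O per 1 ms of CPU work
        System.out.println(ioBoundPoolSize(8, 99, 1)); // 800
    }
}
```

The ratio is the whole story: a CPU-bound task (wait ≈ 0) collapses the formula back to "about one thread per core".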

Amdahl's Law

Speedup ≤ 1 / (S + (1-S)/N), where S is the serial fraction and N is the number of processors. If 10% of your code is serial, even with infinite cores you get at best a 10× speedup. Lesson: reducing the serial fraction matters more than adding cores.
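Plugging numbers into the formula makes the ceiling concrete (helper name is mine):

```java
// Amdahl's Law: speedup(S, N) = 1 / (S + (1 - S) / N)
public class Amdahl {
    static double speedup(double serialFraction, int processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    public static void main(String[] args) {
        // 10% serial code: 8 cores already get close to the ceiling of 10x
        System.out.printf("N=8    -> %.2fx%n", speedup(0.10, 8));    // 4.71x
        System.out.printf("N=1024 -> %.2fx%n", speedup(0.10, 1024)); // 9.91x
    }
}
```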

The three hard problems of concurrency

Every concurrency bug boils down to one of these:

  1. Atomicity — multiple operations need to happen as a single indivisible step. i++ is not atomic: it's read-modify-write.
  2. Visibility — changes made by one thread must become visible to other threads. Without synchronization, a thread may cache a value forever and never see writes from others.
  3. Ordering — operations must appear to happen in a sensible order. The compiler and CPU aggressively reorder instructions; without synchronization, threads can observe reorderings that look impossible from a single-thread view.

synchronized, volatile, final, atomics, and locks each give you some combination of these three. The JMM (§2) is the specification that defines precisely what you get.

Q&A

  1. Q: Concurrency vs parallelism?

    • Concurrency = composition/structure for handling multiple tasks. Parallelism = simultaneous execution. Concurrency is a property of your program; parallelism is a property of execution.
  2. Q: Your service handles 10k Kafka messages/day, mostly calling a downstream REST API and a DB. Are you CPU-bound or I/O-bound? Why does it matter?

    • I/O-bound — each message spends ~99% of its wall-clock time waiting on network/DB. Matters because: (a) thread pool can be much larger than CPU count, (b) virtual threads give a big win, (c) async/non-blocking buys less than you'd think vs. throwing more threads at it.
  3. Q: What are the three fundamental problems concurrency has to solve?

    • Atomicity (compound ops), visibility (cross-thread value propagation), ordering (reordering by compiler/CPU).
  4. Q: Why is i++ not thread-safe even for a single variable?

    • Three JVM instructions: load, increment, store. Two threads can interleave load-load-inc-inc-store-store, losing one update.
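The lost-update interleaving from Q4 is easy to reproduce (class and constants are mine; the exact plain-counter result varies per run):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Contrast a racy int++ with an AtomicInteger under two writer threads.
public class LostUpdateDemo {
    static int plain = 0;                             // unsynchronized counter
    static final AtomicInteger atomic = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                              // read-modify-write: updates can be lost
                atomic.incrementAndGet();             // atomic CAS: never loses an update
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("plain  = " + plain);      // typically well under 200000
        System.out.println("atomic = " + atomic.get()); // always 200000
    }
}
```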

2. The Java Memory Model (JMM) & happens-before

Why the JMM exists

Modern compilers and CPUs reorder instructions for performance. Caches mean each core can see a different view of memory. Without a formal model, "what does this program do?" is undefined across threads.

The JMM is the specification of what reorderings are allowed and what values a read is permitted to return. It's defined in terms of the happens-before relation.

Concrete example of reordering surprising you:

```java
class Reorder {
  int x, y;
  int a, b;

  void thread1() { a = 1; x = b; }
  void thread2() { b = 1; y = a; }
}
```

After both threads run, can x == 0 && y == 0? Intuitively no. In practice — yes — because the JVM is allowed to reorder the independent writes and reads within each thread. Without volatile, synchronized, or some happens-before edge, this outcome is legal.

Happens-before, precisely

Action A happens-before action B (written A hb B) means: if A hb B, then the effects of A are visible to B and A is ordered before B. This is the JMM's only cross-thread guarantee. If there is no hb edge between two actions, the JVM can reorder or cache-delay them arbitrarily.

The happens-before rules (memorize these)

  1. Program order — within a single thread, each action hb every action that follows it in program order.
  2. Monitor lock — an unlock on monitor M hb every subsequent lock on that same M.
  3. Volatile — a write to a volatile variable hb every subsequent read of that same variable.
  4. Thread start — Thread.start() on T hb every action in T.
  5. Thread join / termination — every action in T hb any action in another thread that successfully returns from T.join().
  6. Interruption — a thread calling interrupt() on T hb T detecting it was interrupted (via InterruptedException or isInterrupted()).
  7. Transitivity — if A hb B and B hb C, then A hb C.

Also: constructor completion hb the first action in the finalizer (mostly historical).
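Rules 4 and 5 are why the classic start/join handoff needs no volatile at all; a minimal sketch (example is mine):

```java
// Demonstrates the Thread.start() and Thread.join() happens-before edges.
public class StartJoinVisibility {
    static int data;                 // deliberately NOT volatile

    public static void main(String[] args) throws InterruptedException {
        data = 1;                    // (a) written before start()
        Thread t = new Thread(() -> {
            // start() hb every action in t, so (a) is guaranteed visible here
            data = data + 41;        // (b)
        });
        t.start();
        t.join();                    // every action in t hb join() returning
        System.out.println(data);    // guaranteed to print 42, no volatile needed
    }
}
```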

Safe publication

An object is safely published if other threads see its fully-initialized state and not a partially-constructed reference. You get safe publication by:

  1. Initializing in a static initializer (JVM guarantees happens-before via class loading).
  2. Storing to a volatile field (or AtomicReference).
  3. Storing to a final field in a constructor (see final semantics below).
  4. Storing to a field guarded by a lock.

Everything else — including a plain assignment to a non-volatile field — can be seen as a partially-constructed object by another thread.

final field semantics

When a constructor finishes, all writes to final fields are frozen — any thread that subsequently obtains a reference to the object is guaranteed to see the correctly-initialized final fields, even without synchronization.

This is the magic that makes String, Integer, Long, and records safe to share without extra synchronization.

Caveat: the object reference itself must still be published safely. And writes to non-final fields inside the constructor aren't covered by this guarantee.
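A minimal sketch of a type made safe by final-field freeze (the class is mine; a record gives the same guarantee):

```java
// Any thread that obtains a reference to a Point sees x and y fully
// initialized, even if the reference itself was published without
// synchronization. Note: the fields are final; that is the whole trick.
final class Point {
    private final int x;
    private final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
    int x() { return x; }
    int y() { return y; }
}
// Record equivalent (all record components are final):
// record Point(int x, int y) {}
```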

Piggybacking on synchronization

A subtle consequence of the rules: every write made before a synchronization release is visible to every read after the corresponding synchronization acquire — not just the fields explicitly mentioned. So you can use a volatile flag to publish the results of a long chain of non-volatile writes:

```java
int[] data; // not volatile
volatile boolean ready = false;

// thread 1
data = computeExpensiveArray(); // non-volatile write
ready = true;                    // volatile write

// thread 2
if (ready) {                     // volatile read
  use(data);                     // guaranteed to see fully-computed array
}
```

This is called piggybacking and is the basis of volatile-flag patterns.

Double-checked locking (DCL)

Classic example of the JMM biting you. The broken pre-Java-5 version:

```java
class Lazy {
  private Singleton instance;
  public Singleton get() {
    if (instance == null) {                   // (1) unsynchronized read
      synchronized (this) {
        if (instance == null) {
          instance = new Singleton();         // (2) partial construction can leak
        }
      }
    }
    return instance;                          // (3) can return partial object
  }
}
```

Problem: a thread doing (1) can see a non-null instance that has been assigned but whose constructor hasn't finished. Returns a partially-constructed object.

Fixed version (Java 5+):

```java
class Lazy {
  private volatile Singleton instance;        // volatile!
  public Singleton get() {
    Singleton local = instance;
    if (local == null) {
      synchronized (this) {
        local = instance;
        if (local == null) {
          instance = local = new Singleton();
        }
      }
    }
    return local;
  }
}
```

The volatile write of instance = new Singleton() has release semantics; the volatile read on the fast path has acquire semantics; together they give happens-before between the constructor and the fast-path read.

Even better: use the initialization-on-demand holder idiom, which is lazy + thread-safe without a single lock:

```java
class Lazy {
  private static class Holder { static final Singleton INSTANCE = new Singleton(); }
  public static Singleton get() { return Holder.INSTANCE; }
}
```

Q&A

  1. Q: What is the Java Memory Model?

    • The formal spec of what reorderings the JVM can do and what values reads can return. Defined via the happens-before relation. Without hb edges between actions on different threads, you get no ordering or visibility guarantees.
  2. Q: What's happens-before?

    • A relation between actions. If A hb B, then A's effects are visible to B and A is ordered before B. Established by monitor locks, volatile writes/reads, thread start/join, interrupt, and program order within a thread (plus transitivity).
  3. Q: Without any synchronization, can one thread's write to a non-volatile int field ever be seen by another thread?

    • It might be, but the JVM gives zero guarantee. It could be cached forever. In practice it usually becomes visible eventually, but correctness cannot depend on that.
  4. Q: Why is double-checked locking broken before Java 5, and how does volatile fix it?

    • Without volatile, a thread can observe the partially-initialized object on the fast path — the write of the reference can be reordered before the constructor's writes complete. volatile on the field gives release/acquire semantics: the object's fields are guaranteed to be fully written before the reference becomes visible, and fully seen by the reader on the fast path.
  5. Q: What's special about final fields?

    • They have freeze semantics: once the constructor returns, any thread that sees a reference to the object sees fully-initialized final fields, without synchronization. Makes String, records, and immutable types safe by default.
  6. Q: Name the happens-before rules.

    • Program order, monitor lock release→acquire, volatile write→read, Thread.start()→everything in the child, everything in thread→T.join() return, interrupt→interrupted check, transitivity.
  7. Q: What is "safe publication"?

    • Publishing an object such that other threads see its fully-initialized state. Via static init, volatile, final fields, or lock-guarded fields. A plain assignment to a non-volatile non-final field is not safe publication.

3. Threads — lifecycle, creation, control

Thread states

```
                 start()
   NEW ─────────────────▶ RUNNABLE ◀──────────────┐
                             │                    │
                             │ enters contended   │ lock acquired
                             │ synchronized       │
                             ▼                    │
                          BLOCKED ────────────────┘

   RUNNABLE ── wait() / join() / park() ────────────────────▶ WAITING
   WAITING ── notify / notifyAll / unpark / interrupt ──────▶ RUNNABLE

   RUNNABLE ── sleep(ms) / wait(ms) / join(ms) / park(ns) ──▶ TIMED_WAITING
   TIMED_WAITING ── timeout / signal / interrupt ───────────▶ RUNNABLE

   RUNNABLE ── run() returns / uncaught exception ──────────▶ TERMINATED
```
  • NEW — created but start() not called yet.
  • RUNNABLE — either running on a CPU or ready to run (JVM doesn't distinguish at the Thread.State level).
  • BLOCKED — waiting for a monitor lock (failed synchronized entry).
  • WAITING — in Object.wait(), Thread.join(), or LockSupport.park() with no timeout.
  • TIMED_WAITING — same as WAITING but with a timeout.
  • TERMINATED — run() has returned, normally or via an uncaught exception.

Note: a thread blocked on ReentrantLock.lock() is in WAITING (via LockSupport.park), not BLOCKED. This trips people up in thread dumps.

Creating threads

Classic platform thread:

```java
// 1. Extend Thread (rarely used — ties execution to identity)
new Thread() { public void run() { work(); } }.start();

// 2. Runnable (preferred pre-Java 21)
new Thread(() -> work(), "worker-1").start();

// 3. Callable via Executor (when you need a return value)
ExecutorService pool = Executors.newFixedThreadPool(4);
Future<Integer> f = pool.submit(() -> computeValue());
int result = f.get();
```

Modern: Thread.Builder (Java 19+):

```java
Thread t = Thread.ofPlatform().name("worker-").daemon().unstarted(() -> work());
t.start();

Thread vt = Thread.ofVirtual().name("vt-", 0).start(() -> work());
```

Thread.ofVirtual() is the gateway to virtual threads (§14).

Daemon threads

  • Daemon threads don't prevent JVM exit. When all non-daemon threads finish, the JVM halts daemons abruptly.
  • Set before start(): t.setDaemon(true).
  • Use for background housekeeping (metrics flush, cleanup). Don't use for anything that must complete (file writes, DB commits) — you can lose data on shutdown.
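A runnable sketch of the daemon rules (class name is mine):

```java
public class DaemonDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread flusher = new Thread(() -> {
            while (true) {                        // stand-in for periodic housekeeping
                try { Thread.sleep(50); }
                catch (InterruptedException e) { return; }
            }
        });
        flusher.setDaemon(true);  // must happen BEFORE start(); once the thread is
                                  // alive, setDaemon throws IllegalThreadStateException
        flusher.start();
        Thread.sleep(100);
        // main (the last non-daemon thread) returns here: the JVM exits,
        // halting the daemon abruptly mid-loop
    }
}
```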

sleep vs yield vs wait vs join vs park

| Call                | Releases locks?      | Wakeable by                     | Purpose                                                                   |
|---------------------|----------------------|---------------------------------|---------------------------------------------------------------------------|
| Thread.sleep(ms)    | No                   | timeout, interrupt              | Wait for elapsed time                                                     |
| Thread.yield()      | No                   | immediate (hint)                | Suggest scheduler run another RUNNABLE thread — advisory, often a no-op   |
| obj.wait()          | Yes (the monitor)    | notify, notifyAll, interrupt    | Wait for a condition while holding a monitor                              |
| t.join()            | No (not holding any) | thread T terminates             | Wait for another thread to die                                            |
| LockSupport.park()  | No                   | unpark(this), interrupt         | Primitive used by ReentrantLock, etc.                                     |

The gotcha: sleep does not release any monitor locks you hold. If you sleep inside synchronized, every other thread waiting on that monitor continues to wait.

Interruption — the cooperative cancellation protocol

Java has no forced thread termination (Thread.stop is deprecated and dangerous). Cancellation is cooperative: you set an interrupt flag, and well-behaved code checks it.

Two ways a thread observes interruption:

  1. Interruptible blocking calls (sleep, wait, join, lockInterruptibly, most NIO, most java.util.concurrent blocking methods) throw InterruptedException — and clear the interrupt flag.
  2. Polling — Thread.currentThread().isInterrupted() (does not clear) or Thread.interrupted() (static, clears).

Golden rule: never swallow InterruptedException. Either rethrow, or restore the interrupt flag so callers can see it:

```java
try {
  Thread.sleep(1000);
} catch (InterruptedException e) {
  Thread.currentThread().interrupt(); // restore
  // optional: exit loop, throw a domain exception
}
```

Swallowing it without restoration is one of the most common concurrency bugs — it silently breaks cancellation across an entire subsystem.
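Putting both observation styles together, a cancellable worker might look like this (sketch; the class name is mine):

```java
// Honors interruption both ways: polls the flag in its loop, and restores
// the flag after an interruptible blocking call clears it.
public class CancellableWorker implements Runnable {
    @Override public void run() {
        while (!Thread.currentThread().isInterrupted()) {   // poll (does not clear)
            try {
                Thread.sleep(100);                          // interruptible blocking call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();         // restore the cleared flag
                return;                                     // then exit cooperatively
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new CancellableWorker());
        t.start();
        t.interrupt();       // request cancellation
        t.join(1000);        // worker exits promptly
    }
}
```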

Uncaught exception handlers

A thread that terminates with an uncaught exception prints a stack trace and dies. If that thread was driving critical work, the rest of your app may hang forever, unaware.

```java
Thread t = new Thread(task);
t.setUncaughtExceptionHandler((thread, ex) ->
    logger.error("Thread {} died", thread.getName(), ex));
t.start();

// Or globally:
Thread.setDefaultUncaughtExceptionHandler((thread, ex) ->
    logger.error("Uncaught in {}", thread.getName(), ex));
```

For pooled threads, set this via a custom ThreadFactory (§10).
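A sketch of such a factory (class name and naming scheme are mine; stderr stands in for a real logger so the block is self-contained):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Names pool threads with a prefix and installs an uncaught-exception handler.
public class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger seq = new AtomicInteger();

    public NamedThreadFactory(String prefix) { this.prefix = prefix; }

    @Override public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + seq.getAndIncrement());
        t.setUncaughtExceptionHandler((thread, ex) ->
            System.err.println("Uncaught in " + thread.getName() + ": " + ex));
        return t;
    }
}
// Usage: Executors.newFixedThreadPool(4, new NamedThreadFactory("order-worker"));
```

One caveat worth knowing: the handler fires for tasks run via execute(); tasks passed to submit() are wrapped in a FutureTask, which captures the exception into the Future instead.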

Q&A

  1. Q: Runnable vs Callable vs Thread?

    • Thread is an execution-context class; extending it is rare. Runnable is a no-arg no-return task. Callable<V> returns a value and can throw checked exceptions — used with ExecutorService.submit().
  2. Q: If a thread is stuck trying to enter a synchronized block, what state is it in?

    • BLOCKED. If it's stuck in ReentrantLock.lock(), it's WAITING (parked). Different states for what is effectively the same situation — useful to know when reading thread dumps.
  3. Q: Does Thread.sleep() release locks?

    • No. sleep holds every lock the thread has. Object.wait() is the only call that releases its monitor.
  4. Q: How do you cancel a running thread in Java?

    • Cooperatively via interruption. Call t.interrupt(); the target thread must check Thread.interrupted() or allow an interruptible blocking call to throw InterruptedException. Thread.stop() is deprecated.
  5. Q: What's the right way to handle InterruptedException?

    • If you can't propagate it, restore the interrupt flag: Thread.currentThread().interrupt();. Never swallow silently — it breaks upstream cancellation.
  6. Q: How do you give a thread a meaningful name in production?

    • Constructor or Thread.Builder; for pooled threads, provide a ThreadFactory that names them myservice-kafka-consumer-%d or similar. Critical for thread dump readability.
  7. Q: Daemon vs non-daemon?

    • Daemon threads don't hold the JVM open. Non-daemon threads do; the JVM runs until all finish. Default is non-daemon (inherited from parent). Don't do work you care about on daemons — they're killed at shutdown.

Part 2 — Synchronization primitives

4. synchronized and intrinsic locks

Every Java object has an intrinsic lock (also called a monitor). synchronized acquires that lock on entry and releases on exit (whether via normal return or exception — always).

Instance lock vs class lock

```java
class Counter {
  private int count;

  // instance lock — on `this`. Each Counter instance has its own.
  public synchronized void inc() { count++; }

  // equivalent:
  public void inc2() { synchronized (this) { count++; } }

  // class lock — on Counter.class. Shared across all instances.
  public static synchronized void reset() { /* ... */ }

  // equivalent:
  public static void reset2() { synchronized (Counter.class) { /* ... */ } }
}
```

An instance-synchronized method and a static-synchronized method do not lock each other out — they're on different monitors.

Synchronized methods vs blocks

  • Synchronized blocks are almost always better. You can narrow the critical section to just the code that needs it, and you can lock on a private internal object to avoid external interference.
```java
// BAD: locks on `this` — external code can also synchronize on your Cache and deadlock you
class Cache {
  public synchronized void put(String k, String v) { /* ... */ }
}

// GOOD: lock on a private object nobody else can reach
class Cache2 {
  private final Object lock = new Object(); // private, unreachable from outside
  public void put(String k, String v) {
    synchronized (lock) { /* ... */ }
  }
}
```

Locking on this leaks your locking strategy. Prefer a private final Object lock.

Don't lock on: String literals (pooled — other code may lock on the same interned string), boxed primitives from cache (Integer.valueOf(1)), or Class objects that other code might lock on.

Reentrance

Java's intrinsic locks are reentrant: a thread that already holds a monitor can re-enter a synchronized block on the same monitor without deadlocking. Internally the JVM tracks a count per (thread, monitor) pair.

```java
class A {
  public synchronized void foo() { bar(); }      // locks this
  public synchronized void bar() { /* fine */ }  // re-enters same lock
}
```

Without reentrance, any synchronized method calling another synchronized method on the same object would self-deadlock. (C's pthread_mutex_t is non-reentrant by default — Java chose differently.)

wait / notify / notifyAll

Three methods on Object (not Thread). To call them you must hold the monitor.

  • wait() atomically releases the monitor and blocks until signaled. On wakeup, reacquires before returning.
  • notify() wakes one waiting thread (arbitrary choice).
  • notifyAll() wakes all waiting threads.

The always-loop rule: wait() can return spuriously (without any corresponding notify), or after a notify by the time you re-acquire the lock the condition may no longer hold. Always wait in a loop:

```java
synchronized (lock) {
  while (!condition) {      // NOT `if` — spurious wakeup + lost-wakeup races
    lock.wait();
  }
  // condition is true, proceed
}
```

notify vs notifyAll: prefer notifyAll unless you can prove (a) only one thread can make progress and (b) threads aren't waiting for different conditions on the same monitor. A "lost wakeup" from wrong notify choice is a hellish bug.

Producer-consumer with wait/notify (classic example)

```java
class BoundedBuffer<T> {
  private final Object lock = new Object();
  private final Queue<T> queue = new ArrayDeque<>();
  private final int capacity;

  BoundedBuffer(int capacity) { this.capacity = capacity; }

  public void put(T item) throws InterruptedException {
    synchronized (lock) {
      while (queue.size() == capacity) lock.wait();
      queue.add(item);
      lock.notifyAll();  // wake consumers
    }
  }

  public T take() throws InterruptedException {
    synchronized (lock) {
      while (queue.isEmpty()) lock.wait();
      T item = queue.remove();
      lock.notifyAll();  // wake producers
      return item;
    }
  }
}
```

In modern code you'd use BlockingQueue (§9). The wait/notify version is a rite of passage and an interview classic.

Double-checked locking (revisited)

Covered in §2. Summary: broken without volatile. Fixed with volatile. Even better: lazy-holder idiom.

HotSpot lock inflation (brief)

HotSpot optimizes synchronized aggressively:

  • Biased locking (historical; removed by default in JDK 15+) — bias the lock to the first thread that took it.
  • Thin / lightweight locks — uncontended case is a single CAS on the object header.
  • Heavyweight / inflated — under contention, inflates to an OS-level mutex.

You almost never need to reason about this directly, but know that a one-thread synchronized is essentially free, and the cost grows with contention.

Q&A

  1. Q: synchronized method vs synchronized block — which is better?

    • Block, almost always. You can narrow the critical section and lock on a private object to avoid collisions with callers.
  2. Q: Why prefer a private Object lock over synchronized (this)?

    • External callers can synchronize on your object and cause deadlocks or hold your lock while you need it. Private lock objects aren't reachable from outside.
  3. Q: What's reentrance?

    • Same thread can re-acquire a lock it already holds. Java's intrinsic locks are reentrant; without reentrance, recursive synchronized calls would self-deadlock.
  4. Q: Why must you always wait() in a loop?

    • Spurious wakeups are allowed by the JVM. Also, between notify and the re-acquisition of the lock, some other thread can invalidate the condition. A while-loop re-checks.
  5. Q: notify vs notifyAll?

    • Prefer notifyAll. notify is an optimization that's safe only when every waiter waits for the same condition and any one waking up is equally progress. Lost-wakeup bugs from wrong notify are nasty.
  6. Q: Can you call wait() without holding a lock?

    • No — IllegalMonitorStateException. You must hold the monitor of the object you're wait-ing on.
  7. Q: Does synchronized ever reorder across its boundaries?

    • The JMM allows code inside the block to be reordered freely, but no operation can move out of the block across lock/unlock boundaries. Acquire-release semantics.

5. volatile and atomicity

volatile is the lightweight sibling of synchronized. It gives you:

  • Visibility — a write to a volatile field becomes visible to all subsequent reads of that field, from any thread.
  • Ordering (release/acquire) — writes before the volatile store cannot be reordered past it; reads after the volatile load cannot be reordered before it.
  • Atomicity of 64-bit reads/writes — plain long and double reads/writes aren't atomic on 32-bit JVMs; volatile forces them to be.

What volatile does NOT give you

  • No compound-operation atomicity. volatile int counter; counter++; is still a race — read-modify-write is three operations.
  • No mutual exclusion. Two threads can both be inside the "critical section" simultaneously.

When volatile is sufficient

A good rule of thumb: a lone, independent flag variable that one thread writes and others only read to decide whether to stop.

```java
class StoppableWorker implements Runnable {
  private volatile boolean shutdown = false;

  public void shutdown() { shutdown = true; }

  @Override public void run() {
    while (!shutdown) {
      doWork();
    }
  }
}
```

If shutdown were not volatile, the running thread could cache false forever and never exit.

When volatile is NOT sufficient

Any time you do read-modify-write:

```java
// BAD — race condition
volatile int counter;
counter++;  // three ops: load, add, store — two threads can both read N, write N+1
```
For this, use AtomicInteger (CAS-based, §7) or a lock.
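To see why CAS fixes the race, here is the retry loop that incrementAndGet performs conceptually, written out by hand (illustrative only; the real JDK implementation goes through VarHandle/Unsafe intrinsics):

```java
import java.util.concurrent.atomic.AtomicInteger;

// A hand-rolled CAS retry loop: the write succeeds only if no other
// thread changed the value between our read and our write.
public class CasIncrement {
    static int incrementAndGet(AtomicInteger counter) {
        while (true) {
            int current = counter.get();               // read
            int next = current + 1;                    // modify
            if (counter.compareAndSet(current, next))  // write only if unchanged
                return next;                           // success: no update lost
            // else another thread won the race; loop and retry with fresh value
        }
    }

    public static void main(String[] args) {
        AtomicInteger c = new AtomicInteger(41);
        System.out.println(incrementAndGet(c));  // 42
    }
}
```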

Volatile write as a publication barrier

The piggyback trick from §2:

```java
// Thread 1 initializes data, then publishes via a volatile flag
data = buildLargeStructure();  // non-volatile writes
ready = true;                  // volatile write — acts as a release barrier

// Thread 2
if (ready) {                   // volatile read — acquire barrier
  use(data);                   // guaranteed to see the fully-built structure
}
```

This is why the volatile-flag initialization pattern works.

Q&A

  1. Q: What does volatile give you?

    • Visibility, release/acquire ordering, and atomicity of 64-bit reads/writes. No mutual exclusion, no compound-op atomicity.
  2. Q: Can you implement a thread-safe counter with just volatile?

    • No. counter++ is read-modify-write; two threads can lose an update. Use AtomicInteger or a lock.
  3. Q: When is volatile enough?

    • A flag that one writer flips and other threads only read. Or safely publishing a reference to an immutable object (the classic volatile Singleton instance pattern).
  4. Q: volatile vs synchronized performance?

    • volatile is much cheaper — essentially a memory barrier that forbids reordering across it. synchronized has the same barrier cost plus potentially a syscall under contention.
  5. Q: Are plain long reads/writes atomic in Java?

    • The JLS allows 32-bit JVMs to split 64-bit plain reads/writes. volatile long forces atomicity. In practice all modern 64-bit HotSpots are atomic even for plain long, but don't rely on it.

6. java.util.concurrent locks

Introduced in Java 5 to fix what synchronized couldn't do: timeouts, interruptibility, fairness, multiple condition variables, non-block-structured locking.

Lock interface

```java
public interface Lock {
  void lock();
  void lockInterruptibly() throws InterruptedException;
  boolean tryLock();
  boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException;
  void unlock();
  Condition newCondition();
}
```

Canonical usage (the try-finally is mandatory — unlike synchronized, unlock() is not automatic):

```java
Lock lock = new ReentrantLock();
lock.lock();
try {
  // critical section
} finally {
  lock.unlock();
}
```

Forget the finally and you leak the lock on any exception. This is the single most common bug with explicit locks.

ReentrantLock

The workhorse. Reentrant semantics identical to synchronized, plus:

  • tryLock() — non-blocking; true if acquired, false if not. Lets you avoid deadlock.
  • tryLock(timeout, unit) — block up to a timeout; useful for responsive systems.
  • lockInterruptibly() — can be canceled via Thread.interrupt(), unlike synchronized which is uninterruptible.
  • Fairness — new ReentrantLock(true) gives FIFO ordering to waiters (at a throughput cost).
  • Multiple Conditions per lock — wait sets for different conditions (e.g., not-full vs not-empty on a bounded buffer).

When to reach for ReentrantLock over synchronized:

  • Need tryLock (deadlock avoidance, back-off retries).
  • Need timed or interruptible acquisition.
  • Need fairness.
  • Need multiple Conditions on one lock.
  • Non-block-structured lock acquisition (acquire in one method, release in another — rare, risky).

Otherwise, prefer synchronized: it's simpler and auto-released on exception. One caveat on Java 21: blocking inside synchronized pins a virtual thread to its carrier, which favored ReentrantLock there — JEP 491 in Java 24 eliminates the pinning issue.
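A sketch of the tryLock-based deadlock avoidance mentioned above (the two-lock transfer scenario and names are mine):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Needs two locks at once (e.g. moving an item between two accounts).
// tryLock with a timeout means we never hold-and-wait indefinitely,
// so the classic lock-ordering deadlock cannot occur.
public class TryLockTransfer {
    static boolean transfer(ReentrantLock from, ReentrantLock to, Runnable move)
            throws InterruptedException {
        if (from.tryLock(50, TimeUnit.MILLISECONDS)) {
            try {
                if (to.tryLock(50, TimeUnit.MILLISECONDS)) {
                    try { move.run(); return true; }
                    finally { to.unlock(); }
                }
            } finally { from.unlock(); }
        }
        return false;  // caller backs off (ideally with random jitter) and retries
    }
}
```

Returning false instead of blocking is what breaks the hold-and-wait condition; the caller retries after releasing everything.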

ReentrantReadWriteLock

Splits locking into read and write modes:

  • Read lock — many readers can hold simultaneously; mutually exclusive with writers.
  • Write lock — exclusive; no readers and no other writers.

Good when you have many more reads than writes (e.g., a cache read 100× per write).

```java
private final ReadWriteLock lock = new ReentrantReadWriteLock();
private final Map<String, String> cache = new HashMap<>();

public String get(String key) {
  lock.readLock().lock();
  try { return cache.get(key); }
  finally { lock.readLock().unlock(); }
}

public void put(String key, String val) {
  lock.writeLock().lock();
  try { cache.put(key, val); }
  finally { lock.writeLock().unlock(); }
}
```

Pitfall: a read lock cannot be upgraded to a write lock (deadlock). You must release the read lock and re-acquire the write lock, then re-validate state.
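The safe pattern is release-then-reacquire with a re-check; a sketch (class and field names are mine):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// The safe "upgrade" dance: release the read lock, take the write lock,
// then RE-VALIDATE, because another thread may have written in between.
public class UpgradeExample {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private String cached;  // guarded by rw

    public String getOrCompute() {
        rw.readLock().lock();
        try {
            if (cached != null) return cached;     // fast path under read lock
        } finally {
            rw.readLock().unlock();                // MUST release before write-locking
        }
        rw.writeLock().lock();
        try {
            if (cached == null) {                  // re-check: someone may have filled it
                cached = "computed";               // stand-in for the expensive work
            }
            return cached;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

The reverse direction, downgrading from write to read (acquire the read lock while still holding the write lock, then release the write lock), is supported.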

For read-heavy workloads, ConcurrentHashMap is almost always a better answer than a hand-rolled RW lock.

StampedLock (Java 8+)

Even faster for read-heavy workloads via optimistic reads:

```java
private final StampedLock sl = new StampedLock();
private double x, y;

public double distanceFromOrigin() {
  long stamp = sl.tryOptimisticRead();   // no lock acquired
  double localX = x, localY = y;
  if (!sl.validate(stamp)) {              // was there a concurrent write?
    stamp = sl.readLock();                // fall back to real read lock
    try {
      localX = x; localY = y;
    } finally {
      sl.unlockRead(stamp);
    }
  }
  return Math.sqrt(localX * localX + localY * localY);
}
```

Critical caveats:

  • Not reentrant. A thread holding the lock that re-acquires will deadlock.
  • No Condition support.
  • Stamp must be passed to unlock (not the lock object).

Use sparingly β€” the API is tricky. ConcurrentHashMap is usually a cleaner answer.
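If you do reach for StampedLock, it also offers tryConvertToWriteLock, which upgrades a read stamp in place when no other reader or writer interferes. A sketch close to the class's own javadoc example (Point and the coords accessor are illustrative):

```java
import java.util.concurrent.locks.StampedLock;

class Point {
  private final StampedLock sl = new StampedLock();
  private double x, y;

  Point(double x, double y) { this.x = x; this.y = y; }

  // Move to the origin only if not already there, upgrading read -> write.
  void moveToOriginIfAway() {
    long stamp = sl.readLock();
    try {
      while (x != 0.0 || y != 0.0) {
        long ws = sl.tryConvertToWriteLock(stamp);   // 0 means conversion failed
        if (ws != 0L) {
          stamp = ws;                                // now holding a write stamp
          x = 0.0;
          y = 0.0;
          break;
        }
        sl.unlockRead(stamp);                        // fall back: plain write lock
        stamp = sl.writeLock();                      // loop re-checks the condition
      }
    } finally {
      sl.unlock(stamp);                              // unlocks read OR write stamps
    }
  }

  double[] coords() {
    long stamp = sl.readLock();
    try { return new double[] { x, y }; }
    finally { sl.unlockRead(stamp); }
  }
}
```

Note the generic sl.unlock(stamp) in the finally: it works whichever mode the stamp ended up in.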

Condition ​

Replaces wait/notify/notifyAll when using explicit locks. You can have multiple Conditions per lock β€” producer-consumer becomes clean:

java
class BoundedBuffer<T> {
  private final Lock lock = new ReentrantLock();
  private final Condition notFull  = lock.newCondition();
  private final Condition notEmpty = lock.newCondition();
  private final Deque<T> q = new ArrayDeque<>();
  private final int capacity;

  BoundedBuffer(int capacity) { this.capacity = capacity; }

  public void put(T x) throws InterruptedException {
    lock.lock();
    try {
      while (q.size() == capacity) notFull.await();   // wait only for space
      q.add(x);
      notEmpty.signal();                               // wake one consumer
    } finally { lock.unlock(); }
  }

  public T take() throws InterruptedException {
    lock.lock();
    try {
      while (q.isEmpty()) notEmpty.await();            // wait only for items
      T x = q.remove();
      notFull.signal();                                // wake one producer
      return x;
    } finally { lock.unlock(); }
  }
}

With wait/notifyAll and a single monitor, you'd wake producers and consumers indiscriminately. Multiple conditions target wakeups precisely, reducing spurious contention.

LockSupport.park/unpark ​

The low-level primitive all higher-level blocking is built on.

  • park() β€” blocks until unpark(thread) or interruption.
  • unpark(t) β€” makes t's next park() return immediately (permits are "pre-accumulated", but only up to 1).

You rarely touch this directly unless you're building a lock. Useful to know for reading thread dumps (parking to wait for <0x...> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)).
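To get a feel for the primitive, here is a minimal one-shot gate built directly on park/unpark (illustrative only; real code should use CountDownLatch). Note the loop: park() can return spuriously, so the waiter re-checks its condition.

```java
import java.util.concurrent.locks.LockSupport;

class OneShot {
  private volatile boolean ready;
  private volatile Thread waiter;

  void await() {
    waiter = Thread.currentThread();
    while (!ready) {
      LockSupport.park(this);          // blocker object shows up in thread dumps
      if (Thread.interrupted()) break; // park also returns on interrupt
    }
  }

  void fire() {
    ready = true;                      // publish the condition first...
    Thread t = waiter;
    if (t != null) LockSupport.unpark(t);  // ...then wake (or pre-grant the permit)
  }
}
```

If fire() runs before the waiter parks, unpark pre-grants the single permit and the subsequent park returns immediately; that permit-based behavior is why park/unpark has no lost-wakeup problem the way a bare wait/notify would.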

Q&A ​

  1. Q: synchronized vs ReentrantLock β€” when pick which?

    • synchronized when a simple mutual-exclusion pattern suffices (cleaner, auto-released). ReentrantLock when you need tryLock, timeouts, interruptibility, fairness, or multiple Conditions.
  2. Q: What happens if you forget unlock() in a ReentrantLock?

    • The lock leaks β€” no other thread can ever acquire it. synchronized auto-releases on exception; explicit locks don't. The try-finally pattern is mandatory.
  3. Q: When does ReadWriteLock pay off?

    • Read-heavy workloads (~>5:1 read/write ratio) and operations long enough that shared-read parallelism is meaningful. For short ops, the bookkeeping overhead erases the gain.
  4. Q: Why is StampedLock not reentrant?

    • Optimistic reads would be incoherent under reentrance (stamps wouldn't compose). The designers chose simplicity/speed over reentrance.
  5. Q: What does Condition.await() do differently from Object.wait()?

    • Same shape (atomically releases the lock and blocks), but lives on a specific Condition tied to the Lock β€” you can have many Conditions per lock for targeted signaling.
  6. Q: Will lockInterruptibly actually interrupt synchronized?

    • No β€” synchronized is uninterruptible. lockInterruptibly works only with Lock instances. This is one reason to prefer ReentrantLock for responsive shutdown.

7. Atomics and CAS ​

Compare-and-swap, the hardware primitive ​

CAS(address, expected, new) β€” a single atomic instruction: if *address == expected, write new and return true; else return false, do not modify. Hardware-supported on all modern CPUs (x86 LOCK CMPXCHG, ARM LDREX/STREX, etc.).

All java.util.concurrent.atomic classes are built on CAS. ReentrantLock, ConcurrentHashMap, semaphores β€” all use CAS for their fast paths.
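The classic CAS showcase is the Treiber stack: push and pop each loop until their CAS on the head pointer wins. A sketch (pop is exactly where the ABA problem, covered below, can bite in environments that recycle memory; Java's GC plus fresh Node allocations sidestep it here):

```java
import java.util.concurrent.atomic.AtomicReference;

class LockFreeStack<T> {
  // Nested records are implicitly static; E is independent of the outer T.
  private record Node<E>(E value, Node<E> next) {}

  private final AtomicReference<Node<T>> head = new AtomicReference<>();

  void push(T value) {
    Node<T> oldHead;
    Node<T> newHead;
    do {
      oldHead = head.get();
      newHead = new Node<>(value, oldHead);
    } while (!head.compareAndSet(oldHead, newHead));  // retry if head moved
  }

  T pop() {
    Node<T> oldHead;
    Node<T> newHead;
    do {
      oldHead = head.get();
      if (oldHead == null) return null;               // empty stack
      newHead = oldHead.next();
    } while (!head.compareAndSet(oldHead, newHead));  // retry if head moved
    return oldHead.value();
  }
}
```

No lock is ever held; a preempted thread never blocks others, it only forces their CAS to retry.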

AtomicInteger and friends ​

java
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet();        // atomic ++i (getAndIncrement() for i++ semantics)
counter.addAndGet(5);
counter.compareAndSet(10, 20);    // if 10, set to 20 atomically

AtomicReference<Config> config = new AtomicReference<>(initial);
config.updateAndGet(c -> c.withTimeout(30)); // retries via CAS on conflict

updateAndGet and accumulateAndGet (Java 8+) hide the CAS loop:

java
// Under the hood:
V prev, next;
do {
  prev = get();
  next = updater.apply(prev);
} while (!compareAndSet(prev, next));
return next;

Crucial: the updater function must be pure and idempotent with respect to side effects β€” it can be called multiple times due to retries.

Array and field updaters ​

java
AtomicIntegerArray arr = new AtomicIntegerArray(1024);
arr.incrementAndGet(42);

// Field updater β€” atomic ops on a plain volatile field (memory saved vs AtomicRef per instance)
static final AtomicIntegerFieldUpdater<Node> VERSION =
    AtomicIntegerFieldUpdater.newUpdater(Node.class, "version");
// Requires: field must be `volatile int` (or long/reference)

Use field updaters when you have millions of objects and don't want one AtomicInteger instance per object.

The ABA problem ​

Thread 1 reads A. Thread 2 changes A→B→A. Thread 1's CAS(expected=A, new=X) succeeds — but the world has changed under it. If you're using the value as a pointer (lock-free stack), this is a correctness bug.

Fix: carry a version along with the value.

java
AtomicStampedReference<Node> head = new AtomicStampedReference<>(null, 0);

int[] stampHolder = new int[1];
Node prev = head.get(stampHolder);
int stamp = stampHolder[0];
Node next = new Node(value, prev);
head.compareAndSet(prev, next, stamp, stamp + 1);

The stamp is incremented on every change, so ABA becomes A(v=1)β†’B(v=2)β†’A(v=3) and the CAS fails.

AtomicMarkableReference is the same idea with a single boolean mark (enough for some lock-free algorithms like Harris's linked list).
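A minimal sketch of the mark in action, flipping "logically deleted" without changing the reference (the payload string is made up):

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

String node = "payment-123";
// mark = false means "live"; true means "logically deleted"
AtomicMarkableReference<String> ref = new AtomicMarkableReference<>(node, false);

// Same reference in and out; only the mark flips, atomically with the reference check.
boolean deleted = ref.compareAndSet(node, node, false, true);

boolean[] markHolder = new boolean[1];
String value = ref.get(markHolder);   // reads reference and mark in one atomic step
```

A lock-free list traversal can then skip (and later physically unlink) nodes whose mark is set.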

LongAdder and LongAccumulator (Java 8+) ​

AtomicLong serializes all writers through a single memory location. Under heavy write contention, cache-line bouncing kills throughput.

LongAdder keeps an array of cells; each thread hashes to a cell and CASes only that one. sum() walks all cells and sums them.

  • Write-heavy, read-occasional (metrics counters, request counters): use LongAdder.
  • Both reads and writes frequent: AtomicLong.
  • Custom reduction (min, max, etc.): LongAccumulator with a lambda.
java
LongAdder requests = new LongAdder();
requests.increment();         // very cheap under contention
long total = requests.sum();  // not atomic with concurrent increments β€” eventually consistent
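The LongAccumulator case from the list above, sketched as a peak tracker (any associative, commutative function works; here Long::max with Long.MIN_VALUE as the identity):

```java
import java.util.concurrent.atomic.LongAccumulator;

LongAccumulator maxLatency = new LongAccumulator(Long::max, Long.MIN_VALUE);
maxLatency.accumulate(120);   // striped like LongAdder: cheap under contention
maxLatency.accumulate(45);
maxLatency.accumulate(300);
long peak = maxLatency.get(); // 300; like sum(), best-effort under concurrent updates
```

As with updateAndGet, the function may be applied more than once under contention, so it must be side-effect free.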

VarHandle (Java 9+) ​

The modern, safe, typed replacement for sun.misc.Unsafe. Lets you do low-level memory ops (CAS, volatile, opaque, acquire/release) on fields and arrays without Atomic wrapper objects:

java
class Node {
  volatile int state;
  private static final VarHandle STATE;
  static {
    try {
      STATE = MethodHandles.lookup().findVarHandle(Node.class, "state", int.class);
    } catch (ReflectiveOperationException e) { throw new ExceptionInInitializerError(e); }
  }

  boolean tryClaim() {
    return STATE.compareAndSet(this, 0, 1);
  }
}

VarHandle access modes (get, set, getVolatile, setVolatile, setRelease, getAcquire, getOpaque, compareAndSet, getAndAdd, etc.) map directly to JMM access semantics. Relevant for lock-free library code; most application code should stick to atomics.

Q&A ​

  1. Q: What does AtomicInteger give you over volatile int?

    • Atomic compound operations (incrementAndGet, compareAndSet, updateAndGet) via hardware CAS. volatile int only gives visibility/ordering; i++ is still a race.
  2. Q: Explain CAS.

    • Compare-and-swap: atomically check-and-set. A single instruction that replaces a memory location's value only if it currently equals an expected value. Building block for all lock-free algorithms.
  3. Q: What is the ABA problem?

    • A thread reads A, gets preempted, others change it to B then back to A. A naive CAS on A succeeds but the state has changed underneath. Mitigate with AtomicStampedReference or monotonic version counters.
  4. Q: AtomicLong vs LongAdder β€” when pick which?

    • LongAdder for write-heavy counters (striping across cache lines eliminates contention). AtomicLong when reads and writes are mixed and you need exact value reads. LongAdder.sum() is best-effort under concurrent updates.
  5. Q: Why does updateAndGet take a pure function?

    • Under contention, the CAS retries. Your updater can be called multiple times with different inputs. Side effects will execute per-attempt, which is almost never what you want.
  6. Q: What is VarHandle and what does it replace?

    • Java 9+. Safe, typed access to memory ops with JMM-aware semantics (acquire, release, volatile, plain, opaque). Replaces internal sun.misc.Unsafe for most use cases.

Part 3 β€” Thread-safe collections ​

8. Concurrent collections ​

The general-purpose collections (HashMap, ArrayList, LinkedList, HashSet) are not thread-safe. Collections.synchronizedMap(m) wraps each method in synchronized β€” correct but coarse-grained (one lock, full serialization). The java.util.concurrent collections are the modern answer: designed for concurrency from the ground up, almost always faster than wrapped equivalents, and with iterator semantics that don't throw ConcurrentModificationException.

ConcurrentHashMap β€” the workhorse ​

Pre-Java 8 design: lock striping across segments (default 16, configurable via the concurrencyLevel constructor argument), each a miniature hash table with its own lock. Writers locked a segment; readers were mostly lock-free. Fine for typical use, but the segment count was fixed at construction.

Java 8+ redesign:

  • Buckets with per-bucket locking. No segments. Writers synchronized on the bucket's first node.
  • CAS on empty buckets. Inserting into an empty bucket uses compareAndSet β€” zero locking on the common case.
  • Treeify at 8 β€” a bucket with β‰₯8 entries becomes a red-black tree for O(log n) lookups under hash collisions. Untreeifies at ≀6.
  • Lock-free reads. Gets traverse the bucket's linked list / tree via volatile reads.
  • Aggregate operations β€” forEach, search, reduce, reduceKeys, reduceValues, plus parallel variants.

Atomicity guarantees that matter:

java
map.compute(key, (k, v) -> v == null ? 1 : v + 1);   // atomic read-modify-write
map.merge(key, 1, Integer::sum);                      // compact version of above
map.computeIfAbsent(key, k -> expensiveCompute(k));   // computes ONCE across threads

Compare to get+put, which is a race:

java
// BAD β€” TOCTOU race
if (!map.containsKey(key)) map.put(key, compute());

// GOOD
map.computeIfAbsent(key, k -> compute());

Iterator semantics β€” weakly consistent:

  • Does not throw ConcurrentModificationException (unlike HashMap).
  • Reflects state at some point at or after iterator creation β€” may or may not see concurrent modifications.
  • Safe to modify the map while iterating.

Why no null keys/values?

map.get(k) returning null is ambiguous β€” is the key absent or present-with-null-value? In a single-threaded map you can disambiguate with containsKey, but in concurrent code the state can change between calls. CHM disallows nulls to remove this ambiguity.

When NOT to use CHM:

  • You need a sorted map β†’ ConcurrentSkipListMap.
  • You need a consistent snapshot across multiple keys β†’ use a lock or a different structure.
  • Compound invariants across multiple entries β†’ atomicity is per-entry only.

CopyOnWriteArrayList / CopyOnWriteArraySet ​

Thread-safety via immutability + copy-on-write: writes copy the entire backing array; reads are lock-free volatile reads of the array reference.

  • Reads: zero synchronization overhead, no iterator modification issues.
  • Writes: O(n) allocation + copy. Ruinous for write-heavy workloads.
  • Iterator: snapshot at iterator creation time β€” never reflects subsequent modifications. No ConcurrentModificationException.

Good use case: observer/listener lists β€” listeners are registered rarely, events fire constantly.

java
private final CopyOnWriteArrayList<EventListener> listeners = new CopyOnWriteArrayList<>();

public void fire(Event e) {
  for (EventListener l : listeners) l.handle(e);  // no lock, snapshot iterator
}

ConcurrentSkipListMap / ConcurrentSkipListSet ​

Sorted, concurrent, scales with processors. Skip list = probabilistic layered linked list (O(log n) expected). Not as fast as ConcurrentHashMap for pure hash lookups, but the only good option when you need sorted concurrent access (range queries, ceiling/floor).

ConcurrentLinkedQueue / ConcurrentLinkedDeque ​

Unbounded, non-blocking FIFO queues based on the Michael-Scott lock-free algorithm. Good for high-throughput producer-consumer where producers never need to wait (e.g., you always have capacity).

  • offer(e) always succeeds (unbounded).
  • poll() returns null immediately if empty β€” non-blocking, not wait-for-data.
  • size() is O(n) β€” traverses β€” don't call it in hot paths.

For blocking semantics (wait for space / wait for data), use BlockingQueue (Β§9).

Collections.synchronizedXxx β€” when? ​

  • Legacy codebases.
  • Simple wrappers where you already have a plain collection and need minimal thread-safety without code restructuring.
  • Remember to manually synchronize during iteration:
java
List<String> syncList = Collections.synchronizedList(new ArrayList<>());
synchronized (syncList) {                // manual lock for composite ops
  for (String s : syncList) { /* safe */ }
}

Prefer a concurrent collection in new code.

Q&A ​

  1. Q: How does ConcurrentHashMap achieve thread safety in Java 8+?

    • Per-bucket synchronization on the first node, with CAS for empty-bucket inserts. Reads are lock-free via volatile. Buckets treeify at 8 entries.
  2. Q: Why doesn't ConcurrentHashMap allow null keys or values?

    • get returning null would be ambiguous between "absent" and "present with null" β€” can't be disambiguated under concurrency. Disallowing nulls makes the API unambiguous.
  3. Q: What's the difference between map.computeIfAbsent and if (!map.containsKey(k)) map.put(k, compute())?

    • The first is atomic β€” the compute runs at most once across threads. The second is a race; two threads can both see absent and both compute + put.
  4. Q: When would you use CopyOnWriteArrayList?

    • Write-rare, read-dominant. Listener/observer lists are the canonical case. Also good for small collections (arrays are cheap to copy).
  5. Q: ConcurrentHashMap vs Collections.synchronizedMap β€” differences?

    • CHM: per-bucket locking, concurrent reads, weak-consistent iterators, atomic compound ops, no nulls. Synchronized wrapper: one lock on everything, iteration must be externally synchronized, allows nulls (if underlying map does).
  6. Q: Is ConcurrentHashMap iteration thread-safe?

    • Yes and "weakly consistent" β€” the iterator never throws CME, but may or may not reflect concurrent modifications. Safe to iterate + mutate from multiple threads, but don't rely on observing any specific modification.
  7. Q: You want a sorted concurrent TreeMap β€” what do you use?

    • ConcurrentSkipListMap. Skip-list-based, lock-free reads, scales well.

9. BlockingQueue family ​

BlockingQueue<T> extends Queue<T> with blocking put/take operations. The backbone of producer-consumer patterns and the work queue behind every ThreadPoolExecutor.

Method matrix:

| Operation | Throws | Returns special | Blocks | Blocks with timeout |
| --- | --- | --- | --- | --- |
| Insert | add(e) | offer(e) | put(e) | offer(e, t, u) |
| Remove | remove() | poll() | take() | poll(t, u) |
| Examine | element() | peek() | β€” | β€” |

Almost always use the blocking pair (put, take) or the timed pair (offer(t), poll(t)). The "throws" and "returns null" variants are for specific use cases.

ArrayBlockingQueue ​

  • Bounded, fixed capacity.
  • Backed by a circular array.
  • Single ReentrantLock for both head and tail; producers and consumers contend on the same lock.
  • Optional fairness β€” FIFO among waiters.

Good default for bounded producer-consumer with moderate contention.

LinkedBlockingQueue ​

  • Bounded or unbounded (default unbounded β€” Integer.MAX_VALUE).
  • Linked-node-based.
  • Two locks β€” separate head and tail locks. Producers and consumers don't block each other as long as the queue isn't empty or full.
  • Higher throughput than ArrayBlockingQueue under heavy mixed contention.

Danger: default-constructed new LinkedBlockingQueue<>() is unbounded. Always specify a capacity unless you know why you don't.

PriorityBlockingQueue ​

  • Unbounded (bounded by heap memory).
  • Elements ordered by Comparable or supplied Comparator.
  • Heap-based β€” O(log n) insert/remove.
  • Iterator is not in priority order.

Good for scheduling/priority work.

SynchronousQueue ​

  • Zero capacity. Each put waits for a matching take and vice versa β€” direct handoff.
  • Used internally by Executors.newCachedThreadPool: "hand this task to a thread right now, or spin up a new one."
  • Two modes: non-fair (LIFO-biased, faster) and fair (FIFO).

DelayQueue ​

  • Elements implement Delayed (returns a getDelay(TimeUnit)).
  • take blocks until an element's delay has expired.
  • Used internally by ScheduledThreadPoolExecutor.
  • Good for timed retries, rate limiting, expiring caches.

LinkedTransferQueue ​

  • Unbounded, linked.
  • Adds transfer(e) β€” blocks until a consumer receives the item (like SynchronousQueue) but without zero-capacity restriction.
  • tryTransfer(e) β€” non-blocking hand-off attempt, falls back to offer if no consumer is waiting.

Useful when you want "handoff if possible, queue if not."
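A sketch of that pattern:

```java
import java.util.concurrent.LinkedTransferQueue;

LinkedTransferQueue<String> q = new LinkedTransferQueue<>();

// Handoff if a consumer is already parked in take(); otherwise just enqueue.
if (!q.tryTransfer("event")) {   // returns false when no consumer is waiting
  q.put("event");                // put() never blocks here (queue is unbounded)
}
```

The blocking variant `q.transfer("event")` would instead wait until some consumer actually receives the element, giving SynchronousQueue-style rendezvous on top of a normal queue.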

BlockingDeque / LinkedBlockingDeque ​

Double-ended blocking queue. The deque-per-worker idea is the same one behind ForkJoinPool work-stealing (each worker pushes and pops its own deque; idle workers steal from other workers' tails), though ForkJoinPool uses its own internal array-based deques, not this class. Rarely used directly.

Producer-consumer β€” the canonical example ​

java
public class PipelineDemo {

  private static final BlockingQueue<Message> queue = new LinkedBlockingQueue<>(1000);
  private static final Message POISON = new Message("__STOP__");

  public static void main(String[] args) throws InterruptedException {
    var producers = Executors.newFixedThreadPool(4);
    var consumers = Executors.newFixedThreadPool(8);

    for (int i = 0; i < 4; i++) producers.submit(new Producer());
    for (int i = 0; i < 8; i++) consumers.submit(new Consumer());

    // ... later, to shut down cleanly:
    producers.shutdownNow();   // interrupts producers; their loop / blocked put() exit via interrupt
    producers.awaitTermination(1, TimeUnit.MINUTES);

    // Poison pills: one per consumer thread so each wakes up and exits.
    for (int i = 0; i < 8; i++) queue.put(POISON);
    consumers.shutdown();
  }

  static class Producer implements Runnable {
    public void run() {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          Message m = fetchFromSource();
          queue.put(m);  // blocks if queue full β†’ backpressure
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  static class Consumer implements Runnable {
    public void run() {
      try {
        while (true) {
          Message m = queue.take();
          if (m == POISON) return;
          process(m);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}

Key points:

  • Bounded queue (new LinkedBlockingQueue<>(1000)) gives backpressure β€” producers block when consumers fall behind.
  • Poison pill shutdown pattern β€” one pill per consumer. Alternative: shutdownNow() + InterruptedException handling.
  • Preserve interrupt flag on catch.

πŸ’‘ Applied example: a legacy MQ microservice (Kafka consumer β†’ bounded processing queue β†’ downstream workers) is exactly this pattern. The bounded queue is the bulkhead that prevents a slow downstream from OOM-ing the consumer.

Q&A ​

  1. Q: Producer-consumer β€” outline an implementation.

    • Bounded BlockingQueue between threads. Producer put() (blocks on full), consumer take() (blocks on empty). Poison-pill or shutdown() to stop. The bounded queue is the backpressure mechanism.
  2. Q: ArrayBlockingQueue vs LinkedBlockingQueue β€” differences?

    • Array: fixed-size circular array, single lock, lower memory. Linked: linked nodes, two locks (head+tail) so producers/consumers don't contend, optionally unbounded (dangerous default).
  3. Q: What's SynchronousQueue and why is it used for newCachedThreadPool?

    • Zero-capacity direct-handoff queue. put waits for a take. newCachedThreadPool uses it so every task either hands off to an existing idle thread or spawns a new one β€” never queues.
  4. Q: What's the danger of unbounded LinkedBlockingQueue?

    • Under a traffic spike or slow consumer, it grows without limit and you OOM. Always set an explicit capacity.
  5. Q: How do you cleanly stop a consumer thread blocked on take()?

    • Either (a) shutdownNow() the executor, which interrupts the thread and take throws InterruptedException, or (b) a poison-pill message.
  6. Q: What does DelayQueue do?

    • take() blocks until the head element's delay expires. Used for timed retries, scheduled tasks (ScheduledThreadPoolExecutor is built on it).

Part 4 β€” Executors, async, and parallelism ​

10. Executor framework ​

Executor decouples task submission from task execution. Instead of new Thread(runnable).start(), you submit work to a pool that manages threads, queuing, lifecycle, and failure.

The hierarchy ​

Executor                               // single method: execute(Runnable)
  └── ExecutorService                  // shutdown, invokeAll, submit(Callable)
        β”œβ”€β”€ ScheduledExecutorService   // schedule, scheduleAtFixedRate
        └── AbstractExecutorService
              β”œβ”€β”€ ThreadPoolExecutor   // the main impl
              └── ForkJoinPool         // work-stealing (Β§12)

ThreadPoolExecutor β€” the real thing ​

java
new ThreadPoolExecutor(
    int corePoolSize,          // threads kept alive even when idle
    int maximumPoolSize,       // ceiling on total threads
    long keepAliveTime,        // how long above-core idle threads live
    TimeUnit unit,
    BlockingQueue<Runnable> workQueue,
    ThreadFactory threadFactory,
    RejectedExecutionHandler rejectionHandler);

The task-submission flow (this is the interview gotcha β€” get the ORDER right):

  1. If current thread count < corePoolSize: create a new thread, run task directly.
  2. Else: try to enqueue to workQueue.
  3. If enqueue fails (queue full): if thread count < maximumPoolSize, create a new thread, run task on it.
  4. If at max and queue full: invoke RejectedExecutionHandler.

Critical consequence: if you use an unbounded queue (like default LinkedBlockingQueue), step 2 always succeeds β€” you never reach step 3, and maximumPoolSize is never used. Threads beyond corePoolSize are never created. This is exactly what Executors.newFixedThreadPool does, and why it can OOM on the queue before adding a single extra thread.
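A quick way to see this consequence (a sketch; the sleeping tasks keep the two core threads busy so every further submission queues):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// core=2, max=8, UNBOUNDED queue: step 2 always succeeds, so step 3 never runs.
ThreadPoolExecutor pool = new ThreadPoolExecutor(
    2, 8, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

for (int i = 0; i < 100; i++) {
  pool.submit(() -> {
    try { Thread.sleep(1_000); }            // occupy a worker
    catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  });
}

System.out.println(pool.getPoolSize());      // 2 β€” never grows toward max=8
System.out.println(pool.getQueue().size());  // 98 tasks parked in the queue
pool.shutdownNow();
```

Swap in `new ArrayBlockingQueue<>(10)` and the same submission loop drives the pool up to 8 threads, then starts rejecting.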

Rejection policies ​

| Policy | Behavior | Use case |
| --- | --- | --- |
| AbortPolicy (default) | Throws RejectedExecutionException | Tell the caller you're overloaded |
| CallerRunsPolicy | Runs the task on the submitting thread | Natural backpressure β€” submitter slows down |
| DiscardPolicy | Silently drops the task | Only when losing work is acceptable |
| DiscardOldestPolicy | Drops the oldest queued task, tries again | Newest-wins (stale data pipelines) |

CallerRunsPolicy is the quiet hero: it applies backpressure automatically. Your HTTP request thread (or Kafka consumer thread) slows down and stops pulling new work, letting the pool catch up.

Why Executors factory methods are dangerous ​

| Factory | Reality | Danger |
| --- | --- | --- |
| newFixedThreadPool(n) | core=max=n, unbounded LinkedBlockingQueue | Queue grows without bound β†’ OOM |
| newCachedThreadPool() | core=0, max=MAX_VALUE, SynchronousQueue | Thread count explodes under burst β†’ OOM |
| newSingleThreadExecutor() | Like newFixedThreadPool(1) | Same unbounded queue issue |
| newScheduledThreadPool(n) | Uses DelayedWorkQueue | Tasks scheduled at a high rate accumulate |

Both Java Concurrency in Practice (Goetz) and Effective Java recommend: build your ThreadPoolExecutor explicitly and pick a bounded queue + a real rejection policy.

A production-safe executor ​

java
ThreadPoolExecutor exec = new ThreadPoolExecutor(
    8,                                                    // core
    32,                                                   // max
    60L, TimeUnit.SECONDS,                                // idle above-core threads live 60s
    new ArrayBlockingQueue<>(500),                        // BOUNDED β€” backpressure
    new ThreadFactoryBuilder()                            // Guava; or write your own
        .setNameFormat("order-worker-%d")
        .setUncaughtExceptionHandler((t, e) ->
            log.error("Uncaught in {}", t.getName(), e))
        .build(),
    new ThreadPoolExecutor.CallerRunsPolicy());

Hand-rolled ThreadFactory:

java
class NamedThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger count = new AtomicInteger();
  private final boolean daemon;

  NamedThreadFactory(String prefix, boolean daemon) {
    this.prefix = prefix;
    this.daemon = daemon;
  }

  @Override public Thread newThread(Runnable r) {
    Thread t = new Thread(r, prefix + "-" + count.incrementAndGet());
    t.setDaemon(daemon);
    t.setUncaughtExceptionHandler((th, e) ->
        LoggerFactory.getLogger(prefix).error("Uncaught", e));
    return t;
  }
}

Name your threads. Unnamed pool threads become pool-3-thread-17 in thread dumps β€” useless for diagnosis.

Pool sizing ​

Brian Goetz's formula (JCiP Β§8.2):

N_threads = N_cpus Γ— U_cpu Γ— (1 + W/C)

Where:

  • N_cpus = available processors (Runtime.getRuntime().availableProcessors())
  • U_cpu = target CPU utilization (0.0–1.0)
  • W/C = wait time / compute time ratio

Intuition:

  • Pure CPU work (W/C = 0, U=1): N_cpus threads.
  • Pure I/O wait (W/C = 99): ~100 Γ— N_cpus threads.
  • Typical web request hitting DB + cache + downstream: W/C = 5–10 β†’ ~5–10 Γ— N_cpus.

Don't guess β€” measure. Instrument pool stats (getActiveCount, getQueue().size(), getCompletedTaskCount) and tune.

Little's Law (complementary view): L = Ξ»W β€” average concurrent requests = arrival rate Γ— average response time. If your service gets 500 req/s at 200ms each, you need at least 100 concurrent threads to keep up.

Graceful shutdown ​

java
exec.shutdown();                                   // no new tasks; existing queue drains
if (!exec.awaitTermination(30, TimeUnit.SECONDS)) {
  exec.shutdownNow();                              // interrupts workers, returns unexecuted tasks
  if (!exec.awaitTermination(10, TimeUnit.SECONDS)) {
    log.error("Pool did not terminate");
  }
}
  • shutdown() β€” accept no new, finish pending. awaitTermination waits up to a timeout.
  • shutdownNow() β€” drain the queue (returns what was left), interrupt all workers. Workers that are blocked in interruptible calls (take, sleep) throw; non-interruptible tasks keep running.

Spring Boot's ExecutorService beans are auto-shutdown via SmartLifecycle. In non-Spring code, always shutdown() on application exit.

Q&A ​

  1. Q: Explain the ThreadPoolExecutor submission logic.

    • Core β†’ queue β†’ max β†’ reject. If threads < core, create new. Else queue. If queue full AND threads < max, create new. Else apply rejection policy.
  2. Q: Why is Executors.newFixedThreadPool dangerous in production?

    • It uses an unbounded LinkedBlockingQueue. Under overload, the queue grows without limit β†’ OOM. Build a ThreadPoolExecutor with a bounded queue and a real rejection policy.
  3. Q: What does CallerRunsPolicy do, and why is it useful?

    • When the pool is saturated, runs the rejected task on the thread that submitted it. Naturally slows the producer β€” a backpressure mechanism without extra plumbing.
  4. Q: How do you pick a thread pool size?

    • Goetz formula: cpus Γ— util Γ— (1 + W/C). For I/O-heavy work, much larger than CPU count. Always measure and tune; never just pick a number.
  5. Q: shutdown() vs shutdownNow()?

    • shutdown stops accepting new tasks, lets existing ones complete. shutdownNow interrupts workers and returns queued tasks that weren't started. Pattern: shutdown β†’ wait β†’ shutdownNow as fallback.
  6. Q: Why always set a ThreadFactory?

    • To name threads, set daemon flag, install an uncaught exception handler. Unnamed threads make thread dumps unreadable; default UEH silently prints to stderr and exits the thread.
  7. Q: submit(Callable) vs execute(Runnable)?

    • submit returns a Future<V> you can get() or cancel; exceptions are wrapped in ExecutionException. execute has no return, exceptions go to the uncaught handler. Prefer submit if you care about the result or error.

11. Future and CompletableFuture ​

Future<V> β€” the primitive ​

java
Future<Integer> f = executor.submit(() -> compute());
Integer result = f.get(5, TimeUnit.SECONDS);  // blocks; may throw ExecutionException, TimeoutException
boolean cancelled = f.cancel(true);            // true = interrupt if running

Limitations:

  • get() is blocking.
  • No composition (can't say "do X then Y then Z" cleanly).
  • No combining (can't wait for multiple).
  • Callback on completion requires manual polling or extra threads.

CompletableFuture<V> β€” Java 8+ ​

Non-blocking composition, callbacks, combination, error recovery. The modern answer to async control flow in imperative Java.

Creation:

java
CompletableFuture<String> cf = CompletableFuture.supplyAsync(() -> fetchData());
CompletableFuture<Void> done = CompletableFuture.runAsync(() -> doSideEffect());
CompletableFuture<String> constant = CompletableFuture.completedFuture("already done");

// Manually completable
CompletableFuture<String> manual = new CompletableFuture<>();
externalCallback.onComplete(value -> manual.complete(value));

Transformation:

| Method | Returns | Purpose |
| --- | --- | --- |
| thenApply(f) | CF<U> | Map value β†’ value |
| thenAccept(c) | CF<Void> | Consume value, no return |
| thenRun(r) | CF<Void> | Side effect, ignore value |

Composition (flatMap):

java
// thenApply: returns CF<CF<User>> β€” nested, awkward
CompletableFuture<CompletableFuture<User>> nested =
    userIdFuture.thenApply(id -> fetchUserAsync(id));

// thenCompose: flat-maps β€” returns CF<User>
CompletableFuture<User> flat =
    userIdFuture.thenCompose(id -> fetchUserAsync(id));

Rule: if your function returns a CF, use thenCompose (like Optional's flatMap or Stream's flatMap).

Combining two futures:

java
CompletableFuture<String> name = fetchName(id);
CompletableFuture<Integer> age = fetchAge(id);

// thenCombine: both run, combines results
CompletableFuture<Person> p = name.thenCombine(age, Person::new);

// applyToEither: whichever completes first
CompletableFuture<String> fastest = primary.applyToEither(fallback, s -> s);

Fan-out / fan-in β€” allOf, anyOf:

java
List<CompletableFuture<Quote>> futures = vendors.stream()
    .map(v -> CompletableFuture.supplyAsync(() -> v.getQuote(req), executor))
    .toList();

CompletableFuture<Void> all = CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new));

CompletableFuture<List<Quote>> collected = all.thenApply(v ->
    futures.stream().map(CompletableFuture::join).toList());

List<Quote> quotes = collected.orTimeout(2, TimeUnit.SECONDS).join();

Error handling ​

java
CompletableFuture<User> result = fetchUser(id)
    .exceptionally(ex -> {                                      // replace failure with default
      log.warn("fetchUser failed", ex);
      return DEFAULT_USER;
    })
    .handle((user, ex) -> {                                     // handle both outcomes
      if (ex != null) { /* cleanup */; return fallback; }
      return user;
    })
    .whenComplete((user, ex) -> metrics.record(ex != null));    // side-effect only; exception propagates
  • exceptionally β€” only runs on failure.
  • handle β€” always runs; receives (value, exception). Can transform the outcome.
  • whenComplete β€” side-effect only; does not suppress the exception.

The common-pool trap β€” THE interview killer ​

java
// Which pool does this run on?
CompletableFuture.supplyAsync(() -> expensiveOp())    // ForkJoinPool.commonPool()
                 .thenApplyAsync(v -> doMore(v));     // also commonPool()

// Which pool does this run on?
CompletableFuture.supplyAsync(() -> computeA())
                 .thenApply(a -> computeB(a));        // WHOEVER completed the prior stage

Two dangers:

  1. Common-pool saturation. ForkJoinPool.commonPool() has cores - 1 threads by default. If you submit blocking work (DB, HTTP), you can starve parallel streams, other CompletableFutures, and any Fork/Join work sharing the pool.
  2. Non-Async variants execute inline on whatever thread completed the prior stage — including the thread that called complete(). Can be surprising (a callback thread suddenly runs CPU work).

Rules of thumb:

  • Always pass your own executor for async work: supplyAsync(task, myExecutor), thenApplyAsync(f, myExecutor).
  • Never block inside a stage on the common pool without specifying an executor.
  • Use a pool dedicated to blocking I/O, different from one for CPU work.

Timeouts (Java 9+)

java
cf.orTimeout(2, TimeUnit.SECONDS);               // completes exceptionally with TimeoutException
cf.completeOnTimeout(FALLBACK, 2, TimeUnit.SECONDS);  // completes normally with fallback

Before Java 9, you had to implement this with ScheduledExecutorService + cancel. Don't reinvent it.
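For context, here is a minimal sketch of what that pre-Java-9 workaround looked like (the `withTimeout` helper and class name are ours, purely illustrative); on Java 9+ just call orTimeout:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LegacyTimeout {
  private static final ScheduledExecutorService SCHEDULER =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "cf-timeout");
        t.setDaemon(true);
        return t;
      });

  // Pre-Java-9 emulation of orTimeout: fail the future if it is not done in time
  static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> cf,
                                              long timeout, TimeUnit unit) {
    ScheduledFuture<?> reaper = SCHEDULER.schedule(
        () -> cf.completeExceptionally(new TimeoutException()), timeout, unit);
    cf.whenComplete((v, ex) -> reaper.cancel(false));  // tidy up if it finished in time
    return cf;
  }

  static boolean demoTimesOut() {
    CompletableFuture<String> never = new CompletableFuture<>();  // nobody completes it
    try {
      withTimeout(never, 100, TimeUnit.MILLISECONDS).join();
      return false;
    } catch (CompletionException e) {
      return e.getCause() instanceof TimeoutException;
    }
  }

  public static void main(String[] args) {
    System.out.println(demoTimesOut()); // true
  }
}
```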

Full worked example — parallel multi-vendor quote

java
public CompletableFuture<QuoteSummary> getQuotes(Request req, Executor blockingIo) {
  List<CompletableFuture<Quote>> futures = vendors.stream()
      .map(v -> CompletableFuture.supplyAsync(() -> v.callSlowApi(req), blockingIo)
          .orTimeout(3, TimeUnit.SECONDS)
          .exceptionally(ex -> Quote.unavailable(v, ex)))
      .toList();

  return CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new))
      .thenApply(v -> futures.stream()
          .map(CompletableFuture::join)    // safe β€” all already complete
          .filter(Quote::isAvailable)
          .toList())
      .thenApply(QuoteSummary::from);
}

Note the .exceptionally per-future so that allOf doesn't fail on a single vendor's failure — allOf is about completion, not success.

CompletableFuture vs virtual threads

Since Java 21, for blocking I/O work, you no longer need CompletableFuture just to avoid blocking a thread:

java
// "Reactive-style" with CompletableFuture
cf1.thenCompose(a -> cf2(a)).thenCompose(b -> cf3(b));

// With virtual threads β€” plain imperative code, no composition machinery
Virtual t = Thread.startVirtualThread(() -> {
  A a = fetchA();
  B b = fetchB(a);
  C c = fetchC(b);
});

Use CompletableFuture when you need explicit parallelism (fan-out), callbacks from non-thread sources, or when interop with a CF-based API. Don't use it purely to avoid blocking β€” that's what virtual threads are for.

Q&A

  1. Q: Future vs CompletableFuture?

    • Future is a blocking handle with no composition. CompletableFuture adds non-blocking callbacks, composition (thenCompose), combination (thenCombine, allOf, anyOf), error recovery, and timeouts.
  2. Q: thenApply vs thenCompose?

    • thenApply(T → U): synchronous map. If U = CompletableFuture<V>, you end up with nested CF<CF<V>>. thenCompose(T → CF<V>): flat-maps, returns CF<V>. Same distinction as Stream.map vs flatMap.
  3. Q: thenCombine vs thenCompose?

    • thenCombine: wait for both this and another independent CF, then combine values. thenCompose: sequential — use the result of this to start the next CF.
  4. Q: allOf vs anyOf?

    • allOf completes when every input completes (result is Void; you then join each). anyOf completes as soon as any input completes (result is Object — the first completed value).
  5. Q: What's the common-pool trap?

    • supplyAsync/thenXxxAsync without an executor use ForkJoinPool.commonPool() (~cores - 1 threads). Submitting blocking work saturates it and starves everything else. Always pass an explicit executor.
  6. Q: Non-Async variants — what thread do they run on?

    • Whoever completed the prior stage. If completed synchronously (completedFuture), the submitting thread. Can be surprising — if completion comes from a callback thread, your CPU code runs there.
  7. Q: How do you implement a timeout on a CompletableFuture (Java 9+)?

    • cf.orTimeout(t, unit) — completes exceptionally with TimeoutException. Or cf.completeOnTimeout(fallback, t, unit) — completes normally with a fallback.
  8. Q: Does exceptionally recover from timeouts caused by orTimeout?

    • Yes — orTimeout's TimeoutException flows through exceptionally/handle like any other failure.

12. Fork/Join and parallel streams

ForkJoinPool

Designed for divide-and-conquer: recursively split work until small, compute leaves, combine. Work-stealing for load balancing.

Work-stealing: each worker has its own deque. When a worker splits a task, the continuation goes on its local deque. Idle workers steal from the tail of other workers' deques (original owner pushes/pops from head). Locality + minimal contention.

java
class SumTask extends RecursiveTask<Long> {
  private static final int THRESHOLD = 10_000;
  private final long[] arr;
  private final int lo, hi;

  SumTask(long[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; }

  @Override protected Long compute() {
    if (hi - lo <= THRESHOLD) {
      long sum = 0;
      for (int i = lo; i < hi; i++) sum += arr[i];
      return sum;
    }
    int mid = (lo + hi) >>> 1;
    SumTask left  = new SumTask(arr, lo, mid);
    SumTask right = new SumTask(arr, mid, hi);
    left.fork();                      // async submit
    long r = right.compute();         // compute inline
    long l = left.join();             // wait for left
    return l + r;
  }
}

ForkJoinPool pool = new ForkJoinPool();
long total = pool.invoke(new SumTask(data, 0, data.length));
  • RecursiveTask<V> β€” computes a value.
  • RecursiveAction β€” void return.
  • fork() β€” enqueue asynchronously.
  • join() β€” wait for a forked task.
  • Pattern: always fork() one side, compute() the other inline β€” avoids one unnecessary enqueue.

The common pool

ForkJoinPool.commonPool() is a singleton shared across the JVM. Used by:

  • CompletableFuture.supplyAsync (default executor)
  • parallelStream()
  • Any RecursiveTask submitted without a custom pool

Size = max(1, cores - 1) by default. Can be overridden with -Djava.util.concurrent.ForkJoinPool.common.parallelism=N.

Do NOT submit blocking work to the common pool — you starve everyone else sharing it.

Parallel streams — stream().parallel()

Parallel streams split the source, work each chunk on the common pool, and combine. Straightforward for CPU-bound, stateless, associative reductions.

When it wins:

  • Large enough source (tens of thousands+ for trivial ops, fewer for heavy per-element work).
  • Splittable source (ArrayList, arrays β€” great; LinkedList β€” bad; iterator-based β€” bad).
  • Stateless, associative operations.
  • No shared mutable state.

When it loses or misbehaves:

  • Blocking I/O per element (starves the common pool).
  • Small or unevenly-sized sources.
  • Encounter-order-sensitive collectors (toList was order-preserving even in parallel pre-Java 10; in modern Java toList is order-preserving but can lose perf β€” prefer toUnorderedList or collect to a Set / concurrent collector).
  • Shared mutable state in lambdas (don't).

Custom pool for parallel streams (workaround for "don't want to use the common pool"):

java
ForkJoinPool myPool = new ForkJoinPool(16);
try {
  List<Integer> result = myPool.submit(() ->
      data.parallelStream().map(this::heavy).toList()
  ).join();
} finally {
  myPool.shutdown();
}

Internally, parallelStream() uses whichever ForkJoinPool it's running inside of — so submitting into your own pool captures it.
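A runnable check of that capture behavior. Note this relies on an undocumented implementation detail of the streams library, so treat it as an illustration, not a contract (class name ours):

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.stream.IntStream;

public class CustomPoolCapture {
  // True if every element of the parallel stream ran on a worker of myPool
  static boolean runsInOwnPool() throws Exception {
    ForkJoinPool myPool = new ForkJoinPool(4);
    try {
      List<Boolean> inPool = myPool.submit(() ->
          IntStream.range(0, 1_000).parallel()
              .mapToObj(i -> Thread.currentThread() instanceof ForkJoinWorkerThread w
                  && w.getPool() == myPool)   // which pool is executing this element?
              .toList()
      ).get();
      return inPool.stream().allMatch(Boolean::booleanValue);
    } finally {
      myPool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runsInOwnPool());
  }
}
```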

Q&A

  1. Q: What is ForkJoinPool and how does work-stealing work?

    • A thread pool optimized for recursive divide-and-conquer. Each worker has its own deque; workers pop from their own head, steal from others' tails. Minimizes contention and keeps cores busy.
  2. Q: When does parallelStream() actually help?

    • CPU-bound work on a splittable source (array/ArrayList), thousands of elements or more, stateless ops, no blocking. Hurts with blocking I/O, linked lists, tiny tasks, or shared state.
  3. Q: Why should you not do blocking I/O in a parallelStream?

    • It uses the common ForkJoinPool with cores - 1 threads. Blocking a thread blocks work-stealing for everyone else using the pool — CompletableFuture.supplyAsync without an executor, other parallel streams, etc.
  4. Q: What's the difference between fork()/join() and invokeAll?

    • invokeAll(task1, task2) is a convenience that forks all but the last and computes the last inline (idiomatic pattern). fork()/join() are the primitives.
  5. Q: Can you submit a parallelStream to a specific pool?

    • Yes — wrap it in myPool.submit(() -> data.parallelStream()...).join(). parallelStream uses the pool of its running thread, so this captures your pool.

13. Synchronizers / coordination primitives

When you need threads to coordinate timing or resource access beyond simple producer-consumer.

CountDownLatch

One-shot "wait for N events."

java
CountDownLatch ready = new CountDownLatch(5);

for (int i = 0; i < 5; i++) {
  pool.submit(() -> {
    doSetup();
    ready.countDown();
  });
}

ready.await();                               // blocks until all 5 count down
startWork();

Two classic patterns:

  • Start gate β€” one CountDownLatch(1); workers await(); main thread countDown() releases them all simultaneously (useful for load tests).
  • Completion latch β€” CountDownLatch(N); each of N workers countDown(); coordinator await() until all done.

Cannot be reset β€” if you need reuse, use CyclicBarrier or Phaser.

CyclicBarrier

Reusable "wait for N parties at a rendezvous point."

java
CyclicBarrier barrier = new CyclicBarrier(4, () -> log.info("All arrived, phase complete"));

// in each of 4 threads, looping through phases:
for (int phase = 0; phase < 10; phase++) {
  doPhaseWork(phase);
  barrier.await();      // wait for other 3; barrier action fires once all arrive
}

The optional barrier action runs once, in a single thread, when all parties arrive — useful for phase transitions in parallel algorithms.

On broken barrier: if any thread is interrupted or times out at the barrier, the barrier is "broken" — all other waiters throw BrokenBarrierException and the barrier is unusable until reset.
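A minimal demonstration of that broken-barrier behavior (class name ours): two of three parties arrive, one is interrupted, and the other waiter observes BrokenBarrierException.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class BrokenBarrierDemo {
  static boolean otherWaiterSeesBroken() throws InterruptedException {
    CyclicBarrier barrier = new CyclicBarrier(3);   // 3 parties, only 2 will ever arrive
    boolean[] sawBroken = new boolean[1];

    Thread victim = new Thread(() -> {
      try { barrier.await(); }
      catch (InterruptedException | BrokenBarrierException ignored) { }
    });
    Thread observer = new Thread(() -> {
      try { barrier.await(); }
      catch (BrokenBarrierException e) { sawBroken[0] = true; }  // broken by the victim
      catch (InterruptedException ignored) { }
    });
    victim.start();
    observer.start();
    Thread.sleep(200);          // let both park at the barrier
    victim.interrupt();         // breaks the barrier for everyone
    victim.join();
    observer.join();
    return sawBroken[0] && barrier.isBroken();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(otherWaiterSeesBroken()); // true
  }
}
```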

Phaser

Generalizes both. Parties can register/deregister dynamically; phases advance when all registered parties arrive.

java
Phaser phaser = new Phaser(1);                    // main thread is party 1

for (int i = 0; i < n; i++) {
  phaser.register();                              // dynamically add worker
  pool.submit(() -> {
    doSetup();
    phaser.arriveAndAwaitAdvance();               // barrier, phase 0 → 1
    doMainWork();
    phaser.arriveAndDeregister();                 // leave phaser
  });
}

phaser.arriveAndAwaitAdvance();                   // main also arrives for phase 0

Use when party count is dynamic or you have multiple phases. For simpler cases, CountDownLatch or CyclicBarrier is clearer.

Semaphore

Bounded resource permits.

java
Semaphore apiTokens = new Semaphore(10);          // max 10 concurrent callers

public Response call() throws InterruptedException {
  apiTokens.acquire();
  try {
    return httpClient.get(url);
  } finally {
    apiTokens.release();
  }
}
  • acquire() β€” blocks if no permits; interruptible.
  • tryAcquire(timeout, unit) β€” timed.
  • Fairness β€” new Semaphore(10, true) gives FIFO ordering.
  • acquire(n) / release(n) β€” multi-permit ops.

Good for rate-limiting, connection pools, bounded-concurrency patterns.

Note: a Semaphore(1) works like a mutex but is not reentrant and is not tied to any particular thread — any thread can release. Use only when you need that.
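A small sketch of that non-ownership property (names ours): the releasing thread need not be the acquiring one, which no Lock would allow.

```java
import java.util.concurrent.Semaphore;

public class SemaphoreOwnershipDemo {
  // Unlike a lock, a Semaphore has no owner: thread B may release
  // a permit that thread A acquired.
  static boolean releasedByOtherThread() throws InterruptedException {
    Semaphore mutex = new Semaphore(1);
    mutex.acquire();                           // main thread takes the only permit

    Thread other = new Thread(mutex::release); // a different thread gives it back
    other.start();
    other.join();

    return mutex.tryAcquire();                 // permit is available again
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(releasedByOtherThread()); // true
  }
}
```

Try the same with ReentrantLock and unlock() from the other thread throws IllegalMonitorStateException.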

Exchanger

Two threads meet and exchange values. Niche — genetic algorithms, pipelined buffering. Rarely used directly.

java
Exchanger<Buffer> ex = new Exchanger<>();

// Thread A (producer)
Buffer empty = new Buffer();
while (running) {
  fill(empty);
  empty = ex.exchange(empty);       // hand full buffer, receive empty
}

// Thread B (consumer)
Buffer full = new Buffer();
while (running) {
  full = ex.exchange(full);         // hand back drained buffer, receive full one
  drain(full);
}

CountDownLatch vs CyclicBarrier — know the differences

|  | CountDownLatch | CyclicBarrier |
| --- | --- | --- |
| Reusable | No (one-shot) | Yes |
| Trigger | Counter reaches 0 (external counts down) | All N parties call await() |
| Thread count | Coordinator and workers are different | All parties are equivalent |
| Barrier action | None | Optional, runs in one thread on completion |
| Cancellation | Permanently latched, can't reset | Can be broken and reset |

Example — coordinated load test startup

java
int n = 100;
CountDownLatch ready = new CountDownLatch(n);
CountDownLatch start = new CountDownLatch(1);
CountDownLatch done  = new CountDownLatch(n);

for (int i = 0; i < n; i++) {
  pool.submit(() -> {
    ready.countDown();
    start.await();
    try { hitEndpoint(); }
    finally { done.countDown(); }
  });
}

ready.await();                    // all threads ready & blocked
long t0 = System.nanoTime();
start.countDown();                // fire!
done.await();
log.info("Elapsed: {} ms", (System.nanoTime() - t0) / 1_000_000);

Three latches for three distinct coordination events — classic pattern.

Q&A

  1. Q: CountDownLatch vs CyclicBarrier?

    • A latch is one-shot and triggered by external countDowns; a barrier is reusable and triggered when all N parties call await. A latch has separate coordinator/workers; a barrier treats all parties equally.
  2. Q: When would you use Phaser?

    • When the number of parties changes dynamically, or you have multi-phase algorithms where reuse + dynamic registration matters. Otherwise prefer the simpler latch/barrier.
  3. Q: What's a Semaphore good for?

    • Bounded concurrency / resource permits: rate limiting, max concurrent HTTP calls, connection pooling. Semaphore(1) can act as a non-reentrant mutex, but that's rarely the right tool.
  4. Q: Can a CountDownLatch be reset?

    • No — it's one-shot. Once it reaches 0 it stays there forever.
  5. Q: What does "broken barrier" mean in CyclicBarrier?

    • If any waiting thread is interrupted or times out, the barrier enters a broken state; all other waiters throw BrokenBarrierException. Must call reset() to reuse.

Part 5 — Modern concurrency (Java 21+)

14. Virtual threads (Project Loom)

Arrived as a preview in Java 19, finalized in Java 21 (JEP 444). The single biggest concurrency feature added to Java since the java.util.concurrent package itself.

The problem they solve

Traditional Java threads are 1:1 mapped to OS threads. An OS thread costs:

  • ~1 MB of committed stack memory (default)
  • A TCB in the kernel
  • Scheduling time in the OS scheduler

Realistic limit: a few thousand platform threads before you hit memory pressure or scheduler overhead. For a web service with 10k concurrent connections, you either (a) use a thread pool that's much smaller than the request count (queueing, head-of-line blocking), or (b) use async/reactive code (callback hell, Mono/Flux complexity, non-trivial debugging).

Virtual threads remove the tradeoff.

Platform vs virtual threads

|  | Platform thread | Virtual thread |
| --- | --- | --- |
| Mapped to | 1:1 with OS thread | M:N with OS carrier threads |
| Stack | ~1 MB committed | Few hundred bytes, grows dynamically |
| Creation cost | Slow (syscall) | Fast (allocation only) |
| Scheduler | OS | JVM (ForkJoinPool of carriers) |
| Typical count | Hundreds to low thousands | Millions |

A virtual thread is an instance of Thread (no separate API surface); t.isVirtual() returns true. Most code that works with Thread just works.
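A tiny check of that claim, runnable on Java 21+ (class name ours):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class VirtualThreadDemo {
  // A virtual thread is still a java.lang.Thread; isVirtual() distinguishes it
  static boolean ranOnVirtualThread() throws InterruptedException {
    AtomicBoolean virtual = new AtomicBoolean(false);
    Thread t = Thread.startVirtualThread(
        () -> virtual.set(Thread.currentThread().isVirtual()));
    t.join();
    // the launcher (main) is a plain platform thread
    return virtual.get() && t.isVirtual() && !Thread.currentThread().isVirtual();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(ranOnVirtualThread()); // true on Java 21+
  }
}
```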

How the JVM runs them — M:N scheduling

Virtual threads run on carrier threads (platform threads managed by a ForkJoinPool, default size = available processors).

When a virtual thread reaches a blocking operation (Thread.sleep, I/O, lock, await), the JVM unmounts it from its carrier and parks the continuation (a suspendable representation of the Java call stack). The carrier is freed to run other virtual threads. When the blocking operation completes, the virtual thread is re-mounted on an available carrier and continues.

Translation: imperative blocking code becomes as efficient as callback-based non-blocking code.

Creating virtual threads

java
// Builder
Thread vt = Thread.ofVirtual().name("user-", 0).start(() -> work());

// Startup shortcut
Thread.startVirtualThread(() -> work());

// Executor — one virtual thread per task
try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
  for (int i = 0; i < 10_000; i++) {
    exec.submit(() -> handleRequest());
  }
} // AutoCloseable — implicit shutdown + awaitTermination

The try-with-resources pattern on ExecutorService is Java 19+ and very clean for scoped work.

When virtual threads help — and when they don't

Help (the sweet spot):

  • Many concurrent tasks, each doing blocking I/O (DB, HTTP, messaging, file).
  • Thread-per-request servers (Tomcat, Jetty with virtual threads enabled).
  • Existing imperative code β€” just switch the executor.

Don't help, or hurt:

  • CPU-bound work β€” virtual threads add scheduling overhead with no benefit. Use a platform-thread pool sized to cores.
  • Small number of threads already β€” zero benefit.
  • Pinning workloads (see below) β€” you lose the whole point.
  • ThreadLocal-heavy code β€” each virtual thread gets its own ThreadLocals, and with millions of threads the memory adds up. (Fix: Scoped Values β€” Β§16.)

Pinning — the gotcha you must know

A virtual thread is pinned to its carrier if it tries to block while:

  1. Inside a synchronized block or method (pre-Java 24).
  2. Executing a native method or foreign function.

When pinned, the JVM cannot unmount the virtual thread — the carrier thread is held captive for the duration of the block. If enough virtual threads pin simultaneously, you exhaust carriers and throughput collapses.

Java 21/22/23 mitigation: replace synchronized with ReentrantLock on hot paths that block.

java
// BEFORE (pins on I/O inside)
synchronized (this) {
  var resp = httpClient.send(req, ...);   // blocks → PINS the carrier
}

// AFTER
lock.lock();
try {
  var resp = httpClient.send(req, ...);   // blocks → virtual thread unmounts cleanly
} finally {
  lock.unlock();
}

Java 24 (JEP 491 — Synchronize Virtual Threads without Pinning): removes the synchronized pinning problem. A virtual thread blocking inside synchronized unmounts normally (the monitor is handed back on unmount, re-acquired on remount). Native pinning remains.

Diagnosing pinning: run with -Djdk.tracePinnedThreads=full (stack traces) or =short (summary). JFR events: jdk.VirtualThreadPinned.

Spring Boot 3.2+ integration

properties
spring.threads.virtual.enabled=true

Enables virtual threads for Tomcat/Jetty request handling, @Async, and TaskExecutor. One flag turns your imperative Spring MVC service into a highly concurrent server without code changes — provided you don't hit pinning.

Don't pool virtual threads. They're cheap to create. Executors.newVirtualThreadPerTaskExecutor() is "one new vthread per submitted task." Pooling re-introduces the queue-or-reject tradeoff you just solved.

Don't put virtual threads in a ThreadPoolExecutor. Some APIs expect platform threads (e.g., ones that use Thread-local state extensively). And you'd lose the scaling property.

A quick benchmark you can cite

Running 10,000 concurrent tasks that each sleep 1 second:

Platform threads (fixed pool of 200):   ~50 seconds total (50 batches of 200)
Platform threads (cached pool):         ~1 second, but ~10k OS threads → resource pressure
Virtual threads:                        ~1 second, carrier threads ≈ CPU cores, minimal memory

Q&A

  1. Q: What is a virtual thread?

    • A lightweight Java thread (instance of Thread) managed by the JVM, not the OS. Runs on a pool of carrier platform threads via M:N scheduling. Unmounts from its carrier on blocking ops, re-mounts when ready.
  2. Q: When do virtual threads shine?

    • High-concurrency, I/O-bound workloads — thread-per-request web servers, fan-out to many downstream services, traditional imperative code that was previously bottlenecked on thread count.
  3. Q: When should you NOT use virtual threads?

    • CPU-bound work (use a sized platform-thread pool). Small concurrency (no benefit). Heavy synchronized blocks with blocking inside (pinning, pre-Java 24). ThreadLocal-heavy code.
  4. Q: What is pinning?

    • A virtual thread held on its carrier because it tries to block inside synchronized (pre-Java 24) or native code. The carrier can't be freed; enough pinning = carrier exhaustion. Fix: use ReentrantLock, or Java 24+.
  5. Q: How many virtual threads can you run?

    • Millions, on typical hardware. Limited by heap (hundreds of bytes per thread's continuation).
  6. Q: Do virtual threads use ThreadLocal?

    • Yes, but it's expensive at scale (per-vthread storage) and InheritableThreadLocal propagation is per-thread. Scoped Values (§16) are the recommended replacement.
  7. Q: Should you pool virtual threads?

    • No. Their whole point is cheap creation. Use newVirtualThreadPerTaskExecutor() — one vthread per submission, auto-destroyed when done.
  8. Q: Does enabling virtual threads make existing code faster?

    • For I/O-bound code hitting thread-count limits, yes — often dramatically. For CPU-bound or low-concurrency code, no. For code with synchronized blocking inside on pre-Java 24, it could be worse (pinning).

15. Structured concurrency

Preview across Java 21/22/23/24/25 (JEPs 453, 462, 480, 499, 505 — still in preview, not final). Stabilization expected soon. Know the concept; the API has evolved between versions.

The problem

Unstructured CompletableFuture / ExecutorService code has no parent-child relationship. When one branch fails or times out, sibling branches keep running. Threads leak; cancellation doesn't propagate.

java
// Bad β€” unstructured
Future<A> fa = exec.submit(() -> fetchA());
Future<B> fb = exec.submit(() -> fetchB());

A a = fa.get();                                // throws
// fb is still running, orphaned, no one will join or cancel it

The idea

Tasks spawned from a method should complete before the method returns — either all succeed, or all are cancelled. Parent-child relationships should be explicit, like block scopes in structured programming.

StructuredTaskScope

java
// Both tasks must succeed, or cancel the whole scope
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
  Subtask<String> name = scope.fork(() -> fetchName(userId));
  Subtask<Integer> age = scope.fork(() -> fetchAge(userId));

  scope.join();                // wait for both (or first failure)
  scope.throwIfFailed();       // propagate exception if any failed

  return new User(name.get(), age.get());
}

What the scope guarantees:

  • fork() starts a subtask (on a virtual thread by default).
  • join() waits for all to complete (or joinUntil(deadline)).
  • On exception or timeout: the scope cancels all remaining subtasks (Thread.interrupt() to each).
  • On scope close (end of try-with-resources): close() forces cleanup of any still-running.

Policies

  • ShutdownOnFailure — cancel all siblings on the first failure. throwIfFailed() propagates the exception.
  • ShutdownOnSuccess — cancel all siblings on the first success. result() returns the winner's value. (Pattern: "first to respond wins" — redundant queries, race between primary and cache.)
  • Custom policies by subclassing StructuredTaskScope<T> and overriding handleComplete.

Example — first-of-N redundant query

java
try (var scope = new StructuredTaskScope.ShutdownOnSuccess<Quote>()) {
  for (Vendor v : vendors) scope.fork(() -> v.getQuote(req));
  scope.joinUntil(Instant.now().plusSeconds(2));
  return scope.result();                   // first successful quote; siblings cancelled
}

Why it matters with virtual threads

With virtual threads, it's cheap to fork many concurrent tasks — but without structured concurrency, they leak on failure. Structured concurrency makes parallelism safe at vthread scale: every forked task has a lifetime bounded by the enclosing block.

Comparison — imperative, futures, structured

java
// Imperative (serial)                       // latency = A + B
A a = fetchA();
B b = fetchB();

// CompletableFuture (parallel, unstructured) // latency = max(A,B), but leaks on failure
CompletableFuture<A> fa = CompletableFuture.supplyAsync(this::fetchA);
CompletableFuture<B> fb = CompletableFuture.supplyAsync(this::fetchB);
A a = fa.join();
B b = fb.join();

// StructuredTaskScope (parallel, structured) // latency = max(A,B), safe on failure
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
  var fa = scope.fork(this::fetchA);
  var fb = scope.fork(this::fetchB);
  scope.join(); scope.throwIfFailed();
  return new Result(fa.get(), fb.get());
}

The structured version is the best of all worlds: parallel, safe, cancelable, readable as a block of code.

Q&A

  1. Q: What problem does structured concurrency solve?

    • The lack of parent-child lifetime ties in unstructured concurrent code. Forked tasks can leak or continue after the "parent" context has moved on or failed. Structured scopes ensure subtasks complete (or are cancelled) before the enclosing block exits.
  2. Q: What does ShutdownOnFailure do?

    • As soon as any subtask fails, all other running subtasks are interrupted. scope.join() returns, and throwIfFailed() re-raises the root failure.
  3. Q: Why does it pair so well with virtual threads?

    • Virtual threads make forking cheap. Structured concurrency makes forking safe. Together: you can fan out to many subtasks with confidence they'll be cleaned up on failure.
  4. Q: Is it stable yet?

    • No — still a preview API through Java 25. The API has shifted across versions. Know the concept; use --enable-preview if adopting.

16. Scoped values

Preview from Java 20 (JEPs 429, 446, 464, 481 across Java 20-23; JEP 487 in 24), finalized in Java 25 (JEP 506). Replacement for ThreadLocal in many use cases, especially when using virtual threads.

Why ThreadLocal is problematic at scale

  • Mutable. ThreadLocal can be set and re-set, making the data flow opaque.
  • Unbounded lifetime. Data lives until the thread dies — in a thread pool, until the pool is shut down. Forgetting to remove() is a memory leak.
  • Per-thread storage. With millions of virtual threads, per-vthread ThreadLocals multiply memory usage.
  • InheritableThreadLocal — child threads copy the parent's values at construction time. Unreliable with async code that hops threads, and the copies cost memory per child.

The scoped-values model

  • Immutable. A scoped value is bound once at a well-defined point and cannot be changed within the scope.
  • Bounded scope. Valid only within the lambda passed to ScopedValue.runWhere() / callWhere() (in the finalized Java 25 API: ScopedValue.where(SV, v).run(...)). Outside = not bound.
  • Efficient inheritance in structured concurrency — child tasks see the parent's bindings without copying.
java
private static final ScopedValue<User> CURRENT_USER = ScopedValue.newInstance();

public void handleRequest(Request req) {
  User authenticated = authenticate(req);
  ScopedValue.runWhere(CURRENT_USER, authenticated, () -> {
    processRequest(req);
  });
}

void processRequest(Request req) {
  User u = CURRENT_USER.get();          // available throughout call tree
  audit(u, req);
}

Migration from ThreadLocal

| ThreadLocal | ScopedValue |
| --- | --- |
| TL.set(v) / TL.get() / TL.remove() | ScopedValue.runWhere(SV, v, () -> ...) + SV.get() inside the lambda |
| Mutable, manually cleared | Immutable, auto-cleared at end of lambda |
| Any thread, any time | Only within the scope block |
| Value per thread | Value per scope instance, shared safely |

For request-scoped context (user, correlation ID, tenant), scoped values are simpler and safer than ThreadLocal.

Q&A

  1. Q: Why are scoped values preferred over ThreadLocal with virtual threads?

    • ThreadLocals are per-thread — with millions of vthreads, the memory adds up. Scoped values are scope-bounded, immutable, cheap to inherit in structured concurrency, and can't leak.
  2. Q: Can you mutate a scoped value?

    • No — it's immutable within its scope. If you need a different value, rebind with another runWhere (which establishes a nested scope).
  3. Q: Are scoped values stable yet?

    • Preview through Java 24; finalized in Java 25 (JEP 506). On earlier versions, use --enable-preview to experiment.

Part 6 — Patterns, problems, pitfalls

17. Common concurrency problems

Race condition

Two or more threads access shared state, and the final state depends on interleaving. The interleaving can happen below the level of a single source statement (e.g., the separate load/add/store steps of i++).

Example:

java
class Counter {
  int count;
  public void inc() { count++; }       // RACE
  public int get() { return count; }   // STALE / VISIBILITY
}

Fixes (pick one):

  • AtomicInteger count; count.incrementAndGet();
  • synchronized on inc() and get()
  • private final Lock lock = new ReentrantLock(); + explicit lock in both

Check-then-act races are a subset:

java
if (!cache.containsKey(k)) cache.put(k, expensive());  // RACE — two threads both enter
cache.computeIfAbsent(k, key -> expensive());          // atomic
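A runnable sketch (names ours) showing that computeIfAbsent closes the check-then-act window: many racing threads, but the mapping function runs exactly once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeIfAbsentDemo {
  // N threads race to populate the same key; count how often the function runs
  static int timesComputed(int threads) throws InterruptedException {
    ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    AtomicInteger computations = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);   // start gate: maximize contention
    CountDownLatch done = new CountDownLatch(threads);

    for (int i = 0; i < threads; i++) {
      new Thread(() -> {
        try {
          start.await();
          cache.computeIfAbsent("k", key -> {
            computations.incrementAndGet();         // the "expensive()" work
            return "value";
          });
        } catch (InterruptedException ignored) {
        } finally {
          done.countDown();
        }
      }).start();
    }
    start.countDown();
    done.await();
    return computations.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(timesComputed(32)); // 1: the check-then-act is atomic
  }
}
```

The naive containsKey/put version run the same way would report more than one computation on most runs.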

Deadlock

Two or more threads each hold a resource the other needs. Nobody progresses.

Coffman's four necessary conditions (all must hold for deadlock; remove any one to prevent):

  1. Mutual exclusion — resources can't be shared.
  2. Hold and wait — a thread holds one resource while waiting for another.
  3. No preemption — the OS can't forcibly take a lock from a thread.
  4. Circular wait — a cycle in the "waiting for" graph.

Classic deadlock:

java
Object a = new Object();
Object b = new Object();

// Thread 1
synchronized (a) {
  synchronized (b) { /* ... */ }
}

// Thread 2
synchronized (b) {
  synchronized (a) { /* ... */ }
}

T1 holds a, waits for b. T2 holds b, waits for a. Deadlock.

Prevention strategies:

  • Lock ordering β€” impose a global order on locks (e.g., by System.identityHashCode, account ID, etc.) and always acquire in that order. Breaks circular wait.
    java
    void transfer(Account from, Account to, long amount) {
      Account first = from.id < to.id ? from : to;
      Account second = from.id < to.id ? to : from;
      synchronized (first.lock) {
        synchronized (second.lock) { /* ... */ }
      }
    }
  • tryLock with timeout β€” if you can't acquire, release what you have and back off.
    java
    if (fromLock.tryLock(1, SECONDS)) {
      try {
        if (toLock.tryLock(1, SECONDS)) {
          try { /* transfer */ }
          finally { toLock.unlock(); }
        } else { /* abort or retry */ }
      } finally { fromLock.unlock(); }
    }
  • Fewer locks β€” coarser locking reduces surface area. Or use lock-free data structures.

Detection from a thread dump (jstack -l <pid>): look for "Found one Java-level deadlock" — the JVM automatically reports deadlocks on intrinsic locks and ReentrantLocks.

Livelock

Threads aren't blocked, but they're making no useful progress — each reacts to the other's state and keeps changing. Example: two threads politely backing off when they see the other trying, so neither ever gets to go.

Fix: introduce randomness (jittered backoff), or use fairness.

Starvation

One thread never gets resources because others are continually granted them. Often a fairness problem.

  • Unfair locks / semaphores β€” a thread can keep losing the race to acquire.
  • Priority scheduling β€” low-priority threads starved by high-priority ones.
  • Read-write locks β€” a steady stream of readers can starve writers (or vice versa).

Fix: fair locks (new ReentrantLock(true), new Semaphore(n, true)), or explicit scheduling.

Priority inversion

A low-priority thread holds a lock that a high-priority thread needs. A medium-priority thread monopolizes the CPU, blocking the low-priority one, which in turn blocks the high-priority one. This caused the Mars Pathfinder lander to reset repeatedly.

Fix: priority inheritance (the lock holder temporarily inherits the waiter's priority). Java doesn't provide this directly; typically not an issue because you rarely rely on Java thread priorities.

Thread leak ​

A thread that will never terminate and can't be reclaimed. Typical causes:

  • Pool never shut down.
  • Thread stuck forever on uninterruptible blocking (e.g., socket.read() without a timeout).
  • Task that catches InterruptedException and continues.

Symptoms: thread count grows over time; heap dump shows many Thread instances; JVM never exits.

Fix: timeouts on I/O, honor interruption, shutdown() pools on app exit, monitor thread count.
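A sketch of a leak-proof worker loop (names are illustrative): it uses a timed `poll` instead of an untimed `take()`, and treats interruption as a shutdown signal instead of swallowing it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class DrainWorker {
  // A worker that cannot leak: bounded waits instead of untimed take(),
  // and interruption causes the loop to exit rather than being swallowed.
  public static Thread start(BlockingQueue<Runnable> queue) {
    Thread t = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          Runnable job = queue.poll(200, TimeUnit.MILLISECONDS); // bounded wait
          if (job != null) job.run();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // restore the flag; loop condition exits
        }
      }
    }, "drain-worker");
    t.start();
    return t;
  }
}
```

Calling `interrupt()` on this thread guarantees it terminates within one poll interval — the anti-pattern it avoids is `catch (InterruptedException e) { /* ignore */ }` inside the loop.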

Lost update ​

Classic read-modify-write race. Two threads read the same value, each modifies independently, each writes back. The second write overwrites the first.

Fix: atomics, locks, or optimistic concurrency (version columns, CAS).
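A small demonstration of the race and the fix (class name is illustrative): the plain `int` increment compiles to load-add-store and loses updates under contention; the `AtomicInteger` CAS never does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdateDemo {
  static int plain = 0;                                     // racy read-modify-write
  static final AtomicInteger atomic = new AtomicInteger();  // CAS-based, no lost updates

  public static void run(int threads, int perThread) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        for (int i = 0; i < perThread; i++) {
          plain++;                     // load, add, store -- updates can be lost
          atomic.incrementAndGet();    // single atomic CAS -- never lost
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(60, TimeUnit.SECONDS);
  }
}
```

After running, `atomic` always equals `threads × perThread`; `plain` is usually less.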

ThreadLocal leak in thread pools ​

java
private static final ThreadLocal<LargeObject> CACHE = new ThreadLocal<>();

pool.submit(() -> {
  CACHE.set(buildLargeObject());
  // forget to remove β†’ LargeObject stays pinned to this thread forever
});

Pool threads live for the lifetime of the app. Anything stored in ThreadLocal without remove() is a memory leak across the pool's lifetime.

Fix: always try { ... } finally { CACHE.remove(); }. Or use Scoped Values (Β§16).

Reading a deadlock from a thread dump ​

"Thread-1" #12 prio=5 os_prio=0 tid=0x... nid=0x...
   java.lang.Thread.State: BLOCKED (on object monitor)
   at com.example.Bank.transfer(Bank.java:20)
   - waiting to lock <0x...> (a java.lang.Object)
   - locked <0x...>              (a java.lang.Object)

"Thread-2" #13 prio=5 os_prio=0 tid=0x... nid=0x...
   java.lang.Thread.State: BLOCKED (on object monitor)
   at com.example.Bank.transfer(Bank.java:20)
   - waiting to lock <0x...> (a java.lang.Object)
   - locked <0x...>              (a java.lang.Object)

Found one Java-level deadlock:
...

Clues:

  • State BLOCKED (on object monitor) β€” waiting for a synchronized lock.
  • waiting to lock X paired with locked Y β€” cycle with another thread's waits.
  • JVM's automatic deadlock detection prints the "Found one Java-level deadlock" summary for intrinsic locks and Java-level locks.

Q&A ​

  1. Q: What are the four conditions for deadlock?

    • Mutual exclusion, hold-and-wait, no preemption, circular wait. All four must hold.
  2. Q: How do you prevent deadlock?

    • Most common: enforce a global lock ordering so circular wait is impossible. Backup: tryLock with timeout + backoff. Better: redesign to acquire fewer locks (coarser grain, or lock-free data structures).
  3. Q: Race condition vs data race?

    • Data race = two threads accessing the same non-volatile field, at least one writing, no happens-before edge β€” undefined behavior per JMM. Race condition = general β€” outcome depends on interleaving (may or may not involve a data race).
  4. Q: What's livelock?

    • Threads respond to each other's state and change, but make no forward progress (retry loops, politeness). Fix with randomized backoff or fairness.
  5. Q: What causes thread leaks?

    • Pools not shut down, uninterruptible blocking without timeout, swallowed InterruptedException, long-running daemons missed on graceful shutdown. Detect by monitoring thread counts.
  6. Q: Why do ThreadLocals leak in thread pools?

    • Pool threads never die, so values set via ThreadLocal.set() persist indefinitely. Always remove() in a finally, or use Scoped Values.
  7. Q: How do you diagnose a suspected deadlock in production?

    • Thread dump (jstack -l <pid> or jcmd <pid> Thread.print). Look for BLOCKED threads and cross-referenced locks. The JVM auto-reports intrinsic-lock deadlocks at the end of the dump.

18. Concurrency patterns ​

Immutability ​

An immutable object is inherently thread-safe β€” no writes means no races. Build immutability into your domain model:

  • final class (or sealed), all final fields.
  • No setters.
  • Defensive copies on input (arrays, collections).
  • No leak of this from constructor.

Records (Java 16+; previewed in 14–15) are immutable by construction:

java
public record Money(long amountCents, Currency currency) {}

If you need "mutation," return a new instance:

java
public record Order(List<Item> items, Status status) {
  public Order withStatus(Status newStatus) { return new Order(items, newStatus); }
}

When to use: value objects, configs, event messages β€” basically anywhere you don't need mutable state.

Thread confinement ​

Data that's only ever accessed from one thread needs no synchronization. Several forms:

  • Stack confinement β€” local variables. Trivially thread-safe.
  • Thread confinement via ThreadLocal β€” e.g., SimpleDateFormat (not thread-safe) wrapped per-thread.
  • Serial thread confinement β€” an object is handed off from one thread to another via a safe publication mechanism (queue), and only the current owner accesses it. Used in pipelines.
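The `ThreadLocal` form as a sketch (class name is illustrative): each thread lazily gets its own `SimpleDateFormat`, so the non-thread-safe object is never shared.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ConfinedFormat {
  // SimpleDateFormat is mutable and not thread-safe; confine one instance
  // per thread instead of sharing a single instance behind a lock.
  private static final ThreadLocal<SimpleDateFormat> FMT = ThreadLocal.withInitial(() -> {
    SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
    f.setTimeZone(TimeZone.getTimeZone("UTC"));
    return f;
  });

  public static String format(Date d) {
    return FMT.get().format(d); // no synchronization needed: per-thread instance
  }
}
```

In pooled threads, remember `remove()` (Β§17's ThreadLocal-leak pitfall) — or sidestep the whole issue with java.time's immutable, thread-safe `DateTimeFormatter`.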

Monitor pattern ​

Class encapsulates mutable state, all access guarded by the same lock. Classic OO concurrency:

java
public class Counter {
  private int count;
  public synchronized void inc() { count++; }
  public synchronized int get() { return count; }
}

Simple and safe, but coarse-grained β€” high contention kills throughput. Move to finer-grained locks or atomics for hot paths.
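For a hot path like this counter, the monitor can be dropped entirely — a sketch of the same class on an atomic:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
  private final AtomicInteger count = new AtomicInteger();

  public void inc() { count.incrementAndGet(); } // single CAS, no monitor acquired
  public int get() { return count.get(); }
}
```

Same thread-safety guarantee, but contended increments spin on CAS instead of blocking — usually much cheaper until contention gets extreme (then see `LongAdder`, Β§19).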

Lock striping ​

Instead of one lock for a whole data structure, maintain N locks and hash keys to a lock. Reduces contention:

java
class StripedCache<K, V> {
  private static final int STRIPES = 16;
  private final Object[] locks = new Object[STRIPES];
  private final Map<K, V>[] maps;

  @SuppressWarnings("unchecked")
  StripedCache() {
    maps = (Map<K, V>[]) new Map[STRIPES];
    for (int i = 0; i < STRIPES; i++) { locks[i] = new Object(); maps[i] = new HashMap<>(); }
  }

  private int stripe(K k) { return (k.hashCode() & 0x7fffffff) % STRIPES; }

  public V get(K k) {
    int s = stripe(k);
    synchronized (locks[s]) { return maps[s].get(k); }
  }

  public void put(K k, V v) {
    int s = stripe(k);
    synchronized (locks[s]) { maps[s].put(k, v); }
  }
}

This is exactly what pre-Java-8 ConcurrentHashMap did (with 16 stripes). Post-Java 8 it's per-bucket; same idea at finer granularity.

Read-write lock pattern ​

Cache-style: many readers, occasional writers. Use ReentrantReadWriteLock or StampedLock (Β§6). Only helps when reads hold the lock long enough to parallelize usefully.
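A minimal cache sketch of the pattern (class name is illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwCache<K, V> {
  private final Map<K, V> map = new HashMap<>();
  private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

  public V get(K key) {
    rw.readLock().lock();            // many readers may hold this simultaneously
    try { return map.get(key); }
    finally { rw.readLock().unlock(); }
  }

  public void put(K key, V value) {
    rw.writeLock().lock();           // exclusive: blocks both readers and writers
    try { map.put(key, value); }
    finally { rw.writeLock().unlock(); }
  }
}
```

For lookups this fast, a plain `ConcurrentHashMap` would likely beat it — the read-write lock earns its keep when each read does substantial work under the lock.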

Producer-consumer / bounded buffer ​

Covered in Β§9. Core pattern for decoupling rate-of-production from rate-of-consumption, and for backpressure via bounded queue size.

Future / promise ​

Decouple "task kicked off" from "result retrieved." Java's Future, CompletableFuture, and StructuredTaskScope.Subtask all instantiate this.

Thread pool ​

Decouple task submission from execution. Centralizes lifecycle and limits resource use. The bedrock of server-side Java. Β§10.

Half-sync / half-async, leader/follower ​

  • Half-sync / half-async: I/O happens async (event loop, NIO selector) but business logic is handled synchronously on worker threads. Netty, Vert.x. Virtual threads collapse these back into plain sync.
  • Leader/follower: one thread at a time blocks on incoming work; on receiving, promotes a follower to leader before processing. Reduces handoff overhead. Apache Thrift, some RPC frameworks.

Mostly historical / implementation-level β€” good to recognize in a design discussion.

Actor-lite / Disruptor ​

  • Actor model (Akka): each actor owns private state, processes messages serially from a mailbox. Concurrency via no shared state.
  • Disruptor (LMAX): ring-buffer-based high-throughput inter-thread messaging; single producer, multiple consumers; minimizes GC and contention. Used in financial exchanges.

Niche but worth knowing they exist.

Q&A ​

  1. Q: Why is immutability a concurrency strategy?

    • Immutable objects can be shared across threads without synchronization. No writes means no races. Combine with safe publication and you're done.
  2. Q: What's lock striping?

    • Instead of one lock on a data structure, partition it into N independent regions each with its own lock. Reduces contention; ConcurrentHashMap pre-Java-8 used 16 stripes.
  3. Q: What's thread confinement?

    • Restrict an object to a single thread at a time, eliminating the need for synchronization. Stack variables are trivially confined; ThreadLocal is explicit confinement.
  4. Q: When does a Read-Write lock help?

    • Read-dominant, where reads take long enough for shared-read parallelism to pay off. For short ops or write-heavy, a single mutex or ConcurrentHashMap is faster.

19. Performance and tuning ​

Lock contention β€” measuring, not guessing ​

You can't optimize what you don't measure. Tools:

  • JFR (Java Flight Recorder) β€” built into JVM, low overhead, emits events for thread park, monitor wait, lock contention. -XX:StartFlightRecording=duration=60s,filename=rec.jfr. Open in JMC.
  • async-profiler — -e lock mode records lock-contention stack traces. Low overhead.
  • jstack / jcmd Thread.print β€” point-in-time snapshot of where threads are blocked.

False sharing ​

Two unrelated variables on the same cache line (typically 64 bytes on x86). Writes to either invalidate the line for everyone, causing cross-core cache-line bouncing even though the writes are logically independent.

Symptom: parallel code that scales poorly; profiling shows high L2/L3 miss rate.

Fix: pad variables to separate cache lines. Two Java options:

  1. Manual padding:
    java
    class Padded {
      long p1, p2, p3, p4, p5, p6, p7; // padding
      volatile long value;
      long p9, p10, p11, p12, p13, p14, p15; // padding
    }
  2. @Contended (JDK-internal: user code needs -XX:-RestrictContended plus access to the jdk.internal.vm.annotation package):
    java
    @jdk.internal.vm.annotation.Contended
    long value;
    Used inside LongAdder, Thread's thread-local random number generator state, and other performance-critical JDK classes.

Real-world: AtomicLong is fine up to a few CPUs contending; beyond that, cache-line bouncing crushes it. LongAdder uses @Contended cells to avoid it.

Lock-free vs wait-free ​

  • Blocking β€” some thread holding a lock can block all others.
  • Lock-free β€” at least one thread is guaranteed to make progress in finite steps (no global blocking), but individual threads can starve under contention.
  • Wait-free β€” every thread is guaranteed to make progress in a bounded number of steps, regardless of contention.

ConcurrentLinkedQueue is lock-free. AtomicInteger.incrementAndGet via CAS loop is lock-free (a thread can keep losing the CAS). True wait-free algorithms exist but are rare and complex.

For application code: lock-free data structures (atomics, CHM) are usually enough. Wait-free is the domain of real-time/safety-critical code.
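The classic lock-free structure is the Treiber stack — a sketch showing the CAS retry loop that makes it lock-free but not wait-free:

```java
import java.util.concurrent.atomic.AtomicReference;

public class TreiberStack<T> {
  private record Node<E>(E value, Node<E> next) {}

  private final AtomicReference<Node<T>> head = new AtomicReference<>();

  // Lock-free push: if another thread wins the CAS we simply retry. The system
  // always makes progress (somebody's CAS succeeds), but an individual thread
  // can in principle retry forever under contention -- lock-free, not wait-free.
  public void push(T value) {
    Node<T> h;
    do { h = head.get(); } while (!head.compareAndSet(h, new Node<>(value, h)));
  }

  public T pop() {
    Node<T> h;
    do {
      h = head.get();
      if (h == null) return null;
    } while (!head.compareAndSet(h, h.next()));
    return h.value();
  }
}
```

No thread ever blocks holding a lock — a preempted thread can't stall the others, which is the property that distinguishes this from the monitor pattern.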

Context-switch cost ​

Every time the OS swaps a thread on/off a core: save registers, invalidate TLB entries (maybe), cold caches, scheduler bookkeeping. Micro-benchmarks measure single-digit microseconds on modern hardware, but the cache cold-start can extend the real cost significantly.

Implication: throwing more threads at work doesn't scale past a point β€” context-switch overhead eats the gain. This is one motivation for virtual threads (user-space "context switch" is much cheaper than kernel-level) and for async I/O (fewer threads, each doing more).

When concurrency hurts ​

  • Tasks shorter than the synchronization/enqueue overhead. A microsecond of work + a microsecond of ConcurrentQueue overhead = no speedup.
  • Shared mutable state β€” any serial point dominates (Amdahl's Law).
  • Heavy contention on a few locks.
  • Too many threads for too few cores β€” context-switch thrashing.

Rule: profile before parallelizing. Parallelism is a multiplier on correctly-designed code, not a fix for slow code.

Sizing pools β€” Goetz's formula recap ​

N_threads = N_cpus Γ— target_util Γ— (1 + wait_time / compute_time)
  • Measure wait vs compute in production (profiler, tracing).
  • Typical REST service hitting DB + cache + 1 downstream: wait/compute β‰ˆ 5–15 β†’ pool size β‰ˆ 5–15 Γ— cores.
  • For compute-heavy: size to CPU count.

Little's Law ​

L = Ξ» Γ— W

Mean concurrent items = arrival rate Γ— mean time per item. If you serve 500 req/s at 200 ms p50, you need at least 100 concurrent-in-flight slots. If your pool is smaller than that (and the queue is bounded), you'll reject.

Use this to size pools and queues before measuring.
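Both formulas as code, to make the arithmetic concrete (class and method names are illustrative):

```java
public class PoolSizing {
  // Goetz: N_threads = N_cpus * target_util * (1 + wait_time / compute_time)
  public static int goetz(int cpus, double targetUtil, double waitMs, double computeMs) {
    return (int) Math.ceil(cpus * targetUtil * (1 + waitMs / computeMs));
  }

  // Little's Law: L = lambda * W (mean in-flight = arrival rate * mean latency)
  public static int littlesLaw(double arrivalsPerSec, double meanLatencySec) {
    return (int) Math.ceil(arrivalsPerSec * meanLatencySec);
  }
}
```

For example, 8 cores at full utilization with 90 ms wait per 10 ms compute gives a pool of 80; 500 req/s at 200 ms latency needs 100 in-flight slots. If the Goetz number is smaller than the Little's Law number, the pool itself becomes the throughput bottleneck.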

Amdahl's Law recap ​

Serial fraction S β†’ maximum speedup = 1 / S even with infinite parallelism. If 10% of code is serial, cap is 10Γ—. Find and eliminate serial bottlenecks before scaling horizontally.

Q&A ​

  1. Q: What is false sharing and how do you fix it?

    • Two logically independent variables sharing a cache line; writes to either cause cross-core cache invalidation. Fix: pad them onto separate cache lines, or use @Contended where available.
  2. Q: What's the difference between lock-free and wait-free?

    • Lock-free: system-wide progress guaranteed (some thread always moves); individual threads may starve. Wait-free: every thread makes progress in a bounded number of steps.
  3. Q: How do you pick a thread pool size?

    • Goetz's formula: cores Γ— util Γ— (1 + W/C). Measure wait vs compute. Sanity-check with Little's Law against expected throughput Γ— latency.
  4. Q: When does parallel code hurt performance?

    • Tiny tasks (overhead dominates), heavy contention, too many threads for cores, serial-dominated workloads (Amdahl). Profile first.
  5. Q: Your service has 8 cores and serves 500 req/s at 100 ms each. How many concurrent requests do you need capacity for?

    • Little's Law: L = 500 Γ— 0.1 = 50 concurrent slots. Pool size should be β‰₯50 to avoid queueing becoming the bottleneck.
  6. Q: How do you measure lock contention in production?

    • JFR with lock-contention events, async-profiler in -e lock mode, thread dumps under load. Look for threads repeatedly BLOCKED or parked on the same monitor.

Part 7 β€” Testing, debugging, and Spring ​

20. Testing concurrent code ​

Why concurrency bugs are hard to test ​

  • Non-deterministic β€” a bug may manifest once in ten thousand runs. A green test run doesn't mean the bug is absent.
  • Heisenbugs β€” adding logging, debugger breakpoints, or test instrumentation can change timing and mask the bug.
  • Platform dependent β€” reordering allowed by the JMM may not happen on x86 (strong model) but does on ARM (weaker). A test passing locally can fail in prod.

Goal of concurrency testing: maximize likelihood of exercising the race by varying timing and load, rather than a single happy-path run.

Coordination primitives in tests ​

Most bugs surface when you force specific interleavings. Use:

  • CountDownLatch to coordinate "everyone ready, go!" timing:
    java
    @Test
    void concurrentIncrementsAreAtomic() throws InterruptedException {
      Counter c = new Counter();
      int n = 100;
      CountDownLatch start = new CountDownLatch(1);
      CountDownLatch done = new CountDownLatch(n);
      ExecutorService pool = Executors.newFixedThreadPool(n);
      for (int i = 0; i < n; i++) {
        pool.submit(() -> {
          try {
            start.await();                     // wait for the starting gun
            for (int j = 0; j < 1000; j++) c.inc();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            done.countDown();                  // count down even on failure
          }
        });
      }
      start.countDown();                       // everyone goes at once
      assertTrue(done.await(10, TimeUnit.SECONDS));
      assertEquals(n * 1000, c.get());         // if not atomic, will be less
      pool.shutdown();
    }
  • Phaser for multi-phase tests.
  • CyclicBarrier when phases are short and reusable.

Awaitility ​

DSL for polling with timeout β€” clean way to assert eventual conditions:

java
await().atMost(2, SECONDS).until(() -> queue.size() == 0);
await().atMost(500, MILLISECONDS).until(repository::findAll, hasSize(10));

Beats Thread.sleep(...) in tests β€” faster on success, more robust on failure.

Testing CompletableFuture ​

java
@Test
void asyncProcessor_returnsResult() throws Exception {
  CompletableFuture<String> cf = processor.processAsync("input");
  assertThat(cf.get(5, SECONDS)).isEqualTo("INPUT");   // bounded wait
}

@Test
void asyncProcessor_propagatesErrors() {
  CompletableFuture<String> cf = processor.processAsync(null);
  assertThatThrownBy(() -> cf.get(5, SECONDS))
      .isInstanceOf(ExecutionException.class)
      .hasCauseInstanceOf(IllegalArgumentException.class);
}

Always use timed get(timeout, unit) in tests — an untimed get() turns a failing test into an infinite hang.

Controllable executors for tests ​

In production code, accept an Executor or ExecutorService as a dependency. Pass a synchronous or controllable one in tests:

java
// Synchronous runner β€” runs tasks on the caller thread
Executor caller = Runnable::run;

// Or a single-threaded executor that the test shuts down and awaits between assertions

For Spring @Async, inject a SyncTaskExecutor in tests to make assertions deterministic.
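A sketch of the dependency-injection approach without Spring (class and method names are illustrative): the service takes any `Executor`, so a test can pass the caller-runs executor and assert synchronously.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executor;

public class AuditService {
  private final Executor executor;
  private final List<String> entries = Collections.synchronizedList(new ArrayList<>());

  public AuditService(Executor executor) { this.executor = executor; } // injected

  public void record(String event) {
    executor.execute(() -> entries.add(event)); // async in prod, sync in tests
  }

  public List<String> entries() { return List.copyOf(entries); }
}
```

In production, construct it with a real pool; in a test, `new AuditService(Runnable::run)` makes `record()` complete on the caller thread before it returns, so assertions need no latches or sleeps.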

jcstress β€” OpenJDK concurrency stress-testing ​

jcstress (Java Concurrency Stress tests) runs your code across many threads and iterations while capturing every outcome. It's the tool for testing JMM-level properties:

java
@JCStressTest
@Outcome(id = "1, 1", expect = Expect.ACCEPTABLE,   desc = "Both saw the write")
@Outcome(id = "0, 0", expect = Expect.FORBIDDEN,    desc = "Reordering β€” bug!")
@State
public class VolatilePublication {
  int x; volatile boolean ready;

  @Actor public void writer() { x = 1; ready = true; }
  @Actor public void reader(II_Result r) {
    r.r1 = ready ? 1 : 0;
    r.r2 = x;
  }
}

jcstress runs these actors concurrently across a huge number of iterations and reports every observed outcome. If a forbidden outcome like "1, 0" (ready seen, x stale) is ever observed β†’ bug. Best reserved for library-level JMM testing; overkill for business logic.

Load testing > unit testing for concurrency ​

Unit tests at most demonstrate correctness under a single run. For true confidence:

  • Stress the actual service under multi-user load (JMeter, Gatling, k6).
  • Run with assertions on invariants (e.g., no lost messages, counter values match).
  • Run with contention-aware profiling.
  • Chaos β€” fault injection, latency injection, slow consumer simulations.

Deterministic bugs β€” don't always need concurrency testing ​

Many "concurrency bugs" are actually design bugs. If your service is stateless and delegates all coordination to a queue/DB, concurrency testing is less valuable than basic integration testing against a real broker/DB (via Testcontainers).

πŸ’‘ Applied in practice: the "integration tests with real MongoDB via Testcontainers, no DB mocking" guideline is exactly this philosophy β€” test against the real thing, use jcstress/unit tests only for code that has inherent concurrency semantics.

Q&A ​

  1. Q: Why is concurrent code hard to test?

    • Non-determinism β€” the bug may appear in 1 run of 10,000. Adding logs/breakpoints changes timing and masks bugs. Platform-dependent JMM behavior. Green runs don't prove absence of bugs.
  2. Q: How do you coordinate threads in a test to exercise a race?

    • CountDownLatch gates: a "start" latch to fire all threads simultaneously, a "done" latch to collect. Sometimes Phaser for multi-phase. Avoid Thread.sleep as a timing primitive β€” use Awaitility.
  3. Q: What's jcstress?

    • OpenJDK's concurrency stress-test framework. Runs actors in every possible interleaving and classifies outcomes. Used for JMM-level tests (e.g., proving a data structure's reordering behavior).
  4. Q: Why always use timed Future.get(timeout) in tests?

    • Untimed get() hangs on a bug β€” the test never fails, it just stalls CI and wastes time. Timed get() fails loudly within the timeout.
  5. Q: How do you make Spring @Async deterministic in tests?

    • Inject a SyncTaskExecutor (executes on caller thread), or a controllable ThreadPoolTaskExecutor with a CountDownLatch hook. Sometimes override @Async's executor bean in a test config.

21. Debugging & troubleshooting ​

Thread dumps β€” your first tool ​

bash
jstack <pid>             # prints to stdout
jstack -l <pid>          # also prints locked synchronizers (AQS)
jcmd <pid> Thread.print  # modern equivalent, faster
kill -3 <pid>            # UNIX: sends SIGQUIT β†’ JVM prints thread dump to stdout/log

A thread dump is a point-in-time snapshot of every thread's state and stack. For any "application is hung" or "this request is slow" question, this is step 1.

Take multiple dumps 10–30s apart. If a thread is at the same stack across dumps, it's stuck; if it's at different stacks, it's doing work.

Thread states in a dump (review) ​

  • RUNNABLE β€” on CPU or ready. A thread at java.net.SocketInputStream.socketRead0 (native) is "RUNNABLE" but actually blocked in the kernel.
  • BLOCKED (on object monitor) β€” waiting for a synchronized lock.
  • WAITING (parking) β€” blocked in LockSupport.park, which underlies ReentrantLock, Condition.await, CountDownLatch.await, CompletableFuture.get, etc.
  • WAITING (on object monitor) β€” Object.wait().
  • TIMED_WAITING β€” timed versions of the above.

Deadlock detection ​

JVM auto-detects deadlocks on intrinsic locks and java.util.concurrent.locks locks; prints them at the end of the dump:

Found one Java-level deadlock:
=============================
"Thread-2":
  waiting to lock monitor 0x... (object 0x..., a Object),
  which is held by "Thread-1"
"Thread-1":
  waiting to lock monitor 0x... (object 0x..., a Object),
  which is held by "Thread-2"

Live detection: VisualVM / JConsole "Detect Deadlock" button hits the same JVM API.
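That API is `ThreadMXBean` — a sketch of programmatic detection (class and method names are illustrative), including a manufactured two-thread monitor deadlock to detect:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {
  // Programmatic version of VisualVM's "Detect Deadlock" button.
  // Returns null if no deadlock; covers monitors and ownable synchronizers.
  public static long[] findDeadlockedThreadIds() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    return mx.findDeadlockedThreads();
  }

  // Manufactures a guaranteed two-thread deadlock, then detects it.
  public static boolean demo() throws InterruptedException {
    Object a = new Object(), b = new Object();
    CountDownLatch bothHoldFirst = new CountDownLatch(2);
    Runnable[] tasks = {
        () -> { synchronized (a) { rendezvous(bothHoldFirst); synchronized (b) { } } },
        () -> { synchronized (b) { rendezvous(bothHoldFirst); synchronized (a) { } } }
    };
    for (Runnable r : tasks) {
      Thread t = new Thread(r);
      t.setDaemon(true);            // deadlocked daemons don't prevent JVM exit
      t.start();
    }
    for (int i = 0; i < 100; i++) { // poll: detection needs both threads blocked
      if (findDeadlockedThreadIds() != null) return true;
      Thread.sleep(50);
    }
    return false;
  }

  // Each thread holds its first lock, then waits until both do -- guaranteeing
  // the circular wait before either tries its second lock.
  private static void rendezvous(CountDownLatch latch) {
    latch.countDown();
    try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }
}
```

The same bean backs health checks: poll `findDeadlockedThreads()` periodically and alert when it returns non-null.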

Pool / thread-count growth ​

jcmd <pid> Thread.print | grep -c "java.lang.Thread.State"   # rough thread count

Monitor over time; unbounded growth = leak. Can also expose JVM thread count via Actuator (/actuator/metrics/jvm.threads.live).

JFR (Java Flight Recorder) ​

bash
# Start recording
jcmd <pid> JFR.start name=prof duration=60s filename=/tmp/rec.jfr

# Or on startup
java -XX:StartFlightRecording=duration=60s,filename=/tmp/rec.jfr MyApp

Open in JDK Mission Control (JMC). Relevant views:

  • Thread CPU β€” who's burning CPU, broken down by method.
  • Lock Contention β€” which monitors are most contended, with stack traces for acquirers and contenders.
  • Java Monitor Blocked β€” threads blocked on monitors.
  • Java Thread Park β€” threads parked (LockSupport) β€” ReentrantLocks, semaphores.
  • Virtual Thread Pinned β€” jdk.VirtualThreadPinned events (JDK 21+), emitted when a pinned virtual thread blocks longer than the configured threshold (20 ms in the default recording settings).

JFR is low-overhead (~1–2%) and can be left enabled in production for incident forensics.

async-profiler ​

Flame graphs of CPU or lock contention. Excellent complement to JFR:

bash
./profiler.sh -e cpu -d 60 -f /tmp/cpu.html <pid>      # CPU flame graph
./profiler.sh -e lock -d 60 -f /tmp/lock.html <pid>    # lock-contention flame graph

Lock-contention flame graphs show stack traces of threads that were blocked and stack traces of the holders β€” great for "who is holding X so long that I'm blocked?"

Heap dumps for thread leaks ​

bash
jcmd <pid> GC.heap_dump /tmp/heap.hprof

Open in Eclipse MAT. Navigate to java.lang.Thread instances, check if count matches expected pool sizes. Look at ThreadLocal maps for large retained sets.

Virtual thread pinning diagnostics ​

java -Djdk.tracePinnedThreads=full     # stack trace at every pin
java -Djdk.tracePinnedThreads=short    # just a summary

JFR emits jdk.VirtualThreadPinned events you can filter in JMC.

Q&A ​

  1. Q: How do you diagnose a hung JVM?

    • Multiple thread dumps 10–30s apart, compare. Look for BLOCKED/WAITING threads all stuck at the same stack. Check the JVM's auto-deadlock report. If not a deadlock: look for long-held locks, stuck I/O (timeouts?), or CPU spinning (profile with async-profiler).
  2. Q: A production pod's CPU is at 100% but throughput is terrible β€” your approach?

    • jstack for thread states; most likely many threads BLOCKED/WAITING and a few spinning. async-profiler CPU flame graph to find the hot method. If spinning in CAS loops, suspects: contention on atomics, tight retry loops, GC thrash (check GC logs).
  3. Q: How do you distinguish a deadlock from a livelock in a dump?

    • Deadlock: threads permanently stuck in BLOCKED/parked states; stacks don't change across dumps. Livelock: threads are RUNNABLE and stacks change β€” but no useful progress (counters don't advance). Both are "stuck" but only one is detectable from thread-state inspection.
  4. Q: What's -Djdk.tracePinnedThreads=full?

    • JVM flag that emits a stack trace each time a virtual thread is pinned to its carrier (synchronized blocking, native). Critical for diagnosing virtual-thread throughput issues.
  5. Q: How do you check thread count over time?

    • Actuator /actuator/metrics/jvm.threads.live or similar, or scrape jcmd Thread.print. Unbounded growth is a leak.

22. Spring and concurrency ​

@Async β€” proxy-based async ​

java
@Service
public class EmailService {
  @Async
  public CompletableFuture<Boolean> send(Email e) {
    smtpClient.send(e);
    return CompletableFuture.completedFuture(true);
  }
}

Requires @EnableAsync on a @Configuration class.

Proxy caveat β€” the classic gotcha: @Async is implemented via a Spring-created proxy. A call from inside the same bean bypasses the proxy, so it runs synchronously:

java
@Service
public class OrderService {
  public void processOrder(Order o) {
    // ...
    notify(o);   // INTRA-CLASS β€” bypasses proxy, NOT async!
  }
  @Async
  public void notify(Order o) { /* ... */ }
}

Fix: split into two beans, inject the other:

java
@Service
public class OrderService {
  private final NotifyService notify;
  public OrderService(NotifyService notify) { this.notify = notify; }
  public void processOrder(Order o) { notify.send(o); }  // proxied call
}

@Service
public class NotifyService {
  @Async public void send(Order o) { /* ... */ }
}

Same gotcha applies to @Transactional, @Cacheable, @Retryable β€” any AOP annotation.

Also: @Async methods must be public, return void, Future, or CompletableFuture. Private and package-private methods aren't proxied.

Configuring the @Async executor ​

Without configuration, plain Spring Framework falls back to SimpleAsyncTaskExecutor β€” a new thread per task. Spring Boot auto-configures a ThreadPoolTaskExecutor (applicationTaskExecutor: 8 core threads, unbounded queue), which is safer but still worth tuning. Do not ship to production with an unbounded default.

java
@Configuration
@EnableAsync
public class AsyncConfig implements AsyncConfigurer {
  @Override public Executor getAsyncExecutor() {
    ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
    exec.setCorePoolSize(8);
    exec.setMaxPoolSize(32);
    exec.setQueueCapacity(200);
    exec.setThreadNamePrefix("app-async-");
    exec.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    exec.initialize();
    return exec;
  }

  @Override public AsyncUncaughtExceptionHandler getAsyncUncaughtExceptionHandler() {
    return (ex, method, params) -> log.error("@Async failed: {}", method, ex);
  }
}

Uncaught exceptions: for @Async methods returning void, exceptions go to the AsyncUncaughtExceptionHandler. For CompletableFuture returns, they go to the future.

@Async with virtual threads (Spring Boot 3.2+) ​

properties
spring.threads.virtual.enabled=true

Spring then uses a virtual-thread-per-task executor for @Async, Tomcat request handling, and scheduling. Be aware of pinning (Β§14).

You can also explicitly provide a TaskExecutor:

java
@Bean
public AsyncTaskExecutor applicationTaskExecutor() {
  return new TaskExecutorAdapter(Executors.newVirtualThreadPerTaskExecutor());
}

@Scheduled and the TaskScheduler ​

java
@Scheduled(fixedRate = 5000)
public void poll() { /* ... */ }

@Scheduled(cron = "0 */10 * * * *", zone = "UTC")
public void every10Minutes() { /* ... */ }

Requires @EnableScheduling (your productivity app has this). Key attributes:

  • fixedRate β€” start an invocation every N ms, measured start-to-start. Invocations of the same method never overlap; if a run exceeds the rate, later runs fire back-to-back.
  • fixedDelay β€” N ms after completion before next.
  • initialDelay β€” first run offset.
  • cron β€” cron expression.

Default scheduler has a single thread. If one task is slow, others block. Configure a sized scheduler:

java
@Bean
public TaskScheduler taskScheduler() {
  ThreadPoolTaskScheduler s = new ThreadPoolTaskScheduler();
  s.setPoolSize(4);
  s.setThreadNamePrefix("scheduler-");
  return s;
}

@Transactional + @Async ​

Each runs on its own thread; transaction context does not propagate automatically across thread boundaries. An @Async method annotated with @Transactional starts a new transaction on the async thread. The caller's transaction neither includes nor rolls back because of it.

Practical consequence: if your flow is "do DB work, then kick off an async email send," the email failure can't roll back the DB. Treat them as independent boundaries.

If you need outbox semantics (atomic DB + message), use the outbox pattern (Β§17 of INTERVIEW_PREP.md #5): write a row to an outbox table inside the main transaction, have a separate scheduler drain it. This is the same pattern you use for Kafka-producer reliability.

Controller return types for async ​

Spring MVC supports several:

  • Callable<T> β€” controller returns Callable; Spring runs it on a TaskExecutor, releases the servlet thread. Good for "long compute but still sync-style."
    java
    @GetMapping("/slow") public Callable<String> slow() { return () -> compute(); }
  • DeferredResult<T> β€” controller returns immediately; completion triggered externally (e.g., from a message handler, webhook, or another thread). Good for "wait until something happens."
    java
    @GetMapping("/wait") public DeferredResult<Event> wait(String id) {
      DeferredResult<Event> r = new DeferredResult<>(10_000L); // 10s timeout
      pendingRequests.put(id, r);
      return r;
    }
    // Elsewhere:  pendingRequests.remove(id).setResult(event);
  • CompletableFuture<T> β€” natural fit for async composition.
  • ResponseBodyEmitter / SseEmitter β€” streaming responses / server-sent events.

All four release the servlet thread while waiting β€” critical for high-concurrency APIs on a limited Tomcat thread pool.

Spring WebFlux / Reactor (brief) ​

Reactive stack built on Project Reactor (Mono<T>, Flux<T>). Non-blocking by design β€” backpressure, functional composition. Historically the answer to "scale to many concurrent connections without huge thread counts."

With virtual threads, the main motivation for WebFlux (thread-count scaling) largely goes away for typical imperative workloads. WebFlux still wins for:

  • Streaming data (SSE, WebSocket).
  • Reactive all the way down (reactive DB driver, reactive message consumer).
  • Explicit backpressure semantics.

For most Spring shops, the pragmatic choice is: Spring MVC + virtual threads unless you have a specific reactive driver requirement.

Q&A ​

  1. Q: What's the @Async self-invocation pitfall?

    • Intra-bean calls (this.foo() inside the same class) bypass the Spring proxy, so @Async (and @Transactional) don't apply. Fix: move the annotated method to another bean.
  2. Q: What's the default @Async executor and why is it dangerous?

    • Plain Spring: SimpleAsyncTaskExecutor β€” spawns a new thread per task, so thread count explodes under load. Spring Boot: an auto-configured ThreadPoolTaskExecutor with an unbounded queue. Either way, configure explicit bounds and a rejection policy.
  3. Q: Does @Transactional propagate across @Async?

    • No β€” different thread, different transaction. If you need atomicity across the boundary, use the outbox pattern.
  4. Q: Callable vs DeferredResult in a controller β€” which do you pick?

    • Callable when the async work is internal to your service (submit to a pool). DeferredResult when the completion comes from outside (a message handler, another API, a webhook).
  5. Q: Spring Boot 3.2+ β€” how do you enable virtual threads?

    • spring.threads.virtual.enabled=true. Tomcat, @Async, scheduling all switch to virtual-thread-per-task. Watch out for pinning if you rely heavily on synchronized.
  6. Q: When would you choose WebFlux over MVC in 2026?

    • Streaming/SSE/WebSocket workloads, full-reactive stacks (reactive DB drivers), or explicit backpressure needs. For typical sync-style services, MVC + virtual threads is the pragmatic choice.
  7. Q: Your @Scheduled task takes 30 seconds occasionally, scheduled every 10s. What happens?

    • Default single-thread scheduler: new invocations queue up behind the slow one. Configure a pool-sized TaskScheduler, or use fixedDelay so the next invocation is only scheduled after completion.

Part 8 — Quick reference

23. Glossary

  • Atomic operation — indivisible from other threads' perspective; appears to happen all-or-nothing.
  • ABA problem — value changes A→B→A; naive CAS doesn't notice. Fix with versioning.
  • Acquire / release semantics — memory-ordering semantics of a volatile read / volatile write respectively. Operations after an acquire cannot be reordered before it; operations before a release cannot be reordered after it.
  • Barrier (memory) — CPU instruction preventing reordering across a point.
  • Blocking — the state of a thread that cannot make progress without an external event (lock released, data available).
  • Carrier thread — a platform thread inside the virtual-thread scheduler that runs virtual threads.
  • CAS (compare-and-swap) — atomic: set to new if value equals expected.
  • Condition variable — primitive for threads to wait for a predicate; Condition or Object.wait.
  • Context switch — OS swapping one thread off a core for another; has significant cache/TLB cost.
  • Continuation — saved state of a virtual thread (stack + locals) so it can be unmounted and resumed.
  • Critical section — code protected by a lock; only one (or N) threads at a time.
  • Deadlock — mutual blocking via a circular wait chain.
  • Fairness — FIFO ordering of waiters vs. unfair/barging, where a newly arriving thread may acquire ahead of queued waiters.
  • Fork/Join — recursive divide-and-conquer using work-stealing deques.
  • Happens-before — JMM relation: if A hb B, A is ordered before and visible to B.
  • Heisenbug — a bug that changes behavior when observed (logs, breakpoints).
  • Intrinsic lock — the monitor every Java object has; acquired by synchronized.
  • Livelock — threads keep changing state but make no progress.
  • Lock-free — system-wide progress guaranteed (some thread always moves).
  • Monitor — intrinsic lock + condition variable bundled; every object has one.
  • Mutex — mutual-exclusion lock (vs. a semaphore, which may allow N).
  • Parking — putting a thread to sleep without releasing locks; LockSupport.park.
  • Pinning — a virtual thread stuck on its carrier because it cannot be unmounted (inside synchronized pre-Java 24, or native code).
  • Platform thread — traditional OS-backed Java thread (1:1 with a kernel thread).
  • Publication (safe) — making an object visible to other threads with its fields initialized.
  • Race condition — outcome depends on thread interleaving.
  • Race (data) — unsynchronized concurrent access with at least one write and no happens-before ordering; the JMM gives only weak guarantees about what such reads observe.
  • Reentrant lock — the same thread can re-acquire a lock it already holds.
  • Semaphore — N permits, bounded resource access.
  • Spurious wakeup — wait() returning without a matching notify. Always loop.
  • Starvation — a thread indefinitely denied resources by others.
  • Stop-the-world (STW) — GC phase during which all app threads are paused.
  • Structured concurrency — parent-child task lifetimes bounded to a lexical scope.
  • Virtual thread — JVM-managed lightweight thread, cheap to create, runs on carriers.
  • Volatile — field modifier giving visibility and ordering but not atomicity of compound ops.
  • Wait-free — every thread makes progress in bounded steps regardless of contention.
  • Work-stealing — idle workers steal tasks from busy workers' deques.

24. Decision tables

Which lock?

| Situation | Pick |
| --- | --- |
| Simple mutual exclusion, short section | synchronized |
| Need tryLock, timeout, interruptibility, fairness | ReentrantLock |
| Read-heavy, many concurrent readers | ReentrantReadWriteLock — or ConcurrentHashMap |
| Read-very-heavy, short writes, not reentrant | StampedLock with optimistic reads |
| Per-partition locking on a map | ConcurrentHashMap (built-in striping) |
| Lock across methods (non-block-structured) | ReentrantLock (explicit acquire/release) |
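
The optimistic-read idiom in the StampedLock row is the part candidates fumble: read without a lock, validate the stamp, and fall back to a real read lock if a write intervened. A minimal sketch, assuming a hypothetical Point class (not from any library):

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical Point class illustrating the StampedLock optimistic-read idiom.
class Point {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();           // exclusive write lock
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();   // no lock acquired, just a stamp
        double curX = x, curY = y;               // optimistic reads
        if (!lock.validate(stamp)) {             // a write intervened: fall back
            stamp = lock.readLock();
            try {
                curX = x;
                curY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(curX * curX + curY * curY);
    }
}

public class StampedLockDemo {
    public static void main(String[] args) {
        Point p = new Point();
        p.move(3, 4);
        System.out.println(p.distanceFromOrigin()); // prints 5.0
    }
}
```

Note the fallback copies the fields again under the read lock; reusing the optimistically read values after a failed validate would be the classic bug.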

Which queue?

| Situation | Pick |
| --- | --- |
| Bounded producer-consumer, low-ish contention | ArrayBlockingQueue |
| Bounded or unbounded, high mixed contention | LinkedBlockingQueue |
| Priority-ordered tasks | PriorityBlockingQueue |
| Direct handoff, no queuing | SynchronousQueue |
| Time-based scheduling / retries | DelayQueue |
| Handoff if possible, queue if not | LinkedTransferQueue |
| Non-blocking FIFO, never block on enqueue | ConcurrentLinkedQueue |
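
The ArrayBlockingQueue row is the canonical producer-consumer choice; the bounded queue gives backpressure (put blocks when full) and a poison pill gives clean shutdown. A minimal sketch, with illustrative names:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Bounded producer-consumer: the queue capacity is the backpressure limit,
// and a sentinel "poison pill" tells the consumer to stop.
public class ProducerConsumerDemo {
    private static final String POISON = "POISON";

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 10; i++) {
                    queue.put("item-" + i);       // blocks when the queue is full
                }
                queue.put(POISON);                // signal end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                int consumed = 0;
                while (true) {
                    String item = queue.take();   // blocks when the queue is empty
                    if (item.equals(POISON)) break;
                    consumed++;
                }
                System.out.println("consumed=" + consumed);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```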

Which executor?

| Situation | Pick |
| --- | --- |
| Production workload, generic | Hand-rolled ThreadPoolExecutor with bounded queue + CallerRunsPolicy |
| I/O-bound, Java 21+ | Executors.newVirtualThreadPerTaskExecutor() |
| CPU-bound divide-and-conquer | ForkJoinPool (or commonPool if non-blocking) |
| Scheduled / periodic | ScheduledThreadPoolExecutor (or Spring TaskScheduler) |
| Single serial executor | Executors.newSingleThreadExecutor (prefer explicit pool + size 1) |
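
The "hand-rolled ThreadPoolExecutor" row can be sketched concretely: explicit core/max sizes, a bounded queue, and CallerRunsPolicy so overload throttles the submitter instead of dropping work. The sizes here are illustrative, not a recommendation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Every bound is explicit: core, max, keep-alive, queue capacity, and the
// rejection policy. CallerRunsPolicy runs overflow tasks on the submitting
// thread, which slows submission instead of throwing or silently dropping.
public class BoundedPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4,                                       // core, max
                60, TimeUnit.SECONDS,                       // idle keep-alive for non-core threads
                new ArrayBlockingQueue<>(8),                // bounded queue: no hidden OOM
                new ThreadPoolExecutor.CallerRunsPolicy()); // backpressure on overload

        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 20; i++) {
            pool.execute(done::incrementAndGet);            // may run caller-side under load
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("done=" + done.get());           // prints done=20
    }
}
```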

Which concurrent collection?

| Situation | Pick |
| --- | --- |
| Generic concurrent map | ConcurrentHashMap |
| Sorted concurrent map / range queries | ConcurrentSkipListMap |
| Read-dominant list (observers) | CopyOnWriteArrayList |
| Non-blocking FIFO queue | ConcurrentLinkedQueue |
| Blocking producer-consumer | ArrayBlockingQueue / LinkedBlockingQueue |
| Blocking deque (work-stealing) | LinkedBlockingDeque |

Which atomic / counter?

| Situation | Pick |
| --- | --- |
| Low-contention counter | AtomicInteger / AtomicLong |
| Write-heavy counter (metrics) | LongAdder |
| CAS on an object reference | AtomicReference |
| Lock-free linked list / stack (avoid ABA) | AtomicStampedReference |
| Low-memory atomic across many instances | AtomicReferenceFieldUpdater (or VarHandle) |
| Custom reduction (min/max/product) | LongAccumulator / DoubleAccumulator |
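
The LongAdder row is easiest to remember with a contended-counter sketch: increments are spread across internal cells so writers rarely collide, and sum() folds the cells back into an exact total. A minimal example:

```java
import java.util.concurrent.atomic.LongAdder;

// Four threads hammer one counter. With AtomicLong they would all CAS on a
// single slot; LongAdder stripes the increments, yet the total stays exact.
public class LongAdderDemo {
    public static void main(String[] args) throws InterruptedException {
        LongAdder hits = new LongAdder();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    hits.increment();     // cheap even under write contention
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(hits.sum());   // prints 400000
    }
}
```

The trade-off: sum() is not an atomic snapshot while writers are active, which is fine for metrics but not for coordination logic.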

synchronized vs ReentrantLock vs StampedLock

| Feature | synchronized | ReentrantLock | StampedLock |
| --- | --- | --- | --- |
| Reentrant | Yes | Yes | No |
| tryLock / timed | No | Yes | Yes |
| Interruptible | No | Yes (lockInterruptibly) | Yes |
| Fair mode | No | Optional | No |
| Multiple Conditions | No | Yes | No |
| Optimistic reads | No | No | Yes |
| Auto-release on exception | Yes | No (try-finally) | No |
| Virtual-thread friendly | Java 24+ | Always | Always |

25. Top 30 rapid-fire interview questions

Last-minute pre-interview review. If you can speak 30 seconds to each, you're ready.

  1. Concurrency vs parallelism? Composition/structure vs simultaneous execution.
  2. Three hard problems of concurrency? Atomicity, visibility, ordering.
  3. What is the JMM? Spec of reorderings and visibility via happens-before.
  4. Name happens-before rules. Program order, monitor, volatile, thread start, thread join, interrupt, transitivity.
  5. Why is i++ not atomic? Three operations: load, add, store. Two threads interleave, lose updates.
  6. volatile — what does it give you, what doesn't it? Gives visibility + ordering + 64-bit atomicity. Doesn't give compound-op atomicity or mutual exclusion.
  7. synchronized vs ReentrantLock? synchronized for simple cases (auto-release, cleaner). ReentrantLock for tryLock, timeouts, interruptibility, fairness, multiple Conditions.
  8. Why does double-checked locking need volatile? Otherwise the reference can be published before the constructor's writes complete — a reader sees a partial object.
  9. What does Thread.sleep do to locks? Nothing — it holds them. Object.wait is the only call that releases its monitor.
  10. How do you cancel a thread? Cooperatively via interrupt(). The target must check isInterrupted or let an interruptible blocking call throw.
  11. Never-swallow rule for InterruptedException? Propagate or restore the flag: Thread.currentThread().interrupt().
  12. CAS? Compare-and-swap — atomic hardware op: set to new if equal to expected. Foundation of lock-free code.
  13. ABA problem? An A→B→A change is invisible to a naive CAS. Fix with AtomicStampedReference / version counters.
  14. AtomicLong vs LongAdder? LongAdder stripes writes across cache lines; much faster under write contention. Use for metrics/counters.
  15. ConcurrentHashMap (Java 8+) internals? Per-bucket sync + CAS on empty buckets + treeify at 8. Lock-free reads via volatile.
  16. Why no nulls in ConcurrentHashMap? get returning null is ambiguous between absent and present-with-null, unresolvable under concurrency.
  17. ThreadPoolExecutor task flow? Core → queue → max → reject. An unbounded queue means max never kicks in.
  18. Why is newFixedThreadPool dangerous? Unbounded LinkedBlockingQueue → queue OOM under overload.
  19. Goetz's pool-sizing formula? cores × utilization × (1 + W/C).
  20. CompletableFuture.thenCompose vs thenCombine? Compose = flatMap, sequential dependency. Combine = zip of two independent CFs.
  21. Common-pool trap? Default async uses ForkJoinPool.commonPool() (parallelism cores − 1). Blocking work saturates it. Always pass your own executor.
  22. Work-stealing? Each worker has its own deque; idle workers steal from others' tails. Fork/Join + parallel streams.
  23. When are virtual threads a win? I/O-bound, high-concurrency, blocking imperative code.
  24. When do virtual threads NOT help? CPU-bound, low concurrency, pinning-heavy code.
  25. What is pinning? A virtual thread stuck on its carrier due to synchronized (pre-Java 24) or native code. Fix with ReentrantLock or Java 24+.
  26. Coffman's 4 deadlock conditions? Mutual exclusion, hold-and-wait, no preemption, circular wait.
  27. How to prevent deadlock? Global lock ordering; tryLock with backoff; fewer locks (lock-free, coarser).
  28. Producer-consumer outline? Bounded BlockingQueue; producer put, consumer take. Bounded queue = backpressure. Poison pill or shutdownNow to stop.
  29. @Async self-invocation issue? Intra-bean calls bypass the Spring proxy — not async. Split into two beans.
  30. Reading a thread dump — what's BLOCKED vs WAITING? BLOCKED = waiting to enter a synchronized block. WAITING = in Object.wait, join, or parked (which covers ReentrantLock, CountDownLatch, etc.).
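
The fix in question 13 is worth one concrete sketch: AtomicStampedReference pairs the value with a version stamp, so a CAS holding a stale stamp fails even after the reference cycles back to A. The values below are illustrative:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// ABA fix sketch: every update bumps the stamp, so a thread that observed
// ("A", 0) cannot be fooled by an A -> B -> A cycle in between.
public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);

        int seenStamp = ref.getStamp();          // a thread observes ("A", 0)

        // Meanwhile, another thread does A -> B -> A, bumping the stamp twice.
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // The value is "A" again...
        System.out.println(ref.getReference());                  // prints A
        // ...but a CAS with the stale stamp correctly fails.
        boolean swapped = ref.compareAndSet("A", "C", seenStamp, 3);
        System.out.println(swapped);                             // prints false
    }
}
```

Note compareAndSet compares references with ==; the interned "A" literals make that work here, but real code would carry its own node objects.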

26. Further reading

Canonical books

  • Brian Goetz, Java Concurrency in Practice (2006) — still the definitive book. Pre-Java 8, but the JMM and patterns are unchanged. Every senior Java dev should have read it. Spend time on Chapters 3 (Visibility), 5 (Building Blocks), 6 (Task Execution), and 16 (The JMM).
  • Doug Lea, Concurrent Programming in Java — older but foundational (he wrote most of java.util.concurrent).

JEPs to know

  • JEP 444 — Virtual Threads (Java 21, final)
  • JEP 453 / 462 / 480 / 499 / 505 — Structured Concurrency (preview, still evolving)
  • JEP 429 / 446 / 464 / 481 — Scoped Values (preview, still evolving)
  • JEP 491 — Synchronize Virtual Threads without Pinning (Java 24 — removes synchronized pinning)
  • JEP 425 — Virtual Threads Preview (Java 19, historical)

Standard references

  • java.util.concurrent package summary — https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/package-summary.html
  • Java Language Spec §17 (Threads and Locks) — the JMM, formally.
  • Doug Lea's jsr166 site — concurrent-utilities updates and JMM papers.

Deep dives when you want more

  • Aleksey Shipilëv's JMM talks (YouTube) — the canonical "what the JMM really means" deep dive.
  • Martin Thompson / LMAX Disruptor talks — high-performance concurrency patterns.
  • Ron Pressler's Project Loom talks — virtual-thread internals and design.

Hands-on

  • jcstress harness — if you want to stress-test JMM properties.
  • Your own code under JFR + JMC — even a quick run reveals lock hotspots and GC interaction you can learn from.

End of CONCURRENCY.md. Good luck — and when you hit a concurrency question in the interview, remember: every answer ties back to one of atomicity, visibility, or ordering. Everything else is machinery for solving those three.