v0.2.6 · MIT · JDK 11+ · JDK 21 native
A Java port of the Node.js async library. Compose Parallel, Series, Waterfall, Race, Map, Reduce, Queue, and Lock into pipelines, with roughly 50 µs of overhead per orchestration. Backed by virtual threads when you want them.
// fan out two enrichment lookups, score, serialize
final var tasks = List.of(
    c -> exec.submit(() -> c.success(lookupA(req))),
    c -> exec.submit(() -> c.success(lookupB(req)))
);
Asyncc.Parallel(tasks, wrap(results -> {
    var scored = score(req, results.get(0), results.get(1));
    reply.send(serialize(scored));
}));
Why async.java
Most async-coordination libraries on the JVM grew out of pre-Loom assumptions: they own their thread pool, they assume long-running flows, and they layer many frames between you and your code. async.java picks a different point in the design space.
No actor mailbox, no graph materialisation, no per-call scheduling layer. Asyncc.Parallel is a heap allocation + a couple of atomic increments + your callback. The library never gets in the way.
Pass Executors.newVirtualThreadPerTaskExecutor() to NeoQueue or your tasks and every fan-out spawns on a virtual thread. The library handles the orchestration; Loom handles the threads.
Hardened in v0.2.x with dedup guards, atomic counters, slot-write-before-counter-increment ordering, and a v0.2.4 fix for the ArrayList resize race under high-throughput fan-out. Adversarial fuzz tests pin the at-most-once contract across all combinators.
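To make the ordering guarantee concrete, here is a minimal sketch of a fan-out combinator that writes each result slot before incrementing the completion counter and guards both the per-task continuation and the final callback so each fires at most once. It is an illustration under assumptions, not the library's source: `ParallelSketch`, `Callback`, and `Task` are hypothetical names, and the real implementation may differ in every detail.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from async.java itself.
final class ParallelSketch {

    /** Error-first continuation: err is non-null on failure, value carries the result otherwise. */
    interface Callback<T> { void done(Throwable err, T value); }

    /** A task receives a continuation and is expected to call it once. */
    interface Task<T> { void run(Callback<T> c); }

    @SuppressWarnings("unchecked")
    static <T> void parallel(List<Task<T>> tasks, Callback<List<T>> fin) {
        final int n = tasks.size();
        if (n == 0) { fin.done(null, Collections.emptyList()); return; }

        final T[] slots = (T[]) new Object[n];               // fixed-size result storage, never resized
        final AtomicInteger completed = new AtomicInteger();  // how many tasks have succeeded
        final AtomicBoolean finished = new AtomicBoolean();   // at-most-once guard for the final callback

        for (int i = 0; i < n; i++) {
            final int idx = i;
            final AtomicBoolean called = new AtomicBoolean(); // dedup guard for this task's continuation
            tasks.get(idx).run((err, value) -> {
                if (!called.compareAndSet(false, true)) return;   // ignore repeated calls
                if (err != null) {
                    if (finished.compareAndSet(false, true)) fin.done(err, null);
                    return;
                }
                slots[idx] = value;                               // 1. write the slot
                if (completed.incrementAndGet() == n              // 2. then bump the counter
                        && finished.compareAndSet(false, true)) {
                    fin.done(null, Arrays.asList(slots));         // last finisher sees every slot
                }
            });
        }
    }
}
```

Because the slot write happens before the atomic increment, whichever thread observes the counter reaching `n` is guaranteed to see every earlier slot write. A fixed-size array also avoids any resize during concurrent writes.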
c.success(v) / c.fail(e): Shorthand for c.done(null, v) and c.done(e, null). The continuation parameter is named c (short for continuation) everywhere in the docs.
WrapErrFirst.wrap(...): Wrap a value-only consumer into an error-first callback and skip the if (err != null)... preamble. Throws on unhandled errors; pair with an explicit error consumer if you want both branches.
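The two helpers above are small enough to sketch in a few lines. The version below is an assumption about their general shape rather than the library's actual code: `CallbackSketch` and its nested `Callback` interface are hypothetical, and the exception type thrown on an unhandled error is chosen only for illustration.

```java
import java.util.function.Consumer;

// Hypothetical sketch of the shorthand and wrapping helpers; not the library's API.
final class CallbackSketch {

    /** Error-first continuation with success/fail shorthands layered on done(err, value). */
    interface Callback<T> {
        void done(Throwable err, T value);

        default void success(T value) { done(null, value); }

        default void fail(Throwable err) { done(err, null); }
    }

    /** Value-only consumer: an error is rethrown because nothing was registered to handle it. */
    static <T> Callback<T> wrap(Consumer<T> onValue) {
        return (err, value) -> {
            if (err != null) {
                // IllegalStateException is an illustrative choice, not the library's documented type.
                throw new IllegalStateException("unhandled async error", err);
            }
            onValue.accept(value);
        };
    }

    /** Both branches supplied: errors go to onError, values go to onValue. */
    static <T> Callback<T> wrap(Consumer<T> onValue, Consumer<Throwable> onError) {
        return (err, value) -> {
            if (err != null) {
                onError.accept(err);
            } else {
                onValue.accept(value);
            }
        };
    }
}
```

With this shape, `wrap(v -> use(v))` is convenient when a failure should surface loudly, while the two-argument form keeps error handling explicit.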
Waterfall wrapping a Map wrapping a Parallel wrapping a Race is a perfectly normal pipeline — they all use the same error-first callback shape. See the composability showcase.
Install
Add the JitPack repository and pin the version. Releases are signed git tags on the main repo; see the releases page for the latest.
<!-- pom.xml -->
<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>
<dependency>
  <groupId>com.github.async-java</groupId>
  <artifactId>async.java</artifactId>
  <version>v0.2.6</version>
</dependency>
For Gradle, see the JitPack page for v0.2.6. The library targets JDK 11 but is tested on 11, 17, and 21.
Combinators
Every combinator takes tasks (or values) and an error-first final callback. Compose them freely — they nest without surprises because they all honor the same at-most-once final-callback contract.
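Nesting works because a combinator call has the same shape as a task: it takes a continuation and calls it once. The sketch below shows the idea with two deliberately tiny combinators. `ComposeSketch`, `series`, and `race` here are hypothetical stand-ins written for this example, not the library's implementations, and the series version assumes each task completes before the next one starts touching the shared accumulator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; names and behaviour are assumptions for illustration only.
final class ComposeSketch {

    interface Callback<T> { void done(Throwable err, T value); }

    interface Task<T> { void run(Callback<T> c); }

    /** Runs tasks one at a time, collecting values in order; stops at the first error. */
    static <T> void series(List<Task<T>> tasks, Callback<List<T>> fin) {
        step(tasks, 0, new ArrayList<>(), fin);
    }

    private static <T> void step(List<Task<T>> tasks, int i, List<T> acc, Callback<List<T>> fin) {
        if (i == tasks.size()) { fin.done(null, acc); return; }
        AtomicBoolean called = new AtomicBoolean();           // dedup guard for this step
        tasks.get(i).run((err, value) -> {
            if (!called.compareAndSet(false, true)) return;   // ignore repeated calls
            if (err != null) { fin.done(err, null); return; }
            acc.add(value);
            step(tasks, i + 1, acc, fin);                     // recursion is fine for short lists
        });
    }

    /** The first task to call back wins; every later call is dropped. */
    static <T> void race(List<Task<T>> tasks, Callback<T> fin) {
        AtomicBoolean finished = new AtomicBoolean();
        for (Task<T> t : tasks) {
            t.run((err, value) -> {
                if (finished.compareAndSet(false, true)) fin.done(err, value);
            });
        }
    }
}
```

A race can then sit inside a series as an ordinary step, for example `steps.add(c -> ComposeSketch.race(racers, c))`, since the lambda is itself a `Task`.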
Benchmark
Same 5-stage pipeline, both orchestrators, 60-second sustained WebSocket runs from a Rust load tester. Numbers are end-to-end round-trip latency (parse → validate → enrich ∥ → score → serialize) on JDK 21 with a virtual-thread executor. Full methodology in the load-curve post.
| offered load | library | p50 | p99 | max | drops |
|---|---|---|---|---|---|
| 500 msg/s (50 × 10) | async.java | 5.7 ms | 14.3 ms | 46 ms | 0 |
| 500 msg/s (50 × 10) | akka-streams | 17.8 ms | 30.7 ms | 55 ms | 0 |
| 1 000 msg/s (200 × 5) | async.java | 5.1 ms | 14.8 ms | 21 ms | 0 |
| 1 000 msg/s (200 × 5) | akka-streams | 5.9 ms | 54.3 ms | 100 ms | 0 |
| 2 500 msg/s (50 × 50) | async.java | 5.0 ms | 11.5 ms | 18 ms | 0 |
| 2 500 msg/s (50 × 50) | akka-streams | 2 017 ms | 5 230 ms | 6 258 ms | ~14 % |
The gap is dispatcher queue-wait. async.java's per-call path doesn't enqueue anything onto a shared, contended structure, so its latency stays flat as load grows. Akka Streams' per-call runWith queues a fresh actor mailbox; under saturation the queue depth itself becomes the tail latency. Read the full breakdown.
Project Loom
Loom changed what "blocking" costs. It didn't change what coordinating a fan-out costs. async.java handles the coordination; Loom handles the threads. The two compose cleanly.
// One executor for the whole app. VT spawn is ~250 ns; cost is essentially free.
final var vt = Executors.newVirtualThreadPerTaskExecutor();
// Optional: route NeoQueue defaults through VTs too.
NeoQueue.setExecutor(vt);
// Now every task is a virtual thread. Blocking I/O inside a task is a continuation
// park, not a kernel thread block. The orchestration is still callbacks.
Asyncc.ParallelLimit(8, fetchTasks, (err, results) -> {
    // ...
});
For the full Loom-integration story (structured concurrency vs. callbacks, ThreadLocal vs. ScopedValue, why NeoLock is still relevant), see the README's Project Loom section.
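One way to picture the split between coordination and threads is an adapter that turns a blocking call into a callback-style task running on whatever executor is supplied. This is a sketch under assumptions: `BlockingAdapterSketch`, `Callback`, `Task`, and `onExecutor` are hypothetical names and not part of the library. It only relies on the standard `java.util.concurrent.Executor` interface, so it compiles on JDK 11.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;

// Hypothetical sketch; the adapter and its types are assumptions for illustration.
final class BlockingAdapterSketch {

    interface Callback<T> { void done(Throwable err, T value); }

    interface Task<T> { void run(Callback<T> c); }

    /** Wraps a blocking call as a callback-style task that executes on the given executor. */
    static <T> Task<T> onExecutor(Executor exec, Callable<T> blocking) {
        return c -> exec.execute(() -> {
            T value;
            try {
                value = blocking.call();   // may block; on a virtual thread this only parks
            } catch (Throwable t) {
                c.done(t, null);           // failure travels through the error-first slot
                return;
            }
            c.done(null, value);           // called outside the try so it cannot fire twice
        });
    }
}
```

On JDK 21 the executor argument could be `Executors.newVirtualThreadPerTaskExecutor()`, and on older JDKs any ordinary thread pool works the same way. The callback contract is identical in both cases.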
Recent posts
Where the 3-4× tail-latency gap comes from, why it isn't magic, and what it means for picking a JVM async coordination library in 2026.