Node.js API Latency And Throughput

Pack: nodejs
Source: nodejs/nodejs-api-latency-and-throughput/SKILL.md

Use this skill when the problem is API performance under real request load rather than general Node.js runtime style.

Scope

- endpoint latency
- throughput under concurrency
- repeated per-request setup
- I/O amplification
- expensive serialization or CPU work in hot paths

Default path

- Pick the slow endpoint or request class.
- Measure p50 and p95 before changing code.
- Remove repeated per-request setup first:
  - config parsing
  - client construction
  - redundant auth or schema work
- Batch or collapse avoidable I/O.
- Keep concurrency bounded and intentional.
- Move CPU-heavy work off the hot request path only when measurement shows it dominates.
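
One way to keep concurrency bounded, as the step above asks, is a small worker-pool helper instead of a bare Promise.all over the whole input. The sketch below is illustrative only: `mapWithLimit` is a hypothetical name, not part of Node.js or of this skill, and the right limit depends on what the downstream service tolerates.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array(items.length);
  let next = 0;       // index of the next unclaimed item
  let failed = false; // set once any worker rejects

  // Each runner claims the next index until the list is drained or a worker fails.
  async function runner() {
    while (!failed && next < items.length) {
      const index = next++;
      try {
        results[index] = await worker(items[index], index);
      } catch (err) {
        failed = true; // stop other runners from claiming more work
        throw err;
      }
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}
```

A call such as `await mapWithLimit(ids, 8, (id) => loadUser(id))` (with `loadUser` standing in for any per-item I/O call) keeps at most eight requests outstanding, so a large input does not turn into a burst against the database or upstream API.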

When to deviate

- Favor p95 and p99 over throughput if the business pain is tail latency.
- Trade some peak throughput for safer memory or better cancellation when the API is interactive.
- Stream responses only when it reduces real wait time or memory pressure.

Guardrails

- Tail latency usually matters more than one fast happy-path benchmark.
- Reusing clients and connections is often higher leverage than clever code changes.
- Promise.all is not a free win when it creates downstream pressure or burst load.
- Keep request cancellation and timeouts in the design when external I/O is involved.
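
The last guardrail can be sketched with the built-in `fetch` and `AbortSignal`. This is a minimal illustration, assuming a recent Node.js release where `AbortSignal.timeout` and `AbortSignal.any` are available (roughly Node 20 or later); `fetchJsonWithDeadline` and the 2000 ms default are hypothetical choices, not values from this skill.

```javascript
// Hypothetical helper: fetch JSON from an upstream with a deadline, and also
// stop early if the caller's own signal aborts (for example, the client went away).
async function fetchJsonWithDeadline(url, { timeoutMs = 2000, signal } = {}) {
  const deadline = AbortSignal.timeout(timeoutMs);
  // Combine the caller's signal with the deadline when one is supplied.
  const combined = signal ? AbortSignal.any([signal, deadline]) : deadline;

  const res = await fetch(url, { signal: combined });
  if (!res.ok) {
    throw new Error(`upstream responded with status ${res.status}`);
  }
  return res.json();
}
```

In a request handler, the caller's signal would typically come from an `AbortController` that is aborted when the response emits `close`, so upstream work stops once nobody is waiting for the result.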

Avoid

- building clients or pools inside the request handler
- unbounded concurrency in a hot path
- optimizing CPU when network or database dominates
- reporting throughput gains while hiding worse tail latency or memory spikes
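
As an illustration of the first item, the sketch below builds the upstream client and parses config once at module load, then reuses them from the handler. Every name here (`createUpstreamClient`, `UPSTREAM_HOST`, `UPSTREAM_PORT`, the `/items` path) and the `maxSockets` value are assumptions for the example, not part of this skill.

```javascript
const http = require("node:http");

// Hypothetical factory: one keep-alive agent shared by every call made through it.
function createUpstreamClient({ host, port }) {
  const agent = new http.Agent({ keepAlive: true, maxSockets: 50 });
  return {
    agent,
    get(path) {
      return new Promise((resolve, reject) => {
        const req = http.get({ host, port, path, agent }, (res) => {
          const chunks = [];
          res.on("data", (chunk) => chunks.push(chunk));
          res.on("end", () =>
            resolve({ status: res.statusCode, body: Buffer.concat(chunks).toString("utf8") })
          );
          res.on("error", reject);
        });
        req.on("error", reject);
      });
    },
  };
}

// Built once at module load: config is parsed here, not inside the handler.
const upstream = createUpstreamClient({
  host: process.env.UPSTREAM_HOST ?? "127.0.0.1",
  port: Number(process.env.UPSTREAM_PORT ?? 8080),
});

// The handler does only per-request work and reuses pooled sockets.
async function handler(req, res) {
  try {
    const result = await upstream.get("/items");
    res.writeHead(result.status, { "content-type": "application/json" });
    res.end(result.body);
  } catch {
    res.writeHead(502).end();
  }
}
```

Constructing the agent inside `handler` instead would open a fresh connection per request, which adds connection setup to every call's latency.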

Verification checklist

- endpoint and percentile metric are explicit
- per-request setup was audited
- concurrency shape is intentional
- the dominant I/O or CPU bottleneck is named
- p95 or user-visible latency improved, not just synthetic throughput
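
To make the percentile items in this checklist concrete, here is a minimal sketch of computing p50, p95, and p99 from latency samples using the nearest-rank method. `percentile` and `measure` are hypothetical helpers, and the sequential loop is only a stand-in: samples for a real before/after comparison would normally come from a load generator or production metrics under realistic concurrency.

```javascript
const { performance } = require("node:perf_hooks");

// Nearest-rank percentile over latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError("no samples");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical harness: call `fn` sequentially `runs` times and time each call.
async function measure(fn, runs = 200) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

Running the same measurement before and after a change, against the same endpoint and request mix, gives the explicit percentile comparison the checklist asks for.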
