Build high-throughput file uploads in Node.js using streaming, backpressure, presigned URLs, multipart S3, queues, and worker threads — without melting your server.
Uploading isn't hard — uploading at scale is. The moment traffic spikes or files get big, memory vanishes, CPUs peg at 100%, and users stare at spinning cursors. Let's be real: the difference between a demo and a production system is your upload pipeline.
This is the guide I wish I had before my first "Friday-night incident."
What "high-throughput" really means
High throughput isn't just raw Mbps. It's:
- Zero copy (or close): stream bytes end-to-end without buffering whole files.
- Backpressure-aware: slow writers when readers lag.
- Bounded concurrency: many users, controlled resource use.
- Resumable & idempotent: flaky networks don't start from scratch.
- Async processing: uploads return fast; heavy work happens off the hot path.
If you nail those, you can push 10–100× more traffic through the same hardware.
Architecture at a glance (descriptive flow)
- Client obtains a presigned URL or tus endpoint.
- Client uploads directly to object storage (S3/GCS/MinIO) with multipart/resumable.
- Storage sends a callback/webhook (or client posts metadata) to your API.
- API enqueues a processing job (image/video/PDF parsing) in Redis.
- Background workers stream from storage → transform → storage, emitting progress.
- API exposes status endpoints; UI polls or subscribes to events.
This design keeps your Node web tier slim, fast, and very hard to DoS via "fat files."
Pattern 1 — Stream, don't buffer (Express/Fastify)
Avoid middlewares that slurp entire files into memory. Use low-level parsers (Busboy, @fastify/multipart) and stream.pipeline to honor backpressure.
// Fastify example: streaming to local disk (or a temp volume)
import Fastify from 'fastify';
import multipart from '@fastify/multipart';
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';
import { randomUUID } from 'crypto';
import { resolve } from 'path';

const app = Fastify({ logger: true });
// Skip attachFieldsToBody: it buffers file contents in memory. Set explicit per-file limits instead.
await app.register(multipart, { limits: { fileSize: 1024 * 1024 * 1024 } }); // 1GB cap; tune for your use case

app.post('/upload', async (req, reply) => {
  for await (const part of req.parts()) {
    if (part.type === 'file') {
      const id = randomUUID();
      const tmp = resolve('/var/tmp', `${id}-${part.filename}`);
      await pipeline(part.file, createWriteStream(tmp)); // backpressure-aware
      // TODO: hand off tmp path to a queue; do not process inline
      // Minimal ack with an id the client can use for status
      return reply.code(202).send({ id, filename: part.filename });
    }
  }
  return reply.code(400).send({ error: 'No file provided' });
});

await app.listen({ port: 3000 });
Notes
- Set limits explicitly: a per-file cap on the multipart plugin (limits.fileSize) plus a hard cap at the reverse proxy so oversized requests get a 413 before they reach Node.
- Always stream. Never Buffer.concat an entire file in memory.
Pattern 2 — Push the bytes off your server (presigned URLs)
Direct-to-storage removes your web tier from the hot path. Your API returns a presigned URL; the browser uploads straight to S3 with checksums + retry. Your CPU and memory thank you.
// Node 18+ AWS SDK v3 example: generate presigned S3 URL
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'us-east-1' });

export async function getUploadUrl({ key, contentType, checksumSHA256 }) {
  const cmd = new PutObjectCommand({
    Bucket: process.env.BUCKET,
    Key: key,
    ContentType: contentType,
    // S3 will verify integrity
    ChecksumSHA256: checksumSHA256,
  });
  const url = await getSignedUrl(s3, cmd, { expiresIn: 60 }); // seconds
  return { url, key };
}
Client flow
- Hash the file (crypto.subtle.digest('SHA-256', ...)) → base64.
- Call /upload-url → receive the presigned URL.
- PUT the file to S3 with Content-Type + x-amz-checksum-sha256.
- Post metadata to your API → enqueue processing.
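Here's a minimal browser-side sketch of that flow, assuming a POST /upload-url endpoint that wraps getUploadUrl above (the request shape is illustrative); hashing the whole file with arrayBuffer() is fine for modest files, while very large files belong in Pattern 3:
// Browser-side sketch: hash → presign → PUT straight to S3
async function uploadFile(file) {
  // SHA-256 of the file, standard base64 (what x-amz-checksum-sha256 expects)
  const digest = await crypto.subtle.digest('SHA-256', await file.arrayBuffer());
  const checksum = btoa(String.fromCharCode(...new Uint8Array(digest)));

  // Ask the API for a presigned URL (matches getUploadUrl above)
  const { url, key } = await fetch('/upload-url', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ key: file.name, contentType: file.type, checksumSHA256: checksum }),
  }).then(r => r.json());

  // Upload directly to S3; your Node tier never touches the bytes
  await fetch(url, {
    method: 'PUT',
    headers: { 'Content-Type': file.type, 'x-amz-checksum-sha256': checksum },
    body: file,
  });

  return key; // post this to your API to enqueue processing
}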
Pattern 3 — For very large files: multipart/resumable
- S3 Multipart Upload splits files into parts (5 MB–5 GB each, up to 10,000 parts); failed parts retry independently.
- tus (open protocol) does chunked, resumable uploads with pause/resume.
Either way, your Node server handles control (create/upload/complete), not raw bytes.
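For the S3 flavor, here's a sketch of that control plane with AWS SDK v3: the client PUTs each part to its presigned URL and sends the collected ETags back (function names are illustrative, not a library API).
// Multipart control plane: your API signs and finalizes; clients move the bytes
import {
  S3Client, CreateMultipartUploadCommand,
  UploadPartCommand, CompleteMultipartUploadCommand,
} from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'us-east-1' });
const Bucket = process.env.BUCKET;

export async function startMultipart(key, contentType) {
  const { UploadId } = await s3.send(new CreateMultipartUploadCommand({ Bucket, Key: key, ContentType: contentType }));
  return UploadId;
}

export function presignPart(key, uploadId, partNumber) {
  // One short-lived URL per part; clients upload parts in parallel and retry failed parts individually
  return getSignedUrl(s3, new UploadPartCommand({ Bucket, Key: key, UploadId: uploadId, PartNumber: partNumber }), { expiresIn: 900 });
}

export function completeMultipart(key, uploadId, parts) {
  // parts: [{ ETag, PartNumber }] reported by the client after each successful part upload
  return s3.send(new CompleteMultipartUploadCommand({
    Bucket, Key: key, UploadId: uploadId,
    MultipartUpload: { Parts: parts },
  }));
}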
Pattern 4 — Bound your concurrency (the quiet superpower)
Unbounded parallelism kills throughput via thrash. Bound it everywhere:
// Simple p-limit for controlled parallel part uploads
import pLimit from 'p-limit';
const limit = pLimit(4); // tune for your bandwidth/CPU
await Promise.all(parts.map(p => limit(() => uploadPart(p))));
Use the same idea for processing tasks and storage reads/writes.
Pattern 5 — Process off the hot path: queues + workers
Return 202 ASAP; do heavy work in a worker process with BullMQ (Redis) or Cloud Tasks equivalents.
// Producer (API)
import { Queue } from 'bullmq';
const q = new Queue('media', { connection: { host: '127.0.0.1', port: 6379 } });
await q.add('transcode', { key, userId }, {
  attempts: 5,
  backoff: { type: 'exponential', delay: 2000 },
});
// Worker (separate process / container)
import { Worker } from 'bullmq';
import sharp from 'sharp';
import { Upload } from '@aws-sdk/lib-storage';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });

const worker = new Worker('media', async job => {
  const { key } = job.data;
  const obj = await s3.send(new GetObjectCommand({ Bucket: process.env.BUCKET, Key: key }));
  const transform = sharp().resize(1920).jpeg({ quality: 82 }); // CPU-bound; see Pattern 6
  const upload = new Upload({
    client: s3,
    params: { Bucket: process.env.BUCKET, Key: `${key}.jpg`, Body: obj.Body.pipe(transform) }, // Body is a Readable in Node
    queueSize: 4,              // concurrent parts
    partSize: 8 * 1024 * 1024, // 8MB parts
  });
  await upload.done();
}, { connection: { host: '127.0.0.1', port: 6379 } }); // same Redis as the producer
Why it works
- Web tier never blocks on CPU work.
- Retries and backoff are isolated in the queue.
- Workers can scale horizontally without changing the API.
Pattern 6 — Don't block the event loop (use Worker Threads)
Image/video/PDF transforms are CPU-bound. Run them in Worker Threads or a separate service so the event loop stays snappy.
// Worker Threads wrapper
import { Worker } from 'node:worker_threads';

export function runHeavyTask(payload) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL('./heavy-task.js', import.meta.url), { workerData: payload });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', code => code !== 0 && reject(new Error('Worker stopped')));
  });
}
heavy-task.js does the CPU work (e.g., sharp, pdf-lib) and posts results back.
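A minimal sketch of heavy-task.js, assuming the payload carries input and output paths (the field names are illustrative) and the project runs as ESM like the snippets above:
// heavy-task.js: runs inside the Worker Thread, off the main event loop
import { parentPort, workerData } from 'node:worker_threads';
import sharp from 'sharp';

const { inputPath, outputPath } = workerData; // illustrative payload shape
const info = await sharp(inputPath).resize(1920).jpeg({ quality: 82 }).toFile(outputPath);
parentPort.postMessage({ outputPath, width: info.width, height: info.height });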
Pro tip: Cap worker pool size to Math.max(1, cores - 1) and reuse workers instead of spawning one per task.
Pattern 7 — Idempotency & retries
Uploads and callbacks may duplicate under retries. Use idempotency keys (e.g., X-Idempotency-Key) and upserts in your DB so "create once, confirm many" is safe. For processing, make job payloads replayable (source of truth = storage object).
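A sketch of the database side, assuming Postgres via pg and a unique index on idempotency_key (table and column names are illustrative):
// Idempotent "create upload record": retries and duplicate callbacks become no-ops
import pg from 'pg';
const pool = new pg.Pool();

export async function recordUpload({ idempotencyKey, key, userId }) {
  const { rows } = await pool.query(
    `INSERT INTO uploads (idempotency_key, s3_key, user_id)
     VALUES ($1, $2, $3)
     ON CONFLICT (idempotency_key) DO NOTHING
     RETURNING id`,
    [idempotencyKey, key, userId]
  );
  if (rows.length > 0) return { id: rows[0].id, duplicate: false };

  // Conflict path: the record already exists; return it instead of failing
  const existing = await pool.query('SELECT id FROM uploads WHERE idempotency_key = $1', [idempotencyKey]);
  return { id: existing.rows[0].id, duplicate: true };
}
On the queue side, BullMQ treats a custom jobId as a dedupe key, so passing the idempotency key as jobId keeps replays from enqueuing the same work twice.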
Pattern 8 — Integrity, limits, and safe defaults
- Checksums: verify on the client and at storage (ChecksumSHA256 or Content-MD5).
- Content length: reject impossible Content-Length values, or force chunked encoding.
- Limits: enforce max size at the edge (CDN/WAF), then at the proxy (Nginx client_max_body_size), then in the app (multipart plugin limits).
- Timeouts: short timeouts on the API; long timeouts on storage/network clients.
- AbortController: cancel stalled pipes cleanly (see the sketch below).
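A sketch of that last bullet, using the signal option of stream.pipeline (the 30-second budget is illustrative):
// Abort a pipe that stalls past a deadline; pipeline destroys both streams on abort
import { pipeline } from 'stream/promises';

export async function copyWithTimeout(source, destination, ms = 30_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    await pipeline(source, destination, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}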
Pattern 9 — Observability that actually helps
Log per-upload IDs and attach them to:
- presign issued time,
- client start/finish times,
- storage ETag/checksum,
- queue job id and worker hostname,
- processing durations and outlier samples (p95/p99).
You can't fix what you can't see.
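A minimal sketch with pino child loggers, so every later log line carries the same upload id (stage and field names are illustrative):
// One child logger per upload; pass it (or just the id) through the pipeline
import pino from 'pino';
const log = pino();

export function uploadLogger(uploadId) {
  return log.child({ uploadId });
}

// const ulog = uploadLogger(id);
// ulog.info({ stage: 'presign-issued', key });
// ulog.info({ stage: 'storage-confirmed', etag, checksum });
// ulog.info({ stage: 'processed', jobId, durationMs });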
Case study (composite of real teams)
A team accepted 200–400MB media files over a single Express route using a body parser that buffered whole files in memory. At ~70 concurrent uploads, latency exploded and nodes were OOM-killed.
They switched to:
- presigned direct-to-S3,
- multipart uploads with 8MB parts, queueSize=4,
- an API that returned 202 with an uploadId,
- BullMQ workers pulling from storage and Worker Threads for transforms.
Results after a day of tuning:
- Web node RSS dropped by ~80–90%.
- p99 "upload complete → 202" fell below 300ms.
- Processing throughput scaled linearly with worker replicas.
Nothing exotic — just clean streaming, bounded concurrency, and separation of concerns.
Quick checklist (tape this above your monitor)
- Direct-to-storage (presigned/multipart) for large files
- Stream (pipeline) end-to-end—no buffering
- Bound concurrency everywhere (uploads, parts, processing)
- Queue + workers; CPU in Worker Threads
- Idempotency keys, retries, and checksums
- Size limits at the edge + sane timeouts
- Per-upload observability (ids, p95/p99)
Conclusion
High-throughput Node uploads aren't about hero servers — they're about boring, reliable plumbing. Stream the bytes. Keep the event loop free. Push heavy lifting to the right place. With a few disciplined patterns, the system feels… calm — even on Monday mornings.
If you want a follow-up with a full sample repo (Fastify + presign + S3 multipart + BullMQ + Worker Threads), drop a comment and I'll prioritize the most requested stack.