Build high-throughput file uploads in Node.js using streaming, backpressure, presigned URLs, multipart S3, queues, and worker threads — without melting your server.

Uploading isn't hard — uploading at scale is. The moment traffic spikes or files get big, memory vanishes, CPUs peg at 100%, and users stare at spinning cursors. Let's be real: the difference between a demo and a production system is your upload pipeline.

This is the guide I wish I had before my first "Friday-night incident."

What "high-throughput" really means

High throughput isn't just raw Mbps. It's:

  • Zero copy (or close): stream bytes end-to-end without buffering whole files.
  • Backpressure-aware: slow writers when readers lag.
  • Bounded concurrency: many users, controlled resource use.
  • Resumable & idempotent: flaky networks don't start from scratch.
  • Async processing: uploads return fast; heavy work happens off the hot path.

If you nail those, you can push 10–100× more traffic through the same hardware.

Architecture at a glance (descriptive flow)

  1. Client obtains a presigned URL or tus endpoint.
  2. Client uploads directly to object storage (S3/GCS/MinIO) with multipart/resumable.
  3. Storage sends a callback/webhook (or client posts metadata) to your API.
  4. API enqueues a processing job (image/video/PDF parsing) in Redis.
  5. Background workers stream from storage → transform → storage, emitting progress.
  6. API exposes status endpoints; UI polls or subscribes to events.

This design keeps your Node web tier slim, fast, and very hard to DoS via "fat files."

Pattern 1 — Stream, don't buffer (Express/Fastify)

Avoid middlewares that slurp entire files into memory. Use low-level parsers (Busboy/@fastify/multipart) and stream.pipeline to honor backpressure.

// Fastify example: streaming to local disk (or a temp volume)
import Fastify from 'fastify';
import multipart from '@fastify/multipart';
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';
import { randomUUID } from 'crypto';
import { resolve } from 'path';

const app = Fastify({ logger: true });
await app.register(multipart, {
  // Explicit per-request limits; tune fileSize for your workload
  limits: { fileSize: 1024 * 1024 * 1024, files: 1 },
});

app.post('/upload', async (req, reply) => {
  const parts = req.parts();
  for await (const part of parts) {
    if (part.type === 'file') {
      const id = randomUUID();
      const tmp = resolve('/var/tmp', `${id}-${part.filename}`);
      await pipeline(part.file, createWriteStream(tmp)); // backpressure-aware

      // TODO: hand off tmp path to a queue; do not process inline
      // Minimal ack with an id the client can use for status
      reply.code(202).send({ id, filename: part.filename });
      return;
    }
  }
  reply.code(400).send({ error: 'No file provided' });
});

app.listen({ port: 3000 });

Notes

  • Set explicit limits on the multipart plugin (fileSize, files) and still enforce a max body size at the reverse proxy, so oversized requests get a 413 before they reach Node.
  • Always stream. Never Buffer.concat an entire file in memory.

Pattern 2 — Push the bytes off your server (presigned URLs)

Direct-to-storage removes your web tier from the hot path. Your API returns a presigned URL; the browser uploads straight to S3 with checksums + retry. Your CPU and memory thank you.

// Node 18+ AWS SDK v3 example: generate presigned S3 URL
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'us-east-1' });

export async function getUploadUrl({ key, contentType, checksumSHA256 }) {
  const cmd = new PutObjectCommand({
    Bucket: process.env.BUCKET,
    Key: key,
    ContentType: contentType,
    // S3 will verify integrity
    ChecksumSHA256: checksumSHA256,
  });
  const url = await getSignedUrl(s3, cmd, { expiresIn: 60 }); // seconds
  return { url, key };
}

Client flow (a browser-side sketch follows the steps)

  1. Hash the file (crypto.subtle.digest('SHA-256', ...)) → base64 (the encoding x-amz-checksum-sha256 expects).
  2. Call /upload-url → receive presigned URL.
  3. PUT file to S3 with Content-Type + x-amz-checksum-sha256.
  4. Post metadata to your API → enqueue processing.
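
Here's a rough browser-side sketch of those four steps, assuming a hypothetical /upload-url and /uploads API (your route names and JSON shapes will differ):

// Browser sketch: hash, presign, PUT straight to S3, then register metadata with the API
async function uploadFile(file) {
  // 1. SHA-256 of the file, base64-encoded (what x-amz-checksum-sha256 expects).
  //    Fine for moderate sizes; hash incrementally for multi-GB files.
  const digest = await crypto.subtle.digest('SHA-256', await file.arrayBuffer());
  const checksumSHA256 = btoa(String.fromCharCode(...new Uint8Array(digest)));

  // 2. Ask your API for a presigned URL
  const { url, key } = await fetch('/upload-url', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ filename: file.name, contentType: file.type, checksumSHA256 }),
  }).then(r => r.json());

  // 3. PUT the bytes directly to S3; S3 verifies the checksum server-side
  await fetch(url, {
    method: 'PUT',
    headers: { 'Content-Type': file.type, 'x-amz-checksum-sha256': checksumSHA256 },
    body: file,
  });

  // 4. Tell your API the object exists so it can enqueue processing
  await fetch('/uploads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ key, size: file.size, contentType: file.type }),
  });
}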

Pattern 3 — For very large files: multipart/resumable

  • S3 Multipart Upload splits files into parts of 5 MiB–5 GiB (up to 10,000 parts per upload); failed parts retry independently.
  • tus (open protocol) does chunked, resumable uploads with pause/resume.

Either way, your Node server handles control (create/upload/complete), not raw bytes.
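
On the S3 side, the control plane can be three small endpoints, sketched here with AWS SDK v3 (the function shapes and expiry are assumptions; only the S3 commands are fixed):

// Control-plane sketch for S3 multipart: create the upload, presign each part, complete it
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
} from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'us-east-1' });
const Bucket = process.env.BUCKET;

export async function createUpload(key, contentType) {
  const { UploadId } = await s3.send(
    new CreateMultipartUploadCommand({ Bucket, Key: key, ContentType: contentType })
  );
  return UploadId;
}

export function presignPart(key, uploadId, partNumber) {
  // The client PUTs raw part bytes to this URL; S3 returns an ETag per part
  return getSignedUrl(
    s3,
    new UploadPartCommand({ Bucket, Key: key, UploadId: uploadId, PartNumber: partNumber }),
    { expiresIn: 900 }
  );
}

export function completeUpload(key, uploadId, parts) {
  // parts: [{ ETag, PartNumber }] collected from the part uploads
  return s3.send(
    new CompleteMultipartUploadCommand({
      Bucket,
      Key: key,
      UploadId: uploadId,
      MultipartUpload: { Parts: parts },
    })
  );
}

The client loops over parts with bounded concurrency (Pattern 4) and sends the collected ETags to completeUpload.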

Pattern 4 — Bound your concurrency (the quiet superpower)

Unbounded parallelism kills throughput via thrash. Bound it everywhere:

// Simple p-limit for controlled parallel part uploads
import pLimit from 'p-limit';

const limit = pLimit(4); // tune for your bandwidth/CPU
await Promise.all(parts.map(p => limit(() => uploadPart(p))));

Use the same idea for processing tasks and storage reads/writes.
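
The libraries in this guide already expose that knob; for example, BullMQ workers take a per-process concurrency option (the processor below is a placeholder for the one in Pattern 5):

// Bound job concurrency in the worker tier too: at most 4 jobs in flight per process
import { Worker } from 'bullmq';

const processJob = async job => {
  // stream from storage → transform → store, as in Pattern 5
};

const worker = new Worker('media', processJob, {
  connection: { host: '127.0.0.1', port: 6379 },
  concurrency: 4, // per-process cap; scale with more replicas rather than a bigger number
});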

Pattern 5 — Process off the hot path: queues + workers

Return 202 ASAP; do heavy work in a worker process with BullMQ (Redis) or Cloud Tasks equivalents.

// Producer (API)
import { Queue } from 'bullmq';
const q = new Queue('media', { connection: { host: '127.0.0.1', port: 6379 } });

await q.add('transcode', { key, userId }, {
  attempts: 5, backoff: { type: 'exponential', delay: 2000 }
});
// Worker (separate process / container)
import { Worker } from 'bullmq';
import sharp from 'sharp';
import { Upload } from '@aws-sdk/lib-storage';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });

const worker = new Worker('media', async job => {
  const { key } = job.data;
  const obj = await s3.send(new GetObjectCommand({ Bucket: process.env.BUCKET, Key: key }));
  const transform = sharp().resize(1920).jpeg({ quality: 82 }); // CPU-bound; see Pattern 6

  const upload = new Upload({
    client: s3,
    params: { Bucket: process.env.BUCKET, Key: `${key}.jpg`, Body: obj.Body.pipe(transform) },
    queueSize: 4, // concurrent parts
    partSize: 8 * 1024 * 1024,
  });
  await upload.done();
}, { connection: { host: '127.0.0.1', port: 6379 } });

Why it works

  • Web tier never blocks on CPU work.
  • Retries and backoff are isolated in the queue.
  • Workers can scale horizontally without changing the API.
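
The status side (step 6 in the architecture) can stay tiny. A minimal sketch, assuming jobs are enqueued with jobId set to the upload id the API handed back:

// Status endpoint sketch: translate BullMQ job state into something the UI can poll
import Fastify from 'fastify';
import { Queue } from 'bullmq';

const app = Fastify({ logger: true });
const q = new Queue('media', { connection: { host: '127.0.0.1', port: 6379 } });

app.get('/uploads/:id/status', async (req, reply) => {
  const job = await q.getJob(req.params.id); // works when jobs are added with { jobId: uploadId }
  if (!job) return reply.code(404).send({ error: 'unknown upload' });

  const state = await job.getState(); // 'waiting' | 'active' | 'completed' | 'failed' | ...
  return { id: job.id, state, progress: job.progress };
});

await app.listen({ port: 3000 });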

Pattern 6 — Don't block the event loop (use Worker Threads)

Image/video/PDF transforms are CPU-bound. Run them in Worker Threads or a separate service so the event loop stays snappy.

// Worker Threads wrapper
import { Worker } from 'node:worker_threads';

export function runHeavyTask(payload) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL('./heavy-task.js', import.meta.url), { workerData: payload });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', code => code !== 0 && reject(new Error('Worker stopped')));
  });
}

heavy-task.js does the CPU work (e.g., sharp, pdf-lib) and posts results back.
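
A minimal heavy-task.js might look like this, assuming an ESM project and a payload that carries local input/output paths (adapt if you stream from storage instead):

// heavy-task.js: runs inside the worker thread, so the main event loop never blocks
import { workerData, parentPort } from 'node:worker_threads';
import sharp from 'sharp';

const { inputPath, outputPath, width = 1920 } = workerData; // assumed payload shape

const info = await sharp(inputPath)
  .resize(width)
  .jpeg({ quality: 82 })
  .toFile(outputPath);

// Resolves the promise in runHeavyTask via the 'message' event
parentPort.postMessage({ outputPath, width: info.width, height: info.height, size: info.size });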

Pro tip: Cap worker pool size to Math.max(1, cores - 1) and reuse workers.

Pattern 7 — Idempotency & retries

Uploads and callbacks may duplicate under retries. Use idempotency keys (e.g., X-Idempotency-Key) and upserts in your DB so "create once, confirm many" is safe. For processing, make job payloads replayable (source of truth = storage object).
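
One way to make "create once, confirm many" concrete is a small helper keyed by the idempotency header, sketched here with Redis (key prefix and TTL are assumptions):

// Idempotency sketch: the first request with a key does the work; retries replay its result
import Redis from 'ioredis';

const redis = new Redis(); // defaults to 127.0.0.1:6379

export async function withIdempotency(key, ttlSeconds, handler) {
  const cached = await redis.get(`idem:${key}`);
  if (cached) return JSON.parse(cached); // duplicate: return the stored response

  const result = await handler(); // e.g. upsert the upload record, enqueue the job
  // NX: only store if absent, so a racing duplicate can't overwrite the first result
  await redis.set(`idem:${key}`, JSON.stringify(result), 'EX', ttlSeconds, 'NX');
  return result;
}

A stricter version would also hold a short lock while the first request is in flight; pair it with a unique constraint (or upsert) in the database so replays stay harmless.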

Pattern 8 — Integrity, limits, and safe defaults

  • Checksums: verify on the client and at storage (ChecksumSHA256 or Content-MD5).
  • Content length: reject impossible Content-Length or force chunked encoding.
  • Limits: enforce max size at the edge (CDN/WAF), then proxy (Nginx client_max_body_size), then app (multipart plugin limits).
  • Timeouts: short timeouts on the API; long timeouts on storage/network clients.
  • AbortController: cancel stalled pipes cleanly (see the sketch below).
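
For that last bullet, stream.pipeline accepts a signal, so a stalled copy tears down cleanly (the 30-second timeout is an arbitrary example):

// Abort a stalled pipe: pipeline destroys both streams when the signal fires
import { pipeline } from 'stream/promises';
import { createReadStream, createWriteStream } from 'fs';

export async function copyWithTimeout(src, dest, ms = 30_000) {
  const ac = new AbortController();
  const timer = setTimeout(() => ac.abort(), ms);
  try {
    await pipeline(createReadStream(src), createWriteStream(dest), { signal: ac.signal });
  } finally {
    clearTimeout(timer);
  }
}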

Pattern 9 — Observability that actually helps

Log per-upload IDs and attach them to:

  • presign issued time,
  • client start/finish times,
  • storage ETag/checksum,
  • queue job id and worker hostname,
  • processing durations and outlier samples (p95/p99).

You can't fix what you can't see.
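
A cheap way to carry those IDs everywhere is a child logger per upload; Fastify's req.log is pino, so it costs one line. The route and field names are assumptions, and app, q, and randomUUID come from earlier snippets:

// Attach the upload id to every log line for this request and to the job it spawns
app.post('/uploads', async (req, reply) => {
  const uploadId = randomUUID();
  const log = req.log.child({ uploadId }); // everything logged below carries uploadId

  log.info({ phase: 'metadata-received', key: req.body.key });
  const job = await q.add('transcode', { uploadId, key: req.body.key }, { jobId: uploadId });
  log.info({ phase: 'enqueued', jobId: job.id });

  reply.code(202).send({ id: uploadId });
});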

Case study (composite of real teams)

A team accepted 200–400MB media files over a single Express route using a body parser that buffered to memory. At ~70 concurrent uploads, latency exploded and nodes OOM-killed.

They switched to:

  • presigned direct-to-S3,
  • multipart uploads with 8MB parts, queueSize=4,
  • API that returned 202 with an uploadId,
  • BullMQ workers pulling from storage and Worker Threads for transforms.

Results after a day of tuning:

  • Web node RSS dropped by ~80–90%.
  • p99 "upload complete → 202" fell below 300ms.
  • Processing throughput scaled linearly with worker replicas.

Nothing exotic — just clean streaming, bounded concurrency, and separation of concerns.

Quick checklist (tape this above your monitor)

  • Direct-to-storage (presigned/multipart) for large files
  • Stream (pipeline) end-to-end—no buffering
  • Bound concurrency everywhere (uploads, parts, processing)
  • Queue + workers; CPU in Worker Threads
  • Idempotency keys, retries, and checksums
  • Size limits at the edge + sane timeouts
  • Per-upload observability (ids, p95/p99)

Conclusion

High-throughput Node uploads aren't about hero servers — they're about boring, reliable plumbing. Stream the bytes. Keep the event loop free. Push heavy lifting to the right place. With a few disciplined patterns, the system feels… calm — even on Monday mornings.

If you want a follow-up with a full sample repo (Fastify + presign + S3 multipart + BullMQ + Worker Threads), drop a comment and I'll prioritize the most requested stack.