DATA PIPELINE

data-pipeline.ts

ETL pipeline: typed stages, checkpoint/resume, quality checks.

WHAT THIS PATTERN TEACHES

How to build ETL pipelines with typed stages, validation between steps, checkpoint/resume for long-running jobs, and idempotent processing via dedup keys.
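Idempotent processing via dedup keys, mentioned above, can be sketched as a stage that skips records it has already seen. The `keyOf` callback and the in-memory `Set` are illustrative assumptions; a real job would back the seen-set with durable storage (e.g. a database table) so idempotency survives restarts.

```typescript
// Dedup stage: derives a key per record and drops records whose key
// has been seen before, so re-running after a crash does not process
// the same record twice. The in-memory Set is illustrative only.
function dedup<T>(keyOf: (record: T) => string) {
  const seen = new Set<string>();
  return {
    name: "dedup",
    process: async (records: T[]): Promise<T[]> => {
      const fresh: T[] = [];
      for (const r of records) {
        const key = keyOf(r);
        if (!seen.has(key)) {
          seen.add(key);
          fresh.push(r);
        }
      }
      return fresh;
    },
  };
}
```

Because the stage returns only unseen records, replaying an input batch after a partial failure is safe: already-processed records pass through as an empty diff.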

WHEN TO USE THIS

Data ingestion, transformation, migration — any multi-step data processing workflow.

AT A GLANCE

const pipeline = new Pipeline([
  extract(source),
  validate(schema),
  transform(normalize),
  load(destination),
]);
await pipeline.run({ checkpoint: true });

FRAMEWORK IMPLEMENTATIONS

TypeScript
import * as fs from "fs";

interface Stage<In, Out> {
  name: string;
  process: (input: In) => Promise<Out>;
  validate?: (output: Out) => boolean;
}

class Pipeline {
  constructor(private stages: Stage<any, any>[]) {}

  async run(opts: { checkpoint?: boolean } = {}) {
    // The first stage (extract) ignores its input, so seed with null.
    let data: any = null;
    for (const stage of this.stages) {
      data = await stage.process(data);
      if (stage.validate && !stage.validate(data)) {
        throw new Error(`Validation failed: ${stage.name}`);
      }
      // Persist each stage's output so a crashed run can resume from
      // the last completed stage instead of starting over.
      if (opts.checkpoint) await this.save(stage.name, data);
    }
    return data;
  }

  // Minimal file-based checkpoint writer; swap in a database or
  // object store for production jobs.
  private async save(name: string, data: unknown) {
    await fs.promises.mkdir(".checkpoints", { recursive: true });
    await fs.promises.writeFile(
      `.checkpoints/${name}.json`,
      JSON.stringify(data),
    );
  }
}
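The resume half of checkpoint/resume, which `run` leaves implicit, could look like the sketch below: scan stage names in reverse for the newest saved checkpoint, then restart from the stage after it. The `loadCheckpoint` name, the `dir` parameter, and the file-per-stage JSON layout are assumptions for illustration, not part of the pattern's API.

```typescript
import * as fs from "fs";

// Find the last stage that completed (i.e. has a saved checkpoint file)
// and return the index of the stage to resume at, plus the saved output
// to feed into it. Returns index 0 / null data when no checkpoint exists.
function loadCheckpoint(
  dir: string,
  stageNames: string[],
): { index: number; data: any } {
  for (let i = stageNames.length - 1; i >= 0; i--) {
    const path = `${dir}/${stageNames[i]}.json`;
    if (fs.existsSync(path)) {
      return {
        index: i + 1,
        data: JSON.parse(fs.readFileSync(path, "utf8")),
      };
    }
  }
  return { index: 0, data: null };
}
```

A resuming `run` would then iterate `stages.slice(index)` with `data` as the seed, skipping all work that already succeeded.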