TypeScript · Node.js · Data Engineering

Parsing 7,000 Federal Contracts with TypeScript

April 28, 2025

Federal procurement data is publicly available but messy. Inconsistent field names, missing values, varying date formats, acronym-heavy descriptions. Building a backend to make this data useful required a few key design decisions upfront.

The Problem

The raw dataset contained 7,000+ IT contract records from the Government of Canada's open data portal. The client needed to benchmark their bids against historical contract values and identify which vendors were winning which categories of work.

Normalization Strategy

Every record went through a three-stage pipeline: extract, normalize, validate. The normalize step handled the bulk of the complexity. It mapped 40+ inconsistent field names down to a clean schema, parsed dates across 6 different formats, and bucketed vendor names that appeared under multiple spellings.

```typescript
interface Contract {
  id: string;
  vendor: string;
  value: number;
  category: ContractCategory;
  awardDate: Date;
  ministry: string;
}

function normalize(raw: RawRecord): Contract | null {
  // Drop records without a reference number or a positive dollar value.
  if (!raw.reference_number) return null;
  const value = parseFloat(raw.contract_value ?? raw.value ?? '0');
  if (!value || value <= 0) return null;

  return {
    id: raw.reference_number,
    vendor: normalizeVendorName(raw.vendor_name),
    value,
    category: classifyCategory(raw.description),
    awardDate: parseFlexibleDate(raw.award_date),
    ministry: raw.department ?? raw.ministry ?? 'Unknown',
  };
}
```
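The snippet leans on two helpers for the messiest fields: flexible date parsing and vendor-name bucketing. A minimal sketch of how they might look, assuming ISO and `DD/MM/YYYY` as two of the six date formats and a hand-curated alias map for vendors; the formats and aliases here are illustrative, not the real mapping:

```typescript
// Sketch of the helpers referenced above. The format list and alias
// table are assumptions for illustration, not the production mapping.

function parseFlexibleDate(raw: string): Date {
  const s = raw.trim();

  // ISO 8601 (e.g. 2024-03-15) parses unambiguously as UTC midnight.
  if (/^\d{4}-\d{2}-\d{2}$/.test(s)) return new Date(`${s}T00:00:00Z`);

  // DD/MM/YYYY (assumed to appear in some exports).
  const dmy = s.match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/);
  if (dmy) {
    const [, d, m, y] = dmy;
    return new Date(Date.UTC(Number(y), Number(m) - 1, Number(d)));
  }

  // Fall back to the platform parser for the remaining formats.
  return new Date(s);
}

// Canonical names keyed by a normalized (lowercased, space-collapsed) form.
const VENDOR_ALIASES: Record<string, string> = {
  'ibm canada ltd.': 'IBM Canada',
  'ibm canada limited': 'IBM Canada',
};

function normalizeVendorName(raw: string): string {
  const key = raw.trim().toLowerCase().replace(/\s+/g, ' ');
  return VENDOR_ALIASES[key] ?? raw.trim();
}
```

Keeping the alias table as plain data makes it easy to extend as new spelling variants surface in later data drops.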

Exposing the Data

REST endpoints let the dashboard team query contracts by ministry, vendor, date range, or category without touching the raw files. Filtering happened at the API layer, not the client. That kept payloads small and queries fast.
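The filtering itself reduces to a small pure function over the normalized records, which a route handler can then expose. A sketch under assumed names (the `ContractQuery` shape and field names are illustrative; the real API may differ):

```typescript
// Assumed shapes for illustration; mirrors the normalized schema above.
interface Contract {
  id: string;
  vendor: string;
  value: number;
  awardDate: Date;
  ministry: string;
}

interface ContractQuery {
  ministry?: string;
  vendor?: string;
  from?: Date;
  to?: Date;
}

// Pure filter: every provided criterion must match, absent ones are ignored.
// Keeping this separate from the HTTP layer makes it testable in isolation.
function filterContracts(contracts: Contract[], q: ContractQuery): Contract[] {
  return contracts.filter(c =>
    (!q.ministry || c.ministry === q.ministry) &&
    (!q.vendor || c.vendor === q.vendor) &&
    (!q.from || c.awardDate >= q.from) &&
    (!q.to || c.awardDate <= q.to)
  );
}

// A route handler (e.g. in Express) would parse req.query into a
// ContractQuery and return res.json(filterContracts(contracts, query)).
```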

Good data pipelines are boring by design. The goal is predictable output, not clever code.