System Overview
How CRUNCH takes messy data
and ships something clean.
Each row is a processing layer. Raw data enters at the top as a CSV, Excel file, or Google Sheet and flows down through AI-powered analysis, a human review gate, distributed Databricks processing, and clean output delivery. Colored labels on the left identify each layer's role.
Ingest
File Upload
Drop any CSV, Excel, or
Google Sheet up to 500 MB
Schema Snapshot
Auto-detect encoding,
delimiter, and column count
Data Profile
Null count, type guesses,
and anomaly flags on load
Input Validator
Confirms file integrity
before AI processing begins
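The Ingest cards above can be pictured with a minimal sketch. It is not CRUNCH's actual code: the function name `snapshot`, the UTF-8-then-Latin-1 fallback, and the candidate delimiter list are all assumptions made for illustration, using only the Python standard library.

```python
import csv
import io


def snapshot(raw: bytes, sample_size: int = 4096) -> dict:
    """Hypothetical helper: encoding, delimiter, column count, null counts."""
    # Assumption: try UTF-8 first, then fall back to Latin-1 (which always decodes).
    try:
        text, encoding = raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        text, encoding = raw.decode("latin-1"), "latin-1"

    # csv.Sniffer guesses the delimiter from a sample; default to a comma if it cannot.
    try:
        delimiter = csv.Sniffer().sniff(text[:sample_size], delimiters=",;\t|").delimiter
    except csv.Error:
        delimiter = ","

    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        # Stand-in for the Input Validator: refuse files with no content.
        raise ValueError("file is empty")
    header, body = rows[0], rows[1:]

    # Count empty or whitespace-only cells per column.
    nulls = {name: 0 for name in header}
    for row in body:
        for name, cell in zip(header, row):
            if not cell.strip():
                nulls[name] += 1

    return {
        "encoding": encoding,
        "delimiter": delimiter,
        "column_count": len(header),
        "row_count": len(body),
        "null_counts": nulls,
    }


profile = snapshot(b"id;name;amount\n1;Ann;10\n2;;20\n3;Bob;\n")
```

Here `profile` reports a semicolon delimiter, three columns, and one empty cell each in `name` and `amount`.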
AI Plan
Claude AI Planner
Reads every column header and samples each field. Maps data types, formats, and semantic intent. Generates a full transformation plan before a single cell is touched.
Column Classifier
Maps semantic intent per
column: date, currency, name,
ID, category, free text
Issue Detector
Flags nulls, mixed types,
bad formatting, casing
inconsistencies, and duplicates
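To make the Issue Detector's flags concrete, here is a rule-based sketch for a single column. It is a simplified stand-in, not the Claude-driven analysis described above; `guess_type` and `detect_issues` are hypothetical names, and the type guess only separates numbers from text.

```python
from collections import Counter


def guess_type(value: str) -> str:
    """Very rough type guess for one cell (hypothetical helper)."""
    v = value.strip()
    if not v:
        return "null"
    try:
        float(v.replace(",", ""))
        return "number"
    except ValueError:
        return "text"


def detect_issues(values: list[str]) -> dict:
    """Flag nulls, mixed types, casing inconsistencies, and duplicates."""
    types = Counter(guess_type(v) for v in values)
    present = [v.strip() for v in values if v.strip()]
    non_null_types = {t for t in types if t != "null"}
    # Casing is inconsistent when distinct spellings collapse once lowercased.
    casing = len(set(present)) > len({v.lower() for v in present})
    # Count repeats of exactly identical non-empty values.
    duplicates = sum(c - 1 for c in Counter(present).values() if c > 1)
    return {
        "nulls": types.get("null", 0),
        "mixed_types": len(non_null_types) > 1,
        "casing_inconsistent": casing,
        "duplicates": duplicates,
    }


issues = detect_issues(["NY", "ny", "", "NY", "12"])
```

For this sample column, `issues` records one null, mixed types, inconsistent casing, and one duplicate.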
Review
Human Review Gate
Every proposed change is shown before execution. Approve the full plan, adjust individual suggestions, or override per column. You stay in control the whole time.
Override Controls
Set custom defaults, null
handling policy, and
per-column transformation rules
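One way to picture the review gate is as a filter over the proposed plan: nothing runs unless it was approved or overridden. The sketch below assumes a hypothetical `Suggestion` record and `review` function; the action names are illustrative and not CRUNCH's real vocabulary.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    column: str
    action: str            # illustrative names, e.g. "to_iso_date", "title_case"
    approved: bool = False


def review(plan, approve_all=False, overrides=None, rejected=()):
    """Return only the steps the reviewer has signed off on."""
    overrides = overrides or {}
    final = []
    for s in plan:
        if s.column in rejected:
            continue  # the reviewer declined this suggestion
        # A per-column override replaces the proposed action and counts as approval.
        action = overrides.get(s.column, s.action)
        if approve_all or s.approved or s.column in overrides:
            final.append(Suggestion(s.column, action, approved=True))
    return final


plan = [
    Suggestion("signup", "to_iso_date"),
    Suggestion("name", "title_case"),
    Suggestion("notes", "fill_nulls"),
]
approved = review(plan, approve_all=True,
                  overrides={"name": "upper_case"}, rejected={"notes"})
```

With no approvals at all, `review(plan)` returns an empty list, which matches the idea that no cell is touched before sign-off.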
Process
Databricks Compute Engine
Distributed processing at enterprise scale. Executes the approved plan across the full dataset with parallel workloads, fault tolerance, and sub-minute completion for most files.
Normalize
Dates to ISO 8601. Currencies
to float. Names title-cased.
Casing and abbreviations unified.
Resolve
Nulls and blanks handled
by context. Inferred values
flagged and logged separately.
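The Normalize and Resolve rules can be illustrated on a single row in plain Python. This is a sketch only: it does not use Databricks, the accepted date formats and column names are assumptions, and a fixed default stands in for the context-based null handling described above.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed input formats


def to_iso_date(value: str) -> str:
    """Parse a date in one of the assumed formats and return ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def to_float(value: str) -> float:
    """Strip a leading currency symbol and thousands separators (assumes '.' decimals)."""
    return float(value.strip().lstrip("$€£").replace(",", ""))


def normalize_row(row: dict, default_amount: float = 0.0):
    """Return (clean_row, inferred); inferred lists columns that were filled in."""
    inferred = []
    clean = {
        "name": row["name"].strip().title(),
        "signup": to_iso_date(row["signup"]),
    }
    if row["amount"].strip():
        clean["amount"] = to_float(row["amount"])
    else:
        # Stand-in for context-based resolution: use a default and flag it.
        clean["amount"] = default_amount
        inferred.append("amount")
    return clean, inferred


clean, inferred = normalize_row(
    {"name": " ada LOVELACE ", "signup": "03/15/2024", "amount": "$1,250.50"}
)
```

Returning the inferred columns separately keeps filled-in values distinguishable from values that were present in the source.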
Output
CSV Export
Clean, UTF-8 encoded
comma-separated file
Excel Export
Formatted .xlsx with
column type headers
JSON Export
Structured records, ready
for APIs and pipelines
Audit Trail
Every transformation logged
in a human-readable change log
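As a rough illustration of the Output layer, the sketch below writes the same records as UTF-8 CSV and as JSON, and renders a change log as plain text. The function names and the change-record fields (`row`, `column`, `before`, `after`, `rule`) are assumptions; the Excel export is left out because it needs a third-party library.

```python
import csv
import io
import json


def export_csv(records: list[dict]) -> bytes:
    """Serialize records as UTF-8 encoded, comma-separated bytes."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


def export_json(records: list[dict]) -> str:
    """Serialize records as a JSON array of objects."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def change_log(changes: list[dict]) -> str:
    """Render one human-readable line per logged transformation."""
    return "\n".join(
        f"row {c['row']}, column {c['column']}: "
        f"{c['before']!r} -> {c['after']!r} ({c['rule']})"
        for c in changes
    )


records = [{"name": "Ada Lovelace", "amount": 1250.5}]
csv_bytes = export_csv(records)
json_text = export_json(records)
log_text = change_log(
    [{"row": 2, "column": "name", "before": "ada", "after": "Ada", "rule": "title_case"}]
)
```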