⚙️ Composer — Data Pipelines · Drag-and-drop operators · The AI Data Engineer's tool

Every pipeline, drag-and-dropped.

Drag. Drop. Deploy. No Python. No lock-in. No waiting.

Composer turns ETL operators into visual blocks. Build a production-grade workflow on a canvas — wire a Dataset, blend with Data Blend, transform, then map and load with MIL. Powered by xAQUA's in-memory query engine. Version it in Git. Deploy with one click. First pipeline on Day 1.

Built for data engineers who've had enough of boilerplate, and for analysts who shouldn't need Python to move data. Composer composes the workflow. Your team owns the outcome.

⚙️
Composer
Visual workflow editor · live
Building
[Animation — drag-and-drop pipeline build: four operators pulled from the operator library (xAQUA · UDP) onto the canvas for the RiskAnalytics pipeline. 01 · DATASET — Extract Risk Data (postgres · last_30d). 02 · BLEND — Join Customer Master (INNER · on customer_id). 03 · TRANSFORM — Cleanse · Aggregate (in-memory query engine). 04 · MIL — Map · Identify · Load (snowflake · SCD-2). Library also shows Data Load, Salesforce, Python, MySQL → SQS, SQS → Email. End state: Pipeline ready · auto-generated · Git versioned · CI/CD deployed · K8s healthy.]
🚀 Drag from the operator library. Drop on the canvas. Wire it up. Composer generates the workflow — runs on the in-memory query engine, deploys to K8s.
Composer in production
Drag · Drop — Visual Editor
Day 1 — First Pipeline
0 — Python Required
9 — Pipeline Templates
In-Memory — Query Engine
Why Composer Exists

Pipelines shouldn't be a full-time job.

Most data teams spend more time writing and maintaining pipeline code than getting value from the data that moves through it. The scripts sprawl. Schema breaks on Friday at 5pm. Nobody remembers why the cron exists. By the time a new source lands, you're three sprints behind.

Composer replaces the code with a canvas. xAQUA's in-memory query engine runs the workflow underneath — pipelines auto-generate, version in Git, and deploy through CI/CD to Kubernetes. Schema contracts catch breaks at design time. Quality operators catch bad data before it leaves the pipeline. Observability tells you what broke, and why, in minutes.

Pipelines aren't code. They're an operating model.

The Composer Foundation

More than ETL.
Built on a foundation.

Composer isn't another drag-and-drop ETL canvas. It's the tool of xAQUA's AI Data Engineer, sitting on a foundation engineered to solve every critical data pipeline challenge — semantics, lineage, observability, master data, migration testing — at the root.

PILLAR 02 · UNDERSTANDING

Semantic Layer Foundation

Powered by SemantIQ. Composer understands both sides of every pipeline — source and target — in business terms. Source-to-target field mapping is auto-generated, not hand-documented.

SemantIQ Auto schema inference Auto mapping
PILLAR 03 · LINEAGE

End-to-End Column-Level Lineage

Active Metadata from SemantIQ tracks every transform at the column level. Forward impact: "if I change this, what breaks?" Backward root-cause: "this dashboard is wrong — where did the data come from?"

Column-level Forward impact Root cause
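
Under the hood, forward impact and root cause are just traversals over a column-level dependency graph. A minimal sketch in Python — the edges and column names are hypothetical, not SemantIQ's actual model:

from collections import deque

# Hypothetical column-level lineage edges: upstream column → columns
# derived from it. Names are illustrative only.
DOWNSTREAM = {
    "salesforce.contact.region": ["warehouse.members.region"],
    "warehouse.members.region": [
        "dash.member360.region_widget",
        "ml.churn.cohort_split",
    ],
}
# Reverse the edges once for backward (root-cause) traversal.
UPSTREAM: dict[str, list[str]] = {}
for src, targets in DOWNSTREAM.items():
    for tgt in targets:
        UPSTREAM.setdefault(tgt, []).append(src)

def reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk; works for either direction of the graph."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# "If I change this, what breaks?" — forward impact
print(reachable("salesforce.contact.region", DOWNSTREAM))
# "This dashboard is wrong — where did the data come from?" — root cause
print(reachable("dash.member360.region_widget", UPSTREAM))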
PILLAR 04 · MIGRATION TESTING

Natural Language Migration Testing

The killer capability. xAQUA Analytics Data Lake lets you reconcile source and target migrations in plain English — no SQL, no scripts. Ask "Do Q3 totals match?" and get back row counts, sums, deltas, and the rows that don't reconcile. Migration testing that used to take weeks, in minutes.

Analytics Data Lake ConverseSQL No SQL required
PILLAR 05 · TRUST

Built-in Observability & Trust Score

SLA tracking, anomaly detection, schema drift alerts, and dataset-level Trust Scores are not bolted on — they're built into every operator. Quality gates fire before bad data leaves the pipeline.

SLA · Drift · Anomaly Trust Score Quality gates
PILLAR 06 · MASTER DATA

Master Data Built-In

Automated MDM, Probabilistic Entity Resolution, and SCD-0/1/2/3 strategies — all built into Composer's MIL operator. No separate MDM tool. Customer 360, Patient 360, Member 360 — by configuration.

Automated MDM Probabilistic ER SCD-0/1/2/3
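
For a feel of what an SCD-2 load does mechanically — expire the current row, append the new version — here's a hand-rolled Python/pandas sketch. Column names and the single-attribute compare are illustrative; the MIL operator configures this declaratively rather than in code:

import pandas as pd

# Illustrative SCD-2 merge. 'tier' is a hypothetical tracked attribute.
dim = pd.DataFrame([
    {"customer_id": 1, "tier": "gold", "valid_from": "2024-01-01",
     "valid_to": None, "is_current": True},
])
incoming = pd.DataFrame([{"customer_id": 1, "tier": "platinum"}])
load_date = "2024-07-01"

merged = incoming.merge(dim[dim["is_current"]], on="customer_id",
                        how="left", suffixes=("", "_old"))
changed = merged[merged["tier"] != merged["tier_old"]]

# Expire the superseded current rows...
mask = dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"]
dim.loc[mask, ["valid_to", "is_current"]] = [load_date, False]
# ...and append the new versions with open-ended validity.
new_rows = changed[["customer_id", "tier"]].assign(
    valid_from=load_date, valid_to=None, is_current=True)
dim = pd.concat([dim, new_rows], ignore_index=True)
print(dim)   # full history preserved: gold (closed) + platinum (current)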
Active Pipeline Health

Data engineers spend 60% of their time fixing broken pipelines.
Composer fixes that.

Schema drift. Silent failures. Cascading downstream errors. The firefighting tax. Composer collapses it with a four-part defense — prevent at design time, detect in real time, trace through end-to-end lineage, alert before bad data leaves the gate.

60%
The firefighting tax
Industry research

The data engineer's biggest line item isn't building. It's fixing.

Most data teams report spending roughly 60% of their working time investigating, diagnosing, and repairing pipelines that broke overnight — schemas that drifted, sources that changed, queries that silently returned the wrong rows. That's three days a week, per engineer, lost to firefighting. Composer reclaims those days. Pipelines built on Composer don't break the same way — and when something does shift upstream, you know it minutes after deploy, not the morning the dashboard is wrong.

Why 60% — three reasons
CAUSE 01 · BUILD
Every new source means a custom Python script, a code review, a deploy, and a hope-for-the-best Monday.
CAUSE 02 · DEPLOY
Schema breaks in prod because nobody validated the contract at design time. Three downstream reports are already wrong.
CAUSE 03 · OPERATE
An orchestrator runs the pipeline. Someone else watches quality. A third tool does lineage. Nothing talks.
Shift Left · Prevent breaks before they happen
Battle-tested operators from Athyna
Composer's operators come from Athyna — the same transformations data analysts have already tested interactively on real data in the studio. By the time they land in a production pipeline, they've been proven. Fewer novel transformations mean fewer novel breaks.
Pre-validated · Reusable
SemantIQ schema contracts
Schema contracts validate at design time and runtime. SemantIQ tracks every source schema; when a column is added, removed, or retyped, contracts catch it — before the next run executes, not after the dashboard is wrong. Broken contracts surface in both editors and pipelines.
SemantIQ · Active Metadata
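
A minimal sketch of what design-time contract validation checks, assuming a contract is just a column-to-type map. SemantIQ's real contracts (like the members_daily/v2.3.yaml referenced below) carry far more metadata; this only shows the drift detection:

contract = {"id": "uuid", "email": "string", "phone": "string",
            "name": "string"}
observed = {"id": "uuid", "email": "string", "phone": "string",
            "name": "string", "region": "string"}   # drifted source

def diff_schema(contract: dict, observed: dict) -> dict:
    return {
        "added":   sorted(observed.keys() - contract.keys()),
        "removed": sorted(contract.keys() - observed.keys()),
        "retyped": sorted(c for c in contract.keys() & observed.keys()
                          if contract[c] != observed[c]),
    }

drift = diff_schema(contract, observed)
if any(drift.values()):
    # In Composer this fails the build and alerts the channel;
    # here we just surface the drift.
    raise SystemExit(f"schema drift detected: {drift}")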
Shift Right · Detect and diagnose in seconds
Real-time observability
SLA tracking, throughput, latency, row-count anomaly detection, distribution drift. Slack, Teams, and PagerDuty routing. Know in minutes — not the next morning — that something is off, and exactly which run, which operator, which row count is suspect.
Live · Per-step
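
At its core, the row-count check is simple statistics — flag a run that sits far outside recent history. A toy version with invented numbers; Composer's detectors add distribution drift, latency, and SLA windows on top:

import statistics

# Hypothetical run history of row counts, plus one suspicious run.
history = [12_840_113, 12_847_089, 12_851_904, 12_846_330, 12_849_512]
latest = 9_112_004

mean, stdev = statistics.mean(history), statistics.stdev(history)
zscore = (latest - mean) / stdev
if abs(zscore) > 3:
    print(f"row-count anomaly: z={zscore:.1f} — route to Slack/PagerDuty")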
End-to-end column-level lineage
SemantIQ tracks every transform with column-level lineage. Forward: "if I change this column, what breaks?" Backward: "this dashboard is wrong — where did the data come from?" One graph. Every dependency. Root cause in seconds, not days.
SemantIQ · Column-level
SEMANTIQ ACTIVE METADATA — Schema change → instant impact analysis · LIVE ALERT · 2 min ago
[Diagram — three-column lineage alert. SOURCE: Salesforce · Contact schema (id uuid · email string · phone string · name string · region string ⚠ NEW COLUMN, added 2 min ago). COMPOSER PIPELINE: members_daily.pipeline — 01 DATASET SalesforceCDC · 02 TRANSFORM EntityResolve · 03 SEMANTIQ CONTRACT ContractValidator (⚠ schema drift detected: column 'region' added) · 04 QUALITY QualityGate · 05 LOAD SnowflakeLoad. DOWNSTREAM — forward impact, 3 dependencies: Member 360 Dashboard (uses region · 14 widgets) · Churn Prediction Model (retrain required · cohort split) · Compliance DaaS API (contract update v2.3 → v2.4).]
✓ Caught at design time. SemantIQ's column-level lineage flagged the change before the next scheduled run, with a forward-impact list ready for review.
contract: members_daily/v2.3.yaml · alerted: #data-engineering
Legacy Migration & Modernization

From legacy systems to a modern stack.
In weeks, not years.

Government agencies are stuck on mainframes. Commercial firms are stuck on systems someone wrote in 1998. Both face the same trap: undocumented business rules, opaque schemas, and migration projects that overrun every estimate. Composer breaks the trap. Built on a semantic-layer foundation that understands both sides of the migration — your legacy schema and your target system — Composer auto-generates the mapping, enforces master-data quality, and lets you reconcile source and target in plain English.

01 · UNDERSTAND

Both sides of the migration, understood.

SemantIQ models the semantics of your legacy source and your target system — Salesforce, Snowflake, Databricks, BigQuery, whatever you're migrating to. With both sides understood, source-to-target field mapping is auto-generated, not hand-documented.

  • Semantic layer for source & target
  • Auto-inferred schemas at design & runtime
  • Auto-generated source-to-target mapping
  • Documented lineage from day one
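
As a purely illustrative rendering of what an auto-generated mapping might look like — field names, casts, and confidence scores are invented, borrowing the BENEFITS_HIST example from the reconciliation demo below; this is not SemantIQ's actual artifact format:

# Hypothetical source-to-target mapping, as data.
mapping = [
    {"source": "BENEFITS_HIST.PAY_AMT", "target": "payments.payment_amt",
     "cast": "DECIMAL(18,2)", "confidence": 0.98},
    {"source": "BENEFITS_HIST.MBR_NUM", "target": "payments.member_id",
     "cast": "STRING",        "confidence": 0.95},
    {"source": "BENEFITS_HIST.PAY_DT",  "target": "payments.paid_on",
     "cast": "DATE",          "confidence": 0.97},
]

def apply_mapping(row: dict, mapping: list[dict]) -> dict:
    """Rename source fields to their target names; casts elided."""
    src_to_tgt = {m["source"].split(".")[-1]: m["target"].split(".")[-1]
                  for m in mapping}
    return {src_to_tgt.get(k, k): v for k, v in row.items()}

print(apply_mapping({"PAY_AMT": "1023.50", "MBR_NUM": "A-4471"}, mapping))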
02 · ENSURE QUALITY

The highest quality migrated data.

Migration that loses or corrupts master data isn't migration — it's data debt with a new database. Composer's quality engineering is built into every operator: profile, cleanse, deduplicate, resolve, and history-track on the way through.

  • Auto data profiling — structure · pattern · value · integrity
  • Automated Master Data Management
  • Probabilistic Entity Resolution (golden records)
  • SCD-0, SCD-1, SCD-2, SCD-3 — all built in
03 · RECONCILE

Reconcile source & target in plain English.

The killer feature: xAQUA Analytics Data Lake lets you virtually reconcile source and target — without writing a single line of SQL. Ask in English: "Do Q3 totals match?" The data lake responds with row counts, sums, deltas, and the rows that don't reconcile.

  • Plain-English reconciliation queries
  • NL → ConverseSQL → in-memory query engine
  • Row count · sum · checksum · field-level diff
  • Discrepancy detail report, instantly
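
Mechanically, a reconciliation like the one below boils down to paired aggregates plus an anti-join for the rows that don't match. A pandas sketch over toy frames — the real run plans this through NL → ConverseSQL on the in-memory engine:

import pandas as pd

# Toy stand-ins for the legacy BENEFITS_HIST table and the target.
source = pd.DataFrame({"payment_id": [1, 2, 3],
                       "amt": [100.0, 250.0, -40.0]})
target = pd.DataFrame({"payment_id": [1, 2],
                       "amt": [100.0, 250.0]})

report = {
    "row_count": {"source": len(source), "target": len(target),
                  "delta": len(target) - len(source)},
    "sum_amt": {"source": source["amt"].sum(),
                "target": target["amt"].sum(),
                "delta": target["amt"].sum() - source["amt"].sum()},
}
# Field-level diff: the rows that don't reconcile.
missing = source[~source["payment_id"].isin(target["payment_id"])]
print(report)
print(missing)   # here, the reversed payment that didn't replay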
XAQUA ANALYTICS DATA LAKE — Migration testing & reconciliation, in plain English · LIVE · NO SQL REQUIRED
👤
Reconcile Q3 2024 benefit payments between the legacy mainframe (BENEFITS_HIST) and the Snowflake target (warehouse.payments). Are totals and row counts identical?
Running reconciliation across both data sources. Results below.
RECONCILIATION · Q3 2024 PAYMENTS ⚠ DISCREPANCY DETECTED
SOURCE → TARGET
Source Legacy.BENEFITS_HIST
Target snowflake.warehouse.payments
ROW COUNT
source 12,847,103
target 12,847,089
delta ⚠ -14 records
SUM (PAYMENT_AMT)
source $4,287,341,022.18
target $4,287,338,994.61
delta ⚠ -$2,027.57 (0.00005%)
CHECKSUMS BY CATEGORY
Standard ✓ matched
Reversed ⚠ 14 records · 9/28–9/30
Adjusted ✓ matched
Discrepancy isolated to the cutover window (Sep 28–30) — likely reversed-payment edge cases that didn't replay. Want me to auto-generate a remediation pipeline in Composer to backfill these 14 records?
CASE STUDY · CALIFORNIA STATE AGENCY — Salesforce Migration · Production
6 — datasets migrated
6 — weeks · DEV → TEST → PROD
1 — fractional analyst

From legacy chaos to a clean Salesforce CRM — without an army of consultants.

A California state agency ran a tangle of legacy datasets in diverse formats, with severe data quality problems and no reliable master or reference data. Compliance reporting depended on manual reconciliations. They needed to migrate to Salesforce — fast, with audit-grade quality.

Using xAQUA Athyna with natural-language transformations (NL → ConverseSQL → in-memory query engine) for prep, and xAQUA Composer for no-code ETL into Salesforce, the team profiled, cleansed, deduplicated, and loaded six datasets through DEV → TEST → PROD with one fractional analyst. Master-data uniqueness was enforced with SCD-0 and SCD-1 strategies built directly into Composer's MIL operator.

Migrated: Compliance Tracking V1 · Compliance V2 · ISR (Farm Monitoring) · NASS (Agricultural Statistics) · plus reference datasets
Master & Reference Data
Account · Contact · Location · Address · Account Contact Associated Location · Commodity · Commodity Category · Regulatory Code
Nine Pipeline Templates

Every pipeline pattern —
templated and ready.

Drag a template. Configure the sources. Deploy. Nine production-grade starting points for the patterns teams build every quarter.

🏢
Cloud DW Integration
Snowflake · Databricks · BigQuery · Redshift. Full schema inference.
🔄
ETL / ELT Pipeline
Batch or streaming. Any source, any target. Contract-enforced.
🤝
Data Sharing
Partner DaaS and API gateway. Governed, masked, metered.
🧠
ML Training Prep
Acquire, profile, split — feed features into model training.
🤖
ML Data Pipeline
Train · Evaluate · Test · Package. End to end, no notebooks.
🧬
Multi-Domain MDM Hub
Patient · Member · Customer 360. Probabilistic entity resolution built in.
🏛️
Legacy Migration
Extract · cleanse · modernize. 18-month projects in 6 weeks.
🧹
Data Wrangling
Merge · dedupe · aggregate · filter. Reusable across pipelines.
🔍
Data Profiling
Structure, value, integrity. Profile the source before you move it.
What Composer Does

Pipelines as a first-class operating model.

🧩
Visual Workflow Composer
Drag operators onto a canvas. Wire them up. Composer generates the workflow definition, validates it, and runs it on xAQUA's in-memory query engine. No boilerplate — ever.
  • Visual task configuration with JSON schema
  • Sub-workflow reuse and parameterized templates
  • Auto-generated workflow definition file
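
For a sense of shape — this is not xAQUA's actual file format — a generated definition for the RiskAnalytics pipeline shown later on this page might render like this:

import json

# Hypothetical rendering of a Composer-generated workflow definition.
pipeline = {
    "name": "RiskAnalytics",
    "schedule": "0 */6 * * *",
    "operators": [
        {"id": "extract", "type": "UDPDataset",
         "config": {"source": "postgres://risk.transactions",
                    "asset": "last_30d"}},
        {"id": "blend", "type": "UDPDataBlend",
         "config": {"join": "INNER", "on": "customer_id"},
         "depends_on": ["extract"]},
        {"id": "transform", "type": "UDPTransformation",
         "config": {"tasks": ["filter", "group_by", "aggregate"]},
         "depends_on": ["blend"]},
        {"id": "load", "type": "UDPMIL",
         "config": {"target": "snowflake.risk.scores", "scd": 2},
         "depends_on": ["transform"]},
    ],
}
print(json.dumps(pipeline, indent=2))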
📦
Purpose-Built Operator Library
Dataset, Data Blend, Transformation, MIL, Data Load, Python, Salesforce, MySQL→SQS, SQS→Email, and more. All in a searchable library. Drag into any pipeline. Configure visually.
  • Database · SaaS · file · API connectors
  • Custom operator onboarding in minutes
  • Versioned, governed operator library
🛡️
Schema Contracts & Quality Gates
Enforce schema contracts at design time and runtime. Quality operators gate every step — bad data doesn't make it past the fence.
  • Design-time and runtime schema validation
  • Built-in DQ operators — validate, dedupe, resolve
  • Probabilistic Entity Resolution operator
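
To make "probabilistic" concrete: a toy two-field matcher that scores fuzzy name similarity plus exact email agreement, then thresholds into match / review / distinct. Production probabilistic ER (Fellegi-Sunter style) weighs many more fields and learns its weights; the names, weights, and thresholds here are invented:

from difflib import SequenceMatcher

def match_score(a: dict, b: dict) -> float:
    name_sim = SequenceMatcher(None, a["name"].lower(),
                               b["name"].lower()).ratio()
    email_eq = 1.0 if a["email"] == b["email"] else 0.0
    return 0.6 * name_sim + 0.4 * email_eq   # illustrative weights

pair = ({"name": "Jonathan Q. Smith", "email": "jq@acme.com"},
        {"name": "Jon Smith",         "email": "jq@acme.com"})
score = match_score(*pair)
verdict = ("match" if score > 0.85 else
           "review" if score > 0.6 else "distinct")
print(f"{score:.2f} → {verdict}")   # 'match' pairs merge into golden records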
⚡
CDC & Streaming
Real-time and near-real-time sync via Apache Kafka. Pull change data from Salesforce, ServiceNow, SAP, or any operational database. Ship it in seconds.
  • Kafka-native CDC streams
  • API polling for SaaS systems
  • Batch, micro-batch, and true streaming
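
For orientation, this is roughly what consuming a CDC stream looks like by hand, using the open-source kafka-python client and Debezium-style op codes. Topic, brokers, and event shape are hypothetical; Composer's operators do this wiring visually:

import json
from kafka import KafkaConsumer   # pip install kafka-python

# Hypothetical CDC topic carrying change events as JSON.
consumer = KafkaConsumer(
    "salesforce.contact.cdc",
    bootstrap_servers=["broker-1:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

for event in consumer:
    change = event.value                  # e.g. {"op": "u", "after": {...}}
    if change.get("op") in ("c", "u"):    # create / update → upsert
        print("upsert", change["after"])
    elif change.get("op") == "d":         # delete → tombstone downstream
        print("delete", change["before"])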
🔀
Git-Versioned · CI/CD-Deployed
Every pipeline is automatically versioned in Git. Every deploy runs through CI/CD. Every promotion is reviewable. Engineering process, built in.
  • Integrated GitHub repository
  • One-click deploy to Kubernetes
  • Environment promotion · dev → staging → prod
📡
Observability Built In
SLA tracking, anomaly detection, and alerting across every pipeline. Know in minutes — not the next morning — why a pipeline is off.
  • SLA, throughput, and latency tracking
  • Anomaly detection on row counts and distributions
  • Slack, Teams, PagerDuty routing
The Pipeline Lifecycle

One platform. Five steps. Zero handoffs.

Create. Modify. Deploy. Run. Monitor. The whole loop, on one canvas — no second tools, no copy-paste between systems.

[Diagram — Composer pipeline lifecycle: five stages around the central Composer hub (Q COMPOSER · UDP).
1 · Create — visually configure ETL/ELT operators on a canvas; no code, no boilerplate.
2 · Modify & Version — every change auto-versioned in Git; branch, review, merge, full history.
3 · Deploy — one-click promote to K8s; CI/CD-driven; dev → staging → prod, governed.
4 · Schedule & Run — cron, event triggers, manual runs; retries, backoff, SLAs built in.
5 · Monitor — SLA tracking, anomaly detection, alerts to Slack · Teams · PagerDuty.]
Powered by integrated platform components: Operator Catalog · Pipeline Repository · Metadata Repository · UDP 360 Database · GitHub Integration · CI/CD Pipeline · Docker Registry
Universal Connectivity

Connect anything to anything.

Any source. Any target. Any format. Composer's operators handle the integration tax — extract, transform, resolve, and load across every system you run.

[Diagram — Composer pipeline between source and target connectors. SOURCES: Oracle · Snowflake · MySQL · Salesforce · Excel · CSV · Parquet · Redshift. COMPOSER PIPELINE stages: 01 Extract · 02 Transform · 03 Entity Resolution · 04 Map & Load. TARGETS: Oracle · Snowflake · MySQL · Databricks · BigQuery · Amazon S3 · Redshift.]
01 · EXTRACT
Pull from any source
Real-time, batch, or streaming. Out-of-the-box connectors for relational, NoSQL, SaaS, cloud storage, and file formats.
02 · TRANSFORM
Cleanse, blend, aggregate
Filter, sort, pivot, merge, group, impute, and reshape — visually configured, run on the in-memory query engine.
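
The same work written out by hand in pandas, for comparison — impute, filter, group, aggregate over invented data. In Composer each line is a visually configured task, not code:

import pandas as pd

tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "amount":      [120.0, None, 75.0, 310.0, 42.0],
    "region":      ["west", "west", "east", "east", "west"],
})
result = (
    tx.assign(amount=tx["amount"].fillna(tx["amount"].median()))  # impute
      .query("amount > 50")                                       # filter
      .groupby("region", as_index=False)                          # group
      .agg(total=("amount", "sum"), avg=("amount", "mean"))       # aggregate
)
print(result)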
03 · ENTITY RESOLUTION
Match across systems
Probabilistic Entity Resolution operator unifies records — Customer 360, Patient 360, Member 360 — without custom code.
04 · MAP & LOAD
Deliver to any target
SCD-0/1/2/3 strategies, append or upsert, schema mapping. Land into warehouses, lakes, databases, or DaaS APIs.
Four Operators · One Pipeline

Compose the canvas. Composer runs the workflow.

Drag a Dataset Operator to extract. Add a Data Blend Operator to integrate sources. Cleanse and aggregate with the Transformation Operator. Map and load to your target with the MIL Operator. Wire them up — Composer generates the workflow and runs it on xAQUA's in-memory query engine. Version it in Git. Deploy to your K8s cluster.

  • In-memory query engine · zero infrastructure tax
  • Schema contracts enforced at design and run time
  • Quality gate operators on every step
  • Generated workflow is yours — inspect, edit, export
composer · core operators
── composer pipeline · risk_analytics ──

[1] UDP Dataset Operator
  source: "postgres://risk.transactions"
  asset:  "Query Asset · last_30d"           ✓ extracted

[2] UDP Data Blend Operator
  join:   "INNER · on customer_id"
  with:   "File Asset · customer_master.csv"  ✓ blended

[3] UDP Transformation Operator
  tasks:  "filter, group_by, aggregate"
  engine: "in-memory query engine"           ✓ transformed

[4] UDP MIL Operator
  target: "snowflake.risk.scores"
  scd:    "SCD-2 · history preserved"          ✓ loaded

── deploy · main@a3f9c2 ──
  workflow  RiskAnalytics.pipeline
  schedule  0 */6 * * *          ✓ active
  k8s pod   healthy             ✓ green

STATUS: GREEN · next run in 47m
Use Cases

Where Composer earns its keep.

🏛️
Legacy System Modernization
State agencies · regulatory bodies
Extract, cleanse, transform, and load from legacy mainframes and siloed operational systems. One regulator finished a migration planned for 18 months in 6 weeks — with a single analyst on Composer.
✓ Migrations that ship, not stall
🧬
MDM Hub & 360 Views
Healthcare · Financial Services
Build a Master Member / Patient / Customer Index using probabilistic entity resolution. Ingest from every operational system. Share governed 360s via DaaS API. No custom ER code.
✓ Single source of identity, across every system
❄️
Cloud Data Warehouse Pipelines
Snowflake · Databricks · BigQuery
Land data into Snowflake or Databricks without writing a line of Python. CDC from SaaS, validate the schema, quality-gate the load, and watch every run. Schema drift gets caught at the door.
✓ Warehouses you can trust on Monday morning
🧠
ML & Predictive Pipelines
Healthcare · Insurance · Public Health
Blend hospital and emergency discharge data, impute missing fields, deduplicate incidents, filter by cohort, and feed a fatal-incidence prediction model — one workflow, end to end. Train, evaluate, package, deploy.
✓ Reproducible training, retrained on schedule
Why Composer

Not another ETL tool.

Composer is a module of a unified platform — not a standalone pipeline product that needs its own catalog, its own quality engine, and its own lineage.

In-memory query engine · zero lock-in
Composer runs on xAQUA's own in-memory query engine. No proprietary runtime to license. No JVM cluster to babysit. The pipeline definition is portable, inspectable, and yours — Composer is a faster way to build it, not a cage around it.
Contracts before code runs
Schema contracts validate at design time, not just at runtime. A breaking source change fails the build — before it breaks your 3am pipeline.
Quality gates, every step
Every task has an optional quality gate — validate, dedupe, resolve, enforce. Bad data is stopped at the fence, not chased through downstream dashboards.
Same semantic layer · every pipeline
Composer reads and writes against the same business vocabulary your analysts, BI, and governance tools use. No two versions of "customer." Ever.
Operated by AI · augmented by humans
Most ETL tools are operated by humans, with an AI assistant glued on. Composer inverts that — the AI Data Engineer agent operates the canvas; humans review, approve, and steer. Augmentation, not replacement.
Deploys where your data lives
Private VPC, air-gapped, on-prem, or cloud. No data leaves your boundary. Pipelines run next to the data — including FedRAMP-aligned environments.
Meet the AI Data Engineer

Composer is the tool of xAQUA's AI Data Engineer.

The AI Data Engineer is an xAQUA agent that lives inside Composer. Powered by Active Metadata from the semantic layer, the catalog, and the lineage graph, the agent understands your sources, your business definitions, and your governance rules. Ask in English; the agent composes the pipeline, configures every operator, validates contracts, and wires the workflow.

Promote ad-hoc work from Athyna — xAQUA's interactive data studio — into Composer with one prompt. Same semantic layer. Same catalog. Same governance. The agent wraps the recipe into a scheduled, monitored, Git-versioned production pipeline. You review, approve, and steer.

See the full AI Data Team →
AI Data Engineer
xAQUA AI Data Team · always on
Operates Composer · Active Metadata
👤
Turn Priya's Athyna member prep into a daily pipeline.
Promoted Athyna recipe → Composer pipeline:

· members_daily.pipeline — 5 operators
· Schedule: 0 2 * * * · SLA 10min
· Quality gate: 99.2% threshold
· Git: main/pipelines/members_daily
· Deployed to K8s · observability wired

Running tonight. First output ready by 02:10.

Ready to stop writing boilerplate?

See Composer build a CDC pipeline from Salesforce to Snowflake — with entity resolution, quality gates, Git versioning, and K8s deployment — in under fifteen minutes.