
🧲 DocIQ Extract · Fields, not text

A PDF is not a database. Make it one.

Pull structured fields out of death certificates, contracts, financial statements, and audit evidence — with confidence scores on every value.

DocIQ Extract turns thousands of unstructured documents into rows in a table. Pick a schema. Drop the files. Get back named fields, source page references, and a confidence band on every extracted value. High-confidence rows stream into your downstream system. Low-confidence rows route to a human. Nothing falls through.

"We used to have three temps re-keying death certificates into the eligibility system. Now they review the 8% Extract isn't sure about."


🧲

SOP-PB-04 Batch · Death Certificates

3 files · Schema: Death Certificate Fields

Completed

01 Connect → 02 Schema → 03 Extract → 04 Review

Field | Value | Confidence | Page
Deceased Name | Robert T. Hargrove | 97% | DC-001 · 1
Date of Death | March 14, 2024 | 99% | DC-001 · 1
Cause of Death | Cardiac arrest | 94% | DC-001 · 2
Place of Death | Sacramento, CA | 78% | DC-001 · 1
Certifying Physician | Dr. M. Chen, MD | 91% | DC-001 · 2
Certificate Number | 2024-CA-0041728 | 96% | DC-001 · 1

36 fields · 3 docs · 91% avg confidence · 3 need review

4 · System schemas, plus custom
95%+ · Average field confidence
3 sec · Per page, most schemas
CSV/JSON · Streaming export, API ready
100% · Source page provenance

Why this exists

OCR turned ink into text. Extract turns text into rows.

Your eligibility system needs a date, not a paragraph. Your audit dashboard needs a dollar amount, not a footnote. Your contract repository needs a renewal date, not a 40-page MSA.

Generic OCR gets you halfway. It turns the scan into searchable text. Then someone — usually a temp, an analyst, or an offshore team — sits and re-keys the fields you actually wanted. That's the integration tax in human form.

DocIQ Extract closes the gap. Pick a schema. Drop the files. Get rows. With confidence scores so you know which rows are safe and which need a second pair of eyes.

System schemas

Four schemas cover most of the extraction jobs your team runs.

Ready out of the box. Or define your own — point at sample documents, name the fields, save the schema, run forever.

🪦

System schema · Active

Death Certificate Fields

Core fields from US death certificates. Drives benefit termination, beneficiary payouts, and survivor eligibility flows. Tested against fifty-state variants.

deceased_name · date_of_death · cause · place · cert_number · + 7 more

📊 12 fields

📜

System schema

Contract Key Terms

The eight clauses lawyers care about most: parties, effective date, term, renewal, termination, fees, governing law, and indemnity caps. Works on PDF and DOCX.

parties · effective_date · term_length · renewal · + 4 more

📊 8 fields

💰

System schema

Financial Statement KPIs

Revenue, COGS, gross margin, operating income, net income, cash position, AR/AP, debt — fifteen line items pulled from quarterly and annual reports. SEC filings welcome.

revenue · gross_margin · net_income · cash · + 11 more

📊 15 fields

🛡️

System schema

Audit Evidence Fields

SOX, SOC 2, ISO control evidence: control ID, owner, period, evidence reference, tester, status, exception flag, remediation. Audit packets in. Tracker row out.

control_id · owner · test_period · status · + 6 more

📊 10 fields

＋

Custom schema

Define your own

Name your fields, type each one (string, date, number, enum), provide three example documents, save. The schema is yours forever — versioned, governed, ready to run on a million pages.
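As a rough illustration of what "name your fields, type each one, provide three example documents" could look like in code, here is a minimal Python sketch. It is not DocIQ's schema format or API: the `Schema`, `SchemaField`, and `coerce` names, the "Vendor Invoice Fields" schema, and the sample file names are all hypothetical, and treating three examples as a minimum is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class FieldType(Enum):
    # The four field types named in the text above.
    STRING = "string"
    DATE = "date"
    NUMBER = "number"
    ENUM = "enum"


@dataclass(frozen=True)
class SchemaField:
    name: str
    type: FieldType
    choices: tuple = ()  # allowed values, only meaningful for ENUM fields


@dataclass
class Schema:
    name: str
    version: int
    fields: list
    example_docs: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means ready to save."""
        problems = []
        names = [f.name for f in self.fields]
        if len(names) != len(set(names)):
            problems.append("field names must be unique")
        for f in self.fields:
            if f.type is FieldType.ENUM and not f.choices:
                problems.append(f"enum field {f.name!r} needs choices")
        if len(self.example_docs) < 3:  # assumption: three is a minimum
            problems.append("provide at least three example documents")
        return problems


def coerce(field_def, raw):
    """Convert a raw extracted string to the field's declared type."""
    if field_def.type is FieldType.DATE:
        return date.fromisoformat(raw)
    if field_def.type is FieldType.NUMBER:
        return float(raw.replace(",", ""))
    if field_def.type is FieldType.ENUM:
        if raw not in field_def.choices:
            raise ValueError(f"{raw!r} is not one of {field_def.choices}")
        return raw
    return raw


# Hypothetical custom schema, for illustration only.
invoice = Schema(
    name="Vendor Invoice Fields",
    version=1,
    fields=[
        SchemaField("vendor_name", FieldType.STRING),
        SchemaField("invoice_date", FieldType.DATE),
        SchemaField("total_due", FieldType.NUMBER),
        SchemaField("currency", FieldType.ENUM, choices=("USD", "EUR", "GBP")),
    ],
    example_docs=["sample-1.pdf", "sample-2.pdf", "sample-3.pdf"],
)

print(invoice.validate())
print(coerce(invoice.fields[2], "1,250.50"))
```

Typing each field up front is what lets a downstream system receive a date or a number rather than a string it has to parse again.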


What's in the box

Six things that make extraction trustable.

🎯

Per-field confidence

Every value carries a 0–100% score from the model itself, not a heuristic. Color-banded green / yellow / red so reviewers see at a glance where to look.

📍

Page-level provenance

Every extracted value cites the exact document and page it came from. One click opens the PDF at the right page with the source passage highlighted.

⚡

Batch + single-doc modes

Run on one document or a thousand. The pipeline streams results as they complete — your dashboard fills row by row instead of waiting for the whole batch.
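The streaming behavior described above can be sketched with Python's standard `concurrent.futures` module. This is one possible way to get row-by-row results, not a description of how DocIQ's pipeline is built; `extract_document` is a hypothetical stand-in for the real per-document extraction call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_document(doc_id):
    """Stand-in for the real per-document extraction call (hypothetical)."""
    return {"doc": doc_id, "fields": {"cert_number": f"placeholder-{doc_id}"}}


def stream_results(doc_ids, max_workers=4):
    """Yield each document's result as soon as it finishes, not in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(extract_document, d) for d in doc_ids]
        for future in as_completed(futures):
            yield future.result()


# The consumer sees rows one at a time instead of waiting for the whole batch.
for row in stream_results(["DC-001", "DC-002", "DC-003"]):
    print(row["doc"], row["fields"])
```

Because results arrive in completion order, a consumer that needs input order would have to sort or key the rows by document ID.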

✏️

Inline correction

Reviewers can edit any value in the results table. Corrections are logged and become ground truth — your accuracy improves on the schemas you actually use.

📤

Native export

CSV, JSON, Excel, or push directly to a Library collection or downstream system via API. Nothing about the value or the provenance is lost in transit.
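To make "nothing about the value or the provenance is lost in transit" concrete, here is a small Python sketch that writes the same rows to CSV and to JSON Lines with the confidence and source columns intact. The column names and the `to_csv` and `to_json_lines` helpers are assumptions for illustration, not DocIQ's export format; the two sample rows reuse values shown in the demo tables on this page.

```python
import csv
import io
import json

# Assumed column layout: value plus confidence plus document/page provenance.
COLUMNS = ["field", "value", "confidence", "source_doc", "source_page"]

rows = [
    {"field": "deceased_name", "value": "Margaret L. Whitfield",
     "confidence": 98, "source_doc": "DC-002", "source_page": 1},
    {"field": "date_of_death", "value": "2024-08-22",
     "confidence": 99, "source_doc": "DC-002", "source_page": 1},
]


def to_csv(rows):
    """Serialize rows to CSV text, one line per extracted value."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows):
    """Serialize rows to JSON Lines, which suits row-by-row streaming."""
    return "\n".join(json.dumps(row) for row in rows)


print(to_csv(rows))
print(to_json_lines(rows))
```

JSON Lines keeps numeric types on the way back in, while CSV returns every cell as a string, so a CSV consumer needs the schema to restore types.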

🔁

Re-runs on schema updates

Add a field to a schema? Re-run on the historical batch with one click. Old rows get the new column — no full reprocessing, no duplicate ingestion costs.
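One way to picture "old rows get the new column" is a backfill that extracts only the fields a stored row is missing. The page does not say how DocIQ implements re-runs, so the sketch below is an assumption; `missing_fields`, `backfill`, and `fake_extract` are hypothetical names.

```python
def missing_fields(schema_fields, row):
    """Names in the updated schema that this stored row does not have yet."""
    return [name for name in schema_fields if name not in row]


def backfill(rows_by_doc, schema_fields, extract_field):
    """Extract only missing fields per document; existing values are untouched.

    Returns the number of extraction calls made.
    """
    calls = 0
    for doc_id, row in rows_by_doc.items():
        for name in missing_fields(schema_fields, row):
            row[name] = extract_field(doc_id, name)
            calls += 1
    return calls


def fake_extract(doc_id, name):
    """Stand-in for a single-field extraction call (hypothetical)."""
    return f"<{name} from {doc_id}>"


# Rows stored from an earlier run, before place_of_death was in the schema.
stored = {
    "DC-001": {"deceased_name": "Robert T. Hargrove"},
    "DC-002": {"deceased_name": "Margaret L. Whitfield"},
}
updated_schema = ["deceased_name", "place_of_death"]

print(backfill(stored, updated_schema, fake_extract))
print(stored["DC-001"])
```

A second call with the same schema makes no extraction calls, which is the property that avoids paying for the same work twice.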

The four-stage flow

Files in. Reviewed rows out.

The same four stages every time. Predictable. Auditable. Fast.

STAGE 01

Connect or Upload

Point at an S3 prefix, a SharePoint site, a Box folder, a Drive collection — or drag-drop from your desktop. DocIQ syncs incrementally on a cadence you set. PDFs, scans, Word, images: OCR, layout analysis, and language detection run automatically.

S3 · SharePoint · Box · Drive · Sync on cadence · OCR built-in

STAGE 02

Select Schema

Pick a system schema or one of yours. Preview the field list before you run — confirm the schema matches what these documents actually contain.

System or custom · Versioned

STAGE 03

Run Extraction

Documents loaded, fields extracted, confidence scored, citations linked. Watch progress in real time. Results stream to the table as they're ready.

Streaming · Cancellable

STAGE 04

Review Results

Filter by confidence band. Inline-edit low-confidence values. Export, save to Library, or hand off to a downstream API — all from one screen.

Filter · Edit · Export

Confidence-scored output

Three bands. Three handling rules.

Not every extracted field deserves the same trust. DocIQ Extract bands every value so your downstream workflow can route automatically — and your reviewers know exactly where to spend their attention.

High · 90% and above: Auto-accepted. Streamed to downstream system. No human touch unless audited.

Medium · 70–89%: Routed to human review. Flagged in the queue. Most clear up in a five-second glance.

Low · below 70%: Held for review. Source highlighted. Reviewer corrects or marks as un-extractable. Correction becomes ground truth.
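The three bands and their handling rules can be expressed as a small routing function. This is a sketch under stated assumptions: the thresholds come from the bands above, while the `band` and `route` helpers and the route labels are hypothetical names, and scores are assumed to be percentages from 0 to 100. The sample rows reuse confidence values from the DC-002 results on this page.

```python
from collections import Counter

# Hypothetical route labels, one per band described above.
ROUTES = {
    "high": "auto_accept",        # streamed to the downstream system
    "medium": "review_queue",     # flagged for a quick human check
    "low": "hold_for_review",     # held until a reviewer corrects it
}


def band(confidence_pct):
    """Map a 0-100 confidence score to its band."""
    if confidence_pct >= 90:
        return "high"
    if confidence_pct >= 70:
        return "medium"
    return "low"


def route(row):
    """Pick the handling rule for one extracted value."""
    return ROUTES[band(row["confidence"])]


rows = [
    {"field": "deceased_name", "confidence": 98},
    {"field": "date_of_death", "confidence": 99},
    {"field": "place_of_death", "confidence": 74},
    {"field": "certifying_physician", "confidence": 42},
    {"field": "cert_number", "confidence": 97},
]

for row in rows:
    print(row["field"], "->", route(row))

summary = Counter(band(r["confidence"]) for r in rows)
print(dict(summary))  # {'high': 3, 'medium': 1, 'low': 1}
```

Keeping the thresholds in one function means a team that wants a stricter auto-accept bar changes a single number rather than every downstream consumer.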

SOP-PB-04 Batch Extract · Results

JOB-2026-00118 · 3 files · 36 fields

Field | Value | Confidence | Source
deceased_name | Margaret L. Whitfield | 98% | DC-002 · 1
date_of_death | 2024-08-22 | 99% | DC-002 · 1
place_of_death | Oakland, CA · Alameda Co. | 74% | DC-002 · 1
certifying_physician | Dr. illegible | 42% | DC-002 · 2
cert_number | 2024-CA-0044891 | 97% | DC-002 · 1

33 high · 2 medium · 1 needs review · Avg confidence 91%

Where it lands

From inbox to downstream system.

🪦

Death-claim adjudication

Death certificates → deceased name, date, certificate number, cause. Streams into the eligibility system to terminate benefits and trigger survivor payouts. Hours, not weeks.

Public sector pension

📜

Contract repository hydration

Ten thousand vendor contracts → renewal dates, fee schedules, indemnity caps. Pulled into a CLM tracker. Procurement finally has a single source of truth they didn't have to type.

Enterprise legal

💰

Quarterly KPI ingestion

10-Q PDFs across a portfolio → revenue, margin, net income, cash. Loaded directly into the analytics warehouse. The IR team stops re-keying and starts analyzing.

Financial services

🛡️

Audit evidence aggregation

SOX evidence packets → control ID, owner, test result, exceptions. Status dashboard auto-updates. Auditors review the exceptions, not the row count.

GRC / internal audit

Why DocIQ Extract

Generic OCR stops where extraction begins.

/ 01

Federated, not vendor-locked

Most extraction tools demand you upload everything to their cloud. DocIQ Extract connects to your document stores — S3, SharePoint, Box, Drive, Confluence — and runs incremental extraction on a schedule. No migration. No re-uploads.

/ 02

Confidence is first-class

Every field carries a real, model-generated confidence score. Most platforms give you "extracted text" and call it done. We give you something you can route on.

/ 03

Source page provenance

Click any value, jump to the page in the original document. Audits stop being "find the original" and start being "verify the highlight."

/ 04

Corrections become ground truth

Reviewer fixes a value? It's logged, surfaced as training signal for that schema, and the next batch tends to do better on the same edge case.

/ 05

Streaming results, not blocking

Don't wait for a 10,000-document batch to finish. Rows appear as they complete. Reviewers can start working at row one, not row ten thousand.

/ 06

Inside your tenant

VPC, private LLM, your encryption keys. Death certificates, contracts, and financials never leave the perimeter. It is the same air-gapped deployment that a $300B pension fund runs.

Stop re-keying. Start running.

Bring 50 documents. Get back a clean table.

Send us a sample of your real documents. We'll define a schema, run the extraction, and walk you through the results — confidence bands, corrections, exports — in a 30-minute session.

