🧲 DocIQ Extract · Fields, not text

A PDF is not a database. Make it one.

Pull structured fields out of death certificates, contracts, financial statements, and audit evidence — with confidence scores on every value.

DocIQ Extract turns thousands of unstructured documents into rows in a table. Pick a schema. Drop the files. Get back named fields, source page references, and a confidence band on every extracted value. High-confidence rows stream into your downstream system. Low-confidence rows route to a human. Nothing falls through.

"We used to have three temps re-keying death certificates into the eligibility system. Now they review the 8% Extract isn't sure about."

🧲 SOP-PB-04 Batch · Death Certificates
3 files · Schema: Death Certificate Fields · Completed
01 Connect → 02 Schema → 03 Extract → 04 Review

Field | Value | Confidence | Page
Deceased Name | Robert T. Hargrove | 97% | DC-001 · 1
Date of Death | March 14, 2024 | 99% | DC-001 · 1
Cause of Death | Cardiac arrest | 94% | DC-001 · 2
Place of Death | Sacramento, CA | 78% | DC-001 · 1
Certifying Physician | Dr. M. Chen, MD | 91% | DC-001 · 2
Certificate Number | 2024-CA-0041728 | 96% | DC-001 · 1

36 fields · 3 docs · 91% avg confidence · 3 need review

4: System schemas · plus custom
95%+: Average field confidence
3 sec: Per page · most schemas
CSV/JSON: Streaming export · API ready
100%: Source page provenance

Why this exists

OCR turned ink into text. Extract turns text into rows.

Your eligibility system needs a date, not a paragraph. Your audit dashboard needs a dollar amount, not a footnote. Your contract repository needs a renewal date, not a 40-page MSA.

Generic OCR gets you halfway. It turns the scan into searchable text. Then someone — usually a temp, an analyst, or an offshore team — sits and re-keys the fields you actually wanted. That's the integration tax in human form.

DocIQ Extract closes the gap. Pick a schema. Drop the files. Get rows. With confidence scores so you know which rows are safe and which need a second pair of eyes.

System schemas

Four schemas cover most of the extraction jobs your team runs.

Ready out of the box. Or define your own — point at sample documents, name the fields, save the schema, run forever.

🪦
System schema · Active
Death Certificate Fields

Core fields from US death certificates. Drives benefit termination, beneficiary payouts, and survivor eligibility flows. Tested against certificate variants from all fifty states.

deceased_name date_of_death cause place cert_number + 7 more
📊 12 fields
📜
System schema
Contract Key Terms

The eight clauses lawyers care about most: parties, effective date, term, renewal, termination, fees, governing law, and indemnity caps. Works on PDF and DOCX.

parties effective_date term_length renewal + 4 more
📊 8 fields
💰
System schema
Financial Statement KPIs

Revenue, COGS, gross margin, operating income, net income, cash position, AR/AP, debt — fifteen line items pulled from quarterly and annual reports. SEC filings welcome.

revenue gross_margin net_income cash + 11 more
📊 15 fields
🛡️
System schema
Audit Evidence Fields

SOX, SOC 2, ISO control evidence: control ID, owner, period, evidence reference, tester, status, exception flag, remediation. Audit packets in. Tracker row out.

control_id owner test_period status + 6 more
📊 10 fields
+
Custom schema
Define your own

Name your fields, type each one (string, date, number, enum), provide three example documents, save. The schema is yours forever — versioned, governed, ready to run on a million pages.
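A custom schema of this kind can be pictured as a small typed structure. The sketch below is only an illustration under stated assumptions: the class names (`FieldType`, `SchemaField`, `Schema`), the example schema, and the file names are hypothetical and are not DocIQ's actual API. It also assumes that three example documents is a minimum and that dates arrive in ISO format.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class FieldType(Enum):
    # The four field types named in the text above.
    STRING = "string"
    DATE = "date"
    NUMBER = "number"
    ENUM = "enum"


@dataclass(frozen=True)
class SchemaField:
    name: str
    type: FieldType
    allowed_values: tuple = ()  # only used by ENUM fields

    def parse(self, raw: str):
        """Coerce a raw extracted string into this field's type, or raise ValueError."""
        text = raw.strip()
        if self.type is FieldType.DATE:
            return date.fromisoformat(text)  # assumption: ISO dates, e.g. 2024-03-14
        if self.type is FieldType.NUMBER:
            return float(text.replace(",", ""))
        if self.type is FieldType.ENUM:
            if text not in self.allowed_values:
                raise ValueError(f"{text!r} is not one of {self.allowed_values}")
            return text
        return text


@dataclass(frozen=True)
class Schema:
    name: str
    version: int
    fields: tuple
    example_documents: tuple

    def __post_init__(self):
        # Assumption: treat "three example documents" as a minimum.
        if len(self.example_documents) < 3:
            raise ValueError("provide at least three example documents")
        names = [f.name for f in self.fields]
        if len(names) != len(set(names)):
            raise ValueError("field names must be unique")


# Hypothetical custom schema; the field names are illustrative only.
contract_terms = Schema(
    name="Custom Contract Terms",
    version=1,
    fields=(
        SchemaField("effective_date", FieldType.DATE),
        SchemaField("annual_fee", FieldType.NUMBER),
        SchemaField("governing_law", FieldType.ENUM, ("CA", "NY", "DE")),
    ),
    example_documents=("msa_01.pdf", "msa_02.pdf", "msa_03.pdf"),
)
```

Typing each field up front means a malformed value fails loudly at parse time instead of landing in a downstream table as free text.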

What's in the box

Six things that make extraction trustable.

🎯
Per-field confidence
Every value carries a 0–100% score from the model itself, not a heuristic. Color-banded green / yellow / red so reviewers see at a glance where to look.
📍
Page-level provenance
Every extracted value cites the exact document and page it came from. One click opens the PDF at the right page with the source passage highlighted.
⚡
Batch + single-doc modes
Run on one document or a thousand. The pipeline streams results as they complete — your dashboard fills row by row instead of waiting for the whole batch.
✏️
Inline correction
Reviewers can edit any value in the results table. Corrections are logged and become ground truth — your accuracy improves on the schemas you actually use.
📤
Native export
CSV, JSON, Excel, or push directly to a Library collection or downstream system via API. Nothing about the value or the provenance is lost in transit.
🔁
Re-runs on schema updates
Add a field to a schema? Re-run on the historical batch with one click. Old rows get the new column — no full reprocessing, no duplicate ingestion costs.
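To make the export claim concrete, here is a minimal sketch of writing extracted rows to CSV and JSON Lines with the document and page reference carried alongside every value. The column names and the `to_csv` / `to_json_lines` helpers are assumptions made for illustration, not DocIQ's export format. The sample values come from the death certificate batch shown earlier on this page.

```python
import csv
import io
import json

# Assumed column layout: value, confidence, and provenance travel together.
COLUMNS = ["field", "value", "confidence", "document", "page"]

ROWS = [
    {"field": "deceased_name", "value": "Robert T. Hargrove",
     "confidence": 97, "document": "DC-001", "page": 1},
    {"field": "cause_of_death", "value": "Cardiac arrest",
     "confidence": 94, "document": "DC-001", "page": 2},
]


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json_lines(rows):
    """Serialize rows as one JSON object per line, suitable for streaming."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


if __name__ == "__main__":
    print(to_csv(ROWS))
    print(to_json_lines(ROWS))
```

One JSON object per line suits streaming, because a consumer can process each row as it arrives without waiting for a closing bracket.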
The four-stage flow

Files in. Reviewed rows out.

The same four stages every time. Predictable. Auditable. Fast.

STAGE 01
Connect or Upload
Point at an S3 prefix, a SharePoint site, a Box folder, a Drive collection — or drag-drop from your desktop. DocIQ syncs incrementally on a cadence you set. PDFs, scans, Word, images: OCR, layout analysis, and language detection run automatically.
S3 · SharePoint · Box · Drive | Sync on cadence | OCR built-in
STAGE 02
Select Schema
Pick a system schema or one of yours. Preview the field list before you run — confirm the schema matches what these documents actually contain.
System or custom | Versioned
STAGE 03
Run Extraction
Documents loaded, fields extracted, confidence scored, citations linked. Watch progress in real time. Results stream to the table as they're ready.
Streaming | Cancellable
STAGE 04
Review Results
Filter by confidence band. Inline-edit low-confidence values. Export, save to Library, or hand off to a downstream API — all from one screen.
Filter | Edit | Export
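The streaming behavior in Stage 03 can be pictured with a short sketch: rows are handed to the caller as each document finishes, not when the whole batch is done. This is one possible shape under stated assumptions, built on Python's standard `concurrent.futures`. `stream_results` and `fake_extract` are hypothetical names, and `fake_extract` merely stands in for the real extraction call, which this page does not describe.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_results(paths, extract, max_workers=4):
    """Yield (path, rows) pairs as each document finishes, not in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract, path): path for path in paths}
        for future in as_completed(futures):
            yield futures[future], future.result()


def fake_extract(path):
    # Hypothetical stand-in for the real extraction step.
    return [{"field": "cert_number", "document": path, "page": 1}]


if __name__ == "__main__":
    batch = ["DC-001.pdf", "DC-002.pdf", "DC-003.pdf"]
    for path, rows in stream_results(batch, fake_extract):
        print(path, len(rows))
```

Because the function is a generator, a review table can render the first completed document while later ones are still in flight.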
Confidence-scored output

Three bands. Three handling rules.

Not every extracted field deserves the same trust. DocIQ Extract bands every value so your downstream workflow can route automatically — and your reviewers know exactly where to spend their attention.

High · 90% and above: Auto-accepted. Streamed to downstream system. No human touch unless audited.
Medium · 70–89%: Routed to human review. Flagged in the queue. Most clear up in a five-second glance.
Low · below 70%: Held for review. Source highlighted. Reviewer corrects or marks as un-extractable. Correction becomes ground truth.
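The three handling rules translate directly into routing logic. The following is a minimal sketch that uses the thresholds stated above (90 and 70). The `ExtractedValue` type and the `band` and `route` functions are hypothetical names for illustration, and the sample values are taken from the results table on this page.

```python
from dataclasses import dataclass

HIGH_THRESHOLD = 90    # 90% and above: auto-accepted
MEDIUM_THRESHOLD = 70  # 70-89%: routed to human review; below 70%: held


@dataclass(frozen=True)
class ExtractedValue:
    field: str
    value: str
    confidence: int  # percent, 0-100
    document: str
    page: int


def band(confidence):
    """Map a 0-100 confidence score to 'high', 'medium', or 'low'."""
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= HIGH_THRESHOLD:
        return "high"
    if confidence >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def route(values):
    """Group extracted values into one queue per confidence band."""
    queues = {"high": [], "medium": [], "low": []}
    for item in values:
        queues[band(item.confidence)].append(item)
    return queues


SAMPLE = [
    ExtractedValue("deceased_name", "Margaret L. Whitfield", 98, "DC-002", 1),
    ExtractedValue("place_of_death", "Oakland, CA · Alameda Co.", 74, "DC-002", 1),
    ExtractedValue("certifying_physician", "Dr. illegible", 42, "DC-002", 2),
]

if __name__ == "__main__":
    for name, items in route(SAMPLE).items():
        print(name, [item.field for item in items])
```

Keeping the thresholds as named constants makes the routing policy easy to audit and to adjust per schema.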
SOP-PB-04 Batch Extract · Results
JOB-2026-00118 · 3 files · 36 fields

Field | Value | Confidence | Source
deceased_name | Margaret L. Whitfield | 98% | DC-002 · 1
date_of_death | 2024-08-22 | 99% | DC-002 · 1
place_of_death | Oakland, CA · Alameda Co. | 74% | DC-002 · 1
certifying_physician | Dr. illegible | 42% | DC-002 · 2
cert_number | 2024-CA-0044891 | 97% | DC-002 · 1

33 high · 2 medium · 1 needs review · Avg confidence 91%
Where it lands

From inbox to downstream system.

🪦
Death-claim adjudication

Death certificates → deceased name, date, certificate number, cause. Streams into the eligibility system to terminate benefits and trigger survivor payouts. Hours, not weeks.

Public sector pension
📜
Contract repository hydration

Ten thousand vendor contracts → renewal dates, fee schedules, indemnity caps. Pulled into a CLM tracker. Procurement finally has a single source of truth they didn't have to type.

Enterprise legal
💰
Quarterly KPI ingestion

10-Q PDFs across a portfolio → revenue, margin, net income, cash. Loaded directly into the analytics warehouse. The IR team stops re-keying and starts analyzing.

Financial services
🛡️
Audit evidence aggregation

SOX evidence packets → control ID, owner, test result, exceptions. Status dashboard auto-updates. Auditors review the exceptions, not the row count.

GRC / internal audit
Why DocIQ Extract

Generic OCR stops where extraction begins.

/ 01
Federated, not vendor-locked
Most extraction tools demand you upload everything to their cloud. DocIQ Extract connects to your document stores — S3, SharePoint, Box, Drive, Confluence — and runs incremental extraction on a schedule. No migration. No re-uploads.
/ 02
Confidence is first-class
Every field carries a real, model-generated confidence score. Most platforms give you "extracted text" and call it done. We give you something you can route on.
/ 03
Source page provenance
Click any value, jump to the page in the original document. Audits stop being "find the original" and start being "verify the highlight."
/ 04
Corrections become ground truth
Reviewer fixes a value? It's logged, surfaced as training signal for that schema, and the next batch tends to do better on the same edge case.
/ 05
Streaming results, not blocking
Don't wait for a 10,000-document batch to finish. Rows appear as they complete. Reviewers can start working at row one, not row ten thousand.
/ 06
Inside your tenant
VPC, private LLM, your encryption keys. Death certificates, contracts, and financials don't leave the perimeter. It's the same air-gapped deployment that a $300B pension fund runs.
Stop re-keying. Start running.

Bring 50 documents. Get back a clean table.

Send us a sample of your real documents. We'll define a schema, run the extraction, and walk you through the results — confidence bands, corrections, exports — in a 30-minute session.