
DAN AI — Confidence Scores and Accuracy

Every extraction is stamped with a confidence score between 0 and 100, visible on the document detail page in the dashboard and in the accuracy_percent column of document_extractions (or the confidence block in API responses).

The score reflects how certain the AI is that the value it returned is correct. It's not a guarantee of correctness — but it's a strong signal for where to focus human review.

Score ranges

Range | Meaning | Recommended action
90 – 100 | High confidence | Auto-approve in most workflows. Spot-check a sample once a week to confirm.
75 – 89 | Medium confidence | Quick human review of the key fields (total, dates, identifiers).
50 – 74 | Low confidence | Full field-by-field review. The file may be scanned, partially obscured, or use unusual terminology.
0 – 49 | Very low / failed | Re-extract with a different doc_type, or upload a clearer scan.
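
The ranges above can be turned into a small routing helper. This is a minimal sketch, not part of DAN AI itself: the function name and the action labels it returns are hypothetical, and only the thresholds (90, 75, 50) come from the table.

```python
def review_action(score: float) -> str:
    """Map a 0-100 confidence score to a review band.

    The band labels returned here are illustrative names, not values
    defined by DAN AI; only the cut-offs come from the score-range table.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "auto_approve"   # high confidence
    if score >= 75:
        return "quick_review"   # medium confidence: check key fields
    if score >= 50:
        return "full_review"    # low confidence: field-by-field review
    return "re_extract"         # very low / failed


print(review_action(82))
```

The table lists whole-number ranges, so if scores ever arrive with decimals this sketch places a value such as 89.5 in the lower band.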

Per-field vs document-level

DAN AI reports confidence at two levels:

  • Per field — under confidence in the response (e.g. { "invoice_number": 95, "total_amount": 97 }). A field with no entry was extracted with high confidence and the score was suppressed.
  • Document-level — the overall classifier confidence shown next to the document type (e.g. invoice · 95%). This is how sure DAN AI is that the file is actually an invoice.

A document can have a high document-level score (classifier is sure it's an invoice) but a low confidence on individual fields (e.g. the date is hand-written and hard to read). The fix is different for each:

  • Low document-level confidence → the doc_type is probably wrong for this file; switch types and re-extract.
  • Low per-field confidence → scan quality, ambiguous text, or a field the prompt doesn't handle well.
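
The two cases can be separated in code before deciding on a fix. The sketch below is an illustration only: the `diagnose` function, its message strings, and the default threshold of 75 (the lower edge of the medium band) are assumptions, not DAN AI behaviour.

```python
from typing import Dict, List


def diagnose(doc_score: float,
             field_scores: Dict[str, float],
             threshold: float = 75) -> List[str]:
    """Return one finding per low score, document-level first.

    `threshold` is an assumed cut-off; tune it to your own workflow.
    """
    findings: List[str] = []
    if doc_score < threshold:
        # Low classifier confidence points at the document type.
        findings.append("document: check doc_type and re-extract")
    for name in sorted(field_scores):
        if field_scores[name] < threshold:
            # Low field confidence points at the scan or the prompt.
            findings.append(
                f"field {name}: check scan quality or prompt coverage"
            )
    return findings


# A confidently classified invoice with one hard-to-read field.
for line in diagnose(95, {"invoice_number": 95, "invoice_date": 40}):
    print(line)
```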

What drives a low score

In practice, low confidence usually traces back to one of these:

  • Scan quality — low contrast, glare, skew, or low DPI. Re-scanning at higher contrast usually beats higher resolution.
  • Hand-written text — DAN AI handles printed text reliably; handwriting is much harder.
  • Unusual layout — e.g. an invoice with the total at the top instead of the bottom. Consider an account-wide custom prompt — see Custom Fields and Templates.
  • Wrong document type — the prompt doesn't know what to look for. Switch types and re-extract.
  • Field not present — if the AI returns a value but the source file doesn't actually contain that field, the score will usually be low. Consider restricting the extraction to fields you know are present.

Confidence and verification

When a human edits a field and marks the document verified, the verified values are what flow into exports and webhook payloads — regardless of confidence score. The original AI response is still preserved in document_extractions.raw_response so you can audit edits later.

In other words, confidence scores guide where review effort goes before verification; they do not determine what gets exported afterward.
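
Auditing edits comes down to comparing the AI's original values with the verified ones. The sketch below assumes both have already been loaded as flat dictionaries of field name to value; the actual layout of raw_response is not described here, so the input shape and the `audit_edits` helper are hypothetical.

```python
from typing import Any, Dict, Tuple


def audit_edits(raw_fields: Dict[str, Any],
                verified_fields: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (ai_value, verified_value)} for every changed field.

    A field missing on one side is reported with None on that side.
    """
    changes: Dict[str, Tuple[Any, Any]] = {}
    for name in sorted(set(raw_fields) | set(verified_fields)):
        before = raw_fields.get(name)
        after = verified_fields.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Illustrative values only.
ai_values = {"invoice_number": "INV-1001", "total_amount": "1250.00"}
human_values = {"invoice_number": "INV-1001", "total_amount": "1260.00"}
print(audit_edits(ai_values, human_values))
```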

Using scores in automation

Two common patterns:

Pattern 1 — auto-approve high confidence, queue the rest. Configure your webhook handler to auto-process documents where every field's confidence is ≥ 90, and create a review task for anything below.
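
A minimal sketch of Pattern 1, assuming the webhook payload has already been parsed into a dictionary with a top-level confidence block. The `route` function and its return labels are hypothetical; wire the result into whatever processing and task system you use.

```python
from typing import Any, Dict


def all_fields_confident(confidence: Dict[str, float],
                         threshold: float = 90) -> bool:
    """True when every reported field score meets the threshold.

    Fields without an entry are not checked: a missing entry means the
    score was suppressed because confidence was high.
    """
    return all(score >= threshold for score in confidence.values())


def route(payload: Dict[str, Any]) -> str:
    """Decide between auto-processing and a review task (labels assumed)."""
    confidence = payload.get("confidence", {})
    if all_fields_confident(confidence):
        return "auto_process"
    return "review_task"


print(route({"confidence": {"invoice_number": 95, "invoice_date": 62}}))
```

Because an empty confidence block passes this check, a stricter handler might also confirm that the document-level score is high before auto-processing.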

Pattern 2 — gate by specific fields. For invoices, you usually care most about total_amount and invoice_number. Auto-approve if those two are ≥ 95, regardless of what the others scored.
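
Pattern 2 can be sketched the same way. The helper name below is hypothetical, and treating a key field with no confidence entry as passing is an assumption based on scores being suppressed for high-confidence fields; a stricter version could also require that the field has an extracted value.

```python
from typing import Dict, Iterable

# The two fields named in Pattern 2 for invoices.
KEY_FIELDS = ("total_amount", "invoice_number")


def key_fields_pass(confidence: Dict[str, float],
                    key_fields: Iterable[str] = KEY_FIELDS,
                    threshold: float = 95) -> bool:
    """True when every key field scores at or above the threshold.

    Scores of other fields are ignored. A key field with no entry is
    treated as 100 (assumption: its score was suppressed as high).
    """
    return all(confidence.get(name, 100) >= threshold for name in key_fields)


# The low date score does not block approval under this pattern.
scores = {"total_amount": 97, "invoice_number": 96, "invoice_date": 55}
print(key_fields_pass(scores))
```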

Where to find scores

Place | What you see
Dashboard, document detail page | Document-level type confidence next to the type badge, plus per-field scores beside any uncertain field.
REST API response (/extract, /documents/:id) | confidence block keyed by field name; meta.accuracy_percent for the document-level score.
Webhook payload | Same shape as the API response: confidence per field.
Excel export | A _confidence column appears next to each field column in the top-level sheet.
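
Reading both score levels out of an API response or webhook payload might look like the sketch below. The sample JSON is invented for illustration: only the confidence block and meta.accuracy_percent are named in the table above, and their exact position in a real response is an assumption.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Invented sample payload; real responses will carry more keys.
SAMPLE_RESPONSE = """
{
  "confidence": {"invoice_number": 95, "total_amount": 97},
  "meta": {"accuracy_percent": 95}
}
"""


def read_scores(response: Dict[str, Any]) -> Tuple[Optional[float], Dict[str, float]]:
    """Return (document_level_score, per_field_scores).

    Missing keys yield None and an empty dict rather than raising.
    """
    doc_score = response.get("meta", {}).get("accuracy_percent")
    field_scores = dict(response.get("confidence", {}))
    return doc_score, field_scores


doc_score, field_scores = read_scores(json.loads(SAMPLE_RESPONSE))
print(doc_score, field_scores)
```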

Improving scores over time

If you consistently see low scores on a particular vendor's documents, the most reliable fixes are, in order of increasing effort:

  1. Re-scan or re-render the source PDF at higher contrast. Fastest, no engineering work.
  2. Switch to a more specific doc_type. If you've been using general, try invoice or purchase_order.
  3. Restrict extraction to a custom field list. Asks the AI to focus only on what you need — see Custom Fields and Templates.
  4. Add an account-wide custom prompt for that document type. Most invasive but most powerful — best for non-standard layouts you receive often.
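
To spot vendors whose documents score low consistently rather than occasionally, one option is to group document-level scores by vendor and compare the median against a cut-off. This is a sketch under assumptions: the (vendor, score) input, the `low_scoring_vendors` helper, the threshold of 75, and the minimum of three documents are all illustrative choices.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def low_scoring_vendors(records: Iterable[Tuple[str, float]],
                        threshold: float = 75,
                        min_docs: int = 3) -> List[str]:
    """Return vendors whose median score is below `threshold`.

    Vendors with fewer than `min_docs` documents are skipped, so a
    single bad scan does not flag a vendor.
    """
    by_vendor: Dict[str, List[float]] = defaultdict(list)
    for vendor, score in records:
        by_vendor[vendor].append(score)
    return sorted(
        vendor
        for vendor, scores in by_vendor.items()
        if len(scores) >= min_docs and median(scores) < threshold
    )


# Illustrative data only.
history = [
    ("Acme", 60), ("Acme", 70), ("Acme", 72),
    ("Globex", 95), ("Globex", 40), ("Globex", 92),
    ("Initech", 30),
]
print(low_scoring_vendors(history))
```

The median is used instead of the mean so that one unusually bad document does not outweigh an otherwise healthy history.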