OCR · Gemini Vision · Confidence Scoring

Turn a ration-card photo into
structured, validated data.

Upload an image and the pipeline preprocesses it, aligns it to a known state template, reads it with OCR, understands it with Gemini Vision, and returns clean JSON fields with a confidence score, flagging any record that needs human review.

How it works

One upload kicks off a queued, retry-safe pipeline. Every step is observable and every outcome is recorded.

01

Upload

POST an image; it's saved and queued as a Celery job.

02

Preprocess

Denoise, flatten lighting, deskew & template-align.

03

OCR + Vision

PaddleOCR reads text; Gemini extracts fields.

04

Score & validate

Confidence + field rules decide the status.

05

Result

Structured JSON with a status of completed, needs_review or failed.
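
Steps 04 and 05 can be sketched roughly as follows. This is only an illustration of how a confidence score and field rules might map onto the three statuses: the threshold, the accepted card-number lengths, the member ceiling and the function names are all assumptions, not values taken from the pipeline.

```python
from typing import Optional

# Hypothetical values for illustration; the real limits are not stated here.
REVIEW_THRESHOLD = 0.80
REQUIRED_FIELDS = ("card_number", "head_of_family", "address", "members")
CARD_NUMBER_LENGTHS = range(8, 16)  # assumed plausible lengths
MAX_MEMBERS = 20                    # assumed sanity ceiling


def validate(fields: dict) -> list[str]:
    """Return a list of rule violations; an empty list means all rules passed."""
    issues = []
    for name in REQUIRED_FIELDS:
        if fields.get(name) in (None, ""):
            issues.append(f"missing:{name}")
    card = fields.get("card_number")
    if card and len(str(card)) not in CARD_NUMBER_LENGTHS:
        issues.append("card_number:length")
    members = fields.get("members")
    if members is not None and not (
        isinstance(members, int) and 1 <= members <= MAX_MEMBERS
    ):
        issues.append("members:out_of_range")
    return issues


def decide_status(fields: Optional[dict], confidence: float) -> tuple[str, list[str]]:
    """Map extraction output to completed, needs_review or failed."""
    if fields is None:
        # Nothing could be extracted at all.
        return "failed", ["no_extraction"]
    issues = validate(fields)
    if issues or confidence < REVIEW_THRESHOLD:
        return "needs_review", issues
    return "completed", issues
```

Under these assumptions, a record with every field present and a confidence of 0.92 comes back as completed, while the same record at 0.5 is routed to needs_review even though no rule fired.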

Built for messy, real-world cards

Photos are noisy, skewed and unevenly lit. The pipeline is designed to extract reliable data anyway.

🧼

Smart preprocessing

Denoise, background-flatten and CLAHE contrast — cleaned, never hard-binarized, so detail survives.

🧭

Template alignment

ORB feature matching to per-state templates, applied only on a confident match so a bad guess can't wreck the image.

👁️

OCR + Gemini Vision

PaddleOCR and Gemini both read the original photo; the model treats OCR as a noisy hint, not gospel.

📊

Confidence scoring

Cross-checks Gemini's fields against the OCR text — agreement is the signal, hallucinations get caught.

Validation rules

Required fields, card-number length and member-count sanity checks flag records for human review.

🔁

Retry-safe queue

Failures retry with backoff; exhausted jobs land in a failed bucket. Every outcome is saved to disk.
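
To make the confidence-scoring idea concrete, here is a minimal sketch of an agreement check: the share of extracted values that also appear in the OCR text. The function names and the normalisation are assumptions made for illustration; the actual scorer may weight fields or match fuzzily.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop everything that is not a letter or digit."""
    return re.sub(r"[\W_]+", "", text.lower())


def agreement_confidence(fields: dict, ocr_text: str) -> float:
    """Fraction of non-empty extracted values that occur in the OCR text."""
    haystack = normalize(ocr_text)
    values = [normalize(str(v)) for v in fields.values() if v not in (None, "")]
    values = [v for v in values if v]  # skip values that normalise to nothing
    if not values:
        return 0.0
    hits = sum(1 for v in values if v in haystack)
    return hits / len(values)
```

A value the model invented will not be found in the OCR text and pulls the score down. Very short values, such as a single-digit member count, match too easily in this sketch, so a real scorer would likely treat them separately.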

A two-call API

Submit an image, then poll for the result. That's the whole contract.

request
# 1 · submit an image
POST /upload  # multipart: file=@card.jpg
# response: { "job_id": "abc-123", "status": "queued" }

# 2 · poll for the result
GET /result/abc-123
result
{
  "status": "completed",
  "result": {
    "structured": {
      "card_number": "KA12345678",
      "head_of_family": "R. Kumar",
      "address": "Bengaluru, KA",
      "members": 4,
      "confidence": 0.92,
      "issues": []
    }
  }
}
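
The polling half of the contract might look like the sketch below, using only the Python standard library. The base URL, poll interval and poll limit are assumptions, and `wait_for_result` and `http_get_json` are hypothetical helper names. The three terminal statuses are the ones listed in the Result step.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # hypothetical host; adjust to the deployment
TERMINAL = {"completed", "needs_review", "failed"}


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_result(
    job_id: str,
    fetch: Callable[[str], dict] = http_get_json,
    interval: float = 2.0,
    max_polls: int = 30,
) -> dict:
    """Poll GET /result/{job_id} until the status is terminal or polls run out."""
    for _ in range(max_polls):
        payload = fetch(f"{BASE_URL}/result/{job_id}")
        if payload.get("status") in TERMINAL:
            return payload
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the loop testable without a running server, since a stub that returns canned payloads can stand in for the HTTP call.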

The stack

A focused, modern Python backend with a Streamlit control room.

FastAPI · API
Celery + Redis · queue
OpenCV · preprocessing
PaddleOCR · text
Gemini Vision · understanding
Streamlit · dashboard