Upload an image and the pipeline preprocesses it, aligns it to a known state template, reads it with OCR, understands it with Gemini Vision, and returns clean JSON fields with a confidence score — flagging anything that needs a human.
One upload kicks off a queued, retry-safe pipeline. Every step is observable and every outcome is recorded.
POST an image; it's saved and queued as a Celery job.
Denoise, flatten lighting, deskew & template-align.
PaddleOCR reads text; Gemini extracts fields.
Confidence + field rules decide the status.
Structured JSON: completed, needs_review or failed.
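The last two steps can be sketched as a pair of small pure functions. This is a minimal sketch rather than the project's actual code: the 0.8 review threshold, the ten-character card-number length (taken from the sample `KA12345678`), the member-count bound, the `extraction_ok` flag and the names `validate_fields` and `decide_status` are all assumptions made for illustration. Only the field names and the three statuses come from this page.

```python
from typing import Any

REQUIRED_FIELDS = ("card_number", "head_of_family", "address", "members")
CARD_NUMBER_LENGTH = 10   # assumption: matches the sample "KA12345678"
MAX_MEMBERS = 30          # assumption: upper sanity bound for one household
REVIEW_THRESHOLD = 0.8    # assumption: below this, a human takes a look


def validate_fields(fields: dict[str, Any]) -> list[str]:
    """Return human-readable issues; an empty list means every rule passed."""
    issues = []
    for name in REQUIRED_FIELDS:
        if fields.get(name) in (None, ""):
            issues.append(f"missing field: {name}")
    card = fields.get("card_number")
    if card and len(str(card)) != CARD_NUMBER_LENGTH:
        issues.append("card_number has unexpected length")
    members = fields.get("members")
    if members is not None and not (
        isinstance(members, int) and 1 <= members <= MAX_MEMBERS
    ):
        issues.append("members count out of range")
    return issues


def decide_status(
    fields: dict[str, Any], confidence: float, extraction_ok: bool = True
) -> str:
    """Map rule results and confidence onto one of the three statuses."""
    if not extraction_ok:
        return "failed"
    if validate_fields(fields) or confidence < REVIEW_THRESHOLD:
        return "needs_review"
    return "completed"
```

Keeping the rules separate from the status decision means the list of issues can be returned to the caller alongside the status, so a reviewer sees why a record was flagged.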
Photos are noisy, skewed and unevenly lit. The pipeline is designed to extract reliable data anyway.
Denoise, background-flatten and CLAHE contrast — cleaned, never hard-binarized, so detail survives.
ORB feature matching to per-state templates, applied only on a confident match so a bad guess can't wreck the image.
PaddleOCR and Gemini both read the original photo; the model treats OCR as a noisy hint, not gospel.
Cross-checks Gemini's fields against the OCR text — agreement is the signal, hallucinations get caught.
Required fields, card-number length and member-count sanity checks flag records for human review.
Failures retry with backoff; exhausted jobs land in a failed bucket. Every outcome is saved to disk.
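The cross-check between Gemini's fields and the OCR text could look roughly like the sketch below. It is one possible approach, not the project's confirmed method: it normalizes both sides and tests for a plain substring match, and the function name `ocr_agreement` and the scoring formula are hypothetical. The caller is assumed to pass only text-like fields such as `card_number` and `head_of_family`, since a bare number like a member count would match almost any OCR output.

```python
def _normalize(text: str) -> str:
    """Casefold and keep only letters and digits, so spacing and punctuation
    differences between OCR output and model output do not matter."""
    return "".join(ch for ch in text.casefold() if ch.isalnum())


def ocr_agreement(fields: dict[str, object], ocr_text: str) -> tuple[float, list[str]]:
    """Return (share of fields found in the OCR text, names of fields not found)."""
    haystack = _normalize(ocr_text)
    unsupported = []
    checked = 0
    for name, value in fields.items():
        needle = _normalize(str(value))
        if not needle:
            continue  # nothing to compare for empty values
        checked += 1
        if needle not in haystack:
            unsupported.append(name)
    score = 1.0 if checked == 0 else (checked - len(unsupported)) / checked
    return score, unsupported


ocr_text = "RATION CARD\nCard No: KA-1234 5678\nHead: R Kumar\n"
fields = {"card_number": "KA12345678", "head_of_family": "R. Kumar"}
score, unsupported = ocr_agreement(fields, ocr_text)
```

A field that appears in `unsupported` has no backing in the OCR text, which is a reasonable cue to lower the confidence score or add an entry to `issues`. Exact substring matching is strict; a fuzzy comparison would tolerate OCR misreads at the cost of more false agreement.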
Submit an image, then poll for the result. That's the whole contract.
# 1 · submit an image
POST /upload            # multipart: file=@card.jpg
→ { "job_id": "abc-123", "status": "queued" }

# 2 · poll for the result
GET /result/abc-123
{
  "status": "completed",
  "result": {
    "structured": {
      "card_number": "KA12345678",
      "head_of_family": "R. Kumar",
      "address": "Bengaluru, KA",
      "members": 4,
      "confidence": 0.92,
      "issues": []
    }
  }
}
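On the client side, the polling half of this contract can be sketched with the standard library alone. The helper names `fetch_result` and `wait_for_result`, the one-second interval and the attempt limit are assumptions for illustration. The sketch also assumes that any status other than `completed`, `needs_review` or `failed` means the job is still in progress.

```python
import json
import time
import urllib.request
from typing import Callable

# The three outcomes this page lists; anything else is treated as "keep waiting".
TERMINAL_STATUSES = {"completed", "needs_review", "failed"}


def fetch_result(base_url: str, job_id: str) -> dict:
    """GET <base_url>/result/<job_id> and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/result/{job_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_result(
    job_id: str,
    fetch: Callable[[str], dict],
    interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        payload = fetch(job_id)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if attempt < max_attempts - 1:
            sleep(interval)  # no need to wait after the final attempt
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")
```

Against a running server this would be called as `wait_for_result("abc-123", lambda job_id: fetch_result("http://localhost:8000", job_id))`, where the host and port are placeholders. Passing `fetch` and `sleep` in as arguments keeps the loop easy to test without a network.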
A focused, modern Python backend with a Streamlit control room.