jpskill.com
💼 ビジネス コミュニティ

receipt-scanner-master

領収書のスキャン、OCR精度向上、経費分類、データベース連携など、領収書に関する様々な処理に対応し、アップロード時のトラブルシューティングも行うSkill。

📜 元の英語説明(参考)

Master receipt scanning operations including parsing, debugging, enhancing accuracy, and database integration. Use when working with receipts, images, OCR issues, expense categorization, or troubleshooting receipt uploads.

🇯🇵 日本人クリエイター向け解説

一言でいうと

領収書のスキャン、OCR精度向上、経費分類、データベース連携など、領収書に関する様々な処理に対応し、アップロード時のトラブルシューティングも行うSkill。

※ jpskill.com 編集部が日本のビジネス現場向けに補足した解説です。Skill本体の挙動とは独立した参考情報です。

⚡ おすすめ: コマンド1行でインストール(60秒)

下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。 ダウンロード → 解凍 → 配置まで全自動。

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o receipt-scanner-master.zip https://jpskill.com/download/18970.zip && unzip -o receipt-scanner-master.zip && rm receipt-scanner-master.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/18970.zip -OutFile "$d\receipt-scanner-master.zip"; Expand-Archive "$d\receipt-scanner-master.zip" -DestinationPath $d -Force; ri "$d\receipt-scanner-master.zip"

完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。

💾 手動でダウンロードしたい(コマンドが難しい人向け)
  1. 1. 下の青いボタンを押して receipt-scanner-master.zip をダウンロード
  2. 2. ZIPファイルをダブルクリックで解凍 → receipt-scanner-master フォルダができる
  3. 3. そのフォルダを C:\Users\あなたの名前\.claude\skills\(Win)または ~/.claude/skills/(Mac)へ移動
  4. 4. Claude Code を再起動

⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。

🎯 このSkillでできること

下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。

📦 インストール方法 (3ステップ)

  1. 1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
  2. 2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
  3. 3. 展開してできたフォルダを、ホームフォルダの .claude/skills/ に置く
    • · macOS / Linux: ~/.claude/skills/
    • · Windows: %USERPROFILE%\.claude\skills\

Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。

詳しい使い方ガイドを見る →
最終更新
2026-05-18
取得日時
2026-05-18
同梱ファイル
1

📖 Skill本文(日本語訳)

※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。

[Skill 名] receipt-scanner-master

レシートスキャナーマスター

AIを活用したOCRを使用してレシート画像から構造化データを抽出し、データベースに保存するレシートスキャンシステムをマスターしましょう。

このスキルでできること

このスキルは、以下のことを支援します。

  1. レシート画像(JPG、PNG、WebP、PDF)を構造化データに解析します。
  2. OCRの精度問題や抽出エラーをデバッグします。
  3. レシート解析エンジンとプロンプトを強化します。
  4. Webインターフェースを通じてレシートのアップロードをテストします。
  5. データベース統合の問題をトラブルシューティングします。
  6. 抽出されたデータを実際のレシートと照合して検証します。
  7. カテゴリ分類と明細項目の抽出を改善します。

システムアーキテクチャ

フロントエンドコンポーネント

Receipt Scanner Component: /home/adamsl/planner/office-assistant/js/components/receipt-scanner.js

  • http://localhost:8080/receipt-scanner.html にある主要なレシートスキャンインターフェースです。
  • レシート画像のドラッグ&ドロップまたはファイルアップロードに対応しています。
  • レシートを解析し、明細項目を表形式で表示します。
  • 各明細項目にはカテゴリピッカーのドロップダウンがあります。
  • 項目はカテゴリ分類されるとすぐにデータベースに自動保存されます。
  • カテゴリ分類された項目のみが保存されます(未分類の項目は無視されます)。
  • レシート全体のカテゴリピッカーはありません(削除されました)。

Upload Component: /home/adamsl/planner/office-assistant/js/upload-component.js

  • 代替のアップロードインターフェース(銀行取引明細書用)です。
  • システムからの最近のダウンロードを表示します。
  • ターミナル表示を通じてリアルタイムの処理フィードバックを表示します。
  • バックエンドからのストリーミング応答(Server-Sent Events)を処理します。
  • 正常なインポート後にファイルリストを自動更新します。

バックエンドコンポーネント

Receipt Parser: app/services/receipt_parser.py

  • ファイルの種類とサイズを検証します。
  • 画像を処理および圧縮します。
  • 一時的および永続的なファイルストレージを管理します。
  • 抽出のためにAIエンジンと連携します。

Receipt Engine: app/services/receipt_engine.py

  • OCRと抽出にGoogle Gemini AIを使用します。
  • 厳格な精度検証ルールを実装しています。
  • Pydanticモデルを介して構造化データを返します。
  • モデルを次の順序で試行します: gemini-2.5-flash (最初)、2.0-flash、2.5-pro、pro-latest
  • proクォータ制限を避けるため、flashモデルを最初に利用します。
  • flashモデルとproモデルでクォータが分かれています。

API Endpoints: app/api/receipt_endpoints.py

  • /api/parse-receipt - レシート画像をアップロードして解析します(一時データを返し、保存はしません)。
  • /api/receipt-items - カテゴリ分類された個々の明細項目を自動保存します。
  • /api/save-receipt - カテゴリ分類された項目の最終保存(バッチ操作)です。
  • /api/receipts/{expense_id} - レシートのメタデータを取得します。
  • /api/receipts/file/{year}/{month}/{filename} - 保存されたレシートファイルを提供します。

Data Models: app/models/receipt_models.py

  • ReceiptExtractionResult - 完全なレシートデータ構造です。
  • ReceiptItem - カテゴリ分類された個々の明細項目です。
  • ReceiptTotals - 小計、税金、チップ、割引、合計です。
  • ReceiptPartyInfo - 販売者詳細です。
  • ReceiptMeta - 解析メタデータとモデル情報です。
  • PaymentMethod - Enum: CASH, CARD, BANK, OTHER です。

データベース統合

テーブル:

  • expenses - 主要な経費エントリ(金額、日付、カテゴリ、支払い方法)です。
  • receipt_metadata - 解析メタデータ(モデル、信頼度、生応答)です。

ストレージ構造:

app/data/receipts/
├── YYYY/
│   ├── MM/
│   │   ├── receipt_TIMESTAMP_filename.jpg
│   │   └── receipt_TIMESTAMP_filename.pdf
└── temp/
    └── temp_receipt_TIMESTAMP_filename.jpg

このスキルの使い方

ステップ1: レシート解析のテスト

レシート画像を解析して構造化データを抽出します。

# APIサーバーが実行中でない場合は起動します
python3 api_server.py

# curlでテストします(別のターミナルから)
curl -X POST "http://localhost:8000/api/parse-receipt" \
  -F "file=@/path/to/receipt.jpg"

期待される応答:

{
  "parsed_data": {
    "transaction_date": "2025-01-15",
    "payment_method": "CARD",
    "party": {
      "merchant_name": "Walmart",
      "merchant_phone": null,
      "merchant_address": "123 Main St",
      "store_location": "Store #1234"
    },
    "items": [
      {
        "description": "MILK WHOLE GAL",
        "quantity": 1.0,
        "unit_price": 4.99,
        "line_total": 4.99
      }
    ],
    "totals": {
      "subtotal": 4.99,
      "tax_amount": 0.35,
      "tip_amount": 0.0,
      "discount_amount": 0.0,
      "total_amount": 5.34
    },
    "meta": {
      "currency": "USD",
      "receipt_number": "12345",
      "model_name": "gemini-2.5-pro"
    }
  },
  "temp_file_name": "temp_receipt_20250115T120000Z_receipt.jpg"
}

ステップ2: OCR精度問題のデバッグ

OCRが誤った金額や説明を生成する場合:

よくある問題:

  1. 数字の混同: 4↔9, 3↔8, 5↔6, 0↔8, 1↔7
  2. 項目不足: レシートから項目が抽出されない
  3. 合計の誤り: 抽出された金額が一致しない
  4. 画像品質の低下: ぼやけている、暗い、または低解像度の画像

デバッグプロセス:

  1. 元の画像品質を確認します:

    # レシート画像を表示します
    open /path/to/receipt.jpg
    # または
    xdg-open /path/to/receipt.jpg
    • テキストははっきりと読めますか?
    • 画像は適切に配置されていますか?
    • 十分なコントラストがありますか?
  2. app/services/receipt_engine.py:96-173Geminiプロンプトを確認します:

    • 精度ルールと検証ステップを探します。
    • 新しい問題タイプに特定の指示が必要かどうかを確認します。
    • 数字の混同防止ルールが明確であることを確認します。
  3. より高品質の画像でテストします:

    • 設定で RECEIPT_IMAGE_MAX_WIDTH_PX を増やします。
    • receipt_parser.py:80,83 でJPEG品質を上げます。
  4. 検証ロジックを追加します:

    • 各項目について quantity × unit_price = line_total を確認します。
    • sum(line_totals) ≈ subtotal を検証します。
    • subtotal + tax - discount = total を比較します。
  5. AIの生応答を調べます:

    # receipt_engine.py:78 にデバッグログを追加します
    print(f"Raw Gemini Response: {json_response}")

ステップ3: レシートパーサーの強化

解析精度と機能を向上させるには:

Geminiプロンプトの変更 (app/services/receipt_engine.py):


def _get_prompt(self) -> str:
    return """
    You are an expert at extracting st

(原文がここで切り詰められています)
📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開

Receipt Scanner Master

Master the receipt scanning system that uses AI-powered OCR to extract structured data from receipt images and store them in the database.

What This Skill Does

This skill helps you:

  1. Parse receipt images (JPG, PNG, WebP, PDF) into structured data
  2. Debug OCR accuracy issues and extraction errors
  3. Enhance the receipt parsing engine and prompts
  4. Test receipt uploads through the web interface
  5. Troubleshoot database integration issues
  6. Validate extracted data against actual receipts
  7. Improve categorization and line item extraction

System Architecture

Frontend Components

Receipt Scanner Component: /home/adamsl/planner/office-assistant/js/components/receipt-scanner.js

  • Primary receipt scanning interface at http://localhost:8080/receipt-scanner.html
  • Drag-and-drop or file upload for receipt images
  • Parses receipts and displays line items in a table
  • Each line item has a category-picker dropdown
  • Items auto-save to database immediately when categorized
  • Only categorized items are saved (uncategorized items ignored)
  • No overall receipt-level category picker (removed)

Upload Component: /home/adamsl/planner/office-assistant/js/upload-component.js

  • Alternative upload interface (bank statements)
  • Displays recent downloads from the system
  • Shows real-time processing feedback via terminal display
  • Handles streaming responses from backend (Server-Sent Events)
  • Auto-refreshes file list after successful imports

Backend Components

Receipt Parser: app/services/receipt_parser.py

  • Validates file types and sizes
  • Processes and compresses images
  • Manages temporary and permanent file storage
  • Coordinates with AI engine for extraction

Receipt Engine: app/services/receipt_engine.py

  • Uses Google Gemini AI for OCR and extraction
  • Implements strict accuracy validation rules
  • Returns structured data via Pydantic models
  • Tries models in order: gemini-2.5-flash (first), 2.0-flash, 2.5-pro, pro-latest
  • Flash model used first to avoid pro quota limits
  • Separate quotas for flash vs pro models

API Endpoints: app/api/receipt_endpoints.py

  • /api/parse-receipt - Uploads and parses receipt image (returns temp data, doesn't save)
  • /api/receipt-items - Auto-saves individual line items when categorized
  • /api/save-receipt - Final save for categorized items (batch operation)
  • /api/receipts/{expense_id} - Retrieves receipt metadata
  • /api/receipts/file/{year}/{month}/{filename} - Serves stored receipt files

Data Models: app/models/receipt_models.py

  • ReceiptExtractionResult - Complete receipt data structure
  • ReceiptItem - Individual line items with categorization
  • ReceiptTotals - Subtotal, tax, tip, discount, total
  • ReceiptPartyInfo - Merchant details
  • ReceiptMeta - Parsing metadata and model info
  • PaymentMethod - Enum: CASH, CARD, BANK, OTHER

Database Integration

Tables:

  • expenses - Main expense entries (amount, date, category, method)
  • receipt_metadata - Parsing metadata (model, confidence, raw response)

Storage Structure:

app/data/receipts/
├── YYYY/
│   ├── MM/
│   │   ├── receipt_TIMESTAMP_filename.jpg
│   │   └── receipt_TIMESTAMP_filename.pdf
└── temp/
    └── temp_receipt_TIMESTAMP_filename.jpg

How to Use This Skill

Step 1: Test Receipt Parsing

Parse a receipt image to extract structured data:

# Start the API server if not running
python3 api_server.py

# Test with curl (from another terminal)
curl -X POST "http://localhost:8000/api/parse-receipt" \
  -F "file=@/path/to/receipt.jpg"

Expected Response:

{
  "parsed_data": {
    "transaction_date": "2025-01-15",
    "payment_method": "CARD",
    "party": {
      "merchant_name": "Walmart",
      "merchant_phone": null,
      "merchant_address": "123 Main St",
      "store_location": "Store #1234"
    },
    "items": [
      {
        "description": "MILK WHOLE GAL",
        "quantity": 1.0,
        "unit_price": 4.99,
        "line_total": 4.99
      }
    ],
    "totals": {
      "subtotal": 4.99,
      "tax_amount": 0.35,
      "tip_amount": 0.0,
      "discount_amount": 0.0,
      "total_amount": 5.34
    },
    "meta": {
      "currency": "USD",
      "receipt_number": "12345",
      "model_name": "gemini-2.5-pro"
    }
  },
  "temp_file_name": "temp_receipt_20250115T120000Z_receipt.jpg"
}

Step 2: Debug OCR Accuracy Issues

When OCR produces incorrect amounts or descriptions:

Common Issues:

  1. Digit Confusion: 4↔9, 3↔8, 5↔6, 0↔8, 1↔7
  2. Missing Items: Items not extracted from receipt
  3. Wrong Totals: Extracted amounts don't match
  4. Poor Image Quality: Blurry, dark, or low-resolution images

Debug Process:

  1. Check the raw image quality:

    # View the receipt image
    open /path/to/receipt.jpg
    # or
    xdg-open /path/to/receipt.jpg
    • Is text clearly readable?
    • Is image properly oriented?
    • Is there sufficient contrast?
  2. Review the Gemini prompt in app/services/receipt_engine.py:96-173:

    • Look for the accuracy rules and verification steps
    • Check if new issue types need specific instructions
    • Verify digit confusion prevention rules are clear
  3. Test with higher quality image:

    • Increase RECEIPT_IMAGE_MAX_WIDTH_PX in settings
    • Increase JPEG quality in receipt_parser.py:80,83
  4. Add validation logic:

    • Check quantity × unit_price = line_total for each item
    • Verify sum(line_totals) ≈ subtotal
    • Compare subtotal + tax - discount = total
  5. Examine the raw AI response:

    # Add debug logging in receipt_engine.py:78
    print(f"Raw Gemini Response: {json_response}")

Step 3: Enhance the Receipt Parser

To improve parsing accuracy and features:

Modify the Gemini Prompt (app/services/receipt_engine.py):

def _get_prompt(self) -> str:
    return """
    You are an expert at extracting structured data from receipt images with EXTREME ACCURACY.

    [Add new instructions here, such as:]

    **NEW RULE**: For grocery store receipts, items often have:
    - Short codes (e.g., "VEG", "DAIRY", "MEAT")
    - Weight-based pricing (price per lb/kg)
    - Multi-buy discounts (e.g., "2 for $5")

    **VALIDATION ENHANCEMENT**: Before returning JSON:
    1. Verify every item's math: quantity × unit_price = line_total
    2. Sum all line_totals and compare to subtotal
    3. Check: subtotal + tax - discount + tip = total_amount
    4. If any validation fails, RE-EXAMINE the receipt more carefully

    ... [rest of prompt]
    """

Improve Image Processing (app/services/receipt_parser.py):

async def _process_image(self, image_data: bytes, mime_type: str):
    if mime_type.startswith("image/"):
        img = Image.open(BytesIO(image_data))

        # Add preprocessing steps:
        # 1. Auto-rotate based on EXIF
        # 2. Increase contrast for faded receipts
        # 3. Sharpen slightly for better OCR
        # 4. Convert to grayscale if color isn't needed

Add Custom Validation (app/api/receipt_endpoints.py):

@router.post("/parse-receipt")
async def parse_receipt_endpoint(file: UploadFile = File(...)):
    parsed_data, temp_file_name = await parser.process_receipt(file)

    # Add validation here:
    validation_errors = validate_receipt_data(parsed_data)
    if validation_errors:
        return JSONResponse(
            status_code=422,
            content={
                "errors": validation_errors,
                "parsed_data": parsed_data,
                "temp_file_name": temp_file_name
            }
        )

    return ParseReceiptResponse(...)

Step 4: Test Through Web Interface

Test the complete workflow including UI:

  1. Start the API server:

    cd /home/adamsl/planner/nonprofit_finance_db
    python3 api_server.py
  2. Open the web interface:

    cd /home/adamsl/planner/office-assistant
    # Open index.html in browser or use a local server
    python3 -m http.server 8080
    # Navigate to http://localhost:8080
  3. Test the upload flow:

    • Download a test receipt (PDF or image) to ~/Downloads
    • Verify it appears in the upload component
    • Select the receipt and click "Import Selected PDF"
    • Watch the terminal output for processing steps
    • Verify success message and database insertion
  4. Check database entries:

    # Connect to database and verify
    mysql -u root -p nonprofit_finance_db
    -- Check latest expense entries
    SELECT * FROM expenses ORDER BY id DESC LIMIT 5;
    
    -- Check receipt metadata
    SELECT * FROM receipt_metadata ORDER BY id DESC LIMIT 5;
    
    -- Verify file storage
    SELECT expense_id, receipt_url FROM expenses WHERE receipt_url IS NOT NULL LIMIT 5;

Step 5: Troubleshoot Database Issues

Common database integration problems:

Issue: Receipt parsed but not saved to database

Debug steps:

# Check API server logs
tail -f api_server.log

# Look for errors in save_receipt_endpoint
grep -A 10 "Error saving expense" api_server.log

# Verify database connection
python3 -c "from app.repositories.expenses import ExpenseRepository; repo = ExpenseRepository(); print('Connection OK')"

Issue: File saved to temp but not moved to permanent storage

Debug steps:

# Check temp directory
ls -lth app/data/receipts/temp/ | head -20

# Check permanent storage structure
ls -R app/data/receipts/ | grep -E "^\\./"

# Verify permissions
ls -ld app/data/receipts/

Issue: Categorization not working

Debug steps:

# Check categories table
mysql -u root -p -e "SELECT id, name, category_path FROM categories ORDER BY id;" nonprofit_finance_db

# Verify category_id assignments in parsed items
# Items without category_id are not saved to database

Step 6: Validate Extraction Accuracy

Manually verify OCR accuracy:

  1. Get the parsed data:

    curl -X POST "http://localhost:8000/api/parse-receipt" \
      -F "file=@receipt.jpg" | jq '.'
  2. Compare against actual receipt:

    • Open receipt image side-by-side
    • Check each line item: description, quantity, price, total
    • Verify merchant name and address
    • Confirm tax amount and final total
    • Note any discrepancies
  3. Calculate accuracy metrics:

    # Create a validation script
    import json
    
    def validate_receipt(parsed_json, actual_receipt_data):
        errors = []
    
        # Check item count
        if len(parsed_json['items']) != len(actual_receipt_data['items']):
            errors.append(f"Item count mismatch: {len(parsed_json['items'])} vs {len(actual_receipt_data['items'])}")
    
        # Check each item
        for i, (parsed, actual) in enumerate(zip(parsed_json['items'], actual_receipt_data['items'])):
            if parsed['line_total'] != actual['line_total']:
                errors.append(f"Item {i}: ${parsed['line_total']} vs ${actual['line_total']}")
    
        # Check total
        if parsed_json['totals']['total_amount'] != actual_receipt_data['total']:
            errors.append(f"Total: ${parsed_json['totals']['total_amount']} vs ${actual_receipt_data['total']}")
    
        return errors

Configuration Files

Environment Variables (.env):

GEMINI_API_KEY=your_gemini_api_key_here

# Receipt settings
RECEIPT_MAX_SIZE_MB=10
RECEIPT_IMAGE_MAX_WIDTH_PX=2048
RECEIPT_IMAGE_MAX_HEIGHT_PX=2048
RECEIPT_PARSE_TIMEOUT_SECONDS=30
RECEIPT_UPLOAD_DIR=app/data/receipts
RECEIPT_TEMP_UPLOAD_DIR=app/data/receipts/temp

Settings (app/config.py):

class Settings(BaseSettings):
    GEMINI_API_KEY: str
    RECEIPT_MAX_SIZE_MB: int = 10
    RECEIPT_IMAGE_MAX_WIDTH_PX: int = 1024
    RECEIPT_IMAGE_MAX_HEIGHT_PX: int = 1024
    RECEIPT_PARSE_TIMEOUT_SECONDS: int = 30
    RECEIPT_UPLOAD_DIR: str = "app/data/receipts"
    RECEIPT_TEMP_UPLOAD_DIR: str = "app/data/receipts/temp"

Receipt Scanner Workflow (Important!)

CRITICAL: Items do NOT automatically save when you scan a receipt. You must categorize items for them to be saved.

Workflow Steps:

  1. Upload receipt → Parses and shows line items (nothing saved yet)
  2. Select category for each itemItem saves immediately to database
  3. "Save Expense" button → Optional final confirmation

What Gets Saved:

  • ✓ Items with categories selected → Saved to expenses table
  • ✗ Items without categories → Ignored, not saved
  • Each categorized item becomes a separate expense entry

Database Behavior:

// When you select a category for an item:
_persistCategorizedItem(index, categoryId) {
  // Immediately POSTs to /api/receipt-items
  // Creates expense entry in database
  // Returns expense_id for the item
}

Common Issues & Solutions

Issue: "GEMINI_API_KEY environment variable not set"

Solution:

# Add to .env file
echo 'GEMINI_API_KEY=your_key_here' >> .env

# Or export in current session
export GEMINI_API_KEY=your_key_here

Issue: Gemini API quota exceeded (429 error)

Root Cause: Hit the free tier daily quota for a specific model

Solutions:

  1. Model fallback (already implemented):

    • Receipt engine tries flash models first (separate quota from pro)
    • Order: gemini-2.5-flash → 2.0-flash → 2.5-pro → pro-latest
  2. Wait for quota reset (24 hours)

  3. Use different Google account:

    • Create API key from different account
    • Update GEMINI_API_KEY in .env
  4. Upgrade to paid tier (higher quotas)

Issue: OCR reads $4.99 as $9.99

Root Cause: Digit confusion (4 vs 9)

Solution: Enhance Gemini prompt with specific digit rules:

**DIGIT 4 vs 9 RECOGNITION**:
- 4 has sharp angles, often looks like "4" with a horizontal line and vertical line meeting
- 9 has a curved top, looks like "g" or "q" without the tail
- Context check: grocery items rarely cost $9.99, more often $4.99

Issue: Missing line items in extraction

Root Cause: Items at bottom of receipt or spanning multiple lines

Solution:

  1. Increase image resolution in receipt_parser.py
  2. Add instruction to Gemini prompt:
    **COMPLETE EXTRACTION**: Extract ALL items from top to bottom of receipt.
    Do not skip items even if they are:
    - At the very bottom of the receipt
    - Spanning multiple lines
    - In a different format or font

Issue: Tax calculation mismatch

Root Cause: Some items are tax-exempt or have different tax rates

Solution:

  • Add per-item tax tracking in ReceiptItem model
  • Update Gemini prompt to identify taxable vs non-taxable items
  • Validate: sum(item.tax_amount for item in items) = totals.tax_amount

Issue: "Receipt parsing exceeded 30 seconds"

Root Cause: Large image file or slow API response

Solutions:

# Increase timeout in settings
RECEIPT_PARSE_TIMEOUT_SECONDS=60

# Reduce image size before sending to API
# In receipt_parser.py, decrease max dimensions
max_width = 1024  # Instead of 2048
max_height = 1024

Issue: Uploaded file not appearing in component

Root Cause: Frontend not polling or backend endpoint error

Debug steps:

# Check backend endpoint
curl http://localhost:8000/api/recent-downloads

# Check frontend console
# Open browser DevTools → Console → look for errors

# Verify file in Downloads folder
ls -lth ~/Downloads/*.pdf | head -5

Key Files Reference

Backend Files

  • app/services/receipt_parser.py - Main parsing logic
  • app/services/receipt_engine.py - AI engine integration
  • app/api/receipt_endpoints.py - REST API endpoints
  • app/models/receipt_models.py - Data models
  • app/repositories/receipt_metadata.py - Metadata storage
  • app/repositories/expenses.py - Expense storage
  • app/config.py - Configuration settings

Frontend Files

  • /home/adamsl/planner/office-assistant/js/upload-component.js - Upload UI component
  • /home/adamsl/planner/office-assistant/js/app.js - Main application
  • /home/adamsl/planner/office-assistant/js/category-picker.js - Category selection

Test Files

  • tests/test_receipt_processing.py - Receipt processing tests
  • tests/test_receipt_items_api.py - API endpoint tests
  • test_receipt_api.py - Integration tests

Examples

Example 1: Scan and Categorize a Receipt (Web Interface)

User request:

I want to scan my Meijer receipt and categorize the groceries

You would:

  1. Direct user to the receipt scanner:

    Open http://localhost:8080/receipt-scanner.html in your browser
  2. Guide the workflow:

    • Upload: Drag and drop the receipt image or click to browse
    • Wait: Receipt parses automatically (gemini-2.5-flash model)
    • Review: Check the parsed line items in the table
    • Categorize: Select category for each item you want to track
      • Click category dropdown for each item
      • Select appropriate category (e.g., "Groceries > Dairy")
      • Item saves immediately to database
    • Optional: Click "Save Expense" to confirm completion
  3. Verify in database:

    • Only categorized items are saved
    • Each item is a separate expense entry
    • Uncategorized items are ignored
  4. View in Daily Expense Categorizer:

    • Navigate to http://localhost:8080/daily_expense_categorizer.html
    • Select the month from dropdown
    • Select the date
    • See all saved receipt items
    • Can re-categorize if needed

Example 2: Parse a Grocery Receipt (API)

User request:

Parse this grocery receipt via API and extract all items with prices

You would:

  1. Verify API server is running:

    ps aux | grep api_server.py
    # If not running: python3 api_server.py
  2. Parse the receipt:

    curl -X POST "http://localhost:8080/api/parse-receipt" \
      -F "file=@grocery_receipt.jpg" | jq '.'
  3. Review the output:

    • Check items[] array for all products
    • Verify totals.total_amount matches receipt
    • Note the temp_file_name for saving later
    • Note: Nothing is saved to database yet
  4. If items are missing:

    • Open the receipt image and compare
    • Check if image quality is sufficient
    • Look for items at bottom or in different sections

Example 3: Debug OCR Misreading Prices

User request:

The receipt parser is reading $4.99 items as $9.99

You would:

  1. Reproduce the issue:

    curl -X POST "http://localhost:8000/api/parse-receipt" \
      -F "file=@problem_receipt.jpg" > parsed_output.json
    
    # Compare parsed vs actual
    cat parsed_output.json | jq '.parsed_data.items[] | {description, unit_price}'
  2. Read the current Gemini prompt:

    grep -A 30 "DIGIT CONFUSION PREVENTION" app/services/receipt_engine.py
  3. Enhance the prompt with specific 4 vs 9 rules:

    # In receipt_engine.py, _get_prompt() method
    **CRITICAL: DIGIT 4 vs DIGIT 9**:
    - When you see what might be 4 or 9, examine the top of the digit
    - 4: Angular top, horizontal line going right
    - 9: Curved/circular top, like the letter "g"
    - Common grocery prices: $4.99, $14.99, NOT $9.99, $19.99
    - If unsure, default to 4 for items under $10
  4. Test with the problematic receipt:

    # Restart server to load new prompt
    pkill -f api_server.py
    python3 api_server.py &
    
    # Re-test
    curl -X POST "http://localhost:8000/api/parse-receipt" \
      -F "file=@problem_receipt.jpg" | jq '.parsed_data.items[].unit_price'
  5. Verify improvement and test with other receipts

Example 4: Add Custom Validation

User request:

Validate that line totals match quantity times price

You would:

  1. Read the current endpoint code:

    cat app/api/receipt_endpoints.py | grep -A 20 "parse_receipt_endpoint"
  2. Create a validation function:

    # Add to receipt_endpoints.py
    def validate_receipt_math(parsed_data: ReceiptExtractionResult) -> List[str]:
        errors = []
    
        for i, item in enumerate(parsed_data.items):
            expected_total = round(item.quantity * item.unit_price, 2)
            if abs(expected_total - item.line_total) > 0.01:
                errors.append(
                    f"Item {i} '{item.description}': "
                    f"{item.quantity} × ${item.unit_price} = ${expected_total}, "
                    f"but line_total is ${item.line_total}"
                )
    
        # Validate subtotal
        items_sum = sum(item.line_total for item in parsed_data.items)
        if abs(items_sum - parsed_data.totals.subtotal) > 0.50:
            errors.append(
                f"Items sum to ${items_sum:.2f} but subtotal is ${parsed_data.totals.subtotal:.2f}"
            )
    
        # Validate final total
        calculated_total = (
            parsed_data.totals.subtotal +
            (parsed_data.totals.tax_amount or 0) +
            (parsed_data.totals.tip_amount or 0) -
            (parsed_data.totals.discount_amount or 0)
        )
        if abs(calculated_total - parsed_data.totals.total_amount) > 0.01:
            errors.append(
                f"Calculated total ${calculated_total:.2f} != stated total ${parsed_data.totals.total_amount:.2f}"
            )
    
        return errors
  3. Integrate validation into endpoint:

    @router.post("/parse-receipt", response_model=ParseReceiptResponse)
    async def parse_receipt_endpoint(file: UploadFile = File(...)):
        parser = get_receipt_parser()
        temp_file_name: Optional[str] = None
        try:
            parsed_data, temp_file_name = await parser.process_receipt(file)
    
            # Add validation
            validation_errors = validate_receipt_math(parsed_data)
            if validation_errors:
                # Log errors but still return the data
                print(f"Validation warnings: {validation_errors}")
    
            return ParseReceiptResponse(parsed_data=parsed_data, temp_file_name=temp_file_name)
  4. Test the validation:

    # Use a receipt with known correct totals
    curl -X POST "http://localhost:8000/api/parse-receipt" \
      -F "file=@test_receipt_good.jpg"
    
    # Use a receipt with deliberate errors (or mock the data)
    # Check logs for validation warnings
    tail -f api_server.log

Example 5: Integrate with Letta Agent

User request:

Make Letta able to scan and categorize receipts

You would:

  1. Ensure this skill is available to Letta:

    # Skill already in .claude/skills/receipt-scanner/
    # Letta can invoke Claude Code skills via agent tool calls
  2. Create a Letta tool function:

    # In letta_agent/tools/receipt_tools.py
    from typing import Optional
    import httpx
    
    @tool
    def scan_receipt(image_path: str) -> dict:
        """
        Scan a receipt image and extract structured data.
    
        Args:
            image_path: Path to the receipt image file
    
        Returns:
            Dictionary with merchant, items, totals, and metadata
        """
        with open(image_path, 'rb') as f:
            files = {'file': f}
            response = httpx.post(
                'http://localhost:8000/api/parse-receipt',
                files=files,
                timeout=60.0
            )
    
        if response.status_code == 200:
            return response.json()
        else:
            return {'error': response.text}
  3. Register the tool with Letta agent:

    # In hybrid_letta_persistent.py
    from letta_agent.tools.receipt_tools import scan_receipt
    
    agent = client.create_agent(
        name="finance_assistant",
        tools=[scan_receipt, ...],
        ...
    )
  4. Test with Letta:

    # Chat with Letta
    response = client.send_message(
        agent_id=agent.id,
        message="Scan the receipt at ~/Downloads/walmart_receipt.jpg and tell me the total"
    )
    print(response)

Success Criteria

The skill is successful when:

  • Receipts parse with >95% accuracy on item prices
  • All line items are extracted (no missing items)
  • Totals match within $0.01 tolerance
  • Database integration works consistently
  • Web interface provides clear feedback
  • Common OCR issues have documented solutions
  • Letta agents can successfully use receipt scanning

Tips for Users

  1. Start with high-quality images: Clear, well-lit, straight photos work best
  2. Test incrementally: Parse → validate → save (don't skip validation)
  3. Build validation suite: Collect problematic receipts and test regularly
  4. Monitor accuracy trends: Track OCR errors to identify patterns
  5. Update prompt iteratively: Add specific rules as you encounter issues
  6. Use streaming responses: Enable real-time feedback for better UX
  7. Backup original files: Keep original receipts even after successful parsing