{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihurunyobrbk3ibdx7tuq2mzkhbu7enev5ee6fdbhv6ls3wcjhcua",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mihi3fpjaht2"
  },
  "path": "/t/invoice-data-recognition/174564#post_5",
  "publishedAt": "2026-04-01T09:03:19.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "Thank you again. I think I will start with a workflow that extracts paragraphs of text, since that sounds easier and will help me understand the process. The first case I will use as a test has no tabular data. The invoice sounds much more complicated. I know some Python but have not done AI or OCR with Python before.\n\nIf I do extract an invoice, I failed to mention earlier that the PDF we get has multiple invoice summaries per page. It’s an invoice summary from a major US shipper. There is one charge per shipment, so they fit multiple invoices per page. We do thousands of shipments every month with this shipper and other shippers.",
  "title": "Invoice Data Recognition"
}