{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreihurunyobrbk3ibdx7tuq2mzkhbu7enev5ee6fdbhv6ls3wcjhcua",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3miggi6kuzgw2"
},
"path": "/t/invoice-data-recognition/174564#post_5",
"publishedAt": "2026-04-01T09:03:19.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "Thank you again. I think I will start with a workflow that extracts paragraphs of text, since that sounds easier and will help me understand the overall workflow. The first case I will use as a test has no tabular data. The invoice sounds much more complicated. I know some Python but have not done AI or OCR with Python before.\n\nIf I do extract invoices, note that each page of the PDF we receive contains multiple invoice summaries, which I failed to mention earlier. It’s an invoice summary from a major US shipper. There is only one charge per shipment, so they fit multiple invoices on each page. We handle thousands of shipments every month with this shipper and with other shippers.",
"title": "Invoice Data Recognition"
}