{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreihn4vpdbspwmiahx53l5l25cy4yv2vftlgaabkhi4g6zwddh5qxyi",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mn75q7wx5jv2"
},
"path": "/t/phone-shot-image-to-studio-shot-image-version-for-products/176425#post_2",
"publishedAt": "2026-06-01T02:58:43.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"Long version here",
"Shopify/background-replacement",
"BRIA RMBG-2.0",
"SAM2",
"Segment Anything",
"ComfyUI basic inpaint docs",
"ComfyUI mask editor",
"FLUX.1 Fill dev",
"ComfyUI Flux Fill workflow docs",
"ComfyUI-Flux-Inpainting",
"Comfy-Inpainting-Works",
"FLUX.1 Kontext dev",
"Finegrain Product Placement LoRA",
"Finegrain Product Placement Space",
"Finegrain product placement Flux LoRA experiment",
"Qwen-Image-Edit",
"Qwen-Image-Edit model card",
"Qwen Image Edit blog",
"ComfyUI Qwen-Image-Edit workflow docs",
"Qwen-Image technical report",
"IC-Light GitHub",
"ComfyUI-IC-Light",
"ComfyUI background replacement with IC-Light",
"Product Photography Relight workflow",
"ComfyUI hand repair workflow",
"HandFixer",
"ComfyUI VRAM modes / server config",
"ComfyUI Dynamic VRAM",
"Diffusers memory optimization",
"ComfyUI-GGUF",
"Nunchaku / SVDQuant",
"ComfyUI-nunchaku",
"HFPC: product background inpainting evaluation with product consistency",
"DINOv2",
"OpenAI CLIP",
"OpenCLIP",
"LPIPS",
"Tesseract OCR",
"scikit-image SSIM",
"Diffuse to Choose project page",
"Diffuse to Choose paper",
"Amazon Science: Virtual Try-All",
"Product2IMG",
"BG60k / e-commerce product background generation",
"HFPC product consistency evaluation",
"BRIA Product Shot",
"RunComfy Qwen Image Edit workflow",
"RunComfy Put It Here Kontext"
],
"textContent": "I looked into this a bit. Depending on your VRAM, I think there are a few workable routes. (Long version here):\n\n* * *\n\n## Short version\n\nI would not treat this as one single “image-to-image” problem.\n\nIt is really a bundle of smaller tasks:\n\n * product cutout / masking\n * clean packshot generation\n * background replacement\n * floor and contact-shadow generation\n * relighting / color matching\n * product placement into a lifestyle scene\n * hand-object interaction if a person is holding the product\n * logo / label / text / hardware / stitching preservation\n * low-VRAM execution\n * product-identity QA\n\n\n\nThe safest rule is:\n\n> If SKU identity matters, do not regenerate the whole product unless you absolutely have to.\n\nFor e-commerce/product photography, a beautiful image can still be a failure if the product is no longer the same product. I would usually start from workflows that preserve the original product pixels and only generate the background, floor, shadow, lighting, or small boundary/contact areas around it.\n\nA useful mental model is:\n\n\n protect product pixels\n generate or edit background pixels\n add floor/contact shadow\n relight or color-match\n composite original product back if needed\n verify product identity\n\n\nThis is also why I would separate “phone shot to studio packshot” from “person holding the product.” The latter is not ordinary product placement; it is hand-object interaction with occlusion.\n\n* * *\n\n## 1. First principle: prompt the scene, not the SKU\n\nFor background replacement, do not over-describe the product itself. 
If you prompt “a brown leather handbag with gold zipper and braided handle,” the model may try to recreate a plausible brown handbag instead of preserving the exact one.\n\nA better prompt usually describes:\n\n * where the product is grounded\n * the studio/background scene\n * the floor/surface\n * lighting style\n * camera/product-photography style\n\n\n\nShopify’s old SDXL background-replacement Space has a very useful prompting rule in this direction: do not describe the product; describe grounding, scene, and style. See Shopify/background-replacement.\n\nSo my shorthand would be:\n\n> Prompt the scene, not the SKU.\n\nExample:\n\n\n clean white ecommerce studio background, product standing on a matte white surface, soft diffused studio lighting, subtle realistic contact shadow, catalog product photography\n\n\nNot:\n\n\n brown leather handbag with gold zipper, braided handle, front logo, side stitching\n\n\nThe second prompt may encourage the model to redraw the product.\n\n* * *\n\n## 2. Workflow options\n\n### W0. Cutout + composite + shadow\n\nThis is the safest baseline for a clean packshot.\n\n\n input product photo\n -> background removal / segmentation\n -> original product cutout with alpha\n -> white or light gray canvas\n -> scale and center\n -> synthetic or inpainted contact shadow\n -> optional relighting / color match\n -> final QA\n\n\nThis is not flashy, but it preserves SKU identity better than almost any full-image generation workflow. 
Use this as the control group before testing Flux/Qwen/Kontext workflows.\n\nUseful parts:\n\n * BRIA RMBG-2.0 for background removal\n * SAM2 for promptable segmentation\n * Segment Anything if you need interactive masks\n * Pillow/OpenCV/Photoshop/ComfyUI nodes for compositing\n\n\n\nBest for:\n\n * white-background product listing\n * exact product shape/color/material\n * low VRAM\n * batch packshots\n\n\n\nMain failure modes:\n\n * halo around edges\n * no contact shadow\n * product looks pasted\n * white product on white background loses shape\n\n\n\n* * *\n\n### W1. SDXL background-only inpaint\n\nSDXL is not the newest or strongest editor, but it is still a useful low-VRAM baseline.\n\nUse it for:\n\n * background-only inpainting\n * floor/contact shadow experiments\n * quick packshot tests\n * comparison baseline before heavier Flux/Qwen workflows\n\n\n\nThe important point is to protect the product.\n\n\n input photo\n -> product mask\n -> invert mask or protect product region\n -> inpaint only background/floor/shadow\n -> composite original product back\n -> QA product crop\n\n\nDo not ask SDXL to redraw the product if exact identity matters. It may produce a similar-looking product with changed hardware, stitching, logo, label, color, or proportions.\n\nUseful links:\n\n * ComfyUI basic inpaint docs\n * ComfyUI mask editor\n\n\n\n* * *\n\n### W2. 
Flux Fill for background/floor/shadow\n\nFLUX.1 Fill dev is very relevant, but I would frame it as a **masked completion component** , not a one-click product-photography solution.\n\nGood use:\n\n\n protect original product\n mask background/floor/shadow area\n Flux Fill generates only the missing background/floor/shadow\n composite original product back if needed\n relight / blend / QA\n\n\nIt is promising for:\n\n * replacing messy phone-shot backgrounds\n * adding studio floors\n * extending canvas/outpainting\n * creating more natural shadows around a protected product\n\n\n\nBut product-background swap quality depends heavily on:\n\n * mask precision\n * mask expansion/blur\n * contact shadow\n * relighting\n * final blending\n * whether you composite the original product back\n\n\n\nLow-VRAM users may need GGUF/NF4/offload/custom nodes. Also see:\n\n * ComfyUI Flux Fill workflow docs\n * ComfyUI-Flux-Inpainting\n * Comfy-Inpainting-Works\n\n\n\n* * *\n\n### W3. Flux Kontext direct edit\n\nFLUX.1 Kontext dev is probably one of the strongest local candidates for direct “phone shot → studio shot” editing. The model card describes image editing from text instructions, object/style/character reference, and successive edits with minimal visual drift.\n\nTest it like this:\n\n\n input product photo\n -> Flux Kontext\n -> prompt: turn this into a professional ecommerce studio product photo\n -> output\n -> product identity QA\n\n\nHowever, for strict e-commerce use, I would not trust the direct output blindly. A direct edit may look excellent while quietly changing:\n\n * silhouette\n * color\n * leather/fabric texture\n * zipper or buckle shape\n * logo\n * label text\n * handle length\n * stitching\n * product proportions\n\n\n\nFor serious use, compare two versions:\n\n\n A. Flux Kontext direct edit\n B. Flux Kontext for studio look + original product composited back\n\n\nA may look better. B is usually safer for SKU identity.\n\n* * *\n\n### W3b. 
Flux Kontext composite-back variant\n\nThis is the safer Kontext route.\n\n\n input product photo\n -> Flux Kontext creates target studio look/background/lighting\n -> use generated output as visual target\n -> cut out original product\n -> composite original product back\n -> contact shadow / relight / color match\n -> QA\n\n\nThis is useful when Kontext gives good lighting/background style but changes the product too much.\n\n* * *\n\n### W4. Finegrain Product Placement\n\nFinegrain Product Placement LoRA is useful for thinking about product placement. It is a Flux Kontext LoRA aimed at product photography with bounding-box control.\n\nThe mental model is not “just prompt harder.” It is:\n\n\n scene image\n + transparent product cutout\n + placement box\n -> product blended into scene\n\n\nThe Finegrain Product Placement Space exposes this clearly: upload a scene photo, draw a box where the item should go, and provide a product image with transparent background.\n\nImportant caveat: the model card explicitly says products in hands are not supported. So Finegrain is relevant for:\n\n * product on table\n * product on shelf\n * product on floor\n * product in a room scene\n * product on display\n\n\n\nIt is not the answer to:\n\n * person holding the bag\n * hand gripping the handle\n * shoulder-worn bag\n * complex hand/object occlusion\n\n\n\nAlso check the official blog: Finegrain product placement Flux LoRA experiment.\n\n* * *\n\n### W5. Qwen-Image-Edit for labels, packaging, logos, printed text\n\nQwen-Image-Edit is especially relevant when product text matters. 
I would not necessarily start with it for a plain leather bag, but I would test it for:\n\n * product boxes\n * bottles\n * packaging\n * labels\n * signs\n * logos\n * printed instructions\n * UI/product mockups\n * localized marketing creatives\n\n\n\nQwen’s strength is text-aware image editing, but:\n\n> text-capable is not SKU-safe.\n\nFor product work, the question is not merely “is the generated text readable?” The question is “is this still the same label/logo/brand/product?”\n\nUse:\n\n * OCR before/after\n * manual logo review\n * crop comparison\n * original label-region composite-back if needed\n\n\n\nUseful links:\n\n * Qwen-Image-Edit model card\n * Qwen Image Edit blog\n * ComfyUI Qwen-Image-Edit workflow docs\n * Qwen-Image technical report\n\n\n\n* * *\n\n### W6. Relighting / IC-Light\n\nRelighting deserves its own step.\n\nMany product/background swaps fail because the old phone-shot lighting remains on the product. The background changes, but the product still has the old shadows and highlights, so the image looks pasted together.\n\nUse relighting after:\n\n * cutout + composite\n * generated background\n * Flux Fill background work\n * manual placement\n * product-background blending\n\n\n\nA generic route:\n\n\n product cutout\n + selected/generated background\n -> composite product\n -> relight foreground to match background\n -> add/refine contact shadow\n -> restore original product details if softened\n\n\nUseful links:\n\n * IC-Light GitHub\n * ComfyUI-IC-Light\n * ComfyUI background replacement with IC-Light\n * Product Photography Relight workflow\n\n\n\n* * *\n\n### W7. 
Manual placement + boundary/shadow fill\n\nIf product identity matters, a controlled manual workflow can be safer than a powerful all-in-one model.\n\n\n product cutout\n + target scene/background\n -> manually place product\n -> mask only boundary/contact/shadow area\n -> SDXL or Flux Fill repairs local boundary/shadow\n -> relight/color-match\n -> QA\n\n\nThis is good for:\n\n * bag on table\n * shoes on floor\n * bottle on bathroom counter\n * product on shelf\n * small accessory on desk\n\n\n\nIt is weak for:\n\n * hand holding product\n * product worn on body\n * heavy occlusion\n * wrong product perspective\n\n\n\n* * *\n\n### W8. Product in hand / person holding product\n\nThis is the hardest case.\n\nA person holding a product is not ordinary product placement. It adds:\n\n * hand/product occlusion\n * fingers wrapping around handles\n * product scale relative to body\n * gravity and strap deformation\n * contact shadows\n * foreground/background ordering\n * hand reconstruction\n * product identity preservation\n\n\n\nI would not expect normal product placement to solve this.\n\nA safer local workaround is:\n\n\n person image with suitable pose\n + original product cutout\n -> manually place product near hand\n -> mask only fingers / handle / contact / occlusion\n -> local inpaint / hand repair / Flux Fill / SDXL inpaint\n -> composite original product body back\n -> relight / shadow / QA\n\n\nThe key is to regenerate only the tiny contact/occlusion region, not the whole product.\n\nUseful links:\n\n * ComfyUI hand repair workflow\n * ComfyUI basic inpaint docs\n * HandFixer\n\n\n\nThere are cloud/partner-model templates for product-in-hand UGC-style workflows, but I would treat those as reference/fallback, not the main local/open route.\n\n* * *\n\n## 3. VRAM guide\n\nThis is approximate. “Runs on 8GB” or “runs on 12GB” is not enough information. 
It depends on:\n\n * model\n * quantization\n * text encoder\n * VAE\n * resolution\n * steps\n * LoRA/distillation\n * CPU/RAM offload\n * ComfyUI version\n * node implementation\n * system RAM\n * generation time\n\n\n\n### 8GB VRAM\n\nStart with:\n\n * cutout + composite + shadow\n * SDXL background-only inpaint\n * small resolution tests\n * VAE tiling/slicing\n * aggressive offload if needed\n\n\n\nTreat Flux/Qwen as experimental. Some community workflows may run, but speed and stability can be poor.\n\n### 12GB VRAM\n\nMore realistic:\n\n * SDXL composite workflows\n * Flux GGUF experiments\n * Flux Kontext GGUF tests\n * Flux Fill with quant/offload\n * careful text encoder choice\n\n\n\nStill log runtime. A 12GB report can mean under a minute or many minutes depending on quantization and workflow.\n\n### 16GB VRAM\n\nA good experimentation tier:\n\n * Flux GGUF/FP8 becomes more serious\n * Qwen 4-bit/GGUF/NF4 becomes testable\n * Finegrain placement may be possible\n * relighting/composite workflows are practical\n\n\n\n### 24GB VRAM\n\nA practical local comparison tier:\n\n * Flux Kontext\n * Flux Fill\n * Qwen-Image-Edit quantized or optimized\n * SDXL + ControlNet/IP-Adapter workflows\n * more comfortable high-res tests\n\n\n\n### 32GB+\n\nAt this point, focus less on “can it run?” and more on:\n\n * product identity\n * failure rate\n * batch reliability\n * legal/license terms\n * QA automation\n * repeatability\n * throughput\n\n\n\nUseful low-VRAM links:\n\n * ComfyUI VRAM modes / server config\n * ComfyUI Dynamic VRAM\n * Diffusers memory optimization\n * ComfyUI-GGUF\n * Nunchaku / SVDQuant\n * ComfyUI-nunchaku\n\n\n\n* * *\n\n## 4. Suggested order of testing\n\nI would test in this order:\n\n\n 1. W0 cutout + composite + shadow\n 2. W1 SDXL background-only inpaint\n 3. W2 Flux Fill background/floor/shadow\n 4. W6 relighting / IC-Light\n 5. W3 Flux Kontext direct edit\n 6. W3b Flux Kontext composite-back\n 7. W4 Finegrain product placement\n 8. 
W5 Qwen-Image-Edit for labels/text\n 9. W8 product-in-hand local workaround\n\n\nReason:\n\n * start with the least destructive workflow\n * establish a product-identity baseline\n * add generation only where it helps\n * reserve direct full-image editing for cases where the safer route is not enough\n\n\n\n* * *\n\n## 5. QA checklist\n\nDo not evaluate only the full image. The background is supposed to change. The product is not.\n\nCompare:\n\n * original product crop\n * generated product crop\n * original mask\n * generated product mask\n * label crop\n * logo crop\n * hardware crop\n * full image\n\n\n\nCheck product identity:\n\n * silhouette / proportions\n * color\n * material texture\n * leather grain / fabric weave\n * hardware\n * zipper / buckle / strap / handle\n * stitching\n * label/logo\n * small text\n * barcode if relevant\n * product scale\n\n\n\nCheck scene realism:\n\n * contact shadow\n * light direction\n * floor contact\n * perspective\n * reflection\n * background consistency\n * old lighting still on product\n * pasted/cutout look\n\n\n\nFor text-heavy products:\n\n * run OCR before/after\n * inspect manually\n * preserve original label region if needed\n\n\n\nUseful QA/research links:\n\n * HFPC: product background inpainting evaluation with product consistency\n * DINOv2\n * OpenAI CLIP\n * OpenCLIP\n * LPIPS\n * Tesseract OCR\n * scikit-image SSIM\n\n\n\nAutomated metrics are useful as red flags, not final approval. Human review is still necessary for SKU identity.\n\n* * *\n\n## 6. 
Research framing\n\nThis problem is close to e-commerce item insertion / virtual try-all research.\n\nDiffuse to Choose is especially relevant because it frames the task as inserting an e-commerce item into a target scene while preserving fine-grained reference-item details and producing plausible blending, lighting, and shadows.\n\nUseful research links:\n\n * Diffuse to Choose project page\n * Diffuse to Choose paper\n * Amazon Science: Virtual Try-All\n * Product2IMG\n * BG60k / e-commerce product background generation\n * HFPC product consistency evaluation\n\n\n\nThe practical local/ComfyUI route is basically an approximation of this harder research problem:\n\n\n reference product\n + product mask/cutout\n + target scene/background\n + local edit/fill/blend\n + relighting\n + product-consistency QA\n\n\n* * *\n\n## 7. Commercial APIs\n\nI would treat commercial product-shot APIs as reference/fallback, not the main answer.\n\nThey can be useful for:\n\n * benchmarking quality\n * fast production\n * product-shot-specific pipelines\n * cases where local VRAM is too limited\n * product-in-hand or UGC-style templates\n\n\n\nBut check:\n\n * cost\n * privacy\n * uploaded product/customer images\n * licensing\n * output usage rights\n * data retention\n * brand safety\n * repeatability\n\n\n\nExamples to compare against, not necessarily start with:\n\n * BRIA Product Shot\n * RunComfy Qwen Image Edit workflow\n * RunComfy Put It Here Kontext\n\n\n\n* * *\n\n## 8. 
Compact decision tree\n\n\n Need exact white-background packshot?\n -> Use cutout + composite + shadow first.\n -> Avoid regenerating the product.\n\n Need background replacement?\n -> Segment product.\n -> Inpaint/fill only background/floor/shadow.\n -> Composite original product back if identity matters.\n -> Relight.\n\n Need one-shot phone-shot-to-studio conversion?\n -> Try Flux Kontext.\n -> Also make a composite-back version.\n -> Compare product crop.\n\n Need product in a lifestyle scene?\n -> If placed on a surface: try manual placement, Finegrain, or fill boundary/shadow.\n -> If held by a person: treat as hand-object interaction.\n\n Product has important text/logo/label?\n -> Test Qwen-Image-Edit.\n -> OCR + manual review.\n -> Composite original label/logo region if needed.\n\n Low VRAM?\n -> 8GB: cutout/composite + SDXL baseline.\n -> 12GB: Flux GGUF experiments.\n -> 16GB: Flux/Qwen quantized experiments.\n -> 24GB+: serious comparison.\n\n\n* * *\n\n## 9. My practical recommendation\n\nI would start with this baseline:\n\n\n 1. Segment/remove background.\n 2. Save original product cutout and mask.\n 3. Create or generate a clean studio background.\n 4. Composite original product onto it.\n 5. Add/inpaint contact shadow only.\n 6. Relight/color-match if needed.\n 7. Compare product crop, label crop, and full image.\n\n\nThen compare against:\n\n * Flux Kontext direct edit\n * Flux Fill masked background/floor/shadow\n * Finegrain placement for surface placement\n * Qwen-Image-Edit for label/text-heavy products\n * commercial APIs only as reference/fallback\n\n\n\nFor a simple packshot, the safest result may come from boring compositing rather than the strongest model. For lifestyle placement, the best route is usually product cutout + target scene + local fill/blend + relighting. 
For a person holding the product, expect the task to be much harder and use local hand/contact inpainting rather than ordinary product placement.\n\nAlso check model cards, repo licenses, API terms, brand policy, and privacy constraints before using outputs commercially.",
"title": "Phone shot image to studio shot image version for products"
}