Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihn4vpdbspwmiahx53l5l25cy4yv2vftlgaabkhi4g6zwddh5qxyi",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mn75q7wx5jv2"
  },
  "path": "/t/phone-shot-image-to-studio-shot-image-version-for-products/176425#post_2",
  "publishedAt": "2026-06-01T02:58:43.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Long version here",
    "Shopify/background-replacement",
    "BRIA RMBG-2.0",
    "SAM2",
    "Segment Anything",
    "ComfyUI basic inpaint docs",
    "ComfyUI mask editor",
    "FLUX.1 Fill dev",
    "ComfyUI Flux Fill workflow docs",
    "ComfyUI-Flux-Inpainting",
    "Comfy-Inpainting-Works",
    "FLUX.1 Kontext dev",
    "Finegrain Product Placement LoRA",
    "Finegrain Product Placement Space",
    "Finegrain product placement Flux LoRA experiment",
    "Qwen-Image-Edit",
    "Qwen-Image-Edit model card",
    "Qwen Image Edit blog",
    "ComfyUI Qwen-Image-Edit workflow docs",
    "Qwen-Image technical report",
    "IC-Light GitHub",
    "ComfyUI-IC-Light",
    "ComfyUI background replacement with IC-Light",
    "Product Photography Relight workflow",
    "ComfyUI hand repair workflow",
    "HandFixer",
    "ComfyUI VRAM modes / server config",
    "ComfyUI Dynamic VRAM",
    "Diffusers memory optimization",
    "ComfyUI-GGUF",
    "Nunchaku / SVDQuant",
    "ComfyUI-nunchaku",
    "HFPC: product background inpainting evaluation with product consistency",
    "DINOv2",
    "OpenAI CLIP",
    "OpenCLIP",
    "LPIPS",
    "Tesseract OCR",
    "scikit-image SSIM",
    "Diffuse to Choose project page",
    "Diffuse to Choose paper",
    "Amazon Science: Virtual Try-All",
    "Product2IMG",
    "BG60k / e-commerce product background generation",
    "HFPC product consistency evaluation",
    "BRIA Product Shot",
    "RunComfy Qwen Image Edit workflow",
    "RunComfy Put It Here Kontext"
  ],
  "textContent": "I looked into this a bit. Depending on your VRAM, I think there are a few workable routes. (Long version here):\n\n* * *\n\n## Short version\n\nI would not treat this as one single “image-to-image” problem.\n\nIt is really a bundle of smaller tasks:\n\n  * product cutout / masking\n  * clean packshot generation\n  * background replacement\n  * floor and contact-shadow generation\n  * relighting / color matching\n  * product placement into a lifestyle scene\n  * hand-object interaction if a person is holding the product\n  * logo / label / text / hardware / stitching preservation\n  * low-VRAM execution\n  * product-identity QA\n\n\n\nThe safest rule is:\n\n> If SKU identity matters, do not regenerate the whole product unless you absolutely have to.\n\nFor e-commerce/product photography, a beautiful image can still be a failure if the product is no longer the same product. I would usually start from workflows that preserve the original product pixels and only generate the background, floor, shadow, lighting, or small boundary/contact areas around it.\n\nA useful mental model is:\n\n\n    protect product pixels\n    generate or edit background pixels\n    add floor/contact shadow\n    relight or color-match\n    composite original product back if needed\n    verify product identity\n\n\nThis is also why I would separate “phone shot to studio packshot” from “person holding the product.” The latter is not ordinary product placement; it is hand-object interaction with occlusion.\n\n* * *\n\n## 1. First principle: prompt the scene, not the SKU\n\nFor background replacement, do not over-describe the product itself. 
If you prompt “a brown leather handbag with gold zipper and braided handle,” the model may try to recreate a plausible brown handbag instead of preserving the exact one.\n\nA better prompt usually describes:\n\n  * where the product is grounded\n  * the studio/background scene\n  * the floor/surface\n  * lighting style\n  * camera/product-photography style\n\n\n\nShopify’s old SDXL background-replacement Space has a very useful prompting rule in this direction: do not describe the product; describe grounding, scene, and style. See Shopify/background-replacement.\n\nSo my shorthand would be:\n\n> Prompt the scene, not the SKU.\n\nExample:\n\n\n    clean white ecommerce studio background, product standing on a matte white surface, soft diffused studio lighting, subtle realistic contact shadow, catalog product photography\n\n\nNot:\n\n\n    brown leather handbag with gold zipper, braided handle, front logo, side stitching\n\n\nThe second prompt may encourage the model to redraw the product.\n\n* * *\n\n## 2. Workflow options\n\n### W0. Cutout + composite + shadow\n\nThis is the safest baseline for a clean packshot.\n\n\n    input product photo\n    -> background removal / segmentation\n    -> original product cutout with alpha\n    -> white or light gray canvas\n    -> scale and center\n    -> synthetic or inpainted contact shadow\n    -> optional relighting / color match\n    -> final QA\n\n\nThis is not flashy, but it preserves SKU identity better than almost any full-image generation workflow. 
Use this as the control group before testing Flux/Qwen/Kontext workflows.\n\nUseful parts:\n\n  * BRIA RMBG-2.0 for background removal\n  * SAM2 for promptable segmentation\n  * Segment Anything if you need interactive masks\n  * Pillow/OpenCV/Photoshop/ComfyUI nodes for compositing\n\n\n\nBest for:\n\n  * white-background product listing\n  * exact product shape/color/material\n  * low VRAM\n  * batch packshots\n\n\n\nMain failure modes:\n\n  * halo around edges\n  * no contact shadow\n  * product looks pasted\n  * white product on white background loses shape\n\n\n\n* * *\n\n### W1. SDXL background-only inpaint\n\nSDXL is not the newest or strongest editor, but it is still a useful low-VRAM baseline.\n\nUse it for:\n\n  * background-only inpainting\n  * floor/contact shadow experiments\n  * quick packshot tests\n  * comparison baseline before heavier Flux/Qwen workflows\n\n\n\nThe important point is to protect the product.\n\n\n    input photo\n    -> product mask\n    -> invert mask or protect product region\n    -> inpaint only background/floor/shadow\n    -> composite original product back\n    -> QA product crop\n\n\nDo not ask SDXL to redraw the product if exact identity matters. It may produce a similar-looking product with changed hardware, stitching, logo, label, color, or proportions.\n\nUseful links:\n\n  * ComfyUI basic inpaint docs\n  * ComfyUI mask editor\n\n\n\n* * *\n\n### W2. 
Flux Fill for background/floor/shadow\n\nFLUX.1 Fill dev is very relevant, but I would frame it as a **masked completion component**, not a one-click product-photography solution.\n\nGood use:\n\n\n    protect original product\n    mask background/floor/shadow area\n    Flux Fill generates only the missing background/floor/shadow\n    composite original product back if needed\n    relight / blend / QA\n\n\nIt is promising for:\n\n  * replacing messy phone-shot backgrounds\n  * adding studio floors\n  * extending canvas/outpainting\n  * creating more natural shadows around a protected product\n\n\n\nBut product-background swap quality depends heavily on:\n\n  * mask precision\n  * mask expansion/blur\n  * contact shadow\n  * relighting\n  * final blending\n  * whether you composite the original product back\n\n\n\nLow-VRAM users may need GGUF/NF4/offload/custom nodes. Also see:\n\n  * ComfyUI Flux Fill workflow docs\n  * ComfyUI-Flux-Inpainting\n  * Comfy-Inpainting-Works\n\n\n\n* * *\n\n### W3. Flux Kontext direct edit\n\nFLUX.1 Kontext dev is probably one of the strongest local candidates for direct “phone shot → studio shot” editing. The model card describes image editing from text instructions, object/style/character reference, and successive edits with minimal visual drift.\n\nTest it like this:\n\n\n    input product photo\n    -> Flux Kontext\n    -> prompt: turn this into a professional ecommerce studio product photo\n    -> output\n    -> product identity QA\n\n\nHowever, for strict e-commerce use, I would not trust the direct output blindly. A direct edit may look excellent while quietly changing:\n\n  * silhouette\n  * color\n  * leather/fabric texture\n  * zipper or buckle shape\n  * logo\n  * label text\n  * handle length\n  * stitching\n  * product proportions\n\n\n\nFor serious use, compare two versions:\n\n\n    A. Flux Kontext direct edit\n    B. Flux Kontext for studio look + original product composited back\n\n\nA may look better. 
B is usually safer for SKU identity.\n\n* * *\n\n### W3b. Flux Kontext composite-back variant\n\nThis is the safer Kontext route.\n\n\n    input product photo\n    -> Flux Kontext creates target studio look/background/lighting\n    -> use generated output as visual target\n    -> cut out original product\n    -> composite original product back\n    -> contact shadow / relight / color match\n    -> QA\n\n\nThis is useful when Kontext gives good lighting/background style but changes the product too much.\n\n* * *\n\n### W4. Finegrain Product Placement\n\nFinegrain Product Placement LoRA is useful for thinking about product placement. It is a Flux Kontext LoRA aimed at product photography with bounding-box control.\n\nThe mental model is not “just prompt harder.” It is:\n\n\n    scene image\n    + transparent product cutout\n    + placement box\n    -> product blended into scene\n\n\nThe Finegrain Product Placement Space exposes this clearly: upload a scene photo, draw a box where the item should go, and provide a product image with transparent background.\n\nImportant caveat: the model card explicitly says products in hands are not supported. So Finegrain is relevant for:\n\n  * product on table\n  * product on shelf\n  * product on floor\n  * product in a room scene\n  * product on display\n\n\n\nIt is not the answer to:\n\n  * person holding the bag\n  * hand gripping the handle\n  * shoulder-worn bag\n  * complex hand/object occlusion\n\n\n\nAlso check the official blog: Finegrain product placement Flux LoRA experiment.\n\n* * *\n\n### W5. Qwen-Image-Edit for labels, packaging, logos, printed text\n\nQwen-Image-Edit is especially relevant when product text matters. 
I would not necessarily start with it for a plain leather bag, but I would test it for:\n\n  * product boxes\n  * bottles\n  * packaging\n  * labels\n  * signs\n  * logos\n  * printed instructions\n  * UI/product mockups\n  * localized marketing creatives\n\n\n\nQwen’s strength is text-aware image editing, but:\n\n> text-capable is not SKU-safe.\n\nFor product work, the question is not merely “is the generated text readable?” The question is “is this still the same label/logo/brand/product?”\n\nUse:\n\n  * OCR before/after\n  * manual logo review\n  * crop comparison\n  * original label-region composite-back if needed\n\n\n\nUseful links:\n\n  * Qwen-Image-Edit model card\n  * Qwen Image Edit blog\n  * ComfyUI Qwen-Image-Edit workflow docs\n  * Qwen-Image technical report\n\n\n\n* * *\n\n### W6. Relighting / IC-Light\n\nRelighting deserves its own step.\n\nMany product/background swaps fail because the old phone-shot lighting remains on the product. The background changes, but the product still has the old shadows and highlights, so the image looks pasted together.\n\nUse relighting after:\n\n  * cutout + composite\n  * generated background\n  * Flux Fill background work\n  * manual placement\n  * product-background blending\n\n\n\nA generic route:\n\n\n    product cutout\n    + selected/generated background\n    -> composite product\n    -> relight foreground to match background\n    -> add/refine contact shadow\n    -> restore original product details if softened\n\n\nUseful links:\n\n  * IC-Light GitHub\n  * ComfyUI-IC-Light\n  * ComfyUI background replacement with IC-Light\n  * Product Photography Relight workflow\n\n\n\n* * *\n\n### W7. 
Manual placement + boundary/shadow fill\n\nIf product identity matters, a controlled manual workflow can be safer than a powerful all-in-one model.\n\n\n    product cutout\n    + target scene/background\n    -> manually place product\n    -> mask only boundary/contact/shadow area\n    -> SDXL or Flux Fill repairs local boundary/shadow\n    -> relight/color-match\n    -> QA\n\n\nThis is good for:\n\n  * bag on table\n  * shoes on floor\n  * bottle on bathroom counter\n  * product on shelf\n  * small accessory on desk\n\n\n\nIt is weak for:\n\n  * hand holding product\n  * product worn on body\n  * heavy occlusion\n  * wrong product perspective\n\n\n\n* * *\n\n### W8. Product in hand / person holding product\n\nThis is the hardest case.\n\nA person holding a product is not ordinary product placement. It adds:\n\n  * hand/product occlusion\n  * fingers wrapping around handles\n  * product scale relative to body\n  * gravity and strap deformation\n  * contact shadows\n  * foreground/background ordering\n  * hand reconstruction\n  * product identity preservation\n\n\n\nI would not expect normal product placement to solve this.\n\nA safer local workaround is:\n\n\n    person image with suitable pose\n    + original product cutout\n    -> manually place product near hand\n    -> mask only fingers / handle / contact / occlusion\n    -> local inpaint / hand repair / Flux Fill / SDXL inpaint\n    -> composite original product body back\n    -> relight / shadow / QA\n\n\nThe key is to regenerate only the tiny contact/occlusion region, not the whole product.\n\nUseful links:\n\n  * ComfyUI hand repair workflow\n  * ComfyUI basic inpaint docs\n  * HandFixer\n\n\n\nThere are cloud/partner-model templates for product-in-hand UGC-style workflows, but I would treat those as reference/fallback, not the main local/open route.\n\n* * *\n\n## 3. VRAM guide\n\nThis is approximate. “Runs on 8GB” or “runs on 12GB” is not enough information. 
It depends on:\n\n  * model\n  * quantization\n  * text encoder\n  * VAE\n  * resolution\n  * steps\n  * LoRA/distillation\n  * CPU/RAM offload\n  * ComfyUI version\n  * node implementation\n  * system RAM\n  * generation time\n\n\n\n### 8GB VRAM\n\nStart with:\n\n  * cutout + composite + shadow\n  * SDXL background-only inpaint\n  * small resolution tests\n  * VAE tiling/slicing\n  * aggressive offload if needed\n\n\n\nTreat Flux/Qwen as experimental. Some community workflows may run, but speed and stability can be poor.\n\n### 12GB VRAM\n\nMore realistic:\n\n  * SDXL composite workflows\n  * Flux GGUF experiments\n  * Flux Kontext GGUF tests\n  * Flux Fill with quant/offload\n  * careful text encoder choice\n\n\n\nStill log runtime. A 12GB report can mean under a minute or many minutes depending on quantization and workflow.\n\n### 16GB VRAM\n\nA good experimentation tier:\n\n  * Flux GGUF/FP8 becomes more serious\n  * Qwen 4-bit/GGUF/NF4 becomes testable\n  * Finegrain placement may be possible\n  * relighting/composite workflows are practical\n\n\n\n### 24GB VRAM\n\nA practical local comparison tier:\n\n  * Flux Kontext\n  * Flux Fill\n  * Qwen-Image-Edit quantized or optimized\n  * SDXL + ControlNet/IP-Adapter workflows\n  * more comfortable high-res tests\n\n\n\n### 32GB+\n\nAt this point, focus less on “can it run?” and more on:\n\n  * product identity\n  * failure rate\n  * batch reliability\n  * legal/license terms\n  * QA automation\n  * repeatability\n  * throughput\n\n\n\nUseful low-VRAM links:\n\n  * ComfyUI VRAM modes / server config\n  * ComfyUI Dynamic VRAM\n  * Diffusers memory optimization\n  * ComfyUI-GGUF\n  * Nunchaku / SVDQuant\n  * ComfyUI-nunchaku\n\n\n\n* * *\n\n## 4. Suggested order of testing\n\nI would test in this order:\n\n\n    1. W0 cutout + composite + shadow\n    2. W1 SDXL background-only inpaint\n    3. W2 Flux Fill background/floor/shadow\n    4. W6 relighting / IC-Light\n    5. W3 Flux Kontext direct edit\n    6. 
W3b Flux Kontext composite-back\n    7. W4 Finegrain product placement\n    8. W5 Qwen-Image-Edit for labels/text\n    9. W8 product-in-hand local workaround\n\n\nReason:\n\n  * start with the least destructive workflow\n  * establish a product-identity baseline\n  * add generation only where it helps\n  * reserve direct full-image editing for cases where the safer route is not enough\n\n\n\n* * *\n\n## 5. QA checklist\n\nDo not evaluate only the full image. The background is supposed to change. The product is not.\n\nCompare:\n\n  * original product crop\n  * generated product crop\n  * original mask\n  * generated/product mask\n  * label crop\n  * logo crop\n  * hardware crop\n  * full image\n\n\n\nCheck product identity:\n\n  * silhouette / proportions\n  * color\n  * material texture\n  * leather grain / fabric weave\n  * hardware\n  * zipper / buckle / strap / handle\n  * stitching\n  * label/logo\n  * small text\n  * barcode if relevant\n  * product scale\n\n\n\nCheck scene realism:\n\n  * contact shadow\n  * light direction\n  * floor contact\n  * perspective\n  * reflection\n  * background consistency\n  * old lighting still on product\n  * pasted/cutout look\n\n\n\nFor text-heavy products:\n\n  * run OCR before/after\n  * inspect manually\n  * preserve original label region if needed\n\n\n\nUseful QA/research links:\n\n  * HFPC: product background inpainting evaluation with product consistency\n  * DINOv2\n  * OpenAI CLIP\n  * OpenCLIP\n  * LPIPS\n  * Tesseract OCR\n  * scikit-image SSIM\n\n\n\nAutomated metrics are useful as red flags, not final approval. Human review is still necessary for SKU identity.\n\n* * *\n\n## 6. 
Research framing\n\nThis problem is close to e-commerce item insertion / virtual try-all research.\n\nDiffuse to Choose is especially relevant because it frames the task as inserting an e-commerce item into a target scene while preserving fine-grained reference-item details and producing plausible blending, lighting, and shadows.\n\nUseful research links:\n\n  * Diffuse to Choose project page\n  * Diffuse to Choose paper\n  * Amazon Science: Virtual Try-All\n  * Product2IMG\n  * BG60k / e-commerce product background generation\n  * HFPC product consistency evaluation\n\n\n\nThe practical local/ComfyUI route is basically an approximation of this harder research problem:\n\n\n    reference product\n    + product mask/cutout\n    + target scene/background\n    + local edit/fill/blend\n    + relighting\n    + product-consistency QA\n\n\n* * *\n\n## 7. Commercial APIs\n\nI would treat commercial product-shot APIs as reference/fallback, not the main answer.\n\nThey can be useful for:\n\n  * benchmarking quality\n  * fast production\n  * product-shot-specific pipelines\n  * cases where local VRAM is too limited\n  * product-in-hand or UGC-style templates\n\n\n\nBut check:\n\n  * cost\n  * privacy\n  * uploaded product/customer images\n  * licensing\n  * output usage rights\n  * data retention\n  * brand safety\n  * repeatability\n\n\n\nExamples to compare against, not necessarily start with:\n\n  * BRIA Product Shot\n  * RunComfy Qwen Image Edit workflow\n  * RunComfy Put It Here Kontext\n\n\n\n* * *\n\n## 8. 
Compact decision tree\n\n\n    Need exact white-background packshot?\n    -> Use cutout + composite + shadow first.\n    -> Avoid regenerating the product.\n\n    Need background replacement?\n    -> Segment product.\n    -> Inpaint/fill only background/floor/shadow.\n    -> Composite original product back if identity matters.\n    -> Relight.\n\n    Need one-shot phone-shot-to-studio conversion?\n    -> Try Flux Kontext.\n    -> Also make a composite-back version.\n    -> Compare product crop.\n\n    Need product in a lifestyle scene?\n    -> If placed on a surface: try manual placement, Finegrain, or fill boundary/shadow.\n    -> If held by a person: treat as hand-object interaction.\n\n    Product has important text/logo/label?\n    -> Test Qwen-Image-Edit.\n    -> OCR + manual review.\n    -> Composite original label/logo region if needed.\n\n    Low VRAM?\n    -> 8GB: cutout/composite + SDXL baseline.\n    -> 12GB: Flux GGUF experiments.\n    -> 16GB: Flux/Qwen quantized experiments.\n    -> 24GB+: serious comparison.\n\n\n* * *\n\n## 9. My practical recommendation\n\nI would start with this baseline:\n\n\n    1. Segment/remove background.\n    2. Save original product cutout and mask.\n    3. Create or generate a clean studio background.\n    4. Composite original product onto it.\n    5. Add/inpaint contact shadow only.\n    6. Relight/color-match if needed.\n    7. Compare product crop, label crop, and full image.\n\n\nThen compare against:\n\n  * Flux Kontext direct edit\n  * Flux Fill masked background/floor/shadow\n  * Finegrain placement for surface placement\n  * Qwen-Image-Edit for label/text-heavy products\n  * commercial APIs only as reference/fallback\n\n\n\nFor a simple packshot, the safest result may come from boring compositing rather than the strongest model. For lifestyle placement, the best route is usually product cutout + target scene + local fill/blend + relighting. 
For a person holding the product, expect the task to be much harder and use local hand/contact inpainting rather than ordinary product placement.\n\nAlso check model cards, repo licenses, API terms, brand policy, and privacy constraints before using outputs commercially.",
  "title": "Phone shot image to studio shot image version for products"
}