Pre-trained LVMs cannot directly generate images of specific products that are absent from their knowledge base. To address this issue, we design algorithms that customize LVMs for specific products and scenes. This customization typically takes only a few hours, after which the customized LVMs can generate product visuals in seconds.
Powerful pre-trained LVMs, e.g., Stable Diffusion, have been trained on billions of images and therefore excel at rendering highly creative images. However, they must be customized for product visuals, mainly because the particular products are not in their knowledge base. To leverage LVMs for product visuals, we design a novel customization framework called NOLA, short for new object learning algorithms. In particular, the customization makes an LVM learn given products; after learning, the LVM can generate images containing those products. NOLA therefore extends the image generation capabilities of a pre-trained LVM to new products by injecting images of the products into the model.
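The text does not spell out NOLA's learning algorithm, so the following is only a minimal sketch of the injection idea: a DreamBooth-style fine-tuning loop over Stable Diffusion using Hugging Face diffusers. The checkpoint name, the placeholder token "sks", the image file names, the step count, and the output directory are all illustrative assumptions, not NOLA's actual design.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from PIL import Image
from diffusers import StableDiffusionPipeline, DDPMScheduler

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet, vae, text_encoder, tokenizer = pipe.unet, pipe.vae, pipe.text_encoder, pipe.tokenizer
noise_scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

unet.to(device).train()                      # only the UNet is fine-tuned in this sketch
vae.to(device).requires_grad_(False)
text_encoder.to(device).requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=2e-6)

# Five product photos (hypothetical file names), resized and mapped to [-1, 1].
preprocess = T.Compose([T.Resize(512), T.CenterCrop(512), T.ToTensor(),
                        T.Normalize([0.5], [0.5])])
files = [f"product_{i}.jpg" for i in range(5)]
images = torch.stack([preprocess(Image.open(f).convert("RGB")) for f in files]).to(device)

# A rare placeholder token ("sks") binds the new product to a prompt.
ids = tokenizer("a photo of sks product", padding="max_length",
                max_length=tokenizer.model_max_length, truncation=True,
                return_tensors="pt").input_ids.to(device)

for step in range(400):                      # a few hundred steps as an illustrative budget
    with torch.no_grad():
        latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
        text_emb = text_encoder(ids.repeat(len(images), 1))[0]
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (len(images),), device=device)
    noisy = noise_scheduler.add_noise(latents, noise, t)
    # Standard latent-diffusion objective: predict the noise that was added.
    loss = F.mse_loss(unet(noisy, t, text_emb).sample, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

pipe.save_pretrained("nola-customized-sd")   # hypothetical output directory
```

Binding the product to a rare token keeps the rest of the model's knowledge usable at inference time, which is one plausible way to realize the "injection" described above.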
NOLA can customize LVMs not only for particular products but also for particular scenes in which the products are placed. This makes it convenient to use a customized LVM to generate images of a product against a specific background, as sketched below.
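How NOLA actually couples products with scenes is not specified here; one hypothetical extension of the fine-tuning sketch above is to mix a few background photos into the training set under a second rare token ("szn"), so that a single customized model binds both the product and the scene. The tokens, captions, and file names below are assumptions.

```python
# Hypothetical joint training set: product shots use the product token,
# background shots use the scene token.
training_pairs = [
    ("a photo of sks product", "product_0.jpg"),
    ("a photo of sks product", "product_1.jpg"),
    ("a photo of szn scene", "background_0.jpg"),
    ("a photo of szn scene", "background_1.jpg"),
]
```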
NOLA requires only five images of a product and a few background images (if any), and takes a handful of GPU hours to customize a pre-trained LVM. Afterwards, the customized LVM can produce images of the product in seconds.
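For completeness, a minimal inference sketch with the customized weights from above; the directory name and the "sks"/"szn" tokens are the same hypothetical choices, not NOLA's actual interface.

```python
import torch
from diffusers import StableDiffusionPipeline

# Reload the customized weights saved by the fine-tuning sketch.
pipe = StableDiffusionPipeline.from_pretrained(
    "nola-customized-sd", torch_dtype=torch.float16).to("cuda")

# A single forward pass of the diffusion sampler takes seconds on a modern GPU.
image = pipe("a photo of sks product in szn scene, studio lighting",
             num_inference_steps=25, guidance_scale=7.5).images[0]
image.save("product_visual.png")
```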