Oral in Affinity Workshop: Tiny Papers Oral Session 1
VISUAL PROMPTING METHODS FOR GPT-4V BASED ZERO-SHOT GRAPHIC LAYOUT DESIGN GENERATION
Kunal Singh · Mukund Khanna · Ankan Biswas · Pradeep Moturi · Shivam
Graphic layout design generation is a challenging problem in computer vision. The key challenge is placing textual elements coherently on the background image so that the result is aesthetically appealing and does not occlude key visual elements. Prior methods have attempted to solve this multimodal problem, but none has solved it fully. Because the task requires understanding complex relationships between visual and textual elements, we investigate GPT-4-Vision (GPT-4V), a large multimodal model (LMM), for zero-shot graphic layout design generation in a versatile manner. Our approach explores various off-the-shelf segmentation and superpixel methods to identify and mark key regions, visually augmenting the image to enhance GPT-4V's spatial reasoning capability. The results of our comprehensive experiments on a self-curated dataset demonstrate the efficacy of our proposed visual prompting methods: they improve over the standard GPT-4V prompting method and, for some techniques, perform on par with or even better than a state-of-the-art specialist model. The code and data are available at https://anonymous.4open.science/r/VISUAL-PROMPTING-TECHNIQUES-FOR-GPT-4V-BASED-ZERO-SHOT-GRAPHIC-LAYOUT-DESIGN-GENERATION-5A6E
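
The general idea of marking regions and asking the model to choose among them can be illustrated with a minimal sketch. This is not the authors' implementation: a uniform grid stands in for the segmentation and superpixel methods the abstract mentions, no image is drawn on, and no model is called. The names `Region`, `grid_regions`, `build_prompt`, and `resolve_layout`, as well as the prompt wording, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A numbered rectangular region; right and bottom are exclusive."""
    label: int
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center(self):
        # Point where a numeric mark could be drawn on the image.
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def grid_regions(width, height, cols, rows):
    """Split the image into rows * cols cells, labeled 1..N in row-major order.

    Assumption: a uniform grid is only a stand-in for a real
    segmentation or superpixel method.
    """
    regions = []
    label = 1
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            regions.append(Region(label, left, top, right, bottom))
            label += 1
    return regions


def build_prompt(regions, text_elements):
    """Compose a text prompt describing the marked regions (hypothetical wording)."""
    lines = ["The image is annotated with numbered regions:"]
    for reg in regions:
        lines.append(
            f"Region {reg.label}: x={reg.left}..{reg.right}, y={reg.top}..{reg.bottom}"
        )
    lines.append("Text elements to place:")
    for text in text_elements:
        lines.append(f"- {text}")
    lines.append(
        "For each text element, answer with the number of the region where it "
        "should go, avoiding regions that cover key visual content."
    )
    return "\n".join(lines)


def resolve_layout(regions, answer):
    """Map a {text element: region label} answer to pixel boxes."""
    by_label = {reg.label: reg for reg in regions}
    layout = {}
    for text, label in answer.items():
        if label not in by_label:
            raise ValueError(f"unknown region label: {label}")
        reg = by_label[label]
        layout[text] = (reg.left, reg.top, reg.right, reg.bottom)
    return layout
```

In this sketch the model only has to name a region number rather than produce raw pixel coordinates, and the labels are mapped back to boxes afterwards. Whether the paper's methods work exactly this way is not stated in the abstract.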