Abstract
Reconstructing clothed human models from single-view images is a critical research topic within computer vision and computer graphics. The primary goal is to generate a geometrically accurate and visually realistic 3D representation of a person—including detailed body morphology and garment structure—based solely on a 2D image captured from one perspective. The fundamental challenge in this domain is the reliable inference and reconstruction of body shape, surface texture, and intricate clothing attributes, particularly for regions occluded or absent in the observed view. To address this challenge, we introduce a novel clothed human reconstruction method that incorporates two core components: a back-view generation module and a clothed human reconstruction module. The back-view generation module leverages a state-of-the-art image diffusion technique to produce a plausible back-view image consistent with the semantic and perceptual characteristics of the input image, thereby enriching the information available for subsequent reconstruction. In the clothed human reconstruction module, we use an estimated parametric human body mesh as a 3D prior to guide the reconstruction, alleviating the depth ambiguity inherent in relying solely on 2D image information. Experimental results on the publicly available 3D human datasets CAPE and CustomHumans demonstrate that our method produces more realistic back-view images and that the reconstructed human models more closely match the input images in shape, pose, and texture.
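For orientation, the sketch below illustrates the two-stage pipeline the abstract describes: a back-view generation stage followed by a prior-guided reconstruction stage. It is a minimal, hypothetical rendering of the data flow only; the function names, tensor shapes, and internals are placeholders (the real system would use a conditional image diffusion model for stage one and condition stage two on an estimated parametric body mesh such as SMPL), not the authors' implementation.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# All internals are placeholders so the sketch runs end to end; none of
# these functions reflects the paper's actual architecture.
import numpy as np


def generate_back_view(front_image: np.ndarray) -> np.ndarray:
    """Stage 1 (hypothetical): diffusion-based back-view synthesis.

    A real implementation would denoise from noise while conditioning on
    semantic/perceptual features of the front view; here a mirrored copy
    stands in for the generated back view.
    """
    return front_image[:, ::-1, :].copy()


def estimate_parametric_mesh(front_image: np.ndarray) -> np.ndarray:
    """Hypothetical parametric body fit used as the 3D prior.

    Returns dummy vertices with the SMPL vertex count (6890) as a stand-in
    for an estimated body mesh.
    """
    return np.zeros((6890, 3), dtype=np.float32)


def reconstruct_clothed_human(front_image: np.ndarray,
                              back_image: np.ndarray,
                              mesh_prior: np.ndarray) -> np.ndarray:
    """Stage 2 (hypothetical): fuse front/back pixel evidence with the mesh
    prior to resolve depth ambiguity; returns a dummy occupancy volume."""
    return np.zeros((128, 128, 128), dtype=np.float32)


if __name__ == "__main__":
    front = np.random.rand(512, 512, 3).astype(np.float32)  # single input view
    back = generate_back_view(front)            # stage 1: back-view generation
    prior = estimate_parametric_mesh(front)     # 3D prior from the input view
    volume = reconstruct_clothed_human(front, back, prior)  # stage 2
    print(volume.shape)
```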
