Abstract
Reconstructing clothed human models from single-view images is a critical research topic within computer vision and computer graphics. The primary goal is to generate a geometrically accurate and visually realistic 3D representation of a person—including detailed body morphology and garment structure—based solely on a 2D image captured from one perspective. The fundamental challenge in this domain is the reliable inference and reconstruction of body shape, surface texture, and intricate clothing attributes, particularly for regions occluded or absent in the observed view. To address this challenge, we introduce a novel clothed human reconstruction method that incorporates two core components: a back-view generation module and a clothed human reconstruction module. The back-view generation module leverages a state-of-the-art image diffusion technique to produce a plausible back-view image consistent with the semantic and perceptual characteristics of the input image, thereby enriching the information available for subsequent reconstruction. In the clothed human reconstruction module, we use an estimated parametric human body mesh as a 3D prior to guide the reconstruction, alleviating the depth ambiguity inherent in relying solely on 2D image information. Experimental results on the publicly available 3D human datasets CAPE and CustomHumans demonstrate that our method produces more realistic back-view images and that the reconstructed human models more closely match the input images in shape, pose, and texture.
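For orientation, the sketch below illustrates the two-stage pipeline the abstract describes: a back-view generation stage followed by a prior-guided reconstruction stage. It is a minimal, hypothetical rendering of the data flow only; the function names, tensor shapes, and internals are placeholders (the real system would use a conditional image diffusion model for stage one and condition stage two on an estimated parametric body mesh such as SMPL), not the authors' implementation.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# All internals are placeholders so the sketch runs end to end; none of
# these functions reflects the paper's actual architecture.
import numpy as np


def generate_back_view(front_image: np.ndarray) -> np.ndarray:
    """Stage 1 (hypothetical): diffusion-based back-view synthesis.

    A real implementation would denoise from noise while conditioning on
    semantic/perceptual features of the front view; here a mirrored copy
    stands in for the generated back view.
    """
    return front_image[:, ::-1, :].copy()


def estimate_parametric_mesh(front_image: np.ndarray) -> np.ndarray:
    """Hypothetical parametric body fit used as the 3D prior.

    Returns dummy vertices with the SMPL vertex count (6890) as a stand-in
    for an estimated body mesh.
    """
    return np.zeros((6890, 3), dtype=np.float32)


def reconstruct_clothed_human(front_image: np.ndarray,
                              back_image: np.ndarray,
                              mesh_prior: np.ndarray) -> np.ndarray:
    """Stage 2 (hypothetical): fuse front/back pixel evidence with the mesh
    prior to resolve depth ambiguity; returns a dummy occupancy volume."""
    return np.zeros((128, 128, 128), dtype=np.float32)


if __name__ == "__main__":
    front = np.random.rand(512, 512, 3).astype(np.float32)  # single input view
    back = generate_back_view(front)            # stage 1: back-view generation
    prior = estimate_parametric_mesh(front)     # 3D prior from the input view
    volume = reconstruct_clothed_human(front, back, prior)  # stage 2
    print(volume.shape)
```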
