Abstract

Fashion sketch editing modifies specific attributes of a sketch while preserving the rest of the sketch, helping designers turn concepts into tangible designs quickly. Editing a fashion sketch requires precise control over the style of each part, with smooth and natural line connections and transitions. Because fashion sketches contain complex and diverse semantic attributes, existing attribute editing methods struggle to confine their edits to specific regions. To overcome this limitation, a semantically guided fashion sketch disentanglement model is proposed. First, a semantic segmentation network segments the complete fashion sketch into sleeves and torso. The latent space of the network is then decomposed into semantic parts according to this segmentation, so that editing operations do not have unintended effects on non-target regions of the sketch. Next, a VGG-structured encoder and a StyleGAN2 decoder are trained to obtain the latent vectors of both the complete and the segmented sketches. Sparse principal component analysis is then applied in the latent space to extract more concise and interpretable features. Finally, perturbations along the principal directions are applied to explore variations in sleeve and torso attributes. Extensive qualitative and quantitative experiments on the VITON-HD and Dress-Code datasets demonstrate that the model exhibits strong disentanglement ability and produces high-quality edits in the target attribute regions while leaving the non-target regions virtually unaltered. Furthermore, its attribute disentanglement accuracy is significantly higher than that of other methods.
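The last two steps of the pipeline (sparse principal directions, then perturbation restricted to one semantic part) can be illustrated with a toy sketch. This is not the authors' implementation: the latent codes are synthetic, the names `sparse_direction` and `edit_part` are hypothetical, and a soft-thresholded power iteration stands in for a full sparse PCA solver (a library implementation such as scikit-learn's `SparsePCA` would be the usual choice in practice). Real latent vectors would come from the trained encoder, and the edited code would be passed to the decoder.

```python
import math
import random


def soft_threshold(vec, lam):
    """Shrink every entry toward zero by lam; entries smaller than lam become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def normalize(vec):
    """Scale to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def covariance(rows):
    """Sample covariance matrix of a list of row vectors (toy sizes only)."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def sparse_direction(rows, lam=0.2, iters=100):
    """Leading sparse direction via soft-thresholded power iteration.

    Assumption: this simple heuristic replaces a proper sparse PCA solver.
    """
    cov = covariance(rows)
    dim = len(cov)
    # Start from the coordinate with the largest variance.
    start = max(range(dim), key=lambda j: cov[j][j])
    v = [1.0 if j == start else 0.0 for j in range(dim)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        w = normalize(soft_threshold(w, lam))
        if not any(w):  # lam too large: every entry was thresholded away
            break
        v = w
    return v


def edit_part(latents, part, direction, strength):
    """Shift only the latent code of `part`; all other parts are copied as-is."""
    edited = {name: list(code) for name, code in latents.items()}
    edited[part] = [z + strength * d for z, d in zip(latents[part], direction)]
    return edited


# Synthetic "sleeve" codes: variation concentrated in the first two dimensions.
rng = random.Random(0)


def toy_sleeve_code():
    t = rng.gauss(0.0, 1.0)
    head = [2.0 * t + rng.gauss(0.0, 0.1), 1.5 * t + rng.gauss(0.0, 0.1)]
    return head + [rng.gauss(0.0, 0.1) for _ in range(4)]


sleeve_codes = [toy_sleeve_code() for _ in range(200)]
direction = sparse_direction(sleeve_codes)

# Hypothetical per-part latent codes for one sketch.
latents = {"sleeve": sleeve_codes[0], "torso": [0.5] * 6}
edited = edit_part(latents, "sleeve", direction, strength=3.0)

print("sparse direction:", [round(x, 3) for x in direction])
print("torso unchanged:", edited["torso"] == latents["torso"])
```

Keeping a separate code per part is what makes the edit local in this sketch: the torso code is never touched, so any change in the decoded output would have to come from the sleeve code alone. The sparsity penalty `lam` plays a complementary role by zeroing out latent dimensions that contribute little to the direction.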

Keywords

Fashion sketch design, attribute editing, semantic segmentation, sparse decomposition

SKTNet: a semantically guided attribute disentanglement network for fashion sketch editing
