Abstract
Text-to-image synthesis (T2I) is a challenging task: the model must generate high-quality images that are both realistic and semantically consistent with the input text. Current approaches typically produce an initial blurred image that is then refined to improve quality; however, many existing methods struggle to ensure that the refined image accurately corresponds to the provided text description. To address this limitation, this paper proposes a novel Multimodal Similarity-based Generative adversarial network for Text to Image Generation (MSG-TIG) framework. The proposed framework takes the input text and a segmented mask image as input. Both inputs undergo a preprocessing step: the text is reduced to a compact set of words using the TS2 approach, which yields dimension-reduced text for better performance, while noise in the mask image is removed by median filtering. From the preprocessed text, Bag of Words (BoW) and Class Frequency assisted Term Frequency-Inverse Document Frequency (CF-TF-IDF) features are extracted; from the preprocessed mask image, color features and Compute Neighbour Pixel value in Hierarchy of Skeleton (CNP-HoS) features are extracted. The combined feature set is then passed to the Modified Similarity Score-assisted Multimodal Similarity-based Generative Adversarial Network (MSS-MS-GAN) to generate multiple images. The MSS-MS-GAN adopts the Modified Similarity Score-assisted Multimodal Similarity Model (MSS-MSM) in the generator phase to obtain better generative output while reducing the risk of mode collapse. The MSS-MS-GAN strategy achieved an Inception Score of 4.913, an SSIM of 0.861, and a PSNR of 35.245 dB, along with low error values of MAE = 0.228 and MSE = 0.094.
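The role of the similarity model in the generator can be illustrated abstractly. The abstract does not give the exact form of the Modified Similarity Score, so the sketch below substitutes plain cosine similarity as a stand-in: given a feature vector for the text and feature vectors for several candidate images, candidates are ranked by their similarity to the text, showing how a multimodal similarity score can steer generation toward the most text-consistent output. The function names and feature vectors here are hypothetical, not from the paper.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity between two feature vectors
    (a placeholder for the paper's Modified Similarity Score)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def rank_candidates(text_feat, image_feats):
    """Return candidate-image indices sorted by similarity to the
    text features, most similar first -- illustrating how a similarity
    model could favour text-consistent outputs among multiple
    generated images."""
    scored = [(cosine_sim(text_feat, f), i) for i, f in enumerate(image_feats)]
    return [i for _, i in sorted(scored, reverse=True)]


# Hypothetical text features and three candidate-image feature vectors.
text_feat = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # dissimilar to the text
    [1.0, 0.1, 0.9],  # close match
    [0.5, 0.5, 0.5],  # partial match
]
order = rank_candidates(text_feat, candidates)
# order[0] == 1: the closest match is ranked first
```

In a GAN setting, such a score could also be folded into the generator loss rather than used only for ranking; the abstract indicates only that the similarity model operates in the generator phase.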
