What is it about?

Visual and auditory modalities complement each other and are important sources of information for humans. Even when only one modality is available, the corresponding information can often still be understood; when both are present, understanding is further enhanced. Sometimes one of the two modalities for a given piece of information is lost or simply not available. In that case, if the missing modality can be generated from the existing one, understanding of the information can be improved. Because each modality expresses information in a different way, generating the missing modality from the remaining one is difficult. It is not impossible, however, because the modalities are interrelated and share the same underlying features of the information. This paper focuses on the audio-to-image generation problem: generating corresponding images from audio input. Such an audio-to-image generation method can be utilized in a variety of applications by generating new data that does not yet exist for a given source.


Why is it important?

• Conditional Supervised Contrastive Generative Adversarial Networks (C-SupConGAN), based on a conditional supervised contrastive loss (C-SupCon loss), is introduced for audio-to-image generation. C-SupConGAN stabilizes GAN training and further improves performance.
• C-SupConGAN can be extended to use source or target data information as well as class information, and can be applied to various multi-modal generation tasks as well as general image generation tasks.
• The influence of different kinds of information (class, source, or target) is analyzed through extensive experiments with C-SupConGAN. Results demonstrate that the proposed method generates high-quality images and achieves state-of-the-art results on the Sub-URMP dataset.
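The paper's exact C-SupCon loss is not reproduced on this page. As a rough, illustrative sketch only: the supervised contrastive (SupCon) loss that such methods build on pulls embeddings of same-class samples together and pushes different-class samples apart. A minimal NumPy version might look as follows (the function name, temperature value, and toy inputs are assumptions, not the paper's implementation):

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Sketch of the supervised contrastive loss over a batch of embeddings.

    features: (n, d) array of embeddings; labels: (n,) integer class labels.
    For each anchor, positives are all other samples with the same label.
    """
    # L2-normalize embeddings so similarity is cosine similarity
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)

    # exclude self-similarity from the denominator
    not_self = ~np.eye(n, dtype=bool)

    # numerically stable log-softmax over each row
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    # positives: same label, excluding the anchor itself
    pos_mask = (labels[:, None] == labels[None, :]) & not_self
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # skip anchors with no positive in the batch

    # average log-probability over positives, then negate and average
    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

In a "conditional" variant, the labels (or conditioning features from the source or target data, as the bullets above describe) determine which pairs count as positives; the contrastive term is then combined with the usual adversarial GAN losses during training.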

Perspectives

The audio-to-image generation method can be utilized in a variety of applications by generating new data that does not yet exist for a given source.

HaeChun Chung

Read the Original

This page is a summary of: C-SupConGAN: Using Contrastive Learning and Trained Data Features for Audio-to-Image Generation, December 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3582099.3582121.
You can read the full text:

