Inter- and intra-modal contrastive hybrid learning framework for multimodal abstractive summarization

SUMMARY

Research on multimodal abstractive summarization (MAS) has produced approaches that integrate image and text modalities into a short, concise, and readable textual summary. Current research focuses more on the multimodal fusion and textual generation steps than on feature extraction, because feature extractors are already widely used in natural language processing (NLP) and computer vision (CV) and perform well. In multimodal fusion approaches, multiple inputs are fused by attention-based or gate-based mechanisms to learn a representation that is . . .
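
To make the idea of gate-based fusion concrete, the following is a minimal sketch and not the method of the paper summarized here. It assumes the text and image features have already been extracted as equal-length vectors, and it uses a simplified element-wise gate; the names `gated_fusion`, `w_text`, `w_image`, and `bias` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, image_vec, w_text, w_image, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Illustrative assumption: each dimension i has its own scalar gate
        g_i = sigmoid(w_text[i] * t_i + w_image[i] * v_i + bias[i])
    and the fused feature is the convex combination
        f_i = g_i * t_i + (1 - g_i) * v_i.
    Real systems typically learn full weight matrices instead.
    """
    lengths = {len(text_vec), len(image_vec), len(w_text), len(w_image), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for t, v, wt, wv, b in zip(text_vec, image_vec, w_text, w_image, bias):
        gate = sigmoid(wt * t + wv * v + b)  # how much of the text feature to keep
        fused.append(gate * t + (1.0 - gate) * v)
    return fused


if __name__ == "__main__":
    # With zero weights and bias the gate is 0.5, so the result is the average.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

With all weights and biases set to zero, every gate equals 0.5 and the fused vector is the plain average of the two inputs; a large positive bias pushes the gate toward 1 so the text feature dominates. An attention-based variant would instead compute weights from similarity scores between text and image features.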
