Fine-grained cross-modal semantic consistency in natural conservation image data from a multi-task perspective

HIGHLIGHTS

  • What: Building on this, the authors propose momentum encoding from a multi-task perspective as a bridge for information, improving the quality of the mutual information representation and optimizing the distribution of feature points within the cross-modal shared semantic space. The authors seek not only to retrieve a single image but also to attach essential descriptions to the image when it is retrieved. To this end, they propose a multi-task model trained jointly on cross-modal image-text retrieval and image captioning. The objective of the paper is to preserve semantic consistency in the context of fine-grained visual . . .
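
The summary above does not spell out how the momentum encoding or the joint objective is computed. As a rough illustration only, the sketch below assumes a common formulation: a momentum (exponential moving average) update of one encoder's parameters from another, and a weighted sum of the retrieval and captioning losses. The function names, the momentum value `m`, and the loss `weight` are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch, not the authors' implementation.
# Plain lists of floats stand in for real encoder weights.

def momentum_update(key_params, query_params, m=0.999):
    """Return updated key parameters: m * key + (1 - m) * query, element-wise.

    Assumption: an EMA-style momentum encoder; the paper's exact rule
    is not given in the summary.
    """
    if len(key_params) != len(query_params):
        raise ValueError("parameter lists must have the same length")
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum m must lie in [0, 1]")
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def joint_loss(retrieval_loss, caption_loss, weight=1.0):
    """Combine the two task losses as retrieval + weight * captioning.

    Assumption: a simple weighted sum; the actual weighting scheme
    used for joint training is not stated in the summary.
    """
    return retrieval_loss + weight * caption_loss


if __name__ == "__main__":
    key = [1.0, 2.0]
    query = [3.0, 4.0]
    # With m close to 1, the key encoder drifts slowly toward the query encoder.
    print(momentum_update(key, query, m=0.5))
    print(joint_loss(1.0, 2.0, weight=0.5))
```

A momentum value close to 1 keeps the slowly updated encoder stable across training steps, which is the usual motivation for this kind of update; whether the paper uses the same setting is not stated here.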

     
