Singh, Arvinder; Bhase, Ninad; Jain, Manav; Ghorpade, Tushar
Published in
ITM Web of Conferences
Machine Translation is the process of translating text from one language to another, which helps reduce the communication gap among people from different cultural backgrounds. The task performed by a Machine Translation System is to automatically translate between pairs of different natural languages, where Neural Machine Translation System stan...
Xian, Tiantao; Li, Zhixin; Zhang, Canlong; Ma, Huifang
Published in
Neural networks : the official journal of the International Neural Network Society
Transformer-based architectures have shown great success in image captioning, where the self-attention module can model source and target interactions (e.g., object-to-object, object-to-word, word-to-word). However, global information is not explicitly considered in the attention weight calculation, which is essential to understand the scene content...
Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita
Published in
Neural networks : the official journal of the International Neural Network Society
Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains...
Alsharid, Mohammad; El-Bouri, Rasheed; Sharma, Harshita; Drukker, Lior; Papageorghiou, Aris T.; Noble, J. Alison
Published in
Proceedings. IEEE International Symposium on Biomedical Imaging
We propose a curriculum learning captioning method to caption fetal ultrasound images by training a model to dynamically transition between two different modalities (image and text) as training progresses. Specifically, we propose a course-focused dual curriculum method, where a course is training with a curriculum based on only one of the two moda...
Dehaqi, Ali Mollaahmadi; Seydi, Vahid; Madadi, Yeganeh
Published in
SN Computer Science
Image captioning is the task of generating a description of an image, which requires recognizing the important attributes in the image and the relationships between them. The generated sentences must be semantically and syntactically correct. Most image captioning models are based on RNN and MLE methods, but we propose a novel model based on GAN networks where ...
Wang, X.; Feng, S.; Zhu, Jihua; Hasegawa-Johnson, Mark; Scharenborg, O.E.
This paper proposes a new model, referred to as the show and speak (SAS) model that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes. The basic structure of SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech that des...
Liu, Huan; Wang, Guangbin; Huang, Ting; He, Ping; Skitmore, Martin; Luo, Xiaochun
This study proposed an automated method for manifesting construction activity scenes by image captioning – an approach rooted in computer vision and natural language generation. A linguistic description schema for manifesting the scenes is developed initially and two unique dedicated image captioning datasets are created for method validation. A ge...
Huang, Feicheng; Li, Zhixin; Wei, Haiyang; Zhang, Canlong; Ma, Huifang
Published in
Machine Learning
Automatically generating a human-like description for a given image is a promising research topic in artificial intelligence that has attracted a great deal of attention recently. Most existing attention methods explore the mapping relationships between words in a sentence and regions in an image; such an unpredictable matching manner sometimes causes inharm...
Shuang, Wu; Shaojing, Fan; Zhiqi, Shen; Mohan, Kankanhalli; Tung Kum Hoe, Anthony
Alsharid, Mohammad; El-Bouri, Rasheed; Sharma, Harshita; Drukker, Lior; Papageorghiou, Aris T.; Noble, J. Alison
Published in
Medical ultrasound, and preterm, perinatal and paediatric image analysis
We present a novel curriculum learning approach to train a natural language processing (NLP) based fetal ultrasound image captioning model. Datasets containing medical images and corresponding textual descriptions are relatively rare and hence smaller than datasets of natural images and their captions. This fact inspired us t...