Soumya Shamarao Jahagirdar

I am a Master's by Research student at CVIT, IIIT Hyderabad, India. I work with Prof. C. V. Jawahar and with Prof. Dimosthenis Karatzas from the Computer Vision Center (CVC), UAB, Spain. My master's research centers on multimodal learning: more specifically, understanding and combining visual and textual information in videos for question answering.

During my undergraduate research, I worked with Prof. Shankar Gangisetty from KLE Technological University and Prof. Anand Mishra on text-based multimodal learning, specifically on utilizing scene text in images for text-based visual question generation. I also worked with Prof. Uma Mudenagudi and the Samsung R&D Institute India-Bangalore on depth estimation and densification. In my undergrad, I also worked as a Research Assistant with Prof. B A Patil at the Think & Ink Education and Research Foundation.

Email  /  GitHub  /  Google Scholar  /  LinkedIn  /  CV  /  Twitter  

Research

My research interests lie in Computer Vision, Deep Learning, Machine Learning, Multimodal Learning, and Natural Language Processing. These areas excite me and push me to work harder every day.

Publications

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering


Soumya Shamarao Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
International Conference on Computer Vision (ICCV) Workshops, VLAR, 2023

More details coming soon!

Weakly Supervised Visual Question Answer Generation


Charani Alampalle, Shamanthak Hegde, Soumya Shamarao Jahagirdar, Shankar Gangisetty
Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, ODRUM, 2023
paper

We propose a weakly supervised visual question-answer generation method that generates relevant question-answer pairs for a given input image and its associated caption.

Making the V in Text-VQA Matter


Shamanthak Hegde, Soumya Shamarao Jahagirdar, Shankar Gangisetty
Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, ODRUM, 2023
paper

We propose a method to learn visual features (making the V matter in Text-VQA) along with OCR features and question features, using the VQA dataset as external knowledge for text-based VQA.

Watching the News: Towards VideoQA Models that can Read


Soumya Shamarao Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
Winter Conference on Applications of Computer Vision (WACV), 2023
paper / code / website / youtube

We propose a novel VideoQA task that requires reading and understanding the text in the video. We focus on news videos and require QA systems to comprehend and answer questions about the topics presented by combining visual and textual cues in the video. We introduce the “NewsVideoQA” dataset, which comprises more than 8,600 QA pairs on 3,000+ news videos obtained from diverse news channels around the world.

Look, Read and Ask: Learning to Ask Questions by Reading Text in Images


Soumya Shamarao Jahagirdar, Shankar Gangisetty, Anand Mishra
International Conference on Document Analysis and Recognition (ICDAR), 2021
paper / code / website / youtube

We present the novel problem of text-based visual question generation, or TextVQG for short. Given the recent growing interest of the document image analysis community in combining text understanding with conversational artificial intelligence, e.g., text-based visual question answering, TextVQG becomes an important task. Given an input image and a text automatically extracted from it (an OCR token), TextVQG aims to generate a natural language question such that the OCR token is the answer to that question.

DeepDNet: Deep Dense Network for Depth Completion Task


Girish Hegde, Tushar Pharale, Soumya Shamarao Jahagirdar, Vaishakh Nargund, Ramesh Ashok Tabib, Uma Mudenagudi, Basavaraja Vandrotti, Ankit Dhiman
Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, WiCV, 2021
paper

We propose a Deep Dense Network for the Depth Completion Task (DeepDNet) that generates a dense depth map from sparse depth and the captured view. We propose a Dense-Residual-Skip (DRS) autoencoder, along with attention to edge preservation via a Gradient Aware Mean Squared Error (GAMSE) loss.




Patents

Method and Device of Depth Densification using RGB Image and Sparse Depth


2022-05-05
patent / website

Patent number: WO2022103171A1; patent office: WIPO (PCT); publication date: 2022/05/19. Inventors: Uma Suhas Mudenagudi, Girish Dattatray Hegde, Ramesh Ashok Tabib, Soumya Shamarao Jahagirdar, Tushar Irappa Pharale, Basavaraja Shanthappa Vandrotti, Ankit Dhiman, Vaishakh Nargund.



Design and source code from Jon Barron's website