Jialin Wu Homepage
Publications
Distilling vision-language models on millions of videos
The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this …
Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
PDF
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks. However, generalist LMMs often suffer from …
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut
PDF
CausalLM Is Not Optimal for In-Context Learning
Recent empirical evidence indicates that transformer-based in-context learning performs better when using a prefix language model …
Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut
PDF
PaLI-X: On Scaling up a Multilingual Vision and Language Model
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the …
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, et al. (43 authors)
PDF
RT-2: Vision-language-action models transfer web knowledge to robotic control
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to …
Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, et al. (54 authors)
PDF
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering
Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge …
Jialin Wu, Raymond J. Mooney
PDF
Multi-Modal Answer Validation for Knowledge-Based VQA
The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to …
Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi
PDF
Improving VQA and its Explanations by Comparing Competing Explanations
Most recent state-of-the-art Visual Question Answering (VQA) systems are opaque black boxes that are only trained to fit the answer …
Jialin Wu, Liyan Chen, Raymond J. Mooney
PDF
CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation
Daily scenes are complex in the real world due to occlusion, undesired lighting conditions, etc. Although humans handle those …
Jungjun Kim, Hanbin Ko, Jialin Wu
PDF
Self-Critical Reasoning for Robust Visual Question Answering
Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because …
Jialin Wu, Raymond J. Mooney
PDF
Generating Question Relevant Captions to Aid Visual Question Answering
Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We …
Jialin Wu, Zeyuan Hu, Raymond J. Mooney
PDF
Faithful Multimodal Explanation for Visual Question Answering
AI systems’ ability to explain their reasoning is critical to their utility and trustworthiness. Deep neural networks have …
Jialin Wu, Raymond J. Mooney
PDF
Dynamic Filtering with Large Sampling Field for Convnets
We propose a dynamic filtering strategy with a large sampling field for ConvNets (LS-DFN), where the position-specific kernels learn from …
Jialin Wu, Dai Li, Yu Yang, Chandrajit Bajaj, Xiangyang Ji
PDF