Jialin Wu Homepage
Publications
Gemini 2.5 Flash Image
Try it on https://aistudio.google.com/models/gemini-2-5-flash-image
Google
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash …
Gemini Team
PDF
Distilling vision-language models on millions of videos
The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this …
Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
PDF
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks. However, generalist LMMs often suffer from …
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut
PDF
CausalLM Is Not Optimal for In-Context Learning
Recent empirical evidence indicates that transformer-based in-context learning performs better when using a prefix language model …
Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut
PDF
PaLI-X: On Scaling up a Multilingual Vision and Language Model
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the …
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, et al. (43 authors)
PDF
RT-2: Vision-language-action models transfer web knowledge to robotic control
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to …
Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, et al. (54 authors)
PDF
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering
Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge …
Jialin Wu, Raymond J. Mooney
PDF
Multi-Modal Answer Validation for Knowledge-Based VQA
The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to …
Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi
PDF
Improving VQA and its Explanations by Comparing Competing Explanations
Most recent state-of-the-art Visual Question Answering (VQA) systems are opaque black boxes that are only trained to fit the answer …
Jialin Wu, Liyan Chen, Raymond J. Mooney
PDF
CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation
Daily scenes are complex in the real world due to occlusion, undesired lighting conditions, etc. Although humans handle those …
Jungjun Kim, Hanbin Ko, Jialin Wu
PDF
Self-Critical Reasoning for Robust Visual Question Answering
Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because …
Jialin Wu, Raymond J. Mooney
PDF
Generating Question Relevant Captions to Aid Visual Question Answering
Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We …
Jialin Wu, Zeyuan Hu, Raymond J. Mooney
PDF
Faithful Multimodal Explanation for Visual Question Answering
AI systems’ ability to explain their reasoning is critical to their utility and trustworthiness. Deep neural networks have …
Jialin Wu, Raymond J. Mooney
PDF
Dynamic Filtering with Large Sampling Field for ConvNets
We propose a dynamic filtering strategy with large sampling field for ConvNets (LS-DFN), where the position-specific kernels learn from …
Jialin Wu, Dai Li, Yu Yang, Chandrajit Bajaj, Xiangyang Ji
PDF