Jialin Wu Homepage
Publications
Gemini 2.5 Flash Image
Try it on https://aistudio.google.com/models/gemini-2-5-flash-image
Google
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
In this report, we introduce the Gemini 2.X model family. Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash …
Gemini Team
Distilling vision-language models on millions of videos
The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this …
Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks. However, generalist LMMs often suffer from …
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut
CausalLM Is Not Optimal for In-Context Learning
Recent empirical evidence indicates that transformer based in-context learning performs better when using a prefix language model …
Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut
PaLI-X: On Scaling up a Multilingual Vision and Language Model
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the …
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, et al. (43 authors)
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to …
Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, et al. (54 authors)
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering
Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge …
Jialin Wu, Raymond J. Mooney
Multi-Modal Answer Validation for Knowledge-Based VQA
The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to …
Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi
Improving VQA and its Explanations by Comparing Competing Explanations
Most recent state-of-the-art Visual Question Answering (VQA) systems are opaque black boxes that are only trained to fit the answer …
Jialin Wu, Liyan Chen, Raymond J. Mooney