Apple Presents Latest Research at CVPR 2026
Apple is showcasing new research at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026 in Denver, June 3-7. The company is sponsoring the conference and presenting work on video generation, multimodal understanding, image compression, and more.
Article intelligence
Key points
- Apple will present multiple research papers at CVPR 2026, including STARFlow-V, AToken, and Velox.
- Scheduled activities include keynote talks, invited talks, poster sessions, and booth presentations.
- Apple is sponsoring the conference and has several researchers serving as area chairs and reviewers.
Apple is presenting new research at the annual IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), which takes place in person in Denver at the Colorado Convention Center from June 3 to June 7. We are proud to sponsor the conference, which brings together the scientific and industrial research communities in computer vision and pattern recognition. Below is an overview of Apple’s participation at CVPR 2026.
Schedule
Stop by the Apple booth (#231) during exhibition hours. All times listed in MDT (local time):
Friday, June 5: 10:00 AM – 6:00 PM
Saturday, June 6: 10:00 AM – 6:00 PM
Sunday, June 7: 10:00 AM – 3:00 PM
Wednesday, June 3
KEYNOTE TALK
Generative AI for Sign Language (GenSign) Workshop
9:00 AM - 1:00 PM, Room 112
Colin Lea will be giving a keynote talk during the workshop.
INVITED TALK
Efficient Deep Learning for Computer Vision (ECV) Workshop 2026
9:00 AM - 6:00 PM, Room 502
Oncel Tuzel will be giving an invited talk during the workshop.
INVITED TALK
Efficient and On-Device Generation (EDGE) Workshop 2026
1:00 PM - 6:00 PM, Room 210/212
Oncel Tuzel and Lu Jiang will be giving invited talks during the workshop.
AFFINITY EVENT
Women in Computer Vision (WiCV)
6:00 PM - 8:00 PM, Room 105 B, Mentorship Dinner Offsite
Hsin-Ping (Cindy) Huang and Maggie Xiao will be representing Apple at the WiCV Mentorship Dinner.
Thursday, June 4
INVITED TALK
Video Large Language Models (VidLLMs) Workshop 2026
8:30 AM - 5:00 PM, Room 3A-3D
Afshin Dehghan will be giving an invited talk during the workshop.
Friday, June 5
POSTER
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
4:00 PM - 6:00 PM, Exhibition Hall A & F, Poster Session 2, #178
Jiatao Gu, Ying Shen (University of Illinois Urbana-Champaign), Tianrong Chen, Laurent Dinh, Yuyang Wang, Miguel Angel Bautista, David Berthelot, Josh Susskind, Shuangfei Zhai
POSTER
From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs
4:00 PM - 6:00 PM, Exhibition Hall A & F, Poster Session 3, #453
Le Zhang (Mila - Quebec AI Institute, Université de Montréal), Jihan Yang (New York University), Soundarya Krishnan, Jimit Majmudar, Hugh Ge, Prasoon Puri, Prathamesh Saraf, Shruti Bhargava, Dhivya Piraviperumal, Yinan Ling, Cindy Pan, Hong Yu, Aishwarya Agrawal (Mila - Quebec AI Institute, Université de Montréal), Andy Tseng
POSTER
What Matters in Practical Learned Image Compression
4:00 PM - 6:00 PM, Exhibition Hall A & F, Poster Session 3, #457
Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang, Sanjay Nair, Divija Hasteer, Oren Rippel
Saturday, June 6
POSTER
Bootstrapping Sign Language Annotations with Sign Language Models
7:30 AM - 9:00 AM, Exhibit Hall A, Findings Posters, #035
Colin Lea, Vassilis Baltatzis, Raja Kushalnagar (Gallaudet University), Lorna Quandt (Gallaudet University), Leah Findlater, Connor Gillis
POSTER
Velox: Learning Representations of 4D Geometry and Appearance
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 4, #527
Anagh Malik (University of Toronto), Xiaoming Zhao, Dorian Chan, David Lindell (University of Toronto), Oncel Tuzel, Rick Chang
POSTER
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
4:45 PM - 6:45 PM, Exhibition Hall A, Poster Session 4, #146
Sanjoy Chowdhury, Karren D. Yang (Nuance Labs), Chun-Liang Li, Xudong Liu, Fartash Faghri, Pavan Kumar Anasosalu Vasu, Oncel Tuzel, Dinesh Manocha (University of Maryland, College Park), Raviteja Vemulapalli
Sunday, June 7
ORAL
AToken: A Unified Tokenizer For Vision
9:00 AM - 10:15 AM, Four Seasons Ballroom, Oral Session 5B: Generalization and Adaptation
Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang
POSTER
AToken: A Unified Tokenizer For Vision
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 5, #007
Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang
POSTER
UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 5, #069
Rui Tian (Fudan University), Mingfei Gao, Haiming Gang, Jiasen Lu, Zhe Gan, Yinfei Yang, Zuxuan Wu (Fudan University), Afshin Dehghan
POSTER
TrajTok: Learning Trajectory Tokens enables better Video Understanding
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 5, #240
Chenhao Zheng (University of Washington), Jieyu Zhang (University of Washington), Oncel Tuzel, Chun-Liang Li, Ranjay Krishna (University of Washington)
POSTER
DSO: Direct Steering Optimization for Bias Mitigation
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 6, #288
Lucas Monteiro Paes, Niv Sivakumar, Yinong Wang (Carnegie Mellon University), Masha Fedzechkina Donaldson, Barry Theobald, Luca Zappella, Nick Apostoloff
FINDINGS POSTER
VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
3:30 PM - 5:30 PM, Exhibit Hall A, Poster Session 3, #298
Pavan Kumar Anasosalu Vasu, Cem Koc, Fartash Faghri, Chun-Liang Li, Brian Feng, Jeff Lai, Meng Cao, Oncel Tuzel, Hadi Pouransari
POSTER
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
3:30 PM - 5:30 PM, Exhibition Hall A, Poster Session 6, #098
Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jiasen Lu, Ashley Tong, Yinfei Yang, Wenze Hu, Zhe Gan
POSTER
SO-Bench: A Structural Output Evaluation of Multimodal LLMs
3:30 PM - 5:30 PM, Exhibition Hall A, Poster Session 6, #141
Di Feng, Kaixin Ma, Feng Nan, Haofeng Chen, Bohan Zhai, David Griffiths, Mingfei Gao, Zhe Gan, Eshan Verma, Yinfei Yang, Zhifeng Chen, Afshin Dehghan
POSTER
Learning Long-term Motion Embeddings for Efficient Kinematics Generation
3:30 PM - 5:30 PM, Exhibition Hall A, Poster Session 6, #595
Nick Stracke (Ludwig Maximilian University of Munich), Kolja Bauer (Ludwig Maximilian University of Munich), Stefan Andreas Baumann (Ludwig Maximilian University of Munich), Joshua Susskind, Miguel Angel Bautista, Björn Ommer (Ludwig Maximilian University of Munich)
Poster Presentations at the Apple Booth
Friday, June 5, 10:00 AM – 12:00 PM
Pavan Kumar Anasosalu Vasu will present VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models.
Friday, June 5, 2:00 PM – 4:00 PM
Byeongjoo Ahn and Jiasen Lu will present AToken: A Unified Tokenizer For Vision.
Saturday, June 6, 10:00 AM – 12:00 PM
Jiatao Gu will present STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows.
Saturday, June 6, 2:00 PM – 4:00 PM
Rick Chang will present Velox: Learning Representations of 4D Geometry and Appearance.
Di Feng will present SO-Bench: A Structural Output Evaluation of Multimodal LLMs.
Accepted Papers
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
Authors: Sanjoy Chowdhury†, Karren D. Yang**, Xudong Liu, Fartash Faghri, Pavan Kumar Anasosalu Vasu, Oncel Tuzel, Dinesh Manocha†**, Chun-Liang Li**, Raviteja Vemulapalli
AToken: A Unified Tokenizer for Vision
Authors: Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang
Bootstrapping Sign Language Annotations with Sign Language Models
Authors: Colin Lea, Vasileios Baltatzis, Connor Gillis, Raja Kushalnagar†**, Lorna Quandt†**, Leah Findlater
DSO: Direct Steering Optimization for Bias Mitigation
Authors: Lucas Monteiro Paes‡, Nivedha Sivakumar‡, Oliver Wang†‡**, Masha Fedzechkina, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff
From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs
Authors: Le Zhang†**, Jihan Yang‡, Soundarya Krishnan, Jimit Majmudar, Xiou Ge, Prasoon Puri, Prathamesh Saraf, Shruti Bhargava, Dhivya Piraviperumal, Yinan Ling, Cindy Pan, Hong Yu, Aishwarya Agrawal†, Bo-Hsiang Tseng
Learning Long-Term Motion Embeddings for Efficient Kinematics Generation
Authors: Nick Stracke†‡, Kolja Bauer†‡, Stefan Andreas Baumann†‡, Miguel Ángel Bautista, Josh Susskind, Björn Ommer†‡
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
Authors: Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan
SO-Bench: A Structural Output Evaluation of Multimodal LLMs
Authors: Di Feng, Kaixin Ma, Feng Nan, Haofeng Chen, Bohan Zhai, David Griffiths, Mingfei Gao, Zhe Gan, Eshan Verma, Yinfei Yang, Zhifeng Chen, Afshin Dehghan
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
Authors: Jiatao Gu†, Ying Shen‡**, Tianrong Chen, Laurent Dinh, Yuyang Wang, Miguel Ángel Bautista, David Berthelot, Josh Susskind, Shuangfei Zhai
TrajTok: Learning Trajectory Tokens enables better Video Understanding
Authors: Chenhao Zheng†‡, Jieyu Zhang†‡, Jianing Zhang†, Weikai Huang†‡, Ashutosh Kumar§, Quan Kong§, Oncel Tuzel, Chun-Liang Li, Ranjay Krishna†‡
UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning
Authors: Rui Tian†, Mingfei Gao§‡, Haiming Gang, Jiasen Lu, Zhe Gan, Yinfei Yang, Zuxuan Wu†§, Afshin Dehghan
Velox: Learning Representations of 4D Geometry and Appearance
Authors: Anagh Malik†, Dorian Chan, Xiaoming Zhao, David B. Lindell†, Oncel Tuzel, Jen-Hao Rick Chang
VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
Authors: Pavan Kumar Anasosalu Vasu*, Cem Koc*, Fartash Faghri*, Chun-Liang Li, Bo Feng, Zhengfeng Lai, Meng Cao, Oncel Tuzel, Hadi Pouransari*
What Matters in Practical Learned Image Compression
Authors: Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang, Sanjay Nair, Divija Hasteer, Oren Rippel
Acknowledgements
Alex Colburn and Qi Shan are recognized as Outstanding Area Chairs.
Byeongjoo Ahn, Chen Chen, Fartash Faghri, Oncel Tuzel, and Xiaoming Zhao are Area Chairs.
Jeffrey Bigham is a Workshop Co-Organizer for “VizWiz Grand Challenge Workshop 2026”.
Sanjoy Chowdhury, Barry-John Theobald, Santhosh Kumar Ramakrishnan, and Raviteja Vemulapalli are recognized as Outstanding Reviewers.
Vassilis Baltatzis, Honor Chen, Rick Chang, Haiming Gang, Mingfei Gao, Pavan Kumar Anasosalu Vasu, Colin Lea, Xianhang Li, Xudong Liu, Yongxi Lu, and Huangjie Zheng are Reviewers.