Can ViT Layers Express Convolutions? Peking U, UCLA & Microsoft Researchers Say ‘Yes’ | Synced

By Sonic Mustang · March 16, 2026 · 1 min read

ai
machine learning & data science
research
artificial intelligence

Source: Synced | AI Technology & Industry Review

In the new paper Can Vision Transformers Perform Convolution?, a research team from Peking University, UCLA and Microsoft Research gives a constructive proof that a single ViT layer with image patches as input can perform any convolution operation, and shows that ViT performance in low-data regimes can be significantly improved using the team's proposed ViT training pipeline.
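To give a feel for why attention can express convolution at all, here is a minimal sketch in plain Python. It is not the paper's construction: it is a simplification that assumes one token per pixel rather than per patch, uses hand-built one-hot attention matrices instead of learned softmax attention, and leaves the attention rows for out-of-bounds offsets all zero (mimicking zero padding, which a real softmax row cannot do exactly). The function names direct_conv, attention_conv and max_abs_diff are hypothetical and exist only for this illustration. Each of the K*K "heads" looks at one fixed relative offset and applies the matching slice of the kernel as its value projection; summing the heads should reproduce the ordinary convolution.

```python
import random


def direct_conv(image, kernel, H, W, K):
    """Zero-padded, stride-1 convolution (cross-correlation), computed directly.

    image[y][x] is a list of C_in floats; kernel[dy][dx][c_in][c_out].
    """
    r = K // 2
    c_in = len(kernel[0][0])
    c_out = len(kernel[0][0][0])
    out = [[[0.0] * c_out for _ in range(W)] for _ in range(H)]
    for y in range(H):
        for x in range(W):
            for dy in range(K):
                for dx in range(K):
                    sy, sx = y + dy - r, x + dx - r
                    if 0 <= sy < H and 0 <= sx < W:
                        for i in range(c_in):
                            for o in range(c_out):
                                out[y][x][o] += image[sy][sx][i] * kernel[dy][dx][i][o]
    return out


def attention_conv(image, kernel, H, W, K):
    """Same computation written as a sum of K*K hard-attention heads."""
    r = K // 2
    n = H * W
    tokens = [image[y][x] for y in range(H) for x in range(W)]  # one token per pixel
    c_in = len(kernel[0][0])
    c_out = len(kernel[0][0][0])
    out = [[0.0] * c_out for _ in range(n)]
    for dy in range(K):
        for dx in range(K):
            # One-hot attention: query (y, x) attends only to key (y+dy-r, x+dx-r).
            attn = [[0.0] * n for _ in range(n)]
            for y in range(H):
                for x in range(W):
                    sy, sx = y + dy - r, x + dx - r
                    if 0 <= sy < H and 0 <= sx < W:
                        attn[y * W + x][sy * W + sx] = 1.0
            # Value projection for this head: the kernel slice at offset (dy, dx).
            w = kernel[dy][dx]
            values = [
                [sum(t[i] * w[i][o] for i in range(c_in)) for o in range(c_out)]
                for t in tokens
            ]
            # Head output = attn @ values, accumulated over all heads.
            for q in range(n):
                for k in range(n):
                    if attn[q][k] != 0.0:
                        for o in range(c_out):
                            out[q][o] += attn[q][k] * values[k][o]
    return [[out[y * W + x] for x in range(W)] for y in range(H)]


def max_abs_diff(a, b):
    """Largest absolute elementwise difference between two H x W x C outputs."""
    return max(
        abs(u - v)
        for row_a, row_b in zip(a, b)
        for px_a, px_b in zip(row_a, row_b)
        for u, v in zip(px_a, px_b)
    )


if __name__ == "__main__":
    random.seed(0)
    H, W, K, C_IN, C_OUT = 5, 4, 3, 2, 3
    img = [[[random.uniform(-1, 1) for _ in range(C_IN)] for _ in range(W)] for _ in range(H)]
    ker = [[[[random.uniform(-1, 1) for _ in range(C_OUT)] for _ in range(C_IN)]
            for _ in range(K)] for _ in range(K)]
    diff = max_abs_diff(direct_conv(img, ker, H, W, K), attention_conv(img, ker, H, W, K))
    print("max abs difference:", diff)
```

The two functions should agree up to floating-point rounding, since both add up the same products in a different order. The sketch only shows that fixed-offset attention plus a per-head linear map is enough to reproduce a convolution on pixel tokens; how the authors achieve this within a single ViT layer on patch inputs is laid out in the paper itself.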