Can ViT Layers Express Convolutions? Peking U, UCLA & Microsoft Researchers Say ‘Yes’ | Synced

Source: Synced | AI Technology & Industry Review

In the new paper Can Vision Transformers Perform Convolution?, a research team from Peking University, UCLA and Microsoft Research proves constructively that a single ViT layer with image patches as the input can perform any convolution operation, and shows that ViT performance in low-data regimes can be significantly improved using the team's proposed ViT training pipeline.
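To give a feel for why an attention layer can express a convolution at all, here is a minimal sketch in plain Python. It is not the paper's construction: it assumes one token per pixel rather than per patch, a single input and output channel, and hard one-hot attention in place of softmax. The function names `conv2d_same` and `attention_conv` are hypothetical and chosen only for this illustration. Each of the K×K heads attends to exactly one relative offset, and the output projection weights each head by the matching kernel entry.

```python
def conv2d_same(image, kernel):
    """Reference: zero-padded 'same' 2D cross-correlation, one channel in and out."""
    H, W = len(image), len(image[0])
    K = len(kernel)
    r = K // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < H and 0 <= x < W:
                        s += kernel[di + r][dj + r] * image[y][x]
            out[i][j] = s
    return out


def attention_conv(image, kernel):
    """Same result, computed as K*K attention heads over pixel tokens.

    Assumption for illustration: attention is hard (one-hot) instead of softmax,
    and a query whose neighbour falls outside the image attends to nothing,
    which mimics zero padding.
    """
    H, W = len(image), len(image[0])
    K = len(kernel)
    r = K // 2
    N = H * W
    tokens = [image[i][j] for i in range(H) for j in range(W)]  # one token per pixel
    out = [0.0] * N
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            # Attention matrix of the head responsible for offset (di, dj).
            attn = [[0.0] * N for _ in range(N)]
            for q in range(N):
                i, j = divmod(q, W)
                y, x = i + di, j + dj
                if 0 <= y < H and 0 <= x < W:
                    attn[q][y * W + x] = 1.0
            # Head output: attention-weighted sum of the (identity) values.
            head = [sum(a * t for a, t in zip(row, tokens)) for row in attn]
            # Output projection: scale this head by its kernel entry and accumulate.
            w = kernel[di + r][dj + r]
            for q in range(N):
                out[q] += w * head[q]
    return [out[i * W:(i + 1) * W] for i in range(H)]
```

Calling both functions on the same image and 3×3 kernel should give matching outputs up to floating-point rounding. The sketch only shows the basic idea that position-selecting heads plus a linear projection reproduce a sliding kernel; how the result is established for patch inputs is covered in the paper itself.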