Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR

 

Fourier Vision Transformer with Efficient Token Mixing for Image Classification

 

Barış Bilgin Şenol
Master's Student
(Supervisor: Asst. Prof. Özgür Salih Öğüz)
Computer Engineering Department
Bilkent University

Abstract: The success of the Transformer model is traditionally credited to its self-attention mechanism for mixing tokens. However, recent research indicates that alternative mixing techniques can also be effective in various applications, underscoring the importance of the overall architecture rather than the token mixer alone. In response, we present the Fourier Vision Transformer (FoViT), which integrates the Discrete Fourier Transform (DFT) to improve the computational efficiency of the Transformer model. This approach uses the frequency representations produced by the DFT for efficient token mixing and for modeling long-range feature interactions. Extensive testing on image classification tasks shows that FoViT surpasses the original Vision Transformer in throughput and memory efficiency while delivering comparable accuracy.
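
For readers unfamiliar with Fourier-based token mixing, the sketch below illustrates the general idea under stated assumptions. The abstract does not specify how FoViT applies the transform, so this is not the speaker's method; it shows one simple parameter-free variant (a 2D DFT over the token and hidden dimensions, keeping the real part). The names `dft` and `fourier_mix` are hypothetical, and the naive O(n^2) transform is used only to keep the example dependency-free; a practical implementation would presumably use a fast Fourier transform.

```python
import cmath


def dft(seq):
    """Naive discrete Fourier transform of a list of numbers (O(n^2))."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_mix(tokens):
    """Mix a (num_tokens x hidden_dim) grid with a 2D DFT, keeping the real part.

    Assumption: this parameter-free scheme stands in for whatever mixing
    FoViT actually uses; it replaces self-attention with a fixed transform.
    """
    # DFT along the hidden dimension of each token.
    rows = [dft(tok) for tok in tokens]
    num_tokens, hidden = len(rows), len(rows[0])
    # DFT along the token dimension for each hidden channel.
    cols = [dft([rows[i][j] for i in range(num_tokens)]) for j in range(hidden)]
    # Transpose back to (num_tokens x hidden_dim) and drop the imaginary part.
    return [[cols[j][i].real for j in range(hidden)] for i in range(num_tokens)]


# Four toy "patch tokens" with hidden size 2 (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
mixed = fourier_mix(tokens)
```

Each output entry depends on every input entry, which is how a single transform can model long-range interactions without pairwise attention scores. For example, the first entry of `mixed` is the zero-frequency component, that is, the sum of all input values.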

 

DATE: April 01, Monday @ 13:30
PLACE: EA 502