Title: FFT convolution development
Author: klae#3618
Date posted: 2022/12/09
Summary
Existing FFT-based convolution implementations consume too much memory. I am trying to improve both memory usage and speed by implementing the convolution in C++ and CUDA.
Background
G2NET large kernel inference | Kaggle
Building on this notebook, I want to improve the efficiency of FFT convolution for large-scale convolution. However, since I don't have the equipment for large-scale testing, I would like to borrow compute resources.
Scope of Work
Demonstrate that large-kernel convolution via FFT is faster than BLAS-based convolution, and improve on naive FFT convolution in both memory usage and run time.
Timeline
2022/12/13~2022/12/20: repository code testing
Specification
NVIDIA GPU with a CUDA-capable build environment
10+ GB system memory
Participants
Individual