Accelerating 3D convolution using streaming architectures on FPGAs

SEG Technical Program Expanded Abstracts 2009

Authors: Haohuan Fu (Stanford University), Robert G. Clapp (Stanford University), Oskar Mencer (Imperial College London), and Oliver Pell (Maxeler)

https://doi.org/10.1190/1.3255484

Abstract:
We investigate FPGA architectures for accelerating applications whose dominant cost is 3D convolution, such as modeling and Reverse Time Migration (RTM). We explore different design options, such as using different stencils, fitting multiple stencil operators into the FPGA, processing multiple time steps in one pass, and customizing the computation precisions. The exploration reveals constraints and tradeoffs between different design parameters and metrics. The experiment results show that the FPGA streaming architecture provides great potential for accelerating 3D convolution, and can achieve up to two orders of magnitude speedup.

Cited by:
High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection
Smart-Cache: Optimising Memory Accesses for Arbitrary Boundaries and Stencils on FPGAs
Tuning Stencil codes in OpenCL for FPGAs
FPGA-accelerated Richardson-Lucy deconvolution for 3D image data
Seismic Processing and the Computer Revolution(s). Robert G. Clapp. 19 August 2015
Selected Case Studies
Scaling Reverse Time Migration Performance through Reconfigurable Dataflow Engines. IEEE Micro, Vol. 34, No. 1
Finite-Difference Wave Propagation Modeling on Special-Purpose Dataflow Machines. IEEE Transactions on Parallel and Distributed Systems, Vol. 24, No. 5
Maximum Performance Computing with Dataflow Engines. 28 February 2013
Exploiting run-time reconfiguration in stencil computation
Maximum Performance Computing with Dataflow Engines. Computing in Science & Engineering, Vol. 14, No. 4
Revisiting finite difference and spectral migration methods on diverse parallel architectures. Computers & Geosciences, Vol. 43
Revisiting Convolution and FFT on Parallel Computation Platforms. Haohuan Fu, Robert G. Clapp, and Olav Lindtjorn. 21 October 2010

Publication data:
SEG Technical Program Expanded Abstracts 2009
ISSN (print): 1052-3812
ISSN (online): 1949-4645
Copyright: 2009
Pages: 4338
Copyright © 2009 Society of Exploration Geophysicists
Publisher: Society of Exploration Geophysicists
Published Online: 14 Oct 2009

Citation information:
Haohuan Fu, Robert G. Clapp, Oskar Mencer, and Oliver Pell, (2009), "Accelerating 3D convolution using streaming architectures on FPGAs," SEG Technical Program Expanded Abstracts: 3035-3039. https://doi.org/10.1190/1.3255484
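
Illustrative sketch (not from the paper): the abstract above refers to 3D convolution with stencil operators as the dominant cost in modeling and RTM. A minimal pure-Python sketch of one such operation is given here, assuming a star-shaped stencil with symmetric coefficients along each axis. The function names, the grid layout grid[z][y][x], the zero-valued boundary handling, and the 7-point coefficients in the usage lines are all assumptions made for illustration; the abstract does not specify the stencils or the FPGA implementation.

```python
# Sketch of a star-shaped 3D stencil (convolution) in pure Python.
# All names and coefficient choices are hypothetical, for illustration only.

def star_stencil_3d(grid, center, axis_coeffs):
    """Apply a star-shaped stencil to the interior of a 3D grid.

    grid        : nested lists indexed as grid[z][y][x]
    center      : coefficient applied to the centre point
    axis_coeffs : coefficients for offsets 1..r, shared by all three axes
    Points closer than r to an edge are left at 0.0 (assumed boundary rule).
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    r = len(axis_coeffs)  # stencil radius
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(r, nz - r):
        for y in range(r, ny - r):
            for x in range(r, nx - r):
                acc = center * grid[z][y][x]
                for k, c in enumerate(axis_coeffs, start=1):
                    # Sum the six neighbours at distance k along z, y and x.
                    acc += c * (grid[z - k][y][x] + grid[z + k][y][x]
                                + grid[z][y - k][x] + grid[z][y + k][x]
                                + grid[z][y][x - k] + grid[z][y][x + k])
                out[z][y][x] = acc
    return out


def make_grid(n, fn):
    """Build an n*n*n grid whose value at (z, y, x) is fn(z, y, x)."""
    return [[[fn(z, y, x) for x in range(n)] for y in range(n)]
            for z in range(n)]


# Usage: a radius-1 (7-point) stencil acting as a discrete Laplacian
# on f = x^2 + y^2 + z^2 with unit grid spacing.
field = make_grid(5, lambda z, y, x: float(x * x + y * y + z * z))
laplacian = star_stencil_3d(field, center=-6.0, axis_coeffs=[1.0])
print(laplacian[2][2][2])  # interior value of the discrete Laplacian
```

A larger radius is obtained by passing more entries in axis_coeffs, which is one way to read "using different stencils". Each output point depends only on a fixed neighbourhood of inputs, which is the kind of regular access pattern a streaming architecture can exploit.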

Revisiting Convolution and FFT on Parallel Computation Platforms

Large Scale Numerical Simulation Via Parallelization and Reconfigurable Computing Hardware

Revisiting Finite Difference and Spectral Migration Methods on Diverse Parallel Architectures

Fast and High-Resolution Acoustic Beamforming: A Convolution Accelerated Deconvolution Implementation

On the Use of Small 2D Convolutions on GPUs

Accelerating 3D convolution using streaming architectures on FPGAs

Fast Fourier transforms for the evaluation of convolution products: CPU versus GPU implementation

Parallel Algorithms for Successive Convolution

Scaling and analyzing the stencil performance on multi-core and many-core architectures

A GPU Based Memory Optimized Parallel Method For FFT Implementation

HI-FFT: Heterogeneous Parallel In-Place Algorithm for Large-Scale 2D-FFT

GPU Support for Automatic Generation of Finite-Differences Stencil Kernels

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Accelerating Fast Fourier Transforms Using Hadoop and CUDA

Optimizing Complex Spatially-Variant Coefficient Stencils for Seismic Modeling on GPU

Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil

NUMA-aware FFT-based Convolution on ARMv8 Many-core CPUs

Performance Optimization and Parallelization of a Parabolic Equation Solver in Computational Ocean Acoustics on Modern Many-core Computer

Parallel Implementations of the Split-Step Fourier Method for Solving Nonlinear Schrödinger Systems

Towards On-Chip Optical FFTs for Convolutional Neural Networks