Abstract: Similarity search has been studied in many fields of computer science, including data mining, information retrieval, and databases. Document similarity self-join is a crucial component of many applications, such as near-duplicate document detection, document clustering, and web search. Given a collection of documents, a document similarity self-join finds all pairs of documents whose similarity is no lower than a threshold value. However, similarity search is computation-intensive and consumes a large amount of time as the dataset size increases. Many serial algorithms therefore speed up the process by reducing the number of similarity candidates for each query object on high-dimensional sparse datasets, including documents. However, the efficiency of these serial algorithms degrades badly as the threshold decreases. Parallel implementations based on OpenMP or MapReduce also adopt this pruning policy and do not solve the problem thoroughly. In this context, taking into account the features of document datasets, we propose 2Step-SSJ, which solves the document similarity self-join in a CUDA environment on GPUs. 2Step-SSJ performs the similarity self-join in two steps, similarity computation on the inverted list and similarity computation on the forward list, which strikes a balance between memory access and dot-product computation. The experimental results show that 2Step-SSJ solves the problem much faster than existing methods on three benchmark text corpora, achieving speedups of 2x to 23x over the state-of-the-art parallel algorithm in general, while keeping a relatively stable running time across different threshold values.
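
The self-join definition above can be made concrete with a small CPU sketch. It is not 2Step-SSJ and not GPU code, and it does not reproduce how 2Step-SSJ divides work between the two steps. It assumes documents are sparse term-weight vectors normalized to unit length, so cosine similarity reduces to a dot product, and it contrasts the two data layouts the abstract names: `join_via_inverted_list` accumulates partial dot products by walking posting lists, while `join_via_forward_list` merges each pair's sorted forward lists. All function names and the toy corpus are hypothetical.

```python
import math
from collections import defaultdict


def normalize(doc):
    """Scale a sparse {term: weight} vector to unit L2 norm so dot product == cosine."""
    norm = math.sqrt(sum(w * w for w in doc.values()))
    return {t: w / norm for t, w in doc.items()} if norm > 0 else {}


def build_inverted_index(docs):
    """Hypothetical helper: map each term to its postings [(doc_id, weight), ...]."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for term, w in doc.items():
            index[term].append((doc_id, w))
    return index


def join_via_inverted_list(docs, threshold):
    """Inverted-list style: only documents sharing a term with doc i get a score."""
    index = build_inverted_index(docs)
    pairs = set()
    for i, doc in enumerate(docs):
        acc = defaultdict(float)  # partial dot products keyed by candidate doc id
        for term, w in doc.items():
            for j, wj in index[term]:
                if j > i:  # report each unordered pair once
                    acc[j] += w * wj
        pairs.update((i, j) for j, score in acc.items() if score >= threshold)
    return pairs


def dot_forward(a, b):
    """Dot product of two forward lists: (term, weight) pairs sorted by term."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            total += a[i][1] * b[j][1]
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return total


def join_via_forward_list(docs, threshold):
    """Forward-list style: evaluate every pair with a sorted-merge dot product."""
    fwd = [sorted(d.items()) for d in docs]
    return {
        (i, j)
        for i in range(len(fwd))
        for j in range(i + 1, len(fwd))
        if dot_forward(fwd[i], fwd[j]) >= threshold
    }


# Toy corpus of raw term counts, normalized before joining.
docs = [normalize(d) for d in [
    {"gpu": 2, "cuda": 1, "join": 1},
    {"gpu": 2, "cuda": 1, "join": 1, "text": 1},
    {"mesh": 3, "octree": 1},
    {"gpu": 1, "mesh": 1},
]]

print(sorted(join_via_inverted_list(docs, 0.6)))  # [(0, 1), (2, 3)]
print(sorted(join_via_forward_list(docs, 0.6)))   # [(0, 1), (2, 3)]
```

Both layouts return the same pairs. The inverted-list version only touches documents that share at least one term, so its cost depends on posting-list lengths; the forward-list version pays for every pair but computes each dot product with a sequential merge over contiguous memory. Lowering the threshold (for example to 0.5 here) adds pairs (0, 3) and (1, 3) without changing the amount of work either sketch does, which mirrors the abstract's point that exact computation, unlike pruning, keeps running time stable as the threshold varies.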

Fast Mesh Similarity Measuring Based on CUDA

A Fast SAH-Based Construction of Octree

Matching 3D Models with Global Geometric Feature Map

Fast Calculating Simplification Error of Triangular Mesh Using CUDA

Gdist: Efficient Distance Computation Between 3D Meshes on GPU

Fast Document Cosine Similarity Self-Join on GPUs

Fast Algorithm Based on LBP Texture Histogram for Background Modeling on CUDA

Connectivity-Based Segmentation for GPU-Accelerated Mesh Decompression

Fast Fairing of 3D Point Clouds Using CUDA

GTS: GPU-based Tree Index for Fast Similarity Search

3-D Surface Quality Evaluation Based on Graphics Processing Unit

Mesh Segmentation for Parallel Decompression on GPU

The CUDA LATCH Binary Descriptor: Because Sometimes Faster Means Better

Research of parallel global sea surface temperature contours extraction algorithm on CUDA platform

FastGCN: A GPU Accelerated Tool for Fast Gene Co-Expression Networks

Economic Upper Bound Estimation in Hausdorff Distance Computation for Triangle Meshes

PSCC: Parallel Self-Collision Culling with Spatial Hashing on GPUs

Fast and Accurate Collision Detection Using Programmable Graphics Hardware

CUDA based shadow volume algorithm for subdivision surfaces

Fast Triangle Mesh Surface Intersection Algorithm Based on Uniform Grid

Parallel Spatial Hashing for Collision Detection of Deformable Surfaces