Abstract: With the rapid development of video processing technologies, the volume of video data has grown quickly, and video has become common in daily life for both professional and consumer applications such as surveillance, education and entertainment. Because of the increasing processing workload, more and more queries with expensive video predicates are executed in parallel environments for better performance. A data management system must therefore not only store and access video content, but also optimize queries that contain expensive video predicates effectively and efficiently in a cloud environment. In the previous research literature, parallel and distributed policies and query optimization in relational database management systems are often based on the disk input/output (I/O) cost of the operations involved and on network transmission cost. However, for a query that contains expensive video predicates in a cloud environment, this traditional cost estimation model does not work well. Although researchers have proposed approaches that solve the problem in certain situations, some issues remain unresolved, and these approaches need further optimization. This paper is motivated by a real-world scenario: managing the business data and video surveillance data of a large supermarket in a parallel environment. Considering the characteristics of video data and the expense of processing them, we present two methods, named operating results buffer and operating results buffer-C, for implementing expensive video predicates at a simple node, mapping video data, and executing expensive video predicates in a cloud environment; these methods reduce the cost of video data transmission and the number of invocations of expensive video predicates. We propose a novel query optimization approach that reconstructs the join order-based estimation for attribute cardinality and computes the total cost from I/O, network and expensive processing costs. This approach reduces the number of invocations of expensive video predicates to a greater degree and gives a better solution for optimizing mixed queries, which contain both traditional data types and large object operations, in a cloud environment. Our query performance improves by 30% to 80% compared with existing query optimization methods for expensive predicates. Copyright (c) 2011 John Wiley & Sons, Ltd.

Efficient Batch Grouping In Relational Datasets

Efficient sorting, duplicate removal, grouping, and aggregation

ISRA-Based Grouping: A Disk Reorganization Approach for Disk Energy Conservation and Disk Performance Enhancement

A Study of Performance Optimization Method for Massive Spatio-temporal Data Based on Spatio-temporal Partition Clustering

Scheduling A Batch Processing Machine with Non-Identical Job Sizes: A Clustering Perspective

KCGS-Store: A Columnar Storage Based on Group Sorting of Key Columns

RPK-table Based Efficient Algorithm for Join-Aggregate Query on MapReduce.

Communication-Efficient Task Scheduling for Real-Time Distributed Computing.

Grouping Time Series for Efficient Columnar Storage.

PI-Join: Efficiently Processing Join Queries on Massive Data

Hybrid genetic algorithm-based optimisation of the batch order picking in a dense mobile rack warehouse

A Data Grouping Model Based on Cache Transaction for Unstructured Data Storage Systems

Cost-Based Optimization Of Logical Partitions For A Query Workload In A Hadoop Data Warehouse

Optimizing queries with expensive video predicates in cloud environment.

Wide Table Layout Optimization Based on Column Ordering and Duplication

Optimizing Data Migration Using Online Clustering.

Scheduling Parallel Batching Machines with Non-Identical Job Sizes from a Clustering Perspective

A grouping genetic algorithm for the Order Batching Problem in distribution warehouses

I/O efficient: computing SCCs in massive graphs

A Parallel Hierarchical Aggregation Algorithm in High Dimensional Data Warehouse

Earliness-tardiness Minimization on Scheduling a Batch Processing Machine with Non-Identical Job Sizes