Abstract: The graph processing model is being adopted extensively in domains such as online gaming, social media, scientific computing, and the Internet of Things (IoT). Since general-purpose data processing tools such as MapReduce have been shown to be inefficient for iterative graph processing, many frameworks have been developed in recent years to facilitate analytics and computation on large-scale graphs. However, whether such frameworks use a distributed or a single-machine architecture, dynamic scalability remains a major concern. It becomes even more important when scalability is tied to monetary cost, as it is in public clouds. The pay-as-you-go model used by public cloud providers lets users pay only for the resources they use. Nevertheless, processing large-scale graphs in such environments has been less studied, and most frameworks are implemented for commodity clusters, where users are not charged for the resources they consume. In this paper, we develop algorithms that take advantage of resource heterogeneity in cloud environments. Using these algorithms, the system can automatically adjust the number and types of virtual machines according to the computation requirements of convergent graph applications, improving performance and reducing the monetary cost of the entire operation. In addition, a smart profiling mechanism together with a novel dynamic repartitioning approach helps to distribute graph partitions expeditiously. It is shown that this method outperforms popular frameworks such as Giraph and reduces the dollar cost by more than 50 percent compared to Giraph.

Rack-Scaling: An efficient rack-based redistribution method to accelerate the scaling of cloud disk arrays

A Novel Scalable Architecture of Cloud Storage System for Small Files Based on P2P

Design and Evaluation of a New Approach to RAID-0 Scaling

Xscale: Online X-Code RAID-6 Scaling Using Lightweight Data Reorganization

SLAS: An efficient approach to scaling round-robin striped volumes

Scale-Out vs. Scale-Up Techniques for Cloud Performance and Productivity

FastScale: Accelerate RAID Scaling by Minimizing Data Migration

MultiScaler: A Multi-Loop Auto-Scaling Approach for Cloud-Based Applications

Disaggregated RAID Storage in Modern Datacenters

Unleash Stranded Power in Data Centers with RackPacker

Moving big data to the cloud

RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers

Accelerate RDP RAID-6 Scaling by Reducing Disk I/Os and XOR Operations

Automatic Scaling of Internet Applications for Cloud Computing Services

Automating distributed tiered storage management in cluster computing

A Resource Co-Allocation Method for Load-Balance Scheduling over Big Data Platforms

Optimizing Multi-Cloud CDN Deployment and Scheduling Strategies Using Big Data Analysis

A Cost-Efficient Auto-Scaling Algorithm for Large-Scale Graph Processing in Cloud Environments with Heterogeneous Resources

A New Approach to Double I/O Performance for Ceph Distributed File System in Cloud Computing

Location-Aware Data Block Allocation Strategy for HDFS-Based Applications in the Cloud

Scalable RDMA RPC on Reliable Connection with Efficient Resource Sharing