Spark GraphX in Action
While graphs are often the most natural way to represent the connections among data, the complexity of large graphs makes them conceptually difficult and computationally expensive to explore, query, and analyze. GraphX, a graph processing API for the Apache Spark analytics engine, makes it possible to explore and interpret large-scale graph data efficiently, at near-real-time speeds. GraphX works with Spark's in-memory distributed framework to offer the speed and capacity needed for analyzing social media data, performing complex textual analysis, running machine learning algorithms, and much more.
Spark GraphX in Action starts out with an overview of Apache Spark and the GraphX graph processing API. This example-based tutorial explains how to configure GraphX and use it interactively. It offers a clear introduction to the graph elements needed to build big data graphs. Then, it explores the problems and possibilities of graph algorithm implementations. Along the way, it details practical techniques for enhancing applications and applying machine learning algorithms to graph data.
KEY FEATURES
Example-based tutorial
Quickly gets readers started with GraphX
Allows readers to go beyond the standard API
Readers should be comfortable reading Scala code. Experience with graph data and Apache Spark is helpful, but not required.
ABOUT THE TECHNOLOGY
Graphs have been slowly building in the popular consciousness for the past two decades, from "the six degrees of Kevin Bacon" in the 1990s to post-9/11 calls to "connect the dots." Since 2013, graphs have exploded into the popular realm with Facebook’s "graph search," and the three-vertex graph icon is now the universal symbol for "sharing" on social media. GraphX is the graph processing module of Apache Spark, the distributed in-memory computing framework.