Apache Spark Machine Learning Blueprints.

Develop a range of cutting-edge machine learning projects with Apache Spark using this actionable guide. About This Book: Customize Apache Spark and R to fit your analytical needs in customer research, fraud detection, risk analytics, and recommendation engine development. Develop a set of practical Mac...

Bibliographic Details
Main Author: Liu, Alex
Format: Electronic eBook
Language: English
Published: Packt Publishing, 2016.
Edition: 1.
Online Access: Full text (Emmanuel users only)
Table of Contents:
  • Cover; Copyright; Credits; About the Author; About the Reviewer; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Spark for Machine Learning; Spark overview and Spark advantages; Spark overview; Spark advantages; Spark computing for machine learning; Machine learning algorithms; MLlib; Other ML libraries; Spark RDD and dataframes; Spark RDD; Spark dataframes; Dataframes API for R; ML frameworks, RM4Es and Spark computing; ML frameworks; RM4Es; The Spark computing framework; ML workflows and Spark pipelines; ML as a step-by-step workflow; ML workflow examples; Spark notebooks.
  • Notebook approach for ML; Step 1: Getting the software ready; Step 2: Installing the Knitr package; Step 3: Creating a simple report; Spark notebooks; Summary; Chapter 2: Data Preparation for Spark ML; Accessing and loading datasets; Accessing publicly available datasets; Loading datasets into Spark; Exploring and visualizing datasets; Data cleaning; Dealing with data incompleteness; Data cleaning in Spark; Data cleaning made easy; Identity matching; Identity issues; Identity matching on Spark; Entity resolution; Short string comparison; Long string comparison; Record deduplication.
  • Identity matching made better; Crowdsourced deduplication; Configuring the crowd; Using the crowd; Dataset reorganizing; Dataset reorganizing tasks; Dataset reorganizing with Spark SQL; Dataset reorganizing with R on Spark; Dataset joining; Dataset joining and its tool, the Spark SQL; Dataset joining in Spark; Dataset joining with the R data table package; Feature extraction; Feature development challenges; Feature development with Spark MLlib; Feature development with R; Repeatability and automation; Dataset preprocessing workflows; Spark pipelines for dataset preprocessing.
  • Dataset preprocessing automation; Summary; Chapter 3: A Holistic View on Spark; Spark for a holistic view; The use case; Fast and easy computing; Methods for a holistic view; Regression modeling; The SEM approach; Decision trees; Feature preparation; PCA; Grouping by category to use subject knowledge; Feature selection; Model estimation; MLlib implementation; The R notebooks' implementation; Model evaluation; Quick evaluations; RMSE; ROC curves; Results explanation; Impact assessments; Deployment; Dashboard; Rules; Summary; Chapter 4: Fraud Detection on Spark; Spark for fraud detection.
  • The use case; Distributed computing; Methods for fraud detection; Random forest; Decision trees; Feature preparation; Feature extraction from LogFile; Data merging; Model estimation; MLlib implementation; R notebooks implementation; Model evaluation; A quick evaluation; Confusion matrix and false positive ratios; Results explanation; Big influencers and their impacts; Deploying fraud detection; Rules; Scoring; Summary; Chapter 5: Risk Scoring on Spark; Spark for risk scoring; The use case; Apache Spark notebooks; Methods of risk scoring; Logistic regression; Preparing coding in R.