Learning Apache Spark 2.
Learn about the fastest-growing open source project in the world, and find out how it revolutionizes big data analytics. About This Book: Exclusive guide that covers how to get up and running with fast data processing using Apache Spark. Explore and exploit various possibilities with Apache Spark using...
Main Author: | Abbasi, Muhammad Asif |
---|---|
Format: | Electronic eBook |
Language: | English |
Published: | Packt Publishing, 2017. |
Subjects: | Spark (Electronic resource : Apache Software Foundation) |
Online Access: | Full text (Emmanuel users only) |
MARC
LEADER | 00000cam a2200000ua 4500 | ||
---|---|---|---|
001 | in00000188268 | ||
006 | m o d | ||
007 | cr |n||||||||| | ||
008 | 170331s2017 xx o 000 0 eng d | ||
005 | 20240702214223.7 | ||
016 | 7 | |a 018316647 |2 Uk | |
019 | |a 981232962 |a 981692458 |a 981847576 | ||
020 | |a 1785889583 |q (ebk) | ||
020 | |a 9781785889585 | ||
020 | |a 9781785885136 | ||
020 | |a 1785885138 | ||
020 | |z 1785885138 | ||
035 | |a (OCoLC)980837825 |z (OCoLC)981232962 |z (OCoLC)981692458 |z (OCoLC)981847576 | ||
037 | |a 1003762 |b MIL | ||
040 | |a IDEBK |b eng |e pn |c IDEBK |d EBLCP |d YDX |d MERUC |d CHVBK |d OCLCO |d COO |d VT2 |d OCLCF |d OCLCQ |d UKMGB |d OCLCQ |d LVT |d UKAHL |d CNCEN |d NLW |d OCLCQ |d OCLCO |d OCLCL | ||
050 | 4 | |a T55.4-60.8 | |
082 | 0 | 4 | |a 006.3 |2 23 |
100 | 1 | |a Abbasi, Muhammad Asif. | |
245 | 1 | 0 | |a Learning Apache Spark 2. |
260 | |b Packt Publishing, |c 2017. | ||
300 | |a 1 online resource | ||
336 | |a text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
505 | 0 | |a Cover; Copyright; Credits; About the Author; About the Reviewers; www.packtpub.com; Customer Feedback; Table of Contents; Preface; Chapter 1: Architecture and Installation; Apache Spark architecture overview; Spark-core; Spark SQL; Spark streaming; MLlib; GraphX; Spark deployment; Installing Apache Spark; Writing your first Spark program; Scala shell examples; Python shell examples; Spark architecture; High level overview; Driver program; Cluster Manager; Worker; Executors; Tasks; SparkContext; Spark Session; Apache Spark cluster manager types. | |
505 | 8 | |a Building standalone applications with Apache Spark; Submitting applications; Deployment strategies; Running Spark examples; Building your own programs; Brain teasers; References; Summary; Chapter 2: Transformations and Actions with Spark RDDs; What is an RDD?; Constructing RDDs; Parallelizing existing collections; Referencing external data source; Operations on RDD; Transformations; Actions; Passing functions to Spark (Scala); Anonymous functions; Static singleton functions; Passing functions to Spark (Java); Passing functions to Spark (Python); Transformations; Map(func); Filter(func). | |
505 | 8 | |a FlatMap(func); Sample (withReplacement, fraction, seed); Set operations in Spark; Distinct(); Intersection(); Union(); Subtract(); Cartesian(); Actions; Reduce(func); Collect(); Count(); Take(n); First(); SaveAsXXFile(); foreach(func); PairRDDs; Creating PairRDDs; PairRDD transformations; reduceByKey(func); GroupByKey(func); reduceByKey vs. groupByKey -- Performance Implications; CombineByKey(func); Transformations on two PairRDDs; Actions available on PairRDDs; Shared variables; Broadcast variables; Accumulators; References; Summary; Chapter 3: ETL with Spark; What is ETL?; Extraction; Loading. | |
505 | 8 | |a Transformation; How is Spark being used?; Commonly Supported File Formats; Text Files; CSV and TSV Files; Writing CSV files; Tab Separated Files; JSON files; Sequence files; Object files; Commonly supported file systems; Working with HDFS; Working with Amazon S3; Structured Data sources and Databases; Working with NoSQL Databases; Working with Cassandra; Obtaining a Cassandra table as an RDD; Saving data to Cassandra; Working with HBase; Bulk Delete example; Map Partition Example; Working with MongoDB; Connection to MongoDB; Writing to MongoDB; Loading data from MongoDB. | |
505 | 8 | |a Working with Apache Solr; Importing the JAR File via Spark-shell; Connecting to Solr via DataFrame API; Connecting to Solr via RDD; References; Summary; Chapter 4: Spark SQL; What is Spark SQL?; What is DataFrame API?; What is DataSet API?; What's new in Spark 2.0?; Under the hood -- catalyst optimizer; Solution 1; Solution 2; The SparkSession; Creating a SparkSession; Creating a DataFrame; Manipulating a DataFrame; Scala DataFrame manipulation -- examples; Python DataFrame manipulation -- examples; R DataFrame manipulation -- examples; Java DataFrame manipulation -- examples. | |
520 | |a Learn about the fastest-growing open source project in the world, and find out how it revolutionizes big data analytics. About This Book: Exclusive guide that covers how to get up and running with fast data processing using Apache Spark. Explore and exploit various possibilities with Apache Spark using real-world use cases in this book. Want to perform efficient data processing in real time? This book will be your one-stop solution. Who This Book Is For: This guide appeals to big data engineers, analysts, architects, software engineers, and even technical managers who need to perform efficient data processing on Hadoop in real time. Basic familiarity with Java or Scala will be helpful. The assumption is that readers will come from mixed backgrounds, but will typically be people with a background in engineering or data science who have no prior Spark experience and want to understand how Spark can help them on their analytics journey. What You Will Learn: Get an overview of big data analytics and its importance for organizations and data professionals. Delve into Spark to see how it differs from existing processing platforms. Understand the intricacies of various file formats, and how to process them with Apache Spark. Learn how to deploy Spark with YARN, Mesos, or a standalone cluster manager. Learn the concepts of Spark SQL, SchemaRDD, caching, and working with Hive and Parquet file formats. Understand the architecture of Spark MLlib while discussing some of the off-the-shelf algorithms that come with Spark. Get introduced to the deployment and usage of SparkR. Walk through the importance of graph computation and the graph processing systems available in the market. Check out a real-world example of Spark by building a recommendation engine with Spark using ALS. Use a telco data set to predict customer churn using random forests. In Detail: The Spark juggernaut keeps rolling and gains more momentum each day.
Spark provides key capabilities in the form of Spark SQL, Spark Streaming, Spark ML, and GraphX, all accessible via Java, Scala, Python, and R. Deploying these capabilities is crucial, whether on a standalone framework or as part of an existing Hadoop installation configured with YARN and Mesos. The next part of the journey after installation is using the key components, APIs, clustering, machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component is key, how widely it is being u ... | ||
588 | 0 | |a Print version record. | |
630 | 0 | 0 | |a Spark (Electronic resource : Apache Software Foundation) |
758 | |i has work: |a Learning Apache Spark 2 (Text) |1 https://id.oclc.org/worldcat/entity/E39PCXQFfPJfdmKyT9cBhmHMxC |4 https://id.oclc.org/worldcat/ontology/hasWork | ||
852 | |b Online |h ProQuest | ||
856 | 4 | 0 | |u https://ebookcentral.proquest.com/lib/emmanuel/detail.action?docID=4833064 |z Full text (Emmanuel users only) |t 0 |
938 | |a Askews and Holts Library Services |b ASKH |n AH31954839 | ||
938 | |a EBL - Ebook Library |b EBLB |n EBL4833064 | ||
938 | |a ProQuest MyiLibrary Digital eBook Collection |b IDEB |n cis36983548 | ||
938 | |a YBP Library Services |b YANK |n 13951676 | ||
947 | |a FLO |x pq-ebc-base | ||
999 | f | f | |s d0b48481-3c5a-447b-9e9d-dab0e45cc7b0 |i c8e0d8f0-a123-4124-9228-6cb1af477aed |t 0 |
952 | f | f | |a Emmanuel College |b Main Campus |c Emmanuel College Library |d Online |t 0 |e ProQuest |h Other scheme |
856 | 4 | 0 | |t 0 |u https://ebookcentral.proquest.com/lib/emmanuel/detail.action?docID=4833064 |y Full text (Emmanuel users only) |