
Ruiz Ed., Kuo K., Luraschi J. Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling

  • File format: PDF
  • Size: 9.68 MB
O’Reilly Media, Inc., 2020. — 250 p. — ISBN: 9781492046363.
If you’re like most R users, you have deep knowledge and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.
Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.
  • Analyze, explore, transform, and visualize data in Apache Spark with R
  • Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
  • Perform analysis and modeling across many machines using distributed computing techniques
  • Use large-scale data from multiple sources and different formats with ease from within Spark
  • Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
  • Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions
Authors
Formatting
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Information
Hadoop
Spark
R
sparklyr
Recap
Getting Started
Prerequisites
Installing sparklyr
Installing Spark
Connecting to Spark
Using Spark
Web Interface
Analysis
Modeling
Data
Extensions
Distributed R
Streaming
Logs
Disconnecting
Using RStudio
Resources
Recap
Analysis
R as an Interface to Spark
Exercise
Import / Access
Wrangle
Correlations
Visualize
Recommended approach
Simple Plots
Histograms
Scatter vs Raster Plots
Model
Cache model data
Communicate
Reports
Presentation decks
Recap
Modeling
Overview
The Data
Exploratory Data Analysis
Feature Engineering
Model Building
Logistic Regression as a Generalized Linear Regression
More Machine Learning Algorithms
Working with Textual Data
Data Prep
Topic Modeling