Xin Cheng Shen



03 - Spark

May 21, 2024

Hadoop MapReduce works, but writing jobs directly against it involves a lot of boilerplate and repetition. Higher-level abstractions built on top of it hide much of that:

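Hive: SQL-like queries compiled into MapReduce jobs
Pig: dataflow scripts (Pig Latin) compiled into MapReduce jobs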
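To make the boilerplate point concrete, here is a minimal plain-Python sketch of word count written the MapReduce way, with the map, shuffle, and reduce phases spelled out separately, next to the same computation as one declarative expression. This is not Hadoop's Java API, and all function names are hypothetical. `collections.Counter` only stands in for the kind of one-liner a higher-level tool allows; it is not how Hive or Pig work internally.

```python
from collections import Counter, defaultdict
from itertools import chain
from typing import Iterable, Iterator


# MapReduce style (hypothetical helpers): every job restates the same three phases.
def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count_mapreduce(lines: list[str]) -> dict[str, int]:
    # Driver: wire the phases together by hand.
    intermediate = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reducer(key, values) for key, values in grouped.items())


# Higher-level style: the same computation as one expression.
def word_count_high_level(lines: list[str]) -> dict[str, int]:
    return dict(Counter(word for line in lines for word in line.split()))


if __name__ == "__main__":
    docs = ["the quick fox", "the lazy dog", "the fox"]
    print(word_count_mapreduce(docs))   # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
    print(word_count_high_level(docs))  # same result
```

In a real Hadoop job each phase also needs its own class, type declarations, and job configuration, which is where most of the repetition comes from.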

Spark

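More efficient than Hadoop MapReduce (intermediate results can stay in memory instead of being written to disk between jobs) and easier to use (a richer set of operations than just map and reduce)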

RDDs (Resilient Distributed Datasets)

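Collections of objects partitioned across the machines of a cluster
Built through parallel transformations (e.g. map, filter)
Automatically rebuilt on failure, by recomputing lost partitions from the transformations that produced them (their lineage)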
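The three RDD properties can be illustrated with a small single-process sketch. `ToyRDD` and its methods are hypothetical names chosen for illustration, not Spark's API: each `ToyRDD` is split into partitions, `map` and `filter` lazily record how to derive each partition from the parent (the lineage), and a dropped partition is rebuilt by re-running that recipe. Real Spark spreads partitions across executors and only keeps them in memory when asked (e.g. via `cache()`); this toy caches every partition it computes to keep the example short.

```python
from __future__ import annotations

from typing import Callable, Optional


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API).

    Each instance records its parent and a per-partition function that
    derives it from that parent (its lineage).
    """

    def __init__(self, num_partitions: int, compute: Callable[[int], list],
                 parent: Optional[ToyRDD] = None) -> None:
        self.num_partitions = num_partitions
        self._compute = compute          # recipe for (re)building partition i
        self.parent = parent             # lineage pointer
        self._cache: dict[int, list] = {}

    @staticmethod
    def parallelize(data: list, num_partitions: int) -> ToyRDD:
        # Split the source data into contiguous slices, one per partition.
        size = -(-len(data) // num_partitions)  # ceiling division
        return ToyRDD(num_partitions, lambda i: data[i * size:(i + 1) * size])

    def map(self, f: Callable) -> ToyRDD:
        # Lazy: only records how to derive each partition from this RDD's partition.
        return ToyRDD(self.num_partitions,
                      lambda i: [f(x) for x in self.partition(i)], parent=self)

    def filter(self, pred: Callable) -> ToyRDD:
        return ToyRDD(self.num_partitions,
                      lambda i: [x for x in self.partition(i) if pred(x)], parent=self)

    def partition(self, i: int) -> list:
        # Materialize partition i on demand; unlike Spark, always keep it in memory.
        if i not in self._cache:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a worker failure that drops one in-memory partition.
        self._cache.pop(i, None)

    def collect(self) -> list:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


if __name__ == "__main__":
    nums = ToyRDD.parallelize(list(range(10)), num_partitions=3)
    evens = nums.filter(lambda x: x % 2 == 0)
    squares = evens.map(lambda x: x * x)
    print(squares.collect())  # [0, 4, 16, 36, 64]

    # Drop partition 1 in two RDDs; it is recomputed from the lineage on demand.
    evens.lose_partition(1)
    squares.lose_partition(1)
    print(squares.collect())  # [0, 4, 16, 36, 64]
```

Because each `ToyRDD` only stores a recipe and a parent pointer, recovery needs no replicated copy of the data: the lost slice is recomputed from whichever ancestor still has it, down to the original source.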


