Hadoop MapReduce is great, but writing jobs takes a ton of boilerplate and repetition. Higher-level abstractions on top of it help:
- Hive
- Pig
Spark
Generally more efficient (intermediate results can stay in memory instead of being written to disk between steps) and more usable than Hadoop MapReduce
RDDs (Resilient Distributed Datasets)
- collections of objects spread across a cluster
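- built through parallel transformations (e.g. map, filter)
- automatically rebuilt on failure, by replaying the recorded transformations (the lineage) for the lost partitions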
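
To make the three RDD properties above concrete, here is a minimal toy sketch in plain Python. It is not Spark's API or implementation: `ToyRDD`, `compute`, and `lose_partition` are hypothetical names invented for illustration, and partitions are processed one after another in a single process rather than in parallel across a cluster. It only shows the idea that a dataset is a set of partitions plus the chain of transformations needed to rebuild any one of them.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API):
    partitioned data plus the lineage needed to rebuild each partition."""

    def __init__(self, source_partitions=None, parent=None, fn=None):
        self.source_partitions = source_partitions  # only set on the root dataset
        self.parent = parent                        # upstream ToyRDD (the lineage)
        self.fn = fn                                # per-partition transformation
        self.cache = {}                             # partition index -> computed list

    @property
    def num_partitions(self):
        if self.parent is None:
            return len(self.source_partitions)
        return self.parent.num_partitions

    def map(self, f):
        # Lazy: records the transformation, computes nothing yet.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        # Lazy: records the transformation, computes nothing yet.
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def compute(self, i):
        # Build partition i on demand, walking back through the lineage.
        if i not in self.cache:
            if self.parent is None:
                self.cache[i] = list(self.source_partitions[i])
            else:
                self.cache[i] = self.fn(self.parent.compute(i))
        return self.cache[i]

    def collect(self):
        # Gather every partition into one list, in partition order.
        return [x for i in range(self.num_partitions) for x in self.compute(i)]

    def lose_partition(self, i):
        # Simulate a failure: the computed copy of partition i disappears.
        self.cache.pop(i, None)


# Three partitions stand in for data spread across three machines.
numbers = ToyRDD(source_partitions=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

first = even_squares.collect()
even_squares.lose_partition(1)    # pretend the machine holding partition 1 failed
second = even_squares.collect()   # partition 1 is rebuilt from its lineage

print(first)            # [4, 16, 36, 64]
print(second == first)  # True
```

Only the lost partition is recomputed; the other partitions are served from what was already computed. In real Spark the same pattern applies at cluster scale, with partitions living on different workers.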