NoSQL Comparison: Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris.
Big Data Benchmark: a benchmark of Redshift, Hive, Shark, Impala, and Stinger/Tez.
Papers
Facebook - One Trillion Edges: Graph Processing at Facebook-Scale: graph processing over a trillion edges at Facebook scale.
Stanford - Mining of Massive Datasets: mining massive datasets.
AMPLab - Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices: distributed machine learning and graph processing using sparse matrices.
AMPLab - MLbase: A Distributed Machine-learning System: a distributed machine-learning system.
AMPLab - Shark: SQL and Rich Analytics at Scale: SQL and rich analytics at large scale.
AMPLab - GraphX: A Resilient Distributed Graph System on Spark: a resilient distributed graph computation system built on Spark.
Google - HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm: engineering a state-of-the-art cardinality estimation algorithm.
Microsoft - Scalable Progressive Analytics on Big Data in the Cloud: scalable progressive analytics on big data in the cloud.
Metamarkets - Druid: A Real-time Analytical Data Store: a real-time analytical data store.
Google - Online, Asynchronous Schema Change in F1: online, asynchronous schema changes in F1.
Google - F1: A Distributed SQL Database That Scales: a distributed SQL database.
Google - MillWheel: Fault-Tolerant Stream Processing at Internet Scale: fault-tolerant stream processing at Internet scale.
Facebook - Scuba: Diving into Data at Facebook: exploring data at Facebook.
Facebook - Unicorn: A System for Searching the Social Graph: a system for searching the social graph.
Facebook - Scaling Memcache at Facebook: how Facebook scales Memcache.
Videos
Data Visualization: includes Noah Iliinsky on designing data visualizations, Hans Rosling's 200 Countries, 200 Years, 4 Minutes, and others.