MIT 6.824 Distributed Systems L1: Introduction
程序员文章站
2022-03-15 16:59:01
Why distributed systems?
- parallelism
- fault tolerance
- physical (the machines are inherently in different places)
- security/isolation
Challenges:
- concurrency
- partial failure
- performance
Course structure
- lectures
- papers
- exams
- labs
- project (optional)
- Lab 1 - MapReduce
- Lab 2 - Raft for fault tolerance
- Lab 3 - K/V server
- Lab 4 - sharded K/V service
Infrastructure
- storage
- communication
- computation
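Goal: abstractions that hide the distribution, so that from the outside the system looks like a non-distributed one
Implementation: RPC, threads, concurrency control (locks, etc.)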
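To make the threads and locks items concrete, here is a minimal Python sketch. It is not from the lecture: `Counter` and `run_workers` are made-up names, and the thread counts are arbitrary. Several threads update one shared value, and a lock makes each read-modify-write step exclusive.

```python
import threading

class Counter:
    """Shared state guarded by a lock (hypothetical example class)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1

def run_workers(num_threads=4, increments_per_thread=1000):
    """Start the threads, wait for all of them, return the final count."""
    counter = Counter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == "__main__":
    print(run_workers())  # 4000: no update is lost while the lock is held
```

Without the lock, two threads could read the same old value and one of the increments could be lost.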
Performance
- Scalability: the goal is that 2x computers give 2x throughput
Fault Tolerance
- Availability
- Recoverability
- NV (non-volatile) storage
- Replication
Topic: consistency
Put(k,v)
Get(k) -> v
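Because a distributed system keeps multiple copies of the table, the copies may become inconsistent with one another.
Strong consistency: a Get is guaranteed to return the most recently written value.
Weak consistency: a Get is not guaranteed to return the most recently written value.
Weak consistency is a weaker requirement and is more common in practice.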
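The following Python sketch shows one way copies of the table can diverge. It is an illustration under assumptions, not the lecture's code: `Replica`, `ReplicatedKV`, and the `fail_after` parameter are hypothetical, and the "crash" is only simulated by stopping a Put partway through.

```python
class Replica:
    """One copy of the key/value table (hypothetical name)."""
    def __init__(self):
        self.table = {}

class ReplicatedKV:
    """Toy K/V store whose Put updates the replicas one after another."""
    def __init__(self, num_replicas=2):
        self.replicas = [Replica() for _ in range(num_replicas)]

    def put(self, key, value, fail_after=None):
        # `fail_after` simulates the writer crashing after it has updated
        # that many replicas (an assumption made for this demo).
        for i, replica in enumerate(self.replicas):
            if fail_after is not None and i >= fail_after:
                return
            replica.table[key] = value

    def get(self, key, replica_index=0):
        # Weak-consistency style read: ask one replica and return what it has.
        return self.replicas[replica_index].table.get(key)

def demo():
    kv = ReplicatedKV()
    kv.put("x", 1)                 # reaches both replicas
    kv.put("x", 2, fail_after=1)   # reaches replica 0 only
    return kv.get("x", replica_index=0), kv.get("x", replica_index=1)

if __name__ == "__main__":
    print(demo())  # (2, 1): replica 1 still returns the stale value
```

A strongly consistent design would have to prevent the second read from returning 1, which typically costs extra communication between replicas.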
MapReduce was designed so that more programmers can use the framework without having to understand the details of the distributed implementation.
Abstract view of a MapReduce job
input is (already) split into M files
Input1 -> Map -> a,1 b,1
Input2 -> Map ->     b,1
Input3 -> Map -> a,1     c,1
                  |   |   |
                  |   |   -> Reduce -> c,1
                  |   -----> Reduce -> b,2
                  ---------> Reduce -> a,2
MR calls Map() for each input file, producing a set of <k2,v2> "intermediate" data
each Map() call is a "task"
MR gathers all intermediate v2's for a given k2, and passes each key + values to a Reduce call
final output is the set of <k2,v3> pairs from the Reduce()s
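The gathering step can be sketched in a few lines of Python. This is a sequential, single-machine illustration, not the real framework. The name `shuffle` is an assumption, and summing the values stands in for whatever the Reduce function does. The input data is the intermediate output of the three Map tasks from the abstract view.

```python
from collections import defaultdict

def shuffle(map_outputs):
    """Group intermediate (k2, v2) pairs by key, as MR does between Map and Reduce."""
    grouped = defaultdict(list)
    for pairs in map_outputs:      # one list of pairs per Map task
        for k2, v2 in pairs:
            grouped[k2].append(v2)
    return dict(grouped)

# Intermediate data produced by the three Map tasks.
map_outputs = [
    [("a", 1), ("b", 1)],   # Input1
    [("b", 1)],             # Input2
    [("a", 1), ("c", 1)],   # Input3
]

grouped = shuffle(map_outputs)
# Each key and its list of values is handed to one Reduce call (here: a sum).
reduced = {k2: sum(v2s) for k2, v2s in grouped.items()}

print(grouped)  # {'a': [1, 1], 'b': [1, 1], 'c': [1]}
print(reduced)  # {'a': 2, 'b': 2, 'c': 1}
```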
Example: word count
input is thousands of text files
Map(k, v)        (k: file name, v: file contents)
  split v into words
  for each word w
    emit(w, "1")
Reduce(k, v)     (k: a word, v: the list of "1"s emitted for that word)
  emit(len(v))
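A runnable Python version of the word-count pseudocode might look like the sketch below. `map_fn` and `reduce_fn` follow the pseudocode. The `word_count` driver is a hypothetical sequential stand-in for the MR framework, and the file names in the usage example are made up.

```python
from collections import defaultdict

def map_fn(filename, contents):
    """Map(k, v): emit (word, "1") for every word in the file contents."""
    return [(word, "1") for word in contents.split()]

def reduce_fn(word, values):
    """Reduce(k, v): the count is the number of "1"s collected for the word."""
    return str(len(values))

def word_count(files):
    """Run Map on every file, group by word, then run Reduce once per word."""
    intermediate = defaultdict(list)
    for filename, contents in files.items():
        for word, one in map_fn(filename, contents):
            intermediate[word].append(one)
    return {word: reduce_fn(word, values)
            for word, values in sorted(intermediate.items())}

if __name__ == "__main__":
    files = {"f1.txt": "a b", "f2.txt": "b", "f3.txt": "a c"}
    print(word_count(files))  # {'a': '2', 'b': '2', 'c': '1'}
```

In the real system the Map and Reduce calls run as tasks on many machines. Only the two user-written functions would stay the same.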