[微服务技术文章之其二] 微服务原则:去中心化数据管理
日常前言
- 翻译任务终了,最近的项目也已经交付出去,现在剩下的就是一些历史遗留问题要慢慢和第三方沟通处理……开始进入真正的项目空闲期了。不过大概再有两个星期,就又要开始搞新的机型了,这次还是用的高通 SDM450 芯片,嗯…应该不会太忙。
- 近期的主要任务是要学习 Android Camera HAL3 的流程,重点关注 Framework 以及 HAL 的部分,至少要搞懂 openCamera 流程,以及 takePicture 时候的控制流与数据流走向吧。到时候应该会整理资料,出个一系列的学习笔记。
- 好吧,回到文章主题,这一期的内容,说实话吧,不是很感兴趣,而且我对微服务这方面也不熟悉,所以翻译的时候挺容易分心的。这次翻译主要就是练一练熟练度吧,也没记下太多笔记。觉得收获挺少的,希望下一期能来个我比较熟悉或者感兴趣的主题……
- 以及,之前每一期的礼物,已经堆满衣柜顶了…要想办法处理出去了hh…
- 这次只有两篇文章被采纳:
版权相关
翻译人:StoneDemo,该成员来自云+社区翻译社
原文链接:Microservice Principles: Decentralized Data Management
原文作者:Nathan Peck
Microservice Principles: Decentralized Data Management
题目:(微服务原则:去中心化数据管理)
Microservice philosophy favors decentralization in all aspects of software design. This focus on decentralization doesn’t just guide the organization of business logic. It also guides how data is persisted.
微服务的理念主张将软件设计的各方各面进行去中心化。这种对去中心化的关注不仅指导业务逻辑的组织,它还会指导人们如何对数据进行存储。
In the traditional monolithic approach to software design it is common to use a monolithic data store, such as a SQL server that contains a single database with many tables. This central database is used as an engine for all data persistence, and often portions of the application logic are offloaded into the SQL server in the form of queries that use complex joins, or even stored procedures.
在传统的整体式软件设计方法中,我们通常使用整体式的数据存储,例如包含诸多表格(Table)的单个数据库的 SQL 服务器。这种*数据库作为全体数据的持久性引擎而被使用,并且通常应用程序逻辑的一部分以使用复杂连接(甚至存储过程)的查询的形式被卸载到 SQL 服务器中。
In contrast microservice architecture favors decentralized data management, as covered by Martin Fowler’s original 2014 paper that defined microservices. This article extends the concept of decentralized data management by showing some of the modern architectural patterns for data management that lead to highly successful decentralized applications.
相比之下,微服务架构则更倾向于去中心化的数据管理,正如 Martin Fowler 在 2014 年发表的定义微服务的原始论文中所描述的那样。本文通过展示一些现代的数据管理架构模式来扩展去中心化数据管理的概念,这些模式带来了一些高度成功的去中心化应用程序。
Thinking in REST
(REST 的思想)
In order to correctly organize data in a decentralized manner it is important to first understand how to model data using Representational State Transfer, or REST for short. REST was defined in 2000 by Roy Fielding and has guided the development of many massively scalable stateless systems ever since.
为了以去中心化的方式正确地组织数据,首先要了解如何使用表述性状态转移(Representational State Transfer,简称 REST)对数据建模。REST 是由 Roy Fielding 于 2000 年定义的,从那以后它一直指导着许多大规模可扩展的无状态系统(Stateless system)的开发。
The core principle of REST is to give each resource that is part of your application a URL, and then use standard HTTP verbs to interact with the resource. For example, the API for a basic social messaging app might be organized like this:
REST 的核心原则是,为每个属于应用程序一部分的资源提供一个 URL,然后使用标准的 HTTP 动词(Standard HTTP verbs)与资源进行交互。例如,一个基础的社交消息应用程序的 API 可能是这样组织的:
This API has three primitive resource types: user, message, and friend. Each primitive type is served by a set of resource paths. In a monolith one central server would handle requests for all the resource paths, and typically this service would be backed by a single database that also stores all the resource types as tables:
这个 API 有三种基本资源类型:用户,消息和朋友。每种基本类型都由一组资源路径提供服务。在整体式架构中,一个*服务器将处理所有对资源路径的请求,通常这个服务是由单一数据库支持的,并且它也将所有资源类型存储为表格:
A microservices deployment employing decentralized data management would serve the three resource types using three services: one service for user resources, one service for message resources, and one service for friend relationships. In addition each service has its own database.
一个使用去中心化数据管理的微服务的部署,将使用三个服务来服务这三种资源类型:一个服务用于用户资源,另一个服务用于消息资源,还有一个服务用于朋友关系。另外,每个服务都有独自的数据库。
The fact that each microservice has its own database does not mean that there need to be three database servers. In the early days of a platform the three databases will probably just have a logical distinction as three databases all hosted by a single physical SQL server.
事实上,每个微服务都拥有各自的数据库,这并不意味着需要三个数据库服务器。在平台初期,这三个数据库可能仅仅具有逻辑上的区别,即三个数据库全部由单个物理 SQL 服务器托管。
However, creating this logical distinction sets the platform up for easy physical scaling in the future. If this platform gains massive adoption the database administrator can split the three logical databases into three databases served by three different physical servers.
但是,创建这种逻辑区别将为后续的物理扩展奠定基础。如果此平台得到大量采用,数据库管理员可以将三个逻辑数据库分割为由三个不同物理服务器进行服务的数据库。
Avoid SQL JOIN
(回避 SQL JOIN)
A critical aspect of good decentralized data management is to avoid SQL JOIN. The need for joins usually starts from efforts to make an API easier for clients to consume. For example, the messaging app we are using as an example might have a timeline view. The timeline needs to have the latest message from each friend of the authenticated user as well as that friend’s name and avatar beside the message.
良好的去中心化数据管理的一个关键点是:回避 SQL JOIN(SQL 语言中的 Join 连接语句)。对连接的需求通常始于尽量使客户端更容易地使用 API。例如,我们正在使用的消息传递应用程序可能有一个时间轴视图(Timeline view)。时间轴需要获得来自经过验证的用户的每位朋友的最新消息,以及消息旁边的该朋友的姓名和头像信息。
With the basic REST API that we have defined the client would need to make many API calls to populate this view. For example, a user with two friends would require that the client make the following API requests in order to populate the view:
使用我们定义的基础 REST API,客户端需要进行多次 API 调用才能填充此视图。例如,有两位朋友的用户,客户端需要发出以下 API 请求才能填充视图:
A total of five requests would be made. One request to get the list of friends of the user, followed by two requests to get the name and avatar of each friend, and two requests to get the latest message from each friend.
总共会发出五个请求。一个请求用于获取用户的朋友列表,随后两个请求获取每个朋友的姓名和头像,最后两个请求获取每个朋友发来的最新消息。
Obviously this is unacceptable from a performance standpoint because there is so much extra roundtrip latency between the client and the server before the view is ready to be displayed. A sensible solution to this problem would be to add a new route to the API:
显然从性能的角度来看,这是无法接受的,因为在准备好显示视图之前,客户端和服务器之间存在非常多的往返延迟。该问题的一个明智的解决方案是添加一条到 API 的新路由:
The client can then fetch this single timeline resource to get all the data it requires to render the timeline view.
然后,客户端就可以获取此单个时间轴的资源,以得到用于呈现时间轴视图所需的全部数据。
The technique used for implementing this new resource is a prime example of the difference between centralized data management, and decentralized data management. In a monolith the logic for serving this route would probably be coded as a SQL join, and offloaded to the database server, which would access all three tables to generate a result:
用于实现这种新资源的技术,是集中式和去中心化数据管理之间差异的一个主要例子。在一个整体式应用中,服务于这种路由的逻辑可能会被编码为 SQL 连接,并且被卸载到数据库服务器,这将访问全部三个表以产生结果:
SELECT
m.id id,
m.user `user`,
u.name name,
u.avatar avatar,
m.text `text`,
m.when `when`
FROM messages m
INNER JOIN (
SELECT `from`,
`to` FROM `friends`
WHERE `from`=1) f
ON m.user = f.to
INNER JOIN users u on m.user = u.id
ORDER BY m.when DESC;
In a decentralized data management architecture such a SQL join is not only not advised, but actually impossible if the data is properly separated using logical and/or physical boundaries.
在去中心化的数据管理架构中,我们不仅不建议这样的 SQL 连接,而且如果数据使用逻辑和(或)物理边界恰当地进行了分割,则实际上不可能进行连接。
Instead each microservice should be the only gateway to accessing its own table. No single microservice has access to all three tables. To expose the timeline resource to a client we create an additional timeline microservice that lives on top of the three underlying data microservices and treats each as resources from which it fetches. This top level microservice joins the data from the underlying microservices and exposes the joined result to the client:
相反地,每个微服务应该是访问其自己的表的唯一途径。没有一个微服务可以访问全部三个表。为了将时间轴资源展示给客户端,我们创建了一个额外的时间轴微服务,它位于三个底层数据微服务之上,并将每个微服务视为获取资源的源头。这个顶层的微服务从底层的微服务中连接数据并将连接的结果展示给客户端:
This timeline service can make requests to the backing microservices in a matter of milliseconds because the timeline service and the other microservices are hosted in the same datacenter, and perhaps in containers that are hosted on the same physical machine. To further reduce roundtrip network penalties the timeline service could also take advantage of “bulk fetch” endpoints. For example the user microservice could have an endpoint that accepts a list of user ID’s and returns all matching user objects, so that the timeline service only has to make one request to the friends service, one request to the user service, and one request to the message service.
此时间轴服务可以在几毫秒内向支持微服务器发出请求,因为时间轴服务和其他微服务托管在同一个数据中心中,并且可能托管在同一台物理机器上的容器中。为了进一步减少往返网络的开销,时间轴服务还可以利用“批量提取(Bulk fetch)”端点。例如,用户微服务可以有一个接收用户 ID 列表,并返回所有匹配的用户对象的端点,以便时间轴服务只需要向朋友的服务、用户服务,以及消息服务各发出一个请求就行了。
The timeline service functions as a centralized place to define the logic for what a timeline is. If business requirements change and the client now needs to display the latest two messages from each friend then this can easily be changed in the timeline service without needing to make modifications to the other backing microservices that actually host the basic resources.
时间轴服务作为一个中心位置来定义时间轴的逻辑。如果业务需求发生了变化,现在客户端需要显示来自每位朋友的最新两条消息,则可以在时间轴服务中轻松更改需求,而无需修改实际托管基础资源的其他支持微服务。
Additionally the separation between how the data is stored and how the data is manipulated for display to users allows the underlying microservices to be refactored, as long as they continue to adhere to the resource format that the timeline service expects. The maintainers of the friend service could easily rewrite how the friend relationships are stored without breaking the timeline service. On the other hand the use of join queries requires all joins against a table be reviewed and updated if you need to update the table structure.
此外,数据如何存储,以及数据如何被操作以供用户显示,两者之间的分离使得底层微服务可以被重构,只要它们继续遵循时间轴服务所期望的资源格式。朋友服务的维护人员可以轻松地重写朋友关系的存储方式,而不会中断时间轴服务。另一方面,当需要更新表结构之时,使用连接查询方式需要检查和更新针对表的所有连接。
Eventual Consistency
(最终一致性)
One of the side effects of decentralized data management is the need to handle eventual consistency. In a centralized data store developers can use transactional capabilities to ensure that data is in a consistent state between multiple tables. However, this is not the case when data is separated into different logical or physical databases.
去中心化数据管理的副作用之一,就是需要处理最终的一致性(Eventual Consistency)。在集中式数据存储中,开发人员可以使用事务功能来确保数据在多个表中处于一致状态。但是,当数据分为不同的逻辑或物理数据库时,情况就并非如此了。
For example, consider what would happen if a user fetched their timeline at the exact same moment that one of their friends deleted their account:
- Timeline service fetches list of friends from the friend service and sees a friend ID that it needs to resolve
- Friend deletes their account, which deletes the user object from the user service, as well as all the friend references in the friend service
- Timeline service attempts to turn the friend ID into user details by making a request to the user service, but receives a 404 Not Found response instead
例如,假设用户在其某个朋友删除其帐户的同一时间获取了他们的时间轴,会发生什么情况:
- 时间轴服务从朋友服务中获取朋友列表,并查看需要解析的朋友 ID。
- 朋友删除了自己的帐户,这会从用户服务中删除用户对象,以及朋友服务中的所有朋友引用。
- 时间轴服务尝试通过向用户服务发出请求来将朋友 ID 转换为用户详细信息,但接收到 404 Not Found 响应。
Decentralized data modeling requires extra conditional handling to detect and handle such race conditions where underlying data has changed between requests. For the case of a simple social media application this is typically easy. But for a more complex application it may be necessary to keep some tables together in the same database to take advantage of database transaction. Typically these linked tables would also be handled by a single microservice. Alternatively if related data needs strong consistency but is still to be decentralized it may be necessary to use a two-phase commit in order to manipulate it safely.
去中心化的数据建模需要额外的条件处理,以检测与处理基础数据在请求之间发生变化的竞争条件。通常对于简单的社交媒体应用来说,这是很简单的。但是对于更复杂的应用程序,就可能需要将某些表保存在同一个数据库中,这样便可以利用数据库事务。通常,这些链接的表格也会由单个微服务处理。或者,如果相关数据需要很强的一致性,但仍然需要去中心化,则可能需要使用两阶段提交(Two-phase commit)以便安全地操作它。
Polyglot Persistence
(混合持久化)
One significant advantage of decentralized data management is the ability to take advantage of polyglot persistence. Different types of data have different storage requirements:
- Read/Write Balance (Some types of data have a very high write volume. This can require a different type of data store compared to data that has low write volume but high read volume.)
- Data Structure (Some types of highly structured data such as JSON documents may be better stored in a NoSQL database such as MongoDB, while flat relational objects may be more efficiently stored in a SQL database.)
- Data Querying (Some data may be accessible using a simple key value store, while other types of data may require advanced querying based on the values of multiple columns.)
- Data Lifecycle (Some data is temporary in nature and can be stored in a fast, in-memory store such as Redis or Memcached, while other data must be retained for all time and needs very durable storage on disk.)
- Data Size (Some data is made up of fairly uniform rows of consistent byte size, while other data may include large blobs that need to be stored in something like AWS S3.)
去中心化数据管理的一个显着优势是能够利用混合持久化(Polyglot Persistence)。不同类型的数据具有不同的存储需求:
- 读/写平衡(某些类型的数据具有非常高的写入量,与具有低写入量但读取量高的数据相比,这可能需要不同类型的数据存储。)
- 数据结构(某些类型的高度结构化数据,如 JSON 文档可能更好地存储在诸如 MongoDB 这样的 NoSQL 数据库中,而平面关系对象存储在 SQL 数据库中可能会更有效。)
- 数据查询(某些数据可能使用简单的键值存储进行访问,而其他类型的数据可能需要基于多列值的高级查询。)
- 数据生命周期(某些数据本质上属于临时数据,可以存储在快速的内存存储中,例如 Redis 或 Memcached,而其它数据必须一直保存,并且需要在磁盘上非常持久地存储。)
- 数据尺寸(某些数据由相当一致的字节大小的相当同一的行组成,而其他数据可能包含需要存储在类似 AWS S3 中的大对象。)
For the example social messaging app each message is actually a structured JSON document that contains metadata about media files, geolocation, etc. Additionally, it is expected that there will be many users posting messages, and the total number of messages that need to be persisted will grow quickly. For this scenario the team in charge of messages may choose to utilize a sharded MongoDB cluster to persist this structured JSON data.
对于社交消息传递应用程序的例子,每条消息实际上是一个结构化的 JSON 文档,其中包含了媒体文件、地理位置等元数据。此外可以预期的是,将会有许多用户发布消息,并且需要保留的消息总数将迅速增长。对于这种情况,负责消息的团队可能会选择使用分片的 MongoDB 集群来保存这些结构化的 JSON 数据。
On the other hand the user and friendship tables have a simple, flat data model and do not grow as quickly, so those services are backed by a Postgres server.
另一方面,用户和朋友关系表具有简单,扁平的数据模型,并且不会快速增长,因此这些服务则由 Postgres 服务器来支持。
Because this application is utilizing decentralized data management principles it is able to take advantage of polyglot persistence and store the different types of data in different databases that serve the needs of that particular data type.
由于该应用程序使用着去中心化的数据管理原则,因此它可以利用混合持久化,并将不同类型的数据存储在满足特定数据类型需求的不同数据库中。
Conclusion
(总结)
Decentralized data management can be properly deployed by starting from REST basics to figure out the separations between different resource types. These separations should then drive microservice and database boundaries. Where multiple types of resources are required to serve a composite resource to a client this can be built by using a higher level microservice that joins data from different underlying microservices. This requires careful handling of eventual consistency, but it allows the use of polyglot persistence to store different types of data in storage providers that best handle that type of data.
去中心化数据管理可以从 REST 基础出发,找出不同资源类型之间的分隔来适当地部署。这些分离将推动微服务和数据库的边界。在为客户端提供复合资源所需的多种资源类型的情况下,我们可以使用更高层的微服务来构建这种资源,该微服务可以连接来自不同底层微服务的数据。这需要仔细处理最终一致性,但它允许使用混合持久化技术,将不同类型的数据存储在最适合处理该类型数据的存储提供者中。