Kafka Learning (Part 1): Quickstart
Reference: the official Kafka quickstart guide.
1. Download Kafka
Download from the official site.
As of July 8, 2019 the latest release is 2.3.0. In the artifact name, 2.12 is the Scala version the binaries were built against and 2.3.0 is the Kafka version:
- Scala 2.12 - kafka_2.12-2.3.0.tgz (asc, sha512)
Extract the archive:
> tar -xzf kafka_2.12-2.3.0.tgz
> cd kafka_2.12-2.3.0
2. Start the servers
Start ZooKeeper first. Kafka ships with a convenience single-node ZooKeeper script, but you can also point it at an existing ZooKeeper installation instead.
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...

> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...
3. Create a topic
Create a topic named "test" with a replication factor of 1 and a single partition:

> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test

List the topics to verify:

> bin/kafka-topics.sh --list --bootstrap-server localhost:9092
test

Alternatively, you can skip creating topics manually and configure the brokers to auto-create them when a non-existent topic is first published to.
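If you prefer to manage topics from code rather than the CLI, Kafka's AdminClient API can do the same thing. A minimal sketch, assuming a broker on localhost:9092 and the same topic settings as the command above:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // One partition, replication factor 1 -- same as the CLI example above.
            NewTopic topic = new NewTopic("test", 1, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();

            // List topics to verify, like kafka-topics.sh --list does.
            System.out.println(admin.listTopics().names().get());
        }
    }
}
```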
4. Send some messages
Test with the command line producer client: each line you type is sent as a separate message.
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
this is a message
this is another message
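The console producer is just a thin wrapper over the producer API. A minimal Java sketch of the same thing (topic name and bootstrap server taken from the example above; everything else is standard producer configuration):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each send() corresponds to one line typed into the console producer.
            producer.send(new ProducerRecord<>("test", "this is a message"));
            producer.send(new ProducerRecord<>("test", "this is another message"));
            producer.flush();
        }
    }
}
```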
5. Start a consumer
The command line consumer dumps the messages to standard output:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
this is a message
this is another message
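Likewise, the console consumer wraps the consumer API. A minimal Java sketch (the group id "quickstart-group" is made up for this example; auto.offset.reset=earliest plays the role of --from-beginning):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "quickstart-group");      // hypothetical group id
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");     // like --from-beginning
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
}
```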
6. Set up a multi-broker cluster
A single broker is not very interesting, so let's expand the cluster to three brokers.
First, make a copy of the config file for each additional broker:
> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties
Then edit the copies as follows:
config/server-1.properties:
    broker.id=1
    listeners=PLAINTEXT://:9093
    log.dirs=/tmp/kafka-logs-1

config/server-2.properties:
    broker.id=2
    listeners=PLAINTEXT://:9094
    log.dirs=/tmp/kafka-logs-2
The broker.id property is the unique, permanent name of each node in the cluster. Because we are running all three brokers on the same machine, we also have to override the listener ports and log directories so the brokers do not clash with each other.
Now create a new topic with one partition and a replication factor of three:
> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 1 --topic my-replicated-topic

Use describe to see how the cluster laid it out:

> bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 1,2,0 Isr: 1,2,0
A few concepts in this output:
- "Leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
- "Replicas" is the list of nodes that replicate the log for this partition, regardless of whether they are the leader or even whether they are currently alive.
- "Isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught up to the leader.
For comparison, here is the test topic we created earlier:
> bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test
Topic:test  PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: test Partition: 0    Leader: 0   Replicas: 0 Isr: 0
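The same leader/replicas/ISR information is available programmatically through the AdminClient. A minimal sketch, assuming the cluster from this section is still running on localhost:9092:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("my-replicated-topic"))
                    .all().get()
                    .get("my-replicated-topic");

            // Each partition reports its leader, replica list and in-sync replicas --
            // the same fields kafka-topics.sh --describe prints.
            description.partitions().forEach(p ->
                    System.out.printf("partition %d leader=%d replicas=%s isr=%s%n",
                            p.partition(), p.leader().id(), p.replicas(), p.isr()));
        }
    }
}
```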
Publish and consume a few messages on the new topic:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
...
my test message 1
my test message 2
^C

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C
Now let's test fault tolerance.
Broker 1 is currently the leader for the partition, so kill it:

> ps aux | grep server-1.properties
7564 ttys002    0:15.91 /System/Library/Frameworks/JavaVM.framework/Versions/1.8/Home/bin/java...
> kill -9 7564

Describe the topic again: the leader has switched to another node, because broker 1 was killed.

> bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 2   Replicas: 1,2,0 Isr: 2,0

The messages can still be consumed:

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C
7. Use Kafka Connect to import/export data
So far we have only moved data through the console. To pull data in from other sources or push it out to other systems, use Kafka Connect.
First create some seed data:

> echo -e "foo\nbar" > test.txt

Start two connectors in standalone mode, passing the worker config plus a file-source and a file-sink connector config:

> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

Verify that the sink file was written:

> more test.sink.txt
foo
bar

The data also lands in the Kafka topic connect-test, so the console consumer can read it:

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
...

The connectors keep running, so you can append more data and watch it move through the pipeline:

> echo another line>> test.txt
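For reference, the two connector configs shipped with the distribution are only a few lines each. Roughly (treat this as an approximation and check your local files, since contents can differ between releases), the source connector tails test.txt into the connect-test topic and the sink connector writes that topic to test.sink.txt:

config/connect-file-source.properties (approximate contents):
    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=test.txt
    topic=connect-test

config/connect-file-sink.properties (approximate contents):
    name=local-file-sink
    connector.class=FileStreamSink
    tasks.max=1
    file=test.sink.txt
    topics=connect-test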
8. Use Kafka Streams
We will run the WordCountDemo example application that ships with Kafka.
Here is the gist of its code:
// Serializers/deserializers (serde) for String and Long types
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();

// Construct a `KStream` from the input topic "streams-plaintext-input", where message values
// represent lines of text (for the sake of this example, we ignore whatever may be stored
// in the message keys).
KStream<String, String> textLines = builder.stream("streams-plaintext-input",
    Consumed.with(stringSerde, stringSerde));

KTable<String, Long> wordCounts = textLines
    // Split each text line, by whitespace, into words.
    .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
    // Group the text words as message keys.
    .groupBy((key, value) -> value)
    // Count the occurrences of each word (message key).
    .count();

// Store the running counts as a changelog stream to the output topic.
wordCounts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
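The snippet above leaves out the surrounding setup (where `builder` comes from and how the topology is started). A minimal sketch of that boilerplate, with illustrative configuration values rather than the exact ones WordCountDemo uses:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class WordCountSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount"); // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // The `builder` referenced in the snippet above comes from here.
        StreamsBuilder builder = new StreamsBuilder();
        // ... build the topology shown above on `builder` ...

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Shut the application down cleanly on Ctrl-C.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```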
Create the topics the demo reads from and writes to. The output topic, streams-wordcount-output, is created with log compaction enabled:
> bin/kafka-topics.sh --create \
    --bootstrap-server localhost:9092 \
    --replication-factor 1 \
    --partitions 1 \
    --topic streams-wordcount-output \
    --config cleanup.policy=compact
Created topic "streams-wordcount-output".
Start the WordCount demo application:
> bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
Start a console producer and type some input data:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-plaintext-input
all streams lead to kafka
hello kafka streams
Start a console consumer to read the output:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic streams-wordcount-output \
    --from-beginning \
    --formatter kafka.tools.DefaultMessageFormatter \
    --property print.key=true \
    --property print.value=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

all     1
streams 1
lead    1
to      1
kafka   1
hello   1
kafka   2
streams 2
kafka   1