
{Big Data} Kafka

Published: 2018-01-18 17:33:50

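What is Kafka?

Kafka is commonly paired with Storm or Spark Streaming for real-time processing (Kafka + Storm/Spark Streaming).

Apache Kafka is an open-source messaging system written in Scala and Java, developed as a project of the Apache Software Foundation.

Kafka was originally developed at LinkedIn and open-sourced in early 2011; it graduated from the Apache Incubator in October 2012. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data.

Kafka is a distributed message queue that provides producer and consumer functionality. It offers features similar to JMS, but its design and implementation are entirely different, and it is not an implementation of the JMS specification.

Kafka categorizes stored messages by topic. A client that sends messages is called a producer, and a client that receives them is called a consumer. A Kafka cluster consists of multiple Kafka instances, and each instance (server) is called a broker.

The Kafka cluster relies on a ZooKeeper cluster to store metadata, which helps keep the system available. In the Kafka 1.0 era, the old ZooKeeper-based consumer also connected to ZooKeeper directly, while the Java producer and the new consumer talk only to the brokers.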

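Kafka core components:

Topic: the category under which messages are grouped

Producer: sends messages

Consumer: receives messages

Broker: a single Kafka instance (server)

ZooKeeper: the cluster relies on it to store metadata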
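The relationship between these components can be illustrated with a toy, in-memory model in Python. It is not Kafka's client API, and `ToyBroker`, `create_topic`, `produce` and `consume` are hypothetical names. It only shows the idea: a topic is split into partitions, each partition is an append-only log in which every message gets an increasing offset, and a consumer reads from an offset it tracks itself instead of removing messages.

```python
class ToyBroker:
    """Hypothetical in-memory stand-in for a broker: each topic is a list of
    partitions, and each partition is an append-only list of messages."""

    def __init__(self) -> None:
        self.topics: dict[str, list[list[str]]] = {}

    def create_topic(self, name: str, partitions: int) -> None:
        self.topics[name] = [[] for _ in range(partitions)]

    def produce(self, topic: str, partition: int, message: str) -> int:
        """Append a message and return its offset within the partition."""
        log = self.topics[topic][partition]
        log.append(message)
        return len(log) - 1

    def consume(self, topic: str, partition: int, offset: int) -> list[str]:
        """Return every message at or after `offset`; nothing is removed."""
        return self.topics[topic][partition][offset:]


broker = ToyBroker()
broker.create_topic("first", partitions=3)
broker.produce("first", 0, "hello")
broker.produce("first", 0, "world")

# The consumer remembers its own position and resumes from it next time.
position = 0
batch = broker.consume("first", 0, position)
position += len(batch)
print(batch, position)  # ['hello', 'world'] 2
```

Because reading does not delete anything, several consumers can read the same partition independently, each from its own position.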

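Kafka cluster deployment:

1. Basic deployment workflow

Download the package, extract it, edit the configuration file, distribute the package to the other nodes, and start the cluster.

2. Deploying the Kafka cluster

1) Download the package

http://kafka.apache.org/downloads

2) Extract the package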

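[hadoop@hdp08 ~]$ tar zxvf kafka_2.12-1.0.0.tgz -C apps

The remaining steps refer to the directory as apps/kafka, so rename the extracted directory:

[hadoop@hdp08 ~]$ mv apps/kafka_2.12-1.0.0 apps/kafka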

3) Edit the configuration file

[hadoop@hdp08 kafka]$ vi config/server.properties

Set its contents as follows:

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements. See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License. You may obtain a copy of the License at

# http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.

broker.id=1

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from

# java.net.InetAddress.getCanonicalHostName() if not configured.

# FORMAT:

# listeners = listener_name://host_name:port

# EXAMPLE:

# listeners = PLAINTEXT://your.host.name:9092

#listeners=PLAINTEXT://:9092

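# Port on which the broker listens; producers and consumers connect to it. Deprecated in Kafka 1.0 in favor of listeners and used only when listeners is not set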

port=9092

# Hostname and port the broker will advertise to producers and consumers. If not set,

# it uses the value for "listeners" if configured. Otherwise, it will use the value

# returned from java.net.InetAddress.getCanonicalHostName().

#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details

#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# Number of threads handling network requests

num.network.threads=3

# Number of threads performing disk I/O

num.io.threads=8

# Socket send buffer size (SO_SNDBUF)

socket.send.buffer.bytes=102400

# Socket receive buffer size (SO_RCVBUF)

socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)

# Maximum request size: 104857600 bytes = 100 MiB

socket.request.max.bytes=104857600

############################# Log Basics #############################

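# Directory where Kafka stores partition data (the message log segments); despite the name, these are not the broker's runtime logs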

log.dirs=/home/hadoop/apps/kafka/logs

# The default number of log partitions per topic. More partitions allow greater

# parallelism for consumption, but this will also result in more files across

# the brokers

# Default number of partitions for automatically created topics

num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.

# This value is recommended to be increased for installations with data dirs located in RAID array.

# Threads per data directory used to recover logs at startup and flush them at shutdown

num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings #############################

# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"

# For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.

offsets.topic.replication.factor=1

transaction.state.log.replication.factor=1

transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync

# the OS cache lazily. The following configurations control the flush of data to disk.

# There are a few important trade-offs here:

# 1. Durability: Unflushed data may be lost if you are not using replication.

# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.

# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.

# The settings below allow one to configure the flush policy to flush data after a period of time or

# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk

#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush

#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can

# be set to delete segments after a period of time, or after a given size has accumulated.

# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens

# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age

# Segment files older than this are eligible for deletion (168 hours = 7 days)

log.retention.hours=168

# Maximum time before a new segment file is rolled, even if the current one is not full

log.roll.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining

# segments drop below log.retention.bytes. Functions independently of log.retention.hours.

#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.

log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according

# to the retention policies

log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).

# This is a comma separated host:port pairs, each corresponding to a zk

# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".

# You can also append an optional chroot string to the urls to specify the

# root directory for all kafka znodes.

zookeeper.connect=hdp08:2181,hdp09:2181,hdp10:2181

# Timeout in ms for connecting to zookeeper

zookeeper.connection.timeout.ms=6000

############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.

# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.

# The default value for this is 3 seconds.

# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.

# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.

group.initial.rebalance.delay.ms=0
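The numeric settings above are easier to sanity-check once converted to everyday units. The Python sketch below parses the simple `key=value` lines of a shortened copy of this file and derives the retention period in days, the segment and request size limits, and the ZooKeeper host list. `parse_properties` is a hypothetical helper that handles only this plain form, not the full Java properties syntax (escapes, `:` separators, line continuations).

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# A shortened copy of the server.properties values used in this article.
sample = """
broker.id=1
log.retention.hours=168
log.segment.bytes=1073741824
socket.request.max.bytes=104857600
zookeeper.connect=hdp08:2181,hdp09:2181,hdp10:2181
"""

props = parse_properties(sample)

retention_days = int(props["log.retention.hours"]) / 24       # 168 h -> 7 days
segment_gib = int(props["log.segment.bytes"]) / 2**30          # -> 1 GiB
request_mib = int(props["socket.request.max.bytes"]) / 2**20   # -> 100 MiB

# zookeeper.connect is a comma-separated list of host:port pairs; an optional
# chroot path (e.g. "/kafka") may follow the last pair. Here there is none.
connect, _, chroot = props["zookeeper.connect"].partition("/")
zk_nodes = [tuple(pair.split(":")) for pair in connect.split(",")]

print(retention_days, segment_gib, request_mib)  # 7.0 1.0 100.0
print(zk_nodes)  # [('hdp08', '2181'), ('hdp09', '2181'), ('hdp10', '2181')]
```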

4) Distribute the package

[hadoop@hdp08 apps]$ scp -r kafka hadoop@hdp09:/home/hadoop/apps

[hadoop@hdp08 apps]$ scp -r kafka hadoop@hdp10:/home/hadoop/apps

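5) Edit the configuration file again (important)

On each server, set broker.id in the configuration file to 1, 2, and 3 respectively; the values must be unique.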
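After the scp step, every host's server.properties still says broker.id=1. Editing each copy by hand with vi works; as an alternative, the Python sketch below generates each host's file content from one template with a unique ID and checks that no ID repeats. `with_broker_id` is a hypothetical helper, and writing the results back to each host (for example over ssh) is deliberately left out.

```python
import re


def with_broker_id(config_text: str, broker_id: int) -> str:
    """Return config_text with its single broker.id line set to broker_id."""
    new_text, count = re.subn(
        r"^broker\.id=.*$", f"broker.id={broker_id}", config_text, flags=re.MULTILINE
    )
    if count != 1:
        raise ValueError(f"expected exactly one broker.id line, found {count}")
    return new_text


# Hosts from this article; IDs 1, 2, 3 as described above.
hosts = ["hdp08", "hdp09", "hdp10"]
template = "broker.id=1\nport=9092\nlog.dirs=/home/hadoop/apps/kafka/logs\n"

configs = {host: with_broker_id(template, i) for i, host in enumerate(hosts, start=1)}

# Collect the broker.id line of every generated config and make sure none repeat.
ids = [line for cfg in configs.values() for line in cfg.splitlines() if line.startswith("broker.id=")]
assert len(set(ids)) == len(hosts), "broker.id values must be unique"
print(ids)  # ['broker.id=1', 'broker.id=2', 'broker.id=3']
```

Raising on a missing or duplicated broker.id line catches a template that was edited by mistake before it is copied to every node.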

6) Start the cluster: with the ZooKeeper cluster already running, start Kafka on each node in turn

bin/kafka-server-start.sh config/server.properties
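Once every node is started, a quick way to confirm that each broker is accepting connections is a plain TCP check against the port configured above (9092). The Python sketch below is not a Kafka tool: `broker_is_listening` and `check_brokers` are hypothetical names, and a successful connection only shows that something is listening on the port, not that the broker is healthy.

```python
import socket


def broker_is_listening(host: str, port: int = 9092, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects; the socket
        # is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False


def check_brokers(hosts: list[str], port: int = 9092) -> dict[str, bool]:
    """Map each host name to whether its broker port accepted a connection."""
    return {host: broker_is_listening(host, port) for host in hosts}
```

For this cluster, `check_brokers(["hdp08", "hdp09", "hdp10"])` should report `True` for every host once all brokers are up.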

7) Common Kafka commands

List all topics on the current server

bin/kafka-topics.sh --list --zookeeper hdp08:2181

Create a topic

bin/kafka-topics.sh --create --zookeeper hdp08:2181 --replication-factor 1 --partitions 3 --topic first
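With --partitions 3, each message ends up in one of three partitions. For messages that carry a key, the Java client's default partitioner in this era hashes the serialized key with murmur2 and takes the positive result modulo the partition count, so the same key always lands in the same partition (messages without a key are spread across partitions instead). The Python port below is written from that description of the client logic and is meant as an illustration; verify it against the client source before relying on exact partition numbers.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 with seed 0x9747b28c, computed on unsigned values.

    The Java original uses signed ints; the bit patterns are the same.
    """
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask

    # Process the input four bytes at a time (little-endian).
    for i in range(length // 4):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Mix in the remaining 1-3 bytes, mirroring the Java switch fall-through.
    tail = data[length & ~3 :]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask

    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for_key(key: str, num_partitions: int) -> int:
    """Clear the sign bit of murmur2(key) and take it modulo num_partitions.

    Assumes the key is serialized as UTF-8, as a string serializer would do.
    """
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
```

Because the partition depends on the partition count, adding partitions to the topic later changes where existing keys map.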

Delete a topic

bin/kafka-topics.sh --delete --zookeeper hdp08:2181 --topic first

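Deletion requires delete.topic.enable=true in server.properties. This is the default from Kafka 1.0.0 on (earlier versions default to false); when it is false, the topic is only marked for deletion, and you must enable the setting and restart the brokers to complete it.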

Send messages from the shell

bin/kafka-console-producer.sh --broker-list hdp08:9092 --topic first

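Consume messages from the shell (this uses the old ZooKeeper-based consumer; to use the new consumer, replace --zookeeper hdp08:2181 with --bootstrap-server hdp08:9092)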

bin/kafka-console-consumer.sh --zookeeper hdp08:2181 --from-beginning --topic first

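Check consumer offsets (ConsumerOffsetChecker works with the old ZooKeeper-based consumer, is deprecated since Kafka 0.9 and is dropped from newer releases; if it is unavailable, use bin/kafka-consumer-groups.sh instead)

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper hdp08:2181 --group testGroup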
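For each partition, the offset checker reports the group's committed offset, the log size (effectively the log end offset) and the lag, which is the difference between the two. The Python sketch below shows that arithmetic with made-up offsets for the three partitions of the `first` topic; `PartitionOffsets` and `lag` are hypothetical names, and the numbers are illustrative, not output from a real cluster.

```python
from typing import NamedTuple


class PartitionOffsets(NamedTuple):
    partition: int
    committed: int  # the group's committed offset ("Offset" column)
    log_end: int    # the partition's log end offset ("logSize" column)


def lag(p: PartitionOffsets) -> int:
    """Messages written to the partition that the group has not consumed yet.

    Clamped at 0 in this sketch so an odd committed offset never yields
    a negative lag.
    """
    return max(p.log_end - p.committed, 0)


# Illustrative numbers only.
offsets = [
    PartitionOffsets(0, committed=120, log_end=150),
    PartitionOffsets(1, committed=98, log_end=98),
    PartitionOffsets(2, committed=40, log_end=75),
]

per_partition = {p.partition: lag(p) for p in offsets}
total_lag = sum(per_partition.values())
print(per_partition, total_lag)  # {0: 30, 1: 0, 2: 35} 65
```

A lag that keeps growing on one partition usually means the consumer assigned to it cannot keep up.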

View the details of a topic

bin/kafka-topics.sh --topic first --describe --zookeeper hdp08:2181
