In this post, I will share with you what Flume is, its system architecture, and its key features.

Flume is a streaming log collection tool. It collects data from local files (spooling directory source), real-time logs (taildir and exec sources), REST messages, Thrift, Avro, Syslog, Kafka, and other data sources, performs lightweight processing on the data, and writes it to data receivers.

What Can Flume Do?

- Collect log information from a fixed directory to a destination (HDFS, HBase, or Kafka).
- Collect log information to a destination in real time (taildir).
- Support cascading (connecting multiple Flume agents) and data consolidation.
- Support user-defined data collection tasks.

Flume System Architecture

Infrastructure mode: Flume collects data directly with a single agent. This mode is mainly used for data collection inside a cluster.

Multi-agent architecture: Flume can connect multiple agents to collect raw data and store it in the final storage system. This architecture is used to import data from outside the cluster into the cluster. You can configure multiple level-1 agents and point them at the source of a level-2 agent. The source of the level-2 agent consolidates the received events and sends them into a single channel.

Sources, Channels, and Sinks

A source receives events, or generates them using a special mechanism, and places them into one or more channels. The events in a channel are consumed by a sink and then pushed to the destination. A source must be associated with at least one channel.

There are two types of sources: driver-based sources and polling sources.

- Driver-based source: an external system proactively sends data to Flume, driving Flume to receive the data.
- Polling source: Flume periodically obtains data.

A source can also execute a command or script and use the output of the execution as its data; this is what the exec source does.
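To make the single-agent (infrastructure) mode concrete, here is a minimal sketch of a Flume agent configuration wiring a spooling directory source to an HDFS sink through a memory channel. The agent name `a1`, the directory path, and the HDFS URL are illustrative placeholders, not values from this post.

```properties
# Name the components of agent "a1"
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Spooling directory source: watches a local directory for completed log files
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /var/log/app/spool
a1.sources.s1.channels = c1

# Memory channel buffers events between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# HDFS sink writes the buffered events out to HDFS, bucketed by date
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

Such an agent is typically started with `flume-ng agent --conf conf --conf-file a1.conf --name a1`.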
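The multi-agent architecture can be sketched in configuration as well: each level-1 agent forwards events through an Avro sink to the Avro source of a level-2 agent, which consolidates everything into one channel. The agent names (`collector1`, `agg`), host name, port, and paths below are assumptions for illustration.

```properties
# Level-1 agent "collector1" (one of several): tails a local log, forwards via Avro
collector1.sources = r1
collector1.channels = c1
collector1.sinks = k1
collector1.sources.r1.type = TAILDIR
collector1.sources.r1.filegroups = f1
collector1.sources.r1.filegroups.f1 = /var/log/app/app.log
collector1.sources.r1.channels = c1
collector1.channels.c1.type = memory
collector1.sinks.k1.type = avro
collector1.sinks.k1.hostname = agg-host
collector1.sinks.k1.port = 4141
collector1.sinks.k1.channel = c1

# Level-2 agent "agg": its Avro source receives events from all level-1 agents
# and consolidates them into a single channel drained by one HDFS sink
agg.sources = r1
agg.channels = c1
agg.sinks = k1
agg.sources.r1.type = avro
agg.sources.r1.bind = 0.0.0.0
agg.sources.r1.port = 4141
agg.sources.r1.channels = c1
agg.channels.c1.type = memory
agg.sinks.k1.type = hdfs
agg.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/consolidated
agg.sinks.k1.channel = c1
```

Pointing every level-1 agent's Avro sink at the same host and port is what lets the level-2 source merge their event streams.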
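As a small sketch of the exec source mentioned above, the snippet below runs a command and treats its continuous output as the event stream; the agent name, command, and log path are hypothetical.

```properties
# Hypothetical agent "a1": the output of "tail -F" becomes the event stream
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# Logger sink prints events to the agent log, which is handy when testing a source
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```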