Real-Time Log Analysis with Filebeat on CentOS: A Step-by-Step Implementation Guide
Filebeat is a lightweight, efficient log shipper designed to collect, parse, and forward log data from local or remote servers to centralized systems like Elasticsearch for storage and analysis. Its real-time capabilities stem from its ability to monitor log files for changes (e.g., new lines appended due to application activity) and immediately forward those updates. Below is a structured guide to setting up Filebeat on CentOS for real-time log analysis, covering installation, configuration, verification, and optimization.
Before installing Filebeat, ensure your CentOS system meets the following requirements:
Elasticsearch and Kibana are installed and running. Check their status with:
sudo systemctl status elasticsearch
sudo systemctl status kibana
You can run commands with sudo or as a user with root privileges.
Filebeat can be installed via the official Elastic YUM repository to ensure access to the latest 7.x release. Follow these steps:
Import the Elastic GPG key:
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
Create the repository file /etc/yum.repos.d/elasticsearch.repo:
echo "[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md" | sudo tee /etc/yum.repos.d/elasticsearch.repo
Use yum to install the latest version of Filebeat:
sudo yum install filebeat -y
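To confirm that the package actually landed, a quick check along these lines can help. This is a minimal sketch: the helper name require_cmd is hypothetical, and the script only assumes the standard rpm and filebeat binaries are on the PATH once installation succeeds.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Only query the package when both tools are present.
if require_cmd rpm && require_cmd filebeat; then
  rpm -q filebeat      # prints the installed package name and version
  filebeat version     # prints the Filebeat version and build details
fi
```

If either tool is reported as missing, the repository file and the yum step are the first things to recheck.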
The core of Filebeat’s real-time functionality lies in its configuration file (/etc/filebeat/filebeat.yml). Below are key settings to enable and optimize real-time log collection:
Specify the log files or directories to monitor. For example, to monitor all .log files in /var/log/:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
    # Optional: ignore files not modified in the last 72 hours to reduce processing load
    ignore_older: 72h
You can monitor multiple directories or specific files by adding more entries to the paths list (e.g., - /opt/myapp/logs/*.log).
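Before committing a pattern to the paths list, it can be useful to preview which files it would match. The sketch below does that with an ordinary shell glob; preview_paths is a hypothetical helper, and it assumes a single-level pattern such as /var/log/*.log, where the * does not descend into subdirectories.

```shell
#!/bin/sh
# Hypothetical helper: list regular files directly under a directory whose
# names end in the given suffix, mirroring a pattern such as /var/log/*.log.
preview_paths() {
  dir=$1
  suffix=$2
  for f in "$dir"/*"$suffix"; do
    # Skip the literal pattern when nothing matches, and skip non-regular files.
    if [ -f "$f" ]; then
      echo "$f"
    fi
  done
  return 0
}

preview_paths /var/log .log
```

Files in subdirectories (for example /var/log/httpd/access.log) will not appear in the output; they need their own entry such as /var/log/httpd/*.log.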
Adjust the following parameters in the filebeat.inputs section to enhance real-time performance:
scan_frequency: Controls how often Filebeat scans for new or updated files (default: 10s). Reduce this to 5s for faster detection (e.g., scan_frequency: 5s).
close_inactive: Closes the file handle if no new data has been read for the specified duration (default: 5m). A shorter interval (e.g., 1m) frees handles sooner, but once a file is closed, new lines are only picked up at the next scan, so keep the value longer than the typical gap between writes.
tail_files: If set to true, Filebeat starts reading new files at the end instead of the beginning (useful for skipping old log entries). Default is false.
Example configuration with optimized real-time settings:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
    scan_frequency: 5s
    close_inactive: 1m
    tail_files: true
Send collected logs to Elasticsearch for storage and indexing. Replace localhost:9200 with your Elasticsearch server’s address if it’s remote:
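output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "filebeat-%{+yyyy.MM.dd}" # Daily indices; a custom index name is ignored while ILM is enabled
setup.ilm.enabled: false
setup.template.name: "filebeat"
setup.template.pattern: "filebeat-*"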
Processors modify log data before sending it to Elasticsearch. For example, the add_fields processor adds a custom field to categorize logs:
processors:
  - add_fields:
      target: "" # Add fields to the root of the event
      fields:
        environment: "production"
        application: "myapp"
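Because YAML indentation mistakes are easy to make, it may be worth validating the file before starting the service. The sketch below wraps Filebeat's built-in test config and test output subcommands. The function name check_filebeat and its exit codes are hypothetical, and the default path assumes the RPM layout used in this guide.

```shell
#!/bin/sh
# Hypothetical wrapper around Filebeat's self-checks.
# Returns 3 if the config file is missing, 2 if filebeat is not installed,
# otherwise the exit status of the failing check (0 when both pass).
check_filebeat() {
  config=${1:-/etc/filebeat/filebeat.yml}
  if [ ! -e "$config" ]; then
    echo "config not found: $config" >&2
    return 3
  fi
  if ! command -v filebeat >/dev/null 2>&1; then
    echo "filebeat is not installed" >&2
    return 2
  fi
  # "test config" parses the YAML; "test output" tries to reach the configured output.
  filebeat test config -c "$config" && filebeat test output -c "$config"
}
```

Call check_filebeat from a root shell, since the packaged configuration file is typically readable only by root. Pass a different path as the first argument to test a draft copy before replacing the live file.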
After configuring Filebeat, start the service and configure it to launch at boot:
sudo systemctl start filebeat
sudo systemctl enable filebeat
Verify Filebeat’s status to ensure it’s running without errors:
sudo systemctl status filebeat
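systemctl status prints a full report; for scripts or a quick check, the single-word state is often enough. A minimal sketch, where service_state is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the systemd state of a unit ("active", "inactive",
# "failed", ...), or "unknown" when systemctl is unavailable or gives no answer.
service_state() {
  state=""
  if command -v systemctl >/dev/null 2>&1; then
    # is-active exits non-zero for anything but "active", so ignore its status.
    state=$(systemctl is-active "$1" 2>/dev/null) || true
  fi
  echo "${state:-unknown}"
}

service_state filebeat
```

If the state is anything other than active, sudo journalctl -u filebeat -n 50 --no-pager shows the most recent service log lines, which usually name the failing setting.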
Check that Filebeat is successfully sending logs to Elasticsearch:
List the indices to confirm that a Filebeat index exists (e.g., filebeat-2025.09.20):
curl -X GET "localhost:9200/_cat/indices?v"
Use the _search API to retrieve the latest logs. For example, to get logs from the last 5 minutes:
curl -X GET "localhost:9200/filebeat-*/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-5m/m",
        "lte": "now/m"
      }
    }
  },
  "size": 10
}'
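For a lighter check than a full search, the document count of today's index can be polled to see whether it grows as logs arrive. The sketch below is illustrative: daily_index and count_docs are hypothetical helpers, ES_URL is an assumed environment variable, and the date is computed in UTC on the assumption that the daily index name follows the event timestamp in UTC.

```shell
#!/bin/sh
# Hypothetical helper: build the daily index name for a given date
# (default: today in UTC), matching the filebeat-%{+yyyy.MM.dd} pattern.
daily_index() {
  echo "filebeat-${1:-$(date -u +%Y.%m.%d)}"
}

# Hypothetical helper: count documents in an index via the _count API.
# ES_URL is an assumed variable; it defaults to the local node used in this guide.
count_docs() {
  curl -s "${ES_URL:-http://localhost:9200}/$1/_count"
}

# Example (requires a reachable Elasticsearch node):
#   count_docs "$(daily_index)"
```

Running the example twice a few seconds apart while an application is writing logs should show the count field increasing if events are flowing.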
Kibana provides a user-friendly interface for real-time log analysis. Follow these steps to set it up:
Open Kibana in your browser (e.g., http://<server-ip>:5601) and navigate to Stack Management > Index Patterns. Click “Create index pattern”, enter filebeat-*, and select @timestamp as the time field.
Go to Discover, select the filebeat-* index pattern, and you’ll see logs arriving in near real time. Use filters (e.g., level: ERROR) to narrow down results.
For production environments, consider these advanced configurations to improve reliability and performance:
Log rotation: Filebeat handles rotated files (e.g., myapp.log.1), but you can configure close_removed (close the file handle when the file is deleted, enabled by default) and close_renamed (close it when the file is renamed) to control when handles are released. Closing a file before it has been fully read can drop unread lines, so enable close_renamed with care.
Batch size: Tune the bulk_max_size parameter (default: 50) in the output.elasticsearch section to control how many events are sent in each bulk request (higher values improve throughput but increase memory usage).
By following these steps, you can configure Filebeat on CentOS to achieve real-time log analysis, enabling you to quickly identify and respond to issues in your applications and infrastructure.