Linux - OpenTelemetry Collector
The Sumo Logic App for Linux lets you monitor the performance and resource utilization of the hosts and processes that your mission-critical applications depend on. It also lets you view events, logins, and the security status of your Linux systems using Linux system logs.
The app consists of predefined searches and dashboards that provide visibility into your environment for real-time or historical analysis. The dashboards provide insight into CPU, memory, network, file descriptors, page faults, and TCP connections. This app uses OpenTelemetry, an open-source collector, to collect both metrics and log data.
We use the Sumo Logic Distribution for OpenTelemetry Collector to collect Linux metrics and system logs. The OpenTelemetry Collector runs on the Linux machine and uses the Host Metrics Receiver to obtain host and process metrics, and the Sumo Logic OpenTelemetry Exporter to send the metrics to Sumo Logic. Linux logs are sent to Sumo Logic through a filelog receiver.
Fields Created in Sumo Logic for Linux
The following fields are created as part of the Linux app installation, if they are not already present.
Across apps
- sumo.datasource: has a fixed value of linux.
Collecting logs, metrics, and Linux app installation
Here are the steps for collecting logs, metrics, and app installation.
Step 1: Set up Collector
If you want to use an existing OpenTelemetry Collector, you can skip this step by selecting the Use an existing Collector option.
To create a new Collector:
- Select the Add a new Collector option.
- Select the platform where you want to install the Sumo Logic OpenTelemetry Collector.
This will generate a command that you can execute in the machine environment you need to monitor. Once executed, it will install the Sumo Logic OpenTelemetry Collector.
Step 2: Configure integration
In this step, you will configure the YAML file required for Linux collection. The app requires the paths to the system log files, which depend on the Linux distribution you use.
Required Logs for Ubuntu
The following logs, located in your Linux machine's /var/log folder, are required for using the Sumo Logic App for Linux with Ubuntu:
auth.log
syslog
daemon.log
dpkg.log
kern.log
Required Logs for CentOS, Amazon Linux, and Red Hat
The following logs, located in your Linux machine's /var/log folder, are required for using the Sumo Logic App for Linux with CentOS, Amazon Linux, and most Red Hat forks:
audit/audit.log
secure
messages
yum.log
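Before trimming the list of file paths, it can help to see which of the files listed above actually exist on a host. The following is a minimal sketch, not part of the app; the function name `find_logs` and the list names are hypothetical, and the file names are the ones listed above, relative to /var/log.

```python
import os

# File names from the lists above, relative to /var/log.
UBUNTU_LOGS = ["auth.log", "syslog", "daemon.log", "dpkg.log", "kern.log"]
RHEL_LOGS = ["audit/audit.log", "secure", "messages", "yum.log"]


def find_logs(names, log_dir="/var/log"):
    """Split names into (present, missing) relative to log_dir.

    os.path.isfile returns False for unreadable or absent paths,
    so this also works when run without root privileges.
    """
    present = [n for n in names if os.path.isfile(os.path.join(log_dir, n))]
    missing = [n for n in names if n not in present]
    return present, missing


if __name__ == "__main__":
    for label, names in (("Ubuntu", UBUNTU_LOGS),
                         ("CentOS/Amazon Linux/Red Hat", RHEL_LOGS)):
        present, missing = find_logs(names)
        print(f"{label}: present={present} missing={missing}")
```

Files reported as missing may simply not be readable by the current user, so treat the result as a hint rather than a definitive inventory.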
Click the Download YAML File button to get the YAML file.
By default, the paths to the Linux log files required for all distributions are prepopulated in the UI. Not all of these files may be available on your Linux distribution, and you can remove unwanted file paths from the list. This step is optional: collection works properly even if some of the files are not present on your system. If in doubt, leave the default file paths.
By default, the collector sends process metrics to Sumo Logic. Because the number of running processes can be very large, this may result in a significant increase in Data Points per Minute (DPM). To narrow down the list of monitored processes, add the following entry under the process section of the downloaded YAML file.
process:
  include:
    names: [ <process name1>, <process name2> ... ]
    match_type: <strict|regexp>
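To illustrate the difference between the two match types, here is a small sketch of how an include filter could select process names. It is not the collector's implementation. It assumes that strict means exact string equality and that regexp means an unanchored regular expression search; the function `select_processes` and the process names are hypothetical.

```python
import re


def select_processes(running, names, match_type="strict"):
    """Return the entries of `running` that an include filter would keep."""
    if match_type == "strict":
        # Assumption: strict compares the full process name for equality.
        wanted = set(names)
        return [p for p in running if p in wanted]
    if match_type == "regexp":
        # Assumption: a pattern matches if it is found anywhere in the name.
        patterns = [re.compile(n) for n in names]
        return [p for p in running if any(rx.search(p) for rx in patterns)]
    raise ValueError(f"unknown match_type: {match_type!r}")


running = ["apache2", "sshd", "systemd", "systemd-journald", "cron"]
print(select_processes(running, ["apache2", "sshd"], "strict"))
# ['apache2', 'sshd']
print(select_processes(running, ["^systemd"], "regexp"))
# ['systemd', 'systemd-journald']
```

Under these assumptions, a single regexp entry can cover a family of related processes, while strict keeps the monitored set, and therefore the DPM, fully explicit.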
Step 3: Send logs and metrics to Sumo
Once you have downloaded the YAML file as described in the previous step, follow the steps below.
- Copy the YAML file to the /etc/otelcol-sumo/conf.d/ folder on the Linux instance that needs to be monitored.
- Restart the collector using:
sudo systemctl restart otelcol-sumo
After successfully executing the above command, Sumo Logic will start receiving data from your host machine.
Click Next. This will install the app (dashboards and monitors) to your Sumo Logic Org.
Dashboard panels will start to fill automatically. It's important to note that each panel fills with data matching the time range query and received since the panel was created. Results won't immediately be available, but within 20 minutes, you'll see full graphs and maps.
Sample Log Messages
Dec 13 04:44:00 <1> [zypper++] Summary.cc(readPool):133 I_TsU(27372)Mesa-libGL1-8.0.4-20.4.1.i586(@System)
Sample Metrics
{
"queryId":"A",
"_source":"linux-otel-metric",
"process.executable.name":"apache2",
"_sourceName":"Http Input",
"process.command":"/usr/sbin/apache2",
"host":"ip-172-31-90-39.ec2.internal",
"os.type":"linux",
"sumo.datasource":"linux",
"process.executable.path":"/usr/sbin/apache2",
"process.command_line":"/usr/sbin/apache2_-k_start",
"process.owner":"www-data",
"_sourceCategory":"Labs/linux-otel/metric",
"_contentType":"Carbon2",
"metric":"process.memory.physical_usage",
"_collectorId":"000000000C984E1A",
"_sourceId":"0000000042E512AE",
"unit":"By",
"_collector":"Labs - linux-otel",
"process.pid":"26967",
"max":42295296,
"min":536576,
"avg":9061120,
"sum":144977920,
"latest":8069120,
"count":16
}
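The rollup fields in this sample are consistent with each other: avg equals sum divided by count. The sketch below checks that relationship and converts the values to MiB, assuming the unit "By" denotes bytes. The helper names are hypothetical and the values are copied from the sample above.

```python
# Rollup values copied from the sample metric above.
sample = {
    "max": 42295296,
    "min": 536576,
    "avg": 9061120,
    "sum": 144977920,
    "latest": 8069120,
    "count": 16,
}


def mean_from_rollup(rollup):
    """Recompute the average from the sum and the number of data points."""
    return rollup["sum"] / rollup["count"]


def to_mib(num_bytes):
    """Convert a byte count to mebibytes (assumes unit "By" means bytes)."""
    return num_bytes / (1024 * 1024)


assert mean_from_rollup(sample) == sample["avg"]
for field in ("min", "avg", "max", "latest"):
    print(f"{field}: {to_mib(sample[field]):.2f} MiB")
```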
Sample Queries
Log query
Logs query from the Total Event Distribution panel.
%"sumo.datasource"=linux
| parse regex "\d+\s+\d+:\d+:\d+\s(?<dest_hostname>\S+)\s(?<process_name>\w*)(?:\[\d+\]|):\s+"
| where dest_hostname matches "{{dest_hostname}}" AND process_name matches "{{process_name}}"
| count as Events by dest_hostname
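To see what the parse regex in this query extracts, the following sketch applies the same pattern to a few syslog-style lines and counts events per host. The log lines are invented for illustration, the dashboard filter step is omitted, and the named groups use Python's (?P<name>...) syntax in place of (?<name>...).

```python
import re
from collections import Counter

# Same pattern as in the query, with Python-style named groups.
PATTERN = re.compile(
    r"\d+\s+\d+:\d+:\d+\s(?P<dest_hostname>\S+)\s(?P<process_name>\w*)(?:\[\d+\]|):\s+"
)

# Hypothetical log lines, for illustration only.
lines = [
    "Dec 13 04:44:00 web-01 sshd[1234]: Accepted password for alice",
    "Dec 13 04:44:05 web-01 CRON[2201]: (root) CMD (run-parts /etc/cron.hourly)",
    "Dec 13 04:45:10 db-01 sudo: alice : TTY=pts/0 ; COMMAND=/bin/ls",
]


def count_events_by_host(log_lines):
    """Count lines per dest_hostname, skipping lines the pattern does not match."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("dest_hostname")] += 1
    return dict(counts)


print(count_events_by_host(lines))
# {'web-01': 2, 'db-01': 1}
```

Lines that do not follow the "timestamp host process[pid]: message" layout are not matched by the pattern, so they do not contribute to the count in this sketch.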
Metrics query
Metrics query from the CPU Utilization Over Time panel.
sumo.datasource=linux host.name=* metric=system.cpu.utilization state=(user OR system OR wait OR steal OR softirq OR interrupt OR nice) | sum by host.name | outlier
Linux Metrics Dashboards
Host Metrics - Overview
The Host Metrics - Overview dashboard gives you an at-a-glance view of the key metrics like CPU, memory, disk, network, and TCP connections of all your hosts. You can drill down from this dashboard to the Host Metrics - CPU/Disk/Memory/Network/TCP dashboard by using the honeycombs or line charts in all the panels.
Use this dashboard to:
- Identify hosts with high CPU, disk, and memory utilization, and spot anomalies over time.
Host Metrics - CPU
The Host Metrics - CPU dashboard provides a detailed analysis based on CPU metrics. You can drill down from this dashboard to the Process Metrics - Details dashboard by using the honeycombs or line charts in all the panels.
Use this dashboard to:
- Identify hosts and processes with high CPU utilization.
- Examine CPU usage by type and identify anomalies over time.
Host Metrics - Disk
The Host Metrics - Disk dashboard provides detailed information on disk utilization and disk IO operations. You can drill down from this dashboard to the Process Metrics - Details dashboard by using the honeycombs or line charts in all the panels.
Use this dashboard to:
- Identify hosts with high disk utilization and disk IO operations.
- Monitor abnormal spikes in read/write rates.
- Compare disk throughput across storage devices of a host.
Host Metrics - Memory
The Host Metrics - Memory dashboard provides detailed information on host memory usage, memory distribution, and swap space utilization. You can drill down from this dashboard to the Process Metrics - Details dashboard by using the honeycombs or line charts in all the panels.
Use this dashboard to:
- Identify hosts with high memory utilization.
- Examine memory distribution (free, buffered-cache, used, total) for a given host.
- Monitor abnormal spikes in memory and swap utilization.
Host Metrics - Network
The Host Metrics - Network dashboard provides detailed information on host network errors, throughput, and packets sent and received.
Use this dashboard to:
- Determine top hosts with network errors and dropped packets.
- Monitor abnormal spikes in incoming/outgoing packets and bytes sent and received.
- Use dashboard filters to compare throughput across the interfaces of a host.
Host Metrics - TCP
The Host Metrics - TCP dashboard provides detailed information on inbound, outbound, open, and established TCP connections.
Use this dashboard to:
- Identify abnormal spikes in inbound, outbound, open, or established connections.
Process Metrics - Details
The Process Metrics - Details dashboard gives you a detailed view of key process-related metrics such as CPU and memory utilization, disk read/write throughput, and major/minor page faults.
Use this dashboard to:
- Determine the number of open file descriptors in all hosts. If the number of open file descriptors reaches the maximum file descriptor limits, it can cause IOException errors.
- Identify anomalies in CPU usage, memory usage, major/minor page faults and reads/writes over time.
- Troubleshoot memory leaks using the resident set memory trend chart.
Linux Log-Based Dashboards
Linux - Overview
See an overview of Linux activity, including the distribution of system events across hosts, group assignment changes, a breakdown of successful and failed logins, sudo attempts, and the count of reporting hosts.
Filtering the Overview dashboard
Click the funnel icon in the upper left of the dashboard to display filtering options. You can filter the dashboard by any combination of command, dest_group, host.name, and dest_user.
Linux - Event Sources
See information about system events, including their distribution across hosts, event counts per host by hour, and event counts by host and service.
Filtering the Event Sources dashboard
Click the funnel icon in the upper left of the dashboard to display filtering options. You can filter the dashboard by any combination of host.name and process.executable.name.
Linux - Login Status
See information about logins to Linux hosts, including logins by hour, failed logins per host, the top 30 successful and failed logins, and the top 30 successful and failed remote logins.
Filtering the Login Status dashboard
Click the funnel icon in the upper left of the dashboard to display filtering options. You can filter the dashboard by any combination of action, host.name, dest_user, and outcome.
Linux - Security Status
See information about security on Linux hosts, including su and sudo attempts, new and existing user assignments, package operations, and system start events.
Filtering the Security Status dashboard
Click the funnel icon in the upper left of the dashboard to display filtering options. You can filter the dashboard by any combination of action, host.name, dest_user, and outcome.