Windows - OpenTelemetry Collector
The Sumo Logic App for Windows allows you to monitor the performance and resource utilization of the hosts and processes that your mission-critical applications depend on. The Windows App also provides insight into your Windows system's operation and events so that you can better manage and maintain your environment.
The Windows App, which is based on the Windows event log format, consists of predefined searches and dashboards that provide visibility into your environment for real-time analysis of Security Status, System Activity, Updates, User Activity, and Applications. The dashboards also provide insight into CPU, memory, network, file descriptors, page faults, and TCP connections.
- Windows event logs are sent to Sumo Logic through the OpenTelemetry Event Log receiver.
- Windows host metrics are sent to Sumo Logic through the OpenTelemetry Host Metrics receiver.
Fields Created in Sumo Logic for Windows
The following fields are created as part of the Windows App installation if they are not already present:
- sumo.datasource. Has a fixed value of windows.
Log Types
The Windows App assumes events come from the Windows Event Log receiver in JSON format. It does not work with third-party logs.
Standard Windows event channels include:
- Security
- System
- Application
Collection Configuration and App installation
As part of data collection setup and app installation, select the App from the App Catalog and click Install App. Then follow the steps below.
Step 1: Set up Collector
If you want to use an existing OpenTelemetry Collector, you can skip this step by selecting the Use an existing Collector option.
To create a new Collector:
- Select the Add a new Collector option.
- Select the platform where you want to install the Sumo Logic OpenTelemetry Collector.
This will generate a command that you can execute in the machine environment you need to monitor. Once executed, it will install the Sumo Logic OpenTelemetry Collector.
Step 2: Configure integration
In this step, you will configure the YAML file required for Windows event log and metrics collection.
You can also tag the data with any custom fields in this step.
Once the details are filled in, click the Download YAML File button to get the YAML file.
By default, the collector sends process metrics to Sumo Logic. Because the number of running processes can be very large, this may significantly increase Data Points per Minute (DPM). To narrow down the list of monitored processes, add the following entry under the process section of the downloaded YAML file.
process:
  include:
    names: [ <process name1>, <process name2> ... ]
    match_type: <strict|regexp>
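To make the difference between the two match_type values concrete, here is a minimal Python sketch of how an include filter might select process names. It illustrates the intended semantics only and is not the collector's implementation. The function name `select_processes` and the process names are hypothetical, and treating a regexp as an unanchored search is an assumption; the collector's regular expression dialect may also differ from Python's.

```python
import re


def select_processes(names, patterns, match_type="strict"):
    """Return the process names that an include filter would keep.

    strict: a name is kept only if it equals one of the patterns exactly.
    regexp: a name is kept if any pattern matches it (assumed unanchored,
            so use ^ and $ when you need a whole-name match).
    """
    if match_type == "strict":
        return [n for n in names if n in patterns]
    if match_type == "regexp":
        compiled = [re.compile(p) for p in patterns]
        return [n for n in names if any(c.search(n) for c in compiled)]
    raise ValueError(f"unsupported match_type: {match_type}")


# Hypothetical list of processes running on a host.
running = ["sqlservr.exe", "w3wp.exe", "svchost.exe", "chrome.exe"]

strict_kept = select_processes(running, ["sqlservr.exe", "w3wp.exe"], "strict")
regexp_kept = select_processes(running, [r"^sql.*", r"^w3wp\.exe$"], "regexp")
```

With strict matching you list every process name in full, while regexp matching lets one pattern cover a family of names at the cost of needing explicit anchors.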
Step 3: Send logs to Sumo
Once you have downloaded the YAML file as described in the previous step, follow the steps below.
- Copy the YAML file to the C:\ProgramData\Sumo Logic\OpenTelemetry Collector\config\conf.d folder on the machine that needs to be monitored.
- Restart the collector using:
Restart-Service -Name OtelcolSumo
After successfully executing the above command, Sumo Logic will start receiving data from your host machine.
Click Next. This will install the app (dashboards and monitors) to your Sumo Logic Org.
Dashboard panels will start to fill automatically. Each panel fills only with data that matches its time range query and that was received after the panel was created. Results won't be available immediately, but within 20 minutes you'll see full graphs and maps.
Sample Metrics Message
{
  "queryId":"A",
  "_source":"windows-otel-metric",
  "_metricId":"tYzy7VHWrdxuGHOkPRT5pA",
  "_sourceName":"Http Input",
  "os.type":"windows",
  "sumo.datasource":"windows",
  "direction":"transmit",
  "_sourceCategory":"Labs/windows-otel",
  "_contentType":"Carbon2",
  "host.name":"EC2AMAZ-T30T53R.ec2.internal",
  "metric":"system.network.io",
  "_collectorId":"000000000CEC8ECC",
  "_sourceId":"0000000044DB46EF",
  "unit":"By",
  "_collector":"Labs - windows-otel",
  "device":"Loopback_Pseudo-Interface_1",
  "max":289495780,
  "min":0,
  "avg":229918329.73,
  "sum":3448774946,
  "latest":289485558,
  "count":15
}
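The aggregate fields in the sample above relate to each other in a simple way: avg is sum divided by count. The following Python sketch recomputes it from a subset of the sample's fields. The helper name `recompute_avg` is hypothetical, and reading min, max, and latest as statistics over the same 15 data points is an assumption based on the field names.

```python
import json

# Subset of the sample metrics message shown above.
sample = json.loads("""
{
  "metric": "system.network.io",
  "unit": "By",
  "host.name": "EC2AMAZ-T30T53R.ec2.internal",
  "device": "Loopback_Pseudo-Interface_1",
  "direction": "transmit",
  "max": 289495780,
  "min": 0,
  "avg": 229918329.73,
  "sum": 3448774946,
  "latest": 289485558,
  "count": 15
}
""")


def recompute_avg(point):
    """Recompute the average from sum and count, rounded to two decimals."""
    return round(point["sum"] / point["count"], 2)


recomputed = recompute_avg(sample)

# The reported avg should agree with sum / count up to rounding.
consistent = abs(recomputed - sample["avg"]) < 0.01

# The most recent value should fall between the reported min and max.
in_range = sample["min"] <= sample["latest"] <= sample["max"]
```

A check like this can be useful when validating that metrics exported from a query match what the collector sent.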
Sample Queries
This sample metrics query is from the Host Metrics - CPU dashboard > CPU User Time panel.
sumo.datasource=windows host.name={{host.name}} cpu=cpu0 metric=system.cpu.utilization state=user | avg by host.name
This sample log query is from the Windows - Overview dashboard > System Restarts panel.
%"sumo.datasource"=windows "\"channel\":\"Security\""
| json "event_id", "computer", "message", "channel" as event_id_obj, host.name, msg_summary, channel nodrop
| json field=event_id_obj "id" as event_id
| parse regex field=msg_summary "(?<msg_summary>.*\.*)" nodrop
| where event_id = "4608" and channel = "Security" and host.name matches "{{host.name}}"
| count as Restarts
Sample Logs
{
  "record_id":"6316",
  "channel":"Application",
  "event_data":"",
  "task":"0",
  "provider":"{\"name\":\"Microsoft-Windows-Security-SPP\",\"guid\":\"{E23B33B0-C8C9-472C-A5F9-F2BDFEA0F156}\",\"event_source\":\"Software Protection Platform Service\"}",
  "system_time":"2023-01-20T15:22:02+0000816Z",
  "computer":"EC2AMAZ-T30T53R",
  "opcode":"0",
  "keywords":"Classic",
  "message":"Offline downlevel migration succeeded.",
  "event_id":"{\"id\":\"16394\",\"qualifiers\":\"49152\"}",
  "level":"Information"
}
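In the sample log, event_id is a JSON object encoded as a string inside the outer JSON record, which is why the sample log query applies the json operator twice. The following Python sketch mirrors that two-step extraction on a subset of the sample. The function name `extract_fields` is hypothetical, and the sketch only approximates what the query does; it is not how Sumo Logic evaluates it.

```python
import json

# Subset of the sample log above; the raw string keeps the escaped quotes intact.
raw = r'''{
  "channel":"Application",
  "computer":"EC2AMAZ-T30T53R",
  "message":"Offline downlevel migration succeeded.",
  "event_id":"{\"id\":\"16394\",\"qualifiers\":\"49152\"}",
  "level":"Information"
}'''


def extract_fields(log_line):
    """Extract the fields the sample query uses from one log record."""
    record = json.loads(log_line)
    # Second parse: event_id holds a JSON document encoded as a string.
    event_id_obj = json.loads(record["event_id"])
    return {
        "event_id": event_id_obj["id"],
        "channel": record["channel"],
        "host.name": record["computer"],
        "msg_summary": record["message"],
    }


fields = extract_fields(raw)

# Same condition as the query's where clause, minus the host.name filter.
is_restart = fields["event_id"] == "4608" and fields["channel"] == "Security"
```

This record comes from the Application channel with event ID 16394, so it would not be counted by the System Restarts panel.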
Viewing Windows Event Log-Based Dashboards
Windows - Overview
The Windows - Overview dashboard provides insights into fatal or warning messages, policy changes, and system restarts.
Use this dashboard to:
- Monitor systems experiencing fatal errors, warnings, and system restarts.
- View system login attempts.
- Monitor policy changes performed on the system.
Windows - Default
The Windows - Default dashboard provides information about the start and stop operations for Windows services, Windows events, operations events, and Errors and Warnings.
Use this dashboard to:
- Monitor services being stopped and started on the system.
- Monitor event types (channels) collected from the system.
- Monitor log-level (error, warning) trends on the systems.
- Monitor operations performed on the system like restarts, user creation, group creation, and firewall changes.
Windows - Event Errors
The Windows - Event Errors dashboard provides insights into error keyword trends and outliers.
Use this dashboard to:
- Monitor various errors in the systems.
- Monitor error trends and outliers to confirm they are within acceptable limits and to decide on the next step.
Windows - Application
The Windows - Application dashboard provides detailed information about install, uninstall, and event trends.
Use this dashboard to:
- Monitor installs and uninstalls of applications performed on the system.
- Monitor log levels (error, warning, information) through trends and quick snapshots.
- Monitor various application-specific events happening through recent messages.
Windows - Host Metric Based Dashboards
Host Metrics - Overview
The Host Metrics - Overview dashboard gives you an at-a-glance view of the key metrics like CPU, memory, disk, network, and TCP connections of all your hosts. You can drill down from this dashboard to the Host Metrics - CPU/Disk/Memory/Network/TCP dashboard by using the honeycombs or line charts in all the panels.
Use this dashboard to:
- Identify hosts with high CPU, disk, and memory utilization, and identify anomalies over time.
Host Metrics - CPU
The Host Metrics - CPU dashboard provides a detailed analysis based on CPU metrics. You can drill down from this dashboard to the Process Metrics - Details dashboard by using the honeycombs or line charts in all the panels.
Use this dashboard to:
- Identify hosts and processes with high CPU utilization.
- Examine CPU usage by type and identify anomalies over time.
Host Metrics - Disk
The Host Metrics - Disk dashboard provides detailed information about disk utilization and disk IO operations. You can drill down from this dashboard to the Process Metrics - Details dashboard by using the honeycombs or line charts in all the panels.
Use this dashboard to:
- Identify hosts with high disk utilization and disk IO operations.
- Monitor abnormal spikes in read/write rates.
- Compare disk throughput across storage devices of a host.
Host Metrics - Memory
The Host Metrics - Memory dashboard provides detailed information on host memory usage, memory distribution, and swap space utilization. You can drill down from this dashboard to the Process Metrics - Details dashboard by using the honeycombs or line charts in all the panels.
Use this dashboard to:
- Identify hosts with high memory utilization.
- Examine memory distribution (free, buffered-cache, used, total) for a given host.
- Monitor abnormal spikes in memory and swap utilization.
Host Metrics - Network
The Host Metrics - Network dashboard provides detailed information on host network errors, throughput, and packets sent and received.
Use this dashboard to:
- Determine top hosts with network errors and dropped packets.
- Monitor abnormal spikes in incoming/outgoing packets and bytes sent and received.
- Use dashboard filters to compare throughput across the interface of a host.
Host Metrics - TCP
The Host Metrics - TCP dashboard provides detailed information about inbound, outbound, open, and established TCP connections.
Use this dashboard to:
- Identify abnormal spikes in inbound, outbound, open, or established connections.
Process Metrics - Details
The Process Metrics - Details dashboard gives you a detailed view of key process-related metrics such as CPU and memory utilization, disk read/write throughput, and major/minor page faults.
Use this dashboard to:
- Determine the number of open file descriptors in all hosts. If the number of open file descriptors reaches the maximum file descriptor limits, it can cause IOException errors.
- Identify anomalies in CPU usage, memory usage, major/minor page faults and reads/writes over time.
- Troubleshoot memory leaks using the resident set memory trend chart.