lookup (Classic) Search Operator
The lookup
(classic) operator maps data in your log messages to meaningful information saved in Sumo or on an HTTPS server. For example, you'd use a lookup operator to map "userID" to a real user's name. Or, you'd use a lookup operator to find deny-listed IP addresses.
In either case, you will point the operator to one of the following:
- A table of saved data generated by the Save operator.
- A CSV file hosted on an HTTPS server. Attempting to run the lookup operator against a CSV hosted on an HTTP server will not be loaded in Sumo Logic.
This topic has information about the classic version of the lookup
operator that works with the classic Lookup Tables feature. For information about the new, more scalable Lookup Tables feature and the new lookup
operator that works with it, see Lookup Tables and lookup.
Syntax
lookup <outputColumn-1> [as <field>] [,<outputColumn-2> [as <field>]] from <filePath> on <joinColumn-1> [,<joinColumn-2>]
Where:
outputColumn-x
is a list of field names in the header of thefilePath
.filePath
is an HTTPS address of a CSV file containing the external relationship table or a table saved to the Sumo Logic file system by the Save operator.noteBasic authentication is supported for CSV files, with the following syntax:
<https://USERNAME:PASSWORD@company.com/userTable.cs>
joinColumn-x
is a list of pairs of field names that define the relationship between values in the log data results with matching values in an external table.
Rules
- The size limit for the CSV file is 8MB.
- If using an HTTPS resource, the file must be downloaded within 10 seconds. If the file can't be downloaded in 10 seconds, it is probably too large.
- Your
joinColumn-x
need to be of the same data type and are case sensitive. If your search result's field consists of integer data then the field in your external lookup table must also be integer data. You can cast data to a string or numeric value, see Casting Data to a Number or String. - If looking up a metadata field you need to either rename the field using the
as
option in the lookup operation or filter null values immediately after the lookup operation.
Structuring CSV files
Sumo Logic supports HTTPS-hosted lookup CSV files with the following restrictions:
The CSV file must contain a header line.
The header line can't use special characters.
Fields that contain special characters, such as commas, must be enclosed in quotes. Wrapping all fields in your CSV file is worthwhile, though, as it would prevent any upload issues that might occur, should special characters be introduced in the future.
No spaces are allowed between quotes and values. For example:
"id","name","time"
"1","foo","6-15-12"
"2","zoo","6-14-12"
"3","woo","6-13-12"An URL provided to a lookup (classic) operator should be consistent in its use of query escape/unescaped parameters.
For example, the following URL is invalid since it is inconsistent in its use of escape characters (%7C versus |):
http://localhost:5000/test?v=csv&f1=indicator%7CSHA1&f2=indicator|SHA1
These URLs will be accepted:
http://localhost:5000/test?v=csv&f1=indicator*|SHA1&f2=indicator|*``SHA1``http``://localhost:5000/test?v=csv&f1=indicator*%7CSHA1&f2=indicator%7C*SHA1
Dashboard limitation
The lookup operator behaves differently when used in live mode versus interactive mode or an interactive search. When used in live mode the lookup operation is done continually to provide real-time results. However, only the most recent data point is looked up in real time, while the previous data points keep their previously looked up result. An interactive search will conduct the lookup operation on all data points when the query is processed. Therefore, when comparing live mode results to interactive results you will likely have differences in your lookup results.
For example, say you are plotting the average price of a stock over the last 30 days.
In live mode, lookup will return the real-time price and retain the previously looked up data points during the 30 day period.
In an interactive search, lookup will only use the real-time stock price to plot over the past 30 days. In this case, you'd have to provide the previous stock prices for the past 30 days.
In other words, in live mode, lookup will use and retain the lookup data at that point in time when it ran. Whereas lookup in an interactive search will only use the data that was available when it ran.
Examples
Type the lookup operator in the Search tab, just as you'd any other operator.
To match the userID string with a users' ID in your CSV, your query could be:
* | parse "name=*, phone number=*," as (name, phone)
| lookup email from https://company.com/userTable.csv on name=userName, phone=cell
where the userTable.csv
file includes the following:
"id","userName","email","IP","cell"
"1","Joe","joe@example.com","192.168.1.1","650-123-4567"
"2","John","john@example.com","192.168.1.2","212-123-4567"
"3","Susan","susan@example.com","192.168.1.3","914-123-4567"
"4","John","another_john@example.com",192.168.1.4","408-123-4567"
"5","John","yet_another_john@example.com","192.169.1.5","734-123-4567"
Running this query adds three fields to the output: name, phone, and email.
Composite field lookup
In our example above we had several users named John. A lookup operator can be used on a composite set of fields, so you can identify the correct email for each person named John because each unique cell phone number has also been mapped using a query like:
* | parse "name=*, phone number=*," as (name, phone)
| lookup email from https://company.com/userTable.csv on name=userName, phone=cell
Running this query adds an email field to the output.
Using multiple lookup operators together
Another way to use a lookup operator is to chain lookup operators together. Each operator can call separate CSV files. For example, if you wanted to find user names and the position each user has in a company, your query could be:
* | parse "userID=*," as userID
| lookup userName from https://company.com/userTable.csv on userID=id
| lookup position from https://company.com/userPosition.csv on userID=id
where the userPosition.csv file includes the following:
"id","position"
"1","Salesperso"
"2","Salesperson"
"3","Engineer"
"4","Manager"
"5","Senior Engineer"
In our example above, the first operator finds the name, and the second finds the position.
Handling null values
To find a mismatch from a lookup operator query, use the isNull operator.
For example, running a query like:
| parse "code=*]" as code
| lookup status_code from shared/statusupdates on status = code
| if (isNull(status_code), "unknown", status_code) as status_code
Using lookup to access saved data
Once you've saved the results of a search to the Sumo Logic file system using a Save operator, the lookup operator allows you to search that data.
For example, say we wanted to find the date when users signed up in a file named newDailyUsers (the full path is myFolder/mySubFolder/newDailyUsers). We'd use this query to find that information:
* | parse "user_name=*," as name
| lookup date from myFolder/mySubFolder/newDailyUsers on name=name
A file generated by a save operator can be saved to an org-level shared folder. This allows for others in your organization to use your search results when running their lookup queries. See saving files to a shared location for details.
Duplicate keys error
If the key you specify in a lookup operation matches several records, you get an error message that warns you of the duplication:
Lookup table folder/myfolder/filename has duplicate keys. The last value associated with a duplicated key will be used in the lookup result.
You only get the last associated value as a result.
For example, if you are searching your Apache Access logs from 34.87.4.6 and you are looking for an internal server errors by a specific keyid, lookup provides the last result that matches your criteria: