This thought often crosses our minds when configuring log collection inputs in Logstash or Graylog. The arguments are countless, and every network has its own case to make based on its specific configuration. I won’t debate the myriad arguments; instead, I will list the key logical factors to help us make our decision.
Log enrichment is inevitable
Unless we plan to simply dump the logs to satisfy a centralized logging compliance requirement, we will be working with each unique log type to filter, transform, and add new data to make it useful.
We therefore need to identify and pick out unique log types before we can apply the enrichment procedures.
Parsing is costly
If we read the strings of log messages just to identify unique log types, we waste precious CPU cycles that could be put to better use. Syslog is the most widely encountered log format. It needs to be parsed to extract individual units of information such as severity, timestamp, facility, host, and the actual log message from the string.
If we receive multiple types of logs from multiple kinds of devices on a single port, we need to parse out each unique log type for further processing. Processing resources will take a significant hit as we scale.
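To make the parsing cost concrete, here is a minimal Python sketch of what extracting those units from a single syslog line involves. It assumes the classic BSD-style layout (`<PRI>timestamp host message`), where the priority value encodes facility and severity as `facility * 8 + severity`. The regex and the `parse_syslog` name are illustrative only, not what Logstash or Graylog run internally.

```python
import re

# Assumed layout: <PRI>Mmm dd hh:mm:ss host message (classic BSD syslog).
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Split one syslog line into its fields, or return None if it does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    pri = int(match.group("pri"))
    return {
        "facility": pri // 8,   # upper part of the priority value
        "severity": pri % 8,    # lower three bits of the priority value
        "timestamp": match.group("timestamp"),
        "host": match.group("host"),
        "message": match.group("message"),
    }


sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(parse_syslog(sample))
```

Every incoming line pays for this regex match before we even know which log type it belongs to, and identifying the type from the message text would add further pattern matching on top.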
Avoid parsing for log identification
We can increase our processing efficiency by removing the need to parse logs just to identify their type. This can be done in two ways:
1) Split log types over ports
If we know that we will only receive logs of type A on port 55200 and logs of type B on port 55300, we can skip the initial parsing and save lots of CPU cycles.
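As a rough illustration of the idea, the Python sketch below tags each message purely by the port it arrived on, without inspecting its content. The port numbers come from the example above; the type names, the `tag_by_port` and `listen` functions, and the choice of UDP are assumptions made for the sketch, not how Logstash or Graylog implement their inputs.

```python
import selectors
import socket

# Hypothetical mapping: one listening port per log type.
PORT_TO_LOG_TYPE = {55200: "type_a", 55300: "type_b"}


def tag_by_port(raw_line, port):
    """Attach a log type based only on the receiving port; the message is not parsed."""
    return {"log_type": PORT_TO_LOG_TYPE.get(port, "unknown"), "message": raw_line}


def listen(host="0.0.0.0"):
    """Yield tagged events from one UDP socket per configured port."""
    sel = selectors.DefaultSelector()
    for port in PORT_TO_LOG_TYPE:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((host, port))
        # Remember the port with the socket so the type lookup stays a dict access.
        sel.register(sock, selectors.EVENT_READ, data=port)
    while True:
        for key, _events in sel.select():
            payload, _addr = key.fileobj.recvfrom(65535)
            yield tag_by_port(payload.decode("utf-8", errors="replace"), key.data)


print(tag_by_port("link down on eth0", 55200))
```

The type is known from a single dictionary lookup, so the type-specific filters can start working on the message right away.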
2) Prefer extraction over parsing
Good log shipping agents can add extra fields and send logs in a structured format. All we have to do is define a log type field for each unique log type before shipping. On receipt, we can quickly extract the log type and route the log for further processing.
Extracting a field value and matching it to determine the log type undoubtedly consumes fewer CPU cycles than parsing every string.
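A minimal Python sketch of extraction-based routing follows. It assumes the shipper sends each event as a JSON object carrying a `log_type` field. The field name, the type values, and the handler functions are all hypothetical and only serve to show the shape of the approach.

```python
import json


def handle_firewall(event):
    # Placeholder for firewall-specific enrichment.
    return {**event, "pipeline": "firewall"}


def handle_webserver(event):
    # Placeholder for web-server-specific enrichment.
    return {**event, "pipeline": "webserver"}


def handle_unknown(event):
    # Fallback for events with a missing or unrecognised log type.
    return {**event, "pipeline": "unclassified"}


# Hypothetical log type values set by the shipping agent.
HANDLERS = {"firewall": handle_firewall, "webserver": handle_webserver}


def route(raw_line):
    """Decode one structured event and dispatch it on its log_type field."""
    event = json.loads(raw_line)
    handler = HANDLERS.get(event.get("log_type"), handle_unknown)
    return handler(event)


print(route('{"log_type": "firewall", "message": "DROP IN=eth0 SRC=10.0.0.5"}'))
```

Identification here is one field lookup and one dictionary match per event, regardless of how many log types share the port.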
Conclusion
If our logs are structured, we can happily receive all of them on a few input ports, or even a single port, and process them efficiently.
If we are unable to change the shipment method, we should opt for multiple ports and split the log types across them.