ELK as a free NetFlow/IPFIX collector and visualizer

Abstract

The ELK (Elasticsearch, Logstash and Kibana) stack is one of the most flexible open source systems for storing, searching and visualizing logs. This post summarizes how to use ELK to store NetFlow/IPFIX data and draw some interesting graphs.

Dashboard example

Installation

There are many ways to install ELK; I suggest getting the latest packages from the ELK website. I’m using the RPMs installed on CentOS 7. Installing the RPMs is out of scope, but remember to enable the services so they start at boot:

# systemctl daemon-reload
# systemctl enable elasticsearch.service
# systemctl enable logstash.service
# systemctl enable kibana.service
# service elasticsearch start
# service logstash start
# service kibana start
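
Once the services are up, a quick sanity check (a minimal example, assuming Elasticsearch is listening on the default port 9200 on localhost) is to query the cluster health:

# curl -X GET 'http://localhost:9200/_cluster/health?pretty'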

Logstash configuration

A single Logstash instance can manage multiple sources and destinations. We want Logstash to listen on multiple UDP ports, with each port bound to a dedicated Elasticsearch index:

# /etc/logstash/conf.d/netflow.conf
input {
  udp {
    port => 9996
    codec => netflow {
      versions => [5, 9]
    }
    type => netflow
    tags => "port_9996"
  }
  udp {
    port => 9995
    codec => netflow {
      versions => [5, 9]
    }
    type => netflow
    tags => "port_9995"
  }
}
output {
  if "port_9996" in [tags] {
    elasticsearch {
        hosts => "127.0.0.1"
        index => "logstash-netflow-9996-%{+YYYY.MM.dd}"
    }
  } else if "port_9995" in [tags] {
    elasticsearch {
        hosts => "127.0.0.1"
        index => "logstash-netflow-9995-%{+YYYY.MM.dd}"
    }
  }
}

The input part specifies two different UDP ports: 9995 and 9996. Each port is marked with a different tag, and the traffic received on both ports is decoded using the NetFlow codec. The output part stores the incoming, decoded traffic into two indices, depending on the tag.

In the real world you would store data in separate indices because:

  • sources are different devices and generate different output (for example ASA and IOS-XE);
  • we want different cleaning strategies;
  • we want to post-process part of the incoming data.
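
Before restarting Logstash you can optionally check the configuration syntax first (a minimal sketch, assuming the standard RPM paths):

# /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/netflow.conf
# systemctl restart logstash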

After Logstash is restarted, the log file could show:

[2017-07-17T12:14:19,531][WARN ][logstash.codecs.netflow  ] No matching template for flow id 256
[2017-07-17T12:14:19,533][WARN ][logstash.codecs.netflow  ] No matching template for flow id 256
[2017-07-17T12:14:19,534][WARN ][logstash.codecs.netflow  ] No matching template for flow id 256

That’s normal: source devices send the templates only occasionally, so after a few minutes the warnings should stop. If they don’t, a good tool to troubleshoot with is tshark, usually included in the Wireshark package:

# tshark -f "udp port 9995" -i any -V -d "udp.port==9995,cflow"
[...]
Cisco NetFlow/IPFIX
    Version: 9
    Count: 14
    SysUptime: 155258638
    Timestamp: Jul 17, 2017 07:40:23.000000000 CEST
        CurrentSecs: 1500270023
    FlowSequence: 428247
    SourceId: 0
    FlowSet 1
        FlowSet Id: (Data) (265)
        FlowSet Length: 136
        Data (132 bytes), no template found

After a few minutes, the output should change to:

[...]
Cisco NetFlow/IPFIX
    Version: 9
    Count: 15
    SysUptime: 155239891
    Timestamp: Jul 17, 2017 07:40:05.000000000 CEST
        CurrentSecs: 1500270005
    FlowSequence: 428035
    SourceId: 0
    FlowSet 1
        FlowSet Id: (Data) (256)
        FlowSet Length: 104
        Flow 1
            Flow Id: 88277552
            SrcAddr: 10.10.2.6 (10.10.2.6)
            SrcPort: 62701
            InputInt: 19
            DstAddr: 216.58.205.78 (216.58.205.78)
            DstPort: 80
            OutputInt: 17
            Protocol: 6
[...]

Elasticsearch configuration and checks

Be sure that your Elasticsearch indices are growing:

# curl -X GET 'http://localhost:9200/_cat/indices?v'
health status index                            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-netflow-9996-2017.07.17 k1x36_PBTz6NUsuV-lp0hA   5   0   10448175            0      3.1gb          3.1gb
green  open   logstash-netflow-9995-2017.07.16 VDxDgk_NQ-GBoGqZeavjcA   5   0      10233            0        5mb            5mb
green  open   logstash-netflow-9995-2017.07.17 RCTBShCFRxGYcbkhzay91w   5   0    1111309            0    423.2mb        423.2mb
green  open   .kibana                          bom3k5N5Rwi0nA5uFhhUYg   1   0         16            0     34.7kb         34.7kb

If the status is not green, you need to investigate what is going on. But before that, if your installation is a standalone testing machine, be sure the number of replicas is 0. If not, set it for all indices:

# curl -X PUT 'http://localhost:9200/_all/_settings?preserve_existing=false' -d '{"index.number_of_replicas":"0"}'
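
You can verify the change by reading back the index settings (a quick check using the standard settings API):

# curl -X GET 'http://localhost:9200/_all/_settings?pretty'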

Also be sure there is enough free space under /var, and regularly delete old data:

# crontab -l
00 18 * * * /bin/curl -X DELETE "http://localhost:9200/logstash-netflow-9995-$(date --date='7 days ago' +%Y.%m.%d)" &> /dev/null
00 18 * * * /bin/curl -X DELETE "http://localhost:9200/logstash-netflow-9996-$(date --date='7 days ago' +%Y.%m.%d)" &> /dev/null

Kibana configuration

Finally, with Kibana we can visualize the stored data. By default Kibana listens on port 5601. A good approach is to put a proxy server in front of it. For example you can use Apache with the following configuration:

ProxyPass "/"  "http://127.0.0.1:5601/"
ProxyPassReverse "/"  "http://127.0.0.1:5601/"
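
A more complete virtual host could look like this (a minimal sketch, assuming mod_proxy and mod_proxy_http are loaded; the ServerName is a placeholder):

<VirtualHost *:80>
    ServerName kibana.example.com
    ProxyRequests Off
    ProxyPreserveHost On
    ProxyPass "/" "http://127.0.0.1:5601/"
    ProxyPassReverse "/" "http://127.0.0.1:5601/"
</VirtualHost>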

Drawing charts with Kibana

Kibana has two different ways to draw graphs: Visualize and Timelion. My suggestion is to use Timelion for line charts and Visualize for everything else.

Counting DNS requests

The following example uses NetFlow generated by a Cisco IOS-XE device.

$q='netflow.l4_dst_port:53',
.es(q=$q).mvavg(5m).scale_interval(1s).label('DNS requests/s'),

DNS request/s

Explanation: using the default index, count how many flows have destination port 53, apply a 5-minute moving average and scale the result to a per-second rate.
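
As a variation (a sketch reusing the same Timelion functions, and assuming the exporter also reports the l4_src_port field), DNS answers can be plotted by matching the source port:

$q='netflow.l4_src_port:53',
.es(q=$q).mvavg(5m).scale_interval(1s).label('DNS answers/s')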

Incoming and outgoing traffic for an IP range (IOS-XE)

The following example uses NetFlow generated by a Cisco IOS-XE device.

$src_q='netflow.ipv4_src_addr:[1.1.0.0 TO 1.1.1.255]', $dst_q='netflow.ipv4_dst_addr:[1.1.0.0 TO 1.1.1.255]',
.es($src_q, metric='sum:netflow.in_bytes').mvavg(5m).scale_interval(1s).multiply(8).divide(1048576).label('Upload (Mb/s)'),
.es($dst_q, metric='sum:netflow.in_bytes').mvavg(5m).scale_interval(1s).multiply(8).divide(1048576).label('Download (Mb/s)')

Incoming and outgoing traffic for an IP range

Explanation:

  • define the first query: traffic originating from the IP range;
  • define the second query: traffic destined to the IP range;
  • for each query:
    • sum the netflow.in_bytes field;
    • apply a 5-minute moving average;
    • scale the result to a per-second rate;
    • convert bytes/s to Mbit/s (multiply by 8, then divide by 1,048,576);
    • attach a label.

Incoming and outgoing traffic for an IP range (ASA)

The following example uses NetFlow generated by a Cisco ASA device.

$src_q='netflow.ipv4_src_addr:[10.1.4.0 TO 10.1.5.255]', $dst_q='netflow.ipv4_dst_addr:[10.1.4.0 TO 10.1.5.255]',
.es(index=logstash-netflow-9995-*, q=$src_q, metric='sum:netflow.fwd_flow_delta_bytes').sum(.es(index=logstash-netflow-9995-*, q=$dst_q, metric='sum:netflow.rev_flow_delta_bytes')).mvavg(5m).scale_interval(1s).multiply(8).divide(1048576).label('Upload (Mb/s)'),
.es(index=logstash-netflow-9995-*, q=$dst_q, metric='sum:netflow.fwd_flow_delta_bytes').sum(.es(index=logstash-netflow-9995-*, q=$src_q, metric='sum:netflow.rev_flow_delta_bytes')).mvavg(5m).scale_interval(1s).multiply(8).divide(1048576).label('Download (Mb/s)')

Incoming and outgoing traffic for an IP range

Explanation: while IOS-XE has only the in_bytes field, ASA has two different fields: fwd_flow_delta_bytes (from source to destination) and rev_flow_delta_bytes (from destination to source). So a given request/answer pair will produce the following traffic:

  • From 10.1.4.0/23 to any, destination port 80
  • From any to 10.1.4.0/23, source port 80

Moreover, this query uses the non-default index bound to port 9995. That’s why this second query is more complex than the previous one.

NOTE: I’m not 100% sure about this approach. If you want, drop me an email or leave a comment with your suggestions.

Top 10 traffic originators from and to an IP range

The last example draws two pie charts:

Top 10 traffic originators from and to an IP range

Explanation:

  • go to Visualize and query for the IP range: netflow.ipv4_src_addr:[1.1.0.0 TO 1.1.1.255];
  • aggregate by summing netflow.in_bytes;
  • add a split slice:
    • aggregate on “Significant Terms” using netflow.ipv4_src_addr.keyword (or netflow.ipv4_dst_addr.keyword);
    • select a size of 10 (we want a top 10 chart);
    • set a custom label.
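
If you prefer to stay in Timelion, a similar top-N breakdown can be sketched with the split parameter of .es() (this draws one line per source instead of a pie chart; field names and range as above):

.es(q='netflow.ipv4_src_addr:[1.1.0.0 TO 1.1.1.255]', split='netflow.ipv4_src_addr.keyword:10', metric='sum:netflow.in_bytes').mvavg(5m).scale_interval(1s).multiply(8).divide(1048576)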


Posted on 17 Jul 2017 by Andrea.