Computer Science Department - University of Puerto Rico
Prepared by: José R. Ortiz-Ubarri
A simple way to monitor your network is to keep aggregated information about its traffic.
In a private network, an IP that has not been delegated can be a machine hijacking your network, or a clever user who does not like to follow the rules.
An IP generating more traffic than usual can be a sign of a DoS attack, a compromised computer, a user using the network for personal purposes at work, or a computer generating spam.
An IP that is expected to generate traffic and does not might belong to a machine that has crashed or is turned off, or to an Internet service that has crashed or is down.
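The three situations above can be sketched as simple checks over per-IP traffic totals. This is only an illustration: the function name classify_ips, the delegated set, the baseline dictionary, and the factor threshold are all assumptions introduced here, not part of the original material.

```python
def classify_ips(observed, delegated, baseline, factor=3.0):
    """Return three sets of IPs: undelegated, heavy and silent.

    observed  -- dict of IP -> octets seen in the current interval
    delegated -- set of IPs assigned by the administrator (assumed input)
    baseline  -- dict of IP -> typical octets per interval (assumed input)
    factor    -- hypothetical multiplier: above factor * baseline is "heavy"
    """
    # IPs that generate traffic but were never delegated
    undelegated = {ip for ip in observed if ip not in delegated}
    # IPs generating much more traffic than they usually do
    heavy = {ip for ip, octets in observed.items()
             if ip in baseline and octets > factor * baseline[ip]}
    # IPs that normally generate traffic and are silent in this interval
    silent = {ip for ip, usual in baseline.items()
              if usual > 0 and observed.get(ip, 0) == 0}
    return undelegated, heavy, silent
```

The observed totals are exactly the kind of per-IP aggregate that becomes expensive to compute as the network grows.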
The larger the network, the harder it becomes to aggregate all the traffic of one IP, or of every IP in your network.
For example, a file containing 5 minutes of NetFlow data from the University of Puerto Rico network can be as large as 6.5 MB, or 363149 lines of flows.
NetFlow is a network protocol developed by Cisco that has become the standard for traffic monitoring. It runs on some network devices and collects aggregated information about the network traffic.
This information is exported to a collector for analysis.
A NetFlow record represents a unidirectional sequence of packets and contains, among other fields, the source IP, the destination IP, the source port, the destination port, the sum of the payload sizes of the packets, and a timestamp.
136.145.155.233 200.125.49.166 6 80 60239 52 1
136.145.182.24 190.144.252.102 6 9637 80 80 2
136.145.182.24 107.15.34.28 6 56387 38973 52 1
167.8.226.10 136.145.33.149 6 80 51965 52 1
206.248.76.205 136.145.30.60 6 49839 443 245 4
173.194.37.118 136.145.170.186 6 443 51991 40 1
136.145.33.149 167.8.226.10 6 51965 80 40 1
136.145.230.223 31.13.69.80 6 62206 443 1668 4
136.145.180.248 54.234.145.249 17 53 48266 160 1
136.145.115.196 221.219.225.133 6 80 64844 52 1
136.145.62.155 109.105.242.199 6 35432 51413 60 1
136.145.240.59 173.194.37.118 6 64488 443 2960 5
136.145.215.1 136.145.230.222 1 0 2816 1064 19
136.145.226.5 64.178.214.6 6 49598 443 40 1
136.145.193.56 204.93.223.146 6 1435 80 40 1
136.145.249.201 157.56.23.42 6 60301 443 2057 4
72.21.91.79 136.145.144.211 6 80 50126 3147 5
72.21.91.79 136.145.144.211 6 80 50125 2150 4
136.145.95.2 121.97.142.136 17 49349 43112 58 1
136.145.95.2 122.201.18.193 17 49349 23963 58 1
Note: This is flow-print output from the flow-tools package. The columns are the source IP, destination IP, protocol, source port, destination port, octets, and packets.
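As a small illustration of that column layout, one line of the listing can be turned into a named record. The Flow tuple and the parse_flow helper are hypothetical names introduced here for the sketch, and the code assumes every line has exactly the seven whitespace-separated columns described in the note.

```python
from collections import namedtuple

# Field names follow the column order given in the note (names are illustrative).
Flow = namedtuple("Flow", "src_ip dst_ip protocol src_port dst_port octets packets")

def parse_flow(line):
    """Parse one flow-print line into a Flow record."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 columns, got %d" % len(fields))
    src_ip, dst_ip = fields[0], fields[1]
    # The remaining five columns are integers.
    numbers = [int(f) for f in fields[2:]]
    return Flow(src_ip, dst_ip, *numbers)

flow = parse_flow("136.145.155.233 200.125.49.166 6 80 60239 52 1")
print(flow.src_ip, flow.dst_ip, flow.octets)
# prints: 136.145.155.233 200.125.49.166 52
```

In this layout the octets value sits at index 5 of the split line, the same position the aggregation code reads.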
The map function
def map(line, params):
    # Split the flow data into an array
    data = line.split()
    # Emit the octets (column 5) once for the source IP and once for the destination IP
    yield data[0], int(data[5])
    yield data[1], int(data[5])
The reduce function
def reduce(iter, params):
    from disco.util import kvgroup
    # Group the sorted (ip, octets) pairs by IP and add up the octets of each group
    for ip, traffic in kvgroup(sorted(iter)):
        yield ip, sum(traffic)
Creating the job and linking the map and reduce functions
from disco.core import Job, result_iterator

if __name__ == '__main__':
    job = Job().run(input=["file:///bccd/home/jortiz/netflow-print.txt"],
                    map=map,
                    reduce=reduce)
    # Print the input/output traffic per IP
    for ip, traffic in result_iterator(job.wait(show=True)):
        print(ip, traffic)
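To check what the job computes without a Disco cluster, the same map and reduce logic can be run locally on a few lines of the sample listing. This is only a sketch: map_flow, reduce_pairs, and aggregate are hypothetical names, and itertools.groupby is used here as a stand-in for Disco's kvgroup.

```python
from itertools import groupby
from operator import itemgetter

def map_flow(line):
    # Same logic as the map function: emit the octets for both endpoints.
    data = line.split()
    yield data[0], int(data[5])
    yield data[1], int(data[5])

def reduce_pairs(pairs):
    # Sort by IP, group equal IPs together, and sum the octets of each group.
    for ip, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield ip, sum(octets for _, octets in group)

def aggregate(lines):
    # Run the map step over every line, then the reduce step over all pairs.
    pairs = [pair for line in lines for pair in map_flow(line)]
    return dict(reduce_pairs(pairs))

sample = [
    "167.8.226.10 136.145.33.149 6 80 51965 52 1",
    "136.145.33.149 167.8.226.10 6 51965 80 40 1",
    "72.21.91.79 136.145.144.211 6 80 50126 3147 5",
    "72.21.91.79 136.145.144.211 6 80 50125 2150 4",
]
totals = aggregate(sample)
for ip in sorted(totals):
    print(ip, totals[ip])
```

Each total combines the traffic an IP sent and received, because every flow is counted once for its source and once for its destination.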
NetFlow (RFC 3954), http://www.ietf.org/rfc/rfc3954.txt
Disco project, http://discoproject.org/
Python, www.python.org
MapReduce (Wikipedia), http://en.wikipedia.org/wiki/MapReduce