What Is Not An Ids Information Technology Essay

As more people use the Internet, their computers and the valuable data stored on them become more attractive targets for intruders. Attackers scan the Internet constantly, searching for potential vulnerabilities in the machines connected to the network. Intruders aim to gain control of a machine and insert malicious code into it. Later, using these enslaved machines (also called zombies), the intruder may initiate attacks such as worm attacks, Denial-of-Service (DoS) attacks and probing attacks.

1.1. What is an IDS?

An intrusion is any set of actions that threatens the integrity, availability, or confidentiality of a network resource. An intrusion detection system (IDS) monitors network traffic for suspicious activity and alerts the system or network administrator. In some cases the IDS may also respond to anomalous or malicious traffic by taking action such as blocking the user or source IP address from accessing the network.

IDS come in a variety of "flavors" and approach the goal of detecting suspicious traffic in different ways. There are network based (NIDS) and host based (HIDS) intrusion detection systems.

a) NIDS: Network Intrusion Detection Systems (NIDS) are a subset of security management systems that are used to discover inappropriate, incorrect, or anomalous activities within networks.

b) HIDS: A host-based intrusion detection system (HIDS) monitors and analyzes the internals of a computing system rather than the network packets on its external interfaces.

Some IDS detect threats by looking for specific signatures of known threats, similar to the way antivirus software typically detects and protects against malware, while others compare traffic patterns against a baseline and look for anomalies.

a) Signature Based: A signature based IDS monitors packets on the network and compares them against a database of signatures or attributes from known malicious threats. This is similar to the way most antivirus software detects malware. The issue is that there is a lag between a new threat being discovered in the wild and the signature for detecting that threat being applied to the IDS. During that lag time, the IDS is unable to detect the new threat. The approach therefore depends on frequent updates of the signature database and cannot generalize to detect novel or unknown intrusions.

b) Anomaly Based: An anomaly based IDS monitors network traffic and compares it against an established baseline. The baseline identifies what is "normal" for that network (what sort of bandwidth is generally used, what protocols are used, what ports and devices generally connect to each other), and the IDS alerts the administrator or user when it detects traffic that is anomalous, or significantly different from the baseline. However, statistical anomaly detection is not based on an adaptive intelligent model and cannot learn from normal and malicious traffic patterns.
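The two detection styles above can be contrasted in a short Python sketch. It is an illustration only: the `Signature` class, the two example signatures, and the three-standard-deviation threshold are assumptions made for this example and are not taken from any real IDS.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional


@dataclass(frozen=True)
class Signature:
    name: str
    dst_port: Optional[int]  # None matches any destination port
    pattern: bytes           # byte string that must appear in the payload


# Hypothetical signatures, invented for this example only.
SIGNATURES = [
    Signature("example-shell-probe", 80, b"/bin/sh"),
    Signature("example-path-traversal", None, b"../../"),
]


def match_signatures(dst_port, payload, signatures=SIGNATURES):
    """Signature-based check: names of all signatures the packet matches."""
    return [
        sig.name
        for sig in signatures
        if sig.dst_port in (None, dst_port) and sig.pattern in payload
    ]


def build_baseline(samples):
    """Anomaly-based check, step 1: summarize 'normal' traffic measurements."""
    return mean(samples), pstdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Step 2: flag a measurement far from the baseline (in standard deviations)."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold
```

A payload that matches no signature passes the first check unnoticed, which is the lag problem described above, while the second check flags any large deviation regardless of whether the cause is known.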

There are IDS that simply monitor and alert and there are IDS that perform an action or actions in response to a detected threat.

a) Passive IDS: A passive IDS simply detects and alerts. When suspicious or malicious traffic is detected an alert is generated and sent to the administrator or user and it is up to them to take action to block the activity or respond in some way.

b) Reactive IDS: Reactive IDS will not only detect suspicious or malicious traffic and alert the administrator, but will take pre-defined proactive actions to respond to the threat. Typically this means blocking any further network traffic from the source IP address or user.

Intrusion detection systems help network administrators prepare for and deal with network security attacks. These systems collect information from a variety of systems and network sources, and analyze them for signs of intrusion and misuse. A variety of techniques have been employed for analysis ranging from traditional statistical methods to new machine learning approaches.

1.2. What is not an IDS?

Contrary to popular marketing belief and terminology employed in the literature on intrusion detection systems, not everything falls into this category. In particular, the following security devices are not IDS:

Network logging systems, for example network traffic monitoring systems.

Anti-virus products designed to detect malicious software such as viruses, Trojan horses, worms, bacteria, logic bombs.

Firewalls.

Security/cryptographic systems, for example VPN, SSL, S/MIME, Kerberos, Radius etc.

1.3. Attack Types

Attacks can be classified into three types. They are as follows:

a) Reconnaissance: These attacks involve the gathering of information about a system in order to find its weaknesses such as port sweeps, ping sweeps, port scans, and Domain Name System (DNS) zone transfers.

b) Exploits: These attacks take advantage of a known bug or design flaw in the system.

c) Denial-of-Service (DoS): These attacks disrupt or deny access to a service or resource.

1.4. Existing System

One of the most well known and widely used intrusion detection systems is the open source, freely available Snort. It is available for a number of platforms and operating systems including both Linux and Windows. Snort has a large and loyal following and there are many resources available on the Internet where we can acquire signatures to implement to detect the latest threats.

1.5. Problem Statement

The classical signature-based approach:

Cannot detect unknown or new intrusions.

Patches and regular updates are required.

The statistical anomaly-based approach:

Not based on an adaptive intelligent model.

Cannot learn from normal and malicious traffic patterns.

An alternative approach based on machine learning must be developed.

1.6. Objectives

To implement an intrusion detection system using a Naïve Bayes Classifier,

To protect secure information of an organization from outside and inside intruders,

To detect novel or unknown intrusions in real-time.

1.7. Scope of the Project

Increased network complexity, greater access, and a growing emphasis on the Internet have made network security a major concern for organizations. The number of computer security breaches has risen significantly in the last three years. In February 2000, several major web sites including Yahoo, Amazon, eBay, Datek, and E-Trade were shut down due to denial-of-service attacks on their web servers.

Today, a large amount of sensitive information is processed through computer networks, thus it is increasingly important to make information systems, especially those used for critical functions in the military and commercial sectors, resistant and tolerant to network intrusions. Hence Intrusion Detection has become an integral part of the information security process.

2 LITERATURE REVIEW

2.1. The TCP/IP Reference Model

The TCP/IP model is a multi-layered architecture. This means that one piece of functionality runs at one layer, another at the next layer, and so forth. We can add new functionality to the application layer, for example, without having to re-implement the whole TCP/IP stack or include a complete TCP/IP stack in the application itself.

The following four layers comprise the TCP/IP Internet model:

Application layer

Handles implementation of user applications.

Transport layer

Manages end-to-end communications between hosts.

Two transport layer protocols are TCP and UDP.

Network layer

Gets data from source to destination.

Link layer

Manages data transfer to and from physical medium.

Figure 2.1 TCP/IP Internet Model (diagram of a web browser and a web server, each sitting above TCP and an Ethernet driver; the data units exchanged are a stream, a TCP segment, an IP datagram, and an Ethernet frame)

2.1.1. Internet Protocol (IP)

The IP protocol resides in the Internet layer. It is an unreliable and connectionless datagram protocol, that is, a best-effort delivery service. The term best-effort means that IPv4 provides no error control or flow control (except for error detection on the header). IPv4 assumes the unreliability of the underlying layers and does its best to get a transmission through to its destination, but with no guarantees. If reliability is important, IPv4 must be paired with a reliable protocol such as TCP.

IP Header

A datagram is a variable-length packet consisting of two parts: header and data.

The header is 20 to 60 bytes in length and contains information essential to routing and delivery. The header has a 20-byte fixed part and a variable length optional part of maximum of 40-bytes. The header format is shown below:

Each row below is 32 bits wide:

VER (4 bits) | HLEN (4 bits) | Service (8 bits) | Total Length (16 bits)
Identification (16 bits) | Flags (3 bits) | Fragmentation Offset (13 bits)
TTL (8 bits) | Protocol (8 bits) | Header Checksum (16 bits)
Source Address (32 bits)
Destination Address (32 bits)
Options | Padding

Figure 2.2 IP Header Format

IP Header Field Description

Version - bits(0-3). This field is a version number of the IP protocol in binary. IPv4 is represented as 0100, while IPv6 is represented as 0110.

Header length (HLEN) - bits(4-7). This four-bit field defines the total length of the datagram header in four-byte words. This field is needed because the length of the header is variable (between 20 and 60 bytes). When there are no options, the header length is 20 bytes, and the value of this field is five (5 x 4 = 20). When the option field is at its maximum size, the value of this field is 15 (15 x 4 = 60).

Service - bits(8-15). This has two interpretations. They are:

a) Service Type

In this interpretation, the first three bits are called precedence bits. The next four bits are called type of service (TOS) bits, and the last bit is not used.

Table 2.1 Types of Service

TOS Bits    Description
0000        Normal (default)
0001        Minimize cost
0010        Maximize reliability
0100        Maximize throughput
1000        Minimize delay

b) Differentiated Services

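In this interpretation, the first six bits (0-5) carry the Differentiated Services Code Point (DSCP), and the remaining two bits (6-7) were left unused by the original definition (they were later assigned to Explicit Congestion Notification).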

Total Length - bits(16-31). This field gives the size of the whole datagram in octets, header and data included. The maximum size of a single datagram is 65,535 octets. Every host must be able to accept a datagram of at least 576 octets, whether it arrives whole or in fragments.

Identification - bits(32-47). This field is used in the reassembly of fragmented packets.

Flags - bits(48-50). This field contains three flags pertaining to fragmentation. The first bit is reserved and must be set to zero. The second bit is set to zero if the packet may be fragmented and to one if it may not be fragmented. The third and last bit is set to zero if this is the last fragment and to one if more fragments of the same packet follow.

Fragment Offset - bits(51-63). The fragment offset field shows where the fragment belongs in the original datagram. The first fragment has offset zero, and offsets are counted in units of 64 bits (8 bytes).

Time to live - bits(64-71). The TTL field shows how long the packet may live, or how many "hops" it may take to reach its destination. The TTL is decremented by 1 by each router that forwards the packet, and the packet is discarded when the TTL reaches zero.

Protocol - bits(72-79). This field indicates the protocol of the next layer. For example, this may be TCP, UDP or ICMP among others.

Header checksum - bits(80-95). This is a checksum of the IP header of the packet, used for error detection.

Source address - bits(96-127). This field contains the source address.

Destination address - bits(128-159). This field contains the destination address.

Options. If the Header Length is greater than five, i.e. between 6 and 15, the Options field is present and must be considered. It contains optional settings within the header, such as record route, source route, or Internet timestamp options.

Padding - bits variable. This is a padding field that is used to make the header end on a 32-bit boundary. The field must always be set to zeroes straight through to the end.
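The fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. The sketch below is a minimal illustration; the function name and the dictionary keys are choices made for this example, and the options field is not decoded.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are skipped)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_hlen, service, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_hlen >> 4,                 # upper 4 bits of the first byte
        "header_bytes": (ver_hlen & 0x0F) * 4,    # HLEN counts 4-byte words
        "service": service,
        "total_length": total_length,
        "identification": identification,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,  # 8-byte units
        "ttl": ttl,
        "protocol": protocol,                     # e.g. 1 = ICMP, 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```

For a header whose first byte is 0x45, the function reports version 4 and a header length of 5 x 4 = 20 bytes.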

2.1.2. Internet Control Message Protocol (ICMP)

The Internet Control Message Protocol (ICMP) gives important information about the health of the network.

Types of Messages

ICMP messages are divided into two broad categories:

a) error-reporting messages, and

b) query messages.

The error-reporting messages report problems that a router or a host (destination) may encounter when it processes an IP packet. Five types of errors are handled: destination unreachable, source quench, time exceeded, parameter problems, and redirection. The query messages, which occur in pairs, help a host or a network manager get specific information from a router or another host. For example, nodes can discover their neighbors. Also, hosts can discover and learn about routers on their network, and routers can help a node redirect its messages. The four types of query messages are echo request and reply, timestamp request and reply, address-mask request and reply, and router solicitation and advertisement.

ICMP Header

Type (8 bits) | Code (8 bits) | Checksum (16 bits)
Rest of the header
Data section

Figure 2.3 ICMP Header Format

ICMP Header Field Description

Type - This field contains the ICMP type of the packet; each kind of ICMP message has its own type value. The field is eight bits long.

Code - This field gives the code for the ICMP type. An ICMP type may have a single code or several. The field is eight bits long.

Checksum - This is a 16-bit field. The checksum field is treated as zero while the checksum is being calculated.
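As an illustration of the header fields, the sketch below builds an ICMP echo request (type 8, code 0) and fills in the standard Internet checksum, with the checksum field set to zero while the sum is computed, as RFC 792 specifies. The function names are chosen for this example; the identifier and sequence number form the "rest of the header" for echo messages.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    # The checksum field is zero while the checksum is being calculated.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

A receiver can verify the message by running `internet_checksum` over the whole packet: an intact packet yields zero.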

2.1.3. User Datagram Protocol (UDP)

The User Datagram Protocol (UDP) is called a connectionless, unreliable transport protocol. It does not add anything to the services of IP except to provide process-to-process communication instead of host-to-host communication. Also, it performs very limited error checking.

If UDP is so powerless, why would a process want to use it? With the disadvantages come some advantages. UDP is a very simple protocol using a minimum of overhead. If a process wants to send a small message and does not care much about reliability, it can use UDP.

UDP Header

The UDP header can be seen as a very simplified TCP header. It contains the source port, destination port, total length and a checksum, as shown below.

Source Port (16 bits) | Destination Port (16 bits)
Total Length (16 bits) | Checksum (16 bits)

Figure 2.4 UDP Header Format

UDP Header Field Description

Source port - bits(0-15). This is the port number used by the process running on the source host. It is 16 bits long, which means that the port number can range from 0 to 65,535.

Destination port - bits(16-31). This is the port number used by the process running on the destination host. It is also 16 bits long.

Total Length - bits(32-47). It denotes the length of the whole packet in octets, including the header and data portions. The shortest possible packet is 8 octets long.

Checksum - bits(48-63). This field is used to detect errors over the entire user datagram (header plus data).
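Because the UDP header is only four 16-bit fields, decoding it takes a single `struct.unpack` call. The following sketch is illustrative; the function name and the returned keys are assumptions of this example.

```python
import struct


def parse_udp_header(datagram: bytes) -> dict:
    """Split a UDP datagram into its four header fields and its payload."""
    if len(datagram) < 8:
        raise ValueError("a UDP header is 8 bytes long")
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", datagram[:8])
    if length < 8:
        raise ValueError("Total Length includes the 8-byte header")
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,              # header plus data, in octets
        "checksum": checksum,
        "payload": datagram[8:length],
    }
```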

2.1.4. Transmission Control Protocol (TCP)

TCP, like UDP, is a process-to-process (program-to-program) protocol. TCP, therefore, like UDP, uses port numbers. Unlike UDP, TCP is a connection-oriented protocol; it creates a virtual connection between two TCPs to send data. In addition, TCP uses flow and error control mechanisms at the transport level. In brief, TCP is called a connection-oriented, reliable transport protocol. It adds connection-oriented and reliability features to the services of IP.

TCP Header

Each row below is 32 bits wide:

Source Port Address (16 bits) | Destination Port Address (16 bits)
Sequence Number (32 bits)
Acknowledgment Number (32 bits)
HLEN (4 bits) | Reserved (6 bits) | Control flags (6 bits) | Window Size (16 bits)
Checksum (16 bits) | Urgent Pointer (16 bits)
Options and Padding

Figure 2.5 TCP Header Format

TCP Header Field Description

Source port - bit(0-15). This is the source port of the packet which is originally bound directly to a process on the sending system.

Destination port - bit(16-31). This is the destination port of the TCP packet which is originally bound directly to a process on the receiving system.

Sequence Number - bits(32-63). This field numbers the first data byte carried in the segment so that the TCP stream can be properly sequenced. The receiver reports how much of the stream it has received in the Acknowledgment Number field.

Acknowledgment Number - bits(64-95). This field acknowledges data a host has received. When a segment arrives intact, the receiver replies with an ACK whose Acknowledgment number is the sequence number of the next byte it expects, which confirms every byte before it.

Header length or Data Offset - bits(96-99). This four-bit field indicates the number of four-byte words in the TCP header. The length of the header can be between 20 and 60 bytes. Therefore, the value of this field can be between five (5 x 4 = 20) and 15 (15 x 4 = 60).

Reserved - bits(100-105). These bits are reserved for future use.

Control - bits(106-111). This field defines six different control bits or flags:

Table 2.2 Description of flags in the control field

Flag    Description
URG     The value of the urgent pointer field is valid.
ACK     The value of the acknowledgment field is valid.
PSH     Push the data.
RST     Reset the connection.
SYN     Synchronize sequence numbers during connection.
FIN     Terminate the connection.

Window - bits(112-127). This field tells the sender how much data the receiving host is willing to accept at the moment. The receiver sends back an ACK that carries the Acknowledgment number together with a Window value; the window gives the number of bytes, starting at that Acknowledgment number, that the sender may transmit before it receives the next ACK. Each subsequent ACK updates the window that the sender may use.

Checksum - bits(128-143). This field holds a checksum computed over the TCP header and data. The calculation also covers a 96-bit pseudo header containing the source address, destination address, protocol, and TCP length, which helps detect misdelivered segments.

Urgent Pointer - bits(144-159). This field points to the end of the data that is considered urgent. It is used when a connection has important data which should be processed by the receiving end as soon as possible. The sender sets the URG flag and sets the Urgent pointer to indicate where the urgent data ends.

Options: This field has variable length and carries optional header information.

Padding: This field pads the TCP header with zeros until the whole header ends at a 32-bit boundary, so that the data portion of the packet also begins on a 32-bit boundary.
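The fixed part of the TCP header, including the six control flags, can be decoded as sketched below. The function name and the returned keys are illustrative choices, and options are not interpreted.

```python
import struct

# Bit masks of the six control flags within the 16-bit offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_bytes": (offset_flags >> 12) * 4,  # data offset in 4-byte words
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

A connection request, for example, decodes to `flags == {"SYN"}`, while the server's reply carries both SYN and ACK.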

2.2. Naive Bayes Classifier

A Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".

In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In spite of their naive design and apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix. The Naive Bayes algorithm affords fast, highly scalable model building and scoring. It scales linearly with the number of predictors and rows. The build process for Naive Bayes is parallelized. Naive Bayes can be used for both binary and multiclass classification problems.

The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Bayes' Theorem

Bayes' Theorem finds the probability of an event occurring given the probability of another event that has already occurred. If B represents the dependent event and A represents the prior event, Bayes' theorem can be stated as follows.

Prob(B given A) = Prob(A and B)/Prob(A)

To calculate the probability of B given A, the algorithm counts the number of cases where A and B occur together and divides it by the number of cases where A occurs, with or without B.
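The counting procedure can be written out directly. In the sketch below the records are made-up (protocol, label) pairs used only for illustration, and the function name is an assumption of this example.

```python
def conditional_probability(records, given, event):
    """Estimate Prob(event given `given`) by counting matching records."""
    given_count = 0
    both_count = 0
    for record in records:
        if given(record):
            given_count += 1
            if event(record):
                both_count += 1
    if given_count == 0:
        raise ValueError("the conditioning event never occurs in the data")
    return both_count / given_count


# Hypothetical (protocol, label) observations, made up for illustration.
records = [
    ("icmp", "attack"), ("icmp", "attack"), ("icmp", "normal"),
    ("tcp", "normal"), ("tcp", "normal"),
]
p_attack_given_icmp = conditional_probability(
    records,
    given=lambda r: r[0] == "icmp",    # A: the packet is ICMP
    event=lambda r: r[1] == "attack",  # B: the packet belongs to an attack
)
```

Here A occurs three times and A and B occur together twice, so the estimate is 2/3.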

Naive Bayes Algorithm

Let X be a set of instances xi = (a1, a2, …, an)

Let V be a set of classifications vj

Naive Bayes assumption:

$P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_{i=1}^{n} P(a_i \mid v_j)$ (2.1)

This leads to the following algorithm:

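Naive_Bayes_Learn ( examples )
    for each target value vj
        estimate P ( vj )
        for each attribute value ai of each attribute a
            estimate P ( ai | vj )

Classify_New_Instance ( x )
    return the vj in V that maximizes P ( vj ) multiplied by the product of P ( ai | vj ) over the attribute values ai of x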

We generally estimate P ( ai | vj ) using m-estimates:

$P(a_i \mid v_j) = \dfrac{n_c + m\,p}{n + m}$ (2.2)

where:

n = the number of training examples for which v = vj

nc = number of examples for which v = vj and a = ai

p = a priori estimate for P ( ai | vj )

m = the equivalent sample size
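The learning and classification steps, with the m-estimate, can be sketched in a few lines of Python. This is a minimal illustration and not the project's implementation: the class name is hypothetical, the prior p is assumed to be uniform over the values seen for each attribute, m is assumed to be greater than zero, and log-probabilities are summed instead of multiplying probabilities to avoid numeric underflow.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    def __init__(self, m=1.0):
        self.m = m  # equivalent sample size; assumed > 0

    def fit(self, instances, labels):
        """Count class frequencies and per-class attribute value frequencies."""
        self.class_counts = Counter(labels)      # n for each class vj
        self.total = len(labels)
        self.value_counts = defaultdict(Counter)  # (vj, attribute index) -> value counts
        self.attr_values = defaultdict(set)       # attribute index -> values seen
        for x, v in zip(instances, labels):
            for i, a in enumerate(x):
                self.value_counts[(v, i)][a] += 1
                self.attr_values[i].add(a)
        return self

    def p_attr(self, i, a, v):
        """m-estimate of P(ai | vj): (nc + m*p) / (n + m)."""
        n = self.class_counts[v]
        nc = self.value_counts[(v, i)][a]
        p = 1.0 / len(self.attr_values[i])  # uniform prior: an assumption
        return (nc + self.m * p) / (n + self.m)

    def predict(self, x):
        """Return the class maximizing P(vj) * product of P(ai | vj)."""
        def log_score(v):
            score = math.log(self.class_counts[v] / self.total)
            for i, a in enumerate(x):
                score += math.log(self.p_attr(i, a, v))
            return score
        return max(self.class_counts, key=log_score)
```

Because of the m-estimate, an attribute value never seen with a class still receives a small non-zero probability, so a single unseen value does not force the whole product to zero.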

2.3. Some Well-Known Attacks

2.3.1. DoS

A denial of service (DoS) attack, or distributed denial of service (DDoS) attack, is an attempt to make a computer resource unavailable to its intended users. Perpetrators of DoS attacks typically target sites or services hosted on high-profile web servers such as banks, credit card payment gateways, etc. The term is generally used with regard to computer networks, but is not limited to this field; for example, it is also used in reference to CPU resource management.

One common method of attack involves saturating the target (victim) machine with external communications requests, such that it cannot respond to legitimate traffic, or responds so slowly as to be rendered effectively unavailable. In general terms, DoS attacks are implemented by either forcing the targeted computer(s) to reset, or consuming its resources so that it can no longer provide its intended service or obstructing the communication media between the intended users and the victim so that they can no longer communicate adequately.

Denial-of-service attacks are considered violations of the IAB's Internet proper use policy, and also violate the acceptable use policies of virtually all Internet Service Providers. They also commonly constitute violations of the laws of individual nations.

There are many types of denial of service (or DoS) attacks. Some DoS attacks (like a smurf, mailbomb, or neptune attack) abuse a perfectly legitimate feature. Others (Ping of Death, teardrop) create malformed packets that confuse the TCP/IP stack of the machine that is trying to reconstruct the packet. Still others (syslogd, back, apache2) take advantage of bugs in a particular network daemon.

Some Captured DoS attacks are as follows:

Smurf

Neptune

Teardrop

Pod

Land

Nuke

Smurf

The smurf attack is a way of generating significant computer network traffic on a victim network. It is a type of denial-of-service attack that floods a target system via spoofed broadcast ping messages.

A smurf attack uses ICMP echo request packets directed to IP broadcast addresses from remote locations to create a denial of service. There are three participants in these attacks: the attacker, the intermediary, and the victim (note that the intermediary can also be a victim). The attacker sends ICMP "echo request" packets to the broadcast address (xxx.xxx.xxx.255) of many subnets, with the source address spoofed to be that of the intended victim. Every machine listening on those subnets then sends an ICMP "echo reply" packet to the victim. The smurf attack is effective because the attacker uses broadcast addresses to amplify what would otherwise be a rather innocuous ping flood.

This amplification effect is shown in Figure 2.6. The attacking machine sends a single spoofed packet to the broadcast address of some network, and every machine on that network responds by sending a packet to the victim machine. Because there can be as many as 255 machines on an Ethernet segment, the attacker can generate a flood of ping packets up to 255 times larger than would otherwise be possible. The figure is a simplified form of the smurf attack: in an actual attack, the attacker sends a stream of ICMP "echo request" packets to the broadcast addresses of many subnets, resulting in a large, continuous stream of "echo reply" packets that flood the victim.

Figure 2.6 Smurf attack (the attacker sends one echo request across the Internet to a broadcast address; each host on that network sends an echo reply to the victim, which is flooded by hundreds of echo replies)
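One simple way to notice this pattern at the victim's side is to count echo replies that arrive without a matching echo request. The sketch below is a heuristic invented for illustration, not the detection method of this project; the packet tuples are an assumed input format. ICMP type 8 is an echo request and type 0 is an echo reply.

```python
from collections import Counter

ECHO_REPLY = 0
ECHO_REQUEST = 8


def unsolicited_echo_replies(packets):
    """Count echo replies per destination that answer no observed request.

    `packets` is an iterable of (icmp_type, src, dst) tuples in arrival order.
    """
    requested = set()        # (requester, target) pairs seen so far
    unsolicited = Counter()
    for icmp_type, src, dst in packets:
        if icmp_type == ECHO_REQUEST:
            requested.add((src, dst))
        elif icmp_type == ECHO_REPLY and (dst, src) not in requested:
            unsolicited[dst] += 1
    return unsolicited
```

A host that receives hundreds of replies it never asked for, from many different sources, matches the amplification picture described above.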

Teardrop

A teardrop attack is a denial of service attack. The teardrop attack uses IP to create packet reassembly problems so the target computer crashes. The teardrop attack uses erroneous packet header information indicating overlapping fragments of packets so some data in some packets must overwrite data in other packets to re-assemble the packet. Attempts to re-assemble these packets with overlapping data can cause the computer to crash if the software is not prepared to handle erroneous packet header information.
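Overlapping fragments can be recognized from the offset and length of each fragment alone. The following sketch assumes the fragments of one datagram have already been grouped and are given as (offset in bytes, payload length) pairs; that input format is an assumption of this example.

```python
def has_overlapping_fragments(fragments):
    """Return True if any fragment starts before the previous data ends.

    `fragments` is an iterable of (offset_bytes, payload_length) pairs that
    belong to the same datagram. Note that an exact duplicate of a fragment
    is also reported as an overlap by this simple check.
    """
    end_of_previous = 0
    for offset, length in sorted(fragments):
        if offset < end_of_previous:
            return True
        end_of_previous = max(end_of_previous, offset + length)
    return False
```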

Neptune

Neptune (also called a SYN flood attack) is a denial of service attack to which every TCP/IP implementation is vulnerable. To detect a Neptune attack, network traffic is monitored for a large number of simultaneous SYN packets destined for a particular machine. The host named as the sender of these packets is usually unreachable.

Each half-open TCP connection made to a machine causes the "tcpd" server to add a record to the data structure that stores information describing all pending connections. This data structure is of finite size, so it can be made to overflow by intentionally creating too many partially open connections. Once it overflows, the victim server is unable to accept any new incoming connections until the table is emptied out. Normally there is a timeout associated with a pending connection, so the half-open connections eventually expire and the victim server recovers. However, the attacking system can simply continue sending IP-spoofed packets requesting new connections faster than the victim system can expire the pending ones. In some cases, the system may crash, exhaust memory or be rendered otherwise inoperative.
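The monitoring idea can be illustrated by tracking connections for which a SYN was seen but no completing ACK followed. This is a simplified sketch: the event format is assumed for the example, only client-to-server packets are considered, and timeouts are ignored.

```python
from collections import defaultdict


def half_open_counts(events):
    """Count pending (half-open) connections per server endpoint.

    `events` is an iterable of (flags, src, src_port, dst, dst_port) tuples
    for client-to-server packets, where `flags` is a set of flag names.
    """
    pending = defaultdict(set)  # (dst, dst_port) -> {(src, src_port), ...}
    for flags, src, src_port, dst, dst_port in events:
        server = (dst, dst_port)
        client = (src, src_port)
        if "SYN" in flags and "ACK" not in flags:
            pending[server].add(client)      # connection request seen
        elif "ACK" in flags or "RST" in flags:
            pending[server].discard(client)  # handshake completed or aborted
    return {server: len(clients) for server, clients in pending.items() if clients}
```

A server endpoint whose count keeps growing while few handshakes complete is a candidate for the overflow condition described above.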

POD

A ping of death (abbreviated "POD") is a type of attack on a computer that involves sending a malformed or otherwise malicious ping to a computer. A ping is normally 64 bytes in size (or 84 bytes when IP header is considered); many computer systems cannot handle a ping larger than the maximum IP packet size, which is 65,535 bytes. Sending a ping of this size can crash the target computer.

Traditionally, this bug has been relatively easy to exploit. Generally, sending a 65,536 byte ping packet is illegal according to networking protocol, but a packet of such a size can be sent if it is fragmented; when the target computer reassembles the packet, a buffer overflow can occur, which often causes a system crash.

This exploit has affected a wide variety of systems, including Unix, Linux, Mac, Windows, printers, and routers. However, most systems since 1997-1998 have been fixed, so this bug is mostly historical.

In recent years, a different kind of ping attack has become wide-spread - ping flooding simply floods the victim with so much ping traffic that normal traffic fails to reach the system (a basic denial-of-service attack).

Land

A land attack occurs when an attacker sends a spoofed SYN packet in which the source address is the same as the destination address. The attack works because it causes the machine to reply to itself continuously. Directed against vulnerable systems, this attack causes them to lock up or become unstable.
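The defining condition is easy to test per packet. A minimal sketch, with an assumed argument format in which `flags` is a set of flag names:

```python
def is_land_packet(src_ip, dst_ip, flags):
    """Return True for a SYN packet whose source equals its destination."""
    return "SYN" in flags and src_ip == dst_ip
```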

Nuke

Nuke is an old DoS attack against computer networks. It consists of fragmented or otherwise invalid ICMP packets sent to the target, typically by using a modified ping utility to send the corrupt data repeatedly, which slows down the affected computer until it comes to a complete stop.

2.3.2. Probe

Probing is a class of attacks in which an attacker scans a network of computers to collect information or find known vulnerabilities. An intruder with a map of machines and services that are present on a network can use this information to look for exploits. There are different types of probing: some of them abuse the computer's legitimate features; other ones use social engineering techniques. This class of attacks is the most commonly heard and requires very little technical expertise. Examples are Ipsweep, Mscan, Nmap, Saint, Satan, ping-sweep and Portsweep attacks.

Following are the captured attacks.

Satan

Ipsweep

Portsweep

Nmap

Nmap

Nmap is a "Network Mapper", used to discover computers and services on a computer network, thus creating a "map" of the network. Just like many simple port scanners, Nmap is capable of discovering passive services on a network despite the fact that such services aren't advertising themselves with a service discovery protocol. In addition Nmap may be able to determine various details about the remote computers. These include operating system, device type, uptime, software product used to run a service, exact version number of that product, presence of some firewall techniques and, on a local area network, even vendor of the remote network card.

Nmap can be used for black hat hacking, or attempting to gain unauthorized access to computer systems. It would typically be used to discover open ports which are likely to be running vulnerable services, in preparation for attacking those services with another program.

System administrators often use Nmap to search for unauthorized servers on their network, or for computers which don't meet the organization's minimum level of security.

Satan

Satan is a probing intrusion which automatically scans a network of computers to collect information or find out known vulnerabilities.

SATAN is an early predecessor of the SAINT scanning program. While SAINT and SATAN are quite similar in design and purpose, the particular vulnerabilities that each tool checks for are slightly different [4]. Like SAINT, SATAN is distributed as a collection of C and Perl programs that can be run either from within a web browser or from the UNIX command prompt. SATAN supports three levels of scanning: light, normal and heavy. The vulnerabilities that SATAN checks for in heavy mode are:

REXD access

tftp file access

remote shell access

unrestricted NFS export

unrestricted X Server access

write-able ftp home directory

several Sendmail vulnerabilities

several ftp vulnerabilities

NFS export to unprivileged programs

NFS export via portmapper

NIS password file access

Smaller subsets of these vulnerabilities are checked in light mode and normal mode.

Ipsweep

An Ipsweep attack is launched to determine which hosts are listening on a network. This information is useful to an attacker in searching for vulnerable machines and staging attacks. An attacker can perform an Ipsweep in many ways. The most common method, and the one used in the simulation, is to send ICMP ping packets to every possible address within a subnet and wait to see which machines respond.
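From the defender's side, a sweep of this kind shows up as one source sending echo requests to many different hosts. The Python sketch below illustrates that idea only; it is not the detection method of this project. The function name `find_sweepers`, the threshold `min_targets` and the sample addresses are all hypothetical.

```python
from collections import defaultdict


def find_sweepers(echo_requests, min_targets=10):
    """Return sources that pinged at least `min_targets` distinct hosts.

    echo_requests: iterable of (source_ip, destination_ip) pairs, one per
    observed ICMP echo request. `min_targets` is an illustrative threshold,
    not a value taken from the project.
    """
    targets = defaultdict(set)
    for src, dst in echo_requests:
        targets[src].add(dst)
    return sorted(src for src, dsts in targets.items() if len(dsts) >= min_targets)


# Hypothetical traffic: one host pings 14 addresses, another pings one server.
sweep = [("10.0.0.99", f"192.168.1.{i}") for i in range(1, 15)]
normal = [("10.0.0.5", "192.168.1.1")] * 20
suspects = find_sweepers(sweep + normal)
```

Counting distinct destinations rather than packets keeps a host that pings a single server repeatedly from being flagged.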

Portsweep

Port Sweep is a network testing tool that lets an attacker learn a great deal about the Internet and how it functions. It combines several applications in one, so that results are obtained more efficiently and with less effort. The attacker can gather information about one computer and about other computers connected to the Internet. The tool can find information such as location and network type for a given computer, identified by IP address, server or e-mail address. The attacker can also sweep a network to see whether any open ports are waiting to be exploited and what data is being sent.

2.4. jNetPcap

jNetPcap is a Java wrapper around the libpcap and WinPcap native libraries found on various Unix and Windows platforms. jNetPcap exposes their functionality as a Java application programming interface (API) for capturing packets on the network.

The main classes which implement libpcap and WinPcap functionality are:

org.jnetpcap.Pcap class - core libpcap methods available on all platforms

org.jnetpcap.winpcap.WinPcap class - extensions based on the WinPcap library, typically available only on Windows-based systems

The core libpcap implementation in jNetPcap provides methods to:

Find a complete list of network interfaces the system has

Open either a network interface or a PCAP capture file for reading packets

Apply a packet filter

Dump packets into a PCAP capture file

Transmit raw link layer packets over a network interface

Gather statistics on network interface and report counters
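To make the "PCAP capture file" in the list above concrete, the following Python sketch writes and reads the classic capture file layout: a 24-byte global header followed by one 16-byte record header per packet. It does not call jNetPcap or libpcap. It assumes the little-endian, microsecond-resolution variant of the format, and the function names `write_pcap` and `read_pcap` are hypothetical.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # microsecond-resolution capture file
# magic, version major, version minor, thiszone, sigfigs, snaplen, link type
GLOBAL_HEADER = struct.Struct("<IHHiIII")
# ts_sec, ts_usec, captured length, original length
RECORD_HEADER = struct.Struct("<IIII")


def write_pcap(stream, packets, linktype=1, snaplen=65535):
    """Write (ts_sec, ts_usec, data) tuples; link type 1 is Ethernet."""
    stream.write(GLOBAL_HEADER.pack(PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype))
    for ts_sec, ts_usec, data in packets:
        stream.write(RECORD_HEADER.pack(ts_sec, ts_usec, len(data), len(data)))
        stream.write(data)


def read_pcap(stream):
    """Return the list of (ts_sec, ts_usec, data) tuples stored in stream."""
    header = stream.read(GLOBAL_HEADER.size)
    if GLOBAL_HEADER.unpack(header)[0] != PCAP_MAGIC:
        raise ValueError("not a little-endian microsecond PCAP file")
    packets = []
    while True:
        record = stream.read(RECORD_HEADER.size)
        if len(record) < RECORD_HEADER.size:
            break  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = RECORD_HEADER.unpack(record)
        packets.append((ts_sec, ts_usec, stream.read(incl_len)))
    return packets


# Round trip through an in-memory buffer with two made-up payloads.
captured = [(1000, 5, b"\x01\x02\x03"), (1001, 0, b"hello")]
buffer = io.BytesIO()
write_pcap(buffer, captured)
buffer.seek(0)
restored = read_pcap(buffer)
```

Real capture files may also be big-endian or use nanosecond timestamps; a fuller reader would inspect the magic number to decide, which this sketch leaves out.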

2.5. jSMILE

jSMILE is a platform independent library of Java classes for reasoning in graphical probabilistic models, such as Bayesian networks and influence diagrams. It can be embedded in programs that use graphical probabilistic models as their reasoning engines.

jSMILE needs only an installed JRE, so it can be used to create stand-alone applications, applets, and servlets. Model building and inference are under full control of the application program, as the jSMILE library serves merely as a set of tools and structures that facilitates them.

3 SYSTEM DESIGN

Our aim is to design and develop an Intelligent Network Intrusion Detection System (INIDS) that is accurate, low in false alarms, not easily fooled by small variations in patterns, adaptive, and able to operate in real time.

Attributes Used

For our INIDS, we have extracted 18 features from tcpdump files which can identify packet characteristics. Counting each of the six TCP flags separately, the features are:

protocol type,

ip length,

don't fragment flag(df),

more fragment flag(mf),

fragmentation offset,

syn flood,

urgent pointer,

tcp flags(urg, ack, psh, rst, syn, fin),

tcp window size,

udp checksum,

icmp flood,

icmp checksum, and

type (packet is normal or attack)
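Several of the features listed above come straight from the fixed 20-byte IPv4 header. The Python sketch below shows how protocol type, IP length, the DF and MF flags and the fragmentation offset can be read from raw header bytes. It is an illustration, not the project's extraction code; the function name `parse_ipv4_header` and the sample header are hypothetical.

```python
import struct


def parse_ipv4_header(header):
    """Extract header-based features from the first 20 bytes of an IPv4 packet."""
    (_version_ihl, _tos, total_length, _ident, flags_fragment,
     _ttl, protocol, _checksum, _src, _dst) = struct.unpack(
        "!BBHHHBBH4s4s", header[:20])
    return {
        "protocol_type": protocol,                  # 1 = ICMP, 6 = TCP, 17 = UDP
        "ip_length": total_length,                  # total length in bytes
        "df": (flags_fragment >> 14) & 1,           # don't fragment flag
        "mf": (flags_fragment >> 13) & 1,           # more fragments flag
        "fragment_offset": flags_fragment & 0x1FFF, # in units of 8 bytes
    }


# Made-up ICMP packet header: 84 bytes long, DF set, not fragmented.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 84, 0, 0x4000, 64, 1, 0,
                     bytes([192, 168, 1, 1]), bytes([192, 168, 1, 2]))
features = parse_ipv4_header(sample)
```

The TCP, UDP and ICMP features would be read in the same way from the transport header that follows the IP header.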

3.1. System Block Diagram

[Diagram labels: Network, Sniffer, Detector, File System, Knowledge Based Engine, TrainingDataSet, Captured, Normal, Attack, Trained]

Figure 3.1 System Block Diagram

3.2. Data Flow Diagrams (DFDs)

A DFD is a structured, diagrammatic technique for showing the functions performed by a system and the data flowing into, out of, and within it.

The 'Context Diagram' or 'level-0 DFD' is an overall, simplified view of the target system, which contains only one process box and the primary inputs and outputs.

Figure 3.2 Level-0 DFD

The 'level-1 DFD' shows all processes at the first level of numbering, data stores, external entities and the data flows between them. The purpose of this level is to show the major high-level processes of the system and their interrelation.

Figure 3.3 Level-1 DFD

The 'level-2 DFD' is a decomposition of a process shown in a level-1 diagram. Here we have decomposed the "inference engine" process.

Figure 3.4 Level-2 DFD

3.3. Unified Modeling Language (UML)

UML is now the most widely used graphical representation scheme for modeling object-oriented systems. An attractive feature of the UML is its flexibility. The UML is extensible and is independent of any particular OOAD process. We have created a use case diagram to model the interactions of network administrators and crackers with their use cases.

Figure 3.5 Use Case Diagram

4 METHODOLOGY

To develop our system, we have adopted the traditional waterfall model. The waterfall model is a sequential software development process, in which progress is seen as flowing steadily downwards like a waterfall through the phases of Conception, Analysis, Design, Construction, Testing and Maintenance. To follow the waterfall model, one proceeds from one phase to the next in a sequential manner. For example, when the requirements are fully completed, one proceeds to design. When the design is fully completed, an implementation of that design is made by coders. Towards the later stages of this implementation phase, the separate software components produced are combined to introduce new functionality and reduce risk through the removal of errors. Thus the waterfall model maintains that one should move to a phase only when its preceding phase is completed and perfected.

As this project is knowledge-based, a sizeable proportion of time was spent researching strategies for implementation. To reach our goal, we consulted several books and websites and drew on valuable suggestions from friends and seniors. We studied existing systems applied in several fields and identified their characteristics, applicability and limitations. In this regard, the existing intrusion detection system Snort became our inspiration: it is signature-based, relies on signatures extracted by human experts, and fails to detect unknown intrusions.

A learning algorithm is good if it produces hypotheses that do a good job of predicting the classifications of unseen examples. First we train our model with a training dataset and then we test it with a test dataset. So, it is convenient to adopt the following methodology:

Collect a large set of examples.

Divide it into two disjoint sets: the training set and the test set.

Apply the learning algorithm to the training set.

Measure the percentage of examples in the test set that are correctly classified.
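The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names `split_examples`, `accuracy` and `train_majority` are hypothetical, the 30% test fraction is arbitrary, the data is made up, and a trivial majority-class learner stands in for the real learning algorithm.

```python
import random
from collections import Counter


def split_examples(examples, test_fraction=0.3, seed=0):
    """Shuffle the examples and divide them into disjoint training and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_majority(training_set):
    """Placeholder learner: always predict the most common training label."""
    majority = Counter(label for _features, label in training_set).most_common(1)[0][0]
    return lambda features: majority


def accuracy(classify, test_set):
    """Percentage of test examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_set if classify(features) == label)
    return 100.0 * correct / len(test_set)


# Made-up examples: (features, label) pairs, one quarter labelled as attacks.
examples = [((i,), "attack" if i % 4 == 0 else "normal") for i in range(100)]
training_set, test_set = split_examples(examples)
classifier = train_majority(training_set)
score = accuracy(classifier, test_set)
```

Keeping the two sets disjoint matters because accuracy measured on examples the learner has already seen overstates how well it handles unseen traffic.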

For the training and testing of our INIDS, we have used the 1998 DARPA dataset provided by MIT Lincoln Laboratory. It is a widely used dataset for training and testing intrusion detection systems. It provides around 4 gigabytes of compressed tcpdump data covering 7 weeks of network traffic. Each week has five days, and each day has its own tcpdump data. It also provides a tcpdump list file, which labels every flow as attack or not. Every entry consists of the flow identifier number, date, time when the first packet of the flow arrived, duration, service name, source port number, destination port number, source IP address, destination IP address, attack score, and the name of the attack. With this file, we are able to recognize which flows are attacks and to extract the corresponding data from the tcpdump data.
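A list file entry with the fields named above could be parsed as in the Python sketch below. Several things here are assumptions rather than facts from this report: that the fields are separated by whitespace and appear in the order listed, that a normal flow carries an attack score of 0 and the name "-", and the two sample lines themselves, which are invented for illustration. The names `FlowRecord` and `parse_list_line` are hypothetical.

```python
from typing import NamedTuple


class FlowRecord(NamedTuple):
    flow_id: int
    date: str
    start_time: str
    duration: str
    service: str
    src_port: str      # kept as text because some flows may have no port
    dst_port: str
    src_ip: str
    dst_ip: str
    attack_score: str
    attack_name: str

    @property
    def is_attack(self):
        # Assumption: a score of "0" marks a normal flow.
        return self.attack_score != "0"


def parse_list_line(line):
    """Split one whitespace-separated list file entry into a FlowRecord."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    return FlowRecord(int(fields[0]), *fields[1:])


# Invented sample entries in the assumed layout.
normal_line = "1 01/27/1998 00:00:01 00:00:23 ftp 1755 21 192.168.1.30 192.168.0.20 0 -"
attack_line = "2 01/27/1998 00:05:10 00:00:01 ecr/i - - 10.0.0.99 192.168.1.7 1 ipsweep"
normal_flow = parse_list_line(normal_line)
attack_flow = parse_list_line(attack_line)
```

With records in this form, the flows marked as attacks can be selected and matched against the packets in the corresponding tcpdump file.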

The first and second weeks of training data consist of normal traffic, and the other weeks consist of mixed traffic, i.e. normal traffic and attack traffic. To train our intrusion detection system, we extracted normal traffic from the outside tcpdump of Wednesday and Thursday of the second week. Similarly, we extracted attack traffic from the other weeks' traffic. We used the editcap tool to split the huge tcpdump files and Wireshark to filter the desired packets.

For our INIDS, we have extracted 18 features from tcpdump files which can identify packet characteristics. The features have to be preprocessed to suit the naive Bayes algorithm, because the form of the algorithm used here cannot handle continuous values. So, while building the dataset, the continuous features are discretized. The dataset is then used to train the naive Bayes classifier. During inference, we extract all the features of each packet and feed them to the naive Bayes classifier, which calculates the probability that the packet is normal; based on a threshold, the packet is classified as normal or attack.
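The pipeline described above (discretize, train, compute the probability that a packet is normal, compare with a threshold) can be sketched in plain Python. This is not the project's jSMILE-based implementation. The bin edges, the 0.5 threshold, the Laplace smoothing, the two-feature rows and all names (`discretize`, `NaiveBayes`, `classify`) are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def discretize(value, edges):
    """Map a continuous value to a bin index: the number of edges it reaches."""
    return sum(value >= edge for edge in edges)


class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.values_seen = defaultdict(set)       # feature index -> distinct values

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values_seen[i].add(value)
        return self

    def posterior(self, row):
        """Return P(label | row) for every label, assuming independent features."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                # Laplace smoothing so unseen values do not zero the product.
                score += math.log((seen + 1) / (count + len(self.values_seen[i])))
            log_scores[label] = score
        peak = max(log_scores.values())
        weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}


def classify(model, row, threshold=0.5):
    """Label a packet normal when P(normal | row) reaches the threshold."""
    return "normal" if model.posterior(row).get("normal", 0.0) >= threshold else "attack"


# Made-up training rows: (protocol, discretized IP length bin).
rows = [("tcp", 1), ("tcp", 1), ("udp", 0), ("tcp", 2),
        ("icmp", 3), ("icmp", 3), ("icmp", 2)]
labels = ["normal"] * 4 + ["attack"] * 3
model = NaiveBayes().fit(rows, labels)
length_bin = discretize(84, [64, 512, 1500])
verdict = classify(model, ("tcp", length_bin))
```

Raising the threshold makes the classifier demand stronger evidence before calling a packet normal, which trades more false alarms for fewer missed attacks.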

5 IMPLEMENTATION

5.1. Object-Oriented Design

In this technique, the objects that occur in the problem domain and the solution domain are first identified, along with the different kinds of relationships that exist among them. This object structure is further refined to obtain the detailed design. This approach has several advantages, such as less development effort and time, and better maintainability.

During this implementation phase, each component of the design is implemented as a program module, and each of these program modules is unit tested, debugged and documented.

Tools Used:

Netbeans 6.5 IDE

API Used:

JSmile API

JNetPcap

Language Used:

Java

System Installation Requirement:

Operating System - Windows XP, Windows Vista, Windows 7

CPU - 500 MHz (or above)

Memory - 128MB (or above)

6 TESTING

Testing is necessary to determine whether the modules and the system as a whole are working properly.

6.1. Level of Testing

While implementing our system, we go through various levels of testing which are as follows:

a) Unit Testing: The purpose of unit testing is to determine the correct working of the individual modules.

b) Integration Testing: During this phase the different modules are integrated in a planned manner. The different modules making up a system are never integrated in a single shot. Integration is normally carried out through a number of steps. During each integration step, the partially integrated system is tested.

c) System Testing: Finally when all the modules have been successfully integrated and tested, system testing is carried out.

6.2. Software Testing Strategies

Two of the most prevalent strategies that we performed are black-box testing and white-box testing.

a) Black-Box testing: Demonstrates that the software functions are operational, that input is properly accepted and that output is correctly produced.

b) White-Box testing: Examines the fundamental aspect of the system with complete information and access to the internal logical structure, code and algorithms.

A lot of features are still to be added in our project. There are many limitations which are still to be corrected. Before releasing the final version of software, alpha testing, beta testing and acceptance testing can be done additionally.

7 RESULT

7.1. Screenshots

Figure 7.1 Naive Bayes Classifier

Figure 7.2 GUI Layout

Figure 7.3 Detection of normal packets only

Figure 7.4 Detection of normal as well as anomalous packets

Figure 7.5 Viewing only anomalous packets

7.2. Comparison with Other Existing System

Our INIDS can be compared with an existing IDS such as Snort, which is regarded as an ideal intrusion detection system. Snort is signature-based, whereas our system is machine learning-based. For known attacks Snort is better, whereas for unknown attacks our system is better. Snort is configured from the command line, whereas our system is configured through a GUI. As a result, our system is easier to use.

[Charts comparing SNORT and INIDS on a Low to High scale]

Figure 7.6 Accuracy of known attack

Figure 7.7 Accuracy of unknown attack

Figure 7.8 Ease of Use

8 CONCLUSIONS AND FURTHER WORK

8.1. Conclusions

We completed a project on detecting network intrusions using the Naive Bayes algorithm. Because it learns from data, the completed system can detect novel attacks that the existing system, Snort, does not detect. Snort provides high accuracy on known attacks, but it is more time consuming because it requires regular updates. Our system can detect intrusions more efficiently and with less time spent.

Through this project we learned to work as a team, to divide tasks and to cooperate on them. The successful work not only made us proud but also made us good companions. In this way we completed our project successfully.

8.2. Further Work

Our system works only for IPv4 networks. In future, it can be extended to IPv6 networks. We have analyzed only packet headers, so our system cannot detect "Exploits" intrusions. Payload analysis features could be added to the system in future.

A naïve Bayesian network is a restricted network that has only two layers and assumes complete independence between the information nodes. This poses a limitation to this research work. To alleviate this problem and so reduce false positives, active platform or event based classification using a Bayesian network may be considered. We continue our work in this direction in order to build an efficient intrusion detection model.
