Java Log File Analysis - Help


#1
Mark Mckinney

I am attempting to write a program that analyzes a web server's log file to determine which computers have attempted to access that web server the most.
Any class in the Java standard library is available for use.

I have done some research to figure out what data structures would help, but I am stumped so far.
I know there are many, many ways to program this.
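For the counting part, a HashMap<String, Integer> keyed by IP address is one natural fit. A minimal sketch, assuming the IP addresses have already been pulled out of each log line; the class and method names (IpCounter, countByIp) are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: counts occurrences of each IP address.
public class IpCounter {
    // Returns a map from IP address to the number of times it appears in the list.
    static Map<String, Integer> countByIp(List<String> ips) {
        Map<String, Integer> counts = new HashMap<>();
        for (String ip : ips) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> ips = List.of("209.129.94.61", "67.218.116.163", "209.129.94.61");
        System.out.println(countByIp(ips).get("209.129.94.61")); // prints 2
    }
}
```

Map.merge does the "insert or increment" step in a single call, so no separate containsKey check is needed.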

I have found information on "hit filters", and I'm convinced I should use try/catch blocks.

Any help would be appreciated. Thanks.

-Mark

#2
wim DC

It depends on what the log file looks like and how structured it is. A regex may be all you need (the Pattern and Matcher classes in Java).

#3
Mark Mckinney

Oh! That might just be the solution. I'm researching it more and will be testing it soon.

Here are the first few lines of the log file; the rest have the same format, just with different addresses.
I'm trying to count the errors for each unique IP and output the 3 IPs with the most erroneous accesses.

[Wed Jun 30 20:02:53 2010] [error] [client 209.129.94.61] File does not exist:/site/hancocktools.com/_vti_bin
[Thu Jul 01 04:57:03 2010] [error] [client 67.218.116.163] File does not exist: C:/site/hypergrade.com/robots.txt
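Once the counts are in a map, the top 3 can be picked by sorting the entries by count. A sketch, assuming a Map<String, Integer> from IP address to error count (for example, one filled with Map.merge); the class name TopIps, the method topN and the sample counts in main are made up for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class: picks the IPs with the highest counts.
public class TopIps {
    // Returns up to n IP addresses, ordered from most to fewest errors.
    static List<String> topN(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                // sort entries by count, highest first
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up counts, only to show the call.
        Map<String, Integer> counts = Map.of(
                "209.129.94.61", 5,
                "67.218.116.163", 2,
                "10.0.0.1", 7,
                "10.0.0.2", 1);
        System.out.println(topN(counts, 3)); // prints [10.0.0.1, 209.129.94.61, 67.218.116.163]
    }
}
```

IPs with equal counts come out in no particular order; add a secondary comparator if a stable tie-break matters.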

#4
wim DC

That ain't too hard :)

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // class name chosen for this example
    public class LogRegexDemo
    {
        public static void main(String[] args)
        {
            String input = "[Wed Jun 30 20:02:53 2010] [error] [client 209.129.94.61] File does not exist:/site/hancocktools.com/_vti_bin\n" +
                    "[Thu Jul 01 04:57:03 2010] [error] [client 67.218.116.163] File does not exist: C:/site/hypergrade.com/robots.txt";
            Pattern pattern = Pattern.compile("\\[(.*?)\\] \\[error\\] \\[client (.*?)\\] (.*)", Pattern.MULTILINE);
            Matcher matcher = pattern.matcher(input);
            while(matcher.find()){
                System.out.println("Date: " + matcher.group(1));
                System.out.println("IP: " + matcher.group(2));
                System.out.println("Error msg: " + matcher.group(3));
                System.out.println("");
            }
        }
    }
Output:

Date: Wed Jun 30 20:02:53 2010
IP: 209.129.94.61
Error msg: File does not exist:/site/hancocktools.com/_vti_bin

Date: Thu Jul 01 04:57:03 2010
IP: 67.218.116.163
Error msg: File does not exist: C:/site/hypergrade.com/robots.txt


#5
Mark Mckinney

Thank you so much for the Pattern + Matcher example!
The only problem is that when I try to use "matcher.find()" on the log file, it doesn't progress past the first IP.
And when I try to use a Scanner, I end up getting myself into an infinite loop.
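find() only steps through the text its Matcher was created with. In post #4 the input string holds two lines, so the loop finds two matches; a matcher built on a single line has only one match to find. A minimal sketch that reads the whole file into one string first, assuming Java 11+ (for Files.readString) and the small.log file name used in this thread; WholeFileScan and findIps are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: scans an entire log held in one String.
public class WholeFileScan {
    private static final Pattern LOG_LINE =
            Pattern.compile("\\[(.*?)\\] \\[error\\] \\[client (.*?)\\] (.*)");

    // Each find() call resumes after the previous match, so this visits every log line.
    static List<String> findIps(String text) {
        List<String> ips = new ArrayList<>();
        Matcher matcher = LOG_LINE.matcher(text);
        while (matcher.find()) {
            ips.add(matcher.group(2)); // group 2 is the client IP
        }
        return ips;
    }

    public static void main(String[] args) throws IOException {
        // Loads the entire file into memory, so this suits small logs best.
        String text = Files.readString(Path.of("small.log"));
        for (String ip : findIps(text)) {
            System.out.println("IP: " + ip);
        }
    }
}
```

Reading the whole file at once is the simplest option for small logs; for large ones, a line-by-line loop keeps memory use flat.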

For some reason I can't step through the log line by line; I must be way too tired at this point.
I know it is basic, and I have done it before, but I'm getting slightly frustrated.


Edit:
Here is my current code, which only prints out one IP address when run.
I'm sure the answer is simple, I just have no energy left in me.


import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogAnal
{
    public static void main(String[] args)
    {
        String input = "";
        try
        {
            FileReader fr = new FileReader("small.log");
            Scanner scanner = new Scanner(fr);
            input = scanner.nextLine();

            Pattern pattern = Pattern.compile("\\[(.*?)\\] \\[error\\] \\[client (.*?)\\] (.*)", Pattern.MULTILINE);
            Matcher matcher = pattern.matcher(input);

            while(scanner.hasNextLine())
            {
                input = scanner.nextLine();

                if(matcher.find())
                    System.out.println("IP: " + matcher.group(2));
            }
        }
        catch (FileNotFoundException e)
        {
            e.printStackTrace();
        }
    }
}



#6
wim DC

By the way, the Pattern.compile(..) call is quite an expensive operation in terms of processing power.
Make sure you do that only once, and call .matcher(..) as many times as you need.
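One common way to follow that advice is to keep the Pattern in a static final field and create a Matcher from it per line. The Pattern Javadoc states that Pattern instances are immutable and safe for concurrent use while Matcher instances are not, so sharing the Pattern and creating Matchers locally is the usual split. A sketch under those assumptions; IpExtractor and extractIp are hypothetical names, and Optional is used here only to signal "no match":

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: extracts the client IP from one log line.
public class IpExtractor {
    // Compiled once when the class is loaded, then reused by every call.
    private static final Pattern LOG_LINE =
            Pattern.compile("\\[(.*?)\\] \\[error\\] \\[client (.*?)\\] (.*)");

    // Creates a fresh Matcher from the shared Pattern for each line.
    static Optional<String> extractIp(String line) {
        Matcher matcher = LOG_LINE.matcher(line);
        return matcher.find() ? Optional.of(matcher.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "[Thu Jul 01 04:57:03 2010] [error] [client 67.218.116.163] "
                + "File does not exist: C:/site/hypergrade.com/robots.txt";
        System.out.println(extractIp(line).orElse("no match")); // prints 67.218.116.163
    }
}
```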

#7
Mark Mckinney

Did you happen to see my edit?
I just woke up. :)

#8
wim DC

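    Matcher matcher = pattern.matcher(input);
is only done once, outside the loop, so the matcher keeps looking at the first line you read. Create it inside the loop, right after each scanner.nextLine().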
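Putting that together, here is a sketch of the corrected loop. It keeps the Scanner and FileReader from Mark's code but creates the Matcher inside the loop, and it drops the extra scanner.nextLine() before the loop, which would otherwise consume the first line without matching it. Calling nextLine() inside the loop body is also what lets hasNextLine() eventually return false; leaving it out is a common cause of the kind of infinite Scanner loop mentioned earlier. LogAnalFixed and readIps are hypothetical names, and try-with-resources is swapped in so the file gets closed:

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical corrected version of the LogAnal program.
public class LogAnalFixed {
    // Compiled once, as suggested earlier in the thread.
    private static final Pattern LOG_LINE =
            Pattern.compile("\\[(.*?)\\] \\[error\\] \\[client (.*?)\\] (.*)");

    // Reads every line, including the first, and matches each one with its own Matcher.
    static List<String> readIps(Scanner scanner) {
        List<String> ips = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine(); // advancing here is what ends the loop
            Matcher matcher = LOG_LINE.matcher(line);
            if (matcher.find()) {
                ips.add(matcher.group(2)); // group 2 is the client IP
            }
        }
        return ips;
    }

    public static void main(String[] args) {
        // try-with-resources closes the Scanner (and the FileReader under it) when done
        try (Scanner scanner = new Scanner(new FileReader("small.log"))) {
            for (String ip : readIps(scanner)) {
                System.out.println("IP: " + ip);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
```

Passing the Scanner into readIps keeps the parsing testable without a real file, since a Scanner can also be built directly from a String.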



