I am a beginner in Unix shell programming. My current problem is analysing a log file over a time range given by the user. The file is long, and I need to list and sort the IPs that fall within a range the user enters on the terminal. Since the log is sorted by date, I assume I should start from the end of the file: convert the last date to a timestamp (date +%s), subtract the value the user supplies with the switch -H (hours = h*3600) or -D (days = d*24*3600), and then compare entries going backwards from the end of the file until the cutoff is reached. Any help with an example would be appreciated:
Example: the user inputs -H 12
last date in logfile = last row in logfile = 22 Oct 2002 21:02:33 +0200
Convert it to a timestamp using: date -d "22 Oct 2002 21:02:33 +0200" +%s
timestamp - (12*3600) = X, which is the date 12 hours earlier, so you need all records from the end of the logfile back to that date.
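A rough sketch of that arithmetic, assuming bash and GNU date. The function names window_seconds and cutoff_epoch are made up for illustration and are not part of any existing script in this thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: parse -H <hours> and/or -D <days> and print
# the length of the requested window in seconds.
window_seconds() {
    local OPTIND=1 opt secs=0
    while getopts "H:D:" opt; do
        case "$opt" in
            H) secs=$(( secs + OPTARG * 3600 )) ;;    # hours -> seconds
            D) secs=$(( secs + OPTARG * 86400 )) ;;   # days  -> seconds
            *) echo "usage: -H hours | -D days" >&2; return 1 ;;
        esac
    done
    echo "$secs"
}

# Hypothetical helper: cutoff_epoch LAST_EPOCH [switches]
# Subtracts the window from the timestamp of the last log entry.
cutoff_epoch() {
    local last=$1 win
    shift
    win=$(window_seconds "$@") || return 1
    echo $(( last - win ))
}

# Usage with the example date from the post (GNU date syntax):
last=$(date -d "22 Oct 2002 21:02:33 +0200" +%s)
cutoff_epoch "$last" -H 12
```

Every record whose timestamp is greater than or equal to the printed value falls inside the requested window.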
The format example of the log file for each line is as follows:
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
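For a line in this format, one possible way to get a timestamp is to cut out the bracketed field and reshape it into the "31 Mar 2002 19:30:41 +0200" form that GNU date accepts. This is only a sketch: log_epoch is a hypothetical name, and GNU date is assumed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print epoch seconds for the
# [dd/Mon/yyyy:hh:mm:ss +zzzz] field of one log line.
log_epoch() {
    local stamp=${1#*\[}      # drop everything up to the first '['
    stamp=${stamp%%\]*}       # drop everything from the first ']'
    # 31/Mar/2002:19:30:41 +0200 -> 31 Mar 2002 19:30:41 +0200
    stamp=$(printf '%s\n' "$stamp" | sed -e 's|/| |g' -e 's|:| |')
    date -d "$stamp" +%s      # GNU date; the +0200 offset is honoured
}

log_epoch '172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET / HTTP/1.1" 200 123'
```

Only the first colon is replaced, so the time of day stays intact while the date and time become separate words.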
I have, however, managed to sort and group IPs without any range using uniq and sort, but filtering by date and user switches (like -H) is proving difficult. A code sample or web link showing how to list IPs based on an hours range entered by the user in shell would be of great help.
You need a regex in a Perl or Bash Shell Script.
Post a few lines of your log file and
we'll create one together.
Thanks for the reply,
I've attached two files: one is the log file, the other is my code, which so far is crippled. The objectives of my logfile analysis are echoed in the script. The main part, which sorts repeated IPs for further analysis, is at the end of the script, where I use read for the filename input only, in this case serverlog.log (switches are not included yet). After that, the repeated IPs are sorted without switches and listed via an array. I also need this array to display the total number of IPs in my file, by counting or incrementing, which I attempted (it seemed to work at first, but now it gives nothing, since it sorts the IPs and then only displays a list). The focus now, however, is on the date or hours range analysis I explained in my first post, which is another point of confusion I'm trying to tackle...
Looks like you're almost there...
I ran your script and looked at the log file.
Your plan for working with dates is OK also,
so the only problem I see is that the date format
in your log is not recognized by the Linux date command,
so it has to be modified
prior to converting it to elapsed seconds.
Here is a simple script that...
1.) extracts the latest date from the log file
2.) repositions it into a Linux recognizable format
3.) converts it into elapsed seconds since 1970
Now you will have a reference date to work with
as you already have all the date math figured out.
Here is the script:
Code:
#!/usr/bin/env bash

# 1.) Extract the text between [ and ] on the last line of the log
temp_date=$(tail -n1 ./serverlog.log | cut -d '[' -f 2 | cut -d ']' -f 1)
echo "$temp_date"

# 2.) Replace the month name with its two-digit number
temp_date2=$(echo "$temp_date" | sed \
    -e 's/Jan/01/g' -e 's/Feb/02/g' -e 's/Mar/03/g' \
    -e 's/Apr/04/g' -e 's/May/05/g' -e 's/Jun/06/g' \
    -e 's/Jul/07/g' -e 's/Aug/08/g' -e 's/Sep/09/g' \
    -e 's/Oct/10/g' -e 's/Nov/11/g' -e 's/Dec/12/g')
echo "$temp_date2"

# Slice dd/mm/yyyy:hh:mm:ss into its parts
temp_year=$(echo "$temp_date2" | gawk '{print substr($0,7,4)}')
temp_month=$(echo "$temp_date2" | gawk '{print substr($0,4,2)}')
temp_day=$(echo "$temp_date2" | gawk '{print substr($0,1,2)}')
temp_time=$(echo "$temp_date2" | gawk '{print substr($0,12,8)}')

#UTC format
utc_date="$temp_year-$temp_month-$temp_day $temp_time"
echo "$utc_date"

# 3.) Convert to elapsed seconds since 1970
# (the +0200 offset from the log is dropped; the time is read as UTC)
reference_seconds=$(date --utc -d "$utc_date" +%s)
echo "$reference_seconds"
So you can see that it works...
Here is the output on my system:
You have everything else figured out, but
if you want to keep working on it together,
then keep posting.
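With a reference date in hand, the remaining step might look something like the sketch below. It walks the log backwards and stops at the first entry older than the cutoff, as proposed in the first post. The names log_epoch and ips_in_window are hypothetical, GNU date and tac are assumed, and calling date once per line will be slow on a very long file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: epoch seconds for the bracketed timestamp of one line.
log_epoch() {
    local stamp=${1#*\[}
    stamp=${stamp%%\]*}
    date -d "$(printf '%s\n' "$stamp" | sed -e 's|/| |g' -e 's|:| |')" +%s
}

# Hypothetical helper: ips_in_window FILE SECONDS
# Prints "count ip" pairs for the last SECONDS of the log, busiest first.
ips_in_window() {
    local file=$1 window=$2 last cutoff line
    last=$(log_epoch "$(tail -n 1 "$file")") || return 1
    cutoff=$(( last - window ))
    tac "$file" | while IFS= read -r line; do
        # The log is in date order, so the first line older than the
        # cutoff (or one that fails to parse) ends the scan.
        [ "$(log_epoch "$line")" -ge "$cutoff" ] || break
        printf '%s\n' "${line%% *}"      # first field is the IP
    done | sort | uniq -c | sort -rn
}

# Example: IPs seen in the last 12 hours of serverlog.log
# ips_in_window ./serverlog.log $(( 12 * 3600 ))
```

The sort | uniq -c | sort -rn tail is the same grouping approach mentioned earlier in the thread, applied only to the lines inside the window.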