Removing duplicate lines.

10 replies to this topic

#1
csharpit

    Newbie

  • Members
  • Pip
  • 8 posts
I need help with a very simple program that removes duplicate lines, as well as certain other lines, from a text file.

The program will be run on a file similar to the one below:

03/09/11 09:46:28.771 This is sample / test text
03/09/11 09:46:28.771 This is sample / test text
03/09/11 09:46:28.771 This is sample / test text
03/09/11 09:46:28.771 This is sample / test text
<test message needs to be deleted>

03/09/11 09:46:28.771 This is sample / test text
03/09/11 09:46:28.771 This is sample / test text
03/09/11 09:46:28.771 This is sample / test text
03/09/11 09:46:28.771 This is sample / test text
<test message needs to be deleted>


The output needs to be the following:

- The date and time part only of each line
- Duplicate lines need to be removed
- Blank lines need to be removed
- Any other line containing text that does not begin with the date and time part needs to be removed

I have started coding this as follows:



using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using System.Globalization;

using System.IO;


namespace Remove

{

    class clsProgram

    {

        static void Main(string[] args)

        {

            try

            {

                // Initialise array which will read all lines in the log file

                string[] sLogs = File.ReadAllLines(@"C:test");

                StreamWriter sw = File.CreateText(@"C:test2");


                // DateTime variable sets the "previous time" as null

                DateTime? previousDateTime = null;


                // For every line in array sLogs which contains the search string

                foreach (string sLog in sLogs.Where(log => log.Contains("/")))

                {

                    // Initiate DateTime variable "Current DateTime"

                    DateTime currentDateTime;

                    // String sTimeStamp keeps only the first 17 characters, so we are left with the date time stamp only (without milliseconds)

                    string sTimeStamp = sLog.Remove(17);



                    Console.WriteLine(sTimeStamp);

                    sw.WriteLine(sTimeStamp);

                    

                }

                // Close StreamWriter

                sw.Close();

            }


            // Error handling

            catch (Exception ex)

            {

                Console.WriteLine(String.Format("Error: {0}", ex.Message));

            }

        }

    }

}




So far the program keeps the first 17 characters of each line, which gives the date and time part (without milliseconds). The next steps required are to:

- search through the file line by line, comparing the previous line to the current one, and remove the current line if they match. I have started this by declaring the previousDateTime variable but am stuck on taking it further.
- remove any line which does not begin with the date part
- remove any blank lines.

I understand and can describe what I want to do, but I am having difficulty coding it.

Edited by csharpit, 06 September 2011 - 01:37 PM.
Corrected title


#2
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
If this is not a programming assignment, my favorite way to do this is to open the file in jEdit and do the following RegEx search/replace:
Search: ^(.*?)\n\1$
Replace: $1

I then hit "Replace All" until the count of replacements drops to 0.
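
For anyone who wants the same search/replace idea in code, here is a rough C# sketch of it. The class and method names (AdjacentDuplicates, Collapse) are made up for illustration, and it assumes the text uses "\n" line endings only; a file with "\r\n" endings would need normalising first.

```csharp
using System;
using System.Text.RegularExpressions;

static class AdjacentDuplicates
{
    // Hypothetical helper: collapses runs of identical adjacent lines.
    // Assumption: lines are separated by "\n" only (no "\r\n").
    public static string Collapse(string text)
    {
        // With RegexOptions.Multiline, ^ and $ match at line boundaries.
        // \1 requires the second line to equal the captured first line.
        Regex pair = new Regex(@"^(.*?)\n\1$", RegexOptions.Multiline);
        string previous;
        do
        {
            previous = text;
            text = pair.Replace(text, "$1"); // keep one copy of each matched pair
        } while (text != previous);          // repeat until a pass changes nothing
        return text;
    }

    static void Main()
    {
        Console.WriteLine(Collapse("a\na\na\nb\nb\nc"));
    }
}
```

The loop plays the role of pressing "Replace All" repeatedly: one pass only merges pairs, so a run of three or more identical lines needs more than one pass.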

#3
csharpit

Thanks for the alternative method, but this is a programming assignment.

#4
Momerath

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 243 posts
using System;

using System.Collections.Generic;

using System.Globalization;

using System.IO;


namespace ConsoleApplication1 {

    class Program {

        static void Main() {

            String[] lines = File.ReadAllLines("C:/test");

            List<String> output = new List<String>();

            String last = String.Empty;

            DateTime parsed;


            for (int i = 0; i < lines.Length; i++) {

                if (lines[i].Trim() == String.Empty) continue;

                if (lines[i].Length < 21) continue;

                if (DateTime.TryParseExact(lines[i].Substring(0, 21), "MM/dd/yy HH:mm:ss.fff", CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed)) {

                    if (lines[i].Substring(0, 21) != last) {

                        last = lines[i].Substring(0, 21);

                        output.Add(last);

                    }

                }

            }


            File.WriteAllLines("C:/test2", output.ToArray());

        }

    }

}

Change the 21's to 17's if you don't want to use the milliseconds portion (and remove '.fff' from the parse string).
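
A note on the hour specifier in custom format strings like this one: "hh" is the 12-hour clock and "HH" is the 24-hour clock. The sample file only shows a morning time, so either would parse it, but a log without AM/PM markers is most likely 24-hour. The sketch below is only an illustration; the helper name Parses and the afternoon timestamp are made up. (The sample also does not say whether 03/09/11 is month/day or day/month, so "MM/dd" versus "dd/MM" is worth checking against the real log.)

```csharp
using System;
using System.Globalization;

static class HourSpecifierDemo
{
    // Hypothetical helper: true when 'stamp' matches the given custom format exactly.
    public static bool Parses(string stamp, string format)
    {
        DateTime ignored;
        return DateTime.TryParseExact(stamp, format, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out ignored);
    }

    static void Main()
    {
        string morning = "03/09/11 09:46:28.771";   // taken from the sample file
        string afternoon = "03/09/11 14:46:28.771"; // made-up afternoon timestamp

        // 12-hour specifier: fine for the morning value, rejects hour 14.
        Console.WriteLine(Parses(morning, "MM/dd/yy hh:mm:ss.fff"));
        Console.WriteLine(Parses(afternoon, "MM/dd/yy hh:mm:ss.fff"));

        // 24-hour specifier: accepts the afternoon value.
        Console.WriteLine(Parses(afternoon, "MM/dd/yy HH:mm:ss.fff"));
    }
}
```

Because TryParseExact returns false rather than throwing, a mismatched specifier would show up as afternoon lines quietly missing from the output file.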

#5
csharpit




Thanks Momerath, but this only does half of what is required. What I also need the program to do is calculate the time difference between each date and time stamp and the previous one, and display it on the same line as the later stamp.

So for example, the output file should read something like this:

03/09/11 09:46:28.771
03/09/11 09:46:29.786 -- 1.02
03/09/11 09:46:38.037 -- 8.23

(Sorry if I did not make this clear in my initial post).
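
The difference itself is just a subtraction of two DateTime values, which gives a TimeSpan. A minimal sketch, assuming the "MM/dd/yy" layout and a 24-hour clock; the helper name SecondsBetween is made up for illustration:

```csharp
using System;
using System.Globalization;

static class ElapsedDemo
{
    // Assumed layout of the timestamps (24-hour clock, milliseconds included).
    const string Format = "MM/dd/yy HH:mm:ss.fff";

    // Hypothetical helper: seconds elapsed between two timestamp strings.
    public static double SecondsBetween(string earlier, string later)
    {
        DateTime a = DateTime.ParseExact(earlier, Format, CultureInfo.InvariantCulture);
        DateTime b = DateTime.ParseExact(later, Format, CultureInfo.InvariantCulture);
        TimeSpan gap = b - a;      // subtracting two DateTimes yields a TimeSpan
        return gap.TotalSeconds;   // the whole gap expressed in seconds
    }

    static void Main()
    {
        double s = SecondsBetween("03/09/11 09:46:28.771", "03/09/11 09:46:29.786");
        // Invariant culture keeps '.' as the decimal separator on any machine.
        Console.WriteLine(s.ToString("0.000", CultureInfo.InvariantCulture)); // 1.015
    }
}
```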

#6
Momerath

using System;

using System.Collections.Generic;

using System.Globalization;

using System.IO;


namespace ConsoleApplication1 {

    class Program {

        static void Main() {

            String[] lines = File.ReadAllLines("C:/utils/temp.txt");

            List<String> output = new List<String>();

            String last = String.Empty;

            DateTime parsed;

            DateTime lastDateTime = DateTime.Now;


            for (int i = 0; i < lines.Length; i++) {

                if (lines[i].Trim() == String.Empty) continue;

                if (lines[i].Length < 21) continue;

                if (DateTime.TryParseExact(lines[i].Substring(0, 21), "MM/dd/yy HH:mm:ss.fff", CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed)) {

                    String temp = lines[i].Substring(0, 21);

                    if (temp != last) {

                        if (last == String.Empty) {

                            output.Add(temp);

                        } else {

                            output.Add(temp + " -- " + (parsed - lastDateTime).ToString(@"ss\.fff"));

                        }

                        last = temp;

                        lastDateTime = parsed;

                    }

                }

            }


            File.WriteAllLines("C:/utils/temp2.txt", output.ToArray());

        }

    }

}


#7
csharpit


Momerath said:

using System;

using System.Collections.Generic;

using System.Globalization;

using System.IO;


namespace ConsoleApplication1 {

    class Program {

        static void Main() {

            String[] lines = File.ReadAllLines("C:/utils/temp.txt");

            List<String> output = new List<String>();

            String last = String.Empty;

            DateTime parsed;

            DateTime lastDateTime = DateTime.Now;


            for (int i = 0; i < lines.Length; i++) {

                if (lines[i].Trim() == String.Empty) continue;

                if (lines[i].Length < 21) continue;

                if (DateTime.TryParseExact(lines[i].Substring(0, 21), "MM/dd/yy HH:mm:ss.fff", CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed)) {

                    String temp = lines[i].Substring(0, 21);

                    if (temp != last) {

                        if (last == String.Empty) {

                            output.Add(temp);

                        } else {

                            output.Add(temp + " -- " + (parsed - lastDateTime).ToString(@"ss\.fff")); // <-- highlighted line: error reported here

                        }

                        last = temp;

                        lastDateTime = parsed;

                    }

                }

            }


            File.WriteAllLines("C:/utils/temp2.txt", output.ToArray());

        }

    }

}

The code does not run. It gives the following error on the highlighted section in the code:

No overload for method 'ToString' takes '1' arguments

#8
Momerath

Works fine here. I tested it before I posted it. .NET 4.0

#9
csharpit

I am using .NET 3.5.

Could that be an issue?

#10
Momerath

Yes, it's the issue. Change the line to
output.Add(temp + " -- " + (parsed - lastDateTime).TotalSeconds.ToString("00.000"));
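
For background: the TimeSpan.ToString overload that takes a format string only arrived with .NET Framework 4, which is why it does not compile on 3.5. Going through TotalSeconds also changes the result for gaps of a minute or more, because the "ss" specifier shows only the seconds component. A small sketch of the difference; the helper name Format and the 75.5 second gap are made up, and the explicit invariant culture is an addition so the decimal separator stays '.' on any machine:

```csharp
using System;
using System.Globalization;

static class GapFormatDemo
{
    // Hypothetical helper: formats a gap as total seconds with three decimals.
    public static string Format(TimeSpan gap)
    {
        return gap.TotalSeconds.ToString("00.000", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        TimeSpan gap = TimeSpan.FromSeconds(75.5); // made-up gap longer than a minute

        Console.WriteLine(gap.Seconds);  // seconds component only: 15
        Console.WriteLine(Format(gap));  // whole gap in seconds: 75.500
    }
}
```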


#11
csharpit

Thanks this worked.

The only change I made was as follows:


output.Add(temp + " -- " + (parsed - lastDateTime).TotalSeconds.ToString("0.0"));


Could someone explain how the code works? I understand what the code does as a whole at a high level, but I am not so sure about the following parts of the if statements:

if (DateTime.TryParseExact(lines[i].Substring(0, 21),

and

String temp = lines[i].Substring(0, 21);

Comments on the code would be a great help.
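
As a rough answer to the question above, here is a commented sketch of the same filtering logic. It is not the code from the earlier posts: the names TimestampFilter and Extract are made up for illustration, and the HH specifier assumes the log uses a 24-hour clock.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TimestampFilter
{
    const int StampLength = 21;                    // "03/09/11 09:46:28.771" is 21 characters
    const string Format = "MM/dd/yy HH:mm:ss.fff"; // assumed layout of the timestamp

    // Hypothetical helper: returns the distinct adjacent timestamps found in 'lines'.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        List<string> output = new List<string>();
        string last = null;
        foreach (string line in lines)
        {
            // Too short to hold a full timestamp; this also skips blank lines.
            if (line.Length < StampLength) continue;

            // Substring(0, 21) copies 21 characters starting at index 0,
            // i.e. only the part of the line where the timestamp should be.
            string stamp = line.Substring(0, StampLength);

            // TryParseExact returns true only when 'stamp' matches Format exactly,
            // so lines such as "<test message ...>" are rejected here.
            DateTime parsed;
            if (!DateTime.TryParseExact(stamp, Format, CultureInfo.InvariantCulture,
                                        DateTimeStyles.None, out parsed)) continue;

            // Same stamp as the previous kept line: a duplicate, so skip it.
            if (stamp == last) continue;

            last = stamp;
            output.Add(stamp);
        }
        return output;
    }

    static void Main()
    {
        string[] sample = {
            "03/09/11 09:46:28.771 This is sample / test text",
            "03/09/11 09:46:28.771 This is sample / test text",
            "<test message needs to be deleted>",
            ""
        };
        foreach (string stamp in Extract(sample)) Console.WriteLine(stamp);
    }
}
```

So the Substring call decides which characters are looked at, and TryParseExact decides whether those characters really are a timestamp; everything else in the loop is bookkeeping for duplicates.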



