Jump to content

Remving duplicates from a 1D array

- - - - -

  • Please log in to reply
17 replies to this topic

#1
Hiche

Hiche

    Newbie

  • Members
  • Pip
  • 9 posts
I'm finding this hard to implement. So basically I want to remove the elements (integers) in an array that are duplicates (repeated more than twice). We cannot use any external class but the Arrays class (not the ArrayList etc).

Any guidance is appreciated. Thank you.

#2
lethalwire

lethalwire

    while(false){ ... }

  • Members
  • PipPipPipPipPipPipPip
  • 748 posts
  • Programming Language:Java, PHP
  • Learning:Java, PHP
How about sorting the array using java.util.Arrays. Then iterating through removing the duplicates. (The duplicates should now be next to eachother)

#3
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
Put it in a Set, and then back to a List :D
Okay, ignore the above. Like lethalwire said wouldn't be a bad idea.

Just write a utility method which removes something from the given array at index x to help yourself:
public void removeElement(int[] array, int index){

}


#4
Perigren

Perigren

    Newbie

  • Members
  • Pip
  • 1 posts
I'm giving it a shot right now and I've found what can be helpful is writing a method that creates an int array of all the numbers that have duplicates in the original array. For example if the original array was {2,2,2,3,4,6,6,6,7,7,8,9}, this method would return {2,6,7}.

#5
Hiche

Hiche

    Newbie

  • Members
  • Pip
  • 9 posts
My problem is removing the duplicates from the array. After I sort them (I used Array.sort(array)), how exactly do I iterate over them AND remove them? If the duplicates are next to each other, then what? Sorry for my ignorance that might render you impatient at a point.

Before looking at the replies, I came up with this:

    for (int i = 0; i < array.length; i++)

    	{

    		for (int j = i + 1; j < array.length; j++)

    		{

    			if (i != j && array[i] == array[j])

    			{

    				// here's where I want to remove the duplicates

    			}

    		}

    	}

I iterate over the elements and if a duplicate of an integer is found, I want to remove it. For a better view on the problem statement, here it is:

Write a program RemoveDuplicates.java that takes three file names as command line arguments. The first two files contain a list of integers (the first line of each file contain the number of integers in that file). Your program should write to the third file the union of the numbers in the first two files (without any duplicates). Hint: Store the contents of the files in arrays first.

Taking three file names isn't important now. Just suppose the arguments are arrays. Also, just for clarification, merging two arrays is as follows, right?:
public static int[] merge(int[] array1, int[] array2)

    {

    	int[] array3 = new int[array1.length + array2.length];

    	for (int i = 0; i < array1.length; i++)

    		array3[i] = array1[i];

    	for (int j = 0; j < array2.length; j++)

    		array3[array1.length + j] = array2[j]; 	

    	return array3;

    }

Thank you for the replies.

#6
lethalwire

lethalwire

    while(false){ ... }

  • Members
  • PipPipPipPipPipPipPip
  • 748 posts
  • Programming Language:Java, PHP
  • Learning:Java, PHP
If that's the case, you can insert all of the integers from the first file into the array but don't add any integer that is ALREADY in the array(java.util.Arrays.binarySearch(...). Then do the same for the other file.
The only "difficult" part of the task is inserting integers in place, and shifting values in the array.

If you do this, you won't have the problem of removing duplicate items from the array. I would take the first strategy over this one because if you have an array of:

[1,1,2,3,4,4]

Then you'd have to remove a 1, and a 4. But you can't technically remove them without creating a new array. OR you'd have to set the values to something like : Integer.MIN_VALUE or Integer.MAX_VALUE;
Then when you wanted to add the numbers to the third file, you'd have to use a check of:

if(array[i]==Integer.MIN_VALUE)

   skip this and don't add



Errr.. I guess either approach is fairly simple. :p

#7
fread

fread

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 787 posts
Just to re-enforce what lethalwire was saying; it is easiest with an insertinplace method and a binary search method. And you only need one array.
Algorithm:
load the array with the contents of the two files

for(int i = lo + 1; i <= array.length-1; i++)
 {
        if  (   ! binarySearch(on the array for value i; from lo to i)    )
       array = insertInPlace(array,0,i-1, array[i]);
 }

Perfection of means and confusion of ends seem to characterize our age. Albert Einstein :confused:

#8
Ritwik I the programmer

Ritwik I the programmer

    Newbie

  • Members
  • PipPip
  • 17 posts
OK, you need a down to earth version? try this:

//Duplicate element deletion array

import java.util.*;

class Duplication

{

    private int arr[];

    public void get()

    {

        Scanner sc = new Scanner(System.in);

        System.out.println("Enter array size:");

        int size = sc.nextInt();

        arr = new int [size];

        for(int i = 0; i < size; i++)

        {

            System.out.println("Enter array element:");

            arr[i] = sc.nextInt();

        }

        System.out.println("Original Array:");

        show();

        delete();

        System.out.println("Duplicate deleted array:");

        show();

    }

    public void delete()

    {

        int size = arr.length;

        for(int i = 0; i < size; i++)

        {

            for(int j = i + 1; j < size; j++)

            {

                while(arr[i] == arr[j])

                {

                    for(int k = j; k < size - 1; k++)

                        arr[k] = arr[k + 1];

                    size--;

                    if(j == size) break;

                }

            }

        }

        int t[] = new int[size];

        for(int l = 0; l < size; l++)

            t[l] = arr[l];

        arr = t;

    }

    public void show()

    {

        for(int i = 0; i < arr.length; i++)

            System.out.print(arr[i] + " ");

        System.out.println();

    }

}

public class Duplication_main

{

   public static void main(String  args[])

   {

       Duplication d = new Duplication();

       d.get();

   }

}


#9
fread

fread

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 787 posts

Quote

...OK, you need a down to earth version?...
You nest loops four layers deep and use two arrays. I really don't see how this is much different from what we were explaining. I will like to see the performance if the files had 10000 numbers each :w00t:.

Note: this could be done without nesting a single loop, one array and you end up with a sorted list.
Perfection of means and confusion of ends seem to characterize our age. Albert Einstein :confused:

#10
lethalwire

lethalwire

    while(false){ ... }

  • Members
  • PipPipPipPipPipPipPip
  • 748 posts
  • Programming Language:Java, PHP
  • Learning:Java, PHP
It looks like the worst-case running time of the provided code is O(N^4). On larger sets of numbers, this method is not feasible.

---------- Post added at 10:42 AM ---------- Previous post was at 10:35 AM ----------

Can someone check me on this...
I believe that if the inner-most loop is called N-1 times, the while loop is called N-1 times, the 2nd outer-most loop called N-1 times and the 1st loop called N times then worst-case running time is
O(N*N*N*N)

If a list of 1,000 numbers is provided, and all numbers are equal then we see the worst case scenario.
If the inner most loop's operation takes 1ns to complete, and there are 1,000 numbers to operate on then that leaves us at:
1,000 numbers * 1ns per operation = 1,0000ns.
(1,000ns)^4 = 1,000,000,000,000ns = 1,000 seconds ~= 16.666667 minutes.

#11
fread

fread

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 787 posts
I see you see my point ;).
Perfection of means and confusion of ends seem to characterize our age. Albert Einstein :confused:

#12
Ritwik I the programmer

Ritwik I the programmer

    Newbie

  • Members
  • PipPip
  • 17 posts
The calculation shown by lethalwire is amazingly precise, but incorrect. The program copes well even with 100000 numbers in worst case scenario, though I don't know how. Even I think it shouldn't run this well, but it does. All I can the number of iterations is definitely not in the order of n^4. And you can ignore the 2 array part as lonf as you make the size an instance variable of the class and modify that.




1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users