Originally Posted by romavolman
What is the time complexity of the code that you posted?
I won't keep everyone in suspense; I'll give the answer in terms of a commonly used metric. Reference: Big O Notation - wikipedia
It is O(N squared).
Summary: The outer loop goes through all of the N elements of a, so the complexity contributed by the outer loop is O(N).
For each element in a, the inner loop goes through the N elements of b unless and until a matching value is found in b. Assuming that the elements of b are "randomly" ordered, the average number of times through the inner loop is N/2, so the complexity contributed by the inner loop is also O(N).
Conclusion: as N gets larger, the total number of comparisons gets larger at a rate proportional to N squared. (Multiplicative constants like 1/2 are commonly omitted in Big-O expressions.)
If we assume that the most time is consumed by the decision-making comparison and the resulting conditional branch that determines whether to put the element into array c, the time of the entire process will be primarily affected by the total number of comparisons, which is equal to the product of the number of times through the outer loop and the number of times through the inner loop.
Therefore, the time complexity is O(N) times O(N), which, according to Big-O analysis described in the above reference, is equal to O(N squared).
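Since the original code isn't quoted here, here is a minimal sketch of the kind of nested-loop approach being analyzed. The names difference, a, b, and c are my assumptions; the idea is collecting the elements of a that do not appear in b:

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveDifference {
    // Collect every element of a that does not appear in b.
    // Outer loop: N iterations; inner loop: up to N comparisons each
    // (N/2 on average) -> O(N * N) = O(N^2) comparisons overall.
    static int[] difference(int[] a, int[] b) {
        List<Integer> c = new ArrayList<>();
        for (int x : a) {                 // O(N): outer loop over a
            boolean found = false;
            for (int y : b) {             // O(N): linear scan of b
                if (x == y) {
                    found = true;
                    break;                // stop as soon as a match is found
                }
            }
            if (!found) {
                c.add(x);                 // x is in a but not in b
            }
        }
        return c.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 9, 1};
        int[] b = {3, 7, 1};
        System.out.println(java.util.Arrays.toString(difference(a, b))); // [5, 9]
    }
}
```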
Next question: Can this be improved?
Here's a thought that came to me as a result of remembering a scene from an old Sherlock Holmes movie many years ago (Basil Rathbone, Nigel Bruce). Sherlock had two sets of papers. Each piece of paper had a number written on it (taxi registration numbers collected by several of his Baker Street Irregulars at two different locations or some such thing).
He needed to make a list that consisted of numbers that were common to both sets. (This is slightly different from your problem, where you need to find numbers that are in one array and not in the other, but the procedure can be the same and the time complexity will be the same.)
I seem to remember that he didn't know whether or not any of the numbers in the first set were duplicates, and he wanted a list of all numbers that were common to the two sets.
Time was of the essence, and he figured out that taking one number at a time from the first set and riffling through the second set for each one, starting with the paper on top and looking at papers of the second set one at a time, continuing unless and until he found a match, would take too long (the bomb was ticking).
So Sherlock had a Better Idea: What if the papers in the second set were sorted according to the values of the numbers? (A little overhead here, performed only once: Sort the second set.)
It would be much quicker to see whether a given number from the first set is also in the second set, by the following procedure:
For each number from the first set, you would start at the top of the second set, and you might get lucky. The number might match. If it did, then you are through with this search, so you do whatever you do in case of a match and go on to the next number in the first set.
But if it didn't match, then you would "cut the deck" (the second set) in half and see if the number on top of the second half was less than, equal to, or greater than the number you were looking for. Maybe it would match, and you are through for this search. Put the second set back together again (still sorted), and go on to the next number in the first set.
If it was not equal, then, since the second set is sorted, you could use the value of the number on the top sheet of the second half of the second set to decide whether the number you are looking for can be in the first half or the second half of the "cut" deck.
Take the selected half and "cut the deck" in half again. Repeat the inspection/selection process until either
1. You have found what you were looking for.
2. You have cut the deck in half so many times that you have run out of things to cut. (Meaning that you have eliminated the possibility of finding it.)
This procedure is known as a binary search algorithm, and its complexity is O(log(N)). Reference: Binary Search Algorithm - wikipedia
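The "cut the deck" procedure above can be sketched as an iterative binary search. This is a generic illustration (the names contains, sorted, and target are mine), not code from the original post:

```java
public class BinarySearchDemo {
    // Iterative binary search on a sorted int array. Each step halves the
    // remaining range, so at most about log2(N) comparisons -> O(log N).
    static boolean contains(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;    // "cut the deck" in the middle
            if (sorted[mid] == target) {
                return true;                 // 1. found what we were looking for
            } else if (sorted[mid] < target) {
                lo = mid + 1;                // target can only be in the upper half
            } else {
                hi = mid - 1;                // target can only be in the lower half
            }
        }
        return false;                        // 2. ran out of things to cut
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 7, 12, 25};
        System.out.println(contains(sorted, 7));   // true
        System.out.println(contains(sorted, 8));   // false
    }
}
```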
Time complexity of the inspection process is O(N) (from the outer loop) times O(log(N)) (from the binary search), which is equal to O(N log(N)). Reference: Big O Notation - wikipedia
Now if you select a sort algorithm that is O(N log(N)) (remember, you only do this once), the total complexity is O(N log(N)), which, generally speaking, will be less than O(N squared).
So, here's a plan: Sort array b. For each element in array a, do a binary search of the sorted array b looking for that number. Act accordingly, and continue until you have run through all of the elements of a.
Now, it just happens that the java.util.Arrays class has convenient overloaded methods for sorting and performing binary searches that work great (and painlessly for the programmer) with arrays of ints.
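Putting the plan together with those library methods, a sketch might look like this (again, the names difference, a, and b are my assumptions about the original code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedDifference {
    // Sort b once (O(N log N)), then binary-search it for each element of a
    // (N searches at O(log N) each) -> O(N log N) overall.
    static int[] difference(int[] a, int[] b) {
        int[] sortedB = Arrays.copyOf(b, b.length); // leave the caller's array intact
        Arrays.sort(sortedB);                       // one-time overhead, O(N log N)
        List<Integer> c = new ArrayList<>();
        for (int x : a) {                           // N binary searches
            // Arrays.binarySearch returns a negative value when x is absent
            if (Arrays.binarySearch(sortedB, x) < 0) {
                c.add(x);                           // x is in a but not in b
            }
        }
        return c.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 9, 1};
        int[] b = {3, 7, 1};
        System.out.println(Arrays.toString(difference(a, b))); // [5, 9]
    }
}
```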
Note that, because of other things that go on in programs (time taken for things other than comparisons), the advantages might not be startlingly better for small arrays. In fact, if you benchmark it for small arrays, the run times might even be longer for the "optimized" algorithm, but for larger arrays, we can expect improvement in average execution times.
That's what the Big-O stuff (as well as other kinds of statistical analysis) is all about
: Expectations, not guarantees.
Final note: There may be additional optimization possibilities, but maybe this can be at least a start for further investigation. Once you break out of the "naive" mindset of nested loops going through the two sets, other things might occur. Let the creative juices flow...