How to improve this algorithm's efficiency?
Hi all,
This is to do with the time and space complexity of the code below, an exercise on an algorithms course. It asks the user to choose integer n, then give integers between 1 and n minus one of the integers. Then it finds which of the integers is missing. The code as it is works just fine, however it runs too slow. The exercise instructions give as a hint that there is an algorithm for this with time complexity of O(n) and space complexity of O(1) and that the code is to be able to handle a million integers in a few seconds. Note that I can only add/alter things within the missingNumber( ) method. I'm a bit of a beginner in the time and space complexity subject matter, so could someone give me some hints as to which direction to go to improve my code? Thanks a bunch.
Code :
import java.util.Scanner;
public class MissingNumber {
public static Scanner reader = new Scanner(System.in);
public static int missingNumber(int[] numbers) {
int missing = 0;
for (int i: numbers) {
missing ^= i;
}
for (int i = 1; i <= numbers.length + 1; i++) {
missing ^= i;
}
return missing;
}
public static void main(String[] args) {
System.out.print("Max number? ");
int biggest = reader.nextInt();
int numbers[] = new int[biggest - 1];
System.out.println("Give numbers:");
for (int i = 0; i < biggest - 1; i++) {
numbers[i] = reader.nextInt();
}
int missing = missingNumber(numbers);
System.out.println("Missing number: " + missing);
}
}
Re: How to improve this algorithm's efficiency?
Quote:
Originally Posted by
MariaNiku
.
...The code as it is works just fine, however it runs too slow.
Can you define what you mean by "too slow"? Have you tested it?
Quote:
Originally Posted by
MariaNiku
The exercise instructions give as a hint that there is an algorithm for this with time complexity of O(n) and space complexity of O(1)
Hmmm---Here's what I think:
- The missingNumber() function is already O(n) in time (where n is the size of the array), isn't it? The function has two separate loops, each of which depends only on the size of the array, right?
- The missingNumber() function is already O(1) isn't it? The function's use of memory doesn't depend on the size of the array, right?
Quote:
Originally Posted by
MariaNiku
and that the code is to be able to handle a million integers in a few seconds.
Using the given method on my modest Centos 6.3 Linux workstation, it can find the missing number from an array of 999999 integers that I read from a file containing a single missing integer in the range 1...1000000 in something like 20 or 30 milliseconds as measured by System.nanotime(). (That's the elapsed time of the function execution. Reading the array from a file takes a couple of seconds.) Note that the integers in the file are not sorted, and the array is not sorted before calling the missingNumber() function.
Quote:
Originally Posted by
MariaNiku
I'm a bit of a beginner in the time and space complexity subject matter
Well, that's the reason you are taking the course, right?
Bottom line: I guess I'm missing the point here. It wouldn't be the first time. Not even the first time today.
I mean, maybe it's possible to come up with an algorithm that improves the run time and/or memory usage, but how would you know if it is actually better unless you have a way of testing it to compare with the original function? (That's why I asked whether you have tested it.)
Cheers!
Z
Re: How to improve this algorithm's efficiency?
Sorry for being imprecise there. The exercises are submitted via a system which checks it, testing the speed, then gives ok or else an error message. In this case the message said the code took too long with a test input (which was, I recall, three numbers). It didn't give more details, so perhaps it's a matter of some additional fine-tuning if that is possible.
Re: How to improve this algorithm's efficiency?
Quote:
Originally Posted by
MariaNiku
... fine-tuning if that is possible...
No, it's not a matter of fine-tuning. It's a matter of finding another algorithm. That's kind of the point of having a class in algorithms, right? Not optimizing a particular scheme for a particular compiler and a particular computer, but trying to find a better scheme.
Consider the following: Suppose you have an array of 999999 distinct integers in the range 1...1000000. (That's your problem for the value of biggest = 1000000.) You want to find the single missing integer in that range.
Now, here's a possibility:
Calculate the sum of the first million positive integers. There is a closed formula said to have been discovered by Princeps mathematicorum Karl Friedrich Gauss when he was in elementary school (he was seven years old at the time). See, for example Math Forum: Ask Dr. Math. You do not need a loop to find the sum of the first N integers. You do need a loop to calculate the sum of the integers in the array. Subtract the sum of the elements in the array from the sum of the first N integers, and you have the missing number at your fingertips.
Note that the sum of the first 1000000 integers is something like 500000500000, so you will need a long integer to do the math in Java. (Since the largest integer value is 2147483647.)
The method is still O(n) in time and O(1) in space. It might, indeed, be faster than the given algorithm since there is only one loop rather than two, but the time complexity in Big-O notation is still O(n).
On the other hand It might not be very much faster, since the loop involves adding into a long integer sum rather than just doing an integer exclusive-or of the individual elements.
Bottom line: Try it.
Then go back and compare run time for this scheme versus the run time for the original. With my 32-bit Linux system, it cut the run time for an array size of 999999 about in half. (10-15 milliseconds versus 20-30 milliseconds for the original).
Now, whether it is sufficient to satisfy the inflexible requirements of RoboGrader, I have no idea.
I also have no further interest in the problem unless there is feedback from a real, live, knowledgeable human being who can offer some insight.
[/begin Rambling non-sequitur]
Interaction with RoboGrader is one-way (no opportunity for rebuttal), and there is no way to know what is required other than messing around around hoping to get lucky enough to stumble across the specific answer that was programmed into that genius. (How slow is "too slow"? How is "slowness" measured? If speed is measured for the three inputs that you are using for evaluation, are you absolutely sure that the speed advantage scales consistently for a million inputs? Those are the kind of things I would ask of a real, live, knowledgeable human.)
Unlike things like, say Project Euler, where each problem is guaranteed to have exactly one correct answer,
one-way responses from RoboGrader like "it's too slow" just leave me cold.
I mean, sometimes algorithms that are marvelously more efficient for 10000000 inputs are actually slower than "worse" algorithms when tested with fewer inputs. And vice versa.
Even a blind hog finds an acorn once in a while, but I don't particularly enjoy being treated like a blind hog. It helps that I'm not taking that course for credit. I have a choice. You probably don't.
[/end Rambling non-sequitur]
So: Good luck.
Cheers!
Z