efficient way of checking duplicates
Hi, I was wondering if someone could suggest an efficient way of checking for and removing duplicates.
Basically, I have a giant file that I parse for IDs and group numbers.
The condition is that two different IDs can go into the same group number, but the same ID cannot go into the same group more than once (think of it as a repeated entry).
e.g.
ID Group Num
150 200
180 200
would be valid
or
ID Group Num
150 200
150 201
would also be valid, but
ID Group Num
150 200
150 200
would not be.
Thanks so much =)
Re: efficient way of checking duplicates
Use a Set, a Map, or a combination of the two. In this case, a simple Map keyed by group number, with a Set of IDs as each value, would work.
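Here is a rough sketch of that Map-of-Sets idea in Java, assuming both the IDs and the group numbers fit in an int. The class name GroupDuplicateChecker and the method add are made up for illustration; the file parsing is left out since the file format isn't given.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: tracks which IDs have already been seen in each group.
class GroupDuplicateChecker {
    // group number -> set of IDs already placed in that group
    private final Map<Integer, Set<Integer>> idsByGroup = new HashMap<>();

    // Returns true if the (id, groupNum) pair is new,
    // false if this ID was already recorded for this group (a duplicate).
    boolean add(int id, int groupNum) {
        return idsByGroup
                .computeIfAbsent(groupNum, g -> new HashSet<>())
                .add(id); // Set.add returns false when the element is already present
    }

    public static void main(String[] args) {
        GroupDuplicateChecker checker = new GroupDuplicateChecker();
        System.out.println(checker.add(150, 200)); // true: first time 150 is in group 200
        System.out.println(checker.add(180, 200)); // true: different ID, same group
        System.out.println(checker.add(150, 201)); // true: same ID, different group
        System.out.println(checker.add(150, 200)); // false: repeated pair
    }
}
```

While reading the file you would call add for each row and skip (or log) the row whenever it returns false. Each call is a couple of hash lookups on average, so the whole pass stays roughly linear in the number of rows.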
Re: efficient way of checking duplicates
Hashes and trees are both good ways to check for duplicates. Trees can also check for "closeness" (the nearest existing value), whereas hashes only give an exact check.
Balanced trees are O(log(n)) per search; hashes are O(1) (for the average search).
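To make the exact-versus-closeness difference concrete, here is a small sketch using Java's HashSet and TreeSet. The class ExactVersusNearest and the helper nearest are made-up names for illustration, and breaking ties toward the smaller value is just an assumption.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo: a hash answers "is this exact value present?",
// a sorted tree can also answer "what is the closest value present?".
class ExactVersusNearest {
    // Returns the stored ID closest to target, or null if the set is empty.
    // Ties go to the smaller ID (an arbitrary choice for this sketch).
    static Integer nearest(TreeSet<Integer> ids, int target) {
        Integer below = ids.floor(target);   // greatest element <= target, or null
        Integer above = ids.ceiling(target); // least element >= target, or null
        if (below == null) return above;
        if (above == null) return below;
        return (target - below <= above - target) ? below : above;
    }

    public static void main(String[] args) {
        Set<Integer> hashed = new HashSet<>();
        TreeSet<Integer> sorted = new TreeSet<>();
        for (int id : new int[] {150, 180, 210}) {
            hashed.add(id);
            sorted.add(id);
        }
        System.out.println(hashed.contains(180)); // true: exact match, O(1) on average
        System.out.println(hashed.contains(175)); // false: a hash cannot say what is near 175
        System.out.println(nearest(sorted, 175)); // 180: closest stored ID, O(log n)
    }
}
```

For the original question only exact matches matter, so a hash-based Set is enough; the tree is worth it only if you also need ordering or nearest-value lookups.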