How to preprocess a vague dataset using Java
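A minimal sketch follows. It assumes that "vague" means messy or inconsistently recorded data: stray whitespace, mixed letter case, placeholder tokens such as "N/A" or "?" standing in for missing values, duplicate rows, and numbers stored as text. It also assumes each row arrives as a `String[]` (for example, after splitting CSV lines) and that column 1 is numeric. The class and method names (`Preprocess`, `clean`, `toNumber`, `cleanAndDedupe`, `imputeMean`, `minMaxScale`), the set of missing-value tokens, and the choice of mean imputation and min-max scaling are illustrative assumptions, not a standard API. Only the Java standard library is used.

```java
import java.util.*;

public class Preprocess {
    // Tokens treated as "missing"; this set is an assumption, adjust it to your data.
    static final Set<String> MISSING = Set.of("", "na", "n/a", "null", "?", "-");

    // Normalizes one raw cell: trims whitespace, lower-cases, maps missing tokens to null.
    static String clean(String raw) {
        if (raw == null) return null;
        String s = raw.trim().toLowerCase(Locale.ROOT);
        return MISSING.contains(s) ? null : s;
    }

    // Parses a cleaned cell as a double; returns null if it is absent or not a number.
    static Double toNumber(String cell) {
        if (cell == null) return null;
        try {
            return Double.parseDouble(cell);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Cleans every cell, then drops exact duplicate rows while keeping first-seen order.
    // Arrays.asList permits null elements, so rows with missing cells can still be compared.
    static List<List<String>> cleanAndDedupe(List<String[]> rows) {
        Set<List<String>> seen = new LinkedHashSet<>();
        for (String[] row : rows) {
            String[] cleaned = new String[row.length];
            for (int i = 0; i < row.length; i++) cleaned[i] = clean(row[i]);
            seen.add(Arrays.asList(cleaned));
        }
        return new ArrayList<>(seen);
    }

    // Replaces nulls in a numeric column with the mean of the values that are present.
    static List<Double> imputeMean(List<Double> column) {
        double mean = column.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
        List<Double> out = new ArrayList<>();
        for (Double v : column) out.add(v == null ? mean : v);
        return out;
    }

    // Min-max scales a non-empty, null-free column to [0, 1]; a constant column maps to zeros.
    static List<Double> minMaxScale(List<Double> column) {
        double min = Collections.min(column);
        double max = Collections.max(column);
        double range = max - min;
        List<Double> out = new ArrayList<>();
        for (double v : column) out.add(range == 0 ? 0.0 : (v - min) / range);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical raw rows: name, age (both as messy strings).
        List<String[]> raw = List.of(
                new String[]{" Alice ", "30"},
                new String[]{"BOB", "N/A"},
                new String[]{"alice", "30"},
                new String[]{"Carol", " 50 "});

        List<List<String>> rows = cleanAndDedupe(raw);
        List<Double> ages = new ArrayList<>();
        for (List<String> r : rows) ages.add(toNumber(r.get(1)));

        List<Double> scaled = minMaxScale(imputeMean(ages));
        for (int i = 0; i < rows.size(); i++) {
            System.out.println(rows.get(i).get(0) + " -> " + scaled.get(i));
        }
        // prints:
        // alice -> 0.0
        // bob -> 0.5
        // carol -> 1.0
    }
}
```

The steps run in a deliberate order: normalizing text first lets " Alice " and "alice" be recognized as the same value before deduplication, and imputing missing numbers before scaling keeps `Collections.min`/`max` from seeing nulls. Mean imputation and min-max scaling are only common defaults; median imputation, dropping incomplete rows, or z-score standardization may suit a given dataset better.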