You could also try opening a character stream that decodes the input as UTF-8.
See: Byte Encodings and Strings (The Java™ Tutorials > Internationalization > Working with Text)
I don't know whether the Java stream will complain when it hits an invalid byte sequence; you'll need to check this.
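On whether the stream complains: as far as I know, an InputStreamReader created with just a charset name does not throw on malformed input and quietly substitutes U+FFFD instead. If you want an error, you can ask a CharsetDecoder to report problems. Here is a minimal sketch, assuming the bytes fit in memory; the class name Utf8Check and the method isValidUtf8 are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
public class Utf8Check {

    // Returns true if the bytes decode as UTF-8 without errors.
    public static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] bad = {(byte) 0xC3, (byte) 0x28};
        System.out.println(isValidUtf8(good));
        System.out.println(isValidUtf8(bad));
    }
}
```

If you'd rather stay with a character stream, InputStreamReader also has a constructor that takes a CharsetDecoder, so a decoder configured the same way should make read() fail with an exception on bad input rather than replacing it.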
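I believe the reason there are so many different listings of what is and isn't valid UTF-8 is that the UTF-8 definition has changed a few times (for example, RFC 3629 limits it to sequences of at most four bytes and code points up to U+10FFFF, where earlier definitions allowed up to six bytes). Make sure you're using an up-to-date listing (I'm pretty sure the Wikipedia listing is current).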