Welcome to the Java Programming Forums




Thread: UTF - 8 / Byte Stream

  1. #1
    Junior Member
    Join Date
    Oct 2010
    Posts
    3
    Thanks
    1
    Thanked 0 Times in 0 Posts

    UTF - 8 / Byte Stream

    Hey,
    I need to validate whether a given byte stream is valid UTF-8.
    I need help getting started and with the programming logic. I could accept a file that contains the byte stream, or I could use a Scanner and have the user input the byte stream. We don't know how long the byte stream is.
    I need to make sure that the byte stream is valid UTF-8.
    Any kind of help is appreciated.
    Thank you


  2. #2
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Re: UTF - 8 / Byte Stream

    What do you mean by "validate"? If you're reading from a file, really the only thing you can do is interpret the file as UTF-8 and see whether the result looks like what it's supposed to.

    If you have a stream, send a known character and check that you get the same character back when you read it. (I would recommend sending several known characters, since many character encodings have similar layouts.)

  3. #3
    Junior Member
    Join Date
    Oct 2010
    Posts
    3
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Re: UTF - 8 / Byte Stream

    No, actually, this is a security problem. I take in a byte stream and have to check whether it is valid UTF-8. There is some way to do that; I would probably need to go byte by byte, or bit by bit, to see whether it is in valid UTF-8 format. The main logic is to accept nothing other than UTF-8 and, if the program encounters something that's not UTF-8, to report "NOT Valid Stream".

  4. #4
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Re: UTF - 8 / Byte Stream

    Take a look at Wikipedia: UTF-8. The only way to know that you have an invalid UTF-8 character is if its byte sequence is not allowed by the UTF-8 encoding. However, if the stream somehow delivered one valid UTF-8 character in place of another, you would have no way of knowing whether that character was what the sender intended to send.
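
    To make "byte sequence is not allowed" concrete, here is a minimal sketch of a byte-by-byte check. It assumes the current UTF-8 definition (RFC 3629: at most four bytes per character, no overlong forms, no surrogates, nothing above U+10FFFF). The class and method names are made up for illustration; treat it as a starting point for the exercise, not a reference implementation.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Validator {

    /** Returns true if data is well-formed UTF-8 under the RFC 3629 rules (assumed here). */
    static boolean isValidUtf8(byte[] data) {
        int i = 0;
        while (i < data.length) {
            int lead = data[i] & 0xFF;              // treat the byte as unsigned
            if (lead < 0x80) {                      // 0xxxxxxx: plain ASCII
                i++;
                continue;
            }
            int extra;                              // continuation bytes expected
            int min;                                // smallest code point for this length
            if (lead >= 0xC2 && lead <= 0xDF) {
                extra = 1; min = 0x80;
            } else if (lead >= 0xE0 && lead <= 0xEF) {
                extra = 2; min = 0x800;
            } else if (lead >= 0xF0 && lead <= 0xF4) {
                extra = 3; min = 0x10000;
            } else {
                return false;                       // 0x80..0xC1 and 0xF5..0xFF never start a character
            }
            if (i + extra >= data.length) {
                return false;                       // sequence is cut off at the end of the data
            }
            int cp = lead & (0x3F >> extra);        // payload bits of the lead byte
            for (int k = 1; k <= extra; k++) {
                int b = data[i + k] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    return false;                   // continuation bytes must look like 10xxxxxx
                }
                cp = (cp << 6) | (b & 0x3F);
            }
            if (cp < min) return false;                       // overlong encoding
            if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate range
            if (cp > 0x10FFFF) return false;                  // beyond the Unicode range
            i += extra + 1;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] good = "h\u00E9llo \u20AC".getBytes(StandardCharsets.UTF_8);
        byte[] bad = { (byte) 0xC3, (byte) 0x28 };  // lead byte followed by a non-continuation byte
        System.out.println(isValidUtf8(good) ? "Valid Stream" : "NOT Valid Stream");
        System.out.println(isValidUtf8(bad) ? "Valid Stream" : "NOT Valid Stream");
    }
}
```

    For a stream of unknown length, the same checks can be applied while reading one byte at a time, keeping only the expected continuation count and the partial code point between reads.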

    You're probably better off asking the sender for something like an MD5 hash and then checking it against the data you received.
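
    A rough sketch of that hash comparison, using the standard `java.security.MessageDigest` class. The class name, the method names, and the idea that the sender supplies the hash as a hex string are assumptions for illustration. Note that this only shows the bytes arrived unchanged; it says nothing about whether they are UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestCheck {

    /** Returns the MD5 digest of data as a lowercase hex string. */
    static String md5Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform is required to provide
            throw new IllegalStateException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    /** Compares the received bytes against a hash supplied by the sender (hypothetical hex format). */
    static boolean matches(byte[] received, String expectedHex) {
        return md5Hex(received).equalsIgnoreCase(expectedHex);
    }

    public static void main(String[] args) {
        byte[] received = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(received, "5d41402abc4b2a76b9719d911017c592"));
    }
}
```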

  5. The Following User Says Thank You to helloworld922 For This Useful Post:

    JavaCODER (October 16th, 2010)

  6. #5
    Junior Member
    Join Date
    Oct 2010
    Posts
    3
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Re: UTF - 8 / Byte Stream

    I was thinking the same thing, but I have to write a Java program as an exercise. I did look at that wiki article, but many other articles said something different than the wiki did.

    I was hoping someone here would know how to validate this properly.

  7. #6
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Re: UTF - 8 / Byte Stream

    You could also try opening a character stream that uses the UTF-8 encoding.

    See: Byte Encodings and Strings (The Java™ Tutorials > Internationalization > Working with Text)

    I don't know whether the Java stream will complain if there's an invalid character; you'll need to check on this.
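
    One way to check this: as far as I know, an `InputStreamReader` created with just a charset name replaces malformed input with U+FFFD instead of throwing, but a `CharsetDecoder` can be told to report errors. The sketch below assumes that approach; the class and method names are made up for illustration. It reads the stream in chunks, so the length does not need to be known in advance.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {

    /** Builds a UTF-8 decoder that throws on bad input instead of substituting U+FFFD. */
    private static CharsetDecoder strictDecoder() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    /** Checks a byte array that is already in memory. */
    static boolean isValid(byte[] data) {
        try {
            strictDecoder().decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Checks a stream of unknown length by decoding it chunk by chunk and discarding the text. */
    static boolean isValid(InputStream in) {
        try (Reader reader = new InputStreamReader(in, strictDecoder())) {
            char[] buffer = new char[4096];
            while (reader.read(buffer) != -1) {
                // nothing to do: we only care whether decoding fails
            }
            return true;
        } catch (CharacterCodingException e) {
            return false;                           // the decoder reported an invalid sequence
        } catch (IOException e) {
            throw new UncheckedIOException(e);      // a genuine read error, not an encoding problem
        }
    }

    public static void main(String[] args) {
        // Reads from standard input, e.g. java StrictUtf8 < somefile
        System.out.println(isValid(System.in) ? "Valid Stream" : "NOT Valid Stream");
    }
}
```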

    I believe the reason there are many different listings of what is and isn't valid UTF-8 is that the UTF-8 standard has changed a few times. Make sure you're using an up-to-date listing (I'm pretty sure the Wikipedia listing is current).
