
Thread: How to eliminate junk characters from the file...

  1. #1
    Junior Member
    Join Date
    Nov 2009
    Posts
    5
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default How to eliminate junk characters from the file...

    I have an input file that contains junk characters (see the attached file; you will see some junk characters in it).

    These characters are causing problems in the workflow.

    I want to remove these characters by scanning the file with a Java program and deleting any junk character it finds.

    Can anyone tell me the right way to do this? (I found that the editor in which I am posting this thread removes these junk characters when they are pasted in.)


    On the command line, I was able to do the same for one specific character using:
    sed -i 's/\o013//' RawInput.txt

    But I want a general program that removes all junk characters.
    Attached Files
    Last edited by Harry_; November 10th, 2009 at 07:10 AM.


  2. #2
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: How to eliminate junk characters from the file...

    If you can thoroughly define what a junk character is, it's possible to come up with a good regular expression that finds the junk characters and removes them (or even to parse the input by hand and drop them as you read). Is junk anything that doesn't look like a path? Or is it anything preceded by a comma (including the comma)? Are junk characters always separated from the good characters by whitespace? (I'm guessing not, since there is some ",8" stuff after the paths, but I don't know whether that is really junk or not.)
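
    To make the regular-expression idea concrete, here is a minimal sketch under one possible definition of "junk": any ASCII control character other than tab, line feed and carriage return. That definition is an assumption, not something confirmed in this thread; it does cover the vertical tab (octal 013) that the sed command above removes. The class name JunkCharacterFilter, the method stripJunk and the output file name CleanInput.txt are made up for illustration, and the file is read as ISO-8859-1 only because that charset maps every byte to a character without loss. If your definition of junk differs, only the pattern should need to change.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class JunkCharacterFilter {

    // Assumption: "junk" means an ASCII control character (0x00-0x1F, 0x7F)
    // that is not a tab, line feed or carriage return.
    private static final Pattern JUNK =
            Pattern.compile("[\\p{Cntrl}&&[^\\t\\n\\r]]");

    // Returns the text with every character matching JUNK removed.
    static String stripJunk(String text) {
        return JUNK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults; pass real paths as arguments instead.
        Path input = Paths.get(args.length > 0 ? args[0] : "RawInput.txt");
        Path output = Paths.get(args.length > 1 ? args[1] : "CleanInput.txt");

        // ISO-8859-1 is an assumption: it round-trips every byte unchanged,
        // so only the characters matched by JUNK are dropped.
        String raw = new String(Files.readAllBytes(input), StandardCharsets.ISO_8859_1);
        Files.write(output, stripJunk(raw).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

    Unlike the sed command, which removes only the first occurrence on each line, this removes every match in the file. It also writes to a separate output file rather than editing in place, so the original stays available for comparison.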
