
Thread: Stop word removal and stemming

  1. #1
    Junior Member
    Join Date
    Aug 2013
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question Stop word removal and stemming

    I would like to know how to write a program that removes stop words and performs stemming on a given input file, e.g. the minutes of a meeting. I'm new to Java, so I don't have any ideas on how to program this in Java.


  2. #2
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Default Re: Stop word removal and stemming

    ideas on how to program in java
    What parts of your program design are you having problems with coding in java?
    Is your problem with designing a computer program
    or with coding a program design in java?
    If you don't understand my answer, don't ignore it, ask a question.

  3. #3
    Junior Member
    Join Date
    Aug 2013
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Stop word removal and stemming

    Quote Originally Posted by Norm View Post
    What parts of your program design are you having problems with coding in java?
    Is your problem with designing a computer program
    or with coding a program design in java?
    I want to know how to supply an input file for reading, and how to find the stop words and stem the remaining words.
    I'm having trouble with the coding and don't know how to proceed.

  4. #4
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Default Re: Stop word removal and stemming

    input file for reading
    One of the easy classes for reading a file is the Scanner class. It has many useful methods that allow you to read the file in different ways.
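
    As a minimal sketch of the Scanner approach, the class below reads a text file one whitespace-separated word at a time. The class name ScannerFileDemo, the method readWords and the file name minutes.txt are placeholders chosen for this illustration, not names from the original posts.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerFileDemo {

    // Reads every whitespace-separated token from the given file.
    static List<String> readWords(File file) throws FileNotFoundException {
        List<String> words = new ArrayList<>();
        // try-with-resources closes the Scanner (and the file) when done
        try (Scanner in = new Scanner(file)) {
            while (in.hasNext()) {
                words.add(in.next());
            }
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "minutes.txt" is a placeholder name, not a file from this thread
        File file = new File(args.length > 0 ? args[0] : "minutes.txt");
        for (String word : readWords(file)) {
            System.out.println(word);
        }
    }
}
```

    Scanner also offers hasNextLine() and nextLine() if the file should be read a line at a time instead.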

    how to find the stop words and stem
    The String class has many methods for searching a String.
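
    To illustrate which String methods might be useful here, the sketch below uses trim(), split(), toLowerCase() and isEmpty() to break a sentence into words and drop the ones found in a stop-word set. The five-word list and the names StopWordFilterDemo and removeStopWords are assumptions made for the example; a real stop-word list would be much longer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordFilterDemo {

    // A tiny hard-coded stop-word list, for illustration only
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("this", "is", "a", "the", "of"));

    static List<String> removeStopWords(String sentence) {
        List<String> kept = new ArrayList<>();
        // trim() drops leading/trailing blanks; split("\\s+") breaks on runs of whitespace
        for (String token : sentence.trim().split("\\s+")) {
            String word = token.toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [short, sentence]
        System.out.println(removeStopWords("This is a short sentence "));
    }
}
```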

    i'm having trouble with coding.
    Can you describe in more detail what you are trying to code?
    Also an example might help. Post some text that contains what you are trying to find and describe what in that text you want to find.

  5. #5
    Junior Member
    Join Date
    Aug 2013
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Stop word removal and stemming

    I'm doing a project in opinion mining. The basic idea is to identify human behavior based on the interactions in a meeting, e.g. proposing an idea, commenting, acknowledging. I plan to give the minutes of a few meetings as input, and the first module has to preprocess the data using stop word removal, stemming and POS tagging. Say, for example, "This is a short sentence ":
    This DT
    is VBZ
    short JJ
    sentence NN.
    I want to use POS tagging mainly to identify the names of people in the meetings. I hope that gives you an idea of what I'm trying to explain.

  6. #6
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Default Re: Stop word removal and stemming

    Is this what you are trying to do:
    Take a sentence of words, separate the words and assign some tag to each word.
    In your example, the tags were the UPPERCASE letters after each word.
    "This is a short sentence "
    This DT
    is VBZ
    short JJ
    sentence NN.
    DT is the tag for the word: "This"

    Where do the tags come from?

  7. #7
    Junior Member
    Join Date
    Aug 2013
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Stop word removal and stemming

    Can you tell me what code to write in order to read data from a folder?

  8. #8
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Default Re: Stop word removal and stemming

    See the Scanner class for methods that can be used to read data from a single file.
    If you want to read all the files in a folder, you need to use the File class to get a list of the files in the folder and then use the Scanner class to read each file in the list.
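
    A small sketch of the first half of that, getting the list of files with the File class, might look like the following. The class and method names are made up for the example, and the ".txt" filter is an assumption about which files are wanted.

```java
import java.io.File;
import java.util.Locale;

class ListFolderDemo {

    // Returns the .txt files directly inside the folder (subfolders are not searched).
    static File[] listTextFiles(File folder) {
        File[] files = folder.listFiles(
                (dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".txt"));
        // listFiles returns null if the path is not a directory or cannot be read
        return files != null ? files : new File[0];
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no folder is given
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (File file : listTextFiles(folder)) {
            System.out.println(file.getName());
        }
    }
}
```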

  9. #9
    Junior Member
    Join Date
    Aug 2013
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Stop word removal and stemming

    This is the code I have for stop word removal. Can you help me read the data from files in a folder instead of giving it directly in the program?

    package split;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.Set;

    public class Split {

        // Stop words, loaded on first use from stopwords.txt (one word per line)
        static Set<String> stopwords = new HashSet<>();

        public static void main(String[] args) {
            for (String word : Split.words("The rain in spain falls mainly on the plains, except when it's not exactly working that way! And I need+some= way. How~ will \"(this)\" \"work\"?")) {
                System.out.println(word);
            }
        }

        public static void addStopwords() {
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(new FileReader("stopwords.txt"))) {
                String line;
                // readLine() returns null at the end of the file
                while ((line = br.readLine()) != null) {
                    stopwords.add(line.trim().toLowerCase());
                }
            } catch (IOException e) {
                System.out.println(e);
            }
        }

        public static ArrayList<String> words(String line) {
            if (stopwords.isEmpty()) {
                addStopwords();
            }

            ArrayList<String> result = new ArrayList<>();

            // Split on whitespace and punctuation
            String[] words = line.split("[ \t\n,\\.\"!?$~()\\[\\]\\{\\}:;/\\\\<>+=%*]");
            for (String token : words) {
                if (!token.isEmpty()) {
                    String word = token.toLowerCase();
                    if (!stopwords.contains(word)) {
                        // Stemmer is a separate class not shown in this post;
                        // stem(String) is assumed to return a String
                        result.add(Stemmer.stem(word));
                    }
                }
            }

            return result;
        }
    }
    Last edited by jps; September 3rd, 2013 at 12:28 AM. Reason: code tags

  10. #10
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Default Re: Stop word removal and stemming

    can you help me to read the data from files in a folder
    I see that the code is using the BufferedReader class's methods to read a file.
    What problems are you having reading the data?

    Please edit your post and wrap the code with code tags. Be sure the code is properly formatted: nested statements should be indented, and statements should NOT all start in the first column.

  11. #11
    Junior Member
    Join Date
    Aug 2013
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Stop word removal and stemming

    Sir, what I want is to read information from several documents that are stored individually in a folder, e.g. the folder XYZ has 4 documents: a.txt, b.txt, c.txt, d.txt. Can you please help me write code so I can read from this folder?

  12. #12
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Default Re: Stop word removal and stemming

    Did you miss post#8?

    The File class has methods that return a list of the files in a folder. You can use that list to read the files in the folder one at a time.

    The steps are:
    get a list of the files in the folder
    begin loop
    get next file in the list
    read data from that file
    end loop
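
    The steps above can be sketched roughly as follows, using File.listFiles() for the list and a Scanner for each file. The names ReadFolderDemo and readAllLines are invented for this example, and sorting the list is an extra step added only so the files are read in a predictable order.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

class ReadFolderDemo {

    // Reads every line of every regular file directly inside the folder.
    static List<String> readAllLines(File folder) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        File[] files = folder.listFiles();          // get a list of the files in the folder
        if (files == null) {
            return lines;                           // not a folder, or it cannot be read
        }
        Arrays.sort(files);                         // listFiles() promises no particular order
        for (File file : files) {                   // begin loop: get next file in the list
            if (!file.isFile()) {
                continue;                           // skip subfolders
            }
            try (Scanner in = new Scanner(file)) {  // read data from that file
                while (in.hasNextLine()) {
                    lines.add(in.nextLine());
                }
            }
        }                                           // end loop
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Defaults to the current directory when no folder is given
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (String line : readAllLines(folder)) {
            System.out.println(line);
        }
    }
}
```

    Each line read this way could then be handed to a method such as Split.words(line) from post #9 in place of the hard-coded sentence.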

  13. #13
    Junior Member
    Join Date
    Aug 2013
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Stop word removal and stemming

    Quote Originally Posted by Norm View Post
    Did you miss post#8?

    The File class has methods that return a list of the files in a folder. You can use that list to read the files in the folder one at a time.

    The steps are:
    get a list of the files in the folder
    begin loop
    get next file in the list
    read data from that file
    end loop
    OK sir. Can you write the code in Java?
    I'm also getting an exception on this line: BufferedReader br = new BufferedReader(new FileReader("stopwords.txt"))
    It is: java.io.FileNotFoundException: stopwords.txt (The system cannot find the file specified). I'm using NetBeans; where do I save the txt file?

  14. #14
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Default Re: Stop word removal and stemming

    The system cannot find the file
    The program cannot find the file in the location where it is looking. To find out where that is, create a File object for the file you are trying to read and print that File object's absolute path, which shows exactly where the program is looking for the file.
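
    A short sketch of that check is below. The class name WhereIsMyFile and the helper describe are hypothetical; only File.getAbsolutePath(), File.exists() and the "user.dir" system property are standard.

```java
import java.io.File;

class WhereIsMyFile {

    // Shows where a file name resolves to and whether a file exists there.
    static String describe(String name) {
        File file = new File(name);
        return file.getAbsolutePath() + (file.exists() ? " (found)" : " (NOT found)");
    }

    public static void main(String[] args) {
        // Relative names are resolved against the working directory
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println(describe("stopwords.txt"));
    }
}
```

    Once the printed path is known, two possible fixes are to move stopwords.txt to that location or to pass a full path to FileReader.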

    I don't know what your IDE does with the location of files when a program tries to read a file.

  15. #15
    Junior Member
    Join Date
    Aug 2013
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Stop word removal and stemming

    I do not know how to code in Java. Can someone help me read data from a folder instead of from this sentence:
    "for(String word : Split.words("The rain in spain falls mainly on the plains, except when it's not exactly working that way! And I need+some= way. How~ will \"(this)\" \"work\"?"))
    System.out.println(word);"

  16. #16
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Default Re: Stop word removal and stemming

    See the Scanner class for methods to read data from a file. There are many examples of code using the Scanner class's methods here on the forum. Do a Search.

    What have you tried? What problems are you having with it?
