Thread: Fastest way to read and search a string in a large file using core java

  1. #1
    Junior Member
    Join Date
    Nov 2012
    Posts
    10
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Fastest way to read and search a string in a large file using core java

    Hi All,

    I want to read a file, search it for a string, and return the value related to that string.

    The file contains key/value pairs separated by a space. Each key is a string and each value is a URL in an <a> tag. The file size is 60 MB.

    I read the file using a FileInputStream, stored the pairs in a HashMap, and searched the HashMap. It takes less than a second to read the file, search for a string, and return its value.

    The search has the following rules:
    1. The search string can be a sentence. We have to find the keys in the file that match it and return all of their values (URLs). The search string length should be greater than 10.
    2. If the search string has an _ (underscore) at the end and an exact match (key) is found in the file, then there is no need to check the length of the search string.

    Example search string:
    This is my first post to Java Programming Forums.

    If the file contains the key Programming, then we have to return all URLs related to this key.
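One possible reading of the rules above can be sketched as a per-word hash lookup instead of a scan over every key. This is only a sketch under assumptions: the class name `WordLookupSketch`, the `example.com` URL, and the interpretation that a word qualifies when it is longer than 10 characters or ends with an underscore are all hypothetical and not taken from the posted code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: look up each word of the sentence directly in the map.
final class WordLookupSketch {
    // Lower-cased key -> every URL stored under that key.
    private final Map<String, List<String>> index = new HashMap<>();

    void add(String key, String url) {
        index.computeIfAbsent(key.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(url);
    }

    // Assumed reading of the rules: a word is looked up if it is longer than
    // 10 characters, or if it ends with an underscore (any length).
    List<String> search(String sentence) {
        List<String> urls = new ArrayList<>();
        for (String raw : sentence.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Drop trailing sentence punctuation such as '.' or ','.
            String word = raw.replaceAll("[.,;:!?]+$", "");
            if (word.length() > 10 || word.endsWith("_")) {
                List<String> hits = index.get(word);
                if (hits != null) {
                    urls.addAll(hits);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        WordLookupSketch lookup = new WordLookupSketch();
        lookup.add("Programming", "http://example.com/programming");
        lookup.add("jay_", "http://support.sac.com:8080/index.jsp");
        System.out.println(lookup.search("This is my first post to Java Programming Forums."));
        System.out.println(lookup.search("My Name Is jay_ patil_00_."));
    }
}
```

With this layout the cost of a search depends on the number of words in the sentence rather than on the number of keys in the file, which may matter for a 60 MB file. It only finds keys that equal a whole word, so it would not cover keys that contain spaces.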

    I tried Scanner, BufferedReader, FileInputStream with a direct search in the file, and a HashMap. The HashMap gave the best performance.

    Can anybody please suggest how I can improve the performance so that the file read and search complete within 5 milliseconds? This would be a great help.

    Thanks a lot in advance.

    -----------
    The code I have written to read the file is shown below.

    Please suggest changes, or any better approach, so that the read and search complete in under 5 milliseconds.

    1. File CharByCharSearch.java
    import java.io.BufferedReader;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Set;
     
    public class CharByCharSearch {
    	private static HashMap<String, String> mapForKeyValues = new HashMap<String, String>();
    	private static CharByCharSearch getHtml = null;
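    	// Typed instead of raw. Note: a value set in the static initializer is
    	// visible only to the thread that loads this class.
    	private static ThreadLocal<HashMap<String, String>> localPool = new ThreadLocal<HashMap<String, String>>();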
    	private static BufferedReader dataInputStream = null;
    	static {
    		getHtml = new CharByCharSearch();
    		dataInputStream = FileReader.getFileContentsBR();
    		getHtml.grabHTMLLinksSearch();
    		localPool.set(mapForKeyValues);
    	}
     
    	public CharByCharSearch() {
     
    	}
     
    	public void grabHTMLLinksSearch() {
    		String html = "";
    		try {
    			long milliSeconds1 = System.currentTimeMillis();
    			html = dataInputStream.readLine();
    			while (null != html) {
    				// Value: the text after HREF=" up to the closing quote.
    				String firstS = html.substring(html.indexOf("=") + 2);
    				// Key: the text before the " <A" that starts the anchor tag.
    				mapForKeyValues.put(html.substring(0, html.indexOf("<") - 1)
    						.toLowerCase(), firstS.substring(0,
    						firstS.indexOf(" ") - 1));
    				html = dataInputStream.readLine();
    			}
    			// Measure once, after the whole file has been loaded.
    			long milliSeconds2 = System.currentTimeMillis();
    			System.out.println("time taken to load the file (ms): "
    					+ (milliSeconds2 - milliSeconds1));
    		} catch (Exception e) {
    			System.out.println("error when getting the data");
    			e.printStackTrace();
    		} finally {
    			try {
    				if (null != dataInputStream) {
    					dataInputStream.close();
    				}
    			} catch (Exception e) {
    				e.printStackTrace();
    			}
    		}
    	}
     
    	public ArrayList<String> search(String searchWord) {
    		ArrayList<String> linkURLS = new ArrayList<String>();
    		String searchKey = searchWord.toLowerCase();
    		String[] searchKeyValues = searchKey.split(" ");
    		int len = searchKeyValues.length;
    		HashMap<String, String> hashMap = (HashMap<String, String>) localPool
    				.get();
    		Set<String> keys = hashMap.keySet();
    		for (String key : keys) {
    			int index = searchByChar.kmp(searchKey, key);
    			if (key.length() >= 10) {
    				if (-1 != index) {// rule 1 & 2
    					linkURLS.add(mapForKeyValues.get(key));
    				}
    			} else if (key.equalsIgnoreCase(searchKey)) {// rule 3
    				linkURLS.add(mapForKeyValues.get(key));
    			} else if (key.endsWith("_")) {// rule 5
    				if (-1 != index) {
    					linkURLS.add(mapForKeyValues.get(key));
    				}
    			} else if (len > 0) {// rule 4
    				for (int i = 0; i < len; i++) {
    					if (searchKeyValues[i].equalsIgnoreCase(key)) {
    						linkURLS.add(mapForKeyValues.get(key));
    						break;
    					}
    				}
    			}
    		}
    		return linkURLS;
    	}
     
    	public static void main(String[] args) {
    		ArrayList<String> linkURLS = getHtml
    				.search("My Name Is jay_ patil_00_.");
    		for (String value : linkURLS) {
    			System.out.println(value);
    		}
    	}
    }

    --------------------------------------------------------------------------
    2. Second java file searchByChar.java
    public class searchByChar {
    	public static int[] prekmp(String pattern) {
    		int[] next = new int[pattern.length()];
    		int i=0, j=-1;
    		next[0]=-1;
    		while (i<pattern.length()-1) {
    			while (j>=0 && pattern.charAt(i)!=pattern.charAt(j))
    				j = next[j];
    			i++; 
    			j++;
    			next[i] = j;
    		}
    		return next;
    	}
     
    	public static int kmp(String text, String pattern) {
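    		// An empty pattern matches at index 0; prekmp cannot build a table for it.
    		if (pattern.isEmpty()) {
    			return 0;
    		}
    		int[] next = prekmp(pattern);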
    		int i=0, j=0;
    		while (i<text.length()) { 
    			while (j>=0 && text.charAt(i)!=pattern.charAt(j))
    				j = next[j];
    			i++; j++;
    			if (j==pattern.length()) 
    				return i-pattern.length();
    		}
    		return -1;
    	}
     
    }
    --------------------------------------------------------------------------
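For comparison, the JDK's `String.indexOf` answers the same question as `kmp(text, pattern)` in the listing above: the index of the first occurrence, or -1. The sketch below copies the two KMP methods into a hypothetical class `KmpVersusIndexOf` and prints both results side by side for a few non-empty keys. It is only a correctness check, not a benchmark, and whether `indexOf` is faster for this data would have to be measured.

```java
// Hypothetical side-by-side check of the posted KMP code against String.indexOf.
final class KmpVersusIndexOf {
    // Failure table, same construction as prekmp in the post.
    static int[] prekmp(String pattern) {
        int[] next = new int[pattern.length()];
        int i = 0, j = -1;
        next[0] = -1;
        while (i < pattern.length() - 1) {
            while (j >= 0 && pattern.charAt(i) != pattern.charAt(j)) {
                j = next[j];
            }
            i++;
            j++;
            next[i] = j;
        }
        return next;
    }

    // Returns the index of the first match of a non-empty pattern, or -1.
    static int kmp(String text, String pattern) {
        int[] next = prekmp(pattern);
        int i = 0, j = 0;
        while (i < text.length()) {
            while (j >= 0 && text.charAt(i) != pattern.charAt(j)) {
                j = next[j];
            }
            i++;
            j++;
            if (j == pattern.length()) {
                return i - pattern.length();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "my name is jay_ patil_00_.";
        String[] keys = {"jay_", "patil_00_", "programming", "a", "00_."};
        for (String key : keys) {
            System.out.println(key + ": kmp=" + kmp(text, key)
                    + " indexOf=" + text.indexOf(key));
        }
    }
}
```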
    The contents of the text file look like this:

    patil_00_ <A HREF="http://support.jay.com:8080/index.jsp" title="View the Supp" target=_blank class="table">patil_00_</A>
    jay_ <A HREF="http://support.sac.com:8080/index.jsp" title="View the jsp" target=_blank class="link">jay_</A>
    ...........................
    The third file, FileReader.java, reads the text file using DataInputStream and returns the dataInputStream.
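FileReader.java itself is not shown in the post. Since the listing calls `FileReader.getFileContentsBR()` and stores the result in a `BufferedReader`, a minimal stand-in might look like the sketch below. Everything in it is an assumption: the `dataFilePath` field, the file name `links.txt`, and the UTF-8 charset are hypothetical, and the real class may be written differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical stand-in for the FileReader.java that the post does not show.
final class FileReader {
    // Assumed location of the key/value file; the real path is not given.
    static String dataFilePath = "links.txt";

    // Opens the data file as a buffered character stream (UTF-8 is assumed).
    static BufferedReader getFileContentsBR() {
        try {
            return Files.newBufferedReader(Paths.get(dataFilePath), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write one sample line to a temporary file so the sketch runs on its own.
        Path tmp = Files.createTempFile("links", ".txt");
        Files.write(tmp, Arrays.asList(
                "jay_ <A HREF=\"http://support.sac.com:8080/index.jsp\" title=\"View the jsp\" target=_blank class=\"link\">jay_</A>"),
                StandardCharsets.UTF_8);
        dataFilePath = tmp.toString();
        try (BufferedReader reader = getFileContentsBR()) {
            System.out.println(reader.readLine());
        }
        Files.delete(tmp);
    }
}
```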
    Last edited by patilsn_jay; November 26th, 2012 at 04:38 AM. Reason: added code also


  2. #2
    Junior Member
    Join Date
    Nov 2012
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Fastest way to read and search a string in a large file using core java

    Assuming the file looks like:

    <key1> <value1> <key2> <value2>... and if the search string is a substring of a key, you return each of those results?

    How are you hashing it right now? Are you hashing every possible substring?

  4. #4
    Junior Member
    Join Date
    Nov 2012
    Posts
    10
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Re: Fastest way to read and search a string in a large file using core java

    Quote Originally Posted by concerto49 View Post
    Assuming it looks like:

    <key1> <value1> <key2> <value>... and IF search string is a substring of a key, return each of these results?

    How are you hashing it right now? Are you hashing every substring possible?

    The code I have written to read the file is the same code shown in my first post (CharByCharSearch.java and searchByChar.java).

    Please suggest changes, or any better approach, so that the read and search complete in under 5 milliseconds.
