Creating A Count Matrix In Java
Hi all,
I do some programming here and there, but it has been a long time since I've sat down and done anything with Java and its data structures.
I have a large number of .txt files (19 right now, but it could grow to as many as 160ish). Each .txt file lists the genes of one plant species along with their locations. Here is how one of these .txt files starts:
Name: Zea Mays
FileName: NC_001666
bp: 160719
Genes: rps12 (90173..91321)
rps7 (92413..94170)
ndhB -(95619..96101)
The file continues with more lines of genes and their respective locations. I need a Java program that reads each of these .txt files and builds a matrix with every gene name that appears anywhere in the files (e.g. rps12) down the left and every species FileName (e.g. NC_001666) across the top. Each intersection should then contain the number of times that gene appears in that file. So, if rps12 appears 4 times in NC_001666.txt, then a 4 would be put at the intersection of rps12 and NC_001666.
Like I said before, it's been so long since I've worked with Java, I really have no idea how to start this or what data structures would be most useful, but if anyone could give me some help, that would be great. Thanks!
-statsman5
Re: Creating A Count Matrix In Java
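It'd be really easy to do this with a modified set, i.e. one that stores a count next to each gene name instead of just the name. I'd recommend a hash table keyed on the gene name: lookups are fast, and it's probably the easiest to write. Once you have all your data read in, you can convert it to pretty much any other data structure that fits your needs.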
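To make the hash table suggestion a bit more concrete, here is a minimal sketch using java.util.HashMap. The class name GeneTally and the method countGenes are made up for illustration, and the sketch assumes the layout shown in the question: the gene list starts on the "Genes:" line, and the gene name is the first whitespace-separated token on each line after that.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names here are not from the original post.
class GeneTally {
    /** Counts how often each gene name appears in the lines of one file. */
    static Map<String, Integer> countGenes(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        boolean inGenes = false;
        for (String line : lines) {
            String text = line.trim();
            // Assumption: the first gene shares a line with the "Genes:" label.
            if (text.startsWith("Genes:")) {
                inGenes = true;
                text = text.substring("Genes:".length()).trim();
            }
            if (!inGenes || text.isEmpty()) {
                continue; // header line or blank line
            }
            // The gene name is the first token; the location after it is ignored.
            String name = text.split("\\s+")[0];
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines from the question; the last line is repeated
        // here only to show a count greater than 1.
        List<String> sample = Arrays.asList(
            "Name: Zea Mays",
            "FileName: NC_001666",
            "bp: 160719",
            "Genes: rps12 (90173..91321)",
            "rps7 (92413..94170)",
            "ndhB -(95619..96101)",
            "rps12 (90173..91321)");
        System.out.println(countGenes(sample));
    }
}
```

Map.merge either inserts a 1 for a new gene or adds 1 to the existing count, so there is no need to check for the key first. HashMap does not guarantee any iteration order; a TreeMap would give sorted gene names if that matters.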
Re: Creating A Count Matrix In Java
Lucky you, I got bored :D
Code :
import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;

/**
 * Counts how often each gene name occurs in a single gene-list file.
 *
 * @author Andrew
 */
public class GeneCounter
{
    String fileName;
    // hash table with separate chaining: each bucket is a list of genes
    ArrayList<ArrayList<Gene>> geneHash;

    /**
     * A simple test handler
     *
     * @param args
     * @throws Exception
     */
    public static void main (String[] args) throws Exception
    {
        Scanner reader = new Scanner(System.in);
        boolean quit = false;
        ArrayList<GeneCounter> listOfGenes = new ArrayList<GeneCounter>();
        while (!quit)
        {
            System.out.println("Input next file to add to matrix, or quit: ");
            String input = reader.nextLine();
            if (input.equals("quit"))
            {
                quit = true;
            }
            else
            {
                GeneCounter counter = new GeneCounter(input);
                listOfGenes.add(counter);
                counter.printOut();
            }
        }
    }

    /**
     * Builds a GeneCounter for a file
     *
     * @param fileName path of the gene-list file to read
     */
    public GeneCounter (String fileName) throws Exception
    {
        this.fileName = fileName;
        Scanner file = new Scanner(new File(fileName));
        // header stuff: skip Name, FileName and bp lines, then the "Genes:" label
        file.nextLine();
        file.nextLine();
        file.nextLine();
        file.next();
        // first pass: count the gene lines to determine a good hash table size
        int numGenes = 0;
        while (file.hasNext())
        {
            numGenes++;
            file.nextLine();
        }
        file.close();
        // allocate exactly as many buckets as the hash function can return
        int tableSize = numGenes / 10 + 1;
        geneHash = new ArrayList<ArrayList<Gene>>();
        for (int i = 0; i < tableSize; i++)
        {
            geneHash.add(new ArrayList<Gene>());
        }
        // second pass: read in genes and hash them
        file = new Scanner(new File(fileName));
        // header
        file.nextLine();
        file.nextLine();
        file.nextLine();
        file.next();
        while (file.hasNext())
        {
            addHash(file.next(), tableSize);
            // skip range (guarded in case the last line has no range after the name)
            if (file.hasNextLine())
            {
                file.nextLine();
            }
        }
        file.close();
    }

    /**
     * Prints out a simple statistics list for this gene counter
     */
    public void printOut ()
    {
        for (int i = 0; i < geneHash.size(); i++)
        {
            for (int j = 0; j < geneHash.get(i).size(); j++)
            {
                Gene gene = geneHash.get(i).get(j);
                System.out.println(gene.name + " occurred " + gene.occurences);
            }
        }
    }

    /**
     * Adds a gene to the hash table. A duplicate just increases the number of occurrences
     *
     * @param name gene name to add
     * @param hashSize number of buckets in the hash table
     */
    public void addHash (String name, int hashSize)
    {
        int index = hash(new Gene(name), hashSize);
        for (int i = 0; i < geneHash.get(index).size(); i++)
        {
            // try to find the match
            if (geneHash.get(index).get(i).name.equals(name))
            {
                geneHash.get(index).get(i).increaseOccurance();
                return;
            }
        }
        // didn't find it, add
        geneHash.get(index).add(new Gene(name));
    }

    /**
     * Hashes a gene, returns the bucket index
     *
     * @param element gene to hash
     * @param tableSize number of buckets in the hash table
     * @return an index in the range 0 to tableSize - 1
     */
    public int hash (Gene element, int tableSize)
    {
        int hash = 0;
        for (int i = 0; i < element.name.length(); i++)
        {
            hash = ((int) element.name.charAt(i)) + hash * 2;
        }
        // hash can overflow to a negative value for long names;
        // floorMod keeps the index non-negative where % would not
        return Math.floorMod(hash, tableSize);
    }

    /**
     * Simple gene representation class
     *
     * @author Andrew
     */
    private class Gene
    {
        String name;
        int occurences;

        public Gene (String name)
        {
            this.name = name;
            this.occurences = 1;
        }

        public void increaseOccurance ()
        {
            occurences++;
        }
    }
}
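Note that the class above prints the counts for each file separately and doesn't build the combined matrix yet. One possible way to get the matrix, sketched here with the standard java.util maps instead of the hand-rolled hash table, is to keep a nested map from gene name to FileName to count. The class CountMatrix and its methods are made-up names for illustration, and the sketch assumes every file has a "FileName:" line before its "Genes:" line, as in the sample.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class; the names here are illustrative, not from the thread.
class CountMatrix {
    // gene name -> (FileName -> count); TreeMap keeps the rows sorted by gene
    private final Map<String, Map<String, Integer>> rows = new TreeMap<>();
    // every FileName seen so far, sorted, used as the column headers
    private final TreeSet<String> columns = new TreeSet<>();

    /** Adds the genes of one file, given all of its lines. */
    void addFile(List<String> lines) {
        String column = null;
        boolean inGenes = false;
        for (String line : lines) {
            String text = line.trim();
            if (text.startsWith("FileName:")) {
                column = text.substring("FileName:".length()).trim();
                columns.add(column);
            } else if (text.startsWith("Genes:")) {
                inGenes = true;
                text = text.substring("Genes:".length()).trim();
            }
            if (!inGenes || column == null || text.isEmpty()) {
                continue; // still in the header, or nothing to count
            }
            String gene = text.split("\\s+")[0];
            rows.computeIfAbsent(gene, k -> new TreeMap<>())
                .merge(column, 1, Integer::sum);
        }
    }

    /** Returns the count at one intersection, or 0 if the gene never appears there. */
    int count(String gene, String column) {
        return rows.getOrDefault(gene, Collections.emptyMap())
                   .getOrDefault(column, 0);
    }

    /** Renders the matrix as tab-separated text: genes down, FileNames across. */
    String render() {
        StringBuilder out = new StringBuilder("gene");
        for (String c : columns) {
            out.append('\t').append(c);
        }
        out.append('\n');
        for (String gene : rows.keySet()) {
            out.append(gene);
            for (String c : columns) {
                out.append('\t').append(count(gene, c));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CountMatrix file1.txt file2.txt ...
        CountMatrix matrix = new CountMatrix();
        for (String path : args) {
            matrix.addFile(Files.readAllLines(Paths.get(path)));
        }
        System.out.print(matrix.render());
    }
}
```

The tab-separated output can be pasted into a spreadsheet or read by most statistics packages. With around 160 files the nested maps should still be small enough to hold in memory, though that depends on how many distinct genes turn up.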