I need ideas on how to read this
Hi,
I want to read a flat file with the following info:
http://www.geneontology.org/ontology...ology_edit.obo
There are about 30,000 Terms. Most Terms are linked to others, and every Term has an id in the format GO:number, e.g. GO:0006310.
For every term I need to get:
- its id
- its is_a id
- its relationship id
For instance:
Quote:
[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
synonym: "regulation of recombination within rDNA repeats" NARROW []
is_a: GO:0000018 ! regulation of DNA recombination
relationship: regulates GO:0006312 ! mitotic recombination
I need:
GO:0000019, GO:0000018, GO:0006312.
Finally, I must ignore a Term when "is_obsolete: true" is present. (The Term is not relevant and I don't need its info.)
I don't need any Java code (although any suggestion is greatly appreciated), but I need a way to get this done. My final goal is to build a matrix with each Term's id in the first column and the other ids found for that Term in the following columns. Any ideas on how to do this?
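One possible approach, as a rough sketch only: read the file line by line, start a new row at every "[Term]" line, collect the first GO id found on each "id:", "is_a:" and "relationship:" line, and drop the row if "is_obsolete: true" shows up before the next stanza header. The class name OboTermReader and the method readRows are made up for illustration. The sketch assumes every stanza starts with a line in square brackets, that each tag sits on its own line, and that the first GO id on a line is the one wanted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any GO or OBO library.
class OboTermReader {
    // Matches ids such as GO:0006310.
    private static final Pattern GO_ID = Pattern.compile("GO:\\d+");

    /** One row per non-obsolete [Term]: its own id first, then the is_a and relationship ids. */
    static List<List<String>> readRows(Iterable<String> lines) {
        List<List<String>> rows = new ArrayList<List<String>>();
        List<String> row = null;   // row of the stanza being read; null outside a [Term]
        boolean obsolete = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("[")) {
                // Any stanza header ([Term], [Typedef], ...) closes the previous stanza.
                addIfKept(rows, row, obsolete);
                row = line.equals("[Term]") ? new ArrayList<String>() : null;
                obsolete = false;
            } else if (row != null) {
                if (line.startsWith("id:") || line.startsWith("is_a:")
                        || line.startsWith("relationship:")) {
                    Matcher m = GO_ID.matcher(line);
                    if (m.find()) {
                        row.add(m.group());   // first GO id on the line
                    }
                } else if (line.equals("is_obsolete: true")) {
                    obsolete = true;
                }
            }
        }
        addIfKept(rows, row, obsolete);   // the last stanza has no header after it
        return rows;
    }

    private static void addIfKept(List<List<String>> rows, List<String> row, boolean obsolete) {
        if (row != null && !obsolete && !row.isEmpty()) {
            rows.add(row);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to a local copy of the .obo file.
        for (List<String> row : readRows(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

Since Terms can have different numbers of is_a and relationship lines, the rows come out with different lengths, so a list of lists is probably easier to work with than a fixed-size array. For the Term quoted above, the row would hold GO:0000019, GO:0000018 and GO:0006312.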