Use a third-party library such as Apache POI (the Java API for Microsoft Documents) to extract the text from the file. Then parse the extracted text as appropriate.
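
As a rough illustration of the second step, here is a minimal sketch that assumes the text has already been pulled out of the document as one String (for a .doc file, POI's `WordExtractor.getText()` would be one way to get it). The class `ExtractedTextParser` and its method `nonBlankLines` are hypothetical names made up for this example, not part of POI, and "parsing" here just means splitting into trimmed, non-blank lines. Your real parsing will depend on what the document contains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Apache POI.
final class ExtractedTextParser {

    // Splits already-extracted text into trimmed, non-blank lines.
    static List<String> nonBlankLines(String text) {
        List<String> lines = new ArrayList<>();
        // \R matches any line break (\n, \r\n, or a lone \r).
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the text a POI extractor would return (assumed sample data).
        String extracted = "Invoice 42\r\n\r\n  Total: 19.99  \n";
        for (String line : nonBlankLines(extracted)) {
            System.out.println(line);
        }
    }
}
```

Keeping the parsing in its own method that takes a plain String means it does not care which extractor produced the text, so you can test it without any Office files around.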