Take a look at the Apache Software Foundation's projects, in particular PDFBox and POI: the former is for working with PDFs and the latter for Excel files.