Hello!

I am currently a graduate student working on a project at a university in Pittsburgh, PA.

I am trying to use a Named Entity Recognizer (NER) in JGAAP, a free program that analyzes text for authorship attribution. My project uses the NER from Stanford University to 1) locate each named entity and 2) return the words an author uses immediately before and after it. The goal is to see whether authors differ in the words they frequently use before and after named entities and, if so, whether those words can help determine which author wrote a given work.
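To make the goal concrete, here is a minimal, self-contained sketch of the before/after counting I have in mind. It does not call the Stanford NER at all. It assumes the tokens have already been labeled and that the label "O" means "not part of an entity" (an assumption about the label scheme). The names ContextWordCounter and countContextWords are hypothetical and not part of JGAAP. Unlike the second piece of code below, this sketch skips a neighbor that is itself an entity token, so a multi-word name such as "John Smith" does not count "Smith" as a word after "John".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ContextWordCounter {

    // Assumed label for tokens that are not part of any named entity.
    static final String OUTSIDE = "O";

    /**
     * Counts the non-entity words that appear directly before ("B:") and
     * directly after ("A:") each entity token. The two lists are parallel:
     * labels.get(i) is the label of words.get(i).
     */
    static Map<String, Integer> countContextWords(List<String> words, List<String> labels) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < words.size(); i++) {
            if (OUTSIDE.equals(labels.get(i))) {
                continue; // not an entity token, nothing to record
            }
            // Word before the entity, unless it belongs to an entity itself.
            if (i > 0 && OUTSIDE.equals(labels.get(i - 1))) {
                counts.merge("B:" + words.get(i - 1), 1, Integer::sum);
            }
            // Word after the entity, unless it belongs to an entity itself.
            if (i < words.size() - 1 && OUTSIDE.equals(labels.get(i + 1))) {
                counts.merge("A:" + words.get(i + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "Yesterday", "John", "Smith", "visited", "Pittsburgh", "again");
        List<String> labels = Arrays.asList(
                "O", "PERSON", "PERSON", "O", "LOCATION", "O");
        // prints {A:again=1, A:visited=1, B:Yesterday=1, B:visited=1}
        System.out.println(countContextWords(words, labels));
    }
}
```

In the real event drivers, the words and labels would come from the classifier output rather than from hand-written lists.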

I am fairly new to Java programming. I have taken two courses and done a few small projects, but I am not sure where my current code is failing.

I have posted two pieces of code below: the first 1) recognizes the named entities in a work and returns them as a list, and the second 2) returns the words an author uses before and after each entity, with a frequency count.

Currently, the first file prints null for every word; I cannot get the NER to recognize a single named entity. This in turn breaks the second piece of code.

I have been working on this project for a few weeks and am stumped as to where to go next. If anyone has any input, please respond.

Thank you very much for your time and assistance.


FIRST CODE:

package com.jgaap.eventDrivers;

import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.*;
import edu.stanford.nlp.ling.CoreLabel;

import java.util.List;

import com.jgaap.generics.Document;
import com.jgaap.generics.EventDriver;
import com.jgaap.generics.EventGenerationException;
import com.jgaap.generics.EventSet;
import com.jgaap.generics.Event;

public class StanfordNamedEntityRecognizer extends EventDriver {

    private volatile AbstractSequenceClassifier<CoreLabel> classifier;

    @Override
    public String displayName() {
        return "Stanford Named Entity Recognizer";
    }

    @Override
    public String tooltipText() {
        return "A Named Entity Recognizer developed by the Stanford NLP Group http://nlp.stanford.edu";
    }

    @Override
    public boolean showInGUI() {
        return true;
    }

    @Override
    synchronized public EventSet createEventSet(Document doc)
            throws EventGenerationException {
        EventSet eventSet = new EventSet();
        // String serializedClassifier = "/com/jgaap/resources/models/ner/english.all.3class.distsim.crf.ser.gz"; original classifier
        String serializedClassifier = "/com/jgaap/resources/models/ner/english.muc.7class.distsim.crf.ser.gz"; // Runs with this one too. Still no output besides list of null
        if (classifier == null)
            synchronized (this) {
                if (classifier == null) {
                    try {
                        classifier = CRFClassifier.getJarClassifier(serializedClassifier, null);
                    } catch (Exception e) {
                        e.printStackTrace();
                        throw new EventGenerationException(
                                "Classifier failed to load");
                    }
                }
            }

        String fileContents = doc.stringify();
        List<List<CoreLabel>> out = classifier.classify(fileContents);
        for (List<CoreLabel> sentence : out) {
            for (CoreLabel word : sentence) {
                System.out.println(word.ner()); // Added to see if it is finding any words. Everything is being returned null.
                if (word.ner() != null) {
                    eventSet.addEvent(new Event(word.word()));
                    System.out.println(word.toString() + "\t" + word.word()
                            + "\t" + word.ner());
                }
            }
        }
        return eventSet;
    }

}





SECOND CODE:

package com.jgaap.eventDrivers;

import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

import java.util.List;

import com.jgaap.generics.Document;
import com.jgaap.generics.Event;
import com.jgaap.generics.EventDriver;
import com.jgaap.generics.EventGenerationException;
import com.jgaap.generics.EventSet;

public class WordsBeforeAfterNamedEntities extends EventDriver {

    private volatile AbstractSequenceClassifier<CoreLabel> classifier;

    String serializedClassifier = "com.jgaap.generics.Document";

    @Override
    public String displayName() {
        return "Words Before and After Named Entities";
    }

    @Override
    public String tooltipText() {
        return "Counts the words used before and after named entities";
    }

    @Override
    public boolean showInGUI() {
        return true;
    }

    @Override
    public EventSet createEventSet(Document doc)
            throws EventGenerationException {
        EventSet eventSet = new EventSet();
        String serializedClassifier = "/com/jgaap/resources/models/ner/english.all.3class.distsim.crf.ser.gz";
        if (classifier == null)
            synchronized (this) {
                if (classifier == null) {
                    try {
                        classifier = CRFClassifier.getJarClassifier(
                                serializedClassifier, null);
                    } catch (Exception e) {
                        e.printStackTrace();
                        throw new EventGenerationException(
                                "Classifier failed to load");
                    }
                }
            }
        String fileContents = doc.stringify();
        List<List<CoreLabel>> out = classifier.classify(fileContents);

        for (int i = 0; i < out.size(); i++) {
            for (int j = 0; j < out.get(i).size(); j++) {
                if (out.get(i).get(j).ner() != null) {
                    if (j > 0) {
                        eventSet.addEvent(new Event("B"
                                + out.get(i).get(j - 1).word()));
                    }
                    if (j < out.get(i).size() - 1) {
                        eventSet.addEvent(new Event("A"
                                + out.get(i).get(j + 1).word()));
                    }
                }
            }
        }
        return eventSet;
    }
}