MontyLingua
MontyLingua is a popular natural language processing (NLP) toolkit: a suite of libraries and programs for symbolic and statistical NLP in both the Python and Java programming languages. It is enriched with common sense knowledge about the everyday world from Open Mind Common Sense. From English sentences it extracts subject/verb/object tuples; adjectives, noun phrases and verb phrases; people's names, places, events, dates and times; and other semantic information. It does not require training. It was written by Hugo Liu at MIT in 2003.
Because it is enriched with common sense knowledge, it can avoid many tagging mistakes. For example, in the sentence "the mosquito bit the boy", tagging "bit" as a noun (NN) folds it into the first noun chunk, whereas tagging it as a past-tense verb (VBD) yields a separate verb chunk. Compare the incorrect analysis with the correct one:
- "(NX the/DT mosquito/NN bit/NN NX) (NX the/DT boy/NN NX)"
vs.
- "(NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)"[1]
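The bracketed notation above can be read mechanically. The following is a minimal sketch, not part of MontyLingua itself, that parses such a string into chunk labels and word/tag pairs; the names `CHUNK_RE`, `parse_chunks` and `has_verb_chunk` are hypothetical and chosen only for illustration.

```python
import re

# Matches one chunk such as "(NX the/DT mosquito/NN NX)".
# The label is repeated before the closing parenthesis, hence the backreference.
CHUNK_RE = re.compile(r"\((\w+) (.*?) \1\)")


def parse_chunks(chunked):
    """Return a list of (chunk_label, [(word, tag), ...]) pairs."""
    chunks = []
    for label, body in CHUNK_RE.findall(chunked):
        # Each token has the form word/TAG; split on the last slash.
        pairs = [tuple(token.rsplit("/", 1)) for token in body.split()]
        chunks.append((label, pairs))
    return chunks


def has_verb_chunk(chunked):
    """True if the analysis contains at least one VX (verb) chunk."""
    return any(label == "VX" for label, _ in parse_chunks(chunked))


wrong = "(NX the/DT mosquito/NN bit/NN NX) (NX the/DT boy/NN NX)"
right = "(NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)"
print(has_verb_chunk(wrong), has_verb_chunk(right))
```

Under these assumptions, the first analysis contains no verb chunk at all, which is why no subject/verb/object tuple could be extracted from it.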
MontyLingua is free for non-commercial use. When used for non-commercial, non-proprietary purposes, such as academic research, the software is covered by the GNU GPL.
Abilities
- MontyTokenizer: normalizes punctuation, spacing and contractions, with sensitivity to abbreviations.
- MontyTagger: part-of-speech tagging using the Penn Treebank tagset, enriched with "Common Sense" from the Open Mind Common Sense project. Exceeds the accuracy of the Brill94 TBL tagger using default training files.
- MontyREChunker: chunks tagged text into verb, noun, and adjective chunks (VX, NX, and AX respectively).
- MontyExtractor: extracts verb-argument structures, phrases, and other semantically valuable information from sentences and returns sentences as "digests"
- MontyLemmatiser: part-of-speech sensitive lemmatisation. Strips plurals (geese-->goose) and tense (were-->be, had-->have). Includes regexps from Humphreys and Carroll's morph.lex, and UPENN's XTAG corpus
- MontyNLGenerator: generates summaries and surface-form sentences, determines and numbers NPs, tenses verbs, and accounts for sentence_type.
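To make the chunking stage concrete, here is a toy sketch of regular-expression chunking over Penn Treebank tags. It is not MontyREChunker's actual rule set: the function names `tag_code` and `chunk_tagged` and the three patterns are assumptions made for illustration, using only the VX, NX and AX labels described above.

```python
import re

# Toy patterns over one-letter tag codes (an assumption, not MontyLingua's rules):
#   NX = optional determiner, any adjectives, one or more nouns
#   VX = one or more verbs
#   AX = one or more adjectives not absorbed by a noun chunk
PATTERN = re.compile(r"(?P<NX>D?J*N+)|(?P<VX>V+)|(?P<AX>J+)")


def tag_code(tag):
    """Collapse a Penn Treebank tag into a single letter."""
    if tag == "DT":
        return "D"
    if tag.startswith("JJ"):
        return "J"
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("VB"):
        return "V"
    return "O"  # anything else stays outside the chunks


def chunk_tagged(tagged):
    """Wrap runs of 'word/TAG' tokens in (NX ... NX), (VX ... VX) or (AX ... AX)."""
    tokens = tagged.split()
    codes = "".join(tag_code(t.rsplit("/", 1)[1]) for t in tokens)
    out, pos = [], 0
    for m in PATTERN.finditer(codes):
        out.extend(tokens[pos:m.start()])  # unchunked tokens pass through
        label = m.lastgroup
        words = " ".join(tokens[m.start():m.end()])
        out.append("(%s %s %s)" % (label, words, label))
        pos = m.end()
    out.extend(tokens[pos:])
    return " ".join(out)


print(chunk_tagged("the/DT mosquito/NN bit/VBD the/DT boy/NN"))
print(chunk_tagged("the/DT mosquito/NN bit/NN the/DT boy/NN"))
```

Because each stage consumes the previous stage's output, a single wrong tag (NN instead of VBD for "bit") changes the chunks that later stages see, which is why tagger accuracy matters for the extractor.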