“NLP has a tremendous number of uses in the modern enterprise, which remains heavily text driven. Consider the sheer volume of text which flows through your organization on a daily basis, in the form of email, Word documents, PowerPoint slide decks, and instant messaging. If we can imbue computers with the ability to “understand” this text, then we can automate workflows, automatically route documents to the users who need them, quickly classify documents for more efficient retrieval at a later date, quickly extract summaries from long documents, and allow computers to answer questions posed in their natural form.”
“Built on proprietary natural language and artificial intelligence our cloud-based Content Transformation Platform™ reads, understands, and transforms the vast amount of Big Data found in the world and automatically publishes unique, insightful, and optimized digital stories…at massive scale…at a fraction of the cost!”
Question-answering systems also become a reality with NLP tools, as you may have gathered from watching IBM’s Watson computer perform on the television program Jeopardy! Of course, Watson represents the cutting edge of NLP applications, in terms of both hardware and software, and is probably financially out of reach for many organizations.
Using automatic summarization and document classification techniques, you can have the computer pre-read large numbers of documents for you and isolate the ones that are “about” the topics that matter in your role. Whether it’s “bottom line revenue” or “The Frozgobbit Project”, you can avoid tediously scanning large groups of documents for the handful that contain relevant information.
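To make the pre-reading idea a little more concrete, here is a minimal sketch in Python. It is not a trained classifier or a summarizer. It swaps in a much simpler technique, scoring each document by the TF-IDF weight of the words in a topic phrase, and keeps the best matches. The function names (`tokenize`, `rank_by_topic`) and the sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_by_topic(documents, topic, top_n=3):
    """Return (index, score) pairs for the documents most relevant to `topic`.

    Each document is scored by summing the TF-IDF weights of the topic's words.
    This is an illustrative keyword filter, not a full document classifier.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    topic_words = set(tokenize(topic))
    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for word in topic_words:
            if counts[word]:
                tf = counts[word] / len(tokens)
                idf = math.log(n_docs / doc_freq[word]) + 1.0
                score += tf * idf
        scores.append((index, score))

    # Keep only documents that mention the topic at all, best match first.
    relevant = [pair for pair in scores if pair[1] > 0]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:top_n]


# Hypothetical sample documents, invented for this example.
docs = [
    "Quarterly revenue grew while bottom line revenue stayed flat.",
    "Lunch menu for the cafeteria this week.",
    "Status update: the Frozgobbit Project slipped two weeks.",
]
matches = rank_by_topic(docs, "Frozgobbit Project")
```

Here `matches` contains only the third document, because it is the only one that mentions either topic word. A real system would likely add stop-word removal, stemming, and a trained model, but even a filter this simple can shrink the pile of documents a person has to scan.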
An excellent way to start exploring the world of NLP is by working with the Enron Corpus, a large body of real-world emails that became publicly available after being subpoenaed during the Enron trial. Text mining and NLP researchers have found this to be an excellent corpus to work with because the data is, like most real-world data, a bit messy. It reflects the way people communicate in real life, as opposed to being a purely academic tool. The Enron Email Dataset website has the actual data available for download, as well as pointers to articles, research papers, and other researchers working with this corpus.
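As a starting point for exploring a corpus like this, the sketch below uses Python's standard `email` package to pull the sender, subject, and body out of a raw message. It assumes each message is stored as a plain-text file with RFC 822-style headers. The sample message, the function names, and the directory layout walked by `iter_messages` are assumptions for illustration, not a description of the official download.

```python
from email import message_from_string
from email.utils import parseaddr
from pathlib import Path


def summarize_message(raw_text):
    """Parse one raw email and return its sender, subject, and body text."""
    msg = message_from_string(raw_text)
    _, sender = parseaddr(msg.get("From", ""))
    # Multipart messages return a list of parts; this sketch skips them.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "sender": sender,
        "subject": msg.get("Subject", ""),
        "body": body,
    }


def iter_messages(root):
    """Yield (path, summary) for every file under `root` (a hypothetical layout)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # latin-1 decodes any byte sequence, which suits messy real-world mail.
            yield path, summarize_message(path.read_text(encoding="latin-1"))


# A made-up message in the assumed format, used only to exercise the parser.
SAMPLE = (
    "Message-ID: <hypothetical-1@example.com>\n"
    "From: Alice Example <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: Project status\n"
    "\n"
    "The schedule slipped by two weeks.\n"
)
summary = summarize_message(SAMPLE)
```

Once messages are reduced to simple dictionaries like `summary`, the body text can be handed to whatever tokenizer, classifier, or summarizer you want to experiment with.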