MALLET
Making sense of your data.

Ever experienced data overload in a project?  These days it’s not uncommon to have more data than you know what to do with.  MALLET, the MAchine Learning for LanguagE Toolkit, is a Java library that bundles machine learning algorithms for working with text, including document classification, sequence tagging, clustering, and topic modeling.  Using these algorithms, it’s possible to better organize and understand your data.

Fieldstone Software has experience using the document classification capabilities of MALLET.  A simple example is a spam classifier like the one behind your email inbox.  Given examples of good mail and examples of spam, we can train a Naive Bayes classifier on the words each kind of message tends to contain, then use it to estimate how likely a new message is to be spam.
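To make the spam example concrete, here is a minimal sketch of a Naive Bayes text classifier written in plain Java.  It does not use MALLET’s API; the class name NaiveBayesSketch, its methods, and the sample messages are all made up for illustration, and a real project would lean on MALLET’s own classifier trainers and a far larger training set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of MALLET.
class NaiveBayesSketch {
    // label -> (word -> number of times the word appeared under that label)
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // label -> total number of words seen under that label
    private final Map<String, Integer> totalWords = new HashMap<>();
    // label -> number of training messages with that label
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase the text and split on anything that is not a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue; // split() can yield a leading empty token
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Returns the label with the highest log prior plus summed log likelihoods.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int labelTotal = totalWords.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                int count = counts.getOrDefault(token, 0);
                // Add-one (Laplace) smoothing so unseen words never give log(0).
                score += Math.log((count + 1.0) / (labelTotal + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        // Made-up training messages.
        nb.train("spam", "Win a free prize now");
        nb.train("spam", "Free money, claim your prize");
        nb.train("ham", "Meeting notes attached for review");
        nb.train("ham", "Lunch tomorrow at noon?");
        System.out.println(nb.classify("Claim your free prize"));
        System.out.println(nb.classify("Notes for the lunch meeting"));
    }
}
```

Scores are summed as logarithms rather than multiplied as raw probabilities, which keeps long messages from underflowing to zero.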

We’ve also used the sequence tagging functionality that MALLET provides to extract important information from data.  By employing named-entity recognition techniques, we can identify the people, places, and organizations a document mentions without having to read through the entire text ourselves.  Imagine someone hands you a book and asks you for all the characters and locations featured throughout the text.  Using named-entity recognition, a computer can produce that list in seconds, compared to the hours it would take a human.
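As a rough illustration of what the output of entity extraction looks like, the sketch below uses a plain dictionary (gazetteer) lookup.  This is a much simpler stand-in for the statistical sequence tagging MALLET performs: it only finds names it has been told about, whereas a trained tagger learns to recognize new ones from context.  The class GazetteerTaggerSketch, its methods, and the sample entities are hypothetical and not part of MALLET.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of MALLET.
class GazetteerTaggerSketch {
    // entity name -> entity type, kept in insertion order
    private final Map<String, String> gazetteer = new LinkedHashMap<>();

    void addEntity(String name, String type) {
        gazetteer.put(name, type);
    }

    // Returns "TYPE: name" for every known entity that appears as a whole word.
    List<String> tag(String text) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, String> entry : gazetteer.entrySet()) {
            // \b anchors keep "Alice" from matching inside a longer word.
            Pattern wholeWord =
                    Pattern.compile("\\b" + Pattern.quote(entry.getKey()) + "\\b");
            if (wholeWord.matcher(text).find()) {
                found.add(entry.getValue() + ": " + entry.getKey());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        GazetteerTaggerSketch tagger = new GazetteerTaggerSketch();
        // Made-up gazetteer entries for the "book" scenario above.
        tagger.addEntity("Alice", "PERSON");
        tagger.addEntity("Wonderland", "LOCATION");
        System.out.println(tagger.tag("Alice followed the White Rabbit into Wonderland."));
    }
}
```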

At Fieldstone Software, we’re familiar with MALLET and the data mining concepts it employs.  If you think you have a use for MALLET in your project, please contact us.
