You’ve probably experienced both of these frustrations with search engines: either you’re not quite sure which terms to use, so you poke around hoping to find something relevant, or you get lots of irrelevant results that happen to include your search terms but have nothing to do with what you are looking for.
Text mining, also known as text analytics, promises the ability to find meaning and patterns in mountains of textual material by going far beyond conventional search capabilities. Unlike simple word and phrase searches that require exact or near-exact matches, text mining systems can find relevant material even if you don’t know the specific terminology the sources use, or if they use different words to express the same concepts. By applying linguistic principles through natural language processing, text mining systems can recognize meaning in context. This capability also helps text mining tools filter out irrelevant material that uses the same terms, such as excluding material about biological reproduction if you are searching for material about document or file reproduction.
Another major benefit of text mining is the ability to copy all the searched material and reorganize it into consistent records, even if it came from a variety of sources in different formats. For example, a system could be instructed to pull in social media posts, emails, and text messages and “clean” and merge them into a single data set for easier analysis.
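The "clean and merge" step can be sketched in Python as follows. Each source format gets its own small converter that maps it onto one shared record layout (source, author, UTC timestamp, text), and the merged list is sorted by time. The input shapes assumed here (a social post as a dictionary with a Unix timestamp, an email as a raw message string, a text message as a tuple with an ISO 8601 timestamp) and the `Record` fields are hypothetical choices for illustration; only the standard-library calls are real APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


@dataclass
class Record:
    source: str
    author: str
    sent_at: datetime  # always timezone-aware, in UTC
    text: str


def clean(text: str) -> str:
    """Collapse runs of whitespace and line breaks into single spaces."""
    return " ".join(text.split())


def from_social_post(post: dict) -> Record:
    # Assumed shape: {"user": ..., "body": ..., "posted": <Unix timestamp>}
    sent = datetime.fromtimestamp(post["posted"], tz=timezone.utc)
    return Record("social", post["user"], sent, clean(post["body"]))


def from_email(raw: str) -> Record:
    # Assumed: a plain-text, single-part message with From and Date headers
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"]).astimezone(timezone.utc)
    return Record("email", msg["From"], sent, clean(msg.get_payload()))


def from_text_message(sms: tuple[str, str, str]) -> Record:
    # Assumed shape: (phone number, ISO 8601 timestamp with offset, message body)
    number, stamp, body = sms
    sent = datetime.fromisoformat(stamp).astimezone(timezone.utc)
    return Record("sms", number, sent, clean(body))


def merge(posts: list[dict], emails: list[str], texts: list[tuple[str, str, str]]) -> list[Record]:
    """Convert every source into Records and return one time-ordered data set."""
    records = [from_social_post(p) for p in posts]
    records += [from_email(e) for e in emails]
    records += [from_text_message(t) for t in texts]
    return sorted(records, key=lambda r: r.sent_at)


posts = [{"user": "@pat", "body": "Loving the new app!\n", "posted": 1522656000}]
emails = [
    "From: pat@example.com\n"
    "Date: Mon, 02 Apr 2018 09:30:00 +0000\n"
    "Subject: Order\n"
    "\n"
    "My order   arrived\nlate again.\n"
]
texts = [("+1-555-0100", "2018-04-02T11:15:00+02:00", "Still  waiting on support")]

records = merge(posts, emails, texts)
for r in records:
    print(r.source, r.sent_at.isoformat(), r.text)
```

Once every message shares the same fields and time zone, later analysis (such as counting complaints per day) only has to handle one format.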
Text mining is a potential solution whenever a business needs to analyze hundreds, thousands, or even millions of text records. Examples of current applications include product research and development (such as searching patent records for similar designs), sentiment analysis (finding trends of satisfaction or dissatisfaction in public tweets, customer emails, and other sources), competitive intelligence (finding out what competitors are up to by analyzing their document and social media output), and risk management (such as analyzing financial news and reports in search of potential risks).
Class activity ideas
- Natural language processing applies the same linguistic rules and concepts that humans use to encode and decode language. Ask students if they think computers will ever be able to understand text the way that humans can. Why or why not?
- How do students feel about their public social media posts being available for companies and other organizations to analyze?
Sources: “About Text Mining,” IBM Knowledge Center, accessed 7 April 2018, www.ibm.com; “What Is NLP Text Mining?” Linguamatics, accessed 7 April 2018, www.linguamatics.com; “Text Mining Applications: 10 Examples Today,” Expert System, 18 April 2016, www.expertsystem.com.