Cognitive computing: The human benefit of natural language processing
June 2, 2016
An HLTCon guest blog
Courtesy of Evelyn Kent of Bacon Tree Consulting. This article originally appeared on KMWorld.
Natural language processing is a core capability of cognitive computing systems and is often defined as helping computers process and understand human language. NLP research has been ongoing since the 1930s, and though we have made significant gains in the field, anyone who has combed through search results knows that humans have not completely bridged the communication gap with computers. Recent NLP research has focused on semi-supervised and unsupervised learning techniques, which learn from large amounts of hand-annotated and unannotated data. These approaches are seeing some success, as speakers at Basis Technology’s HLTCon recently discussed. They are using NLP to process large amounts of publicly available content to gather intelligence, address terrorist threats, conduct research into social issues, tackle communication problems in refugee camps, and identify victims of human trafficking in the sex trade.
On a very basic level, NLP enables computers to understand language by putting words together in meaningful phrases, assigning meaning to those phrases, and drawing inferences from them. Some of the best-known components of NLP are part-of-speech tagging, named entity resolution, word sense disambiguation, and co-reference resolution, each of which plays a vital role in identifying and characterizing the core text that carries the primary meaning of a phrase or sentence. Other deeply technical processes behind NLP include machine learning techniques, computational linguistics, and statistics drawn from training corpora.
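As a rough illustration of two of those components, the sketch below tags parts of speech and extracts named entities using the open-source spaCy library; the library, model name, and sample sentence are illustrative choices on my part, not tools attributed to any of the speakers.

```python
# A minimal sketch of two core NLP steps using spaCy
# (assumes the en_core_web_sm English model is installed).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Protesters gathered in Caracas after the announcement last Tuesday.")

# Part-of-speech tagging: label each token with its grammatical role.
for token in doc:
    print(token.text, token.pos_)

# Named entity recognition: pull out the people, places, and dates
# that carry much of the sentence's meaning.
for ent in doc.ents:
    print(ent.text, ent.label_)
```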
The ability to process language naturally allows NLP applications to summarize documents, auto-classify text, conduct sentiment analysis, and provide search results with enhanced relevance ranking. In the field, it’s how you put the pieces together that counts.
Parsing data for clues
Patrick Butler, a researcher at Virginia Tech, discussed his work on EMBERS – a project that uses publicly available content to predict social events. The project is funded by IARPA and aims to create an automated system that parses online data for clues about what is happening in a specific society. Butler and his team are analyzing tweets to determine not only what a protest is about but also when the next one might occur. They are also tracking flu cases through cancelled OpenTable reservations and by the number of cars they see parked outside emergency rooms. They do all of this work in the language the content is written in, and some of their processing includes turning relative phrases into actual dates: “next week” becomes the date of the content plus 7 days, for example.
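A toy version of that normalization step might look like the sketch below; the phrase table and function name are hypothetical, not EMBERS’ actual code, but they show how a relative expression gets anchored to the content’s publication date.

```python
# Hypothetical illustration of relative-date normalization,
# anchored to the date the content was posted.
from datetime import date, timedelta

RELATIVE_PHRASES = {
    "today": 0,
    "tomorrow": 1,
    "next week": 7,
    "next month": 30,  # rough approximation
}

def resolve_relative_date(phrase, publication_date):
    """Turn a relative phrase into a concrete date."""
    offset = RELATIVE_PHRASES.get(phrase.lower().strip())
    if offset is None:
        raise ValueError(f"Unknown relative phrase: {phrase!r}")
    return publication_date + timedelta(days=offset)

# "Next week" in content posted June 2, 2016 resolves to June 9, 2016.
print(resolve_relative_date("next week", date(2016, 6, 2)))
```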
Multi-lingual natural language processing is important to many of the cases presented at HLTCon, but not all languages are equally tractable. The less written content there is in a language, the less developed NLP will be in that language: NLP in French is excellent, but NLP in Swahili is still difficult.
This is a barrier for Gregor Stewart’s and Danielle Forsyth’s projects, both of which deal with refugee crises. Stewart, vice president of product management at Basis Technology, and Forsyth, co-founder of Thetus Corporation, discussed how predicting political upheaval can help prepare for refugee movement to other areas. Stewart said that the current refugee crisis in Europe is not as new as it may seem: about 6 million people have been outside their home countries for more than five years, and some of them have only recently been processed. The sheer volume of people moving into Europe has overwhelmed governments there, and language differences are the biggest barrier to getting people to safety and creating mitigation policies. He speculated that the process would be greatly aided by better interpretation and translation tools built with machine learning and natural language processing.
Predicting crises
Forsyth discussed anticipating refugee crises by parsing language for overt and hidden meaning. Her work currently focuses on Africa, and she recently identified five phrases used by Burundian politicians that incite violence against minority groups, including the innocuous-seeming “get to work.” Monitoring this type of language and using sentiment analysis to determine its meaning helps indicate whether a political crisis is likely to instigate a refugee crisis. If aid groups can successfully predict a humanitarian crisis, they can mitigate some of its effects and perhaps keep refugees in safe areas inside their home countries. Multi-lingual NLP is essential to understanding the local language well enough for this work to succeed.
Giant Oak is using a combination of technologies that includes NLP to identify sex trade workers who are victims of human trafficking. To do so, they have to determine the behavior of sex workers who are in the trade willingly and then identify deviations from that behavior. They have mined 85 million online ads and more than 2 million reviews of sex workers for locations, phone numbers, and other rich data. They are also looking for sentiment in these ads to determine whether the ad writer was unhappy or drugged – a very difficult task, since there may not be much difference between the behavior of someone who is taking drugs and someone who has been drugged. Giant Oak’s work is still in its early stages, but they are using machine learning and NLP to try to solve social issues and save lives.
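To give a feel for the kind of extraction involved, the simplified sketch below pulls phone numbers out of free-text ads; the regular expression and helper name are my own assumptions for illustration, not Giant Oak’s system.

```python
# Simplified example of extracting one structured field (phone numbers)
# from unstructured ad text.
import re

PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def extract_phone_numbers(ad_text):
    """Return every US-style phone number found in the ad text."""
    return PHONE_PATTERN.findall(ad_text)

ad = "New in town, call (555) 867-5309 or 555.123.4567 after 9pm."
print(extract_phone_numbers(ad))  # ['(555) 867-5309', '555.123.4567']
```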
Karthik Dinakar, Reid Hoffman Fellow at MIT, is working in the same vein. Dinakar uses models to understand and predict adolescent distress, crisis counseling needs, self-harm, and heart disease. In his heart disease research, he found that combining a patient’s history, the words the patient uses to describe symptoms, and an angiogram predicts heart attacks in women better than doctors do. Dinakar also found that women often use different language than men to describe their symptoms. For the past few decades, doctors have assumed this means that men and women have different heart issues, but Dinakar’s research indicates that the issues are the same; it is how the genders talk about them that differs. The overwhelming majority of male cardiologists simply do not understand what their female patients are saying. Mapping language differences may help more female heart attack victims survive.
The conversation about cognitive computing and big data is often enterprise-focused – how we can make better business decisions, discover new business opportunities, and the like – but the projects at HLTCon highlighted a real ability to turn big data into information that can help people in need, both collectively and individually. It is this kind of creative use of NLP technologies that can make cognitive computing smart enough to do some good.
About the author
Evelyn Kent is a principal at Bacon Tree Consulting and an executive board member of the Cognitive Computing Consortium. She is a text analytics consultant who specializes in helping clients design systems to best auto-classify their content. Follow her on Twitter.
About HLTCon
The Human Language Technology Conference (HLTCon) is an annual event hosted by Basis Technology that covers the latest advancements in Natural Language Processing and Text Analytics used in government and enterprise Big Data platforms. The conference brings together directors, senior officials, linguists, technologists, product owners, and program managers whose mandate is to get the most out of HLT.
Join us next year to network with your peers across industry and government, listen to keynotes from senior leaders, attend training sessions on industry leading text analytics software, and be immersed in HLT innovations by the field’s top minds. Follow Basis Technology on Twitter for updates on HLTCon 2017.