What Is the Difference Between Entity Extraction and Entity Resolution?


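Entity extraction pulls keywords (entities) such as person names and place names out of text. Entity resolution links those keywords to actual people and places. Using the context around a keyword, it can tell which of two people with the same name is meant, and it can determine which person a pronoun or job title refers to. Entity resolution is the deciding factor in entity-focused text analysis.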


What’s the Difference Between Entity Extraction (NER) and Entity Resolution?


Entity extraction, or named entity recognition (NER), finds mentions of key “things” (aka “entities”) such as people, places, organizations, dates, and times within text. Entity mentions are the words in text that refer to entities, such as “Bill Clinton,” “White House,” and “U.S.”
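
To make the idea of a "mention" concrete, here is a minimal sketch in Python. It assumes a tiny hand-written gazetteer (GAZETTEER and extract_mentions are hypothetical names, and the type labels are illustrative); a production NER system would normally use a trained model rather than a fixed list.

```python
import re

# Hypothetical gazetteer: surface form -> entity type (labels are illustrative).
GAZETTEER = {
    "Bill Clinton": "PERSON",
    "White House": "LOCATION",
    "U.S.": "LOCATION",
}

def extract_mentions(text):
    """Return sorted (start, end, surface, type) tuples for every gazetteer match."""
    mentions = []
    for surface, etype in GAZETTEER.items():
        # re.escape keeps the dots in "U.S." from acting as wildcards.
        for match in re.finditer(re.escape(surface), text):
            mentions.append((match.start(), match.end(), surface, etype))
    return sorted(mentions)

text = "Bill Clinton returned to the White House after touring the U.S."
mentions = extract_mentions(text)
for start, end, surface, etype in mentions:
    print(f"{start:>3}-{end:<3} {etype:<9} {surface}")
```

Each tuple records where the mention sits in the text, which is the information a later resolution step needs in order to look at the surrounding context.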

Entity resolution goes one step further and distinguishes between similarly named entities such as George W. Bush and George H. W. Bush. It can also determine, from a mention of “Clinton,” whether that document is referring to Bill Clinton or Hillary Clinton by looking at the context in which the mention appears, a task known as coreference resolution.

This feat is possible because entity resolution takes the mention of each entity, looks at the surrounding context, and compares it to a knowledge base (such as Wikidata). For example, if the entity is “Neil Armstrong,” is he mentioned in the context of “American, NASA, astronaut” or “Canadian, NHL, referee”?
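
The comparison against a knowledge base can be sketched as a simple keyword-overlap score. This is only an illustration under stated assumptions: KNOWLEDGE_BASE is a hypothetical two-entry stand-in for a real resource such as Wikidata, resolve is a hypothetical helper, and real systems use far richer features than word overlap.

```python
# Hypothetical mini knowledge base: candidate entity -> descriptive cue words.
KNOWLEDGE_BASE = {
    "Neil Armstrong (astronaut)": {"american", "nasa", "astronaut", "moon"},
    "Neil Armstrong (referee)": {"canadian", "nhl", "referee", "hockey"},
}

def resolve(context, knowledge_base=KNOWLEDGE_BASE):
    """Return the candidate whose cue words overlap most with the context.

    Returns None when no candidate shares any word with the context.
    """
    # Lowercase each word and strip surrounding punctuation before comparing.
    words = {w.strip(".,!?\"'").lower() for w in context.split()}
    scores = {name: len(words & cues) for name, cues in knowledge_base.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(resolve("NASA astronaut Neil Armstrong walked on the moon."))
print(resolve("Neil Armstrong worked NHL games as a referee."))
```

Returning None when nothing overlaps reflects the point that declining to link is better than linking a mention to the wrong entity.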


Some entity resolution systems add coreference resolution, where the system chains together mentions of the same person within a document (in-document chaining) or across a body of documents (cross-document chaining), such as:

“Hillary Clinton and Bill Clinton visited Rosebud Diner during Clinton’s 2016 presidential campaign. Former President Clinton commented, ‘This is the best fried okra I’ve had in a long time!’”

Based on context (modifiers), coreference resolution should figure out that “Hillary Clinton” is the same entity as “Clinton’s” and “Bill Clinton” is the same entity as “Former President Clinton.”
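
A rough sketch of modifier-based in-document chaining, in Python, might look like the following. ENTITY_CUES and chain_mentions are hypothetical, the cue words are hand-picked for this one example, and the modifiers for each ambiguous mention are assumed to have been extracted already; an actual coreference system would learn such associations rather than hard-code them.

```python
# Hypothetical cue words per fully named entity, hand-picked for this example.
ENTITY_CUES = {
    "Hillary Clinton": {"2016", "campaign"},
    "Bill Clinton": {"former", "president"},
}

def chain_mentions(mentions, entity_cues=ENTITY_CUES):
    """Attach each (surface, modifiers) mention to the best-matching entity."""
    chains = {name: [name] for name in entity_cues}
    for surface, modifiers in mentions:
        cues = {m.lower() for m in modifiers}
        # Drop a possessive suffix so "Clinton's" still matches "Clinton".
        surface_words = set(surface.replace("'s", "").split())
        # Only entities sharing a name word with the mention are candidates.
        candidates = [n for n in entity_cues if set(n.split()) & surface_words]
        if not candidates:
            continue
        # Ties fall to the first candidate; a real system would handle them.
        best = max(candidates, key=lambda n: len(cues & entity_cues[n]))
        chains[best].append(surface)
    return chains

mentions = [
    ("Clinton's", ["2016", "presidential", "campaign"]),
    ("Former President Clinton", ["Former", "President"]),
]
chains = chain_mentions(mentions)
for entity, chain in chains.items():
    print(entity, "->", chain)
```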

More sophisticated coreference resolution also performs pronominal and nominal resolution. Pronominal coreference resolution chains named entities to the pronouns that refer to them. Nominal coreference resolution chains named entities to the noun phrases that refer to them.

TYPE OF COREFERENCE RESOLUTION | EXAMPLE
Named entity | Katherine Johnson’s calculations of orbital mechanics were critical to the success of NASA missions to the moon. Johnson calculated trajectories, launch windows, and emergency return paths.
Pronominal | Katherine Johnson’s calculations of orbital mechanics were critical to the success of NASA missions to the moon. She calculated trajectories, launch windows, and emergency return paths.
Nominal | Katherine Johnson’s calculations of orbital mechanics were critical to the success of NASA missions to the moon. The mathematician calculated trajectories, launch windows, and emergency return paths.

Entity resolution is the linchpin that makes entity extraction truly useful. A document may contain 16 entity mentions, yet once coreference resolution is complete there may be only three unique entities. All downstream analyses of these entities, whether detecting sentiment about them or adding their attributes to a knowledge base, benefit from having related information linked. It is equally important not to link information about similarly named but different entities.
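
The collapse from many mentions to a few unique entities can be illustrated with a short grouping sketch. The (mention, entity id) pairs below are hypothetical output from an extraction and resolution pipeline, and the ids themselves are made up for the example.

```python
from collections import defaultdict

# Hypothetical pipeline output: (mention text, resolved entity id).
resolved_mentions = [
    ("Katherine Johnson", "katherine_johnson"),
    ("Johnson", "katherine_johnson"),
    ("She", "katherine_johnson"),
    ("The mathematician", "katherine_johnson"),
    ("NASA", "nasa"),
    ("the moon", "moon"),
]

def group_by_entity(pairs):
    """Collect every mention under the entity id it was resolved to."""
    groups = defaultdict(list)
    for mention, entity_id in pairs:
        groups[entity_id].append(mention)
    return dict(groups)

groups = group_by_entity(resolved_mentions)
print(f"{len(resolved_mentions)} mentions, {len(groups)} unique entities")
for entity_id, mention_list in groups.items():
    print(entity_id, "<-", mention_list)
```

Once mentions are grouped this way, a downstream step such as sentiment scoring can aggregate per entity instead of per mention.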