Major Entity Identification
Paper | Code | Demo


The limited generalization of coreference resolution (CR) models has been a major bottleneck in the task’s broad application. Prior work has identified annotation differences, especially for mention detection, as one of the main reasons for the generalization gap and proposed using additional annotated target domain data. Rather than relying on this additional annotation, we propose an alternative referential task, Major Entity Identification (MEI), where we: (a) assume the target entities to be specified in the input, and (b) limit the task to only the frequent entities. Through extensive experiments, we demonstrate that MEI models generalize well across domains on multiple datasets with supervised models and LLM-based few-shot prompting. Additionally, MEI fits the classification framework, which enables the use of robust and intuitive classification-based metrics. Finally, MEI is also of practical use as it allows a user to search for all mentions of a particular entity or a group of entities of interest.
Motivation
Consider the applications below and their common requirements. MEI is a direct solution for all of them, and it is more reliable than alternatives such as heuristic prompting or full coreference resolution: it generalizes better across domains, is tailored to extracting a specific set of entities, and is faster since linking can be parallelized.
Applications:
- Character Understanding in Movies, Novels
- Pseudonymization
- Financial Data Analysis
- Entity/Concept-based Context Retrieval
Common Requirements:
- Long Documents
- Pre-defined set of entities
- Interest in occurrence of specific entities
- Dense Annotations
MEIRa Models
We propose a family of models, Major Entity Identification via Ranking (MEIRa), to perform MEI. MEIRa takes inspiration from the Entity Ranking (ER) family of coreference resolution models, which maintain a list of entity representations that is dynamically updated as mentions are associated.
MEIRa takes as input a document and a set of succinct representative phrases, each uniquely identifying one of the entities to track. The document is encoded with a long-context language model such as LongFormer and then passed through a mention detector. There are two types of linking modules: a static linking module, which keeps the entity representations fixed and is therefore efficient and parallelizable, and a hybrid linking module, which is as fast as standard coreference resolution models and achieves state-of-the-art performance on the task. For more details, see the paper.

Sample Outputs:
Evaluation of LLMs
We also evaluate LLMs for their end-to-end referential capabilities on MEI. MEI is a generalizable across annotation strategies and is hence more reliable. We provide the first evaluation of referential capabilities that measures both the mention detection
and association
. We design a novel two-stage prompt that performs word-level coreference and later builds the entire span. We highlight the higher efficiency of this prompting strategy over a single-stage prompt that performs MEI at a span-level.
Instruction
You will receive a text along with a list of Key Entities and their corresponding Cluster IDs as input. Your task is to perform Coreference Resolution on the provided text to categorize “each word belonging to a cluster” with its respective cluster id. Also briefly describe the key entities in 1-2 sentences before starting the coreference task. Follow the format below to label a word with its cluster ID: word#cluster_id
Input
Key Entities:
- #1 - Katharine Hilbery
- #2 - Mrs. Hilbery
Text:
CHAPTER I It was a Sunday evening in October, and in common with many other young ladies of her class, Katharine Hilbery was pouring out tea. Perhaps a fifth part of her mind was thus occupied, and the remaining parts leapt over the little barrier of day which interposed between Monday morning and this rather subdued moment, and played with the things one does voluntarily and normally in the daylight. But although she was silent, she was evidently mistress of a situation which was familiar enough to her, and inclined to let it take its way for the six hundredth time, perhaps, without bringing into play any of her unoccupied faculties. A single glance was enough to show that Mrs. Hilbery was so rich in the gifts which make tea-parties of elderly distinguished people successful, that she scarcely needed any help from her daughter.
Output
Description:
- #1 - Katharine Hilbery: A young and apparently rich lady and the daughter of Mrs. Hilbery. She and Mrs. Hilbery were organizing a party for some distinguished elders.
- #2 - Mrs. Hilbery: She is the mother of Katharine Hilbery and is a well-to-do member of the society and a very efficient and able hostess.
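The word-level output format above (word#cluster_id) can then be turned back into mention spans in the second stage by merging runs of consecutive words that carry the same cluster id. The sketch below is an illustrative reconstruction of that span-building step, not the paper's exact code; the function name and span format are assumptions.

```python
import re

def words_to_spans(labeled_text):
    """Merge consecutive words tagged "word#cluster_id" into mention spans.

    Returns a list of (start_word_idx, end_word_idx, cluster_id) tuples,
    with indices over the whitespace-tokenized input (inclusive ends).
    """
    spans = []
    current = None  # open span: (start_idx, end_idx, cluster_id)
    for i, token in enumerate(labeled_text.split()):
        match = re.match(r"(.+)#(\d+)$", token)
        cid = int(match.group(2)) if match else None
        if cid is not None and current and current[2] == cid and current[1] == i - 1:
            # Same cluster id on the adjacent word: extend the open span.
            current = (current[0], i, cid)
        elif cid is not None:
            # New labeled word: close any open span and start a fresh one.
            if current:
                spans.append(current)
            current = (i, i, cid)
        else:
            # Unlabeled word ends the open span, if any.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans
```

For instance, "Katharine#1 Hilbery#1 was pouring tea" yields a single two-word span for cluster 1, covering word indices 0 through 1.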