Author:  Ken Stevens

This year the HAPI FHIR team prioritized developing an Enterprise Master Person Index (EMPI) within HAPI FHIR.  Our hope is that HAPI FHIR EMPI will become a community initiative with wide participation and deployment, and the go-to solution for FHIR-based EMPI implementations.

Development began in early 2019, and over the last few months we have been building the initial release.  Here is what we’ve built so far:

  1. Phonetic indexing and searching are now natively supported in HAPI FHIR.  The “phonetic” search parameter on Patient, Person and Practitioner uses the phonetic normalizer of your choice.  Also included is a new extension on the SearchParameter resource to phonetically index any String element.  All Apache Commons Codec encoders are supported (Soundex, Cologne, etc.), or you can build your own.
  2. HAPI FHIR EMPI uses HAPI FHIR Subscriptions to match incoming Patient and Practitioner resources and asynchronously link them to matching Person resources.
  3. Dedicated link tables support workflow management for manual review of unclear matches (possible match, possible duplicate, etc.).
  4. EMPI Matching configuration is managed via a JSON file.  This configuration currently includes search criteria for finding similar resources, match thresholds for match outcomes, and the EID system (if there is one) for special EID matching.
  5. Dedicated HAPI EMPI tags, enforced by interceptors, lock down resources managed by HAPI FHIR EMPI.
  6. Batch processing for large-scale EMPI runs.
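
To illustrate what phonetic indexing buys you, here is a minimal, standalone sketch of the classic Soundex encoding.  It is not HAPI FHIR code (HAPI FHIR delegates to the Apache Commons Codec encoders); the function name and sample names are only for illustration.  The point is that names that sound alike collapse to the same index key, so a search on one spelling finds the other.

```python
# Standalone sketch of American Soundex; illustrative only, not HAPI FHIR code.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex key for a name ('' for empty input)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            encoded += digit
        prev = digit  # vowels (no code) reset the duplicate check
    return (encoded + "000")[:4]  # pad or truncate to 4 characters


# Two spellings of the same-sounding name share one key.
print(soundex("Smith"), soundex("Smyth"))
```

A phonetic index stores the encoded key alongside the original string, so the comparison at search time is an exact match on the key.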

Advantages of building EMPI on FHIR

Organizations we work with who manage FHIR repositories are excited about running EMPI on top of FHIR for a couple of reasons:

  1. INBOUND: You get your mapping and indexing for free.  Those who manage legacy EMPI systems separate from their FHIR repository must, at great effort, build data pipelines and transformations from each of their source systems into their legacy EMPI system.  Those with existing data pipelines into their FHIR repository benefit because HAPI FHIR EMPI is automatically notified of any changes to Patient, Practitioner, and Person resources, and all the matching fields EMPI needs are already mapped and indexed.
  2. OUTBOUND: Apps that look up patients in the EMPI (e.g. to reduce rekeying at registration) can use simple FHIR operations for their queries instead of having to learn proprietary APIs to talk to legacy EMPI systems.
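
As a rough illustration of the outbound case, the sketch below builds a standard FHIR search URL for a registration-desk lookup.  The base URL and the helper name are hypothetical; "family" and "birthdate" are standard FHIR Patient search parameters.  Whether a given deployment queries Patient or Person is a design decision this sketch does not settle.

```python
from urllib.parse import urlencode


def patient_search_url(base_url: str, **params: str) -> str:
    """Build a FHIR search URL of the form [base]/Patient?name=value&..."""
    return f"{base_url.rstrip('/')}/Patient?{urlencode(params)}"


# Hypothetical server base URL; any FHIR client or plain HTTP GET could use it.
url = patient_search_url(
    "https://fhir.example.org/baseR4",
    family="Smith",
    birthdate="1980-05-17",
)
print(url)
```

The response to such a GET is a FHIR Bundle of matching resources, which any FHIR-aware app already knows how to parse.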


Speaking with users, we have found that organizations tend to use EMPI in one of two ways:

  1. Real-time Clinical: Match new patient records to existing records in real time, automatically matching where possible and marking ambiguous cases for manual clarification later.
  2. Batch Analytics: Load large batch files from various sources, matching up patients or practitioners in order to perform analytics (e.g. claims, quality metrics, etc.).

These two scenarios have different needs.  The real-time system requires fast response time for small one-off queries.  The batch analytics scenario needs to efficiently run large batches that can take hours or days to complete.
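
The idea of matching automatically where possible and flagging ambiguous cases for review can be sketched with two thresholds.  Everything here is an assumption made for illustration: the scoring rule (fraction of agreeing fields), the threshold values, and the outcome labels are not the HAPI FHIR EMPI configuration format.

```python
def agreement_score(a: dict, b: dict, fields: list) -> float:
    """Fraction of the listed fields that are present and equal in both records."""
    agreeing = sum(1 for f in fields if a.get(f) is not None and a.get(f) == b.get(f))
    return agreeing / len(fields)


def classify(score: float, match_threshold: float, possible_threshold: float) -> str:
    """Map a score onto an outcome; labels are hypothetical."""
    if score >= match_threshold:
        return "MATCH"            # link automatically
    if score >= possible_threshold:
        return "POSSIBLE_MATCH"   # queue for manual review
    return "NO_MATCH"


FIELDS = ["family", "given", "birthdate"]
incoming = {"family": "Smith", "given": "Jane", "birthdate": "1980-05-17"}
candidate = {"family": "Smith", "given": "Janet", "birthdate": "1980-05-17"}

outcome = classify(agreement_score(incoming, candidate, FIELDS), 0.9, 0.6)
print(outcome)
```

The same classification serves both scenarios; what differs is how candidates are fetched (one indexed query per record in real time, bulk passes in batch).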


A lesson we learned early on is that EMPI can get complex really fast.  A simple-sounding requirement like supporting real-time updates for enterprise identifiers quickly fractures into a surprising number of special cases.  We’re likely still missing some cases.  We have aimed, whenever possible, to establish simplifying design principles to keep this complexity down.  However, supporting the wide variety of EMPI use cases requires a fair amount of configurability, which inevitably adds complexity.

Future Directions

Golden Record

Our initial implementation of a “Golden Record” as a Person resource that aggregates data from linked Patient or Practitioner resources is insufficient.  The FHIR Patient resource is much richer than the Person resource, but the Person resource is where the links reside.  We have just started adding the concept of a Golden Record, which will be a designated Patient linked to a Person, with rules governing how its data can be updated; e.g. different data sources will have different levels of authority / priority for changing fields in the Golden Record.
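
One way to picture source authority is a rule that only lets an update through when its source ranks at least as high as the source that last set the field.  This is a sketch under assumptions, not the planned design: the source names, the priority values, and the "field -> (value, source)" layout are all hypothetical.

```python
# Hypothetical authority ranking: a higher number wins.
SOURCE_PRIORITY = {"registration": 3, "lab": 2, "claims": 1}


def apply_update(golden: dict, field: str, value, source: str) -> bool:
    """Update golden[field] if the source has enough authority.

    golden maps field name -> (value, source that set it).
    Returns True when the update was applied.
    """
    current = golden.get(field)
    if current is None or SOURCE_PRIORITY[source] >= SOURCE_PRIORITY[current[1]]:
        golden[field] = (value, source)
        return True
    return False


golden_record = {}
apply_update(golden_record, "address", "1 Main St", "lab")
rejected = not apply_update(golden_record, "address", "2 Elm St", "claims")
print(golden_record, rejected)
```

Recording the source next to each value is what makes later, higher-authority corrections possible without losing track of where the data came from.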

Master Data Management 

Using FHIR paths and search parameters to compare and link resources naturally extends beyond matching people.  We are seeing interest in matching Organizations, Medications, Allergies, and many other FHIR resources.  While there isn’t a built-in linking resource for these other domains, like there is for people, most of what we built for matching people can be repurposed to match other resources as well.  We have designed the HAPI EMPI configuration and database entities in such a way that they can be extended beyond Patient and Practitioner in the future.
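
To see why path-based comparison generalizes, consider the sketch below.  It uses simple dotted paths over plain dictionaries as a stand-in for real FHIRPath evaluation, which is an assumption made to keep the example short; the helper names and sample Organizations are hypothetical.  Nothing in the comparison itself knows or cares that the resources are not people.

```python
def get_path(resource: dict, path: str):
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    value = resource
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def agreeing_paths(a: dict, b: dict, paths: list) -> list:
    """Return the paths whose values are present in both resources and equal."""
    return [p for p in paths
            if get_path(a, p) is not None and get_path(a, p) == get_path(b, p)]


org_a = {"resourceType": "Organization", "name": "Acme Clinic",
         "partOf": {"reference": "Organization/100"}}
org_b = {"resourceType": "Organization", "name": "ACME Clinic",
         "partOf": {"reference": "Organization/100"}}

print(agreeing_paths(org_a, org_b, ["name", "partOf.reference"]))
```

Swapping the list of paths is all it takes to point the same comparison at a different resource type.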

Probabilistic Matching

The algorithms required to support Probabilistic Matching are well understood, the most common being the Fellegi-Sunter-Jaro algorithm.  Users we’ve spoken with have told us that while probabilistic matching looks good on paper, in practice its management is too complex to be useful.  Adding this capability would add considerable complexity and performance tuning load to HAPI FHIR EMPI.  As a result, we will await community feedback to help gauge the demand and need for this feature.
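
For readers unfamiliar with the approach, here is a minimal sketch of the core Fellegi-Sunter scoring step.  Each field has an m-probability (chance the field agrees when two records truly match) and a u-probability (chance it agrees by coincidence between non-matching records).  The m and u values below are made up for illustration; estimating and maintaining them in production is exactly the management burden users describe.

```python
import math


def match_weight(agreements: dict, m: dict, u: dict) -> float:
    """Sum the per-field log-likelihood ratios (base 2) for one record pair.

    agreements maps field name -> True if the two records agree on it.
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += math.log2(m[field] / u[field])              # agreement weight
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))  # disagreement weight
    return total


# Hypothetical probabilities; real values must be estimated from data.
M = {"family": 0.95, "birthdate": 0.98}
U = {"family": 0.01, "birthdate": 0.005}

both_agree = match_weight({"family": True, "birthdate": True}, M, U)
both_differ = match_weight({"family": False, "birthdate": False}, M, U)
print(both_agree, both_differ)
```

The total weight is then compared against upper and lower thresholds to decide between match, possible match, and no match.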


If you have any thoughts on this initiative, please join us on the hapi-fhir discussion forums:

Further Reading