Paper Information - Enterprise Master Patient Index Entity Recognition by Long Short-Term Memory Network in Electronic Health Systems

Named-entity recognition (NER) is an information extraction task in artificial intelligence (AI) that locates conceptual entities in natural language and classifies them into pre-defined categories. In this study, we apply Long Short-Term Memory (LSTM) networks to identify patient entities in the Enterprise Master Patient Index (EMPI). A sample dataset of 300,000 deidentified patient records is used to test LSTM performance on EMPI entity recognition. The data entries are first converted into strings and represented by a Word2Vec model with 200 dimensions. Two LSTM models are developed for the NER problem: the first uses a multi-class classifier with a softmax function, and the second uses a two-step classification procedure with a binary logistic function. For comparison, we use a conventional deep neural network model in which the Levenshtein distance represents the training data patterns. Classification performance is evaluated by ten-fold cross-validation. The two-step LSTM model achieves a classification accuracy of 99.82%, which is superior to both the multi-class LSTM classifier at 61.08% and the conventional deep neural network at 95.08%. We therefore conclude that the new two-step LSTM model provides an accurate and reliable solution for recognizing EMPI patient entities when it is properly configured and trained.
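The abstract mentions that the baseline network takes Levenshtein distances as its input representation but does not say how they are computed. The following is a minimal sketch of the standard dynamic-programming edit distance, not the authors' implementation. The helper `record_to_string`, the field names, and the two sample records are hypothetical and serve only to illustrate how two flattened patient records could be compared.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def record_to_string(record: dict) -> str:
    # Hypothetical flattening (an assumption, not from the paper):
    # lower-cased field values joined in sorted key order.
    return " ".join(str(record[key]).lower() for key in sorted(record))


# Two made-up records that plausibly refer to the same patient.
rec_a = {"first": "Jon", "last": "Smith", "dob": "1980-01-02"}
rec_b = {"first": "John", "last": "Smyth", "dob": "1980-01-02"}

distance = levenshtein(record_to_string(rec_a), record_to_string(rec_b))
print(distance)
```

A small distance between two flattened records suggests they may describe the same patient. How such distances are turned into input features for the baseline network is not specified in the abstract.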

Jun Liu | Jimmy Xiangji Huang | Zhaohui Liang | Stephen Chan
