Wikipedia

Author name disambiguation

Author name disambiguation is a type of disambiguation and record linkage applied to the names of individual people. The process could, for example, distinguish individuals with the name "John Smith".

The author name "Li Li" might refer to a number of people, including the seven listed here.

An editor may apply the process to scholarly documents, where the goal is to find all mentions of the same author and cluster them together. Authors of scholarly documents often share names, which makes it hard to distinguish each author's work. Hence, author name disambiguation aims to find all publications that belong to a given author and to distinguish them from publications of other authors who share the same name.

Methods

Considerable research has been conducted into name disambiguation.[1][2][3][4] Typical approaches to author name disambiguation rely on information that helps distinguish between authors. This includes (but is not limited to) information about the authors, such as their name representation, affiliations and email addresses, and information about the publication, such as the year of publication, co-authors, and the topic of the paper. This information can be used to train a machine learning classifier to decide whether or not two author mentions refer to the same author.[5] Much research regards name disambiguation as a clustering problem, i.e., partitioning documents into clusters, where each cluster represents an author.[1][6][7] Other research treats it as a classification problem.[8] Some works construct a document graph and use the graph topology to learn document similarity.[7][9] More recently, several pieces of research[9][10] have aimed to learn low-dimensional document representations by employing network embedding methods.[11][12]
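The clustering view described above can be illustrated with a minimal Python sketch. It is not the method of any cited work: the mention records, the feature weights, the 0.5 threshold, and the helper names `pair_score` and `cluster_mentions` are all assumptions made for illustration. A hand-weighted score stands in for a trained classifier, and mentions whose pairwise score reaches the threshold are merged with a union-find structure (single-link clustering).

```python
from itertools import combinations

# Hypothetical author mentions; all names, affiliations and years are invented.
MENTIONS = [
    {"id": "p1", "name": "Li Li", "coauthors": {"A. Chen", "B. Kumar"},
     "affiliation": "university a", "year": 2010},
    {"id": "p2", "name": "L. Li", "coauthors": {"A. Chen", "C. Novak"},
     "affiliation": "university a", "year": 2012},
    {"id": "p3", "name": "Li Li", "coauthors": {"D. Okafor"},
     "affiliation": "university b", "year": 2011},
    {"id": "p4", "name": "Li Li", "coauthors": {"D. Okafor", "E. Rossi"},
     "affiliation": "university b", "year": 2009},
]


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_score(m1, m2):
    """Hand-weighted similarity of two mentions (weights are assumptions)."""
    coauthor_sim = jaccard(m1["coauthors"], m2["coauthors"])
    same_affiliation = 1.0 if m1["affiliation"] == m2["affiliation"] else 0.0
    year_closeness = 1.0 / (1.0 + abs(m1["year"] - m2["year"]))
    return 0.5 * coauthor_sim + 0.4 * same_affiliation + 0.1 * year_closeness


def cluster_mentions(mentions, threshold=0.5):
    """Merge every pair scoring at least `threshold`; return sorted clusters."""
    parent = {m["id"]: m["id"] for m in mentions}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for m1, m2 in combinations(mentions, 2):
        if pair_score(m1, m2) >= threshold:
            parent[find(m1["id"])] = find(m2["id"])

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m["id"]), []).append(m["id"])
    return sorted(clusters.values())


print(cluster_mentions(MENTIONS))
```

With these invented records, the two "university a" mentions end up in one cluster and the two "university b" mentions in another. In practice the score would more plausibly come from a classifier trained on labelled pairs, and single-link merging is only one of several possible clustering choices.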

Applications

Some of the ways in which authorship has been indicated for the same person

Author names can be ambiguous for several reasons: individuals may publish under multiple names because of different transliterations, misspellings, a name change due to marriage, or the use of nicknames, middle names, or initials.[13]
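As a rough illustration of how some of these variants can be brought together before any detailed comparison, the following Python sketch derives a coarse key from the surname and first initial. The function name `blocking_key` and the key format are assumptions, not part of any system cited here, and the sketch assumes Western "Given Surname" or "Surname, Given" ordering.

```python
import unicodedata


def blocking_key(name):
    """Return a coarse 'surname_initial' key for a personal name.

    Assumes either 'Surname, Given' or 'Given ... Surname' ordering.
    """
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    plain = plain.replace(".", " ").lower()

    if "," in plain:
        surname, _, given = plain.partition(",")
    else:
        parts = plain.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])

    surname = surname.strip()
    given = given.strip()
    initial = given[0] if given else ""
    return f"{surname}_{initial}"


for variant in ["Giles, C. Lee", "C. Lee Giles", "Giles, Lee", "José García"]:
    print(variant, "->", blocking_key(variant))
```

The first two variants share the key `giles_c`, but "Giles, Lee" yields `giles_l`, which shows how publishing under a middle name defeats a first-initial key. Name changes, nicknames and misspellings are likewise not handled, so a key of this kind can at most narrow the set of candidate pairs.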

Motivations for disambiguating individuals include identifying inventors from patents and identifying researchers across different publishers, research institutions and time periods.[14] Name disambiguation is also a cornerstone of author-centric academic search and mining systems, such as AMiner (formerly ArnetMiner).[15]

Similar issues

Author name disambiguation is only one record linkage problem in the scholarly data domain. Closely related and potentially mutually beneficial problems include organisation (affiliation) disambiguation[16] and conference or publication venue disambiguation, since data publishers often use different names or aliases for these entities.

Resources

Several well-known benchmarks for evaluating author name disambiguation are listed below. Each provides publications with ambiguous author names together with their ground truth.

  • AMiner name disambiguation dataset
  • CiteSeerX name disambiguation dataset
  • Semantic Scholar Author Name Disambiguation (S2AND) dataset[17]
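Given ground-truth clusters of this kind, a predicted clustering can be scored in several ways. The Python sketch below computes pairwise precision, recall and F1, one commonly used family of measures; it is an assumption that this suits any particular benchmark above, since each may define its own evaluation protocol. The function names and the toy clusterings are hypothetical.

```python
from itertools import combinations


def same_author_pairs(clusters):
    """Return the set of unordered document pairs placed in the same cluster."""
    pairs = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_prf(predicted, truth):
    """Pairwise precision, recall and F1 of `predicted` against `truth`."""
    pred_pairs = same_author_pairs(predicted)
    true_pairs = same_author_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the ground truth says p1, p2 and p3 share one author.
truth = [["p1", "p2", "p3"], ["p4"]]
predicted = [["p1", "p2"], ["p3", "p4"]]
print(pairwise_prf(predicted, truth))
```

In this toy case one of the two predicted pairs is correct (precision 0.5) and one of the three true pairs is recovered (recall of one third), giving an F1 of about 0.4.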

Source code

  • Beard
  • Name disambiguation in AMiner[9]

References

  1. ^ a b Khabsa, Madian; Treeratpituk, Pucktada; Giles, C. Lee (2015). Proceedings of the 15th ACM/IEEE-CE on Joint Conference on Digital Libraries - JCDL '15. pp. 37–46. doi:10.1145/2756406.2756915. ISBN 9781450335942. S2CID 14068285.
  2. ^ Mann, Gideon S.; Yarowsky, David (2003). "Unsupervised personal name disambiguation". Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003. Vol. 4. pp. 33–40. doi:10.3115/1119176.1119181. S2CID 29759924.
  3. ^ Han, Hui; Giles, Lee; Zha, Hongyuan; Li, Cheng; Tsioutsiouliklis, Kostas (2004). "Two supervised learning approaches for name disambiguation in author citations". Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries - JCDL '04. p. 296. doi:10.1145/996350.996419. ISBN 1581138326. S2CID 1089260.
  4. ^ Huang, Jian; Ertekin, Seyda; Giles, C. Lee (2006). Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science. Vol. 4213. pp. 536–544. doi:10.1007/11871637_53. ISBN 978-3-540-45374-1. ISSN 0302-9743. S2CID 14132755.
  5. ^ Treeratpituk, Pucktada; Giles, C. Lee (2009). Disambiguating authors in academic publications using random forests (PDF). Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM. pp. 39–48. CiteSeerX 10.1.1.147.3500. doi:10.1145/1555400.1555408.
  6. ^ Jie Tang; A.C.M. Fong; Bo Wang; Jing Zhang (2012). "A Unified Probabilistic Framework for Name Disambiguation in Digital Library". IEEE Transactions on Knowledge and Data Engineering. 24 (6). IEEE: 975–987. doi:10.1109/TKDE.2011.13. S2CID 1032074.
  7. ^ a b Xuezhi Wang; Jie Tang; Hong Cheng; Philip S. Yu (2011). ADANA: Active Name Disambiguation. Proceedings of 2011 IEEE International Conference on Data Mining. Vancouver: IEEE. pp. 794–803. doi:10.1109/ICDM.2011.19. ISBN 978-1-4577-2075-8.
  8. ^ Zeyd Boukhers; Nagaraj Bahubali Asundi (2022). "Whois? Deep Author Name Disambiguation Using Bibliographic Data". Linking Theory and Practice of Digital Libraries. Lecture Notes in Computer Science. Vol. 13541. Padua: Springer. pp. 201–215. arXiv:2207.04772. doi:10.1007/978-3-031-16802-4_16. ISBN 978-3-031-16801-7.
  9. ^ a b c Yutao Zhang; Fanjin Zhang; Peiran Yao; Jie Tang (2018). Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. London: ACM. pp. 1002–1011.
  10. ^ Baichuan Zhang; Mohammad Al Hasan (2017). Name disambiguation in anonymized graphs using network embedding. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. Singapore: ACM. pp. 1239–1248.
  11. ^ Bryan Perozzi; Rami Al-Rfou; Steven Skiena (2014). Deepwalk: Online learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM. pp. 701–710.
  12. ^ Jiezhong Qiu; Yuxiao Dong; Hao Ma; Jian Li; Kuansan Wang; Jie Tang (2018). Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. Marina Del Rey: ACM. pp. 459–467.
  13. ^ Smalheiser, Neil R.; Torvik, Vetle I. (2009). "Author name disambiguation". Annual Review of Information Science and Technology. 43: 1–43. doi:10.1002/aris.2009.1440430113.
  14. ^ Morrison, Greg; Riccaboni, Massimo; Pammolli, Fabio (16 May 2017). "Disambiguation of patent inventors and assignees using high-resolution geolocation data". Scientific Data. 4: 170064. Bibcode:2017NatSD...470064M. doi:10.1038/sdata.2017.64. PMC 5433392. PMID 28509897.
  15. ^ Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su (2008). ArnetMiner: extraction and mining of academic social networks. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM. pp. 990–998.
  16. ^ Zhang, Ziqi; Nuzzolese, Andrea; Gentile, Anna Lisa (2017). Entity Deduplication on ScholarlyData. Proceedings of the Extended Semantic Web Conference. Springer-Verlag. pp. 85–100. doi:10.1007/978-3-319-58068-5_6.
  17. ^ Subramanian, Shivashankar; King, Daniel; Downey, Doug; Feldman, Sergey (21 Mar 2021). "S2AND: A Benchmark and Evaluation System for Author Name Disambiguation". arXiv:2103.07534 [cs.DL].
