Express assignment of reviewers for a PhD thesis defense committee

Serhiy Shtovba; Mykola Petrychko

doi:10.31558/2786-9482.2024.1.4

Authors

Serhiy Shtovba, Vasyl’ Stus Donetsk National University, https://orcid.org/0000-0003-1302-4899
Mykola Petrychko, Vinnytsia National Technical University, https://orcid.org/0000-0001-6836-7843

Keywords:

reviewer assignment problem, express assignment, natural language processing, categorization, discrete optimization, data analysis, Dimensions

Abstract

Today, PhD thesis defense committees are formed manually. This creates corruption risks and requires considerable time to search for and analyze candidates, with a high chance of overlooking qualified opponents. There is therefore interest in automating committee formation to eliminate these human-factor risks. The paper focuses on express committee assignment, which is used when a large list of candidates must be narrowed down. The resulting shortlist can either be analyzed manually or passed to a fine-grained assignment procedure, which is resource-intensive and requires far more input data than express assignment. The paper proposes a method for assigning a team of reviewers based on their relevance to the thesis topic. Unlike assigning candidates in isolation, the method accounts for the team's ability to jointly evaluate every aspect of the thesis topic, and it balances assignment quality against the resource cost of searching for committee members. The method has three stages. In the first stage, the thesis and the potential committee members are categorized by representing their topics as vectors in the space of ANZSRC-2020 research specialties. In the second stage, each candidate's degree of correspondence to the thesis topic is calculated, taking into account the affinity between ANZSRC-2020 research specialties. In the third stage, the committee that matches the thesis topic as closely as possible is assigned; several optimization algorithms are proposed for this stage. Testing on a generated dataset of 67 PhD theses showed that the best balance between assignment quality and search cost is achieved by a greedy algorithm without elitism and by exhaustive search over a truncated set of candidates.
Optimization improved committee composition by 13–34% on average, depending on the algorithm used.
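The difference between isolated and team-aware assignment in stages 2 and 3 can be illustrated with a toy sketch. It assumes stage 1 has already produced, for the thesis and for each candidate, a vector of shares over research specialties. The three specialties, the affinity values, the profiles, the function names, and the coverage-style team objective (each thesis specialty is credited with the best affinity-adjusted expertise among the team members) are all hypothetical. They are not the authors' similarity metric or objective, which the abstract does not spell out. The sketch compares picking the individually best candidates, a plain greedy heuristic, and exhaustive search over a truncated shortlist.

```python
from itertools import combinations

# Hypothetical affinity between three research specialties (stand-ins for
# ANZSRC-2020 codes): 1.0 on the diagonal, partial credit for akin fields.
AFFINITY = [
    [1.0, 0.1, 0.5],
    [0.1, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]


def expertise(profile, affinity=AFFINITY):
    """Per-specialty expertise: the profile's share in each specialty plus
    affinity-weighted credit from akin specialties (stays in [0, 1] when the
    profile sums to 1 and affinities lie in [0, 1])."""
    n = len(profile)
    return [sum(affinity[i][j] * profile[j] for j in range(n)) for i in range(n)]


def team_score(thesis, team, affinity=AFFINITY):
    """Assumed team objective: thesis-weighted coverage, where each specialty
    is credited with the best expertise any team member has in it."""
    if not team:
        return 0.0
    levels = [expertise(member, affinity) for member in team]
    return sum(w * max(e[i] for e in levels) for i, w in enumerate(thesis))


def isolated_team(thesis, candidates, size):
    """Baseline: score candidates one by one and take the top `size`."""
    ranked = sorted(candidates,
                    key=lambda c: team_score(thesis, [candidates[c]]),
                    reverse=True)
    return ranked[:size]


def greedy_team(thesis, candidates, size):
    """Plain greedy: repeatedly add the candidate with the largest marginal gain."""
    chosen = []
    for _ in range(min(size, len(candidates))):
        current = [candidates[c] for c in chosen]
        base = team_score(thesis, current)
        rest = [c for c in candidates if c not in chosen]
        best = max(rest, key=lambda c: team_score(thesis, current + [candidates[c]]) - base)
        chosen.append(best)
    return chosen


def truncated_exhaustive_team(thesis, candidates, size, keep):
    """Shortlist the `keep` individually best candidates, then try every team
    of `size` members drawn from that shortlist."""
    shortlist = isolated_team(thesis, candidates, max(keep, size))
    best = max(combinations(shortlist, size),
               key=lambda team: team_score(thesis, [candidates[c] for c in team]))
    return list(best)


thesis = [0.6, 0.4, 0.0]  # hypothetical thesis profile over the three specialties
candidates = {            # hypothetical candidate profiles, each summing to 1
    "cand_1": [1.0, 0.0, 0.0],
    "cand_2": [0.5, 0.0, 0.5],
    "cand_3": [0.0, 1.0, 0.0],
    "cand_4": [0.0, 0.0, 1.0],
}

for name, pick in [
    ("isolated", isolated_team(thesis, candidates, 2)),
    ("greedy", greedy_team(thesis, candidates, 2)),
    ("truncated exhaustive", truncated_exhaustive_team(thesis, candidates, 2, keep=3)),
]:
    print(name, pick, round(team_score(thesis, [candidates[c] for c in pick]), 2))
# isolated ['cand_1', 'cand_2'] 0.64
# greedy ['cand_1', 'cand_3'] 1.0
# truncated exhaustive ['cand_1', 'cand_3'] 1.0
```

In this toy case, the two individually strongest candidates duplicate expertise in the first specialty, while both team-aware searches pair the strongest candidate with a specialist in the second one. The truncated search stays cheap because it enumerates only C(keep, size) teams rather than C(N, size); if `keep` is set too small, however, a complementary specialist can be cut from the shortlist.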

