Methods for solving the inventory management problem based on neuro-associative learning and reinforcement learning
DOI:
https://doi.org/10.31558/2786-9482.2024.1.5
Keywords:
neuro-associative learning, reinforcement learning, inventory management, constrained Cauchy machine, Q-learning method, SARSA method
Abstract
Today, the development of intelligent methods for solving inventory management problems is an urgent task. Many modern companies use the theory of constraints to improve and optimize their business processes; it provides dynamic inventory buffer management and is applied to supply chain management. The aim of this work is to improve the efficiency of inventory management through neuro-associative learning based on the constrained Cauchy machine and through reinforcement learning based on Q-learning and SARSA. To achieve this goal, a method based on the constrained Cauchy machine is created for inventory buffer management, and methods based on Q-learning and on SARSA are created for general inventory management tasks. The proposed neural network model of the constrained Cauchy machine has a hetero-associative memory with no capacity limitations and provides high accuracy for inventory buffer management. The model uses the Cauchy distribution, which improves the convergence of the parametric identification method in comparison with the traditional restricted Boltzmann machine; compared to the full Cauchy machine, the constrained Cauchy machine allows working with a larger memory size, as illustrated in the sketch below.
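To make the buffer-management model concrete, the following minimal Python sketch assumes the constrained Cauchy machine keeps the bipartite visible-hidden structure and contrastive-divergence (CD-1) training of a restricted Boltzmann machine, with the logistic activation replaced by the Cauchy cumulative distribution function; the class and parameter names, the temperature T, and the learning rate are illustrative assumptions, not the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

def cauchy_prob(x, T=1.0):
    # Turn-on probability from the Cauchy CDF; a restricted
    # Boltzmann machine would use the logistic sigmoid here.
    return 0.5 + np.arctan(x / T) / np.pi

class ConstrainedCauchyMachine:
    # Bipartite stochastic network: no visible-visible or
    # hidden-hidden connections, as in a restricted Boltzmann machine.
    def __init__(self, n_visible, n_hidden, T=1.0):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases
        self.T = T

    def sample_hidden(self, v):
        p = cauchy_prob(v @ self.W + self.c, self.T)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_visible(self, h):
        p = cauchy_prob(h @ self.W.T + self.b, self.T)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_step(self, v0, lr=0.05):
        # One CD-1 update: contrast data-driven statistics with
        # statistics after one stochastic reconstruction pass.
        ph0, h0 = self.sample_hidden(v0)
        _, v1 = self.sample_visible(h0)
        ph1, _ = self.sample_hidden(v1)
        self.W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
        self.b += lr * (v0 - v1)
        self.c += lr * (ph0 - ph1)

In a buffer-management setting, a binary encoding of the buffer state (for example, red/yellow/green stock zones) could be clamped to the visible layer; the heavier tails of the Cauchy CDF keep activation probabilities away from saturation, which is one plausible reading of the claimed convergence advantage over the Boltzmann case.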
Modifying the Q-learning and SARSA methods with dynamic parameters increases the learning speed at a given level of mean square error. Computational experiments have shown that controlling the importance of the reward, the learning rate parameter, and the ε-greedy exploration parameter in the Q-learning and SARSA methods makes the solution search more global at the initial stages and more local at the final stages (see the sketch below). The proposed methods expand the scope of neuro-associative learning and reinforcement learning, which is confirmed by their adaptation to inventory management tasks, and they contribute to the efficiency of general- and special-purpose intelligent computer systems. A prospect for further research is the application of the proposed methods to other decision-making tasks, including those in the field of artificial intelligence.
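A minimal tabular sketch of the reinforcement-learning part is given below, assuming a simple single-item inventory process (Poisson demand, linear holding and shortage costs) invented here for illustration; the decay schedules for the learning rate and the ε-greedy parameter mirror the "global early, local late" behaviour described above, but their exact forms, and the paper's reward-importance control, are assumptions.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative single-item inventory process (all constants assumed).
MAX_STOCK, MAX_ORDER = 20, 10
HOLD, SHORT, PRICE, COST = 0.5, 2.0, 4.0, 1.0

def step(stock, order):
    # The order arrives, random demand is served, leftovers are held.
    demand = rng.poisson(5)
    available = min(stock + order, MAX_STOCK)
    sold = min(available, demand)
    nxt = available - sold
    reward = (PRICE * sold - COST * order
              - HOLD * nxt - SHORT * max(demand - available, 0))
    return nxt, reward

def eps_greedy(Q, s, eps):
    if rng.random() < eps:
        return int(rng.integers(0, MAX_ORDER + 1))  # explore
    return int(Q[s].argmax())                       # exploit

def train(episodes=5000, horizon=50, gamma=0.95, sarsa=False):
    Q = np.zeros((MAX_STOCK + 1, MAX_ORDER + 1))
    for ep in range(episodes):
        # Dynamic parameters: both decay with the episode index,
        # so the search is global at first and local near the end.
        eps = 1.0 / (1.0 + 0.01 * ep)
        alpha = 0.5 / (1.0 + 0.005 * ep)
        s = int(rng.integers(0, MAX_STOCK + 1))
        a = eps_greedy(Q, s, eps)
        for _ in range(horizon):
            s2, r = step(s, a)
            a2 = eps_greedy(Q, s2, eps)
            # SARSA bootstraps on the action actually taken;
            # Q-learning bootstraps on the greedy action.
            target = Q[s2, a2] if sarsa else Q[s2].max()
            Q[s, a] += alpha * (r + gamma * target - Q[s, a])
            s, a = s2, a2
    return Q

For example, train(sarsa=True) yields the SARSA variant, and the learned ordering policy is read off as Q[s].argmax() for each stock level s.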