| Graduate Student: | 陳律宇 Lu-Yu Chen |
|---|---|
| Thesis Title: | 以自我組織特徵映射圖為基礎之 A SOM-based Fuzzy Systems Q-learning in Continuous State and Action Space |
| Advisor: | 蘇木春 Mu-Chun Su |
| Committee Members: | |
| Degree: | Master (碩士) |
| Department: | Department of Computer Science & Information Engineering, College of Electrical Engineering & Computer Science (資訊電機學院 資訊工程學系) |
| Graduation Academic Year: | 94 (2005) |
| Language: | Chinese |
| Pages: | 64 |
| Chinese Keywords: | 任務分解 (task decomposition), 連續性Q-learning (continuous Q-learning), 增強式學習 (reinforcement learning), 自我組織特徵映射圖 (self-organizing feature map) |
| English Keywords: | continuous Q-learning, task decomposition, self-organizing feature map, reinforcement learning |
Abstract (translated from Chinese): In reinforcement learning, an agent learns by interacting with its environment: without a supervisor providing complete instructions, it discovers on its own which action to take in each state so as to maximize its reward. Q-learning is a popular reinforcement learning method. By building a look-up table that stores a Q-value for every state-action pair, Q-learning handles problems with small, discrete state and action spaces well. When a problem has a large number of states and actions, however, the required look-up table becomes enormous, so building an entry for every state-action pair is infeasible. This thesis proposes a fuzzy system based on the self-organizing feature map (SOM) network to implement Q-learning and uses this method to design control systems. To speed up training, the thesis combines task decomposition with an automatic task-decomposition mechanism to handle complex tasks. Simulated robot experiments demonstrate the effectiveness of the method.
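The SOM network at the core of the proposed fuzzy system learns a set of prototype units that cover the continuous state space. The thesis's exact configuration is not given here; the following is a minimal, illustrative 1-D Kohonen SOM, in which the winning unit and its grid neighbours are pulled toward each input sample.

```python
import numpy as np

def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=3.0):
    """Train a 1-D self-organizing feature map (Kohonen SOM).

    Each unit holds a weight vector; the winner (the unit closest to the
    input) and its topological neighbours are pulled toward each sample.
    Learning rate and neighbourhood width both decay over the epochs.
    """
    rng = np.random.default_rng(0)
    dim = data.shape[1]
    w = rng.random((n_units, dim))              # random initial prototypes
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)             # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5 # shrinking neighbourhood
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(w - x, axis=1))
            d = np.abs(np.arange(n_units) - winner)   # distance on the grid
            h = np.exp(-d**2 / (2 * sigma**2))        # neighbourhood function
            w += lr * h[:, None] * (x - w)            # pull units toward x
    return w
```

After training on data drawn from two well-separated clusters, some units settle near each cluster, which is what lets the fuzzy system place rule centres over the regions of the state space the agent actually visits.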
In reinforcement learning, there is no supervisor to judge the chosen action at each step; learning proceeds through trial and error while interacting with a dynamic environment. Q-learning is a popular approach to reinforcement learning. It is widely applied to problems with discrete states and actions and is usually implemented with a look-up table in which each entry corresponds to a combination of a state and an action. However, the look-up-table implementation of Q-learning fails in problems with continuous state and action spaces, because an exhaustive enumeration of all state-action pairs is impossible. This thesis proposes an implementation of Q-learning for problems with continuous state and action spaces using SOM-based fuzzy systems. Because reinforcement learning is usually a slow process, a hybrid approach is also proposed that integrates the ideas of hierarchical learning and progressive learning to decompose a complex task into simple elementary tasks. Simulations of training a robot to complete two different tasks demonstrate the effectiveness of the proposed approach.