| Graduate Student: | 陳律宇 Lu-Yu Chen |
|---|---|
| Thesis Title: | 以自我組織特徵映射圖為基礎之 A SOM-based Fuzzy Systems Q-learning in Continuous State and Action Space |
| Advisor: | 蘇木春 Mu-Chun Su |
| Committee Members: | |
| Degree: | Master (碩士) |
| Department: | Department of Computer Science & Information Engineering, College of Electrical Engineering & Computer Science (資訊電機學院 資訊工程學系) |
| Graduation Academic Year: | 94 (2005) |
| Language: | Chinese |
| Pages: | 64 |
| Chinese Keywords: | 任務分解 (task decomposition), 連續性Q-learning (continuous Q-learning), 增強式學習 (reinforcement learning), 自我組織特徵映射圖 (self-organizing feature map) |
| English Keywords: | continuous Q-learning, task decomposition, self-organizing feature map, reinforcement learning |
Abstract (translated from Chinese): In reinforcement learning, an agent learns by interacting with its environment: without a supervisor providing complete instructions, it discovers on its own which action to take in each state so as to maximize its reward. Q-learning is a popular reinforcement learning method. By building a look-up table that stores a Q-value for every state-action pair, Q-learning handles problems with small, discrete state and action spaces well. When a problem has a large number of states and actions, however, the required look-up table becomes enormous, so building an entry for every state-action pair is infeasible. This thesis proposes a fuzzy system based on the self-organizing feature map (SOM) network to implement Q-learning and uses this method to design control systems. To speed up training, the thesis combines task decomposition with an automatic task-decomposition mechanism to handle complex tasks. Simulated robot experiments demonstrate the effectiveness of the method.
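The SOM network at the core of the proposed fuzzy system learns a set of prototype units that cover the continuous state space. The thesis's exact configuration is not given here; the following is a minimal, illustrative 1-D Kohonen SOM, in which the winning unit and its grid neighbours are pulled toward each input sample.

```python
import numpy as np

def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=3.0):
    """Train a 1-D self-organizing feature map (Kohonen SOM).

    Each unit holds a weight vector; the winner (the unit closest to the
    input) and its topological neighbours are pulled toward each sample.
    Learning rate and neighbourhood width both decay over the epochs.
    """
    rng = np.random.default_rng(0)
    dim = data.shape[1]
    w = rng.random((n_units, dim))              # random initial prototypes
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)             # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5 # shrinking neighbourhood
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(w - x, axis=1))
            d = np.abs(np.arange(n_units) - winner)   # distance on the grid
            h = np.exp(-d**2 / (2 * sigma**2))        # neighbourhood function
            w += lr * h[:, None] * (x - w)            # pull units toward x
    return w
```

After training on data drawn from two well-separated clusters, some units settle near each cluster, which is what lets the fuzzy system place rule centres over the regions of the state space the agent actually visits.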
In reinforcement learning, there is no supervisor to judge the chosen action at each step; learning proceeds through trial and error while interacting with a dynamic environment. Q-learning is a popular approach to reinforcement learning. It is widely applied to problems with discrete states and actions and is usually implemented with a look-up table in which each entry corresponds to a combination of a state and an action. However, the look-up-table implementation of Q-learning fails in problems with continuous state and action spaces, because an exhaustive enumeration of all state-action pairs is impossible. This thesis proposes an implementation of Q-learning for problems with continuous state and action spaces using SOM-based fuzzy systems. Because reinforcement learning is usually a slow process, a hybrid approach is also proposed that integrates the ideas of hierarchical learning and progressive learning to decompose a complex task into simple elementary tasks. Simulations of training a robot to complete two different tasks demonstrate the effectiveness of the proposed approach.