| Title | Reinforcement Learning for GA-Based Neural Networks = 以基因演算法為基礎之類神經網路加強式學習 |
|---|---|
| Author | 林正堅 |
| Journal | Journal of the Chinese Institute of Electrical Engineering |
| Volume/Issue | 6:2, May 1999 |
| Pages | 141-156 |
| Classification | 448.5 |
| Keywords | Neural network; Genetic algorithm (GA); Reinforcement learning |
| Language | English |
| Abstract (Chinese, translated) | This paper proposes a Genetic Reinforcement Neural Network (GRNN) to solve various reinforcement learning problems. The proposed GRNN is composed of two feedforward multilayer networks: one serves as an action network that determines the outputs of the GRNN, and the other serves as a critic network that assists the learning of the action network. Using the temporal difference prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network then adapts itself with a genetic algorithm according to this internal reinforcement signal. Because the internal reinforcement signal serves as the fitness function of the genetic algorithm, candidate solutions can be evaluated without waiting for feedback from the external reinforcement signal, which accelerates the genetic algorithm's learning. Computer simulations have verified the performance of the proposed algorithm. |
| Abstract (English) | This paper proposes a Genetic Reinforcement Neural Network (GRNN) for solving various reinforcement learning problems. The proposed GRNN is constructed by integrating two feedforward multilayer networks. One network acts as an action network that determines the outputs (actions) of the GRNN; the other acts as a critic network that assists the learning of the action network. Using the temporal difference prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the genetic algorithm (GA) to adapt itself according to the internal reinforcement signal. The key concept of the proposed GRNN learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA. This learning scheme forms a novel hybrid GA that combines the temporal difference and gradient descent methods for the critic network learning with the GA for the action network learning. By using the internal reinforcement signal as the fitness function, the GA can evaluate candidate solutions (chromosomes) regularly, even during periods without external reinforcement feedback from the environment. Hence, the GA can proceed to new generations without waiting for the arrival of the external reinforcement signal. This usually accelerates GA learning, since in reinforcement learning problems a reinforcement signal may become available only long after a sequence of actions has occurred. Computer simulations have been conducted to illustrate the performance and applicability of the proposed learning scheme. |
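
The abstract names three concrete mechanisms: a critic network trained by temporal-difference prediction, the critic's TD error serving as an internal reinforcement signal, and a GA that evolves the action network using that signal as its fitness function. The record gives no architectural details, so the following Python/NumPy sketch is illustrative only: the one-hidden-layer tanh networks, the elitist GA operators, the toy plant dynamics, and all names (`CriticNetwork`, `action_output`, `ga_step`) are assumptions, not the authors' implementation.

```python
# A minimal sketch of the GRNN learning scheme described in the abstract.
# All architectural choices below are assumptions; the record does not
# specify them.
import numpy as np

rng = np.random.default_rng(0)

class CriticNetwork:
    """One-hidden-layer predictor of discounted external reinforcement."""
    def __init__(self, n_in, n_hidden=8, lr=0.05, gamma=0.95):
        self.w1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
        self.w2 = rng.normal(scale=0.5, size=n_hidden)
        self.lr, self.gamma = lr, gamma

    def predict(self, x):
        self._h = np.tanh(x @ self.w1)            # cache hidden activations
        return float(self._h @ self.w2)

    def td_update(self, x, r_ext, x_next, done):
        """One temporal-difference step; the TD error doubles as the
        internal reinforcement signal handed to the GA."""
        v_next = 0.0 if done else self.predict(x_next)
        v = self.predict(x)                       # also caches h for state x
        delta = r_ext + self.gamma * v_next - v   # internal reinforcement
        grad_w1 = np.outer(x, (1.0 - self._h**2) * self.w2)
        self.w2 += self.lr * delta * self._h      # gradient-descent updates
        self.w1 += self.lr * delta * grad_w1
        return delta

def action_output(chrom, x, n_in, n_hidden):
    """Decode a flat chromosome into a one-hidden-layer action network."""
    w1 = chrom[: n_in * n_hidden].reshape(n_in, n_hidden)
    w2 = chrom[n_in * n_hidden:]
    return float(np.tanh(np.tanh(x @ w1) @ w2))

def ga_step(population, fitness, mut_scale=0.1):
    """Elitist GA generation: keep the fitter half, refill by uniform
    crossover plus Gaussian mutation."""
    order = np.argsort(fitness)[::-1]
    parents = population[order[: len(population) // 2]]
    children = []
    for _ in range(len(population) - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(a.shape) < 0.5, a, b)
        children.append(child + rng.normal(scale=mut_scale, size=child.shape))
    return np.vstack([parents, children])

# Toy loop: each chromosome's fitness accumulates the critic's TD errors
# over a short rollout, so the GA can rank candidates and advance a
# generation even though the external signal arrives only at the end.
n_in, n_hidden = 4, 6
pop = rng.normal(scale=0.5, size=(20, n_in * n_hidden + n_hidden))
critic = CriticNetwork(n_in)
for gen in range(5):
    fitness = np.zeros(len(pop))
    for i, chrom in enumerate(pop):
        x = rng.normal(size=n_in)
        for t in range(10):
            a = action_output(chrom, x, n_in, n_hidden)
            x_next = 0.9 * x + 0.1 * a            # stand-in plant dynamics
            done = t == 9
            r_ext = -float(np.abs(x_next).sum()) if done else 0.0  # sparse
            fitness[i] += critic.td_update(x, r_ext, x_next, done)
            x = x_next
    pop = ga_step(pop, fitness)
```

The point mirrored from the abstract sits in the toy loop: fitness is built from the critic's dense TD errors rather than from the sparse terminal reward, so `ga_step` can produce new generations between arrivals of the external reinforcement signal.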