Detail
Article: Empirical Studies in Action Selection with Reinforcement Learning
By: Whiteson, Shimon; Taylor, Matthew E.; Stone, P.
Type: Article from Journal - e-Journal
In collection: Adaptive Behavior vol. 15 no. 1 (Mar. 2007), pages 33–50.
Topics: reinforcement learning; temporal difference methods; evolutionary computation; neural networks; robot soccer; autonomic computing
Full text: 33.pdf (1.1 MB)
Abstract: To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes, but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. In this article we aim to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together. First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to the performance of each method. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.
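The abstract contrasts temporal difference (TD) methods, which learn value functions, with evolutionary methods, which optimize populations of candidate policies. As a minimal sketch (not taken from the article), the Python snippet below illustrates both ideas on a hypothetical five-state corridor task; the task, reward structure, hyperparameters, and names (step, q_learning, evolve) are assumptions chosen only for illustration.

import random

N_STATES, GOAL = 5, 4          # hypothetical corridor: states 0..4, reward only on reaching GOAL
ACTIONS = (-1, +1)             # move left or right

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# Temporal-difference approach: learn an action-value (Q) function with Q-learning.
def q_learning(episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < eps:
                a = random.choice(ACTIONS)                     # explore
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])      # exploit current value estimates
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[(s2, x)] for x in ACTIONS))
            Q[(s, a)] += alpha * (target - Q[(s, a)])          # TD update toward bootstrapped target
            s = s2
    return Q

# Evolutionary approach: optimize a population of candidate policies by episodic fitness.
def evolve(generations=20, pop_size=10):
    def fitness(policy):                                       # return of one episode under a fixed policy
        s, done, total, steps = 0, False, 0.0, 0
        while not done and steps < 50:
            s, r, done = step(s, policy[s])
            total, steps = total + r, steps + 1
        return total - 0.01 * steps                            # small penalty for long episodes
    pop = [[random.choice(ACTIONS) for _ in range(N_STATES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                         # keep the fitter half
        children = [[g if random.random() < 0.9 else random.choice(ACTIONS)
                     for g in random.choice(parents)]          # copy a parent with mutation
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    Q = q_learning()
    print("TD greedy policy:", [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])
    print("Evolved policy:  ", evolve())

Running the script prints the greedy policy extracted from the learned Q-values and the best evolved policy; on this toy task both should typically converge to "move right" in every non-terminal state.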
