DSpace Repository

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

dc.creator Jaakkola, Tommi
dc.creator Jordan, Michael I.
dc.creator Singh, Satinder P.
dc.date 2004-10-20T20:49:46Z
dc.date 1993-08-01
dc.date.accessioned 2013-10-09T02:48:33Z
dc.date.available 2013-10-09T02:48:33Z
dc.date.issued 2013-10-09
dc.identifier AIM-1441
dc.identifier CBCL-084
dc.identifier http://hdl.handle.net/1721.1/7205
dc.identifier.uri http://koha.mediu.edu.my:8181/xmlui/handle/1721
dc.description Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
dc.format 15 p.
dc.format 77605 bytes
dc.format 356324 bytes
dc.format application/octet-stream
dc.format application/pdf
dc.language en_US
dc.relation AIM-1441
dc.relation CBCL-084
dc.subject reinforcement learning
dc.subject stochastic approximation
dc.subject convergence
dc.subject dynamic programming
dc.title On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
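The abstract describes Q-learning as a sample-based approximation to dynamic programming whose convergence follows from stochastic approximation theory. The following is an illustration only, not code from the report: a minimal Python sketch that assumes a hypothetical two-state, two-action MDP, a discount factor of 0.5, uniformly random exploration, and per-pair step sizes 1/n(s, a). Those step sizes satisfy the usual stochastic-approximation conditions (their sum diverges and the sum of their squares converges). The sketch runs tabular Q-learning and compares the result with the optimal Q-values computed by value iteration (exact DP).

```python
import random

# Hypothetical 2-state, 2-action MDP, used only for illustration (not from the report).
# P_NEXT1[s][a] is the probability that action a in state s leads to state 1
# (otherwise the next state is 0); R[s][a] is the deterministic reward.
P_NEXT1 = [[0.2, 0.8], [0.6, 0.1]]
R = [[1.0, 0.0], [0.5, 2.0]]
GAMMA = 0.5  # assumed discount factor


def value_iteration(iters=200):
    """Exact DP: repeatedly apply the Bellman optimality operator to Q."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        v = [max(q[0]), max(q[1])]
        q = [[R[s][a] + GAMMA * ((1 - P_NEXT1[s][a]) * v[0] + P_NEXT1[s][a] * v[1])
              for a in range(2)] for s in range(2)]
    return q


def q_learning(steps=200_000, seed=0):
    """Sample-based, asynchronous Q-learning with step sizes alpha = 1/n(s, a)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # uniform exploration keeps visiting every (s, a)
        s_next = 1 if rng.random() < P_NEXT1[s][a] else 0
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]  # sum(alpha) = inf, sum(alpha**2) < inf
        target = R[s][a] + GAMMA * max(q[s_next])  # one-sample Bellman backup
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q


if __name__ == "__main__":
    q_dp = value_iteration()
    q_ql = q_learning()
    for s in range(2):
        for a in range(2):
            print(f"Q*({s},{a}) = {q_dp[s][a]:.3f}   Q-learning: {q_ql[s][a]:.3f}")
```

Each Q-learning update replaces the expectation in the DP backup with a single sampled transition. That substitution is what makes the algorithm a stochastic approximation to DP rather than an exact DP method.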


Files in this item

There are no files associated with this item.
