On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

dc.creator	Jaakkola, Tommi
dc.creator	Jordan, Michael I.
dc.creator	Singh, Satinder P.
dc.date	2004-10-20T20:49:46Z
dc.date	1993-08-01
dc.date.accessioned	2013-10-09T02:48:33Z
dc.date.available	2013-10-09T02:48:33Z
dc.date.issued	2013-10-09
dc.identifier	AIM-1441
dc.identifier	CBCL-084
dc.identifier	http://hdl.handle.net/1721.1/7205
dc.identifier.uri	http://koha.mediu.edu.my:8181/xmlui/handle/1721
dc.description	Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
dc.format	15 p.
dc.format	77605 bytes
dc.format	356324 bytes
dc.format	application/octet-stream
dc.format	application/pdf
dc.language	en_US
dc.relation	AIM-1441
dc.relation	CBCL-084
dc.subject	reinforcement learning
dc.subject	stochastic approximation
dc.subject	convergence
dc.subject	dynamic programming
dc.title	On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
