| dc.creator | Jaakkola, Tommi | |
| dc.creator | Jordan, Michael I. | |
| dc.creator | Singh, Satinder P. | |
| dc.date | 2004-10-20T20:49:46Z | |
| dc.date | 1993-08-01 | |
| dc.date.accessioned | 2013-10-09T02:48:33Z | |
| dc.date.available | 2013-10-09T02:48:33Z | |
| dc.date.issued | 2013-10-09 | |
| dc.identifier | AIM-1441 | |
| dc.identifier | CBCL-084 | |
| dc.identifier | http://hdl.handle.net/1721.1/7205 | |
| dc.identifier.uri | http://koha.mediu.edu.my:8181/xmlui/handle/1721 | |
| dc.description | Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong. | |
| dc.format | 15 p. | |
| dc.format | 77605 bytes | |
| dc.format | 356324 bytes | |
| dc.format | application/octet-stream | |
| dc.format | application/pdf | |
| dc.language | en_US | |
| dc.relation | AIM-1441 | |
| dc.relation | CBCL-084 | |
| dc.subject | reinforcement learning | |
| dc.subject | stochastic approximation | |
| dc.subject | convergence | |
| dc.subject | dynamic programming | |
| dc.title | On the Convergence of Stochastic Iterative Dynamic Programming Algorithms | |

| Files | Size | Format | View |
|---|---|---|---|
| There are no files associated with this item. | | | |