DSpace Repository

Model-based approximation methods for reinforcement learning

dc.contributor Dietterich, Thomas
dc.contributor Burnett, Margaret
dc.contributor Quinn, Michael
dc.contributor Tadepalli, Prasad
dc.contributor Burkes, David
dc.date 2006-07-24T15:35:59Z
dc.date 2006-05-08
dc.date.accessioned 2013-10-16T07:38:38Z
dc.date.available 2013-10-16T07:38:38Z
dc.date.issued 2013-10-16
dc.identifier http://hdl.handle.net/1957/2581
dc.identifier.uri http://koha.mediu.edu.my:8181/xmlui/handle/1957/2581
dc.description Graduation date: 2007
dc.description This thesis focuses on model-based approximation methods for reinforcement learning in large-scale applications such as combinatorial optimization problems. First, the thesis proposes two new model-based methods to stabilize value-function approximation for reinforcement learning. The first is the BFBP algorithm, a batch reinforcement learning method that alternates between exploration and exploitation stages. For the exploitation stage, the thesis investigates the feasibility and performance of using more efficient offline algorithms such as linear regression, regression trees, and SVMs as value-function approximators. The thesis finds that with systematic local search methods such as Limited Discrepancy Search and a good initial heuristic, the algorithm often converges faster and to a better level of performance than epsilon-greedy exploration. The second method combines linear programming with the kernel trick to find value-function approximators for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. The advantage of the kernel methods is that they can easily adjust the complexity of the function approximator to fit the complexity of the value function. The thesis also proposes a model-based policy gradient reinforcement learning algorithm. In this approach, we learn the models P(s′|s, a) and R(s′|s, a) and then use dynamic programming to compute the value of the policy directly from the model. Unlike online sampling-based policy gradient algorithms, it does not suffer from high variance, and it also converges faster. In summary, the thesis proposes model-based approximation algorithms for both value-function-based and policy gradient reinforcement learning, with promising results on multiple problem domains and job-shop scheduling benchmarks.
dc.language en_US
dc.subject Reinforcement Learning
dc.subject Model-based Approximation
dc.subject Policy Gradient Methods
dc.subject Value Function Approximation
dc.subject Kernel Methods
dc.subject Convergence of Policy Gradient Methods
dc.subject Sample Complexity
dc.title Model-based approximation methods for reinforcement learning
dc.type Thesis
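
The description above mentions learning the models P(s′|s, a) and R(s′|s, a) and then using dynamic programming to compute the value of a policy directly from the learned model. The sketch below is not the thesis code; it is a minimal tabular illustration of that idea under assumed details: a count-based maximum-likelihood model and hypothetical helper names (learn_model, evaluate_policy).

# Minimal sketch (assumed details, not the thesis implementation):
# estimate P(s'|s,a) and R(s,a) from logged transitions, then evaluate a
# policy by dynamic programming on the learned model rather than by
# sampling returns online.
import numpy as np

def learn_model(transitions, n_states, n_actions):
    """Estimate P[s, a, s'] and R[s, a] from (s, a, r, s') tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    rewards = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        rewards[s, a] += r
    visits = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs default to a uniform next-state distribution.
    P = np.divide(counts, visits, out=np.ones_like(counts) / n_states,
                  where=visits > 0)
    R = rewards / np.maximum(visits.squeeze(-1), 1)
    return P, R

def evaluate_policy(P, R, policy, gamma=0.95, tol=1e-8):
    """Iterative policy evaluation:
    V(s) = sum_a pi(a|s) [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)            # Q[s, a] under the learned model
        V_new = (policy * Q).sum(axis=1)   # expectation over pi(a|s)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Example usage with a toy two-state, two-action problem:
transitions = [(0, 0, 1.0, 1), (0, 1, 0.0, 0), (1, 0, 0.0, 0), (1, 1, 1.0, 1)]
P, R = learn_model(transitions, n_states=2, n_actions=2)
policy = np.full((2, 2), 0.5)              # uniform random policy pi(a|s)
V = evaluate_policy(P, R, policy)

Because the policy value is computed from the learned model by dynamic programming rather than from sampled returns, this kind of evaluation avoids the Monte Carlo variance that the description attributes to online sampling-based policy gradient algorithms.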


Files in this item

There are no files associated with this item.
