Algorithms for inverse reinforcement learning

AY Ng, SJ Russell - ICML, 2000 - ai.stanford.edu
Abstract This paper addresses the problem of inverse reinforcement learning (IRL) in
Markov decision processes, that is, the problem of extracting a reward function given
observed, optimal behavior. IRL may be useful for apprenticeship learning to acquire ...

Apprenticeship learning via inverse reinforcement learning

P Abbeel, AY Ng - Proceedings of the twenty-first international …, 2004 - dl.acm.org
Abstract We consider learning in a Markov decision process where we are not explicitly
given a reward function, but where instead we can observe an expert demonstrating the task
that we want to learn to perform. This setting is useful in applications (such as the task of ...

Bayesian inverse reinforcement learning

D Ramachandran, E Amir - Urbana, 2007 - aaai.org
Abstract Inverse Reinforcement Learning (IRL) is the problem of learning the reward function
underlying a Markov Decision Process given the dynamics of the system and the behaviour
of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by ...

Maximum Entropy Inverse Reinforcement Learning

BD Ziebart, AL Maas, JA Bagnell, AK Dey - AAAI, 2008 - aaai.org
Abstract Recent research has shown the benefit of framing problems of imitation learning as
solutions to Markov Decision Problems. This approach reduces learning to the problem of
recovering a utility function that makes the behavior induced by a near-optimal policy ...

Maximum margin planning

ND Ratliff, JA Bagnell, MA Zinkevich - Proceedings of the 23rd …, 2006 - dl.acm.org
Abstract Imitation learning of sequential, goal-directed behavior by standard supervised
techniques is often difficult. We frame learning such behaviors as a maximum margin
structured prediction problem over a space of policies. In this approach, we learn ...

Apprenticeship learning using inverse reinforcement learning and gradient methods

G Neu, C Szepesvári - arXiv preprint arXiv:1206.5264, 2012 - arxiv.org
Abstract: In this paper we propose a novel gradient algorithm to learn a policy from an
expert's observed behavior assuming that the expert behaves optimally with respect to some
unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find ...

A game-theoretic approach to apprenticeship learning

U Syed, RE Schapire - Advances in neural information …, 2007 - machinelearning.wustl.edu
Abstract We study the problem of an apprentice learning to behave in an environment with
an unknown reward function by observing the behavior of an expert. We follow on the work
of Abbeel and Ng [1] who considered a framework in which the true reward function is ...

Learning agents for uncertain environments

S Russell - Proceedings of the eleventh annual conference on …, 1998 - dl.acm.org
Abstract This talk proposes a very simple “baseline architecture” for a learning agent that
can handle stochastic, partially observable environments. The architecture uses
reinforcement learning together with a method for representing temporal processes as ...

Inverse reinforcement learning

P Abbeel, AY Ng - Encyclopedia of machine learning, 2011 - Springer

Apprenticeship learning using linear programming

U Syed, M Bowling, RE Schapire - Proceedings of the 25th international …, 2008 - dl.acm.org
Abstract In apprenticeship learning, the goal is to learn a policy in a Markov decision
process that is at least as good as a policy demonstrated by an expert. The difficulty arises in
that the MDP's true reward function is assumed to be unknown. We show how to frame ...