Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control

Quentin Le Lidec; Wilson Jallet; Ivan Laptev; Cordelia Schmid; Justin Carpentier

Conference paper, Year: 2023

Quentin Le Lidec

Role: Author
PersonId : 1083016

Models of visual object recognition and scene understanding

Wilson Jallet

Role: Author
PersonId : 749065
IdHAL : wilson-jallet
ORCID : 0000-0001-8222-2739

Models of visual object recognition and scene understanding

Équipe Mouvement des Systèmes Anthropomorphes

Ivan Laptev

Role: Author

Models of visual object recognition and scene understanding

Cordelia Schmid

Role: Author

Models of visual object recognition and scene understanding

Justin Carpentier

Role: Author
PersonId : 3401
IdHAL : justin-carpentier
ORCID : 0000-0001-6585-2894
IdRef : 233948015

Models of visual object recognition and scene understanding

Abstract

Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages. On the one hand, RL approaches are able to learn global control policies directly from data, but generally require large sample sizes to properly converge towards feasible policies. On the other hand, TO methods are able to exploit gradient-based information extracted from simulators to quickly converge towards a locally optimal control trajectory, which is valid only in the vicinity of the solution. Over the past decade, several approaches have aimed to adequately combine the two classes of methods in order to obtain the best of both worlds. Following on from this line of research, we propose several improvements on top of these approaches to learn global control policies more quickly, notably by leveraging sensitivity information stemming from TO methods via Sobolev learning, and by using augmented Lagrangian techniques to enforce the consensus between TO and policy learning. We evaluate the benefits of these improvements on various classical tasks in robotics through comparison with existing approaches in the literature.
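As a rough illustration of the Sobolev learning idea mentioned in the abstract, the sketch below fits a policy to both the controls and the state sensitivities (feedback gains) that a TO solver might return. It is not the authors' implementation: the linear policy, the names `linear_policy`, `sobolev_loss` and `gain_weight`, and the toy samples are all assumptions made for this example.

```python
# Minimal sketch of a Sobolev-style regression loss (illustrative assumptions only).
# Each sample is (state, target_control, target_gain), where target_gain stands in
# for the sensitivity du/dx that a trajectory optimizer could provide.

def linear_policy(weights, bias, state):
    """Scalar control from a linear policy u = weights . state + bias."""
    return sum(w * x for w, x in zip(weights, state)) + bias


def sobolev_loss(weights, bias, samples, gain_weight=1.0):
    """Mean squared error on controls plus a weighted error on their derivatives."""
    total = 0.0
    for state, target_control, target_gain in samples:
        control_error = linear_policy(weights, bias, state) - target_control
        # For a linear policy, the Jacobian w.r.t. the state is just `weights`.
        gain_error = sum((w - g) ** 2 for w, g in zip(weights, target_gain))
        total += control_error ** 2 + gain_weight * gain_error
    return total / len(samples)


# Toy targets generated from a known linear feedback law (hypothetical data).
true_gain, true_bias = [1.0, -2.0], 0.5
states = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
samples = [(s, linear_policy(true_gain, true_bias, s), true_gain) for s in states]

loss_at_target = sobolev_loss(true_gain, true_bias, samples)
loss_perturbed = sobolev_loss([1.1, -2.0], true_bias, samples)
```

The derivative term penalizes a policy whose local slope disagrees with the solver's sensitivities even at states where the predicted control happens to match, which is one way such information could reduce the number of samples needed.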

Domains

Robotics [cs.RO]; Machine Learning [cs.LG]

Main file

lelidec2022policy.pdf (923.55 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03780392

Submitted on: Monday, 19 September 2022, 11:52:26

Last modified on: Friday, 19 April 2024, 16:18:58

Long-term archiving on: Tuesday, 20 December 2022, 18:29:41

Dates and versions

hal-03780392 , version 1 (19-09-2022)

hal-03780392 , version 2 (20-01-2023)

hal-03780392 , version 3 (16-02-2023)

Identifiers

HAL Id: hal-03780392, version 1

Cite

Quentin Le Lidec, Wilson Jallet, Ivan Laptev, Cordelia Schmid, Justin Carpentier. Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control. International Conference on Robotics and Automation, May 2023, London, United Kingdom. ⟨hal-03780392v1⟩

275 views

147 downloads
