SoCal Robotics Symposium

Last weekend we were at Caltech for the 2019 SoCal Robotics Symposium. It was a great, small conference with really interesting ideas from academia, the Jet Propulsion Laboratory, and industry.

I presented our work on motivation dynamics [1]. The extended abstract is available here [2] and the summary poster slide is below. Thanks to the organizers for a stimulating and well-run day.

[1] [pdf] P. B. Reverdy and D. E. Koditschek, “A dynamical system for prioritizing and coordinating motivations,” SIAM Journal on Applied Dynamical Systems, vol. 17, iss. 2, p. 1683–1715, 2018.
[Bibtex]
@article{PBR-DEK:18,
Author = {Reverdy, Paul B and Koditschek, Daniel E},
Date-Added = {2018-02-05 06:22:16 +0000},
Date-Modified = {2018-10-10 19:27:06 -0700},
Journal = {{SIAM Journal on Applied Dynamical Systems}},
Number = {2},
Pages = {1683--1715},
Pdf = {https://arxiv.org/abs/1703.01662},
Title = {A dynamical system for prioritizing and coordinating motivations},
Url = {https://epubs.siam.org/doi/pdf/10.1137/17M111972X},
Volume = {17},
Year = {2018},
Bdsk-Url-1 = {https://arxiv.org/abs/1703.01662}}
[2] [pdf] P. B. Reverdy, “Value-based decision making for human-machine robot control,” in Southern California Robotics Symposium, 2019.
[Bibtex]
@inproceedings{PBR-19c,
Author = {Reverdy, Paul B},
Booktitle = {Southern {C}alifornia Robotics Symposium},
Date-Added = {2019-04-27 14:01:02 -0700},
Date-Modified = {2019-04-27 14:03:00 -0700},
Pdf = {http://www.paulreverdy.com/wp-content/papercite-data/pdf/PBR-19c.pdf},
Title = {Value-based decision making for human-machine robot control},
Year = {2019}}

UCL algorithm corrections

There was a subtle, though small, error in the proofs published in [1]. We have corrected the error in a new Appendix G added to the arXiv version of the paper, available at https://arxiv.org/abs/1307.6134v4. These corrections also apply to other papers that built on the results in [1], including [2] and [3].

The error arose from our application of concentration inequalities, sometimes known as tail bounds. In the originally published proofs, we conditioned on the number n_i^t of times that the algorithm has selected arm i up to time t. Since the arm-selection policy depends on the rewards accrued, n_i^t and the rewards are dependent random variables. In the correction, we build on an alternative concentration inequality that accounts for this dependence and show that the proofs of all the performance bounds follow a similar pattern, with a slight modification to the decision heuristic.
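To make the roles of n_i^t and the decision heuristic concrete, here is a minimal Python sketch of a UCL-style arm-selection rule for a Gaussian bandit with known sampling variance. The credible-limit formula and constants below are illustrative simplifications rather than the corrected algorithm from Appendix G; the point is only that the selection counts n_i^t are themselves functions of the random rewards, which is exactly the dependence the corrected proofs must account for.

import math
import random
from statistics import NormalDist

def ucl_index(mean, n, t, sigma=1.0, K=2):
    # Illustrative upper credible limit for one arm: empirical mean plus the
    # standard deviation of the mean scaled by a (1 - 1/(K*t)) Gaussian
    # quantile. The precise form and constants in the published algorithm
    # may differ; this is only a sketch.
    alpha = 1.0 / (K * t)
    return mean + (sigma / math.sqrt(n)) * NormalDist().inv_cdf(1.0 - alpha)

def run_ucl(true_means, horizon=1000, sigma=1.0, seed=0):
    rng = random.Random(seed)
    K = len(true_means)
    counts = [0] * K          # n_i^t: times arm i has been selected
    means = [0.0] * K         # empirical mean reward of arm i

    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1       # play each arm once to initialize
        else:
            arm = max(range(K),
                      key=lambda i: ucl_index(means[i], counts[i], t, sigma, K))
        reward = rng.gauss(true_means[arm], sigma)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means

if __name__ == "__main__":
    counts, means = run_ucl([0.0, 0.5, 1.0])
    # The counts n_i^t depend on the realized (random) rewards, so they are
    # not independent of the reward sequence.
    print("selection counts:", counts)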

[1] [pdf] P. B. Reverdy, V. Srivastava, and N. E. Leonard, “Modeling human decision making in generalized Gaussian multiarmed bandits,” Proceedings of the IEEE, vol. 102, iss. 4, p. 544–571, 2014.
[Bibtex]
@article{PBR-VS-NEL:14,
Author = {Reverdy, Paul B and Srivastava, Vaibhav and Leonard, Naomi Ehrich},
Date-Added = {2018-02-05 06:22:16 +0000},
Date-Modified = {2018-02-08 21:11:54 +0000},
Journal = {Proceedings of the {IEEE}},
Number = {4},
Pages = {544--571},
Pdf = {http://www.paulreverdy.com/wp-content/papercite-data/pdf/PBR-VS-NEL-14.pdf},
Publisher = {IEEE},
Title = {Modeling human decision making in generalized {G}aussian multiarmed bandits},
Volume = {102},
Year = {2014}}
[2] [doi] P. Reverdy, V. Srivastava, and N. E. Leonard, “Satisficing in multi-armed bandit problems,” IEEE Transactions on Automatic Control, vol. 62, iss. 8, p. 3788–3803, 2017.
[Bibtex]
@article{PR-VS-NEL:17,
Author = {Reverdy, Paul and Srivastava, Vaibhav and Leonard, Naomi Ehrich},
Date-Added = {2018-02-05 06:22:16 +0000},
Date-Modified = {2018-02-08 21:34:38 +0000},
Doi = {10.1109/TAC.2016.2644380},
Journal = {{IEEE} {T}ransactions on {A}utomatic {C}ontrol},
Number = {8},
Pages = {3788--3803},
Publisher = {IEEE},
Title = {Satisficing in multi-armed bandit problems},
Volume = {62},
Year = {2017},
Bdsk-Url-1 = {https://doi.org/10.1109/TAC.2016.2644380}}
[3] [pdf] [doi] P. Reverdy and N. E. Leonard, “Parameter estimation in softmax decision-making models with linear objective functions,” IEEE Transactions on Automation Science and Engineering, vol. 13, iss. 1, p. 54–67, 2016.
[Bibtex]
@article{PR-NEL:16,
Author = {Reverdy, Paul and Leonard, Naomi Ehrich},
Date-Added = {2018-02-05 06:22:16 +0000},
Date-Modified = {2018-02-08 21:37:50 +0000},
Doi = {10.1109/TASE.2015.2499244},
Journal = {{IEEE} {T}ransactions on {A}utomation {S}cience and {E}ngineering},
Number = {1},
Pages = {54--67},
Pdf = {http://www.paulreverdy.com/wp-content/papercite-data/pdf/PR-NEL-16.pdf},
Publisher = {IEEE},
Title = {Parameter estimation in softmax decision-making models with linear objective functions},
Volume = {13},
Year = {2016},
Bdsk-Url-1 = {https://doi.org/10.1109/TASE.2015.2499244}}

Students presenting at IMECE

The undergraduates who worked in the lab last summer will present two posters on their summer work at the ASME IMECE conference. If you’re attending IMECE in Pittsburgh, please stop by on November 11!

Brendan Bogar will present “Investigating a Framework for Visualizing Reinforcement Learning Algorithms via Quadrupedal Robotic Simulation”.

David Chan, Mel Nguyen, Oshadha Gunasekara, and Randall Kliman will present “An object-oriented framework for fast development and testing of mobile robot control algorithms”. Abstracts are available on the IMECE website.

Welcome summer students!

Now that the spring semester is over, we are quickly transitioning to summer research mode. This week, we have welcomed four students:

  • David Chan, University of Arizona, Electrical and Computer Engineering
  • Mel Nguyen, University of Arizona, Electrical and Computer Engineering
  • Oshadha Gunasekara, Carnegie Mellon University, Electrical and Computer Engineering and Robotics
  • Randall Kliman, Georgia Tech, Computer Engineering

They are working on lab infrastructure, integrating our motion capture system and a fleet of mobile robots using ROS; a small example of the sort of ROS glue code involved is sketched below. More updates as the summer, and the work, progresses!
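As a flavor of that integration work, here is a minimal rospy sketch that subscribes to a motion-capture pose topic and publishes a simple velocity command for one robot. The topic names (/vrpn_client_node/robot1/pose, /robot1/cmd_vel) and the go-to-origin behavior are hypothetical placeholders, not our actual lab configuration.

#!/usr/bin/env python
# Minimal sketch (hypothetical topic names): drive one robot toward the origin
# using motion-capture feedback.
import rospy
from geometry_msgs.msg import PoseStamped, Twist

def pose_callback(msg, cmd_pub):
    cmd = Twist()
    # Proportional "go to origin" command in the plane (illustrative only;
    # a real controller would also account for the robot's heading).
    cmd.linear.x = -0.5 * msg.pose.position.x
    cmd.linear.y = -0.5 * msg.pose.position.y
    cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("mocap_goto_origin")
    pub = rospy.Publisher("/robot1/cmd_vel", Twist, queue_size=1)
    rospy.Subscriber("/vrpn_client_node/robot1/pose", PoseStamped,
                     pose_callback, callback_args=pub)
    rospy.spin()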

Motivation dynamics simulations

The two videos below show, respectively, the physical state and the full state space of a motivation dynamics agent. The agent is motivated to visit each of the two goal states (red diamonds) while staying in the workspace (the black circle) and avoiding the obstacles (black discs with red circles at their centers).

The motivation dynamics are as described in our SIADS paper. As guaranteed by the analysis in the paper, the closed-loop system exhibits a stable limit cycle in which the agent visits each of the goal states in turn.
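For intuition about why such a limit cycle arises, here is a toy Python sketch in which a planar agent’s velocity is a motivation-weighted blend of attraction to the two goals, and the motivation weight drifts, replicator-style, toward whichever goal is currently farther away. This is a deliberately simplified caricature, not the construction analyzed in the SIADS paper (there are no navigation functions, workspace boundary, or obstacles here); it only illustrates how coupling a motivation state to the task deficits can produce cyclic visits to the goals.

import math

# Toy illustration only: NOT the construction from the SIADS paper.
# A planar agent blends attraction to two goals with weight m, and m drifts
# toward whichever goal currently has the larger "deficit" (distance).
GOALS = [(-1.0, 0.0), (1.0, 0.0)]
K_MOTIVATION = 25.0   # motivation evolves faster than the agent moves (gain is illustrative)
DT = 0.01

def simulate(steps=6000, x=0.0, y=0.5, m=0.6):
    visits = []
    for _ in range(steps):
        d = [math.hypot(x - gx, y - gy) for gx, gy in GOALS]
        # Agent velocity: motivation-weighted blend of attraction to each goal.
        vx = m * (GOALS[0][0] - x) + (1 - m) * (GOALS[1][0] - x)
        vy = m * (GOALS[0][1] - y) + (1 - m) * (GOALS[1][1] - y)
        # Replicator-style motivation update: shift weight toward the goal
        # that is currently farther away.
        dm = K_MOTIVATION * m * (1 - m) * (d[0] - d[1])
        x, y = x + DT * vx, y + DT * vy
        m = min(max(m + DT * dm, 1e-3), 1 - 1e-3)  # keep m off the fixed points 0 and 1
        for i, di in enumerate(d):
            if di < 0.05 and (not visits or visits[-1] != i):
                visits.append(i)      # record a goal visit
    return visits

if __name__ == "__main__":
    # Expect the two goal indices to alternate, e.g. [1, 0, 1, 0, ...],
    # i.e. the agent settles into cyclic visits of the goals.
    print(simulate())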