Natural language instructions for human–robot collaborative manipulation

Abstract

This paper presents a dataset of natural language instructions for object reference in manipulation scenarios. It comprises 1582 individual written instructions collected through online crowdsourcing. The dataset is particularly useful for researchers working in natural language processing, human–robot interaction, and robotic manipulation. In addition to serving as a rich corpus of domain-specific language, it provides a benchmark of image–instruction pairs for use in system evaluations and reveals challenges inherent in tabletop object specification. Example Python code is provided for easy access to the data.
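The sketch below shows one way the image–instruction pairs might be used in a system evaluation. It is not the example code distributed with the dataset, and it rests on several assumptions:

- The CSV layout and the column names `image_id`, `instruction` and `target_object` are hypothetical.
- The helper functions `load_pairs`, `evaluate` and `instructions_per_image` are hypothetical.
- The sample rows are invented toy strings, not entries from the corpus.

The code scores a reference-resolution system by the fraction of instructions for which it selects the intended object.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


# Hypothetical record layout: the real dataset's file format and field names
# may differ. Each record pairs one scene image with one written instruction
# and the identifier of the object the instruction is meant to refer to.
@dataclass
class Pair:
    image_id: str
    instruction: str
    target_object: str


def load_pairs(csv_text: str) -> List[Pair]:
    """Parse image-instruction pairs from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Pair(row["image_id"], row["instruction"].strip(), row["target_object"])
        for row in reader
    ]


def evaluate(pairs: Iterable[Pair], predict: Callable[[str, str], str]) -> float:
    """Fraction of pairs where predict(image_id, instruction) returns the target object."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(predict(p.image_id, p.instruction) == p.target_object for p in pairs)
    return correct / len(pairs)


def instructions_per_image(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Count how many instructions were collected for each scene image."""
    counts: Dict[str, int] = defaultdict(int)
    for p in pairs:
        counts[p.image_id] += 1
    return dict(counts)


# Invented toy rows for illustration only; they are not taken from the dataset.
SAMPLE = """image_id,instruction,target_object
scene01,Pick up the red block on your left,obj3
scene01,Hand me the cup nearest to me,obj1
scene02,Grab the yellow block,obj2
"""


def baseline(image_id: str, instruction: str) -> str:
    """A trivial keyword-matching predictor, purely for illustration."""
    return "obj3" if "red" in instruction else "obj1"


pairs = load_pairs(SAMPLE)
print(evaluate(pairs, baseline))      # 2 of the 3 toy pairs are resolved correctly
print(instructions_per_image(pairs))  # two instructions for scene01, one for scene02
```

Exact-match accuracy is the simplest possible metric. The keywords single out ambiguity and perspective as challenges in this domain, so a fuller evaluation might also report accuracy separately for instructions in those categories.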

Keywords

Natural language instructions, human–robot collaboration, manipulation, ambiguity, perspective, spatial reference

