ABSTRACT
Understanding explanations of machine perception is an important step toward developing accountable, trustworthy machines. Speech and vision are the primary modalities by which humans gather information about the world, yet linking the visual and natural-language domains is a relatively new pursuit in computer vision, and it is difficult to test performance in a safe environment. To couple human visual understanding with machine perception, we present an explanatory system that builds a library of possible context-specific actions associated with 3D objects in immersive virtual worlds. We also contribute a novel scene-description dataset, generated natively in virtual reality, containing speech, image, gaze, and acceleration data. We discuss the development of a hybrid machine learning algorithm that links vision data with environmental affordances expressed in natural language. Our findings demonstrate that it is possible to develop a model that generates interpretable verbal descriptions of possible actions associated with recognized 3D objects within immersive VR environments.
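The abstract describes a library mapping recognized 3D objects to context-specific actions rendered as verbal descriptions. As a minimal sketch of that idea (not the paper's actual implementation; the class, labels, and phrasing below are illustrative assumptions), such a library might look like:

```python
# Sketch of a context-specific affordance library, assuming an upstream
# recognizer that emits object labels. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class AffordanceLibrary:
    # object label -> environment context -> list of possible actions
    entries: dict = field(default_factory=dict)

    def add(self, obj: str, context: str, actions: list) -> None:
        """Register the actions an object affords in a given context."""
        self.entries.setdefault(obj, {})[context] = list(actions)

    def describe(self, obj: str, context: str) -> str:
        """Generate an interpretable verbal description of possible actions."""
        actions = self.entries.get(obj, {}).get(context)
        if not actions:
            return f"No known actions for a {obj} in a {context}."
        return f"In a {context}, you could {', '.join(actions)} the {obj}."


lib = AffordanceLibrary()
lib.add("mug", "kitchen", ["pick up", "fill", "drink from"])
print(lib.describe("mug", "kitchen"))
```

In the paper's setting, the recognizer's output inside the VR scene would supply the object label and context, and the generated sentence would serve as the explanation shown to the user.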
Reasonable Perception: Connecting Vision and Language Systems for Validating Scene Descriptions