ABSTRACT
Understanding explanations of machine perception is an important step toward developing accountable, trustworthy machines. Speech and vision are the primary modalities by which humans gather information about the world, yet linking the visual and natural-language domains is a relatively new pursuit in computer vision, and it is difficult to test performance in a safe environment. To couple human visual understanding with machine perception, we present an explanatory system that builds a library of possible context-specific actions associated with 3D objects in immersive virtual worlds. We also contribute a novel scene-description dataset, generated natively in virtual reality, containing speech, image, gaze, and acceleration data. We discuss the development of a hybrid machine learning algorithm that links vision data with environmental affordances expressed in natural language. Our findings demonstrate that it is possible to develop a model that generates interpretable verbal descriptions of possible actions associated with recognized 3D objects within immersive VR environments.
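The abstract describes a library mapping recognized 3D objects to context-specific actions rendered as verbal descriptions. As a minimal sketch of that idea (not the paper's actual implementation; the class, labels, and phrasing below are illustrative assumptions), such a library might look like:

```python
# Sketch of a context-specific affordance library, assuming an upstream
# recognizer that emits object labels. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class AffordanceLibrary:
    # object label -> environment context -> list of possible actions
    entries: dict = field(default_factory=dict)

    def add(self, obj: str, context: str, actions: list) -> None:
        """Register the actions an object affords in a given context."""
        self.entries.setdefault(obj, {})[context] = list(actions)

    def describe(self, obj: str, context: str) -> str:
        """Generate an interpretable verbal description of possible actions."""
        actions = self.entries.get(obj, {}).get(context)
        if not actions:
            return f"No known actions for a {obj} in a {context}."
        return f"In a {context}, you could {', '.join(actions)} the {obj}."


lib = AffordanceLibrary()
lib.add("mug", "kitchen", ["pick up", "fill", "drink from"])
print(lib.describe("mug", "kitchen"))
```

In the paper's setting, the recognizer's output inside the VR scene would supply the object label and context, and the generated sentence would serve as the explanation shown to the user.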
Reasonable Perception: Connecting Vision and Language Systems for Validating Scene Descriptions