XR

2024-04-25

ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

Link: http://arxiv.org/abs/2404.16825v1

Abstract: With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content viewed on head-mounted displays (HMDs) is actually a rendered viewport instead of an ERP image. In this work, we emphasize that focusing solely on ERP quality results in inferior viewport visual experiences for users. Thus, we propose ResVR, which is the first comprehensive framework for the joint Rescaling and Viewport Rendering of ODIs. ResVR obtains low-resolution (LR) ERP images for transmission while rendering high-quality viewports for users to watch on HMDs. In our ResVR, a novel discrete pixel sampling strategy is developed to tackle the complex mapping between the viewport and ERP, enabling end-to-end training of the ResVR pipeline. Furthermore, a spherical pixel shape representation technique is innovatively derived from spherical differentiation to significantly improve the visual quality of rendered viewports. Extensive experiments demonstrate that our ResVR outperforms existing methods in viewport rendering tasks across different fields of view, resolutions, and view directions while keeping a low transmission overhead.
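The complex viewport-to-ERP mapping mentioned above boils down to projecting each viewport pixel onto the sphere and looking up the corresponding ERP location. The NumPy sketch below illustrates only that generic sampling step; it is not ResVR's discrete pixel sampling strategy, and the function name and parameters are illustrative assumptions.

```python
# Generic ERP -> perspective viewport sampling (illustrative sketch, not ResVR).
import numpy as np

def render_viewport(erp, yaw_deg, pitch_deg, fov_deg, out_h, out_w):
    """Nearest-neighbour viewport rendering from an ERP image of shape (H, W, 3)."""
    H, W = erp.shape[:2]
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)            # pinhole focal length
    u, v = np.meshgrid(np.arange(out_w) - out_w / 2 + 0.5,
                       np.arange(out_h) - out_h / 2 + 0.5)
    d = np.stack([u, -v, np.full_like(u, f)], axis=-1)           # camera-space rays
    d /= np.linalg.norm(d, axis=-1, keepdims=True)

    pitch, yaw = np.radians(pitch_deg), np.radians(yaw_deg)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    d = d @ (Ry @ Rx).T                                          # rotate rays into the world frame

    lon = np.arctan2(d[..., 0], d[..., 2])                       # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))                   # latitude in [-pi/2, pi/2]
    x = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W          # ERP column
    y = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1) # ERP row
    return erp[y, x]

viewport = render_viewport(np.random.rand(1024, 2048, 3),
                           yaw_deg=30, pitch_deg=10, fov_deg=90, out_h=512, out_w=512)
```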

Comparing Continuous and Retrospective Emotion Ratings in Remote VR Study

Authors: Maximilian Warsinke, Tanja Kojić, Maurizio Vergari, Robert Spang, Jan-Niklas Voigt-Antons, Sebastian Möller

Link: http://arxiv.org/abs/2404.16487v1

Abstract: This study investigates the feasibility of remote virtual reality (VR) studies conducted at home using VR headsets and video conferencing by deploying an experiment on emotion ratings. 20 participants used head-mounted displays to immerse themselves in 360° videos selected to evoke emotional responses. The research compares continuous ratings using a graphical interface to retrospective questionnaires on a digitized Likert Scale for measuring arousal and valence, both based on the self-assessment manikin (SAM). It was hypothesized that the two different rating methods would lead to significantly different values for both valence and arousal. The goal was to investigate whether continuous ratings during the experience would better reflect users' emotions compared to the post-questionnaire by mitigating biases such as the peak-end rule. The results show significant differences with moderate to strong effect sizes for valence and no significant differences for arousal with low to moderate effect sizes. This indicates the need for further investigation of the methods used to assess emotion ratings in VR studies. Overall, this study is an example of a remotely conducted VR experiment, offering insights into methods for emotion elicitation in VR by varying the timing and interface of the rating.

The Impact of Social Environment and Interaction Focus on User Experience and Social Acceptability of an Augmented Reality Game

Authors: Lorenzo Cocchia, Maurizio Vergari, Tanja Kojic, Francesco Vona, Sebastian Moller, Franca Garzotto, Jan-Niklas Voigt-Antons

Link: http://arxiv.org/abs/2404.16479v1

Abstract: One of the most promising technologies inside the Extended Reality (XR) spectrum is Augmented Reality. This technology is already in people's pockets regarding Mobile Augmented Reality with their smartphones. The scientific community still needs answers about how humans could and should interact in environments where perceived stimuli are different from fully physical or digital circumstances. Moreover, it is still being determined if people accept these new technologies in different social environments and interaction settings or if some obstacles could exist. This paper explores the impact of the Social Environment and the Focus of social interaction on users while playing a location-based augmented reality game, measuring it with user experience and social acceptance indicators. An empirical study in a within-subject fashion was performed in different social environments and under different settings of social interaction focus with N = 28 participants compiling self-reported questionnaires after playing a Scavenger Hunt in Augmented Reality. The measures from two different Social Environments (Crowded vs. Uncrowded) resulted in statistically relevant mean differences with indicators from the Social Acceptability dimension. Moreover, the analyses show statistically relevant differences between the variances from different degrees of Social Interaction Focus with Overall Social Presence, Perceived Psychological Engagement, Perceived Attentional Engagement, and Perceived Emotional Contagion. The results suggest that a location-based AR game played in different social environments and settings can influence the user experience's social dimension. Therefore, they should be carefully considered while designing immersive technological experiences in public spaces involving social interactions between players.

Impact of spatial auditory navigation on user experience during augmented outdoor navigation tasks

Authors: Jan-Niklas Voigt-Antons, Zhirou Sun, Maurizio Vergari, Navid Ashrafi, Francesco Vona, Tanja Kojic

Link: http://arxiv.org/abs/2404.16473v1

Abstract: The auditory sense of humans is important when it comes to navigation. The importance is especially high in cases when an object of interest is visually partly or fully covered. Interactions with users of technology are mainly focused on the visual domain of navigation tasks. This paper presents the results of a literature review and user study exploring the impact of spatial auditory navigation on user experience during an augmented outdoor navigation task. For the user test, participants used an augmented reality app guiding them to different locations with different digital augmentations. We conclude that the utilization of the auditory sense is still underrepresented in augmented reality applications. In the future, more usage scenarios for audio-augmented reality such as navigation will enhance user experience and interaction quality.

2024-04-24

Shared Boundary Interfaces: can one fit all? A controlled study on virtual reality vs touch-screen interfaces on persons with Neurodevelopmental Disorders

Authors: Francesco Vona, Eleonora Beccaluva, Marco Mores, Franca Garzotto

Link: http://arxiv.org/abs/2404.15970v1

Abstract: Technology presents a significant educational opportunity, particularly in enhancing emotional engagement and expanding learning and educational prospects for individuals with Neurodevelopmental Disorders (NDD). Virtual reality emerges as a promising tool for addressing such disorders, complemented by numerous touchscreen applications that have shown efficacy in fostering education and learning abilities. VR and touchscreen technologies represent diverse interface modalities. This study primarily investigates which interface, VR or touchscreen, more effectively facilitates food education for individuals with NDD. We compared learning outcomes via pre- and post-exposure questionnaires. To this end, we developed GEA, a dual-interface, user-friendly web application for Food Education, adaptable for either immersive use in a head-mounted display (HMD) or non-immersive use on a tablet. A controlled study was conducted to determine which interface better promotes learning. Over three sessions, the experimental group engaged with all GEA games in VR (condition A), while the control group interacted with the same games on a tablet (condition B). Results indicated a significant increase in post-questionnaire scores across subjects, averaging a 46% improvement. This enhancement was notably consistent between groups, with VR and Tablet groups showing 42% and 41% improvements, respectively.

Training Attention Skills in Individuals with Neurodevelopmental Disorders using Virtual Reality and Eye-tracking technology

Authors: Alberto Patti, Francesco Vona, Anna Barberio, Marco Domenico Buttiglione, Ivan Crusco, Marco Mores, Franca Garzotto

Link: http://arxiv.org/abs/2404.15960v1

Abstract: Neurodevelopmental disorders (NDD), encompassing conditions like Intellectual Disability, Attention Deficit Hyperactivity Disorder, and Autism Spectrum Disorder, present challenges across various cognitive capacities. Attention deficits are often common in individuals with NDD due to the sensory system dysfunction that characterizes these disorders. Consequently, limited attention capability can affect the overall quality of life and the ability to transfer knowledge from one circumstance to another. The literature has increasingly recognized the potential benefits of virtual reality (VR) in supporting NDD learning and rehabilitation due to its interactive and engaging nature, which is critical for consistent practice. In previous studies, we explored the usage of a VR application called Wildcard to enhance attention skills in persons with NDD. The application has been redesigned in this study, exploiting eye-tracking technology to enable novel and more fine-grained interactions. A four-week experiment with 38 NDD participants was conducted to evaluate its usability and effectiveness in improving Visual Attention Skills. Results show the usability and effectiveness of Wildcard in enhancing attention skills, advocating for continued exploration of VR and eye-tracking technology's potential in NDD interventions.

BlissCam: Boosting Eye Tracking Efficiency with Learned In-Sensor Sparse Sampling

Authors: Yu Feng, Tianrui Ma, Yuhao Zhu, Xuan Zhang

Link: http://arxiv.org/abs/2404.15733v1

Abstract: Eye tracking is becoming an increasingly important task domain in emerging computing platforms such as Augmented/Virtual Reality (AR/VR). Today's eye tracking systems suffer from long end-to-end tracking latency and can easily eat up half of the power budget of a mobile VR device. Most existing optimization efforts exclusively focus on the computation pipeline by optimizing the algorithm and/or designing dedicated accelerators while largely ignoring the front-end of any eye tracking pipeline: the image sensor. This paper makes a case for co-designing the imaging system with the computing system. In particular, we propose the notion of "in-sensor sparse sampling", whereby the pixels are drastically downsampled (by 20x) within the sensor. Such in-sensor sampling enhances the overall tracking efficiency by significantly reducing 1) the power consumption of the sensor readout chain and sensor-host communication interfaces, two major power contributors, and 2) the work done on the host, which receives and operates on far fewer pixels. With careful reuse of existing pixel circuitry, our proposed BLISSCAM requires little hardware augmentation to support the in-sensor operations. Our synthesis results show up to 8.2x energy reduction and 1.4x latency reduction over existing eye tracking pipelines.
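As a rough, software-level illustration of what sparse sampling can mean in principle, the toy sketch below keeps only a small, score-selected fraction of pixels (about 1/20) before any downstream processing. It is an assumption-laden toy, not the BlissCam hardware design or its learned sampling scheme.

```python
# Keep ~1/20 of pixels selected by a lightweight score map before further processing
# (a toy rendition of "in-sensor sparse sampling"; not the BlissCam design).
import torch
import torch.nn as nn

class SparseSampler(nn.Module):
    def __init__(self, keep_ratio=0.05):
        super().__init__()
        self.score = nn.Conv2d(1, 1, kernel_size=3, padding=1)   # tiny importance predictor
        self.keep_ratio = keep_ratio

    def forward(self, frame):                    # frame: (B, 1, H, W) grayscale
        s = self.score(frame).flatten(1)
        k = max(1, int(self.keep_ratio * s.shape[1]))
        idx = s.topk(k, dim=1).indices           # the pixels that would actually be read out
        mask = torch.zeros_like(s).scatter_(1, idx, 1.0).view_as(frame)
        return frame * mask, mask                # ~20x fewer non-zero pixels reach the host

sparse_frame, mask = SparseSampler()(torch.rand(1, 1, 240, 320))
```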

2024-04-23

Augmented Voices: An Augmented Reality Experience Highlighting the Social Injustices of Gender-Based Violence in the Muslim South-Asian Diaspora

Authors: Hamida Khatri

Link: http://arxiv.org/abs/2404.15239v1

Abstract: This paper delves into the distressing prevalence of gender-based violence (GBV) and its deep-seated psychological ramifications, particularly among Muslim South Asian women living in diasporic communities. Despite the gravity of GBV, these women often face formidable barriers in voicing their experiences and accessing support. "Augmented Voices" emerges as a technological beacon, harnessing the potential of augmented reality (AR) to bridge the digital and physical realms through mobile devices, enhancing the visibility of these often-silenced voices. With its technological motivation firmly anchored in the convergence of AR and real-world interactions, "Augmented Voices" offers a digital platform where storytelling acts as a catalyst, bringing to the fore the experiences shared by these women. By superimposing their narratives onto physical locations via Geographic Information System (GIS) Mapping, the application "augments their voices" in the diaspora, providing a conduit for expression and solidarity. This project, currently at its developmental stage, aspires to elevate the stories of GBV victims to a level where their struggles are not just heard but felt, forging a powerful connection between the user and the narrative. It is designed to transcend the limitations of conventional storytelling, creating an "augmented" reality where voices that are often muted by societal constraints can resonate powerfully. The project underscores the urgent imperative to confront GBV, catalyzing societal transformation and fostering robust support networks for those in the margins. It is a pioneering example of how technology can become a formidable ally in the fight for social justice and the empowerment of the oppressed. Additionally, this paper delves into the AR workflow illustrating its relevance and contribution to the broader theme of site-specific AR for social justice.

Virtual Takeovers in the Metaverse: Interrogating Power in Our Past and Future(s) with Multi-Layered Narratives

Authors: Heather Snyder Quinn, Jessa Dickinson

Link: http://arxiv.org/abs/2404.15108v1

Abstract: Mariah is an augmented reality (AR) mobile application that exposes power structures (e.g., capitalism, patriarchy, white supremacy) through storytelling and celebrates acts of resistance against them. People can use Mariah to "legally trespass" the metaverse as a form of protest. Mariah provides historical context to the user's physical surroundings by superimposing images and playing stories about people who have experienced, and resisted, injustice. We share two implementations of Mariah that raise questions about free speech and property rights in the metaverse: (1) a protest against museums accepting "dirty money" from the opioid epidemic; and (2) a commemoration of sites where people have resisted power structures. Mariah is a case study for how experimenting with a technology in non-sanctioned ways (i.e., "hacking") can expose ways that it might interact with, and potentially amplify, existing power structures.

Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers

Authors: Buyun He, Yingguang Yang, Qi Wu, Hao Liu, Renyu Yang, Hao Peng, Xiang Wang, Yong Liao, Pengyuan Zhou

Link: http://arxiv.org/abs/2404.15070v1

Abstract: Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks -- In reality, they largely depicted the social network as a static graph and solely relied on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates the dynamic nature of social networks. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against the leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score.
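The snapshot-plus-temporal design described above can be sketched generically as a per-snapshot graph layer followed by a transformer over each node's history. The skeleton below is illustrative only (dense adjacency, mean aggregation, made-up class names), not the BotDGT reference implementation.

```python
# Per-snapshot GNN followed by a transformer over each node's history
# (illustrative skeleton of the snapshot + temporal idea, not the BotDGT code).
import torch
import torch.nn as nn

class SnapshotGCN(nn.Module):
    """One mean-aggregation graph layer applied independently to each snapshot."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, x, adj):                   # x: (N, F) features, adj: (N, N) dense
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))

class DynamicBotDetector(nn.Module):
    def __init__(self, in_dim, hid_dim=32, n_heads=4):
        super().__init__()
        self.structural = SnapshotGCN(in_dim, hid_dim)
        layer = nn.TransformerEncoderLayer(hid_dim, n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(hid_dim, 2)         # bot vs. human

    def forward(self, feats, adjs):              # lists of T snapshots
        h = torch.stack([self.structural(x, a) for x, a in zip(feats, adjs)], dim=1)
        h = self.temporal(h)                     # (N, T, hid): each node's evolving state
        return self.cls(h[:, -1])                # classify from the latest state

N, T, F = 100, 5, 16
logits = DynamicBotDetector(F)([torch.rand(N, F)] * T, [torch.eye(N)] * T)
```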

Quantitative Evaluation of driver's situation awareness in virtual driving through Eye tracking analysis

Authors: Yunxiang Jiang, Qing Xu, Kai Zhen, Yu Chen

Link: http://arxiv.org/abs/2404.14817v1

Abstract: In driving tasks, the driver's situation awareness of the surrounding scenario is crucial for safe driving. However, current methods of measuring situation awareness mostly rely on subjective questionnaires, which interrupt tasks and lack non-intrusive quantification. To address this issue, our study utilizes objective gaze motion data to provide an interference-free quantification method for situation awareness. Three quantitative scores are proposed to represent three different levels of awareness: perception, comprehension, and projection, and an overall score of situation awareness is also proposed based on the above three scores. To validate our findings, we conducted experiments where subjects performed driving tasks in a virtual reality simulated environment. All four proposed situation awareness scores show a significant correlation with driving performance. The proposed scores not only illuminate a new path for understanding and evaluating situation awareness but also offer a satisfying proxy for driving performance.

2024-04-22

Penn & Slavery Project's Augmented Reality Tour: Augmenting a Campus to Reveal a Hidden History

Authors: VanJessica Gladney, Breanna Moore, Kathleen Brown

Link: http://arxiv.org/abs/2404.14379v1

Abstract: In 2006 and 2016, the University of Pennsylvania denied any ties to slavery. In 2017, a group of undergraduate researchers, led by Professor Kathleen Brown, investigated this claim. Initial research, focused on 18th century faculty and trustees who owned slaves, revealed deep connections between the university's history and the institution of slavery. These findings, and discussions amongst the researchers, shaped the Penn and Slavery Project's goal of redefining complicity beyond ownership. Breanna Moore's contributions in PSP's second semester expanded the project's focus to include generational wealth gaps. In 2018, VanJessica Gladney served as the PSP's Public History Fellow and spread the project's outreach in the greater Philadelphia area. That year, the PSP team began to design an augmented reality app as a Digital Interruption and an attempt to display the truth about Penn's history on its campus. Unfortunately, PSP faced delays due to COVID-19. Despite setbacks, the project persisted, engaging with activists and the wider community to confront historical injustices and modern inequalities.

Immersive Rover Control and Obstacle Detection based on Extended Reality and Artificial Intelligence

Authors: Sofía Coloma, Alexandre Frantz, Dave van der Meer, Ernest Skrzypczyk, Andrej Orsula, Miguel Olivares-Mendez

Link: http://arxiv.org/abs/2404.14095v1

Abstract: Lunar exploration has become a key focus, driving scientific and technological advances. Ongoing missions are deploying rovers to the surface of the Moon, targeting the far side and south pole. However, these terrains pose challenges, emphasizing the need for precise obstacle and resource detection to avoid mission risks. This work proposes a novel system that integrates eXtended Reality (XR) and Artificial Intelligence (AI) to teleoperate lunar rovers. It is capable of autonomously detecting rocks and recreating an immersive 3D virtual environment of the location of the robot. This system has been validated in a lunar laboratory to observe its advantages over traditional 2D-based teleoperation approaches.

2024-04-18

Holographic Parallax Improves 3D Perceptual Realism

Authors: Dongyeon Kim, Seung-Woo Nam, Suyeon Choi, Jong-Mo Seo, Gordon Wetzstein, Yoonchan Jeong

Link: http://arxiv.org/abs/2404.11810v1

Abstract: Holographic near-eye displays are a promising technology to solve long-standing challenges in virtual and augmented reality display systems. Over the last few years, many different computer-generated holography (CGH) algorithms have been proposed that are supervised by different types of target content, such as 2.5D RGB-depth maps, 3D focal stacks, and 4D light fields. It is unclear, however, what the perceptual implications are of the choice of algorithm and target content type. In this work, we build a perceptual testbed of a full-color, high-quality holographic near-eye display. Under natural viewing conditions, we examine the effects of various CGH supervision formats and conduct user studies to assess their perceptual impacts on 3D realism. Our results indicate that CGH algorithms designed for specific viewpoints exhibit noticeable deficiencies in achieving 3D realism. In contrast, holograms incorporating parallax cues consistently outperform other formats across different viewing conditions, including the center of the eyebox. This finding is particularly interesting and suggests that the inclusion of parallax cues in CGH rendering plays a crucial role in enhancing the overall quality of the holographic experience. This work represents an initial stride towards delivering a perceptually realistic 3D experience with holographic near-eye displays.

2024-04-17

Establishing a Baseline for Gaze-driven Authentication Performance in VR: A Breadth-First Investigation on a Very Large Dataset

Authors: Dillon Lohr, Michael J. Proulx, Oleg Komogortsev

Link: http://arxiv.org/abs/2404.11798v1

Abstract: This paper performs the crucial work of establishing a baseline for gaze-driven authentication performance to begin answering fundamental research questions using a very large dataset of gaze recordings from 9202 people with a level of eye tracking (ET) signal quality equivalent to modern consumer-facing virtual reality (VR) platforms. The size of the employed dataset is at least an order-of-magnitude larger than any other dataset from previous related work. Binocular estimates of the optical and visual axes of the eyes and a minimum duration for enrollment and verification are required for our model to achieve a false rejection rate (FRR) of below 3% at a false acceptance rate (FAR) of 1 in 50,000. In terms of identification accuracy which decreases with gallery size, we estimate that our model would fall below chance-level accuracy for gallery sizes of 148,000 or more. Our major findings indicate that gaze authentication can be as accurate as required by the FIDO standard when driven by a state-of-the-art machine learning architecture and a sufficiently large training dataset.
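The FRR-at-FAR figure quoted above is a standard biometric evaluation metric. The sketch below shows one common way to estimate it from genuine and impostor similarity scores; the score distributions are synthetic placeholders, not the paper's data or protocol.

```python
# Estimating FRR at a fixed FAR from genuine/impostor similarity scores
# (standard biometric evaluation recipe; the score distributions here are synthetic).
import numpy as np

def frr_at_far(genuine, impostor, target_far=1 / 50_000):
    """Pick the threshold whose impostor acceptance rate is ~target_far, report FRR."""
    impostor = np.sort(impostor)
    k = int(np.ceil((1 - target_far) * len(impostor))) - 1       # index of the threshold score
    threshold = impostor[min(max(k, 0), len(impostor) - 1)]
    return float(np.mean(genuine < threshold)), float(threshold)

rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, 10_000)     # same-person comparison scores
impostor = rng.normal(0.2, 0.1, 200_000)   # different-person comparison scores
frr, thr = frr_at_far(genuine, impostor)
print(f"FRR at FAR=1/50k: {frr:.3%} (threshold {thr:.3f})")
```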

Deep Pattern Network for Click-Through Rate Prediction

Authors: Hengyu Zhang, Junwei Pan, Dapeng Liu, Jie Jiang, Xiu Li

Link: http://arxiv.org/abs/2404.11456v1

Abstract: Click-through rate (CTR) prediction tasks play a pivotal role in real-world applications, particularly in recommendation systems and online advertising. A significant research branch in this domain focuses on user behavior modeling. Current research predominantly centers on modeling co-occurrence relationships between the target item and items previously interacted with by users in their historical data. However, this focus neglects the intricate modeling of user behavior patterns. In reality, the abundance of user interaction records encompasses diverse behavior patterns, indicative of a spectrum of habitual paradigms. These patterns harbor substantial potential to significantly enhance CTR prediction performance. To harness the informational potential within user behavior patterns, we extend Target Attention (TA) to Target Pattern Attention (TPA) to model pattern-level dependencies. Furthermore, three critical challenges demand attention: the inclusion of unrelated items within behavior patterns, data sparsity in behavior patterns, and computational complexity arising from numerous patterns. To address these challenges, we introduce the Deep Pattern Network (DPN), designed to comprehensively leverage information from user behavior patterns. DPN efficiently retrieves target-related user behavior patterns using a target-aware attention mechanism. Additionally, it contributes to refining user behavior patterns through a pre-training paradigm based on self-supervised learning while promoting dependency learning within sparse patterns. Our comprehensive experiments, conducted across three public datasets, substantiate the superior performance and broad compatibility of DPN.
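Target Pattern Attention extends target attention from single items to behavior patterns; a minimal sketch of target-style attention over pattern embeddings is shown below. How the pattern embeddings are built, and all shapes and names, are assumptions rather than the DPN implementation.

```python
# Target-style attention over behaviour-pattern embeddings (minimal sketch;
# the pattern encoder, shapes, and names are assumptions, not the DPN code).
import torch
import torch.nn.functional as F

def target_pattern_attention(target, patterns):
    """target: (B, d) candidate-item embedding; patterns: (B, P, d) pattern embeddings."""
    d = target.shape[-1]
    scores = torch.einsum('bd,bpd->bp', target, patterns) / d ** 0.5
    weights = F.softmax(scores, dim=-1)               # relevance of each behaviour pattern
    return torch.einsum('bp,bpd->bd', weights, patterns)

interest = target_pattern_attention(torch.rand(8, 64), torch.rand(8, 20, 64))
```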

AR for Sexual Violence: Maintaining Ethical Balance While Enhancing Empathy

Authors: Chunwei Lin

Link: http://arxiv.org/abs/2404.11305v1

Abstract: This study showcases an augmented reality (AR) experience designed to promote gender justice and increase awareness of sexual violence in Taiwan. By leveraging AR, this project overcomes the limitations of offline exhibitions on social issues by motivating the public to participate and enhancing their willingness to delve into the topic. The discussion explores how direct exposure to sexual violence can induce negative emotions and secondary trauma among users. It also suggests strategies for using AR to alleviate such issues, particularly by avoiding simulations of actual incidents.

Novel View Synthesis for Cinematic Anatomy on Mobile and Immersive Displays

Authors: Simon Niedermayr, Christoph Neuhauser, Kaloian Petkov, Klaus Engel, Rüdiger Westermann

Link: http://arxiv.org/abs/2404.11285v1

Abstract: Interactive photorealistic visualization of 3D anatomy (i.e., Cinematic Anatomy) is used in medical education to explain the structure of the human body. It is currently restricted to frontal teaching scenarios, where the demonstrator needs a powerful GPU and high-speed access to a large storage device where the dataset is hosted. We demonstrate the use of novel view synthesis via compressed 3D Gaussian splatting to overcome this restriction and to enable students to perform cinematic anatomy on lightweight mobile devices and in virtual reality environments. We present an automatic approach for finding a set of images that captures all potentially seen structures in the data. By mixing closeup views with images from a distance, the splat representation can recover structures up to the voxel resolution. The use of Mip-Splatting enables smooth transitions when the focal length is increased. Even for GB datasets, the final renderable representation can usually be compressed to less than 70 MB, enabling interactive rendering on low-end devices using rasterization.

2024-04-16

A Plausibility Study of Using Augmented Reality in the Ventriculoperitoneal Shunt Operations

Authors: Tandin Dorji, Pakinee Aimmanee, Vich Yindeedej

Link: http://arxiv.org/abs/2404.10713v1

Abstract: The field of augmented reality (AR) has undergone substantial growth, finding diverse applications in the medical industry. This paper delves into various techniques employed in medical surgeries, scrutinizing factors such as cost, implementation, and accessibility. The focus of this exploration is on AR-based solutions, with a particular emphasis on addressing challenges and proposing an innovative solution for ventriculoperitoneal shunt (VP) operations. The proposed solution introduces a novel flow in the pre-surgery phase, aiming to substantially reduce setup time and operation duration by creating 3D models of the skull and ventricles. Experiments are conducted where the models are visualized on a 3D-printed skull through an AR device, specifically the Microsoft HoloLens 2. The paper then conducts an in-depth analysis of this proposed solution, discussing its feasibility, advantages, limitations, and future implications.

The application of Augmented Reality (AR) in Remote Work and Education

Authors: Keqin Li, Peng Xirui, Jintong Song, Bo Hong, Jin Wang

Link: http://arxiv.org/abs/2404.10579v1

Abstract: With the rapid advancement of technology, Augmented Reality (AR) technology, known for its ability to deeply integrate virtual information with the real world, is gradually transforming traditional work modes and teaching methods. Particularly in the realms of remote work and online education, AR technology demonstrates a broad spectrum of application prospects. This paper delves into the application potential and actual effects of AR technology in remote work and education. Through a systematic literature review, this study outlines the key features, advantages, and challenges of AR technology. Based on theoretical analysis, it discusses the scientific basis and technical support that AR technology provides for enhancing remote work efficiency and promoting innovation in educational teaching models. Additionally, by designing an empirical research plan and analyzing experimental data, this article reveals the specific performance and influencing factors of AR technology in practical applications. Finally, based on the results of the experiments, this research summarizes the application value of AR technology in remote work and education, looks forward to its future development trends, and proposes forward-looking research directions and strategic suggestions, offering empirical foundation and theoretical guidance for further promoting the in-depth application of AR technology in related fields.

Teaching Chinese Sign Language with Feedback in Mixed Reality

Authors: Hongli Wen, Yang Xu, Lin Li, Xudong Ru

Link: http://arxiv.org/abs/2404.10490v1

Abstract: Traditional sign language teaching methods face challenges such as limited feedback and diverse learning scenarios. 2D resources lack real-time feedback, while classroom teaching is constrained by a scarcity of teachers. Methods based on VR and AR have relatively primitive interaction feedback mechanisms. This study proposes an innovative teaching model that uses real-time monocular vision and mixed reality technology. First, we introduce an improved hand-posture reconstruction method to achieve sign language semantic retention and real-time feedback. Second, a ternary system evaluation algorithm is proposed for a comprehensive assessment, maintaining good consistency with experts in sign language. Furthermore, we use mixed reality technology to construct a scenario-based 3D sign language classroom and explore the user experience of scenario teaching. Overall, this paper presents a novel teaching method that provides an immersive learning experience, advanced posture reconstruction, and precise feedback, achieving positive feedback on user experience and learning effectiveness.

Referring Flexible Image Restoration

Authors: Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

Link: http://arxiv.org/abs/2404.10342v1

Abstract: In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image restoration, where a model must perceive and remove specific degradation types specified by human commands in images with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address this, we first construct a large-scale synthetic dataset called RFIR, comprising 153,423 samples with the degraded image, text prompt for specific degradation removal and restored image. RFIR consists of five basic degradation types: blur, rain, haze, low light and snow while six main sub-categories are included for varying degrees of degradation removal. To tackle the challenge, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives degradation types in the degraded image and removes specific degradation upon text prompt. TransRFIR is based on two devised attention modules, Multi-Head Agent Self-Attention (MHASA) and Multi-Head Agent Cross Attention (MHACA), where MHASA and MHACA introduce the agent token and reach the linear complexity, achieving lower computation cost than vanilla self-attention and cross-attention and obtaining competitive performances. Our TransRFIR achieves state-of-the-art performances compared with other counterparts and is proven as an effective architecture for image restoration. We release our project at https://github.com/GuanRunwei/FIR-CP.
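The "agent token" idea the abstract credits for linear complexity is commonly implemented as two small attentions routed through a short set of agent tokens. The sketch below shows that generic mechanism under those assumptions; it is not the MHASA/MHACA modules themselves.

```python
# Generic agent-token attention, linear in sequence length (illustrative sketch
# of the mechanism the abstract cites; not the MHASA/MHACA modules themselves).
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents=16):
    """q, k, v: (B, N, d). Agents are pooled queries; cost is O(N * num_agents * d)."""
    B, N, d = q.shape
    agents = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)
    agg = F.softmax(agents @ k.transpose(1, 2) / d ** 0.5, dim=-1) @ v      # agents gather context
    return F.softmax(q @ agents.transpose(1, 2) / d ** 0.5, dim=-1) @ agg   # broadcast back to queries

x = torch.rand(2, 4096, 64)
y = agent_attention(x, x, x)        # shape (2, 4096, 64)
```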

2024-04-15

Shaping Realities: Enhancing 3D Generative AI with Fabrication Constraints

Authors: Faraz Faruqi, Yingtao Tian, Vrushank Phadnis, Varun Jampani, Stefanie Mueller

Link: http://arxiv.org/abs/2404.10142v1

Abstract: Generative AI tools are becoming more prevalent in 3D modeling, enabling users to manipulate or create new models with text or images as inputs. This makes it easier for users to rapidly customize and iterate on their 3D designs and explore new creative ideas. These methods focus on the aesthetic quality of the 3D models, refining them to look similar to the prompts provided by the user. However, when creating 3D models intended for fabrication, designers need to trade-off the aesthetic qualities of a 3D model with their intended physical properties. To be functional post-fabrication, 3D models have to satisfy structural constraints informed by physical principles. Currently, such requirements are not enforced by generative AI tools. This leads to the development of aesthetically appealing, but potentially non-functional 3D geometry, that would be hard to fabricate and use in the real world. This workshop paper highlights the limitations of generative AI tools in translating digital creations into the physical world and proposes new augmentations to generative AI tools for creating physically viable 3D models. We advocate for the development of tools that manipulate or generate 3D models by considering not only the aesthetic appearance but also using physical properties as constraints. This exploration seeks to bridge the gap between digital creativity and real-world applicability, extending the creative potential of generative AI into the tangible domain.

EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting

Authors: Min-Hui Lin, Mahesh Reddy, Guillaume Berger, Michel Sarkis, Fatih Porikli, Ning Bi

Link: http://arxiv.org/abs/2404.09918v1

Abstract: In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices, utilizing text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method proposes a diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, realistic lighting conditions from textual descriptions, offering flexibility and control in portrait video relighting task. Unlike the previous relighting frameworks, our proposed system performs video relighting directly on-device, enabling real-time inference with real 360-degree HDRI maps. This on-device processing ensures both privacy and guarantees low runtime, providing an immediate response to changes in lighting conditions or user inputs. Our approach paves the way for new possibilities in real-time video applications, including video conferencing, gaming, and augmented reality, by allowing dynamic, text-based control of lighting conditions.

Software development in the age of LLMs and XR

Authors: Jesus M. Gonzalez-Barahona

Link: http://arxiv.org/abs/2404.09789v1

Abstract: Let's imagine that in a few years generative AI has changed software development dramatically, taking charge of most of the programming tasks. Let's also assume that extended reality devices became ubiquitous, being the preferred interface for interacting with computers. This paper proposes how this situation would impact IDEs, by exploring how the development process would be affected, and analyzing which tools would be needed for supporting developers.

Dynamic Ego-Velocity estimation Using Moving mmWave Radar: A Phase-Based Approach

Authors: Argha Sen, Soham Chakraborty, Soham Tripathy, Sandip Chakraborty

Link: http://arxiv.org/abs/2404.09691v1

Abstract: Precise ego-motion measurement is crucial for various applications, including robotics, augmented reality, and autonomous navigation. In this poster, we propose mmPhase, an odometry framework based on single-chip millimetre-wave (mmWave) radar for robust ego-motion estimation in mobile platforms without requiring additional modalities like the visual, wheel, or inertial odometry. mmPhase leverages a phase-based velocity estimation approach to overcome the limitations of conventional Doppler resolution. For real-world evaluations of mmPhase, we have developed an ego-vehicle prototype. Compared to the state-of-the-art baselines, mmPhase shows superior performance in ego-velocity estimation.
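Phase-based velocity estimation generally rests on the standard FMCW relation v = λ·Δφ/(4π·T_chirp), where Δφ is the chirp-to-chirp phase change at a range bin. The snippet below illustrates that conversion with assumed radar parameters; it is not the mmPhase pipeline.

```python
# Chirp-to-chirp phase change -> radial velocity, v = lambda * dphi / (4 * pi * T_chirp)
# (standard FMCW relation; radar parameters below are assumptions, not mmPhase values).
import numpy as np

WAVELENGTH = 3e8 / 77e9      # 77 GHz carrier -> ~3.9 mm wavelength
T_CHIRP = 100e-6             # chirp repetition interval in seconds (assumed)

def phase_velocity(range_bin_phase):
    """range_bin_phase: phase (rad) of one range bin across consecutive chirps."""
    dphi = np.angle(np.exp(1j * np.diff(range_bin_phase)))       # wrapped phase differences
    return WAVELENGTH * dphi / (4 * np.pi * T_CHIRP)             # m/s per chirp pair

phase = np.linspace(0, 4 * np.pi, 64) + 0.05 * np.random.randn(64)
print(phase_velocity(phase).mean())                              # ~0.6 m/s for this toy signal
```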

AAM-VDT: Vehicle Digital Twin for Tele-Operations in Advanced Air Mobility

Authors: Tuan Anh Nguyen, Taeho Kwag, Vinh Pham, Viet Nghia Nguyen, Jeongseok Hyun, Minseok Jang, Jae-Woo Lee

Link: http://arxiv.org/abs/2404.09621v1

Abstract: This study advanced tele-operations in Advanced Air Mobility (AAM) through the creation of a Vehicle Digital Twin (VDT) system for eVTOL aircraft, tailored to enhance remote control safety and efficiency, especially for Beyond Visual Line of Sight (BVLOS) operations. By synergizing digital twin technology with immersive Virtual Reality (VR) interfaces, we notably elevate situational awareness and control precision for remote operators. Our VDT framework integrates immersive tele-operation with a high-fidelity aerodynamic database, essential for authentically simulating flight dynamics and control tactics. At the heart of our methodology lies an eVTOL's high-fidelity digital replica, placed within a simulated reality that accurately reflects physical laws, enabling operators to manage the aircraft via a master-slave dynamic, substantially outperforming traditional 2D interfaces. The architecture of the designed system ensures seamless interaction between the operator, the digital twin, and the actual aircraft, facilitating exact, instantaneous feedback. Experimental assessments, involving propulsion data gathering, simulation database fidelity verification, and tele-operation testing, verify the system's capability in precise control command transmission and maintaining the digital-physical eVTOL synchronization. Our findings underscore the VDT system's potential in augmenting AAM efficiency and safety, paving the way for broader digital twin application in autonomous aerial vehicles.

Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation?

Authors: Dmitry Ignatov, Andrey Ignatov, Radu Timofte

Link: http://arxiv.org/abs/2404.09469v1

Abstract: We present ANYU, a new virtually augmented version of the NYU depth v2 dataset, designed for monocular depth estimation. In contrast to the well-known approach where full 3D scenes of a virtual world are utilized to generate artificial datasets, ANYU was created by incorporating RGB-D representations of virtual reality objects into the original NYU depth v2 images. We specifically did not match each generated virtual object with an appropriate texture and a suitable location within the real-world image. Instead, an assignment of texture, location, lighting, and other rendering parameters was randomized to maximize a diversity of the training data, and to show that it is randomness that can improve the generalizing ability of a dataset. By conducting extensive experiments with our virtually modified dataset and validating on the original NYU depth v2 and iBims-1 benchmarks, we show that ANYU improves the monocular depth estimation performance and generalization of deep neural networks with considerably different architectures, especially for the current state-of-the-art VPD model. To the best of our knowledge, this is the first work that augments a real-world dataset with randomly generated virtual 3D objects for monocular depth estimation. We make our ANYU dataset publicly available in two training configurations with 10% and 100% additional synthetically enriched RGB-D pairs of training images, respectively, for efficient training and empirical exploration of virtual augmentation at https://github.com/ABrain-One/ANYU
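A hedged sketch of the general recipe, randomly placing an RGB-D object into a real RGB-D frame with depth-aware occlusion, is given below. The helper function and its parameters are hypothetical and do not reproduce the ANYU tooling or its randomized rendering setup.

```python
# Depth-aware insertion of a virtual RGB-D object into a real RGB-D frame
# (a loose illustration of randomized virtual augmentation, not the ANYU tooling).
import numpy as np

def insert_object(scene_rgb, scene_depth, obj_rgb, obj_depth, obj_mask, rng):
    """All arrays share the scene's H x W resolution; obj_mask is boolean."""
    out_rgb, out_depth = scene_rgb.copy(), scene_depth.copy()
    offset = rng.uniform(0.5, 4.0)                      # random distance from the camera (m)
    placed_depth = obj_depth + offset
    visible = obj_mask & (placed_depth < scene_depth)   # keep object pixels that land in front
    out_rgb[visible] = obj_rgb[visible]
    out_depth[visible] = placed_depth[visible]
    return out_rgb, out_depth

H, W = 480, 640
rng = np.random.default_rng(0)
rgb, depth = insert_object(rng.random((H, W, 3)), rng.uniform(1, 5, (H, W)),
                           rng.random((H, W, 3)), rng.uniform(0, 0.3, (H, W)),
                           rng.random((H, W)) > 0.98, rng)
```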

2024-04-14

Evaluating the efficacy of haptic feedback, 360° treadmill-integrated Virtual Reality framework and longitudinal training on decision-making performance in a complex search-and-shoot simulation

Authors: Akash K Rao, Arnav Bhavsar, Shubhajit Roy Chowdhury, Sushil Chandra, Ramsingh Negi, Prakash Duraisamy, Varun Dutt

Link: http://arxiv.org/abs/2404.09147v1

Abstract: Virtual Reality (VR) has made significant strides, offering users a multitude of ways to interact with virtual environments. Each sensory modality in VR provides distinct inputs and interactions, enhancing the user's immersion and presence. However, the potential of additional sensory modalities, such as haptic feedback and 360° locomotion, to improve decision-making performance has not been thoroughly investigated. This study addresses this gap by evaluating the impact of a haptic feedback, 360° locomotion-integrated VR framework and longitudinal, heterogeneous training on decision-making performance in a complex search-and-shoot simulation. The study involved 32 participants from a defence simulation base in India, who were randomly divided into two groups: experimental (haptic feedback, 360° locomotion-integrated VR framework with longitudinal, heterogeneous training) and placebo control (longitudinal, heterogeneous VR training without extrasensory modalities). The experiment lasted 10 days. On Day 1, all subjects executed a search-and-shoot simulation closely replicating the elements/situations in the real world. From Day 2 to Day 9, the subjects underwent heterogeneous training, imparted by the design of various complexity levels in the simulation using changes in behavioral attributes/artificial intelligence of the enemies. On Day 10, they repeated the search-and-shoot simulation executed on Day 1. The results showed that the experimental group experienced a gradual increase in presence, immersion, and engagement compared to the placebo control group. However, there was no significant difference in decision-making performance between the two groups on day 10. We intend to use these findings to design multisensory VR training frameworks that enhance engagement levels and decision-making performance.

Exploring Generative AI for Sim2Real in Driving Data Synthesis

Authors: Haonan Zhao, Yiting Wang, Thomas Bashford-Rogers, Valentina Donzella, Kurt Debattista

Link: http://arxiv.org/abs/2404.09111v1

Abstract: Datasets are essential for training and testing vehicle perception algorithms. However, the collection and annotation of real-world images is time-consuming and expensive. Driving simulators offer a solution by automatically generating various driving scenarios with corresponding annotations, but the simulation-to-reality (Sim2Real) domain gap remains a challenge. While most Generative Artificial Intelligence (AI) approaches follow the de facto Generative Adversarial Nets (GANs)-based methods, the recent emerging diffusion probabilistic models have not been fully explored in mitigating Sim2Real challenges for driving data synthesis. To explore the performance, this paper applied three different generative AI methods to leverage semantic label maps from a driving simulator as a bridge for the creation of realistic datasets. A comparative analysis of these methods is presented from the perspective of image quality and perception. New synthetic datasets, which include driving images and auto-generated high-quality annotations, are produced with low costs and high scene variability. The experimental results show that although GAN-based methods are adept at generating high-quality images when provided with manually annotated labels, ControlNet produces synthetic datasets with fewer artefacts and more structural fidelity when using simulator-generated labels. This suggests that the diffusion-based approach may provide improved stability and an alternative method for addressing Sim2Real challenges.

2024-04-12

Single-image driven 3d viewpoint training data augmentation for effective wine label recognition

Authors: Yueh-Cheng Huang, Hsin-Yi Chen, Cheng-Jui Hung, Jen-Hui Chuang, Jenq-Neng Hwang

Link: http://arxiv.org/abs/2404.08820v1

Abstract: Confronting the critical challenge of insufficient training data in the field of complex image recognition, this paper introduces a novel 3D viewpoint augmentation technique specifically tailored for wine label recognition. This method enhances deep learning model performance by generating visually realistic training samples from a single real-world wine label image, overcoming the challenges posed by the intricate combinations of text and logos. Classical Generative Adversarial Network (GAN) methods fall short in synthesizing such intricate content combinations. Our proposed solution leverages time-tested computer vision and image processing strategies to expand our training dataset, thereby broadening the range of training samples for deep learning applications. This innovative approach to data augmentation circumvents the constraints of limited training resources. Using the augmented training images through batch-all triplet metric learning on a Vision Transformer (ViT) architecture, we can get the most discriminative embedding features for every wine label, enabling us to perform one-shot recognition of existing wine labels in the training classes or future newly collected wine labels unavailable in the training. Experimental results show a significant increase in recognition accuracy over conventional 2D data augmentation techniques.
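One simple way to synthesize new viewpoints of a roughly planar label from a single image is a random perspective warp, sketched below with OpenCV. The authors' geometry is more elaborate (true 3D viewpoints of labels on bottles), so treat this as a loose planar illustration with assumed parameter values.

```python
# Random perspective warp of a flat label image (loose planar approximation of
# 3D viewpoint augmentation; names and parameters are illustrative).
import cv2
import numpy as np

def random_viewpoint(label_img, max_tilt=0.2, rng=np.random.default_rng()):
    """Warp the label as if photographed from a random viewpoint."""
    h, w = label_img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-max_tilt, max_tilt, (4, 2)) * [w, h]   # perturb the corners
    dst = (src + jitter).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)                    # 3x3 homography
    return cv2.warpPerspective(label_img, H, (w, h), borderValue=(255, 255, 255))

label = (np.random.rand(300, 200, 3) * 255).astype(np.uint8)     # stand-in label image
augmented = [random_viewpoint(label) for _ in range(10)]
```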

3D Human Scan With A Moving Event Camera

Authors: Kai Kohyama, Shintaro Shiba, Yoshimitsu Aoki

Link: http://arxiv.org/abs/2404.08504v1

Abstract: Capturing the 3D human body is one of the important tasks in computer vision with a wide range of applications such as virtual reality and sports analysis. However, conventional frame cameras are limited by their temporal resolution and dynamic range, which imposes constraints in real-world application setups. Event cameras have the advantages of high temporal resolution and high dynamic range (HDR), but the development of event-based methods is necessary to handle data with different characteristics. This paper proposes a novel event-based method for 3D pose estimation and human mesh recovery. Prior work on event-based human mesh recovery requires frames (images) as well as event data. The proposed method solely relies on events; it carves 3D voxels by moving the event camera around a stationary body, reconstructs the human pose and mesh by attenuated rays, and fits statistical body models, preserving high-frequency details. The experimental results show that the proposed method outperforms conventional frame-based methods in the estimation accuracy of both pose and body mesh. We also demonstrate results in challenging situations where a conventional camera has motion blur. This is the first work to demonstrate event-only human mesh recovery, and we hope that it is the first step toward achieving robust and accurate 3D human body scanning from vision sensors.

OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering

Authors: Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu

Link: http://arxiv.org/abs/2404.08449v2

Abstract: Rendering dynamic 3D humans from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the person is in an unobstructed scene, whereas in real-life scenarios various objects may occlude body parts. A previous method used NeRF for surface rendering to recover the occluded areas, but it requires more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian, based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space; we perform an occlusion feature query at occluded regions, and the aggregated pixel-aligned feature is extracted to compensate for the missing information. We then use a Gaussian Feature MLP to further process the feature, together with occlusion-aware loss functions, to better perceive the occluded area. Extensive experiments in both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method, while improving training and inference speeds by 250x and 800x, respectively. Our code will be available for research purposes.

FaceFilterSense: A Filter-Resistant Face Recognition and Facial Attribute Analysis Framework

Authors: Shubham Tiwari, Yash Sethia, Ritesh Kumar, Ashwani Tanwar, Rudresh Dwivedi

Link: http://arxiv.org/abs/2404.08277v2

Abstract: With the advent of social media, fun selfie filters have come into tremendous mainstream use affecting the functioning of facial biometric systems as well as image recognition systems. These filters vary from beautification filters and Augmented Reality (AR)-based filters to filters that modify facial landmarks. Hence, there is a need to assess the impact of such filters on the performance of existing face recognition systems. The limitation associated with existing solutions is that these solutions focus more on the beautification filters. However, the current AR-based filters and filters which distort facial key points are in vogue recently and make the faces highly unrecognizable even to the naked eye. Also, the filters considered are mostly obsolete with limited variations. To mitigate these limitations, we aim to perform a holistic impact analysis of the latest filters and propose a user recognition model with the filtered images. We have utilized a benchmark dataset for baseline images, and applied the latest filters over them to generate a beautified/filtered dataset. Next, we have introduced a model, FaceFilterNet, for beautified user recognition. In this framework, we also utilize our model to comment on various attributes of the person including age, gender, and ethnicity. In addition, we have also presented a filter-wise impact analysis on face recognition, age estimation, gender, and ethnicity prediction. The proposed method affirms the efficacy of our dataset with an accuracy of 87.25% and an optimal accuracy for facial attribute analysis.

GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality

Authors: Jaewook Lee, Jun Wang, Elizabeth Brown, Liam Chu, Sebastian S. Rodriguez, Jon E. Froehlich

Link: http://arxiv.org/abs/2404.08213v1

Abstract: Voice assistants (VAs) like Siri and Alexa are transforming human-computer interaction; however, they lack awareness of users' spatiotemporal context, resulting in limited performance and unnatural dialogue. We introduce GazePointAR, a fully-functional context-aware VA for wearable augmented reality that leverages eye gaze, pointing gestures, and conversation history to disambiguate speech queries. With GazePointAR, users can ask "what's over there?" or "how do I solve this math problem?" simply by looking and/or pointing. We evaluated GazePointAR in a three-part lab study (N=12): (1) comparing GazePointAR to two commercial systems; (2) examining GazePointAR's pronoun disambiguation across three tasks; (3) and an open-ended phase where participants could suggest and try their own context-sensitive queries. Participants appreciated the naturalness and human-like nature of pronoun-driven queries, although sometimes pronoun use was counter-intuitive. We then iterated on GazePointAR and conducted a first-person diary study examining how GazePointAR performs in-the-wild. We conclude by enumerating limitations and design considerations for future context-aware VAs.

The Clarkston AR Gateways Project: Anchoring Refugee Presence and Narratives in a Small Town

Authors: Joshua A. Fisher, Fernando Rochaix

Link: http://arxiv.org/abs/2404.08179v1

Abstract: This paper outlines the Clarkston AR Gateways Project, a speculative process and artifact entering its second phase, where Augmented Reality (AR) will be used to amplify the diverse narratives of Clarkston, Georgia's refugee community. Focused on anchoring their stories and presence into the town's physical and digital landscapes, the project employs a participatory co-design approach, engaging directly with community members. This placemaking effort aims to uplift refugees by teaching them AR development skills that help them more autonomously express and elevate their voices through public art. The result is hoped to be AR experiences that not only challenge prevailing narratives but also celebrate the tapestry of cultures in the small town. This work is supported through AR's unique affordance for users to situate their experiences as interactive narratives within public spaces. Such site-specific AR interactive stories can encourage interactions within those spaces that shift how they are conceived, perceived, and experienced. This process of refugee-driven AR creation reflexively alters the space and affirms their presence and agency. The project's second phase aims to establish a model adaptable to diverse, refugee-inclusive communities, demonstrating how AR storytelling can be a powerful tool for cultural orientation and celebration.

2024-04-11

RASSAR: Room Accessibility and Safety Scanning in Augmented Reality

Authors: Xia Su, Han Zhang, Kaiming Cheng, Jaewook Lee, Qiaochu Liu, Wyatt Olson, Jon Froehlich

Link: http://arxiv.org/abs/2404.07479v1

Abstract: The safety and accessibility of our homes is critical to quality of life and evolves as we age, become ill, host guests, or experience life events such as having children. Researchers and health professionals have created assessment instruments such as checklists that enable homeowners and trained experts to identify and mitigate safety and access issues. With advances in computer vision, augmented reality (AR), and mobile sensors, new approaches are now possible. We introduce RASSAR, a mobile AR application for semi-automatically identifying, localizing, and visualizing indoor accessibility and safety issues such as an inaccessible table height or unsafe loose rugs using LiDAR and real-time computer vision. We present findings from three studies: a formative study with 18 participants across five stakeholder groups to inform the design of RASSAR, a technical performance evaluation across ten homes demonstrating state-of-the-art performance, and a user study with six stakeholders. We close with a discussion of future AI-based indoor accessibility assessment tools, RASSAR's extensibility, and key application scenarios.

2024-04-10

Mixed Reality Heritage Performance As a Decolonising Tool for Heritage Sites

Authors: Mariza Dima, Damon Daylamani-Zad, Vangelis Lympouridis

Link: http://arxiv.org/abs/2404.07348v1

Abstract: In this paper we introduce two world-first Mixed Reality (MR) experiences that fuse smart AR glasses and live theatre and take place in a heritage site with the purpose of revealing the site's hidden and difficult histories about slavery. We term these unique general audience experiences Mixed Reality Heritage Performances (MRHP). Along with the development of our initial two performances we designed and developed a tool and guidelines that can help heritage organisations with their decolonising process by critically engaging the public with under-represented voices and viewpoints of troubled European and colonial narratives. The evaluations showed the embodied and affective potential of MRHP to attract and educate heritage visitors. Insights from the design process are being formulated into an extensive design toolkit that aims to support experience design, theatre and heritage professionals to collaboratively carry out similar projects.

Evaluating Navigation and Comparison Performance of Computational Notebooks on Desktop and in Virtual Reality

Authors: Sungwon In, Erick Krokos, Kirsten Whitley, Chris North, Yalong Yang

Link: http://arxiv.org/abs/2404.07161v1

Abstract: The computational notebook serves as a versatile tool for data analysis. However, its conventional user interface falls short of keeping pace with the ever-growing data-related tasks, signaling the need for novel approaches. With the rapid development of interaction techniques and computing environments, there is a growing interest in integrating emerging technologies in data-driven workflows. Virtual reality, in particular, has demonstrated its potential in interactive data visualizations. In this work, we aimed to experiment with adapting computational notebooks into VR and verify the potential benefits VR can bring. We focus on the navigation and comparison aspects as they are primitive components in analysts' workflow. To further improve comparison, we have designed and implemented a Branching&Merging functionality. We tested computational notebooks on the desktop and in VR, both with and without the added Branching&Merging capability. We found VR significantly facilitated navigation compared to desktop, and the ability to create branches enhanced comparison.

Exploring Physiological Responses in Virtual Reality-based Interventions for Autism Spectrum Disorder: A Data-Driven Investigation

Authors: Gianpaolo Alvari, Ersilia Vallefuoco, Melanie Cristofolini, Elio Salvadori, Marco Dianti, Alessia Moltani, Davide Dal Castello, Paola Venuti, Cesare Furlanello

Link: http://arxiv.org/abs/2404.07159v1

Abstract: Virtual Reality (VR) has emerged as a promising tool for enhancing social skills and emotional well-being in individuals with Autism Spectrum Disorder (ASD). Through a technical exploration, this study employs a multiplayer serious gaming environment within VR, engaging 34 individuals diagnosed with ASD and using high-precision biosensors for a comprehensive view of the participants' arousal and responses during the VR sessions. Participants were subjected to a series of 3 virtual scenarios designed in collaboration with stakeholders and clinical experts to promote socio-cognitive skills and emotional regulation in a controlled and structured virtual environment. We combined the framework with wearable non-invasive sensors for bio-signal acquisition, focusing on the collection of heart rate variability and respiratory patterns to monitor participants' behaviors. Further, behavioral assessments were conducted using observation and semi-structured interviews, with the data analyzed in conjunction with physiological measures to identify correlations and explore digital-intervention efficacy. Preliminary analysis revealed significant correlations between physiological responses and behavioral outcomes, indicating the potential of physiological feedback to enhance VR-based interventions for ASD. The study demonstrated the feasibility of using real-time data to adapt virtual scenarios, suggesting a promising avenue to support personalized therapy. The integration of quantitative physiological feedback into digital platforms represents a step forward in personalized intervention for ASD. By leveraging real-time data to adjust therapeutic content, this approach promises to enhance the efficacy and engagement of digital-based therapies.
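
As a rough illustration of the analysis style described above, the following sketch derives a standard heart-rate-variability feature (RMSSD) from inter-beat intervals and correlates it with a behavioural score across participants. The data and the behavioural measure are synthetic assumptions for illustration, not the study's pipeline.

```python
# A minimal sketch (not the study's pipeline): derive an HRV feature (RMSSD)
# from inter-beat (RR) intervals and correlate it with a hypothetical
# behavioural score across participants.
import numpy as np
from scipy.stats import pearsonr

def rmssd(rr_intervals_ms: np.ndarray) -> float:
    """Root mean square of successive differences of RR intervals (ms)."""
    return float(np.sqrt(np.mean(np.diff(rr_intervals_ms) ** 2)))

rng = np.random.default_rng(0)
# One synthetic RR series per participant plus a hypothetical behavioural score.
hrv = np.array([rmssd(800 + rng.normal(0, 20 + 2 * i, size=300)) for i in range(34)])
behaviour = hrv * 0.05 + rng.normal(0, 1.0, size=34)

r, p = pearsonr(hrv, behaviour)
print(f"r={r:.2f}, p={p:.3f}")
```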

SARA: Smart AI Reading Assistant for Reading Comprehension

Authors: Enkeleda Thaqi, Mohamed Mantawy, Enkelejda Kasneci

Link: http://arxiv.org/abs/2404.06906v1

Abstract: SARA integrates Eye Tracking and state-of-the-art large language models in a mixed reality framework to enhance the reading experience by providing personalized assistance in real-time. By tracking eye movements, SARA identifies the text segments that attract the user's attention the most and potentially indicate uncertain areas and comprehension issues. The process involves these key steps: text detection and extraction, gaze tracking and alignment, and assessment of detected reading difficulty. The results are customized solutions presented directly within the user's field of view as virtual overlays on identified difficult text areas. This support enables users to overcome challenges like unfamiliar vocabulary and complex sentences by offering additional context, rephrased solutions, and multilingual help. SARA's innovative approach demonstrates it has the potential to transform the reading experience and improve reading proficiency.

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Authors: Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi

Link: http://arxiv.org/abs/2404.06903v1

Abstract: The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{\circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{\circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary "flat" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360$^{\circ}$ perspective, providing an enhanced immersive experience over existing techniques. Project website at: http://dreamscene360.github.io/
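
To make the lifting step concrete, here is a minimal sketch, under assumed conventions and not the authors' code, of turning an equirectangular panorama plus a per-pixel depth estimate into a coloured 3D point cloud of the kind that can seed Gaussian centroids.

```python
# A minimal sketch (assumptions, not the authors' code): lift an equirectangular
# panorama and a monocular depth map into a 3D point cloud, the kind of
# structure used to initialize 3D Gaussian centroids.
import numpy as np

def panorama_to_point_cloud(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """rgb: (H, W, 3) in [0, 1]; depth: (H, W) metric or relative depth.
    Returns an (H*W, 6) array of XYZ + RGB points."""
    h, w = depth.shape
    # Pixel centers -> spherical angles (longitude/latitude) of the ERP image.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi      # [-pi, pi)
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi      # (pi/2, -pi/2)
    lon, lat = np.meshgrid(lon, lat)
    # Unit viewing directions on the sphere, scaled by depth.
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    xyz = dirs * depth[..., None]
    return np.concatenate([xyz.reshape(-1, 3), rgb.reshape(-1, 3)], axis=1)

# Example with synthetic inputs; a real pipeline would use a diffusion-generated
# panorama and a globally aligned monocular depth estimate.
points = panorama_to_point_cloud(np.random.rand(256, 512, 3), np.ones((256, 512)))
print(points.shape)  # (131072, 6)
```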

Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data

Authors: Aakash Kumar, Chen Chen, Ajmal Mian, Neils Lobo, Mubarak Shah

Link: http://arxiv.org/abs/2404.06715v1

Abstract: 3D detection is a critical task that enables machines to identify and locate objects in three-dimensional space. It has a broad range of applications in several fields, including autonomous driving, robotics and augmented reality. Monocular 3D detection is attractive as it requires only a single camera; however, it lacks the accuracy and robustness required for real-world applications. High-resolution LiDAR, on the other hand, can be expensive and lead to interference problems in heavy traffic given its active transmissions. We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection. Our method requires only a small number of 3D points that can be obtained from a low-cost, low-resolution sensor. Specifically, we use only 512 points, which is just 1% of a full LiDAR frame in the KITTI dataset. Our method reconstructs a complete 3D point cloud from this limited 3D information combined with a single image. The reconstructed 3D point cloud and corresponding image can be used by any multi-modal off-the-shelf detector for 3D object detection. By using the proposed network architecture with an off-the-shelf multi-modal 3D detector, the accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods and by 6% to 9% compared to the baseline multi-modal methods on the KITTI and JackRabbot datasets.
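
The sketch below only simulates the paper's input regime: it subsamples a full LiDAR sweep down to 512 points, the sparse set that the proposed network would densify together with the RGB image. The array sizes are illustrative assumptions.

```python
# A minimal sketch (not the paper's code): emulate the low-cost sensing setup by
# keeping only 512 points of a full LiDAR sweep; the proposed network would then
# reconstruct a dense cloud from these points plus the RGB image.
import numpy as np

def subsample_lidar(points: np.ndarray, n_keep: int = 512, seed: int = 0) -> np.ndarray:
    """points: (N, 4) array of x, y, z, intensity from a full LiDAR sweep."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=n_keep, replace=False)
    return points[idx]

full_frame = np.random.randn(120_000, 4)   # a full sweep has on the order of 1e5 returns
sparse = subsample_lidar(full_frame)       # 512 points, roughly 1% or less of a frame
print(sparse.shape)
# The reconstructed dense cloud plus the image would then be fed to any
# off-the-shelf multi-modal 3D detector, as described in the abstract.
```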

2024-04-09

Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

Authors: Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu, Eric Brachmann

Link: http://arxiv.org/abs/2404.06337v1

Abstract: Given two images, we can estimate the relative camera pose between them by establishing image-to-image correspondences. Usually, correspondences are 2D-to-2D and the pose we estimate is defined only up to scale. Some applications, aiming at instant augmented reality anywhere, require scale-metric pose estimates, and hence, they rely on external depth estimators to recover the scale. We present MicKey, a keypoint matching pipeline that is able to predict metric correspondences in 3D camera space. By learning to match 3D coordinates across images, we are able to infer the metric relative pose without depth measurements. Depth measurements are also not required for training, nor are scene reconstructions or image overlap information. MicKey is supervised only by pairs of images and their relative poses. MicKey achieves state-of-the-art performance on the Map-Free Relocalisation benchmark while requiring less supervision than competing approaches.
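
MicKey's contribution is the network that predicts metric 3D coordinates for matched keypoints; the sketch below covers only the downstream geometry, recovering a metric relative pose from 3D-3D correspondences with the classical Kabsch alignment. The paper itself uses a learned, differentiable pose solver, so treat this as background rather than the method.

```python
# A minimal sketch, not MicKey itself: given matched metric 3D points in two
# camera frames, recover the metric relative pose with Kabsch/Umeyama alignment.
import numpy as np

def relative_pose_from_3d_matches(pts_a: np.ndarray, pts_b: np.ndarray):
    """pts_a, pts_b: (N, 3) matched 3D points in each camera's frame.
    Returns R (3x3) and t (3,) with pts_b ~= pts_a @ R.T + t."""
    mu_a, mu_b = pts_a.mean(0), pts_b.mean(0)
    H = (pts_a - mu_a).T @ (pts_b - mu_b)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_b - R @ mu_a
    return R, t

# Synthetic check: recover a known rotation and translation.
rng = np.random.default_rng(0)
pts_a = rng.normal(size=(100, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
pts_b = pts_a @ R_true.T + np.array([0.2, -0.1, 1.5])
R, t = relative_pose_from_3d_matches(pts_a, pts_b)
print(np.allclose(R, R_true, atol=1e-6), t)
```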

Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

Authors: Zhihao Lin, Wei Ma, Tao Lin, Yaowen Zheng, Jingquan Ge, Jun Wang, Jacques Klein, Tegawende Bissyande, Yang Liu, Li Li

Link: http://arxiv.org/abs/2404.06201v1

Abstract: Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showcasing their efficacy in code understanding and beyond. As with traditional SE tools, open-source collaboration is key to realising excellent products. With AI models, however, the essential need is data. The collaboration of these AI-based SE models hinges on maximising the sources of high-quality data. However, data, especially high-quality data, often holds commercial or sensitive value, making it less accessible for open-source AI-based SE projects. This reality presents a significant barrier to the development and enhancement of AI-based SE tools within the software engineering community. Therefore, researchers need to find solutions for enabling open-source AI-based SE models to tap into resources held by different organisations. Addressing this challenge, our position paper investigates one solution to facilitate access to diverse organizational resources for open-source AI models, ensuring privacy and commercial sensitivities are respected. We introduce a governance framework centered on federated learning (FL), designed to foster the joint development and maintenance of open-source AI code models while safeguarding data privacy and security. Additionally, we present guidelines for developers on AI-based SE tool collaboration, covering data requirements, model architecture, updating strategies, and version control. Given the significant influence of data characteristics on FL, our research examines the effect of code data heterogeneity on FL performance.
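
A hedged sketch of the federated-learning building block such a governance framework rests on: each organisation trains locally on its own code data and only parameter updates are aggregated, FedAvg-style. The parameter names and client sizes below are hypothetical.

```python
# A minimal FedAvg-style sketch (not the paper's framework): organisations share
# only model weights, never raw code data; the server averages them weighted by
# each client's data size.
from typing import Dict, List
import numpy as np

def fed_avg(client_weights: List[Dict[str, np.ndarray]],
            client_sizes: List[int]) -> Dict[str, np.ndarray]:
    """Weighted average of per-client parameter dictionaries."""
    total = float(sum(client_sizes))
    agg = {}
    for name in client_weights[0]:
        agg[name] = sum(w[name] * (n / total)
                        for w, n in zip(client_weights, client_sizes))
    return agg

# Three hypothetical organisations with different amounts of private code data.
clients = [{"embed": np.random.randn(4, 4)} for _ in range(3)]
global_update = fed_avg(clients, client_sizes=[1000, 250, 4000])
print(global_update["embed"].shape)
```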

Streamlined Transmission: A Semantic-Aware XR Deployment Framework Enhanced by Generative AI

Authors: Wanting Yang, Zehui Xiong, Tony Q. S. Quek, Xuemin Shen

Link: http://arxiv.org/abs/2404.06182v1

Abstract: In the era of 6G, featuring compelling visions of digital twins and metaverses, Extended Reality (XR) has emerged as a vital conduit connecting the digital and physical realms, garnering widespread interest. Ensuring a fully immersive wireless XR experience stands as a paramount technical necessity, demanding the liberation of XR from the confines of wired connections. In this paper, we first introduce the technologies applied in the wireless XR domain, delve into their benefits and limitations, and highlight the ongoing challenges. We then propose a novel deployment framework for a broad XR pipeline, termed "GeSa-XRF", inspired by the core philosophy of Semantic Communication (SemCom) which shifts the concern from "how" to transmit to "what" to transmit. Particularly, the framework comprises three stages: data collection, data analysis, and data delivery. In each stage, we integrate semantic awareness to achieve streamlined transmission and employ Generative Artificial Intelligence (GAI) to achieve collaborative refinements. For the data collection of multi-modal data with differentiated data volumes and heterogeneous latency requirements, we propose a novel SemCom paradigm based on multi-modal fusion and separation and a GAI-based robust superposition scheme. To perform a comprehensive data analysis, we employ multi-task learning to perform the prediction of field of view and personalized attention and discuss the possible preprocessing approaches assisted by GAI. Lastly, for the data delivery stage, we present a semantic-aware multicast-based delivery strategy aimed at reducing pixel level redundant transmissions and introduce the GAI collaborative refinement approach. The performance gain of the proposed GeSa-XRF is preliminarily demonstrated through a case study.

EVE: Enabling Anyone to Train Robot using Augmented Reality

Authors: Jun Wang, Chun-Cheng Chang, Jiafei Duan, Dieter Fox, Ranjay Krishna

Link: http://arxiv.org/abs/2404.06089v1

Abstract: The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training a robot to automate a task typically requires physical robots and expensive demonstration data from trained human annotators. Consequently, only those with access to physical robots produce demonstrations to train robots. To mitigate this issue, we introduce EVE, an iOS app that enables everyday users to train robots using intuitive augmented reality visualizations without needing a physical robot. With EVE, users can collect demonstrations by specifying waypoints with their hands, visually inspecting the environment for obstacles, modifying existing waypoints, and verifying collected trajectories. In a user study ($N=14$, $D=30$) consisting of three common tabletop tasks, EVE outperformed three state-of-the-art interfaces in success rate and was comparable to kinesthetic teaching (physically moving a real robot) in completion time, usability, motion intent communication, enjoyment, and preference ($mean_{p}=0.30$). We conclude by enumerating limitations and design considerations for future AR-based demonstration collection systems for robotics.

2024-04-08

The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

Authors: Giwon Hong, Aryo Pradipta Gema, Rohit Saxena, Xiaotang Du, Ping Nie, Yu Zhao, Laura Perez-Beltrachini, Max Ryabinin, Xuanli He, Clémentine Fourrier, Pasquale Minervini

Link: http://arxiv.org/abs/2404.05904v2

Abstract: Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations": outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and compare the tendency of each model to produce hallucinations. The leaderboard uses a comprehensive set of benchmarks focusing on different aspects of hallucinations, such as factuality and faithfulness, across various tasks, including question-answering, summarisation, and reading comprehension. Our analysis provides insights into the performance of different models, guiding researchers and practitioners in choosing the most reliable models for their applications.

A Realistic Surgical Simulator for Non-Rigid and Contact-Rich Manipulation in Surgeries with the da Vinci Research Kit

Authors: Yafei Ou, Sadra Zargarzadeh, Paniz Sedighi, Mahdi Tavakoli

Link: http://arxiv.org/abs/2404.05888v1

Abstract: Realistic real-time surgical simulators play an increasingly important role in surgical robotics research, such as surgical robot learning and automation, and surgical skills assessment. Although there are a number of existing surgical simulators for research, they generally lack the ability to simulate the diverse types of objects and contact-rich manipulation tasks typically present in surgeries, such as tissue cutting and blood suction. In this work, we introduce CRESSim, a realistic surgical simulator based on PhysX 5 for the da Vinci Research Kit (dVRK) that enables simulating various contact-rich surgical tasks involving different surgical instruments, soft tissue, and body fluids. The real-world dVRK console and the master tool manipulator (MTM) robots are incorporated into the system to allow for teleoperation through virtual reality (VR). To showcase the advantages and potentials of the simulator, we present three examples of surgical tasks, including tissue grasping and deformation, blood suction, and tissue cutting. These tasks are performed using the simulated surgical instruments, including the large needle driver, suction irrigator, and curved scissor, through VR-based teleoperation.

On the Fly Robotic-Assisted Medical Instrument Planning and Execution Using Mixed Reality

Authors: Letian Ai, Yihao Liu, Mehran Armand, Amir Kheradmand, Alejandro Martin-Gomez

Link: http://arxiv.org/abs/2404.05887v1

Abstract: Robotic-assisted medical systems (RAMS) have gained significant attention for their advantages in alleviating surgeons' fatigue and improving patients' outcomes. These systems comprise a range of human-computer interactions, including medical scene monitoring, anatomical target planning, and robot manipulation. However, despite its versatility and effectiveness, RAMS demands expertise in robotics, leading to a high learning cost for the operator. In this work, we introduce a novel framework using mixed reality technologies to ease the use of RAMS. The proposed framework achieves real-time planning and execution of medical instruments by providing 3D anatomical image overlay, human-robot collision detection, and robot programming interface. These features, integrated with an easy-to-use calibration method for head-mounted display, improve the effectiveness of human-robot interactions. To assess the feasibility of the framework, two medical applications are presented in this work: 1) coil placement during transcranial magnetic stimulation and 2) drill and injector device positioning during femoroplasty. Results from these use cases demonstrate its potential to extend to a wider range of medical scenarios.

Two-Person Interaction Augmentation with Skeleton Priors

Authors: Baiyi Li, Edmond S. L. Ho, Hubert P. H. Shum, He Wang

Link: http://arxiv.org/abs/2404.05490v2

Abstract: Close and continuous interaction with rich contacts is a crucial aspect of human activities (e.g. hugging, dancing) and of interest in many domains like activity recognition, motion prediction, character animation, etc. However, acquiring such skeletal motion is challenging. While direct motion capture is expensive and slow, motion editing/generation is also non-trivial, as complex contact patterns with topological and geometric constraints have to be retained. To this end, we propose a new deep learning method for two-body skeletal interaction motion augmentation, which can generate variations of contact-rich interactions with varying body sizes and proportions while retaining the key geometric/topological relations between two bodies. Our system can learn effectively from a relatively small amount of data and generalize to drastically different skeleton sizes. Through exhaustive evaluation and comparison, we show it can generate high-quality motions, has strong generalizability and outperforms traditional optimization-based methods and alternative deep learning solutions.

WebXR, A-Frame and Networked-Aframe as a Basis for an Open Metaverse: A Conceptual Architecture

Authors: Giuseppe Macario

Link: http://arxiv.org/abs/2404.05317v2

Abstract: This work proposes a WebXR-based cross-platform conceptual architecture, leveraging the A-Frame and Networked-Aframe frameworks, in order to facilitate the development of an open, accessible, and interoperable metaverse. By introducing the concept of spatial web app, this research contributes to the discourse on the metaverse, offering an architecture that democratizes access to virtual environments and extended reality through the web, and aligns with Tim Berners-Lee's original vision of the World Wide Web as an open platform in the digital realm.

Can Edge Computing fulfill the requirements of automated vehicular services using 5G network?

Authors: Wendlasida Ouedraogo, Andrea Araldo, Badii Jouaber, Hind Castel, Remy Grunblatt

Link: http://arxiv.org/abs/2404.05296v1

Abstract: Communication and computation services supporting Connected and Automated Vehicles (CAVs) are characterized by stringent requirements, in terms of response time and reliability. Fulfilling these requirements is crucial for ensuring road safety and traffic optimization. The conceptually simple solution of hosting these services in the vehicles increases their cost (mainly due to the installation and maintenance of computation infrastructure) and may drain their battery excessively. Such disadvantages can be tackled via Multi-Access Edge Computing (MEC), consisting in deploying computation capability in network nodes deployed close to the devices (vehicles in this case), so as to satisfy the stringent CAV requirements. However, it is not yet clear under which conditions MEC can support CAV requirements and for which services. To shed light on this question, we conduct a simulation campaign using well-known open-source simulation tools, namely OMNeT++, Simu5G, Veins, INET, and SUMO. We are thus able to provide a reality check on MEC for CAVs, pinpointing the computation capacity that must be installed in MEC nodes to support the different services, and the number of vehicles that a single MEC node can support. We find that these requirements vary considerably depending on the service considered. This study can serve as a preliminary basis for network operators to plan future deployment of MEC to support CAVs.

2024-04-07

Reduction of Forgetting by Contextual Variation During Encoding Using 360-Degree Video-Based Immersive Virtual Environments

Authors: Takato Mizuho, Takuji Narumi, Hideaki Kuzuoka

Link: http://arxiv.org/abs/2404.05007v1

Abstract: Recall impairment in a different environmental context from learning is called context-dependent forgetting. Two learning methods have been proposed to prevent context-dependent forgetting: reinstatement and decontextualization. Reinstatement matches the environmental context between learning and retrieval, whereas decontextualization involves repeated learning in various environmental contexts and eliminates the context dependency of memory. Conventionally, these methods have been validated by switching between physical rooms. However, in this study, we use immersive virtual environments (IVEs) as the environmental context assisted by virtual reality (VR), which is known for its low cost and high reproducibility compared to traditional manipulation. Whereas most existing studies using VR have failed to reveal the reinstatement effect, we test its occurrence using a 360-degree video-based IVE with improved familiarity and realism instead of a computer graphics-based IVE. Furthermore, we are the first to address decontextualization using VR. Our experiment showed that repeated learning in the same constant IVE as retrieval did not significantly reduce forgetting compared to repeated learning in different constant IVEs. Conversely, repeated learning in various IVEs reduced forgetting significantly more than repeated learning in constant IVEs. These findings contribute to the design of IVEs for VR-based applications, particularly in educational settings.

2024-04-06

TeleAware Robot: Designing Awareness-augmented Telepresence Robot for Remote Collaborative Locomotion

Authors: Ruyi Li, Yaxin Zhu, Min Liu, Yihang Zeng, Shanning Zhuang, Jiayi Fu, Yi Lu, Guyue Zhou, Can Liu, Jiangtao Gong

Link: http://arxiv.org/abs/2404.04579v1

Abstract: Telepresence robots can be used to support users to navigate an environment remotely and share the visiting experience with their social partners. Although such systems allow users to see and hear the remote environment and communicate with their partners via live video feed, this does not provide enough awareness of the environment and their remote partner's activities. In this paper, we introduce an awareness framework for collaborative locomotion in scenarios of onsite and remote users visiting a place together. From an observational study of small groups of people visiting exhibitions, we derived four design goals for enhancing the environmental and social awareness between social partners, and developed a set of awareness-enhancing techniques added to a standard telepresence robot, which we name the TeleAware robot. Through a controlled experiment simulating a guided exhibition visiting task, the TeleAware robot showed the ability to lower workload, facilitate closer social proximity, and improve mutual awareness and social presence compared with the standard one. We discuss the impact of mobility and the roles of local and remote users, and provide insights for the future design of awareness-enhancing telepresence robot systems that facilitate collaborative locomotion.

2024-04-05

Effects of Multisensory Feedback on the Perception and Performance of Virtual Reality Hand-Retargeted Interaction

Authors: Hyunyoung Jang, Jinwook Kim, Jeongmi Lee

Link: http://arxiv.org/abs/2404.03899v1

Abstract: Retargeting methods that modify the visual representation of real movements have been widely used to expand the interaction space and create engaging virtual reality experiences. For optimal user experience and performance, it is essential to specify the perception of retargeting and utilize the appropriate range of modification parameters. However, previous studies mostly concentrated on whether users perceived the target sense or not and rarely examined the perceptual accuracy and sensitivity to retargeting. Moreover, it is unknown how the perception and performance in hand-retargeted interactions are influenced by multisensory feedback. In this study, we used rigorous psychophysical methods to specify users' perceptual accuracy and sensitivity to hand-retargeting and provide acceptable ranges of retargeting parameters. We also presented different multisensory feedback simultaneously with the retargeting to probe its effect on users' perception and task performance. The experimental results showed that providing continuous multisensory feedback, proportionate to the distance between the virtual hand and the targeted destination, heightened the accuracy of users' perception of hand retargeting without altering their perceptual sensitivity. Furthermore, the utilization of multisensory feedback considerably improved the precision of task performance, particularly at lower gain factors. Based on these findings, we propose design guidelines and potential applications of VR hand-retargeted interactions and multisensory feedback for optimal user experience and performance.

2024-04-04

I Did Not Notice: A Comparison of Immersive Analytics with Augmented and Virtual Reality

Authors: Xiaoyan Zhou, Anil Ufuk Batmaz, Adam S. Williams, Dylan Schreiber, Francisco Ortega

Link: http://arxiv.org/abs/2404.03814v1

Abstract: Immersive environments enable users to engage in embodied interaction, enhancing the sensemaking processes involved in completing tasks such as immersive analytics. Previous comparative studies on immersive analytics using augmented and virtual realities have revealed that users employ different strategies for data interpretation and text-based analytics depending on the environment. Our study seeks to investigate how augmented and virtual reality influences sensemaking processes in quantitative immersive analytics. Our results, derived from a diverse group of participants, indicate that users demonstrate comparable performance in both environments. However, it was observed that users exhibit a higher tolerance for cognitive load in VR and travel further in AR. Based on our findings, we recommend providing users with the option to switch between AR and VR, thereby enabling them to select an environment that aligns with their preferences and task requirements.

Integrating Large Language Models with Multimodal Virtual Reality Interfaces to Support Collaborative Human-Robot Construction Work

Authors: Somin Park, Carol C. Menassa, Vineet R. Kamat

Link: http://arxiv.org/abs/2404.03498v1

Abstract: In the construction industry, where work environments are complex, unstructured and often dangerous, the implementation of Human-Robot Collaboration (HRC) is emerging as a promising advancement. This underlines the critical need for intuitive communication interfaces that enable construction workers to collaborate seamlessly with robotic assistants. This study introduces a conversational Virtual Reality (VR) interface integrating multimodal interaction to enhance intuitive communication between construction workers and robots. By integrating voice and controller inputs with the Robot Operating System (ROS), Building Information Modeling (BIM), and a game engine featuring a chat interface powered by a Large Language Model (LLM), the proposed system enables intuitive and precise interaction within a VR setting. Evaluated by twelve construction workers through a drywall installation case study, the proposed system demonstrated its low workload and high usability with succinct command inputs. The proposed multimodal interaction system suggests that such technological integration can substantially advance the integration of robotic assistants in the construction industry.

Influence of Gameplay Duration, Hand Tracking, and Controller Based Control Methods on UX in VR

Authors: Tanja Kojić, Maurizio Vergari, Simon Knuth, Maximilian Warsinke, Sebastian Möller, Jan-Niklas Voigt-Antons

Link: http://arxiv.org/abs/2404.03337v1

Abstract: Inside-out tracking is growing in popularity in consumer VR, enhancing accessibility. It uses HMD camera data and neural networks for effective hand tracking. However, few user experience studies have compared this method to traditional controllers, and there is no consensus on the optimal control technique. This paper investigates the impact of control methods and gaming duration on VR user experience, hypothesizing that hand tracking might be preferred for short sessions and by users new to VR due to its simplicity. In a lab study with twenty participants evaluating presence, emotional response, UX quality, and flow, findings revealed that control type and session length affect user experience without significant interaction. Controllers were generally superior, attributed to their reliability, and longer sessions increased presence and realism. The study found that individuals with more VR experience were more inclined to recommend hand tracking to others, which contradicted predictions.

Fusion of Mixture of Experts and Generative Artificial Intelligence in Mobile Edge Metaverse

Authors: Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Shiwen Mao, Dong In Kim

Link: http://arxiv.org/abs/2404.03321v1

Abstract: In the digital transformation era, Metaverse offers a fusion of virtual reality (VR), augmented reality (AR), and web technologies to create immersive digital experiences. However, the evolution of the Metaverse is slowed down by the challenges of content creation, scalability, and dynamic user interaction. Our study investigates an integration of Mixture of Experts (MoE) models with Generative Artificial Intelligence (GAI) for mobile edge computing to revolutionize content creation and interaction in the Metaverse. Specifically, we harness an MoE model's ability to efficiently manage complex data and complex tasks by dynamically selecting the most relevant experts running various sub-models to enhance the capabilities of GAI. We then present a novel framework that improves video content generation quality and consistency, and demonstrate its application through case studies. Our findings underscore the efficacy of MoE and GAI integration to redefine virtual experiences by offering a scalable, efficient pathway to harvest the Metaverse's full potential.
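
The following is a minimal sketch of a top-k gated Mixture-of-Experts layer of the kind the abstract alludes to, where a gate routes each input to the most relevant expert sub-models; the dimensions and expert architecture are assumptions, not the paper's design.

```python
# A minimal sketch (not the paper's system): a top-k gated Mixture-of-Experts
# layer that routes each input to its most relevant experts and mixes their
# outputs with renormalised gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, dim)
        scores = self.gate(x)                               # (batch, n_experts)
        topk_vals, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)              # renormalise over the top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKMoE(dim=32)
print(layer(torch.randn(8, 32)).shape)  # torch.Size([8, 32])
```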

Exploring Emotions in Multi-componential Space using Interactive VR Games

Authors: Rukshani Somarathna, Gelareh Mohammadi

Link: http://arxiv.org/abs/2404.03239v1

Abstract: Emotion understanding is a complex process that involves multiple components. The ability to recognise emotions not only leads to new context awareness methods but also enhances the effectiveness of system interaction by perceiving and expressing emotions. Despite the attention given to discrete and dimensional models, neuroscientific evidence suggests that emotions are complex and multi-faceted. One framework that resonated well with such findings is the Component Process Model (CPM), a theory that considers the complexity of emotions with five interconnected components: appraisal, expression, motivation, physiology and feeling. However, the relationship between CPM and discrete emotions has not yet been fully explored. Therefore, to better understand the processes underlying emotions, we operationalised a data-driven approach using interactive Virtual Reality (VR) games and collected multimodal measures (self-reports, physiological and facial signals) from 39 participants. We used Machine Learning (ML) methods to identify the unique contributions of each component to emotion differentiation. Our results showed the role of different components in emotion differentiation, with the model including all components demonstrating the most significant contribution. Moreover, we found that at least five dimensions are needed to represent the variation of emotions in our dataset. These findings also have implications for using VR environments in emotion research and highlight the role of physiological signals in emotion recognition within such environments.

2024-04-03

Self-supervised 6-DoF Robot Grasping by Demonstration via Augmented Reality Teleoperation System

Authors: Xiwen Dengxiong, Xueting Wang, Shi Bai, Yunbo Zhang

Link: http://arxiv.org/abs/2404.03067v1

Abstract: Most existing 6-DoF robot grasping solutions depend on strong supervision on grasp pose to ensure satisfactory performance, which could be laborious and impractical when the robot works in some restricted area. To this end, we propose a self-supervised 6-DoF grasp pose detection framework via an Augmented Reality (AR) teleoperation system that can efficiently learn human demonstrations and provide 6-DoF grasp poses without grasp pose annotations. Specifically, the system collects the human demonstration from the AR environment and contrastively learns the grasping strategy from the demonstration. For the real-world experiment, the proposed system leads to satisfactory grasping abilities and learning to grasp unknown objects within three demonstrations.

AI-augmented Automation for Real Driving Prediction: an Industrial Use Case

Authors: Romina Eramo, Hamzeh Eyal Salman, Matteo Spezialetti, Darko Stern, Pierre Quinton, Antonio Cicchetti

Link: http://arxiv.org/abs/2404.02841v1

Abstract: The rising complexity of automotive systems requires new development strategies and methods to master the upcoming challenges. Traditional methods therefore need to be complemented by an increased level of automation and a faster continuous improvement cycle. In this context, current vehicle performance tests represent a very time-consuming and expensive task due to the need to perform the tests in real driving conditions. As a consequence, agile/iterative processes like DevOps are largely hindered by the necessity of triggering frequent tests. This paper reports on a practical experience of developing an AI-augmented solution based on Machine Learning and Model-based Engineering to support continuous vehicle development and testing. In particular, historical data collected in real driving conditions is leveraged to synthesize a high-fidelity driving simulator and hence enable performance tests in virtual environments. Based on this practical experience, this paper also proposes a conceptual framework to support predictions based on real driving behavior.

2024-04-02

A Change of Scenery: Transformative Insights from Retrospective VR Embodied Perspective-Taking of Conflict With a Close Other

Authors: Seraphina Yong, Leo Cui, Evan Suma Rosenberg, Svetlana Yarosh

Link: http://arxiv.org/abs/2404.02277v1

Abstract: Close relationships are irreplaceable social resources, yet prone to high-risk conflict. Building on findings from the fields of HCI, virtual reality, and behavioral therapy, we evaluate the unexplored potential of retrospective VR-embodied perspective-taking to fundamentally influence conflict resolution in close others. We develop a biographically-accurate Retrospective Embodied Perspective-Taking system (REPT) and conduct a mixed-methods evaluation of its influence on close others' reflection and communication, compared to video-based reflection methods currently used in therapy (treatment as usual, or TAU). Our key findings provide evidence that REPT was able to significantly improve communication skills and positive sentiment of both partners during conflict, over TAU. The qualitative data also indicated that REPT surpassed basic perspective-taking by exclusively stimulating users to embody and reflect on both their own and their partner's experiences at the same level. In light of these findings, we provide implications and an agenda for social embodiment in HCI design: conceptualizing the use of `embodied social cognition,' and envisioning socially-embodied experiences as an interactive context.

Causality-based Transfer of Driving Scenarios to Unseen Intersections

Authors: Christoph Glasmacher, Michael Schuldes, Sleiman El Masri, Lutz Eckstein

Link: http://arxiv.org/abs/2404.02046v1

Abstract: Scenario-based testing of automated driving functions has become a promising method to reduce time and cost compared to real-world testing. In scenario-based testing, automated functions are evaluated in a set of pre-defined scenarios. These scenarios provide information about vehicle behaviors, environmental conditions, or road characteristics using parameters. To create realistic scenarios, parameters and parameter dependencies have to be fitted utilizing real-world data. However, due to the large variety of intersections and movement constellations found in reality, data may not be available for certain scenarios. This paper proposes a methodology to systematically analyze relations between parameters of scenarios. Bayesian networks are utilized to analyze causal dependencies in order to decrease the amount of required data and to transfer causal patterns creating unseen scenarios. Thereby, infrastructural influences on movement patterns are investigated to generate realistic scenarios on unobserved intersections. For evaluation, scenarios and underlying parameters are extracted from the inD dataset. Movement patterns are estimated, transferred, and checked against recorded data from those initially unseen intersections.
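
As a toy illustration of the transfer idea, the sketch below learns how one behaviour parameter depends on one infrastructure feature at observed intersections and then samples that parameter for an unseen intersection. The paper fits full Bayesian networks over many parameters; this collapses the idea to a single conditional distribution, and all values are synthetic.

```python
# A toy sketch of the causal-transfer idea (not the paper's method): estimate a
# conditional distribution of a behaviour parameter given an infrastructure
# feature, then sample it for an unobserved intersection.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
observed = pd.DataFrame({
    "lanes": rng.choice([1, 2], size=500),              # infrastructure feature
    "turn_speed": rng.normal(18.0, 3.0, size=500),      # behaviour parameter (km/h)
})
observed.loc[observed.lanes == 2, "turn_speed"] += 6.0  # wider junctions -> faster turns

# Conditional P(turn_speed | lanes), summarised as per-group mean and std.
cond = observed.groupby("lanes")["turn_speed"].agg(["mean", "std"])

def sample_unseen(lanes: int, n: int = 10) -> np.ndarray:
    mu, sigma = cond.loc[lanes, "mean"], cond.loc[lanes, "std"]
    return rng.normal(mu, sigma, size=n)

print(sample_unseen(lanes=2))  # plausible turn speeds for an unobserved 2-lane junction
```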

3D scene generation from scene graphs and self-attention

Authors: Pietro Bonazzi

Link: http://arxiv.org/abs/2404.01887v2

Abstract: Synthesizing realistic and diverse indoor 3D scene layouts in a controllable fashion opens up applications in simulated navigation and virtual reality. As concise and robust representations of a scene, scene graphs have proven to be well-suited as the semantic control on the generated layout. We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans. We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene, and use these as the building blocks of our model. Our model leverages graph transformers to estimate the size, dimension, and orientation of the objects in a room while satisfying the relationships in the given scene graph. Our experiments show that self-attention layers lead to sparser (7.9x compared to Graphto3D) and more diverse (16%) scenes.
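
A minimal sketch, with hypothetical dimensions rather than the authors' architecture, of the core building block: a self-attention layer over scene-graph node embeddings whose output is regressed to per-object size, position, and orientation.

```python
# A minimal sketch (assumed dimensions, not the authors' model): self-attention
# over scene-graph node embeddings followed by a per-object layout regression.
import torch
import torch.nn as nn

class SceneGraphAttentionHead(nn.Module):
    def __init__(self, dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # 3 size dims + 3 position dims + 1 yaw angle per object.
        self.out = nn.Linear(dim, 7)

    def forward(self, node_emb: torch.Tensor) -> torch.Tensor:
        # node_emb: (batch, n_objects, dim), embeddings of scene-graph nodes.
        attended, _ = self.attn(node_emb, node_emb, node_emb)
        h = self.norm(node_emb + attended)   # residual; captures object relations
        return self.out(h)                   # (batch, n_objects, 7)

layout = SceneGraphAttentionHead()(torch.randn(2, 12, 64))
print(layout.shape)  # torch.Size([2, 12, 7])
```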

Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Authors: Nassim Sehad, Lina Bariah, Wassim Hamidouche, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah

Link: http://arxiv.org/abs/2404.01713v1

Abstract: Over the past two decades, the Internet-of-Things (IoT) has been a transformative concept, and as we approach 2030, a new paradigm known as the Internet of Senses (IoS) is emerging. Unlike conventional Virtual Reality (VR), IoS seeks to provide multi-sensory experiences, acknowledging that in our physical reality, our perception extends far beyond just sight and sound; it encompasses a range of senses. This article explores existing technologies driving immersive multi-sensory media, delving into their capabilities and potential applications. This exploration includes a comparative analysis between conventional immersive media streaming and a proposed use case that leverages semantic communication empowered by generative Artificial Intelligence (AI). The focal point of this analysis is the substantial reduction in bandwidth consumption by 99.93% in the proposed scheme. Through this comparison, we aim to underscore the practical applications of generative AI for immersive media while addressing the challenges and outlining future trajectories.

2024-04-01

Scalable Scene Modeling from Perspective Imaging: Physics-based Appearance and Geometry Inference

Authors: Shuang Song

Link: http://arxiv.org/abs/2404.01248v1

Abstract: 3D scene modeling techniques serve as the bedrock of geospatial engineering and computer science, driving many applications ranging from automated driving, terrain mapping, and navigation to virtual, augmented, mixed, and extended reality (for the gaming and movie industries, among others). This dissertation presents a fraction of contributions that advances 3D scene modeling to its state of the art, in the aspects of both appearance and geometry modeling. In contrast to the prevailing deep learning methods, as a core contribution, this thesis aims to develop algorithms that follow first principles, where sophisticated physics-based models are introduced alongside simpler learning and inference tasks. These algorithms yield processes that can consume a much larger volume of data to reconstruct 3D scenes with high accuracy and at scale without losing methodological generality, which is not possible with contemporary complex-model-based deep learning methods. Specifically, the dissertation introduces three novel methodologies that address the challenges of inferring appearance and geometry through physics-based modeling. Overall, the research encapsulated in this dissertation marks a series of methodological triumphs in the processing of complex datasets. By navigating the confluence of deep learning, computational geometry, and photogrammetry, this work lays down a robust framework for future exploration and practical application in the rapidly evolving field of 3D scene reconstruction. The outcomes of these studies are evidenced through rigorous experiments and comparisons with existing state-of-the-art methods, demonstrating the efficacy and scalability of the proposed approaches.

Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs

Authors: Jialou Wang, Manli Zhu, Yulei Li, Honglei Li, Longzhi Yang, Wai Lok Woo

Link: http://arxiv.org/abs/2404.01151v1

Abstract: Localization plays a crucial role in enhancing the practicality and precision of VQA systems. By enabling fine-grained identification and interaction with specific parts of an object, it significantly improves the system's ability to provide contextually relevant and spatially accurate responses, crucial for applications in dynamic environments like robotics and augmented reality. However, traditional systems face challenges in accurately mapping objects within images to generate nuanced and spatially aware responses. In this work, we introduce "Detect2Interact", which addresses these challenges by introducing an advanced approach for fine-grained object visual key field detection. First, we use the segment anything model (SAM) to generate detailed spatial maps of objects in images. Next, we use Vision Studio to extract semantic object descriptions. Third, we employ GPT-4's common sense knowledge, bridging the gap between an object's semantics and its spatial map. As a result, Detect2Interact achieves consistent qualitative results on object key field detection across extensive test cases and outperforms the existing VQA system with object detection by providing a more reasonable and finer visual representation.

AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images

Authors: Liu Yang, Huiyu Duan, Long Teng, Yucheng Zhu, Xiaohong Liu, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet

Link: http://arxiv.org/abs/2404.01024v1

Abstract: In recent years, the rapid advancement of Artificial Intelligence Generated Content (AIGC) has attracted widespread attention. Among the AIGC, AI generated omnidirectional images hold significant potential for Virtual Reality (VR) and Augmented Reality (AR) applications, hence omnidirectional AIGC techniques have also been widely studied. AI-generated omnidirectional images exhibit unique distortions compared to natural omnidirectional images; however, there are no dedicated Image Quality Assessment (IQA) criteria for assessing them. This study addresses this gap by establishing a large-scale AI generated omnidirectional image IQA database named AIGCOIQA2024 and constructing a comprehensive benchmark. We first generate 300 omnidirectional images based on 5 AIGC models utilizing 25 text prompts. A subjective IQA experiment is conducted subsequently to assess human visual preferences from three perspectives including quality, comfortability, and correspondence. Finally, we conduct a benchmark experiment to evaluate the performance of state-of-the-art IQA models on our database. The database will be released to facilitate future research.
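
Benchmarking IQA models against subjective scores is commonly reported with Spearman (SRCC) and Pearson (PLCC) correlations; the sketch below shows that evaluation step on synthetic data, not on AIGCOIQA2024 itself.

```python
# A minimal sketch of the standard IQA benchmarking step: correlate predicted
# quality scores with subjective mean opinion scores (synthetic data here).
import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(0)
mos = rng.uniform(1.0, 5.0, size=300)                # subjective mean opinion scores
predicted = mos + rng.normal(0.0, 0.5, size=300)     # a hypothetical IQA model's output

srcc, _ = spearmanr(predicted, mos)
plcc, _ = pearsonr(predicted, mos)
print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}")
```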

BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks

Authors: Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang

Link: http://arxiv.org/abs/2404.00924v1

Abstract: Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models is not sufficiently studied, especially in the black-box scenario. In this work, we introduce the first unified black-box adversarial patch attack framework against pixel-wise regression tasks, aiming to identify the vulnerabilities of these models under query-based black-box attacks. We propose a novel square-based adversarial patch optimization framework and employ probabilistic square sampling and score-based gradient estimation techniques to generate the patch effectively and efficiently, overcoming the scalability problem of previous black-box patch attacks. Our attack prototype, named BadPart, is evaluated on both MDE and OFE tasks, utilizing a total of 7 models. BadPart surpasses 3 baseline methods in terms of both attack performance and efficiency. We also apply BadPart on the Google online service for portrait depth estimation, causing 43.5% relative distance error with 50K queries. State-of-the-art (SOTA) countermeasures cannot defend against our attack effectively.
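
The sketch below is a heavily simplified, query-based random-search patch attack in the same spirit (square-region updates guided only by the model's returned score); it is not the BadPart algorithm, and `query_model` is a placeholder rather than a real service API.

```python
# A simplified query-based black-box patch attack sketch (not BadPart itself):
# propose square-shaped perturbations inside the patch and keep those that
# increase the attacker's objective, using only scalar model feedback.
import numpy as np

def query_model(image_with_patch: np.ndarray) -> float:
    """Placeholder for the attacked regression service: returns the attacker's
    objective, e.g. mean depth error. Here a dummy score for the sketch."""
    return float(image_with_patch.mean())

def random_search_patch_attack(image, patch_xy=(20, 20), patch_size=32,
                               square=8, steps=200, eps=8 / 255, seed=0):
    rng = np.random.default_rng(seed)
    x0, y0 = patch_xy
    patch = rng.uniform(0.0, 1.0, size=(patch_size, patch_size, 3))

    def score(p):
        img = image.copy()
        img[y0:y0 + patch_size, x0:x0 + patch_size] = p
        return query_model(img)                     # one query per evaluation

    best = score(patch)
    for _ in range(steps):
        sy, sx = rng.integers(0, patch_size - square, size=2)
        candidate = patch.copy()
        candidate[sy:sy + square, sx:sx + square] += (
            eps * rng.choice([-1.0, 1.0], size=(1, 1, 3)))
        candidate = np.clip(candidate, 0.0, 1.0)
        s = score(candidate)
        if s > best:                                 # keep updates that raise the error
            best, patch = s, candidate
    return patch, best

patch, objective = random_search_patch_attack(np.zeros((128, 128, 3)))
print(objective)
```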

2024-03-30

Enhancing Empathy in Virtual Reality: An Embodied Approach to Mindset Modulation

Authors: Seoyeon Bae, Yoon Kyung Lee, Jungcheol Lee, Jaeheon Kim, Haeseong Jeon, Seung-Hwan Lim, Byung-Cheol Kim, Sowon Hahn

Link: http://arxiv.org/abs/2404.00300v1

Abstract: A growth mindset has shown promising outcomes for increasing empathy ability. However, stimulating a growth mindset in VR-based empathy interventions is under-explored. In the present study, we implemented prosocial VR content, Our Neighbor Hero, focusing on embodying a virtual character to modulate players' mindsets. The virtual body served as a stepping stone, enabling players to identify with the character and cultivate a growth mindset as they followed mission instructions. We considered several implementation factors to assist players in positioning within the VR experience, including positive feedback, content difficulty, background lighting, and multimodal feedback. We conducted an experiment to investigate the intervention's effectiveness in increasing empathy. Our findings revealed that the VR content and mindset training encouraged participants to improve their growth mindsets and empathic motives. This VR content was developed for college students to enhance their empathy and teamwork skills. It has the potential to improve collaboration in organizational and community environments.

2024-03-28

Using Deep Learning to Increase Eye-Tracking Robustness, Accuracy, and Precision in Virtual Reality

Authors: Kevin Barkevich, Reynold Bailey, Gabriel J. Diaz

Link: http://arxiv.org/abs/2403.19768v1

Abstract: Algorithms for the estimation of gaze direction from mobile and video-based eye trackers typically involve tracking a feature of the eye that moves through the eye camera image in a way that covaries with the shifting gaze direction, such as the center or boundaries of the pupil. Tracking these features using traditional computer vision techniques can be difficult due to partial occlusion and environmental reflections. Although recent efforts to use machine learning (ML) for pupil tracking have demonstrated superior results when evaluated using standard measures of segmentation performance, little is known of how these networks may affect the quality of the final gaze estimate. This work provides an objective assessment of the impact of several contemporary ML-based methods for eye feature tracking when the subsequent gaze estimate is produced using either feature-based or model-based methods. Metrics include the accuracy and precision of the gaze estimate, as well as drop-out rate.
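
A minimal sketch of how the reported metrics can be computed, under assumed definitions rather than the authors' exact implementation: accuracy as mean angular error, precision as its spread, and drop-out rate as the fraction of frames with no estimate.

```python
# A minimal sketch (assumed metric definitions, not the authors' code): gaze
# accuracy, precision, and drop-out rate from estimated vs. ground-truth gaze.
import numpy as np

def gaze_metrics(est: np.ndarray, gt: np.ndarray):
    """est, gt: (N, 3) gaze direction vectors; rows of NaN mark dropped frames."""
    valid = ~np.isnan(est).any(axis=1)
    e = est[valid] / np.linalg.norm(est[valid], axis=1, keepdims=True)
    g = gt[valid] / np.linalg.norm(gt[valid], axis=1, keepdims=True)
    ang = np.degrees(np.arccos(np.clip(np.sum(e * g, axis=1), -1.0, 1.0)))
    return {"accuracy_deg": ang.mean(),      # systematic angular offset
            "precision_deg": ang.std(),      # sample-to-sample spread
            "dropout_rate": 1.0 - valid.mean()}

rng = np.random.default_rng(0)
gt = np.tile([0.0, 0.0, 1.0], (1000, 1))
est = gt + rng.normal(0.0, 0.02, size=gt.shape)
est[rng.random(1000) < 0.05] = np.nan        # 5% of frames fail to track
print(gaze_metrics(est, gt))
```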

MRNaB: Mixed Reality-based Robot Navigation Interface using Optical-see-through MR-beacon

Authors: Eduardo Iglesius, Masato Kobayashi, Yuki Uranishi, Haruo Takemura

Link: http://arxiv.org/abs/2403.19310v1

Abstract: Recent advancements in robotics have led to the development of numerous interfaces to enhance the intuitiveness of robot navigation. However, the reliance on traditional 2D displays imposes limitations on the simultaneous visualization of information. Mixed Reality (MR) technology addresses this issue by enhancing the dimensionality of information visualization, allowing users to perceive multiple pieces of information concurrently. This paper proposes MRNaB, a mixed reality-based robot navigation interface using an optical see-through MR-beacon: a novel approach that incorporates an MR-beacon, situated atop the real-world environment, to function as a signal transmitter for robot navigation. This MR-beacon is designed to be persistent, eliminating the need for repeated navigation inputs for the same location. Our system is built around four primary functions: "Add", "Move", "Delete", and "Select". These allow for the addition of an MR-beacon, location movement, its deletion, and the selection of an MR-beacon for navigation purposes, respectively. The effectiveness of the proposed method was then validated through experiments comparing it with the traditional 2D system. As a result, MRNaB was shown to improve users' navigation performance to a given place, both subjectively and objectively. For additional material, please check: https://mertcookimg.github.io/mrnab

FACTOID: FACtual enTailment fOr hallucInation Detection

Authors: Vipula Rawte, S. M Towhidul Islam Tonmoy, Krishnav Rajbangshi, Shravani Nag, Aman Chadha, Amit P. Sheth, Amitava Das

Link: http://arxiv.org/abs/2403.19113v1

Abstract: The widespread adoption of Large Language Models (LLMs) has facilitated numerous benefits. However, hallucination is a significant concern. In response, Retrieval Augmented Generation (RAG) has emerged as a highly promising paradigm to improve LLM outputs by grounding them in factual information. RAG relies on textual entailment (TE) or similar methods to check if the text produced by LLMs is supported or contradicted, compared to retrieved documents. This paper argues that conventional TE methods are inadequate for spotting hallucinations in content generated by LLMs. For instance, consider a prompt about the USA's stance on the Ukraine war. The AI-generated text states "...U.S. President Barack Obama says the U.S. will not put troops in Ukraine...". However, during the war the U.S. president is Joe Biden, which contradicts factual reality. Moreover, current TE systems are unable to accurately annotate the given text and identify the exact portion that is contradicted. To address this, we introduce a new type of TE called Factual Entailment (FE), which aims to detect factual inaccuracies in content generated by LLMs while also highlighting the specific text segment that contradicts reality. We present FACTOID (FACTual enTAILment for hallucInation Detection), a benchmark dataset for FE. We propose a multi-task learning (MTL) framework for FE, incorporating state-of-the-art (SoTA) long text embeddings such as e5-mistral-7b-instruct, along with GPT-3, SpanBERT, and RoFormer. The proposed MTL architecture for FE achieves an avg. 40% improvement in accuracy on the FACTOID benchmark compared to SoTA TE methods. As FE automatically detects hallucinations, we assessed 15 modern LLMs and ranked them using our proposed Auto Hallucination Vulnerability Index (HVI_auto). This index quantifies and offers a comparative scale to evaluate and rank LLMs according to their hallucinations.
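
A hedged sketch of the multi-task idea, with hypothetical dimensions and a simple recurrent encoder standing in for the paper's long-text embedding models: a shared encoder feeds one head that classifies entailment and another that tags which tokens contradict the evidence.

```python
# A hedged multi-task sketch (hypothetical architecture, not the paper's model):
# a shared encoder with a sequence-level entailment head and a token-level
# contradiction-span head.
import torch
import torch.nn as nn

class FactualEntailmentMTL(nn.Module):
    def __init__(self, hidden: int = 256, n_labels: int = 3):
        super().__init__()
        self.encoder = nn.GRU(input_size=128, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        self.entail_head = nn.Linear(2 * hidden, n_labels)  # support / contradict / neutral
        self.span_head = nn.Linear(2 * hidden, 2)            # per-token: contradicted or not

    def forward(self, token_emb: torch.Tensor):
        # token_emb: (batch, seq_len, 128), e.g. precomputed text embeddings.
        states, _ = self.encoder(token_emb)
        entail_logits = self.entail_head(states.mean(dim=1))  # sequence-level label
        span_logits = self.span_head(states)                   # token-level tags
        return entail_logits, span_logits

model = FactualEntailmentMTL()
ent, span = model(torch.randn(4, 64, 128))
print(ent.shape, span.shape)  # torch.Size([4, 3]) torch.Size([4, 64, 2])
```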

2024-03-27

The Correlations of Scene Complexity, Workload, Presence, and Cybersickness in a Task-Based VR Game

Authors: Mohammadamin Sanaei, Stephen B. Gilbert, Nikoo Javadpour, Hila Sabouni, Michael C. Dorneich, Jonathan W. Kelly

Link: http://arxiv.org/abs/2403.19019v1

Abstract: This investigation examined the relationships among scene complexity, workload, presence, and cybersickness in virtual reality (VR) environments. Numerous factors can influence the overall VR experience, and existing research on this matter is not yet conclusive, warranting further investigation. In this between-subjects experimental setup, 44 participants engaged in the Pendulum Chair game, with half exposed to a simple scene with lower optic flow and lower familiarity, and the remaining half to a complex scene characterized by higher optic flow and greater familiarity. The study measured the dependent variables workload, presence, and cybersickness and analyzed their correlations. Equivalence testing was also used to compare the simple and complex environments. Results revealed that despite the visible differences between the environments, within the 10% boundaries of the maximum possible value for workload and presence, and 13.6% of the maximum SSQ value, a statistically significant equivalence was observed between the simple and complex scenes. Additionally, a moderate, negative correlation emerged between workload and SSQ scores. The findings suggest two key points: (1) the nature of the task can mitigate the impact of scene complexity factors such as optic flow and familiarity, and (2) the correlation between workload and cybersickness may vary, showing either a positive or negative relationship.
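
Equivalence between conditions is typically established with two one-sided tests (TOST); the sketch below implements a Welch-based TOST on synthetic workload scores with illustrative bounds, not the study's data or exact procedure.

```python
# A minimal TOST sketch (illustrative bounds and synthetic data, not the
# study's): two one-sided Welch t-tests against the equivalence margins.
import numpy as np
from scipy import stats

def tost_welch(x: np.ndarray, y: np.ndarray, low: float, high: float) -> float:
    """Return the TOST p-value: the larger of the two one-sided p-values."""
    nx, ny = len(x), len(y)
    vx, vy = x.var(ddof=1) / nx, y.var(ddof=1) / ny
    se = np.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))  # Welch-Satterthwaite
    diff = x.mean() - y.mean()
    p_lower = 1.0 - stats.t.cdf((diff - low) / se, df)   # H0: diff <= low
    p_upper = stats.t.cdf((diff - high) / se, df)        # H0: diff >= high
    return max(p_lower, p_upper)

rng = np.random.default_rng(1)
simple = rng.normal(50.0, 10.0, size=22)     # e.g. workload scores, simple scene
complex_ = rng.normal(51.0, 10.0, size=22)   # complex scene
print(tost_welch(simple, complex_, low=-10.0, high=10.0))  # small p suggests equivalence
```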

From Virtual Reality to the Emerging Discipline of Perception Engineering

Authors: Steven M. LaValle, Evan G. Center, Timo Ojala, Matti Pouke, Nicoletta Prencipe, Basak Sakcak, Markku Suomalainen, Kalle G. Timperi, Vadim K. Weinstein

Link: http://arxiv.org/abs/2403.18588v1

Abstract: This paper makes the case that a powerful new discipline, which we term perception engineering, is steadily emerging. It follows from a progression of ideas that involve creating illusions, from historical paintings and film, to video games and virtual reality in modern times. Rather than creating physical artifacts such as bridges, airplanes, or computers, perception engineers create illusory perceptual experiences. The scope is defined over any agent that interacts with the physical world, including both biological organisms (humans, animals) and engineered systems (robots, autonomous systems). The key idea is that an agent, called a producer, alters the environment with the intent to alter the perceptual experience of another agent, called a receiver. Most importantly, the paper introduces a precise mathematical formulation of this process, based on the von Neumann-Morgenstern notion of information, to help scope and define the discipline. It is then applied to the cases of engineered and biological agents with discussion of its implications for existing fields such as virtual reality, robotics, and even social media. Finally, open challenges and opportunities for involvement are identified.

Neighbor-Environment Observer: An Intelligent Agent for Immersive Working Companionship

Authors: Zhe Sun, Qixuan Liang, Meng Wang, Zhenliang Zhang

Link: http://arxiv.org/abs/2403.18331v1

Abstract: Human-computer symbiosis is a crucial direction for the development of artificial intelligence. As intelligent systems become increasingly prevalent in our work and personal lives, it is important to develop strategies to support users across physical and virtual environments. While technological advances in personal digital devices, such as personal computers and virtual reality devices, can provide immersive experiences, they can also disrupt users' awareness of their surroundings and heighten the frustration caused by disturbances. In this paper, we propose a joint observation strategy for artificial agents to support users across virtual and physical environments. We introduce a prototype system, neighbor-environment observer (NEO), that utilizes non-invasive sensors to assist users in dealing with disruptions to their immersive experience. System experiments evaluate NEO from different perspectives and demonstrate the effectiveness of the joint observation strategy. A user study is conducted to evaluate its usability. The results show that NEO could lessen users' workload with learned user preferences. We suggest that the proposed strategy can be applied to various smart home scenarios.
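
As a very loose illustration of the joint observation idea, the sketch below watches simple signals from the physical room and the virtual session and decides whether an event is worth surfacing to an immersed user. The sensor categories, thresholds, and preference weighting are hypothetical and are not NEO's actual design.

```python
# Hypothetical joint-observation rule: notify the immersed user only when a
# physical-world event's urgency outweighs the estimated cost of interruption.
from dataclasses import dataclass

@dataclass
class PhysicalEvent:
    kind: str        # e.g. "doorbell", "person_nearby" (made-up categories)
    urgency: float   # 0.0 (ignorable) .. 1.0 (urgent)

@dataclass
class VirtualContext:
    in_meeting: bool
    focus_level: float  # 0.0 (idle) .. 1.0 (deep focus)

def should_notify(event: PhysicalEvent, ctx: VirtualContext,
                  preference_weight: float = 0.5) -> bool:
    """Surface the event only if urgency beats the cost of breaking immersion."""
    interruption_cost = 0.3 + 0.5 * ctx.focus_level + (0.2 if ctx.in_meeting else 0.0)
    return event.urgency * (1.0 + preference_weight) > interruption_cost

print(should_notify(PhysicalEvent("doorbell", 0.8), VirtualContext(True, 0.9)))       # True
print(should_notify(PhysicalEvent("person_nearby", 0.2), VirtualContext(True, 0.9)))  # False
```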

2024-03-25

Review Ecosystems to access Educational XR Experiences: a Scoping Review

Authors: Shaun Bangay, Adam P. A. Cardilini, Sophie McKenzie, Maria Nicholas, Manjeet Singh

Link: http://arxiv.org/abs/2403.17243v1

Abstract: Educators, developers, and other stakeholders face challenges when creating, adapting, and utilizing virtual and augmented reality (XR) experiences for teaching curriculum topics. User-created reviews of these applications provide important information about their relevance and effectiveness in supporting achievement of educational outcomes. To make these reviews accessible, relevant, and useful, they must be readily available and presented in a format that supports decision-making by educators. This paper identifies best practices for developing a new review ecosystem by analyzing existing approaches to providing reviews of interactive experiences. It focuses on the form and format of these reviews, as well as the mechanisms for sharing information about experiences and identifying which ones are most effective. The paper also examines the incentives that drive review creation and maintenance, ensuring that new experiences receive attention from reviewers and that relevant information is updated when necessary. The strategies and opportunities for developing an educational XR (eduXR) review ecosystem include methods for measuring properties such as quality metrics, engaging a broad range of stakeholders in the review process, and structuring the system as a closed loop managed by feedback and incentive structures to ensure stability and productivity. Computing educators are well-positioned to lead the development of these review ecosystems, which can relate XR experiences to the potential opportunities for teaching and learning that they offer.

2024-03-24

Designing Upper-Body Gesture Interaction with and for People with Spinal Muscular Atrophy in VR

Authors: Jingze Tian, Yingna Wang, Keye Yu, Liyi Xu, Junan Xie, Franklin Mingzhe Li, Yafeng Niu, Mingming Fan

Link: http://arxiv.org/abs/2403.16107v1

Abstract: Recent research has proposed gaze-assisted gestures to enhance interaction within virtual reality (VR), providing opportunities for people with motor impairments to experience VR. Compared to people with other motor impairments, those with Spinal Muscular Atrophy (SMA) exhibit enhanced distal limb mobility, providing them with more design space. However, it remains unknown what gaze-assisted upper-body gestures people with SMA would want and be able to perform. We conducted an elicitation study in which 12 VR-experienced people with SMA designed upper-body gestures for 26 VR commands, and collected 312 user-defined gestures. Participants predominantly favored creating gestures with their hands. The type of task and participants' abilities influenced their choice of body parts for gesture design. Participants tended to enhance their body involvement and preferred gestures that required minimal physical effort and were aesthetically pleasing. Our research will contribute to creating better gesture-based input methods for people with motor impairments to interact with VR.
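
Elicitation studies of this kind often summarize the consensus for each command with an agreement rate (Vatavu and Wobbrock, 2015). The abstract does not state which consensus measure this paper uses, so the snippet below is a generic sketch with made-up gesture labels rather than the study's analysis.

```python
# Agreement rate for one command: AR = sum over identical-gesture groups of
# C(|group|, 2), divided by C(n, 2) over all n proposals for that command.
from collections import Counter
from math import comb

def agreement_rate(proposals):
    n = len(proposals)
    if n < 2:
        return 0.0
    groups = Counter(proposals)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)

# 12 participants proposing gestures for a single hypothetical "select" command.
print(agreement_rate(["pinch"] * 6 + ["tap"] * 4 + ["swipe"] * 2))  # ≈ 0.33
```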

2024-03-23

Perception and Control of Surfing in Virtual Reality using a 6-DoF Motion Platform

Authors: Premankur Banerjee, Jason Cherin, Jayati Upadhyay, Jason Kutch, Heather Culbertson

Link: http://arxiv.org/abs/2403.15924v1

Abstract: The paper presents a system for simulating surfing in Virtual Reality (VR), emphasizing the recreation of aquatic motions and user-initiated propulsive forces using a 6-Degree of Freedom (DoF) motion platform. We present an algorithmic approach to accurately render surfboard kinematics and interactive paddling dynamics, validated through an experimental evaluation with participants (N=17). Results indicate that the system effectively reproduces various acceleration levels, the perception of which is independent of users' body posture. We additionally found that the presence of ocean ripples amplifies the perception of acceleration. This system aims to enhance the realism and interactivity of VR surfing, laying a foundation for future advancements in surf therapy and interactive aquatic VR experiences.
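
To make the rendering idea more tangible, the sketch below maps a simulated wave field to 6-DoF platform targets: heave follows the local wave height while pitch and roll follow the surface slope under the board. The wave model, finite-difference step, and platform limits are invented for illustration; the paper's actual algorithm (including paddling dynamics) is not reproduced here.

```python
# Illustrative mapping from a toy wave field to motion-platform heave/pitch/roll.
import math

def wave_height(x, y, t, amp=0.25, k=0.8, omega=1.2):
    """Primary wave travelling along +x plus a small crosswise ripple (metres)."""
    return amp * math.sin(k * x - omega * t) + 0.05 * math.sin(1.5 * k * y + 0.7 * omega * t)

def clamp(value, limit):
    return max(-limit, min(limit, value))

def platform_pose(x, y, t, dx=0.05):
    heave = wave_height(x, y, t)
    # Finite-difference slope of the water surface under the board.
    pitch = math.atan2(wave_height(x + dx, y, t) - wave_height(x - dx, y, t), 2 * dx)
    roll = math.atan2(wave_height(x, y + dx, t) - wave_height(x, y - dx, t), 2 * dx)
    # Clamp to hypothetical platform limits (±0.2 m heave, ±15 degrees tilt).
    return (clamp(heave, 0.2),
            clamp(math.degrees(pitch), 15.0),
            clamp(math.degrees(roll), 15.0))

for t in (0.0, 0.5, 1.0):
    print(platform_pose(1.0, 0.0, t))
```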