HCI

2024-04-25

Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class

Authors: Mazda Moayeri, Michael Rabbat, Mark Ibrahim, Diane Bouchacourt

Link: http://arxiv.org/abs/2404.16717v1

Abstract: Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit skewed performance when objects are dissimilar from their typical depiction. Real-world objects such as pears appear in a variety of forms -- from diced to whole, on a table or in a bowl -- yet standard VLM classifiers map all instances of a class to a *single vector based on the class label*. We argue that to represent this rich diversity within a class, zero-shot classification should move beyond a single vector. We propose a method to encode and account for diversity within a class using inferred attributes, still in the zero-shot setting without retraining. We find our method consistently outperforms standard zero-shot classification over a large suite of datasets encompassing hierarchies, diverse object states, and real-world geographic diversity, as well as finer-grained datasets where intra-class diversity may be less prevalent. Importantly, our method is inherently interpretable, offering faithful explanations for each inference to facilitate model debugging and enhance transparency. We also find our method scales efficiently to a large number of attributes to account for diversity -- leading to more accurate predictions for atypical instances. Finally, we characterize a principled trade-off between overall and worst-class accuracy, which can be tuned via a hyperparameter of our method. We hope this work spurs further research into the promise of zero-shot classification beyond a single class vector for capturing diversity in the world, and building transparent AI systems without compromising performance.
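
The core idea lends itself to a small sketch: score an image embedding against several attribute-conditioned text vectors per class and pool the scores, instead of comparing against one class vector. The snippet below is a minimal illustration under stated assumptions, not the paper's exact method: `encode_text` is a random stand-in for a real VLM text encoder (e.g., CLIP), the attribute prompts are invented, and the softmax-weighted pooling with temperature `alpha` is one plausible aggregation rule.

```python
import zlib
import numpy as np

def encode_text(prompt: str) -> np.ndarray:
    """Stand-in for a VLM text encoder (e.g., CLIP): a deterministic
    pseudo-random unit vector per prompt."""
    rng = np.random.default_rng(zlib.crc32(prompt.encode()))
    v = rng.normal(size=512)
    return v / np.linalg.norm(v)

# Several attribute vectors per class instead of one vector per class label.
class_attributes = {
    "pear": ["a whole pear", "diced pear", "a pear in a bowl"],
    "apple": ["a whole apple", "sliced apple", "an apple on a tree"],
}
class_vectors = {c: np.stack([encode_text(p) for p in ps])
                 for c, ps in class_attributes.items()}

def classify(image_emb: np.ndarray, alpha: float = 10.0) -> str:
    """Score the image against every attribute vector of every class,
    then pool per class with a soft maximum (alpha tunes mean vs. max)."""
    scores = {}
    for c, vecs in class_vectors.items():
        sims = vecs @ image_emb                  # cosine sims (unit vectors)
        w = np.exp(alpha * sims)
        scores[c] = float((w / w.sum()) @ sims)  # softmax-weighted pooling
    return max(scores, key=scores.get)

# Fake image embedding: close to the "diced pear" attribute vector.
rng = np.random.default_rng(0)
img = class_vectors["pear"][1] + 0.1 * rng.normal(size=512)
print(classify(img / np.linalg.norm(img)))       # -> pear
```

Because each attribute keeps its own vector, the best-matching attribute can double as a human-readable explanation for the prediction, which is where the interpretability claim comes from.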

Benchmarking Mobile Device Control Agents across Diverse Configurations

Authors: Juyong Lee, Taywon Min, Minyong An, Changyeon Kim, Kimin Lee

Link: http://arxiv.org/abs/2404.16660v1

Abstract: Developing autonomous agents for mobile devices can significantly enhance user interactions by offering increased efficiency and accessibility. However, despite the growing interest in mobile device control agents, the absence of a commonly adopted benchmark makes it challenging to quantify scientific progress in this area. In this work, we introduce B-MoCA: a novel benchmark designed specifically for evaluating mobile device control agents. To create a realistic benchmark, we develop B-MoCA based on the Android operating system and define 60 common daily tasks. Importantly, we incorporate a randomization feature that changes various aspects of mobile devices, including user interface layouts and language settings, to assess generalization performance. We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs as well as agents trained from scratch using human expert demonstrations. While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to enhance their effectiveness. Our source code is publicly available at https://b-moca.github.io.

Comparing Continuous and Retrospective Emotion Ratings in Remote VR Study

Authors: Maximilian Warsinke, Tanja Kojić, Maurizio Vergari, Robert Spang, Jan-Niklas Voigt-Antons, Sebastian Möller

Link: http://arxiv.org/abs/2404.16487v1

Abstract: This study investigates the feasibility of remote virtual reality (VR) studies conducted at home using VR headsets and video conferencing by deploying an experiment on emotion ratings. 20 participants used head-mounted displays to immerse themselves in 360° videos selected to evoke emotional responses. The research compares continuous ratings using a graphical interface to retrospective questionnaires on a digitized Likert scale for measuring arousal and valence, both based on the self-assessment manikin (SAM). It was hypothesized that the two different rating methods would lead to significantly different values for both valence and arousal. The goal was to investigate whether continuous ratings during the experience would better reflect users' emotions compared to the post-questionnaire by mitigating biases such as the peak-end rule. The results show significant differences with moderate to strong effect sizes for valence and no significant differences for arousal with low to moderate effect sizes. This indicates the need for further investigation of the methods used to assess emotion ratings in VR studies. Overall, this study is an example of a remotely conducted VR experiment, offering insights into methods for emotion elicitation in VR by varying the timing and interface of the rating.

CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations

Authors: Chen Wei, Jiachen Zou, Dietmar Heinke, Quanying Liu

Link: http://arxiv.org/abs/2404.16482v1

Abstract: A central question for cognitive science is to understand how humans process visual objects, i.e., to uncover the human low-dimensional concept representation space underlying high-dimensional visual stimuli. Generating visual stimuli with controllable concepts is key. However, there are currently no generative models in AI that solve this problem. Here, we present the Concept based Controllable Generation (CoCoG) framework. CoCoG consists of two components: a simple yet efficient AI agent for extracting interpretable concepts and predicting human decision-making in visual similarity judgment tasks, and a conditional generation model for generating visual stimuli given the concepts. We quantify the performance of CoCoG from two aspects: human behavior prediction accuracy and controllable generation ability. The experiments with CoCoG indicate that 1) the reliable concept embeddings in CoCoG allow predicting human behavior with 64.07% accuracy on the THINGS-similarity dataset; 2) CoCoG can generate diverse objects through the control of concepts; 3) CoCoG can manipulate human similarity judgment behavior by intervening on key concepts. CoCoG offers visual objects with controllable concepts to advance our understanding of causality in human cognition. The code of CoCoG is available at https://github.com/ncclab-sustech/CoCoG.
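
The behavior-prediction half of such a framework is easy to illustrate. The sketch below assumes a THINGS-style triplet ("odd-one-out") similarity task and uses random stand-in concept embeddings; a real system learns these embeddings from human judgments, which this toy does not do.

```python
import numpy as np

rng = np.random.default_rng(1)

# Low-dimensional, non-negative concept embeddings, one per object;
# random stand-ins here, constructed so pear and apple are similar.
E = {
    "pear":   np.abs(rng.normal(size=16)),
    "hammer": np.abs(rng.normal(size=16)),
}
E["apple"] = E["pear"] + 0.1 * np.abs(rng.normal(size=16))

def odd_one_out(a: str, b: str, c: str) -> str:
    """Predict the triplet judgment: the pair with the highest dot-product
    similarity is kept together; the remaining object is the odd one out."""
    pairs = {(a, b): E[a] @ E[b], (a, c): E[a] @ E[c], (b, c): E[b] @ E[c]}
    keep = max(pairs, key=pairs.get)
    return next(o for o in (a, b, c) if o not in keep)

print(odd_one_out("pear", "apple", "hammer"))  # hammer (pear/apple most similar)
```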

The Impact of Social Environment and Interaction Focus on User Experience and Social Acceptability of an Augmented Reality Game

Authors: Lorenzo Cocchia, Maurizio Vergari, Tanja Kojic, Francesco Vona, Sebastian Moller, Franca Garzotto, Jan-Niklas Voigt-Antons

Link: http://arxiv.org/abs/2404.16479v1

Abstract: One of the most promising technologies inside the Extended Reality (XR) spectrum is Augmented Reality. This technology is already in people's pockets in the form of Mobile Augmented Reality on their smartphones. The scientific community still needs answers about how humans could and should interact in environments where perceived stimuli differ from fully physical or digital circumstances. Moreover, it remains to be determined whether people accept these new technologies in different social environments and interaction settings, or whether obstacles exist. This paper explores the impact of the Social Environment and the Focus of social interaction on users while playing a location-based augmented reality game, measuring it with user experience and social acceptance indicators. An empirical study in a within-subject fashion was performed in different social environments and under different settings of social interaction focus, with N = 28 participants completing self-reported questionnaires after playing a Scavenger Hunt in Augmented Reality. The measures from two different Social Environments (Crowded vs. Uncrowded) showed statistically significant mean differences in indicators from the Social Acceptability dimension. Moreover, the analyses show statistically significant differences between the variances from different degrees of Social Interaction Focus with Overall Social Presence, Perceived Psychological Engagement, Perceived Attentional Engagement, and Perceived Emotional Contagion. The results suggest that a location-based AR game played in different social environments and settings can influence the social dimension of the user experience. Therefore, these factors should be carefully considered when designing immersive technological experiences in public spaces involving social interactions between players.

Impact of spatial auditory navigation on user experience during augmented outdoor navigation tasks

Authors: Jan-Niklas Voigt-Antons, Zhirou Sun, Maurizio Vergari, Navid Ashrafi, Francesco Vona, Tanja Kojic

Link: http://arxiv.org/abs/2404.16473v1

Abstract: The human auditory sense is important for navigation, especially when an object of interest is visually partly or fully occluded. However, interactions with users of technology focus mainly on the visual domain of navigation tasks. This paper presents the results of a literature review and user study exploring the impact of spatial auditory navigation on user experience during an augmented outdoor navigation task. For the user test, participants used an augmented reality app that guided them to different locations with different digital augmentations. We conclude that the auditory sense is still underrepresented in augmented reality applications. In the future, more usage scenarios for audio-augmented reality, such as navigation, will enhance user experience and interaction quality.

2024-04-24

A communication protocol based on NK boolean networks for coordinating collective action

Authors: Yori Ong

Link: http://arxiv.org/abs/2404.16240v1

Abstract: In this paper, I describe a digital social communication protocol (Gridt) based on Kauffman's NK boolean networks. The main assertion is that a communication network with this topology supports infinitely scalable self-organization of collective action without requiring hierarchy or central control. The paper presents the functionality of this protocol and substantiates the following propositions about its function and implications: (1) Communication via NK boolean networks facilitates coordination on collective action games for any variable number of users, and justifies the assumption that the game's payoff structure is common knowledge; (2) Use of this protocol increases its users' transfer empowerment, a form of intrinsic motivation that motivates coordinated action independent of the task or outcome; (3) Communication via this network can be considered 'cheap talk' and benefits the strategy of players with aligned interests, but not of players with conflicting interests; (4) Absence of significant barriers for its realization warrants a timely and continuing discussion on the ethics and implications of this technology; (5) Full realization of the technology's potential calls for a free-to-use service with maximal transparency of design and associated economic incentives.
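
As background, Kauffman's NK model itself is compact enough to sketch: N boolean nodes, each wired to K inputs and updated through a random truth table. The snippet below shows plain synchronous NK dynamics; it illustrates the underlying network model only, not the Gridt protocol built on top of it.

```python
import random

random.seed(0)
N, K = 8, 2  # N nodes, each reading K randomly chosen inputs

inputs = [random.sample(range(N), K) for _ in range(N)]
# Each node gets a random boolean function over its K inputs,
# stored as a truth table with 2**K entries.
tables = [[random.randint(0, 1) for _ in range(2 ** K)] for _ in range(N)]

def step(state):
    """Synchronous update: every node reads its K inputs and applies its table."""
    nxt = []
    for i in range(N):
        idx = 0
        for j in inputs[i]:
            idx = (idx << 1) | state[j]
        nxt.append(tables[i][idx])
    return nxt

state = [random.randint(0, 1) for _ in range(N)]
for t in range(10):
    print(t, state)
    state = step(state)
```

With small K such networks typically settle into short attractor cycles, which is the kind of collective regularity the protocol leans on for coordination.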

MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models

Authors: Grace Guo, Lifu Deng, Animesh Tandon, Alex Endert, Bum Chul Kwon

Link: http://arxiv.org/abs/2404.16174v1

Abstract: The recent prevalence of publicly accessible, large medical imaging datasets has led to a proliferation of artificial intelligence (AI) models for cardiovascular image classification and analysis. At the same time, the potentially significant impacts of these models have motivated the development of a range of explainable AI (XAI) methods that aim to explain model predictions given certain image inputs. However, many of these methods are not developed or evaluated with domain experts, and explanations are not contextualized in terms of medical expertise or domain knowledge. In this paper, we propose a novel framework and Python library, MiMICRI, that provides domain-centered counterfactual explanations of cardiovascular image classification models. MiMICRI helps users interactively select and replace segments of medical images that correspond to morphological structures. From the counterfactuals generated, users can then assess the influence of each segment on model predictions, and validate the model against known medical facts. We evaluate this library with two medical experts. Our evaluation demonstrates that a domain-centered XAI approach can enhance the interpretability of model explanations, and help experts reason about models in terms of relevant domain knowledge. However, concerns were also surfaced about the clinical plausibility of the counterfactuals generated. We conclude with a discussion on the generalizability and trustworthiness of the MiMICRI framework, as well as the implications of our findings on the development of domain-centered XAI methods for model interpretability in healthcare contexts.

A Situated-Infrastructuring of WhatsApp for Business in India

Authors: Ankolika De

Link: http://arxiv.org/abs/2404.16124v1

Abstract: WhatsApp has become a pivotal communication tool in India, transcending cultural boundaries and deeply integrating into the nation's digital landscape. Meta's introduction of WhatsApp for Business aligns seamlessly with the platform's popularity, offering businesses a crucial tool. However, the monetization plans pose challenges, particularly for smaller businesses, in balancing revenue goals with accessibility. This study, employing discourse analysis, examines Meta's infrastructuring of WhatsApp in India, emphasizing the dynamic interplay of technological, social, and cultural dimensions. Consequently, it highlights potential power differences caused by the deployment of WhatsApp for Business followed by its gradual but significant modifications, encouraging scholars to investigate the implications and ethics of rapid technological changes, particularly for marginalized users.

The State of the Art in Visual Analytics for 3D Urban Data

Authors: Fabio Miranda, Thomas Ortner, Gustavo Moreira, Maryam Hosseini, Milena Vuckovic, Filip Biljecki, Claudio Silva, Marcos Lage, Nivan Ferreira

Link: http://arxiv.org/abs/2404.15976v1

Abstract: Urbanization has amplified the importance of three-dimensional structures in urban environments for a wide range of phenomena that are of significant interest to diverse stakeholders. With the growing availability of 3D urban data, numerous studies have focused on developing visual analysis techniques tailored to the unique characteristics of urban environments. However, incorporating the third dimension into visual analytics introduces additional challenges in designing effective visual tools to tackle urban data's diverse complexities. In this paper, we present a survey on visual analytics of 3D urban data. Our work characterizes published works along three main dimensions (why, what, and how), considering use cases, analysis tasks, data, visualizations, and interactions. We provide a fine-grained categorization of published works from visualization journals and conferences, as well as from a myriad of urban domains, including urban planning, architecture, and engineering. By incorporating perspectives from both urban and visualization experts, we identify literature gaps, motivate visualization researchers to understand challenges and opportunities, and indicate future research directions.

2024-04-23

Augmented Voices: An Augmented Reality Experience Highlighting the Social Injustices of Gender-Based Violence in the Muslim South-Asian Diaspora

Authors: Hamida Khatri

Link: http://arxiv.org/abs/2404.15239v1

Abstract: This paper delves into the distressing prevalence of gender-based violence (GBV) and its deep-seated psychological ramifications, particularly among Muslim South Asian women living in diasporic communities. Despite the gravity of GBV, these women often face formidable barriers in voicing their experiences and accessing support. "Augmented Voices" emerges as a technological beacon, harnessing the potential of augmented reality (AR) to bridge the digital and physical realms through mobile devices, enhancing the visibility of these often-silenced voices. With its technological motivation firmly anchored in the convergence of AR and real-world interactions, "Augmented Voices" offers a digital platform where storytelling acts as a catalyst, bringing to the fore the experiences shared by these women. By superimposing their narratives onto physical locations via Geographic Information System (GIS) Mapping, the application "augments their voices" in the diaspora, providing a conduit for expression and solidarity. This project, currently at its developmental stage, aspires to elevate the stories of GBV victims to a level where their struggles are not just heard but felt, forging a powerful connection between the user and the narrative. It is designed to transcend the limitations of conventional storytelling, creating an "augmented" reality where voices that are often muted by societal constraints can resonate powerfully. The project underscores the urgent imperative to confront GBV, catalyzing societal transformation and fostering robust support networks for those in the margins. It is a pioneering example of how technology can become a formidable ally in the fight for social justice and the empowerment of the oppressed. Additionally, this paper delves into the AR workflow illustrating its relevance and contribution to the broader theme of site-specific AR for social justice.

Evaluating Physician-AI Interaction for Cancer Management: Paving the Path towards Precision Oncology

Authors: Zeshan Hussain, Barbara D. Lam, Fernando A. Acosta-Perez, Irbaz Bin Riaz, Maia Jacobs, Andrew J. Yee, David Sontag

Link: http://arxiv.org/abs/2404.15187v1

Abstract: We evaluated how clinicians approach clinical decision-making when given findings from both randomized controlled trials (RCTs) and machine learning (ML) models. To do so, we designed a clinical decision support system (CDSS) that displays survival curves and adverse event information from a synthetic RCT and ML model for 12 patients with multiple myeloma. We conducted an interventional study in a simulated setting to evaluate how clinicians synthesized the available data to make treatment decisions. Participants were invited to participate in a follow-up interview to discuss their choices in an open-ended format. When ML model results were concordant with RCT results, physicians had increased confidence in treatment choice compared to when they were given RCT results alone. When ML model results were discordant with RCT results, the majority of physicians followed the ML model recommendation in their treatment selection. Perceived reliability of the ML model was consistently higher after physicians were provided with data on how it was trained and validated. Follow-up interviews revealed four major themes: (1) variability in what variables participants used for decision-making, (2) perceived advantages to an ML model over RCT data, (3) uncertainty around decision-making when the ML model quality was poor, and (4) perception that this type of study is an important thought exercise for clinicians. Overall, ML-based CDSSs have the potential to change treatment decisions in cancer management. However, meticulous development and validation of these systems as well as clinician training are required before deployment.

Voice Passing: a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition

Authors: David Doukhan, Simon Devauchelle, Lucile Girard-Monneron, Mía Chávez Ruz, V. Chaddouk, Isabelle Wagner, Albert Rilliard

Link: http://arxiv.org/abs/2404.15176v1

Abstract: This paper presents software for describing voices using a continuous Voice Femininity Percentage (VFP). This system is intended for transgender speakers during their voice transition and for the voice therapists supporting them in this process. A corpus of 41 French cis- and transgender speakers was recorded. A perceptual evaluation allowed 57 participants to estimate the VFP for each voice. Binary gender classification models were trained on external gender-balanced data and used on overlapping windows to obtain average gender prediction estimates, which were calibrated to predict VFP and obtained higher accuracy than $F_0$- or vocal tract length-based models. Training data speaking style and DNN architecture were shown to impact VFP estimation. Accuracy of the models was affected by speakers' age. This highlights the importance of style, age, and the conception of gender as binary or not, to build adequate statistical representations of cultural concepts.
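
The windowed-averaging step is straightforward to sketch. Below is a minimal illustration under stated assumptions: `classify` stands in for a trained binary gender classifier returning P(female) per window (here a constant stub), the window and hop sizes are invented, and the paper's calibration step is omitted.

```python
import numpy as np

def voice_femininity_percentage(audio, sr, classify, win_s=1.0, hop_s=0.5):
    """Average a binary classifier's P(female) over overlapping windows
    to obtain a continuous VFP in [0, 100]."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    probs = []
    for start in range(0, max(1, len(audio) - win + 1), hop):
        frame = audio[start:start + win]
        probs.append(classify(frame))  # P(female) for this window
    return 100.0 * float(np.mean(probs))

# Demo with fake audio and a stub classifier (a real system uses a DNN).
rng = np.random.default_rng(2)
audio = rng.normal(size=16000 * 5)      # 5 s of fake audio at 16 kHz
stub = lambda frame: 0.7                # pretend the model says P(female)=0.7
print(voice_femininity_percentage(audio, 16000, stub))  # -> 70.0
```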

Lost in Magnitudes: Exploring the Design Space for Visualizing Data with Large Value Ranges

Authors: Katerina Batziakoudi, Florent Cabric, Stéphanie Rey, Jean-Daniel Fekete

Link: http://arxiv.org/abs/2404.15150v1

Abstract: We explore the design space for the static visualization of datasets with quantitative attributes that vary over multiple orders of magnitude, which we call Orders of Magnitude Values (OMVs), and provide design guidelines and recommendations on effective visual encodings for OMVs. Current charts rely on linear or logarithmic scales to visualize values, leading to limitations in performing simple tasks for OMVs. In particular, linear scales prevent the reading of smaller magnitudes and their comparisons, while logarithmic scales are challenging for the general public to understand. Our design space leverages the approach of dividing OMVs into two different parts: mantissa and exponent, in a way similar to scientific notation. This separation allows for a visual encoding of both parts. For our exploration, we use four datasets, each with two attributes: an OMV, divided into mantissa and exponent, and a second attribute that is nominal, ordinal, time, or quantitative. We start from the original design space described by the Grammar of Graphics and systematically generate all possible visualizations for these datasets, employing different marks and visual channels. We refine this design space by enforcing integrity constraints from visualization and graphical perception literature. Through a qualitative assessment of all viable combinations, we discuss the most effective visualizations for OMVs, focusing on channel and task effectiveness. The article's main contributions are 1) the presentation of the design space of OMVs, 2) the generation of a large number of OMV visualizations, among which some are novel and effective, 3) the refined definition of a scale that we call E+M for OMVs, and 4) guidelines and recommendations for designing effective OMV visualizations. These efforts aim to enrich visualization systems to better support data with OMVs and guide future research.
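
The mantissa/exponent split at the heart of the E+M scale is simple base-10 arithmetic. A minimal sketch (the function name and sample values are ours):

```python
import math

def split_omv(x: float):
    """Split a positive value into base-10 mantissa and exponent,
    as in scientific notation: x = m * 10**e with 1 <= m < 10."""
    e = math.floor(math.log10(x))
    m = x / 10 ** e
    return m, e

for x in [0.00042, 3.2, 18_500_000]:
    m, e = split_omv(x)
    print(f"{x:>12} -> mantissa {m:.2f}, exponent {e:+d}")
```

Encoding `e` and `m` with two separate visual channels is what lets a chart show both the order of magnitude and the detail within it.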

Virtual Takeovers in the Metaverse: Interrogating Power in Our Past and Future(s) with Multi-Layered Narratives

Authors: Heather Snyder Quinn, Jessa Dickinson

Link: http://arxiv.org/abs/2404.15108v1

Abstract: Mariah is an augmented reality (AR) mobile application that exposes power structures (e.g., capitalism, patriarchy, white supremacy) through storytelling and celebrates acts of resistance against them. People can use Mariah to "legally trespass" the metaverse as a form of protest. Mariah provides historical context to the user's physical surroundings by superimposing images and playing stories about people who have experienced, and resisted, injustice. We share two implementations of Mariah that raise questions about free speech and property rights in the metaverse: (1) a protest against museums accepting "dirty money" from the opioid epidemic; and (2) a commemoration of sites where people have resisted power structures. Mariah is a case study for how experimenting with a technology in non-sanctioned ways (i.e., "hacking") can expose ways that it might interact with, and potentially amplify, existing power structures.

MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos

Authors: Zheng Ning, Zheng Zhang, Jerrick Ban, Kaiwen Jiang, Ruohong Gan, Yapeng Tian, Toby Jia-Jun Li

Link: http://arxiv.org/abs/2404.15107v1

Abstract: Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio is often expensive and requires specialized equipment and skills, posing a high barrier for amateur video creators. We present MIMOSA, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only monaural or stereo audio, MIMOSA automatically grounds each sound source to the corresponding sounding object in the visual scene and enables users to further validate and fix errors in the locations of sounding objects. Users can also augment the spatial audio effect by flexibly manipulating the sounding source positions and creatively customizing the audio effect. The design of MIMOSA exemplifies a human-AI collaboration approach that, instead of utilizing state-of-the-art end-to-end "black-box" ML models, uses a multistep pipeline that aligns its interpretable intermediate results with the user's workflow. A lab user study with 15 participants demonstrates MIMOSA's usability, usefulness, expressiveness, and capability in creating immersive spatial audio effects in collaboration with users.

Between Flat-Earthers and Fitness Coaches: Who is Citing Scientific Publications in YouTube Video Descriptions?

Authors: Olga Zagovora, Katrin Weller

Link: http://arxiv.org/abs/2404.15083v1

Abstract: In this study, we undertake an extensive analysis of YouTube channels that reference research publications in their video descriptions, offering a unique insight into the intersection of digital media and academia. Our investigation focuses on three principal aspects: the background of YouTube channel owners, their thematic focus, and the nature of their operational dynamics, specifically addressing whether they work individually or in groups. Our results highlight a strong emphasis on content related to science and engineering, as well as health, particularly in channels managed by individual researchers and academic institutions. However, there is a notable variation in the popularity of these channels, with professional YouTubers and commercial media entities often outperforming in terms of viewer engagement metrics like likes, comments, and views. This underscores the challenge academic channels face in attracting a wider audience. Further, we explore the role of academic actors on YouTube, scrutinizing their impact in disseminating research and the types of publications they reference. Despite a general inclination towards professional academic topics, these channels displayed a varied effectiveness in spotlighting highly cited research. Often, they referenced a wide array of publications, indicating a diverse but not necessarily impact-focused approach to content selection.

Vision Beyond Boundaries: An Initial Design Space of Domain-specific Large Vision Models in Human-robot Interaction

Authors: Yuchong Zhang, Yong Ma, Danica Kragic

Link: http://arxiv.org/abs/2404.14965v1

Abstract: The emergence of Large Vision Models (LVMs) follows the recent prosperity of Large Language Models (LLMs). However, there is a noticeable gap in structured research applying LVMs to Human-Robot Interaction (HRI), despite extensive evidence supporting the efficacy of vision models in enhancing interactions between humans and robots. Recognizing the vast and anticipated potential, we introduce an initial design space that incorporates domain-specific LVMs, chosen for their superior performance over general-purpose models. We delve into three primary dimensions: HRI contexts, vision-based tasks, and specific domains. The empirical validation was conducted with 15 experts across six evaluated metrics, showcasing the primary efficacy in relevant decision-making scenarios. We explore the process of ideation and potential application scenarios, envisioning this design space as a foundational guideline for future HRI system design, emphasizing accurate domain alignment and model selection.

G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

Authors: Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

Link: http://arxiv.org/abs/2404.14934v1

Abstract: Millimeter wave radar has recently gained traction as a promising modality for enabling pervasive and privacy-preserving gesture recognition. However, the lack of rich and fine-grained radar datasets hinders progress in developing generalized deep learning models for gesture recognition across various user postures (e.g., standing, sitting), positions, and scenes. To remedy this, we design a software pipeline that exploits abundant 2D videos to generate realistic radar data, which requires addressing the challenge of simulating the diversified and fine-grained reflection properties of user gestures. To this end, we design G3R with three key components: (i) a gesture reflection point generator expands the arm's skeleton points to form human reflection points; (ii) a signal simulation model simulates the multipath reflection and attenuation of radar signals to output the human intensity map; (iii) an encoder-decoder model combines a sampling module and a fitting module to address the differences in number and distribution of points between generated and real-world radar data for generating realistic radar data. We implement and evaluate G3R using 2D videos from public data sources and self-collected real-world radar data, demonstrating its superiority over other state-of-the-art approaches for gesture recognition.

Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice

Authors: Ranim Khojah, Mazen Mohamad, Philipp Leitner, Francisco Gomes de Oliveira Neto

Link: http://arxiv.org/abs/2404.14901v1

Abstract: Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who have been using ChatGPT over a period of one week in their jobs, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.

2024-04-18

Evaluating AI for Law: Bridging the Gap with Open-Source Solutions

Authors: Rohan Bhambhoria, Samuel Dahan, Jonathan Li, Xiaodan Zhu

Link: http://arxiv.org/abs/2404.12349v1

Abstract: This study evaluates the performance of general-purpose AI, like ChatGPT, in legal question-answering tasks, highlighting significant risks to legal professionals and clients. It suggests leveraging foundational models enhanced by domain-specific knowledge to overcome these issues. The paper advocates for creating open-source legal AI systems to improve accuracy, transparency, and narrative diversity, addressing general AI's shortcomings in legal contexts.

Large Language Models for Synthetic Participatory Planning of Shared Automated Electric Mobility Systems

Authors: Jiangbo Yu

Link: http://arxiv.org/abs/2404.12317v1

Abstract: Unleashing the synergies of rapidly evolving mobility technologies in a multi-stakeholder landscape presents unique challenges and opportunities for addressing urban transportation problems. This paper introduces a novel synthetic participatory method, critically leveraging large language models (LLMs) to create digital avatars representing diverse stakeholders to plan shared automated electric mobility systems (SAEMS). These calibratable agents collaboratively identify objectives, envision and evaluate SAEMS alternatives, and strategize implementation under risks and constraints. The results of a Montreal case study indicate that a structured and parameterized workflow yields an SAEMS plan with higher controllability and comprehensiveness than one generated using a single LLM-enabled expert agent. Consequently, the approach provides a promising avenue for cost-efficiently improving the inclusivity and interpretability of multi-objective transportation planning, suggesting a paradigm shift in how we envision and strategize for sustainable and equitable transportation systems.

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

Authors: Shreya Shankar, J. D. Zamfirescu-Pereira, Björn Hartmann, Aditya G. Parameswaran, Ian Arawjo

Link: http://arxiv.org/abs/2404.12272v1

Abstract: Due to the cumbersome nature of human evaluation and the limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators simply inherit all the problems of the LLMs they evaluate, requiring further human validation. We present a mixed-initiative approach to "validate the validators" -- aligning LLM-generated evaluation functions (be they prompts or code) with human requirements. Our interface, EvalGen, provides automated assistance to users in generating evaluation criteria and implementing assertions. While generating candidate implementations (Python functions, LLM grader prompts), EvalGen asks humans to grade a subset of LLM outputs; this feedback is used to select implementations that better align with user grades. A qualitative study finds overall support for EvalGen but underscores the subjectivity and iterative nature of alignment. In particular, we identify a phenomenon we dub *criteria drift*: users need criteria to grade outputs, but grading outputs helps users define criteria. What is more, some criteria appear *dependent* on the specific LLM outputs observed (rather than being independent criteria that can be defined *a priori*), raising serious questions for approaches that assume the independence of evaluation from observation of model outputs. We present our interface and implementation details, a comparison of our algorithm with a baseline approach, and implications for the design of future LLM evaluation assistants.
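
The selection step admits a compact sketch: candidate assertions are scored by how often they agree with human grades, and the best-aligned one is kept. Everything below (the candidate names, toy outputs, and plain accuracy as the alignment measure) is invented for illustration; the paper's actual algorithm and criteria differ.

```python
# Candidate assertions (Python functions) over an LLM output string.
candidates = {
    "short_enough": lambda out: len(out) <= 100,
    "mentions_source": lambda out: "source:" in out.lower(),
    "no_hedging": lambda out: "maybe" not in out.lower(),
}

# (llm_output, human_grade) pairs; True means the human judged it good.
graded = [
    ("Answer with Source: docs.", True),
    ("Maybe it works, maybe not.", False),
    ("Short answer. Source: FAQ.", True),
    ("x" * 300, False),
]

def alignment(fn):
    """Fraction of graded outputs where the assertion matches the human."""
    return sum(fn(out) == grade for out, grade in graded) / len(graded)

for name, fn in candidates.items():
    print(f"{name}: {alignment(fn):.2f}")
print("selected:", max(candidates, key=lambda n: alignment(candidates[n])))
```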

Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM

Authors: Michelle S. Lam, Janice Teoh, James Landay, Jeffrey Heer, Michael S. Bernstein

Link: http://arxiv.org/abs/2404.12259v1

Abstract: Data analysts have long sought to turn unstructured text data into meaningful concepts. Though common, topic modeling and clustering focus on lower-level keywords and require significant interpretative work. We introduce concept induction, a computational process that instead produces high-level concepts, defined by explicit inclusion criteria, from unstructured text. For a dataset of toxic online comments, where a state-of-the-art BERTopic model outputs "women, power, female," concept induction produces high-level concepts such as "Criticism of traditional gender roles" and "Dismissal of women's concerns." We present LLooM, a concept induction algorithm that leverages large language models to iteratively synthesize sampled text and propose human-interpretable concepts of increasing generality. We then instantiate LLooM in a mixed-initiative text analysis tool, enabling analysts to shift their attention from interpreting topics to engaging in theory-driven analysis. Through technical evaluations and four analysis scenarios ranging from literature review to content moderation, we find that LLooM's concepts improve upon the prior art of topic models in terms of quality and data coverage. In expert case studies, LLooM helped researchers to uncover new insights even from familiar datasets, for example by suggesting a previously unnoticed concept of attacks on out-party stances in a political social media dataset.

E-Vote Your Conscience: Perceptions of Coercion and Vote Buying, and the Usability of Fake Credentials in Online Voting

Authors: Louis-Henri Merino, Alaleh Azhir, Haoqian Zhang, Simone Colombo, Bernhard Tellenbach, Vero Estrada-Galiñanes, Bryan Ford

Link: http://arxiv.org/abs/2404.12075v1

Abstract: Online voting is attractive for convenience and accessibility, but is more susceptible to voter coercion and vote buying than in-person voting. One mitigation is to give voters fake voting credentials that they can yield to a coercer. Fake credentials appear identical to real ones, but cast votes that are silently omitted from the final tally. An important unanswered question is how ordinary voters perceive such a mitigation: whether they could understand and use fake credentials, and whether the coercion risks justify the costs of mitigation. We present the first systematic study of these questions, involving 150 diverse individuals in Boston, Massachusetts. All participants "registered" and "voted" in a mock election: 120 were exposed to coercion resistance via fake credentials, the rest forming a control group. Of the 120 participants exposed to fake credentials, 96% understood their use. 53% reported that they would create fake credentials in a real-world voting scenario, given the opportunity. 10% mistakenly voted with a fake credential, however. 22% reported either personal experience with or direct knowledge of coercion or vote-buying incidents. These latter participants rated the coercion-resistant system essentially as trustworthy as in-person voting via hand-marked paper ballots. Of the 150 total participants who used the system, 87% successfully created their credentials without assistance; 83% both successfully created and properly used their credentials. Participants gave the system a System Usability Scale score of 70.4, slightly above the industry average of 68. Our findings appear to support the importance of the coercion problem in general, and the promise of fake credentials as a possible mitigation, but user error rates remain an important usability challenge for future work.
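
For reference, the System Usability Scale score cited above comes from a fixed formula over ten 1-5 Likert items. The sketch below shows the standard scoring (the sample responses are invented):

```python
def sus_score(responses):
    """System Usability Scale: ten 1-5 Likert items; odd items contribute
    (r - 1), even items (5 - r); the sum is scaled by 2.5 to a 0-100 score."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # one participant -> 80.0
```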

A Flexible Architecture for Web-based GIS Applications using Docker and Graph Databases

Authors: Yves Annanias, Daniel Wiegreffe

Link: http://arxiv.org/abs/2404.12074v1

Abstract: Regional planning processes and associated redevelopment projects can be complex due to the vast amount of diverse data involved. However, all of this data shares a common geographical reference, especially in the renaturation of former open-cast mining areas. To ensure safety, it is crucial to maintain a comprehensive overview of the interrelated data and draw accurate conclusions. This requires special tools and can be a very time-consuming process. A geographical information system (GIS) is well-suited for this purpose, but even a GIS has limitations when dealing with multiple data types and sources. Additional tools are often necessary to process and view all the data, which can complicate the planning process. Our paper describes a system architecture that addresses the aforementioned issues and provides a simple, yet flexible tool for these activities. The architecture is based on microservices using Docker and is divided into a backend and a frontend. The backend simplifies and generalizes the integration of different data types, while a graph database is used to link relevant data and reveal potential new relationships between them. Finally, a modern web frontend displays the data and relationships.
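
The backend's graph-linking idea can be sketched in a few lines against a graph database. The snippet below assumes a local Neo4j instance reachable over Bolt and uses the official Python driver; the node labels, properties, and credentials are illustrative only, not the paper's schema.

```python
# Minimal sketch: linking a geo-referenced site to a measurement record,
# assuming a Neo4j service such as the paper's Docker-based backend might run.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    session.run(
        "MERGE (s:Site {name: $site, lat: $lat, lon: $lon}) "
        "MERGE (m:Measurement {kind: 'groundwater', value: $value}) "
        "MERGE (m)-[:TAKEN_AT]->(s)",
        site="Open-pit A", lat=51.3, lon=12.4, value=4.2,
    )
driver.close()
```

Because every record carries a geographic reference, relationships like `TAKEN_AT` let the frontend traverse from a map location to all linked data, regardless of the original data type.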

Deconstructing Human-AI Collaboration: Agency, Interaction, and Adaptation

Authors: Steffen Holter, Mennatallah El-Assady

Link: http://arxiv.org/abs/2404.12056v1

Abstract: As full AI-based automation remains out of reach in most real-world applications, the focus has instead shifted to leveraging the strengths of both human and AI agents, creating effective collaborative systems. The rapid advances in this area have yielded increasingly complex systems and frameworks, while their characterization has grown increasingly vague. Similarly, the existing conceptual models no longer capture the elaborate processes of these systems nor describe the entire scope of their collaboration paradigms. In this paper, we propose a new unified set of dimensions through which to analyze and describe human-AI systems. Our conceptual model is centered around three high-level aspects - agency, interaction, and adaptation - and is developed through a multi-step process. Firstly, an initial design space is proposed by surveying the literature and consolidating existing definitions and conceptual frameworks. Secondly, this model is iteratively refined and validated by conducting semi-structured interviews with nine researchers in this field. Lastly, to illustrate the applicability of our design space, we utilize it to provide a structured description of selected human-AI systems.

AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration

Authors: Bo Pan, Jiaying Lu, Ke Wang, Li Zheng, Zhen Wen, Yingchaojie Feng, Minfeng Zhu, Wei Chen

Link: http://arxiv.org/abs/2404.11943v1

Abstract: The potential of automatic task-solving through Large Language Model (LLM)-based multi-agent collaboration has recently garnered widespread attention from both the research community and industry. While utilizing natural language to coordinate multiple agents presents a promising avenue for democratizing agent technology for general users, designing coordination strategies remains challenging with existing coordination frameworks. This difficulty stems from the inherent ambiguity of natural language for specifying the collaboration process and the significant cognitive effort required to extract crucial information (e.g. agent relationship, task dependency, result correspondence) from a vast amount of text-form content during exploration. In this work, we present a visual exploration framework to facilitate the design of coordination strategies in multi-agent collaboration. We first establish a structured representation for LLM-based multi-agent coordination strategy to regularize the ambiguity of natural language. Based on this structure, we devise a three-stage generation method that leverages LLMs to convert a user's general goal into an executable initial coordination strategy. Users can further intervene at any stage of the generation process, utilizing LLMs and a set of interactions to explore alternative strategies. Whenever a satisfactory strategy is identified, users can commence the collaboration and examine the visually enhanced execution result. We develop AgentCoord, a prototype interactive system, and conduct a formal user study to demonstrate the feasibility and effectiveness of our approach.

CelluloTactix: Towards Empowering Collaborative Online Learning through Tangible Haptic Interaction with Cellulo Robots

Authors: Hasaru Kariyawasam, Wafa Johal

Link: http://arxiv.org/abs/2404.11876v1

Abstract: Online learning has soared in popularity in the educational landscape of COVID-19 and carries the benefits of increased flexibility and access to far-away training resources. However, it also restricts communication between peers and teachers, limits physical interactions and confines learning to the computer screen and keyboard. In this project, we designed a novel way to engage students in collaborative online learning by using haptic-enabled tangible robots, Cellulo. We built a library which connects two robots remotely for a learning activity based around the structure of a biological cell. To discover how separate modes of haptic feedback might differentially affect collaboration, two modes of haptic force-feedback were implemented (haptic co-location and haptic consensus). With a case study, we found that the haptic co-location mode seemed to stimulate collectivist behaviour to a greater extent than the haptic consensus mode, which was associated with individualism and less interaction. While the haptic co-location mode seemed to encourage information pooling, participants using the haptic consensus mode tended to focus more on technical co-ordination. This work introduces a novel system that can provide interesting insights on how to integrate haptic feedback into collaborative remote learning activities in future.

2024-04-17

Developing Situational Awareness for Joint Action with Autonomous Vehicles

Authors: Robert Kaufman, David Kirsh, Nadir Weibel

Link: http://arxiv.org/abs/2404.11800v1

Abstract: Unanswered questions about how human-AV interaction designers can support riders' informational needs hinder Autonomous Vehicle (AV) adoption. To achieve joint human-AV action goals - such as safe transportation, trust, or learning from an AV - sufficient situational awareness must be held by the human, AV, and human-AV system collectively. We present a systems-level framework that integrates cognitive theories of joint action and situational awareness as a means to tailor communications that meet the criteria necessary for goal success. This framework is based on four components of the shared situation: AV traits, action goals, subject-specific traits and states, and the situated driving context. AV communications should be tailored to these factors and be sensitive when they change. This framework can be useful for understanding individual, shared, and distributed human-AV situational awareness and designing for future AV communications that meet the informational needs and goals of diverse groups and in diverse driving contexts.

Establishing a Baseline for Gaze-driven Authentication Performance in VR: A Breadth-First Investigation on a Very Large Dataset

Authors: Dillon Lohr, Michael J. Proulx, Oleg Komogortsev

Link: http://arxiv.org/abs/2404.11798v1

Abstract: This paper performs the crucial work of establishing a baseline for gaze-driven authentication performance to begin answering fundamental research questions using a very large dataset of gaze recordings from 9202 people with a level of eye tracking (ET) signal quality equivalent to modern consumer-facing virtual reality (VR) platforms. The size of the employed dataset is at least an order-of-magnitude larger than any other dataset from previous related work. Binocular estimates of the optical and visual axes of the eyes and a minimum duration for enrollment and verification are required for our model to achieve a false rejection rate (FRR) of below 3% at a false acceptance rate (FAR) of 1 in 50,000. In terms of identification accuracy, which decreases with gallery size, we estimate that our model would fall below chance-level accuracy for gallery sizes of 148,000 or more. Our major findings indicate that gaze authentication can be as accurate as required by the FIDO standard when driven by a state-of-the-art machine learning architecture and a sufficiently large training dataset.
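
The FRR-at-fixed-FAR figure of merit is easy to compute from raw match scores. A minimal sketch with synthetic Gaussian score distributions standing in for a real gaze-authentication model's genuine and impostor scores:

```python
import numpy as np

def frr_at_far(genuine, impostor, target_far=1 / 50_000):
    """Pick the threshold at which the impostor acceptance rate equals the
    target FAR, then report the fraction of genuine scores rejected (FRR)."""
    impostor = np.sort(impostor)
    # Threshold above which only target_far of impostor scores remain.
    thresh = impostor[int(np.ceil((1 - target_far) * len(impostor))) - 1]
    return float(np.mean(genuine <= thresh)), float(thresh)

rng = np.random.default_rng(3)
genuine = rng.normal(6.0, 1.0, size=100_000)     # same-person scores
impostor = rng.normal(0.0, 1.0, size=1_000_000)  # different-person scores
frr, t = frr_at_far(genuine, impostor)
print(f"threshold={t:.2f}, FRR={frr:.2%} at FAR=1/50,000")
```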

Incremental Bootstrapping and Classification of Structured Scenes in a Fuzzy Ontology

Authors: Luca Buoncompagni, Fulvio Mastrogiovanni

Link: http://arxiv.org/abs/2404.11744v1

Abstract: We foresee robots that bootstrap knowledge representations and use them for classifying relevant situations and making decisions based on future observations. Particularly for assistive robots, the bootstrapping mechanism might be supervised by humans who should not repeat a training phase several times and should be able to refine the taught representation. We consider robots that bootstrap structured representations to classify some intelligible categories. Such a structure should be incrementally bootstrapped, i.e., without invalidating the identified category models when a new additional category is considered. To tackle this scenario, we presented the Scene Identification and Tagging (SIT) algorithm, which bootstraps structured knowledge representation in a crisp OWL-DL ontology. Over time, SIT bootstraps a graph representing scenes, sub-scenes and similar scenes. Then, SIT can classify new scenes within the bootstrapped graph through logic-based reasoning. However, SIT has issues with sensory data because its crisp implementation is not robust to perception noises. This paper presents a reformulation of SIT within the fuzzy domain, which exploits a fuzzy DL ontology to overcome the robustness issues. By comparing the performances of fuzzy and crisp implementations of SIT, we show that fuzzy SIT is robust, preserves the properties of its crisp formulation, and enhances the bootstrapped representations. On the contrary, the fuzzy implementation of SIT leads to less intelligible knowledge representations than the one bootstrapped in the crisp domain.

Evaluating Tenant-Landlord Tensions Using Generative AI on Online Tenant Forums

Authors: Xin Chen, Cheng Ren, Tim A Thomas

Link: http://arxiv.org/abs/2404.11681v1

Abstract: Tenant-landlord relationships exhibit a power asymmetry where landlords' power to evict tenants at low cost results in their dominant status in such relationships. Tenant concerns are thus often unspoken, unresolved, or ignored, and this can lead to blatant conflicts as suppressed tenant concerns accumulate. Modern machine learning methods and Large Language Models (LLM) have demonstrated immense abilities to perform language tasks. In this study, we incorporate Latent Dirichlet Allocation (LDA) with GPT-4 to classify Reddit post data scraped from the subreddit r/Tenant, aiming to unveil trends in tenant concerns while exploring the adoption of LLMs and machine learning methods in social science research. We find that tenant concerns in topics like fee disputes and utility issues are consistently dominant in all four states analyzed, while each state also has other common tenant concerns specific to itself. Moreover, we discover temporal trends in tenant concerns that provide important implications regarding the impact of the pandemic and the Eviction Moratorium.
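
The topic-modeling half of such a pipeline can be sketched in a few lines with scikit-learn; the toy posts below are invented, and the GPT-4 labeling step is reduced to a comment.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "landlord charged a late fee and a cleaning fee again",
    "no heat for a week and the utility bill is still mine",
    "disputing fees the landlord added after move out",
    "water and electricity were shut off without notice",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [vocab[i] for i in weights.argsort()[-4:][::-1]]
    # These keyword lists would then be handed to an LLM (e.g., GPT-4)
    # to label each topic and classify new posts against it.
    print(f"topic {k}: {', '.join(top)}")
```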

Interaction Techniques for Exploratory Data Visualization on Mobile Devices

Authors: Luke S. Snyder, Ryan A. Rossi, Eunyee Koh, Jeffrey Heer, Jane Hoffswell

Link: http://arxiv.org/abs/2404.11602v1

Abstract: The ubiquity and on-the-go availability of mobile devices makes them central to many tasks such as interpersonal communication and media consumption. However, despite the potential of mobile devices for on-demand exploratory data visualization, existing mobile interactions are difficult, often using highly custom interactions, complex gestures, or multi-modal input. We synthesize limitations from the literature and outline four motivating principles for improved mobile interaction: leverage ubiquitous modalities, prioritize discoverability, enable rapid in-context data exploration, and promote graceful recovery. We then contribute thirteen interaction candidates and conduct a formative study with twelve participants who experienced our interactions in a testbed prototype. Based on these interviews, we discuss design considerations and tradeoffs from four main themes: precise and rapid inspection, focused navigation, single-touch and fixed orientation interaction, and judicious use of motion.

Embedding Privacy in Computational Social Science and Artificial Intelligence Research

Authors: Keenan Jones, Fatima Zahrah, Jason R. C. Nurse

Link: http://arxiv.org/abs/2404.11515v1

Abstract: Privacy is a human right. It ensures that individuals are free to engage in discussions, participate in groups, and form relationships online or offline without fear of their data being inappropriately harvested, analyzed, or otherwise used to harm them. Preserving privacy has emerged as a critical factor in research, particularly in the computational social science (CSS), artificial intelligence (AI) and data science domains, given their reliance on individuals' data for novel insights. The increasing use of advanced computational models stands to exacerbate privacy concerns because, if inappropriately used, they can quickly infringe privacy rights and lead to adverse effects for individuals - especially vulnerable groups - and society. We have already witnessed a host of privacy issues emerge with the advent of large language models (LLMs), such as ChatGPT, which further demonstrate the importance of embedding privacy from the start. This article contributes to the field by discussing the role of privacy and the primary issues that researchers working in CSS, AI, data science and related domains are likely to face. It then presents several key considerations for researchers to ensure participant privacy is best preserved in their research design, data collection and use, analysis, and dissemination of research results.

Frameworking for a Community-led Feminist Ethics

Authors: Ana O Henriques, Hugo Nicolau, Kyle Montague

Link: http://arxiv.org/abs/2404.11514v1

Abstract: This paper introduces a relational perspective on ethics within the context of Feminist Digital Civics and community-led design. Ethics work in HCI has primarily focused on prescriptive machine ethics and bioethics principles rather than people. In response, we advocate for a community-led, processual approach to ethics, acknowledging power dynamics and local contexts. We thus propose a multidimensional adaptive model for ethics in HCI design, integrating an intersectional feminist ethical lens. This framework embraces feminist epistemologies, methods, and methodologies, fostering a reflexive practice. By weaving together situated knowledges, standpoint theory, intersectionality, participatory methods, and care ethics, our approach offers a holistic foundation for ethics in HCI, aiming to advance community-led practices and enrich the discourse surrounding ethics within this field.

Designing Touchscreen Menu Interfaces for In-Vehicle Infotainment Systems: the Effect of Depth and Breadth Trade-off and Task Types on Visual-Manual Distraction

Authors: Louveton Nicolas, McCall Rod, Engel Thomas

Link: http://arxiv.org/abs/2404.11469v1

Abstract: Multitasking with a touchscreen user interface while driving is known to negatively impact driving performance and safety. The literature shows that list-scrolling interfaces generate more visual-manual distraction than structured menus and sequential navigation. Depth and breadth trade-offs for structured navigation have been studied. However, little is known about how secondary task characteristics interact with those trade-offs. In this study, we hypothesize that both menu depth and task complexity interact in generating visual-manual distraction. Using a driving simulation setup, we collected telemetry and eye-tracking data to evaluate driving performance. Participants multitasked with a mobile app presenting a range of eight depth and breadth trade-offs under three types of secondary tasks involving different cognitive operations (systematically reading items, searching for an item, memorizing items' states). The results confirm our hypothesis: systematic interaction with menu items generated a visual demand that increased with menu depth, while visual demand reached an optimum for the search and memory tasks. We discuss implications for design: in a multitasking context, display design effectiveness must be assessed considering not only the menu's layout but also the cognitive processes involved.
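
The depth/breadth trade-off itself is simple arithmetic: for a uniform menu holding a fixed number of leaf items, deeper hierarchies need fewer items per screen but more taps to reach a target. A small sketch (the 64-item target and the uniform-branching assumption are ours):

```python
import math

def menu_layouts(n_items=64):
    """Enumerate uniform depth/breadth trade-offs able to hold n_items
    leaves (breadth ** depth >= n_items). Taps-to-target grows with depth,
    while per-screen visual search grows with breadth."""
    for depth in range(1, 7):
        breadth = math.ceil(n_items ** (1 / depth))
        print(f"depth={depth}  breadth={breadth:>2}  "
              f"taps={depth}  items/screen={breadth}")

menu_layouts()
```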

Using Game Engines and Machine Learning to Create Synthetic Satellite Imagery for a Tabletop Verification Exercise

Authors: Johannes Hoster, Sara Al-Sayed, Felix Biessmann, Alexander Glaser, Kristian Hildebrand, Igor Moric, Tuong Vy Nguyen

Link: http://arxiv.org/abs/2404.11461v1

Abstract: Satellite imagery is regarded as a great opportunity for citizen-based monitoring of activities of interest. Relevant imagery may however not be available at sufficiently high resolution, quality, or cadence -- let alone be uniformly accessible to open-source analysts. This limits an assessment of the true long-term potential of citizen-based monitoring of nuclear activities using publicly available satellite imagery. In this article, we demonstrate how modern game engines combined with advanced machine-learning techniques can be used to generate synthetic imagery of sites of interest with the ability to choose relevant parameters upon request; these include time of day, cloud cover, season, or level of activity onsite. At the same time, resolution and off-nadir angle can be adjusted to simulate different characteristics of the satellite. While there are several possible use-cases for synthetic imagery, here we focus on its usefulness to support tabletop exercises in which simple monitoring scenarios can be examined to better understand verification capabilities enabled by new satellite constellations and very short revisit times.

Characterizing and modeling harms from interactions with design patterns in AI interfaces

Authors: Lujain Ibrahim, Luc Rocher, Ana Valdivia

Link: http://arxiv.org/abs/2404.11370v1

Abstract: The proliferation of applications using artificial intelligence (AI) systems has led to a growing number of users interacting with these systems through sophisticated interfaces. Human-computer interaction research has long shown that interfaces shape both user behavior and user perception of technical capabilities and risks. Yet, practitioners and researchers evaluating the social and ethical risks of AI systems tend to overlook the impact of anthropomorphic, deceptive, and immersive interfaces on human-AI interactions. Here, we argue that design features of interfaces with adaptive AI systems can have cascading impacts, driven by feedback loops, which extend beyond those previously considered. We first conduct a scoping review of AI interface designs and their negative impact to extract salient themes of potentially harmful design patterns in AI interfaces. Then, we propose Design-Enhanced Control of AI systems (DECAI), a conceptual model to structure and facilitate impact assessments of AI interface designs. DECAI draws on principles from control systems theory -- a theory for the analysis and design of dynamic physical systems -- to dissect the role of the interface in human-AI systems. Through two case studies on recommendation systems and conversational language model systems, we show how DECAI can be used to evaluate AI interface designs.

AR for Sexual Violence: Maintaining Ethical Balance While Enhancing Empathy

Authors: Chunwei Lin

Link: http://arxiv.org/abs/2404.11305v1

Abstract: This study showcases an augmented reality (AR) experience designed to promote gender justice and increase awareness of sexual violence in Taiwan. By leveraging AR, this project overcomes the limitations of offline exhibitions on social issues by motivating the public to participate and enhancing their willingness to delve into the topic. The discussion explores how direct exposure to sexual violence can induce negative emotions and secondary trauma among users. It also suggests strategies for using AR to alleviate such issues, particularly by avoiding simulations of actual incidents.

"That's our game!" : Reflections on co-designing a robotic game with neurodiverse children

Authors: Patricia Piedade, Isabel Neto, Ana Pires, Rui Prada, Hugo Nicolau

Link: http://arxiv.org/abs/2404.11252v1

Abstract: Many neurodivergent (ND) children are integrated into mainstream schools alongside their neurotypical (NT) peers. However, they often face social exclusion, which may have lifelong effects. Inclusive play activities can be a strong driver of inclusion. Unfortunately, games designed for the specific needs of neurodiverse groups, those that include neurodivergent and neurotypical individuals, are scarce. Given the potential of robots as engaging devices, we led a 6-month co-design process to build an inclusive and entertaining robotic game for neurodiverse classrooms. We first interviewed neurodivergent adults and educators to identify the barriers and facilitators for including neurodivergent children in mainstream classrooms. Then, we conducted five co-design sessions, engaging four neurodiverse classrooms with 81 children (19 neurodivergent). We present a reflection on our co-design process and the resulting robotic game through the lens of Self-Determination Theory, discussing how our methodology supported the intrinsic motivations of neurodivergent children.

Ethical Concerns when Working with Mixed-Ability Groups of Children

Authors: Patricia Piedade, Ana Henriques, Filipa Rocha, Isabel Neto, Hugo Nicolau

Link: http://arxiv.org/abs/2404.11248v1

Abstract: Accessibility research has gained traction, yet ethical gaps persist in the inclusion of individuals with disabilities, especially children. Inclusive research practices are essential to ensure research and design solutions cater to the needs of all individuals, regardless of their abilities. Working with children with disabilities in Human-Computer Interaction and Human-Robot Interaction presents a unique set of ethical dilemmas. These young participants often require additional care, support, and accommodations, which can fall outside researchers' resources or expertise. The lack of clear guidance on navigating these challenges further aggravates the problem. To provide a starting point for addressing this issue, we adopt a critical reflective approach, evaluating our impact by analyzing two case studies involving children with disabilities in HCI/HRI research.

PartiPlay: A Participatory Game Design Kit for Neurodiverse Classrooms

Authors: Patricia Piedade, Isabel Neto, Ana Pires, Rui Prada, Hugo Nicolau

Link: http://arxiv.org/abs/2404.11234v1

Abstract: Play is a central aspect of childhood development, with games as a vital tool to promote it. However, neurodivergent children, especially those in neurodiverse environments, are underserved by HCI games research. Most existing work takes a top-down approach, disregarding neurodivergent interests for the majority of the design process. Co-design is often proposed as a tool to create truly accessible and inclusive gaming experiences. Nevertheless, co-designing with neurodivergent children within neurodiverse groups brings unique challenges, such as different communication styles, sensory needs, and preferences. Building upon recommendations from prior work in neurodivergent, mixed-ability, and child-led co-design, we propose a concrete participatory game design kit for neurodiverse classrooms: PartiPlay. Moreover, we present preliminary findings from an in-the-wild experiment with this kit, showcasing its ability to create an inclusive co-design process for neurodiverse groups of children. We aim to provide actionable steps for future participatory design research with neurodiverse children.

Unlocking Memories with AI: Exploring the Role of AI-Generated Cues in Personal Reminiscing

Authors: Jun Li Jeung, Janet Yi-Ching Huang

Link: http://arxiv.org/abs/2404.11227v1

Abstract: While technology-mediated reminiscing has been studied for decades, generating relevant cues to trigger personal reminiscing remains challenging. The potential of AI in generating relevant content across various domains has been recently recognized, yet its use in facilitating reminiscing is still less explored. This work aims to explore the use of AI in supporting the recall of personal memories associated with significant objects at home. We designed Treasurefinder, a device powered by a large language model (LLM) that generates open-ended questions based on stories stored in NFC-tagged physical objects or cards. We conducted an exploratory study with 12 participants, grouped in pairs, to observe reminiscing behaviors when using Treasurefinder. The results showed the AI-generated questions 1) supported individuals to recall the past, 2) provided new insights about the other person, and 3) encouraged reflection. Notably, the device facilitated active memory retrieval related to cherished objects that are often overlooked.
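
To make the cue-generation step concrete, here is a minimal sketch of how a stored object story might be turned into open-ended questions by an LLM; the prompt wording and the `complete` callable are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the cue-generation step (not the authors'
# implementation): a story stored behind an NFC tag is handed to an LLM,
# which returns open-ended reminiscence questions, one per line.

def generate_reminiscence_questions(story: str, complete) -> list[str]:
    """`complete` is any prompt-to-text callable, e.g. an LLM API wrapper."""
    prompt = (
        "The following story is attached to a cherished household object:\n"
        f"{story}\n\n"
        "Write three open-ended questions that invite the owner to "
        "reminisce about this object, one question per line."
    )
    return [line.strip() for line in complete(prompt).splitlines() if line.strip()]

# Usage with a stub standing in for a real LLM call:
questions = generate_reminiscence_questions(
    "A wooden clock bought on our honeymoon in 1974.",
    complete=lambda p: "What do you remember about buying it?\n"
                       "Who was with you that day?",
)
```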

[DC] bRight XR: How to train designers to keep on the bright side?

Authors: Romain Rouyer, David Bourguignon, Stéphanie Fleck

Link: http://arxiv.org/abs/2404.11142v1

Abstract: This research project aims to promote ethical principles among designers engaged in adaptive-XR by providing tools for self-assessment. We introduce a Design-Based Research (DBR) methodology to build bRight-XR, a framework that includes a heuristic evaluation matrix and is grounded in learning theory.

Do you need a DAO?

Authors: Henrik Axelsen, Johannes Rude Jensen, Omri Ross

Link: http://arxiv.org/abs/2404.11076v1

Abstract: Decentralized Autonomous Organizations (DAOs) have seen exponential growth and interest due to their potential to redefine organizational structure and governance. Despite this, there is a discrepancy between the ideals of autonomy and decentralization and the actual experiences of DAO stakeholders. The Information Systems (IS) literature has yet to fully explore whether DAOs are the optimal organizational choice. Addressing this gap, our research asks, "Is a DAO suitable for your organizational needs?" We derive a gated decision-making framework through a thematic review of the academic and grey literature on DAOs. Through five scenarios, the framework critically emphasizes the gaps between DAOs' theoretical capabilities and practical challenges. Our findings contribute to the IS discourse on blockchain technologies, with some ancillary contributions to the IS literature on organizational management and practitioner literature.

Large Language Models Meet User Interfaces: The Case of Provisioning Feedback

Authors: Stanislav Pozdniakov, Jonathan Brazil, Solmaz Abdi, Aneesha Bakharia, Shazia Sadiq, Dragan Gasevic, Paul Denny, Hassan Khosravi

Link: http://arxiv.org/abs/2404.11072v1

Abstract: Incorporating Generative AI (GenAI) and Large Language Models (LLMs) in education can enhance teaching efficiency and enrich student learning. Current LLM usage involves conversational user interfaces (CUIs) for tasks like generating materials or providing feedback. However, this presents challenges including the need for educator expertise in AI and CUIs, ethical concerns with high-stakes decisions, and privacy risks. CUIs also struggle with complex tasks. To address these, we propose transitioning from CUIs to user-friendly applications leveraging LLMs via API calls. We present a framework for ethically incorporating GenAI into educational tools and demonstrate its application in our tool, Feedback Copilot, which provides personalized feedback on student assignments. Our evaluation shows the effectiveness of this approach, with implications for GenAI researchers, educators, and technologists. This work charts a course for the future of GenAI in education.
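
The proposed shift from CUIs to API-backed applications amounts to hiding prompt construction behind structured fields. Below is a hedged sketch of that pattern with a hypothetical template and `llm` callable; it is not the Feedback Copilot code.

```python
# Illustrative sketch (not the Feedback Copilot code): the LLM call is
# wrapped behind a fixed template, so educators fill structured fields in
# a UI instead of writing prompts in a chat interface.

FEEDBACK_TEMPLATE = (
    "You are giving formative feedback on a student assignment.\n"
    "Rubric criteria: {criteria}\n"
    "Tone: {tone}\n"
    "Assignment text:\n{assignment}\n"
    "Return two or three specific, actionable suggestions."
)

def generate_feedback(assignment: str, criteria: list[str], tone: str, llm) -> str:
    """`llm` is any prompt-to-text callable (e.g., a provider SDK wrapper)."""
    prompt = FEEDBACK_TEMPLATE.format(
        criteria="; ".join(criteria), tone=tone, assignment=assignment
    )
    return llm(prompt)

# Usage with a stub in place of a real model:
print(generate_feedback("My essay ...", ["clarity", "evidence"], "supportive",
                        llm=lambda p: "1. Cite a source for claim two."))
```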

Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions

Authors: Leena Mathur, Paul Pu Liang, Louis-Philippe Morency

Link: http://arxiv.org/abs/2404.11023v1

Abstract: Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal that involves creating agents that can sense, perceive, reason about, learn from, and respond to affect, behavior, and cognition of other agents (human or artificial). Progress towards Social-AI has accelerated in the past decade across several computing communities, including natural language processing, machine learning, robotics, human-machine interaction, computer vision, and speech. Natural language processing, in particular, has been prominent in Social-AI research, as language plays a key role in constructing the social world. In this position paper, we identify a set of underlying technical challenges and open questions for researchers across computing communities to advance Social-AI. We anchor our discussion in the context of social intelligence concepts and prior progress in Social-AI research.

Remote Breathing Monitoring Using LiDAR Technology

Authors: Omar Rinchi, Ahmad Alsharoa, Denise A. Baker

Link: http://arxiv.org/abs/2404.10970v1

Abstract: Breathing monitoring is crucial in healthcare for early detection of health issues, but traditional methods face challenges like invasiveness, privacy concerns, and limited applicability in daily settings. This paper introduces light detection and ranging (LiDAR) sensors as a remote, privacy-respecting alternative for monitoring breathing metrics, including inhalation/exhalation patterns, respiratory rates, breath depth, and detecting breathlessness. We highlight LiDAR's ability to function across various postures, presenting empirical evidence of its accuracy and reliability. Our findings position LiDAR as an innovative solution in breathing monitoring, offering significant advantages over conventional methods.
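
As a rough illustration of how a breathing metric can be read off a chest-distance signal, the following sketch estimates respiratory rate via a spectral peak; this is an assumed baseline approach, not the paper's pipeline.

```python
# Generic sketch of extracting a respiratory rate from a chest-distance
# time series, as a LiDAR sensor might produce; the paper's own pipeline
# is not specified here, so this is only an assumed baseline approach.
import numpy as np

def respiratory_rate_bpm(distances: np.ndarray, fs: float) -> float:
    """Estimate breaths per minute from a 1-D distance signal sampled at fs Hz."""
    x = distances - distances.mean()           # remove the static chest offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Restrict to a plausible breathing band (0.1-0.7 Hz, i.e. 6-42 breaths/min).
    band = (freqs >= 0.1) & (freqs <= 0.7)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0

# Synthetic check: 0.25 Hz breathing (15 bpm) sampled at 20 Hz for 60 s.
t = np.arange(0, 60, 1 / 20)
sig = 1.2 + 0.01 * np.sin(2 * np.pi * 0.25 * t)
print(respiratory_rate_bpm(sig, fs=20.0))      # ~15.0
```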

Tinker or Transfer? A Tale of Two Techniques in Teaching Visualization

Authors: Adam Hyland, Murtaza Ali

Link: http://arxiv.org/abs/2404.10967v1

Abstract: In education there exists a tension between two modes of learning: traditional lecture-based instruction and more tinkering-based creative learning. In this paper, we outline our efforts as two Ph.D. students (who are skilled in visualization but are not, importantly, professionally trained visualization experts) to implement creative learning activities in an information visualization course in our home department. We describe our motivation for doing so, and how what began out of necessity turned into an endeavor whose utility we strongly believe in. In implementing these activities, we received largely positive reviews from students, along with constructive feedback which helped us iteratively improve the activities. Finally, we also detail our future plans for turning this work into a formal design inquiry with students to build a new class centered entirely around creative learning.

2024-04-16

Human-Algorithm Collaborative Bayesian Optimization for Engineering Systems

Authors: Tom Savage, Ehecatl Antonio del Rio Chanona

Link: http://arxiv.org/abs/2404.10949v1

Abstract: Bayesian optimization has been successfully applied throughout Chemical Engineering for the optimization of functions that are expensive to evaluate, or where gradients are not easily obtainable. However, domain experts often possess valuable physical insights that are overlooked in fully automated decision-making approaches, necessitating the inclusion of human input. In this article we reintroduce the human into the data-driven decision-making loop by outlining an approach for collaborative Bayesian optimization. Our methodology exploits the hypothesis that humans are more efficient at making discrete choices than continuous ones and enables experts to influence critical early decisions. We apply high-throughput (batch) Bayesian optimization alongside discrete decision theory to enable domain experts to influence the selection of experiments. At every iteration we apply a multi-objective approach that results in a set of alternate solutions that both have high utility and are reasonably distinct. The expert then selects the desired solution for evaluation from this set, allowing for the inclusion of expert knowledge and improving accountability, whilst maintaining the advantages of Bayesian optimization. We demonstrate our approach across a number of applied and numerical case studies, including bioprocess optimization and reactor geometry design, demonstrating that even in the case of an uninformed practitioner our algorithm recovers the regret of standard Bayesian optimization. Through the inclusion of continuous expert opinion, our approach enables faster convergence and improved accountability for Bayesian optimization in engineering systems.
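
The expert-facing step, choosing among a few high-utility but mutually distinct candidates, can be sketched as below; the greedy max-min selection and the stand-in acquisition values are assumptions, not the authors' exact procedure.

```python
# Minimal sketch of the human-in-the-loop step (assumed details, not the
# authors' code): from a batch of high-acquisition candidates, keep a
# small set that is both high-utility and mutually distinct, then let the
# expert choose which point to evaluate next.
import numpy as np

def distinct_high_utility(candidates, utilities, k=4, min_gap=0.15):
    """Greedily pick up to k candidates in descending utility order,
    skipping any point closer than min_gap to an already chosen one."""
    order = np.argsort(utilities)[::-1]
    chosen = []
    for i in order:
        if all(np.linalg.norm(candidates[i] - candidates[j]) >= min_gap
               for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return candidates[chosen]

rng = np.random.default_rng(0)
cands = rng.uniform(0, 1, size=(200, 2))       # candidate experiments
util = -((cands - 0.5) ** 2).sum(axis=1)       # stand-in acquisition values
options = distinct_high_utility(cands, util)
# An expert would now pick one row of `options` as the next experiment.
```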

Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop

Authors: Hector Kohler, Quentin Delfosse, Paul Festor, Philippe Preux

Link: http://arxiv.org/abs/2404.10906v1

Abstract: Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions: What distinguishes explainability from interpretability? Should explainable and interpretable agents be developed outside of domains where transparency is imperative? What advantages do interpretable policies offer over neural networks? How can we rigorously define and measure interpretability in policies without user studies? Which reinforcement learning paradigms are the most suited to developing interpretable agents? Can Markov Decision Processes integrate interpretable state representations? In addition to motivating an Interpretable RL community centered around these questions, we propose the first venue dedicated to Interpretable RL: the InterpPol Workshop.

Exploring Augmentation and Cognitive Strategies for AI based Synthetic Personae

Authors: Rafael Arias Gonzalez, Steve DiPaola

Link: http://arxiv.org/abs/2404.10890v1

Abstract: Large language models (LLMs) hold potential for innovative HCI research, including the creation of synthetic personae. However, their black-box nature and propensity for hallucinations pose challenges. To address these limitations, this position paper advocates for using LLMs as data augmentation systems rather than zero-shot generators. We further propose the development of robust cognitive and memory frameworks to guide LLM responses. Initial explorations suggest that data enrichment, episodic memory, and self-reflection techniques can improve the reliability of synthetic personae and open up new avenues for HCI research.

A Systematic Survey of the Gemini Principles for Digital Twin Ontologies

Authors: James Michael Tooth, Nilufer Tuptuk, Jeremy Daniel McKendrick Watson

Link: http://arxiv.org/abs/2404.10754v1

Abstract: Ontologies are widely used for achieving interoperable Digital Twins (DTws), yet competing DTw definitions compound interoperability issues. Semantically linking these differing twins is feasible through ontologies and Cognitive Digital Twins (CDTws). However, it is often unclear how ontology use bolsters broader DTw advancements. This article presents a systematic survey following the PRISMA method to explore the potential of ontologies to support DTws in meeting the Centre for Digital Built Britain's Gemini Principles, and aims to link progress in ontologies to this framework. The Gemini Principles focus on common DTw requirements, considering: Purpose for 1) Public Good, 2) Value Creation, and 3) Insight; Trustworthiness with sufficient 4) Security, 5) Openness, and 6) Quality; and appropriate Functionality of 7) Federation, 8) Curation, and 9) Evolution. This systematic literature review examines the role of ontologies in facilitating each principle. Existing research uses ontologies to solve DTw challenges within these principles, particularly by connecting DTws, optimising decision-making, and reasoning about governance policies. Furthermore, an analysis of the sectoral distribution of the literature found that research encompassing the crossover of ontologies, DTws, and the Gemini Principles is emerging, and that most innovation is predominantly within the manufacturing and built environment sectors. Critical gaps for researchers, industry practitioners, and policymakers are subsequently identified.

Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration

Authors: Benjamin A Newman, Chris Paxton, Kris Kitani, Henny Admoni

Link: http://arxiv.org/abs/2404.10733v1

Abstract: Agents that assist people need to have well-initialized policies that can adapt quickly to align with their partners' reward functions. Initializing policies to maximize performance with unknown partners can be achieved by bootstrapping nonlinear models using imitation learning over large, offline datasets. Such policies can require prohibitive computation to fine-tune in-situ and therefore may miss critical run-time information about a partner's reward function as expressed through their immediate behavior. In contrast, online logistic regression using low-capacity models performs rapid inference and fine-tuning updates and thus can make effective use of immediate in-task behavior for reward function alignment. However, these low-capacity models cannot be bootstrapped as effectively by offline datasets and thus have poor initializations. We propose BLR-HAC, Bootstrapped Logistic Regression for Human Agent Collaboration, which bootstraps large nonlinear models to learn the parameters of a low-capacity model which then uses online logistic regression for updates during collaboration. We test BLR-HAC in a simulated surface rearrangement task and demonstrate that it achieves higher zero-shot accuracy than shallow methods and takes far less computation to adapt online while still achieving similar performance to fine-tuned, large nonlinear models. For code, please see our project page https://sites.google.com/view/blr-hac.
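
A schematic of the two-stage idea follows, with the bootstrapped nonlinear model reduced to a stub that proposes initial weights, followed by online logistic-regression updates; the details are assumptions, not taken from the released code.

```python
# Schematic of the two-stage idea (assumed details, not the released
# code): an offline-trained nonlinear model proposes initial logistic-
# regression weights, which are then cheaply refined online from in-task
# partner behavior.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_logistic_step(w, x, y, lr=0.1):
    """One SGD update on a single (features, binary label) observation."""
    grad = (sigmoid(w @ x) - y) * x
    return w - lr * grad

# `pretrained_init` stands in for the bootstrapped nonlinear model that
# maps task context to initial low-capacity-model weights.
pretrained_init = lambda context: np.zeros(3)

w = pretrained_init(context=None)
for x, y in [(np.array([1.0, 0.2, -0.5]), 1),
             (np.array([1.0, -0.7, 0.3]), 0)]:   # streamed partner behavior
    w = online_logistic_step(w, x, y)
```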

Attention-Aware Visualization: Tracking and Responding to User Perception Over Time

Authors: Arvind Srinivasan, Johannes Ellemose, Peter W. S. Butcher, Panagiotis D. Ritsos, Niklas Elmqvist

Link: http://arxiv.org/abs/2404.10732v1

Abstract: We propose the notion of Attention-Aware Visualizations (AAVs) that track the user's perception of a visual representation over time and feed this information back to the visualization. Such context awareness is particularly useful for ubiquitous and immersive analytics where knowing which embedded visualizations the user is looking at can be used to make visualizations react appropriately to the user's attention: for example, by highlighting data the user has not yet seen. We can separate the approach into three components: (1) measuring the user's gaze on a visualization and its parts; (2) tracking the user's attention over time; and (3) reactively modifying the visual representation based on the current attention metric. In this paper, we present two separate implementations of AAV: a 2D data-agnostic method for web-based visualizations that can use an embodied eyetracker to capture the user's gaze, and a 3D data-aware one that uses the stencil buffer to track the visibility of each individual mark in a visualization. Both methods provide similar mechanisms for accumulating attention over time and changing the appearance of marks in response. We also present results from a qualitative evaluation studying visual feedback and triggering mechanisms for capturing and revisualizing attention.
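
Component (2), tracking attention over time, might look like the following sketch: per-mark scores that decay when unattended and grow on gaze hits, with low-score marks flagged for highlighting. The decay rate and threshold are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of accumulating per-mark attention over time with
# exponential decay; the rates and threshold are assumed, not the paper's.
def update_attention(attention, gazed_marks, decay=0.98, gain=0.05):
    """attention: dict mark_id -> [0, 1] score; gazed_marks: ids hit this frame."""
    for mark_id in attention:
        attention[mark_id] *= decay                      # fade when unattended
    for mark_id in gazed_marks:
        attention[mark_id] = min(1.0, attention[mark_id] + gain)
    return attention

def unseen(attention, threshold=0.2):
    """Marks below the threshold are candidates for highlighting."""
    return [m for m, a in attention.items() if a < threshold]

scores = {m: 0.0 for m in ["bar_1", "bar_2", "bar_3"]}
for _ in range(30):                                      # 30 frames of gaze
    scores = update_attention(scores, gazed_marks=["bar_1"])
print(unseen(scores))                                    # ['bar_2', 'bar_3']
```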

Cross-Language Evolution of Divergent Collective Memory Around the Arab Spring

Authors: H. Laurie Jones, Brian C. Keegan

Link: http://arxiv.org/abs/2404.10706v1

Abstract: The Arab Spring was a historic set of protests beginning in 2011 that toppled governments and led to major conflicts. Collective memories of events like these can vary significantly across social contexts in response to political, cultural, and linguistic factors. While Wikipedia plays an important role in documenting both historic and current events, little attention has been given to how Wikipedia articles, created in the aftermath of major events, continue to evolve over years or decades. Using the archived content of Arab Spring-related topics across the Arabic and English Wikipedias between 2011 and 2024, we define and evaluate multilingual measures of event salience, deliberation, contextualization, and consolidation of collective memory surrounding the Arab Spring. Our findings about the temporal evolution of the Wikipedia articles' content similarity across languages have implications for theorizing about online collective memory processes and evaluating linguistic models trained on these data.

MathWriting: A Dataset For Handwritten Mathematical Expression Recognition

Authors: Philippe Gervais, Asya Fadeeva, Andrii Maksai

Link: http://arxiv.org/abs/2404.10690v1

Abstract: We introduce MathWriting, the largest online handwritten mathematical expression dataset to date. It consists of 230k human-written samples and an additional 400k synthetic ones. MathWriting can also be used for offline HME recognition and is larger than all existing offline HME datasets like IM2LATEX-100K. We introduce a benchmark based on MathWriting data in order to advance research on both online and offline HME recognition.

PD-Insighter: A Visual Analytics System to Monitor Daily Actions for Parkinson's Disease Treatment

Authors: Jade Kandel, Chelsea Duppen, Qian Zhang, Howard Jiang, Angelos Angelopoulos, Ashley Neall, Pranav Wagh, Daniel Szafir, Henry Fuchs, Michael Lewek, Danielle Albers Szafir

Link: http://arxiv.org/abs/2404.10661v1

Abstract: People with Parkinson's Disease (PD) can slow the progression of their symptoms with physical therapy. However, clinicians lack insight into patients' motor function during daily life, preventing them from tailoring treatment protocols to patient needs. This paper introduces PD-Insighter, a system for comprehensive analysis of a person's daily movements for clinical review and decision-making. PD-Insighter provides an overview dashboard for discovering motor patterns and identifying critical deficits during activities of daily living and an immersive replay for closely studying the patient's body movements with environmental context. Developed using an iterative design study methodology in consultation with clinicians, we found that PD-Insighter's ability to aggregate and display data with respect to time, actions, and local environment enabled clinicians to assess a person's overall functioning during daily life outside the clinic. PD-Insighter's design offers future guidance for generalized multiperspective body motion analytics, which may significantly improve clinical decision-making and slow the functional decline of PD and other medical conditions.

Authors: Julieana Moon, Naimul Khan

Link: http://arxiv.org/abs/2404.10649v1

Abstract: Within the evolving field of digital intervention, serious games emerge as promising tools for evidence-based interventions. Research indicates that gamified therapy, whether employed independently or in conjunction with online psychoeducation or traditional programs, proves more efficacious in delivering care to patients. As we navigate the intricate realm of serious game design, bridging the gap between therapeutic approaches and creative design proves complex. Professionals in clinical and research roles demonstrate innovative thinking yet face challenges in executing engaging therapeutic serious games due to a lack of specialized design skills and knowledge. Thus, a larger question remains: how might we aid professionals in clinical and research roles, and educate them about the importance of game design in supporting their innovative therapeutic approaches? This study examines potential solutions aimed at facilitating the integration of gamification design principles into clinical study protocols, a pivotal aspect of aligning therapeutic practices with captivating narratives in the pursuit of innovative interventions. We propose two solutions: a flow-chart framework for serious games, or a comprehensive reference document encompassing gamification design principles and guidelines for best design practices. Through an examination of the literature, we observed that design decisions varied across studies. We therefore propose that the second solution, a comprehensive reference design guide, is the more versatile and adaptable.

A Longitudinal Study of Child Wellbeing Assessment via Online Interactions with a Social Robot

Authors: Nida Itrat Abbasi, Guy Laban, Tasmin Ford, Peter B. Jones, Hatice Gunes

Link: http://arxiv.org/abs/2404.10593v1

Abstract: Socially Assistive Robots are studied in different Child-Robot Interaction settings. However, logistical constraints limit accessibility, particularly affecting timely support for mental wellbeing. In this work, we have investigated whether online interactions with a robot can be used for the assessment of mental wellbeing in children. The children (N=40, 20 girls and 20 boys; 8-13 years) interacted with the Nao robot (30-45 mins) over three sessions, at least a week apart. Audio-visual recordings were collected throughout the sessions that concluded with the children answering user perception questionnaires pertaining to their anxiety towards the robot, and the robot's abilities. We divided the participants into three wellbeing clusters (low, med and high tertiles) using their responses to the Short Moods and Feelings Questionnaire (SMFQ) and further analysed how their wellbeing and their perceptions of the robot changed over the wellbeing tertiles, across sessions and across participants' gender. Our primary findings suggest that (I) online mediated-interactions with robots can be effective in assessing children's mental wellbeing over time, and (II) children's overall perception of the robot either improved or remained consistent across time. Supplementary exploratory analyses have also revealed that gender affected the children's wellbeing assessments as well as their perceptions of the robot.

Learning Symbolic Task Representation from a Human-Led Demonstration: A Memory to Store, Retrieve, Consolidate, and Forget Experiences

Authors: Luca Buoncompagni, Fulvio Mastrogiovanni

Link: http://arxiv.org/abs/2404.10591v1

Abstract: We present a symbolic learning framework inspired by cognitive-like memory functionalities (i.e., storing, retrieving, consolidating and forgetting) to generate task representations to support high-level task planning and knowledge bootstrapping. We address a scenario involving a non-expert human, who performs a single task demonstration, and a robot, which online learns structured knowledge to re-execute the task based on experiences, i.e., observations. We consider a one-shot learning process based on non-annotated data to store an intelligible representation of the task, which can be refined through interaction, e.g., via verbal or visual communication. Our general-purpose framework relies on fuzzy Description Logic, which has been used to extend the previously developed Scene Identification and Tagging algorithm. In this paper, we exploit such an algorithm to implement cognitive-like memory functionalities employing scores that rank memorised observations over time based on simple heuristics. Our main contribution is the formalisation of a framework that can be used to systematically investigate different heuristics for bootstrapping hierarchical knowledge representations based on robot observations. Through an illustrative assembly task scenario, the paper presents the performance of our framework to discuss its benefits and limitations.
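
As an illustration of heuristic memory scoring of the kind described, the sketch below ranks memorised observations by retrieval count and age; the score form and half-life are assumptions, not the framework's actual heuristics.

```python
# Assumed illustration (not the framework's heuristics): a memory score
# that rewards retrieval and decays with age, driving consolidate/forget
# decisions over memorised observations.
import math

def memory_score(retrievals: int, age_s: float, half_life_s: float = 600.0) -> float:
    """Higher with more retrievals, lower as the observation ages."""
    return (1 + retrievals) * math.exp(-age_s * math.log(2) / half_life_s)

def triage(memories, keep_threshold=0.5):
    """Split observations into kept (consolidated) and forgotten."""
    kept = [m for m in memories if memory_score(m["hits"], m["age"]) >= keep_threshold]
    forgotten = [m for m in memories if m not in kept]
    return kept, forgotten

mems = [{"id": "grasp-cube", "hits": 3, "age": 120.0},
        {"id": "idle-pose", "hits": 0, "age": 3000.0}]
kept, forgotten = triage(mems)   # keeps the frequently retrieved observation
```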

The Evolution of Learning: Assessing the Transformative Impact of Generative AI on Higher Education

Authors: Stefanie Krause, Bhumi Hitesh Panchal, Nikhil Ubhe

Link: http://arxiv.org/abs/2404.10551v1

Abstract: Generative Artificial Intelligence (GAI) models such as ChatGPT have experienced a surge in popularity, attracting 100 million active users in 2 months and generating an estimated 10 million daily queries. Despite this remarkable adoption, there remains a limited understanding of the extent to which this innovative technology influences higher education. This research paper investigates the impact of GAI on university students and Higher Education Institutions (HEIs). The study adopts a mixed-methods approach, combining a comprehensive survey with scenario analysis to explore the potential benefits, drawbacks, and transformative changes the new technology brings. Using an online survey with 130 participants, we assessed students' perspectives and attitudes concerning present ChatGPT usage in academics. Results show that students use the current technology for tasks like assignment writing and exam preparation and believe it to be an effective aid in achieving academic goals. A subsequent scenario analysis projected potential future scenarios, providing valuable insights into the possibilities and challenges associated with incorporating GAI into higher education. The main motivation is to gain a tangible and precise understanding of the potential consequences for HEIs and to provide guidance for responding to the evolving learning environment. The findings indicate that irresponsible and excessive use of the technology could result in significant challenges. Hence, HEIs must develop stringent policies, reevaluate learning objectives, upskill their lecturers, adjust the curriculum, and reconsider examination approaches.

BDAN: Mitigating Temporal Difference Across Electrodes in Cross-Subject Motor Imagery Classification via Generative Bridging Domain

Authors: Zhige Chen, Rui Yang, Mengjie Huang, Chengxuan Qin, Zidong Wang

Link: http://arxiv.org/abs/2404.10494v1

Abstract: Because of the non-repeatability of experiment settings and conditions and the variability of brain patterns among subjects, the data distributions across sessions and electrodes differ in cross-subject motor imagery (MI) studies, ultimately reducing the performance of the classification model. Drawing on a systematic summary of existing studies, this paper investigates a novel temporal-electrode data distribution problem under both intra-subject and inter-subject scenarios. To address this issue, a novel bridging domain adaptation network (BDAN) is proposed, aiming to minimise the data distribution difference across sessions at the electrode level, thus improving and enhancing model performance. In the proposed BDAN, deep features of all the EEG data are extracted via a specially designed spatial feature extractor. With the obtained spatio-temporal features, a special generative bridging domain is established, bridging the data from all the subjects across sessions. The difference across sessions and electrodes is then minimised using the customised bridging loss functions, and the known knowledge is automatically transferred through the constructed bridging domain. To show the effectiveness of the proposed BDAN, comparison experiments and ablation studies are conducted on a public EEG dataset. The overall comparison results demonstrate the superior performance of the proposed BDAN compared with other advanced deep learning and domain adaptation methods.
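
The paper's customised bridging losses are not detailed in the abstract; as a generic stand-in for the idea of penalising cross-session distribution mismatch, the following sketch uses a linear-kernel maximum mean discrepancy term (a swapped-in technique, not BDAN's loss).

```python
# Generic stand-in for a distribution-alignment penalty (not the paper's
# bridging loss): a linear-kernel MMD between two sessions' features.
import numpy as np

def linear_mmd(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Squared distance between the mean feature vectors of two domains."""
    gap = feats_a.mean(axis=0) - feats_b.mean(axis=0)
    return float(gap @ gap)

rng = np.random.default_rng(1)
session_1 = rng.normal(0.0, 1.0, size=(64, 16))   # features from session 1
session_2 = rng.normal(0.5, 1.0, size=(64, 16))   # shifted session 2
# Training would add a term like `linear_mmd(...)` to the classification
# loss so the feature extractor learns session-invariant representations.
print(linear_mmd(session_1, session_2))
```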

Synthetic vs Human Emotional Faces: What Changes in Humans' Decoding Accuracy

Authors: Terry Amorese, Marialucia Cuciniello, Alessandro Vinciarelli, Gennaro Cordasco, Anna Esposito

Link: http://arxiv.org/abs/2404.10435v1

Abstract: Considering the increasing use of assistive technologies in the shape of virtual agents, it is necessary to investigate the factors that characterize and affect the interaction between the user and the agent; among these is the way people interpret and decode synthetic emotions, i.e., emotional expressions conveyed by virtual agents. To this end, we report a study involving 278 participants split into differently aged groups (young, middle-aged, and elders). Within each age group, some participants were administered a naturalistic decoding task, a recognition task of human emotional faces, while others were administered a synthetic decoding task, namely emotional expressions conveyed by virtual agents. Participants were required to label pictures of female and male humans or virtual agents of different ages (young, middle-aged, and old) displaying static expressions of disgust, anger, sadness, fear, happiness, surprise, and neutrality. Results showed that young participants had better recognition performance (compared to older groups) for anger, sadness, and neutrality, while female participants had better recognition performance (compared to males) for sadness, fear, and neutrality; sadness and fear were better recognized when conveyed by real human faces, while happiness, surprise, and neutrality were better recognized when represented by virtual agents. Young faces were better decoded when expressing anger and surprise, middle-aged faces were better decoded when expressing sadness, fear, and happiness, while old faces were better decoded in the case of disgust; on average, female faces were better decoded compared to male ones.

CanvasPic: An Interactive Tool for Freely Generating Facial Images Based on Spatial Layout

Authors: Jiafu Wei, Chia-Ming Chang, Xi Yang, Takeo Igarashi

Link: http://arxiv.org/abs/2404.10352v1

Abstract: In real-world usage, existing GAN image generation tools come up short due to their lack of intuitive interfaces and limited flexibility. To overcome these limitations, we developed CanvasPic, an innovative tool for flexible GAN image generation. Our tool introduces a novel 2D layout design that allows users to intuitively control image attributes based on real-world images. By interacting with the distances between images in the spatial layout, users are able to conveniently control the influence of each attribute on the target image and explore a wide range of generated results. Considering practical application scenarios, a user study involving 24 participants was conducted to compare our tool with existing tools in GAN image generation. The results of the study demonstrate that our tool significantly enhances the user experience, enabling more effective achievement of desired generative results.
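
One plausible reading of the distance-based control (an assumption; the exact formulation is not given in the abstract) is inverse-distance weighting of attribute latent codes, sketched below.

```python
# A guess at the core mapping (the paper's exact formulation is not given
# here): attribute images closer to the target in the 2D canvas contribute
# more to the generated latent code, via inverse-distance weighting.
import numpy as np

def blend_latents(target_pos, attr_positions, attr_latents, eps=1e-6):
    """target_pos: (2,); attr_positions: (n, 2); attr_latents: (n, d)."""
    d = np.linalg.norm(attr_positions - target_pos, axis=1)
    w = 1.0 / (d + eps)
    w /= w.sum()                      # normalized inverse-distance weights
    return w @ attr_latents           # weighted blend in latent space

pos = np.array([0.5, 0.5])
anchors = np.array([[0.1, 0.5], [0.9, 0.5]])        # two attribute images
latents = np.stack([np.ones(8), -np.ones(8)])       # their latent codes
z = blend_latents(pos, anchors, latents)            # roughly balanced blend
```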

The Dearth of the Author in AI-Supported Writing

Authors: Max Kreminski

Link: http://arxiv.org/abs/2404.10289v1

Abstract: We diagnose and briefly discuss the dearth of the author: a condition that arises when AI-based creativity support tools for writing allow users to produce large amounts of text without making a commensurate number of creative decisions, resulting in output that is sparse in expressive intent. We argue that the dearth of the author helps to explain a number of recurring difficulties and anxieties around AI-based writing support tools, but that it also suggests an ambitious new goal for AI-based CSTs.

AI-Assisted Writing in Education: Ecosystem Risks and Mitigations

Authors: Antonette Shibani, Simon Buckingham Shum

Link: http://arxiv.org/abs/2404.10281v1

Abstract: While the excitement around the capabilities of technological advancements is giving rise to new AI-based writing assistants, the overarching ecosystem plays a crucial role in how they are adopted in educational practice. In this paper, we point to key ecological aspects for consideration. We draw insights from extensive research integrated with practice on a writing feedback tool over 9 years at a university, and we highlight potential risks when these are overlooked. It informs the design of educational writing support tools to be better aligned within broader contexts to balance innovation with practical impact.

CO-oPS: A Mobile App for Community Oversight of Privacy and Security

Authors: Mamtaj Akter, Leena Alghamdi, Dylan Gillespie, Nazmus Miazi, Jess Kropczynski, Heather Lipford, Pamela Wisniewski

Link: http://arxiv.org/abs/2404.10258v1

Abstract: Smartphone users install numerous mobile apps that require access to different information from their devices. Much of this information is very sensitive, and users often struggle to manage these accesses due to their lack of tech expertise and knowledge regarding mobile privacy. Thus, they often seek help from others to make decisions regarding their mobile privacy and security. We embedded these social processes in a mobile app titled "CO-oPS" ("Community Oversight for Privacy and Security"). CO-oPS allows trusted community members to review one another's installed apps and the permissions granted to those apps. Community members can provide feedback to one another regarding their privacy behaviors. Users are also allowed to hide some of their mobile apps that they do not want others to see, preserving their personal privacy.

AniFrame: A Programming Language for 2D Drawing and Frame-Based Animation

Authors: Mark Edward M. Gonzales, Hans Oswald A. Ibrahim, Elyssia Barrie H. Ong, Ryan Austin Fernandez

Link: http://arxiv.org/abs/2404.10250v1

Abstract: Creative coding is an experimentation-heavy activity that requires translating high-level visual ideas into code. However, most languages and libraries for creative coding may not be adequately intuitive for beginners. In this paper, we present AniFrame, a domain-specific language for drawing and animation. Designed for novice programmers, it (i) features animation-specific data types, operations, and built-in functions to simplify the creation and animation of composite objects, (ii) allows for fine-grained control over animation sequences through explicit specification of the target object and the start and end frames, (iii) reduces the learning curve through a Python-like syntax, type inferencing, and a minimal set of control structures and keywords that map closely to their semantic intent, and (iv) promotes computational expressivity through support for common mathematical operations, built-in trigonometric functions, and user-defined recursion. Our usability test demonstrates AniFrame's potential to enhance readability and writability for multiple creative coding use cases. AniFrame is open-source, and its implementation and reference are available at https://github.com/memgonzales/aniframe-language.
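
The following Python sketch is not AniFrame syntax (see the linked repository for the actual language reference); it only illustrates the frame-based model the abstract describes, where an animation binds a property change to an explicit start and end frame.

```python
# Not AniFrame code; a Python illustration of the frame-based model the
# language exposes: a property change bound to an explicit frame range.
def animate(obj, prop, start_frame, end_frame, start_val, end_val):
    """Yield (frame, object) pairs, linearly interpolating the property."""
    span = end_frame - start_frame
    for f in range(start_frame, end_frame + 1):
        t = (f - start_frame) / span
        yield f, {**obj, prop: start_val + t * (end_val - start_val)}

circle = {"shape": "circle", "x": 0.0, "radius": 10}
frames = list(animate(circle, "x", start_frame=0, end_frame=30,
                      start_val=0.0, end_val=120.0))
# frames[0] has x == 0.0; frames[-1] has x == 120.0.
```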

Impostor Syndrome in Final Year Computer Science Students: An Eye Tracking and Biometrics Study

Authors: Alyssia Chen, Carol Wong, Katy Tarrit, Anthony Peruma

Link: http://arxiv.org/abs/2404.10194v1

Abstract: Imposter syndrome is a psychological phenomenon that affects individuals who doubt their skills and abilities, despite possessing the necessary competencies. This can lead to a lack of confidence and poor performance. While research has explored the impacts of imposter syndrome on students and professionals in various fields, there is limited knowledge on how it affects code comprehension in software engineering. In this exploratory study, we investigate the prevalence of imposter syndrome among final-year undergraduate computer science students and its effects on their code comprehension cognition using an eye tracker and heart rate monitor. Key findings demonstrate that students identifying as male exhibit lower imposter syndrome levels when analyzing code, and higher imposter syndrome is associated with increased time reviewing a code snippet and a lower likelihood of solving it correctly. This study provides initial data on this topic and establishes a foundation for further research to support student academic success and improve developer productivity and mental well-being.

2024-04-15

SoK (or SoLK?): On the Quantitative Study of Sociodemographic Factors and Computer Security Behaviors

Authors: Miranda Wei, Jaron Mink, Yael Eiger, Tadayoshi Kohno, Elissa M. Redmiles, Franziska Roesner

Link: http://arxiv.org/abs/2404.10187v1

Abstract: Researchers are increasingly exploring how gender, culture, and other sociodemographic factors correlate with user computer security and privacy behaviors. To more holistically understand relationships between these factors and behaviors, we make two contributions. First, we broadly survey existing scholarship on sociodemographics and secure behavior (151 papers) before conducting a focused literature review of 47 papers to synthesize what is currently known and identify open questions for future research. Second, by incorporating contemporary social and critical theories, we establish guidelines for future studies of sociodemographic factors and security behaviors that address how to overcome common pitfalls. We present an at-scale case study demonstrating our guidelines in action: a measurement study of the relationships between sociodemographics and de-identified, aggregated log data of security and privacy behaviors among 16,829 users on Facebook across 16 countries. Through these contributions, we position our work as a systemization of a lack of knowledge (SoLK). Overall, we find contradictory results and vast unknowns about how identity shapes security behavior. Through our guidelines and discussion, we chart new directions to more deeply examine how and why sociodemographic factors affect security behaviors.

EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

Authors: Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta

Link: http://arxiv.org/abs/2404.10163v1

Abstract: From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention "on average", so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.

How Users Experience Closed Captions on Live Television: Quality Metrics Remain a Challenge

Authors: Mariana Arroyo Chavez, Molly Feanny, Matthew Seita, Bernard Thompson, Keith Delk, Skyler Officer, Abraham Glasser, Raja Kushalnagar, Christian Vogler

Link: http://arxiv.org/abs/2404.10153v1

Abstract: This paper presents a mixed methods study on how deaf, hard of hearing and hearing viewers perceive live TV caption quality with captioned video stimuli designed to mirror TV captioning experiences. To assess caption quality, we used four commonly-used quality metrics focusing on accuracy: word error rate, weighted word error rate, automated caption evaluation (ACE), and its successor ACE2. We calculated the correlation between the four quality metrics and viewer ratings for subjective quality and found that the correlation was weak, revealing that other factors besides accuracy affect user ratings. Additionally, even high-quality captions are perceived to have problems, despite controlling for confounding factors. Qualitative analysis of viewer comments revealed three major factors affecting their experience: Errors within captions, difficulty in following captions, and caption appearance. The findings raise questions as to how objective caption quality metrics can be reconciled with the user experience across a diverse spectrum of viewers.
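
Of the four metrics above, word error rate is the simplest: a word-level edit distance normalised by reference length. A standard implementation for reference (the weighted variant and ACE/ACE2 are more involved and not reproduced here):

```python
# Word error rate, the first of the four accuracy metrics above, computed
# as word-level Levenshtein distance normalised by reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)

print(word_error_rate("the quick brown fox", "the quick brown socks"))  # 0.25
```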

Epigraphics: Message-Driven Infographics Authoring

Authors: Tongyu Zhou, Jeff Huang, Gromit Yeuk-Yin Chan

Link: http://arxiv.org/abs/2404.10152v1

Abstract: The message a designer wants to convey plays a pivotal role in directing the design of an infographic, yet most authoring workflows start with creating the visualizations or graphics first without gauging whether they fit the message. To address this gap, we propose Epigraphics, a web-based authoring system that treats an "epigraph" as the first-class object and uses it to guide infographic asset creation, editing, and syncing. The system uses the text-based message to recommend visualizations, graphics, data filters, color palettes, and animations. It further supports between-asset interactions and fine-tuning such as recoloring, highlighting, and animation syncing that enhance the aesthetic cohesiveness of the assets. A gallery and case studies show that our system can produce infographics inspired by existing popular ones, and a task-based usability study with 10 designers shows that a text-sourced workflow can standardize content, empower users to think more about the big picture, and facilitate rapid prototyping.

Shaping Realities: Enhancing 3D Generative AI with Fabrication Constraints

Authors: Faraz Faruqi, Yingtao Tian, Vrushank Phadnis, Varun Jampani, Stefanie Mueller

Link: http://arxiv.org/abs/2404.10142v1

Abstract: Generative AI tools are becoming more prevalent in 3D modeling, enabling users to manipulate or create new models with text or images as inputs. This makes it easier for users to rapidly customize and iterate on their 3D designs and explore new creative ideas. These methods focus on the aesthetic quality of the 3D models, refining them to look similar to the prompts provided by the user. However, when creating 3D models intended for fabrication, designers need to trade off the aesthetic qualities of a 3D model against its intended physical properties. To be functional post-fabrication, 3D models have to satisfy structural constraints informed by physical principles. Currently, such requirements are not enforced by generative AI tools. This leads to the development of aesthetically appealing but potentially non-functional 3D geometry that would be hard to fabricate and use in the real world. This workshop paper highlights the limitations of generative AI tools in translating digital creations into the physical world and proposes new augmentations to generative AI tools for creating physically viable 3D models. We advocate for the development of tools that manipulate or generate 3D models by considering not only the aesthetic appearance but also physical properties as constraints. This exploration seeks to bridge the gap between digital creativity and real-world applicability, extending the creative potential of generative AI into the tangible domain.

Design and Implementation of a Java-Based Client-Server Application

Authors: Omkar Patil, Aarya Shirbhate

Link: http://arxiv.org/abs/2404.10107v1

Abstract: This report details the development of a networked distributed system named Group Communication System (GCS), implemented in Java to exemplify socket programming and communication protocols. GCS facilitates group-based client-server communication through a command-line interface (CLI), enabling seamless group interaction and management. The project emphasizes fault tolerance, design patterns, and version control system (VCS) utilization. The report offers insights into system architecture, implementation, and practical considerations, providing a comprehensive understanding of distributed systems' technical background and operational aspects.
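
The GCS itself is written in Java; purely to illustrate the socket request-reply pattern such a client-server system rests on, here is a minimal Python sketch (the command string is hypothetical):

```python
# The GCS implementation is Java; this Python sketch shows only the basic
# socket request-reply pattern underneath a CLI client-server system.
import socket
import threading
import time

def serve_once(host="127.0.0.1", port=5050):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            msg = conn.recv(1024).decode()
            conn.sendall(f"ack: {msg}".encode())   # acknowledge the command

threading.Thread(target=serve_once, daemon=True).start()
time.sleep(0.2)                                    # let the server bind first
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", 5050))
    cli.sendall(b"JOIN group-1")                   # hypothetical GCS command
    print(cli.recv(1024).decode())                 # -> "ack: JOIN group-1"
```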

CFlow: Supporting Semantic Flow Analysis of Students' Code in Programming Problems at Scale

Authors: Ashley Ge Zhang, Xiaohang Tang, Steve Oney, Yan Chen

Link: http://arxiv.org/abs/2404.10089v1

Abstract: The high demand for computer science education has led to high enrollments, with thousands of students in many introductory courses. In such large courses, it can be overwhelmingly difficult for instructors to understand class-wide problem-solving patterns or issues, which is crucial for improving instruction and addressing important pedagogical challenges. In this paper, we propose a technique and system, CFlow, for creating understandable and navigable representations of code at scale. CFlow is able to represent thousands of code samples in a visualization that resembles a single code sample. CFlow creates scalable code representations by (1) clustering individual statements with similar semantic purposes, (2) presenting clustered statements in a way that maintains semantic relationships between statements, (3) representing the correctness of different variations as a histogram, and (4) allowing users to navigate through solutions interactively using semantic filters. With a multi-level view design, users can navigate high-level patterns and low-level implementations. This is in contrast to prior tools that either limit their focus to isolated statements (and thus discard the surrounding context of those statements) or cluster entire code samples (which can lead to large numbers of clusters -- for example, if there are n code features and m implementations of each, there can be m^n clusters). We evaluated the effectiveness of CFlow with a comparison study and found that participants using CFlow spent only half the time identifying mistakes and recalled twice as many desired patterns from over 6,000 submissions.

Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems

Authors: Clemencia Siro, Mohammad Aliannejadi, Maarten de Rijke

Link: http://arxiv.org/abs/2404.09980v1

Abstract: Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems (TDSs). Obtaining high-quality and consistent ground-truth labels from annotators presents challenges. When evaluating a TDS, annotators must fully comprehend the dialogue before providing judgments. Previous studies suggest using only a portion of the dialogue context in the annotation process. However, the impact of this limitation on label quality remains unexplored. This study investigates the influence of dialogue context on annotation quality, considering the truncated context for relevance and usefulness labeling. We further propose to use large language models (LLMs) to summarize the dialogue context to provide a rich and short description of the dialogue context and study the impact of doing so on the annotator's performance. Reducing context leads to more positive ratings. Conversely, providing the entire dialogue context yields higher-quality relevance ratings but introduces ambiguity in usefulness ratings. Using the first user utterance as context leads to consistent ratings, akin to those obtained using the entire dialogue, with significantly reduced annotation effort. Our findings show how task design, particularly the availability of dialogue context, affects the quality and consistency of crowdsourced evaluation labels.

Interaction as Explanation: A User Interaction-based Method for Explaining Image Classification Models

Authors: Hyeonggeun Yun

Link: http://arxiv.org/abs/2404.09828v1

Abstract: In computer vision, explainable AI (xAI) methods seek to mitigate the 'black-box' problem by making the decision-making process of deep learning models more interpretable and transparent. Traditional xAI methods concentrate on visualizing input features that influence model predictions, providing insights primarily suited for experts. In this work, we present an interaction-based xAI method that enhances user comprehension of image classification models through their interaction. Thus, we developed a web-based prototype allowing users to modify images via painting and erasing, thereby observing changes in classification results. Our approach enables users to discern critical features influencing the model's decision-making process, aligning their mental models with the model's logic. Experiments conducted with five images demonstrate the potential of the method to reveal feature importance through user interaction. Our work contributes a novel perspective to xAI by centering on end-user engagement and understanding, paving the way for more intuitive and accessible explainability in AI systems.
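
The interaction loop described, edit the image, re-classify, and surface the change, can be sketched as follows; the region-erasing box and the toy classifier are illustrative assumptions, not the prototype's code.

```python
# A schematic of the interaction loop (assumed, not the authors'
# prototype): erase a user-selected region, re-classify, and report how
# the prediction confidence shifts, the signal shown back to the user.
import numpy as np

def erase_and_compare(image, region, classify, fill=0.0):
    """image: (H, W, C) array; region: (y0, y1, x0, x1) user-erased box;
    classify: callable mapping an image to a class-probability vector."""
    before = classify(image)
    edited = image.copy()
    y0, y1, x0, x1 = region
    edited[y0:y1, x0:x1, :] = fill            # simulate the erasing stroke
    after = classify(edited)
    top = int(np.argmax(before))
    return top, before[top] - after[top]      # confidence drop for top class

# Stub classifier: "confidence" is just mean brightness, for illustration.
toy_classify = lambda img: np.array([img.mean(), 1 - img.mean()])
cls, drop = erase_and_compare(np.ones((8, 8, 3)), (0, 4, 0, 4), toy_classify)
```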

Bridging the Gap: Advancements in Technology to Support Dementia Care -- A Scoping Review

Authors: Yong Ma, Oda Elise Nordberg, Jessica Hubbers, Yuchong Zhang, Arvid Rongve, Miroslav Bachinski, Morten Fjeld

Link: http://arxiv.org/abs/2404.09685v1

Abstract: Dementia has serious consequences for the daily life of the person affected due to the decline in their cognitive, behavioral, and functional abilities. Caring for people living with dementia can be challenging and distressing. Innovative solutions are becoming essential to enrich the lives of those impacted and alleviate caregiver burdens. This scoping review, spanning literature from 2010 to July 2023 in the field of Human-Computer Interaction (HCI), offers a comprehensive look at how interactive technology contributes to dementia care. Emphasizing technology's role in addressing the unique needs of people with dementia (PwD) and their caregivers, this review encompasses assistive devices, mobile applications, sensors, and GPS tracking. Delving into challenges encountered in clinical and home-care settings, it succinctly outlines the influence of cutting-edge technologies, such as wearables, virtual reality, robots, and artificial intelligence, in supporting individuals with dementia and their caregivers. We categorize current dementia-related technologies into six groups based on their intended use and function: 1) daily life monitoring, 2) daily life support, 3) social interaction and communication, 4) well-being enhancement, 5) cognitive support, and 6) caregiver support.

AAM-VDT: Vehicle Digital Twin for Tele-Operations in Advanced Air Mobility

Authors: Tuan Anh Nguyen, Taeho Kwag, Vinh Pham, Viet Nghia Nguyen, Jeongseok Hyun, Minseok Jang, Jae-Woo Lee

Link: http://arxiv.org/abs/2404.09621v1

Abstract: This study advanced tele-operations in Advanced Air Mobility (AAM) through the creation of a Vehicle Digital Twin (VDT) system for eVTOL aircraft, tailored to enhance remote control safety and efficiency, especially for Beyond Visual Line of Sight (BVLOS) operations. By synergizing digital twin technology with immersive Virtual Reality (VR) interfaces, we notably elevate situational awareness and control precision for remote operators. Our VDT framework integrates immersive tele-operation with a high-fidelity aerodynamic database, essential for authentically simulating flight dynamics and control tactics. At the heart of our methodology lies an eVTOL's high-fidelity digital replica, placed within a simulated reality that accurately reflects physical laws, enabling operators to manage the aircraft via a master-slave dynamic, substantially outperforming traditional 2D interfaces. The architecture of the designed system ensures seamless interaction between the operator, the digital twin, and the actual aircraft, facilitating exact, instantaneous feedback. Experimental assessments, involving propulsion data gathering, simulation database fidelity verification, and tele-operation testing, verify the system's capability in precise control command transmission and maintaining the digital-physical eVTOL synchronization. Our findings underscore the VDT system's potential in augmenting AAM efficiency and safety, paving the way for broader digital twin application in autonomous aerial vehicles.

Using Tangible Interaction to Design Musicking Artifacts for Non-musicians

Authors: Lucía Montesinos, Halfdan Hauch Jensen, Anders Sundnes Løvlie

Link: http://arxiv.org/abs/2404.09597v1

Abstract: This paper presents a Research through Design exploration of the potential for using tangible interactions to enable active music experiences - musicking - for non-musicians. We present the Tubularium prototype, which aims to help non-musicians play music without requiring any initial skill. We present the initial design of the prototype and the features implemented in order to enable music-making by non-musicians, and offer some reflections based on observations of informal initial user explorations of the prototype.

Joint Contrastive Learning with Feature Alignment for Cross-Corpus EEG-based Emotion Recognition

Authors: Qile Liu, Zhihao Zhou, Jiyuan Wang, Zhen Liang

Link: http://arxiv.org/abs/2404.09559v1

Abstract: The integration of human emotions into multimedia applications shows great potential for enriching user experiences and enhancing engagement across various digital platforms. Unlike traditional methods such as questionnaires, facial expressions, and voice analysis, brain signals offer a more direct and objective understanding of emotional states. However, in the field of electroencephalography (EEG)-based emotion recognition, previous studies have primarily concentrated on training and testing EEG models within a single dataset, overlooking the variability across different datasets. This oversight leads to significant performance degradation when applying EEG models to cross-corpus scenarios. In this study, we propose a novel Joint Contrastive learning framework with Feature Alignment (JCFA) to address cross-corpus EEG-based emotion recognition. The JCFA model operates in two main stages. In the pre-training stage, a joint domain contrastive learning strategy is introduced to characterize generalizable time-frequency representations of EEG signals, without the use of labeled data. It extracts robust time-based and frequency-based embeddings for each EEG sample, and then aligns them within a shared latent time-frequency space. In the fine-tuning stage, JCFA is refined in conjunction with downstream tasks, where the structural connections among brain electrodes are considered. The model's capability could thus be further enhanced for application in emotion detection and interpretation. Extensive experimental results on two well-recognized emotional datasets show that the proposed JCFA model achieves state-of-the-art (SOTA) performance, outperforming the second-best method by an average accuracy increase of 4.09% in cross-corpus EEG-based emotion recognition tasks.
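
A minimal sketch of the pre-training alignment step, assuming a standard symmetric InfoNCE-style contrastive objective between time-domain and frequency-domain embeddings of the same EEG sample (the released JCFA implementation may differ):

```python
import torch
import torch.nn.functional as F

def alignment_loss(z_time, z_freq, temperature=0.1):
    # Normalize both views, then treat matching rows as positive pairs.
    z_time = F.normalize(z_time, dim=1)
    z_freq = F.normalize(z_freq, dim=1)
    logits = z_time @ z_freq.T / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_time.size(0))     # positives on the diagonal
    # Pull each sample's two views together, push other samples apart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

z_t = torch.randn(32, 128)   # time-domain embeddings for a batch
z_f = torch.randn(32, 128)   # frequency-domain embeddings for the same batch
print(alignment_loss(z_t, z_f).item())
```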

Novel entropy difference-based EEG channel selection technique for automated detection of ADHD

Authors: Shishir Maheshwari, Kandala N V P S Rajesh, Vivek Kanhangad, U Rajendra Acharya, T Sunil Kumar

Link: http://arxiv.org/abs/2404.09493v1

Abstract: Attention deficit hyperactivity disorder (ADHD) is one of the most common neurodevelopmental disorders in children. This paper presents an automated approach for ADHD detection using the proposed entropy difference (EnD)-based electroencephalogram (EEG) channel selection approach. In the proposed approach, we first select the most significant EEG channels for the accurate identification of ADHD using the EnD-based channel selection approach. Second, a set of features is extracted from the selected channels and fed to a classifier. To verify the effectiveness of the channels selected, we explored three sets of features and classifiers. More specifically, we explored discrete wavelet transform (DWT), empirical mode decomposition (EMD), and symmetrically-weighted local binary pattern (SLBP)-based features. To perform automated classification, we used k-nearest neighbor (k-NN), ensemble, and support vector machine (SVM) classifiers. Our proposed approach yielded the highest accuracy of 99.29% on the public database. In addition, the proposed EnD-based channel selection has consistently provided better classification accuracies than the entropy-based channel selection approach. Also, the developed method
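
One plausible reading of entropy-difference channel selection, sketched with histogram-based Shannon entropy and random stand-in EEG data (the paper's exact EnD formulation may differ): rank channels by the gap between their average entropy in ADHD and control recordings.

```python
import numpy as np

def shannon_entropy(signal, bins=32):
    hist, _ = np.histogram(signal, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
adhd = rng.normal(size=(50, 19, 512))      # trials x channels x samples
control = rng.normal(size=(50, 19, 512))

end_scores = np.array([
    abs(np.mean([shannon_entropy(trial) for trial in adhd[:, ch]]) -
        np.mean([shannon_entropy(trial) for trial in control[:, ch]]))
    for ch in range(adhd.shape[1])
])
top_channels = np.argsort(end_scores)[::-1][:6]  # keep the most discriminative
print(top_channels)
```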

LatticeML: A data-driven application for predicting the effective Young Modulus of high temperature graph based architected materials

Authors: Akshansh Mishra

Link: http://arxiv.org/abs/2404.09470v1

Abstract: Architected materials with their unique topology and geometry offer the potential to modify physical and mechanical properties. Machine learning can accelerate the design and optimization of these materials by identifying optimal designs and forecasting performance. This work presents LatticeML, a data-driven application for predicting the effective Young's Modulus of high-temperature graph-based architected materials. The study considers eleven graph-based lattice structures with two high-temperature alloys, Ti-6Al-4V and Inconel 625. Finite element simulations were used to compute the effective Young's Modulus of the 2x2x2 unit cell configurations. A machine learning framework was developed to predict Young's Modulus, involving data collection, preprocessing, implementation of regression models, and deployment of the best-performing model. Five supervised learning algorithms were evaluated, with the XGBoost Regressor achieving the highest accuracy (MSE = 2.7993, MAE = 1.1521, R-squared = 0.9875). The application uses the Streamlit framework to create an interactive web interface, allowing users to input material and geometric parameters and obtain predicted Young's Modulus values.
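
The overall shape of such a pipeline can be sketched as follows; the synthetic data, feature names, and hyperparameters are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
X = rng.random((500, 4))   # e.g. relative density, strut radius, cell size, alloy flag
y = 100 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 1, 500)   # toy effective modulus

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = XGBRegressor(n_estimators=300, learning_rate=0.1)
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(f"MSE={mean_squared_error(y_test, pred):.3f}  "
      f"MAE={mean_absolute_error(y_test, pred):.3f}  "
      f"R2={r2_score(y_test, pred):.4f}")
```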

Human-in-the-Loop Segmentation of Multi-species Coral Imagery

Authors: Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Niko Suenderhauf, Tobias Fischer

Link: http://arxiv.org/abs/2404.09406v1

Abstract: Broad-scale marine surveys performed by underwater vehicles significantly increase the availability of coral reef imagery; however, it is costly and time-consuming for domain experts to label images. Point label propagation is an approach used to leverage existing image data labeled with sparse point labels. The resulting augmented ground truth is then used to train a semantic segmentation model. Here, we first demonstrate that recent advances in foundation models enable the generation of multi-species coral augmented ground truth masks using denoised DINOv2 features and K-Nearest Neighbors (KNN), without the need for any pre-training or custom-designed algorithms. For extremely sparsely labeled images, we propose a labeling regime based on human-in-the-loop principles, resulting in significant improvements in annotation efficiency: if only 5 point labels per image are available, our proposed human-in-the-loop approach improves on the state of the art by 17.3% for pixel accuracy and 22.6% for mIoU; and by 10.6% and 19.1% when 10 point labels per image are available. Even if the human-in-the-loop labeling regime is not used, the denoised DINOv2 features with a KNN outperform the prior state of the art by 3.5% for pixel accuracy and 5.7% for mIoU (5 grid points). We also provide a detailed analysis of how point labeling style and the quantity of points per image affect point label propagation quality, and provide general recommendations on maximizing point label efficiency.
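
A minimal sketch of KNN-based point label propagation, with random vectors standing in for the denoised DINOv2 per-pixel features:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
H, W, D = 32, 32, 64
features = rng.normal(size=(H, W, D))      # stand-in per-pixel descriptors

rows = rng.integers(0, H, size=5)          # 5 sparse point labels
cols = rng.integers(0, W, size=5)
labels = rng.integers(0, 3, size=5)        # e.g. 3 coral classes

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(features[rows, cols], labels)

# Propagate the sparse labels to every pixel to form augmented ground truth.
dense_mask = knn.predict(features.reshape(-1, D)).reshape(H, W)
print(dense_mask.shape, np.unique(dense_mask))
```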

2024-04-14

Deceptive Patterns of Intelligent and Interactive Writing Assistants

Authors: Karim Benharrak, Tim Zindulka, Daniel Buschek

Link: http://arxiv.org/abs/2404.09375v1

Abstract: Large Language Models have become an integral part of new intelligent and interactive writing assistants. Many are offered commercially with a chatbot-like UI, such as ChatGPT, and provide little information about their inner workings. This makes this new type of widespread system a potential target for deceptive design patterns. For example, such assistants might exploit hidden costs by providing guidance up until a certain point before asking for a fee to see the rest. As another example, they might sneak unwanted content/edits into longer generated or revised text pieces (e.g. to influence the expressed opinion). With these and other examples, we conceptually transfer several deceptive patterns from the literature to the new context of AI writing assistants. Our goal is to raise awareness and encourage future research into how the UI and interaction design of such systems can impact people and their writing.

Sinestesia as a model for HCI: a Systematic Review

Authors: Simona Corciulo, Mario Alessandro Bochicchio

Link: http://arxiv.org/abs/2404.09303v1

Abstract: Synesthesia, conceived as a neuropsychological condition, may prove valuable in studying the interaction between humans and machines by analyzing the co-occurrence of sensory or cognitive responses triggered by a stimulus. In our approach, synesthesia is elevated beyond a mere perceptual-cognitive anomaly, offering insights into the reciprocal interaction between humans and the digital system, steering novel experimental design and enriching the interpretation of results. This review broadens the traditional scope, conventionally rooted in neuroscience and psychology, by considering how computer science can approach this condition. The interdisciplinary examination revolves around two primary viewpoints: one associating this condition with specific cognitive, perceptual, and behavioral anomalies, and the other acknowledging it as a prevalent human experience. Synesthesia, in this review, emerges as a significant model for Human-Computer Interaction (HCI). The exploration of this specific condition aims to decipher how atypical pathways of perception and cognition can be encoded, empowering machines to actively engage in processing information from both the body and the environment. The authors attempt to amalgamate findings and insights from various disciplines, fostering collaboration between computer science, neuroscience, psychology, and philosophy. The overarching objective is to construct a comprehensive framework that elucidates how synesthesia and anomalies in information processing can be harnessed within HCI, with a particular emphasis on contributing to digital technologies for medical research and enhancing patient care and comfort. In this sense, the review also endeavors to bridge the gap between theoretical understanding and practical application.

Investigating the impact of virtual element misalignment in collaborative Augmented Reality experiences

Authors: Francesco Vona, Sina Hinzmann, Michael Stern, Tanja Kojić, Navid Ashrafi, David Grieshammer, Jan-Niklas Voigt-Antons

Link: http://arxiv.org/abs/2404.09174v1

Abstract: Collaboration in co-located shared environments has sparked increased interest in immersive technologies, including Augmented Reality (AR). Since research in this field has primarily focused on individual user experiences in AR, the collaborative aspects within shared AR spaces remain less explored, and few studies provide guidelines for designing this type of experience. This article investigates how the user experience in a collaborative shared AR space is affected by divergent perceptions of virtual objects and the effects of positional synchrony and avatars. For this purpose, we developed an AR app and used two distinct experimental conditions to study the influencing factors. Forty-eight participants, organized into 24 pairs, participated in the experiment and jointly interacted with shared virtual objects. Results indicate that divergent perceptions of virtual objects did not directly influence communication and collaboration dynamics. Conversely, positional synchrony emerged as a critical factor, significantly enhancing the quality of the collaborative experience. On the contrary, while not negligible, avatars played a relatively less pronounced role in influencing these dynamics. The findings can potentially offer valuable practical insights, guiding the development of future collaborative AR/VR environments.

Evaluating the efficacy of haptic feedback, 360° treadmill-integrated Virtual Reality framework and longitudinal training on decision-making performance in a complex search-and-shoot simulation

Authors: Akash K Rao, Arnav Bhavsar, Shubhajit Roy Chowdhury, Sushil Chandra, Ramsingh Negi, Prakash Duraisamy, Varun Dutt

Link: http://arxiv.org/abs/2404.09147v1

Abstract: Virtual Reality (VR) has made significant strides, offering users a multitude of ways to interact with virtual environments. Each sensory modality in VR provides distinct inputs and interactions, enhancing the user's immersion and presence. However, the potential of additional sensory modalities, such as haptic feedback and 360° locomotion, to improve decision-making performance has not been thoroughly investigated. This study addresses this gap by evaluating the impact of a haptic feedback, 360° locomotion-integrated VR framework and longitudinal, heterogeneous training on decision-making performance in a complex search-and-shoot simulation. The study involved 32 participants from a defence simulation base in India, who were randomly divided into two groups: experimental (haptic feedback, 360° locomotion-integrated VR framework with longitudinal, heterogeneous training) and placebo control (longitudinal, heterogeneous VR training without extrasensory modalities). The experiment lasted 10 days. On Day 1, all subjects executed a search-and-shoot simulation closely replicating the elements/situations in the real world. From Day 2 to Day 9, the subjects underwent heterogeneous training, imparted by the design of various complexity levels in the simulation using changes in behavioral attributes/artificial intelligence of the enemies. On Day 10, they repeated the search-and-shoot simulation executed on Day 1. The results showed that the experimental group experienced a gradual increase in presence, immersion, and engagement compared to the placebo control group. However, there was no significant difference in decision-making performance between the two groups on Day 10. We intend to use these findings to design multisensory VR training frameworks that enhance engagement levels and decision-making performance.

2024-04-13

ALICE: Combining Feature Selection and Inter-Rater Agreeability for Machine Learning Insights

Authors: Bachana Anasashvili, Vahidin Jeleskovic

Link: http://arxiv.org/abs/2404.09053v1

Abstract: This paper presents a new Python library called Automated Learning for Insightful Comparison and Evaluation (ALICE), which merges conventional feature selection and the concept of inter-rater agreeability in a simple, user-friendly manner to seek insights into black box Machine Learning models. The framework is proposed following an overview of the key concepts of interpretability in ML. The entire architecture and intuition of the main methods of the framework are also thoroughly discussed and results from initial experiments on a customer churn predictive modeling task are presented, alongside ideas for possible avenues to explore for the future. The full source code for the framework and the experiment notebooks can be found at: https://github.com/anasashb/aliceHU

NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT

Authors: Xinzhe Zheng, Sijie Ji, Yipeng Pan, Kaiwen Zhang, Chenshu Wu

Link: http://arxiv.org/abs/2404.08939v1

Abstract: Inertial tracking is vital for robotic IoT and has gained popularity thanks to the ubiquity of low-cost Inertial Measurement Units (IMUs) and deep learning-powered tracking algorithms. Existing works, however, have not fully utilized IMU measurements, particularly magnetometers, nor maximized the potential of deep learning to achieve the desired accuracy. To enhance the tracking accuracy for indoor robotic applications, we introduce NeurIT, a sequence-to-sequence framework that elevates tracking accuracy to a new level. NeurIT employs a Time-Frequency Block-recurrent Transformer (TF-BRT) at its core, combining the power of recurrent neural network (RNN) and Transformer to learn representative features in both time and frequency domains. To fully utilize IMU information, we strategically employ body-frame differentiation of the magnetometer, which considerably reduces the tracking error. NeurIT is implemented on a customized robotic platform and evaluated in various indoor environments. Experimental results demonstrate that NeurIT achieves a mere 1-meter tracking error over a 300-meter distance. Notably, it significantly outperforms state-of-the-art baselines by 48.21% on unseen data. NeurIT also performs comparably to the visual-inertial approach (Tango Phone) in vision-favored conditions and surpasses it in plain environments. We believe NeurIT takes an important step forward toward practical neural inertial tracking for ubiquitous and scalable tracking of robotic things. NeurIT, including the source code and the dataset, is open-sourced here: https://github.com/NeurIT-Project/NeurIT.
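
A sketch of the body-frame differentiation idea as we read it: feed the network the per-step change of the body-frame magnetometer vector rather than its absolute value, which cancels slowly varying bias. This is an assumed simplification of NeurIT's actual preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)
mag_body = rng.normal(size=(1000, 3))      # magnetometer readings, body frame
mag_diff = np.diff(mag_body, axis=0)       # first-order difference per step

# Combine raw and differentiated signals as model input features.
features = np.concatenate([mag_body[1:], mag_diff], axis=1)  # shape (999, 6)
print(features.shape)
```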

2024-04-12

Hindsight PRIORs for Reward Learning from Human Preferences

Authors: Mudit Verma, Katherine Metcalf

Link: http://arxiv.org/abs/2404.08828v1

Abstract: Preference-based Reinforcement Learning (PbRL) removes the need to hand-specify a reward function by learning a reward from preference feedback over policy behaviors. Current approaches to PbRL do not address the credit assignment problem inherent in determining which parts of a behavior most contributed to a preference, which results in data-intensive approaches and subpar reward functions. We address such limitations by introducing a credit assignment strategy (Hindsight PRIOR) that uses a world model to approximate state importance within a trajectory and then guides rewards to be proportional to state importance through an auxiliary predicted return redistribution objective. Incorporating state importance into reward learning improves the speed of policy learning, overall policy performance, and reward recovery on both locomotion and manipulation tasks. For example, Hindsight PRIOR recovers on average significantly (p<0.05) more reward on MetaWorld (20%) and DMC (15%). The performance gains and our ablations demonstrate the benefits that even a simple credit assignment strategy can have on reward learning, and that state importance in forward dynamics prediction is a strong proxy for a state's contribution to a preference decision. Code repository can be found at https://github.com/apple/ml-rlhf-hindsight-prior.
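
The redistribution step can be sketched as follows, with random weights standing in for the world model's state-importance estimates; per-step rewards are made proportional to importance while summing to the trajectory's predicted return.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10
importance = rng.random(T)         # state importance from a world model (stand-in)
predicted_return = 5.0             # return predicted for the whole trajectory

weights = importance / importance.sum()
rewards = predicted_return * weights       # proportional per-step rewards
assert np.isclose(rewards.sum(), predicted_return)
print(np.round(rewards, 3))
```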

Interactive Sonification for Health and Energy using ChucK and Unity

Authors: Yichun Zhao, George Tzanetakis

Link: http://arxiv.org/abs/2404.08813v1

Abstract: Sonification can provide valuable insights about data but most existing approaches are not designed to be controlled by the user in an interactive fashion. Interactions enable the designer of the sonification to more rapidly experiment with sound design and allow the sonification to be modified in real-time by interacting with various control parameters. In this paper, we describe two case studies of interactive sonification that utilize publicly available datasets that have been described recently in the International Conference on Auditory Display (ICAD). They are from the health and energy domains: electroencephalogram (EEG) alpha wave data and air pollutant data consisting of nitrogen dioxide, sulfur dioxide, carbon monoxide, and ozone. We show how these sonifications can be recreated to support interaction utilizing a general interactive sonification framework built using ChucK, Unity, and Chunity. In addition to supporting typical sonification methods that are common in existing sonification toolkits, our framework introduces novel methods such as supporting discrete events, interleaved playback of multiple data streams for comparison, and using frequency modulation (FM) synthesis in which one data attribute modulates another. We also describe how these new functionalities can be used to improve the sonification experience of the two datasets we have investigated.
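
A minimal sketch of the FM mapping described, where one normalized pollutant reading sets the carrier frequency and a second sets the modulation depth; the readings and mapping constants are illustrative assumptions.

```python
import numpy as np

sr, dur = 44100, 1.0
t = np.linspace(0, dur, int(sr * dur), endpoint=False)

no2 = 0.6    # hypothetical normalized NO2 reading -> carrier pitch
so2 = 0.3    # hypothetical normalized SO2 reading -> modulation depth

carrier_hz = 220 + 440 * no2     # map NO2 onto 220-660 Hz
mod_hz = 5.0                     # fixed modulator frequency
mod_index = 10 * so2             # modulation depth driven by SO2

signal = np.sin(2 * np.pi * carrier_hz * t
                + mod_index * np.sin(2 * np.pi * mod_hz * t))
print(signal.shape, float(signal.min()), float(signal.max()))
```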

A Typology of Decision-Making Tasks for Visualization

Authors: Camelia D. Brumar, Sam Molnar, Gabriel Appleby, Kristi Potter, Remco Chang

Link: http://arxiv.org/abs/2404.08812v1

Abstract: Despite decision-making being a vital goal of data visualization, little work has been done to differentiate the decision-making tasks within our field. While visualization task taxonomies and typologies exist, they are often too granular for describing complex decision goals and decision-making processes, thus limiting their potential use in designing decision-support tools. In this paper, we contribute a typology of decision-making tasks that were iteratively refined from a list of design goals distilled from a literature review. Our typology is concise and consists of only three tasks: choose, activate, and create. These tasks were originally proposed by the scientific community; we extend them and provide definitions suitable for the visualization community. Our proposed typology offers two benefits. First, it facilitates the composition of decisions using these three tasks, allowing for flexible and clear descriptions across varying complexities and domains. Second, diagrams created using this typology encourage productive discourse between visualization designers and domain experts by abstracting the intricacies of data, thereby promoting clarity and rigorous analysis of decision-making processes. We motivate the use of our typology through four case studies and demonstrate the benefits of our approach through semi-structured interviews conducted with experienced members of the visualization community, comprising academic and industry experts, who have contributed to developing or publishing decision support systems for domain experts. Our interviewees composed diagrams using our typology to delineate the decision-making processes that drive their decision-support tools, demonstrating its descriptive capacity and effectiveness.

Semantic Approach to Quantifying the Consistency of Diffusion Model Image Generation

Authors: Brinnae Bent

Link: http://arxiv.org/abs/2404.08799v1

Abstract: In this study, we identify the need for an interpretable, quantitative score of the repeatability, or consistency, of image generation in diffusion models. We propose a semantic approach, using a pairwise mean CLIP (Contrastive Language-Image Pretraining) score as our semantic consistency score. We applied this metric to compare two state-of-the-art open-source image generation diffusion models, Stable Diffusion XL and PixArt-α, and we found statistically significant differences between the semantic consistency scores for the models. Agreement between the Semantic Consistency Score selected model and aggregated human annotations was 94%. We also explored the consistency of SDXL and a LoRA-fine-tuned version of SDXL and found that the fine-tuned model had significantly higher semantic consistency in generated images. The Semantic Consistency Score proposed here offers a measure of image generation alignment, facilitating the evaluation of model architectures for specific tasks and aiding in informed decision-making regarding model selection.
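
A minimal sketch of a pairwise mean similarity score of this kind, with random unit vectors standing in for CLIP embeddings of repeated generations from one prompt:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 512))                    # 8 images from one prompt
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize like CLIP

sim = emb @ emb.T                                  # cosine similarity matrix
iu = np.triu_indices(len(emb), k=1)                # unordered pairs only
print(f"semantic consistency score: {sim[iu].mean():.3f}")
```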

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models

Authors: Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Minfeng Zhu, Wei Zhang, Wei Chen

Link: http://arxiv.org/abs/2404.08793v1

Abstract: The proliferation of large language models (LLMs) has underscored concerns regarding their security vulnerabilities, notably against jailbreak attacks, where adversaries design jailbreak prompts to circumvent safety mechanisms for potential misuse. Addressing these concerns necessitates a comprehensive analysis of jailbreak prompts to evaluate LLMs' defensive capabilities and identify potential weaknesses. However, the complexity of evaluating jailbreak performance and understanding prompt characteristics makes this analysis laborious. We collaborate with domain experts to characterize problems and propose an LLM-assisted framework to streamline the analysis process. It provides automatic jailbreak assessment to facilitate performance evaluation and support analysis of components and keywords in prompts. Based on the framework, we design JailbreakLens, a visual analysis system that enables users to explore the jailbreak performance against the target model, conduct multi-level analysis of prompt characteristics, and refine prompt instances to verify findings. Through a case study, technical evaluations, and expert interviews, we demonstrate our system's effectiveness in helping users evaluate model security and identify model weaknesses.

Training a Vision Language Model as Smartphone Assistant

Authors: Nicolai Dorka, Janusz Marecki, Ammar Anwar

Link: http://arxiv.org/abs/2404.08755v1

Abstract: Addressing the challenge of a digital assistant capable of executing a wide array of user tasks, our research focuses on the realm of instruction-based mobile device control. We leverage recent advancements in large language models (LLMs) and present a visual language model (VLM) that can fulfill diverse tasks on mobile devices. Our model functions by interacting solely with the user interface (UI). It uses the visual input from the device screen and mimics human-like interactions, encompassing gestures such as tapping and swiping. This generality in the input and output space allows our agent to interact with any application on the device. Unlike previous methods, our model operates not only on a single screen image but on vision-language sentences created from sequences of past screenshots along with corresponding actions. Evaluating our method on the challenging Android in the Wild benchmark demonstrates its promising efficacy and potential.

VizGroup: An AI-Assisted Event-Driven System for Real-Time Collaborative Programming Learning Analytics

Authors: Xiaohang Tang, Sam Wong, Kevin Pu, Xi Chen, Yalong Yang, Yan Chen

Link: http://arxiv.org/abs/2404.08743v1

Abstract: Programming instructors often conduct collaborative learning activities, like Peer Instruction, to foster a deeper understanding in students and enhance their engagement with learning. These activities, however, may not always yield productive outcomes due to the diversity of student mental models and their ineffective collaboration. In this work, we introduce VizGroup, an AI-assisted system that enables programming instructors to easily oversee students' real-time collaborative learning behaviors during large programming courses. VizGroup leverages Large Language Models (LLMs) to recommend event specifications for instructors so that they can simultaneously track and receive alerts about key correlation patterns between various collaboration metrics and ongoing coding tasks. We evaluated VizGroup with 12 instructors using a dataset collected from a Peer Instruction activity that was conducted in a large programming lecture. The results showed that compared to a version of VizGroup without the suggested units, VizGroup with suggested units helped instructors create additional monitoring units on previously undetected patterns on their own, covered a more diverse range of metrics, and influenced the participants' subsequent notification-creation strategies.

Mixing Modes: Active and Passive Integration of Speech, Text, and Visualization for Communicating Data Uncertainty

Authors: Chase Stokes, Chelsea Sanker, Bridget Cogley, Vidya Setlur

Link: http://arxiv.org/abs/2404.08623v1

Abstract: Interpreting uncertain data can be difficult, particularly if the data presentation is complex. We investigate the efficacy of different modalities for representing data and how to combine the strengths of each modality to facilitate the communication of data uncertainty. We implemented two multimodal prototypes to explore the design space of integrating speech, text, and visualization elements. A preliminary evaluation with 20 participants from academic and industry communities demonstrates that there exists no one-size-fits-all approach for uncertainty communication strategies; rather, the effectiveness of conveying uncertain data is intertwined with user preferences and situational context, necessitating a more refined, multimodal strategy for future interface design.

Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task

Authors: Hassan Ali, Philipp Allgeuer, Stefan Wermter

Link: http://arxiv.org/abs/2404.08424v1

Abstract: Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior. Therefore, intention prediction is pivotal in creating a natural interactive collaboration between humans and robots. In this paper, we examine the use of Large Language Models (LLMs) for inferring human intention during a collaborative object categorization task with a physical robot. We introduce a hierarchical approach for interpreting user non-verbal cues, like hand gestures, body poses, and facial expressions and combining them with environment states and user verbal cues captured using an existing Automatic Speech Recognition (ASR) system. Our evaluation demonstrates the potential of LLMs to interpret non-verbal cues and to combine them with their context-understanding capabilities and real-world knowledge to support intention prediction during human-robot interaction.

Study of Emotion Concept Formation by Integrating Vision, Physiology, and Word Information using Multilayered Multimodal Latent Dirichlet Allocation

Authors: Kazuki Tsurumaki, Chie Hieida, Kazuki Miyazawa

Link: http://arxiv.org/abs/2404.08295v1

Abstract: How are emotions formed? Through extensive debate and the promulgation of diverse theories, the theory of constructed emotion has become prevalent in recent research on emotions. According to this theory, an emotion concept refers to a category formed by interoceptive and exteroceptive information associated with a specific emotion. An emotion concept stores past experiences as knowledge and can predict unobserved information from acquired information. Therefore, in this study, we attempted to model the formation of emotion concepts using a constructionist approach from the perspective of the constructed emotion theory. Particularly, we constructed a model using multilayered multimodal latent Dirichlet allocation, which is a probabilistic generative model. We then trained the model for each subject using vision, physiology, and word information obtained from multiple people who experienced different visual emotion-evoking stimuli. To evaluate the model, we verified whether the formed categories matched human subjectivity and determined whether unobserved information could be predicted via categories. The verification results exceeded chance level, suggesting that emotion concept formation can be explained by the proposed model.

GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality

Authors: Jaewook Lee, Jun Wang, Elizabeth Brown, Liam Chu, Sebastian S. Rodriguez, Jon E. Froehlich

Link: http://arxiv.org/abs/2404.08213v1

Abstract: Voice assistants (VAs) like Siri and Alexa are transforming human-computer interaction; however, they lack awareness of users' spatiotemporal context, resulting in limited performance and unnatural dialogue. We introduce GazePointAR, a fully-functional context-aware VA for wearable augmented reality that leverages eye gaze, pointing gestures, and conversation history to disambiguate speech queries. With GazePointAR, users can ask "what's over there?" or "how do I solve this math problem?" simply by looking and/or pointing. We evaluated GazePointAR in a three-part lab study (N=12): (1) comparing GazePointAR to two commercial systems; (2) examining GazePointAR's pronoun disambiguation across three tasks; (3) and an open-ended phase where participants could suggest and try their own context-sensitive queries. Participants appreciated the naturalness and human-like nature of pronoun-driven queries, although sometimes pronoun use was counter-intuitive. We then iterated on GazePointAR and conducted a first-person diary study examining how GazePointAR performs in-the-wild. We conclude by enumerating limitations and design considerations for future context-aware VAs.

The Clarkston AR Gateways Project: Anchoring Refugee Presence and Narratives in a Small Town

Authors: Joshua A. Fisher, Fernando Rochaix

Link: http://arxiv.org/abs/2404.08179v1

Abstract: This paper outlines the Clarkston AR Gateways Project, a speculative process and artifact entering its second phase, where Augmented Reality (AR) will be used to amplify the diverse narratives of Clarkston, Georgia's refugee community. Focused on anchoring their stories and presence into the town's physical and digital landscapes, the project employs a participatory co-design approach, engaging directly with community members. This placemaking effort aims to uplift refugees by teaching them AR development skills that help them more autonomously express and elevate their voices through public art. The result is hoped to be AR experiences that not only challenge prevailing narratives but also celebrate the tapestry of cultures in the small town. This work is supported through AR's unique affordance for users to situate their experiences as interactive narratives within public spaces. Such site-specific AR interactive stories can encourage interactions within those spaces that shift how they are conceived, perceived, and experienced. This process of refugee-driven AR creation reflexively alters the space and affirms their presence and agency. The project's second phase aims to establish a model adaptable to diverse, refugee-inclusive communities, demonstrating how AR storytelling can be a powerful tool for cultural orientation and celebration.

2024-04-11

A-DisETrac Advanced Analytic Dashboard for Distributed Eye Tracking

Authors: Yasasi Abeysinghe, Bhanuka Mahanama, Gavindya Jayawardena, Yasith Jayawardana, Mohan Sunkara, Andrew T. Duchowski, Vikas Ashok, Sampath Jayarathna

Link: http://arxiv.org/abs/2404.08143v1

Abstract: Understanding how individuals focus and perform visual searches during collaborative tasks can help improve user engagement. Eye tracking measures provide informative cues for such understanding. This article presents A-DisETrac, an advanced analytic dashboard for distributed eye tracking. It uses off-the-shelf eye trackers to monitor multiple users in parallel, compute both traditional and advanced gaze measures in real-time, and display them on an interactive dashboard. Using two pilot studies, the system was evaluated in terms of user experience and utility, and compared with existing work. Moreover, the system was used to study how advanced gaze measures such as ambient-focal coefficient K and real-time index of pupillary activity relate to collaborative behavior. It was observed that the time a group takes to complete a puzzle is related to its quantified ambient visual scanning behavior: groups that spent more time exhibited more scanning behavior. User experience questionnaire results suggest that the dashboard provides a comparatively good user experience.
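
The ambient-focal coefficient K can be sketched as below, following Krejtz et al.'s definition as we understand it: the mean difference between z-scored fixation durations and z-scored amplitudes of the following saccades (the paired, equal-length arrays are a simplifying assumption).

```python
import numpy as np

def coefficient_k(fix_dur, sacc_amp):
    d = (fix_dur - fix_dur.mean()) / fix_dur.std()
    a = (sacc_amp - sacc_amp.mean()) / sacc_amp.std()
    return np.mean(d - a)   # K > 0: focal viewing; K < 0: ambient scanning

rng = np.random.default_rng(0)
durations = rng.gamma(2.0, 120, size=200)    # fixation durations (ms)
amplitudes = rng.gamma(2.0, 2.5, size=200)   # following saccades (degrees)
print(coefficient_k(durations, amplitudes))
```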

Ephemeral Myographic Motion: Repurposing the Myo Armband to Control Disposable Pneumatic Sculptures

Authors: Celia Chen, Alex Leitch

Link: http://arxiv.org/abs/2404.08065v1

Abstract: This paper details the development of an interactive sculpture built from deprecated hardware technology and intentionally decomposable, transient materials. We detail a case study of "Strain" - an emotive prototype that reclaims two orphaned digital artifacts to power a kinetic sculpture made of common disposable objects. We use the Myo, an abandoned myoelectric armband, in concert with the Programmable Air, a soft-robotics prototyping project, to manipulate a pneumatic bladder array constructed from condoms, bamboo skewers, and a small library of 3D printed PLA plastic connectors designed to work with these generic parts. The resulting sculpture achieves surprisingly organic actuation. The goal of this project is to produce several reusable components: software to resuscitate the Myo Armband, homeostasis software for the Programmable Air or equivalent pneumatic projects, and a library of easily-printed parts that will work with generic bamboo disposables for sculptural prototyping. This project works to develop usable, repeatable engineering by applying it to a slightly whimsical object that promotes a strong emotional response in its audience. Through this, we transform the disposable into the sustainable. In this paper, we reflect on project-based insights into rescuing and revitalizing abandoned consumer electronics for future works.

Goal Recognition via Linear Programming

Authors: Felipe Meneguzzi, Luísa R. de A. Santos, Ramon Fraga Pereira, André G. Pereira

Link: http://arxiv.org/abs/2404.07934v1

Abstract: Goal Recognition is the task by which an observer aims to discern the goals that correspond to plans that comply with the perceived behavior of subject agents given as a sequence of observations. Research on Goal Recognition as Planning encompasses reasoning about the model of a planning task, the observations, and the goals using planning techniques, resulting in very efficient recognition approaches. In this article, we design novel recognition approaches that rely on the Operator-Counting framework, propose new constraints, and analyze these constraints' properties both theoretically and empirically. The Operator-Counting framework is a technique that efficiently computes heuristic estimates of cost-to-goal using Integer/Linear Programming (IP/LP). On the theoretical side, we prove that the new constraints provide lower bounds on the cost of plans that comply with observations. We also provide an extensive empirical evaluation to assess how the new constraints improve the quality of the solution, and we found that they are especially informed in deciding which goals are unlikely to be part of the solution. Our novel recognition approaches have two pivotal advantages: first, they employ new IP/LP constraints for efficiently recognizing goals; second, we show how the new IP/LP constraints can improve the recognition of goals under both partial and noisy observability.
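
A toy operator-counting LP (not the paper's constraint set) showing how a cost lower bound falls out of landmark-style constraints: each landmark must be achieved by at least one counted operator, and the LP optimum lower-bounds any compliant plan's cost.

```python
import numpy as np
from scipy.optimize import linprog

costs = np.array([1.0, 2.0, 5.0])   # operators o1, o2, o3

# "sum of achiever counts >= 1" per landmark, written as <= for linprog:
# landmark A achieved by {o1, o3}; landmark B achieved by {o2, o3}.
A_ub = -np.array([[1, 0, 1],
                  [0, 1, 1]])
b_ub = -np.ones(2)

res = linprog(costs, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(f"cost lower bound: {res.fun:.1f}")   # 3.0 here (count o1 and o2 once each)
```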

Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation

Authors: Jinkyung Park, Pamela Wisniewski, Vivek Singh

Link: http://arxiv.org/abs/2404.07926v1

Abstract: In this position paper, we discuss the potential for leveraging LLMs as interactive research tools to facilitate collaboration between human coders and AI to effectively annotate online risk data at scale. Collaborative human-AI labeling is a promising approach to annotating large-scale and complex data for various tasks. Yet, tools and methods to support effective human-AI collaboration for data annotation are under-studied. This gap is pertinent because co-labeling tasks need to support a two-way interactive discussion that can add nuance and context, particularly in the context of online risk, which is highly subjective and contextualized. Therefore, we provide some of the early benefits and challenges of using LLMs-based tools for risk annotation and suggest future directions for the HCI research community to leverage LLMs as research tools to facilitate human-AI collaboration in contextualized online data annotation. Our research interests align very well with the purposes of the LLMs as Research Tools workshop to identify ongoing applications and challenges of using LLMs to work with data in HCI research. We anticipate learning valuable insights from organizers and participants into how LLMs can help reshape the HCI community's methods for working with data.

Snake Story: Exploring Game Mechanics for Mixed-Initiative Co-creative Storytelling Games

Authors: Daijin Yang, Erica Kleinman, Giovanni Maria Troiano, Elina Tochilnikova, Casper Harteveld

Link: http://arxiv.org/abs/2404.07901v1

Abstract: Mixed-initiative co-creative storytelling games have existed for some time as a way to merge storytelling with play. However, modern mixed-initiative co-creative storytelling games predominantly prioritize story creation over gameplay mechanics, which might not resonate with all players. As such, there is untapped potential for creating mixed-initiative games with more complex mechanics in which players can engage with both co-creation and gameplay goals. To explore the potential of more prominent gameplay in mixed-initiative co-creative storytelling games, we created Snake Story, a variation of the classic Snake game featuring a human-AI co-writing element. To explore how players interact with the mixed-initiative game, we conducted a qualitative playtest with 11 participants. Analysis of both think-aloud and interview data revealed that players' strategies and experiences were affected by their perception of Snake Story as either a collaborative tool, a traditional game, or a combination of both. Based on these findings, we present design considerations for future development in mixed-initiative co-creative gaming.

Apprentice Tutor Builder: A Platform For Users to Create and Personalize Intelligent Tutors

Authors: Glen Smith, Adit Gupta, Christopher MacLellan

Link: http://arxiv.org/abs/2404.07883v1

Abstract: Intelligent tutoring systems (ITS) are effective for improving students' learning outcomes. However, their development is often complex, time-consuming, and requires specialized programming and tutor design knowledge, thus hindering their widespread application and personalization. We present the Apprentice Tutor Builder (ATB), a platform that simplifies tutor creation and personalization. Instructors can utilize ATB's drag-and-drop tool to build tutor interfaces. Instructors can then interactively train the tutors' underlying AI agent to produce expert models that can solve problems. Training is achieved using multiple interaction modalities, including demonstrations, feedback, and user labels. We conducted a user study with 14 instructors to evaluate the effectiveness of ATB's design with end users. We found that users enjoyed the flexibility of the interface builder and the ease and speed of agent teaching, but often desired additional time-saving features. With these insights, we identified a set of design recommendations for our platform and others that utilize interactive AI agents for tutor creation and customization.

The Dance of Logic and Unpredictability: Examining the Predictability of User Behavior on Visual Analytics Tasks

Authors: Alvitta Ottley

Link: http://arxiv.org/abs/2404.07865v1

Abstract: The quest to develop intelligent visual analytics (VA) systems capable of collaborating and naturally interacting with humans presents a multifaceted and intriguing challenge. VA systems designed for collaboration must adeptly navigate a complex landscape filled with the subtleties and unpredictabilities that characterize human behavior. However, it is noteworthy that scenarios exist where human behavior manifests predictably. These scenarios typically involve routine actions or present a limited range of choices. This paper delves into the predictability of user behavior in the context of visual analytics tasks. It offers an evidence-based discussion on the circumstances under which predicting user behavior is feasible and those where it proves challenging. We conclude with a forward-looking discussion of the future work necessary to cultivate more synergistic and efficient partnerships between humans and the VA system. This exploration is not just about understanding our current capabilities and limitations in mirroring human behavior but also about envisioning and paving the way for a future where human-machine interaction is more intuitive and productive.

Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models -- Technical Challenges and Implications for Monitoring and Verification

Authors: Tuong Vy Nguyen, Alexander Glaser, Felix Biessmann

Link: http://arxiv.org/abs/2404.07754v1

Abstract: Novel deep-learning (DL) architectures have reached a level where they can generate digital media, including photorealistic images, that are difficult to distinguish from real data. These technologies have already been used to generate training data for Machine Learning (ML) models, and large text-to-image models like DALL-E 2, Imagen, and Stable Diffusion are achieving remarkable results in realistic high-resolution image generation. Given these developments, issues of data authentication in monitoring and verification deserve a careful and systematic analysis: How realistic are synthetic images? How easily can they be generated? How useful are they for ML researchers, and what is their potential for Open Science? In this work, we use novel DL models to explore how synthetic satellite images can be created using conditioning mechanisms. We investigate the challenges of synthetic satellite image generation and evaluate the results based on authenticity and state-of-the-art metrics. Furthermore, we investigate how synthetic data can alleviate the lack of data in the context of ML methods for remote sensing. Finally, we discuss implications of synthetic satellite imagery in the context of monitoring and verification.

Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models

Authors: Marvin Pafla, Kate Larson, Mark Hancock

Link: http://arxiv.org/abs/2404.07725v1

Abstract: The field of eXplainable artificial intelligence (XAI) has produced a plethora of methods (e.g., saliency-maps) to gain insight into artificial intelligence (AI) models, and has exploded with the rise of deep learning (DL). However, human-participant studies question the efficacy of these methods, particularly when the AI output is wrong. In this study, we collected and analyzed 156 human-generated text and saliency-based explanations collected in a question-answering task (N=40) and compared them empirically to state-of-the-art XAI explanations (integrated gradients, conservative LRP, and ChatGPT) in a human-participant study (N=136). Our findings show that participants found human saliency maps to be more helpful in explaining AI answers than machine saliency maps, but performance negatively correlated with trust in the AI model and explanations. This finding hints at the dilemma of AI errors in explanation, where helpful explanations can lead to lower task performance when they support wrong AI predictions.

Efficient sEMG-based Cross-Subject Joint Angle Estimation via Hierarchical Spiking Attentional Feature Decomposition Network

Authors: Xin Zhou, Chuang Lin, Can Wang, Xiaojiang Peng

Link: http://arxiv.org/abs/2404.07517v1

Abstract: Surface electromyography (sEMG) has demonstrated significant potential in simultaneous and proportional control (SPC). However, existing algorithms for predicting joint angles based on sEMG often suffer from high inference costs or are limited to specific subjects rather than cross-subject scenarios. To address these challenges, we introduced a hierarchical Spiking Attentional Feature Decomposition Network (SAFE-Net). This network initially compresses sEMG signals into neural spiking forms using a Spiking Sparse Attention Encoder (SSAE). Subsequently, the compressed features are decomposed into kinematic and biological features through a Spiking Attentional Feature Decomposition (SAFD) module. Finally, the kinematic and biological features are used to predict joint angles and identify subject identities, respectively. Our validation on two datasets (SIAT-DB1 and SIAT-DB2) and comparison with two existing methods, Informer and Spikformer, demonstrate that SSAE achieves significant power consumption savings of 39.1% and 37.5% respectively over them in terms of inference costs. Furthermore, SAFE-Net surpasses Informer and Spikformer in recognition accuracy on both datasets. This study underscores the potential of SAFE-Net to advance the field of SPC in lower limb rehabilitation exoskeleton robots.

Interactive Prompt Debugging with Sequence Salience

Authors: Ian Tenney, Ryan Mullins, Bin Du, Shree Pandya, Minsuk Kahng, Lucas Dixon

Link: http://arxiv.org/abs/2404.07498v1

Abstract: We present Sequence Salience, a visual tool for interactive prompt debugging with input salience methods. Sequence Salience builds on widely used salience methods for text classification and single-token prediction, and extends this to a system tailored for debugging complex LLM prompts. Our system is well-suited for long texts, and expands on previous work by 1) providing controllable aggregation of token-level salience to the word, sentence, or paragraph level, making salience over long inputs tractable; and 2) supporting rapid iteration where practitioners can act on salience results, refine prompts, and run salience on the new output. We include case studies showing how Sequence Salience can help practitioners work with several complex prompting strategies, including few-shot, chain-of-thought, and constitutional principles. Sequence Salience is built on the Learning Interpretability Tool, an open-source platform for ML model visualizations, and code, notebooks, and tutorials are available at http://goo.gle/sequence-salience.
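
The controllable aggregation can be sketched as summing token-level salience within word (or sentence) spans; the tokens, scores, and segmentation below are illustrative stand-ins, not LIT's internal representation.

```python
tokens = ["Sum", "mar", "ize", "the", "quarterly", "report"]   # subword tokens
salience = [0.20, 0.15, 0.10, 0.05, 0.25, 0.25]                # per-token scores
word_spans = {"Summarize": [0, 1, 2], "the": [3],
              "quarterly": [4], "report": [5]}                 # indices into tokens

word_salience = {word: sum(salience[i] for i in idx)
                 for word, idx in word_spans.items()}
print(max(word_salience, key=word_salience.get), word_salience)
```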

RASSAR: Room Accessibility and Safety Scanning in Augmented Reality

Authors: Xia Su, Han Zhang, Kaiming Cheng, Jaewook Lee, Qiaochu Liu, Wyatt Olson, Jon Froehlich

Link: http://arxiv.org/abs/2404.07479v1

Abstract: The safety and accessibility of our homes is critical to quality of life and evolves as we age, become ill, host guests, or experience life events such as having children. Researchers and health professionals have created assessment instruments such as checklists that enable homeowners and trained experts to identify and mitigate safety and access issues. With advances in computer vision, augmented reality (AR), and mobile sensors, new approaches are now possible. We introduce RASSAR, a mobile AR application for semi-automatically identifying, localizing, and visualizing indoor accessibility and safety issues such as an inaccessible table height or unsafe loose rugs using LiDAR and real-time computer vision. We present findings from three studies: a formative study with 18 participants across five stakeholder groups to inform the design of RASSAR, a technical performance evaluation across ten homes demonstrating state-of-the-art performance, and a user study with six stakeholders. We close with a discussion of future AI-based indoor accessibility assessment tools, RASSAR's extensibility, and key application scenarios.

Diversity's Double-Edged Sword: Analyzing Race's Effect on Remote Pair Programming Interactions

Authors: Shandler A. Mason, Sandeep Kaur Kuttal

Link: http://arxiv.org/abs/2404.07427v1

Abstract: Remote pair programming is widely used in software development, but no research has examined how race affects these interactions. We embarked on this study due to the historical underrepresentation of Black developers in the tech industry, with White developers comprising the majority. Our study involved 24 experienced developers, forming 12 gender-balanced same- and mixed-race pairs. Pairs collaborated on a programming task using the think-aloud method, followed by individual retrospective interviews. Our findings revealed elevated productivity scores for mixed-race pairs, with no differences in code quality between same- and mixed-race pairs. Mixed-race pairs excelled in task distribution, shared decision-making, and role-exchange but encountered communication challenges, discomfort, and anxiety, shedding light on the complexity of diversity dynamics. Our study emphasizes race's impact on remote pair programming and underscores the need for diverse tools and methods to address racial disparities for collaboration.

Too good to be true: People reject free gifts from robots because they infer bad intentions

Authors: Benjamin Lebrun, Andrew Vonasch, Christoph Bartneck

Link: http://arxiv.org/abs/2404.07409v1

Abstract: A recent psychology study found that people sometimes reject overly generous offers from people because they imagine hidden "phantom costs" must be part of the transaction. Phantom costs occur when a person seems overly generous for no apparent reason. This study aims to explore whether people can imagine phantom costs when interacting with a robot. To this end, screen-embodied or physically embodied agents (human or robot) offered people either a cookie or a cookie + $2. Participants were then asked to choose whether to accept or decline the offer. Results showed that people did perceive phantom costs in the cookie + $2 conditions when interacting with a human, but also with a robot, across both embodiment levels, leading to the characteristic behavioral effect that offering more money made people less likely to accept the offer. While people were more likely to accept offers from a robot than from a human, people more often accepted offers from humans when they were physically embodied rather than screen-embodied, but were equally likely to accept the offer from a robot whether it was screen- or physically embodied. This suggests that people can treat robots (and humans) as social agents with hidden intentions and knowledge, and that this influences their behavior toward them. This provides not only new insights into how people make decisions when interacting with a robot but also into how robot embodiment impacts HRI research.

SealMates: Supporting Communication in Video Conferencing using a Collective Behavior-Driven Avatar

Authors: Mark Armstrong, Chi-Lan Yang, Kinga Skiers, Mengzhen Lim, Tamil Selvan Gunasekaran, Ziyue Wang, Takuji Narumi, Kouta Minamizawa, Yun Suen Pai

Link: http://arxiv.org/abs/2404.07403v1

Abstract: The limited nonverbal cues and spatially distributed nature of remote communication make it challenging for unacquainted members to be expressive during social interactions over video conferencing. Though video conferencing enables seeing others' facial expressions, the visual feedback can instead lead to unexpected self-focus, resulting in users missing cues for others to engage in the conversation equally. To support expressive communication and equal participation among unacquainted counterparts, we propose SealMates, a behavior-driven avatar that infers the engagement level of the group based on collective gaze and speech patterns and then moves across interlocutors' windows in the video conference. By conducting a controlled experiment with 15 groups of triads, we found the avatar's movement encouraged people to experience more self-disclosure and made them perceive that everyone was more equally engaged in the conversation than when there was no behavior-driven avatar. We discuss how a behavior-driven avatar influences distributed members' perceptions and the implications of avatar-mediated communication for future platforms.

2024-04-10

BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks

Authors: Ruijia Cheng, Titus Barik, Alan Leung, Fred Hohman, Jeffrey Nichols

Link: http://arxiv.org/abs/2404.07387v1

Abstract: Novices frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI-based scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through 10 user studies where novices used BISCUIT for machine learning tutorials, we discover that BISCUIT offers users a semantic representation of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas. We discuss the implications of our findings for a UI-centric interaction paradigm in code generation LLMs.

DimBridge: Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic

Authors: Brian Montambault, Gabriel Appleby, Jen Rogers, Camelia D. Brumar, Mingwei Li, Remco Chang

Link: http://arxiv.org/abs/2404.07386v2

Abstract: Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.
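
As a rough illustration of how a predicate-logic bridge between a projection and the original data space can work, the sketch below represents a predicate as a conjunction of per-dimension intervals and retrieves the matching rows. The column names and ranges are hypothetical; DimBridge infers such predicates automatically from a queried visual pattern.

```python
# Sketch: represent a predicate as a conjunction of per-dimension interval
# constraints and select the matching rows. Column names and ranges are
# hypothetical; DimBridge infers such predicates from a queried pattern.
import numpy as np
import pandas as pd

def apply_predicate(df: pd.DataFrame, predicate: dict) -> pd.DataFrame:
    """predicate maps a dimension name to a closed (low, high) interval."""
    mask = pd.Series(True, index=df.index)
    for dim, (lo, hi) in predicate.items():
        mask &= df[dim].between(lo, hi)
    return df[mask]

df = pd.DataFrame({"age": [23, 47, 35], "income": [40e3, 90e3, 60e3]})
# "points in the selected cluster have age in [30, 50] and income >= 55k"
selected = apply_predicate(df, {"age": (30, 50), "income": (55e3, np.inf)})
print(selected)
```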

Building Workflows for Interactive Human in the Loop Automated Experiment (hAE) in STEM-EELS

Authors: Utkarsh Pratiush, Kevin M. Roccapriore, Yongtao Liu, Gerd Duscher, Maxim Ziatdinov, Sergei V. Kalinin

Link: http://arxiv.org/abs/2404.07381v1

Abstract: Exploring the structural, chemical, and physical properties of matter on the nano- and atomic scales has become possible with the recent advances in aberration-corrected electron energy-loss spectroscopy (EELS) in scanning transmission electron microscopy (STEM). However, the current paradigm of STEM-EELS relies on classical rectangular grid sampling, in which all surface regions are assumed to be of equal a priori interest. This is typically not the case for real-world scenarios, where phenomena of interest are concentrated in a small number of spatial locations. One of the foundational problems is the discovery of nanometer- or atomic-scale structures having specific signatures in EELS spectra. Here we systematically explore the hyperparameters controlling deep kernel learning (DKL) discovery workflows for STEM-EELS and identify the role of the local structural descriptors and acquisition functions in the experiment progression. In agreement with actual experiments, we observe that for certain parameter combinations the experiment path can become trapped in local minima. We demonstrate approaches for monitoring the automated experiment in both the real and feature spaces of the system and for tracking the knowledge acquisition of the DKL model. Based on these, we construct intervention strategies, thus defining a human-in-the-loop automated experiment (hAE). This approach can be further extended to other techniques, including 4D STEM and other forms of spectroscopic imaging.

Fabricating Paper Circuits with Subtractive Processing

Authors: Ruhan Yang, Krithik Ranjan, Ellen Yi-Luen Do

Link: http://arxiv.org/abs/2404.07364v1

Abstract: This paper introduces a new method of paper circuit fabrication that overcomes design barriers and increases flexibility in circuit design. Conventional circuit boards rely on thin traces, which limits the complexity and accuracy when applied to paper circuits. To address this issue, we propose a method that uses large conductive zones in paper circuits and performs subtractive processing during their fabrication. This approach eliminates design barriers and allows for more flexibility in circuit design. We introduce PaperCAD, a software tool that simplifies the design process by converting traditional circuit design to paper circuit design. We demonstrate our technique by creating two paper circuit boards. Our approach has the potential to promote the development of new applications for paper circuits.

"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output

Authors: Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, Carrie J. Cai

Link: http://arxiv.org/abs/2404.07362v1

Abstract: Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adheres to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.

Enhancing Accessibility in Soft Robotics: Exploring Magnet-Embedded Paper-Based Interactions

Authors: Ruhan Yang, Ellen Yi-Luen Do

Link: http://arxiv.org/abs/2404.07360v1

Abstract: This paper explores the implementation of embedded magnets to enhance paper-based interactions. The integration of magnets in paper-based interactions simplifies the fabrication process, making it more accessible for building soft robotics systems. We discuss various interaction patterns achievable through this approach and highlight their potential applications.

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos

Authors: Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang

Link: http://arxiv.org/abs/2404.07351v1

Abstract: Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important. To effectively automate the process of video analysis based on eye-tracking data, it is important to accurately replicate human gaze behavior. However, this task presents significant challenges due to the inherent complexity and ambiguity of human gaze patterns. In this work, we introduce a novel method for simulating human gaze behavior. Our approach uses a transformer-based reinforcement learning algorithm to train an agent that acts as a human observer, with the primary role of watching videos and simulating human gaze behavior. We employed an eye-tracking dataset gathered from videos generated by the VirtualHome simulator, with a primary focus on activity recognition. Our experimental results demonstrate the effectiveness of our gaze prediction method by highlighting its capability to replicate human gaze behavior and its applicability for downstream tasks where real human gaze is used as input.

Mixed Reality Heritage Performance As a Decolonising Tool for Heritage Sites

Authors: Mariza Dima, Damon Daylamani-Zad, Vangelis Lympouridis

Link: http://arxiv.org/abs/2404.07348v1

Abstract: In this paper we introduce two world-first Mixed Reality (MR) experiences that fuse smart AR glasses and live theatre and take place in a heritage site with the purpose of revealing the site's hidden and difficult histories about slavery. We term these unique general-audience experiences Mixed Reality Heritage Performances (MRHP). Along with the development of our initial two performances, we designed and developed a tool and guidelines that can help heritage organisations with their decolonising process by critically engaging the public with under-represented voices and viewpoints of troubled European and colonial narratives. The evaluations showed the embodied and affective potential of MRHP to attract and educate heritage visitors. Insights from the design process are being formulated into an extensive design toolkit that aims to support experience design, theatre, and heritage professionals in collaboratively carrying out similar projects.

Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention

Authors: Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang

Link: http://arxiv.org/abs/2404.07347v1

Abstract: Humans utilize their gaze to concentrate on essential information while perceiving and interpreting intentions in videos. Incorporating human gaze into computational algorithms can significantly enhance model performance in video understanding tasks. In this work, we address a challenging and innovative task in video understanding: predicting the actions of an agent in a video based on a partial video. We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input. Our method utilizes a Graph Neural Network to recognize the agent's intention and predict the action sequence to fulfill this intention. To assess the effectiveness of our approach, we collect a dataset containing household activities generated in the VirtualHome environment, accompanied by human gaze data collected while viewing the videos. Our method outperforms state-of-the-art techniques, achieving a 7% improvement in accuracy for 18-class intention recognition. This highlights the efficiency of our method in learning important features from human gaze data.

Evaluating Navigation and Comparison Performance of Computational Notebooks on Desktop and in Virtual Reality

Authors: Sungwon In, Erick Krokos, Kirsten Whitley, Chris North, Yalong Yang

Link: http://arxiv.org/abs/2404.07161v1

Abstract: The computational notebook serves as a versatile tool for data analysis. However, its conventional user interface falls short of keeping pace with the ever-growing data-related tasks, signaling the need for novel approaches. With the rapid development of interaction techniques and computing environments, there is a growing interest in integrating emerging technologies in data-driven workflows. Virtual reality, in particular, has demonstrated its potential in interactive data visualizations. In this work, we aimed to experiment with adapting computational notebooks into VR and verify the potential benefits VR can bring. We focus on the navigation and comparison aspects as they are primitive components in analysts' workflow. To further improve comparison, we have designed and implemented a Branching&Merging functionality. We tested computational notebooks on the desktop and in VR, both with and without the added Branching&Merging capability. We found VR significantly facilitated navigation compared to desktop, and the ability to create branches enhanced comparison.

Exploring Physiological Responses in Virtual Reality-based Interventions for Autism Spectrum Disorder: A Data-Driven Investigation

Authors: Gianpaolo Alvari, Ersilia Vallefuoco, Melanie Cristofolini, Elio Salvadori, Marco Dianti, Alessia Moltani, Davide Dal Castello, Paola Venuti, Cesare Furlanello

Link: http://arxiv.org/abs/2404.07159v1

Abstract: Virtual Reality (VR) has emerged as a promising tool for enhancing social skills and emotional well-being in individuals with Autism Spectrum Disorder (ASD). Through a technical exploration, this study employs a multiplayer serious gaming environment within VR, engaging 34 individuals diagnosed with ASD and employing high-precision biosensors for a comprehensive view of the participants' arousal and responses during the VR sessions. Participants were subjected to a series of 3 virtual scenarios designed in collaboration with stakeholders and clinical experts to promote socio-cognitive skills and emotional regulation in a controlled and structured virtual environment. We combined the framework with wearable non-invasive sensors for bio-signal acquisition, focusing on the collection of heart rate variability and respiratory patterns to monitor participants' behaviors. Further, behavioral assessments were conducted using observation and semi-structured interviews, with the data analyzed in conjunction with physiological measures to identify correlations and explore digital-intervention efficacy. Preliminary analysis revealed significant correlations between physiological responses and behavioral outcomes, indicating the potential of physiological feedback to enhance VR-based interventions for ASD. The study demonstrated the feasibility of using real-time data to adapt virtual scenarios, suggesting a promising avenue to support personalized therapy. The integration of quantitative physiological feedback into digital platforms represents a step forward in personalized intervention for ASD. By leveraging real-time data to adjust therapeutic content, this approach promises to enhance the efficacy and engagement of digital-based therapies.

How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models

Authors: Unnseo Park, Venkatesh Sivaraman, Adam Perer

Link: http://arxiv.org/abs/2404.07148v1

Abstract: Reinforcement learning (RL) is a promising approach to generate treatment policies for sepsis patients in intensive care. While retrospective evaluation metrics show decreased mortality when these policies are followed, studies with clinicians suggest their recommendations are often spurious. We propose that these shortcomings may be due to lack of diversity in observed actions and outcomes in the training data, and we construct experiments to investigate the feasibility of predicting sepsis disease severity changes due to clinician actions. Preliminary results suggest incorporating action information does not significantly improve model performance, indicating that clinician actions may not be sufficiently variable to yield measurable effects on disease progression. We discuss the implications of these findings for optimizing sepsis treatment.
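
A hedged sketch of the kind of ablation described: fit one dynamics model on state features alone and one on state plus action features, then compare held-out error. The data here is synthetic (with actions deliberately uninformative), so it mirrors the paper's observation only by construction.

```python
# Sketch of the ablation described above: predict next-step severity from
# state features alone vs. state plus clinician-action features, and compare
# held-out error. Data and feature meanings are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
state = rng.normal(size=(n, 5))    # e.g. vitals and labs
action = rng.normal(size=(n, 2))   # e.g. fluid and vasopressor doses
# Next-step severity depends only on state here, so actions add no signal.
next_severity = state @ rng.normal(size=5) + rng.normal(scale=0.5, size=n)

for name, X in [("state only", state),
                ("state + action", np.hstack([state, action]))]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, next_severity, random_state=0)
    model = GradientBoostingRegressor().fit(X_tr, y_tr)
    print(name, mean_absolute_error(y_te, model.predict(X_te)))
```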

"My toxic trait is thinking I'll remember this": gaps in the learner experience of video tutorials for feature-rich software

Authors: Ian Drosos, Advait Sarkar, Andrew D. Gordon

Link: http://arxiv.org/abs/2404.07114v1

Abstract: Video tutorials are a popular medium for informal and formal learning. However, when learners attempt to view and follow along with these tutorials, they encounter what we call gaps, that is, issues that can prevent learning. We examine the gaps encountered by users of video tutorials for feature-rich software, such as spreadsheets. We develop a theory and taxonomy of such gaps, identifying how they act as barriers to learning, by collecting and analyzing 360 viewer comments from 90 Microsoft Excel video tutorials published by 43 creators across YouTube, TikTok, and Instagram. We conducted contextual interviews with 8 highly influential tutorial creators to investigate the gaps their viewers experience and how they address them. Further, we obtain insights into their creative process and frustrations when creating video tutorials. Finally, we present creators with two designs that aim to address gaps identified in the comment analysis, gathering their feedback and alternative design ideas.

VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning

Authors: Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos

Link: http://arxiv.org/abs/2404.07078v1

Abstract: Recognising emotions in context involves identifying the apparent emotions of an individual, taking into account contextual cues from the surrounding scene. Previous approaches to this task have involved the design of explicit scene-encoding architectures or the incorporation of external scene-related information, such as captions. However, these methods often utilise limited contextual information or rely on intricate training pipelines. In this work, we leverage the groundbreaking capabilities of Vision-and-Large-Language Models (VLLMs) to enhance in-context emotion classification without introducing complexity to the training process in a two-stage approach. In the first stage, we propose prompting VLLMs to generate descriptions in natural language of the subject's apparent emotion relative to the visual context. In the second stage, the descriptions are used as contextual information and, along with the image input, are used to train a transformer-based architecture that fuses text and visual features before the final classification task. Our experimental results show that the text and image features have complementary information, and our fused architecture significantly outperforms the individual modalities without any complex training methods. We evaluate our approach on three different datasets, namely, EMOTIC, CAER-S, and BoLD, and achieve state-of-the-art or comparable accuracy across all datasets and metrics compared to much more complex approaches. The code will be made publicly available on github: https://github.com/NickyFot/EmoCommonSense.git
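
A minimal sketch of the second-stage fusion described above, assuming precomputed text and image feature vectors and illustrative dimensions (768- and 512-d inputs, 26 output classes as in EMOTIC); the authors' exact architecture may differ.

```python
# Sketch (assumed dimensions/architecture, not the authors' exact model):
# fuse a text embedding of the VLLM-generated emotion description with an
# image embedding via a small transformer encoder before classification.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, text_dim=768, img_dim=512, d_model=256, n_classes=26):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, d_model)
        self.img_proj = nn.Linear(img_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, text_feat, img_feat):
        # Treat the two modalities as a two-token sequence.
        tokens = torch.stack(
            [self.text_proj(text_feat), self.img_proj(img_feat)], dim=1)
        fused = self.encoder(tokens).mean(dim=1)  # pool over the two tokens
        return self.head(fused)

logits = FusionClassifier()(torch.randn(8, 768), torch.randn(8, 512))
print(logits.shape)  # torch.Size([8, 26])
```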

WordDecipher: Enhancing Digital Workspace Communication with Explainable AI for Non-native English Speakers

Authors: Yuexi Chen, Zhicheng Liu

Link: http://arxiv.org/abs/2404.07005v1

Abstract: Non-native English speakers (NNES) face challenges in digital workspace communication (e.g., emails, Slack messages), often inadvertently translating expressions from their native languages, which can lead to awkward or incorrect usage. Current AI-assisted writing tools are equipped with fluency enhancement and rewriting suggestions; however, NNES may struggle to grasp the subtleties among various expressions, making it challenging to choose the one that accurately reflects their intent. Such challenges are exacerbated in high-stakes text-based communications, where the absence of non-verbal cues heightens the risk of misinterpretation. By leveraging the latest advancements in large language models (LLM) and word embeddings, we propose WordDecipher, an explainable AI-assisted writing tool to enhance digital workspace communication for NNES. WordDecipher not only identifies the perceived social intentions detected in users' writing, but also generates rewriting suggestions aligned with users' intended messages, either numerically or by inferring from users' writing in their native language. Then, WordDecipher provides an overview of nuances to help NNES make selections. Through a usage scenario, we demonstrate how WordDecipher can significantly enhance an NNES's ability to communicate her request, showcasing its potential to transform workspace communication for NNES.
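
One plausible building block for such a tool, sketched under assumptions: rank candidate rewrites by embedding similarity to a statement of the user's intended social intention. The `embed` function below is a random stand-in for a real sentence-embedding model, and the intent phrasing is invented.

```python
# Sketch of one WordDecipher-style step: rank candidate rewrites by how
# closely their embeddings match the user's intended social intention.
# The embedding model and intent phrasing are illustrative assumptions.
import numpy as np

def embed(texts):
    # Stand-in for a real sentence-embedding model; returns one
    # unit-normalized vector per text (random, for illustration only).
    rng = np.random.default_rng(abs(hash(tuple(texts))) % 2**32)
    v = rng.normal(size=(len(texts), 384))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

intent = embed(["polite but firm request for a deadline extension"])[0]
candidates = [
    "Could you possibly extend the deadline?",
    "I need the deadline moved.",
    "Would it be possible to extend the deadline by two days?",
]
scores = embed(candidates) @ intent  # cosine similarity (unit vectors)
for text, s in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{s:+.2f}  {text}")
```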

Untangling Critical Interaction with AI in Students Written Assessment

Authors: Antonette Shibani, Simon Knight, Kirsty Kitto, Ajanie Karunanayake, Simon Buckingham Shum

Link: http://arxiv.org/abs/2404.06955v1

Abstract: Artificial Intelligence (AI) has become a ubiquitous part of society, but a key challenge exists in ensuring that humans are equipped with the required critical thinking and AI literacy skills to interact with machines effectively by understanding their capabilities and limitations. These skills are particularly important for learners to develop in the age of generative AI where AI tools can demonstrate complex knowledge and ability previously thought to be uniquely human. To activate effective human-AI partnerships in writing, this paper provides a first step toward conceptualizing the notion of critical learner interaction with AI. Using both theoretical models and empirical data, our preliminary findings suggest a general lack of Deep interaction with AI during the writing process. We believe that the outcomes can lead to better task and tool design in the future for learners to develop deep, critical thinking when interacting with AI.

ChildCIdbLong: Longitudinal Child-Computer Interaction Database and Quantitative Analysis for Child Development

Authors: Juan Carlos Ruiz-Garcia, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Jaime Herreros-Rodriguez

Link: http://arxiv.org/abs/2404.06919v1

Abstract: This article provides a comprehensive overview of recent research in the area of Child-Computer Interaction (CCI). The main contributions of the present article are two-fold. First, we present a novel longitudinal CCI database named ChildCIdbLong, which comprises over 600 children aged 18 months to 8 years old, acquired continuously over 4 academic years (2019-2023). As a result, ChildCIdbLong comprises over 12K test acquisitions on a tablet device. Different tests are considered in ChildCIdbLong, requiring different touch and stylus gestures, enabling evaluation of skills like hand-eye coordination, fine motor skills, planning, and visual tracking, among others. In addition to the ChildCIdbLong database, we propose a novel quantitative metric called Test Quality (Q), designed to measure the motor and cognitive development of children through their interaction with a tablet device. In order to provide a better comprehension of the proposed Q metric, popular percentile-based growth representations are introduced for each test, providing a two-dimensional space to compare children's development with respect to the typical age skills of the population. The results achieved in the present article highlight the potential of the novel ChildCIdbLong database in conjunction with the proposed Q metric to measure the motor and cognitive development of children as they grow up. The proposed framework could be very useful as an automatic tool to support child experts (e.g., paediatricians, educators, or neurologists) for early detection of potential physical/cognitive impairments during children's development.

SARA: Smart AI Reading Assistant for Reading Comprehension

Authors: Enkeleda Thaqi, Mohamed Mantawy, Enkelejda Kasneci

Link: http://arxiv.org/abs/2404.06906v1

Abstract: SARA integrates Eye Tracking and state-of-the-art large language models in a mixed reality framework to enhance the reading experience by providing personalized assistance in real-time. By tracking eye movements, SARA identifies the text segments that attract the user's attention the most and potentially indicate uncertain areas and comprehension issues. The process involves these key steps: text detection and extraction, gaze tracking and alignment, and assessment of detected reading difficulty. The results are customized solutions presented directly within the user's field of view as virtual overlays on identified difficult text areas. This support enables users to overcome challenges like unfamiliar vocabulary and complex sentences by offering additional context, rephrased solutions, and multilingual help. SARA's innovative approach demonstrates it has the potential to transform the reading experience and improve reading proficiency.

Impact of Extensions on Browser Performance: An Empirical Study on Google Chrome

Authors: Bihui Jin, Heng Li, Ying Zou

Link: http://arxiv.org/abs/2404.06827v1

Abstract: Web browsers have been used widely by users to conduct various online activities, such as information seeking or online shopping. To improve user experience and extend the functionality of browsers, practitioners provide mechanisms to allow users to install third-party-provided plugins (i.e., extensions) on their browsers. However, little is known about the performance implications caused by such extensions. In this paper, we conduct an empirical study to understand the impact of extensions on the user-perceived performance (i.e., energy consumption and page load time) of Google Chrome, the most popular browser. We study a total of 72 representative extensions from 11 categories (e.g., Developer Tools and Sports). We observe that browser performance can be negatively impacted by the use of extensions, even when the extensions are used in unintended circumstances (e.g., when logging into an extension is not granted but required, or when an extension is not used for designated websites). We also identify a set of factors that significantly influence the performance impact of extensions, such as code complexity and privacy practices (i.e., collection of user data) adopted by the extensions. Based on our empirical observations, we provide recommendations for developers and users to mitigate the performance impact of browser extensions, such as conducting performance testing and optimization for unintended usage scenarios of extensions, or adhering to proper usage practices of extensions (e.g., logging into an extension when required).

A proposal for a revised meta-architecture of intelligent tutoring systems to foster explainability and transparency for educators

Authors: Florian Gnadlinger, Simone Kriglstein

Link: http://arxiv.org/abs/2404.06820v1

Abstract: This contribution draws attention to implications connected with meta-architectural design decisions for intelligent tutoring systems in the context of formative assessments. As a first result of addressing this issue, this contribution presents a meta-architectural system design that includes the role of educators.

Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems

Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen

Link: http://arxiv.org/abs/2404.06762v1

Abstract: Intelligent Tutoring Systems (ITSs) can provide personalized and self-paced learning experiences. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantly enhance student engagement and learning efficiency. However, characterizing and simulating students' personas remain challenging in training and evaluating conversational ITSs. In this work, we propose a framework to construct profiles of different student groups by refining and integrating both cognitive and noncognitive aspects, and leverage LLMs for personality-aware student simulation in a language learning scenario. We further enhance the framework with multi-aspect validation, and conduct extensive analysis from both teacher and student perspectives. Our experimental results show that state-of-the-art LLMs can produce diverse student responses according to the given language ability and personality traits, and trigger the teacher's adaptive scaffolding strategies.

Incremental XAI: Memorable Understanding of AI with Incremental Explanations

Authors: Jessica Y. Bo, Pan Hao, Brian Y. Lim

Link: http://arxiv.org/abs/2404.06733v1

Abstract: Many explainable AI (XAI) techniques strive for interpretability by providing concise salient information, such as sparse linear factors. However, users see either inaccurate global explanations or highly varying local explanations. We propose to provide more detailed explanations by leveraging the human cognitive capacity to accumulate knowledge by incrementally receiving more details. Focusing on linear factor explanations (factors $\times$ values = outcome), we introduce Incremental XAI to automatically partition explanations for general and atypical instances by providing Base + Incremental factors to help users read and remember more faithful explanations. Memorability is improved by reusing base factors and reducing the number of factors shown in atypical cases. In modeling, formative, and summative user studies, we evaluated the faithfulness, memorability, and understandability of Incremental XAI against baseline explanation methods. This work contributes towards more usable explanations that users can better internalize to facilitate intuitive engagement with AI.
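
A minimal sketch of the Base + Incremental scheme under the factors $\times$ values = outcome framing: base factors are reused for every instance, and incremental factors are added only for atypical ones. The factor names and weights below are invented for illustration; the paper learns the partition from data.

```python
# Sketch of the Base + Incremental factor scheme (factors x values = outcome).
# Factor names and weights are illustrative assumptions.
base = {"sqft": 120.0, "bedrooms": 9000.0}   # shared linear factors
incremental = {"waterfront": 55000.0}        # extra factors, atypical cases only

def explain(instance, atypical=False):
    factors = dict(base)
    if atypical:
        factors.update(incremental)          # reuse base, add details
    contribs = {f: w * instance.get(f, 0.0) for f, w in factors.items()}
    return contribs, sum(contribs.values())

contribs, price = explain({"sqft": 900, "bedrooms": 2, "waterfront": 1},
                          atypical=True)
print(contribs, round(price))
```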

MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education

Authors: Murong Yue, Wijdane Mifdal, Yixuan Zhang, Jennifer Suh, Ziyu Yao

Link: http://arxiv.org/abs/2404.06711v1

Abstract: Mathematical modeling (MM) is considered a fundamental skill for students in STEM disciplines. Practicing the MM skill is often most effective when students can engage in group discussion and collaborative problem-solving. However, due to unevenly distributed teachers and educational resources needed to monitor such group activities, students do not always receive equal opportunities for this practice. Excitingly, large language models (LLMs) have recently demonstrated strong capability in both modeling mathematical problems and simulating characters with different traits and properties. Drawing inspiration from the advancement of LLMs, in this work, we present MATHVC, the very first LLM-powered virtual classroom containing multiple LLM-simulated student characters, with whom a human student can practice their MM skill. To encourage each LLM character's behaviors to be aligned with their specified math-relevant properties (termed "characteristics alignment") and the overall conversational procedure to be close to an authentic student MM discussion (termed "conversational procedural alignment"), we proposed three innovations: integrating MM domain knowledge into the simulation, defining a symbolic schema as the ground for character simulation, and designing a meta planner at the platform level to drive the conversational procedure. Through experiments and ablation studies, we confirmed the effectiveness of our simulation approach and showed the promise for MATHVC to benefit real-life students in the future.

Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition

Authors: Kehua Feng, Keyan Ding, Kede Ma, Zhihua Wang, Qiang Zhang, Huajun Chen

Link: http://arxiv.org/abs/2404.08008v1

Abstract: The past years have witnessed a proliferation of large language models (LLMs). Yet, automated and unbiased evaluation of LLMs is challenging due to the inaccuracy of standard metrics in reflecting human preferences and the inefficiency in sampling informative and diverse test examples. While human evaluation remains the gold standard, it is expensive and time-consuming, especially when dealing with a large number of testing samples. To address this problem, we propose a sample-efficient human evaluation method based on MAximum Discrepancy (MAD) competition. MAD automatically selects a small set of informative and diverse instructions, each adapted to two LLMs, whose responses are subject to three-alternative forced choice by human subjects. The pairwise comparison results are then aggregated into a global ranking using the Elo rating system. We select eight representative LLMs and compare them in terms of four skills: knowledge understanding, mathematical reasoning, writing, and coding. Experimental results show that the proposed method achieves a reliable and sensible ranking of LLMs' capabilities, identifies their relative strengths and weaknesses, and offers valuable insights for further LLM advancement.
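
The Elo aggregation step is standard and can be sketched directly; the K-factor and initial rating below are conventional defaults, not values reported by the paper.

```python
# Sketch: aggregate pairwise comparison outcomes into a global ranking with
# the Elo rating system. K and the initial rating are conventional choices.
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a is 1 if A wins, 0 if B wins, 0.5 for the 'tie' choice."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

ratings = {"model_A": 1500.0, "model_B": 1500.0}
# Each tuple: (first model, second model, score for the first model).
for a, b, s in [("model_A", "model_B", 1), ("model_A", "model_B", 0.5)]:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], s)
print(ratings)
```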

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

Authors: Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi

Link: http://arxiv.org/abs/2404.06664v1

Abstract: Frontier large language models (LLMs) are developed by researchers and practitioners with skewed cultural backgrounds and on datasets with skewed sources. However, LLMs' (lack of) multicultural knowledge cannot be effectively assessed with current methods for developing benchmarks. Existing multicultural evaluations primarily rely on expensive and restricted human annotations or potentially outdated internet resources. Thus, they struggle to capture the intricacy, dynamics, and diversity of cultural norms. LLM-generated benchmarks are promising, yet risk propagating the same biases they are meant to measure. To synergize the creativity and expert cultural knowledge of human annotators and the scalability and standardizability of LLM-based automation, we introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build truly challenging evaluation datasets for assessing the multicultural knowledge of LLMs, while improving annotators' capabilities and experiences. Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating, in a gamified manner, cultural questions that modern LLMs fail at. Importantly, the increased level of AI assistance (e.g., LLM-generated revision hints) empowers users to create more difficult questions with an enhanced sense of their own creativity, shedding light on the promise of involving heavier AI assistance in modern evaluation dataset creation procedures. Through a series of 1-hour workshop sessions, we gather CULTURALBENCH-V0.1, a compact yet high-quality evaluation dataset built from users' red-teaming attempts, on which different families of modern LLMs achieve accuracy ranging from 37.7% to 72.2%, revealing a notable gap in LLMs' multicultural proficiency.

2024-04-09

Missing Pieces: How Framing Uncertainty Impacts Longitudinal Trust in AI Decision Aids -- A Gig Driver Case Study

Authors: Rex Chen, Ruiyi Wang, Norman Sadeh, Fei Fang

Link: http://arxiv.org/abs/2404.06432v1

Abstract: Decision aids based on artificial intelligence (AI) are becoming increasingly common. When such systems are deployed in environments with inherent uncertainty, following AI-recommended decisions may lead to a wide range of outcomes. In this work, we investigate how the framing of uncertainty in outcomes impacts users' longitudinal trust in AI decision aids, which is crucial to ensuring that these systems achieve their intended purposes. More specifically, we use gig driving as a representative domain to address the question: how does exposing uncertainty at different levels of granularity affect the evolution of users' trust and their willingness to rely on recommended decisions? We report on a longitudinal mixed-methods study $(n = 51)$ where we measured the trust of gig drivers as they interacted with an AI-based schedule recommendation tool. Statistically significant quantitative results indicate that participants' trust in and willingness to rely on the tool for planning depended on the perceived accuracy of the tool's estimates; that providing ranged estimates improved trust; and that increasing prediction granularity and using hedging language improved willingness to rely on the tool even when trust was low. Additionally, we report on interviews with participants which revealed a diversity of experiences with the tool, suggesting that AI systems must build trust by going beyond general designs to calibrate the expectations of individual users.

Apprentices to Research Assistants: Advancing Research with Large Language Models

Authors: M. Namvarpour, A. Razi

Link: http://arxiv.org/abs/2404.06404v1

Abstract: Large Language Models (LLMs) have emerged as powerful tools in various research domains. This article examines their potential through a literature review and firsthand experimentation. While LLMs offer benefits like cost-effectiveness and efficiency, challenges such as prompt tuning, biases, and subjectivity must be addressed. The study presents insights from experiments utilizing LLMs for qualitative analysis, highlighting successes and limitations. Additionally, it discusses strategies for mitigating challenges, such as prompt optimization techniques and leveraging human expertise. This study aligns with the 'LLMs as Research Tools' workshop's focus on integrating LLMs into HCI data work critically and ethically. By addressing both opportunities and challenges, our work contributes to the ongoing dialogue on their responsible application in research.

ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

Authors: Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C. -W. Phan

Link: http://arxiv.org/abs/2404.06243v1

Abstract: Human action or activity recognition in videos is a fundamental task in computer vision with applications in surveillance and monitoring, self-driving cars, sports analytics, human-robot interaction and many more. Traditional supervised methods require large annotated datasets for training, which are expensive and time-consuming to acquire. This work proposes a novel approach using Cross-Architecture Pseudo-Labeling with contrastive learning for semi-supervised action recognition. Our framework leverages both labeled and unlabelled data to robustly learn action representations in videos, combining pseudo-labeling with contrastive learning for effective learning from both types of samples. We introduce a novel cross-architecture approach where 3D Convolutional Neural Networks (3D CNNs) and video transformers (ViT) are utilised to capture different aspects of action representations; hence we call it ActNetFormer. The 3D CNNs excel at capturing spatial features and local dependencies in the temporal domain, while ViT excels at capturing long-range dependencies across frames. By integrating these complementary architectures within the ActNetFormer framework, our approach can effectively capture both local and global contextual information of an action. This comprehensive representation learning enables the model to achieve better performance in semi-supervised action recognition tasks by leveraging the strengths of each of these architectures. Experimental results on standard action recognition datasets demonstrate that our approach performs better than the existing methods, achieving state-of-the-art performance with only a fraction of labeled data. The official website of this work is available at: https://github.com/rana2149/ActNetFormer.

Multimodal Road Network Generation Based on Large Language Model

Authors: Jiajing Chen, Weihang Xu, Haiming Cao, Zihuan Xu, Yu Zhang, Zhao Zhang, Siyao Zhang

Link: http://arxiv.org/abs/2404.06227v1

Abstract: With the increasing popularity of ChatGPT, large language models (LLMs) have demonstrated their capabilities in communication and reasoning, showing promise for the intelligentization of the transportation sector. However, they still face challenges in domain-specific knowledge. This paper aims to leverage LLMs' reasoning and recognition abilities to replace traditional user interfaces and create an "intelligent operating system" for transportation simulation software, exploring their potential in transportation modeling and simulation. We introduce Network Generation AI (NGAI), integrating LLMs with road network modeling plugins, validated through experiments for accuracy and robustness. NGAI's effective use has reduced modeling costs, revolutionized transportation simulations, optimized user steps, and demonstrated a novel approach for LLM integration in the transportation field.

Privacy-preserving Scanpath Comparison for Pervasive Eye Tracking

Authors: Suleyman Ozdel, Efe Bozkir, Enkelejda Kasneci

Link: http://arxiv.org/abs/2404.06216v1

Abstract: As eye tracking becomes pervasive with screen-based devices and head-mounted displays, privacy concerns regarding eye-tracking data have escalated. While state-of-the-art approaches for privacy-preserving eye tracking mostly involve differential privacy and empirical data manipulations, previous research has not focused on methods for scanpaths. We introduce a novel privacy-preserving scanpath comparison protocol designed for the widely used Needleman-Wunsch algorithm, a generalized version of the edit distance algorithm. Particularly, by incorporating the Paillier homomorphic encryption scheme, our protocol ensures that no private information is revealed. Furthermore, we introduce a random processing strategy and a multi-layered masking method to obfuscate the values while preserving the original order of encrypted editing operation costs. This minimizes communication overhead, requiring a single communication round for each iteration of the Needleman-Wunsch process. We demonstrate the efficiency and applicability of our protocol on three publicly available datasets with comprehensive computational performance analyses and make our source code publicly accessible.
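
The protocol builds on the Needleman-Wunsch dynamic program, whose plaintext form is sketched below for scanpaths encoded as strings over areas of interest; the paper's contribution, evaluating the minimum over Paillier-encrypted and order-preserved costs, is omitted here.

```python
# Plaintext sketch of the Needleman-Wunsch recurrence the protocol builds on
# (a generalized edit distance). The paper's privacy layer, which compares
# encrypted and obfuscated costs, is intentionally left out.
def needleman_wunsch(a, b, sub_cost=1, gap_cost=1):
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap_cost
    for j in range(1, m + 1):
        d[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost),
                d[i - 1][j] + gap_cost,   # gap in b
                d[i][j - 1] + gap_cost,   # gap in a
            )
    return d[n][m]

# Scanpaths encoded as strings over areas of interest.
print(needleman_wunsch("ABCAB", "ABAB"))  # 1
```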

EVE: Enabling Anyone to Train Robot using Augmented Reality

Authors: Jun Wang, Chun-Cheng Chang, Jiafei Duan, Dieter Fox, Ranjay Krishna

Link: http://arxiv.org/abs/2404.06089v1

Abstract: The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training a robot to automate a task typically requires physical robots and expensive demonstration data from trained human annotators. Consequently, only those with access to physical robots produce demonstrations to train robots. To mitigate this issue, we introduce EVE, an iOS app that enables everyday users to train robots using intuitive augmented reality visualizations without needing a physical robot. With EVE, users can collect demonstrations by specifying waypoints with their hands, visually inspecting the environment for obstacles, modifying existing waypoints, and verifying collected trajectories. In a user study ($N=14$, $D=30$) consisting of three common tabletop tasks, EVE outperformed three state-of-the-art interfaces in success rate and was comparable to kinesthetic teaching (physically moving a real robot) in completion time, usability, motion intent communication, enjoyment, and preference ($mean_{p}=0.30$). We conclude by enumerating limitations and design considerations for future AR-based demonstration collection systems for robotics.

Breathing New Life into Existing Visualizations: A Natural Language-Driven Manipulation Framework

Authors: Can Liu, Jiacheng Yu, Yuhan Guo, Jiayi Zhuang, Yuchu Luo, Xiaoru Yuan

Link: http://arxiv.org/abs/2404.06039v1

Abstract: We propose an approach to manipulate existing interactive visualizations to answer users' natural language queries. We analyze the natural language tasks and propose a design space of a hierarchical task structure, which allows for a systematic decomposition of complex queries. We introduce a four-level visualization manipulation space to facilitate in-situ manipulations for visualizations, enabling a fine-grained control over the visualization elements. Our methods comprise two essential components: the natural language-to-task translator and the visualization manipulation parser. The natural language-to-task translator employs advanced NLP techniques to extract structured, hierarchical tasks from natural language queries, even those with varying degrees of ambiguity. The visualization manipulation parser leverages the hierarchical task structure to streamline these tasks into a sequence of atomic visualization manipulations. To illustrate the effectiveness of our approach, we provide real-world examples and experimental results. The evaluation highlights the precision of our natural language parsing capabilities and underscores the smooth transformation of visualization manipulations.
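
As a hedged sketch of the hierarchical task structure this pipeline relies on, the snippet below models a complex query as a tree of subtasks that lowers to an ordered sequence of atomic visualization manipulations. The operation names and query are illustrative assumptions, not the paper's schema.

```python
# Sketch: a complex natural-language query decomposes into subtasks, each
# lowered to a sequence of atomic visualization manipulations.
from dataclasses import dataclass, field

@dataclass
class Manipulation:           # one atomic operation on the visualization
    op: str                   # e.g. "filter", "highlight", "sort"
    params: dict

@dataclass
class Task:
    description: str
    subtasks: list = field(default_factory=list)        # child Tasks
    manipulations: list = field(default_factory=list)   # leaf-level ops

    def flatten(self):
        """Lower the hierarchy to an ordered list of atomic manipulations."""
        ops = list(self.manipulations)
        for t in self.subtasks:
            ops.extend(t.flatten())
        return ops

query = Task("show the top 5 countries by GDP in Europe", subtasks=[
    Task("restrict to Europe",
         manipulations=[Manipulation("filter", {"region": "Europe"})]),
    Task("rank by GDP",
         manipulations=[Manipulation("sort", {"by": "gdp", "order": "desc"}),
                        Manipulation("filter", {"rank<=": 5})]),
])
print([m.op for m in query.flatten()])  # ['filter', 'sort', 'filter']
```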

Cymatics Cup: Shape-Changing Drinks by Leveraging Cymatics

Authors: Weijen Chen, Yang Yang, Kao-Hua Liu, Yun Suen Pai, Junichi Yamaoka, Kouta Minamizawa

Link: http://arxiv.org/abs/2404.06027v1

Abstract: To enhance the dining experience, prior studies in Human-Computer Interaction (HCI) and gastrophysics have demonstrated that modifying the static shape of solid foods can amplify taste perception. However, the exploration of dynamic shape-changing mechanisms in liquid foods remains largely untapped. In the present study, we employ cymatics, a scientific discipline focused on utilizing sound frequencies to generate patterns in liquids and particles to augment the drinking experience. Utilizing speakers, we dynamically reshaped liquids exhibiting five distinct taste profiles and evaluated resultant changes in taste perception and drinking experience. Our research objectives extend beyond merely augmenting taste from visual to tactile sensations; we also prioritize the experiential aspects of drinking. Through a series of experiments and workshops, we revealed a significant impact on taste perception and overall drinking experience when mediated by cymatics effects. Building upon these findings, we designed and developed tableware to integrate cymatics principles into gastronomic experiences.

Combinational Nonuniform Timeslicing of Dynamic Networks

Authors: Seokweon Jung, DongHwa Shin, Hyeon Jeon, Jinwook Seo

Link: http://arxiv.org/abs/2404.06021v1

Abstract: Dynamic networks represent the complex and evolving interrelationships between real-world entities. Given the scale and variability of these networks, finding an optimal slicing interval is essential for meaningful analysis. Nonuniform timeslicing, which adapts to density changes within the network, is drawing attention as a solution to this problem. In this research, we categorized existing algorithms into two domains -- data mining and visualization -- according to their approach to the problem. The data mining approach focuses on capturing temporal patterns of dynamic networks, while the visualization approach emphasizes lessening the burden of analysis. We then introduce a novel nonuniform timeslicing method that synthesizes the strengths of both approaches, demonstrating its efficacy with real-world data. The findings suggest that combining the two approaches offers the potential for more effective network analysis.
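
One simple density-adaptive scheme in this spirit, shown below as a sketch rather than the paper's method, places slice boundaries at quantiles of the edge timestamps so each slice carries roughly the same number of interactions.

```python
# Sketch of density-adaptive (nonuniform) timeslicing: quantile boundaries
# give each slice about the same number of interactions. This illustrates
# the general idea, not the specific algorithm the paper proposes.
import numpy as np

def nonuniform_slices(timestamps, n_slices):
    qs = np.linspace(0, 1, n_slices + 1)
    return np.quantile(np.asarray(timestamps), qs)

# Bursty timestamps: dense early activity, sparse later on.
ts = np.concatenate([np.random.uniform(0, 10, 900),
                     np.random.uniform(10, 100, 100)])
print(np.round(nonuniform_slices(ts, 4), 1))  # narrow early, wide late slices
```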

Inclusive Practices for Child-Centered AI Design and Testing

Authors: Emani Dotch, Vitica Arnold

Link: http://arxiv.org/abs/2404.05920v1

Abstract: We explore ideas and inclusive practices for designing and testing child-centered artificially intelligent technologies for neurodivergent children. AI is promising for supporting social communication, self-regulation, and sensory processing challenges common for neurodivergent children. The authors, both neurodivergent individuals and related to neurodivergent people, draw from their professional and personal experiences to offer insights on creating AI technologies that are accessible and include input from neurodivergent children. We offer ideas for designing AI technologies for neurodivergent children and considerations for including them in the design process while accounting for their sensory sensitivities. We conclude by emphasizing the importance of adaptable and supportive AI technologies and design processes and call for further conversation to refine child-centered AI design and testing methods.

2024-04-08

ClusterRadar: an Interactive Web-Tool for the Multi-Method Exploration of Spatial Clusters Over Time

Authors: Lee Mason, Blánaid Hicks, Jonas S. Almeida

Link: http://arxiv.org/abs/2404.05897v1

Abstract: Spatial cluster analysis, the detection of localized patterns of similarity in geospatial data, has a wide range of applications for scientific discovery and practical decision making. One way to detect spatial clusters is by using local indicators of spatial association, such as Local Moran's I or Getis-Ord Gi*. However, different indicators tend to produce substantially different results due to their distinct operational characteristics. Choosing a suitable method or comparing results from multiple methods is a complex task. Furthermore, spatial clusters are dynamic and it is often useful to track their evolution over time, which adds an additional layer of complexity. ClusterRadar is a web-tool designed to address these analytical challenges. The tool allows users to easily perform spatial clustering and analyze the results in an interactive environment, uniquely prioritizing temporal analysis and the comparison of multiple methods. The tool's interactive dashboard presents several visualizations, each offering a distinct perspective of the temporal and methodological aspects of the spatial clustering results. ClusterRadar has several features designed to maximize its utility to a broad user-base, including support for various geospatial formats, and a fully in-browser execution environment to preserve the privacy of sensitive data. Feedback from a varied set of researchers suggests ClusterRadar's potential for enhancing the temporal analysis of spatial clusters.
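
As a reference point for one of the indicators mentioned, Local Moran's I for unit $i$ can be written as $I_i = z_i \sum_j w_{ij} z_j$, with $z$ the standardized values and $W$ a row-standardized spatial weights matrix. The sketch below computes it on toy data with an assumed contiguity matrix; real use would add permutation-based significance testing.

```python
# Sketch of Local Moran's I: I_i = z_i * sum_j w_ij * z_j, with z the
# standardized values and W row-standardized weights. Toy data; the
# contiguity matrix is an illustrative assumption.
import numpy as np

def local_morans_i(x, W):
    z = (x - x.mean()) / x.std()
    W = W / W.sum(axis=1, keepdims=True)  # row-standardize the weights
    return z * (W @ z)

x = np.array([1.0, 1.2, 0.9, 5.0, 5.3])  # two spatial regimes
W = np.array([[0, 1, 1, 0, 0],           # binary contiguity (assumed)
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], float)
print(np.round(local_morans_i(x, W), 2))  # positive values flag local clusters
```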

With or Without Permission: Site-Specific Augmented Reality for Social Justice CHI 2024 Workshop Proceedings

Authors: Rafael M. L. Silva, Ana María Cárdenas Gasca, Joshua A. Fisher, Erica Principe Cruz, Cinthya Jauregui, Amy Lueck, Fannie Liu, Andrés Monroy-Hernández, Kai Lukoff

Link: http://arxiv.org/abs/2404.05889v1

Abstract: This volume represents the proceedings of With or Without Permission: Site-Specific Augmented Reality for Social Justice CHI 2024 workshop.

Youth as Peer Auditors: Engaging Teenagers with Algorithm Auditing of Machine Learning Applications

Authors: Luis Morales-Navarro, Yasmin B. Kafai, Vedya Konda, Danaë Metaxa

Link: http://arxiv.org/abs/2404.05874v1

Abstract: As artificial intelligence/machine learning (AI/ML) applications become more pervasive in youth lives, supporting them to interact, design, and evaluate applications is crucial. This paper positions youth as auditors of their peers' ML-powered applications to better understand algorithmic systems' opaque inner workings and external impacts. In a two-week workshop, 13 youth (ages 14-15) designed and audited ML-powered applications. We analyzed pre/post clinical interviews in which youth were presented with auditing tasks. The analyses show that after the workshop all youth identified algorithmic biases and inferred dataset and model design issues. Youth also discussed algorithmic justice issues and ML model improvements. Furthermore, youth reflected that auditing provided them new perspectives on model functionality and ideas to improve their own models. This work contributes (1) a conceptualization of algorithm auditing for youth; and (2) empirical evidence of the potential benefits of auditing. We discuss potential uses of algorithm auditing in learning and child-computer interaction research.

An empirical evaluation for defining a mid-air gesture dictionary for web-based interaction

Authors: Thomas Pasquale, Cristina Gena, Fabiana Vernero

Link: http://arxiv.org/abs/2404.05842v1

Abstract: This paper presents an empirical evaluation of mid-air gestures in a web setting. Fifty-six (56) subjects, all of them HCI students, were divided into 16 groups and involved as designers. Each group worked separately with the same requirements. Firstly, designers identified the main actions required for a web-based interaction with a university classroom search service. Secondly, they proposed a set of mid-air gestures to carry out the identified actions: 99 different mid-air gestures for 16 different web actions were produced in total. Then, designers validated their proposals involving external subjects, namely 248 users in total. Finally, we analyzed their results and identified the most recurring or intuitive gestures as well as the potential criticalities associated with their proposals. Hence, we defined a mid-air gesture dictionary that contains, according to our analysis, the most suitable gestures for each identified web action. Our results suggest that most people tend to replicate gestures used in touch-based and mouse-based interfaces also in touchless interactions, ignoring the fact that they can be problematic due to the different distance between the user and the device in each interaction context.

Human-Machine Interaction in Automated Vehicles: Reducing Voluntary Driver Intervention

Authors: Xinzhi Zhong, Yang Zhou, Varshini Kamaraj, Zhenhao Zhou, Wissam Kontar, Dan Negrut, John D. Lee, Soyoung Ahn

Link: http://arxiv.org/abs/2404.05832v1

Abstract: This paper develops a novel car-following control method to reduce voluntary driver interventions and improve traffic stability in Automated Vehicles (AVs). Through a combination of experimental and empirical analysis, we show how voluntary driver interventions can instigate substantial traffic disturbances that are amplified along the traffic upstream. Motivated by these findings, we present a framework for driver intervention based on evidence accumulation (EA), which describes the evolution of the driver's distrust in automation, ultimately resulting in intervention. Informed through the EA framework, we propose a deep reinforcement learning (DRL)-based car-following control for AVs that is strategically designed to mitigate unnecessary driver intervention and improve traffic stability. Numerical experiments are conducted to demonstrate the effectiveness of the proposed control model.
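
A hedged sketch of an evidence-accumulation intervention model in the spirit described: distrust evidence drifts upward with the perceived gap between automation behavior and driver expectation, decays slowly, and triggers a takeover when it crosses a threshold. Parameter values are illustrative, not those estimated in the paper.

```python
# Sketch of an evidence-accumulation (EA) driver-intervention model.
# Gains, leak, noise, and threshold are illustrative assumptions.
import numpy as np

def time_to_intervention(errors, gain=1.0, leak=0.05, noise=0.1,
                         threshold=5.0, seed=0):
    rng = np.random.default_rng(seed)
    evidence = 0.0
    for t, e in enumerate(errors):
        evidence += gain * e - leak * evidence + rng.normal(scale=noise)
        evidence = max(evidence, 0.0)  # distrust cannot go negative
        if evidence >= threshold:
            return t                   # driver takes over at this timestep
    return None                        # no intervention

# Gap between AV behavior and driver expectation at each timestep:
# small at first, then persistently large.
errors = np.concatenate([np.full(30, 0.05), np.full(30, 0.4)])
print(time_to_intervention(errors))
```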

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Authors: Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan

Link: http://arxiv.org/abs/2404.05719v1

Abstract: Recent advancements in multimodal large language models (MLLMs) have been noteworthy; yet these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate "any resolution" on top of Ferret to magnify details and leverage enhanced visual features. Specifically, each screen is divided into 2 sub-images based on the original aspect ratio (i.e., horizontal division for portrait screens and vertical division for landscape screens). Both sub-images are encoded separately before being sent to LLMs. We meticulously gather training samples from an extensive range of elementary UI tasks, such as icon recognition, text finding, and widget listing. These samples are formatted for instruction-following with region annotations to facilitate precise referring and grounding. To augment the model's reasoning ability, we further compile a dataset for advanced tasks, including detailed description, perception/interaction conversations, and function inference. After training on the curated datasets, Ferret-UI exhibits outstanding comprehension of UI screens and the capability to execute open-ended instructions. For model evaluation, we establish a comprehensive benchmark encompassing all the aforementioned tasks. Ferret-UI not only excels beyond most open-source UI MLLMs, but also surpasses GPT-4V on all the elementary UI tasks.
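
The "any resolution" splitting step is easy to picture in code. Below is a minimal sketch, assuming a PIL image and a simple halving along the longer axis; the function name and exact crop boundaries are our illustration, not the paper's implementation.

```python
from PIL import Image

def split_screen(img: Image.Image) -> tuple[Image.Image, Image.Image]:
    """Divide a UI screenshot into two sub-images along its longer axis,
    mirroring the aspect-ratio-based division the abstract describes."""
    w, h = img.size
    if h >= w:
        # Portrait screen: horizontal division into top and bottom halves.
        return img.crop((0, 0, w, h // 2)), img.crop((0, h // 2, w, h))
    # Landscape screen: vertical division into left and right halves.
    return img.crop((0, 0, w // 2, h)), img.crop((w // 2, 0, w, h))
```

Each sub-image would then be encoded separately before being passed to the LLM, as the abstract notes.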

Is English the New Programming Language? How About Pseudo-code Engineering?

Authors: Gian Alexandre Michaelsen, Renato P. dos Santos

Link: http://arxiv.org/abs/2404.08684v1

Abstract: Background: The integration of artificial intelligence (AI) into daily life, particularly through chatbots utilizing natural language processing (NLP), presents both revolutionary potential and unique challenges. This study investigated how different input forms affect the performance of ChatGPT, a leading language model by OpenAI, in understanding and executing complex, multi-intention tasks. Design: Employing a case study methodology supplemented by discourse analysis, the research analyzes ChatGPT's responses to inputs varying from natural language to pseudo-code engineering. The study specifically examines the model's proficiency across four categories: understanding of intentions, interpretability, completeness, and creativity. Setting and Participants: As a theoretical exploration of AI interaction, this study focuses on the analysis of structured and unstructured inputs processed by ChatGPT, without direct human participants. Data collection and analysis: The research utilizes synthetic case scenarios, including the organization of a "weekly meal plan" and a "shopping list," to assess ChatGPT's response to prompts in both natural language and pseudo-code engineering. The analysis is grounded in the identification of patterns, contradictions, and unique response elements across different input formats. Results: Findings reveal that pseudo-code engineering inputs significantly enhance the clarity and determinism of ChatGPT's responses, reducing ambiguity inherent in natural language. Enhanced natural language, structured through prompt engineering techniques, similarly improves the model's interpretability and creativity. Conclusions: The study underscores the potential of pseudo-code engineering in refining human-AI interaction and achieving more deterministic, concise, and direct outcomes, advocating for its broader application across disciplines requiring precise AI responses.

Eye Tracking on Text Reading with Visual Enhancements

Authors: Franziska Huth, Maurice Koch, Miriam Awad, Daniel Weiskopf, Kuno Kurzhals

Link: http://arxiv.org/abs/2404.05572v1

Abstract: The interplay between text and visualization is gaining importance for media where traditional text is enriched by visual elements to improve readability and emphasize facts. In two controlled eye-tracking experiments (N = 12), we approach answers to the question: How do visualization techniques influence reading behavior? We compare plain text to that marked with highlights, icons, and word-sized data visualizations. We assess quantitative metrics (eye movement, completion time, error rate) and subjective feedback (personal preference and ratings). The results indicate that visualization techniques, especially in the first experiment, show promising trends for improved reading behavior. The results also show the need for further research to make reading more effective and inform suggestions for future studies.

Interactive Formal Specification for Mathematical Problems of Engineers

Authors: Walther Neuper

Link: http://arxiv.org/abs/2404.05462v1

Abstract: The paper presents the second part of a precise description of the prototype that has been developed in the course of the ISAC project over the last two decades. This part describes the "specify-phase", while the first part describing the "solve-phase" is already published. In the specify-phase a student interactively constructs a formal specification. The ISAC prototype implements formal specifications as established in theoretical computer science; however, the input language for the construction avoids requiring users to have knowledge of logic; this makes the system useful for various engineering faculties (and also for high school). The paper discusses not only ISAC's design of the specify-phase in detail, but also gives a brief introduction to implementation with the aim of advertising the re-use of formal frameworks (including their respective front-ends) with their generic tools for language definition and their rich pool of software components for formal mathematics.

Unlocking Adaptive User Experience with Generative AI

Authors: Yutan Huang, Tanjila Kanij, Anuradha Madugalla, Shruti Mahajan, Chetan Arora, John Grundy

Link: http://arxiv.org/abs/2404.05442v1

Abstract: Developing user-centred applications that address diverse user needs requires rigorous user research. This is time-, effort- and cost-intensive. With the recent rise of generative AI techniques based on Large Language Models (LLMs), there is a possibility that these powerful tools can be used to develop adaptive interfaces. This paper presents a novel approach to develop user personas and adaptive interface candidates for a specific domain using ChatGPT. We develop user personas and adaptive interfaces using both ChatGPT and a traditional manual process and compare these outcomes. To obtain data for the personas, we collected data from 37 survey participants and 4 interviews in collaboration with a not-for-profit organisation. The comparison of ChatGPT-generated content and manual content indicates promising results that encourage using LLMs in the adaptive interface design process.

Re-Ranking News Comments by Constructiveness and Curiosity Significantly Increases Perceived Respect, Trustworthiness, and Interest

Authors: Emily Saltz, Zaria Howard, Tin Acosta

Link: http://arxiv.org/abs/2404.05429v1

Abstract: Online commenting platforms have commonly developed systems to address online harms by removing and down-ranking content. An alternative, under-explored approach is to focus on up-ranking content to proactively prioritize prosocial commentary and set better conversational norms. We present a study with 460 English-speaking US-based news readers to understand the effects of re-ranking comments by constructiveness, curiosity, and personal stories on a variety of outcomes related to willingness to participate and engage, as well as perceived credibility and polarization in a comment section. In our rich-media survey experiment, participants across these four ranking conditions and a control group reviewed prototypes of comment sections of a Politics op-ed and Dining article. We found that outcomes varied significantly by article type. Up-ranking curiosity and constructiveness improved a number of measures for the Politics article, including perceived Respect, Trustworthiness, and Interestingness of the comment section. Constructiveness also increased perceptions that the comments were favorable to Republicans, with no condition worsening perceptions of partisans. Additionally, in the Dining article, personal stories and constructiveness rankings significantly improved the perceived informativeness of the comments. Overall, these findings indicate that incorporating prosocial qualities of speech into ranking could be a promising approach to promote healthier, less polarized dialogue in online comment sections.

Indexing Analytics to Instances: How Integrating a Dashboard can Support Design Education

Authors: Ajit Jain, Andruid Kerne, Nic Lupfer, Gabriel Britain, Aaron Perrine, Yoonsuck Choe, John Keyser, Ruihong Huang, Jinsil Seo, Annie Sungkajun, Robert Lightfoot, Timothy McGuire

Link: http://arxiv.org/abs/2404.05417v1

Abstract: We investigate how to use AI-based analytics to support design education. The analytics at hand measure multiscale design, that is, students' use of space and scale to visually and conceptually organize their design work. With the goal of making the analytics intelligible to instructors, we developed a research artifact integrating a design analytics dashboard with design instances, and the design environment that students use to create them. We theorize about how Suchman's notion of mutual intelligibility requires contextualized investigation of AI in order to develop findings about how analytics work for people. We studied the research artifact in 5 situated course contexts, in 3 departments. A total of 236 students used the multiscale design environment. The 9 instructors who taught those students experienced the analytics via the new research artifact. We derive findings from a qualitative analysis of interviews with instructors regarding their experiences. Instructors reflected on how the analytics and their presentation in the dashboard have the potential to affect design education. We develop research implications addressing: (1) how indexing design analytics in the dashboard to actual design work instances helps design instructors reflect on what they mean and, more broadly, is a technique for how AI-based design analytics can support instructors' assessment and feedback experiences in situated course contexts; and (2) how multiscale design analytics, in particular, have the potential to support design education. By indexing, we mean linking that provides context, here connecting the numbers of the analytics with visually annotated design work instances.

WebXR, A-Frame and Networked-Aframe as a Basis for an Open Metaverse: A Conceptual Architecture

Authors: Giuseppe Macario

Link: http://arxiv.org/abs/2404.05317v1

Abstract: This work proposes a WebXR-based cross-platform conceptual architecture, leveraging the A-Frame and Networked-Aframe frameworks, in order to facilitate the development of an open, accessible, and interoperable metaverse. By introducing the concept of spatial web app, this research contributes to the discourse on the metaverse, offering an architecture that democratizes access to virtual environments and extended reality through the web, and aligns with Tim Berners-Lee's original vision of the World Wide Web as an open platform in the digital realm.

Exploiting Preference Elicitation in Interactive and User-centered Algorithmic Recourse: An Initial Exploration

Authors: Seyedehdelaram Esfahani, Giovanni De Toni, Bruno Lepri, Andrea Passerini, Katya Tentori, Massimo Zancanaro

Link: http://arxiv.org/abs/2404.05270v1

Abstract: Algorithmic Recourse aims to provide actionable explanations, or recourse plans, to overturn potentially unfavourable decisions taken by automated machine learning models. In this paper, we propose an interaction paradigm based on a guided interaction pattern aimed at both eliciting the users' preferences and heading them toward effective recourse interventions. In a fictional task of money lending, we compare this approach with an exploratory interaction pattern based on a combination of alternative plans and the possibility of freely changing the configurations by the users themselves. Our results suggest that users may recognize that the guided interaction paradigm improves efficiency. However, they also feel less freedom to experiment with "what-if" scenarios. Nevertheless, the time spent on the purely exploratory interface tends to be perceived as a lack of efficiency, which reduces attractiveness, perspicuity, and dependability. Conversely, for the guided interface, more time on the interface seems to increase its attractiveness, perspicuity, and dependability while not impacting the perceived efficiency. That might suggest that this type of interface should combine these two approaches by trying to support exploratory behavior while gently pushing toward a guided, effective solution.

Allowing humans to interactively guide machines where to look does not always improve human-AI team's classification accuracy

Authors: Giang Nguyen, Mohammad Reza Taesiri, Sunnie S. Y. Kim, Anh Nguyen

Link: http://arxiv.org/abs/2404.05238v2

Abstract: Via thousands of papers in Explainable AI (XAI), attention maps [vaswani2017attention] and feature attribution maps [bansal2020sam] have been established as a common means for finding how important each input feature is to an AI's decisions. It is an interesting, unexplored question whether allowing users to edit the feature importance at test time would improve a human-AI team's accuracy on downstream tasks. In this paper, we address this question by leveraging CHM-Corr, a state-of-the-art, ante-hoc explainable classifier [taesiri2022visual] that first predicts patch-wise correspondences between the input and training-set images, and then bases its classification decisions on them. We build CHM-Corr++, an interactive interface for CHM-Corr, enabling users to edit the feature attribution map provided by CHM-Corr and observe updated model decisions. Via CHM-Corr++, users can gain insights into if, when, and how the model changes its outputs, improving their understanding beyond static explanations. However, our user study with 18 users who performed 1,400 decisions finds no statistically significant evidence that our interactive approach improves user accuracy on CUB-200 bird image classification over static explanations. This challenges the hypothesis that interactivity can boost human-AI team accuracy [sokol2020one, sun2022exploring, shen2024towards, singh2024rethinking, mindlin2024beyond, lakkaraju2022rethinking, cheng2019explaining, liu2021understanding] and raises needs for future research. We open-source CHM-Corr++, an interactive tool for editing image classifier attention (see an interactive demo at http://137.184.82.109:7080/). We release code and data at https://github.com/anguyen8/chm-corr-interactive.

Fair Machine Guidance to Enhance Fair Decision Making in Biased People

Authors: Mingzhe Yang, Hiromi Arai, Naomi Yamashita, Yukino Baba

Link: http://arxiv.org/abs/2404.05228v1

Abstract: Teaching unbiased decision-making is crucial for addressing biased decision-making in daily life. Although both raising awareness of personal biases and providing guidance on unbiased decision-making are essential, the latter topic remains under-researched. In this study, we developed and evaluated an AI system aimed at educating individuals on making unbiased decisions using fairness-aware machine learning. In a between-subjects experimental design, 99 participants who were prone to bias performed personal assessment tasks. They were divided into two groups: a) those who received AI guidance for fair decision-making before the task and b) those who received no such guidance but were informed of their biases. The results suggest that although several participants doubted the fairness of the AI system, fair machine guidance prompted them to reassess their views regarding fairness, reflect on their biases, and modify their decision-making criteria. Our findings provide insights into the design of AI systems for guiding fair decision-making in humans.

Evaluation of an LLM in Identifying Logical Fallacies: A Call for Rigor When Adopting LLMs in HCI Research

Authors: Gionnieve Lim, Simon T. Perrault

Link: http://arxiv.org/abs/2404.05213v1

Abstract: There is increasing interest in the adoption of LLMs in HCI research. However, LLMs are often regarded as a panacea because of their powerful capabilities, with insufficient attention to whether they are suitable for their intended tasks. We contend that LLMs should be adopted in a critical manner following rigorous evaluation. Accordingly, we present the evaluation of an LLM in identifying logical fallacies that will form part of a digital misinformation intervention. By comparing against a labeled dataset, we found that GPT-4 achieves an accuracy of 0.79, and, for our intended use case that excludes invalid or unidentified instances, an accuracy of 0.90. This gives us the confidence to proceed with the application of the LLM while keeping in mind the areas where it still falls short. The paper describes our evaluation approach, results, and reflections on the use of the LLM for our intended task.
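
The two accuracy figures follow from scoring the same predictions over different denominators. A minimal sketch with synthetic counts (79 correct, 9 wrong, 12 invalid is our illustrative split, not the paper's data) shows how excluding invalid or unidentified instances raises the figure:

```python
def accuracy(pairs):
    """Fraction of (prediction, gold) pairs that match."""
    return sum(p == g for p, g in pairs) / len(pairs)

# Synthetic illustration: 100 instances, 79 correct, 9 wrong, 12 invalid.
data = ([("fallacy", "fallacy")] * 79
        + [("no_fallacy", "fallacy")] * 9
        + [("invalid", "fallacy")] * 12)

print(accuracy(data))  # 0.79 over all instances
print(accuracy([(p, g) for p, g in data if p != "invalid"]))  # 79/88 ~ 0.90
```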

2024-04-07

Chart What I Say: Exploring Cross-Modality Prompt Alignment in AI-Assisted Chart Authoring

Authors: Nazar Ponochevnyi, Anastasia Kuzminykh

Link: http://arxiv.org/abs/2404.05103v1

Abstract: Recent chart-authoring systems, such as Amazon Q in QuickSight and Copilot for Power BI, demonstrate an emergent focus on supporting natural language input to share meaningful insights from data through chart creation. Currently, chart-authoring systems tend to integrate voice input capabilities by relying on speech-to-text transcription, processing spoken and typed input similarly. However, cross-modality input comparisons in other interaction domains suggest that the structure of spoken and typed-in interactions could notably differ, reflecting variations in user expectations based on interface affordances. Thus, in this work, we compare spoken and typed instructions for chart creation. Findings suggest that while both text and voice instructions cover chart elements and element organization, voice descriptions have a variety of command formats, element characteristics, and complex linguistic features. Based on these findings, we developed guidelines for designing voice-based authoring-oriented systems and additional features that can be incorporated into existing text-based systems to support speech modality.

Co-design Accessible Public Robots: Insights from People with Mobility Disability, Robotic Practitioners and Their Collaborations

Authors: Howard Ziyu Han, Franklin Mingzhe Li, Alesandra Baca Vazquez, Daragh Byrne, Nikolas Martelaro, Sarah E Fox

Link: http://arxiv.org/abs/2404.05050v1

Abstract: Sidewalk robots are increasingly common across the globe. Yet, their operation on public paths poses challenges for people with mobility disabilities (PwMD) who face barriers to accessibility, such as insufficient curb cuts. We interviewed 15 PwMD to understand how they perceive sidewalk robots. Findings indicated that PwMD feel they have to compete for space on the sidewalk when robots are introduced. We next interviewed eight robotics practitioners to learn about their attitudes towards accessibility. Practitioners described how issues often stem from robotic companies addressing accessibility only after problems arise. Both interview groups underscored the importance of integrating accessibility from the outset. Building on this finding, we held four co-design workshops with PwMD and practitioners in pairs. These convenings surfaced accessibility needs around robots operating in public spaces and in the public interest. Our study aims to set the stage for a more inclusive future around public service robots.

Reduction of Forgetting by Contextual Variation During Encoding Using 360-Degree Video-Based Immersive Virtual Environments

Authors: Takato Mizuho, Takuji Narumi, Hideaki Kuzuoka

Link: http://arxiv.org/abs/2404.05007v1

Abstract: Recall impairment in a different environmental context from learning is called context-dependent forgetting. Two learning methods have been proposed to prevent context-dependent forgetting: reinstatement and decontextualization. Reinstatement matches the environmental context between learning and retrieval, whereas decontextualization involves repeated learning in various environmental contexts and eliminates the context dependency of memory. Conventionally, these methods have been validated by switching between physical rooms. However, in this study, we use immersive virtual environments (IVEs) as the environmental context assisted by virtual reality (VR), which is known for its low cost and high reproducibility compared to traditional manipulation. Whereas most existing studies using VR have failed to reveal the reinstatement effect, we test its occurrence using a 360-degree video-based IVE with improved familiarity and realism instead of a computer graphics-based IVE. Furthermore, we are the first to address decontextualization using VR. Our experiment showed that repeated learning in the same constant IVE as retrieval did not significantly reduce forgetting compared to repeated learning in different constant IVEs. Conversely, repeated learning in various IVEs significantly reduced forgetting relative to repeated learning in constant IVEs. These findings contribute to the design of IVEs for VR-based applications, particularly in educational settings.

Towards Developing Brain-Computer Interfaces for People with Multiple Sclerosis

Authors: John S. Russo, Tim Mahoney, Kirill Kokorin, Ashley Reynolds, Chin-Hsuan Sophie Lin, Sam E. John, David B. Grayden

Link: http://arxiv.org/abs/2404.04965v2

Abstract: Multiple Sclerosis (MS) is a severely disabling condition that leads to various neurological symptoms. A Brain-Computer Interface (BCI) may substitute some lost function; however, there is a lack of BCI research in people with MS. To progress this research area effectively and efficiently, we aimed to evaluate user needs and assess the feasibility and user-centric requirements of a BCI for people with MS. We conducted an online survey of 34 people with MS to qualitatively assess user preferences and establish the initial steps of user-centred design. The survey aimed to understand their interest and preferences in BCI and bionic applications. We demonstrated widespread interest in BCI applications across all stages of MS, with a preference for a non-invasive (n = 12) or minimally invasive (n = 15) BCI over carer assistance (n = 6). Qualitative assessment indicated that this preference was not influenced by level of independence. Additionally, strong interest was noted in bionic technology for sensory and autonomic functions. Considering the potential to enhance independence and quality of life for people living with MS, the results emphasise the importance of user-centred design for future advancement of BCIs that account for the unique pathological changes associated with MS.

Balancing Information Perception with Yin-Yang: Agent-Based Information Neutrality Model for Recommendation Systems

Authors: Mengyan Wang, Yuxuan Hu, Shiqing Wu, Weihua Li, Quan Bai, Verica Rupar

Link: http://arxiv.org/abs/2404.04906v1

Abstract: While preference-based recommendation algorithms effectively enhance user engagement by recommending personalized content, they often result in the creation of "filter bubbles". These bubbles restrict the range of information users interact with, inadvertently reinforcing their existing viewpoints. Previous research has focused on modifying these underlying algorithms to tackle this issue. Yet, approaches that maintain the integrity of the original algorithms remain largely unexplored. This paper introduces an Agent-based Information Neutrality model grounded in the Yin-Yang theory, namely, AbIN. This innovative approach targets the imbalance in information perception within existing recommendation systems. It is designed to integrate with these preference-based systems, ensuring the delivery of recommendations with neutral information. Our empirical evaluation of this model proved its efficacy, showcasing its capacity to expand information diversity while respecting user preferences. Consequently, AbIN emerges as an instrumental tool in mitigating the negative impact of filter bubbles on information consumption.

2024-04-06

Authors: Anubhav Jangra, Jamshid Mozafari, Adam Jatowt, Smaranda Muresan

Link: http://arxiv.org/abs/2404.04728v1

Abstract: Digital education has gained popularity in the last decade, especially after the COVID-19 pandemic. With the improving capabilities of large language models to reason and communicate with users, envisioning intelligent tutoring systems (ITSs) that can facilitate self-learning is not very far-fetched. One integral component to fulfill this vision is the ability to give accurate and effective feedback via hints to scaffold the learning process. In this survey article, we present a comprehensive review of prior research on hint generation, aiming to bridge the gap between research in education and cognitive science, and research in AI and Natural Language Processing. Informed by our findings, we propose a formal definition of the hint generation task, and discuss the roadmap of building an effective hint generation system aligned with the formal definition, including open challenges, future directions and ethical considerations.

"Don't Step on My Toes": Resolving Editing Conflicts in Real-Time Collaboration in Computational Notebooks

Authors: April Yi Wang, Zihan Wu, Christopher Brooks, Steve Oney

Link: http://arxiv.org/abs/2404.04695v1

Abstract: Real-time collaborative editing in computational notebooks can improve the efficiency of teamwork for data scientists. However, working together through synchronous editing of notebooks introduces new challenges. Data scientists may inadvertently interfere with each others' work by altering the shared codebase and runtime state if they do not set up a social protocol for working together and monitoring their collaborators' progress. In this paper, we propose a real-time collaborative editing model for resolving conflict edits in computational notebooks that introduces three levels of edit protection to help collaborators avoid introducing errors to both the program source code and changes to the runtime state.

Designing for Complementarity: A Conceptual Framework to Go Beyond the Current Paradigm of Using XAI in Healthcare

Authors: Elisa Rubegni, Omran Ayoub, Stefania Maria Rita Rizzo, Marco Barbero, Guenda Bernegger, Francesca Faraci, Francesca Mangili, Emiliano Soldini, Pierpaolo Trimboli, Alessandro Facchini

Link: http://arxiv.org/abs/2404.04638v1

Abstract: The widespread use of Artificial Intelligence-based tools in the healthcare sector raises many ethical and legal problems, one of the main reasons being their black-box nature and, therefore, the seeming opacity and inscrutability of their characteristics and decision-making processes. Literature extensively discusses how this can lead to phenomena of over-reliance and under-reliance, ultimately limiting the adoption of AI. We addressed these issues by building a theoretical framework based on three concepts: Feature Importance, Counterexample Explanations, and Similar-Case Explanations. Grounded in the literature, the model was deployed within a case study in which, using a participatory design approach, we designed and developed a high-fidelity prototype. Through the co-design and development of the prototype and the underlying model, we advanced the knowledge on how to design AI-based systems for enabling complementarity in the decision-making process in the healthcare domain. Our work aims at contributing to the current discourse on designing AI systems to support clinicians' decision-making processes.

Analyzing LLM Usage in an Advanced Computing Class in India

Authors: Chaitanya Arora, Utkarsh Venaik, Pavit Singh, Sahil Goyal, Jatin Tyagi, Shyama Goel, Ujjwal Singhal, Dhruv Kumar

Link: http://arxiv.org/abs/2404.04603v1

Abstract: This paper investigates the usage patterns of undergraduate and graduate students when engaging with large language models (LLMs) to tackle programming assignments in the context of advanced computing courses. Existing work predominantly focuses on the influence of LLMs in introductory programming contexts. Additionally, there is a scarcity of studies analyzing actual conversations between students and LLMs. Our study provides a comprehensive quantitative and qualitative analysis of raw interactions between students and LLMs within an advanced computing course (Distributed Systems) at an Indian University. We further complement this by conducting student interviews to gain deeper insights into their usage patterns. Our study shows that students make use of large language models (LLMs) in various ways: generating code or debugging code by identifying and fixing errors. They also copy and paste assignment descriptions into LLM interfaces for specific solutions, ask conceptual questions about complex programming ideas or theoretical concepts, and generate test cases to check code functionality and robustness. Our analysis includes over 4,000 prompts from 411 students and interviews with 10 students. Our analysis shows that LLMs excel at generating boilerplate code and assisting in debugging, while students handle the integration of components and system troubleshooting. This aligns with the learning objectives of advanced computing courses, which are oriented towards teaching students how to build systems and troubleshoot, with less emphasis on generating code from scratch. Therefore, LLM tools can be leveraged to increase student productivity, as shown by the data we collected. This study contributes to the ongoing discussion on LLM use in education, advocating for their usefulness in advanced computing courses to complement higher-level learning and productivity.

TeleAware Robot: Designing Awareness-augmented Telepresence Robot for Remote Collaborative Locomotion

Authors: Ruyi Li, Yaxin Zhu, Min Liu, Yihang Zeng, Shanning Zhuang, Jiayi Fu, Yi Lu, Guyue Zhou, Can Liu, Jiangtao Gong

Link: http://arxiv.org/abs/2404.04579v1

Abstract: Telepresence robots can be used to support users to navigate an environment remotely and share the visiting experience with their social partners. Although such systems allow users to see and hear the remote environment and communicate with their partners via live video feed, this does not provide enough awareness of the environment and their remote partner's activities. In this paper, we introduce an awareness framework for collaborative locomotion in scenarios of onsite and remote users visiting a place together. From an observational study of small groups of people visiting exhibitions, we derived four design goals for enhancing the environmental and social awareness between social partners, and developed a set of awareness-enhancing techniques to add to a standard telepresence robot - named TeleAware robot. Through a controlled experiment simulating a guided exhibition visiting task, TeleAware robot showed the ability to lower the workload, facilitate closer social proximity, and improve mutual awareness and social presence compared with the standard one. We discuss the impact of mobility and roles of local and remote users, and provide insights for the future design of awareness-enhancing telepresence robot systems that facilitate collaborative locomotion.

A Map of Exploring Human Interaction patterns with LLM: Insights into Collaboration and Creativity

Authors: Jiayang Li, Jiale Li

Link: http://arxiv.org/abs/2404.04570v1

Abstract: The outstanding performance capabilities of large language models have driven the evolution of current AI system interaction patterns. This has led to considerable discussion within the Human-AI Interaction (HAII) community. Numerous studies explore this interaction from technical, design, and empirical perspectives. However, the majority of current literature reviews concentrate on interactions across the wider spectrum of AI, with limited attention given to the specific realm of interaction with LLMs. We searched for articles on human interaction with LLMs, selecting 110 relevant publications meeting a consensus definition of Human-AI interaction. Subsequently, we developed a comprehensive Mapping Procedure, structured in five distinct stages, to systematically analyze and categorize the collected publications. Applying this methodical approach, we meticulously mapped the chosen studies, culminating in a detailed and insightful representation of the research landscape. Overall, our review presents a novel approach, introducing a distinctive mapping method specifically tailored to evaluate human-LLM interaction patterns. We conducted a comprehensive analysis of the current research in related fields, employing clustering techniques for categorization, which enabled us to clearly delineate the status and challenges prevalent in each identified area.

Language Models as Critical Thinking Tools: A Case Study of Philosophers

Authors: Andre Ye, Jared Moore, Rose Novick, Amy X. Zhang

Link: http://arxiv.org/abs/2404.04516v1

Abstract: Current work in language models (LMs) helps us speed up or even skip thinking by accelerating and automating cognitive work. But can LMs help us with critical thinking -- thinking in deeper, more reflective ways which challenge assumptions, clarify ideas, and engineer new concepts? We treat philosophy as a case study in critical thinking, and interview 21 professional philosophers about how they engage in critical thinking and on their experiences with LMs. We find that philosophers do not find LMs to be useful because they lack a sense of selfhood (memory, beliefs, consistency) and initiative (curiosity, proactivity). We propose the selfhood-initiative model for critical thinking tools to characterize this gap. Using the model, we formulate three roles LMs could play as critical thinking tools: the Interlocutor, the Monitor, and the Respondent. We hope that our work inspires LM researchers to further develop LMs as critical thinking tools and philosophers and other 'critical thinkers' to imagine intellectually substantive uses of LMs.

Majority Voting of Doctors Improves Appropriateness of AI Reliance in Pathology

Authors: Hongyan Gu, Chunxu Yang, Shino Magaki, Neda Zarrin-Khameh, Nelli S. Lakis, Inma Cobos, Negar Khanlou, Xinhai R. Zhang, Jasmeet Assi, Joshua T. Byers, Ameer Hamza, Karam Han, Anders Meyer, Hilda Mirbaha, Carrie A. Mohila, Todd M. Stevens, Sara L. Stone, Wenzhong Yan, Mohammad Haeri, Xiang 'Anthony' Chen

Link: http://arxiv.org/abs/2404.04485v1

Abstract: As Artificial Intelligence (AI) makes advancements in medical decision-making, there is a growing need to ensure doctors develop appropriate reliance on AI to avoid adverse outcomes. However, existing methods for enabling appropriate AI reliance might encounter challenges when applied in the medical domain. In this regard, this work employs and provides the validation of an alternative approach -- majority voting -- to facilitate appropriate reliance on AI in medical decision-making. This is achieved by a multi-institutional user study involving 32 medical professionals with various backgrounds, focusing on the pathology task of visually detecting a pattern, mitoses, in tumor images. Here, the majority voting process was conducted by synthesizing decisions under AI assistance from a group of pathology doctors (pathologists). Two metrics were used to evaluate the appropriateness of AI reliance: Relative AI Reliance (RAIR) and Relative Self-Reliance (RSR). Results showed that even with groups of three pathologists, majority-voted decisions significantly increased both RAIR and RSR -- by approximately 9% and 31%, respectively -- compared to decisions made by one pathologist collaborating with AI. This increased appropriateness resulted in better precision and recall in the detection of mitoses. While our study is centered on pathology, we believe these insights can be extended to general high-stakes decision-making processes involving similar visual tasks.
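
Majority voting over AI-assisted decisions is straightforward to sketch. Below is a minimal Python illustration; the label names and the tie-breaking rule are our assumptions, not details reported in the study.

```python
from collections import Counter

def majority_vote(decisions: list[str]) -> str:
    """Return the most common decision among a group of readers.
    Ties fall to the earliest-seen label (an assumption, not the paper's rule)."""
    return Counter(decisions).most_common(1)[0][0]

# Three pathologists, each deciding with AI assistance whether a region shows mitoses
print(majority_vote(["mitoses", "no_mitoses", "mitoses"]))  # -> "mitoses"
```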

2024-04-05

HIV Client Perspectives on Digital Health in Malawi

Authors: Lisa Orii, Caryl Feldacker, Jacqueline Madalitso Huwa, Agness Thawani, Evelyn Viola, Christine Kiruthu-Kamamia, Odala Sande, Hannock Tweya, Richard Anderson

Link: http://arxiv.org/abs/2404.04444v1

Abstract: eHealth has strong potential to advance HIV care in low- and middle-income countries. Given the sensitivity of HIV-related information and the risks associated with unintended HIV status disclosure, clients' privacy perceptions towards eHealth applications should be examined to develop client-centered technologies. Through focus group discussions with antiretroviral therapy (ART) clients from Lighthouse Trust, Malawi's public HIV care program, we explored perceptions of data security and privacy, including their understanding of data flow and their concerns about data confidentiality across several layers of data use. Our findings highlight the broad privacy concerns that affect ART clients' day-to-day choices, clients' trust in Malawi's health system, and their acceptance of, and familiarity with, point-of-care technologies used in HIV care. Based on our findings, we provide recommendations for building robust digital health systems in low- and middle-income countries with limited resources, nascent privacy regulations, and political will to take action to protect client data.

Humanoid Robots at work: where are we ?

Authors: Fabrice R. Noreils

Link: http://arxiv.org/abs/2404.04249v1

Abstract: Launched by Elon Musk and his Optimus, a new race is underway in which many companies have already engaged. The objective is to put to work a new generation of humanoid robots in demanding industrial environments within 2 or 3 years. Is this objective realistic? The aim of this document, and its main contribution, is to provide some hints by covering the following topics: first, an analysis of 12 companies based on eight criteria that will help us distinguish companies based on their maturity and approach to the market; second, as these humanoids are very complex systems, an overview of the technological challenges to be addressed; third, when humanoids are deployed at scale, operation and maintenance become critical, and we will explore what is new with these complex machines; finally, pilots are the last step to test the feasibility of a new system before mass deployment. This is an important step to test the maturity of a product and the strategy of the humanoid supplier to address a market, and two pragmatic approaches will be discussed.

Social Skill Training with Large Language Models

Authors: Diyi Yang, Caleb Ziems, William Held, Omar Shaikh, Michael S. Bernstein, John Mitchell

Link: http://arxiv.org/abs/2404.04204v1

Abstract: People rely on social skills like conflict resolution to communicate effectively and to thrive in both work and personal life. However, practice environments for social skills are typically out of reach for most people. How can we make social skill training more available, accessible, and inviting? Drawing upon interdisciplinary research from communication and psychology, this perspective paper identifies social skill barriers to entering specialized fields. Then we present a solution that leverages large language models for social skill training via a generic framework. Our AI Partner, AI Mentor framework merges experiential learning with realistic practice and tailored feedback. This work ultimately calls for cross-disciplinary innovation to address the broader implications for workforce development and social equality.

Designing Robots to Help Women

Authors: Martin Cooney, Lena Klasén, Fernando Alonso-Fernandez

Link: http://arxiv.org/abs/2404.04123v1

Abstract: Robots are being designed to help people in an increasing variety of settings--but seemingly little attention has been given so far to the specific needs of women, who represent roughly half of the world's population but are highly underrepresented in robotics. Here we used a speculative prototyping approach to explore this expansive design space: First, we identified some potential challenges of interest, including crimes and illnesses that disproportionately affect women, as well as potential opportunities for designers, which were visualized in five sketches. Then, one of the sketched scenarios was further explored by developing a prototype, of a robotic helper drone equipped with computer vision to detect hidden cameras that could be used to spy on women. While object detection introduced some errors, hidden cameras were identified with a reasonable accuracy of 80% (Intersection over Union (IoU) score: 0.40). Our aim is that the identified challenges and opportunities could help spark discussion and inspire designers, toward realizing a safer, more inclusive future through responsible use of technology.
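
The reported IoU score is the standard Intersection over Union between a predicted box and the ground-truth box. A minimal sketch, assuming boxes in (x1, y1, x2, y2) form:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A predicted hidden-camera box partially overlapping the ground truth
print(round(iou((0, 0, 10, 10), (4, 0, 14, 10)), 2))  # 0.43; the paper reports 0.40
```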

ChoreoVis: Planning and Assessing Formations in Dance Choreographies

Authors: Samuel Beck, Nina Doerr, Kuno Kurzhals, Alexander Riedlinger, Fabian Schmierer, Michael Sedlmair, Steffen Koch

Link: http://arxiv.org/abs/2404.04100v1

Abstract: Sports visualization has developed into an active research field over the last decades. Many approaches focus on analyzing movement data recorded from unstructured situations, such as soccer. For the analysis of choreographed activities like formation dancing, however, the goal differs, as dancers follow specific formations in coordinated movement trajectories. To date, little work exists on how visual analytics methods can support such choreographed performances. To fill this gap, we introduce a new visual approach for planning and assessing dance choreographies. In terms of planning choreographies, we contribute a web application with interactive authoring tools and views for the dancers' positions and orientations, movement trajectories, poses, dance floor utilization, and movement distances. For assessing dancers' real-world movement trajectories, extracted by manual bounding box annotations, we developed a timeline showing aggregated trajectory deviations and a dance floor view for detailed trajectory comparison. Our approach was developed and evaluated in collaboration with dance instructors, showing that introducing visual analytics into this domain promises improvements in training efficiency for the future.

Hierarchical Neural Additive Models for Interpretable Demand Forecasts

Authors: Leif Feddersen, Catherine Cleophas

Link: http://arxiv.org/abs/2404.04070v1

Abstract: Demand forecasts are the crucial basis for numerous business decisions, ranging from inventory management to strategic facility planning. While machine learning (ML) approaches offer accuracy gains, their interpretability and acceptance are notoriously lacking. Addressing this dilemma, we introduce Hierarchical Neural Additive Models for time series (HNAM). HNAM expands upon Neural Additive Models (NAM) by introducing a time-series specific additive model with a level and interacting covariate components. Covariate interactions are only allowed according to a user-specified interaction hierarchy. For example, weekday effects may be estimated independently of other covariates, whereas a holiday effect may depend on the weekday and an additional promotion may depend on both former covariates that are lower in the interaction hierarchy. Thereby, HNAM yields an intuitive forecasting interface in which analysts can observe the contribution for each known covariate. We evaluate the proposed approach and benchmark its performance against other state-of-the-art machine learning and statistical models extensively on real-world retail data. The results reveal that HNAM offers competitive prediction performance whilst providing plausible explanations.
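
The interaction hierarchy can be pictured as an additive sum in which each component only conditions on covariates beneath it. A toy sketch following the abstract's weekday/holiday/promotion example; in HNAM the components are learned neural networks, whereas here they are hand-written stubs for illustration.

```python
def forecast(level, weekday, holiday, promo, f_weekday, f_holiday, f_promo):
    """Additive forecast: each component sees only covariates lower in the hierarchy."""
    return (level
            + f_weekday(weekday)                 # independent weekday effect
            + f_holiday(holiday, weekday)        # holiday effect may depend on weekday
            + f_promo(promo, weekday, holiday))  # promotion may depend on both

yhat = forecast(
    level=100.0, weekday="Sat", holiday=False, promo=True,
    f_weekday=lambda d: 12.0 if d in ("Sat", "Sun") else 0.0,
    f_holiday=lambda h, d: 8.0 if h else 0.0,
    f_promo=lambda p, d, h: 15.0 if p else 0.0,
)
print(yhat)  # 127.0: analysts can read off each covariate's contribution
```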

VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots

Authors: Akhil Padmanabha, Jessie Yuan, Janavi Gupta, Zulekha Karachiwalla, Carmel Majidi, Henny Admoni, Zackory Erickson

Link: http://arxiv.org/abs/2404.04066v1

Abstract: Physically assistive robots present an opportunity to significantly increase the well-being and independence of individuals with motor impairments or other forms of disability who are unable to complete activities of daily living. Speech interfaces, especially ones that utilize Large Language Models (LLMs), can enable individuals to effectively and naturally communicate high-level commands and nuanced preferences to robots. Frameworks for integrating LLMs as interfaces to robots for high level task planning and code generation have been proposed, but fail to incorporate human-centric considerations which are essential while developing assistive interfaces. In this work, we present a framework for incorporating LLMs as speech interfaces for physically assistive robots, constructed iteratively with 3 stages of testing involving a feeding robot, culminating in an evaluation with 11 older adults at an independent living facility. We use both quantitative and qualitative data from the final study to validate our framework and additionally provide design guidelines for using LLMs as speech interfaces for assistive robots. Videos and supporting files are located on our project website: https://sites.google.com/andrew.cmu.edu/voicepilot/

Which Experimental Design is Better Suited for VQA Tasks? Eye Tracking Study on Cognitive Load, Performance, and Gaze Allocations

Authors: Sita A. Vriend, Sandeep Vidyapu, Amer Rama, Kun-Ting Chen, Daniel Weiskopf

Link: http://arxiv.org/abs/2404.04036v1

Abstract: We conducted an eye-tracking user study with 13 participants to investigate the influence of stimulus-question ordering and question modality on participants using visual question-answering (VQA) tasks. We examined cognitive load, task performance, and gaze allocations across five distinct experimental designs, aiming to identify setups that minimize the cognitive burden on participants. The collected performance and gaze data were analyzed using quantitative and qualitative methods. Our results indicate a significant impact of stimulus-question ordering on cognitive load and task performance, as well as a noteworthy effect of question modality on task performance. These findings offer insights for the experimental design of controlled user studies in visualization research.

Validation of critical maneuvers based on shared control

Authors: Mauricio Marcano, Joseba Sarabia, Asier Zubizarreta, Sergio Díaz

Link: http://arxiv.org/abs/2404.04011v1

Abstract: This paper presents the validation of shared control strategies for critical maneuvers in automated driving systems. Shared control involves collaboration between the driver and automation, allowing both parties to actively engage and cooperate at different levels of the driving task. The involvement of the driver adds complexity to the control loop, necessitating comprehensive validation methodologies. The proposed approach focuses on two critical maneuvers: overtaking in low visibility scenarios and lateral evasive actions. A modular architecture with an arbitration module and shared control algorithms is implemented, primarily focusing on the lateral control of the vehicle. The validation is conducted using a dynamic simulator, involving 8 real drivers interacting with a virtual environment. The results demonstrate improved safety and user acceptance, indicating the effectiveness of the shared control strategies in comparison with no shared-control support. Future work involves implementing shared control in drive-by-wire systems to enhance safety and driver comfort during critical maneuvers. Overall, this research contributes to the development and validation of shared control approaches in automated driving systems.
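
The abstract does not spell out the arbitration rule, but a common textbook form of shared lateral control is a weighted blend of the driver's and the automation's steering commands. A minimal sketch under that assumption (not necessarily the paper's exact algorithm):

```python
def blended_steering(u_driver: float, u_auto: float, authority: float) -> float:
    """Linear arbitration: 'authority' in [0, 1] is the automation's share."""
    assert 0.0 <= authority <= 1.0
    return authority * u_auto + (1.0 - authority) * u_driver

# During an evasive maneuver, automation holds 70% of the steering authority.
print(blended_steering(u_driver=0.10, u_auto=0.35, authority=0.7))  # 0.275
```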

From Theory to Comprehension: A Comparative Study of Differential Privacy and k-Anonymity

Authors: Saskia Nuñez von Voigt, Luise Mehner, Florian Tschorsch

Link: http://arxiv.org/abs/2404.04006v1

Abstract: The notion of ε-differential privacy is a widely used concept of providing quantifiable privacy to individuals. However, it is unclear how to explain the level of privacy protection provided by a differential privacy mechanism with a set ε. In this study, we focus on users' comprehension of the privacy protection provided by a differential privacy mechanism. To do so, we study three variants of explaining the privacy protection provided by differential privacy: (1) the original mathematical definition; (2) ε translated into a specific privacy risk; and (3) an explanation using the randomized response technique. We compare users' comprehension of privacy protection employing these explanatory models with their comprehension of privacy protection of k-anonymity as baseline comprehensibility. Our findings suggest that participants' comprehension of differential privacy protection is enhanced by the privacy risk model and the randomized response-based model. Moreover, our results confirm our intuition that privacy protection provided by k-anonymity is more comprehensible.
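
The randomized-response explanation rests on a standard translation: for a binary question, answering truthfully with probability p = e^ε / (1 + e^ε) (and lying otherwise) satisfies ε-differential privacy. A small sketch of that mapping; the example ε values are ours, chosen for illustration:

```python
import math

def randomized_response_truth_prob(epsilon: float) -> float:
    """Truth-telling probability that makes binary randomized response
    epsilon-differentially private: p = e^eps / (1 + e^eps)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

for eps in (0.1, 0.5, 1.0, 2.0):
    p = randomized_response_truth_prob(eps)
    print(f"eps = {eps}: answer truthfully with probability {p:.2f}")
```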

Approximate UMAP allows for high-rate online visualization of high-dimensional data streams

Authors: Peter Wassenaar, Pierre Guetschel, Michael Tangermann

Link: http://arxiv.org/abs/2404.04001v1

Abstract: In the BCI field, introspection and interpretation of brain signals are desired for providing feedback or to guide rapid paradigm prototyping but are challenging due to the high noise level and dimensionality of the signals. Deep neural networks are often introspected by transforming their learned feature representations into 2- or 3-dimensional subspace visualizations using projection algorithms like Uniform Manifold Approximation and Projection (UMAP). Unfortunately, these methods are computationally expensive, making the projection of data streams in real-time a non-trivial task. In this study, we introduce a novel variant of UMAP, called approximate UMAP (aUMAP). It aims at generating rapid projections for real-time introspection. To study its suitability for real-time projecting, we benchmark the method against standard UMAP and its neural network counterpart, parametric UMAP. Our results show that approximate UMAP delivers projections that replicate the projection space of standard UMAP while reducing projection time by an order of magnitude and maintaining the same training time.
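
The abstract does not detail the approximation itself. One natural scheme consistent with it (our assumption, not necessarily the authors' exact method) is to fit standard UMAP once on training data and place each incoming sample at the mean embedding of its nearest training neighbors:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 64))   # stand-in for training feature vectors
X_stream = rng.normal(size=(10, 64))   # incoming samples to project online

reducer = umap.UMAP(n_components=2).fit(X_train)   # expensive, done once offline
nn = NearestNeighbors(n_neighbors=5).fit(X_train)  # cheap lookup structure

# Approximate projection: average the embeddings of each sample's neighbors.
_, idx = nn.kneighbors(X_stream)
approx_embedding = reducer.embedding_[idx].mean(axis=1)  # shape (10, 2)
print(approx_embedding.shape)
```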

Modeling social interaction dynamics using temporal graph networks

Authors: J. Taery Kim, Archit Naik, Isuru Jayarathne, Sehoon Ha, Jouh Yeong Chew

Link: http://arxiv.org/abs/2404.06611v1

Abstract: Integrating intelligent systems, such as robots, into dynamic group settings poses challenges due to the mutual influence of human behaviors and internal states. A robust representation of social interaction dynamics is essential for effective human-robot collaboration. Existing approaches often narrow their focus to facial expressions or speech, overlooking the broader context. We propose employing an adapted Temporal Graph Network to comprehensively represent social interaction dynamics while enabling its practical implementation. Our method incorporates temporal multi-modal behavioral data including gaze interaction, voice activity, and environmental context. This representation of social interaction dynamics is trained as a link prediction problem using annotated gaze interaction data. The F1-score outperformed the baseline model by 37.0%. This improvement is consistent for a secondary task of next speaker prediction, which achieves an improvement of 29.0%. Our contributions are two-fold. First, we provide a model for representing social interaction dynamics that can be used for many downstream human-robot interaction tasks, such as human state inference and next speaker prediction. More importantly, this is achieved using a more concise yet efficient message passing method, significantly reducing the message size from 768 to 14 elements while outperforming the baseline model.

Tensions between Preference and Performance: Designing for Visual Exploration of Multi-frequency Medical Network Data

Authors: Christian Knoll, Laura Koesten, Isotta Rigoni, Serge Vulliémoz, Torsten Möller

Link: http://arxiv.org/abs/2404.03965v1

Abstract: The analysis of complex high-dimensional data is a common task in many domains, resulting in bespoke visual exploration tools. Expectations and practices of domain experts as users do not always align with visualization theory. In this paper, we report on a design study in the medical domain where we developed two high-fidelity prototypes encoding EEG-derived brain network data with different types of visualizations. We evaluate these prototypes regarding effectiveness, efficiency, and preference with two groups: participants with domain knowledge (domain experts in medical research) and those without domain knowledge, both groups having little or no visualization experience. A requirement analysis and study of low-fidelity prototypes revealed a strong preference for a novel and aesthetically pleasing visualization design, as opposed to a design that is considered more optimal based on visualization theory. Our study highlights the pros and cons of both approaches, discussing trade-offs between task-specific measurements and subjective preference. While the aesthetically pleasing and novel low-fidelity prototype was favored, the results of our evaluation show that, in most cases, this was not reflected in participants' performance or subjective preference for the high-fidelity prototypes.

Open vocabulary keyword spotting through transfer learning from speech synthesis

Authors: Kesavaraj V, Anil Kumar Vuppala

Link: http://arxiv.org/abs/2404.03914v1

Abstract: Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous approaches to open vocabulary keyword spotting depend on a shared embedding space created by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, our proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer allows for the incorporation of awareness of audio projections into the text representations derived from the text encoder. The performance of the proposed approach is compared with various baseline methods across four different datasets. The robustness of our proposed model is evaluated by assessing its performance across different word lengths and in an Out-of-Vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. The experimental results indicate that, on the challenging LibriPhrase Hard dataset, the proposed approach outperformed the cross-modality correspondence detector (CMCD) method with significant improvements of 8.22% in area under the curve (AUC) and 12.56% in equal error rate (EER).
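
The reported metrics are standard for detection tasks and can be reproduced from raw keyword-detection scores as follows; the scores below are synthetic stand-ins, and this is generic evaluation code rather than the paper's.

```python
# Computing AUC and equal error rate (EER) from detection scores.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
labels = np.concatenate([np.ones(500), np.zeros(500)])   # 1 = keyword present
scores = np.concatenate([rng.normal(1.0, 1.0, 500),      # match scores
                         rng.normal(-1.0, 1.0, 500)])    # non-match scores

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]  # operating point where FPR ~ FNR
print(f"AUC = {roc_auc_score(labels, scores):.3f}, EER = {eer:.3%}")
```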

Effects of Multisensory Feedback on the Perception and Performance of Virtual Reality Hand-Retargeted Interaction

Authors: Hyunyoung Jang, Jinwook Kim, Jeongmi Lee

Link: http://arxiv.org/abs/2404.03899v1

Abstract: Retargeting methods that modify the visual representation of real movements have been widely used to expand the interaction space and create engaging virtual reality experiences. For optimal user experience and performance, it is essential to specify the perception of retargeting and utilize the appropriate range of modification parameters. However, previous studies mostly concentrated on whether users perceived the target sense or not and rarely examined the perceptual accuracy and sensitivity to retargeting. Moreover, it is unknown how the perception and performance in hand-retargeted interactions are influenced by multisensory feedback. In this study, we used rigorous psychophysical methods to specify users' perceptual accuracy and sensitivity to hand-retargeting and provide acceptable ranges of retargeting parameters. We also presented different multisensory feedback simultaneously with the retargeting to probe its effect on users' perception and task performance. The experimental results showed that providing continuous multisensory feedback, proportionate to the distance between the virtual hand and the targeted destination, heightened the accuracy of users' perception of hand retargeting without altering their perceptual sensitivity. Furthermore, the utilization of multisensory feedback considerably improved the precision of task performance, particularly at lower gain factors. Based on these findings, we propose design guidelines and potential applications of VR hand-retargeted interactions and multisensory feedback for optimal user experience and performance.
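
The gain manipulation at the heart of hand retargeting is compact enough to state in code: the displayed hand's displacement from an interaction origin is the tracked displacement scaled by a gain factor. The sketch below is the usual textbook formulation with illustrative names, not the study's implementation.

```python
# Hand retargeting with a scalar gain factor: gain > 1 amplifies real motion,
# gain < 1 attenuates it. Positions are 3D vectors in meters.
import numpy as np

def retarget(real_hand, origin, gain):
    """Return the virtual hand position for a given real hand position."""
    return origin + gain * (real_hand - origin)

origin = np.zeros(3)
real_hand = np.array([0.10, 0.05, 0.30])      # tracked position
print(retarget(real_hand, origin, gain=1.2))  # visually displayed position
```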

Buck You: Designing Easy-to-Onboard Blockchain Applications with Zero-Knowledge Login and Sponsored Transactions on Sui

Authors: Eason Chen, Zimo Xiao, Justa Liang, Damien Chen, Pierce Hung, Kostas Kryptos Chalkias

Link: http://arxiv.org/abs/2404.03845v1

Abstract: In this paper, we developed a blockchain application to demonstrate the functionality of Sui's recent innovations: Zero Knowledge Login and Sponsored Transactions. Zero Knowledge Login allows users to create and access their blockchain wallets just with their OAuth accounts (e.g., Google, Facebook, Twitch), while Sponsored Transactions eliminate the need for users to prepare transaction fees, as they can delegate fees to sponsors' accounts. Additionally, thanks to Sui's Storage Rebate feature, sponsors in Sponsored Transactions can profit from the sponsorship, achieving a win-win and sustainable service model. Zero Knowledge Login and Sponsored Transactions are pivotal in overcoming key challenges novice blockchain users face, particularly in managing private keys and depositing initial transaction fees. By addressing these challenges in the user experience of blockchain, Sui makes the blockchain more accessible and engaging for novice users and paves the way for the broader adoption of blockchain applications in everyday life.

2024-04-04

SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers

Authors: Jonathan F. Carter, João Jorge, Oliver Gibson, Lionel Tarassenko

Link: http://arxiv.org/abs/2404.03831v1

Abstract: Advances in camera-based physiological monitoring have enabled the robust, non-contact measurement of respiration and the cardiac pulse, which are known to be indicative of the sleep stage. This has led to research into camera-based sleep monitoring as a promising alternative to "gold-standard" polysomnography, which is cumbersome, expensive to administer, and hence unsuitable for longer-term clinical studies. In this paper, we introduce SleepVST, a transformer model which enables state-of-the-art performance in camera-based sleep stage classification (sleep staging). After pre-training on contact sensor data, SleepVST outperforms existing methods for cardio-respiratory sleep staging on the SHHS and MESA datasets, achieving total Cohen's kappa scores of 0.75 and 0.77 respectively. We then show that SleepVST can be successfully transferred to cardio-respiratory waveforms extracted from video, enabling fully contact-free sleep staging. Using a video dataset of 50 nights, we achieve a total accuracy of 78.8% and a Cohen's $\kappa$ of 0.71 in four-class video-based sleep staging, setting a new state-of-the-art in the domain.
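
Cohen's kappa, the agreement metric quoted above, corrects raw accuracy for chance agreement. A quick sketch with synthetic four-class staging labels shows how both kinds of reported numbers are obtained:

```python
# Accuracy and Cohen's kappa for four-class sleep staging; the labels are
# synthetic placeholders, not the paper's data.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(2)
y_true = rng.integers(0, 4, size=1000)              # scored stages (0..3)
y_pred = np.where(rng.random(1000) < 0.8, y_true,   # ~80% agreement
                  rng.integers(0, 4, size=1000))

print(f"accuracy = {accuracy_score(y_true, y_pred):.3f}")
print(f"kappa    = {cohen_kappa_score(y_true, y_pred):.3f}")
```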

I Did Not Notice: A Comparison of Immersive Analytics with Augmented and Virtual Reality

Authors: Xiaoyan Zhou, Anil Ufuk Batmaz, Adam S. Williams, Dylan Schreiber, Francisco Ortega

Link: http://arxiv.org/abs/2404.03814v1

Abstract: Immersive environments enable users to engage in embodied interaction, enhancing the sensemaking processes involved in completing tasks such as immersive analytics. Previous comparative studies on immersive analytics using augmented and virtual realities have revealed that users employ different strategies for data interpretation and text-based analytics depending on the environment. Our study seeks to investigate how augmented and virtual reality influences sensemaking processes in quantitative immersive analytics. Our results, derived from a diverse group of participants, indicate that users demonstrate comparable performance in both environments. However, it was observed that users exhibit a higher tolerance for cognitive load in VR and travel further in AR. Based on our findings, we recommend providing users with the option to switch between AR and VR, thereby enabling them to select an environment that aligns with their preferences and task requirements.

Learning Social Fairness Preferences from Non-Expert Stakeholder Opinions in Kidney Placement

Authors: Mukund Telukunta, Sukruth Rao, Gabriella Stickney, Venkata Sriram Siddardh Nadendla, Casey Canfield

Link: http://arxiv.org/abs/2404.03800v1

Abstract: Modern kidney placement incorporates several intelligent recommendation systems which exhibit social discrimination due to biases inherited from training data. Although initial attempts were made in the literature to study algorithmic fairness in kidney placement, these methods replace true outcomes with surgeons' decisions due to the long delays involved in recording such outcomes reliably. However, the replacement of true outcomes with surgeons' decisions disregards expert stakeholders' biases as well as social opinions of other stakeholders who do not possess medical expertise. This paper alleviates the latter concern and designs a novel fairness feedback survey to evaluate an acceptance rate predictor (ARP) that predicts a kidney's acceptance rate in a given kidney-match pair. The survey is launched on Prolific, a crowdsourcing platform, and public opinions are collected from 85 anonymous crowd participants. A novel social fairness preference learning algorithm is proposed based on minimizing social feedback regret computed using a novel logit-based fairness feedback model. The proposed model and learning algorithm are both validated using simulation experiments as well as Prolific data. Public preferences towards group fairness notions in the context of kidney placement have been estimated and discussed in detail. The specific ARP tested in the Prolific survey has been deemed fair by the participants.

Revisiting Categorical Color Perception in Scatterplots: Sequential, Diverging, and Categorical Palettes

Authors: Chin Tseng, Arran Zeyu Wang, Ghulam Jilani Quadri, Danielle Albers Szafir

Link: http://arxiv.org/abs/2404.03787v1

Abstract: Existing guidelines for categorical color selection are heuristic, often grounded in intuition rather than empirical studies of readers' abilities. While design conventions recommend palettes maximize hue differences, more recent exploratory findings indicate other factors, such as lightness, may play a role in effective categorical palette design. We conducted a crowdsourced experiment on mean value judgments in multi-class scatterplots using five color palette families--single-hue sequential, multi-hue sequential, perceptually-uniform multi-hue sequential, diverging, and multi-hue categorical--that differ in how they manipulate hue and lightness. Participants estimated relative mean positions in scatterplots containing 2 to 10 categories using 20 colormaps. Our results confirm heuristic guidance that hue-based categorical palettes are most effective. However, they also provide additional evidence that scalable categorical encoding relies on more than hue variance.

Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations

Authors: Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee

Link: http://arxiv.org/abs/2404.03745v1

Abstract: The widespread adoption and transformative effects of large language models (LLMs) have sparked concerns regarding their capacity to produce inaccurate and fictitious content, referred to as 'hallucinations'. Given the potential risks associated with hallucinations, humans should be able to identify them. This research aims to understand the human perception of LLM hallucinations by systematically varying the degree of hallucination (genuine, minor hallucination, major hallucination) and examining its interaction with warning (i.e., a warning of potential inaccuracies: absent vs. present). Participants (N=419) from Prolific rated the perceived accuracy and engaged with content (e.g., like, dislike, share) in a Q/A format. Results indicate that humans rank content as truthful in the order genuine > minor hallucination > major hallucination, and user engagement behaviors mirror this pattern. More importantly, we observed that warning improves hallucination detection without significantly affecting the perceived truthfulness of genuine content. We conclude by offering insights for future tools to aid human detection of hallucinations.

Explaining Explainability: Understanding Concept Activation Vectors

Authors: Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal

Link: http://arxiv.org/abs/2404.03713v1

Abstract: Recent interpretability methods propose using concept-based explanations to translate the internal representations of deep learning models into a language that humans are familiar with: concepts. This requires understanding which concepts are present in the representation space of a neural network. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs. CAVs may be: (1) inconsistent between layers, (2) entangled with different concepts, and (3) spatially dependent. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact. Understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on ImageNet and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods.
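
The common recipe for learning a CAV, which this work builds on, is a linear probe that separates a concept's activations from random activations in a chosen layer; the probe's weight vector is the concept direction. A minimal sketch with synthetic activations (this follows the standard CAV formulation, not the paper's specific tooling):

```python
# Learning a Concept Activation Vector (CAV) as the normal of a linear probe.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
acts_concept = rng.normal(0.5, 1.0, size=(200, 512))  # e.g., "striped" images
acts_random = rng.normal(0.0, 1.0, size=(200, 512))   # random probe images

X = np.vstack([acts_concept, acts_random])
y = np.array([1] * 200 + [0] * 200)

probe = LogisticRegression(max_iter=1000).fit(X, y)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # unit concept direction

# Concept sensitivity of an activation: its signed projection onto the CAV.
print(acts_concept[0] @ cav, acts_random[0] @ cav)
```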

Creator Hearts: Investigating the Impact of Positive Signals from YouTube Creators in Shaping Comment Section Behavior

Authors: Frederick Choi, Charlotte Lambert, Vinay Koshy, Sowmya Pratipati, Tue Do, Eshwar Chandrasekharan

Link: http://arxiv.org/abs/2404.03612v1

Abstract: Much of the research in online moderation focuses on punitive actions. However, emerging research has shown that positive reinforcement is effective at encouraging desirable behavior on online platforms. We extend this research by studying the "creator heart" feature on YouTube, quantifying its primary effects on comments that receive hearts and on videos where hearts have been given. We find that creator hearts increased the visibility of comments and increased the amount of positive engagement they received from other users. We also find that the presence of a creator-hearted comment soon after a video is published can incentivize viewers to comment, increasing the total engagement with the video over time. We discuss the potential for creators to use hearts to shape behavior in their communities by highlighting, rewarding, and incentivizing desirable behaviors from users. We also discuss avenues for extending our study to understand positive signals from moderators on other platforms.

Integrating Large Language Models with Multimodal Virtual Reality Interfaces to Support Collaborative Human-Robot Construction Work

Authors: Somin Park, Carol C. Menassa, Vineet R. Kamat

Link: http://arxiv.org/abs/2404.03498v1

Abstract: In the construction industry, where work environments are complex, unstructured and often dangerous, the implementation of Human-Robot Collaboration (HRC) is emerging as a promising advancement. This underlines the critical need for intuitive communication interfaces that enable construction workers to collaborate seamlessly with robotic assistants. This study introduces a conversational Virtual Reality (VR) interface integrating multimodal interaction to enhance intuitive communication between construction workers and robots. By integrating voice and controller inputs with the Robot Operating System (ROS), Building Information Modeling (BIM), and a game engine featuring a chat interface powered by a Large Language Model (LLM), the proposed system enables intuitive and precise interaction within a VR setting. Evaluated by twelve construction workers through a drywall installation case study, the proposed system demonstrated its low workload and high usability with succinct command inputs. The proposed multimodal interaction system suggests that such technological integration can substantially advance the integration of robotic assistants in the construction industry.

Agora Elevator Bodily Sensation Study -- a report

Authors: Rebekah Rousi

Link: http://arxiv.org/abs/2404.03356v1

Abstract: This study set out to examine the relationship between expressed social emotions (i.e., what people say they are feeling) and physical sensations, the connection between emotion and bodily experience. It additionally provided the opportunity to investigate how neurological findings on gender differences can be observed in practice: what difference does it make to behaviour and judgment that we have varying levels of mirror neuron activity? The following report documents the study, procedure, results and findings.

Influence of Gameplay Duration, Hand Tracking, and Controller Based Control Methods on UX in VR

Authors: Tanja Kojić, Maurizio Vergari, Simon Knuth, Maximilian Warsinke, Sebastian Möller, Jan-Niklas Voigt-Antons

Link: http://arxiv.org/abs/2404.03337v1

Abstract: Inside-out tracking is growing popular in consumer VR, enhancing accessibility. It uses HMD camera data and neural networks for effective hand tracking. However, few user experience studies have compared this method to traditional controllers, and there is no consensus on the optimal control technique. This paper investigates the impact of control methods and gaming duration on VR user experience, hypothesizing that hand tracking might be preferred for short sessions and by users new to VR due to its simplicity. Through a lab study with twenty participants evaluating presence, emotional response, UX quality, and flow, findings revealed that control type and session length affect user experience without a significant interaction effect. Controllers were generally superior, attributed to their reliability, and longer sessions increased presence and realism. The study found that individuals with more VR experience were more inclined to recommend hand tracking to others, which contradicted predictions.

Exploring Emotions in Multi-componential Space using Interactive VR Games

Authors: Rukshani Somarathna, Gelareh Mohammadi

Link: http://arxiv.org/abs/2404.03239v1

Abstract: Emotion understanding is a complex process that involves multiple components. The ability to recognise emotions not only leads to new context awareness methods but also enhances the effectiveness of system interaction by perceiving and expressing emotions. Despite the attention to discrete and dimensional models, neuroscientific evidence suggests that emotions are complex and multi-faceted. One framework that resonates well with such findings is the Component Process Model (CPM), a theory that considers the complexity of emotions with five interconnected components: appraisal, expression, motivation, physiology and feeling. However, the relationship between the CPM and discrete emotions has not yet been fully explored. Therefore, to better understand the processes underlying emotions, we operationalised a data-driven approach using interactive Virtual Reality (VR) games and collected multimodal measures (self-reports, physiological and facial signals) from 39 participants. We used Machine Learning (ML) methods to identify the unique contributions of each component to emotion differentiation. Our results showed the role of different components in emotion differentiation, with the model including all components demonstrating the most significant contribution. Moreover, we found that at least five dimensions are needed to represent the variation of emotions in our dataset. These findings also have implications for using VR environments in emotion research and highlight the role of physiological signals in emotion recognition within such environments.

NLP4Gov: A Comprehensive Library for Computational Policy Analysis

Authors: Mahasweta Chakraborti, Sailendra Akash Bonagiri, Santiago Virgüez-Ruiz, Seth Frey

Link: http://arxiv.org/abs/2404.03206v1

Abstract: Formal rules and policies are fundamental in formally specifying a social system: its operation, boundaries, processes, and even ontology. Recent scholarship has highlighted the role of formal policy in collective knowledge creation, game communities, the production of digital public goods, and national social media governance. Researchers have shown interest in how online communities convene tenable self-governance mechanisms to regulate member activities and distribute rights and privileges by designating responsibilities, roles, and hierarchies. We present NLP4Gov, an interactive kit to train and aid scholars and practitioners alike in computational policy analysis. The library explores and integrates methods and capabilities from computational linguistics and NLP to generate semantic and symbolic representations of community policies from text records. Versatile, documented, and accessible, NLP4Gov provides granular and comparative views into institutional structures and interactions, along with other information extraction capabilities for downstream analysis.

Towards Collaborative Family-Centered Design for Online Safety, Privacy and Security

Authors: Mamtaj Akter, Zainab Agha, Ashwaq Alsoubai, Naima Ali, Pamela Wisniewski

Link: http://arxiv.org/abs/2404.03165v1

Abstract: Traditional online safety technologies often overly restrict teens and invade their privacy, while parents often lack knowledge regarding their digital privacy. As such, prior researchers have called for more collaborative approaches on adolescent online safety and networked privacy. In this paper, we propose family-centered approaches to foster parent-teen collaboration in ensuring their mobile privacy and online safety while respecting individual privacy, to enhance open discussion and teens' self-regulation. However, challenges such as power imbalances and conflicts with family values arise when implementing such approaches, making parent-teen collaboration difficult. Therefore, attending the family-centered design workshop will provide an invaluable opportunity for us to discuss these challenges and identify best research practices for the future of collaborative online safety and privacy within families.

Biodegradable Interactive Materials

Authors: Zhihan Zhang, Mallory Parker, Kuotian Liao, Jerry Cao, Anandghan Waghmare, Joseph Breda, Chris Matsumura, Serena Eley, Eleftheria Roumeli, Shwetak Patel, Vikram Iyer

Link: http://arxiv.org/abs/2404.03130v1

Abstract: The sense of touch is fundamental to how we interact with the physical and digital world. Conventional interactive surfaces and tactile interfaces use electronic sensors embedded into objects; however, this approach poses serious challenges both for environmental sustainability and for a future of truly ubiquitous interaction systems where information is encoded into everyday objects. In this work, we present Biodegradable Interactive Materials: backyard-compostable interactive interfaces that leverage information encoded in material properties. Inspired by natural systems, we propose an architecture that programmatically encodes multidimensional information into materials themselves and combines them with wearable devices that extend human senses to perceive the embedded data. We combine unrefined biological matter from plants and algae, like chlorella, with natural minerals, like graphite and magnetite, to produce materials with varying electrical, magnetic, and surface properties. We perform in-depth analysis using physics models, computational simulations, and real-world experiments to characterize their information density and develop decoding methods. Our passive, chip-less materials can robustly encode 12 bits of information, equivalent to 4096 unique classes. We further develop wearable device prototypes that can decode this information during touch interactions using off-the-shelf sensors. We demonstrate sample applications such as customized buttons, tactile maps, and interactive surfaces. We further demonstrate that these interactive materials degrade naturally outdoors within 21 days and perform a comparative environmental analysis of the benefits of this approach.

2024-04-03

Writing with AI Lowers Psychological Ownership, but Longer Prompts Can Help

Authors: Nikhita Joshi, Daniel Vogel

Link: http://arxiv.org/abs/2404.03108v1

Abstract: The feeling that something belongs to oneself is called "psychological ownership." A common assumption is that writing with generative AI lowers psychological ownership, but the extent to which this occurs and the role of prompt length are unclear. We report on two experiments to better understand the relationship between psychological ownership and prompt length. Participants wrote short stories either completely by themselves or wrote prompts of varying lengths, enforced through word limits. Results show that when participants wrote longer prompts, they had higher levels of psychological ownership. Their comments suggest they felt encouraged to think more about their prompts and include more details about the story plot. However, these benefits plateaued when the prompt length was 75-100% of the target story length. Based on these results, we propose prompt entry interface designs that nudge users with soft and hard constraints to write longer prompts for increased psychological ownership.

Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

Authors: Fred Hohman, Chaoqun Wang, Jinmook Lee, Jochen Görtler, Dominik Moritz, Jeffrey P Bigham, Zhile Ren, Cecile Foret, Qi Shan, Xiaoyi Zhang

Link: http://arxiv.org/abs/2404.03085v1

Abstract: On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria.

Toward Safe Evolution of Artificial Intelligence (AI) based Conversational Agents to Support Adolescent Mental and Sexual Health Knowledge Discovery

Authors: Jinkyung Park, Vivek Singh, Pamela Wisniewski

Link: http://arxiv.org/abs/2404.03023v1

Abstract: Following the recent release of various Artificial Intelligence (AI) based Conversation Agents (CAs), adolescents are increasingly using CAs for interactive knowledge discovery on sensitive topics, including mental and sexual health topics. Exploring such sensitive topics through online search has been an essential part of adolescent development, and CAs can support their knowledge discovery on such topics through human-like dialogues. Yet, unintended risks have been documented with adolescents' interactions with AI-based CAs, such as being exposed to inappropriate content, false information, and/or being given advice that is detrimental to their mental and physical well-being (e.g., to self-harm). In this position paper, we discuss the current landscape and opportunities for CAs to support adolescents' mental and sexual health knowledge discovery. We also discuss some of the challenges related to ensuring the safety of adolescents when interacting with CAs regarding sexual and mental health topics. We call for a discourse on how to set guardrails for the safe evolution of AI-based CAs for adolescents.

Generative AI in the Wild: Prospects, Challenges, and Strategies

Authors: Yuan Sun, Eunchae Jang, Fenglong Ma, Ting Wang

Link: http://arxiv.org/abs/2404.04101v1

Abstract: Propelled by their remarkable capabilities to generate novel and engaging content, Generative Artificial Intelligence (GenAI) technologies are disrupting traditional workflows in many industries. While prior research has examined GenAI from a techno-centric perspective, there is still a lack of understanding about how users perceive and utilize GenAI in real-world scenarios. To bridge this gap, we conducted semi-structured interviews with (N=18) GenAI users in creative industries, investigating the human-GenAI co-creation process within a holistic LUA (Learning, Using and Assessing) framework. Our study uncovered an intriguingly complex landscape. Prospects: GenAI greatly fosters the co-creation between human expertise and GenAI capabilities, profoundly transforming creative workflows. Challenges: Meanwhile, users face substantial uncertainties and complexities arising from resource availability, tool usability, and regulatory compliance. Strategies: In response, users actively devise various strategies to overcome many of such challenges. Our study reveals key implications for the design of future GenAI tools.

ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale

Authors: Jinbin Huang, Chen Chen, Aditi Mishra, Bum Chul Kwon, Zhicheng Liu, Chris Bryan

Link: http://arxiv.org/abs/2404.02990v1

Abstract: Generative image models have emerged as a promising technology to produce realistic images. Despite potential benefits, concerns grow about their misuse, particularly in generating deceptive images that could raise significant ethical, legal, and societal issues. Consequently, there is a growing demand to empower users to effectively discern and comprehend patterns of AI-generated images. To this end, we developed ASAP, an interactive visualization system that automatically extracts distinct patterns of AI-generated images and allows users to interactively explore them via various views. To uncover fake patterns, ASAP introduces a novel image encoder, adapted from CLIP, which transforms images into compact "distilled" representations, enriched with information for differentiating authentic and fake images. These representations generate gradients that propagate back to the attention maps of CLIP's transformer block. This process quantifies the relative importance of each pixel to image authenticity or fakeness, exposing key deceptive patterns. ASAP enables at-scale interactive analysis of these patterns through multiple, coordinated visualizations. This includes a representation overview with innovative cell glyphs to aid in the exploration and qualitative evaluation of fake patterns across a vast array of images, as well as a pattern view that displays authenticity-indicating patterns in images and quantifies their impact. ASAP supports the analysis of cutting-edge generative models with the latest architectures, including GAN-based models like proGAN and diffusion models like the latent diffusion model. We demonstrate ASAP's usefulness through two usage scenarios using multiple fake-image detection benchmark datasets, revealing its ability to identify and understand hidden patterns in AI-generated images, especially in detecting fake human faces produced by diffusion-based techniques.
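
The gradient-attribution step can be illustrated generically: backpropagate a "fakeness" score to the input and read off per-pixel importance. The PyTorch sketch below shows plain input-gradient saliency on a toy classifier; ASAP's actual pipeline routes gradients through CLIP's attention maps, which is not reproduced here.

```python
# Input-gradient saliency: how much each pixel influences the "fake" logit.
import torch
import torch.nn as nn

model = nn.Sequential(                 # toy stand-in for a real/fake classifier
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
)

img = torch.rand(1, 3, 224, 224, requires_grad=True)
fakeness = model(img).squeeze()        # scalar "fakeness" score
fakeness.backward()                    # gradients flow back to the pixels

saliency = img.grad.abs().sum(dim=1)   # (1, 224, 224) per-pixel importance
print(saliency.shape)
```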

Fragmented Moments, Balanced Choices: How Do People Make Use of Their Waiting Time?

Authors: Jian Zheng, Ge Gao

Link: http://arxiv.org/abs/2404.02880v1

Abstract: Everyone spends some time waiting every day. HCI research has developed tools for boosting productivity while waiting. However, little is known about how people naturally spend their waiting time. We conducted an experience sampling study with 21 working adults who used a mobile app to report their daily waiting time activities over two weeks. The aim of this study is to understand the activities people do while waiting and the effect of situational factors. We found that participants spent about 60% of their waiting time on leisure activities, 20% on productive activities, and 20% on maintenance activities. These choices are sensitive to situational factors, including the accessible device, location, and certain routines of the day. Our study complements previous ones by demonstrating that people use waiting time to pursue various goals beyond productivity and to maintain work-life balance. Our findings shed light on future empirical research and system design for time management.

The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers

Authors: Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag

Link: http://arxiv.org/abs/2404.02806v1

Abstract: Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests. As LLMs are increasingly used as programmer assistants, we study whether gains on existing benchmarks translate to gains in programmer productivity when coding with LLMs, including time spent coding. In addition to static benchmarks, we investigate the utility of preference metrics that might be used as proxies to measure LLM helpfulness, such as code acceptance or copy rates. To do so, we introduce RealHumanEval, a web interface to measure the ability of LLMs to assist programmers, through either autocomplete or chat support. We conducted a user study (N=213) using RealHumanEval in which users interacted with six LLMs of varying base model performance. Despite static benchmarks not incorporating humans-in-the-loop, we find that improvements in benchmark performance lead to increased programmer productivity; however, gaps in benchmark versus human performance are not proportional -- a trend that holds across both forms of LLM support. In contrast, we find that programmer preferences do not correlate with their actual performance, motivating the need for better, human-centric proxy signals. We also open-source RealHumanEval to enable human-centric evaluation of new models and the study data to facilitate efforts to improve code models.

AI and personalized learning: bridging the gap with modern educational goals

Authors: Kristjan-Julius Laak, Jaan Aru

Link: http://arxiv.org/abs/2404.02798v1

Abstract: Personalized learning (PL) aspires to provide an alternative to the one-size-fits-all approach in education. Technology-based PL solutions have shown notable effectiveness in enhancing learning performance. However, their alignment with the broader goals of modern education is inconsistent across technologies and research areas. In this paper, we examine the characteristics of AI-driven PL solutions in light of the OECD Learning Compass 2030 goals. Our analysis indicates a gap between the objectives of modern education and the current direction of PL. We identify areas where most present-day PL technologies could better embrace essential elements of contemporary education, such as collaboration, cognitive engagement, and the development of general competencies. While the present PL solutions are instrumental in aiding learning processes, the PL envisioned by educational experts extends beyond simple technological tools and requires a holistic change in the educational system. Finally, we explore the potential of large language models, such as ChatGPT, and propose a hybrid model that blends artificial intelligence with a collaborative, teacher-facilitated approach to personalized learning.

IEEE VIS Workshop on Visualization for Climate Action and Sustainability

Authors: Benjamin Bach, Fanny Chevalier, Helen-Nicole Kostis, Mark Subbaro, Yvonne Jansen, Robert Soden

Link: http://arxiv.org/abs/2404.02743v1

Abstract: This first workshop on visualization for climate action and sustainability aims to explore and consolidate the role of data visualization in accelerating action towards addressing the current environmental crisis. Given the urgency and impact of the environmental crisis, we ask how our skills, research methods, and innovations can help by empowering people and organizations. We believe visualization holds an enormous power to aid understanding, decision making, communication, discussion, participation, education, and exploration of complex topics around climate action and sustainability. Hence, this workshop invites submissions and discussion around these topics with the goal of establishing a visible and actionable link between these fields and their respective stakeholders. The workshop solicits work-in-progress and research papers as well as pictorials and interactive demos from the whole range of visualization research (dashboards, interactive spaces, scientific visualization, storytelling, visual analytics, explainability etc.), within the context of environmentalism (climate science, sustainability, energy, circular economy, biodiversity, etc.) and across a range of scenarios from public awareness and understanding, visual analysis, expert decision making, science communication, personal decision making etc. After presentations of submissions, the workshop will feature dedicated discussion groups around data driven interactive experiences for the public, and tools for personal and professional decision making.

Evolving Agents: Interactive Simulation of Dynamic and Diverse Human Personalities

Authors: Jiale Li, Jiayang Li, Jiahao Chen, Yifan Li, Shijie Wang, Hugo Zhou, Minjun Ye, Yunsheng Su

Link: http://arxiv.org/abs/2404.02718v1

Abstract: Human-like agents with diverse and dynamic personalities could serve as an important design probe in the process of user-centered design, thereby enabling designers to enhance the user experience of interactive applications. In this article, we introduce Evolving Agents, a novel agent architecture that consists of two systems: Personality and Behavior. The Personality system includes three modules: Cognition, Emotion and Character Growth. The Behavior system comprises two modules: Planning and Action. We also build a simulation platform that enables agents to interact with the environment and other agents. Evolving Agents can simulate the human personality evolution process. Compared to its initial state, agents' personality and behavior patterns undergo believable development after several days of simulation. Agents reflect on their behavior to reason and develop new personality traits. These traits, in turn, generate new behavior patterns, forming a feedback-loop-like personality evolution. In our experiment, we utilized the simulation platform with 10 agents for evaluation. During the evaluation, these agents experienced believable and inspirational personality evolution. Through ablation and control experiments, we demonstrated the outstanding effectiveness of agent personality evolution and that all modules of our agent architecture contribute to creating believable human-like agents with diverse and dynamic personalities. We also demonstrated through workshops how Evolving Agents could inspire designers.
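
The two-system architecture is easy to picture as a code skeleton. The sketch below mirrors the named modules with hypothetical class and method names; the trait and planning rules are invented purely to show the reflection feedback loop, not taken from the paper.

```python
# Skeleton of a Personality/Behavior agent with a reflection feedback loop.
from dataclasses import dataclass, field

@dataclass
class Personality:
    cognition: dict = field(default_factory=dict)  # beliefs about the world
    emotion: str = "neutral"                       # current affective state
    traits: list = field(default_factory=list)     # grown character traits

    def reflect(self, outcome):
        """Character Growth: turn observed behavior back into new traits."""
        if outcome["action"] == "explore" and "curious" not in self.traits:
            self.traits.append("curious")
        if outcome["action"] == "socialize" and "empathetic" not in self.traits:
            self.traits.append("empathetic")

@dataclass
class Behavior:
    def plan(self, personality):
        """Planning: traits shape the next goal."""
        return {"goal": "socialize" if "curious" in personality.traits else "explore"}

    def act(self, plan):
        """Action: execute the goal in the simulated environment."""
        return {"action": plan["goal"]}

personality, behavior = Personality(), Behavior()
for _ in range(3):                                 # three simulated days
    outcome = behavior.act(behavior.plan(personality))
    personality.reflect(outcome)                   # traits feed back into planning
print(personality.traits)                          # ['curious', 'empathetic']
```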

Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM

Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Yuekai Huang, Jun Hu, Qing Wang

Link: http://arxiv.org/abs/2404.02706v1

Abstract: Mobile apps have become indispensable for accessing and participating in various environments, especially for low-vision users. Users with visual impairments can use screen readers to read the content of each screen and understand the content that needs to be operated. Screen readers need to read the hint-text attribute of a text input component to remind visually impaired users what to fill in. Unfortunately, based on our analysis of 4,501 Android apps with text inputs, over 76% of them are missing hint-text. These issues are mostly caused by developers' lack of awareness of visually impaired individuals. To overcome these challenges, we developed an LLM-based hint-text generation model called HintDroid, which analyzes the GUI information of input components and uses in-context learning to generate the hint-text. To ensure the quality of hint-text generation, we further designed a feedback-based inspection mechanism to adjust the hint-text. The automated experiments demonstrate high BLEU scores, and a user study further confirms its usefulness. HintDroid can not only help visually impaired individuals, but also help ordinary people understand the requirements of input components. HintDroid demo video: https://youtu.be/FWgfcctRbfI.
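
In-context hint generation of this kind typically amounts to packing the input component's GUI context and a few exemplar pairs into a prompt. A hedged sketch in which the exemplars, the GUI descriptions, and the call_llm stub are hypothetical stand-ins, not HintDroid's actual code:

```python
# Building a few-shot prompt for hint-text generation; illustrative only.
FEW_SHOT = [
    {"gui": "EditText below label 'Email' on a login screen",
     "hint": "Enter your email address"},
    {"gui": "EditText next to a magnifier icon in a shopping app",
     "hint": "Search for products"},
]

def build_prompt(gui_context: str) -> str:
    examples = "\n".join(
        f"GUI: {ex['gui']}\nHint-text: {ex['hint']}" for ex in FEW_SHOT
    )
    return ("Generate concise hint-text for the text input described.\n\n"
            f"{examples}\n\nGUI: {gui_context}\nHint-text:")

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; wire up a real client here."""
    raise NotImplementedError

print(build_prompt("EditText under 'Date of birth' in a sign-up form"))
```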

Spatial Summation of Localized Pressure for Haptic Sensory Prostheses

Authors: Sreela Kodali, Cihualpilli Camino Cruz, Thomas C. Bulea, Kevin S. Rao, Diana Bharucha-Goebel, Alexander T. Chesler, Carsten G. Bonnemann, Allison M. Okamura

Link: http://arxiv.org/abs/2404.02565v1

Abstract: A host of medical conditions, including amputations, diabetes, stroke, and genetic disease, result in loss of touch sensation. Because most types of sensory loss have no pharmacological treatment or rehabilitative therapy, we propose a haptic sensory prosthesis that provides substitutive feedback. The wrist and forearm are compelling locations for feedback due to available skin area and not occluding the hands, but have reduced mechanoreceptor density compared to the fingertips. Focusing on localized pressure as the feedback modality, we hypothesize that we can improve on prior devices by invoking a wider range of stimulus intensity using multiple points of pressure to evoke spatial summation, which is the cumulative perceptual experience from multiple points of stimuli. We conducted a preliminary perceptual test to investigate this idea and found that just noticeable difference is reduced with two points of pressure compared to one, motivating future work using spatial summation in sensory prostheses.
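
JND estimation in psychophysics is commonly done by fitting a psychometric function to detection rates and reading a threshold spread off the fitted curve. The sketch below uses made-up detection data and a logistic form; both are common choices rather than necessarily the study's.

```python
# Fitting a logistic psychometric function and deriving a JND estimate.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, mu, sigma):
    """P(detect) as a function of stimulus increment x."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / sigma))

levels = np.linspace(0.1, 2.0, 8)    # pressure increments (N), synthetic
p_detect = np.array([0.05, 0.10, 0.20, 0.45, 0.70, 0.85, 0.95, 0.98])

(mu, sigma), _ = curve_fit(psychometric, levels, p_detect, p0=[1.0, 0.3])

# One common JND definition: half the 25%-to-75% spread of the fitted curve,
# which for a logistic equals sigma * ln(3).
jnd = sigma * np.log(3)
print(f"threshold mu = {mu:.2f} N, JND ~ {jnd:.2f} N")
```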

Cultural influence on autonomous vehicles acceptance

Authors: Chowdhury Shahriar Muzammel, Maria Spichkova, James Harland

Link: http://arxiv.org/abs/2404.03694v1

Abstract: Autonomous vehicles and other intelligent transport systems have been evolving rapidly and are being increasingly deployed worldwide. Previous work has shown that perceptions of autonomous vehicles and attitudes towards them depend on various attributes, including the respondent's age, education level and background. These findings with respect to age and educational level are generally uniform, such as showing that younger respondents are typically more accepting of autonomous vehicles, as are those with higher education levels. However, the influence of factors such as culture is much less clear-cut. In this paper we analyse the relationship between acceptance of autonomous vehicles and national culture by means of the well-known Hofstede cultural model.

PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts

Authors: Tian Huang, Chun Yu, Weinan Shi, Zijian Peng, David Yang, Weiqi Sun, Yuanchun Shi

Link: http://arxiv.org/abs/2404.02475v1

Abstract: Robotic Process Automation (RPA) offers a valuable solution for efficiently automating tasks on the graphical user interface (GUI), by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures), thereby generating and performing corresponding RPA tasks. PromptRPA incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for RPA generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate in the baseline to 95.21% with PromptRPA, requiring an average of 1.66 user interventions for each new task. PromptRPA presents promising applications in fields such as tutorial creation, smart assistance, and customer service.

A neuroergonomics model for evaluating nuclear power plant operators' performance under heat stress, driven by ECG time-frequency spectrums and the fNIRS prefrontal cortex network: a CNN-GAT fusion model

Authors: Yan Zhang, Ming Jia, Meng Li, JianYu Wang, XiangMin Hu, ZhiHui Xu, Tao Chen

Link: http://arxiv.org/abs/2404.02439v1

Abstract: Operators experience complicated physiological and psychological states when exposed to extreme heat stress, which can impair cognitive function and decrease performance significantly, ultimately leading to severe secondary disasters. Therefore, there is an urgent need for a feasible technique to identify their abnormal states to enhance the reliability of human-cybernetics systems. With the advancement of deep learning in physiological modeling, a model for evaluating operators' performance driven by electrocardiogram (ECG) and functional near-infrared spectroscopy (fNIRS) was proposed, demonstrating high ecological validity. The model fused a convolutional neural network (CNN) backbone and a graph attention network (GAT) backbone to extract discriminative features from ECG time-frequency spectrums and the fNIRS prefrontal cortex (PFC) network, respectively, incorporating deeper neuroscience domain knowledge, and eventually achieved an AUC of 0.90. Results supported that handcrafted features extracted by specialized neuroscience methods can alleviate overfitting. Inspired by the small-world nature of the brain network, the fNIRS PFC network was organized as an undirected graph and embedded by the GAT, which proved to perform better in information aggregation and delivery than a simple non-linear transformation. The model provides a potential neuroergonomics application for evaluating the human state in vital human-cybernetics systems under Industry 5.0 scenarios.
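
The described fusion, a CNN branch for ECG spectrograms whose features are concatenated with pooled GAT features from the fNIRS PFC graph, can be sketched structurally in PyTorch with torch_geometric. Layer sizes, the pooling choice, and the toy inputs are placeholders, not the paper's configuration.

```python
# Structural sketch of a CNN + GAT fusion classifier; illustrative sizes only.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv, global_mean_pool

class FusionModel(nn.Module):
    def __init__(self, node_feats=8, hidden=32):
        super().__init__()
        self.cnn = nn.Sequential(                        # ECG spectrogram branch
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.gat = GATConv(node_feats, hidden, heads=1)  # fNIRS graph branch
        self.head = nn.Linear(16 + hidden, 1)            # performance logit

    def forward(self, spec, x, edge_index, batch):
        ecg_feat = self.cnn(spec)                        # (B, 16)
        g = torch.relu(self.gat(x, edge_index))          # (num_nodes, hidden)
        fnirs_feat = global_mean_pool(g, batch)          # (B, hidden)
        return self.head(torch.cat([ecg_feat, fnirs_feat], dim=1))

model = FusionModel()
spec = torch.rand(2, 1, 64, 64)                          # two ECG spectrograms
x = torch.rand(20, 8)                                    # 10 PFC nodes per graph
edge_index = torch.cat([torch.randint(0, 10, (2, 20)),   # edges within graph 0
                        torch.randint(10, 20, (2, 20))], dim=1)  # graph 1
batch = torch.tensor([0] * 10 + [1] * 10)                # node-to-graph mapping
print(model(spec, x, edge_index, batch).shape)           # torch.Size([2, 1])
```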

A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion

Authors: Zeyu Zhao, Nan Gao, Zhi Zeng, Guixuan Zhang, Jie Liu, Shuwu Zhang

Link: http://arxiv.org/abs/2404.02411v1

Abstract: Diffusion models have shown great success in generating high-quality co-speech gestures for interactive humanoid robots or digital avatars from noisy input with the speech audio or text as conditions. However, they rarely focus on providing rich editing capabilities for content creators other than high-level specialized measures like style conditioning. To resolve this, we propose a unified framework utilizing diffusion inversion that enables multi-level editing capabilities for co-speech gesture generation without re-training. The method takes advantage of two key capabilities of invertible diffusion models. The first is that through inversion, we can reconstruct the intermediate noise from gestures and regenerate new gestures from the noise. This can be used to obtain gestures with high-level similarities to the original gestures for different speech conditions. The second is that this reconstruction reduces activation caching requirements during gradient calculation, making the direct optimization on input noises possible on current hardware with limited memory. With different loss functions designed for, e.g., joint rotation or velocity, we can control various low-level details by automatically tweaking the input noises through optimization. Extensive experiments on multiple use cases show that this framework succeeds in unifying high-level and low-level co-speech gesture editing.

2024-04-02

From Delays to Densities: Exploring Data Uncertainty through Speech, Text, and Visualization

Authors: Chase Stokes, Chelsea Sanker, Bridget Cogley, Vidya Setlur

Link: http://arxiv.org/abs/2404.02317v1

Abstract: Understanding and communicating data uncertainty is crucial for making informed decisions in sectors like finance and healthcare. Previous work has explored how to express uncertainty in various modes. For example, uncertainty can be expressed visually with quantile dot plots or linguistically with hedge words and prosody. Our research aims to systematically explore how variations within each mode contribute to communicating uncertainty to the user; this allows us to better understand each mode's affordances and limitations. We completed an exploration of the uncertainty design space based on pilot studies and ran two crowdsourced experiments examining how speech, text, and visualization modes and variants within them impact decision-making with uncertain data. Visualization and text were most effective for rational decision-making, though text resulted in lower confidence. Speech garnered the highest trust despite sometimes leading to risky decisions. Results from these studies indicate meaningful trade-offs among modes of information and encourage exploration of multimodal data representations.
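
Among the visual variants mentioned, the quantile dot plot is simple enough to sketch: draw k equally spaced quantiles of the outcome distribution as stacked dots, so probability can be read by counting dots. The delay framing and binning below are illustrative assumptions.

```python
# A 20-dot quantile dot plot: each dot carries 5% of the probability mass.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

k = 20
quantiles = stats.norm(loc=15, scale=4).ppf((np.arange(k) + 0.5) / k)
bins = np.round(quantiles).astype(int)     # bin dots to whole minutes

counts, xs, ys = {}, [], []
for b in bins:                             # stack dots within each bin
    counts[b] = counts.get(b, 0) + 1
    xs.append(b)
    ys.append(counts[b])

plt.scatter(xs, ys, s=200)
plt.xlabel("Delay (minutes)")
plt.ylabel("Dot count (each dot = 5% chance)")
plt.show()
```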

A Change of Scenery: Transformative Insights from Retrospective VR Embodied Perspective-Taking of Conflict With a Close Other

Authors: Seraphina Yong, Leo Cui, Evan Suma Rosenberg, Svetlana Yarosh

Link: http://arxiv.org/abs/2404.02277v1

Abstract: Close relationships are irreplaceable social resources, yet prone to high-risk conflict. Building on findings from the fields of HCI, virtual reality, and behavioral therapy, we evaluate the unexplored potential of retrospective VR-embodied perspective-taking to fundamentally influence conflict resolution in close others. We develop a biographically-accurate Retrospective Embodied Perspective-Taking system (REPT) and conduct a mixed-methods evaluation of its influence on close others' reflection and communication, compared to video-based reflection methods currently used in therapy (treatment as usual, or TAU). Our key findings provide evidence that REPT was able to significantly improve communication skills and positive sentiment of both partners during conflict, over TAU. The qualitative data also indicated that REPT surpassed basic perspective-taking by exclusively stimulating users to embody and reflect on both their own and their partner's experiences at the same level. In light of these findings, we provide implications and an agenda for social embodiment in HCI design: conceptualizing the use of 'embodied social cognition,' and envisioning socially-embodied experiences as an interactive context.

Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices

Authors: Ruiwei Xiao, Xinying Hou, John Stamper

Link: http://arxiv.org/abs/2404.02213v1

Abstract: Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to a single hint type. To investigate whether and how different levels of hints can support students' problem-solving and learning, we conducted a think-aloud study with 12 novices using the LLM Hint Factory, a system providing four levels of hints, from general natural language guidance to concrete code assistance, varying in format and granularity. We discovered that high-level natural language hints alone can be unhelpful or even misleading, especially when addressing next-step or syntax-related help requests. Adding lower-level hints, like code examples with in-line comments, can better support students. The findings open up future work on customizing help responses from content, format, and granularity levels to accurately identify and meet students' learning needs.

Harder, Better, Faster, Stronger: Interactive Visualization for Human-Centered AI Tools

Authors: Md Naimul Hoque, Sungbok Shin, Niklas Elmqvist

Link: http://arxiv.org/abs/2404.02147v1

Abstract: Human-centered AI (HCAI), rather than replacing the human, puts the human user in the driver's seat of so-called human-centered AI-infused tools (HCAI tools): interactive software tools that amplify, augment, empower, and enhance human performance using AI models; often novel generative or foundation AI ones. In this paper, we discuss how interactive visualization can be a key enabling technology for creating such human-centered AI tools. Visualization has already been shown to be a fundamental component in explainable AI models, and coupling this with data-driven, semantic, and unified interaction feedback loops will enable a human-centered approach to integrating AI models in the loop with human users. We present several examples of our past and current work on such HCAI tools, including for creative writing, temporal prediction, and user experience analysis. We then draw parallels between these tools to suggest common themes on how interactive visualization can support the design of future HCAI tools.

The Effects of Group Sanctions on Participation and Toxicity: Quasi-experimental Evidence from the Fediverse

Authors: Carl Colglazier, Nathan TeBlunthuis, Aaron Shaw

Link: http://arxiv.org/abs/2404.02109v1

Abstract: Online communities often overlap and coexist, despite incongruent norms and approaches to content moderation. When communities diverge, decentralized and federated communities may pursue group-level sanctions, including defederation (disconnection) to block communication between members of specific communities. We investigate the effects of defederation in the context of the Fediverse, a set of decentralized, interconnected social networks with independent governance. Mastodon and Pleroma, the most popular software powering the Fediverse, allow administrators on one server to defederate from another. We use a difference-in-differences approach and matched controls to estimate the effects of defederation events on participation and message toxicity among affected members of the blocked and blocking servers. We find that defederation causes a drop in activity for accounts on the blocked servers, but not on the blocking servers. Also, we find no evidence of an effect of defederation on message toxicity.
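
Once group means are in hand, the difference-in-differences estimate reduces to one line: subtract the matched controls' pre/post change from the treated accounts' change. The numbers below are invented for illustration and are not the study's estimates.

```python
# Difference-in-differences on (synthetic) mean weekly posts per account.
treated_pre, treated_post = 12.0, 9.0    # accounts on a defederated server
control_pre, control_post = 11.5, 11.0   # matched unaffected accounts

did = (treated_post - treated_pre) - (control_post - control_pre)
print(f"DiD estimate: {did:+.1f} posts/week")  # -2.5 => activity drop
```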

Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows

Authors: Grace Guo, Dustin Arendt, Alex Endert

Link: http://arxiv.org/abs/2404.02081v1

Abstract: Explainable AI (XAI) tools represent a turn to more human-centered and human-in-the-loop AI approaches that emphasize user needs and perspectives in machine learning model development workflows. However, while the majority of ML resources available today are developed for Python computational environments such as JupyterLab and Jupyter Notebook, the same has not been true of interactive XAI systems, which are often still implemented as standalone interfaces. In this paper, we address this mismatch by identifying three design patterns for embedding front-end XAI interfaces into Jupyter, namely: 1) One-way communication from Python to JavaScript, 2) Two-way data synchronization, and 3) Bi-directional callbacks. We also provide an open-source toolkit, bonXAI, that demonstrates how each design pattern might be used to build interactive XAI tools for a Pytorch text classification workflow. Finally, we conclude with a discussion of best practices and open questions. Our aims for this paper are to discuss how interactive XAI tools might be developed for computational notebooks, and how they can better integrate into existing model development workflows to support more collaborative, human-centered AI.
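
Of the three patterns, the first (one-way communication from Python to JavaScript) can be shown with nothing more than IPython's display machinery: serialize data in Python and hand it to the page for a front-end view to consume. The payload and global variable name below are illustrative; the two-way patterns would additionally need a widget framework and are not shown.

```python
# Design pattern 1: one-way Python-to-JavaScript communication in Jupyter.
import json
from IPython.display import Javascript, display

explanation = {"tokens": ["terrible", "acting"], "weights": [0.71, 0.12]}

# A front-end XAI view embedded in the notebook could now read window.xaiData.
display(Javascript(f"window.xaiData = {json.dumps(explanation)};"))
```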

Proof of Concept of a Voicebot Conversing in Wolof

Authors: Elodie Gauthier, Papa-Séga Wade, Thierry Moudenc, Patrice Collen, Emilie De Neef, Oumar Ba, Ndeye Khoyane Cama, Cheikh Ahmadou Bamba Kebe, Ndeye Aissatou Gningue, Thomas Mendo'o Aristide

Link: http://arxiv.org/abs/2404.02009v1

Abstract: This paper presents the proof-of-concept of the first automatic voice assistant ever built in Wolof, the main vehicular language spoken in Senegal. This voicebot is the result of a collaborative research project between Orange Innovation in France, Orange Senegal (aka Sonatel) and ADNCorp, a small IT company based in Dakar, Senegal. The purpose of the voicebot is to provide information to Orange customers about the Sargal loyalty program of Orange Senegal by using the most natural means of communication: speech. The voicebot receives the customer's oral request as input, which is then processed by an SLU system to reply to the customer's request using audio recordings. The first results of this proof-of-concept are encouraging, as we achieved a WER of 22% on the ASR task and an F1-score of 78% on the NLU task.
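
WER, the metric quoted for the ASR component, is the word-level edit distance between hypothesis and reference, normalized by reference length. A self-contained implementation of the standard formulation:

```python
# Word error rate via dynamic-programming edit distance.
def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("the bus is late", "the bus is late"))  # 0.0
print(wer("the bus is late", "the bus late"))     # 0.25 (one deletion)
```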

Cash or Non-Cash? Unveiling Ideators' Incentive Preferences in Crowdsourcing Contests

Authors: Christoph Riedl, Johann Füller, Katja Hutter, Gerard J. Tellis

Link: http://arxiv.org/abs/2404.01997v1

Abstract: Even though research has repeatedly shown that non-cash incentives can be effective, cash incentives are the de facto standard in crowdsourcing contests. In this multi-study research, we quantify ideators' preferences for non-cash incentives and investigate how allowing ideators to self-select their preferred incentive -- offering ideators a choice between cash and non-cash incentives -- affects their creative performance. We further explore whether the market context of the organization hosting the contest -- social (non-profit) or monetary (for-profit) -- moderates incentive preferences and their effectiveness. We find that individuals exhibit heterogeneous incentive preferences and often prefer non-cash incentives, even in for-profit contexts. Offering ideators a choice of incentives can enhance creative performance. Market context moderates the effect of incentives, such that ideators who receive non-cash incentives in for-profit contexts tend to exert less effort. We show that heterogeneity of ideators' preferences (and the ability to satisfy diverse preferences with suitably diverse incentive options) is a critical boundary condition to realizing benefits from offering ideators a choice of incentives. We provide managers with guidance to design effective incentives by improving incentive-preference fit for ideators.

Fast and Adaptive Questionnaires for Voting Advice Applications

Authors: Fynn Bachmann, Cristina Sarasua, Abraham Bernstein

Link: http://arxiv.org/abs/2404.01872v1

Abstract: The effectiveness of Voting Advice Applications (VAA) is often compromised by the length of their questionnaires. To address user fatigue and incomplete responses, some applications (such as the Swiss Smartvote) offer a condensed version of their questionnaire. However, these condensed versions cannot ensure the accuracy of the recommended parties or candidates, which we show to remain below 40%. To tackle these limitations, this work introduces an adaptive questionnaire approach that selects subsequent questions based on users' previous answers, aiming to enhance recommendation accuracy while reducing the number of questions posed to the voters. Our method uses an encoder and decoder module to predict missing values at any completion stage, leveraging a two-dimensional latent space reflective of political science's traditional methods for visualizing political orientations. Additionally, a selector module is proposed to determine the most informative subsequent question based on the voter's current position in the latent space and the remaining unanswered questions. We validated our approach using the Smartvote dataset from the Swiss Federal elections in 2019, testing various spatial models and selection methods to optimize the system's predictive accuracy. Our findings indicate that employing the IDEAL model both as encoder and decoder, combined with a PosteriorRMSE method for question selection, significantly improves the accuracy of recommendations, achieving 74% accuracy after asking the same number of questions as in the condensed version.
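
A schematic sketch of such an adaptive loop: ask next the question the current model is least certain about, then update predictions from the observed answer. The toy correlation matrix below stands in for the paper's IDEAL encoder/decoder, and the simple uncertainty rule stands in for PosteriorRMSE; everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_questions = 50
true_answers = rng.integers(0, 2, n_questions).astype(float)  # hypothetical voter

# Toy question-question correlations standing in for the latent-space decoder.
corr = rng.uniform(-0.5, 0.5, (n_questions, n_questions))

predicted = np.full(n_questions, 0.5)  # predicted probability of answering 1
asked = []
while len(asked) < 10:
    uncertainty = np.abs(predicted - 0.5)   # 0 = maximally uncertain
    uncertainty[asked] = np.inf             # never re-ask a question
    q = int(np.argmin(uncertainty))         # most informative next question
    asked.append(q)
    predicted[q] = true_answers[q]          # observe the voter's answer
    # Crude decoder update: shift correlated questions toward the observed answer.
    unasked = [i for i in range(n_questions) if i not in asked]
    predicted[unasked] = np.clip(
        predicted[unasked] + corr[q, unasked] * (true_answers[q] - 0.5), 0, 1)
print("question order:", asked)
```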

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Authors: Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu

Link: http://arxiv.org/abs/2404.01862v1

Abstract: Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work. There are two main challenges: 1) A suitable motion feature is needed to describe complex human movements with crucial appearance information. 2) Gestures and speech exhibit inherent dependencies and should be temporally aligned even when they are of arbitrary length. To solve these problems, we present a novel motion-decoupled framework to generate co-speech gesture videos. Specifically, we first introduce a well-designed nonlinear TPS transformation to obtain latent motion features preserving essential appearance information. Then a transformer-based diffusion model is proposed to learn the temporal correlation between gestures and speech and to perform generation in the latent motion space, followed by an optimal motion selection module to produce long-term coherent and consistent gesture videos. For better visual perception, we further design a refinement network focusing on missing details of certain areas. Extensive experimental results show that our proposed framework significantly outperforms existing approaches in both motion and video-related evaluations. Our code, demos, and more resources are available at https://github.com/thuhcsi/S2G-MDDiffusion.

"That's Not Good Science!": An Argument for the Thoughtful Use of Formative Situations in Research through Design

Authors: Raquel B Robinson, Anya Osborne, Chen Ji, James Collin Fey, Ella Dagan, Katherine Isbister

Link: http://arxiv.org/abs/2404.01848v1

Abstract: Most currently accepted approaches to evaluating Research through Design (RtD) presume that design prototypes are finalized and ready for robust testing in laboratory or in-the-wild settings. However, it is also valuable to assess designs at intermediate phases with mid-fidelity prototypes, not just to inform an ongoing design process, but also to glean knowledge of broader use to the research community. We propose 'formative situations' as a frame for examining mid-fidelity prototypes-in-process in this way. We articulate a set of criteria to help the community better assess the rigor of formative situations, in the service of opening conversation about establishing formative situations as a valuable contribution type within the RtD community.

Unmasking the Nuances of Loneliness: Using Digital Biomarkers to Understand Social and Emotional Loneliness in College Students

Authors: Malik Muhammad Qirtas, Evi Zafeirid, Dirk Pesch, Eleanor Bantry White

Link: http://arxiv.org/abs/2404.01845v1

Abstract: Background: Loneliness among students is increasing across the world, with potential consequences for mental health and academic success. To address this growing problem, accurate methods of detection are needed to identify loneliness and to differentiate social and emotional loneliness so that intervention can be personalized to individual need. Passive sensing technology provides a unique technique to capture behavioral patterns linked with distinct loneliness forms, allowing for more nuanced understanding and interventions for loneliness. Methods: To differentiate between social and emotional loneliness using digital biomarkers, our study included statistical tests, machine learning for predictive modeling, and SHAP values for feature importance analysis, revealing important factors in loneliness classification. Results: Our analysis revealed significant behavioral differences between socially and emotionally lonely groups, particularly in terms of phone usage and location-based features, with machine learning models demonstrating substantial predictive power in classifying loneliness levels. The XGBoost model, in particular, showed high accuracy and was effective in identifying key digital biomarkers, including phone usage duration and location-based features, as significant predictors of loneliness categories. Conclusion: This study underscores the potential of passive sensing data, combined with machine learning techniques, to provide insights into the behavioral manifestations of social and emotional loneliness among students. The identification of key digital biomarkers paves the way for targeted interventions aimed at mitigating loneliness in this population.
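
The analysis pattern described, a gradient-boosted classifier plus SHAP attributions over behavioral features, can be sketched as follows; the feature names and data are hypothetical stand-ins, not the study's dataset or pipeline:

```python
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                   # synthetic "digital biomarkers"
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy loneliness-category label
features = ["phone_usage_minutes", "location_entropy", "places_visited"]

model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)

# SHAP values attribute each prediction to individual behavioral features;
# mean |SHAP| per feature gives a global importance ranking.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
importance = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(features, importance), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```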

Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods

Authors: Zdravko Marinov, Moon Kim, Jens Kleesiek, Rainer Stiefelhagen

Link: http://arxiv.org/abs/2404.01816v1

Abstract: Interactive segmentation plays a crucial role in accelerating the annotation, particularly in domains requiring specialized expertise such as nuclear medicine. For example, annotating lesions in whole-body Positron Emission Tomography (PET) images can require over an hour per volume. While previous works evaluate interactive segmentation models through either real user studies or simulated annotators, both approaches present challenges. Real user studies are expensive and often limited in scale, while simulated annotators, also known as robot users, tend to overestimate model performance due to their idealized nature. To address these limitations, we introduce four evaluation metrics that quantify the user shift between real and simulated annotators. In an initial user study involving four annotators, we assess existing robot users using our proposed metrics and find that robot users significantly deviate in performance and annotation behavior compared to real annotators. Based on these findings, we propose a more realistic robot user that reduces the user shift by incorporating human factors such as click variation and inter-annotator disagreement. We validate our robot user in a second user study, involving four other annotators, and show it consistently reduces the simulated-to-real user shift compared to traditional robot users. By employing our robot user, we can conduct more large-scale and cost-efficient evaluations of interactive segmentation models, while preserving the fidelity of real user studies. Our implementation is based on MONAI Label and will be made publicly available.
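
A hedged sketch of the underlying idea of a more human-like robot user: perturb clicks with spatial jitter and occasionally sample another annotator's target to mimic inter-annotator disagreement. The parameter values and function are illustrative assumptions, not the paper's calibrated human factors:

```python
import numpy as np

rng = np.random.default_rng(0)

def realistic_click(lesion_center, annotator_targets, jitter_px=3.0,
                    disagreement_rate=0.2):
    """Simulate one annotator click in a 2-D slice (illustrative only)."""
    if rng.random() < disagreement_rate:
        # Inter-annotator disagreement: follow a different annotator's target.
        target = annotator_targets[rng.integers(len(annotator_targets))]
    else:
        target = np.asarray(lesion_center, dtype=float)
    # Click variation: humans rarely hit the exact centroid.
    return target + rng.normal(scale=jitter_px, size=2)

print(realistic_click((120.0, 88.0), [np.array([118.0, 90.0]),
                                      np.array([123.0, 86.0])]))
```

An idealized robot user would instead always click the exact centroid, which is precisely why it overestimates model performance.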

Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Authors: Nassim Sehad, Lina Bariah, Wassim Hamidouche, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah

Link: http://arxiv.org/abs/2404.01713v1

Abstract: Over the past two decades, the Internet-of-Things (IoT) has been a transformative concept, and as we approach 2030, a new paradigm known as the Internet of Senses (IoS) is emerging. Unlike conventional Virtual Reality (VR), IoS seeks to provide multi-sensory experiences, acknowledging that in our physical reality, our perception extends far beyond just sight and sound; it encompasses a range of senses. This article explores existing technologies driving immersive multi-sensory media, delving into their capabilities and potential applications. This exploration includes a comparative analysis between conventional immersive media streaming and a proposed use case that leverages semantic communication empowered by generative Artificial Intelligence (AI). The focal point of this analysis is the substantial reduction in bandwidth consumption by 99.93% in the proposed scheme. Through this comparison, we aim to underscore the practical applications of generative AI for immersive media while addressing the challenges and outlining future trajectories.

Tell and show: Combining multiple modalities to communicate manipulation tasks to a robot

Authors: Petr Vanc, Radoslav Skoviera, Karla Stepanova

Link: http://arxiv.org/abs/2404.01702v1

Abstract: As human-robot collaboration is becoming more widespread, there is a need for a more natural way of communicating with the robot. This includes combining data from several modalities together with the context of the situation and background knowledge. Current approaches to communication typically rely only on a single modality or are often very rigid and not robust to missing, misaligned, or noisy data. In this paper, we propose a novel method that takes inspiration from sensor fusion approaches to combine uncertain information from multiple modalities and enhance it with situational awareness (e.g., considering object properties or the scene setup). We first evaluate the proposed solution on simulated bimodal datasets (gestures and language) and show through several ablation experiments the importance of various components of the system and its robustness to noisy, missing, or misaligned observations. Then we implement and evaluate the model on a real setup. In human-robot interaction, we must also consider whether the selected action is probable enough to be executed or whether we should instead query humans for clarification. For these purposes, we enhance our model with adaptive entropy-based thresholding that detects the appropriate thresholds for different types of interaction, showing performance similar to that of fine-tuned fixed thresholds.
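
The decision rule can be illustrated with a small sketch: fuse per-modality action probabilities, then execute only when the entropy of the fused distribution is below a threshold, otherwise query the human. The product fusion and the threshold value here are simplifying assumptions, not the paper's method:

```python
import numpy as np

def fuse_and_decide(p_gesture, p_language, max_entropy=0.8):
    fused = np.asarray(p_gesture) * np.asarray(p_language)
    fused /= fused.sum()                    # renormalize after fusion
    entropy = -np.sum(fused * np.log(fused + 1e-12))
    if entropy > max_entropy:
        return None, fused                  # too uncertain: ask for clarification
    return int(np.argmax(fused)), fused

action, dist = fuse_and_decide([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
print(action, dist)   # modalities agree: confidently executes action 0
action, dist = fuse_and_decide([0.4, 0.35, 0.25], [0.3, 0.4, 0.3])
print(action, dist)   # near-uniform: returns None, i.e., query the human
```

The paper's adaptive variant would tune `max_entropy` per interaction type rather than fixing it.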

NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps

Authors: Kristina Gligoric, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky

Link: http://arxiv.org/abs/2404.01651v1

Abstract: The 'use' of words to convey a speaker's intent is traditionally distinguished from the 'mention' of words for quoting what someone said or for pointing out properties of a word. Here we show that computationally modeling this use-mention distinction is crucial for dealing with counterspeech online. Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself (e.g., calling a vaccine dangerous is not the same as expressing disapproval of someone for calling vaccines dangerous). We show that even recent language models fail at distinguishing use from mention, and that this failure propagates to two key downstream tasks: misinformation and hate speech detection, resulting in censorship of counterspeech. We introduce prompting mitigations that teach the use-mention distinction, and show they reduce these errors. Our work highlights the importance of the use-mention distinction for NLP and CSS and offers ways to address it.
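
A sketch of what a prompting mitigation of this kind might look like; the wording is illustrative and does not reproduce the paper's exact prompts:

```python
# Illustrative prompt template that teaches the use-mention distinction
# before asking for a hate-speech judgment.
USE_MENTION_PROMPT = """\
Words can be USED to express a speaker's own view, or MENTIONED to quote or
refute what someone else said. Counterspeech often mentions harmful language
without endorsing it.

Text: "{text}"

First decide whether any harmful language is used or merely mentioned.
Then answer: is this text itself hateful? Answer 'yes' or 'no'."""

def build_prompt(text: str) -> str:
    return USE_MENTION_PROMPT.format(text=text)

print(build_prompt("It's wrong to call vaccines dangerous like that post did."))
```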

InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data Analysis

Authors: Luoxuan Weng, Xingbo Wang, Junyu Lu, Yingchaojie Feng, Yihan Liu, Wei Chen

Link: http://arxiv.org/abs/2404.01644v1

Abstract: The proliferation of large language models (LLMs) has revolutionized the capabilities of natural language interfaces (NLIs) for data analysis. LLMs can perform multi-step and complex reasoning to generate data insights based on users' analytic intents. However, these insights are often entangled with an abundance of context in analytic conversations, such as code, visualizations, and natural language explanations. This hinders efficient identification, verification, and interpretation of insights within the current chat-based interfaces of LLMs. In this paper, we first conduct a formative study with eight experienced data analysts to understand their general workflow and pain points during LLM-powered data analysis. Then, we propose an LLM-based multi-agent framework to automatically extract, associate, and organize insights along with the analysis process. Based on this, we introduce InsightLens, an interactive system that visualizes the intricate conversational contexts from multiple aspects to facilitate insight discovery and exploration. A user study with twelve data analysts demonstrates the effectiveness of InsightLens, showing that it significantly reduces users' manual and cognitive effort without disrupting their conversational data analysis workflow, leading to a more efficient analysis experience.

Gen4DS: Workshop on Data Storytelling in an Era of Generative AI

Authors: Xingyu Lan, Leni Yang, Zezhong Wang, Danqing Shi, Sheelagh Carpendale

Link: http://arxiv.org/abs/2404.01622v1

Abstract: Storytelling is an ancient and precious human ability that has been rejuvenated in the digital age. Over the last decade, there has been a notable surge in the recognition and application of data storytelling, both in academia and industry. Recently, the rapid development of generative AI has brought new opportunities and challenges to this field, sparking numerous new questions. These questions may not necessarily be quickly transformed into papers, but we believe it is necessary to promptly discuss them to help the community better clarify important issues and research agendas for the future. We thus invite you to join our workshop (Gen4DS) to discuss questions such as: How can generative AI facilitate the creation of data stories? How might generative AI alter the workflow of data storytellers? What are the pitfalls and risks of incorporating AI in storytelling? We have designed both paper presentations and interactive activities (including hands-on creation, group discussion pods, and debates on controversial issues) for the workshop. We hope that participants will learn about the latest advances and pioneering work in data storytelling, engage in critical conversations with each other, and have an enjoyable, unforgettable, and meaningful experience at the event.

Collaborative human-AI trust (CHAI-T): A process framework for active management of trust in human-AI collaboration

Authors: Melanie J. McGrath, Andreas Duenser, Justine Lacey, Cecile Paris

Link: http://arxiv.org/abs/2404.01615v1

Abstract: Collaborative human-AI (HAI) teaming combines the unique skills and capabilities of humans and machines in sustained teaming interactions leveraging the strengths of each. In tasks involving regular exposure to novelty and uncertainty, collaboration between adaptive, creative humans and powerful, precise artificial intelligence (AI) promises new solutions and efficiencies. User trust is essential to creating and maintaining these collaborative relationships. Established models of trust in traditional forms of AI typically recognize the contribution of three primary categories of trust antecedents: characteristics of the human user, characteristics of the technology, and environmental factors. The emergence of HAI teams, however, requires an understanding of human trust that accounts for the specificity of task contexts and goals, integrates processes of interaction, and captures how trust evolves in a teaming environment over time. Drawing on both the psychological and computer science literature, the process framework of trust in collaborative HAI teams (CHAI-T) presented in this paper adopts the tripartite structure of antecedents established by earlier models, while incorporating team processes and performance phases to capture the dynamism inherent to trust in teaming contexts. These features enable active management of trust in collaborative AI systems, with practical implications for the design and deployment of collaborative HAI teams.

Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game

Authors: Silin Du, Xiaowei Zhang

Link: http://arxiv.org/abs/2404.01602v1

Abstract: Large language models (LLMs) have exhibited memorable strategic behaviors in social deductive games. However, the significance of opinion leadership exhibited by LLM-based agents has been overlooked, which is crucial for practical applications in multi-agent and human-AI interaction settings. Opinion leaders are individuals who have a noticeable impact on the beliefs and behaviors of others within a social group. In this work, we employ the Werewolf game as a simulation platform to assess the opinion leadership of LLMs. The game features the role of the Sheriff, tasked with summarizing arguments and recommending decision options, and therefore serves as a credible proxy for an opinion leader. We develop a framework integrating the Sheriff role and devise two novel metrics for evaluation based on the critical characteristics of opinion leaders. The first metric measures the reliability of the opinion leader, and the second assesses the influence of the opinion leader on other players' decisions. We conduct extensive experiments to evaluate LLMs of different scales. In addition, we collect a Werewolf question-answering dataset (WWQA) to assess and enhance LLMs' grasp of the game rules, and we also incorporate human participants for further analysis. The results suggest that the Werewolf game is a suitable test bed to evaluate the opinion leadership of LLMs, and that few LLMs possess the capacity for opinion leadership.

Leveraging Digital Perceptual Technologies for Remote Perception and Analysis of Human Biomechanical Processes: A Contactless Approach for Workload and Joint Force Assessment

Authors: Jesudara Omidokun, Darlington Egeonu, Bochen Jia, Liang Yang

Link: http://arxiv.org/abs/2404.01576v1

Abstract: This study presents an innovative computer vision framework designed to analyze human movements in industrial settings, aiming to enhance biomechanical analysis by integrating seamlessly with existing software. Through a combination of advanced imaging and modeling techniques, the framework allows for comprehensive scrutiny of human motion, providing valuable insights into kinematic patterns and kinetic data. Utilizing Convolutional Neural Networks (CNNs), Direct Linear Transform (DLT), and Long Short-Term Memory (LSTM) networks, the methodology accurately detects key body points, reconstructs 3D landmarks, and generates detailed 3D body meshes. Extensive evaluations across various movements validate the framework's effectiveness, demonstrating comparable results to traditional marker-based models with minor differences in joint angle estimations and precise estimations of weight and height. Statistical analyses consistently support the framework's reliability, with joint angle estimations showing less than a 5-degree difference for hip flexion, elbow flexion, and knee angle methods. Additionally, weight and height estimations exhibit average errors of less than 6% and 2%, respectively, when compared to ground-truth values from 10 subjects. The integration of the Biomech-57 landmark skeleton template further enhances the robustness and reinforces the framework's credibility. This framework shows significant promise for meticulous biomechanical analysis in industrial contexts, eliminating the need for cumbersome markers and extending its utility to diverse research domains, including the study of specific exoskeleton devices' impact on facilitating the prompt return of injured workers to their tasks.
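
Of the components above, the DLT step is the most self-contained: it triangulates a 3D landmark from its 2D detections in calibrated cameras. A minimal sketch with toy projection matrices (illustrative values, not the study's camera setup):

```python
import numpy as np

def triangulate_dlt(points_2d, proj_mats):
    """points_2d: list of (x, y); proj_mats: list of 3x4 camera matrices."""
    rows = []
    for (x, y), P in zip(points_2d, proj_mats):
        rows.append(x * P[2] - P[0])   # each view contributes two linear
        rows.append(y * P[2] - P[1])   # constraints on the homogeneous point
    A = np.vstack(rows)
    _, _, Vt = np.linalg.svd(A)        # least-squares solution = last row of Vt
    X = Vt[-1]
    return X[:3] / X[3]                # de-homogenize

# Two toy cameras: identity, and a camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]
print(triangulate_dlt([x1, x2], [P1, P2]))   # recovers ~[0.5, 0.2, 4.0]
```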

2024-04-01

PlayFutures: Imagining Civic Futures with AI and Puppets

Authors: Supratim Pait, Sumita Sharma, Ashley Frith, Michael Nitsche, Noura Howell

Link: http://arxiv.org/abs/2404.01527v1

Abstract: Children are the builders of the future and crucial to how the technologies around us develop. They are not voters but are participants in how the public spaces in a city are used. Through a workshop designed around children aged 9-12, we investigate whether novel technologies like artificial intelligence can be integrated into existing ways of play and performance to 1) re-imagine the future of civic spaces, 2) reflect on these novel technologies in the process, and 3) build ways of civic engagement through play. We do this using a blend of AI image generation and puppet making to build future scenarios, perform debates and discussions around these futures, and reflect on AI, its role, and its potential in the process. We present our findings on how AI helped envision these futures and aid performances, and we report some initial reflections from children about the technology.

DeLVE into Earth's Past: A Visualization-Based Exhibit Deployed Across Multiple Museum Contexts

Authors: Mara Solen, Nigar Sultana, Laura Lukes, Tamara Munzner

Link: http://arxiv.org/abs/2404.01488v1

Abstract: While previous work has found success in deploying visualizations as museum exhibits, differences in visitor behaviour across varying museum contexts are understudied. We present an interactive Deep-time Literacy Visualization Exhibit (DeLVE) to help museum visitors understand deep time (lengths of extremely long geological processes) by improving proportional reasoning skills through comparison of different time periods. DeLVE uses a new visualization idiom, Connected Multi-Tier Ranges, to visualize curated datasets of past events across multiple scales of time, relating extreme scales with concrete scales that have more familiar magnitudes and units. Museum staff at three separate museums approved the deployment of DeLVE as a digital kiosk, and devoted time to curating a unique dataset in each of them. We collect data from two sources, an observational study and system trace logs, yielding evidence of successfully meeting our requirements. We discuss the importance of context: similar museum exhibits in different contexts were received very differently by visitors. We additionally discuss differences in our process from standard design study methodology which is focused on design studies for data analysis purposes, rather than for presentation. Supplemental materials are available at: https://osf.io/z53dq/?view_only=4df33aad207144aca149982412125541

A Design Space for Visualization with Large Scale-Item Ratios

Authors: Mara Solen, Tamara Munzner

Link: http://arxiv.org/abs/2404.01485v1

Abstract: The scale-item ratio is the relationship between the largest scale and the smallest item in a visualization. Designing visualizations when this ratio is large can be challenging, and designers have developed many approaches to overcome this challenge. We present a design space for visualization with large scale-item ratios. The design space includes three dimensions, with eight total subdimensions. We demonstrate its descriptive power by using it to code approaches from a corpus we compiled of 54 examples, created by a mix of academics and practitioners. We then partition these examples into five strategies, which are shared approaches with respect to design space dimension choices. We demonstrate generative power by analyzing missed opportunities within the corpus of examples, identified through analysis of the design space, where we note how certain examples could have benefited from different choices. Supplemental materials: https://osf.io/wbrdm/?view_only=04389a2101a04e71a2c208a93bf2f7f2

Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs

Authors: Pengda Wang, Zilin Xiao, Hanjie Chen, Frederick L. Oswald

Link: http://arxiv.org/abs/2404.01461v1

Abstract: Although large language models (LLMs) have demonstrated remarkable proficiency in understanding text and generating human-like text, they may exhibit biases acquired from training data in doing so. Specifically, LLMs may be susceptible to a common cognitive trap in human decision-making called the representativeness heuristic. This is a concept in psychology that refers to judging the likelihood of an event based on how closely it resembles a well-known prototype or typical example versus considering broader facts or statistical evidence. This work investigates the impact of the representativeness heuristic on LLM reasoning. We created REHEAT (Representativeness Heuristic AI Testing), a dataset containing a series of problems spanning six common types of representativeness heuristics. Experiments reveal that four LLMs applied to REHEAT all exhibited representativeness heuristic biases. We further identify that the model's reasoning steps are often incorrectly based on a stereotype rather than the problem's description. Interestingly, the performance improves when adding a hint in the prompt to remind the model of using its knowledge. This suggests the uniqueness of the representativeness heuristic compared to traditional biases. It can occur even when LLMs possess the correct knowledge while failing in a cognitive trap. This highlights the importance of future research focusing on the representativeness heuristic in model reasoning and decision-making and on developing solutions to address it.

A Preliminary Roadmap for LLMs as Assistants in Exploring, Analyzing, and Visualizing Knowledge Graphs

Authors: Harry Li, Gabriel Appleby, Ashley Suh

Link: http://arxiv.org/abs/2404.01425v1

Abstract: We present a mixed-methods study to explore how large language models (LLMs) can assist users in the visual exploration and analysis of knowledge graphs (KGs). We surveyed and interviewed 20 professionals from industry, government laboratories, and academia who regularly work with KGs and LLMs, either collaboratively or concurrently. Our findings show that participants overwhelmingly want an LLM to facilitate data retrieval from KGs through joint query construction, to identify interesting relationships in the KG through multi-turn conversation, and to create on-demand visualizations from the KG that enhance their trust in the LLM's outputs. To interact with an LLM, participants strongly prefer a chat-based 'widget,' built on top of their regular analysis workflows, with the ability to guide the LLM using their interactions with a visualization. When viewing an LLM's outputs, participants similarly prefer a combination of annotated visuals (e.g., subgraphs or tables extracted from the KG) alongside summarizing text. However, participants also expressed concerns about an LLM's ability to maintain semantic intent when translating natural language questions into KG queries, the risk of an LLM 'hallucinating' false data from the KG, and the difficulties of engineering a 'perfect prompt.' From the analysis of our interviews, we contribute a preliminary roadmap for the design of LLM-driven knowledge graph exploration systems and outline future opportunities in this emergent design space.

Towards a potential paradigm shift in health data collection and analysis

Authors: David Josef Herzog, Nitsa Judith Herzog

Link: http://arxiv.org/abs/2404.01403v1

Abstract: Industrial Revolution 4.0 transforms healthcare systems. The first three technological revolutions changed the relationship between human and machine interaction due to the exponential growth of machine numbers. The fourth revolution put humans into a situation where heterogeneous data is produced with unmatched quantity and quality not only by traditional methods, enforced by digitization, but also by ubiquitous computing, machine-to-machine interactions and smart environment. The modern cyber-physical space underlines the role of the person in the expanding context of computerization and big data processing. In healthcare, where data collection and analysis particularly depend on human efforts, the disruptive nature of these developments is evident. Adaptation to this process requires deep scrutiny of the trends and recognition of future medical data technologies' evolution. Significant difficulties arise from discrepancies in requirements by healthcare, administrative and technology stakeholders. Black box and grey box decisions made in medical imaging and diagnostic Decision Support Software are often not transparent enough for the professional, social and medico-legal requirements. While Explainable AI proposes a partial solution for AI applications in medicine, the approach has to be wider and multiplex. LLM potential and limitations are also discussed. This paper lists the most significant issues in these topics and describes possible solutions.

Evaluating Privacy Perceptions, Experience, and Behavior of Software Development Teams

Authors: Maxwell Prybylo, Sara Haghighi, Sai Teja Peddinti, Sepideh Ghanavati

Link: http://arxiv.org/abs/2404.01283v1

Abstract: With the increase in the number of privacy regulations, small development teams are forced to make privacy decisions on their own. In this paper, we conduct a mixed-method survey study, including statistical and qualitative analysis, to evaluate the privacy perceptions, practices, and knowledge of members involved in various phases of the software development life cycle (SDLC). Our survey includes 362 participants from 23 countries, encompassing roles such as product managers, developers, and testers. Our results show diverse definitions of privacy across SDLC roles, emphasizing the need for a holistic privacy approach throughout the SDLC. We find that software teams, regardless of their region, are less familiar with privacy concepts (such as anonymization), relying on self-teaching and forums. Most participants are more familiar with GDPR and HIPAA than other regulations, with multi-jurisdictional compliance being their primary concern. Our results advocate the need for role-dependent solutions to address the privacy challenges, and we highlight research directions and educational takeaways to help improve privacy-aware software development.

Information Plane Analysis Visualization in Deep Learning via Transfer Entropy

Authors: Adrian Moldovan, Angel Cataron, Razvan Andonie

Link: http://arxiv.org/abs/2404.01364v1

Abstract: In a feedforward network, Transfer Entropy (TE) can be used to measure the influence that one layer has on another by quantifying the information transfer between them during training. According to the Information Bottleneck principle, a neural model's internal representation should compress the input data as much as possible while still retaining sufficient information about the output. Information Plane analysis is a visualization technique used to understand the trade-off between compression and information preservation in the context of the Information Bottleneck method by plotting the amount of information in the input data against the compressed representation. The claim that there is a causal link between information-theoretic compression and generalization, measured by mutual information, is plausible, but results from different studies are conflicting. In contrast to mutual information, TE can capture temporal relationships between variables. To explore such links, in our novel approach we use TE to quantify information transfer between neural layers and perform Information Plane analysis. We obtained encouraging experimental results, opening the possibility for further investigations.
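
For intuition, transfer entropy TE(X→Y) measures how much the past of X reduces uncertainty about the next value of Y beyond what Y's own past already provides. A minimal plug-in estimator on discretized series (the binning and single-step lag are simplifying assumptions; the paper computes TE between neural layers during training, not on toy signals):

```python
import numpy as np

def transfer_entropy(x, y, bins=8):
    """Plug-in estimate of TE(X -> Y) in bits, with lag 1 and equal-width bins."""
    x = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    y = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    y_next, y_now, x_now = y[1:], y[:-1], x[:-1]
    te = 0.0
    for yn in np.unique(y_next):
        for yc in np.unique(y_now):
            for xc in np.unique(x_now):
                mask = (y_next == yn) & (y_now == yc) & (x_now == xc)
                p_joint = mask.mean()
                if p_joint == 0:
                    continue
                # p(y_next | y_now, x_now) vs. p(y_next | y_now)
                p_full = mask.sum() / ((y_now == yc) & (x_now == xc)).sum()
                p_base = ((y_next == yn) & (y_now == yc)).sum() / (y_now == yc).sum()
                te += p_joint * np.log2(p_full / p_base)
    return te

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=2000)  # y is driven by past x
print(transfer_entropy(x, y))   # clearly positive
print(transfer_entropy(y, x))   # near zero (up to estimation bias)
```

Note the asymmetry in the two printed values: unlike mutual information, TE is directional, which is what makes it suitable for tracing influence between layers over training time.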

Image Reconstruction from Electroencephalography Using Latent Diffusion

Authors: Teng Fei, Virginia de Sa

Link: http://arxiv.org/abs/2404.01250v1

Abstract: In this work, we have adopted the diffusion-based image reconstruction pipeline previously used for fMRI image reconstruction and applied it to Electroencephalography (EEG). The EEG encoding method is very simple, and forms a baseline from which more sophisticated EEG encoding methods can be compared. We have also evaluated the fidelity of the generated image using the same metrics used in the previous functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) works. Our results show that while reconstruction from EEG recorded during rapid image presentation is not as good as reconstruction from fMRI with more slowly presented images, it holds a surprising amount of information that could be applied in specific use cases. Also, EEG-based image reconstruction works better in some categories, such as land animals and food, than in others, shedding new light on previous findings of EEG's sensitivity to those categories and revealing potential for these methods to further our understanding of EEG responses to human visual coding. Further investigation should use longer-duration image presentations to elucidate the later components that might be salient to the different image categories.

AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding

Authors: Safwat Ali Khan, Wenyu Wang, Yiran Ren, Bin Zhu, Jiangfan Shi, Alyssa McGowan, Wing Lam, Kevin Moran

Link: http://arxiv.org/abs/2404.01240v1

Abstract: Nearly a decade of research in software engineering has focused on automating mobile app testing to help engineers in overcoming the unique challenges associated with the software platform. Much of this work has come in the form of Automated Input Generation tools (AIG tools) that dynamically explore app screens. However, such tools have repeatedly been demonstrated to achieve lower-than-expected code coverage - particularly on sophisticated proprietary apps. Prior work has illustrated that a primary cause of these coverage deficiencies is related to so-called tarpits, or complex screens that are difficult to navigate. In this paper, we take a critical step toward enabling AIG tools to effectively navigate tarpits during app exploration through a new form of automated semantic screen understanding. We introduce AURORA, a technique that learns from the visual and textual patterns that exist in mobile app UIs to automatically detect common screen designs and navigate them accordingly. The key idea of AURORA is that there are a finite number of mobile app screen designs, albeit with subtle variations, such that the general patterns of different categories of UI designs can be learned. As such, AURORA employs a multi-modal, neural screen classifier that is able to recognize the most common types of UI screen designs. After recognizing a given screen, it then applies a set of flexible and generalizable heuristics to properly navigate the screen. We evaluated AURORA both on a set of 12 apps with known tarpits from prior work, and on a new set of five of the most popular apps from the Google Play store. Our results indicate that AURORA is able to effectively navigate tarpit screens, outperforming prior approaches that avoid tarpits by 19.6% in terms of method coverage. The improvements can be attributed to AURORA's UI design classification and heuristic navigation techniques.

LLM Attributor: Interactive Visual Attribution for LLM Generation

Authors: Seongmin Lee, Zijie J. Wang, Aishwarya Chakravarthy, Alec Helbling, ShengYun Peng, Mansi Phute, Duen Horng Chau, Minsuk Kahng

Link: http://arxiv.org/abs/2404.01361v1

Abstract: While large language models (LLMs) have shown remarkable capability to generate convincing text across diverse domains, concerns around their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library offers a new way to quickly attribute an LLM's text generation to training data points to inspect model behaviors, enhance its trustworthiness, and compare model-generated text with user-provided text. We describe the visual and interactive design of our tool and highlight usage scenarios for LLaMA2 models fine-tuned with two different datasets: online articles about recent disasters and finance-related question-answer pairs. Thanks to LLM Attributor's broad support for computational notebooks, users can easily integrate it into their workflow to interactively visualize attributions of their models. For easier access and extensibility, we open-source LLM Attributor at https://github.com/poloclub/LLM-Attribution. The video demo is available at https://youtu.be/mIG2MDQKQxM.

Chat Modeling: Natural Language-based Procedural Modeling of Biological Structures without Training

Authors: Donggang Jia, Yunhai Wang, Ivan Viola

Link: http://arxiv.org/abs/2404.01063v1

Abstract: 3D modeling of biological structures is an inherently complex process, necessitating both biological and geometric understanding. Additionally, the complexity of user interfaces of 3D modeling tools and the associated steep learning curve further exacerbate the difficulty of authoring a 3D model. In this paper, we introduce a novel framework to address the challenge of using 3D modeling software by converting users' textual inputs into modeling actions within an interactive procedural modeling system. The framework incorporates a code generator of a novel code format and a corresponding code interpreter. The major technical innovation includes the user-refinement mechanism that captures the degree of user dissatisfaction with the modeling outcome, offers an interactive revision, and leverages this feedback for future improved 3D modeling. This entire framework is powered by large language models and eliminates the need for a traditional training process. We develop a prototype tool named Chat Modeling, offering both automatic and step-by-step 3D modeling approaches. Our evaluation of the framework with structural biologists highlights the potential of our approach being utilized in their scientific workflows. All supplemental materials are available at https://osf.io/x4qb7/.

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Authors: Haofeng Liu, Chenshu Xu, Yifei Yang, Lihua Zeng, Shengfeng He

Link: http://arxiv.org/abs/2404.01050v1

Abstract: Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: firstly, the bottleneck features of U-Net inherently possess semantically rich features ideal for interactive editing; secondly, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments reveal that DragNoise achieves superior control and semantic retention, reducing the optimization time by over 50% compared to DragDiffusion. Our codes are available at https://github.com/haofengl/DragNoise.

How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey

Authors: Zhonghao Shi, Ellen Landrum, Amy O' Connell, Mina Kian, Leticia Pinto-Alva, Kaleen Shrestha, Xiaoyuan Zhu, Maja J Matarić

Link: http://arxiv.org/abs/2404.00938v1

Abstract: Socially assistive robots (SARs) have shown great success in providing personalized cognitive-affective support for user populations with special needs such as older adults, children with autism spectrum disorder (ASD), and individuals with mental health challenges. The large body of work on SAR demonstrates its potential to provide at-home support that complements clinic-based interventions delivered by mental health professionals, making these interventions more effective and accessible. However, there are still several major technical challenges that hinder SAR-mediated interactions and interventions from reaching human-level social intelligence and efficacy. With the recent advances in large language models (LLMs), there is an increased potential for novel applications within the field of SAR that can significantly expand the current capabilities of SARs. However, incorporating LLMs introduces new risks and ethical concerns that have not yet been encountered, and must be carefully addressed to safely deploy these more advanced systems. In this work, we aim to conduct a brief survey on the use of LLMs in SAR technologies, and discuss the potentials and risks of applying LLMs to the following three major technical challenges of SAR: 1) natural language dialog; 2) multimodal understanding; 3) LLMs as robot policies.

2024-03-31

Designing Human-AI Systems: Anthropomorphism and Framing Bias on Human-AI Collaboration

Authors: Samuel Aleksander Sánchez Olszewski

Link: http://arxiv.org/abs/2404.00634v1

Abstract: AI is redefining how humans interact with technology, leading to a synergetic collaboration between the two. Nevertheless, the effects of human cognition on this collaboration remain unclear. This study investigates the implications of two cognitive biases, anthropomorphism and framing effect, on human-AI collaboration within a hiring setting. Subjects were asked to select job candidates with the help of an AI-powered recommendation tool. The tool was manipulated to have either human-like or robot-like characteristics and presented its recommendations in either positive or negative frames. The results revealed that the framing of AI's recommendations had no significant influence on subjects' decisions. In contrast, anthropomorphism significantly affected subjects' agreement with AI recommendations. Contrary to expectations, subjects were less likely to agree with the AI if it had human-like characteristics. These findings demonstrate that cognitive biases can impact human-AI collaboration and highlight the need for tailored approaches to AI product design, rather than a single, universal solution.

"My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents

Authors: Yuki Hou, Haruki Tamoto, Homei Miyashita

Link: http://arxiv.org/abs/2404.00573v1

Abstract: In this study, we propose a novel human-like memory architecture designed for enhancing the cognitive abilities of large language model based dialogue agents. Our proposed architecture enables agents to autonomously recall memories necessary for response generation, effectively addressing a limitation in the temporal cognition of LLMs. We adopt the human memory cue recall as a trigger for accurate and efficient memory recall. Moreover, we developed a mathematical model that dynamically quantifies memory consolidation, considering factors such as contextual relevance, elapsed time, and recall frequency. The agent stores memories retrieved from the user's interaction history in a database that encapsulates each memory's content and temporal context. Thus, this strategic storage allows agents to recall specific memories and understand their significance to the user in a temporal context, similar to how humans recognize and recall past experiences.
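
A hedged sketch of the kind of consolidation score the abstract describes, combining contextual relevance, elapsed time, and recall frequency into one retrieval strength; the functional form, constants, and function name here are illustrative assumptions, not the paper's exact model:

```python
import math
import time

def memory_strength(relevance: float, stored_at: float, recall_count: int,
                    half_life_hours: float = 24.0) -> float:
    """relevance in [0, 1]; stored_at is a UNIX timestamp."""
    hours_elapsed = (time.time() - stored_at) / 3600.0
    # Recency: exponential forgetting with a configurable half-life.
    recency = math.exp(-math.log(2) * hours_elapsed / half_life_hours)
    # Consolidation: repeated recall strengthens, with diminishing returns.
    consolidation = math.log1p(recall_count)
    return relevance * recency * (1.0 + consolidation)

# A highly relevant memory stored 12 hours ago and recalled twice:
print(memory_strength(relevance=0.9, stored_at=time.time() - 12 * 3600,
                      recall_count=2))
```

At response time, an agent would rank stored memories by this strength and inject the top-scoring ones into the prompt as retrieved context.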

The Emotional Impact of Game Duration: A Framework for Understanding Player Emotions in Extended Gameplay Sessions

Authors: Anoop Kumar, Suresh Dodda, Navin Kamuni, Venkata Sai Mahesh Vuppalapati

Link: http://arxiv.org/abs/2404.00526v1

Abstract: Video games have played a crucial role in entertainment since their development in the 1970s, becoming even more prominent during the lockdown period when people were looking for ways to entertain themselves. However, at that time, players were unaware of the significant impact that playtime could have on their feelings. This has made it challenging for designers and developers to create new games, since they have to control the emotional impact that these games will have on players. Thus, the purpose of this study is to examine how a player's emotions are affected by game duration. To achieve this goal, a framework for emotion detection is created. According to the experiment's results, the volunteers' overall emotional expression increased as gameplay extended from 20 to 60 minutes. The experiment found that extended gameplay sessions significantly affected players' emotions in comparison to shorter ones. Based on these results, game producers are encouraged to consider creating shorter, entertaining games to lessen the potential emotional impact that playing computer and video games may have in the future.

Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation

Authors: Rohan Chaudhury, Mihir Godbole, Aakash Garg, Jinsil Hwaryoung Seo

Link: http://arxiv.org/abs/2404.01339v1

Abstract: Contemporary conversational systems often present a significant limitation: their responses lack the emotional depth and disfluencies characteristic of human interactions. This absence becomes particularly noticeable when users seek more personalized and empathetic interactions. Consequently, this makes them seem mechanical and less relatable to human users. Recognizing this gap, we embarked on a journey to humanize machine communication, to ensure AI systems not only comprehend but also resonate. To address this shortcoming, we have designed an innovative speech synthesis pipeline. Within this framework, a cutting-edge language model introduces both human-like emotion and disfluencies in a zero-shot setting. These intricacies are seamlessly integrated into the generated text by the language model during text generation, allowing the system to mirror human speech patterns better, promoting more intuitive and natural user interactions. These generated elements are then adeptly transformed into corresponding speech patterns and emotive sounds using a rule-based approach during the text-to-speech phase. Based on our experiments, our novel system produces synthesized speech that's almost indistinguishable from genuine human communication, making each interaction feel more personal and authentic.

2024-03-30

Contextual AI Journaling: Integrating LLM and Time Series Behavioral Sensing Technology to Promote Self-Reflection and Well-being using the MindScape App

Authors: Subigya Nepal, Arvind Pillai, William Campbell, Talie Massachi, Eunsol Soul Choi, Orson Xu, Joanna Kuc, Jeremy Huckins, Jason Holden, Colin Depp, Nicholas Jacobson, Mary Czerwinski, Eric Granholm, Andrew T. Campbell

Link: http://arxiv.org/abs/2404.00487v1

Abstract: MindScape aims to study the benefits of integrating time series behavioral patterns (e.g., conversational engagement, sleep, location) with Large Language Models (LLMs) to create a new form of contextual AI journaling, promoting self-reflection and well-being. We argue that integrating behavioral sensing in LLMs will likely lead to a new frontier in AI. In this Late-Breaking Work paper, we discuss the MindScape contextual journal App design that uses LLMs and behavioral sensing to generate contextual and personalized journaling prompts crafted to encourage self-reflection and emotional development. We also discuss the MindScape study of college students based on a preliminary user study and our upcoming study to assess the effectiveness of contextual AI journaling in promoting better well-being on college campuses. MindScape represents a new application class that embeds behavioral intelligence in AI.

Interactive Multi-Robot Flocking with Gesture Responsiveness and Musical Accompaniment

Authors: Catie Cuan, Kyle Jeffrey, Kim Kleiven, Adrian Li, Emre Fisher, Matt Harrison, Benjie Holson, Allison Okamura, Matt Bennice

Link: http://arxiv.org/abs/2404.00442v1

Abstract: For decades, robotics researchers have pursued various tasks for multi-robot systems, from cooperative manipulation to search and rescue. These tasks are multi-robot extensions of classical robotic tasks and often optimized on dimensions such as speed or efficiency. As robots transition from commercial and research settings into everyday environments, social task aims such as engagement or entertainment become increasingly relevant. This work presents a compelling multi-robot task, in which the main aim is to enthrall and engage a human participant. In this task, the goal is for a human to be drawn to move alongside and participate in a dynamic, expressive robot flock. Towards this aim, the research team created algorithms for robot movements and engaging interaction modes such as gestures and sound. The contributions are as follows: (1) a novel group navigation algorithm involving human and robot agents, (2) a gesture responsive algorithm for real-time, human-robot flocking interaction, (3) a weight mode characterization system for modifying flocking behavior, and (4) a method of encoding a choreographer's preferences inside a dynamic, adaptive, learned system. An experiment was performed to understand individual human behavior while interacting with the flock under three conditions: weight modes selected by a human choreographer, a learned model, or a subset list. Results from the experiment showed that the perception of the experience was not influenced by the weight mode selection. This work elucidates how differing task aims such as engagement manifest in multi-robot system design and execution, and broadens the domain of multi-robot tasks.
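
The flocking behavior and "weight modes" of contributions (1) and (3) build on classic boids-style rules. A compact sketch of one such weighted update step; the weights, the human-attraction term, and all parameter values are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def flock_step(pos, vel, human_pos, w_sep=1.0, w_ali=0.5, w_coh=0.5,
               w_human=0.3, dt=0.1, sep_radius=0.5):
    """One update of a boids-style flock drawn toward a human participant."""
    new_vel = vel.copy()
    for i in range(len(pos)):
        diff = pos[i] - pos                   # vectors from others to robot i
        dist = np.linalg.norm(diff, axis=1)
        near = (dist < sep_radius) & (dist > 0)
        sep = diff[near].sum(axis=0)          # separation: avoid crowding
        ali = vel.mean(axis=0) - vel[i]       # alignment: match mean heading
        coh = pos.mean(axis=0) - pos[i]       # cohesion: steer toward center
        hum = human_pos - pos[i]              # engagement: drift toward human
        new_vel[i] += dt * (w_sep * sep + w_ali * ali + w_coh * coh + w_human * hum)
    return pos + dt * new_vel, new_vel

rng = np.random.default_rng(0)
pos = rng.uniform(-1, 1, (5, 2))
vel = rng.normal(0, 0.1, (5, 2))
pos, vel = flock_step(pos, vel, human_pos=np.array([2.0, 0.0]))
print(pos)
```

A "weight mode" in this framing is simply a named setting of (w_sep, w_ali, w_coh, w_human), which is the kind of parameterization a choreographer or learned model could select among.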

Visualizing Routes with AI-Discovered Street-View Patterns

Authors: Tsung Heng Wu, Md Amiruzzaman, Ye Zhao, Deepshikha Bhati, Jing Yang

Link: http://arxiv.org/abs/2404.00431v1

Abstract: Street-level visual appearances play an important role in studying social systems, such as understanding the built environment, driving routes, and associated social and economic factors. Yet this information has not been integrated into typical geographical visualization interfaces (e.g., map services) for planning driving routes. In this paper, we study this new visualization task with several new contributions. First, we experiment with a set of AI techniques and propose a solution of using semantic latent vectors for quantifying visual appearance features. Second, we calculate image similarities among a large set of street-view images and then discover spatial imagery patterns. Third, we integrate these discovered patterns into driving route planners with new visualization techniques. Finally, we present VivaRoutes, an interactive visualization prototype, to show how visualizations that leverage these discovered patterns can help users effectively and interactively explore multiple routes. Furthermore, we conducted a user study to assess the usefulness and utility of VivaRoutes.
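
The first two contributions rest on a standard operation: compare images by the cosine similarity of their latent vectors. A minimal sketch, with random embeddings standing in for real model outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))   # one latent vector per image

# L2-normalize so dot products are cosine similarities.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim = unit @ unit.T                          # (1000, 1000) similarity matrix

# For each image, find its 5 most visually similar street-view images;
# clustering these neighborhoods is one way to surface imagery patterns.
np.fill_diagonal(sim, -np.inf)               # exclude self-matches
top5 = np.argsort(-sim, axis=1)[:, :5]
print(top5[0])
```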

A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration

Authors: Jie Gao, Simret Araya Gebreegziabher, Kenny Tsu Wei Choo, Toby Jia-Jun Li, Simon Tangi Perrault, Thomas W. Malone

Link: http://arxiv.org/abs/2404.00405v1

Abstract: With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to precisely understand the dynamics of this process. Additionally, we have developed a taxonomy of four primary interaction modes: Mode 1: Standard Prompting, Mode 2: User Interface, Mode 3: Context-based, and Mode 4: Agent Facilitator. This taxonomy was further enriched using the "5W1H" guideline method, which involved a detailed examination of definitions, participant roles (Who), the phases in which interactions occur (When), human objectives and LLM abilities (What), and the mechanics of each interaction mode (How). We anticipate this taxonomy will contribute to the future design and evaluation of human-LLM interaction.

Designing a User-centric Framework for Information Quality Ranking of Large-scale Street View Images

Authors: Tahiya Chowdhury, Ilan Mandel, Jorge Ortiz, Wendy Ju

Link: http://arxiv.org/abs/2404.00392v1

Abstract: Street view imagery (SVI), largely captured via outfitted fleets or mounted dashcams in consumer vehicles, is a rapidly growing source of geospatial data used in urban sensing and development. These datasets are often collected opportunistically, are massive in size, and vary in quality, which limits the scope and extent of their use in urban planning. Thus far there has not been much work to identify the obstacles experienced and tools needed by the users of such datasets. This severely limits the opportunities of using emerging street view images in supporting novel research questions that can improve the quality of urban life. This work includes a formative interview study with 5 expert users of large-scale street view datasets from academia, urban planning, and related professions which identifies novel use cases, challenges, and opportunities to increase the utility of these datasets. Based on the user findings, we present a framework to evaluate the quality of information for street images across three attributes (spatial, temporal, and content) that stakeholders can utilize for estimating the value of a dataset, and to improve it over time for their respective use case. We then present a case study using novel street view images where we evaluate our framework and present practical use cases for users. We discuss the implications for designing future systems to support the collection and use of street view data to assist in sensing and planning the urban environment.

On Task and in Sync: Examining the Relationship between Gaze Synchrony and Self-Reported Attention During Video Lecture Learning

Authors: Babette Bühler, Efe Bozkir, Hannah Deininger, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci

Link: http://arxiv.org/abs/2404.00333v1

Abstract: Successful learning depends on learners' ability to sustain attention, which is particularly challenging in online education due to limited teacher interaction. A potential indicator of attention is gaze synchrony, which has demonstrated predictive power for learning achievement in video-based learning in controlled experiments that manipulate attention. This study (N=84) examines the relationship between gaze synchronization and learners' self-reported attention, using experience sampling, during realistic online video learning. Gaze synchrony was assessed through Kullback-Leibler divergence of gaze density maps and MultiMatch algorithm scanpath comparisons. Results indicated significantly higher gaze synchronization in attentive participants for both measures, and self-reported attention significantly predicted post-test scores. In contrast, synchrony measures did not correlate with learning outcomes. While supporting the hypothesis that attentive learners exhibit similar eye movements, the direct use of synchrony as an attention indicator poses challenges, requiring further research on the interplay of attention, gaze synchrony, and video content type.
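
A minimal sketch of the Kullback-Leibler measure named in the abstract, assuming gaze density maps are 2D fixation histograms over the video frame; bin counts and fixation data below are synthetic placeholders.

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(P || Q) between two gaze density maps of equal shape."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy gaze density maps: 2D histograms of fixation locations on a video frame.
rng = np.random.default_rng(1)
learner_a = np.histogram2d(rng.normal(0.5, 0.10, 200), rng.normal(0.5, 0.10, 200),
                           bins=16, range=[[0, 1], [0, 1]])[0]
learner_b = np.histogram2d(rng.normal(0.5, 0.15, 200), rng.normal(0.5, 0.15, 200),
                           bins=16, range=[[0, 1], [0, 1]])[0]
print(kl_divergence(learner_a, learner_b))  # lower = more synchronous gaze
```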

Enhancing Empathy in Virtual Reality: An Embodied Approach to Mindset Modulation

Authors: Seoyeon Bae, Yoon Kyung Lee, Jungcheol Lee, Jaeheon Kim, Haeseong Jeon, Seung-Hwan Lim, Byung-Cheol Kim, Sowon Hahn

Link: http://arxiv.org/abs/2404.00300v1

Abstract: A growth mindset has shown promising outcomes for increasing empathy ability. However, stimulating a growth mindset in VR-based empathy interventions is under-explored. In the present study, we implemented prosocial VR content, Our Neighbor Hero, focusing on embodying a virtual character to modulate players' mindsets. The virtual body served as a stepping stone, enabling players to identify with the character and cultivate a growth mindset as they followed mission instructions. We considered several implementation factors to help players position themselves within the VR experience, including positive feedback, content difficulty, background lighting, and multimodal feedback. We conducted an experiment to investigate the intervention's effectiveness in increasing empathy. Our findings revealed that the VR content and mindset training encouraged participants to improve their growth mindsets and empathic motives. This VR content was developed for college students to enhance their empathy and teamwork skills. It has the potential to improve collaboration in organizational and community environments.

Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World

Authors: Guande Wu, Chen Zhao, Claudio Silva, He He

Link: http://arxiv.org/abs/2404.00246v1

Abstract: Language agents that interact with the world on their own have great potential for automating digital tasks. While large language model (LLM) agents have made progress in understanding and executing tasks such as textual games and webpage control, many real-world tasks also require collaboration with humans or other LLMs in equal roles, which involves intent understanding, task coordination, and communication. To test LLMs' ability to collaborate, we design a blocks-world environment where two agents, each having unique goals and skills, build a target structure together. To complete the goals, they can act in the world and communicate in natural language. Under this environment, we design increasingly challenging settings to evaluate different collaboration perspectives, from independent to more complex, dependent tasks. We further adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and to identify and correct execution errors. Both human-machine and machine-machine experiments show that LLM agents have strong grounding capacities, and our approach significantly improves the evaluation metric.
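
The abstract does not give the authors' exact prompts, but a chain-of-thought prompt that models the partner's state and checks for execution errors might look like the following sketch; every field name and example value here is illustrative, not from the paper.

```python
# Hypothetical prompt template in the spirit of the paper's chain-of-thought
# prompting: reason about the partner's state and check for execution errors
# before choosing the next action or message.
COT_TEMPLATE = """You are agent {agent_name} in a collaborative blocks world.
Your goal: {own_goal}
Dialogue so far: {dialogue}
World state: {world_state}

Think step by step:
1. What is your partner likely trying to build, given their messages and moves?
2. Did your last action have the intended effect? If not, how do you correct it?
3. What should you do or say next to make progress on both goals?

Answer with your reasoning, then a single action or message."""

prompt = COT_TEMPLATE.format(
    agent_name="A",
    own_goal="build a red tower at (2, 3)",
    dialogue="B: I'll handle the blue bridge.",
    world_state="red block at (0, 0); blue block at (1, 1)",
)
print(prompt)
```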

2024-03-29

Tools and Tasks in Sensemaking: A Visual Accessibility Perspective

Authors: Yichun Zhao, Miguel A. Nacenta

Link: http://arxiv.org/abs/2404.00192v1

Abstract: Our previous interview study explored the needs and uses of diagrammatic information by the Blind and Low Vision (BLV) community, resulting in a framework called the Ladder of Diagram Access. The framework outlines five levels of information access when interacting with a diagram. In this paper, we extend this framework to encompass the global activity of sensemaking and discuss its (in)accessibility to the BLV demographic. We also discuss the integration of this framework into the sensemaking process and explore the current sensemaking practices and strategies employed by the BLV community, the challenges they face at different levels of the ladder, and potential solutions to enhance inclusivity towards a data-driven workforce.

No Risk, No Reward: Towards An Automated Measure of Psychological Safety from Online Communication

Authors: Sharon Ferguson, Georgia Van de Zande, Alison Olechowski

Link: http://arxiv.org/abs/2404.00171v1

Abstract: The data created from virtual communication platforms presents the opportunity to explore automated measures for monitoring team performance. In this work, we explore one important characteristic of successful teams - Psychological Safety - or the belief that a team is safe for interpersonal risk-taking. To move towards an automated measure of this phenomenon, we derive virtual communication characteristics and message keywords related to elements of Psychological Safety from the literature. Using a mixed methods approach, we investigate whether these characteristics are present in the Slack messages from two design teams - one high in Psychological Safety, and one low. We find that some usage characteristics, such as replies, reactions, and user mentions, might be promising metrics to indicate higher levels of Psychological Safety, while simple keyword searches may not be nuanced enough. We present the first step towards the automated detection of this important, yet complex, team characteristic.
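
As a rough illustration of the usage characteristics the abstract names (replies, reactions, user mentions), the sketch below computes per-message rates over a hypothetical message list; the schema is illustrative, not Slack's actual export format.

```python
import re

# Hypothetical messages with reply and reaction counts; field names are
# placeholders, not the Slack API's real schema.
messages = [
    {"text": "Great point @alice, what do you think?", "reply_count": 3, "reactions": 2},
    {"text": "I might be wrong, but should we try X?", "reply_count": 1, "reactions": 5},
]

mention_pattern = re.compile(r"@\w+")
totals = {
    "replies": sum(m["reply_count"] for m in messages),
    "reactions": sum(m["reactions"] for m in messages),
    "mentions": sum(len(mention_pattern.findall(m["text"])) for m in messages),
}
per_message = {k: v / len(messages) for k, v in totals.items()}
# Per the paper's findings, higher rates of these characteristics may indicate
# higher Psychological Safety, while keyword searches alone may be too coarse.
print(per_message)
```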

Circle Back Next Week: The Effect of Meeting-Free Weeks on Distributed Workers' Unstructured Time and Attention Negotiation

Authors: Sharon Ferguson, Michael Massimi

Link: http://arxiv.org/abs/2404.00161v1

Abstract: While distributed workers rely on scheduled meetings for coordination and collaboration, these meetings can also challenge their ability to focus. Protecting worker focus has been addressed from a technical perspective, but companies are now attempting organizational interventions, such as meeting-free weeks. Recognizing distributed collaboration as a sociotechnical challenge, we first present an interview study with distributed workers participating in meeting-free weeks at an enterprise software company. We identify three orientations workers exhibit during these weeks: Focus, Collaborative, and Time-Bound, each with varying levels and use of unstructured time. These different orientations result in challenges in attention negotiation, which may be suited for technical interventions. This motivated a follow-up study investigating attention negotiation and the compensating mechanisms workers developed during meeting-free weeks. Our framework identified tensions between attention-getting and attention-delegation strategies. We extend past work to show how workers adapt their virtual collaboration mechanisms in response to organizational interventions.

Give Text A Chance: Advocating for Equal Consideration for Language and Visualization

Authors: Chase Stokes, Marti A. Hearst

Link: http://arxiv.org/abs/2404.00131v1

Abstract: Visualization research tends to de-emphasize consideration of the textual context in which its images are placed. We argue that visualization research should consider textual representations as a primary alternative to visual options when assessing designs, and when assessing designs, equal attention should be given to the construction of the language as to the visualizations. We also call for a consideration of readability when integrating visualizations with written text. In highlighting these points, visualization research would be elevated in efficacy and demonstrate thorough accounting for viewers' needs and responses.

Enhancing Dimension-Reduced Scatter Plots with Class and Feature Centroids

Authors: Daniel B. Hier, Tayo Obafemi-Ajayi, Gayla R. Olbricht, Devin M. Burns, Sasha Petrenko, Donald C. Wunsch II

Link: http://arxiv.org/abs/2403.20246v1

Abstract: Dimension reduction is increasingly applied to high-dimensional biomedical data to improve its interpretability. When datasets are reduced to two dimensions, each observation is assigned x and y coordinates and is represented as a point on a scatter plot. A significant challenge lies in interpreting the meaning of the x and y axes due to the complexities inherent in dimension reduction. This study addresses this challenge by using the x and y coordinates derived from dimension reduction to calculate class and feature centroids, which can be overlaid onto the scatter plots. This method connects the low-dimensional space to the original high-dimensional space. We illustrate the utility of this approach with data derived from the phenotypes of three neurogenetic diseases and demonstrate how the addition of class and feature centroids increases the interpretability of scatter plots.
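
The core computation is simple: a class centroid is the mean (x, y) of a class's points in the reduced space, and a feature centroid can be taken as a feature-weighted mean of coordinates. A minimal sketch on synthetic data follows; the paper's exact feature weighting may differ.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for 2D coordinates produced by dimension reduction,
# with a class label and one feature per observation.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "disease": np.repeat(["A", "B", "C"], 30),
})
df["feature"] = (df["disease"] == "B").astype(float) + rng.normal(0, 0.1, 90)

# Class centroid: mean position of each class's points.
class_centroids = df.groupby("disease")[["x", "y"]].mean()

# Feature centroid: coordinates averaged with the feature's value as weight
# (one plausible definition; the paper's formula is not reproduced here).
w = df["feature"].clip(lower=0)
feature_centroid = df[["x", "y"]].mul(w, axis=0).sum() / w.sum()

print(class_centroids)
print(feature_centroid)  # overlay both on the scatter plot
```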

Entertainment chatbot for the digital inclusion of elderly people without abstraction capabilities

Authors: Silvia García-Méndez, Francisco de Arriba-Pérez, Francisco J. González-Castaño, José A. Regueiro-Janeiro, Felipe Gil-Castiñeira

Link: http://arxiv.org/abs/2404.01327v1

Abstract: Current language processing technologies allow the creation of conversational chatbot platforms. Even though artificial intelligence is still too immature to support satisfactory user experiences in many mass-market domains, conversational interfaces have found their way into ad hoc applications such as call centres and online shopping assistants. However, they have not so far been applied to the social inclusion of elderly people, who are particularly vulnerable to the digital divide. Many of them relieve their loneliness with traditional media such as TV and radio, which are known to create a feeling of companionship. In this paper we present the EBER chatbot, designed to reduce the digital gap for the elderly. EBER reads news in the background and adapts its responses to the user's mood. Its novelty lies in the concept of "intelligent radio": instead of simplifying a digital information system to make it accessible to the elderly, a traditional channel they find familiar -- background news -- is augmented with interactions via voice dialogues. We make this possible by combining Artificial Intelligence Markup Language, automatic Natural Language Generation, and Sentiment Analysis. The system allows users to access digital content of interest by combining words extracted from user answers to chatbot questions with keywords extracted from the news items. This approach permits defining metrics of users' abstraction capabilities based on a spatial representation of the word space. To prove the suitability of the proposed solution, we present results of real experiments conducted with elderly people, which provided valuable insights. Our approach was considered satisfactory during the tests and improved the information search capabilities of the participants.

ITCMA: A Generative Agent Based on a Computational Consciousness Structure

Authors: Hanzhong Zhang, Jibin Yin, Haoyang Wang, Ziwei Xiang

Link: http://arxiv.org/abs/2403.20097v1

Abstract: Large Language Models (LLMs) still face challenges in tasks requiring understanding implicit instructions and applying common-sense knowledge. In such scenarios, LLMs may require multiple attempts to achieve human-level performance, potentially leading to inaccurate responses or inferences in practical environments, affecting their long-term consistency and behavior. This paper introduces the Internal Time-Consciousness Machine (ITCM), a computational consciousness structure. We further propose the ITCM-based Agent (ITCMA), which supports behavior generation and reasoning in open-world settings. ITCMA enhances LLMs' ability to understand implicit instructions and apply common-sense knowledge by considering agents' interaction and reasoning with the environment. Evaluations in the Alfworld environment show that trained ITCMA outperforms the state-of-the-art (SOTA) by 9% on the seen set. Even untrained ITCMA achieves a 96% task completion rate on the seen set, 5% higher than SOTA, indicating its superiority over traditional intelligent agents in utility and generalization. In real-world tasks with quadruped robots, the untrained ITCMA achieves an 85% task completion rate, which is close to its performance in the unseen set, demonstrating its comparable utility in real-world settings.

MindArm: Mechanized Intelligent Non-Invasive Neuro-Driven Prosthetic Arm System

Authors: Maha Nawaz, Abdul Basit, Muhammad Shafique

Link: http://arxiv.org/abs/2403.19992v1

Abstract: Currently, people with disabilities or difficulty moving their arms (referred to as "patients") have very limited technological solutions that efficiently address their physiological limitations. This is mainly due to two reasons: (1) non-invasive solutions like mind-controlled prosthetic devices are typically very costly and require expensive maintenance; and (2) other solutions require costly invasive brain surgery, which is high-risk, expensive, and difficult to maintain. Current technological solutions are therefore not accessible to patients of all financial backgrounds. Toward this, we propose a low-cost technological solution called MindArm, a mechanized intelligent non-invasive neuro-driven prosthetic arm system. Our MindArm system employs a deep neural network (DNN) engine to translate brain signals into the intended prosthetic arm motion, thereby helping patients perform many activities despite their physiological limitations. MindArm uses widely accessible, low-cost surface electroencephalogram (EEG) electrodes coupled with an Open Brain Computer Interface and UDP networking to acquire brain signals and transmit them to the compute module for signal processing. In the compute module, we run a trained DNN model to interpret normalized micro-voltages of the brain signals and then translate them into prosthetic arm actions via serial communication. Experimental results on a fully working prototype demonstrate that, across the three defined actions, our MindArm system achieves high success rates: 91% for idle/stationary, 85% for shake hand, and 84% for pick-up cup. This demonstrates that MindArm offers a novel, alternative low-cost approach to mind-controlled prosthetic devices for all patients.
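
A toy stand-in for the DNN engine described above, assuming EEG windows have already been acquired and normalized; scikit-learn's MLP replaces the paper's unspecified architecture, and the data are synthetic, so the accuracy printed is meaningless beyond demonstrating the flow.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# The three actions defined in the paper; features and labels below are
# synthetic placeholders for normalized EEG micro-voltage windows.
ACTIONS = ["idle", "shake_hand", "pick_up_cup"]
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 64))      # 300 windows x 64 features
y = rng.integers(0, 3, size=300)    # random labels, for the sketch only

clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0)
clf.fit(X[:240], y[:240])
pred = clf.predict(X[240:])
# Each prediction would be sent to the arm controller over serial.
print(ACTIONS[pred[0]], (pred == y[240:]).mean())
```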

2024-03-28

"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices

Authors: Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen

Link: http://arxiv.org/abs/2403.19876v1

Abstract: Large language models are increasingly applied in real-world scenarios, including research and education. These models, however, come with well-known ethical issues, which may manifest in unexpected ways in human-computer interaction research due to the extensive engagement with human subjects. This paper reports on research practices related to LLM use, drawing on 16 semi-structured interviews and a survey conducted with 50 HCI researchers. We discuss the ways in which LLMs are already being utilized throughout the entire HCI research pipeline, from ideation to system development and paper writing. While researchers described nuanced understandings of ethical issues, they were rarely or only partially able to identify and address those ethical concerns in their own projects. This lack of action and reliance on workarounds was explained through the perceived lack of control and distributed responsibility in the LLM supply chain, the conditional nature of engaging with ethics, and competing priorities. Finally, we reflect on the implications of our findings and present opportunities to shape emerging norms of engaging with large language models in HCI research.

Creating Aesthetic Sonifications on the Web with SIREN

Authors: Tristan Peng, Hongchan Choi, Jonathan Berger

Link: http://arxiv.org/abs/2403.19763v1

Abstract: SIREN is a flexible, extensible, and customizable web-based general-purpose interface for auditory data display (sonification). Designed as a digital audio workstation for sonification, SIREN uses synthesizers written in JavaScript with the Web Audio API to facilitate intuitive mapping of data to auditory parameters for a wide range of purposes. This paper explores the breadth of sound synthesis techniques supported by SIREN, and details the structure and definition of a SIREN synthesizer module. The paper proposes further development to increase SIREN's utility.

Leveraging Counterfactual Paths for Contrastive Explanations of POMDP Policies

Authors: Benjamin Kraske, Zakariya Laouar, Zachary Sunberg

Link: http://arxiv.org/abs/2403.19760v1

Abstract: As humans come to rely on autonomous systems more, ensuring the transparency of such systems is important to their continued adoption. Explainable Artificial Intelligence (XAI) aims to reduce confusion and foster trust in systems by providing explanations of agent behavior. Partially observable Markov decision processes (POMDPs) provide a flexible framework capable of reasoning over transition and state uncertainty, while also being amenable to explanation. This work investigates the use of user-provided counterfactuals to generate contrastive explanations of POMDP policies. Feature expectations are used as a means of contrasting the performance of these policies. We demonstrate our approach in a Search and Rescue (SAR) setting. We analyze and discuss the associated challenges through two case studies.
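
Feature expectations are the expected discounted sum of state-action features under a policy, so policies can be contrasted by comparing these vectors. A Monte Carlo sketch follows, with a toy simulator and featurizer standing in for the paper's SAR POMDP.

```python
import numpy as np

def feature_expectations(policy, step, phi, s0, gamma=0.95,
                         n_rollouts=100, horizon=50):
    """Estimate mu(pi) = E[ sum_t gamma^t * phi(s_t, a_t) ] by rollouts."""
    dim = phi(s0, policy(s0)).shape[0]
    mu = np.zeros(dim)
    for _ in range(n_rollouts):
        s, discount = s0, 1.0
        for _ in range(horizon):
            a = policy(s)
            mu += discount * phi(s, a)
            s = step(s, a)
            discount *= gamma
    return mu / n_rollouts

# Toy policy, dynamics, and featurizer; placeholders for the SAR setting.
rng = np.random.default_rng(4)
policy = lambda s: int(s > 0)
step = lambda s, a: s + (1 if a else -1) + rng.normal(0, 0.5)
phi = lambda s, a: np.array([s, float(a)])

print(feature_expectations(policy, step, phi, s0=0.0))
# Contrasting mu(pi) with mu(pi_counterfactual) grounds the explanation.
```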

Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models

Authors: Ole Hall, Anil Yaman

Link: http://arxiv.org/abs/2403.19620v1

Abstract: Generative Adversarial Networks (GANs) have shown great success in generating high-quality images and are thus used as one of the main approaches to generating art images. However, the image generation process usually involves sampling from the latent space of the learned art representations, allowing little control over the output. In this work, we first employ GANs trained to produce creative images using an architecture known as Creative Adversarial Networks (CANs); then, we employ an evolutionary approach to navigate the latent space of the models to discover images. We use automatic aesthetic and collaborative interactive human evaluation metrics to assess the generated images. In the human interactive evaluation case, we propose a collaborative evaluation based on the assessments of several participants. Furthermore, we also experiment with an intelligent mutation operator that aims to improve the quality of the images through local search based on an aesthetic measure. We evaluate the effectiveness of this approach by comparing the results produced by automatic and collaborative interactive evolution. The results show that the proposed approach can generate highly attractive art images when the evolution is guided by collaborative human feedback.
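
A minimal evolutionary loop over latent vectors in the spirit described above; the synthetic fitness function below stands in for the CAN generator plus aesthetic (or human) evaluation, and all hyperparameters are illustrative.

```python
import numpy as np

# (mu + lambda)-style evolution over GAN latent vectors. aesthetic_score() is a
# placeholder for "generate an image from z, then score it"; the paper's
# generator and aesthetic measure are not reproduced here.
rng = np.random.default_rng(5)
LATENT_DIM, POP, GENERATIONS, SIGMA = 128, 16, 20, 0.1

def aesthetic_score(z: np.ndarray) -> float:
    return -np.linalg.norm(z - 1.0)   # synthetic fitness: peak at z = 1

population = rng.normal(size=(POP, LATENT_DIM))
for _ in range(GENERATIONS):
    scores = np.array([aesthetic_score(z) for z in population])
    parents = population[np.argsort(scores)[-POP // 2:]]           # select
    children = parents + rng.normal(0, SIGMA, size=parents.shape)  # mutate
    population = np.vstack([parents, children])

print(max(aesthetic_score(z) for z in population))
```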

Exploring Communication Dynamics: Eye-tracking Analysis in Pair Programming of Computer Science Education

Authors: Wunmin Jang, Hong Gao, Tilman Michaeli, Enkelejda Kasneci

Link: http://arxiv.org/abs/2403.19560v1

Abstract: Pair programming is widely recognized as an effective educational tool in computer science that promotes collaborative learning and mirrors real-world work dynamics. However, communication breakdowns within pairs significantly challenge this learning process. In this study, we use eye-tracking data recorded during pair programming sessions to study communication dynamics between various pair programming roles across different student, expert, and mixed group cohorts containing 19 participants. By combining eye-tracking data analysis with focus group interviews and questionnaires, we provide insights into communication's multifaceted nature in pair programming. Our findings highlight distinct eye-tracking patterns indicating changes in communication skills across group compositions, with participants prioritizing code exploration over communication, especially during challenging tasks. Further, students showed a preference for pairing with experts, emphasizing the importance of understanding group formation in pair programming scenarios. These insights emphasize the importance of understanding group dynamics and enhancing communication skills through pair programming for successful outcomes in computer science education.

LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae

Authors: Celia Chen, Alex Leitch

Link: http://arxiv.org/abs/2403.19506v1

Abstract: This position paper argues that large language models (LLMs) constitute promising yet underutilized academic reading companions capable of enhancing learning. We detail an exploratory study examining Claude.ai from Anthropic, an LLM-based interactive assistant that helps students comprehend complex qualitative literature. The study compares quantitative survey data and qualitative interviews assessing outcomes between a control group and an experimental group using Claude.ai over a semester across two graduate courses. Initial findings demonstrate tangible improvements in reading comprehension and engagement among participants using the AI agent versus unsupported independent study. However, the potential for overreliance and related ethical considerations warrant continued investigation. By documenting an early integration of an LLM reading companion into an educational context, this work contributes pragmatic insights to guide the development of synthetic personae supporting learning. The broader impacts compel policy and industry action to uphold responsible design in order to maximize the benefits of AI integration while prioritizing student wellbeing.

A theoretical framework for the design and analysis of computational thinking problems in education

Authors: Giorgia Adorni, Alberto Piatti, Engin Bumbacher, Lucio Negrini, Francesco Mondada, Dorit Assaf, Francesca Mangili, Luca Gambardella

Link: http://arxiv.org/abs/2403.19475v1

Abstract: The field of computational thinking education has grown in recent years as researchers and educators have sought to develop and assess students' computational thinking abilities. While much of the research in this area has focused on defining computational thinking, the competencies it involves and how to assess them in teaching and learning contexts, this work takes a different approach. We provide a more situated perspective on computational thinking, focusing on the types of problems that require computational thinking skills to be solved and the features that support these processes. We develop a framework for analysing existing computational thinking problems in an educational context. We conduct a comprehensive literature review to identify prototypical activities from areas where computational thinking is typically pursued in education. We identify the main components and characteristics of these activities, along with their influence on activating computational thinking competencies. The framework provides a catalogue of computational thinking skills that can be used to understand the relationship between problem features and competencies activated. This study contributes to the field of computational thinking education by offering a tool for evaluating and revising existing problems to activate specific skills and for assisting in designing new problems that target the development of particular competencies. The results of this study may be of interest to researchers and educators working in computational thinking education.

"At the end of the day, I am accountable": Gig Workers' Self-Tracking for Multi-Dimensional Accountability Management

Authors: Rie Helene Hernandez, Qiurong Song, Yubo Kou, Xinning Gui

Link: http://arxiv.org/abs/2403.19436v1

Abstract: Tracking is inherent in and central to the gig economy. Platforms track gig workers' performance through metrics such as acceptance rate and punctuality, while gig workers themselves engage in self-tracking. Although prior research has extensively examined how gig platforms track workers through metrics -- with some studies briefly acknowledging the phenomenon of self-tracking among workers -- there is a dearth of studies that explore how and why gig workers track themselves. To address this, we conducted 25 semi-structured interviews, revealing how gig workers self-track to manage accountabilities to themselves and to external entities across three identities: the holistic self, the entrepreneurial self, and the platformized self. We connect our findings to neoliberalism, through which we contextualize gig workers' self-accountability and the invisible labor of self-tracking. We further discuss how self-tracking mitigates information and power asymmetries in gig work and offer design implications to support gig workers' multi-dimensional self-tracking.

An Interactive Human-Machine Learning Interface for Collecting and Learning from Complex Annotations

Authors: Jonathan Erskine, Matt Clifford, Alexander Hepburn, Raúl Santos-Rodríguez

Link: http://arxiv.org/abs/2403.19339v1

Abstract: Human-Computer Interaction has been shown to lead to improvements in machine learning systems by boosting model performance, accelerating learning and building user confidence. In this work, we aim to relax the expectation that human annotators adapt to the constraints imposed by traditional labels by allowing extra flexibility in the form in which supervision information is collected. To this end, we propose a human-machine learning interface for binary classification tasks that enables human annotators to use counterfactual examples to complement standard binary labels as annotations for a dataset. Finally, we discuss the challenges in future extensions of this work.

CogniDot: Vasoactivity-based Cognitive Load Monitoring with a Miniature On-skin Sensor

Authors: Hongbo Lan, Yanrong Li, Shixuan Li, Xin Yi, Tengxiang Zhang

Link: http://arxiv.org/abs/2403.19206v1

Abstract: Vascular activities offer valuable signatures for psychological monitoring applications. We present CogniDot, an affordable, miniature skin sensor placed on the temporal area of the head that senses cognitive load with a single-pixel color sensor. With its energy-efficient design, bio-compatible adhesive, and compact size (22mm diameter, 8.5mm thickness), it is well suited for long-term monitoring of mental state. We describe the hardware design of our sensor in detail. Results of a user study with 12 participants show that CogniDot can accurately differentiate between three levels of cognitive load with a within-user accuracy of 97%. We also discuss its potential for broader applications.

Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration

Authors: Louie Søs Meyer, Johanne Engel Aaen, Anitamalina Regitse Tranberg, Peter Kun, Matthias Freiberger, Sebastian Risi, Anders Sundnes Løvlie

Link: http://arxiv.org/abs/2403.19174v1

Abstract: This Research through Design paper explores how object detection may be applied to a large digital art museum collection to facilitate new ways of encountering and experiencing art. We present the design and evaluation of an interactive application called SMKExplore, which allows users to explore a museum's digital collection of paintings by browsing through objects detected in the images, as a novel form of open-ended exploration. We provide three contributions. First, we show how an object detection pipeline can be integrated into a design process for visual exploration. Second, we present the design and development of an app that enables exploration of an art museum's collection. Third, we offer reflections on future possibilities for museums and HCI researchers to incorporate object detection techniques into the digitalization of museums.

Exploring Holistic HMI Design for Automated Vehicles: Insights from a Participatory Workshop to Bridge In-Vehicle and External Communication

Authors: Haoyu Dong, Tram Thi Minh Tran, Rutger Verstegen, Silvia Cazacu, Ruolin Gao, Marius Hoggenmüller, Debargha Dey, Mervyn Franssen, Markus Sasalovici, Pavlo Bazilinskyy, Marieke Martens

Link: http://arxiv.org/abs/2403.19153v1

Abstract: Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospects of bridging these two seemingly distinct domains. Through a participatory workshop with automotive user interface researchers and practitioners, we facilitated a critical exploration of holistic HMI design by having workshop participants collaboratively develop interaction scenarios involving AVs, in-vehicle users, and external road users. The discussion offers insights into the escalation of interface elements as an HMI design strategy, the direct interactions between different users, and an expanded understanding of holistic HMI design. This work reflects a collaborative effort to understand the practical aspects of this holistic design approach, offering new perspectives and encouraging further investigation into this underexplored aspect of automotive user interfaces.

Real-time accident detection and physiological signal monitoring to enhance motorbike safety and emergency response

Authors: S. M. Kayser Mehbub Siam, Khadiza Islam Sumaiya, Md Rakib Al-Amin, Tamim Hasan Turjo, Ahsanul Islam, A. H. M. A. Rahim, Md Rakibul Hasan

Link: http://arxiv.org/abs/2403.19085v1

Abstract: Rapid urbanization and improved living standards have led to a substantial increase in the number of vehicles on the road, consequently resulting in a rise in the frequency of accidents. Among these accidents, motorbike accidents pose a particularly high risk, often resulting in serious injuries or deaths. A significant number of these fatalities occur due to delayed or inadequate medical attention. To this end, we propose a novel automatic detection and notification system specifically designed for motorbike accidents. The proposed system comprises two key components: a detection system and a physiological signal monitoring system. The detection system is integrated into the helmet and consists of a microcontroller, accelerometer, GPS, GSM, and Wi-Fi modules. The physiological monitoring system incorporates a sensor for monitoring pulse rate and SpO$_2$ saturation. All collected data are presented on an LCD display and wirelessly transmitted to the detection system through the microcontroller of the physiological signal monitoring system. If the accelerometer readings consistently deviate beyond a threshold determined through extensive experimentation, the system identifies the event as an accident and transmits the victim's information -- including the GPS location, pulse rate, and SpO$_2$ saturation -- to the designated emergency contacts. Preliminary results demonstrate the efficacy of the proposed system in accurately detecting motorbike accidents and promptly alerting emergency contacts. We firmly believe that the proposed system has the potential to significantly mitigate the risks associated with motorbike accidents and save lives.
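
A sketch of the detection rule as described: declare an accident when the accelerometer magnitude exceeds a threshold for several consecutive samples. The threshold and window length below are illustrative, not the values found in the paper's experiments.

```python
from collections import deque

THRESHOLD_G = 4.0   # deviation magnitude in g; illustrative, not the paper's
CONSECUTIVE = 5     # samples that must all exceed the threshold

def detect_accident(magnitudes, threshold=THRESHOLD_G, k=CONSECUTIVE):
    """Return the sample index at which an accident is declared, else None."""
    window = deque(maxlen=k)
    for i, m in enumerate(magnitudes):
        window.append(m > threshold)
        if len(window) == k and all(window):
            return i
    return None

readings = [1.0, 1.1, 0.9, 5.2, 5.8, 6.1, 5.5, 5.9, 1.2]
idx = detect_accident(readings)
if idx is not None:
    # In the real system this would trigger the GSM notification with GPS
    # location, pulse rate, and SpO2 saturation.
    print(f"accident at sample {idx}: notify emergency contacts")
```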

2024-03-27

Towards Human-Centered Construction Robotics: An RL-Driven Companion Robot For Contextually Assisting Carpentry Workers

Authors: Yuning Wu, Jiaying Wei, Jean Oh, Daniel Cardoso Llach

Link: http://arxiv.org/abs/2403.19060v1

Abstract: In the dynamic construction industry, traditional robotic integration has primarily focused on automating specific tasks, often overlooking the complexity and variability of human aspects in construction workflows. This paper introduces a human-centered approach with a "work companion rover" designed to assist construction workers within their existing practices, aiming to enhance safety and workflow fluency while respecting the skilled nature of construction labor. We conduct an in-depth study on deploying a robotic system in carpentry formwork, showcasing a prototype that emphasizes mobility, safety, and comfortable worker-robot collaboration in dynamic environments through a contextual Reinforcement Learning (RL)-driven modular framework. Our research advances robotic applications in construction, advocating for collaborative models in which adaptive robots support rather than replace humans, and underscoring the potential for an interactive and collaborative human-robot workforce.

Visualizing High-Dimensional Temporal Data Using Direction-Aware t-SNE

Authors: Pavlin G. Poličar, Blaž Zupan

Link: http://arxiv.org/abs/2403.19040v1

Abstract: Many real-world data sets contain a temporal component or involve transitions from state to state. For exploratory data analysis, we can represent these high-dimensional data sets in two-dimensional maps, using embeddings of the data objects under exploration and representing their temporal relationships with directed edges. Most existing dimensionality reduction techniques, such as t-SNE and UMAP, do not take into account the temporal or relational nature of the data when constructing the embeddings, resulting in temporally cluttered visualizations that obscure potentially interesting patterns. To address this problem, we propose two complementary, direction-aware loss terms in the optimization function of t-SNE that emphasize the temporal aspects of the data, guiding the optimization and the resulting embedding to reveal temporal patterns that might otherwise go unnoticed. The Directional Coherence Loss (DCL) encourages nearby arrows connecting two adjacent time series points to point in the same direction, while the Edge Length Loss (ELL) penalizes arrows - which effectively represent time gaps in the visualized embedding - based on their length. Both loss terms are differentiable and can be easily incorporated into existing dimensionality reduction techniques. By promoting local directionality of the directed edges, our procedure produces more temporally meaningful and less cluttered visualizations. We demonstrate the effectiveness of our approach on a toy dataset and two real-world datasets.
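
Both loss terms are simple enough to sketch directly from their descriptions, assuming Y is an (n, 2) tensor of embedding coordinates for consecutive time-series points; exact normalizations and weightings in the paper may differ.

```python
import torch

def directional_coherence_loss(Y: torch.Tensor) -> torch.Tensor:
    """Encourage adjacent arrows (displacements) to point the same way."""
    d = Y[1:] - Y[:-1]                             # arrows between points
    d = d / (d.norm(dim=1, keepdim=True) + 1e-9)   # unit directions
    cos = (d[1:] * d[:-1]).sum(dim=1)              # cosine of adjacent arrows
    return (1.0 - cos).mean()

def edge_length_loss(Y: torch.Tensor) -> torch.Tensor:
    """Penalize long arrows, which represent time gaps in the embedding."""
    return (Y[1:] - Y[:-1]).norm(dim=1).pow(2).mean()

# Toy embedding of 50 consecutive time points; the relative weight 0.1 is an
# assumption, not a value from the paper.
Y = torch.randn(50, 2, requires_grad=True)
loss = directional_coherence_loss(Y) + 0.1 * edge_length_loss(Y)
loss.backward()   # differentiable, so it can be added to t-SNE's objective
print(float(loss))
```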

Women are less comfortable expressing opinions online than men and report heightened fears for safety: Surveying gender differences in experiences of online harms

Authors: Francesca Stevens, Florence E. Enock, Tvesha Sippy, Jonathan Bright, Miranda Cross, Pica Johansson, Judy Wajcman, Helen Z. Margetts

Link: http://arxiv.org/abs/2403.19037v1

Abstract: Online harms, such as hate speech, trolling and self-harm promotion, continue to be widespread. While some work suggests women are disproportionately affected, other studies find mixed evidence for gender differences in experiences with content of this kind. Using a nationally representative survey of UK adults (N=1992), we examine exposure to a variety of harms, fears surrounding being targeted, the psychological impact of online experiences, the use of safety tools to protect against harm, and comfort with various forms of online participation across men and women. We find that while men and women see harmful content online to a roughly similar extent, women are more at risk than men of being targeted by harms including online misogyny, cyberstalking and cyberflashing. Women are significantly more fearful of being targeted by harms overall, and report greater negative psychological impact as a result of particular experiences. Perhaps in an attempt to mitigate risk, women report higher use of a range of safety tools and less comfort with several forms of online participation, with just 23% of women comfortable expressing political views online compared to 40% of men. We also find direct associations between fears surrounding harms and comfort with online behaviours. For example, fear of being trolled significantly decreases comfort expressing opinions, and fear of being targeted by misogyny significantly decreases comfort sharing photos. Our results are important because with much public discourse happening online, we must ensure all members of society feel safe and able to participate in online spaces.

Should I Help a Delivery Robot? Cultivating Prosocial Norms through Observations

Authors: Vivienne Bihe Chi, Shashank Mehrotra, Teruhisa Misu, Kumar Akash

Link: http://arxiv.org/abs/2403.19027v1

Abstract: We propose leveraging prosocial observations to cultivate new social norms that encourage prosocial behaviors toward delivery robots. With an online experiment, we quantitatively assess updates in norm beliefs regarding human-robot prosocial behaviors through observational learning. Results demonstrate that the initially perceived normativity of helping robots is influenced by familiarity with delivery robots and perceptions of robots' social intelligence. Observing human-robot prosocial interactions notably shifts people's normative beliefs about prosocial actions, thereby changing their perceived obligations to offer help to delivery robots. Additionally, we found that observing robots offering help to humans, rather than receiving help, more significantly increased participants' feelings of obligation to help robots. Our findings provide insights into prosocial design for future mobility systems. Improved familiarity with robot capabilities and portraying robots as desirable social partners can help foster wider acceptance. Furthermore, robots need to be designed with higher levels of interactivity and reciprocal capabilities for prosocial behavior.

The Correlations of Scene Complexity, Workload, Presence, and Cybersickness in a Task-Based VR Game

Authors: Mohammadamin Sanaei, Stephen B. Gilbert, Nikoo Javadpour, Hila Sabouni, Michael C. Dorneich, Jonathan W. Kelly

Link: http://arxiv.org/abs/2403.19019v1

Abstract: This investigation examined the relationships among scene complexity, workload, presence, and cybersickness in virtual reality (VR) environments. Numerous factors can influence the overall VR experience, and existing research on this matter is not yet conclusive, warranting further investigation. In this between-subjects experimental setup, 44 participants engaged in the Pendulum Chair game, with half exposed to a simple scene with lower optic flow and lower familiarity, and the remaining half to a complex scene characterized by higher optic flow and greater familiarity. The study measured the dependent variables workload, presence, and cybersickness and analyzed their correlations. Equivalence testing was also used to compare the simple and complex environments. Results revealed that, despite the visible differences between the environments, the simple and complex scenes were statistically equivalent within 10% of the maximum possible value for workload and presence and within 13.6% of the maximum SSQ value. Additionally, a moderate, negative correlation emerged between workload and SSQ scores. The findings suggest two key points: (1) the nature of the task can mitigate the impact of scene complexity factors such as optic flow and familiarity, and (2) the correlation between workload and cybersickness may vary, showing either a positive or negative relationship.
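
Equivalence between conditions is commonly tested with two one-sided t-tests (TOST); the abstract does not specify the authors' exact procedure, so the sketch below is a generic TOST with an illustrative equivalence margin of 10 points, and the scores are synthetic.

```python
import numpy as np
from scipy import stats

def tost(a, b, margin):
    """Two one-sided t-tests: equivalence if the returned p-value < alpha."""
    diff = np.mean(a) - np.mean(b)
    se = np.sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
    df = len(a) + len(b) - 2
    p_lower = 1 - stats.t.cdf((diff + margin) / se, df)  # H0: diff <= -margin
    p_upper = stats.t.cdf((diff - margin) / se, df)      # H0: diff >= +margin
    return max(p_lower, p_upper)

# Synthetic workload scores for 22 participants per condition.
rng = np.random.default_rng(6)
simple = rng.normal(50, 10, 22)
complex_ = rng.normal(51, 10, 22)
print(tost(simple, complex_, margin=10.0))
```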

Thelxinoë: Recognizing Human Emotions Using Pupillometry and Machine Learning

Authors: Darlene Barker, Haim Levkowitz

Link: http://arxiv.org/abs/2403.19014v1

Abstract: In this study, we present a method for emotion recognition in Virtual Reality (VR) using pupillometry. We analyze pupil diameter responses to both visual and auditory stimuli via a VR headset and focus on extracting key time-domain, frequency-domain, and time-frequency-domain features from VR-generated data. Our approach uses feature selection to identify the most impactful features via Maximum Relevance Minimum Redundancy (mRMR). By applying a Gradient Boosting model, an ensemble learning technique using stacked decision trees, we achieve an accuracy of 98.8% with feature engineering, compared to 84.9% without it. This research contributes significantly to the Thelxinoë framework, which aims to enhance VR experiences by integrating multiple sensor data for realistic and emotionally resonant touch interactions. Our findings open new avenues for developing more immersive and interactive VR environments, paving the way for future advancements in virtual touch technology.
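
A compact sketch of the pipeline: rank features, keep the top ones, fit gradient boosting. For brevity, mutual information stands in for true mRMR (which additionally penalizes redundancy among selected features), and the pupillometry features are replaced by synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for engineered pupillometry features.
X, y = make_classification(n_samples=400, n_features=40, n_informative=8,
                           random_state=0)

# Relevance-only ranking; real mRMR would also subtract a redundancy term.
relevance = mutual_info_classif(X, y, random_state=0)
top = np.argsort(relevance)[-10:]   # keep the 10 most relevant features

X_tr, X_te, y_tr, y_te = train_test_split(X[:, top], y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```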

SolderlessPCB: Reusing Electronic Components in PCB Prototyping through Detachable 3D Printed Housings

Authors: Zeyu Yan, Jiasheng Li, Zining Zhang, Huaishu Peng

Link: http://arxiv.org/abs/2403.18797v1

Abstract: The iterative prototyping process for printed circuit boards (PCBs) frequently employs surface-mounted device (SMD) components, which are often discarded rather than reused due to the challenges associated with desoldering, leading to unnecessary electronic waste. This paper introduces SolderlessPCB, a collection of techniques for solder-free PCB prototyping, specifically designed to promote the recycling and reuse of electronic components. Central to this approach are custom 3D-printable housings that allow SMD components to be mounted onto PCBs without soldering. We detail the design of SolderlessPCB and the experiments conducted to evaluate its design parameters, electrical performance, and durability. To illustrate the potential for reusing SMD components with SolderlessPCB, we discuss two scenarios: the reuse of components from earlier design iterations and from obsolete prototypes. We also provide examples demonstrating that SolderlessPCB can handle high-current applications and is suitable for high-speed data transmission. The paper concludes by discussing the limitations of our approach and suggesting future directions to overcome these challenges.

Teaching Introductory HRI: UChicago Course "Human-Robot Interaction: Research and Practice"

Authors: Sarah Sebo

Link: http://arxiv.org/abs/2403.18692v1

Abstract: In 2020, I designed the course CMSC 20630/30630 Human-Robot Interaction: Research and Practice as a hands-on introduction to human-robot interaction (HRI) research for both undergraduate and graduate students at the University of Chicago. Since 2020, I have taught and refined this course each academic year. Human-Robot Interaction: Research and Practice focuses on the core concepts and cutting-edge research in the field of HRI, covering topics that include: nonverbal robot behavior, verbal robot behavior, social dynamics, norms & ethics, collaboration & learning, group interactions, applications, and future challenges of HRI. Course meetings involve students leading discussions about cutting-edge peer-reviewed HRI research publications. Students also participate in a quarter-long collaborative research project, in which they pursue an HRI research question that often involves conducting their own human-subjects study, recruiting participants to interact with a robot. In this paper, I detail the structure of the course and its learning goals as well as my reflections and student feedback on the course.

An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long Project

Authors: Ben Arie Tanay, Lexy Arinze, Siddhant S. Joshi, Kirsten A. Davis, James C. Davis

Link: http://arxiv.org/abs/2403.18679v1

Abstract: Background: Large Language Models (LLMs) such as ChatGPT and CoPilot are influencing software engineering practice. Software engineering educators must teach future software engineers how to use such tools well. As of yet, there have been few studies that report on the use of LLMs in the classroom. It is, therefore, important to evaluate students' perceptions of LLMs and possible ways of adapting the computing curriculum to these shifting paradigms. Purpose: The purpose of this study is to explore computing students' experiences and approaches to using LLMs during a semester-long software engineering project. Design/Method: We collected data from a senior-level software engineering course at Purdue University. This course uses a project-based learning (PBL) design. The students used LLMs such as ChatGPT and Copilot in their projects. A sample of these student teams were interviewed to understand (1) how they used LLMs in their projects; and (2) whether and how their perspectives on LLMs changed over the course of the semester. We analyzed the data to identify themes related to students' usage patterns and learning outcomes. Results/Discussion: When computing students use LLMs within a project, their use cases cover both technical and professional applications. These students perceive LLMs to be efficient tools for obtaining information and completing tasks, but they raised concerns about using LLMs responsibly without harming their own learning outcomes. Based on our findings, we recommend that future research investigate the use of LLMs in lower-level computer engineering courses to understand whether and how LLMs can be integrated as a learning aid without hurting learning outcomes.

Aiming for Relevance

Authors: Bar Eini Porat, Danny Eytan, Uri Shalit

Link: http://arxiv.org/abs/2403.18668v1

Abstract: Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care.
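
The idea of clinically aligned losses can be illustrated with a simple weighted objective that up-weights errors when the true value lies outside a clinical normal range; this is a generic sketch, as the paper's weights come from empirical utility curves that are not reproduced here, and the penalty factor and normal range below are assumptions.

```python
import torch

def clinically_weighted_mse(pred, target, low, high, penalty=4.0):
    """MSE that up-weights errors on values outside the clinical norm."""
    out_of_norm = (target < low) | (target > high)
    weights = torch.where(out_of_norm,
                          torch.full_like(target, penalty),
                          torch.ones_like(target))
    return (weights * (pred - target) ** 2).mean()

pred = torch.tensor([72.0, 130.0])    # e.g., predicted heart rates (bpm)
target = torch.tensor([75.0, 140.0])  # the second value is tachycardic
print(float(clinically_weighted_mse(pred, target, low=60.0, high=100.0)))
```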