
Computer Science

New submissions


New submissions for Thu, 2 May 24

[1]  arXiv:2405.00001 [pdf, ps, other]
Title: Experimental Evaluation of the PHP's cURL Library Performance
Authors: Yordan Kalmukov
Journal-ref: PROCEEDINGS OF UNIVERSITY OF RUSE - 2023, volume 62, book 3.2., pp. 28-33
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

cURL (libcurl) is a popular and widely used library distributed with the PHP interpreter. It allows PHP applications to connect to and communicate with external resources (servers) using a wide variety of communication protocols. In most cases it is the preferred way of consuming external REST web services. Programmers usually take it for granted without even thinking of any performance issues. During an experimental analysis of the throughput of Hadoop's WebHDFS API, it was noted that the read (download) speed from WebHDFS decreases as the file size increases. However, this issue does not occur when writing to WebHDFS. Since the communication between the PHP application and the WebHDFS API is handled by PHP's cURL library, the cause of the download speed decrease could be either the cURL library itself or the API. This paper presents a series of experimental analyses aiming to determine the cause of the download speed decrease observed in previous experiments - whether it is the WebHDFS API or PHP's cURL library. Both parties are tested in multiple ways, separately and independently of each other. The results clearly show (in two different ways) that the cause of the download speed decrease is PHP's cURL library itself, not the consumed API.
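
Since PHP's cURL extension and Python's pycurl both wrap the same libcurl, the kind of throughput measurement described above can be sketched as follows; the URLs are hypothetical placeholders and this is an illustrative analogue, not the authors' PHP test harness.

```python
import io
import pycurl

def download_speed(url):
    """Download a file via libcurl and report its size and average transfer rate."""
    buf = io.BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEDATA, buf)
    c.perform()
    size = c.getinfo(pycurl.SIZE_DOWNLOAD)    # bytes received
    speed = c.getinfo(pycurl.SPEED_DOWNLOAD)  # average download speed, bytes/s
    c.close()
    return size, speed

# Hypothetical test files of increasing size served by the API under test
for url in ["http://example.org/files/10MB.bin", "http://example.org/files/100MB.bin"]:
    size, speed = download_speed(url)
    print(f"{url}: {size / 1e6:.1f} MB at {speed / 1e6:.2f} MB/s")
```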

[2]  arXiv:2405.00003 [pdf, other]
Title: TALICS$^3$: Tape Library Cloud Storage System Simulator
Comments: 19 pages, 11 figures. Submitted to Simulation Modelling Practice and Theory
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)

High-performance computing data is surging fast into the exabyte-scale world, where tape libraries are the main platform for long-term durable data storage besides high-cost DNA storage. Tape libraries are extremely hard to model, but accurate modeling is critical for system administrators to obtain valid performance estimates for their designs. This research introduces a discrete-event tape simulation platform that realistically models tape library behavior in a networked cloud environment by incorporating real-world phenomena and effects. The platform addresses several challenges, including precise estimation of data access latency, robot exchange rates, data collocation, deduplication/compression ratio, and attainment of durability goals through replication or erasure coding. The suggested simulator can compare a single enterprise configuration with multiple commodity-library (RAIL) configurations, making it a useful tool for system administrators and reliability engineers. They can use the simulator to obtain practical and reliable performance estimates for their long-term, durable, and cost-effective cold data storage architecture designs.
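
A toy discrete-event sketch of the kind of model described, written with the simpy library, is shown below; the mount time, read rate, and single-robot assumption are arbitrary placeholders and are not calibrated to TALICS$^3$.

```python
import random
import simpy

# Toy discrete-event sketch of a tape library: requests queue for a single
# robot arm, wait for a cartridge mount, then read at a fixed drive rate.
def request_file(env, robot, size_gb, latencies):
    start = env.now
    with robot.request() as arm:
        yield arm                      # wait for the robot arm
        yield env.timeout(30)          # cartridge exchange/mount time [s]
    yield env.timeout(size_gb / 0.3)   # sequential read at ~0.3 GB/s
    latencies.append(env.now - start)

env = simpy.Environment()
robot = simpy.Resource(env, capacity=1)
latencies = []
for _ in range(20):
    env.process(request_file(env, robot, random.uniform(1, 100), latencies))
env.run()
print(f"mean access latency: {sum(latencies) / len(latencies):.1f} s")
```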

[3]  arXiv:2405.00004 [pdf, other]
Title: Self-healing Nodes with Adaptive Data-Sharding
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Data sharding, a technique for partitioning and distributing data among multiple servers or nodes, offers enhancements in the scalability, performance, and fault tolerance of extensive distributed systems. Nonetheless, this strategy introduces novel challenges, including load balancing among shards, management of node failures and data loss, and adaptation to evolving data and workload patterns. This paper proposes an innovative approach to tackle these challenges by empowering self-healing nodes with adaptive data sharding. Leveraging concepts such as self-replication, fractal regeneration, sentient data sharding, and symbiotic node clusters, our approach establishes a dynamic and resilient data sharding scheme capable of addressing diverse scenarios and meeting varied requirements. Implementation and evaluation of our approach involve a prototype system simulating a large-scale distributed database across various data sharding scenarios. Comparative analyses against existing data sharding techniques highlight the superior scalability, performance, fault tolerance, and adaptability of our approach. Additionally, the paper delves into potential applications and limitations, providing insights into the future research directions that can further advance this innovative approach.

[4]  arXiv:2405.00005 [pdf, other]
Title: Scheduling of Distributed Applications on the Computing Continuum: A Survey
Comments: 7 pages, 3 figures, 3 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The demand for distributed applications has significantly increased over the past decade, with improvements in machine learning techniques fueling this growth. These applications predominantly utilize Cloud data centers for high-performance computing, and Fog and Edge devices for low-latency communication and for the training and inference of small machine learning models. The challenge of executing applications with different requirements on heterogeneous devices requires effective methods for solving NP-hard resource allocation and application scheduling problems. The state-of-the-art techniques primarily investigate conflicting objectives, such as the completion time, energy consumption, and economic cost of application execution on the Cloud, Fog, and Edge computing infrastructure. Therefore, in this work, we review these research works considering their objectives, methods, and evaluation tools. Based on the review, we provide a discussion on the scheduling methods in the Computing Continuum.

[5]  arXiv:2405.00006 [pdf, ps, other]
Title: Steel Plate Fault Detection using the Fitness Dependent Optimizer and Neural Networks
Comments: 19 pages
Subjects: Neural and Evolutionary Computing (cs.NE)

Detecting faults in steel plates is crucial for ensuring the safety and reliability of structures and industrial equipment. Early detection of faults can prevent further damage and costly repairs. This chapter aims at diagnosing and predicting the likelihood of steel plates developing faults using experimental text data. Various machine learning methods such as GWO-based and FDO-based MLP and CMLP are tested to classify steel plates as either faulty or non-faulty. The experiments produced promising results for all models, with similar accuracy and performance. However, the FDO-based MLP and CMLP models consistently achieved the best results, with 100% accuracy on all tested datasets. The other models' outcomes varied from one experiment to another. The findings indicate that models that employed the FDO as a learning algorithm had the potential to achieve higher accuracy at the cost of a slightly longer runtime compared to other algorithms. In conclusion, early detection of faults in steel plates is critical for maintaining safety and reliability, and machine learning techniques can help predict and diagnose these faults accurately.

[6]  arXiv:2405.00007 [pdf, ps, other]
Title: Cantera-Based Python Computer Program for Solving Steam Power Cycles with Superheating
Authors: Osama A. Marzouk
Comments: 11 pages, 4 tables, journal paper
Journal-ref: International Journal of Emerging Technology and Advanced Engineering. 13(3), 63-73. 2023
Subjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)

One of the main sources of electricity generation is power plants that use water (steam) to rotate turbines, which drive large electric generators. The steam can be generated from renewable or non-renewable energy sources, such as geothermal energy and nuclear fuels. Having an analysis tool for modeling the performance of such steam power plants can greatly help in reaching optimum designs, leading to less fuel consumption, reduced pollution, and cheaper electricity. It is further advantageous if such a modeling tool is free to access, does not require many inputs from the user, and gives results in a very short time. These remarks establish a motivation for the current study. This article documents a computer code written in the Python programming language for numerically analysing the main processes in a steam power cycle with superheating. The code utilizes built-in thermodynamic properties for water in the open-source software package "Cantera". A validation case with a benchmarking example in the literature using an independent source of water properties suggests that the developed code is correct. The code can be viewed as an extension to the Python examples for thermodynamic and power generation applications. Cantera can handle both subcritical and supercritical types of superheating. In subcritical superheating, the steam absolute pressure does not exceed 220.9 bar. In supercritical superheating, water enters a special condition called a supercritical fluid, with absolute pressures above 220.9 bar.
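
As a rough illustration of the property lookups such a code relies on, the sketch below uses Cantera's built-in water model to evaluate an ideal (isentropic) turbine expansion; the chosen states (600 degrees C and 100 bar, expanding to 10 bar) are illustrative and are not taken from the paper's validation case.

```python
import cantera as ct

w = ct.Water()                  # Cantera's built-in pure-water model

# Superheated steam at the superheater exit (illustrative subcritical state)
w.TP = 873.15, 10.0e6           # 600 C, 100 bar absolute
h1, s1 = w.h, w.s               # specific enthalpy [J/kg], entropy [J/(kg*K)]

# Ideal isentropic expansion to 10 bar
w.SP = s1, 1.0e6
h2 = w.h

print(f"Ideal specific turbine work: {(h1 - h2) / 1e3:.1f} kJ/kg")
```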

[7]  arXiv:2405.00009 [pdf, other]
Title: Service Level Agreements and Security SLA: A Comprehensive Survey
Comments: 25 pages, 5 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

A Service Level Agreement (SLA) is a formal contract between a service provider and a consumer, representing a crucial instrument to define, manage, and maintain relationships between these two parties. The SLA's ability to define the Quality of Service (QoS) expectations, standards, and accountability helps to deliver high-quality services and increase client confidence in disparate application domains, such as Cloud computing and the Internet of Things. An open research direction in this context is related to the possible integration of new metrics to address the security and privacy aspects of services, thus providing protection of sensitive information, mitigating risks, and building trust. This survey paper identifies the state of the art, covering concepts, approaches, and open problems of SLA management with a distinctive and original focus on the recent development of Security SLAs (SecSLAs). It contributes by carrying out a comprehensive review and covering the gap between the analyses proposed in existing surveys and the most recent literature on this topic, spanning from 2017 to 2023. Moreover, it proposes a novel classification criterion to organize the analysis based on SLA life cycle phases. This original point of view can help both academics and industrial practitioners to understand and properly locate existing contributions in the advancement of the different aspects of SLA technology. The present work highlights the importance of the covered topics and the need for new research improvements to tackle present and demanding challenges.

[8]  arXiv:2405.00010 [pdf, ps, other]
Title: A Review on Industrial Augmented Reality Systems for the Industry 4.0 Shipyard
Comments: Accepted version of an IEEE Access journal article
Journal-ref: P. Fraga-Lamas, T. M. Fernandez-Carames, O. Blanco-Novoa and M. A. Vilar-Montesinos, "A Review on Industrial Augmented Reality Systems for the Industry 4.0 Shipyard," in IEEE Access, vol. 6, pp. 13358-13375, 2018
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Human-Computer Interaction (cs.HC)

Shipbuilding companies are upgrading their inner workings in order to create Shipyards 4.0, where the principles of Industry 4.0 are paving the way to further digitalized and optimized processes in an integrated network. Among the different Industry 4.0 technologies, this article focuses on Augmented Reality, whose application in the industrial field has led to the concept of Industrial Augmented Reality (IAR). This article first describes the basics of IAR and then carries out a thorough analysis of the latest IAR systems for industrial and shipbuilding applications. Then, in order to build a practical IAR system for shipyard workers, the main hardware and software solutions are compared. Finally, as a conclusion after reviewing all the aspects related to IAR for shipbuilding, an IAR system architecture is proposed that combines Cloudlets and Fog Computing, which reduce response latency and accelerate rendering tasks while offloading compute-intensive tasks from the Cloud.

[9]  arXiv:2405.00011 [pdf, other]
Title: A Multiscale Fracture Model using Peridynamic Enrichment of Finite Elements within an Adaptive Partition of Unity: Experimental Validation
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Partition of unity methods (PUM) are of domain decomposition type and provide the opportunity for multiscale and multiphysics numerical modeling. Within the PUM global-local enrichment scheme [1, 2] different physical models can exist to capture multiscale behavior. For instance, we consider classical linear elasticity globally and local zones where fractures occur. The elastic fields of the undamaged media provide appropriate boundary data for local peridynamic (PD) simulations on a subdomain containing the crack tip to grow the crack path. Once the updated crack path is found, the elastic field in the body and surrounding the crack is updated using a PUM basis with appropriate enrichment near the crack. The subdomain for the PD simulation is chosen to include the current crack tip as well as nearby features that will influence crack growth. This paper is part II of this series and validates the combined PD/PUM simulator against the experimental results presented in [3]. The presented results show that we can attain good agreement between experimental and simulation data with a local PD subdomain that moves with the crack tip and has an adaptively chosen size.

[10]  arXiv:2405.00013 [pdf, other]
Title: The GA4GH Task Execution API: Enabling Easy Multi Cloud Task Execution
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The Global Alliance for Genomics and Health (GA4GH) Task Execution Service (TES) API is a standardized schema and API for describing and executing batch execution tasks. It provides a common way to submit and manage tasks to a variety of compute environments, including on-premises High Performance Computing and High Throughput Computing (HPC/HTC) systems, Cloud computing platforms, and hybrid environments. The TES API is designed to be flexible and extensible, allowing it to be adapted to a wide range of use cases, such as "bringing compute to the data" solutions for federated and distributed data analysis or load balancing across multi-cloud infrastructures. This API has been adopted by a number of different service providers and utilized by several workflow engines. Using its capabilities, genomics research institutes are building hybrid compute systems to study the life sciences.

[11]  arXiv:2405.00015 [pdf, other]
Title: Experiences Porting Distributed Applications to Asynchronous Tasks: A Multidimensional FFT Case-study
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Parallel algorithms relying on synchronous parallelization libraries often experience adverse performance due to global synchronization barriers. Asynchronous many-task runtimes offer task futurization capabilities that minimize or remove the need for global synchronization barriers. This paper conducts a case study of the multidimensional Fast Fourier Transform to identify which applications will benefit from the asynchronous many-task model. Our basis is the popular FFTW library. We use the asynchronous many-task model HPX and a one-dimensional FFTW backend to implement multiple versions using different HPX features and highlight overheads and pitfalls during migration. Furthermore, we add an HPX threading backend to FFTW. The case study analyzes shared memory scaling properties between our HPX-based parallelization and FFTW with its pthreads, OpenMP, and HPX backends. The case study also compares FFTW's MPI+X backend to a purely HPX-based distributed implementation. The FFT application does not profit from asynchronous task execution. In contrast, enforcing task synchronization results in better cache performance and thus better runtime. Nonetheless, the HPX backend for FFTW is competitive with existing backends. Our distributed HPX implementation based on HPX collectives using the MPI parcelport performs similarly to FFTW's MPI+OpenMP. However, the LCI parcelport of HPX accelerated communication by up to a factor of 5.

[12]  arXiv:2405.00016 [pdf, ps, other]
Title: HPX with Spack and Singularity Containers: Evaluating Overheads for HPX/Kokkos using an astrophysics application
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Instrumentation and Methods for Astrophysics (astro-ph.IM)

Cloud computing for high performance computing resources is an emerging topic. This service is of interest to researchers who care about reproducible computing, for software packages with complex installations, and for companies or researchers who need the compute resources only occasionally or do not want to run and maintain a supercomputer on their own. The connection between HPC and containers is exemplified by the fact that Microsoft Azure's Eagle cloud service machine is number three on the November 2023 Top500 list. For cloud services, the HPC application and dependencies are installed in containers, e.g. Docker, Singularity, or something else, and these containers are executed on the physical hardware. Although containerization leverages the existing Linux kernel and should not impose overheads on the computation, there is the possibility that machine-specific optimizations might be lost, particularly machine-specific installs of commonly used packages. In this paper, we will use an astrophysics application using HPX-Kokkos and measure overheads on homogeneous resources, e.g. Supercomputer Fugaku, using CPUs only, and on heterogeneous resources, e.g. LSU's hybrid CPU and GPU system. We will report on challenges in compiling, running, and using the containers as well as performance differences.

[13]  arXiv:2405.00017 [pdf, other]
Title: Queuing dynamics of asynchronous Federated Learning
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)

We study asynchronous federated learning mechanisms with nodes having potentially different computational speeds. In such an environment, each node is allowed to work on models with potential delays and contribute updates to the central server at its own pace. Existing analyses of such algorithms typically depend on intractable quantities such as the maximum node delay and do not consider the underlying queuing dynamics of the system. In this paper, we propose a non-uniform sampling scheme for the central server that allows for lower delays with better complexity, taking into account the closed Jackson network structure of the associated computational graph. Our experiments clearly show a significant improvement of our method over current state-of-the-art asynchronous algorithms on an image classification problem.

[14]  arXiv:2405.00018 [pdf, other]
Title: Proof-of-concept: Using ChatGPT to Translate and Modernize an Earth System Model from Fortran to Python/JAX
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Atmospheric and Oceanic Physics (physics.ao-ph)

Earth system models (ESMs) are vital for understanding past, present, and future climate, but they suffer from legacy technical infrastructure. ESMs are primarily implemented in Fortran, a language that poses a high barrier to entry for early-career scientists and lacks a GPU runtime, which has become essential for continued advancement as GPU power increases and CPU scaling slows. Fortran also lacks differentiability - the capacity to differentiate through numerical code - which enables hybrid models that integrate machine learning methods. Converting an ESM from Fortran to Python/JAX could resolve these issues. This work presents a semi-automated method for translating individual model components from Fortran to Python/JAX using a large language model (GPT-4). By translating the photosynthesis model from the Community Earth System Model (CESM), we demonstrate that the Python/JAX version results in up to 100x faster runtimes using GPU parallelization, and enables parameter estimation via automatic differentiation. The Python code is also easy to read and run and could be used by instructors in the classroom. This work illustrates a path towards the ultimate goal of making climate models fast, inclusive, and differentiable.
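
As a hedged illustration of the differentiability the abstract refers to, the toy JAX snippet below fits a single parameter of a made-up light-response curve by gradient descent; the function, data, and parameter name are invented for illustration and are not taken from the CESM photosynthesis component.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for a translated model component: a saturating
# light-response curve with one unknown parameter (not the CESM code).
def photosynthesis_rate(vcmax, light):
    return vcmax * light / (light + 100.0)

light = jnp.array([200.0, 500.0, 900.0])      # made-up forcing
observed = jnp.array([12.0, 18.0, 21.0])      # made-up observations

def loss(vcmax):
    return jnp.mean((photosynthesis_rate(vcmax, light) - observed) ** 2)

# Automatic differentiation supplies the gradient needed for parameter estimation
grad_loss = jax.grad(loss)
vcmax = 10.0
for _ in range(200):
    vcmax = vcmax - 0.1 * grad_loss(vcmax)
print(f"estimated vcmax: {float(vcmax):.2f}")
```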

[15]  arXiv:2405.00021 [pdf, other]
Title: SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Recently, interpreting complex charts with logical reasoning has emerged as a challenge alongside the development of vision-language models. A prior state-of-the-art (SOTA) model, Deplot, presented an end-to-end method that leverages a vision-language model to convert charts into table format and then utilizes Large Language Models (LLMs) for reasoning. However, unlike natural images, charts contain a mix of information that is essential for chart reasoning and information that is irrelevant, and we discover that this characteristic can lower the performance of chart-to-table extraction. In this paper, we introduce SIMPLOT, a method designed to extract only the elements necessary for chart reasoning. The proposed method involves two steps: 1) training to mimic a simple plot that contains only the essential information from a complex chart for table extraction, followed by 2) performing reasoning based on the table. Our model enables accurate chart reasoning without the need for additional annotations or datasets, and its effectiveness is demonstrated through various experiments. Furthermore, we propose a novel prompt that addresses a shortcoming of the recent SOTA model, namely that it ignores visual attributes such as color. Our source code is available at https://github.com/sangwu99/Simplot.

[16]  arXiv:2405.00023 [pdf, ps, other]
Title: Revolutionizing Retail Analytics: Advancing Inventory and Customer Insight with AI
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In response to the significant challenges facing the retail sector, including inefficient queue management, poor demand forecasting, and ineffective marketing, this paper introduces an innovative approach utilizing cutting-edge machine learning technologies. We aim to create an advanced smart retail analytics system (SRAS), leveraging these technologies to enhance retail efficiency and customer engagement. To enhance customer tracking capabilities, a new hybrid architecture is proposed integrating several predictive models. In the first stage of the proposed hybrid architecture for customer tracking, we fine-tuned the YOLOV8 algorithm using a diverse set of parameters, achieving exceptional results across various performance metrics. This fine-tuning process utilized actual surveillance footage from retail environments, ensuring its practical applicability. In the second stage, we explored integrating two sophisticated object-tracking models, BOT-SORT and ByteTrack, with the labels detected by YOLOV8. This integration is crucial for tracing customer paths within stores, which facilitates the creation of accurate visitor counts and heat maps. These insights are invaluable for understanding consumer behavior and improving store operations. To optimize inventory management, we delved into various predictive models, optimizing and contrasting their performance against complex retail data patterns. The GRU model, with its ability to interpret time-series data with long-range temporal dependencies, consistently surpassed other models like Linear Regression, showing 2.873% and 29.31% improvements in R2-score and mAPE, respectively.
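
For the customer-tracking stage, a minimal version of the YOLOv8-plus-tracker setup can be sketched with the ultralytics package as below; the video path is a placeholder, the stock pretrained weights stand in for the paper's fine-tuned model, and passing tracker="bytetrack.yaml" would select ByteTrack instead of BoT-SORT.

```python
from ultralytics import YOLO

# Stock pretrained weights stand in for the fine-tuned YOLOv8 model
model = YOLO("yolov8n.pt")
results = model.track(source="store_camera.mp4", tracker="botsort.yaml",
                      classes=[0], stream=True)

for frame_result in results:
    if frame_result.boxes.id is None:      # no confirmed tracks in this frame
        continue
    for track_id, box in zip(frame_result.boxes.id.tolist(),
                             frame_result.boxes.xyxy.tolist()):
        print(int(track_id), box)          # path points for visitor counts / heat maps
```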

[17]  arXiv:2405.00024 [pdf, ps, other]
Title: Swarm UAVs Communication
Comments: 50 pages, 17 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)

The advancement of cyber-physical systems has opened a new way in disaster management and rescue operations. The usage of UAVs is very promising in this context. UAVs, mainly quadcopters, are small in size and their payload capacity is limited. A single UAV cannot traverse the whole area. Hence multiple UAVs, or swarms of UAVs, come into the picture, managing the entire payload in a modular and equiproportional manner. In this work we have explored a broad range of topics related to UAVs; among UAVs, the quadcopter is the main focus. We explored the types of quadcopters, their flying strategies, their communication protocols, architecture, and controlling techniques, followed by swarm behaviour in nature and in UAVs. Swarm behaviour and a few swarm optimization algorithms have been explored here. Swarm architecture and communication within swarm UAV networks also receive special attention in our work. In disaster management the UAV swarm network has to search a large area, and for this a proper path planning algorithm is required. We have discussed the existing path planning algorithms and their advantages and disadvantages in great detail. Formation maintenance of the swarm network is an important issue, which has been explored through the leader-follower technique. The wireless path loss has been modelled using the Friis and ground reflection models. Using these path loss models we have managed to create the link budget and simulate the variation of communication link performance with distance.
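
For reference, the standard free-space (Friis) and two-ray ground-reflection received-power expressions on which such a link budget is typically built are, in common textbook notation (ours, not the paper's):
\[
P_r^{\mathrm{Friis}}(d) = P_t G_t G_r \left(\frac{\lambda}{4\pi d}\right)^{2}, \qquad
P_r^{\mathrm{two\text{-}ray}}(d) \approx P_t G_t G_r \frac{h_t^{2} h_r^{2}}{d^{4}},
\]
where $P_t$ is the transmit power, $G_t$ and $G_r$ are the antenna gains, $\lambda$ is the wavelength, $d$ is the link distance, and $h_t$, $h_r$ are the antenna heights (the two-ray form holds for sufficiently large $d$, beyond the crossover distance).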

[18]  arXiv:2405.00025 [pdf, other]
Title: Leveraging Pre-trained CNNs for Efficient Feature Extraction in Rice Leaf Disease Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Rice disease classification is a critical task in agricultural research, and in this study, we rigorously evaluate the impact of integrating feature extraction methodologies within pre-trained convolutional neural networks (CNNs). Initial investigations into baseline models, devoid of feature extraction, revealed commendable performance with ResNet-50 and ResNet-101 achieving accuracies of 91% and 92%, respectively. Subsequent integration of Histogram of Oriented Gradients (HOG) yielded substantial improvements across architectures, notably propelling the accuracy of EfficientNet-B7 from 92% to an impressive 97%. Conversely, the application of Local Binary Patterns (LBP) demonstrated more conservative performance enhancements. Moreover, employing Gradient-weighted Class Activation Mapping (Grad-CAM) unveiled that HOG integration resulted in heightened attention to disease-specific features, corroborating the performance enhancements observed. Visual representations further validated HOG's notable influence, showcasing a discernible surge in accuracy across epochs due to focused attention on disease-affected regions. These results underscore the pivotal role of feature extraction, particularly HOG, in refining representations and bolstering classification accuracy. The study's significant highlight was the achievement of 97% accuracy with EfficientNet-B7 employing HOG and Grad-CAM, a noteworthy advancement in optimizing pre-trained CNN-based rice disease identification systems. The findings advocate for the strategic integration of advanced feature extraction techniques with cutting-edge pre-trained CNN architectures, presenting a promising avenue for substantially augmenting the precision and effectiveness of image-based disease classification systems in agricultural contexts.
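
To make the feature-extraction step concrete, a minimal HOG descriptor computation with scikit-image is sketched below; the image size, HOG hyperparameters, and the idea of concatenating the descriptor with CNN features are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog_descriptor(image_gray):
    """Resize a grayscale leaf image and compute its HOG feature vector."""
    image_gray = resize(image_gray, (224, 224))
    return hog(image_gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

leaf = np.random.rand(300, 300)   # stand-in for a grayscale rice-leaf image
features = hog_descriptor(leaf)   # 1D descriptor, e.g. to concatenate with CNN features
print(features.shape)
```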

[19]  arXiv:2405.00026 [pdf, ps, other]
Title: Enhancing Credit Card Fraud Detection: A Neural Network and SMOTE Integrated Approach
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)

Credit card fraud detection is a critical challenge in the financial sector, demanding sophisticated approaches to accurately identify fraudulent transactions. This research proposes an innovative methodology combining Neural Networks (NN) and Synthetic Minority Over-sampling Technique (SMOTE) to enhance the detection performance. The study addresses the inherent imbalance in credit card transaction data, focusing on technical advancements for robust and precise fraud detection. Results demonstrate that the integration of NN and SMOTE exhibits superior precision, recall, and F1-score compared to traditional models, highlighting its potential as an advanced solution for handling imbalanced datasets in credit card fraud detection scenarios. This research contributes to the ongoing efforts to develop effective and efficient mechanisms for safeguarding financial transactions from fraudulent activities.
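
A minimal sketch of an NN-plus-SMOTE pipeline on synthetic imbalanced data is given below; the generated dataset, network size, and hyperparameters are placeholders and do not reproduce the paper's experiments.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic, heavily imbalanced data as a stand-in for credit card transactions
X, y = make_classification(n_samples=10000, n_features=20, weights=[0.98, 0.02],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (fraud) class on the training split only
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Simple feed-forward neural network trained on the rebalanced data
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
clf.fit(X_res, y_res)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```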

[20]  arXiv:2405.00027 [pdf, other]
Title: Multidimensional Compressed Sensing for Spectral Light Field Imaging
Comments: 8 pages, published of VISAPP 2024
Journal-ref: In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP 2024, ISBN 978-989-758-679-8, ISSN 2184-4321, pages 349-356
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectral-coded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work where a light field is vectorized to a 1D signal, our method employs a 5D basis and a novel 5D measurement model, hence matching the intrinsic dimensionality of multi-spectral light fields. We mathematically and empirically show the equivalence of the 5D and 1D sensing models, and most importantly that the 5D framework achieves orders of magnitude faster reconstruction while requiring a small fraction of the memory. Moreover, our new multidimensional sensing model opens new research directions for designing efficient visual data acquisition algorithms and hardware.

[21]  arXiv:2405.00028 [pdf, ps, other]
Title: MaRDIFlow: A CSE workflow framework for abstracting meta-data from FAIR computational experiments
Comments: 13 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Numerical algorithms and computational tools are instrumental in navigating and addressing complex simulation and data processing tasks. The exponential growth of metadata and parameter-driven simulations has led to an increasing demand for automated workflows that can replicate computational experiments across platforms. In general, a computational workflow is defined as a sequential description for accomplishing a scientific objective, often described by tasks and their associated data dependencies. If characterized through input-output relation, workflow components can be structured to allow interchangeable utilization of individual tasks and their accompanying metadata. In the present work, we develop a novel computational framework, namely, MaRDIFlow, that focuses on the automation of abstracting meta-data embedded in an ontology of mathematical objects. This framework also effectively addresses the inherent execution and environmental dependencies by incorporating them into multi-layered descriptions. Additionally, we demonstrate a working prototype with example use cases and methodically integrate them into our workflow tool and data provenance framework. Furthermore, we show how to best apply the FAIR principles to computational workflows, such that abstracted components are Findable, Accessible, Interoperable, and Reusable in nature.

[22]  arXiv:2405.00029 [pdf, ps, other]
Title: Automatic Creative Selection with Cross-Modal Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Application developers advertise their Apps by creating product pages with App images and bidding on search terms. It is then crucial for App images to be highly relevant to the search terms. Solutions to this problem require an image-text matching model to predict the quality of the match between the chosen image and the search terms. In this work, we present a novel approach to matching an App image to search terms based on fine-tuning a pre-trained LXMERT model. We show that compared to the CLIP model and a baseline using a Transformer model for search terms and a ResNet model for images, we significantly improve the matching accuracy. We evaluate our approach using two sets of labels: advertiser-associated (image, search term) pairs for a given application, and human ratings of the relevance between (image, search term) pairs. Our approach achieves an AUC score of 0.96 for the advertiser-associated ground truth, outperforming the Transformer+ResNet baseline and the fine-tuned CLIP model by 8% and 14%. For the human-labeled ground truth, our approach achieves an AUC score of 0.95, outperforming the Transformer+ResNet baseline and the fine-tuned CLIP model by 16% and 17%.

[23]  arXiv:2405.00030 [pdf, other]
Title: DeepOps & SLURM: Your GPU Cluster Guide
Authors: Arindam Majee
Comments: 32 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In the ever-evolving landscape of deep learning, unlocking the potential of cutting-edge models demands computational resources that surpass the capabilities of individual machines. Enter the NVIDIA DeepOps Slurm cluster, a meticulously orchestrated symphony of high-performance nodes, each equipped with powerful GPUs and managed by the efficient Slurm resource allocation system. This guide serves as your comprehensive roadmap, empowering you to harness the immense parallel processing capabilities of this cluster and propel your deep learning endeavors to new heights. Whether you are a seasoned deep learning practitioner seeking to optimize performance or a newcomer eager to unlock the power of parallel processing, this guide caters to your needs. We will delve into the intricacies of the cluster hardware architecture, exploring the capabilities of its GPUs and the underlying network fabric. You will master the art of leveraging DeepOps containers for efficient and reproducible workflows, fine-tune resource configurations for optimal performance, and confidently submit jobs to unleash the full potential of parallel processing.

[24]  arXiv:2405.00031 [pdf, other]
Title: SegNet: A Segmented Deep Learning based Convolutional Neural Network Approach for Drones Wildfire Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

This research addresses the pressing challenge of enhancing processing times and detection capabilities in Unmanned Aerial Vehicle (UAV)/drone imagery for global wildfire detection, despite limited datasets. Proposing a Segmented Neural Network (SegNet) selection approach, we focus on reducing feature maps to boost both time resolution and accuracy, significantly advancing processing speeds and accuracy in real-time wildfire detection. This paper contributes to increased processing speeds enabling real-time detection capabilities for wildfire, increased detection accuracy of wildfire, and improved detection capabilities of early wildfire, by proposing a new direction for image classification of amorphous objects like fire, water, smoke, etc. We employ Convolutional Neural Networks (CNNs) for image classification, emphasizing the reduction of irrelevant features vital for deep learning processes, especially in live-feed data for fire detection. Amidst the complexity of live-feed data in fire detection, our study emphasizes the image feed, highlighting the urgency to enhance real-time processing. Our proposed algorithm combats feature overload through segmentation, addressing challenges arising from diverse features like objects, colors, and textures. Notably, a delicate balance of feature map size and dataset adequacy is pivotal. Several research papers use smaller image sizes, compromising feature richness, which necessitates a new approach. We illuminate the critical role of pixel density in retaining essential details, especially for early wildfire detection. By carefully selecting the number of filters during training, we underscore the significance of higher pixel density for proper feature selection. The proposed SegNet approach is rigorously evaluated using a real-world dataset obtained by a drone flight and compared to the state-of-the-art literature.

[25]  arXiv:2405.00038 [pdf, other]
Title: Getting a Handle on Unmanaged Memory
Subjects: Programming Languages (cs.PL); Operating Systems (cs.OS)

The inability to relocate objects in unmanaged languages brings with it a menagerie of problems. Perhaps the most impactful is memory fragmentation, which has long plagued applications such as databases and web servers. These issues either fester or require Herculean programmer effort to address on a per-application basis because, in general, heap objects cannot be moved in unmanaged languages. In contrast, managed languages like C# cleanly address fragmentation through the use of compacting garbage collection techniques built upon heap object movement. In this work, we bridge this gap between unmanaged and managed languages through the use of handles, a level of indirection allowing heap object movement. Handles open the door to seamlessly employ runtime features from managed languages in existing, unmodified code written in unmanaged languages. We describe a new compiler and runtime system, ALASKA, that acts as a drop-in replacement for malloc. Without any programmer effort, the ALASKA compiler transforms pointer-based code to utilize handles, with optimizations to reduce performance impact. A codesigned runtime system manages this level of indirection and exploits heap object movement via an extensible service interface. We investigate the overheads of ALASKA on large benchmarks and applications spanning multiple domains. To show the power and extensibility of handles, we use ALASKA to eliminate fragmentation on the heap through compaction, reducing memory usage by up to 40% in Redis.

[26]  arXiv:2405.00040 [pdf, ps, other]
Title: A guideline for the methodology chapter in computer science dissertations
Authors: Marco Araujo
Subjects: General Literature (cs.GL)

Rather than simply offering suggestions, this guideline for the methodology chapter in computer science dissertations provides thorough insights on how to develop a strong research methodology within the area of computer science. The guideline is structured into several parts, starting with an overview of research strategies, which include experiments, surveys, interviews and case studies. The guide highlights the significance of defining a research philosophy and reasoning by discussing paradigms such as positivism, constructivism and pragmatism. Besides, it highlights the importance of different types of research, including deductive and inductive methodologies and basic versus applied research approaches. Moreover, this guideline discusses the intricacies of data collection and analysis, dividing data into quantitative and qualitative typologies. It explains different ways in which data can be collected, from observation to experimentation, interviews or surveys. It also addresses ethical considerations in research, emphasizing ethical behavior such as following academic principles. In general, this guideline is an essential tool for undertaking computer science dissertations, helping researchers structure their work while maintaining ethical standards in their study design.

[27]  arXiv:2405.00055 [pdf, other]
Title: A Hybrid Probabilistic Battery Health Management Approach for Robust Inspection Drone Operations
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Health monitoring of remote critical infrastructure is a complex and expensive activity due to the limited infrastructure accessibility. Inspection drones are ubiquitous assets that enhance the reliability of critical infrastructures through improved accessibility. However, due to the harsh operation environment, it is crucial to monitor their health to ensure successful inspection operations. The battery is a key component that determines the overall reliability of the inspection drones and, with an appropriate health management approach, contributes to reliable and robust inspections. In this context, this paper presents a novel hybrid probabilistic approach for battery end-of-discharge (EOD) voltage prediction of Li-Po batteries. The hybridization is achieved in an error-correction configuration, which combines physics-based discharge and probabilistic error-correction models to quantify the aleatoric and epistemic uncertainty. The performance of the hybrid probabilistic methodology was empirically evaluated on a dataset comprising EOD voltage under varying load conditions. The dataset was obtained from real inspection drones operated on different flights, focused on offshore wind turbine inspections. The proposed approach has been tested with different probabilistic methods and demonstrates 14.8% improved performance in probabilistic accuracy compared to the best probabilistic method. In addition, aleatoric and epistemic uncertainties provide robust estimations to enhance the diagnosis of battery health-states.

[28]  arXiv:2405.00056 [pdf, other]
Title: Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation
Comments: 13 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2312.09953
Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)

Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize the age of information (AoI) of sensory data, where balancing the trade-off between the UAVs' movements and AoI is formulated as a mean field game (MFG). The MFG optimization yields an expansive solution space encompassing continuous state and action, resulting in significant computational complexity. To address practical situations, we propose a new mean field hybrid proximal policy optimization (MF-HPPO) scheme to minimize the average AoI by optimizing the UAVs' trajectories and the data collection scheduling of the ground sensors given mixed continuous and discrete actions. Furthermore, a long short-term memory (LSTM) network is leveraged in MF-HPPO to predict the time-varying network state and stabilize the training. Numerical results demonstrate that the proposed MF-HPPO reduces the average AoI by up to 45 percent and 57 percent in the considered simulation setting, as compared to the multi-agent deep Q-learning (MADQN) method and a non-learning random algorithm, respectively.

[29]  arXiv:2405.00062 [pdf, ps, other]
Title: Hardware Accelerators for Autonomous Cars: A Review
Comments: 14 pages, 15 figures, 4 tables
Subjects: Hardware Architecture (cs.AR); Robotics (cs.RO)

Autonomous Vehicles (AVs) redefine transportation with sophisticated technology, integrating sensors, cameras, and intricate algorithms. Implementing machine learning in AV perception demands robust hardware accelerators to achieve real-time performance at reasonable power consumption and footprint. A lot of research and development effort using different technologies is still being conducted to achieve the goal of a fully autonomous vehicle, and some car manufacturers offer commercially available systems. Unfortunately, they still lack reliability because of the repeated accidents they have encountered, such as the recent one which happened in California and for which the Cruise company had its license suspended by the state of California for an undetermined period [1]. This paper critically reviews the most recent findings of machine vision systems used in AVs from both hardware and algorithmic points of view. It discusses the technologies used in commercial cars with their pros and cons and suggests possible ways forward. Thus, the paper can be a tangible reference for researchers who have the opportunity to get involved in designing machine vision systems targeting AVs.

[30]  arXiv:2405.00066 [pdf, other]
Title: Research and application of artificial intelligence based webshell detection model: A literature review
Comments: 21 pages, 6 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Webshell, as the "culprit" behind numerous network attacks, is one of the research hotspots in the field of cybersecurity. However, the complexity, stealthiness, and confusing nature of webshells pose significant challenges to the corresponding detection schemes. With the rise of Artificial Intelligence (AI) technology, researchers have started to apply different intelligent algorithms and neural network architectures to the task of webshell detection. However, the related research still lacks a systematic and standardized methodological process, which is confusing and redundant. Therefore, following the development timeline, we carefully summarize the progress of relevant research in this field, dividing it into three stages: Start Stage, Initial Development Stage, and In-depth Development Stage. We further elaborate on the main characteristics and core algorithms of each stage. In addition, we analyze the pain points and challenges that still exist in this field and predict the future development trend of this field from our point of view. To the best of our knowledge, this is the first review that details the research related to AI-based webshell detection. It is also hoped that this paper can provide detailed technical information for more researchers interested in AI-based webshell detection tasks.

[31]  arXiv:2405.00073 [pdf, other]
Title: Loyal Wingman Assessment: Social Navigation for Human-Autonomous Collaboration in Simulated Air Combat
Subjects: Human-Computer Interaction (cs.HC)

This study proposes social navigation metrics for autonomous agents in air combat, aiming to facilitate their smooth integration into pilot formations. The absence of such metrics poses challenges to safety and effectiveness in mixed human-autonomous teams. The proposed metrics prioritize naturalness and comfort. We suggest validating them through a user study involving military pilots in simulated air combat scenarios alongside autonomous loyal wingmen. The experiment will involve setting up simulations, designing scenarios, and evaluating performance using feedback from questionnaires and data analysis. These metrics aim to enhance the operational performance of autonomous loyal wingmen, thereby contributing to safer and more strategic air combat.

[32]  arXiv:2405.00074 [pdf, other]
Title: PAODING: A High-fidelity Data-free Pruning Toolkit for Debloating Pre-trained Neural Networks
Comments: 3 pages
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)

We present PAODING, a toolkit to debloat pretrained neural network models through the lens of data-free pruning. To preserve the model fidelity, PAODING adopts an iterative process, which dynamically measures the effect of deleting a neuron to identify candidates that have the least impact on the output layer. Our evaluation shows that PAODING can significantly reduce the model size, generalize on different datasets and models, and meanwhile preserve the model fidelity in terms of test accuracy and adversarial robustness. PAODING is publicly available on PyPI via https://pypi.org/project/paoding-dl.

[33]  arXiv:2405.00076 [pdf, ps, other]
Title: On Correcting SHAP Scores
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent work uncovered examples of classifiers for which SHAP scores yield misleading feature attributions. While such examples might be perceived as suggesting the inadequacy of Shapley values for explainability, this paper shows that the source of the identified shortcomings of SHAP scores resides elsewhere. Concretely, the paper makes the case that the failings of SHAP scores result from the characteristic functions used in earlier works. Furthermore, the paper identifies a number of properties that characteristic functions ought to respect, and proposes several novel characteristic functions, each exhibiting one or more of the desired properties. More importantly, some of the characteristic functions proposed in this paper are guaranteed not to exhibit any of the shortcomings uncovered by earlier work. The paper also investigates the impact of the new characteristic functions on the complexity of computing SHAP scores. Finally, the paper proposes modifications to the tool SHAP to use instead one of our novel characteristic functions, thereby eliminating some of the limitations reported for SHAP scores.
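
For context, SHAP scores instantiate the Shapley value of a cooperative game whose characteristic function $v$ is derived from the classifier; for a feature $i$ in the feature set $F$, the score is (standard definition, with our notation):
\[
\phi_i(v) \;=\; \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\bigl(|F|-|S|-1\bigr)!}{|F|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr),
\]
so different choices of $v$ (e.g., conditional versus interventional expectations of the model's output) yield different attributions, which is precisely the design space examined in the paper.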

[34]  arXiv:2405.00077 [pdf, other]
Title: BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samples, and (3) sampling misalignment, due to instrumental limitations, impacting downstream brain network analysis and clinical outcome predictions. In this work, we propose a novel model called BrainODE to achieve continuous modeling of dynamic brain signals using Ordinary Differential Equations (ODE). By learning latent initial values and neural ODE functions from irregular time series, BrainODE effectively reconstructs brain signals at any time point, mitigating the aforementioned three data challenges of brain signals altogether. Comprehensive experimental results on real-world neuroimaging datasets demonstrate the superior performance of BrainODE and its capability of addressing the three data challenges.

[35]  arXiv:2405.00078 [pdf, other]
Title: Mitigating Spectre-PHT using Speculation Barriers in Linux BPF
Subjects: Cryptography and Security (cs.CR); Operating Systems (cs.OS)

High-performance IO demands low-overhead communication between user- and kernel space. This demand can no longer be fulfilled by traditional system calls. Linux's extended Berkeley Packet Filter (BPF) avoids user-/kernel transitions by just-in-time compiling user-provided bytecode and executing it in kernel mode with near-native speed. To still isolate BPF programs from the kernel, they are statically analyzed for memory- and type-safety, which imposes some restrictions but allows for good expressiveness and high performance. However, to mitigate the Spectre vulnerabilities disclosed in 2018, defenses which reject potentially-dangerous programs had to be deployed. We find that this affects 24% to 54% of programs in a dataset with 844 real-world BPF programs from popular open-source projects. To solve this, users are forced to disable the defenses to continue using the programs, which puts the entire system at risk.
To enable secure and expressive untrusted Linux kernel extensions, we propose Berrify, an enhancement to the kernel's Spectre defenses that reduces the number of BPF application programs rejected from 54% to zero. We measure Berrify's overhead for all mainstream performance-sensitive applications of BPF (i.e., event tracing, profiling, and packet processing) and find that it improves significantly upon the status-quo where affected BPF programs are either unusable or enable transient execution attacks on the kernel.

[36]  arXiv:2405.00080 [pdf, other]
Title: Recommendation-aided Caching using Combinatorial Multi-armed Bandits
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Networking and Internet Architecture (cs.NI)

We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We assume a fixed set of contents with unknown user preferences and content popularities. We can recommend a subset of the contents to the users which encourages the users to request these contents. Recommendation can thus be used to increase cache hits. We formulate the cache hit optimization problem as a combinatorial multi-armed bandit (CMAB). We propose a UCB-based algorithm to decide which contents to cache and recommend. We provide an upper bound on the regret of our algorithm. We numerically demonstrate the performance of our algorithm and compare it to state-of-the-art algorithms.
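
A stripped-down combinatorial UCB loop for the cache-selection part of the problem is sketched below; the Bernoulli hit model, confidence-bound form, and constants are simplifications chosen for illustration and omit the recommendation decision entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
n_contents, cache_size, horizon = 20, 5, 5000
true_hit_prob = rng.uniform(0.05, 0.6, n_contents)   # unknown to the learner

counts = np.zeros(n_contents)
means = np.zeros(n_contents)
for t in range(1, horizon + 1):
    ucb = means + np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb[counts == 0] = np.inf                  # play every content at least once
    cached = np.argsort(ucb)[-cache_size:]     # cache the top-k UCB contents
    hits = rng.random(cache_size) < true_hit_prob[cached]   # semi-bandit feedback
    counts[cached] += 1
    means[cached] += (hits - means[cached]) / counts[cached]

print("estimated best contents to cache:", np.sort(np.argsort(means)[-cache_size:]))
```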

[37]  arXiv:2405.00099 [pdf, other]
Title: Creative Beam Search
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Large language models are revolutionizing several areas, including artificial creativity. However, the process of generation in machines profoundly diverges from that observed in humans. In particular, machine generation is characterized by a lack of intentionality and of an underlying creative process. We propose a method called Creative Beam Search that uses Diverse Beam Search and LLM-as-a-Judge to perform response generation and response validation. The results of a qualitative experiment show how our approach can provide better output than standard sampling techniques. We also show that the response validation step is a necessary complement to the response generation step.
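
The response-generation half of this pipeline can be approximated with the Diverse Beam Search implementation in Hugging Face Transformers, as in the sketch below; the model, prompt, and penalty values are placeholders, and the LLM-as-a-Judge validation step is not shown.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Write a short poem about the sea:", return_tensors="pt")
# Diverse Beam Search: beams are split into groups penalized for repeating each other
outputs = model.generate(**inputs, max_new_tokens=40, num_beams=6,
                         num_beam_groups=3, diversity_penalty=1.0,
                         num_return_sequences=3)
for seq in outputs:
    print(tok.decode(seq, skip_special_tokens=True))
```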

[38]  arXiv:2405.00109 [pdf, other]
Title: Successive Interference Cancellation for ISAC in a Large Full-Duplex Cellular Network
Subjects: Information Theory (cs.IT)

To reuse the scarce spectrum efficiently, a large full-duplex cellular network with integrated sensing and communication (ISAC) is studied. Monostatic detection at the base station (BS) is considered. At the BS, we receive two signals: the communication-mode uplink signal to be decoded and the radar-mode signal to be detected. After self-interference cancellation (SIC), inspired by NOMA, successive interference cancellation (SuIC) is a natural strategy at the BS to retrieve both signals. However, the ordering of SuIC, usually based on some measure of channel strength, is not clear as the radar-mode target is unknown. The detection signal suffers a double path-loss making it vulnerable, but the uplink signal to be decoded originates at a user which has much lower power than the BS making it weak as well. Further, the intercell interference from a large network reduces the channel disparity between the two signals. We investigate the impact of both SuIC orders at the BS, i.e., decoding $1^{st}$ or detecting $1^{st}$ and highlight the importance of careful order selection. We find the existence of a threshold target distance before which detecting $1^{st}$ is superior and decoding $2^{nd}$ does not suffer much. After this distance, both decoding $1^{st}$ and detecting $2^{nd}$ is superior. Similarly, a threshold UE power exists after which the optimum SuIC order changes. We consider imperfections in SIC; this helps highlight the vulnerability of the decoding and detection in the setup.
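
As a reminder of what the two SuIC orders entail, write $P_u$ for the received uplink power, $P_r$ for the received target-echo power, $I$ for intercell interference, and $N$ for noise (our notation, assuming residual self-interference is absorbed into $N$ and cancellation is perfect). Decoding first gives
\[
\mathrm{SINR}_{\mathrm{decode}} = \frac{P_u}{P_r + I + N}, \qquad \mathrm{SINR}_{\mathrm{detect}} = \frac{P_r}{I + N},
\]
whereas detecting first swaps the roles of $P_u$ and $P_r$, so the preferable order depends on which received power dominates - the disparity governed by the target distance and the UE transmit power discussed above.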

[39]  arXiv:2405.00117 [pdf, ps, other]
Title: Training a high-performance retinal foundation model with half-the-data and 400 times less compute
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Artificial Intelligence holds tremendous potential in medicine, but is traditionally limited by the lack of massive datasets to train models on. Foundation models, pre-trained models that can be adapted to downstream tasks with small datasets, could alleviate this problem. Researchers at Moorfields Eye Hospital (MEH) proposed RETFound-MEH, a foundation model for retinal imaging that was trained on 900,000 images, including private hospital data. Recently, data-efficient DERETFound was proposed that provides comparable performance while being trained on only 150,000 images that are all publicly available. However, both these models required very substantial resources to train initially and are resource-intensive in downstream use. We propose a novel Token Reconstruction objective that we use to train RETFound-Green, a retinal foundation model trained using only 75,000 publicly available images and 400 times less compute. We estimate the cost of training RETFound-MEH and DERETFound at $10,000 and $14,000, respectively, while RETFound-Green could be trained for less than $100, with equally reduced environmental impact. RETFound-Green is also far more efficient in downstream use: it can be downloaded 14 times faster, computes vector embeddings 2.7 times faster which then require 2.6 times less storage space. Despite this, RETFound-Green does not perform systematically worse. In fact, it performs best on 14 tasks, compared to six for DERETFound and two for RETFound-MEH. Our results suggest that RETFound-Green is a very efficient, high-performance retinal foundation model. We anticipate that our Token Reconstruction objective could be scaled up for even higher performance and be applied to other domains beyond retinal imaging.

[40]  arXiv:2405.00123 [pdf, other]
Title: Graph Neural Network Approach to Semantic Type Detection in Tables
Journal-ref: In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 121-133. Singapore: Springer Nature Singapore, 2024
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

This study addresses the challenge of detecting semantic column types in relational tables, a key task in many real-world applications. While language models like BERT have improved prediction accuracy, their token input constraints limit the simultaneous processing of intra-table and inter-table information. We propose a novel approach using Graph Neural Networks (GNNs) to model intra-table dependencies, allowing language models to focus on inter-table information. Our proposed method not only outperforms existing state-of-the-art algorithms but also offers novel insights into the utility and functionality of various GNN types for semantic type detection. The code is available at https://github.com/hoseinzadeehsan/GAIT
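As a rough illustration of modeling intra-table dependencies with a GNN, the sketch below runs one round of mean-aggregation message passing over a fully connected graph of table columns; the random embeddings, graph construction, and layer weights are assumptions for illustration, not the paper's architecture:

    # Minimal sketch (assumptions, not the paper's architecture): one round of
    # mean-aggregation message passing over a fully connected graph whose nodes
    # are the columns of a single table. Random vectors stand in for the
    # language-model column embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    num_columns, dim = 5, 16
    col_embeddings = rng.normal(size=(num_columns, dim))   # stand-in for BERT features

    # Fully connected intra-table graph without self-loops.
    adj = np.ones((num_columns, num_columns)) - np.eye(num_columns)
    deg = adj.sum(axis=1, keepdims=True)

    # One GNN layer: aggregate neighbouring columns, then apply a learnable projection.
    W = rng.normal(size=(dim, dim)) * 0.1                  # hypothetical layer weights
    neighbour_mean = (adj @ col_embeddings) / deg
    updated = np.tanh((col_embeddings + neighbour_mean) @ W)
    print(updated.shape)   # (5, 16): one refined embedding per column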

[41]  arXiv:2405.00127 [pdf, other]
Title: GPU-friendly Stroke Expansion
Subjects: Graphics (cs.GR)

Vector graphics includes both filled and stroked paths as the main primitives. While there are many techniques for rendering filled paths on GPU, stroked paths have proved more elusive. This paper presents a technique for performing stroke expansion, namely the generation of the outline representing the stroke of the given input path. Stroke expansion is a global problem, with challenging constraints on continuity and correctness. Nonetheless, we implement it using a fully parallel algorithm suitable for execution in a GPU compute shader, with minimal preprocessing. The output of our method can be either line or circular arc segments, both of which are well suited to GPU rendering, and the number of segments is minimal. We introduce several novel techniques, including an encoding of vector graphics primitives suitable for parallel processing, and an Euler spiral based method for computing approximations to parallel curves and evolutes.

[42]  arXiv:2405.00129 [pdf, other]
Title: Complex contagions can outperform simple contagions for network reconstruction with dense networks or saturated dynamics
Comments: 8 pages, 5 figures
Subjects: Social and Information Networks (cs.SI); Populations and Evolution (q-bio.PE); Machine Learning (stat.ML)

Network scientists often use complex dynamic processes to describe network contagions, but tools for fitting contagion models typically assume simple dynamics. Here, we address this gap by developing a nonparametric method to reconstruct a network and dynamics from a series of node states, using a model that breaks the dichotomy between simple pairwise and complex neighborhood-based contagions. We then show that a network is more easily reconstructed when observed through the lens of complex contagions if it is dense or the dynamic saturates, and that simple contagions are better otherwise.

[43]  arXiv:2405.00131 [pdf, other]
Title: Finding Diverse Strings and Longest Common Subsequences in a Graph
Comments: Proceedings of 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024), Leibniz International Proceedings in Informatics, Vol.296, pp.21:0-21:17, June 2024
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Formal Languages and Automata Theory (cs.FL)

In this paper, we study for the first time the Diverse Longest Common Subsequences (LCSs) problem under Hamming distance. Given a set of a constant number of input strings, the problem asks to decide if there exists some subset $\mathcal X$ of $K$ longest common subsequences whose diversity is no less than a specified threshold $\Delta$, where we consider two types of diversities of a set $\mathcal X$ of strings of equal length: the Sum diversity and the Min diversity, defined as the sum and the minimum of the pairwise Hamming distances between any two strings in $\mathcal X$, respectively. We analyze the computational complexity of the respective problems with Sum- and Min-diversity measures, called the Max-Sum and Max-Min Diverse LCSs, respectively, considering both approximation algorithms and parameterized complexity. Our results are summarized as follows. When $K$ is bounded, both problems are polynomial-time solvable. In contrast, when $K$ is unbounded, both problems become NP-hard, while the Max-Sum Diverse LCSs problem admits a PTAS. Furthermore, we analyze the parameterized complexity of both problems with combinations of parameters $K$ and $r$, where $r$ is the length of the candidate strings to be selected. Importantly, all positive results above are proven in a more general setting, where an input is an edge-labeled directed acyclic graph (DAG) that succinctly represents a set of strings of the same length. Negative results are proven in the setting where an input is explicitly given as a set of strings. The latter results are accompanied by an encoding of such a string set as the longest common subsequences of a specific set of input strings.
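For concreteness, the two diversity measures can be computed directly on an explicit set of equal-length strings as in the sketch below (the DAG-based succinct encoding is not shown; the toy strings are illustrative only):

    # Sketch of the two diversity measures from the abstract, computed on an
    # explicit set of equal-length strings.
    from itertools import combinations

    def hamming(s, t):
        return sum(a != b for a, b in zip(s, t))

    def sum_diversity(strings):
        return sum(hamming(s, t) for s, t in combinations(strings, 2))

    def min_diversity(strings):
        return min(hamming(s, t) for s, t in combinations(strings, 2))

    X = ["ACGT", "AGGT", "TCGA"]                # a candidate set of K = 3 LCSs (toy example)
    print(sum_diversity(X), min_diversity(X))   # compare against the threshold Delta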

[44]  arXiv:2405.00134 [pdf, other]
Title: Transforming Dutch: Debiasing Dutch Coreference Resolution Systems for Non-binary Pronouns
Comments: 22 pages, 2 figures. Accepted at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Gender-neutral pronouns are increasingly being introduced across Western languages. Recent evaluations have however demonstrated that English NLP systems are unable to correctly process gender-neutral pronouns, with the risk of erasing and misgendering non-binary individuals. This paper examines a Dutch coreference resolution system's performance on gender-neutral pronouns, specifically hen and die. In Dutch, these pronouns were only introduced in 2016, compared to the longstanding existence of singular they in English. We additionally compare two debiasing techniques for coreference resolution systems in non-binary contexts: Counterfactual Data Augmentation (CDA) and delexicalisation. Moreover, because pronoun performance can be hard to interpret from a general evaluation metric like LEA, we introduce an innovative evaluation metric, the pronoun score, which directly represents the portion of correctly processed pronouns. Our results reveal diminished performance on gender-neutral pronouns compared to gendered counterparts. Nevertheless, although delexicalisation fails to yield improvements, CDA substantially reduces the performance gap between gendered and gender-neutral pronouns. We further show that CDA remains effective in low-resource settings, in which a limited set of debiasing documents is used. This efficacy extends to previously unseen neopronouns, which are currently infrequently used but may gain popularity in the future, underscoring the viability of effective debiasing with minimal resources and low computational costs.
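A minimal sketch of such a pronoun score, assuming a simplified per-pronoun representation of predicted and gold antecedents (the paper's exact implementation may differ):

    # Sketch of a pronoun score as described above: the proportion of pronouns
    # whose coreference link is resolved correctly. The data structure below is
    # hypothetical.
    def pronoun_score(examples):
        """examples: list of dicts with 'pronoun', 'predicted_antecedent', 'gold_antecedent'."""
        if not examples:
            return 0.0
        correct = sum(ex["predicted_antecedent"] == ex["gold_antecedent"] for ex in examples)
        return correct / len(examples)

    examples = [
        {"pronoun": "hen", "predicted_antecedent": "Alex", "gold_antecedent": "Alex"},
        {"pronoun": "die", "predicted_antecedent": "Sam", "gold_antecedent": "Kim"},
    ]
    print(pronoun_score(examples))  # 0.5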

[45]  arXiv:2405.00135 [pdf, other]
Title: Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach
Comments: This work has been submitted to the IEEE Communications Letters
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Task-oriented semantic communications (TSC) enhance radio resource efficiency by transmitting task-relevant semantic information. However, current research often overlooks the inherent semantic distinctions among encoded features. Due to unavoidable channel variations from time and frequency-selective fading, semantically sensitive feature units could be more susceptible to erroneous inference if corrupted by dynamic channels. Therefore, this letter introduces a unified channel-resilient TSC framework via information bottleneck. This framework complements existing TSC approaches by controlling information flow to capture fine-grained feature-level semantic robustness. Experiments on a case study for real-time subchannel allocation validate the framework's effectiveness.

[46]  arXiv:2405.00136 [pdf, other]
Title: Data-Driven Permissible Safe Control with Barrier Certificates
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)

This paper introduces a method of identifying a maximal set of safe strategies from data for stochastic systems with unknown dynamics using barrier certificates. The first step is learning the dynamics of the system via Gaussian process (GP) regression and obtaining probabilistic errors for this estimate. Then, we develop an algorithm for constructing piecewise stochastic barrier functions to find a maximal permissible strategy set using the learned GP model, which is based on sequentially pruning the worst controls until a maximal set is identified. The permissible strategies are guaranteed to maintain probabilistic safety for the true system. This is especially important for learning-enabled systems, because a rich strategy space enables additional data collection and complex behaviors while remaining safe. Case studies on linear and nonlinear systems demonstrate that increasing the size of the dataset for learning the system grows the permissible strategy set.

[47]  arXiv:2405.00138 [pdf, other]
Title: Rolling in the Shadows: Analyzing the Extraction of MEV Across Layer-2 Rollups
Subjects: Cryptography and Security (cs.CR)

The emergence of decentralized finance has transformed asset trading on the blockchain, making traditional financial instruments more accessible while also introducing a series of exploitative economic practices known as Maximal Extractable Value (MEV). Concurrently, decentralized finance has embraced rollup-based Layer-2 solutions to facilitate asset trading at reduced transaction costs compared to Layer-1 solutions such as Ethereum. However, rollups lack a public mempool like Ethereum, making the extraction of MEV more challenging. In this paper, we investigate the prevalence and impact of MEV on Ethereum and prominent rollups such as Arbitrum, Optimism, and zkSync over a nearly three-year period. Our analysis encompasses various metrics including volume, profits, costs, competition, and response time to MEV opportunities. We discover that MEV is widespread on rollups, with trading volume comparable to Ethereum. We also find that, although MEV costs are lower on rollups, profits are also significantly lower compared to Ethereum. Additionally, we examine the prevalence of sandwich attacks on rollups. While our findings did not detect any sandwiching activity on popular rollups, we did identify the potential for cross-layer sandwich attacks facilitated by transactions that are sent across rollups and Ethereum. Consequently, we propose and evaluate the feasibility of three novel attacks that exploit cross-layer transactions, revealing that attackers could have already earned approximately 2 million USD through cross-layer sandwich attacks.

[48]  arXiv:2405.00139 [pdf, ps, other]
Title: AfricAIED 2024: 2nd Workshop on Artificial Intelligence in Education in Africa
Comments: Satellite workshop at the 25th International Conference on Artificial Intelligence in Education
Subjects: Computers and Society (cs.CY)

Recent AI advancements offer transformative potential for global education, yet their application often overlooks Africa's unique educational landscape. AfricAIED 2024 will address this gap, spotlighting efforts to develop AI in Education (AIED) systems tailored to Africa's needs. Building on the success of the inaugural workshop, AfricAIED 2024 will feature an online AI Hackathon focused on democratizing preparation for Ghana's National Science & Maths Quiz (NSMQ). Participants will create open-source AI tools leveraging resources from the Brilla AI project to level the academic playing field and enhance science and math education across Africa. The workshop will showcase top competitors' solutions, invite discussions on AIED opportunities and challenges in Africa, and highlight the latest advancements in AI education integration. AfricAIED 2024 aims to foster collaboration and innovation, amplifying African voices in the AIED community and driving positive change in African education through AI.

[49]  arXiv:2405.00141 [pdf, ps, other]
Title: RIS-aided Wireless Communication with Movable Elements: Geometry Impact on Performance
Comments: 5 pages, 4 figures
Subjects: Systems and Control (eess.SY)

Reconfigurable Intelligent Surfaces (RIS) are known as a promising technology to improve the performance of wireless communication networks, and have been extensively studied. Movable Antennas (MA) are a novel technology that fully exploits antenna placement to enhance system performance. This article aims at evaluating the impact of transmit power and the number of antenna elements on the outage probability performance of an MA-enabled RIS structure (MA-RIS), compared to the existing Fixed-Position Antenna RIS (FPA-RIS). The change in geometry caused by the movement of antennas, and its implications for the effective number of illuminated elements, are studied for 1D and 2D array structures. Our numerical results confirm the performance advantage provided by MA-RIS, achieving a 24% improvement in outage probability and a 2 dB gain in Signal-to-Noise Ratio (SNR) compared to FPA-RIS.

[50]  arXiv:2405.00142 [pdf, other]
Title: Utilizing Machine Learning and 3D Neuroimaging to Predict Hearing Loss: A Comparative Analysis of Dimensionality Reduction and Regression Techniques
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In this project, we explore machine learning approaches for predicting hearing loss thresholds from 3D images of the brain's gray matter. We address the problem in two phases. In the first phase, we use a 3D CNN model to encode the high-dimensional input into a latent space and decode it back to the original image, so that the input is represented in a rich feature space. In the second phase, we use this model to map the input to these rich features and train standard machine learning models on them to predict hearing thresholds. We experiment with autoencoders and variational autoencoders for dimensionality reduction in the first phase, and explore random forest, XGBoost, and multi-layer perceptron regressors for the thresholds. Splitting the given dataset into training and testing sets, we obtain ranges of 8.80 and 22.57 for PT500 and PT4000 on the test set, respectively, with the multi-layer perceptron achieving the lowest RMSE among the evaluated models.
Our approach leverages the unique capabilities of VAEs to capture complex, non-linear relationships within high-dimensional neuroimaging data. We rigorously evaluated the models using various metrics, focusing on the root mean squared error (RMSE). The results highlight the efficacy of the multi-layer neural network model, which outperformed other techniques in terms of accuracy. This project advances the application of data mining in medical diagnostics and enhances our understanding of age-related hearing loss through innovative machine-learning frameworks.
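A heavily simplified sketch of the two-phase pipeline follows, with PCA standing in for the 3D autoencoder and synthetic arrays standing in for gray-matter images and thresholds; it only illustrates the reduce-then-regress structure, not the actual models or data:

    # Two-phase sketch of the approach described above, with heavy simplifications:
    # PCA stands in for the 3D autoencoder used for dimensionality reduction, and
    # synthetic arrays stand in for gray-matter images and hearing thresholds.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4096))              # flattened stand-in for 3D gray-matter images
    y = rng.normal(loc=20, scale=10, size=200)    # stand-in for PT500 thresholds (dB)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    # Phase 1: compress images into a low-dimensional feature space.
    pca = PCA(n_components=32).fit(X_tr)
    Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

    # Phase 2: regress hearing thresholds from the learned features.
    mlp = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0).fit(Z_tr, y_tr)
    rmse = mean_squared_error(y_te, mlp.predict(Z_te)) ** 0.5
    print(f"test RMSE: {rmse:.2f}")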

[51]  arXiv:2405.00144 [pdf, ps, other]
Title: Greater benefits of deep learning-based computer-aided detection systems for finding small signals in 3D volumetric medical images
Subjects: Human-Computer Interaction (cs.HC)

Purpose: Radiologists are tasked with visually scrutinizing large amounts of data produced by 3D volumetric imaging modalities. Small signals can go unnoticed during the 3D search because they are hard to detect in the visual periphery. Recent advances in machine learning and computer vision have led to effective computer-aided detection (CADe) support systems with the potential to mitigate perceptual errors.
Approach: Sixteen non-expert observers searched through digital breast tomosynthesis (DBT) phantoms and single cross-sectional slices of the DBT phantoms. The 3D/2D searches occurred with and without a convolutional neural network (CNN)-based CADe support system. The model provided observers with bounding boxes superimposed on the image stimuli while they looked for a small microcalcification signal and a large mass signal. Eye gaze positions were recorded and correlated with changes in the area under the ROC curve (AUC).
Results: The CNN-CADe improved the 3D search for the small microcalcification signal (delta AUC = 0.098, p = 0.0002) and the 2D search for the large mass signal (delta AUC = 0.076, p = 0.002). The CNN-CADe benefit in 3D for the small signal was markedly greater than in 2D (delta delta AUC = 0.066, p = 0.035). Analysis of individual differences suggests that those who explored the least with eye movements benefited the most from the CNN-CADe (r = -0.528, p = 0.036). However, for the large signal, the 2D benefit was not significantly greater than the 3D benefit (delta delta AUC = 0.033, p = 0.133).
Conclusion: The CNN-CADe brings unique performance benefits to the 3D (vs. 2D) search of small signals by reducing errors caused by the under-exploration of the volumetric data.

[52]  arXiv:2405.00145 [pdf, other]
Title: GUing: A Mobile GUI Search Engine using a Vision-Language Model
Subjects: Software Engineering (cs.SE); Computer Vision and Pattern Recognition (cs.CV)

App developers use the Graphical User Interface (GUI) of other apps as an important source of inspiration to design and improve their own apps. In recent years, research has suggested various approaches to retrieve GUI designs that fit a certain text query from screenshot datasets acquired through automated GUI exploration. However, such text-to-GUI retrieval approaches only leverage the textual information of the GUI elements in the screenshots, neglecting visual information such as icons or background images. In addition, the retrieved screenshots are not curated by app developers and often lack important app features, e.g., features whose UI pages require user authentication. To overcome these limitations, this paper proposes GUing, a GUI search engine based on a vision-language model called UIClip, which we trained specifically for the app GUI domain. For this, we first collected app introduction images from Google Play, which usually display the most representative screenshots, selected and often captioned (i.e., labeled) by the app vendors. We then developed an automated pipeline to classify, crop, and extract the captions from these images. The result is a large dataset, which we share with this paper, containing 303k app screenshots, 135k of which have captions. We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind for GUI retrieval. We evaluated our approach on various datasets from related work and in a manual experiment. The results demonstrate that our model outperforms previous approaches in text-to-GUI retrieval, achieving a Recall@10 of up to 0.69 and a HIT@10 of 0.91. We also explored the performance of UIClip on other GUI tasks, including GUI classification and Sketch-to-GUI retrieval, with encouraging results.

[53]  arXiv:2405.00154 [pdf, other]
Title: EEvA: Fast Expert-Based Algorithms for Buffer Page Replacement
Subjects: Databases (cs.DB)

Optimal page replacement is an important problem in efficient buffer management. The range of replacement strategies known in the literature varies from simple but efficient FIFO-based algorithms to more accurate but potentially costly methods tailored to specific data access patterns. The principal issue in adopting a pattern-specific replacement logic in a DB buffer manager is to guarantee non-degradation in general high-load regimes. In this paper, we propose a new family of page replacement algorithms for the DB buffer manager that demonstrate superior performance with respect to competitors on custom data access patterns while incurring a low computational overhead on TPC-C. We provide theoretical foundations and an extensive experimental study of the proposed algorithms, covering synthetic benchmarks and an implementation in an open-source DB kernel evaluated on TPC-C.

[54]  arXiv:2405.00155 [pdf, other]
Title: HistNERo: Historical Named Entity Recognition for the Romanian Language
Comments: Accepted at the International Conference on Document Analysis and Recognition (ICDAR 2024)
Subjects: Computation and Language (cs.CL)

This work introduces HistNERo, the first Romanian corpus for Named Entity Recognition (NER) in historical newspapers. The dataset contains 323k tokens of text, covering the period from 1817 (i.e., more than half of the 19th century) until 1990, in the late part of the 20th century. Eight native Romanian speakers annotated the dataset with five named entities. The samples belong to one of the following four historical regions of Romania, namely Bessarabia, Moldavia, Transylvania, and Wallachia. We employed this proposed dataset to perform several experiments for NER using Romanian pre-trained language models. Our results show that the best model achieved a strict F1-score of 55.69%. Also, by reducing the discrepancies between regions through a novel domain adaptation technique, we improved the performance on this corpus to a strict F1-score of 66.80%, representing an absolute gain of more than 10%.

[55]  arXiv:2405.00156 [pdf, other]
Title: Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification
Comments: 11 pages, 13 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantum Physics (quant-ph)

Quantum machine learning (QML) has the potential for improving the multi-label classification of rare, albeit critical, diseases in large-scale chest x-ray (CXR) datasets due to theoretical quantum advantages over classical machine learning (CML) in sample efficiency and generalizability. While prior literature has explored QML with CXRs, it has focused on binary classification tasks with small datasets due to limited access to quantum hardware and computationally expensive simulations. To that end, we implemented a Jax-based framework that enables the simulation of medium-sized qubit architectures with significant improvements in wall-clock time over current software offerings. We evaluated the performance of our Jax-based framework in terms of efficiency and performance for hybrid quantum transfer learning for long-tailed classification across 8, 14, and 19 disease labels using large-scale CXR datasets. The Jax-based framework resulted in up to a 58% and 95% speed-up compared to PyTorch and TensorFlow implementations, respectively. However, compared to CML, QML demonstrated slower convergence and an average AUROC of 0.70, 0.73, and 0.74 for the classification of 8, 14, and 19 CXR disease labels. In comparison, the CML models had an average AUROC of 0.77, 0.78, and 0.80 respectively. In conclusion, our work presents an accessible implementation of hybrid quantum transfer learning for long-tailed CXR classification with a computationally efficient Jax-based framework.

[56]  arXiv:2405.00157 [pdf, other]
Title: Information-Theoretic Opacity-Enforcement in Markov Decision Processes
Subjects: Systems and Control (eess.SY)

The paper studies information-theoretic opacity, an information-flow privacy property, in a setting involving two agents: A planning agent who controls a stochastic system and an observer who partially observes the system states. The goal of the observer is to infer some secret, represented by a random variable, from its partial observations, while the goal of the planning agent is to make the secret maximally opaque to the observer while achieving a satisfactory total return. Modeling the stochastic system using a Markov decision process, two classes of opacity properties are considered -- Last-state opacity is to ensure that the observer is uncertain if the last state is in a specific set and initial-state opacity is to ensure that the observer is unsure of the realization of the initial state. As the measure of opacity, we employ the Shannon conditional entropy capturing the information about the secret revealed by the observable. Then, we develop primal-dual policy gradient methods for opacity-enforcement planning subject to constraints on total returns. We propose novel algorithms to compute the policy gradient of entropy for each observation, leveraging message passing within the hidden Markov models. This gradient computation enables us to have stable and fast convergence. We demonstrate our solution of opacity-enforcement control through a grid world example.
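As a small illustration of the opacity measure, the sketch below computes the Shannon conditional entropy H(S|O) of a secret S given an observation O from a made-up joint distribution; in the paper this quantity is evaluated over observation trajectories of the hidden Markov model rather than a single table:

    # Sketch: Shannon conditional entropy H(S | O) of a secret S given an
    # observation O, computed from a joint distribution. The 2x3 joint table is
    # purely illustrative.
    import numpy as np

    joint = np.array([[0.20, 0.10, 0.05],    # rows: secret values, cols: observations
                      [0.05, 0.25, 0.35]])
    p_obs = joint.sum(axis=0)                # marginal over observations

    cond_entropy = 0.0
    for s in range(joint.shape[0]):
        for o in range(joint.shape[1]):
            p_so = joint[s, o]
            if p_so > 0:
                cond_entropy -= p_so * np.log2(p_so / p_obs[o])
    print(f"H(S|O) = {cond_entropy:.3f} bits")   # larger means more opaque to the observer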

[57]  arXiv:2405.00163 [pdf, ps, other]
Title: Logical analysis and contradiction detection in high-level requirements during the review process using sat-solver
Comments: 10 pages, 6 figures, 4 tables, 12th International Conference on Software Engineering & Trends (SE 2024)
Subjects: Software Engineering (cs.SE)

DO-178C stands out as a guiding standard for aviation system development processes. This standard not only mandates ensuring the consistency of requirements in the software verification process but also recognizes it as a mandatory element. The main objective of this study is to introduce a method for analyzing and identifying inconsistencies between high-level requirements using information obtained from a data dictionary. This method aims to transform high-level requirements into logical expressions and then thoroughly examine them using a SAT Solver to detect inconsistencies. While methods focused on identifying inconsistencies among requirements often appear in the literature, this study presents a novel approach to detect contradictions between non-natural language, systematically structured, and language-independent requirements. The goal of this approach is to significantly reduce the review time of high-level requirements in the software verification process. Evaluations indicate that the use of this method results in substantial time savings in the inconsistency detection process.
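As an illustration of the consistency check (not the paper's tooling), the sketch below hand-propositionalizes two toy requirements and checks their joint satisfiability with the Z3 solver's Python API; the variables and requirement encodings are hypothetical:

    # Illustrative sketch: two high-level requirements are propositionalized by
    # hand and checked for joint satisfiability with the Z3 solver.
    from z3 import Bool, Implies, Not, Solver, sat

    mode_a = Bool("system_in_mode_A")
    alarm = Bool("alarm_active")

    # REQ-1: "If the system is in mode A, the alarm shall be active."
    req1 = Implies(mode_a, alarm)
    # REQ-2: "If the system is in mode A, the alarm shall not be active."
    req2 = Implies(mode_a, Not(alarm))

    solver = Solver()
    solver.add(req1, req2, mode_a)   # check the requirements in the mode-A context
    if solver.check() == sat:
        print("consistent under this encoding:", solver.model())
    else:
        print("contradiction detected between REQ-1 and REQ-2 in mode A")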

[58]  arXiv:2405.00165 [pdf, ps, other]
Title: Solvable Initial Value Problems Ruled by Discontinuous Ordinary Differential Equations
Comments: Preliminary version presented at STACS'2024
Subjects: Computational Complexity (cs.CC); Logic in Computer Science (cs.LO); Dynamical Systems (math.DS)

We study initial value problems having dynamics ruled by discontinuous ordinary differential equations with the property of possessing a unique solution. We identify a precise class of such systems that we call \emph{solvable initial value problems} and we prove that for this class of problems the unique solution can always be obtained analytically via transfinite recursion. We present several examples, including a nontrivial one whose solution yields, at an integer time, a real encoding of the halting set for Turing machines; thereby showcasing that the behavior of solvable systems is related to ordinal Turing computations.

[59]  arXiv:2405.00166 [pdf, other]
Title: Discovering intrinsic multi-compartment pharmacometric models using Physics Informed Neural Networks
Comments: Accepted into the International conference on Scientific Computation and Machine Learning 2024 (SCML 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Pharmacometric models are pivotal across drug discovery and development, playing a decisive role in determining the progression of candidate molecules. However, the derivation of mathematical equations governing the system is a labor-intensive trial-and-error process, often constrained by tight timelines. In this study, we introduce PKINNs, a novel purely data-driven pharmacokinetic-informed neural network model. PKINNs efficiently discovers and models intrinsic multi-compartment-based pharmacometric structures, reliably forecasting their derivatives. The resulting models are both interpretable and explainable through Symbolic Regression methods. Our computational framework demonstrates the potential for closed-form model discovery in pharmacometric applications, addressing the labor-intensive nature of traditional model derivation. With the increasing availability of large datasets, this framework holds the potential to significantly enhance model-informed drug discovery.

[60]  arXiv:2405.00168 [pdf, other]
Title: Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method
Subjects: Computer Vision and Pattern Recognition (cs.CV)

RGBT tracking draws increasing attention due to its robustness in multi-modality warranting (MMW) scenarios, such as nighttime and bad weather, where relying on a single sensing modality fails to ensure stable tracking results. However, the existing benchmarks predominantly consist of videos collected in common scenarios where both RGB and thermal infrared (TIR) information are of sufficient quality. This makes the data unrepresentative of severe imaging conditions, leading to tracking failures in MMW scenarios. To bridge this gap, we present a new benchmark, MV-RGBT, captured specifically in MMW scenarios. In contrast with the existing datasets, MV-RGBT comprises more object categories and scenes, providing a diverse and challenging benchmark. Furthermore, for severe imaging conditions of MMW scenarios, a new problem is posed, namely \textit{when to fuse}, to stimulate the development of fusion strategies for such data. We propose a new method based on a mixture of experts, namely MoETrack, as a baseline fusion strategy. In MoETrack, each expert generates independent tracking results along with the corresponding confidence score, which is used to control the fusion process. Extensive experimental results demonstrate the significant potential of MV-RGBT in advancing RGBT tracking and elicit the conclusion that fusion is not always beneficial, especially in MMW scenarios. Significantly, the proposed MoETrack method achieves new state-of-the-art results not only on MV-RGBT, but also on standard benchmarks, such as RGBT234, LasHeR, and the short-term split of VTUAV (VTUAV-ST). More information of MV-RGBT and the source code of MoETrack will be released at https://github.com/Zhangyong-Tang/MoETrack.

[61]  arXiv:2405.00172 [pdf, other]
Title: Re-visiting Skip-Gram Negative Sampling: Dimension Regularization for More Efficient Dissimilarity Preservation in Graph Embeddings
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

A wide range of graph embedding objectives decompose into two components: one that attracts the embeddings of nodes that are perceived as similar, and another that repels embeddings of nodes that are perceived as dissimilar. Because real-world graphs are sparse and the number of dissimilar pairs grows quadratically with the number of nodes, Skip-Gram Negative Sampling (SGNS) has emerged as a popular and efficient repulsion approach. SGNS repels each node from a sample of dissimilar nodes, as opposed to all dissimilar nodes. In this work, we show that node-wise repulsion is, in aggregate, an approximate re-centering of the node embedding dimensions. Such dimension operations are much more scalable than node operations. The dimension approach, in addition to being more efficient, yields a simpler geometric interpretation of the repulsion. Our result extends findings from the self-supervised learning literature to the skip-gram model, establishing a connection between skip-gram node contrast and dimension regularization. We show that in the limit of large graphs, under mild regularity conditions, the original node repulsion objective converges to optimization with dimension regularization. We use this observation to propose an algorithm augmentation framework that speeds up any existing algorithm, supervised or unsupervised, using SGNS. The framework prioritizes node attraction and replaces SGNS with dimension regularization. We instantiate this generic framework for LINE and node2vec and show that the augmented algorithms preserve downstream performance while dramatically increasing efficiency.
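A minimal sketch of the dimension-regularization idea, penalizing the per-dimension mean of the embedding matrix instead of sampling negative node pairs; the embeddings and the regularization weight are illustrative, and this is not the paper's full augmented algorithm:

    # Sketch of the core idea: replace per-node negative sampling with a
    # dimension-level regularizer that re-centers the embedding matrix, i.e.
    # penalizes the per-dimension mean of all node embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    emb = rng.normal(loc=0.3, size=(1000, 64))   # node embeddings (toy values)

    def dimension_regularizer(emb, lam=1.0):
        """lam * squared L2 norm of the per-dimension mean -- pushes each dimension
        of the embedding matrix back towards zero mean (an aggregate repulsion)."""
        mean_per_dim = emb.mean(axis=0)
        return lam * float(mean_per_dim @ mean_per_dim)

    print(dimension_regularizer(emb))   # added to the attraction loss instead of SGNS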

[62]  arXiv:2405.00175 [pdf, other]
Title: Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

This paper introduces uRAG--a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems. Each RAG system consumes the retrieval results for a unique purpose, such as open-domain question answering, fact verification, entity linking, and relation extraction. We introduce a generic training guideline that standardizes the communication between the search engine and the downstream RAG systems that engage in optimizing the retrieval model. This lays the groundwork for us to build a large-scale experimentation ecosystem consisting of 18 RAG systems that engage in training and 18 unknown RAG systems that use the uRAG as the new users of the search engine. Using this experimentation ecosystem, we answer a number of fundamental research questions that improve our understanding of promises and challenges in developing search engines for machines.

[63]  arXiv:2405.00181 [pdf, other]
Title: Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Comments: Codebase: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization, our focus is on more practicality, prompting us to raise the following crucial questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA). Specifically, each instance of the proposed benchmark involves three sets of human annotations to indicate the "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting the effect of the abnormality. In addition, we also introduce MMEval, a novel evaluation metric designed to better align with human preferences for CUVA, facilitating the measurement of existing LLMs in comprehending the underlying cause and corresponding effect of video anomalies. Finally, we propose a novel prompt-based method that can serve as a baseline approach for the challenging CUVA. We conduct extensive experiments to show the superiority of our evaluation metric and the prompt-based approach. Our code and dataset are available at https://github.com/fesvhtr/CUVA.

[64]  arXiv:2405.00182 [pdf, other]
Title: M-DEW: Extending Dynamic Ensemble Weighting to Handle Missing Values
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Missing value imputation is a crucial preprocessing step for many machine learning problems. However, it is often considered as a separate subtask from downstream applications such as classification, regression, or clustering, and thus is not optimized together with them. We hypothesize that treating the imputation model and downstream task model together and optimizing over full pipelines will yield better results than treating them separately. Our work describes a novel AutoML technique for making downstream predictions with missing data that automatically handles preprocessing, model weighting, and selection during inference time, with minimal compute overhead. Specifically we develop M-DEW, a Dynamic missingness-aware Ensemble Weighting (DEW) approach, that constructs a set of two-stage imputation-prediction pipelines, trains each component separately, and dynamically calculates a set of pipeline weights for each sample during inference time. We thus extend previous work on dynamic ensemble weighting to handle missing data at the level of full imputation-prediction pipelines, improving performance and calibration on downstream machine learning tasks over standard model averaging techniques. M-DEW is shown to outperform the state-of-the-art in that it produces statistically significant reductions in model perplexity in 17 out of 18 experiments, while improving average precision in 13 out of 18 experiments.
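A toy sketch of per-sample dynamic ensemble weighting in the spirit described above, with a softmax over negative per-pipeline error estimates; the error estimates and weighting rule are illustrative assumptions, not necessarily M-DEW's exact scheme:

    # Sketch: each imputation-prediction pipeline gets a per-sample weight
    # proportional to exp(-error); predictions are combined with these weights.
    import numpy as np

    def dynamic_weights(per_pipeline_errors):
        """Softmax over negative per-sample error estimates -> pipeline weights."""
        scores = np.exp(-np.asarray(per_pipeline_errors))
        return scores / scores.sum()

    def ensemble_predict(pipeline_probs, per_pipeline_errors):
        w = dynamic_weights(per_pipeline_errors)
        return float(np.dot(w, pipeline_probs))   # weighted probability for this sample

    probs = [0.82, 0.64, 0.71]     # positive-class probabilities from 3 pipelines
    errors = [0.20, 0.55, 0.35]    # estimated errors of each pipeline near this sample
    print(ensemble_predict(probs, errors))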

[65]  arXiv:2405.00183 [pdf, ps, other]
Title: Capabilities
Comments: 14
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

In our daily lives, as in science and in all other domains, we encounter huge numbers of dispositions (tendencies, potentials, powers) which are realized in processes such as sneezing, sweating, shedding dandruff, and so on. Among this plethora of what we can think of as mere dispositions is a subset of dispositions in whose realizations we have an interest: a car responding well when driven on ice, a rabbit's lungs responding well when it is chased by a wolf, and so on. We call the latter capabilities, and we attempt to provide a robust ontological account of what capabilities are that is of sufficient generality to serve a variety of purposes, for example by providing a useful extension to ontology-based research in areas where capabilities data are currently being collected in siloed fashion.

[66]  arXiv:2405.00184 [pdf, other]
Title: Semi-Supervised Hierarchical Multi-Label Classifier Based on Local Information
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Scarcity of labeled data is a common problem in supervised classification, since hand-labeling can be time-consuming, expensive, or difficult, while large amounts of unlabeled data are often available. The problem of scarce labeled data is even more pronounced in hierarchical classification, because the data of a node are split among its children, which results in few instances associated with the deepest nodes of the hierarchy. In this work, we propose the semi-supervised hierarchical multi-label classifier based on local information (SSHMC-BLI), which can be trained with labeled and unlabeled data to perform hierarchical classification tasks. The method can be applied to any type of hierarchical problem; here we focus on the most difficult case: hierarchies of DAG type, where instances can be associated with multiple paths of labels, which can end in an internal node. SSHMC-BLI builds pseudo-labels for each unlabeled instance from the label paths of its labeled neighbors, while considering whether the unlabeled instance is similar to its neighbors. Experiments on 12 challenging datasets from functional genomics show that making use of unlabeled data along with labeled data can help to improve the performance of a supervised hierarchical classifier trained only on labeled data, even with statistical significance.
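A toy sketch of pseudo-label construction from labeled neighbors, loosely following the description above; the features, distance metric, and majority rule are illustrative assumptions rather than the exact SSHMC-BLI procedure:

    # Sketch: an unlabeled instance inherits the hierarchy label paths shared by
    # a majority of its k nearest labeled neighbours.
    import numpy as np

    def pseudo_label(x, labeled_X, labeled_paths, k=3):
        """labeled_paths[i] is a set of hierarchy paths (tuples of labels) for instance i."""
        dists = np.linalg.norm(labeled_X - x, axis=1)
        nn = np.argsort(dists)[:k]
        # Keep every path used by at least half of the nearest labeled neighbours.
        counts = {}
        for i in nn:
            for path in labeled_paths[i]:
                counts[path] = counts.get(path, 0) + 1
        return {p for p, c in counts.items() if c >= k / 2}

    labeled_X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]])
    labeled_paths = [{("root", "A", "A1")}, {("root", "A", "A1"), ("root", "B")}, {("root", "C")}]
    print(pseudo_label(np.array([0.05, 0.1]), labeled_X, labeled_paths))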

[67]  arXiv:2405.00186 [pdf, ps, other]
Title: Credentials in the Occupation Ontology
Comments: 11
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR)

The term credential encompasses educational certificates, degrees, certifications, and government-issued licenses. An occupational credential is a verification of an individual's qualification or competence issued by a third party with relevant authority. Job seekers often leverage such credentials as evidence that desired qualifications are satisfied by their holders. Many U.S. education and workforce development organizations have recognized the importance of credentials for employment and the challenges of understanding the value of credentials. In this study, we identified and ontologically defined credential and credential-related terms at the textual and semantic levels based on the Occupation Ontology (OccO), a BFO-based ontology. Different credential types and their authorization logic are modeled. We additionally defined a high-level hierarchy of credential-related terms and relations among many terms, which were initiated in concert with the Alabama Talent Triad (ATT) program, which aims to connect learners, earners, employers, and education/training providers through credentials and skills. To our knowledge, our research provides for the first time a systematic ontological modeling of the important domain of credentials and related content, supporting enhanced credential data and knowledge integration in the future.

[68]  arXiv:2405.00187 [pdf, other]
Title: Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer
Comments: ICDAR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Table detection within document images is a crucial task in document processing, involving the identification and localization of tables. Recent strides in deep learning have substantially improved the accuracy of this task, but it still heavily relies on large labeled datasets for effective training. Several semi-supervised approaches have emerged to overcome this challenge, often employing CNN-based detectors with anchor proposals and post-processing techniques like non-maximal suppression (NMS). However, recent advancements in the field have shifted the focus towards transformer-based techniques, eliminating the need for NMS and emphasizing object queries and attention mechanisms. Previous research has focused on two key areas to improve transformer-based detectors: refining the quality of object queries and optimizing attention mechanisms. However, increasing object queries can introduce redundancy, while adjustments to the attention mechanism can increase complexity. To address these challenges, we introduce a semi-supervised approach employing SAM-DETR, a novel method for precise alignment between object queries and target features. Our approach demonstrates remarkable reductions in false positives and substantial enhancements in table detection performance, particularly in complex documents characterized by diverse table structures. This work provides more efficient and accurate table detection in semi-supervised settings.

[69]  arXiv:2405.00189 [pdf, other]
Title: Comparing Motion Distortion Between Vehicle Field Deployments
Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024
Subjects: Robotics (cs.RO)

Recent advances in autonomous driving for uncrewed ground vehicles (UGVs) have spurred significant development, particularly in challenging terrains. This paper introduces a classification system assessing various UGV deployments reported in the literature. Our approach considers motion distortion features that include internal UGV features, such as mass and speed, and external features, such as terrain complexity, which all influence the efficiency of models and navigation systems. We present results that map UGV deployments relative to vehicle kinetic energy and terrain complexity, providing insights into the level of complexity and risk associated with different operational environments. Additionally, we propose a motion distortion metric to assess UGV navigation performance that does not require an explicit quantification of motion distortion features. Using this metric, we conduct a case study to illustrate the impact of motion distortion features on modeling accuracy. This research advocates for creating a comprehensive database containing many different motion distortion features, which would contribute to advancing the understanding of autonomous driving capabilities in rough conditions and provide a validation framework for future developments in UGV navigation systems.

[70]  arXiv:2405.00196 [pdf, other]
Title: Synthetic Image Verification in the Era of Generative AI: What Works and What Isn't There Yet
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this work we present an overview of approaches for the detection and attribution of synthetic images and highlight their strengths and weaknesses. We also point out and discuss hot topics in this field and outline promising directions for future research.

[71]  arXiv:2405.00197 [pdf, ps, other]
Title: Grounding Realizable Entities
Comments: 13
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR)

Ontological representations of qualities, dispositions, and roles have been refined over the past decade, clarifying subtle distinctions in life science research. After articulating a widely-used characterization of these entities within the context of Basic Formal Ontology (BFO), we identify gaps in this treatment and motivate the need for supplementing the BFO characterization. By way of supplement, we propose definitions for grounding relations holding between qualities and dispositions, and dispositions and roles, illustrating our proposal by representing subtle aspects of host-pathogen interactions.

[72]  arXiv:2405.00198 [pdf, other]
Title: Data-driven identification of stable differential operators using constrained regression
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)

Identifying differential operators from data is essential for the mathematical modeling of complex physical and biological systems where massive datasets are available. These operators must be stable for accurate predictions for dynamics forecasting problems. In this article, we propose a novel methodology for learning sparse differential operators that are theoretically linearly stable by solving a constrained regression problem. These underlying constraints are obtained following linear stability for dynamical systems. We further extend this approach for learning nonlinear differential operators by determining linear stability constraints for linearized equations around an equilibrium point. The applicability of the proposed method is demonstrated for both linear and nonlinear partial differential equations such as 1-D scalar advection-diffusion equation, 1-D Burgers equation and 2-D advection equation. The results indicated that solutions to constrained regression problems with linear stability constraints provide accurate and linearly stable sparse differential operators.
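As a toy illustration of constrained regression with a stability constraint, the sketch below fits a 1-D advection-diffusion operator by bounded least squares while forcing the diffusion coefficient to be non-negative; the data, grid, and constraint are simplifications of the linear-stability conditions derived in the paper:

    # Sketch: fit du/dt = a*u_xx + b*u_x by least squares while constraining the
    # diffusion coefficient a to be non-negative (a simple stability condition).
    import numpy as np
    from scipy.optimize import lsq_linear

    nx, dx = 200, 0.05
    x = np.arange(nx) * dx
    u0 = np.exp(-((x - 5.0) ** 2))               # synthetic snapshot at time t
    a_true, b_true = 0.4, -1.0

    u_xx = np.gradient(np.gradient(u0, dx), dx)
    u_x = np.gradient(u0, dx)
    u_t = a_true * u_xx + b_true * u_x + 1e-4 * np.random.default_rng(0).normal(size=nx)

    # Regression matrix: columns are the candidate differential operators.
    A = np.column_stack([u_xx, u_x])
    res = lsq_linear(A, u_t, bounds=([0.0, -np.inf], [np.inf, np.inf]))  # enforce a >= 0
    print("identified coefficients (a, b):", res.x)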

[73]  arXiv:2405.00199 [pdf, other]
Title: Field Report on a Wearable and Versatile Solution for Field Acquisition and Exploration
Comments: 5 pages, 6 figures, Accepted for the Workshop on Field Robotics at ICRA2024
Subjects: Robotics (cs.RO)

This report presents a wearable plug-and-play platform for data acquisition in the field. The platform, extending a waterproof Pelican Case into a 20 kg backpack offers 5.5 hours of power autonomy, while recording data with two cameras, a lidar, an Inertial Measurement Unit (IMU), and a Global Navigation Satellite System (GNSS) receiver. The system only requires a single operator and is readily controlled with a built-in screen and buttons. Due to its small footprint, it offers greater flexibility than large vehicles typically deployed in off-trail environments. We describe the platform's design, detailing the mechanical parts, electrical components, and software stack. We explain the system's limitations, drawing from its extensive deployment spanning over 20 kilometers of trajectories across various seasons, environments, and weather conditions. We derive valuable lessons learned from these deployments and present several possible applications for the system. The possible use cases consider not only academic research but also insights from consultations with our industrial partners. The mechanical design including all CAD files, as well as the software stack, are publicly available at https://github.com/norlab-ulaval/backpack_workspace.

[74]  arXiv:2405.00200 [pdf, other]
Title: In-Context Learning with Long-Context Models: An In-Depth Exploration
Comments: 27 pages; preprint
Subjects: Computation and Language (cs.CL)

As model context lengths continue to increase, the number of demonstrations that can be provided in-context approaches the size of entire training datasets. We study the behavior of in-context learning (ICL) at this extreme scale on multiple datasets and models. We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations. We contrast this with example retrieval and finetuning: example retrieval shows excellent performance at low context lengths but has diminished gains with more demonstrations; finetuning is more data hungry than ICL but can sometimes exceed long-context ICL performance with additional data. We use this ICL setting as a testbed to study several properties of both in-context learning and long-context models. We show that long-context ICL is less sensitive to random input shuffling than short-context ICL, that grouping of same-label examples can negatively impact performance, and that the performance boosts we see do not arise from cumulative gain from encoding many examples together. We conclude that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples rather than task learning.

[75]  arXiv:2405.00201 [pdf, other]
Title: SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task. However, the substantial requirements for computational power and storage have discouraged its widespread use. Moreover, increasing evidence of catastrophic forgetting and overparameterization in the Transformer architecture has motivated researchers to seek parameter-efficient fine-tuning (PEFT) methods. Commonly known PEFT methods like LoRA and BitFit are typically applied across all layers of the model. We propose a PEFT method, called Stratified Progressive Adaptation Fine-tuning (SPAFIT), based on the localization of different types of linguistic knowledge to specific layers of the model. Our experiments, conducted on nine tasks from the GLUE benchmark, show that our proposed SPAFIT method outperforms other PEFT methods while fine-tuning only a fraction of the parameters adjusted by other methods.
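The stratification mechanism can be sketched in plain PyTorch as below, assigning a different fine-tuning regime to each group of layers of a toy Transformer encoder; the grouping and the bias-only middle group are illustrative assumptions, not the exact SPAFIT recipe:

    # Sketch of a stratified fine-tuning setup: freeze the lowest layers, tune
    # only biases (BitFit-style) in the middle layers, and fully fine-tune the
    # top layers of a toy Transformer encoder.
    import torch.nn as nn

    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=6
    )

    for idx, layer in enumerate(encoder.layers):
        if idx < 2:                                    # group 1: frozen
            for p in layer.parameters():
                p.requires_grad = False
        elif idx < 4:                                  # group 2: bias-only tuning
            for name, p in layer.named_parameters():
                p.requires_grad = name.endswith("bias")
        # group 3 (idx 4-5): left fully trainable

    trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
    total = sum(p.numel() for p in encoder.parameters())
    print(f"trainable fraction: {trainable / total:.2%}")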

[76]  arXiv:2405.00202 [pdf, other]
Title: Leveraging Active Subspaces to Capture Epistemic Model Uncertainty in Deep Generative Models for Molecular Design
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

Deep generative models have been accelerating the inverse design process in material and drug design. Unlike their counterpart property predictors in typical molecular design frameworks, generative molecular design models have seen fewer efforts on uncertainty quantification (UQ) due to computational challenges in Bayesian inference posed by their large number of parameters. In this work, we focus on the junction-tree variational autoencoder (JT-VAE), a popular model for generative molecular design, and address this issue by leveraging the low dimensional active subspace to capture the uncertainty in the model parameters. Specifically, we approximate the posterior distribution over the active subspace parameters to estimate the epistemic model uncertainty in an extremely high dimensional parameter space. The proposed UQ scheme does not require alteration of the model architecture, making it readily applicable to any pre-trained model. Our experiments demonstrate the efficacy of the AS-based UQ and its potential impact on molecular optimization by exploring the model diversity under epistemic uncertainty.
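A minimal sketch of identifying an active subspace from gradient information: eigendecompose the empirical covariance of loss gradients and keep the leading eigenvectors; the random gradients below are stand-ins for gradients of the JT-VAE loss:

    # Sketch: find an active subspace by eigendecomposing the covariance of loss
    # gradients with respect to the parameters and keeping the top-k eigenvectors.
    import numpy as np

    rng = np.random.default_rng(0)
    num_samples, num_params, k = 500, 1000, 10
    grads = rng.normal(size=(num_samples, num_params)) @ np.diag(
        np.linspace(3.0, 0.01, num_params)            # make a few directions dominate
    )

    C = grads.T @ grads / num_samples                 # empirical gradient covariance
    eigvals, eigvecs = np.linalg.eigh(C)
    active_basis = eigvecs[:, ::-1][:, :k]            # top-k directions = active subspace

    # Posterior inference is then carried out over the k coordinates z, with the
    # full parameters reconstructed as theta = theta_MAP + active_basis @ z.
    print(active_basis.shape)                         # (1000, 10)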

[77]  arXiv:2405.00204 [pdf, other]
Title: General Purpose Verification for Chain of Thought Prompting
Comments: 22 pages, preprint
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Many of the recent capabilities demonstrated by Large Language Models (LLMs) arise primarily from their ability to exploit contextual information. In this paper, we explore ways to improve reasoning capabilities of LLMs through (1) exploration of different chains of thought and (2) validation of the individual steps of the reasoning process. We propose three general principles that a model should adhere to while reasoning: (i) Relevance, (ii) Mathematical Accuracy, and (iii) Logical Consistency. We apply these constraints to the reasoning steps generated by the LLM to improve the accuracy of the final generation. The constraints are applied in the form of verifiers: the model itself is asked to verify if the generated steps satisfy each constraint. To further steer the generations towards high-quality solutions, we use the perplexity of the reasoning steps as an additional verifier. We evaluate our method on 4 distinct types of reasoning tasks, spanning a total of 9 different datasets. Experiments show that our method is always better than vanilla generation, and, in 6 out of the 9 datasets, it is better than best-of N sampling which samples N reasoning chains and picks the lowest perplexity generation.

[78]  arXiv:2405.00205 [pdf, ps, other]
Title: A Logic for Reasoning About Aggregate-Combine Graph Neural Networks
Comments: arXiv admin note: text overlap with arXiv:2307.05150
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)

We propose a modal logic in which counting modalities appear in linear inequalities. We show that each formula can be transformed into an equivalent graph neural network (GNN). We also show that a broad class of GNNs can be transformed efficiently into a formula, thus significantly improving upon the literature about the logical expressiveness of GNNs. We also show that the satisfiability problem is PSPACE-complete. These results bring together the promise of using standard logical methods for reasoning about GNNs and their properties, particularly in applications such as GNN querying, equivalence checking, etc. We prove that such natural problems can be solved in polynomial space.

[79]  arXiv:2405.00208 [pdf, other]
Title: A Primer on the Inner Workings of Transformer-based Language Models
Subjects: Computation and Language (cs.CL)

The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architecture. We conclude by presenting a comprehensive overview of the known internal mechanisms implemented by these models, uncovering connections across popular approaches and active research directions in this area.

[80]  arXiv:2405.00213 [pdf, other]
Title: Block-As-Domain Adaptation for Workload Prediction from fNIRS Data
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)

Functional near-infrared spectroscopy (fNIRS) is a non-intrusive way to measure cortical hemodynamic activity. Predicting cognitive workload from fNIRS data has been approached with a diffuse set of methods. To be applicable in real-world settings, models are needed that perform well across different sessions as well as different subjects. However, most existing works assume that training and testing data come from the same subjects and/or cannot generalize well across never-before-seen subjects. Additional challenges posed by fNIRS data include high variation in inter-subject fNIRS data and also in intra-subject data collected across different blocks of sessions. To address these issues, we propose an effective method, referred to as class-aware-block-aware domain adaptation (CABA-DA), which explicitly minimizes intra-session variance by viewing different blocks from the same subject and session as different domains. We minimize the intra-class domain discrepancy and maximize the inter-class domain discrepancy accordingly. In addition, we propose an MLPMixer-based model for cognitive load classification. Experimental results demonstrate that the proposed model performs better than three different baseline models on three publicly available cognitive workload datasets, two collected from n-back tasks and one from finger tapping. Our experiments also show that the proposed contrastive learning method can improve the baseline models we compared with.
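A toy sketch of the class-aware, block-aware discrepancy terms, using a simple mean-feature (linear MMD-style) distance between blocks; the feature tensors, margin, and loss weighting are illustrative assumptions rather than the paper's implementation:

    # Sketch: pull together features of the same class across blocks/domains and
    # push apart features of different classes, using mean-feature distances.
    import torch

    def mean_feature_distance(a, b):
        return torch.norm(a.mean(dim=0) - b.mean(dim=0)) ** 2

    def caba_style_loss(feat_block1, feat_block2, labels1, labels2, margin=1.0):
        intra, inter = 0.0, 0.0
        for c in torch.unique(torch.cat([labels1, labels2])):
            f1, f2 = feat_block1[labels1 == c], feat_block2[labels2 == c]
            if len(f1) and len(f2):
                intra = intra + mean_feature_distance(f1, f2)   # same class, different block
            other = feat_block2[labels2 != c]
            if len(f1) and len(other):
                inter = inter + torch.clamp(margin - mean_feature_distance(f1, other), min=0)
        return intra + inter   # minimized together with the classification loss

    f1, f2 = torch.randn(32, 64), torch.randn(32, 64)
    y1, y2 = torch.randint(0, 2, (32,)), torch.randint(0, 2, (32,))
    print(caba_style_loss(f1, f2, y1, y2))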

[81]  arXiv:2405.00216 [pdf, other]
Title: Graphical Reasoning: LLM-based Semi-Open Relation Extraction
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper presents a comprehensive exploration of relation extraction utilizing advanced language models, specifically Chain of Thought (CoT) and Graphical Reasoning (GRE) techniques. We demonstrate how leveraging in-context learning with GPT-3.5 can significantly enhance the extraction process, particularly through detailed example-based reasoning. Additionally, we introduce a novel graphical reasoning approach that dissects relation extraction into sequential sub-tasks, improving precision and adaptability in processing complex relational data. Our experiments, conducted on multiple datasets, including manually annotated data, show considerable improvements in performance metrics, underscoring the effectiveness of our methodologies.

[82]  arXiv:2405.00217 [pdf, other]
Title: GMC-PINNs: A new general Monte Carlo PINNs method for solving fractional partial differential equations on irregular domains
Subjects: Machine Learning (cs.LG)

Physics-Informed Neural Networks (PINNs) have been widely used for solving partial differential equations (PDEs) of different types, including fractional PDEs (fPDEs) [29]. Herein, we propose a new general (quasi) Monte Carlo PINN for solving fPDEs on irregular domains. Specifically, instead of approximating fractional derivatives by Monte Carlo approximations of integrals as was done previously in [31], we use a more general Monte Carlo approximation method to solve different fPDEs, which is valid for fractional differentiation under any definition. Moreover, based on the ensemble probability density function, the generated nodes are all located in denser regions near the target point where we perform the differentiation. This has an unexpected connection with known finite difference methods on non-equidistant or nested grids, and hence our method inherits their advantages. At the same time, the generated nodes exhibit a block-like dense distribution, leading to good computational efficiency of this approach. We present the framework for using this algorithm and apply it to several examples. Our results demonstrate the effectiveness of GMC-PINNs in dealing with irregular domain problems and show a higher computational efficiency compared to the original fPINN method. We also include comparisons with the Monte Carlo fPINN [31]. Finally, we use examples to demonstrate the effectiveness of the method in dealing with fuzzy boundary location problems, and then use the method to solve the coupled 3D fractional Bloch-Torrey equation defined in the ventricular domain of the human brain, and compare the results with classical numerical methods.

[83]  arXiv:2405.00218 [pdf, other]
Title: Constrained Decoding for Secure Code Generation
Comments: 17 pages, 8 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)

Code Large Language Models (Code LLMs) have been increasingly used by developers to boost productivity, but they often generate vulnerable code. Thus, there is an urgent need to ensure that code generated by Code LLMs is correct and secure. Previous research has primarily focused on generating secure code, overlooking the fact that secure code also needs to be correct. This oversight can lead to a false sense of security. Currently, the community lacks a method to measure actual progress in this area, and we need solutions that address both security and correctness of code generation.
This paper introduces a new benchmark, CodeGuard+, along with two new metrics, secure-pass@k and secure@$k_{\text{pass}}$, to measure Code LLMs' ability to generate both secure and correct code. Using our new evaluation methods, we show that the state-of-the-art defense technique, prefix tuning, may not be as strong as previously believed, since it generates secure code but sacrifices functional correctness. We also demonstrate that different decoding methods significantly affect the security of Code LLMs.
Furthermore, we explore a new defense direction: constrained decoding for secure code generation. We propose new constrained decoding techniques to generate code that satisfies security and correctness constraints simultaneously. Our results reveal that constrained decoding is more effective than prefix tuning to improve the security of Code LLMs, without requiring a specialized training dataset. Moreover, constrained decoding can be used together with prefix tuning to further improve the security of Code LLMs.
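One natural reading of the combined requirement is to apply the standard unbiased pass@k estimator only to samples that are both functionally correct and secure; the sketch below illustrates that reading and is not taken from the paper, whose exact metric definitions may differ.

```python
# Hedged sketch of a secure-pass@k-style estimate.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples, c of which qualify."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def secure_pass_at_k(results, k: int) -> float:
    """results: list of (is_correct, is_secure) flags, one per sample."""
    n = len(results)
    c = sum(1 for ok, sec in results if ok and sec)
    return pass_at_k(n, c, k)
```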

[84]  arXiv:2405.00219 [pdf, ps, other]
Title: Machine Learning-based Estimation of Respiratory Fluctuations in a Healthy Adult Population using BOLD fMRI and Head Motion Parameters
Comments: 6 pages, 5 figures, conference abstract
Subjects: Machine Learning (cs.LG)

Motivation: In many fMRI studies, respiratory signals are often missing or of poor quality. Therefore, it could be highly beneficial to have a tool to extract respiratory variation (RV) waveforms directly from fMRI data without the need for peripheral recording devices.
Goal(s): Investigate the hypothesis that head motion parameters contain valuable information regarding respiratory pattern, which can help machine learning algorithms estimate the RV waveform.
Approach: This study proposes a CNN model for reconstruction of RV waveforms using head motion parameters and BOLD signals.
Results: This study showed that combining head motion parameters with BOLD signals enhances RV waveform estimation.
Impact: It is expected that application of the proposed method will lower the cost of fMRI studies, reduce complexity, and decrease the burden on participants as they will not be required to wear a respiratory bellows.
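As a rough illustration of the Approach above (not the authors' architecture), a 1D convolutional network can map the six head motion parameters together with a set of BOLD time series to an RV waveform of the same length; the channel counts and layer sizes below are assumptions.

```python
# Illustrative 1D-CNN sketch for RV waveform estimation.
import torch
import torch.nn as nn

class RVEstimator(nn.Module):
    def __init__(self, n_bold_channels: int = 16, n_motion: int = 6):
        super().__init__()
        in_ch = n_bold_channels + n_motion
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(32, 1, kernel_size=1),
        )

    def forward(self, bold: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        # bold: (B, n_bold_channels, T); motion: (B, n_motion, T)
        x = torch.cat([bold, motion], dim=1)
        return self.net(x).squeeze(1)  # (B, T) estimated RV waveform
```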

[85]  arXiv:2405.00220 [pdf, other]
Title: Context-Aware Mobile Network Performance Prediction Using Network & Remote Sensing Data
Comments: Accepted at the 17th International Workshop on AI-ML-Powered Autonomous Telco Networks - IEEE International Conference on Communications (ICC) 2024
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

Accurate estimation of Network Performance is crucial for several tasks in telecom networks. Telecom networks regularly serve a vast number of radio nodes. Each radio node provides services to end-users in the associated coverage areas. The task of predicting Network Performance for telecom networks necessitates considering complex spatio-temporal interactions and incorporating geospatial information where the radio nodes are deployed. Instead of relying on historical data alone, our approach augments network historical performance datasets with satellite imagery data. Our comprehensive experiments, using real-world data collected from multiple different regions of an operational network, show that the model is robust and can generalize across different scenarios. The results indicate that the model, utilizing satellite imagery, performs very well across the tested regions. Additionally, the model demonstrates a robust approach to the cold-start problem, offering a promising alternative for initial performance estimation in newly deployed sites.

[86]  arXiv:2405.00223 [pdf, other]
Title: ConFides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration
Subjects: Human-Computer Interaction (cs.HC)

Confidence scores of automatic speech recognition (ASR) outputs are often inadequately communicated, preventing their seamless integration into analytical workflows. In this paper, we introduce ConFides, a visual analytics system developed in collaboration with intelligence analysts to address this issue. ConFides aims to aid exploration and post-AI-transcription editing by visually representing the confidence associated with the transcription. We demonstrate how our tool can assist intelligence analysts who use ASR outputs in their analytical and exploratory tasks and how it can help mitigate misinterpretation of crucial information. We also discuss opportunities for improving textual data cleaning and model transparency for human-machine collaboration.

[87]  arXiv:2405.00225 [pdf, other]
Title: Hacia una implementación ética e inclusiva de la Inteligencia Artificial en las organizaciones: un marco multidimensional
Comments: in Spanish language
Subjects: Computers and Society (cs.CY)

The article analyzes the impact of artificial intelligence (AI) on contemporary society and the importance of adopting an ethical approach to its development and implementation within organizations. It examines the critical perspective of French philosopher Éric Sadin and others, who warn of the risks of unbridled technologization that can erode human autonomy. However, the article also recognizes the active role that various actors, such as governments, academics and civil society, can play in shaping the development of AI aligned with human and social values. A multidimensional approach is proposed that combines ethics with regulation, innovation and education. It highlights the importance of developing detailed ethical frameworks, incorporating ethics in the training of professionals, conducting ethical impact audits, and encouraging stakeholder participation in AI design. In addition, four fundamental pillars for the ethical implementation of AI in organizations are presented: 1) Integrated values, 2) Trust and transparency, 3) Empowering human growth, and 4) Identifying strategic factors. These pillars cover aspects such as alignment with the company's ethical identity, governance and accountability, human-centered design, continuous training and adaptability in the face of technological and market changes. It concludes by emphasizing that ethics must be the cornerstone of the strategy of any organization that aspires to incorporate AI, establishing a solid framework to ensure that the technology is developed and used in a way that respects and promotes human values.

[88]  arXiv:2405.00227 [pdf, other]
Title: Optimized Non-Primary Channel Access Design in IEEE 802.11bn
Comments: This work has been submitted to the IEEE for possible publication. 6 pages, 5 figures
Subjects: Networking and Internet Architecture (cs.NI)

The IEEE 802.11 standards, culminating in IEEE 802.11be (Wi-Fi 7), have significantly expanded bandwidth capacities from 20 MHz to 320 MHz, marking a crucial evolution in wireless access technology. Despite these advancements, the full potential of these capacities remains largely untapped due to inefficiencies in channel management, in particular, the underutilization of secondary (non-primary) channels when the primary channel is occupied. This paper delves into the Non-Primary Channel Access (NPCA) protocol, initially proposed by the IEEE 802.11 Ultra-High Reliability (UHR) group, aimed at addressing these inefficiencies. Our research not only proposes an analytical model to assess the average throughput of NPCA but also, crucially, identifies that the overhead associated with the NPCA protocol is significant and cannot be ignored. This overhead often undermines the effectiveness of NPCA, challenging the assumption that it is invariably superior to traditional models. Based on these findings, we have developed and simulated a new hybrid model that dynamically integrates the strengths of both the legacy and NPCA models. Overall, this model outperforms the existing models under all channel occupancy conditions, offering a robust solution to enhance throughput efficiency.

[89]  arXiv:2405.00228 [pdf, other]
Title: Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion
Comments: 17 pages, 7 figures, 10 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face Recognition (FR) models are trained on large-scale datasets, which have privacy and ethical concerns. Lately, the use of synthetic data to complement or replace genuine data for the training of FR models has been proposed. While promising results have been obtained, it still remains unclear whether generative models can yield diverse enough data for such tasks. In this work, we introduce a new method, inspired by the physical motion of soft particles subjected to stochastic Brownian forces, allowing us to sample identity distributions in a latent space under various constraints. With this in hand, we generate several face datasets and benchmark them by training FR models, showing that data generated with our method exceeds the performance of previous GAN-based datasets and achieves competitive performance with state-of-the-art diffusion-based synthetic datasets. We also show that this method can be used to mitigate leakage from the generator's training set and explore the ability of generative models to generate data beyond it.

[90]  arXiv:2405.00229 [pdf, other]
Title: Aptly: Making Mobile Apps from Natural Language
Comments: 11 pages, 7 figures, 2 tables
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

We present Aptly, an extension of the MIT App Inventor platform enabling mobile app development via natural language powered by code-generating large language models (LLMs). Aptly complements App Inventor's block language with a text language designed to allow visual code generation via text-based LLMs. We detail the technical aspects of how the Aptly server integrates LLMs with a realtime collaboration function to facilitate the automated creation and editing of mobile apps given user instructions. The paper concludes with insights from a study of a pilot implementation involving high school students, which examines Aptly's practicality and user experience. The findings underscore Aptly's potential as a tool that democratizes app development and fosters technological creativity.

[91]  arXiv:2405.00233 [pdf, other]
Title: SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Comments: Demo and code: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these challenges, we introduce SemantiCodec, a novel codec designed to compress audio into fewer than a hundred tokens per second across diverse audio types, including speech, general audio, and music, without compromising quality. SemantiCodec features a dual-encoder architecture: a semantic encoder using a self-supervised AudioMAE, discretized using k-means clustering on extensive audio data, and an acoustic encoder to capture the remaining details. The semantic and acoustic encoder outputs are used to reconstruct audio via a diffusion-model-based decoder. SemantiCodec is presented in three variants with token rates of 25, 50, and 100 per second, supporting a range of ultra-low bit rates between 0.31 kbps and 1.43 kbps. Experimental results demonstrate that SemantiCodec significantly outperforms the state-of-the-art Descript codec on reconstruction quality. Our results also suggest that SemantiCodec contains significantly richer semantic information than all evaluated audio codecs, even at significantly lower bitrates. Our code and demos are available at https://haoheliu.github.io/SemantiCodec/.
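The semantic-token step can be pictured with a small sketch (ours, not the released code): per-frame embeddings from a self-supervised audio encoder are discretized with k-means, so each frame becomes one of K token ids. The codebook size and shapes below are illustrative assumptions; the acoustic encoder and diffusion decoder are omitted.

```python
# Hedged sketch of k-means discretization of self-supervised audio embeddings.
import numpy as np
from sklearn.cluster import KMeans

def fit_semantic_codebook(frame_embeddings: np.ndarray, k: int = 4096) -> KMeans:
    """frame_embeddings: (num_frames, dim) features pooled over a corpus."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(frame_embeddings)

def encode_semantic_tokens(codebook: KMeans, embeddings: np.ndarray) -> np.ndarray:
    """Map the per-frame embeddings of one clip to discrete semantic token ids."""
    return codebook.predict(embeddings)
```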

[92]  arXiv:2405.00236 [pdf, other]
Title: STT: Stateful Tracking with Transformers for Autonomous Driving
Comments: ICRA 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through long term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.

[93]  arXiv:2405.00237 [pdf, ps, other]
Title: A Categorical Approach to Coalgebraic Fixpoint Logic
Subjects: Logic in Computer Science (cs.LO)

We define a framework for incorporating alternation-free fixpoint logics into the dual-adjunction setup for coalgebraic modal logics. We achieve this by using order-enriched categories. We give a least-solution semantics as well as an initial algebra semantics, and prove they are equivalent. We also show how to place the alternation-free coalgebraic $\mu$-calculus in this framework, as well as PDL and a logic with a probabilistic dynamic modality.

[94]  arXiv:2405.00242 [pdf, other]
Title: Guiding Attention in End-to-End Driving Models
Comments: Accepted for publication at the 35th IEEE Intelligent Vehicles Symposium (IV 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving. However, training these well-performing models usually requires a huge amount of data, while still lacking explicit and intuitive activation maps to reveal the inner workings of these models while driving. In this paper, we study how to guide the attention of these models to improve their driving quality and obtain more intuitive activation maps by adding a loss term during training using salient semantic maps. In contrast to previous work, our method does not require these salient semantic maps to be available during testing time and removes the need to modify the architecture of the model to which it is applied. We perform tests using both perfect and noisy salient semantic maps, with encouraging results in both cases; the noisy maps are inspired by possible errors encountered with real data. Using CIL++ as a representative state-of-the-art model and the CARLA simulator with its standard benchmarks, we conduct experiments that show the effectiveness of our method in training better autonomous driving models, especially when data and computational resources are scarce.

[95]  arXiv:2405.00243 [pdf, other]
Title: A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning
Comments: Accepted by IJCAI 2024 Main Track
Subjects: Multiagent Systems (cs.MA); Computer Science and Game Theory (cs.GT)

Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL, by framing each MARL algorithm as a meta-strategy, and repeatedly sampling normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide the basis for constructing a sampling distribution, using bootstrapping, over a variety of game analysis statistics. We use this approach to evaluate state-of-the-art deep MARL algorithms on a class of negotiation games. From statistics on individual payoffs, social welfare, and empirical best-response graphs, we uncover strategic relationships among self-play, population-based, model-free, and model-based MARL methods. We also investigate the effect of run-time search as a meta-strategy operator, and find via meta-game analysis that the search version of a meta-strategy generally leads to improved performance.
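A minimal sketch of the seed-level construction, under our own simplifying assumptions: each algorithm is a meta-strategy, each seed yields one trained policy, cross-play returns between seed pairs are averaged into a normal-form payoff matrix, and bootstrapping over seeds gives a sampling distribution of that matrix.

```python
# Hedged sketch of building and bootstrapping an empirical meta-game.
import numpy as np

def empirical_payoff(returns: np.ndarray) -> np.ndarray:
    """returns: (A, S, A, S) cross-play returns over algorithms and seeds;
    averaging over seed pairs yields an (A, A) meta-game payoff matrix."""
    return returns.mean(axis=(1, 3))

def bootstrap_payoffs(returns: np.ndarray, n_boot: int = 1000, seed=None):
    rng = np.random.default_rng(seed)
    n_seeds = returns.shape[1]
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_seeds, size=n_seeds)  # resample the seed set
        samples.append(empirical_payoff(returns[:, idx][:, :, :, idx]))
    return np.stack(samples)  # (n_boot, A, A) bootstrap distribution
```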

[96]  arXiv:2405.00244 [pdf, other]
Title: Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
Comments: This paper has been accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets and perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruction, we present Real-HDRV, a large-scale real-world benchmark dataset for HDR video reconstruction, featuring various scenes, diverse motion patterns, and high-quality labels. Specifically, our dataset contains 500 LDRs-HDRs video pairs, comprising about 28,000 LDR frames and 4,000 HDR labels, covering daytime, nighttime, indoor, and outdoor scenes. To the best of our knowledge, our dataset is the largest real-world HDR video reconstruction dataset. Correspondingly, we propose an end-to-end network for HDR video reconstruction, where a novel two-stage strategy is designed to perform alignment sequentially. Specifically, the first stage performs global alignment with the adaptively estimated global offsets, reducing the difficulty of subsequent alignment. The second stage implicitly performs local alignment in a coarse-to-fine manner at the feature level using the adaptive separable convolution. Extensive experiments demonstrate that: (1) models trained on our dataset can achieve better performance on real scenes than those trained on synthetic datasets; (2) our method outperforms previous state-of-the-art methods. Our dataset is available at https://github.com/yungsyu99/Real-HDRV.

[97]  arXiv:2405.00246 [pdf, other]
Title: A Framework for Approximation Schemes on Knapsack and Packing Problems of Hyperspheres and Fat Objects
Subjects: Computational Geometry (cs.CG)

Geometric packing problems have been investigated for centuries in mathematics. In contrast, works on sphere packing in the field of approximation algorithms are scarce. Most results are for squares and rectangles, and their d-dimensional counterparts. To help fill this gap, we present a framework that yields approximation schemes for the geometric knapsack problem as well as other packing problems and some generalizations, and that supports not only hyperspheres but also a wide range of shapes for the items and the bins. Our first result is a PTAS for the hypersphere multiple knapsack problem. In fact, we can deal with a more generalized version of the problem that contains additional constraints on the items. These constraints, under some conditions, can encompass very common and pertinent constraints such as conflict constraints, multiple-choice constraints, and capacity constraints. Our second result is a resource augmentation scheme for the multiple knapsack problem for a wide range of convex fat objects, which are not restricted to polygons and polytopes. Examples are ellipsoids, rhombi, hypercubes, hyperspheres under the Lp-norm, etc. Also, for the generalized version of the multiple knapsack problem, our technique still yields a PTAS under resource augmentation for these objects. Thirdly, we improve the resource augmentation schemes of fat objects to allow rotation on the objects by any angle. This result, in particular, brings something extra to our framework, since most results comprising such general objects are limited to translations. At last, our framework is able to contemplate other problems such as the cutting stock problem, the minimum-size bin packing problem and the multiple strip packing problem.

[98]  arXiv:2405.00248 [pdf, other]
Title: Who is Authentic Speaker
Authors: Qiang Huang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Voice conversion (VC) using deep learning technologies can now generate high-quality one-to-many voices and has thus been used in practical application fields such as entertainment and healthcare. However, voice conversion can pose potential social issues when manipulated voices are employed for deceptive purposes. Moreover, it is a big challenge to identify the real speakers behind converted voices, as the acoustic characteristics of the source speakers are changed greatly. In this paper we attempt to explore the feasibility of identifying authentic speakers from converted voices. This study is conducted under the assumption that certain information from the source speakers persists even when their voices are converted into different target voices. Our experiments are therefore geared towards recognising the source speakers given the converted voices, which are generated by using FragmentVC on randomly paired utterances from source and target speakers. To improve robustness against converted voices, our recognition model is constructed by using a hierarchical vector of locally aggregated descriptors (VLAD) in deep neural networks. The authentic speaker recognition system is mainly tested in two aspects: the impact of the quality of the converted voices and the variations of VLAD. The dataset used in this work is the VCTK corpus, where source and target speakers are randomly paired. The results obtained on the converted utterances show promising performance in recognising authentic speakers from converted voices.
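For context, plain (non-hierarchical) VLAD aggregation can be sketched in a few lines; the paper's hierarchical deep variant is richer, so the code below only illustrates the underlying descriptor.

```python
# Minimal VLAD aggregation over frame-level features (illustrative).
import numpy as np

def vlad(features: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """features: (T, D) frame features; centres: (K, D) cluster centres."""
    # hard-assign each frame to its nearest cluster centre
    assign = np.argmin(
        ((features[:, None, :] - centres[None, :, :]) ** 2).sum(-1), axis=1)
    desc = np.zeros_like(centres)
    for k in range(len(centres)):
        residuals = features[assign == k] - centres[k]
        if len(residuals):
            desc[k] = residuals.sum(axis=0)  # sum of residuals per cluster
    desc /= np.linalg.norm(desc, axis=1, keepdims=True) + 1e-12  # intra-normalization
    flat = desc.ravel()
    return flat / (np.linalg.norm(flat) + 1e-12)  # global L2 normalization
```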

[99]  arXiv:2405.00250 [pdf, other]
Title: SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations
Comments: 8 pages, 6 figures, Accepted to IV 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Vector maps are essential in autonomous driving for tasks like localization and planning, yet their creation and maintenance are notably costly. While recent advances in online vector map generation for autonomous vehicles are promising, current models lack adaptability to different sensor configurations. They tend to overfit to specific sensor poses, leading to decreased performance and higher retraining costs. This limitation hampers their practical use in real-world applications. In response to this challenge, we propose a modular pipeline for vector map generation with improved generalization to sensor configurations. The pipeline leverages probabilistic semantic mapping to generate a bird's-eye-view (BEV) semantic map as an intermediate representation. This intermediate representation is then converted to a vector map using the MapTRv2 decoder. By adopting a BEV semantic map robust to different sensor configurations, our proposed approach significantly improves the generalization performance. We evaluate the model on datasets with sensor configurations not used during training. Our evaluation set includes larger public datasets and smaller-scale private data collected on our platform. Our model generalizes significantly better than the state-of-the-art methods.

[100]  arXiv:2405.00251 [pdf, other]
Title: Semantically Consistent Video Inpainting with Conditional Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper we reframe video inpainting as a conditional generative modeling problem and present a framework for solving such problems with conditional video diffusion models. We highlight the advantages of using a generative approach for this task, showing that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context.

[101]  arXiv:2405.00253 [pdf, other]
Title: CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification
Subjects: Computation and Language (cs.CL); Software Engineering (cs.SE)

Large Language Models (LLMs) have made significant advancements in the field of code generation, offering unprecedented support for automated programming and assisting developers. However, LLMs sometimes generate code that appears plausible but fails to meet the expected requirements or executes incorrectly. This phenomenon of hallucinations in the coding field has not been explored. To advance the community's understanding and research on code hallucinations in LLMs, we propose a definition method for these hallucinations based on execution verification and introduce the concept of code hallucinations for the first time. We categorize code hallucinations into four main types: mapping, naming, resource, and logic hallucinations, each further divided into different subcategories to better understand and address the unique challenges faced by LLMs during code generation. To systematically evaluate code hallucinations, we propose a dynamic detection algorithm for code hallucinations and construct the CodeHalu benchmark, which includes 8,883 samples from 699 tasks, to actively detect hallucination phenomena in LLMs during programming. We tested 16 popular LLMs on this benchmark to evaluate the frequency and nature of their hallucinations during code generation. The findings reveal significant variations in the accuracy and reliability of LLMs in generating code, highlighting the urgent need to improve models and training methods to ensure the functional correctness and safety of automatically generated code. This study not only classifies and quantifies code hallucinations but also provides insights for future improvements in LLM-based code generation research. The CodeHalu benchmark and code are publicly available at https://github.com/yuchen814/CodeHalu.
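Execution-based verification of the kind described above can be sketched very simply: run the generated program on sample inputs and compare outputs, flagging crashes, timeouts, or mismatches. The snippet below is an illustration under our own assumptions, not the CodeHalu pipeline, and untrusted generated code should of course be executed in a sandbox.

```python
# Hedged sketch of execution-based verification of generated code.
import subprocess
import sys

def run_case(code: str, stdin: str, expected: str, timeout: float = 5.0) -> bool:
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code], input=stdin, text=True,
            capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0 and proc.stdout.strip() == expected.strip()

def passes_all(code: str, cases) -> bool:
    """cases: iterable of (stdin, expected_stdout) pairs."""
    return all(run_case(code, i, o) for i, o in cases)
```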

[102]  arXiv:2405.00254 [pdf, other]
Title: Principled RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable successes in fine-tuning large-language models recently. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous, and can be encoded by a single reward model. In this paper, we focus on addressing the issues due to the inherent heterogeneity in human preferences, as well as their potential strategic behavior in providing feedback. Specifically, we propose two frameworks to address heterogeneous human feedback in principled ways: a personalization-based one and an aggregation-based one. For the former, we propose two approaches based on representation learning and clustering, respectively, for learning multiple reward models that trade off the bias (due to preference heterogeneity) and variance (due to the use of fewer data for learning each model by personalization). We then establish sample complexity guarantees for both approaches. For the latter, we aim to adhere to the single-model framework, as already deployed in the current RLHF paradigm, by carefully aggregating diverse and truthful preferences from humans. We propose two approaches based on reward and preference aggregation, respectively: the former utilizes both utilitarianism and Leximin approaches to aggregate individual reward models, with sample complexity guarantees; the latter directly aggregates the human feedback in the form of probabilistic opinions. Under the probabilistic-opinion-feedback model, we also develop an approach to handle strategic human labelers who may bias and manipulate the aggregated preferences with untruthful feedback. Based on the ideas in mechanism design, our approach ensures truthful preference reporting, with the induced aggregation rule maximizing social welfare functions.
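The two aggregation rules named above can be illustrated with a toy sketch over per-individual reward scores; the function names are ours and the paper's formal definitions are more general.

```python
# Hedged sketch of utilitarian vs. leximin aggregation of individual rewards.
from typing import Sequence

def utilitarian(rewards: Sequence[float]) -> float:
    """Utilitarian rule: average the individual rewards."""
    return sum(rewards) / len(rewards)

def leximin_key(rewards: Sequence[float]) -> tuple:
    """Leximin comparison key: comparing sorted reward tuples lexicographically
    favours the option whose worst-off individual is best off, breaking ties
    by the next worst-off, and so on."""
    return tuple(sorted(rewards))

# Example: choosing among candidate responses scored by several reward models
# best_util = max(candidates, key=lambda r: utilitarian(scores[r]))
# best_lexi = max(candidates, key=lambda r: leximin_key(scores[r]))
```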

[103]  arXiv:2405.00256 [pdf, other]
Title: ASAM: Boosting Segment Anything Model with Adversarial Tuning
Authors: Bo Li, Haoke Xiao, Lv Tang
Comments: This paper is accepted by CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations rather than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundational models in computer vision. Our project page is in https://asam2024.github.io/.

[104]  arXiv:2405.00258 [pdf, ps, other]
Title: Nearly Perfect Covering Codes
Subjects: Information Theory (cs.IT)

Nearly perfect packing codes are those codes that meet the Johnson upper bound on the size of error-correcting codes. This bound is an improvement to the sphere-packing bound. A related bound for covering codes is known as the van Wee bound. Codes that meet this bound will be called nearly perfect covering codes. In this paper, such codes with covering radius one will be considered. It will be proved that these codes can be partitioned into three families depending on the smallest distance between neighboring codewords. Some of the codes contained in these families will be completely characterized. Constructions of codes for each such family will be presented, the weight distribution of codes from these families will be examined, and some codes with special properties will be discussed.

[105]  arXiv:2405.00260 [pdf, other]
Title: CREPE: Coordinate-Aware End-to-End Document Parser
Comments: Accepted at the International Conference on Document Analysis and Recognition (ICDAR 2024) main conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this study, we formulate an OCR-free sequence generation model for visual document understanding (VDU). Our model not only parses text from document images but also extracts the spatial coordinates of the text based on the multi-head architecture. Named the Coordinate-aware End-to-end Document Parser (CREPE), our method uniquely integrates these capabilities by introducing a special token for OCR text and token-triggered coordinate decoding. We also propose a weakly-supervised framework for cost-efficient training, requiring only parsing annotations without high-cost coordinate annotations. Our experimental evaluations demonstrate CREPE's state-of-the-art performance on document parsing tasks. Beyond that, CREPE's adaptability is further highlighted by its successful use in other document understanding tasks such as layout analysis, document visual question answering, and so on. CREPE's abilities, including OCR and semantic parsing, not only mitigate error propagation issues in existing OCR-dependent methods but also significantly enhance the functionality of sequence generation models, ushering in a new era for document understanding studies.

[106]  arXiv:2405.00262 [pdf, other]
Title: Improved Massively Parallel Triangle Counting in $O(1)$ Rounds
Comments: To appear in PODC 2024
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

In this short note, we give a novel algorithm for $O(1)$ round triangle counting in bounded arboricity graphs. Counting triangles in $O(1)$ rounds (exactly) is listed as one of the interesting remaining open problems in the recent survey of Im et al. [IKLMV23]. The previous paper of Biswas et al. [BELMR20], which achieved the best bounds under this setting, used $O(\log \log n)$ rounds in sublinear space per machine and $O(m\alpha)$ total space where $\alpha$ is the arboricity of the graph and $n$ and $m$ are the number of vertices and edges in the graph, respectively. Our new algorithm is very simple, achieves the optimal $O(1)$ rounds without increasing the space per machine and the total space, and has the potential of being easily implementable in practice.
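For readers unfamiliar with the sequential baseline, the classic degree-ordered triangle count (not the paper's MPC algorithm) orients each edge from its lower-ranked to its higher-ranked endpoint and charges each triangle to its lowest-ranked vertex, which keeps out-degrees small in low-arboricity graphs.

```python
# Sequential degree-ordered triangle counting (illustrative background only).
from collections import defaultdict

def count_triangles(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    rank = {v: (len(adj[v]), v) for v in adj}          # order by degree, then id
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    # each triangle is counted exactly once, at its lowest-ranked vertex
    return sum(1 for v in out for w in out[v] for x in out[w] if x in out[v])

print(count_triangles([(0, 1), (1, 2), (0, 2), (2, 3)]))  # prints 1
```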

[107]  arXiv:2405.00263 [pdf, other]
Title: Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) suffer from low efficiency due to the mismatch between the requirements of auto-regressive decoding and the design of most contemporary GPUs. Specifically, billions to trillions of parameters must be loaded to the GPU cache through its limited memory bandwidth for computation, but only a small batch of tokens is actually computed. Consequently, the GPU spends most of its time on memory transfer instead of computation. Recently, parallel decoding, a type of speculative decoding algorithm, has become more popular and has demonstrated impressive efficiency improvements in generation. It introduces extra decoding heads to large models, enabling them to predict multiple subsequent tokens simultaneously and verify these candidate continuations in a single decoding step. However, this approach deviates from the training objective of next-token prediction used during pre-training, resulting in a low hit rate for candidate tokens. In this paper, we propose a new speculative decoding algorithm, Clover, which integrates sequential knowledge into the parallel decoding process. This enhancement improves the hit rate of speculators and thus boosts the overall efficiency. Clover transmits the sequential knowledge from pre-speculated tokens via the Regressive Connection, then employs an Attention Decoder to integrate these speculated tokens. Additionally, Clover incorporates an Augmenting Block that modifies the hidden states to better align with the purpose of speculative generation rather than next-token prediction. The experiment results demonstrate that Clover outperforms the baseline by up to 91% on Baichuan-Small and 146% on Baichuan-Large, respectively, and exceeds the performance of the previously top-performing method, Medusa, by up to 37% on Baichuan-Small and 57% on Baichuan-Large, respectively.
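The verification step of parallel speculative decoding, in its simplest greedy form, can be sketched as follows; this generic sketch is ours and does not reproduce Clover's Regressive Connection, Attention Decoder, or Augmenting Block.

```python
# Greedy verification sketch for speculative/parallel decoding: keep the
# longest prefix of draft tokens that matches the target model's own argmax.
def verify_greedy(draft_tokens, target_argmax):
    """draft_tokens[i]: i-th speculated token;
    target_argmax[i]: target model's argmax given context + draft_tokens[:i]."""
    accepted = []
    for drafted, preferred in zip(draft_tokens, target_argmax):
        if drafted != preferred:
            break
        accepted.append(drafted)
    # the target's next prediction after the accepted prefix can be appended
    # as a free "bonus" token, so each step yields at least one new token
    return accepted
```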

[108]  arXiv:2405.00264 [pdf, other]
Title: Using Texture to Classify Forests Separately from Vegetation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Identifying terrain within satellite image data is a key issue in geographical information sciences, with numerous environmental and safety implications. Many techniques exist to derive classifications from spectral data captured by satellites. However, the ability to reliably classify vegetation remains a challenge. In particular, no precise methods exist for classifying forest vs. non-forest vegetation in high-level satellite images. This paper provides an initial proposal for a static, algorithmic process to identify forest regions in satellite image data through texture features created from detected edges and the NDVI ratio captured by Sentinel-2 satellite images. With strong initial results, this paper also identifies the next steps to improve the accuracy of the classification and verification processes.

[109]  arXiv:2405.00266 [pdf, other]
Title: Robot-As-A-Sensor: Forming a Sensing Network with Robots for Underground Mining Missions
Comments: Submitted to Special Issue on Neuro-Inspired Learning for Robotics for IEEE Transactions on Cognitive and Developmental Systems
Subjects: Networking and Internet Architecture (cs.NI)

Nowadays, robots are deployed as mobile platforms equipped with sensing, communication and computing capabilities, especially in the mining industry, where they perform tasks in hazardous and repetitive environments. Despite their potential, individual robots face significant limitations when completing complex tasks that require the collaboration of multiple robots. This collaboration requires a robust wireless network to ensure operational efficiency and reliability. This paper introduces the concept of "Robot-As-A-Sensor" (RAAS), which treats the robots as mobile sensors within structures similar to Wireless Sensor Networks (WSNs). We later identify specific challenges in integrating RAAS technology and propose technological advancements to address these challenges. Finally, we provide an outlook about the technologies that can contribute to realising RAAS, suggesting that this approach could catalyse a shift towards safer, more intelligent, and sustainable industry practices. We believe that this innovative RAAS framework could significantly transform industries requiring advanced technological integration.

[110]  arXiv:2405.00267 [pdf, other]
Title: Differentially Private Release of Israel's National Registry of Live Births
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Data Structures and Algorithms (cs.DS)

In February 2024, Israel's Ministry of Health released microdata of live births in Israel in 2014. The dataset is based on Israel's National Registry of Live Births and offers substantial value in multiple areas, such as scientific research and policy-making. At the same time, the data was processed so as to protect the privacy of 2014's mothers and newborns. The release was co-designed by the authors together with stakeholders from both inside and outside the Ministry of Health. This paper presents the methodology used to obtain that release. It also describes the considerations involved in choosing the methodology and the process followed.
We used differential privacy as our formal measure of the privacy loss incurred by the released dataset. More concretely, we prove that the released dataset is differentially private with privacy loss budget $\varepsilon = 9.98$. We extensively used the private selection algorithm of Liu and Talwar (STOC 2019) to bundle together multiple steps such as data transformation, model generation algorithm, hyperparameter selection, and evaluation. The model generation algorithm selected was PrivBayes (Zhang et al., SIGMOD 2014). The evaluation was based on a list of acceptance criteria, which were also disclosed only approximately so as to provide an overall differential privacy guarantee. We also discuss concrete challenges and barriers that appear relevant to the next steps of this pilot project, as well as to future differentially private releases.

[111]  arXiv:2405.00268 [pdf, other]
Title: A biased random-key genetic algorithm with variable mutants to solve a vehicle routing problem
Comments: 25 pages, 9 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

The paper explores the Biased Random-Key Genetic Algorithm (BRKGA) in the domain of logistics and vehicle routing. Specifically, the application of the algorithm is contextualized within the framework of the Vehicle Routing Problem with Occasional Drivers and Time Window (VRPODTW), which represents a critical challenge in contemporary delivery systems. Within this context, BRKGA emerges as an innovative solution approach to optimize routing plans, balancing cost-efficiency with operational constraints. This research introduces a new BRKGA, characterized by a mutant population that can vary from generation to generation, named BRKGA-VM. This novel variant was tested to solve a VRPODTW. For this purpose, an innovative specific decoder procedure was proposed and implemented. Furthermore, a hybridization of the algorithm with a Variable Neighborhood Descent (VND) algorithm has also been considered, showing an improvement in problem-solving capabilities. Computational results show better performance in terms of effectiveness over a previous version of BRKGA, denoted as MP. The improved performance of BRKGA-VM is evident from its ability to optimize solutions across a wide range of scenarios, with significant improvements observed for each type of instance considered. The analysis also reveals that VM achieves preset goals more quickly compared to MP, thanks to the increased variability induced in the mutant population, which facilitates the exploration of new regions of the solution space. Furthermore, the integration of VND has shown an additional positive impact on the quality of the solutions found.

[112]  arXiv:2405.00269 [pdf, other]
Title: Adaptive Integral Sliding Mode Control for Attitude Tracking of Underwater Robots With Large Range Pitch Variations in Confined Space
Subjects: Robotics (cs.RO)

Underwater robots play a crucial role in exploring aquatic environments. The ability to flexibly adjust their attitudes is essential for underwater robots to effectively accomplish tasks in confined space. However, the highly coupled six degrees of freedom dynamics resulting from attitude changes and the complex turbulence within limited spatial areas present significant challenges. To address the problem of attitude control of underwater robots, this letter investigates large-range pitch angle tracking during station holding as well as simultaneous roll and yaw angle control to enable versatile attitude adjustments. Based on dynamic modeling, this letter proposes an adaptive integral sliding mode controller (AISMC) that integrates an integral module into traditional sliding mode control (SMC) and adaptively adjusts the switching gain for improved tracking accuracy, reduced chattering, and enhanced robustness. The stability of the closed-loop control system is established through Lyapunov analysis. Extensive experiments and comparison studies are conducted using a commercial remotely operated vehicle (ROV), the results of which demonstrate that AISMC achieves satisfactory performance in attitude tracking control in confined space with unknown disturbances, significantly outperforming both PID and SMC.

[113]  arXiv:2405.00273 [pdf, other]
Title: Social Life Simulation for Non-Cognitive Skills Learning
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

Non-cognitive skills are crucial for personal and social life well-being, and such skill development can be supported by narrative-based (e.g., storytelling) technologies. While generative AI enables interactive and role-playing storytelling, little is known about how users engage with and perceive the use of AI in social life simulation for non-cognitive skills learning. To this end, we introduced SimuLife++, an interactive platform enabled by a large language model (LLM). The system allows users to act as protagonists, creating stories with one or multiple AI-based characters in diverse social scenarios. In particular, we expanded the Human-AI interaction to a Human-AI-AI collaboration by including a sage agent, who acts as a bystander to provide users with more insightful perspectives on their choices and conversations. Through a within-subject user study, we found that the inclusion of the sage agent significantly enhanced narrative immersion, according to the narrative transportation scale, leading to more messages, particularly in group chats. Participants' interactions with the sage agent were also associated with significantly higher scores in their perceived motivation, self-perceptions, and resilience and coping, indicating positive impacts on non-cognitive skills reflection. Participants' interview results further explained the sage agent's aid in decision-making, solving ethical dilemmas, and problem-solving; on the other hand, they suggested improvements in user control and balanced responses from multiple characters. We provide design implications on the application of generative AI in narrative solutions for non-cognitive skill development in broader social contexts.

[114]  arXiv:2405.00280 [pdf, other]
Title: Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Information Retrieval (cs.IR)

News coverage profoundly affects how countries and individuals behave in international relations. Yet, we have little empirical evidence of how news coverage varies across countries. To enable studies of global news coverage, we develop an efficient computational methodology that comprises three components: (i) a transformer model to estimate multilingual news similarity; (ii) a global event identification system that clusters news based on a similarity network of news articles; and (iii) measures of news synchrony across countries and news diversity within a country, based on country-specific distributions of news coverage of the global events. Each component achieves state-of-the art performance, scaling seamlessly to massive datasets of millions of news articles. We apply the methodology to 60 million news articles published globally between January 1 and June 30, 2020, across 124 countries and 10 languages, detecting 4357 news events. We identify the factors explaining diversity and synchrony of news coverage across countries. Our study reveals that news media tend to cover a more diverse set of events in countries with larger Internet penetration, more official languages, larger religious diversity, higher economic inequality, and larger populations. Coverage of news events is more synchronized between countries that not only actively participate in commercial and political relations -- such as, pairs of countries with high bilateral trade volume, and countries that belong to the NATO military alliance or BRICS group of major emerging economies -- but also countries that share certain traits: an official language, high GDP, and high democracy indices.

[115]  arXiv:2405.00283 [pdf, ps, other]
Title: An Unstructured Mesh Reaction-Drift-Diffusion Master Equation with Reversible Reactions
Subjects: Numerical Analysis (math.NA)

We develop a convergent reaction-drift-diffusion master equation (CRDDME) to facilitate the study of reaction processes in which spatial transport is influenced by drift due to one-body potential fields within general domain geometries. The generalized CRDDME is obtained through two steps. We first derive an unstructured grid jump process approximation for reversible diffusions, enabling the simulation of drift-diffusion processes where the drift arises due to a conservative field that biases particle motion. Leveraging the Edge-Averaged Finite Element method, our approach preserves detailed balance of drift-diffusion fluxes at equilibrium, and preserves an equilibrium Gibbs-Boltzmann distribution for particles undergoing drift-diffusion on the unstructured mesh. We next formulate a spatially-continuous volume reactivity particle-based reaction-drift-diffusion model for reversible reactions of the form $\textrm{A} + \textrm{B} \leftrightarrow \textrm{C}$. A finite volume discretization is used to generate jump process approximations to reaction terms in this model. The discretization is developed to ensure the combined reaction-drift-diffusion jump process approximation is consistent with detailed balance of reaction fluxes holding at equilibrium, along with supporting a discrete version of the continuous equilibrium state. The new CRDDME model represents a continuous-time discrete-space jump process approximation to the underlying volume reactivity model. We demonstrate the convergence and accuracy of the new CRDDME through a number of numerical examples, and illustrate its use on an idealized model for membrane protein receptor dynamics in T cell signaling.

[116]  arXiv:2405.00285 [pdf, other]
Title: iMTSP: Solving Min-Max Multiple Traveling Salesman Problem with Imperative Learning
Comments: 8 pages, 3 figures, 3 tables
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

This paper considers a Min-Max Multiple Traveling Salesman Problem (MTSP), where the goal is to find a set of tours, one for each agent, to collectively visit all the cities while minimizing the length of the longest tour. Though MTSP has been widely studied, obtaining near-optimal solutions for large-scale problems is still challenging due to its NP-hardness. Recent efforts in data-driven methods face challenges of the need for hard-to-obtain supervision and issues with high variance in gradient estimations, leading to slow convergence and highly suboptimal solutions. We address these issues by reformulating MTSP as a bilevel optimization problem, using the concept of imperative learning (IL). This involves introducing an allocation network that decomposes the MTSP into multiple single-agent traveling salesman problems (TSPs). The longest tour from these TSP solutions is then used to self-supervise the allocation network, resulting in a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP). Additionally, to tackle the high-variance gradient issues during the optimization, we introduce a control variate-based gradient estimation algorithm. Our experiments showed that these innovative designs enable our gradient estimator to converge 20% faster than the advanced reinforcement learning baseline and find up to 80% shorter tour length compared with Google OR-Tools MTSP solver, especially in large-scale problems (e.g. 1000 cities and 15 agents).

[117]  arXiv:2405.00287 [pdf, other]
Title: Stochastic Sampling for Contrastive Views and Hard Negative Samples in Graph-based Collaborative Filtering
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Graph-based collaborative filtering (CF) has emerged as a promising approach in recommendation systems. Despite its achievements, graph-based CF models face challenges due to data sparsity and negative sampling. In this paper, we propose a novel Stochastic sampling for i) COntrastive views and ii) hard NEgative samples (SCONE) to overcome these issues. Observing that both are sampling tasks, we generate dynamic augmented views and diverse hard negative samples via our unified stochastic sampling framework based on score-based generative models. In our comprehensive evaluations with 6 benchmark datasets, our proposed SCONE significantly improves recommendation accuracy and robustness, and demonstrates the superiority of our approach over existing CF models. Furthermore, we demonstrate the efficacy of user-item specific stochastic sampling for addressing the user sparsity and item popularity issues. The integration of stochastic sampling and graph-based CF achieves state-of-the-art performance in personalized recommendation systems, making significant strides in information-rich environments.

[118]  arXiv:2405.00289 [pdf, other]
Title: Adversarial Attacks and Defense for Conversation Entailment Task
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have proved to be very powerful on different NLP tasks. However, there are still many ways to attack such models at very low cost, so how to defend them becomes an important problem. In our work, we treat adversarial attack results as a new (unseen) domain of the model, and we frame the defense problem as one of improving the robustness of the model on this new domain. We focus on the task of conversation entailment, where multi-turn natural language dialogues are the premise and a transformer model is fine-tuned to predict whether a given hypothesis about the given dialogue is true or false. The adversary attacks the hypothesis to fool the model into making wrong predictions. We apply synonym swapping as the attack method. We implement several fine-tuning strategies and propose an embedding perturbation loss to improve the robustness of the model. Finally, we show the importance of our work by discussing adversarial attacks in NLP in the real world.

[119]  arXiv:2405.00291 [pdf, other]
Title: How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses
Comments: 11 pages, full research paper, EDM 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Automated explanatory feedback systems play a crucial role in facilitating learning for a large cohort of learners by offering feedback that incorporates explanations, significantly enhancing the learning process. However, delivering such explanatory feedback in real-time poses challenges, particularly when high classification accuracy for domain-specific, nuanced responses is essential. Our study leverages the capabilities of large language models, specifically Generative Pre-Trained Transformers (GPT), to explore a sequence labeling approach focused on identifying components of desired and less desired praise for providing explanatory feedback within a tutor training dataset. Our aim is to equip tutors with actionable, explanatory feedback during online training lessons. To investigate the potential of GPT models for providing the explanatory feedback, we employed two commonly-used approaches: prompting and fine-tuning. To quantify the quality of highlighted praise components identified by GPT models, we introduced a Modified Intersection over Union (M-IoU) score. Our findings demonstrate that: (1) the M-IoU score effectively correlates with human judgment in evaluating sequence quality; (2) using two-shot prompting on GPT-3.5 resulted in decent performance in recognizing effort-based (M-IoU of 0.46) and outcome-based praise (M-IoU of 0.68); and (3) our optimally fine-tuned GPT-3.5 model achieved M-IoU scores of 0.64 for effort-based praise and 0.84 for outcome-based praise, aligning with the satisfaction levels evaluated by human coders. Our results show promise for using GPT models to provide feedback that focuses on specific elements in their open-ended responses that are desirable or could use improvement.
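
As a rough illustration of span-overlap scoring, the sketch below computes a plain token-level intersection-over-union between a predicted and a gold highlighted span; the paper's Modified IoU (M-IoU) has its own definition, so treat this only as the general flavour of the metric, not its exact form.

    def span_iou(pred_tokens: set[int], gold_tokens: set[int]) -> float:
        """IoU over token indices marked as praise components."""
        if not pred_tokens and not gold_tokens:
            return 1.0
        inter = len(pred_tokens & gold_tokens)
        union = len(pred_tokens | gold_tokens)
        return inter / union

    # Predicted highlight covers tokens 3-6, gold highlight covers tokens 4-7.
    print(span_iou({3, 4, 5, 6}, {4, 5, 6, 7}))   # 3/5 = 0.6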

[120]  arXiv:2405.00293 [pdf, other]
Title: MoPEFT: A Mixture-of-PEFTs for the Segment Anything Model
Comments: Workshop on Foundation Models, CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The emergence of foundation models, such as the Segment Anything Model (SAM), has sparked interest in Parameter-Efficient Fine-Tuning (PEFT) methods that tailor these large models to application domains outside their training data. However, different PEFT techniques modify the representation of a model differently, making it a non-trivial task to select the most appropriate method for the domain of interest. We propose a new framework, Mixture-of-PEFTs methods (MoPEFT), that is inspired by traditional Mixture-of-Experts (MoE) methodologies and is utilized for fine-tuning SAM. Our MoPEFT framework incorporates three different PEFT techniques as submodules and dynamically learns to activate the ones that are best suited for a given data-task setup. We test our method on the Segment Anything Model and show that MoPEFT consistently outperforms other fine-tuning methods on the MESS benchmark.
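
A schematic PyTorch sketch of the mixture idea: several PEFT-style submodules (here tiny low-rank adapters standing in for different PEFT techniques) are combined by a learned gate on top of a frozen layer. Module shapes and the gating rule are assumptions for illustration, not the MoPEFT implementation or its integration into SAM.

    import torch
    import torch.nn as nn

    class MixtureOfPEFTs(nn.Module):
        def __init__(self, dim: int, n_experts: int = 3, rank: int = 4):
            super().__init__()
            self.frozen = nn.Linear(dim, dim)          # stands in for a frozen backbone layer
            self.frozen.requires_grad_(False)
            # Each "expert" is a tiny low-rank adapter (placeholder for LoRA/adapter/prompt PEFTs).
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, rank, bias=False), nn.Linear(rank, dim, bias=False))
                for _ in range(n_experts)
            )
            self.gate = nn.Linear(dim, n_experts)      # learns which PEFT submodule to activate

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            weights = torch.softmax(self.gate(x), dim=-1)                   # (..., n_experts)
            expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (..., dim, n_experts)
            mixed = (expert_out * weights.unsqueeze(-2)).sum(dim=-1)
            return self.frozen(x) + mixed

    x = torch.randn(2, 16)
    print(MixtureOfPEFTs(dim=16)(x).shape)   # torch.Size([2, 16])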

[121]  arXiv:2405.00295 [pdf, other]
Title: Proof of Sampling: A Nash Equilibrium-Secured Verification Protocol for Decentralized Systems
Subjects: Computer Science and Game Theory (cs.GT)

This paper presents a secure and versatile sampling-based verification protocol, Proof of Sampling (PoSP) protocol, suitable for a wide range of decentralized applications. Our protocol has a pure strategy Nash Equilibrium, which compels rational participants to act honestly, thus fortifying the network's integrity. This design effectively eliminates the possibility of free-riding, achieving this with manageable computational overhead. When applied to decentralized inference for AI applications, we design spML based on PoSP protocol, which ingeniously amalgamates the strengths of optimistic fraud proof and zero knowledge proof based approaches, the foremost approaches in the domain at present. Within the realm of Layer 2 solutions, our protocol sp-rollups addresses the security vulnerabilities of current optimistic rollups, which include a risk of undetected fraud due to reliance on mixed strategy equilibria, all the while keeping the computational overhead within reasonable bounds. Moreover, the PoSP protocol can be effectively utilized for designing verification mechanisms within Actively Validated Services (AVS) in EigenLayer, further broadening its applicability. This innovative approach not only enhances the security and efficiency of decentralized systems but also paves the way for a new generation of scalable and reliable decentralized applications.

[122]  arXiv:2405.00298 [pdf, other]
Title: The Reversing Machine: Reconstructing Memory Assumptions
Subjects: Cryptography and Security (cs.CR)

Existing anti-malware software and reverse engineering toolkits struggle with stealthy sub-OS rootkits due to limitations of run-time kernel-level monitoring. A malicious kernel-level driver can bypass OS-level anti-virus mechanisms easily. Although static analysis of such malware is possible, obfuscation and packing techniques complicate offline analysis. Moreover, current dynamic analyzers suffer from virtualization performance overhead and create detectable traces that allow modern malware to evade them.
To address these issues, we present \textit{The Reversing Machine} (TRM), a new hypervisor-based memory introspection design for reverse engineering, reconstructing memory offsets, and fingerprinting evasive and obfuscated user-level and kernel-level malware. TRM proposes two novel techniques that enable efficient and transparent analysis of evasive malware: hooking a binary using suspended process creation for hypervisor-based memory introspection, and leveraging Mode-Based Execution Control (MBEC) to detect user/kernel mode transitions and memory access patterns. Unlike existing malware detection environments, TRM can extract full memory traces in user and kernel spaces and hook the entire target memory map to reconstruct arrays, structures within the operating system, and possible rootkits.
We perform TRM-assisted reverse engineering of kernel-level structures and show that it can speed up manual reverse engineering by 75\% on average. We obfuscate known malware with the latest packing tools and successfully perform similarity detection. Furthermore, we demonstrate a real-world attack by deploying a modified rootkit onto a driver that bypasses state-of-the-art security auditing tools. We show that TRM can detect each threat and that, out of 24 state-of-the-art AV solutions, only TRM can detect the most advanced threats.

[123]  arXiv:2405.00300 [pdf, other]
Title: On a new class of BDF and IMEX schemes for parabolic type equations
Comments: This article was accepted for publication in the SIAM Journal on Numerical Analysis on April 30, 2024
Subjects: Numerical Analysis (math.NA)

When applying the classical multistep schemes for solving differential equations, one often faces the dilemma that smaller time steps are needed with higher-order schemes, making it impractical to use high-order schemes for stiff problems. We construct in this paper a new class of BDF and implicit-explicit (IMEX) schemes for parabolic type equations based on the Taylor expansions at time $t^{n+\beta}$ with $\beta > 1$ being a tunable parameter. These new schemes, with a suitable $\beta$, allow larger time steps at higher order for stiff problems than is allowed with a usual higher-order scheme. For parabolic type equations, we identify an explicit uniform multiplier for the new second- to fourth-order schemes, and conduct rigorous stability and error analyses using the energy argument. We also present ample numerical examples to validate our findings.
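
For orientation, the sketch below applies the classical second-order IMEX-BDF scheme, corresponding to Taylor expansion at $t^{n+1}$ (i.e. the usual $\beta = 1$ case), to the stiff model problem $u' = -au + \sin t$; the paper's new schemes generalize this construction with a tunable $\beta > 1$ and are not reproduced here.

    import numpy as np

    a, T, N = 50.0, 2.0, 200               # stiffness, final time, number of steps
    dt = T / N
    t = np.linspace(0.0, T, N + 1)

    def explicit_part(tn):                 # non-stiff forcing, treated explicitly
        return np.sin(tn)

    u = np.empty(N + 1)
    u[0] = 1.0
    # One IMEX Euler step to start the two-step method.
    u[1] = (u[0] + dt * explicit_part(t[0])) / (1.0 + a * dt)
    # IMEX-BDF2: (3u^{n+1} - 4u^n + u^{n-1}) / (2 dt) = -a u^{n+1} + 2 f^n - f^{n-1}.
    for n in range(1, N):
        rhs = 4 * u[n] - u[n - 1] + 2 * dt * (2 * explicit_part(t[n]) - explicit_part(t[n - 1]))
        u[n + 1] = rhs / (3.0 + 2.0 * a * dt)

    # Exact solution of u' = -a u + sin t with u(0) = 1, for comparison.
    exact = (a * np.sin(T) - np.cos(T)) / (a**2 + 1) \
            + (1.0 + 1.0 / (a**2 + 1)) * np.exp(-a * T)
    print("numerical:", u[-1], " exact:", exact)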

[124]  arXiv:2405.00301 [pdf, other]
Title: LITO: Learnable Intervention for Truthfulness Optimization
Comments: 14 pages, 5 figures
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) can generate long-form and coherent text, but they still frequently hallucinate facts, thus limiting their reliability. To address this issue, inference-time methods that elicit truthful responses have been proposed by shifting LLM representations towards learned "truthful directions". However, applying the truthful directions with the same intensity fails to generalize across different question contexts. We propose LITO, a Learnable Intervention method for Truthfulness Optimization that automatically identifies the optimal intervention intensity tailored to a specific context. LITO explores a sequence of model generations based on increasing levels of intervention intensities. It selects the most accurate response or refuses to answer when the predictions are highly uncertain. Experiments on multiple LLMs and question-answering datasets demonstrate that LITO improves truthfulness while preserving task accuracy. The adaptive nature of LITO counters issues with one-size-fits-all intervention-based solutions, maximizing model truthfulness by reflecting internal knowledge only when the model is confident.
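
A hedged sketch of the selection loop behind such an adaptive intervention: generate a response at several intervention intensities, keep the most confident one, and abstain when every confidence stays low. The generation and confidence functions below are placeholders, not LITO's actual components.

    def dummy_generate(question: str, intensity: float) -> tuple[str, float]:
        """Placeholder: returns (answer, confidence) for a given intervention intensity."""
        confidence = max(0.0, 1.0 - abs(intensity - 1.0))   # toy confidence peaking at intensity 1.0
        return f"answer to '{question}' @ intensity {intensity:.1f}", confidence

    def select_response(question: str, intensities=(0.0, 0.5, 1.0, 1.5, 2.0),
                        min_confidence: float = 0.6) -> str:
        best_answer, best_conf = None, -1.0
        for alpha in intensities:                 # sweep increasing intervention intensities
            answer, conf = dummy_generate(question, alpha)
            if conf > best_conf:
                best_answer, best_conf = answer, conf
        # Refuse to answer when even the best generation is uncertain.
        return best_answer if best_conf >= min_confidence else "I don't know."

    print(select_response("Who wrote Hamlet?"))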

[125]  arXiv:2405.00302 [pdf, other]
Title: Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models
Subjects: Computation and Language (cs.CL)

In feedback generation for logical errors in programming assignments, large language model (LLM)-based methods have shown great promise. These methods ask the LLM to generate feedback given the problem statement and a student's (buggy) submission. There are several issues with these types of methods. First, the generated feedback messages are often too direct in revealing the error in the submission and thus diminish valuable opportunities for the student to learn. Second, they do not consider the student's learning context, i.e., their previous submissions, current knowledge, etc. Third, they are not layered since existing methods use a single, shared prompt for all student submissions. In this paper, we explore using LLMs to generate a "feedback-ladder", i.e., multiple levels of feedback for the same problem-submission pair. We evaluate the quality of the generated feedback-ladder via a user study with students, educators, and researchers. Overall, the study showed diminishing effectiveness for higher-level feedback and for higher-scoring submissions. In practice, our method enables teachers to select an appropriate level of feedback to show a student based on their personal learning context, or to progressively reveal more detailed feedback if higher-level feedback fails to correct the student's error.

[126]  arXiv:2405.00303 [pdf, other]
Title: Joint Optimization of Piecewise Linear Ensembles
Comments: 7 pages, 4 figures, submitted to IEEE MLSP 2024
Subjects: Machine Learning (cs.LG)

Tree ensembles achieve state-of-the-art performance despite being greedily optimized. Global refinement (GR) reduces greediness by jointly and globally optimizing all constant leaves. We propose Joint Optimization of Piecewise Linear ENsembles (JOPLEN), a piecewise-linear extension of GR. Compared to GR, JOPLEN improves model flexibility and can apply common penalties, including sparsity-promoting matrix norms and subspace-norms, to nonlinear prediction. We evaluate the Frobenius norm, $\ell_{2,1}$ norm, and Laplacian regularization for 146 regression and classification datasets; JOPLEN, combined with GB trees and RF, achieves superior performance in both settings. Additionally, JOPLEN with a nuclear norm penalty empirically learns smooth and subspace-aligned functions. Finally, we perform multitask feature selection by extending the Dirty LASSO. JOPLEN Dirty LASSO achieves a superior feature sparsity/performance tradeoff to linear and gradient boosted approaches. We anticipate that JOPLEN will improve regression, classification, and feature selection across many fields.
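
As a point of reference, the sketch below implements the global-refinement baseline that JOPLEN extends: fit a forest greedily, one-hot encode its leaf assignments, and jointly re-fit all (constant) leaf values with a ridge-regularized linear model. JOPLEN itself replaces the constant leaves with jointly optimized linear leaves and supports richer penalties, which this sketch does not do.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Ridge
    from sklearn.preprocessing import OneHotEncoder

    X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
    forest = RandomForestRegressor(n_estimators=50, max_depth=4, random_state=0).fit(X, y)

    leaves = forest.apply(X)                            # (n_samples, n_trees) leaf indices
    encoder = OneHotEncoder(handle_unknown="ignore").fit(leaves)
    Z = encoder.transform(leaves)                       # sparse leaf-membership features

    refined = Ridge(alpha=1.0).fit(Z, y)                # jointly optimized leaf values
    print("greedy forest R^2 :", forest.score(X, y))
    print("refined leaves R^2:", refined.score(Z, y))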

[127]  arXiv:2405.00307 [pdf, other]
Title: Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
Comments: Accepted by Journal of Natural Language Processing. arXiv admin note: text overlap with arXiv:2310.00283
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called \textsc{After}, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training speech recognition task and the downstream speech emotion recognition task. Then, AL methods are employed to iteratively select a subset of the most informative and diverse samples for fine-tuning, thereby reducing time consumption. Experiments demonstrate that our proposed method \textsc{After}, using only 20\% of samples, improves accuracy by 8.45\% and reduces time consumption by 79\%. The additional extension of \textsc{After} and ablation studies further confirm its effectiveness and applicability to various real-world scenarios. Our source code is available on Github for reproducibility. (https://github.com/Clearloveyuan/AFTER).

[128]  arXiv:2405.00308 [pdf, ps, other]
Title: FPGA Digital Dice using Pseudo Random Number Generator
Comments: 15 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Applications (stat.AP)

The goal of this project is to design a digital dice that displays dice numbers in real time. The numbers are generated by a pseudo-random number generator (PRNG) using the XORshift algorithm, implemented in Verilog HDL on an FPGA. The digital dice is equipped with a tilt sensor, display, power management circuit, and rechargeable battery, hosted in a 3D-printed dice casing. Shaking the digital dice causes the tilt sensor signal to produce a seed for the PRNG. The digital dice can simulate dice with 2, 4, 6, 8, 10, 12, 20, or 100 sides. The kit is named SUTDicey.
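
A software sketch of the XORshift32 generator and its mapping to die faces; the project implements this in Verilog HDL on an FPGA and seeds it from the tilt sensor, so the Python below is purely an algorithmic illustration.

    def xorshift32(state: int) -> int:
        """One step of the 32-bit XORshift PRNG (Marsaglia's 13/17/5 variant)."""
        state ^= (state << 13) & 0xFFFFFFFF
        state ^= (state >> 17)
        state ^= (state << 5) & 0xFFFFFFFF
        return state & 0xFFFFFFFF

    def roll(seed: int, sides: int = 6, rolls: int = 5):
        state = seed or 1                    # XORshift must not start from an all-zero state
        faces = []
        for _ in range(rolls):
            state = xorshift32(state)
            faces.append(state % sides + 1)  # map the 32-bit word onto faces 1..sides
        return faces

    print(roll(seed=0xBEEF, sides=20))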

[129]  arXiv:2405.00309 [pdf, ps, other]
Title: New upper bounds on the number of non-zero weights of constacyclic codes
Subjects: Information Theory (cs.IT)

For any simple-root constacyclic code $\mathcal{C}$ over a finite field $\mathbb{F}_q$, as far as we know, the group $\mathcal{G}$ generated by the multiplier, the constacyclic shift and the scalar multiplications is the largest subgroup of the automorphism group ${\rm Aut}(\mathcal{C})$ of $\mathcal{C}$. In this paper, by calculating the number of $\mathcal{G}$-orbits of $\mathcal{C}\backslash\{\bf 0\}$, we give an explicit upper bound on the number of non-zero weights of $\mathcal{C}$ and present a necessary and sufficient condition for $\mathcal{C}$ to meet the upper bound. Some examples in this paper show that our upper bound is tight and better than the upper bounds in [Zhang and Cao, FFA, 2024]. In particular, our main results provide a new method to construct few-weight constacyclic codes. Furthermore, for the constacyclic code $\mathcal{C}$ belonging to two special types, we obtain a smaller upper bound on the number of non-zero weights of $\mathcal{C}$ by substituting $\mathcal{G}$ with a larger subgroup of ${\rm Aut}(\mathcal{C})$. The results derived in this paper generalize the main results in [Chen, Fu and Liu, IEEE-TIT, 2024].

[130]  arXiv:2405.00311 [pdf, ps, other]
Title: Three-layer deep learning network random trees for fault diagnosis in chemical production process
Subjects: Machine Learning (cs.LG)

With the development of technology, the chemical production process is becoming increasingly complex and large-scale, making fault diagnosis particularly important. However, current diagnostic methods struggle to address the complexities of large-scale production processes. In this paper, we integrate the strengths of deep learning and machine learning technologies, combining the advantages of bidirectional long short-term memory (BiLSTM) neural networks, fully connected neural networks, and the extra trees algorithm to propose a novel fault diagnostic model named three-layer deep learning network random trees (TDLN-trees). First, the deep learning component extracts temporal features from industrial data, combining and transforming them into a higher-level data representation. Second, the machine learning component processes and classifies the features extracted in the first step. An experimental analysis based on the Tennessee Eastman process verifies the superiority of the proposed method.
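
A compact sketch of the two-stage idea: a BiLSTM-based network extracts temporal features from multivariate process data and an extra-trees classifier performs the final fault classification. Layer sizes and the synthetic data are placeholders, and the feature network is left untrained here for brevity; this is not the paper's TDLN-trees configuration.

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.ensemble import ExtraTreesClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30, 8)).astype("float32")   # (samples, time steps, sensors)
    y = rng.integers(0, 3, size=200)                      # 3 hypothetical fault classes

    class BiLSTMFeatures(nn.Module):
        def __init__(self, n_sensors=8, hidden=32, out_dim=16):
            super().__init__()
            self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, out_dim)

        def forward(self, x):
            seq_out, _ = self.lstm(x)                     # (batch, time, 2*hidden)
            return torch.relu(self.fc(seq_out[:, -1]))    # last-step temporal features

    net = BiLSTMFeatures()        # in practice this network would be trained before feature extraction
    with torch.no_grad():
        features = net(torch.from_numpy(X)).numpy()

    clf = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(features, y)
    print("train accuracy:", clf.score(features, y))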

[131]  arXiv:2405.00313 [pdf, other]
Title: Streamlining Image Editing with Layered Diffusion Brushes
Comments: arXiv admin note: text overlap with arXiv:2306.00219
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Denoising diffusion models have recently gained prominence as powerful tools for a variety of image generation and manipulation tasks. Building on this, we propose a novel tool for real-time editing of images that provides users with fine-grained region-targeted supervision in addition to existing prompt-based controls. Our novel editing technique, termed Layered Diffusion Brushes, leverages prompt-guided and region-targeted alteration of intermediate denoising steps, enabling precise modifications while maintaining the integrity and context of the input image. We provide an editor based on Layered Diffusion Brushes modifications, which incorporates well-known image editing concepts such as layer masks, visibility toggles, and independent manipulation of layers; regardless of their order. Our system renders a single edit on a 512x512 image within 140 ms using a high-end consumer GPU, enabling real-time feedback and rapid exploration of candidate edits. We validated our method and editing system through a user study involving both natural images (using inversion) and generated images, showcasing its usability and effectiveness compared to existing techniques such as InstructPix2Pix and Stable Diffusion Inpainting for refining images. Our approach demonstrates efficacy across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation, demonstrating its versatility and potential for enhancing creative workflows.

[132]  arXiv:2405.00314 [pdf, other]
Title: Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Performance (cs.PF)

Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high computational and memory demands hinder deployment, especially on resource-constrained devices. This underscores the necessity of algorithm-hardware co-design specific to ViTs, aiming to optimize their performance by tailoring both the algorithmic structure and the underlying hardware accelerator to each other's strengths. Model quantization, by converting high-precision numbers to lower-precision, reduces the computational demands and memory needs of ViTs, allowing the creation of hardware specifically optimized for these quantized algorithms, boosting efficiency. This article provides a comprehensive survey of ViTs quantization and its hardware acceleration. We first delve into the unique architectural attributes of ViTs and their runtime characteristics. Subsequently, we examine the fundamental principles of model quantization, followed by a comparative analysis of the state-of-the-art quantization techniques for ViTs. Additionally, we explore the hardware acceleration of quantized ViTs, highlighting the importance of hardware-friendly algorithm design. In conclusion, this article will discuss ongoing challenges and future research paths. We consistently maintain the related open-source materials at https://github.com/DD-DuDa/awesome-vit-quantization-acceleration.

[133]  arXiv:2405.00316 [pdf, other]
Title: Enhance Planning with Physics-informed Safety Controllor for End-to-end Autonomous Driving
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Recent years have seen growing research interest in applications of Deep Neural Networks (DNNs) to autonomous vehicle technology. The trend started with perception and prediction a few years ago and is gradually being applied to motion planning tasks. Although network performance improves over time, DNN planners inherit the natural drawbacks of deep learning: learning-based planners cannot achieve perfect accuracy on the training dataset, and their performance can be affected by the out-of-distribution problem. In this paper, we propose FusionAssurance, a novel trajectory-based end-to-end driving fusion framework that incorporates physics-informed control for safety assurance. By incorporating a potential field into Model Predictive Control, FusionAssurance is capable of navigating through scenarios that are not included in the training dataset and scenarios where the neural network fails to generalize. The effectiveness of the approach is demonstrated by extensive experiments under various scenarios on the CARLA benchmark.
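
A toy sketch of folding an obstacle potential field into a trajectory cost, which is the general mechanism FusionAssurance embeds in Model Predictive Control; the obstacle layout, gains, and the sampling-based "planner" below are illustrative stand-ins for a real MPC solver.

    import numpy as np

    obstacles = np.array([[3.0, 0.5]])                 # hypothetical obstacle positions (x, y)
    goal = np.array([6.0, 0.0])

    def potential_cost(traj: np.ndarray, repulse_gain: float = 2.0, radius: float = 1.5) -> float:
        """Goal-tracking cost plus a repulsive potential that grows near obstacles."""
        tracking = np.linalg.norm(traj[-1] - goal)
        repulsion = 0.0
        for obs in obstacles:
            d = np.linalg.norm(traj - obs, axis=1)
            repulsion += np.sum(repulse_gain * np.maximum(0.0, radius - d) ** 2)
        return tracking + repulsion

    # Evaluate a few candidate straight-line trajectories with different lateral offsets.
    offsets = np.linspace(-2.0, 2.0, 9)
    candidates = [np.column_stack([np.linspace(0, 6, 20), np.full(20, dy)]) for dy in offsets]
    best = min(range(len(candidates)), key=lambda i: potential_cost(candidates[i]))
    print("chosen lateral offset:", offsets[best])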

[134]  arXiv:2405.00318 [pdf, other]
Title: Covariant spatio-temporal receptive fields for neuromorphic computing
Comments: Code available at this https URL
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Biological nervous systems constitute important sources of inspiration for computers that are faster, cheaper, and more energy efficient. Neuromorphic disciplines view the brain as a coevolved system, simultaneously optimizing the hardware and the algorithms running on it. There are clear efficiency gains when bringing the computations into a physical substrate, but we presently lack theories to guide efficient implementations. Here, we present a principled computational model for neuromorphic systems in terms of spatio-temporal receptive fields, based on affine Gaussian kernels over space and leaky-integrator and leaky integrate-and-fire models over time. Our theory is provably covariant to spatial affine and temporal scaling transformations, and bears close similarities to the visual processing in mammalian brains. We use these spatio-temporal receptive fields as a prior in an event-based vision task, and show that this improves the training of spiking networks, which is otherwise known to be problematic for event-based vision. This work combines efforts within scale-space theory and computational neuroscience to identify theoretically well-founded ways to process spatio-temporal signals in neuromorphic systems. Our contributions are immediately relevant for signal processing and event-based vision, and can be extended to other processing tasks over space and time, such as memory and control.

[135]  arXiv:2405.00319 [pdf, other]
Title: Data Augmentation Policy Search for Long-Term Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Data augmentation serves as a popular regularization technique to combat overfitting challenges in neural networks. While automatic augmentation has demonstrated success in image classification tasks, its application to time-series problems, particularly in long-term forecasting, has received comparatively less attention. To address this gap, we introduce a time-series automatic augmentation approach named TSAA, which is both efficient and easy to implement. The solution involves tackling the associated bilevel optimization problem through a two-step process: initially training a non-augmented model for a limited number of epochs, followed by an iterative split procedure. During this iterative process, we alternate between identifying a robust augmentation policy through Bayesian optimization and refining the model while discarding suboptimal runs. Extensive evaluations on challenging univariate and multivariate forecasting benchmark problems demonstrate that TSAA consistently outperforms several robust baselines, suggesting its potential integration into prediction pipelines.

[136]  arXiv:2405.00320 [pdf, ps, other]
Title: Web3 and the State: Indian state's redescription of blockchain
Comments: 21 pages
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)

The article does a close reading of a discussion paper by NITI Aayog and a strategy paper by the Ministry of Electronics and Information Technology (MeitY) advocating non-financial use cases of blockchain in India. By noting the discursive shift from transparency to trust that grounds these two documents, and consequently the Indian state's redescription of blockchain, the paper foregrounds how governance by infrastructure is at the heart of new forms of governance and how blockchain systems are being designated as decentral by states to have recentralizing effects. The paper highlights how mapping discursive shifts in notions such as trust, transparency, (de)centralization and (dis)intermediation can be a potent site for investigating redescriptions of emerging sociotechnical systems.

[137]  arXiv:2405.00321 [pdf, other]
Title: DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training
Subjects: Computation and Language (cs.CL)

The NLI4CT task at SemEval-2024 emphasizes the development of robust models for Natural Language Inference on Clinical Trial Reports (CTRs) using large language models (LLMs). This edition introduces interventions specifically targeting the numerical, vocabulary, and semantic aspects of CTRs. Our proposed system harnesses the capabilities of the state-of-the-art Mistral model, complemented by an auxiliary model, to focus on the intricate input space of the NLI4CT dataset. Through the incorporation of numerical and acronym-based perturbations to the data, we train a robust system capable of handling both semantic-altering and numerical contradiction interventions. Our analysis on the dataset sheds light on the challenging sections of the CTRs for reasoning.

[138]  arXiv:2405.00322 [pdf, other]
Title: Characterizing Information Seeking Processes with Multiple Physiological Signals
Journal-ref: In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, Washington, DC, USA. ACM, New York, NY, USA, 12 pages
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

Information access systems are becoming increasingly complex, and our understanding of user behavior during information seeking processes is mainly drawn from qualitative methods, such as observational studies or surveys. Leveraging the advances in sensing technologies, our study aims to characterize user behaviors with physiological signals, particularly in relation to cognitive load, affective arousal, and valence. We conduct a controlled lab study with 26 participants, and collect data including Electrodermal Activities, Photoplethysmogram, Electroencephalogram, and Pupillary Responses. This study examines informational search with four stages: the realization of Information Need (IN), Query Formulation (QF), Query Submission (QS), and Relevance Judgment (RJ). We also include different interaction modalities to represent modern systems, e.g., QS by text-typing or verbalizing, and RJ with text or audio information. We analyze the physiological signals across these stages and report outcomes of pairwise non-parametric repeated-measure statistical tests. The results show that participants experience significantly higher cognitive loads at IN with a subtle increase in alertness, while QF requires higher attention. QS involves more demanding cognitive loads than QF. Affective responses are more pronounced at RJ than QS or IN, suggesting greater interest and engagement as knowledge gaps are resolved. To the best of our knowledge, this is the first study that explores user behaviors in a search process employing a more nuanced quantitative analysis of physiological signals. Our findings offer valuable insights into user behavior and emotional responses in information seeking processes. We believe our proposed methodology can inform the characterization of more complex processes, such as conversational information seeking.
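
A small sketch of the kind of pairwise non-parametric repeated-measures test reported in the study, here a Wilcoxon signed-rank test comparing a physiological feature between two search stages for the same participants; the numbers below are synthetic, not the study's data.

    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(42)
    n_participants = 26
    cognitive_load_IN = rng.normal(0.60, 0.10, n_participants)   # e.g. a pupil-based load index at IN
    cognitive_load_QF = cognitive_load_IN - rng.normal(0.05, 0.05, n_participants)

    stat, p = wilcoxon(cognitive_load_IN, cognitive_load_QF)     # paired, non-parametric test
    print(f"Wilcoxon W = {stat:.1f}, p = {p:.4f}")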

[139]  arXiv:2405.00326 [pdf, ps, other]
Title: A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices
Comments: This article was submitted to Parallel Computing in December 9, 2013.This article was also published in IPSJ SIG Notes, Vol. 2015-HPC-148, Vol.2, pp.1-17 (February 23, 2015). (a non-reviewed technical report)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Performance (cs.PF)

In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a supercomputer. We assume that the sizes also fit the exa-scale computing requirements of current production runs of an application. To minimize communication time, we added several communication avoiding and communication reducing algorithms based on Message Passing Interface (MPI) non-blocking implementations. A performance evaluation with up to full nodes of the FX10 system indicates that (1) the MPI non-blocking implementation is 3x as efficient as the baseline implementation, (2) the hybrid MPI execution is 1.9x faster than the pure MPI execution, (3) our proposed solver is 2.3x and 22x faster than a ScaLAPACK routine with optimized blocking size and cyclic-cyclic distribution, respectively.

[140]  arXiv:2405.00327 [pdf, other]
Title: A Smoothed Analysis of the Space Complexity of Computing a Chaotic Sequence
Comments: arXiv admin note: text overlap with arXiv:2310.14185
Subjects: Computational Complexity (cs.CC)

This work is motivated by the question of whether a chaotic sequence can be calculated efficiently, e.g., is it possible to get the $n$-th bit of a bit sequence generated by a chaotic map, such as the $\beta$-expansion, tent map or logistic map, in $\mathrm{o}(n)$ time/space? This paper gives an affirmative answer to the question for the space complexity of the tent map. We show that the decision problem of whether a given bit sequence is a valid tent code can be solved in $\mathrm{O}(\log^{2} n)$ space in the sense of smoothed complexity.
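
For concreteness, the sketch below generates a tent code by iterating the tent map and recording which half of the unit interval each iterate falls in; this is the naive linear-space computation of the object the paper studies, not its $\mathrm{O}(\log^{2} n)$-space decision procedure.

    from fractions import Fraction

    def tent_code(x0: Fraction, n_bits: int) -> str:
        """Bits b_i = 0 if x_i < 1/2 else 1, with x_{i+1} = tent(x_i)."""
        x, bits = x0, []
        for _ in range(n_bits):
            if x < Fraction(1, 2):
                bits.append("0")
                x = 2 * x            # tent map, left branch
            else:
                bits.append("1")
                x = 2 * (1 - x)      # tent map, right branch
        return "".join(bits)

    # Exact rational arithmetic avoids floating-point drift over many iterations.
    print(tent_code(Fraction(3, 10), 16))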

[141]  arXiv:2405.00329 [pdf, ps, other]
Title: Metric geometry of the privacy-utility tradeoff
Subjects: Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Probability (math.PR)

Synthetic data are an attractive concept to enable privacy in data sharing. A fundamental question is how similar the privacy-preserving synthetic data are compared to the true data. Using metric privacy, an effective generalization of differential privacy beyond the discrete setting, we raise the problem of characterizing the optimal privacy-accuracy tradeoff by the metric geometry of the underlying space. We provide a partial solution to this problem in terms of the "entropic scale", a quantity that captures the multiscale geometry of a metric space via the behavior of its packing numbers. We illustrate the applicability of our privacy-accuracy tradeoff framework via a diverse set of examples of metric spaces.

[142]  arXiv:2405.00330 [pdf, other]
Title: Integrating A.I. in Higher Education: Protocol for a Pilot Study with 'SAMCares: An Adaptive Learning Hub'
Comments: Accepted in ASEE Annual Conference 2024
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Learning never ends, and there is no age limit to grow yourself. However, the educational landscape may face challenges in effectively catering to students' inclusion and diverse learning needs. These students should have access to state-of-the-art methods for lecture delivery, online resources, and technology needs. However, with all the diverse learning sources, it becomes harder for students to comprehend a large amount of knowledge in a short period of time. Traditional assistive technologies and learning aids often lack the dynamic adaptability required for individualized education plans. Large Language Models (LLM) have been used in language translation, text summarization, and content generation applications. With rapid growth in AI over the past years, AI-powered chatbots and virtual assistants have been developed. This research aims to bridge this gap by introducing an innovative study buddy we will be calling the 'SAMCares'. The system leverages a Large Language Model (LLM) (in our case, LLaMa-2 70B as the base model) and Retriever-Augmented Generation (RAG) to offer real-time, context-aware, and adaptive educational support. The context of the model will be limited to the knowledge base of Sam Houston State University (SHSU) course notes. The LLM component enables a chat-like environment to interact with it to meet the unique learning requirements of each student. For this, we will build a custom web-based GUI. At the same time, RAG enhances real-time information retrieval and text generation, in turn providing more accurate and context-specific assistance. An option to upload additional study materials in the web GUI is added in case additional knowledge support is required. The system's efficacy will be evaluated through controlled trials and iterative feedback mechanisms.

[143]  arXiv:2405.00332 [pdf, other]
Title: A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k benchmark, the gold standard for measuring elementary mathematical reasoning. We ensure that the two benchmarks are comparable across important metrics such as human solve rates, number of steps in solution, answer magnitude, and more. When evaluating leading open- and closed-source LLMs on GSM1k, we observe accuracy drops of up to 13%, with several families of models (e.g., Phi and Mistral) showing evidence of systematic overfitting across almost all model sizes. At the same time, many models, especially those on the frontier, (e.g., Gemini/GPT/Claude) show minimal signs of overfitting. Further analysis suggests a positive relationship (Spearman's r^2=0.32) between a model's probability of generating an example from GSM8k and its performance gap between GSM8k and GSM1k, suggesting that many models may have partially memorized GSM8k.

[144]  arXiv:2405.00334 [pdf, other]
Title: A Survey on Deep Active Learning: Recent Advances and New Frontiers
Comments: This paper is accepted by IEEE Transactions on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG)

Active learning seeks to achieve strong performance with fewer training samples. It does this by iteratively asking an oracle to label new selected samples in a human-in-the-loop manner. This technique has gained increasing popularity due to its broad applicability, yet its survey papers, especially for deep learning-based active learning (DAL), remain scarce. Therefore, we conduct an advanced and comprehensive survey on DAL. We first introduce reviewed paper collection and filtering. Second, we formally define the DAL task and summarize the most influential baselines and widely used datasets. Third, we systematically provide a taxonomy of DAL methods from five perspectives, including annotation types, query strategies, deep model architectures, learning paradigms, and training processes, and objectively analyze their strengths and weaknesses. Then, we comprehensively summarize main applications of DAL in Natural Language Processing (NLP), Computer Vision (CV), and Data Mining (DM), etc. Finally, we discuss challenges and perspectives after a detailed analysis of current studies. This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL. We hope that this survey will spur further progress in this burgeoning field.

[145]  arXiv:2405.00335 [pdf, ps, other]
Title: Finding the white male: The prevalence and consequences of algorithmic gender and race bias in political Google searches
Comments: 30 pages, 5 figures
Subjects: Computers and Society (cs.CY)

Search engines like Google have become major information gatekeepers that use artificial intelligence (AI) to determine who and what voters find when searching for political information. This article proposes and tests a framework of algorithmic representation of minoritized groups in a series of four studies. First, two algorithm audits of political image searches delineate how search engines reflect and uphold structural inequalities by under- and misrepresenting women and non-white politicians. Second, two online experiments show that these biases in algorithmic representation in turn distort perceptions of the political reality and actively reinforce a white and masculinized view of politics. Together, the results have substantive implications for the scientific understanding of how AI technology amplifies biases in political perceptions and decision-making. The article contributes to ongoing public debates and cross-disciplinary research on algorithmic fairness and injustice.

[146]  arXiv:2405.00338 [pdf, other]
Title: Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model
Comments: 10 pages, 2 figures
Subjects: Information Retrieval (cs.IR)

Owing to their powerful semantic reasoning capabilities, Large Language Models (LLMs) have been effectively utilized as recommenders, achieving impressive performance. However, the high inference latency of LLMs significantly restricts their practical deployment. To address this issue, this work investigates knowledge distillation from cumbersome LLM-based recommendation models to lightweight conventional sequential models. It encounters three challenges: 1) the teacher's knowledge may not always be reliable; 2) the capacity gap between the teacher and student makes it difficult for the student to assimilate the teacher's knowledge; 3) divergence in semantic space makes it challenging to distill knowledge from embeddings. To tackle these challenges, this work proposes a novel distillation strategy, DLLM2Rec, specifically tailored for knowledge distillation from LLM-based recommendation models to conventional sequential models. DLLM2Rec comprises: 1) Importance-aware ranking distillation, which filters reliable and student-friendly knowledge by weighting instances according to teacher confidence and student-teacher consistency; 2) Collaborative embedding distillation, which integrates knowledge from teacher embeddings with collaborative signals mined from the data. Extensive experiments demonstrate the effectiveness of the proposed DLLM2Rec, boosting three typical sequential models with an average improvement of 47.97%, even enabling them to surpass LLM-based recommenders in some cases.
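
A rough numpy sketch of importance-aware instance weighting for ranking distillation: each teacher-recommended item is weighted by the teacher's confidence and by how consistently the student already ranks it. The specific weighting formula is an illustrative assumption, not DLLM2Rec's exact design.

    import numpy as np

    teacher_confidence = np.array([0.9, 0.7, 0.4, 0.2])    # teacher's score per distilled item
    student_rank = np.array([1, 5, 3, 40])                 # student's current rank of each item

    consistency = 1.0 / np.log2(student_rank + 1)          # higher when the student already agrees
    weights = teacher_confidence * consistency
    weights /= weights.sum()                               # normalized distillation weights

    print(np.round(weights, 3))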

[147]  arXiv:2405.00340 [pdf, other]
Title: NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

State-of-the-art neural implicit surface representations have achieved impressive results in indoor scene reconstruction by incorporating monocular geometric priors as additional supervision. However, we have observed that multi-view inconsistency between such priors poses a challenge for high-quality reconstructions. In response, we present NC-SDF, a neural signed distance field (SDF) 3D reconstruction framework with view-dependent normal compensation (NC). Specifically, we integrate view-dependent biases in monocular normal priors into the neural implicit representation of the scene. By adaptively learning and correcting the biases, our NC-SDF effectively mitigates the adverse impact of inconsistent supervision, enhancing both the global consistency and local details in the reconstructions. To further refine the details, we introduce an informative pixel sampling strategy to pay more attention to intricate geometry with higher information content. Additionally, we design a hybrid geometry modeling approach to improve the neural implicit representation. Experiments on synthetic and real-world datasets demonstrate that NC-SDF outperforms existing approaches in terms of reconstruction quality.

[148]  arXiv:2405.00341 [pdf, ps, other]
Title: Google or ChatGPT: Who is the Better Helper for University Students
Subjects: Human-Computer Interaction (cs.HC)

Using information technology tools for academic help-seeking among college students has become a popular trend. In the evolutionary process between Generation Artificial Intelligence (GenAI) and traditional search engines, when students face academic challenges, do they tend to prefer Google, or are they more inclined to utilize ChatGPT? And what are the key factors influencing learners' preference to use ChatGPT for academic help-seeking? These relevant questions merit attention. The study employed a mixed-methods research design to investigate Taiwanese university students' online academic help-seeking preferences. The results indicated that students tend to prefer using ChatGPT to seek academic assistance, reflecting the potential popularity of GenAI in the educational field. Additionally, in comparing seven machine learning algorithms, the Random Forest and LightGBM algorithms exhibited superior performance. These two algorithms were employed to evaluate the predictive capability of 18 potential factors. It was found that GenAI fluency, GenAI distortions, and age were the core factors influencing how university students seek academic help. Overall, this study underscores that educators should prioritize the cultivation of students' critical thinking skills, while technical personnel should enhance the fluency and reliability of ChatGPT and Google searches and explore the integration of chat and search functions to achieve optimal balance.

[149]  arXiv:2405.00342 [pdf, ps, other]
Title: The Set of Stable Matchings and the Core in a Matching Market with Ties and Matroid Constraints
Authors: Naoyuki Kamiyama
Subjects: Computer Science and Game Theory (cs.GT)

In this paper, we consider a many-to-one matching market where ties in the preferences of agents are allowed. For this market with capacity constraints, Bonifacio, Juarez, Neme, and Oviedo proved some relationship between the set of stable matchings and the core. In this paper, we consider a matroid constraint that is a generalization of a capacity constraint. We prove that the results proved by Bonifacio, Juarez, Neme, and Oviedo can be generalized to this setting.

[150]  arXiv:2405.00344 [pdf, other]
Title: Expert Insight-Enhanced Follow-up Chest X-Ray Summary Generation
Comments: accepted by 22nd International Conference on Artificial Intelligence in medicine (AIME2024)
Subjects: Multimedia (cs.MM)

A chest X-ray radiology report describes abnormal findings not only from the X-ray obtained at the current examination, but also findings on disease progression or changes in device placement with reference to the X-ray from the previous examination. The majority of efforts on automatic generation of radiology reports pertain to reporting the former, but not the latter, type of findings. To the best of the authors' knowledge, there is only one work dedicated to generating a summary of the latter findings, i.e., a follow-up summary. In this study, we therefore propose a transformer-based framework to tackle this task. Motivated by our observations on the significance of the medical lexicon for the fidelity of summary generation, we introduce two mechanisms to bestow expert insight on our model, namely expert soft guidance and a masked entity modeling loss. The former employs a pretrained expert disease classifier to guide the presence level of specific abnormalities, while the latter directs the model's attention toward the medical lexicon. Extensive experiments demonstrate that the performance of our model is competitive with or exceeds the state of the art.

[151]  arXiv:2405.00348 [pdf, other]
Title: Practical Dataset Distillation Based on Deep Support Vectors
Subjects: Machine Learning (cs.LG)

Conventional dataset distillation requires significant computational resources and assumes access to the entire dataset, an assumption impractical as it presumes all data resides on a central server. In this paper, we focus on dataset distillation in practical scenarios with access to only a fraction of the entire dataset. We introduce a novel distillation method that augments the conventional process by incorporating general model knowledge via the addition of Deep KKT (DKKT) loss. In practical settings, our approach showed improved performance compared to the baseline distribution matching distillation method on the CIFAR-10 dataset. Additionally, we present experimental evidence that Deep Support Vectors (DSVs) offer unique information to the original distillation, and their integration results in enhanced performance.

[152]  arXiv:2405.00349 [pdf, other]
Title: A Self-explaining Neural Architecture for Generalizable Concept Learning
Comments: IJCAI 2024
Subjects: Machine Learning (cs.LG)

With the wide proliferation of Deep Neural Networks in high-stake applications, there is a growing demand for explainability behind their decision-making process. Concept learning models attempt to learn high-level 'concepts' - abstract entities that align with human understanding, and thus provide interpretability to DNN architectures. However, in this paper, we demonstrate that present SOTA concept learning approaches suffer from two major problems - lack of concept fidelity wherein the models fail to learn consistent concepts among similar classes and limited concept interoperability wherein the models fail to generalize learned concepts to new domains for the same task. Keeping these in mind, we propose a novel self-explaining architecture for concept learning across domains which - i) incorporates a new concept saliency network for representative concept selection, ii) utilizes contrastive learning to capture representative domain invariant concepts, and iii) uses a novel prototype-based concept grounding regularization to improve concept alignment across domains. We demonstrate the efficacy of our proposed approach over current SOTA concept learning approaches on four widely used real-world datasets. Empirical results show that our method improves both concept fidelity measured through concept overlap and concept interoperability measured through domain adaptation performance.

[153]  arXiv:2405.00351 [pdf, other]
Title: Learning High-Quality Navigation and Zooming on Omnidirectional Images in Virtual Reality
Comments: 11 pages
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Viewing omnidirectional images (ODIs) in virtual reality (VR) represents a novel form of media that provides immersive experiences for users to navigate and interact with digital content. Nonetheless, this sense of immersion can be greatly compromised by a blur effect that masks details and hampers the user's ability to engage with objects of interest. In this paper, we present a novel system, called OmniVR, designed to enhance visual clarity during VR navigation. Our system enables users to effortlessly locate and zoom in on the objects of interest in VR. It captures user commands for navigation and zoom, converting these inputs into parameters for the Mobius transformation matrix. Leveraging these parameters, the ODI is refined using a learning-based algorithm. The resultant ODI is presented within the VR media, effectively reducing blur and increasing user engagement. To verify the effectiveness of our system, we first evaluate our algorithm with state-of-the-art methods on public datasets, which achieves the best performance. Furthermore, we undertake a comprehensive user study to evaluate viewer experiences across diverse scenarios and to gather their qualitative feedback from multiple perspectives. The outcomes reveal that our system enhances user engagement by improving the viewers' recognition, reducing discomfort, and improving the overall immersive experience. Our system makes the navigation and zoom more user-friendly.

[154]  arXiv:2405.00352 [pdf, other]
Title: Transformer-based Reasoning for Learning Evolutionary Chain of Events on Temporal Knowledge Graph
Comments: Accepted by SIGIR 2024 (the Full paper track, camera ready version)
Subjects: Artificial Intelligence (cs.AI)

Temporal Knowledge Graph (TKG) reasoning often involves completing missing factual elements along the timeline. Although existing methods can learn good embeddings for each factual element in quadruples by integrating temporal information, they often fail to infer the evolution of temporal facts. This is mainly because of (1) insufficiently exploring the internal structure and semantic relationships within individual quadruples and (2) inadequately learning a unified representation of the contextual and temporal correlations among different quadruples. To overcome these limitations, we propose a novel Transformer-based reasoning model (dubbed ECEformer) for TKG to learn the Evolutionary Chain of Events (ECE). Specifically, we unfold the neighborhood subgraph of an entity node in chronological order, forming an evolutionary chain of events as the input for our model. Subsequently, we utilize a Transformer encoder to learn the embeddings of intra-quadruples for ECE. We then craft a mixed-context reasoning module based on the multi-layer perceptron (MLP) to learn the unified representations of inter-quadruples for ECE while accomplishing temporal knowledge reasoning. In addition, to enhance the timeliness of the events, we devise an additional time prediction task to complete effective temporal information within the learned unified representation. Extensive experiments on six benchmark datasets verify the state-of-the-art performance and the effectiveness of our method.

[155]  arXiv:2405.00353 [pdf, ps, other]
Title: Dual-Role AoI-based Incentive Mechanism for HD map Crowdsourcing
Subjects: Computer Science and Game Theory (cs.GT)

A high-quality fresh high-definition (HD) map is vital in enhancing transportation efficiency and safety in autonomous driving. Vehicle-based crowdsourcing offers a promising approach for updating HD maps. However, recruiting crowdsourcing vehicles involves a challenging tradeoff between HD map freshness and recruitment costs. Existing studies on HD map crowdsourcing often (1) prioritize maximizing spatial coverage and (2) overlook the dual role of crowdsourcing vehicles in HD maps, as vehicles serve both as contributors and customers of HD maps. This motivates us to propose the Dual-Role Age of Information (AoI) based Incentive Mechanism (DRAIM) to address these issues. Specifically, we propose the trajectory age of information, incorporating the expected AoI of the HD map and the trajectory, to quantify a vehicle's HD map usage utility, which is freshness- and trajectory-dependent. DRAIM aims to achieve the company's desired tradeoff between freshness and recruitment costs.

[156]  arXiv:2405.00354 [pdf, other]
Title: CrossMatch: Enhance Semi-Supervised Medical Image Segmentation with Perturbation Strategies and Knowledge Distillation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Semi-supervised learning for medical image segmentation presents a unique challenge of efficiently using limited labeled data while leveraging abundant unlabeled data. Despite advancements, existing methods often do not fully exploit the potential of the unlabeled data for enhancing model robustness and accuracy. In this paper, we introduce CrossMatch, a novel framework that integrates knowledge distillation with dual perturbation strategies (image-level and feature-level) to improve the model's learning from both labeled and unlabeled data. CrossMatch employs multiple encoders and decoders to generate diverse data streams, which undergo self-knowledge distillation to enhance consistency and reliability of predictions across varied perturbations. Our method significantly surpasses other state-of-the-art techniques in standard benchmarks by effectively minimizing the gap between training on labeled and unlabeled data and improving edge accuracy and generalization in medical image segmentation. The efficacy of CrossMatch is demonstrated through extensive experimental validations, showing remarkable performance improvements without increasing computational costs. Code for this implementation is made available at https://github.com/AiEson/CrossMatch.git.

[157]  arXiv:2405.00355 [pdf, other]
Title: Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper investigates the effectiveness of self-supervised pre-trained transformers compared to supervised pre-trained transformers and conventional neural networks (ConvNets) for detecting various types of deepfakes. We focus on their potential for improved generalization, particularly when training data is limited. Despite the notable success of large vision-language models utilizing transformer architectures in various tasks, including zero-shot and few-shot learning, the deepfake detection community has still shown some reluctance to adopt pre-trained vision transformers (ViTs), especially large ones, as feature extractors. One concern is their perceived excessive capacity, which often demands extensive data, and the resulting suboptimal generalization when training or fine-tuning data is small or less diverse. This contrasts poorly with ConvNets, which have already established themselves as robust feature extractors. Additionally, training and optimizing transformers from scratch requires significant computational resources, making this accessible primarily to large companies and hindering broader investigation within the academic community. Recent advancements in using self-supervised learning (SSL) in transformers, such as DINO and its derivatives, have showcased significant adaptability across diverse vision tasks and possess explicit semantic segmentation capabilities. By leveraging DINO for deepfake detection with modest training data and implementing partial fine-tuning, we observe comparable adaptability to the task and the natural explainability of the detection result via the attention mechanism. Moreover, partial fine-tuning of transformers for deepfake detection offers a more resource-efficient alternative, requiring significantly fewer computational resources.

[158]  arXiv:2405.00358 [pdf, other]
Title: Arbitrary Time Information Modeling via Polynomial Approximation for Temporal Knowledge Graph Embedding
Comments: Accepted by LREC-COLING 2024 (long paper, camera-ready version)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Distinguished from traditional knowledge graphs (KGs), temporal knowledge graphs (TKGs) must explore and reason over temporally evolving facts adequately. However, existing TKG approaches still face two main challenges, i.e., the limited capability to model arbitrary timestamps continuously and the lack of rich inference patterns under temporal constraints. In this paper, we propose an innovative TKGE method (PTBox) via polynomial decomposition-based temporal representation and box embedding-based entity representation to tackle the above-mentioned problems. Specifically, we decompose time information by polynomials and then enhance the model's capability to represent arbitrary timestamps flexibly by incorporating the learnable temporal basis tensor. In addition, we model every entity as a hyperrectangle box and define each relation as a transformation on the head and tail entity boxes. The entity boxes can capture complex geometric structures and learn robust representations, improving the model's inductive capability for rich inference patterns. Theoretically, our PTBox can encode arbitrary time information or even unseen timestamps while capturing rich inference patterns and higher-arity relations of the knowledge base. Extensive experiments on real-world datasets demonstrate the effectiveness of our method.

[159]  arXiv:2405.00359 [pdf, ps, other]
Title: Subquadratic Submodular Maximization with a General Matroid Constraint
Comments: 19 pages, To appear in ICALP 2024
Subjects: Data Structures and Algorithms (cs.DS)

We consider fast algorithms for monotone submodular maximization with a general matroid constraint. We present a randomized $(1 - 1/e - \epsilon)$-approximation algorithm that requires $\tilde{O}_{\epsilon}(\sqrt{r} n)$ independence oracle and value oracle queries, where $n$ is the number of elements in the matroid and $r \leq n$ is the rank of the matroid. This improves upon the previously best algorithm by Buchbinder-Feldman-Schwartz [Mathematics of Operations Research 2017] that requires $\tilde{O}_{\epsilon}(r^2 + \sqrt{r}n)$ queries.
Our algorithm is based on continuous relaxation, as with other submodular maximization algorithms in the literature. To achieve subquadratic query complexity, we develop a new rounding algorithm, which is our main technical contribution. The rounding algorithm takes as input a point represented as a convex combination of $t$ bases of a matroid and rounds it to an integral solution. Our rounding algorithm requires $\tilde{O}(r^{3/2} t)$ independence oracle queries, while the previously best rounding algorithm by Chekuri-Vondr\'{a}k-Zenklusen [FOCS 2010] requires $O(r^2 t)$ independence oracle queries. A key idea in our rounding algorithm is to use a directed cycle of arbitrary length in an auxiliary graph, while the algorithm of Chekuri-Vondr\'{a}k-Zenklusen focused on directed cycles of length two.

[160]  arXiv:2405.00361 [pdf, other]
Title: AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts
Subjects: Computation and Language (cs.CL)

We introduce AdaMoLE, a novel method for fine-tuning large language models (LLMs) through an Adaptive Mixture of Low-Rank Adaptation (LoRA) Experts. Moving beyond conventional methods that employ a static top-k strategy for activating experts, AdaMoLE dynamically adjusts the activation threshold using a dedicated threshold network, adaptively responding to the varying complexities of different tasks. By replacing a single LoRA in a layer with multiple LoRA experts and integrating a gating function with the threshold mechanism, AdaMoLE effectively selects and activates the most appropriate experts based on the input context. Our extensive evaluations across a variety of commonsense reasoning and natural language processing tasks show that AdaMoLE exceeds baseline performance. This enhancement highlights the advantages of AdaMoLE's adaptive selection of LoRA experts, improving model effectiveness without a corresponding increase in the expert count. The experimental validation not only confirms AdaMoLE as a robust approach for enhancing LLMs but also suggests valuable directions for future research in adaptive expert selection mechanisms, potentially broadening the scope for optimizing model performance across diverse language processing tasks.
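
An illustrative sketch of the adaptive mixture-of-LoRA-experts mechanism described above (module names, ranks, and the exact thresholding rule are assumptions, not the released implementation): several LoRA experts sit beside a frozen linear projection, a gate scores them per input, and a threshold network decides which scores survive.

import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.A = nn.Linear(d_in, rank, bias=False)
        self.B = nn.Linear(rank, d_out, bias=False)
    def forward(self, x):
        return self.B(self.A(x))

class AdaptiveMoLELayer(nn.Module):
    def __init__(self, base: nn.Linear, n_experts=4, rank=8):
        super().__init__()
        self.base = base                              # frozen pre-trained projection
        for p in self.base.parameters():
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        self.experts = nn.ModuleList(LoRAExpert(d_in, d_out, rank) for _ in range(n_experts))
        self.gate = nn.Linear(d_in, n_experts)        # scores each expert per token
        self.threshold = nn.Linear(d_in, 1)           # input-dependent activation threshold

    def forward(self, x):                             # x: (..., d_in)
        scores = torch.softmax(self.gate(x), dim=-1)
        tau = torch.sigmoid(self.threshold(x)) / len(self.experts)   # adaptive threshold
        weights = scores * (scores >= tau).float()
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        delta = sum(w.unsqueeze(-1) * e(x)
                    for w, e in zip(weights.unbind(dim=-1), self.experts))
        return self.base(x) + delta

In contrast to a static top-k rule, the number of active experts here varies with the input, which is the behaviour the abstract attributes to the threshold network.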

[161]  arXiv:2405.00362 [pdf, other]
Title: Implicit Swept Volume SDF: Enabling Continuous Collision-Free Trajectory Generation for Arbitrary Shapes
Comments: accepted by SIGGRAPH 2024 & TOG. Joint First Authors: Jingping Wang, Tingrui Zhang; Joint Corresponding Authors: Fei Gao, Lan Xu
Subjects: Robotics (cs.RO); Computational Geometry (cs.CG); Graphics (cs.GR)

In the field of trajectory generation for objects, ensuring continuous collision-free motion remains a major challenge, especially for non-convex geometries and complex environments. Previous methods either oversimplify object shapes, which results in a sacrifice of feasible space, or rely on discrete sampling, which suffers from the "tunnel effect". To address these limitations, we propose a novel hierarchical trajectory generation pipeline, which utilizes the Swept Volume Signed Distance Field (SVSDF) to guide trajectory optimization for Continuous Collision Avoidance (CCA). Our interdisciplinary approach, blending techniques from graphics and robotics, exhibits outstanding effectiveness in solving this problem. We formulate the computation of the SVSDF as a Generalized Semi-Infinite Programming model, and we solve for the numerical solutions at query points implicitly, thereby eliminating the need for explicit reconstruction of the surface. Our algorithm has been validated in a variety of complex scenarios and applies to robots of various dynamics, including both rigid and deformable shapes. It demonstrates exceptional universality and superior CCA performance compared to typical algorithms. The code will be released at https://github.com/ZJU-FAST-Lab/Implicit-SVSDF-Planner for the benefit of the community.

[162]  arXiv:2405.00365 [pdf, other]
Title: Robust Continuous-Time Beam Tracking with Liquid Neural Network
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Millimeter-wave (mmWave) technology is increasingly recognized as a pivotal technology of sixth-generation communication networks due to the large amounts of available spectrum at high frequencies. However, the huge overhead associated with beam training imposes a significant challenge in mmWave communications, particularly in urban environments with high background noise. To reduce this high overhead, we propose a novel solution for robust continuous-time beam tracking with a liquid neural network, which dynamically adjusts the narrow mmWave beams to ensure real-time beam alignment with mobile users. Through extensive simulations, we validate the effectiveness of our proposed method and demonstrate its superiority over existing state-of-the-art deep-learning-based approaches. Specifically, our scheme achieves at most 46.9% higher normalized spectral efficiency than the baselines when the user is moving at 5 m/s, demonstrating the potential of liquid neural networks to enhance mmWave mobile communication performance.

[163]  arXiv:2405.00366 [pdf, ps, other]
Title: L0-regularized compressed sensing with Mean-field Coherent Ising Machines
Comments: 19 pages, 7 figures
Subjects: Emerging Technologies (cs.ET); Quantum Physics (quant-ph); Applications (stat.AP); Computation (stat.CO)

The Coherent Ising Machine (CIM) is a network of optical parametric oscillators that solves combinatorial optimization problems by finding the ground state of an Ising Hamiltonian. As a practical application of CIM, Aonishi et al. proposed a quantum-classical hybrid system to solve optimization problems of L0-regularization-based compressed sensing (L0RBCS). Gunathilaka et al. have further enhanced the accuracy of the system. However, the CIM's computationally expensive stochastic differential equations (SDEs) limit its use in digital hardware implementations. As an alternative to Gunathilaka et al.'s CIM SDEs used previously, we propose using the mean-field CIM (MF-CIM) model, which is a physics-inspired heuristic solver without quantum noise. MF-CIM surmounts the high computational cost due to the simple nature of the differential equations (DEs). Furthermore, our results indicate that the proposed model has similar performance to physically accurate SDEs in both artificial and magnetic resonance imaging data, paving the way for implementing CIM-based L0RBCS on digital hardware such as Field Programmable Gate Arrays (FPGAs).

[164]  arXiv:2405.00367 [pdf, other]
Title: Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation
Comments: Accepted at SIGIR 2024 short paper track
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, most audio-text paired datasets often lack rich expression of the text data compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions despite different audio samples. Therefore, under many-to-one mapping conditions, audio-text datasets lead to poor performance on retrieval tasks. In this paper, we propose a novel approach to tackle the data imbalance problem in the audio-language retrieval task. To overcome this limitation, we introduce a method that employs a distance sampling-based paraphraser leveraging ChatGPT, utilizing a distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same context, the distance is used to calculate a degree of manipulation for any two sentences, and ChatGPT's few-shot prompting is performed using a text cluster with a similar distance defined by the Jaccard similarity. Therefore, ChatGPT, when applied to few-shot prompting with text clusters, can adjust the diversity of the manipulated text based on the distance. The proposed approach is shown to significantly enhance performance in audio-text retrieval, outperforming conventional text augmentation techniques.
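
A minimal sketch of the distance measure underlying the clustering step, assuming word-level Jaccard similarity; the caption strings and the target distance band are purely illustrative and not taken from the dataset.

def jaccard_distance(a: str, b: str) -> float:
    """1 - Jaccard similarity over word sets; 0 means identical vocabularies."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(sa & sb) / len(sa | sb)

captions = [
    "a dog is barking in the yard",
    "a dog barks loudly in the yard",
    "rain falls on a tin roof",
]

# Group caption pairs whose distance lies in a target band; such pairs would serve as
# few-shot examples showing ChatGPT the desired degree of manipulation.
target_band = (0.2, 0.5)
pairs = [(x, y) for i, x in enumerate(captions) for y in captions[i + 1:]
         if target_band[0] <= jaccard_distance(x, y) <= target_band[1]]
print(pairs)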

[165]  arXiv:2405.00368 [pdf, other]
Title: Directed Redundancy in Time Series
Authors: Jan Østergaard
Comments: Accepted to be presented at the IEEE International Symposium on Information Theory 2024
Subjects: Information Theory (cs.IT)

We quantify the average amount of redundant information that is transferred from a subset of relevant random source processes to a target process. To identify the relevant source processes, we consider those that are connected to the target process and in addition share a certain proportion of the total information causally provided to the target. Even if the relevant processes have no directed information exchange between them, they can still causally provide redundant information to the target. This makes it difficult to identify the relevant processes. To solve this issue, we propose the existence of a hidden redundancy process that governs the shared information among the relevant processes. We bound the redundancy by the minimal average directed redundancy from the relevant processes to the target, from the hidden redundancy process to the target, and from the hidden redundancy process to the relevant processes.

[166]  arXiv:2405.00377 [pdf, ps, other]
Title: Thread review sentiment analysis with Tkinter GUI & Tableau dashboard
Authors: Robin Donal
Subjects: Computer Science and Game Theory (cs.GT)

This project combines Tkinter for GUI development and Tableau for data visualization to perform sentiment analysis on thread reviews. The main goal is to evaluate and visualize consumer sentiments expressed in thread reviews in order to provide insights into customer satisfaction, preferences, and areas for improvement. The procedure starts with gathering thread reviews from many sources, which are then cleaned and prepared for analysis through preprocessing. Sentiment analysis applies natural language processing techniques to classify opinions as positive, negative, or neutral based on the expressed sentiment. The standard Python GUI package Tkinter is used to create an interactive user interface that allows users to enter thread reviews, start the sentiment analysis process, and see the analysis's outcomes. With the help of the user-friendly GUI, users may interact with the system and acquire insightful information with ease. Additionally, Tableau is used to produce a dynamic and eye-catching dashboard that displays the findings of the sentiment analysis using a variety of charts and graphs. The dashboard provides a thorough overview of the sentiment distribution, the frequency of positive and negative reviews, trending topics, and other pertinent indicators, allowing stakeholders to make educated decisions based on the analyzed data. Overall, this project offers a solid method for analyzing and understanding customers' sentiments from thread reviews by integrating Tkinter for GUI development with Tableau for data visualization, enabling the creation of meaningful dashboards.
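
A minimal sketch of the Tkinter side of such a pipeline, assuming a toy lexicon-based classifier; the word lists, widget layout, and labels are illustrative stand-ins for the project's actual NLP model and Tableau dashboard.

import tkinter as tk

POSITIVE = {"good", "great", "excellent", "love", "smooth"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}

def classify(text: str) -> str:
    """Toy lexicon-based polarity: positive, negative, or neutral."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

root = tk.Tk()
root.title("Thread review sentiment")
entry = tk.Text(root, height=5, width=60)
entry.pack()
result = tk.Label(root, text="sentiment: -")
result.pack()
tk.Button(root, text="Analyse",
          command=lambda: result.config(
              text=f"sentiment: {classify(entry.get('1.0', tk.END))}")).pack()
root.mainloop()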

[167]  arXiv:2405.00378 [pdf, other]
Title: Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Consistency learning is a central strategy to tackle unlabeled data in semi-supervised medical image segmentation (SSMIS), which forces the model to produce consistent predictions under perturbation. However, most current approaches solely focus on utilizing a specific single perturbation, which can only cope with limited cases, while it is hard to guarantee the quality of consistency learning when employing multiple perturbations simultaneously. In this paper, we propose an Adaptive Bidirectional Displacement (ABD) approach to solve the above challenge. Specifically, we first design a bidirectional patch displacement based on reliable prediction confidence for unlabeled data to generate new samples, which can effectively suppress uncontrollable regions and still retain the influence of input perturbations. Meanwhile, to force the model to learn the potentially uncontrollable content, a bidirectional displacement operation with inverse confidence is proposed for the labeled images, which generates samples with more unreliable information to facilitate model learning. Extensive experiments show that ABD achieves new state-of-the-art performance for SSMIS, significantly improving different baselines. Source code is available at https://github.com/chy-upc/ABD.

[168]  arXiv:2405.00382 [pdf, ps, other]
Title: Modified least squares method and a review of its applications in machine learning and fractional differential/integral equations
Subjects: Numerical Analysis (math.NA)

The least squares method provides the best-fit curve by minimizing the total squared error. In this work, we provide the modified least squares method based on the fractional orthogonal polynomials that belong to the space $M_{n}^{\lambda} := \text{span}\{1,x^{\lambda},x^{2\lambda},\ldots,x^{n\lambda}\},~\lambda \in (0,2]$. Numerical experiments demonstrate how to solve different problems using the modified least squares method. Moreover, the results show the advantage of the modified least squares method compared to the classical least squares method. Furthermore, we discuss the various applications of the modified least squares method in fields such as fractional differential/integral equations and machine learning.
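
A minimal NumPy sketch of a least-squares fit in the fractional basis $\{1, x^{\lambda}, \ldots, x^{n\lambda}\}$; the degree, $\lambda$, and synthetic data are illustrative (this is a monomial fractional basis rather than the paper's orthogonal polynomials).

import numpy as np

def fractional_lstsq(x, y, n=3, lam=0.5):
    """Least-squares fit in span{1, x^lam, x^(2*lam), ..., x^(n*lam)} for x >= 0."""
    A = np.column_stack([x ** (k * lam) for k in range(n + 1)])   # design matrix
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs, A @ coeffs            # coefficients and fitted values

x = np.linspace(0.0, 4.0, 50)
y = 1.0 + 2.0 * np.sqrt(x) + 0.05 * np.random.randn(x.size)      # data with an x^{1/2} trend
coeffs, fit = fractional_lstsq(x, y, n=2, lam=0.5)
print(coeffs)    # close to [1, 2, 0] for this synthetic example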

[169]  arXiv:2405.00383 [pdf, other]
Title: Learning Tactile Insertion in the Real World
Subjects: Robotics (cs.RO)

Humans have exceptional tactile sensing capabilities, which they can leverage to solve challenging, partially observable tasks that cannot be solved from visual observation alone. Research in tactile sensing attempts to unlock this new input modality for robots. Lately, these sensors have become cheaper and, thus, widely available. At the same time, the question of how to integrate them into control loops is still an active area of research, with central challenges being partial observability and the contact-rich nature of manipulation tasks. In this study, we propose to use Reinforcement Learning to learn an end-to-end policy, mapping directly from tactile sensor readings to actions. Specifically, we use Dreamer-v3 on a challenging, partially observable robotic insertion task with a Franka Research 3, both in simulation and on a real system. For the real setup, we built a robotic platform capable of resetting itself fully autonomously, allowing for extensive training runs without human supervision. Our preliminary results indicate that Dreamer is capable of utilizing tactile inputs to solve robotic manipulation tasks in simulation and reality. Furthermore, we find that providing the robot with tactile feedback generally improves task performance, though, in our setup, we do not yet include other sensing modalities. In the future, we plan to utilize our platform to evaluate a wide range of other Reinforcement Learning algorithms on tactile tasks.

[170]  arXiv:2405.00384 [pdf, other]
Title: Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol
Comments: Accepted for publication, 3rd ACM Int. Workshop on Multimedia AI against Disinformation (MAD'24) at ACM ICMR'24, June 10, 2024, Phuket, Thailand. This is the "accepted version"
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier, to compare with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them. To facilitate further research and provide a common evaluation platform, we introduce an experimental protocol and a benchmark dataset simulating such inconsistencies. Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancy detection, highlighting its potential in content verification applications.

[171]  arXiv:2405.00387 [pdf, other]
Title: Cell Switching in HAPS-Aided Networking: How the Obscurity of Traffic Loads Affects the Decision
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)

This study introduces the cell load estimation problem of cell switching approaches in cellular networks, presented specifically for a high-altitude platform station (HAPS)-assisted network. The problem arises from the fact that the traffic loads of sleeping base stations for the next time slot cannot be perfectly known, but they can rather be estimated, and any estimation error could result in divergence from the optimal decision, which subsequently affects the energy efficiency performance. The traffic loads of the sleeping base stations for the next time slot are required because the switching decisions are made proactively in the current time slot. Two different Q-learning algorithms are developed; one is full-scale, focusing solely on the performance, while the other one is lightweight and addresses the computational cost. Results confirm that estimation errors can change cell switching decisions, yielding performance divergence compared to no-error scenarios. Moreover, the developed Q-learning algorithms perform well since an insignificant difference (i.e., 0.3%) is observed between them and the optimum algorithm.
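
A schematic tabular Q-learning loop for a proactive switching decision, given only as a generic sketch: the state encoding, the two actions, and the reward (e.g., energy efficiency computed from the estimated traffic loads) are hypothetical and do not reproduce the paper's full-scale or lightweight designs.

import random
from collections import defaultdict

Q = defaultdict(float)              # Q[(state, action)]
alpha, gamma, eps = 0.1, 0.9, 0.1
actions = [0, 1]                    # 0: keep small cells on, 1: switch off and offload to HAPS

def choose(state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])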

[172]  arXiv:2405.00390 [pdf, other]
Title: CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models
Comments: 25 pages, 7 figures, and 18 tables
Subjects: Computation and Language (cs.CL)

Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherently in LMMs. Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods, and markedly exhibits explainability in deciphering sarcasm as well.

[173]  arXiv:2405.00391 [pdf, other]
Title: Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The beamforming technology with large holographic antenna arrays is one of the key enablers for the next generation of wireless systems, which can significantly improve the spectral efficiency. However, the deployment of large antenna arrays implies high algorithm complexity and resource overhead at both receiver and transmitter ends. To address this issue, advanced technologies such as artificial intelligence have been developed to reduce beamforming overhead. Intuitively, if we can implement near-optimal beamforming using only a tiny subset of the full channel information, the overhead for channel estimation and beamforming would be reduced significantly compared with the traditional beamforming methods that usually need full channel information and the inversion of a large-dimensional matrix. In light of this idea, we propose a novel scheme that utilizes a Wasserstein generative adversarial network with gradient penalty to infer the full beamforming matrices based on only a small portion of the channel information. Simulation results confirm that it can accomplish comparable performance with the weighted minimum mean-square error algorithm, while reducing the overhead by over 50%.
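
For reference, the gradient-penalty term that defines a WGAN-GP critic is shown below as a generic PyTorch sketch; the critic and generator architectures for beamforming inference are not reproduced, and the tensor shapes are illustrative.

import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    """WGAN-GP penalty: push the critic's gradient norm towards 1 on interpolates."""
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(interp)
    grads = torch.autograd.grad(outputs=score, inputs=interp,
                                grad_outputs=torch.ones_like(score),
                                create_graph=True, retain_graph=True)[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Critic loss: E[D(fake)] - E[D(real)] + lambda_gp * gradient_penalty(critic, real, fake)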

[174]  arXiv:2405.00392 [pdf, other]
Title: Certified Adversarial Robustness of Machine Learning-based Malware Detectors via (De)Randomized Smoothing
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Deep learning-based malware detection systems are vulnerable to adversarial EXEmples - carefully-crafted malicious programs that evade detection with minimal perturbation. As such, the community is dedicating effort to develop mechanisms to defend against adversarial EXEmples. However, current randomized smoothing-based defenses are still vulnerable to attacks that inject blocks of adversarial content. In this paper, we introduce a certifiable defense against patch attacks that guarantees, for a given executable and an adversarial patch size, that no adversarial EXEmple exists. Our method is inspired by (de)randomized smoothing, which provides deterministic robustness certificates. During training, a base classifier is trained using subsets of contiguous bytes. At inference time, our defense splits the executable into non-overlapping chunks, classifies each chunk independently, and computes the final prediction through majority voting to minimize the influence of injected content. Furthermore, we introduce a preprocessing step that fixes the size of the sections and headers to a multiple of the chunk size. As a consequence, the injected content is confined to an integer number of chunks without tampering with the other chunks containing the real bytes of the input examples, allowing us to extend our certified robustness guarantees to content insertion attacks. We perform an extensive ablation study, by comparing our defense with randomized smoothing-based defenses against a plethora of content manipulation attacks and neural network architectures. Results show that our method exhibits unmatched robustness against strong content-insertion attacks, outperforming randomized smoothing-based defenses in the literature.
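
A schematic of the inference-time chunking and majority vote, under stated assumptions: `base_classifier` is a stand-in for the trained model, and the chunk size is illustrative.

import numpy as np

CHUNK_SIZE = 512   # bytes per chunk (illustrative)

def predict_executable(raw_bytes: bytes, base_classifier) -> int:
    """Split the file into non-overlapping chunks, classify each, and majority-vote."""
    n_chunks = (len(raw_bytes) + CHUNK_SIZE - 1) // CHUNK_SIZE
    votes = []
    for i in range(n_chunks):
        chunk = raw_bytes[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        chunk = chunk.ljust(CHUNK_SIZE, b"\x00")          # pad the last chunk
        x = np.frombuffer(chunk, dtype=np.uint8)
        votes.append(base_classifier(x))                  # 0 = benign, 1 = malicious
    # An injected patch that fits in a bounded number of chunks can flip only that many
    # votes, which is the basis of the deterministic certificate described above.
    return int(sum(votes) > len(votes) / 2)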

[175]  arXiv:2405.00393 [pdf, other]
Title: Inferring State Machine from the Protocol Implementation via Large Language Model
Subjects: Cryptography and Security (cs.CR)

State machines play a pivotal role in augmenting the efficacy of protocol analysis to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex code structures and behaviors. To address these limitations, we propose an innovative state machine inference approach powered by Large Language Models (LLMs). Utilizing text-embedding technology, this method allows LLMs to dissect and analyze the intricacies of protocol implementation code. Through targeted prompt engineering, we systematically identify and infer the underlying state machines. Our evaluation across six protocol implementations demonstrates the method's high efficacy, achieving an accuracy rate exceeding 90% and successfully delineating differences in state machines among various implementations of the same protocol. Importantly, integrating this approach with protocol fuzzing has notably enhanced AFLNet's code coverage by 10% over RFCNLP, showcasing the considerable potential of LLMs in advancing network protocol security analysis. Our proposed method not only marks a significant step forward in accurate state machine inference but also opens new avenues for improving the security and reliability of protocol implementations.

[176]  arXiv:2405.00394 [pdf, other]
Title: Enhancing Mutual Trustworthiness in Federated Learning for Data-Rich Smart Cities
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

Federated learning is a promising collaborative and privacy-preserving machine learning approach in data-rich smart cities. Nevertheless, the inherent heterogeneity of these urban environments presents a significant challenge in selecting trustworthy clients for collaborative model training. The usage of traditional approaches, such as the random client selection technique, poses several threats to the system's integrity due to the possibility of malicious client selection. Primarily, the existing literature focuses on assessing the trustworthiness of clients, neglecting the crucial aspect of trust in federated servers. To bridge this gap, in this work, we propose a novel framework that addresses the mutual trustworthiness in federated learning by considering the trust needs of both the client and the server. Our approach entails: (1) Creating preference functions for servers and clients, allowing them to rank each other based on trust scores, (2) Establishing a reputation-based recommendation system leveraging multiple clients to assess newly connected servers, (3) Assigning credibility scores to recommending devices for better server trustworthiness measurement, (4) Developing a trust assessment mechanism for smart devices using a statistical Interquartile Range (IQR) method, (5) Designing intelligent matching algorithms considering the preferences of both parties. Based on simulation and experimental results, our approach outperforms baseline methods by increasing trust levels and global model accuracy and by reducing the number of non-trustworthy clients in the system.
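
A minimal NumPy sketch of an IQR-based screening step as in item (4) above; the conventional Tukey fence factor of 1.5 and the example scores are assumptions, not values from the paper.

import numpy as np

def iqr_trust_filter(trust_scores):
    """Flag devices whose trust score falls outside the fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(trust_scores, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    scores = np.asarray(trust_scores)
    return (scores >= lower) & (scores <= upper)    # boolean mask: True = within normal range

print(iqr_trust_filter([0.81, 0.78, 0.80, 0.79, 0.15, 0.83]))   # the 0.15 outlier is flagged False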

[177]  arXiv:2405.00395 [pdf, other]
Title: Trust Driven On-Demand Scheme for Client Deployment in Federated Learning
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Containerization technology plays a crucial role in Federated Learning (FL) setups, expanding the pool of potential clients and ensuring the availability of specific subsets for each learning iteration. However, doubts arise about the trustworthiness of devices deployed as clients in FL scenarios, especially when container deployment processes are involved. Addressing these challenges is important, particularly in managing potentially malicious clients capable of disrupting the learning process or compromising the entire model. In our research, we are motivated to integrate a trust element into the client selection and model deployment processes within our system architecture. This is a feature lacking in the initial client selection and deployment mechanism of the On-Demand architecture. We introduce a trust mechanism, named "Trusted-On-Demand-FL", which establishes a relationship of trust between the server and the pool of eligible clients. Utilizing Docker in our deployment strategy enables us to monitor and validate participant actions effectively, ensuring strict adherence to agreed-upon protocols while strengthening defenses against unauthorized data access or tampering. Our simulations rely on a continuous user behavior dataset, deploying an optimization model powered by a genetic algorithm to efficiently select clients for participation. By assigning trust values to individual clients and dynamically adjusting these values, combined with penalizing malicious clients through decreased trust scores, our proposed framework identifies and isolates harmful clients. This approach not only reduces disruptions to regular rounds but also minimizes instances of round dismissal, consequently enhancing both system stability and security.

[178]  arXiv:2405.00399 [pdf, other]
Title: Enhanced Error Estimates for Augmented Subspace Method with Crouzeix-Raviart Element
Comments: 23 pages, 5 figures. arXiv admin note: text overlap with arXiv:2106.00548, arXiv:2401.04063
Subjects: Numerical Analysis (math.NA)

In this paper, we present some enhanced error estimates for augmented subspace methods with the nonconforming Crouzeix-Raviart (CR) element. Before the novel estimates, we derive the explicit error estimates for the case of single eigenpair and multiple eigenpairs based on our defined spectral projection operators, respectively. Then we first strictly prove that the CR element based augmented subspace method exhibits the second-order convergence rate between the two steps of the augmented subspace iteration, which coincides with the practical experimental results. The algebraic error estimates of second order for the augmented subspace method explicitly elucidate the dependence of the convergence rate of the algebraic error on the coarse space, which provides new insights into the performance of the augmented subspace method. Numerical experiments are finally supplied to verify these new estimate results and the efficiency of our algorithms.

[179]  arXiv:2405.00401 [pdf, other]
Title: Optimized Drug Design using Multi-Objective Evolutionary Algorithms with SELFIES
Journal-ref: The final version of this paper will be presented in the Proceedings of the IEEE World Congress on Computational Intelligence 2024
Subjects: Neural and Evolutionary Computing (cs.NE)

Computer-aided drug design is a promising approach to reduce the tremendous costs, i.e., time and resources, of developing new medicinal drugs. It finds application in aiding the traversal of the vast chemical space of potentially useful compounds. In this paper, we deploy multi-objective evolutionary algorithms, namely NSGA-II, NSGA-III, and MOEA/D, for this purpose. At the same time, we use the SELFIES string representation method. In addition to the QED and SA scores, we optimize compounds using the GuacaMol benchmark multi-objective task sets. Our results indicate that all three algorithms show converging behavior and successfully optimize the defined criteria whilst differing mainly in the number of potential solutions found. We observe that novel and promising candidates for synthesis are discovered among the obtained compounds in the Pareto sets.

[180]  arXiv:2405.00402 [pdf, other]
Title: Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models
Subjects: Computation and Language (cs.CL)

The alignment of reasoning abilities between smaller and larger Language Models is largely conducted via Supervised Fine-Tuning (SFT) using demonstrations generated from robust Large Language Models (LLMs). Although these approaches deliver more performant models, they do not show sufficiently strong generalization ability, as the training only relies on the provided demonstrations.
In this paper, we propose the Self-refine Instruction-tuning method that elicits Smaller Language Models to self-refine their abilities. Our approach is based on a two-stage process, where reasoning abilities are first transferred between LLMs and Small Language Models (SLMs) via Instruction-tuning on demonstrations provided by LLMs, and then the instructed models Self-refine their abilities through preference optimization strategies. In particular, the second phase applies refinement heuristics based on the Direct Preference Optimization algorithm, where the SLMs are elicited to deliver a series of reasoning paths by automatically sampling the generated responses and providing rewards using ground truths from the LLMs. Results obtained on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-of-domain scenarios, aligning the reasoning abilities of Smaller and Larger Language Models.
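
Since the second phase builds on Direct Preference Optimization, a generic DPO loss term is sketched below in PyTorch, assuming per-sequence log-probabilities under the policy and the frozen reference model have already been computed; the variable names are illustrative and not tied to the paper's code.

import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss over (chosen, rejected) reasoning paths."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi/pi_ref, preferred path
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref, dispreferred path
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# In this setting, sampled paths whose final answer matches the LLM-provided ground truth
# would play the role of "chosen" and the remaining sampled paths the role of "rejected".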

[181]  arXiv:2405.00410 [pdf, other]
Title: UCB-driven Utility Function Search for Multi-objective Reinforcement Learning
Subjects: Machine Learning (cs.LG)

In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off between multiple, possibly conflicting, objectives. MORL based on decomposition is a family of solution methods that employ a number of utility functions to decompose the multi-objective problem into individual single-objective problems solved simultaneously in order to approximate a Pareto front of policies. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process, with the aim of maximising the hypervolume of the resulting Pareto front. The proposed method is shown to outperform various MORL baselines on Mujoco benchmark problems across different random seeds. The code is online at: https://github.com/SYCAMORE-1/ucb-MOPPO.
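
A schematic UCB rule for choosing the next weight vector from a candidate set; the candidate vectors, exploration constant, and the reward signal (here taken to be the observed hypervolume improvement after training with that vector) are assumptions for illustration, not the paper's exact formulation.

import math
import random

candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]   # candidate weight vectors w
counts = [0] * len(candidates)
mean_reward = [0.0] * len(candidates)               # e.g. hypervolume improvement per vector
c = 1.0                                             # exploration constant

def select(t):
    """UCB selection: try each vector once, then balance mean reward and uncertainty."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(candidates)),
               key=lambda i: mean_reward[i] + c * math.sqrt(math.log(t) / counts[i]))

def update(i, reward):
    counts[i] += 1
    mean_reward[i] += (reward - mean_reward[i]) / counts[i]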

[182]  arXiv:2405.00415 [pdf, other]
Title: On Developing an Artifact-based Approach to Regulatory Requirements Engineering
Comments: The paper was accepted to the 14th International Model-Driven Requirements Engineering (MoDRE) workshop co-located with the 32nd IEEE International Requirements Engineering Conference (RE 2024) in Reykjavik, Iceland
Subjects: Software Engineering (cs.SE)

Context: Regulatory acts are a challenging source when eliciting, interpreting, and analyzing requirements. Requirements engineers often need to involve legal experts who, however, may often not be available. This raises the need for approaches to regulatory Requirements Engineering (RE) covering and integrating both legal and engineering perspectives.
Problem: Regulatory RE approaches need to capture and reflect both the elementary concepts and relationships from a legal perspective and their seamless transition to concepts used to specify software requirements. No existing approach considers explicating and managing legal domain knowledge and engineering-legal coordination.
Method: We conducted focus group sessions with legal researchers to identify the core challenges to establishing a regulatory RE approach. Based on our findings, we developed a candidate solution and conducted a first conceptual validation to assess its feasibility.
Results: We introduce the first version of our Artifact Model for Regulatory Requirements Engineering (AM4RRE) and its conceptual foundation. It provides a blueprint for applying legal (modelling) concepts and well-established RE concepts. Our initial results suggest that artifact-centric RE can be applied to managing legal domain knowledge and engineering-legal coordination.
Conclusions: The focus groups that served as a basis for building our model and the results from the expert validation both strengthen our confidence that we already provide a valuable basis for systematically integrating legal concepts into RE. This overcomes contemporary challenges to regulatory RE and serves as a basis for exposure to critical discussions in the community before continuing with the development of tool-supported extensions and large-scale empirical evaluations in practice.

[183]  arXiv:2405.00417 [pdf, other]
Title: Conformal Risk Control for Ordinal Classification
Comments: 17 pages, 8 figures, 2 tables; 1 supplementary page
Journal-ref: In UAI 2023: The 39th Conference on Uncertainty in Artificial Intelligence
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

As a natural extension to the standard conformal prediction method, several conformal risk control methods have been recently developed and applied to various learning problems. In this work, we seek to control the conformal risk in expectation for ordinal classification tasks, which have broad applications to many real problems. For this purpose, we first formulate the ordinal classification task within the conformal risk control framework and provide theoretical risk bounds for the risk control method. We then propose two types of loss functions specially designed for ordinal classification tasks and develop corresponding algorithms to determine the prediction set for each case to control the risk at a desired level. We demonstrate the effectiveness of our proposed methods and analyze the difference between the two types of risks on three different datasets, including a simulated dataset, the UTKFace dataset, and the diabetic retinopathy detection dataset.
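
A generic calibration loop for conformal risk control is sketched below, assuming a bounded loss that is non-increasing in the threshold lambda; the grid, the loss interface, and the score format are illustrative and do not reproduce the paper's ordinal-specific loss functions.

import numpy as np

def calibrate_lambda(cal_scores, cal_labels, loss_fn, alpha=0.1, grid=np.linspace(0, 1, 101)):
    """Return the smallest lambda whose corrected calibration risk is at most alpha.

    cal_scores: (n, K) predicted class probabilities; cal_labels: (n,) true ordinal labels.
    loss_fn(scores_i, label_i, lam) -> loss in [0, 1] of the prediction set induced by lam.
    """
    n = len(cal_labels)
    for lam in grid:   # increasing grid; the induced sets grow and the loss shrinks with lam
        risk = np.mean([loss_fn(s, y, lam) for s, y in zip(cal_scores, cal_labels)])
        # Finite-sample correction used in conformal risk control (loss bounded by 1).
        if (n / (n + 1)) * risk + 1.0 / (n + 1) <= alpha:
            return lam
    return grid[-1]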

[184]  arXiv:2405.00418 [pdf, other]
Title: Detection of ransomware attacks using federated learning based on the CNN model
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Computing is still under significant threat from ransomware, which necessitates prompt action to prevent it. Ransomware attacks can negatively impact the operation of smart grids, particularly digital substations. In addition to examining a ransomware detection method using artificial intelligence (AI), this paper offers a ransomware attack modeling technique that targets the disrupted operation of a digital substation. First, binary data is transformed into image data and fed into a convolutional neural network model trained using federated learning. The experimental findings demonstrate that the suggested technique detects ransomware with a high accuracy rate.
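
A schematic of the binary-to-image step, assuming a square grayscale layout; the image side length and the file name are hypothetical, and the CNN itself is not shown.

import numpy as np

def bytes_to_image(raw: bytes, side: int = 64) -> np.ndarray:
    """Pack raw bytes into a side x side grayscale image, padding or truncating as needed."""
    buf = np.frombuffer(raw, dtype=np.uint8)[: side * side]
    buf = np.pad(buf, (0, side * side - buf.size))          # zero-pad short samples
    return buf.reshape(side, side).astype(np.float32) / 255.0

img = bytes_to_image(open("sample.bin", "rb").read())       # hypothetical input file; ready for the CNN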

[185]  arXiv:2405.00420 [pdf, other]
Title: Self-supervised Pre-training of Text Recognizers
Comments: 18 pages, 6 figures, 4 tables, accepted to ICDAR24
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but it is costly to annotate them. Therefore, methods utilizing unlabeled data are researched. We study self-supervised pre-training methods based on masked label prediction using three different approaches -- Feature Quantization, VQ-VAE, and Post-Quantized AE. We also investigate joint-embedding approaches with VICReg and NT-Xent objectives, for which we propose an image shifting technique to prevent model collapse where it relies solely on positional encoding while completely ignoring the input image. We perform our experiments on historical handwritten (Bentham) and historical printed datasets mainly to investigate the benefits of the self-supervised pre-training techniques with different amounts of annotated target domain data. We use transfer learning as a strong baseline. The evaluation shows that the self-supervised pre-training on data from the target domain is very effective, but it struggles to outperform transfer learning from closely related domains. This paper is one of the first studies exploring self-supervised pre-training in document text recognition, and we believe that it will become a cornerstone for future research in this area. We made our implementation of the investigated methods publicly available at https://github.com/DCGM/pero-pretraining.

[186]  arXiv:2405.00422 [pdf, ps, other]
Title: Bona-Smith-type systems in bounded domains with slip-wall boundary conditions: Theoretical justification and a conservative numerical scheme
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

Considered herein is a class of Boussinesq systems of Bona-Smith type that describe water waves in bounded two-dimensional domains with slip-wall boundary conditions and variable bottom topography. Such boundary conditions are necessary in situations involving water waves in channels, ports, and generally in basins with solid boundaries. We prove that, given appropriate initial conditions, the corresponding initial-boundary value problems have unique solutions locally in time, which is a fundamental property of deterministic mathematical modeling. Moreover, we demonstrate that the systems under consideration adhere to three basic conservation laws for water waves: mass, vorticity, and energy conservation.
The theoretical analysis of these specific Boussinesq systems leads to a conservative mixed finite element formulation. Using explicit, relaxation Runge-Kutta methods for the discretization in time, we devise a fully discrete scheme for the numerical solution of initial-boundary value problems with slip-wall conditions, preserving mass, vorticity, and energy. Finally, we present a series of challenging numerical experiments to assess the applicability of the new numerical model.

[187]  arXiv:2405.00423 [pdf, ps, other]
Title: $α$-leakage by Rényi Divergence and Sibson Mutual Information
Authors: Ni Ding
Subjects: Information Theory (cs.IT)

For $\tilde{f}(t) = \exp(\frac{\alpha-1}{\alpha}t)$, this paper proposes a $\tilde{f}$-mean information gain measure. R\'{e}nyi divergence is shown to be the maximum $\tilde{f}$-mean information gain incurred at each elementary event $y$ of channel output $Y$ and Sibson mutual information is the $\tilde{f}$-mean of this $Y$-elementary information gain. Both are proposed as $\alpha$-leakage measures, indicating the most information an adversary can obtain on sensitive data. It is shown that the existing $\alpha$-leakage by Arimoto mutual information can be expressed as $\tilde{f}$-mean measures by a scaled probability. Further, Sibson mutual information is interpreted as the maximum $\tilde{f}$-mean information gain over all estimation decisions applied to channel output. This reveals that the existing generalized Blahut-Arimoto method for computing R\'{e}nyi capacity (or Gallager's error exponent) in fact maximizes a $\tilde{f}$-mean information gain iteratively over estimation decision and channel input. This paper also derives a decomposition of $\tilde{f}$-mean information gain, analogous to the Sibson identity for R\'{e}nyi divergence.

[188]  arXiv:2405.00426 [pdf, other]
Title: On the Potential of RIS in the Context of PLA in Wireless Communication Systems
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)

Re-configurable Intelligent Surfaces (RIS) technology has proven itself a promising candidate for the next generation of wireless networks through its enhanced performance in terms of throughput, spectral, and energy efficiency. However, the broadcast nature of RIS-assisted wireless communication makes it vulnerable to malicious attacks at the physical layer. On the other hand, physical layer authentication is an emerging area in the security domain to thwart different attacks such as cloning, spoofing, and impersonation by using the random features of the physical layer. In this paper, we investigate RIS-assisted wireless communication systems to unlock the potential of using RIS for physical layer authentication (PLA). Specifically, we exploit two distinct features of the physical layer: pathloss and channel impulse response (CIR) for PLA in RIS-assisted wireless communication. We construct hypothesis tests for the estimated features and derive the closed-form expressions of the errors. Further, we choose the critical error, i.e., missed detection, as our objective function for minimization by optimizing the phase shift of the RIS panel. We compare the performance of our proposed mechanisms with baseline mechanisms, i.e., PLA schemes using the same features but without RIS assistance. Furthermore, we thoroughly evaluate our proposed schemes using performance metrics such as the probability of false alarm (PFA), the probability of missed detection (PMD), and the receiver operating characteristic (ROC) curves. The results demonstrate the significant positive impact of RIS on PLA, as it effectively reduces PMD values to zero when determining the optimal phase shift.

[189]  arXiv:2405.00427 [pdf, other]
Title: Improved linearly ordered colorings of hypergraphs via SDP rounding
Comments: 19 pages; 13 pages for the main body
Subjects: Data Structures and Algorithms (cs.DS)

We consider the problem of linearly ordered (LO) coloring of hypergraphs. A hypergraph has an LO coloring if there is a vertex coloring, using a set of ordered colors, so that (i) no edge is monochromatic, and (ii) each edge has a unique maximum color. It is an open question as to whether or not a 2-LO colorable 3-uniform hypergraph can be LO colored with 3 colors in polynomial time. Nakajima and Zivn\'{y} recently gave a polynomial-time algorithm to color such hypergraphs with $\widetilde{O}(n^{1/3})$ colors and asked if SDP methods can be used directly to obtain improved bounds. Our main result is to show how to use SDP-based rounding methods to produce an LO coloring with $\widetilde{O}(n^{1/5})$ colors for such hypergraphs. We first show that we can reduce the problem to cases with highly structured SDP solutions, which we call balanced hypergraphs. Then we show how to apply classic SDP-rounding tools in this case. We believe that the reduction to balanced hypergraphs is novel and could be of independent interest.

[190]  arXiv:2405.00428 [pdf, other]
Title: CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection
Comments: 21 pages, 7 figures
Subjects: Software Engineering (cs.SE)

With the development of the open source community, the code is often copied, spread, and evolved in multiple software systems, which brings uncertainty and risk to the software system (e.g., bug propagation and copyright infringement). Therefore, it is important to conduct code clone detection to discover similar code pairs. Many approaches have been proposed to detect code clones where token-based tools can scale to big code. However, due to the lack of program details, they cannot handle more complicated code clones, i.e., semantic code clones. In this paper, we introduce CC2Vec, a novel code encoding method designed to swiftly identify simple code clones while also enhancing the capability for semantic code clone detection. To retain the program details between tokens, CC2Vec divides them into different categories (i.e., typed tokens) according to the syntactic types and then applies two self-attention mechanism layers to encode them. To resist changes in the code structure of semantic code clones, CC2Vec performs contrastive learning to reduce the differences introduced by different code implementations. We evaluate CC2Vec on two widely used datasets (i.e., BigCloneBench and Google Code Jam) and the results report that our method can effectively detect simple code clones. In addition, CC2Vec not only attains comparable performance to widely used semantic code clone detection systems such as ASTNN, SCDetector, and FCCA by simply fine-tuning, but also significantly surpasses these methods in detection efficiency.

[191]  arXiv:2405.00429 [pdf, ps, other]
Title: Clique-free t-matchings in degree-bounded graphs
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

We consider problems of finding a maximum size/weight $t$-matching without forbidden subgraphs in an undirected graph $G$ with the maximum degree bounded by $t+1$, where $t$ is an integer greater than $2$. Depending on the variant, forbidden subgraphs denote certain subsets of $t$-regular complete partite subgraphs of $G$. A graph is complete partite if there exists a partition of its vertex set such that every pair of vertices from different sets is connected by an edge and vertices from the same set form an independent set. A clique $K_t$ and a bipartite clique $K_{t,t}$ are examples of complete partite graphs. These problems are natural generalizations of the triangle-free and square-free $2$-matching problems in subcubic graphs. In the weighted setting we assume that the weights of edges of $G$ are vertex-induced on every forbidden subgraph. We present simple and fast combinatorial algorithms for these problems. The presented algorithms are the first ones for the weighted versions, and for the unweighted ones, are faster than those known previously. Our approach relies on the use of gadgets with so-called half-edges. A half-edge of edge $e$ is, informally speaking, a half of $e$ containing exactly one of its endpoints.

[192]  arXiv:2405.00431 [pdf, other]
Title: Detail-Enhancing Framework for Reference-Based Image Super-Resolution
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent years have witnessed the prosperity of reference-based image super-resolution (Ref-SR). By importing the high-resolution (HR) reference images into the single image super-resolution (SISR) approach, the ill-posed nature of this long-standing field has been alleviated with the assistance of texture transferred from reference images. Although the significant improvement in quantitative and qualitative results has verified the superiority of Ref-SR methods, the presence of misalignment before texture transfer indicates room for further performance improvement. Existing methods tend to neglect the significance of details in the context of comparison, therefore not fully leveraging the information contained within low-resolution (LR) images. In this paper, we propose a Detail-Enhancing Framework (DEF) for reference-based super-resolution, which introduces the diffusion model to generate and enhance the underlying detail in LR images. If corresponding parts are present in the reference image, our method can facilitate rigorous alignment. In cases where the reference image lacks corresponding parts, it ensures a fundamental improvement while avoiding the influence of the reference image. Extensive experiments demonstrate that our proposed method achieves superior visual results while maintaining comparable numerical outcomes.

[193]  arXiv:2405.00433 [pdf, other]
Title: Weight Sparsity Complements Activity Sparsity in Neuromorphic Language Models
Comments: arXiv admin note: text overlap with arXiv:2311.07625
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Activity and parameter sparsity are two standard methods of making neural networks computationally more efficient. Event-based architectures such as spiking neural networks (SNNs) naturally exhibit activity sparsity, and many methods exist to sparsify their connectivity by pruning weights. While the effect of weight pruning on feed-forward SNNs has been previously studied for computer vision tasks, the effects of pruning for complex sequence tasks like language modeling are less well studied since SNNs have traditionally struggled to achieve meaningful performance on these tasks. Using a recently published SNN-like architecture that works well on small-scale language modeling, we study the effects of weight pruning when combined with activity sparsity. Specifically, we study the trade-off between the multiplicative efficiency gains the combination affords and its effect on task performance for language modeling. To dissect the effects of the two sparsities, we conduct a comparative analysis between densely activated models and sparsely activated event-based models across varying degrees of connectivity sparsity. We demonstrate that sparse activity and sparse connectivity complement each other without a proportional drop in task performance for an event-based neural network trained on the Penn Treebank and WikiText-2 language modeling datasets. Our results suggest sparsely connected event-based neural networks are promising candidates for effective and efficient sequence modeling.

[194]  arXiv:2405.00435 [pdf, other]
Title: CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model
Subjects: Human-Computer Interaction (cs.HC)

The integration of new technology with cultural studies enhances our understanding of cultural heritage but often struggles to connect with diverse audiences. It is challenging to align personal interpretations with the intended meanings across different cultures. Our study investigates the important factors in appreciating art from a cross-cultural perspective. We explore the application of Large Language Models (LLMs) to bridge the cultural and language barriers in understanding Traditional Chinese Paintings (TCPs). We present CultiVerse, a visual analytics system that utilizes LLMs within a mixed-initiative framework, enhancing interpretative appreciation of TCP in a cross-cultural dialogue. CultiVerse addresses the challenge of translating the nuanced symbolism in art, which involves interpreting complex cultural contexts, aligning cross-cultural symbols, and validating cultural acceptance. CultiVerse integrates an interactive interface with the analytical capability of LLMs to explore a curated TCP dataset, facilitating the analysis of multifaceted symbolic meanings and the exploration of cross-cultural serendipitous discoveries. Empirical evaluations affirm that CultiVerse significantly improves cross-cultural understanding, offering deeper insights and engaging art appreciation.

[195]  arXiv:2405.00436 [pdf, other]
Title: Porting HPC Applications to AMD Instinct$^\text{TM}$ MI300A Using Unified Memory and OpenMP
Comments: Accepted paper at ISC High Performance 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

AMD Instinct$^\text{TM}$ MI300A is the world's first data center accelerated processing unit (APU) with memory shared between the AMD "Zen 4" EPYC$^\text{TM}$ cores and third generation CDNA$^\text{TM}$ compute units. A single memory space offers several advantages: i) it eliminates the need for data replication and costly data transfers, ii) it substantially simplifies application development and allows an incremental acceleration of applications, iii) is easy to maintain, and iv) its potential can be well realized via the abstractions in the OpenMP 5.2 standard, where the host and the device data environments can be unified in a more performant way. In this article, we provide a blueprint of the APU programming model leveraging unified memory and highlight key distinctions compared to the conventional approach with discrete GPUs. OpenFOAM, an open-source C++ library for computational fluid dynamics, is presented as a case study to emphasize the flexibility and ease of offloading a full-scale production-ready application on MI300 APUs using directive-based OpenMP programming.

[196]  arXiv:2405.00437 [pdf, other]
Title: Reduced-order modeling for second-order computational homogenization with applications to geometrically parameterized elastomeric metamaterials
Subjects: Computational Engineering, Finance, and Science (cs.CE)

The structural properties of mechanical metamaterials are typically studied with two-scale methods based on computational homogenization. Because such materials have a complex microstructure, enriched schemes such as second-order computational homogenization are required to fully capture their non-linear behavior, which arises from non-local interactions due to the buckling or patterning of the microstructure. In the two-scale formulation, the effective behavior of the microstructure is captured with a representative volume element (RVE), and a homogenized effective continuum is considered on the macroscale.
Although an effective continuum formulation is introduced, solving such two-scale models concurrently is still computationally demanding due to the many repeated solutions for each RVE at the microscale level. In this work, we propose a reduced-order model for the microscopic problem arising in second-order computational homogenization, using proper orthogonal decomposition and a novel hyperreduction method that is specifically tailored for this problem and inspired by the empirical cubature method. Two numerical examples are considered, in which the performance of the reduced-order model is carefully assessed by comparing its solutions with direct numerical simulations (entirely resolving the underlying microstructure) and the full second-order computational homogenization model. The reduced-order model is able to approximate the result of the full computational homogenization well, provided that the training data is representative for the problem at hand. Any remaining errors, when compared with the direct numerical simulation, can be attributed to the inherent approximation errors in the computational homogenization scheme. Regarding run times for one thread, speed-ups on the order of 100 are achieved with the reduced-order model as compared to direct numerical simulations.
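
A minimal NumPy sketch of the proper orthogonal decomposition step is given below, under stated assumptions: the snapshot matrix layout and energy-based truncation are illustrative, and the tailored hyperreduction scheme is not reproduced.

import numpy as np

def pod_basis(snapshots: np.ndarray, tol: float = 1e-6) -> np.ndarray:
    """Snapshots: (n_dofs, n_snapshots) matrix of microscale RVE solutions.
    Returns the truncated POD basis capturing a 1 - tol fraction of the snapshot energy."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(energy, 1.0 - tol)) + 1
    return U[:, :r]                       # reduced basis Phi, shape (n_dofs, r)

# The reduced microproblem is then solved for the coordinates q, with u approximated as Phi @ q.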

[197]  arXiv:2405.00438 [pdf, other]
Title: MetaRM: Shifted Distributions Alignment via Meta-Learning
Comments: 11 pages, 6 figures. arXiv admin note: text overlap with arXiv:2401.06080
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as the training process progresses, the output distribution of the policy model shifts, leading to the RM's reduced ability to distinguish between responses. This issue is further compounded when the RM, trained on a specific data distribution, struggles to generalize to examples outside of that distribution. These two issues can be united as a challenge posed by the shifted distribution of the environment. To surmount this challenge, we introduce MetaRM, a method leveraging meta-learning to align the RM with the shifted environment distribution. MetaRM is designed to train the RM by minimizing data loss, particularly for data that can improve its ability to differentiate examples of the shifted target distribution. Extensive experiments demonstrate that MetaRM significantly improves the RM's distinguishing ability in iterative RLHF optimization, and also provides the capacity to identify subtle differences in out-of-distribution samples.
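
For readers unfamiliar with reward-model training, the pairwise (Bradley-Terry) ranking objective that such RMs are typically trained with can be sketched in a few lines. This is the generic objective only, not MetaRM's meta-learning procedure, and all values below are dummy examples.

import torch
import torch.nn.functional as F

def pairwise_rm_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Standard Bradley-Terry ranking loss for reward models: the chosen
    response should receive a higher scalar reward than the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative usage with dummy scores for a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.5, 1.0])
print(pairwise_rm_loss(chosen, rejected))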

[198]  arXiv:2405.00440 [pdf, ps, other]
Title: Intersection Types via Finite-Set Declarations
Comments: To appear in Wollic 2024
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

The lambda-cube is a famous pure type system (PTS) cube of eight powerful explicit type systems that include the simple, polymorphic and dependent type theories. The lambda-cube only types Strongly Normalising (SN) terms but not all of them. It is well known that even the most powerful system of the lambda-cube can only type the same pure untyped lambda-terms that are typable by the higher-order polymorphic implicitly typed lambda-calculus Fomega, and that there is an untyped lambda-term U' that is SN but is not typable in Fomega or the lambda-cube. Hence, neither system can type all the SN terms it expresses. In this paper, we present the f-cube, an extension of the lambda-cube with finite-set declarations (FSDs) like y\in{C1,...,Cn} : B which means that y is of type B and can only be one of C1,..., Cn. The novelty of our FSDs is that they allow intersection types to be represented as Pi-types. We show how to translate and type the term U' in the f-cube using an encoding of intersection types based on FSDs. Notably, our translation works without needing anything like the usual troublesome intersection-introduction rule that proves a pure untyped lambda-term M has an intersection of k types using k independent sub-derivations. As such, our approach is useful for language implementers who want the power of intersection types without the pain of the intersection-introduction rule.

[199]  arXiv:2405.00441 [pdf, other]
Title: Modeling Linear and Non-linear Layers: An MILP Approach Towards Finding Differential and Impossible Differential Propagations
Comments: 42 pages, 2 figures, 21 tables, 7 algorithms
Subjects: Cryptography and Security (cs.CR)

Symmetric key cryptography stands as a fundamental cornerstone in ensuring security within contemporary electronic communication frameworks. The cryptanalysis of classical symmetric key ciphers involves traditional methods and techniques aimed at breaking or analyzing these cryptographic systems. In the evaluation of new ciphers, the resistance against linear and differential cryptanalysis is commonly a key design criterion. The wide trail design technique for block ciphers facilitates the demonstration of security against linear and differential cryptanalysis. Assessing the scheme's security against differential attacks often involves determining the minimum number of active SBoxes for all rounds of a cipher. The propagation characteristics of a cryptographic component, such as an SBox, can be expressed using Boolean functions. Mixed Integer Linear Programming (MILP) proves to be a valuable technique for solving Boolean functions. We formulate a set of inequalities to model a Boolean function, which is subsequently solved by an MILP solver. To efficiently model a Boolean function and select a minimal set of inequalities, two key challenges must be addressed. We propose algorithms to address the second challenge, aiming to find more optimized linear and non-linear components. Our approaches are applied to modeling SBoxes (up to six bits) and EXOR operations with any number of inputs. Additionally, we introduce an MILP-based automatic tool for exploring differential and impossible differential propagations within a cipher. The tool is successfully applied to five lightweight block ciphers: Lilliput, GIFT64, SKINNY64, Klein, and MIBS.
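
As a rough illustration of the modeling step (not the paper's optimized inequality-selection algorithms), the naive way to turn an SBox's difference distribution table into point-exclusion inequalities for an MILP solver is shown below; the toy 3-bit SBox and the bit ordering are arbitrary choices made for this sketch.

from itertools import product

def ddt(sbox):
    """Difference distribution table of an SBox given as a list of outputs."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for dx in range(n):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table

def exclusion_inequalities(sbox, bits):
    """For every impossible (dx, dy) pattern, emit one inequality that removes
    exactly that 0/1 point: sum_{p_i=0} x_i + sum_{p_i=1} (1 - x_i) >= 1.
    Returned as (coeffs, constant), meaning coeffs . x >= constant, where x is
    the vector of input-difference bits followed by output-difference bits."""
    table = ddt(sbox)
    ineqs = []
    for dx, dy in product(range(len(sbox)), repeat=2):
        if table[dx][dy] == 0:
            point = [(dx >> i) & 1 for i in range(bits)] + [(dy >> i) & 1 for i in range(bits)]
            coeffs = [1 if b == 0 else -1 for b in point]
            ineqs.append((coeffs, 1 - sum(point)))
    return ineqs

toy_sbox = [0, 1, 3, 6, 7, 4, 5, 2]   # toy 3-bit bijection, purely illustrative
print(len(exclusion_inequalities(toy_sbox, 3)), "exclusion inequalities")

The resulting coefficient lists can be handed to any MILP solver; selecting a minimal subset of such inequalities is exactly the optimization problem the abstract refers to.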

[200]  arXiv:2405.00448 [pdf, other]
Title: MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces MMTryon, a multi-modal multi-reference VIrtual Try-ON (VITON) framework, which can generate high-quality compositional try-on results by taking as inputs a text instruction and multiple garment images. Our MMTryon mainly addresses two problems overlooked in prior literature: 1) Support of multiple try-on items and dressing styles. Existing methods are commonly designed for single-item try-on tasks (e.g., upper/lower garments, dresses) and fall short on customizing dressing styles (e.g., zipped/unzipped, tuck-in/tuck-out, etc.). 2) Segmentation Dependency. They further heavily rely on category-specific segmentation models to identify the replacement regions, with segmentation errors directly leading to significant artifacts in the try-on results. For the first issue, our MMTryon introduces a novel multi-modality and multi-reference attention mechanism to combine the garment information from reference images and dressing-style information from text instructions. Besides, to remove the segmentation dependency, MMTryon uses a parsing-free garment encoder and leverages a novel scalable data generation pipeline to convert existing VITON datasets to a form that allows MMTryon to be trained without requiring any explicit segmentation. Extensive experiments on high-resolution benchmarks and in-the-wild test sets demonstrate MMTryon's superiority over existing SOTA methods both qualitatively and quantitatively. Besides, MMTryon's impressive performance on multi-item and style-controllable virtual try-on scenarios and its ability to try on any outfit in a large variety of scenarios from any source image open up a new avenue for future investigation in the fashion community.

[201]  arXiv:2405.00449 [pdf, other]
Title: RAG-based Explainable Prediction of Road Users Behaviors for Automated Driving using Knowledge Graphs and Large Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Prediction of road users' behaviors in the context of autonomous driving has gained considerable attention from the scientific community in recent years. Most works focus on predicting behaviors based on kinematic information alone, a simplification of reality since road users are humans, and as such they are highly influenced by their surrounding context. In addition, a plethora of research works rely on powerful Deep Learning techniques, which exhibit high performance metrics in prediction tasks but may lack the ability to fully understand and exploit the contextual semantic information contained in the road scene, not to mention their inability to provide explainable predictions that can be understood by humans. In this work, we propose an explainable road users' behavior prediction system that integrates the reasoning abilities of Knowledge Graphs (KG) and the expressiveness capabilities of Large Language Models (LLM) by using Retrieval Augmented Generation (RAG) techniques. For that purpose, Knowledge Graph Embeddings (KGE) and Bayesian inference are combined to allow the deployment of a fully inductive reasoning system that enables the issuing of predictions that rely on legacy information contained in the graph as well as on current evidence gathered in real time by onboard sensors. Two use cases have been implemented following the proposed approach: 1) Prediction of pedestrians' crossing actions; 2) Prediction of lane change maneuvers. In both cases, the performance attained surpasses the current state of the art in terms of anticipation and F1-score, showing a promising avenue for future research in this field.

[202]  arXiv:2405.00451 [pdf, other]
Title: Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. To enhance consistency in intermediate steps, we combine outcome validation and stepwise self-evaluation, continually updating the quality assessment of newly generated data. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data. Theoretical analysis reveals the critical importance of using on-policy sampled data for successful self-improvement. Extensive evaluations on various arithmetic and commonsense reasoning tasks demonstrate remarkable performance improvements over existing models. For instance, our approach outperforms the Mistral-7B Supervised Fine-Tuning (SFT) baseline on GSM8K, MATH, and SciQ, with substantial percentage increases in accuracy to $80.7\%$ (+$4.8\%$), $32.2\%$ (+$3.3\%$), and $88.5\%$ (+$7.7\%$), respectively. Additionally, our research delves into the training and inference compute tradeoff, providing insights into how our method effectively maximizes performance gains.
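
For context, the DPO update mentioned above can be written compactly. The sketch below shows the standard DPO loss on (preferred, rejected) sequence log-probabilities; it is not the paper's full MCTS-driven pipeline, and all numbers are dummy values.

import torch
import torch.nn.functional as F

def dpo_loss(logp_pref_policy, logp_rej_policy, logp_pref_ref, logp_rej_ref, beta=0.1):
    """Direct Preference Optimization loss on (preferred, rejected) pairs.
    Each argument holds the summed log-probability of a response under the
    current policy or the frozen reference model."""
    margin = (logp_pref_policy - logp_pref_ref) - (logp_rej_policy - logp_rej_ref)
    return -F.logsigmoid(beta * margin).mean()

# Dummy batch of two preference pairs.
loss = dpo_loss(torch.tensor([-5.0, -6.0]), torch.tensor([-7.0, -6.5]),
                torch.tensor([-5.5, -6.2]), torch.tensor([-6.8, -6.4]))
print(loss)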

[203]  arXiv:2405.00452 [pdf, other]
Title: Predictive Accuracy-Based Active Learning for Medical Image Segmentation
Comments: 9 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Active learning is considered a viable solution to alleviate the contradiction between the high dependency of deep learning-based segmentation methods on annotated data and the expensive pixel-level annotation cost of medical images. However, most existing methods suffer from unreliable uncertainty assessment and the struggle to balance diversity and informativeness, leading to poor performance in segmentation tasks. In response, we propose an efficient Predictive Accuracy-based Active Learning (PAAL) method for medical image segmentation, first introducing predictive accuracy to define uncertainty. Specifically, PAAL mainly consists of an Accuracy Predictor (AP) and a Weighted Polling Strategy (WPS). The former is an attached learnable module that can accurately predict the segmentation accuracy of unlabeled samples relative to the target model with the predicted posterior probability. The latter provides an efficient hybrid querying scheme by combining predicted accuracy and feature representation, aiming to ensure the uncertainty and diversity of the acquired samples. Extensive experiment results on multiple datasets demonstrate the superiority of PAAL. PAAL achieves comparable accuracy to fully annotated data while reducing annotation costs by approximately 50% to 80%, showcasing significant potential in clinical applications. The code is available at https://github.com/shijun18/PAAL-MedSeg.
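
One simple way such a hybrid accuracy-plus-diversity query could look (not necessarily the paper's exact Weighted Polling Strategy) is to cluster the unlabeled pool for diversity and then pick the sample with the lowest predicted accuracy in each cluster; the clustering choice and the names below are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

def hybrid_query(features: np.ndarray, predicted_acc: np.ndarray, budget: int, seed: int = 0):
    """Select `budget` unlabeled samples: cluster for diversity, then inside each
    cluster choose the sample with the lowest predicted segmentation accuracy."""
    clusters = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit_predict(features)
    picked = []
    for c in range(budget):
        idx = np.where(clusters == c)[0]
        picked.append(idx[np.argmin(predicted_acc[idx])])
    return np.array(picked)

# Synthetic pool of 200 feature vectors with predicted accuracies in [0.5, 1.0].
rng = np.random.default_rng(0)
feats, acc = rng.normal(size=(200, 16)), rng.uniform(0.5, 1.0, size=200)
print(hybrid_query(feats, acc, budget=5))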

[204]  arXiv:2405.00453 [pdf, other]
Title: Fuzzy Intelligent System for Student Software Project Evaluation
Comments: Submitted to IJMECS for consideration
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Developing software projects allows students to put knowledge into practice and gain teamwork skills. However, assessing student performance in project-oriented courses poses significant challenges, particularly as the size of classes increases. The current paper introduces a fuzzy intelligent system designed to evaluate academic software projects, using an object-oriented programming and design course as an example. To establish evaluation criteria, we first conducted a survey of student project teams (n=31) and faculty (n=3) to identify key parameters and their applicable ranges. The selected criteria - clean code, use of inheritance, and functionality - were deemed essential for assessing the quality of academic software projects. These criteria were then represented as fuzzy variables with corresponding fuzzy sets. Collaborating with three experts, including one professor and two course instructors, we defined a set of fuzzy rules for a fuzzy inference system. This system processes the input criteria to produce a quantifiable measure of project success. The system demonstrated promising results in automating the evaluation of projects. Our approach standardizes project evaluations and helps to reduce the subjective bias in manual grading.
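
To make the mechanism concrete, a toy Mamdani-style inference over two of the named criteria might look as follows; the membership breakpoints, rules, and consequent values are invented for illustration and are not the experts' actual rule base.

import numpy as np

def falling(x, a, b):
    """Membership that is 1 below a, 0 above b, linear in between."""
    return float(np.clip((b - x) / (b - a), 0.0, 1.0))

def rising(x, a, b):
    """Membership that is 0 below a, 1 above b, linear in between."""
    return float(np.clip((x - a) / (b - a), 0.0, 1.0))

def project_score(clean_code, functionality):
    """Two-rule Mamdani-style scoring on 0-10 inputs, defuzzified by a weighted
    average of the rule consequents (all breakpoints are made-up examples)."""
    low_cc, high_cc = falling(clean_code, 2, 6), rising(clean_code, 4, 8)
    low_fn, high_fn = falling(functionality, 2, 6), rising(functionality, 4, 8)
    w_good = min(high_cc, high_fn)   # Rule 1: both criteria high -> consequent 90
    w_poor = max(low_cc, low_fn)     # Rule 2: either criterion low -> consequent 40
    return (w_good * 90 + w_poor * 40) / (w_good + w_poor + 1e-9)

print(round(project_score(clean_code=8.0, functionality=7.0), 1))  # -> 90.0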

[205]  arXiv:2405.00454 [pdf, ps, other]
Title: Robust Semi-supervised Learning via $f$-Divergence and $α$-Rényi Divergence
Comments: Accepted in ISIT 2024
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

This paper investigates a range of empirical risk functions and regularization methods suitable for self-training methods in semi-supervised learning. These approaches draw inspiration from various divergence measures, such as $f$-divergences and $\alpha$-R\'enyi divergences. Building on these theoretical foundations, we also provide insights that enhance the understanding of our empirical risk functions and regularization techniques. In pseudo-labeling and entropy minimization, two self-training techniques for semi-supervised learning, the self-training process inherently suffers from a mismatch between true labels and pseudo-labels (noisy pseudo-labels); some of our empirical risk functions are robust to such noisy pseudo-labels. Under some conditions, our empirical risk functions demonstrate better performance when compared to traditional self-training methods.
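
As a concrete, purely illustrative contrast, the usual cross-entropy pseudo-label risk and a reverse-KL variant (one member of the divergence families mentioned above) can be written as below; these are generic examples and not necessarily the paper's exact risk functions.

import torch
import torch.nn.functional as F

def forward_kl_risk(logits: torch.Tensor, pseudo: torch.Tensor) -> torch.Tensor:
    """Usual pseudo-label risk: cross-entropy between hard pseudo-labels and predictions."""
    return F.cross_entropy(logits, pseudo)

def reverse_kl_risk(logits: torch.Tensor, pseudo: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Reverse-KL style risk KL(model || smoothed pseudo-label distribution),
    a different member of the divergence family for penalizing disagreement."""
    p = F.softmax(logits, dim=-1)
    q = F.one_hot(pseudo, logits.shape[-1]).float().clamp(min=eps)
    q = q / q.sum(dim=-1, keepdim=True)
    return (p * (p.clamp(min=eps).log() - q.log())).sum(dim=-1).mean()

logits = torch.randn(8, 5)
pseudo = torch.randint(0, 5, (8,))
print(forward_kl_risk(logits, pseudo).item(), reverse_kl_risk(logits, pseudo).item())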

[206]  arXiv:2405.00456 [pdf, other]
Title: Counterfactual Explanations for Deep Learning-Based Traffic Forecasting
Comments: 24 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep learning models are widely used in traffic forecasting and have achieved state-of-the-art prediction accuracy. However, the black-box nature of those models makes the results difficult for users to interpret. This study aims to leverage an Explainable AI approach, counterfactual explanations, to enhance the explainability and usability of deep learning-based traffic forecasting models. Specifically, the goal is to elucidate relationships between various input contextual features and their corresponding predictions. We present a comprehensive framework that generates counterfactual explanations for traffic forecasting and provides usable insights through the proposed scenario-driven counterfactual explanations. The study first implements a deep learning model to predict traffic speed based on historical traffic data and contextual variables. Counterfactual explanations are then used to illuminate how alterations in these input variables affect predicted outcomes, thereby enhancing the transparency of the deep learning model. We investigated the impact of contextual features on traffic speed prediction under varying spatial and temporal conditions. The scenario-driven counterfactual explanations integrate two types of user-defined constraints, directional and weighting constraints, to tailor the search for counterfactual explanations to specific use cases. These tailored explanations benefit machine learning practitioners who aim to understand the model's learning mechanisms and domain experts who seek insights for real-world applications. The results showcase the effectiveness of counterfactual explanations in revealing traffic patterns learned by deep learning models, showing their potential for interpreting black-box deep learning models used for spatiotemporal predictions in general.

[207]  arXiv:2405.00459 [pdf, other]
Title: U.S. Election Hardens Hate Universe
Subjects: Social and Information Networks (cs.SI); Human-Computer Interaction (cs.HC); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph)

Local or national politics can trigger potentially dangerous hate in someone. But with a third of the world's population eligible to vote in elections in 2024 alone, we lack understanding of how individual-level hate multiplies up to hate behavior at the collective global scale. Here we show, based on the most recent U.S. election, that offline events are associated with a rapid adaptation of the global online hate universe that hardens (strengthens) both its network-of-networks structure and the 'flavors' of hate content that it collectively produces. Approximately 50 million potential voters in hate communities are drawn closer to each other and to the broad mainstream of approximately 2 billion others. It triggers new hate content at scale around immigration, ethnicity, and antisemitism that aligns with conspiracy theories about Jewish-led replacement before blending in hate around gender identity/sexual orientation, and religion. Telegram acts as a key hardening agent - yet is overlooked by U.S. Congressional hearings and new E.U. legislation. Because the hate universe has remained robust since 2020, anti-hate messaging surrounding not only upcoming elections but also other events like the war in Gaza, should pivot to blending multiple hate 'flavors' while targeting previously untouched social media structures.

[208]  arXiv:2405.00461 [pdf, other]
Title: Enhancing Surgical Robots with Embodied Intelligence for Autonomous Ultrasound Scanning
Comments: ICRA 2024 Full-day Workshop: C4SR+: Continuum, Compliant, Cooperative, Cognitive
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

Ultrasound robots are increasingly used in medical diagnostics and early disease screening. However, current ultrasound robots lack the intelligence to understand human intentions and instructions, hindering autonomous ultrasound scanning. To solve this problem, we propose a novel Ultrasound Embodied Intelligence system that equips ultrasound robots with a large language model (LLM) and domain knowledge, thereby improving the efficiency of ultrasound robots. Specifically, we first design an ultrasound operation knowledge database to add expertise in ultrasound scanning to the LLM, enabling the LLM to perform precise motion planning. Furthermore, we devise a dynamic ultrasound scanning strategy based on \textit{think-observe-execute} prompt engineering, allowing LLMs to dynamically adjust motion planning strategies during the scanning procedures. Extensive experiments demonstrate that our system significantly improves ultrasound scan efficiency and quality from verbal commands. This advancement in autonomous medical scanning technology contributes to non-invasive diagnostics and streamlined medical workflows.

[209]  arXiv:2405.00465 [pdf, other]
Title: BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine
Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains; however, these models encounter issues such as generating inaccurate information or hallucinations. Retrieval-augmented generation provides a solution for these models to update knowledge and enhance their performance. In contrast to previous retrieval-augmented LMs, which utilize specialized cross-attention mechanisms to help the LLM encode retrieved text, BiomedRAG adopts a simpler approach by directly inputting the retrieved chunk-based documents into the LLM. This straightforward design is easily applicable to existing retrieval and language models, effectively bypassing noise information in retrieved documents, particularly in noise-intensive tasks. Moreover, we demonstrate the potential for utilizing the LLM to supervise the retrieval model in the biomedical domain, enabling it to retrieve the document that assists the LM in improving its predictions. Our experiments reveal that with the tuned scorer, \textsc{BiomedRAG} attains superior performance across 5 biomedical NLP tasks, encompassing information extraction (triple extraction, relation extraction), text classification, link prediction, and question-answering, leveraging over 9 datasets. For instance, in the triple extraction task, \textsc{BiomedRAG} outperforms other triple extraction systems with micro-F1 scores of 81.42 and 88.83 on GIT and ChemProt corpora, respectively.

[210]  arXiv:2405.00466 [pdf, other]
Title: Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

Foundational generative models should be traceable to protect their owners and facilitate safety regulation. To achieve this, traditional approaches embed identifiers based on supervisory trigger-response signals, which are commonly known as backdoor watermarks. They are prone to failure when the model is fine-tuned with nontrigger data. Our experiments show that this vulnerability is due to energetic changes in only a few 'busy' layers during fine-tuning. This yields a novel arbitrary-in-arbitrary-out (AIAO) strategy that makes watermarks resilient to fine-tuning-based removal. The trigger-response pairs of AIAO samples across various neural network depths can be used to construct watermarked subpaths, employing Monte Carlo sampling to achieve stable verification results. In addition, unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths, where a mask-controlled trigger function is proposed to preserve the generation performance and ensure the invisibility of the embedded backdoor. Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO; while the verification rates of other trigger-based methods fall from ~90% to ~70% after fine-tuning, those of our method remain consistently above 90%.

[211]  arXiv:2405.00467 [pdf, other]
Title: Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing
Comments: Accepted to Workshop on Insights from Negative Results in NLP 2024 (co-located with NAACL 2024)
Subjects: Computation and Language (cs.CL)

With the rapid development of LLMs, it is natural to ask how to harness their capabilities efficiently. In this paper, we explore whether it is feasible to direct each input query to a single most suitable LLM. To this end, we propose LLM routing for challenging reasoning tasks. Our extensive experiments suggest that such routing shows promise but is not feasible in all scenarios, so more robust approaches should be investigated to fill this gap.

[212]  arXiv:2405.00468 [pdf, other]
Title: Feature-Aware Noise Contrastive Learning For Unsupervised Red Panda Re-Identification
Comments: 7 pages, 5 figures, IJCNN2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

To facilitate the re-identification (Re-ID) of individual animals, existing methods primarily focus on maximizing feature similarity within the same individual and enhancing distinctiveness between different individuals. However, most of them still rely on supervised learning and require substantial labeled data, which is challenging to obtain. To avoid this issue, we propose a Feature-Aware Noise Contrastive Learning (FANCL) method to explore an unsupervised learning solution, which is then validated on the task of red panda re-ID. FANCL employs a Feature-Aware Noise Addition module to produce noised images that conceal critical features and designs two contrastive learning modules to calculate the losses. Firstly, a feature consistency module is designed to bridge the gap between the original and noised features. Secondly, the neural networks are trained through a cluster contrastive learning module. Through these more challenging learning tasks, FANCL can adaptively extract deeper representations of red pandas. The experimental results on a set of red panda images collected in both indoor and outdoor environments prove that FANCL outperforms several related state-of-the-art unsupervised methods, achieving high performance comparable to supervised learning methods.

[213]  arXiv:2405.00469 [pdf, other]
Title: Exploiting Positional Bias for Query-Agnostic Generative Content in Search
Comments: 8 pages, 4 main figures, 7 appendix pages, 2 appendix figures
Subjects: Information Retrieval (cs.IR)

In recent years, neural ranking models (NRMs) have been shown to substantially outperform their lexical counterparts in text retrieval. In traditional search pipelines, a combination of features leads to well-defined behaviour. However, as neural approaches become increasingly prevalent as the final scoring component of engines or as standalone systems, their robustness to malicious text and, more generally, semantic perturbation needs to be better understood. We posit that the transformer attention mechanism can induce exploitable defects through positional bias in search models, leading to an attack that could generalise beyond a single query or topic. We demonstrate such defects by showing that non-relevant text--such as promotional content--can be easily injected into a document without adversely affecting its position in search results. Unlike previous gradient-based attacks, we demonstrate these biases in a query-agnostic fashion. In doing so, without the knowledge of topicality, we can still reduce the negative effects of non-relevant content injection by controlling injection position. Our experiments are conducted with simulated on-topic promotional text automatically generated by prompting LLMs with topical context from target documents. We find that contextualisation of a non-relevant text further reduces negative effects whilst likely circumventing existing content filtering mechanisms. In contrast, lexical models are found to be more resilient to such content injection attacks. We then investigate a simple yet effective compensation for the weaknesses of the NRMs in search, validating our hypotheses regarding transformer bias.

[214]  arXiv:2405.00474 [pdf, other]
Title: On Convergence of Discrete Schemes for Computing the Rate-Distortion Function of Continuous Source
Subjects: Information Theory (cs.IT)

Computing the rate-distortion function for continuous sources is commonly regarded as a standard continuous optimization problem. When numerically addressing this problem, a typical approach involves discretizing the source space and subsequently solving the associated discrete problem. However, existing literature has predominantly concentrated on the convergence analysis of solving discrete problems, usually neglecting the convergence relationship between the original continuous optimization and its associated discrete counterpart. This neglect is not rigorous, since the solution of a discrete problem does not necessarily imply convergence to the solution of the original continuous problem, especially for non-linear problems. To address this gap, our study employs rigorous mathematical analysis, which constructs a series of finite-dimensional spaces approximating the infinite-dimensional space of the probability measure, establishing that solutions from discrete schemes converge to those from the continuous problems.
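
For orientation, the discrete problems referred to above are usually solved with Blahut-Arimoto iterations on a discretized source. The sketch below uses an illustrative Gaussian grid and squared-error distortion; it demonstrates the standard discrete scheme only, not the paper's convergence construction.

import numpy as np

def blahut_arimoto(px, dist, beta, iters=500):
    """Blahut-Arimoto iterations for the rate-distortion trade-off at Lagrange
    multiplier beta, given a source pmf `px` and a distortion matrix `dist`.
    Returns (rate, distortion) with the rate in nats."""
    qy = np.full(dist.shape[1], 1.0 / dist.shape[1])
    for _ in range(iters):
        w = qy * np.exp(-beta * dist)            # unnormalized conditional q(y|x)
        w /= w.sum(axis=1, keepdims=True)
        qy = px @ w                              # update the output marginal
    d = float(np.sum(px[:, None] * w * dist))
    r = float(np.sum(px[:, None] * w * np.log(w / qy[None, :] + 1e-30)))
    return r, d

# Discretized standard Gaussian source on a finite grid (illustrative).
x = np.linspace(-4, 4, 200)
px = np.exp(-x**2 / 2); px /= px.sum()
dist = (x[:, None] - x[None, :])**2             # reproduction grid = source grid
print(blahut_arimoto(px, dist, beta=2.0))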

[215]  arXiv:2405.00476 [pdf, other]
Title: A Comprehensive Survey of Dynamic Graph Neural Networks: Models, Frameworks, Benchmarks, Experiments and Challenges
Comments: Under review of PVLDB2025
Subjects: Machine Learning (cs.LG)

Dynamic Graph Neural Networks (GNNs) combine temporal information with GNNs to capture structural, temporal, and contextual relationships in dynamic graphs simultaneously, leading to enhanced performance in various applications. As the demand for dynamic GNNs continues to grow, numerous models and frameworks have emerged to cater to different application needs. There is a pressing need for a comprehensive survey that evaluates the performance, strengths, and limitations of various approaches in this domain. This paper aims to fill this gap by offering a thorough comparative analysis and experimental evaluation of dynamic GNNs. It covers 81 dynamic GNN models with a novel taxonomy, 12 dynamic GNN training frameworks, and commonly used benchmarks. We also report experimental results from testing nine representative dynamic GNN models and three frameworks on six standard graph datasets. Evaluation metrics focus on convergence accuracy, training efficiency, and GPU memory usage, enabling a thorough comparison of performance across various models and frameworks. From the analysis and evaluation results, we identify key challenges and offer principles for future research to enhance the design of models and frameworks in the dynamic GNNs field.

[216]  arXiv:2405.00479 [pdf, other]
Title: Enhanced Visual Question Answering: A Comparative Analysis and Textual Feature Extraction Via Convolutions
Authors: Zhilin Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual Question Answering (VQA) has emerged as a highly engaging field in recent years, attracting increasing research efforts aiming to enhance VQA accuracy through the deployment of advanced models such as Transformers. Despite this growing interest, there has been limited exploration into the comparative analysis and impact of textual modalities within VQA, particularly in terms of model complexity and its effect on performance. In this work, we conduct a comprehensive comparison between complex textual models that leverage long dependency mechanisms and simpler models focusing on local textual features within a well-established VQA framework. Our findings reveal that employing complex textual encoders is not invariably the optimal approach for the VQA-v2 dataset. Motivated by this insight, we introduce an improved model, ConvGRU, which incorporates convolutional layers to enhance the representation of question text. Tested on the VQA-v2 dataset, ConvGRU achieves better performance without substantially increasing parameter complexity.

[217]  arXiv:2405.00480 [pdf, ps, other]
Title: Better Bounded Bisimulation Contractions (Preprint)
Subjects: Logic in Computer Science (cs.LO)

Bisimulations are standard in modal logic and, more generally, in the theory of state-transition systems. The quotient structure of a Kripke model with respect to the bisimulation relation is called a bisimulation contraction. The bisimulation contraction is a minimal model bisimilar to the original model, and hence, for (image-)finite models, a minimal model modally equivalent to the original. Similar definitions exist for bounded bisimulations ($k$-bisimulations) and bounded bisimulation contractions. Two finite models are $k$-bisimilar if and only if they are modally equivalent up to modal depth $k$. However, the quotient structure with respect to the $k$-bisimulation relation does not guarantee a minimal model preserving modal equivalence to depth $k$. In this paper, we remedy this asymmetry to standard bisimulations and provide a novel definition of bounded contractions called rooted $k$-contractions. We prove that rooted $k$-contractions preserve $k$-bisimilarity and are minimal with this property. Finally, we show that rooted $k$-contractions can be exponentially more succinct than standard $k$-contractions.

[218]  arXiv:2405.00482 [pdf, other]
Title: PackVFL: Efficient HE Packing for Vertical Federated Learning
Comments: 12 pages excluding references
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

As an essential tool of secure distributed machine learning, vertical federated learning (VFL) based on homomorphic encryption (HE) suffers from severe efficiency problems due to data inflation and time-consuming operations. To this end, we propose PackVFL, an efficient VFL framework based on packed HE (PackedHE), to accelerate the existing HE-based VFL algorithms. PackVFL packs multiple cleartexts into one ciphertext and supports single-instruction-multiple-data (SIMD)-style parallelism. We focus on designing a high-performant matrix multiplication (MatMult) method since it takes up most of the ciphertext computation time in HE-based VFL. Besides, devising the MatMult method is also challenging for PackedHE because a slight difference in the packing way could predominantly affect its computation and communication costs. Without domain-specific design, directly applying SOTA MatMult methods can hardly achieve optimal performance.
Therefore, we make a three-fold design: 1) we systematically explore the current design space of MatMult and quantify the complexity of existing approaches to provide guidance; 2) we propose a hybrid MatMult method according to the unique characteristics of VFL; 3) we adaptively apply our hybrid method in representative VFL algorithms, leveraging distinctive algorithmic properties to further improve efficiency. As the batch size, feature dimension and model size of VFL scale up to large sizes, PackVFL consistently delivers enhanced performance. Empirically, PackVFL propels existing VFL algorithms to new heights, achieving up to a 51.52X end-to-end speedup. This represents a substantial 34.51X greater speedup compared to the direct application of SOTA MatMult methods.
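
To illustrate why packing pays off for MatMult (without any actual encryption, and without reproducing PackVFL's hybrid method), the classic diagonal-packing matrix-vector product, rotate, multiply slot-wise, accumulate, can be emulated on plaintext numpy arrays standing in for ciphertext slots.

import numpy as np

def packed_matvec(A: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Diagonal-method matrix-vector product: the kind of SIMD slot arithmetic
    packed HE schemes expose (rotation plus slot-wise multiply plus addition),
    emulated here on plaintext arrays for illustration."""
    n = A.shape[0]
    y = np.zeros(n)
    for i in range(n):
        diag = np.array([A[j, (j + i) % n] for j in range(n)])  # i-th generalized diagonal
        y += diag * np.roll(x, -i)                               # slot-wise multiply after rotation
    return y

A = np.arange(16.0).reshape(4, 4)
x = np.array([1.0, 2.0, 3.0, 4.0])
print(packed_matvec(A, x), A @ x)  # both outputs should match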

[219]  arXiv:2405.00483 [pdf, other]
Title: In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

As deep generative models advance, we anticipate deepfakes achieving "perfection"-generating no discernible artifacts or noise. However, current deepfake detectors, intentionally or inadvertently, rely on such artifacts for detection, as they are exclusive to deepfakes and absent in genuine examples. To bridge this gap, we introduce the Rebalanced Deepfake Detection Protocol (RDDP) to stress-test detectors under balanced scenarios where genuine and forged examples bear similar artifacts. We offer two RDDP variants: RDDP-WHITEHAT uses white-hat deepfake algorithms to create 'self-deepfakes,' genuine portrait videos with the resemblance of the underlying identity, yet carry similar artifacts to deepfake videos; RDDP-SURROGATE employs surrogate functions (e.g., Gaussian noise) to process both genuine and forged examples, introducing equivalent noise, thereby sidestepping the need of deepfake algorithms.
Towards detecting perfect deepfake videos that align with genuine ones, we present ID-Miner, a detector that identifies the puppeteer behind the disguise by focusing on motion over artifacts or appearances. As an identity-based detector, it authenticates videos by comparing them with reference footage. Equipped with an artifact-agnostic loss at the frame level and an identity-anchored loss at the video level, ID-Miner effectively singles out identity signals amidst distracting variations. Extensive experiments comparing ID-Miner with 12 baseline detectors under both conventional and RDDP evaluations with two deepfake datasets, along with additional qualitative studies, affirm the superiority of our method and the necessity for detectors designed to counter perfect deepfakes.

[220]  arXiv:2405.00485 [pdf, other]
Title: The Pyramid of Captions
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a formal information-theoretic framework for image captioning by regarding it as a representation learning task. Our framework defines three key objectives: task sufficiency, minimal redundancy, and human interpretability. Building upon this foundation, we propose a novel Pyramid of Captions (PoCa) method, which constructs caption pyramids by generating localized captions for zoomed-in image patches and integrating them with global caption information using large language models. This approach leverages the intuition that the detailed examination of local patches can reduce error risks and address inaccuracies in global captions, either by correcting hallucinations or by adding missing details. Based on our theoretical framework, we formalize this intuition and provide formal proof demonstrating the effectiveness of PoCa under certain assumptions. Empirical tests with various image captioning models and large language models show that PoCa consistently yields more informative and semantically aligned captions, while maintaining brevity and interpretability.

[221]  arXiv:2405.00489 [pdf, other]
Title: Explainable Automatic Grading with Neural Additive Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Applications (stat.AP)

The use of automatic short answer grading (ASAG) models may help alleviate the time burden of grading while encouraging educators to frequently incorporate open-ended items in their curriculum. However, current state-of-the-art ASAG models are large neural networks (NN) often described as "black boxes", providing no explanation for which characteristics of an input are important for the produced output. This inexplicable nature can be frustrating to teachers and students when trying to interpret, or learn from, an automatically generated grade. To create a powerful yet intelligible ASAG model, we experiment with a type of model called a Neural Additive Model (NAM) that combines the performance of a NN with the explainability of an additive model. We use a Knowledge Integration (KI) framework from the learning sciences to guide feature engineering to create inputs that reflect whether a student includes certain ideas in their response. We hypothesize that indicating the inclusion (or exclusion) of predefined ideas as features will be sufficient for the NAM to have good predictive power and interpretability, as this may guide a human scorer using a KI rubric. We compare the performance of the NAM with another explainable model, logistic regression, using the same features, and to a non-explainable neural model, DeBERTa, that does not require feature engineering.

[222]  arXiv:2405.00491 [pdf, ps, other]
Title: On the Relevance of Byzantine Robust Optimization Against Data Poisoning
Comments: 38 pages
Subjects: Machine Learning (cs.LG)

The success of machine learning (ML) has been intimately linked with the availability of large amounts of data, typically collected from heterogeneous sources and processed on vast networks of computing devices (also called {\em workers}). Beyond accuracy, the use of ML in critical domains such as healthcare and autonomous driving calls for robustness against {\em data poisoning} and some {\em faulty workers}. The problem of {\em Byzantine ML} formalizes these robustness issues by considering a distributed ML environment in which workers (storing a portion of the global dataset) can deviate arbitrarily from the prescribed algorithm. Although the problem has attracted a lot of attention from a theoretical point of view, its practical importance for addressing realistic faults (where the behavior of any worker is locally constrained) remains unclear. It has been argued that the seemingly weaker threat model where only workers' local datasets get poisoned is more reasonable. We prove that, while tolerating a wider range of faulty behaviors, Byzantine ML yields solutions that are, in a precise sense, optimal even under the weaker data poisoning threat model. Then, we study a generic data poisoning model wherein some workers have {\em fully-poisonous local data}, i.e., their datasets are entirely corruptible, and the remainder have {\em partially-poisonous local data}, i.e., only a fraction of their local datasets is corruptible. We prove that Byzantine-robust schemes yield optimal solutions against both these forms of data poisoning, and that the former is more harmful when workers have {\em heterogeneous} local data.

[223]  arXiv:2405.00492 [pdf, other]
Title: Is Temperature the Creativity Parameter of Large Language Models?
Comments: To be published in the Proceedings of the 15th International Conference on Computational Creativity (ICCC'24), 8 pages, 2 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) are applied to all sorts of creative tasks, and their outputs vary from beautiful, to peculiar, to pastiche, into plain plagiarism. The temperature parameter of an LLM regulates the amount of randomness, leading to more diverse outputs; therefore, it is often claimed to be the creativity parameter. Here, we investigate this claim using a narrative generation task with a predetermined fixed context, model and prompt. Specifically, we present an empirical analysis of the LLM output for different temperature values using four necessary conditions for creativity in narrative generation: novelty, typicality, cohesion, and coherence. We find that temperature is weakly correlated with novelty, and unsurprisingly, moderately correlated with incoherence, but there is no relationship with either cohesion or typicality. However, the influence of temperature on creativity is far more nuanced and weak than suggested by the "creativity parameter" claim; overall results suggest that the LLM generates slightly more novel outputs as temperatures get higher. Finally, we discuss ideas to allow more controlled LLM creativity, rather than relying on chance via changing the temperature parameter.
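
Mechanically, temperature only rescales the logits before the softmax, flattening the next-token distribution for T > 1 and sharpening it for T < 1, as the toy snippet below shows with made-up logits.

import numpy as np

def temperature_softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Next-token distribution after dividing the logits by the temperature."""
    z = logits / temperature
    z -= z.max()                      # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.2])
for t in (0.2, 1.0, 2.0):
    print(t, np.round(temperature_softmax(logits, t), 3))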

[224]  arXiv:2405.00494 [pdf, other]
Title: GOLD: Geometry Problem Solver with Natural Language Description
Comments: Accepted in NAACL 2024 Findings
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Addressing the challenge of automated geometry math problem-solving in artificial intelligence (AI) involves understanding multi-modal information and mathematics. Current methods struggle with accurately interpreting geometry diagrams, which hinders effective problem-solving. To tackle this issue, we present the Geometry problem sOlver with natural Language Description (GOLD) model. GOLD enhances the extraction of geometric relations by separately processing symbols and geometric primitives within the diagram. Subsequently, it converts the extracted relations into natural language descriptions, efficiently utilizing large language models to solve geometry math problems. Experiments show that the GOLD model outperforms the Geoformer model, the previous best method on the UniGeo dataset, by achieving accuracy improvements of 12.7% and 42.1% in calculation and proving subsets. Additionally, it surpasses the former best model on the PGPS9K and Geometry3K datasets, PGPSNet, by obtaining accuracy enhancements of 1.8% and 3.2%, respectively.

[225]  arXiv:2405.00495 [pdf, other]
Title: The Loewner framework for parametric systems: Taming the curse of dimensionality
Comments: 32 pages, 4 figures
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)

The Loewner framework is an interpolatory framework for the approximation of linear and nonlinear systems. The purpose here is to extend this framework to linear parametric systems with an arbitrary number n of parameters. One main innovation established here is the construction of data-based realizations for any number of parameters. Equally importantly, we show how to alleviate the computational burden, by avoiding the explicit construction of large-scale n-dimensional Loewner matrices of size $N \times N$. This reduces the complexity from $O(N^3)$ to about $O(N^{1.4})$, thus taming the curse of dimensionality and making the solution scalable to very large data sets. To achieve this, a new generalized multivariate rational function realization is defined. Then, we introduce the n-dimensional multivariate Loewner matrices and show that they can be computed by solving a coupled set of Sylvester equations. The null space of these Loewner matrices then allows the construction of the multivariate barycentric transfer function. The principal result of this work is to show how the null space of the n-dimensional Loewner matrix can be computed using a sequence of 1-dimensional Loewner matrices, leading to a drastic computational burden reduction. Finally, we suggest two algorithms (one direct and one iterative) to construct, directly from data, multivariate (or parametric) realizations ensuring (approximate) interpolation. Numerical examples highlight the effectiveness and scalability of the method.
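
For readers new to the framework, the 1-D building block that the paper generalizes to n parameters is the Loewner matrix built from two disjoint sets of samples; a minimal sketch with a synthetic first-order transfer function follows (data and variable names are illustrative).

import numpy as np

def loewner_matrix(mu, v, lam, w):
    """1-D Loewner matrix L[i, j] = (v[i] - w[j]) / (mu[i] - lam[j]) assembled
    from 'left' data (mu, v) and 'right' data (lam, w); the two sample sets
    must be disjoint so no denominator vanishes."""
    return (v[:, None] - w[None, :]) / (mu[:, None] - lam[None, :])

# Samples of a toy rational transfer function H(s) = 1 / (s + 2).
H = lambda s: 1.0 / (s + 2.0)
mu, lam = np.array([1.0, 3.0, 5.0]), np.array([2.0, 4.0, 6.0])
L = loewner_matrix(mu, H(mu), lam, H(lam))
print(np.linalg.matrix_rank(L))  # the rank reveals the order of H, here 1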

[226]  arXiv:2405.00505 [pdf, other]
Title: KVP10k : A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents
Comments: accepted ICDAR2024
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

In recent years, the challenge of extracting information from business documents has emerged as a critical task, finding applications across numerous domains. This effort has attracted substantial interest from both industry and academia, highlighting its significance in the current technological landscape. Most datasets in this area are primarily focused on Key Information Extraction (KIE), where the extraction process revolves around extracting information using a specific, predefined set of keys. Unlike most existing datasets and benchmarks, our focus is on discovering key-value pairs (KVPs) without relying on predefined keys, navigating through an array of diverse templates and complex layouts. This task presents unique challenges, primarily due to the absence of comprehensive datasets and benchmarks tailored for non-predetermined KVP extraction. To address this gap, we introduce KVP10k, a new dataset and benchmark specifically designed for KVP extraction. The dataset contains 10707 richly annotated images. In our benchmark, we also introduce a new challenging task that combines elements of KIE as well as KVP in a single task. KVP10k sets itself apart with its extensive diversity in data and richly detailed annotations, paving the way for advancements in the field of information extraction from complex business documents.

[227]  arXiv:2405.00507 [pdf, other]
Title: NeRF-Guided Unsupervised Learning of RGB-D Registration
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper focuses on training a robust RGB-D registration model without ground-truth pose supervision. Existing methods usually adopt a pairwise training strategy based on differentiable rendering, which enforces the photometric and the geometric consistency between the two registered frames as supervision. However, this frame-to-frame framework suffers from poor multi-view consistency due to factors such as lighting changes, geometry occlusion and reflective materials. In this paper, we present NeRF-UR, a novel frame-to-model optimization framework for unsupervised RGB-D registration. Instead of frame-to-frame consistency, we leverage the neural radiance field (NeRF) as a global model of the scene and use the consistency between the input and the NeRF-rerendered frames for pose optimization. This design can significantly improve the robustness in scenarios with poor multi-view consistency and provides better learning signal for the registration model. Furthermore, to bootstrap the NeRF optimization, we create a synthetic dataset, Sim-RGBD, through a photo-realistic simulator to warm up the registration model. By first training the registration model on Sim-RGBD and later unsupervisedly fine-tuning on real data, our framework enables distilling the capability of feature extraction and registration from simulation to reality. Our method outperforms the state-of-the-art counterparts on two popular indoor RGB-D datasets, ScanNet and 3DMatch. Code and models will be released for paper reproduction.

[228]  arXiv:2405.00514 [pdf, other]
Title: Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image-level regression is an important task in Earth observation, where visual domain and label shifts are a core challenge hampering generalization. However, cross-domain regression with remote sensing data remains understudied due to the absence of suited datasets. We introduce a new dataset with aerial and satellite imagery in five countries with three forest-related regression tasks. To match real-world applicative interests, we compare methods through a restrictive setup where no prior on the target domain is available during training, and models are adapted with limited information during testing. Building on the assumption that ordered relationships generalize better, we propose manifold diffusion for regression as a strong baseline for transduction in low-data regimes. Our comparison highlights the comparative advantages of inductive and transductive methods in cross-domain regression.

[229]  arXiv:2405.00515 [pdf, other]
Title: GAD-Generative Learning for HD Map-Free Autonomous Driving
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic programming or model predictive control. This results in a performance bottleneck for autonomous driving systems in that corner cases simply cannot be solved by enumerating hand-crafted rules. We present a deep-learning-based approach that brings prediction, decision, and planning modules together in an attempt to overcome the rule-based methods' deficiencies in real-world applications of autonomous driving, especially for urban scenes. The DNN model we propose is trained solely with 10 hours of human driver data, and it supports all mass-production ADAS features available on the market to date. This method is deployed onto a Jiyue test car with no modification to its factory-ready sensor set and compute platform. The feasibility, usability, and commercial potential are demonstrated in this article.

[230]  arXiv:2405.00516 [pdf, other]
Title: Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
Comments: ACM 2024, Avila Spain. 9 pages
Journal-ref: ACM SAC Conference 2024, Avila, Spain, Article 4, 9 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Recent advancements in language models have demonstrated remarkable improvements in various natural language processing (NLP) tasks such as web navigation. Supervised learning (SL) approaches have achieved impressive performance while utilizing significantly less training data compared to previous methods. However, these SL-based models fall short when compared to reinforcement learning (RL) approaches, which have shown superior results. In this paper, we propose a novel approach that combines SL and RL techniques over the MiniWoB benchmark to leverage the strengths of both methods. We also address a critical limitation in previous models' understanding of HTML content, revealing a tendency to memorize target elements rather than comprehend the underlying structure. To rectify this, we propose methods to enhance true understanding and present a new baseline of results. Our experiments demonstrate that our approach outperforms previous SL methods on certain tasks using less data and narrows the performance gap with RL models, achieving 43.58\% average accuracy in SL and 36.69\% when combined with a multimodal RL approach. This study sets a new direction for future web navigation and offers insights into the limitations and potential of language modeling for computer tasks.

[231]  arXiv:2405.00519 [pdf, ps, other]
Title: Design Implications for a Social and Collaborative Understanding of online Information Assessment Practices, Challenges and Heuristics
Comments: To be published in Proceedings of ECSCW 2024, Rimini, Italy
Subjects: Human-Computer Interaction (cs.HC)

The broader adoption of social media platforms (e.g., TikTok), combined with recent developments in Generative AI (GAI) technologies has had a transformative effect on many peoples' ability to confidently assess the veracity and meaning of information online. In this paper, building on recent related work that surfaced the social ways that young people evaluate information online, we explore the decision-making practices, challenges and heuristics involved in young adults' assessments of information online. To do so, we designed and conducted a novel digital diary study, followed by data-informed interviews with young adults. Our findings uncover the information practices of young adults including the social and emotional motivations for ignoring, avoiding, and engaging with online information and the ways this is entangled with collaborative arrangements with algorithms as agents. In our discussion we bring these findings in close dialogue with work on information sensibility and contribute rich insights into young peoples' information sensibility practices embedded within social worlds. Finally, we surface how such practices are attuned to prioritise wellbeing over convenience or other commonly associated sufficing heuristics.

[232]  arXiv:2405.00523 [pdf, other]
Title: CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions
Comments: LREC-COLING 2024 Accepted
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This paper introduces CookingSense, a descriptive collection of knowledge assertions in the culinary domain extracted from various sources, including web data, scientific papers, and recipes, from which knowledge covering a broad range of aspects is acquired. CookingSense is constructed through a series of dictionary-based filtering and language model-based semantic filtering techniques, which results in a rich knowledgebase of multidisciplinary food-related assertions. Additionally, we present FoodBench, a novel benchmark to evaluate culinary decision support systems. From evaluations with FoodBench, we empirically prove that CookingSense improves the performance of retrieval augmented language models. We also validate the quality and variety of assertions in CookingSense through qualitative analysis.

[233]  arXiv:2405.00524 [pdf, ps, other]
Title: FMLFS: A federated multi-label feature selection based on information theory in IoT environment
Comments: This paper has been accepted by IEEE SmartComp 2024
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

In certain emerging applications such as health monitoring wearable and traffic monitoring systems, Internet-of-Things (IoT) devices generate or collect a huge amount of multi-label datasets. Within these datasets, each instance is linked to a set of labels. The presence of noisy, redundant, or irrelevant features in these datasets, along with the curse of dimensionality, poses challenges for multi-label classifiers. Feature selection (FS) proves to be an effective strategy in enhancing classifier performance and addressing these challenges. Yet, there is currently no existing distributed multi-label FS method documented in the literature that is suitable for distributed multi-label datasets within IoT environments. This paper introduces FMLFS, the first federated multi-label feature selection method. Here, mutual information between features and labels serves as the relevancy metric, while the correlation distance between features, derived from mutual information and joint entropy, is utilized as the redundancy measure. Following aggregation of these metrics on the edge server and employing Pareto-based bi-objective and crowding distance strategies, the sorted features are subsequently sent back to the IoT devices. The proposed method is evaluated through two scenarios: 1) transmitting reduced-size datasets to the edge server for centralized classifier usage, and 2) employing federated learning with reduced-size datasets. Evaluation across three metrics - performance, time complexity, and communication cost - demonstrates that FMLFS outperforms five other comparable methods in the literature and provides a good trade-off on three real-world datasets.
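
A rough sketch of the two information-theoretic quantities named above, mutual information as the relevancy metric and a distance built from mutual information and joint entropy as the redundancy measure, is given below for discrete features; the normalization 1 - I(X;Y)/H(X,Y) is one common choice and may differ from the paper's exact definition.

import numpy as np

def joint_probs(x, y):
    """Joint probability table of two discrete sequences."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    counts = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(counts, (xi, yi), 1)
    return counts / counts.sum()

def mutual_information(x, y):
    """I(X;Y) in nats, the relevancy between a feature and a label."""
    pxy = joint_probs(x, y)
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def information_distance(x, y):
    """Redundancy measure 1 - I(X;Y)/H(X,Y) built from MI and joint entropy."""
    pxy = joint_probs(x, y)
    nz = pxy > 0
    h = float(-(pxy[nz] * np.log(pxy[nz])).sum())
    return 1.0 - mutual_information(x, y) / h if h > 0 else 0.0

rng = np.random.default_rng(0)
f1 = rng.integers(0, 3, 500)
label = (f1 + rng.integers(0, 2, 500)) % 3        # label correlated with f1
print(round(mutual_information(f1, label), 3), round(information_distance(f1, f1), 3))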

[234]  arXiv:2405.00526 [pdf, other]
Title: JNI Global References Are Still Vulnerable: Attacks and Defenses
Subjects: Cryptography and Security (cs.CR)

System services and resources in Android are accessed through IPC-based mechanisms. Previous research has demonstrated that they are vulnerable to denial-of-service (DoS) attacks. For instance, the JNI global reference (JGR), which is widely used by system services, can be exhausted to cause a system reboot (hence the name JGRE attack). Even though the Android team has tried to fix the problem by enforcing security checks, we find that it is still possible to construct a JGR-exhaustion DoS attack in the latest Android system.
In this paper, we propose a new JGR-exhaustion DoS attack, which is effective across different Android versions, including the latest one (i.e., Android 10). Specifically, we develop JGREAnalyzer, a tool that can systematically detect JGR-vulnerable service APIs via call graph analysis and forward reachability analysis. We applied this tool to different Android versions and found multiple vulnerabilities. In particular, among 148 system services in Android 10, 12 have 21 vulnerabilities. Among these, 9 can be successfully exploited without any permissions. We further analyze the root cause of the vulnerabilities and propose a new defense that mitigates the JGRE attack by restricting resource consumption via global reference counting.

[235]  arXiv:2405.00527 [pdf, other]
Title: ChatBI: Towards Natural Language to Complex Business Intelligence SQL
Subjects: Databases (cs.DB)

Natural Language to SQL (NL2SQL) technology gives non-expert users who are unfamiliar with databases the ability to use SQL for data analysis. Converting Natural Language to Business Intelligence (NL2BI) is a popular practical scenario for NL2SQL in production systems. Compared to NL2SQL, NL2BI introduces more challenges.
In this paper, we propose ChatBI, a comprehensive and efficient technology for solving the NL2BI task. First, we analyze the interaction mode, an important module in which NL2SQL and NL2BI differ in use, and design a smaller and cheaper model to match this interaction mode. In BI scenarios, tables contain a huge number of columns, making it impossible for existing NL2SQL methods that rely on Large Language Models (LLMs) for schema linking to proceed, due to token limitations. The higher proportion of ambiguous columns in BI scenarios also makes schema linking difficult. ChatBI combines existing view technology from the database community to first decompose the schema linking problem into a Single View Selection problem, and then uses a smaller and cheaper machine learning model to select the single view with a significantly reduced number of columns. The columns of this single view are then passed as the required columns for schema linking into the LLM. Finally, ChatBI adopts a phased process flow, different from existing process flows, which allows it to generate SQL containing complex semantics and comparison relations more accurately.
We have deployed ChatBI on Baidu's data platform and integrated it into multiple product lines for large-scale production task evaluation. The obtained results highlight its superiority in practicality, versatility, and efficiency. Compared with current mainstream NL2SQL technology on our real BI-scenario data tables and queries, it also achieved the best results.

[236]  arXiv:2405.00529 [pdf, other]
Title: High-Order Block Toeplitz Inner-Bordering method for solving the Gelfand-Levitan-Marchenko equation
Subjects: Numerical Analysis (math.NA)

We propose a high-precision algorithm for solving the Gelfand-Levitan-Marchenko equation. The algorithm is based on the block version of the Toeplitz Inner-Bordering algorithm of Levinson's type. To approximate integrals, we use high-precision one-sided and two-sided Gregory quadrature formulas. We also use the Woodbury formula to construct the computational algorithm, which makes it possible to exploit the almost-Toeplitz structure of the matrices for fast computations.
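
For orientation, one classical scalar form of the Gelfand-Levitan-Marchenko (Marchenko) integral equation for the kernel K, given scattering data encoded in F, is shown below; the block or coupled form actually treated in the paper may differ, so this is purely illustrative.

```latex
% Classical Marchenko form (illustrative; the paper's block/coupled version may differ):
K(x,y) + F(x+y) + \int_{x}^{\infty} K(x,s)\, F(s+y)\, \mathrm{d}s = 0, \qquad y > x .
```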

[237]  arXiv:2405.00531 [pdf, other]
Title: Byzantine-Secure Relying Party for Resilient RPKI
Subjects: Cryptography and Security (cs.CR)

To protect against prefix hijacks, the Resource Public Key Infrastructure (RPKI) has been standardized. To enjoy the security guarantees of RPKI validation, networks need to install a new component, the relying party validator, which fetches and validates RPKI objects and provides them to border routers. However, recent work shows that relying parties experience failures when retrieving RPKI objects and are vulnerable to attacks, all of which can disable RPKI validation. Therefore, even the few adopters are not necessarily secure.
We make the first proposal that significantly improves the resilience and security of RPKI. We develop BRP, a Byzantine-Secure relying party implementation. In BRP the relying party nodes redundantly validate RPKI objects and reach a global consensus through voting. BRP provides an RPKI equivalent of public DNS, removing the need for networks to install, operate, and upgrade their own relying party instances while avoiding the need to trust operators of BRP nodes.
We show through simulations and experiments that BRP, as an intermediate RPKI service, results in less load on RPKI publication points and a robust output despite RPKI repository failures, jitter, and attacks. We engineer BRP to be fully backward compatible and readily deployable - it does not require any changes to the border routers and the RPKI repositories.
We demonstrate that BRP can protect many networks transparently, with either a decentralized or centralized deployment. BRP can be set up as a network of decentralized volunteer deployments, similar to NTP and Tor, where different operators participate in the peering process with their nodes and provide resilient and secure relying party validation to the Internet. BRP can also be hosted by a single operator as a centralized service, e.g., on one cloud or CDN, and provides RPKI validation benefits even when hosted on a single network.

[238]  arXiv:2405.00532 [pdf, other]
Title: ULLER: A Unified Language for Learning and Reasoning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The field of neuro-symbolic artificial intelligence (NeSy), which combines learning and reasoning, has recently experienced significant growth. There is now a wide variety of NeSy frameworks, each with its own specific language for expressing background knowledge and relating it to neural networks. This heterogeneity hinders accessibility for newcomers and makes comparing different NeSy frameworks challenging. We propose a unified language for NeSy, which we call ULLER, a Unified Language for LEarning and Reasoning. ULLER encompasses a wide variety of settings, while ensuring that knowledge described in it can be used in existing NeSy systems. ULLER has a neuro-symbolic first-order syntax for which we provide example semantics, including classical, fuzzy, and probabilistic logics. We believe ULLER is a first step towards making NeSy research more accessible and comparable, paving the way for libraries that streamline training and evaluation across a multitude of semantics, knowledge bases, and NeSy systems.

[239]  arXiv:2405.00536 [pdf, other]
Title: A Legal Framework for Natural Language Processing Model Training in Portugal
Comments: LEGAL2024 Legal and Ethical Issues in Human Language Technologies, LREC 2024
Subjects: Computation and Language (cs.CL); Emerging Technologies (cs.ET)

Recent advances in deep learning have promoted the advent of many computational systems capable of performing intelligent actions that, until recently, were restricted to the human intellect. In the particular case of human languages, these advances have allowed the introduction of applications like ChatGPT that are capable of generating coherent text without being explicitly programmed to do so. Instead, these models use large volumes of textual data to learn meaningful representations of human languages. Associated with these advances, concerns about copyright and data privacy infringements caused by these applications have emerged. Despite these concerns, the pace at which new natural language processing applications are developed has largely outpaced the introduction of new regulations. Today, communication barriers between legal experts and computer scientists lead to many unintentional legal infringements during the development of such applications. In this paper, a multidisciplinary team aims to bridge this communication gap and promote more compliant Portuguese NLP research by presenting a series of everyday NLP use cases, while highlighting the Portuguese legislation that may apply during their development.

[240]  arXiv:2405.00539 [pdf, other]
Title: Data-driven approximation of Koopman operators and generators: Convergence rates and error bounds
Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)

Global information about dynamical systems can be extracted by analysing associated infinite-dimensional transfer operators, such as Perron-Frobenius and Koopman operators as well as their infinitesimal generators. In practice, these operators typically need to be approximated from data. Popular approximation methods are extended dynamic mode decomposition (EDMD) and generator extended mode decomposition (gEDMD). We propose a unified framework that leverages Monte Carlo sampling to approximate the operator of interest on a finite-dimensional space spanned by a set of basis functions. Our framework contains EDMD and gEDMD as special cases, but can also be used to approximate more general operators. Our key contributions are proofs of the convergence of the approximating operator and its spectrum under non-restrictive conditions. Moreover, we derive explicit convergence rates and account for the presence of noise in the observations. Whilst all these results are broadly applicable, they also refine previous analyses of EDMD and gEDMD. We verify the analytical results with the aid of several numerical experiments.
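
As a concrete reference point for the EDMD special case of this framework, here is a minimal NumPy sketch that estimates the Koopman matrix on a monomial dictionary from snapshot pairs of the logistic map; the dictionary, the map, and the sample size are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def edmd(X, Y, basis):
    """Extended DMD: least-squares estimate of the Koopman operator on the span
    of the basis functions, from snapshot pairs (x_i, y_i = F(x_i))."""
    PsiX = np.column_stack([psi(X) for psi in basis])   # (m, n) feature matrix
    PsiY = np.column_stack([psi(Y) for psi in basis])
    # Solve PsiX @ K ~= PsiY in the least-squares (Monte Carlo) sense.
    K, *_ = np.linalg.lstsq(PsiX, PsiY, rcond=None)
    return K

# Toy example: the logistic map x_{k+1} = r x_k (1 - x_k) with a monomial basis.
rng = np.random.default_rng(1)
r = 3.6
X = rng.uniform(0, 1, 2000)
Y = r * X * (1 - X)
basis = [lambda x, p=p: x**p for p in range(5)]   # 1, x, x^2, x^3, x^4

K = edmd(X, Y, basis)
eigvals = np.linalg.eigvals(K)
print("approximate Koopman eigenvalues:", np.round(np.sort(eigvals)[::-1], 3))
```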

[241]  arXiv:2405.00540 [pdf, other]
Title: Heat, Health, and Habitats: Analyzing the Intersecting Risks of Climate and Demographic Shifts in Austrian Districts
Subjects: Computers and Society (cs.CY); General Economics (econ.GN); Atmospheric and Oceanic Physics (physics.ao-ph)

The impact of hot weather on health outcomes of a population is mediated by a variety of factors, including its age profile and local green infrastructure. The combination of warming due to climate change and demographic aging suggests that heat-related health outcomes will deteriorate in the coming decades. Here, we measure the relationship between weekly all-cause mortality and heat days in Austrian districts using a panel dataset covering $2015-2022$. An additional day reaching $30$ degrees is associated with a $2.4\%$ increase in mortality per $1000$ inhabitants during summer. This association is roughly doubled in districts with a two standard deviation above average share of the population over $65$. Using forecasts of hot days (RCP) and demographics in $2050$, we observe that districts will have elderly populations and hot days $2-5$ standard deviations above the current mean in just $25$ years. This predicts a drastic increase in heat-related mortality. At the same time, district green scores, measured using $10\times 10$ meter resolution satellite images of residential areas, significantly moderate the relationship between heat and mortality. Thus, although local policies likely cannot reverse warming or demographic trends, they can take measures to mediate the health consequences of these growing risks, which are highly heterogeneous across regions, even in Austria.
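
A stylized version of the kind of panel regression described above (weekly mortality on hot days, interacted with the elderly share and a green score, with district and week fixed effects and district-clustered errors) can be sketched with statsmodels as follows; the synthetic data, variable names, and specification are illustrative assumptions, not the authors' model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic district-week panel (illustrative only; not the paper's data).
rng = np.random.default_rng(2)
districts, weeks = 20, 60
df = pd.DataFrame({
    "district": np.repeat(np.arange(districts), weeks),
    "week": np.tile(np.arange(weeks), districts),
})
df["share65"] = np.repeat(rng.uniform(0.15, 0.30, districts), weeks)
df["green"] = np.repeat(rng.uniform(0, 1, districts), weeks)
df["hot_days"] = rng.poisson(1.0, len(df))
# Mortality per 1000 inhabitants, with a heat effect amplified by the age share
# and dampened by greenness, plus noise.
df["mortality"] = (0.9 + 0.024 * df.hot_days * (1 + 4 * (df.share65 - 0.2))
                   * (1 - 0.3 * df.green) + rng.normal(0, 0.05, len(df)))

# Two-way fixed effects with interactions, standard errors clustered by district.
model = smf.ols("mortality ~ hot_days * share65 + hot_days * green"
                " + C(district) + C(week)", data=df)
res = model.fit(cov_type="cluster", cov_kwds={"groups": df["district"]})
print(res.params.filter(like="hot_days"))
```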

[242]  arXiv:2405.00543 [pdf, other]
Title: New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal datasets for Aspect-Category Sentiment Analysis (ACSA) often focus on textual annotations, neglecting fine-grained information in images. Consequently, these datasets fail to fully exploit the richness inherent in multimodal data. To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain. Additionally, we propose a Fine-Grained Cross-Modal Fusion Framework (FCMF) that effectively learns both intra- and inter-modality interactions and then fuses this information to produce a unified multimodal representation. Experimental results show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%. We also explore characteristics and challenges in Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and the complexities of the Vietnamese language. This work contributes both a benchmark dataset and a new framework that leverages fine-grained multimodal information to improve multimodal aspect-category sentiment analysis. Our dataset is available for research purposes: https://github.com/hoangquy18/Multimodal-Aspect-Category-Sentiment-Analysis.

[243]  arXiv:2405.00545 [pdf, other]
Title: A Double Maximization Approach for Optimizing the LM Rate of Mismatched Decoding
Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)

An approach is established for maximizing the Lower bound on the Mismatch capacity (hereafter abbreviated as LM rate), a key performance bound in mismatched decoding, by optimizing the channel input probability distribution. Under a fixed channel input probability distribution, the computation of the corresponding LM rate is a convex optimization problem. When optimizing the channel input probability distribution, however, the corresponding optimization problem adopts a max-min formulation, which is generally non-convex and is intractable with standard approaches. To solve this problem, a novel dual form of the LM rate is proposed, thereby transforming the max-min formulation into an equivalent double maximization formulation. This new formulation leads to a maximization problem setup wherein each individual optimization direction is convex. Consequently, an alternating maximization algorithm is established to solve the resultant maximization problem setup. Each step of the algorithm only involves a closed-form iteration, which is efficiently implemented with standard optimization procedures. Numerical experiments show the proposed approach for optimizing the LM rate leads to noticeable rate gains.
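
The alternating-maximization pattern described above (a double maximization in which each coordinate update is convex and available in closed form) is the same pattern behind the classical Blahut-Arimoto algorithm for channel capacity; the sketch below shows that well-known instance as orientation and is not the paper's LM-rate algorithm.

```python
import numpy as np

def blahut_arimoto(W, iters=200, tol=1e-10):
    """Classical Blahut-Arimoto: channel capacity as a double maximization over
    the input distribution p and the reverse channel Q, with closed-form
    alternating updates (illustrates the pattern, not the LM-rate algorithm)."""
    nx, ny = W.shape                      # W[x, y] = channel law W(y|x)
    p = np.full(nx, 1.0 / nx)
    for _ in range(iters):
        # Maximize over Q with p fixed: Q(x|y) proportional to p(x) W(y|x).
        Q = p[:, None] * W
        Q /= Q.sum(axis=0, keepdims=True)
        # Maximize over p with Q fixed: p(x) proportional to exp(sum_y W(y|x) log Q(x|y)).
        r = np.exp(np.sum(W * np.log(Q + 1e-300), axis=1))
        p_new = r / r.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    C = np.sum(p[:, None] * W * np.log((Q + 1e-300) / p[:, None]))
    return C / np.log(2), p               # capacity in bits and the optimal input

# Binary symmetric channel with crossover 0.1: capacity ~ 1 - H(0.1) ~ 0.531 bits.
W = np.array([[0.9, 0.1], [0.1, 0.9]])
C, p_opt = blahut_arimoto(W)
print(round(C, 3), np.round(p_opt, 3))
```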

[244]  arXiv:2405.00549 [pdf, ps, other]
Title: A Confirmation Rule for the Ethereum Consensus Protocol
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

A Confirmation Rule, within blockchain networks, refers to an algorithm implemented by network nodes that determines (either probabilistically or deterministically) the permanence of certain blocks on the blockchain. An example of a Confirmation Rule is Bitcoin's longest-chain rule, under which a block is confirmed (with high probability) when it has a sufficiently long chain of successors, its siblings have notably shorter successor chains, and network synchrony holds. In this work, we devise a Confirmation Rule for Ethereum's consensus protocol, Gasper. Initially, our focus is on developing a rule specifically for LMD-GHOST, the component of Gasper responsible for ensuring dynamic availability. This is done independently of the influence of FFG-Casper, which is designed to finalize the blocks produced by LMD-GHOST. Subsequently, we build upon this rule to account for FFG-Casper's impact, aiming to achieve fast block confirmations through a heuristic that balances confirmation speed against a trade-off in safety guarantees. This refined Confirmation Rule could potentially standardize fast block confirmation within Gasper.

[245]  arXiv:2405.00552 [pdf, other]
Title: Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs
Comments: 8 pages, 6 figures. Code to be released at: this https URL
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

We present a novel approach for long-term human trajectory prediction, which is essential for long-horizon robot planning in human-populated environments. State-of-the-art human trajectory prediction methods are limited by their focus on collision avoidance and short-term planning, and their inability to model complex interactions of humans with the environment. In contrast, our approach overcomes these limitations by predicting sequences of human interactions with the environment and using this information to guide trajectory predictions over a horizon of up to 60s. We leverage Large Language Models (LLMs) to predict interactions with the environment by conditioning the LLM prediction on rich contextual information about the scene. This information is given as a 3D Dynamic Scene Graph that encodes the geometry, semantics, and traversability of the environment into a hierarchical representation. We then ground these interaction sequences into multi-modal spatio-temporal distributions over human positions using a probabilistic approach based on continuous-time Markov Chains. To evaluate our approach, we introduce a new semi-synthetic dataset of long-term human trajectories in complex indoor environments, which also includes annotations of human-object interactions. We show in thorough experimental evaluations that our approach achieves a 54% lower average negative log-likelihood (NLL) and a 26.5% lower Best-of-20 displacement error compared to the best non-privileged baselines for a time horizon of 60s.

[246]  arXiv:2405.00554 [pdf, other]
Title: A First Look at Selection Bias in Preference Elicitation for Recommendation
Comments: Accepted at the CONSEQUENCES'23 workshop at RecSys '23
Subjects: Information Retrieval (cs.IR)

Preference elicitation explicitly asks users what kind of recommendations they would like to receive. It is a popular technique for conversational recommender systems to deal with cold starts. Previous work has studied selection bias in implicit feedback, e.g., clicks, and in some forms of explicit feedback, i.e., ratings on items. Despite the fact that the extreme sparsity of preference elicitation interactions makes them far more prone to selection bias than natural interactions, the effect of selection bias in preference elicitation on the resulting recommendations has not yet been studied. To address this gap, we take a first look at the effects of selection bias in preference elicitation and how they may be further investigated in the future. We find that a big hurdle is the current lack of any publicly available dataset that contains preference elicitation interactions. As a solution, we propose a simulation of a topic-based preference elicitation process. The results from our simulation-based experiments indicate (i) that ignoring the effect of selection bias early in preference elicitation can lead to an exacerbation of overrepresentation in subsequent item recommendations, and (ii) that debiasing methods can alleviate this effect, leading to significant improvements in subsequent item recommendation performance. Our aim is for the proposed simulator and initial results to provide a starting point and motivation for future research into this important but overlooked problem setting.

[247]  arXiv:2405.00555 [pdf, other]
Title: Derivative-based regularization for regression
Subjects: Machine Learning (cs.LG)

In this work, we introduce a novel approach to regularization in multivariable regression problems. Our regularizer, called DLoss, penalises differences between the model's derivatives and the derivatives of the data-generating function as estimated from the training data. We call these estimated derivatives data derivatives. The goal of our method is to align the model to the data, not only in terms of target values but also in terms of the derivatives involved. To estimate data derivatives, we select (from the training data) 2-tuples of input-value pairs, using either nearest-neighbour or random selection. On synthetic and real datasets, we evaluate the effectiveness of adding DLoss, with different weights, to the standard mean squared error loss. The experimental results show that with DLoss (using nearest-neighbour selection) we obtain, on average, the best rank with respect to MSE on validation datasets, compared to no regularization, L2 regularization, and Dropout.
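
A minimal PyTorch reconstruction of this idea (illustrative, not the authors' exact DLoss) compares a finite-difference "data derivative" over a selected 2-tuple with the model's directional derivative along the same direction; the pairing strategy, midpoint evaluation, and weight below are assumptions.

```python
import torch

def dloss(model, x1, y1, x2, y2):
    """Derivative-matching penalty sketch: compare the model's directional
    derivative along (x2 - x1) with the finite-difference slope
    (y2 - y1) / ||x2 - x1|| estimated from training pairs."""
    d = x2 - x1
    dist = d.norm(dim=1, keepdim=True).clamp_min(1e-8)
    data_deriv = (y2 - y1) / dist                              # data derivative
    x_mid = ((x1 + x2) / 2).requires_grad_(True)
    pred = model(x_mid)
    grad = torch.autograd.grad(pred.sum(), x_mid, create_graph=True)[0]
    model_deriv = (grad * d / dist).sum(dim=1, keepdim=True)   # directional derivative
    return ((model_deriv - data_deriv) ** 2).mean()

# Usage: total loss = MSE + lambda * DLoss over selected 2-tuples
# (random pairing here for brevity; the paper also uses nearest neighbours).
model = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x = torch.randn(128, 3)
y = (x ** 2).sum(dim=1, keepdim=True)
idx = torch.randperm(128)
loss = torch.nn.functional.mse_loss(model(x), y) + 0.1 * dloss(model, x, y, x[idx], y[idx])
loss.backward()
print(float(loss))
```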

[248]  arXiv:2405.00556 [pdf, other]
Title: Swarm Learning: A Survey of Concepts, Applications, and Trends
Comments: 31 pages
Subjects: Machine Learning (cs.LG)

Deep learning models have raised privacy and security concerns due to their reliance on large datasets on central servers. As the number of Internet of Things (IoT) devices increases, artificial intelligence (AI) will be crucial for resource management, data processing, and knowledge acquisition. To address those issues, federated learning (FL) has introduced a novel approach to building a versatile, large-scale machine learning framework that operates in a decentralized and hardware-agnostic manner. However, FL faces network bandwidth limitations and data breaches. To reduce the central dependency in FL and increase scalability, swarm learning (SL) has been proposed in collaboration with Hewlett Packard Enterprise (HPE). SL represents a decentralized machine learning framework that leverages blockchain technology for secure, scalable, and private data management. A blockchain-based network enables the exchange and aggregation of model parameters among participants, thus mitigating the risk of a single point of failure and eliminating communication bottlenecks. To the best of our knowledge, this survey is the first to introduce the principles of Swarm Learning, its architectural design, and its fields of application. In addition, it highlights numerous research avenues that require further exploration by academic and industry communities to unlock the full potential and applications of SL.

[249]  arXiv:2405.00557 [pdf, other]
Title: Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge, posing potential risks during deployment. Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), or on the self-alignment capacities of LLMs, which usually require a strong LLM's emergent ability to improve on its original, low-quality answers. To address these challenges, we propose a novel self-alignment method that utilizes a Chain of Thought (CoT) approach, termed AlignCoT. This method encompasses stages of Question Analysis, Answer Guidance, and Safe Answer production. It is designed to enable LLMs to generate high-quality, safe responses throughout various stages of their development. Furthermore, we introduce the Mixture of insighTful Experts (MoTE) architecture, which applies the mixture of experts to enhance each component of the AlignCoT process, markedly increasing alignment efficiency. The MoTE approach not only outperforms existing methods in aligning LLMs with human values but also highlights the benefits of using self-generated data, revealing the dual benefits of improved alignment and training efficiency.

[250]  arXiv:2405.00558 [pdf, other]
Title: Cross-Cluster Networking to Support Extended Reality Services
Subjects: Networking and Internet Architecture (cs.NI)

Extended Reality (XR) refers to a class of contemporary services that come with a plethora of demanding Quality of Service (QoS) and functional requirements. Despite Kubernetes being the de facto standard for deploying and managing contemporary containerized microservices, it lacks adequate support for cross-cluster networking, hindering service-to-service communication across diverse cloud domains. Although there are tools that may be leveraged alongside Kubernetes to establish multi-cluster deployments, each of them comes with its own drawbacks and limitations. The purpose of this article is to explore the various technologies that may facilitate multi-cluster deployments and to propose how they may be leveraged to provide a cross-cluster connectivity solution that caters to the intricacies of XR services. The proposed solution is based on two open-source frameworks, namely Cluster API for multi-cluster management and Liqo for multi-cluster interconnectivity. The efficiency of this approach is evaluated in the context of two experiments. This work is the first attempt at proposing a solution for supporting multi-cluster deployments in a manner that is aligned with the requirements of XR services.

[251]  arXiv:2405.00559 [pdf, other]
Title: An Energy Stable Well-balanced Scheme for the Barotropic Euler System with Gravity under the Anelastic Scaling
Subjects: Numerical Analysis (math.NA)

We design and analyse an energy stable, structure preserving, well-balanced and asymptotic preserving (AP) scheme for the barotropic Euler system with gravity in the anelastic limit. The key to energy stability is the introduction of appropriate velocity shifts in the convective fluxes of mass and momenta. The semi-implicit in time and finite volume in space fully-discrete scheme supports the positivity of density and yields consistency with the weak solutions of the Euler system upon mesh refinement. The numerical scheme admits the discrete hydrostatic states as solutions, and the stability of numerical solutions in terms of the relative energy leads to well-balancing. The AP property of the scheme, i.e. the boundedness of the mesh parameters with respect to the Mach/Froude numbers and the scheme's asymptotic consistency with the anelastic Euler system, is rigorously shown on the basis of a priori energy estimates. The numerical scheme is resolved in two steps: solving a non-linear elliptic problem for the density, followed by an explicit computation of the velocity. Results from several benchmark case studies are presented to corroborate the proposed claims.

[252]  arXiv:2405.00565 [pdf, other]
Title: Leveraging Stack Traces for Spectrum-based Fault Localization in the Absence of Failing Tests
Subjects: Software Engineering (cs.SE)

Bug fixing is a crucial task in software maintenance for retaining user trust. Although various automated fault localization techniques exist, they often require specific conditions to be effective. For example, Spectrum-Based Fault Localization (SBFL) techniques need at least one failing test to identify bugs, which may not always be available. Bug reports, particularly those with stack traces, provide detailed information on system execution failures and are invaluable for developers. This study focuses on utilizing stack traces from crash reports as fault-triggering tests for SBFL. Our findings indicate that only 3.33% of bugs have fault-triggering tests, limiting traditional SBFL efficiency. However, 98.3% of bug-fix intentions align directly with exceptions in stack traces, and 78.3% of buggy methods are reachable within an average of 0.34 method calls, establishing stack traces as a reliable source for locating bugs. We introduce a new approach, SBEST, that integrates stack trace data with test coverage to enhance fault localization. Our approach shows a significant improvement, increasing Mean Average Precision (MAP) by 32.22% and Mean Reciprocal Rank (MRR) by 17.43% over traditional stack trace ranking methods.

[253]  arXiv:2405.00566 [pdf, other]
Title: NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance
Subjects: Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); General Finance (q-fin.GN)

Recently, many works have proposed various financial large language models (FinLLMs) by pre-training from scratch or fine-tuning open-sourced LLMs on financial corpora. However, existing FinLLMs exhibit unsatisfactory performance in understanding financial text when numeric variables are involved in questions. In this paper, we propose a novel LLM, called numeric-sensitive large language model (NumLLM), for Chinese finance. We first construct a financial corpus from financial textbooks, which is essential for improving the numeric capability of LLMs during fine-tuning. After that, we train two individual low-rank adaptation (LoRA) modules by fine-tuning on our constructed financial corpus. One module adapts general-purpose LLMs to the financial domain, and the other enhances the ability of NumLLM to understand financial text with numeric variables. Lastly, we merge the two LoRA modules into the foundation model to obtain NumLLM for inference. Experiments on a financial question-answering benchmark show that NumLLM boosts the performance of the foundation model and achieves the best overall performance compared to all baselines, on both numeric and non-numeric questions.

[254]  arXiv:2405.00568 [pdf, other]
Title: Powering In-Database Dynamic Model Slicing for Structured Data Analytics
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Relational database management systems (RDBMS) are widely used for the storage and retrieval of structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations, and then apply deep neural network (DNN) training and inference on these respective subdatasets in a separate machine learning system. The process can be prohibitively expensive, especially when there are a combinatorial number of subdatasets extracted for different analytical purposes. This calls for efficient in-database support of advanced analytical methods. In this paper, we introduce LEADS, a novel SQL-aware dynamic model slicing technique to customize models for subdatasets specified by SQL queries. LEADS improves the predictive modeling of structured data via the mixture-of-experts (MoE) technique and maintains inference efficiency by a SQL-aware gating network. At the core of LEADS is the construction of a general model with multiple expert sub-models via MoE trained over the entire database. This SQL-aware MoE technique scales up the modeling capacity, enhances effectiveness, and preserves efficiency by activating only the necessary experts via the gating network during inference. Additionally, we introduce two regularization terms during the training process of LEADS to strike a balance between effectiveness and efficiency. We also design and build an in-database inference system, called INDICES, to support end-to-end advanced structured data analytics by non-intrusively incorporating LEADS onto PostgreSQL. Our extensive experiments on real-world datasets demonstrate that LEADS consistently outperforms baseline models, and INDICES delivers effective in-database analytics with a considerable reduction in inference latency compared to traditional solutions.
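
To make the gating idea concrete, here is a minimal mixture-of-experts layer whose gate is conditioned on a query embedding, loosely mirroring the SQL-aware gating described above; the architecture, dimensions, and top-k routing below are illustrative assumptions rather than LEADS's actual design.

```python
import torch
import torch.nn as nn

class GatedMoE(nn.Module):
    """Minimal mixture-of-experts with a query-conditioned gating network,
    sketching the general MoE pattern (the actual LEADS gating is SQL-aware
    and more elaborate)."""
    def __init__(self, in_dim, query_dim, n_experts=4, hidden=64, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_experts)])
        self.gate = nn.Linear(query_dim, n_experts)   # scores experts from a query embedding
        self.top_k = top_k

    def forward(self, x, query_emb):
        scores = self.gate(query_emb)                           # (batch, n_experts)
        topk = scores.topk(self.top_k, dim=-1)
        weights = torch.zeros_like(scores).scatter_(-1, topk.indices,
                                                    topk.values.softmax(dim=-1))
        # Weights are zero outside the top-k; a real implementation would skip
        # the unselected experts instead of evaluating all of them.
        outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, 1, n_experts)
        return (outs * weights.unsqueeze(1)).sum(dim=-1)           # (batch, 1)

# Toy usage: rows from a subdataset plus an embedding of the selecting SQL query.
moe = GatedMoE(in_dim=10, query_dim=16)
rows, q = torch.randn(8, 10), torch.randn(8, 16)
print(moe(rows, q).shape)    # torch.Size([8, 1])
```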

[255]  arXiv:2405.00569 [pdf, other]
Title: A novel central compact finite-difference scheme for third derivatives with high spectral resolution
Comments: 28 pages, 15 figures, 10 Tables
Subjects: Numerical Analysis (math.NA)

In this paper, we introduce a novel category of central compact schemes inspired by existing cell-node and cell-centered compact finite difference schemes, that offer a superior spectral resolution for solving the dispersive wave equation. In our approach, we leverage both the function values at the cell nodes and cell centers to calculate third-order spatial derivatives at the cell nodes. To compute spatial derivatives at the cell centers, we employ a technique that involves half-shifting the indices within the formula initially designed for the cell-nodes. In contrast to the conventional compact interpolation scheme, our proposed method effectively sidesteps the introduction of transfer errors. We employ the Taylor-series expansion-based method to calculate the finite difference coefficients. By conducting systematic Fourier analysis and numerical tests, we note that the methods exhibit exceptional characteristics such as high order, superior resolution, and low dissipation. Computational findings further illustrate the effectiveness of high-order compact schemes, particularly in addressing problems with a third derivative term.

[256]  arXiv:2405.00570 [pdf, other]
Title: WEST GCN-LSTM: Weighted Stacked Spatio-Temporal Graph Neural Networks for Regional Traffic Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Regional traffic forecasting is a critical challenge in urban mobility, with applications to various fields such as the Internet of Everything. In recent years, spatio-temporal graph neural networks have achieved state-of-the-art results in the context of numerous traffic forecasting challenges. This work aims at expanding upon the conventional spatio-temporal graph neural network architectures in a manner that may facilitate the inclusion of information regarding the examined regions, as well as the populations that traverse them, in order to establish a more efficient prediction model. The end-product of this scientific endeavour is a novel spatio-temporal graph neural network architecture that is referred to as WEST (WEighted STacked) GCN-LSTM. Furthermore, the inclusion of the aforementioned information is conducted via the use of two novel dedicated algorithms that are referred to as the Shared Borders Policy and the Adjustable Hops Policy. Through information fusion and distillation, the proposed solution manages to significantly outperform its competitors in the frame of an experimental evaluation that consists of 19 forecasting models, across several datasets. Finally, an additional ablation study determined that each of the components of the proposed solution contributes towards enhancing its overall performance.

[257]  arXiv:2405.00571 [pdf, other]
Title: Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Composed Image Retrieval (CIR) is a complex task that retrieves images using a query, which is configured with an image and a caption that describes desired modifications to that image. Supervised CIR approaches have shown strong performance, but their reliance on expensive manually-annotated datasets restricts their scalability and broader applicability. To address these issues, previous studies have proposed pseudo-word token-based Zero-Shot CIR (ZS-CIR) methods, which utilize a projection module to map images to word tokens. However, we conjecture that this approach has a downside: the projection module distorts the original image representation and confines the resulting composed embeddings to the text-side. In order to resolve this, we introduce a novel ZS-CIR method that uses Spherical Linear Interpolation (Slerp) to directly merge image and text representations by identifying an intermediate embedding of both. Furthermore, we introduce Text-Anchored-Tuning (TAT), a method that fine-tunes the image encoder while keeping the text encoder fixed. TAT closes the modality gap between images and text, making the Slerp process much more effective. Notably, the TAT method is not only efficient in terms of the scale of the training dataset and training time, but it also serves as an excellent initial checkpoint for training supervised CIR models, thereby highlighting its wider potential. The integration of the Slerp-based ZS-CIR with a TAT-tuned model enables our approach to deliver state-of-the-art retrieval performance across CIR benchmarks.
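
For reference, spherical linear interpolation between an image embedding and a text embedding can be sketched in a few lines; the interpolation weight and the 512-dimensional CLIP-style embeddings below are illustrative assumptions, and the surrounding retrieval pipeline and TAT tuning are the paper's contributions, not reproduced here.

```python
import numpy as np

def slerp(u, v, t=0.5):
    """Spherical linear interpolation between two (unit-normalized) embeddings."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    omega = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))   # angle between embeddings
    if omega < 1e-6:                                       # nearly parallel: lerp suffices
        return (1 - t) * u + t * v
    return (np.sin((1 - t) * omega) * u + np.sin(t * omega) * v) / np.sin(omega)

# Toy usage: merge an image embedding and a caption embedding from any
# CLIP-style dual encoder, then retrieve candidates by cosine similarity.
img_emb, txt_emb = np.random.randn(512), np.random.randn(512)
query = slerp(img_emb, txt_emb, t=0.5)
print(np.linalg.norm(query))    # stays (approximately) on the unit sphere
```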

[258]  arXiv:2405.00574 [pdf, other]
Title: EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Emotion AI is the ability of computers to understand human emotional states. Existing works have achieved promising progress, but two limitations remain to be solved: 1) Previous studies have focused more on short sequential video emotion analysis while overlooking long sequential videos. However, the emotions in short sequential videos only reflect instantaneous emotions, which may be deliberately guided or hidden. In contrast, long sequential videos can reveal authentic emotions; 2) Previous studies commonly utilize various signals such as facial, speech, and even sensitive biological signals (e.g., electrocardiogram). However, due to the increasing demand for privacy, developing Emotion AI without relying on sensitive signals is becoming important. To address the aforementioned limitations, in this paper, we construct a dataset for Emotion Analysis in Long-sequential and De-identity videos called EALD by collecting and processing the sequences of athletes' post-match interviews. In addition to providing annotations of the overall emotional state of each video, we also provide the Non-Facial Body Language (NFBL) annotations for each player. NFBL is an inner-driven emotional expression and can serve as an identity-free clue to understanding the emotional state. Moreover, we provide a simple but effective baseline for further research. More precisely, we evaluate Multimodal Large Language Models (MLLMs) with de-identified signals (e.g., visual, speech, and NFBLs) to perform emotion analysis. Our experimental results demonstrate that: 1) MLLMs can achieve comparable or even better performance than supervised single-modal models, even in a zero-shot scenario; 2) NFBL is an important cue in long sequential emotion analysis. EALD will be available on the open-source platform.

[259]  arXiv:2405.00577 [pdf, ps, other]
Title: Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)

Graph neural networks (GNN) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. In this review, we provide an overview of how GNN and model explainability techniques have been applied on fMRI datasets for disorder prediction tasks, with a particular emphasis on the robustness of biomarkers produced for neurodegenerative diseases and neuropsychiatric disorders. We found that while most studies have performant models, salient features highlighted in these studies vary greatly across studies on the same disorder and little has been done to evaluate their robustness. To address these issues, we suggest establishing new standards that are based on objective evaluation metrics to determine the robustness of these potential biomarkers. We further highlight gaps in the existing literature and put together a prediction-attribution-evaluation framework that could set the foundations for future research on improving the robustness of potential biomarkers discovered via GNNs.

[260]  arXiv:2405.00578 [pdf, other]
Title: The Real, the Better: Aligning Large Language Models with Online Human Behaviors
Comments: 11 pages, 6 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language model alignment is widely used and studied to prevent LLMs from producing unhelpful or harmful responses. However, the lengthy training process and predefined preference bias hinder adaptation to diverse online human preferences. To this end, this paper proposes an alignment framework, called Reinforcement Learning with Human Behavior (RLHB), to align LLMs by directly leveraging real online human behaviors. Adopting a generative adversarial framework, the generator is trained to respond following expected human behavior, while the discriminator tries to verify whether the triplets of query, response, and human behavior come from real online environments. Behavior modeling in natural-language form and the multi-model joint training mechanism enable active and sustainable online alignment. Experimental results confirm the effectiveness of our proposed methods by both human and automatic evaluations.

[261]  arXiv:2405.00579 [pdf, other]
Title: LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game
Subjects: Computer Science and Game Theory (cs.GT)

Although Hierarchical Federated Learning (HFL) utilizes edge servers (ESs) to alleviate communication burdens, its model performance is degraded by non-IID data and limited communication resources. Current works often assume that data is uniformly distributed, which contradicts the heterogeneity of IoT. Solutions that rely on additional model training to check the data distribution inevitably increase computational costs and the risk of privacy leakage. The challenges in solving these issues are how to reduce the impact of non-IID data without involving raw data, and how to rationalize the communication resource allocation to address the straggler problem. To tackle these challenges, we propose a novel optimization method based on coaLition formation gamE and grAdient Projection, called LEAP. Specifically, we innovatively combine edge data distribution with a coalition formation game to dynamically adjust the correlations between clients and ESs, which ensures optimal correlations. We further capture client heterogeneity to achieve rational bandwidth allocation from coalition perception and determine the optimal transmission power within specified delay constraints at the client level. Experimental results on four real datasets show that LEAP achieves a 20.62% improvement in model accuracy compared to state-of-the-art baselines. Moreover, LEAP effectively reduces transmission energy consumption by at least about 2.24 times.

[262]  arXiv:2405.00584 [pdf, ps, other]
Title: Construction of extremal Type II $\mathbb{Z}_{8}$-codes via doubling method
Comments: 12 pages. arXiv admin note: text overlap with arXiv:2310.14080
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

Extremal Type II $\mathbb{Z}_{8}$-codes are a class of self-dual $\mathbb{Z}_{8}$-codes with Euclidean weights divisible by $16$ and the largest possible minimum Euclidean weight for a given length. We introduce a doubling method for constructing a Type II $\mathbb{Z}_{2k}$-code of length $n$ from a known Type II $\mathbb{Z}_{2k}$-code of length $n$. Based on this method, we develop an algorithm to construct new extremal Type II $\mathbb{Z}_8$-codes starting from an extremal Type II $\mathbb{Z}_8$-code of type $(\frac{n}{2},0,0)$ with an extremal $\mathbb{Z}_4$-residue code and length $24, 32$ or $40$.
We construct at least ten new extremal Type II $\mathbb{Z}_8$-codes of length $32$ and type $(15,1,1)$. Extremal Type II $\mathbb{Z}_8$-codes of length $32$ of this type were not known before. Moreover, the binary residue codes of the constructed extremal $\mathbb{Z}_8$-codes are optimal $[32,15]$ binary codes.

[263]  arXiv:2405.00586 [pdf, other]
Title: Multi-Robot Strategies for Communication-Constrained Exploration and Electrostatic Anomaly Characterization
Subjects: Robotics (cs.RO)

Exploration of extreme or remote environments such as Mars is often recognized as an opportunity for multi-robot systems. However, this poses challenges for maintaining robust inter-robot communication without preexisting infrastructure. It may be that robots can only share information when they are physically in close proximity with each other. At the same time, atmospheric phenomena such as dust devils are poorly understood and characterization of their electrostatic properties is of scientific interest. We perform a comparative analysis of two multi-robot communication strategies: a distributed approach, with pairwise intermittent rendezvous, and a centralized, fixed base station approach. We also introduce and evaluate the effectiveness of an algorithm designed to predict the location and strength of electrostatic anomalies, assuming robot proximity. Using an agent-based simulation, we assess the performance of these strategies in a 2D grid cell representation of a Martian environment. Results indicate that a decentralized rendezvous system consistently outperforms a fixed base station system in terms of exploration speed and in reducing the risk of data loss. We also find that inter-robot data sharing improves performance when trying to predict the location and strength of an electrostatic anomaly. These findings indicate the importance of appropriate communication strategies for efficient multi-robot science missions.

[264]  arXiv:2405.00587 [pdf, other]
Title: GraCo: Granularity-Controllable Interactive Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Interactive Segmentation (IS) segments specific objects or parts in an image according to user input. Current IS pipelines fall into two categories: single-granularity output and multi-granularity output. The latter aims to alleviate the spatial ambiguity present in the former. However, the multi-granularity output pipeline suffers from limited interaction flexibility and produces redundant results. In this work, we introduce Granularity-Controllable Interactive Segmentation (GraCo), a novel approach that allows precise control of prediction granularity by introducing additional parameters to the input. This enhances the customization of the interactive system and eliminates redundancy while resolving ambiguity. Nevertheless, the exorbitant cost of annotating multi-granularity masks and the lack of available datasets with granularity annotations make it difficult for models to acquire the necessary guidance to control output granularity. To address this problem, we design an any-granularity mask generator that exploits the semantic property of the pre-trained IS model to automatically generate abundant mask-granularity pairs without requiring additional manual annotation. Based on these pairs, we propose a granularity-controllable learning strategy that efficiently imparts granularity controllability to the IS model. Extensive experiments on intricate scenarios at the object and part levels demonstrate that GraCo has significant advantages over previous methods. This highlights the potential of GraCo to be a flexible annotation tool, capable of adapting to diverse segmentation scenarios. The project page: https://zhao-yian.github.io/GraCo.

[265]  arXiv:2405.00588 [pdf, other]
Title: Are Models Biased on Text without Gender-related Language?
Comments: In International Conference on Learning Representations 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG)

Gender bias research has been pivotal in revealing undesirable behaviors in large language models, exposing serious gender stereotypes associated with occupations and emotions. A key observation in prior work is that models reinforce stereotypes as a consequence of the gendered correlations that are present in the training data. In this paper, we focus on bias where the effect from training data is unclear, and instead address the question: Do language models still exhibit gender bias in non-stereotypical settings? To do so, we introduce UnStereoEval (USE), a novel framework tailored for investigating gender bias in stereotype-free scenarios. USE defines a sentence-level score based on pretraining data statistics to determine if a sentence contains minimal word-gender associations. To systematically benchmark the fairness of popular language models in stereotype-free scenarios, we utilize USE to automatically generate benchmarks without any gender-related language. By leveraging USE's sentence-level score, we also repurpose prior gender bias benchmarks (Winobias and Winogender) for non-stereotypical evaluation. Surprisingly, we find low fairness across all 28 tested models. Concretely, models demonstrate fair behavior in only 9%-41% of stereotype-free sentences, suggesting that bias does not solely stem from the presence of gender-related words. These results raise important questions about where underlying model biases come from and highlight the need for more systematic and comprehensive bias evaluation. We release the full dataset and code at https://ucinlp.github.io/unstereo-eval.

[266]  arXiv:2405.00596 [pdf, other]
Title: Unbundle-Rewrite-Rebundle: Runtime Detection and Rewriting of Privacy-Harming Code in JavaScript Bundles
Subjects: Cryptography and Security (cs.CR)

This work presents Unbundle-Rewrite-Rebundle (URR), a system for detecting privacy-harming portions of bundled JavaScript code and rewriting that code at runtime to remove the privacy-harming behavior without breaking the surrounding code or overall application. URR is a novel solution to the problem of JavaScript bundles, where websites pre-compile multiple code units into a single file, making it impossible for content filters and ad-blockers to differentiate between desired and unwanted resources. Where traditional content-filtering tools rely on URLs, URR analyzes the code at the AST level and replaces harmful AST sub-trees with privacy- and functionality-maintaining alternatives.
We present an open-sourced implementation of URR as a Firefox extension and evaluate it against JavaScript bundles generated by the most popular bundling system (Webpack) deployed on the Tranco 10k. We measure its performance in terms of precision (1.00), recall (0.95), and speed (0.43 s per script) when detecting and rewriting three representative privacy-harming libraries often included in JavaScript bundles, and find URR to be an effective approach to a large and growing blind spot unaddressed by current privacy tools.

[267]  arXiv:2405.00600 [pdf, other]
Title: Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods
Comments: Accepted to Field Robotics, 1. May 2024
Subjects: Robotics (cs.RO)

For autonomous ground vehicles (AGVs) deployed in suburban neighborhoods and other human-centric environments, localization remains a fundamental challenge. There are well-established methods for localization with GPS, lidar, and cameras, but even in ideal conditions these have limitations. GPS is not always available and is often not accurate enough on its own, visual methods have difficulty coping with appearance changes due to weather and other factors, and lidar methods are prone to defective solutions due to ambiguous scene geometry. Radar, on the other hand, is not highly susceptible to these problems, owing in part to its longer range. Further, radar is also robust to challenging conditions that interfere with vision and lidar, including fog, smoke, rain, and darkness. We present a radar-based localization system that includes a novel method for highly accurate radar odometry, for smooth, high-frequency relative pose estimation, and a novel method for radar-based place recognition and relocalization. We present experiments demonstrating our methods' accuracy and reliability, which are comparable with other methods' published results for radar localization and which, we find, outperform a similar method applied to lidar measurements. Further, we show our methods are lightweight enough to run on common low-power embedded hardware with ample headroom for other autonomy functions.

[268]  arXiv:2405.00601 [pdf, other]
Title: How founder motivations, goals, and actions influence early trajectories of online communities
Comments: To be published in CHI 2024
Subjects: Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI)

Online communities offer their members various benefits, such as information access, social and emotional support, and entertainment. Despite the important role that founders play in shaping communities, prior research has focused primarily on what drives users to participate and contribute; the motivations and goals of founders remain underexplored. To uncover how and why online communities get started, we present findings from a survey of 951 recent founders of Reddit communities. We find that topical interest is the most common motivation for community creation, followed by motivations to exchange information, connect with others, and self-promote. Founders have heterogeneous goals for their nascent communities, but they tend to privilege community quality and engagement over sheer growth. These differences in founders' early attitudes towards their communities help predict not only the community-building actions that they pursue, but also the ability of their communities to attract visitors, contributors, and subscribers over the first 28 days. We end with a discussion of the implications for researchers, designers, and founders of online communities.

[269]  arXiv:2405.00602 [pdf, other]
Title: Investigating Automatic Scoring and Feedback using Large Language Models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Automatic grading and feedback have long been studied using traditional machine learning and deep learning techniques with language models. With the recent accessibility of high-performing large language models (LLMs) like LLaMA-2, there is an opportunity to investigate the use of these LLMs for automatic grading and feedback generation. Despite the increase in performance, LLMs require significant computational resources for fine-tuning and additional specific adjustments to enhance their performance for such tasks. To address these issues, Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA and QLoRA, have been adopted to decrease memory and computational requirements in model fine-tuning. This paper explores the efficacy of PEFT-based quantized models, employing classification or regression heads, to fine-tune LLMs for automatically assigning continuous numerical grades to short answers and essays, as well as generating corresponding feedback. We conducted experiments on both proprietary and open-source datasets for our tasks. The results show that prediction of grade scores via fine-tuned LLMs is highly accurate, achieving less than 3% error in grade percentage on average. For providing graded feedback, fine-tuned 4-bit quantized LLaMA-2 13B models outperform competitive base models, achieving high similarity with subject-matter-expert feedback in terms of BLEU and ROUGE scores, as well as qualitatively. The findings from this study provide important insights into the impacts of the emerging capabilities of using quantization approaches to fine-tune LLMs for various downstream tasks, such as automatic short answer scoring and feedback generation, at comparatively lower cost and latency.
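
A minimal sketch of the kind of 4-bit quantized LoRA fine-tuning with a regression head described above, using the Hugging Face transformers/peft/bitsandbytes stack, might look as follows; the model name, target modules, and hyperparameters are illustrative assumptions rather than the paper's configuration, and running it requires GPU access plus the model weights.

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          BitsAndBytesConfig)
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-13b-hf"   # illustrative backbone choice
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=1 with a regression problem type -> continuous grade prediction.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1, problem_type="regression",
    quantization_config=bnb, device_map="auto")

model = prepare_model_for_kbit_training(model)
lora = LoraConfig(task_type=TaskType.SEQ_CLS, r=16, lora_alpha=32,
                  lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only the LoRA adapters (and head) are trained

batch = tokenizer(["Student answer: photosynthesis converts light to chemical energy."],
                  return_tensors="pt", truncation=True).to(model.device)
out = model(**batch, labels=torch.tensor([[0.85]], device=model.device))
print(float(out.loss), float(out.logits))    # MSE loss against the reference grade
```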

[270]  arXiv:2405.00603 [pdf, other]
Title: Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Voice conversion is the task of transforming the voice characteristics of source speech while preserving content information. Nowadays, self-supervised representation learning models are increasingly utilized for content extraction. However, these representations retain a lot of hidden speaker information, which leads to timbre leakage, while the prosodic information in the hidden units goes largely unused. To address these issues, we propose a novel framework for expressive voice conversion called "SAVC" based on soft speech units from HuBERT-soft. Taking soft speech units as input, we design an attribute encoder to extract content and prosody features respectively. Specifically, we first introduce statistic perturbation imposed by adversarial style augmentation to eliminate speaker information. Then the prosody is implicitly modeled on soft speech units with knowledge distillation. Experiment results show that the intelligibility and naturalness of the converted speech outperform previous work.

[271]  arXiv:2405.00604 [pdf, other]
Title: A Preprocessing and Evaluation Toolbox for Trajectory Prediction Research on the Drone Datasets
Comments: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

The availability of high-quality datasets is crucial for the development of behavior prediction algorithms in autonomous vehicles. This paper highlights the need to standardize the use of certain datasets for motion forecasting research in order to simplify comparative analysis, and proposes a set of tools and practices to achieve this. Drawing on extensive experience and a comprehensive review of the current literature, we summarize our proposals for preprocessing, visualization, and evaluation in the form of an open-source toolbox designed for researchers working on trajectory prediction problems. The clear specification of the necessary preprocessing steps and evaluation metrics is intended to reduce development effort and facilitate the comparison of results across studies. The toolbox is available at: https://github.com/westny/dronalize.
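
Typical evaluation metrics for this task include minADE and minFDE; the short sketch below is a generic NumPy illustration of how such best-of-K metrics are computed and is not drawn from the dronalize API.

    # Generic illustration of best-of-K displacement metrics (minADE / minFDE);
    # shapes and data are placeholders, not the toolbox's interface.
    import numpy as np

    def min_ade_fde(predictions: np.ndarray, ground_truth: np.ndarray):
        """predictions: (K, T, 2) candidate trajectories; ground_truth: (T, 2)."""
        errors = np.linalg.norm(predictions - ground_truth[None], axis=-1)  # (K, T)
        ade = errors.mean(axis=1)     # average displacement error per candidate
        fde = errors[:, -1]           # final displacement error per candidate
        return ade.min(), fde.min()   # best-of-K, the usual reporting convention

    preds = np.random.randn(6, 30, 2).cumsum(axis=1)   # 6 candidates, 30 time steps
    gt = np.random.randn(30, 2).cumsum(axis=0)
    print(min_ade_fde(preds, gt))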

[272]  arXiv:2405.00611 [pdf, other]
Title: Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling
Subjects: Computation and Language (cs.CL)

Large language models (LLMs), with their strong zero-shot topic extraction capabilities, offer an alternative to probabilistic topic modelling and closed-set topic classification approaches. As zero-shot topic extractors, LLMs are expected to follow human instructions and generate relevant, non-hallucinated topics based on the given documents. However, LLM-based topic modelling approaches often struggle to generate topics that adhere to the granularity specified in human instructions, frequently producing many near-duplicate topics. Furthermore, methods for addressing hallucinated topics generated by LLMs have not yet been investigated. In this paper, we focus on addressing the issues of topic granularity and hallucination for better LLM-based topic modelling. To this end, we introduce a novel approach that leverages Direct Preference Optimisation (DPO) to fine-tune open-source LLMs such as Mistral-7B. Our approach does not rely on traditional human annotation to rank preferred answers; instead, it employs a reconstruction pipeline to modify the raw topics generated by LLMs, enabling a fast and efficient training and inference framework. Comparative experiments show that our fine-tuning approach not only significantly improves the LLM's capability to produce more coherent, relevant, and precise topics, but also reduces the number of hallucinated topics.
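
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss on a single preference pair; the log-probabilities would come from the model being fine-tuned (e.g., Mistral-7B) and a frozen reference copy, and the pairing of reconstructed versus raw topics is our illustrative reading of the reconstruction pipeline, not a verbatim description of it.

    # Minimal sketch of the standard DPO objective; values are placeholders.
    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_logp_chosen, policy_logp_rejected,
                 ref_logp_chosen, ref_logp_rejected, beta=0.1):
        # Implicit rewards are log-ratios of the policy against the frozen reference.
        chosen = beta * (policy_logp_chosen - ref_logp_chosen)
        rejected = beta * (policy_logp_rejected - ref_logp_rejected)
        return -F.logsigmoid(chosen - rejected).mean()

    # Here the "chosen" completion could be the reconstructed (granularity-corrected)
    # topic list and the "rejected" one the raw LLM output.
    loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                    torch.tensor([-13.0]), torch.tensor([-14.8]))
    print(float(loss))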

[273]  arXiv:2405.00614 [pdf, other]
Title: Multigroup Robustness
Subjects: Machine Learning (cs.LG)

To address the shortcomings of real-world datasets, robust learning algorithms have been designed to overcome arbitrary and indiscriminate data corruption. However, practical processes of gathering data may lead to patterns of data corruption that are localized to specific partitions of the training dataset. Motivated by critical applications where the learned model is deployed to make predictions about people from a rich collection of overlapping subpopulations, we initiate the study of multigroup robust algorithms whose robustness guarantees for each subpopulation only degrade with the amount of data corruption inside that subpopulation. When the data corruption is not distributed uniformly over subpopulations, our algorithms provide more meaningful robustness guarantees than standard guarantees that are oblivious to how the data corruption and the affected subpopulations are related. Our techniques establish a new connection between multigroup fairness and robustness.

[274]  arXiv:2405.00616 [pdf, other]
Title: An Expectation-Maximization Relaxed Method for Privacy Funnel
Subjects: Information Theory (cs.IT)

The privacy funnel (PF) provides a framework for privacy-preserving data release, where the goal is to release useful data while limiting the exposure of associated sensitive information. This framework has garnered significant interest due to its broad applications in characterizing the privacy-utility tradeoff. Hence, there is strong motivation to develop numerical methods with high precision and theoretical convergence guarantees. In this paper, we propose a novel relaxation of the PF objective, based on Jensen's inequality, for computing the PF problem. We prove that this relaxed model is equivalent to the original in terms of optimal solutions and optimal values. Based on the proposed model, we develop an accurate algorithm that involves only closed-form iterations. The convergence of the algorithm is theoretically guaranteed through a descent estimate and Pinsker's inequality. Numerical results demonstrate the effectiveness of the proposed algorithm.
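
For context, the privacy funnel is usually stated as follows for a sensitive variable $S$, useful data $X$, and a release $Z$ generated through a channel $p(z \mid x)$, so that $S - X - Z$ forms a Markov chain; this is the textbook formulation, not necessarily the exact variant relaxed in the paper.

    \min_{p(z \mid x)\,:\; I(X;Z) \ge R} \; I(S;Z)
    \qquad \text{or, in Lagrangian form,} \qquad
    \min_{p(z \mid x)} \; I(S;Z) - \beta\, I(X;Z), \quad \beta \ge 0 .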

[275]  arXiv:2405.00620 [pdf, other]
Title: Lane Segmentation Refinement with Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The lane graph is a key component for building high-definition (HD) maps and crucial for downstream tasks such as autonomous driving or navigation planning. Previously, He et al. (2022) explored the extraction of the lane-level graph from aerial imagery using a segmentation-based approach. However, segmentation networks struggle to produce perfect segmentation masks, resulting in inaccurate lane graph extraction. We explore additional enhancements to refine this segmentation-based approach and extend it with a diffusion probabilistic model (DPM) component. This combination further improves the GEO F1 and TOPO F1 scores, which are crucial indicators of the quality of a lane graph, in the undirected graph in non-intersection areas. We conduct experiments on a publicly available dataset, demonstrating that our method outperforms the previous approach, particularly in enhancing the connectivity of such a graph, as measured by the TOPO F1 score. Moreover, we perform ablation studies on the individual components of our method to understand their contribution and evaluate their effectiveness.

[276]  arXiv:2405.00622 [pdf, other]
Title: Causal Evaluation of Language Models
Comments: 315 pages, 230 figures, 21 tables. Project website: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive benchmark for evaluating the causal reasoning capabilities of language models. First, we propose the CaLM framework, which establishes a foundational taxonomy consisting of four modules: causal target (i.e., what to evaluate), adaptation (i.e., how to obtain the results), metric (i.e., how to measure the results), and error (i.e., how to analyze the bad results). This taxonomy defines a broad evaluation design space while systematically selecting criteria and priorities. Second, we compose the CaLM dataset, comprising 126,334 data samples, to provide curated sets of causal targets, adaptations, metrics, and errors, offering extensive coverage for diverse research pursuits. Third, we conduct an extensive evaluation of 28 leading language models on a core set of 92 causal targets, 9 adaptations, 7 metrics, and 12 error types. Fourth, we perform detailed analyses of the evaluation results across various dimensions (e.g., adaptation, scale). Fifth, we present 50 high-level empirical findings across 9 dimensions (e.g., model), providing valuable guidance for future language model development. Finally, we develop a multifaceted platform, including a website, leaderboards, datasets, and toolkits, to support scalable and adaptable assessments. We envision CaLM as an ever-evolving benchmark for the community, systematically updated with new causal targets, adaptations, models, metrics, and error types to reflect ongoing research advancements. Project website is at https://opencausalab.github.io/CaLM.

[277]  arXiv:2405.00623 [pdf, other]
Title: "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust
Comments: Accepted to FAccT 2024. This version includes the appendix
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Widely deployed large language models (LLMs) can produce convincing yet incorrect outputs, potentially misleading users who may rely on them as if they were correct. To reduce such overreliance, there have been calls for LLMs to communicate their uncertainty to end users. However, there has been little empirical work examining how users perceive and act upon LLMs' expressions of uncertainty. We explore this question through a large-scale, pre-registered, human-subject experiment (N=404) in which participants answer medical questions with or without access to responses from a fictional LLM-infused search engine. Using both behavioral and self-reported measures, we examine how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions (e.g., "I'm not sure, but...") decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. An exploratory analysis suggests that this increase can be attributed to reduced (but not fully eliminated) overreliance on incorrect answers. While we observe similar effects for uncertainty expressed from a general perspective (e.g., "It's not clear, but..."), these effects are weaker and not statistically significant. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters. This highlights the importance of user testing before deploying LLMs at scale.

[278]  arXiv:2405.00625 [pdf, other]
Title: Queue-based Eco-Driving at Roundabouts with Reinforcement Learning
Subjects: Machine Learning (cs.LG)

We address eco-driving at roundabouts in mixed traffic to enhance traffic flow and efficiency in urban areas. The aim is to proactively optimize the speed of automated or non-automated connected vehicles (CVs), ensuring both an efficient approach and a smooth entry into roundabouts. We incorporate the traffic situation ahead, i.e., preceding vehicles and waiting queues. Further, we develop two approaches: a rule-based and a Reinforcement Learning (RL)-based eco-driving system, both using the approach link and information from conflicting CVs for speed optimization. A fair comparison of the rule-based and RL-based approaches is performed to explore RL as a viable alternative to classical optimization. Results show that both approaches outperform the baseline. Improvements increase significantly with growing traffic volumes, with the best average results obtained at high volumes. Near capacity, performance deteriorates, indicating limited applicability at capacity limits. When examining different CV penetration rates, performance declines, but substantial gains are still achieved at lower CV rates. RL agents can discover effective policies for speed optimization in dynamic roundabout settings, but they do not offer a substantial advantage over classical approaches, especially at higher traffic volumes or lower CV penetration rates.

[279]  arXiv:2405.00627 [pdf, other]
Title: Koopman-based Deep Learning for Nonlinear System Estimation
Comments: 11 pages
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Nonlinear differential equations arise as models of fluid flow, spiking neurons, and many other systems of interest in the real world. A common feature of these systems is that their behavior is difficult to describe exactly, and unmodeled dynamics invariably make precise prediction challenging. In many cases the models exhibit extremely complicated behavior due to bifurcations and chaotic regimes. In this paper, we present a novel data-driven linear estimator that uses Koopman operator theory to extract finite-dimensional representations of complex nonlinear systems. The extracted model is used together with a deep reinforcement learning network that learns the optimal stepwise actions to predict future states of the original nonlinear system. Our estimator is also adaptive to diffeomorphic transformations of the nonlinear system, which enables transfer learning: state estimates of the transformed system can be computed without relearning from scratch.
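
As background for the Koopman-based construction, the sketch below shows the standard extended dynamic mode decomposition (EDMD) step of lifting snapshot pairs with a fixed dictionary and fitting a finite-dimensional Koopman matrix by least squares; the dictionary and dynamics are toy assumptions, and this is not the paper's learned deep estimator.

    # Toy EDMD sketch: lift states, then solve a least-squares problem for K.
    import numpy as np

    def lift(x):
        # Illustrative dictionary: monomials up to degree 2 of a 2-D state.
        x1, x2 = x[..., 0], x[..., 1]
        return np.stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2], axis=-1)

    def fit_koopman(X, Y):
        """X, Y: (N, 2) snapshot pairs with Y[i] = F(X[i]); returns K on the lifted space."""
        Phi_x, Phi_y = lift(X), lift(Y)
        # K minimizes ||Phi_x K - Phi_y||_F (solved via least squares).
        return np.linalg.lstsq(Phi_x, Phi_y, rcond=None)[0]

    X = np.random.randn(500, 2)
    Y = X + 0.01 * np.column_stack([X[:, 1], -np.sin(X[:, 0])])   # toy pendulum-like map
    K = fit_koopman(X, Y)
    x_next_lifted = lift(X[:1]) @ K     # one-step prediction in lifted coordinates
    print(x_next_lifted[:, 1:3])        # read back (x1, x2) from the dictionary slots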

[280]  arXiv:2405.00629 [pdf, other]
Title: HUGO -- Highlighting Unseen Grid Options: Combining Deep Reinforcement Learning with a Heuristic Target Topology Approach
Comments: 12 pages + 2 pages references, 9 Figures, submission planed in Sustainable Energy, Grids and Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

With the growth of Renewable Energy (RE) generation, the operation of power grids has become increasingly complex. One solution is automated grid operation, where Deep Reinforcement Learning (DRL) has repeatedly shown significant potential in Learning to Run a Power Network (L2RPN) challenges. However, most existing DRL algorithms restrict topology optimization to individual actions at the substation level. In contrast, this paper takes a more holistic approach by proposing specific Target Topologies (TTs) as actions, selected on the basis of their robustness. We present a search algorithm to find the TTs and upgrade our previously developed DRL agent CurriculumAgent (CAgent) to a novel topology agent. Compared with the previous CAgent, the upgraded agent increases its score significantly, by 10%, and achieves a 25% better median survival when the TTs are included. Further analysis shows that almost all TTs are close to the base topology, which explains their robustness.

[281]  arXiv:2405.00630 [pdf, other]
Title: Depth Priors in Removal Neural Radiance Fields
Authors: Zhihao Guo, Peng Wang
Comments: 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Neural Radiance Fields (NeRF) have shown impressive results in 3D reconstruction and generating novel views. A key challenge within NeRF is the editing of reconstructed scenes, such as object removal, which requires maintaining consistency across multiple views and ensuring high-quality synthesised perspectives. Previous studies have incorporated depth priors, typically from LiDAR or sparse depth measurements provided by COLMAP, to improve the performance of object removal in NeRF. However, these methods are either costly or time-consuming. In this paper, we propose a novel approach that integrates monocular depth estimates with NeRF-based object removal models to significantly reduce time consumption and enhance the robustness and quality of scene generation and object removal. We conducted a thorough evaluation of COLMAP's dense depth reconstruction on the KITTI dataset to verify its accuracy in depth map generation. Our findings suggest that COLMAP can serve as an effective alternative to a ground truth depth map where such information is missing or costly to obtain. Additionally, we integrated various monocular depth estimation methods into the removal NeRF model, i.e., SpinNeRF, to assess their capacity to improve object removal performance. Our experimental results highlight the potential of monocular depth estimation to substantially improve NeRF applications.

[282]  arXiv:2405.00631 [pdf, other]
Title: Deep Metric Learning-Based Out-of-Distribution Detection with Synthetic Outlier Exposure
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a novel approach that combines deep metric learning and synthetic data generation using diffusion models for out-of-distribution (OOD) detection. One popular approach for OOD detection is outlier exposure, where models are trained on a mixture of in-distribution (ID) samples and "seen" OOD samples. For the OOD samples, the model is trained to minimize the KL divergence between its output probability and the uniform distribution while correctly classifying the ID data. We propose a label-mixup approach to generate synthetic OOD data using Denoising Diffusion Probabilistic Models (DDPMs), and we additionally explore recent advancements in metric learning to train our models.
In the experiments, we found that metric learning-based loss functions perform better than the softmax loss. Furthermore, the baseline models (including softmax and metric learning) show a significant improvement when trained with the generated OOD data. Our approach outperforms strong baselines on conventional OOD detection metrics.
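
A minimal sketch of the outlier-exposure objective described above is given below: standard cross-entropy on in-distribution batches plus a uniform-target term on the (synthetic) OOD batches; the mixing weight and shapes are illustrative assumptions.

    # Sketch of outlier exposure: ID cross-entropy + uniform-target term on OOD data.
    import torch
    import torch.nn.functional as F

    def outlier_exposure_loss(logits_id, labels_id, logits_ood, lam=0.5):
        ce = F.cross_entropy(logits_id, labels_id)        # classify ID samples correctly
        log_probs_ood = F.log_softmax(logits_ood, dim=-1)
        # Cross-entropy against a uniform target; equals KL(uniform || model)
        # up to an additive constant, so minimizing it pushes OOD outputs toward uniform.
        uniform_ce = -log_probs_ood.mean(dim=-1).mean()
        return ce + lam * uniform_ce

    logits_id = torch.randn(8, 10, requires_grad=True)     # e.g., a 10-class ID problem
    logits_ood = torch.randn(8, 10, requires_grad=True)    # logits on DDPM-generated OOD
    loss = outlier_exposure_loss(logits_id, torch.randint(0, 10, (8,)), logits_ood)
    loss.backward()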

[283]  arXiv:2405.00632 [pdf, other]
Title: When Quantization Affects Confidence of Large Language Models?
Comments: Accepted to NAACL 2024 Findings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Recent studies introduced effective compression techniques for Large Language Models (LLMs) via post-training quantization or low-bit weight representation. Although quantized weights offer storage efficiency and allow for faster inference, existing works have indicated that quantization might compromise performance and exacerbate biases in LLMs. This study investigates the confidence and calibration of quantized models, considering factors such as language model type and scale as contributors to quantization loss. Firstly, we reveal that quantization with GPTQ to 4-bit results in a decrease in confidence regarding true labels, with varying impacts observed among different language models. Secondly, we observe fluctuations in the impact on confidence across different scales. Finally, we propose an explanation for quantization loss based on confidence levels, indicating that quantization disproportionately affects samples where the full model exhibited low confidence levels in the first place.
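
The kind of confidence measurement discussed above can be sketched as the mean probability a model assigns to the gold token at each labelled position, compared between the full-precision and the 4-bit quantized model; the function is a generic illustration, not the paper's evaluation code.

    # Generic sketch: probability assigned to the true next token, averaged over positions.
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def true_label_confidence(model, input_ids, label_ids):
        logits = model(input_ids).logits                  # (batch, seq, vocab)
        probs = F.softmax(logits, dim=-1)
        gold = probs.gather(-1, label_ids.unsqueeze(-1)).squeeze(-1)
        return gold.mean().item()

    # conf_full = true_label_confidence(full_model, ids, labels)
    # conf_4bit = true_label_confidence(gptq_4bit_model, ids, labels)
    # The drop conf_full - conf_4bit, broken down by the full model's confidence,
    # is the kind of quantity analysed in the study.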

[284]  arXiv:2405.00637 [pdf, ps, other]
Title: A Distributed Model Identification Algorithm for Multi-Agent Systems
Comments: 6 pages, 4 figures
Subjects: Systems and Control (eess.SY)

In this study, we investigate an agent-based approach to system model identification with an emphasis on power distribution system applications. Departing from the conventional practice of relying on historical data for offline model identification, we adopt an online update approach that uses real-time data, employing the latest data points for gradient computation. This methodology offers two advantages: it greatly reduces the communication network's bandwidth requirements by minimizing the data exchanged at each iteration, and it enables the model to adapt to disturbances in real time. Furthermore, we extend our model identification process from linear frameworks to more complex non-linear convex models. This extension is validated through numerical studies demonstrating improved control performance on a synthetic IEEE test case.

[285]  arXiv:2405.00644 [pdf, other]
Title: ConstrainedZero: Chance-Constrained POMDP Planning using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints
Comments: In Proceedings of the 2024 International Joint Conference on Artificial Intelligence (IJCAI)
Subjects: Artificial Intelligence (cs.AI)

To plan safely in uncertain environments, agents must balance utility with safety constraints. Safe planning problems can be modeled as a chance-constrained partially observable Markov decision process (CC-POMDP) and solutions often use expensive rollouts or heuristics to estimate the optimal value and action-selection policy. This work introduces the ConstrainedZero policy iteration algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy with an additional network head that estimates the failure probability given a belief. This failure probability guides safe action selection during online Monte Carlo tree search (MCTS). To avoid overemphasizing search based on the failure estimates, we introduce $\Delta$-MCTS, which uses adaptive conformal inference to update the failure threshold during planning. The approach is tested on a safety-critical POMDP benchmark, an aircraft collision avoidance system, and the sustainability problem of safe CO$_2$ storage. Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.
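
While the exact bookkeeping of $\Delta$-MCTS is beyond the scope of an abstract, the adaptive-conformal-style update it relies on can be sketched as a running threshold adjustment; the target rate and step size below are illustrative placeholders, not the paper's settings.

    # Illustrative adaptive-conformal-style update of a failure threshold.
    def update_threshold(threshold, observed_failure, target_rate=0.05, step=0.01):
        err = 1.0 if observed_failure else 0.0
        # Raise the threshold after safe outcomes, lower it after failures, so the
        # long-run failure rate tracks the target rate.
        return threshold + step * (target_rate - err)

    threshold = 0.05
    for observed in [False, True, False, False]:       # outcomes of executed actions
        threshold = update_threshold(threshold, observed)
    print(round(threshold, 4))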

[286]  arXiv:2405.00645 [pdf, other]
Title: Gradient-based Automatic Per-Weight Mixed Precision Quantization for Neural Networks On-Chip
Subjects: Machine Learning (cs.LG); Instrumentation and Detectors (physics.ins-det)

Model size and inference speed at deployment time are major challenges in many deep learning applications. A promising strategy to overcome these challenges is quantization. However, straightforward uniform quantization to very low precision can result in significant accuracy loss. Mixed-precision quantization, based on the idea that certain parts of the network can accommodate lower precision than others without compromising performance, offers a potential solution. In this work, we present High Granularity Quantization (HGQ), an innovative quantization-aware training method designed to automatically tune the per-weight and per-activation precision of ultra-low-latency, low-power neural networks to be deployed on FPGAs. We demonstrate that HGQ can outperform existing methods by a substantial margin, achieving a resource reduction of up to a factor of 20 and a latency improvement of a factor of 5 while preserving accuracy.

[287]  arXiv:2405.00646 [pdf, other]
Title: Learning to Compose: Improving Object Centric Learning by Injecting Compositionality
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Learning compositional representations is a key aspect of object-centric learning, as it enables flexible systematic generalization and supports complex visual reasoning. However, most existing approaches rely on an auto-encoding objective, with compositionality imposed only implicitly by architectural or algorithmic bias in the encoder. This misalignment between the auto-encoding objective and learning compositionality often results in a failure to capture meaningful object representations. In this study, we propose a novel objective that explicitly encourages compositionality of the representations. Built upon an existing object-centric learning framework (e.g., slot attention), our method adds the constraint that an arbitrary mixture of object representations from two images should remain valid, enforced by maximizing the likelihood of the composite data. We demonstrate that incorporating our objective into the existing framework consistently improves object-centric learning and enhances robustness to architectural choices.

[288]  arXiv:2405.00648 [pdf, other]
Title: HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models
Subjects: Software Engineering (cs.SE)

Large language models (LLMs) have transformed the landscape of language processing, yet struggle with significant challenges in terms of security, privacy, and the generation of seemingly coherent but factually inaccurate outputs, commonly referred to as hallucinations. Among these challenges, one particularly pressing issue is Fact-Conflicting Hallucination (FCH), where LLMs generate content that directly contradicts established facts. Tackling FCH is a formidable task due to two primary obstacles. First, automating the construction and updating of benchmark datasets is challenging, as current methods rely on static benchmarks that do not cover the diverse range of FCH scenarios. Second, validating the reasoning process behind LLM outputs is inherently complex, especially when intricate logical relations are involved.
In addressing these obstacles, we propose an innovative approach leveraging logic programming to enhance metamorphic testing for detecting Fact-Conflicting Hallucinations (FCH). Our method gathers data from sources like Wikipedia, expands it with logical reasoning to create diverse test cases, assesses LLMs through structured prompts, and validates their coherence using semantic-aware assessment mechanisms. Our method generates test cases and detects hallucinations across six different LLMs spanning nine domains, revealing hallucination rates ranging from 24.7% to 59.8%. Key observations indicate that LLMs encounter challenges, particularly with temporal concepts, handling out-of-distribution knowledge, and exhibiting deficiencies in logical reasoning capabilities. The outcomes underscore the efficacy of logic-based test cases generated by our tool in both triggering and identifying hallucinations. These findings underscore the imperative for ongoing collaborative endeavors within the community to detect and address LLM hallucinations.

[289]  arXiv:2405.00650 [pdf, other]
Title: Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models
Comments: 10 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Incorporating human-perceptual intelligence into model training has been shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or through human perception-related components of loss functions. Despite their successes, a vital, but seemingly neglected, aspect of any saliency-based training is the level of salience granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) necessary to find a balance between reaping the full benefits of human saliency and the cost of its collection. In this paper, we explore several different levels of salience granularity and demonstrate that increased generalization capabilities of PAD and synthetic face detection can be achieved by using simple yet effective saliency post-processing techniques across several different CNNs.

[290]  arXiv:2405.00657 [pdf, other]
Title: RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document Abstractive Summarization
Comments: NAACL 2024 Main & Long Conference Paper (Oral Presentation)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

For long document summarization, discourse structure is important to discern the key content of the text and the differences in importance level between sentences. Unfortunately, the integration of rhetorical structure theory (RST) into parameter-efficient fine-tuning strategies for long document summarization remains unexplored. Therefore, this paper introduces RST-LoRA and proposes four RST-aware variants to explicitly incorporate RST into the LoRA model. Our empirical evaluation demonstrates that incorporating the type and uncertainty of rhetorical relations can complementarily enhance the performance of LoRA in summarization tasks. Furthermore, the best-performing variant we introduced outperforms the vanilla LoRA and full-parameter fine-tuning models, as confirmed by multiple automatic and human evaluations, and even surpasses previous state-of-the-art methods.

[291]  arXiv:2405.00659 [pdf, other]
Title: NLU-STR at SemEval-2024 Task 1: Generative-based Augmentation and Encoder-based Scoring for Semantic Textual Relatedness
Subjects: Computation and Language (cs.CL)

Semantic textual relatedness is a broader concept than semantic similarity. It measures the extent to which two chunks of text convey similar meaning or topics, or share related concepts or contexts. This notion of relatedness can be applied in various applications, such as document clustering and summarization. SemRel-2024, a shared task in SemEval-2024, aims at reducing the gap in the semantic relatedness task by providing datasets for fourteen languages and dialects, including Arabic. This paper reports on our participation in Track A (Algerian and Moroccan dialects) and Track B (Modern Standard Arabic). A BERT-based model is augmented and fine-tuned for regression scoring in the supervised track (A), while BERT-based cosine similarity is employed for the unsupervised track (B). Our system ranked 1st in SemRel-2024 for MSA with a Spearman correlation score of 0.49. We ranked 5th for Moroccan and 12th for Algerian with scores of 0.83 and 0.53, respectively.
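
A minimal sketch of the unsupervised Track B setup described above is shown below, scoring an Arabic sentence pair by the cosine similarity of BERT-style sentence embeddings; the multilingual checkpoint name is an illustrative assumption, not necessarily the one used by the system.

    # Generic sketch: cosine similarity of sentence embeddings as a relatedness score.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
    pair = ["النص الأول يتحدث عن حالة الطقس اليوم",
            "تقرير موجز عن طقس هذا اليوم"]
    embeddings = model.encode(pair, convert_to_tensor=True)
    score = util.cos_sim(embeddings[0], embeddings[1]).item()   # in [-1, 1]
    print(round(score, 3))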

[292]  arXiv:2405.00662 [pdf, other]
Title: No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
Comments: Code and run histories are available at this https URL
Subjects: Machine Learning (cs.LG)

Reinforcement learning (RL) is inherently rife with non-stationarity since the states and rewards the agent observes during training depend on its changing policy. Therefore, networks in deep RL must be capable of adapting to new observations and fitting new targets. However, previous works have observed that networks in off-policy deep value-based methods exhibit a decrease in representation rank, often correlated with an inability to continue learning or a collapse in performance. Although this phenomenon has generally been attributed to neural network learning under non-stationarity, it has been overlooked in on-policy policy optimization methods which are often thought capable of training indefinitely. In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments, revealing that PPO agents are also affected by feature rank deterioration and loss of plasticity. We show that this is aggravated with stronger non-stationarity, ultimately driving the actor's performance to collapse, regardless of the performance of the critic. We draw connections between representation collapse, performance collapse, and trust region issues in PPO, and present Proximal Feature Optimization (PFO), a novel auxiliary loss, that along with other interventions shows that regularizing the representation dynamics improves the performance of PPO agents.
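
The representation-rank diagnostic mentioned above can be sketched as an effective-rank computation on a batch of penultimate-layer features; the variance threshold is an illustrative choice, and this is neither the paper's exact implementation nor the PFO loss itself.

    # Sketch: effective rank of a feature batch, tracked over training to detect collapse.
    import torch

    def effective_rank(features: torch.Tensor, delta: float = 0.01) -> int:
        """features: (batch, dim) activations from the policy network trunk."""
        s = torch.linalg.svdvals(features)
        cumulative = torch.cumsum(s, dim=0) / s.sum()
        # Smallest k whose top-k singular values capture a (1 - delta) fraction of the mass.
        return int((cumulative < 1.0 - delta).sum().item()) + 1

    feats = torch.randn(256, 64) @ torch.randn(64, 64)    # stand-in for PPO features
    print(effective_rank(feats))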

[293]  arXiv:2405.00664 [pdf, other]
Title: Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This study presents a targeted model editing analysis focused on the latest large language model, Llama-3. We explore the efficacy of popular model editing techniques - ROME, MEMIT, and EMMET - which are designed for precise layer interventions. We identify the most effective layers for targeted edits through an evaluation that encompasses up to 4096 edits across three distinct strategies: sequential editing, batch editing, and a hybrid approach we call sequential-batch editing. Our findings indicate that increasing edit batch sizes may degrade model performance more significantly than using smaller edit batches sequentially for an equal number of edits. We therefore argue that sequential model editing is an important component for scaling model editing methods, and that future research should focus on methods that combine batched and sequential editing. This observation suggests a potential limitation in current model editing methods, which push towards larger edit batch sizes, and we hope it paves the way for future investigations into optimizing batch sizes and model editing performance.

[294]  arXiv:2405.00665 [pdf, other]
Title: Optimizing Profitability in Timely Gossip Networks
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

We consider a communication system where a group of users, interconnected in a bidirectional gossip network, wishes to follow a time-varying source, e.g., updates on an event, in real time. The users wish to maintain their expected version ages below a threshold, and can either rely on gossip from their neighbors or directly subscribe to a server publishing about the event, if the former option does not meet the timeliness requirements. The server wishes to maximize its profit by increasing subscriptions from users and minimizing the event sampling frequency to reduce costs. This leads to a Stackelberg game between the server and the users, where the server is the leader, deciding its sampling frequency, and the users are the followers, deciding their subscription strategies. We investigate equilibrium strategies for low-connectivity and high-connectivity topologies.

[295]  arXiv:2405.00666 [pdf, other]
Title: RGB$\leftrightarrow$X: Image decomposition and synthesis using material- and lighting-aware diffusion models
Journal-ref: SIGGRAPH Conference Papers '24, July 27-August 1, 2024, Denver, CO, USA
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

The three areas of realistic forward rendering, per-pixel inverse rendering, and generative image synthesis may seem like separate and unrelated sub-fields of graphics and vision. However, recent work has demonstrated improved estimation of per-pixel intrinsic channels (albedo, roughness, metallicity) based on a diffusion architecture; we call this the RGB$\rightarrow$X problem. We further show that the reverse problem of synthesizing realistic images given intrinsic channels, X$\rightarrow$RGB, can also be addressed in a diffusion framework.
Focusing on the image domain of interior scenes, we introduce an improved diffusion model for RGB$\rightarrow$X, which also estimates lighting, as well as the first diffusion X$\rightarrow$RGB model capable of synthesizing realistic images from (full or partial) intrinsic channels. Our X$\rightarrow$RGB model explores a middle ground between traditional rendering and generative models: we can specify only certain appearance properties that should be followed, and give freedom to the model to hallucinate a plausible version of the rest.
This flexibility makes it possible to use a mix of heterogeneous training datasets, which differ in the available channels. We use multiple existing datasets and extend them with our own synthetic and real data, resulting in a model capable of extracting scene properties better than previous work and of generating highly realistic images of interior scenes.

[296]  arXiv:2405.00670 [pdf, other]
Title: Adapting Pretrained Networks for Image Quality Assessment on High Dynamic Range Displays
Comments: 7 pages, 3 figures, 3 tables. Submitted to Human Vision and Electronic Imaging 2024 (HVEI)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Conventional image quality metrics (IQMs), such as PSNR and SSIM, are designed for perceptually uniform gamma-encoded pixel values and cannot be directly applied to perceptually non-uniform linear high-dynamic-range (HDR) colors. Similarly, most of the available datasets consist of standard-dynamic-range (SDR) images collected in standard and possibly uncontrolled viewing conditions. Popular pre-trained neural networks are likewise intended for SDR inputs, restricting their direct application to HDR content. On the other hand, training HDR models from scratch is challenging due to limited available HDR data. In this work, we explore more effective approaches for training deep learning-based models for image quality assessment (IQA) on HDR data. We leverage networks pre-trained on SDR data (source domain) and re-target these models to HDR (target domain) with additional fine-tuning and domain adaptation. We validate our methods on the available HDR IQA datasets, demonstrating that models trained with our combined recipe outperform previous baselines, converge much quicker, and reliably generalize to HDR inputs.

[297]  arXiv:2405.00672 [pdf, other]
Title: TexSliders: Diffusion-Based Texture Editing in CLIP Space
Comments: SIGGRAPH 2024 Conference Proceedings
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show that they are not directly applicable to textures, since their common underlying approach, manipulating attention maps, is unsuitable for the texture domain. To address this, we propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation. We define editing directions using simple text prompts (e.g., "aged wood" to "new wood") and map these to CLIP image embedding space using a texture prior, with a sampling-based approach that gives us identity-preserving directions in CLIP space. To further improve identity preservation, we project these directions to a CLIP subspace that minimizes identity variations resulting from entangled texture attributes. Our editing pipeline facilitates the creation of arbitrary sliders using natural language prompts only, with no ground-truth annotated data necessary.

[298]  arXiv:2405.00675 [pdf, other]
Title: Self-Play Preference Optimization for Language Model Alignment
Comments: 25 pages, 4 figures, 5 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed \textit{Self-Play Preference Optimization} (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise loss such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset and without any prompt augmentation, by leveraging a pre-trained preference model PairRM with only 0.4B parameters, SPPO can obtain a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win-rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms the (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models.

[299]  arXiv:2405.00676 [pdf, other]
Title: Spectrally Pruned Gaussian Fields with Neural Compensation
Comments: Code: this https URL Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, 3D Gaussian Splatting, as a novel 3D representation, has garnered attention for its fast rendering speed and high rendering quality. However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory. We attribute this high memory footprint to the lack of consideration of the relationships between primitives. In this paper, we propose a memory-efficient Gaussian field named SUNDAE with spectral pruning and neural compensation. On one hand, we construct a graph on the set of Gaussian primitives to model their relationship and design a spectral down-sampling module to prune out primitives while preserving desired signals. On the other hand, to compensate for the quality loss of pruning Gaussians, we exploit a lightweight neural network head to mix splatted features, which effectively compensates for quality losses while capturing the relationship between primitives in its weights. We demonstrate the performance of SUNDAE with extensive results. For example, SUNDAE can achieve 26.80 PSNR at 145 FPS using 104 MB memory while the vanilla Gaussian splatting algorithm achieves 25.60 PSNR at 160 FPS using 523 MB memory, on the Mip-NeRF360 dataset. Codes are publicly available at https://runyiyang.github.io/projects/SUNDAE/.

Cross-lists for Thu, 2 May 24

[300]  arXiv:2405.00012 (cross-list from quant-ph) [pdf, other]
Title: A quantum neural network framework for scalable quantum circuit approximation of unitary matrices
Comments: 58 pages. arXiv admin note: substantial text overlap with arXiv:2304.14096
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)

In this paper, we develop a Lie group theoretic approach for the parametric representation of unitary matrices. This leads to a quantum neural network framework for the quantum circuit approximation of multi-qubit unitary gates. Layers of the neural networks are defined by products of exponentials of certain elements of the Standard Recursive Block Basis, which we introduce as an alternative to the Pauli string basis for the matrix algebra of complex matrices of order $2^n$. The recursive construction of the neural networks implies that the quantum circuit approximation is scalable, i.e., a quantum circuit for an $(n+1)$-qubit unitary can be constructed from the circuit of an $n$-qubit system by adding a few CNOT gates and single-qubit gates.

[301]  arXiv:2405.00036 (cross-list from physics.soc-ph) [pdf, other]
Title: Spatio-temporal load shifting for truly clean computing
Subjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY)

Companies with datacenters are procuring significant amounts of renewable energy to reduce their carbon footprint. There is increasing interest in achieving 24/7 Carbon-Free Energy (CFE) matching in electricity usage, aiming to eliminate all carbon footprints associated with electricity consumption on an hourly basis. However, the variability of renewable energy resources poses significant challenges for achieving this goal. We explore the impact of shifting computing jobs and associated power loads both in time and between datacenter locations. We develop an optimization model to simulate a network of geographically distributed datacenters managed by a company leveraging spatio-temporal load flexibility to achieve 24/7 CFE matching. We isolate three signals relevant for informed use of load flexibility: varying average quality of renewable energy resources, low correlation between wind power generation over long distances due to different weather conditions, and lags in the solar radiation peak due to Earth's rotation. We illustrate that the location of datacenters and the time of year affect which signal drives an effective load-shaping strategy. The energy procurement and load-shifting decisions based on informed use of these signals facilitate the resource-efficiency and cost-effectiveness of clean computing -- the costs of 24/7 CFE are reduced by 1.29$\pm$0.07 EUR/MWh for every additional percentage of flexible load. We provide practical guidelines on how companies with datacenters can leverage spatio-temporal load flexibility for truly clean computing. Our results and the open-source optimization model can also be useful for a broader variety of companies with flexible loads and an interest in eliminating their carbon footprint.

[302]  arXiv:2405.00041 (cross-list from physics.soc-ph) [pdf, other]
Title: A theory of best choice selection through objective arguments grounded in Linear Response Theory concepts
Comments: 25 pages, 2 figures, 5 tables, 72 references; accepted in a Special Issue of the journal Physics in honor of Serge Galam for his 70th birthday and 40 years of Sociophysics
Subjects: Physics and Society (physics.soc-ph); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)

In this paper, we propose how to use objective arguments grounded in statistical mechanics concepts in order to obtain, after aggregation, a single number that allows one to rank "agents", "opinions", etc., all defined in a very broad sense. We target any process that should a priori demand, or lead to, some consensus in order to attain the presumably best choice among many possibilities. To make the framework precise, we discuss previous attempts, recalling trivial "means of scores" (weighted or not), the Condorcet paradox, TOPSIS, and so on. We demonstrate through geometrical arguments on a toy example with 4 criteria that the pre-selected order of criteria in previous attempts affects the final result, even though such an ordering might be unjustified. Thus, we base our "best choice theory" on linear response theory in statistical mechanics: we indicate that one should calculate correlation functions between all possible choice evaluations, thereby avoiding an arbitrarily ordered set of criteria. We justify this point through an example with 6 possible criteria. Applications in many fields are suggested. Besides, two toy models, serving as practical examples and illustrative arguments, are given in an Appendix.

[303]  arXiv:2405.00065 (cross-list from math.OC) [pdf, other]
Title: From Linear to Linearizable Optimization: A Novel Framework with Applications to Stationary and Non-stationary DR-submodular Optimization
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper introduces the notion of upper linearizable/quadratizable functions, a class that extends concavity and DR-submodularity in various settings, including monotone and non-monotone cases over different convex sets. A general meta-algorithm is devised to convert algorithms for linear/quadratic maximization into ones that optimize upper quadratizable functions, offering a unified approach to tackling concave and DR-submodular optimization problems. The paper extends these results to multiple feedback settings, facilitating conversions between semi-bandit/first-order feedback and bandit/zeroth-order feedback, as well as between first/zeroth-order feedback and semi-bandit/bandit feedback. Leveraging this framework, new projection-free algorithms are derived using Follow The Perturbed Leader (FTPL) and other algorithms as base algorithms for linear/convex optimization, improving upon state-of-the-art results in various cases. Dynamic and adaptive regret guarantees are obtained for DR-submodular maximization, marking the first algorithms to achieve such guarantees in these settings. Notably, the paper achieves these advancements with fewer assumptions compared to existing state-of-the-art results, underscoring its broad applicability and theoretical contributions to non-convex optimization.

[304]  arXiv:2405.00070 (cross-list from q-bio.QM) [pdf, other]
Title: Bayesian-Guided Generation of Synthetic Microbiomes with Minimized Pathogenicity
Journal-ref: The 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE EMBC), 2024
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI)

Synthetic microbiomes offer new possibilities for modulating microbiota to address the barriers in multidrug resistance (MDR) research. We present a Bayesian optimization approach that enables efficient searching over the space of synthetic microbiome variants to identify candidates predictive of reduced MDR. Microbiome datasets were encoded into a low-dimensional latent space using autoencoders. Sampling from this space allowed generation of synthetic microbiome signatures. Bayesian optimization was then implemented to select variants for biological screening to maximize identification of designs with restricted MDR pathogens based on minimal samples. Four acquisition functions were evaluated: expected improvement, upper confidence bound, Thompson sampling, and probability of improvement. Based on each strategy, synthetic samples were prioritized according to their MDR detection. Expected improvement, upper confidence bound, and probability of improvement consistently produced synthetic microbiome candidates with significantly fewer searches than Thompson sampling. By combining deep latent space mapping and Bayesian learning for efficient guided screening, this study demonstrated the feasibility of creating bespoke synthetic microbiomes with customized MDR profiles.
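
One acquisition step of the kind described above can be sketched as follows, with a Gaussian process surrogate over latent microbiome codes scored by expected improvement; the dimensions, data, and minimisation convention are illustrative assumptions rather than the study's actual pipeline.

    # Sketch: expected-improvement acquisition over a latent space (minimising MDR burden).
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def expected_improvement(mu, sigma, best_y, xi=0.01):
        improve = best_y - mu - xi            # improvement when predictions undercut the best
        z = improve / np.maximum(sigma, 1e-9)
        return improve * norm.cdf(z) + sigma * norm.pdf(z)

    rng = np.random.default_rng(0)
    Z_obs = rng.normal(size=(20, 8))          # latent codes already screened
    y_obs = rng.normal(size=20)               # measured MDR scores (lower is better)
    gp = GaussianProcessRegressor().fit(Z_obs, y_obs)

    Z_cand = rng.normal(size=(500, 8))        # candidates to decode into microbiomes
    mu, sigma = gp.predict(Z_cand, return_std=True)
    best_candidate = Z_cand[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]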

[305]  arXiv:2405.00082 (cross-list from quant-ph) [pdf, other]
Title: Structure learning of Hamiltonians from real-time evolution
Comments: 50 pages
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

We initiate the study of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i} Ht}$ for an unknown local Hamiltonian $H = \sum_{a = 1}^m \lambda_a E_a$ on $n$ qubits, the goal is to recover $H$. This problem is already well-studied under the assumption that the interaction terms, $E_a$, are given, and only the interaction strengths, $\lambda_a$, are unknown. But is it possible to learn a local Hamiltonian without prior knowledge of its interaction structure?
We present a new, general approach to Hamiltonian learning that not only solves the challenging structure learning variant, but also resolves other open questions in the area, all while achieving the gold standard of Heisenberg-limited scaling. In particular, our algorithm recovers the Hamiltonian to $\varepsilon$ error with an evolution time scaling with $1/\varepsilon$, and has the following appealing properties: (1) it does not need to know the Hamiltonian terms; (2) it works beyond the short-range setting, extending to any Hamiltonian $H$ where the sum of terms interacting with a qubit has bounded norm; (3) it evolves according to $H$ in constant time $t$ increments, thus achieving constant time resolution. To our knowledge, no prior algorithm with Heisenberg-limited scaling existed with even one of these properties. As an application, we can also learn Hamiltonians exhibiting power-law decay up to accuracy $\varepsilon$ with total evolution time beating the standard limit of $1/\varepsilon^2$.

[306]  arXiv:2405.00105 (cross-list from quant-ph) [pdf, other]
Title: Quantum Doeblin coefficients: A simple upper bound on contraction coefficients
Authors: Christoph Hirche
Comments: 15 pages, 6 figures. Short version accepted at ISIT 2024
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

Contraction coefficients give a quantitative strengthening of the data processing inequality. As such, they have many natural applications whenever closer analysis of information processing is required. However, it is often challenging to calculate these coefficients. As a remedy we discuss a quantum generalization of Doeblin coefficients. These give an efficiently computable upper bound on many contraction coefficients. We prove several properties and discuss generalizations and applications. In particular, we give additional stronger bounds for PPT channels and introduce reverse Doeblin coefficients that bound certain expansion coefficients.

[307]  arXiv:2405.00122 (cross-list from math.OC) [pdf, ps, other]
Title: An enhanced POSTA based on Nelder-Mead simplex search and quadratic interpolation
Authors: Tianyu Liu
Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE)

The state transition algorithm (STA) is a metaheuristic method for global optimization. Recently, a modified STA named the parameter optimal state transition algorithm (POSTA) was proposed. In POSTA, the performance of the expansion, rotation, and axesion operators is optimized through a parameter selection mechanism. However, due to insufficient utilization of historical information, POSTA still suffers from slow convergence and low solution accuracy on certain problems. To make better use of the historical information, Nelder-Mead (NM) simplex search and quadratic interpolation (QI) are integrated into POSTA. The enhanced POSTA is tested on 14 benchmark functions in 20-D, 30-D, and 50-D search spaces. An experimental comparison with several competitive metaheuristic methods demonstrates the effectiveness of the proposed method.

[308]  arXiv:2405.00130 (cross-list from eess.IV) [pdf, other]
Title: A Flexible 2.5D Medical Image Segmentation Approach with In-Slice and Cross-Slice Attention
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is feasible, it fails to capture the spatial relationships between slices. On the other hand, 3D models face challenges such as resolution inconsistencies in 2.5D images, along with computational complexity and susceptibility to overfitting when trained with limited data. In this context, 2.5D models, which capture inter-slice correlations using only 2D neural networks, emerge as a promising solution due to their reduced computational demand and simplicity in implementation. In this paper, we introduce CSA-Net, a flexible 2.5D segmentation model capable of processing 2.5D images with an arbitrary number of slices through an innovative Cross-Slice Attention (CSA) module. This module uses the cross-slice attention mechanism to effectively capture 3D spatial information by learning long-range dependencies between the center slice (for segmentation) and its neighboring slices. Moreover, CSA-Net utilizes the self-attention mechanism to understand correlations among pixels within the center slice. We evaluated CSA-Net on three 2.5D segmentation tasks: (1) multi-class brain MRI segmentation, (2) binary prostate MRI segmentation, and (3) multi-class prostate MRI segmentation. CSA-Net outperformed leading 2D and 2.5D segmentation methods across all three tasks, demonstrating its efficacy and superiority. Our code is publicly available at https://github.com/mirthAI/CSA-Net.
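
In the spirit of the CSA module, cross-slice attention can be sketched as the centre-slice tokens attending over tokens from the neighbouring slices; the layer sizes and shapes below are illustrative and do not reproduce the CSA-Net definition.

    # Sketch of a cross-slice attention block: centre-slice queries, neighbour-slice keys/values.
    import torch
    import torch.nn as nn

    class CrossSliceAttention(nn.Module):
        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, center, neighbors):
            # center:    (B, H*W, C) tokens of the slice to segment
            # neighbors: (B, S*H*W, C) tokens from S adjacent slices
            out, _ = self.attn(query=center, key=neighbors, value=neighbors)
            return center + out              # residual fusion of through-plane context

    csa = CrossSliceAttention(dim=64)
    center = torch.randn(2, 32 * 32, 64)
    neighbors = torch.randn(2, 4 * 32 * 32, 64)      # e.g., 4 neighbouring slices
    print(csa(center, neighbors).shape)              # torch.Size([2, 1024, 64])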

[309]  arXiv:2405.00146 (cross-list from quant-ph) [pdf, other]
Title: Averting multi-qubit burst errors in surface code magic state factories
Comments: 13 pages, 12 figures
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

Fault-tolerant quantum computation relies on the assumption of time-invariant, sufficiently low physical error rates. However, current superconducting quantum computers suffer from frequent disruptive noise events, including cosmic ray impacts and shifting two-level system defects. Several methods have been proposed to mitigate these issues in software, but they add large overheads in terms of physical qubit count, as it is difficult to preserve logical information through burst error events. We focus on mitigating multi-qubit burst errors in magic state factories, which are expected to comprise up to 95% of the space cost of future quantum programs. Our key insight is that magic state factories do not need to preserve logical information over time; once we detect an increase in local physical error rates, we can simply turn off parts of the factory that are affected, re-map the factory to the new chip geometry, and continue operating. This is much more efficient than previous, more general methods, and is resilient even under many simultaneous impact events. Using precise physical noise models, we show an efficient ray detection method and evaluate our strategy in different noise regimes. Compared to existing baselines, we find reductions in ray-induced overheads by several orders of magnitude, reducing total qubit-cycle cost by a geometric mean of 6.5x to 13.9x depending on the noise model. This work reduces the burden on hardware by providing low-overhead software mitigation of these errors.

[310]  arXiv:2405.00158 (cross-list from stat.ME) [pdf, other]
Title: BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking and Hierarchical Stacking in Python
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

Averaging predictions from multiple competing inferential models frequently outperforms predictions from any single model, provided that models are optimally weighted to maximize predictive performance. This is particularly the case in so-called $\mathcal{M}$-open settings where the true model is not in the set of candidate models, and may be neither mathematically reifiable nor known precisely. This practice of model averaging has a rich history in statistics and machine learning, and there are currently a number of methods to estimate the weights for constructing model-averaged predictive distributions. Nonetheless, there are few existing software packages that can estimate model weights from the full variety of methods available, and none that blend model predictions into a coherent predictive distribution according to the estimated weights. In this paper, we introduce the BayesBlend Python package, which provides a user-friendly programming interface to estimate weights and blend multiple (Bayesian) models' predictive distributions. BayesBlend implements pseudo-Bayesian model averaging, stacking and, uniquely, hierarchical Bayesian stacking to estimate model weights. We demonstrate the usage of BayesBlend with examples of insurance loss modeling.
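
The pseudo-Bayesian model averaging that the package implements can be illustrated generically: weight each model by its exponentiated ELPD estimate, then blend the pointwise predictive densities. The numpy sketch below is the textbook construction under assumed log-density values, not the BayesBlend API.

# Generic pseudo-BMA sketch (not the BayesBlend API): weight models by
# exponentiated expected log pointwise predictive density (ELPD), then blend.
import numpy as np

# log pointwise predictive densities, shape (n_models, n_observations); assumed given
lppd = np.array([[-1.2, -0.8, -1.5, -0.9],
                 [-1.0, -1.1, -1.3, -1.2]])

elpd = lppd.sum(axis=1)                          # ELPD estimate per model
w = np.exp(elpd - elpd.max())                    # stabilized exponentiation
w /= w.sum()                                     # pseudo-BMA weights

# blend pointwise predictive densities into a single predictive distribution
blended = np.log(np.einsum("m,mn->n", w, np.exp(lppd)))
print(w, blended)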

[311]  arXiv:2405.00162 (cross-list from math.OC) [pdf, other]
Title: Real Stability and Log Concavity are coNP-Complete
Authors: Tracy Chin
Comments: 21 pages, 1 figure
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Combinatorics (math.CO)

Real-stable, Lorentzian, and log-concave polynomials are well-studied classes of polynomials, and have been powerful tools in resolving several conjectures. We show that the problems of deciding whether a polynomial of fixed degree is real stable or log concave are coNP-complete. On the other hand, while all homogeneous real-stable polynomials are Lorentzian and all Lorentzian polynomials are log concave on the positive orthant, the problem of deciding whether a polynomial of fixed degree is Lorentzian can be solved in polynomial time.

[312]  arXiv:2405.00222 (cross-list from quant-ph) [pdf, other]
Title: Optimized Distribution of Entanglement Graph States in Quantum Networks
Comments: 11 pages, 13 figures
Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI)

Building large-scale quantum computers, essential to demonstrating quantum advantage, is a key challenge. Quantum Networks (QNs) can help address this challenge by enabling the construction of large, robust, and more capable quantum computing platforms by connecting smaller quantum computers. Moreover, unlike classical systems, QNs can enable fully secured long-distance communication. Thus, quantum networks lie at the heart of the success of future quantum information technologies. In quantum networks, multipartite entangled states distributed over the network help implement and support many quantum network applications for communications, sensing, and computing. Our work focuses on developing optimal techniques to generate and distribute multipartite entanglement states efficiently. Prior works on generating general multipartite entanglement states have focused on the objective of minimizing the number of maximally entangled pairs (EPs) while ignoring the heterogeneity of the network nodes and links as well as the stochastic nature of underlying processes. In this work, we develop a hypergraph based linear programming framework that delivers optimal (under certain assumptions) generation schemes for general multipartite entanglement represented by graph states, under the network resources, decoherence, and fidelity constraints, while considering the stochasticity of the underlying processes. We illustrate our technique by developing generation schemes for the special cases of path and tree graph states, and discuss optimized generation schemes for more general classes of graph states. Using extensive simulations over a quantum network simulator (NetSquid), we demonstrate the effectiveness of our developed techniques and show that they outperform prior known schemes by up to orders of magnitude.

[313]  arXiv:2405.00230 (cross-list from math.OC) [pdf, other]
Title: A decomposition-based approach for large-scale pickup and delivery problems
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

With the advent of self-driving cars, experts envision autonomous mobility-on-demand services in the near future to cope with overloaded transportation systems in cities worldwide. Efficient operations are imperative to unlock such a system's maximum improvement potential. Existing approaches either consider a narrow planning horizon or ignore essential characteristics of the underlying problem. In this paper, we develop an algorithmic framework that allows the study of very large-scale pickup and delivery routing problems with more than 20 thousand requests, which arise in the context of integrated request pooling and vehicle-to-request dispatching. We conduct a computational study and present comparative results showing the characteristics of the developed approaches. Furthermore, we apply our algorithm to related benchmark instances from the literature to show its efficacy. Finally, we solve very large-scale instances and derive insights on upper-bound improvements regarding fleet sizing and customer delay acceptance from a practical perspective.

[314]  arXiv:2405.00239 (cross-list from eess.IV) [pdf, other]
Title: IgCONDA-PET: Implicitly-Guided Counterfactual Diffusion for Detecting Anomalies in PET Images
Comments: 12 pages, 6 figures, 1 table
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Minimizing the need for pixel-level annotated data for training PET anomaly segmentation networks is crucial, particularly due to time and cost constraints related to expert annotations. Current un-/weakly-supervised anomaly detection methods rely on autoencoders or generative adversarial networks trained only on healthy data, although these are more challenging to train. In this work, we present a weakly supervised and Implicitly guided COuNterfactual diffusion model for Detecting Anomalies in PET images, branded as IgCONDA-PET. The training is conditioned on image class labels (healthy vs. unhealthy) along with implicit guidance to generate counterfactuals for an unhealthy image with anomalies. The counterfactual generation process synthesizes the healthy counterpart for a given unhealthy image, and the difference between the two facilitates the identification of anomaly locations. The code is available at: https://github.com/igcondapet/IgCONDA-PET.git

[315]  arXiv:2405.00252 (cross-list from quant-ph) [pdf, other]
Title: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Comments: Our code is provided at this https URL
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Optimization techniques in deep learning are predominantly led by first-order gradient methodologies, such as SGD. However, neural network training can greatly benefit from the rapid convergence characteristics of second-order optimization. Newton's GD stands out in this category by rescaling the gradient using the inverse Hessian. Nevertheless, one of its major bottlenecks is matrix inversion, which is notably time-consuming, requiring $O(N^3)$ time and scaling poorly.
Matrix inversion can be translated into solving a series of linear equations. Given that quantum linear solver algorithms (QLSAs), leveraging the principles of quantum superposition and entanglement, can operate within a $\text{polylog}(N)$ time frame, they present a promising approach with exponential acceleration. Specifically, one of the most recent QLSAs demonstrates a complexity scaling of $O(d\cdot\kappa \log(N\cdot\kappa/\epsilon))$, depending on: {size~$N$, condition number~$\kappa$, error tolerance~$\epsilon$, quantum oracle sparsity~$d$} of the matrix. However, this also implies that their potential exponential advantage may be hindered by certain properties (i.e. $\kappa$ and $d$).
We propose Q-Newton, a hybrid quantum-classical scheduler for accelerating neural network training with Newton's GD. Q-Newton utilizes a streamlined scheduling module that coordinates between quantum and classical linear solvers, by estimating and reducing $\kappa$ and constructing $d$ for the quantum solver.
Our evaluation showcases the potential for Q-Newton to significantly reduce the total training time compared to commonly used optimizers like SGD. We hypothesize a future scenario where the gate time of quantum machines is reduced, possibly realized by attosecond physics. Our evaluation establishes an ambitious and promising target for the evolution of quantum computing.
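
For readers unfamiliar with the classical baseline being accelerated, a minimal sketch of Newton's gradient descent on a least-squares toy problem follows; the objective and dimensions are illustrative assumptions.

# Classical Newton's gradient descent step: rescale the gradient by the inverse
# Hessian, here by solving the linear system H * delta = g (the O(N^3) bottleneck
# that Q-Newton offloads to a quantum linear solver when favorable).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)

w = np.zeros(20)
for _ in range(5):
    g = A.T @ (A @ w - b)            # gradient of 0.5*||Aw - b||^2
    H = A.T @ A                      # Hessian
    w -= np.linalg.solve(H, g)       # Newton step: equivalent to H^{-1} g

print(np.linalg.norm(A.T @ (A @ w - b)))   # gradient norm after a few steps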

[316]  arXiv:2405.00282 (cross-list from math.OC) [pdf, ps, other]
Title: MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games
Authors: Anran Hu, Junzi Zhang
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Reinforcement learning for multi-agent games has attracted lots of attention recently. However, given the challenge of solving Nash equilibria for large population games, existing works with guaranteed polynomial complexities either focus on variants of zero-sum and potential games, or aim at solving (coarse) correlated equilibria, or require access to simulators, or rely on certain assumptions that are hard to verify. This work proposes MF-OML (Mean-Field Occupation-Measure Learning), an online mean-field reinforcement learning algorithm for computing approximate Nash equilibria of large population sequential symmetric games. MF-OML is the first fully polynomial multi-agent reinforcement learning algorithm for provably solving Nash equilibria (up to mean-field approximation gaps that vanish as the number of players $N$ goes to infinity) beyond variants of zero-sum and potential games. When evaluated by the cumulative deviation from Nash equilibria, the algorithm is shown to achieve a high probability regret bound of $\tilde{O}(M^{3/4}+N^{-1/2}M)$ for games with the strong Lasry-Lions monotonicity condition, and a regret bound of $\tilde{O}(M^{11/12}+N^{- 1/6}M)$ for games with only the Lasry-Lions monotonicity condition, where $M$ is the total number of episodes and $N$ is the number of agents of the game. As a byproduct, we also obtain the first tractable globally convergent computational algorithm for computing approximate Nash equilibria of monotone mean-field games.

[317]  arXiv:2405.00304 (cross-list from quant-ph) [pdf, other]
Title: QUACK: Quantum Aligned Centroid Kernel
Comments: Submitted to IEEE International Conference on Quantum Computing and Engineering (QCE) 2024
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

Quantum computing (QC) seems to show potential for application in machine learning (ML). In particular, quantum kernel methods (QKM) exhibit promising properties for use in supervised ML tasks. However, a major disadvantage of kernel methods is their unfavorable quadratic scaling with the number of training samples. Together with the limits imposed by currently available quantum hardware (NISQ devices) with their low qubit coherence times, small number of qubits, and high error rates, the use of QC in ML at an industrially relevant scale is currently impossible. As a small step in improving the potential applications of QKMs, we introduce QUACK, a quantum kernel algorithm whose time complexity scales linearly with the number of samples during training and is independent of the number of training samples in the inference stage. In the training process, only the kernel entries for the samples and the centers of the classes are calculated, i.e. the maximum shape of the kernel for n samples and c classes is (n, c). During training, the parameters of the quantum kernel and the positions of the centroids are optimized iteratively. In the inference stage, for every new sample the circuit is only evaluated for every centroid, i.e. c times. We show that the QUACK algorithm nevertheless provides satisfactory results and can perform at a similar level to classical kernel methods with quadratic scaling during training. In addition, our (simulated) algorithm is able to handle high-dimensional datasets such as MNIST with 784 features without any dimensionality reduction.
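
The (n, c) centroid-kernel idea can be sketched classically, with an RBF kernel standing in for the trained quantum kernel; the data, kernel, and bandwidth below are illustrative assumptions.

# Sketch of centroid-kernel classification: kernel entries are computed only
# between samples and class centroids, giving an (n, c) kernel instead of (n, n).
# A classical RBF kernel stands in for the trained quantum kernel here.
import numpy as np

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))                                # n samples
y = (X[:, 0] + X[:, 1] > 0).astype(int)                          # two classes
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])   # c centroids

K = np.array([[rbf(x, m) for m in centroids] for x in X])        # shape (n, c)
pred = K.argmax(axis=1)                                          # assign to the closest centroid
print("training accuracy:", (pred == y).mean())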

[318]  arXiv:2405.00385 (cross-list from stat.ML) [pdf, other]
Title: Variational Bayesian Methods for a Tree-Structured Stick-Breaking Process Mixture of Gaussians
Authors: Yuta Nakahara
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

The Bayes coding algorithm for context tree sources is a successful example of Bayesian tree estimation in text compression in information theory. This algorithm provides an efficient parametric representation of the posterior tree distribution and exact updating of its parameters. We apply this algorithm to a clustering task in machine learning. More specifically, we apply it to Bayesian estimation of the tree-structured stick-breaking process (TS-SBP) mixture models. For TS-SBP mixture models, only Markov chain Monte Carlo methods have been proposed so far, but no variational Bayesian methods have been proposed yet. In this paper, we propose a variational Bayesian method that has a subroutine similar to the Bayes coding algorithm for context tree sources. We confirm its behavior by a numerical experiment on a toy example.

[319]  arXiv:2405.00389 (cross-list from math.OC) [pdf, other]
Title: Employing Federated Learning for Training Autonomous HVAC Systems
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

Buildings account for 40 % of global energy consumption. A considerable portion of building energy consumption stems from heating, ventilation, and air conditioning (HVAC), and thus implementing smart, energy-efficient HVAC systems has the potential to significantly impact the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose due to their ability to learn and adapt purely from experience. They have been shown to outperform classical controllers in terms of energy cost and consumption, as well as thermal comfort. However, their weakness lies in their relatively poor data efficiency, requiring long periods of training to reach acceptable policies, making them inapplicable to real-world controllers directly. Hence, common research goals are to improve the learning speed, as well as to improve their ability to generalize, in order to facilitate transfer learning to unseen building environments. In this paper, we take a federated learning approach to training the reinforcement learning controller of an HVAC system. A global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to simultaneously minimize energy consumption and maximize thermal comfort. The federated optimization strategy indirectly increases both the rate at which experience data is collected and the variation in the data. We demonstrate through experimental evaluation that these effects lead to a faster learning speed, as well as greater generalization capabilities in the federated policy compared to any individually trained policy.
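
As a rough illustration of the aggregation step described here, a FedAvg-style parameter average might look like the following sketch; the parameter-dict layout, the per-site sample counts, and the name fed_avg are assumptions for illustration, not the paper's implementation.

# FedAvg-style aggregation sketch: the global HVAC policy is the sample-weighted
# average of locally trained policy parameters (one dict per building/data center).
import numpy as np

def fed_avg(local_params, n_samples):
    """local_params: list of dicts {layer_name: ndarray}; n_samples: list of ints."""
    total = float(sum(n_samples))
    keys = local_params[0].keys()
    return {k: sum(p[k] * (n / total) for p, n in zip(local_params, n_samples))
            for k in keys}

# toy example: three sites, one weight matrix each
locals_ = [{"w": np.full((2, 2), v)} for v in (1.0, 2.0, 4.0)]
print(fed_avg(locals_, n_samples=[100, 100, 200])["w"])   # -> 2.75 everywhere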

[320]  arXiv:2405.00398 (cross-list from math.CT) [pdf, ps, other]
Title: CaTT contexts are finite computads
Subjects: Category Theory (math.CT); Logic in Computer Science (cs.LO)

Two novel descriptions of weak {\omega}-categories have been recently proposed, using type-theoretic ideas. The first one is the dependent type theory CaTT whose models are {\omega}-categories. The second is a recursive description of a category of computads together with an adjunction to globular sets, such that the algebras for the induced monad are again {\omega}-categories. We compare the two descriptions by showing that there exists a fully faithful morphism of categories with families from the syntactic category of CaTT to the opposite of the category of computads, which gives an equivalence on the subcategory of finite computads. We derive a more direct connection between the category of models of CaTT and the category of algebras for the monad on globular sets, induced by the adjunction with computads.

[321]  arXiv:2405.00430 (cross-list from physics.med-ph) [pdf, ps, other]
Title: Continuous sPatial-Temporal Deformable Image Registration (CPT-DIR) for motion modelling in radiotherapy: beyond classic voxel-based methods
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)

Background and purpose: Deformable image registration (DIR) is a crucial tool in radiotherapy for extracting and modelling organ motion. However, when significant changes and sliding boundaries are present, its accuracy and uncertainty are compromised, affecting the subsequent contour propagation and dose accumulation procedures. Materials and methods: We propose an implicit neural representation (INR)-based approach modelling motion continuously in both space and time, named Continuous sPatial-Temporal DIR (CPT-DIR). This method uses a multilayer perceptron (MLP) network to map a 3D coordinate (x,y,z) to its corresponding velocity vector (vx,vy,vz). The displacement vectors (dx,dy,dz) are then calculated by integrating the velocity vectors over time. The MLP's parameters can rapidly adapt to new cases without pre-training, enhancing optimisation. The method's performance was tested on the DIR-Lab dataset of 10 lung 4DCT cases, using metrics of landmark accuracy (TRE), contour conformity (Dice) and image similarity (MAE). Results: The proposed CPT-DIR reduces landmark TRE from 2.79mm to 0.99mm, outperforming B-splines' results for all cases. The MAE of the whole-body region improves from 35.46HU to 28.99HU. Furthermore, CPT-DIR surpasses B-splines in accuracy in the sliding boundary region, lowering MAE and increasing Dice coefficients for the ribcage from 65.65HU and 90.41% to 42.04HU and 90.56%, versus 75.40HU and 89.30% without registration. Meanwhile, CPT-DIR offers significant speed advantages, completing in under 15 seconds compared to a few minutes with the conventional B-splines method. Conclusion: Leveraging continuous representations, the CPT-DIR method significantly enhances registration accuracy, automation and speed, outperforming traditional B-splines in landmark and contour precision, particularly in challenging areas.
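
A minimal sketch of the coordinate-to-velocity MLP and the time integration of velocities into displacements is given below; the network width, number of Euler steps, and coordinate sampling are illustrative assumptions, not the authors' configuration.

# Sketch of an implicit neural representation for motion: an MLP maps a spatial
# coordinate (x, y, z) to a velocity (vx, vy, vz); displacements are obtained by
# integrating the velocity over time (forward Euler here). Illustrative only.
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 3))

def displacement(coords, n_steps=10, dt=0.1):
    """coords: (N, 3) tensor of voxel coordinates; returns (N, 3) displacements."""
    pos = coords.clone()
    for _ in range(n_steps):
        pos = pos + dt * mlp(pos)     # integrate the velocity field over time
    return pos - coords

coords = torch.rand(1000, 3)          # sampled voxel coordinates in [0, 1]^3
print(displacement(coords).shape)     # torch.Size([1000, 3])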

[322]  arXiv:2405.00442 (cross-list from stat.ML) [pdf, other]
Title: Geometric Insights into Focal Loss: Reducing Curvature for Enhanced Model Calibration
Comments: This paper is under consideration at Pattern Recognition Letters
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The key factor in implementing machine learning algorithms in decision-making situations is not only the accuracy of the model but also its confidence level. The confidence level of a model in a classification problem is often given by the output vector of a softmax function for convenience. However, these values are known to deviate significantly from the actual expected model confidence. This problem is called model calibration and has been studied extensively. One of the simplest techniques to tackle this task is focal loss, a generalization of cross-entropy obtained by introducing a single positive parameter. Although many related studies exist because of the simplicity of the idea and its formalization, the theoretical analysis of its behavior is still insufficient. In this study, our objective is to understand the behavior of focal loss by reinterpreting this function geometrically. Our analysis suggests that focal loss reduces the curvature of the loss surface during training. This indicates that curvature may be one of the essential factors in achieving model calibration. We design numerical experiments to support this conjecture and to reveal the behavior of focal loss and the relationship between calibration performance and curvature.
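
For reference, the one-parameter generalization of cross-entropy referred to here is the standard focal loss; a minimal numpy version follows, with the value of gamma chosen only for illustration.

# Focal loss for a binary problem: FL(p_t) = -(1 - p_t)^gamma * log(p_t),
# which reduces to cross-entropy at gamma = 0. The gamma values are assumed.
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """p: predicted probability of class 1; y: label in {0, 1}."""
    p_t = np.where(y == 1, p, 1.0 - p)
    return -((1.0 - p_t) ** gamma) * np.log(p_t + eps)

p = np.array([0.9, 0.6, 0.2])
y = np.array([1, 1, 0])
print(focal_loss(p, y, gamma=0.0))   # ordinary cross-entropy
print(focal_loss(p, y, gamma=2.0))   # down-weights already well-classified examples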

[323]  arXiv:2405.00447 (cross-list from math.OC) [pdf, other]
Title: A Modelling Framework for Energy-Management and Eco-Driving Problems using Convex Relaxations
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper presents a convex optimization framework for eco-driving and vehicle energy management problems. We will first show that several types of eco-driving and vehicle energy management problems can be modelled using the same notions of energy storage buffers and energy storage converters that are connected to a power network. It will be shown that these problems can be formulated as optimization problems with linear cost functions and linear dynamics, and nonlinear constraints representing the power converters. We will show that under some mild conditions, the (non-convex) optimization problem has the same (globally) optimal solution as a convex relaxation. This means that the problems can be solved efficiently and that the solution is guaranteed to be globally optimal. Finally, a numerical example of the eco-driving problem is used to illustrate this claim.
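
A minimal sketch of this kind of relaxation, using cvxpy, is given below: a storage buffer with linear dynamics feeds a demand through a converter whose quadratic loss equality is relaxed to a convex inequality. The buffer layout and numbers are illustrative assumptions, not the paper's model; in this toy the inequality is tight at the optimum, so the relaxation is exact.

# Minimal convex-relaxation sketch (illustrative, not the paper's model): a storage
# buffer with linear dynamics feeds a demand through a converter whose loss
# equality  loss == a * p_out^2  is relaxed to the convex inequality  loss >= a * p_out^2.
import cvxpy as cp
import numpy as np

T, dt, a = 10, 1.0, 0.05
demand = np.linspace(1.0, 2.0, T)

p_out = cp.Variable(T, nonneg=True)    # power delivered by the converter
loss = cp.Variable(T, nonneg=True)     # converter losses
soc = cp.Variable(T + 1)               # buffer state of charge

cons = [soc[0] == 20.0,
        soc[1:] == soc[:-1] - dt * (p_out + loss),   # linear buffer dynamics
        loss >= a * cp.square(p_out),                # relaxed converter constraint
        p_out >= demand]                             # meet the demand

# since losses only hurt the objective, the relaxed inequality is tight at the optimum
prob = cp.Problem(cp.Minimize(cp.sum(p_out + loss)), cons)
prob.solve()
print(prob.status, float(soc.value[-1]))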

[324]  arXiv:2405.00472 (cross-list from eess.IV) [pdf, other]
Title: DmADs-Net: Dense multiscale attention and depth-supervised network for medical image segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning has made important contributions to the development of medical image segmentation. Convolutional neural networks, as a crucial branch, have attracted strong attention from researchers. Through the tireless efforts of numerous researchers, convolutional neural networks have yielded many outstanding algorithms for processing medical images. The ideas and architectures of these algorithms have also provided important inspiration for the development of later technologies. Through extensive experimentation, we have found that currently mainstream deep learning algorithms are not always able to achieve ideal results when processing complex datasets and different types of datasets. These networks still have room for improvement in lesion localization and feature extraction. Therefore, we have created the Dense Multiscale Attention and Depth-Supervised Network (DmADs-Net). We use ResNet for feature extraction at different depths and create a Multi-scale Convolutional Feature Attention Block to improve the network's attention to weak feature information. The Local Feature Attention Block is created to enable enhanced local feature attention for high-level semantic information. In addition, in the feature fusion phase, a Feature Refinement and Fusion Block is created to enhance the fusion of different semantic information. We validated the performance of the network using five datasets of varying sizes and types. Results from comparative experiments show that DmADs-Net outperformed mainstream networks. Ablation experiments further demonstrated the effectiveness of the created modules and the rationality of the network architecture.

[325]  arXiv:2405.00498 (cross-list from math.CT) [pdf, ps, other]
Title: The Natural Display Topos of Coalgebras
Authors: Colin Zwanziger
Comments: PhD Thesis, Carnegie Mellon University
Subjects: Category Theory (math.CT); Programming Languages (cs.PL)

A classical result of topos theory holds that the category of coalgebras for a Cartesian comonad on a topos is again a topos (Kock and Wraith, 1971).
It is natural to refine this result to a topos-theoretic setting that includes universes. To this end, we introduce the notions of natural display topos and natural Cartesian display comonad, and show that the natural model of coalgebras for a natural Cartesian display comonad on a natural display topos is again a natural display topos. As an application, this result extends the approach to universes of Hofmann and Streicher (1997) from presheaf toposes to sheaf toposes with enough points.
Whereas natural display toposes provide a categorical semantics for a form of extensional Martin-L\"of type theory, we also prove our main result in the more general setting of natural typoses, which encompasses models of intensional Martin-L\"of type theory.
A natural Cartesian display comonad on a natural typos may also be used as a model for dependent type theory with an S4 box operator, or comonadic modality, as introduced by Nanevski et al. (2008). Modal contexts, which have been regarded as tricky to handle semantically, are interpreted as contexts of the natural typos of coalgebras. We sketch an interpretation within this approach.
As part of the framework in which the above takes place, we introduce a refinement of the notion of natural model (see Awodey, 2018), which is (strictly 2-)equivalent to the notion of full, split comprehension category (see Jacobs, 1993), rather than the notion of category with attributes (Cartmell 1978).

[326]  arXiv:2405.00518 (cross-list from math.CO) [pdf, ps, other]
Title: Graph-Based Multivariate Multiscale Dispersion Entropy: Efficient Implementation and Applications to Real-World Network Data
Comments: 9 pages, 10 figures
Subjects: Combinatorics (math.CO); Computational Engineering, Finance, and Science (cs.CE); Chaotic Dynamics (nlin.CD)

We introduce Multivariate Multiscale Graph-based Dispersion Entropy (mvDEG), a novel, computationally efficient method for analyzing multivariate time series data in graph and complex network frameworks, and demonstrate its application to real-world data. mvDEG effectively combines temporal dynamics with topological relationships, offering enhanced analysis compared to traditional nonlinear entropy methods. Its efficacy is established through testing on synthetic signals, such as uncorrelated and correlated noise, showcasing its adeptness in discerning various levels of dependency and complexity.
The robustness of mvDEG is further validated with real-world datasets, effectively differentiating various two-phase flow regimes and capturing distinct dynamics in weather data analysis. An important advancement of mvDEG is its computational efficiency. Our optimized algorithm displays a computational time that grows linearly with the number of vertices or nodes, in contrast to the exponential growth observed in classical methods. This efficiency is achieved through refined matrix power calculations that exploit matrix and Kronecker product properties, making our method faster than the state of the art. The significant acceleration in computational time positions mvDEG as a transformative tool for extensive and real-time applications, setting a new benchmark in the analysis of time series recorded at distributed locations and opening avenues for innovative applications.

[327]  arXiv:2405.00542 (cross-list from eess.IV) [pdf, other]
Title: UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Fundus photography, in combination with the ultra-wide-angle fundus (UWF) techniques, has become an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, UWF fluorescein angiography (UWF-FA) necessitates the administration of a fluorescent dye via injection into the patient's hand or elbow, unlike UWF scanning laser ophthalmoscopy (UWF-SLO). To mitigate potential adverse effects associated with injections, researchers have proposed the development of cross-modality medical image generation algorithms capable of converting UWF-SLO images into their UWF-FA counterparts. Current image generation techniques applied to fundus photography encounter difficulties in producing high-resolution retinal images, particularly in capturing minute vascular lesions. To address these issues, we introduce a novel conditional generative adversarial network (UWAFA-GAN) to synthesize UWF-FA from UWF-SLO. This approach employs multi-scale generators and an attention transmit module to efficiently extract both global structures and local lesions. Additionally, to counteract the image blurriness issue that arises from training with misaligned data, a registration module is integrated within this framework. Our method performs strongly on inception scores and detail generation. Clinical user studies further indicate that the UWF-FA images generated by UWAFA-GAN are clinically comparable to authentic images in terms of diagnostic reliability. Empirical evaluations on our proprietary UWF image datasets elucidate that UWAFA-GAN outperforms extant methodologies. The code is accessible at https://github.com/Tinysqua/UWAFA-GAN.

[328]  arXiv:2405.00572 (cross-list from eess.SP) [pdf, other]
Title: A Modular Pragmatic Architecture for Multiuser MIMO with Array-Fed RIS
Comments: 5 pages, 8 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Systems and Control (eess.SY)

We propose a power- and hardware-efficient, pragmatic, modular, multiuser/multibeam array-fed RIS architecture particularly suited to operate in very high frequency bands (high mmWave and sub-THz), where channels are typically sparse in the beamspace and line-of-sight (LOS) is required to achieve an acceptable received signal level. The key module is an active multi-antenna feeder (AMAF) with a small number of active antennas placed in the near field of a RIS with a much larger number of passive controllable reflecting elements. We propose a pragmatic approach to obtain a steerable beam with high gain and very low sidelobes. Then, $K$ independently controlled beams can be achieved by stacking $K$ of such AMAF-RIS modules. Our analysis takes fully into account: 1) the near-end crosstalk (NEXT) between the modules; 2) the far-end crosstalk (FEXT) due to the sidelobes; and 3) a thorough energy efficiency comparison with respect to conventional {\em active arrays} with the same beamforming performance. Overall, we show that the proposed architecture is very attractive in terms of spectral efficiency, ease of implementation (hardware complexity), and energy efficiency.

[329]  arXiv:2405.00592 (cross-list from stat.ML) [pdf, other]
Title: Scaling and renormalization in high-dimensional regression
Comments: 64 pages, 16 figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)

This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.
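
The kind of train/test behavior these formulas describe analytically can be checked empirically with a short ridge-regression simulation; the dimensions, noise level, and isotropic covariates below are illustrative assumptions, not the paper's setup.

# Empirical train/test error of ridge regression on a random linear teacher,
# the kind of curve the paper's random-matrix formulas describe analytically.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d, lam, noise = 200, 1000, 400, 1e-1, 0.1

w_star = rng.standard_normal(d) / np.sqrt(d)
X, Xt = rng.standard_normal((n_train, d)), rng.standard_normal((n_test, d))
y = X @ w_star + noise * rng.standard_normal(n_train)
yt = Xt @ w_star + noise * rng.standard_normal(n_test)

w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)   # ridge estimator
print("train MSE:", np.mean((X @ w_hat - y) ** 2))
print("test  MSE:", np.mean((Xt @ w_hat - yt) ** 2))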

[330]  arXiv:2405.00610 (cross-list from math.GR) [pdf, ps, other]
Title: Growth in products of matrices: fastest, average, and generic
Comments: 10 pages. Comments are welcome
Subjects: Group Theory (math.GR); Cryptography and Security (cs.CR); Combinatorics (math.CO); Dynamical Systems (math.DS); Probability (math.PR)

The problems that we consider in this paper are as follows. Let A and B be 2x2 matrices (over the reals). Let w(A, B) be a word of length n. After evaluating w(A, B) as a product of matrices, we get a 2x2 matrix, call it W. What is the largest (by the absolute value) possible entry of W, over all w(A, B) of length n, as a function of n? What is the expected absolute value of the largest (by the absolute value) entry in a random product of n matrices, where each matrix is A or B with probability 0.5? What is the Lyapunov exponent for a random matrix product like that? We give a partial answer to the first of these questions and an essentially complete answer to the second question. For the third question (the most difficult of the three), we offer a very simple method to produce an upper bound on the Lyapunov exponent in the case where all entries of the matrices A and B are nonnegative.
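
The quantities in question can also be estimated numerically with a short Monte Carlo simulation; the particular matrices A and B below are illustrative assumptions.

# Monte Carlo sketch: random length-n products of two 2x2 matrices, the largest
# entry by absolute value, and the standard Lyapunov-exponent estimate (1/n) log ||W||.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # illustrative choice
B = np.array([[1.0, 0.0], [1.0, 1.0]])

def random_product(n):
    W = np.eye(2)
    for _ in range(n):
        W = W @ (A if rng.random() < 0.5 else B)
    return W

n, trials = 200, 500
max_entries, lyap = [], []
for _ in range(trials):
    W = random_product(n)
    max_entries.append(np.abs(W).max())
    lyap.append(np.log(np.linalg.norm(W, 2)) / n)

print("E[log max|entry|]/n:", np.mean(np.log(max_entries)) / n)
print("Lyapunov estimate  :", np.mean(lyap))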

[331]  arXiv:2405.00636 (cross-list from physics.soc-ph) [pdf, other]
Title: Robustness of graph embedding methods for community detection
Comments: 17 pages, 26 figures, 3 tables. Comments are welcome
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an)

This study investigates the robustness of graph embedding methods for community detection in the face of network perturbations, specifically edge deletions. Graph embedding techniques, which represent nodes as low-dimensional vectors, are widely used for various graph machine learning tasks due to their ability to capture structural properties of networks effectively. However, the impact of perturbations on the performance of these methods remains relatively understudied. The research considers state-of-the-art graph embedding methods from two families: matrix factorization (e.g., LE, LLE, HOPE, M-NMF) and random walk-based (e.g., DeepWalk, LINE, node2vec). Through experiments conducted on both synthetic and real-world networks, the study reveals varying degrees of robustness within each family of graph embedding methods. The robustness is found to be influenced by factors such as network size, initial community partition strength, and the type of perturbation. Notably, node2vec and LLE consistently demonstrate higher robustness for community detection across different scenarios, including networks with degree and community size heterogeneity. These findings highlight the importance of selecting an appropriate graph embedding method based on the specific characteristics of the network and the task at hand, particularly in scenarios where robustness to perturbations is crucial.
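
The perturbation protocol (random edge deletions followed by community recovery and comparison against the unperturbed partition) can be sketched as below; a greedy-modularity detector is used here as a simple stand-in for the embedding-based clustering pipelines the study actually evaluates.

# Sketch of the robustness protocol: delete a fraction of edges, re-detect
# communities, and compare to the partition found on the unperturbed graph.
# A greedy-modularity detector stands in for embedding-based clustering here.
import random
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.metrics import normalized_mutual_info_score

def labels(G):
    comms = greedy_modularity_communities(G)
    lab = {}
    for cid, com in enumerate(comms):
        for node in com:
            lab[node] = cid
    return [lab[n] for n in sorted(G.nodes())]

random.seed(0)
G = nx.planted_partition_graph(4, 32, p_in=0.3, p_out=0.02, seed=0)
base = labels(G)

for frac in (0.1, 0.3, 0.5):
    H = G.copy()
    H.remove_edges_from(random.sample(list(H.edges()), int(frac * H.number_of_edges())))
    print(frac, normalized_mutual_info_score(base, labels(H)))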

[332]  arXiv:2405.00642 (cross-list from stat.ML) [pdf, other]
Title: From Empirical Observations to Universality: Dynamics of Deep Learning with Inputs Built on Gaussian mixture
Comments: 19 pages, 9 figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)

This study broadens the scope of theoretical frameworks in deep learning by delving into the dynamics of neural networks with inputs that exhibit the structural characteristics of a Gaussian Mixture (GM). We analyzed how the dynamics of neural networks under GM-structured inputs diverge from the predictions of conventional theories based on simple Gaussian structures. A key finding of our work is the observed convergence of neural network dynamics towards conventional theory even with standardized GM inputs, highlighting an unexpected universality. We found that standardization, especially in conjunction with certain nonlinear functions, plays a critical role in this phenomenon. Consequently, despite the complex and varied nature of GM distributions, we demonstrate that neural networks exhibit asymptotic behaviors in line with predictions under simple Gaussian frameworks.

[333]  arXiv:2405.00647 (cross-list from physics.med-ph) [pdf, other]
Title: Screening of BindingDB database ligands against EGFR, HER2, Estrogen, Progesterone and NF-kB receptors based on machine learning and molecular docking
Subjects: Medical Physics (physics.med-ph); Machine Learning (cs.LG)

Breast cancer, the second most prevalent cancer among women worldwide, necessitates the exploration of novel therapeutic approaches. To target the four subgroups of breast cancer "hormone receptor-positive and HER2-negative, hormone receptor-positive and HER2-positive, hormone receptor-negative and HER2-positive, and hormone receptor-negative and HER2-negative" it is crucial to inhibit specific targets such as EGFR, HER2, ER, NF-kB, and PR.
In this study, we evaluated various methods for binary and multiclass classification. Among them, the GA-SVM-SVM:GA-SVM-SVM model was selected with an accuracy of 0.74, an F1-score of 0.73, and an AUC of 0.94 for virtual screening of ligands from the BindingDB database. This model successfully identified 4454, 803, 438, and 378 ligands with over 90% precision in both active/inactive and target prediction for the classes of EGFR+HER2, ER, NF-kB, and PR, respectively, from the BindingDB database. Based on the selected ligands, we created a dendrogram that categorizes different ligands based on their targets. This dendrogram aims to facilitate the exploration of chemical space for various therapeutic targets.
Ligands that surpassed a 90% threshold in the product of activity probability and correct target selection probability were chosen for further investigation using molecular docking. The binding energy range for these ligands against their respective targets was calculated to be between -15 and -5 kcal/mol. Finally, based on general and common rules in medicinal chemistry, we selected 2, 3, 3, and 8 new ligands with high priority for further studies in the EGFR+HER2, ER, NF-kB, and PR classes, respectively.

[334]  arXiv:2405.00663 (cross-list from quant-ph) [pdf, other]
Title: Quantum cryptographic protocols with dual messaging system via 2D alternate quantum walks and genuine single particle entangled states
Comments: 11 pages (including supplementary material), 2 figures and 1 table
Subjects: Quantum Physics (quant-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Cryptography and Security (cs.CR); Quantum Algebra (math.QA); Optics (physics.optics)

Single-particle entangled states (SPES) can offer a more secure way of encoding and processing quantum information than their multi-particle counterparts. The SPES generated via a 2D alternate quantum-walk setup from initially separable states can be either 3-way or 2-way entangled. This letter shows that the generated genuine three-way and nonlocal two-way SPES can be used as cryptographic keys to securely encode two distinct messages simultaneously. We detail the message encryption-decryption steps and show the resilience of the 3-way and 2-way SPES-based cryptographic protocols against eavesdropper attacks like intercept-and-resend and man-in-the-middle. We also detail how these protocols can be experimentally realized using single photons, with the three degrees of freedom being OAM, path, and polarization. These have unparalleled security for quantum communication tasks. The ability to simultaneously encode two distinct messages using the generated SPES showcases the versatility and efficiency of the proposed cryptographic protocol. This capability could significantly improve the throughput of quantum communication systems.

Replacements for Thu, 2 May 24

[335]  arXiv:1908.01696 (replaced) [pdf, ps, other]
Title: A two-parameter entropy and its fundamental properties
Comments: This is a pre-print of the published article
Journal-ref: Reviews in Mathematical Physics, Vol. 33, No. 04, 2130003 (2021)
Subjects: Mathematical Physics (math-ph); Information Theory (cs.IT)
[336]  arXiv:1910.02123 (replaced) [pdf, other]
Title: Maximum Matchings in Geometric Intersection Graphs
Comments: 25 pages, 1 figure; a preliminary version appeared at STACS 2020
Journal-ref: Discrete and Computational Geometry (DCG), 70(3), October 2023, pp. 550-579
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
[337]  arXiv:2012.12437 (replaced) [pdf, other]
Title: Pit30M: A Benchmark for Global Localization in the Age of Self-Driving Cars
Comments: Published at IROS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[338]  arXiv:2101.00445 (replaced) [pdf, other]
Title: Long Plane Trees
Comments: 40 pages, 30 figures; a preliminary version appeared at SoCG 2022
Subjects: Computational Geometry (cs.CG)
[339]  arXiv:2106.00599 (replaced) [pdf, other]
Title: ClustML: A Measure of Cluster Pattern Complexity in Scatterplots Learnt from Human-labeled Groupings
Comments: Published in SAGE Information Visualization journal
Journal-ref: Information Visualization Journal 23(2) 105-122 (2024)
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[340]  arXiv:2106.04003 (replaced) [pdf, other]
Title: Double Descent and Other Interpolation Phenomena in GANs
Subjects: Machine Learning (cs.LG)
[341]  arXiv:2106.10362 (replaced) [pdf, other]
Title: Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback
Comments: arXiv admin note: text overlap with arXiv:2103.03181
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
[342]  arXiv:2106.14935 (replaced) [pdf, other]
Title: Dynamic Connectivity in Disk Graphs
Comments: 58 pages, 27 figures; a preliminary version appeared at SoCG 2022
Journal-ref: Discrete and Computational Geometry (DCG), 71(1), Jan 2024, pp. 214-277
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
[343]  arXiv:2109.10561 (replaced) [pdf, ps, other]
Title: A Few-Shot Learning Approach for Sound Source Distance Estimation Using Relation Networks
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[344]  arXiv:2201.02986 (replaced) [pdf, other]
Title: SoK: Rowhammer on Commodity Operating Systems
Comments: To Appear in the 19th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2024)
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
[345]  arXiv:2201.04272 (replaced) [pdf, ps, other]
Title: Comparative evaluation of the web-based contiguous cartogram generation tool go-cart.io
Comments: 26 pages, 11 figures, 3 tables; version accepted by PLOS ONE
Subjects: Human-Computer Interaction (cs.HC)
[346]  arXiv:2202.00788 (replaced) [pdf, other]
Title: Modular Multi-Rotors: From Quadrotors to Fully-Actuated Aerial Vehicles
Subjects: Robotics (cs.RO)
[347]  arXiv:2202.12737 (replaced) [pdf, other]
Title: Alpha-NML Universal Predictors
Subjects: Information Theory (cs.IT)
[348]  arXiv:2203.14859 (replaced) [pdf, other]
Title: FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example
Comments: Extended version of the paper accepted for publication at the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[349]  arXiv:2204.09747 (replaced) [pdf, other]
Title: Conceptual structure and the growth of scientific knowledge
Comments: 40 pages, 13 figures
Subjects: Social and Information Networks (cs.SI)
[350]  arXiv:2207.04248 (replaced) [pdf, other]
Title: A Statistical-Modelling Approach to Feedforward Neural Network Model Selection
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
[351]  arXiv:2207.06956 (replaced) [pdf, ps, other]
Title: Cover and Hitting Times of Hyperbolic Random Graphs
Comments: 55 pages, 4 figures. Appeared in Proceedings of RANDOM 2022
Subjects: Probability (math.PR); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[352]  arXiv:2208.14226 (replaced) [pdf, other]
Title: Unsupervised Representation Learning in Deep Reinforcement Learning: A Review
Subjects: Machine Learning (cs.LG)
[353]  arXiv:2209.05557 (replaced) [pdf, other]
Title: Blurring Diffusion Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[354]  arXiv:2209.11200 (replaced) [pdf, other]
Title: Attention is All They Need: Exploring the Media Archaeology of the Computer Vision Research Paper
Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV)
[355]  arXiv:2211.05716 (replaced) [pdf, other]
Title: Resource-Aware Heterogeneous Federated Learning using Neural Architecture Search
Comments: Accepted at the 30th International European Conference on Parallel and Distributed Computing (Euro-Par 2024)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[356]  arXiv:2211.10307 (replaced) [pdf, other]
Title: SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification
Comments: The SeaTurtleID2022 dataset is the latest version of the SeaTurtleID dataset which was described in the previous versions of this arXiv submission. Notice the change of title in the latest version
Journal-ref: Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7146-7156
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[357]  arXiv:2212.04481 (replaced) [pdf, other]
Title: A Survey of Graph Neural Networks for Social Recommender Systems
Comments: Published in ACM CSUR April 2024. GitHub repository with the curated list of papers: this https URL
Journal-ref: ACM Comput. Surv. (April 2024)
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[358]  arXiv:2212.06278 (replaced) [pdf, other]
Title: Efficient Bayesian Uncertainty Estimation for nnU-Net
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[359]  arXiv:2212.11379 (replaced) [pdf, ps, other]
Title: The Inverse of Exact Renormalization Group Flows as Statistical Inference
Comments: 52 pages, 3 tables. V2 Minor revisions, matches the published version of the text
Journal-ref: Entropy 2024, 26, 389
Subjects: High Energy Physics - Theory (hep-th); Artificial Intelligence (cs.AI)
[360]  arXiv:2212.13253 (replaced) [pdf, other]
Title: DSI2I: Dense Style for Unpaired Image-to-Image Translation
Comments: To appear on TMLR '24, Reviewed on OpenReview: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[361]  arXiv:2301.02608 (replaced) [pdf, other]
Title: An interpretable machine learning system for colorectal cancer diagnosis from pathology slides
Comments: Accepted at npj Precision Oncology. Available at: this https URL
Journal-ref: npj Precis. Onc. 8, 56 (2024)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[362]  arXiv:2301.09430 (replaced) [pdf, other]
Title: Rethinking Real-world Image Deraining via An Unpaired Degradation-Conditioned Diffusion Model
Comments: 18 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[363]  arXiv:2301.10097 (replaced) [pdf, other]
Title: Canonical variables based numerical schemes for hybrid plasma models with kinetic ions and massless electrons
Comments: 25 pages, 8 figures
Subjects: Numerical Analysis (math.NA); Plasma Physics (physics.plasm-ph)
[364]  arXiv:2301.12226 (replaced) [pdf, other]
Title: Influence Maximization with Unknown Individual Effect on General Network
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
[365]  arXiv:2302.06358 (replaced) [pdf, other]
Title: Anticipating Next Active Objects for Egocentric Videos
Comments: Accepted by IEEE ACCESS, this paper carries the Manuscript DOI: 10.1109/ACCESS.2024.3395282. The complete peer-reviewed version is available via this DOI, while the arXiv version is a post-author manuscript without peer-review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[366]  arXiv:2302.08860 (replaced) [pdf, other]
Title: Realizing temporal graphs from fastest travel times
Comments: 57 pages, 10 figures
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
[367]  arXiv:2302.10025 (replaced) [pdf, other]
Title: DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
Comments: camera-ready version, accepted by Transaction of ACL (TACL)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[368]  arXiv:2302.11887 (replaced) [pdf, ps, other]
Title: A Curry-Howard Correspondence for Linear, Reversible Computation
Comments: This version is an extended version of the v1
Journal-ref: 31st EACSL Annual Conference on Computer Science Logic (CSL 2023)
Subjects: Logic in Computer Science (cs.LO)
[369]  arXiv:2303.00633 (replaced) [pdf, other]
Title: An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
[370]  arXiv:2303.15318 (replaced) [pdf, other]
Title: Closed-Loop Koopman Operator Approximation
Comments: 13 pages, 11 figures, 3 tables, accepted for accepted for publication in Machine Learning: Science and Technology
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
[371]  arXiv:2304.01698 (replaced) [pdf, other]
Title: Inverse Unscented Kalman Filter
Comments: 20 pages, 5 figures. arXiv admin note: text overlap with arXiv:2210.00359
Subjects: Optimization and Control (math.OC); Signal Processing (eess.SP); Systems and Control (eess.SY); Machine Learning (stat.ML)
[372]  arXiv:2305.11774 (replaced) [pdf, other]
Title: Multi-objective optimisation via the R2 utilities
Comments: The code is available at: this https URL
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[373]  arXiv:2305.12081 (replaced) [pdf, other]
Title: MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement
Comments: IJCAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[374]  arXiv:2305.12661 (replaced) [pdf, other]
Title: Semantic-guided modeling of spatial relation and object co-occurrence for indoor scene recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[375]  arXiv:2306.02176 (replaced) [pdf, other]
Title: TransRUPNet for Improved Polyp Segmentation
Comments: Accepted at EMBC 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[376]  arXiv:2306.02546 (replaced) [pdf, other]
Title: Leveraging Generative Models to Recover Variable Names from Stripped Binary
Subjects: Software Engineering (cs.SE)
[377]  arXiv:2306.10274 (replaced) [pdf, other]
Title: Benchmarking Deep Learning Architectures for Urban Vegetation Point Cloud Semantic Segmentation from MLS
Comments: The paper has been accepted for publication in IEEE Transactions on Geoscience and Remote Sensing. DOI: 10.1109/TGRS.2024.3381976
Journal-ref: in IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-14, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[378]  arXiv:2306.13941 (replaced) [pdf, other]
Title: Grassroots Social Networking: Where People have Agency over their Personal Information and Social Graph
Authors: Ehud Shapiro
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
[379]  arXiv:2306.14063 (replaced) [pdf, other]
Title: Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[380]  arXiv:2306.16308 (replaced) [pdf, other]
Title: Gaussian random field approximation via Stein's method with applications to wide random neural networks
Comments: To appear in Applied and Computational Harmonic Analysis
Subjects: Probability (math.PR); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[381]  arXiv:2307.01716 (replaced) [pdf, other]
Title: Raster Interval Object Approximations for Spatial Intersection Joins
Comments: 34 pages
Subjects: Databases (cs.DB)
[382]  arXiv:2307.03784 (replaced) [pdf, other]
Title: NeuroBlend: Towards Low-Power yet Accurate Neural Network-Based Inference Engine Blending Binary and Fixed-Point Convolutions
Comments: 6 pages - In proceeding of GLSVLSI 2024
Subjects: Hardware Architecture (cs.AR)
[383]  arXiv:2307.04500 (replaced) [pdf, ps, other]
Title: Optimal Academic Plan Derived from Articulation Agreements: A Preliminary Experiment on Human-Generated and (Hypothetical) Algorithm-Generated Academic Plans
Subjects: Human-Computer Interaction (cs.HC)
[384]  arXiv:2307.06675 (replaced) [pdf, other]
Title: Meta-State-Space Learning: An Identification Approach for Stochastic Dynamical Systems
Comments: Accepted in Automatica
Subjects: Systems and Control (eess.SY)
[385]  arXiv:2307.07300 (replaced) [pdf, other]
Title: Evaluation Methodologies in Software Protection Research
Subjects: Cryptography and Security (cs.CR)
[386]  arXiv:2307.10182 (replaced) [pdf, other]
Title: Enhancing Super-Resolution Networks through Realistic Thick-Slice CT Simulation
Comments: 11 pages, 4 figures
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
[387]  arXiv:2307.15615 (replaced) [pdf, other]
Title: A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond
Comments: A list of open-sourced code from the papers reviewed has been organized and is available at this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[388]  arXiv:2308.00692 (replaced) [pdf, other]
Title: LISA: Reasoning Segmentation via Large Language Model
Comments: Code, models, and data are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[389]  arXiv:2308.02151 (replaced) [pdf, other]
Title: Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[390]  arXiv:2308.02585 (replaced) [pdf, other]
Title: PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
Subjects: Machine Learning (cs.LG)
[391]  arXiv:2308.02885 (replaced) [pdf, other]
Title: REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption
Subjects: Cryptography and Security (cs.CR)
[392]  arXiv:2308.03159 (replaced) [pdf, ps, other]
Title: Semilinear elliptic eigenvalue problem: Parametric analyticity and the uncertainty quantification
Authors: Byeong-Ho Bahn
Comments: 37 pages, 0 figures
Subjects: Numerical Analysis (math.NA)
[393]  arXiv:2308.03290 (replaced) [pdf, other]
Title: FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Comments: Accepted to AutoML 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[394]  arXiv:2308.12820 (replaced) [pdf, other]
Title: Prediction without Preclusion: Recourse Verification with Reachable Sets
Comments: ICLR 2024 Spotlight. The first two authors contributed equally
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
[395]  arXiv:2308.13646 (replaced) [pdf, other]
Title: GRASP: A Rehearsal Policy for Efficient Online Continual Learning
Comments: Accepted to the Conference on Lifelong Learning Agents (CoLLAs) 2024
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[396]  arXiv:2308.14411 (replaced) [pdf, ps, other]
Title: Community College Articulation Agreement Websites: Students' Suggestions for New Academic Advising Software Features
Subjects: Human-Computer Interaction (cs.HC)
[397]  arXiv:2308.16118 (replaced) [pdf, other]
Title: Response: Emergent analogical reasoning in large language models
Comments: Response to publication in Nature Human Behaviour titled "Emergent analogical reasoning in large language models," (Webb, Holyoak, and Lu, 2023, arXiv:2212.09196). 14 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[398]  arXiv:2309.00069 (replaced) [pdf, ps, other]
Title: Multistage DPG time-marching scheme for nonlinear problems
Subjects: Numerical Analysis (math.NA)
[399]  arXiv:2309.01237 (replaced) [pdf, other]
Title: The Information Geometry of UMAP
Comments: 11 pages, 2 figures, 3 tables; Github repo (this https URL)
Subjects: Computational Geometry (cs.CG); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Geometric Topology (math.GT)
[400]  arXiv:2309.01513 (replaced) [pdf, other]
Title: RGI-Net: 3D Room Geometry Inference from Room Impulse Responses in the Absence of First-order Echoes
Comments: 5 pages, 3 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[401]  arXiv:2309.01886 (replaced) [pdf, ps, other]
Title: Reconstruction of Unstable Heavy Particles Using Deep Symmetry-Preserving Attention Networks
Comments: Accepted by Nature Communications Physics, replaced with published version
Journal-ref: Commun Phys 7, 139 (2024)
Subjects: High Energy Physics - Experiment (hep-ex); Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph)
[402]  arXiv:2309.06629 (replaced) [pdf, other]
Title: The Relational Bottleneck as an Inductive Bias for Efficient Abstraction
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[403]  arXiv:2309.09248 (replaced) [pdf, other]
Title: The Director: A Composable Behaviour System with Soft Transitions
Subjects: Robotics (cs.RO)
[404]  arXiv:2309.12870 (replaced) [pdf, ps, other]
Title: Penalty Ensembles for Navier-Stokes with Random Initial Conditions & Forcing
Authors: Rui Fang
Subjects: Numerical Analysis (math.NA)
[405]  arXiv:2309.15375 (replaced) [pdf, other]
Title: PPG-to-ECG Signal Translation for Continuous Atrial Fibrillation Detection via Attention-based Deep State-Space Modeling
Comments: Accepted to 46th IEEE EMBC
Subjects: Machine Learning (cs.LG)
[406]  arXiv:2310.00381 (replaced) [pdf, other]
Title: Quadratic constraint consistency in the projection-free approximation of harmonic maps and bending isometries
Subjects: Numerical Analysis (math.NA)
[407]  arXiv:2310.01012 (replaced) [pdf, other]
Title: Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[408]  arXiv:2310.03289 (replaced) [pdf, other]
Title: Collaborative Safety-Critical Control for Networked Dynamic Systems
Comments: This work is under review for publication in the IEEE Transactions on Automatic Control
Subjects: Optimization and Control (math.OC); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Dynamical Systems (math.DS)
[409]  arXiv:2310.07355 (replaced) [pdf, other]
Title: IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[410]  arXiv:2310.08213 (replaced) [pdf, other]
Title: A Universal Scheme for Dynamic Partitioned Shortest Path Index
Subjects: Databases (cs.DB)
[411]  arXiv:2310.09611 (replaced) [pdf, other]
Title: VizAbility: Enhancing Chart Accessibility with LLM-based Conversational Interaction
Comments: 19 pages, 8 figures
Subjects: Human-Computer Interaction (cs.HC)
[412]  arXiv:2310.09617 (replaced) [pdf, other]
Title: How Good is ChatGPT in Giving Advice on Your Visualization Design?
Comments: 24 pages, 4 figures
Subjects: Human-Computer Interaction (cs.HC)
[413]  arXiv:2310.10117 (replaced) [pdf, other]
Title: Federated Learning with Convex Global and Local Constraints
Authors: Chuan He, Le Peng, Ju Sun
Comments: Accepted by Transactions on Machine Learning Research. Code associated with this paper can be found in this https URL
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[414]  arXiv:2310.12153 (replaced) [pdf, other]
Title: Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing
Comments: Accepted at CVPR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[415]  arXiv:2310.12931 (replaced) [pdf, other]
Title: Eureka: Human-Level Reward Design via Coding Large Language Models
Comments: ICLR 2024. Project website and open-source code: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[416]  arXiv:2310.14661 (replaced) [pdf, other]
Title: Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[417]  arXiv:2310.18042 (replaced) [pdf, other]
Title: Sui Lutris: A Blockchain Combining Broadcast and Consensus
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
[418]  arXiv:2310.19852 (replaced) [pdf, other]
Title: AI Alignment: A Comprehensive Survey
Comments: Continually updated, including weak-to-strong generalization and socio-technical thinking. 58 pages (excluding bibliography), 801 references
Subjects: Artificial Intelligence (cs.AI)
[419]  arXiv:2311.02189 (replaced) [pdf, other]
Title: FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling
Comments: ICLR 2024; Codes available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[420]  arXiv:2311.04536 (replaced) [pdf, other]
Title: Uniform Partitioning of a Bounded Region using Opaque ASYNC Luminous Mobile Robots
Comments: Accepted at ICDCN 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[421]  arXiv:2311.04811 (replaced) [pdf, other]
Title: Image-Based Virtual Try-On: A Survey
Comments: 30 pages, 18 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422]  arXiv:2311.04843 (replaced) [pdf, other]
Title: Bridging Dimensions: Confident Reachability for High-Dimensional Controllers
Subjects: Machine Learning (cs.LG)
[423]  arXiv:2311.06582 (replaced) [pdf, ps, other]
Title: Combinatory Array Logic with Sums
Authors: Rodrigo Raya
Comments: Some inaccuracies in the original report have been corrected. Accepted for presentation in PLS '14
Subjects: Logic in Computer Science (cs.LO)
[424]  arXiv:2311.07338 (replaced) [pdf, other]
Title: A mathematical model of the visual MacKay effect
Subjects: Optimization and Control (math.OC); Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Neurons and Cognition (q-bio.NC)
[425]  arXiv:2311.07686 (replaced) [pdf, ps, other]
Title: Achieving Optimum Received Power with Elementwise Updates in the Least Number of Steps for Discrete-Phase RISs
Comments: 26 pages, 13 figures, 4 tables
Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)
[426]  arXiv:2311.08469 (replaced) [pdf, other]
Title: UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations
Comments: Accepted at NAACL 2024
Subjects: Computation and Language (cs.CL)
[427]  arXiv:2311.11210 (replaced) [pdf, other]
Title: HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[428]  arXiv:2311.13172 (replaced) [pdf, other]
Title: Learning to Complement with Multiple Humans
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[429]  arXiv:2312.00610 (replaced) [pdf, ps, other]
Title: Experiment on Gender and Racial/Ethnic Bias Against Video Game Streamers: Comparing Perceived Gameplay Skill and Viewer Engagement
Subjects: Human-Computer Interaction (cs.HC)
[430]  arXiv:2312.01384 (replaced) [pdf, other]
Title: A Tight Lower Bound for 3-Coloring Grids in the Online-LOCAL Model
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
[431]  arXiv:2312.03871 (replaced) [pdf, other]
Title: Hidden yet quantifiable: A lower bound for confounding strength using randomized trials
Comments: Accepted for presentation at the International Conference on Artificial Intelligence and Statistics (AISTATS) 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[432]  arXiv:2312.06709 (replaced) [pdf, other]
Title: AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One
Comments: CVPR 2024. Version 2: added acknowledgements, updated Table 7 with more recent results, and ensured the code link in the abstract works. Version 3: CVPR camera-ready; reconfigured full paper; Table 1 is now more comprehensive; fixed broken hyperlinks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[433]  arXiv:2312.09197 (replaced) [pdf, other]
Title: Model-Free Change Point Detection for Mixing Processes
Comments: 20 pages, 4 figures. Accepted by IEEE OJ-CSYS
Subjects: Systems and Control (eess.SY)
[434]  arXiv:2312.09801 (replaced) [pdf, other]
Title: ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)
Comments: 8 pages, 4 figures
Subjects: Computation and Language (cs.CL)
[435]  arXiv:2312.10967 (replaced) [pdf, other]
Title: Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems
Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[436]  arXiv:2312.11166 (replaced) [pdf, other]
Title: Volume-Preserving Transformers for Learning Time Series Data with Structure
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
[437]  arXiv:2312.11456 (replaced) [pdf, other]
Title: Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Comments: 53 pages; theoretical study and algorithmic design of iterative RLHF and DPO
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[438]  arXiv:2312.13119 (replaced) [pdf, other]
Title: Graphene: Infrastructure Security Posture Analysis with AI-generated Attack Graphs
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)
[439]  arXiv:2312.14191 (replaced) [pdf, ps, other]
Title: Noisy Measurements Are Important, the Design of Census Products Is Much More Important
Authors: John M. Abowd
Journal-ref: Harvard Data Science Review, Volume 6, Number 2 (Spring, 2024)
Subjects: Cryptography and Security (cs.CR); Econometrics (econ.EM); Applications (stat.AP)
[440]  arXiv:2312.15796 (replaced) [pdf, other]
Title: GenCast: Diffusion-based ensemble forecasting for medium-range weather
Comments: Main text 11 pages, Appendices 76 pages
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
[441]  arXiv:2401.02051 (replaced) [pdf, other]
Title: Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
[442]  arXiv:2401.02843 (replaced) [pdf, other]
Title: Thousands of AI Authors on the Future of AI
Comments: The asterisk indicates the corresponding author. The dagger indicates equal contribution
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[443]  arXiv:2401.04801 (replaced) [pdf, other]
Title: Refining Remote Photoplethysmography Architectures using CKA and Empirical Methods
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444]  arXiv:2401.05859 (replaced) [pdf, ps, other]
Title: New Construction of $q$-ary Codes Correcting a Burst of at most $t$ Deletions
Subjects: Information Theory (cs.IT)
[445]  arXiv:2401.10852 (replaced) [pdf, other]
Title: Software Resource Disaggregation for HPC with Serverless Computing
Comments: Accepted for publication in the 2024 International Parallel and Distributed Processing Symposium (IPDPS)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[446]  arXiv:2401.11608 (replaced) [pdf, other]
Title: $\texttt{immrax}$: A Parallelizable and Differentiable Toolbox for Interval Analysis and Mixed Monotone Reachability in JAX
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
[447]  arXiv:2401.11648 (replaced) [pdf, other]
Title: Next Visit Diagnosis Prediction via Medical Code-Centric Multimodal Contrastive EHR Modelling with Hierarchical Regularisation
Authors: Heejoon Koo
Comments: Accepted to EACL 2024 (The 18th Conference of the European Chapter of the Association for Computational Linguistics)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[448]  arXiv:2401.11974 (replaced) [pdf, other]
Title: Cross-Validation Conformal Risk Control
Comments: accepted for presentation at 2024 IEEE International Symposium on Information Theory (ISIT 2024)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[449]  arXiv:2401.13605 (replaced) [pdf, ps, other]
Title: Regulating AI-Based Remote Biometric Identification. Investigating the Public Demand for Bans, Audits, and Public Database Registrations
Subjects: Computers and Society (cs.CY)
[450]  arXiv:2401.14887 (replaced) [pdf, other]
Title: The Power of Noise: Redefining Retrieval for RAG Systems
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
[451]  arXiv:2401.15140 (replaced) [pdf, other]
Title: Link Prediction Accuracy on Real-World Networks Under Non-Uniform Missing Edge Patterns
Comments: Submitted to PLOS ONE
Subjects: Dynamical Systems (math.DS); Social and Information Networks (cs.SI)
[452]  arXiv:2401.15166 (replaced) [pdf, other]
Title: Probabilistic Design of Multi-Dimensional Spatially-Coupled Codes
Comments: 12 pages (double column), 5 figures; the short version has been accepted at the IEEE International Symposium on Information Theory (ISIT)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[453]  arXiv:2401.16465 (replaced) [pdf, other]
Title: DressCode: Autoregressively Sewing and Generating Garments from Text Guidance
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[454]  arXiv:2402.00384 (replaced) [pdf, ps, other]
Title: Adaptive FRIT-based Recursive Robust Controller Design Using Forgetting Factors
Comments: This work has been accepted to The 32nd Mediterranean Conference on Control and Automation (MED2024)
Subjects: Systems and Control (eess.SY)
[455]  arXiv:2402.00724 (replaced) [pdf, ps, other]
Title: Automatic Segmentation of the Spinal Cord Nerve Rootlets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[456]  arXiv:2402.00905 (replaced) [pdf, other]
Title: Fine-Tuning and Prompt Engineering for Large Language Models-based Code Review Automation
Comments: 13 pages. Submitted to the IST journal
Subjects: Software Engineering (cs.SE)
[457]  arXiv:2402.04829 (replaced) [pdf, other]
Title: NeRF as a Non-Distant Environment Emitter in Physics-based Inverse Rendering
Comments: SIGGRAPH 2024. Project page and video: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[458]  arXiv:2402.06025 (replaced) [pdf, other]
Title: Doing Experiments and Revising Rules with Natural Language and Probabilistic Reasoning
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[459]  arXiv:2402.07330 (replaced) [pdf, other]
Title: Expert-Adaptive Medical Image Segmentation
Authors: Binyan Hu, A. K. Qin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
[460]  arXiv:2402.07712 (replaced) [pdf, other]
Title: Model Collapse Demystified: The Case of Regression
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[461]  arXiv:2402.09146 (replaced) [pdf, other]
Title: ResQuNNs: Towards Enabling Deep Learning in Quantum Convolution Neural Networks
Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph)
[462]  arXiv:2402.10466 (replaced) [pdf, other]
Title: Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
Comments: More results in the next version. Code available at: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[463]  arXiv:2402.10962 (replaced) [pdf, other]
Title: Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Comments: Code: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[464]  arXiv:2403.00175 (replaced) [pdf, other]
Title: FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything
Comments: 14 pages, 9 figures, 1 table
Journal-ref: Sensors 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[465]  arXiv:2403.00209 (replaced) [pdf, other]
Title: ChartReformer: Natural Language-Driven Chart Image Editing
Comments: Published in ICDAR 2024. Code and model are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[466]  arXiv:2403.00549 (replaced) [pdf, other]
Title: Relaxometry Guided Quantitative Cardiac Magnetic Resonance Image Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[467]  arXiv:2403.01699 (replaced) [pdf, other]
Title: Brilla AI: AI Contestant for the National Science and Maths Quiz
Comments: 14 pages. Accepted for the WideAIED track at the 25th International Conference on AI in Education (AIED 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[468]  arXiv:2403.02833 (replaced) [pdf, other]
Title: SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[469]  arXiv:2403.03744 (replaced) [pdf, other]
Title: Towards Safe Large Language Models for Medicine
Subjects: Artificial Intelligence (cs.AI)
[470]  arXiv:2403.04299 (replaced) [pdf, other]
Title: LitSim: A Conflict-aware Policy for Long-term Interactive Traffic Simulation
Comments: 9 pages, 6 figures, under review
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[471]  arXiv:2403.04778 (replaced) [pdf, other]
Title: An Efficient Difference-of-Convex Solver for Privacy Funnel
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Theory (cs.IT)
[472]  arXiv:2403.05452 (replaced) [pdf, other]
Title: The R2D2 deep neural network series paradigm for fast precision imaging in radio astronomy
Comments: Accepted for publication in ApJS
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[473]  arXiv:2403.06332 (replaced) [pdf, ps, other]
Title: Exploiting the Margin: How Capitalism Fuels AI at the Expense of Minoritized Groups
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[474]  arXiv:2403.07257 (replaced) [pdf, other]
Title: The Dawn of AI-Native EDA: Opportunities and Challenges of Large Circuit Models
Authors: Lei Chen (1), Yiqi Chen (2), Zhufei Chu (3), Wenji Fang (4), Tsung-Yi Ho (5), Ru Huang (2,6), Yu Huang (7), Sadaf Khan (5), Min Li (1), Xingquan Li (8), Yu Li (5), Yun Liang (2), Jinwei Liu (5), Yi Liu (5), Yibo Lin (2), Guojie Luo (2), Zhengyuan Shi (5), Guangyu Sun (2), Dimitrios Tsaras (1), Runsheng Wang (2), Ziyi Wang (5), Xinming Wei (2), Zhiyao Xie (4), Qiang Xu (5), Chenhao Xue (2), Junchi Yan (9), Jun Yang (6), Bei Yu (5), Mingxuan Yuan (1), Evangeline F.Y. Young (5), Xuan Zeng (10), Haoyi Zhang (2), Zuodong Zhang (2), Yuxiang Zhao (2), Hui-Ling Zhen (1), Ziyang Zheng (5), Binwu Zhu (5), Keren Zhu (5), Sunan Zou (2) ((1) Huawei Noah's Ark Lab, (2) Peking University, (3) Ningbo University, (4) Hong Kong University of Science and Technology, (5) The Chinese University of Hong Kong, (6) Southeast University, (7) Huawei HiSilicon, (8) Peng Cheng Laboratory, (9) Shanghai Jiao Tong University, (10) Fudan University)
Comments: The authors are ordered alphabetically. Contact: qxu@cse[dot]cuhk[dot]edu[dot]hk, gluo@pku[dot]edu[dot]cn, yuan.mingxuan@huawei[dot]com
Subjects: Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)
[475]  arXiv:2403.07928 (replaced) [pdf, other]
Title: Strategic Bidding in Knapsack Auctions
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
[476]  arXiv:2403.10560 (replaced) [pdf, other]
Title: Holographic Phase Retrieval via Wirtinger Flow: Cartesian Form with Auxiliary Amplitude
Subjects: Information Theory (cs.IT); Graphics (cs.GR); Image and Video Processing (eess.IV); Numerical Analysis (math.NA)
[477]  arXiv:2403.11169 (replaced) [pdf, other]
Title: Correcting misinformation on social media with a large language model
Comments: 53 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[478]  arXiv:2403.11893 (replaced) [pdf, other]
Title: Entanglement Coordination Rates in Multi-User Networks
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
[479]  arXiv:2403.12830 (replaced) [pdf, other]
Title: Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Approximate Unlearning Completeness
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[480]  arXiv:2403.13315 (replaced) [pdf, other]
Title: PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[481]  arXiv:2403.13799 (replaced) [pdf, other]
Title: Reverse Training to Nurse the Reversal Curse
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[482]  arXiv:2403.13890 (replaced) [pdf, other]
Title: Towards Learning Contrast Kinetics with Multi-Condition Latent Diffusion Models
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[483]  arXiv:2403.16411 (replaced) [pdf, other]
Title: A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups
Comments: Preprint for L-CSS
Subjects: Systems and Control (eess.SY)
[484]  arXiv:2403.17169 (replaced) [pdf, other]
Title: QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims
Comments: 11 pages, 1 figure. Accepted for publication at the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[485]  arXiv:2403.17240 (replaced) [pdf, other]
Title: The Role of $n$-gram Smoothing in the Age of Neural Networks
Comments: NAACL 2024
Subjects: Computation and Language (cs.CL)
[486]  arXiv:2403.18257 (replaced) [pdf, other]
Title: Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation
Comments: work in progress
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[487]  arXiv:2404.00231 (replaced) [pdf, ps, other]
Title: Attention-based Shape-Deformation Networks for Artifact-Free Geometry Reconstruction of Lumbar Spine from MR Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[488]  arXiv:2404.00247 (replaced) [pdf, ps, other]
Title: Facilitating Reinforcement Learning for Process Control Using Transfer Learning: Perspectives
Comments: Final version for the Asian Control Conference (ASCC 2024)
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[489]  arXiv:2404.00781 (replaced) [pdf, other]
Title: Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning
Comments: Published in the Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). Code is available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[490]  arXiv:2404.01094 (replaced) [pdf, other]
Title: HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[491]  arXiv:2404.02543 (replaced) [pdf, other]
Title: Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[492]  arXiv:2404.02595 (replaced) [pdf, other]
Title: QFNN-FFD: Quantum Federated Neural Network for Financial Fraud Detection
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Risk Management (q-fin.RM)
[493]  arXiv:2404.02877 (replaced) [pdf, other]
Title: FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery
Comments: 15 figures, 4 tables, comprehensive survey, comparative study
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[494]  arXiv:2404.03443 (replaced) [pdf, ps, other]
Title: Part-Attention Based Model Make Occluded Person Re-Identification Stronger
Comments: Accepted by the International Joint Conference on Neural Networks 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[495]  arXiv:2404.03685 (replaced) [pdf, other]
Title: Cooperative Evolutionary Pressure and Diminishing Returns Might Explain the Fermi Paradox: On What Super-AIs Are Like
Authors: Daniel Vallstrom
Comments: 23 pages, 1 figure. Added acknowledgement, clarifications, references
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
[496]  arXiv:2404.07066 (replaced) [pdf, other]
Title: Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Comments: 12 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[497]  arXiv:2404.07337 (replaced) [pdf, ps, other]
Title: Probabilistic estimates of the diameters of the Rubik's Cube groups
Authors: So Hirata
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Probability (math.PR)
[498]  arXiv:2404.07356 (replaced) [pdf, other]
Title: GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data
Comments: Accepted to the 37th Canadian Artificial Intelligence Conference (2024), 12 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[499]  arXiv:2404.09378 (replaced) [pdf, other]
Title: Orientation-conditioned Facial Texture Mapping for Video-based Facial Remote Photoplethysmography Estimation
Comments: 12 pages, 8 figures, 6 tables; minor corrections
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[500]  arXiv:2404.10446 (replaced) [pdf, other]
Title: Watching Grass Grow: Long-term Visual Navigation and Mission Planning for Autonomous Biodiversity Monitoring
Comments: to be presented at the Workshop on Field Robotics - ICRA 2024
Subjects: Robotics (cs.RO)
[501]  arXiv:2404.10876 (replaced) [pdf, other]
Title: Course Recommender Systems Need to Consider the Job Market
Comments: Accepted at SIGIR 2024 as a perspective paper; camera-ready version forthcoming
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[502]  arXiv:2404.11568 (replaced) [pdf, other]
Title: On the Scalability of GNNs for Molecular Graphs
Subjects: Machine Learning (cs.LG)
[503]  arXiv:2404.11988 (replaced) [pdf, other]
Title: The Emerging AI Divide in the United States
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[504]  arXiv:2404.12087 (replaced) [pdf, other]
Title: Optimizing the diffusion coefficient of overdamped Langevin dynamics
Comments: 76 pages, 11 figures
Subjects: Numerical Analysis (math.NA)
[505]  arXiv:2404.12402 (replaced) [pdf, other]
Title: Sup3r: A Semi-Supervised Algorithm for increasing Sparsity, Stability, and Separability in Hierarchy Of Time-Surfaces architectures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[506]  arXiv:2404.13195 (replaced) [pdf, ps, other]
Title: Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[507]  arXiv:2404.14232 (replaced) [pdf, other]
Title: Shifting Focus with HCEye: Exploring the Dynamics of Visual Highlighting and Cognitive Load on User Attention and Saliency Prediction
Comments: 18 pages, 9 Figures, Conference: ACM Symposium on Eye Tracking Research & Applications (ETRA); Journal: Proc. ACM Hum.-Comput. Interact., Vol. 8, No. ETRA, Article 236. Publication date: May 2024
Journal-ref: Proc. ACM Hum.-Comput. Interact., Vol. 8, No. ETRA, Article 236. Publication date: May 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[508]  arXiv:2404.14236 (replaced) [pdf, other]
Title: EcoPull: Sustainable IoT Image Retrieval Empowered by TinyML Models
Comments: Paper submitted to IEEE GLOBECOM 2024. Copyright may be transferred without further notice
Subjects: Networking and Internet Architecture (cs.NI)
[509]  arXiv:2404.14399 (replaced) [pdf, other]
Title: MLQAOA: Graph Learning Accelerated Hybrid Quantum-Classical Multilevel QAOA
Comments: 18 pages, 3 figures, 4 tables
Subjects: Quantum Physics (quant-ph); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
[510]  arXiv:2404.14591 (replaced) [pdf, other]
Title: Predicting the Temporal Dynamics of Prosthetic Vision
Subjects: Computational Engineering, Finance, and Science (cs.CE)
[511]  arXiv:2404.14710 (replaced) [pdf, other]
Title: Challenges of Using Pre-trained Models: the Practitioners' Perspective
Comments: SANER 2024
Subjects: Software Engineering (cs.SE)
[512]  arXiv:2404.15378 (replaced) [pdf, other]
Title: Hierarchical Hybrid Sliced Wasserstein: A Scalable Metric for Heterogeneous Joint Distributions
Authors: Khai Nguyen, Nhat Ho
Comments: 28 pages, 11 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[513]  arXiv:2404.15735 (replaced) [pdf, other]
Title: On Replacing Cryptopuzzles with Useful Computation in Blockchain Proof-of-Work Protocols
Comments: Submitted to ACM Computing Surveys
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[514]  arXiv:2404.15789 (replaced) [pdf, other]
Title: MotionMaster: Training-free Camera Motion Transfer For Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[515]  arXiv:2404.16233 (replaced) [pdf, other]
Title: AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models
Comments: Accepted at AutoML 2024 Conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[516]  arXiv:2404.17335 (replaced) [pdf, other]
Title: A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation
Comments: 16 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[517]  arXiv:2404.17429 (replaced) [pdf, other]
Title: Separation capacity of linear reservoirs with random connectivity matrix
Authors: Youness Boutaib
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
[518]  arXiv:2404.17465 (replaced) [pdf, ps, other]
Title: Fast Abstracts and Student Forum Proceedings -- EDCC 2024 -- 19th European Dependable Computing Conference
Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Robotics (cs.RO)
[519]  arXiv:2404.17553 (replaced) [pdf, other]
Title: Federated Transfer Component Analysis Towards Effective VNF Profiling
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
[520]  arXiv:2404.17888 (replaced) [pdf, other]
Title: A Hybrid Approach for Document Layout Analysis in Document images
Comments: ICDAR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[521]  arXiv:2404.17954 (replaced) [pdf, other]
Title: Parameterized Linear Time Transitive Closure
Comments: arXiv admin note: substantial text overlap with arXiv:2212.03945
Subjects: Data Structures and Algorithms (cs.DS)
[522]  arXiv:2404.18084 (replaced) [pdf, other]
Title: Age-minimal Multicast by Graph Attention Reinforcement Learning
Subjects: Networking and Internet Architecture (cs.NI)
[523]  arXiv:2404.18191 (replaced) [pdf, other]
Title: Exploring the Robustness of In-Context Learning with Noisy Labels
Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Optimization and Control (math.OC)
[524]  arXiv:2404.18253 (replaced) [pdf, other]
Title: Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
Authors: Tengjun Huang
Comments: Accepted by the Twelfth International Conference on Learning Representations (ICLR) Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[525]  arXiv:2404.18399 (replaced) [pdf, other]
Title: Semantic Line Combination Detector
Comments: Accepted at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[526]  arXiv:2404.18416 (replaced) [pdf, other]
[527]  arXiv:2404.18444 (replaced) [pdf, other]
Title: U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models
Authors: Song Mei
Comments: v2 updated discussions of related literature
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
[528]  arXiv:2404.18519 (replaced) [pdf, other]
Title: On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[529]  arXiv:2404.18702 (replaced) [pdf, other]
Title: Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Applications (stat.AP); Machine Learning (stat.ML)
[530]  arXiv:2404.18771 (replaced) [pdf, other]
Title: KBX: Verified Model Synchronization via Formal Bidirectional Transformation
Subjects: Software Engineering (cs.SE)
[531]  arXiv:2404.18796 (replaced) [pdf, other]
Title: Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[532]  arXiv:2404.19097 (replaced) [pdf, other]
Title: Exploring the Capability of LLMs in Performing Low-Level Visual Analytic Tasks on SVG Data Visualizations
Subjects: Human-Computer Interaction (cs.HC)
[533]  arXiv:2404.19109 (replaced) [pdf, other]
Title: The Shape of Money Laundering: Subgraph Representation Learning on the Blockchain with the Elliptic2 Dataset
Subjects: Machine Learning (cs.LG); General Finance (q-fin.GN)
[534]  arXiv:2404.19145 (replaced) [pdf, other]
Title: Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Machine Learning (stat.ML)
[535]  arXiv:2404.19217 (replaced) [pdf, other]
Title: FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills
Subjects: Robotics (cs.RO)
[536]  arXiv:2404.19242 (replaced) [pdf, other]
Title: A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems
Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Methodology (stat.ME)
[537]  arXiv:2404.19265 (replaced) [pdf, other]
Title: Mapping New Realities: Ground Truth Image Creation with Pix2Pix Image-to-Image Translation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[538]  arXiv:2404.19326 (replaced) [pdf, other]
Title: LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation
Comments: LVOS V2
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[539]  arXiv:2404.19336 (replaced) [pdf, ps, other]
Title: Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts
Comments: Accepted in ITS 2024
Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
[540]  arXiv:2404.19431 (replaced) [pdf, ps, other]
Title: Integrated Sensing and Communications for Unsourced Random Access: Fundamental Limits
Subjects: Information Theory (cs.IT)
[541]  arXiv:2404.19438 (replaced) [pdf, other]
Title: Neuro-Vision to Language: Image Reconstruction and Language enabled Interaction via Brain Recordings
Subjects: Neural and Evolutionary Computing (cs.NE)
[542]  arXiv:2404.19706 (replaced) [pdf, other]
Title: RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting
Comments: To be published in ACM SIGGRAPH 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)