Updated on 2025.04.18

You can learn directly from this page

Tracking

Publish Date Title Authors PDF Code
2025-04-16 Efficient spin-orbit torque driven magnetization switching of GdFe using phosphorus-implanted platinum layers Kazuki Shintaku et.al. 2504.11796 null
2025-04-15 Chiral Domain Walls Induced by Radially Magnetized Nanotube Geometry Nobuyuki Umetsu et.al. 2504.11005 null
2025-04-16 Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution Chenghao Li et.al. 2504.09566 null
2025-04-13 Sub-nanosecond in-plane magnetization switching induced by field-like spin-orbit torques from ferromagnets Hanying Zhang et.al. 2504.09431 null
2025-04-12 Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking You Wu et.al. 2504.09228 null
2025-04-11 Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions Yingqian Xu et.al. 2504.08257 null
2025-04-08 Magnetic Memory Driven by Orbital Current Jingkai Xu et.al. 2504.05780 null
2025-04-07 Dimensionality Enhanced Out-of-Plane Spin Currents in NbIrTe$_4$ for Efficient Field-Free Switching of Perpendicular Magnetization Wei Yang et.al. 2504.05280 null
2025-04-02 Shape Anisotropy Enabled Field Free Switching of Perpendicular Nanomagnets Akanksha Chouhan et.al. 2504.01634 null
2025-03-31 Symmetry Enhanced Unconventional Spin Current Anisotropy in a Collinear Antiferromagnet Pankhuri Gupta et.al. 2503.20545 null
2025-03-26 Intrinsic back-switching phenomenon in SOT-MRAM devices Kuldeep Ray et.al. 2503.19840 null
2025-03-22 MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking Haolin Qin et.al. 2503.17699 link
2025-04-07 Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID Yu-Hsi Chen et.al. 2503.17237 link
2025-03-21 Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks Haijin Zeng et.al. 2503.16930 null
2025-03-21 Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking Meng Zhou et.al. 2503.16768 null
2025-03-17 UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network Siyuan Yao et.al. 2503.12888 link
2025-03-16 Equivalent-Circuit Thermal Model for Batteries with One-Shot Parameter Identification Myisha A. Chowdhury et.al. 2503.12616 null
2025-03-13 Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking Xinglong Sun et.al. 2503.09951 null
2025-03-09 Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking Chaocan Xue et.al. 2503.06625 link
2025-03-09 Dynamic Updates for Language Adaptation in Visual-Language Tracking Xiaohai Li et.al. 2503.06621 link
2025-03-06 High resolution spectra of the [6297-6303] and [6361-6367] Angström domains (including forbidden OI lines) of the Sun and brightest stars Jean-Marie Malherbe et.al. 2503.05832 null
2025-03-07 Separating the bulk and interface contribution of spin-orbit torque in ferromagnet-Heavy metal bilayers tuned by variation of resistivity of heavy metal Abu Bakkar Miah et.al. 2503.05341 null
2025-03-07 Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Simon A. Aytes et.al. 2503.05179 link
2025-03-02 Inefficiency of the orbit Hall effect on spin torque in transition metal/ferromagnet bilayers Yizhuo Song et.al. 2503.00910 null
2025-02-27 MITracker: Multi-View Integration for Visual Object Tracking Mengjie Xu et.al. 2502.20111 null
2025-03-08 Dynamic Degradation Decomposition Network for All-in-One Image Restoration Huiqiang Wang et.al. 2502.19068 null
2025-02-25 UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking He Wang et.al. 2502.18220 null
2025-02-24 Symmetry-breaking effects on spin-orbit torque switching in ferromagnetic semiconductors with perpendicular magnetic anisotropy Apu Kumar Jana et.al. 2502.16788 null
2025-02-17 Effects of antiferromagnetic coupling and pinning on domain wall dynamics in synthetic ferrimagnets Sougata Mallick et.al. 2502.11621 null
2025-02-13 Modelling spin-orbitronics effects at interfaces and chiral molecules Poonam Kumari et.al. 2502.09239 null
2025-02-12 Highly efficient field-free switching by orbital Hall torque in a MoS2-based device operating at room temperature Antonio Bianco et.al. 2502.08483 null
2025-02-08 Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark Shiao Wang et.al. 2502.05574 link
2025-02-06 Visualizing Field-free Deterministic Magnetic Switching of all-van der Waals Spin-Orbit Torque System Using Spin Ensembles in Hexagonal Boron Nitride Xi Zhang et.al. 2502.04561 null
2025-01-27 Investigation of Sub-configurations Reveals Stable Spin-Orbit Torque Switching Polarity in Polycrystalline Mn3Sn Boyu Zhao et.al. 2501.15815 null
2025-01-25 Thermal Stability and Depinning Currents of Domain Wall-Based Artificial Synapses Guntas Kaur et.al. 2501.15102 null
2025-02-16 Enhancing Unconventional Spin-Orbit Torque Efficiency: Numerical Study on the Influence of Crystallographic Texture and Polycrystalline Effects on Low-Symmetry Materials Yifei Yang et.al. 2501.14200 null
2025-01-22 Enhanced Field-Free Perpendicular Magnetization Switching via spin splitting torque in Altermagnetic RuO2-based Heterostructures Badsha Sekh et.al. 2501.12593 null
2025-01-18 Multilayered MXenes for future two-dimensional nonvolatile magnetic memories P. Kumar et.al. 2501.10678 null
2025-01-13 Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions Xiantong Zhao et.al. 2501.07133 null
2025-01-11 ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation Xuanle Zhao et.al. 2501.06598 link
2025-01-18 BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination Zhongxuan Zhang et.al. 2501.03616 null
2025-01-05 DeTrack: In-model Latent Denoising Learning for Visual Object Tracking Xinyu Zhou et.al. 2501.02467 null
2024-12-31 Alternative harmonic detection approach for quantitative determination of spin and orbital torques Y. Xu et.al. 2501.00403 null
2024-12-30 An Experimental Study of Passive UAV Tracking with Digital Arrays and Cellular Downlink Signals Yifei Sun et.al. 2412.20788 null
2024-12-30 Spin-orbit torque in a three-fold-symmetric bilayer and its effect on magnetization dynamics Wuzhang Fang et.al. 2412.20746 null
2024-12-28 Learning Adaptive and View-Invariant Vision Transformer with Multi-Teacher Knowledge Distillation for Real-Time UAV Tracking You Wu et.al. 2412.20002 link
2024-12-27 Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues X. Feng et.al. 2412.19648 link
2024-12-26 Semistrong edge colorings of planar graphs Yuquan Lin et.al. 2412.19230 null
2024-12-26 SUTrack: Towards Simple and Unified Single Object Tracking Xin Chen et.al. 2412.19138 link
2024-12-24 Linear Enhancement of Spin-Orbit Torques and Absence of Bulk Rashba-Type Spin Splitting in Perpendicularly Magnetized [Pt/Co/W]n Superlattices Zhihao Yan et.al. 2412.18481 null
2024-12-24 Field-free current-induced magnetization switching of a room temperature van der Waals magnet for neuromorphic computing Chenxi Zhou et.al. 2412.18429 null
2024-12-24 All-electric mimicking synaptic plasticity based on the noncollinear antiferromagnetic device Cuimei Cao et.al. 2412.18418 null
2025-01-01 Unsupervised UAV 3D Trajectories Estimation with Sparse Point Clouds Hanfang Liang et.al. 2412.12716 link
2024-12-15 Exploring Enhanced Contextual Information for Video-Level Object Tracking Ben Kang et.al. 2412.11023 link
2024-12-13 Visual Object Tracking across Diverse Data Modalities: A Review Mengmeng Wang et.al. 2412.09991 null
2024-12-09 Magnetic Switching in Monolayer 2D Diluted Magnetic Semiconductors via Spin-to-Spin Conversion Siwei Chen et.al. 2412.06650 null
2024-12-09 Energy Efficient Stochastic Signal Manipulation in Superparamagnetic Tunnel Junctions via Voltage-Controlled Exchange Coupling Qi Jia et.al. 2412.06256 null
2024-12-03 GSOT3D: Towards Generic 3D Single Object Tracking in the Wild Yifan Jiao et.al. 2412.02129 link
2024-12-01 MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning You Wu et.al. 2412.00626 null
2024-11-29 Current-driven motion of magnetic domain-wall skyrmions Haoyang Nie et.al. 2411.19566 null
2024-11-28 Unveiling the anisotropy of linear and nonlinear charge-spin conversion in Weyl semimetal TaIrTe4 Tao Tang et.al. 2411.19062 null
2024-12-04 A Distractor-Aware Memory for Visual Object Tracking with SAM2 Jovana Videnovic et.al. 2411.17576 link
2024-11-24 MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking Chunhui Zhang et.al. 2411.15761 link
2024-11-23 How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking Xuchen Li et.al. 2411.15600 null
2024-11-23 MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking Xinqi Liu et.al. 2411.15459 null
2024-11-24 ClickTrack: Towards Real-time Interactive Single Object Tracking Kuiran Wang et.al. 2411.13183 null
2024-11-30 SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Cheng-Yen Yang et.al. 2411.11922 link
2024-11-14 Compression Method for Solar Polarization Spectra Collected from Hinode SOT/SP Observations Jargalmaa Batmunkh et.al. 2411.09311 null
2024-11-10 Orthogonal Spin-Orbit Torque-Induced Deterministic Switching in NiO Yixiao Qiao et.al. 2411.06379 null
2024-11-08 Giant spin Hall effect with multi-directional spin components in Ni4W Yifei Yang et.al. 2411.05682 null
2024-11-04 Single-layer spin-orbit-torque magnetization switching due to spin Berry curvature generated by minute spontaneous atomic displacement in a Weyl oxide Hiroto Horiuchi et.al. 2411.01806 null
2024-11-04 ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model Yiming Sun et.al. 2411.01756 null
2024-11-03 Capping layer dependent anti-correlation between magnetic damping and spin-orbital to charge conversion Antarjami Sahoo et.al. 2411.01662 null
2024-11-01 Spin orbit torque-driven motion of quasi-Bloch domain wall in perpendicularly magnetized W/CoFeB/MgO structures Nobuyuki Umetsu et.al. 2411.00516 null
2024-10-31 Origin of line broadening in fading granule: influence of small-scale turbulence Ryohtaroh T. Ishikawa et.al. 2410.23654 null
2024-10-27 NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking Yu Liu et.al. 2410.20421 link
2024-10-25 Can Stories Help LLMs Reason? Curating Information Space Through Narrative Vahid Sadiri Javadi et.al. 2410.19221 null
2024-10-19 The Solution for Single Object Tracking Task of Perception Test Challenge 2024 Zhiqiang Zhong et.al. 2410.16329 null
2024-10-14 A stronger form of Yamamoto’s theorem II – Spectral operators Soumyashant Nayak et.al. 2410.16318 null
2024-10-03 Leveraging Event Streams with Deep Reinforcement Learning for End-to-End UAV Tracking Ala Souissi et.al. 2410.14685 null
2024-10-16 DaDiff: Domain-aware Diffusion Model for Nighttime UAV Tracking Haobo Zuo et.al. 2410.12270 link
2024-10-14 SMART-TRACK: A Novel Kalman Filter-Guided Sensor Fusion For Robust UAV Object Tracking in Dynamic Environments Khaled Gabr et.al. 2410.10409 link
2024-10-09 DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM Xuchen Li et.al. 2410.02492 null
2024-10-01 Energy-efficient picosecond spin-orbit torque magnetization switching in ferro- and ferrimagnetic films Eva Díaz et.al. 2410.00474 null
2024-09-27 Improving Visual Object Tracking through Visual Prompting Shih-Fang Chen et.al. 2409.18901 link
2024-09-27 Prompt-Driven Temporal Domain Adaptation for Nighttime UAV Tracking Changhong Fu et.al. 2409.18533 link
2024-09-26 A 5T-2MTJ STT-assisted Spin Orbit Torque based Ternary Content Addressable Memory for Hardware Accelerators Siri Narla et.al. 2409.17863 null
2024-09-26 General Compression Framework for Efficient Transformer Object Tracking Lingyi Hong et.al. 2409.17564 null
2024-09-26 Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking Pengcheng Shao et.al. 2409.17560 null
2024-09-25 Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2 Chunhui Zhang et.al. 2409.16902 link
2024-09-25 Conditional Generative Denoiser for Nighttime UAV Tracking Yucheng Wang et.al. 2409.16834 link
2024-09-25 Progressive Representation Learning for Real-Time UAV Tracking Changhong Fu et.al. 2409.16652 link
2024-09-25 Enhancing Nighttime UAV Tracking with Light Distribution Suppression Liangliang Yao et.al. 2409.16631 link
2024-09-24 Pulse Shaping Strategies for Efficient Switching of Magnetic Tunnel Junctions by Spin-Orbit Torque Marco Hoffmann et.al. 2409.16454 null
2024-09-24 CloudTrack: Scalable UAV Tracking with Cloud Semantics Yannik Blei et.al. 2409.16111 link
2024-09-20 A survey of sulfur-bearing molecular lines toward the dense cores in eleven massive protoclusters Mengyao Tang et.al. 2409.13231 null
2024-09-19 Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC Jiawen Kang et.al. 2409.12388 link
2024-09-11 Topological Spin-Orbit Torque in Ferrimagnetic Weyl Semimetal Tomonari Meguro et.al. 2409.07106 null
2024-09-09 Effects of Interfacial Oxygen Diffusion on the Magnetic Properties and Thermal Stability of Pd/CoFeB/Pd/Ta Heterostructure Saravanan Lakshmanan et.al. 2409.05783 null
2024-09-11 Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition Hao Shi et.al. 2409.00815 null
2024-08-30 Advancing Multi-talker ASR Performance with Large Language Models Mohan Shi et.al. 2408.17431 null
2024-08-30 Cross Fusion RGB-T Tracking with Bi-directional Adapter Zhirong Zeng et.al. 2408.16979 null
2024-08-23 Energy-efficient field-free unconventional spin-orbit torque magnetization switching dynamics in van der Waals heterostructures Lalit Pandey et.al. 2408.13095 null
2024-08-21 Low-Light Object Tracking: A Benchmark Pengzhi Zhong et.al. 2408.11463 link
2024-08-20 MambaEVT: Event Stream based Visual Object Tracking using State Space Model Xiao Wang et.al. 2408.10487 link
2024-08-19 Reconfigurable Spin Logics and High-density Multistate Memory in a Single Spin-orbit Torque Device Raghvendra Posti et.al. 2408.09866 null
2024-08-16 Initialization-Free Multistate Memristor: Synergy of Spin-Orbit Torque and Magnetic Fields Raghvendra Posti et.al. 2408.08641 null
2024-08-15 MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking Simiao Lai et.al. 2408.07889 null
2024-08-12 Latent Disentanglement for Low Light Image Enhancement Zhihao Zheng et.al. 2408.06245 null
2024-08-11 Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends Jeffry Victor et.al. 2408.05857 null
2024-08-05 VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking Yuxuan Lu et.al. 2408.02263 null
2024-08-04 3D Single-object Tracking in Point Clouds with High Temporal Variation Qiao Wu et.al. 2408.02049 null
2024-07-30 Strained topological insulator spin-orbit torque random access memory (STI-SOTRAM) bit cell for energy-efficient Processing in Memory Md Golam Morshed et.al. 2407.20925 null
2024-07-19 HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation Zezeng Li et.al. 2407.14419 null
2024-07-17 Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm Shiyu Liu et.al. 2407.12614 null
2024-07-15 Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss Mufeng Yao et.al. 2407.10485 link
2024-07-16 Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking Lorenzo Vaquero et.al. 2407.10151 link
2024-07-12 DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects Peng Wang et.al. 2407.09051 null
2024-07-11 Manipulating a Tetris-Inspired 3D Video Representation Mihir Godbole et.al. 2407.08885 null
2024-07-11 Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets Linh Van Ma et.al. 2407.08872 link
2024-07-11 CommRad: Context-Aware Sensing-Driven Millimeter-Wave Networks Ish Kumar Jain et.al. 2407.08817 null
2024-07-10 Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors Lei Cheng et.al. 2407.08049 null
2024-07-10 Large spin-orbit torque in a-plane $\alpha$-Fe$_2$O$_3$/Pt bilayers Igor Lyalin et.al. 2407.07731 null
2024-07-10 Spin Splitting in Altermagnetic RuO$_2$ Enables Field-free Spin-Orbit Torque Switching via Dominant Out-of-Plane Spin Polarization Zhuoyi Li et.al. 2407.07447 null
2024-07-09 Unconventional Spin-Orbit Torques from Sputtered MoTe2 Films Shuchen Li et.al. 2407.06487 null
2024-07-07 Addressing single object tracking in satellite imagery through prompt-engineered solutions Athena Psalta et.al. 2407.05518 null
2024-07-07 Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking You Wu et.al. 2407.05383 null
2024-07-09 P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds Jiahao Nie et.al. 2407.05238 link
2024-07-05 Median Mishaps between Chirality and Spin-Orbit Torques via Asymmetric Hysteresis Minhwan Kim et.al. 2407.04624 null
2024-07-04 Serialized Output Training by Learned Dominance Ying Shi et.al. 2407.03966 null
2024-07-04 TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers Fatemeh Nourilenjan Nokabadi et.al. 2407.03946 link
2024-07-04 Out-of-Plane Polarization from Spin Reflection Induces Field-Free Spin-Orbit Torque Switching in Structures with Canted NiO Interfacial Moments Zhe Zhang et.al. 2407.03676 null

HDR

Publish Date Title Authors PDF Code
2025-04-17 CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework Wentao Wu et.al. 2504.12576 null
2025-04-16 Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space Kaustav Chanda et.al. 2504.12515 null
2025-04-16 Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging Tristan S. W. Stevens et.al. 2504.12154 null
2025-04-11 High Dynamic Range Modulo Imaging for Robust Object Detection in Autonomous Driving Kebin Contreras et.al. 2504.11472 null
2025-04-15 GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR Christophe Bolduc et.al. 2504.10809 null
2025-04-14 Minimal Sensing for Orienting a Solar Panel Jeremy Klotz et.al. 2504.10765 null
2025-04-13 Low-Light Image Enhancement using Event-Based Illumination Estimation Lei Sun et.al. 2504.09379 null
2025-04-10 S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion Yujin Wang et.al. 2504.07667 null
2025-04-08 Orthogonal Matching Pursuit based Reconstruction for Modulo Hysteresis Operators Matthias Beckmann et.al. 2504.05895 null
2025-04-08 Inter-event Interval Microscopy for Event Cameras Changqing Su et.al. 2504.04924 null
2025-04-06 eKalibr-Stereo: Continuous-Time Spatiotemporal Calibration for Event-Based Stereo Visual Systems Shuolong Chen et.al. 2504.04451 link
2025-04-05 Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications Brayan Monroy et.al. 2504.04228 null
2025-04-03 Brightness Perceiving for Recursive Low-Light Image Enhancement Haodian Wang et.al. 2504.02362 link
2025-04-02 Anomaly Detection for Hybrid Butterfly Subspecies via Probability Filtering Bo-Kai Ruan et.al. 2504.01671 link
2025-03-31 DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting Seungjun Lee et.al. 2503.24210 null
2025-03-29 SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry Peiyu Chen et.al. 2503.22963 link
2025-03-28 Enhancing Celestial Imaging: High Dynamic Range with Neuromorphic Cameras Satyapreet Singh Yadav et.al. 2503.22814 null
2025-03-26 SpikeDerain: Unveiling Clear Videos from Rainy Sequences Using Color Spike Streams Hanwen Liang et.al. 2503.20315 null
2025-03-26 A Survey on Event-driven 3D Reconstruction: Development under Different Categories Chuanzhi Xu et.al. 2503.19753 null
2025-03-25 Maximum Likelihood Estimation Based Complex-Valued Robust Chinese Remainder Theorem and Its Fast Algorithm Xiaoping Li et.al. 2503.18625 null
2025-03-21 Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras Shuang Guo et.al. 2503.17262 link
2025-03-20 Neuromorphic Cameras in Astronomy: Unveiling the Future of Celestial Imaging Beyond Conventional Limits Satyapreet Singh Yadav et.al. 2503.15883 null
2025-03-19 Boosting HDR Image Reconstruction via Semantic Knowledge Transfer Qingsen Yan et.al. 2503.15361 null
2025-03-20 VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention Mingzhe Zheng et.al. 2503.15138 null
2025-03-18 Weakly Supervised Spatial Implicit Neural Representation Learning for 3D MRI-Ultrasound Deformable Image Registration in HDR Prostate Brachytherapy Jing Wang et.al. 2503.14395 null
2025-03-17 UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks Yuanbin Qian et.al. 2503.12905 link
2025-03-17 Stereo Event-based, 6-DOF Pose Tracking for Uncooperative Spacecraft Zibin Liu et.al. 2503.12732 link
2025-03-16 EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera Luming Wang et.al. 2503.12419 link
2025-03-14 Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP Trevor D. Canham et.al. 2503.11883 null
2025-03-13 GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping Jinfeng Liu et.al. 2503.10143 null
2025-03-10 Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion Haowen Bai et.al. 2503.07235 null
2025-03-08 Optimization models for needle placement in 3D-printed masks for high dose rate brachytherapy Nasim Mirzavand Boroujeni et.al. 2503.06000 null
2025-03-16 DeepGrav: Anomalous Gravitational-Wave Detection Through Deep Latent Features Jianqi Yan et.al. 2503.03799 link
2025-03-05 BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation Gangwei Xu et.al. 2503.03256 null
2025-03-04 ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement Xuejian Guo et.al. 2503.02484 link
2025-03-03 S-R2D2: a spherical extension of the R2D2 deep neural network series paradigm for wide-field radio-interferometric imaging A. Tajja et.al. 2503.01462 null
2025-03-03 Adaptive cold-atom magnetometry mitigating the trade-off between sensitivity and dynamic range Zhu Ma et.al. 2503.01211 null
2025-03-01 High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm Zhaoyi Tian et.al. 2503.00410 link
2025-03-01 Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach Guixu Lin et.al. 2503.00377 null
2025-02-28 EVLoc: Event-based Visual Localization in LiDAR Maps via Event-Depth Registration Kuangyi Chen et.al. 2503.00167 link
2025-02-28 SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events Yunfan Lu et.al. 2502.21120 null
2025-02-18 Fast Antibiotic resistance-Based gene editing of mammalian cells with CRISPR-Cas9 (FAB-CRISPR) Petia Adarska et.al. 2502.12675 null
2025-02-14 Quantifying Phase Magnitudes of Open-Source Focused-Probe 4D-STEM Ptychography Reconstructions Toma Susi et.al. 2502.09938 link
2025-02-10 Indoor Light and Heat Estimation from a Single Panorama Guanzhou Ji et.al. 2502.06973 null
2025-02-09 Compressed sensing enabled high-bandwidth and large dynamic range magnetic sensing Galya Haim et.al. 2502.06070 null
2025-02-09 Energy-Efficient Autonomous Aerial Navigation with Dynamic Vision Sensors: A Physics-Guided Neuromorphic Approach Sourav Sanyal et.al. 2502.05938 null
2025-02-07 Differentiable Mobile Display Photometric Stereo Gawoon Ban et.al. 2502.05055 null
2025-02-05 Deep Learning-based Event Data Coding: A Joint Spatiotemporal and Polarity Solution Abdelrahman Seleem et.al. 2502.03285 null
2025-02-04 Event-aided Semantic Scene Completion Shangwei Guo et.al. 2502.02334 link
2025-01-23 HP2 Survey V. Ophiuchus: Filament formation in a dispersing cloud complex João Alves et.al. 2501.13931 null
2025-01-22 DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning Wenhao Gu et.al. 2501.12898 null
2025-01-20 UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion Zixuan Chen et.al. 2501.11515 null
2025-01-10 eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events Shuolong Chen et.al. 2501.05688 link
2025-01-07 AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene Chaoran Feng et.al. 2501.02807 null
2024-12-26 Learning Monocular Depth from Events via Egomotion Compensation Haitao Meng et.al. 2412.19067 null
2024-12-25 HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis Mohammed Hamdan et.al. 2412.18981 null
2024-12-20 High-Dynamic Range Broadband Terahertz Time-Domain Spectrometer Based on Organic Crystal MNA Samira Mansourzadeh et.al. 2412.15718 null
2024-12-19 Event-assisted 12-stop HDR Imaging of Dynamic Scene Shi Guo et.al. 2412.14705 null
2025-01-06 LEDiff: Latent Exposure Diffusion for HDR Generation Chao Wang et.al. 2412.14456 null
2024-12-18 Development of a High-Resolution, High-Dynamic-Range Charge Detector for Ion Beam Monitoring O. Adriani et.al. 2412.13934 null
2024-12-18 Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode Xin Su et.al. 2412.13749 link
2024-12-17 Transforming Single Photon Camera Images to Color High Dynamic Range Images Sumit Sharma et.al. 2412.12942 null
2024-12-17 Efficient Event-based Semantic Segmentation with Spike-driven Lightweight Transformer-based Networks Xiaxin Zhu et.al. 2412.12843 null
2024-12-17 Compressed Sensing Based Residual Recovery Algorithms and Hardware for Modulo Sampling Shaik Basheeruddin Shah et.al. 2412.12724 null
2024-12-16 Towards Physically-Based Sky-Modeling Ian J. Maquignaz et.al. 2412.11883 null
2024-12-16 High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit Sonia Rani et.al. 2412.11859 null
2024-12-16 Predicting the Original Appearance of Damaged Historical Documents Zhenhua Yang et.al. 2412.11634 link
2024-12-16 Event-based Detectors for Laser Guide Star Tip-Tilt Sensing Monique Cockram et.al. 2412.11436 null
2024-12-12 Continuous Gaussian Process Pre-Optimization for Asynchronous Event-Inertial Odometry Zhixiang Wang et.al. 2412.08909 null
2024-12-10 EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering Toshiya Yura et.al. 2412.07293 null
2024-12-09 Fitting Spherical Gaussians to Dynamic HDRI Sequences Pascal Clausen et.al. 2412.06511 null
2024-12-09 Event fields: Capturing light fields at high speed, resolution, and dynamic range Ziyuan Qu et.al. 2412.06191 null
2024-12-07 On an Analytical Inversion Formula for the Modulo Radon Transform Matthias Beckmann et.al. 2412.05711 null
2024-12-05 DHOST theories as disformal gravity: From black holes to radiative spacetimes Jibril Ben Achour et.al. 2412.04135 null
2024-12-05 High-power single-cycle THz emission from large-area photoconductive emitters at 400 kHz Mohsen Khalili et.al. 2412.04004 null
2024-12-05 Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization Tianyu Chen et.al. 2412.03941 null
2024-12-04 Accelerating HI density predictions during the Epoch of Reionization using a GPR-based emulator on N-body simulations Gaurav Pundir et.al. 2412.03485 null
2024-12-03 EvRT-DETR: The Surprising Effectiveness of DETR-based Detection for Event Cameras Dmitrii Torbunov et.al. 2412.02890 link
2024-12-02 Learning Differential Pyramid Representation for Tone Mapping Qirui Yang et.al. 2412.01463 null
2024-11-28 Event-based Tracking of Any Point with Motion-Robust Correlation Features Friedhelm Hamann et.al. 2412.00133 link
2024-11-25 CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain Jingchao Peng et.al. 2411.16327 null
2024-11-22 High-dynamic-range atomic clocks with dual Heisenberg-limited precision scaling Jungeng Zhou et.al. 2411.14944 null
2024-11-20 Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding David Mascareñas et.al. 2411.13108 null
2024-11-18 Noise Filtering Benchmark for Neuromorphic Satellites Observations Sami Arja et.al. 2411.11233 link
2024-11-16 Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion Kepeng Xu et.al. 2411.10775 null
2024-11-15 CaLES: A GPU-accelerated solver for large-eddy simulation of wall-bounded flows Maochao Xiao et.al. 2411.09364 link
2024-11-11 Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models NVIDIA et.al. 2411.07126 null
2024-11-25 Increasing the scalability of graph convolution for FPGA-implemented event-based vision Piotr Wzorek et.al. 2411.04269 null
2024-11-13 DEIO: Deep Event Inertial Odometry Weipeng Guan et.al. 2411.03928 link
2024-11-05 Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor Anish Bhattacharya et.al. 2411.03303 null
2024-11-05 Learning-based Lossless Event Data Compression Ahmadreza Sezavar et.al. 2411.03010 null
2024-10-30 Automatic programming via large language models with population self-evolution for dynamic job shop scheduling problem Jin Huang et.al. 2410.22657 null
2024-10-29 EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data Zhonghua Yi et.al. 2410.21743 link
2024-10-28 NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments Taiyi Pan et.al. 2410.21615 link
2024-10-27 BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events Yijin Li et.al. 2410.20451 null
2024-10-26 Unleashing Dynamic Range and Resolution in Unlimited Sensing Framework via Novel Hardware Yuliang Zhu et.al. 2410.20193 null
2024-10-21 Scene-Segmentation-Based Exposure Compensation for Tone Mapping of High Dynamic Range Scenes Yuma Kinoshita et.al. 2410.19839 null
2024-10-24 Environment Maps Editing using Inverse Rendering and Adversarial Implicit Functions Antonio D’Orazio et.al. 2410.18622 null
2024-10-23 Frequency-dependent amplitude correction to free-precession scalar magnetometers M. E. Limes et.al. 2410.18224 null
2024-10-22 SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition Jiaqi Chen et.al. 2410.16746 link
2024-10-19 A Cycle Ride to HDR: Semantics Aware Self-Supervised Framework for Unpaired LDR-to-HDR Image Translation Hrishav Bakul Barua et.al. 2410.15068 link
2024-10-17 360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers Jack Hilliard et.al. 2410.13566 null
2024-10-17 On Quantum Programming Languages Benoît Valiron et.al. 2410.13337 null
2024-10-16 An O(m+n)-Space Spatiotemporal Denoising Filter with Cache-Like Memories for Dynamic Vision Sensors Qinghang Zhao et.al. 2410.12423 null
2024-10-10 DifFRelight: Diffusion-Based Facial Performance Relighting Mingming He et.al. 2410.08188 null
2024-10-18 IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera Jian Huang et.al. 2410.08107 link
2024-10-09 Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras Friedhelm Hamann et.al. 2410.06698 null
2024-10-03 Spiking Neural Network as Adaptive Event Stream Slicer Jiahang Cao et.al. 2410.02249 link
2024-10-03 Capturing complex hand movements and object interactions using machine learning-powered stretchable smart textile gloves Arvin Tashakori et.al. 2410.02221 link
2024-10-01 Signatures of Black Hole Spin and Plasma Acceleration in Jet Polarimetry Zachary Gelles et.al. 2410.00954 null
2024-10-04 VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models Jiapeng Wang et.al. 2410.00741 null
2024-09-26 Photon Inhibition for Energy-Efficient Single-Photon Imaging Lucas J. Koerner et.al. 2409.18337 null
2024-09-26 Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions Weng Fei Low et.al. 2409.17988 null
2024-09-26 Unsupervised Learning Based Multi-Scale Exposure Fusion Chaobing Zheng et.al. 2409.17830 null
2024-09-26 Event-based Stereo Depth Estimation: A Survey Suman Ghosh et.al. 2409.17680 null
2024-09-26 Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking Pengcheng Shao et.al. 2409.17560 null
2024-09-25 EventHDR: from Event to High-Speed HDR Videos and Beyond Yunhao Zou et.al. 2409.17029 null
2024-09-25 Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training Kun Song et.al. 2409.16767 null
2024-09-24 Sub-Nyquist USF Spectral Estimation: $K$ Frequencies with $6K + 4$ Modulo Samples Ruiming Guo et.al. 2409.16472 null
2024-09-24 Neuromorphic Drone Detection: an Event-RGB Multimodal Approach Gabriele Magrini et.al. 2409.16099 link
2024-09-24 Deep chroma compression of tone-mapped images Xenios Milidonis et.al. 2409.16032 link
2024-09-23 Mixing Data-driven and Geometric Models for Satellite Docking Port State Estimation using an RGB or Event Camera Cedric Le Gentil et.al. 2409.15581 null
2024-09-23 SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream Jinze Yu et.al. 2409.15176 link
2024-09-21 Monocular Event-Inertial Odometry with Adaptive decay-based Time Surface and Polarity-aware Tracking Kai Tang et.al. 2409.13971 null
2024-09-20 Intrinsic Single-Image HDR Reconstruction Sebastian Dille et.al. 2409.13803 link
2024-09-20 Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors Zixin Zhang et.al. 2409.13392 null
2024-09-18 EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning Yukun Tian et.al. 2409.11813 null
2024-09-18 Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network Jiale Wang et.al. 2409.11677 null
2024-09-16 Programmable multifunctional integrated microwave photonic circuit on thin-film lithium niobate Chuangchuang Wei et.al. 2409.10227 null
2024-09-15 SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux Rui Graca et.al. 2409.09648 null
2024-09-13 Integration of high-performance compact interferometric sensors in a suspended interferometer Alexandra Mitchell et.al. 2409.08843 null
2024-09-13 Adaptive Robust High-Precision Atomic Gravimetry Jinye Wei et.al. 2409.08550 null
2024-09-07 Neural Augmentation Based Panoramic High Dynamic Range Stitching Chaobing Zheng et.al. 2409.04679 null
2024-09-05 MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice Friedhelm Hamann et.al. 2409.03358 link
2024-09-03 Gradient events: improved acquisition of visual information in event cameras Eero Lehtonen et.al. 2409.01764 null
2024-09-02 SoK: Security of the Image Processing Pipeline in Autonomous Vehicles Michael Kühr et.al. 2409.01234 link
2024-08-30 Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms Marcus Märtens et.al. 2408.16971 null
2024-08-29 EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More Kanghao Chen et.al. 2408.16254 null
2024-08-28 ES-PTAM: Event-based Stereo Parallel Tracking and Mapping Suman Ghosh et.al. 2408.15605 link
2024-08-27 Towards Real-world Event-guided Low-light Video Enhancement and Deblurring Taewoo Kim et.al. 2408.14916 link
2024-08-27 Recent Event Camera Innovations: A Survey Bharatesh Chakravarthi et.al. 2408.13627 link
2024-08-24 Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation Yuxuan Zhou et.al. 2408.13586 link
2024-08-22 ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes Zhenyi Liu et.al. 2408.12048 link
2024-08-20 Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm Xiao Wang et.al. 2408.10488 link
2024-08-20 MambaEVT: Event Stream based Visual Object Tracking using State Space Model Xiao Wang et.al. 2408.10487 link
2024-08-19 Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms Xiao Wang et.al. 2408.09764 link
2024-08-19 Phase-Separated Charge Order and Twinning Across Length Scales in CsV$_3$Sb$_5$ Jayden Plumb et.al. 2408.08842 null
2024-08-16 CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving Shihan Peng et.al. 2408.08500 null
2024-08-13 MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation JunYong Choi et.al. 2408.06707 null
2024-08-13 HDRGS: High Dynamic Range Gaussian Splatting Jiahao Wu et.al. 2408.06543 link
2024-08-12 Rethinking Video with a Universal Event-Based Representation Andrew Freeman et.al. 2408.06248 null
2024-08-10 EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency Junjie Jiang et.al. 2408.05452 null
2024-08-06 Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera Zibin Liu et.al. 2408.03225 link
2024-07-31 Exploiting Change Blindness for Video Coding: Perspectives from a Less Promising User Study Mitra Amiri et.al. 2408.00052 null
2024-07-23 HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images Shreyas Singh et.al. 2407.16503 link
2024-07-23 SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging Lingtong Kong et.al. 2407.16308 link
2024-07-24 SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams Liangyan Jiang et.al. 2407.15708 link
2024-08-04 Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering Jiahao Cui et.al. 2407.13309 link
2024-07-18 Learned HDR Image Compression for Perceptually Optimal Storage and Display Peibei Cao et.al. 2407.13179 null
2024-07-17 Nonlinear tomographic reconstruction via nonsmooth optimization Vasileios Charisopoulos et.al. 2407.12984 null
2024-07-16 VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos Devesh Walawalkar et.al. 2407.12214 null
2024-07-16 I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM Gwangtak Bae et.al. 2407.11347 null
2024-07-15 Temporal Event Stereo via Joint Learning with Stereoscopic Flow Hoonhee Cho et.al. 2407.10831 link
2024-07-15 Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation Yuhwan Jeong et.al. 2407.10703 link
2024-07-15 Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction Lin Zhu et.al. 2407.10636 null
2024-07-18 Efficient hybrid technique for generating sub-grid haloes in reionization simulations Ankur Barsode et.al. 2407.10585 null
2024-07-12 Radiance Fields from Photons Sacha Jungerman et.al. 2407.09386 null
2024-07-11 Event-based vision on FPGAs – a survey Tomasz Kryjak et.al. 2407.08356 null
2024-07-12 Dynamic phase transition into a mixed-CDW state in 1$T$-TaS$_2$ via a thermal quench A. de la Torre et.al. 2407.07953 null
2024-07-08 PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes Mohammad Reza Karimi Dastjerdi et.al. 2407.06150 null
2024-07-08 Neuromorphic Imaging with Super-Resolution Pei Zhang et.al. 2407.05764 null

Low-Level

Publish Date Title Authors PDF Code
2025-04-17 SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs Haoxuan Li et.al. 2504.13172 null
2025-04-17 Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal Inzamamul Alam et.al. 2504.12809 null
2025-04-17 AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting Xin Su et.al. 2504.12605 null
2025-04-16 Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling Zhihua Wang et.al. 2504.12204 null
2025-04-16 Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging Tristan S. W. Stevens et.al. 2504.12154 null
2025-04-16 Generalized Visual Relation Detection with Diffusion Models Kaifeng Gao et.al. 2504.12100 null
2025-04-16 R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors Haoyang Wang et.al. 2504.11946 null
2025-04-16 Learning Physics-Informed Color-Aware Transforms for Low-Light Image Enhancement Xingxing Yang et.al. 2504.11896 null
2025-04-16 HyperKING: Quantum-Classical Generative Adversarial Networks for Hyperspectral Image Restoration Chia-Hsiang Lin et.al. 2504.11782 null
2025-04-15 Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain Pengcheng Zheng et.al. 2504.11286 null
2025-04-15 Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach Xiaoxiao Ma et.al. 2504.11262 null
2025-04-15 Visual Re-Ranking with Non-Visual Side Information Gustav Hanning et.al. 2504.11134 null
2025-04-15 UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques Pedro Diaz-Garcia et.al. 2504.11063 null
2025-04-15 TMCIR: Token Merge Benefits Composed Image Retrieval Chaoyang Wang et.al. 2504.10995 null
2025-04-15 AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent Pu Wang et.al. 2504.10978 null
2025-04-15 An Efficient and Mixed Heterogeneous Model for Image Restoration Yubin Gu et.al. 2504.10967 null
2025-04-15 DAAF: Degradation-Aware Adaptive Fusion Framework for Robust Infrared and Visible Images Fusion Tianpei Zhang et.al. 2504.10871 null
2025-04-14 PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems Maud Biquard et.al. 2504.10375 null
2025-04-14 Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis Kaiwen Zheng et.al. 2504.10351 null
2025-04-14 VibrantLeaves: A principled parametric image generator for training deep restoration models Raphael Achddou et.al. 2504.10201 null
2025-04-14 Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction Yucheng Lu et.al. 2504.10080 null
2025-04-14 Progressive Transfer Learning for Multi-Pass Fundus Image Restoration Uyen Phan et.al. 2504.10025 null
2025-04-14 Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration Gang Wu et.al. 2504.09973 null
2025-04-14 Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition Changwei Wang et.al. 2504.09881 null
2025-04-13 Computationally iterative methods for salt-and-pepper denoising Jianwei Ke et.al. 2504.09408 null
2025-04-13 Low-Light Image Enhancement using Event-Based Illumination Estimation Lei Sun et.al. 2504.09379 null
2025-04-12 Beyond Degradation Conditions: All-in-One Image Restoration via HOG Transformers Jiawei Wu et.al. 2504.09377 null
2025-04-11 Hypergraph Vision Transformers: Images are More than Nodes, More than Edges Joshua Fixelle et.al. 2504.08710 null
2025-04-11 ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration Yongsheng Yu et.al. 2504.08591 null
2025-04-11 FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations Cheng-Yu Hsieh et.al. 2504.08368 null
2025-04-11 DreamFuse: Adaptive Image Fusion with Diffusion Transformer Junjia Huang et.al. 2504.08291 null
2025-04-11 VL-UR: Vision-Language-guided Universal Restoration of Images Degraded by Adverse Weather Conditions Ziyan Liu et.al. 2504.08219 null
2025-04-10 Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement Daniel Torres et.al. 2504.07810 null
2025-04-10 Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval Zehong Ma et.al. 2504.07718 null
2025-04-10 Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying Shichen Li et.al. 2504.07465 null
2025-04-10 Synthetic CT Generation from Time-of-Flight Non-Attenutaion-Corrected PET for Whole-Body PET Attenuation Correction Weijie Chen et.al. 2504.07450 null
2025-04-09 Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model Yingjie Zhou et.al. 2504.07148 null
2025-04-09 Distilling Textual Priors from LLM to Efficient Image Fusion Ran Zhang et.al. 2504.07029 null
2025-04-09 Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception Ruotian Peng et.al. 2504.06666 null
2025-04-09 Rethinking LayerNorm in Image Restoration Transformers MinKyu Lee et.al. 2504.06629 null
2025-04-08 AstroClearNet: Deep image prior for multi-frame astronomical image restoration Yashil Sukurdeep et.al. 2504.06463 null
2025-04-09 Robust Fusion Controller: Degradation-aware Image Fusion with Fine-grained Language Instructions Hao Zhang et.al. 2504.05795 null
2025-04-07 Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion Xingyu Hu et.al. 2504.05164 null
2025-04-07 DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration Jiamei Xiong et.al. 2504.05135 null
2025-04-08 Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision Yuandong Pu et.al. 2504.04903 null
2025-04-07 Content-Aware Transformer for All-in-one Image Restoration Gang Wu et.al. 2504.04869 null
2025-04-07 Inland Waterway Object Detection in Multi-environment: Dataset and Approach Shanshan Wang et.al. 2504.04835 null
2025-04-06 NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval Peng Gao et.al. 2504.04339 null
2025-04-05 JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration Yunlong Lin et.al. 2504.04158 null
2025-04-04 Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal Yuyang Hu et.al. 2504.03607 null
2025-04-04 REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval Shabnam Choudhury et.al. 2504.03169 null
2025-04-04 Finding the Reflection Point: Unpadding Images to Remove Data Augmentation Artifacts in Large Open Source Image Datasets for Machine Learning Lucas Choi et.al. 2504.03168 null
2025-04-03 RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models ZhongLi Fang et.al. 2504.02640 null
2025-04-03 Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement Hesong Li et.al. 2504.02555 null
2025-04-03 HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement Hantang Li et.al. 2504.02373 null
2025-04-03 Brightness Perceiving for Recursive Low-Light Image Enhancement Haodian Wang et.al. 2504.02362 link
2025-04-03 SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW Masakazu Yoshimura et.al. 2504.02345 null
2025-04-02 Bridge the Gap between SNN and ANN for Image Restoration Xin Su et.al. 2504.01755 null
2025-04-02 Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval Yuji Nozawa et.al. 2504.01348 null
2025-04-01 IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval Bangwei Liu et.al. 2504.00954 null
2025-04-01 Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data Yiqun Duan et.al. 2504.00812 null
2025-04-01 Deconver: A Deconvolutional Network for Medical Image Segmentation Pooya Ashtari et.al. 2504.00302 link
2025-03-31 InstructRestore: Region-Customized Image Restoration with Human Instructions Shuaizheng Liu et.al. 2503.24357 link
2025-03-31 CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization Yingrui Ji et.al. 2503.24182 null
2025-03-31 3D Dental Model Segmentation with Geometrical Boundary Preserving Shufan Xi et.al. 2503.23702 null
2025-03-30 Multiview Image-Based Localization Cameron Fiore et.al. 2503.23577 null
2025-03-30 ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts Linfeng Tang et.al. 2503.23356 null
2025-03-30 DSPFusion: Image Fusion via Degradation and Semantic Dual-Prior Guidance Linfeng Tang et.al. 2503.23355 null
2025-03-29 A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery Pengyu Chen et.al. 2503.23200 null
2025-03-29 indiSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy Ashesh Ashesh et.al. 2503.22983 null
2025-03-28 RELD: Regularization by Latent Diffusion Models for Image Restoration Pasquale Cascarano et.al. 2503.22563 null
2025-03-27 Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration Yujie Chen et.al. 2503.21970 null
2025-03-27 LOCORE: Image Re-ranking with Long-Context Sequence Modeling Zilin Xiao et.al. 2503.21772 link
2025-03-27 Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck Adrian Bulat et.al. 2503.21757 null
2025-03-27 Invert2Restore: Zero-Shot Degradation-Blind Image Restoration Hamadi Chihaoui et.al. 2503.21486 null
2025-03-27 Diffusion Image Prior Hamadi Chihaoui et.al. 2503.21410 null
2025-03-27 FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval Zixu Li et.al. 2503.21309 link
2025-03-27 Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing Shuai Li et.al. 2503.21236 null
2025-03-26 Underwater Image Enhancement by Convolutional Spiking Neural Networks Vidya Sudevan et.al. 2503.20485 link
2025-03-26 Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration Shihao Zhou et.al. 2503.20174 null
2025-03-25 CoLLM: A Large Language Model for Composed Image Retrieval Chuong Huynh et.al. 2503.19910 link
2025-03-25 LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark Dataset Manjushree Aithal et.al. 2503.19804 null
2025-03-25 Scene-agnostic Pose Regression for Visual Localization Junwei Zheng et.al. 2503.19543 null
2025-03-25 From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting Zhiwei Huang et.al. 2503.19358 null
2025-03-25 Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval Haoqiang Lin et.al. 2503.19296 link
2025-03-24 LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment Haoran Wang et.al. 2503.18640 null
2025-03-24 OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning Hui Li et.al. 2503.18635 null
2025-03-24 Dig2DIG: Dig into Diffusion Information Gains for Image Fusion Bing Cao et.al. 2503.18627 null
2025-03-24 Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model Tianpei Zhang et.al. 2503.18378 null
2025-03-23 LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space Zhangyu Wang et.al. 2503.18142 null
2025-03-23 Deep Learning Assisted Denoising of Experimental Micrographs Owais Ahmad et.al. 2503.17945 null
2025-03-23 Cross-Domain Underwater Image Enhancement Guided by No-Reference Image Quality Assessment: A Transfer Learning Approach Zhi Zhang et.al. 2503.17937 null
2025-03-23 Cat-AIR: Content and Task-Aware All-in-One Image Restoration Jiachen Jiang et.al. 2503.17915 null
2025-03-23 What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images Dongheng Lin et.al. 2503.17899 null
2025-03-22 good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval Pranavi Kolouju et.al. 2503.17871 null
2025-03-21 Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval Yuanmin Tang et.al. 2503.17109 link
2025-03-21 Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks Haijin Zeng et.al. 2503.16930 null
2025-03-20 Efficient Bayesian Computation Using Plug-and-Play Priors for Poisson Inverse Problems Teresa Klatzer et.al. 2503.16222 null
2025-03-20 3-D Image-to-Image Fusion in Lightsheet Microscopy by Two-Step Adversarial Network: Contribution to the FuseMyCells Challenge Marek Wodzinski et.al. 2503.16075 null
2025-03-20 PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval Qiang Zou et.al. 2503.16064 link
2025-03-20 Automating 3D Dataset Generation with Neural Radiance Fields P. Schulz et.al. 2503.15997 link
2025-03-20 DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration Suraj Singh et.al. 2503.15984 null
2025-03-21 UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations Debabrata Mandal et.al. 2503.15868 null
2025-03-19 Image Restoration Models with Optimal Transport and Total Variation Regularization Weijia Huang et.al. 2503.14947 null
2025-03-19 MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance Zihan Cao et.al. 2503.14944 null
2025-03-19 Degradation Alchemy: Self-Supervised Unknown-to-Known Transformation for Blind Hyperspectral Image Fusion He Huang et.al. 2503.14892 null
2025-03-18 Revisiting Image Fusion for Multi-Illuminant White-Balance Correction David Serrano-Lozano et.al. 2503.14774 null
2025-03-18 SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model Yucheng Mao et.al. 2503.14463 null
2025-03-18 AI-Driven Diabetic Retinopathy Diagnosis Enhancement through Image Processing and Salp Swarm Algorithm-Optimized Ensemble Network Saif Ur Rehman Khan et.al. 2503.14209 null
2025-03-18 Towards properties of adversarial image perturbations Egor Kuznetsov et.al. 2503.14111 null
2025-03-18 Intra and Inter Parser-Prompted Transformers for Effective Image Restoration Cong Wang et.al. 2503.14037 link
2025-03-17 Scale Efficient Training for Large Datasets Qing Zhou et.al. 2503.13385 null
2025-03-17 From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective Chen Zhao et.al. 2503.13165 null
2025-03-17 All You Need to Know About Training Image Retrieval Models Gabriele Berton et.al. 2503.13045 link
2025-03-17 Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion Yidi Liu et.al. 2503.12764 null
2025-03-16 DPF-Net: Physical Imaging Model Embedded Data-Driven Underwater Image Enhancement Han Mei et.al. 2503.12470 link
2025-03-16 Pathology Image Restoration via Mixture of Prompts Jiangdong Cai et.al. 2503.12399 link
2025-03-14 Advancements in Real-Time Oncology Diagnosis: Harnessing AI and Image Fusion Techniques Leila Bagheriye et.al. 2503.11332 null
2025-03-14 Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking Andong Lu et.al. 2503.11247 null
2025-03-14 Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption Du Chen et.al. 2503.11221 null
2025-03-14 InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences Hongkai Zheng et.al. 2503.11043 null
2025-03-13 ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning Pengfei Luo et.al. 2503.10166 link
2025-03-13 Hybrid Agents for Image Restoration Bingchen Li et.al. 2503.10120 null
2025-03-13 Dream-IF: Dynamic Relative EnhAnceMent for Image Fusion Xingxin Xu et.al. 2503.10109 null
2025-03-12 FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification Shoaib Meraj Sami et.al. 2503.09873 null
2025-03-12 Multi-Agent Image Restoration Xu Jiang et.al. 2503.09403 null
2025-03-12 Revisiting Medical Image Retrieval via Knowledge Consolidation Yang Nan et.al. 2503.09370 null
2025-03-12 MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration Zhehui Wu et.al. 2503.09131 link
2025-03-12 Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal Rongxin Liao et.al. 2503.09013 link
2025-03-11 QUIET-SR: Quantum Image Enhancement Transformer for Single Image Super-Resolution Siddhant Dutta et.al. 2503.08759 null
2025-03-11 Language-Depth Navigated Thermal and Visible Image Fusion Jinchang Zhang et.al. 2503.08676 null
2025-03-11 PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net Jun Yin et.al. 2503.08276 null
2025-03-11 TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement Miao Zhang et.al. 2503.08168 null
2025-03-11 Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features Hanbyul Lee et.al. 2503.08148 null
2025-03-11 Deep Perceptual Enhancement for Medical Image Analysis S M A Sharif et.al. 2503.08027 link
2025-03-10 GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts Minwen Liao et.al. 2503.07417 null
2025-03-10 Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion Haowen Bai et.al. 2503.07235 null
2025-03-11 Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios Chenglu Pan et.al. 2503.07232 null
2025-03-10 Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization Michael Green et.al. 2503.07038 null
2025-03-10 Zero-Shot Hashing Based on Reconstruction With Part Alignment Yan Jiang et.al. 2503.07037 null
2025-03-10 Learning a Unified Degradation-aware Representation Model for Multi-modal Image Fusion Haolong Ma et.al. 2503.07033 null
2025-03-10 MERLION: Marine ExploRation with Language guIded Online iNformative Visual Sampling and Enhancement Shrutika Vishal Thengane et.al. 2503.06953 null
2025-03-09 RoboDesign1M: A Large-scale Dataset for Robot Design Understanding Tri Le et.al. 2503.06796 null
2025-03-09 StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition Yanqing Shen et.al. 2503.06601 link
2025-03-07 Data-Efficient Generalization for Zero-shot Composed Image Retrieval Zining Chen et.al. 2503.05204 null
2025-03-06 RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining Tengfei Zhang et.al. 2503.04653 null
2025-03-06 Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior Haitao Wu et.al. 2503.04207 null
2025-03-05 An Adaptive Underwater Image Enhancement Framework via Multi-Domain Fusion and Color Compensation Yuezhe Tian et.al. 2503.03640 null
2025-03-05 Mineral segmentation using electron microscope images and spectral sampling through multimodal graph neural networks Samuel Repka et.al. 2503.03507 null
2025-03-05 Two-Stream Thermal Imaging Fusion for Enhanced Time of Birth Detection in Neonatal Care Jorge García-Torres et.al. 2503.03244 null
2025-03-03 Hyperspectral Image Restoration and Super-resolution with Physics-Aware Deep Learning for Biomedical Applications Yuchen Xiang et.al. 2503.02908 null
2025-03-04 ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement Xuejian Guo et.al. 2503.02484 link
2025-03-04 Semantic Prior Distillation with Vision Foundation Model for Enhanced Rapid Bone Scintigraphy Image Restoration Pengchen Liang et.al. 2503.02321 null
2025-03-03 MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting Mojtaba Safari et.al. 2503.01576 link
2025-03-03 Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions Zihan Shen et.al. 2503.01339 null
2025-03-03 Composed Multi-modal Retrieval: A Survey of Approaches and Applications Kun Zhang et.al. 2503.01334 link
2025-03-03 Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual Chong Wang et.al. 2503.01288 link
2025-03-03 Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond Guanyao Wu et.al. 2503.01210 null
2025-03-02 Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion Daiki Nishiyama et.al. 2503.00925 null
2025-03-01 Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement Aupendu Kar et.al. 2503.00642 link
2025-03-01 Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning Songlin Dong et.al. 2503.00515 null
2025-02-28 SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events Yunfan Lu et.al. 2502.21120 null
2025-02-28 CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval Zelong Sun et.al. 2502.20826 null
2025-02-28 Diffusion Restoration Adapter for Real-World Image Restoration Hanbang Liang et.al. 2502.20679 null
2025-02-28 HVI: A New Color Space for Low-light Image Enhancement Qingsen Yan et.al. 2502.20272 link
2025-02-27 Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps Tianxiao Gao et.al. 2502.20054 null
2025-02-27 Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement Nan An et.al. 2502.19867 null
2025-02-27 One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion Chunyang Cheng et.al. 2502.19854 link
2025-02-26 ILACS-LGOT: A Multi-Layer Contrast Enhancement Approach for Palm-Vein Images Kaveen Perera et.al. 2502.19456 null
2025-02-27 On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation Ruben T. Lucassen et.al. 2502.19285 null
2025-02-26 Self-supervised conformal prediction for uncertainty quantification in Poisson imaging problems Bernardin Tamo Amougou et.al. 2502.19194 null
2025-02-26 Multi-level Attention-guided Graph Neural Network for Image Restoration Jiatao Jiang et.al. 2502.19181 null
2025-02-27 RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images Yuhan Tang et.al. 2502.19153 null
2025-02-26 Dynamic Degradation Decomposition Network for All-in-One Image Restoration Huiqiang Wang et.al. 2502.19068 null
2025-02-25 Spatial Analysis of Neuromuscular Junctions Activation in Three-Dimensional Histology-based Muscle Reconstructions Alessandro Ascani Orsini et.al. 2502.18646 link
2025-02-24 Splitting Regularized Wasserstein Proximal Algorithms for Nonsmooth Sampling Problems Fuqun Han et.al. 2502.16773 link
2025-02-23 Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries Yin Wu et.al. 2502.16636 link
2025-02-21 Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement Uche A. Nnolim et.al. 2502.15986 null
2025-02-21 ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval Guanqi Zhan et.al. 2502.15682 null
2025-02-21 LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement Namrah Siddiqua et.al. 2502.15186 null
2025-02-21 Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization Ach Khozaimi et.al. 2502.15156 null
2025-02-20 Reinforcement Learning for Ultrasound Image Analysis A Comprehensive Review of Advances and Applications Maha Ezzelarab et.al. 2502.14995 null
2025-02-20 CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond Yukai Shi et.al. 2502.14493 null
2025-02-20 EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement Wenhui Zhu et.al. 2502.14260 null
2025-02-19 RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior Ching-Hua Lee et.al. 2502.13574 null
2025-02-18 Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization Shuo Xing et.al. 2502.13146 link
2025-02-18 Local Flaw Detection with Adaptive Pyramid Image Fusion Across Spatial Sampling Resolution for SWRs Siyu You et.al. 2502.12512 null
2025-02-17 Descriminative-Generative Custom Tokens for Vision-Language Models Pramuditha Perera et.al. 2502.12095 null
2025-02-17 ILIAS: Instance-Level Image retrieval At Scale Giorgos Kordopatis-Zilos et.al. 2502.11748 null
2025-02-17 Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics Francesco Croce et.al. 2502.11725 link
2025-02-17 Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization Yuanze Xu et.al. 2502.11408 null
2025-02-12 E2LVLM: Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection Junjie Wu et.al. 2502.10455 null
2025-02-19 Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal Jinpei Guo et.al. 2502.09873 link
2025-02-13 Source function from two-particle correlation function through entropy-regularized Richardson-Lucy deblurring C. K. Tam et.al. 2502.09478 null
2025-02-13 ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation Rotem Shalev-Arkushin et.al. 2502.09411 null
2025-02-12 Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions Prajwal Gatti et.al. 2502.08438 null
2025-02-13 MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers Ao Li et.al. 2502.07856 null
2025-02-11 Captured by Captions: On Memorization and its Mitigation in CLIP Models Wenhao Wang et.al. 2502.07830 null
2025-02-11 Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems Ai Chen et.al. 2502.07351 link
2025-02-11 Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos Haowen Gao et.al. 2502.07327 null
2025-02-11 PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval Osman Tursun et.al. 2502.07215 null
2025-02-10 AstroLoc: Robust Space to Ground Image Localizer Gabriele Berton et.al. 2502.07003 null
2025-02-10 UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis Zemin Yang et.al. 2502.06324 null
2025-02-09 A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement Muhammad Turab et.al. 2502.05995 null
2025-02-09 Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education Yanhao Jia et.al. 2502.05863 null
2025-02-11 UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control Kaizhen Zhu et.al. 2502.05749 link
2025-02-07 Self-supervised Conformal Prediction for Uncertainty Quantification in Imaging Problems Jasper M. Everink et.al. 2502.05127 null
2025-02-07 Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition S Sreehari et.al. 2502.04680 null
2025-02-07 HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion Mengting Ma et.al. 2502.04623 null
2025-02-06 Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion Marco Mistretta et.al. 2502.04263 link
2025-02-05 All-in-One Image Compression and Restoration Huimin Zeng et.al. 2502.03649 link
2025-02-05 Efficient Image Restoration via Latent Consistency Flow Matching Elad Cohen et.al. 2502.03500 null
2025-02-05 Human-Aligned Image Models Improve Visual Decoding from the Brain Nona Rajabi et.al. 2502.03081 null
2025-02-04 Blind Visible Watermark Removal with Morphological Dilation Preston K. Robinette et.al. 2502.02676 null
2025-02-04 MATCNN: Infrared and Visible Image Fusion Method Based on Multi-scale CNN with Attention Transformer Jingjing Liu et.al. 2502.01959 link
2025-02-03 Deep Unfolding Multi-modal Image Fusion Network via Attribution Analysis Haowen Bai et.al. 2502.01467 null
2025-02-03 Human Body Restoration with One-Step Diffusion Model and A New Benchmark Jue Gong et.al. 2502.01411 null
2025-02-03 ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies Costin F. Ciusdel et.al. 2502.01335 null
2025-02-04 Compressed Image Generation with Denoising Diffusion Codebook Models Guy Ohayon et.al. 2502.01189 null
2025-02-01 A framework for river connectivity classification using temporal image processing and attention based neural networks Timothy James Becker et.al. 2502.00474 null
2025-02-01 Shape from Semantics: 3D Shape Generation from Multi-View Semantics Liangchen Li et.al. 2502.00360 null
2025-01-31 Deep Ensembling with Multimodal Image Fusion for Efficient Classification of Lung Cancer Surochita Pal et.al. 2502.00078 null
2025-01-30 Integrating Spatial and Frequency Information for Under-Display Camera Image Restoration Kyusu Ahn et.al. 2501.18517 null
2025-01-31 MatIR: A Hybrid Mamba-Transformer Image Restoration Model Juan Wen et.al. 2501.18401 link
2025-01-30 Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers Malte Tölle et.al. 2501.18237 null
2025-01-29 Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment Zixue Zeng et.al. 2501.17690 link
2025-01-28 Text-to-Image Generation for Vocabulary Learning Using the Keyword Method Nuwan T. Attygalle et.al. 2501.17099 null
2025-01-27 Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration Long Peng et.al. 2501.16583 null
2025-01-27 UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images Tatiana Taís Schein et.al. 2501.16211 link
2025-01-27 Freestyle Sketch-in-the-Loop Image Segmentation Subhadeep Koley et.al. 2501.16022 null
2025-01-27 CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference Zhengyang Lu et.al. 2501.15852 link
2025-01-26 Universal Image Restoration Pre-training via Degradation Classification JiaKui Hu et.al. 2501.15510 link
2025-01-26 Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations Zijun Long et.al. 2501.15379 null
2025-01-24 Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders Zaheer Ahmad et.al. 2501.14709 null
2025-01-24 Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement Guoxi Huang et.al. 2501.14265 link
2025-01-24 CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image Xiaojun Tang et.al. 2501.14264 null
2025-01-23 Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models Jakob Krogh Petersen et.al. 2501.14051 link
2025-01-23 INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration Di You et.al. 2501.14014 null
2025-01-23 Binary Diffusion Probabilistic Model Vitaliy Kinakh et.al. 2501.13915 null
2025-01-23 Where Do You Go? Pedestrian Trajectory Prediction using Scene Features Mohammad Ali Rezaei et.al. 2501.13848 null
2025-01-22 UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior I-Hsiang Chen et.al. 2501.13134 null
2025-01-22 Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects Louis Aberdeen et.al. 2501.13009 null
2025-01-22 UniUIR: Considering Underwater Image Restoration as An All-in-One Learner Xu Zhang et.al. 2501.12981 null
2025-01-22 FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration Ruicheng Zhang et.al. 2501.12832 link
2025-01-21 Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping Hongxu Yang et.al. 2501.12245 null
2025-01-21 DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains Junyu Xia et.al. 2501.12235 null
2025-01-21 Proxies for Distortion and Consistency with Applications for Real-World Image Restoration Sean Man et.al. 2501.12102 null
2025-01-20 SILO: Solving Inverse Problems with Latent Operators Ron Raphaeli et.al. 2501.11746 null
2025-01-19 Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection Zhipeng Yu et.al. 2501.11063 link
2025-01-19 Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation Zhengwen Shen et.al. 2501.10958 null
2025-01-18 Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption Jinyuan Liu et.al. 2501.10761 link
2025-01-18 A Resource-Efficient Training Framework for Remote Sensing Text–Image Retrieval Weihang Zhang et.al. 2501.10638 null
2025-01-17 DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration Huiyun Cao et.al. 2501.10325 null
2025-01-16 FLOL: Fast Baselines for Real-World Low-Light Enhancement Juan C. Benito et.al. 2501.09718 link
2025-01-16 Soft Knowledge Distillation with Multi-Dimensional Cross-Net Attention for Image Restoration Models Compression Yongheng Zhang et.al. 2501.09321 null
2025-01-16 Knowledge Distillation for Image Restoration: Simultaneous Learning from Degraded and Clean Images Yongheng Zhang et.al. 2501.09268 null
2025-01-15 Vision Foundation Models for Computed Tomography Suraj Pai et.al. 2501.09001 link
2025-01-12 SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval Bhavin Jawade et.al. 2501.08347 null
2025-01-14 AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring Sanjida Afrin Mou et.al. 2501.08266 link
2025-01-13 Depth and Image Fusion for Road Obstacle Detection Using Stereo Camera Oleg Perezyabov et.al. 2501.07245 null
2025-01-12 Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation Zhenyang Feng et.al. 2501.06749 null
2025-01-11 Natural Language Supervision for Low-light Image Enhancement Jiahui Tang et.al. 2501.06546 null
2025-01-10 Underwater Image Enhancement using Generative Adversarial Networks: A Survey Kancharagunta Kishan Babu et.al. 2501.06273 null
2025-01-09 HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction Shaurya Singh Rathore et.al. 2501.05195 null
2025-01-09 ResPanDiff: Diffusion Model with Disentangled Modulations for Image Fusion Shiqi Cao et.al. 2501.05091 null
2025-01-09 IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation Qi Chen et.al. 2501.04995 link
2025-01-08 Color Correction Meets Cross-Spectral Refinement: A Distribution-Aware Diffusion for Underwater Image Restoration Laibin Chang et.al. 2501.04740 null
2025-01-14 HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion Chia-Ming Lee et.al. 2501.04665 null
2025-01-08 FrontierNet: Learning Visual Cues to Explore Boyang Sun et.al. 2501.04597 null
2025-01-08 MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration Zhi Jin et.al. 2501.04486 link
2025-01-08 Recognition-Oriented Low-Light Image Enhancement based on Global and Pixelwise Optimization Seitaro Ono et.al. 2501.04210 null
2025-01-07 Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications L. Berlyand et.al. 2501.04182 null
2025-01-07 Convergent Primal-Dual Plug-and-Play Image Restoration: A General Algorithm and Applications Yodai Suzuki et.al. 2501.03780 link
2025-01-06 ImageMM: Joint multi-frame image restoration and super-resolution Yashil Sukurdeep et.al. 2501.03002 null
2025-01-06 Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI Xujin Li et.al. 2501.02841 null
2025-01-06 Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis Xiaojiao Guo et.al. 2501.02701 link
2025-01-03 iCBIR-Sli: Interpretable Content-Based Image Retrieval with 2D Slice Embeddings Shuhei Tomoshige et.al. 2501.01642 null
2025-01-02 Domain-invariant feature learning in brain MR imaging for content-based image retrieval Shuya Tobari et.al. 2501.01326 null
2025-01-03 Conditional Consistency Guided Image Translation and Enhancement Amil Bhagat et.al. 2501.01223 link
2025-01-02 Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion Dong Zhang et.al. 2501.01114 null
2024-12-30 Text-to-Image GAN with Pretrained Representations Xiaozhou You et.al. 2501.00116 null
2024-12-30 Varformer: Adapting VAR’s Generative Prior for Image Restoration Siyang Wang et.al. 2412.21063 link
2024-12-30 Low-Light Image Enhancement via Generative Perceptual Priors Han Zhou et.al. 2412.20916 null
2024-12-29 Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) Tomer Garber et.al. 2412.20596 link
2024-12-28 Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems Wen-Dong Jiang et.al. 2412.20201 null
2024-12-28 UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity Jingbo Lin et.al. 2412.20157 link
2024-12-28 MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration Boyun Li et.al. 2412.20066 link
2024-12-28 An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models Yuang Wang et.al. 2412.19992 null
2024-12-27 Generative Adversarial Network on Motion-Blur Image Restoration Zhengdong Li et.al. 2412.19479 null
2024-12-25 FOR: Finetuning for Object Level Open Vocabulary Image Retrieval Hila Levi et.al. 2412.18806 null
2024-12-24 Underwater Image Restoration via Polymorphic Large Kernel CNNs Xiaojiao Guo et.al. 2412.18459 link
2024-12-24 UNet--: Memory-Efficient and Feature-Enhanced Network Architecture based on U-Net with Reduced Skip-Connections Lingxiao Yin et.al. 2412.18276 null
2024-12-24 SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos Zhen Zhang et.al. 2412.18214 link
2024-12-24 ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval Le Dong et.al. 2412.18136 link
2024-12-22 Where am I? Cross-View Geo-localization with Natural Language Descriptions Junyan Ye et.al. 2412.17007 null
2024-12-21 Optoelectronic generative adversarial networks Jumin Qiu et.al. 2412.16672 link
2024-12-21 Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising Yuchen Wang et.al. 2412.16645 null
2024-12-24 Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling Daichi Yashima et.al. 2412.16576 link
2024-12-21 Rethinking Model Redundancy for Low-light Image Enhancement Tong Li et.al. 2412.16459 null
2024-12-20 SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild Jannik Elsäßer et.al. 2412.16147 null
2024-12-20 NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images Yue Guo et.al. 2412.15890 null
2024-12-20 Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation Aiwen Jiang et.al. 2412.15845 link
2024-12-20 A New Method to Capturing Compositional Knowledge in Linguistic Space Jiahe Wan et.al. 2412.15632 null
2024-12-20 Stabilizing Laplacian Inversion in Fokker-Planck Image Retrieval using the Transport-of-Intensity Equation Samantha J Alloo et.al. 2412.15513 null
2024-12-19 Learning Visual Composition through Improved Semantic Guidance Austin Stone et.al. 2412.15396 null
2024-12-19 Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model Minglong Xue et.al. 2412.14630 link
2024-12-19 MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Junjie Zhou et.al. 2412.14475 null
2024-12-18 Personalized Generative Low-light Image Denoising and Enhancement Xijun Wang et.al. 2412.14327 null
2024-12-18 Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing Le-Anh Tran et.al. 2412.14220 link
2024-12-18 Adversarial Hubness in Multi-Modal Retrieval Tingwei Zhang et.al. 2412.14113 link
2024-12-18 Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval Giacomo Pacini et.al. 2412.13834 null
2024-12-18 Fed-AugMix: Balancing Privacy and Utility via Data Augmentation Haoyang Li et.al. 2412.13818 null
2024-12-18 Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode Xin Su et.al. 2412.13749 link
2024-12-18 VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement Chen Zhao et.al. 2412.13655 link
2024-12-18 DarkIR: Robust Low-Light Image Restoration Daniel Feijoo et.al. 2412.13443 link
2024-12-18 Zero-Shot Low Light Image Enhancement with Diffusion Prior Joshua Cho et.al. 2412.13401 link
2024-12-17 Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration Xinlong Cheng et.al. 2412.12550 null
2024-12-17 Three Things to Know about Deep Metric Learning Yash Patel et.al. 2412.12432 null
2024-12-16 Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD) Ki-Hwan Oh et.al. 2412.12238 link
2024-12-16 Ultra-High-Definition Dynamic Multi-Exposure Image Fusion via Infinite Pixel Learning Xingchi Chen et.al. 2412.11685 null
2024-12-16 CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution Bingwen Hu et.al. 2412.11609 null
2024-12-15 Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval Zelong Sun et.al. 2412.11087 null
2024-12-15 Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval Yuanmin Tang et.al. 2412.11077 link
2024-12-15 Towards Context-aware Convolutional Network for Image Restoration Fangwei Hao et.al. 2412.11008 null
2024-12-14 Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification Yucong Meng et.al. 2412.10776 null
2024-12-16 Matrix Completion via Residual Spectral Matching Ziyuan Chen et.al. 2412.10005 null
2024-12-13 $\textrm{A}^{\textrm{2}}$RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion Jiawei Li et.al. 2412.09954 link
2024-12-12 OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs Yuanzhi Zhu et.al. 2412.09465 link
2024-12-13 Are Conditional Latent Diffusion Models Effective for Image Restoration? Yunchen Yuan et.al. 2412.09324 null
2024-12-13 MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition Qiwen Gu et.al. 2412.09199 null
2024-12-12 ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring Zhongbao Yang et.al. 2412.09193 null
2024-12-12 Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration Yunshuai Zhou et.al. 2412.08939 link
2024-12-12 A Flexible Plug-and-Play Module for Generating Variable-Length Liyang He et.al. 2412.08922 link
2024-12-11 Image Retrieval Methods in the Dissimilarity Space Madhu Kiran et.al. 2412.08618 null
2024-12-11 Convergence Analysis of a Proximal Stochastic Denoising Regularization Algorithm Marien Renaud et.al. 2412.08262 null
2024-12-11 Visible and Infrared Image Fusion Using Encoder-Decoder Network Ferhat Can Ataman et.al. 2412.08073 link
2024-12-11 BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion Huafeng Li et.al. 2412.08050 link
2024-12-10 Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance Wanwen Chen et.al. 2412.07741 null
2024-12-10 Leveraging Content and Context Cues for Low-Light Image Enhancement Igor Morawski et.al. 2412.07693 link
2024-12-10 Analytical-Heuristic Modeling and Optimization for Low-Light Image Enhancement Axel Martinez et.al. 2412.07659 null
2024-12-10 Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE) Tu Vo et.al. 2412.07527 null
2024-12-10 Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring Yuzhi Zhao et.al. 2412.07256 link
2024-12-10 EchoIR: Advancing Image Restoration with Echo Upsampling and Bi-Level Optimization Yuhan He et.al. 2412.07225 null
2024-12-10 A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing Yujie Feng et.al. 2412.07195 null
2024-12-09 InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention Howard Zhang et.al. 2412.06753 null
2024-12-09 EchoSim4D: A Proof-of-Concept Gamified XR Echocardiography Training Simulator for Neonates using 4D Ultrasound Volume Deepthy Rose Jose et.al. 2412.06271 null
2024-12-08 A Review on Multisensor Data Fusion for Wearable Health Monitoring Arlene John et.al. 2412.05895 null
2024-12-07 Compositional Image Retrieval via Instruction-Aware Contrastive Learning Wenliang Zhong et.al. 2412.05756 link
2024-12-07 Enhancing Sample Generation of Diffusion Models using Noise Level Correction Abulikemu Abuduweili et.al. 2412.05488 null
2024-12-06 Equivariant Denoisers for Image Restoration Marien Renaud et.al. 2412.05343 null
2024-12-06 ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration Chi-Wei Hsiao et.al. 2412.05043 null
2024-12-06 DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection Yishuo Chen et.al. 2412.04931 link
2024-12-06 DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification Ying Jin et.al. 2412.04828 null
2024-12-06 Modality Decoupling is All You Need: A Simple Solution for Unsupervised Hyperspectral Image Fusion Songcheng Du et.al. 2412.04802 null
2024-12-05 Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise Brayan Monroy et.al. 2412.04648 link
2024-12-05 MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers Byeonghyeon Lee et.al. 2412.04591 null
2024-12-05 Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image Shuang Xu et.al. 2412.04201 null
2024-12-05 Deep priors for satellite image restoration with accurate uncertainties Biquard Maud et.al. 2412.04130 null
2024-12-05 Blind Underwater Image Restoration using Co-Operational Regressor Networks Ozer Can Devecioglu et.al. 2412.03995 null
2024-12-05 LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model Yuan Xue et.al. 2412.03841 null
2024-12-05 Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration Yuzhen Du et.al. 2412.03814 null
2024-12-04 Composed Image Retrieval for Training-Free Domain Conversion Nikos Efthymiadis et.al. 2412.03297 link
2024-12-04 Task-driven Image Fusion with Learnable Fusion Loss Haowen Bai et.al. 2412.03240 null
2024-12-04 Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution Jiahua Xiao et.al. 2412.02960 null
2024-12-03 Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval Leah Bar et.al. 2412.02310 link
2024-12-03 Relaxed and Inertial Nonlinear Forward-Backward with Momentum Fernando Roldán et.al. 2412.02045 link
2024-12-02 Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features MD Shaikh Rahman et.al. 2412.01555 null
2024-12-02 Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond MD Raqib Khan et.al. 2412.01456 link
2024-12-02 FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration Hao Li et.al. 2412.01427 null
2024-12-02 Neuron Abandoning Attention Flow: Visual Explanation of Dynamics inside CNN Models Yi Liao et.al. 2412.01202 null
2024-12-01 Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration Haoze Sun et.al. 2412.00878 null
2024-12-01 DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement Tongshun Zhang et.al. 2412.00683 link
2024-12-01 MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning You Wu et.al. 2412.00626 null
2024-11-30 Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion Michail Dontas et.al. 2412.00557 null
2024-11-29 Self-Supervised Denoiser Framework Emilien Valat et.al. 2411.19593 null
2024-11-27 Optimizing Image Retrieval with an Extended b-Metric Space Abdelkader Belhenniche et.al. 2411.18800 null
2024-11-27 Hierarchical Information Flow for Generalized Efficient Image Restoration Yawei Li et.al. 2411.18588 null
2024-11-27 Complexity Experts are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir et.al. 2411.18466 null
2024-11-27 Adaptive Blind All-in-One Image Restoration David Serrano-Lozano et.al. 2411.18412 link
2024-11-29 HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning Zengxi Zhang et.al. 2411.18296 link
2024-11-27 TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution Linwei Dong et.al. 2411.18263 link
2024-12-02 Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision Jinnyeong Kim et.al. 2411.18025 null
2024-11-26 Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation Sudarshan Rajagopalan et.al. 2411.17814 null
2024-11-26 GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration Sudarshan Rajagopalan et.al. 2411.17687 null
2024-11-26 Learning Visual Hierarchies with Hyperbolic Embeddings Ziwei Wang et.al. 2411.17490 null
2024-11-26 Puzzle Similarity: A Perceptually-guided No-Reference Metric for Artifact Detection in 3D Scene Reconstructions Nicolai Hermann et.al. 2411.17489 null
2024-11-26 MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers Ruoxi Zhu et.al. 2411.17226 link
2024-11-25 Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding Yubin Gu et.al. 2411.16217 null
2024-11-25 U2NeRF: Unsupervised Underwater Image Restoration and Neural Radiance Fields Vinayak Gupta et.al. 2411.16172 null
2024-11-25 Image Generation Diversity Issues and How to Tame Them Mischa Dombrowski et.al. 2411.16171 link
2024-11-24 PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation Chia-Ming Lee et.al. 2411.15922 link
2024-11-24 MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking Chunhui Zhang et.al. 2411.15761 link
2024-11-24 LTCF-Net: A Transformer-Enhanced Dual-Channel Fourier Framework for Low-Light Image Restoration Gaojing Zhang et.al. 2411.15740 null
2024-11-22 Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration Darshan Thaker et.al. 2411.15295 null
2024-11-22 MambaIRv2: Attentive State Space Restoration Hang Guo et.al. 2411.15269 link
2024-11-22 Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval Zengbao Sun et.al. 2411.14704 link
2024-11-21 Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection Ali Awad et.al. 2411.14626 null
2024-11-21 Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion Jinhong He et.al. 2411.13961 link
2024-11-20 Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms Matthieu Kowalski et.al. 2411.13276 null
2024-11-20 Globally Correlation-Aware Hard Negative Generation Wenjie Peng et.al. 2411.13145 link
2024-11-19 Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution Yang Zou et.al. 2411.12530 link
2024-11-19 Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models Jun Xiao et.al. 2411.12450 null
2024-11-19 Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images Zheng Gong et.al. 2411.12278 null
2024-11-16 GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding Yue Zhou et.al. 2411.11904 link
2024-11-18 Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion Meng Zhou et.al. 2411.11799 link
2024-11-18 Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment Zhendong Liu et.al. 2411.11543 null
2024-11-17 Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method Yan Zheng et.al. 2411.11135 null
2024-11-19 TSFormer: A Robust Framework for Efficient UHD Image Restoration Xin Su et.al. 2411.10951 null
2024-11-16 AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations Jiawei Mao et.al. 2411.10708 null
2024-11-16 Underwater Image Enhancement with Cascaded Contrastive Learning Yi Liu et.al. 2411.10682 link
2024-11-16 SPDFusion: An Infrared and Visible Image Fusion Network Based on a Non-Euclidean Representation of Riemannian Manifolds Huan Kang et.al. 2411.10679 null
2024-11-15 Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence Guodong Sun et.al. 2411.10321 null
2024-11-15 Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting Ziqi Xie et.al. 2411.10309 link
2024-11-15 Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion Dan He et.al. 2411.10036 null
2024-11-14 Instruction-Driven Fusion of Infrared-Visible Images: Tailoring for Diverse Downstream Tasks Zengyi Yang et.al. 2411.09387 null
2024-11-13 Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval Saul Santos et.al. 2411.08590 link
2024-11-13 Saliency Map-based Image Retrieval using Invariant Krawtchouk Moments Ashkan Nejad et.al. 2411.08567 link
2024-11-12 CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising Linxuan Li et.al. 2411.07930 link
2024-11-12 Joint multi-dimensional dynamic attention and transformer for general image restoration Huan Zhang et.al. 2411.07893 link
2024-11-12 All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model Yuanbo Wen et.al. 2411.07445 null
2024-11-11 Multi-scale Frequency Enhancement Network for Blind Image Deblurring Yawen Xiang et.al. 2411.06893 null
2024-11-10 Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration Chen Wu et.al. 2411.06456 null
2024-11-08 A Modular Conditional Diffusion Framework for Image Reconstruction Magauiya Zhussip et.al. 2411.05993 null
2024-11-05 From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing Xintian Sun et.al. 2411.05826 null
2024-11-07 Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion Yiming Sun et.al. 2411.04697 link
2024-11-07 l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion Gargi Panda et.al. 2411.04519 null
2024-11-05 Test-Time Dynamic Image Fusion Bing Cao et.al. 2411.02840 link
2024-11-05 ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing Yuka Ogino et.al. 2411.02799 null
2024-11-04 TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives Maitreya Patel et.al. 2411.02545 null
2024-11-11 INQUIRE: A Natural World Text-to-Image Retrieval Benchmark Edward Vendrow et.al. 2411.02537 link
2024-11-04 Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models Sharat Agarwal et.al. 2411.01925 null
2024-11-03 Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration Xiaole Tang et.al. 2411.01656 link
2024-11-03 Conditional Controllable Image Fusion Bing Cao et.al. 2411.01573 link
2024-11-03 Efficient Medical Image Retrieval Using DenseNet and FAISS for BIRADS Classification MD Shaikh Rahman et.al. 2411.01473 null
2024-11-03 TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement Xuanzhao Dong et.al. 2411.01403 null
2024-11-02 Medical X-Ray Image Enhancement Using Global Contrast-Limited Adaptive Histogram Equalization Sohrab Namazi Nia et.al. 2411.01373 null
2024-11-01 Identifying Implicit Social Biases in Vision-Language Models Kimia Hamidieh et.al. 2411.00997 null
2024-10-31 Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes Shaohua Liu et.al. 2411.00239 null
2024-10-31 Chasing Better Deep Image Priors between Over- and Under-parameterization Qiming Wu et.al. 2410.24187 link
2024-10-31 Nearest Neighbor Normalization Improves Multimodal Retrieval Neil Chowdhury et.al. 2410.24114 link
2024-10-31 Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation Yihang Zhou et.al. 2410.23962 null
2024-10-31 Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model Hao Zhang et.al. 2410.23905 link
2024-10-31 MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval Haiwen Li et.al. 2410.23736 null
2024-10-31 Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data Yucun Hou et.al. 2410.23628 null
2024-10-31 MS-Glance: Non-semantic context vectors and the applications in supervising image reconstruction Ziqi Gao et.al. 2410.23577 link
2024-10-30 Decoupling Semantic Similarity from Spatial Alignment for Neural Networks Tassilo Wald et.al. 2410.23107 link
2024-10-30 EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models Shangquan Sun et.al. 2410.22959 link
2024-10-30 SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion Kun Hu et.al. 2410.22837 link
2024-10-30 Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement Sahil Ali Akbar et.al. 2410.21946 link
2024-10-29 Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications Monica Riedler et.al. 2410.21943 link
2024-10-28 Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework Vladimir Arkhipkin et.al. 2410.21061 link
2024-10-27 Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement Junhao Tan et.al. 2410.20314 link
2024-10-27 Deep Learning, Machine Learning – Digital Signal and Image Processing: From Theory to Application Weiche Hsieh et.al. 2410.20304 null
2024-10-24 HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision Burak Ercan et.al. 2410.19164 null
2024-10-24 ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval Zijia Zhao et.al. 2410.18715 link
2024-10-29 DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Yuang Ai et.al. 2410.18666 link
2024-10-23 DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection Qingpeng Li et.al. 2410.17822 link
2024-10-23 An Intelligent Agentic System for Complex Image Restoration Problems Kaiwen Zhu et.al. 2410.17809 link
2024-10-23 A variational approach to nonlocal image restoration flows Harsh Prasad et.al. 2410.17649 null
2024-10-23 Diffusion Priors for Variational Likelihood Estimation and Image Denoising Jun Cheng et.al. 2410.17521 link
2024-10-22 Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval Yuanmin Tang et.al. 2410.17393 null
2024-10-20 LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration Yuang Ai et.al. 2410.15385 link
2024-10-20 GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning Haiwen Diao et.al. 2410.15266 link
2024-10-19 A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends Junjun Jiang et.al. 2410.15067 link
2024-10-19 Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway’s Digitised Book Collection Marie Roald et.al. 2410.14969 link
2024-10-16 Development of Image Collection Method Using YOLO and Siamese Network Chan Young Shin et.al. 2410.12561 null
2024-10-16 Towards Flexible and Efficient Diffusion Low Light Enhancer Guanzhou Lan et.al. 2410.12346 null
2024-10-16 Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond Pengwei Liang et.al. 2410.12274 null
2024-10-15 Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos Zhouxia Wang et.al. 2410.11828 null
2024-10-15 LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images Yuzhou Cheng et.al. 2410.11505 null
2024-10-13 Fusion Based Hand Geometry Recognition Using Dempster-Shafer Theory Asish Bera et.al. 2410.09842 null
2024-10-13 LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond Md Tanvir Islam et.al. 2410.09831 link
2024-10-14 LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection Mingjia Li et.al. 2410.08810 link
2024-10-11 Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers Jin Cao et.al. 2410.08688 link
2024-10-16 Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP Eunji Kim et.al. 2410.08469 null
2024-10-11 A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification Eugene P. W. Ang et.al. 2410.08456 null
2024-10-10 TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration Hsing-Hua Wang et.al. 2410.08177 link
2024-10-10 A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks Hoin Jung et.al. 2410.07593 link
2024-10-09 Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval Mohammad Omama et.al. 2410.07022 null
2024-10-09 Rethinking the Evaluation of Visible and Infrared Image Fusion Dayan Guan et.al. 2410.06811 link
2024-10-09 InstantIR: Blind Image Restoration with Instant Generative Reference Jen-Yuan Huang et.al. 2410.06551 null
2024-10-09 MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging Noel C. F. Codella et.al. 2410.06542 null
2024-10-08 Temporal Image Caption Retrieval Competition – Description and Results Jakub Pokrywka et.al. 2410.06314 null
2024-10-08 GSLoc: Visual Localization with 3D Gaussian Splatting Kazii Botashev et.al. 2410.06165 null
2024-10-08 Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning Ayush Singh et.al. 2410.05928 null
2024-10-08 ReFIR: Grounding Large Restoration Models with Retrieval Augmentation Hang Guo et.al. 2410.05601 link
2024-10-09 LoTLIP: Improving Language-Image Pre-training for Long Text Understanding Wei Wu et.al. 2410.05249 null
2024-10-07 Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration Zhiyu Zhu et.al. 2410.04811 link
2024-10-06 Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli Valentyn Piskovskyi et.al. 2410.04497 null
2024-10-06 SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems Ismail Alkhouri et.al. 2410.04479 link
2024-10-05 Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model Keda Tao et.al. 2410.04161 null
2024-10-04 Diffusion State-Guided Projected Gradient for Inverse Problems Rayhan Zirvi et.al. 2410.03463 link
2024-10-03 PnP-Flow: Plug-and-Play Image Restoration with Flow Matching Ségolène Martin et.al. 2410.02423 link
2024-10-03 Can Capacitive Touch Images Enhance Mobile Keyboard Decoding? Piyawat Lertvittayakumjorn et.al. 2410.02264 link
2024-10-02 Posterior sampling via Langevin dynamics based on generative priors Vishal Purohit et.al. 2410.02078 null
2024-10-03 EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections Francesc Net et.al. 2410.01536 link
2024-10-04 CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment Safouane El Ghazouali et.al. 2410.01411 link
2024-10-01 Three-Operator Splitting Method with Two-Step Inertial Extrapolation Olaniyi S. Iyiola et.al. 2410.01099 null
2024-10-01 GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer Youngho Yoon et.al. 2410.00672 link
2024-10-01 Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration Guy Ohayon et.al. 2410.00418 link
2024-10-01 GLMHA A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction Zaid Ilyas et.al. 2410.00380 null
2024-09-30 Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation Aleyna Kütük et.al. 2410.00266 null
2024-09-30 A Survey on Diffusion Models for Inverse Problems Giannis Daras et.al. 2410.00083 null
2024-09-30 UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation Cheng Zhang et.al. 2409.20197 link
2024-09-29 Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation Xiaofeng Cong et.al. 2409.19685 link
2024-09-28 Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration Chu-Jie Qin et.al. 2409.19403 link
2024-09-28 VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition Ahmad Khaliq et.al. 2409.19293 link
2024-09-28 PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution Song Zhang et.al. 2409.19269 link
2024-09-28 Extending Depth of Field for Varifocal Multiview Images Zhilong Li et.al. 2409.19220 null
2024-09-27 MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion Bardienus Duisterhof et.al. 2409.19152 null
2024-09-27 Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors Yunlong Lin et.al. 2409.18899 null
2024-09-26 Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval Mankeerat Sidhu et.al. 2409.18733 null
2024-09-27 Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification Salma Hassan et.al. 2409.18715 null
2024-09-27 Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models Nguyen Gia Bach et.al. 2409.18476 link
2024-09-27 SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement Yunkui Pang et.al. 2409.18355 link
2024-09-26 Toward Efficient Deep Blind RAW Image Restoration Marcos V. Conde et.al. 2409.18204 link
2024-09-26 Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs Qinpeng Cui et.al. 2409.17778 link
2024-09-25 Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement Yihao Zhou et.al. 2409.16661 null
2024-09-25 Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement Guanlin Li et.al. 2409.16604 link
2024-09-24 Proactive Schemes: A Survey of Adversarial Attacks for Social Good Vishal Asnani et.al. 2409.16491 null
2024-09-24 Liger at W.M. Keck Observatory: imager structural analysis, fabrication, and characterization plan James Wiley et.al. 2409.16263 null
2024-09-23 PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Weifeng Lin et.al. 2409.15278 link
2024-09-23 FusionRF: High-Fidelity Satellite Neural Radiance Fields from Multispectral and Panchromatic Acquisitions Michael Sprintson et.al. 2409.15132 null
2024-09-22 Low-Light Enhancement Effect on Classification and Detection: An Empirical Study Xu Wu et.al. 2409.14461 null
2024-09-22 Quantitative and Qualitative Evaluation of NLM and Wavelet Methods in Image Enhancement Cameron Khanpour et.al. 2409.14334 null
2024-09-20 Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval Morris Florek et.al. 2409.13513 link
2024-09-19 Deep Learning-Based Detection of Referable Diabetic Retinopathy and Macular Edema Using Ultra-Widefield Fundus Imaging Philippe Zhang et.al. 2409.12854 null
2024-09-19 Fundus image enhancement through direct diffusion bridges Sehui Kim et.al. 2409.12377 link
2024-09-18 Denoising diffusion models for high-resolution microscopy image restoration Pamela Osuna-Vargas et.al. 2409.12078 null
2024-09-18 DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion Jian Xu et.al. 2409.11642 link
2024-09-17 Ultrasound Image Enhancement with the Variance of Diffusion Models Yuxin Zhang et.al. 2409.11380 link
2024-09-17 Improving the Efficiency of Visually Augmented Language Models Paula Ontalvilla et.al. 2409.11148 link
2024-09-17 CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement Xuanzhao Dong et.al. 2409.10966 link
2024-09-16 Taming Diffusion Models for Image Restoration: A Review Ziwei Luo et.al. 2409.10353 null
2024-09-17 Fuse4Seg: Image-Level Fusion Based Multi-Modality Medical Image Segmentation Yuchen Guo et.al. 2409.10328 null
2024-09-16 Garment Attribute Manipulation with Multi-level Attention Vittorio Casula et.al. 2409.10206 null
2024-09-16 DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion Yuchen Guo et.al. 2409.10080 null
2024-09-15 Underwater Image Enhancement via Dehazing and Color Restoration Chengqin Wu et.al. 2409.09779 null
2024-09-15 Unsupervised Hyperspectral and Multispectral Image Blind Fusion Based on Deep Tucker Decomposition Network with Spatial-Spectral Manifold Learning He Wang et.al. 2409.09670 link
2024-09-14 Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval Amirreza Mahbod et.al. 2409.09430 link
2024-09-14 Infrared and Visible Image Fusion with Hierarchical Human Perception Guang Yang et.al. 2409.09291 null
2024-09-12 Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement Vamsi Krishna Vasa et.al. 2409.07862 null
2024-09-12 Quaternion Nuclear Norm minus Frobenius Norm Minimization for color image reconstruction Yu Guo et.al. 2409.07797 null
2024-09-11 FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process Yang Luo et.al. 2409.07451 null
2024-09-11 Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement Xianmin Chen et.al. 2409.07040 link
2024-09-11 PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening RuoCheng Wu et.al. 2409.06980 null
2024-09-10 Modeling Image Tone Dichotomy with the Power Function Axel Martinez et.al. 2409.06764 null
2024-09-10 Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer Li Ke et.al. 2409.06590 null
2024-09-10 Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models Siyu Zhai et.al. 2409.06420 null
2024-09-10 A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions Zhicong Wu et.al. 2409.06381 null
2024-09-10 Multi-Weather Image Restoration via Histogram-Based Transformer Feature Enhancement Yang Wen et.al. 2409.06334 null
2024-09-10 AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration Hongyi Cai et.al. 2409.06206 null
2024-09-09 Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding Bram Willemsen et.al. 2409.05721 link
2024-09-09 Open-World Dynamic Prompt and Continual Visual Representation Learning Youngeun Kim et.al. 2409.05312 null
2024-09-09 Rethinking the Atmospheric Scattering-driven Attention via Channel and Gamma Correction Priors for Low-Light Image Enhancement Shyang-En Weng et.al. 2409.05274 link
2024-09-07 Training-free ZS-CIR via Weighted Modality Fusion and Similarity Ren-Di Wu et.al. 2409.04918 link
2024-09-07 Power Line Aerial Image Restoration under Adverse Weather: Datasets and Baselines Sai Yang et.al. 2409.04812 link
2024-09-06 Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models Saghir Alfasly et.al. 2409.04631 null
2024-09-06 Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior Charlesquin Kemajou Mbakam et.al. 2409.04384 null
2024-09-06 RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement Hao Luo et.al. 2409.04363 link
2024-09-06 Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks Hangcheng Cao et.al. 2409.04133 null
2024-09-05 Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration Pei Wang et.al. 2409.03455 null
2024-09-05 KAN See In the Dark Aoxiang Ning et.al. 2409.03404 link
2024-09-05 Multiple weather images restoration using the task transformer and adaptive mixup strategy Yang Wen et.al. 2409.03249 null
2024-09-05 Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion Chenguang Zhu et.al. 2409.03223 null
2024-09-05 Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem Qiwen Zhu et.al. 2409.03179 link
2024-09-04 Design and Evaluation of Camera-Centric Mobile Crowdsourcing Applications Abby Stylianou et.al. 2409.03012 null
2024-09-04 Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening Ivan Pereira-Sánchez et.al. 2409.02675 link
2024-09-04 NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval Sepanta Zeighami et.al. 2409.02343 link
2024-09-03 Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models Jiaqi Xu et.al. 2409.02101 link
2024-09-03 F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring Subhajit Paul et.al. 2409.02056 null
2024-09-03 AllWeatherNet: Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions Chenghao Qian et.al. 2409.02045 link
2024-09-03 Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment Konstantin Schall et.al. 2409.01936 link
2024-09-03 Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion Ke Cao et.al. 2409.01728 null
2024-09-03 Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement Kun Zhou et.al. 2409.01641 link
2024-09-03 GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting Zixuan Guo et.al. 2409.01581 null
2024-09-02 A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches Kim Jinwoo et.al. 2409.01219 null
2024-08-30 Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method Yuji Lin et.al. 2408.17339 link
2024-09-02 RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance Avideep Mukherjee et.al. 2408.17095 null
2024-08-30 Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL Haiyang Zhao et.al. 2408.17060 null
2024-08-29 GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content Lebin Zhou et.al. 2408.16866 null
2024-09-02 A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising Shuaiyu Yuan et.al. 2408.16481 null
2024-08-29 Enhanced Control for Diffusion Bridge in Image Restoration Conghan Yue et.al. 2408.16303 link
2024-08-29 Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models Kengo Nakata et.al. 2408.16296 null
2024-08-29 LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement Ye Yu et.al. 2408.16235 link
2024-08-28 Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration Xu Zhang et.al. 2408.15994 null
2024-08-28 MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion Yanglin Deng et.al. 2408.15641 link
2024-08-28 Temporal Attention for Cross-View Sequential Image Localization Dong Yuan et.al. 2408.15569 link
2024-08-27 A Preliminary Exploration Towards General Image Restoration Xiangtao Kong et.al. 2408.15143 null
2024-08-27 Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild Tianqi Wei et.al. 2408.14723 null
2024-08-26 FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation Daixun Li et.al. 2408.13980 null
2024-08-25 LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task Ali Asgarov et.al. 2408.13909 link
2024-08-23 O-Mamba: O-shape State-Space Model for Underwater Image Enhancement Chenyu Dong et.al. 2408.12816 link
2024-08-22 CODE: Confident Ordinary Differential Editing Bastien van Delft et.al. 2408.12418 link
2024-08-22 Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement Lingyu Zhu et.al. 2408.12316 link
2024-08-21 Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations Lintong Zhang et.al. 2408.11966 null
2024-08-21 OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal Qiao Mo et.al. 2408.11480 link
2024-08-21 UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation Xiangyu Zhao et.al. 2408.11305 link
2024-08-21 Taming Generative Diffusion for Universal Blind Image Restoration Siwei Tu et.al. 2408.11287 null
2024-08-20 Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement Satoshi Kosugi et.al. 2408.11055 link
2024-08-20 SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement Linlin Hu et.al. 2408.10934 null
2024-08-20 UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement Yingtie Lei et.al. 2408.10653 link
2024-08-19 BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval Zhenyu Lu et.al. 2408.10383 null
2024-08-19 Multi-Scale Representation Learning for Image Restoration with State-Space Model Yuhong He et.al. 2408.10145 null
2024-08-19 Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration Alik Pramanick et.al. 2408.09912 link
2024-08-19 Fashion Image-to-Image Translation for Complementary Item Retrieval Matteo Attimonelli et.al. 2408.09847 link
2024-08-19 ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement Eashan Adhikarla et.al. 2408.09650 link
2024-08-17 Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration Xin Lin et.al. 2408.09241 link
2024-08-16 DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer’s Case Studies Mohammad Hossein Najafi et.al. 2408.08489 null
2024-08-15 Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks Jiawei Wu et.al. 2408.08149 link
2024-08-15 HAIR: Hypernetworks-based All-in-One Image Restoration Jin Cao et.al. 2408.08091 link
2024-08-15 DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions Ryosuke Korekata et.al. 2408.07910 null
2024-08-13 Review Learning: Advancing All-in-One Ultra-High-Definition Image Restoration Training Method Xin Su et.al. 2408.06709 null
2024-08-12 Wavelet based inpainting detection Barglazan Adrian-Alin et.al. 2408.06429 null
2024-08-12 Latent Disentanglement for Low Light Image Enhancement Zhihao Zheng et.al. 2408.06245 null
2024-08-10 Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network Junyan Ye et.al. 2408.05475 link
2024-08-10 Greedy randomized block Kaczmarz method for matrix equation AXB=C and its applications in color image restoration Wenli Wang et.al. 2408.05444 null
2024-08-08 Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration Ziran Zhang et.al. 2408.04227 null
2024-08-08 MultiColor: Image Colorization by Learning from Multiple Color Spaces Xiangcheng Du et.al. 2408.04172 null
2024-08-06 AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval Pavel Suma et.al. 2408.03282 link
2024-08-05 Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models Tongtong Feng et.al. 2408.02408 null
2024-08-02 On Validation of Search & Retrieval of Tissue Images in Digital Pathology H. R. Tizhoosh et.al. 2408.01570 null
2024-08-02 Underwater Object Detection Enhancement via Channel Stabilization Muhammad Ali et.al. 2408.01293 link
2024-08-02 Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement Wenbin Zou et.al. 2408.01276 link
2024-08-02 Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration Donwon Park et.al. 2408.01099 null
2024-08-02 FCDFusion: a Fast, Low Color Deviation Method for Fusing Visible and Infrared Image Pairs Hesong Li et.al. 2408.01080 null
2024-08-01 A Prior Embedding-Driven Architecture for Long Distance Blind Iris Recognition Qi Xiong et.al. 2408.00210 null
2024-07-30 UniProcessor: A Text-induced Unified Low-level Image Processor Huiyu Duan et.al. 2407.20928 link
2024-07-27 Inverse Problems with Diffusion Models: A MAP Estimation Perspective Sai bharath chandra Gutha et.al. 2407.20784 link
2024-07-29 ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement Ezequiel Perez-Zarate et.al. 2407.19708 link
2024-07-31 Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint Song Zhang et.al. 2407.19248 null
2024-07-27 Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration Xiaoyan Yu et.al. 2407.19139 link
2024-07-26 Dilated Strip Attention Network for Image Restoration Fangwei Hao et.al. 2407.18613 null
2024-07-25 RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models Haoyu Chen et.al. 2407.18035 null
2024-07-25 Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography Kailai Zhou et.al. 2407.17996 link
2024-07-23 S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks Neha A S et.al. 2407.17587 null
2024-07-24 Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation Yongqi Li et.al. 2407.17274 null
2024-07-23 CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction Liang Zhao et.al. 2407.16204 null
2024-07-23 Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems Sojin Lee et.al. 2407.16125 link
2024-07-20 Deep Learning CT Image Restoration using System Blur and Noise Models Yijie Yuan et.al. 2407.14983 null
2024-07-23 AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement Yunlong Lin et.al. 2407.14900 null
2024-07-20 Dual High-Order Total Variation Model for Underwater Image Restoration Yuemei Li et.al. 2407.14868 link
2024-07-19 Adaptive Frequency Enhancement Network for Single Image Deraining Fei Yan et.al. 2407.14292 null
2024-07-19 Double-Shot 3D Shape Measurement with a Dual-Branch Network Mingyang Lei et.al. 2407.14198 null
2024-07-19 TaGAT: Topology-Aware Graph Attention Network For Multi-modal Retinal Image Fusion Xin Tian et.al. 2407.14188 link
2024-07-18 Visual Haystacks: Answering Harder Questions About Sets of Images Tsung-Han Wu et.al. 2407.13766 link
2024-07-18 Any Image Restoration with Efficient Automatic Degradation Adaptation Bin Ren et.al. 2407.13372 link
2024-07-18 Training-Free Large Model Priors for Multiple-in-One Image Restoration Xuanhua He et.al. 2407.13181 null
2024-07-18 Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement Eashan Adhikarla et.al. 2407.13170 null
2024-07-21 HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration Shuchang Zhang et.al. 2407.13120 null
2024-07-17 Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations Tomáš Chobola et.al. 2407.12511 link
2024-07-17 GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval Han Zhou et.al. 2407.12431 link
2024-07-17 Towards Revisiting Visual Place Recognition for Joining Submaps in Multimap SLAM Markus Weißflog et.al. 2407.12408 null
2024-07-17 GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity Shuo Cao et.al. 2407.12273 null
2024-07-16 Haze-Aware Attention Network for Single-Image Dehazing Lihan Tong et.al. 2407.11505 null
2024-07-16 EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis Ruijie Yang et.al. 2407.11401 null
2024-07-15 No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations Walter Simoncini et.al. 2407.10964 link
2024-07-15 In-Loop Filtering via Trained Look-Up Tables Zhuoyuan Li et.al. 2407.10926 null
2024-07-15 MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration Yulin Ren et.al. 2407.10833 null
2024-07-15 DINO Pre-training for Vision-based End-to-end Autonomous Driving Shubham Juneja et.al. 2407.10803 null
2024-07-15 Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval Youngsun Lim et.al. 2407.10683 null
2024-07-15 An experimental evaluation of Siamese Neural Networks for robot localization using omnidirectional imaging in indoor environments J. J. Cabrera et.al. 2407.10536 null

Image Matching

Publish Date Title Authors PDF Code
2025-04-11 Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models Josef Bengtson et.al. 2504.08348 null
2025-04-10 Image registration of 2D optical thin sections in a 3D porous medium: Application to a Berea sandstone digital rock image Jaehong Chung et.al. 2504.06604 link
2025-04-08 To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition Davide Sferrazza et.al. 2504.06116 null
2025-04-10 Learning Affine Correspondences by Integrating Geometric Constraints Pengju Sun et.al. 2504.04834 link
2025-04-01 Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data Yiqun Duan et.al. 2504.00812 null
2025-03-31 CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching Zizhuo Li et.al. 2503.23925 null
2025-03-28 Pairwise Matching of Intermediate Representations for Fine-grained Explainability Lauren Shrack et.al. 2503.22881 link
2025-03-26 Multimodal Image Matching based on Frequency-domain Information of Local Energy Response Meng Yang et.al. 2503.20827 null
2025-03-22 Normalized Matching Transformer Abtin Pourhadi et.al. 2503.17715 link
2025-03-20 Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors Tian Yi Lim et.al. 2503.16275 null
2025-03-20 MapGlue: Multimodal Remote Sensing Image Matching Peihao Wu et.al. 2503.16185 link
2025-03-19 PAPI-Reg: Patch-to-Pixel Solution for Efficient Cross-Modal Registration between LiDAR Point Cloud and Camera Image Yuanchao Yue et.al. 2503.15285 null
2025-04-07 Less Biased Noise Scale Estimation for Threshold-Robust RANSAC Johan Edstedt et.al. 2503.13433 null
2025-03-17 SatDepth: A Novel Dataset for Satellite Image Matching Rahul Deshmukh et.al. 2503.12706 link
2025-03-14 Refining Image Edge Detection via Linear Canonical Riesz Transforms Shuhui Yang et.al. 2503.11148 null
2025-03-13 Speedy MASt3R Jingxing Li et.al. 2503.10017 null
2025-03-11 Keypoint Detection and Description for Raw Bayer Images Jiakai Lin et.al. 2503.08673 null
2025-03-06 Learning 3D Medical Image Models From Brain Functional Connectivity Network Supervision For Mental Disorder Diagnosis Xingcan Hu et.al. 2503.04205 null
2025-03-07 Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration Qianliang Wu et.al. 2503.04127 null
2025-03-05 JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba Xiaoyong Lu et.al. 2503.03437 null
2025-02-28 CNSv2: Probabilistic Correspondence Encoded Neural Image Servo Anzhe Chen et.al. 2503.00132 null
2025-02-27 A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization Yejun Zhang et.al. 2502.20036 link
2025-02-27 RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges Thibaut Loiseau et.al. 2502.19955 null
2025-02-26 BEV-LIO(LC): BEV Image Assisted LiDAR-Inertial Odometry with Loop Closure Haoxin Cai et.al. 2502.19242 link
2025-02-25 PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching Han Nie et.al. 2502.18104 link
2025-02-25 Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking Xin Tong et.al. 2502.17766 null
2025-03-04 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model Yaxuan Huang et.al. 2502.16779 null
2025-02-16 FeaKM: Robust Collaborative Perception under Noisy Pose Conditions Jiuwu Hao et.al. 2502.11003 link
2025-02-24 Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation Emanuele Mule et.al. 2502.06288 link
2025-02-04 Muographic Image Upsampling with Machine Learning for Built Infrastructure Applications William O’Donnell et.al. 2502.02624 null
2025-02-01 MambaGlue: Fast and Robust Local Feature Matching With Mamba Kihwan Ryoo et.al. 2502.00462 link
2025-01-24 Dense-SfM: Structure from Motion with Dense Consistent Matching JongMin Lee et.al. 2501.14277 null
2025-01-20 MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching Yepeng Liu et.al. 2501.11299 null
2025-01-13 MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training Xingyi He et.al. 2501.07556 null
2025-01-13 Matching Free Depth Recovery from Structured Light Zhuohang Yu et.al. 2501.07113 null
2025-01-02 Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views Yulun Wu et.al. 2501.01196 null
2024-12-31 Towards Real-Time 2D Mapping: Harnessing Drones, AI, and Computer Vision for Advanced Insights Bharath Kumar Agnur et.al. 2412.20210 null
2024-12-27 MINIMA: Modality Invariant Image Matching Xingyu Jiang et.al. 2412.19412 link
2024-12-24 GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network Xianfeng Song et.al. 2412.18221 link
2024-12-17 Bringing Multimodality to Amazon Visual Search System Xinliang Zhu et.al. 2412.13364 null
2024-12-04 Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis Siyoon Jin et.al. 2412.03150 null
2024-11-20 DT-LSD: Deformable Transformer-based Line Segment Detection Sebastian Janampa et.al. 2411.13005 link
2024-11-15 Image Matching Filtering and Refinement by Planes and Beyond Fabio Bellavia et.al. 2411.09484 link
2024-11-11 XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration Ismail Can Yagmur et.al. 2411.07430 link
2024-11-07 The Impact of Semi-Supervised Learning on Line Segment Detection Johanna Engman et.al. 2411.04596 link
2024-11-04 Silver medal Solution for Image Matching Challenge 2024 Yian Wang et.al. 2411.01851 null
2024-10-30 Variable Resolution Sampling and Deep Learning Image Recovery for Accelerated Multi-Spectral MRI Near Metal Implants Azadeh Sharafi et.al. 2410.23329 null
2024-11-05 RelationBooth: Towards Relation-Aware Customized Object Generation Qingyu Shi et.al. 2410.23280 null
2024-10-31 ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses Junjie Ni et.al. 2410.22733 null
2024-10-30 LoFLAT: Local Feature Matching using Focused Linear Attention Transformer Naijian Cao et.al. 2410.22710 null
2024-10-26 Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification Yue Su et.al. 2410.20097 null
2024-10-01 A Robust Multisource Remote Sensing Image Matching Method Utilizing Attention and Feature Enhancement Against Noise Interference Yuan Li et.al. 2410.11848 null
2024-10-15 LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images Yuzhou Cheng et.al. 2410.11505 null
2024-10-12 Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence Felipe Cadar et.al. 2410.09533 link
2024-09-27 Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras Yipeng Lu et.al. 2409.18673 null
2024-09-25 Game4Loc: A UAV Geo-Localization Benchmark from Game Data Yuxiang Ji et.al. 2409.16925 link
2024-09-24 Automatic Registration of SHG and H&E Images with Feature-based Initial Alignment and Intensity-based Instance Optimization: Contribution to the COMULIS Challenge Marek Wodzinski et.al. 2409.15931 null
2024-09-10 Weakly-supervised Camera Localization by Ground-to-satellite Image Registration Yujiao Shi et.al. 2409.06471 link
2024-09-05 Enabling Practical and Privacy-Preserving Image Processing Chao Wang et.al. 2409.03568 null
2024-09-20 A General Albedo Recovery Approach for Aerial Photogrammetric Images through Inverse Rendering Shuang Song et.al. 2409.03032 link
2024-08-29 Super-Resolution works for coastal simulations Zhi-Song Liu et.al. 2408.16553 null
2024-09-15 Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks Sierra Bonilla et.al. 2408.16445 link
2024-08-26 Affine steerers for structured keypoint description Georg Bökman et.al. 2408.14186 link
2024-08-25 TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers Chuanrui Zhang et.al. 2408.13770 null
2024-09-11 Coarse-to-fine Alignment Makes Better Speech-image Retrieval Lifeng Zhou et.al. 2408.13119 null
2024-08-19 BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval Zhenyu Lu et.al. 2408.10383 null
2024-08-14 RSD-DOG: A New Image Descriptor based on Second Order Derivatives Darshan Venkatrayappa et.al. 2408.07687 null
2024-08-09 One Shot is Enough for Sequential Infrared Small Target Segmentation Bingbing Dan et.al. 2408.04823 link
2024-08-07 PRISM: PRogressive dependency maxImization for Scale-invariant image Matching Xudong Cai et.al. 2408.03598 null
2024-08-05 ConDL: Detector-Free Dense Image Matching Monika Kwiatkowski et.al. 2408.02766 null
2024-08-04 Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image Xinlin Ren et.al. 2408.02079 link
2024-07-29 Image-text matching for large-scale book collections Artemis Llabrés et.al. 2407.19812 link
2024-07-26 PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis Sohyeong Kim et.al. 2407.18695 null
2024-07-22 RADA: Robust and Accurate Feature Learning with Domain Adaptation Jingtai He et.al. 2407.15791 null
2024-07-17 GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection Jingwen Yu et.al. 2407.11736 link
2024-07-16 REMM: Rotation-Equivariant Framework for End-to-End Multimodal Image Matching Han Nie et.al. 2407.11637 link
2024-07-16 A Self-Correcting Strategy of the Digital Volume Correlation Displacement Field Based on Image Matching: Application to Poor Speckles Quality and Complex-Large Deformation Chengsheng Li et.al. 2407.11287 null
2024-07-14 Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching Xiaoyong Lu et.al. 2407.07789 null
2024-07-10 Mutual Information calculation on different appearances Jiecheng Liao et.al. 2407.07410 null
2024-07-15 SfM on-the-fly: Get better 3D from What You Capture Zongqian Zhan et.al. 2407.03939 null
2024-07-03 IMC 2024 Methods & Solutions Review Shyam Gupta et.al. 2407.03172 null
2024-06-21 High Resolution Surface Reconstruction of Cultural Heritage Objects Using Shape from Polarization Method F. S. Mortazavi et.al. 2406.15121 null
2024-06-16 Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models Yikai Zhang et.al. 2406.10902 link
2024-06-14 Grounding Image Matching in 3D with MASt3R Vincent Leroy et.al. 2406.09756 link

MultiModal

Publish Date Title Authors PDF Code
2025-04-17 SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs Haoxuan Li et.al. 2504.13172 null
2025-04-17 Hadamard product in deep learning: Introduction, Advances and Challenges Grigorios G Chrysos et.al. 2504.13112 null
2025-04-17 EventVAD: Training-Free Event-Aware Video Anomaly Detection Yihua Shao et.al. 2504.13092 null
2025-04-17 SkyReels-V2: Infinite-length Film Generative Model Guibin Chen et.al. 2504.13074 null
2025-04-17 ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images Sangwook Kim et.al. 2504.13023 null
2025-04-17 EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery Wei Zhang et.al. 2504.12795 null
2025-04-17 Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration Yicheng Pan et.al. 2504.12773 null
2025-04-17 SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding Qianqian Sun et.al. 2504.12704 null
2025-04-17 GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning Liangyu Xu et.al. 2504.12597 null
2025-04-16 Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis Shravan Chaudhari et.al. 2504.12511 null
2025-04-16 Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis Miaosen Luo et.al. 2504.12151 null
2025-04-16 Instruction-augmented Multimodal Alignment for Image-Text and Element Matching Xinli Yue et.al. 2504.12018 null
2025-04-16 AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection Yuhao Chao et.al. 2504.11914 null
2025-04-16 Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation Julia Kreutzer et.al. 2504.11829 null
2025-04-15 DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis Efthymios Georgiou et.al. 2504.11082 null
2025-04-15 Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation Yan Rong et.al. 2504.11002 null
2025-04-14 CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates Ankit Kumar Shaw et.al. 2504.10738 null
2025-04-14 Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization Darryl Hannan et.al. 2504.10727 null
2025-04-14 Relation-Rich Visual Document Generator for Visual Information Extraction Zi-Han Jiang et.al. 2504.10659 null
2025-04-15 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Jinguo Zhu et.al. 2504.10479 null
2025-04-14 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Tao Zhang et.al. 2504.10465 null
2025-04-14 The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Weixian Lei et.al. 2504.10462 null
2025-04-14 FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos Rui Chen et.al. 2504.10358 null
2025-04-14 CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation Junchen Fu et.al. 2504.10307 null
2025-04-14 PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search Pengfei Hu et.al. 2504.10222 null
2025-04-14 The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance Anwesha Mohanty et.al. 2504.10179 null
2025-04-14 COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts Jiansheng Li et.al. 2504.10158 null
2025-04-14 CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography I-Sheng Fang et.al. 2504.10090 null
2025-04-15 MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework Zihan Ling et.al. 2504.10074 null
2025-04-11 Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Boyang Deng et.al. 2504.08727 null
2025-04-10 POEM: Precise Object-level Editing via MLLM control Marco Schouten et.al. 2504.08111 null
2025-04-10 GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation Lang Lin et.al. 2504.07962 null
2025-04-10 MM-IFEngine: Towards Multimodal Instruction Following Shengyuan Ding et.al. 2504.07957 link
2025-04-10 Perception-R1: Pioneering Perception Policy with Reinforcement Learning En Yu et.al. 2504.07954 link
2025-04-10 MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation Nico Catalano et.al. 2504.07942 null
2025-04-10 VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding Henghao Zhao et.al. 2504.07519 null
2025-04-10 How Can Objects Help Video-Language Understanding? Zitian Tang et.al. 2504.07454 null
2025-04-10 Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing Chenxi Sun et.al. 2504.07424 null
2025-04-10 Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction Kyoyun Choi et.al. 2504.07415 null
2025-04-09 Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning Ashutosh Chaubey et.al. 2504.07198 null
2025-04-10 VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Xinhao Li et.al. 2504.06958 null
2025-04-09 MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking Chang Nie et.al. 2504.06863 null
2025-04-09 Integrating Cognitive Processing Signals into Language Models: A Review of Advances, Applications and Future Directions Angela Lopez-Cardona et.al. 2504.06843 null
2025-04-09 Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception Ruotian Peng et.al. 2504.06666 null
2025-04-09 Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program Minghe Gao et.al. 2504.06606 null
2025-04-08 Mind the Gap: Evaluating Vision Systems in Small Data Applications Samuel Stevens et.al. 2504.06486 link
2025-04-08 Transfer between Modalities with MetaQueries Xichen Pan et.al. 2504.06256 null
2025-04-08 V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Xiangxi Zheng et.al. 2504.06148 null
2025-04-08 MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models Pengfei Zhou et.al. 2504.05782 null
2025-04-08 On the Suitability of Reinforcement Fine-Tuning to Visual Tasks Xiaxu Chen et.al. 2504.05682 null
2025-04-07 URECA: Unique Region Caption Anything Sangbeom Lim et.al. 2504.05305 null
2025-04-07 LiveVQA: Live Visual Knowledge Seeking Mingyang Fu et.al. 2504.05288 null
2025-04-07 Explaining Low Perception Model Competency with High-Competency Counterfactuals Sara Pohland et.al. 2504.05254 null
2025-04-07 Towards Visual Text Grounding of Multimodal Large Language Model Ming Li et.al. 2504.04974 null
2025-04-07 Video-Bench: Human-Aligned Video Generation Benchmark Hui Han et.al. 2504.04907 null
2025-04-07 OrderChain: A General Prompting Paradigm to Improve Ordinal Understanding Ability of MLLM Jinhong Wang et.al. 2504.04801 null
2025-04-07 OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance Chaoyi Wang et.al. 2504.04781 null
2025-04-07 Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data Samarth Mishra et.al. 2504.04740 null
2025-04-07 LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts Yimu Wang et.al. 2504.04653 null
2025-04-06 Advancing Egocentric Video Question Answering with Multimodal Large Language Models Alkesh Patel et.al. 2504.04550 null
2025-04-04 MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Wulin Xie et.al. 2504.03641 null
2025-04-03 Hummus: A Dataset of Humorous Multimodal Metaphor Use Xiaoyu Tong et.al. 2504.02983 link
2025-04-03 Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning Zhihan Zhang et.al. 2504.02906 link
2025-04-03 Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision Xiaofeng Han et.al. 2504.02477 null
2025-04-03 The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy Matheus Valentim et.al. 2504.02217 null
2025-04-03 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Runhui Huang et.al. 2504.01934 null
2025-04-02 Spatial-R1: Enhancing MLLMs in Video Spatial Reasoning Kun Ouyang et.al. 2504.01805 link
2025-04-02 PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization Aofan Liu et.al. 2504.01444 null
2025-04-02 Slow-Fast Architecture for Video Multi-Modal Large Language Models Min Shi et.al. 2504.01328 link
2025-04-01 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction Junhao Cheng et.al. 2504.01014 link
2025-04-01 IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval Bangwei Liu et.al. 2504.00954 null
2025-04-02 Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning Ram Ramrakhya et.al. 2504.00907 null
2025-04-01 Improved Visual-Spatial Reasoning via R1-Zero-Like Training Zhenyi Liao et.al. 2504.00883 null
2025-04-01 Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights Yuchen Liu et.al. 2504.00839 null
2025-04-01 QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA Shuai Li et.al. 2504.00654 null
2025-03-31 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Shengqiong Wu et.al. 2503.24379 null
2025-03-31 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Yi Chen et.al. 2503.24376 link
2025-03-31 H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding Qi Wu et.al. 2503.24008 null
2025-03-31 BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation Yumeng Fu et.al. 2503.23990 null
2025-03-31 Boosting MLLM Reasoning with Text-Debiased Hint-GRPO Qihan Huang et.al. 2503.23905 null
2025-04-01 Evaluating small vision-language models as AI assistants for radio astronomical source analysis tasks S. Riggi et.al. 2503.23859 link
2025-03-31 OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training Yijie Zheng et.al. 2503.23830 null
2025-03-31 XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? Fengxiang Wang et.al. 2503.23771 null
2025-03-31 STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding? Yun Li et.al. 2503.23765 null
2025-03-31 AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization Yiyang Du et.al. 2503.23733 link
2025-03-28 Q-Insight: Understanding Image Quality via Visual Reinforcement Learning Weiqi Li et.al. 2503.22679 link
2025-03-28 Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users Antonia Karamolegkou et.al. 2503.22610 null
2025-03-28 NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving Fuhao Li et.al. 2503.22436 null
2025-03-31 Agent-Centric Personalized Multiple Clustering with Multi-Modal LLMs Ziye Chen et.al. 2503.22241 null
2025-03-28 Learning to Instruct for Visual Instruction Tuning Zhihan Zhou et.al. 2503.22215 null
2025-03-28 DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos Yunming Liang et.al. 2503.22208 null
2025-03-28 EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos Yuxuan Li et.al. 2503.22152 link
2025-03-28 Tokenization of Gaze Data Tim Rolff et.al. 2503.22145 null
2025-03-28 A Survey on Remote Sensing Foundation Models: From Vision to Multimodality Ziyue Huang et.al. 2503.22081 link
2025-03-27 Video-R1: Reinforcing Video Reasoning in MLLMs Kaituo Feng et.al. 2503.21776 link
2025-03-27 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models Yuhan Zhang et.al. 2503.21745 null
2025-03-27 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Zhengxi Lu et.al. 2503.21620 link
2025-03-27 FusionSegReID: Advancing Person Re-Identification with Multimodal Retrieval and Precise Segmentation Jincheng Yan et.al. 2503.21595 null
2025-03-27 FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs Xiaoqin Wang et.al. 2503.21457 link
2025-03-27 InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression Dongchen Lu et.al. 2503.21307 link
2025-03-26 ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction Yiqiao Jin et.al. 2503.20978 null
2025-03-26 MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams Yanpeng Sun et.al. 2503.20745 null
2025-03-26 Vision as LoRA Han Wang et.al. 2503.20680 link
2025-03-26 Beyond Intermediate States: Explaining Visual Redundancy through Language Dingchen Yang et.al. 2503.20540 link
2025-03-26 Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering Zehui Liao et.al. 2503.20504 null
2025-03-26 MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning Yiwei Ma et.al. 2503.20502 null
2025-03-26 From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment Yucheng Suo et.al. 2503.20472 null
2025-03-26 MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation Rongyu Zhang et.al. 2503.20384 null
2025-03-26 Dynamic Pyramid Network for Efficient Multimodal Large Language Model Hao Ai et.al. 2503.20322 null
2025-03-26 Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs Zitian Wang et.al. 2503.20309 null
2025-03-25 LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Kexian Tang et.al. 2503.19990 null
2025-03-25 CoLLM: A Large Language Model for Composed Image Retrieval Chuong Huynh et.al. 2503.19910 link
2025-03-25 Scaling Vision Pre-Training to 4K Resolution Baifeng Shi et.al. 2503.19903 null
2025-03-25 Perception-Enhanced Multitask Multimodal Semantic Communication for UAV-Assisted Integrated Sensing and Communication System Ziji Guo et.al. 2503.19594 null
2025-03-25 DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts Ling Zhong et.al. 2503.19498 null
2025-03-25 ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning Jiaqi Liao et.al. 2503.19312 null
2025-03-24 MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks Wenhao You et.al. 2503.19134 null
2025-03-24 LLaVAction: evaluating and training multi-modal large language models for action recognition Shaokai Ye et.al. 2503.18712 link
2025-03-25 Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models Yazhou Zhang et.al. 2503.18681 null
2025-03-24 Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark Bingchen Miao et.al. 2503.18665 link
2025-03-24 Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding Xiangrui Liu et.al. 2503.18478 null
2025-03-24 A Simple yet Effective Layout Token in Large Language Models for Document Understanding Zhaoqing Zhu et.al. 2503.18434 null
2025-03-23 Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering Zixin Chen et.al. 2503.18172 null
2025-03-23 MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation Jiaxin Huang et.al. 2503.18135 null
2025-03-23 MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection Yibo Yan et.al. 2503.18132 null
2025-03-23 Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models Qiao Liang et.al. 2503.18034 null
2025-03-22 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding Wenxuan Zhu et.al. 2503.17827 link
2025-03-21 LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models Jian Liang et.al. 2503.16843 null
2025-03-21 When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts Jun Seong Kim et.al. 2503.16826 null
2025-03-20 Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions Hadi Amini et.al. 2503.16585 link
2025-03-20 OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence Long Yuan et.al. 2503.16326 null
2025-03-20 Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data Zijian Li et.al. 2503.16260 null
2025-03-20 CLS-RL: Image Classification with Rule-Based Reinforcement Learning Ming Li et.al. 2503.16188 link
2025-03-20 OThink-MR1: Stimulating multimodal generalized reasoning capabilities through dynamic reinforcement learning Zhiyuan Liu et.al. 2503.16081 null
2025-03-20 Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models Zhihang Liu et.al. 2503.16036 null
2025-03-20 BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models Zenghui Yuan et.al. 2503.16023 null
2025-03-20 DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering Haochen Wang et.al. 2503.15887 null
2025-03-20 A Vision Centric Remote Sensing Benchmark Abduljaleel Adejumo et.al. 2503.15816 null
2025-03-19 LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning Federico Cocchi et.al. 2503.15621 link
2025-03-19 Visual Position Prompt for MLLM based Visual Grounding Wei Tang et.al. 2503.15426 link
2025-03-19 Leveraging Perfect Multimodal Alignment and Gaussian Assumptions for Cross-modal Transfer Abhi Kamboj et.al. 2503.15352 null
2025-03-19 LEGION: Learning to Ground and Explain for Synthetic Image Detection Hengrui Kang et.al. 2503.15264 null
2025-03-20 Benchmarking Large Language Models for Handwritten Text Recognition Giorgia Crosilla et.al. 2503.15195 null
2025-03-19 UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation Qihui Zhang et.al. 2503.14941 null
2025-03-19 VisNumBench: Evaluating Number Sense of Multimodal Large Language Models Tengjin Weng et.al. 2503.14939 null
2025-03-19 FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding Chongjun Tu et.al. 2503.14935 null
2025-03-19 POSTA: A Go-to Framework for Customized Artistic Poster Generation Haoyu Chen et.al. 2503.14908 null
2025-03-19 Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations Shuo Li et.al. 2503.14895 null
2025-03-18 Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives Sara Sarto et.al. 2503.14604 null
2025-03-18 Aligning Multimodal LLM with Human Preference: A Survey Tao Yu et.al. 2503.14504 null
2025-03-19 Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Xinyu Fang et.al. 2503.14478 link
2025-03-18 VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation Shoubin Yu et.al. 2503.14350 null
2025-03-19 DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies Wei Song et.al. 2503.14324 link
2025-03-18 Towards Harmless Multimodal Assistants with Blind Preference Optimization Yongqi Li et.al. 2503.14189 null
2025-03-18 Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding Zining Wang et.al. 2503.14140 null
2025-03-18 MP-GUI: Modality Perception with MLLMs for GUI Understanding Ziwei Wang et.al. 2503.14021 link
2025-03-18 SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability Jiankang Wang et.al. 2503.13983 null
2025-03-18 Survey of Adversarial Robustness in Multimodal Large Language Models Chengze Jiang et.al. 2503.13962 null
2025-03-18 Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation Sayak Nag et.al. 2503.13947 null
2025-03-17 MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research James Burgess et.al. 2503.13399 link
2025-03-17 Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning Mengyao Lyu et.al. 2503.13383 null
2025-03-17 Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning Hai-Long Sun et.al. 2503.13360 null
2025-03-17 3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o Dingning Liu et.al. 2503.13185 null
2025-03-17 MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs Erik Daxberger et.al. 2503.13111 null
2025-03-17 Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference Hao Yin et.al. 2503.13108 link
2025-03-17 ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models Hao Yin et.al. 2503.13107 link
2025-03-17 Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided, Self-Consistent MLLMs for Food Preparation Task Planning Yu-Hong Shen et.al. 2503.13055 null
2025-03-17 Efficient Motion-Aware Video MLLM Zijia Zhao et.al. 2503.13016 null
2025-03-17 HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model Haiyang Guo et.al. 2503.12941 null
2025-03-14 VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity Jing Bi et.al. 2503.11557 null
2025-03-14 A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving Tin Stribor Sohn et.al. 2503.11400 null
2025-03-14 Cornstarch: Distributed Multimodal Training Must Be Multimodality-Aware Insu Jang et.al. 2503.11367 link
2025-03-14 Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space Weichen Zhan et.al. 2503.11094 link
2025-03-14 EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks Yi Zhang et.al. 2503.11089 null
2025-03-14 BannerAgency: Advertising Banner Design with Multimodal LLM Agents Heng Wang et.al. 2503.11060 null
2025-03-14 RONA: Pragmatically Diverse Image Captioning with Coherence Relations Aashish Anantha Ramakrishnan et.al. 2503.10997 link
2025-03-13 Learning to Inference Adaptively for Multimodal Large Language Models Zhuoyan Xu et.al. 2503.10905 null
2025-03-13 PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models Zilu Guo et.al. 2503.10529 null
2025-03-13 Interactive Multimodal Fusion with Temporal Modeling Jun Yu et.al. 2503.10523 null
2025-03-13 TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models Xudong Tan et.al. 2503.10501 link
2025-03-13 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Wanhua Li et.al. 2503.10437 link
2025-03-13 CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Yufan Deng et.al. 2503.10391 null
2025-03-13 A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection Heng Yim Nicole Oo et.al. 2503.10371 null
2025-03-13 IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification Yuhao Wang et.al. 2503.10324 null
2025-03-13 VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Weiyun Wang et.al. 2503.10291 null
2025-03-13 LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents Boyu Chen et.al. 2503.10200 null
2025-03-13 Hybrid Agents for Image Restoration Bingchen Li et.al. 2503.10120 null
2025-03-13 BIMBA: Selective-Scan Compression for Long-Range Video Question Answering Md Mohaiminul Islam et.al. 2503.09590 link
2025-03-12 Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding Haoyu Zhang et.al. 2503.09143 null
2025-03-11 Seeing What’s Not There: Spurious Correlation in Multimodal LLMs Parsa Hosseini et.al. 2503.08884 null
2025-03-11 Language-Depth Navigated Thermal and Visible Image Fusion Jinchang Zhang et.al. 2503.08676 null
2025-03-11 SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Muzhi Zhu et.al. 2503.08625 null
2025-03-11 LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Xianfeng Wu et.al. 2503.08619 link
2025-03-11 HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding Shehreen Azad et.al. 2503.08585 null
2025-03-11 RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding Xichen Tan et.al. 2503.08576 null
2025-03-11 FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework Jianian Zhu et.al. 2503.08461 null
2025-03-11 KAP: MLLM-assisted OCR Text Enhancement for Hybrid Retrieval in Chinese Non-Narrative Documents Hsin-Ling Hsu et.al. 2503.08452 null
2025-03-11 Embodied Crowd Counting Runling Long et.al. 2503.08367 null
2025-03-12 Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs Chongjun Tu et.al. 2503.08342 null
2025-03-11 Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework Zhuo Zhi et.al. 2503.08308 null
2025-03-10 Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts Shiu-hong Kao et.al. 2503.07503 null
2025-03-10 LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? Bangyan Li et.al. 2503.07487 null
2025-03-10 REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding Yan Tai et.al. 2503.07413 link
2025-03-10 ALLVB: All-in-One Long Video Understanding Benchmark Xichen Tan et.al. 2503.07298 null
2025-03-10 A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images Xiaoyi Liang et.al. 2503.07094 null
2025-03-10 Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning Jiazheng Liu et.al. 2503.07002 null
2025-03-10 Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs Wenzhuo Xu et.al. 2503.06989 null
2025-03-10 Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition Xinyu Xi et.al. 2503.06978 null
2025-03-10 ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks Yan Yang et.al. 2503.06885 null
2025-03-09 SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation Zisheng Chen et.al. 2503.06764 link
2025-03-11 Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Wenxuan Huang et.al. 2503.06749 link
2025-03-07 Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information Junbo Zhao et.al. 2503.05543 null
2025-03-07 Can Large Language Models Grasp Concepts in Visual Content? A Case Study on YouTube Shorts about Depression Jiaying “Lizzy” Liu et.al. 2503.05109 null
2025-03-06 FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement Ian Huang et.al. 2503.04919 null
2025-03-06 Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model Wenke Huang et.al. 2503.04543 null
2025-03-06 Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition Bin Chen et.al. 2503.04201 null
2025-03-06 MASTER: Multimodal Segmentation with Text Prompts Fuyang Liu et.al. 2503.04199 null
2025-03-06 Biological Sequence with Language Model Prompting: A Survey Jiyue Jiang et.al. 2503.04135 null
2025-03-07 Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts Xiangnan Chen et.al. 2503.04095 null
2025-03-06 RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models Wenhui Zhu et.al. 2503.03987 null
2025-03-05 DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance Zhao Yang et.al. 2503.03689 link
2025-03-05 BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation Hiep Truong Cong et.al. 2503.03280 null
2025-03-05 COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence Wentao Li et.al. 2503.03215 null
2025-03-05 Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings Sneh Pillai et.al. 2503.03202 null
2025-03-04 Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs Wei-Yao Wang et.al. 2503.02597 link
2025-03-05 MCiteBench: A Benchmark for Multimodal Citation Text Generation in MLLMs Caiyu Hu et.al. 2503.02589 link
2025-03-04 A Token-level Text Image Foundation Model for Document Understanding Tongkun Guan et.al. 2503.02304 null
2025-03-03 Distilled Prompt Learning for Incomplete Multimodal Survival Prediction Yingxue Xu et.al. 2503.01653 null
2025-03-03 RemiHaven: Integrating “In-Town” and “Out-of-Town” Peers to Provide Personalized Reminiscence Support for Older Drifters Xuechen Zhang et.al. 2503.01358 null
2025-03-04 UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Hao Tang et.al. 2503.01342 link
2025-03-03 Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG Wenbin Wang et.al. 2503.01222 link
2025-03-03 Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models Tianjie Ju et.al. 2503.01208 link
2025-03-03 Scientific Reasoning: Assessment of Multimodal Generative LLMs Florian Dreyer et.al. 2503.01064 null
2025-03-02 LLM-Fusion: A Novel Multimodal Fusion Model for Accelerated Material Discovery Onur Boyar et.al. 2503.01022 null
2025-02-28 Adaptive Keyframe Sampling for Long Video Understanding Xi Tang et.al. 2502.21271 null
2025-02-28 RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Yuheng Ji et.al. 2502.21257 null
2025-02-28 Fine-Grained Retrieval-Augmented Generation for Visual Question Answering Zhengxuan Zhang et.al. 2502.20964 null
2025-02-28 HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models Xiao Wang et.al. 2502.20811 null
2025-03-03 MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts Peijie Wang et.al. 2502.20808 null
2025-02-28 Towards General Visual-Linguistic Face Forgery Detection(V2) Ke Sun et.al. 2502.20698 link
2025-02-27 Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection Sari Masri et.al. 2502.20573 null
2025-02-27 Protecting multimodal large language models against misleading visualizations Jonathan Tonglet et.al. 2502.20503 link
2025-02-27 VideoA11y: Method and Dataset for Accessible Video Description Chaoyu Li et.al. 2502.20480 null
2025-02-27 Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription Benjamin Gutteridge et.al. 2502.20295 link
2025-02-27 Mixture of Experts for Recognizing Depression from Interview and Reading Tasks Loukas Ilias et.al. 2502.20213 null
2025-02-27 New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration Xuzheng Yang et.al. 2502.20104 null
2025-02-27 AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs Xuyang Wei et.al. 2502.20035 link
2025-02-27 Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up Lang Huang et.al. 2502.20008 null
2025-02-27 Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents Zhenyu Liu et.al. 2502.19917 link
2025-02-27 Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy Zaijing Li et.al. 2502.19902 null
2025-02-27 Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention Weiyan Shi et.al. 2502.19877 null
2025-02-27 One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion Chunyang Cheng et.al. 2502.19854 link
2025-02-27 Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack Chenhe Gu et.al. 2502.19672 null
2025-02-26 ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models Danae Sánchez Villegas et.al. 2502.19409 null
2025-02-26 M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance Qingpei Guo et.al. 2502.18778 null
2025-02-25 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Xiangyu Zhao et.al. 2502.18411 link
2025-02-25 ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis Li Lei et.al. 2502.18180 null
2025-02-25 VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion Pei Liu et.al. 2502.18042 null
2025-02-25 MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks Hyeonjeong Ha et.al. 2502.17832 link
2025-02-25 Can Multimodal LLMs Perform Time Series Anomaly Detection? Xiongxiao Xu et.al. 2502.17812 link
2025-02-24 MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference Zhongwei Wan et.al. 2502.17599 link
2025-02-24 PosterSum: A Multimodal Benchmark for Scientific Poster Summarization Rohit Saxena et.al. 2502.17540 link
2025-02-24 Introducing Visual Perception Token into Multimodal Large Language Model Runpeng Yu et.al. 2502.17425 link
2025-02-24 MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Jiarui Zhang et.al. 2502.17422 link
2025-02-24 HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization Zhenghao Liu et.al. 2502.17315 link
2025-02-24 Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts Zhenghao Liu et.al. 2502.17297 link
2025-02-24 Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence Wenzhe Yin et.al. 2502.17028 null
2025-02-24 Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs Himanshu Beniwal et.al. 2502.16901 link
2025-02-24 SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding Liangtao Shi et.al. 2502.16786 link
2025-02-23 AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation Rui Li et.al. 2502.16680 link
2025-02-23 Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries Yin Wu et.al. 2502.16636 link
2025-02-23 Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review Pei Fu et.al. 2502.16586 null
2025-02-21 Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models Anirudh Sundar et.al. 2502.15639 null
2025-02-21 Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs Gengyuan Zhang et.al. 2502.15457 null
2025-02-21 Research advances on fish feeding behavior recognition and intensity quantification methods in aquaculture Shulong Zhang et.al. 2502.15311 null
2025-02-21 M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment Chuan Cui et.al. 2502.15167 null
2025-02-20 Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation Yun-Wei Chu et.al. 2502.15040 null
2025-02-20 Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework Yuming Yang et.al. 2502.14864 link
2025-02-20 Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension Amir Hossein Yari et.al. 2502.14315 null
2025-02-20 Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach Yurong Wu et.al. 2502.14285 null
2025-02-21 PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC Haowei Liu et.al. 2502.14282 null
2025-02-19 ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities Chanjin Zheng et.al. 2502.13832 link
2025-02-19 From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education Yi-Fan Zhang et.al. 2502.13789 null
2025-02-18 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Bencheng Liao et.al. 2502.13145 link
2025-02-18 SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models Xianfu Cheng et.al. 2502.13059 null
2025-02-18 AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks Yurun Chen et.al. 2502.13053 null
2025-02-18 Towards Text-Image Interleaved Retrieval Xin Zhang et.al. 2502.12799 link
2025-02-18 Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning Yunhao Gou et.al. 2502.12635 null
2025-02-18 SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings Weikai Lu et.al. 2502.12562 link
2025-02-18 MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos Huaying Yuan et.al. 2502.12558 null
2025-02-18 SAFEERASER: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning Junkai Chen et.al. 2502.12520 null
2025-02-17 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation Ling Yang et.al. 2502.12148 link
2025-02-17 PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection Jinhe Bi et.al. 2502.12119 null
2025-02-17 Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications Li Qiao et.al. 2502.12096 null
2025-02-17 Unhackable Temporal Rewarding for Scalable Video MLLMs En Yu et.al. 2502.12081 null
2025-02-17 GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs Yi Fang et.al. 2502.11925 null
2025-02-17 EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models Jiamin Su et.al. 2502.11916 null
2025-02-17 MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation Haochen Xue et.al. 2502.11903 null
2025-02-17 Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities Hanbin Wang et.al. 2502.11829 link
2025-02-17 Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning Yuqi Pang et.al. 2502.11751 link
2025-02-17 Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent Junda Wu et.al. 2502.11740 null
2025-02-14 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Yi-Fan Zhang et.al. 2502.10391 null
2025-02-14 AutoS$^2$earch: Unlocking the Reasoning Potential of Large Models for Web-based Source Search Zhengqiu Zhu et.al. 2502.09913 null
2025-02-13 EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Rui Yang et.al. 2502.09560 null
2025-02-13 A Benchmark for Crime Surveillance Video Analysis with Large Models Haoran Chen et.al. 2502.09325 null
2025-02-13 From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs Mingxiao Li et.al. 2502.09093 null
2025-02-12 FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation Yang Sun et.al. 2502.08260 link
2025-02-12 Learning Human Skill Generators at Key-Step Levels Yilu Wu et.al. 2502.08234 null
2025-02-13 Universal Adversarial Attack on Aligned Multimodal LLMs Temurbek Rahmatullaev et.al. 2502.07987 null
2025-02-11 DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities Chashi Mahiul Islam et.al. 2502.07905 null
2025-02-11 Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models Jiacong Xu et.al. 2502.07601 null
2025-02-11 MLLM4PUE: Toward Universal Embeddings in Computational Pathology through Multimodal LLMs Qifeng Zhou et.al. 2502.07221 null
2025-02-11 Early Risk Prediction of Pediatric Cardiac Arrest from Electronic Health Records via Multimodal Fused Transformer Jiaying Lu et.al. 2502.07158 null
2025-02-09 AI-Driven HSI: Multimodality, Fusion, Challenges, and the Deep Learning Revolution David S. Bhatti et.al. 2502.06894 null
2025-02-11 CoS: Chain-of-Shot Prompting for Long Video Understanding Jian Hu et.al. 2502.06428 null
2025-02-07 Survey on AI-Generated Media Detection: From Non-MLLM to MLLM Yueying Zou et.al. 2502.05240 null
2025-02-07 Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy Yunhang Shen et.al. 2502.05177 link
2025-02-07 Multitwine: Multi-Object Compositing with Text and Layout Control Gemma Canet Tarrés et.al. 2502.05165 null
2025-02-07 Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs Rohit Saxena et.al. 2502.05092 null
2025-02-07 Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Han Zhang et.al. 2502.04976 null
2025-02-07 Cached Multi-Lora Composition for Multi-Concept Image Generation Xiandong Zou et.al. 2502.04923 link
2025-02-07 MedMimic: Physician-Inspired Multimodal Fusion for Early Diagnosis of Fever of Unknown Origin Minrui Chen et.al. 2502.04794 null
2025-02-06 EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models He Hu et.al. 2502.04424 null
2025-02-05 PerPO: Perceptual Preference Optimization via Discriminative Rewarding Zining Zhu et.al. 2502.04371 link
2025-02-06 PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? Mennatullah Siam et.al. 2502.04192 link
2025-02-06 MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation Qinhan Yu et.al. 2502.04176 null
2025-02-05 Large Language Models Are Universal Recommendation Learners Junguang Jiang et.al. 2502.03041 null
2025-02-05 Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning Yibo Yan et.al. 2502.02871 null
2025-02-04 SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency Qianhao Yuan et.al. 2502.02458 link
2025-02-04 Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment Yaling Shen et.al. 2502.02438 null
2025-02-06 LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models Tzu-Tao Chang et.al. 2502.02406 null
2025-02-04 Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Jinyang Wu et.al. 2502.02339 null
2025-02-04 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration Younan Zhu et.al. 2502.01969 null
2025-02-04 MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving Shiju Zhao et.al. 2502.01960 null
2025-02-04 DAMO: Data- and Model-aware Alignment of Multi-modal LLMs Jinda Lu et.al. 2502.01943 null
2025-02-03 Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models Hashmat Shadab Malik et.al. 2502.01576 link
2025-02-03 Position: Empowering Time Series Reasoning with Multimodal LLMs Yaxuan Kong et.al. 2502.01477 null
2025-02-03 Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models Mingi Jung et.al. 2502.01419 null
2025-01-31 Efficient Reasoning with Hidden Thinking Xuan Shen et.al. 2501.19201 link
2025-01-31 Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs Hongliang Li et.al. 2501.19036 null
2025-01-31 Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation Bin Zhu et.al. 2501.19017 null
2025-01-30 BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos Lehao Lin et.al. 2501.18565 null
2025-01-29 Generative AI for Vision: A Comprehensive Study of Frameworks and Applications Fouad Bousetouane et.al. 2501.18033 null
2025-01-29 Topological Signatures of Adversaries in Multimodal Alignments Minh Vu et.al. 2501.18006 null
2025-01-30 Leveraging Multimodal LLM for Inspirational User Interface Search Seokhyeon Park et.al. 2501.17799 link
2025-01-29 Learning Free Token Reduction for Multi-Modal LLM Zihui Zhao et.al. 2501.17391 null
2025-01-31 Multimodal Magic Elevating Depression Detection with a Fusion of Text and Audio Intelligence Lindy Gan et.al. 2501.16813 null
2025-01-28 Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding Yun Li et.al. 2501.16786 null
2025-01-28 MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark Dongyi Yi et.al. 2501.16688 null
2025-01-28 CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs Jinlan Fu et.al. 2501.16629 link
2025-01-27 AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models Zheng Lian et.al. 2501.16566 null
2025-01-27 LUCY: Linguistic Understanding and Control Yielding Early Stage of Her Heting Gao et.al. 2501.16327 link
2025-01-27 FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers Renshan Zhang et.al. 2501.16297 null
2025-01-27 Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models Jing Zhang et.al. 2501.16282 null
2025-01-27 Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? Zhiling Chen et.al. 2501.15795 null
2025-01-27 Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning Michael Xieyang Liu et.al. 2501.15727 null
2025-01-26 Ocean-OCR: Towards General OCR Application via a Vision-Language Model Song Chen et.al. 2501.15558 link
2025-01-26 Unveiling the Potential of Multimodal Retrieval Augmented Generation with Planning Xiaohan Yu et.al. 2501.15470 null
2025-01-26 Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations Zijun Long et.al. 2501.15379 null
2025-01-26 Baichuan-Omni-1.5 Technical Report Yadong Li et.al. 2501.15368 link
2025-01-25 Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink Yining Wang et.al. 2501.15269 null
2025-01-23 Pilot: Building the Federated Multimodal Instruction Tuning Framework Baochen Xiong et.al. 2501.13985 null
2025-01-23 GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration Yue Fan et.al. 2501.13896 null
2025-01-23 EventVL: Understand Event Streams via Multimodal Large Language Model Pengteng Li et.al. 2501.13707 null
2025-01-23 LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models Yizheng Sun et.al. 2501.13652 null
2025-01-23 ReasVQA: Advancing VideoQA with Imperfect Reasoning Process Jianxin Liang et.al. 2501.13536 null
2025-01-23 50 Shades of Deceptive Patterns: A Unified Taxonomy, Multimodal Detection, and Security Implications Zewei Shi et.al. 2501.13351 link
2025-01-24 Multi-aspect Knowledge Distillation with Large Language Model Taegyeong Lee et.al. 2501.13341 link
2025-01-22 Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning Bohao Yang et.al. 2501.13042 link
2025-01-22 InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling Yi Wang et.al. 2501.12386 link
2025-01-21 VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model Xianwei Zhuang et.al. 2501.12327 link
2025-01-21 Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization Jie Zhao et.al. 2501.11968 null
2025-01-21 EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents Zhili Cheng et.al. 2501.11858 link
2025-01-20 Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution Zhiyuan You et.al. 2501.11561 null
2025-01-20 EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery Guankun Wang et.al. 2501.11347 link
2025-01-20 ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction Xiangyang Hu et.al. 2501.11276 link
2025-01-20 A Survey of World Models for Autonomous Driving Tuo Feng et.al. 2501.11260 null
2025-01-19 Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation Zhengwen Shen et.al. 2501.10958 null
2025-01-18 Visual RAG: Expanding MLLM visual knowledge without fine-tuning Mirco Bonomo et.al. 2501.10834 null
2025-01-17 FaceXBench: Evaluating Multimodal LLMs on Face Understanding Kartik Narayan et.al. 2501.10360 link
2025-01-16 A Simple Aerial Detection Baseline of Multimodal Language Models Qingyun Li et.al. 2501.09720 link
2025-01-16 Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis Qize Yang et.al. 2501.09502 null
2025-01-16 Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics Yuanyuan Wei et.al. 2501.09218 null
2025-01-15 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Ruixiang Jiang et.al. 2501.09012 link
2025-01-15 The Devil is in Temporal Token: High Quality Video Reasoning Segmentation Sitong Gong et.al. 2501.08549 link
2025-01-14 LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding Hongyu Li et.al. 2501.08282 link
2025-01-14 Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness Jiaxing Zhao et.al. 2501.07978 link
2025-01-14 Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models Yifang Xu et.al. 2501.07972 null
2025-01-14 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding Haomiao Xiong et.al. 2501.07819 link
2025-01-13 Imagine while Reasoning in Space: Multimodal Visualization-of-Thought Chengzu Li et.al. 2501.07542 null
2025-01-13 Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method Wenping Jin et.al. 2501.07496 link
2025-01-13 Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation Han Liu et.al. 2501.07110 link
2025-01-13 LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models Mozhgan Nasr Azadani et.al. 2501.06986 link
2025-01-12 X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding Wenqi Zhou et.al. 2501.06835 null
2025-01-12 GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing Ruizhe Ou et.al. 2501.06828 null
2025-01-12 MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection Kaiying Yan et.al. 2501.06764 null
2025-01-12 Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints Ming Dai et.al. 2501.06710 link
2025-01-11 ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation Xuanle Zhao et.al. 2501.06598 link
2025-01-11 Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs Shan Zhang et.al. 2501.06430 link
2025-01-10 PEACE: Empowering Geologic Map Holistic Understanding with MLLMs Yangyu Huang et.al. 2501.06184 null
2025-01-10 Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs Dabing Cheng et.al. 2501.05884 null
2025-01-10 Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models You Li et.al. 2501.05767 null
2025-01-10 TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos Korawat Charoenpitaks et.al. 2501.05733 link
2025-01-09 MECASA: Motor Execution Classification using Additive Self-Attention for Hybrid EEG-fNIRS Data Gourav Siddhad et.al. 2501.05525 null
2025-01-09 Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark Yunzhuo Hao et.al. 2501.05444 link
2025-01-09 Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration Xuyang Liu et.al. 2501.05179 link
2025-01-09 Optimizing Multitask Industrial Processes with Predictive Action Guidance Naval Kishore Mehta et.al. 2501.05108 null
2025-01-09 DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving Xuran Zheng et.al. 2501.05081 null
2025-01-09 Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency Shiji Zhao et.al. 2501.04931 null
2025-01-08 Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs Yikang Zhou et.al. 2501.04670 link
2025-01-08 InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Yuhang Liu et.al. 2501.04575 link
2025-01-08 Evidence-based multimodal fusion on structured EHRs and free-text notes for ICU outcome prediction Yucheng Ruan et.al. 2501.04389 link
2025-01-08 Multimodal Graph Constrastive Learning and Prompt for ChartQA Yue Dai et.al. 2501.04303 null
2025-01-08 H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving Siran Chen et.al. 2501.04302 null
2025-01-07 RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance Matin Mortaheb et.al. 2501.03995 null
2025-01-06 Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches Alhassan Mumuni et.al. 2501.03151 null
2025-01-07 Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild Wanpeng Hu et.al. 2501.02964 link
2025-01-06 A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation Toomas Tahves et.al. 2501.02858 null
2025-01-06 Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging? Hongyi Miao et.al. 2501.02751 null
2025-01-05 FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance Haicheng Wang et.al. 2501.02430 link
2025-01-04 What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph Yutao Jiang et.al. 2501.02268 link
2025-01-03 AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs Sanjoy Chowdhury et.al. 2501.02135 null
2025-01-03 VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Chaoyou Fu et.al. 2501.01957 link
2025-01-03 Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Yifan Du et.al. 2501.01904 link
2025-01-03 Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models Guosheng Zhang et.al. 2501.01720 null
2025-01-02 Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants Lixiong Qin et.al. 2501.01243 null
2025-01-02 Towards Interactive Deepfake Analysis Lixiong Qin et.al. 2501.01164 link
2025-01-02 EliGen: Entity-Level Controlled Image Generation with Regional Attention Hong Zhang et.al. 2501.01097 link
2025-01-02 Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs Linhao Huang et.al. 2501.01042 null
2025-01-01 Decoding the Flow: CauseMotion for Emotional Causality Analysis in Long-form Conversations Yuxuan Zhang et.al. 2501.00778 null
2024-12-31 Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method Zhenpeng Huang et.al. 2501.00584 null
2024-12-31 VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Xinhao Li et.al. 2501.00574 link
2024-12-31 Fine-grained Video-Text Retrieval: A New Benchmark and Method Yifan Xu et.al. 2501.00513 null
2024-12-31 Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion Hebin Wang et.al. 2501.00330 null
2024-12-31 MLLM-as-a-Judge for Image Safety without Human Labeling Zhenting Wang et.al. 2501.00192 null
2024-12-30 GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models Shangyu Xing et.al. 2412.21036 null
2024-12-30 Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering Junxiao Xue et.al. 2412.20927 null
2024-12-28 ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming Jiedong Zhuang et.al. 2412.20105 null
2024-12-28 On the Compositional Generalization of Multimodal LLMs for Medical Imaging Zhenyang Cai et.al. 2412.20070 link
2024-12-27 Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework Jiang Liu et.al. 2412.19684 null
2024-12-27 CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs Siyu Wang et.al. 2412.19663 null
2024-12-27 MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios Jiaqi Fan et.al. 2412.19406 link
2024-12-26 Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Ziang Yan et.al. 2412.19326 link
2024-12-26 Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries Roberto Amoroso et.al. 2412.19304 null
2024-12-26 SeaMo: A Multi-Seasonal and Multimodal Remote Sensing Foundation Model Xuyang Li et.al. 2412.19237 null
2024-12-25 MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models Kaiwen Zuo et.al. 2412.18947 null
2024-12-25 RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting Yilei Jiang et.al. 2412.18826 null
2024-12-24 Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation Faraz Waseem et.al. 2412.18688 null
2024-12-24 MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning Abdelmadjid Chergui et.al. 2412.18437 link
2024-12-24 Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles Zihan Wang et.al. 2412.18416 null
2024-12-24 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Huanjin Yao et.al. 2412.18319 link
2024-12-24 ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation Mengyang Wu et.al. 2412.18216 link
2024-12-24 Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation Yucong Luo et.al. 2412.18176 null
2024-12-24 VisionLLM-based Multimodal Fusion Network for Glottic Carcinoma Early Detection Zhaohui Jin et.al. 2412.18124 null
2024-12-24 Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach Jing Bi et.al. 2412.18108 null
2024-12-24 An Ensemble Approach to Short-form Video Quality Assessment Using Multimodal LLM Wen Wen et.al. 2412.18060 null
2024-12-23 A Multimodal Fusion Framework for Bridge Defect Detection with Cross-Verification Ravi Datta Rachuri et.al. 2412.17968 null
2024-12-23 Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy Priyaranjan Pattnayak et.al. 2412.17759 null
2024-12-23 HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data Ting Zhou et.al. 2412.17574 link
2024-12-23 Multimodal Preference Data Synthetic Alignment with Reward Model Robert Wijaya et.al. 2412.17417 link
2024-12-23 MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models Beibei Yu et.al. 2412.17339 null
2024-12-23 Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding Yueyang Li et.al. 2412.17337 link
2024-12-23 Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective Kaifang Long et.al. 2412.17297 null
2024-12-22 SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults Jinzhi Wang et.al. 2412.17077 null
2024-12-22 CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models Yeyuan Wang et.al. 2412.16869 link
2024-12-22 GME: Improving Universal Multimodal Retrieval by Multimodal LLMs Xin Zhang et.al. 2412.16855 null
2024-12-21 AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles Aritra Kumar Lahiri et.al. 2412.16701 null
2024-12-20 MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection Andrea Moglia et.al. 2412.15925 link
2024-12-20 Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution Wentao Tan et.al. 2412.15650 link
2024-12-20 Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM Yangyang Guo et.al. 2412.15614 null
2024-12-20 QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning Xinyang Tong et.al. 2412.15576 null
2024-12-20 Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage Saehyung Lee et.al. 2412.15484 null
2024-12-19 MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs Yuxuan Wan et.al. 2412.15310 link
2024-12-19 OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving Shuo Xing et.al. 2412.15208 link
2024-12-19 Progressive Multimodal Reasoning via Active Retrieval Guanting Dong et.al. 2412.14835 null
2024-12-19 Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models Zijun Chen et.al. 2412.14660 link
2024-12-18 Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Jihan Yang et.al. 2412.14171 link
2024-12-18 InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models Cong Wei et.al. 2412.14006 link
2024-12-18 LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Yipeng Zhang et.al. 2412.13871 link
2024-12-17 Modality-Inconsistent Continual Learning of Multimodal Large Language Models Weiguo Pian et.al. 2412.13050 null
2024-12-17 ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing Yaohui Ma et.al. 2412.12821 link
2024-12-17 PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model Yuqing Wang et.al. 2412.12737 link
2024-12-17 ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding Zhenxing Zhang et.al. 2412.12718 link
2024-12-17 Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation Andong Chen et.al. 2412.12627 null
2024-12-17 FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning Seunghee Kim et.al. 2412.12567 null
2024-12-17 Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models Sina Bagheri Nezhad et.al. 2412.12500 link
2024-12-16 Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering Jinhe Bi et.al. 2412.12359 link
2024-12-16 Instruction-based Image Manipulation by Watching How Things Move Mingdeng Cao et.al. 2412.12087 null
2024-12-16 CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding Guo Chen et.al. 2412.12075 null
2024-12-16 Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning Yuti Liu et.al. 2412.11952 null
2024-12-16 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges Yibo Yan et.al. 2412.11936 null
2024-12-16 PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension Kun Ouyang et.al. 2412.11906 null
2024-12-16 GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training Renqiu Xia et.al. 2412.11863 link
2024-12-16 IDEA-Bench: How Far are Generative Models from Professional Designing? Chen Liang et.al. 2412.11767 link
2024-12-16 From Specific-MLLM to Omni-MLLM: A Survey about the MLLMs alligned with Multi-Modality Shixin Jiang et.al. 2412.11694 null
2024-12-16 ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models Xiechi Zhang et.al. 2412.11453 null
2024-12-15 Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal Yuhao Wang et.al. 2412.11196 null
2024-12-13 Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining Zhiqi Ge et.al. 2412.10342 null
2024-12-13 BrushEdit: All-In-One Image Inpainting and Editing Yaowei Li et.al. 2412.10316 null
2024-12-13 Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer’s Disease Identification Yifan Gao et.al. 2412.09928 null
2024-12-12 ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation Ali Athar et.al. 2412.09754 null
2024-12-12 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Zhuofan Zong et.al. 2412.09618 null
2024-12-13 Olympus: A Universal Task Router for Computer Vision Tasks Yuanze Lin et.al. 2412.09612 link
2024-12-12 SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Hao Li et.al. 2412.09604 null
2024-12-12 Do Multimodal Large Language Models See Like Humans? Jiaying Lin et.al. 2412.09603 null
2024-12-12 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Pan Zhang et.al. 2412.09596 link
2024-12-12 OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Jitesh Jain et.al. 2412.09585 link
2024-12-12 Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Zhisheng Zhong et.al. 2412.09501 link
2024-12-12 Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation Baisen Wang et.al. 2412.09428 link
2024-12-12 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang et.al. 2412.09278 link
2024-12-11 LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information Ke Wang et.al. 2412.08771 null
2024-12-11 From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons Andrew Szot et.al. 2412.08442 null
2024-12-11 HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models Shiding Zhu et.al. 2412.08378 null
2024-12-11 M2SE: A Multistage Multitask Instruction Tuning Strategy for Unified Sentiment and Emotion Analysis Ao Li et.al. 2412.08049 link
2024-12-10 DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Jianzong Wu et.al. 2412.07589 null
2024-12-09 SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations Zhaorun Chen et.al. 2412.06878 null
2024-12-09 ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Chunwei Wang et.al. 2412.06673 null
2024-12-09 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation Chun-Peng Chang et.al. 2412.06613 null
2024-12-12 World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving Mingliang Zhai et.al. 2412.06324 null
2024-12-09 LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations Mingjie Xu et.al. 2412.06322 link
2024-12-09 Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness Qifan Yu et.al. 2412.06293 null
2024-12-09 ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models Bingchen Gong et.al. 2412.06292 null
2024-12-08 GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis Ashish Goswami et.al. 2412.06089 null
2024-12-08 Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models Xiao Xu et.al. 2412.05939 null
2024-12-08 Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models Ma Teng et.al. 2412.05934 link
2024-12-08 [CLS] Token Tells Everything Needed for Training-free Efficient MLLMs Ao Wang et.al. 2412.05819 link
2024-12-06 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Zhe Chen et.al. 2412.05271 link
2024-12-06 CompCap: Improving Multimodal Large Language Models with Composite Captions Xiaohui Chen et.al. 2412.05243 null
2024-12-06 MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Jarvis Guo et.al. 2412.05237 null
2024-12-06 LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation Donald Shenaj et.al. 2412.05148 link
2024-12-06 Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models Zehao Wang et.al. 2412.04939 null
2024-12-06 EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation Yongxin Wang et.al. 2412.04903 null
2024-12-06 Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis Rui Zhou et.al. 2412.04707 null
2024-12-05 Assessing and Learning Alignment of Unimodal Vision and Language Models Le Zhang et.al. 2412.04616 null
2024-12-05 p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay Jun Zhang et.al. 2412.04449 link
2024-12-05 EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios Lu Qiu et.al. 2412.04447 null
2024-12-05 GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Kaiyi Huang et.al. 2412.04440 null
2024-12-05 Grounding Descriptions in Images informs Zero-Shot Visual Recognition Shaunak Halbe et.al. 2412.04429 link
2024-12-05 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Jiuhai Chen et.al. 2412.04424 link
2024-12-05 Liquid: Language Models are Scalable Multi-modal Generators Junfeng Wu et.al. 2412.04332 link
2024-12-05 FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression Bo Tong et.al. 2412.04317 link
2024-12-04 VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding Chaoyu Li et.al. 2412.03735 null
2024-12-04 DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation Qingdong He et.al. 2412.03255 null
2024-12-04 Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges Minghao Shao et.al. 2412.03220 null
2024-12-04 ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning Zhe Xie et.al. 2412.03104 link
2024-12-03 AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Kaixiong Gong et.al. 2412.02611 null
2024-12-03 Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks Jinjin Cai et.al. 2412.02531 null
2024-12-03 VR Based Emotion Recognition Using Deep Multimodal Fusion With Biosignals Across Multiple Anatomical Domains Pubudu L. Indrasiri et.al. 2412.02283 null
2024-12-03 Personalized Multimodal Large Language Models: A Survey Junda Wu et.al. 2412.02142 null
2024-12-03 WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image Yuci Liang et.al. 2412.02141 null
2024-12-03 Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey Yunkai Dang et.al. 2412.02104 null
2024-12-02 PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving Xuewen Luo et.al. 2412.02025 null
2024-12-02 MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models Xiaomin Li et.al. 2412.01343 null
2024-12-02 Enhancing Perception Capabilities of Multimodal LLMs with Training-free Fusion Zhuokun Chen et.al. 2412.01289 null
2024-12-02 Ponder & Press: Advancing Visual GUI Agent towards General Computer Control Yiqin Wang et.al. 2412.01268 null
2024-12-02 T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs Shukang Yin et.al. 2411.19951 link
2024-11-29 VLSBench: Unveiling Visual Leakage in Multimodal Safety Xuhao Hu et.al. 2411.19939 null
2024-11-29 On Domain-Specific Post-Training for Multimodal Large Language Models Daixuan Cheng et.al. 2411.19930 null
2024-11-29 Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings Qiong Wu et.al. 2411.19628 link
2024-11-28 Libra: Leveraging Temporal Images for Biomedical Radiology Analysis Xi Zhang et.al. 2411.19378 link
2024-11-28 SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation Yuhan Pei et.al. 2411.19182 null
2024-11-28 Detailed Object Description with Controllable Dimensions Xinran Wang et.al. 2411.19106 link
2024-11-28 I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting Nicola Fanelli et.al. 2411.19050 link
2024-11-28 DuetML: Human-LLM Collaborative Machine Learning Framework for Non-Expert Users Wataru Kawabe et.al. 2411.18908 null
2024-11-27 Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment Soumya Suvra Ghosal et.al. 2411.18688 null
2024-11-27 Cross-modal Information Flow in Multimodal Large Language Models Zhi Zhang et.al. 2411.18620 link
2024-11-27 GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation Pengfei Zhou et.al. 2411.18499 null
2024-11-27 ChatRex: Taming Multimodal LLM for Joint Perception and Understanding Qing Jiang et.al. 2411.18363 link
2024-11-27 Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models Jingming Liu et.al. 2411.18142 null
2024-11-26 NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? Jiaxuan Li et.al. 2411.17794 null
2024-11-26 Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Yuhang Han et.al. 2411.17686 null
2024-11-26 What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics Jordan J. Bird et.al. 2411.17593 null
2024-11-26 Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey Jiayi Kuang et.al. 2411.17558 null
2024-11-26 InsightEdit: Towards Better Instruction Following for Image Editing Yingjing Xu et.al. 2411.17323 null
2024-11-26 in-Car Biometrics (iCarB) Datasets for Driver Recognition: Face, Fingerprint, and Voice Vedrana Krivokuca Hahn et.al. 2411.17305 null
2024-11-26 A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs Lehan He et.al. 2411.17265 null
2024-11-26 HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator Fan Yang et.al. 2411.17261 null
2024-11-26 Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment Zheng Chen et.al. 2411.17237 link
2024-11-26 DOGE: Towards Versatile Visual Document Grounding and Referring Yinan Zhou et.al. 2411.17125 null
2024-11-26 Multimodal Alignment and Fusion: A Survey Songtao Li et.al. 2411.17040 null
2024-11-25 TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation Linqing Zhong et.al. 2411.16425 null
2024-11-25 Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models Hao Yi et.al. 2411.16201 null
2024-11-25 Interpreting Object-level Foundation Models via Visual Precision Search Ruoyu Chen et.al. 2411.16198 link
2024-11-25 ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration Haozhan Shen et.al. 2411.16044 link
2024-11-23 Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark Rong-Cheng Tu et.al. 2411.15488 link
2024-11-23 Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy Te Yang et.al. 2411.15453 null
2024-11-22 MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Chaoyou Fu et.al. 2411.15296 link
2024-11-22 VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Daeun Lee et.al. 2411.15115 null
2024-11-22 mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA Tao Zhang et.al. 2411.15041 null
2024-11-22 De-biased Multimodal Electrocardiogram Analysis Haitao Li et.al. 2411.14795 null
2024-11-22 Evaluating and Advancing Multimodal Large Language Models in Ability Lens Feng Chen et.al. 2411.14725 null
2024-11-22 FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data Binqian Xu et.al. 2411.14717 link
2024-11-22 Any-to-3D Generation via Hybrid Diffusion Supervision Yijun Fan et.al. 2411.14715 null
2024-11-21 LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval Weiheng Lu et.al. 2411.14505 null
2024-11-21 Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Yuhao Dong et.al. 2411.14432 link
2024-11-21 Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding Yiming Zhang et.al. 2411.14401 null
2024-11-21 Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance Haozhe Zhao et.al. 2411.14279 null
2024-11-21 Separable Mixture of Low-Rank Adaptation for Continual Visual Instruction Tuning Ziqi Wang et.al. 2411.13949 null
2024-11-21 Panther: Illuminate the Sight of Multimodal LLMs with Instruction-Guided Visual Prompts Honglin Li et.al. 2411.13909 null
2024-11-20 Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs Rui Cao et.al. 2411.13697 link
2024-11-20 AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations Gaurav Verma et.al. 2411.13451 null
2024-11-20 DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving Xianda Guo et.al. 2411.13112 link
2024-11-20 Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving Hao Zhou et.al. 2411.13076 null
2024-11-19 Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models Zhen Zeng et.al. 2411.12790 null
2024-11-19 Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting Haoyu Zhao et.al. 2411.12789 null
2024-11-19 Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning Pengkun Jiao et.al. 2411.12787 null
2024-11-19 Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model Yiming Shi et.al. 2411.12783 null
2024-11-18 Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning Xudong Yan et.al. 2411.12584 null
2024-11-19 CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model Dongyoung Go et.al. 2411.12287 null
2024-11-18 AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning Kun Xiang et.al. 2411.11930 link
2024-11-18 Dissecting Misalignment of Multimodal Large Language Models via Influence Function Lijie Hu et.al. 2411.11667 null
2024-11-18 MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models Harshita Sharma et.al. 2411.11362 null
2024-11-18 CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset Zhiming Wang et.al. 2411.11360 link
2024-11-18 MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis Yingjie Zhou et.al. 2411.11235 null
2024-11-19 Multilingual Large Language Models: A Systematic Survey Shaolin Zhu et.al. 2411.11072 link
2024-11-19 VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? Yunlong Tang et.al. 2411.10979 null
2024-11-17 Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering Zeping Yu et.al. 2411.10950 link
2024-11-17 Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning Wenke Huang et.al. 2411.10928 null
2024-11-16 BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization Md. Nazmus Sadat Samin et.al. 2411.10879 link
2024-11-16 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Jinqiang Long et.al. 2411.10669 link
2024-11-15 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Weiyun Wang et.al. 2411.10442 null
2024-11-15 Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization Yuhan Fu et.al. 2411.10436 null
2024-11-15 Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting Ziqi Xie et.al. 2411.10309 link
2024-11-15 Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning Jingru Yang et.al. 2411.10252 null
2024-11-15 CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation Xiaofei Zhu et.al. 2411.10060 null
2024-11-15 VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos Weihao Zhong et.al. 2411.10032 null
2024-11-15 Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs Xiaofeng Zhang et.al. 2411.09968 null
2024-11-14 MagicQuill: An Intelligent Interactive Image Editing System Zichen Liu et.al. 2411.09703 link
2024-11-14 Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models Wei Wang et.al. 2411.09691 null
2024-11-14 Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models Chutian Meng et.al. 2411.09449 null
2024-11-14 Spider: Any-to-Many Multimodal LLM Jinxiang Lai et.al. 2411.09439 link
2024-11-14 LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation Zhenshi Li et.al. 2411.09301 link
2024-11-13 Multimodal Instruction Tuning with Hybrid State Space Models Jianing Zhou et.al. 2411.08840 null
2024-11-13 Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks? Quan Zhang et.al. 2411.08466 null
2024-11-13 Material Property Prediction with Element Attribute Knowledge Graphs and Multimodal Representation Learning Chao Huang et.al. 2411.08414 null
2024-11-12 SimBase: A Simple Baseline for Temporal Video Grounding Peijun Bao et.al. 2411.07945 null
2024-11-12 Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding Zirui Shao et.al. 2411.07722 null
2024-11-12 Zer0-Jack: A Memory-efficient Gradient-based Jailbreaking Method for Black-box Multi-modal Large Language Models Tiejin Chen et.al. 2411.07559 null
2024-11-11 Multimodal Fusion Balancing Through Game-Theoretic Regularization Konstantinos Kontras et.al. 2411.07335 null
2024-11-11 CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models Junho Kim et.al. 2411.06869 null
2024-11-11 Learning from Feedback: Semantic Enhancement for Object SLAM Using Foundation Models Jungseok Hong et.al. 2411.06752 null
2024-11-10 KMM: Key Frame Mask Mamba for Extended Motion Generation Zeyu Zhang et.al. 2411.06481 link
2024-11-09 A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks Chia Xin Liang et.al. 2411.06284 null
2024-11-09 An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models Fatemeh Shiri et.al. 2411.06048 link
2024-11-08 Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation Dong Shu et.al. 2411.05316 link
2024-11-08 Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding Jaeyoo Park et.al. 2411.05254 null
2024-11-07 On Erroneous Agreements of CLIP Image Embeddings Siting Li et.al. 2411.05195 null
2024-11-07 Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models Pete Janowczyk et.al. 2411.05056 null
2024-11-07 CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM Jingwei Xu et.al. 2411.04954 null
2024-11-07 GUI Agents with Foundation Models: A Comprehensive Survey Shuai Wang et.al. 2411.04890 null
2024-11-07 Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs Chengxin Hu et.al. 2411.04708 null
2024-11-06 Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education Anand Syamkumar et.al. 2411.04308 null
2024-11-06 Analyzing Multimodal Features of Spontaneous Voice Assistant Commands for Mild Cognitive Impairment Detection Nana Lin et.al. 2411.04158 null
2024-11-06 Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination Dingjie Song et.al. 2411.03823 link
2024-11-06 StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding Junming Lin et.al. 2411.03628 link
2024-11-05 MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning Ziliang Gan et.al. 2411.03314 null
2024-11-05 Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? Jingyu Xiao et.al. 2411.03292 link
2024-11-06 Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent Yangning Li et.al. 2411.02937 link
2024-11-05 Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning Mingcheng Li et.al. 2411.02793 null
2024-11-05 Multimodal Commonsense Knowledge Distillation for Visual Question Answering Shuo Yang et.al. 2411.02722 null
2024-11-05 Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios Yunkai Dang et.al. 2411.02708 null
2024-11-04 MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs Sheng-Chieh Lin et.al. 2411.02571 null
2024-11-04 DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution Yang Yue et.al. 2411.02359 link
2024-11-04 KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension Jie Yang et.al. 2411.01846 null
2024-11-04 ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model Yiming Sun et.al. 2411.01756 null
2024-11-03 UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models Sejoon Oh et.al. 2411.01703 null
2024-11-03 Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation Seongsu Ha et.al. 2411.01494 null
2024-11-02 Can Multimodal Large Language Model Think Analogically? Diandian Guo et.al. 2411.01307 null
2024-11-02 Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems Mikołaj Małkiński et.al. 2411.01173 null
2024-11-01 Exploring Multi-Modality Dynamics: Insights and Challenges in Multimodal Fusion for Biomedical Tasks Laura Wenderoth et.al. 2411.00725 null
2024-11-01 Unified Generative and Discriminative Training for Multi-modal Large Language Models Wei Chow et.al. 2411.00304 null
2024-10-31 JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment Joao Sousa et.al. 2410.23988 null
2024-10-31 Leveraging LLMs for MT in Crisis Scenarios: a blueprint for low-resource languages Séamus Lankford et.al. 2410.23890 null
2024-10-31 Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding Jinlong He et.al. 2410.23822 null
2024-10-30 PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures Tianxiang Wu et.al. 2410.23089 null
2024-10-29 Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring Matthew McKinney et.al. 2410.22558 null
2024-10-29 Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench Zheyuan Liu et.al. 2410.22108 link
2024-10-28 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Hanyu Wang et.al. 2410.21264 null
2024-10-28 Face-MLLM: A Large Face Perception Model Haomiao Sun et.al. 2410.20717 null
2024-10-27 Deep Learning-Driven Microstructure Characterization and Vickers Hardness Prediction of Mg-Gd Alloys Lu Wang et.al. 2410.20402 null
2024-10-26 LLMs Can Evolve Continually on Modality for X-Modal Reasoning Jiazuo Yu et.al. 2410.20178 link
2024-10-25 Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements Silvia Terragni et.al. 2410.19974 null
2024-10-25 Improving Multimodal Large Language Models Using Continual Learning Shikhar Srivastava et.al. 2410.19925 null
2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Xiangyu Zeng et.al. 2410.19702 null
2024-10-28 BIFRÖST: 3D-Aware Image compositing with Language Instructions Lingxiao Li et.al. 2410.19079 link
2024-10-24 Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Zhangheng Li et.al. 2410.18967 null
2024-10-24 SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models Zonghao Ying et.al. 2410.18927 null
2024-10-24 Distill Visual Chart Reasoning Ability from LLMs to MLLMs Wei He et.al. 2410.18798 link
2024-10-24 DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Yuang Ai et.al. 2410.18666 link
2024-10-25 Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks Lehan Wang et.al. 2410.18387 null
2024-10-23 TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts Yuxuan Xie et.al. 2410.18071 null
2024-10-23 CLEAR: Character Unlearning in Textual and Visual Modalities Alexey Dontsov et.al. 2410.18057 null
2024-10-23 Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation Wenfang Yao et.al. 2410.17918 link
2024-10-23 ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning Zhiwei Hao et.al. 2410.17779 link
2024-10-23 YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions Xiguang Li et.al. 2410.17734 null
2024-10-23 Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact Junhua Liu et.al. 2410.17532 null
2024-10-22 LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Xiaoqian Shen et.al. 2410.17434 link
2024-10-22 Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models Zhijie Tan et.al. 2410.16983 null
2024-10-22 IPL: Leveraging Multimodal Large Language Models for Intelligent Product Listing Kang Chen et.al. 2410.16977 null
2024-10-22 Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance Zhangwei Gao et.al. 2410.16261 link
2024-10-21 LLaVA-KD: A Framework of Distilling Multimodal Large Language Models Yuxuan Cai et.al. 2410.16236 link
2024-10-21 Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining Han Huang et.al. 2410.16166 link
2024-10-21 Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Xiang Yue et.al. 2410.16153 null
2024-10-21 Mitigating Object Hallucination via Concentric Causal Attention Yun Xing et.al. 2410.15926 link
2024-10-21 AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection Xiaoman Xu et.al. 2410.15591 link
2024-10-20 Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation Jiayu Xiong et.al. 2410.15475 null
2024-10-20 Modality-Fair Preference Optimization for Trustworthy MLLM Alignment Songtao Jiang et.al. 2410.15334 null
2024-10-19 SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation Jingxuan Chen et.al. 2410.15164 link
2024-10-19 LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound Xuechen Guo et.al. 2410.15074 null
2024-10-18 MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps Xiongtao Zhou et.al. 2410.14668 link
2024-10-18 MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems Zifeng Zhu et.al. 2410.14179 null
2024-10-18 RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training Muhe Ding et.al. 2410.14154 null
2024-10-17 PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Rongyao Fang et.al. 2410.13861 link
2024-10-17 $\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models Yaxin Luo et.al. 2410.13859 null
2024-10-17 Can MLLMs Understand the Deep Implication Behind Chinese Images? Chenhao Zhang et.al. 2410.13854 link
2024-10-18 Harnessing Webpage UIs for Text-Rich Visual Understanding Junpeng Liu et.al. 2410.13824 null
2024-10-17 MobA: A Two-Level Agent System for Efficient Mobile Task Automation Zichen Zhu et.al. 2410.13757 link
2024-10-17 Exploring the Design Space of Visual Context Representation in Video MLLMs Yifan Du et.al. 2410.13694 link
2024-10-17 Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant Haoran Hao et.al. 2410.13360 link
2024-10-16 MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs Yunqiu Xu et.al. 2410.12332 null
2024-10-16 Understanding the Role of LLMs in Multimodal Evaluation Benchmarks Botian Jiang et.al. 2410.12329 link
2024-10-16 Multimodal Fusion with Relational Learning for Molecular Property Prediction Zhengyang Zhou et.al. 2410.12128 null
2024-10-15 MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding Yue Cao et.al. 2410.11829 link
2024-10-15 MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation Chenxi Wang et.al. 2410.11779 link
2024-10-15 SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding Ying Chen et.al. 2410.11761 null
2024-10-15 Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions Yuhan Fu et.al. 2410.11701 null
2024-10-15 VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Sijie Cheng et.al. 2410.11623 null
2024-10-15 MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark Bin Shan et.al. 2410.11538 link
2024-10-15 Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs Sihang Zhao et.al. 2410.11437 link
2024-10-15 Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models Zhongye Liu et.al. 2410.11242 link
2024-10-15 MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation Xianping Ma et.al. 2410.11160 link
2024-10-14 Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes Tim Broedermann et.al. 2410.10791 link
2024-10-14 MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages Shubhi Bansal et.al. 2410.10407 link
2024-10-14 Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation Shun Qian et.al. 2410.10319 null
2024-10-14 ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization Jiawei Li et.al. 2410.10238 null
2024-10-14 Tracing Human Stress from Physiological Signals using UWB Radar Jia Xu et.al. 2410.10155 null
2024-10-15 LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models Han Qiu et.al. 2410.09962 link
2024-10-13 Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records Shuai Jiang et.al. 2410.09880 null
2024-10-13 Text4Seg: Reimagining Image Segmentation as Text Generation Mengcheng Lan et.al. 2410.09855 link
2024-10-12 Skipping Computations in Multimodal LLMs Mustafa Shukor et.al. 2410.09454 link
2024-10-12 MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection Xi Jiang et.al. 2410.09453 link
2024-10-11 Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion Shiao Wang et.al. 2410.08879 null
2024-10-11 Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking Wei Zhang et.al. 2410.08616 null
2024-10-11 Baichuan-Omni Technical Report Yadong Li et.al. 2410.08565 link
2024-10-11 SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models Haotian Xia et.al. 2410.08474 link
2024-10-10 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo et.al. 2410.08202 null
2024-10-10 Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models Qingni Wang et.al. 2410.08174 null
2024-10-10 Agent S: An Open Agentic Framework that Uses Computers Like a Human Saaket Agashe et.al. 2410.08164 link
2024-10-10 Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs Xiaoyuan Liu et.al. 2410.08145 link
2024-10-09 Retrieval Replace Reduction: An effective visual token reduction method via semantic match Yingen Liu et.al. 2410.07278 null
2024-10-09 Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis Bohan Zeng et.al. 2410.07155 link
2024-10-09 Personalized Visual Instruction Tuning Renjie Pi et.al. 2410.07113 link
2024-10-10 Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology Xiangyu Wang et.al. 2410.07087 null
2024-10-09 HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding Keliang Li et.al. 2410.06777 null
2024-10-09 To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models Junyan Lin et.al. 2410.06765 link
2024-10-09 ING-VP: MLLMs cannot Play Easy Vision-based Games Yet Haoran Zhang et.al. 2410.06555 link
2024-10-09 Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection Aravinda Reddy PN et.al. 2410.06543 null
2024-10-08 Multimodal Situational Safety Kaiwen Zhou et.al. 2410.06172 null
2024-10-08 Quadratic Is Not What You Need For Multimodal Large Language Models Phu Pham et.al. 2410.06169 link
2024-10-08 $\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection Yize Chen et.al. 2410.06126 null
2024-10-07 Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents Boyu Gou et.al. 2410.05243 link
2024-10-07 Organizing Unstructured Image Collections using Natural Language Mingxuan Liu et.al. 2410.05217 null
2024-10-07 Multimodal Fusion Strategies for Mapping Biophysical Landscape Features Lucia Gordon et.al. 2410.04833 link
2024-10-07 MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models Kaichen Huang et.al. 2410.04819 link
2024-10-07 Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality Guanyu Zhou et.al. 2410.04780 link
2024-10-07 MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs) Shih-Han Chou et.al. 2410.04778 null
2024-10-07 Diffusion Models in 3D Vision: A Survey Zhen Wang et.al. 2410.04738 null
2024-10-07 ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models Ziyue Wang et.al. 2410.04659 link
2024-10-08 FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering Siqiao Xue et.al. 2410.04526 null
2024-10-06 MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration Lai Wei et.al. 2410.04521 link
2024-10-04 Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models Xin Zou et.al. 2410.03577 link
2024-10-04 Gradient-based Jailbreak Images for Multimodal Fusion Models Javier Rando et.al. 2410.03489 link
2024-10-04 MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Junpeng Yue et.al. 2410.03450 null
2024-10-04 SELU: Self-Learning Embodied MLLMs in Unknown Environments Boyu Li et.al. 2410.03303 null
2024-10-03 Contrastive Localized Language-Image Pre-Training Hong-You Chen et.al. 2410.02746 null
2024-10-03 LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model Duy M. H. Nguyen et.al. 2410.02615 null
2024-10-03 Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment Kai Liu et.al. 2410.02505 link
2024-10-04 SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack Zihao Pan et.al. 2410.02240 link
2024-10-04 From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities Wanpeng Zhang et.al. 2410.02155 null
2024-10-02 Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations Minoh Jeong et.al. 2410.02086 null
2024-10-02 EMMA: Efficient Visual Alignment in Multi-Modal LLMs Sara Ghazanfari et.al. 2410.02080 link
2024-10-03 Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks Mengzhao Jia et.al. 2410.01744 link
2024-10-02 Visual Perception in Text Strings Qi Jia et.al. 2410.01733 link
2024-10-02 The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs Hong Li et.al. 2410.01417 null
2024-10-02 SHAP-CAT: A interpretable multi-modal framework enhancing WSI classification via virtual staining and shapley-value-based multimodal fusion Jun Wang et.al. 2410.01408 null
2024-10-01 FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks Peiran Wu et.al. 2410.01089 null
2024-10-01 Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data Ivica Dimitrovski et.al. 2410.00469 null
2024-10-01 Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations Miyu Goko et.al. 2410.00436 null
2024-10-01 MERIT: Multimodal Wearable Vital Sign Waveform Monitoring Yongyang Tang et.al. 2410.00392 null
2024-09-30 Multimodal Alignment of Histopathological Images Using Cell Segmentation and Point Set Matching for Integrative Cancer Analysis Jun Jiang et.al. 2410.00152 null
2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang et.al. 2409.20566 null
2024-09-30 UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models Qiaojun Yu et.al. 2409.20551 null
2024-09-30 Melody Is All You Need For Music Generation Shaopeng Wei et.al. 2409.20196 link
2024-09-30 VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection Huilin Deng et.al. 2409.20146 null
2024-09-30 Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval Yabing Wang et.al. 2409.19961 link
2024-09-30 WildFusion: Multimodal Implicit 3D Reconstructions in the Wild Yanbaihui Liu et.al. 2409.19904 null
2024-10-01 Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration Kaihang Pan et.al. 2409.19872 link
2024-09-29 Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs Fengzhu Zeng et.al. 2409.19656 null
2024-09-28 A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping Houjian Yu et.al. 2409.19457 null
2024-09-28 Visual Question Decomposition on Multimodal Large Language Models Haowei Zhang et.al. 2409.19339 null
2024-09-27 Enhancing Explainability in Multimodal Large Language Models Using Ontological Context Jihen Amara et.al. 2409.18753 null
2024-09-27 3DPX: Single Panoramic X-ray Analysis Guided by 3D Oral Structure Reconstruction Xiaoshuang Li et.al. 2409.18701 null
2024-09-27 Image-guided topic modeling for interpretable privacy classification Alina Elena Baia et.al. 2409.18674 link
2024-09-27 When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation Yuli Zhou et.al. 2409.18653 link
2024-09-27 Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation Hongzhe Huang et.al. 2409.18541 link
2024-09-27 FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation Yuki Imajuku et.al. 2409.18459 null
2024-09-26 Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing Huthaifa I. Ashqar et.al. 2409.18286 null
2024-09-26 EAGLE: Egocentric AGgregated Language-video Engine Jing Bi et.al. 2409.17523 null
2024-09-26 Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE Xun Zhu et.al. 2409.17508 link
2024-09-25 Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents Junting Lu et.al. 2409.17140 null
2024-09-25 Pruning Multilingual Large Language Models for Multilingual Inference Hwichan Kim et.al. 2409.16911 link
2024-09-25 MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features Katharina Anderer et.al. 2409.16765 link
2024-09-26 EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models Jiacheng Zhang et.al. 2409.16723 null
2024-09-25 EventHallusion: Diagnosing Event Hallucinations in Video LLMs Jiacheng Zhang et.al. 2409.16597 link
2024-09-24 DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection Jiaxin Ye et.al. 2409.15936 link
2024-09-25 M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning Taowen Wang et.al. 2409.15657 link
2024-09-23 MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models Mohammad Shahab Sepehri et.al. 2409.15477 link
2024-09-24 OmniBench: Towards The Future of Universal Omni-Language Models Yizhi Li et.al. 2409.15272 link
2024-09-23 Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation Manu Gaur et.al. 2409.15125 null
2024-09-23 Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond Hong Chen et.al. 2409.14993 null
2024-09-23 FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension Junzhuo Liu et.al. 2409.14750 link
2024-09-24 Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding Yan Shu et.al. 2409.14485 link
2024-09-21 Enhancing Advanced Visual Reasoning Ability of Large Language Models Zhiyuan Li et.al. 2409.13980 null
2024-09-20 MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension Ting Liu et.al. 2409.13609 link
2024-09-18 Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference Najmeh Forouzandehmehr et.al. 2409.12150 null
2024-09-18 Fusion in Context: A Multimodal Approach to Affective State Recognition Youssef Mohamed et.al. 2409.11906 null
2024-09-18 Bridging Design and Development with Automated Declarative UI Code Generation Ting Zhou et.al. 2409.11667 null
2024-09-17 Towards Time Series Reasoning with LLMs Winnie Chow et.al. 2409.11376 null
2024-09-17 CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration Jiahui Gao et.al. 2409.11365 null
2024-09-17 Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection Yuta Kaneko et.al. 2409.11223 null
2024-09-16 Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving Yunsheng Ma et.al. 2409.11182 null
2024-09-17 Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs Dingjie Song et.al. 2409.10994 link
2024-09-17 Multi-Floor Zero-Shot Object Navigation Policy Lingfeng Zhang et.al. 2409.10906 null
2024-09-16 XLM for Autonomous Driving Systems: A Comprehensive Review Sonda Fourati et.al. 2409.10484 null
2024-09-16 Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models Weihao Ye et.al. 2409.10197 link
2024-09-15 Explore the Hallucination on Low-level Perception for MLLMs Yinan Sun et.al. 2409.09748 null
2024-09-15 AutoJournaling: A Context-Aware Journaling System Leveraging MLLMs on Smartphone Screenshots Tianyi Zhang et.al. 2409.09696 null
2024-09-14 Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM Yuanjie Lyu et.al. 2409.09362 null
2024-09-14 ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models Yahan Tu et.al. 2409.09318 null
2024-09-13 Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation Cheng Charles Ma et.al. 2409.09135 null
2024-09-11 Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU Zhenyu Ning et.al. 2409.09086 null
2024-09-13 VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation Hanning Chen et.al. 2409.08464 link
2024-09-11 Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering Weixi Weng et.al. 2409.07331 null
2024-09-11 Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout Anbin QI et.al. 2409.07078 null
2024-09-10 LIME-M: Less Is More for Evaluation of MLLMs Kang Zhu et.al. 2409.06851 link
2024-09-10 VoiceWukong: Benchmarking Deepfake Voice Detection Ziwei Yan et.al. 2409.06348 null
2024-09-10 MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding Surbhi Madan et.al. 2409.06224 null
2024-09-09 MLLM-FL: Multimodal Large Language Model Assisted Federated Learning on Heterogeneous and Long-tailed Data Jianyi Zhang et.al. 2409.06067 null
2024-09-09 Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models Hongyang Lei et.al. 2409.05929 null
2024-09-09 Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Haritheja Etukuru et.al. 2409.05865 link
2024-09-15 MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Run Luo et.al. 2409.05840 null
2024-09-11 A Survey of Multimodal Composite Editing and Retrieval Suyan Li et.al. 2409.05405 link
2024-09-07 Training-free ZS-CIR via Weighted Modality Fusion and Similarity Ren-Di Wu et.al. 2409.04918 link
2024-09-06 Influence of Early through Late Fusion on Pancreas Segmentation from Imperfectly Registered Multimodal MRI Lucas W. Remedios et.al. 2409.04563 link
2024-09-10 Question-Answering Dense Video Events Hangyu Qin et.al. 2409.04388 null
2024-09-09 Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver Zeren Zhang et.al. 2409.04214 link
2024-09-06 UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity Yicheng Fu et.al. 2409.04081 null
2024-09-09 mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Anwen Hu et.al. 2409.03420 link
2024-09-05 ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding Zhengzhuo Xu et.al. 2409.03277 null
2024-09-05 OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving Julong Wei et.al. 2409.03272 null
2024-09-05 TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations Mingze Gao et.al. 2409.03206 null
2024-09-04 No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning Manu Gaur et.al. 2409.03025 null
2024-09-06 HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts Xinyu Liu et.al. 2409.02919 link
2024-09-04 LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Xidong Wang et.al. 2409.02889 link
2024-09-04 A Medical Multimodal Large Language Model for Pediatric Pneumonia Weiwei Tian et.al. 2409.02608 null
2024-09-02 Understanding Multimodal Hallucination with Parameter-Free Representation Alignment Yueqian Wang et.al. 2409.01151 link
2024-09-01 Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model Fuqiang Niu et.al. 2409.00597 null
2024-08-31 StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models Yuxiang Guo et.al. 2409.00304 null
2024-08-30 EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs Zhen Fan et.al. 2408.17168 null
2024-08-30 AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding Yonghui Wang et.al. 2408.16986 link
2024-08-29 Law of Vision Representation in MLLMs Shijia Yang et.al. 2408.16357 link
2024-08-28 Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Min Shi et.al. 2408.15998 link
2024-08-28 LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Fangxun Shu et.al. 2408.15881 link
2024-08-28 A Survey on Evaluation of Multimodal Large Language Models Jiaxing Huang et.al. 2408.15769 null
2024-08-28 MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms Tianyi Shang et.al. 2408.15740 link
2024-08-28 TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning Jinglun Li et.al. 2408.15566 link
2024-08-28 Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models Wenbin Wang et.al. 2408.15556 link
2024-08-27 Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation Jian Hu et.al. 2408.15205 link
2024-08-27 GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer Based Fusion Network for Multimodal Sentiment Analysis Yijie Jin et.al. 2408.14809 link
2024-08-26 Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos Qirui Chen et.al. 2408.14469 null
2024-08-26 Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos Jiajun Fei et.al. 2408.14023 link
2024-08-26 FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation Daixun Li et.al. 2408.13980 null
2024-08-25 ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models Yeji Park et.al. 2408.13906 link
2024-08-23 MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Yi-Fan Zhang et.al. 2408.13257 null
2024-08-23 ParGo: Bridging Vision-Language with Partial and Global Views An-Lan Wang et.al. 2408.12928 link
2024-08-23 IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities Bin Wang et.al. 2408.12902 link
2024-08-23 Semantic Alignment for Multimodal Large Language Models Tao Wu et.al. 2408.12867 null
2024-08-22 Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models Jean Park et.al. 2408.12763 null
2024-08-23 Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Khang T. Doan et.al. 2408.12480 null
2024-08-26 MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model Chaoya Jiang et.al. 2408.12321 null
2024-08-21 CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion Yunlong Tang et.al. 2408.12009 null
2024-08-21 SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Yuanyang Yin et.al. 2408.11813 null
2024-08-21 EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model Feipeng Ma et.al. 2408.11795 null
2024-08-21 EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning Bohao Xing et.al. 2408.11424 link
2024-08-21 EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning Zhihao Li et.al. 2408.11397 null
2024-08-22 Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model Mengying Ge et.al. 2408.11286 null
2024-08-20 FLAME: Learning to Navigate with Multimodal LLM in Urban Environments Yunzhe Xu et.al. 2408.11051 link
2024-08-19 CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving Hidehisa Arai et.al. 2408.10845 null
2024-08-20 PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection Tri Cao et.al. 2408.10738 null
2024-08-21 SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition Zebang Cheng et.al. 2408.10500 link
2024-08-19 FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant Zhengchao Huang et.al. 2408.10072 link
2024-08-19 Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting Yun-Da Tsai et.al. 2408.09798 null
2024-08-20 Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation Yuyang Ye et.al. 2408.09698 link
2024-08-18 Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models Kening Zheng et.al. 2408.09429 link
2024-08-17 BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger Yulin Chen et.al. 2408.09093 null
2024-08-16 ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis Yubao Zhao et.al. 2408.08849 link
2024-08-16 Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM Wanting Yang et.al. 2408.08765 null
2024-08-16 Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm Hongcheng Liu et.al. 2408.08693 link
2024-08-16 Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning Wenwen Zhuang et.al. 2408.08640 link
2024-08-16 A Survey on Benchmarks of Multimodal Large Language Models Jian Li et.al. 2408.08632 link
2024-08-16 CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving Shihan Peng et.al. 2408.08500 null
2024-08-15 When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding Pingping Zhang et.al. 2408.08093 null
2024-08-14 End-to-end Semantic-centric Video-based Multimodal Affective Computing Ronghao Lin et.al. 2408.07694 null
2024-08-15 Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities Enneng Yang et.al. 2408.07666 link
2024-08-15 MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark Minxuan Zhou et.al. 2408.07543 link
2024-08-14 LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image Fan Yang et.al. 2408.07422 null
2024-08-14 Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion Peiyuan Chen et.al. 2408.07303 null
2024-08-13 CROME: Cross-Modal Adapters for Efficient Multimodal LLM Sayna Ebrahimi et.al. 2408.06610 null
2024-08-13 Social Debiasing for Fair Multi-modal LLMs Harry Cheng et.al. 2408.06569 null
2024-08-12 Deep Multimodal Collaborative Learning for Polyp Re-Identification Suncheng Xiang et.al. 2408.05914 link
2024-08-11 Advancing Re-Ranking with Multimodal Fusion and Target-Oriented Auxiliary Tasks in E-Commerce Search Enqiang Xu et.al. 2408.05751 null
2024-08-11 A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot Haoxuan Ding et.al. 2408.05729 link
2024-08-13 SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning Yuze Zhao et.al. 2408.05517 link
2024-08-10 How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model Yuxin Zhu et.al. 2408.05411 null
2024-08-09 Revisiting Multi-Modal LLM Evaluation Jian Lu et.al. 2408.05334 null
2024-08-09 Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing Jiarui Xie et.al. 2408.05307 null
2024-08-09 VITA: Towards Open-Source Interactive Omni Multimodal LLM Chaoyou Fu et.al. 2408.05211 link
2024-08-09 Instruction Tuning-free Visual Token Complement for Multimodal LLMs Dongsheng Wang et.al. 2408.05019 null
2024-08-13 mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Jiabo Ye et.al. 2408.04840 link
2024-08-09 Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models Qirui Jiao et.al. 2408.04594 link
2024-08-08 MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models Haoxuan Li et.al. 2408.04388 link
2024-08-08 MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning Rex Liu et.al. 2408.04243 null
2024-08-08 M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction Hui Luo et.al. 2408.04170 null
2024-08-07 Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks Zaijing Li et.al. 2408.03615 link
2024-08-07 Unlocking the Non-Native Language Context Limitation: Native Language Prompting Facilitates Knowledge Elicitation Baixuan Li et.al. 2408.03544 link
2024-08-07 Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation Weiqi Feng et.al. 2408.03505 null
2024-08-06 Targeted Visual Prompting for Medical Visual Question Answering Sergio Tascon-Morales et.al. 2408.03043 link
2024-08-05 Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions Xinbei Ma et.al. 2408.02544 link
2024-08-05 UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model Zhaowei Li et.al. 2408.02503 link
2024-08-06 Infusing Environmental Captions for Long-Form Video Language Grounding Hyogun Lee et.al. 2408.02336 null
2024-08-05 REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models Agneet Chatterjee et.al. 2408.02231 null
2024-08-04 Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping Mingxin Huang et.al. 2408.02034 link
2024-08-03 MiniCPM-V: A GPT-4V Level MLLM on Your Phone Yuan Yao et.al. 2408.01800 link
2024-08-03 MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition Ruoyu Wang et.al. 2408.01766 null
2024-08-02 Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs Yilun Hua et.al. 2408.01417 null
2024-08-05 Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs Peng Ding et.al. 2408.01355 link
2024-08-02 A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks Jiaqi Wang et.al. 2408.01319 null
2024-08-02 Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models Kohou Wang et.al. 2408.01003 null
2024-08-02 Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation Zijian Yi et.al. 2408.00970 link
2024-08-01 Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Benlin Liu et.al. 2408.00754 null
2024-08-01 Are Bigger Encoders Always Better in Vision Large Models? Bozhou Li et.al. 2408.00620 null
2024-08-01 Multimodal Fusion and Coherence Modeling for Video Topic Segmentation Hai Yu et.al. 2408.00365 null
2024-08-01 Towards Flexible Evaluation for Generative Visual Question Answering Huishan Ji et.al. 2408.00300 link
2024-08-01 Multi-Modal Parameter-Efficient Fine-tuning via Graph Neural Network Bin Cheng et.al. 2408.00290 null
2024-07-31 ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models Mingrui Wu et.al. 2407.21534 link
2024-07-31 MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training Zhanpeng Chen et.al. 2407.21439 link
2024-07-31 Design and Development of Laughter Recognition System Based on Multimodal Fusion and Deep Learning Fuzheng Zhao et.al. 2407.21391 null
2024-07-31 Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM Can Wang et.al. 2407.21333 null
2024-07-30 Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate Zheng Lin et.al. 2407.20505 link
2024-07-29 CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models Junda Wu et.al. 2407.20454 null
2024-07-29 Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning Xingchen Zeng et.al. 2407.20174 link
2024-07-29 Diffusion Feedback Helps CLIP See Better Wenxuan Wang et.al. 2407.20171 link
2024-07-29 ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 Wenjun Huang et.al. 2407.19832 null
2024-07-29 Multimodal Large Language Models for Bioimage Analysis Shanghang Zhang et.al. 2407.19778 null
2024-07-29 Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images Jiaxin Zhang et.al. 2407.19719 null
2024-07-29 Harnessing Large Vision and Language Models in Agriculture: A Review Hongyan Zhu et.al. 2407.19679 null
2024-07-29 ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck Chia-Hao Kao et.al. 2407.19651 null
2024-07-28 ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding Zhen Chen et.al. 2407.19435 link
2024-07-28 LLAVADI: What Matters For Multimodal Large Language Models Distillation Shilin Xu et.al. 2407.19409 null
2024-07-27 Data Processing Techniques for Modern Multimodal Models Yinheng Li et.al. 2407.19180 null
2024-07-26 Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment Yuze Zheng et.al. 2407.18854 null
2024-07-26 Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models Xiang Shi et.al. 2407.18626 link
2024-07-25 Automated Ensemble Multimodal Machine Learning for Healthcare Fergus Imrie et.al. 2407.18227 null
2024-07-26 Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic Fakhraddin Alwajih et.al. 2407.18129 null
2024-07-25 ERIT Lightweight Multimodal Dataset for Elderly Emotion Recognition and Multimodal Fusion Evaluation Rita Frieske et.al. 2407.17772 null
2024-07-24 DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation Qian Feng et.al. 2407.17348 null
2024-07-23 CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs Jihyung Kil et.al. 2407.16837 link
2024-07-23 Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation Tao Meng et.al. 2407.16714 null
2024-07-23 PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects Junyi Li et.al. 2407.16696 link
2024-07-24 MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues Liyun Zhang et.al. 2407.16552 null
2024-07-23 Harmonizing Visual Text Comprehension and Generation Zhen Zhao et.al. 2407.16364 link
2024-07-23 INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model Yiwei Ma et.al. 2407.16198 link
2024-07-23 UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models Liu Qi et.al. 2407.16160 link
2024-07-22 Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight Ziyuan Huang et.al. 2407.15819 null
2024-07-22 GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI Zhaojie Fang et.al. 2407.15719 link
2024-07-22 Addressing Out-of-Distribution Challenges in Image Semantic Communication Systems with Multi-modal Large Language Models Feifan Zhang et.al. 2407.15335 null
2024-07-21 MIBench: Evaluating Multimodal Large Language Models over Multiple Images Haowei Liu et.al. 2407.15272 null
2024-07-23 BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM Hanjun Luo et.al. 2407.15240 link
2024-07-23 DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer Jinfeng Wei et.al. 2407.15130 null
2024-07-21 Navigation Instruction Generation with BEV Perception and Large Language Models Sheng Fan et.al. 2407.15087 link
2024-07-19 On Pre-training of Multimodal Language Models Customized for Chart Understanding Wan-Cyuan Fan et.al. 2407.14506 null
2024-07-19 T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation Kaiyue Sun et.al. 2407.14505 link
2024-07-19 Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding Renshan Zhang et.al. 2407.14439 link
2024-07-19 Not All Attention is Needed: Parameter and Computation Efficient Tuning for Multi-modal Large Language Models via Effective Attention Skipping Qiong Wu et.al. 2407.14093 null
2024-07-18 X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs Sirnam Swetha et.al. 2407.13851 null
2024-07-20 EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension Wei Zhang et.al. 2407.13596 link
2024-07-18 OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird’s-eye-view Vehicle Semantic Segmentation Jian Sun et.al. 2407.13137 null
2024-07-17 MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models Leyang Shen et.al. 2407.12709 link
2024-07-17 E5-V: Universal Embeddings with Multimodal Large Language Models Ting Jiang et.al. 2407.12580 link
2024-07-17 Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning Mustafa Dogan et.al. 2407.12498 null
2024-07-17 ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data Yufan Shen et.al. 2407.12358 link
2024-07-16 UrbanWorld: An Urban World Model for 3D City Generation Yu Shang et.al. 2407.11965 link
2024-07-17 Harnessing Large Language Models for Multimodal Product Bundling Xiaohao Liu et.al. 2407.11712 link
2024-07-15 By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting Hyungjun Yoon et.al. 2407.10385 link
2024-07-13 Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding Ruihuang Li et.al. 2407.09781 null
2024-07-12 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Shraman Pramanick et.al. 2407.09413 link
2024-07-17 Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study Yulong Yang et.al. 2407.09295 null

Prompt

Publish Date Title Authors PDF Code
2025-04-17 IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design Fei Shen et.al. 2504.13176 null
2025-04-17 Personalized Text-to-Image Generation with Auto-Regressive Models Kaiyue Sun et.al. 2504.13162 null
2025-04-17 Science-T2I: Addressing Scientific Illusions in Image Synthesis Jialuo Li et.al. 2504.13129 null
2025-04-17 Object-Driven Narrative in AR: A Scenario-Metaphor Framework with VLM Integration Yusi Sun et.al. 2504.13119 null
2025-04-17 Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems Ivica Kostric et.al. 2504.13095 null
2025-04-17 EventVAD: Training-Free Event-Aware Video Anomaly Detection Yihua Shao et.al. 2504.13092 null
2025-04-17 SkyReels-V2: Infinite-length Film Generative Model Guibin Chen et.al. 2504.13074 null
2025-04-17 Early Accessibility: Automating Alt-Text Generation for UI Icons During App Development Sabrina Haque et.al. 2504.13069 null
2025-04-17 Accuracy is Not Agreement: Expert-Aligned Evaluation of Crash Narrative Classification Models Sudesh Ramesh Bhagat et.al. 2504.13068 null
2025-04-17 Aspect-Based Summarization with Self-Aspect Retrieval Enhanced Generation Yichao Feng et.al. 2504.13054 null
2025-04-16 Towards Learning to Complete Anything in Lidar Ayca Takmaz et.al. 2504.12264 null
2025-04-16 Cobra: Efficient Line Art COlorization with BRoAder References Junhao Zhuang et.al. 2504.12240 null
2025-04-16 Exploring GRBs and supernovae connection: does a superluminous hypernova population exist? Achille Fiore et.al. 2504.12224 null
2025-04-16 Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification Jaime E. Cuellar et.al. 2504.12180 null
2025-04-16 FocusedAD: Character-centric Movie Audio Description Xiaojun Ye et.al. 2504.12157 null
2025-04-16 ARCeR: an Agentic RAG for the Automated Definition of Cyber Ranges Matteo Lupinacci et.al. 2504.12143 null
2025-04-16 Multilingual Contextualization of Large Language Models for Document-Level Machine Translation Miguel Moura Ramos et.al. 2504.12140 null
2025-04-16 Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models - Laura Fieback et.al. 2504.12137 null
2025-04-16 Clarifying Ambiguities: on the Role of Ambiguity Types in Prompting Methods for Clarification Generation Anfu Tang et.al. 2504.12113 null
2025-04-16 A Diffusion-Based Framework for Terrain-Aware Remote Sensing Image Reconstruction Zhenyu Yu et.al. 2504.12112 null
2025-04-15 Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception Ziqi Pang et.al. 2504.11457 null
2025-04-15 SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL Junke Wang et.al. 2504.11455 null
2025-04-15 RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models Juan Diego Rodriguez et.al. 2504.11381 null
2025-04-15 DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks Yupei Liu et.al. 2504.11358 null
2025-04-16 Seedream 3.0 Technical Report Yu Gao et.al. 2504.11346 null
2025-04-15 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Wei Xiong et.al. 2504.11343 null
2025-04-15 A Mathematical Framework of Semantic Communication based on Category Theory Shuheng Hua et.al. 2504.11334 null
2025-04-15 Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis Hao Liu et.al. 2504.11331 null
2025-04-15 Decorrelation in Complex Wave Scattering Qihang Zhang et.al. 2504.11330 null
2025-04-15 Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints Ruicheng Ao et.al. 2504.11320 null
2025-04-14 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Tao Zhang et.al. 2504.10465 null
2025-04-14 Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling? Olha Shaposhnyk et.al. 2504.10397 null
2025-04-14 Brain-Machine Interfaces & Information Retrieval Challenges and Opportunities Yashar Moshfeghi et.al. 2504.10371 null
2025-04-14 SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning Yiting Wang et.al. 2504.10369 null
2025-04-14 DICE: A Framework for Dimensional and Contextual Evaluation of Language Models Aryan Shrivastava et.al. 2504.10359 null
2025-04-14 Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis Yifan Yang et.al. 2504.10352 null
2025-04-15 Efficient Prompt Tuning for Hierarchical Ingredient Recognition Yinxuan Gui et.al. 2504.10322 null
2025-04-14 SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model Zongcan Ding et.al. 2504.10320 null
2025-04-14 Analysis of Attention in Video Diffusion Transformers Yuxin Wen et.al. 2504.10317 null
2025-04-14 ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting Huiqi Wu et.al. 2504.10316 null
2025-04-11 Towards an Understanding of Context Utilization in Code Intelligence Yanlin Wang et.al. 2504.08734 null
2025-04-11 Generating Fine Details of Entity Interactions Xinyi Gu et.al. 2504.08714 null
2025-04-11 Fast-Slow-Thinking: Complex Task Solving with Large Language Models Yiliu Sun et.al. 2504.08690 null
2025-04-11 Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis Alexandre Bazin et.al. 2504.08666 null
2025-04-11 Quality evaluation of Tabby coding assistant using real source code snippets Marta Borek et.al. 2504.08650 null
2025-04-11 Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization Jialu Li et.al. 2504.08641 null
2025-04-11 A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English Julian Bäumler et.al. 2504.08609 null
2025-04-11 Lexical Bundle Frequency as a Construct-Relevant Candidate Feature in Automated Scoring of L2 Academic Writing Burak Senel et.al. 2504.08537 null
2025-04-11 Task Memory Engine (TME): Enhancing State Awareness for Multi-Step LLM Agent Tasks Ye Ye et.al. 2504.08525 null
2025-04-11 Scholar Inbox: Personalized Paper Recommendations for Scientists Markus Flicke et.al. 2504.08385 null
2025-04-10 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Zhongyang Li et.al. 2504.07964 link
2025-04-10 Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge Riccardo Cantini et.al. 2504.07887 link
2025-04-10 Towards Sustainable Creativity Support: An Exploratory Study on Prompt Based Image Generation Daniel Hove Paludan et.al. 2504.07879 null
2025-04-10 SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos Joshua Li et.al. 2504.07867 null
2025-04-10 2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization Mengyang Li et.al. 2504.07856 null
2025-04-10 Understanding Learner-LLM Chatbot Interactions and the Impact of Prompting Guidelines Cansu Koyuturk et.al. 2504.07840 null
2025-04-10 HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss Yi Huang et.al. 2504.07827 null
2025-04-10 What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks Pavel Chizhov et.al. 2504.07825 link
2025-04-10 A System for Comprehensive Assessment of RAG Frameworks Mattia Rengo et.al. 2504.07803 link
2025-04-10 FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness Chandan Kumar Sah et.al. 2504.07801 null
2025-04-09 A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility Andreas Hochlehnert et.al. 2504.07086 null
2025-04-09 Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection Ruoyu Chen et.al. 2504.07060 link
2025-04-09 TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling Liang-Hsuan Tseng et.al. 2504.07053 link
2025-04-09 Towards LLMs Robustness to Changes in Prompt Format Styles Lilian Ngweta et.al. 2504.06969 null
2025-04-09 RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts Natalia Loukachevitch et.al. 2504.06947 null
2025-04-09 Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration Kostas Hatalis et.al. 2504.06943 null
2025-04-09 FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks Dekun Dai et.al. 2504.06939 null
2025-04-09 MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs Jiawei Mao et.al. 2504.06897 null
2025-04-09 MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking Chang Nie et.al. 2504.06863 null
2025-04-09 EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation Diljeet Jagpal et.al. 2504.06861 null
2025-04-09 Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Gleb Rodionov et.al. 2504.06261 null
2025-04-08 Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning Muhammad Baqer Mollah et.al. 2504.06173 link
2025-04-08 A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning Akash Kumar et.al. 2504.06153 null
2025-04-08 Multi-Sense Embeddings for Language Models and Knowledge Distillation Qitong Wang et.al. 2504.06036 null
2025-04-08 Information-Theoretic Reward Decomposition for Generalizable RLHF Liyuan Mao et.al. 2504.06020 null
2025-04-08 Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? Roman Kochnev et.al. 2504.06006 null
2025-04-08 econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians Can Zhang et.al. 2504.06003 null
2025-04-08 NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge Firoj Alam et.al. 2504.05995 null
2025-04-08 An Empirical Study of GPT-4o Image Generation Capabilities Sixiang Chen et.al. 2504.05979 null
2025-04-08 AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting Xiaolin Fan et.al. 2504.05966 null
2025-04-07 CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models Kavana Venkatesh et.al. 2504.05306 null
2025-04-07 URECA: Unique Region Caption Anything Sangbeom Lim et.al. 2504.05305 null
2025-04-08 NoveltyBench: Evaluating Language Models for Humanlike Diversity Yiming Zhang et.al. 2504.05228 null
2025-04-08 Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG Hengran Zhang et.al. 2504.05220 null
2025-04-07 MSA-UNet3+: Multi-Scale Attention UNet3+ with New Supervised Prototypical Contrastive Loss for Coronary DSA Image Segmentation Rayan Merghani Ahmed et.al. 2504.05184 null
2025-04-07 BRIDGES: Bridging Graph Modality and Large Language Models within EDA Tasks Wei Li et.al. 2504.05180 null
2025-04-07 Attention-Based Multi-Scale Temporal Fusion Network for Uncertain-Mode Fault Diagnosis in Multimode Processes Guangqiang Li et.al. 2504.05172 null
2025-04-07 Pr$εε$mpt: Sanitizing Sensitive Prompts for LLMs Amrita Roy Chowdhury et.al. 2504.05147 null
2025-04-07 DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration Jiamei Xiong et.al. 2504.05135 null
2025-04-07 ABCDWaveNet: Advancing Robust Road Ponding Detection in Fog through Dynamic Frequency-Spatial Synergy Ronghui Zhang et.al. 2504.05112 null
2025-04-04 Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions Ting-Hsuan Liao et.al. 2504.03639 null
2025-04-04 VISTA-OCR: Towards generative and interactive end to end OCR models Laziz Hamdi et.al. 2504.03621 null
2025-04-04 PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector Kaidong Li et.al. 2504.03563 null
2025-04-04 Diverse In-Context Example Selection After Decomposing Programs and Aligned Utterances Improves Semantic Parsing Mayank Kothyari et.al. 2504.03541 link
2025-04-04 State estimation for gas purity monitoring and control in water electrolysis systems Lucas Cammann et.al. 2504.03522 null
2025-04-04 ATM-Net: Anatomy-Aware Text-Guided Multi-Modal Fusion for Fine-Grained Lumbar Spine Segmentation Sheng Lian et.al. 2504.03476 null
2025-04-04 Locations of Characters in Narratives: Andersen and Persuasion Datasets Batuhan Ozyurt et.al. 2504.03434 link
2025-04-04 MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance Chen Hu et.al. 2504.03379 null
2025-04-04 Point Cloud-based Grasping for Soft Hand Exoskeleton Chen Hu et.al. 2504.03369 null
2025-04-04 Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification Francesca Ronchini et.al. 2504.03329 null
2025-04-03 A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models Gaurav Verma et.al. 2504.02793 null
2025-04-03 A Framework for Robust Cognitive Evaluation of LLMs Karin de Langis et.al. 2504.02789 null
2025-04-03 From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks Joshua Holstein et.al. 2504.02780 null
2025-04-03 BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs Alexander Leszczynski et.al. 2504.02779 link
2025-04-03 Robot-Led Vision Language Model Wellbeing Assessment of Children Nida Itrat Abbasi et.al. 2504.02765 null
2025-04-04 RBT4DNN: Requirements-based Testing of Neural Networks Nusrat Jahan Mozumder et.al. 2504.02737 link
2025-04-03 Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study Aryan Agrawal et.al. 2504.02733 link
2025-04-03 LLM for Complex Reasoning Task: An Exploratory Study in Fermi Problems Zishuo Liu et.al. 2504.02671 null
2025-04-03 Adaptive Frequency Enhancement Network for Remote Sensing Image Semantic Segmentation Feng Gao et.al. 2504.02647 link
2025-04-03 Prompt Optimization with Logged Bandit Data Haruka Kiyohara et.al. 2504.02646 null
2025-04-03 Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation Baban Gain et.al. 2504.01919 null
2025-04-02 Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework Andrey Sidorenko et.al. 2504.01908 link
2025-04-02 Is Temporal Prompting All We Need For Limited Labeled Action Recognition? Shreyank N Gowda et.al. 2504.01890 null
2025-04-02 Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks Ali Al-Kaswan et.al. 2504.01850 null
2025-04-02 Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images Nusrat Munia et.al. 2504.01838 link
2025-04-02 Implicit Bias Injection Attacks against Text-to-Image Diffusion Models Huayang Huang et.al. 2504.01819 link
2025-04-02 UniViTAR: Unified Vision Transformer with Native Resolution Limeng Qiao et.al. 2504.01792 null
2025-04-02 Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation Mingrui Ye et.al. 2504.01764 link
2025-04-02 Stable Structure Learning with HC-Stable and Tabu-Stable Algorithms Neville K. Kitson et.al. 2504.01740 link
2025-04-02 TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication Petr Vanc et.al. 2504.01708 null
2025-03-31 Consistent Subject Generation via Contrastive Instantiated Concepts Lee Hsin-Ying et.al. 2503.24387 null
2025-03-31 Effectively Controlling Reasoning Models through Thinking Intervention Tong Wu et.al. 2503.24370 null
2025-03-31 ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion Rana Muhammad Shahroz Khan et.al. 2503.24354 null
2025-03-31 Contextual Preference Collaborative Measure Framework Based on Belief System Hang Yu et.al. 2503.24328 null
2025-03-31 A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG Arshia Kermani et.al. 2503.24307 null
2025-03-31 Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning Jiacheng Lin et.al. 2503.24289 link
2025-03-31 EP240414a: Off-axis View of a Jet-Cocoon System from an Expanded Progenitor Star Jian-He Zheng et.al. 2503.24266 null
2025-04-02 Text2Tracks: Prompt-based Music Recommendation via Generative Retrieval Enrico Palumbo et.al. 2503.24193 null
2025-03-31 Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms Shuoming Zhang et.al. 2503.24191 null
2025-03-31 LLM4FS: Leveraging Large Language Models for Feature Selection and How to Improve It Jianhao Li et.al. 2503.24157 null
2025-03-28 ActionStudio: A Lightweight Framework for Data and Training of Action Models Jianguo Zhang et.al. 2503.22673 link
2025-03-28 Unicorn: Text-Only Data Synthesis for Vision Language Model Training Xiaomin Yu et.al. 2503.22655 link
2025-03-28 Shadow and gravitational lensing produced by the nonlinear accretion of a scalar field onto a black hole J. C. Acevedo-Muñoz et.al. 2503.22624 null
2025-03-28 Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users Antonia Karamolegkou et.al. 2503.22610 null
2025-03-28 Towards a Quantum Information Theory of Hadronization: Dihadron Fragmentation and Neutral Polarization in Heavy Baryons Rebecca von Kuk et.al. 2503.22607 null
2025-03-28 Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish Kevin Cohen et.al. 2503.22585 link
2025-03-28 Pseudovarieties of semigroups Jorge Almeida et.al. 2503.22546 null
2025-03-28 Automated UX Insights from User Research Videos by Integrating Facial Emotion and Text Sentiment Simran Kaur Ghatoray et.al. 2503.22510 null
2025-03-28 Generative Reliability-Based Design Optimization Using In-Context Learning Capabilities of Large Language Models Zhonglin Jiang et.al. 2503.22401 null
2025-03-28 Fighting Fire with Fire: Channel-Independent RF Fingerprinting via the Ratio of Linear to Logarithmic Differential Spectrum Tianshu Chen et.al. 2503.22378 null
2025-03-27 Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model Abdelrahman Shaker et.al. 2503.21782 link
2025-03-27 VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Chi-Pin Huang et.al. 2503.21781 null
2025-03-27 Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation Reza Qorbani et.al. 2503.21780 link
2025-03-27 Test-Time Visual In-Context Tuning Jiahao Xie et.al. 2503.21777 link
2025-03-27 MemInsight: Autonomous Memory Augmentation for LLM Agents Rana Salama et.al. 2503.21760 null
2025-03-27 Lumina-Image 2.0: A Unified and Efficient Image Generative Framework Qi Qin et.al. 2503.21758 link
2025-03-27 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Dian Zheng et.al. 2503.21755 link
2025-03-27 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis Shitian Zhao et.al. 2503.21749 null
2025-03-27 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models Yuhan Zhang et.al. 2503.21745 null
2025-03-27 GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics Arsham Gholamzadeh Khoee et.al. 2503.21735 null
2025-03-26 Understanding R1-Zero-Like Training: A Critical Perspective Zichen Liu et.al. 2503.20783 link
2025-03-26 Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising Yan-Bo Lin et.al. 2503.20782 null
2025-03-26 Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields Shijie Zhou et.al. 2503.20776 null
2025-03-27 Beyond Believability: Accurate Human Behavior Simulation with Fine-Tuned LLMs Yuxuan Lu et.al. 2503.20749 null
2025-03-26 Vision as LoRA Han Wang et.al. 2503.20680 link
2025-03-26 BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation Yuyang Peng et.al. 2503.20672 null
2025-03-26 AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction Sadaf Khademi et.al. 2503.20662 null
2025-03-26 AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports Xiangwen Zhang et.al. 2503.20654 null
2025-03-26 Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Han Wu et.al. 2503.20641 link
2025-03-26 IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting Hao Fu et.al. 2503.20612 link
2025-03-25 Scaling Vision Pre-Training to 4K Resolution Baifeng Shi et.al. 2503.19903 null
2025-03-25 Scaling Down Text Encoders of Text-to-Image Diffusion Models Lifu Wang et.al. 2503.19897 link
2025-03-25 A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design Jie Tian et.al. 2503.19889 null
2025-03-25 CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation Nengbo Wang et.al. 2503.19878 null
2025-03-25 Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators Seungone Kim et.al. 2503.19877 null
2025-03-25 An Overview of Low-Rank Structures in the Training and Adaptation of Large Models Laura Balzano et.al. 2503.19859 null
2025-03-25 Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking Xiaoyu Tian et.al. 2503.19855 null
2025-03-25 Towards Online Multi-Modal Social Interaction Understanding Xinpeng Li et.al. 2503.19851 link
2025-03-25 A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950 Zhao Fang et.al. 2503.19844 null
2025-03-25 Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy Athiya Deviyani et.al. 2503.19828 null
2025-03-24 Target-Aware Video Diffusion Models Taeksoo Kim et.al. 2503.18950 null
2025-03-24 Equivariant Image Modeling Ruixiao Dong et.al. 2503.18948 link
2025-03-24 Video-T1: Test-Time Scaling for Video Generation Fangfu Liu et.al. 2503.18942 null
2025-03-25 Coincidence measurement of two-photon double ionization of argon through an autoionizing resonance Sebastian Hell et.al. 2503.18913 null
2025-03-24 AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration Zhexuan Wang et.al. 2503.18891 link
2025-03-24 Efficient and Accurate Scene Text Recognition with Cascaded-Transformers Savas Ozkan et.al. 2503.18883 null
2025-03-24 Efficient Self-Supervised Adaptation for Medical Image Analysis Moein Sorkhei et.al. 2503.18873 link
2025-03-24 Reasoning to Learn from Latent Thoughts Yangjun Ruan et.al. 2503.18866 null
2025-03-25 MC-LLaVA: Multi-Concept Personalized Vision-Language Model Ruichuan An et.al. 2503.18854 link
2025-03-24 3DSwapping: Texture Swapping For 3D Object From Single Reference Image Xiao Cao et.al. 2503.18853 null
2025-03-21 Core Components of Emotional Impulsivity: A Mouse-Cursor Tracking Study Anton Leontyev et.al. 2503.17328 null
2025-03-21 FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models Mingyang Song et.al. 2503.17287 link
2025-03-21 Revisiting End To End Sparse Autoencoder Training – A Short Finetune is All You Need Adam Karvonen et.al. 2503.17272 link
2025-03-21 Slide-Level Prompt Learning with Vision Language Models for Few-Shot Multiple Instance Learning in Histopathology Devavrat Tomar et.al. 2503.17238 null
2025-03-21 LLMs Love Python: A Study of LLMs’ Bias for Programming Languages and Libraries Lukas Twist et.al. 2503.17181 link
2025-03-21 ExplainitAI: When do we trust artificial intelligence? The influence of content and explainability in a cross-cultural comparison Sora Kang et.al. 2503.17158 null
2025-03-21 Modifying Large Language Model Post-Training for Diverse Creative Writing John Joon Young Chung et.al. 2503.17126 null
2025-03-21 Multi-modal Multi-platform Person Re-Identification: Benchmark and Method Ruiyang Ha et.al. 2503.17096 null
2025-03-21 Collapse of Rotating White Dwarfs and Multimessenger Signals Takami Kuroda et.al. 2503.17082 null
2025-03-21 Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? Jeremy Barnes et.al. 2503.17039 null
2025-03-20 DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding Keyan Chen et.al. 2503.16426 link
2025-03-20 Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Yang Sui et.al. 2503.16419 link
2025-03-20 Sparse Nonparametric Contextual Bandits Hamish Flynn et.al. 2503.16382 null
2025-03-20 Enhancing Software Quality Assurance with an Adaptive Differential Evolution based Quantum Variational Autoencoder-Transformer Model Seshu Babu Barma et.al. 2503.16335 null
2025-03-20 LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates Ying Shen et.al. 2503.16334 null
2025-03-20 Issue2Test: Generating Reproducing Test Cases from Issue Reports Noor Nashid et.al. 2503.16320 null
2025-03-20 PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification Sharon Peled et.al. 2503.16284 link
2025-03-20 Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data Zijian Li et.al. 2503.16260 null
2025-03-20 M2N2V2: Multi-Modal Unsupervised and Training-free Interactive Segmentation Markus Karmann et.al. 2503.16254 null
2025-03-20 AI Agents in Cryptoland: Practical Attacks and No Silver Bullet Atharv Singh Patlan et.al. 2503.16248 null
2025-03-20 Dynamic Bi-Elman Attention Networks (DBEAN): Dual-Directional Context-Aware Representation Learning for Enhanced Text Classification ZhengLin Lai et.al. 2503.15469 link
2025-03-19 Visual Position Prompt for MLLM based Visual Grounding Wei Tang et.al. 2503.15426 link
2025-03-19 Probing the topology of the space of tokens with structured prompts Michael Robinson et.al. 2503.15421 null
2025-03-19 A time-to-event three-outcome design for randomized phase II cancer trials Minghua Shan et.al. 2503.15418 null
2025-03-19 TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification Junnan Zhu et.al. 2503.15289 null
2025-03-19 TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models Teng-Fang Hsiao et.al. 2503.15283 null
2025-03-19 Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning? Roberto Araya et.al. 2503.15268 null
2025-03-19 Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study Jomar Thomas Almonte et.al. 2503.15248 null
2025-03-19 CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification Wenlong Yu et.al. 2503.15234 link
2025-03-19 Context-Aware Vision Language Foundation Models for Ocular Disease Screening in Retinal Images Lucie Berger et.al. 2503.15212 null
2025-03-18 Aligning Multimodal LLM with Human Preference: A Survey Tao Yu et.al. 2503.14504 null
2025-03-18 The Power of Context: How Multimodality Improves Image Super-Resolution Kangfu Mei et.al. 2503.14503 null
2025-03-18 Tracking Meets Large Multimodal Models for Driving Scenario Understanding Ayesha Ishaq et.al. 2503.14498 link
2025-03-18 Gricean Norms as a Basis for Effective Collaboration Fardin Saad et.al. 2503.14484 link
2025-03-18 ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing Yulin Pan et.al. 2503.14482 null
2025-03-18 LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers Nikhil Abhyankar et.al. 2503.14434 link
2025-03-18 MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation Hongyu Zhang et.al. 2503.14428 null
2025-03-18 Large Language Models for Virtual Human Gesture Selection Parisa Ghanad Torshizi et.al. 2503.14408 null
2025-03-18 Impossible Videos Zechen Bai et.al. 2503.14378 null
2025-03-18 RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment Chao Wang et.al. 2503.14358 null
2025-03-17 Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance Noah Y. Siegel et.al. 2503.13445 null
2025-03-17 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Ye Liu et.al. 2503.13444 link
2025-03-17 DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models Haoyang Li et.al. 2503.13443 link
2025-03-18 MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling Yingyue Li et.al. 2503.13440 link
2025-03-18 DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective Dengyun Peng et.al. 2503.13413 link
2025-03-17 MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research James Burgess et.al. 2503.13399 link
2025-03-17 Aligned Probing: Relating Toxic Behavior and Model Internals Andreas Waldis et.al. 2503.13390 null
2025-03-17 Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning Hai-Long Sun et.al. 2503.13360 null
2025-03-17 LEAVS: An LLM-based Labeler for Abdominal CT Supervision Ricardo Bigolin Lanfredi et.al. 2503.13330 link
2025-03-17 Edit Transfer: Learning Image Editing via Vision In-Context Relations Lan Chen et.al. 2503.13327 null
2025-03-14 RNN-DAS: A New Deep Learning Approach for Detection and Real-Time Monitoring of Volcano-Tectonic Events Using Distributed Acoustic Sensing Javier Fernandez-Carabantes et.al. 2503.11622 null
2025-03-14 Synthesizing Access Control Policies using Large Language Models Adarsh Vatsa et.al. 2503.11573 null
2025-03-14 Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models Hao Cheng et.al. 2503.11519 null
2025-03-14 Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks Diego Gosmar et.al. 2503.11517 link
2025-03-14 T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation Seyed Mohammad Hadi Hosseini et.al. 2503.11481 null
2025-03-14 Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models Xu Liu et.al. 2503.11411 null
2025-03-14 Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages Jiyeong Kim et.al. 2503.11384 null
2025-03-14 Modeling Subjectivity in Cognitive Appraisal with Language Models Yuxiang Zhou et.al. 2503.11381 null
2025-03-14 Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model Moritz A. Zanger et.al. 2503.11339 null
2025-03-14 AI-Assisted Object Condensation Clustering for Calorimeter Shower Reconstruction at CLAS12 Gregory Matousek et.al. 2503.11277 null
2025-03-13 GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Rongyao Fang et.al. 2503.10639 link
2025-03-14 Distilling Diversity and Control in Diffusion Models Rohit Gandikota et.al. 2503.10637 null
2025-03-13 V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes Yanming Zhang et.al. 2503.10634 null
2025-03-13 Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search Andy Zhou et.al. 2503.10619 null
2025-03-13 Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models Andy Zhou et.al. 2503.10617 null
2025-03-13 ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer Bolin Chen et.al. 2503.10614 null
2025-03-13 Unlock the Power of Unlabeled Data in Language Driving Model Chaoqun Wang et.al. 2503.10586 null
2025-03-13 ASIDE: Architectural Separation of Instructions and Data in Language Models Egor Zverev et.al. 2503.10566 null
2025-03-13 MASQUE: A Text-Guided Diffusion-Based Framework for Localized and Customized Adversarial Makeup Youngjin Kwon et.al. 2503.10549 null
2025-03-13 KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation Zixian Liu et.al. 2503.10546 null
2025-03-12 MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System Jihao Zhao et.al. 2503.09600 link
2025-03-12 Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot Andrew Crossman et.al. 2503.09586 null
2025-03-12 Evolution of the Three Spectral Components in the Prompt Emission of GRB 240825A Chen-Wei Wang et.al. 2503.09562 null
2025-03-12 Contextuality sans incompatibility in the simplest scenario: Communication supremacy of a qubit Partha Patra et.al. 2503.09534 null
2025-03-12 Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Bowen Jin et.al. 2503.09516 link
2025-03-12 Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection Romain Thoreau et.al. 2503.09493 null
2025-03-12 SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery Jiayuan Huang et.al. 2503.09474 null
2025-03-12 Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models Zhihua Tian et.al. 2503.09446 null
2025-03-12 SuperCarver: Texture-Consistent 3D Geometry Super-Resolution for High-Fidelity Surface Detail Generation Qijian Zhang et.al. 2503.09439 null
2025-03-12 PromptMap: An Alternative Interaction Style for AI-Based Image Generation Krzysztof Adamkiewicz et.al. 2503.09436 link
2025-03-11 Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs Ariba Khan et.al. 2503.08688 link
2025-03-11 OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models Jialv Zou et.al. 2503.08686 link
2025-03-11 Chain-of-Thought Reasoning In The Wild Is Not Always Faithful Iván Arcuschin et.al. 2503.08679 link
2025-03-11 AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence Zekun Li et.al. 2503.08669 null
2025-03-11 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling Subin Kim et.al. 2503.08605 null
2025-03-11 NSF-SciFy: Mining the NSF Awards Database for Scientific Claims Delip Rao et.al. 2503.08600 null
2025-03-11 There’s more to life in reflected light: Simulating the detectability of a range of molecules for high-contrast, high-resolution observations of non-transiting terrestrial exoplanets Miles H. Currie et.al. 2503.08592 null
2025-03-11 BiasEdit: Debiasing Stereotyped Language Models via Model Editing Xin Xu et.al. 2503.08588 link
2025-03-11 Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation Mingkang Zhu et.al. 2503.08575 null
2025-03-11 ComicsPAP: understanding comic strips by picking the correct panel Emanuele Vivoli et.al. 2503.08561 null
2025-03-10 GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval Justus-Jonas Erker et.al. 2503.07519 link
2025-03-10 TokenButler: Token Importance is Predictable Yash Akhauri et.al. 2503.07518 link
2025-03-10 CPAny: Couple With Any Encoder to Refer Multi-Object Tracking Weize Li et.al. 2503.07516 null
2025-03-10 Language Models Fail to Introspect About Their Knowledge of Language Siyuan Song et.al. 2503.07513 link
2025-03-10 Plume: Scaffolding Text Composition in Dashboards Maxim Lisnic et.al. 2503.07512 null
2025-03-10 Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts Shiu-hong Kao et.al. 2503.07503 null
2025-03-10 V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation Guiwei Zhang et.al. 2503.07493 link
2025-03-10 Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction Zongzheng Zhang et.al. 2503.07485 link
2025-03-10 YOLOE: Real-Time Seeing Anything Ao Wang et.al. 2503.07465 link
2025-03-10 Anatomy-Aware Conditional Image-Text Retrieval Meng Zheng et.al. 2503.07456 null
2025-03-10 From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics Jaewook Lee et.al. 2503.07429 null
2025-03-10 TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision Shaobin Zhuang et.al. 2503.07416 null
2025-03-10 REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding Yan Tai et.al. 2503.07413 link
2025-03-10 TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models Ruidong Chen et.al. 2503.07389 link
2025-03-10 Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment Xing Xie et.al. 2503.07334 link
2025-03-10 Self-Corrective Task Planning by Inverse Prompting with Large Language Models Jiho Lee et.al. 2503.07317 null
2025-03-10 Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies Luyi Jiang et.al. 2503.07306 null
2025-03-07 Fairness-Aware Low-Rank Adaptation Under Demographic Privacy Constraints Parameswaran Kamalaruban et.al. 2503.05684 null
2025-03-07 Task-oriented Uncertainty Collaborative Learning for Label-Efficient Brain Tumor Segmentation Zhenxuan Zhang et.al. 2503.05682 null
2025-03-07 AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data Zengqun Zhao et.al. 2503.05665 link
2025-03-07 VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Yuxuan Bian et.al. 2503.05639 link
2025-03-07 Nuanced Safety for Generative AI: How Demographics Shape Responsiveness to Severity Pushkar Mishra et.al. 2503.05609 null
2025-03-07 Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models Zheng Li et.al. 2503.05595 link
2025-03-07 Evaluating open-source Large Language Models for automated fact-checking Nicolo’ Fontana et.al. 2503.05565 null
2025-03-07 S4M: Segment Anything with 4 Extreme Points Adrien Meyer et.al. 2503.05534 null
2025-03-07 State-of-the-Art Stroke Lesion Segmentation at 1/1000th of Parameters Alex Fedorov et.al. 2503.05531 null
2025-03-07 Cognitive Bias Detection Using Advanced Prompt Engineering Frederic Lemieux et.al. 2503.05516 null
2025-03-07 Shifting Long-Context LLMs Research from Input to Output Yuhao Wu et.al. 2503.04723 null
2025-03-06 Enough Coin Flips Can Make LLMs Act Bayesian Ritwik Gupta et.al. 2503.04722 null
2025-03-06 Scaling Rich Style-Prompted Text-to-Speech Datasets Anuj Diwan et.al. 2503.04713 link
2025-03-06 L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning Pranjal Aggarwal et.al. 2503.04697 null
2025-03-06 Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation Aishik Konwer et.al. 2503.04639 null
2025-03-06 SynGraph: A Dynamic Graph-LLM Synthesis Framework for Sparse Streaming User Sentiment Modeling Xin Zhang et.al. 2503.04619 null
2025-03-06 Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation Armel Zebaze et.al. 2503.04554 null
2025-03-06 Generalized Interpolating Discrete Diffusion Dimitri von Rütte et.al. 2503.04482 null
2025-03-06 ToolFuzz – Automated Agent Tool Testing Ivan Milev et.al. 2503.04479 null
2025-03-06 Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges Francisco Eiras et.al. 2503.04474 null
2025-03-05 A Practical Memory Injection Attack against LLM Agents Shen Dong et.al. 2503.03704 null
2025-03-05 A Generative Approach to High Fidelity 3D Reconstruction from Text Data Venkat Kumar R et.al. 2503.03664 null
2025-03-05 LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant Wei Li et.al. 2503.03663 null
2025-03-05 Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset Jessica Hoffmann et.al. 2503.03654 null
2025-03-05 Token-Level Privacy in Large Language Models Re’em Harel et.al. 2503.03652 null
2025-03-05 DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms Xiaojun Bi et.al. 2503.03644 link
2025-03-05 Enhancing the Accuracy and Comprehensibility in Architectural Tactics Detection via Small Model-Augmented Prompt Engineering Lingli Cao et.al. 2503.03609 link
2025-03-05 Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Kristian Kuznetsov et.al. 2503.03601 null
2025-03-05 Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs Haoran Fan et.al. 2503.03594 link
2025-03-05 Digital Twin-Enabled Blockage-Aware Dynamic mmWave Multi-Hop V2X Communication Supat Roongpraiwan et.al. 2503.03590 null
2025-03-04 Prompting Generative AI with Interaction-Augmented Instructions Leixian Shen et.al. 2503.02874 null
2025-03-04 Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework Ziang Zhou et.al. 2503.02863 null
2025-03-04 Evaluation of Architectural Synthesis Using Generative AI Jingfei Huang et.al. 2503.02861 null
2025-03-04 A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness Nathan Drenkow et.al. 2503.02797 null
2025-03-04 Quantitative Resilience Modeling for Autonomous Cyber Defense Xavier Cadet et.al. 2503.02780 null
2025-03-04 Prime Convolutional Model: Breaking the Ground for Theoretical Explainability Francesco Panelli et.al. 2503.02773 null
2025-03-04 From Metaphor to Mechanism: How LLMs Decode Traditional Chinese Medicine Symbolic Language for Modern Clinical Relevance Jiacheng Tang et.al. 2503.02760 null
2025-03-04 BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression Daniil Larionov et.al. 2503.02756 null
2025-03-04 Evaluating Knowledge Generation and Self-Refinement Strategies for LLM-based Column Type Annotation Keti Korini et.al. 2503.02718 link
2025-03-04 FlowPlan: Zero-Shot Task Planning with LLM Flow Engineering for Robotic Instruction Following Zijun Lin et.al. 2503.02698 null
2025-02-28 Persuasion Should be Double-Blind: A Multi-Domain Dialogue Dataset With Faithfulness Based on Causal Theory of Mind Dingyi Zhang et.al. 2502.21297 null
2025-02-28 Contextualizing biological perturbation experiments through language Menghua Wu et.al. 2502.21290 link
2025-02-28 Adaptive Keyframe Sampling for Long Video Understanding Xi Tang et.al. 2502.21271 null
2025-02-28 RuCCoD: Towards Automated ICD Coding in Russian Aleksandr Nesterov et.al. 2502.21263 link
2025-02-28 PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts Boxiao Yu et.al. 2502.21260 null
2025-02-28 Towards Developing Ethical Reasoners: Integrating Probabilistic Reasoning and Decision-Making for Complex AI Systems Nijesh Upreti et.al. 2502.21250 null
2025-02-28 Brickify: Enabling Expressive Design Intent Specification through Direct Manipulation on Design Tokens Xinyu Shi et.al. 2502.21219 null
2025-02-28 Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought Jianhao Huang et.al. 2502.21212 null
2025-02-28 CuPID: Leveraging Masked Single-Lead ECG Modelling for Enhancing the Representations Adrian Atienza et.al. 2502.21127 null
2025-02-28 SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events Yunfan Lu et.al. 2502.21120 null
2025-02-27 Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation Sucheng Ren et.al. 2502.20388 link
2025-02-27 Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis Jeffrey Yang Fan Chiang et.al. 2502.20383 null
2025-02-27 Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers Shalev Lifshitz et.al. 2502.20379 null
2025-02-27 Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization Ryan C. Barron et.al. 2502.20364 null
2025-02-27 Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs Kuan Lok Zhou et.al. 2502.20356 null
2025-02-27 On Adversarial Attacks In Acoustic Drone Localization Tamir Shor et.al. 2502.20325 null
2025-02-27 ACCORD: Application Context-aware Cross-layer Optimization and Resource Design for 5G/NextG Machine-centric Applications Azuka Chiejina et.al. 2502.20320 null
2025-02-27 LangProBe: a Language Programs Benchmark Shangyin Tan et.al. 2502.20315 null
2025-02-27 Mobius: Text to Seamless Looping Video Generation via Latent Shift Xiuli Bi et.al. 2502.20307 link
2025-02-27 Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription Benjamin Gutteridge et.al. 2502.20295 link
2025-02-26 Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models Lucy Xiaoyang Shi et.al. 2502.19417 null
2025-02-26 Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing Akshat Gupta et.al. 2502.19416 null
2025-02-26 The Mighty ToRR: A Benchmark for Table Reasoning and Robustness Shir Ashury-Tahan et.al. 2502.19412 link
2025-02-26 Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices Xinru Wang et.al. 2502.19410 null
2025-02-26 DataMan: Data Manager for Pre-training Large Language Models Ru Peng et.al. 2502.19363 null
2025-02-26 Optimal COVID-19 vaccine prioritization by age depends critically on inter-group contacts and vaccination rates Iker Atienza-Diez et.al. 2502.19292 null
2025-02-26 CritiQ: Mining Data Quality Criteria from Human Preferences Honglin Guo et.al. 2502.19279 null
2025-02-26 Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in Pre-trained Vision-Language Models Jiawei Kong et.al. 2502.19269 null
2025-02-26 Enhancing Gradient-based Discrete Sampling via Parallel Tempering Luxu Liang et.al. 2502.19240 null
2025-02-26 AI-Powered Bayesian Inference Veronika Ročková et.al. 2502.19231 null
2025-02-25 K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs Ziheng Ouyang et.al. 2502.18461 null
2025-02-25 Evaluating the Effectiveness of Small Language Models in Detecting Refactoring Bugs Rohit Gheyi et.al. 2502.18454 null
2025-02-25 MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning Chanwoo Park et.al. 2502.18439 null
2025-02-25 Rank1: Test-Time Compute for Reranking in Information Retrieval Orion Weller et.al. 2502.18418 link
2025-02-25 MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification Zhuoqin Yang et.al. 2502.18416 null
2025-02-25 ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Yifan Pu et.al. 2502.18364 null
2025-02-25 GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music Xinran Liu et.al. 2502.18309 null
2025-02-25 LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation Pengzhi Li et.al. 2502.18302 null
2025-02-25 Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training Botao Ye et.al. 2502.18219 null
2025-02-25 FLARE: A Framework for Stellar Flare Forecasting using Stellar Physical Properties and Historical Records Bingke Zhu et.al. 2502.18218 null
2025-02-24 Stronger Neyman Regret Guarantees for Adaptive Experimental Design Georgy Noarov et.al. 2502.17427 link
2025-02-24 Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs Jan Betley et.al. 2502.17424 link
2025-02-24 Function-Space Learning Rates Edward Milsom et.al. 2502.17405 link
2025-02-24 What is a Good Question? Utility Estimation with LLM-based Simulations Dong-Ho Lee et.al. 2502.17383 null
2025-02-24 A Closer Look at TabPFN v2: Strength, Limitation, and Extension Han-Jia Ye et.al. 2502.17361 null
2025-02-24 Goal-Oriented Middleware Filtering at Transport Layer Based on Value of Updates Polina Kutsevol et.al. 2502.17350 null
2025-02-24 Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents Prafulla Kumar Choubey et.al. 2502.17321 null
2025-02-24 A novel approach to navigate the taxonomic hierarchy to address the Open-World Scenarios in Medicinal Plant Classification Soumen Sinha et.al. 2502.17289 null
2025-02-24 Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing Yi-Kai Zhang et.al. 2502.17282 link
2025-02-24 Extracting domain-specific terms using contextual word embeddings Andraž Repar et.al. 2502.17278 null
2025-02-21 ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval Guanqi Zhan et.al. 2502.15682 null
2025-02-21 AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind Zhining Zhang et.al. 2502.15676 link
2025-02-21 Empowering LLMs with Logical Reasoning: A Comprehensive Survey Fengxiang Cheng et.al. 2502.15652 null
2025-02-21 MemoryPods: Enhancing Asynchronous Communication in Extended Reality Akos Nagy et.al. 2502.15622 null
2025-02-21 Extraction multi-étiquettes de relations en utilisant des couches de Transformer Ngoc Luyen Le et.al. 2502.15619 null
2025-02-21 Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author’s Style Xueran Han et.al. 2502.15616 null
2025-02-21 Ontological models cannot adequately represent state update for sequential measurement of incompatible observables Alisson Tezzin et.al. 2502.15615 null
2025-02-21 Chats-Grid: An Iterative Retrieval Q&A Optimization Scheme Leveraging Large Model and Retrieval Enhancement Generation in smart grid Yunfeng Li et.al. 2502.15583 null
2025-02-21 Context-Aware Doubly-Robust Semi-Supervised Learning Clement Ruah et.al. 2502.15577 null
2025-02-21 A Cautionary Tale About “Neutrally” Informative AI Tools Ahead of the 2025 Federal Elections in Germany Ina Dormuth et.al. 2502.15568 null
2025-02-20 Prompt-to-Leaderboard Evan Frick et.al. 2502.14855 link
2025-02-20 Red-Teaming LLM Multi-Agent Systems via Communication Attacks Pengfei He et.al. 2502.14847 null
2025-02-20 Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Yue Yang et.al. 2502.14846 null
2025-02-20 Dynamic Concepts Personalization from Single Videos Rameen Abdal et.al. 2502.14844 null
2025-02-20 Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps Martin Tutek et.al. 2502.14829 link
2025-02-20 eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables Luis Antonio Gutiérrez Guanilo et.al. 2502.14820 null
2025-02-20 Dynamic Low-Rank Sparse Adaptation for Large Language Models Weizhong Huang et.al. 2502.14816 link
2025-02-20 Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration Pengxiang Ding et.al. 2502.14795 null
2025-02-20 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Tian Xie et.al. 2502.14768 link
2025-02-20 HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States Yilei Jiang et.al. 2502.14744 link
2025-02-19 Where’s the Bug? Attention Probing for Scalable Fault Localization Adam Stein et.al. 2502.13966 null
2025-02-19 RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision Guangzhi Xiong et.al. 2502.13957 null
2025-02-19 Neurosymbolic artificial intelligence via large language models and coherence-driven inference Steve Huntsman et.al. 2502.13953 null
2025-02-19 A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models Hao Huang et.al. 2502.13942 null
2025-02-19 Citation proximus: the role of social and semantic ties in citing behaviour Diego Kozlowski et.al. 2502.13934 null
2025-02-19 Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences? Xiaochen Wang et.al. 2502.13925 null
2025-02-19 Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis Jiahao Gai et.al. 2502.13921 null
2025-02-19 Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health Xingbo Wang et.al. 2502.13920 null
2025-02-19 Judging the Judges: A Collection of LLM-Generated Relevance Judgements Hossein A. Rahmani et.al. 2502.13908 link
2025-02-19 DataSciBench: An LLM Agent Benchmark for Data Science Dan Zhang et.al. 2502.13897 link
2025-02-18 UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models Huawei Lin et.al. 2502.13141 link
2025-02-18 Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions Taedong Yun et.al. 2502.13135 null
2025-02-18 STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models Narun Raman et.al. 2502.13119 null
2025-02-18 Near-Optimal Private Learning in Linear Contextual Bandits Fan Chen et.al. 2502.13115 null
2025-02-18 KAPPA: A Generic Patent Analysis Framework with Keyphrase-Based Portraits Xin Xia et.al. 2502.13076 null
2025-02-18 Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction Nils Constantin Hellwig et.al. 2502.13044 null
2025-02-18 HPSS: Heuristic Prompting Strategy Search for LLM Evaluators Bosi Wen et.al. 2502.13031 null
2025-02-18 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks Markus J. Buehler et.al. 2502.13025 link
2025-02-18 Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation Sha Li et.al. 2502.13019 null
2025-02-18 LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation Junchen Fu et.al. 2502.12945 null
2025-02-17 Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA Patryk Marszałek et.al. 2502.12122 link
2025-02-17 A-MEM: Agentic Memory for LLM Agents Wujiang Xu et.al. 2502.12110 link
2025-02-17 VLM$^2$-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues Jianshu Zhang et.al. 2502.12084 null
2025-02-17 Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation Zhongyi Qiu et.al. 2502.12073 null
2025-02-17 Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions Lan Zhang et.al. 2502.12065 link
2025-02-17 Designing Role Vectors to Improve LLM Inference Behaviour Daniele Potertì et.al. 2502.12055 null
2025-02-17 Robotic CBCT Meets Robotic Ultrasound Feng Li et.al. 2502.12019 null
2025-02-17 Learning Generalizable Prompt for CLIP with Class Similarity Knowledge Sehun Jung et.al. 2502.11969 null
2025-02-17 VAQUUM: Are Vague Quantifiers Grounded in Visual Data? Hugh Mee Wong et.al. 2502.11874 null
2025-02-17 Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu Renhao Pei et.al. 2502.11862 link
2025-02-14 Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction WonJin Yoon et.al. 2502.10388 null
2025-02-14 Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models Jiexin Ding et.al. 2502.10378 null
2025-02-14 Adversarial Mixup Unlearning Zhuoyi Peng et.al. 2502.10288 null
2025-02-14 Are Large Language Models the future crowd workers of Linguistics? Iris Ferrazzo et.al. 2502.10266 null
2025-02-14 VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models Gokul Karthik Kumar et.al. 2502.10250 null
2025-02-14 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Guoqing Ma et.al. 2502.10248 link
2025-02-14 Combinatorial Reinforcement Learning with Preference Feedback Joongkyu Lee et.al. 2502.10158 null
2025-02-14 NeuroXVocal: Detection and Explanation of Alzheimer’s Disease through Non-invasive Analysis of Picture-prompted Speech Nikolaos Ntampakis et.al. 2502.10108 null
2025-02-14 MTLM: an Innovative Language Model Training Paradigm for ASR Qingliang Meng et.al. 2502.10058 null
2025-02-14 ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments Juyeong Hwang et.al. 2502.10046 null
2025-02-13 MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Dongzhi Jiang et.al. 2502.09621 null
2025-02-13 Designing a Conditional Prior Distribution for Flow-Based Generative Models Noam Issachar et.al. 2502.09611 null
2025-02-13 CoT-Valve: Length-Compressible Chain-of-Thought Tuning Xinyin Ma et.al. 2502.09601 link
2025-02-13 GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis Angelos Zavras et.al. 2502.09598 link
2025-02-13 Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs Siyan Zhao et.al. 2502.09597 link
2025-02-13 Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks Qian Wan et.al. 2502.09577 null
2025-02-13 Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering Mark Beliaev et.al. 2502.09573 null
2025-02-13 MDCrow: Automating Molecular Dynamics Workflows with Large Language Models Quintina Campbell et.al. 2502.09565 link
2025-02-13 Improve LLM-based Automatic Essay Scoring with Linguistic Features Zhaoyi Joey Hou et.al. 2502.09497 null
2025-02-13 Objective quantification of mood states using large language models Jakub Onysk et.al. 2502.09487 null
2025-02-12 Rhythmic sharing: A bio-inspired paradigm for zero-shot adaptation and learning in neural networks Hoony Kang et.al. 2502.08644 link
2025-02-12 Ultrasound Image Generation using Latent Diffusion Models Benoit Freiche et.al. 2502.08580 null
2025-02-12 AR Glulam: Accurate Augmented Reality Using Multiple Fiducial Markers for Glulam Fabrication Alexander Htet Kyaw et.al. 2502.08566 null
2025-02-12 QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval Wonduk Seo et.al. 2502.08557 null
2025-02-12 LLMs can implicitly learn from mistakes in-context Lisa Alazraki et.al. 2502.08550 null
2025-02-12 LoRa Fine Synchronization with Two-Pass Time and Frequency Offset Estimation Joachim Tapparel et.al. 2502.08485 null
2025-02-12 Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning Qifan Yu et.al. 2502.08482 null
2025-02-12 Towards Prompt Generalization: Grammar-aware Cross-Prompt Automated Essay Scoring Heejin Do et.al. 2502.08450 null
2025-02-12 A Semantic Parsing Algorithm to Solve Linear Ordering Problems Maha Alkhairy et.al. 2502.08415 null
2025-02-12 IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance Paul Röttger et.al. 2502.08395 null
2025-02-11 Auditing Prompt Caching in Language Model APIs Chenchen Gu et.al. 2502.07776 link
2025-02-11 Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers Italo Santos et.al. 2502.07763 null
2025-02-11 An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating Mohammad Ali Labbaf Khaniki et.al. 2502.07755 null
2025-02-11 WHODUNIT: Evaluation benchmark for culprit detection in mystery stories Kshitij Gupta et.al. 2502.07747 link
2025-02-11 HRP: High-Rank Preheating for Superior LoRA Initialization Yuzhu Chen et.al. 2502.07739 null
2025-02-11 Pluto: Authoring Semantically Aligned Text and Charts for Data-Driven Communication Arjun Srinivasan et.al. 2502.07725 null
2025-02-11 RenderBox: Expressive Performance Rendering with Text Control Huan Zhang et.al. 2502.07711 null
2025-02-11 Methodology for Identifying Social Groups within a Transactional Graph Maxence Morin et.al. 2502.07694 null
2025-02-11 Are Princelings Truly Busted? Evaluating Transaction Discounts in China’s Land Market Julia Manso et.al. 2502.07692 null
2025-02-11 exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem Sajad Ebrahimi et.al. 2502.07683 link
2025-02-10 Rationalization Models for Text-to-SQL Gaetano Rossiello et.al. 2502.06759 null
2025-02-10 SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement Yuqi Lin et.al. 2502.06756 link
2025-02-10 Discovery of skill switching criteria for learning agile quadruped locomotion Wanming Yu et.al. 2502.06676 null
2025-02-10 Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations Rui Chen et.al. 2502.06669 null
2025-02-10 In-Context Learning (and Unlearning) of Length Biases Stephanie Schoch et.al. 2502.06653 null
2025-02-10 Estimation of Food Intake Quantity Using Inertial Signals from Smartwatches Ioannis Levi et.al. 2502.06649 null
2025-02-10 Quasi-stationary distributions for subcritical population models Pablo Groisman et.al. 2502.06638 null
2025-02-10 Unleashing the Potential of Pre-Trained Diffusion Models for Generalizable Person Re-Identification Jiachen Li et.al. 2502.06619 link
2025-02-10 A Large-scale AI-generated Image Inpainting Benchmark Paschalis Giakoumoglou et.al. 2502.06593 null
2025-02-10 Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training Yuchen Zhuang et.al. 2502.06589 null
2025-02-07 FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Shilong Zhang et.al. 2502.05179 link
2025-02-07 MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison Kaijie Zhu et.al. 2502.05174 null
2025-02-07 In-context denoising with one-layer transformers: connections between attention and associative memory retrieval Matthew Smart et.al. 2502.05164 null
2025-02-07 CodeSCM: Causal Analysis for Multi-Modal Code Generation Mukur Gupta et.al. 2502.05150 link
2025-02-07 From Restless to Contextual: A Thresholding Bandit Approach to Improve Finite-horizon Performance Jiamin Xu et.al. 2502.05145 link
2025-02-07 Segment Geometry Optimization and Prototype Studies of a Multi-Coincidence GAGG Solar Neutrino Detector Brooks Hartsock et.al. 2502.05095 null
2025-02-07 Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs Thierry Bossy et.al. 2502.05087 link
2025-02-07 ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework Xiaoyu Deng et.al. 2502.05084 null
2025-02-07 Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures Tushar Pandey et.al. 2502.05078 link
2025-02-07 Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images Aditya Kumar et.al. 2502.05066 link
2025-02-06 ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Alec Helbling et.al. 2502.04320 link
2025-02-06 ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters Kamer Ali Yuksel et.al. 2502.04315 link
2025-02-06 DexterityGen: Foundation Controller for Unprecedented Dexterity Zhao-Heng Yin et.al. 2502.04307 null
2025-02-06 Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization Yuanye Liu et.al. 2502.04295 link
2025-02-06 GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation Weihang Li et.al. 2502.04293 null
2025-02-06 Cognitive AI framework: advances in the simulation of human thought Rommel Salas-Guerra et.al. 2502.04259 null
2025-02-06 MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion Xintong Hao et.al. 2502.04235 null
2025-02-06 Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data Laura Biester et.al. 2502.04218 null
2025-02-06 “Short-length” Adversarial Training Helps LLMs Defend “Long-length” Jailbreak Attacks: Theoretical and Empirical Evidence Shaopeng Fu et.al. 2502.04204 link
2025-02-06 Lexical Substitution is not Synonym Substitution: On the Importance of Producing Contextually Relevant Word Substitutes Juraj Vladika et.al. 2502.04173 null
2025-02-05 Contextuality with Pauli observables in cycle scenarios Raman Choudhary et.al. 2502.03451 null
2025-02-05 A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) Yiye Chen et.al. 2502.03450 null
2025-02-05 Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation Alexey A. Novikov et.al. 2502.03420 null
2025-02-05 Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts Nikta Gohari Sadr et.al. 2502.03418 null
2025-02-05 Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach Abdullahi Isa Ahmed et.al. 2502.03377 null
2025-02-05 Interactive Visualization Recommendation with Hier-SUCB Songwen Hu et.al. 2502.03375 link
2025-02-05 Controllable GUI Exploration Aryan Garg et.al. 2502.03330 null
2025-02-05 ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model Qiguang Chen et.al. 2502.03325 null
2025-02-05 ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models Ying Zhang et.al. 2502.03266 link
2025-02-05 MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent Xinyao Liao et.al. 2502.03207 null
2025-02-04 Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling Xiaowen Qiu et.al. 2502.02590 null
2025-02-04 Contextuality of Quantum Error-Correcting Codes Derek Khu et.al. 2502.02553 null
2025-02-04 OVERTHINKING: Slowdown Attacks on Reasoning LLMs Abhinav Kumar et.al. 2502.02542 link
2025-02-04 Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies Han Zhou et.al. 2502.02533 null
2025-02-04 Catoni Contextual Bandits are Robust to Heavy-tailed Rewards Chenlu Ye et.al. 2502.02486 null
2025-02-04 An extended Wigner’s friend no-go theorem inspired by generalized contextuality Laurens Walleghem et.al. 2502.02461 null
2025-02-04 IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning Quan Zhang et.al. 2502.02454 null
2025-02-04 Personalization Toolkit: Training Free Personalization of Large Vision Language Models Soroush Seifi et.al. 2502.02452 null
2025-02-04 LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models Jiangong Chen et.al. 2502.02441 link
2025-02-04 FewTopNER: Integrating Few-Shot Learning with Topic Modeling and Named Entity Recognition in a Multilingual Framework Ibrahim Bouabdallaoui et.al. 2502.02391 link
2025-01-31 Low-Rank Adapting Models for Sparse Autoencoders Matthew Chen et.al. 2501.19406 link
2025-01-31 Vintix: Action Model via In-Context Reinforcement Learning Andrey Polubarov et.al. 2501.19400 link
2025-01-31 Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models Wenzhi Fang et.al. 2501.19389 link
2025-01-31 The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking Yuchun Miao et.al. 2501.19358 null
2025-01-31 LLM-based Affective Text Generation Quality Based on Different Quantization Values Yarik Menchaca Resendiz et.al. 2501.19317 null
2025-01-31 Reverse Probing: Evaluating Knowledge Transfer via Finetuned Task Embeddings for Coreference Resolution Tatiana Anikina et.al. 2501.19316 null
2025-01-31 Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes Zhiyao Xu et.al. 2501.19298 null
2025-01-31 Analysis of LLMs vs Human Experts in Requirements Engineering Cory Hymel et.al. 2501.19297 null
2025-01-31 Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs James Flemings et.al. 2501.19287 null
2025-01-31 Pheromone-based Learning of Optimal Reasoning Paths Anirudh Chari et.al. 2501.19278 null
2025-01-30 R.I.P.: Better Models by Survival of the Fittest Prompts Ping Yu et.al. 2501.18578 null
2025-01-30 BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos Lehao Lin et.al. 2501.18565 null
2025-01-30 Semantic Web and Creative AI – A Technical Report from ISWS 2023 Raia Abu Ahmad et.al. 2501.18542 null
2025-01-30 Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges Manveer Singh Tamber et.al. 2501.18536 link
2025-01-30 CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction Peter J. Bentley et.al. 2501.18504 null
2025-01-30 HSRMamba: Contextual Spatial-Spectral State Space Model for Single Hyperspectral Super-Resolution Shi Chen et.al. 2501.18500 null
2025-01-30 CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization Yanxia Deng et.al. 2501.18475 null
2025-01-30 Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations Chengxi Zeng et.al. 2501.18474 null
2025-01-30 ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation Minghua He et.al. 2501.18460 null
2025-01-30 o3-mini vs DeepSeek-R1: Which One is Safer? Aitor Arrieta et.al. 2501.18438 link
2025-01-29 Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs’ Domain-Specific Insight Learning? Pouya Pezeshkpour et.al. 2501.17840 link
2025-01-29 U2A: Unified Unimodal Adaptation for Robust and Efficient Multimodal Learning Md Kaykobad Reza et.al. 2501.17823 null
2025-01-29 Leveraging Multimodal LLM for Inspirational User Interface Search Seokhyeon Park et.al. 2501.17799 link
2025-01-29 AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing Peter Pak et.al. 2501.17784 null
2025-01-29 Unraveling Log4Shell: Analyzing the Impact and Response to the Log4j Vulnerabil John Doll et.al. 2501.17760 null
2025-01-29 Early External Safety Testing of OpenAI’s o3-mini: Insights from the Pre-Deployment Evaluation Aitor Arrieta et.al. 2501.17749 null
2025-01-29 VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback Sayeh Gholipour Picha et.al. 2501.17726 null
2025-01-29 RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts Eujeong Choi et.al. 2501.17715 link
2025-01-29 In-Context Meta LoRA Generation Yihua Shao et.al. 2501.17635 null
2025-01-29 Uncertainty Quantification and Decomposition for LLM-based Recommendation Wonbin Kweon et.al. 2501.17630 link
2025-01-28 CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation Nikolai Kalischek et.al. 2501.17162 null
2025-01-28 AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders Zhengxuan Wu et.al. 2501.17148 link
2025-01-28 FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data Deren Lei et.al. 2501.17144 link
2025-01-28 ASTRAL: Automated Safety Testing of Large Language Models Miriam Ugarte et.al. 2501.17132 null
2025-01-28 Scenario Understanding of Traffic Scenes Through Large Visual Language Models Rivera Esteban et.al. 2501.17131 null
2025-01-28 COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models Tobias Materzok et.al. 2501.17104 null
2025-01-28 Text-to-Image Generation for Vocabulary Learning Using the Keyword Method Nuwan T. Attygalle et.al. 2501.17099 null
2025-01-28 Context is Key in Agent Security Lillian Tsai et.al. 2501.17070 null
2025-01-28 Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding Akash Kumar et.al. 2501.17053 null
2025-01-28 Large Language Models for Code Generation: The Practitioners Perspective Zeeshan Rasheed et.al. 2501.16998 link
2025-01-27 RelightVid: Temporal-Consistent Diffusion Model for Video Relighting Ye Fang et.al. 2501.16330 null
2025-01-27 Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology Meiyun Cao et.al. 2501.16309 null
2025-01-27 RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval Long Nguyen et.al. 2501.16303 null
2025-01-27 CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation Xiaochuan Ma et.al. 2501.16246 null
2025-01-27 Language-Based Bayesian Optimization Research Assistant (BORA) Abdoulatif Cissé et.al. 2501.16224 null
2025-01-27 Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models Huayu Li et.al. 2501.16215 link
2025-01-27 Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs Antony Bartlett et.al. 2501.16191 null
2025-01-27 Can summarization approximate simplification? A gold standard comparison Giacomo Magnifico et.al. 2501.16181 null
2025-01-27 BAG: Body-Aligned 3D Wearable Asset Generation Zhongjin Luo et.al. 2501.16177 null
2025-01-27 Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma Richard Willis et.al. 2501.16173 link
2025-01-24 HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation Xin Zhou et.al. 2501.14729 link
2025-01-24 Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? Ipek Baris Schlicht et.al. 2501.14719 null
2025-01-24 Gland Segmentation Using SAM With Cancer Grade as a Prompt Yijie Zhu et.al. 2501.14718 null
2025-01-24 Funzac at CoMeDi Shared Task: Modeling Annotator Disagreement from Word-In-Context Perspectives Olufunke O. Sarumi et.al. 2501.14617 link
2025-01-24 Calibrating Wireless AI via Meta-Learned Context-Dependent Conformal Prediction Seonghoon Yoo et.al. 2501.14566 null
2025-01-24 Next-Generation Wireless: Tracking the Evolutionary Path of 6G Mobile Communication Ekram Hossain et.al. 2501.14552 null
2025-01-24 VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning Benjamin Callewaert et.al. 2501.14540 null
2025-01-24 Automated Assignment Grading with Large Language Models: Insights From a Bioinformatics Course Pavlin G. Poličar et.al. 2501.14499 null
2025-01-24 Evaluating and Improving Graph to Text Generation with Large Language Models Jie He et.al. 2501.14497 link
2025-01-24 Boundary Value Test Input Generation Using Prompt Engineering with LLMs: Fault Detection and Coverage Analysis Xiujing Guo et.al. 2501.14465 null
2025-01-23 GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing Akashah Shabbir et.al. 2501.13925 link
2025-01-23 The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities Chan-Jan Hsu et.al. 2501.13921 link
2025-01-23 IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models Jiayi Lei et.al. 2501.13920 null
2025-01-23 Improving Video Generation with Human Feedback Jie Liu et.al. 2501.13918 null
2025-01-23 Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models Linh Tran et.al. 2501.13904 null
2025-01-23 Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning Zuyao You et.al. 2501.13893 link
2025-01-23 Generating Realistic Forehead-Creases for User Verification via Conditioned Piecewise Polynomial Curves Abhishek Tandon et.al. 2501.13889 link
2025-01-23 A RAG-Based Institutional Assistant Gustavo Kuratomi et.al. 2501.13880 null
2025-01-23 Eye Gaze as a Signal for Conveying User Attention in Contextual AI Systems Ethan Wilson et.al. 2501.13878 null
2025-01-23 Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning Shiyu Zhang et.al. 2501.13859 null
2025-01-22 Constructive characterisations of the must-preorder for asynchrony Giovanni Bernardi et.al. 2501.13002 null
2025-01-22 Can supermassive stars form in protogalaxies due to internal Lyman-Werner feedback? James Sullivan et.al. 2501.12986 null
2025-01-22 LLM4WM: Adapting LLM for Wireless Multi-Tasking Xuanyu Liu et.al. 2501.12983 null
2025-01-22 UniUIR: Considering Underwater Image Restoration as An All-in-One Learner Xu Zhang et.al. 2501.12981 null
2025-01-22 OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models Chongren Sun et.al. 2501.12975 link
2025-01-22 Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference Weizhi Fei et.al. 2501.12959 null
2025-01-22 PreciseCam: Precise Camera Control for Text-to-Image Generation Edurne Bernal-Berdun et.al. 2501.12910 null
2025-01-22 The impact of hyperons on neutron star mergers: gravitational waves, mass ejection and black hole formation Hristijan Kochankovski et.al. 2501.12905 null
2025-01-22 Architectural Fusion Through Contextual Partitioning in Large Language Models: A Novel Approach to Parameterized Knowledge Integration Offa Kingsleigh et.al. 2501.12901 null
2025-01-22 HierPromptLM: A Pure PLM-based Framework for Representation Learning on Heterogeneous Text-rich Networks Qiuyu Zhu et.al. 2501.12857 null
2025-01-21 Towards Affordance-Aware Articulation Synthesis for Rigged Objects Yu-Chu Yu et.al. 2501.12393 null
2025-01-21 Is Long Context All You Need? Leveraging LLM’s Extended Context for NL2SQL Yeounoh Chung et.al. 2501.12372 link
2025-01-21 FuocChuVIP123 at CoMeDi Shared Task: Disagreement Ranking with XLM-Roberta Sentence Embeddings and Deep Neural Regression Phuoc Duong Huy Chu et.al. 2501.12336 null
2025-01-21 Decoherence of Schrödinger cat states in light of wave/particle duality Th. K. Mavrogordatos et.al. 2501.12328 null
2025-01-21 UI-TARS: Pioneering Automated GUI Interaction with Native Agents Yujia Qin et.al. 2501.12326 link
2025-01-21 CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification Cristiano Patrício et.al. 2501.12266 null
2025-01-21 mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework Bingyi Liu et.al. 2501.12263 null
2025-01-21 HAC++: Towards 100X Compression of 3D Gaussian Splatting Yihang Chen et.al. 2501.12255 link
2025-01-21 CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning Yuanheng Fang et.al. 2501.12226 null
2025-01-21 You Can’t Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense Wuyuao Mai et.al. 2501.12210 null
2025-01-17 FaceXBench: Evaluating Multimodal LLMs on Face Understanding Kartik Narayan et.al. 2501.10360 link
2025-01-17 Natural Language Processing of Privacy Policies: A Survey Andrick Adhikari et.al. 2501.10319 null
2025-01-17 PaSa: An LLM Agent for Comprehensive Academic Paper Search Yichen He et.al. 2501.10120 link
2025-01-17 How Do Programming Students Use Generative AI? Christian Rahe et.al. 2501.10091 null
2025-01-17 CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment Yating Liu et.al. 2501.10071 link
2025-01-17 FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization Zhaopeng Gu et.al. 2501.10067 link
2025-01-17 OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning Jinyuan Feng et.al. 2501.10062 null
2025-01-17 MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paul Röttger et.al. 2501.10057 link
2025-01-17 Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions Zhijie Tan et.al. 2501.10011 null
2025-01-17 RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation Yuefan Cao et.al. 2501.09982 null
2025-01-16 Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues Youngjoon Jang et.al. 2501.09754 null
2025-01-16 Coming full circle – A unified framework for Kochen-Specker contextuality Markus Frembs et.al. 2501.09750 null
2025-01-16 Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models Bihui Jin et.al. 2501.09745 null
2025-01-16 Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text Jihed Ncib et.al. 2501.09719 null
2025-01-16 CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education Tianyu Wang et.al. 2501.09709 link
2025-01-16 Practical Continual Forgetting for Pre-trained Vision Models Hongbo Zhao et.al. 2501.09705 link
2025-01-16 Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key Zhihe Yang et.al. 2501.09695 link
2025-01-16 Quantum Contextual Hypergraphs, Operators, Inequalities, and Applications in Higher Dimensions Mladen Pavicic et.al. 2501.09637 null
2025-01-16 LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading Kuan-Ming Liu et.al. 2501.09636 null
2025-01-16 Constraints on Cosmic Rays Acceleration in Bright Gamma-ray Bursts with Observations of Fermi Xing-Fu Zhang et.al. 2501.09594 null
2025-01-15 Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion Jingyuan Chen et.al. 2501.09019 null
2025-01-15 Prompt gravitational-wave mergers aided by gas in Active Galactic Nuclei: The hydrodynamics of binary-single black hole scatterings Connar Rowan et.al. 2501.09017 null
2025-01-15 How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias Tosin Fadahunsi et.al. 2501.09014 link
2025-01-15 Bayesian analysis of analog gravity systems with the Rezzolla-Zhidenko metric Saulo Albuquerque et.al. 2501.09000 null
2025-01-15 Analyzing the Ethical Logic of Six Large Language Models W. Russell Neuman et.al. 2501.08951 null
2025-01-15 Disentangling Exploration of Large Language Models by Optimal Exploitation Tim Grams et.al. 2501.08925 null
2025-01-15 Feature-based One-For-All: A Universal Framework for Heterogeneous Knowledge Distillation Jhe-Hao Lin et.al. 2501.08885 null
2025-01-15 Exploring Task-Level Optimal Prompts for Visual In-Context Learning Yan Zhu et.al. 2501.08841 null
2025-01-15 ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind Kazutoshi Shinoda et.al. 2501.08838 link
2025-01-15 IDEA: Image Description Enhanced CLIP-Adapter Zhipeng Ye et.al. 2501.08816 link
2025-01-14 DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models Hyeonwoo Kim et.al. 2501.08333 null
2025-01-14 Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Miran Heo et.al. 2501.08326 null
2025-01-14 ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations Ziyuan Huang et.al. 2501.08324 null
2025-01-14 HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Abhilasha Ravichander et.al. 2501.08292 null
2025-01-14 SmartEraser: Remove Anything from Images using Masked-Region Guidance Longtao Jiang et.al. 2501.08279 null
2025-01-14 Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing Pulkit Arora et.al. 2501.08276 null
2025-01-14 TriMod Fusion for Multimodal Named Entity Recognition in Social Media Mosab Alfaqeeh et.al. 2501.08267 null
2025-01-14 Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints Jonathan Nöther et.al. 2501.08246 null
2025-01-14 ASTRID – An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems Mohita Chowdhury et.al. 2501.08208 null
2025-01-14 ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving Zain Ul Abedin et.al. 2501.08203 null
2025-01-13 Imagine while Reasoning in Space: Multimodal Visualization-of-Thought Chengzu Li et.al. 2501.07542 null
2025-01-13 Investigating Large Language Models in Inferring Personality Traits from User Conversations Jianfeng Zhu et.al. 2501.07532 null
2025-01-13 IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion Tharun Anand et.al. 2501.07530 null
2025-01-13 RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment Difei Gu et.al. 2501.07525 link
2025-01-13 Guided SAM: Label-Efficient Part Segmentation S. B. van Rooij et.al. 2501.07434 null
2025-01-13 Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection Xin Yin et.al. 2501.07425 null
2025-01-13 Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion Lala Shakti Swarup Ray et.al. 2501.07408 null
2025-01-13 Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models Yasiru Ranasinghe et.al. 2501.07396 null
2025-01-13 Enhancing Retrieval-Augmented Generation: A Study of Best Practices Siran Li et.al. 2501.07391 link
2025-01-13 Approaching ballistic motion in 3D simulations of gamma-ray burst jets in realistic binary neutron star merger environments Emma Dreas et.al. 2501.07385 null
2025-01-10 Multi-subject Open-set Personalization in Video Generation Tsai-Shien Chen et.al. 2501.06187 null
2025-01-10 PEACE: Empowering Geologic Map Holistic Understanding with MLLMs Yangyu Huang et.al. 2501.06184 null
2025-01-10 ScooterLab: A Programmable and Participatory Sensing Research Testbed using Micromobility Vehicles Ubaidullah Khan et.al. 2501.06177 null
2025-01-10 Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories Gerd Kortemeyer et.al. 2501.06143 null
2025-01-10 Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI Yuya Asano et.al. 2501.06129 null
2025-01-10 Explaining Deep Learning-based Anomaly Detection in Energy Consumption Data by Focusing on Contextually Relevant Data Mohammad Noorchenarboo et.al. 2501.06099 null
2025-01-10 A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection Tsui Qin Mok et.al. 2501.06038 null
2025-01-10 The all-charm tetraquark and its contribution to two-photon processes Panagiotis Kalamidas et.al. 2501.06034 null
2025-01-10 How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters Romina Oji et.al. 2501.06025 link
2025-01-10 BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response Hongruixuan Chen et.al. 2501.06019 link
2025-01-09 Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark Yunzhuo Hao et.al. 2501.05444 link
2025-01-09 TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts Yu-Hao Huang et.al. 2501.05403 link
2025-01-09 FairCode: Evaluating Social Bias of LLMs in Code Generation Yongkang Du et.al. 2501.05396 link
2025-01-09 CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models Junha Park et.al. 2501.05359 null
2025-01-09 Continuity in Potential Infinite Models Matthias Eberl et.al. 2501.05276 null
2025-01-09 CallNavi: A Study and Challenge on Function Calling Routing and Invocation in Large Language Models Yewei Song et.al. 2501.05255 null
2025-01-09 Online Prompt and Solver Selection for Program Synthesis Yixuan Li et.al. 2501.05247 null
2025-01-09 Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection Pei-Kang Lee et.al. 2501.05228 null
2025-01-09 FaceMe: Robust Blind Face Restoration with Personal Identification Siyu Liu et.al. 2501.05177 null
2025-01-09 Deep Assessment of Code Review Generation Approaches: Beyond Lexical Similarity Yanjie Jiang et.al. 2501.05176 null
2025-01-08 Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding Joshua Jones et.al. 2501.04693 null
2025-01-08 Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling Nannan Li et.al. 2501.04666 null
2025-01-08 External quantum fluctuations select measurement contexts Jonte R. Hance et.al. 2501.04664 null
2025-01-08 Assessing Language Comprehension in Large Language Models Using Construction Grammar Wesley Scivetti et.al. 2501.04661 null
2025-01-08 FleSpeech: Flexibly Controllable Speech Generation with Various Prompts Hanzhao Li et.al. 2501.04644 null
2025-01-08 “Can you be my mum?”: Manipulating Social Robots in the Large Language Models Era Giulio Antonio Abbo et.al. 2501.04633 null
2025-01-08 MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation Daniele Molino et.al. 2501.04614 null
2025-01-08 Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion Yangfan He et.al. 2501.04606 link
2025-01-08 Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models Miaoyang He et.al. 2501.04582 null
2025-01-08 The Impostor is Among Us: Can Large Language Models Capture the Complexity of Human Personas? Christopher Lazik et.al. 2501.04543 null
2025-01-07 WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings Haochen Song et.al. 2501.03999 null
2025-01-07 NeuralSVG: An Implicit Representation for Text-to-Vector Generation Sagi Polaczek et.al. 2501.03992 null
2025-01-07 Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles Yuxi Xia et.al. 2501.03991 null
2025-01-07 Semantically Cohesive Word Grouping in Indian Languages N J Karthika et.al. 2501.03988 null
2025-01-07 VLM-driven Behavior Tree for Context-aware Task Planning Naoki Wake et.al. 2501.03968 link
2025-01-07 Vision Language Models as Values Detectors Giulio Antonio Abbo et.al. 2501.03957 null
2025-01-07 Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection Pablo Miralles-González et.al. 2501.03940 null
2025-01-07 Truthful mechanisms for linear bandit games with private contexts Yiting Hu et.al. 2501.03865 null
2025-01-07 Progressive Document-level Text Simplification via Large Language Models Dengzhao Fang et.al. 2501.03857 null
2025-01-07 Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Zekai Gu et.al. 2501.03847 link
2025-01-06 Rate-My-LoRA: Efficient and Adaptive Federated Model Tuning for Cardiac MRI Segmentation Xiaoxiao He et.al. 2501.03223 null
2025-01-06 Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Rui Qian et.al. 2501.03218 link
2025-01-06 The FACTS Grounding Leaderboard: Benchmarking LLMs’ Ability to Ground Responses to Long-Form Input Alon Jacovi et.al. 2501.03200 null
2025-01-06 Visualizing quantum entanglement in Bose-Einstein condensates without state vectors Russell B. Thompson et.al. 2501.03199 null
2025-01-06 Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text Ali Al-Lawati et.al. 2501.03166 link
2025-01-06 The Scaling Law for LoRA Base on Mutual Information Upper Bound Jing Zhang et.al. 2501.03152 null
2025-01-06 VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity Yerong Li et.al. 2501.03139 null
2025-01-06 PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Mingyang Song et.al. 2501.03124 link
2025-01-06 LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases Dylan Bouchard et.al. 2501.03112 link
2025-01-06 Physics, Environment and Environmental Education; Perceptions from trainee Natural Science teachers Daniel Alejandro Valderrama et.al. 2501.03090 null
2025-01-03 Metadata Conditioning Accelerates Language Model Pre-training Tianyu Gao et.al. 2501.01956 link
2025-01-03 Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) for Passive Sonar Classification Jarin Ritu et.al. 2501.01921 link
2025-01-03 Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions Rachneet Sachdeva et.al. 2501.01872 link
2025-01-03 A review of long lasting activities of the central engine of gamma-ray bursts Bruce Gendre et.al. 2501.01857 null
2025-01-03 MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning Pu Yang et.al. 2501.01834 null
2025-01-03 Time Series Language Model for Descriptive Caption Generation Mohamed Trabelsi et.al. 2501.01832 null
2025-01-03 Ingredients: Blending Custom Photos with Video Diffusion Transformers Zhengcong Fei et.al. 2501.01790 link
2025-01-03 SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation Mingjie Li et.al. 2501.01765 null
2025-01-03 How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models Simone Corbo et.al. 2501.01741 null
2025-01-03 AR4D: Autoregressive 4D Generation from Monocular Videos Hanxin Zhu et.al. 2501.01722 null
2025-01-02 GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models Zhangyang Qi et.al. 2501.01428 link
2025-01-02 Object-level Visual Prompts for Compositional Image Generation Gaurav Parmar et.al. 2501.01424 null
2025-01-02 Multi-Modal Video Feature Extraction for Popularity Prediction Haixu Liu et.al. 2501.01422 null
2025-01-02 Nested Attention: Semantic-aware Attention Values for Concept Personalization Or Patashnik et.al. 2501.01407 null
2025-01-02 StereoMath: An Accessible and Musical Equation Editor Kenneth Ge et.al. 2501.01404 null
2025-01-02 Training Medical Large Vision-Language Models with Abnormal-Aware Feedback Yucheng Zhou et.al. 2501.01377 null
2025-01-02 Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement Z. Zhang et.al. 2501.01368 null
2025-01-02 ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding Austin T. Wang et.al. 2501.01366 null
2025-01-02 CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models Johan Wahréus et.al. 2501.01335 link
2025-01-02 Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension Yanbo Fang et.al. 2501.01332 null
2024-12-30 Distributed Mixture-of-Agents for Edge Inference with Large Language Models Purbesh Mitra et.al. 2412.21200 link
2024-12-30 Adversarial Attack and Defense for LoRa Device Identification and Authentication via Deep Learning Yalin E. Sagduyu et.al. 2412.21164 null
2024-12-30 Unified dimensionality reduction techniques in chronic liver disease detection Anand Karna et.al. 2412.21156 null
2024-12-30 Exploring and Controlling Diversity in LLM-Agent Conversation KuanChao Chu et.al. 2412.21102 null
2024-12-30 Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model Yifei Huang et.al. 2412.21080 link
2024-12-30 Efficient Multi-Task Inferencing with a Shared Backbone and Lightweight Task-Specific Adapters for Automatic Scoring Ehsan Latif et.al. 2412.21065 null
2024-12-30 Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration Wanglong Lu et.al. 2412.21042 link
2024-12-30 Automated Robustness Testing for LLM-based NLP Software Mingxuan Xiao et.al. 2412.21016 link
2024-12-30 Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline Nicola Messina et.al. 2412.21009 link
2024-12-30 Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering Junxiao Xue et.al. 2412.20927 null
2024-12-27 Enhancing Whisper’s Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization Kumud Tripathi et.al. 2412.19785 null
2024-12-27 Hard Photon Triggered Jets in $p$-$p$ and $A$-$A$ Collisions C. Sirimanna et.al. 2412.19738 null
2024-12-27 Can Large Language Models Adapt to Other Agents In-Context? Matthew Riemer et.al. 2412.19726 null
2024-12-27 Toward Adaptive Reasoning in Large Language Models with Thought Rollback Sijia Chen et.al. 2412.19707 link
2024-12-27 Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework Jiang Liu et.al. 2412.19684 null
2024-12-27 Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP Zhongxing Xu et.al. 2412.19650 null
2024-12-27 ReNeg: Learning Negative Embedding with Reward Guidance Xiaomin Li et.al. 2412.19637 link
2024-12-27 RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations Mingshu Zhao et.al. 2412.19628 link
2024-12-27 Signatures of prediction during natural listening in MEG data? Sahel Azizpour et.al. 2412.19622 null
2024-12-27 Gradient Weight-normalized Low-rank Projection for Efficient LLM Training Jia-Hong Huang et.al. 2412.19616 link
2024-12-24 Decentralized Intelligence in GameFi: Embodied AI Agents and the Convergence of DeFi and Virtual Ecosystems Fernando Jia et.al. 2412.18601 link
2024-12-24 ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation Hongjie Li et.al. 2412.18600 null
2024-12-24 DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Minghong Cai et.al. 2412.18597 link
2024-12-24 Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control Sergey Sedov et.al. 2412.18582 null
2024-12-24 Distilling Fine-grained Sentiment Understanding from Large Language Models Yice Zhang et.al. 2412.18552 link
2024-12-24 Token-Budget-Aware LLM Reasoning Tingxu Han et.al. 2412.18547 link
2024-12-24 Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation Derong Xu et.al. 2412.18537 link
2024-12-24 Segment-Based Attention Masking for GPTs Shahar Katz et.al. 2412.18487 link
2024-12-24 Betting vs. Trading: Learning a Linear Decision Policy for Selling Wind Power and Hydrogen Yannick Heiser et.al. 2412.18479 null
2024-12-24 Is Large Language Model Good at Triple Set Prediction? An Empirical Study Yuan Yuan et.al. 2412.18443 null
2024-12-23 The Superposition of Diffusion Models Using the Itô Density Estimator Marta Skreta et.al. 2412.17762 null
2024-12-23 Reasoning to Attend: Try to Understand How Token Works Rui Qian et.al. 2412.17741 link
2024-12-23 Contextual Backpropagation Loops: Amplifying Deep Reasoning with Iterative Top-Down Feedback Jacob Fein-Ashley et.al. 2412.17737 link
2024-12-23 Chumor 2.0: Towards Benchmarking Chinese Humor Understanding Ruiqi He et.al. 2412.17729 link
2024-12-23 Knowledge Editing through Chain-of-Thought Changyue Wang et.al. 2412.17727 link
2024-12-23 The Cosmological Population of Gamma-Ray Bursts from the Disks of Active Galactic Nuclei Hoyoung D. Kang et.al. 2412.17714 null
2024-12-23 EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities Zhe Chen et.al. 2412.17677 link
2024-12-23 Detecting anxiety and depression in dialogues: a multi-label and explainable approach Francisco de Arriba-Pérez et.al. 2412.17651 null
2024-12-23 DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder Ente Lin et.al. 2412.17644 null
2024-12-23 LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding Hao Li et.al. 2412.17635 null
2024-12-20 MotiF: Making Text Count in Image Animation with Motion Focal Loss Shijie Wang et.al. 2412.16153 null
2024-12-20 A vector logic for extensional formal semantics Daniel Quigley et.al. 2412.16152 null
2024-12-20 PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics Daniil Larionov et.al. 2412.16120 null
2024-12-20 Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs Lynn Greschner et.al. 2412.15993 null
2024-12-20 APIRL: Deep Reinforcement Learning for REST API Fuzzing Myles Foley et.al. 2412.15991 link
2024-12-20 From General to Specific: Tailoring Large Language Models for Personalized Healthcare Ruize Shi et.al. 2412.15957 null
2024-12-20 MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection Andrea Moglia et.al. 2412.15925 link
2024-12-20 On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education Lorenz Wendlinger et.al. 2412.15902 null
2024-12-20 On Robust Cross Domain Alignment Anish Chakrabarty et.al. 2412.15861 null
2024-12-20 Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation Aiwen Jiang et.al. 2412.15845 link
2024-12-19 PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation Muntasir Wahed et.al. 2412.15209 null
2024-12-19 FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching Sucheng Ren et.al. 2412.15205 link
2024-12-19 Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying Federico Castagna et.al. 2412.15177 link
2024-12-19 Rethinking Uncertainty Estimation in Natural Language Generation Lukas Aichberger et.al. 2412.15176 null
2024-12-19 Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM Yatai Ji et.al. 2412.15156 link
2024-12-19 AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling Zihan Liu et.al. 2412.15084 null
2024-12-19 MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance Hallee E. Wong et.al. 2412.15058 null
2024-12-19 Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI Isadora Krsek et.al. 2412.15047 null
2024-12-19 LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Felix Friedrich et.al. 2412.15035 null
2024-12-19 Large Language Models and Code Security: A Systematic Literature Review Enna Basic et.al. 2412.15004 null
2024-12-18 FashionComposer: Compositional Fashion Image Generation Sihui Ji et.al. 2412.14168 null
2024-12-18 Alignment faking in large language models Ryan Greenblatt et.al. 2412.14093 link
2024-12-18 Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets Simon Thorne et.al. 2412.14062 null
2024-12-18 Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation Vera Neplenbroek et.al. 2412.14050 link
2024-12-18 Hansel: Output Length Controlling Framework for Large Language Models Seoha Song et.al. 2412.14033 null
2024-12-18 Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation Haotong Lin et.al. 2412.14015 link
2024-12-18 What makes a good metric? Evaluating automatic metrics for text-to-image consistency Candace Ross et.al. 2412.13989 null
2024-12-18 RAG for Effective Supply Chain Security Questionnaire Automation Zaynab Batool Reza et.al. 2412.13988 null
2024-12-18 Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation Eleni Sgouritsa et.al. 2412.13952 null
2024-12-18 CoRa: A Collision-Resistant LoRa Symbol Detector of Low Complexity José Álamos et.al. 2412.13930 null
2024-12-17 CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models Gaoyang Zhang et.al. 2412.13195 link
2024-12-17 MotionBridge: Dynamic Video Inbetweening with Flexible Controls Maham Tanveer et.al. 2412.13190 null
2024-12-17 Move-in-2D: 2D-Conditioned Human Motion Generation Hsin-Ping Huang et.al. 2412.13185 null
2024-12-17 DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation Miriam Wanner et.al. 2412.13175 null
2024-12-17 Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study Bolei Ma et.al. 2412.13169 link
2024-12-17 F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration Lu Liu et.al. 2412.13155 null
2024-12-17 Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation Huaijin Pi et.al. 2412.13111 null
2024-12-17 Prompt Augmentation for Self-supervised Text-guided Image Manipulation Rumeysa Bodur et.al. 2412.13081 null
2024-12-17 Identifying Bias in Deep Neural Networks Using Image Transforms Sai Teja Erukude et.al. 2412.13079 link
2024-12-17 Harnessing Event Sensory Data for Error Pattern Prediction in Vehicles: A Language Model Approach Hugo Math et.al. 2412.13041 link
2024-12-16 CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology Yuxuan Sun et.al. 2412.12077 null
2024-12-16 A LoRA is Worth a Thousand Pictures Chenxi Liu et.al. 2412.12048 null
2024-12-16 How Private are Language Models in Abstractive Summarization? Anthony Hughes et.al. 2412.12040 null
2024-12-16 Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection Ira Ceka et.al. 2412.12039 null
2024-12-16 Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm Rajat Khanda et.al. 2412.12006 null
2024-12-16 The Open Source Advantage in Large Language Models (LLMs) Jiya Manchanda et.al. 2412.12004 null
2024-12-16 SAMIC: Segment Anything with In-Context Spatial Prompt Engineering Savinay Nagendra et.al. 2412.11998 null
2024-12-16 Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support Devika Venugopalan et.al. 2412.11995 link
2024-12-16 Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives Sam Relins et.al. 2412.11878 link
2024-12-16 A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation Tian-Yi Che et.al. 2412.11832 null
2024-12-13 A Grounded Typology of Word Classes Coleman Haley et.al. 2412.10369 null
2024-12-13 TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies Ruijie Zheng et.al. 2412.10345 null
2024-12-13 SCBench: A KV Cache-Centric Analysis of Long-Context Methods Yucheng Li et.al. 2412.10319 null
2024-12-13 My Statistics is Better than Yours Simon Benhaïem et.al. 2412.10296 null
2024-12-13 Prompt-Guided Mask Proposal for Two-Stage Open-Vocabulary Segmentation Yu-Jhe Li et.al. 2412.10292 null
2024-12-13 One world, one opinion? The superstar effect in LLM responses Sofie Goethals et.al. 2412.10281 null
2024-12-13 Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT Danielle R. Thomas et.al. 2412.10267 link
2024-12-13 Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models Harry J. Davies et.al. 2412.10257 null
2024-12-13 Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts Hazel Kim et.al. 2412.10246 null
2024-12-13 SPT: Sequence Prompt Transformer for Interactive Image Segmentation Senlin Cheng et.al. 2412.10224 null
2024-12-12 Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors Yue Feng et.al. 2412.09625 null
2024-12-12 LoRACLR: Contrastive Adaptation for Customization of Diffusion Models Enis Simsar et.al. 2412.09622 null
2024-12-12 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Zhuofan Zong et.al. 2412.09618 null
2024-12-12 Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG Kavana Venkatesh et.al. 2412.09614 null
2024-12-12 TimeRefine: Temporal Grounding with Time Refining Video LLM Xizi Wang et.al. 2412.09601 link
2024-12-12 Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders Fiona Ryan et.al. 2412.09586 link
2024-12-12 Obfuscated Activations Bypass LLM Latent-Space Defenses Luke Bailey et.al. 2412.09565 null
2024-12-12 Does Representation Matter? Exploring Intermediate Layers in Large Language Models Oscar Skean et.al. 2412.09563 null
2024-12-12 SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing Xueting Li et.al. 2412.09545 null
2024-12-12 Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM Han Wang et.al. 2412.09530 link
2024-12-11 GPD-1: Generative Pre-training for Driving Zixun Xie et.al. 2412.08643 link
2024-12-11 Fast Prompt Alignment for Text-to-Image Generation Khalil Mrini et.al. 2412.08639 link
2024-12-11 DMin: Scalable Training Data Influence Estimation for Diffusion Models Huawei Lin et.al. 2412.08637 link
2024-12-11 FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models Vladimir Kulikov et.al. 2412.08629 link
2024-12-11 Der Effizienz- und Intelligenzbegriff in der Lexikographie und kuenstlichen Intelligenz: kann ChatGPT die lexikographische Textsorte nachbilden? Ivan Arias-Arias et.al. 2412.08599 null
2024-12-11 Leveraging Graph-RAG and Prompt Engineering to Enhance LLM-Based Automated Requirement Traceability and Compliance Checks Arsalan Masoudifard et.al. 2412.08593 null
2024-12-11 LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Zejian Li et.al. 2412.08580 link
2024-12-11 Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation Hongming Guo et.al. 2412.08577 null
2024-12-11 Can We Generate Visual Programs Without Prompting LLMs? Michal Shlapentokh-Rothman et.al. 2412.08564 null
2024-12-11 Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations Hugo Flores García et.al. 2412.08550 null
2024-12-10 From Slow Bidirectional to Fast Causal Video Generators Tianwei Yin et.al. 2412.07772 null
2024-12-10 Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting Zetong Yang et.al. 2412.07768 null
2024-12-10 Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds Xiaoyu Xiang et.al. 2412.07766 null
2024-12-10 PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation Fatemeh Nazarieh et.al. 2412.07754 null
2024-12-10 Multi-Shot Character Consistency for Text-to-Video Generation Yuval Atzmon et.al. 2412.07750 null
2024-12-10 LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models Ziqi Lu et.al. 2412.07746 null
2024-12-10 StyleMaster: Stylize Your Video with Artistic Generation and Translation Zixuan Ye et.al. 2412.07744 null
2024-12-10 SKIPNet: Spatial Attention Skip Connections for Enhanced Brain Tumor Classification Khush Mendiratta et.al. 2412.07736 null
2024-12-10 Granite Guardian Inkit Padhi et.al. 2412.07724 link
2024-12-10 Leveraging Content and Context Cues for Low-Light Image Enhancement Igor Morawski et.al. 2412.07693 link
2024-12-09 Visual Lexicon: Rich Image Features in Language Space XuDong Wang et.al. 2412.06774 null
2024-12-09 Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty Meera Hahn et.al. 2412.06771 link
2024-12-09 Ranking-aware adapter for text-driven image ordering with CLIP Wei-Hsiang Yu et.al. 2412.06760 link
2024-12-09 JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM Takuro Fujii et.al. 2412.06738 link
2024-12-09 Revisiting GRB 060218: new insights into low-luminosity gamma-ray bursts from a revised shock breakout model Christopher M. Irwin et.al. 2412.06736 null
2024-12-09 AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark Lan Li et.al. 2412.06724 link
2024-12-09 VP-MEL: Visual Prompts Guided Multimodal Entity Linking Hongze Mi et.al. 2412.06720 null
2024-12-09 Facade: High-Precision Insider Threat Detection Using Deep Contextual Anomaly Detection Alex Kantchelian et.al. 2412.06700 null
2024-12-09 Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach Weichao Xu et.al. 2412.06684 null
2024-12-09 Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion Shuaiting Li et.al. 2412.06661 null
2024-12-06 Sparse autoencoders reveal selective remapping of visual concepts during adaptation Hyesu Lim et.al. 2412.05276 link
2024-12-06 Mind the Time: Temporally-Controlled Multi-Event Video Generation Ziyi Wu et.al. 2412.05263 null
2024-12-06 TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft Qian Long et.al. 2412.05255 link
2024-12-06 From classical techniques to convolution-based models: A review of object detection algorithms Fnu Neha et.al. 2412.05252 null
2024-12-06 LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds James Beetham et.al. 2412.05232 null
2024-12-06 Are Frontier Large Language Models Suitable for Q&A in Science Centres? Jacob Watson et.al. 2412.05200 null
2024-12-06 QueEn: A Large Language Model for Quechua-English Translation Junhao Chen et.al. 2412.05184 null
2024-12-06 A text-to-tabular approach to generate synthetic patient data using LLMs Margaux Tornqvist et.al. 2412.05153 link
2024-12-06 LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation Donald Shenaj et.al. 2412.05148 link
2024-12-06 A Practical Examination of AI-Generated Text Detectors for Large Language Models Brian Tufts et.al. 2412.05139 null
2024-12-05 Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail Luca Bartolomei et.al. 2412.04472 link
2024-12-05 PaintScene4D: Consistent 4D Scene Generation from Text Prompts Vinayak Gupta et.al. 2412.04471 null
2024-12-05 UnZipLoRA: Separating Content and Style from a Single Image Chang Liu et.al. 2412.04465 null
2024-12-05 Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Enshen Zhou et.al. 2412.04455 null
2024-12-05 EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios Lu Qiu et.al. 2412.04447 null
2024-12-05 GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Kaiyi Huang et.al. 2412.04440 null
2024-12-05 Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Yuying Ge et.al. 2412.04432 link
2024-12-05 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Jiuhai Chen et.al. 2412.04424 link
2024-12-05 Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation Xuying Li et.al. 2412.04415 null
2024-12-05 Discriminative Fine-tuning of LVLMs Yassine Ouali et.al. 2412.04378 null
2024-12-04 Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Wujian Peng et.al. 2412.03565 link
2024-12-04 Best-of-N Jailbreaking John Hughes et.al. 2412.03556 link
2024-12-04 Imagine360: Immersive 360 Video Generation from Perspective Anchor Jing Tan et.al. 2412.03552 null
2024-12-04 Perception Tokens Enhance Visual Reasoning in Multimodal Language Models Mahtab Bigverdi et.al. 2412.03548 null
2024-12-04 Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models Natalie Mackraz et.al. 2412.03537 null
2024-12-04 A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences Gabriel Lino Garcia et.al. 2412.03531 null
2024-12-04 You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? Dominic Lohr et.al. 2412.03516 null
2024-12-04 Gesture Classification in Artworks Using Contextual Image Features Azhar Hussian et.al. 2412.03456 null
2024-12-04 PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation Ao Wang et.al. 2412.03409 link
2024-12-04 Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment Feng He et.al. 2412.03400 null
2024-12-03 Motion Prompting: Controlling Video Generation with Motion Trajectories Daniel Geng et.al. 2412.02700 null
2024-12-03 Diffusion-based Visual Anagram as Multi-task Learning Zhiyuan Xu et.al. 2412.02693 link
2024-12-03 SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Viet Nguyen et.al. 2412.02687 null
2024-12-03 T-REG: Preference Optimization with Token-Level Reward Regularization Wenxuan Zhou et.al. 2412.02685 null
2024-12-03 Liquefaction: Privately Liquefying Blockchain Assets James Austgen et.al. 2412.02634 null
2024-12-03 Time-Reversal Provides Unsupervised Feedback to LLMs Yerram Varun et.al. 2412.02626 null
2024-12-03 Explainable CTR Prediction via LLM Reasoning Xiaohan Yu et.al. 2412.02588 null
2024-12-03 Copy-Move Forgery Detection and Question Answering for Remote Sensing Image Ze Zhang et.al. 2412.02575 link
2024-12-03 Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey Chenyang Liu et.al. 2412.02573 link
2024-12-03 Unveiling Concept Attribution in Diffusion Models Quang H. Nguyen et.al. 2412.02542 link
2024-11-29 SIMS: Simulating Human-Scene Interactions with Real World Script Planning Wenjia Wang et.al. 2411.19921 null
2024-11-29 Handling irresolvable conflicts in the Semantic Web: an RDF-based conflict-tolerant version of the Deontic Traditional Scheme Livio Robaldo et.al. 2411.19918 link
2024-11-29 Another look at inference after prediction Jessica Gronsbell et.al. 2411.19908 link
2024-11-29 Cross-Domain Recommendation Meets Large Language Models Ajay Krishna Vajjala et.al. 2411.19862 link
2024-11-29 Neuroplasticity and Psychedelics: a comprehensive examination of classic and non-classic compounds in pre and clinical models Claudio Agnorelli et.al. 2411.19840 null
2024-11-29 Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation Robin D. Pesl et.al. 2411.19804 null
2024-11-29 PerLA: Perceptive 3D Language Assistant Guofeng Mei et.al. 2411.19774 null
2024-11-29 SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks Kim-Celine Kahl et.al. 2411.19688 link
2024-11-29 Measurement of the Inclusive Cross Sections of Prompt $J/ψ$ and $ψ(3686)$ Production in $e^{+}e^{-}$ Annihilation from $\sqrt{s}=3.808$ to $4.951$ GeV BESIII Collaboration et.al. 2411.19642 null
2024-11-29 Unleashing the Transformative Power of Deliberation With Contextual Citizens Ariane Lambert-Mogiliansky et.al. 2411.19596 null
2024-11-27 Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis Eva Prakash et.al. 2411.18602 null
2024-11-27 Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning Omkar Khade et.al. 2411.18571 null
2024-11-27 A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models Rong Wang et.al. 2411.18564 null
2024-11-27 Bumblebee cosmology: Tests using distance- and time-redshift probes Xincheng Zhu et.al. 2411.18559 null
2024-11-27 Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models Minhyeok Lee et.al. 2411.18530 link
2024-11-27 Perturbation Ontology based Graph Attention Networks Yichen Wang et.al. 2411.18520 null
2024-11-27 Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS Jinyang Wu et.al. 2411.18478 null
2024-11-28 MM-Path: Multi-modal, Multi-granularity Path Representation Learning – Extended Version Ronghui Xu et.al. 2411.18428 link
2024-11-27 Short-time existence and uniqueness for some infinite-dimensional Nash systems Davide Francesco Redaelli et.al. 2411.18356 null
2024-11-27 TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models Riza Velioglu et.al. 2411.18350 link
2024-11-26 Video-Guided Foley Sound Generation with Multimodal Controls Ziyang Chen et.al. 2411.17698 null
2024-11-26 Instance-Aware Graph Prompt Learning Jiazheng Li et.al. 2411.17676 null
2024-11-26 Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting Liyun Zhang et.al. 2411.17674 null
2024-11-26 SketchAgent: Language-Driven Sequential Sketch Generation Yael Vinker et.al. 2411.17673 null
2024-11-26 Synthetic Data Generation with LLM for Improved Depression Prediction Andrea Kang et.al. 2411.17672 null
2024-11-26 Linguistic Laws Meet Protein Sequences: A Comparative Analysis of Subword Tokenization Methods Burak Suyunu et.al. 2411.17669 link
2024-11-26 BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings Abhay Shanbhag et.al. 2411.17661 null
2024-11-26 Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism Yi-Chien Lin et.al. 2411.17651 null
2024-11-26 SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation Claudia Cuttano et.al. 2411.17646 link
2024-11-26 Uma proposta para o uso de RPG no Ensino de Física: A Vingança de Newton Maria Rita Vasconcelos Brandão Souza et.al. 2411.17642 null
2024-11-25 Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective Jean Marie Tshimula et.al. 2411.16642 null
2024-11-25 Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric Zhichao Zhang et.al. 2411.16619 null
2024-11-25 MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series Aaron Wheeler et.al. 2411.16585 link
2024-11-25 RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Chan Hee Song et.al. 2411.16537 null
2024-11-25 Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings Carolin M. Schuster et.al. 2411.16527 link
2024-11-25 Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency Jerry Yao-Chieh Hu et.al. 2411.16525 null
2024-11-25 Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis Boming Miao et.al. 2411.16503 null
2024-11-25 Interpreting Language Reward Models via Contrastive Explanations Junqi Jiang et.al. 2411.16502 null
2024-11-25 Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval Xiaocong Yang et.al. 2411.16454 null
2024-11-25 VQ-SGen: A Vector Quantized Stroke Representation for Sketch Generation Jiawei Wang et.al. 2411.16446 null
2024-11-22 VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Daeun Lee et.al. 2411.15115 null
2024-11-22 AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution Fengyuan Liu et.al. 2411.15102 link
2024-11-22 Instance-Aware Generalized Referring Expression Segmentation E-Ro Nguyen et.al. 2411.15087 null
2024-11-22 FloAt: Flow Warping of Self-Attention for Clothing Animation Generation Swasti Shreya Mishra et.al. 2411.15028 null
2024-11-22 FTA generation using GenAI with an Autonomy sensor Usecase Sneha Sudhir Shetiya et.al. 2411.15007 null
2024-11-22 ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data Junhong Shen et.al. 2411.15004 link
2024-11-22 Free Energy Projective Simulation (FEPS): Active inference with interpretability Joséphine Pazem et.al. 2411.14991 null
2024-11-22 Generative AI may backfire for counterspeech Dominik Bär et.al. 2411.14986 null
2024-11-22 Exploring Foundation Models Fine-Tuning for Cytology Classification Manon Dausort et.al. 2411.14975 link
2024-11-22 Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation Colin Diggs et.al. 2411.14971 null
2024-11-21 Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Yuhao Dong et.al. 2411.14432 link
2024-11-21 Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings Aaron Zheng et.al. 2411.14398 null
2024-11-21 Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation Yuanhao Cai et.al. 2411.14384 null
2024-11-21 DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Tianhe Ren et.al. 2411.14347 link
2024-11-21 UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Bethel Melesse Tessema et.al. 2411.14343 link
2024-11-21 Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams Jitendra Bhandari et.al. 2411.14299 link
2024-11-21 CAIP: Detecting Router Misconfigurations with Context-Aware Iterative Prompting of LLMs Xi Jiang et.al. 2411.14283 null
2024-11-21 Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance Haozhe Zhao et.al. 2411.14279 null
2024-11-21 Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification Junhua Liu et.al. 2411.14252 null
2024-11-21 Natural Language Reinforcement Learning Xidong Feng et.al. 2411.14251 link
2024-11-20 Metacognition for Unknown Situations and Environments (MUSE) Rodolfo Valiente et.al. 2411.13537 null
2024-11-20 VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Ziqi Huang et.al. 2411.13503 link
2024-11-20 AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations Gaurav Verma et.al. 2411.13451 null
2024-11-20 From Prompt Engineering to Prompt Craft Joseph Lindley et.al. 2411.13422 null
2024-11-20 Theory-independent monitoring of the decoherence of a superconducting qubit with generalized contextuality Albert Aloy et.al. 2411.13421 link
2024-11-20 Unleashing the Power of Large Language Models for Group POI Recommendations Jing Long et.al. 2411.13415 null
2024-11-21 Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese Dat Van-Thanh Nguyen et.al. 2411.13407 null
2024-11-20 Adversarial Diffusion Compression for Real-World Image Super-Resolution Bin Chen et.al. 2411.13383 link
2024-11-20 I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception Jiawei Zhang et.al. 2411.13314 null
2024-11-20 Combining Autoregressive and Autoencoder Language Models for Text Classification João Gonçalves et.al. 2411.13282 link
2024-11-19 ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models Salma Kharrat et.al. 2411.12736 link
2024-11-19 Neurosymbolic Graph Enrichment for Grounded World Models Stefano De Giorgis et.al. 2411.12671 null
2024-11-19 SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation Ron Keuth et.al. 2411.12602 link
2024-11-19 AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction Yuanbin Man et.al. 2411.12593 null
2024-11-19 Large Language Models for Combinatorial Optimization of Design Structure Matrix Shuo Jiang et.al. 2411.12571 null
2024-11-19 Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution Yang Zou et.al. 2411.12530 link
2024-11-19 Human-AI Co-Creativity: Exploring Synergies Across Levels of Creative Collaboration Jennifer Haase et.al. 2411.12527 null
2024-11-19 3D Reconstruction by Looking: Instantaneous Blind Spot Detector for Indoor SLAM through Mixed Reality Hanbeom Chang et.al. 2411.12514 null
2024-11-19 Evaluating the Prompt Steerability of Large Language Models Erik Miehling et.al. 2411.12405 link
2024-11-19 DGSNA: prompt-based Dynamic Generative Scene-based Noise Addition method Zihao Chen et.al. 2411.12363 null
2024-11-18 Absorbing state dynamics of stochastic gradient descent Guanming Zhang et.al. 2411.11834 null
2024-11-18 The Lambda Calculus is Quantifiable Valentin Maestracci et.al. 2411.11809 null
2024-11-18 Novel Application of Neutrinos to Evaluate U.S. Nuclear Weapons Performance J. R. Distel et.al. 2411.11804 null
2024-11-18 Competing Bandits in Decentralized Large Contextual Matching Markets Satush Parikh et.al. 2411.11794 null
2024-11-18 LLM-IE: A Python Package for Generative Information Extraction with Large Language Models Enshuo Hsu et.al. 2411.11779 null
2024-11-18 Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment Allison Huang et.al. 2411.11731 link
2024-11-18 Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation Mingchao Qi et.al. 2411.11714 link
2024-11-18 Exploring LLMs for Verifying Technical System Specifications Against Requirements Lasse M. Reinpold et.al. 2411.11582 null
2024-11-18 Simple But Not Secure: An Empirical Security Analysis of Two-factor Authentication Systems Zhi Wang et.al. 2411.11551 null
2024-11-18 A Code Knowledge Graph-Enhanced System for LLM-Based Fuzz Driver Generation Hanxiang Xu et.al. 2411.11532 link
2024-11-15 LLaVA-o1: Let Vision Language Models Reason Step-by-Step Guowei Xu et.al. 2411.10440 link
2024-11-15 Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations Jianfeng Chi et.al. 2411.10414 null
2024-11-15 Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation Markus Karmann et.al. 2411.10411 null
2024-11-15 On the Foundation Model for Cardiac MRI Reconstruction Chi Zhang et.al. 2411.10403 null
2024-11-15 A Survey of Event Causality Identification: Principles, Taxonomy, Challenges, and Assessment Zefan Zeng et.al. 2411.10371 null
2024-11-15 Bias Unveiled: Investigating Social Bias in LLM-Generated Code Lin Ling et.al. 2411.10351 null
2024-11-15 Number it: Temporal Grounding Videos like Flipping Manga Yongliang Wu et.al. 2411.10332 link
2024-11-15 Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding Huming Qiu et.al. 2411.10329 null
2024-11-15 Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning Jingru Yang et.al. 2411.10252 null
2024-11-15 Measuring Non-Adversarial Reproduction of Training Data in Large Language Models Michael Aerni et.al. 2411.10242 null
2024-11-14 MagicQuill: An Intelligent Interactive Image Editing System Zichen Liu et.al. 2411.09703 link
2024-11-14 LLM Hallucination Reasoning with Zero-shot Knowledge Test Seongmin Lee et.al. 2411.09689 null
2024-11-14 Squeezed Attention: Accelerating Long Context Length LLM Inference Coleman Hooper et.al. 2411.09688 link
2024-11-14 The lowest-radiation environments in the Solar System: new opportunities for underground rare-event searches Xilin Zhang et.al. 2411.09634 null
2024-11-14 Local deployment of large-scale music AI models on commodity hardware Xun Zhou et.al. 2411.09625 null
2024-11-14 PTR: Precision-Driven Tool Recommendation for Large Language Models Hang Gao et.al. 2411.09613 null
2024-11-14 Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration Yifan Shao et.al. 2411.09604 link
2024-11-14 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Zhengyi Wang et.al. 2411.09595 null
2024-11-14 SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas Yu-Kai Hung et.al. 2411.09577 null
2024-11-14 Spider: Any-to-Many Multimodal LLM Jinxiang Lai et.al. 2411.09439 link
2024-11-13 Large Wireless Model (LWM): A Foundation Model for Wireless Channels Sadjad Alikhani et.al. 2411.08872 link
2024-11-13 The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models Daniel P. Jeong et.al. 2411.08870 link
2024-11-13 CamemBERT 2.0: A Smarter French Language Model Aged to Perfection Wissam Antoun et.al. 2411.08868 null
2024-11-13 LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs Piyush Jha et.al. 2411.08862 null
2024-11-13 Process-aware Human Activity Recognition Jiawei Zheng et.al. 2411.08814 null
2024-11-13 Logic-based Knowledge Awareness for Autonomous Agents in Continuous Spaces Arabinda Ghosh et.al. 2411.08754 null
2024-11-13 Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers Clément Dumas et.al. 2411.08745 link
2024-11-13 New advances in universal approximation with neural networks of minimal width Dennis Rochau et.al. 2411.08735 null
2024-11-14 Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models Somanshu Singla et.al. 2411.08733 link
2024-11-13 Polymetis: Large Language Modeling for Multiple Material Domains Chao Huang et.al. 2411.08728 null
2024-11-12 From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents Chuyi Kong et.al. 2411.07965 null
2024-11-12 MANTIS: A Mixed-Signal Near-Sensor Convolutional Imager SoC Using Charge-Domain 4b-Weighted 5-to-84-TOPS/W MAC Operations for Feature Extraction and Region-of-Interest Detection Martin Lefebvre et.al. 2411.07946 null
2024-11-12 CryptoLLM: Unleashing the Power of Prompted LLMs for SmartQnA and Classification of Crypto Posts Aniket Deroy et.al. 2411.07917 null
2024-11-12 INTRABENCH: Interactive Radiological Benchmark Constantin Ulrich et.al. 2411.07885 null
2024-11-12 Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models Yusen Zhang et.al. 2411.07858 link
2024-11-12 FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training Philip Zmushko et.al. 2411.07837 link
2024-11-12 Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices Kilian Pfeiffer et.al. 2411.07826 null
2024-11-12 Federated Low-Rank Adaptation with Differential Privacy over Wireless Networks Tianqu Kang et.al. 2411.07806 null
2024-11-12 RedCode: Risky Code Execution and Generation Benchmark for Code Agents Chengquan Guo et.al. 2411.07781 link
2024-11-12 Topological resilience of optical skyrmions in local decoherence Li-Wen Wang et.al. 2411.07775 null
2024-11-11 Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations Chaitanya Malaviya et.al. 2411.07237 null
2024-11-11 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Yoad Tewel et.al. 2411.07232 null
2024-11-11 Tasks, Time, and Tools: Quantifying Online Sensemaking Efforts Through a Survey-based Study Andrew Kuznetsov et.al. 2411.07206 null
2024-11-11 DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID Nyle Siddiqui et.al. 2411.07205 link
2024-11-11 NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics David Robinson et.al. 2411.07186 null
2024-11-11 SAMPart3D: Segment Any Part in 3D Objects Yunhan Yang et.al. 2411.07184 link
2024-11-11 Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis Taihang Hu et.al. 2411.07132 link
2024-11-11 Fast and Robust Contextual Node Representation Learning over Dynamic Graphs Xingzhi Guo et.al. 2411.07123 null
2024-11-11 Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation Ziwei Liu et.al. 2411.07021 null
2024-11-11 Flaring gamma-ray emission coincident with a hyperactive fast radio burst source Yi Xing et.al. 2411.06996 null
2024-11-08 LLMs as Method Actors: A Model for Prompt Engineering and Architecture Colin Doyle et.al. 2411.05778 link
2024-11-08 Quantitative Assessment of Intersectional Empathetic Bias and Understanding Vojtech Formanek et.al. 2411.05777 link
2024-11-08 End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering Dylan Goetting et.al. 2411.05755 link
2024-11-08 A doublet of cosmological models to challenge the H0 tension in the Pantheon Supernovae Ia catalog B. De Simone et.al. 2411.05744 null
2024-11-08 Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition Abhisek Ray et.al. 2411.05692 link
2024-11-08 Tell What You Hear From What You See – Video to Audio Generation Through Text Xiulong Liu et.al. 2411.05679 link
2024-11-08 Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation Xiwen Wei et.al. 2411.05663 link
2024-11-08 Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation Long Truong To et.al. 2411.05641 null
2024-11-08 From Resource Control to Digital Trust with User-Managed Access Wouter Termont et.al. 2411.05622 null
2024-11-08 Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages JA Meaney et.al. 2411.05593 null
2024-11-07 SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Muyang Li et.al. 2411.05007 link
2024-11-07 HourVideo: 1-Hour Video-Language Understanding Keshigeyan Chandrasegaran et.al. 2411.04998 link
2024-11-07 Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives Hao Sun et.al. 2411.04991 link
2024-11-07 DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Wenqiang Sun et.al. 2411.04928 null
2024-11-07 StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration Panwen Hu et.al. 2411.04925 null
2024-11-07 Structure Matters: Dynamic Policy Gradient Sara Klein et.al. 2411.04913 null
2024-11-07 In the Era of Prompt Learning with Vision-Language Models Ankit Jha et.al. 2411.04892 null
2024-11-07 Prompt-Guided Internal States for Hallucination Detection of Large Language Models Fujie Zhang et.al. 2411.04847 link
2024-11-07 VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models Ming Cheng et.al. 2411.04825 null
2024-11-07 Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet Elija Deineko et.al. 2411.04777 null
2024-11-06 Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? Daniel P. Jeong et.al. 2411.04118 link
2024-11-06 Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset Alexandre Galashov et.al. 2411.04034 null
2024-11-06 Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages Aniket Deroy et.al. 2411.04025 null
2024-11-06 Predicting and Publishing Accurate Imbalance Prices Using Monte Carlo Tree Search Fabio Pavirani et.al. 2411.04011 null
2024-11-06 Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning Jiawei Yao et.al. 2411.03978 link
2024-11-06 Continuous-Time State Estimation Methods in Robotics: A Survey William Talbot et.al. 2411.03951 null
2024-11-06 Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks Felipe Marra et.al. 2411.03948 link
2024-11-06 Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks Ryan Campbell et.al. 2411.03945 link
2024-11-06 Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models Minh Duc Bui et.al. 2411.03888 link
2024-11-06 Data Fusion of Synthetic Query Variants With Generative Large Language Models Timo Breuer et.al. 2411.03881 link
2024-11-05 Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? Jingyu Xiao et.al. 2411.03292 link
2024-11-05 Proxy-informed Bayesian transfer learning with unknown sources Sabina J. Sloman et.al. 2411.03263 null
2024-11-05 DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models Ying Zhou et.al. 2411.03250 null
2024-11-05 On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models Tariq Berrada Ifriqi et.al. 2411.03177 null
2024-11-05 From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice Alicia Guo et.al. 2411.03137 null
2024-11-05 MA^2: A Self-Supervised and Motion Augmenting Autoencoder for Gait-Based Automatic Disease Detection Yiqun Liu et.al. 2411.03129 null
2024-11-05 “Create a Fear of Missing Out” – ChatGPT Implements Unsolicited Deceptive Designs in Generated Websites Without Warning Veronika Krauß et.al. 2411.03108 null
2024-11-05 Speech Separation with Pretrained Frontend to Minimize Domain Mismatch Wupeng Wang et.al. 2411.03085 link
2024-11-05 Growing a Tail: Increasing Output Diversity in Large Language Models Michal Shur-Ofry et.al. 2411.02989 null
2024-11-05 AtlasSeg: Atlas Prior Guided Dual-U-Net for Cortical Segmentation in Fetal Brain MRI Haoan Xu et.al. 2411.02867 null
2024-11-04 Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages Hoang Nguyen et.al. 2411.02398 null
2024-11-04 Training-free Regional Prompting for Diffusion Transformers Anthony Chen et.al. 2411.02395 link
2024-11-04 Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning Md Rifat Arefin et.al. 2411.02344 link
2024-11-04 Prospects for optical detections from binary neutron star mergers with the next-generation multi-messenger observatories E. Loffredo et.al. 2411.02342 link
2024-11-04 PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance Ruyang Liu et.al. 2411.02327 link
2024-11-04 An Empirical Study on the Code Refactoring Capability of Large Language Models Jonathan Cordeiro et.al. 2411.02320 null
2024-11-04 Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast Marilyn Rego et.al. 2411.02318 null
2024-11-04 Defining and Evaluating Physical Safety for Large Language Models Yung-Chen Tang et.al. 2411.02317 null
2024-11-04 CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments Kung-Hsiang Huang et.al. 2411.02305 link
2024-11-04 Combining Induction and Transduction for Abstract Reasoning Wen-Ding Li et.al. 2411.02272 link
2024-10-31 DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion Weicai Ye et.al. 2410.24203 link
2024-10-31 Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation Fu Feng et.al. 2410.24160 null
2024-10-31 Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age Nouar AlDahoul et.al. 2410.24148 null
2024-10-31 COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes Muhammad Ali et.al. 2410.24139 link
2024-10-31 Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing Akash Dhruv et.al. 2410.24119 link
2024-10-31 AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization Amir Kazemi et.al. 2410.24116 null
2024-10-31 In-Context Fine-Tuning for Time-Series Foundation Models Abhimanyu Das et.al. 2410.24087 null
2024-10-31 Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs Muhammed Saeed et.al. 2410.24049 null
2024-10-31 Handwriting Recognition in Historical Documents with Multimodal LLM Lucian Li et.al. 2410.24034 null
2024-10-31 Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks Yingzhe Peng et.al. 2410.24032 null
2024-10-30 RelationBooth: Towards Relation-Aware Customized Object Generation Qingyu Shi et.al. 2410.23280 null
2024-10-30 SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Yining Hong et.al. 2410.23277 null
2024-10-30 EMMA: End-to-End Multimodal Model for Autonomous Driving Jyh-Jing Hwang et.al. 2410.23262 null
2024-10-30 Evaluating Cultural and Social Awareness of LLM Web Agents Haoyi Qiu et.al. 2410.23252 null
2024-10-30 ProTransformer: Robustify Transformers via Plug-and-Play Paradigm Zhichao Hou et.al. 2410.23182 link
2024-10-30 ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning Millennium Bismay et.al. 2410.23180 link
2024-10-31 Why Gradient Subspace? Identifying and Mitigating LoRA’s Bottlenecks in Federated Fine-Tuning of Large Language Models Navyansh Mahla et.al. 2410.23111 null
2024-10-30 PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures Tianxiang Wu et.al. 2410.23089 null
2024-10-30 BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference Junqi Zhao et.al. 2410.23079 link
2024-10-30 Toward Understanding In-context vs. In-weight Learning Bryan Chan et.al. 2410.23042 null
2024-10-29 Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier Kai Wang et.al. 2410.22317 link
2024-10-29 Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving Bo Jiang et.al. 2410.22313 link
2024-10-29 Embedding-based classifiers can detect prompt injection attacks Md. Ahsan Ayub et.al. 2410.22284 link
2024-10-29 Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models Renzhe Yu et.al. 2410.22282 null
2024-10-29 NCA-Morph: Medical Image Registration with Neural Cellular Automata Amin Ranem et.al. 2410.22265 link
2024-10-29 FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation Farima Fatahi Bayat et.al. 2410.22257 null
2024-10-29 ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising Ashutosh Chaubey et.al. 2410.22233 link
2024-10-29 Synthetic Data Generation with Large Language Models for Personalized Community Question Answering Marco Braga et.al. 2410.22182 link
2024-10-29 Benchmarking LLM Guardrails in Handling Multilingual Toxicity Yahan Yang et.al. 2410.22153 null
2024-10-29 AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts Vishal Kumar et.al. 2410.22143 null
2024-10-28 Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context Manuel Benavent-Lledo et.al. 2410.21275 link
2024-10-28 Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics Yaniv Nikankin et.al. 2410.21272 link
2024-10-28 LoRA vs Full Fine-tuning: An Illusion of Equivalence Reece Shuttleworth et.al. 2410.21228 null
2024-10-28 Exploring contextual modeling with linear complexity for point cloud segmentation Yong Xien Chng et.al. 2410.21211 null
2024-10-28 Simplest Mechanism Builder Algorithm (SiMBA): An Automated Microkinetic Model Discovery Tool Miguel Ángel de Carvalho Servia et.al. 2410.21205 link
2024-10-28 CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants Lize Alberts et.al. 2410.21159 link
2024-10-28 Palisade – Prompt Injection Detection Framework Sahasra Kokkula et.al. 2410.21146 null
2024-10-28 Do LLMs generate test oracles that capture the actual or the expected program behaviour? Michael Konstantinou et.al. 2410.21136 null
2024-10-28 KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation Shangde Gao et.al. 2410.21085 null
2024-10-28 Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring Honglin Mu et.al. 2410.21083 null
2024-10-25 Model merging with SVD to tie the Knots George Stoica et.al. 2410.19735 link
2024-10-25 Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks Yinglun Xu et.al. 2410.19705 null
2024-10-25 Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs Yifei Zhang et.al. 2410.19694 null
2024-10-25 AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs Clemencia Siro et.al. 2410.19692 null
2024-10-25 Planning-Aware Diffusion Networks for Enhanced Motion Forecasting in Autonomous Driving Liu Yunhao et.al. 2410.19639 null
2024-10-25 GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing Hosam Elgendy et.al. 2410.19552 link
2024-10-25 CloserMusicDB: A Modern Multipurpose Dataset of High Quality Music Aleksandra Piekarzewicz et.al. 2410.19540 null
2024-10-25 Optimization with First Order Algorithms Charles Dossal et.al. 2410.19506 null
2024-10-25 Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization Anthony Cui et.al. 2410.19499 null
2024-10-25 A Debate-Driven Experiment on LLM Hallucinations and Accuracy Ray Li et.al. 2410.19485 null
2024-10-24 Unbounded: A Generative Infinite Game of Character Life Simulation Jialu Li et.al. 2410.18975 null
2024-10-24 ConceptDrift: Uncovering Biases through the Lens of Foundational Models Cristian Daniel Păduraru et.al. 2410.18970 null
2024-10-24 Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Zhangheng Li et.al. 2410.18967 null
2024-10-24 On the Crucial Role of Initialization for Matrix Factorization Bingcong Li et.al. 2410.18965 null
2024-10-24 Learning to Look: Seeking Information for Decision Making via Policy Factorization Shivin Dass et.al. 2410.18964 null
2024-10-24 Context is Key: A Benchmark for Forecasting with Essential Textual Information Andrew Robert Williams et.al. 2410.18959 link
2024-10-24 BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning Yujuan Velvin Fu et.al. 2410.18955 null
2024-10-24 From Blind Solvers to Logical Thinkers: Benchmarking LLMs’ Logical Integrity on Faulty Mathematical Problems A M Muntasir Rahman et.al. 2410.18921 null
2024-10-25 A Survey on Speech Large Language Models Jing Peng et.al. 2410.18908 null
2024-10-24 PRISM: A Methodology for Auditing Biases in Large Language Models Leif Azzopardi et.al. 2410.18906 link
2024-10-23 TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts Yuxuan Xie et.al. 2410.18071 null
2024-10-23 Disordered charge density waves in the kagome metal FeGe Hengxin Tan et.al. 2410.18063 null
2024-10-23 CLEAR: Character Unlearning in Textual and Visual Modalities Alexey Dontsov et.al. 2410.18057 null
2024-10-23 Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases Anna Glazkova et.al. 2410.18040 null
2024-10-23 MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning Jingfan Zhang et.al. 2410.18035 null
2024-10-23 Measurements of $\psi(2S)$ and $\chi_{c1}(3872)$ production within fully reconstructed jets LHCb collaboration et.al. 2410.18018 null
2024-10-23 Scalable Ranked Preference Optimization for Text-to-Image Generation Shyamgopal Karthik et.al. 2410.18013 null
2024-10-23 Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation Suho Kang et.al. 2410.18001 link
2024-10-23 An evolutionary game theory approach to modeling behavioral interaction in disclosing infection begins with an outbreak: COVID-19 as an example Pranav Verma et.al. 2410.17996 null
2024-10-23 Closed-form merging of parameter-efficient modules for Federated Continual Learning Riccardo Salami et.al. 2410.17961 null
2024-10-22 Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods Tsachi Blau et.al. 2410.17222 null
2024-10-22 Hierarchical Upper Confidence Bounds for Constrained Online Learning Ali Baheri et.al. 2410.17216 null
2024-10-22 YOLO-TS: Real-Time Traffic Sign Detection with Enhanced Accuracy Using Optimized Receptive Fields and Anchor-Free Fusion Junzhou Chen et.al. 2410.17144 null
2024-10-22 PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles Li Siyan et.al. 2410.17127 link
2024-10-22 Enhancing Answer Attribution for Faithful Text Generation with Large Language Models Juraj Vladika et.al. 2410.17112 null
2024-10-23 Optimal Design for Reward Modeling in RLHF Antoine Scheid et.al. 2410.17055 null
2024-10-22 Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups Charvi Rastogi et.al. 2410.17032 null
2024-10-23 GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks Shuyang Hou et.al. 2410.17031 null
2024-10-22 SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine Xiaochen Wang et.al. 2410.17021 null
2024-10-22 LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices Chuntao Ding et.al. 2410.16954 link
2024-10-21 SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Shuangrui Ding et.al. 2410.16268 link
2024-10-21 MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report Samrajya Thapa et.al. 2410.16239 link
2024-10-21 Building A Coding Assistant via the Retrieval-Augmented Language Model Xinze Li et.al. 2410.16229 link
2024-10-21 Theoretical Limitations of Ensembles in the Age of Overparameterization Niclas Dern et.al. 2410.16201 null
2024-10-21 From Tokens to Materials: Leveraging Language Models for Scientific Discovery Yuwei Wan et.al. 2410.16165 link
2024-10-21 An Explainable Contrastive-based Dilated Convolutional Network with Transformer for Pediatric Pneumonia Detection Chandravardhan Singh Raghaw et.al. 2410.16143 null
2024-10-21 Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs Kang Zhao et.al. 2410.16135 null
2024-10-21 Do LLMs write like humans? Variation in grammatical and rhetorical styles Alex Reinhart et.al. 2410.16107 null
2024-10-21 Analysing the Residual Stream of Language Models Under Knowledge Conflicts Yu Zhao et.al. 2410.16090 null
2024-10-21 Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context Maggie Mi et.al. 2410.16069 null
2024-10-18 MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps Xiongtao Zhou et.al. 2410.14668 link
2024-10-18 DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph Maitreya Prafulla Chitale et.al. 2410.14666 null
2024-10-18 GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings Raghuveer Thirukovalluru et.al. 2410.14635 link
2024-10-18 CELI: Controller-Embedded Language Model Interactions Jan-Samuel Wagner et.al. 2410.14627 null
2024-10-18 DiSCo Meets LLMs: A Unified Approach for Sparse Retrieval and Contextual Distillation in Conversational Search Simon Lupart et.al. 2410.14609 null
2024-10-18 Neural Combinatorial Clustered Bandits for Recommendation Systems Baran Atalar et.al. 2410.14586 null
2024-10-18 Do LLMs “know” internally when they follow instructions? Juyeon Heo et.al. 2410.14516 link
2024-10-18 CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection Andrea Appiani et.al. 2410.14509 null
2024-10-18 Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models Cody Clop et.al. 2410.14479 null
2024-10-18 An abstract structure determines the contextuality degree of observable-based Kochen-Specker proofs Axel Muller et.al. 2410.14463 null
2024-10-17 Can MLLMs Understand the Deep Implication Behind Chinese Images? Chenhao Zhang et.al. 2410.13854 link
2024-10-17 AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents Ke Yang et.al. 2410.13825 null
2024-10-17 ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution Junhao Gu et.al. 2410.13807 null
2024-10-17 PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment Zekun Moore Wang et.al. 2410.13785 null
2024-10-17 Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors Georgios Chochlakis et.al. 2410.13776 null
2024-10-17 Improving Multi-modal Large Language Model through Boosting Vision Capabilities Yanpeng Sun et.al. 2410.13733 null
2024-10-17 Persistent Pre-Training Poisoning of LLMs Yiming Zhang et.al. 2410.13722 null
2024-10-17 Jailbreaking LLM-Controlled Robots Alexander Robey et.al. 2410.13691 null
2024-10-17 Label-free prediction of fluorescence markers in bovine satellite cells using deep learning Sania Sinha et.al. 2410.13685 null
2024-10-18 Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion Yijun Liang et.al. 2410.13674 link
2024-10-16 Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media Ross Deans Kristensen-McLachlan et.al. 2410.12791 null
2024-10-16 Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models Ce Zhang et.al. 2410.12790 link
2024-10-16 JudgeBench: A Benchmark for Evaluating LLM-based Judges Sijun Tan et.al. 2410.12784 link
2024-10-16 Context-Scaling versus Task-Scaling in In-Context Learning Amirhesam Abedsoltan et.al. 2410.12783 null
2024-10-16 SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation Jaehong Yoon et.al. 2410.12761 null
2024-10-16 How Does Variance Shape the Regret in Contextual Bandits? Zeyu Jia et.al. 2410.12713 null
2024-10-16 Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization Xingqi Wang et.al. 2410.12700 link
2024-10-17 Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 Mohamad Abdi et.al. 2410.12686 null
2024-10-17 Context Matters: Leveraging Contextual Features for Time Series Forecasting Sameep Chattopadhyay et.al. 2410.12672 null
2024-10-16 CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training Zhiyuan Ma et.al. 2410.12595 null
2024-10-15 KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities Hsin-Ping Huang et.al. 2410.11824 null
2024-10-15 SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing Zhiyuan Zhang et.al. 2410.11815 null
2024-10-15 Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability Tsz Ting Chung et.al. 2410.11786 null
2024-10-15 On the Training Convergence of Transformers for In-Context Classification Wei Shen et.al. 2410.11778 null
2024-10-15 SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding Ying Chen et.al. 2410.11761 null
2024-10-15 Identification and modelling of optically thin inverse Compton scattering in the prompt emission of GRB131014A Pragyan Pratim Bordoloi et.al. 2410.11753 null
2024-10-15 Personas with Attitudes: Controlling LLMs for Diverse Data Annotation Leon Fröhling et.al. 2410.11745 link
2024-10-15 RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation Anton Antonov et.al. 2410.11722 link
2024-10-15 Adaptive Coordinators and Prompts on Heterogeneous Graphs for Cross-Domain Recommendations Hengyu Zhang et.al. 2410.11719 null
2024-10-15 It’s Just Another Day: Unique Video Captioning by Discriminative Prompting Toby Perrett et.al. 2410.11702 null
2024-10-14 Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models Jingzhi Bao et.al. 2410.10821 link
2024-10-14 Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Ziyue Li et.al. 2410.10814 link
2024-10-14 Denial-of-Service Poisoning Attacks against Large Language Models Kuofeng Gao et.al. 2410.10760 link
2024-10-14 Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification Jan Cegin et.al. 2410.10756 link
2024-10-14 FlexGen: Flexible Multi-View Generation from Text and Image Inputs Xinli Xu et.al. 2410.10745 null
2024-10-14 SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing Pengrui Quan et.al. 2410.10741 link
2024-10-14 Large Language Models Are Active Critics in NLG Evaluation Shuying Xu et.al. 2410.10724 null
2024-10-15 4-LEGS: 4D Language Embedded Gaussian Splatting Gal Fiebelman et.al. 2410.10719 null
2024-10-14 Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues Qibing Ren et.al. 2410.10700 link
2024-10-14 Functional Flexibility in Generative AI Interfaces: Text Editing with LLMs through Conversations, Toolbars, and Prompts Florian Lehmann et.al. 2410.10644 null
2024-10-11 AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation Zijun Wang et.al. 2410.09040 link
2024-10-11 Mentor-KD: Making Small Language Models Better Multi-step Reasoners Hojae Lee et.al. 2410.09037 link
2024-10-11 AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents Maksym Andriushchenko et.al. 2410.09024 null
2024-10-11 Parameter-Efficient Fine-Tuning of State Space Models Kevin Galim et.al. 2410.09016 link
2024-10-11 The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals Xiaofeng Wu et.al. 2410.09013 null
2024-10-11 Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models Hao Li et.al. 2410.09012 link
2024-10-11 Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory Rebecca M. M. Hicke et.al. 2410.08991 link
2024-10-11 Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Jingyu Zhang et.al. 2410.08968 null
2024-10-11 Exploring the Design Space of Cognitive Engagement Techniques with AI-Generated Code for Enhanced Learning Majeed Kazemitabaar et.al. 2410.08922 null
2024-10-11 Utilizing ChatGPT in a Data Structures and Algorithms Course: A Teaching Assistant’s Perspective Pooriya Jamie et.al. 2410.08899 null
2024-10-10 LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts Anh-Quan Cao et.al. 2410.08211 null
2024-10-10 HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation Shanyan Guan et.al. 2410.08192 null
2024-10-10 SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation Hang Yin et.al. 2410.08189 null
2024-10-10 Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs Xiaoyuan Liu et.al. 2410.08145 link
2024-10-10 Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks Mathis Pink et.al. 2410.08133 null
2024-10-10 Think Beyond Size: Dynamic Prompting for More Effective Reasoning Kamesh R et.al. 2410.08130 null
2024-10-10 What Makes Large Language Models Reason in (Multi-Turn) Code Generation? Kunhao Zheng et.al. 2410.08105 null
2024-10-10 Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models Wenting Tan et.al. 2410.08068 link
2024-10-10 Reversible Decoupling Network for Single Image Reflection Removal Hao Zhao et.al. 2410.08063 link
2024-10-10 Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions Inderjeet Nair et.al. 2410.08058 link
2024-10-09 MM-Ego: Towards Building Egocentric Multimodal LLMs Hanrong Ye et.al. 2410.07177 null
2024-10-09 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Fabian Paischer et.al. 2410.07170 link
2024-10-09 AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation Yukang Cao et.al. 2410.07164 null
2024-10-09 InstructG2I: Synthesizing Images from Multimodal Attributed Graphs Bowen Jin et.al. 2410.07157 link
2024-10-09 VHELM: A Holistic Evaluation of Vision Language Models Tony Lee et.al. 2410.07112 link
2024-10-09 I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy Gian Maria Campedelli et.al. 2410.07109 link
2024-10-09 Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context Sangwon Yu et.al. 2410.07103 null
2024-10-09 Robots in the Middle: Evaluating LLMs in Dispute Resolution Jinzhe Tan et.al. 2410.07053 null
2024-10-09 PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness Zekun Wang et.al. 2410.07035 null
2024-10-09 Modeling of the Gamma Ray Burst photospheric emission: Monte Carlo simulation of the GRB prompt emission, numerical results and discussion Amina Trabelsi et.al. 2410.07005 link
2024-10-07 GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting Yukang Cao et.al. 2410.05259 null
2024-10-08 TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models Rabin Adhikari et.al. 2410.05239 link
2024-10-07 Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer Siyuan Hou et.al. 2410.05151 null
2024-10-08 PAMLR: A Passive-Active Multi-Armed Bandit-Based Solution for LoRa Channel Allocation Jihoon Yun et.al. 2410.05147 null
2024-10-07 CR-CTC: Consistency regularization on CTC for improved speech recognition Zengwei Yao et.al. 2410.05101 link
2024-10-07 IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification Yan He et.al. 2410.05100 null
2024-10-07 Human-in-the-loop Reasoning For Traffic Sign Detection: Collaborative Approach Yolo With Video-llava Mehdi Azarafza et.al. 2410.05096 null
2024-10-07 HyperINF: Unleashing the HyperPower of the Schulz’s Method for Data Influence Estimation Xinyu Zhou et.al. 2410.05090 link
2024-10-07 ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Ziru Chen et.al. 2410.05080 null
2024-10-07 Large Language Model Based Multi-Objective Optimization for Integrated Sensing and Communications in UAV Networks Haoyun Li et.al. 2410.05062 null
2024-10-04 Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models Tinghui Zhu et.al. 2410.03659 link
2024-10-04 Conditional Enzyme Generation Using Protein Language Models with Adapters Jason Yang et.al. 2410.03634 null
2024-10-04 Searching for type I seesaw mechanism in a two Heavy Neutral Leptons scenario at FCC-ee Sehar Ajmal et.al. 2410.03615 null
2024-10-04 Understanding Reasoning in Chain-of-Thought from the Hopfieldian View Lijie Hu et.al. 2410.03595 null
2024-10-04 Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models Xin Zou et.al. 2410.03577 link
2024-10-04 Individual vaccination as Nash equilibrium in a SIR model with application to the 2009-10 Influenza A(H1N1) epidemic in France Laetitia Laguzet et.al. 2410.03567 null
2024-10-04 Re-examining Sexism and Misogyny Classification with Annotator Attitudes Aiqi Jiang et.al. 2410.03543 null
2024-10-04 Collaborative and Efficient Personalization with Mixtures of Adaptors Abdulla Jasem Almansoori et.al. 2410.03497 null
2024-10-04 Gradient-based Jailbreak Images for Multimodal Fusion Models Javier Rando et.al. 2410.03489 link
2024-10-04 Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation Tobias Leemann et.al. 2410.03461 null
2024-10-03 Erasing Conceptual Knowledge from Language Models Rohit Gandikota et.al. 2410.02760 link
2024-10-03 Loong: Generating Minute-level Long Videos with Autoregressive Language Models Yuqing Wang et.al. 2410.02757 null
2024-10-03 CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation Han He et.al. 2410.02748 link
2024-10-03 Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization Lei Xu et.al. 2410.02741 link
2024-10-03 Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation Rohin Manvi et.al. 2410.02725 null
2024-10-03 Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization Ryan C. Barron et.al. 2410.02721 null
2024-10-03 HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly Howard Yen et.al. 2410.02694 link
2024-10-03 HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router Lingrui Mei et.al. 2410.02684 link
2024-10-03 DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life Yu Ying Chiu et.al. 2410.02683 null
2024-10-03 Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models Shuoyuan Wang et.al. 2410.02681 null
2024-10-02 DreamGarden: A Designer Assistant for Growing Games from a Single Prompt Sam Earle et.al. 2410.01791 null
2024-10-02 Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models Shayekh Bin Islam et.al. 2410.01782 link
2024-10-02 Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning Xingrui Gu et.al. 2410.01739 null
2024-10-02 LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits Duy Nguyen et.al. 2410.01735 link
2024-10-02 ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Rinon Gal et.al. 2410.01731 null
2024-10-02 Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting Longyu Feng et.al. 2410.01724 null
2024-10-02 Examining the Role of Relationship Alignment in Large Language Models Kristen M. Altenburger et.al. 2410.01708 null
2024-10-02 FactAlign: Long-form Factuality Alignment of Large Language Models Chao-Wei Huang et.al. 2410.01691 link
2024-10-02 Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding Yanming Liu et.al. 2410.01671 null
2024-10-02 Extending Contextual Self-Modulation: Meta-Learning Across Modalities, Task Dimensionalities, and Data Regimes Roussel Desmond Nzoyem et.al. 2410.01655 link
2024-09-30 LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner Xiaopan Zhang et.al. 2409.20560 null
2024-09-30 Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection Yubin Wang et.al. 2409.20558 null
2024-09-30 LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation Ziyao Zhang et.al. 2409.20550 null
2024-09-30 Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models Arpan Mukherjee et.al. 2409.20512 null
2024-09-30 COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models Divyanshu Daiya et.al. 2409.20502 null
2024-09-30 Online Decision Deferral under Budget Constraints Mirabel Reid et.al. 2409.20489 null
2024-10-01 Instance-adaptive Zero-shot Chain-of-Thought Prompting Xiaosong Yuan et.al. 2409.20441 null
2024-09-30 World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering Jiacong Wang et.al. 2409.20424 link
2024-09-30 Superposition of PRS and PDSCH for ISAC System: Spectral Efficiency Enhancement and Range Ambiguity Elimination Keivan Khosroshahi et.al. 2409.20420 null
2024-09-30 Wait, but Tylenol is Acetaminophen… Investigating and Improving Language Models’ Ability to Resist Requests for Misinformation Shan Chen et.al. 2409.20385 null
2024-09-27 ProMerge: Prompt and Merge for Unsupervised Instance Segmentation Dylan Li et.al. 2409.18961 null
2024-09-27 LML: Language Model Learning a Dataset for Data-Augmented Prediction Praneeth Vadlapati et.al. 2409.18957 link
2024-09-27 Improving Visual Object Tracking through Visual Prompting Shih-Fang Chen et.al. 2409.18901 link
2024-09-27 IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation Fan Lin et.al. 2409.18892 link
2024-09-27 LW2G: Learning Whether to Grow for Prompt-based Continual Learning Qian Feng et.al. 2409.18860 link
2024-09-27 Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects Annie Chu et.al. 2409.18847 null
2024-09-27 LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis Hamed Babaei Giglou et.al. 2409.18812 link
2024-09-27 Can AI Enhance its Creativity to Beat Humans ? Anne-Gaëlle Maltese et.al. 2409.18776 null
2024-09-27 Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations James Ford et.al. 2409.18764 null
2024-09-27 Interaction Equivalence Beniamino Accattoli et.al. 2409.18709 null
2024-09-26 EgoLM: Multi-Modal Language Model of Egocentric Motions Fangzhou Hong et.al. 2409.18127 null
2024-09-26 GSON: A Group-based Social Navigation Framework with Large Multimodal Model Shangyi Luo et.al. 2409.18084 null
2024-09-26 Infer Human’s Intentions Before Following Natural Language Instructions Yanming Wan et.al. 2409.18073 link
2024-09-26 Infering Alt-text For UI Icons With Large Language Models During App Development Sabrina Haque et.al. 2409.18060 null
2024-09-26 MARS: Multi-radio Architecture with Radio Selection using Decision Trees for emerging mesoscale CPS/IoT applications Jothi Prasanna Shanmuga Sundaram et.al. 2409.18043 null
2024-09-26 DARE: Diverse Visual Question Answering with Robustness Evaluation Hannah Sterz et.al. 2409.18023 null
2024-09-26 Control Industrial Automation System with Large Language Models Yuchen Xia et.al. 2409.18009 link
2024-09-26 Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation Ashmi Banerjee et.al. 2409.18003 null
2024-09-26 Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models Georg Ahnert et.al. 2409.17990 link
2024-09-26 GRB 240529A: A Tale of Two Shocks Tian-Rui Sun et.al. 2409.17983 null
2024-09-25 Attention Prompting on Image for Large Vision-Language Models Runpeng Yu et.al. 2409.17143 link
2024-09-25 Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset Andrew Goldberg et.al. 2409.17126 null
2024-09-26 Characterizing stable regions in the residual stream of LLMs Jett Janiak et.al. 2409.17113 null
2024-09-25 Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts Mohammad Sadil Khan et.al. 2409.17106 link
2024-09-25 Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation Richard D. Paul et.al. 2409.17085 null
2024-09-25 Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors Aiping Zhang et.al. 2409.17058 link
2024-09-25 GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design Phillip Mueller et.al. 2409.17045 null
2024-09-25 Counterfactual Token Generation in Large Language Models Ivi Chatzi et.al. 2409.17027 link
2024-09-25 AXCEL: Automated eXplainable Consistency Evaluation using LLMs P Aditya Sreekar et.al. 2409.16984 null
2024-09-25 DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling Kyuheon Jung et.al. 2409.16949 link
2024-09-24 Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation Yong Xien Chng et.al. 2409.16278 null
2024-09-24 Second Order Bounds for Contextual Bandits with Function Approximation Aldo Pacchiano et.al. 2409.16197 null
2024-09-24 Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation Xiaohong Liu et.al. 2409.16183 null
2024-09-24 Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering Ziyu Zhao et.al. 2409.16167 null
2024-09-24 Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework Lu Chen et.al. 2409.16146 link
2024-09-24 HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection Yuqi Ma et.al. 2409.16136 null
2024-09-24 Evaluation of state-of-the-art ASR Models in Child-Adult Interactions Aditya Ashvin et.al. 2409.16135 null
2024-09-24 MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents Ming Zhu et.al. 2409.16120 link
2024-09-24 Exploring Hint Generation Approaches in Open-Domain Question Answering Jamshid Mozafari et.al. 2409.16096 link
2024-09-24 MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models Wenhao Yu et.al. 2409.16030 null
2024-09-18 To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Zayne Sprague et.al. 2409.12183 link
2024-09-18 Investigating the effects of precise mass measurements of Ru and Pd isotopes on machine learning mass modeling W. S. Porter et.al. 2409.12141 null
2024-09-18 MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion Kalakonda Sai Shashank et.al. 2409.12140 link
2024-09-18 Self-similar solutions of oscillatory reconnection: parameter study of magnetic field strength and background temperature Luiz A. C. A. Schiavo et.al. 2409.12130 null
2024-09-18 Fully charmed tetraquark production at the LHC experiments Ilia Belov et.al. 2409.12070 null
2024-09-18 Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking Ningyuan Xi et.al. 2409.12059 null
2024-09-19 Using Large Language Models to Generate Clinical Trial Tables and Figures Yumeng Yang et.al. 2409.12046 null
2024-09-18 Mixture of Prompt Learning for Vision Language Models Yu Du et.al. 2409.12011 null
2024-09-18 Ramp reversal memory in bulk crystals of 1T-TaS2 Avital Fried et.al. 2409.11977 null
2024-09-18 Sampling Latent Material-Property Information From LLM-Derived Embedding Representations Luke P. J. Gilligan et.al. 2409.11971 null
2024-09-17 LPT++: Efficient Training on Mixture of Long-tailed Experts Bowen Dong et.al. 2409.11323 null
2024-09-17 MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping Amirreza Fateh et.al. 2409.11316 link
2024-09-17 Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models Divij Gupta et.al. 2409.11302 null
2024-09-17 TISIS : Trajectory Indexing for SImilarity Search Sara Jarrad et.al. 2409.11301 null
2024-09-18 Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling Xinyue Fang et.al. 2409.11283 null
2024-09-17 Machine Learning and Theory Ladenness – A Phenomenological Account Alberto Termine et.al. 2409.11277 null
2024-09-18 The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives Samee Arif et.al. 2409.11261 link
2024-09-17 Norm of Mean Contextualized Embeddings Determines their Variance Hiroaki Yamagiwa et.al. 2409.11253 link
2024-09-17 Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse Maojia Song et.al. 2409.11242 link
2024-09-17 Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection Yuta Kaneko et.al. 2409.11223 null
2024-09-16 Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models Momoko Shiraishi et.al. 2409.10506 null
2024-09-16 Do Pre-trained Vision-Language Models Encode Object States? Kaleb Newman et.al. 2409.10488 null
2024-09-16 Addressing misspecification in contextual optimization Omar Bennouna et.al. 2409.10479 null
2024-09-16 A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration Zhang Zheng et.al. 2409.10403 null
2024-09-16 Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation Hanbo Bi et.al. 2409.10389 null
2024-09-16 On Synthetic Texture Datasets: Challenges, Creation, and Curation Blaine Hoak et.al. 2409.10297 null
2024-09-16 From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs Navya Jain et.al. 2409.10245 null
2024-09-16 Robust Bird’s Eye View Segmentation by Adapting DINOv2 Merve Rabia Barın et.al. 2409.10228 null
2024-09-16 Exploring Quantum Contextuality with the Quantum Moebius-Escher-Penrose hypergraph Mirko Navara et.al. 2409.10179 null
2024-09-17 jina-embeddings-v3: Multilingual Embeddings With Task LoRA Saba Sturua et.al. 2409.10173 null
2024-09-13 Contri(e)ve: Context + Retrieve for Scholarly Question Answering Kanchan Shivashankar et.al. 2409.09010 null
2024-09-13 SynSUM – Synthetic Benchmark with Structured and Unstructured Medical Records Paloma Rabaey et.al. 2409.08936 link
2024-09-13 LLM-based Weak Supervision Framework for Query Intent Classification in Video Search Farnoosh Javadi et.al. 2409.08931 null
2024-09-13 Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers Namita Singh et.al. 2409.08916 null
2024-09-13 Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing Minh-Duc Vu et.al. 2409.08885 null
2024-09-13 Data Efficient Child-Adult Speaker Diarization with Simulated Conversations Anfeng Xu et.al. 2409.08881 link
2024-09-13 InstantDrag: Improving Interactivity in Drag-based Image Editing Joonghyuk Shin et.al. 2409.08857 null
2024-09-13 A RAG Approach for Generating Competency Questions in Ontology Engineering Xueli Pan et.al. 2409.08820 null
2024-09-13 Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR Mingyu Cui et.al. 2409.08797 link
2024-09-13 LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment Huan Zhang et.al. 2409.08795 link
2024-09-12 Click2Mask: Local Editing with Dynamic Mask Generation Omer Regev et.al. 2409.08272 link
2024-09-12 Improving Text-guided Object Inpainting with Semantic Pre-inpainting Yifu Chen et.al. 2409.08260 link
2024-09-12 Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding Hongyu Li et.al. 2409.08251 null
2024-09-12 OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering Jiahao Nick Li et.al. 2409.08250 null
2024-09-12 TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder NaHyeon Park et.al. 2409.08248 link
2024-09-12 LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems Hakan T. Otal et.al. 2409.08234 link
2024-09-12 Exploring Use and Perceptions of Generative AI Art Tools by Blind Artists Gayatri Raman et.al. 2409.08226 null
2024-09-12 AudioBERT: Audio Knowledge Augmented Language Model Hyunjong Ok et.al. 2409.08199 link
2024-09-12 Fine-tuning Large Language Models for Entity Matching Aaron Steiner et.al. 2409.08185 link
2024-09-12 On the Role of Context in Reading Time Prediction Andreas Opedal et.al. 2409.08160 link
2024-09-11 Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation Gavin Butts et.al. 2409.07424 null
2024-09-11 AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge Han Wang et.al. 2409.07394 link
2024-09-11 Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code Khiem Ton et.al. 2409.07368 null
2024-09-11 Enhancing Sequential Music Recommendation with Negative Feedback-informed Contrastive Learning Pavan Seshadri et.al. 2409.07367 null
2024-09-11 PaveSAM Segment Anything for Pavement Distress Neema Jakisa Owor et.al. 2409.07295 null
2024-09-12 Alignment of Diffusion Models: Fundamentals, Challenges, and Future Buhua Liu et.al. 2409.07253 link
2024-09-11 Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning Yingling Lu et.al. 2409.07238 link
2024-09-12 3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents Yingjie Zhou et.al. 2409.07236 link
2024-09-11 Swin-LiteMedSAM: A Lightweight Box-Based Segment Anything Model for Large-Scale Medical Image Datasets Ruochen Gao et.al. 2409.07172 link
2024-09-11 Leveraging Unstructured Text Data for Federated Instruction Tuning of Large Language Models Rui Ye et.al. 2409.07136 null
2024-09-10 E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning Zihan Liao et.al. 2409.06679 null
2024-09-10 SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Teng Hu et.al. 2409.06633 null
2024-09-10 One-Shot Imitation under Mismatched Execution Kushal Kedia et.al. 2409.06615 null
2024-09-10 Simulation-based Scenario Generation for Robust Hybrid AI for Autonomy Hambisa Keno et.al. 2409.06608 null
2024-09-10 Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System Leilei Lin et.al. 2409.06568 link
2024-09-10 ChatGPT’s Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools Ehsan Firouzi et.al. 2409.06561 null
2024-09-10 An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition Yi-Cheng Wang et.al. 2409.06468 null
2024-09-10 Continual Domain Incremental Learning for Privacy-aware Digital Pathology Pratibha Kumari et.al. 2409.06455 null
2024-09-10 Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles Qiujing Lu et.al. 2409.06450 null
2024-09-10 HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data Hossein Hajipour et.al. 2409.06446 link
2024-09-09 Promptable Closed-loop Traffic Simulation Shuhan Tan et.al. 2409.05863 null
2024-09-09 Recognizing molecular chirality via twisted 2D materials Lorenzo Cavicchi et.al. 2409.05839 null
2024-09-09 Are Large Language Models a Threat to Programming Platforms? An Exploratory Study Md Mustakim Billah et.al. 2409.05824 null
2024-09-09 Leveraging Object Priors for Point Tracking Bikram Boote et.al. 2409.05786 link
2024-09-09 A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System B. Sankar et.al. 2409.05747 null
2024-09-09 What Did My Car Say? Autonomous Vehicle Explanation Errors, Context, and Personal Traits Impact Comfort, Reliance, Satisfaction, and Driving Confidence Robert Kaufman et.al. 2409.05731 null
2024-09-09 Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling Sara Ferro et.al. 2409.05699 null
2024-09-09 SX-Stitch: An Efficient VMS-UNet Based Framework for Intraoperative Scoliosis X-Ray Image Stitching Yi Li et.al. 2409.05681 null
2024-09-09 Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models Aakash Sen Sharma et.al. 2409.05668 null
2024-09-09 DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification Junzhou Chen et.al. 2409.05587 null
2024-09-06 Question-Answering Dense Video Events Hangyu Qin et.al. 2409.04388 null
2024-09-06 J/$ψ$-hadron correlations at midrapidity in pp collisions at $\sqrt{s}$ = 13 TeV ALICE Collaboration et.al. 2409.04364 null
2024-09-06 Connectivity-Inspired Network for Context-Aware Recognition Gianluca Carloni et.al. 2409.04360 link
2024-09-06 First studies on cascaded dual-phase liquid hole-multipliers in xenon G. Martinez-Lema et.al. 2409.04338 null
2024-09-06 Active learning for regression in engineering populations: A risk-informed approach Daniel R. Clarkson et.al. 2409.04328 null
2024-09-06 Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs Aliakbar Nafar et.al. 2409.04318 link
2024-09-06 FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning Yunhao Bai et.al. 2409.04298 link
2024-09-06 Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets Desiree Heim et.al. 2409.04286 null
2024-09-06 An overview of domain-specific foundation model: key technologies, applications and challenges Haolong Chen et.al. 2409.04267 null
2024-09-06 FPT Algorithms using Minimal Parameters for a Generalized Version of Maximin Shares Klaus Jansen et.al. 2409.04225 null
2024-09-05 LLM-CI: Assessing Contextual Integrity Norms in Language Models Yan Shvartzshnaider et.al. 2409.03735 null
2024-09-06 RAG based Question-Answering for Contextual Response Prediction System Sriram Veturi et.al. 2409.03708 null
2024-09-06 LLM-based multi-agent poetry generation in non-cooperative environments Ran Zhang et.al. 2409.03659 link
2024-09-05 Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers Amit Ben Artzy et.al. 2409.03621 link
2024-09-05 Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration Pei Wang et.al. 2409.03455 null
2024-09-05 Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities Wei Lu et.al. 2409.03444 link
2024-09-05 Leveraging Large Language Models through Natural Language Processing to provide interpretable Machine Learning predictions of mental deterioration in real time Francisco de Arriba-Pérez et.al. 2409.03375 null
2024-09-05 TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation Shahzaib Iqbal et.al. 2409.03367 null
2024-09-05 Sketch: A Toolkit for Streamlining LLM Operations Xin Jiang et.al. 2409.03346 null
2024-09-05 N-gram Prediction and Word Difference Representations for Language Modeling DongNyeong Heo et.al. 2409.03295 null
2024-09-04 HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts Xinyu Liu et.al. 2409.02919 link
2024-09-04 Building a Scalable, Effective, and Steerable Search and Ranking Platform Marjan Celikik et.al. 2409.02856 null
2024-09-04 Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model Tornike Karchkhadze et.al. 2409.02845 null
2024-09-04 MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Xiang Yue et.al. 2409.02813 null
2024-09-04 Non-Orthogonal Multiple-Access Strategies for Direct-to-Satellite IoT Networks Felipe Augusto Tondo et.al. 2409.02748 null
2024-09-04 Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection Kaiqing Lin et.al. 2409.02664 null
2024-09-04 PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation Jun Ling et.al. 2409.02657 null
2024-09-04 Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects Kyungmin Jo et.al. 2409.02653 null
2024-09-04 Mamba as a motion encoder for robotic imitation learning Toshiaki Tsuji et.al. 2409.02636 null
2024-09-04 PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation Aneta Pawelec et.al. 2409.02617 null
2024-08-30 DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model Mona Sheikh Zeinoddin et.al. 2408.17433 link
2024-08-30 CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models Jonathan Bourne et.al. 2408.17428 link
2024-09-03 Open-vocabulary Temporal Action Localization using VLMs Naoki Wake et.al. 2408.17422 null
2024-08-30 MoRe Fine-Tuning with 10x Fewer Parameters Wenxuan Tan et.al. 2408.17383 link
2024-08-30 Efficient Multi-task Prompt Tuning for Recommendation Ting Bai et.al. 2408.17214 null
2024-08-30 NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar Runwei Guan et.al. 2408.17207 null
2024-08-30 Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study Shubham Agarwal et.al. 2408.17181 null
2024-08-30 Wireless Integrated Authenticated Communication System (WIA-Comm) Amith N Bharadwaj et.al. 2408.17112 null
2024-08-30 Understanding the User: An Intent-Based Ranking Dataset Abhijit Anand et.al. 2408.17103 null
2024-08-30 Reasoning AI Performance Degradation in 6G Networks with Large Language Models Liming Huang et.al. 2408.17097 null
2024-08-29 PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning Noor Hussein et.al. 2408.16769 link
2024-08-29 SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Ziyu Guo et.al. 2408.16768 link
2024-08-29 ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Fangfu Liu et.al. 2408.16767 null
2024-08-29 An algebraic characterisation of Kochen-Specker contextuality Markus Frembs et.al. 2408.16764 null
2024-08-29 Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge Beidi Dong et.al. 2408.16749 null
2024-08-29 GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models Moreno D’Incà et.al. 2408.16700 link
2024-08-29 Iterative Graph Alignment Fangyuan Yu et.al. 2408.16667 link
2024-08-29 LLMs generate structurally realistic social networks but overestimate political homophily Serina Chang et.al. 2408.16629 link
2024-08-29 WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Shengpeng Ji et.al. 2408.16532 link
2024-08-29 UAV-Based Human Body Detector Selection and Fusion for Geolocated Saliency Map Generation Piotr Rudol et.al. 2408.16501 null
2024-08-29 Spatio-Temporal Context Prompting for Zero-Shot Action Detection Wei-Jhe Huang et.al. 2408.15996 null
2024-08-28 TEDRA: Text-based Editing of Dynamic and Photoreal Actors Basavaraj Sunagad et.al. 2408.15995 null
2024-08-28 Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration Xu Zhang et.al. 2408.15994 null
2024-08-28 In-Context Imitation Learning via Next-Token Prediction Letian Fu et.al. 2408.15980 link
2024-08-28 Fall Detection for Smart Living using YOLOv5 Gracile Astlin Pereira et.al. 2408.15955 null
2024-08-28 Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games Nicholas R. Waytowich et.al. 2408.15950 null
2024-08-28 Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Yuncheng Yang et.al. 2408.15915 link
2024-08-28 CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization Feize Wu et.al. 2408.15914 null
2024-08-28 Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models Sebastian Vallejo Vera et.al. 2408.15895 null
2024-08-28 Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation Shaofei Huang et.al. 2408.15876 link
2024-08-27 SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images Zafer Yildiz et.al. 2408.15224 link
2024-08-27 LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet Nathaniel Li et.al. 2408.15221 null
2024-08-27 Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation Jian Hu et.al. 2408.15205 link
2024-08-27 On the parameterized complexity of computing good edge-labelings Davi de Andrade et.al. 2408.15181 null
2024-08-27 A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships Gracile Astlin Pereira et.al. 2408.15178 null
2024-08-27 X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation Hanjia Lyu et.al. 2408.15172 null
2024-08-28 Urdu Digital Text Word Optical Character Recognition Using Permuted Auto Regressive Sequence Modeling Ahmed Mustafa et.al. 2408.15119 null
2024-08-27 CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP Zhenchen Tang et.al. 2408.15098 null
2024-08-27 MiWaves Reinforcement Learning Algorithm Susobhan Ghosh et.al. 2408.15076 link
2024-08-28 Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance Kunpeng Wang et.al. 2408.15063 link
2024-08-27 Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models Aradhye Agarwal et.al. 2408.14470 link
2024-08-26 Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study Liuchang Xu Shuo Zhao et.al. 2408.14438 null
2024-08-26 Social perception of faces in a vision-language model Carina I. Hausladen et.al. 2408.14435 link
2024-08-26 Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications Luyue Xu et.al. 2408.14432 null
2024-08-26 Reprogramming Foundational Large Language Models (LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning Sakhinana Sagar Srinivas et.al. 2408.14387 null
2024-08-26 ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty Xindi Wu et.al. 2408.14339 null
2024-08-26 Claim Verification in the Age of Large Language Models: A Survey Alphaeus Dmonte et.al. 2408.14317 null
2024-08-27 Text3DAug – Prompted Instance Augmentation for LiDAR Perception Laurenz Reichardt et.al. 2408.14253 link
2024-08-27 SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Trung Dao et.al. 2408.14176 link
2024-08-26 Contrastive Learning Subspace for Text Clustering Qian Yong et.al. 2408.14119 null
2024-08-23 Domain-specific long text classification from sparse relevant information Célia D’Cruz et.al. 2408.13253 null
2024-08-23 LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation Shuai Yang et.al. 2408.13252 null
2024-08-23 CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities Tao Wu et.al. 2408.13239 link
2024-08-23 Enhancing Few-Shot Transfer Learning with Optimized Multi-Task Prompt Tuning through Modular Prompt Composition Ahmad Pouramini et.al. 2408.13227 null
2024-08-23 Polarization Measurement of Gamma-ray Bursts with Fermi-GBM: The Case of GRB 180720B P. Veres et.al. 2408.13199 null
2024-08-23 Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning Hourui Deng et.al. 2408.13184 null
2024-08-23 Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation Bonan Li et.al. 2408.13149 null
2024-08-23 SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks Kai-Wei Chang et.al. 2408.13040 null
2024-08-23 Indoor scene recognition from images under visual corruptions Willams de Lima Costa et.al. 2408.13029 null
2024-08-23 A Web-Based Solution for Federated Learning with LLM-Based Automation Chamith Mawela et.al. 2408.13010 null
2024-08-22 Controllable Text Generation for Large Language Models: A Survey Xun Liang et.al. 2408.12599 link
2024-08-23 Non-Homophilic Graph Pre-Training and Prompt Learning Xingtong Yu et.al. 2408.12594 link
2024-08-22 Contextual Stochastic Optimization for School Desegregation Policymaking Hongzhao Guan et.al. 2408.12572 null
2024-08-22 Towards Evaluating and Building Versatile Large Language Models for Medicine Chaoyi Wu et.al. 2408.12547 link
2024-08-22 Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition Bozheng Li et.al. 2408.12475 null
2024-08-22 DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems Jiaju Chen et.al. 2408.12470 link
2024-08-22 FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing Jue Wang et.al. 2408.12429 link
2024-08-22 Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce Ádám Tibor Czapp et.al. 2408.12392 null
2024-08-22 Orbits of Binary Stars: from Visual Measures to Speckle Interferometry Andrei Tokovinin et.al. 2408.12376 null
2024-08-23 RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering Pratyush Kumar et.al. 2408.12369 link
2024-08-21 NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation Zhenye Lou et.al. 2408.11787 link
2024-08-21 Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards Omar Erak et.al. 2408.11775 link
2024-08-21 D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models M. Forlini et.al. 2408.11761 null
2024-08-21 MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs Yulin Ren et.al. 2408.11758 link
2024-08-21 FocusLLM: Scaling LLM’s Context by Parallel Decoding Zhenyu Li et.al. 2408.11745 link
2024-08-21 JieHua Paintings Style Feature Extracting Model using Stable Diffusion with ControlNet Yujia Gu et.al. 2408.11744 null
2024-08-21 CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering Yuliang Cai et.al. 2408.11742 link
2024-08-22 LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites Zachariah Sollenberger et.al. 2408.11729 null
2024-08-21 Efficient Detection of Toxic Prompts in Large Language Models Yi Liu et.al. 2408.11727 null
2024-08-21 Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests Amirhossein Deljouyi et.al. 2408.11710 link
2024-08-20 Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement Satoshi Kosugi et.al. 2408.11055 link
2024-08-20 Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks Nathaniel Pinckney et.al. 2408.11053 link
2024-08-20 Multiple Topology Replica Exchange of Expanded Ensembles (MT-REXEE) for Multidimensional Alchemical Calculations Anika J. Friedman et.al. 2408.11038 link
2024-08-20 An Overlooked Role of Context-Sensitive Dendrites Mohsin Raza et.al. 2408.11019 null
2024-08-20 Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter Farhanul Haque et.al. 2408.10955 null
2024-08-20 The Evolution of Reinforcement Learning in Quantitative Finance Nikolaos Pippas et.al. 2408.10932 null
2024-08-20 CHECKWHY: Causal Fact Verification via Argument Structure Jiasheng Si et.al. 2408.10918 link
2024-08-21 BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model Yeyong Yu et.al. 2408.10903 link
2024-08-20 DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection Xinqi Su et.al. 2408.10883 link
2024-08-20 Manifold Transform by Recurrent Cortical Circuit Enhances Robust Encoding of Familiar Stimuli Weifan Wang et.al. 2408.10873 null
2024-08-19 SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models Anke Tang et.al. 2408.10174 link
2024-08-19 Customizing Language Models with Instance-wise LoRA for Sequential Recommendation Xiaoyu Kong et.al. 2408.10159 link
2024-08-19 In-Context Learning with Representations: Contextual Generalization of Trained Transformers Tong Yang et.al. 2408.10147 null
2024-08-19 Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models Tianyu Zhang et.al. 2408.10124 link
2024-08-19 FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant Zhengchao Huang et.al. 2408.10072 link
2024-08-19 Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development Yuncheng Jiang et.al. 2408.10067 null
2024-08-19 Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory Haoran Li et.al. 2408.10053 null
2024-08-19 Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype Yadong Lu et.al. 2408.09984 null
2024-08-20 Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM’s Structured Questions for National Teacher Certification Exams Ling He et.al. 2408.09982 null
2024-08-19 Contextual Importance and Utility in Python: New Functionality and Insights with the py-ciu Package Kary Främling et.al. 2408.09957 link
2024-08-19 PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars Sumanth Prabhu et.al. 2408.08869 null
2024-08-16 Visual Agents as Fast and Slow Thinkers Guangyan Sun et.al. 2408.08862 link
2024-08-16 Revisiting the propagation of highly-energetic gamma rays in the Galaxy Gaetano Di Marco et.al. 2408.08818 null
2024-08-16 CIKMar: A Dual-Encoder Approach to Prompt-Based Reranking in Educational Dialogue Systems Joanito Agili Lopo et.al. 2408.08805 null
2024-08-16 Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification Abdullah Al Imran et.al. 2408.08803 null
2024-08-16 Neighbor Overlay-Induced Graph Attention Network Tiqiao Wei et.al. 2408.08788 null
2024-08-16 Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions Bhuvanashree Murugadoss et.al. 2408.08781 null
2024-08-16 Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions Chenming Tang et.al. 2408.08780 null
2024-08-16 Watching the Generative AI Hype Bubble Deflate David Gray Widder et.al. 2408.08778 null
2024-08-16 Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused Dingwei Chen et.al. 2408.08769 null
2024-08-15 SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training Gengwei Zhang et.al. 2408.08295 link
2024-08-15 Heavy Labels Out! Dataset Distillation with Label Space Lightening Ruonan Yu et.al. 2408.08201 null
2024-08-15 “I Try to Represent Myself as I Am”: Self-Presentation Preferences of People with Invisible Disabilities through Embodied Social VR Avatars Ria J. Gualano et.al. 2408.08193 null
2024-08-16 Beyond Full Label: Single-Point Prompt for Infrared Small Target Label Generation Shuai Yuan et.al. 2408.08191 link
2024-08-16 FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance Jiasong Feng et.al. 2408.08189 null
2024-08-15 Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion Adi Haviv et.al. 2408.08184 null
2024-08-15 EmBARDiment: an Embodied AI Agent for Productivity in XR Riccardo Bovo et.al. 2408.08158 null
2024-08-15 P/D-Serve: Serving Disaggregated Large Language Model at Scale Yibo Jin et.al. 2408.08147 null
2024-08-15 MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU Yan Li et.al. 2408.08144 null
2024-08-15 Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification Levente Murgás et.al. 2408.08126 link
2024-08-14 Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques Ivory Yang et.al. 2408.07676 null
2024-08-14 See It All: Contextualized Late Aggregation for 3D Dense Captioning Minjung Kim et.al. 2408.07648 null
2024-08-14 Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach Shizhou Zhang et.al. 2408.07500 link
2024-08-14 DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency Xiaojing Zhong et.al. 2408.07481 null
2024-08-14 Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification Yongcheng Li et.al. 2408.07467 link
2024-08-14 Large Language Models Prompting With Episodic Memory Dai Do et.al. 2408.07465 null
2024-08-15 BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning Asif Hanif et.al. 2408.07440 link
2024-08-14 Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator Federico Nicolas Peccia et.al. 2408.07404 null
2024-08-14 A Quantum-Inspired Analysis of Human Disambiguation Processes Daphne Wang et.al. 2408.07402 null
2024-08-14 Segment Using Just One Example Pratik Vora et.al. 2408.07393 null
2024-08-13 Categorical Framework for Typed Extensional and Intensional Models in Formal Semantics Daniel Quigley et.al. 2408.07058 null
2024-08-13 TableGuard – Securing Structured & Unstructured Data Anantha Sharma et.al. 2408.07045 null
2024-08-13 Imagen 3 Imagen-Team-Google et.al. 2408.07009 null
2024-08-13 Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models Chun Jie Chong et.al. 2408.07004 null
2024-08-13 Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2 Osher Rafaeli et.al. 2408.06970 null
2024-08-13 Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas Louis Kwok et.al. 2408.06929 link
2024-08-13 SceneGPT: A Language Model for 3D Scene Understanding Shivam Chandhok et.al. 2408.06926 null
2024-08-13 New refinements of Narayana polynomials and Motzkin polynomials Janet J. W. Dong et.al. 2408.06912 null
2024-08-13 Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives Zhihu Wang et.al. 2408.06904 null
2024-08-13 Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media Pranav Venkatesh et.al. 2408.06900 null
2024-08-12 Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks Lucas Félix et.al. 2408.06341 null
2024-08-12 LOLgorithm: Integrating Semantic, Syntactic and Contextual Elements for Humor Classification Tanisha Khurana et.al. 2408.06335 null
2024-08-12 Animate, or Inanimate, That is the Question for Large Language Models Leonardo Ranaldi et.al. 2408.06332 null
2024-08-12 Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let’s Take TravelPlanner as an Example Yanan Chen et.al. 2408.06318 null
2024-08-12 From SAM to SAM 2: Exploring Improvements in Meta’s Segment Anything Model Athulya Sundaresan Geetha et.al. 2408.06305 null
2024-08-12 Long-Form Answers to Visual Questions from Blind and Low Vision People Mina Huh et.al. 2408.06303 null
2024-08-12 Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM Trisha Das et.al. 2408.06285 null
2024-08-12 Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning Yingjin Song et.al. 2408.06259 null
2024-08-12 Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical Images Siladittya Manna et.al. 2408.06235 null
2024-08-12 Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting Halley Young et.al. 2408.06186 null
2024-08-09 Multi-Garment Customized Model Generation Yichen Liu et.al. 2408.05206 null
2024-08-09 Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners Michael Vaccaro Jr et.al. 2408.05204 null
2024-08-09 TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning Yujie Feng et.al. 2408.05200 link
2024-08-09 ECG-FM: An Open Electrocardiogram Foundation Model Kaden McKeen et.al. 2408.05178 link
2024-08-09 AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset Pritam Deka et.al. 2408.05149 null
2024-08-09 How Well Do LLMs Identify Cultural Unity in Diversity? Jialin Li et.al. 2408.05102 link
2024-08-09 Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts Tingchen Fu et.al. 2408.05094 null
2024-08-09 Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models Zikai Xie et.al. 2408.05093 link
2024-08-09 Generating novel experimental hypotheses from language models: A case study on cross-dative generalization Kanishka Misra et.al. 2408.05086 link
2024-08-09 SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation Da Mu et.al. 2408.05057 null
2024-08-08 SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation Jieming Yu et.al. 2408.04593 null
2024-08-08 SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals Haoran Zheng et.al. 2408.04575 null
2024-08-08 Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User’s Casual Sketches Yongzhi Xu et.al. 2408.04567 null
2024-08-08 Conversational Prompt Engineering Liat Ein-Dor et.al. 2408.04560 null
2024-08-08 Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models Yupeng Chang et.al. 2408.04556 link
2024-08-08 Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models Fabio Pernisi et.al. 2408.04522 null
2024-08-08 Model-Based Transfer Learning for Contextual Reinforcement Learning Jung-Hoon Cho et.al. 2408.04498 link
2024-08-08 What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant Jonan Richards et.al. 2408.04477 null
2024-08-09 Achieving Robust Data-driven Contextual Decision Making in a Data Augmentation Way Zhaoen Li et.al. 2408.04469 null
2024-08-08 Modelling Probabilistic FPC in Guarded Type Theory Philipp Jan Andries Stassen et.al. 2408.04455 null
2024-08-07 SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature Vinícius Di Oliveira et.al. 2408.03936 null
2024-08-07 FMiFood: Multi-modal Contrastive Learning for Food Image Classification Xinyue Pan et.al. 2408.03922 null
2024-08-07 CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases Xiangyan Liu et.al. 2408.03910 link
2024-08-07 Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models Shachi H Kumar et.al. 2408.03907 null
2024-08-07 Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond Beomseok Lee et.al. 2408.03900 link
2024-08-07 BeeManc at the PLABA Track of TAC-2023: Investigating LLMs and Controllable Attributes for Improving Biomedical Text Readability Zihao Li et.al. 2408.03871 link
2024-08-07 GAIA – A Large Language Model for Advanced Power Dispatch Yuheng Cheng et.al. 2408.03847 null
2024-08-07 WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models Prannaya Gupta et.al. 2408.03837 link
2024-08-07 Target Prompting for Information Extraction with Vision Language Model Dipankar Medhi et.al. 2408.03834 null
2024-08-07 Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring Zifan Wang et.al. 2408.03811 null
2024-08-06 Training LLMs to Recognize Hedges in Spontaneous Narratives Amie J. Paige et.al. 2408.03319 link
2024-08-06 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Charlie Snell et.al. 2408.03314 null
2024-08-06 MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation Xiaofeng Mao et.al. 2408.03312 null
2024-08-06 A search for soft X-ray emission lines in the afterglow spectrum of GRB 221009A Sergio Campana et.al. 2408.03306 null
2024-08-06 SARA: Singular-Value Based Adaptive Low-Rank Adaption Jihao Gu et.al. 2408.03290 null
2024-08-06 Biomedical SAM 2: Segment Anything in Biomedical Images and Videos Zhiling Yan et.al. 2408.03286 link
2024-08-06 Synthesizing Text-to-SQL Data from Weak and Strong LLMs Jiaxi Yang et.al. 2408.03256 null
2024-08-06 Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons Yifei Wang et.al. 2408.03247 link
2024-08-06 Making Long-Context Language Models Better Multi-Hop Reasoners Yanyang Li et.al. 2408.03246 link
2024-08-07 Red Type-1 Quasars after Cosmic Noon and Impact on $L_{\rm UV}$-related Quasar Statistics Yongjung Kim et.al. 2408.03228 null
2024-08-05 Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? Mohammad Bahrami Karkevandi et.al. 2408.02651 null
2024-08-05 SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models Muxi Diao et.al. 2408.02632 null
2024-08-05 Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection Sajal Aggarwal et.al. 2408.02595 null
2024-08-05 The Role of Functional Muscle Networks in Improving Hand Gesture Perception for Human-Machine Interfaces Costanza Armanini et.al. 2408.02547 null
2024-08-05 Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph Zhao Kaichen et.al. 2408.02535 null
2024-08-05 Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation Aaron Imani et.al. 2408.02502 null
2024-08-05 Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection Ting Lei et.al. 2408.02484 link
2024-08-05 TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments Daeun Song et.al. 2408.02454 null
2024-08-05 FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification Yijin Huang et.al. 2408.02426 link
2024-08-05 Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models Zi Liang et.al. 2408.02416 link
2024-08-02 Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting Xiangyu Zhao et.al. 2408.01423 null
2024-08-02 Mission Impossible: A Statistical Perspective on Jailbreaking LLMs Jingtong Su et.al. 2408.01420 null
2024-08-02 Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs Yilun Hua et.al. 2408.01417 null
2024-08-02 Conditional LoRA Parameter Generation Xiaolong Jin et.al. 2408.01415 null
2024-08-02 Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer Yu Yang et.al. 2408.01402 null
2024-08-02 Transformers are Universal In-context Learners Takashi Furuya et.al. 2408.01367 null
2024-08-02 MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code Kaiwen Ning et.al. 2408.01354 link
2024-08-02 Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks Anders Giovanni Møller et.al. 2408.01346 null
2024-08-02 Synergistic pathways of modulation enable robust task packing within neural dynamics Giacomo Vedovati et.al. 2408.01316 null
2024-08-02 TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling Dong Huo et.al. 2408.01291 null
2024-08-01 Segment anything model 2: an application to 2D and 3D medical images Haoyu Dong et.al. 2408.00756 link
2024-08-01 Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Benlin Liu et.al. 2408.00754 null
2024-08-01 Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions Guangzhi Xiong et.al. 2408.00727 link
2024-08-01 Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM Xiaofeng Liu et.al. 2408.00706 null
2024-08-01 Can Developers Prompt? A Controlled Experiment for Code Documentation Generation Hans-Alexander Kruse et.al. 2408.00686 null
2024-08-01 Quantum Order by Disorder: A Key to Understanding the Magnetic Phases of BaCo$_2$(AsO$_4$)$_2$ Sangyun Lee et.al. 2408.00622 null
2024-08-01 Mitigating Multilingual Hallucination in Large Vision-Language Models Xiaoye Qu et.al. 2408.00550 link
2024-08-01 Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model Felipe Mahlow et.al. 2408.00544 null
2024-08-01 Jailbreaking Text-to-Image Models with LLM-Based Agents Yingkai Dong et.al. 2408.00523 null
2024-08-01 A new approach for encoding code and assisting code understanding Mengdan Fan et.al. 2408.00521 null
2024-07-31 Vision-Language Model Based Handwriting Verification Mihir Chauhan et.al. 2407.21788 null
2024-07-31 Tulip Agent – Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries Felix Ocker et.al. 2407.21778 null
2024-07-31 Ge-based Clinopyroxene series: first principles and experimental local probe study Ricardo P. Moreira et.al. 2407.21749 null
2024-07-31 A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation Mothilal Asokan et.al. 2407.21739 null
2024-07-31 Detecting, Explaining, and Mitigating Memorization in Diffusion Models Yuxin Wen et.al. 2407.21720 link
2024-07-31 Hyper-parameter tuning for text guided image editing Shiwen Zhang et.al. 2407.21703 link
2024-07-31 Four-loop two-mass tadpoles and the $ρ$ parameter Samuel Abreu et.al. 2407.21700 null
2024-07-31 Kramers-Kronig relations via Laplace formalism and $L^1$ integrability Marco Prevedelli et.al. 2407.21694 null
2024-07-31 MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment Anurag Das et.al. 2407.21654 null
2024-07-31 MSA2Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation Sina Ghorbani Kolahi et.al. 2407.21640 link
2024-07-30 Add-SD: Rational Generation without Manual Reference Lingfeng Yang et.al. 2407.21016 link
2024-07-30 CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning Yuexi Du et.al. 2407.21011 link
2024-07-30 AI-Assisted Generation of Difficult Math Questions Vedant Shah et.al. 2407.21009 link
2024-07-30 Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection Jinfa Huang et.al. 2407.21004 link
2024-07-30 From Feature Importance to Natural Language Explanations Using LLMs with RAG Sule Tekkesinoglu et.al. 2407.20990 link
2024-07-30 UniProcessor: A Text-induced Unified Low-level Image Processor Huiyu Duan et.al. 2407.20928 link
2024-07-30 SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition Hao Tan et.al. 2407.20920 null
2024-07-30 Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation Pujan Paudel et.al. 2407.20910 null
2024-07-30 ThinkRepair: Self-Directed Automated Program Repair Xin Yin et.al. 2407.20898 link
2024-07-30 Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations Sarthak Anand et.al. 2407.20856 null
2024-07-29 QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval Hongming Tan et.al. 2407.20207 null
2024-07-29 Deciphering the Instability of the Black Hole Ringdown Quasinormal Spectrum A. Ianniccari et.al. 2407.20144 null
2024-07-29 Context-Aware CSI Tracking and Path Loss Prediction Using Machine Learning and Dynamical Systems Anis Hamadouche et.al. 2407.20123 null
2024-07-29 Generative Diffusion Model Bootstraps Zero-shot Classification of Fetal Ultrasound Images In Underrepresented African Populations Fangyijie Wang et.al. 2407.20072 link
2024-07-29 Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models Zhe Li et.al. 2407.20053 null
2024-07-29 Reproducibility Study of “ITI-GEN: Inclusive Text-to-Image Generation” Daniel Gallo Fernández et.al. 2407.19996 link
2024-07-29 A Study on the Implementation Method of an Agent-Based Advanced RAG System Using Graph Cheonsu Jeong et.al. 2407.19994 null
2024-07-29 MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion Chencan Fu et.al. 2407.19976 null
2024-07-29 FedDEO: Description-Enhanced One-Shot Federated Learning with Diffusion Models Mingzhao Yang et.al. 2407.19953 null
2024-07-29 FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Yu Lu et.al. 2407.19918 null
2024-07-26 Small Molecule Optimization with Large Language Models Philipp Guevorguian et.al. 2407.18897 link
2024-07-26 The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs Aleix Sant et.al. 2407.18786 null
2024-07-26 TESSILATOR: a one-stop shop for measuring TESS rotation periods A. S. Binks et.al. 2407.18761 link
2024-07-29 Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery Yuni Susanti et.al. 2407.18752 link
2024-07-26 Towards Generalized Offensive Language Identification Alphaeus Dmonte et.al. 2407.18738 null
2024-07-26 Neurosymbolic AI for Enhancing Instructability in Generative AI Amit Sheth et.al. 2407.18722 null
2024-07-26 Probing exotic long-lived particles from the prompt side using the CONTUR method Louie Corpe et.al. 2407.18710 null
2024-07-26 Dilated Strip Attention Network for Image Restoration Fangwei Hao et.al. 2407.18613 null
2024-07-26 Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation Chaoyi Ai et.al. 2407.18562 null
2024-07-26 A Universal Prompting Strategy for Extracting Process Model Information from Natural Language Text using Large Language Models Julian Neuberger et.al. 2407.18540 link
2024-07-25 LoRA-Pro: Are Low-Rank Adapters Properly Optimized? Zhengbo Wang et.al. 2407.18242 link
2024-07-26 Recursive Introspection: Teaching Language Model Agents How to Self-Improve Yuxiao Qu et.al. 2407.18219 null
2024-07-26 Exploring Scaling Trends in LLM Robustness Nikolaus Howe et.al. 2407.18213 link
2024-07-26 Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation Razieh Azizi et.al. 2407.18195 null
2024-07-25 Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning Sindhura Kommu et.al. 2407.18181 null
2024-07-25 Efficient Inference of Vision Instruction-Following Models with Elastic Cache Zuyan Liu et.al. 2407.18121 link
2024-07-25 Keypoint Promptable Re-Identification Vladimir Somers et.al. 2407.18112 link
2024-07-25 DINOv2 Rocks Geological Image Analysis: Classification, Segmentation, and Interpretability Florent Brondolo et.al. 2407.18100 link
2024-07-25 C2P: Featuring Large Language Models with Causal Reasoning Abdolmahdi Bagheri et.al. 2407.18069 null
2024-07-25 I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition Yannis Vasilakis et.al. 2407.18058 link
2024-07-24 WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries Wenting Zhao et.al. 2407.17468 null
2024-07-24 Fluent Student-Teacher Redteaming T. Ben Thompson et.al. 2407.17447 link
2024-07-24 Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Michael-Andrei Panaitescu-Liess et.al. 2407.17417 null
2024-07-24 (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork Tianjin Huang et.al. 2407.17412 null
2024-07-24 PERSONA: A Reproducible Testbed for Pluralistic Alignment Louis Castricato et.al. 2407.17387 null
2024-07-24 ViPer: Visual Personalization of Generative Models via Individual Preference Learning Sogand Salehi et.al. 2407.17365 null
2024-07-24 DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation Qian Feng et.al. 2407.17348 null
2024-07-24 How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations? Leo Yu-Ho Lo et.al. 2407.17291 null
2024-07-24 A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks Fabiano Belém et.al. 2407.17284 null
2024-07-25 LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model Wanggong Yang et.al. 2407.17229 null
2024-07-23 Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions Fabio Tosi et.al. 2407.16698 link
2024-07-23 Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack Xiaoyue Xu et.al. 2407.16695 link
2024-07-23 Can Large Language Models Automatically Jailbreak GPT-4V? Yuanwei Wu et.al. 2407.16686 null
2024-07-23 SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation Pengfei Chen et.al. 2407.16682 null
2024-07-23 RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent Huiyu Xu et.al. 2407.16667 null
2024-07-23 Lawma: The Power of Specialization for Legal Tasks Ricardo Dominguez-Olmedo et.al. 2407.16615 null
2024-07-23 Shared Imagination: LLMs Hallucinate Alike Yilun Zhou et.al. 2407.16604 null
2024-07-23 Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs Yifan Xia et.al. 2407.16576 null
2024-07-24 Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning Fang-Duo Tsai et.al. 2407.16564 link
2024-07-23 Patched RTC: evaluating LLMs for diverse software development tasks Asankhaya Sharma et.al. 2407.16557 link
2024-07-22 AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description Junyu Xie et.al. 2407.15850 link
2024-07-22 LLMmap: Fingerprinting For Large Language Models Dario Pasquini et.al. 2407.15847 link
2024-07-22 HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning Eugene Valassakis et.al. 2407.15844 null
2024-07-22 Artist: Aesthetically Controllable Text-Driven Stylization without Training Ruixiang Jiang et.al. 2407.15842 link
2024-07-22 Inequalities in Computational Thinking Among Incoming Students in an STEM Chilean University Felipe González-Pizarro et.al. 2407.15833 null
2024-07-23 Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy’s Properties Shao-Yu Fu et.al. 2407.15824 null
2024-07-22 Robust Facial Reactions Generation: An Emotion-Aware Framework with Modality Compensation Guanyu Hu et.al. 2407.15798 null
2024-07-22 AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection Yunkang Cao et.al. 2407.15795 link
2024-07-22 CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning Emanuele Frascaroli et.al. 2407.15793 link
2024-07-22 Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach Rian Dolphin et.al. 2407.15788 null
2024-07-19 T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation Kaiyue Sun et.al. 2407.14505 link
2024-07-19 M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models Seunggeun Chi et.al. 2407.14502 null
2024-07-19 Evaluating the Reliability of Self-Explanations in Large Language Models Korbinian Randl et.al. 2407.14487 link
2024-07-19 ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities Peng Xu et.al. 2407.14482 null
2024-07-19 Contrastive Learning with Counterfactual Explanations for Radiology Report Generation Mingjie Li et.al. 2407.14474 null
2024-07-19 AttentNet: Fully Convolutional 3D Attention for Lung Nodule Detection Majedaldein Almahasneh et.al. 2407.14464 null
2024-07-19 From Instruction to Insight: Exploring the Functional and Semantic Roles of Text in Interactive Dashboards Nicole Sultanum et.al. 2407.14451 null
2024-07-19 Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model Seonghui Min et.al. 2407.14434 null
2024-07-19 Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models Hyun-Jic Oh et.al. 2407.14426 null
2024-07-19 Improving Retrieval in Sponsored Search by Leveraging Query Context Signals Akash Kumar Mohankumar et.al. 2407.14346 null
2024-07-18 Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Boyang Deng et.al. 2407.13759 null
2024-07-18 LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation David Schlangen et.al. 2407.13744 null
2024-07-18 HazeCLIP: Towards Language Guided Real-World Image Dehazing Ruiyi Wang et.al. 2407.13719 link
2024-07-18 CoDefeater: Using LLMs To Find Defeaters in Assurance Cases Usman Gohar et.al. 2407.13717 link
2024-07-18 Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio Jing Xu et.al. 2407.13687 null
2024-07-18 EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension Wei Zhang et.al. 2407.13596 link
2024-07-18 Robust Calibration of Large Vision-Language Adapters Balamurali Murugesan et.al. 2407.13588 link
2024-07-18 SAM-Driven Weakly Supervised Nodule Segmentation with Uncertainty-Aware Cross Teaching Xingyue Zhao et.al. 2407.13553 null
2024-07-18 GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding Changshuo Wang et.al. 2407.13519 link
2024-07-19 Mask2Map: Vectorized HD Map Construction Using Bird’s Eye View Segmentation Masks Sehwan Choi et.al. 2407.13517 link
2024-07-17 NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model Zhongqun Zhang et.al. 2407.12727 null
2024-07-17 Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? Ben Yao et.al. 2407.12725 null
2024-07-17 Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs Yiqing Shen et.al. 2407.12678 link
2024-07-17 FastSAM-3DSlicer: A 3D-Slicer Extension for 3D Volumetric Segment Anything Model with Uncertainty Quantification Yiqing Shen et.al. 2407.12658 link
2024-07-17 Zero-shot Text-guided Infinite Image Synthesis with LLM guidance Soyeong Kwon et.al. 2407.12642 null
2024-07-17 Rethinking the Architecture Design for Efficient Generic Event Boundary Detection Ziwei Zheng et.al. 2407.12622 link
2024-07-17 Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models Donggeun Kim et.al. 2407.12616 null
2024-07-17 AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism William Brannon et.al. 2407.12613 link
2024-07-17 Continuous reasoning for adaptive container image distribution in the cloud-edge continuum Damiano Azzolini et.al. 2407.12605 link
2024-07-17 VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding Ofir Abramovich et.al. 2407.12594 link

Usage Instructions

Usage instructions: here