Updated on 2025.04.18
Tracking
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-04-16 | Efficient spin-orbit torque driven magnetization switching of GdFe using phosphorus-implanted platinum layers | Kazuki Shintaku et.al. | 2504.11796 | null |
2025-04-15 | Chiral Domain Walls Induced by Radially Magnetized Nanotube Geometry | Nobuyuki Umetsu et.al. | 2504.11005 | null |
2025-04-16 | Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Chenghao Li et.al. | 2504.09566 | null |
2025-04-13 | Sub-nanosecond in-plane magnetization switching induced by field-like spin-orbit torques from ferromagnets | Hanying Zhang et.al. | 2504.09431 | null |
2025-04-12 | Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking | You Wu et.al. | 2504.09228 | null |
2025-04-11 | Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions | Yingqian Xu et.al. | 2504.08257 | null |
2025-04-08 | Magnetic Memory Driven by Orbital Current | Jingkai Xu et.al. | 2504.05780 | null |
2025-04-07 | Dimensionality Enhanced Out-of-Plane Spin Currents in NbIrTe$_4$ for Efficient Field-Free Switching of Perpendicular Magnetization | Wei Yang et.al. | 2504.05280 | null |
2025-04-02 | Shape Anisotropy Enabled Field Free Switching of Perpendicular Nanomagnets | Akanksha Chouhan et.al. | 2504.01634 | null |
2025-03-31 | Symmetry Enhanced Unconventional Spin Current Anisotropy in a Collinear Antiferromagnet | Pankhuri Gupta et.al. | 2503.20545 | null |
2025-03-26 | Intrinsic back-switching phenomenon in SOT-MRAM devices | Kuldeep Ray et.al. | 2503.19840 | null |
2025-03-22 | MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking | Haolin Qin et.al. | 2503.17699 | link |
2025-04-07 | Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID | Yu-Hsi Chen et.al. | 2503.17237 | link |
2025-03-21 | Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks | Haijin Zeng et.al. | 2503.16930 | null |
2025-03-21 | Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking | Meng Zhou et.al. | 2503.16768 | null |
2025-03-17 | UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network | Siyuan Yao et.al. | 2503.12888 | link |
2025-03-16 | Equivalent-Circuit Thermal Model for Batteries with One-Shot Parameter Identification | Myisha A. Chowdhury et.al. | 2503.12616 | null |
2025-03-13 | Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking | Xinglong Sun et.al. | 2503.09951 | null |
2025-03-09 | Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking | Chaocan Xue et.al. | 2503.06625 | link |
2025-03-09 | Dynamic Updates for Language Adaptation in Visual-Language Tracking | Xiaohai Li et.al. | 2503.06621 | link |
2025-03-06 | High resolution spectra of the [6297-6303] and [6361-6367] Ångström domains (including forbidden OI lines) of the Sun and brightest stars | Jean-Marie Malherbe et.al. | 2503.05832 | null |
2025-03-07 | Separating the bulk and interface contribution of spin-orbit torque in ferromagnet-Heavy metal bilayers tuned by variation of resistivity of heavy metal | Abu Bakkar Miah et.al. | 2503.05341 | null |
2025-03-07 | Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | Simon A. Aytes et.al. | 2503.05179 | link |
2025-03-02 | Inefficiency of the orbit Hall effect on spin torque in transition metal/ferromagnet bilayers | Yizhuo Song et.al. | 2503.00910 | null |
2025-02-27 | MITracker: Multi-View Integration for Visual Object Tracking | Mengjie Xu et.al. | 2502.20111 | null |
2025-03-08 | Dynamic Degradation Decomposition Network for All-in-One Image Restoration | Huiqiang Wang et.al. | 2502.19068 | null |
2025-02-25 | UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking | He Wang et.al. | 2502.18220 | null |
2025-02-24 | Symmetry-breaking effects on spin-orbit torque switching in ferromagnetic semiconductors with perpendicular magnetic anisotropy | Apu Kumar Jana et.al. | 2502.16788 | null |
2025-02-17 | Effects of antiferromagnetic coupling and pinning on domain wall dynamics in synthetic ferrimagnets | Sougata Mallick et.al. | 2502.11621 | null |
2025-02-13 | Modelling spin-orbitronics effects at interfaces and chiral molecules | Poonam Kumari et.al. | 2502.09239 | null |
2025-02-12 | Highly efficient field-free switching by orbital Hall torque in a MoS2-based device operating at room temperature | Antonio Bianco et.al. | 2502.08483 | null |
2025-02-08 | Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark | Shiao Wang et.al. | 2502.05574 | link |
2025-02-06 | Visualizing Field-free Deterministic Magnetic Switching of all-van der Waals Spin-Orbit Torque System Using Spin Ensembles in Hexagonal Boron Nitride | Xi Zhang et.al. | 2502.04561 | null |
2025-01-27 | Investigation of Sub-configurations Reveals Stable Spin-Orbit Torque Switching Polarity in Polycrystalline Mn3Sn | Boyu Zhao et.al. | 2501.15815 | null |
2025-01-25 | Thermal Stability and Depinning Currents of Domain Wall-Based Artificial Synapses | Guntas Kaur et.al. | 2501.15102 | null |
2025-02-16 | Enhancing Unconventional Spin-Orbit Torque Efficiency: Numerical Study on the Influence of Crystallographic Texture and Polycrystalline Effects on Low-Symmetry Materials | Yifei Yang et.al. | 2501.14200 | null |
2025-01-22 | Enhanced Field-Free Perpendicular Magnetization Switching via spin splitting torque in Altermagnetic RuO2-based Heterostructures | Badsha Sekh et.al. | 2501.12593 | null |
2025-01-18 | Multilayered MXenes for future two-dimensional nonvolatile magnetic memories | P. Kumar et.al. | 2501.10678 | null |
2025-01-13 | Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions | Xiantong Zhao et.al. | 2501.07133 | null |
2025-01-11 | ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Xuanle Zhao et.al. | 2501.06598 | link |
2025-01-18 | BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination | Zhongxuan Zhang et.al. | 2501.03616 | null |
2025-01-05 | DeTrack: In-model Latent Denoising Learning for Visual Object Tracking | Xinyu Zhou et.al. | 2501.02467 | null |
2024-12-31 | Alternative harmonic detection approach for quantitative determination of spin and orbital torques | Y. Xu et.al. | 2501.00403 | null |
2024-12-30 | An Experimental Study of Passive UAV Tracking with Digital Arrays and Cellular Downlink Signals | Yifei Sun et.al. | 2412.20788 | null |
2024-12-30 | Spin-orbit torque in a three-fold-symmetric bilayer and its effect on magnetization dynamics | Wuzhang Fang et.al. | 2412.20746 | null |
2024-12-28 | Learning Adaptive and View-Invariant Vision Transformer with Multi-Teacher Knowledge Distillation for Real-Time UAV Tracking | You Wu et.al. | 2412.20002 | link |
2024-12-27 | Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues | X. Feng et.al. | 2412.19648 | link |
2024-12-26 | Semistrong edge colorings of planar graphs | Yuquan Lin et.al. | 2412.19230 | null |
2024-12-26 | SUTrack: Towards Simple and Unified Single Object Tracking | Xin Chen et.al. | 2412.19138 | link |
2024-12-24 | Linear Enhancement of Spin-Orbit Torques and Absence of Bulk Rashba-Type Spin Splitting in Perpendicularly Magnetized [Pt/Co/W]n Superlattices | Zhihao Yan et.al. | 2412.18481 | null |
2024-12-24 | Field-free current-induced magnetization switching of a room temperature van der Waals magnet for neuromorphic computing | Chenxi Zhou et.al. | 2412.18429 | null |
2024-12-24 | All-electric mimicking synaptic plasticity based on the noncollinear antiferromagnetic device | Cuimei Cao et.al. | 2412.18418 | null |
2025-01-01 | Unsupervised UAV 3D Trajectories Estimation with Sparse Point Clouds | Hanfang Liang et.al. | 2412.12716 | link |
2024-12-15 | Exploring Enhanced Contextual Information for Video-Level Object Tracking | Ben Kang et.al. | 2412.11023 | link |
2024-12-13 | Visual Object Tracking across Diverse Data Modalities: A Review | Mengmeng Wang et.al. | 2412.09991 | null |
2024-12-09 | Magnetic Switching in Monolayer 2D Diluted Magnetic Semiconductors via Spin-to-Spin Conversion | Siwei Chen et.al. | 2412.06650 | null |
2024-12-09 | Energy Efficient Stochastic Signal Manipulation in Superparamagnetic Tunnel Junctions via Voltage-Controlled Exchange Coupling | Qi Jia et.al. | 2412.06256 | null |
2024-12-03 | GSOT3D: Towards Generic 3D Single Object Tracking in the Wild | Yifan Jiao et.al. | 2412.02129 | link |
2024-12-01 | MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning | You Wu et.al. | 2412.00626 | null |
2024-11-29 | Current-driven motion of magnetic domain-wall skyrmions | Haoyang Nie et.al. | 2411.19566 | null |
2024-11-28 | Unveiling the anisotropy of linear and nonlinear charge-spin conversion in Weyl semimetal TaIrTe4 | Tao Tang et.al. | 2411.19062 | null |
2024-12-04 | A Distractor-Aware Memory for Visual Object Tracking with SAM2 | Jovana Videnovic et.al. | 2411.17576 | link |
2024-11-24 | MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking | Chunhui Zhang et.al. | 2411.15761 | link |
2024-11-23 | How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking | Xuchen Li et.al. | 2411.15600 | null |
2024-11-23 | MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking | Xinqi Liu et.al. | 2411.15459 | null |
2024-11-24 | ClickTrack: Towards Real-time Interactive Single Object Tracking | Kuiran Wang et.al. | 2411.13183 | null |
2024-11-30 | SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | Cheng-Yen Yang et.al. | 2411.11922 | link |
2024-11-14 | Compression Method for Solar Polarization Spectra Collected from Hinode SOT/SP Observations | Jargalmaa Batmunkh et.al. | 2411.09311 | null |
2024-11-10 | Orthogonal Spin-Orbit Torque-Induced Deterministic Switching in NiO | Yixiao Qiao et.al. | 2411.06379 | null |
2024-11-08 | Giant spin Hall effect with multi-directional spin components in Ni4W | Yifei Yang et.al. | 2411.05682 | null |
2024-11-04 | Single-layer spin-orbit-torque magnetization switching due to spin Berry curvature generated by minute spontaneous atomic displacement in a Weyl oxide | Hiroto Horiuchi et.al. | 2411.01806 | null |
2024-11-04 | ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | Yiming Sun et.al. | 2411.01756 | null |
2024-11-03 | Capping layer dependent anti-correlation between magnetic damping and spin-orbital to charge conversion | Antarjami Sahoo et.al. | 2411.01662 | null |
2024-11-01 | Spin orbit torque-driven motion of quasi-Bloch domain wall in perpendicularly magnetized W/CoFeB/MgO structures | Nobuyuki Umetsu et.al. | 2411.00516 | null |
2024-10-31 | Origin of line broadening in fading granule: influence of small-scale turbulence | Ryohtaroh T. Ishikawa et.al. | 2410.23654 | null |
2024-10-27 | NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking | Yu Liu et.al. | 2410.20421 | link |
2024-10-25 | Can Stories Help LLMs Reason? Curating Information Space Through Narrative | Vahid Sadiri Javadi et.al. | 2410.19221 | null |
2024-10-19 | The Solution for Single Object Tracking Task of Perception Test Challenge 2024 | Zhiqiang Zhong et.al. | 2410.16329 | null |
2024-10-14 | A stronger form of Yamamoto’s theorem II – Spectral operators | Soumyashant Nayak et.al. | 2410.16318 | null |
2024-10-03 | Leveraging Event Streams with Deep Reinforcement Learning for End-to-End UAV Tracking | Ala Souissi et.al. | 2410.14685 | null |
2024-10-16 | DaDiff: Domain-aware Diffusion Model for Nighttime UAV Tracking | Haobo Zuo et.al. | 2410.12270 | link |
2024-10-14 | SMART-TRACK: A Novel Kalman Filter-Guided Sensor Fusion For Robust UAV Object Tracking in Dynamic Environments | Khaled Gabr et.al. | 2410.10409 | link |
2024-10-09 | DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2410.02492 | null |
2024-10-01 | Energy-efficient picosecond spin-orbit torque magnetization switching in ferro- and ferrimagnetic films | Eva Díaz et.al. | 2410.00474 | null |
2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901 | link |
2024-09-27 | Prompt-Driven Temporal Domain Adaptation for Nighttime UAV Tracking | Changhong Fu et.al. | 2409.18533 | link |
2024-09-26 | A 5T-2MTJ STT-assisted Spin Orbit Torque based Ternary Content Addressable Memory for Hardware Accelerators | Siri Narla et.al. | 2409.17863 | null |
2024-09-26 | General Compression Framework for Efficient Transformer Object Tracking | Lingyi Hong et.al. | 2409.17564 | null |
2024-09-26 | Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking | Pengcheng Shao et.al. | 2409.17560 | null |
2024-09-25 | Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2 | Chunhui Zhang et.al. | 2409.16902 | link |
2024-09-25 | Conditional Generative Denoiser for Nighttime UAV Tracking | Yucheng Wang et.al. | 2409.16834 | link |
2024-09-25 | Progressive Representation Learning for Real-Time UAV Tracking | Changhong Fu et.al. | 2409.16652 | link |
2024-09-25 | Enhancing Nighttime UAV Tracking with Light Distribution Suppression | Liangliang Yao et.al. | 2409.16631 | link |
2024-09-24 | Pulse Shaping Strategies for Efficient Switching of Magnetic Tunnel Junctions by Spin-Orbit Torque | Marco Hoffmann et.al. | 2409.16454 | null |
2024-09-24 | CloudTrack: Scalable UAV Tracking with Cloud Semantics | Yannik Blei et.al. | 2409.16111 | link |
2024-09-20 | A survey of sulfur-bearing molecular lines toward the dense cores in eleven massive protoclusters | Mengyao Tang et.al. | 2409.13231 | null |
2024-09-19 | Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC | Jiawen Kang et.al. | 2409.12388 | link |
2024-09-11 | Topological Spin-Orbit Torque in Ferrimagnetic Weyl Semimetal | Tomonari Meguro et.al. | 2409.07106 | null |
2024-09-09 | Effects of Interfacial Oxygen Diffusion on the Magnetic Properties and Thermal Stability of Pd/CoFeB/Pd/Ta Heterostructure | Saravanan Lakshmanan et.al. | 2409.05783 | null |
2024-09-11 | Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition | Hao Shi et.al. | 2409.00815 | null |
2024-08-30 | Advancing Multi-talker ASR Performance with Large Language Models | Mohan Shi et.al. | 2408.17431 | null |
2024-08-30 | Cross Fusion RGB-T Tracking with Bi-directional Adapter | Zhirong Zeng et.al. | 2408.16979 | null |
2024-08-23 | Energy-efficient field-free unconventional spin-orbit torque magnetization switching dynamics in van der Waals heterostructures | Lalit Pandey et.al. | 2408.13095 | null |
2024-08-21 | Low-Light Object Tracking: A Benchmark | Pengzhi Zhong et.al. | 2408.11463 | link |
2024-08-20 | MambaEVT: Event Stream based Visual Object Tracking using State Space Model | Xiao Wang et.al. | 2408.10487 | link |
2024-08-19 | Reconfigurable Spin Logics and High-density Multistate Memory in a Single Spin-orbit Torque Device | Raghvendra Posti et.al. | 2408.09866 | null |
2024-08-16 | Initialization-Free Multistate Memristor: Synergy of Spin-Orbit Torque and Magnetic Fields | Raghvendra Posti et.al. | 2408.08641 | null |
2024-08-15 | MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking | Simiao Lai et.al. | 2408.07889 | null |
2024-08-12 | Latent Disentanglement for Low Light Image Enhancement | Zhihao Zheng et.al. | 2408.06245 | null |
2024-08-11 | Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays - Part 2: Design Knobs and DNN Accuracy Trends | Jeffry Victor et.al. | 2408.05857 | null |
2024-08-05 | VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking | Yuxuan Lu et.al. | 2408.02263 | null |
2024-08-04 | 3D Single-object Tracking in Point Clouds with High Temporal Variation | Qiao Wu et.al. | 2408.02049 | null |
2024-07-30 | Strained topological insulator spin-orbit torque random access memory (STI-SOTRAM) bit cell for energy-efficient Processing in Memory | Md Golam Morshed et.al. | 2407.20925 | null |
2024-07-19 | HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation | Zezeng Li et.al. | 2407.14419 | null |
2024-07-17 | Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm | Shiyu Liu et.al. | 2407.12614 | null |
2024-07-15 | Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss | Mufeng Yao et.al. | 2407.10485 | link |
2024-07-16 | Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking | Lorenzo Vaquero et.al. | 2407.10151 | link |
2024-07-12 | DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects | Peng Wang et.al. | 2407.09051 | null |
2024-07-11 | Manipulating a Tetris-Inspired 3D Video Representation | Mihir Godbole et.al. | 2407.08885 | null |
2024-07-11 | Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets | Linh Van Ma et.al. | 2407.08872 | link |
2024-07-11 | CommRad: Context-Aware Sensing-Driven Millimeter-Wave Networks | Ish Kumar Jain et.al. | 2407.08817 | null |
2024-07-10 | Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors | Lei Cheng et.al. | 2407.08049 | null |
2024-07-10 | Large spin-orbit torque in a-plane $\alpha$-Fe$_2$O$_3$/Pt bilayers | Igor Lyalin et.al. | 2407.07731 | null |
2024-07-10 | Spin Splitting in Altermagnetic RuO$_2$ Enables Field-free Spin-Orbit Torque Switching via Dominant Out-of-Plane Spin Polarization | Zhuoyi Li et.al. | 2407.07447 | null |
2024-07-09 | Unconventional Spin-Orbit Torques from Sputtered MoTe2 Films | Shuchen Li et.al. | 2407.06487 | null |
2024-07-07 | Addressing single object tracking in satellite imagery through prompt-engineered solutions | Athena Psalta et.al. | 2407.05518 | null |
2024-07-07 | Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking | You Wu et.al. | 2407.05383 | null |
2024-07-09 | P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds | Jiahao Nie et.al. | 2407.05238 | link |
2024-07-05 | Median Mishaps between Chirality and Spin-Orbit Torques via Asymmetric Hysteresis | Minhwan Kim et.al. | 2407.04624 | null |
2024-07-04 | Serialized Output Training by Learned Dominance | Ying Shi et.al. | 2407.03966 | null |
2024-07-04 | TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers | Fatemeh Nourilenjan Nokabadi et.al. | 2407.03946 | link |
2024-07-04 | Out-of-Plane Polarization from Spin Reflection Induces Field-Free Spin-Orbit Torque Switching in Structures with Canted NiO Interfacial Moments | Zhe Zhang et.al. | 2407.03676 | null |
HDR
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-04-17 | CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework | Wentao Wu et.al. | 2504.12576 | null |
2025-04-16 | Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space | Kaustav Chanda et.al. | 2504.12515 | null |
2025-04-16 | Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging | Tristan S. W. Stevens et.al. | 2504.12154 | null |
2025-04-11 | High Dynamic Range Modulo Imaging for Robust Object Detection in Autonomous Driving | Kebin Contreras et.al. | 2504.11472 | null |
2025-04-15 | GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR | Christophe Bolduc et.al. | 2504.10809 | null |
2025-04-14 | Minimal Sensing for Orienting a Solar Panel | Jeremy Klotz et.al. | 2504.10765 | null |
2025-04-13 | Low-Light Image Enhancement using Event-Based Illumination Estimation | Lei Sun et.al. | 2504.09379 | null |
2025-04-10 | S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion | Yujin Wang et.al. | 2504.07667 | null |
2025-04-08 | Orthogonal Matching Pursuit based Reconstruction for Modulo Hysteresis Operators | Matthias Beckmann et.al. | 2504.05895 | null |
2025-04-08 | Inter-event Interval Microscopy for Event Cameras | Changqing Su et.al. | 2504.04924 | null |
2025-04-06 | eKalibr-Stereo: Continuous-Time Spatiotemporal Calibration for Event-Based Stereo Visual Systems | Shuolong Chen et.al. | 2504.04451 | link |
2025-04-05 | Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications | Brayan Monroy et.al. | 2504.04228 | null |
2025-04-03 | Brightness Perceiving for Recursive Low-Light Image Enhancement | Haodian Wang et.al. | 2504.02362 | link |
2025-04-02 | Anomaly Detection for Hybrid Butterfly Subspecies via Probability Filtering | Bo-Kai Ruan et.al. | 2504.01671 | link |
2025-03-31 | DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting | Seungjun Lee et.al. | 2503.24210 | null |
2025-03-29 | SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry | Peiyu Chen et.al. | 2503.22963 | link |
2025-03-28 | Enhancing Celestial Imaging: High Dynamic Range with Neuromorphic Cameras | Satyapreet Singh Yadav et.al. | 2503.22814 | null |
2025-03-26 | SpikeDerain: Unveiling Clear Videos from Rainy Sequences Using Color Spike Streams | Hanwen Liang et.al. | 2503.20315 | null |
2025-03-26 | A Survey on Event-driven 3D Reconstruction: Development under Different Categories | Chuanzhi Xu et.al. | 2503.19753 | null |
2025-03-25 | Maximum Likelihood Estimation Based Complex-Valued Robust Chinese Remainder Theorem and Its Fast Algorithm | Xiaoping Li et.al. | 2503.18625 | null |
2025-03-21 | Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras | Shuang Guo et.al. | 2503.17262 | link |
2025-03-20 | Neuromorphic Cameras in Astronomy: Unveiling the Future of Celestial Imaging Beyond Conventional Limits | Satyapreet Singh Yadav et.al. | 2503.15883 | null |
2025-03-19 | Boosting HDR Image Reconstruction via Semantic Knowledge Transfer | Qingsen Yan et.al. | 2503.15361 | null |
2025-03-20 | VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention | Mingzhe Zheng et.al. | 2503.15138 | null |
2025-03-18 | Weakly Supervised Spatial Implicit Neural Representation Learning for 3D MRI-Ultrasound Deformable Image Registration in HDR Prostate Brachytherapy | Jing Wang et.al. | 2503.14395 | null |
2025-03-17 | UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks | Yuanbin Qian et.al. | 2503.12905 | link |
2025-03-17 | Stereo Event-based, 6-DOF Pose Tracking for Uncooperative Spacecraft | Zibin Liu et.al. | 2503.12732 | link |
2025-03-16 | EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera | Luming Wang et.al. | 2503.12419 | link |
2025-03-14 | Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP | Trevor D. Canham et.al. | 2503.11883 | null |
2025-03-13 | GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping | Jinfeng Liu et.al. | 2503.10143 | null |
2025-03-10 | Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion | Haowen Bai et.al. | 2503.07235 | null |
2025-03-08 | Optimization models for needle placement in 3D-printed masks for high dose rate brachytherapy | Nasim Mirzavand Boroujeni et.al. | 2503.06000 | null |
2025-03-16 | DeepGrav: Anomalous Gravitational-Wave Detection Through Deep Latent Features | Jianqi Yan et.al. | 2503.03799 | link |
2025-03-05 | BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation | Gangwei Xu et.al. | 2503.03256 | null |
2025-03-04 | ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement | Xuejian Guo et.al. | 2503.02484 | link |
2025-03-03 | S-R2D2: a spherical extension of the R2D2 deep neural network series paradigm for wide-field radio-interferometric imaging | A. Tajja et.al. | 2503.01462 | null |
2025-03-03 | Adaptive cold-atom magnetometry mitigating the trade-off between sensitivity and dynamic range | Zhu Ma et.al. | 2503.01211 | null |
2025-03-01 | High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm | Zhaoyi Tian et.al. | 2503.00410 | link |
2025-03-01 | Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach | Guixu Lin et.al. | 2503.00377 | null |
2025-02-28 | EVLoc: Event-based Visual Localization in LiDAR Maps via Event-Depth Registration | Kuangyi Chen et.al. | 2503.00167 | link |
2025-02-28 | SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events | Yunfan Lu et.al. | 2502.21120 | null |
2025-02-18 | Fast Antibiotic resistance-Based gene editing of mammalian cells with CRISPR-Cas9 (FAB-CRISPR) | Petia Adarska et.al. | 2502.12675 | null |
2025-02-14 | Quantifying Phase Magnitudes of Open-Source Focused-Probe 4D-STEM Ptychography Reconstructions | Toma Susi et.al. | 2502.09938 | link |
2025-02-10 | Indoor Light and Heat Estimation from a Single Panorama | Guanzhou Ji et.al. | 2502.06973 | null |
2025-02-09 | Compressed sensing enabled high-bandwidth and large dynamic range magnetic sensing | Galya Haim et.al. | 2502.06070 | null |
2025-02-09 | Energy-Efficient Autonomous Aerial Navigation with Dynamic Vision Sensors: A Physics-Guided Neuromorphic Approach | Sourav Sanyal et.al. | 2502.05938 | null |
2025-02-07 | Differentiable Mobile Display Photometric Stereo | Gawoon Ban et.al. | 2502.05055 | null |
2025-02-05 | Deep Learning-based Event Data Coding: A Joint Spatiotemporal and Polarity Solution | Abdelrahman Seleem et.al. | 2502.03285 | null |
2025-02-04 | Event-aided Semantic Scene Completion | Shangwei Guo et.al. | 2502.02334 | link |
2025-01-23 | HP2 Survey V. Ophiuchus: Filament formation in a dispersing cloud complex | João Alves et.al. | 2501.13931 | null |
2025-01-22 | DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning | Wenhao Gu et.al. | 2501.12898 | null |
2025-01-20 | UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion | Zixuan Chen et.al. | 2501.11515 | null |
2025-01-10 | eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events | Shuolong Chen et.al. | 2501.05688 | link |
2025-01-07 | AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene | Chaoran Feng et.al. | 2501.02807 | null |
2024-12-26 | Learning Monocular Depth from Events via Egomotion Compensation | Haitao Meng et.al. | 2412.19067 | null |
2024-12-25 | HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis | Mohammed Hamdan et.al. | 2412.18981 | null |
2024-12-20 | High-Dynamic Range Broadband Terahertz Time-Domain Spectrometer Based on Organic Crystal MNA | Samira Mansourzadeh et.al. | 2412.15718 | null |
2024-12-19 | Event-assisted 12-stop HDR Imaging of Dynamic Scene | Shi Guo et.al. | 2412.14705 | null |
2025-01-06 | LEDiff: Latent Exposure Diffusion for HDR Generation | Chao Wang et.al. | 2412.14456 | null |
2024-12-18 | Development of a High-Resolution, High-Dynamic-Range Charge Detector for Ion Beam Monitoring | O. Adriani et.al. | 2412.13934 | null |
2024-12-18 | Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode | Xin Su et.al. | 2412.13749 | link |
2024-12-17 | Transforming Single Photon Camera Images to Color High Dynamic Range Images | Sumit Sharma et.al. | 2412.12942 | null |
2024-12-17 | Efficient Event-based Semantic Segmentation with Spike-driven Lightweight Transformer-based Networks | Xiaxin Zhu et.al. | 2412.12843 | null |
2024-12-17 | Compressed Sensing Based Residual Recovery Algorithms and Hardware for Modulo Sampling | Shaik Basheeruddin Shah et.al. | 2412.12724 | null |
2024-12-16 | Towards Physically-Based Sky-Modeling | Ian J. Maquignaz et.al. | 2412.11883 | null |
2024-12-16 | High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit | Sonia Rani et.al. | 2412.11859 | null |
2024-12-16 | Predicting the Original Appearance of Damaged Historical Documents | Zhenhua Yang et.al. | 2412.11634 | link |
2024-12-16 | Event-based Detectors for Laser Guide Star Tip-Tilt Sensing | Monique Cockram et.al. | 2412.11436 | null |
2024-12-12 | Continuous Gaussian Process Pre-Optimization for Asynchronous Event-Inertial Odometry | Zhixiang Wang et.al. | 2412.08909 | null |
2024-12-10 | EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering | Toshiya Yura et.al. | 2412.07293 | null |
2024-12-09 | Fitting Spherical Gaussians to Dynamic HDRI Sequences | Pascal Clausen et.al. | 2412.06511 | null |
2024-12-09 | Event fields: Capturing light fields at high speed, resolution, and dynamic range | Ziyuan Qu et.al. | 2412.06191 | null |
2024-12-07 | On an Analytical Inversion Formula for the Modulo Radon Transform | Matthias Beckmann et.al. | 2412.05711 | null |
2024-12-05 | DHOST theories as disformal gravity: From black holes to radiative spacetimes | Jibril Ben Achour et.al. | 2412.04135 | null |
2024-12-05 | High-power single-cycle THz emission from large-area photoconductive emitters at 400 kHz | Mohsen Khalili et.al. | 2412.04004 | null |
2024-12-05 | Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization | Tianyu Chen et.al. | 2412.03941 | null |
2024-12-04 | Accelerating HI density predictions during the Epoch of Reionization using a GPR-based emulator on N-body simulations | Gaurav Pundir et.al. | 2412.03485 | null |
2024-12-03 | EvRT-DETR: The Surprising Effectiveness of DETR-based Detection for Event Cameras | Dmitrii Torbunov et.al. | 2412.02890 | link |
2024-12-02 | Learning Differential Pyramid Representation for Tone Mapping | Qirui Yang et.al. | 2412.01463 | null |
2024-11-28 | Event-based Tracking of Any Point with Motion-Robust Correlation Features | Friedhelm Hamann et.al. | 2412.00133 | link |
2024-11-25 | CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain | Jingchao Peng et.al. | 2411.16327 | null |
2024-11-22 | High-dynamic-range atomic clocks with dual Heisenberg-limited precision scaling | Jungeng Zhou et.al. | 2411.14944 | null |
2024-11-20 | Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding | David Mascareñas et.al. | 2411.13108 | null |
2024-11-18 | Noise Filtering Benchmark for Neuromorphic Satellites Observations | Sami Arja et.al. | 2411.11233 | link |
2024-11-16 | Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion | Kepeng Xu et.al. | 2411.10775 | null |
2024-11-15 | CaLES: A GPU-accelerated solver for large-eddy simulation of wall-bounded flows | Maochao Xiao et.al. | 2411.09364 | link |
2024-11-11 | Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models | NVIDIA et.al. | 2411.07126 | null |
2024-11-25 | Increasing the scalability of graph convolution for FPGA-implemented event-based vision | Piotr Wzorek et.al. | 2411.04269 | null |
2024-11-13 | DEIO: Deep Event Inertial Odometry | Weipeng Guan et.al. | 2411.03928 | link |
2024-11-05 | Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor | Anish Bhattacharya et.al. | 2411.03303 | null |
2024-11-05 | Learning-based Lossless Event Data Compression | Ahmadreza Sezavar et.al. | 2411.03010 | null |
2024-10-30 | Automatic programming via large language models with population self-evolution for dynamic job shop scheduling problem | Jin Huang et.al. | 2410.22657 | null |
2024-10-29 | EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data | Zhonghua Yi et.al. | 2410.21743 | link |
2024-10-28 | NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments | Taiyi Pan et.al. | 2410.21615 | link |
2024-10-27 | BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events | Yijin Li et.al. | 2410.20451 | null |
2024-10-26 | Unleashing Dynamic Range and Resolution in Unlimited Sensing Framework via Novel Hardware | Yuliang Zhu et.al. | 2410.20193 | null |
2024-10-21 | Scene-Segmentation-Based Exposure Compensation for Tone Mapping of High Dynamic Range Scenes | Yuma Kinoshita et.al. | 2410.19839 | null |
2024-10-24 | Environment Maps Editing using Inverse Rendering and Adversarial Implicit Functions | Antonio D’Orazio et.al. | 2410.18622 | null |
2024-10-23 | Frequency-dependent amplitude correction to free-precession scalar magnetometers | M. E. Limes et.al. | 2410.18224 | null |
2024-10-22 | SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition | Jiaqi Chen et.al. | 2410.16746 | link |
2024-10-19 | A Cycle Ride to HDR: Semantics Aware Self-Supervised Framework for Unpaired LDR-to-HDR Image Translation | Hrishav Bakul Barua et.al. | 2410.15068 | link |
2024-10-17 | 360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers | Jack Hilliard et.al. | 2410.13566 | null |
2024-10-17 | On Quantum Programming Languages | Benoît Valiron et.al. | 2410.13337 | null |
2024-10-16 | An O(m+n)-Space Spatiotemporal Denoising Filter with Cache-Like Memories for Dynamic Vision Sensors | Qinghang Zhao et.al. | 2410.12423 | null |
2024-10-10 | DifFRelight: Diffusion-Based Facial Performance Relighting | Mingming He et.al. | 2410.08188 | null |
2024-10-18 | IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera | Jian Huang et.al. | 2410.08107 | link |
2024-10-09 | Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras | Friedhelm Hamann et.al. | 2410.06698 | null |
2024-10-03 | Spiking Neural Network as Adaptive Event Stream Slicer | Jiahang Cao et.al. | 2410.02249 | link |
2024-10-03 | Capturing complex hand movements and object interactions using machine learning-powered stretchable smart textile gloves | Arvin Tashakori et.al. | 2410.02221 | link |
2024-10-01 | Signatures of Black Hole Spin and Plasma Acceleration in Jet Polarimetry | Zachary Gelles et.al. | 2410.00954 | null |
2024-10-04 | VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Jiapeng Wang et.al. | 2410.00741 | null |
2024-09-26 | Photon Inhibition for Energy-Efficient Single-Photon Imaging | Lucas J. Koerner et.al. | 2409.18337 | null |
2024-09-26 | Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions | Weng Fei Low et.al. | 2409.17988 | null |
2024-09-26 | Unsupervised Learning Based Multi-Scale Exposure Fusion | Chaobing Zheng et.al. | 2409.17830 | null |
2024-09-26 | Event-based Stereo Depth Estimation: A Survey | Suman Ghosh et.al. | 2409.17680 | null |
2024-09-26 | Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking | Pengcheng Shao et.al. | 2409.17560 | null |
2024-09-25 | EventHDR: from Event to High-Speed HDR Videos and Beyond | Yunhao Zou et.al. | 2409.17029 | null |
2024-09-25 | Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training | Kun Song et.al. | 2409.16767 | null |
2024-09-24 | Sub-Nyquist USF Spectral Estimation: $K$ Frequencies with $6K + 4$ Modulo Samples | Ruiming Guo et.al. | 2409.16472 | null |
2024-09-24 | Neuromorphic Drone Detection: an Event-RGB Multimodal Approach | Gabriele Magrini et.al. | 2409.16099 | link |
2024-09-24 | Deep chroma compression of tone-mapped images | Xenios Milidonis et.al. | 2409.16032 | link |
2024-09-23 | Mixing Data-driven and Geometric Models for Satellite Docking Port State Estimation using an RGB or Event Camera | Cedric Le Gentil et.al. | 2409.15581 | null |
2024-09-23 | SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream | Jinze Yu et.al. | 2409.15176 | link |
2024-09-21 | Monocular Event-Inertial Odometry with Adaptive decay-based Time Surface and Polarity-aware Tracking | Kai Tang et.al. | 2409.13971 | null |
2024-09-20 | Intrinsic Single-Image HDR Reconstruction | Sebastian Dille et.al. | 2409.13803 | link |
2024-09-20 | Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors | Zixin Zhang et.al. | 2409.13392 | null |
2024-09-18 | EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning | Yukun Tian et.al. | 2409.11813 | null |
2024-09-18 | Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network | Jiale Wang et.al. | 2409.11677 | null |
2024-09-16 | Programmable multifunctional integrated microwave photonic circuit on thin-film lithium niobate | Chuangchuang Wei et.al. | 2409.10227 | null |
2024-09-15 | SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux | Rui Graca et.al. | 2409.09648 | null |
2024-09-13 | Integration of high-performance compact interferometric sensors in a suspended interferometer | Alexandra Mitchell et.al. | 2409.08843 | null |
2024-09-13 | Adaptive Robust High-Precision Atomic Gravimetry | Jinye Wei et.al. | 2409.08550 | null |
2024-09-07 | Neural Augmentation Based Panoramic High Dynamic Range Stitching | Chaobing Zheng et.al. | 2409.04679 | null |
2024-09-05 | MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice | Friedhelm Hamann et.al. | 2409.03358 | link |
2024-09-03 | Gradient events: improved acquisition of visual information in event cameras | Eero Lehtonen et.al. | 2409.01764 | null |
2024-09-02 | SoK: Security of the Image Processing Pipeline in Autonomous Vehicles | Michael Kühr et.al. | 2409.01234 | link |
2024-08-30 | Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms | Marcus Märtens et.al. | 2408.16971 | null |
2024-08-29 | EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More | Kanghao Chen et.al. | 2408.16254 | null |
2024-08-28 | ES-PTAM: Event-based Stereo Parallel Tracking and Mapping | Suman Ghosh et.al. | 2408.15605 | link |
2024-08-27 | Towards Real-world Event-guided Low-light Video Enhancement and Deblurring | Taewoo Kim et.al. | 2408.14916 | link |
2024-08-27 | Recent Event Camera Innovations: A Survey | Bharatesh Chakravarthi et.al. | 2408.13627 | link |
2024-08-24 | Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation | Yuxuan Zhou et.al. | 2408.13586 | link |
2024-08-22 | ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes | Zhenyi Liu et.al. | 2408.12048 | link |
2024-08-20 | Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm | Xiao Wang et.al. | 2408.10488 | link |
2024-08-20 | MambaEVT: Event Stream based Visual Object Tracking using State Space Model | Xiao Wang et.al. | 2408.10487 | link |
2024-08-19 | Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms | Xiao Wang et.al. | 2408.09764 | link |
2024-08-19 | Phase-Separated Charge Order and Twinning Across Length Scales in CsV$_3$Sb$_5$ | Jayden Plumb et.al. | 2408.08842 | null |
2024-08-16 | CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving | Shihan Peng et.al. | 2408.08500 | null |
2024-08-13 | MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation | JunYong Choi et.al. | 2408.06707 | null |
2024-08-13 | HDRGS: High Dynamic Range Gaussian Splatting | Jiahao Wu et.al. | 2408.06543 | link |
2024-08-12 | Rethinking Video with a Universal Event-Based Representation | Andrew Freeman et.al. | 2408.06248 | null |
2024-08-10 | EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency | Junjie Jiang et.al. | 2408.05452 | null |
2024-08-06 | Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera | Zibin Liu et.al. | 2408.03225 | link |
2024-07-31 | Exploiting Change Blindness for Video Coding: Perspectives from a Less Promising User Study | Mitra Amiri et.al. | 2408.00052 | null |
2024-07-23 | HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images | Shreyas Singh et.al. | 2407.16503 | link |
2024-07-23 | SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging | Lingtong Kong et.al. | 2407.16308 | link |
2024-07-24 | SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams | Liangyan Jiang et.al. | 2407.15708 | link |
2024-08-04 | Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering | Jiahao Cui et.al. | 2407.13309 | link |
2024-07-18 | Learned HDR Image Compression for Perceptually Optimal Storage and Display | Peibei Cao et.al. | 2407.13179 | null |
2024-07-17 | Nonlinear tomographic reconstruction via nonsmooth optimization | Vasileios Charisopoulos et.al. | 2407.12984 | null |
2024-07-16 | VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos | Devesh Walawalkar et.al. | 2407.12214 | null |
2024-07-16 | I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM | Gwangtak Bae et.al. | 2407.11347 | null |
2024-07-15 | Temporal Event Stereo via Joint Learning with Stereoscopic Flow | Hoonhee Cho et.al. | 2407.10831 | link |
2024-07-15 | Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation | Yuhwan Jeong et.al. | 2407.10703 | link |
2024-07-15 | Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction | Lin Zhu et.al. | 2407.10636 | null |
2024-07-18 | Efficient hybrid technique for generating sub-grid haloes in reionization simulations | Ankur Barsode et.al. | 2407.10585 | null |
2024-07-12 | Radiance Fields from Photons | Sacha Jungerman et.al. | 2407.09386 | null |
2024-07-11 | Event-based vision on FPGAs – a survey | Tomasz Kryjak et.al. | 2407.08356 | null |
2024-07-12 | Dynamic phase transition into a mixed-CDW state in 1$T$-TaS$_2$ via a thermal quench | A. de la Torre et.al. | 2407.07953 | null |
2024-07-08 | PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes | Mohammad Reza Karimi Dastjerdi et.al. | 2407.06150 | null |
2024-07-08 | Neuromorphic Imaging with Super-Resolution | Pei Zhang et.al. | 2407.05764 | null |
Low-Level
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2025-04-17 | SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Haoxuan Li et.al. | 2504.13172 | null |
2025-04-17 | Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal | Inzamamul Alam et.al. | 2504.12809 | null |
2025-04-17 | AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting | Xin Su et.al. | 2504.12605 | null |
2025-04-16 | Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling | Zhihua Wang et.al. | 2504.12204 | null |
2025-04-16 | Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging | Tristan S. W. Stevens et.al. | 2504.12154 | null |
2025-04-16 | Generalized Visual Relation Detection with Diffusion Models | Kaifeng Gao et.al. | 2504.12100 | null |
2025-04-16 | R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors | Haoyang Wang et.al. | 2504.11946 | null |
2025-04-16 | Learning Physics-Informed Color-Aware Transforms for Low-Light Image Enhancement | Xingxing Yang et.al. | 2504.11896 | null |
2025-04-16 | HyperKING: Quantum-Classical Generative Adversarial Networks for Hyperspectral Image Restoration | Chia-Hsiang Lin et.al. | 2504.11782 | null |
2025-04-15 | Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain | Pengcheng Zheng et.al. | 2504.11286 | null |
2025-04-15 | Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach | Xiaoxiao Ma et.al. | 2504.11262 | null |
2025-04-15 | Visual Re-Ranking with Non-Visual Side Information | Gustav Hanning et.al. | 2504.11134 | null |
2025-04-15 | UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques | Pedro Diaz-Garcia et.al. | 2504.11063 | null |
2025-04-15 | TMCIR: Token Merge Benefits Composed Image Retrieval | Chaoyang Wang et.al. | 2504.10995 | null |
2025-04-15 | AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent | Pu Wang et.al. | 2504.10978 | null |
2025-04-15 | An Efficient and Mixed Heterogeneous Model for Image Restoration | Yubin Gu et.al. | 2504.10967 | null |
2025-04-15 | DAAF:Degradation-Aware Adaptive Fusion Framework for Robust Infrared and Visible Images Fusion | Tianpei Zhang et.al. | 2504.10871 | null |
2025-04-14 | PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems | Maud Biquard et.al. | 2504.10375 | null |
2025-04-14 | Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis | Kaiwen Zheng et.al. | 2504.10351 | null |
2025-04-14 | VibrantLeaves: A principled parametric image generator for training deep restoration models | Raphael Achddou et.al. | 2504.10201 | null |
2025-04-14 | Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction | Yucheng Lu et.al. | 2504.10080 | null |
2025-04-14 | Progressive Transfer Learning for Multi-Pass Fundus Image Restoration | Uyen Phan et.al. | 2504.10025 | null |
2025-04-14 | Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration | Gang Wu et.al. | 2504.09973 | null |
2025-04-14 | Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition | Changwei Wang et.al. | 2504.09881 | null |
2025-04-13 | Computationally iterative methods for salt-and-pepper denoising | Jianwei Ke et.al. | 2504.09408 | null |
2025-04-13 | Low-Light Image Enhancement using Event-Based Illumination Estimation | Lei Sun et.al. | 2504.09379 | null |
2025-04-12 | Beyond Degradation Conditions: All-in-One Image Restoration via HOG Transformers | Jiawei Wu et.al. | 2504.09377 | null |
2025-04-11 | Hypergraph Vision Transformers: Images are More than Nodes, More than Edges | Joshua Fixelle et.al. | 2504.08710 | null |
2025-04-11 | ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration | Yongsheng Yu et.al. | 2504.08591 | null |
2025-04-11 | FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | Cheng-Yu Hsieh et.al. | 2504.08368 | null |
2025-04-11 | DreamFuse: Adaptive Image Fusion with Diffusion Transformer | Junjia Huang et.al. | 2504.08291 | null |
2025-04-11 | VL-UR: Vision-Language-guided Universal Restoration of Images Degraded by Adverse Weather Conditions | Ziyan Liu et.al. | 2504.08219 | null |
2025-04-10 | Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement | Daniel Torres et.al. | 2504.07810 | null |
2025-04-10 | Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval | Zehong Ma et.al. | 2504.07718 | null |
2025-04-10 | Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying | Shichen Li et.al. | 2504.07465 | null |
2025-04-10 | Synthetic CT Generation from Time-of-Flight Non-Attenutaion-Corrected PET for Whole-Body PET Attenuation Correction | Weijie Chen et.al. | 2504.07450 | null |
2025-04-09 | Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model | Yingjie Zhou et.al. | 2504.07148 | null |
2025-04-09 | Distilling Textual Priors from LLM to Efficient Image Fusion | Ran Zhang et.al. | 2504.07029 | null |
2025-04-09 | Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Ruotian Peng et.al. | 2504.06666 | null |
2025-04-09 | Rethinking LayerNorm in Image Restoration Transformers | MinKyu Lee et.al. | 2504.06629 | null |
2025-04-08 | AstroClearNet: Deep image prior for multi-frame astronomical image restoration | Yashil Sukurdeep et.al. | 2504.06463 | null |
2025-04-09 | Robust Fusion Controller: Degradation-aware Image Fusion with Fine-grained Language Instructions | Hao Zhang et.al. | 2504.05795 | null |
2025-04-07 | Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion | Xingyu Hu et.al. | 2504.05164 | null |
2025-04-07 | DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration | Jiamei Xiong et.al. | 2504.05135 | null |
2025-04-08 | Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision | Yuandong Pu et.al. | 2504.04903 | null |
2025-04-07 | Content-Aware Transformer for All-in-one Image Restoration | Gang Wu et.al. | 2504.04869 | null |
2025-04-07 | Inland Waterway Object Detection in Multi-environment: Dataset and Approach | Shanshan Wang et.al. | 2504.04835 | null |
2025-04-06 | NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval | Peng Gao et.al. | 2504.04339 | null |
2025-04-05 | JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration | Yunlong Lin et.al. | 2504.04158 | null |
2025-04-04 | Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal | Yuyang Hu et.al. | 2504.03607 | null |
2025-04-04 | REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval | Shabnam Choudhury et.al. | 2504.03169 | null |
2025-04-04 | Finding the Reflection Point: Unpadding Images to Remove Data Augmentation Artifacts in Large Open Source Image Datasets for Machine Learning | Lucas Choi et.al. | 2504.03168 | null |
2025-04-03 | RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models | ZhongLi Fang et.al. | 2504.02640 | null |
2025-04-03 | Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement | Hesong Li et.al. | 2504.02555 | null |
2025-04-03 | HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement | Hantang Li et.al. | 2504.02373 | null |
2025-04-03 | Brightness Perceiving for Recursive Low-Light Image Enhancement | Haodian Wang et.al. | 2504.02362 | link |
2025-04-03 | SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW | Masakazu Yoshimura et.al. | 2504.02345 | null |
2025-04-02 | Bridge the Gap between SNN and ANN for Image Restoration | Xin Su et.al. | 2504.01755 | null |
2025-04-02 | Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval | Yuji Nozawa et.al. | 2504.01348 | null |
2025-04-01 | IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval | Bangwei Liu et.al. | 2504.00954 | null |
2025-04-01 | Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data | Yiqun Duan et.al. | 2504.00812 | null |
2025-04-01 | Deconver: A Deconvolutional Network for Medical Image Segmentation | Pooya Ashtari et.al. | 2504.00302 | link |
2025-03-31 | InstructRestore: Region-Customized Image Restoration with Human Instructions | Shuaizheng Liu et.al. | 2503.24357 | link |
2025-03-31 | CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization | Yingrui Ji et.al. | 2503.24182 | null |
2025-03-31 | 3D Dental Model Segmentation with Geometrical Boundary Preserving | Shufan Xi et.al. | 2503.23702 | null |
2025-03-30 | Multiview Image-Based Localization | Cameron Fiore et.al. | 2503.23577 | null |
2025-03-30 | ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts | Linfeng Tang et.al. | 2503.23356 | null |
2025-03-30 | DSPFusion: Image Fusion via Degradation and Semantic Dual-Prior Guidance | Linfeng Tang et.al. | 2503.23355 | null |
2025-03-29 | A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery | Pengyu Chen et.al. | 2503.23200 | null |
2025-03-29 | indiSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy | Ashesh Ashesh et.al. | 2503.22983 | null |
2025-03-28 | RELD: Regularization by Latent Diffusion Models for Image Restoration | Pasquale Cascarano et.al. | 2503.22563 | null |
2025-03-27 | Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration | Yujie Chen et.al. | 2503.21970 | null |
2025-03-27 | LOCORE: Image Re-ranking with Long-Context Sequence Modeling | Zilin Xiao et.al. | 2503.21772 | link |
2025-03-27 | Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck | Adrian Bulat et.al. | 2503.21757 | null |
2025-03-27 | Invert2Restore: Zero-Shot Degradation-Blind Image Restoration | Hamadi Chihaoui et.al. | 2503.21486 | null |
2025-03-27 | Diffusion Image Prior | Hamadi Chihaoui et.al. | 2503.21410 | null |
2025-03-27 | FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval | Zixu Li et.al. | 2503.21309 | link |
2025-03-27 | Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing | Shuai Li et.al. | 2503.21236 | null |
2025-03-26 | Underwater Image Enhancement by Convolutional Spiking Neural Networks | Vidya Sudevan et.al. | 2503.20485 | link |
2025-03-26 | Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration | Shihao Zhou et.al. | 2503.20174 | null |
2025-03-25 | CoLLM: A Large Language Model for Composed Image Retrieval | Chuong Huynh et.al. | 2503.19910 | link |
2025-03-25 | LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark Dataset | Manjushree Aithal et.al. | 2503.19804 | null |
2025-03-25 | Scene-agnostic Pose Regression for Visual Localization | Junwei Zheng et.al. | 2503.19543 | null |
2025-03-25 | From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting | Zhiwei Huang et.al. | 2503.19358 | null |
2025-03-25 | Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval | Haoqiang Lin et.al. | 2503.19296 | link |
2025-03-24 | LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment | Haoran Wang et.al. | 2503.18640 | null |
2025-03-24 | OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning | Hui Li et.al. | 2503.18635 | null |
2025-03-24 | Dig2DIG: Dig into Diffusion Information Gains for Image Fusion | Bing Cao et.al. | 2503.18627 | null |
2025-03-24 | Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model | Tianpei Zhang et.al. | 2503.18378 | null |
2025-03-23 | LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space | Zhangyu Wang et.al. | 2503.18142 | null |
2025-03-23 | Deep Learning Assisted Denoising of Experimental Micrographs | Owais Ahmad et.al. | 2503.17945 | null |
2025-03-23 | Cross-Domain Underwater Image Enhancement Guided by No-Reference Image Quality Assessment: A Transfer Learning Approach | Zhi Zhang et.al. | 2503.17937 | null |
2025-03-23 | Cat-AIR: Content and Task-Aware All-in-One Image Restoration | Jiachen Jiang et.al. | 2503.17915 | null |
2025-03-23 | What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images | Dongheng Lin et.al. | 2503.17899 | null |
2025-03-22 | good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval | Pranavi Kolouju et.al. | 2503.17871 | null |
2025-03-21 | Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2503.17109 | link |
2025-03-21 | Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks | Haijin Zeng et.al. | 2503.16930 | null |
2025-03-20 | Efficient Bayesian Computation Using Plug-and-Play Priors for Poisson Inverse Problems | Teresa Klatzer et.al. | 2503.16222 | null |
2025-03-20 | 3-D Image-to-Image Fusion in Lightsheet Microscopy by Two-Step Adversarial Network: Contribution to the FuseMyCells Challenge | Marek Wodzinski et.al. | 2503.16075 | null |
2025-03-20 | PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Qiang Zou et.al. | 2503.16064 | link |
2025-03-20 | Automating 3D Dataset Generation with Neural Radiance Fields | P. Schulz et.al. | 2503.15997 | link |
2025-03-20 | DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration | Suraj Singh et.al. | 2503.15984 | null |
2025-03-21 | UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations | Debabrata Mandal et.al. | 2503.15868 | null |
2025-03-19 | Image Restoration Models with Optimal Transport and Total Variation Regularization | Weijia Huang et.al. | 2503.14947 | null |
2025-03-19 | MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance | Zihan Cao et.al. | 2503.14944 | null |
2025-03-19 | Degradation Alchemy: Self-Supervised Unknown-to-Known Transformation for Blind Hyperspectral Image Fusion | He Huang et.al. | 2503.14892 | null |
2025-03-18 | Revisiting Image Fusion for Multi-Illuminant White-Balance Correction | David Serrano-Lozano et.al. | 2503.14774 | null |
2025-03-18 | SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model | Yucheng Mao et.al. | 2503.14463 | null |
2025-03-18 | AI-Driven Diabetic Retinopathy Diagnosis Enhancement through Image Processing and Salp Swarm Algorithm-Optimized Ensemble Network | Saif Ur Rehman Khan et.al. | 2503.14209 | null |
2025-03-18 | Towards properties of adversarial image perturbations | Egor Kuznetsov et.al. | 2503.14111 | null |
2025-03-18 | Intra and Inter Parser-Prompted Transformers for Effective Image Restoration | Cong Wang et.al. | 2503.14037 | link |
2025-03-17 | Scale Efficient Training for Large Datasets | Qing Zhou et.al. | 2503.13385 | null |
2025-03-17 | From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective | Chen Zhao et.al. | 2503.13165 | null |
2025-03-17 | All You Need to Know About Training Image Retrieval Models | Gabriele Berton et.al. | 2503.13045 | link |
2025-03-17 | Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion | Yidi Liu et.al. | 2503.12764 | null |
2025-03-16 | DPF-Net: Physical Imaging Model Embedded Data-Driven Underwater Image Enhancement | Han Mei et.al. | 2503.12470 | link |
2025-03-16 | Pathology Image Restoration via Mixture of Prompts | Jiangdong Cai et.al. | 2503.12399 | link |
2025-03-14 | Advancements in Real-Time Oncology Diagnosis: Harnessing AI and Image Fusion Techniques | Leila Bagheriye et.al. | 2503.11332 | null |
2025-03-14 | Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking | Andong Lu et.al. | 2503.11247 | null |
2025-03-14 | Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption | Du Chen et.al. | 2503.11221 | null |
2025-03-14 | InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences | Hongkai Zheng et.al. | 2503.11043 | null |
2025-03-13 | ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning | Pengfei Luo et.al. | 2503.10166 | link |
2025-03-13 | Hybrid Agents for Image Restoration | Bingchen Li et.al. | 2503.10120 | null |
2025-03-13 | Dream-IF: Dynamic Relative EnhAnceMent for Image Fusion | Xingxin Xu et.al. | 2503.10109 | null |
2025-03-12 | FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification | Shoaib Meraj Sami et.al. | 2503.09873 | null |
2025-03-12 | Multi-Agent Image Restoration | Xu Jiang et.al. | 2503.09403 | null |
2025-03-12 | Revisiting Medical Image Retrieval via Knowledge Consolidation | Yang Nan et.al. | 2503.09370 | null |
2025-03-12 | MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration | Zhehui Wu et.al. | 2503.09131 | link |
2025-03-12 | Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal | Rongxin Liao et.al. | 2503.09013 | link |
2025-03-11 | QUIET-SR: Quantum Image Enhancement Transformer for Single Image Super-Resolution | Siddhant Dutta et.al. | 2503.08759 | null |
2025-03-11 | Language-Depth Navigated Thermal and Visible Image Fusion | Jinchang Zhang et.al. | 2503.08676 | null |
2025-03-11 | PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net | Jun Yin et.al. | 2503.08276 | null |
2025-03-11 | TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement | Miao Zhang et.al. | 2503.08168 | null |
2025-03-11 | Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features | Hanbyul Lee et.al. | 2503.08148 | null |
2025-03-11 | Deep Perceptual Enhancement for Medical Image Analysis | S M A Sharif et.al. | 2503.08027 | link |
2025-03-10 | GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts | Minwen Liao et.al. | 2503.07417 | null |
2025-03-10 | Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion | Haowen Bai et.al. | 2503.07235 | null |
2025-03-11 | Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios | Chenglu Pan et.al. | 2503.07232 | null |
2025-03-10 | Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization | Michael Green et.al. | 2503.07038 | null |
2025-03-10 | Zero-Shot Hashing Based on Reconstruction With Part Alignment | Yan Jiang et.al. | 2503.07037 | null |
2025-03-10 | Learning a Unified Degradation-aware Representation Model for Multi-modal Image Fusion | Haolong Ma et.al. | 2503.07033 | null |
2025-03-10 | MERLION: Marine ExploRation with Language guIded Online iNformative Visual Sampling and Enhancement | Shrutika Vishal Thengane et.al. | 2503.06953 | null |
2025-03-09 | RoboDesign1M: A Large-scale Dataset for Robot Design Understanding | Tri Le et.al. | 2503.06796 | null |
2025-03-09 | StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition | Yanqing Shen et.al. | 2503.06601 | link |
2025-03-07 | Data-Efficient Generalization for Zero-shot Composed Image Retrieval | Zining Chen et.al. | 2503.05204 | null |
2025-03-06 | RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining | Tengfei Zhang et.al. | 2503.04653 | null |
2025-03-06 | Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior | Haitao Wu et.al. | 2503.04207 | null |
2025-03-05 | An Adaptive Underwater Image Enhancement Framework via Multi-Domain Fusion and Color Compensation | Yuezhe Tian et.al. | 2503.03640 | null |
2025-03-05 | Mineral segmentation using electron microscope images and spectral sampling through multimodal graph neural networks | Samuel Repka et.al. | 2503.03507 | null |
2025-03-05 | Two-Stream Thermal Imaging Fusion for Enhanced Time of Birth Detection in Neonatal Care | Jorge García-Torres et.al. | 2503.03244 | null |
2025-03-03 | Hyperspectral Image Restoration and Super-resolution with Physics-Aware Deep Learning for Biomedical Applications | Yuchen Xiang et.al. | 2503.02908 | null |
2025-03-04 | ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement | Xuejian Guo et.al. | 2503.02484 | link |
2025-03-04 | Semantic Prior Distillation with Vision Foundation Model for Enhanced Rapid Bone Scintigraphy Image Restoration | Pengchen Liang et.al. | 2503.02321 | null |
2025-03-03 | MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting | Mojtaba Safari et.al. | 2503.01576 | link |
2025-03-03 | Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions | Zihan Shen et.al. | 2503.01339 | null |
2025-03-03 | Composed Multi-modal Retrieval: A Survey of Approaches and Applications | Kun Zhang et.al. | 2503.01334 | link |
2025-03-03 | Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual | Chong Wang et.al. | 2503.01288 | link |
2025-03-03 | Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond | Guanyao Wu et.al. | 2503.01210 | null |
2025-03-02 | Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion | Daiki Nishiyama et.al. | 2503.00925 | null |
2025-03-01 | Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement | Aupendu Kar et.al. | 2503.00642 | link |
2025-03-01 | Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning | Songlin Dong et.al. | 2503.00515 | null |
2025-02-28 | SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events | Yunfan Lu et.al. | 2502.21120 | null |
2025-02-28 | CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval | Zelong Sun et.al. | 2502.20826 | null |
2025-02-28 | Diffusion Restoration Adapter for Real-World Image Restoration | Hanbang Liang et.al. | 2502.20679 | null |
2025-02-28 | HVI: A New Color Space for Low-light Image Enhancement | Qingsen Yan et.al. | 2502.20272 | link |
2025-02-27 | Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps | Tianxiao Gao et.al. | 2502.20054 | null |
2025-02-27 | Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement | Nan An et.al. | 2502.19867 | null |
2025-02-27 | One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion | Chunyang Cheng et.al. | 2502.19854 | link |
2025-02-26 | ILACS-LGOT: A Multi-Layer Contrast Enhancement Approach for Palm-Vein Images | Kaveen Perera et.al. | 2502.19456 | null |
2025-02-27 | On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation | Ruben T. Lucassen et.al. | 2502.19285 | null |
2025-02-26 | Self-supervised conformal prediction for uncertainty quantification in Poisson imaging problems | Bernardin Tamo Amougou et.al. | 2502.19194 | null |
2025-02-26 | Multi-level Attention-guided Graph Neural Network for Image Restoration | Jiatao Jiang et.al. | 2502.19181 | null |
2025-02-27 | RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images | Yuhan Tang et.al. | 2502.19153 | null |
2025-02-26 | Dynamic Degradation Decomposition Network for All-in-One Image Restoration | Huiqiang Wang et.al. | 2502.19068 | null |
2025-02-25 | Spatial Analysis of Neuromuscular Junctions Activation in Three-Dimensional Histology-based Muscle Reconstructions | Alessandro Ascani Orsini et.al. | 2502.18646 | link |
2025-02-24 | Splitting Regularized Wasserstein Proximal Algorithms for Nonsmooth Sampling Problems | Fuqun Han et.al. | 2502.16773 | link |
2025-02-23 | Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries | Yin Wu et.al. | 2502.16636 | link |
2025-02-21 | Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement | Uche A. Nnolim et.al. | 2502.15986 | null |
2025-02-21 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval | Guanqi Zhan et.al. | 2502.15682 | null |
2025-02-21 | LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement | Namrah Siddiqua et.al. | 2502.15186 | null |
2025-02-21 | Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization | Ach Khozaimi et.al. | 2502.15156 | null |
2025-02-20 | Reinforcement Learning for Ultrasound Image Analysis A Comprehensive Review of Advances and Applications | Maha Ezzelarab et.al. | 2502.14995 | null |
2025-02-20 | CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond | Yukai Shi et.al. | 2502.14493 | null |
2025-02-20 | EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement | Wenhui Zhu et.al. | 2502.14260 | null |
2025-02-19 | RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior | Ching-Hua Lee et.al. | 2502.13574 | null |
2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | link |
2025-02-18 | Local Flaw Detection with Adaptive Pyramid Image Fusion Across Spatial Sampling Resolution for SWRs | Siyu You et.al. | 2502.12512 | null |
2025-02-17 | Descriminative-Generative Custom Tokens for Vision-Language Models | Pramuditha Perera et.al. | 2502.12095 | null |
2025-02-17 | ILIAS: Instance-Level Image retrieval At Scale | Giorgos Kordopatis-Zilos et.al. | 2502.11748 | null |
2025-02-17 | Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics | Francesco Croce et.al. | 2502.11725 | link |
2025-02-17 | Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization | Yuanze Xu et.al. | 2502.11408 | null |
2025-02-12 | E2LVLM:Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection | Junjie Wu et.al. | 2502.10455 | null |
2025-02-19 | Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal | Jinpei Guo et.al. | 2502.09873 | link |
2025-02-13 | Source function from two-particle correlation function through entropy-regularized Richardson-Lucy deblurring | C. K. Tam et.al. | 2502.09478 | null |
2025-02-13 | ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation | Rotem Shalev-Arkushin et.al. | 2502.09411 | null |
2025-02-12 | Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions | Prajwal Gatti et.al. | 2502.08438 | null |
2025-02-13 | MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers | Ao Li et.al. | 2502.07856 | null |
2025-02-11 | Captured by Captions: On Memorization and its Mitigation in CLIP Models | Wenhao Wang et.al. | 2502.07830 | null |
2025-02-11 | Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems | Ai Chen et.al. | 2502.07351 | link |
2025-02-11 | Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | Haowen Gao et.al. | 2502.07327 | null |
2025-02-11 | PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval | Osman Tursun et.al. | 2502.07215 | null |
2025-02-10 | AstroLoc: Robust Space to Ground Image Localizer | Gabriele Berton et.al. | 2502.07003 | null |
2025-02-10 | UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis | Zemin Yang et.al. | 2502.06324 | null |
2025-02-09 | A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement | Muhammad Turab et.al. | 2502.05995 | null |
2025-02-09 | Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education | Yanhao Jia et.al. | 2502.05863 | null |
2025-02-11 | UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control | Kaizhen Zhu et.al. | 2502.05749 | link |
2025-02-07 | Self-supervised Conformal Prediction for Uncertainty Quantification in Imaging Problems | Jasper M. Everink et.al. | 2502.05127 | null |
2025-02-07 | Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition | S Sreehari et.al. | 2502.04680 | null |
2025-02-07 | HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion | Mengting Ma et.al. | 2502.04623 | null |
2025-02-06 | Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Marco Mistretta et.al. | 2502.04263 | link |
2025-02-05 | All-in-One Image Compression and Restoration | Huimin Zeng et.al. | 2502.03649 | link |
2025-02-05 | Efficient Image Restoration via Latent Consistency Flow Matching | Elad Cohen et.al. | 2502.03500 | null |
2025-02-05 | Human-Aligned Image Models Improve Visual Decoding from the Brain | Nona Rajabi et.al. | 2502.03081 | null |
2025-02-04 | Blind Visible Watermark Removal with Morphological Dilation | Preston K. Robinette et.al. | 2502.02676 | null |
2025-02-04 | MATCNN: Infrared and Visible Image Fusion Method Based on Multi-scale CNN with Attention Transformer | Jingjing Liu et.al. | 2502.01959 | link |
2025-02-03 | Deep Unfolding Multi-modal Image Fusion Network via Attribution Analysis | Haowen Bai et.al. | 2502.01467 | null |
2025-02-03 | Human Body Restoration with One-Step Diffusion Model and A New Benchmark | Jue Gong et.al. | 2502.01411 | null |
2025-02-03 | ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies | Costin F. Ciusdel et.al. | 2502.01335 | null |
2025-02-04 | Compressed Image Generation with Denoising Diffusion Codebook Models | Guy Ohayon et.al. | 2502.01189 | null |
2025-02-01 | A framework for river connectivity classification using temporal image processing and attention based neural networks | Timothy James Becker et.al. | 2502.00474 | null |
2025-02-01 | Shape from Semantics: 3D Shape Generation from Multi-View Semantics | Liangchen Li et.al. | 2502.00360 | null |
2025-01-31 | Deep Ensembling with Multimodal Image Fusion for Efficient Classification of Lung Cancer | Surochita Pal et.al. | 2502.00078 | null |
2025-01-30 | Integrating Spatial and Frequency Information for Under-Display Camera Image Restoration | Kyusu Ahn et.al. | 2501.18517 | null |
2025-01-31 | MatIR: A Hybrid Mamba-Transformer Image Restoration Model | Juan Wen et.al. | 2501.18401 | link |
2025-01-30 | Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers | Malte Tölle et.al. | 2501.18237 | null |
2025-01-29 | Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment | Zixue Zeng et.al. | 2501.17690 | link |
2025-01-28 | Text-to-Image Generation for Vocabulary Learning Using the Keyword Method | Nuwan T. Attygalle et.al. | 2501.17099 | null |
2025-01-27 | Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration | Long Peng et.al. | 2501.16583 | null |
2025-01-27 | UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images | Tatiana Taís Schein et.al. | 2501.16211 | link |
2025-01-27 | Freestyle Sketch-in-the-Loop Image Segmentation | Subhadeep Koley et.al. | 2501.16022 | null |
2025-01-27 | CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference | Zhengyang Lu et.al. | 2501.15852 | link |
2025-01-26 | Universal Image Restoration Pre-training via Degradation Classification | JiaKui Hu et.al. | 2501.15510 | link |
2025-01-26 | Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations | Zijun Long et.al. | 2501.15379 | null |
2025-01-24 | Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders | Zaheer Ahmad et.al. | 2501.14709 | null |
2025-01-24 | Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement | Guoxi Huang et.al. | 2501.14265 | link |
2025-01-24 | CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image | Xiaojun Tang et.al. | 2501.14264 | null |
2025-01-23 | Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jakob Krogh Petersen et.al. | 2501.14051 | link |
2025-01-23 | INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration | Di You et.al. | 2501.14014 | null |
2025-01-23 | Binary Diffusion Probabilistic Model | Vitaliy Kinakh et.al. | 2501.13915 | null |
2025-01-23 | Where Do You Go? Pedestrian Trajectory Prediction using Scene Features | Mohammad Ali Rezaei et.al. | 2501.13848 | null |
2025-01-22 | UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior | I-Hsiang Chen et.al. | 2501.13134 | null |
2025-01-22 | Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects | Louis Aberdeen et.al. | 2501.13009 | null |
2025-01-22 | UniUIR: Considering Underwater Image Restoration as An All-in-One Learner | Xu Zhang et.al. | 2501.12981 | null |
2025-01-22 | FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration | Ruicheng Zhang et.al. | 2501.12832 | link |
2025-01-21 | Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping | Hongxu Yang et.al. | 2501.12245 | null |
2025-01-21 | DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains | Junyu Xia et.al. | 2501.12235 | null |
2025-01-21 | Proxies for Distortion and Consistency with Applications for Real-World Image Restoration | Sean Man et.al. | 2501.12102 | null |
2025-01-20 | SILO: Solving Inverse Problems with Latent Operators | Ron Raphaeli et.al. | 2501.11746 | null |
2025-01-19 | Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection | Zhipeng Yu et.al. | 2501.11063 | link |
2025-01-19 | Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation | Zhengwen Shen et.al. | 2501.10958 | null |
2025-01-18 | Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption | Jinyuan Liu et.al. | 2501.10761 | link |
2025-01-18 | A Resource-Efficient Training Framework for Remote Sensing Text–Image Retrieval | Weihang Zhang et.al. | 2501.10638 | null |
2025-01-17 | DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration | Huiyun Cao et.al. | 2501.10325 | null |
2025-01-16 | FLOL: Fast Baselines for Real-World Low-Light Enhancement | Juan C. Benito et.al. | 2501.09718 | link |
2025-01-16 | Soft Knowledge Distillation with Multi-Dimensional Cross-Net Attention for Image Restoration Models Compression | Yongheng Zhang et.al. | 2501.09321 | null |
2025-01-16 | Knowledge Distillation for Image Restoration : Simultaneous Learning from Degraded and Clean Images | Yongheng Zhang et.al. | 2501.09268 | null |
2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001 | link |
2025-01-12 | SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | Bhavin Jawade et.al. | 2501.08347 | null |
2025-01-14 | AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring | Sanjida Afrin Mou et.al. | 2501.08266 | link |
2025-01-13 | Depth and Image Fusion for Road Obstacle Detection Using Stereo Camera | Oleg Perezyabov et.al. | 2501.07245 | null |
2025-01-12 | Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation | Zhenyang Feng et.al. | 2501.06749 | null |
2025-01-11 | Natural Language Supervision for Low-light Image Enhancement | Jiahui Tang et.al. | 2501.06546 | null |
2025-01-10 | Underwater Image Enhancement using Generative Adversarial Networks: A Survey | Kancharagunta Kishan Babu et.al. | 2501.06273 | null |
2025-01-09 | HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction | Shaurya Singh Rathore et.al. | 2501.05195 | null |
2025-01-09 | ResPanDiff: Diffusion Model with Disentangled Modulations for Image Fusion | Shiqi Cao et.al. | 2501.05091 | null |
2025-01-09 | IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation | Qi Chen et.al. | 2501.04995 | link |
2025-01-08 | Color Correction Meets Cross-Spectral Refinement: A Distribution-Aware Diffusion for Underwater Image Restoration | Laibin Chang et.al. | 2501.04740 | null |
2025-01-14 | HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion | Chia-Ming Lee et.al. | 2501.04665 | null |
2025-01-08 | FrontierNet: Learning Visual Cues to Explore | Boyang Sun et.al. | 2501.04597 | null |
2025-01-08 | MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration | Zhi Jin et.al. | 2501.04486 | link |
2025-01-08 | Recognition-Oriented Low-Light Image Enhancement based on Global and Pixelwise Optimization | Seitaro Ono et.al. | 2501.04210 | null |
2025-01-07 | Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications | L. Berlyand et.al. | 2501.04182 | null |
2025-01-07 | Convergent Primal-Dual Plug-and-Play Image Restoration: A General Algorithm and Applications | Yodai Suzuki et.al. | 2501.03780 | link |
2025-01-06 | ImageMM: Joint multi-frame image restoration and super-resolution | Yashil Sukurdeep et.al. | 2501.03002 | null |
2025-01-06 | Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI | Xujin Li et.al. | 2501.02841 | null |
2025-01-06 | Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis | Xiaojiao Guo et.al. | 2501.02701 | link |
2025-01-03 | iCBIR-Sli: Interpretable Content-Based Image Retrieval with 2D Slice Embeddings | Shuhei Tomoshige et.al. | 2501.01642 | null |
2025-01-02 | Domain-invariant feature learning in brain MR imaging for content-based image retrieval | Shuya Tobari et.al. | 2501.01326 | null |
2025-01-03 | Conditional Consistency Guided Image Translation and Enhancement | Amil Bhagat et.al. | 2501.01223 | link |
2025-01-02 | Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion | Dong Zhang et.al. | 2501.01114 | null |
2024-12-30 | Text-to-Image GAN with Pretrained Representations | Xiaozhou You et.al. | 2501.00116 | null |
2024-12-30 | Varformer: Adapting VAR’s Generative Prior for Image Restoration | Siyang Wang et.al. | 2412.21063 | link |
2024-12-30 | Low-Light Image Enhancement via Generative Perceptual Priors | Han Zhou et.al. | 2412.20916 | null |
2024-12-29 | Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) | Tomer Garber et.al. | 2412.20596 | link |
2024-12-28 | Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems | Wen-Dong Jiang et.al. | 2412.20201 | null |
2024-12-28 | UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity | Jingbo Lin et.al. | 2412.20157 | link |
2024-12-28 | MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration | Boyun Li et.al. | 2412.20066 | link |
2024-12-28 | An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models | Yuang Wang et.al. | 2412.19992 | null |
2024-12-27 | Generative Adversarial Network on Motion-Blur Image Restoration | Zhengdong Li et.al. | 2412.19479 | null |
2024-12-25 | FOR: Finetuning for Object Level Open Vocabulary Image Retrieval | Hila Levi et.al. | 2412.18806 | null |
2024-12-24 | Underwater Image Restoration via Polymorphic Large Kernel CNNs | Xiaojiao Guo et.al. | 2412.18459 | link |
2024-12-24 | UNet–: Memory-Efficient and Feature-Enhanced Network Architecture based on U-Net with Reduced Skip-Connections | Lingxiao Yin et.al. | 2412.18276 | null |
2024-12-24 | SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos | Zhen Zhang et.al. | 2412.18214 | link |
2024-12-24 | ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval | Le Dong et.al. | 2412.18136 | link |
2024-12-22 | Where am I? Cross-View Geo-localization with Natural Language Descriptions | Junyan Ye et.al. | 2412.17007 | null |
2024-12-21 | Optoelectronic generative adversarial networks | Jumin Qiu et.al. | 2412.16672 | link |
2024-12-21 | Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising | Yuchen Wang et.al. | 2412.16645 | null |
2024-12-24 | Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling | Daichi Yashima et.al. | 2412.16576 | link |
2024-12-21 | Rethinking Model Redundancy for Low-light Image Enhancement | Tong Li et.al. | 2412.16459 | null |
2024-12-20 | SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild | Jannik Elsäßer et.al. | 2412.16147 | null |
2024-12-20 | NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images | Yue Guo et.al. | 2412.15890 | null |
2024-12-20 | Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation | Aiwen Jiang et.al. | 2412.15845 | link |
2024-12-20 | A New Method to Capturing Compositional Knowledge in Linguistic Space | Jiahe Wan et.al. | 2412.15632 | null |
2024-12-20 | Stabilizing Laplacian Inversion in Fokker-Planck Image Retrieval using the Transport-of-Intensity Equation | Samantha J Alloo et.al. | 2412.15513 | null |
2024-12-19 | Learning Visual Composition through Improved Semantic Guidance | Austin Stone et.al. | 2412.15396 | null |
2024-12-19 | Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model | Minglong Xue et.al. | 2412.14630 | link |
2024-12-19 | MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | Junjie Zhou et.al. | 2412.14475 | null |
2024-12-18 | Personalized Generative Low-light Image Denoising and Enhancement | Xijun Wang et.al. | 2412.14327 | null |
2024-12-18 | Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing | Le-Anh Tran et.al. | 2412.14220 | link |
2024-12-18 | Adversarial Hubness in Multi-Modal Retrieval | Tingwei Zhang et.al. | 2412.14113 | link |
2024-12-18 | Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval | Giacomo Pacini et.al. | 2412.13834 | null |
2024-12-18 | Fed-AugMix: Balancing Privacy and Utility via Data Augmentation | Haoyang Li et.al. | 2412.13818 | null |
2024-12-18 | Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode | Xin Su et.al. | 2412.13749 | link |
2024-12-18 | VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement | Chen Zhao et.al. | 2412.13655 | link |
2024-12-18 | DarkIR: Robust Low-Light Image Restoration | Daniel Feijoo et.al. | 2412.13443 | link |
2024-12-18 | Zero-Shot Low Light Image Enhancement with Diffusion Prior | Joshua Cho et.al. | 2412.13401 | link |
2024-12-17 | Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration | Xinlong Cheng et.al. | 2412.12550 | null |
2024-12-17 | Three Things to Know about Deep Metric Learning | Yash Patel et.al. | 2412.12432 | null |
2024-12-16 | Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD) | Ki-Hwan Oh et.al. | 2412.12238 | link |
2024-12-16 | Ultra-High-Definition Dynamic Multi-Exposure Image Fusion via Infinite Pixel Learning | Xingchi Chen et.al. | 2412.11685 | null |
2024-12-16 | CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution | Bingwen Hu et.al. | 2412.11609 | null |
2024-12-15 | Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval | Zelong Sun et.al. | 2412.11087 | null |
2024-12-15 | Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2412.11077 | link |
2024-12-15 | Towards Context-aware Convolutional Network for Image Restoration | Fangwei Hao et.al. | 2412.11008 | null |
2024-12-14 | Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification | Yucong Meng et.al. | 2412.10776 | null |
2024-12-16 | Matrix Completion via Residual Spectral Matching | Ziyuan Chen et.al. | 2412.10005 | null |
2024-12-13 | $\textrm{A}^{\textrm{2}}$RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion | Jiawei Li et.al. | 2412.09954 | link |
2024-12-12 | OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs | Yuanzhi Zhu et.al. | 2412.09465 | link |
2024-12-13 | Are Conditional Latent Diffusion Models Effective for Image Restoration? | Yunchen Yuan et.al. | 2412.09324 | null |
2024-12-13 | MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition | Qiwen Gu et.al. | 2412.09199 | null |
2024-12-12 | ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring | Zhongbao Yang et.al. | 2412.09193 | null |
2024-12-12 | Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration | Yunshuai Zhou et.al. | 2412.08939 | link |
2024-12-12 | A Flexible Plug-and-Play Module for Generating Variable-Length | Liyang He et.al. | 2412.08922 | link |
2024-12-11 | Image Retrieval Methods in the Dissimilarity Space | Madhu Kiran et.al. | 2412.08618 | null |
2024-12-11 | Convergence Analysis of a Proximal Stochastic Denoising Regularization Algorithm | Marien Renaud et.al. | 2412.08262 | null |
2024-12-11 | Visible and Infrared Image Fusion Using Encoder-Decoder Network | Ferhat Can Ataman et.al. | 2412.08073 | link |
2024-12-11 | BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion | Huafeng Li et.al. | 2412.08050 | link |
2024-12-10 | Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance | Wanwen Chen et.al. | 2412.07741 | null |
2024-12-10 | Leveraging Content and Context Cues for Low-Light Image Enhancement | Igor Morawski et.al. | 2412.07693 | link |
2024-12-10 | Analytical-Heuristic Modeling and Optimization for Low-Light Image Enhancement | Axel Martinez et.al. | 2412.07659 | null |
2024-12-10 | Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE) | Tu Vo et.al. | 2412.07527 | null |
2024-12-10 | Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring | Yuzhi Zhao et.al. | 2412.07256 | link |
2024-12-10 | EchoIR: Advancing Image Restoration with Echo Upsampling and Bi-Level Optimization | Yuhan He et.al. | 2412.07225 | null |
2024-12-10 | A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing | Yujie Feng et.al. | 2412.07195 | null |
2024-12-09 | InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention | Howard Zhang et.al. | 2412.06753 | null |
2024-12-09 | EchoSim4D: A Proof-of-Concept Gamified XR Echocardiography Training Simulator for Neonates using 4D Ultrasound Volume | Deepthy Rose Jose et.al. | 2412.06271 | null |
2024-12-08 | A Review on Multisensor Data Fusion for Wearable Health Monitoring | Arlene John et.al. | 2412.05895 | null |
2024-12-07 | Compositional Image Retrieval via Instruction-Aware Contrastive Learning | Wenliang Zhong et.al. | 2412.05756 | link |
2024-12-07 | Enhancing Sample Generation of Diffusion Models using Noise Level Correction | Abulikemu Abuduweili et.al. | 2412.05488 | null |
2024-12-06 | Equivariant Denoisers for Image Restoration | Marien Renaud et.al. | 2412.05343 | null |
2024-12-06 | ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration | Chi-Wei Hsiao et.al. | 2412.05043 | null |
2024-12-06 | DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection | Yishuo Chen et.al. | 2412.04931 | link |
2024-12-06 | DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification | Ying Jin et.al. | 2412.04828 | null |
2024-12-06 | Modality Decoupling is All You Need: A Simple Solution for Unsupervised Hyperspectral Image Fusion | Songcheng Du et.al. | 2412.04802 | null |
2024-12-05 | Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise | Brayan Monroy et.al. | 2412.04648 | link |
2024-12-05 | MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers | Byeonghyeon Lee et.al. | 2412.04591 | null |
2024-12-05 | Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image | Shuang Xu et.al. | 2412.04201 | null |
2024-12-05 | Deep priors for satellite image restoration with accurate uncertainties | Biquard Maud et.al. | 2412.04130 | null |
2024-12-05 | Blind Underwater Image Restoration using Co-Operational Regressor Networks | Ozer Can Devecioglu et.al. | 2412.03995 | null |
2024-12-05 | LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model | Yuan Xue et.al. | 2412.03841 | null |
2024-12-05 | Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration | Yuzhen Du et.al. | 2412.03814 | null |
2024-12-04 | Composed Image Retrieval for Training-Free Domain Conversion | Nikos Efthymiadis et.al. | 2412.03297 | link |
2024-12-04 | Task-driven Image Fusion with Learnable Fusion Loss | Haowen Bai et.al. | 2412.03240 | null |
2024-12-04 | Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution | Jiahua Xiao et.al. | 2412.02960 | null |
2024-12-03 | Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval | Leah Bar et.al. | 2412.02310 | link |
2024-12-03 | Relaxed and Inertial Nonlinear Forward-Backward with Momentum | Fernando Roldán et.al. | 2412.02045 | link |
2024-12-02 | Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features | MD Shaikh Rahman et.al. | 2412.01555 | null |
2024-12-02 | Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond | MD Raqib Khan et.al. | 2412.01456 | link |
2024-12-02 | FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration | Hao Li et.al. | 2412.01427 | null |
2024-12-02 | Neuron Abandoning Attention Flow: Visual Explanation of Dynamics inside CNN Models | Yi Liao et.al. | 2412.01202 | null |
2024-12-01 | Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration | Haoze Sun et.al. | 2412.00878 | null |
2024-12-01 | DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement | Tongshun Zhang et.al. | 2412.00683 | link |
2024-12-01 | MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning | You Wu et.al. | 2412.00626 | null |
2024-11-30 | Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion | Michail Dontas et.al. | 2412.00557 | null |
2024-11-29 | Self-Supervised Denoiser Framework | Emilien Valat et.al. | 2411.19593 | null |
2024-11-27 | Optimizing Image Retrieval with an Extended b-Metric Space | Abdelkader Belhenniche et.al. | 2411.18800 | null |
2024-11-27 | Hierarchical Information Flow for Generalized Efficient Image Restoration | Yawei Li et.al. | 2411.18588 | null |
2024-11-27 | Complexity Experts are Task-Discriminative Learners for Any Image Restoration | Eduard Zamfir et.al. | 2411.18466 | null |
2024-11-27 | Adaptive Blind All-in-One Image Restoration | David Serrano-Lozano et.al. | 2411.18412 | link |
2024-11-29 | HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning | Zengxi Zhang et.al. | 2411.18296 | link |
2024-11-27 | TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution | Linwei Dong et.al. | 2411.18263 | link |
2024-12-02 | Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision | Jinnyeong Kim et.al. | 2411.18025 | null |
2024-11-26 | Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation | Sudarshan Rajagopalan et.al. | 2411.17814 | null |
2024-11-26 | GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration | Sudarshan Rajagopalan et.al. | 2411.17687 | null |
2024-11-26 | Learning Visual Hierarchies with Hyperbolic Embeddings | Ziwei Wang et.al. | 2411.17490 | null |
2024-11-26 | Puzzle Similarity: A Perceptually-guided No-Reference Metric for Artifact Detection in 3D Scene Reconstructions | Nicolai Hermann et.al. | 2411.17489 | null |
2024-11-26 | MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers | Ruoxi Zhu et.al. | 2411.17226 | link |
2024-11-25 | Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding | Yubin Gu et.al. | 2411.16217 | null |
2024-11-25 | U2NeRF: Unsupervised Underwater Image Restoration and Neural Radiance Fields | Vinayak Gupta et.al. | 2411.16172 | null |
2024-11-25 | Image Generation Diversity Issues and How to Tame Them | Mischa Dombrowski et.al. | 2411.16171 | link |
2024-11-24 | PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation | Chia-Ming Lee et.al. | 2411.15922 | link |
2024-11-24 | MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking | Chunhui Zhang et.al. | 2411.15761 | link |
2024-11-24 | LTCF-Net: A Transformer-Enhanced Dual-Channel Fourier Framework for Low-Light Image Restoration | Gaojing Zhang et.al. | 2411.15740 | null |
2024-11-22 | Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration | Darshan Thaker et.al. | 2411.15295 | null |
2024-11-22 | MambaIRv2: Attentive State Space Restoration | Hang Guo et.al. | 2411.15269 | link |
2024-11-22 | Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval | Zengbao Sun et.al. | 2411.14704 | link |
2024-11-21 | Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection | Ali Awad et.al. | 2411.14626 | null |
2024-11-21 | Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion | Jinhong He et.al. | 2411.13961 | link |
2024-11-20 | Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms | Matthieu Kowalski et.al. | 2411.13276 | null |
2024-11-20 | Globally Correlation-Aware Hard Negative Generation | Wenjie Peng et.al. | 2411.13145 | link |
2024-11-19 | Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution | Yang Zou et.al. | 2411.12530 | link |
2024-11-19 | Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models | Jun Xiao et.al. | 2411.12450 | null |
2024-11-19 | Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images | Zheng Gong et.al. | 2411.12278 | null |
2024-11-16 | GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Yue Zhou et.al. | 2411.11904 | link |
2024-11-18 | Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion | Meng Zhou et.al. | 2411.11799 | link |
2024-11-18 | Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment | Zhendong Liu et.al. | 2411.11543 | null |
2024-11-17 | Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method | Yan Zheng et.al. | 2411.11135 | null |
2024-11-19 | TSFormer: A Robust Framework for Efficient UHD Image Restoration | Xin Su et.al. | 2411.10951 | null |
2024-11-16 | AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations | Jiawei Mao et.al. | 2411.10708 | null |
2024-11-16 | Underwater Image Enhancement with Cascaded Contrastive Learning | Yi Liu et.al. | 2411.10682 | link |
2024-11-16 | SPDFusion: An Infrared and Visible Image Fusion Network Based on a Non-Euclidean Representation of Riemannian Manifolds | Huan Kang et.al. | 2411.10679 | null |
2024-11-15 | Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence | Guodong Sun et.al. | 2411.10321 | null |
2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309 | link |
2024-11-15 | Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion | Dan He et.al. | 2411.10036 | null |
2024-11-14 | Instruction-Driven Fusion of Infrared-Visible Images: Tailoring for Diverse Downstream Tasks | Zengyi Yang et.al. | 2411.09387 | null |
2024-11-13 | Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval | Saul Santos et.al. | 2411.08590 | link |
2024-11-13 | Saliency Map-based Image Retrieval using Invariant Krawtchouk Moments | Ashkan Nejad et.al. | 2411.08567 | link |
2024-11-12 | CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising | Linxuan Li et.al. | 2411.07930 | link |
2024-11-12 | Joint multi-dimensional dynamic attention and transformer for general image restoration | Huan Zhang et.al. | 2411.07893 | link |
2024-11-12 | All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model | Yuanbo Wen et.al. | 2411.07445 | null |
2024-11-11 | Multi-scale Frequency Enhancement Network for Blind Image Deblurring | Yawen Xiang et.al. | 2411.06893 | null |
2024-11-10 | Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration | Chen Wu et.al. | 2411.06456 | null |
2024-11-08 | A Modular Conditional Diffusion Framework for Image Reconstruction | Magauiya Zhussip et.al. | 2411.05993 | null |
2024-11-05 | From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Xintian Sun et.al. | 2411.05826 | null |
2024-11-07 | Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion | Yiming Sun et.al. | 2411.04697 | link |
2024-11-07 | l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion | Gargi Panda et.al. | 2411.04519 | null |
2024-11-05 | Test-Time Dynamic Image Fusion | Bing Cao et.al. | 2411.02840 | link |
2024-11-05 | ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing | Yuka Ogino et.al. | 2411.02799 | null |
2024-11-04 | TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives | Maitreya Patel et.al. | 2411.02545 | null |
2024-11-11 | INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | Edward Vendrow et.al. | 2411.02537 | link |
2024-11-04 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925 | null |
2024-11-03 | Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration | Xiaole Tang et.al. | 2411.01656 | link |
2024-11-03 | Conditional Controllable Image Fusion | Bing Cao et.al. | 2411.01573 | link |
2024-11-03 | Efficient Medical Image Retrieval Using DenseNet and FAISS for BIRADS Classification | MD Shaikh Rahman et.al. | 2411.01473 | null |
2024-11-03 | TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement | Xuanzhao Dong et.al. | 2411.01403 | null |
2024-11-02 | Medical X-Ray Image Enhancement Using Global Contrast-Limited Adaptive Histogram Equalization | Sohrab Namazi Nia et.al. | 2411.01373 | null |
2024-11-01 | Identifying Implicit Social Biases in Vision-Language Models | Kimia Hamidieh et.al. | 2411.00997 | null |
2024-10-31 | Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes | Shaohua Liu et.al. | 2411.00239 | null |
2024-10-31 | Chasing Better Deep Image Priors between Over- and Under-parameterization | Qiming Wu et.al. | 2410.24187 | link |
2024-10-31 | Nearest Neighbor Normalization Improves Multimodal Retrieval | Neil Chowdhury et.al. | 2410.24114 | link |
2024-10-31 | Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation | Yihang Zhou et.al. | 2410.23962 | null |
2024-10-31 | Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model | Hao Zhang et.al. | 2410.23905 | link |
2024-10-31 | MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval | Haiwen Li et.al. | 2410.23736 | null |
2024-10-31 | Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data | Yucun Hou et.al. | 2410.23628 | null |
2024-10-31 | MS-Glance: Non-semantic context vectors and the applications in supervising image reconstruction | Ziqi Gao et.al. | 2410.23577 | link |
2024-10-30 | Decoupling Semantic Similarity from Spatial Alignment for Neural Networks | Tassilo Wald et.al. | 2410.23107 | link |
2024-10-30 | EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models | Shangquan Sun et.al. | 2410.22959 | link |
2024-10-30 | SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion | Kun Hu et.al. | 2410.22837 | link |
2024-10-30 | Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement | Sahil Ali Akbar et.al. | 2410.21946 | link |
2024-10-29 | Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Monica Riedler et.al. | 2410.21943 | link |
2024-10-28 | Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework | Vladimir Arkhipkin et.al. | 2410.21061 | link |
2024-10-27 | Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement | Junhao Tan et.al. | 2410.20314 | link |
2024-10-27 | Deep Learning, Machine Learning – Digital Signal and Image Processing: From Theory to Application | Weiche Hsieh et.al. | 2410.20304 | null |
2024-10-24 | HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision | Burak Ercan et.al. | 2410.19164 | null |
2024-10-24 | ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval | Zijia Zhao et.al. | 2410.18715 | link |
2024-10-29 | DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation | Yuang Ai et.al. | 2410.18666 | link |
2024-10-23 | DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection | Qingpeng Li et.al. | 2410.17822 | link |
2024-10-23 | An Intelligent Agentic System for Complex Image Restoration Problems | Kaiwen Zhu et.al. | 2410.17809 | link |
2024-10-23 | A variational approach to nonlocal image restoration flows | Harsh Prasad et.al. | 2410.17649 | null |
2024-10-23 | Diffusion Priors for Variational Likelihood Estimation and Image Denoising | Jun Cheng et.al. | 2410.17521 | link |
2024-10-22 | Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2410.17393 | null |
2024-10-20 | LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration | Yuang Ai et.al. | 2410.15385 | link |
2024-10-20 | GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning | Haiwen Diao et.al. | 2410.15266 | link |
2024-10-19 | A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends | Junjun Jiang et.al. | 2410.15067 | link |
2024-10-19 | Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway’s Digitised Book Collection | Marie Roald et.al. | 2410.14969 | link |
2024-10-16 | Development of Image Collection Method Using YOLO and Siamese Network | Chan Young Shin et.al. | 2410.12561 | null |
2024-10-16 | Towards Flexible and Efficient Diffusion Low Light Enhancer | Guanzhou Lan et.al. | 2410.12346 | null |
2024-10-16 | Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond | Pengwei Liang et.al. | 2410.12274 | null |
2024-10-15 | Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos | Zhouxia Wang et.al. | 2410.11828 | null |
2024-10-15 | LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images | Yuzhou Cheng et.al. | 2410.11505 | null |
2024-10-13 | Fusion Based Hand Geometry Recognition Using Dempster-Shafer Theory | Asish Bera et.al. | 2410.09842 | null |
2024-10-13 | LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Md Tanvir Islam et.al. | 2410.09831 | link |
2024-10-14 | LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection | Mingjia Li et.al. | 2410.08810 | link |
2024-10-11 | Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers | Jin Cao et.al. | 2410.08688 | link |
2024-10-16 | Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP | Eunji Kim et.al. | 2410.08469 | null |
2024-10-11 | A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification | Eugene P. W. Ang et.al. | 2410.08456 | null |
2024-10-10 | TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration | Hsing-Hua Wang et.al. | 2410.08177 | link |
2024-10-10 | A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks | Hoin Jung et.al. | 2410.07593 | link |
2024-10-09 | Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval | Mohammad Omama et.al. | 2410.07022 | null |
2024-10-09 | Rethinking the Evaluation of Visible and Infrared Image Fusion | Dayan Guan et.al. | 2410.06811 | link |
2024-10-09 | InstantIR: Blind Image Restoration with Instant Generative Reference | Jen-Yuan Huang et.al. | 2410.06551 | null |
2024-10-09 | MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging | Noel C. F. Codella et.al. | 2410.06542 | null |
2024-10-08 | Temporal Image Caption Retrieval Competition – Description and Results | Jakub Pokrywka et.al. | 2410.06314 | null |
2024-10-08 | GSLoc: Visual Localization with 3D Gaussian Splatting | Kazii Botashev et.al. | 2410.06165 | null |
2024-10-08 | Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | Ayush Singh et.al. | 2410.05928 | null |
2024-10-08 | ReFIR: Grounding Large Restoration Models with Retrieval Augmentation | Hang Guo et.al. | 2410.05601 | link |
2024-10-09 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | Wei Wu et.al. | 2410.05249 | null |
2024-10-07 | Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration | Zhiyu Zhu et.al. | 2410.04811 | link |
2024-10-06 | Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli | Valentyn Piskovskyi et.al. | 2410.04497 | null |
2024-10-06 | SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems | Ismail Alkhouri et.al. | 2410.04479 | link |
2024-10-05 | Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model | Keda Tao et.al. | 2410.04161 | null |
2024-10-04 | Diffusion State-Guided Projected Gradient for Inverse Problems | Rayhan Zirvi et.al. | 2410.03463 | link |
2024-10-03 | PnP-Flow: Plug-and-Play Image Restoration with Flow Matching | Ségolène Martin et.al. | 2410.02423 | link |
2024-10-03 | Can Capacitive Touch Images Enhance Mobile Keyboard Decoding? | Piyawat Lertvittayakumjorn et.al. | 2410.02264 | link |
2024-10-02 | Posterior sampling via Langevin dynamics based on generative priors | Vishal Purohit et.al. | 2410.02078 | null |
2024-10-03 | EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections | Francesc Net et.al. | 2410.01536 | link |
2024-10-04 | CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment | Safouane El Ghazouali et.al. | 2410.01411 | link |
2024-10-01 | Three-Operator Splitting Method with Two-Step Inertial Extrapolation | Olaniyi S. Iyiola et.al. | 2410.01099 | null |
2024-10-01 | GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer | Youngho Yoon et.al. | 2410.00672 | link |
2024-10-01 | Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration | Guy Ohayon et.al. | 2410.00418 | link |
2024-10-01 | GLMHA A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction | Zaid Ilyas et.al. | 2410.00380 | null |
2024-09-30 | Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation | Aleyna Kütük et.al. | 2410.00266 | null |
2024-09-30 | A Survey on Diffusion Models for Inverse Problems | Giannis Daras et.al. | 2410.00083 | null |
2024-09-30 | UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation | Cheng Zhang et.al. | 2409.20197 | link |
2024-09-29 | Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation | Xiaofeng Cong et.al. | 2409.19685 | link |
2024-09-28 | Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration | Chu-Jie Qin et.al. | 2409.19403 | link |
2024-09-28 | VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition | Ahmad Khaliq et.al. | 2409.19293 | link |
2024-09-28 | PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution | Song Zhang et.al. | 2409.19269 | link |
2024-09-28 | Extending Depth of Field for Varifocal Multiview Images | Zhilong Li et.al. | 2409.19220 | null |
2024-09-27 | MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion | Bardienus Duisterhof et.al. | 2409.19152 | null |
2024-09-27 | Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors | Yunlong Lin et.al. | 2409.18899 | null |
2024-09-26 | Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval | Mankeerat Sidhu et.al. | 2409.18733 | null |
2024-09-27 | Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification | Salma Hassan et.al. | 2409.18715 | null |
2024-09-27 | Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models | Nguyen Gia Bach et.al. | 2409.18476 | link |
2024-09-27 | SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement | Yunkui Pang et.al. | 2409.18355 | link |
2024-09-26 | Toward Efficient Deep Blind RAW Image Restoration | Marcos V. Conde et.al. | 2409.18204 | link |
2024-09-26 | Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs | Qinpeng Cui et.al. | 2409.17778 | link |
2024-09-25 | Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement | Yihao Zhou et.al. | 2409.16661 | null |
2024-09-25 | Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement | Guanlin Li et.al. | 2409.16604 | link |
2024-09-24 | Proactive Schemes: A Survey of Adversarial Attacks for Social Good | Vishal Asnani et.al. | 2409.16491 | null |
2024-09-24 | Liger at W.M. Keck Observatory: imager structural analysis, fabrication, and characterization plan | James Wiley et.al. | 2409.16263 | null |
2024-09-23 | PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions | Weifeng Lin et.al. | 2409.15278 | link |
2024-09-23 | FusionRF: High-Fidelity Satellite Neural Radiance Fields from Multispectral and Panchromatic Acquisitions | Michael Sprintson et.al. | 2409.15132 | null |
2024-09-22 | Low-Light Enhancement Effect on Classification and Detection: An Empirical Study | Xu Wu et.al. | 2409.14461 | null |
2024-09-22 | Quantitative and Qualitative Evaluation of NLM and Wavelet Methods in Image Enhancement | Cameron Khanpour et.al. | 2409.14334 | null |
2024-09-20 | Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval | Morris Florek et.al. | 2409.13513 | link |
2024-09-19 | Deep Learning-Based Detection of Referable Diabetic Retinopathy and Macular Edema Using Ultra-Widefield Fundus Imaging | Philippe Zhang et.al. | 2409.12854 | null |
2024-09-19 | Fundus image enhancement through direct diffusion bridges | Sehui Kim et.al. | 2409.12377 | link |
2024-09-18 | Denoising diffusion models for high-resolution microscopy image restoration | Pamela Osuna-Vargas et.al. | 2409.12078 | null |
2024-09-18 | DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Jian Xu et.al. | 2409.11642 | link |
2024-09-17 | Ultrasound Image Enhancement with the Variance of Diffusion Models | Yuxin Zhang et.al. | 2409.11380 | link |
2024-09-17 | Improving the Efficiency of Visually Augmented Language Models | Paula Ontalvilla et.al. | 2409.11148 | link |
2024-09-17 | CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement | Xuanzhao Dong et.al. | 2409.10966 | link |
2024-09-16 | Taming Diffusion Models for Image Restoration: A Review | Ziwei Luo et.al. | 2409.10353 | null |
2024-09-17 | Fuse4Seg: Image-Level Fusion Based Multi-Modality Medical Image Segmentation | Yuchen Guo et.al. | 2409.10328 | null |
2024-09-16 | Garment Attribute Manipulation with Multi-level Attention | Vittorio Casula et.al. | 2409.10206 | null |
2024-09-16 | DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion | Yuchen Guo et.al. | 2409.10080 | null |
2024-09-15 | Underwater Image Enhancement via Dehazing and Color Restoration | Chengqin Wu et.al. | 2409.09779 | null |
2024-09-15 | Unsupervised Hyperspectral and Multispectral Image Blind Fusion Based on Deep Tucker Decomposition Network with Spatial-Spectral Manifold Learning | He Wang et.al. | 2409.09670 | link |
2024-09-14 | Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Amirreza Mahbod et.al. | 2409.09430 | link |
2024-09-14 | Infrared and Visible Image Fusion with Hierarchical Human Perception | Guang Yang et.al. | 2409.09291 | null |
2024-09-12 | Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement | Vamsi Krishna Vasa et.al. | 2409.07862 | null |
2024-09-12 | Quaternion Nuclear Norm minus Frobenius Norm Minimization for color image reconstruction | Yu Guo et.al. | 2409.07797 | null |
2024-09-11 | FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process | Yang Luo et.al. | 2409.07451 | null |
2024-09-11 | Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement | Xianmin Chen et.al. | 2409.07040 | link |
2024-09-11 | PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening | RuoCheng Wu et.al. | 2409.06980 | null |
2024-09-10 | Modeling Image Tone Dichotomy with the Power Function | Axel Martinez et.al. | 2409.06764 | null |
2024-09-10 | Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer | Li Ke et.al. | 2409.06590 | null |
2024-09-10 | Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models | Siyu Zhai et.al. | 2409.06420 | null |
2024-09-10 | A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions | Zhicong Wu et.al. | 2409.06381 | null |
2024-09-10 | Multi-Weather Image Restoration via Histogram-Based Transformer Feature Enhancement | Yang Wen et.al. | 2409.06334 | null |
2024-09-10 | AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration | Hongyi Cai et.al. | 2409.06206 | null |
2024-09-09 | Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Bram Willemsen et.al. | 2409.05721 | link |
2024-09-09 | Open-World Dynamic Prompt and Continual Visual Representation Learning | Youngeun Kim et.al. | 2409.05312 | null |
2024-09-09 | Rethinking the Atmospheric Scattering-driven Attention via Channel and Gamma Correction Priors for Low-Light Image Enhancement | Shyang-En Weng et.al. | 2409.05274 | link |
2024-09-07 | Training-free ZS-CIR via Weighted Modality Fusion and Similarity | Ren-Di Wu et.al. | 2409.04918 | link |
2024-09-07 | Power Line Aerial Image Restoration under Adverse Weather: Datasets and Baselines | Sai Yang et.al. | 2409.04812 | link |
2024-09-06 | Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models | Saghir Alfasly et.al. | 2409.04631 | null |
2024-09-06 | Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior | Charlesquin Kemajou Mbakam et.al. | 2409.04384 | null |
2024-09-06 | RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement | Hao Luo et.al. | 2409.04363 | link |
2024-09-06 | Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks | Hangcheng Cao et.al. | 2409.04133 | null |
2024-09-05 | Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration | Pei Wang et.al. | 2409.03455 | null |
2024-09-05 | KAN See In the Dark | Aoxiang Ning et.al. | 2409.03404 | link |
2024-09-05 | Multiple weather images restoration using the task transformer and adaptive mixup strategy | Yang Wen et.al. | 2409.03249 | null |
2024-09-05 | Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion | Chenguang Zhu et.al. | 2409.03223 | null |
2024-09-05 | Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem | Qiwen Zhu et.al. | 2409.03179 | link |
2024-09-04 | Design and Evaluation of Camera-Centric Mobile Crowdsourcing Applications | Abby Stylianou et.al. | 2409.03012 | null |
2024-09-04 | Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening | Ivan Pereira-Sánchez et.al. | 2409.02675 | link |
2024-09-04 | NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval | Sepanta Zeighami et.al. | 2409.02343 | link |
2024-09-03 | Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models | Jiaqi Xu et.al. | 2409.02101 | link |
2024-09-03 | F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring | Subhajit Paul et.al. | 2409.02056 | null |
2024-09-03 | AllWeatherNet: Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions | Chenghao Qian et.al. | 2409.02045 | link |
2024-09-03 | Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment | Konstantin Schall et.al. | 2409.01936 | link |
2024-09-03 | Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion | Ke Cao et.al. | 2409.01728 | null |
2024-09-03 | Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement | Kun Zhou et.al. | 2409.01641 | link |
2024-09-03 | GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting | Zixuan Guo et.al. | 2409.01581 | null |
2024-09-02 | A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches | Kim Jinwoo et.al. | 2409.01219 | null |
2024-08-30 | Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method | Yuji Lin et.al. | 2408.17339 | link |
2024-09-02 | RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance | Avideep Mukherjee et.al. | 2408.17095 | null |
2024-08-30 | Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL | Haiyang Zhao et.al. | 2408.17060 | null |
2024-08-29 | GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content | Lebin Zhou et.al. | 2408.16866 | null |
2024-09-02 | A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising | Shuaiyu Yuan et.al. | 2408.16481 | null |
2024-08-29 | Enhanced Control for Diffusion Bridge in Image Restoration | Conghan Yue et.al. | 2408.16303 | link |
2024-08-29 | Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | Kengo Nakata et.al. | 2408.16296 | null |
2024-08-29 | LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement | Ye Yu et.al. | 2408.16235 | link |
2024-08-28 | Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration | Xu Zhang et.al. | 2408.15994 | null |
2024-08-28 | MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion | Yanglin Deng et.al. | 2408.15641 | link |
2024-08-28 | Temporal Attention for Cross-View Sequential Image Localization | Dong Yuan et.al. | 2408.15569 | link |
2024-08-27 | A Preliminary Exploration Towards General Image Restoration | Xiangtao Kong et.al. | 2408.15143 | null |
2024-08-27 | Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild | Tianqi Wei et.al. | 2408.14723 | null |
2024-08-26 | FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation | Daixun Li et.al. | 2408.13980 | null |
2024-08-25 | LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task | Ali Asgarov et.al. | 2408.13909 | link |
2024-08-23 | O-Mamba: O-shape State-Space Model for Underwater Image Enhancement | Chenyu Dong et.al. | 2408.12816 | link |
2024-08-22 | CODE: Confident Ordinary Differential Editing | Bastien van Delft et.al. | 2408.12418 | link |
2024-08-22 | Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement | Lingyu Zhu et.al. | 2408.12316 | link |
2024-08-21 | Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations | Lintong Zhang et.al. | 2408.11966 | null |
2024-08-21 | OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal | Qiao Mo et.al. | 2408.11480 | link |
2024-08-21 | UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Xiangyu Zhao et.al. | 2408.11305 | link |
2024-08-21 | Taming Generative Diffusion for Universal Blind Image Restoration | Siwei Tu et.al. | 2408.11287 | null |
2024-08-20 | Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement | Satoshi Kosugi et.al. | 2408.11055 | link |
2024-08-20 | SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement | Linlin Hu et.al. | 2408.10934 | null |
2024-08-20 | UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement | Yingtie Lei et.al. | 2408.10653 | link |
2024-08-19 | BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval | Zhenyu Lu et.al. | 2408.10383 | null |
2024-08-19 | Multi-Scale Representation Learning for Image Restoration with State-Space Model | Yuhong He et.al. | 2408.10145 | null |
2024-08-19 | Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration | Alik Pramanick et.al. | 2408.09912 | link |
2024-08-19 | Fashion Image-to-Image Translation for Complementary Item Retrieval | Matteo Attimonelli et.al. | 2408.09847 | link |
2024-08-19 | ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement | Eashan Adhikarla et.al. | 2408.09650 | link |
2024-08-17 | Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration | Xin Lin et.al. | 2408.09241 | link |
2024-08-16 | DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer’s Case Studies | Mohammad Hossein Najafi et.al. | 2408.08489 | null |
2024-08-15 | Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks | Jiawei Wu et.al. | 2408.08149 | link |
2024-08-15 | HAIR: Hypernetworks-based All-in-One Image Restoration | Jin Cao et.al. | 2408.08091 | link |
2024-08-15 | DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions | Ryosuke Korekata et.al. | 2408.07910 | null |
2024-08-13 | Review Learning: Advancing All-in-One Ultra-High-Definition Image Restoration Training Method | Xin Su et.al. | 2408.06709 | null |
2024-08-12 | Wavelet based inpainting detection | Barglazan Adrian-Alin et.al. | 2408.06429 | null |
2024-08-12 | Latent Disentanglement for Low Light Image Enhancement | Zhihao Zheng et.al. | 2408.06245 | null |
2024-08-10 | Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network | Junyan Ye et.al. | 2408.05475 | link |
2024-08-10 | Greedy randomized block Kaczmarz method for matrix equation AXB=C and its applications in color image restoration | Wenli Wang et.al. | 2408.05444 | null |
2024-08-08 | Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration | Ziran Zhang et.al. | 2408.04227 | null |
2024-08-08 | MultiColor: Image Colorization by Learning from Multiple Color Spaces | Xiangcheng Du et.al. | 2408.04172 | null |
2024-08-06 | AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval | Pavel Suma et.al. | 2408.03282 | link |
2024-08-05 | Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models | Tongtong Feng et.al. | 2408.02408 | null |
2024-08-02 | On Validation of Search & Retrieval of Tissue Images in Digital Pathology | H. R. Tizhoosh et.al. | 2408.01570 | null |
2024-08-02 | Underwater Object Detection Enhancement via Channel Stabilization | Muhammad Ali et.al. | 2408.01293 | link |
2024-08-02 | Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement | Wenbin Zou et.al. | 2408.01276 | link |
2024-08-02 | Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration | Donwon Park et.al. | 2408.01099 | null |
2024-08-02 | FCDFusion: a Fast, Low Color Deviation Method for Fusing Visible and Infrared Image Pairs | Hesong Li et.al. | 2408.01080 | null |
2024-08-01 | A Prior Embedding-Driven Architecture for Long Distance Blind Iris Recognition | Qi Xiong et.al. | 2408.00210 | null |
2024-07-30 | UniProcessor: A Text-induced Unified Low-level Image Processor | Huiyu Duan et.al. | 2407.20928 | link |
2024-07-27 | Inverse Problems with Diffusion Models: A MAP Estimation Perspective | Sai bharath chandra Gutha et.al. | 2407.20784 | link |
2024-07-29 | ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement | Ezequiel Perez-Zarate et.al. | 2407.19708 | link |
2024-07-31 | Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint | Song Zhang et.al. | 2407.19248 | null |
2024-07-27 | Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration | Xiaoyan Yu et.al. | 2407.19139 | link |
2024-07-26 | Dilated Strip Attention Network for Image Restoration | Fangwei Hao et.al. | 2407.18613 | null |
2024-07-25 | RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models | Haoyu Chen et.al. | 2407.18035 | null |
2024-07-25 | Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography | Kailai Zhou et.al. | 2407.17996 | link |
2024-07-23 | S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks | Neha A S et.al. | 2407.17587 | null |
2024-07-24 | Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation | Yongqi Li et.al. | 2407.17274 | null |
2024-07-23 | CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction | Liang Zhao et.al. | 2407.16204 | null |
2024-07-23 | Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems | Sojin Lee et.al. | 2407.16125 | link |
2024-07-20 | Deep Learning CT Image Restoration using System Blur and Noise Models | Yijie Yuan et.al. | 2407.14983 | null |
2024-07-23 | AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement | Yunlong Lin et.al. | 2407.14900 | null |
2024-07-20 | Dual High-Order Total Variation Model for Underwater Image Restoration | Yuemei Li et.al. | 2407.14868 | link |
2024-07-19 | Adaptive Frequency Enhancement Network for Single Image Deraining | Fei Yan et.al. | 2407.14292 | null |
2024-07-19 | Double-Shot 3D Shape Measurement with a Dual-Branch Network | Mingyang Lei et.al. | 2407.14198 | null |
2024-07-19 | TaGAT: Topology-Aware Graph Attention Network For Multi-modal Retinal Image Fusion | Xin Tian et.al. | 2407.14188 | link |
2024-07-18 | Visual Haystacks: Answering Harder Questions About Sets of Images | Tsung-Han Wu et.al. | 2407.13766 | link |
2024-07-18 | Any Image Restoration with Efficient Automatic Degradation Adaptation | Bin Ren et.al. | 2407.13372 | link |
2024-07-18 | Training-Free Large Model Priors for Multiple-in-One Image Restoration | Xuanhua He et.al. | 2407.13181 | null |
2024-07-18 | Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement | Eashan Adhikarla et.al. | 2407.13170 | null |
2024-07-21 | HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration | Shuchang Zhang et.al. | 2407.13120 | null |
2024-07-17 | Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations | Tomáš Chobola et.al. | 2407.12511 | link |
2024-07-17 | GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval | Han Zhou et.al. | 2407.12431 | link |
2024-07-17 | Towards Revisiting Visual Place Recognition for Joining Submaps in Multimap SLAM | Markus Weißflog et.al. | 2407.12408 | null |
2024-07-17 | GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity | Shuo Cao et.al. | 2407.12273 | null |
2024-07-16 | Haze-Aware Attention Network for Single-Image Dehazing | Lihan Tong et.al. | 2407.11505 | null |
2024-07-16 | EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis | Ruijie Yang et.al. | 2407.11401 | null |
2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964 | link |
2024-07-15 | In-Loop Filtering via Trained Look-Up Tables | Zhuoyuan Li et.al. | 2407.10926 | null |
2024-07-15 | MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration | Yulin Ren et.al. | 2407.10833 | null |
2024-07-15 | DINO Pre-training for Vision-based End-to-end Autonomous Driving | Shubham Juneja et.al. | 2407.10803 | null |
2024-07-15 | Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval | Youngsun Lim et.al. | 2407.10683 | null |
2024-07-15 | An experimental evaluation of Siamese Neural Networks for robot localization using omnidirectional imaging in indoor environments | J. J. Cabrera et.al. | 2407.10536 | null |
Image Matching
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2025-04-11 | Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models | Josef Bengtson et.al. | 2504.08348 | null |
2025-04-10 | Image registration of 2D optical thin sections in a 3D porous medium: Application to a Berea sandstone digital rock image | Jaehong Chung et.al. | 2504.06604 | link |
2025-04-08 | To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition | Davide Sferrazza et.al. | 2504.06116 | null |
2025-04-10 | Learning Affine Correspondences by Integrating Geometric Constraints | Pengju Sun et.al. | 2504.04834 | link |
2025-04-01 | Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data | Yiqun Duan et.al. | 2504.00812 | null |
2025-03-31 | CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching | Zizhuo Li et.al. | 2503.23925 | null |
2025-03-28 | Pairwise Matching of Intermediate Representations for Fine-grained Explainability | Lauren Shrack et.al. | 2503.22881 | link |
2025-03-26 | Multimodal Image Matching based on Frequency-domain Information of Local Energy Response | Meng Yang et.al. | 2503.20827 | null |
2025-03-22 | Normalized Matching Transformer | Abtin Pourhadi et.al. | 2503.17715 | link |
2025-03-20 | Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors | Tian Yi Lim et.al. | 2503.16275 | null |
2025-03-20 | MapGlue: Multimodal Remote Sensing Image Matching | Peihao Wu et.al. | 2503.16185 | link |
2025-03-19 | PAPI-Reg: Patch-to-Pixel Solution for Efficient Cross-Modal Registration between LiDAR Point Cloud and Camera Image | Yuanchao Yue et.al. | 2503.15285 | null |
2025-04-07 | Less Biased Noise Scale Estimation for Threshold-Robust RANSAC | Johan Edstedt et.al. | 2503.13433 | null |
2025-03-17 | SatDepth: A Novel Dataset for Satellite Image Matching | Rahul Deshmukh et.al. | 2503.12706 | link |
2025-03-14 | Refining Image Edge Detection via Linear Canonical Riesz Transforms | Shuhui Yang et.al. | 2503.11148 | null |
2025-03-13 | Speedy MASt3R | Jingxing Li et.al. | 2503.10017 | null |
2025-03-11 | Keypoint Detection and Description for Raw Bayer Images | Jiakai Lin et.al. | 2503.08673 | null |
2025-03-06 | Learning 3D Medical Image Models From Brain Functional Connectivity Network Supervision For Mental Disorder Diagnosis | Xingcan Hu et.al. | 2503.04205 | null |
2025-03-07 | Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration | Qianliang Wu et.al. | 2503.04127 | null |
2025-03-05 | JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba | Xiaoyong Lu et.al. | 2503.03437 | null |
2025-02-28 | CNSv2: Probabilistic Correspondence Encoded Neural Image Servo | Anzhe Chen et.al. | 2503.00132 | null |
2025-02-27 | A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization | Yejun Zhang et.al. | 2502.20036 | link |
2025-02-27 | RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges | Thibaut Loiseau et.al. | 2502.19955 | null |
2025-02-26 | BEV-LIO(LC): BEV Image Assisted LiDAR-Inertial Odometry with Loop Closure | Haoxin Cai et.al. | 2502.19242 | link |
2025-02-25 | PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching | Han Nie et.al. | 2502.18104 | link |
2025-02-25 | Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking | Xin Tong et.al. | 2502.17766 | null |
2025-03-04 | Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model | Yaxuan Huang et.al. | 2502.16779 | null |
2025-02-16 | FeaKM: Robust Collaborative Perception under Noisy Pose Conditions | Jiuwu Hao et.al. | 2502.11003 | link |
2025-02-24 | Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation | Emanuele Mule et.al. | 2502.06288 | link |
2025-02-04 | Muographic Image Upsampling with Machine Learning for Built Infrastructure Applications | William O’Donnell et.al. | 2502.02624 | null |
2025-02-01 | MambaGlue: Fast and Robust Local Feature Matching With Mamba | Kihwan Ryoo et.al. | 2502.00462 | link |
2025-01-24 | Dense-SfM: Structure from Motion with Dense Consistent Matching | JongMin Lee et.al. | 2501.14277 | null |
2025-01-20 | MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching | Yepeng Liu et.al. | 2501.11299 | null |
2025-01-13 | MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training | Xingyi He et.al. | 2501.07556 | null |
2025-01-13 | Matching Free Depth Recovery from Structured Light | Zhuohang Yu et.al. | 2501.07113 | null |
2025-01-02 | Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views | Yulun Wu et.al. | 2501.01196 | null |
2024-12-31 | Towards Real-Time 2D Mapping: Harnessing Drones, AI, and Computer Vision for Advanced Insights | Bharath Kumar Agnur et.al. | 2412.20210 | null |
2024-12-27 | MINIMA: Modality Invariant Image Matching | Xingyu Jiang et.al. | 2412.19412 | link |
2024-12-24 | GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network | Xianfeng Song et.al. | 2412.18221 | link |
2024-12-17 | Bringing Multimodality to Amazon Visual Search System | Xinliang Zhu et.al. | 2412.13364 | null |
2024-12-04 | Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis | Siyoon Jin et.al. | 2412.03150 | null |
2024-11-20 | DT-LSD: Deformable Transformer-based Line Segment Detection | Sebastian Janampa et.al. | 2411.13005 | link |
2024-11-15 | Image Matching Filtering and Refinement by Planes and Beyond | Fabio Bellavia et.al. | 2411.09484 | link |
2024-11-11 | XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration | Ismail Can Yagmur et.al. | 2411.07430 | link |
2024-11-07 | The Impact of Semi-Supervised Learning on Line Segment Detection | Johanna Engman et.al. | 2411.04596 | link |
2024-11-04 | Silver medal Solution for Image Matching Challenge 2024 | Yian Wang et.al. | 2411.01851 | null |
2024-10-30 | Variable Resolution Sampling and Deep Learning Image Recovery for Accelerated Multi-Spectral MRI Near Metal Implants | Azadeh Sharafi et.al. | 2410.23329 | null |
2024-11-05 | RelationBooth: Towards Relation-Aware Customized Object Generation | Qingyu Shi et.al. | 2410.23280 | null |
2024-10-31 | ETO:Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses | Junjie Ni et.al. | 2410.22733 | null |
2024-10-30 | LoFLAT: Local Feature Matching using Focused Linear Attention Transformer | Naijian Cao et.al. | 2410.22710 | null |
2024-10-26 | Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification | Yue Su et.al. | 2410.20097 | null |
2024-10-01 | A Robust Multisource Remote Sensing Image Matching Method Utilizing Attention and Feature Enhancement Against Noise Interference | Yuan Li et.al. | 2410.11848 | null |
2024-10-15 | LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images | Yuzhou Cheng et.al. | 2410.11505 | null |
2024-10-12 | Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence | Felipe Cadar et.al. | 2410.09533 | link |
2024-09-27 | Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras | Yipeng Lu et.al. | 2409.18673 | null |
2024-09-25 | Game4Loc: A UAV Geo-Localization Benchmark from Game Data | Yuxiang Ji et.al. | 2409.16925 | link |
2024-09-24 | Automatic Registration of SHG and H&E Images with Feature-based Initial Alignment and Intensity-based Instance Optimization: Contribution to the COMULIS Challenge | Marek Wodzinski et.al. | 2409.15931 | null |
2024-09-10 | Weakly-supervised Camera Localization by Ground-to-satellite Image Registration | Yujiao Shi et.al. | 2409.06471 | link |
2024-09-05 | Enabling Practical and Privacy-Preserving Image Processing | Chao Wang et.al. | 2409.03568 | null |
2024-09-20 | A General Albedo Recovery Approach for Aerial Photogrammetric Images through Inverse Rendering | Shuang Song et.al. | 2409.03032 | link |
2024-08-29 | Super-Resolution works for coastal simulations | Zhi-Song Liu et.al. | 2408.16553 | null |
2024-09-15 | Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks | Sierra Bonilla et.al. | 2408.16445 | link |
2024-08-26 | Affine steerers for structured keypoint description | Georg Bökman et.al. | 2408.14186 | link |
2024-08-25 | TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers | Chuanrui Zhang et.al. | 2408.13770 | null |
2024-09-11 | Coarse-to-fine Alignment Makes Better Speech-image Retrieval | Lifeng Zhou et.al. | 2408.13119 | null |
2024-08-19 | BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval | Zhenyu Lu et.al. | 2408.10383 | null |
2024-08-14 | RSD-DOG : A New Image Descriptor based on Second Order Derivatives | Darshan Venkatrayappa et.al. | 2408.07687 | null |
2024-08-09 | One Shot is Enough for Sequential Infrared Small Target Segmentation | Bingbing Dan et.al. | 2408.04823 | link |
2024-08-07 | PRISM: PRogressive dependency maxImization for Scale-invariant image Matching | Xudong Cai et.al. | 2408.03598 | null |
2024-08-05 | ConDL: Detector-Free Dense Image Matching | Monika Kwiatkowski et.al. | 2408.02766 | null |
2024-08-04 | Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image | Xinlin Ren et.al. | 2408.02079 | link |
2024-07-29 | Image-text matching for large-scale book collections | Artemis Llabrés et.al. | 2407.19812 | link |
2024-07-26 | PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis | Sohyeong Kim et.al. | 2407.18695 | null |
2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
2024-07-17 | GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jingwen Yu et.al. | 2407.11736 | link |
2024-07-16 | REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching | Han Nie et.al. | 2407.11637 | link |
2024-07-16 | A Self-Correcting Strategy of the Digital Volume Correlation Displacement Field Based on Image Matching: Application to Poor Speckles Quality and Complex-Large Deformation | Chengsheng Li et.al. | 2407.11287 | null |
2024-07-14 | Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching | Xiaoyong Lu et.al. | 2407.07789 | null |
2024-07-10 | Mutual Information calculation on different appearances | Jiecheng Liao et.al. | 2407.07410 | null |
2024-07-15 | SfM on-the-fly: Get better 3D from What You Capture | Zongqian Zhan et.al. | 2407.03939 | null |
2024-07-03 | IMC 2024 Methods & Solutions Review | Shyam Gupta et.al. | 2407.03172 | null |
2024-06-21 | High Resolution Surface Reconstruction of Cultural Heritage Objects Using Shape from Polarization Method | F. S. Mortazavi et.al. | 2406.15121 | null |
2024-06-16 | Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models | Yikai Zhang et.al. | 2406.10902 | link |
2024-06-14 | Grounding Image Matching in 3D with MASt3R | Vincent Leroy et.al. | 2406.09756 | link |
MultiModal
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2025-04-17 | SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Haoxuan Li et.al. | 2504.13172 | null |
2025-04-17 | Hadamard product in deep learning: Introduction, Advances and Challenges | Grigorios G Chrysos et.al. | 2504.13112 | null |
2025-04-17 | EventVAD: Training-Free Event-Aware Video Anomaly Detection | Yihua Shao et.al. | 2504.13092 | null |
2025-04-17 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | null |
2025-04-17 | ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images | Sangwook Kim et.al. | 2504.13023 | null |
2025-04-17 | EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery | Wei Zhang et.al. | 2504.12795 | null |
2025-04-17 | Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration | Yicheng Pan et.al. | 2504.12773 | null |
2025-04-17 | SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Qianqian Sun et.al. | 2504.12704 | null |
2025-04-17 | GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning | Liangyu Xu et.al. | 2504.12597 | null |
2025-04-16 | Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis | Shravan Chaudhari et.al. | 2504.12511 | null |
2025-04-16 | Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis | Miaosen Luo et.al. | 2504.12151 | null |
2025-04-16 | Instruction-augmented Multimodal Alignment for Image-Text and Element Matching | Xinli Yue et.al. | 2504.12018 | null |
2025-04-16 | AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection | Yuhao Chao et.al. | 2504.11914 | null |
2025-04-16 | Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation | Julia Kreutzer et.al. | 2504.11829 | null |
2025-04-15 | DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis | Efthymios Georgiou et.al. | 2504.11082 | null |
2025-04-15 | Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation | Yan Rong et.al. | 2504.11002 | null |
2025-04-14 | CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | Ankit Kumar Shaw et.al. | 2504.10738 | null |
2025-04-14 | Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Darryl Hannan et.al. | 2504.10727 | null |
2025-04-14 | Relation-Rich Visual Document Generator for Visual Information Extraction | Zi-Han Jiang et.al. | 2504.10659 | null |
2025-04-15 | InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Jinguo Zhu et.al. | 2504.10479 | null |
2025-04-14 | Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | Tao Zhang et.al. | 2504.10465 | null |
2025-04-14 | The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Weixian Lei et.al. | 2504.10462 | null |
2025-04-14 | FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos | Rui Chen et.al. | 2504.10358 | null |
2025-04-14 | CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation | Junchen Fu et.al. | 2504.10307 | null |
2025-04-14 | PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search | Pengfei Hu et.al. | 2504.10222 | null |
2025-04-14 | The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance | Anwesha Mohanty et.al. | 2504.10179 | null |
2025-04-14 | COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | Jiansheng Li et.al. | 2504.10158 | null |
2025-04-14 | CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography | I-Sheng Fang et.al. | 2504.10090 | null |
2025-04-15 | MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework | Zihan Ling et.al. | 2504.10074 | null |
2025-04-11 | Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images | Boyang Deng et.al. | 2504.08727 | null |
2025-04-10 | POEM: Precise Object-level Editing via MLLM control | Marco Schouten et.al. | 2504.08111 | null |
2025-04-10 | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Lang Lin et.al. | 2504.07962 | null |
2025-04-10 | MM-IFEngine: Towards Multimodal Instruction Following | Shengyuan Ding et.al. | 2504.07957 | link |
2025-04-10 | Perception-R1: Pioneering Perception Policy with Reinforcement Learning | En Yu et.al. | 2504.07954 | link |
2025-04-10 | MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation | Nico Catalano et.al. | 2504.07942 | null |
2025-04-10 | VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding | Henghao Zhao et.al. | 2504.07519 | null |
2025-04-10 | How Can Objects Help Video-Language Understanding? | Zitian Tang et.al. | 2504.07454 | null |
2025-04-10 | Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing | Chenxi Sun et.al. | 2504.07424 | null |
2025-04-10 | Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction | Kyoyun Choi et.al. | 2504.07415 | null |
2025-04-09 | Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | Ashutosh Chaubey et.al. | 2504.07198 | null |
2025-04-10 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Xinhao Li et.al. | 2504.06958 | null |
2025-04-09 | MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Chang Nie et.al. | 2504.06863 | null |
2025-04-09 | Integrating Cognitive Processing Signals into Language Models: A Review of Advances, Applications and Future Directions | Angela Lopez-Cardona et.al. | 2504.06843 | null |
2025-04-09 | Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Ruotian Peng et.al. | 2504.06666 | null |
2025-04-09 | Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program | Minghe Gao et.al. | 2504.06606 | null |
2025-04-08 | Mind the Gap: Evaluating Vision Systems in Small Data Applications | Samuel Stevens et.al. | 2504.06486 | link |
2025-04-08 | Transfer between Modalities with MetaQueries | Xichen Pan et.al. | 2504.06256 | null |
2025-04-08 | V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models | Xiangxi Zheng et.al. | 2504.06148 | null |
2025-04-08 | MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Pengfei Zhou et.al. | 2504.05782 | null |
2025-04-08 | On the Suitability of Reinforcement Fine-Tuning to Visual Tasks | Xiaxu Chen et.al. | 2504.05682 | null |
2025-04-07 | URECA: Unique Region Caption Anything | Sangbeom Lim et.al. | 2504.05305 | null |
2025-04-07 | LiveVQA: Live Visual Knowledge Seeking | Mingyang Fu et.al. | 2504.05288 | null |
2025-04-07 | Explaining Low Perception Model Competency with High-Competency Counterfactuals | Sara Pohland et.al. | 2504.05254 | null |
2025-04-07 | Towards Visual Text Grounding of Multimodal Large Language Model | Ming Li et.al. | 2504.04974 | null |
2025-04-07 | Video-Bench: Human-Aligned Video Generation Benchmark | Hui Han et.al. | 2504.04907 | null |
2025-04-07 | OrderChain: A General Prompting Paradigm to Improve Ordinal Understanding Ability of MLLM | Jinhong Wang et.al. | 2504.04801 | null |
2025-04-07 | OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance | Chaoyi Wang et.al. | 2504.04781 | null |
2025-04-07 | Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data | Samarth Mishra et.al. | 2504.04740 | null |
2025-04-07 | LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts | Yimu Wang et.al. | 2504.04653 | null |
2025-04-06 | Advancing Egocentric Video Question Answering with Multimodal Large Language Models | Alkesh Patel et.al. | 2504.04550 | null |
2025-04-04 | MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Wulin Xie et.al. | 2504.03641 | null |
2025-04-03 | Hummus: A Dataset of Humorous Multimodal Metaphor Use | Xiaoyu Tong et.al. | 2504.02983 | link |
2025-04-03 | Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning | Zhihan Zhang et.al. | 2504.02906 | link |
2025-04-03 | Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Xiaofeng Han et.al. | 2504.02477 | null |
2025-04-03 | The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy | Matheus Valentim et.al. | 2504.02217 | null |
2025-04-03 | ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement | Runhui Huang et.al. | 2504.01934 | null |
2025-04-02 | Spatial-R1: Enhancing MLLMs in Video Spatial Reasoning | Kun Ouyang et.al. | 2504.01805 | link |
2025-04-02 | PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization | Aofan Liu et.al. | 2504.01444 | null |
2025-04-02 | Slow-Fast Architecture for Video Multi-Modal Large Language Models | Min Shi et.al. | 2504.01328 | link |
2025-04-01 | AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction | Junhao Cheng et.al. | 2504.01014 | link |
2025-04-01 | IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval | Bangwei Liu et.al. | 2504.00954 | null |
2025-04-02 | Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning | Ram Ramrakhya et.al. | 2504.00907 | null |
2025-04-01 | Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Zhenyi Liao et.al. | 2504.00883 | null |
2025-04-01 | Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights | Yuchen Liu et.al. | 2504.00839 | null |
2025-04-01 | QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA | Shuai Li et.al. | 2504.00654 | null |
2025-03-31 | Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation | Shengqiong Wu et.al. | 2503.24379 | null |
2025-03-31 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Yi Chen et.al. | 2503.24376 | link |
2025-03-31 | H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding | Qi Wu et.al. | 2503.24008 | null |
2025-03-31 | BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation | Yumeng Fu et.al. | 2503.23990 | null |
2025-03-31 | Boosting MLLM Reasoning with Text-Debiased Hint-GRPO | Qihan Huang et.al. | 2503.23905 | null |
2025-04-01 | Evaluating small vision-language models as AI assistants for radio astronomical source analysis tasks | S. Riggi et.al. | 2503.23859 | link |
2025-03-31 | OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | Yijie Zheng et.al. | 2503.23830 | null |
2025-03-31 | XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? | Fengxiang Wang et.al. | 2503.23771 | null |
2025-03-31 | STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding? | Yun Li et.al. | 2503.23765 | null |
2025-03-31 | AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization | Yiyang Du et.al. | 2503.23733 | link |
2025-03-28 | Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Weiqi Li et.al. | 2503.22679 | link |
2025-03-28 | Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | Antonia Karamolegkou et.al. | 2503.22610 | null |
2025-03-28 | NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving | Fuhao Li et.al. | 2503.22436 | null |
2025-03-31 | Agent-Centric Personalized Multiple Clustering with Multi-Modal LLMs | Ziye Chen et.al. | 2503.22241 | null |
2025-03-28 | Learning to Instruct for Visual Instruction Tuning | Zhihan Zhou et.al. | 2503.22215 | null |
2025-03-28 | DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos | Yunming Liang et.al. | 2503.22208 | null |
2025-03-28 | EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos | Yuxuan Li et.al. | 2503.22152 | link |
2025-03-28 | Tokenization of Gaze Data | Tim Rolff et.al. | 2503.22145 | null |
2025-03-28 | A Survey on Remote Sensing Foundation Models: From Vision to Multimodality | Ziyue Huang et.al. | 2503.22081 | link |
2025-03-27 | Video-R1: Reinforcing Video Reasoning in MLLMs | Kaituo Feng et.al. | 2503.21776 | link |
2025-03-27 | 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models | Yuhan Zhang et.al. | 2503.21745 | null |
2025-03-27 | UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Zhengxi Lu et.al. | 2503.21620 | link |
2025-03-27 | FusionSegReID: Advancing Person Re-Identification with Multimodal Retrieval and Precise Segmentation | Jincheng Yan et.al. | 2503.21595 | null |
2025-03-27 | FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Xiaoqin Wang et.al. | 2503.21457 | link |
2025-03-27 | InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression | Dongchen Lu et.al. | 2503.21307 | link |
2025-03-26 | ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction | Yiqiao Jin et.al. | 2503.20978 | null |
2025-03-26 | MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams | Yanpeng Sun et.al. | 2503.20745 | null |
2025-03-26 | Vision as LoRA | Han Wang et.al. | 2503.20680 | link |
2025-03-26 | Beyond Intermediate States: Explaining Visual Redundancy through Language | Dingchen Yang et.al. | 2503.20540 | link |
2025-03-26 | Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering | Zehui Liao et.al. | 2503.20504 | null |
2025-03-26 | MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning | Yiwei Ma et.al. | 2503.20502 | null |
2025-03-26 | From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment | Yucheng Suo et.al. | 2503.20472 | null |
2025-03-26 | MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation | Rongyu Zhang et.al. | 2503.20384 | null |
2025-03-26 | Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Hao Ai et.al. | 2503.20322 | null |
2025-03-26 | Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs | Zitian Wang et.al. | 2503.20309 | null |
2025-03-25 | LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | Kexian Tang et.al. | 2503.19990 | null |
2025-03-25 | CoLLM: A Large Language Model for Composed Image Retrieval | Chuong Huynh et.al. | 2503.19910 | link |
2025-03-25 | Scaling Vision Pre-Training to 4K Resolution | Baifeng Shi et.al. | 2503.19903 | null |
2025-03-25 | Perception-Enhanced Multitask Multimodal Semantic Communication for UAV-Assisted Integrated Sensing and Communication System | Ziji Guo et.al. | 2503.19594 | null |
2025-03-25 | DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts | Ling Zhong et.al. | 2503.19498 | null |
2025-03-25 | ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning | Jiaqi Liao et.al. | 2503.19312 | null |
2025-03-24 | MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks | Wenhao You et.al. | 2503.19134 | null |
2025-03-24 | LLaVAction: evaluating and training multi-modal large language models for action recognition | Shaokai Ye et.al. | 2503.18712 | link |
2025-03-25 | Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models | Yazhou Zhang et.al. | 2503.18681 | null |
2025-03-24 | Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark | Bingchen Miao et.al. | 2503.18665 | link |
2025-03-24 | Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding | Xiangrui Liu et.al. | 2503.18478 | null |
2025-03-24 | A Simple yet Effective Layout Token in Large Language Models for Document Understanding | Zhaoqing Zhu et.al. | 2503.18434 | null |
2025-03-23 | Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering | Zixin Chen et.al. | 2503.18172 | null |
2025-03-23 | MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Jiaxin Huang et.al. | 2503.18135 | null |
2025-03-23 | MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection | Yibo Yan et.al. | 2503.18132 | null |
2025-03-23 | Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models | Qiao Liang et.al. | 2503.18034 | null |
2025-03-22 | 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Wenxuan Zhu et.al. | 2503.17827 | link |
2025-03-21 | LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models | Jian Liang et.al. | 2503.16843 | null |
2025-03-21 | When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts | Jun Seong Kim et.al. | 2503.16826 | null |
2025-03-20 | Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions | Hadi Amini et.al. | 2503.16585 | link |
2025-03-20 | OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence | Long Yuan et.al. | 2503.16326 | null |
2025-03-20 | Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data | Zijian Li et.al. | 2503.16260 | null |
2025-03-20 | CLS-RL: Image Classification with Rule-Based Reinforcement Learning | Ming Li et.al. | 2503.16188 | link |
2025-03-20 | OThink-MR1: Stimulating multimodal generalized reasoning capabilities through dynamic reinforcement learning | Zhiyuan Liu et.al. | 2503.16081 | null |
2025-03-20 | Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Zhihang Liu et.al. | 2503.16036 | null |
2025-03-20 | BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models | Zenghui Yuan et.al. | 2503.16023 | null |
2025-03-20 | DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering | Haochen Wang et.al. | 2503.15887 | null |
2025-03-20 | A Vision Centric Remote Sensing Benchmark | Abduljaleel Adejumo et.al. | 2503.15816 | null |
2025-03-19 | LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning | Federico Cocchi et.al. | 2503.15621 | link |
2025-03-19 | Visual Position Prompt for MLLM based Visual Grounding | Wei Tang et.al. | 2503.15426 | link |
2025-03-19 | Leveraging Perfect Multimodal Alignment and Gaussian Assumptions for Cross-modal Transfer | Abhi Kamboj et.al. | 2503.15352 | null |
2025-03-19 | LEGION: Learning to Ground and Explain for Synthetic Image Detection | Hengrui Kang et.al. | 2503.15264 | null |
2025-03-20 | Benchmarking Large Language Models for Handwritten Text Recognition | Giorgia Crosilla et.al. | 2503.15195 | null |
2025-03-19 | UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | Qihui Zhang et.al. | 2503.14941 | null |
2025-03-19 | VisNumBench: Evaluating Number Sense of Multimodal Large Language Models | Tengjin Weng et.al. | 2503.14939 | null |
2025-03-19 | FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | Chongjun Tu et.al. | 2503.14935 | null |
2025-03-19 | POSTA: A Go-to Framework for Customized Artistic Poster Generation | Haoyu Chen et.al. | 2503.14908 | null |
2025-03-19 | Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations | Shuo Li et.al. | 2503.14895 | null |
2025-03-18 | Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives | Sara Sarto et.al. | 2503.14604 | null |
2025-03-18 | Aligning Multimodal LLM with Human Preference: A Survey | Tao Yu et.al. | 2503.14504 | null |
2025-03-19 | Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM | Xinyu Fang et.al. | 2503.14478 | link |
2025-03-18 | VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation | Shoubin Yu et.al. | 2503.14350 | null |
2025-03-19 | DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies | Wei Song et.al. | 2503.14324 | link |
2025-03-18 | Towards Harmless Multimodal Assistants with Blind Preference Optimization | Yongqi Li et.al. | 2503.14189 | null |
2025-03-18 | Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Zining Wang et.al. | 2503.14140 | null |
2025-03-18 | MP-GUI: Modality Perception with MLLMs for GUI Understanding | Ziwei Wang et.al. | 2503.14021 | link |
2025-03-18 | SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | Jiankang Wang et.al. | 2503.13983 | null |
2025-03-18 | Survey of Adversarial Robustness in Multimodal Large Language Models | Chengze Jiang et.al. | 2503.13962 | null |
2025-03-18 | Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation | Sayak Nag et.al. | 2503.13947 | null |
2025-03-17 | MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | James Burgess et.al. | 2503.13399 | link |
2025-03-17 | Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning | Mengyao Lyu et.al. | 2503.13383 | null |
2025-03-17 | Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning | Hai-Long Sun et.al. | 2503.13360 | null |
2025-03-17 | 3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o | Dingning Liu et.al. | 2503.13185 | null |
2025-03-17 | MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs | Erik Daxberger et.al. | 2503.13111 | null |
2025-03-17 | Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference | Hao Yin et.al. | 2503.13108 | link |
2025-03-17 | ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models | Hao Yin et.al. | 2503.13107 | link |
2025-03-17 | Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided, Self-Consistent MLLMs for Food Preparation Task Planning | Yu-Hong Shen et.al. | 2503.13055 | null |
2025-03-17 | Efficient Motion-Aware Video MLLM | Zijia Zhao et.al. | 2503.13016 | null |
2025-03-17 | HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | Haiyang Guo et.al. | 2503.12941 | null |
2025-03-14 | VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | Jing Bi et.al. | 2503.11557 | null |
2025-03-14 | A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving | Tin Stribor Sohn et.al. | 2503.11400 | null |
2025-03-14 | Cornstarch: Distributed Multimodal Training Must Be Multimodality-Aware | Insu Jang et.al. | 2503.11367 | link |
2025-03-14 | Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Weichen Zhan et.al. | 2503.11094 | link |
2025-03-14 | EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks | Yi Zhang et.al. | 2503.11089 | null |
2025-03-14 | BannerAgency: Advertising Banner Design with Multimodal LLM Agents | Heng Wang et.al. | 2503.11060 | null |
2025-03-14 | RONA: Pragmatically Diverse Image Captioning with Coherence Relations | Aashish Anantha Ramakrishnan et.al. | 2503.10997 | link |
2025-03-13 | Learning to Inference Adaptively for Multimodal Large Language Models | Zhuoyan Xu et.al. | 2503.10905 | null |
2025-03-13 | PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models | Zilu Guo et.al. | 2503.10529 | null |
2025-03-13 | Interactive Multimodal Fusion with Temporal Modeling | Jun Yu et.al. | 2503.10523 | null |
2025-03-13 | TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models | Xudong Tan et.al. | 2503.10501 | link |
2025-03-13 | 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models | Wanhua Li et.al. | 2503.10437 | link |
2025-03-13 | CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | Yufan Deng et.al. | 2503.10391 | null |
2025-03-13 | A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection | Heng Yim Nicole Oo et.al. | 2503.10371 | null |
2025-03-13 | IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification | Yuhao Wang et.al. | 2503.10324 | null |
2025-03-13 | VisualPRM: An Effective Process Reward Model for Multimodal Reasoning | Weiyun Wang et.al. | 2503.10291 | null |
2025-03-13 | LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents | Boyu Chen et.al. | 2503.10200 | null |
2025-03-13 | Hybrid Agents for Image Restoration | Bingchen Li et.al. | 2503.10120 | null |
2025-03-13 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Md Mohaiminul Islam et.al. | 2503.09590 | link |
2025-03-12 | Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding | Haoyu Zhang et.al. | 2503.09143 | null |
2025-03-11 | Seeing What’s Not There: Spurious Correlation in Multimodal LLMs | Parsa Hosseini et.al. | 2503.08884 | null |
2025-03-11 | Language-Depth Navigated Thermal and Visible Image Fusion | Jinchang Zhang et.al. | 2503.08676 | null |
2025-03-11 | SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Muzhi Zhu et.al. | 2503.08625 | null |
2025-03-11 | LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Xianfeng Wu et.al. | 2503.08619 | link |
2025-03-11 | HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding | Shehreen Azad et.al. | 2503.08585 | null |
2025-03-11 | RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding | Xichen Tan et.al. | 2503.08576 | null |
2025-03-11 | FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework | Jianian Zhu et.al. | 2503.08461 | null |
2025-03-11 | KAP: MLLM-assisted OCR Text Enhancement for Hybrid Retrieval in Chinese Non-Narrative Documents | Hsin-Ling Hsu et.al. | 2503.08452 | null |
2025-03-11 | Embodied Crowd Counting | Runling Long et.al. | 2503.08367 | null |
2025-03-12 | Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs | Chongjun Tu et.al. | 2503.08342 | null |
2025-03-11 | Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Zhuo Zhi et.al. | 2503.08308 | null |
2025-03-10 | Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts | Shiu-hong Kao et.al. | 2503.07503 | null |
2025-03-10 | LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? | Bangyan Li et.al. | 2503.07487 | null |
2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413 | link |
2025-03-10 | ALLVB: All-in-One Long Video Understanding Benchmark | Xichen Tan et.al. | 2503.07298 | null |
2025-03-10 | A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images | Xiaoyi Liang et.al. | 2503.07094 | null |
2025-03-10 | Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning | Jiazheng Liu et.al. | 2503.07002 | null |
2025-03-10 | Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs | Wenzhuo Xu et.al. | 2503.06989 | null |
2025-03-10 | Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | Xinyu Xi et.al. | 2503.06978 | null |
2025-03-10 | ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks | Yan Yang et.al. | 2503.06885 | null |
2025-03-09 | SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation | Zisheng Chen et.al. | 2503.06764 | link |
2025-03-11 | Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Wenxuan Huang et.al. | 2503.06749 | link |
2025-03-07 | Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information | Junbo Zhao et.al. | 2503.05543 | null |
2025-03-07 | Can Large Language Models Grasp Concepts in Visual Content? A Case Study on YouTube Shorts about Depression | Jiaying “Lizzy” Liu et.al. | 2503.05109 | null |
2025-03-06 | FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement | Ian Huang et.al. | 2503.04919 | null |
2025-03-06 | Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Wenke Huang et.al. | 2503.04543 | null |
2025-03-06 | Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition | Bin Chen et.al. | 2503.04201 | null |
2025-03-06 | MASTER: Multimodal Segmentation with Text Prompts | Fuyang Liu et.al. | 2503.04199 | null |
2025-03-06 | Biological Sequence with Language Model Prompting: A Survey | Jiyue Jiang et.al. | 2503.04135 | null |
2025-03-07 | Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts | Xiangnan Chen et.al. | 2503.04095 | null |
2025-03-06 | RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models | Wenhui Zhu et.al. | 2503.03987 | null |
2025-03-05 | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Zhao Yang et.al. | 2503.03689 | link |
2025-03-05 | BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation | Hiep Truong Cong et.al. | 2503.03280 | null |
2025-03-05 | COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence | Wentao Li et.al. | 2503.03215 | null |
2025-03-05 | Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings | Sneh Pillai et.al. | 2503.03202 | null |
2025-03-04 | Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs | Wei-Yao Wang et.al. | 2503.02597 | link |
2025-03-05 | MCiteBench: A Benchmark for Multimodal Citation Text Generation in MLLMs | Caiyu Hu et.al. | 2503.02589 | link |
2025-03-04 | A Token-level Text Image Foundation Model for Document Understanding | Tongkun Guan et.al. | 2503.02304 | null |
2025-03-03 | Distilled Prompt Learning for Incomplete Multimodal Survival Prediction | Yingxue Xu et.al. | 2503.01653 | null |
2025-03-03 | RemiHaven: Integrating “In-Town” and “Out-of-Town” Peers to Provide Personalized Reminiscence Support for Older Drifters | Xuechen Zhang et.al. | 2503.01358 | null |
2025-03-04 | UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Hao Tang et.al. | 2503.01342 | link |
2025-03-03 | Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG | Wenbin Wang et.al. | 2503.01222 | link |
2025-03-03 | Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models | Tianjie Ju et.al. | 2503.01208 | link |
2025-03-03 | Scientific Reasoning: Assessment of Multimodal Generative LLMs | Florian Dreyer et.al. | 2503.01064 | null |
2025-03-02 | LLM-Fusion: A Novel Multimodal Fusion Model for Accelerated Material Discovery | Onur Boyar et.al. | 2503.01022 | null |
2025-02-28 | Adaptive Keyframe Sampling for Long Video Understanding | Xi Tang et.al. | 2502.21271 | null |
2025-02-28 | RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete | Yuheng Ji et.al. | 2502.21257 | null |
2025-02-28 | Fine-Grained Retrieval-Augmented Generation for Visual Question Answering | Zhengxuan Zhang et.al. | 2502.20964 | null |
2025-02-28 | HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | Xiao Wang et.al. | 2502.20811 | null |
2025-03-03 | MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts | Peijie Wang et.al. | 2502.20808 | null |
2025-02-28 | Towards General Visual-Linguistic Face Forgery Detection(V2) | Ke Sun et.al. | 2502.20698 | link |
2025-02-27 | Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection | Sari Masri et.al. | 2502.20573 | null |
2025-02-27 | Protecting multimodal large language models against misleading visualizations | Jonathan Tonglet et.al. | 2502.20503 | link |
2025-02-27 | VideoA11y: Method and Dataset for Accessible Video Description | Chaoyu Li et.al. | 2502.20480 | null |
2025-02-27 | Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Benjamin Gutteridge et.al. | 2502.20295 | link |
2025-02-27 | Mixture of Experts for Recognizing Depression from Interview and Reading Tasks | Loukas Ilias et.al. | 2502.20213 | null |
2025-02-27 | New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Xuzheng Yang et.al. | 2502.20104 | null |
2025-02-27 | AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs | Xuyang Wei et.al. | 2502.20035 | link |
2025-02-27 | Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up | Lang Huang et.al. | 2502.20008 | null |
2025-02-27 | Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents | Zhenyu Liu et.al. | 2502.19917 | link |
2025-02-27 | Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy | Zaijing Li et.al. | 2502.19902 | null |
2025-02-27 | Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention | Weiyan Shi et.al. | 2502.19877 | null |
2025-02-27 | One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion | Chunyang Cheng et.al. | 2502.19854 | link |
2025-02-27 | Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack | Chenhe Gu et.al. | 2502.19672 | null |
2025-02-26 | ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models | Danae Sánchez Villegas et.al. | 2502.19409 | null |
2025-02-26 | M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance | Qingpei Guo et.al. | 2502.18778 | null |
2025-02-25 | OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference | Xiangyu Zhao et.al. | 2502.18411 | link |
2025-02-25 | ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis | Li Lei et.al. | 2502.18180 | null |
2025-02-25 | VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion | Pei Liu et.al. | 2502.18042 | null |
2025-02-25 | MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks | Hyeonjeong Ha et.al. | 2502.17832 | link |
2025-02-25 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | Xiongxiao Xu et.al. | 2502.17812 | link |
2025-02-24 | MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference | Zhongwei Wan et.al. | 2502.17599 | link |
2025-02-24 | PosterSum: A Multimodal Benchmark for Scientific Poster Summarization | Rohit Saxena et.al. | 2502.17540 | link |
2025-02-24 | Introducing Visual Perception Token into Multimodal Large Language Model | Runpeng Yu et.al. | 2502.17425 | link |
2025-02-24 | MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | Jiarui Zhang et.al. | 2502.17422 | link |
2025-02-24 | HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization | Zhenghao Liu et.al. | 2502.17315 | link |
2025-02-24 | Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts | Zhenghao Liu et.al. | 2502.17297 | link |
2025-02-24 | Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence | Wenzhe Yin et.al. | 2502.17028 | null |
2025-02-24 | Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs | Himanshu Beniwal et.al. | 2502.16901 | link |
2025-02-24 | SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding | Liangtao Shi et.al. | 2502.16786 | link |
2025-02-23 | AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation | Rui Li et.al. | 2502.16680 | link |
2025-02-23 | Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries | Yin Wu et.al. | 2502.16636 | link |
2025-02-23 | Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review | Pei Fu et.al. | 2502.16586 | null |
2025-02-21 | Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models | Anirudh Sundar et.al. | 2502.15639 | null |
2025-02-21 | Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs | Gengyuan Zhang et.al. | 2502.15457 | null |
2025-02-21 | Research advances on fish feeding behavior recognition and intensity quantification methods in aquaculture | Shulong Zhang et.al. | 2502.15311 | null |
2025-02-21 | M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment | Chuan Cui et.al. | 2502.15167 | null |
2025-02-20 | Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation | Yun-Wei Chu et.al. | 2502.15040 | null |
2025-02-20 | Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Yuming Yang et.al. | 2502.14864 | link |
2025-02-20 | Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension | Amir Hossein Yari et.al. | 2502.14315 | null |
2025-02-20 | Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach | Yurong Wu et.al. | 2502.14285 | null |
2025-02-21 | PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC | Haowei Liu et.al. | 2502.14282 | null |
2025-02-19 | ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities | Chanjin Zheng et.al. | 2502.13832 | link |
2025-02-19 | From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | Yi-Fan Zhang et.al. | 2502.13789 | null |
2025-02-18 | Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Bencheng Liao et.al. | 2502.13145 | link |
2025-02-18 | SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models | Xianfu Cheng et.al. | 2502.13059 | null |
2025-02-18 | AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks | Yurun Chen et.al. | 2502.13053 | null |
2025-02-18 | Towards Text-Image Interleaved Retrieval | Xin Zhang et.al. | 2502.12799 | link |
2025-02-18 | Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning | Yunhao Gou et.al. | 2502.12635 | null |
2025-02-18 | SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | Weikai Lu et.al. | 2502.12562 | link |
2025-02-18 | MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos | Huaying Yuan et.al. | 2502.12558 | null |
2025-02-18 | SAFEERASER: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning | Junkai Chen et.al. | 2502.12520 | null |
2025-02-17 | HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation | Ling Yang et.al. | 2502.12148 | link |
2025-02-17 | PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection | Jinhe Bi et.al. | 2502.12119 | null |
2025-02-17 | Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications | Li Qiao et.al. | 2502.12096 | null |
2025-02-17 | Unhackable Temporal Rewarding for Scalable Video MLLMs | En Yu et.al. | 2502.12081 | null |
2025-02-17 | GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs | Yi Fang et.al. | 2502.11925 | null |
2025-02-17 | EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models | Jiamin Su et.al. | 2502.11916 | null |
2025-02-17 | MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | Haochen Xue et.al. | 2502.11903 | null |
2025-02-17 | Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Hanbin Wang et.al. | 2502.11829 | link |
2025-02-17 | Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning | Yuqi Pang et.al. | 2502.11751 | link |
2025-02-17 | Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent | Junda Wu et.al. | 2502.11740 | null |
2025-02-14 | MM-RLHF: The Next Step Forward in Multimodal LLM Alignment | Yi-Fan Zhang et.al. | 2502.10391 | null |
2025-02-14 | AutoS$^2$earch: Unlocking the Reasoning Potential of Large Models for Web-based Source Search | Zhengqiu Zhu et.al. | 2502.09913 | null |
2025-02-13 | EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | Rui Yang et.al. | 2502.09560 | null |
2025-02-13 | A Benchmark for Crime Surveillance Video Analysis with Large Models | Haoran Chen et.al. | 2502.09325 | null |
2025-02-13 | From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs | Mingxiao Li et.al. | 2502.09093 | null |
2025-02-12 | FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation | Yang Sun et.al. | 2502.08260 | link |
2025-02-12 | Learning Human Skill Generators at Key-Step Levels | Yilu Wu et.al. | 2502.08234 | null |
2025-02-13 | Universal Adversarial Attack on Aligned Multimodal LLMs | Temurbek Rahmatullaev et.al. | 2502.07987 | null |
2025-02-11 | DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities | Chashi Mahiul Islam et.al. | 2502.07905 | null |
2025-02-11 | Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models | Jiacong Xu et.al. | 2502.07601 | null |
2025-02-11 | MLLM4PUE: Toward Universal Embeddings in Computational Pathology through Multimodal LLMs | Qifeng Zhou et.al. | 2502.07221 | null |
2025-02-11 | Early Risk Prediction of Pediatric Cardiac Arrest from Electronic Health Records via Multimodal Fused Transformer | Jiaying Lu et.al. | 2502.07158 | null |
2025-02-09 | AI-Driven HSI: Multimodality, Fusion, Challenges, and the Deep Learning Revolution | David S. Bhatti et.al. | 2502.06894 | null |
2025-02-11 | CoS: Chain-of-Shot Prompting for Long Video Understanding | Jian Hu et.al. | 2502.06428 | null |
2025-02-07 | Survey on AI-Generated Media Detection: From Non-MLLM to MLLM | Yueying Zou et.al. | 2502.05240 | null |
2025-02-07 | Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Yunhang Shen et.al. | 2502.05177 | link |
2025-02-07 | Multitwine: Multi-Object Compositing with Text and Layout Control | Gemma Canet Tarrés et.al. | 2502.05165 | null |
2025-02-07 | Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs | Rohit Saxena et.al. | 2502.05092 | null |
2025-02-07 | Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark | Han Zhang et.al. | 2502.04976 | null |
2025-02-07 | Cached Multi-Lora Composition for Multi-Concept Image Generation | Xiandong Zou et.al. | 2502.04923 | link |
2025-02-07 | MedMimic: Physician-Inspired Multimodal Fusion for Early Diagnosis of Fever of Unknown Origin | Minrui Chen et.al. | 2502.04794 | null |
2025-02-06 | EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models | He Hu et.al. | 2502.04424 | null |
2025-02-05 | PerPO: Perceptual Preference Optimization via Discriminative Rewarding | Zining Zhu et.al. | 2502.04371 | link |
2025-02-06 | PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? | Mennatullah Siam et.al. | 2502.04192 | link |
2025-02-06 | MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation | Qinhan Yu et.al. | 2502.04176 | null |
2025-02-05 | Large Language Models Are Universal Recommendation Learners | Junguang Jiang et.al. | 2502.03041 | null |
2025-02-05 | Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning | Yibo Yan et.al. | 2502.02871 | null |
2025-02-04 | SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency | Qianhao Yuan et.al. | 2502.02458 | link |
2025-02-04 | Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment | Yaling Shen et.al. | 2502.02438 | null |
2025-02-06 | LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models | Tzu-Tao Chang et.al. | 2502.02406 | null |
2025-02-04 | Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking | Jinyang Wu et.al. | 2502.02339 | null |
2025-02-04 | Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration | Younan Zhu et.al. | 2502.01969 | null |
2025-02-04 | MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | Shiju Zhao et.al. | 2502.01960 | null |
2025-02-04 | DAMO: Data- and Model-aware Alignment of Multi-modal LLMs | Jinda Lu et.al. | 2502.01943 | null |
2025-02-03 | Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models | Hashmat Shadab Malik et.al. | 2502.01576 | link |
2025-02-03 | Position: Empowering Time Series Reasoning with Multimodal LLMs | Yaxuan Kong et.al. | 2502.01477 | null |
2025-02-03 | Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models | Mingi Jung et.al. | 2502.01419 | null |
2025-01-31 | Efficient Reasoning with Hidden Thinking | Xuan Shen et.al. | 2501.19201 | link |
2025-01-31 | Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs | Hongliang Li et.al. | 2501.19036 | null |
2025-01-31 | Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation | Bin Zhu et.al. | 2501.19017 | null |
2025-01-30 | BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos | Lehao Lin et.al. | 2501.18565 | null |
2025-01-29 | Generative AI for Vision: A Comprehensive Study of Frameworks and Applications | Fouad Bousetouane et.al. | 2501.18033 | null |
2025-01-29 | Topological Signatures of Adversaries in Multimodal Alignments | Minh Vu et.al. | 2501.18006 | null |
2025-01-30 | Leveraging Multimodal LLM for Inspirational User Interface Search | Seokhyeon Park et.al. | 2501.17799 | link |
2025-01-29 | Learning Free Token Reduction for Multi-Modal LLM | Zihui Zhao et.al. | 2501.17391 | null |
2025-01-31 | Multimodal Magic Elevating Depression Detection with a Fusion of Text and Audio Intelligence | Lindy Gan et.al. | 2501.16813 | null |
2025-01-28 | Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding | Yun Li et.al. | 2501.16786 | null |
2025-01-28 | MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark | Dongyi Yi et.al. | 2501.16688 | null |
2025-01-28 | CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | Jinlan Fu et.al. | 2501.16629 | link |
2025-01-27 | AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models | Zheng Lian et.al. | 2501.16566 | null |
2025-01-27 | LUCY: Linguistic Understanding and Control Yielding Early Stage of Her | Heting Gao et.al. | 2501.16327 | link |
2025-01-27 | FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers | Renshan Zhang et.al. | 2501.16297 | null |
2025-01-27 | Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models | Jing Zhang et.al. | 2501.16282 | null |
2025-01-27 | Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? | Zhiling Chen et.al. | 2501.15795 | null |
2025-01-27 | Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning | Michael Xieyang Liu et.al. | 2501.15727 | null |
2025-01-26 | Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Song Chen et.al. | 2501.15558 | link |
2025-01-26 | Unveiling the Potential of Multimodal Retrieval Augmented Generation with Planning | Xiaohan Yu et.al. | 2501.15470 | null |
2025-01-26 | Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations | Zijun Long et.al. | 2501.15379 | null |
2025-01-26 | Baichuan-Omni-1.5 Technical Report | Yadong Li et.al. | 2501.15368 | link |
2025-01-25 | Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink | Yining Wang et.al. | 2501.15269 | null |
2025-01-23 | Pilot: Building the Federated Multimodal Instruction Tuning Framework | Baochen Xiong et.al. | 2501.13985 | null |
2025-01-23 | GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration | Yue Fan et.al. | 2501.13896 | null |
2025-01-23 | EventVL: Understand Event Streams via Multimodal Large Language Model | Pengteng Li et.al. | 2501.13707 | null |
2025-01-23 | LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models | Yizheng Sun et.al. | 2501.13652 | null |
2025-01-23 | ReasVQA: Advancing VideoQA with Imperfect Reasoning Process | Jianxin Liang et.al. | 2501.13536 | null |
2025-01-23 | 50 Shades of Deceptive Patterns: A Unified Taxonomy, Multimodal Detection, and Security Implications | Zewei Shi et.al. | 2501.13351 | link |
2025-01-24 | Multi-aspect Knowledge Distillation with Large Language Model | Taegyeong Lee et.al. | 2501.13341 | link |
2025-01-22 | Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Bohao Yang et.al. | 2501.13042 | link |
2025-01-22 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Yi Wang et.al. | 2501.12386 | link |
2025-01-21 | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Xianwei Zhuang et.al. | 2501.12327 | link |
2025-01-21 | Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization | Jie Zhao et.al. | 2501.11968 | null |
2025-01-21 | EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Zhili Cheng et.al. | 2501.11858 | link |
2025-01-20 | Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution | Zhiyuan You et.al. | 2501.11561 | null |
2025-01-20 | EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Guankun Wang et.al. | 2501.11347 | link |
2025-01-20 | ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction | Xiangyang Hu et.al. | 2501.11276 | link |
2025-01-20 | A Survey of World Models for Autonomous Driving | Tuo Feng et.al. | 2501.11260 | null |
2025-01-19 | Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation | Zhengwen Shen et.al. | 2501.10958 | null |
2025-01-18 | Visual RAG: Expanding MLLM visual knowledge without fine-tuning | Mirco Bonomo et.al. | 2501.10834 | null |
2025-01-17 | FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Kartik Narayan et.al. | 2501.10360 | link |
2025-01-16 | A Simple Aerial Detection Baseline of Multimodal Language Models | Qingyun Li et.al. | 2501.09720 | link |
2025-01-16 | Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis | Qize Yang et.al. | 2501.09502 | null |
2025-01-16 | Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics | Yuanyuan Wei et.al. | 2501.09218 | null |
2025-01-15 | Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Ruixiang Jiang et.al. | 2501.09012 | link |
2025-01-15 | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Sitong Gong et.al. | 2501.08549 | link |
2025-01-14 | LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Hongyu Li et.al. | 2501.08282 | link |
2025-01-14 | Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness | Jiaxing Zhao et.al. | 2501.07978 | link |
2025-01-14 | Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models | Yifang Xu et.al. | 2501.07972 | null |
2025-01-14 | 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Haomiao Xiong et.al. | 2501.07819 | link |
2025-01-13 | Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | Chengzu Li et.al. | 2501.07542 | null |
2025-01-13 | Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method | Wenping Jin et.al. | 2501.07496 | link |
2025-01-13 | Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation | Han Liu et.al. | 2501.07110 | link |
2025-01-13 | LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models | Mozhgan Nasr Azadani et.al. | 2501.06986 | link |
2025-01-12 | X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding | Wenqi Zhou et.al. | 2501.06835 | null |
2025-01-12 | GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing | Ruizhe Ou et.al. | 2501.06828 | null |
2025-01-12 | MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection | Kaiying Yan et.al. | 2501.06764 | null |
2025-01-12 | Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Ming Dai et.al. | 2501.06710 | link |
2025-01-11 | ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Xuanle Zhao et.al. | 2501.06598 | link |
2025-01-11 | Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Shan Zhang et.al. | 2501.06430 | link |
2025-01-10 | PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | Yangyu Huang et.al. | 2501.06184 | null |
2025-01-10 | Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs | Dabing Cheng et.al. | 2501.05884 | null |
2025-01-10 | Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models | You Li et.al. | 2501.05767 | null |
2025-01-10 | TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos | Korawat Charoenpitaks et.al. | 2501.05733 | link |
2025-01-09 | MECASA: Motor Execution Classification using Additive Self-Attention for Hybrid EEG-fNIRS Data | Gourav Siddhad et.al. | 2501.05525 | null |
2025-01-09 | Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark | Yunzhuo Hao et.al. | 2501.05444 | link |
2025-01-09 | Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration | Xuyang Liu et.al. | 2501.05179 | link |
2025-01-09 | Optimizing Multitask Industrial Processes with Predictive Action Guidance | Naval Kishore Mehta et.al. | 2501.05108 | null |
2025-01-09 | DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving | Xuran Zheng et.al. | 2501.05081 | null |
2025-01-09 | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | Shiji Zhao et.al. | 2501.04931 | null |
2025-01-08 | Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs | Yikang Zhou et.al. | 2501.04670 | link |
2025-01-08 | InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | Yuhang Liu et.al. | 2501.04575 | link |
2025-01-08 | Evidence-based multimodal fusion on structured EHRs and free-text notes for ICU outcome prediction | Yucheng Ruan et.al. | 2501.04389 | link |
2025-01-08 | Multimodal Graph Constrastive Learning and Prompt for ChartQA | Yue Dai et.al. | 2501.04303 | null |
2025-01-08 | H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving | Siran Chen et.al. | 2501.04302 | null |
2025-01-07 | RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance | Matin Mortaheb et.al. | 2501.03995 | null |
2025-01-06 | Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | Alhassan Mumuni et.al. | 2501.03151 | null |
2025-01-07 | Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Wanpeng Hu et.al. | 2501.02964 | link |
2025-01-06 | A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation | Toomas Tahves et.al. | 2501.02858 | null |
2025-01-06 | Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging? | Hongyi Miao et.al. | 2501.02751 | null |
2025-01-05 | FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | Haicheng Wang et.al. | 2501.02430 | link |
2025-01-04 | What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph | Yutao Jiang et.al. | 2501.02268 | link |
2025-01-03 | AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs | Sanjoy Chowdhury et.al. | 2501.02135 | null |
2025-01-03 | VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction | Chaoyou Fu et.al. | 2501.01957 | link |
2025-01-03 | Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Yifan Du et.al. | 2501.01904 | link |
2025-01-03 | Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Guosheng Zhang et.al. | 2501.01720 | null |
2025-01-02 | Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants | Lixiong Qin et.al. | 2501.01243 | null |
2025-01-02 | Towards Interactive Deepfake Analysis | Lixiong Qin et.al. | 2501.01164 | link |
2025-01-02 | EliGen: Entity-Level Controlled Image Generation with Regional Attention | Hong Zhang et.al. | 2501.01097 | link |
2025-01-02 | Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs | Linhao Huang et.al. | 2501.01042 | null |
2025-01-01 | Decoding the Flow: CauseMotion for Emotional Causality Analysis in Long-form Conversations | Yuxuan Zhang et.al. | 2501.00778 | null |
2024-12-31 | Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method | Zhenpeng Huang et.al. | 2501.00584 | null |
2024-12-31 | VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling | Xinhao Li et.al. | 2501.00574 | link |
2024-12-31 | Fine-grained Video-Text Retrieval: A New Benchmark and Method | Yifan Xu et.al. | 2501.00513 | null |
2024-12-31 | Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion | Hebin Wang et.al. | 2501.00330 | null |
2024-12-31 | MLLM-as-a-Judge for Image Safety without Human Labeling | Zhenting Wang et.al. | 2501.00192 | null |
2024-12-30 | GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models | Shangyu Xing et.al. | 2412.21036 | null |
2024-12-30 | Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering | Junxiao Xue et.al. | 2412.20927 | null |
2024-12-28 | ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | Jiedong Zhuang et.al. | 2412.20105 | null |
2024-12-28 | On the Compositional Generalization of Multimodal LLMs for Medical Imaging | Zhenyang Cai et.al. | 2412.20070 | link |
2024-12-27 | Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework | Jiang Liu et.al. | 2412.19684 | null |
2024-12-27 | CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | Siyu Wang et.al. | 2412.19663 | null |
2024-12-27 | MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Jiaqi Fan et.al. | 2412.19406 | link |
2024-12-26 | Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment | Ziang Yan et.al. | 2412.19326 | link |
2024-12-26 | Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries | Roberto Amoroso et.al. | 2412.19304 | null |
2024-12-26 | SeaMo: A Multi-Seasonal and Multimodal Remote Sensing Foundation Model | Xuyang Li et.al. | 2412.19237 | null |
2024-12-25 | MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models | Kaiwen Zuo et.al. | 2412.18947 | null |
2024-12-25 | RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting | Yilei Jiang et.al. | 2412.18826 | null |
2024-12-24 | Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation | Faraz Waseem et.al. | 2412.18688 | null |
2024-12-24 | MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning | Abdelmadjid Chergui et.al. | 2412.18437 | link |
2024-12-24 | Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles | Zihan Wang et.al. | 2412.18416 | null |
2024-12-24 | Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search | Huanjin Yao et.al. | 2412.18319 | link |
2024-12-24 | ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Mengyang Wu et.al. | 2412.18216 | link |
2024-12-24 | Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation | Yucong Luo et.al. | 2412.18176 | null |
2024-12-24 | VisionLLM-based Multimodal Fusion Network for Glottic Carcinoma Early Detection | Zhaohui Jin et.al. | 2412.18124 | null |
2024-12-24 | Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach | Jing Bi et.al. | 2412.18108 | null |
2024-12-24 | An Ensemble Approach to Short-form Video Quality Assessment Using Multimodal LLM | Wen Wen et.al. | 2412.18060 | null |
2024-12-23 | A Multimodal Fusion Framework for Bridge Defect Detection with Cross-Verification | Ravi Datta Rachuri et.al. | 2412.17968 | null |
2024-12-23 | Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Priyaranjan Pattnayak et.al. | 2412.17759 | null |
2024-12-23 | HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data | Ting Zhou et.al. | 2412.17574 | link |
2024-12-23 | Multimodal Preference Data Synthetic Alignment with Reward Model | Robert Wijaya et.al. | 2412.17417 | link |
2024-12-23 | MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models | Beibei Yu et.al. | 2412.17339 | null |
2024-12-23 | Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding | Yueyang Li et.al. | 2412.17337 | link |
2024-12-23 | Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective | Kaifang Long et.al. | 2412.17297 | null |
2024-12-22 | SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | Jinzhi Wang et.al. | 2412.17077 | null |
2024-12-22 | CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models | Yeyuan Wang et.al. | 2412.16869 | link |
2024-12-22 | GME: Improving Universal Multimodal Retrieval by Multimodal LLMs | Xin Zhang et.al. | 2412.16855 | null |
2024-12-21 | AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles | Aritra Kumar Lahiri et.al. | 2412.16701 | null |
2024-12-20 | MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Andrea Moglia et.al. | 2412.15925 | link |
2024-12-20 | Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution | Wentao Tan et.al. | 2412.15650 | link |
2024-12-20 | Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM | Yangyang Guo et.al. | 2412.15614 | null |
2024-12-20 | QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning | Xinyang Tong et.al. | 2412.15576 | null |
2024-12-20 | Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | Saehyung Lee et.al. | 2412.15484 | null |
2024-12-19 | MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs | Yuxuan Wan et.al. | 2412.15310 | link |
2024-12-19 | OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving | Shuo Xing et.al. | 2412.15208 | link |
2024-12-19 | Progressive Multimodal Reasoning via Active Retrieval | Guanting Dong et.al. | 2412.14835 | null |
2024-12-19 | Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models | Zijun Chen et.al. | 2412.14660 | link |
2024-12-18 | Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Jihan Yang et.al. | 2412.14171 | link |
2024-12-18 | InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Cong Wei et.al. | 2412.14006 | link |
2024-12-18 | LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer | Yipeng Zhang et.al. | 2412.13871 | link |
2024-12-17 | Modality-Inconsistent Continual Learning of Multimodal Large Language Models | Weiguo Pian et.al. | 2412.13050 | null |
2024-12-17 | ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing | Yaohui Ma et.al. | 2412.12821 | link |
2024-12-17 | PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model | Yuqing Wang et.al. | 2412.12737 | link |
2024-12-17 | ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding | Zhenxing Zhang et.al. | 2412.12718 | link |
2024-12-17 | Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | Andong Chen et.al. | 2412.12627 | null |
2024-12-17 | FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning | Seunghee Kim et.al. | 2412.12567 | null |
2024-12-17 | Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models | Sina Bagheri Nezhad et.al. | 2412.12500 | link |
2024-12-16 | Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering | Jinhe Bi et.al. | 2412.12359 | link |
2024-12-16 | Instruction-based Image Manipulation by Watching How Things Move | Mingdeng Cao et.al. | 2412.12087 | null |
2024-12-16 | CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | Guo Chen et.al. | 2412.12075 | null |
2024-12-16 | Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning | Yuti Liu et.al. | 2412.11952 | null |
2024-12-16 | A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | Yibo Yan et.al. | 2412.11936 | null |
2024-12-16 | PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension | Kun Ouyang et.al. | 2412.11906 | null |
2024-12-16 | GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training | Renqiu Xia et.al. | 2412.11863 | link |
2024-12-16 | IDEA-Bench: How Far are Generative Models from Professional Designing? | Chen Liang et.al. | 2412.11767 | link |
2024-12-16 | From Specific-MLLM to Omni-MLLM: A Survey about the MLLMs alligned with Multi-Modality | Shixin Jiang et.al. | 2412.11694 | null |
2024-12-16 | ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models | Xiechi Zhang et.al. | 2412.11453 | null |
2024-12-15 | Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal | Yuhao Wang et.al. | 2412.11196 | null |
2024-12-13 | Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining | Zhiqi Ge et.al. | 2412.10342 | null |
2024-12-13 | BrushEdit: All-In-One Image Inpainting and Editing | Yaowei Li et.al. | 2412.10316 | null |
2024-12-13 | Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer’s Disease Identification | Yifan Gao et.al. | 2412.09928 | null |
2024-12-12 | ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation | Ali Athar et.al. | 2412.09754 | null |
2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618 | null |
2024-12-13 | Olympus: A Universal Task Router for Computer Vision Tasks | Yuanze Lin et.al. | 2412.09612 | link |
2024-12-12 | SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding | Hao Li et.al. | 2412.09604 | null |
2024-12-12 | Do Multimodal Large Language Models See Like Humans? | Jiaying Lin et.al. | 2412.09603 | null |
2024-12-12 | InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions | Pan Zhang et.al. | 2412.09596 | link |
2024-12-12 | OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation | Jitesh Jain et.al. | 2412.09585 | link |
2024-12-12 | Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Zhisheng Zhong et.al. | 2412.09501 | link |
2024-12-12 | Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation | Baisen Wang et.al. | 2412.09428 | link |
2024-12-12 | Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Xiaoshuang Huang et.al. | 2412.09278 | link |
2024-12-11 | LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information | Ke Wang et.al. | 2412.08771 | null |
2024-12-11 | From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons | Andrew Szot et.al. | 2412.08442 | null |
2024-12-11 | HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models | Shiding Zhu et.al. | 2412.08378 | null |
2024-12-11 | M2SE: A Multistage Multitask Instruction Tuning Strategy for Unified Sentiment and Emotion Analysis | Ao Li et.al. | 2412.08049 | link |
2024-12-10 | DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Jianzong Wu et.al. | 2412.07589 | null |
2024-12-09 | SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations | Zhaorun Chen et.al. | 2412.06878 | null |
2024-12-09 | ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Chunwei Wang et.al. | 2412.06673 | null |
2024-12-09 | 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | Chun-Peng Chang et.al. | 2412.06613 | null |
2024-12-12 | World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving | Mingliang Zhai et.al. | 2412.06324 | null |
2024-12-09 | LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Mingjie Xu et.al. | 2412.06322 | link |
2024-12-09 | Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness | Qifan Yu et.al. | 2412.06293 | null |
2024-12-09 | ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models | Bingchen Gong et.al. | 2412.06292 | null |
2024-12-08 | GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis | Ashish Goswami et.al. | 2412.06089 | null |
2024-12-08 | Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models | Xiao Xu et.al. | 2412.05939 | null |
2024-12-08 | Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models | Ma Teng et.al. | 2412.05934 | link |
2024-12-08 | [CLS] Token Tells Everything Needed for Training-free Efficient MLLMs | Ao Wang et.al. | 2412.05819 | link |
2024-12-06 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Zhe Chen et.al. | 2412.05271 | link |
2024-12-06 | CompCap: Improving Multimodal Large Language Models with Composite Captions | Xiaohui Chen et.al. | 2412.05243 | null |
2024-12-06 | MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Jarvis Guo et.al. | 2412.05237 | null |
2024-12-06 | LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | Donald Shenaj et.al. | 2412.05148 | link |
2024-12-06 | Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models | Zehao Wang et.al. | 2412.04939 | null |
2024-12-06 | EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation | Yongxin Wang et.al. | 2412.04903 | null |
2024-12-06 | Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis | Rui Zhou et.al. | 2412.04707 | null |
2024-12-05 | Assessing and Learning Alignment of Unimodal Vision and Language Models | Le Zhang et.al. | 2412.04616 | null |
2024-12-05 | p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay | Jun Zhang et.al. | 2412.04449 | link |
2024-12-05 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu et.al. | 2412.04447 | null |
2024-12-05 | GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration | Kaiyi Huang et.al. | 2412.04440 | null |
2024-12-05 | Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Shaunak Halbe et.al. | 2412.04429 | link |
2024-12-05 | Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Jiuhai Chen et.al. | 2412.04424 | link |
2024-12-05 | Liquid: Language Models are Scalable Multi-modal Generators | Junfeng Wu et.al. | 2412.04332 | link |
2024-12-05 | FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Bo Tong et.al. | 2412.04317 | link |
2024-12-04 | VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | Chaoyu Li et.al. | 2412.03735 | null |
2024-12-04 | DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | Qingdong He et.al. | 2412.03255 | null |
2024-12-04 | Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges | Minghao Shao et.al. | 2412.03220 | null |
2024-12-04 | ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning | Zhe Xie et.al. | 2412.03104 | link |
2024-12-03 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Kaixiong Gong et.al. | 2412.02611 | null |
2024-12-03 | Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | Jinjin Cai et.al. | 2412.02531 | null |
2024-12-03 | VR Based Emotion Recognition Using Deep Multimodal Fusion With Biosignals Across Multiple Anatomical Domains | Pubudu L. Indrasiri et.al. | 2412.02283 | null |
2024-12-03 | Personalized Multimodal Large Language Models: A Survey | Junda Wu et.al. | 2412.02142 | null |
2024-12-03 | WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | Yuci Liang et.al. | 2412.02141 | null |
2024-12-03 | Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey | Yunkai Dang et.al. | 2412.02104 | null |
2024-12-02 | PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving | Xuewen Luo et.al. | 2412.02025 | null |
2024-12-02 | MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | Xiaomin Li et.al. | 2412.01343 | null |
2024-12-02 | Enhancing Perception Capabilities of Multimodal LLMs with Training-free Fusion | Zhuokun Chen et.al. | 2412.01289 | null |
2024-12-02 | Ponder & Press: Advancing Visual GUI Agent towards General Computer Control | Yiqin Wang et.al. | 2412.01268 | null |
2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951 | link |
2024-11-29 | VLSBench: Unveiling Visual Leakage in Multimodal Safety | Xuhao Hu et.al. | 2411.19939 | null |
2024-11-29 | On Domain-Specific Post-Training for Multimodal Large Language Models | Daixuan Cheng et.al. | 2411.19930 | null |
2024-11-29 | Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings | Qiong Wu et.al. | 2411.19628 | link |
2024-11-28 | Libra: Leveraging Temporal Images for Biomedical Radiology Analysis | Xi Zhang et.al. | 2411.19378 | link |
2024-11-28 | SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation | Yuhan Pei et.al. | 2411.19182 | null |
2024-11-28 | Detailed Object Description with Controllable Dimensions | Xinran Wang et.al. | 2411.19106 | link |
2024-11-28 | I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting | Nicola Fanelli et.al. | 2411.19050 | link |
2024-11-28 | DuetML: Human-LLM Collaborative Machine Learning Framework for Non-Expert Users | Wataru Kawabe et.al. | 2411.18908 | null |
2024-11-27 | Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment | Soumya Suvra Ghosal et.al. | 2411.18688 | null |
2024-11-27 | Cross-modal Information Flow in Multimodal Large Language Models | Zhi Zhang et.al. | 2411.18620 | link |
2024-11-27 | GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | Pengfei Zhou et.al. | 2411.18499 | null |
2024-11-27 | ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | Qing Jiang et.al. | 2411.18363 | link |
2024-11-27 | Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models | Jingming Liu et.al. | 2411.18142 | null |
2024-11-26 | NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? | Jiaxuan Li et.al. | 2411.17794 | null |
2024-11-26 | Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration | Yuhang Han et.al. | 2411.17686 | null |
2024-11-26 | What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics | Jordan J. Bird et.al. | 2411.17593 | null |
2024-11-26 | Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey | Jiayi Kuang et.al. | 2411.17558 | null |
2024-11-26 | InsightEdit: Towards Better Instruction Following for Image Editing | Yingjing Xu et.al. | 2411.17323 | null |
2024-11-26 | in-Car Biometrics (iCarB) Datasets for Driver Recognition: Face, Fingerprint, and Voice | Vedrana Krivokuca Hahn et.al. | 2411.17305 | null |
2024-11-26 | A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs | Lehan He et.al. | 2411.17265 | null |
2024-11-26 | HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator | Fan Yang et.al. | 2411.17261 | null |
2024-11-26 | Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment | Zheng Chen et.al. | 2411.17237 | link |
2024-11-26 | DOGE: Towards Versatile Visual Document Grounding and Referring | Yinan Zhou et.al. | 2411.17125 | null |
2024-11-26 | Multimodal Alignment and Fusion: A Survey | Songtao Li et.al. | 2411.17040 | null |
2024-11-25 | TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation | Linqing Zhong et.al. | 2411.16425 | null |
2024-11-25 | Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models | Hao Yi et.al. | 2411.16201 | null |
2024-11-25 | Interpreting Object-level Foundation Models via Visual Precision Search | Ruoyu Chen et.al. | 2411.16198 | link |
2024-11-25 | ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration | Haozhan Shen et.al. | 2411.16044 | link |
2024-11-23 | Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark | Rong-Cheng Tu et.al. | 2411.15488 | link |
2024-11-23 | Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy | Te Yang et.al. | 2411.15453 | null |
2024-11-22 | MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | Chaoyou Fu et.al. | 2411.15296 | link |
2024-11-22 | VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Daeun Lee et.al. | 2411.15115 | null |
2024-11-22 | mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA | Tao Zhang et.al. | 2411.15041 | null |
2024-11-22 | De-biased Multimodal Electrocardiogram Analysis | Haitao Li et.al. | 2411.14795 | null |
2024-11-22 | Evaluating and Advancing Multimodal Large Language Models in Ability Lens | Feng Chen et.al. | 2411.14725 | null |
2024-11-22 | FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data | Binqian Xu et.al. | 2411.14717 | link |
2024-11-22 | Any-to-3D Generation via Hybrid Diffusion Supervision | Yijun Fan et.al. | 2411.14715 | null |
2024-11-21 | LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | Weiheng Lu et.al. | 2411.14505 | null |
2024-11-21 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2411.14432 | link |
2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401 | null |
2024-11-21 | Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance | Haozhe Zhao et.al. | 2411.14279 | null |
2024-11-21 | Separable Mixture of Low-Rank Adaptation for Continual Visual Instruction Tuning | Ziqi Wang et.al. | 2411.13949 | null |
2024-11-21 | Panther: Illuminate the Sight of Multimodal LLMs with Instruction-Guided Visual Prompts | Honglin Li et.al. | 2411.13909 | null |
2024-11-20 | Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs | Rui Cao et.al. | 2411.13697 | link |
2024-11-20 | AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | Gaurav Verma et.al. | 2411.13451 | null |
2024-11-20 | DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Xianda Guo et.al. | 2411.13112 | link |
2024-11-20 | Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving | Hao Zhou et.al. | 2411.13076 | null |
2024-11-19 | Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models | Zhen Zeng et.al. | 2411.12790 | null |
2024-11-19 | Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting | Haoyu Zhao et.al. | 2411.12789 | null |
2024-11-19 | Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning | Pengkun Jiao et.al. | 2411.12787 | null |
2024-11-19 | Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | Yiming Shi et.al. | 2411.12783 | null |
2024-11-18 | Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Xudong Yan et.al. | 2411.12584 | null |
2024-11-19 | CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model | Dongyoung Go et.al. | 2411.12287 | null |
2024-11-18 | AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning | Kun Xiang et.al. | 2411.11930 | link |
2024-11-18 | Dissecting Misalignment of Multimodal Large Language Models via Influence Function | Lijie Hu et.al. | 2411.11667 | null |
2024-11-18 | MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models | Harshita Sharma et.al. | 2411.11362 | null |
2024-11-18 | CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset | Zhiming Wang et.al. | 2411.11360 | link |
2024-11-18 | MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis | Yingjie Zhou et.al. | 2411.11235 | null |
2024-11-19 | Multilingual Large Language Models: A Systematic Survey | Shaolin Zhu et.al. | 2411.11072 | link |
2024-11-19 | VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? | Yunlong Tang et.al. | 2411.10979 | null |
2024-11-17 | Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering | Zeping Yu et.al. | 2411.10950 | link |
2024-11-17 | Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Wenke Huang et.al. | 2411.10928 | null |
2024-11-16 | BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization | Md. Nazmus Sadat Samin et.al. | 2411.10879 | link |
2024-11-16 | Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Jinqiang Long et.al. | 2411.10669 | link |
2024-11-15 | Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | Weiyun Wang et.al. | 2411.10442 | null |
2024-11-15 | Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | Yuhan Fu et.al. | 2411.10436 | null |
2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309 | link |
2024-11-15 | Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Jingru Yang et.al. | 2411.10252 | null |
2024-11-15 | CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | Xiaofei Zhu et.al. | 2411.10060 | null |
2024-11-15 | VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos | Weihao Zhong et.al. | 2411.10032 | null |
2024-11-15 | Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs | Xiaofeng Zhang et.al. | 2411.09968 | null |
2024-11-14 | MagicQuill: An Intelligent Interactive Image Editing System | Zichen Liu et.al. | 2411.09703 | link |
2024-11-14 | Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models | Wei Wang et.al. | 2411.09691 | null |
2024-11-14 | Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models | Chutian Meng et.al. | 2411.09449 | null |
2024-11-14 | Spider: Any-to-Many Multimodal LLM | Jinxiang Lai et.al. | 2411.09439 | link |
2024-11-14 | LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Zhenshi Li et.al. | 2411.09301 | link |
2024-11-13 | Multimodal Instruction Tuning with Hybrid State Space Models | Jianing Zhou et.al. | 2411.08840 | null |
2024-11-13 | Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks? | Quan Zhang et.al. | 2411.08466 | null |
2024-11-13 | Material Property Prediction with Element Attribute Knowledge Graphs and Multimodal Representation Learning | Chao Huang et.al. | 2411.08414 | null |
2024-11-12 | SimBase: A Simple Baseline for Temporal Video Grounding | Peijun Bao et.al. | 2411.07945 | null |
2024-11-12 | Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | Zirui Shao et.al. | 2411.07722 | null |
2024-11-12 | Zer0-Jack: A Memory-efficient Gradient-based Jailbreaking Method for Black-box Multi-modal Large Language Models | Tiejin Chen et.al. | 2411.07559 | null |
2024-11-11 | Multimodal Fusion Balancing Through Game-Theoretic Regularization | Konstantinos Kontras et.al. | 2411.07335 | null |
2024-11-11 | CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | Junho Kim et.al. | 2411.06869 | null |
2024-11-11 | Learning from Feedback: Semantic Enhancement for Object SLAM Using Foundation Models | Jungseok Hong et.al. | 2411.06752 | null |
2024-11-10 | KMM: Key Frame Mask Mamba for Extended Motion Generation | Zeyu Zhang et.al. | 2411.06481 | link |
2024-11-09 | A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks | Chia Xin Liang et.al. | 2411.06284 | null |
2024-11-09 | An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Fatemeh Shiri et.al. | 2411.06048 | link |
2024-11-08 | Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation | Dong Shu et.al. | 2411.05316 | link |
2024-11-08 | Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | Jaeyoo Park et.al. | 2411.05254 | null |
2024-11-07 | On Erroneous Agreements of CLIP Image Embeddings | Siting Li et.al. | 2411.05195 | null |
2024-11-07 | Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models | Pete Janowczyk et.al. | 2411.05056 | null |
2024-11-07 | CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM | Jingwei Xu et.al. | 2411.04954 | null |
2024-11-07 | GUI Agents with Foundation Models: A Comprehensive Survey | Shuai Wang et.al. | 2411.04890 | null |
2024-11-07 | Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs | Chengxin Hu et.al. | 2411.04708 | null |
2024-11-06 | Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education | Anand Syamkumar et.al. | 2411.04308 | null |
2024-11-06 | Analyzing Multimodal Features of Spontaneous Voice Assistant Commands for Mild Cognitive Impairment Detection | Nana Lin et.al. | 2411.04158 | null |
2024-11-06 | Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination | Dingjie Song et.al. | 2411.03823 | link |
2024-11-06 | StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding | Junming Lin et.al. | 2411.03628 | link |
2024-11-05 | MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | Ziliang Gan et.al. | 2411.03314 | null |
2024-11-05 | Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? | Jingyu Xiao et.al. | 2411.03292 | link |
2024-11-06 | Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Yangning Li et.al. | 2411.02937 | link |
2024-11-05 | Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning | Mingcheng Li et.al. | 2411.02793 | null |
2024-11-05 | Multimodal Commonsense Knowledge Distillation for Visual Question Answering | Shuo Yang et.al. | 2411.02722 | null |
2024-11-05 | Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios | Yunkai Dang et.al. | 2411.02708 | null |
2024-11-04 | MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | Sheng-Chieh Lin et.al. | 2411.02571 | null |
2024-11-04 | DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Yang Yue et.al. | 2411.02359 | link |
2024-11-04 | KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension | Jie Yang et.al. | 2411.01846 | null |
2024-11-04 | ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | Yiming Sun et.al. | 2411.01756 | null |
2024-11-03 | UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | Sejoon Oh et.al. | 2411.01703 | null |
2024-11-03 | Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | Seongsu Ha et.al. | 2411.01494 | null |
2024-11-02 | Can Multimodal Large Language Model Think Analogically? | Diandian Guo et.al. | 2411.01307 | null |
2024-11-02 | Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems | Mikołaj Małkiński et.al. | 2411.01173 | null |
2024-11-01 | Exploring Multi-Modality Dynamics: Insights and Challenges in Multimodal Fusion for Biomedical Tasks | Laura Wenderoth et.al. | 2411.00725 | null |
2024-11-01 | Unified Generative and Discriminative Training for Multi-modal Large Language Models | Wei Chow et.al. | 2411.00304 | null |
2024-10-31 | JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment | Joao Sousa et.al. | 2410.23988 | null |
2024-10-31 | Leveraging LLMs for MT in Crisis Scenarios: a blueprint for low-resource languages | Séamus Lankford et.al. | 2410.23890 | null |
2024-10-31 | Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding | Jinlong He et.al. | 2410.23822 | null |
2024-10-30 | PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures | Tianxiang Wu et.al. | 2410.23089 | null |
2024-10-29 | Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring | Matthew McKinney et.al. | 2410.22558 | null |
2024-10-29 | Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench | Zheyuan Liu et.al. | 2410.22108 | link |
2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264 | null |
2024-10-28 | Face-MLLM: A Large Face Perception Model | Haomiao Sun et.al. | 2410.20717 | null |
2024-10-27 | Deep Learning-Driven Microstructure Characterization and Vickers Hardness Prediction of Mg-Gd Alloys | Lu Wang et.al. | 2410.20402 | null |
2024-10-26 | LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Jiazuo Yu et.al. | 2410.20178 | link |
2024-10-25 | Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements | Silvia Terragni et.al. | 2410.19974 | null |
2024-10-25 | Improving Multimodal Large Language Models Using Continual Learning | Shikhar Srivastava et.al. | 2410.19925 | null |
2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702 | null |
2024-10-28 | BIFRÖST: 3D-Aware Image compositing with Language Instructions | Lingxiao Li et.al. | 2410.19079 | link |
2024-10-24 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Zhangheng Li et.al. | 2410.18967 | null |
2024-10-24 | SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models | Zonghao Ying et.al. | 2410.18927 | null |
2024-10-24 | Distill Visual Chart Reasoning Ability from LLMs to MLLMs | Wei He et.al. | 2410.18798 | link |
2024-10-24 | DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation | Yuang Ai et.al. | 2410.18666 | link |
2024-10-25 | Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Lehan Wang et.al. | 2410.18387 | null |
2024-10-23 | TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts | Yuxuan Xie et.al. | 2410.18071 | null |
2024-10-23 | CLEAR: Character Unlearning in Textual and Visual Modalities | Alexey Dontsov et.al. | 2410.18057 | null |
2024-10-23 | Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation | Wenfang Yao et.al. | 2410.17918 | link |
2024-10-23 | ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning | Zhiwei Hao et.al. | 2410.17779 | link |
2024-10-23 | YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions | Xiguang Li et.al. | 2410.17734 | null |
2024-10-23 | Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact | Junhua Liu et.al. | 2410.17532 | null |
2024-10-22 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Xiaoqian Shen et.al. | 2410.17434 | link |
2024-10-22 | Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models | Zhijie Tan et.al. | 2410.16983 | null |
2024-10-22 | IPL: Leveraging Multimodal Large Language Models for Intelligent Product Listing | Kang Chen et.al. | 2410.16977 | null |
2024-10-22 | Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | Zhangwei Gao et.al. | 2410.16261 | link |
2024-10-21 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Yuxuan Cai et.al. | 2410.16236 | link |
2024-10-21 | Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining | Han Huang et.al. | 2410.16166 | link |
2024-10-21 | Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages | Xiang Yue et.al. | 2410.16153 | null |
2024-10-21 | Mitigating Object Hallucination via Concentric Causal Attention | Yun Xing et.al. | 2410.15926 | link |
2024-10-21 | AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection | Xiaoman Xu et.al. | 2410.15591 | link |
2024-10-20 | Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation | Jiayu Xiong et.al. | 2410.15475 | null |
2024-10-20 | Modality-Fair Preference Optimization for Trustworthy MLLM Alignment | Songtao Jiang et.al. | 2410.15334 | null |
2024-10-19 | SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation | Jingxuan Chen et.al. | 2410.15164 | link |
2024-10-19 | LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | Xuechen Guo et.al. | 2410.15074 | null |
2024-10-18 | MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps | Xiongtao Zhou et.al. | 2410.14668 | link |
2024-10-18 | MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems | Zifeng Zhu et.al. | 2410.14179 | null |
2024-10-18 | RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training | Muhe Ding et.al. | 2410.14154 | null |
2024-10-17 | PUMA: Empowering Unified MLLM with Multi-granular Visual Generation | Rongyao Fang et.al. | 2410.13861 | link |
2024-10-17 | $γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models | Yaxin Luo et.al. | 2410.13859 | null |
2024-10-17 | Can MLLMs Understand the Deep Implication Behind Chinese Images? | Chenhao Zhang et.al. | 2410.13854 | link |
2024-10-18 | Harnessing Webpage UIs for Text-Rich Visual Understanding | Junpeng Liu et.al. | 2410.13824 | null |
2024-10-17 | MobA: A Two-Level Agent System for Efficient Mobile Task Automation | Zichen Zhu et.al. | 2410.13757 | link |
2024-10-17 | Exploring the Design Space of Visual Context Representation in Video MLLMs | Yifan Du et.al. | 2410.13694 | link |
2024-10-17 | Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant | Haoran Hao et.al. | 2410.13360 | link |
2024-10-16 | MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs | Yunqiu Xu et.al. | 2410.12332 | null |
2024-10-16 | Understanding the Role of LLMs in Multimodal Evaluation Benchmarks | Botian Jiang et.al. | 2410.12329 | link |
2024-10-16 | Multimodal Fusion with Relational Learning for Molecular Property Prediction | Zhengyang Zhou et.al. | 2410.12128 | null |
2024-10-15 | MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Yue Cao et.al. | 2410.11829 | link |
2024-10-15 | MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Chenxi Wang et.al. | 2410.11779 | link |
2024-10-15 | SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding | Ying Chen et.al. | 2410.11761 | null |
2024-10-15 | Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions | Yuhan Fu et.al. | 2410.11701 | null |
2024-10-15 | VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI | Sijie Cheng et.al. | 2410.11623 | null |
2024-10-15 | MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark | Bin Shan et.al. | 2410.11538 | link |
2024-10-15 | Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Sihang Zhao et.al. | 2410.11437 | link |
2024-10-15 | Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Zhongye Liu et.al. | 2410.11242 | link |
2024-10-15 | MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation | Xianping Ma et.al. | 2410.11160 | link |
2024-10-14 | Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes | Tim Broedermann et.al. | 2410.10791 | link |
2024-10-14 | MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages | Shubhi Bansal et.al. | 2410.10407 | link |
2024-10-14 | Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation | Shun Qian et.al. | 2410.10319 | null |
2024-10-14 | ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | Jiawei Li et.al. | 2410.10238 | null |
2024-10-14 | Tracing Human Stress from Physiological Signals using UWB Radar | Jia Xu et.al. | 2410.10155 | null |
2024-10-15 | LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Han Qiu et.al. | 2410.09962 | link |
2024-10-13 | Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records | Shuai Jiang et.al. | 2410.09880 | null |
2024-10-13 | Text4Seg: Reimagining Image Segmentation as Text Generation | Mengcheng Lan et.al. | 2410.09855 | link |
2024-10-12 | Skipping Computations in Multimodal LLMs | Mustafa Shukor et.al. | 2410.09454 | link |
2024-10-12 | MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection | Xi Jiang et.al. | 2410.09453 | link |
2024-10-11 | Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion | Shiao Wang et.al. | 2410.08879 | null |
2024-10-11 | Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking | Wei Zhang et.al. | 2410.08616 | null |
2024-10-11 | Baichuan-Omni Technical Report | Yadong Li et.al. | 2410.08565 | link |
2024-10-11 | SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models | Haotian Xia et.al. | 2410.08474 | link |
2024-10-10 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Gen Luo et.al. | 2410.08202 | null |
2024-10-10 | Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Qingni Wang et.al. | 2410.08174 | null |
2024-10-10 | Agent S: An Open Agentic Framework that Uses Computers Like a Human | Saaket Agashe et.al. | 2410.08164 | link |
2024-10-10 | Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs | Xiaoyuan Liu et.al. | 2410.08145 | link |
2024-10-09 | Retrieval Replace Reduction: An effective visual token reduction method via semantic match | Yingen Liu et.al. | 2410.07278 | null |
2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155 | link |
2024-10-09 | Personalized Visual Instruction Tuning | Renjie Pi et.al. | 2410.07113 | link |
2024-10-10 | Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology | Xiangyu Wang et.al. | 2410.07087 | null |
2024-10-09 | HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding | Keliang Li et.al. | 2410.06777 | null |
2024-10-09 | To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models | Junyan Lin et.al. | 2410.06765 | link |
2024-10-09 | ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Haoran Zhang et.al. | 2410.06555 | link |
2024-10-09 | Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection | Aravinda Reddy PN et.al. | 2410.06543 | null |
2024-10-08 | Multimodal Situational Safety | Kaiwen Zhou et.al. | 2410.06172 | null |
2024-10-08 | Quadratic Is Not What You Need For Multimodal Large Language Models | Phu Pham et.al. | 2410.06169 | link |
2024-10-08 | $\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection | Yize Chen et.al. | 2410.06126 | null |
2024-10-07 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Boyu Gou et.al. | 2410.05243 | link |
2024-10-07 | Organizing Unstructured Image Collections using Natural Language | Mingxuan Liu et.al. | 2410.05217 | null |
2024-10-07 | Multimodal Fusion Strategies for Mapping Biophysical Landscape Features | Lucia Gordon et.al. | 2410.04833 | link |
2024-10-07 | MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models | Kaichen Huang et.al. | 2410.04819 | link |
2024-10-07 | Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Guanyu Zhou et.al. | 2410.04780 | link |
2024-10-07 | MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs) | Shih-Han Chou et.al. | 2410.04778 | null |
2024-10-07 | Diffusion Models in 3D Vision: A Survey | Zhen Wang et.al. | 2410.04738 | null |
2024-10-07 | ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models | Ziyue Wang et.al. | 2410.04659 | link |
2024-10-08 | FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering | Siqiao Xue et.al. | 2410.04526 | null |
2024-10-06 | MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | Lai Wei et.al. | 2410.04521 | link |
2024-10-04 | Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Xin Zou et.al. | 2410.03577 | link |
2024-10-04 | Gradient-based Jailbreak Images for Multimodal Fusion Models | Javier Rando et.al. | 2410.03489 | link |
2024-10-04 | MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents | Junpeng Yue et.al. | 2410.03450 | null |
2024-10-04 | SELU: Self-Learning Embodied MLLMs in Unknown Environments | Boyu Li et.al. | 2410.03303 | null |
2024-10-03 | Contrastive Localized Language-Image Pre-Training | Hong-You Chen et.al. | 2410.02746 | null |
2024-10-03 | LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model | Duy M. H. Nguyen et.al. | 2410.02615 | null |
2024-10-03 | Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment | Kai Liu et.al. | 2410.02505 | link |
2024-10-04 | SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack | Zihao Pan et.al. | 2410.02240 | link |
2024-10-04 | From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities | Wanpeng Zhang et.al. | 2410.02155 | null |
2024-10-02 | Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations | Minoh Jeong et.al. | 2410.02086 | null |
2024-10-02 | EMMA: Efficient Visual Alignment in Multi-Modal LLMs | Sara Ghazanfari et.al. | 2410.02080 | link |
2024-10-03 | Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks | Mengzhao Jia et.al. | 2410.01744 | link |
2024-10-02 | Visual Perception in Text Strings | Qi Jia et.al. | 2410.01733 | link |
2024-10-02 | The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Hong Li et.al. | 2410.01417 | null |
2024-10-02 | SHAP-CAT: A interpretable multi-modal framework enhancing WSI classification via virtual staining and shapley-value-based multimodal fusion | Jun Wang et.al. | 2410.01408 | null |
2024-10-01 | FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | Peiran Wu et.al. | 2410.01089 | null |
2024-10-01 | Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data | Ivica Dimitrovski et.al. | 2410.00469 | null |
2024-10-01 | Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations | Miyu Goko et.al. | 2410.00436 | null |
2024-10-01 | MERIT: Multimodal Wearable Vital Sign Waveform Monitoring | Yongyang Tang et.al. | 2410.00392 | null |
2024-09-30 | Multimodal Alignment of Histopathological Images Using Cell Segmentation and Point Set Matching for Integrative Cancer Analysis | Jun Jiang et.al. | 2410.00152 | null |
2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566 | null |
2024-09-30 | UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models | Qiaojun Yu et.al. | 2409.20551 | null |
2024-09-30 | Melody Is All You Need For Music Generation | Shaopeng Wei et.al. | 2409.20196 | link |
2024-09-30 | VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Huilin Deng et.al. | 2409.20146 | null |
2024-09-30 | Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval | Yabing Wang et.al. | 2409.19961 | link |
2024-09-30 | WildFusion: Multimodal Implicit 3D Reconstructions in the Wild | Yanbaihui Liu et.al. | 2409.19904 | null |
2024-10-01 | Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration | Kaihang Pan et.al. | 2409.19872 | link |
2024-09-29 | Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs | Fengzhu Zeng et.al. | 2409.19656 | null |
2024-09-28 | A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping | Houjian Yu et.al. | 2409.19457 | null |
2024-09-28 | Visual Question Decomposition on Multimodal Large Language Models | Haowei Zhang et.al. | 2409.19339 | null |
2024-09-27 | Enhancing Explainability in Multimodal Large Language Models Using Ontological Context | Jihen Amara et.al. | 2409.18753 | null |
2024-09-27 | 3DPX: Single Panoramic X-ray Analysis Guided by 3D Oral Structure Reconstruction | Xiaoshuang Li et.al. | 2409.18701 | null |
2024-09-27 | Image-guided topic modeling for interpretable privacy classification | Alina Elena Baia et.al. | 2409.18674 | link |
2024-09-27 | When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation | Yuli Zhou et.al. | 2409.18653 | link |
2024-09-27 | Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation | Hongzhe Huang et.al. | 2409.18541 | link |
2024-09-27 | FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation | Yuki Imajuku et.al. | 2409.18459 | null |
2024-09-26 | Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing | Huthaifa I. Ashqar et.al. | 2409.18286 | null |
2024-09-26 | EAGLE: Egocentric AGgregated Language-video Engine | Jing Bi et.al. | 2409.17523 | null |
2024-09-26 | Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Xun Zhu et.al. | 2409.17508 | link |
2024-09-25 | Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents | Junting Lu et.al. | 2409.17140 | null |
2024-09-25 | Pruning Multilingual Large Language Models for Multilingual Inference | Hwichan Kim et.al. | 2409.16911 | link |
2024-09-25 | MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | Katharina Anderer et.al. | 2409.16765 | link |
2024-09-26 | EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models | Jiacheng Zhang et.al. | 2409.16723 | null |
2024-09-25 | EventHallusion: Diagnosing Event Hallucinations in Video LLMs | Jiacheng Zhang et.al. | 2409.16597 | link |
2024-09-24 | DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection | Jiaxin Ye et.al. | 2409.15936 | link |
2024-09-25 | M^2PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning | Taowen Wang et.al. | 2409.15657 | link |
2024-09-23 | MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models | Mohammad Shahab Sepehri et.al. | 2409.15477 | link |
2024-09-24 | OmniBench: Towards The Future of Universal Omni-Language Models | Yizhi Li et.al. | 2409.15272 | link |
2024-09-23 | Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation | Manu Gaur et.al. | 2409.15125 | null |
2024-09-23 | Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | Hong Chen et.al. | 2409.14993 | null |
2024-09-23 | FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Junzhuo Liu et.al. | 2409.14750 | link |
2024-09-24 | Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding | Yan Shu et.al. | 2409.14485 | link |
2024-09-21 | Enhancing Advanced Visual Reasoning Ability of Large Language Models | Zhiyuan Li et.al. | 2409.13980 | null |
2024-09-20 | MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Ting Liu et.al. | 2409.13609 | link |
2024-09-18 | Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | Najmeh Forouzandehmehr et.al. | 2409.12150 | null |
2024-09-18 | Fusion in Context: A Multimodal Approach to Affective State Recognition | Youssef Mohamed et.al. | 2409.11906 | null |
2024-09-18 | Bridging Design and Development with Automated Declarative UI Code Generation | Ting Zhou et.al. | 2409.11667 | null |
2024-09-17 | Towards Time Series Reasoning with LLMs | Winnie Chow et.al. | 2409.11376 | null |
2024-09-17 | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | Jiahui Gao et.al. | 2409.11365 | null |
2024-09-17 | Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection | Yuta Kaneko et.al. | 2409.11223 | null |
2024-09-16 | Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving | Yunsheng Ma et.al. | 2409.11182 | null |
2024-09-17 | Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Dingjie Song et.al. | 2409.10994 | link |
2024-09-17 | Multi-Floor Zero-Shot Object Navigation Policy | Lingfeng Zhang et.al. | 2409.10906 | null |
2024-09-16 | XLM for Autonomous Driving Systems: A Comprehensive Review | Sonda Fourati et.al. | 2409.10484 | null |
2024-09-16 | Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models | Weihao Ye et.al. | 2409.10197 | link |
2024-09-15 | Explore the Hallucination on Low-level Perception for MLLMs | Yinan Sun et.al. | 2409.09748 | null |
2024-09-15 | AutoJournaling: A Context-Aware Journaling System Leveraging MLLMs on Smartphone Screenshots | Tianyi Zhang et.al. | 2409.09696 | null |
2024-09-14 | Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM | Yuanjie Lyu et.al. | 2409.09362 | null |
2024-09-14 | ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models | Yahan Tu et.al. | 2409.09318 | null |
2024-09-13 | Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation | Cheng Charles Ma et.al. | 2409.09135 | null |
2024-09-11 | Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU | Zhenyu Ning et.al. | 2409.09086 | null |
2024-09-13 | VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation | Hanning Chen et.al. | 2409.08464 | link |
2024-09-11 | Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Weixi Weng et.al. | 2409.07331 | null |
2024-09-11 | Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout | Anbin QI et.al. | 2409.07078 | null |
2024-09-10 | LIME-M: Less Is More for Evaluation of MLLMs | Kang Zhu et.al. | 2409.06851 | link |
2024-09-10 | VoiceWukong: Benchmarking Deepfake Voice Detection | Ziwei Yan et.al. | 2409.06348 | null |
2024-09-10 | MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Surbhi Madan et.al. | 2409.06224 | null |
2024-09-09 | MLLM-FL: Multimodal Large Language Model Assisted Federated Learning on Heterogeneous and Long-tailed Data | Jianyi Zhang et.al. | 2409.06067 | null |
2024-09-09 | Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models | Hongyang Lei et.al. | 2409.05929 | null |
2024-09-09 | Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | Haritheja Etukuru et.al. | 2409.05865 | link |
2024-09-15 | MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | Run Luo et.al. | 2409.05840 | null |
2024-09-11 | A Survey of Multimodal Composite Editing and Retrieval | Suyan Li et.al. | 2409.05405 | link |
2024-09-07 | Training-free ZS-CIR via Weighted Modality Fusion and Similarity | Ren-Di Wu et.al. | 2409.04918 | link |
2024-09-06 | Influence of Early through Late Fusion on Pancreas Segmentation from Imperfectly Registered Multimodal MRI | Lucas W. Remedios et.al. | 2409.04563 | link |
2024-09-10 | Question-Answering Dense Video Events | Hangyu Qin et.al. | 2409.04388 | null |
2024-09-09 | Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver | Zeren Zhang et.al. | 2409.04214 | link |
2024-09-06 | UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity | Yicheng Fu et.al. | 2409.04081 | null |
2024-09-09 | mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Anwen Hu et.al. | 2409.03420 | link |
2024-09-05 | ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding | Zhengzhuo Xu et.al. | 2409.03277 | null |
2024-09-05 | OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | Julong Wei et.al. | 2409.03272 | null |
2024-09-05 | TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations | Mingze Gao et.al. | 2409.03206 | null |
2024-09-04 | No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning | Manu Gaur et.al. | 2409.03025 | null |
2024-09-06 | HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts | Xinyu Liu et.al. | 2409.02919 | link |
2024-09-04 | LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | Xidong Wang et.al. | 2409.02889 | link |
2024-09-04 | A Medical Multimodal Large Language Model for Pediatric Pneumonia | Weiwei Tian et.al. | 2409.02608 | null |
2024-09-02 | Understanding Multimodal Hallucination with Parameter-Free Representation Alignment | Yueqian Wang et.al. | 2409.01151 | link |
2024-09-01 | Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model | Fuqiang Niu et.al. | 2409.00597 | null |
2024-08-31 | StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models | Yuxiang Guo et.al. | 2409.00304 | null |
2024-08-30 | EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs | Zhen Fan et.al. | 2408.17168 | null |
2024-08-30 | AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Yonghui Wang et.al. | 2408.16986 | link |
2024-08-29 | Law of Vision Representation in MLLMs | Shijia Yang et.al. | 2408.16357 | link |
2024-08-28 | Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Min Shi et.al. | 2408.15998 | link |
2024-08-28 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Fangxun Shu et.al. | 2408.15881 | link |
2024-08-28 | A Survey on Evaluation of Multimodal Large Language Models | Jiaxing Huang et.al. | 2408.15769 | null |
2024-08-28 | MambaPlace:Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms | Tianyi Shang et.al. | 2408.15740 | link |
2024-08-28 | TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning | Jinglun Li et.al. | 2408.15566 | link |
2024-08-28 | Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models | Wenbin Wang et.al. | 2408.15556 | link |
2024-08-27 | Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation | Jian Hu et.al. | 2408.15205 | link |
2024-08-27 | GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer Based Fusion Network for Multimodal Sentiment Analysis | Yijie Jin et.al. | 2408.14809 | link |
2024-08-26 | Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos | Qirui Chen et.al. | 2408.14469 | null |
2024-08-26 | Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos | Jiajun Fei et.al. | 2408.14023 | link |
2024-08-26 | FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation | Daixun Li et.al. | 2408.13980 | null |
2024-08-25 | ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models | Yeji Park et.al. | 2408.13906 | link |
2024-08-23 | MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | Yi-Fan Zhang et.al. | 2408.13257 | null |
2024-08-23 | ParGo: Bridging Vision-Language with Partial and Global Views | An-Lan Wang et.al. | 2408.12928 | link |
2024-08-23 | IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities | Bin Wang et.al. | 2408.12902 | link |
2024-08-23 | Semantic Alignment for Multimodal Large Language Models | Tao Wu et.al. | 2408.12867 | null |
2024-08-22 | Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models | Jean Park et.al. | 2408.12763 | null |
2024-08-23 | Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Khang T. Doan et.al. | 2408.12480 | null |
2024-08-26 | MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | Chaoya Jiang et.al. | 2408.12321 | null |
2024-08-21 | CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion | Yunlong Tang et.al. | 2408.12009 | null |
2024-08-21 | SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | Yuanyang Yin et.al. | 2408.11813 | null |
2024-08-21 | EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Feipeng Ma et.al. | 2408.11795 | null |
2024-08-21 | EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning | Bohao Xing et.al. | 2408.11424 | link |
2024-08-21 | EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning | Zhihao Li et.al. | 2408.11397 | null |
2024-08-22 | Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Mengying Ge et.al. | 2408.11286 | null |
2024-08-20 | FLAME: Learning to Navigate with Multimodal LLM in Urban Environments | Yunzhe Xu et.al. | 2408.11051 | link |
2024-08-19 | CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Hidehisa Arai et.al. | 2408.10845 | null |
2024-08-20 | PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection | Tri Cao et.al. | 2408.10738 | null |
2024-08-21 | SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Zebang Cheng et.al. | 2408.10500 | link |
2024-08-19 | FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Zhengchao Huang et.al. | 2408.10072 | link |
2024-08-19 | Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting | Yun-Da Tsai et.al. | 2408.09798 | null |
2024-08-20 | Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation | Yuyang Ye et.al. | 2408.09698 | link |
2024-08-18 | Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models | Kening Zheng et.al. | 2408.09429 | link |
2024-08-17 | BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger | Yulin Chen et.al. | 2408.09093 | null |
2024-08-16 | ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis | Yubao Zhao et.al. | 2408.08849 | link |
2024-08-16 | Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM | Wanting Yang et.al. | 2408.08765 | null |
2024-08-16 | Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Hongcheng Liu et.al. | 2408.08693 | link |
2024-08-16 | Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning | Wenwen Zhuang et.al. | 2408.08640 | link |
2024-08-16 | A Survey on Benchmarks of Multimodal Large Language Models | Jian Li et.al. | 2408.08632 | link |
2024-08-16 | CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving | Shihan Peng et.al. | 2408.08500 | null |
2024-08-15 | When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding | Pingping Zhang et.al. | 2408.08093 | null |
2024-08-14 | End-to-end Semantic-centric Video-based Multimodal Affective Computing | Ronghao Lin et.al. | 2408.07694 | null |
2024-08-15 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Enneng Yang et.al. | 2408.07666 | link |
2024-08-15 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Minxuan Zhou et.al. | 2408.07543 | link |
2024-08-14 | LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image | Fan Yang et.al. | 2408.07422 | null |
2024-08-14 | Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Peiyuan Chen et.al. | 2408.07303 | null |
2024-08-13 | CROME: Cross-Modal Adapters for Efficient Multimodal LLM | Sayna Ebrahimi et.al. | 2408.06610 | null |
2024-08-13 | Social Debiasing for Fair Multi-modal LLMs | Harry Cheng et.al. | 2408.06569 | null |
2024-08-12 | Deep Multimodal Collaborative Learning for Polyp Re-Identification | Suncheng Xiang et.al. | 2408.05914 | link |
2024-08-11 | Advancing Re-Ranking with Multimodal Fusion and Target-Oriented Auxiliary Tasks in E-Commerce Search | Enqiang Xu et.al. | 2408.05751 | null |
2024-08-11 | A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot | Haoxuan Ding et.al. | 2408.05729 | link |
2024-08-13 | SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning | Yuze Zhao et.al. | 2408.05517 | link |
2024-08-10 | How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model | Yuxin Zhu et.al. | 2408.05411 | null |
2024-08-09 | Revisiting Multi-Modal LLM Evaluation | Jian Lu et.al. | 2408.05334 | null |
2024-08-09 | Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing | Jiarui Xie et.al. | 2408.05307 | null |
2024-08-09 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu et.al. | 2408.05211 | link |
2024-08-09 | Instruction Tuning-free Visual Token Complement for Multimodal LLMs | Dongsheng Wang et.al. | 2408.05019 | null |
2024-08-13 | mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | Jiabo Ye et.al. | 2408.04840 | link |
2024-08-09 | Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | Qirui Jiao et.al. | 2408.04594 | link |
2024-08-08 | MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models | Haoxuan Li et.al. | 2408.04388 | link |
2024-08-08 | MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning | Rex Liu et.al. | 2408.04243 | null |
2024-08-08 | M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction | Hui Luo et.al. | 2408.04170 | null |
2024-08-07 | Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | Zaijing Li et.al. | 2408.03615 | link |
2024-08-07 | Unlocking the Non-Native Language Context Limitation: Native Language Prompting Facilitates Knowledge Elicitation | Baixuan Li et.al. | 2408.03544 | link |
2024-08-07 | Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation | Weiqi Feng et.al. | 2408.03505 | null |
2024-08-06 | Targeted Visual Prompting for Medical Visual Question Answering | Sergio Tascon-Morales et.al. | 2408.03043 | link |
2024-08-05 | Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | Xinbei Ma et.al. | 2408.02544 | link |
2024-08-05 | UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model | Zhaowei Li et.al. | 2408.02503 | link |
2024-08-06 | Infusing Environmental Captions for Long-Form Video Language Grounding | Hyogun Lee et.al. | 2408.02336 | null |
2024-08-05 | REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Agneet Chatterjee et.al. | 2408.02231 | null |
2024-08-04 | Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping | Mingxin Huang et.al. | 2408.02034 | link |
2024-08-03 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Yuan Yao et.al. | 2408.01800 | link |
2024-08-03 | MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition | Ruoyu Wang et.al. | 2408.01766 | null |
2024-08-02 | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua et.al. | 2408.01417 | null |
2024-08-05 | Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs | Peng Ding et.al. | 2408.01355 | link |
2024-08-02 | A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks | Jiaqi Wang et.al. | 2408.01319 | null |
2024-08-02 | Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models | Kohou Wang et.al. | 2408.01003 | null |
2024-08-02 | Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation | Zijian Yi et.al. | 2408.00970 | link |
2024-08-01 | Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model | Benlin Liu et.al. | 2408.00754 | null |
2024-08-01 | Are Bigger Encoders Always Better in Vision Large Models? | Bozhou Li et.al. | 2408.00620 | null |
2024-08-01 | Multimodal Fusion and Coherence Modeling for Video Topic Segmentation | Hai Yu et.al. | 2408.00365 | null |
2024-08-01 | Towards Flexible Evaluation for Generative Visual Question Answering | Huishan Ji et.al. | 2408.00300 | link |
2024-08-01 | Multi-Modal Parameter-Efficient Fine-tuning via Graph Neural Network | Bin Cheng et.al. | 2408.00290 | null |
2024-07-31 | ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Mingrui Wu et.al. | 2407.21534 | link |
2024-07-31 | MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training | Zhanpeng Chen et.al. | 2407.21439 | link |
2024-07-31 | Design and Development of Laughter Recognition System Based on Multimodal Fusion and Deep Learning | Fuzheng Zhao et.al. | 2407.21391 | null |
2024-07-31 | Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | Can Wang et.al. | 2407.21333 | null |
2024-07-30 | Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate | Zheng Lin et.al. | 2407.20505 | link |
2024-07-29 | CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models | Junda Wu et.al. | 2407.20454 | null |
2024-07-29 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Xingchen Zeng et.al. | 2407.20174 | link |
2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171 | link |
2024-07-29 | ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 | Wenjun Huang et.al. | 2407.19832 | null |
2024-07-29 | Multimodal Large Language Models for Bioimage Analysis | Shanghang Zhang et.al. | 2407.19778 | null |
2024-07-29 | Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images | Jiaxin Zhanga et.al. | 2407.19719 | null |
2024-07-29 | Harnessing Large Vision and Language Models in Agriculture: A Review | Hongyan Zhu et.al. | 2407.19679 | null |
2024-07-29 | ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck | Chia-Hao Kao et.al. | 2407.19651 | null |
2024-07-28 | ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Zhen Chen et.al. | 2407.19435 | link |
2024-07-28 | LLAVADI: What Matters For Multimodal Large Language Models Distillation | Shilin Xu et.al. | 2407.19409 | null |
2024-07-27 | Data Processing Techniques for Modern Multimodal Models | Yinheng Li et.al. | 2407.19180 | null |
2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854 | null |
2024-07-26 | Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models | Xiang Shi et.al. | 2407.18626 | link |
2024-07-25 | Automated Ensemble Multimodal Machine Learning for Healthcare | Fergus Imrie et.al. | 2407.18227 | null |
2024-07-26 | Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Fakhraddin Alwajih et.al. | 2407.18129 | null |
2024-07-25 | ERIT Lightweight Multimodal Dataset for Elderly Emotion Recognition and Multimodal Fusion Evaluation | Rita Frieske et.al. | 2407.17772 | null |
2024-07-24 | DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation | Qian Feng et.al. | 2407.17348 | null |
2024-07-23 | CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs | Jihyung Kil et.al. | 2407.16837 | link |
2024-07-23 | Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation | Tao Meng et.al. | 2407.16714 | null |
2024-07-23 | PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects | Junyi Li et.al. | 2407.16696 | link |
2024-07-24 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | Liyun Zhang et.al. | 2407.16552 | null |
2024-07-23 | Harmonizing Visual Text Comprehension and Generation | Zhen Zhao et.al. | 2407.16364 | link |
2024-07-23 | INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | Yiwei Ma et.al. | 2407.16198 | link |
2024-07-23 | UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models | Liu Qi et.al. | 2407.16160 | link |
2024-07-22 | Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight | Ziyuan Huang et.al. | 2407.15819 | null |
2024-07-22 | GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI | Zhaojie Fang et.al. | 2407.15719 | link |
2024-07-22 | Addressing Out-of-Distribution Challenges in Image Semantic Communication Systems with Multi-modal Large Language Models | Feifan Zhang et.al. | 2407.15335 | null |
2024-07-21 | MIBench: Evaluating Multimodal Large Language Models over Multiple Images | Haowei Liu et.al. | 2407.15272 | null |
2024-07-23 | BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM | Hanjun Luo et.al. | 2407.15240 | link |
2024-07-23 | DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer | Jinfeng Wei et.al. | 2407.15130 | null |
2024-07-21 | Navigation Instruction Generation with BEV Perception and Large Language Models | Sheng Fan et.al. | 2407.15087 | link |
2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506 | null |
2024-07-19 | T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Kaiyue Sun et.al. | 2407.14505 | link |
2024-07-19 | Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Renshan Zhang et.al. | 2407.14439 | link |
2024-07-19 | Not All Attention is Needed: Parameter and Computation Efficient Tuning for Multi-modal Large Language Models via Effective Attention Skipping | Qiong Wu et.al. | 2407.14093 | null |
2024-07-18 | X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs | Sirnam Swetha et.al. | 2407.13851 | null |
2024-07-20 | EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension | Wei Zhang et.al. | 2407.13596 | link |
2024-07-18 | OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird’s-eye-view Vehicle Semantic Segmentation | Jian Sun et.al. | 2407.13137 | null |
2024-07-17 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Leyang Shen et.al. | 2407.12709 | link |
2024-07-17 | E5-V: Universal Embeddings with Multimodal Large Language Models | Ting Jiang et.al. | 2407.12580 | link |
2024-07-17 | Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning | Mustafa Dogan et.al. | 2407.12498 | null |
2024-07-17 | ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data | Yufan Shen et.al. | 2407.12358 | link |
2024-07-16 | UrbanWorld: An Urban World Model for 3D City Generation | Yu Shang et.al. | 2407.11965 | link |
2024-07-17 | Harnessing Large Language Models for Multimodal Product Bundling | Xiaohao Liu et.al. | 2407.11712 | link |
2024-07-15 | By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting | Hyungjun Yoon et.al. | 2407.10385 | link |
2024-07-13 | Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding | Ruihuang Li et.al. | 2407.09781 | null |
2024-07-12 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | Shraman Pramanick et.al. | 2407.09413 | link |
2024-07-17 | Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study | Yulong Yang et.al. | 2407.09295 | null |
Prompt
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2025-04-17 | IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design | Fei Shen et.al. | 2504.13176 | null |
2025-04-17 | Personalized Text-to-Image Generation with Auto-Regressive Models | Kaiyue Sun et.al. | 2504.13162 | null |
2025-04-17 | Science-T2I: Addressing Scientific Illusions in Image Synthesis | Jialuo Li et.al. | 2504.13129 | null |
2025-04-17 | Object-Driven Narrative in AR: A Scenario-Metaphor Framework with VLM Integration | Yusi Sun et.al. | 2504.13119 | null |
2025-04-17 | Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems | Ivica Kostric et.al. | 2504.13095 | null |
2025-04-17 | EventVAD: Training-Free Event-Aware Video Anomaly Detection | Yihua Shao et.al. | 2504.13092 | null |
2025-04-17 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | null |
2025-04-17 | Early Accessibility: Automating Alt-Text Generation for UI Icons During App Development | Sabrina Haque et.al. | 2504.13069 | null |
2025-04-17 | Accuracy is Not Agreement: Expert-Aligned Evaluation of Crash Narrative Classification Models | Sudesh Ramesh Bhagat et.al. | 2504.13068 | null |
2025-04-17 | Aspect-Based Summarization with Self-Aspect Retrieval Enhanced Generation | Yichao Feng et.al. | 2504.13054 | null |
2025-04-16 | Towards Learning to Complete Anything in Lidar | Ayca Takmaz et.al. | 2504.12264 | null |
2025-04-16 | Cobra: Efficient Line Art COlorization with BRoAder References | Junhao Zhuang et.al. | 2504.12240 | null |
2025-04-16 | Exploring GRBs and supernovae connection: does a superluminous hypernova population exist? | Achille Fiore et.al. | 2504.12224 | null |
2025-04-16 | Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification | Jaime E. Cuellar et.al. | 2504.12180 | null |
2025-04-16 | FocusedAD: Character-centric Movie Audio Description | Xiaojun Ye et.al. | 2504.12157 | null |
2025-04-16 | ARCeR: an Agentic RAG for the Automated Definition of Cyber Ranges | Matteo Lupinacci et.al. | 2504.12143 | null |
2025-04-16 | Multilingual Contextualization of Large Language Models for Document-Level Machine Translation | Miguel Moura Ramos et.al. | 2504.12140 | null |
2025-04-16 | Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models - | Laura Fieback et.al. | 2504.12137 | null |
2025-04-16 | Clarifying Ambiguities: on the Role of Ambiguity Types in Prompting Methods for Clarification Generation | Anfu Tang et.al. | 2504.12113 | null |
2025-04-16 | A Diffusion-Based Framework for Terrain-Aware Remote Sensing Image Reconstruction | Zhenyu Yu et.al. | 2504.12112 | null |
2025-04-15 | Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception | Ziqi Pang et.al. | 2504.11457 | null |
2025-04-15 | SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Junke Wang et.al. | 2504.11455 | null |
2025-04-15 | RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models | Juan Diego Rodriguez et.al. | 2504.11381 | null |
2025-04-15 | DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks | Yupei Liu et.al. | 2504.11358 | null |
2025-04-16 | Seedream 3.0 Technical Report | Yu Gao et.al. | 2504.11346 | null |
2025-04-15 | A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Wei Xiong et.al. | 2504.11343 | null |
2025-04-15 | A Mathematical Framework of Semantic Communication based on Category Theory | Shuheng Hua et.al. | 2504.11334 | null |
2025-04-15 | Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis | Hao Liu et.al. | 2504.11331 | null |
2025-04-15 | Decorrelation in Complex Wave Scattering | Qihang Zhang et.al. | 2504.11330 | null |
2025-04-15 | Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Ruicheng Ao et.al. | 2504.11320 | null |
2025-04-14 | Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | Tao Zhang et.al. | 2504.10465 | null |
2025-04-14 | Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling? | Olha Shaposhnyk et.al. | 2504.10397 | null |
2025-04-14 | Brain-Machine Interfaces & Information Retrieval Challenges and Opportunities | Yashar Moshfeghi et.al. | 2504.10371 | null |
2025-04-14 | SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning | Yiting Wang et.al. | 2504.10369 | null |
2025-04-14 | DICE: A Framework for Dimensional and Contextual Evaluation of Language Models | Aryan Shrivastava et.al. | 2504.10359 | null |
2025-04-14 | Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | Yifan Yang et.al. | 2504.10352 | null |
2025-04-15 | Efficient Prompt Tuning for Hierarchical Ingredient Recognition | Yinxuan Gui et.al. | 2504.10322 | null |
2025-04-14 | SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model | Zongcan Ding et.al. | 2504.10320 | null |
2025-04-14 | Analysis of Attention in Video Diffusion Transformers | Yuxin Wen et.al. | 2504.10317 | null |
2025-04-14 | ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting | Huiqi Wu et.al. | 2504.10316 | null |
2025-04-11 | Towards an Understanding of Context Utilization in Code Intelligence | Yanlin Wang et.al. | 2504.08734 | null |
2025-04-11 | Generating Fine Details of Entity Interactions | Xinyi Gu et.al. | 2504.08714 | null |
2025-04-11 | Fast-Slow-Thinking: Complex Task Solving with Large Language Models | Yiliu Sun et.al. | 2504.08690 | null |
2025-04-11 | Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis | Alexandre Bazin et.al. | 2504.08666 | null |
2025-04-11 | Quality evaluation of Tabby coding assistant using real source code snippets | Marta Borek et.al. | 2504.08650 | null |
2025-04-11 | Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Jialu Li et.al. | 2504.08641 | null |
2025-04-11 | A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English | Julian Bäumler et.al. | 2504.08609 | null |
2025-04-11 | Lexical Bundle Frequency as a Construct-Relevant Candidate Feature in Automated Scoring of L2 Academic Writing | Burak Senel et.al. | 2504.08537 | null |
2025-04-11 | Task Memory Engine (TME): Enhancing State Awareness for Multi-Step LLM Agent Tasks | Ye Ye et.al. | 2504.08525 | null |
2025-04-11 | Scholar Inbox: Personalized Paper Recommendations for Scientists | Markus Flicke et.al. | 2504.08385 | null |
2025-04-10 | C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Zhongyang Li et.al. | 2504.07964 | link |
2025-04-10 | Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Riccardo Cantini et.al. | 2504.07887 | link |
2025-04-10 | Towards Sustainable Creativity Support: An Exploratory Study on Prompt Based Image Generation | Daniel Hove Paludan et.al. | 2504.07879 | null |
2025-04-10 | SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos | Joshua Li et.al. | 2504.07867 | null |
2025-04-10 | 2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization | Mengyang Li et.al. | 2504.07856 | null |
2025-04-10 | Understanding Learner-LLM Chatbot Interactions and the Impact of Prompting Guidelines | Cansu Koyuturk et.al. | 2504.07840 | null |
2025-04-10 | HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss | Yi Huang et.al. | 2504.07827 | null |
2025-04-10 | What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks | Pavel Chizhov et.al. | 2504.07825 | link |
2025-04-10 | A System for Comprehensive Assessment of RAG Frameworks | Mattia Rengo et.al. | 2504.07803 | link |
2025-04-10 | FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness | Chandan Kumar Sah et.al. | 2504.07801 | null |
2025-04-09 | A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | Andreas Hochlehnert et.al. | 2504.07086 | null |
2025-04-09 | Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection | Ruoyu Chen et.al. | 2504.07060 | link |
2025-04-09 | TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Liang-Hsuan Tseng et.al. | 2504.07053 | link |
2025-04-09 | Towards LLMs Robustness to Changes in Prompt Format Styles | Lilian Ngweta et.al. | 2504.06969 | null |
2025-04-09 | RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts | Natalia Loukachevitch et.al. | 2504.06947 | null |
2025-04-09 | Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration | Kostas Hatalis et.al. | 2504.06943 | null |
2025-04-09 | FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks | Dekun Dai et.al. | 2504.06939 | null |
2025-04-09 | MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs | Jiawei Mao et.al. | 2504.06897 | null |
2025-04-09 | MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Chang Nie et.al. | 2504.06863 | null |
2025-04-09 | EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation | Diljeet Jagpal et.al. | 2504.06861 | null |
2025-04-09 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et.al. | 2504.06261 | null |
2025-04-08 | Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning | Muhammad Baqer Mollah et.al. | 2504.06173 | link |
2025-04-08 | A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning | Akash Kumar et.al. | 2504.06153 | null |
2025-04-08 | Multi-Sense Embeddings for Language Models and Knowledge Distillation | Qitong Wang et.al. | 2504.06036 | null |
2025-04-08 | Information-Theoretic Reward Decomposition for Generalizable RLHF | Liyuan Mao et.al. | 2504.06020 | null |
2025-04-08 | Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? | Roman Kochnev et.al. | 2504.06006 | null |
2025-04-08 | econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians | Can Zhang et.al. | 2504.06003 | null |
2025-04-08 | NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge | Firoj Alam et.al. | 2504.05995 | null |
2025-04-08 | An Empirical Study of GPT-4o Image Generation Capabilities | Sixiang Chen et.al. | 2504.05979 | null |
2025-04-08 | AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting | Xiaolin Fan et.al. | 2504.05966 | null |
2025-04-07 | CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models | Kavana Venkatesh et.al. | 2504.05306 | null |
2025-04-07 | URECA: Unique Region Caption Anything | Sangbeom Lim et.al. | 2504.05305 | null |
2025-04-08 | NoveltyBench: Evaluating Language Models for Humanlike Diversity | Yiming Zhang et.al. | 2504.05228 | null |
2025-04-08 | Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG | Hengran Zhang et.al. | 2504.05220 | null |
2025-04-07 | MSA-UNet3+: Multi-Scale Attention UNet3+ with New Supervised Prototypical Contrastive Loss for Coronary DSA Image Segmentation | Rayan Merghani Ahmed et.al. | 2504.05184 | null |
2025-04-07 | BRIDGES: Bridging Graph Modality and Large Language Models within EDA Tasks | Wei Li et.al. | 2504.05180 | null |
2025-04-07 | Attention-Based Multi-Scale Temporal Fusion Network for Uncertain-Mode Fault Diagnosis in Multimode Processes | Guangqiang Li et.al. | 2504.05172 | null |
2025-04-07 | Prεεmpt: Sanitizing Sensitive Prompts for LLMs | Amrita Roy Chowdhury et.al. | 2504.05147 | null |
2025-04-07 | DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration | Jiamei Xiong et.al. | 2504.05135 | null |
2025-04-07 | ABCDWaveNet: Advancing Robust Road Ponding Detection in Fog through Dynamic Frequency-Spatial Synergy | Ronghui Zhang et.al. | 2504.05112 | null |
2025-04-04 | Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions | Ting-Hsuan Liao et.al. | 2504.03639 | null |
2025-04-04 | VISTA-OCR: Towards generative and interactive end to end OCR models | Laziz Hamdi et.al. | 2504.03621 | null |
2025-04-04 | PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector | Kaidong Li et.al. | 2504.03563 | null |
2025-04-04 | Diverse In-Context Example Selection After Decomposing Programs and Aligned Utterances Improves Semantic Parsing | Mayank Kothyari et.al. | 2504.03541 | link |
2025-04-04 | State estimation for gas purity monitoring and control in water electrolysis systems | Lucas Cammann et.al. | 2504.03522 | null |
2025-04-04 | ATM-Net: Anatomy-Aware Text-Guided Multi-Modal Fusion for Fine-Grained Lumbar Spine Segmentation | Sheng Lian et.al. | 2504.03476 | null |
2025-04-04 | Locations of Characters in Narratives: Andersen and Persuasion Datasets | Batuhan Ozyurt et.al. | 2504.03434 | link |
2025-04-04 | MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance | Chen Hu et.al. | 2504.03379 | null |
2025-04-04 | Point Cloud-based Grasping for Soft Hand Exoskeleton | Chen Hu et.al. | 2504.03369 | null |
2025-04-04 | Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification | Francesca Ronchini et.al. | 2504.03329 | null |
2025-04-03 | A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models | Gaurav Verma et.al. | 2504.02793 | null |
2025-04-03 | A Framework for Robust Cognitive Evaluation of LLMs | Karin de Langis et.al. | 2504.02789 | null |
2025-04-03 | From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks | Joshua Holstein et.al. | 2504.02780 | null |
2025-04-03 | BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs | Alexander Leszczynski et.al. | 2504.02779 | link |
2025-04-03 | Robot-Led Vision Language Model Wellbeing Assessment of Children | Nida Itrat Abbasi et.al. | 2504.02765 | null |
2025-04-04 | RBT4DNN: Requirements-based Testing of Neural Networks | Nusrat Jahan Mozumder et.al. | 2504.02737 | link |
2025-04-03 | Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study | Aryan Agrawal et.al. | 2504.02733 | link |
2025-04-03 | LLM for Complex Reasoning Task: An Exploratory Study in Fermi Problems | Zishuo Liu et.al. | 2504.02671 | null |
2025-04-03 | Adaptive Frequency Enhancement Network for Remote Sensing Image Semantic Segmentation | Feng Gao et.al. | 2504.02647 | link |
2025-04-03 | Prompt Optimization with Logged Bandit Data | Haruka Kiyohara et.al. | 2504.02646 | null |
2025-04-03 | Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation | Baban Gain et.al. | 2504.01919 | null |
2025-04-02 | Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework | Andrey Sidorenko et.al. | 2504.01908 | link |
2025-04-02 | Is Temporal Prompting All We Need For Limited Labeled Action Recognition? | Shreyank N Gowda et.al. | 2504.01890 | null |
2025-04-02 | Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks | Ali Al-Kaswan et.al. | 2504.01850 | null |
2025-04-02 | Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images | Nusrat Munia et.al. | 2504.01838 | link |
2025-04-02 | Implicit Bias Injection Attacks against Text-to-Image Diffusion Models | Huayang Huang et.al. | 2504.01819 | link |
2025-04-02 | UniViTAR: Unified Vision Transformer with Native Resolution | Limeng Qiao et.al. | 2504.01792 | null |
2025-04-02 | Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation | Mingrui Ye et.al. | 2504.01764 | link |
2025-04-02 | Stable Structure Learning with HC-Stable and Tabu-Stable Algorithms | Neville K. Kitson et.al. | 2504.01740 | link |
2025-04-02 | TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication | Petr Vanc et.al. | 2504.01708 | null |
2025-03-31 | Consistent Subject Generation via Contrastive Instantiated Concepts | Lee Hsin-Ying et.al. | 2503.24387 | null |
2025-03-31 | Effectively Controlling Reasoning Models through Thinking Intervention | Tong Wu et.al. | 2503.24370 | null |
2025-03-31 | ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion | Rana Muhammad Shahroz Khan et.al. | 2503.24354 | null |
2025-03-31 | Contextual Preference Collaborative Measure Framework Based on Belief System | Hang Yu et.al. | 2503.24328 | null |
2025-03-31 | A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG | Arshia Kermani et.al. | 2503.24307 | null |
2025-03-31 | Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Jiacheng Lin et.al. | 2503.24289 | link |
2025-03-31 | EP240414a: Off-axis View of a Jet-Cocoon System from an Expanded Progenitor Star | Jian-He Zheng et.al. | 2503.24266 | null |
2025-04-02 | Text2Tracks: Prompt-based Music Recommendation via Generative Retrieval | Enrico Palumbo et.al. | 2503.24193 | null |
2025-03-31 | Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms | Shuoming Zhang et.al. | 2503.24191 | null |
2025-03-31 | LLM4FS: Leveraging Large Language Models for Feature Selection and How to Improve It | Jianhao Li et.al. | 2503.24157 | null |
2025-03-28 | ActionStudio: A Lightweight Framework for Data and Training of Action Models | Jianguo Zhang et.al. | 2503.22673 | link |
2025-03-28 | Unicorn: Text-Only Data Synthesis for Vision Language Model Training | Xiaomin Yu et.al. | 2503.22655 | link |
2025-03-28 | Shadow and gravitational lensing produced by the nonlinear accretion of a scalar field onto a black hole | J. C. Acevedo-Muñoz et.al. | 2503.22624 | null |
2025-03-28 | Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | Antonia Karamolegkou et.al. | 2503.22610 | null |
2025-03-28 | Towards a Quantum Information Theory of Hadronization: Dihadron Fragmentation and Neutral Polarization in Heavy Baryons | Rebecca von Kuk et.al. | 2503.22607 | null |
2025-03-28 | Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish | Kevin Cohen et.al. | 2503.22585 | link |
2025-03-28 | Pseudovarieties of semigroups | Jorge Almeida et.al. | 2503.22546 | null |
2025-03-28 | Automated UX Insights from User Research Videos by Integrating Facial Emotion and Text Sentiment | Simran Kaur Ghatoray et.al. | 2503.22510 | null |
2025-03-28 | Generative Reliability-Based Design Optimization Using In-Context Learning Capabilities of Large Language Models | Zhonglin Jiang et.al. | 2503.22401 | null |
2025-03-28 | Fighting Fire with Fire: Channel-Independent RF Fingerprinting via the Ratio of Linear to Logarithmic Differential Spectrum | Tianshu Chen et.al. | 2503.22378 | null |
2025-03-27 | Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Abdelrahman Shaker et.al. | 2503.21782 | link |
2025-03-27 | VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | Chi-Pin Huang et.al. | 2503.21781 | null |
2025-03-27 | Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation | Reza Qorbani et.al. | 2503.21780 | link |
2025-03-27 | Test-Time Visual In-Context Tuning | Jiahao Xie et.al. | 2503.21777 | link |
2025-03-27 | MemInsight: Autonomous Memory Augmentation for LLM Agents | Rana Salama et.al. | 2503.21760 | null |
2025-03-27 | Lumina-Image 2.0: A Unified and Efficient Image Generative Framework | Qi Qin et.al. | 2503.21758 | link |
2025-03-27 | VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Dian Zheng et.al. | 2503.21755 | link |
2025-03-27 | LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis | Shitian Zhao et.al. | 2503.21749 | null |
2025-03-27 | 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models | Yuhan Zhang et.al. | 2503.21745 | null |
2025-03-27 | GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | Arsham Gholamzadeh Khoee et.al. | 2503.21735 | null |
2025-03-26 | Understanding R1-Zero-Like Training: A Critical Perspective | Zichen Liu et.al. | 2503.20783 | link |
2025-03-26 | Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising | Yan-Bo Lin et.al. | 2503.20782 | null |
2025-03-26 | Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields | Shijie Zhou et.al. | 2503.20776 | null |
2025-03-27 | Beyond Believability: Accurate Human Behavior Simulation with Fine-Tuned LLMs | Yuxuan Lu et.al. | 2503.20749 | null |
2025-03-26 | Vision as LoRA | Han Wang et.al. | 2503.20680 | link |
2025-03-26 | BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation | Yuyang Peng et.al. | 2503.20672 | null |
2025-03-26 | AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction | Sadaf Khademi et.al. | 2503.20662 | null |
2025-03-26 | AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports | Xiangwen Zhang et.al. | 2503.20654 | null |
2025-03-26 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Han Wu et.al. | 2503.20641 | link |
2025-03-26 | IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting | Hao Fu et.al. | 2503.20612 | link |
2025-03-25 | Scaling Vision Pre-Training to 4K Resolution | Baifeng Shi et.al. | 2503.19903 | null |
2025-03-25 | Scaling Down Text Encoders of Text-to-Image Diffusion Models | Lifu Wang et.al. | 2503.19897 | link |
2025-03-25 | A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design | Jie Tian et.al. | 2503.19889 | null |
2025-03-25 | CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation | Nengbo Wang et.al. | 2503.19878 | null |
2025-03-25 | Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators | Seungone Kim et.al. | 2503.19877 | null |
2025-03-25 | An Overview of Low-Rank Structures in the Training and Adaptation of Large Models | Laura Balzano et.al. | 2503.19859 | null |
2025-03-25 | Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Xiaoyu Tian et.al. | 2503.19855 | null |
2025-03-25 | Towards Online Multi-Modal Social Interaction Understanding | Xinpeng Li et.al. | 2503.19851 | link |
2025-03-25 | A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950 | Zhao Fang et.al. | 2503.19844 | null |
2025-03-25 | Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy | Athiya Deviyani et.al. | 2503.19828 | null |
2025-03-24 | Target-Aware Video Diffusion Models | Taeksoo Kim et.al. | 2503.18950 | null |
2025-03-24 | Equivariant Image Modeling | Ruixiao Dong et.al. | 2503.18948 | link |
2025-03-24 | Video-T1: Test-Time Scaling for Video Generation | Fangfu Liu et.al. | 2503.18942 | null |
2025-03-25 | Coincidence measurement of two-photon double ionization of argon through an autoionizing resonance | Sebastian Hell et.al. | 2503.18913 | null |
2025-03-24 | AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration | Zhexuan Wang et.al. | 2503.18891 | link |
2025-03-24 | Efficient and Accurate Scene Text Recognition with Cascaded-Transformers | Savas Ozkan et.al. | 2503.18883 | null |
2025-03-24 | Efficient Self-Supervised Adaptation for Medical Image Analysis | Moein Sorkhei et.al. | 2503.18873 | link |
2025-03-24 | Reasoning to Learn from Latent Thoughts | Yangjun Ruan et.al. | 2503.18866 | null |
2025-03-25 | MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Ruichuan An et.al. | 2503.18854 | link |
2025-03-24 | 3DSwapping: Texture Swapping For 3D Object From Single Reference Image | Xiao Cao et.al. | 2503.18853 | null |
2025-03-21 | Core Components of Emotional Impulsivity: A Mouse-Cursor Tracking Study | Anton Leontyev et.al. | 2503.17328 | null |
2025-03-21 | FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Mingyang Song et.al. | 2503.17287 | link |
2025-03-21 | Revisiting End To End Sparse Autoencoder Training – A Short Finetune is All You Need | Adam Karvonen et.al. | 2503.17272 | link |
2025-03-21 | Slide-Level Prompt Learning with Vision Language Models for Few-Shot Multiple Instance Learning in Histopathology | Devavrat Tomar et.al. | 2503.17238 | null |
2025-03-21 | LLMs Love Python: A Study of LLMs’ Bias for Programming Languages and Libraries | Lukas Twist et.al. | 2503.17181 | link |
2025-03-21 | ExplainitAI: When do we trust artificial intelligence? The influence of content and explainability in a cross-cultural comparison | Sora Kang et.al. | 2503.17158 | null |
2025-03-21 | Modifying Large Language Model Post-Training for Diverse Creative Writing | John Joon Young Chung et.al. | 2503.17126 | null |
2025-03-21 | Multi-modal Multi-platform Person Re-Identification: Benchmark and Method | Ruiyang Ha et.al. | 2503.17096 | null |
2025-03-21 | Collapse of Rotating White Dwarfs and Multimessenger Signals | Takami Kuroda et.al. | 2503.17082 | null |
2025-03-21 | Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? | Jeremy Barnes et.al. | 2503.17039 | null |
2025-03-20 | DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding | Keyan Chen et.al. | 2503.16426 | link |
2025-03-20 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Yang Sui et.al. | 2503.16419 | link |
2025-03-20 | Sparse Nonparametric Contextual Bandits | Hamish Flynn et.al. | 2503.16382 | null |
2025-03-20 | Enhancing Software Quality Assurance with an Adaptive Differential Evolution based Quantum Variational Autoencoder-Transformer Model | Seshu Babu Barma et.al. | 2503.16335 | null |
2025-03-20 | LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates | Ying Shen et.al. | 2503.16334 | null |
2025-03-20 | Issue2Test: Generating Reproducing Test Cases from Issue Reports | Noor Nashid et.al. | 2503.16320 | null |
2025-03-20 | PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification | Sharon Peled et.al. | 2503.16284 | link |
2025-03-20 | Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data | Zijian Li et.al. | 2503.16260 | null |
2025-03-20 | M2N2V2: Multi-Modal Unsupervised and Training-free Interactive Segmentation | Markus Karmann et.al. | 2503.16254 | null |
2025-03-20 | AI Agents in Cryptoland: Practical Attacks and No Silver Bullet | Atharv Singh Patlan et.al. | 2503.16248 | null |
2025-03-20 | Dynamic Bi-Elman Attention Networks (DBEAN): Dual-Directional Context-Aware Representation Learning for Enhanced Text Classification | ZhengLin Lai et.al. | 2503.15469 | link |
2025-03-19 | Visual Position Prompt for MLLM based Visual Grounding | Wei Tang et.al. | 2503.15426 | link |
2025-03-19 | Probing the topology of the space of tokens with structured prompts | Michael Robinson et.al. | 2503.15421 | null |
2025-03-19 | A time-to-event three-outcome design for randomized phase II cancer trials | Minghua Shan et.al. | 2503.15418 | null |
2025-03-19 | TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification | Junnan Zhu et.al. | 2503.15289 | null |
2025-03-19 | TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models | Teng-Fang Hsiao et.al. | 2503.15283 | null |
2025-03-19 | Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning? | Roberto Araya et.al. | 2503.15268 | null |
2025-03-19 | Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study | Jomar Thomas Almonte et.al. | 2503.15248 | null |
2025-03-19 | CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification | Wenlong Yu et.al. | 2503.15234 | link |
2025-03-19 | Context-Aware Vision Language Foundation Models for Ocular Disease Screening in Retinal Images | Lucie Berger et.al. | 2503.15212 | null |
2025-03-18 | Aligning Multimodal LLM with Human Preference: A Survey | Tao Yu et.al. | 2503.14504 | null |
2025-03-18 | The Power of Context: How Multimodality Improves Image Super-Resolution | Kangfu Mei et.al. | 2503.14503 | null |
2025-03-18 | Tracking Meets Large Multimodal Models for Driving Scenario Understanding | Ayesha Ishaq et.al. | 2503.14498 | link |
2025-03-18 | Gricean Norms as a Basis for Effective Collaboration | Fardin Saad et.al. | 2503.14484 | link |
2025-03-18 | ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing | Yulin Pan et.al. | 2503.14482 | null |
2025-03-18 | LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Nikhil Abhyankar et.al. | 2503.14434 | link |
2025-03-18 | MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation | Hongyu Zhang et.al. | 2503.14428 | null |
2025-03-18 | Large Language Models for Virtual Human Gesture Selection | Parisa Ghanad Torshizi et.al. | 2503.14408 | null |
2025-03-18 | Impossible Videos | Zechen Bai et.al. | 2503.14378 | null |
2025-03-18 | RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment | Chao Wang et.al. | 2503.14358 | null |
2025-03-17 | Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance | Noah Y. Siegel et.al. | 2503.13445 | null |
2025-03-17 | VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Ye Liu et.al. | 2503.13444 | link |
2025-03-17 | DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models | Haoyang Li et.al. | 2503.13443 | link |
2025-03-18 | MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Yingyue Li et.al. | 2503.13440 | link |
2025-03-18 | DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective | Dengyun Peng et.al. | 2503.13413 | link |
2025-03-17 | MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | James Burgess et.al. | 2503.13399 | link |
2025-03-17 | Aligned Probing: Relating Toxic Behavior and Model Internals | Andreas Waldis et.al. | 2503.13390 | null |
2025-03-17 | Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning | Hai-Long Sun et.al. | 2503.13360 | null |
2025-03-17 | LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Ricardo Bigolin Lanfredi et.al. | 2503.13330 | link |
2025-03-17 | Edit Transfer: Learning Image Editing via Vision In-Context Relations | Lan Chen et.al. | 2503.13327 | null |
2025-03-14 | RNN-DAS: A New Deep Learning Approach for Detection and Real-Time Monitoring of Volcano-Tectonic Events Using Distributed Acoustic Sensing | Javier Fernandez-Carabantes et.al. | 2503.11622 | null |
2025-03-14 | Synthesizing Access Control Policies using Large Language Models | Adarsh Vatsa et.al. | 2503.11573 | null |
2025-03-14 | Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models | Hao Cheng et.al. | 2503.11519 | null |
2025-03-14 | Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks | Diego Gosmar et.al. | 2503.11517 | link |
2025-03-14 | T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Seyed Mohammad Hadi Hosseini et.al. | 2503.11481 | null |
2025-03-14 | Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models | Xu Liu et.al. | 2503.11411 | null |
2025-03-14 | Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages | Jiyeong Kim et.al. | 2503.11384 | null |
2025-03-14 | Modeling Subjectivity in Cognitive Appraisal with Language Models | Yuxiang Zhou et.al. | 2503.11381 | null |
2025-03-14 | Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model | Moritz A. Zanger et.al. | 2503.11339 | null |
2025-03-14 | AI-Assisted Object Condensation Clustering for Calorimeter Shower Reconstruction at CLAS12 | Gregory Matousek et.al. | 2503.11277 | null |
2025-03-13 | GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Rongyao Fang et.al. | 2503.10639 | link |
2025-03-14 | Distilling Diversity and Control in Diffusion Models | Rohit Gandikota et.al. | 2503.10637 | null |
2025-03-13 | V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes | Yanming Zhang et.al. | 2503.10634 | null |
2025-03-13 | Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search | Andy Zhou et.al. | 2503.10619 | null |
2025-03-13 | Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models | Andy Zhou et.al. | 2503.10617 | null |
2025-03-13 | ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer | Bolin Chen et.al. | 2503.10614 | null |
2025-03-13 | Unlock the Power of Unlabeled Data in Language Driving Model | Chaoqun Wang et.al. | 2503.10586 | null |
2025-03-13 | ASIDE: Architectural Separation of Instructions and Data in Language Models | Egor Zverev et.al. | 2503.10566 | null |
2025-03-13 | MASQUE: A Text-Guided Diffusion-Based Framework for Localized and Customized Adversarial Makeup | Youngjin Kwon et.al. | 2503.10549 | null |
2025-03-13 | KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation | Zixian Liu et.al. | 2503.10546 | null |
2025-03-12 | MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Jihao Zhao et.al. | 2503.09600 | link |
2025-03-12 | Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot | Andrew Crossman et.al. | 2503.09586 | null |
2025-03-12 | Evolution of the Three Spectral Components in the Prompt Emission of GRB 240825A | Chen-Wei Wang et.al. | 2503.09562 | null |
2025-03-12 | Contextuality sans incompatibility in the simplest scenario: Communication supremacy of a qubit | Partha Patra et.al. | 2503.09534 | null |
2025-03-12 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin et.al. | 2503.09516 | link |
2025-03-12 | Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection | Romain Thoreau et.al. | 2503.09493 | null |
2025-03-12 | SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery | Jiayuan Huang et.al. | 2503.09474 | null |
2025-03-12 | Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models | Zhihua Tian et.al. | 2503.09446 | null |
2025-03-12 | SuperCarver: Texture-Consistent 3D Geometry Super-Resolution for High-Fidelity Surface Detail Generation | Qijian Zhang et.al. | 2503.09439 | null |
2025-03-12 | PromptMap: An Alternative Interaction Style for AI-Based Image Generation | Krzysztof Adamkiewicz et.al. | 2503.09436 | link |
2025-03-11 | Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs | Ariba Khan et.al. | 2503.08688 | link |
2025-03-11 | OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models | Jialv Zou et.al. | 2503.08686 | link |
2025-03-11 | Chain-of-Thought Reasoning In The Wild Is Not Always Faithful | Iván Arcuschin et.al. | 2503.08679 | link |
2025-03-11 | AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence | Zekun Li et.al. | 2503.08669 | null |
2025-03-11 | Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling | Subin Kim et.al. | 2503.08605 | null |
2025-03-11 | NSF-SciFy: Mining the NSF Awards Database for Scientific Claims | Delip Rao et.al. | 2503.08600 | null |
2025-03-11 | There’s more to life in reflected light: Simulating the detectability of a range of molecules for high-contrast, high-resolution observations of non-transiting terrestrial exoplanets | Miles H. Currie et.al. | 2503.08592 | null |
2025-03-11 | BiasEdit: Debiasing Stereotyped Language Models via Model Editing | Xin Xu et.al. | 2503.08588 | link |
2025-03-11 | Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation | Mingkang Zhu et.al. | 2503.08575 | null |
2025-03-11 | ComicsPAP: understanding comic strips by picking the correct panel | Emanuele Vivoli et.al. | 2503.08561 | null |
2025-03-10 | GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval | Justus-Jonas Erker et.al. | 2503.07519 | link |
2025-03-10 | TokenButler: Token Importance is Predictable | Yash Akhauri et.al. | 2503.07518 | link |
2025-03-10 | CPAny: Couple With Any Encoder to Refer Multi-Object Tracking | Weize Li et.al. | 2503.07516 | null |
2025-03-10 | Language Models Fail to Introspect About Their Knowledge of Language | Siyuan Song et.al. | 2503.07513 | link |
2025-03-10 | Plume: Scaffolding Text Composition in Dashboards | Maxim Lisnic et.al. | 2503.07512 | null |
2025-03-10 | Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts | Shiu-hong Kao et.al. | 2503.07503 | null |
2025-03-10 | V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation | Guiwei Zhang et.al. | 2503.07493 | link |
2025-03-10 | Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction | Zongzheng Zhang et.al. | 2503.07485 | link |
2025-03-10 | YOLOE: Real-Time Seeing Anything | Ao Wang et.al. | 2503.07465 | link |
2025-03-10 | Anatomy-Aware Conditional Image-Text Retrieval | Meng Zheng et.al. | 2503.07456 | null |
2025-03-10 | From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Jaewook Lee et.al. | 2503.07429 | null |
2025-03-10 | TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision | Shaobin Zhuang et.al. | 2503.07416 | null |
2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413 | link |
2025-03-10 | TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models | Ruidong Chen et.al. | 2503.07389 | link |
2025-03-10 | Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment | Xing Xie et.al. | 2503.07334 | link |
2025-03-10 | Self-Corrective Task Planning by Inverse Prompting with Large Language Models | Jiho Lee et.al. | 2503.07317 | null |
2025-03-10 | Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies | Luyi Jiang et.al. | 2503.07306 | null |
2025-03-07 | Fairness-Aware Low-Rank Adaptation Under Demographic Privacy Constraints | Parameswaran Kamalaruban et.al. | 2503.05684 | null |
2025-03-07 | Task-oriented Uncertainty Collaborative Learning for Label-Efficient Brain Tumor Segmentation | Zhenxuan Zhang et.al. | 2503.05682 | null |
2025-03-07 | AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data | Zengqun Zhao et.al. | 2503.05665 | link |
2025-03-07 | VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control | Yuxuan Bian et.al. | 2503.05639 | link |
2025-03-07 | Nuanced Safety for Generative AI: How Demographics Shape Responsiveness to Severity | Pushkar Mishra et.al. | 2503.05609 | null |
2025-03-07 | Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models | Zheng Li et.al. | 2503.05595 | link |
2025-03-07 | Evaluating open-source Large Language Models for automated fact-checking | Nicolo’ Fontana et.al. | 2503.05565 | null |
2025-03-07 | S4M: Segment Anything with 4 Extreme Points | Adrien Meyer et.al. | 2503.05534 | null |
2025-03-07 | State-of-the-Art Stroke Lesion Segmentation at 1/1000th of Parameters | Alex Fedorov et.al. | 2503.05531 | null |
2025-03-07 | Cognitive Bias Detection Using Advanced Prompt Engineering | Frederic Lemieux et.al. | 2503.05516 | null |
2025-03-07 | Shifting Long-Context LLMs Research from Input to Output | Yuhao Wu et.al. | 2503.04723 | null |
2025-03-06 | Enough Coin Flips Can Make LLMs Act Bayesian | Ritwik Gupta et.al. | 2503.04722 | null |
2025-03-06 | Scaling Rich Style-Prompted Text-to-Speech Datasets | Anuj Diwan et.al. | 2503.04713 | link |
2025-03-06 | L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Pranjal Aggarwal et.al. | 2503.04697 | null |
2025-03-06 | Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation | Aishik Konwer et.al. | 2503.04639 | null |
2025-03-06 | SynGraph: A Dynamic Graph-LLM Synthesis Framework for Sparse Streaming User Sentiment Modeling | Xin Zhang et.al. | 2503.04619 | null |
2025-03-06 | Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation | Armel Zebaze et.al. | 2503.04554 | null |
2025-03-06 | Generalized Interpolating Discrete Diffusion | Dimitri von Rütte et.al. | 2503.04482 | null |
2025-03-06 | ToolFuzz – Automated Agent Tool Testing | Ivan Milev et.al. | 2503.04479 | null |
2025-03-06 | Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | Francisco Eiras et.al. | 2503.04474 | null |
2025-03-05 | A Practical Memory Injection Attack against LLM Agents | Shen Dong et.al. | 2503.03704 | null |
2025-03-05 | A Generative Approach to High Fidelity 3D Reconstruction from Text Data | Venkat Kumar R et.al. | 2503.03664 | null |
2025-03-05 | LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant | Wei Li et.al. | 2503.03663 | null |
2025-03-05 | Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset | Jessica Hoffmann et.al. | 2503.03654 | null |
2025-03-05 | Token-Level Privacy in Large Language Models | Re’em Harel et.al. | 2503.03652 | null |
2025-03-05 | DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms | Xiaojun Bi et.al. | 2503.03644 | link |
2025-03-05 | Enhancing the Accuracy and Comprehensibility in Architectural Tactics Detection via Small Model-Augmented Prompt Engineering | Lingli Cao et.al. | 2503.03609 | link |
2025-03-05 | Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders | Kristian Kuznetsov et.al. | 2503.03601 | null |
2025-03-05 | Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Haoran Fan et.al. | 2503.03594 | link |
2025-03-05 | Digital Twin-Enabled Blockage-Aware Dynamic mmWave Multi-Hop V2X Communication | Supat Roongpraiwan et.al. | 2503.03590 | null |
2025-03-04 | Prompting Generative AI with Interaction-Augmented Instructions | Leixian Shen et.al. | 2503.02874 | null |
2025-03-04 | Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework | Ziang Zhou et.al. | 2503.02863 | null |
2025-03-04 | Evaluation of Architectural Synthesis Using Generative AI | Jingfei Huang et.al. | 2503.02861 | null |
2025-03-04 | A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness | Nathan Drenkow et.al. | 2503.02797 | null |
2025-03-04 | Quantitative Resilience Modeling for Autonomous Cyber Defense | Xavier Cadet et.al. | 2503.02780 | null |
2025-03-04 | Prime Convolutional Model: Breaking the Ground for Theoretical Explainability | Francesco Panelli et.al. | 2503.02773 | null |
2025-03-04 | From Metaphor to Mechanism: How LLMs Decode Traditional Chinese Medicine Symbolic Language for Modern Clinical Relevance | Jiacheng Tang et.al. | 2503.02760 | null |
2025-03-04 | BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression | Daniil Larionov et.al. | 2503.02756 | null |
2025-03-04 | Evaluating Knowledge Generation and Self-Refinement Strategies for LLM-based Column Type Annotation | Keti Korini et.al. | 2503.02718 | link |
2025-03-04 | FlowPlan: Zero-Shot Task Planning with LLM Flow Engineering for Robotic Instruction Following | Zijun Lin et.al. | 2503.02698 | null |
2025-02-28 | Persuasion Should be Double-Blind: A Multi-Domain Dialogue Dataset With Faithfulness Based on Causal Theory of Mind | Dingyi Zhang et.al. | 2502.21297 | null |
2025-02-28 | Contextualizing biological perturbation experiments through language | Menghua Wu et.al. | 2502.21290 | link |
2025-02-28 | Adaptive Keyframe Sampling for Long Video Understanding | Xi Tang et.al. | 2502.21271 | null |
2025-02-28 | RuCCoD: Towards Automated ICD Coding in Russian | Aleksandr Nesterov et.al. | 2502.21263 | link |
2025-02-28 | PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts | Boxiao Yu et.al. | 2502.21260 | null |
2025-02-28 | Towards Developing Ethical Reasoners: Integrating Probabilistic Reasoning and Decision-Making for Complex AI Systems | Nijesh Upreti et.al. | 2502.21250 | null |
2025-02-28 | Brickify: Enabling Expressive Design Intent Specification through Direct Manipulation on Design Tokens | Xinyu Shi et.al. | 2502.21219 | null |
2025-02-28 | Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought | Jianhao Huang et.al. | 2502.21212 | null |
2025-02-28 | CuPID: Leveraging Masked Single-Lead ECG Modelling for Enhancing the Representations | Adtian Atienza et.al. | 2502.21127 | null |
2025-02-28 | SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events | Yunfan Lu et.al. | 2502.21120 | null |
2025-02-27 | Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation | Sucheng Ren et.al. | 2502.20388 | link |
2025-02-27 | Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | Jeffrey Yang Fan Chiang et.al. | 2502.20383 | null |
2025-02-27 | Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers | Shalev Lifshitz et.al. | 2502.20379 | null |
2025-02-27 | Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization | Ryan C. Barron et.al. | 2502.20364 | null |
2025-02-27 | Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs | Kuan Lok Zhou et.al. | 2502.20356 | null |
2025-02-27 | On Adversarial Attacks In Acoustic Drone Localization | Tamir Shor et.al. | 2502.20325 | null |
2025-02-27 | ACCORD: Application Context-aware Cross-layer Optimization and Resource Design for 5G/NextG Machine-centric Applications | Azuka Chiejina et.al. | 2502.20320 | null |
2025-02-27 | LangProBe: a Language Programs Benchmark | Shangyin Tan et.al. | 2502.20315 | null |
2025-02-27 | Mobius: Text to Seamless Looping Video Generation via Latent Shift | Xiuli Bi et.al. | 2502.20307 | link |
2025-02-27 | Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Benjamin Gutteridge et.al. | 2502.20295 | link |
2025-02-26 | Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models | Lucy Xiaoyang Shi et.al. | 2502.19417 | null |
2025-02-26 | Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing | Akshat Gupta et.al. | 2502.19416 | null |
2025-02-26 | The Mighty ToRR: A Benchmark for Table Reasoning and Robustness | Shir Ashury-Tahan et.al. | 2502.19412 | link |
2025-02-26 | Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices | Xinru Wang et.al. | 2502.19410 | null |
2025-02-26 | DataMan: Data Manager for Pre-training Large Language Models | Ru Peng et.al. | 2502.19363 | null |
2025-02-26 | Optimal COVID-19 vaccine prioritization by age depends critically on inter-group contacts and vaccination rates | Iker Atienza-Diez et.al. | 2502.19292 | null |
2025-02-26 | CritiQ: Mining Data Quality Criteria from Human Preferences | Honglin Guo et.al. | 2502.19279 | null |
2025-02-26 | Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in Pre-trained Vision-Language Models | Jiawei Kong et.al. | 2502.19269 | null |
2025-02-26 | Enhancing Gradient-based Discrete Sampling via Parallel Tempering | Luxu Liang et.al. | 2502.19240 | null |
2025-02-26 | AI-Powered Bayesian Inference | Veronika Ročková et.al. | 2502.19231 | null |
2025-02-25 | K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs | Ziheng Ouyang et.al. | 2502.18461 | null |
2025-02-25 | Evaluating the Effectiveness of Small Language Models in Detecting Refactoring Bugs | Rohit Gheyi et.al. | 2502.18454 | null |
2025-02-25 | MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning | Chanwoo Park et.al. | 2502.18439 | null |
2025-02-25 | Rank1: Test-Time Compute for Reranking in Information Retrieval | Orion Weller et.al. | 2502.18418 | link |
2025-02-25 | MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification | Zhuoqin Yang et.al. | 2502.18416 | null |
2025-02-25 | ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation | Yifan Pu et.al. | 2502.18364 | null |
2025-02-25 | GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music | Xinran Liu et.al. | 2502.18309 | null |
2025-02-25 | LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation | Pengzhi Li et.al. | 2502.18302 | null |
2025-02-25 | Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training | Botao Ye et.al. | 2502.18219 | null |
2025-02-25 | FLARE: A Framework for Stellar Flare Forecasting using Stellar Physical Properties and Historical Records | Bingke Zhu et.al. | 2502.18218 | null |
2025-02-24 | Stronger Neyman Regret Guarantees for Adaptive Experimental Design | Georgy Noarov et.al. | 2502.17427 | link |
2025-02-24 | Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs | Jan Betley et.al. | 2502.17424 | link |
2025-02-24 | Function-Space Learning Rates | Edward Milsom et.al. | 2502.17405 | link |
2025-02-24 | What is a Good Question? Utility Estimation with LLM-based Simulations | Dong-Ho Lee et.al. | 2502.17383 | null |
2025-02-24 | A Closer Look at TabPFN v2: Strength, Limitation, and Extension | Han-Jia Ye et.al. | 2502.17361 | null |
2025-02-24 | Goal-Oriented Middleware Filtering at Transport Layer Based on Value of Updates | Polina Kutsevol et.al. | 2502.17350 | null |
2025-02-24 | Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents | Prafulla Kumar Choubey et.al. | 2502.17321 | null |
2025-02-24 | A novel approach to navigate the taxonomic hierarchy to address the Open-World Scenarios in Medicinal Plant Classification | Soumen Sinha et.al. | 2502.17289 | null |
2025-02-24 | Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing | Yi-Kai Zhang et.al. | 2502.17282 | link |
2025-02-24 | Extracting domain-specific terms using contextual word embeddings | Andraž Repar et.al. | 2502.17278 | null |
2025-02-21 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval | Guanqi Zhan et.al. | 2502.15682 | null |
2025-02-21 | AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind | Zhining Zhang et.al. | 2502.15676 | link |
2025-02-21 | Empowering LLMs with Logical Reasoning: A Comprehensive Survey | Fengxiang Cheng et.al. | 2502.15652 | null |
2025-02-21 | MemoryPods: Enhancing Asynchronous Communication in Extended Reality | Akos Nagy et.al. | 2502.15622 | null |
2025-02-21 | Extraction multi-étiquettes de relations en utilisant des couches de Transformer (Multi-label relation extraction using Transformer layers) | Ngoc Luyen Le et.al. | 2502.15619 | null |
2025-02-21 | Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author’s Style | Xueran Han et.al. | 2502.15616 | null |
2025-02-21 | Ontological models cannot adequately represent state update for sequential measurement of incompatible observables | Alisson Tezzin et.al. | 2502.15615 | null |
2025-02-21 | Chats-Grid: An Iterative Retrieval Q&A Optimization Scheme Leveraging Large Model and Retrieval Enhancement Generation in smart grid | Yunfeng Li et.al. | 2502.15583 | null |
2025-02-21 | Context-Aware Doubly-Robust Semi-Supervised Learning | Clement Ruah et.al. | 2502.15577 | null |
2025-02-21 | A Cautionary Tale About “Neutrally” Informative AI Tools Ahead of the 2025 Federal Elections in Germany | Ina Dormuth et.al. | 2502.15568 | null |
2025-02-20 | Prompt-to-Leaderboard | Evan Frick et.al. | 2502.14855 | link |
2025-02-20 | Red-Teaming LLM Multi-Agent Systems via Communication Attacks | Pengfei He et.al. | 2502.14847 | null |
2025-02-20 | Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | Yue Yang et.al. | 2502.14846 | null |
2025-02-20 | Dynamic Concepts Personalization from Single Videos | Rameen Abdal et.al. | 2502.14844 | null |
2025-02-20 | Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps | Martin Tutek et.al. | 2502.14829 | link |
2025-02-20 | eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables | Luis Antonio Gutiérrez Guanilo et.al. | 2502.14820 | null |
2025-02-20 | Dynamic Low-Rank Sparse Adaptation for Large Language Models | Weizhong Huang et.al. | 2502.14816 | link |
2025-02-20 | Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration | Pengxiang Ding et.al. | 2502.14795 | null |
2025-02-20 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Tian Xie et.al. | 2502.14768 | link |
2025-02-20 | HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States | Yilei Jiang et.al. | 2502.14744 | link |
2025-02-19 | Where’s the Bug? Attention Probing for Scalable Fault Localization | Adam Stein et.al. | 2502.13966 | null |
2025-02-19 | RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision | Guangzhi Xiong et.al. | 2502.13957 | null |
2025-02-19 | Neurosymbolic artificial intelligence via large language models and coherence-driven inference | Steve Huntsman et.al. | 2502.13953 | null |
2025-02-19 | A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models | Hao Huang et.al. | 2502.13942 | null |
2025-02-19 | Citation proximus: the role of social and semantic ties in citing behaviour | Diego Kozlowski et.al. | 2502.13934 | null |
2025-02-19 | Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences? | Xiaochen Wang et.al. | 2502.13925 | null |
2025-02-19 | Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis | Jiahao Gai et.al. | 2502.13921 | null |
2025-02-19 | Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health | Xingbo Wang et.al. | 2502.13920 | null |
2025-02-19 | Judging the Judges: A Collection of LLM-Generated Relevance Judgements | Hossein A. Rahmani et.al. | 2502.13908 | link |
2025-02-19 | DataSciBench: An LLM Agent Benchmark for Data Science | Dan Zhang et.al. | 2502.13897 | link |
2025-02-18 | UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models | Huawei Lin et.al. | 2502.13141 | link |
2025-02-18 | Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions | Taedong Yun et.al. | 2502.13135 | null |
2025-02-18 | STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | Narun Raman et.al. | 2502.13119 | null |
2025-02-18 | Near-Optimal Private Learning in Linear Contextual Bandits | Fan Chen et.al. | 2502.13115 | null |
2025-02-18 | KAPPA: A Generic Patent Analysis Framework with Keyphrase-Based Portraits | Xin Xia et.al. | 2502.13076 | null |
2025-02-18 | Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction | Nils Constantin Hellwig et.al. | 2502.13044 | null |
2025-02-18 | HPSS: Heuristic Prompting Strategy Search for LLM Evaluators | Bosi Wen et.al. | 2502.13031 | null |
2025-02-18 | Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks | Markus J. Buehler et.al. | 2502.13025 | link |
2025-02-18 | Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation | Sha Li et.al. | 2502.13019 | null |
2025-02-18 | LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | Junchen Fu et.al. | 2502.12945 | null |
2025-02-17 | Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA | Patryk Marszałek et.al. | 2502.12122 | link |
2025-02-17 | A-MEM: Agentic Memory for LLM Agents | Wujiang Xu et.al. | 2502.12110 | link |
2025-02-17 | VLM$^2$-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues | Jianshu Zhang et.al. | 2502.12084 | null |
2025-02-17 | Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation | Zhongyi Qiu et.al. | 2502.12073 | null |
2025-02-17 | Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions | Lan Zhang et.al. | 2502.12065 | link |
2025-02-17 | Designing Role Vectors to Improve LLM Inference Behaviour | Daniele Potertì et.al. | 2502.12055 | null |
2025-02-17 | Robotic CBCT Meets Robotic Ultrasound | Feng Li et.al. | 2502.12019 | null |
2025-02-17 | Learning Generalizable Prompt for CLIP with Class Similarity Knowledge | Sehun Jung et.al. | 2502.11969 | null |
2025-02-17 | VAQUUM: Are Vague Quantifiers Grounded in Visual Data? | Hugh Mee Wong et.al. | 2502.11874 | null |
2025-02-17 | Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu | Renhao Pei et.al. | 2502.11862 | link |
2025-02-14 | Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction | WonJin Yoon et.al. | 2502.10388 | null |
2025-02-14 | Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models | Jiexin Ding et.al. | 2502.10378 | null |
2025-02-14 | Adversarial Mixup Unlearning | Zhuoyi Peng et.al. | 2502.10288 | null |
2025-02-14 | Are Large Language Models the future crowd workers of Linguistics? | Iris Ferrazzo et.al. | 2502.10266 | null |
2025-02-14 | VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models | Gokul Karthik Kumar et.al. | 2502.10250 | null |
2025-02-14 | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Guoqing Ma et.al. | 2502.10248 | link |
2025-02-14 | Combinatorial Reinforcement Learning with Preference Feedback | Joongkyu Lee et.al. | 2502.10158 | null |
2025-02-14 | NeuroXVocal: Detection and Explanation of Alzheimer’s Disease through Non-invasive Analysis of Picture-prompted Speech | Nikolaos Ntampakis et.al. | 2502.10108 | null |
2025-02-14 | MTLM: an Innovative Language Model Training Paradigm for ASR | Qingliang Meng et.al. | 2502.10058 | null |
2025-02-14 | ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments | Juyeong Hwang et.al. | 2502.10046 | null |
2025-02-13 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Dongzhi Jiang et.al. | 2502.09621 | null |
2025-02-13 | Designing a Conditional Prior Distribution for Flow-Based Generative Models | Noam Issachar et.al. | 2502.09611 | null |
2025-02-13 | CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Xinyin Ma et.al. | 2502.09601 | link |
2025-02-13 | GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis | Angelos Zavras et.al. | 2502.09598 | link |
2025-02-13 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Siyan Zhao et.al. | 2502.09597 | link |
2025-02-13 | Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks | Qian Wan et.al. | 2502.09577 | null |
2025-02-13 | Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | Mark Beliaev et.al. | 2502.09573 | null |
2025-02-13 | MDCrow: Automating Molecular Dynamics Workflows with Large Language Models | Quintina Campbell et.al. | 2502.09565 | link |
2025-02-13 | Improve LLM-based Automatic Essay Scoring with Linguistic Features | Zhaoyi Joey Hou et.al. | 2502.09497 | null |
2025-02-13 | Objective quantification of mood states using large language models | Jakub Onysk et.al. | 2502.09487 | null |
2025-02-12 | Rhythmic sharing: A bio-inspired paradigm for zero-shot adaptation and learning in neural networks | Hoony Kang et.al. | 2502.08644 | link |
2025-02-12 | Ultrasound Image Generation using Latent Diffusion Models | Benoit Freiche et.al. | 2502.08580 | null |
2025-02-12 | AR Glulam: Accurate Augmented Reality Using Multiple Fiducial Markers for Glulam Fabrication | Alexander Htet Kyaw et.al. | 2502.08566 | null |
2025-02-12 | QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval | Wonduk Seo et.al. | 2502.08557 | null |
2025-02-12 | LLMs can implicitly learn from mistakes in-context | Lisa Alazraki et.al. | 2502.08550 | null |
2025-02-12 | LoRa Fine Synchronization with Two-Pass Time and Frequency Offset Estimation | Joachim Tapparel et.al. | 2502.08485 | null |
2025-02-12 | Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning | Qifan Yu et.al. | 2502.08482 | null |
2025-02-12 | Towards Prompt Generalization: Grammar-aware Cross-Prompt Automated Essay Scoring | Heejin Do et.al. | 2502.08450 | null |
2025-02-12 | A Semantic Parsing Algorithm to Solve Linear Ordering Problems | Maha Alkhairy et.al. | 2502.08415 | null |
2025-02-12 | IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance | Paul Röttger et.al. | 2502.08395 | null |
2025-02-11 | Auditing Prompt Caching in Language Model APIs | Chenchen Gu et.al. | 2502.07776 | link |
2025-02-11 | Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers | Italo Santos et.al. | 2502.07763 | null |
2025-02-11 | An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating | Mohammad Ali Labbaf Khaniki et.al. | 2502.07755 | null |
2025-02-11 | WHODUNIT: Evaluation benchmark for culprit detection in mystery stories | Kshitij Gupta et.al. | 2502.07747 | link |
2025-02-11 | HRP: High-Rank Preheating for Superior LoRA Initialization | Yuzhu Chen et.al. | 2502.07739 | null |
2025-02-11 | Pluto: Authoring Semantically Aligned Text and Charts for Data-Driven Communication | Arjun Srinivasan et.al. | 2502.07725 | null |
2025-02-11 | RenderBox: Expressive Performance Rendering with Text Control | Huan Zhang et.al. | 2502.07711 | null |
2025-02-11 | Methodology for Identifying Social Groups within a Transactional Graph | Maxence Morin et.al. | 2502.07694 | null |
2025-02-11 | Are Princelings Truly Busted? Evaluating Transaction Discounts in China’s Land Market | Julia Manso et.al. | 2502.07692 | null |
2025-02-11 | exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem | Sajad Ebrahimi et.al. | 2502.07683 | link |
2025-02-10 | Rationalization Models for Text-to-SQL | Gaetano Rossiello et.al. | 2502.06759 | null |
2025-02-10 | SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement | Yuqi Lin et.al. | 2502.06756 | link |
2025-02-10 | Discovery of skill switching criteria for learning agile quadruped locomotion | Wanming Yu et.al. | 2502.06676 | null |
2025-02-10 | Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations | Rui Chen et.al. | 2502.06669 | null |
2025-02-10 | In-Context Learning (and Unlearning) of Length Biases | Stephanie Schoch et.al. | 2502.06653 | null |
2025-02-10 | Estimation of Food Intake Quantity Using Inertial Signals from Smartwatches | Ioannis Levi et.al. | 2502.06649 | null |
2025-02-10 | Quasi-stationary distributions for subcritical population models | Pablo Groisman et.al. | 2502.06638 | null |
2025-02-10 | Unleashing the Potential of Pre-Trained Diffusion Models for Generalizable Person Re-Identification | Jiachen Li et.al. | 2502.06619 | link |
2025-02-10 | A Large-scale AI-generated Image Inpainting Benchmark | Paschalis Giakoumoglou et.al. | 2502.06593 | null |
2025-02-10 | Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training | Yuchen Zhuang et.al. | 2502.06589 | null |
2025-02-07 | FlashVideo:Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Shilong Zhang et.al. | 2502.05179 | link |
2025-02-07 | MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison | Kaijie Zhu et.al. | 2502.05174 | null |
2025-02-07 | In-context denoising with one-layer transformers: connections between attention and associative memory retrieval | Matthew Smart et.al. | 2502.05164 | null |
2025-02-07 | CodeSCM: Causal Analysis for Multi-Modal Code Generation | Mukur Gupta et.al. | 2502.05150 | link |
2025-02-07 | From Restless to Contextual: A Thresholding Bandit Approach to Improve Finite-horizon Performance | Jiamin Xu et.al. | 2502.05145 | link |
2025-02-07 | Segment Geometry Optimization and Prototype Studies of a Multi-Coincidence GAGG Solar Neutrino Detector | Brooks Hartsock et.al. | 2502.05095 | null |
2025-02-07 | Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs | Thierry Bossy et.al. | 2502.05087 | link |
2025-02-07 | ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework | Xiaoyu Deng et.al. | 2502.05084 | null |
2025-02-07 | Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Tushar Pandey et.al. | 2502.05078 | link |
2025-02-07 | Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images | Aditya Kumar et.al. | 2502.05066 | link |
2025-02-06 | ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | Alec Helbling et.al. | 2502.04320 | link |
2025-02-06 | ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters | Kamer Ali Yuksel et.al. | 2502.04315 | link |
2025-02-06 | DexterityGen: Foundation Controller for Unprecedented Dexterity | Zhao-Heng Yin et.al. | 2502.04307 | null |
2025-02-06 | Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | Yuanye Liu et.al. | 2502.04295 | link |
2025-02-06 | GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation | Weihang Li et.al. | 2502.04293 | null |
2025-02-06 | Cognitive AI framework: advances in the simulation of human thought | Rommel Salas-Guerra et.al. | 2502.04259 | null |
2025-02-06 | MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion | Xintong Hao et.al. | 2502.04235 | null |
2025-02-06 | Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data | Laura Biester et.al. | 2502.04218 | null |
2025-02-06 | “Short-length” Adversarial Training Helps LLMs Defend “Long-length” Jailbreak Attacks: Theoretical and Empirical Evidence | Shaopeng Fu et.al. | 2502.04204 | link |
2025-02-06 | Lexical Substitution is not Synonym Substitution: On the Importance of Producing Contextually Relevant Word Substitutes | Juraj Vladika et.al. | 2502.04173 | null |
2025-02-05 | Contextuality with Pauli observables in cycle scenarios | Raman Choudhary et.al. | 2502.03451 | null |
2025-02-05 | A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) | Yiye Chen et.al. | 2502.03450 | null |
2025-02-05 | Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation | Alexey A. Novikov et.al. | 2502.03420 | null |
2025-02-05 | Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts | Nikta Gohari Sadr et.al. | 2502.03418 | null |
2025-02-05 | Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach | Abdullahi Isa Ahmed et.al. | 2502.03377 | null |
2025-02-05 | Interactive Visualization Recommendation with Hier-SUCB | Songwen Hu et.al. | 2502.03375 | link |
2025-02-05 | Controllable GUI Exploration | Aryan Garg et.al. | 2502.03330 | null |
2025-02-05 | ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model | Qiguang Chen et.al. | 2502.03325 | null |
2025-02-05 | ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models | Ying Zhang et.al. | 2502.03266 | link |
2025-02-05 | MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent | Xinyao Liao et.al. | 2502.03207 | null |
2025-02-04 | Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling | Xiaowen Qiu et.al. | 2502.02590 | null |
2025-02-04 | Contextuality of Quantum Error-Correcting Codes | Derek Khu et.al. | 2502.02553 | null |
2025-02-04 | OVERTHINKING: Slowdown Attacks on Reasoning LLMs | Abhinav Kumar et.al. | 2502.02542 | link |
2025-02-04 | Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | Han Zhou et.al. | 2502.02533 | null |
2025-02-04 | Catoni Contextual Bandits are Robust to Heavy-tailed Rewards | Chenlu Ye et.al. | 2502.02486 | null |
2025-02-04 | An extended Wigner’s friend no-go theorem inspired by generalized contextuality | Laurens Walleghem et.al. | 2502.02461 | null |
2025-02-04 | IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning | Quan Zhang et.al. | 2502.02454 | null |
2025-02-04 | Personalization Toolkit: Training Free Personalization of Large Vision Language Models | Soroush Seifi et.al. | 2502.02452 | null |
2025-02-04 | LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models | Jiangong Chen et.al. | 2502.02441 | link |
2025-02-04 | FewTopNER: Integrating Few-Shot Learning with Topic Modeling and Named Entity Recognition in a Multilingual Framework | Ibrahim Bouabdallaoui et.al. | 2502.02391 | link |
2025-01-31 | Low-Rank Adapting Models for Sparse Autoencoders | Matthew Chen et.al. | 2501.19406 | link |
2025-01-31 | Vintix: Action Model via In-Context Reinforcement Learning | Andrey Polubarov et.al. | 2501.19400 | link |
2025-01-31 | Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models | Wenzhi Fang et.al. | 2501.19389 | link |
2025-01-31 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking | Yuchun Miao et.al. | 2501.19358 | null |
2025-01-31 | LLM-based Affective Text Generation Quality Based on Different Quantization Values | Yarik Menchaca Resendiz et.al. | 2501.19317 | null |
2025-01-31 | Reverse Probing: Evaluating Knowledge Transfer via Finetuned Task Embeddings for Coreference Resolution | Tatiana Anikina et.al. | 2501.19316 | null |
2025-01-31 | Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes | Zhiyao Xu et.al. | 2501.19298 | null |
2025-01-31 | Analysis of LLMs vs Human Experts in Requirements Engineering | Cory Hymel et.al. | 2501.19297 | null |
2025-01-31 | Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs | James Flemings et.al. | 2501.19287 | null |
2025-01-31 | Pheromone-based Learning of Optimal Reasoning Paths | Anirudh Chari et.al. | 2501.19278 | null |
2025-01-30 | R.I.P.: Better Models by Survival of the Fittest Prompts | Ping Yu et.al. | 2501.18578 | null |
2025-01-30 | BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos | Lehao Lin et.al. | 2501.18565 | null |
2025-01-30 | Semantic Web and Creative AI – A Technical Report from ISWS 2023 | Raia Abu Ahmad et.al. | 2501.18542 | null |
2025-01-30 | Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges | Manveer Singh Tamber et.al. | 2501.18536 | link |
2025-01-30 | CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction | Peter J. Bentley et.al. | 2501.18504 | null |
2025-01-30 | HSRMamba: Contextual Spatial-Spectral State Space Model for Single Hyperspectral Super-Resolution | Shi Chen et.al. | 2501.18500 | null |
2025-01-30 | CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization | Yanxia Deng et.al. | 2501.18475 | null |
2025-01-30 | Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations | Chengxi Zeng et.al. | 2501.18474 | null |
2025-01-30 | ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation | Minghua He et.al. | 2501.18460 | null |
2025-01-30 | o3-mini vs DeepSeek-R1: Which One is Safer? | Aitor Arrieta et.al. | 2501.18438 | link |
2025-01-29 | Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs’ Domain-Specific Insight Learning? | Pouya Pezeshkpour et.al. | 2501.17840 | link |
2025-01-29 | U2A: Unified Unimodal Adaptation for Robust and Efficient Multimodal Learning | Md Kaykobad Reza et.al. | 2501.17823 | null |
2025-01-29 | Leveraging Multimodal LLM for Inspirational User Interface Search | Seokhyeon Park et.al. | 2501.17799 | link |
2025-01-29 | AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing | Peter Pak et.al. | 2501.17784 | null |
2025-01-29 | Unraveling Log4Shell: Analyzing the Impact and Response to the Log4j Vulnerability | John Doll et.al. | 2501.17760 | null |
2025-01-29 | Early External Safety Testing of OpenAI’s o3-mini: Insights from the Pre-Deployment Evaluation | Aitor Arrieta et.al. | 2501.17749 | null |
2025-01-29 | VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback | Sayeh Gholipour Picha et.al. | 2501.17726 | null |
2025-01-29 | RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts | Eujeong Choi et.al. | 2501.17715 | link |
2025-01-29 | In-Context Meta LoRA Generation | Yihua Shao et.al. | 2501.17635 | null |
2025-01-29 | Uncertainty Quantification and Decomposition for LLM-based Recommendation | Wonbin Kweon et.al. | 2501.17630 | link |
2025-01-28 | CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation | Nikolai Kalischek et.al. | 2501.17162 | null |
2025-01-28 | AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders | Zhengxuan Wu et.al. | 2501.17148 | link |
2025-01-28 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Deren Lei et.al. | 2501.17144 | link |
2025-01-28 | ASTRAL: Automated Safety Testing of Large Language Models | Miriam Ugarte et.al. | 2501.17132 | null |
2025-01-28 | Scenario Understanding of Traffic Scenes Through Large Visual Language Models | Rivera Esteban et.al. | 2501.17131 | null |
2025-01-28 | COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models | Tobias Materzok et.al. | 2501.17104 | null |
2025-01-28 | Text-to-Image Generation for Vocabulary Learning Using the Keyword Method | Nuwan T. Attygalle et.al. | 2501.17099 | null |
2025-01-28 | Context is Key in Agent Security | Lillian Tsai et.al. | 2501.17070 | null |
2025-01-28 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053 | null |
2025-01-28 | Large Language Models for Code Generation: The Practitioners Perspective | Zeeshan Rasheed et.al. | 2501.16998 | link |
2025-01-27 | RelightVid: Temporal-Consistent Diffusion Model for Video Relighting | Ye Fang et.al. | 2501.16330 | null |
2025-01-27 | Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology | Meiyun Cao et.al. | 2501.16309 | null |
2025-01-27 | RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval | Long Nguyen et.al. | 2501.16303 | null |
2025-01-27 | CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation | Xiaochuan Ma et.al. | 2501.16246 | null |
2025-01-27 | Language-Based Bayesian Optimization Research Assistant (BORA) | Abdoulatif Cissé et.al. | 2501.16224 | null |
2025-01-27 | Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models | Huayu Li et.al. | 2501.16215 | link |
2025-01-27 | Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs | Antony Bartlett et.al. | 2501.16191 | null |
2025-01-27 | Can summarization approximate simplification? A gold standard comparison | Giacomo Magnifico et.al. | 2501.16181 | null |
2025-01-27 | BAG: Body-Aligned 3D Wearable Asset Generation | Zhongjin Luo et.al. | 2501.16177 | null |
2025-01-27 | Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma | Richard Willis et.al. | 2501.16173 | link |
2025-01-24 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Xin Zhou et.al. | 2501.14729 | link |
2025-01-24 | Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? | Ipek Baris Schlicht et.al. | 2501.14719 | null |
2025-01-24 | Gland Segmentation Using SAM With Cancer Grade as a Prompt | Yijie Zhu et.al. | 2501.14718 | null |
2025-01-24 | Funzac at CoMeDi Shared Task: Modeling Annotator Disagreement from Word-In-Context Perspectives | Olufunke O. Sarumi et.al. | 2501.14617 | link |
2025-01-24 | Calibrating Wireless AI via Meta-Learned Context-Dependent Conformal Prediction | Seonghoon Yoo et.al. | 2501.14566 | null |
2025-01-24 | Next-Generation Wireless: Tracking the Evolutionary Path of 6G Mobile Communication | Ekram Hossain et.al. | 2501.14552 | null |
2025-01-24 | VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning | Benjamin Callewaert et.al. | 2501.14540 | null |
2025-01-24 | Automated Assignment Grading with Large Language Models: Insights From a Bioinformatics Course | Pavlin G. Poličar et.al. | 2501.14499 | null |
2025-01-24 | Evaluating and Improving Graph to Text Generation with Large Language Models | Jie He et.al. | 2501.14497 | link |
2025-01-24 | Boundary Value Test Input Generation Using Prompt Engineering with LLMs: Fault Detection and Coverage Analysis | Xiujing Guo et.al. | 2501.14465 | null |
2025-01-23 | GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing | Akashah Shabbir et.al. | 2501.13925 | link |
2025-01-23 | The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities | Chan-Jan Hsu et.al. | 2501.13921 | link |
2025-01-23 | IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models | Jiayi Lei et.al. | 2501.13920 | null |
2025-01-23 | Improving Video Generation with Human Feedback | Jie Liu et.al. | 2501.13918 | null |
2025-01-23 | Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models | Linh Tran et.al. | 2501.13904 | null |
2025-01-23 | Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning | Zuyao You et.al. | 2501.13893 | link |
2025-01-23 | Generating Realistic Forehead-Creases for User Verification via Conditioned Piecewise Polynomial Curves | Abhishek Tandon et.al. | 2501.13889 | link |
2025-01-23 | A RAG-Based Institutional Assistant | Gustavo Kuratomi et.al. | 2501.13880 | null |
2025-01-23 | Eye Gaze as a Signal for Conveying User Attention in Contextual AI Systems | Ethan Wilson et.al. | 2501.13878 | null |
2025-01-23 | Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning | Shiyu Zhang et.al. | 2501.13859 | null |
2025-01-22 | Constructive characterisations of the must-preorder for asynchrony | Giovanni Bernardi et.al. | 2501.13002 | null |
2025-01-22 | Can supermassive stars form in protogalaxies due to internal Lyman-Werner feedback? | James Sullivan et.al. | 2501.12986 | null |
2025-01-22 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et.al. | 2501.12983 | null |
2025-01-22 | UniUIR: Considering Underwater Image Restoration as An All-in-One Learner | Xu Zhang et.al. | 2501.12981 | null |
2025-01-22 | OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models | Chongren Sun et.al. | 2501.12975 | link |
2025-01-22 | Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference | Weizhi Fei et.al. | 2501.12959 | null |
2025-01-22 | PreciseCam: Precise Camera Control for Text-to-Image Generation | Edurne Bernal-Berdun et.al. | 2501.12910 | null |
2025-01-22 | The impact of hyperons on neutron star mergers: gravitational waves, mass ejection and black hole formation | Hristijan Kochankovski et.al. | 2501.12905 | null |
2025-01-22 | Architectural Fusion Through Contextual Partitioning in Large Language Models: A Novel Approach to Parameterized Knowledge Integration | Offa Kingsleigh et.al. | 2501.12901 | null |
2025-01-22 | HierPromptLM: A Pure PLM-based Framework for Representation Learning on Heterogeneous Text-rich Networks | Qiuyu Zhu et.al. | 2501.12857 | null |
2025-01-21 | Towards Affordance-Aware Articulation Synthesis for Rigged Objects | Yu-Chu Yu et.al. | 2501.12393 | null |
2025-01-21 | Is Long Context All You Need? Leveraging LLM’s Extended Context for NL2SQL | Yeounoh Chung et.al. | 2501.12372 | link |
2025-01-21 | FuocChuVIP123 at CoMeDi Shared Task: Disagreement Ranking with XLM-Roberta Sentence Embeddings and Deep Neural Regression | Phuoc Duong Huy Chu et.al. | 2501.12336 | null |
2025-01-21 | Decoherence of Schrödinger cat states in light of wave/particle duality | Th. K. Mavrogordatos et.al. | 2501.12328 | null |
2025-01-21 | UI-TARS: Pioneering Automated GUI Interaction with Native Agents | Yujia Qin et.al. | 2501.12326 | link |
2025-01-21 | CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification | Cristiano Patrício et.al. | 2501.12266 | null |
2025-01-21 | mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework | Bingyi Liu et.al. | 2501.12263 | null |
2025-01-21 | HAC++: Towards 100X Compression of 3D Gaussian Splatting | Yihang Chen et.al. | 2501.12255 | link |
2025-01-21 | CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning | Yuanheng Fang et.al. | 2501.12226 | null |
2025-01-21 | You Can’t Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense | Wuyuao Mai et.al. | 2501.12210 | null |
2025-01-17 | FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Kartik Narayan et.al. | 2501.10360 | link |
2025-01-17 | Natural Language Processing of Privacy Policies: A Survey | Andrick Adhikari et.al. | 2501.10319 | null |
2025-01-17 | PaSa: An LLM Agent for Comprehensive Academic Paper Search | Yichen He et.al. | 2501.10120 | link |
2025-01-17 | How Do Programming Students Use Generative AI? | Christian Rahe et.al. | 2501.10091 | null |
2025-01-17 | CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment | Yating Liu et.al. | 2501.10071 | link |
2025-01-17 | FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization | Zhaopeng Gu et.al. | 2501.10067 | link |
2025-01-17 | OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning | Jinyuan Feng et.al. | 2501.10062 | null |
2025-01-17 | MSTS: A Multimodal Safety Test Suite for Vision-Language Models | Paul Röttger et.al. | 2501.10057 | link |
2025-01-17 | Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions | Zhijie Tan et.al. | 2501.10011 | null |
2025-01-17 | RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation | Yuefan Cao et.al. | 2501.09982 | null |
2025-01-16 | Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues | Youngjoon Jang et.al. | 2501.09754 | null |
2025-01-16 | Coming full circle – A unified framework for Kochen-Specker contextuality | Markus Frembs et.al. | 2501.09750 | null |
2025-01-16 | Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models | Bihui Jin et.al. | 2501.09745 | null |
2025-01-16 | Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text | Jihed Ncib et.al. | 2501.09719 | null |
2025-01-16 | CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education | Tianyu Wang et.al. | 2501.09709 | link |
2025-01-16 | Practical Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et.al. | 2501.09705 | link |
2025-01-16 | Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Zhihe Yang et.al. | 2501.09695 | link |
2025-01-16 | Quantum Contextual Hypergraphs, Operators, Inequalities, and Applications in Higher Dimensions | Mladen Pavicic et.al. | 2501.09637 | null |
2025-01-16 | LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Kuan-Ming Liu et.al. | 2501.09636 | null |
2025-01-16 | Constraints on Cosmic Rays Acceleration in Bright Gamma-ray Bursts with Observations of Fermi | Xing-Fu Zhang et.al. | 2501.09594 | null |
2025-01-15 | Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion | Jingyuan Chen et.al. | 2501.09019 | null |
2025-01-15 | Prompt gravitational-wave mergers aided by gas in Active Galactic Nuclei: The hydrodynamics of binary-single black hole scatterings | Connar Rowan et.al. | 2501.09017 | null |
2025-01-15 | How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias | Tosin Fadahunsi et.al. | 2501.09014 | link |
2025-01-15 | Bayesian analysis of analog gravity systems with the Rezzolla-Zhidenko metric | Saulo Albuquerque et.al. | 2501.09000 | null |
2025-01-15 | Analyzing the Ethical Logic of Six Large Language Models | W. Russell Neuman et.al. | 2501.08951 | null |
2025-01-15 | Disentangling Exploration of Large Language Models by Optimal Exploitation | Tim Grams et.al. | 2501.08925 | null |
2025-01-15 | Feature-based One-For-All: A Universal Framework for Heterogeneous Knowledge Distillation | Jhe-Hao Lin et.al. | 2501.08885 | null |
2025-01-15 | Exploring Task-Level Optimal Prompts for Visual In-Context Learning | Yan Zhu et.al. | 2501.08841 | null |
2025-01-15 | ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind | Kazutoshi Shinoda et.al. | 2501.08838 | link |
2025-01-15 | IDEA: Image Description Enhanced CLIP-Adapter | Zhipeng Ye et.al. | 2501.08816 | link |
2025-01-14 | DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models | Hyeonwoo Kim et.al. | 2501.08333 | null |
2025-01-14 | Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Miran Heo et.al. | 2501.08326 | null |
2025-01-14 | ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations | Ziyuan Huang et.al. | 2501.08324 | null |
2025-01-14 | HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | Abhilasha Ravichander et.al. | 2501.08292 | null |
2025-01-14 | SmartEraser: Remove Anything from Images using Masked-Region Guidance | Longtao Jiang et.al. | 2501.08279 | null |
2025-01-14 | Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing | Pulkit Arora et.al. | 2501.08276 | null |
2025-01-14 | TriMod Fusion for Multimodal Named Entity Recognition in Social Media | Mosab Alfaqeeh et.al. | 2501.08267 | null |
2025-01-14 | Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints | Jonathan Nöther et.al. | 2501.08246 | null |
2025-01-14 | ASTRID – An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems | Mohita Chowdhury et.al. | 2501.08208 | null |
2025-01-14 | ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Zain Ul Abedin et.al. | 2501.08203 | null |
2025-01-13 | Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | Chengzu Li et.al. | 2501.07542 | null |
2025-01-13 | Investigating Large Language Models in Inferring Personality Traits from User Conversations | Jianfeng Zhu et.al. | 2501.07532 | null |
2025-01-13 | IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion | Tharun Anand et.al. | 2501.07530 | null |
2025-01-13 | RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | Difei Gu et.al. | 2501.07525 | link |
2025-01-13 | Guided SAM: Label-Efficient Part Segmentation | S. B. van Rooij et.al. | 2501.07434 | null |
2025-01-13 | Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection | Xin Yin et.al. | 2501.07425 | null |
2025-01-13 | Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion | Lala Shakti Swarup Ray et.al. | 2501.07408 | null |
2025-01-13 | Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models | Yasiru Ranasinghe et.al. | 2501.07396 | null |
2025-01-13 | Enhancing Retrieval-Augmented Generation: A Study of Best Practices | Siran Li et.al. | 2501.07391 | link |
2025-01-13 | Approaching ballistic motion in 3D simulations of gamma-ray burst jets in realistic binary neutron star merger environments | Emma Dreas et.al. | 2501.07385 | null |
2025-01-10 | Multi-subject Open-set Personalization in Video Generation | Tsai-Shien Chen et.al. | 2501.06187 | null |
2025-01-10 | PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | Yangyu Huang et.al. | 2501.06184 | null |
2025-01-10 | ScooterLab: A Programmable and Participatory Sensing Research Testbed using Micromobility Vehicles | Ubaidullah Khan et.al. | 2501.06177 | null |
2025-01-10 | Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories | Gerd Kortemeyer et.al. | 2501.06143 | null |
2025-01-10 | Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI | Yuya Asano et.al. | 2501.06129 | null |
2025-01-10 | Explaining Deep Learning-based Anomaly Detection in Energy Consumption Data by Focusing on Contextually Relevant Data | Mohammad Noorchenarboo et.al. | 2501.06099 | null |
2025-01-10 | A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection | Tsui Qin Mok et.al. | 2501.06038 | null |
2025-01-10 | The all-charm tetraquark and its contribution to two-photon processes | Panagiotis Kalamidas et.al. | 2501.06034 | null |
2025-01-10 | How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters | Romina Oji et.al. | 2501.06025 | link |
2025-01-10 | BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response | Hongruixuan Chen et.al. | 2501.06019 | link |
2025-01-09 | Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark | Yunzhuo Hao et.al. | 2501.05444 | link |
2025-01-09 | TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts | Yu-Hao Huang et.al. | 2501.05403 | link |
2025-01-09 | FairCode: Evaluating Social Bias of LLMs in Code Generation | Yongkang Du et.al. | 2501.05396 | link |
2025-01-09 | CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models | Junha Park et.al. | 2501.05359 | null |
2025-01-09 | Continuity in Potential Infinite Models | Matthias Eberl et.al. | 2501.05276 | null |
2025-01-09 | CallNavi: A Study and Challenge on Function Calling Routing and Invocation in Large Language Models | Yewei Song et.al. | 2501.05255 | null |
2025-01-09 | Online Prompt and Solver Selection for Program Synthesis | Yixuan Li et.al. | 2501.05247 | null |
2025-01-09 | Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection | Pei-Kang Lee et.al. | 2501.05228 | null |
2025-01-09 | FaceMe: Robust Blind Face Restoration with Personal Identification | Siyu Liu et.al. | 2501.05177 | null |
2025-01-09 | Deep Assessment of Code Review Generation Approaches: Beyond Lexical Similarity | Yanjie Jiang et.al. | 2501.05176 | null |
2025-01-08 | Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | Joshua Jones et.al. | 2501.04693 | null |
2025-01-08 | Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling | Nannan Li et.al. | 2501.04666 | null |
2025-01-08 | External quantum fluctuations select measurement contexts | Jonte R. Hance et.al. | 2501.04664 | null |
2025-01-08 | Assessing Language Comprehension in Large Language Models Using Construction Grammar | Wesley Scivetti et.al. | 2501.04661 | null |
2025-01-08 | FleSpeech: Flexibly Controllable Speech Generation with Various Prompts | Hanzhao Li et.al. | 2501.04644 | null |
2025-01-08 | “Can you be my mum?”: Manipulating Social Robots in the Large Language Models Era | Giulio Antonio Abbo et.al. | 2501.04633 | null |
2025-01-08 | MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation | Daniele Molino et.al. | 2501.04614 | null |
2025-01-08 | Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion | Yangfan He et.al. | 2501.04606 | link |
2025-01-08 | Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models | Miaoyang He et.al. | 2501.04582 | null |
2025-01-08 | The Impostor is Among Us: Can Large Language Models Capture the Complexity of Human Personas? | Christopher Lazik et.al. | 2501.04543 | null |
2025-01-07 | WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings | Haochen Song et.al. | 2501.03999 | null |
2025-01-07 | NeuralSVG: An Implicit Representation for Text-to-Vector Generation | Sagi Polaczek et.al. | 2501.03992 | null |
2025-01-07 | Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles | Yuxi Xia et.al. | 2501.03991 | null |
2025-01-07 | Semantically Cohesive Word Grouping in Indian Languages | N J Karthika et.al. | 2501.03988 | null |
2025-01-07 | VLM-driven Behavior Tree for Context-aware Task Planning | Naoki Wake et.al. | 2501.03968 | link |
2025-01-07 | Vision Language Models as Values Detectors | Giulio Antonio Abbo et.al. | 2501.03957 | null |
2025-01-07 | Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection | Pablo Miralles-González et.al. | 2501.03940 | null |
2025-01-07 | Truthful mechanisms for linear bandit games with private contexts | Yiting Hu et.al. | 2501.03865 | null |
2025-01-07 | Progressive Document-level Text Simplification via Large Language Models | Dengzhao Fang et.al. | 2501.03857 | null |
2025-01-07 | Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control | Zekai Gu et.al. | 2501.03847 | link |
2025-01-06 | Rate-My-LoRA: Efficient and Adaptive Federated Model Tuning for Cardiac MRI Segmentation | Xiaoxiao He et.al. | 2501.03223 | null |
2025-01-06 | Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction | Rui Qian et.al. | 2501.03218 | link |
2025-01-06 | The FACTS Grounding Leaderboard: Benchmarking LLMs’ Ability to Ground Responses to Long-Form Input | Alon Jacovi et.al. | 2501.03200 | null |
2025-01-06 | Visualizing quantum entanglement in Bose-Einstein condensates without state vectors | Russell B. Thompson et.al. | 2501.03199 | null |
2025-01-06 | Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text | Ali Al-Lawati et.al. | 2501.03166 | link |
2025-01-06 | The Scaling Law for LoRA Base on Mutual Information Upper Bound | Jing Zhang et.al. | 2501.03152 | null |
2025-01-06 | VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity | Yerong Li et.al. | 2501.03139 | null |
2025-01-06 | PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | Mingyang Song et.al. | 2501.03124 | link |
2025-01-06 | LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases | Dylan Bouchard et.al. | 2501.03112 | link |
2025-01-06 | Physics, Environment and Environmental Education; Perceptions from trainee Natural Science teachers | Daniel Alejandro Valderrama et.al. | 2501.03090 | null |
2025-01-03 | Metadata Conditioning Accelerates Language Model Pre-training | Tianyu Gao et.al. | 2501.01956 | link |
2025-01-03 | Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) for Passive Sonar Classification | Jarin Ritu et.al. | 2501.01921 | link |
2025-01-03 | Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions | Rachneet Sachdeva et.al. | 2501.01872 | link |
2025-01-03 | A review of long lasting activities of the central engine of gamma-ray bursts | Bruce Gendre et.al. | 2501.01857 | null |
2025-01-03 | MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | Pu Yang et.al. | 2501.01834 | null |
2025-01-03 | Time Series Language Model for Descriptive Caption Generation | Mohamed Trabelsi et.al. | 2501.01832 | null |
2025-01-03 | Ingredients: Blending Custom Photos with Video Diffusion Transformers | Zhengcong Fei et.al. | 2501.01790 | link |
2025-01-03 | SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation | Mingjie Li et.al. | 2501.01765 | null |
2025-01-03 | How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models | Simone Corbo et.al. | 2501.01741 | null |
2025-01-03 | AR4D: Autoregressive 4D Generation from Monocular Videos | Hanxin Zhu et.al. | 2501.01722 | null |
2025-01-02 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et.al. | 2501.01428 | link |
2025-01-02 | Object-level Visual Prompts for Compositional Image Generation | Gaurav Parmar et.al. | 2501.01424 | null |
2025-01-02 | Multi-Modal Video Feature Extraction for Popularity Prediction | Haixu Liu et.al. | 2501.01422 | null |
2025-01-02 | Nested Attention: Semantic-aware Attention Values for Concept Personalization | Or Patashnik et.al. | 2501.01407 | null |
2025-01-02 | StereoMath: An Accessible and Musical Equation Editor | Kenneth Ge et.al. | 2501.01404 | null |
2025-01-02 | Training Medical Large Vision-Language Models with Abnormal-Aware Feedback | Yucheng Zhou et.al. | 2501.01377 | null |
2025-01-02 | Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement | Z. Zhang et.al. | 2501.01368 | null |
2025-01-02 | ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding | Austin T. Wang et.al. | 2501.01366 | null |
2025-01-02 | CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Johan Wahréus et.al. | 2501.01335 | link |
2025-01-02 | Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension | Yanbo Fang et.al. | 2501.01332 | null |
2024-12-30 | Distributed Mixture-of-Agents for Edge Inference with Large Language Models | Purbesh Mitra et.al. | 2412.21200 | link |
2024-12-30 | Adversarial Attack and Defense for LoRa Device Identification and Authentication via Deep Learning | Yalin E. Sagduyu et.al. | 2412.21164 | null |
2024-12-30 | Unified dimensionality reduction techniques in chronic liver disease detection | Anand Karna et.al. | 2412.21156 | null |
2024-12-30 | Exploring and Controlling Diversity in LLM-Agent Conversation | KuanChao Chu et.al. | 2412.21102 | null |
2024-12-30 | Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Yifei Huang et.al. | 2412.21080 | link |
2024-12-30 | Efficient Multi-Task Inferencing with a Shared Backbone and Lightweight Task-Specific Adapters for Automatic Scoring | Ehsan Latif et.al. | 2412.21065 | null |
2024-12-30 | Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration | Wanglong Lu et.al. | 2412.21042 | link |
2024-12-30 | Automated Robustness Testing for LLM-based NLP Software | Mingxuan Xiao et.al. | 2412.21016 | link |
2024-12-30 | Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline | Nicola Messina et.al. | 2412.21009 | link |
2024-12-30 | Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering | Junxiao Xue et.al. | 2412.20927 | null |
2024-12-27 | Enhancing Whisper’s Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization | Kumud Tripathi et.al. | 2412.19785 | null |
2024-12-27 | Hard Photon Triggered Jets in $p$-$p$ and $A$-$A$ Collisions | C. Sirimanna et.al. | 2412.19738 | null |
2024-12-27 | Can Large Language Models Adapt to Other Agents In-Context? | Matthew Riemer et.al. | 2412.19726 | null |
2024-12-27 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Sijia Chen et.al. | 2412.19707 | link |
2024-12-27 | Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework | Jiang Liu et.al. | 2412.19684 | null |
2024-12-27 | Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP | Zhongxing Xu et.al. | 2412.19650 | null |
2024-12-27 | ReNeg: Learning Negative Embedding with Reward Guidance | Xiaomin Li et.al. | 2412.19637 | link |
2024-12-27 | RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations | Mingshu Zhao et.al. | 2412.19628 | link |
2024-12-27 | Signatures of prediction during natural listening in MEG data? | Sahel Azizpour et.al. | 2412.19622 | null |
2024-12-27 | Gradient Weight-normalized Low-rank Projection for Efficient LLM Training | Jia-Hong Huang et.al. | 2412.19616 | link |
2024-12-24 | Decentralized Intelligence in GameFi: Embodied AI Agents and the Convergence of DeFi and Virtual Ecosystems | Fernando Jia et.al. | 2412.18601 | link |
2024-12-24 | ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation | Hongjie Li et.al. | 2412.18600 | null |
2024-12-24 | DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Minghong Cai et.al. | 2412.18597 | link |
2024-12-24 | Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control | Sergey Sedov et.al. | 2412.18582 | null |
2024-12-24 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang et.al. | 2412.18552 | link |
2024-12-24 | Token-Budget-Aware LLM Reasoning | Tingxu Han et.al. | 2412.18547 | link |
2024-12-24 | Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation | Derong Xu Xinhang Li et.al. | 2412.18537 | link |
2024-12-24 | Segment-Based Attention Masking for GPTs | Shahar Katz et.al. | 2412.18487 | link |
2024-12-24 | Betting vs. Trading: Learning a Linear Decision Policy for Selling Wind Power and Hydrogen | Yannick Heiser et.al. | 2412.18479 | null |
2024-12-24 | Is Large Language Model Good at Triple Set Prediction? An Empirical Study | Yuan Yuan et.al. | 2412.18443 | null |
2024-12-23 | The Superposition of Diffusion Models Using the Itô Density Estimator | Marta Skreta et.al. | 2412.17762 | null |
2024-12-23 | Reasoning to Attend: Try to Understand How | Rui Qian et.al. | 2412.17741 | link |
2024-12-23 | Contextual Backpropagation Loops: Amplifying Deep Reasoning with Iterative Top-Down Feedback | Jacob Fein-Ashley et.al. | 2412.17737 | link |
2024-12-23 | Chumor 2.0: Towards Benchmarking Chinese Humor Understanding | Ruiqi He et.al. | 2412.17729 | link |
2024-12-23 | Knowledge Editing through Chain-of-Thought | Changyue Wang et.al. | 2412.17727 | link |
2024-12-23 | The Cosmological Population of Gamma-Ray Bursts from the Disks of Active Galactic Nuclei | Hoyoung D. Kang et.al. | 2412.17714 | null |
2024-12-23 | EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities | Zhe Chen et.al. | 2412.17677 | link |
2024-12-23 | Detecting anxiety and depression in dialogues: a multi-label and explainable approach | Francisco de Arriba-Pérez et.al. | 2412.17651 | null |
2024-12-23 | DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder | Ente Lin et.al. | 2412.17644 | null |
2024-12-23 | LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | Hao Li et.al. | 2412.17635 | null |
2024-12-20 | MotiF: Making Text Count in Image Animation with Motion Focal Loss | Shijie Wang et.al. | 2412.16153 | null |
2024-12-20 | A vector logic for extensional formal semantics | Daniel Quigley et.al. | 2412.16152 | null |
2024-12-20 | PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics | Daniil Larionov et.al. | 2412.16120 | null |
2024-12-20 | Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs | Lynn Greschner et.al. | 2412.15993 | null |
2024-12-20 | APIRL: Deep Reinforcement Learning for REST API Fuzzing | Myles Foley et.al. | 2412.15991 | link |
2024-12-20 | From General to Specific: Tailoring Large Language Models for Personalized Healthcare | Ruize Shi et.al. | 2412.15957 | null |
2024-12-20 | MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Andrea Moglia et.al. | 2412.15925 | link |
2024-12-20 | On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education | Lorenz Wendlinger et.al. | 2412.15902 | null |
2024-12-20 | On Robust Cross Domain Alignment | Anish Chakrabarty et.al. | 2412.15861 | null |
2024-12-20 | Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation | Aiwen Jiang et.al. | 2412.15845 | link |
2024-12-19 | PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation | Muntasir Wahed et.al. | 2412.15209 | null |
2024-12-19 | FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching | Sucheng Ren et.al. | 2412.15205 | link |
2024-12-19 | Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying | Federico Castagna et.al. | 2412.15177 | link |
2024-12-19 | Rethinking Uncertainty Estimation in Natural Language Generation | Lukas Aichberger et.al. | 2412.15176 | null |
2024-12-19 | Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM | Yatai Ji et.al. | 2412.15156 | link |
2024-12-19 | AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling | Zihan Liu et.al. | 2412.15084 | null |
2024-12-19 | MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance | Hallee E. Wong et.al. | 2412.15058 | null |
2024-12-19 | Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI | Isadora Krsek et.al. | 2412.15047 | null |
2024-12-19 | LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps | Felix Friedrich et.al. | 2412.15035 | null |
2024-12-19 | Large Language Models and Code Security: A Systematic Literature Review | Enna Basic et.al. | 2412.15004 | null |
2024-12-18 | FashionComposer: Compositional Fashion Image Generation | Sihui Ji et.al. | 2412.14168 | null |
2024-12-18 | Alignment faking in large language models | Ryan Greenblatt et.al. | 2412.14093 | link |
2024-12-18 | Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets | Simon Thorne et.al. | 2412.14062 | null |
2024-12-18 | Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation | Vera Neplenbroek et.al. | 2412.14050 | link |
2024-12-18 | Hansel: Output Length Controlling Framework for Large Language Models | Seoha Song et.al. | 2412.14033 | null |
2024-12-18 | Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation | Haotong Lin et.al. | 2412.14015 | link |
2024-12-18 | What makes a good metric? Evaluating automatic metrics for text-to-image consistency | Candace Ross et.al. | 2412.13989 | null |
2024-12-18 | RAG for Effective Supply Chain Security Questionnaire Automation | Zaynab Batool Reza et.al. | 2412.13988 | null |
2024-12-18 | Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation | Eleni Sgouritsa et.al. | 2412.13952 | null |
2024-12-18 | CoRa: A Collision-Resistant LoRa Symbol Detector of Low Complexity | José Álamos et.al. | 2412.13930 | null |
2024-12-17 | CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models | Gaoyang Zhang et.al. | 2412.13195 | link |
2024-12-17 | MotionBridge: Dynamic Video Inbetweening with Flexible Controls | Maham Tanveer et.al. | 2412.13190 | null |
2024-12-17 | Move-in-2D: 2D-Conditioned Human Motion Generation | Hsin-Ping Huang et.al. | 2412.13185 | null |
2024-12-17 | DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation | Miriam Wanner et.al. | 2412.13175 | null |
2024-12-17 | Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study | Bolei Ma et.al. | 2412.13169 | link |
2024-12-17 | F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration | Lu Liu et.al. | 2412.13155 | null |
2024-12-17 | Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation | Huaijin Pi et.al. | 2412.13111 | null |
2024-12-17 | Prompt Augmentation for Self-supervised Text-guided Image Manipulation | Rumeysa Bodur et.al. | 2412.13081 | null |
2024-12-17 | Identifying Bias in Deep Neural Networks Using Image Transforms | Sai Teja Erukude et.al. | 2412.13079 | link |
2024-12-17 | Harnessing Event Sensory Data for Error Pattern Prediction in Vehicles: A Language Model Approach | Hugo Math et.al. | 2412.13041 | link |
2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077 | null |
2024-12-16 | A LoRA is Worth a Thousand Pictures | Chenxi Liu et.al. | 2412.12048 | null |
2024-12-16 | How Private are Language Models in Abstractive Summarization? | Anthony Hughes et.al. | 2412.12040 | null |
2024-12-16 | Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection | Ira Ceka et.al. | 2412.12039 | null |
2024-12-16 | Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm | Rajat Khanda et.al. | 2412.12006 | null |
2024-12-16 | The Open Source Advantage in Large Language Models (LLMs) | Jiya Manchanda et.al. | 2412.12004 | null |
2024-12-16 | SAMIC: Segment Anything with In-Context Spatial Prompt Engineering | Savinay Nagendra et.al. | 2412.11998 | null |
2024-12-16 | Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support | Devika Venugopalan et.al. | 2412.11995 | link |
2024-12-16 | Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives | Sam Relins et.al. | 2412.11878 | link |
2024-12-16 | A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation | Tian-Yi Che et.al. | 2412.11832 | null |
2024-12-13 | A Grounded Typology of Word Classes | Coleman Haley et.al. | 2412.10369 | null |
2024-12-13 | TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | Ruijie Zheng et.al. | 2412.10345 | null |
2024-12-13 | SCBench: A KV Cache-Centric Analysis of Long-Context Methods | Yucheng Li et.al. | 2412.10319 | null |
2024-12-13 | My Statistics is Better than Yours | Simon Benhaïem et.al. | 2412.10296 | null |
2024-12-13 | Prompt-Guided Mask Proposal for Two-Stage Open-Vocabulary Segmentation | Yu-Jhe Li et.al. | 2412.10292 | null |
2024-12-13 | One world, one opinion? The superstar effect in LLM responses | Sofie Goethals et.al. | 2412.10281 | null |
2024-12-13 | Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | Danielle R. Thomas et.al. | 2412.10267 | link |
2024-12-13 | Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models | Harry J. Davies et.al. | 2412.10257 | null |
2024-12-13 | Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts | Hazel Kim et.al. | 2412.10246 | null |
2024-12-13 | SPT: Sequence Prompt Transformer for Interactive Image Segmentation | Senlin Cheng et.al. | 2412.10224 | null |
2024-12-12 | Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors | Yue Feng et.al. | 2412.09625 | null |
2024-12-12 | LoRACLR: Contrastive Adaptation for Customization of Diffusion Models | Enis Simsar et.al. | 2412.09622 | null |
2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618 | null |
2024-12-12 | Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG | Kavana Venkatesh et.al. | 2412.09614 | null |
2024-12-12 | TimeRefine: Temporal Grounding with Time Refining Video LLM | Xizi Wang et.al. | 2412.09601 | link |
2024-12-12 | Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders | Fiona Ryan et.al. | 2412.09586 | link |
2024-12-12 | Obfuscated Activations Bypass LLM Latent-Space Defenses | Luke Bailey et.al. | 2412.09565 | null |
2024-12-12 | Does Representation Matter? Exploring Intermediate Layers in Large Language Models | Oscar Skean et.al. | 2412.09563 | null |
2024-12-12 | SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing | Xueting Li et.al. | 2412.09545 | null |
2024-12-12 | Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM | Han Wang et.al. | 2412.09530 | link |
2024-12-11 | GPD-1: Generative Pre-training for Driving | Zixun Xie et.al. | 2412.08643 | link |
2024-12-11 | Fast Prompt Alignment for Text-to-Image Generation | Khalil Mrini et.al. | 2412.08639 | link |
2024-12-11 | DMin: Scalable Training Data Influence Estimation for Diffusion Models | Huawei Lin et.al. | 2412.08637 | link |
2024-12-11 | FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models | Vladimir Kulikov et.al. | 2412.08629 | link |
2024-12-11 | Der Effizienz- und Intelligenzbegriff in der Lexikographie und kuenstlichen Intelligenz: kann ChatGPT die lexikographische Textsorte nachbilden? | Ivan Arias-Arias et.al. | 2412.08599 | null |
2024-12-11 | Leveraging Graph-RAG and Prompt Engineering to Enhance LLM-Based Automated Requirement Traceability and Compliance Checks | Arsalan Masoudifard et.al. | 2412.08593 | null |
2024-12-11 | LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Zejian Li et.al. | 2412.08580 | link |
2024-12-11 | Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation | Hongming Guo et.al. | 2412.08577 | null |
2024-12-11 | Can We Generate Visual Programs Without Prompting LLMs? | Michal Shlapentokh-Rothman et.al. | 2412.08564 | null |
2024-12-11 | Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations | Hugo Flores García et.al. | 2412.08550 | null |
2024-12-10 | From Slow Bidirectional to Fast Causal Video Generators | Tianwei Yin et.al. | 2412.07772 | null |
2024-12-10 | Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting | Zetong Yang et.al. | 2412.07768 | null |
2024-12-10 | Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds | Xiaoyu Xiang et.al. | 2412.07766 | null |
2024-12-10 | PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation | Fatemeh Nazarieh et.al. | 2412.07754 | null |
2024-12-10 | Multi-Shot Character Consistency for Text-to-Video Generation | Yuval Atzmon et.al. | 2412.07750 | null |
2024-12-10 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | Ziqi Lu et.al. | 2412.07746 | null |
2024-12-10 | StyleMaster: Stylize Your Video with Artistic Generation and Translation | Zixuan Ye et.al. | 2412.07744 | null |
2024-12-10 | SKIPNet: Spatial Attention Skip Connections for Enhanced Brain Tumor Classification | Khush Mendiratta et.al. | 2412.07736 | null |
2024-12-10 | Granite Guardian | Inkit Padhi et.al. | 2412.07724 | link |
2024-12-10 | Leveraging Content and Context Cues for Low-Light Image Enhancement | Igor Morawski et.al. | 2412.07693 | link |
2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774 | null |
2024-12-09 | Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty | Meera Hahn et.al. | 2412.06771 | link |
2024-12-09 | Ranking-aware adapter for text-driven image ordering with CLIP | Wei-Hsiang Yu et.al. | 2412.06760 | link |
2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738 | link |
2024-12-09 | Revisiting GRB 060218: new insights into low-luminosity gamma-ray bursts from a revised shock breakout model | Christopher M. Irwin et.al. | 2412.06736 | null |
2024-12-09 | AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark | Lan Li et.al. | 2412.06724 | link |
2024-12-09 | VP-MEL: Visual Prompts Guided Multimodal Entity Linking | Hongze Mi et.al. | 2412.06720 | null |
2024-12-09 | Facade: High-Precision Insider Threat Detection Using Deep Contextual Anomaly Detection | Alex Kantchelian et.al. | 2412.06700 | null |
2024-12-09 | Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach | Weichao Xu et.al. | 2412.06684 | null |
2024-12-09 | Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion | Shuaiting Li et.al. | 2412.06661 | null |
2024-12-06 | Sparse autoencoders reveal selective remapping of visual concepts during adaptation | Hyesu Lim et.al. | 2412.05276 | link |
2024-12-06 | Mind the Time: Temporally-Controlled Multi-Event Video Generation | Ziyi Wu et.al. | 2412.05263 | null |
2024-12-06 | TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | Qian Long et.al. | 2412.05255 | link |
2024-12-06 | From classical techniques to convolution-based models: A review of object detection algorithms | Fnu Neha et.al. | 2412.05252 | null |
2024-12-06 | LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds | James Beetham et.al. | 2412.05232 | null |
2024-12-06 | Are Frontier Large Language Models Suitable for Q&A in Science Centres? | Jacob Watson et.al. | 2412.05200 | null |
2024-12-06 | QueEn: A Large Language Model for Quechua-English Translation | Junhao Chen et.al. | 2412.05184 | null |
2024-12-06 | A text-to-tabular approach to generate synthetic patient data using LLMs | Margaux Tornqvist et.al. | 2412.05153 | link |
2024-12-06 | LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | Donald Shenaj et.al. | 2412.05148 | link |
2024-12-06 | A Practical Examination of AI-Generated Text Detectors for Large Language Models | Brian Tufts et.al. | 2412.05139 | null |
2024-12-05 | Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Luca Bartolomei et.al. | 2412.04472 | link |
2024-12-05 | PaintScene4D: Consistent 4D Scene Generation from Text Prompts | Vinayak Gupta et.al. | 2412.04471 | null |
2024-12-05 | UnZipLoRA: Separating Content and Style from a Single Image | Chang Liu et.al. | 2412.04465 | null |
2024-12-05 | Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection | Enshen Zhou et.al. | 2412.04455 | null |
2024-12-05 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu et.al. | 2412.04447 | null |
2024-12-05 | GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration | Kaiyi Huang et.al. | 2412.04440 | null |
2024-12-05 | Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Yuying Ge et.al. | 2412.04432 | link |
2024-12-05 | Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Jiuhai Chen et.al. | 2412.04424 | link |
2024-12-05 | Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation | Xuying Li et.al. | 2412.04415 | null |
2024-12-05 | Discriminative Fine-tuning of LVLMs | Yassine Ouali et.al. | 2412.04378 | null |
2024-12-04 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Wujian Peng et.al. | 2412.03565 | link |
2024-12-04 | Best-of-N Jailbreaking | John Hughes et.al. | 2412.03556 | link |
2024-12-04 | Imagine360: Immersive 360 Video Generation from Perspective Anchor | Jing Tan et.al. | 2412.03552 | null |
2024-12-04 | Perception Tokens Enhance Visual Reasoning in Multimodal Language Models | Mahtab Bigverdi et.al. | 2412.03548 | null |
2024-12-04 | Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models | Natalie Mackraz et.al. | 2412.03537 | null |
2024-12-04 | A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences | Gabriel Lino Garcia et.al. | 2412.03531 | null |
2024-12-04 | You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? | Dominic Lohr et.al. | 2412.03516 | null |
2024-12-04 | Gesture Classification in Artworks Using Contextual Image Features | Azhar Hussian et.al. | 2412.03456 | null |
2024-12-04 | PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation | Ao Wang et.al. | 2412.03409 | link |
2024-12-04 | Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment | Feng He et.al. | 2412.03400 | null |
2024-12-03 | Motion Prompting: Controlling Video Generation with Motion Trajectories | Daniel Geng et.al. | 2412.02700 | null |
2024-12-03 | Diffusion-based Visual Anagram as Multi-task Learning | Zhiyuan Xu et.al. | 2412.02693 | link |
2024-12-03 | SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance | Viet Nguyen et.al. | 2412.02687 | null |
2024-12-03 | T-REG: Preference Optimization with Token-Level Reward Regularization | Wenxuan Zhou et.al. | 2412.02685 | null |
2024-12-03 | Liquefaction: Privately Liquefying Blockchain Assets | James Austgen et.al. | 2412.02634 | null |
2024-12-03 | Time-Reversal Provides Unsupervised Feedback to LLMs | Yerram Varun et.al. | 2412.02626 | null |
2024-12-03 | Explainable CTR Prediction via LLM Reasoning | Xiaohan Yu et.al. | 2412.02588 | null |
2024-12-03 | Copy-Move Forgery Detection and Question Answering for Remote Sensing Image | Ze Zhang et.al. | 2412.02575 | link |
2024-12-03 | Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Chenyang Liu et.al. | 2412.02573 | link |
2024-12-03 | Unveiling Concept Attribution in Diffusion Models | Quang H. Nguyen et.al. | 2412.02542 | link |
2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921 | null |
2024-11-29 | Handling irresolvable conflicts in the Semantic Web: an RDF-based conflict-tolerant version of the Deontic Traditional Scheme | Livio Robaldo et.al. | 2411.19918 | link |
2024-11-29 | Another look at inference after prediction | Jessica Gronsbell et.al. | 2411.19908 | link |
2024-11-29 | Cross-Domain Recommendation Meets Large Language Models | Ajay Krishna Vajjala et.al. | 2411.19862 | link |
2024-11-29 | Neuroplasticity and Psychedelics: a comprehensive examination of classic and non-classic compounds in pre and clinical models | Claudio Agnorelli et.al. | 2411.19840 | null |
2024-11-29 | Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation | Robin D. Pesl et.al. | 2411.19804 | null |
2024-11-29 | PerLA: Perceptive 3D Language Assistant | Guofeng Mei et.al. | 2411.19774 | null |
2024-11-29 | SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | Kim-Celine Kahl et.al. | 2411.19688 | link |
2024-11-29 | Measurement of the Inclusive Cross Sections of Prompt $J/ψ$ and $ψ(3686)$ Production in $e^{+}e^{-}$ Annihilation from $\sqrt{s}=3.808$ to $4.951$ GeV | BESIII Collaboration et.al. | 2411.19642 | null |
2024-11-29 | Unleashing the Transformative Power of Deliberation With Contextual Citizens | Ariane Lambert-Mogiliansky et.al. | 2411.19596 | null |
2024-11-27 | Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis | Eva Prakash et.al. | 2411.18602 | null |
2024-11-27 | Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning | Omkar Khade et.al. | 2411.18571 | null |
2024-11-27 | A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models | Rong Wang et.al. | 2411.18564 | null |
2024-11-27 | Bumblebee cosmology: Tests using distance- and time-redshift probes | Xincheng Zhu et.al. | 2411.18559 | null |
2024-11-27 | Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models | Minhyeok Lee et.al. | 2411.18530 | link |
2024-11-27 | Perturbation Ontology based Graph Attention Networks | Yichen Wang et.al. | 2411.18520 | null |
2024-11-27 | Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Jinyang Wu et.al. | 2411.18478 | null |
2024-11-28 | MM-Path: Multi-modal, Multi-granularity Path Representation Learning – Extended Version | Ronghui Xu et.al. | 2411.18428 | link |
2024-11-27 | Short-time existence and uniqueness for some infinite-dimensional Nash systems | Davide Francesco Redaelli et.al. | 2411.18356 | null |
2024-11-27 | TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models | Riza Velioglu et.al. | 2411.18350 | link |
2024-11-26 | Video-Guided Foley Sound Generation with Multimodal Controls | Ziyang Chen et.al. | 2411.17698 | null |
2024-11-26 | Instance-Aware Graph Prompt Learning | Jiazheng Li et.al. | 2411.17676 | null |
2024-11-26 | Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting | Liyun Zhang et.al. | 2411.17674 | null |
2024-11-26 | SketchAgent: Language-Driven Sequential Sketch Generation | Yael Vinker et.al. | 2411.17673 | null |
2024-11-26 | Synthetic Data Generation with LLM for Improved Depression Prediction | Andrea Kang et.al. | 2411.17672 | null |
2024-11-26 | Linguistic Laws Meet Protein Sequences: A Comparative Analysis of Subword Tokenization Methods | Burak Suyunu et.al. | 2411.17669 | link |
2024-11-26 | BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings | Abhay Shanbhag et.al. | 2411.17661 | null |
2024-11-26 | Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism | Yi-Chien Lin et.al. | 2411.17651 | null |
2024-11-26 | SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation | Claudia Cuttano et.al. | 2411.17646 | link |
2024-11-26 | Uma proposta para o uso de RPG no Ensino de Física: A Vingança de Newton | Maria Rita Vasconcelos Brandão Souza et.al. | 2411.17642 | null |
2024-11-25 | Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective | Jean Marie Tshimula et.al. | 2411.16642 | null |
2024-11-25 | Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric | Zhichao Zhang et.al. | 2411.16619 | null |
2024-11-25 | MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series | Aaron Wheeler et.al. | 2411.16585 | link |
2024-11-25 | RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | Chan Hee Song et.al. | 2411.16537 | null |
2024-11-25 | Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings | Carolin M. Schuster et.al. | 2411.16527 | link |
2024-11-25 | Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency | Jerry Yao-Chieh Hu et.al. | 2411.16525 | null |
2024-11-25 | Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis | Boming Miao et.al. | 2411.16503 | null |
2024-11-25 | Interpreting Language Reward Models via Contrastive Explanations | Junqi Jiang et.al. | 2411.16502 | null |
2024-11-25 | Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval | Xiaocong Yang et.al. | 2411.16454 | null |
2024-11-25 | VQ-SGen: A Vector Quantized Stroke Representation for Sketch Generation | Jiawei Wang et.al. | 2411.16446 | null |
2024-11-22 | VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Daeun Lee et.al. | 2411.15115 | null |
2024-11-22 | AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | Fengyuan Liu et.al. | 2411.15102 | link |
2024-11-22 | Instance-Aware Generalized Referring Expression Segmentation | E-Ro Nguyen et.al. | 2411.15087 | null |
2024-11-22 | FloAt: Flow Warping of Self-Attention for Clothing Animation Generation | Swasti Shreya Mishra et.al. | 2411.15028 | null |
2024-11-22 | FTA generation using GenAI with an Autonomy sensor Usecase | Sneha Sudhir Shetiya et.al. | 2411.15007 | null |
2024-11-22 | ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Junhong Shen et.al. | 2411.15004 | link |
2024-11-22 | Free Energy Projective Simulation (FEPS): Active inference with interpretability | Joséphine Pazem et.al. | 2411.14991 | null |
2024-11-22 | Generative AI may backfire for counterspeech | Dominik Bär et.al. | 2411.14986 | null |
2024-11-22 | Exploring Foundation Models Fine-Tuning for Cytology Classification | Manon Dausort et.al. | 2411.14975 | link |
2024-11-22 | Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation | Colin Diggs et.al. | 2411.14971 | null |
2024-11-21 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2411.14432 | link |
2024-11-21 | Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings | Aaron Zheng et.al. | 2411.14398 | null |
2024-11-21 | Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation | Yuanhao Cai et.al. | 2411.14384 | null |
2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347 | link |
2024-11-21 | UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | Bethel Melesse Tessema et.al. | 2411.14343 | link |
2024-11-21 | Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams | Jitendra Bhandari et.al. | 2411.14299 | link |
2024-11-21 | CAIP: Detecting Router Misconfigurations with Context-Aware Iterative Prompting of LLMs | Xi Jiang et.al. | 2411.14283 | null |
2024-11-21 | Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance | Haozhe Zhao et.al. | 2411.14279 | null |
2024-11-21 | Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification | Junhua Liu et.al. | 2411.14252 | null |
2024-11-21 | Natural Language Reinforcement Learning | Xidong Feng et.al. | 2411.14251 | link |
2024-11-20 | Metacognition for Unknown Situations and Environments (MUSE) | Rodolfo Valiente et.al. | 2411.13537 | null |
2024-11-20 | VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Ziqi Huang et.al. | 2411.13503 | link |
2024-11-20 | AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | Gaurav Verma et.al. | 2411.13451 | null |
2024-11-20 | From Prompt Engineering to Prompt Craft | Joseph Lindley et.al. | 2411.13422 | null |
2024-11-20 | Theory-independent monitoring of the decoherence of a superconducting qubit with generalized contextuality | Albert Aloy et.al. | 2411.13421 | link |
2024-11-20 | Unleashing the Power of Large Language Models for Group POI Recommendations | Jing Long et.al. | 2411.13415 | null |
2024-11-21 | Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese | Dat Van-Thanh Nguyen et.al. | 2411.13407 | null |
2024-11-20 | Adversarial Diffusion Compression for Real-World Image Super-Resolution | Bin Chen et.al. | 2411.13383 | link |
2024-11-20 | I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception | Jiawei Zhang et.al. | 2411.13314 | null |
2024-11-20 | Combining Autoregressive and Autoencoder Language Models for Text Classification | João Gonçalves et.al. | 2411.13282 | link |
2024-11-19 | ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models | Salma Kharrat et.al. | 2411.12736 | link |
2024-11-19 | Neurosymbolic Graph Enrichment for Grounded World Models | Stefano De Giorgis et.al. | 2411.12671 | null |
2024-11-19 | SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation | Ron Keuth et.al. | 2411.12602 | link |
2024-11-19 | AdaCM $^2$ : On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Yuanbin Man et.al. | 2411.12593 | null |
2024-11-19 | Large Language Models for Combinatorial Optimization of Design Structure Matrix | Shuo Jiang et.al. | 2411.12571 | null |
2024-11-19 | Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution | Yang Zou et.al. | 2411.12530 | link |
2024-11-19 | Human-AI Co-Creativity: Exploring Synergies Across Levels of Creative Collaboration | Jennifer Haase et.al. | 2411.12527 | null |
2024-11-19 | 3D Reconstruction by Looking: Instantaneous Blind Spot Detector for Indoor SLAM through Mixed Reality | Hanbeom Chang et.al. | 2411.12514 | null |
2024-11-19 | Evaluating the Prompt Steerability of Large Language Models | Erik Miehling et.al. | 2411.12405 | link |
2024-11-19 | DGSNA: prompt-based Dynamic Generative Scene-based Noise Addition method | Zihao Chen et.al. | 2411.12363 | null |
2024-11-18 | Absorbing state dynamics of stochastic gradient descent | Guanming Zhang et.al. | 2411.11834 | null |
2024-11-18 | The Lambda Calculus is Quantifiable | Valentin Maestracci et.al. | 2411.11809 | null |
2024-11-18 | Novel Application of Neutrinos to Evaluate U.S. Nuclear Weapons Performance | J. R. Distel et.al. | 2411.11804 | null |
2024-11-18 | Competing Bandits in Decentralized Large Contextual Matching Markets | Satush Parikh et.al. | 2411.11794 | null |
2024-11-18 | LLM-IE: A Python Package for Generative Information Extraction with Large Language Models | Enshuo Hsu et.al. | 2411.11779 | null |
2024-11-18 | Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment | Allison Huang et.al. | 2411.11731 | link |
2024-11-18 | Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation | Mingchao Qi et.al. | 2411.11714 | link |
2024-11-18 | Exploring LLMs for Verifying Technical System Specifications Against Requirements | Lasse M. Reinpold et.al. | 2411.11582 | null |
2024-11-18 | Simple But Not Secure: An Empirical Security Analysis of Two-factor Authentication Systems | Zhi Wang et.al. | 2411.11551 | null |
2024-11-18 | A Code Knowledge Graph-Enhanced System for LLM-Based Fuzz Driver Generation | Hanxiang Xu et.al. | 2411.11532 | link |
2024-11-15 | LLaVA-o1: Let Vision Language Models Reason Step-by-Step | Guowei Xu et.al. | 2411.10440 | link |
2024-11-15 | Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations | Jianfeng Chi et.al. | 2411.10414 | null |
2024-11-15 | Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation | Markus Karmann et.al. | 2411.10411 | null |
2024-11-15 | On the Foundation Model for Cardiac MRI Reconstruction | Chi Zhang et.al. | 2411.10403 | null |
2024-11-15 | A Survey of Event Causality Identification: Principles, Taxonomy, Challenges, and Assessment | Zefan Zeng et.al. | 2411.10371 | null |
2024-11-15 | Bias Unveiled: Investigating Social Bias in LLM-Generated Code | Lin Ling et.al. | 2411.10351 | null |
2024-11-15 | Number it: Temporal Grounding Videos like Flipping Manga | Yongliang Wu et.al. | 2411.10332 | link |
2024-11-15 | Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding | Huming Qiu et.al. | 2411.10329 | null |
2024-11-15 | Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Jingru Yang et.al. | 2411.10252 | null |
2024-11-15 | Measuring Non-Adversarial Reproduction of Training Data in Large Language Models | Michael Aerni et.al. | 2411.10242 | null |
2024-11-14 | MagicQuill: An Intelligent Interactive Image Editing System | Zichen Liu et.al. | 2411.09703 | link |
2024-11-14 | LLM Hallucination Reasoning with Zero-shot Knowledge Test | Seongmin Lee et.al. | 2411.09689 | null |
2024-11-14 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et.al. | 2411.09688 | link |
2024-11-14 | The lowest-radiation environments in the Solar System: new opportunities for underground rare-event searches | Xilin Zhang et.al. | 2411.09634 | null |
2024-11-14 | Local deployment of large-scale music AI models on commodity hardware | Xun Zhou et.al. | 2411.09625 | null |
2024-11-14 | PTR: Precision-Driven Tool Recommendation for Large Language Models | Hang Gao et.al. | 2411.09613 | null |
2024-11-14 | Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration | Yifan Shao et.al. | 2411.09604 | link |
2024-11-14 | LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models | Zhengyi Wang et.al. | 2411.09595 | null |
2024-11-14 | SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas | Yu-Kai Hung et.al. | 2411.09577 | null |
2024-11-14 | Spider: Any-to-Many Multimodal LLM | Jinxiang Lai et.al. | 2411.09439 | link |
2024-11-13 | Large Wireless Model (LWM): A Foundation Model for Wireless Channels | Sadjad Alikhani et.al. | 2411.08872 | link |
2024-11-13 | The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Daniel P. Jeong et.al. | 2411.08870 | link |
2024-11-13 | CamemBERT 2.0: A Smarter French Language Model Aged to Perfection | Wissam Antoun et.al. | 2411.08868 | null |
2024-11-13 | LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | Piyush Jha et.al. | 2411.08862 | null |
2024-11-13 | Process-aware Human Activity Recognition | Jiawei Zheng et.al. | 2411.08814 | null |
2024-11-13 | Logic-based Knowledge Awareness for Autonomous Agents in Continuous Spaces | Arabinda Ghosh et.al. | 2411.08754 | null |
2024-11-13 | Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers | Clément Dumas et.al. | 2411.08745 | link |
2024-11-13 | New advances in universal approximation with neural networks of minimal width | Dennis Rochau et.al. | 2411.08735 | null |
2024-11-14 | Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models | Somanshu Singla et.al. | 2411.08733 | link |
2024-11-13 | Polymetis:Large Language Modeling for Multiple Material Domains | Chao Huang et.al. | 2411.08728 | null |
2024-11-12 | From General to Specific: Utilizing General Hallucation to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents | Chuyi Kong et.al. | 2411.07965 | null |
2024-11-12 | MANTIS: A Mixed-Signal Near-Sensor Convolutional Imager SoC Using Charge-Domain 4b-Weighted 5-to-84-TOPS/W MAC Operations for Feature Extraction and Region-of-Interest Detection | Martin Lefebvre et.al. | 2411.07946 | null |
2024-11-12 | CryptoLLM: Unleashing the Power of Prompted LLMs for SmartQnA and Classification of Crypto Posts | Aniket Deroy et.al. | 2411.07917 | null |
2024-11-12 | INTRABENCH: Interactive Radiological Benchmark | Constantin Ulrich et.al. | 2411.07885 | null |
2024-11-12 | Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models | Yusen Zhang et.al. | 2411.07858 | link |
2024-11-12 | FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training | Philip Zmushko et.al. | 2411.07837 | link |
2024-11-12 | Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices | Kilian Pfeiffer et.al. | 2411.07826 | null |
2024-11-12 | Federated Low-Rank Adaptation with Differential Privacy over Wireless Networks | Tianqu Kang et.al. | 2411.07806 | null |
2024-11-12 | RedCode: Risky Code Execution and Generation Benchmark for Code Agents | Chengquan Guo et.al. | 2411.07781 | link |
2024-11-12 | Topological resilience of optical skyrmions in local decoherence | Li-Wen Wang et.al. | 2411.07775 | null |
2024-11-11 | Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations | Chaitanya Malaviya et.al. | 2411.07237 | null |
2024-11-11 | Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models | Yoad Tewel et.al. | 2411.07232 | null |
2024-11-11 | Tasks, Time, and Tools: Quantifying Online Sensemaking Efforts Through a Survey-based Study | Andrew Kuznetsov et.al. | 2411.07206 | null |
2024-11-11 | DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID | Nyle Siddiqui et.al. | 2411.07205 | link |
2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186 | null |
2024-11-11 | SAMPart3D: Segment Any Part in 3D Objects | Yunhan Yang et.al. | 2411.07184 | link |
2024-11-11 | Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis | Taihang Hu et.al. | 2411.07132 | link |
2024-11-11 | Fast and Robust Contextual Node Representation Learning over Dynamic Graphs | Xingzhi Guo et.al. | 2411.07123 | null |
2024-11-11 | Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation | Ziwei Liu et.al. | 2411.07021 | null |
2024-11-11 | Flaring gamma-ray emission coincident with a hyperactive fast radio burst source | Yi Xing et.al. | 2411.06996 | null |
2024-11-08 | LLMs as Method Actors: A Model for Prompt Engineering and Architecture | Colin Doyle et.al. | 2411.05778 | link |
2024-11-08 | Quantitative Assessment of Intersectional Empathetic Bias and Understanding | Vojtech Formanek et.al. | 2411.05777 | link |
2024-11-08 | End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Dylan Goetting et.al. | 2411.05755 | link |
2024-11-08 | A doublet of cosmological models to challenge the H0 tension in the Pantheon Supernovae Ia catalog | B. De Simone et.al. | 2411.05744 | null |
2024-11-08 | Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition | Abhisek Ray et.al. | 2411.05692 | link |
2024-11-08 | Tell What You Hear From What You See – Video to Audio Generation Through Text | Xiulong Liu et.al. | 2411.05679 | link |
2024-11-08 | Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation | Xiwen Wei et.al. | 2411.05663 | link |
2024-11-08 | Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation | Long Truong To et.al. | 2411.05641 | null |
2024-11-08 | From Resource Control to Digital Trust with User-Managed Access | Wouter Termont et.al. | 2411.05622 | null |
2024-11-08 | Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages | JA Meaney et.al. | 2411.05593 | null |
2024-11-07 | SVDQunat: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Muyang Li et.al. | 2411.05007 | link |
2024-11-07 | HourVideo: 1-Hour Video-Language Understanding | Keshigeyan Chandrasegaran et.al. | 2411.04998 | link |
2024-11-07 | Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives | Hao Sun et.al. | 2411.04991 | link |
2024-11-07 | DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | Wenqiang Sun et.al. | 2411.04928 | null |
2024-11-07 | StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration | Panwen Hu et.al. | 2411.04925 | null |
2024-11-07 | Structure Matters: Dynamic Policy Gradient | Sara Klein et.al. | 2411.04913 | null |
2024-11-07 | In the Era of Prompt Learning with Vision-Language Models | Ankit Jha et.al. | 2411.04892 | null |
2024-11-07 | Prompt-Guided Internal States for Hallucination Detection of Large Language Models | Fujie Zhang et.al. | 2411.04847 | link |
2024-11-07 | VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models | Ming Cheng et.al. | 2411.04825 | null |
2024-11-07 | Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet | Elija Deineko et.al. | 2411.04777 | null |
2024-11-06 | Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Daniel P. Jeong et.al. | 2411.04118 | link |
2024-11-06 | Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset | Alexandre Galashov et.al. | 2411.04034 | null |
2024-11-06 | Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages | Aniket Deroy et.al. | 2411.04025 | null |
2024-11-06 | Predicting and Publishing Accurate Imbalance Prices Using Monte Carlo Tree Search | Fabio Pavirani et.al. | 2411.04011 | null |
2024-11-06 | Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning | Jiawei Yao et.al. | 2411.03978 | link |
2024-11-06 | Continuous-Time State Estimation Methods in Robotics: A Survey | William Talbot et.al. | 2411.03951 | null |
2024-11-06 | Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks | Felipe Marra et.al. | 2411.03948 | link |
2024-11-06 | Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks | Ryan Campbell et.al. | 2411.03945 | link |
2024-11-06 | Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models | Minh Duc Bui et.al. | 2411.03888 | link |
2024-11-06 | Data Fusion of Synthetic Query Variants With Generative Large Language Models | Timo Breuer et.al. | 2411.03881 | link |
2024-11-05 | Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? | Jingyu Xiao et.al. | 2411.03292 | link |
2024-11-05 | Proxy-informed Bayesian transfer learning with unknown sources | Sabina J. Sloman et.al. | 2411.03263 | null |
2024-11-05 | DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models | Ying Zhou et.al. | 2411.03250 | null |
2024-11-05 | On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models | Tariq Berrada Ifriqi et.al. | 2411.03177 | null |
2024-11-05 | From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice | Alicia Guo et.al. | 2411.03137 | null |
2024-11-05 | MA^2: A Self-Supervised and Motion Augmenting Autoencoder for Gait-Based Automatic Disease Detection | Yiqun Liu et.al. | 2411.03129 | null |
2024-11-05 | “Create a Fear of Missing Out” – ChatGPT Implements Unsolicited Deceptive Designs in Generated Websites Without Warning | Veronika Krauß et.al. | 2411.03108 | null |
2024-11-05 | Speech Separation with Pretrained Frontend to Minimize Domain Mismatch | Wupeng Wang et.al. | 2411.03085 | link |
2024-11-05 | Growing a Tail: Increasing Output Diversity in Large Language Models | Michal Shur-Ofry et.al. | 2411.02989 | null |
2024-11-05 | AtlasSeg: Atlas Prior Guided Dual-U-Net for Cortical Segmentation in Fetal Brain MRI | Haoan Xu et.al. | 2411.02867 | null |
2024-11-04 | Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages | Hoang Nguyen et.al. | 2411.02398 | null |
2024-11-04 | Training-free Regional Prompting for Diffusion Transformers | Anthony Chen et.al. | 2411.02395 | link |
2024-11-04 | Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning | Md Rifat Arefin et.al. | 2411.02344 | link |
2024-11-04 | Prospects for optical detections from binary neutron star mergers with the next-generation multi-messenger observatories | E. Loffredo et.al. | 2411.02342 | link |
2024-11-04 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Ruyang Liu et.al. | 2411.02327 | link |
2024-11-04 | An Empirical Study on the Code Refactoring Capability of Large Language Models | Jonathan Cordeiro et.al. | 2411.02320 | null |
2024-11-04 | Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast | Marilyn Rego et.al. | 2411.02318 | null |
2024-11-04 | Defining and Evaluating Physical Safety for Large Language Models | Yung-Chen Tang et.al. | 2411.02317 | null |
2024-11-04 | CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments | Kung-Hsiang Huang et.al. | 2411.02305 | link |
2024-11-04 | Combining Induction and Transduction for Abstract Reasoning | Wen-Ding Li et.al. | 2411.02272 | link |
2024-10-31 | DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion | Weicai Ye et.al. | 2410.24203 | link |
2024-10-31 | Redefining | Fu Feng et.al. | 2410.24160 | null |
2024-10-31 | Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age | Nouar AlDahoul et.al. | 2410.24148 | null |
2024-10-31 | COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes | Muhammad Ali et.al. | 2410.24139 | link |
2024-10-31 | Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing | Akash Dhruv et.al. | 2410.24119 | link |
2024-10-31 | AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization | Amir Kazemi et.al. | 2410.24116 | null |
2024-10-31 | In-Context Fine-Tuning for Time-Series Foundation Models | Abhimanyu Das et.al. | 2410.24087 | null |
2024-10-31 | Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs | Muhammed Saeed et.al. | 2410.24049 | null |
2024-10-31 | Handwriting Recognition in Historical Documents with Multimodal LLM | Lucian Li et.al. | 2410.24034 | null |
2024-10-31 | Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks | Yingzhe Peng et.al. | 2410.24032 | null |
2024-10-30 | RelationBooth: Towards Relation-Aware Customized Object Generation | Qingyu Shi et.al. | 2410.23280 | null |
2024-10-30 | SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation | Yining Hong et.al. | 2410.23277 | null |
2024-10-30 | EMMA: End-to-End Multimodal Model for Autonomous Driving | Jyh-Jing Hwang et.al. | 2410.23262 | null |
2024-10-30 | Evaluating Cultural and Social Awareness of LLM Web Agents | Haoyi Qiu et.al. | 2410.23252 | null |
2024-10-30 | ProTransformer: Robustify Transformers via Plug-and-Play Paradigm | Zhichao Hou et.al. | 2410.23182 | link |
2024-10-30 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay et.al. | 2410.23180 | link |
2024-10-31 | Why Gradient Subspace? Identifying and Mitigating LoRA’s Bottlenecks in Federated Fine-Tuning of Large Language Models | Navyansh Mahla et.al. | 2410.23111 | null |
2024-10-30 | PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures | Tianxiang Wu et.al. | 2410.23089 | null |
2024-10-30 | BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference | Junqi Zhao et.al. | 2410.23079 | link |
2024-10-30 | Toward Understanding In-context vs. In-weight Learning | Bryan Chan et.al. | 2410.23042 | null |
2024-10-29 | Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier | Kai Wang et.al. | 2410.22317 | link |
2024-10-29 | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Bo Jiang et.al. | 2410.22313 | link |
2024-10-29 | Embedding-based classifiers can detect prompt injection attacks | Md. Ahsan Ayub et.al. | 2410.22284 | link |
2024-10-29 | Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models | Renzhe Yu et.al. | 2410.22282 | null |
2024-10-29 | NCA-Morph: Medical Image Registration with Neural Cellular Automata | Amin Ranem et.al. | 2410.22265 | link |
2024-10-29 | FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation | Farima Fatahi Bayat et.al. | 2410.22257 | null |
2024-10-29 | ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising | Ashutosh Chaubey et.al. | 2410.22233 | link |
2024-10-29 | Synthetic Data Generation with Large Language Models for Personalized Community Question Answering | Marco Braga et.al. | 2410.22182 | link |
2024-10-29 | Benchmarking LLM Guardrails in Handling Multilingual Toxicity | Yahan Yang et.al. | 2410.22153 | null |
2024-10-29 | AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts | Vishal Kumar et.al. | 2410.22143 | null |
2024-10-28 | Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context | Manuel Benavent-Lledo et.al. | 2410.21275 | link |
2024-10-28 | Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Yaniv Nikankin et.al. | 2410.21272 | link |
2024-10-28 | LoRA vs Full Fine-tuning: An Illusion of Equivalence | Reece Shuttleworth et.al. | 2410.21228 | null |
2024-10-28 | Exploring contextual modeling with linear complexity for point cloud segmentation | Yong Xien Chng et.al. | 2410.21211 | null |
2024-10-28 | Simplest Mechanism Builder Algorithm (SiMBA): An Automated Microkinetic Model Discovery Tool | Miguel Ángel de Carvalho Servia et.al. | 2410.21205 | link |
2024-10-28 | CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants | Lize Alberts et.al. | 2410.21159 | link |
2024-10-28 | Palisade – Prompt Injection Detection Framework | Sahasra Kokkula et.al. | 2410.21146 | null |
2024-10-28 | Do LLMs generate test oracles that capture the actual or the expected program behaviour? | Michael Konstantinou et.al. | 2410.21136 | null |
2024-10-28 | KA $^2$ ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation | Shangde Gao et.al. | 2410.21085 | null |
2024-10-28 | Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring | Honglin Mu et.al. | 2410.21083 | null |
2024-10-25 | Model merging with SVD to tie the Knots | George Stoica et.al. | 2410.19735 | link |
2024-10-25 | Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks | Yinglun Xu et.al. | 2410.19705 | null |
2024-10-25 | Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs | Yifei Zhang et.al. | 2410.19694 | null |
2024-10-25 | AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | Clemencia Siro et.al. | 2410.19692 | null |
2024-10-25 | Planning-Aware Diffusion Networks for Enhanced Motion Forecasting in Autonomous Driving | Liu Yunhao et.al. | 2410.19639 | null |
2024-10-25 | GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing | Hosam Elgendy et.al. | 2410.19552 | link |
2024-10-25 | CloserMusicDB: A Modern Multipurpose Dataset of High Quality Music | Aleksandra Piekarzewicz et.al. | 2410.19540 | null |
2024-10-25 | Optimization with First Order Algorithms | Charles Dossal et.al. | 2410.19506 | null |
2024-10-25 | Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization | Anthony Cui et.al. | 2410.19499 | null |
2024-10-25 | A Debate-Driven Experiment on LLM Hallucinations and Accuracy | Ray Li et.al. | 2410.19485 | null |
2024-10-24 | Unbounded: A Generative Infinite Game of Character Life Simulation | Jialu Li et.al. | 2410.18975 | null |
2024-10-24 | ConceptDrift: Uncovering Biases through the Lens of Foundational Models | Cristian Daniel Păduraru et.al. | 2410.18970 | null |
2024-10-24 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Zhangheng Li et.al. | 2410.18967 | null |
2024-10-24 | On the Crucial Role of Initialization for Matrix Factorization | Bingcong Li et.al. | 2410.18965 | null |
2024-10-24 | Learning to Look: Seeking Information for Decision Making via Policy Factorization | Shivin Dass et.al. | 2410.18964 | null |
2024-10-24 | Context is Key: A Benchmark for Forecasting with Essential Textual Information | Andrew Robert Williams et.al. | 2410.18959 | link |
2024-10-24 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Yujuan Velvin Fu et.al. | 2410.18955 | null |
2024-10-24 | From Blind Solvers to Logical Thinkers: Benchmarking LLMs’ Logical Integrity on Faulty Mathematical Problems | A M Muntasir Rahman et.al. | 2410.18921 | null |
2024-10-25 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
2024-10-24 | PRISM: A Methodology for Auditing Biases in Large Language Models | Leif Azzopardi et.al. | 2410.18906 | link |
2024-10-23 | TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts | Yuxuan Xie et.al. | 2410.18071 | null |
2024-10-23 | Disordered charge density waves in the kagome metal FeGe | Hengxin Tan et.al. | 2410.18063 | null |
2024-10-23 | CLEAR: Character Unlearning in Textual and Visual Modalities | Alexey Dontsov et.al. | 2410.18057 | null |
2024-10-23 | Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases | Anna Glazkova et.al. | 2410.18040 | null |
2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
2024-10-23 | Measurements of $ψ{(2S)}$ and $χ_{c1}(3872)$ production within fully reconstructed jets | LHCb collaboration et.al. | 2410.18018 | null |
2024-10-23 | Scalable Ranked Preference Optimization for Text-to-Image Generation | Shyamgopal Karthik et.al. | 2410.18013 | null |
2024-10-23 | Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Suho Kang et.al. | 2410.18001 | link |
2024-10-23 | An evolutionary game theory approach to modeling behavioral interaction in disclosing infection begins with an outbreak: COVID-19 as an example | Pranav Verma et.al. | 2410.17996 | null |
2024-10-23 | Closed-form merging of parameter-efficient modules for Federated Continual Learning | Riccardo Salami et.al. | 2410.17961 | null |
2024-10-22 | Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods | Tsachi Blau et.al. | 2410.17222 | null |
2024-10-22 | Hierarchical Upper Confidence Bounds for Constrained Online Learning | Ali Baheri et.al. | 2410.17216 | null |
2024-10-22 | YOLO-TS: Real-Time Traffic Sign Detection with Enhanced Accuracy Using Optimized Receptive Fields and Anchor-Free Fusion | Junzhou Chen et.al. | 2410.17144 | null |
2024-10-22 | PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles | Li Siyan et.al. | 2410.17127 | link |
2024-10-22 | Enhancing Answer Attribution for Faithful Text Generation with Large Language Models | Juraj Vladika et.al. | 2410.17112 | null |
2024-10-23 | Optimal Design for Reward Modeling in RLHF | Antoine Scheid et.al. | 2410.17055 | null |
2024-10-22 | Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups | Charvi Rastogi et.al. | 2410.17032 | null |
2024-10-23 | GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks | Shuyang Hou et.al. | 2410.17031 | null |
2024-10-22 | SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine | Xiaochen Wang et.al. | 2410.17021 | null |
2024-10-22 | LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices | Chuntao Ding et.al. | 2410.16954 | link |
2024-10-21 | SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Shuangrui Ding et.al. | 2410.16268 | link |
2024-10-21 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239 | link |
2024-10-21 | Building A Coding Assistant via the Retrieval-Augmented Language Model | Xinze Li et.al. | 2410.16229 | link |
2024-10-21 | Theoretical Limitations of Ensembles in the Age of Overparameterization | Niclas Dern et.al. | 2410.16201 | null |
2024-10-21 | From Tokens to Materials: Leveraging Language Models for Scientific Discovery | Yuwei Wan et.al. | 2410.16165 | link |
2024-10-21 | An Explainable Contrastive-based Dilated Convolutional Network with Transformer for Pediatric Pneumonia Detection | Chandravardhan Singh Raghaw et.al. | 2410.16143 | null |
2024-10-21 | Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs | Kang Zhao et.al. | 2410.16135 | null |
2024-10-21 | Do LLMs write like humans? Variation in grammatical and rhetorical styles | Alex Reinhart et.al. | 2410.16107 | null |
2024-10-21 | Analysing the Residual Stream of Language Models Under Knowledge Conflicts | Yu Zhao et.al. | 2410.16090 | null |
2024-10-21 | Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context | Maggie Mi et.al. | 2410.16069 | null |
2024-10-18 | MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps | Xiongtao Zhou et.al. | 2410.14668 | link |
2024-10-18 | DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph | Maitreya Prafulla Chitale et.al. | 2410.14666 | null |
2024-10-18 | GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | Raghuveer Thirukovalluru et.al. | 2410.14635 | link |
2024-10-18 | CELI: Controller-Embedded Language Model Interactions | Jan-Samuel Wagner et.al. | 2410.14627 | null |
2024-10-18 | DiSCo Meets LLMs: A Unified Approach for Sparse Retrieval and Contextual Distillation in Conversational Search | Simon Lupart et.al. | 2410.14609 | null |
2024-10-18 | Neural Combinatorial Clustered Bandits for Recommendation Systems | Baran Atalar et.al. | 2410.14586 | null |
2024-10-18 | Do LLMs “know” internally when they follow instructions? | Juyeon Heo et.al. | 2410.14516 | link |
2024-10-18 | CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection | Andrea Appiani et.al. | 2410.14509 | null |
2024-10-18 | Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models | Cody Clop et.al. | 2410.14479 | null |
2024-10-18 | An abstract structure determines the contextuality degree of observable-based Kochen-Specker proofs | Axel Muller et.al. | 2410.14463 | null |
2024-10-17 | Can MLLMs Understand the Deep Implication Behind Chinese Images? | Chenhao Zhang et.al. | 2410.13854 | link |
2024-10-17 | AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | Ke Yang et.al. | 2410.13825 | null |
2024-10-17 | ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution | Junhao Gu et.al. | 2410.13807 | null |
2024-10-17 | PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment | Zekun Moore Wang et.al. | 2410.13785 | null |
2024-10-17 | Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors | Georgios Chochlakis et.al. | 2410.13776 | null |
2024-10-17 | Improving Multi-modal Large Language Model through Boosting Vision Capabilities | Yanpeng Sun et.al. | 2410.13733 | null |
2024-10-17 | Persistent Pre-Training Poisoning of LLMs | Yiming Zhang et.al. | 2410.13722 | null |
2024-10-17 | Jailbreaking LLM-Controlled Robots | Alexander Robey et.al. | 2410.13691 | null |
2024-10-17 | Label-free prediction of fluorescence markers in bovine satellite cells using deep learning | Sania Sinha et.al. | 2410.13685 | null |
2024-10-18 | Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion | Yijun Liang et.al. | 2410.13674 | link |
2024-10-16 | Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media | Ross Deans Kristensen-McLachlan et.al. | 2410.12791 | null |
2024-10-16 | Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models | Ce Zhang et.al. | 2410.12790 | link |
2024-10-16 | JudgeBench: A Benchmark for Evaluating LLM-based Judges | Sijun Tan et.al. | 2410.12784 | link |
2024-10-16 | Context-Scaling versus Task-Scaling in In-Context Learning | Amirhesam Abedsoltan et.al. | 2410.12783 | null |
2024-10-16 | SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Jaehong Yoon et.al. | 2410.12761 | null |
2024-10-16 | How Does Variance Shape the Regret in Contextual Bandits? | Zeyu Jia et.al. | 2410.12713 | null |
2024-10-16 | Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization | Xingqi Wang et.al. | 2410.12700 | link |
2024-10-17 | Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | Mohamad Abdi et.al. | 2410.12686 | null |
2024-10-17 | Context Matters: Leveraging Contextual Features for Time Series Forecasting | Sameep Chattopadhyay et.al. | 2410.12672 | null |
2024-10-16 | CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training | Zhiyuan Ma et.al. | 2410.12595 | null |
2024-10-15 | KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities | Hsin-Ping Huang et.al. | 2410.11824 | null |
2024-10-15 | SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing | Zhiyuan Zhang et.al. | 2410.11815 | null |
2024-10-15 | Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability | Tsz Ting Chung et.al. | 2410.11786 | null |
2024-10-15 | On the Training Convergence of Transformers for In-Context Classification | Wei Shen et.al. | 2410.11778 | null |
2024-10-15 | SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding | Ying Chen et.al. | 2410.11761 | null |
2024-10-15 | Identification and modelling of optically thin inverse Compton scattering in the prompt emission of GRB131014A | Pragyan Pratim Bordoloi et.al. | 2410.11753 | null |
2024-10-15 | Personas with Attitudes: Controlling LLMs for Diverse Data Annotation | Leon Fröhling et.al. | 2410.11745 | link |
2024-10-15 | RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation | Anton Antonov et.al. | 2410.11722 | link |
2024-10-15 | Adaptive Coordinators and Prompts on Heterogeneous Graphs for Cross-Domain Recommendations | Hengyu Zhang et.al. | 2410.11719 | null |
2024-10-15 | It’s Just Another Day: Unique Video Captioning by Discriminative Prompting | Toby Perrett et.al. | 2410.11702 | null |
2024-10-14 | Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Jingzhi Bao et.al. | 2410.10821 | link |
2024-10-14 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li et.al. | 2410.10814 | link |
2024-10-14 | Denial-of-Service Poisoning Attacks against Large Language Models | Kuofeng Gao et.al. | 2410.10760 | link |
2024-10-14 | Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification | Jan Cegin et.al. | 2410.10756 | link |
2024-10-14 | FlexGen: Flexible Multi-View Generation from Text and Image Inputs | Xinli Xu et.al. | 2410.10745 | null |
2024-10-14 | SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing | Pengrui Quan et.al. | 2410.10741 | link |
2024-10-14 | Large Language Models Are Active Critics in NLG Evaluation | Shuying Xu et.al. | 2410.10724 | null |
2024-10-15 | 4-LEGS: 4D Language Embedded Gaussian Splatting | Gal Fiebelman et.al. | 2410.10719 | null |
2024-10-14 | Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Qibing Ren et.al. | 2410.10700 | link |
2024-10-14 | Functional Flexibility in Generative AI Interfaces: Text Editing with LLMs through Conversations, Toolbars, and Prompts | Florian Lehmann et.al. | 2410.10644 | null |
2024-10-11 | AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation | Zijun Wang et.al. | 2410.09040 | link |
2024-10-11 | Mentor-KD: Making Small Language Models Better Multi-step Reasoners | Hojae Lee et.al. | 2410.09037 | link |
2024-10-11 | AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | Maksym Andriushchenko et.al. | 2410.09024 | null |
2024-10-11 | Parameter-Efficient Fine-Tuning of State Space Models | Kevin Galim et.al. | 2410.09016 | link |
2024-10-11 | The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals | Xiaofeng Wu et.al. | 2410.09013 | null |
2024-10-11 | Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models | Hao Li et.al. | 2410.09012 | link |
2024-10-11 | Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory | Rebecca M. M. Hicke et.al. | 2410.08991 | link |
2024-10-11 | Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements | Jingyu Zhang et.al. | 2410.08968 | null |
2024-10-11 | Exploring the Design Space of Cognitive Engagement Techniques with AI-Generated Code for Enhanced Learning | Majeed Kazemitabaar et.al. | 2410.08922 | null |
2024-10-11 | Utilizing ChatGPT in a Data Structures and Algorithms Course: A Teaching Assistant’s Perspective | Pooriya Jamie et.al. | 2410.08899 | null |
2024-10-10 | LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts | Anh-Quan Cao et.al. | 2410.08211 | null |
2024-10-10 | HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation | Shanyan Guan et.al. | 2410.08192 | null |
2024-10-10 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | Hang Yin et.al. | 2410.08189 | null |
2024-10-10 | Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs | Xiaoyuan Liu et.al. | 2410.08145 | link |
2024-10-10 | Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks | Mathis Pink et.al. | 2410.08133 | null |
2024-10-10 | Think Beyond Size: Dynamic Prompting for More Effective Reasoning | Kamesh R et.al. | 2410.08130 | null |
2024-10-10 | What Makes Large Language Models Reason in (Multi-Turn) Code Generation? | Kunhao Zheng et.al. | 2410.08105 | null |
2024-10-10 | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Wenting Tan et.al. | 2410.08068 | link |
2024-10-10 | Reversible Decoupling Network for Single Image Reflection Removal | Hao Zhao et.al. | 2410.08063 | link |
2024-10-10 | Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions | Inderjeet Nair et.al. | 2410.08058 | link |
2024-10-09 | MM-Ego: Towards Building Egocentric Multimodal LLMs | Hanrong Ye et.al. | 2410.07177 | null |
2024-10-09 | One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer et.al. | 2410.07170 | link |
2024-10-09 | AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation | Yukang Cao et.al. | 2410.07164 | null |
2024-10-09 | InstructG2I: Synthesizing Images from Multimodal Attributed Graphs | Bowen Jin et.al. | 2410.07157 | link |
2024-10-09 | VHELM: A Holistic Evaluation of Vision Language Models | Tony Lee et.al. | 2410.07112 | link |
2024-10-09 | I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy | Gian Maria Campedelli et.al. | 2410.07109 | link |
2024-10-09 | Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context | Sangwon Yu et.al. | 2410.07103 | null |
2024-10-09 | Robots in the Middle: Evaluating LLMs in Dispute Resolution | Jinzhe Tan et.al. | 2410.07053 | null |
2024-10-09 | PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness | Zekun Wang et.al. | 2410.07035 | null |
2024-10-09 | Modeling of the Gamma Ray Burst photospheric emission: Monte Carlo simulation of the GRB prompt emission, numerical results and discussion | Amina Trabelsi et.al. | 2410.07005 | link |
2024-10-07 | GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting | Yukang Cao et.al. | 2410.05259 | null |
2024-10-08 | TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Rabin Adhikari et.al. | 2410.05239 | link |
2024-10-07 | Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer | Siyuan Hou et.al. | 2410.05151 | null |
2024-10-08 | PAMLR: A Passive-Active Multi-Armed Bandit-Based Solution for LoRa Channel Allocation | Jihoon Yun et.al. | 2410.05147 | null |
2024-10-07 | CR-CTC: Consistency regularization on CTC for improved speech recognition | Zengwei Yao et.al. | 2410.05101 | link |
2024-10-07 | IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification | Yan He et.al. | 2410.05100 | null |
2024-10-07 | Human-in-the-loop Reasoning For Traffic Sign Detection: Collaborative Approach Yolo With Video-llava | Mehdi Azarafza et.al. | 2410.05096 | null |
2024-10-07 | HyperINF: Unleashing the HyperPower of the Schulz’s Method for Data Influence Estimation | Xinyu Zhou et.al. | 2410.05090 | link |
2024-10-07 | ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | Ziru Chen et.al. | 2410.05080 | null |
2024-10-07 | Large Language Model Based Multi-Objective Optimization for Integrated Sensing and Communications in UAV Networks | Haoyun Li et.al. | 2410.05062 | null |
2024-10-04 | Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models | Tinghui Zhu et.al. | 2410.03659 | link |
2024-10-04 | Conditional Enzyme Generation Using Protein Language Models with Adapters | Jason Yang et.al. | 2410.03634 | null |
2024-10-04 | Searching for type I seesaw mechanism in a two Heavy Neutral Leptons scenario at FCC-ee | Sehar Ajmal et.al. | 2410.03615 | null |
2024-10-04 | Understanding Reasoning in Chain-of-Thought from the Hopfieldian View | Lijie Hu et.al. | 2410.03595 | null |
2024-10-04 | Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Xin Zou et.al. | 2410.03577 | link |
2024-10-04 | Individual vaccination as Nash equilibrium in a SIR model with application to the 2009-10 Influenza A(H1N1) epidemic in France | Laetitia Laguzet et.al. | 2410.03567 | null |
2024-10-04 | Re-examining Sexism and Misogyny Classification with Annotator Attitudes | Aiqi Jiang et.al. | 2410.03543 | null |
2024-10-04 | Collaborative and Efficient Personalization with Mixtures of Adaptors | Abdulla Jasem Almansoori et.al. | 2410.03497 | null |
2024-10-04 | Gradient-based Jailbreak Images for Multimodal Fusion Models | Javier Rando et.al. | 2410.03489 | link |
2024-10-04 | Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation | Tobias Leemann et.al. | 2410.03461 | null |
2024-10-03 | Erasing Conceptual Knowledge from Language Models | Rohit Gandikota et.al. | 2410.02760 | link |
2024-10-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757 | null |
2024-10-03 | CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation | Han He et.al. | 2410.02748 | link |
2024-10-03 | Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization | Lei Xu et.al. | 2410.02741 | link |
2024-10-03 | Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | Rohin Manvi et.al. | 2410.02725 | null |
2024-10-03 | Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization | Ryan C. Barron et.al. | 2410.02721 | null |
2024-10-03 | HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly | Howard Yen et.al. | 2410.02694 | link |
2024-10-03 | HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router | Lingrui Mei et.al. | 2410.02684 | link |
2024-10-03 | DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life | Yu Ying Chiu et.al. | 2410.02683 | null |
2024-10-03 | Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models | Shuoyuan Wang et.al. | 2410.02681 | null |
2024-10-02 | DreamGarden: A Designer Assistant for Growing Games from a Single Prompt | Sam Earle et.al. | 2410.01791 | null |
2024-10-02 | Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam et.al. | 2410.01782 | link |
2024-10-02 | Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning | Xingrui Gu et.al. | 2410.01739 | null |
2024-10-02 | LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Duy Nguyen et.al. | 2410.01735 | link |
2024-10-02 | ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation | Rinon Gal et.al. | 2410.01731 | null |
2024-10-02 | Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting | Longyu Feng et.al. | 2410.01724 | null |
2024-10-02 | Examining the Role of Relationship Alignment in Large Language Models | Kristen M. Altenburger et.al. | 2410.01708 | null |
2024-10-02 | FactAlign: Long-form Factuality Alignment of Large Language Models | Chao-Wei Huang et.al. | 2410.01691 | link |
2024-10-02 | Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding | Yanming Liu et.al. | 2410.01671 | null |
2024-10-02 | Extending Contextual Self-Modulation: Meta-Learning Across Modalities, Task Dimensionalities, and Data Regimes | Roussel Desmond Nzoyem et.al. | 2410.01655 | link |
2024-09-30 | LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner | Xiaopan Zhang et.al. | 2409.20560 | null |
2024-09-30 | Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection | Yubin Wang et.al. | 2409.20558 | null |
2024-09-30 | LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation | Ziyao Zhang et.al. | 2409.20550 | null |
2024-09-30 | Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models | Arpan Mukherjee et.al. | 2409.20512 | null |
2024-09-30 | COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models | Divyanshu Daiya et.al. | 2409.20502 | null |
2024-09-30 | Online Decision Deferral under Budget Constraints | Mirabel Reid et.al. | 2409.20489 | null |
2024-10-01 | Instance-adaptive Zero-shot Chain-of-Thought Prompting | Xiaosong Yuan et.al. | 2409.20441 | null |
2024-09-30 | World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Jiacong Wang et.al. | 2409.20424 | link |
2024-09-30 | Superposition of PRS and PDSCH for ISAC System: Spectral Efficiency Enhancement and Range Ambiguity Elimination | Keivan Khosroshahi et.al. | 2409.20420 | null |
2024-09-30 | Wait, but Tylenol is Acetaminophen… Investigating and Improving Language Models’ Ability to Resist Requests for Misinformation | Shan Chen et.al. | 2409.20385 | null |
2024-09-27 | ProMerge: Prompt and Merge for Unsupervised Instance Segmentation | Dylan Li et.al. | 2409.18961 | null |
2024-09-27 | LML: Language Model Learning a Dataset for Data-Augmented Prediction | Praneeth Vadlapati et.al. | 2409.18957 | link |
2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901 | link |
2024-09-27 | IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation | Fan Lin et.al. | 2409.18892 | link |
2024-09-27 | LW2G: Learning Whether to Grow for Prompt-based Continual Learning | Qian Feng et.al. | 2409.18860 | link |
2024-09-27 | Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects | Annie Chu et.al. | 2409.18847 | null |
2024-09-27 | LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis | Hamed Babaei Giglou et.al. | 2409.18812 | link |
2024-09-27 | Can AI Enhance its Creativity to Beat Humans ? | Anne-Gaëlle Maltese et.al. | 2409.18776 | null |
2024-09-27 | Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations | James Ford et.al. | 2409.18764 | null |
2024-09-27 | Interaction Equivalence | Beniamino Accattoli et.al. | 2409.18709 | null |
2024-09-26 | EgoLM: Multi-Modal Language Model of Egocentric Motions | Fangzhou Hong et.al. | 2409.18127 | null |
2024-09-26 | GSON: A Group-based Social Navigation Framework with Large Multimodal Model | Shangyi Luo et.al. | 2409.18084 | null |
2024-09-26 | Infer Human’s Intentions Before Following Natural Language Instructions | Yanming Wan et.al. | 2409.18073 | link |
2024-09-26 | Infering Alt-text For UI Icons With Large Language Models During App Development | Sabrina Haque et.al. | 2409.18060 | null |
2024-09-26 | MARS: Multi-radio Architecture with Radio Selection using Decision Trees for emerging mesoscale CPS/IoT applications | Jothi Prasanna Shanmuga Sundaram et.al. | 2409.18043 | null |
2024-09-26 | DARE: Diverse Visual Question Answering with Robustness Evaluation | Hannah Sterz et.al. | 2409.18023 | null |
2024-09-26 | Control Industrial Automation System with Large Language Models | Yuchen Xia et.al. | 2409.18009 | link |
2024-09-26 | Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation | Ashmi Banerjee et.al. | 2409.18003 | null |
2024-09-26 | Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models | Georg Ahnert et.al. | 2409.17990 | link |
2024-09-26 | GRB 240529A: A Tale of Two Shocks | Tian-Rui Sun et.al. | 2409.17983 | null |
2024-09-25 | Attention Prompting on Image for Large Vision-Language Models | Runpeng Yu et.al. | 2409.17143 | link |
2024-09-25 | Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset | Andrew Goldberg et.al. | 2409.17126 | null |
2024-09-26 | Characterizing stable regions in the residual stream of LLMs | Jett Janiak et.al. | 2409.17113 | null |
2024-09-25 | Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts | Mohammad Sadil Khan et.al. | 2409.17106 | link |
2024-09-25 | Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation | Richard D. Paul et.al. | 2409.17085 | null |
2024-09-25 | Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors | Aiping Zhang et.al. | 2409.17058 | link |
2024-09-25 | GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design | Phillip Mueller et.al. | 2409.17045 | null |
2024-09-25 | Counterfactual Token Generation in Large Language Models | Ivi Chatzi et.al. | 2409.17027 | link |
2024-09-25 | AXCEL: Automated eXplainable Consistency Evaluation using LLMs | P Aditya Sreekar et.al. | 2409.16984 | null |
2024-09-25 | DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling | Kyuheon Jung et.al. | 2409.16949 | link |
2024-09-24 | Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation | Yong Xien Chng et.al. | 2409.16278 | null |
2024-09-24 | Second Order Bounds for Contextual Bandits with Function Approximation | Aldo Pacchiano et.al. | 2409.16197 | null |
2024-09-24 | Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation | Xiaohong Liu et.al. | 2409.16183 | null |
2024-09-24 | Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | Ziyu Zhao et.al. | 2409.16167 | null |
2024-09-24 | Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework | Lu Chen et.al. | 2409.16146 | link |
2024-09-24 | HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection | Yuqi Ma et.al. | 2409.16136 | null |
2024-09-24 | Evaluation of state-of-the-art ASR Models in Child-Adult Interactions | Aditya Ashvin et.al. | 2409.16135 | null |
2024-09-24 | MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | Ming Zhu et.al. | 2409.16120 | link |
2024-09-24 | Exploring Hint Generation Approaches in Open-Domain Question Answering | Jamshid Mozafari et.al. | 2409.16096 | link |
2024-09-24 | MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models | Wenhao Yu et.al. | 2409.16030 | null |
2024-09-18 | To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Zayne Sprague et.al. | 2409.12183 | link |
2024-09-18 | Investigating the effects of precise mass measurements of Ru and Pd isotopes on machine learning mass modeling | W. S. Porter et.al. | 2409.12141 | null |
2024-09-18 | MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140 | link |
2024-09-18 | Self-similar solutions of oscillatory reconnection: parameter study of magnetic field strength and background temperature | Luiz A. C. A. Schiavo et.al. | 2409.12130 | null |
2024-09-18 | Fully charmed tetraquark production at the LHC experiments | Ilia Belov et.al. | 2409.12070 | null |
2024-09-18 | Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking | Ningyuan Xi et.al. | 2409.12059 | null |
2024-09-19 | Using Large Language Models to Generate Clinical Trial Tables and Figures | Yumeng Yang et.al. | 2409.12046 | null |
2024-09-18 | Mixture of Prompt Learning for Vision Language Models | Yu Du et.al. | 2409.12011 | null |
2024-09-18 | Ramp reversal memory in bulk crystals of 1T-TaS2 | Avital Fried et.al. | 2409.11977 | null |
2024-09-18 | Sampling Latent Material-Property Information From LLM-Derived Embedding Representations | Luke P. J. Gilligan et.al. | 2409.11971 | null |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323 | null |
2024-09-17 | MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping | Amirreza Fateh et.al. | 2409.11316 | link |
2024-09-17 | Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models | Divij Gupta et.al. | 2409.11302 | null |
2024-09-17 | TISIS : Trajectory Indexing for SImilarity Search | Sara Jarrad et.al. | 2409.11301 | null |
2024-09-18 | Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling | Xinyue Fang et.al. | 2409.11283 | null |
2024-09-17 | Machine Learning and Theory Ladenness – A Phenomenological Account | Alberto Termine et.al. | 2409.11277 | null |
2024-09-18 | The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives | Samee Arif et.al. | 2409.11261 | link |
2024-09-17 | Norm of Mean Contextualized Embeddings Determines their Variance | Hiroaki Yamagiwa et.al. | 2409.11253 | link |
2024-09-17 | Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse | Maojia Song et.al. | 2409.11242 | link |
2024-09-17 | Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection | Yuta Kaneko et.al. | 2409.11223 | null |
2024-09-16 | Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models | Momoko Shiraishi et.al. | 2409.10506 | null |
2024-09-16 | Do Pre-trained Vision-Language Models Encode Object States? | Kaleb Newman et.al. | 2409.10488 | null |
2024-09-16 | Addressing misspecification in contextual optimization | Omar Bennouna et.al. | 2409.10479 | null |
2024-09-16 | A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration | Zhang Zheng et.al. | 2409.10403 | null |
2024-09-16 | Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation | Hanbo Bi et.al. | 2409.10389 | null |
2024-09-16 | On Synthetic Texture Datasets: Challenges, Creation, and Curation | Blaine Hoak et.al. | 2409.10297 | null |
2024-09-16 | From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs | Navya Jain et.al. | 2409.10245 | null |
2024-09-16 | Robust Bird’s Eye View Segmentation by Adapting DINOv2 | Merve Rabia Barın et.al. | 2409.10228 | null |
2024-09-16 | Exploring Quantum Contextuality with the Quantum Moebius-Escher-Penrose hypergraph | Mirko Navara et.al. | 2409.10179 | null |
2024-09-17 | jina-embeddings-v3: Multilingual Embeddings With Task LoRA | Saba Sturua et.al. | 2409.10173 | null |
2024-09-13 | Contri(e)ve: Context + Retrieve for Scholarly Question Answering | Kanchan Shivashankar et.al. | 2409.09010 | null |
2024-09-13 | SynSUM – Synthetic Benchmark with Structured and Unstructured Medical Records | Paloma Rabaey et.al. | 2409.08936 | link |
2024-09-13 | LLM-based Weak Supervision Framework for Query Intent Classification in Video Search | Farnoosh Javadi et.al. | 2409.08931 | null |
2024-09-13 | Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers | Namita Singh et.al. | 2409.08916 | null |
2024-09-13 | Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing | Minh-Duc Vu et.al. | 2409.08885 | null |
2024-09-13 | Data Efficient Child-Adult Speaker Diarization with Simulated Conversations | Anfeng Xu et.al. | 2409.08881 | link |
2024-09-13 | InstantDrag: Improving Interactivity in Drag-based Image Editing | Joonghyuk Shin et.al. | 2409.08857 | null |
2024-09-13 | A RAG Approach for Generating Competency Questions in Ontology Engineering | Xueli Pan et.al. | 2409.08820 | null |
2024-09-13 | Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR | Mingyu Cui et.al. | 2409.08797 | link |
2024-09-13 | LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment | Huan Zhang et.al. | 2409.08795 | link |
2024-09-12 | Click2Mask: Local Editing with Dynamic Mask Generation | Omer Regev et.al. | 2409.08272 | link |
2024-09-12 | Improving Text-guided Object Inpainting with Semantic Pre-inpainting | Yifu Chen et.al. | 2409.08260 | link |
2024-09-12 | Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding | Hongyu Li et.al. | 2409.08251 | null |
2024-09-12 | OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering | Jiahao Nick Li et.al. | 2409.08250 | null |
2024-09-12 | TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder | NaHyeon Park et.al. | 2409.08248 | link |
2024-09-12 | LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | Hakan T. Otal et.al. | 2409.08234 | link |
2024-09-12 | Exploring Use and Perceptions of Generative AI Art Tools by Blind Artists | Gayatri Raman et.al. | 2409.08226 | null |
2024-09-12 | AudioBERT: Audio Knowledge Augmented Language Model | Hyunjong Ok et.al. | 2409.08199 | link |
2024-09-12 | Fine-tuning Large Language Models for Entity Matching | Aaron Steiner et.al. | 2409.08185 | link |
2024-09-12 | On the Role of Context in Reading Time Prediction | Andreas Opedal et.al. | 2409.08160 | link |
2024-09-11 | Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation | Gavin Butts et.al. | 2409.07424 | null |
2024-09-11 | AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge | Han Wang et.al. | 2409.07394 | link |
2024-09-11 | Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code | Khiem Ton et.al. | 2409.07368 | null |
2024-09-11 | Enhancing Sequential Music Recommendation with Negative Feedback-informed Contrastive Learning | Pavan Seshadri et.al. | 2409.07367 | null |
2024-09-11 | PaveSAM Segment Anything for Pavement Distress | Neema Jakisa Owor et.al. | 2409.07295 | null |
2024-09-12 | Alignment of Diffusion Models: Fundamentals, Challenges, and Future | Buhua Liu et.al. | 2409.07253 | link |
2024-09-11 | Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning | Yingling Lu et.al. | 2409.07238 | link |
2024-09-12 | 3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents | Yingjie Zhou et.al. | 2409.07236 | link |
2024-09-11 | Swin-LiteMedSAM: A Lightweight Box-Based Segment Anything Model for Large-Scale Medical Image Datasets | Ruochen Gao et.al. | 2409.07172 | link |
2024-09-11 | Leveraging Unstructured Text Data for Federated Instruction Tuning of Large Language Models | Rui Ye et.al. | 2409.07136 | null |
2024-09-10 | E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning | Zihan Liao et.al. | 2409.06679 | null |
2024-09-10 | SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation | Teng Hu et.al. | 2409.06633 | null |
2024-09-10 | One-Shot Imitation under Mismatched Execution | Kushal Kedia et.al. | 2409.06615 | null |
2024-09-10 | Simulation-based Scenario Generation for Robust Hybrid AI for Autonomy | Hambisa Keno et.al. | 2409.06608 | null |
2024-09-10 | Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System | Leilei Lin et.al. | 2409.06568 | link |
2024-09-10 | ChatGPT’s Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools | Ehsan Firouzi et.al. | 2409.06561 | null |
2024-09-10 | An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition | Yi-Cheng Wang et.al. | 2409.06468 | null |
2024-09-10 | Continual Domain Incremental Learning for Privacy-aware Digital Pathology | Pratibha Kumari et.al. | 2409.06455 | null |
2024-09-10 | Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | Qiujing Lu et.al. | 2409.06450 | null |
2024-09-10 | HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data | Hossein Hajipour et.al. | 2409.06446 | link |
2024-09-09 | Promptable Closed-loop Traffic Simulation | Shuhan Tan et.al. | 2409.05863 | null |
2024-09-09 | Recognizing molecular chirality via twisted 2D materials | Lorenzo Cavicchi et.al. | 2409.05839 | null |
2024-09-09 | Are Large Language Models a Threat to Programming Platforms? An Exploratory Study | Md Mustakim Billah et.al. | 2409.05824 | null |
2024-09-09 | Leveraging Object Priors for Point Tracking | Bikram Boote et.al. | 2409.05786 | link |
2024-09-09 | A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System | B. Sankar et.al. | 2409.05747 | null |
2024-09-09 | What Did My Car Say? Autonomous Vehicle Explanation Errors, Context, and Personal Traits Impact Comfort, Reliance, Satisfaction, and Driving Confidence | Robert Kaufman et.al. | 2409.05731 | null |
2024-09-09 | Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling | Sara Ferro et.al. | 2409.05699 | null |
2024-09-09 | SX-Stitch: An Efficient VMS-UNet Based Framework for Intraoperative Scoliosis X-Ray Image Stitching | Yi Li et.al. | 2409.05681 | null |
2024-09-09 | Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models | Aakash Sen Sharma et.al. | 2409.05668 | null |
2024-09-09 | DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification | Junzhou Chen et.al. | 2409.05587 | null |
2024-09-06 | Question-Answering Dense Video Events | Hangyu Qin et.al. | 2409.04388 | null |
2024-09-06 | J/$\psi$-hadron correlations at midrapidity in pp collisions at $\sqrt{s}$ = 13 TeV | ALICE Collaboration et.al. | 2409.04364 | null |
2024-09-06 | Connectivity-Inspired Network for Context-Aware Recognition | Gianluca Carloni et.al. | 2409.04360 | link |
2024-09-06 | First studies on cascaded dual-phase liquid hole-multipliers in xenon | G. Martinez-Lema et.al. | 2409.04338 | null |
2024-09-06 | Active learning for regression in engineering populations: A risk-informed approach | Daniel R. Clarkson et.al. | 2409.04328 | null |
2024-09-06 | Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs | Aliakbar Nafar et.al. | 2409.04318 | link |
2024-09-06 | FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning | Yunhao Bai et.al. | 2409.04298 | link |
2024-09-06 | Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | Desiree Heim et.al. | 2409.04286 | null |
2024-09-06 | An overview of domain-specific foundation model: key technologies, applications and challenges | Haolong Chen et.al. | 2409.04267 | null |
2024-09-06 | FPT Algorithms using Minimal Parameters for a Generalized Version of Maximin Shares | Klaus Jansen et.al. | 2409.04225 | null |
2024-09-05 | LLM-CI: Assessing Contextual Integrity Norms in Language Models | Yan Shvartzshnaider et.al. | 2409.03735 | null |
2024-09-06 | RAG based Question-Answering for Contextual Response Prediction System | Sriram Veturi et.al. | 2409.03708 | null |
2024-09-06 | LLM-based multi-agent poetry generation in non-cooperative environments | Ran Zhang et.al. | 2409.03659 | link |
2024-09-05 | Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers | Amit Ben Artzy et.al. | 2409.03621 | link |
2024-09-05 | Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration | Pei Wang et.al. | 2409.03455 | null |
2024-09-05 | Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities | Wei Lu et.al. | 2409.03444 | link |
2024-09-05 | Leveraging Large Language Models through Natural Language Processing to provide interpretable Machine Learning predictions of mental deterioration in real time | Francisco de Arriba-Pérez et.al. | 2409.03375 | null |
2024-09-05 | TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation | Shahzaib Iqbal et.al. | 2409.03367 | null |
2024-09-05 | Sketch: A Toolkit for Streamlining LLM Operations | Xin Jiang et.al. | 2409.03346 | null |
2024-09-05 | N-gram Prediction and Word Difference Representations for Language Modeling | DongNyeong Heo et.al. | 2409.03295 | null |
2024-09-04 | HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts | Xinyu Liu et.al. | 2409.02919 | link |
2024-09-04 | Building a Scalable, Effective, and Steerable Search and Ranking Platform | Marjan Celikik et.al. | 2409.02856 | null |
2024-09-04 | Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model | Tornike Karchkhadze et.al. | 2409.02845 | null |
2024-09-04 | MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | Xiang Yue et.al. | 2409.02813 | null |
2024-09-04 | Non-Orthogonal Multiple-Access Strategies for Direct-to-Satellite IoT Networks | Felipe Augusto Tondo et.al. | 2409.02748 | null |
2024-09-04 | Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection | Kaiqing Lin et.al. | 2409.02664 | null |
2024-09-04 | PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation | Jun Ling et.al. | 2409.02657 | null |
2024-09-04 | Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects | Kyungmin Jo et.al. | 2409.02653 | null |
2024-09-04 | Mamba as a motion encoder for robotic imitation learning | Toshiaki Tsuji et.al. | 2409.02636 | null |
2024-09-04 | PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation | Aneta Pawelec et.al. | 2409.02617 | null |
2024-08-30 | DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model | Mona Sheikh Zeinoddin et.al. | 2408.17433 | link |
2024-08-30 | CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Jonathan Bourne et.al. | 2408.17428 | link |
2024-09-03 | Open-vocabulary Temporal Action Localization using VLMs | Naoki Wake et.al. | 2408.17422 | null |
2024-08-30 | MoRe Fine-Tuning with 10x Fewer Parameters | Wenxuan Tan et.al. | 2408.17383 | link |
2024-08-30 | Efficient Multi-task Prompt Tuning for Recommendation | Ting Bai et.al. | 2408.17214 | null |
2024-08-30 | NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar | Runwei Guan et.al. | 2408.17207 | null |
2024-08-30 | Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study | Shubham Agarwal et.al. | 2408.17181 | null |
2024-08-30 | Wireless Integrated Authenticated Communication System (WIA-Comm) | Amith N Bharadwaj et.al. | 2408.17112 | null |
2024-08-30 | Understanding the User: An Intent-Based Ranking Dataset | Abhijit Anand et.al. | 2408.17103 | null |
2024-08-30 | Reasoning AI Performance Degradation in 6G Networks with Large Language Models | Liming Huang et.al. | 2408.17097 | null |
2024-08-29 | PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning | Noor Hussein et.al. | 2408.16769 | link |
2024-08-29 | SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners | Ziyu Guo et.al. | 2408.16768 | link |
2024-08-29 | ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model | Fangfu Liu et.al. | 2408.16767 | null |
2024-08-29 | An algebraic characterisation of Kochen-Specker contextuality | Markus Frembs et.al. | 2408.16764 | null |
2024-08-29 | Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge | Beidi Dong et.al. | 2408.16749 | null |
2024-08-29 | GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Moreno D’Incà et.al. | 2408.16700 | link |
2024-08-29 | Iterative Graph Alignment | Fangyuan Yu et.al. | 2408.16667 | link |
2024-08-29 | LLMs generate structurally realistic social networks but overestimate political homophily | Serina Chang et.al. | 2408.16629 | link |
2024-08-29 | WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling | Shengpeng Ji et.al. | 2408.16532 | link |
2024-08-29 | UAV-Based Human Body Detector Selection and Fusion for Geolocated Saliency Map Generation | Piotr Rudol et.al. | 2408.16501 | null |
2024-08-29 | Spatio-Temporal Context Prompting for Zero-Shot Action Detection | Wei-Jhe Huang et.al. | 2408.15996 | null |
2024-08-28 | TEDRA: Text-based Editing of Dynamic and Photoreal Actors | Basavaraj Sunagad et.al. | 2408.15995 | null |
2024-08-28 | Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration | Xu Zhang et.al. | 2408.15994 | null |
2024-08-28 | In-Context Imitation Learning via Next-Token Prediction | Letian Fu et.al. | 2408.15980 | link |
2024-08-28 | Fall Detection for Smart Living using YOLOv5 | Gracile Astlin Pereira et.al. | 2408.15955 | null |
2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950 | null |
2024-08-28 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
2024-08-28 | CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization | Feize Wu et.al. | 2408.15914 | null |
2024-08-28 | Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models | Sebastian Vallejo Vera et.al. | 2408.15895 | null |
2024-08-28 | Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation | Shaofei Huang et.al. | 2408.15876 | link |
2024-08-27 | SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images | Zafer Yildiz et.al. | 2408.15224 | link |
2024-08-27 | LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Nathaniel Li et.al. | 2408.15221 | null |
2024-08-27 | Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation | Jian Hu et.al. | 2408.15205 | link |
2024-08-27 | On the parameterized complexity of computing good edge-labelings | Davi de Andrade et.al. | 2408.15181 | null |
2024-08-27 | A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships | Gracile Astlin Pereira et.al. | 2408.15178 | null |
2024-08-27 | X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation | Hanjia Lyu et.al. | 2408.15172 | null |
2024-08-28 | Urdu Digital Text Word Optical Character Recognition Using Permuted Auto Regressive Sequence Modeling | Ahmed Mustafa et.al. | 2408.15119 | null |
2024-08-27 | CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP | Zhenchen Tang et.al. | 2408.15098 | null |
2024-08-27 | MiWaves Reinforcement Learning Algorithm | Susobhan Ghosh et.al. | 2408.15076 | link |
2024-08-28 | Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance | Kunpeng Wang et.al. | 2408.15063 | link |
2024-08-27 | Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models | Aradhye Agarwal et.al. | 2408.14470 | link |
2024-08-26 | Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | Liuchang Xu et.al. | 2408.14438 | null |
2024-08-26 | Social perception of faces in a vision-language model | Carina I. Hausladen et.al. | 2408.14435 | link |
2024-08-26 | Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications | Luyue Xu et.al. | 2408.14432 | null |
2024-08-26 | Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning | Sakhinana Sagar Srinivas et.al. | 2408.14387 | null |
2024-08-26 | ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty | Xindi Wu et.al. | 2408.14339 | null |
2024-08-26 | Claim Verification in the Age of Large Language Models: A Survey | Alphaeus Dmonte et.al. | 2408.14317 | null |
2024-08-27 | Text3DAug – Prompted Instance Augmentation for LiDAR Perception | Laurenz Reichardt et.al. | 2408.14253 | link |
2024-08-27 | SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher | Trung Dao et.al. | 2408.14176 | link |
2024-08-26 | Contrastive Learning Subspace for Text Clustering | Qian Yong et.al. | 2408.14119 | null |
2024-08-23 | Domain-specific long text classification from sparse relevant information | Célia D’Cruz et.al. | 2408.13253 | null |
2024-08-23 | LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation | Shuai Yang et.al. | 2408.13252 | null |
2024-08-23 | CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | Tao Wu et.al. | 2408.13239 | link |
2024-08-23 | Enhancing Few-Shot Transfer Learning with Optimized Multi-Task Prompt Tuning through Modular Prompt Composition | Ahmad Pouramini et.al. | 2408.13227 | null |
2024-08-23 | Polarization Measurement of Gamma-ray Bursts with Fermi-GBM: The Case of GRB 180720B | P. Veres et.al. | 2408.13199 | null |
2024-08-23 | Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Hourui Deng et.al. | 2408.13184 | null |
2024-08-23 | Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation | Bonan Li et.al. | 2408.13149 | null |
2024-08-23 | SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks | Kai-Wei Chang et.al. | 2408.13040 | null |
2024-08-23 | Indoor scene recognition from images under visual corruptions | Willams de Lima Costa et.al. | 2408.13029 | null |
2024-08-23 | A Web-Based Solution for Federated Learning with LLM-Based Automation | Chamith Mawela et.al. | 2408.13010 | null |
2024-08-22 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang et.al. | 2408.12599 | link |
2024-08-23 | Non-Homophilic Graph Pre-Training and Prompt Learning | Xingtong Yu et.al. | 2408.12594 | link |
2024-08-22 | Contextual Stochastic Optimization for School Desegregation Policymaking | Hongzhao Guan et.al. | 2408.12572 | null |
2024-08-22 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu et.al. | 2408.12547 | link |
2024-08-22 | Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition | Bozheng Li et.al. | 2408.12475 | null |
2024-08-22 | DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems | Jiaju Chen et.al. | 2408.12470 | link |
2024-08-22 | FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing | Jue Wang et.al. | 2408.12429 | link |
2024-08-22 | Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce | Ádám Tibor Czapp et.al. | 2408.12392 | null |
2024-08-22 | Orbits of Binary Stars: from Visual Measures to Speckle Interferometry | Andrei Tokovinin et.al. | 2408.12376 | null |
2024-08-23 | RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering | Pratyush Kumar et.al. | 2408.12369 | link |
2024-08-21 | NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation | Zhenye Lou et.al. | 2408.11787 | link |
2024-08-21 | Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Omar Erak et.al. | 2408.11775 | link |
2024-08-21 | D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models | M. Forlini et.al. | 2408.11761 | null |
2024-08-21 | MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs | Yulin Ren et.al. | 2408.11758 | link |
2024-08-21 | FocusLLM: Scaling LLM’s Context by Parallel Decoding | Zhenyu Li et.al. | 2408.11745 | link |
2024-08-21 | JieHua Paintings Style Feature Extracting Model using Stable Diffusion with ControlNet | Yujia Gu et.al. | 2408.11744 | null |
2024-08-21 | CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering | Yuliang Cai et.al. | 2408.11742 | link |
2024-08-22 | LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites | Zachariah Sollenberger et.al. | 2408.11729 | null |
2024-08-21 | Efficient Detection of Toxic Prompts in Large Language Models | Yi Liu et.al. | 2408.11727 | null |
2024-08-21 | Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests | Amirhossein Deljouyi et.al. | 2408.11710 | link |
2024-08-20 | Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement | Satoshi Kosugi et.al. | 2408.11055 | link |
2024-08-20 | Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks | Nathaniel Pinckney et.al. | 2408.11053 | link |
2024-08-20 | Multiple Topology Replica Exchange of Expanded Ensembles (MT-REXEE) for Multidimensional Alchemical Calculations | Anika J. Friedman et.al. | 2408.11038 | link |
2024-08-20 | An Overlooked Role of Context-Sensitive Dendrites | Mohsin Raza et.al. | 2408.11019 | null |
2024-08-20 | Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter | Farhanul Haque et.al. | 2408.10955 | null |
2024-08-20 | The Evolution of Reinforcement Learning in Quantitative Finance | Nikolaos Pippas et.al. | 2408.10932 | null |
2024-08-20 | CHECKWHY: Causal Fact Verification via Argument Structure | Jiasheng Si et.al. | 2408.10918 | link |
2024-08-21 | BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model | Yeyong Yu et.al. | 2408.10903 | link |
2024-08-20 | DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection | Xinqi Su et.al. | 2408.10883 | link |
2024-08-20 | Manifold Transform by Recurrent Cortical Circuit Enhances Robust Encoding of Familiar Stimuli | Weifan Wang et.al. | 2408.10873 | null |
2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174 | link |
2024-08-19 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | link |
2024-08-19 | In-Context Learning with Representations: Contextual Generalization of Trained Transformers | Tong Yang et.al. | 2408.10147 | null |
2024-08-19 | Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Tianyu Zhang et.al. | 2408.10124 | link |
2024-08-19 | FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Zhengchao Huang et.al. | 2408.10072 | link |
2024-08-19 | Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development | Yuncheng Jiang et.al. | 2408.10067 | null |
2024-08-19 | Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory | Haoran Li et.al. | 2408.10053 | null |
2024-08-19 | Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype | Yadong Lu et.al. | 2408.09984 | null |
2024-08-20 | Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM’s Structured Questions for National Teacher Certification Exams | Ling He et.al. | 2408.09982 | null |
2024-08-19 | Contextual Importance and Utility in Python: New Functionality and Insights with the py-ciu Package | Kary Främling et.al. | 2408.09957 | link |
2024-08-19 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et.al. | 2408.08869 | null |
2024-08-16 | Visual Agents as Fast and Slow Thinkers | Guangyan Sun et.al. | 2408.08862 | link |
2024-08-16 | Revisiting the propagation of highly-energetic gamma rays in the Galaxy | Gaetano Di Marco et.al. | 2408.08818 | null |
2024-08-16 | CIKMar: A Dual-Encoder Approach to Prompt-Based Reranking in Educational Dialogue Systems | Joanito Agili Lopo et.al. | 2408.08805 | null |
2024-08-16 | Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification | Abdullah Al Imran et.al. | 2408.08803 | null |
2024-08-16 | Neighbor Overlay-Induced Graph Attention Network | Tiqiao Wei et.al. | 2408.08788 | null |
2024-08-16 | Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions | Bhuvanashree Murugadoss et.al. | 2408.08781 | null |
2024-08-16 | Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Chenming Tang et.al. | 2408.08780 | null |
2024-08-16 | Watching the Generative AI Hype Bubble Deflate | David Gray Widder et.al. | 2408.08778 | null |
2024-08-16 | Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | Dingwei Chen et.al. | 2408.08769 | null |
2024-08-15 | SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training | Gengwei Zhang et.al. | 2408.08295 | link |
2024-08-15 | Heavy Labels Out! Dataset Distillation with Label Space Lightening | Ruonan Yu et.al. | 2408.08201 | null |
2024-08-15 | “I Try to Represent Myself as I Am”: Self-Presentation Preferences of People with Invisible Disabilities through Embodied Social VR Avatars | Ria J. Gualano et.al. | 2408.08193 | null |
2024-08-16 | Beyond Full Label: Single-Point Prompt for Infrared Small Target Label Generation | Shuai Yuan et.al. | 2408.08191 | link |
2024-08-16 | FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance | Jiasong Feng et.al. | 2408.08189 | null |
2024-08-15 | Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion | Adi Haviv et.al. | 2408.08184 | null |
2024-08-15 | EmBARDiment: an Embodied AI Agent for Productivity in XR | Riccardo Bovo et.al. | 2408.08158 | null |
2024-08-15 | P/D-Serve: Serving Disaggregated Large Language Model at Scale | Yibo Jin et.al. | 2408.08147 | null |
2024-08-15 | MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU | Yan Li et.al. | 2408.08144 | null |
2024-08-15 | Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification | Levente Murgás et.al. | 2408.08126 | link |
2024-08-14 | Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques | Ivory Yang et.al. | 2408.07676 | null |
2024-08-14 | See It All: Contextualized Late Aggregation for 3D Dense Captioning | Minjung Kim et.al. | 2408.07648 | null |
2024-08-14 | Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach | Shizhou Zhang et.al. | 2408.07500 | link |
2024-08-14 | DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency | Xiaojing Zhong et.al. | 2408.07481 | null |
2024-08-14 | Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification | Yongcheng Li et.al. | 2408.07467 | link |
2024-08-14 | Large Language Models Prompting With Episodic Memory | Dai Do et.al. | 2408.07465 | null |
2024-08-15 | BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | Asif Hanif et.al. | 2408.07440 | link |
2024-08-14 | Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator | Federico Nicolas Peccia et.al. | 2408.07404 | null |
2024-08-14 | A Quantum-Inspired Analysis of Human Disambiguation Processes | Daphne Wang et.al. | 2408.07402 | null |
2024-08-14 | Segment Using Just One Example | Pratik Vora et.al. | 2408.07393 | null |
2024-08-13 | Categorical Framework for Typed Extensional and Intensional Models in Formal Semantics | Daniel Quigley et.al. | 2408.07058 | null |
2024-08-13 | TableGuard – Securing Structured & Unstructured Data | Anantha Sharma et.al. | 2408.07045 | null |
2024-08-13 | Imagen 3 | Imagen-Team-Google et.al. | 2408.07009 | null |
2024-08-13 | Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models | Chun Jie Chong et.al. | 2408.07004 | null |
2024-08-13 | Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2 | Osher Rafaeli et.al. | 2408.06970 | null |
2024-08-13 | Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas | Louis Kwok et.al. | 2408.06929 | link |
2024-08-13 | SceneGPT: A Language Model for 3D Scene Understanding | Shivam Chandhok et.al. | 2408.06926 | null |
2024-08-13 | New refinements of Narayana polynomials and Motzkin polynomials | Janet J. W. Dong et.al. | 2408.06912 | null |
2024-08-13 | Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives | Zhihu Wang et.al. | 2408.06904 | null |
2024-08-13 | Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media | Pranav Venkatesh et.al. | 2408.06900 | null |
2024-08-12 | Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks | Lucas Félix et.al. | 2408.06341 | null |
2024-08-12 | LOLgorithm: Integrating Semantic,Syntactic and Contextual Elements for Humor Classification | Tanisha Khurana et.al. | 2408.06335 | null |
2024-08-12 | Animate, or Inanimate, That is the Question for Large Language Models | Leonardo Ranaldi et.al. | 2408.06332 | null |
2024-08-12 | Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let’s Take TravelPlanner as an Example | Yanan Chen et.al. | 2408.06318 | null |
2024-08-12 | From SAM to SAM 2: Exploring Improvements in Meta’s Segment Anything Model | Athulya Sundaresan Geetha et.al. | 2408.06305 | null |
2024-08-12 | Long-Form Answers to Visual Questions from Blind and Low Vision People | Mina Huh et.al. | 2408.06303 | null |
2024-08-12 | Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM | Trisha Das et.al. | 2408.06285 | null |
2024-08-12 | Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning | Yingjin Song et.al. | 2408.06259 | null |
2024-08-12 | Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical Images | Siladittya Manna et.al. | 2408.06235 | null |
2024-08-12 | Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting | Halley Young et.al. | 2408.06186 | null |
2024-08-09 | Multi-Garment Customized Model Generation | Yichen Liu et.al. | 2408.05206 | null |
2024-08-09 | Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners | Michael Vaccaro Jr et.al. | 2408.05204 | null |
2024-08-09 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | link |
2024-08-09 | ECG-FM: An Open Electrocardiogram Foundation Model | Kaden McKeen et.al. | 2408.05178 | link |
2024-08-09 | AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | Pritam Deka et.al. | 2408.05149 | null |
2024-08-09 | How Well Do LLMs Identify Cultural Unity in Diversity? | Jialin Li et.al. | 2408.05102 | link |
2024-08-09 | Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts | Tingchen Fu et.al. | 2408.05094 | null |
2024-08-09 | Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models | Zikai Xie et.al. | 2408.05093 | link |
2024-08-09 | Generating novel experimental hypotheses from language models: A case study on cross-dative generalization | Kanishka Misra et.al. | 2408.05086 | link |
2024-08-09 | SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation | Da Mu et.al. | 2408.05057 | null |
2024-08-08 | SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation | Jieming Yu et.al. | 2408.04593 | null |
2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575 | null |
2024-08-08 | Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User’s Casual Sketches | Yongzhi Xu et.al. | 2408.04567 | null |
2024-08-08 | Conversational Prompt Engineering | Liat Ein-Dor et.al. | 2408.04560 | null |
2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
2024-08-08 | Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models | Fabio Pernisi et.al. | 2408.04522 | null |
2024-08-08 | Model-Based Transfer Learning for Contextual Reinforcement Learning | Jung-Hoon Cho et.al. | 2408.04498 | link |
2024-08-08 | What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant | Jonan Richards et.al. | 2408.04477 | null |
2024-08-09 | Achieving Robust Data-driven Contextual Decision Making in a Data Augmentation Way | Zhaoen Li et.al. | 2408.04469 | null |
2024-08-08 | Modelling Probabilistic FPC in Guarded Type Theory | Philipp Jan Andries Stassen et.al. | 2408.04455 | null |
2024-08-07 | SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature | Vinícius Di Oliveira et.al. | 2408.03936 | null |
2024-08-07 | FMiFood: Multi-modal Contrastive Learning for Food Image Classification | Xinyue Pan et.al. | 2408.03922 | null |
2024-08-07 | CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Xiangyan Liu et.al. | 2408.03910 | link |
2024-08-07 | Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models | Shachi H Kumar et.al. | 2408.03907 | null |
2024-08-07 | Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Beomseok Lee et.al. | 2408.03900 | link |
2024-08-07 | BeeManc at the PLABA Track of TAC-2023: Investigating LLMs and Controllable Attributes for Improving Biomedical Text Readability | Zihao Li et.al. | 2408.03871 | link |
2024-08-07 | GAIA – A Large Language Model for Advanced Power Dispatch | Yuheng Cheng et.al. | 2408.03847 | null |
2024-08-07 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta et.al. | 2408.03837 | link |
2024-08-07 | Target Prompting for Information Extraction with Vision Language Model | Dipankar Medhi et.al. | 2408.03834 | null |
2024-08-07 | Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring | Zifan Wang et.al. | 2408.03811 | null |
2024-08-06 | Training LLMs to Recognize Hedges in Spontaneous Narratives | Amie J. Paige et.al. | 2408.03319 | link |
2024-08-06 | Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | Charlie Snell et.al. | 2408.03314 | null |
2024-08-06 | MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation | Xiaofeng Mao et.al. | 2408.03312 | null |
2024-08-06 | A search for soft X-ray emission lines in the afterglow spectrum of GRB 221009A | Sergio Campana et.al. | 2408.03306 | null |
2024-08-06 | SARA: Singular-Value Based Adaptive Low-Rank Adaption | Jihao Gu et.al. | 2408.03290 | null |
2024-08-06 | Biomedical SAM 2: Segment Anything in Biomedical Images and Videos | Zhiling Yan et.al. | 2408.03286 | link |
2024-08-06 | Synthesizing Text-to-SQL Data from Weak and Strong LLMs | Jiaxi Yang et.al. | 2408.03256 | null |
2024-08-06 | Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons | Yifei Wang et.al. | 2408.03247 | link |
2024-08-06 | Making Long-Context Language Models Better Multi-Hop Reasoners | Yanyang Li et.al. | 2408.03246 | link |
2024-08-07 | Red Type-1 Quasars after Cosmic Noon and Impact on $L_{\rm UV}$-related Quasar Statistics | Yongjung Kim et.al. | 2408.03228 | null |
2024-08-05 | Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? | Mohammad Bahrami Karkevandi et.al. | 2408.02651 | null |
2024-08-05 | SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | Muxi Diao et.al. | 2408.02632 | null |
2024-08-05 | Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection | Sajal Aggarwal et.al. | 2408.02595 | null |
2024-08-05 | The Role of Functional Muscle Networks in Improving Hand Gesture Perception for Human-Machine Interfaces | Costanza Armanini et.al. | 2408.02547 | null |
2024-08-05 | Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph | Zhao Kaichen et.al. | 2408.02535 | null |
2024-08-05 | Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation | Aaron Imani et.al. | 2408.02502 | null |
2024-08-05 | Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection | Ting Lei et.al. | 2408.02484 | link |
2024-08-05 | TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments | Daeun Song et.al. | 2408.02454 | null |
2024-08-05 | FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification | Yijin Huang et.al. | 2408.02426 | link |
2024-08-05 | Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models | Zi Liang et.al. | 2408.02416 | link |
2024-08-02 | Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting | Xiangyu Zhao et.al. | 2408.01423 | null |
2024-08-02 | Mission Impossible: A Statistical Perspective on Jailbreaking LLMs | Jingtong Su et.al. | 2408.01420 | null |
2024-08-02 | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua et.al. | 2408.01417 | null |
2024-08-02 | Conditional LoRA Parameter Generation | Xiaolong Jin et.al. | 2408.01415 | null |
2024-08-02 | Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer | Yu Yang et.al. | 2408.01402 | null |
2024-08-02 | Transformers are Universal In-context Learners | Takashi Furuya et.al. | 2408.01367 | null |
2024-08-02 | MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code | Kaiwen Ning et.al. | 2408.01354 | link |
2024-08-02 | Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks | Anders Giovanni Møller et.al. | 2408.01346 | null |
2024-08-02 | Synergistic pathways of modulation enable robust task packing within neural dynamics | Giacomo Vedovati et.al. | 2408.01316 | null |
2024-08-02 | TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling | Dong Huo et.al. | 2408.01291 | null |
2024-08-01 | Segment anything model 2: an application to 2D and 3D medical images | Haoyu Dong et.al. | 2408.00756 | link |
2024-08-01 | Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model | Benlin Liu et.al. | 2408.00754 | null |
2024-08-01 | Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Guangzhi Xiong et.al. | 2408.00727 | link |
2024-08-01 | Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM | Xiaofeng Liu et.al. | 2408.00706 | null |
2024-08-01 | Can Developers Prompt? A Controlled Experiment for Code Documentation Generation | Hans-Alexander Kruse et.al. | 2408.00686 | null |
2024-08-01 | Quantum Order by Disorder: A Key to Understanding the Magnetic Phases of BaCo$_2$(AsO$_4$)$_2$ | Sangyun Lee et.al. | 2408.00622 | null |
2024-08-01 | Mitigating Multilingual Hallucination in Large Vision-Language Models | Xiaoye Qu et.al. | 2408.00550 | link |
2024-08-01 | Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model | Felipe Mahlow et.al. | 2408.00544 | null |
2024-08-01 | Jailbreaking Text-to-Image Models with LLM-Based Agents | Yingkai Dong et.al. | 2408.00523 | null |
2024-08-01 | A new approach for encoding code and assisting code understanding | Mengdan Fan et.al. | 2408.00521 | null |
2024-07-31 | Vision-Language Model Based Handwriting Verification | Mihir Chauhan et.al. | 2407.21788 | null |
2024-07-31 | Tulip Agent – Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries | Felix Ocker et.al. | 2407.21778 | null |
2024-07-31 | Ge-based Clinopyroxene series: first principles and experimental local probe study | Ricardo P. Moreira et.al. | 2407.21749 | null |
2024-07-31 | A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation | Mothilal Asokan et.al. | 2407.21739 | null |
2024-07-31 | Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Yuxin Wen et.al. | 2407.21720 | link |
2024-07-31 | Hyper-parameter tuning for text guided image editing | Shiwen Zhang et.al. | 2407.21703 | link |
2024-07-31 | Four-loop two-mass tadpoles and the $ρ$ parameter | Samuel Abreu et.al. | 2407.21700 | null |
2024-07-31 | Kramers-Kronig relations via Laplace formalism and $L^1$ integrability | Marco Prevedelli et.al. | 2407.21694 | null |
2024-07-31 | MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment | Anurag Das et.al. | 2407.21654 | null |
2024-07-31 | MSA2Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation | Sina Ghorbani Kolahi et.al. | 2407.21640 | link |
2024-07-30 | Add-SD: Rational Generation without Manual Reference | Lingfeng Yang et.al. | 2407.21016 | link |
2024-07-30 | CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | Yuexi Du et.al. | 2407.21011 | link |
2024-07-30 | AI-Assisted Generation of Difficult Math Questions | Vedant Shah et.al. | 2407.21009 | link |
2024-07-30 | Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection | Jinfa Huang et.al. | 2407.21004 | link |
2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et.al. | 2407.20990 | link |
2024-07-30 | UniProcessor: A Text-induced Unified Low-level Image Processor | Huiyu Duan et.al. | 2407.20928 | link |
2024-07-30 | SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition | Hao Tan et.al. | 2407.20920 | null |
2024-07-30 | Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation | Pujan Paudel et.al. | 2407.20910 | null |
2024-07-30 | ThinkRepair: Self-Directed Automated Program Repair | Xin Yin et.al. | 2407.20898 | link |
2024-07-30 | Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations | Sarthak Anand et.al. | 2407.20856 | null |
2024-07-29 | QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval | Hongming Tan et.al. | 2407.20207 | null |
2024-07-29 | Deciphering the Instability of the Black Hole Ringdown Quasinormal Spectrum | A. Ianniccari et.al. | 2407.20144 | null |
2024-07-29 | Context-Aware CSI Tracking and Path Loss Prediction Using Machine Learning and Dynamical Systems | Anis Hamadouche et.al. | 2407.20123 | null |
2024-07-29 | Generative Diffusion Model Bootstraps Zero-shot Classification of Fetal Ultrasound Images In Underrepresented African Populations | Fangyijie Wang et.al. | 2407.20072 | link |
2024-07-29 | Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models | Zhe Li et.al. | 2407.20053 | null |
2024-07-29 | Reproducibility Study of “ITI-GEN: Inclusive Text-to-Image Generation” | Daniel Gallo Fernández et.al. | 2407.19996 | link |
2024-07-29 | A Study on the Implementation Method of an Agent-Based Advanced RAG System Using Graph | Cheonsu Jeong et.al. | 2407.19994 | null |
2024-07-29 | MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion | Chencan Fu et.al. | 2407.19976 | null |
2024-07-29 | FedDEO: Description-Enhanced One-Shot Federated Learning with Diffusion Models | Mingzhao Yang et.al. | 2407.19953 | null |
2024-07-29 | FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention | Yu Lu et.al. | 2407.19918 | null |
2024-07-26 | Small Molecule Optimization with Large Language Models | Philipp Guevorguian et.al. | 2407.18897 | link |
2024-07-26 | The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs | Aleix Sant et.al. | 2407.18786 | null |
2024-07-26 | TESSILATOR: a one-stop shop for measuring TESS rotation periods | A. S. Binks et.al. | 2407.18761 | link |
2024-07-29 | Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery | Yuni Susanti et.al. | 2407.18752 | link |
2024-07-26 | Towards Generalized Offensive Language Identification | Alphaeus Dmonte et.al. | 2407.18738 | null |
2024-07-26 | Neurosymbolic AI for Enhancing Instructability in Generative AI | Amit Sheth et.al. | 2407.18722 | null |
2024-07-26 | Probing exotic long-lived particles from the prompt side using the CONTUR method | Louie Corpe et.al. | 2407.18710 | null |
2024-07-26 | Dilated Strip Attention Network for Image Restoration | Fangwei Hao et.al. | 2407.18613 | null |
2024-07-26 | Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation | Chaoyi Ai et.al. | 2407.18562 | null |
2024-07-26 | A Universal Prompting Strategy for Extracting Process Model Information from Natural Language Text using Large Language Models | Julian Neuberger et.al. | 2407.18540 | link |
2024-07-25 | LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Zhengbo Wang et.al. | 2407.18242 | link |
2024-07-26 | Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Yuxiao Qu et.al. | 2407.18219 | null |
2024-07-26 | Exploring Scaling Trends in LLM Robustness | Nikolaus Howe et.al. | 2407.18213 | link |
2024-07-26 | Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation | Razieh Azizi et.al. | 2407.18195 | null |
2024-07-25 | Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning | Sindhura Kommu et.al. | 2407.18181 | null |
2024-07-25 | Efficient Inference of Vision Instruction-Following Models with Elastic Cache | Zuyan Liu et.al. | 2407.18121 | link |
2024-07-25 | Keypoint Promptable Re-Identification | Vladimir Somers et.al. | 2407.18112 | link |
2024-07-25 | DINOv2 Rocks Geological Image Analysis: Classification, Segmentation, and Interpretability | Florent Brondolo et.al. | 2407.18100 | link |
2024-07-25 | C2P: Featuring Large Language Models with Causal Reasoning | Abdolmahdi Bagheri et.al. | 2407.18069 | null |
2024-07-25 | I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition | Yannis Vasilakis et.al. | 2407.18058 | link |
2024-07-24 | WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries | Wenting Zhao et.al. | 2407.17468 | null |
2024-07-24 | Fluent Student-Teacher Redteaming | T. Ben Thompson et.al. | 2407.17447 | link |
2024-07-24 | Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | Michael-Andrei Panaitescu-Liess et.al. | 2407.17417 | null |
2024-07-24 | (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork | Tianjin Huang et.al. | 2407.17412 | null |
2024-07-24 | PERSONA: A Reproducible Testbed for Pluralistic Alignment | Louis Castricato et.al. | 2407.17387 | null |
2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365 | null |
2024-07-24 | DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation | Qian Feng et.al. | 2407.17348 | null |
2024-07-24 | How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations? | Leo Yu-Ho Lo et.al. | 2407.17291 | null |
2024-07-24 | A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks | Fabiano Belém et.al. | 2407.17284 | null |
2024-07-25 | LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model | Wanggong Yang et.al. | 2407.17229 | null |
2024-07-23 | Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions | Fabio Tosi et.al. | 2407.16698 | link |
2024-07-23 | Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack | Xiaoyue Xu et.al. | 2407.16695 | link |
2024-07-23 | Can Large Language Models Automatically Jailbreak GPT-4V? | Yuanwei Wu et.al. | 2407.16686 | null |
2024-07-23 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation | Pengfei Chen et.al. | 2407.16682 | null |
2024-07-23 | RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | Huiyu Xu et.al. | 2407.16667 | null |
2024-07-23 | Lawma: The Power of Specialization for Legal Tasks | Ricardo Dominguez-Olmedo et.al. | 2407.16615 | null |
2024-07-23 | Shared Imagination: LLMs Hallucinate Alike | Yilun Zhou et.al. | 2407.16604 | null |
2024-07-23 | Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs | Yifan Xia et.al. | 2407.16576 | null |
2024-07-24 | Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning | Fang-Duo Tsai et.al. | 2407.16564 | link |
2024-07-23 | Patched RTC: evaluating LLMs for diverse software development tasks | Asankhaya Sharma et.al. | 2407.16557 | link |
2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850 | link |
2024-07-22 | LLMmap: Fingerprinting For Large Language Models | Dario Pasquini et.al. | 2407.15847 | link |
2024-07-22 | HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning | Eugene Valassakis et.al. | 2407.15844 | null |
2024-07-22 | Artist: Aesthetically Controllable Text-Driven Stylization without Training | Ruixiang Jiang et.al. | 2407.15842 | link |
2024-07-22 | Inequalities in Computational Thinking Among Incoming Students in an STEM Chilean University | Felipe González-Pizarro et.al. | 2407.15833 | null |
2024-07-23 | Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy’s Properties | Shao-Yu Fu et.al. | 2407.15824 | null |
2024-07-22 | Robust Facial Reactions Generation: An Emotion-Aware Framework with Modality Compensation | Guanyu Hu et.al. | 2407.15798 | null |
2024-07-22 | AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection | Yunkang Cao et.al. | 2407.15795 | link |
2024-07-22 | CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning | Emanuele Frascaroli et.al. | 2407.15793 | link |
2024-07-22 | Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach | Rian Dolphin et.al. | 2407.15788 | null |
2024-07-19 | T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Kaiyue Sun et.al. | 2407.14505 | link |
2024-07-19 | M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models | Seunggeun Chi et.al. | 2407.14502 | null |
2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487 | link |
2024-07-19 | ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities | Peng Xu et.al. | 2407.14482 | null |
2024-07-19 | Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Mingjie Li et.al. | 2407.14474 | null |
2024-07-19 | AttentNet: Fully Convolutional 3D Attention for Lung Nodule Detection | Majedaldein Almahasneh et.al. | 2407.14464 | null |
2024-07-19 | From Instruction to Insight: Exploring the Functional and Semantic Roles of Text in Interactive Dashboards | Nicole Sultanum et.al. | 2407.14451 | null |
2024-07-19 | Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model | Seonghui Min et.al. | 2407.14434 | null |
2024-07-19 | Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models | Hyun-Jic Oh et.al. | 2407.14426 | null |
2024-07-19 | Improving Retrieval in Sponsored Search by Leveraging Query Context Signals | Akash Kumar Mohankumar et.al. | 2407.14346 | null |
2024-07-18 | Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion | Boyang Deng et.al. | 2407.13759 | null |
2024-07-18 | LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation | David Schlangen et.al. | 2407.13744 | null |
2024-07-18 | HazeCLIP: Towards Language Guided Real-World Image Dehazing | Ruiyi Wang et.al. | 2407.13719 | link |
2024-07-18 | CoDefeater: Using LLMs To Find Defeaters in Assurance Cases | Usman Gohar et.al. | 2407.13717 | link |
2024-07-18 | Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio | Jing Xu et.al. | 2407.13687 | null |
2024-07-18 | EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension | Wei Zhang et.al. | 2407.13596 | link |
2024-07-18 | Robust Calibration of Large Vision-Language Adapters | Balamurali Murugesan et.al. | 2407.13588 | link |
2024-07-18 | SAM-Driven Weakly Supervised Nodule Segmentation with Uncertainty-Aware Cross Teaching | Xingyue Zhao et.al. | 2407.13553 | null |
2024-07-18 | GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding | Changshuo Wang et.al. | 2407.13519 | link |
2024-07-19 | Mask2Map: Vectorized HD Map Construction Using Bird’s Eye View Segmentation Masks | Sehwan Choi et.al. | 2407.13517 | link |
2024-07-17 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Zhongqun Zhang et.al. | 2407.12727 | null |
2024-07-17 | Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Ben Yao et.al. | 2407.12725 | null |
2024-07-17 | Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs | Yiqing Shen et.al. | 2407.12678 | link |
2024-07-17 | FastSAM-3DSlicer: A 3D-Slicer Extension for 3D Volumetric Segment Anything Model with Uncertainty Quantification | Yiqing Shen et.al. | 2407.12658 | link |
2024-07-17 | Zero-shot Text-guided Infinite Image Synthesis with LLM guidance | Soyeong Kwon et.al. | 2407.12642 | null |
2024-07-17 | Rethinking the Architecture Design for Efficient Generic Event Boundary Detection | Ziwei Zheng et.al. | 2407.12622 | link |
2024-07-17 | Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models | Donggeun Kim et.al. | 2407.12616 | null |
2024-07-17 | AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism | William Brannon et.al. | 2407.12613 | link |
2024-07-17 | Continuous reasoning for adaptive container image distribution in the cloud-edge continuum | Damiano Azzolini et.al. | 2407.12605 | link |
2024-07-17 | VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Ofir Abramovich et.al. | 2407.12594 | link |
Usage Instructions
Usage instructions: here